Disclosure of Invention
The technical problem to be solved by the present invention is to provide a data backup method and a data backup apparatus, which are used to solve the defect that the write performance of the existing storage device supporting the COW mode is greatly reduced during the continuous write operation.
In order to solve the above technical problem, an embodiment of the present invention provides a data backup method applied to a storage device, where the storage device includes a primary volume and a storage volume, and the method includes: detecting a write request to the storage device, wherein the write request comprises first data to be written to the primary volume; determining that the storage device is in a working state, and attaching a time tag to the first data to form second data; redirecting the write request to the storage volume of the storage device, and writing the second data into the storage volume; and when the remaining space of the storage volume is smaller than a set space threshold, selecting at least one second data from the plurality of second data in the storage volume in the order of the time tags and writing it to the primary volume.
In the method, redirecting the write request to the storage volume of the storage device further includes: and applying for a backup space in the storage volume, wherein the backup space comprises at least one meta-space with a first fixed granularity, and the capacity of the backup space is the sum of the capacities of all the meta-spaces and is larger than the size of the second data.
The method further comprises the following steps: and when the IO throughput of the primary volume is smaller than the flow threshold, searching third data associated with the second data from the primary volume, calling the third data into a backup space where the second data is located in the storage volume for completion, and forming a complete first effective data block in the backup space.
The method further comprises the following steps: and when the IO throughput of the primary volume is smaller than the flow threshold, searching fourth data associated with the second data from the storage volume, calling the fourth data into a backup space of the storage volume where the second data is located for completion, and forming a complete second effective data block in the backup space.
The method further comprises the following steps: when the IO throughput of the primary volume is smaller than the flow threshold value, searching fifth data associated with the second data from the primary volume, and searching sixth data associated with the second data from the storage volume; and calling the fifth data and the sixth data into a backup space where the second data is located in the storage volume for completion, and forming a complete third effective data block in the backup space.
The method further comprises the following steps: detecting a reading request to the storage device, redirecting the reading request to the storage volume, and reading second data in the storage volume; when reading the second data from the storage volume fails, redirecting the reading request to a primary volume of a storage device; the complete valid data block corresponding to the second data is read from the primary volume.
A data backup device is connected with a storage device, and comprises: a detection unit, configured to detect a write request to the storage device, where the write request includes first data to be written to a primary volume; a tag unit, configured to determine that the storage device is in a working state and attach a time tag to the first data to form second data; a redirecting unit, configured to redirect the write request to a storage volume of the storage device and write the second data into the storage volume; and an interaction unit, configured to select at least one second data from the plurality of second data in the storage volume in the order of the time tags and write the selected second data into the primary volume when the remaining space of the storage volume is smaller than the set space threshold.
The device further comprises: a storage volume management unit, configured to apply for a backup space in the storage volume, where the backup space comprises at least one meta-space with a first fixed granularity, and the capacity of the backup space is the sum of the capacities of all the meta-spaces and is larger than the size of the second data.
The device further comprises: an effective data complementing unit, configured to, when the IO throughput of the primary volume is smaller than a flow threshold, search the primary volume for third data associated with the second data, call the third data into the backup space where the second data is located in the storage volume for completion, and form a complete first effective data block in the backup space; or, when the IO throughput of the primary volume is smaller than the flow threshold, search the storage volume for fourth data associated with the second data, call the fourth data into the backup space where the second data is located in the storage volume for completion, and form a complete second effective data block in the backup space; or, when the IO throughput of the primary volume is smaller than the flow threshold, search the primary volume for fifth data associated with the second data and search the storage volume for sixth data associated with the second data, call the fifth data and the sixth data into the backup space where the second data is located in the storage volume for completion, and form a complete third effective data block in the backup space.
The device further comprises: the reading unit is used for detecting a reading request of the storage device, redirecting the reading request to the storage volume and reading a second data in the storage volume; when reading the second data from the storage volume fails, redirecting the reading request to a primary volume of a storage device; the complete valid data block corresponding to the second data is read from the primary volume.
The technical scheme of the invention has the following beneficial effects: when the storage device is in a working state and a write request to the storage device is detected, a time tag is attached to the first data contained in the write request to form second data, and the second data is written into the storage volume. No write operation needs to be executed on the primary volume of the storage device, so the write performance is not reduced. Moreover, whenever the remaining space of the storage volume is smaller than the set space threshold, at least one second data is written back to the primary volume, so the storage device can always redirect writes and the write performance remains unaffected.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides backup based on a given time point, and aims to solve the problem that continuous data protection (CDP) in copy-on-write (COW) mode puts pressure on the write performance of the storage device.
An embodiment of the present invention provides a method for data backup, as shown in fig. 1, which is applied to a storage device, where the storage device includes a primary volume and a storage volume, and the method includes:
detecting a write request to a storage device, wherein the write request comprises first data written to a primary volume;
determining that the storage device is in a working state, and attaching a time tag to the first data to form second data;
redirecting the write request to the storage volume of the storage device, and writing the second data into the storage volume;
when the remaining space of the storage volume is smaller than a set space threshold, selecting at least one second data from the plurality of second data in the storage volume in the order of the time tags and writing it to the primary volume.
By applying the provided technology, when the storage device is in a working state and a write request to the storage device is detected, a time tag is attached to the first data contained in the write request to form second data, and the second data is written into the storage volume. No write operation needs to be executed on the primary volume of the storage device, so the write performance is not reduced. Moreover, whenever the remaining space of the storage volume is smaller than the set space threshold, at least one second data is written back to the primary volume, so the storage device can always redirect writes and the write performance remains unaffected.
The technology provided by the embodiment of the invention has the storage device use redirection when writing. The storage volume holds a plurality of second data, each carrying a time tag, so the second data can be ordered by time. The second data in the storage volume is also called historical version data. When the remaining space of the storage volume is smaller than the set space threshold, the oldest historical version data, namely the one or more second data with the earliest time tags, is written into the primary volume, so that the storage volume has more space to accept subsequent write requests.
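The write path described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the embodiment: the names `Version` and `StorageDevice`, the in-memory dictionaries standing in for the volumes, the counter standing in for a real time tag, and the byte-based capacity accounting.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Version:
    tag: int        # time tag; a counter stands in for a real timestamp
    location: int   # target location on the primary volume
    data: bytes     # payload of the second data


@dataclass
class StorageDevice:
    capacity: int          # assumed size of the storage volume, in bytes
    space_threshold: int   # write back when free space drops below this
    primary: dict = field(default_factory=dict)   # location -> bytes
    storage: list = field(default_factory=list)   # Version objects
    clock: itertools.count = field(default_factory=itertools.count)

    def free_space(self) -> int:
        return self.capacity - sum(len(v.data) for v in self.storage)

    def write(self, location: int, first_data: bytes) -> None:
        # Attach a time tag to the first data to form the second data,
        # then redirect the write to the storage volume.
        self.storage.append(Version(next(self.clock), location, first_data))
        # While space is short, move the oldest version to the primary volume.
        while self.storage and self.free_space() < self.space_threshold:
            oldest = min(self.storage, key=lambda v: v.tag)
            self.storage.remove(oldest)
            self.primary[oldest.location] = oldest.data
```

With a hypothetical capacity of 10 bytes and a threshold of 4, two 4-byte writes leave 2 bytes free, so the first (oldest) version is moved to the primary volume and the second stays in the storage volume.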
As shown in fig. 1, the write request is handled by an IO processing process/thread, and the IO processing process/thread receives the write request, where the write request includes first data written to the primary volume;
in order not to affect the writing performance, the writing request is redirected to a storage volume of the storage device, and second data is written into the storage volume, wherein the second data is formed by loading the time tag on the first data.
Because the embodiment of the invention provides backup based on a given time point, when a write request for a specified storage location is received, it is necessary to determine whether historical version data exists for that location in the storage volume. If so, the data is stamped with a time tag and the request is redirected to the storage volume; otherwise, the data is written to the primary volume.
In an application scenario, at time T0, first data needs to be written to the primary volume; the write changes the specified storage location A, so second data corresponding to the first data is formed in the storage volume. If a write request for the specified storage location A is received again after time T0, it is first determined whether second data for that location exists in the storage volume. If so, the second data stored at time T0 is overwritten with the new, time-tagged second data; if the data in the specified storage location A has not been changed since initialization, the write request only needs to write directly to the primary volume.
If a write request to location A arrives at time T1, a new time tag must be stamped. Write requests to location A that arrive between the two times (T0, T1) only overwrite the second data saved at time T0.
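The T0/T1 behaviour can be sketched as follows. This is only an illustration under assumptions: the class `PointInTimeStore`, the integer index standing in for the time points T0, T1, and so on, and the nested dictionary layout are all hypothetical, and the sketch leaves out the case where an unmodified location is written directly to the primary volume.

```python
class PointInTimeStore:
    """Keeps one version of each location per backup time point."""

    def __init__(self):
        self.current_point = 0   # index of the latest time point (0 stands for T0)
        self.versions = {}       # location -> {time point index -> data}

    def advance_time_point(self):
        # A new backup time point (for example T1) has been reached.
        self.current_point += 1

    def write(self, location, data):
        # Writes within the same interval overwrite the version saved at the
        # interval's time point; a write after a new time point is stored
        # under a new tag, so the earlier version is preserved.
        self.versions.setdefault(location, {})[self.current_point] = data
```

For example, two writes to location A before T1 leave a single version tagged T0 holding the later data, while a write after `advance_time_point()` adds a second version tagged T1.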
In a preferred embodiment, redirecting write requests to a storage volume of a storage device further comprises:
and applying for a backup space in the storage volume, wherein the backup space comprises at least one meta-space with a first fixed granularity, and the capacity of the backup space is larger than the size of the second data.
The second data (historical version data) on the storage volume of the storage device is stored at a fixed granularity, which avoids disordered placement, reduces the complexity of the metadata pointers, and improves query efficiency.
The ROW mode provides a technique for improving read performance on the basis of solving the problem of write performance.
Without loss of generality, suppose the ROW mode adopts a 64KB granularity. If a write request includes 4KB of first data, the corresponding second data is usually also 4KB; a 64KB storage space is pre-allocated, the 4KB of second data is written into it, and the corresponding valid bits are marked, so at this point the 64KB storage space holds only 4KB of valid data. Whenever a write request is smaller than 64KB, the storage volume still pre-allocates 64KB of storage space; likewise, when a write IO is larger than 64KB, the request is divided into N smaller write requests of 64KB each.
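The fixed-granularity allocation can be sketched as below. The 64KB granularity is taken from the text; the 4KB bitmap resolution, the assumption that writes are 4KB-aligned, the handling of writes that straddle a granule boundary, and the function names `split_write` and `valid_bits` are illustrative assumptions only.

```python
GRANULE = 64 * 1024   # fixed granularity from the text
SECTOR = 4 * 1024     # assumed bitmap resolution: one valid bit per 4KB


def split_write(offset: int, length: int):
    """Split a write into pieces that each stay inside one 64KB granule.

    Yields (granule_index, offset_in_granule, piece_length).
    """
    end = offset + length
    while offset < end:
        granule = offset // GRANULE
        in_granule = offset % GRANULE
        piece = min(GRANULE - in_granule, end - offset)
        yield granule, in_granule, piece
        offset += piece


def valid_bits(in_granule: int, piece: int) -> int:
    """Return a bitmap with one bit set for each 4KB sector the piece covers.

    Assumes 4KB-aligned writes, so every covered sector is fully valid.
    """
    first = in_granule // SECTOR
    last = (in_granule + piece - 1) // SECTOR
    return ((1 << (last - first + 1)) - 1) << first
```

Under these assumptions a 4KB write at offset 0 occupies one granule with only bit 0 set, and a 128KB write is split into two full granules whose 16 valid bits are all set.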
Suppose the storage device later receives a 32KB read request for this storage space in the storage volume. Which part of the 64KB space holds second data is uncertain: the 32KB to be read may lie completely within the valid data area, only partially within it, or completely outside it, and the read size may equally be 4KB, 8KB, and so on. If the 32KB to be read does not lie completely within the valid data area, the complexity of the algorithm and the request response time increase. It is therefore necessary to first complete the missing part of the data from the storage area of the second data or from the primary volume, and then perform the read operation.
In a preferred embodiment, further comprising:
when the IO throughput of the primary volume is smaller than the flow threshold value, searching third data which is associated with the second data from the primary volume,
and calling the third data into a backup space where the second data is located in the storage volume for completion, and forming a complete first effective data block in the backup space.
Wherein the third data in the primary volume is earlier in time than the second data.
In a preferred embodiment, further comprising:
when the IO throughput of the primary volume is smaller than the flow threshold value, finding fourth data associated with the second data from the storage volume,
and calling the fourth data into a backup space where the second data is located in the storage volume for completion, and forming a complete second effective data block in the backup space.
Wherein the fourth data in the save volume is earlier in time than the second data.
In a preferred embodiment, further comprising:
when the IO throughput of the primary volume is smaller than the flow threshold value, searching fifth data associated with the second data from the primary volume, and searching sixth data associated with the second data from the storage volume;
and calling the fifth data and the sixth data into a backup space where the second data is located in the storage volume for completion, and forming a complete third effective data block in the backup space.
Both the fifth data in the primary volume and the sixth data in the storage volume are earlier in time than the second data.
In the process of writing data, whether the first data, the second data, the third data, the fourth data, or the fifth data and the sixth data are written, a bitmap is set at the start position of the backup space where the data are stored, a valid bit pointing to the valid data is marked in the bitmap, and an end character is set at the end position.
In an application scenario, to solve the influence on the read performance, the read-write request to the storage device is monitored, and when the IO throughput of the primary volume is smaller than the traffic threshold, a completion thread is started to complete all the non-valid data areas without influencing the primary volume performance, as shown in fig. 2, the method includes:
step 201, starting a completion process corresponding to the completion operation, as shown in fig. 3, where the completion process completes each second data (history version data) in the storage volume.
Step 202, inquiring a first valid data block of second data in the storage volume;
Step 203, querying whether an invalid data block exists after the first valid data block; if so, proceeding to step 204, otherwise proceeding to step 205.
Step 204, the kernel copy process executes the specific completion operation, including:
finding third data associated with the second data from the primary volume,
or, finding fourth data associated with the second data from the storage volume,
or, finding fifth data associated with the second data from the primary volume and finding sixth data associated with the second data from the storage volume;
and calling the found data into the backup space where the second data is located in the storage volume for completion, forming a complete valid data block in the backup space.
A bitmap is set at the start position of the backup space, in which the valid bits pointing to valid data blocks are marked, and an end symbol is set at the end position of the backup space.
Returning to step 202.
Step 205, the completion process ends.
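Steps 201 to 205 can be approximated by the following sketch, which fills every invalid sector of a backup space from older data once the primary volume is idle. It is a simplified illustration, not the embodiment's implementation: the per-sector list of valid flags stands in for the bitmap, older versions in the storage volume are preferred over the primary volume, and the names `complete_block` and `complete_if_idle` and their parameters are hypothetical.

```python
def complete_block(block, valid, older_versions, primary_block, sector_size=4096):
    """Fill the invalid sectors of one backup space in place.

    block          -- bytearray holding the backup space of the second data
    valid          -- list of booleans, one per sector (stands in for the bitmap)
    older_versions -- list of (data, valid_flags) pairs from the storage volume,
                      newest first (the fourth / sixth data)
    primary_block  -- bytes of the same region on the primary volume
                      (the third / fifth data)
    """
    for i, is_valid in enumerate(valid):
        if is_valid:
            continue
        lo, hi = i * sector_size, (i + 1) * sector_size
        source = primary_block
        for data, data_valid in older_versions:
            if data_valid[i]:
                source = data   # an older version in the storage volume has it
                break
        block[lo:hi] = source[lo:hi]
        valid[i] = True


def complete_if_idle(io_throughput, flow_threshold, *args, **kwargs) -> bool:
    """Run the completion only when the primary volume's IO is below the threshold."""
    if io_throughput >= flow_threshold:
        return False
    complete_block(*args, **kwargs)
    return True
```

Gating on throughput keeps the completion work away from periods of heavy IO, which matches the stated goal of completing the non-valid areas without affecting primary volume performance.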
In a preferred embodiment, further comprising:
detecting a reading request to the storage device, redirecting the reading request to the storage volume, and reading second data in the storage volume;
when reading the second data from the storage volume fails, redirecting the reading request to a primary volume of a storage device;
the complete valid data block corresponding to the second data is read from the primary volume.
When a read request to the storage device is detected, where the read request requests to read a first data, the storage volume is searched for second data of the first data at the corresponding location. If such second data exists, the read request is redirected to the storage volume; if not, the read request is redirected to the primary volume of the storage device, because the absence of second data means the first data has not been modified since initialization.
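The read path can be sketched as a lookup with a fallback. The dictionaries standing in for the two volumes, the choice to return the version with the latest time tag, and the function name `read` are assumptions made for illustration.

```python
def read(location, storage_volume, primary_volume):
    """Read from the storage volume first, falling back to the primary volume.

    storage_volume -- dict: location -> {time tag -> data} (historical versions)
    primary_volume -- dict: location -> data
    """
    versions = storage_volume.get(location)
    if versions:
        # Second data exists for this location: serve the latest version.
        return versions[max(versions)]
    # No second data: the location has not been modified since initialization.
    return primary_volume[location]
```

A location that was never modified has no entry in the storage volume, so the call falls through to the primary volume.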
In an application scenario, the network backup storage device includes a primary volume and a CDP storage volume. In actual deployment, as shown in fig. 4, a host of a service system is deployed in mirror mode with the storage device and synchronizes the data in its local volume to the network backup storage device by mirroring. The local volume and the primary volume are in a mirror relationship: after deployment initialization is completed, the primary volume stores the same data as the local volume, the data version in the primary volume is the oldest, and the CDP storage volume stores all historical version data modified since initialization.
In order to reduce the influence on the read performance of the storage device and the service system, read requests, each of which requests to read one piece of data, are optimized at the logical level. Normally, a read request only reads the local volume, which hardly affects the read performance of the service system. When the local volume fails or data is being recovered, the storage volume in the network backup storage device needs to be read. Because the completion process has already completed the historical version data in the storage volume while the system was idle, reading the storage volume hardly affects the read performance of the service system, so the continuous data protection function is realized while affecting the read and write performance of the original service system as little as possible.
The at least one second data that is selected from the plurality of second data in the storage volume in time-tag order and written into the primary volume may be different second data corresponding to different first data, or different versions of second data corresponding to the same first data. Since the embodiment of the present invention provides backup based on a given time point, for the same first data a, if second data a0 was backed up for it at time T0, backing it up again at time T1 forms second data a1; similarly, backups performed at times T2 and T3 form the further versions a2 and a3. If and only if the remaining space of the storage volume is insufficient to store historical version data, the oldest historical version data is written back to the primary volume and its storage space is reclaimed.
The embodiment of the invention provides a data backup device, which is connected with a storage device, and comprises:
a detection unit, configured to detect a write request to a storage device, where the write request includes first data written to a primary volume;
the tag unit is used for determining that the storage device is in a working state and attaching a time tag to the first data to form second data;
the redirecting unit is used for redirecting the write request to a storage volume of the storage device and writing the second data into the storage volume;
and the interaction unit is used for selecting at least one second data from the plurality of second data in the storage volume in the order of the time tags and writing the selected second data into the primary volume when the remaining space of the storage volume is smaller than the set space threshold.
In a preferred embodiment, the apparatus further comprises:
a storage volume management unit, configured to apply for a backup space in the storage volume, where the backup space comprises at least one meta-space with a first fixed granularity, and the capacity of the backup space is the sum of the capacities of all the meta-spaces and is larger than the size of the second data.
In a preferred embodiment, the apparatus further comprises:
the effective data complementing unit is used for searching third data associated with the second data from the primary volume when the IO throughput of the primary volume is smaller than a flow threshold, calling the third data into the backup space where the second data is located in the storage volume for completion, and forming a complete first effective data block in the backup space;
or,
when the IO throughput of the primary volume is smaller than the flow threshold, searching fourth data associated with the second data from the storage volume, transferring the fourth data into a backup space of the storage volume where the second data is located for completion, and forming a complete second effective data block in the backup space;
or when the IO throughput of the primary volume is smaller than the flow threshold, searching fifth data associated with the second data from the primary volume, and searching sixth data associated with the second data from the storage volume; and calling the fifth data and the sixth data into a backup space where the second data is located in the storage volume for completion, and forming a complete third effective data block in the backup space.
In a preferred embodiment, the apparatus further comprises:
the reading unit is used for detecting a reading request of the storage device, redirecting the reading request to the storage volume and reading a second data in the storage volume;
when reading the second data from the storage volume fails, redirecting the reading request to a primary volume of a storage device;
the complete valid data block corresponding to the second data is read from the primary volume.
The advantages of adopting this scheme are as follows: when the storage device is in a working state and a write request to the storage device is detected, a time tag is attached to the first data contained in the write request to form second data, and the second data is written into the storage volume. No write operation needs to be executed on the primary volume of the storage device, so the write performance is not reduced. Moreover, whenever the remaining space of the storage volume is smaller than the set space threshold, at least one second data is written back to the primary volume, so the storage device can always redirect writes and the write performance remains unaffected.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.