CN109408290B

CN109408290B - Fragmented file recovery method and device based on InnoDB, and storage medium

Info

Publication number: CN109408290B
Application number: CN201811225169.1A
Authority: CN
Inventors: 梁德荣; 田庆宜; 黄建邦; 沈长达; 吴少华; 张学君
Original assignee: Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Meiya Pico Information Co Ltd
Priority date: 2018-10-19
Filing date: 2018-10-19
Publication date: 2021-02-26
Anticipated expiration: 2038-10-19
Also published as: CN109408290A

Abstract

The invention provides an InnoDB-based fragmented file recovery method, device, and storage medium. The method comprises the following steps: starting from an initial position Offset = 0, reading n bytes of data as one candidate data page of an InnoDB data file; reading the first 4 bytes of the data page as a check value CheckSum1, calculating the check value CheckSum2 of the data page, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset = Offset + m and reading again; if so, performing recovery by reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, and sorting the data pages within their file in ascending order of PageNo. Because the invention relies on the page structure of InnoDB data files, it can recover data from a whole disk or a disk image without depending on file system file records; if a file is partially damaged, the undamaged part can be extracted; and if the fragments belong to several data files, they can be traced back to their source files and reassembled in order.

Description

Fragmented file recovery method and device based on InnoDB, and storage medium

Technical Field

The invention relates to the technical field of data processing, and in particular to an InnoDB-based fragmented file recovery method, device, and storage medium.

Background

InnoDB is widely used as the default storage engine of MySQL. In the field of database recovery, the electronic data forensics industry has a particularly urgent need to recover MySQL databases. When MySQL database files are lost through manual deletion, virus damage, disk damage, and the like, how to recover the file data accurately and comprehensively is an important and pressing problem.

At present, many tools for recovering deleted files exist on the market. They are based either on file system file records or on file signatures. Recovery based on file system file records has the following drawbacks: 1. once a file record has been overwritten by a new file record, the file cannot be recovered; 2. a quick format of the disk empties the file records, so the files cannot be recovered; 3. if bad sectors fall within the file records, the records cannot be read and the files cannot be recovered. Recovery based on file signatures has the following drawbacks: 1. if the file data is not contiguous on disk, the file cannot be recovered; 2. if the file header and signature have been overwritten, the file cannot be recovered.

Disclosure of Invention

The present invention provides the following technical solutions to overcome the above-mentioned drawbacks in the prior art.

An InnoDB-based fragmented file recovery method comprises the following steps:

a reading step: starting from an initial position Offset = 0, reading n bytes of data as one data page of an InnoDB data file;

a matching step: reading the first 4 bytes of the data page as a check value CheckSum1, calculating the check value CheckSum2 of the data page using the data page folding checksum algorithm, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset = Offset + m and re-executing the reading step; if so, executing the recovery step;

a recovery step: reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within their file in ascending order of PageNo, then setting Offset = Offset + n and re-executing the reading step; where m is the data offset unit and n is the size of one data page.

Further, the fragmented file is an ibdata and/or ibd fragment file.

Further, the check value CheckSum2 of the data page is calculated with the folding checksum algorithm as follows: a 22-byte segment starting at byte 4 of the data page (counting from 0) is passed through the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte 38 of the data page is passed through the folding XOR calculation to obtain a check value sum2; the check value of the data page is then CheckSum2 = sum1 + sum2.

Further, an XOR operation on two integers is defined, with the operator denoted **. For two 4-byte integers a and b, the operation is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added.

The folding XOR calculation operates as follows: the fold value fold is initialized to 0; the data is traversed in byte order, the result of the traversal being a data set N = {N1, N2, N3, ..., Nm}; each Ni is combined in turn with fold using the integer XOR operation, and the return value is written back to fold, that is, fold = fold ** Ni, where 1 <= i <= m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.

Still further, the data set N = {N1, N2, N3, ..., Nm} is formed by traversing the data in byte order as follows: starting from the beginning of the data, every four bytes form one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form the integer Nm.

The invention also provides an InnoDB-based fragmented file recovery device, which comprises:

a reading unit, configured to read, starting from an initial position Offset = 0, n bytes of data as one data page of an InnoDB data file;

a matching unit, configured to read the first 4 bytes of the data page as a check value CheckSum1, calculate the check value CheckSum2 of the data page using the data page folding checksum algorithm, and judge whether CheckSum1 equals CheckSum2; if not, set Offset = Offset + m and re-execute the operation of the reading unit; if so, execute the operation of the recovery unit;

a recovery unit, configured to read the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merge the data pages according to FileId, sort the data pages within their file in ascending order of PageNo, then set Offset = Offset + n and re-execute the operation of the reading unit; where m is the data offset unit and n is the size of one data page.

Further, the fragmented file is an ibdata and/or ibd fragment file.

Further, the device uses the same integer XOR operation as the method:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b).

The invention also proposes a computer-readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.

The technical effects of the invention are as follows. Because the invention is based on the page structure of InnoDB data files, data can be recovered page by page, and data files can be recovered from storage media such as whole disks and disk images without depending on file system file records. If a file is partially damaged (for example, encrypted by a virus or partially overwritten), the undamaged part of the file can be extracted. If the storage medium contains fragments of several data files, the fragments can be traced to their source files and regrouped according to the file identification FileId; even if the fragments are scattered discontinuously and out of order on the disk, they can be sorted and reassembled according to the page number PageNo.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an InnoDB data file structure according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an InnoDB data page according to an embodiment of the present invention.

Fig. 3 is a flowchart of an InnoDB-based fragmented file recovery method according to an embodiment of the present invention.

Fig. 4 is a structural diagram of an InnoDB-based fragmented file recovery device according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

InnoDB is designed for maximum performance when handling large volumes of data. It is fully integrated with the MySQL server and maintains its own buffer pool in main memory for caching data and indexes. InnoDB stores its tables and indexes in a tablespace, which may consist of several files (or raw disk partitions). Technically, InnoDB is a complete database system that runs behind MySQL.

Referring to fig. 1, the InnoDB data file structure is composed of a series of data pages, and the data pages are ordered from small to large starting with a page number of 0.

Referring to Fig. 2, a data page of the InnoDB storage engine is 16384 bytes in size, of which bytes 0-3 store the check value, bytes 4-7 store the page number PageNo, and bytes 34-37 store the file ID, that is, the file identification FileId.
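The fixed offsets above are enough to pull the three fields out of a candidate page. The following Python sketch is illustrative only: the function name `read_page_header` is hypothetical, and it assumes that each field is an unsigned 4-byte integer stored most-significant byte first, which the text does not state explicitly.

```python
import struct

PAGE_SIZE = 16384  # n in the text: the usual size of one data page


def read_page_header(page: bytes) -> dict:
    """Extract the three header fields named in the text from one page."""
    if len(page) < 38:
        raise ValueError("page too short to contain the header fields")
    # ">I" means big-endian unsigned 32-bit; the byte order is an assumption.
    (checksum,) = struct.unpack_from(">I", page, 0)   # bytes 0-3
    (page_no,) = struct.unpack_from(">I", page, 4)    # bytes 4-7
    (file_id,) = struct.unpack_from(">I", page, 34)   # bytes 34-37
    return {"CheckSum1": checksum, "PageNo": page_no, "FileId": file_id}
```

Bytes 8-33 are skipped by this sketch because the text does not describe them.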

From the above description, the file structure and data page structure of the InnoDB storage engine are the basis on which data recovery is realized.

The principle of data recovery in the invention is as follows: 1) the check value of a data block is calculated with the folding checksum algorithm, and if the calculated value equals the check value stored at the head of the block, the data block is a page of an InnoDB data file; 2) within a database instance, each data file has a unique file ID (file identification), and this file ID is recorded in every data page of the file, so data pages can be grouped into files by this property; 3) each data page records its own page number, and data pages are arranged within a file in ascending page number, so the pages of a file can be put in order by this property.

Fig. 3 shows an InnoDB-based fragmented file recovery method of the present invention, which includes:

A reading step S101: starting from an initial position Offset = 0, n bytes of data are read as one data page of an InnoDB data file.

A matching step S102: the first 4 bytes of the data page are read as a check value CheckSum1, the check value CheckSum2 of the data page is calculated using the data page folding checksum algorithm, and CheckSum1 is compared with CheckSum2; if they are not equal, Offset is set to Offset + m and the reading step S101 is executed again; if they are equal, the recovery step S103 is executed. Checking whether the calculated check value of the data page equals the check value read from it is the key to recovering the file and is an important inventive point of the invention.

A recovery step S103: the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs are read, the data pages are merged according to FileId and sorted within their file in ascending order of PageNo, and then Offset is set to Offset + n and the reading step S101 is executed again. Here m is the data offset unit and n is the data page size: depending on the reading strategy, m may be 1 byte, the sector size, the cluster size, and so on, and n is generally 16384 bytes, although other values are possible. The recovery operation continues until all the data has been read.
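The control flow of steps S101 to S103 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the patented implementation: the image is held in memory as a `bytes` object, the checksum comparison of step S102 is passed in as a hypothetical predicate `is_valid_page`, and the default m = 512 is just one of the offset units the text allows.

```python
from typing import Callable, Iterator, Tuple


def scan_pages(image: bytes,
               is_valid_page: Callable[[bytes], bool],
               n: int = 16384,
               m: int = 512) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page) for every n-byte block that passes validation."""
    offset = 0
    while offset + n <= len(image):
        page = image[offset:offset + n]      # S101: read n bytes at Offset
        if is_valid_page(page):              # S102: CheckSum1 == CheckSum2 ?
            yield offset, page               # S103: hand the page to recovery
            offset += n                      # skip past the accepted page
        else:
            offset += m                      # slide forward by one offset unit
```

With m = 1 every byte position is tested, which is the most thorough and the slowest choice; a sector-sized or cluster-sized m is faster but in general only finds pages that start on such a boundary.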

In the matching step S102, reading the first 4 bytes of the data page as the check value CheckSum1 relies on the known structure of the data page; see Fig. 2 and the corresponding description above.

In InnoDB, data is stored in ibdata or ibd files, so the fragmented files recovered by the invention are ibdata or ibd fragment files, and both kinds can be recovered together.

Another important inventive point of the present invention, and an important step in implementing it, is the calculation of the check value of the data page. Specifically, the check value CheckSum2 of the data page is calculated with the folding checksum algorithm as follows: a 22-byte segment starting at byte 4 of the data page (counting from 0) is passed through the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte 38 of the data page is passed through the folding XOR calculation to obtain a check value sum2; the check value of the data page is then CheckSum2 = sum1 + sum2.
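The two byte ranges can be made concrete with a short Python sketch. It is an illustration under assumptions: the folding XOR calculation is passed in as a parameter `fold` (a hypothetical callable that takes bytes and returns an integer), and the sum is truncated to 4 bytes so that it can be compared with the 4-byte CheckSum1, a truncation the text implies but does not spell out.

```python
from typing import Callable


def page_checksum(page: bytes, fold: Callable[[bytes], int]) -> int:
    """Combine the folds of the two byte ranges named in the text."""
    n = len(page)
    sum1 = fold(page[4:26])         # 22 bytes starting at byte 4
    sum2 = fold(page[38:n - 8])     # n - 46 bytes starting at byte 38
    # Assumption: keep only the low 4 bytes, to match the width of CheckSum1.
    return (sum1 + sum2) & 0xFFFFFFFF
```

With these ranges, bytes 0-3 (the stored check value), bytes 26-37, and the last 8 bytes of the page do not contribute to CheckSum2.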

To calculate the check value of the data page, the invention further defines an XOR operation on two integers, which is also an important inventive point. Specifically, the operator is denoted **, and for two 4-byte integers a and b the operation is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);
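As a hedged illustration, the operation can be written in Python as below. Note that ** in the text is the operator name chosen by the description, not Python's exponentiation, so the sketch uses a hypothetical function name `fold_pair`. It also assumes that every intermediate result wraps around at 4 bytes, since a and b are described as 4-byte integers; the two constants are the values given in the description.

```python
RANDOM_MASK = 1653893711
RANDOM_MASK2 = 1463735687
U32 = 0xFFFFFFFF  # assumption: arithmetic wraps at 4 bytes


def fold_pair(a: int, b: int) -> int:
    """The a ** b operation from the text, on 4-byte unsigned integers."""
    t = ((a ^ b ^ RANDOM_MASK) << 8) & U32   # XOR, then shift left by 8 bits
    t = (t + a) & U32                        # add a
    t = t ^ RANDOM_MASK2                     # XOR with the second mask
    return (t + b) & U32                     # add b
```

The operation is not symmetric in its arguments: `fold_pair(1, 2)` and `fold_pair(2, 1)` give different results, so the order in which values are folded matters.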

Based on this integer XOR operation, the invention provides a folding XOR calculation, which is one of the important inventive points and is key to implementing the invention. It operates as follows: the fold value fold is initialized to 0; the data is traversed in byte order, the result of the traversal being a data set N = {N1, N2, N3, ..., Nm}; each Ni is combined in turn with fold using the integer XOR operation, and the return value is written back to fold, that is, fold = fold ** Ni, where 1 <= i <= m, i and m are integers, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.

In one embodiment, the data set N = {N1, N2, N3, ..., Nm} is formed by traversing the data in byte order as follows: starting from the beginning of the data, every four bytes form one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form the integer Nm.
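The folding XOR calculation and the splitting into 4-byte integers can be sketched together in Python. The names `split_words` and `fold_xor` are hypothetical, and two points are assumptions because the text leaves them open: each group of bytes is interpreted most-significant byte first, and a short tail is converted to an integer as-is, without padding.

```python
RANDOM_MASK = 1653893711
RANDOM_MASK2 = 1463735687


def fold_pair(a: int, b: int) -> int:
    """The a ** b operation from the text, truncated to 4 bytes."""
    return (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b) & 0xFFFFFFFF


def split_words(data: bytes) -> list:
    """Cut data into 4-byte integers; a shorter tail becomes the last one."""
    # Assumption: big-endian interpretation of each group of bytes.
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


def fold_xor(data: bytes) -> int:
    """Fold all integers N1..Nm into one value, starting from fold = 0."""
    fold = 0
    for n_i in split_words(data):
        fold = fold_pair(fold, n_i)   # fold = fold ** Ni
    return fold
```

Both ranges used for the page check value end in a short tail: 22 bytes give five full integers plus 2 bytes, and for n = 16384 the 16338-byte range gives 4084 full integers plus 2 bytes.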

In one embodiment, the data pages are merged according to FileId as follows.

Let f denote all the fragment page information of a single ibdata/ibd file and PageCount the number of pages, with p_i = {PageCheckSum_i, PageNo_i, FileId_i, Offset_i}, where i is an integer from 0 to n, PageCheckSum_i is the check value of the data page, PageNo_i is the page number of the data page, FileId_i is the ID of the file to which the data page belongs, and Offset_i is the location of the data page on disk. That is, the recovered data pages are merged according to FileId_i and then sorted by PageNo_i to obtain the restored file.
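The grouping and ordering described here can be sketched in Python as follows. The record type `PageInfo` and the function `merge_pages` are hypothetical names that mirror the four fields of p_i; the sketch only groups and sorts the records and leaves open how duplicate page numbers within one file would be resolved, since the text does not address that case.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PageInfo(NamedTuple):
    page_checksum: int   # PageCheckSum_i
    page_no: int         # PageNo_i
    file_id: int         # FileId_i
    offset: int          # Offset_i: position of the page on disk


def merge_pages(pages: List[PageInfo]) -> Dict[int, List[PageInfo]]:
    """Group recovered pages by FileId, then sort each group by PageNo."""
    files = defaultdict(list)
    for p in pages:
        files[p.file_id].append(p)
    return {fid: sorted(group, key=lambda p: p.page_no)
            for fid, group in files.items()}
```

Each list in the result plays the role of f for one file: reading the pages at their recorded offsets in list order yields the pages of that file in ascending page number.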

Fig. 4 shows an InnoDB-based fragmented file recovery device of the present invention, which includes:

A reading unit 401, configured to read, starting from an initial position Offset = 0, n bytes of data as one data page of an InnoDB data file.

A matching unit 402, configured to read the first 4 bytes of the data page as a check value CheckSum1, calculate the check value CheckSum2 of the data page using the data page folding checksum algorithm, and judge whether CheckSum1 equals CheckSum2; if not, Offset is set to Offset + m and the operation of the reading unit 401 is executed again; if so, the operation of the recovery unit 403 is executed. Checking whether the calculated check value of the data page equals the check value read from it is the key to recovering the file and is an important inventive point of the invention.

A recovery unit 403, configured to read the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merge the data pages according to FileId, sort the data pages within their file in ascending order of PageNo, and then set Offset to Offset + n and re-execute the operation of the reading unit 401. Here m is the data offset unit and n is the data page size: depending on the reading strategy, m may be 1 byte, the sector size, the cluster size, and so on, and n is generally 16384 bytes, although other values are possible. The recovery operation continues until all the data has been read.

In the operation of the matching unit 402, reading the first 4 bytes of the data page as the check value CheckSum1 relies on the known structure of the data page; see Fig. 2 and the corresponding description above.

As in the method, the integer XOR operation used by the matching unit 402 is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

For merging in the recovery unit 403, let f denote all the fragment page information of a single ibdata/ibd file and PageCount the number of pages, with p_i = {PageCheckSum_i, PageNo_i, FileId_i, Offset_i}, where i is an integer from 0 to n, PageCheckSum_i is the check value of the data page, PageNo_i is the page number of the data page, FileId_i is the ID of the file to which the data page belongs, and Offset_i is the location of the data page on disk. That is, the recovered data pages are merged according to FileId_i and then sorted by PageNo_i to obtain the restored file.

The method of the invention was also verified as follows:

(1) Using the disk management tool of the Windows system, a VHD image of 3 GB was created, mounted, and formatted.

(2) A data file TEST.ibd of the InnoDB storage engine, 8.65 MB in size, was copied to the mounted partition (the copy was paused partway through and other data was written to the disk, so that the file is not contiguous on disk).

On this disk, signature-based recovery of the ibd file recovers only part of the data, and recovery based on file records cannot recover the file.

The technical effects of the invention are as follows: based on the page structure of InnoDB data files, data can be recovered page by page; data files can be recovered from storage media such as whole disks and disk images without depending on file system file records; if a file is partially damaged (for example, encrypted by a virus or partially overwritten), the undamaged part of the file can be extracted; if the storage medium contains fragments of several data files, the fragments can be traced to their source files and regrouped according to the file identification FileId; and even if the fragments are scattered discontinuously and out of order on the disk, they can be sorted and reassembled according to the page number PageNo.

The method is particularly suitable for mobile terminal devices, which may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a PDA, and the like; of course, it may also be applied to other portable electronic devices with data processing capability.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the invention, and all such modifications are intended to be covered by the appended claims.

Claims

1. An InnoDB-based fragmented file recovery method, characterized by comprising the following steps:

a reading step: starting from an initial position Offset = 0, reading n bytes of data as one data page of an InnoDB data file, the fragmented file being an ibdata and/or ibd fragment file;

a matching step: reading the first 4 bytes of the data page as a check value CheckSum1, calculating the check value CheckSum2 of the data page using the data page folding checksum algorithm, and judging whether CheckSum1 equals CheckSum2; if not, setting Offset = Offset + m and re-executing the reading step; if so, executing the recovery step; wherein the check value CheckSum2 of the data page is calculated with the folding checksum algorithm as follows: a 22-byte segment starting at byte 4 of the data page is passed through the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte 38 of the data page is passed through the folding XOR calculation to obtain a check value sum2, so that the check value of the data page is CheckSum2 = sum1 + sum2;

a recovery step: reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within their file in ascending order of PageNo, then setting Offset = Offset + n and re-executing the reading step;

where m is the data offset unit and n is the size of one data page.

2. The method of claim 1, wherein

an XOR operation on two integers is defined, with the operator denoted **: for two 4-byte integers a and b, the operation is:

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

the folding XOR calculation operates as follows: the fold value fold is initialized to 0; the data is traversed in byte order, the result of the traversal being a data set N = {N1, N2, N3, ..., Nm}; each Ni is combined in turn with fold using the integer XOR operation, and the return value is written back to fold, that is, fold = fold ** Ni, where 1 <= i <= m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.

3. The method of claim 2, wherein the data set N = {N1, N2, N3, ..., Nm} is formed by traversing the data in byte order as follows: starting from the beginning of the data, every four bytes form one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form the integer Nm.

4. An InnoDB-based fragmented file recovery device, comprising:

a reading unit, configured to read, starting from an initial position Offset = 0, n bytes of data as one data page of an InnoDB data file, the fragmented file being an ibdata and/or ibd fragment file;

a matching unit, configured to read the first 4 bytes of the data page as a check value CheckSum1, calculate the check value CheckSum2 of the data page using the data page folding checksum algorithm, and judge whether CheckSum1 equals CheckSum2; if not, set Offset = Offset + m and re-execute the operation of the reading unit; if so, execute the operation of the recovery unit; wherein the check value CheckSum2 of the data page is calculated with the folding checksum algorithm as follows: a 22-byte segment starting at byte 4 of the data page is passed through the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte 38 of the data page is passed through the folding XOR calculation to obtain a check value sum2, so that the check value of the data page is CheckSum2 = sum1 + sum2;

a recovery unit, configured to read the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merge the data pages according to FileId, sort the data pages within their file in ascending order of PageNo, then set Offset = Offset + n and re-execute the operation of the reading unit;

where m is the data offset unit and n is the size of one data page.

5. The apparatus of claim 4,

a ** b = (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b);

6. The apparatus of claim 5, wherein the data set N = {N1, N2, N3, ..., Nm} is formed by traversing the data in byte order as follows: starting from the beginning of the data, every four bytes form one 4-byte integer; if fewer than 4 bytes remain at the end, the remaining bytes form the integer Nm.

7. A computer-readable storage medium, characterized in that the storage medium has stored thereon computer program code which, when executed by a computer, performs the method of any of claims 1-3.