CN109408290B - Fragmented file recovery method and device based on InnoDB and storage medium - Google Patents

Fragmented file recovery method and device based on InnoDB and storage medium

Info

Publication number
CN109408290B
Authority
CN
China
Prior art keywords
data
file
page
data page
bytes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811225169.1A
Other languages
Chinese (zh)
Other versions
CN109408290A (en)
Inventor
梁德荣
田庆宜
黄建邦
沈长达
吴少华
张学君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN201811225169.1A
Publication of CN109408290A
Application granted
Publication of CN109408290B
Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 - Error detection or correction of the data by redundancy in operation
    • G06F11/1402 - Saving, restoring, recovering or retrying
    • G06F11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F11/1448 - Management of the data involved in backup or backup restore

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides an InnoDB-based fragmented file recovery method, device, and storage medium. The method comprises the following steps: reading n bytes of data from an initial position Offset as one data page of an InnoDB data file; reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page, and judging whether CheckSum1 is equal to CheckSum2; if not, setting Offset = Offset + m and reading the data again; if so, performing recovery by reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, and sorting the data pages within the file to which they belong in ascending order of PageNo. Based on the page structure of InnoDB data files, the invention can recover data from a whole disk or a disk image without depending on file system file records; if a file is partially damaged, the undamaged part of the file can be extracted; and if the fragments belong to multiple data files, they can be traced back to their files, sorted, and reassembled.

Description

Fragmented file recovery method and device based on InnoDB and storage medium
Technical Field
The invention relates to the technical field of data processing, and in particular to an InnoDB-based fragmented file recovery method, device, and storage medium.
Background
InnoDB is widely used as the default storage engine of MySQL. In the electronic data forensics industry, recovery of MySQL databases is a particularly pressing need. When MySQL database files are lost through manual deletion, virus damage, disk damage, and the like, how to recover the file data accurately and comprehensively is an important and urgent problem.
At present, many software products for recovering deleted files exist on the market. They rely either on file system file records or on file signatures. Recovery based on file system file records has the following defects: 1. a file record that has been overwritten by a new file record cannot be recovered; 2. a quick format of the disk empties the file records, so the files cannot be recovered; 3. bad tracks in the area holding the file records make the records unreadable, so the files cannot be recovered. Recovery based on file signatures has the following defects: 1. a file whose data is discontinuous on the disk cannot be recovered; 2. a file whose header and signature have been overwritten cannot be recovered.
Disclosure of Invention
The present invention provides the following technical solutions to overcome the above-mentioned drawbacks in the prior art.
An InnoDB-based fragmented file recovery method comprises the following steps:
a reading step of reading n bytes of data, starting from an initial position Offset = 0, as one data page of an InnoDB data file;
a matching step of reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page by using the data page folding check algorithm, and judging whether CheckSum1 is equal to CheckSum2; if not, setting Offset = Offset + m and re-executing the reading step; if so, executing a recovery step;
a recovery step of reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within the file to which they belong in ascending order of PageNo, and then setting Offset = Offset + n and re-executing the reading step; where m is the data offset unit and n is the size of one data page.
Further, the fragmented file is an ibdata and/or ibd fragmented file.
Further, the check value CheckSum2 of the data page is calculated with the data page folding check algorithm as follows: a 22-byte segment starting at byte offset 4 of the data page is subjected to the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte offset 38 of the data page is subjected to the folding XOR calculation to obtain a check value sum2, so that CheckSum2 = sum1 + sum2.
Further, an XOR algorithm on two integers is defined, with its operator written as **: given two 4-byte integers a and b, the algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added;
the folding XOR calculation operates as follows: the initial value of fold is set to 0, the data segment is traversed in byte order to form a data set N{N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the integer XOR algorithm, and the return value is written back to fold, that is, fold = fold**Ni, where 1 <= i <= m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
Still further, the data segment is traversed in byte order to form the data set N{N1, N2, N3, ..., Nm} as follows: starting from the beginning of the segment, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes are converted to an integer that serves as Nm.
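The operator and the folding calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names `pair`, `split_words`, and `fold` are hypothetical, and two details the text does not state are assumed here, namely that results are truncated to unsigned 32-bit values and that each group of bytes is interpreted in big-endian order.

```python
RANDOM_MASK = 1653893711
RANDOM_MASK2 = 1463735687
U32 = 0xFFFFFFFF  # assumption: values are kept as unsigned 32-bit integers


def pair(a: int, b: int) -> int:
    """The a**b operator from the text, truncated to 32 bits (assumed)."""
    return (((((a ^ b ^ RANDOM_MASK) << 8) + a) ^ RANDOM_MASK2) + b) & U32


def split_words(data: bytes) -> list:
    """Group bytes into 4-byte integers; a short tail becomes one integer.

    Big-endian byte order is an assumption, not stated in the text.
    """
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


def fold(data: bytes) -> int:
    """Folding XOR calculation: start at 0 and apply fold = fold ** Ni."""
    acc = 0
    for word in split_words(data):
        acc = pair(acc, word)
    return acc
```

Because Python integers are unbounded, the single mask at the end of `pair` gives the same low 32 bits as truncating after every intermediate step.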
The invention also provides an InnoDB-based fragmented file recovery device, which comprises:
a reading unit for reading n bytes of data, starting from an initial position Offset = 0, as one data page of an InnoDB data file;
a matching unit for reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page by using the data page folding check algorithm, and judging whether CheckSum1 is equal to CheckSum2; if not, setting Offset = Offset + m and re-executing the operation of the reading unit; if so, executing the operation of the recovery unit;
a recovery unit for reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within the file to which they belong in ascending order of PageNo, and then setting Offset = Offset + n and re-executing the operation of the reading unit; where m is the data offset unit and n is the size of one data page.
Further, the fragmented file is an ibdata and/or ibd fragmented file.
Further, the check value CheckSum2 of the data page is calculated with the data page folding check algorithm as follows: a 22-byte segment starting at byte offset 4 of the data page is subjected to the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte offset 38 of the data page is subjected to the folding XOR calculation to obtain a check value sum2, so that CheckSum2 = sum1 + sum2.
Further, an XOR algorithm on two integers is defined, with its operator written as **: given two 4-byte integers a and b, the algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added;
the folding XOR calculation operates as follows: the initial value of fold is set to 0, the data segment is traversed in byte order to form a data set N{N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the integer XOR algorithm, and the return value is written back to fold, that is, fold = fold**Ni, where 1 <= i <= m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
Still further, the data segment is traversed in byte order to form the data set N{N1, N2, N3, ..., Nm} as follows: starting from the beginning of the segment, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes are converted to an integer that serves as Nm.
The invention also proposes a computer-readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.
The technical effects of the invention are as follows: based on the page structure of InnoDB data files, data can be recovered in units of data pages; data files can be recovered from storage media such as a whole disk or a disk image; recovery does not depend on file system file records; if a file is partially damaged (for example, encrypted by a virus or partially overwritten), the undamaged part of the file can be extracted; if the storage medium contains fragments of multiple data files, the fragments can be traced and regrouped by file identification FileId; and even if the data fragments are scattered discontinuously and out of order on the disk, they can be sorted and reassembled by page number PageNo.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an InnoDB data file structure according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an InnoDB data page according to an embodiment of the present invention.
Fig. 3 is a flowchart of an InnoDB-based fragmented file recovery method according to an embodiment of the present invention.
Fig. 4 is a structural diagram of an InnoDB-based fragmented file recovery device according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
InnoDB is designed for maximum performance when handling large volumes of data. It is fully integrated with the MySQL server and maintains its own buffer pool in main memory for caching data and indexes. InnoDB stores its tables and indexes in a tablespace, which may consist of several files (or raw disk partitions). Technically, InnoDB is a complete database system running behind MySQL.
Referring to fig. 1, an InnoDB data file is composed of a series of data pages, arranged in ascending order of page number starting from page number 0.
Referring to fig. 2, the data page size of the InnoDB storage engine is 16384 bytes, in which bytes 0-3 store the check value, bytes 4-7 store the page number PageNo, and bytes 34-37 store the file ID, i.e. the file identification FileId.
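As an illustration of this layout, the three fields can be pulled out of a page buffer with Python's `struct` module. This is a sketch under assumptions: the names `PageHeader` and `read_header` are hypothetical, and the fields are assumed to be big-endian unsigned 32-bit integers, which the text does not state explicitly.

```python
import struct
from typing import NamedTuple

PAGE_SIZE = 16384  # data page size given in the text


class PageHeader(NamedTuple):
    checksum: int  # bytes 0-3
    page_no: int   # bytes 4-7
    file_id: int   # bytes 34-37


def read_header(page: bytes) -> PageHeader:
    """Extract the check value, PageNo, and FileId from one data page."""
    # Assumption: all three fields are big-endian unsigned 32-bit integers.
    checksum, page_no = struct.unpack_from(">II", page, 0)
    (file_id,) = struct.unpack_from(">I", page, 34)
    return PageHeader(checksum, page_no, file_id)
```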
As the above description shows, the file structure and the data page structure of the InnoDB storage engine are the basis for realizing data recovery.
The principle of data recovery in the invention is as follows: 1) a check value of a data block is calculated using the folding check algorithm, and if the calculated check value equals the check value stored at the head of the page, the data block is a page of an InnoDB data file; 2) within a database instance, each data file has a unique file ID (file identification), and this file ID is recorded in every data page of the file, which allows data pages to be merged into files; 3) each data page records its own page number, and the pages of a file are arranged in ascending order of page number, which allows the data pages to be sorted within the file.
Fig. 3 shows the InnoDB-based fragmented file recovery method of the invention, which includes:
A reading step S101 of reading n bytes of data, starting from an initial position Offset = 0, as one data page of an InnoDB data file.
A matching step S102 of reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page by using the data page folding check algorithm, and judging whether CheckSum1 is equal to CheckSum2; if not, Offset = Offset + m is set and the reading step S101 is re-executed; if so, a recovery step S103 is executed. Determining whether the calculated check value of the data page equals the check value read from the page is the key to recovering the file and is an important inventive point of the invention.
A recovery step S103 of reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within the file to which they belong in ascending order of PageNo, and then setting Offset = Offset + n and re-executing the reading step S101; where m is the data offset unit and n is the size of one data page. Depending on the reading strategy, m may be 1 byte, the sector size, the cluster size, and so on; n is generally 16384 bytes, although other values are possible. The recovery operation continues until all of the data has been read.
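The control flow of steps S101 to S103 can be sketched as a sliding scan. The sketch below is illustrative only: the function name `scan` and the `is_page` callback (standing in for the CheckSum1 == CheckSum2 test of step S102) are hypothetical, the image is assumed to fit in memory as a `bytes` object, and the default `m = 512` is just one possible sector size.

```python
from typing import Callable, Iterator, Tuple


def scan(image: bytes, is_page: Callable[[bytes], bool],
         n: int = 16384, m: int = 512) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, page) for every n-byte window accepted by is_page."""
    offset = 0
    while offset + n <= len(image):
        page = image[offset:offset + n]   # reading step S101
        if is_page(page):                 # matching step S102
            yield offset, page            # hand the page to recovery (S103)
            offset += n                   # match: skip past the whole page
        else:
            offset += m                   # no match: advance by one offset unit
```

A smaller `m` finds pages at more alignments at the cost of more checksum evaluations, while a sector-sized or cluster-sized `m` is faster when pages are known to be aligned to those boundaries.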
In the matching step S102, the first 4 bytes of the data page are read as the check value CheckSum1; this follows from the structure of the data page shown in fig. 2 and described above.
In InnoDB, data files take the ibdata or ibd format, so the fragmented files recovered by the invention are ibdata or ibd fragmented files, and both kinds can be recovered together.
Another important inventive point of the invention, and an important step in implementing it, is the calculation of the check value of the data page. Specifically, the check value CheckSum2 of the data page is calculated with the data page folding check algorithm as follows: a 22-byte segment starting at byte offset 4 of the data page is subjected to the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte offset 38 of the data page is subjected to the folding XOR calculation to obtain a check value sum2, so that CheckSum2 = sum1 + sum2.
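The two regions that enter the check value can be made concrete with a short sketch. Here `fold` is a parameter standing for the folding XOR calculation, so the sketch does not fix how that calculation is implemented; the names `page_checksum` and `checksum_matches` are hypothetical, and the 32-bit wraparound of the sum and the big-endian reading of CheckSum1 are assumptions.

```python
from typing import Callable


def page_checksum(page: bytes, fold: Callable[[bytes], int]) -> int:
    """CheckSum2 = sum1 + sum2 over the two regions named in the text."""
    n = len(page)
    sum1 = fold(page[4:26])        # 22 bytes starting at byte offset 4
    sum2 = fold(page[38:n - 8])    # n - 46 bytes starting at byte offset 38
    return (sum1 + sum2) & 0xFFFFFFFF  # assumption: sum wraps at 32 bits


def checksum_matches(page: bytes, fold: Callable[[bytes], int]) -> bool:
    """Compare the stored CheckSum1 (bytes 0-3) with the computed CheckSum2."""
    stored = int.from_bytes(page[0:4], "big")  # assumption: big-endian field
    return stored == page_checksum(page, fold)
```

Note that bytes 26-37 and the last 8 bytes of the page fall outside both regions, so changes there do not affect the computed value.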
In order to calculate the check value of the data page, the invention further defines an XOR algorithm on two integers, which is also an important inventive point. Specifically, the operator of the algorithm is written as **: given two 4-byte integers a and b, the algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added;
Based on this XOR algorithm, the invention provides a folding XOR calculation, which is one of the key points of the invention. It operates as follows: the initial value of fold is set to 0, the data segment is traversed in byte order to form a data set N{N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the integer XOR algorithm, and the return value is written back to fold, that is, fold = fold**Ni, where 1 <= i <= m, i and m are integers, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
In one embodiment, the data segment is traversed in byte order to form the data set N{N1, N2, N3, ..., Nm} as follows: starting from the beginning of the segment, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes are converted to an integer that serves as Nm.
In one embodiment, the manner in which the data pages are merged according to the FileId is as follows,
Figure GDA0002842603380000081
where f denotes all fragment page information in a single ibdata/ibd file, PageCount denotes the number of pages, and p_i = {PageCheckSum_i, PageNo_i, FileId_i, Offset_i}, where i is an integer from 0 to n, PageCheckSum_i denotes the check value of the data page, PageNo_i denotes the page number of the data page, FileId_i denotes the id of the file to which the data page belongs, and Offset_i denotes the location of the data page on disk. That is, the recovered data pages can be merged according to FileId_i and then sorted by PageNo_i to obtain the restored file.
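The merge-then-sort step described here can be sketched as follows. The record type `PageInfo` mirrors the four fields of p_i, and the function name `group_pages` is hypothetical; how the sorted pages are subsequently written out as a file is left open, since the text does not specify it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PageInfo(NamedTuple):
    checksum: int  # PageCheckSum_i
    page_no: int   # PageNo_i
    file_id: int   # FileId_i
    offset: int    # Offset_i, location of the page on disk


def group_pages(pages: Iterable[PageInfo]) -> Dict[int, List[PageInfo]]:
    """Merge pages by FileId, then sort each file's pages by PageNo."""
    files = defaultdict(list)
    for p in pages:
        files[p.file_id].append(p)
    for plist in files.values():
        plist.sort(key=lambda p: p.page_no)
    return dict(files)
```

Each resulting list gives, in page order, the disk offsets from which one file's pages can be read back.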
Fig. 4 shows the InnoDB-based fragmented file recovery device of the invention, which includes:
A reading unit 401 for reading n bytes of data, starting from an initial position Offset = 0, as one data page of an InnoDB data file.
A matching unit 402 for reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page by using the data page folding check algorithm, and judging whether CheckSum1 is equal to CheckSum2; if not, Offset = Offset + m is set and the operation of the reading unit 401 is re-executed; if so, the operation of the recovery unit 403 is executed. Determining whether the calculated check value of the data page equals the check value read from the page is the key to recovering the file and is an important inventive point of the invention.
A recovery unit 403 for reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within the file to which they belong in ascending order of PageNo, and then setting Offset = Offset + n and re-executing the operation of the reading unit 401; where m is the data offset unit and n is the size of one data page. Depending on the reading strategy, m may be 1 byte, the sector size, the cluster size, and so on; n is generally 16384 bytes, although other values are possible. The recovery operation continues until all of the data has been read.
In the operation of the matching unit 402, the first 4 bytes of the data page are read as the check value CheckSum1; this follows from the structure of the data page shown in fig. 2 and described above.
In InnoDB, data files take the ibdata or ibd format, so the fragmented files recovered by the invention are ibdata or ibd fragmented files, and both kinds can be recovered together.
Another important inventive point of the invention, and an important step in implementing it, is the calculation of the check value of the data page. Specifically, the check value CheckSum2 of the data page is calculated with the data page folding check algorithm as follows: a 22-byte segment starting at byte offset 4 of the data page is subjected to the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte offset 38 of the data page is subjected to the folding XOR calculation to obtain a check value sum2, so that CheckSum2 = sum1 + sum2.
In order to calculate the check value of the data page, the invention further defines an XOR algorithm on two integers, which is also an important inventive point. Specifically, the operator of the algorithm is written as **: given two 4-byte integers a and b, the algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added;
Based on this XOR algorithm, the invention provides a folding XOR calculation, which is one of the key points of the invention. It operates as follows: the initial value of fold is set to 0, the data segment is traversed in byte order to form a data set N{N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the integer XOR algorithm, and the return value is written back to fold, that is, fold = fold**Ni, where 1 <= i <= m, i and m are integers, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
In one embodiment, the data segment is traversed in byte order to form the data set N{N1, N2, N3, ..., Nm} as follows: starting from the beginning of the segment, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes are converted to an integer that serves as Nm.
In one embodiment, the manner in which the data pages are merged according to the FileId is as follows,
Figure GDA0002842603380000101
where f denotes all fragment page information in a single ibdata/ibd file, PageCount denotes the number of pages, and p_i = {PageCheckSum_i, PageNo_i, FileId_i, Offset_i}, where i is an integer from 0 to n, PageCheckSum_i denotes the check value of the data page, PageNo_i denotes the page number of the data page, FileId_i denotes the id of the file to which the data page belongs, and Offset_i denotes the location of the data page on disk. That is, the recovered data pages can be merged according to FileId_i and then sorted by PageNo_i to obtain the restored file.
The method of the invention was also verified as follows:
(1) The disk management tool of the Windows system is used to create a 3 GB VHD image, which is then mounted and formatted.
(2) A data file TEST.ibd of the InnoDB storage engine, 8.65 MB in size, is copied to the mounted partition (the copy is paused partway through and other data is written to the disk, so that the file is discontinuous on the disk).
On this disk, signature-based recovery of the ibd file recovers only part of the data, and recovery based on file records cannot recover the file.
The technical effects of the invention are as follows: based on the page structure of InnoDB data files, data can be recovered in units of data pages; data files can be recovered from storage media such as a whole disk or a disk image; recovery does not depend on file system file records; if a file is partially damaged (for example, encrypted by a virus or partially overwritten), the undamaged part of the file can be extracted; if the storage medium contains fragments of multiple data files, the fragments can be traced and regrouped by file identification FileId; and even if the data fragments are scattered discontinuously and out of order on the disk, they can be sorted and reassembled by page number PageNo.
The method is particularly suitable for mobile terminal devices such as a smartphone, a tablet computer, a notebook computer, a desktop computer, or a PDA; of course, the device may also be another portable electronic device with a data processing function.
For convenience of description, the above device is described as being divided into various units by function, each described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it should be noted that, although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made without departing from the spirit and scope of the invention, and all such modifications and equivalents are intended to be covered by the appended claims.

Claims (7)

1. An InnoDB-based fragmented file recovery method, characterized by comprising the following steps:
a reading step of reading n bytes of data, starting from an initial position Offset = 0, as one data page of an InnoDB data file, the data file being an ibdata and/or ibd fragmented file;
a matching step of reading the first 4 bytes of the data page as a check value CheckSum1, calculating a check value CheckSum2 of the data page by using the data page folding check algorithm, and judging whether CheckSum1 is equal to CheckSum2; if not, setting Offset = Offset + m and re-executing the reading step; if so, executing a recovery step; wherein CheckSum2 is calculated with the data page folding check algorithm as follows: a 22-byte segment starting at byte offset 4 of the data page is subjected to the folding XOR calculation to obtain a check value sum1, and a segment of n - 46 bytes starting at byte offset 38 of the data page is subjected to the folding XOR calculation to obtain a check value sum2, so that CheckSum2 = sum1 + sum2;
a recovery step of reading the page number PageNo of the data page and the file identification FileId of the file to which the data page belongs, merging the data pages according to FileId, sorting the data pages within the file to which they belong in ascending order of PageNo, and then setting Offset = Offset + n and re-executing the reading step;
where m is the data offset unit and n is the size of one data page.
2. The method of claim 1,
wherein an XOR algorithm on two integers is defined, with its operator written as **: given two 4-byte integers a and b, the algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a XOR b XOR RANDOM_MASK is shifted left by 8 bits, a is added, the result is XORed with RANDOM_MASK2, and b is added;
the folding XOR calculation operates as follows: the initial value of fold is set to 0, the data segment is traversed in byte order to form a data set N{N1, N2, N3, ..., Nm}, each element is combined in turn with fold by the integer XOR algorithm, and the return value is written back to fold, that is, fold = fold**Ni, where 1 <= i <= m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
3. The method of claim 2, wherein the data segment is traversed in byte order to form the data set N{N1, N2, N3, ..., Nm} as follows: starting from the beginning of the segment, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes are converted to an integer that serves as Nm.
4. An InnoDB-based fragmented file recovery apparatus, comprising:
a reading unit, configured to read n bytes of data, starting from the initial position Offset = 0 of an InnoDB data file, as one data page of the InnoDB data file, the InnoDB data file being an ibdata and/or ibd fragment file;
a matching unit, configured to read the first 4 bytes of the data page as a check value CheckSum1, calculate a check value CheckSum2 of the data page using a data page folding check algorithm, and determine whether CheckSum1 equals CheckSum2; if not, Offset is set to Offset + m and the operation of the reading unit is executed again; if so, the operation of the recovery unit is executed; wherein calculating the check value CheckSum2 of the data page using the data page folding check algorithm comprises: taking a segment of data 22 bytes long starting at the 4th byte of the data page and applying the folding XOR calculation to obtain a check value sum1, and taking a segment of data n-46 bytes long starting at the 38th byte of the data page and applying the folding XOR calculation to obtain a check value sum2, so that the check value of the data page is CheckSum2 = sum1 + sum2;
a recovery unit, configured to read the page number PageNo of the data page and the file identifier FileId of the file to which the data page belongs, merge the data pages by FileId, sort the data pages within each file in ascending order of page number PageNo, and then set Offset = Offset + n and execute the operation of the reading unit again;
wherein m is the unit of data offset and n is the size of one data page.
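The cooperation of the reading, matching and recovery units can be sketched in Python as follows. This is a hedged illustration, not the claimed apparatus. The names `page_checksum` and `scan_fragment` are hypothetical. The folding XOR calculation is passed in as the callable `fold_xor` so that the sketch does not depend on any particular implementation of it. The byte positions of PageNo (offset 4) and FileId (offset 34) and the big-endian encoding are assumptions taken from the standard InnoDB page header layout rather than from the claim text, and "the 4th byte" and "the 38th byte" are read as zero-based offsets 4 and 38.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def page_checksum(page: bytes, fold_xor: Callable[[bytes], int]) -> int:
    """CheckSum2 = sum1 + sum2 over the two segments named in the claim."""
    n = len(page)
    sum1 = fold_xor(page[4:26])        # 22 bytes starting at offset 4
    sum2 = fold_xor(page[38:n - 8])    # n - 46 bytes starting at offset 38
    return (sum1 + sum2) & 0xFFFFFFFF  # assumption: wraps to 4 bytes


def scan_fragment(
    data: bytes, n: int, m: int, fold_xor: Callable[[bytes], int]
) -> Dict[int, List[Tuple[int, bytes]]]:
    """Return {FileId: [(PageNo, page), ...]} with pages sorted by PageNo."""
    files: Dict[int, List[Tuple[int, bytes]]] = defaultdict(list)
    offset = 0
    while offset + n <= len(data):
        page = data[offset:offset + n]                   # reading unit
        checksum1 = int.from_bytes(page[0:4], "big")     # matching unit
        if checksum1 == page_checksum(page, fold_xor):
            # Recovery unit. Assumed header layout: PageNo at offset 4,
            # FileId (tablespace id) at offset 34, both big-endian.
            page_no = int.from_bytes(page[4:8], "big")
            file_id = int.from_bytes(page[34:38], "big")
            files[file_id].append((page_no, page))
            offset += n  # a valid page was consumed
        else:
            offset += m  # no page here, slide forward by the offset unit
    for pages in files.values():
        pages.sort(key=lambda item: item[0])  # ascending PageNo per file
    return dict(files)
```

Concatenating the sorted pages of one FileId then yields the recovered content of that file. With n set to the page size (16384 bytes by default in InnoDB) and m set to a smaller unit such as a sector size, misaligned pages inside a fragment can still be found.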
5. The apparatus of claim 4, wherein
an XOR algorithm on two integer values is defined, with its operator denoted **: for two 4-byte integers a and b, the XOR algorithm is:
a**b=(((((a^b^RANDOM_MASK)<<8)+a)^RANDOM_MASK2)+b);
that is, a is XORed with b and with RANDOM_MASK, the result is shifted left by 8 bits and a is added, the sum is XORed with RANDOM_MASK2, and b is added;
the folding XOR calculation operates as follows: the initial value of fold is set to 0; the segment of data is traversed in byte order to form a data set N {N1, N2, N3, ..., Nm}; each element of the data set is combined in sequence with fold using the integer XOR algorithm, and the return value is written back to fold, namely fold = fold ** Ni, where 1 ≤ i ≤ m, RANDOM_MASK = 1653893711, and RANDOM_MASK2 = 1463735687.
6. The apparatus of claim 5, wherein traversing the segment of data in byte order to form the data set N {N1, N2, N3, ..., Nm} operates as follows: starting from the beginning of the segment of data, every four bytes form one 4-byte integer, and if fewer than 4 bytes remain at the end, the remaining bytes form the integer Nm.
7. A computer-readable storage medium, characterized in that the storage medium has stored thereon computer program code which, when executed by a computer, performs the method of any of claims 1-3.
CN201811225169.1A 2018-10-19 2018-10-19 Fragmented file recovery method and device based on InoDB and storage medium Active CN109408290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811225169.1A CN109408290B (en) 2018-10-19 2018-10-19 Fragmented file recovery method and device based on InoDB and storage medium

Publications (2)

Publication Number Publication Date
CN109408290A CN109408290A (en) 2019-03-01
CN109408290B true CN109408290B (en) 2021-02-26

Family

ID=65468048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811225169.1A Active CN109408290B (en) 2018-10-19 2018-10-19 Fragmented file recovery method and device based on InoDB and storage medium

Country Status (1)

Country Link
CN (1) CN109408290B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058969B (en) * 2019-04-18 2023-02-28 腾讯科技(深圳)有限公司 Data recovery method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014108083A1 (en) * 2013-01-11 2014-07-17 Tencent Technology (Shenzhen) Company Limited Method and device for verifying consistency of data of master device and slave device
CN104881418A (en) * 2014-02-28 2015-09-02 阿里巴巴集团控股有限公司 Method and device for quickly reclaiming rollback space in MySQL
US9824132B2 (en) * 2013-01-08 2017-11-21 Facebook, Inc. Data recovery in multi-leader distributed systems
CN108062358A (en) * 2017-11-28 2018-05-22 厦门市美亚柏科信息股份有限公司 The offline restoration methods of innodb engine deletion records, storage medium
CN108319862A (en) * 2017-01-16 2018-07-24 阿里巴巴集团控股有限公司 A kind of method and apparatus of data documents disposal
CN108563535A (en) * 2018-04-27 2018-09-21 四川巧夺天工信息安全智能设备有限公司 A kind of restoration methods to the full library of MySQL database

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8510270B2 (en) * 2010-07-27 2013-08-13 Oracle International Corporation MYSQL database heterogeneous log based replication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
InnoDB数据库数据恢复技术研究;孙偏偏;《中国优秀硕士学位论文全文数据库 信息科技辑》;20151015;I138-227 *

Also Published As

Publication number Publication date
CN109408290A (en) 2019-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant