CN113282592B - Method, system and storage medium for recovering MSSQL database - Google Patents

Method, system and storage medium for recovering MSSQL database Download PDF

Info

Publication number
CN113282592B
CN113282592B CN202110828001.5A CN202110828001A CN113282592B CN 113282592 B CN113282592 B CN 113282592B CN 202110828001 A CN202110828001 A CN 202110828001A CN 113282592 B CN113282592 B CN 113282592B
Authority
CN
China
Prior art keywords
data
object information
page
area
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110828001.5A
Other languages
Chinese (zh)
Other versions
CN113282592A (en
Inventor
黄传波
姚一永
龙星澧
涂磊
谢卓伟
钱禹航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Vinchin Science And Technology Co
Original Assignee
Chengdu Vinchin Science And Technology Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Vinchin Science And Technology Co filed Critical Chengdu Vinchin Science And Technology Co
Priority to CN202110828001.5A priority Critical patent/CN113282592B/en
Publication of CN113282592A publication Critical patent/CN113282592A/en
Application granted granted Critical
Publication of CN113282592B publication Critical patent/CN113282592B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method, a system and a storage medium for recovering an MSSQL database, belonging to the field of data recovery. The method comprises the following steps: acquiring a data file; analyzing a system table page to acquire first object information; inquiring a user table page to obtain second object information; judging whether the second object information is compressed data or not; and if the second object information is compressed data, decompressing and decoding the second object information according to a line compression structure of the compressed data to recover the first data. The system comprises: the device comprises a file acquisition module, a system table analysis module, a user table query module, a second judgment module and a decompression decoding module. The method is based on binary analysis, can help the user to recover the data of the table level degree from MSSQL, supports the recovery of the compressed data, has a large range of recovered data, does not depend on log files, and is more flexible and convenient to operate.

Description

Method, system and storage medium for recovering MSSQL database
Technical Field
The invention belongs to the field of data recovery, and relates to a method, a system and a storage medium for recovering an MSSQL database.
Background
SQL server is a relational database management system introduced by Microsoft corporation, and is called Microsoft SQL server for short MSSQL. The SQL Server is a closed source database product, provides programs for vast users, does not provide source codes, and has the characteristics of high safety, strong usability and good operation performance.
The Data File of the SQL server is divided into an MDF File (Primary Data File), an NDF File (Secondary Data Files), and an LDF File (Log Data Files), wherein the MDF File and the LDF File are owned by each database, and the NDF File appears only when the databases are divided. The MDF file and the NDF file are mainly used for storing data, and both have similar structures, while the LDF file is a database log, and stores all transaction operations of the database, and is mainly used for recording and rolling back transactions.
At present, due to the closed-source characteristic of the SQL server, the traditional SQL server data recovery technology can only achieve the file-level recovery degree, but almost no technology can achieve the table-level recovery degree. At the same time, the user can select the desired position,
because the SQL server 2008 introduces the row compression operation, after the row compression is performed on the data, the data structure of the corresponding table is reorganized, so that the traditional SQL server data recovery technology cannot analyze the compressed data.
Therefore, how to help the user to recover the data of the table level degree from the SQL server and help the user to recover the data after line compression becomes a technical problem which needs to be solved urgently at present.
Disclosure of Invention
In order to solve the technical problems in the background art, embodiments of the present invention provide a method, a system, and a storage medium for recovering an MSSQL database. The technical scheme is as follows:
in a first aspect, a method for recovering an MSSQL database is provided, the method comprising the steps of:
acquiring a data file in a storage system;
analyzing a system table page in a data file according to a system table page identifier and a system table page organization structure, and acquiring first object information of a table which needs to be restored by a user from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table;
inquiring a user table page according to the first object information, and acquiring second object information from the user table page;
judging whether the second object information is compressed data or not;
if the second object information is compressed data, decompressing the second object information according to a line compression structure of the compressed data, and then decoding to recover the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
and if the second object information is not the compressed data, decoding the second object information to recover the second data.
It is understood that the second object information refers to a format in which a single piece of data in the database user table is stored in the MDF file. The second object is composed of actual data stored by the user and metadata for describing the actual data storage structure. By means of the table field information and the second object information, the data content corresponding to each field in the single piece of data can be analyzed.
It is also understood that the first data and the second data are both data of the user table to be restored, and the two data are different in the source manner.
In one embodiment, the step of decompressing and decoding the second object information according to the line compression structure of the compressed data includes:
analyzing the column specifier area to obtain the total number of columns and the length of each column;
analyzing the short data area according to the first object information and the length of each column to obtain first actual data;
analyzing the offset item corresponding to each column in the long data area to obtain the ending offset of each column;
analyzing the long data area according to the ending deviation and the first object information to obtain second actual data;
and decoding the first actual data and the second actual data to recover the first data.
In one embodiment, before the step of querying a user table page according to the first object information and obtaining the second object information from the user table page, the method further includes:
judging whether the data file is a damaged file or not;
if the data file is an undamaged file, the user table page is inquired according to the first object information, and the second object information is obtained from the user table page, and the following steps are specifically executed:
indexing an IAM page according to the first object information, and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
and inquiring a user table page according to the third object information, and obtaining second object information from the user table page.
In one embodiment, the step of indexing the IAM page includes:
judging whether eight data slots in the IAM page mixing area are filled;
and if the eight data slots are not completely filled, analyzing the IAM page mixed area to acquire the third object information.
In one embodiment, after the step of determining whether the second object information is compressed data, the method further includes:
and analyzing the first data to recover third data, wherein the third data is any one or more of LOB data, SLOB data, BLOB data or line overflow data.
In one embodiment, after the step of determining whether the second object information is compressed data, the method further includes:
recording the first address offset of each piece of data in the first data;
and comparing the first address offset of each piece of data in the first data with the address in the line directory, and marking the unseen matching item as deleted data.
In a second aspect, there is also provided a system for recovering an MSSQL database, the system comprising:
the file acquisition module is used for acquiring data files in the storage system;
the system table analysis module is used for analyzing a system table page in the data file according to a system table page identifier and a system table page organization structure and acquiring first object information of a table which needs to be restored by a user from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table;
the user table query module is used for querying a user table page according to the first object information and obtaining second object information from the user table page;
the second judging module is used for judging whether the second object information is compressed data or not;
the decompression decoding module is used for decompressing the second object information according to a line compression structure of the compressed data and then decoding the second object information to restore the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
and the decoding module is used for decoding the second object information and recovering second data.
In one embodiment, the system further includes:
the first judgment module is used for judging whether the data file is a damaged file or not;
wherein, the user table query module comprises:
the index unit is used for indexing an IAM page according to the first object information and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
and the query unit is used for querying a user table page according to the third object information and acquiring second object information from the user table page.
In one embodiment, the system further comprises:
and the data format analysis module is used for analyzing the first data and recovering third data, wherein the third data is any one or more of LOB data, SLOB data, BLOB data or line overflow data.
In one embodiment, the system further comprises:
the recording module is used for recording the first address offset of each piece of data in the first data;
and the comparison marking module is used for comparing the first address offset of each piece of data in the first data with the address in the row directory and marking the unseen matching item as deleted data.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the above method for recovering an MSSQL database.
The invention has the beneficial effects that:
1. according to the method, the MDF file or the NDF file is analyzed by using a binary analysis and coding technology, the user table page in the MSSQL database can be obtained, the data is recovered from the user table page, and the effect of recovering the data of the designated user table page is achieved;
2. based on binary analysis, the method provided by the invention can be used for recovering the MSSQL database without mounting and starting a system in a virtual disk or depending on a log file, thereby greatly simplifying the operation process and being more flexible to use;
3. according to the metadata and the positioned table fields, the row compression data can be completely decompressed by combining the decompression method obtained by the reverse analysis of the row compression structure, and the effect of helping a user to recover the compression data is achieved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for recovering an MSSQL database according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a system table page according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a line compression structure for compressing data according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for recovering an MSSQL database according to a second embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an IAM page linked list in the second embodiment of the present invention.
Fig. 6 is a physical structure diagram of an IAM page unified area in the second embodiment of the present invention.
Fig. 7 is a physical structure diagram of an IAM page mixing area in the second embodiment of the present invention.
Fig. 8 is a flowchart of a method for recovering an MSSQL database according to a third embodiment of the present invention.
Fig. 9 is a flowchart of a method for recovering an MSSQL database according to a fourth embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a system for recovering an MSSQL database according to a fifth embodiment of the present invention.
Fig. 11 is a schematic structural diagram of a user table query module in the fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method provided by the invention can be applied to the following environments: in the virtual disk image containing SQL Server, the analysis process is compiled by JAVA, and the virtual disk is in VMDK format of VMware work.
Example one
As shown in fig. 1, in one embodiment, there is provided a method of recovering an MSSQL database, the method comprising the steps of:
s101, acquiring a data file in the virtual disk image.
Data files refer to MDF files and NDF files, i.e., files suffixed with MDF and files suffixed with NDF. The organization structure of the MDF file and the organization structure of the NDF file both use pages as basic units, each page consisting of several bytes. MDF files and NDF files are mainly used for storing data, and thus provide a possibility for recovering deleted user data by parsing page structure and byte information.
For ease of understanding, in particular, an example of operation is provided: reading a virtual disk and loading a registry file; acquiring a value named DefaultData node under an MSSQL14.MSSQLSERVER node, wherein the specific value is C \ Program Files \ Microsoft SQL Server \ MSSQL14.MSSQL SERVER \ MSSQL \ DATA; and acquiring a database data file named sample. mdf through the path because the database is not split, and loading the database data file into a recovery program realized by java.
S102, analyzing a system table page in the data file according to the system table page identifier and the system table page organization structure, and acquiring first object information of a table which needs to be restored by a user from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table.
Pages are the basic units that make up a data file, and the types of pages include index pages, IAM pages, system table pages, and user table pages. The system table page contains a plurality of types of system tables, the system tables are typically prefixed by sys, each system table stores different metadata, such as Sysobjects, which stores all object information created in the database, such as constraints, defaults, logs, rules, stored procedures, user tables, each object occupies one row in the table.
Different types of pages have different organization structures, as shown in fig. 2, a complete system table page at least comprises: the device comprises a page header area, a data area, a line directory area (slot area) and a free area, wherein the size of the page header area is fixed 96 bytes, and the sizes of the other three areas are not fixed. The page header area mainly stores important information such as the identifier, the page type, the obj ID, the slot number, the fixed segment length, the current page ID and the next page ID of the page, and the information such as the position, the page storage content and the page integrity of the page can be confirmed through the page header area. The identifier is a 48-bit number, which is unique in the whole database data file, the upper 32 bits are the page number, and the lower 16 bits are the file number, and through the two numbers, a page can be uniquely determined. Meanwhile, all data pages of the same system table are stored in a double-linked list, and when one page is found, all the remaining pages can be found according to the pointer.
Therefore, the page id where all the system table pages are located can be quickly retrieved through the identifier and the organization structure, and a foundation is laid for acquiring the object information required to be recovered from the system table pages.
For ease of understanding, in particular, an example of operation is provided: cutting the page according to 8192 bytes, and retrieving a system table page in the data file; reading a system table page for storing table information, screening all user table information with the type of U, and selecting the table id required to be recovered as 251147940; reading the system table page for storing the table fields, and screening out all fields with table id 251147940, the results are shown in table 1 below:
TABLE 1 Table field information screening results
Name of table field Type (B) Actual type Length of Logical order
uid 56 Int 4 1
Pig 175 Char 8 2
Elephant 175 Char 10 3
Monkey 175 Char 12 4
Cat 239 NChar 20 5
Bird 35 Text 16 6
Duck 175 Char 1000 7
Dog 62 Float 8 8
S103, inquiring a user table page according to the first object information, and obtaining second object information from the user table page.
S104, judging whether the second object information is compressed data or not; if yes, go to step S105, otherwise, go to step S106.
And S105, decompressing the second object information according to a line compression structure of the compressed data, and then decoding to recover the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area.
When data is compressed, the data format is changed, a set of compression algorithm designed by the SQL Server is used for compressing the data to the maximum extent, but the compressed row data structure is similar to the uncompressed data and still contains metadata and stored actual data; therefore, the inventor can completely decompress the row compression data according to the metadata and the positioned table fields and by combining a decompression method obtained by reversely analyzing the row compression structure, and the effect of helping a user to recover the compression data is achieved.
Optionally, the step of decompressing the second object information according to the line compression structure of the compressed data and then performing decoding processing includes;
s1051, analyzing the column descriptor region to obtain the total number of columns and the length of each column;
s1052, analyzing the short data area according to the first object information and the length of each column to obtain first actual data;
s1053, analyzing the offset item corresponding to each column in the long data area, and acquiring the ending offset of each column;
s1054, analyzing the long data area according to the ending offset and the first object information to obtain second actual data;
s1055, decoding the first actual data and the second actual data to recover the first data.
It is understood that the first data refers to data of a user table that a user needs to restore. The first actual data and the second actual data are metadata stored in a row compression structure.
For ease of understanding, we provide an illustration of the row compression structure below, in particular as follows:
as shown in fig. 3, the row compression structure includes a header area, a column specifier area, a short data area, and a long data area, which are arranged in this order, wherein,
the length of the header area is 1 byte, the header area 0 indicates that the data is not subjected to line compression, and the header area 1 indicates that the data is subjected to line compression;
the column specifier area has a total column number portion and a column length portion, wherein the length of the total column number portion is M, the number of columns is N, and M satisfies: n <127, M =1 byte; when N is not less than 127, M =2 bytes; in the column length part, each column is 4 bits in length;
the short data area has a first aggregation part (i.e., a short data aggregation part) and a short data part containing first actual data (i.e., stored short data). It should be noted that the first aggregation portions are only present when the total number of columns is greater than 30 rows, and each first aggregation portion represents 30 columns and has a length of 1 byte. The first aggregation section is used for fast indexing short data.
The long data area has a second aggregation part (i.e., a long data aggregation part), an offset item part arranged by using small ends, and a long data part containing second actual data (i.e., stored long data), wherein the length of the offset item part is 2 bytes, and when the primary bit of the offset item part is 1, the corresponding column is a complex column. The second aggregation section is used to quickly index long data and facilitate recovery of special format data (e.g., LOB, line overflow data, etc.).
And S106, decoding the second object information to recover second data.
When the object information is not compressed, the object information is encoded in an original state and then stored in the line, and the object information can be directly decoded to recover data.
The technical scheme of the embodiment can be operated by taking a page as a unit based on binary analysis and coding, helps a user to recover data of a table level degree from MSSQL, further can recover deleted data under the condition of no log file, and provides effective help for a digital evidence obtaining worker. Meanwhile, the method also supports the recovery of compressed data, and enlarges the range of recoverable data.
Example two
As shown in fig. 4, in one embodiment, a method for recovering an MSSQL database is provided, which comprises:
s201, acquiring a data file in the virtual disk image.
S202, analyzing a system table page in the data file according to the system table page identifier and the system table page organization structure, and acquiring first object information of a table to be restored from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table.
S203, judging whether the data file is a damaged file or not; if so, steps S2041 and S2042 are performed, otherwise, steps S2051 to S2053 are performed.
It is understood that the step "determining whether the data file is a damaged file" may also be performed before step S202 (i.e. analyzing the system table page in the data file according to the system table page identifier and the system table page organization structure, and acquiring the first object information of the table to be restored), and then step S2041, step S2042, step S2051, step S2052, and step S2053 may all be performed after step S202.
S2041, indexing an IAM page according to the first object information, and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots.
And S2042, inquiring a user table page according to the third object information, and obtaining second object information from the user table page.
In this embodiment, under a normal condition, each table in the database has at least one corresponding IAM page, and the IAM page includes an index of a data page of the table, so that the IAM page may be used to track the table, avoid traversing the whole file, and effectively improve the recovery speed of the data, and the larger the file is, the more obvious the speed advantage is.
It should be noted that, in order to obtain the IAM page quickly, the first object information should also include an index of the IAM page.
As shown in FIG. 5, IAM pages are stored in a data file in a chained manner, with each IAM page having pointers to the next IAM page and the previous IAM page.
Optionally, the step of indexing the IAM page includes:
s20411, judging whether eight data slots in the IAM page mixing area are filled; if not, go to step S20412; if full, go to S20413;
s20412, analyzing the IAM page mixing area to obtain third object information;
and S20413, analyzing the IAM page mixed area and the IAM page unified area to acquire the third object information.
It should be noted that the mixed area of the IAM page has at least eight slot positions, each slot position corresponds to one page, page storage is stored in the mixed area first, and when all eight slot positions are full, the uniform area is allocated.
It will be appreciated that the above-described blending region means that the data pages in the region may be from a plurality of different tables. The mixed area is provided with at least eight slot positions, each slot position corresponds to a mixed area page, one slot position occupies 6 bytes, the first 2 bytes are file numbers, and the last 4 bytes are page sequence numbers.
It should be understood that the unified area refers to an area with eight pages as a partition, and when a partition is allocated to a table, only the data page of the table can be stored. The unified area is composed of a number of bits, and a number of the bits map a 4 Gigabyte (GB) area in the data file.
For ease of understanding, we provide a description of the IAM page uniform and mixed regions below, in particular as follows:
as shown in fig. 6, in the physical structure of the unified area of the IAM page in the huagrow table, the first two bytes are status bytes, the following bytes represent that the unified area allocation case is 0, and the area is not allocated to the target table, the 6 th bit of the 8 th byte of the unified area in fig. 6 is 1, it can be known that the 62 th area of the unified area is allocated to the huagrow page, since one area is composed of 8 continuous pages, it can be calculated that the continuous 8 pages starting from the 488 page all belong to the table huagrow, the mixed area of one IAM page has 7988 bytes, that is, 63904 areas, 511232 pages, and when the page is greater than this value, the mixed area appears on the second IAM page, and one IAM page manages about 4G of data.
As shown in fig. 7, in the physical structure of the IAM page mixed area in the hugerow table, the first dashed line part indicates the beginning part of the page area corresponding to the IAM page, and it can be seen that the page ID is 0 and the file number is 1, so this IAM page corresponds to 511232 pages starting from the 0 th page, the second dashed line part is the first slot of the mixed area, the file number is 1, and the page ID is 175 (small end mode), which indicates that the table is allocated into a uniform area composed of 8 continuous pages starting from 175.
To facilitate further understanding, in particular, an example of operation is provided: when the data file is not damaged, the page ID of the first page of the IAM page corresponding to the target table is screened out to be 150 (namely the page number) and the file number is 1 according to the storage object information page, the IAM page is obtained through the two items of data, the page ID of the data of the storage target table is analyzed to be 568 through the IAM page, the file number is 1, and then the page where the data is located through the page ID568 and the file number.
Optionally, in order to more smoothly parse the binary IAM page, the step of parsing the IAM page mixture area is specifically performed as follows:
s204121, determining a mapping area index in the IAM page;
s204122, determining all data slots in the IAM page according to the mapping area index;
s204123, analyzing all the data slots in the step 204122 to obtain third object information.
The step of analyzing the IAM page mixed area and the IAM page unified area is specifically executed as follows:
s204121, determining a mapping area index in the IAM page;
s204122, determining all data slots in the IAM page according to the mapping area index;
s204123, determining a uniform area map in the IAM page, and determining reserved bytes in the uniform area map;
s204124, traversing the IAM page, and marking all non-zero bits;
s204125, determining all unified regions according to the mapping region index of step S204122 and the non-zero bits of step S204124;
s204126, parsing all the data slots of step S204122 and all the uniform areas of step S204125 to obtain third object information.
Preferably, in order to improve data accuracy, the pages may be collated and verified, and the pages that fail the checksum calculation may be marked.
It is worth noting that, in the SQL Server, checksum calculation is performed on each page, and the calculation result is stored in the header, and the checksum calculation is performed on the page and compared with checksum data in the header, so that whether the data page is damaged or tampered can be determined, which is simple and effective.
S2051, indexing the system table page according to the first object information, and obtaining an index Id and a page header Id.
S2052, calculating auid through the index Id and the page head Id.
S2053, traversing the data file according to the auid to obtain a user table page, and obtaining second object information from the user table page.
When the data file is damaged, a large amount of data may be lost due to data restoration through the IAM page, so that all user table pages can be screened out by traversing the data file through the index Id (i.e., indexId) and the header Id (i.e., objectId), and further, the field items corresponding to a specific table can be screened out from the user table pages, thereby ensuring consistency of data restoration.
The calculation mode of the auid is as follows:
indexId=auid>>48
objectId=(auid-(indexId<<48))>>16。
in the above-described calculation method of the auid, "< <" means a left shift operator, "> > means a right shift operator," > >48 "means a right shift of 48 bits," > >16 "means a right shift of 16 bits, and" indexId = auid > >48 "means data in which indexId is equal to auid right shift of 16 bits.
S206, judging whether the second object information is compressed data or not; if yes, executing step S207, otherwise, executing step S208;
s207, decompressing the second object information according to a line compression structure of the compressed data, and then decoding to restore the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
and S208, decoding the second object information to recover second data.
EXAMPLE III
As shown in fig. 8, in one embodiment, a method of recovering an MSSQL database is provided, the method comprising:
s301, acquiring a data file in the virtual disk image;
s302, analyzing a system table page in the data file according to a system table page identifier and a system table page organization structure, and acquiring first object information of a table to be restored from the system table page, wherein the first object information comprises a table name, a table field and a table main key;
s303, judging whether the data file is a damaged file or not; if yes, executing steps S3041 and S3042, otherwise, executing steps S3051-S3053;
s3041, indexing an IAM page according to first object information, and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
s3042, inquiring a user table page according to the third object information, and obtaining second object information from the user table page;
s3051, indexing the system table page according to the first object information, and obtaining an index Id and a header Id;
s3052, calculating an auid through the index Id and the header Id;
s3053, traversing the data file according to the auid to obtain a user table page, and obtaining second object information from the user table page;
s306, judging whether the second object information is compressed data or not; if yes, executing step S307, otherwise executing step S308;
s307, decompressing the second object information according to a line compression structure of the compressed data, and then decoding to recover the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
s308, decoding the second object information to recover second data;
s309, analyzing the first data, and recovering third data, wherein the third data are LOB data, SLOB data and BLOB data.
In the technical solution of this embodiment, since several special data formats exist in the SQL server, that is, LOB data, SLOB data, BLOB data, or line overflow data, most of them belong to variable length data types, but the storage mode is different from that of the common variable length data. The invention can support the recovery of the special data format through analysis, and enlarges the range of recoverable data.
Example four
As shown in fig. 9, in one embodiment, a method for recovering an MSSQL database is provided, which comprises:
s401, acquiring a data file in the virtual disk image;
s402, analyzing a system table page in the data file according to a system table page identifier and a system table page organization structure, and acquiring first object information of a table to be restored from the system table page, wherein the first object information comprises a table name, a table field and a table main key;
s403, judging whether the data file is a damaged file or not; if yes, executing steps S4041 and S4042, otherwise executing steps S4051 to S4053;
s4041, an IAM page is indexed according to the first object information, and third object information is obtained, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
s4042, inquiring a user table page according to the third object information, and obtaining second object information from the user table page;
s4051, indexing the system table page according to the first object information, and obtaining an index Id and a header Id;
s4052, calculating auid through the index Id and the header Id;
s4053, traversing the data file according to the auid to obtain a user table page, and obtaining second object information from the user table page;
s406, judging whether the second object information is compressed data or not; if yes, executing step S407, otherwise, executing step S408;
s407, decompressing the second object information according to a line compression structure of compressed data, and then decoding to recover the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
s408, decoding the second object information to recover second data;
s409, analyzing the first data to recover third data, wherein the third data are LOB data, SLOB data and BLOB data;
s410, recording the first address offset of each piece of data in the first data;
s411, comparing the first address offset of each piece of data in the first data with the address in the row directory, and marking the matching items which are not found as deleted data.
In the technical scheme of this embodiment, since the row directory only records undeleted data, when the data is deleted, the row directory is empty, and thus the row directory cannot be located. Therefore, the range of the deleted data is determined by the difference between the two data line directories before and after the data, so that the deleted data can be recovered.
EXAMPLE five
As shown in fig. 10, in one embodiment, there is provided a system for recovering an MSSQL database, comprising:
a file obtaining module 501, configured to obtain a data file in the virtual disk image;
the system table parsing module 502 is configured to parse a system table page in the data file according to a system table page identifier and a system table page organization structure, and obtain first object information of a table that a user needs to restore from the system table page, where the first object information includes a table name, a table field, and a table primary key;
a user table query module 504, configured to query a user table page according to the first object information, and obtain second object information from the user table page;
a second judging module 505, configured to judge whether the second object information is compressed data;
a decompression decoding module 506, configured to decompress the second object information according to a line compression structure of the compressed data, and then perform decoding processing to recover the first data, where the line compression structure includes a header area, a column specifier area, a short data area, and a long data area;
the decoding module 507 is configured to perform decoding processing on the second object information to recover second data.
Optionally, on the basis of this embodiment, the system further includes:
a first judging module 503, configured to judge whether the data file is a damaged file;
as shown in fig. 11, the user table query module 504 includes:
an indexing unit 5041, configured to index an IAM page according to first object information, and obtain third object information, where the third object information includes a page number and a file number, and the IAM page includes a mixed area including at least eight data slots;
the querying unit 5042 queries a user table page according to the third object information, and obtains second object information from the user table page.
Optionally, on the basis of this embodiment, the system further includes:
a data format parsing module 508, configured to parse the first data to recover third data, where the third data is any one or more of LOB data, SLOB data, BLOB data, or line overflow data.
Optionally, on the basis of this embodiment, the system further includes:
a recording module 509, configured to record a first address offset of each piece of data in the first data;
a compare and mark module 510, configured to compare a first address offset of each piece of data in the first data with an address in the row directory, and mark an unseen matching entry as deleted data.
According to the technical scheme of the embodiment, the file obtaining module 501 is used for obtaining a data file in a virtual disk image, the system table parsing module 502 can parse a system table page in the data file according to a system table page identifier and a system table page organization structure, and obtain first object information of a table which a user needs to restore, wherein the first object information includes a table name, a table field and a main key of the table, and the user table querying module 504 can query a user table page according to the first object information and obtain second object information from the user table page; the first determining module 503 determines whether the second object information is compressed data, and if the second object information is compressed data, the decompressing and decoding module decompresses the second object information according to a line compression structure of the compressed data, and then performs decoding processing to recover the first data. The method solves the technical problems that the fine granularity of MSSQL recovery is not enough and decompressed data cannot be recovered in the prior art, and based on binary analysis, virtual disk mounting and operation of starting an operating system and a database are omitted, and the operation flow is simplified.
EXAMPLE six
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method for recovering an MSSQL database according to the first to fourth embodiments.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A method for recovering an MSSQL database, the method comprising the steps of:
acquiring a data file in a storage system;
analyzing a system table page in a data file according to a system table page identifier and a system table page organization structure, and acquiring first object information of a table which needs to be restored by a user from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table;
inquiring a user table page according to the first object information, and acquiring second object information from the user table page;
judging whether the second object information is compressed data or not;
if the second object information is compressed data, decompressing the second object information according to a line compression structure of the compressed data, and then decoding to recover the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
if the second object information is not compressed data, decoding the second object information to recover second data;
the step of decompressing the second object information according to the line compression structure of the compressed data and then decoding the second object information comprises the following steps;
analyzing the column specifier area to obtain the total number of columns and the length of each column;
analyzing the short data area according to the first object information and the length of each column to obtain first actual data;
analyzing the offset item corresponding to each column in the long data area to obtain the ending offset of each column;
analyzing the long data area according to the ending deviation and the first object information to obtain second actual data;
and decoding the first actual data and the second actual data to recover the first data.
2. The method for recovering the MSSQL database according to claim 1, further comprising, before the step of querying a user table page according to the first object information and obtaining second object information from the user table page:
judging whether the data file is a damaged file or not;
if the data file is an undamaged file, the user table page is inquired according to the first object information, and the second object information is obtained from the user table page, and the following steps are specifically executed:
indexing an IAM page according to the first object information, and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
and inquiring a user table page according to the third object information, and obtaining second object information from the user table page.
3. The method for recovering an MSSQL database according to claim 2, wherein the step of indexing an IAM page comprises:
judging whether eight data slots in the IAM page mixing area are filled;
and if the eight data slots are not completely filled, analyzing the IAM page mixed area to acquire the third object information.
4. The method for recovering the MSSQL database according to any one of claims 1 to 3, wherein the step of judging whether the second object information is compressed data further comprises the following steps:
and analyzing the first data to recover third data, wherein the third data is any one or more of LOB data, SLOB data, BLOB data or line overflow data.
5. The method for recovering the MSSQL database according to any one of claims 1 to 3, wherein the step of judging whether the second object information is compressed data further comprises the following steps:
recording the first address offset of each piece of data in the first data;
and comparing the first address offset of each piece of data in the first data with the address in the line directory, and marking the unseen matching item as deleted data.
6. A system for recovering an MSSQL database, the system comprising:
the file acquisition module is used for acquiring data files in the storage system;
the system table analysis module is used for analyzing a system table page in the data file according to a system table page identifier and a system table page organization structure and acquiring first object information of a table which needs to be restored by a user from the system table page, wherein the first object information comprises a table name, a table field and a main key of the table;
the user table query module is used for querying a user table page according to the first object information and obtaining second object information from the user table page;
the second judging module is used for judging whether the second object information is compressed data or not;
the decompression decoding module is used for decompressing the second object information according to a line compression structure of the compressed data and then decoding the second object information to restore the first data, wherein the line compression structure comprises a header area, a column descriptor area, a short data area and a long data area;
the decoding module is used for decoding the second object information and recovering second data;
the second object information is decompressed according to a line compression structure of the compressed data, and then is decoded, wherein the decompression processing comprises the following steps;
analyzing the column specifier area to obtain the total number of columns and the length of each column;
analyzing the short data area according to the first object information and the length of each column to obtain first actual data;
analyzing the offset item corresponding to each column in the long data area to obtain the ending offset of each column;
analyzing the long data area according to the ending deviation and the first object information to obtain second actual data;
and decoding the first actual data and the second actual data to recover the first data.
7. The system for recovering an MSSQL database according to claim 6, further comprising:
the first judgment module is used for judging whether the data file is a damaged file or not;
wherein, the user table query module comprises:
the index unit is used for indexing an IAM page according to the first object information and obtaining third object information, wherein the third object information comprises a page number and a file number, and the IAM page comprises a mixed area containing at least eight data slots;
and the query unit is used for querying a user table page according to the third object information and acquiring second object information from the user table page.
8. The system for recovering an MSSQL database according to claim 6, further comprising:
the data format analysis module is used for analyzing the first data and recovering third data, wherein the third data is any one or more of LOB data, SLOB data, BLOB data or line overflow data;
the recording module is used for recording the first address offset of each piece of data in the first data;
and the comparison marking module is used for comparing the first address offset of each piece of data in the first data with the address in the row directory and marking the unseen matching item as deleted data.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of recovering an MSSQL database according to any one of claims 1 to 3.
CN202110828001.5A 2021-07-22 2021-07-22 Method, system and storage medium for recovering MSSQL database Active CN113282592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110828001.5A CN113282592B (en) 2021-07-22 2021-07-22 Method, system and storage medium for recovering MSSQL database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110828001.5A CN113282592B (en) 2021-07-22 2021-07-22 Method, system and storage medium for recovering MSSQL database

Publications (2)

Publication Number Publication Date
CN113282592A CN113282592A (en) 2021-08-20
CN113282592B true CN113282592B (en) 2021-09-24

Family

ID=77286955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110828001.5A Active CN113282592B (en) 2021-07-22 2021-07-22 Method, system and storage medium for recovering MSSQL database

Country Status (1)

Country Link
CN (1) CN113282592B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011023134A1 (en) * 2009-08-28 2011-03-03 Beijing Innovation Works Technology Company Limited Method and system for managing distributed storage system through virtual file system
CN105068887B (en) * 2015-08-03 2019-02-12 四川效率源信息安全技术股份有限公司 It is a kind of based on the data reconstruction method for being damaged SQLServer database
CN105677509B (en) * 2015-12-25 2019-03-01 北京奇虎科技有限公司 The restoration methods and device of data in database
CN106992842B (en) * 2017-04-17 2020-04-07 中山大学 Multi-element domain data recovery method based on network coding and compressed sensing
WO2018232336A1 (en) * 2017-06-15 2018-12-20 Huawei Technologies Co., Ltd. Method and apparatus for transmitting upstream data in a cable network
CN108563535B (en) * 2018-04-27 2021-12-24 四川巧夺天工信息安全智能设备有限公司 Recovery method for MySQL database whole database
CN109388523B (en) * 2018-09-26 2021-11-30 四川巧夺天工信息安全智能设备有限公司 Method for recovering MySQL database based on binary log file
CN111143115A (en) * 2018-11-05 2020-05-12 中国移动通信集团云南有限公司 Remote disaster recovery method and device based on backup data
CN109271463B (en) * 2018-11-30 2022-06-07 四川巧夺天工信息安全智能设备有限公司 Method for recovering inodb compressed data of MySQL database
CN112052120B (en) * 2020-08-27 2022-08-05 厦门市美亚柏科信息股份有限公司 Database deleted data recovery method and device

Also Published As

Publication number Publication date
CN113282592A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
JP6596102B2 (en) Lossless data loss by deriving data from basic data elements present in content-associative sheaves
US9251159B1 (en) Partial block allocation for file system block compression using virtual block metadata
US8825667B2 (en) Method and apparatus for managing data objects of a data storage system
US10635359B2 (en) Managing cache compression in data storage systems
US20020129028A1 (en) System and method for managing file system extended attributes
CN110741637B (en) Method for simplifying video data, computer readable storage medium and electronic device
O'Neil et al. Bitmap index design choices and their performance implications
CN108475508B (en) Simplification of audio data and data stored in block processing storage system
CN109710455A (en) Deletion file access pattern method and system based on FAT32 file system
TW202147787A (en) Exploiting locality of prime data for efficient retrieval of data that has been losslessly reduced using a prime data sieve
JP6726690B2 (en) Performing multidimensional search, content-associative retrieval, and keyword-based retrieval and retrieval on losslessly reduced data using basic data sieves
CN113282592B (en) Method, system and storage medium for recovering MSSQL database
TWI816954B (en) Method and apparatus for reconstituting a sequence of losslessly-reduced data chunks, method and apparatus for determining metadata for prime data elements, and storage medium
KR101670473B1 (en) Method for recovering deleted data by mysql innodb database
KR20160113167A (en) Optimized data condenser and method
JP2003337822A (en) Compression retrieval archive processing method, compression retrieval archive processing program and recording medium with its program recorded
CN113282573B (en) Database recovery method, system and storage medium based on IAM page
CN112905546A (en) Deleted file recovery method, device, equipment and storage medium
CN112650718A (en) Method for analyzing and extracting BTRFS file system data based on copy-on-write
CN112380174B (en) XFS file system analysis method containing deleted files, terminal device and storage medium
Edel et al. Measuring the compressibility of metadata and small files for disk/nvram hybrid storage systems
An et al. Data reconstruction and recovery of deduplicated files having non-resident attributes in NTFS volume
Maric Mesh Intermediary Representation
CN115168105A (en) Method for recovering thumbnail of Windows deleted picture and related device
Lømo File System supporting Arbitrarily sized Allocations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant