CN113282573A

CN113282573A - Database recovery method, system and storage medium based on IAM page

Info

Publication number: CN113282573A
Application number: CN202110827997.8A
Authority: CN
Inventors: 黄传波; 姚一永; 龙星澧; 涂磊; 谢卓伟; 钱禹航
Original assignee: Chengdu Vinchin Science And Technology Co
Current assignee: Chengdu Vinchin Science And Technology Co
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2021-08-20
Anticipated expiration: 2041-07-22
Also published as: CN113282573B

Abstract

The invention relates to a database recovery method, system and storage medium based on an IAM page, and belongs to the field of data recovery. The method comprises the following steps: obtaining an IAM page; determining a mapping area index in the IAM page; determining the data slots in the IAM page; judging whether the data slots are all filled; if they are not, parsing the data slots to obtain second object information; querying a user table page according to the second object information and acquiring fourth object information from the user table page; and decoding the fourth object information to recover the user table data. The system comprises: an IAM page acquisition module; a mapping area index determining module; a data slot determining module; a data slot judging module; a first parsing module; a user table query module; and a decoding module. The invention parses the IAM page at the binary level so that the IAM page can still be used to track the table. This avoids traversing the whole file and greatly improves data recovery speed, and the workflow is simple and flexible to use.

Description

Database recovery method, system and storage medium based on IAM page

Technical Field

The invention belongs to the field of data recovery, and relates to a database recovery method, a database recovery system and a storage medium based on an IAM page.

Background

SQL Server is a relational database management system developed by Microsoft Corporation. Its full name is Microsoft SQL Server, abbreviated as MSSQL. SQL Server is a closed-source database product: programs are provided to users without source code. It is characterized by high security, good usability and good performance.

The data files of SQL Server are divided into MDF files (Primary Data Files), NDF files (Secondary Data Files) and LDF files (Log Data Files). Every database has an MDF file and an LDF file, whereas NDF files appear only when a database is split across multiple files. MDF and NDF files mainly store data and have similar structures. The LDF file is the database log: it stores all transaction operations of the database and is mainly used to record and roll back transactions.

A SQL Server data file is composed of many pages, and the page types include index pages, IAM pages, system table pages and user table pages. The IAM page (Index Allocation Map page) is a special page type. Because it contains an index to the data pages of a table, the IAM page can be used to track that table.

At present, because SQL Server is closed source, there is almost no technology that uses IAM pages to recover data in SQL Server. Moreover, SQL Server may be in an abnormal state, for example when the whole database cannot be started normally. In that case all pages contain only metadata and the IAM pages cannot be parsed and identified in the normal way, which makes recovering data through IAM pages even more difficult.

Therefore, how to help users quickly recover data through IAM pages is a technical problem that urgently needs to be solved.

Disclosure of Invention

In order to solve the technical problems in the background art, the present invention provides a method, a system and a storage medium for database recovery based on an IAM page. The technical scheme is as follows:

In a first aspect, a method for database recovery based on an IAM page is provided, the method comprising the following steps:

S1, acquiring an IAM page from a data file;

S2, determining a mapping area index in the IAM page;

S3, determining all data slots in the IAM page according to the offset stored in the mapping area index;

S4, judging whether all the data slots are filled; if not, executing step S5; if all the data slots are filled, executing steps S6 to S9;

S5, parsing all the data slots from step S4 to obtain second object information, wherein the second object information comprises page numbers and file numbers;

S6, determining a uniform area map in the IAM page, and determining the reserved bytes in the uniform area map;

S7, traversing the IAM page and marking all non-zero bits;

S8, determining all uniform areas according to the offset stored in the mapping area index from step S2 and the non-zero bits from step S7;

S9, parsing all the data slots from step S4 and all the uniform areas from step S8 to obtain third object information, wherein the third object information comprises page numbers and file numbers;

S10, querying a user table page according to the second object information from step S5 or the third object information from step S9, and obtaining fourth object information from the user table page;

S11, decoding the fourth object information to recover the user table data.

It is understood that the fourth object information refers to undecoded user table data.
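The branching of steps S4 to S10 reduces to a small selection rule. When the data slots are not all filled, only the pages referenced by the slots (second object information) are queried. Otherwise, the pages of the uniform areas are added (third object information). The following Java sketch illustrates only this rule. The class and method names are hypothetical, and each page is assumed to be represented by a single long key.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: selects the pages to be queried in step S10. */
final class RecoveryPlan {
    private RecoveryPlan() {}

    /**
     * @param slotPages    pages referenced by the mixed-area data slots (S5 / S9)
     * @param allFilled    result of the judgement in step S4
     * @param uniformPages pages belonging to the uniform areas found in step S8
     * @return slot pages only when the slots are not all filled, otherwise slot pages
     *         followed by uniform-area pages, in first-seen order and without duplicates
     */
    static List<Long> pagesToQuery(List<Long> slotPages, boolean allFilled, List<Long> uniformPages) {
        Set<Long> result = new LinkedHashSet<>(slotPages);
        if (allFilled) {
            result.addAll(uniformPages);
        }
        return new ArrayList<>(result);
    }
}
```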

In one embodiment, step S1 includes:

S1001, acquiring a data file in a storage system;

S1002, parsing a system table page in the data file according to a system table page identifier and the system table page organization structure, and acquiring from the system table page the first object information of the table that the user needs to restore, wherein the first object information comprises a table name, table fields, the primary key of the table and an index of the IAM page;

S1003, acquiring an IAM page according to the first object information.

In one embodiment, after step S11, the method further includes:

S12, checking the IAM page and the user table page, and marking the pages that fail the check.

In one embodiment, after step S11, the method further includes:

S13, recording the first address offset of each piece of data in the user table data;

S14, comparing the first address offset of each piece of data in the user table data with the addresses in the row directory, and marking unmatched entries as deleted data.

In a second aspect, a system for IAM page based database recovery is provided, the system comprising:

the IAM page acquisition module is used for acquiring an IAM page from the data file;

a mapping area index determining module, configured to determine a mapping area index in the IAM page;

the data slot determining module is used for determining all data slots in the IAM page according to the offset stored in the mapping area index;

the data slot judging module is used for judging whether all the data slots are filled;

the first analysis module is used for analyzing the data slot to obtain second object information, and the second object information comprises a page number and a file number;

a reserved byte determining module, configured to determine a uniform area map in the IAM page, and determine reserved bytes in the uniform area map;

the traversal recording module is used for traversing the IAM page and marking all non-zero bits;

a uniform area determining module, configured to determine all uniform areas according to the offset stored in the mapping area index and the non-zero bit;

the second analysis module is used for analyzing all the data slots and all the uniform areas to obtain third object information, and the third object information comprises page numbers and file numbers;

the user table query module is used for querying a user table page according to the second object information or the third object information and obtaining fourth object information from the user table page;

and the decoding module is used for decoding the fourth object information and recovering the user table data.

In one embodiment, the IAM page acquisition module includes:

the file acquisition unit is used for acquiring data files in the storage system;

the system table analysis unit is used for parsing a system table page in the data file according to a system table page identifier and the system table page organization structure, and for acquiring from the system table page the first object information of the table that the user needs to restore, wherein the first object information comprises a table name, table fields and the primary key of the table;

an IAM page acquiring unit configured to acquire an IAM page according to the first object information.

In one embodiment, the system further includes:

and the checking and marking module is used for checking the IAM page and the user table page and marking the pages which do not pass the checking.

In one embodiment, the system further includes:

the offset recording module is used for recording the first address offset of each piece of data in the user table data;

and the comparison marking module is used for comparing the first address offset of each piece of data in the user table data with the addresses in the row directory, and for marking unmatched entries as deleted data.

In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the above-described method for IAM page-based database recovery.

The invention has the beneficial effects that:

(1) The invention parses the IAM page at the binary level so that the IAM page can still be used to track the table. This avoids traversing the whole file and greatly improves data recovery speed;

(2) The process of recovering the MSSQL database does not depend on the state of MSSQL. There is no need to mount and start the system in a virtual disk or to use log files, which greatly simplifies the workflow and makes the method more flexible to use.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention. Those skilled in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a flowchart of a database recovery method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a system table page according to an embodiment of the present invention.

Fig. 3 is a physical structure diagram of an IAM page mixed area according to an embodiment of the present invention.

Fig. 4 is a physical structure diagram of an IAM page uniform area according to an embodiment of the present invention.

Fig. 5 is a schematic structural diagram of an IAM page linked list according to an embodiment of the present invention.

Fig. 6 is a flowchart of a database recovery method according to a second embodiment of the present invention.

Fig. 7 is a flowchart of a database recovery method according to a third embodiment of the present invention.

Fig. 8 is a schematic structural diagram of a database recovery system according to a fourth embodiment of the present invention.

Fig. 9 is a schematic structural diagram of an IAM page acquisition module in the fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The method provided by the invention can be applied in the following environment: a virtual disk image containing SQL Server, in the VMDK format of VMware Workstation, with the parsing program written in Java.

Example one

As shown in FIG. 1, in one embodiment, a method for IAM page based database recovery is provided, the method comprising the steps of:

S101, acquiring an IAM page from a data file.

Optionally, step S101 includes:

S1001, acquiring a data file in a storage system.

Data files are MDF files and NDF files, i.e., files with the .mdf and .ndf extensions. Both are organized with the page as the basic unit, and each page consists of a fixed number of bytes. Because MDF and NDF files mainly store data, parsing their page structure and byte information makes it possible to recover deleted user data.

For ease of understanding, an operation example is provided. First, the virtual disk is read and the registry file is loaded. Next, the value named DefaultData under the MSSQL14.MSSQLSERVER node is obtained; in this example it is C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA. Because the database is not split, the database data file named sample.mdf is obtained from this path and loaded into the recovery program implemented in Java.

S1002, parsing a system table page in the data file according to the system table page identifier and the system table page organization structure, and acquiring from the system table page the first object information of the table that the user needs to restore, wherein the first object information comprises a table name, table fields, the primary key of the table and an index of the IAM page.

Pages are the basic units that make up a data file, and the page types include index pages, IAM pages, system table pages and user table pages. System table pages hold several kinds of system tables, which are typically prefixed with sys, and each system table stores different metadata. For example, sysobjects stores information about every object created in the database, such as constraints, defaults, logs, rules, stored procedures and user tables, with one row per object.

Different types of pages have different organization structures. As shown in Fig. 2, a complete system table page comprises at least a page header area, a data area, a row directory area (slot area) and a free area. The page header area has a fixed size of 96 bytes, while the sizes of the other three areas vary. The page header mainly stores important information such as the page identifier, page type, obj ID, slot count, fixed segment length, current page ID and next page ID. From this information the position of the page, its stored content and its integrity can be confirmed. The identifier is a 48-bit number that is unique within the whole database data file: the upper 32 bits are the page number and the lower 16 bits are the file number, and together the two numbers uniquely determine a page. In addition, all data pages of the same system table are linked in a doubly linked list, so once one page is found, all remaining pages can be found by following the pointers.

Therefore, the page IDs of all system table pages can be quickly retrieved through the identifier and the organization structure. This lays the foundation for obtaining, from the system table pages, the information on the objects to be recovered.

For ease of understanding, an operation example is provided. First, the data file is cut into 8192-byte pages and the system table pages are retrieved. Next, the system table page that stores table information is read, all user tables (type U) are screened out, and the table to be recovered, with table id 251147940, is selected. Finally, the system table page that stores table fields is read and all fields with table id 251147940 are screened out. The results are shown in Table 1 below:
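The 48-bit identifier can be held in a Java long. The following is a minimal sketch that uses exactly the bit split stated in this paragraph: the upper 32 bits are the page number and the lower 16 bits are the file number. The class name PageIdentifier is hypothetical.

```java
/** Packs and unpacks the 48-bit page identifier (hypothetical helper). */
final class PageIdentifier {
    private PageIdentifier() {}

    /** Upper 32 bits: page number; lower 16 bits: file number. */
    static long compose(long pageNumber, int fileNumber) {
        return ((pageNumber & 0xFFFFFFFFL) << 16) | (fileNumber & 0xFFFF);
    }

    /** Extracts the 32-bit page number. */
    static long pageNumber(long identifier) {
        return (identifier >>> 16) & 0xFFFFFFFFL;
    }

    /** Extracts the 16-bit file number. */
    static int fileNumber(long identifier) {
        return (int) (identifier & 0xFFFF);
    }
}
```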
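The page-cutting step in this example can be sketched in Java as follows. The 8192-byte page size and the IAM page type value 10 come from this description. The position of the page type inside the 96-byte header (offset 1) is an assumption, and the class PageScanner is hypothetical. Trailing bytes shorter than one page are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Cuts a data file image into 8192-byte pages and filters them by header page type. */
final class PageScanner {
    static final int PAGE_SIZE = 8192;
    /** Assumption: the page type is the byte at offset 1 of the 96-byte page header. */
    static final int TYPE_OFFSET = 1;

    private PageScanner() {}

    /** Splits the file into whole pages; an incomplete trailing page is dropped. */
    static List<byte[]> split(byte[] file) {
        List<byte[]> pages = new ArrayList<>();
        for (int start = 0; start + PAGE_SIZE <= file.length; start += PAGE_SIZE) {
            byte[] page = new byte[PAGE_SIZE];
            System.arraycopy(file, start, page, 0, PAGE_SIZE);
            pages.add(page);
        }
        return pages;
    }

    /** Returns the positions (page numbers within the file) of pages whose type equals {@code type}. */
    static List<Integer> pagesOfType(List<byte[]> pages, int type) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            if ((pages.get(i)[TYPE_OFFSET] & 0xFF) == type) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For instance, `PageScanner.pagesOfType(PageScanner.split(fileBytes), 10)` would list the candidate IAM pages of a loaded data file, where `fileBytes` stands for the file contents.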

TABLE 1 Table field information screening results

Field name	Type ID	Actual type	Length (bytes)	Logical order
uid	56	Int	4	1
Pig	175	Char	8	2
Elephant	175	Char	10	3
Monkey	175	Char	12	4
Cat	239	NChar	20	5
Bird	35	Text	16	6
Duck	175	Char	1000	7
Dog	62	Float	8	8

S1003, acquiring an IAM page according to the first object information.

S102, determining a mapping area index in the IAM page.

The mapping area index of an IAM page indicates the starting position (first page) of the data file range mapped by the current IAM page. In the IAM page, the mapping area index is fixed at byte 136, and after it there is one data slot every six bytes. Determining the mapping area index therefore helps locate the data slots.

S103, determining all data slots in the IAM page according to the offset stored in the mapping area index.

In the data file, the page header type of an IAM page is 10. An IAM page is divided into a mixed area and a uniform area. The mixed area has eight slot positions (i.e., data slots) stored in little-endian byte order. Each slot corresponds to one mixed-area page and occupies six bytes: the first two bytes are the file number, and the last four bytes are the page number and directory number.

For ease of understanding, an illustration of the IAM page mixed area is provided below:

As shown in Fig. 3, in the physical structure of the IAM page mixed area of the hugerow table, the first dashed part marks the start of the page range corresponding to the IAM page. The page ID there is 0 and the file number is 1, so this IAM page covers 511232 pages starting from page 0. The second dashed part is the first slot of the mixed area: the file number is 1 and the page ID is 175 (little-endian), which indicates that the table is allocated to a uniform area consisting of 8 consecutive pages starting from page 175.

S104, judging whether all the data slots are filled; if not, executing step S105; if all the data slots are filled, executing steps S106 to S109.

S105, parsing all the data slots from step S104 to obtain second object information, wherein the second object information comprises a page number and a file number.
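Steps S102 to S105 can be sketched in Java as below. The following values are taken from this description: the offset 136, the six-byte slot size, the eight slots, and the little-endian layout of two file-number bytes followed by four page-number bytes. Treating an all-zero slot as unfilled is an assumption, and the class MixedSlots is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Reads the eight mixed-area data slots of an IAM page (steps S102 to S105). */
final class MixedSlots {
    static final int MAPPING_INDEX_OFFSET = 136; // fixed position stated in the description
    static final int SLOT_SIZE = 6;
    static final int SLOT_COUNT = 8;

    private MixedSlots() {}

    /** Offset of slot i (0..7): one slot every six bytes after the 6-byte mapping area index. */
    static int slotOffset(int i) {
        return MAPPING_INDEX_OFFSET + SLOT_SIZE * (i + 1);
    }

    /** Returns {fileNumber, pageNumber} for slot i, or null if the slot is empty (all zero). */
    static long[] readSlot(byte[] iamPage, int i) {
        ByteBuffer buf = ByteBuffer.wrap(iamPage, slotOffset(i), SLOT_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        int file = buf.getShort() & 0xFFFF;       // first two bytes: file number
        long page = buf.getInt() & 0xFFFFFFFFL;   // last four bytes: page number
        return (file == 0 && page == 0) ? null : new long[] {file, page};
    }

    /** Judgement of step S104. */
    static boolean allFilled(byte[] iamPage) {
        for (int i = 0; i < SLOT_COUNT; i++) {
            if (readSlot(iamPage, i) == null) {
                return false;
            }
        }
        return true;
    }

    /** Second object information of step S105: (file, page) pairs of all filled slots. */
    static List<long[]> filledSlots(byte[] iamPage) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < SLOT_COUNT; i++) {
            long[] slot = readSlot(iamPage, i);
            if (slot != null) {
                out.add(slot);
            }
        }
        return out;
    }
}
```

With the Fig. 3 values (file number 1, page 175 in the first slot), `readSlot` returns `{1, 175}` for slot 0.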

S106, determining a uniform area map in the IAM page, and determining reserved bytes in the uniform area map.

The uniform area bitmap shows the page usage within the area range mapped by the current IAM page: a bit value of 1 means that the corresponding uniform area is allocated to the target table. In the IAM page, the uniform area bitmap is fixed at byte 192, and the reserved bytes are the first two bytes of the bitmap. Determining the reserved bytes therefore makes it possible to locate the uniform areas.

S107, traversing the IAM page and marking all non-zero bits.

Each bit after the reserved bytes indicates whether the corresponding uniform area is allocated to the target table, so marking all non-zero bits reveals how the uniform areas are allocated.

For ease of understanding, an illustration of the IAM page uniform area is provided below:

As shown in Fig. 4, in the physical structure of the IAM page uniform area of the hugerow table, the first two bytes are status bytes. A bit value of 0 in the following bytes means that the corresponding uniform area is not allocated to the target table. In Fig. 4, the 6th bit of the 8th byte of the uniform area is 1, so the 62nd area is allocated to the hugerow table. Since one area consists of 8 consecutive pages, the 8 consecutive pages starting from page 488 all belong to the hugerow table. The uniform area bitmap of one IAM page has 7988 bytes, i.e., 63904 areas or 511232 pages. When a file has more pages than this, the remaining pages are mapped by a second IAM page, so one IAM page manages about 4 GB of data.
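The bitmap scan of steps S106 to S108 can be sketched in Java using the offsets and sizes given above: the bitmap starts at byte 192, has two reserved bytes and 7988 bitmap bytes, and each area has 8 pages. Bits are assumed to be numbered from the least significant bit. Under that assumption, the 6th bit of the 8th byte is bit index 5 of byte index 7, i.e., area index 7 × 8 + 5 = 61 (the 62nd area), whose first page is 61 × 8 = 488. This matches the Fig. 4 example. The class UniformAreaBitmap is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Scans the uniform area bitmap of an IAM page (steps S106 to S108). */
final class UniformAreaBitmap {
    static final int BITMAP_OFFSET = 192;  // fixed position stated in the description
    static final int RESERVED_BYTES = 2;   // status/reserved bytes at the start of the bitmap
    static final int BITMAP_BYTES = 7988;  // bitmap size stated in the description
    static final int PAGES_PER_AREA = 8;

    private UniformAreaBitmap() {}

    /**
     * Returns the first page number of every uniform area whose bit is 1.
     * Assumption: bits inside a byte are numbered from the least significant bit.
     *
     * @param startPage page number stored in the mapping area index
     */
    static List<Long> allocatedAreaStarts(byte[] iamPage, long startPage) {
        List<Long> starts = new ArrayList<>();
        int first = BITMAP_OFFSET + RESERVED_BYTES;
        int last = Math.min(iamPage.length, first + BITMAP_BYTES);
        for (int pos = first; pos < last; pos++) {
            int b = iamPage[pos] & 0xFF;
            if (b == 0) {
                continue; // no area in this byte is allocated to the target table
            }
            for (int bit = 0; bit < 8; bit++) {
                if ((b & (1 << bit)) != 0) {
                    long area = (long) (pos - first) * 8 + bit; // 0-based area index
                    starts.add(startPage + area * PAGES_PER_AREA);
                }
            }
        }
        return starts;
    }
}
```

The same constants reproduce the capacity stated above: 7988 × 8 = 63904 areas and 63904 × 8 = 511232 pages.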

S108, determining all uniform areas according to the offset stored in the mapping area index from step S102 and the non-zero bits from step S107.

S109, analyzing all data slots in the step S104 and all unified areas in the step S108 to obtain third object information, wherein the third object information comprises page numbers and file numbers.

S110, querying a user table page according to the second object information from step S105 or the third object information from step S109, and obtaining fourth object information from the user table page.

As shown in FIG. 5, IAM pages are stored in a data file in a chained manner, with each IAM page having pointers to the next IAM page and the previous IAM page. A data page can be uniquely identified by a file number and a page number. Therefore, all the user table pages can be quickly found by combining the chain structure of the IAM page and the second object information or the third object information, and a foundation is laid for recovering the user table data.
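Following the IAM page chain can be sketched as a simple pointer walk. This Java sketch assumes that the next-page pointers have already been read from the IAM page headers into a map keyed by page identifier, with 0 marking the end of the chain. The map representation and the class IamChain are hypothetical. A visited set stops the walk if a damaged file contains a pointer loop.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Follows the chain of IAM pages through their next-page pointers (hypothetical helper). */
final class IamChain {
    private IamChain() {}

    /**
     * @param first identifier of the first IAM page of the table
     * @param next  next-page pointer of each IAM page; a missing entry or 0 ends the chain
     * @return the IAM pages in chain order
     */
    static List<Long> walk(long first, Map<Long, Long> next) {
        List<Long> chain = new ArrayList<>();
        Set<Long> seen = new HashSet<>(); // guards against pointer loops in a damaged file
        long current = first;
        while (current != 0 && seen.add(current)) {
            chain.add(current);
            current = next.getOrDefault(current, 0L);
        }
        return chain;
    }
}
```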

S111, decoding the fourth object information to recover the user table data.

According to the technical scheme of this embodiment, the IAM page is parsed at the binary level so that it can still be used to track the table. This avoids traversing the whole file and greatly improves data recovery speed. In addition, recovering the MSSQL database does not depend on the state of MSSQL. There is no need to mount and start the system in the virtual disk or to use log files, which greatly simplifies the workflow and makes the method more flexible to use.
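As an illustration of step S111, the Java sketch below decodes the fixed-length columns of one row using the type IDs listed in Table 1: 56 is int, 62 is float, 175 is char and 239 is nchar. Several elements are assumptions: the four-byte row header in front of the fixed-length data, ISO-8859-1 as a stand-in for the code page of char columns, and the class RowDecoder itself. Text columns (type 35) are assumed not to be stored in the fixed-length part and are skipped here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Decodes the fixed-length columns of one user table row (hypothetical sketch of step S111). */
final class RowDecoder {
    /** Column metadata as in Table 1: field name, type ID and length in bytes. */
    static final class Column {
        final String name;
        final int typeId;
        final int length;

        Column(String name, int typeId, int length) {
            this.name = name;
            this.typeId = typeId;
            this.length = length;
        }
    }

    /** Assumption: two status bytes plus a 2-byte length field precede the fixed-length data. */
    static final int ROW_HEADER = 4;

    private RowDecoder() {}

    static Map<String, Object> decode(byte[] page, int rowOffset, List<Column> columns) {
        Map<String, Object> values = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        int pos = rowOffset + ROW_HEADER;
        for (Column c : columns) {
            switch (c.typeId) {
                case 56:  // int, 4 bytes
                    values.put(c.name, buf.getInt(pos));
                    break;
                case 62:  // float, 8-byte double
                    values.put(c.name, buf.getDouble(pos));
                    break;
                case 175: // char(n), single-byte text (code page assumed)
                    values.put(c.name, new String(page, pos, c.length, StandardCharsets.ISO_8859_1));
                    break;
                case 239: // nchar(n), UTF-16LE
                    values.put(c.name, new String(page, pos, c.length, StandardCharsets.UTF_16LE));
                    break;
                default:  // e.g. text (35): not in the fixed-length part, position unchanged
                    continue;
            }
            pos += c.length;
        }
        return values;
    }
}
```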

Example two

As shown in FIG. 6, in one embodiment, a method for IAM page based database recovery is provided, the method comprising the steps of:

S201, acquiring an IAM page from a data file;

S202, determining a mapping area index in the IAM page;

S203, determining all data slots in the IAM page according to the offset stored in the mapping area index;

S204, judging whether all the data slots are filled; if not, executing step S205; if all the data slots are filled, executing steps S206 to S209;

S205, parsing all the data slots from step S204 to obtain second object information, wherein the second object information comprises page numbers and file numbers;

S206, determining a uniform area map in the IAM page, and determining the reserved bytes in the uniform area map;

S207, traversing the IAM page and marking all non-zero bits;

S208, determining all uniform areas according to the offset stored in the mapping area index from step S202 and the non-zero bits from step S207;

S209, parsing all the data slots from step S204 and all the uniform areas from step S208 to obtain third object information, wherein the third object information comprises page numbers and file numbers;

S210, querying a user table page according to the second object information from step S205 or the third object information from step S209, and obtaining fourth object information from the user table page;

S211, decoding the fourth object information to recover the user table data;

S212, checking the IAM page and the user table page, and marking the pages that fail the check.

According to the technical scheme of this embodiment, SQL Server can compute a check value for each page and store the result in the page header. Whether a data page is damaged or tampered with can therefore be determined by recomputing the check value of the page and comparing it with the value stored in the page header. The operation is simple and effective and improves the accuracy of data recovery.
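The comparison in step S212 can be sketched in Java as follows. This description does not give the check algorithm, so the sketch takes it as a caller-supplied function. The header offset 60 of the stored check value and the class PageVerifier are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Marks pages whose recomputed check value differs from the value stored in the header (S212). */
final class PageVerifier {
    /** Assumption: the stored check value is a 4-byte little-endian integer at header offset 60. */
    static final int CHECKSUM_OFFSET = 60;

    private PageVerifier() {}

    /**
     * @param checksum the page check algorithm, supplied by the caller because it is not
     *                 specified in this description
     * @return indices of the pages that fail the check
     */
    static List<Integer> failingPages(List<byte[]> pages, ToIntFunction<byte[]> checksum) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            byte[] page = pages.get(i);
            int stored = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).getInt(CHECKSUM_OFFSET);
            if (checksum.applyAsInt(page) != stored) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```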

EXAMPLE III

As shown in FIG. 7, in one embodiment, a method for IAM page based database recovery is provided, the method comprising the steps of:

S301, acquiring an IAM page from a data file;

S302, determining a mapping area index in the IAM page;

S303, determining all data slots in the IAM page according to the offset stored in the mapping area index;

S304, judging whether all the data slots are filled; if not, executing step S305; if all the data slots are filled, executing steps S306 to S309;

S305, parsing all the data slots from step S304 to obtain second object information, wherein the second object information comprises page numbers and file numbers;

S306, determining a uniform area map in the IAM page, and determining the reserved bytes in the uniform area map;

S307, traversing the IAM page and marking all non-zero bits;

S308, determining all uniform areas according to the offset stored in the mapping area index from step S302 and the non-zero bits from step S307;

S309, parsing all the data slots from step S304 and all the uniform areas from step S308 to obtain third object information, wherein the third object information comprises page numbers and file numbers;

S310, querying a user table page according to the second object information from step S305 or the third object information from step S309, and obtaining fourth object information from the user table page;

S311, decoding the fourth object information to recover the user table data;

S312, checking the IAM page and the user table page, and marking the pages that fail the check;

S313, recording the first address offset of each piece of data in the user table data;

S314, comparing the first address offset of each piece of data in the user table data with the addresses in the row directory, and marking unmatched entries as deleted data.

In the technical scheme of this embodiment, the row directory records only undeleted data. When a row is deleted, its row directory entry becomes empty, so the row can no longer be located through the row directory. The range of the deleted data is therefore determined from the gap between the row directory entries of the rows before and after it, so that the deleted data can be recovered.
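Steps S313 and S314 can be sketched in Java as follows. The offsets of rows found by scanning the page are compared with the offsets still present in the row directory, and unmatched offsets are reported as deleted rows. The row directory layout is an assumption: 2-byte little-endian entries at the end of the page, with entry 0 in the last two bytes. The class DeletedRowFinder is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Steps S313 and S314: rows found on the page but missing from the row directory are deleted. */
final class DeletedRowFinder {
    static final int PAGE_SIZE = 8192;

    private DeletedRowFinder() {}

    /**
     * Reads the row directory. Assumption: entries are 2-byte little-endian offsets at the end
     * of the page, entry 0 in the last two bytes and later entries growing backwards.
     */
    static Set<Integer> rowDirectory(byte[] page, int slotCount) {
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        Set<Integer> offsets = new HashSet<>();
        for (int i = 0; i < slotCount; i++) {
            int offset = buf.getShort(PAGE_SIZE - 2 * (i + 1)) & 0xFFFF;
            if (offset != 0) {
                offsets.add(offset); // an empty entry belongs to a deleted row
            }
        }
        return offsets;
    }

    /** Returns the first address offsets recorded in S313 that have no matching directory entry. */
    static List<Integer> deletedRows(List<Integer> recordedOffsets, Set<Integer> directory) {
        List<Integer> deleted = new ArrayList<>();
        for (int offset : recordedOffsets) {
            if (!directory.contains(offset)) {
                deleted.add(offset);
            }
        }
        return deleted;
    }
}
```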

Example four

As shown in FIG. 8, in one embodiment, a system for IAM page based database recovery is provided, the system comprising:

an IAM page acquiring module 401, configured to acquire an IAM page from a data file;

a mapping area index determining module 402, configured to determine a mapping area index in the IAM page;

a data slot determining module 403, configured to determine all data slots in the IAM page according to the offset stored in the mapping area index;

a data slot judging module 404, configured to judge whether all the data slots are filled;

a first parsing module 405, configured to parse the data slot to obtain second object information, where the second object information includes a page number and a file number;

a reserved byte determining module 406, configured to determine a uniform area map in the IAM page, and determine reserved bytes in the uniform area map;

a traversal recording module 407, configured to traverse the IAM page and mark all non-zero bits;

a uniform region determining module 408, configured to determine all uniform regions according to the offsets stored in the mapping region indexes and the non-zero bits;

a second parsing module 409, configured to parse all the data slots and all the uniform areas to obtain third object information, where the third object information includes a page number and a file number;

a user table query module 410, configured to query a user table page according to the second object information or the third object information, and obtain fourth object information from the user table page;

the decoding module 411 is configured to perform decoding processing on the fourth object information to recover the user table data.

Optionally, on the basis of this embodiment, as shown in fig. 9, the IAM page obtaining module 401 includes:

a file acquiring unit 4001, configured to acquire a data file in a storage system;

the system table analyzing unit 4002 is configured to analyze a system table page in the data file according to a system table page identifier and a system table page organization structure, and acquire first object information of a table that a user needs to restore from the system table page, where the first object information includes a table name, a table field, and a table primary key;

an IAM page acquisition unit 4003 is configured to acquire an IAM page according to the first object information.

Optionally, on the basis of this embodiment, the system further includes:

and a check marking module 412, configured to check the IAM page and the user table page, and mark a page that fails to be checked.

Optionally, on the basis of this embodiment, the system further includes:

an offset recording module 413, configured to record a first address offset of each piece of data in the user table data;

and a comparison marking module 414, configured to compare the first address offset of each piece of data in the user table data with the addresses in the row directory, and to mark unmatched entries as deleted data.

According to the technical scheme of this embodiment:
- the IAM page acquisition module 401 acquires an IAM page from a data file;
- the mapping area index determining module 402 determines the mapping area index in the IAM page;
- the data slot determining module 403 determines all data slots in the IAM page according to the offset stored in the mapping area index;
- the data slot judging module 404 judges whether all the data slots are filled;
- the first parsing module 405 parses the data slots to obtain page numbers and file numbers;
- the user table query module 410 queries a user table page according to the second object information and obtains fourth object information from the user table page;
- the decoding module 411 decodes the fourth object information to recover the user table data.

This solves the prior-art problem that a user cannot quickly restore data through IAM pages when MSSQL is in an abnormal state. Based on binary parsing, the IAM page can still track the table, traversal of the whole file is avoided, and the workflow is simple, convenient and effective.

EXAMPLE five

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method for database recovery based on IAM pages as described in embodiments one to four.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for database recovery based on IAM pages, the method comprising the steps of:

S1, acquiring an IAM page from a data file;

S2, determining a mapping area index in the IAM page;

S7, traversing the IAM page, and marking all non-zero bits;

and S11, decoding the fourth object information to recover the user table data.

2. The method for IAM page based database recovery as claimed in claim 1, wherein the step S1 comprises:

S1001, acquiring a data file in a storage system;

S1003, acquiring an IAM page according to the first object information.

3. The method for IAM page based database recovery as claimed in claim 1, wherein after the step S11, the method further comprises:

4. The method for IAM page based database recovery as claimed in claim 1, wherein after the step S11, the method further comprises:

5. A system for IAM page based database recovery, the system comprising:

6. The IAM page based database recovery system of claim 5, wherein the IAM page acquisition module comprises:

the system table analysis unit is used for parsing a system table page in the data file according to a system table page identifier and the system table page organization structure, and for acquiring from the system table page the first object information of the table that the user needs to restore, wherein the first object information comprises a table name, table fields, the primary key of the table and an index of the IAM page;

7. The IAM page based database recovery system of claim 5, further comprising:

8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for IAM page-based database recovery as claimed in any one of claims 1 to 4.