WO2013163864A1 - Data persistence processing method and apparatus, and database system - Google Patents

Data persistence processing method and apparatus, and database system Download PDF

Info

Publication number
WO2013163864A1
WO2013163864A1 PCT/CN2012/083305 CN2012083305W WO2013163864A1 WO 2013163864 A1 WO2013163864 A1 WO 2013163864A1 CN 2012083305 W CN2012083305 W CN 2012083305W WO 2013163864 A1 WO2013163864 A1 WO 2013163864A1
Authority
WO
WIPO (PCT)
Prior art keywords
page
dirty
active group
disk
checkpoint
Prior art date
Application number
PCT/CN2012/083305
Other languages
English (en)
French (fr)
Inventor
威诺斯
彭勇飞
杨上德
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2013163864A1 publication Critical patent/WO2013163864A1/zh
Priority to US14/529,501 priority Critical patent/US20150058295A1/en

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 - Error detection or correction of the data by redundancy in operation
    • G06F11/1402 - Saving, restoring, recovering or retrying
    • G06F11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F11/1448 - Management of the data involved in backup or backup restore
    • G06F11/1451 - Management of the data involved in backup or backup restore by selection of backup contents
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 - Error detection or correction of the data by redundancy in operation
    • G06F11/1402 - Saving, restoring, recovering or retrying
    • G06F11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F11/1458 - Management of the backup or restore process
    • G06F11/1461 - Backup scheduling policy
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 - Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 - Database-specific techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 - Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/84 - Using snapshots, i.e. a logical point-in-time copy of the data

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a data persistence processing method, apparatus, and database system.
  • Compared with disk, memory can provide higher throughput and faster response.
  • Database systems usually store certain data, such as data that is read and written frequently, in memory first, in order to improve data read/write speed and implement caching.
  • the database system usually uses the page as the cache unit.
  • when a process modifies data in the cache, the page is marked as a dirty page by the kernel.
  • the database system writes the dirty page data to disk at a suitable time to keep the data in the cache consistent with the data on the disk.
  • the Checkpoint mechanism is the mechanism that enables the database to recover after a failure.
  • the traditional checkpoint mechanism, also known as the full checkpoint mechanism, dumps all dirty pages in the checkpoint queue to disk at once.
  • when this checkpoint mechanism is used for data persistence, in order to ensure the consistency of memory and disk data, the entire checkpoint queue has to be locked during the entire data persistence processing, that is, the user's normal transaction operations will be blocked for a relatively long time.
  • Embodiments of the present invention provide a data persistence processing method, apparatus, and database system, which are used to improve the efficiency of dirty page dumping to a certain extent.
  • an embodiment of the present invention provides a data persistence processing method, including: each time dirty pages are generated in the database system memory, adding the page identifier corresponding to each dirty page to the checkpoint queue; and determining an active group and a current group in the checkpoint queue;
  • at a preset checkpoint occurrence timing, the dirty pages corresponding to the respective page identifiers included in the active group are sequentially dumped to the data file of the disk; if the dirty page dumping related to the active group is completed, the next active group is determined in the checkpoint queue, and at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group are sequentially dumped to the data file of the disk;
  • the embodiment of the present invention further provides a data persistence processing apparatus, including: a checkpoint queue maintenance unit, configured to add, each time dirty pages are generated in the database system memory, the page identifier corresponding to each generated dirty page to the checkpoint queue;
  • a grouping processing unit, configured to determine an active group and a current group in the checkpoint queue, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to the disk form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group; and a dirty page bulk dump unit, configured to sequentially dump, at a preset checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the active group to the data file of the disk;
  • the grouping processing unit is further configured to: if the dirty page dumping related to the active group is completed, determine the next active group in the checkpoint queue;
  • the dirty page bulk dump unit is further configured to sequentially dump, at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group to the data file of the disk.
  • an embodiment of the present invention further provides a database system, including: a disk file, an in-memory database, and a database management system, where the database management system is configured to manage data stored in the in-memory database, the database management system Including the above data persistence processing apparatus, the data persistence processing apparatus is configured to dump data stored in an in-memory database into the disk file.
  • the data persistence processing method and apparatus and the database system provided by the embodiments of the present invention dynamically maintain a checkpoint queue, take the page identifiers in the checkpoint queue that correspond to the multiple dirty pages currently to be dumped to the disk as an active group, and take the group into which dirty pages newly added to the checkpoint queue are inserted as the current group.
  • each time a checkpoint occurs, the dirty pages corresponding to the respective page identifiers included in the active group are sequentially dumped to the data file of the disk, and after the dumping of the dirty pages corresponding to the respective page identifiers included in the active group is completed, the next active group is determined in the checkpoint queue, so that the dirty pages corresponding to the page identifiers included in the next active group are sequentially dumped to the data file of the disk at the next checkpoint occurrence timing.
  • through this loop processing, dirty pages are dumped to the disk in batches according to the checkpoint occurrence timing, thereby improving the efficiency of dirty page dumping on the basis that the dirty page dumping has little influence on normal transaction operations.
  • FIG. 1 is a flowchart of a data persistence processing method according to an embodiment of the present invention
  • FIG. 2a is an example 1 of a checkpoint queue grouping according to an embodiment of the present invention
  • FIG. 2b is an example of adding a page identifier to a checkpoint queue according to an embodiment of the present invention
  • FIG. 2c is a second example of a checkpoint queue grouping according to an embodiment of the present invention
  • FIG. 2d is a third example of a checkpoint queue grouping according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an example of correspondence between page identifiers, atomic operations, and log buffer addresses of a checkpoint queue according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a data persistence processing apparatus according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of another data persistence processing apparatus according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a database system according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a data persistence processing method according to an embodiment of the present invention. As shown in FIG. 1, the data persistence processing method provided in this embodiment includes:
  • a checkpoint queue is dynamically maintained in the database system, and the checkpoint queue is used to cache the page identifier corresponding to each dirty page generated in the database system memory. Each time dirty pages are generated in the database system memory, the page identifiers of the generated dirty pages can be added to the checkpoint queue in chronological order. If the data of the dirty page corresponding to any page identifier included in the checkpoint queue has been dumped from memory to the data file of the disk, the page identifier of that dirty page is automatically deleted from the checkpoint queue.
  • the page identifiers included in the checkpoint queue can be grouped to enable batch group dumping of dirty pages. For example, the page identifiers in the checkpoint queue that respectively correspond to the dirty pages currently needing to be dumped to the disk can be grouped into an active group; the group into which a dirty page newly added to the checkpoint queue is inserted is the current group.
  • in an optional implementation, an active group flag may be attached to each page identifier included in the active group. After this processing, the page identifiers included in the checkpoint queue fall into two categories: one category is the page identifiers marked with the active group flag, that is, the page identifiers included in the active group, whose corresponding dirty pages are the dirty pages that need to be dumped from memory to the disk; the other category is the page identifiers not marked with the active group flag, that is, all page identifiers in the checkpoint queue other than those included in the active group, none of which is marked with the active group flag.
  • after the active group is determined, an optional example of the current group in the checkpoint queue is shown in FIG. 2a. If new dirty pages are then generated in the database system, they are sequentially added to the checkpoint queue in chronological order, and the group into which the newly added page identifiers are inserted is the current group; an optional example is shown in FIG. 2b.
  • FIGS. 2a and 2b take the first four page identifiers added to the checkpoint queue as the active group; this manner of determining the active group is only an exemplary description and should not be construed as limiting the technical essence of the present invention.
  • the dirty pages corresponding to the page identifiers included in the active group may be sequentially transferred to the data file of the disk at the checkpoint occurrence timing.
  • the checkpoint occurrence timing may be determined in advance, for example:
  • the checkpoint occurrence timing may be determined from the perspective of atomic operations, to reduce the impact of the checkpoint mechanism on normal transaction operations.
  • after the dirty page corresponding to any page identifier has been dumped to the data file of the disk, that page identifier can be automatically deleted from the checkpoint queue, which is equivalent to automatically deleting the page identifier from the active group.
  • after the dirty pages corresponding to all page identifiers included in the active group have been dumped, the next active group may be determined in the checkpoint queue, which is equivalent to regrouping among the remaining page identifiers of the checkpoint queue; an example is shown in FIG. 2c, where the dashed portion indicates the page identifiers included in the previous active group that have been deleted from the checkpoint queue.
  • if the number of page identifiers remaining in the checkpoint queue is smaller than the preset number of page identifiers that an active group needs to include, all remaining page identifiers may be grouped into the active group. For example, as shown in FIG. 2d, the active group is preset to include 4 page identifiers, while only one page identifier of a dirty page not yet dumped remains in the checkpoint queue, denoted P9. In this case, P9 can directly serve as the page identifier included in a new active group.
  • after the next active group is determined, the dirty pages corresponding to the page identifiers included in that active group may be dumped to the data file of the disk when a new checkpoint occurs; the page identifiers of new dirty pages generated in memory after the grouping are added to the current group. The specific implementation is similar to step 12 and is not described here again.
  • the data persistence processing method provided in this embodiment dynamically maintains a checkpoint queue, takes the page identifiers in the checkpoint queue that correspond to the multiple dirty pages currently to be dumped to the disk as the active group, and takes the group into which dirty pages newly added to the checkpoint queue are inserted as the current group.
  • each time a checkpoint occurs, the dirty pages corresponding to the page identifiers included in one active group are sequentially dumped to the data file of the disk, and after the dumping of the dirty pages corresponding to the page identifiers included in that active group is completed, the next active group is determined in the checkpoint queue, so that at the next checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group are sequentially dumped to the data file of the disk.
  • through this loop processing, dirty pages are dumped to the disk in batches according to the checkpoint occurrence timing, thereby improving the efficiency of dirty page dumping on the basis that the dirty page dumping has little influence on normal transaction operations.
  • optionally, if it is determined that the dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, it is determined whether that page identifier belongs to the active group; if yes, a mirror page of the dirty page corresponding to that page identifier is created before the dirty page is dumped to the data file of the disk; otherwise, no mirror page of the dirty page corresponding to that page identifier is created. After the creation of the mirror page is completed, when it is the turn of the dump operation of the dirty page corresponding to that page identifier, the mirror page corresponding to that page identifier is dumped to the data file of the disk.
  • this processing reduces the memory space required for creating mirror pages, because a mirror page does not need to be created for the dirty page corresponding to every page identifier in the checkpoint queue; a corresponding mirror page is created only for a page identifier in the active group that needs to be modified, while data consistency between memory and disk is ensured.
  • one atomic operation may involve multiple dirty pages, and one active group may include dirty pages involved in multiple atomic operations.
  • before the dirty pages corresponding to the page identifiers included in the active group are dumped to the data file of the disk, the logs cached in the memory log buffer for the atomic operations associated with the active group may be dumped to the log file of the disk, for example: determining the atomic operations associated with the page identifiers included in the current active group; obtaining, in the log buffer of the database memory, the log buffer addresses associated with the determined atomic operations; and dumping the logs cached at the obtained log buffer addresses to the log file of the disk.
  • after the dumping of the corresponding logs is completed, the dirty pages corresponding to the page identifiers included in the active group are dumped to the data file of the disk.
  • in the example shown in FIG. 3, P represents a page identifier and A represents an atomic operation.
  • the current active group of the checkpoint queue includes the page identifiers P1-P6, where P1, P2, and P14 are the page identifiers of the dirty pages involved in atomic operation A1; P1 and P2 belong to the active group, and P14 belongs to the non-active group. The latest data of the dirty pages corresponding to P1, P2, and P14 is cached in the memory log buffer at the buffer addresses corresponding to atomic operation A1.
  • in this scenario, when the checkpoint occurrence timing arrives, for example, when no atomic operation is currently running in the database system memory, the log buffer addresses associated with atomic operation A1 can be obtained, and the logs cached at the obtained log buffer addresses, that is, the logs corresponding to P1, P2, and P14, are dumped to the log file of the disk; after that, the dirty pages corresponding to P1 and P2 are sequentially dumped to the data file of the disk.
  • after the dirty pages corresponding to the page identifiers P1-P6 included in the active group have all been dumped to the data file of the disk, the next active group is re-determined among the remaining page identifiers of the checkpoint queue, and similar operations are performed when the next checkpoint occurrence timing arrives. This helps ensure the correctness of the recovered data when the database system performs fault recovery based on the disk.
  • for example, atomic operation A1 involves the dirty pages identified by P1, P2, and P14.
  • suppose atomic operation A1 is: transfer 100 yuan from user account U1 to user account U2, where the dirty pages corresponding to P1 and P2 correspond to the operation of deducting 100 yuan from user account U1 in the atomic operation, and the dirty page corresponding to P14 corresponds to the operation of adding 100 yuan to user account U2 in the atomic operation.
  • the log buffer records the balances of the user accounts U1 and U2: the balance of U1 corresponding to P1 is 100 and the balance of U2 is 0; the balance of U1 corresponding to P2 is 0 and the balance of U2 is 0; the balance of U1 corresponding to P14 is 0 and the balance of U2 is 100.
  • if the database system fails after the dirty pages corresponding to P1 and P2 have been dumped to the data file of the disk, then when the failed database system is recovered based on the information stored on the disk, the data involved in atomic operation A1 can be recovered according to the data of P1 and P2 in the data file of the disk; at that point the recovered data shows that the balance of user account U1 is 0 and the balance of user account U2 is 0.
  • afterwards, based on the logs involved in atomic operation A1 in the disk log file, the corresponding data related to atomic operation A1 recovered in the database system is updated; for example, the log corresponding to P14 stored in the log file of the disk indicates that the balance of user account U1 is 0 and the balance of user account U2 is 100, so the balance of user account U2 in the recovered data is updated to 100, thereby ensuring the correctness of the recovered data when the database system performs fault recovery based on the disk.
  • after the dirty page dump operation of an active group is completed and the next active group is determined, the log file starting points of the atomic operations associated with the page identifiers included in the next active group may be obtained;
  • the log file starting point of any atomic operation is used to indicate the storage location, in the log file, of the log generated when that atomic operation starts running; the logs included in the log file are saved in chronological order. The minimum of the obtained log file starting points is set as the current database recovery point.
  • the database recovery point is used to indicate: if the database system fails before it finishes dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, the starting point in the log file from which the logs required for recovery are read when the failed database system is recovered. In this way, the logs needed for database recovery can be quickly determined according to the recovery point, improving the recovery speed of the database system.
  • for example, in FIG. 3, the log file starting points of the atomic operations A2, A3, and A4 associated with the next active group G2 are obtained, the minimum of the obtained starting points is taken, and that minimum is used as the current database recovery point. If the database system fails during the dirty page dump operation of active group G2, the current database recovery point serves as the starting point in the log file from which the logs required for recovery are read when the database system is recovered, and the logs in the log file after the recovery point are determined to be the logs needed for database recovery.
  • the foregoing storage medium includes: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like.
  • FIG. 4 is a schematic structural diagram of a data persistence processing apparatus according to an embodiment of the present invention.
  • the data persistence processing apparatus 40 shown in FIG. 4 includes: a checkpoint queue maintenance unit 41, a grouping processing unit 42, and a dirty page bulk dump unit 43.
  • the checkpoint queue maintenance unit 41 can be used to add, each time dirty pages are generated in the database system memory, the page identifier corresponding to each generated dirty page to the checkpoint queue;
  • the grouping processing unit 42 is configured to determine an active group and a current group in the checkpoint queue, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to the disk form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group;
  • the dirty page bulk dump unit 43 can be used to sequentially dump, at the preset checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the active group to the data file of the disk.
  • the grouping processing unit 42 is further operable to determine the next active group in the checkpoint queue if the dirty page dumping related to the active group is completed.
  • the dirty page bulk dump unit 43 is further configured to sequentially dump, at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group to the data file of the disk.
  • the checkpoint occurrence timing includes: no atomic operation is currently running in the database system memory.
  • with this apparatus, dirty pages can be dumped to the data file of the disk in batches grouped by checkpoint occurrence timing, thereby improving the efficiency of dirty page dumping while minimizing the influence of the checkpoint execution process on the user's normal transaction processing.
  • the data persistence processing device 40 may further include: a mirror page creation unit 44.
  • the mirror page creation unit 44 may be configured to: after the active group is determined, if it is determined that the dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, determine whether that page identifier belongs to the active group; if yes, create a mirror page of the dirty page corresponding to that page identifier before the dirty page is dumped to the data file of the disk; otherwise, not create a mirror page of the dirty page corresponding to that page identifier.
  • since a mirror page of a dirty page needs to be created only when the dirty page corresponding to a page identifier included in the current active group is to be modified, the storage space required for storing mirror pages is saved.
  • when the dirty page bulk dump unit 43 needs to dump the dirty page corresponding to any page identifier from memory to the disk, if a mirror page has been created for that page identifier, the mirror page corresponding to that page identifier is dumped from memory to the data file of the disk.
  • the data persistence processing device 40 may further include: a log file dump processing unit 45.
  • the log file dump processing unit 45 is configured to determine the atomic operations associated with the page identifiers included in the active group; obtain, in the log buffer of the database memory, the log buffer addresses associated with the determined atomic operations; and dump the logs cached at the obtained log buffer addresses to the log file of the disk. In this way, it is beneficial to ensure the correctness of the recovered data when the database system performs fault recovery based on the disk.
  • the data persistence processing device 40 may further include: a database recovery point setting module 46.
  • the database recovery point setting module 46 is configured to obtain the log file starting points of the atomic operations associated with the page identifiers included in the next active group, where the log file starting point of any atomic operation is used to indicate the storage location, in the log file, of the log generated when that atomic operation starts running, and the logs included in the log file are saved in chronological order; and to set the minimum of the obtained log file starting points as the current database recovery point.
  • the database recovery point is used to indicate: if the database system fails before it finishes dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, the starting point in the log file from which the logs required for recovery are read when the failed database system is recovered. In this way, the logs required for database recovery can be quickly determined according to the recovery point to improve the recovery speed of the database system.
  • the data persistence processing apparatus provided by the embodiment of the present invention is used to implement the data persistence processing method provided by the embodiment of the present invention.
  • the working mechanism of the present invention can be referred to the corresponding description of the foregoing method embodiment of the present invention, and details are not described herein again.
  • an embodiment of the present invention further provides a database system, including a disk file 53, an in-memory database 52, and a database management system 51.
  • the database management system 51 is for managing data stored in the in-memory database 52, and the database management system 51 includes any of the above-described data persistence processing apparatuses 40 for dumping data stored in the in-memory database 52 into the disk file 53, that is, the data file stored on the disk.
  • in this way, dirty pages are dumped to the disk in batches grouped by checkpoint occurrence timing, which improves the efficiency of dirty page dumping while the dumping has little impact on normal transaction operations.
  • for the specific module division and functional procedures of the data persistence processing apparatus 40, reference may be made to the foregoing embodiments, and details are not described herein again.
  • inventive arrangements may be described in the general context of computer-executable instructions executed by a computer, such as a program element.
  • program elements include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • inventive arrangements can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program units can be located in both local and remote computer storage media including storage devices.
  • each functional unit in each embodiment of the present invention may be integrated into one unit, or each functional unit may exist physically separately, or two or more functional units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit, or in the form of a hardware plus software functional unit.
  • the various embodiments in the specification are described in a progressive manner, and the same or similar parts of the various embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments.
  • the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
  • the device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
  • modules in the apparatus in the embodiments may be distributed in the apparatus of the embodiment as described in the embodiments, or may be correspondingly changed in one or more apparatuses different from the embodiment.
  • the units of the above embodiments may be combined into one unit, or may be further split into a plurality of sub-modules.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a separate product.
  • the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data persistence processing method and apparatus, and a database system. The method includes: each time dirty pages are generated in the database system memory, adding the page identifiers respectively corresponding to the generated dirty pages to a checkpoint queue; determining an active group and a current group in the checkpoint queue, and at a preset checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the active group to the disk, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to the disk form the active group, and the group into which a new dirty page added to the checkpoint queue is inserted is the current group; and if the dirty page dumping related to the active group is completed, determining the next active group in the checkpoint queue, and at the checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk. The present invention improves the efficiency of dirty page dumping on the basis that the dirty page dumping has little impact on normal transaction operations.

Description

Data persistence processing method and apparatus, and database system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data persistence processing method and apparatus, and a database system.
Background
Compared with disk, memory can provide higher throughput and faster response. A database system usually stores certain data, such as data that is read and written frequently, in memory first, so as to improve data read/write speed and implement caching. The database system usually uses the page as the caching unit; when a process modifies data in the cache, the page is marked as a dirty page (Dirty Page) by the kernel, and the database system writes the data of the dirty page to disk at a suitable time to keep the data in the cache consistent with the data on the disk.
The checkpoint (Checkpoint) mechanism is a mechanism that enables a database to recover after a failure occurs. The traditional checkpoint mechanism, also known as the full checkpoint mechanism, dumps all dirty pages in the checkpoint queue to disk at once. When this checkpoint mechanism is used for data persistence processing, the entire checkpoint queue needs to be locked during the entire data persistence processing in order to ensure consistency between memory and disk data; that is, the user's normal transaction operations will be blocked for a relatively long period of time.
To overcome the drawback that the traditional full checkpoint mechanism affects the execution of normal transactions, a mechanism called the "fuzzy checkpoint" was proposed. The fuzzy checkpoint mechanism aims to flush generated dirty pages to disk gradually, thereby reducing the impact of data persistence processing on the user's normal transaction operations; however, the prior art still lacks an effective solution as to how this should be implemented.
Summary of the Invention
Embodiments of the present invention provide a data persistence processing method and apparatus, and a database system, so as to improve the efficiency of dirty page dumping to a certain extent.
In one aspect, an embodiment of the present invention provides a data persistence processing method, including: each time dirty pages are generated in the database system memory, adding the page identifiers respectively corresponding to the generated dirty pages to a checkpoint queue;
determining an active group and a current group in the checkpoint queue, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to disk form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group;
at a preset checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the active group to a data file on the disk; and
if the dirty page dumping related to the active group is completed, determining the next active group in the checkpoint queue, and at the checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the next active group to the data file on the disk.
In another aspect, an embodiment of the present invention further provides a data persistence processing apparatus, including: a checkpoint queue maintenance unit, configured to add, each time dirty pages are generated in the database system memory, the page identifiers respectively corresponding to the generated dirty pages to a checkpoint queue;
a grouping processing unit, configured to determine an active group and a current group in the checkpoint queue, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to disk form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group; and a dirty page bulk dump unit, configured to sequentially dump, at a preset checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the active group to the data file on the disk;
where the grouping processing unit is further configured to determine the next active group in the checkpoint queue if the dirty page dumping related to the active group is completed; and
the dirty page bulk dump unit is further configured to sequentially dump, at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group to the data file on the disk.
In yet another aspect, an embodiment of the present invention further provides a database system, including a disk file, an in-memory database, and a database management system, where the database management system is configured to manage data stored in the in-memory database, the database management system includes the foregoing data persistence processing apparatus, and the data persistence processing apparatus is configured to dump data stored in the in-memory database into the disk file.
The data persistence processing method and apparatus and the database system provided by the embodiments of the present invention dynamically maintain a checkpoint queue, take the page identifiers in the checkpoint queue that correspond to the multiple dirty pages currently to be dumped to disk as the active group, and take the group into which newly added dirty pages of the checkpoint queue are inserted as the current group. Each time a checkpoint occurs, the dirty pages corresponding to the page identifiers included in one active group are sequentially dumped to the data file on the disk; after that dumping is completed, the next active group is determined in the checkpoint queue, so that at the next checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group are sequentially dumped to the data file on the disk. Through such loop processing, dirty pages are dumped to disk in batches grouped by checkpoint occurrence timing, which improves the efficiency of dirty page dumping while the dumping has little impact on normal transaction operations.
Brief Description of the Drawings
The accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a flowchart of a data persistence processing method according to an embodiment of the present invention; FIG. 2a is a first example of checkpoint queue grouping according to an embodiment of the present invention;
FIG. 2b is an example of adding page identifiers to the checkpoint queue according to an embodiment of the present invention; FIG. 2c is a second example of checkpoint queue grouping according to an embodiment of the present invention;
FIG. 2d is a third example of checkpoint queue grouping according to an embodiment of the present invention;
FIG. 3 is an example of the correspondence among the page identifiers of the checkpoint queue, atomic operations, and log buffer addresses according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data persistence processing apparatus according to an embodiment of the present invention; FIG. 5 is a schematic structural diagram of another data persistence processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a database system according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a data persistence processing method according to an embodiment of the present invention. As shown in FIG. 1, the data persistence processing method provided in this embodiment includes:
11: Each time dirty pages are generated in the database system memory, add the page identifiers respectively corresponding to the generated dirty pages to the checkpoint queue.
A checkpoint queue is dynamically maintained in the database system, and the checkpoint queue is used to cache the page identifiers corresponding to the dirty pages generated in the database system memory. Each time dirty pages are generated in the database system memory, the page identifiers of the corresponding dirty pages may be added to the checkpoint queue in chronological order. If the data of the dirty page corresponding to any page identifier included in the checkpoint queue has been dumped from memory to the data file on the disk, the page identifier of that dirty page is automatically deleted from the checkpoint queue.
12: Determine an active group (Active Group) and a current group (Current Group) in the checkpoint queue, and at a preset checkpoint occurrence timing, sequentially dump the dirty pages corresponding to the page identifiers included in the active group to the data file on the disk. The page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to disk form the active group; the group into which a dirty page newly added to the checkpoint queue is inserted is the current group.
The page identifiers included in the checkpoint queue may be grouped so that dirty pages can be dumped in grouped batches. For example, the page identifiers in the checkpoint queue that respectively correspond to the dirty pages that currently need to be dumped to disk may form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group. In an optional implementation, an active group flag may be attached to each page identifier included in the active group. After such processing, the page identifiers included in the checkpoint queue fall into two categories: one category is the page identifiers marked with the active group flag, that is, the page identifiers included in the active group, whose corresponding dirty pages are the dirty pages that currently need to be dumped from memory to disk; the other category is the page identifiers not marked with the active group flag, that is, all page identifiers in the checkpoint queue other than those included in the active group, none of which carries the active group flag. After the active group is determined, an optional example of the current group in the checkpoint queue is shown in FIG. 2a. At this time, if new dirty pages are generated in the database system, the newly generated dirty pages are added to the checkpoint queue in chronological order, and the group into which the newly added page identifiers are inserted is the current group; an optional example is shown in FIG. 2b. FIGS. 2a and 2b take the first four page identifiers added to the checkpoint queue as the active group; this manner of determining the active group is merely an exemplary description and should not be construed as limiting the technical essence of the present invention. After the current active group is determined, the dirty pages corresponding to the page identifiers included in the active group may be sequentially dumped to the data file on the disk at the checkpoint occurrence timing. The checkpoint occurrence timing may be determined in advance; for example, it may be determined from the perspective of atomic operations, so as to reduce the impact of the checkpoint mechanism on normal transaction operations.
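As an editorial illustration of this grouping, the following minimal Python sketch models the checkpoint queue with an active-group flag attached to the page identifiers selected for the next dump. It is a simplified reading of the embodiment rather than an implementation from the patent; all names (CheckpointQueue, active_group_size, determine_active_group) are assumptions introduced here.

```python
from collections import OrderedDict

class CheckpointQueue:
    """Minimal model of the checkpoint queue described above (illustrative only)."""

    def __init__(self, active_group_size=4):
        # page_id -> {"active": bool}; insertion order preserves chronological order
        self.entries = OrderedDict()
        self.active_group_size = active_group_size

    def add_dirty_page(self, page_id):
        # A newly generated dirty page lands in the current group,
        # i.e. it is not flagged as belonging to the active group.
        if page_id not in self.entries:
            self.entries[page_id] = {"active": False}

    def determine_active_group(self):
        # Flag the oldest page identifiers (up to the preset group size)
        # as the active group; everything else stays in the current group.
        active = []
        for page_id, meta in self.entries.items():
            if len(active) == self.active_group_size:
                break
            meta["active"] = True
            active.append(page_id)
        return active

    def remove(self, page_id):
        # Called once the dirty page has been dumped to the data file on disk.
        self.entries.pop(page_id, None)

queue = CheckpointQueue(active_group_size=4)
for pid in ["P1", "P2", "P3", "P4", "P5", "P6"]:
    queue.add_dirty_page(pid)
print(queue.determine_active_group())  # ['P1', 'P2', 'P3', 'P4'], as in FIG. 2a
queue.add_dirty_page("P7")              # joins the current group, as in FIG. 2b
```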
After the dirty page corresponding to any page identifier has been dumped to the data file on the disk, that page identifier may be automatically deleted from the checkpoint queue, which is equivalent to automatically deleting the page identifier from the active group.
13: If the dirty page dumping related to the active group is completed, determine the next active group in the checkpoint queue, and at the checkpoint occurrence timing, sequentially dump the dirty pages corresponding to the page identifiers included in the next active group to the data file on the disk.
After the dirty pages corresponding to all page identifiers included in the active group have been dumped to the data file on the disk, the next active group may be determined in the checkpoint queue, which is equivalent to regrouping among the remaining page identifiers of the checkpoint queue. An example is shown in FIG. 2c, where the dashed portion indicates the page identifiers included in the previous active group that have been deleted from the checkpoint queue.
If the number of page identifiers remaining in the checkpoint queue is smaller than the preset number of page identifiers that an active group is set to include, all remaining page identifiers of the checkpoint queue may be grouped into the active group. For example, as shown in FIG. 2d, an active group is preset to include 4 page identifiers, while the number of page identifiers of dirty pages whose dumping has not yet been completed in the checkpoint queue is 1, denoted P9; in this case, P9 may directly serve as the page identifier included in a new active group.
After the next active group is determined, the dirty pages corresponding to the page identifiers included in that active group may be dumped to the data file on the disk at the timing when a new checkpoint occurs; the page identifiers of new dirty pages generated in memory after the grouping are added to the current group. The specific implementation is similar to step 12 and is not described here again.
If the number of page identifiers remaining in the checkpoint queue is 0, that is, the checkpoint queue is empty, the foregoing steps 12 and 13 are not performed; when new page identifiers have been added to the checkpoint queue and a new checkpoint occurrence timing arrives, the foregoing steps 12 and 13 are performed again.
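Building on the CheckpointQueue sketch above, the following hedged sketch walks one checkpoint cycle through steps 12 and 13, including the empty-queue case and the case where fewer identifiers remain than the preset group size. The callables dump_page_to_disk and no_atomic_operation_running stand in for behavior the patent leaves to the implementation.

```python
def run_one_checkpoint(queue, dump_page_to_disk, no_atomic_operation_running):
    """One checkpoint cycle over the CheckpointQueue sketched above (steps 12 and 13)."""
    if not queue.entries:
        return []                                  # empty queue: steps 12 and 13 are skipped
    active_group = queue.determine_active_group()  # may be smaller than the preset size
    # Preset checkpoint occurrence timing, e.g. no atomic operation is
    # currently running in the database system memory.
    assert no_atomic_operation_running()
    for page_id in active_group:                   # dump the active group in order
        dump_page_to_disk(page_id)
        queue.remove(page_id)                      # identifier auto-removed once dumped
    return active_group

dumped = []
q = CheckpointQueue(active_group_size=4)
for pid in ["P1", "P2", "P3", "P4", "P5", "P6"]:
    q.add_dirty_page(pid)
print(run_one_checkpoint(q, dumped.append, lambda: True))  # ['P1', 'P2', 'P3', 'P4']
print(run_one_checkpoint(q, dumped.append, lambda: True))  # ['P5', 'P6']
print(run_one_checkpoint(q, dumped.append, lambda: True))  # []  (queue now empty)
```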
The data persistence processing method provided in this embodiment dynamically maintains a checkpoint queue, takes the page identifiers in the checkpoint queue that correspond to the multiple dirty pages currently to be dumped to disk as the active group, and takes the group into which newly added dirty pages of the checkpoint queue are inserted as the current group. Each time a checkpoint occurs, the dirty pages corresponding to the page identifiers included in one active group are sequentially dumped to the data file on the disk; after that dumping is completed, the next active group is determined in the checkpoint queue, so that at the next checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group are sequentially dumped to the data file on the disk. Through such loop processing, dirty pages are dumped to disk in batches grouped by checkpoint occurrence timing, which improves the efficiency of dirty page dumping while the dumping has little impact on normal transaction operations.
On the basis of the foregoing technical solution, optionally, if it is determined that the dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, it is determined whether that page identifier belongs to the active group; if yes, a mirror page of the dirty page corresponding to that page identifier is created before the dirty page corresponding to that page identifier is dumped to the data file on the disk; otherwise, no mirror page of the dirty page corresponding to that page identifier is created. After the creation of the mirror page is completed, when it is the turn of the dump operation of the dirty page corresponding to that page identifier, the mirror page corresponding to that page identifier is dumped to the data file on the disk. Because this processing does not need to create a mirror page for the dirty page corresponding to every page identifier in the checkpoint queue, but creates a corresponding mirror page only for a page identifier in the active group that is determined to need modification, the memory space required for creating mirror pages is reduced while data consistency between memory and disk is ensured.
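The mirror-page rule above can be read as copy-on-write restricted to the active group. The sketch below is one possible rendering under that reading; buffer_pool, active_group and mirror_pages are illustrative in-memory structures assumed here, not structures named by the patent.

```python
def modify_page(page_id, new_data, buffer_pool, active_group, mirror_pages):
    # Create a mirror page only if the page belongs to the active group and is
    # about to be modified before its dump completes; other pages get no mirror.
    if page_id in active_group and page_id not in mirror_pages:
        mirror_pages[page_id] = dict(buffer_pool[page_id])  # pre-modification image
    buffer_pool[page_id].update(new_data)

def page_image_to_dump(page_id, buffer_pool, mirror_pages):
    # When it is this page's turn to be dumped, prefer the mirror page if one exists.
    return mirror_pages.pop(page_id, buffer_pool[page_id])

buffer_pool = {"P1": {"value": 100}, "P9": {"value": 1}}
active_group, mirror_pages = {"P1"}, {}
modify_page("P1", {"value": 0}, buffer_pool, active_group, mirror_pages)
modify_page("P9", {"value": 2}, buffer_pool, active_group, mirror_pages)    # no mirror created
print(page_image_to_dump("P1", buffer_pool, mirror_pages))  # {'value': 100}, consistent image
print(page_image_to_dump("P9", buffer_pool, mirror_pages))  # {'value': 2}
```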
On the basis of the foregoing technical solution, optionally, one atomic operation may involve multiple dirty pages, and one active group may include dirty pages involved in multiple atomic operations. Before the dirty pages corresponding to the page identifiers included in the active group are dumped to the data file on the disk, the logs cached in the memory log buffer for the atomic operations associated with the active group may be dumped to the log file on the disk, for example: determining the atomic operations associated with the page identifiers included in the current active group; obtaining, in the log buffer of the database memory, the log buffer addresses associated with the determined atomic operations; and dumping the logs cached at the obtained log buffer addresses to the log file on the disk. After the dumping of the corresponding logs is completed, the dirty pages corresponding to the page identifiers included in the active group are dumped to the data file on the disk.
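The ordering described here, logs of the associated atomic operations first and dirty pages second, can be sketched as follows. The dictionaries standing in for the log buffer, log file and data file are assumptions made purely for the example.

```python
def dump_active_group(active_group, page_to_atomic_ops, log_buffer, log_file,
                      data_file, buffer_pool):
    # 1. Determine the atomic operations associated with the active group's page identifiers.
    ops = {op for page_id in active_group for op in page_to_atomic_ops.get(page_id, ())}
    # 2. Obtain the log buffer entries associated with those operations and
    #    dump the cached logs to the log file on disk.
    for op in sorted(ops):
        log_file.extend(log_buffer.pop(op, []))
    # 3. Only after the logs are on disk, dump the dirty pages to the data file.
    for page_id in active_group:
        data_file[page_id] = buffer_pool[page_id]

log_buffer = {"A1": ["A1: deduct 100 from U1", "A1: add 100 to U2"]}
log_file, data_file = [], {}
buffer_pool = {"P1": "page-1 image", "P2": "page-2 image"}
dump_active_group(["P1", "P2"], {"P1": ["A1"], "P2": ["A1"]},
                  log_buffer, log_file, data_file, buffer_pool)
print(log_file)    # A1's logs reach the log file before the pages reach the data file
print(data_file)   # {'P1': 'page-1 image', 'P2': 'page-2 image'}
```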
The following uses FIG. 3 as an example for description. In the example shown in FIG. 3, P denotes a page identifier and A denotes an atomic operation. The current active group of the checkpoint queue includes the page identifiers P1-P6, where P1, P2, and P14 are the page identifiers of the dirty pages involved in atomic operation A1; P1 and P2 belong to the active group, and P14 belongs to the non-active group. The latest data of the dirty pages corresponding to P1, P2, and P14 is cached in the memory log buffer at the buffer addresses corresponding to atomic operation A1. In this scenario, when the checkpoint occurrence timing arrives, for example, when no atomic operation is currently running in the database system memory, the log buffer addresses associated with atomic operation A1 can be obtained, and the logs cached at the obtained log buffer addresses, that is, the logs corresponding to P1, P2, and P14, are dumped to the log file on the disk; after that, the dirty pages corresponding to P1 and P2 are sequentially dumped to the data file on the disk. After the dirty pages corresponding to the page identifiers P1-P6 included in the active group have all been dumped to the data file on the disk, the next active group is re-determined among the remaining page identifiers of the checkpoint queue, and similar operations are performed when the next checkpoint occurrence timing arrives. Such processing helps ensure the correctness of the recovered data when the database system performs fault recovery based on the disk.
FIG. 3 may again be used as an example. For example, atomic operation A1 involves the dirty pages identified by P1, P2, and P14. Suppose atomic operation A1 is: transfer 100 yuan from user account U1 to user account U2, where the dirty pages corresponding to P1 and P2 correspond to the operation of deducting 100 yuan from user account U1 in the atomic operation, and the dirty page corresponding to P14 corresponds to the operation of adding 100 yuan to user account U2 in the atomic operation. The log buffer records the balances of the user accounts U1 and U2: the balance of U1 corresponding to P1 is 100 and the balance of U2 is 0; the balance of U1 corresponding to P2 is 0 and the balance of U2 is 0; the balance of U1 corresponding to P14 is 0 and the balance of U2 is 100. If the database system fails after the dirty pages corresponding to P1 and P2 have been dumped to the data file on the disk, then when the failed database system is recovered based on the information stored on the disk, the data involved in atomic operation A1 can be recovered in the database system according to the data of P1 and P2 in the data file on the disk; at that point the recovered data shows that the balance of user account U1 is 0 and the balance of user account U2 is 0. Afterwards, based on the logs involved in atomic operation A1 in the disk-based log file, the corresponding data involved in atomic operation A1 recovered in the database system is updated; for example, the log corresponding to P14 stored in the log file on the disk indicates that the balance of user account U1 is 0 and the balance of user account U2 is 100, so the balance of user account U2 in the recovered data above is updated to 100. In this way, the correctness of the recovered data is ensured when the database system performs fault recovery based on the disk.
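To make the account example concrete, the following greatly simplified sketch replays it: state is first rebuilt from the data-file pages dumped before the failure (P1 and P2), then updated from the logs of atomic operation A1 in the log file, so that U2 ends at 100. A real recovery procedure would also have to decide which logs are already reflected in the data file; that bookkeeping is omitted here, and all values are taken directly from the example above.

```python
def recover(data_file_pages, log_file_entries):
    state = {}
    for page in data_file_pages:      # data recovered from the data file on disk
        state.update(page)
    for entry in log_file_entries:    # then updated from the logs on disk, in order
        state.update(entry)
    return state

data_file_pages = [{"U1": 0}, {"U2": 0}]            # dumped images of P1 and P2
log_file_entries = [{"U1": 100, "U2": 0},           # log corresponding to P1
                    {"U1": 0, "U2": 0},             # log corresponding to P2
                    {"U1": 0, "U2": 100}]           # log corresponding to P14
print(recover(data_file_pages, log_file_entries))   # {'U1': 0, 'U2': 100}
```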
Further optionally, after the dirty page dump operation of an active group is completed and the next active group is determined, the log file starting points of the atomic operations associated with the page identifiers included in the next active group may be obtained. The log file starting point of any atomic operation is used to indicate the storage location, in the log file, of the log generated when that atomic operation starts running; the logs included in the log file are saved in chronological order. The minimum of the obtained log file starting points of the atomic operations is set as the current database recovery point. The database recovery point is used to indicate: if the database system fails before it finishes dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, the starting point in the log file from which the logs required for recovery are read when the failed database system is recovered. In this way, the logs needed for database recovery can be quickly determined according to the recovery point, so as to improve the recovery speed of the database system.
For example, in FIG. 3, after the dirty pages corresponding to the page identifiers P1-P6 included in the current active group G1 have been dumped to the disk, the log file starting points of the atomic operations A2, A3, and A4 associated with the next active group G2 are obtained, the minimum of the obtained log file starting points is taken, and that minimum is used as the current database recovery point. If the database system fails during the dirty page dump operation of active group G2, the current database recovery point serves as the starting point in the log file from which the logs required for recovery are read when the database system is recovered, and the logs in the log file after the recovery point are determined to be the logs needed for database recovery.
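A sketch of the recovery-point rule: the recovery point is the minimum log-file starting point over the atomic operations associated with the next active group. The page identifiers, operation names and byte offsets below are invented purely for illustration.

```python
def set_database_recovery_point(next_active_group, page_to_atomic_ops, op_log_start):
    # op_log_start maps an atomic operation to the position, in the disk log file,
    # of the log written when that operation started running.
    ops = {op for page_id in next_active_group
           for op in page_to_atomic_ops.get(page_id, ())}
    return min(op_log_start[op] for op in ops)

page_to_atomic_ops = {"P7": ["A2"], "P8": ["A2", "A3"], "P9": ["A3"], "P10": ["A4"]}
op_log_start = {"A2": 4096, "A3": 7168, "A4": 5120}
recovery_point = set_database_recovery_point(["P7", "P8", "P9", "P10"],
                                             page_to_atomic_ops, op_log_start)
print(recovery_point)  # 4096: if the dump of this group fails, replay the log file from here
```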
It should be noted that, for ease of description, the foregoing method embodiments are all expressed as a series of action combinations; however, a person skilled in the art should know that the present invention is not limited by the described action sequence, because according to the present invention, some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), a magnetic disk, or an optical disc.
FIG. 4 is a schematic structural diagram of a data persistence processing apparatus according to an embodiment of the present invention. Specifically, the data persistence processing apparatus 40 shown in FIG. 4 includes a checkpoint queue maintenance unit 41, a grouping processing unit 42, and a dirty page bulk dump unit 43.
The checkpoint queue maintenance unit 41 may be configured to add, each time dirty pages are generated in the database system memory, the page identifiers respectively corresponding to the generated dirty pages to the checkpoint queue.
The grouping processing unit 42 may be configured to determine the active group and the current group in the checkpoint queue, where the page identifiers in the checkpoint queue that respectively correspond to the multiple dirty pages currently to be dumped to disk form the active group, and the group into which a dirty page newly added to the checkpoint queue is inserted is the current group. The dirty page bulk dump unit 43 may be configured to sequentially dump, at the preset checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the active group to the data file on the disk.
The grouping processing unit 42 may further be configured to determine the next active group in the checkpoint queue if the dirty page dumping related to the active group is completed.
The dirty page bulk dump unit 43 may further be configured to sequentially dump, at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group to the data file on the disk.
To ensure the continuity of atomic operations running in the database system memory, the checkpoint occurrence timing includes: no atomic operation is currently running in the database system memory.
With the foregoing data persistence processing apparatus, dirty pages can be dumped to the data file on the disk in batches grouped by checkpoint occurrence timing, thereby improving the efficiency of dirty page dumping while minimizing the impact of the checkpoint execution process on the user's normal transaction processing.
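As a rough picture of how these units could cooperate, the sketch below wires the three mandatory units of apparatus 40 around the CheckpointQueue from the earlier sketches. The class and method names are assumptions made for illustration; the optional units 44 to 46 would hook into the same flow analogously.

```python
class DataPersistenceApparatus:
    """Illustrative composition of apparatus 40: unit 41 maintains the queue,
    unit 42 determines the groups, unit 43 dumps the active group in bulk."""

    def __init__(self, queue, dump_page_to_disk, no_atomic_operation_running):
        self.queue = queue
        self.dump_page_to_disk = dump_page_to_disk
        self.no_atomic_operation_running = no_atomic_operation_running

    def on_dirty_page(self, page_id):                 # checkpoint queue maintenance unit 41
        self.queue.add_dirty_page(page_id)

    def on_checkpoint(self):
        if not self.queue.entries:
            return []
        group = self.queue.determine_active_group()   # grouping processing unit 42
        if self.no_atomic_operation_running():        # preset checkpoint occurrence timing
            for page_id in group:                     # dirty page bulk dump unit 43
                self.dump_page_to_disk(page_id)
                self.queue.remove(page_id)
        return group

apparatus = DataPersistenceApparatus(CheckpointQueue(), print, lambda: True)
for pid in ["P1", "P2", "P3", "P4", "P5"]:
    apparatus.on_dirty_page(pid)
apparatus.on_checkpoint()   # dumps P1-P4; P5 waits for the next checkpoint
```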
As shown in FIG. 5, on the basis of the foregoing technical solution, the data persistence processing apparatus 40 may optionally further include a mirror page creation unit 44. The mirror page creation unit 44 may be configured to: after the active group is determined, if it is determined that the dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, determine whether that page identifier belongs to the active group; if yes, create a mirror page of the dirty page corresponding to that page identifier before the dirty page corresponding to that page identifier is dumped to the data file on the disk; otherwise, not create a mirror page of the dirty page corresponding to that page identifier. Because a mirror page needs to be created only when a dirty page corresponding to a page identifier included in the current active group is to be modified, the storage space required for storing mirror pages is saved. When the dirty page bulk dump unit 43 needs to dump the dirty page corresponding to any page identifier from memory to disk, if a mirror page has been created for that page identifier, the mirror page corresponding to that page identifier is dumped from memory to the data file on the disk.
On the basis of the foregoing technical solution, the data persistence processing apparatus 40 may optionally further include a log file dump processing unit 45. The log file dump processing unit 45 is configured to determine the atomic operations associated with the page identifiers included in the active group; obtain, in the log buffer of the database memory, the log buffer addresses associated with the determined atomic operations; and dump the logs cached at the obtained log buffer addresses to the log file on the disk. Such processing helps ensure the correctness of the recovered data when the database system performs fault recovery based on the disk.
Further optionally, the data persistence processing apparatus 40 may further include a database recovery point setting module 46. The database recovery point setting module 46 may be configured to obtain the log file starting points of the atomic operations associated with the page identifiers included in the next active group, where the log file starting point of any atomic operation is used to indicate the storage location, in the log file, of the log generated when that atomic operation starts running, and the logs included in the log file are saved in chronological order; and to set the minimum of the obtained log file starting points of the atomic operations as the current database recovery point. The database recovery point is used to indicate: if the database system fails before it finishes dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, the starting point in the log file from which the logs required for recovery are read when the failed database system is recovered. In this way, the logs needed for database recovery can be quickly determined according to the recovery point, so as to improve the recovery speed of the database system.
The data persistence processing apparatus provided by the embodiments of the present invention is configured to implement the data persistence processing method provided by the embodiments of the present invention; for its working mechanism, reference may be made to the corresponding descriptions in the foregoing method embodiments, and details are not described herein again.
As shown in FIG. 6, an embodiment of the present invention further provides a database system, including a disk file 53, an in-memory database 52, and a database management system 51. The database management system 51 is configured to manage data stored in the in-memory database 52, and includes any one of the foregoing data persistence processing apparatuses 40. The data persistence processing apparatus 40 is configured to dump data stored in the in-memory database 52 into the disk file 53 (that is, the data file stored on the disk), whereby dirty pages are dumped to disk in batches grouped by checkpoint occurrence timing, improving the efficiency of dirty page dumping while the dumping has little impact on normal transaction operations. For the specific module division and functional procedures of the data persistence processing apparatus 40, reference may be made to the foregoing embodiments, and details are not described herein again.
The solutions of the present invention may be described in the general context of computer-executable instructions executed by a computer, such as program units. Generally, program units include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The solutions of the present invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program units may be located in both local and remote computer storage media including storage devices.
In addition, the functional units in the embodiments of the present invention may be integrated into one unit, or each functional unit may exist alone physically, or two or more functional units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, in the form of a software functional unit, or in the form of hardware plus a software functional unit.
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to each other, and each embodiment focuses on what is different from the other embodiments. In particular, the apparatus embodiments are basically similar to the method embodiments and are therefore described relatively simply; for relevant parts, reference may be made to the descriptions in the method embodiments. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art may understand and implement them without creative efforts.
A person of ordinary skill in the art may understand that the modules in the apparatus of an embodiment may be distributed in the apparatus of the embodiment as described in the embodiment, or may be changed accordingly and located in one or more apparatuses different from this embodiment. The units of the foregoing embodiments may be combined into one unit, or may be further split into multiple sub-modules. If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
A person of ordinary skill in the art may understand that the accompanying drawings are merely schematic diagrams of one embodiment, and the modules or processes in the accompanying drawings are not necessarily required for implementing the present invention.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features thereof, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

Claims
1. A data persistence processing method, characterized by comprising:
each time dirty pages are generated in a database system memory, adding page identifiers respectively corresponding to the generated dirty pages to a checkpoint queue;
determining an active group and a current group in the checkpoint queue, wherein page identifiers in the checkpoint queue that respectively correspond to multiple dirty pages currently to be dumped to a disk form the active group, and a group into which a dirty page newly added to the checkpoint queue is inserted is the current group;
at a preset checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the active group to a data file of the disk; and
if the dirty page dumping related to the active group is completed, determining a next active group in the checkpoint queue, and at the checkpoint occurrence timing, sequentially dumping the dirty pages corresponding to the page identifiers included in the next active group to the data file of the disk.
2. The method according to claim 1, characterized in that, after the active group is determined, the method further comprises:
if it is determined that a dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, determining whether that page identifier belongs to the active group; if yes, creating a mirror page of the dirty page corresponding to that page identifier before the dirty page corresponding to that page identifier is dumped to the data file of the disk; otherwise, not creating a mirror page of the dirty page corresponding to that page identifier.
3. The method according to claim 1 or 2, characterized in that the checkpoint occurrence timing comprises: no atomic operation is currently running in the database system memory.
4. The method according to any one of claims 1 to 3, characterized in that, before the dirty pages corresponding to the page identifiers included in the active group are sequentially dumped to the data file of the disk, the method further comprises:
determining atomic operations associated with the page identifiers included in the active group;
obtaining, in a log buffer of the database memory, log buffer addresses associated with the atomic operations; and
dumping logs cached at the obtained log buffer addresses to a log file of the disk.
5. The method according to claim 4, characterized in that, after the dirty pages corresponding to the page identifiers included in the current active group are sequentially dumped to the data file of the disk and the next active group is determined, the method further comprises:
obtaining log file starting points of the atomic operations associated with the page identifiers included in the next active group, wherein the log file starting point of any atomic operation is used to indicate a storage location, in the log file, of a log generated when that atomic operation starts running, and the logs included in the log file are saved in chronological order; and
setting a minimum of the obtained log file starting points of the atomic operations as a database recovery point, wherein the database recovery point is used to indicate: if the database system fails before finishing dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, a starting point in the log file from which logs required for recovery are read when the failed database system is recovered.
6. A data persistence processing apparatus, characterized by comprising:
a checkpoint queue maintenance unit, configured to add, each time dirty pages are generated in a database system memory, page identifiers respectively corresponding to the generated dirty pages to a checkpoint queue;
a grouping processing unit, configured to determine an active group and a current group in the checkpoint queue, wherein page identifiers in the checkpoint queue that respectively correspond to multiple dirty pages currently to be dumped to a disk form the active group, and a group into which a dirty page newly added to the checkpoint queue is inserted is the current group; and
a dirty page bulk dump unit, configured to sequentially dump, at a preset checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the active group to a data file of the disk;
wherein the grouping processing unit is further configured to determine a next active group in the checkpoint queue if the dirty page dumping related to the active group is completed; and
the dirty page bulk dump unit is further configured to sequentially dump, at the checkpoint occurrence timing, the dirty pages corresponding to the page identifiers included in the next active group to the data file of the disk.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a mirror page creation unit, configured to: after the active group is determined, if it is determined that a dirty page corresponding to any page identifier included in the checkpoint queue needs to be modified, determine whether that page identifier belongs to the active group; if yes, create a mirror page of the dirty page corresponding to that page identifier before the dirty page corresponding to that page identifier is dumped to the data file of the disk; otherwise, not create a mirror page of the dirty page corresponding to that page identifier.
8. The apparatus according to claim 6 or 7, characterized in that the checkpoint occurrence timing comprises: no atomic operation is currently running in the database system memory.
9. The apparatus according to any one of claims 6 to 8, characterized in that the apparatus further comprises:
a log file dump processing unit, configured to determine atomic operations associated with the page identifiers included in the active group; obtain, in a log buffer of the database memory, log buffer addresses associated with the atomic operations; and dump logs cached at the obtained log buffer addresses to a log file of the disk.
10. The apparatus according to claim 9, characterized in that the apparatus further comprises:
a database recovery point setting module, configured to obtain log file starting points of the atomic operations associated with the page identifiers included in the next active group, wherein the log file starting point of any atomic operation is used to indicate a storage location, in the log file, of a log generated when that atomic operation starts running, and the logs included in the log file are saved in chronological order; and set a minimum of the obtained log file starting points of the atomic operations as a database recovery point, wherein the database recovery point is used to indicate: if the database system fails before finishing dumping the dirty pages corresponding to the page identifiers included in the next active group to the disk, a starting point in the log file from which logs required for recovery are read when the failed database system is recovered.
11. A database system, characterized by comprising a disk file, an in-memory database, and a database management system, wherein the database management system is configured to manage data stored in the in-memory database, the database management system comprises the data persistence processing apparatus according to any one of claims 6 to 10, and the data persistence processing apparatus is configured to dump data stored in the in-memory database into the disk file.
PCT/CN2012/083305 2012-05-02 2012-10-22 数据持久化处理方法、装置及数据库系统 WO2013163864A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/529,501 US20150058295A1 (en) 2012-05-02 2014-10-31 Data Persistence Processing Method and Apparatus, and Database System

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210133474.4 2012-05-02
CN201210133474.4A CN102750317B (zh) 2012-05-02 2012-05-02 数据持久化处理方法、装置及数据库系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/529,501 Continuation US20150058295A1 (en) 2012-05-02 2014-10-31 Data Persistence Processing Method and Apparatus, and Database System

Publications (1)

Publication Number Publication Date
WO2013163864A1 true WO2013163864A1 (zh) 2013-11-07

Family

ID=47030504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/083305 WO2013163864A1 (zh) 2012-05-02 2012-10-22 数据持久化处理方法、装置及数据库系统

Country Status (3)

Country Link
US (1) US20150058295A1 (zh)
CN (1) CN102750317B (zh)
WO (1) WO2013163864A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562642A (zh) * 2017-07-21 2018-01-09 华为技术有限公司 检查点淘汰方法和装置

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9304998B2 (en) * 2012-12-19 2016-04-05 Microsoft Technology Licensing, Llc Main-memory database checkpointing
CN103177085A (zh) * 2013-02-26 2013-06-26 华为技术有限公司 一种检查点操作方法及装置
CN103218430B (zh) * 2013-04-11 2016-03-02 华为技术有限公司 控制数据写入的方法、系统及设备
CN104462127B (zh) * 2013-09-22 2018-07-20 阿里巴巴集团控股有限公司 一种记录数据更新方法及装置
US9471632B2 (en) * 2013-10-18 2016-10-18 International Business Machines Corporation Query optimization considering virtual machine mirroring costs
CN104408126B (zh) * 2014-11-26 2018-06-15 杭州华为数字技术有限公司 一种数据库的持久化写入方法、装置和系统
US10216598B2 (en) * 2017-07-11 2019-02-26 Stratus Technologies Bermuda Ltd. Method for dirty-page tracking and full memory mirroring redundancy in a fault-tolerant server
CN110874287B (zh) * 2018-08-31 2023-05-02 阿里巴巴集团控股有限公司 数据库中数据的备份及恢复方法、装置及电子设备
CN112015807B (zh) * 2019-05-31 2024-07-02 阿里巴巴集团控股有限公司 数据同步的处理方法、装置、电子设备及计算机存储介质
CN111563053B (zh) * 2020-07-10 2020-12-11 阿里云计算有限公司 处理Bitmap数据的方法以及装置
CN113961138A (zh) * 2020-07-21 2022-01-21 北京金山云网络技术有限公司 数据处理方法、装置、系统和电子设备
US11593309B2 (en) 2020-11-05 2023-02-28 International Business Machines Corporation Reliable delivery of event notifications from a distributed file system
CN115061858B (zh) * 2022-08-19 2022-12-06 湖南视拓信息技术股份有限公司 数据持久化方法、装置、计算机设备及存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1524226A (zh) * 2001-03-07 2004-08-25 甲骨文国际公司 对多节点系统中的检查点队列进行管理
CN101464820A (zh) * 2009-01-16 2009-06-24 中国科学院计算技术研究所 磁盘设备的持续数据保护方法和系统
CN101819561A (zh) * 2010-04-21 2010-09-01 中兴通讯股份有限公司 文件下载方法及系统

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2312444A1 (en) * 2000-06-20 2001-12-20 Ibm Canada Limited-Ibm Canada Limitee Memory management of data buffers incorporating hierarchical victim selection
US20020103819A1 (en) * 2000-12-12 2002-08-01 Fresher Information Corporation Technique for stabilizing data in a non-log based information storage and retrieval system
US6671786B2 (en) * 2001-06-07 2003-12-30 Microsoft Corporation System and method for mirroring memory with restricted access to main physical mirrored memory
US7587429B2 (en) * 2004-05-24 2009-09-08 Solid Information Technology Oy Method for checkpointing a main-memory database
CN100369038C (zh) * 2005-02-24 2008-02-13 中兴通讯股份有限公司 一种实时数据库事务操作的实现方法
US9235531B2 (en) * 2010-03-04 2016-01-12 Microsoft Technology Licensing, Llc Multi-level buffer pool extensions
CN101901250A (zh) * 2010-06-08 2010-12-01 中兴通讯股份有限公司 一种内存数据库及其数据处理方法
CN102012849B (zh) * 2010-11-19 2012-10-24 中国人民大学 一种基于闪存的数据库恢复方法
US9122631B2 (en) * 2011-11-07 2015-09-01 Peking University Buffer management strategies for flash-based storage systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1524226A (zh) * 2001-03-07 2004-08-25 甲骨文国际公司 对多节点系统中的检查点队列进行管理
CN101464820A (zh) * 2009-01-16 2009-06-24 中国科学院计算技术研究所 磁盘设备的持续数据保护方法和系统
CN101819561A (zh) * 2010-04-21 2010-09-01 中兴通讯股份有限公司 文件下载方法及系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562642A (zh) * 2017-07-21 2018-01-09 华为技术有限公司 检查点淘汰方法和装置
CN107562642B (zh) * 2017-07-21 2020-03-20 华为技术有限公司 检查点淘汰方法和装置

Also Published As

Publication number Publication date
US20150058295A1 (en) 2015-02-26
CN102750317A (zh) 2012-10-24
CN102750317B (zh) 2015-01-21

Similar Documents

Publication Publication Date Title
WO2013163864A1 (zh) 数据持久化处理方法、装置及数据库系统
US11327799B2 (en) Dynamic allocation of worker nodes for distributed replication
US11010240B2 (en) Tracking status and restarting distributed replication
US20200348852A1 (en) Distributed object replication architecture
CN110532247B (zh) 数据迁移方法和数据迁移系统
CN111049902B (zh) 基于区块链网络的数据存储方法、装置、存储介质和设备
US11349915B2 (en) Distributed replication and deduplication of an object from a source site to a destination site
US9031910B2 (en) System and method for maintaining a cluster setup
CN108255647B (zh) 一种samba服务器集群下的高速数据备份方法
US20150213100A1 (en) Data synchronization method and system
CN111143133B (zh) 虚拟机备份方法和备份虚拟机恢复方法
WO2018098972A1 (zh) 一种日志恢复方法、存储装置和存储节点
CN104166606A (zh) 文件备份方法和主存储设备
WO2013091167A1 (zh) 日志存储方法及系统
US10642530B2 (en) Global occupancy aggregator for global garbage collection scheduling
CN113568566A (zh) 利用索引物件来进行简易存储服务无缝迁移的方法、主装置以及存储服务器
CN102833273B (zh) 临时故障时的数据修复方法及分布式缓存系统
US11163799B2 (en) Automatic rollback to target for synchronous replication
WO2011143851A1 (zh) 数据库服务器操作控制系统、方法及设备
CN107402841A (zh) 大规模分布式文件系统数据修复方法及设备
WO2013091162A1 (zh) 一种分布式存储数据恢复方法、装置及系统
US20220129446A1 (en) Distributed Ledger Management Method, Distributed Ledger System, And Node
US11669501B2 (en) Address mirroring of a file system journal
WO2022173652A1 (en) Reducing the impact of network latency during a restore operation
US11645333B1 (en) Garbage collection integrated with physical file verification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12876050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12876050

Country of ref document: EP

Kind code of ref document: A1