WO2012083754A1 - Method and device for processing dirty data - Google Patents

Method and device for processing dirty data

Info

Publication number
WO2012083754A1
Authority
WO
WIPO (PCT)
Prior art keywords
tuple
storage block
data
cache
memory
Application number
PCT/CN2011/081046
Inventor
时家幸
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201180002177.XA (CN102725752B)
Priority to PCT/CN2011/081046 (WO2012083754A1)
Publication of WO2012083754A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12: Replacement control
    • G06F12/121: Replacement control using replacement algorithms
    • G06F12/126: Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning

Definitions

  • The present invention relates to the field of storage technologies, and in particular to a method and apparatus for processing dirty data.
  • A database is a repository that organizes, stores, and manages data according to data structures. In daily work, related data often needs to be placed in such a "warehouse" and processed as management requires.
  • In traditional database systems and other storage-related engines, after data is modified in memory, the modified data must be written to disk immediately (or within a short time) to guarantee transaction integrity or the reliability of the data in the database. While the modified data is being written to disk, no data can be written to memory, so the memory has to suspend its external services, which limits memory throughput and the read/write performance of the system.
  • At present, the read/write performance of the system is mainly improved by adding a flash device such as a solid state disk (SSD) as a cache:
  • the memory writes modified data to the SSD in units of SSD storage blocks, and the data in the cache is written to disk during service idle periods.
  • the data that has been modified in the cache and has not been written to the disk is dirty data.
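  • When a large amount of data in memory is modified within a short time and the modified data is scattered across different storage blocks, the SSD can process only a small amount of dirty data each time it reads or writes a data block. As a result, the data throughput and read/write performance of the database system are low, causing delayed system responses and even database crashes.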
  • Embodiments of the present invention provide a method and apparatus for processing dirty data, which can improve data throughput and read and write performance of a database system.
  • In one aspect, an embodiment of the present invention provides a method for processing dirty data, including: determining a first storage block in memory, the size of the first storage block matching the write specification of a cache; combining the tuples marked as dirty data in memory and writing them into the first storage block; and writing the dirty data in the first storage block to the cache, and writing the dirty data to disk through the cache.
  • In another aspect, an embodiment of the present invention provides an apparatus for processing dirty data, including: a determining unit, configured to determine a first storage block in memory, the size of the first storage block matching the write specification of a cache;
  • a first writing unit, configured to combine the tuples marked as dirty data in memory and write them into the first storage block;
  • a second writing unit configured to write dirty data in the first storage block to the cache, and write the dirty data to a disk by using the cache.
  • The method and apparatus for processing dirty data provided by the embodiments of the present invention combine the tuples marked as dirty data in memory, write them together to the cache, and then write the dirty data to disk through the cache.
  • The method and apparatus can improve the data throughput and read/write performance of a database system, and can also reduce how often the cache is read and written, prolonging the service life of the cache.
  • FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an apparatus according to yet another embodiment of the present invention.
  • FIG. 4 is another schematic structural diagram of the apparatus according to yet another embodiment of the present invention.
  • FIG. 5 is still another schematic structural diagram of the apparatus according to yet another embodiment of the present invention.
  • FIG. 6 is a further schematic structural diagram of the apparatus according to yet another embodiment of the present invention.
  • the dirty data of the embodiment of the present invention may be data that has been modified in the cache and has not been written to the disk.
  • the embodiments of the present invention can be applied to various types of databases and data warehouse systems, including DB databases, Oracle databases, SQL databases, and the like.
  • An embodiment of the present invention provides a method for processing dirty data. As shown in FIG. 1, the method includes:
  • the method for processing dirty data provided by the embodiment of the present invention can combine the elements marked as dirty data in the memory and write them together to the cache, and then write the dirty data to the disk through the cache.
  • the method provided by the embodiment of the invention can improve the data throughput and the read/write performance of the database system, and can also reduce the frequency of reading and writing of the cache, thereby prolonging the service life of the cache.
  • Another embodiment of the present invention further provides a method for processing dirty data, as shown in FIG. 2, including:
  • the cache is a cache device that connects the memory and the disk; the write specification of the cache refers to the maximum amount of data that can be written by the cache every time it is refreshed.
  • Generally, the read/write speed of the cache is much higher than that of the memory.
  • Storage space equal or close in size to the write specification of the cache can be determined in memory as the first storage block. Specifically, free space in memory may be integrated to obtain the first storage block, or storage space conforming to the first storage block specification may be reserved in memory as the first storage block; this is not limited herein.
  • the cache may be a flash device similar to a Solid State Disk (SSD), but is not limited thereto.
  • The first storage block stores the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to each tuple's data. A tuple may be a storage unit that stores dirty data, and may also represent an association of multiple storage units, but is not limited thereto.
  • The first storage block allows the data marked as dirty to be consolidated and written to the cache together, improving the read/write efficiency of dirty data.
  • A first mapping table is established in memory; the first mapping table is used to record the first …
  • When there is a large amount of dirty data in memory, the first storage block may need to be used multiple times to write dirty data to the cache, so that multiple different versions of first storage block information are stored in the cache.
  • To facilitate indexing, the information in the first storage block may be numbered in the order in which it is written to the cache, giving each piece of first storage block information a time version number. All tuples in the same version of first storage block information share the same time version number, and the time version number of a tuple identifies the first storage block information to which that tuple belongs in the cache.
  • When the storage space marked as dirty data in memory is larger than the storage space of the first storage block, the first storage block must be used multiple times to write the tuples marked as dirty data in memory to the cache, so that different versions of first storage block information are stored in the cache.
  • Tuples marked as dirty data in memory may be modified many times, so the cache often records multiple values of the same tuple; however, when the dirty data in the cache is written to disk, only the final value of each tuple needs to be written. To improve data read/write efficiency, the method provided in this embodiment further includes:
  • The valid tuples can be determined by, but not limited to, the following method:
  • The first storage block information of each version is read in order of version, starting from the lowest version. For each tuple in the current version, the time version numbers are used to check whether the tuple is modified again in a higher-version storage block; if so, the current tuple is ignored; if not, the current tuple is retained and marked as a valid tuple.
  • The original storage block to which each valid tuple belongs may be determined according to the first mapping table, and the tuples belonging to the same original storage block are combined and written to disk together, which improves data read/write efficiency.
  • When a specified tuple needs to be looked up, the first mapping table may be accessed to determine whether the cache contains the specified tuple data; if not, the tuple data is read from disk; if so, the first storage block information in the cache that contains the specified tuple is determined according to the first mapping table, and the data of the specified tuple is read from it.
  • The corresponding tuple is obtained from the cache to overwrite the specified storage block on disk; when the system needs to modify a single tuple, the modification of the specified tuple is completed according to the method provided in this embodiment.
  • When the server is shut down, the first mapping table in memory may be stored in the cache, so that after the server restarts, the remaining versions of first storage block information in the cache are written to disk according to the first mapping table.
  • For how the dirty data in the cache is written to disk, refer to the description above in this embodiment; details are not repeated here.
  • The method for processing dirty data provided by this embodiment of the present invention determines a first storage block in memory, combines the tuples marked as dirty data in memory and writes them into the cache, and writes the dirty data in the cache to disk during service idle periods.
  • The method can significantly improve the data throughput and read/write performance of a database system and makes it easier for the system to look up or modify specified tuple data; it also reduces how often the cache is read and written, prolonging the service life of the cache.
  • a further embodiment of the present invention provides a device for processing dirty data, which can implement the foregoing method embodiment.
  • the device includes: a determining unit 31, configured to determine, in a memory, a first storage block, where a size of the first storage block matches a write specification of a cache;
  • a first writing unit 32, configured to combine the tuples marked as dirty data in memory and write them into the first storage block;
  • the second write unit 33 is configured to write dirty data in the first storage block to the cache, and write the dirty data to the disk by using the cache.
  • the determining unit 31 may further include an integration subunit 311 or a reservation subunit 312, where:
  • the integration subunit 311 is configured to integrate the free space in the memory to obtain the first storage block.
  • the reservation subunit 312 is configured to reserve, in the memory, a storage space conforming to the first storage block specification as the first storage block.
  • the first writing unit 32 is further configured to write related information of the tuples marked as dirty data in memory to the first storage block, where the related information includes the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to each tuple's data.
  • the apparatus further includes a processing unit 34.
  • the second writing unit 33 specifically includes a first processing sub-unit 331, a first searching sub-unit 332, and a second processing sub-unit 333, where:
  • the processing unit 34 is configured to establish a first mapping table in memory, where the first mapping table records the original storage block information and the time version number of each tuple, and the time version number of each tuple identifies the first storage block information to which the tuple belongs in the cache.
  • the first processing sub-unit 331 is configured to write the dirty data in the first storage block to the cache, and write the dirty data to disk through the cache;
  • the first search sub-unit 332 is configured to, when a tuple marked as dirty data in memory has been modified multiple times, search the first mapping table for the time version number of the final value of each tuple's data in the dirty data, determine the first storage block information in the cache corresponding to that time version number, and mark the tuples in that first storage block information that store the final value of each tuple's data as valid tuples;
  • the second processing sub-unit 333 is configured to write the valid tuple determined by the first lookup sub-unit 332 to the disk, and delete the tuple data information corresponding to the valid tuple in the cache;
  • the processing unit 34 is further configured to delete, after the second processing sub-unit 333 writes the valid tuple to the disk, the time version number information corresponding to the valid tuple in the first mapping table.
  • the second writing unit 33 may further include a second searching subunit 334 and a third processing subunit 335, and the apparatus further includes a first searching unit 35 and a second searching unit 36, where:
  • the second lookup subunit 334 is configured to determine, according to the first mapping table, original storage block information to which each tuple in the valid tuple belongs;
  • the third processing sub-unit 335 is configured to combine the tuples belonging to the same original storage block, write them to disk, and delete the tuple data information corresponding to those tuples in the cache;
  • the first searching unit 35 is configured to: when the specified tuple needs to be searched, look up the first mapping table, and determine whether the specified tuple is included in the cache;
  • the second searching unit 36 is configured to, when the specified tuple is included in the cache, determine according to the first mapping table the first storage block information in the cache that includes the final value of the specified tuple data, and determine the data of the specified tuple.
  • the processing unit 34 is further configured to, when an abnormal situation terminates the process of writing dirty data to disk, reconstruct the first mapping table after the server restarts according to the first storage block information of the remaining versions in the cache;
  • the first search sub-unit 332 is further configured to search, according to the first mapping table reconstructed by the processing unit 34, for the time version number of the final value of each tuple's data in the remaining dirty data in the cache, determine the first storage block information in the cache corresponding to that time version number, and mark the tuples in that first storage block information that store the final value of each tuple's data as valid tuples;
  • the second processing sub-unit 333 is further configured to write the valid tuples determined by the first search sub-unit 332 to disk, and delete the tuple data information corresponding to the valid tuples in the cache; the processing unit 34 is further configured to delete, after the second processing sub-unit 333 writes the valid tuples to disk, the time version number information corresponding to the valid tuples in the first mapping table.
  • the processing unit 34 is further configured to store the first mapping table in memory in the cache when the server is shut down, so that after restarting, the server writes the first storage block information of the remaining versions in the cache to disk according to the first mapping table.
  • The apparatus for processing dirty data uses the determining unit 31 to determine the first storage block in memory, and uses the first writing unit 32 to combine the tuples marked as dirty data in memory and write them into the first storage block.
  • During service idle periods, the second writing unit 33 writes the dirty data in the first storage block to the cache, and the dirty data is written to disk through the cache.
  • The apparatus provided by this embodiment of the present invention can significantly improve the data throughput and read/write performance of a database system and makes it easier for the system to look up or modify specified tuple data; it also reduces how often the cache is read and written, prolonging the service life of the cache.
  • Embodiments of the present invention also provide a memory including the apparatus described in Figures 3 through 6 and a processor for controlling the apparatus for processing dirty data.
  • This memory is capable of handling dirty data. It should be noted that the memory may be used as a memory or as a cache, which is not limited herein.
  • the invention can be implemented by means of software plus the necessary general hardware, and of course also by hardware, but in many cases the former is a better implementation.
  • The technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, hard disk, or optical disk, and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
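The valid-tuple procedure in the list above (read each version starting from the lowest, ignore a tuple if a higher version modifies it again, then combine the surviving tuples by original storage block before writing them to disk), together with the lookup path, can be sketched in Python as follows. The data model is an assumption made for illustration: cached versions are a dict `{time_version: {tuple_id: (original_block_id, data)}}`, and the disk is a dict keyed by original block. The function names are hypothetical. The linear rescan of higher versions follows the described method literally; a real system would more likely consult the first mapping table for each tuple's latest version.

```python
from collections import defaultdict


def find_valid_tuples(blocks):
    """blocks: {time_version: {tuple_id: (original_block_id, data)}}.
    Scan versions from lowest to highest and keep a tuple only if no higher
    version rewrites it, so exactly its final value survives."""
    versions = sorted(blocks)
    valid = {}
    for i, v in enumerate(versions):
        for tuple_id, (block_id, data) in blocks[v].items():
            rewritten = any(tuple_id in blocks[h] for h in versions[i + 1:])
            if not rewritten:
                valid[tuple_id] = (v, block_id, data)
    return valid


def group_by_original_block(valid):
    """Combine valid tuples that belong to the same original storage block,
    so each original block is written to disk once."""
    groups = defaultdict(dict)
    for tuple_id, (_version, block_id, data) in valid.items():
        groups[block_id][tuple_id] = data
    return dict(groups)


def flush_to_disk(blocks, disk):
    """disk: {original_block_id: {tuple_id: data}}, a stand-in for real I/O."""
    valid = find_valid_tuples(blocks)
    for block_id, tuples in group_by_original_block(valid).items():
        disk.setdefault(block_id, {}).update(tuples)   # one write per original block
    blocks.clear()   # flushed versions are removed from the cache


def lookup(tuple_id, blocks, disk, original_block_id):
    """Return the newest cached value of tuple_id, else fall back to disk.
    original_block_id is assumed to come from the first mapping table."""
    for v in sorted(blocks, reverse=True):
        if tuple_id in blocks[v]:
            return blocks[v][tuple_id][1]
    return disk.get(original_block_id, {}).get(tuple_id)
```

With two cached versions in which tuple 1 is rewritten in version 2, only the version-2 value of tuple 1 reaches disk, and tuples 1 and 3 (both from original block 10) are written in a single block update.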

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Disclosed are a method and a device for processing dirty data. The method provided in the embodiments of the present invention includes: determining a first storage block in memory, the size of the first storage block matching the write specification of a cache; combining the tuples marked as dirty data in memory and writing them into the first storage block; and writing the dirty data from the first storage block into the cache, and writing the dirty data to disk through the cache. By implementing the present invention, the data throughput and read/write performance of a database system can be improved.

Description

Method and device for processing dirty data
Technical Field
The present invention relates to the field of storage technologies, and in particular to a method and apparatus for processing dirty data.
Background
A database is a repository that organizes, stores, and manages data according to data structures. In daily work, related data often needs to be placed in such a "warehouse" and processed as management requires. Traditional database systems and other storage-related engines work as follows: after data is modified in memory, the modified data must be written to disk immediately (or within a very short time) to guarantee transaction integrity or the reliability of the data in the database. While the modified data is being written to disk, no data can be written to memory, so the memory has to suspend its external services, which severely limits memory throughput and the read/write performance of the system.
Because the read/write speed of a disk is far lower than that of memory, system performance is greatly reduced. At present, the read/write performance of the system is mainly improved by adding a flash device such as a solid state disk (Solid State Disk, SSD) as a cache: the memory writes modified data to the SSD in units of SSD storage blocks, and the data in the cache is written to disk during service idle periods, improving system throughput and read/write performance. Data in the cache that has been modified but not yet written to disk is dirty data.
When a large amount of data in memory is modified within a short time and the modified data is scattered across different storage blocks, the SSD can process only a small amount of dirty data each time it reads or writes a data block. As a result, the data throughput and read/write performance of the database system are low, causing delayed system responses and even database crashes.
Summary of the Invention
Embodiments of the present invention provide a method and apparatus for processing dirty data, which can improve the data throughput and read/write performance of a database system.
The embodiments of the present invention adopt the following technical solutions. In one aspect, an embodiment of the present invention provides a method for processing dirty data, including: determining a first storage block in memory, the size of the first storage block matching the write specification of a cache;
combining the tuples marked as dirty data in memory and writing them into the first storage block; and
writing the dirty data in the first storage block to the cache, and writing the dirty data to disk through the cache.
In another aspect, an embodiment of the present invention provides an apparatus for processing dirty data, including: a determining unit, configured to determine a first storage block in memory, the size of the first storage block matching the write specification of a cache;
a first writing unit, configured to combine the tuples marked as dirty data in memory and write them into the first storage block; and
a second writing unit, configured to write the dirty data in the first storage block to the cache, and write the dirty data to disk through the cache.
The method and apparatus for processing dirty data provided by the embodiments of the present invention combine the tuples marked as dirty data in memory, write them together to the cache, and then write the dirty data to disk through the cache. The method and apparatus can improve the data throughput and read/write performance of a database system, and can also reduce how often the cache is read and written, prolonging the service life of the cache.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus according to yet another embodiment of the present invention;
FIG. 4 is another schematic structural diagram of the apparatus according to yet another embodiment of the present invention;
FIG. 5 is still another schematic structural diagram of the apparatus according to yet another embodiment of the present invention; and
FIG. 6 is a further schematic structural diagram of the apparatus according to yet another embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The dirty data in the embodiments of the present invention may be data in the cache that has been modified but not yet written to disk. The embodiments of the present invention can be applied to various types of databases and data warehouse systems, including DB databases, Oracle databases, SQL databases, and the like.
An embodiment of the present invention provides a method for processing dirty data. As shown in FIG. 1, the method includes:
101. Determine a first storage block in memory, where the size of the first storage block matches the write specification of the cache.
102. Combine the tuples marked as dirty data in memory and write them into the first storage block.
103. Write the dirty data in the first storage block to the cache, and write the dirty data to disk through the cache.
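Steps 101 to 103 can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the patented implementation: `DirtyTuple`, `coalesce`, and `flush` are hypothetical names, the cache is modeled as a list of batches, the disk as a dict, and per-entry metadata overhead inside a storage block is ignored. What it shows is that tuples from many different original blocks can share one cache write once they are packed up to the write specification.

```python
from dataclasses import dataclass


@dataclass
class DirtyTuple:
    block_id: int   # original storage block the tuple belongs to
    tuple_id: int   # identifier of the tuple inside that block
    data: bytes     # modified tuple value


def coalesce(dirty, write_spec):
    """Steps 101-102 (sketch): pack dirty tuples, possibly scattered over many
    original blocks, into batches whose payload never exceeds write_spec bytes."""
    batches, current, used = [], [], 0
    for t in dirty:
        size = len(t.data)
        if size > write_spec:
            raise ValueError("tuple larger than the cache write specification")
        if used + size > write_spec:      # current batch is full: start a new one
            batches.append(current)
            current, used = [], 0
        current.append(t)
        used += size
    if current:
        batches.append(current)
    return batches


def flush(batches, cache, disk):
    """Step 103 (sketch): one cache write per batch, then the cache writes the
    tuples through to disk, keyed by (original block, tuple)."""
    for batch in batches:
        cache.append(batch)               # one cache write per first storage block
    for batch in cache:
        for t in batch:
            disk[(t.block_id, t.tuple_id)] = t.data
    cache.clear()                         # flushed data is no longer dirty
```

For example, six 100-byte tuples that each live in a different original block fit into a single 4096-byte batch, so the cache is written once rather than six times.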
The method for processing dirty data provided by this embodiment of the present invention combines the tuples marked as dirty data in memory, writes them together to the cache, and then writes the dirty data to disk through the cache. The method can improve the data throughput and read/write performance of a database system, and can also reduce how often the cache is read and written, thereby prolonging the service life of the cache.
Another embodiment of the present invention further provides a method for processing dirty data. As shown in FIG. 2, the method includes:
201. Determine a first storage block in memory, where the size of the first storage block matches the write specification of the cache.
Here, the cache is a cache device that connects the memory and the disk, and the write specification of the cache is the maximum amount of data the cache can write per refresh. Generally, the read/write speed of the cache is much higher than that of the memory. To improve the read/write efficiency of dirty data, storage space equal or close in size to the write specification of the cache may be determined in memory as the first storage block. Specifically, free space in memory may be integrated to obtain the first storage block, or storage space conforming to the first storage block specification may be reserved in memory as the first storage block; this is not limited herein.
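Both ways of obtaining the first storage block can be sketched as below, under the assumption (not stated in the source) that free memory is described as a list of `(offset, length)` extents. `integrate_free_space` and `reserve_block` are hypothetical helpers. The largest-first greedy choice is only one simple policy, and it may slightly overshoot the write specification, which is consistent with a size close to, rather than exactly equal to, the write specification.

```python
def integrate_free_space(free_extents, write_spec):
    """Gather free (offset, length) extents, largest first, until their combined
    length reaches write_spec. Returns the chosen extents, or None if the total
    free memory is insufficient."""
    chosen, total = [], 0
    for offset, length in sorted(free_extents, key=lambda e: e[1], reverse=True):
        if total >= write_spec:
            break
        chosen.append((offset, length))
        total += length
    return chosen if total >= write_spec else None


def reserve_block(write_spec):
    """Alternative: reserve a dedicated buffer of exactly write_spec bytes."""
    return bytearray(write_spec)
```

The greedy gather keeps the number of fragments small, while the reserved buffer avoids any search at the cost of holding memory permanently.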
Preferably, the cache may be a flash device such as a solid state disk (Solid State Disk, SSD), but is not limited thereto.
202. Combine the tuples marked as dirty data in memory and write them into the first storage block.
It should be noted that, after the tuples marked as dirty data in memory are combined and written into the first storage block, the first storage block stores the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to each tuple's data. A tuple may be a storage unit that stores dirty data, and may also represent an association of multiple storage units, but is not limited thereto.
Specifically, when a large amount of data in memory is modified within a short time and the modified data is scattered across different storage blocks, the first storage block allows the data marked as dirty to be consolidated and written to the cache together, improving the read/write efficiency of dirty data.
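One possible byte layout for the first storage block, holding for every dirty tuple its original storage block, its data, and a pointer to that data, is sketched below. The layout itself is a hypothetical choice made for illustration: a count header, a directory of fixed-size entries, then the packed tuple data, zero-padded to the write specification. The pointer is modeled as a byte offset within the block. `pack_block` and `unpack_block` are illustrative names and use only the standard `struct` module.

```python
import struct

HEADER = struct.Struct("<I")      # number of directory entries
ENTRY = struct.Struct("<IIII")    # original block id, tuple id, data offset (pointer), data length


def pack_block(tuples, write_spec):
    """tuples: list of (block_id, tuple_id, data). Returns exactly write_spec
    bytes: header, directory, packed data, then zero padding."""
    offset = HEADER.size + ENTRY.size * len(tuples)   # data starts after the directory
    directory, payload = [], []
    for block_id, tuple_id, data in tuples:
        directory.append(ENTRY.pack(block_id, tuple_id, offset, len(data)))
        payload.append(data)
        offset += len(data)
    if offset > write_spec:
        raise ValueError("tuples do not fit in one first storage block")
    body = HEADER.pack(len(tuples)) + b"".join(directory) + b"".join(payload)
    return body.ljust(write_spec, b"\x00")


def unpack_block(buf):
    """Inverse of pack_block: follow each directory pointer to its tuple data."""
    (count,) = HEADER.unpack_from(buf, 0)
    out = []
    for i in range(count):
        block_id, tuple_id, off, length = ENTRY.unpack_from(buf, HEADER.size + i * ENTRY.size)
        out.append((block_id, tuple_id, bytes(buf[off:off + length])))
    return out
```

Keeping the original block id in each directory entry is what later lets the cache regroup tuples by their home block when writing to disk.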
203. Establish a first mapping table in the memory, where the first mapping table is used to record the time version number of each tuple in the first storage block and the original storage block information to which each tuple belongs.
Specifically, when there is a large amount of dirty data in the memory, the first storage block may need to be used multiple times to write the dirty data into the cache, so that multiple versions of first storage block information are stored in the cache. For ease of indexing, the information in the first storage block may be numbered in the order in which it is written into the cache, which determines a time version number for each piece of first storage block information. All tuples in the same version of first storage block information share the same time version number, and the time version number of a tuple identifies the first storage block information in the cache to which the tuple belongs.
204. Merge the dirty data in the first storage block and write it into the cache.
It should be noted that when the storage space marked as dirty data in the memory is larger than the first storage block, the first storage block must be used multiple times to write the tuples marked as dirty data into the cache, so that different versions of first storage block information are stored in the cache.
In practical applications, a tuple marked as dirty data in the memory may have been modified many times, so the cache often records multiple values of that tuple; however, when the dirty data in the cache is written to the disk, only the final value of each tuple needs to be written. To improve data read/write efficiency, the method provided in this embodiment further includes:
205. When a tuple marked as dirty data in the memory has been modified multiple times, modify the time version number information of the tuple in the first mapping table to update the first mapping table.
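Steps 203 to 205 can be modeled with a small in-memory structure, sketched below under assumptions: each flush of a first storage block to the cache receives the next integer time version number, the cache is represented by a dictionary from version number to `{tuple_id: (orig_block, data)}`, and the first mapping table maps each tuple to the version holding its latest value together with its original block. `VersionedCache` and its fields are hypothetical names.

```python
class VersionedCache:
    """Hypothetical model of the cache plus the first mapping table."""

    def __init__(self):
        # Cache contents: time version number -> {tuple_id: (orig_block, data)}
        self.versions = {}
        # First mapping table: tuple_id -> (time version number, orig_block)
        self.mapping = {}
        self.next_version = 1

    def flush(self, staged):
        """Write one staging block ({tuple_id: (orig_block, data)}) as a new version."""
        version = self.next_version
        self.next_version += 1
        self.versions[version] = dict(staged)
        for tuple_id, (orig_block, _data) in staged.items():
            # A tuple modified again appears in a later flush; overwriting its
            # entry here is the mapping-table update described in step 205.
            self.mapping[tuple_id] = (version, orig_block)
        return version
```

After two flushes in which tuple `a` was modified again, the mapping table points `a` at version 2, while the stale value remains in version 1 until write-back.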
206. According to the first mapping table, look up the time version number of the final value of each tuple in the dirty data, determine the first storage block information in the cache that corresponds to that time version number, and mark the tuple in that first storage block information which stores the final value as a valid tuple.
Specifically, valid tuples may be determined by, but not limited to, the following method:
Read each version of first storage block information in order, starting from the lowest version. For each tuple in the current version, check according to the time version number whether the tuple was modified again in a higher version. If so, ignore the current tuple; if not, retain the current tuple and mark it as a valid tuple.
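The version scan in step 206 can be sketched as below. It assumes the cache is modeled as `{version: {tuple_id: (orig_block, data)}}` and the first mapping table as `{tuple_id: (latest_version, orig_block)}`; both layouts and the function name `find_valid_tuples` are illustrative assumptions. A tuple is kept only when no higher version supersedes it.

```python
def find_valid_tuples(versions, mapping):
    """Return (tuple_id, orig_block, data, version) for each tuple's final value."""
    valid = []
    for version in sorted(versions):                 # lowest version first
        for tuple_id, (orig_block, data) in versions[version].items():
            latest_version, _ = mapping[tuple_id]
            if latest_version > version:
                continue                             # modified again later: ignore
            valid.append((tuple_id, orig_block, data, version))  # valid tuple
    return valid
```

With tuple `a` rewritten in version 2, the scan skips its version-1 copy and returns `b` from version 1 and `a` from version 2.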
207. During an idle period of the service, write the valid tuples in the cache to the disk, and delete both the time version number information corresponding to the valid tuples in the first mapping table and the first storage block information corresponding to the valid tuples in the cache.
Preferably, when the valid tuples are written to the disk, the original storage block information to which each valid tuple belongs may also be determined according to the first mapping table, and tuples belonging to the same original storage block are merged and written to the disk together, which improves data read/write efficiency.
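A sketch of step 207 combined with the per-block merging just described follows. Assumptions: the disk is modeled as a dictionary from original block number to `{tuple_id: data}`, the cache as `{version: {tuple_id: (orig_block, data)}}`, the mapping table as `{tuple_id: (version, orig_block)}`, and valid tuples as `(tuple_id, orig_block, data, version)` records. Superseded older copies of written tuples are also discarded from the cache, which the text does not state explicitly. `write_back` is a hypothetical name.

```python
from collections import defaultdict


def write_back(valid, disk, mapping, versions):
    """Write valid tuples grouped by original block; return the number of block writes."""
    by_block = defaultdict(dict)
    for tuple_id, orig_block, data, _version in valid:
        by_block[orig_block][tuple_id] = data
    for block_id in sorted(by_block):
        # One merged write per original storage block rather than one per tuple.
        disk.setdefault(block_id, {}).update(by_block[block_id])

    written = {tuple_id for tuple_id, _b, _d, _v in valid}
    for tuple_id in written:
        mapping.pop(tuple_id, None)           # drop time version info from the mapping table
    for version in list(versions):
        for tuple_id in written.intersection(versions[version]):
            del versions[version][tuple_id]   # drop cached copies (assumption: stale ones too)
        if not versions[version]:
            del versions[version]             # version fully written back: free it
    return len(by_block)
```

Grouping by original block means that tuples scattered across many cache versions but belonging to the same disk block cost a single disk write.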
Specifically, when the system needs to look up a specified tuple, it may access the first mapping table to determine whether the cache contains data of the specified tuple. If not, the tuple data is read from the disk; if so, the first storage block information in the cache that contains the specified tuple is determined according to the first mapping table, and the data of the specified tuple is obtained from it.
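The lookup path can be sketched as follows, assuming the cache is modeled as `{version: {tuple_id: (orig_block, data)}}` and the first mapping table as `{tuple_id: (version, orig_block)}`. The callable `read_from_disk` is a hypothetical stand-in for whatever disk read routine the database provides.

```python
def lookup(tuple_id, mapping, versions, read_from_disk):
    """Return the current value of a tuple, preferring the cache over the disk."""
    entry = mapping.get(tuple_id)
    if entry is None:
        return read_from_disk(tuple_id)        # not cached: fall back to the disk
    version, _orig_block = entry
    _orig, data = versions[version][tuple_id]  # version holding the final value
    return data
```

Because the mapping table always points at the newest version, a cached tuple with several stale copies is resolved with one dictionary access instead of a scan over all versions.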
When the system needs to modify the data in an entire storage block, the corresponding tuples are obtained from the cache to overwrite the specified storage block on the disk; when the system needs to modify a single tuple, the modification of the specified tuple is completed according to the method provided in this embodiment.
It should be noted that when an abnormal situation (such as a power failure, a database system crash, or a forced shutdown of the database server) forcibly terminates the process of writing dirty data to the disk, the remaining dirty data in the cache may be written to the disk through the following steps:
After the server restarts, reconstruct the first mapping table according to the remaining versions of first storage block information in the cache;
According to the first mapping table, look up the time version number of the final value of each tuple in the remaining dirty data in the cache, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information which stores the final value as a valid tuple;
Write the valid tuples to the disk, and delete both the time version number information corresponding to the valid tuples in the first mapping table and the first storage block information corresponding to the valid tuples in the cache.
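Because each version in the cache carries the original storage block information of its tuples, the mapping table can be rebuilt by replaying the versions in ascending order, as in this minimal sketch. It assumes the cache contents survive the failure and can be read back as `{version: {tuple_id: (orig_block, data)}}`; the tuple identifier and the name `rebuild_mapping` are hypothetical.

```python
def rebuild_mapping(versions):
    """Reconstruct the first mapping table from the versions left in the cache."""
    mapping = {}
    for version in sorted(versions):                    # replay in write order
        for tuple_id, (orig_block, _data) in versions[version].items():
            mapping[tuple_id] = (version, orig_block)   # higher version wins
    return mapping
```

The rebuilt table can then drive the same valid-tuple scan and write-back that are used during normal operation.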
In addition, when the server is shut down as instructed by the user, the first mapping table in the memory may be stored in the cache, so that after restarting, the server writes the remaining versions of first storage block information in the cache to the disk according to the first mapping table. For the method of writing the dirty data in the cache to the disk, refer to this embodiment; details are not repeated here.
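Storing the mapping table in the cache at an orderly shutdown only requires a serialized form of it. The sketch below uses JSON purely as an assumed format, since the text does not specify one; tuple identifiers are assumed to be strings because JSON object keys must be strings, and `save_mapping`/`load_mapping` are hypothetical names.

```python
import json


def save_mapping(mapping):
    """Serialize the first mapping table so it can be written to the cache at shutdown."""
    return json.dumps({tid: [version, blk] for tid, (version, blk) in mapping.items()})


def load_mapping(blob):
    """Restore the mapping table after restart (JSON arrays become tuples again)."""
    return {tid: (version, blk) for tid, (version, blk) in json.loads(blob).items()}
```

Persisting the table at shutdown avoids the full replay over all cached versions that crash recovery requires.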
In the dirty data processing method provided by this embodiment of the present invention, a first storage block is determined in the memory, the tuples marked as dirty data in the memory are consolidated and written into the cache, and the dirty data in the cache is written to the disk during an idle period of the service. Compared with the prior art, when a large amount of data in the memory is modified within a short time and the modified data is scattered across different storage blocks, the method provided by this embodiment can significantly improve the data throughput and read/write performance of the database system, and makes it easier for the system to look up or modify specified tuple data. It can also reduce the read/write frequency of the cache and extend the service life of the cache.
Another embodiment of the present invention provides an apparatus for processing dirty data, which can implement the foregoing method embodiment. As shown in FIG. 3, the apparatus includes:
a determining unit 31, configured to determine a first storage block in the memory, where the size of the first storage block matches the write specification of the cache;
a first writing unit 32, configured to merge the tuples marked as dirty data in the memory and write them into the first storage block;
a second writing unit 33, configured to write the dirty data in the first storage block into the cache, and write the dirty data to the disk through the cache.
Further, as shown in FIG. 4, the determining unit 31 may further include an integration subunit 311 or a reservation subunit 312, where:
the integration subunit 311 is configured to consolidate the free space in the memory to obtain the first storage block;
the reservation subunit 312 is configured to reserve, in the memory, a storage space that meets the specification of the first storage block as the first storage block.
Specifically, the first writing unit 32 is further configured to write related information of the tuples marked as dirty data in the memory into the first storage block, where the related information includes the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to the data of each tuple.
Further, as shown in FIG. 5, the apparatus further includes a processing unit 34, and the second writing unit 33 specifically includes a first processing subunit 331, a first lookup subunit 332, and a second processing subunit 333, where:
the processing unit 34 is configured to establish a first mapping table in the memory, where the first mapping table is used to record the time version number of each tuple in the first storage block and the original storage block information to which each tuple belongs, and the time version number of each tuple represents the version information of the first storage block to which the tuple belongs in the cache. The processing unit 34 is further configured to, when a tuple marked as dirty data in the memory has been modified multiple times, modify the time version number information of the tuple in the first mapping table to update the first mapping table;
specifically, the first processing subunit 331 is configured to write the dirty data in the first storage block into the cache, and write the dirty data to the disk through the cache;
the first lookup subunit 332 is configured to, when a tuple marked as dirty data in the memory has been modified multiple times, look up, according to the first mapping table, the time version number of the final value of each tuple in the dirty data, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information which stores the final value as a valid tuple;
the second processing subunit 333 is configured to write the valid tuples determined by the first lookup subunit 332 to the disk, and delete the tuple data information corresponding to the valid tuples in the cache;
and the processing unit 34 is further configured to, after the second processing subunit 333 writes the valid tuples to the disk, delete the time version number information corresponding to the valid tuples in the first mapping table.
Further, as shown in FIG. 6, the second writing unit 33 may further include a second lookup subunit 334 and a third processing subunit 335, and the apparatus further includes a first lookup unit 35 and a second lookup unit 36, where:
the second lookup subunit 334 is configured to determine, according to the first mapping table, the original storage block information to which each of the valid tuples belongs;
the third processing subunit 335 is configured to merge the tuples belonging to the same original storage block and write them to the disk, and delete the tuple data information corresponding to those tuples in the cache;
the first lookup unit 35 is configured to, when a specified tuple needs to be looked up, look up the first mapping table to determine whether the cache contains the specified tuple;
the second lookup unit 36 is configured to, when the cache contains the specified tuple, determine, according to the first mapping table, the first storage block information in the cache that contains the final value of the specified tuple, and determine the data of the specified tuple.
In the apparatus shown in FIG. 6, further, the processing unit 34 is also configured to, when an abnormal situation forcibly terminates the process of writing dirty data to the disk, reconstruct the first mapping table after the server restarts according to the remaining versions of first storage block information in the cache;
the first lookup subunit 332 is further configured to look up, according to the first mapping table determined by the processing unit 34, the time version number of the final value of each tuple in the remaining dirty data in the cache, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information which stores the final value as a valid tuple;
the second processing subunit 333 is further configured to write the valid tuples determined by the first lookup subunit 332 to the disk, and delete the tuple data information corresponding to the valid tuples in the cache;
and the processing unit 34 is further configured to, after the second processing subunit 333 writes the valid tuples to the disk, delete the time version number information corresponding to the valid tuples in the first mapping table.
In the apparatus shown in FIG. 6, further, the processing unit 34 is also configured to, when the server is shut down, store the first mapping table in the memory into the cache, so that after restarting, the server writes the remaining versions of first storage block information in the cache to the disk according to the first mapping table.
In the dirty data processing apparatus provided by this embodiment of the present invention, the determining unit 31 determines a first storage block in the memory, the first writing unit 32 consolidates the tuples marked as dirty data in the memory and writes them into the first storage block, and the second writing unit 33 writes the dirty data in the first storage block into the cache during an idle period of the service and writes the dirty data to the disk through the cache. Compared with the prior art, when a large amount of data in the memory is modified within a short time and the modified data is scattered across different storage blocks, the apparatus provided by this embodiment can significantly improve the data throughput and read/write performance of the database system, and makes it easier for the system to look up or modify specified tuple data. It can also reduce the read/write frequency of the cache and extend the service life of the cache.
An embodiment of the present invention further provides a storage device, which includes the apparatus described in FIG. 3 to FIG. 6 and a processor, where the processor is configured to control the apparatus for processing dirty data. The storage device is thus capable of processing dirty data. It should be noted that the storage device may be used as memory or as a cache; this is not limited herein.
Through the description of the foregoing embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, or by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing descriptions are merely specific implementations of the present invention, but are not intended to limit the protection scope of the present invention. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for processing dirty data, comprising:
determining a first storage block in a memory, wherein a size of the first storage block matches a write specification of a cache;
merging tuples marked as dirty data in the memory and writing them into the first storage block; and
writing the dirty data in the first storage block into the cache, and writing the dirty data to a disk through the cache.
2. The method according to claim 1, wherein the determining a first storage block in a memory comprises:
consolidating free space in the memory to obtain the first storage block; or
reserving, in the memory, a storage space that meets a specification of the first storage block as the first storage block.
3. The method according to claim 2, wherein after the merging tuples marked as dirty data in the memory and writing them into the first storage block, the first storage block stores original storage block information to which each tuple marked as dirty data belongs, data of each tuple, and a pointer to the data of each tuple.
4. The method according to claim 3, wherein after the merging tuples marked as dirty data in the memory and writing them into the first storage block, the method further comprises:
establishing a first mapping table in the memory, wherein the first mapping table is used to record a time version number of each tuple in the first storage block and original storage block information to which each tuple belongs, and the time version number of each tuple is used to represent version information of the first storage block to which the tuple belongs in the cache.
5. The method according to claim 4, wherein when a tuple marked as dirty data in the memory has been modified multiple times, the method further comprises:
modifying time version number information of the tuple in the first mapping table to update the first mapping table;
and the writing the dirty data in the first storage block into the cache, and writing the dirty data to a disk through the cache comprises:
looking up, according to the first mapping table, a time version number of a final value of each tuple in the dirty data, determining first storage block information in the cache corresponding to the time version number, and marking a tuple in the first storage block information that stores the final value of each tuple as a valid tuple; and
writing the valid tuple to the disk, and deleting time version number information corresponding to the valid tuple in the first mapping table and tuple data information corresponding to the valid tuple in the cache.
6. The method according to claim 5, wherein the writing the valid tuple to the disk comprises:
determining, according to the first mapping table, original storage block information to which each of the valid tuples belongs; and
merging tuples belonging to a same original storage block and writing them to the disk, and deleting tuple data information corresponding to the tuples in the cache.
7. The method according to any one of claims 1 to 6, wherein when a specified tuple needs to be looked up, the method further comprises:
accessing the first mapping table to determine whether the cache comprises the specified tuple; and
when the cache comprises the specified tuple, determining, according to the first mapping table, first storage block information in the cache that comprises a final value of the specified tuple, and determining data of the specified tuple.
8. The method according to any one of claims 1 to 6, wherein when an abnormal situation forcibly terminates a process of writing dirty data to the disk, the method further comprises:
after a server restarts, reconstructing the first mapping table according to remaining versions of first storage block information in the cache;
looking up, according to the first mapping table, a time version number of a final value of each tuple in remaining dirty data in the cache, determining first storage block information in the cache corresponding to the time version number, and marking a tuple in the first storage block information that stores the final value of each tuple as a valid tuple; and
writing the valid tuple to the disk, and deleting time version number information corresponding to the valid tuple in the first mapping table and tuple data information corresponding to the valid tuple in the cache.
9. The method according to any one of claims 1 to 6, wherein when a server is shut down, the method further comprises:
storing the first mapping table in the memory into the cache, so that after restarting, the server writes remaining versions of first storage block information in the cache to the disk according to the first mapping table.
10. An apparatus for processing dirty data, comprising:
a determining unit, configured to determine a first storage block in a memory, wherein a size of the first storage block matches a write specification of a cache;
a first writing unit, configured to merge tuples marked as dirty data in the memory and write them into the first storage block; and
a second writing unit, configured to write the dirty data in the first storage block into the cache, and write the dirty data to a disk through the cache.
11. The apparatus according to claim 10, wherein the determining unit comprises an integration subunit or a reservation subunit, wherein:
the integration subunit is configured to consolidate free space in the memory to obtain the first storage block; and
the reservation subunit is configured to reserve, in the memory, a storage space that meets a specification of the first storage block as the first storage block.
12. The apparatus according to claim 11, wherein the first writing unit is further configured to write related information of the tuples marked as dirty data in the memory into the first storage block, and the related information comprises original storage block information to which each tuple marked as dirty data belongs, data of each tuple, and a pointer to the data of each tuple.
13. The apparatus according to claim 12, further comprising:
a processing unit, configured to establish a first mapping table in the memory, wherein the first mapping table is used to record a time version number of each tuple in the first storage block and original storage block information to which each tuple belongs, and the time version number of each tuple is used to represent version information of the first storage block to which the tuple belongs in the cache.
14. The apparatus according to claim 13, wherein the processing unit is further configured to, when a tuple marked as dirty data in the memory has been modified multiple times, modify time version number information of the tuple in the first mapping table to update the first mapping table;
the second writing unit comprises a first processing subunit, a first lookup subunit, and a second processing subunit, wherein:
the first processing subunit is configured to write the dirty data in the first storage block into the cache, and write the dirty data to the disk through the cache;
the first lookup subunit is configured to, when the tuple marked as dirty data in the memory has been modified multiple times, look up, according to the first mapping table, a time version number of a final value of each tuple in the dirty data, determine first storage block information in the cache corresponding to the time version number, and mark a tuple in the first storage block information that stores the final value of each tuple as a valid tuple;
the second processing subunit is configured to write the valid tuple determined by the first lookup subunit to the disk, and delete tuple data information corresponding to the valid tuple in the cache; and
the processing unit is further configured to, after the second processing subunit writes the valid tuple to the disk, delete time version number information corresponding to the valid tuple in the first mapping table.
15. The apparatus according to claim 14, wherein the second writing unit further comprises a second lookup subunit and a third processing subunit, wherein:
the second lookup subunit is configured to determine, according to the first mapping table, original storage block information to which each of the valid tuples belongs; and
the third processing subunit is configured to merge tuples belonging to a same original storage block and write them to the disk, and delete tuple data information corresponding to the tuples in the cache.
16. The apparatus according to any one of claims 10 to 15, further comprising:
a first lookup unit, configured to, when a specified tuple needs to be looked up, access the first mapping table to determine whether the cache comprises the specified tuple; and
a second lookup unit, configured to, when the cache comprises the specified tuple, determine, according to the first mapping table, first storage block information in the cache that comprises a final value of the specified tuple, and determine data of the specified tuple.
17. The apparatus according to claim 16, wherein the processing unit is further configured to: when an abnormal condition forces the process of writing dirty data to the disk to terminate, reconstruct the first mapping table after the server restarts, according to the first storage block information of the versions remaining in the cache;
the first search subunit is further configured to look up, according to the first mapping table reconstructed by the processing unit, the time version number of the final value of each tuple in the dirty data remaining in the cache, determine the first storage block information in the cache corresponding to that time version number, and mark the tuples in that first storage block information that store the final values, setting them as valid tuples;
the second processing subunit is further configured to write the valid tuples determined by the first search subunit to the disk, and delete the tuple data information corresponding to the valid tuples from the cache; and
the processing unit is further configured to: after the second processing subunit writes the valid tuples to the disk, delete the time version number information corresponding to the valid tuples from the first mapping table.
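Claim 17 rebuilds the mapping table from the blocks that survive a crash. A minimal sketch, assuming (this is not stated in the claim) that each cached block persists the time version number of every tuple it stores, as `{block_id: [(tuple_id, version, data), ...]}`; the function names are hypothetical:

```python
from collections import defaultdict


def rebuild_mapping(blocks):
    """Rebuild the first mapping table from the blocks remaining in the cache.

    Assumption: blocks = {block_id: [(tuple_id, version, data), ...]}
    """
    mapping = defaultdict(dict)
    for block_id, entries in blocks.items():
        for tuple_id, version, _data in entries:
            mapping[tuple_id][version] = block_id
    return dict(mapping)


def recover(blocks, disk):
    """Flush the final value of each remaining dirty tuple; return how many were flushed."""
    mapping = rebuild_mapping(blocks)
    for tuple_id, versions in mapping.items():
        latest = max(versions)
        # The entry with the highest version is the valid tuple: write it to disk.
        for tid, version, data in blocks[versions[latest]]:
            if tid == tuple_id and version == latest:
                disk[tuple_id] = data
                break
        # Delete every cached copy of this tuple (final and superseded versions).
        for block_id in set(versions.values()):
            blocks[block_id] = [e for e in blocks[block_id] if e[0] != tuple_id]
    # Blocks emptied by the flush can be released.
    for block_id in [b for b, entries in blocks.items() if not entries]:
        del blocks[block_id]
    return len(mapping)
```

Storing the version alongside each tuple is what makes the table reconstructible: without it, the order of the surviving copies could not be recovered.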
18. The apparatus according to claim 16, wherein the processing unit is further configured to: when the server is shut down, store the first mapping table held in the memory into the cache, so that after restarting, the server writes the first storage block information of the versions remaining in the cache to the disk according to the first mapping table.
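Claim 18 instead saves the table at an orderly shutdown so it need not be rebuilt. A hedged sketch, assuming the serialized table is kept in some reserved area of a non-volatile cache; JSON is only a stand-in serialization, and the function names and layouts are hypothetical:

```python
import json


def save_mapping(mapping):
    """Serialize the in-memory first mapping table at shutdown.

    mapping: {tuple_id: {version: block_id}}. json.dumps turns the int
    version keys into strings, so load_mapping converts them back.
    """
    return json.dumps(mapping)


def load_mapping(raw):
    """Restore the table after restart, with version numbers as ints again."""
    return {
        tid: {int(ver): blk for ver, blk in versions.items()}
        for tid, versions in json.loads(raw).items()
    }


def flush_after_restart(raw, blocks, disk):
    """Write the final value of every tuple listed in the saved table to disk.

    blocks: {block_id: {tuple_id: data}} remaining in the cache.
    """
    mapping = load_mapping(raw)
    for tid, versions in mapping.items():
        disk[tid] = blocks[versions[max(versions)]][tid]
    return mapping
```

Converting the keys back to `int` matters: comparing string versions with `max` would order `"10"` before `"9"` and pick the wrong block.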
19. A memory, comprising the apparatus for processing dirty data according to any one of claims 9 to 16, and a processor, wherein:
the processor is configured to control the apparatus for processing dirty data.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201180002177.XA CN102725752B (en) 2011-10-20 2011-10-20 Method and device for processing dirty data
PCT/CN2011/081046 WO2012083754A1 (en) 2011-10-20 2011-10-20 Method and device for processing dirty data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/081046 WO2012083754A1 (en) 2011-10-20 2011-10-20 Method and device for processing dirty data

Publications (1)

Publication Number Publication Date
WO2012083754A1 (en) 2012-06-28

Family

ID=46313122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/081046 WO2012083754A1 (en) 2011-10-20 2011-10-20 Method and device for processing dirty data

Country Status (2)

Country Link
CN (1) CN102725752B (en)
WO (1) WO2012083754A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218430B (en) * 2013-04-11 2016-03-02 Huawei Technologies Co., Ltd. Method, system and device for controlling data writing
CN103513941B (en) * 2013-10-18 2016-08-17 Huawei Technologies Co., Ltd. Method and device for writing data
CN103714121B (en) * 2013-12-03 2017-07-14 Huawei Technologies Co., Ltd. Method and device for managing index records
CN103631940B (en) * 2013-12-09 2017-02-08 China United Network Communications Group Co., Ltd. Data writing method and data writing system applied to HBASE database
CN104331452B (en) * 2014-10-30 2017-07-28 Beijing Si-Tech Information Technology Co., Ltd. Method and system for processing dirty data
EP3321767B1 (en) 2015-12-30 2020-04-15 Huawei Technologies Co., Ltd. Method for reducing power consumption of memory and computer device
CN106802950A (en) * 2017-01-16 2017-06-06 Zhengzhou Yunhai Information Technology Co., Ltd. Method for optimizing the write cache for small files in a distributed file system
CN110704468A (en) * 2019-10-17 2020-01-17 Wuhan Weipai Network Technology Co., Ltd. Data updating method, device, and controller
CN111563053B (en) * 2020-07-10 2020-12-11 Alibaba Cloud Computing Co., Ltd. Method and device for processing Bitmap data
CN112115073A (en) * 2020-09-04 2020-12-22 Beijing EasyStack Technology Development Co., Ltd. Recovery method and device applied to Bcache

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1851677A (en) * 2005-11-25 2006-10-25 Huawei Technologies Co., Ltd. Embedded processor system and data operation method thereof
CN101178689A (en) * 2007-12-06 2008-05-14 Zhejiang University of Science and Technology Dynamic management technique for NAND flash memory
CN101916290A (en) * 2010-08-18 2010-12-15 ZTE Corporation In-memory database management method and device
US20110191535A1 (en) * 2010-02-01 2011-08-04 Fujitsu Limited Method for controlling disk array apparatus and disk array apparatus
WO2011114384A1 (en) * 2010-03-19 2011-09-22 Hitachi, Ltd. Storage system and method for changing configuration of cache memory for storage system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593352A (en) * 2012-08-15 2014-02-19 Alibaba Group Holding Ltd. Method and device for cleaning mass data
CN103593352B (en) * 2012-08-15 2016-10-12 Alibaba Group Holding Ltd. Mass data cleaning method and device
CN105763351A (en) * 2014-12-17 2016-07-13 Huawei Technologies Co., Ltd. Method for deploying value-added service, forwarding equipment, detection equipment, and management equipment
CN105763351B (en) * 2014-12-17 2019-09-03 Huawei Technologies Co., Ltd. Method for deploying value-added service, forwarding device, detection device, and management device
CN108319609A (en) * 2017-01-16 2018-07-24 Yidu Cloud (Beijing) Technology Co., Ltd. ETL data processing method and system, data cleaning method and device
JP2020510905A (en) * 2017-02-06 2020-04-09 ZTE Corporation Flash memory file system and data management method thereof

Also Published As

Publication number Publication date
CN102725752A (en) 2012-10-10
CN102725752B (en) 2014-07-16

Similar Documents

Publication Publication Date Title
WO2012083754A1 (en) Method and device for processing dirty data
US9449005B2 (en) Metadata storage system and management method for cluster file system
US9703640B2 (en) Method and system of performing incremental SQL server database backups
US8799601B1 (en) Techniques for managing deduplication based on recently written extents
US9836514B2 (en) Cache based key-value store mapping and replication
US9305040B2 (en) Efficient B-tree data serialization
US9418094B2 (en) Method and apparatus for performing multi-stage table updates
CN106662981A (en) Storage device, program, and information processing method
US9542279B2 (en) Shadow paging based log segment directory
US11526465B2 (en) Generating hash trees for database schemas
WO2017166815A1 (en) Data updating method and device for a distributed database system
WO2016070529A1 (en) Method and device for achieving duplicated data deletion
WO2014089828A1 (en) Method for accessing storage device and storage device
KR101674176B1 (en) Method and apparatus for fsync system call processing using ordered mode journaling with file unit
WO2018076633A1 (en) Remote data replication method, storage device and storage system
US9411692B2 (en) Applying write elision
US10423583B1 (en) Efficient caching and configuration for retrieving data from a storage system
US11625503B2 (en) Data integrity procedure
US20060155774A1 (en) Handling access requests to a page while copying an updated page of data to storage
US11899625B2 (en) Systems and methods for replication time estimation in a data deduplication system
US10528254B2 (en) Methods and systems of garbage collection and defragmentation in a distributed database
US10664442B1 (en) Method and system for data consistency verification in a storage system
US11748259B2 (en) System and method to conserve device lifetime for snapshot generation
CN116257531B (en) Database space recovery method
US11531644B2 (en) Fractional consistent global snapshots of a distributed namespace

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 201180002177.X

Country of ref document: CN

121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11851101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11851101

Country of ref document: EP

Kind code of ref document: A1