WO2012083754A1

WO2012083754A1 - Method and device for processing dirty data

Info

Publication number: WO2012083754A1
Application number: PCT/CN2011/081046
Authority: WO
Inventors: 时家幸 (Shi Jiaxing)
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date: 2011-10-20
Filing date: 2011-10-20
Publication date: 2012-06-28
Also published as: CN102725752A; CN102725752B

Abstract

Disclosed are a method and a device for processing dirty data. The method provided in an embodiment of the present invention includes: determining a first storage block in memory, the size of the first storage block matching the write specification of the cache; combining the tuples marked as dirty data in memory and writing them into the first storage block; and writing the dirty data from the first storage block into the cache, and writing the dirty data to a disk through the cache. By implementing the present invention, the data throughput and read/write performance of a database system can be improved.

Description

Method and device for processing dirty data

The present invention relates to the field of storage technologies, and in particular, to a method and apparatus for processing dirty data.

Background

A database is a repository that organizes, stores, and manages data according to its data structure. In daily work, related data often needs to be put into such a "warehouse" and processed as management requires. In traditional database systems and other storage-related engines, when data is modified in memory, the modified data must be written to disk immediately (or within a short time) to guarantee transaction integrity and the reliability of the data in the database. While the modified data is being written to disk, no new data can be written to memory, so the memory has to suspend external service, which limits memory throughput and the read/write performance of the system.

Because the read/write speed of a disk is much lower than that of memory, system performance is greatly reduced. At present, read/write performance is improved by adding a flash device similar to a Solid State Disk (SSD) as a cache: memory writes the modified data to the SSD in units of memory blocks, and the data cached in the SSD is written to disk during idle periods of the service, which improves system throughput and read/write performance. Data that has been modified in the cache but has not yet been written to disk is dirty data.

When a large amount of data is modified in memory within a short time and the modified data is scattered across different storage blocks, each data block the SSD reads or writes contains only a small amount of dirty data. As a result, the data throughput and read/write performance of the database system are low, causing delayed system responses and even database crashes.

Summary of the invention

Embodiments of the present invention provide a method and apparatus for processing dirty data that can improve the data throughput and read/write performance of a database system.

Embodiments of the present invention adopt the following technical solutions.

In one aspect, an embodiment of the present invention provides a method for processing dirty data, including: determining a first storage block in memory, the size of the first storage block matching a write specification of a cache;

combining the tuples in memory that are marked as dirty data and writing them into the first storage block; and

writing the dirty data in the first storage block to the cache, and writing the dirty data to a disk through the cache.

In another aspect, an embodiment of the present invention provides an apparatus for processing dirty data, including: a determining unit, configured to determine a first storage block in memory, the size of the first storage block matching a write specification of a cache;

a first writing unit, configured to combine the tuples marked as dirty data in memory and write them into the first storage block; and

a second writing unit, configured to write the dirty data in the first storage block to the cache, and write the dirty data to a disk through the cache.

The method and apparatus for processing dirty data provided by the embodiments of the present invention combine the tuples marked as dirty data in memory, write them together into the cache, and then write the dirty data to the disk through the cache. This improves the data throughput and read/write performance of the database system, and also reduces how often the cache is read and written, prolonging the service life of the cache.

DRAWINGS

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a device according to another embodiment of the present invention;

FIG. 4 is another schematic structural diagram of a device according to another embodiment of the present invention;

FIG. 5 is still another schematic structural diagram of a device according to another embodiment of the present invention;

FIG. 6 is still another schematic structural diagram of a device according to another embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

The dirty data in the embodiments of the present invention may be data that has been modified in the cache but not yet written to disk. The embodiments of the present invention can be applied to various types of databases and data warehouse systems, including DB databases, Oracle databases, SQL databases, and the like.

An embodiment of the present invention provides a method for processing dirty data. As shown in FIG. 1, the method includes:

101. Determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache.

102. Combine the tuples in memory that are marked as dirty data and write them into the first storage block.

103. Write dirty data in the first storage block to the cache, and write the dirty data to a disk by using the cache.

The method for processing dirty data provided by this embodiment of the present invention combines the tuples marked as dirty data in memory, writes them together into the cache, and then writes the dirty data to the disk through the cache. This improves the data throughput and read/write performance of the database system, and also reduces how often the cache is read and written, thereby prolonging the service life of the cache.

Another embodiment of the present invention further provides a method for processing dirty data, as shown in FIG. 2, including:

201. Determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache.

The cache is a cache device that connects the memory and the disk; the write specification of the cache is the maximum amount of data the cache can write per refresh. In general, the read/write speed of the cache is much higher than that of the disk. To improve the read/write efficiency of dirty data, a storage space whose size equals or is close to the write specification of the cache can be determined in memory as the first storage block. Specifically, the free space in memory may be integrated to obtain the first storage block, or a storage space that meets the specification of the first storage block may be reserved in memory as the first storage block; this is not limited herein.
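A minimal sketch of the two ways step 201 can obtain the first storage block, under stated assumptions: free memory is modeled as a hypothetical list of `(offset, length)` extents, and `write_spec` is the cache's write specification in bytes. The helper names `reserve_first_block` and `integrate_free_space` are illustrative, not taken from the source, and the largest-extent-first gathering order is an assumption.

```python
from typing import List, Optional, Tuple

Extent = Tuple[int, int]  # hypothetical (offset, length) of a free memory region


def reserve_first_block(write_spec: int) -> bytearray:
    """Reservation: set aside a buffer exactly the size of the cache write specification."""
    return bytearray(write_spec)


def integrate_free_space(free_extents: List[Extent], write_spec: int) -> Optional[List[Extent]]:
    """Integration: gather free extents until their combined size equals write_spec.

    Larger extents are taken first (an assumption) and the last extent is trimmed.
    Returns None if free memory cannot cover one full write specification."""
    chosen: List[Extent] = []
    remaining = write_spec
    for offset, length in sorted(free_extents, key=lambda e: e[1], reverse=True):
        if remaining == 0:
            break
        take = min(length, remaining)
        chosen.append((offset, take))
        remaining -= take
    return chosen if remaining == 0 else None
```

The integration variant returns `None` when free memory cannot cover one full write specification; the text presents reservation in advance as the alternative approach.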

Preferably, the cache may be a flash device similar to a Solid State Disk (SSD), but is not limited thereto.

202. Combine the tuples in memory that are marked as dirty data and write them into the first storage block.

It should be noted that, after the tuples marked as dirty data in memory are combined and written into the first storage block, the first storage block stores, for each tuple marked as dirty data, the original storage block information to which the tuple belongs, the tuple data, and a pointer to the tuple data. Here a tuple may be a storage unit that stores dirty data, or may represent a connection of multiple storage units, but is not limited thereto.

Specifically, when a large amount of data is modified in memory within a short time and the modified data is scattered across different storage blocks, the first storage block can gather the tuples marked as dirty data and write them into the cache together, thereby improving the read/write efficiency of dirty data.
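The gathering in step 202 can be sketched as follows, assuming memory blocks are modeled as hypothetical dictionaries `{block_id: {tuple_id: bytes}}` and dirty tuples are identified by `(block_id, tuple_id)` pairs. For each packed tuple the sketch records the three items the text lists: the original storage block, the tuple data, and a pointer (here an offset into the buffer). Keeping that metadata in a side list instead of serializing it into the same buffer is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    origin_block: int  # original storage block the tuple belongs to
    tuple_id: int      # hypothetical tuple identifier within that block
    offset: int        # "pointer" to the tuple data inside the first storage block
    length: int        # size of the tuple data in bytes


def pack_dirty_tuples(
    blocks: Dict[int, Dict[int, bytes]],
    dirty: Set[Tuple[int, int]],
    write_spec: int,
) -> List[Tuple[bytearray, List[Entry]]]:
    """Gather dirty tuples scattered over many memory blocks into buffers of at
    most write_spec bytes; each returned buffer is one fill of the first storage block."""
    batches: List[Tuple[bytearray, List[Entry]]] = []
    buf = bytearray()
    entries: List[Entry] = []
    for block_id in sorted(blocks):
        for tuple_id, data in sorted(blocks[block_id].items()):
            if (block_id, tuple_id) not in dirty:
                continue  # clean tuples are not copied
            if len(data) > write_spec:
                raise ValueError("tuple larger than the cache write specification")
            if len(buf) + len(data) > write_spec:
                batches.append((buf, entries))  # this fill is full: hand it to the cache
                buf = bytearray()
                entries = []
            entries.append(Entry(block_id, tuple_id, len(buf), len(data)))
            buf += data
    if entries:
        batches.append((buf, entries))
    return batches
```

For a sense of scale with hypothetical numbers: 1,000 dirty 128-byte tuples, one per memory block, would need 1,000 block-sized cache writes if each dirty block were written separately, but with a 64 KiB write specification they fit into 2 fills of the first storage block (512 tuples per fill).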

203. Establish a first mapping table in memory, where the first mapping table is used to record the time version number of each tuple in the first storage block information.

When there is a large amount of dirty data in memory, the first storage block may be used multiple times to write dirty data to the cache, so multiple versions of first storage block information are stored in the cache. To facilitate indexing, the first storage block information may be numbered in the order in which it is written to the cache, which determines the time version number of each piece of first storage block information. All tuples in the same version of first storage block information share the same time version number, and the time version number of a tuple identifies the first storage block information in the cache to which the tuple belongs.

204. Write dirty data in the first storage block to the cache.

It should be noted that when the tuples marked as dirty data in memory occupy more storage space than the first storage block provides, the first storage block must be used multiple times to write them to the cache, so that different versions of first storage block information are stored in the cache. In practice, a tuple marked as dirty data in memory may be modified many times, so multiple values of that tuple are often recorded in the cache; however, when the dirty data in the cache is written to disk, only the final value of each tuple needs to be written. To improve read/write efficiency, the method provided in this embodiment further includes:

205. When a tuple marked as dirty data in memory is modified multiple times, modify the time version number information of the tuple in the first mapping table, thereby updating the first mapping table.
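Steps 203 to 205 amount to a version counter plus a per-tuple index. The sketch below is one possible in-memory form, assuming tuples are keyed by a hypothetical `(original_block_id, tuple_id)` pair; the class and method names are illustrative. Each flush of the first storage block to the cache gets the next time version number, and re-flushing a tuple that was modified again simply overwrites its entry, which is the update described in step 205.

```python
from typing import Dict, Iterable, Optional, Tuple

TupleKey = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


class FirstMappingTable:
    """Records, for every dirty tuple, the time version number of the
    first-storage-block flush that holds its latest value in the cache."""

    def __init__(self) -> None:
        self.next_version = 1  # versions are numbered in cache write order
        self.versions: Dict[TupleKey, int] = {}

    def record_flush(self, keys: Iterable[TupleKey]) -> int:
        """Assign a new time version number to one flush and point each tuple at it."""
        version = self.next_version
        self.next_version += 1
        for key in keys:
            # A tuple modified again and re-flushed overwrites its old version (step 205).
            self.versions[key] = version
        return version

    def latest_version(self, key: TupleKey) -> Optional[int]:
        """Version holding the final value of the tuple, or None if it is not cached."""
        return self.versions.get(key)

    def forget(self, key: TupleKey) -> None:
        """Drop the entry once the tuple's final value has reached disk (step 207)."""
        self.versions.pop(key, None)
```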

206. Search the first mapping table for the time version number of the final value of each tuple in the dirty data, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information that stores the final value, setting it as a valid tuple.

Specifically, the effective tuple can be determined by using, but not limited to, the following methods:

Starting from the lowest version, read each version of the first storage block information in version order; for each tuple in the current version, use the time version numbers to check whether the tuple was modified again in a higher version. If so, ignore the current copy of the tuple; if not, retain it and mark it as a valid tuple.
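The version scan just described can be sketched as below. The cache contents are modeled, as an assumption, by a dictionary mapping each time version number to that version's tuples (`{version: {key: data}}`); the function name is hypothetical. A copy of a tuple is kept only if no higher version contains the same tuple, so exactly one copy, the final value, survives per tuple.

```python
from typing import Dict, Set, Tuple

Key = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


def valid_tuples(cache_versions: Dict[int, Dict[Key, bytes]]) -> Dict[Key, Tuple[int, bytes]]:
    """Return {key: (version, data)} holding only the final value of each tuple."""
    ordered = sorted(cache_versions)  # lowest version first
    result: Dict[Key, Tuple[int, bytes]] = {}
    for i, version in enumerate(ordered):
        # Collect every tuple that reappears in a higher version.
        modified_later: Set[Key] = set()
        for higher in ordered[i + 1:]:
            modified_later.update(cache_versions[higher])
        for key, data in cache_versions[version].items():
            if key in modified_later:
                continue  # superseded by a later modification: ignore this copy
            result[key] = (version, data)  # final value: mark as a valid tuple
    return result
```

This literal scan is quadratic in the number of versions; with the first mapping table available, the same test reduces to checking whether a tuple's recorded version equals the version being scanned.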

207. During idle periods of the service, write the valid tuples in the cache to disk, and delete the time version number information corresponding to the valid tuples from the first mapping table and the corresponding first storage block information from the cache.

Preferably, when the valid tuples are written to disk, the original storage block to which each valid tuple belongs may be determined according to the first mapping table, and the tuples belonging to the same original storage block may be combined and written to disk together to improve read/write efficiency.
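The preferred write path, grouping valid tuples by original storage block so that each block costs one disk write, might look like this. It assumes valid tuples arrive in the `{(block_id, tuple_id): (version, data)}` form that a scan like the one in step 206 could produce, and models the disk as a hypothetical dictionary of blocks. Deleting the matching mapping-table entries follows step 207; removal from the cache is left out for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


def group_by_origin(valid: Dict[Key, Tuple[int, bytes]]) -> Dict[int, List[Tuple[int, bytes]]]:
    """Group valid tuples by the original storage block they belong to."""
    groups: Dict[int, List[Tuple[int, bytes]]] = defaultdict(list)
    for (block_id, tuple_id), (_version, data) in sorted(valid.items()):
        groups[block_id].append((tuple_id, data))
    return dict(groups)


def write_valid_tuples(
    valid: Dict[Key, Tuple[int, bytes]],
    disk: Dict[int, Dict[int, bytes]],
    mapping: Dict[Key, int],
) -> int:
    """Write each original block's valid tuples in one combined write and drop
    their time version numbers from the mapping table. Returns the block-write count."""
    writes = 0
    for block_id, items in group_by_origin(valid).items():
        disk.setdefault(block_id, {}).update(items)  # one write per original block
        writes += 1
    for key in valid:
        mapping.pop(key, None)
    return writes
```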

Specifically, when the system needs to find a specified tuple, it may access the first mapping table to determine whether the cache contains the specified tuple. If not, the tuple data is read from disk; if so, the first storage block information in the cache that contains the final value of the specified tuple is determined according to the first mapping table, and the data of the specified tuple is obtained from it.
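A sketch of this lookup path, using hypothetical models: the mapping table as `{key: version}`, the cache as `{version: {key: data}}`, and the disk as `{block_id: {tuple_id: data}}`, with tuples keyed by `(block_id, tuple_id)`. The function name is illustrative.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


def find_tuple(
    key: Key,
    mapping: Dict[Key, int],
    cache_versions: Dict[int, Dict[Key, bytes]],
    disk: Dict[int, Dict[int, bytes]],
) -> bytes:
    """Return the current value of a tuple, preferring the cached final value."""
    version = mapping.get(key)
    if version is None:
        # Not in the cache: the disk copy is current.
        block_id, tuple_id = key
        return disk[block_id][tuple_id]
    # The mapping table names the version that holds the final value.
    return cache_versions[version][key]
```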

When the system needs to modify the data of an entire storage block, the corresponding tuples are obtained from the cache to overwrite the specified storage block on disk; when the system needs to modify a single tuple, the modification is completed according to the method provided in this embodiment.

It is worth noting that when an abnormal situation (such as a power failure, a database system crash, or a forced shutdown of the database server) terminates the process of writing dirty data to disk, the remaining dirty data in the cache can be written to disk through the following steps:

After the server restarts, reconstruct the first mapping table according to the remaining versions of first storage block information in the cache;

search the first mapping table for the time version number of the final value of each tuple in the remaining dirty data in the cache, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information that stores the final value, setting it as a valid tuple;

write the valid tuples to disk, and delete the time version number information corresponding to the valid tuples from the first mapping table and the corresponding first storage block information from the cache.
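Recovery can be sketched as a replay of the surviving versions in ascending order. When the mapping table is rebuilt, a higher version overwrites any earlier entry for the same tuple, so the entries left point at the final values, which are then flushed. The data models and function names are hypothetical, and the sketch assumes the versions found in the cache after restart are complete and readable.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


def rebuild_mapping(cache_versions: Dict[int, Dict[Key, bytes]]) -> Dict[Key, int]:
    """Reconstruct the first mapping table from the versions left in the cache."""
    mapping: Dict[Key, int] = {}
    for version in sorted(cache_versions):  # lowest version first
        for key in cache_versions[version]:
            mapping[key] = version  # a higher version overwrites a lower one
    return mapping


def recover(cache_versions: Dict[int, Dict[Key, bytes]], disk: Dict[int, Dict[int, bytes]]) -> int:
    """Write the final value of every remaining tuple to disk, then drop the
    cached versions. Returns the number of tuples written."""
    mapping = rebuild_mapping(cache_versions)
    for (block_id, tuple_id), version in mapping.items():
        disk.setdefault(block_id, {})[tuple_id] = cache_versions[version][(block_id, tuple_id)]
    cache_versions.clear()  # every remaining version has now been applied
    return len(mapping)
```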

In addition, when the server is shut down at the user's instruction, the first mapping table in memory may be stored in the cache, so that after the server restarts, the remaining versions of first storage block information in the cache are written to disk according to the first mapping table. For the method of writing the dirty data in the cache to disk, refer to this embodiment; details are not repeated here.
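Storing the first mapping table in the cache at shutdown only requires a serialization format that survives the restart. A minimal sketch using the `json` module from the Python standard library is shown below; the row format (`[block_id, tuple_id, version]`) and the function names are assumptions, since the source does not specify how the table is encoded.

```python
import json
from typing import Dict, Tuple

Key = Tuple[int, int]  # hypothetical key: (original block id, tuple id)


def dump_mapping(mapping: Dict[Key, int]) -> bytes:
    """Serialize the mapping table so it can be written into the cache at shutdown.
    JSON objects need string keys, so rows are stored as [block, tuple, version]."""
    rows = [[block_id, tuple_id, version] for (block_id, tuple_id), version in sorted(mapping.items())]
    return json.dumps(rows).encode("utf-8")


def load_mapping(blob: bytes) -> Dict[Key, int]:
    """Rebuild the mapping table from the bytes read back from the cache after restart."""
    return {(block_id, tuple_id): version for block_id, tuple_id, version in json.loads(blob.decode("utf-8"))}
```

After restart, the loaded table can drive the same valid-tuple flush as step 207 without rescanning every version to rebuild it.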

In the method for processing dirty data provided by this embodiment of the present invention, a first storage block is determined in memory, the tuples marked as dirty data in memory are gathered and written into the cache, and the dirty data in the cache is written to disk during idle periods of the service. Compared with the prior art, when a large amount of data is modified in memory within a short time and the modified data is scattered across different storage blocks, the method can significantly improve the data throughput and read/write performance of the database system, and makes it easier for the system to find or modify specified tuple data; it also reduces how often the cache is read and written, prolonging the service life of the cache.

A further embodiment of the present invention provides a device for processing dirty data that can implement the foregoing method embodiment. As shown in FIG. 3, the device includes: a determining unit 31, configured to determine a first storage block in memory, where the size of the first storage block matches a write specification of a cache;

a first writing unit 32, configured to combine the tuples marked as dirty data in memory and write them into the first storage block;

The second write unit 33 is configured to write dirty data in the first storage block to the cache, and write the dirty data to the disk by using the cache.

Further, as shown in FIG. 4, the determining unit 31 may further include an integration subunit 311 or a reservation subunit 312, where:

The integration subunit 311 is configured to integrate the free space in the memory to obtain the first storage block.

The reservation subunit 312 is configured to reserve, in the memory, a storage space conforming to the first storage block specification as the first storage block.

Specifically, the first writing unit 32 is further configured to write related information of a tuple marked as dirty data in the memory to the first storage block, where the related information of the tuple includes each tuple marked as dirty data. The original storage block information to which it belongs, the data for each tuple, and a pointer to each tuple data.

Further, as shown in FIG. 5, the apparatus further includes a processing unit 34. The second writing unit 33 specifically includes a first processing sub-unit 331, a first searching sub-unit 332, and a second processing sub-unit 333, where:

The processing unit 34 is configured to establish a first mapping table in memory, where the first mapping table is used to record the time version number of each tuple in the first storage block information, and the time version number of each tuple represents the version of the first storage block information in the cache to which the tuple belongs. When a tuple marked as dirty data in memory is modified multiple times, the processing unit 34 modifies the time version number information of the tuple in the first mapping table, thereby updating the first mapping table;

Specifically, the first processing sub-unit 331 is configured to write the dirty data in the first storage block to the cache, and write the dirty data to the disk through the cache;

The first search sub-unit 332 is configured to: when a tuple marked as dirty data in memory is modified multiple times, search the first mapping table for the time version number of the final value of each tuple in the dirty data, determine the first storage block information in the cache corresponding to that time version number, and mark the tuple in that first storage block information that stores the final value, setting it as a valid tuple;

The second processing sub-unit 333 is configured to write the valid tuple determined by the first lookup sub-unit 332 to the disk, and delete the tuple data information corresponding to the valid tuple in the cache;

The processing unit 34 is further configured to delete, after the second processing sub-unit 333 writes the valid tuple to the disk, the time version number information corresponding to the valid tuple in the first mapping table.

Further, as shown in FIG. 6, the second writing unit 33 may further include a second searching subunit 334 and a third processing subunit 335, and the apparatus further includes a first searching unit 35 and a second searching unit 36, where:

The second lookup subunit 334 is configured to determine, according to the first mapping table, original storage block information to which each tuple in the valid tuple belongs;

The third processing sub-unit 335 is configured to combine the tuples belonging to the same original storage block, write them to the disk together, and delete the tuple data information corresponding to these tuples from the cache.

The first searching unit 35 is configured to: when the specified tuple needs to be searched, look up the first mapping table, and determine whether the specified tuple is included in the cache;

The second searching unit 36 is configured to: when the specified tuple is contained in the cache, determine, according to the first mapping table, the first storage block information in the cache that contains the final value of the specified tuple, and determine the data of the specified tuple.

According to the apparatus of FIG. 6, further, the processing unit 34 is further configured to: when an abnormal situation occurs, causing the process of writing dirty data to the disk to be terminated, after the server is restarted, according to the The first storage block information of the remaining versions in the cache is used to reconstruct the first mapping table. The first searching sub-unit 332 is further configured to search for remaining dirty data in the cache according to the first mapping table determined by the processing unit 34. a time version number of a final value of each tuple data, determining first storage block information corresponding to the time version number in the cache, and storing the final value of each tuple data in the first storage block information The tuple is marked and set as an effective tuple;

The second processing sub-unit 333 is further configured to write the valid tuple determined by the first lookup sub-unit 332 to the disk, and delete the tuple data information corresponding to the valid tuple in the cache; the processing unit 34 further And after the second processing sub-unit 333 writes the valid tuple to the disk, deleting time version number information corresponding to the valid tuple in the first mapping table.

Further, in the apparatus shown in FIG. 6, the processing unit 34 is further configured to store the first mapping table in memory into the cache when the server is shut down, so that after restarting, the server writes the remaining versions of first storage block information in the cache to the disk according to the first mapping table.

In the device for processing dirty data provided by this embodiment of the present invention, the determining unit 31 determines the first storage block in memory, the first writing unit 32 gathers the tuples marked as dirty data in memory and writes them into the first storage block, and the second writing unit 33 writes the dirty data in the first storage block to the cache and, during idle periods of the service, writes the dirty data to the disk through the cache. Compared with the prior art, when a large amount of data is modified in memory within a short time and the modified data is scattered across different storage blocks, the device can significantly improve the data throughput and read/write performance of the database system, and makes it easier for the system to find or modify specified tuple data; it also reduces how often the cache is read and written, prolonging the service life of the cache.

Embodiments of the present invention also provide a memory that includes the apparatus described in FIG. 3 to FIG. 6 and a processor configured to control the apparatus for processing dirty data; this memory can therefore process dirty data. It should be noted that the memory may serve as main memory or as a cache, which is not limited herein.

From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, hard disk, or optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The foregoing is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims


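1. A method for processing dirty data, comprising:

Determining a first storage block in memory, the size of the first storage block matching a write specification of a cache;

Combining the tuples in memory that are marked as dirty data and writing them into the first storage block; and

Writing the dirty data in the first storage block to the cache, and writing the dirty data to a disk through the cache.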

2. The method according to claim 1, wherein the determining of the first storage block in the memory comprises:

Integrating the free space in the memory to obtain the first storage block; or

A storage space conforming to the first storage block specification is reserved in the memory as the first storage block.

3. The method according to claim 2, wherein after the tuples marked as dirty data in the memory are combined and written into the first storage block, the first storage block stores the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to the data of each tuple.

4. The method according to claim 3, wherein after the tuples marked as dirty data in the memory are combined and written into the first storage block, the method further comprises:

Establishing a first mapping table in the memory, where the first mapping table is configured to record the time version number of each tuple in the first storage block information, the time version number being used to represent the version information of the first storage block in the cache to which the tuple belongs.

5. The method according to claim 4, wherein when the tuple marked as dirty data in the memory is modified a plurality of times, the method further includes:

Modifying the time version number information of the tuple in the first mapping table, and updating the first mapping table;

The writing of the dirty data in the first storage block to the cache and the writing of the dirty data to the disk through the cache then comprise:

Searching the first mapping table for the time version number of the final value of each tuple in the dirty data, determining the first storage block information in the cache corresponding to the time version number, and marking the tuple in the first storage block information that stores the final value of each tuple, setting it as a valid tuple;

Writing the valid tuple to the disk, and deleting the time version number information corresponding to the valid tuple in the first mapping table and the tuple data information corresponding to the valid tuple in the cache.

6. The method according to claim 5, wherein the writing the valid tuple to the disk comprises:

Determining, according to the first mapping table, original storage block information to which each tuple in the effective tuple belongs;

Combining the tuples belonging to the same original storage block, writing them to the disk together, and deleting the tuple data information corresponding to these tuples in the cache.

7. The method according to any one of claims 1 to 6, wherein when the specified tuple needs to be searched, the method further comprises:

Accessing the first mapping table to determine whether the specified tuple is contained in the cache; and when the specified tuple is contained in the cache, determining, according to the first mapping table, the first storage block information in the cache that contains the final value of the specified tuple, and determining the data of the specified tuple.

8. The method according to any one of claims 1 to 6, wherein when an abnormal situation terminates the process of writing dirty data to the disk, the method further includes:

After the server restarts, reconstructing the first mapping table according to the remaining versions of first storage block information in the cache;

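Searching the first mapping table for the time version number of the final value of each tuple in the remaining dirty data in the cache, determining the first storage block information in the cache corresponding to the time version number, and marking the tuple in the first storage block information that stores the final value of each tuple, setting it as a valid tuple; and

Writing the valid tuple to the disk, and deleting the time version number information corresponding to the valid tuple in the first mapping table and the tuple data information corresponding to the valid tuple in the cache.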

9. The method according to any one of claims 1 to 6, wherein when the server is shut down, the method further comprises:

Storing the first mapping table in the memory into the cache, so that after restarting, the server writes the remaining versions of first storage block information in the cache to the disk according to the first mapping table.

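10. A device for processing dirty data, comprising:

a determining unit, configured to determine a first storage block in memory, where the size of the first storage block matches a write specification of a cache;

a first writing unit, configured to combine the tuples marked as dirty data in the memory and write them into the first storage block; and

a second writing unit, configured to write the dirty data in the first storage block to the cache, and write the dirty data to a disk through the cache.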

11. The device according to claim 10, wherein the determining unit comprises an integration subunit or a reservation subunit, wherein:

The integration subunit is configured to integrate the free space in the memory to obtain the first storage block;

The reservation subunit is configured to reserve, in the memory, a storage space that meets the specification of the first storage block as the first storage block.

12. The device according to claim 11, wherein the first writing unit is further configured to write related information of a tuple marked as dirty data in the memory into the first storage block, where the related information of the tuple includes original storage block information to which each tuple marked as dirty data belongs, data of each tuple, and a pointer to each tuple data.

13. The device according to claim 12, further comprising: a processing unit, configured to establish a first mapping table in the memory, where the first mapping table is used to record the time version number of each tuple in the first storage block information, and the time version number of each tuple represents the version information of the first storage block in the cache to which the tuple belongs.

14. The device according to claim 13, wherein the processing unit is further configured to: when the tuple marked as dirty data in the memory is modified a plurality of times, modify the time version number information of the tuple in the first mapping table, thereby updating the first mapping table;

The second write unit includes a first processing subunit, a first lookup subunit, and a second processing subunit, where:

The first processing subunit is configured to write the dirty data in the first storage block to the cache, and write the dirty data to a disk through the cache;

The first search subunit is configured to: when the tuple marked as dirty data in the memory is modified multiple times, search the first mapping table for the time version number of the final value of each tuple in the dirty data, determine the first storage block information in the cache corresponding to the time version number, and mark the tuple in the first storage block information that stores the final value of each tuple, setting it as a valid tuple;

The second processing subunit is configured to write the valid tuple determined by the first search subunit to the disk, and delete the tuple data information corresponding to the valid tuple in the cache;

The processing unit is further configured to: after the second processing subunit writes the valid tuple to the disk, delete time version number information corresponding to the valid tuple in the first mapping table.

15. The device according to claim 14, wherein the second writing unit further comprises a second searching subunit and a third processing subunit, wherein:

The second search subunit is configured to determine, according to the first mapping table, original storage block information to which each tuple in the valid tuple belongs;

The third processing subunit is configured to combine the tuples belonging to the same original storage block, write them to the disk together, and delete the tuple data information corresponding to these tuples in the cache.

16. The apparatus according to any one of claims 10 to 15, further comprising:

a first searching unit, configured to: when a specified tuple needs to be searched, access the first mapping table to determine whether the specified tuple is contained in the cache;

a second searching unit, configured to: when the specified tuple is contained in the cache, determine, according to the first mapping table, the first storage block information in the cache that contains the final value of the specified tuple, and determine the data of the specified tuple.

17. The device according to any one of claims 1 to 6, wherein the processing unit is further configured to: when an abnormal situation terminates the process of writing dirty data to the disk, after the server restarts, reconstruct the first mapping table according to the remaining versions of first storage block information in the cache;

The first search subunit is further configured to search, according to the first mapping table reconstructed by the processing unit, for the time version number of the final value of each tuple in the remaining dirty data in the cache, determine the first storage block information in the cache corresponding to the time version number, and mark the tuple in the first storage block information that stores the final value of each tuple, setting it as a valid tuple;

The second processing subunit is further configured to write the valid tuple determined by the first search subunit to the disk, and delete the tuple data information corresponding to the valid tuple in the cache;

18. The device according to any one of claims 1 to 6, wherein the processing unit is further configured to: when the server is shut down, store the first mapping table in the memory into the cache, so that after restarting, the server writes the remaining versions of first storage block information in the cache to the disk according to the first mapping table.

19. A memory, characterized by comprising the apparatus for processing dirty data according to any one of claims 9 to 16, and a processor, wherein:

The processor is configured to control the device that processes dirty data.