WO2022007937A1 - Method and apparatus for processing Bitmap data - Google Patents
Method and apparatus for processing Bitmap data
- Publication number
- WO2022007937A1 (PCT/CN2021/105416)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- update
- partitions
- disk
- memory
- area
- Prior art date
Classifications
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
- G06F12/0804—Caches with main memory updating
- G06F12/0873—Mapping of cache memory to specific storage devices or parts thereof
- G06F12/0891—Caches using clearing, invalidating or resetting means
- G06F11/1451—Management of the data involved in backup by selection of backup contents
- G06F11/1466—Management of the backup or restore process to make the backup process non-disruptive
- G06F11/1471—Saving, restoring, recovering or retrying involving logging of persistent data for recovery
- G06F16/2237—Indexing structures: vectors, bitmaps or matrices
- G06F3/061—Improving I/O performance
- G06F3/064—Management of blocks
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
- G06F3/0676—Magnetic disk device
- G06F2212/1024—Latency reduction
- G06F2212/1032—Reliability improvement, data loss prevention, degraded operation etc.
- G06F2212/1056—Simplification
- G06F2212/163—Server or database system
- G06F2212/466—Caching of metadata, control data
Definitions
- the embodiments of this specification relate to the technical field of databases, and in particular, to a method for processing Bitmap data.
- One or more embodiments of this specification simultaneously relate to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium.
- A Bitmap is a disk space management structure that represents a dense set over a finite domain, in which each element appears at most once. It is widely used in indexing, data compression, and similar applications.
- The Bitmap has the same data structure in memory and on disk.
- In the usual management of a Bitmap, the condition for writing back to the disk is considered satisfied when, for example, the log is about to be filled, the machine is about to go down, or other such conditions hold.
- At that point, the full amount of unpersisted data in the memory area corresponding to the Bitmap, also known as dirty data, is flushed to the Bitmap area of the disk.
- the embodiments of this specification provide a method for processing Bitmap data.
- One or more embodiments of this specification simultaneously relate to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium, so as to solve the technical defects existing in the prior art.
- A method for processing Bitmap data includes: dividing a Bitmap area in a disk into a plurality of partitions in advance and setting an update area in the disk; in response to the condition for writing back to the disk being satisfied, obtaining the amount of dirty data corresponding to each of the plurality of partitions in the memory; according to the amount of dirty data corresponding to each partition, finding, from the plurality of partitions, a plurality of second partitions whose dirty data is small enough to be merged into the update area; and merging the dirty data corresponding to the plurality of second partitions in the memory and recording it to the update area in the disk through one or more I/Os.
- Recording the merged dirty data of the multiple second partitions to the update area through one or more I/Os includes: encoding the update operations in the dirty data corresponding to the second partitions in the memory, where the encoding records the position of each update operation within its second partition; merging the encoded update operations of the second partitions into the memory area corresponding to the update area; and recording the merged, encoded update operations from that memory area to the update area in the disk through one or more I/Os.
- the size of the update area is smaller than or equal to the size of a single I/O data block.
- Obtaining the amount of dirty data corresponding to each of the plurality of partitions in response to the write-back condition being satisfied includes: in response to a checkpoint event being triggered, obtaining the amount of dirty data corresponding to each partition in the memory; or, in response to the amount of dirty data in the memory region corresponding to any one or more partitions reaching a preset dirty-data flushing threshold, obtaining the amount of dirty data corresponding to each partition in the memory.
- The method may further include: when the condition for writing back to the disk is satisfied, finding, according to the amount of dirty data corresponding to each partition, the first partitions whose dirty data is too large to be merged into the update area; and updating the dirty data corresponding to each first partition in the memory to that first partition in the disk on a per-partition basis.
- the size of a single partition in the multiple partitions is less than or equal to the size of a single I/O data block.
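As an illustration of the partition layout described above, the following Python sketch divides a Bitmap region into fixed-size partitions aligned to a single I/O data block and places an update area of the same size after them. All names and the 4KB block size are assumptions for illustration, not details taken from this specification.

```python
# Hypothetical sketch: dividing a Bitmap region into fixed-size partitions
# aligned to the I/O block size, plus one update area of the same size.
IO_BLOCK_SIZE = 4096  # bytes; one atomic I/O data block (assumed 4KB)

def make_layout(bitmap_bytes, partition_size=IO_BLOCK_SIZE):
    """Return (partition_offsets, update_area_offset) for a Bitmap region."""
    assert partition_size <= IO_BLOCK_SIZE, "partition must fit one I/O block"
    n_parts = (bitmap_bytes + partition_size - 1) // partition_size
    partition_offsets = [i * partition_size for i in range(n_parts)]
    # The update area is placed after the last partition and is the same
    # size as one partition, so a single I/O can persist it atomically.
    update_area_offset = n_parts * partition_size
    return partition_offsets, update_area_offset
```

Because each partition fits in one I/O block, flushing a whole partition is a single atomic write, which matches the alignment requirement stated above.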
- The method may further include: in response to the condition for writing back to the memory being satisfied, loading the full amount of data from the Bitmap area of the disk into the corresponding area of the memory, reading the update operations out of the update area of the disk and applying them to the corresponding positions in the memory, and applying the incremental updates after the log number most recently written back to the disk to the memory region corresponding to the Bitmap area.
- The method may further include: setting a corresponding partition header for each partition of the Bitmap area in advance, the partition header recording the current latest log number when data is written to the corresponding partition; and setting a corresponding update header for the update area, the update header recording the current latest log number when data is written to the update area.
- Reading the update operations out of the update area of the disk and applying them to the corresponding positions in the memory includes: comparing the log number recorded in the update header with the log number recorded in the partition header of a second partition; and, if the log number recorded in the update header is larger, reading the update operations from the update area of the disk and applying them to the corresponding positions in the memory.
- The method may further include: setting a corresponding partition header for each partition of the Bitmap area in advance, the partition header recording a cyclic redundancy check code of the data in the corresponding partition; and verifying, against the cyclic redundancy check code recorded in the partition header, the correctness of the data loaded into the corresponding area of the memory.
- Reading the update operations out of the update area and applying them to the corresponding positions in the memory includes: reading the encoded update operations from the update area of the disk; decoding them to obtain each update operation and its position within the second partition; and applying each update operation at that position in the memory.
- an apparatus for processing Bitmap data including: a setting module configured to divide a Bitmap area in a disk into a plurality of partitions in advance and set an update area in the disk.
- the write-back disk response module is configured to obtain the amount of dirty data corresponding to each of the plurality of partitions in the memory in response to the write-back disk condition being satisfied.
- the merging determination module is configured to find, according to the respective amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amount of dirty data is sufficient to be merged into the update area from the plurality of partitions.
- the merging and recording module is configured to merge the dirty data corresponding to the plurality of second partitions in the memory and record them to the update area in the disk through one or more I/Os.
- a computing device including: a memory and a processor; the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions:
- The Bitmap area in the disk is divided into multiple partitions in advance and an update area is set in the disk; in response to the condition for writing back to the disk being satisfied, the amount of dirty data corresponding to each partition in the memory is obtained; according to these amounts, multiple second partitions whose dirty data is small enough to be merged into the update area are found among the partitions; and the dirty data corresponding to the second partitions in the memory is merged and recorded to the update area in the disk through one or more I/Os.
- a computer-readable storage medium which stores computer instructions, and when the instructions are executed by a processor, implements the steps of the method for processing Bitmap data described in any one of the embodiments of the present specification.
- An embodiment of the present specification implements a method for processing Bitmap data. Because the method divides the Bitmap area in the disk into multiple partitions in advance and sets an update area in the disk, the amount of dirty data corresponding to each partition in the memory can be obtained once the condition for writing back to the disk is satisfied.
- According to these amounts, multiple second partitions whose dirty data is small enough to be merged into the update area are found among the partitions.
- The dirty data of these second partitions, that is, the partitions with less dirty data on the Bitmap, is then merged and written to the update area in the disk through one or more I/Os, so that the small scattered updates on the Bitmap are aggregated into the update area set on the disk. Since persisting the aggregated dirty data requires fewer I/Os than updating the dirty data of each lightly dirty partition to the disk separately, the cost of updating a large number of dirty bits in place in the Bitmap is avoided, the amount of I/O issued is reduced, and system efficiency is improved.
- FIG. 1 is a flowchart of a method for processing Bitmap data provided by an embodiment of this specification
- FIG. 2 is a schematic diagram of the division of the Bitmap area in the disk provided by an embodiment of this specification
- FIG. 3 is a schematic diagram of the Bitmap checkpoint process provided by an embodiment of this specification.
- FIG. 5 is a process flow diagram of a method for processing Bitmap data provided by an embodiment of this specification.
- FIG. 6 is a schematic structural diagram of a device for processing Bitmap data provided by an embodiment of this specification.
- FIG. 7 is a schematic structural diagram of a device for processing Bitmap data provided by another embodiment of this specification.
- FIG. 8 is a structural block diagram of a computing device provided by an embodiment of the present specification.
- LSN (Log Sequence Number): the log number.
- Each transaction operation corresponds to a log record, and each log record is identified by a unique ID, that is, an LSN.
- Checkpoint LSN: the most recent log number of the system log when the checkpoint event was triggered.
- A method for processing Bitmap data is provided, and this specification also relates to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium, which are described one by one in the following embodiments.
- FIG. 1 shows a flowchart of a method for processing Bitmap data according to an embodiment of the present specification, including steps 102 to 108.
- Step 102: Divide the Bitmap area in the disk into multiple partitions in advance and set the update area in the disk.
- In this step, the Bitmap area in the disk is divided into multiple partitions, for example several fixed-size intervals, each partition being a Bitmap block, and an update area is set in the disk.
- the Bitmap area on the disk has a corresponding area with the same structure in the memory, and does not need to be encoded or decoded.
- The update area may include, but is not limited to, updates to the Bitmap on the disk, the metadata of the Bitmap, the latest LSN when the disk was flushed, the checkpoint location, and the like.
- The size of the update area can, for example, be consistent with the size of a Bitmap block.
- The size of a single partition may be less than or equal to the size of a single I/O data block, so that the Bitmap block size is aligned with the I/O block size (such as 4KB or 16KB), thereby guaranteeing the atomicity of writes.
- the size and quantity of the Bitmap block and the size of the update area are not limited.
- The Bitmap area of the disk can be organized as a set of Bitmap blocks as shown in FIG. 2.
- Multiple sets of Bitmap blocks can be used to achieve the atomicity of a series of checkpoint I/Os: if one set was written originally, the next checkpoint is written to the other set, and the identity of the current set is recorded in the update area, so that the location to which the update operations apply can be determined during failure recovery.
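The two-set scheme above can be sketched as a small state machine. The class and field names here are illustrative assumptions, and persistence of the set identity is modeled with an in-memory dictionary standing in for the update area:

```python
# Illustrative sketch of the "two sets of Bitmap blocks" scheme: each
# checkpoint writes to the set NOT used by the previous checkpoint, and
# the update area records which set is current, so recovery knows where
# the update operations apply.
class BitmapSets:
    def __init__(self):
        self.current_set = 0               # set of the last completed checkpoint
        self.update_area_meta = {"set": 0}  # stands in for the on-disk update area

    def begin_checkpoint(self):
        # Write to the other set, so a crash mid-checkpoint never
        # corrupts the last durable copy.
        return 1 - self.current_set

    def commit_checkpoint(self, written_set):
        # Persisting the set id in the update area is the commit point.
        self.update_area_meta["set"] = written_set
        self.current_set = written_set

    def recovery_set(self):
        # On failure recovery, read the set id back from the update area.
        return self.update_area_meta["set"]
```

A crash before `commit_checkpoint` leaves the old set id in the update area, so recovery simply ignores the half-written set.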
- the size of the update area may be smaller than or equal to the size of a data block of a single I/O.
- Step 104: In response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in the memory.
- When the Bitmap is doing a checkpoint, some updates may have accumulated on each Bitmap block, and each update corresponds to a dirty bit in the Bitmap block. Statistics on the amount of dirty data of all Bitmap blocks can be obtained in the memory.
- Step 106: According to the amount of dirty data corresponding to each of the plurality of partitions, find, from the plurality of partitions, a plurality of second partitions whose dirty data is small enough to be merged into the update area.
- The purpose of finding multiple second partitions is to identify the partitions with less dirty data among the multiple partitions, so that the dirty data of these lightly dirty partitions can be merged to reduce the I/O load. The embodiments of this specification therefore do not limit how the second partitions are found, and search rules can be preset according to the needs of the scenario. For example, all partitions may be sorted by the amount of dirty data, so that the few partitions with the least dirty data can be selected as second partitions.
- Alternatively, a dirty data volume threshold may be preset: partitions whose dirty data is greater than or equal to the preset threshold are used as first partitions, and partitions whose dirty data is smaller than the threshold are used as second partitions.
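A minimal sketch of this threshold rule, with an illustrative threshold value (the specification does not fix one):

```python
# Partitions at or above a preset dirty-amount threshold are "first"
# partitions (flushed in place, one I/O each); partitions below it are
# "second" partitions (their dirty bits are merged into the update area).
def classify_partitions(dirty_counts, threshold=64):
    """dirty_counts: {partition_id: number of dirty bits}."""
    first = [p for p, n in dirty_counts.items() if n >= threshold]
    second = [p for p, n in dirty_counts.items() if 0 < n < threshold]
    return first, second
```

Partitions with no dirty bits at all fall into neither list, since they need no I/O during the checkpoint.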
- Step 108: Merge the dirty data corresponding to the second partitions in the memory and record it to the update area in the disk through one or more I/Os.
- In FIG. 3, partition 301 and partition 302 are second partitions with less dirty data, whose dirty data can be merged into the update area. Therefore, as shown in FIG. 3, the dirty data of partition 301 and partition 302 can be merged and written to the update area of the disk, so as to reduce the amount of I/O issued.
- Since the method divides the Bitmap area in the disk into multiple partitions in advance and sets an update area in the disk, the amount of dirty data corresponding to each partition in the memory can be obtained when the condition for writing back to the disk is satisfied.
- According to these amounts, multiple second partitions whose dirty data is small enough to be merged into the update area are found among the partitions.
- The dirty data of these second partitions, that is, the partitions with less dirty data on the Bitmap, is merged and written to the update area in the disk through one or more I/Os, so that the small scattered updates on the Bitmap are aggregated into the update area set on the disk. Since persisting the aggregated dirty data requires fewer I/Os than updating the dirty data of each lightly dirty partition to the disk separately, the cost of updating a large number of dirty bits in place in the Bitmap is avoided, the amount of I/O issued is reduced, and system efficiency is improved.
- the method for processing Bitmap data provided by the embodiments of this specification can be applied to the PolarStore multi-version storage engine.
- The storage engine adopts the Bitmap structure to manage the chunk space. With the method provided by the embodiments of this specification, read and write amplification when a large total amount of Bitmap data is persisted can be avoided, improving the performance and stability of the system.
- the location of the update operation may be recorded by encoding the update operation in the dirty data of the second partition.
- Specifically, the update operations in the dirty data corresponding to the multiple second partitions in the memory may be encoded, where the encoding records the position of each update operation within its second partition;
- the encoded update operations of the second partitions are then merged into the memory area corresponding to the update area, and the merged, encoded update operations are recorded from that memory area to the update area in the disk through one or more I/Os.
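A hedged sketch of such an encoding. The packed 32-bit layout (12-bit partition id, 19-bit bit offset, 1-bit value) is an assumption for illustration, not the encoding actually used by this specification:

```python
# Each dirty bit is encoded as (partition id, bit offset, new value), so
# recovery can locate it inside the right second partition.
def encode_op(part_id, bit_off, value):
    # Assumed layout: 12 bits partition id | 19 bits bit offset | 1 bit value
    return (part_id << 20) | (bit_off << 1) | value

def decode_op(word):
    return word >> 20, (word >> 1) & 0x7FFFF, word & 1

def merge_dirty(second_parts, dirty_bits):
    """dirty_bits: {part_id: [(bit_off, value), ...]} -> one encoded list
    that can be persisted to the update area in a single I/O."""
    return [encode_op(p, off, v)
            for p in second_parts
            for off, v in dirty_bits.get(p, [])]
```

With 4-byte encoded operations, a 4KB update area could hold about a thousand merged updates, which is the point of aggregating lightly dirty partitions.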
- The conditions for writing back to the disk may include: a checkpoint event being triggered, the amount of dirty data in the memory region corresponding to any one or more partitions reaching a preset dirty-data flushing threshold, and similar situations.
- Accordingly, obtaining the amount of dirty data corresponding to each partition in response to the write-back condition being satisfied may include: in response to a checkpoint event being triggered, obtaining the amount of dirty data corresponding to each partition in the memory;
- or, in response to the amount of dirty data of any one or more partitions in the corresponding memory region reaching a preset dirty-data flushing threshold, obtaining the amount of dirty data corresponding to each partition in the memory.
- Regarding the preset dirty-data flushing threshold: when the system is running normally and the dirty bits accumulated in a certain Bitmap block have reached the flushing threshold of that block, the Bitmap block is proactively flushed in advance, so that when the checkpoint is actually performed, the dirty bits remaining on the Bitmap blocks have a higher probability of being merged into the update area. This reduces the number of Bitmap blocks that need to be flushed directly to their partitions, further reduces the I/O pressure during the checkpoint, and makes the I/O happen more smoothly.
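The proactive flushing rule might be sketched as follows. The function names and the idea of resetting the dirty counter after an early flush are assumptions for illustration:

```python
# While the system runs normally, any Bitmap block whose accumulated
# dirty bits reach the per-block flush threshold is flushed early, so
# that at checkpoint time more blocks fall below the merge threshold.
def maybe_flush_early(dirty_counts, flush_threshold, flush_fn):
    """Flush (and reset) any partition that reached the threshold.
    Returns the list of partitions flushed early."""
    flushed = []
    for part, n in list(dirty_counts.items()):
        if n >= flush_threshold:
            flush_fn(part)          # write the whole block to disk
            dirty_counts[part] = 0  # its dirty bits are now persisted
            flushed.append(part)
    return flushed
```

Spreading these flushes over normal operation is what smooths the I/O burst that would otherwise occur at checkpoint time.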
- In addition, first partitions whose amount of dirty data is too large to be merged into the update area may be found among the multiple partitions according to their respective amounts of dirty data, and the dirty data corresponding to each first partition in the memory is updated to that first partition in the disk on a per-partition basis.
- In this way, the dirty data of the partitions with more dirty data is flushed directly to the disk, while the dirty data of the partitions with less dirty data is aggregated and written to the update area.
- This avoids the cost of updating a large number of dirty bits in place in the Bitmap, reduces the amount of I/O issued, and improves the processing efficiency of the system.
- During failure recovery, the full amount of data can be loaded from the Bitmap area of the disk into the corresponding area of the memory, and the update operations can be read out of the update area of the disk
- and applied to the corresponding positions in the memory; the incremental updates after the log number most recently written back to the disk are then applied to the memory region corresponding to the Bitmap area.
- That is, the data can be recovered by loading the full Bitmap data into the memory, reading the update operations from the update area and applying them to the memory, and replaying the log.
- Specifically, the encoded update operations can be read from the update area of the disk during failure recovery and decoded to obtain each update operation and its position within the second partition, and each update operation is then applied at that position in the memory.
- During a checkpoint, one or more Bitmap blocks with a larger number of dirty bits can be selected and updated to the disk, while the remaining update operations are encoded, recorded in the update area, and persisted through one or more I/Os.
- The checkpoint is then considered complete.
- During recovery, the Bitmap blocks in the disk are loaded and applied to the corresponding structure in the memory; the data in the update area is then read, decoded, and converted into update operations.
- Finally, the incremental updates after the checkpoint LSN in the log are applied to the Bitmap to complete the failure recovery.
- In other words, the Bitmap blocks in the disk are first loaded and applied to the corresponding structure in the memory. Since the data in the update area consists of updates to the Bitmap blocks, it is decoded through step 404 and converted into update operations, which act on the corresponding Bitmap blocks. Finally, through step 406, the incremental updates after the event most recently written back to the disk in the log are applied to the memory region corresponding to the Bitmap. It can be seen that failure recovery can be completed efficiently and accurately through this implementation, and system efficiency can be improved.
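The three-step recovery sequence above can be sketched as follows, under assumed in-memory structures (partitions as lists of bits, decoded update operations as (partition, offset, value) tuples, and log records carrying an LSN):

```python
# Sketch of the recovery flow: (1) load the full Bitmap from disk,
# (2) apply the decoded update-area operations, (3) replay log records
# newer than the checkpoint LSN. The data shapes are illustrative.
def recover(disk_bitmap, update_area_ops, log, checkpoint_lsn):
    # Step 1: load the full on-disk Bitmap into memory (same structure).
    mem = [row[:] for row in disk_bitmap]
    # Step 2: apply decoded update operations from the update area.
    for part_id, bit_off, value in update_area_ops:
        mem[part_id][bit_off] = value
    # Step 3: replay incremental updates after the checkpoint LSN.
    for lsn, part_id, bit_off, value in log:
        if lsn > checkpoint_lsn:
            mem[part_id][bit_off] = value
    return mem
```

The order matters: the update area holds operations that were part of the last checkpoint, so it is applied before log replay, which covers only updates newer than that checkpoint.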
- a corresponding partition header may be set for each partition of the Bitmap area in advance, and the partition header is used to record whether the corresponding partition is The latest log number of the log when data is written; the corresponding update header is set for the update area in advance.
- the update header records the latest log number of the log at the time data is written to the update area.
- the step of reading update operations from the update area of the disk and applying them to the corresponding positions in memory includes: comparing the log number recorded in the update header with the log number recorded in the partition header of the second partition; if the log number recorded in the update header is greater, reading the update operations from the update area of the disk and applying them to the corresponding positions in memory.
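The comparison rule just described reduces to a single predicate. The function name is illustrative; the logic follows the text directly:

```python
def should_apply_update_area(update_header_lsn, partition_header_lsn):
    # Apply the update area only if it is strictly newer than the
    # partition's last full write-back; otherwise the partition on disk
    # already contains those updates and the application can be skipped.
    return update_header_lsn > partition_header_lsn

assert should_apply_update_area(10, 7)       # update area is newer: apply
assert not should_apply_update_area(7, 10)   # partition is newer: skip
```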
- a corresponding partition header may also be set for each partition of the Bitmap area in advance.
- a partition header can be added to each Bitmap block.
- the header can be located at any position in the Bitmap block, such as the beginning, middle, or end. Since the header occupies part of the space, the usable Bitmap block can be reduced accordingly.
- the header can contain metadata such as the system's latest LSN at the time the Bitmap block is written and a cyclic redundancy check code (the CRC of the Bitmap block).
- the LSNs recorded in the headers of the Bitmap blocks and the update area can be used, when the system recovers after a crash, to distinguish whether the update operations in the update area need to be applied to the corresponding Bitmap blocks.
- if the LSN recorded in the update area's header is larger, the update operations in the update area can be applied to the corresponding positions in memory; otherwise, they can be skipped.
- the number of update operations that must be applied can be reduced to a certain extent, improving system efficiency.
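A partition header holding the write-time LSN and a CRC of the block payload, as described above, might look like the following sketch. The dictionary layout is an assumption made here for illustration; only the header fields (LSN, CRC) come from the text.

```python
import zlib

# Illustrative partition header: write-time LSN plus a CRC of the block
# payload, used to verify correctness after the block is loaded.
def make_header(lsn, payload: bytes):
    return {"lsn": lsn, "crc": zlib.crc32(payload)}

def verify(header, payload: bytes):
    # Recompute the CRC over the loaded payload and compare.
    return header["crc"] == zlib.crc32(payload)

hdr = make_header(42, b"bitmap-block")
assert verify(hdr, b"bitmap-block")
assert not verify(hdr, b"corrupted")
```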
- FIG. 5 shows a process flow chart of a method for processing Bitmap data provided by an embodiment of the present specification, and the specific steps include steps 502 to 528 .
- Step 502: Divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk; set a corresponding partition header for each partition of the Bitmap area in advance; set a corresponding update header for the update area in advance.
- Step 504: In response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching the preset dirty-data flush threshold, sort all partitions by the amount of dirty data, select from them according to the ordering multiple partitions with less dirty data as the second partitions, and take the remaining partitions as the first partitions. Go to step 508.
- Step 506: In response to a checkpoint event being triggered, sort all partitions by the amount of dirty data, select from them according to the ordering multiple partitions with less dirty data as the second partitions, and take the remaining partitions as the first partitions. Go to step 508.
- Step 508: Update the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time.
- Step 510: Encode the update operations in the dirty data corresponding to the second partitions in memory, the encoding recording the position of each update operation within its second partition.
- Step 512 Merge the encoded update operations corresponding to the plurality of second partitions into an area corresponding to the update area in the memory.
- Step 514: Record the merged, encoded update operations from the corresponding memory area to the update area on the disk through one or more I/Os.
- Step 516 In response to the failure recovery event being triggered, load the full amount of data from the Bitmap area of the disk into the corresponding area of the memory.
- Step 518: Compare the log number recorded in the update header with the log number recorded in the partition header of the second partition.
- Step 520: If the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition, read the encoded update operations from the update area of the disk.
- Step 522 Decode the encoded update operation to obtain the update operation and the corresponding position of the update operation in the second partition.
- Step 524 Apply the update operation to the location in memory.
- Step 526 Apply the incremental update after the log number corresponding to the latest checkpoint in the log to the region corresponding to the Bitmap region in the memory.
- Step 528 Check the correctness of the data loaded into the corresponding area of the memory according to the cyclic redundancy check code recorded in the partition header.
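Steps 510 and 522 require an encoding that carries each update operation's position in its second partition. One possible fixed-width layout is sketched below; the exact on-disk format is not specified by the patent, so the field choice (block id, bit offset, new value) and widths are assumptions.

```python
import struct

# Guessed fixed-width encoding of one update operation (steps 510/522):
# 4-byte block id, 4-byte bit offset within the block, 1-byte new value.
def encode_op(block_id, bit_offset, value):
    return struct.pack("<IIB", block_id, bit_offset, value)

def decode_op(buf):
    # Inverse of encode_op: recover (block_id, bit_offset, value).
    return struct.unpack("<IIB", buf)

# Round trip: decoding yields the operation and its position.
assert decode_op(encode_op(3, 1024, 1)) == (3, 1024, 1)
```

Because the record is fixed-width (9 bytes here), the update area can be scanned record by record during recovery without any framing metadata.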
- based on the log mechanism, the Bitmap area is segmented and an update area is introduced; the updates of multiple partitions with less dirty data on the Bitmap are merged and transferred to the update area, aggregating updates and avoiding the cost of updating a large number of dirty bits in the Bitmap in place, reducing the amount of I/O issued and improving the system's checkpoint efficiency; based on the preset dirty-data flush threshold, Bitmap blocks can be flushed proactively in advance, reducing the number of Bitmap blocks that need to be flushed directly to their partitions.
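The proactive early-flush policy mentioned above amounts to a per-block threshold check during normal operation. The helper name and the concrete threshold value are illustrative, not prescribed by the text:

```python
# Blocks whose accumulated dirty bits reach the per-block threshold are
# flushed ahead of the next checkpoint, so fewer blocks need a direct
# flush when the checkpoint actually runs.
FLUSH_THRESHOLD = 64  # illustrative value

def blocks_to_flush_early(dirty_counts):
    """dirty_counts maps block_id -> number of accumulated dirty bits."""
    return [bid for bid, n in dirty_counts.items() if n >= FLUSH_THRESHOLD]

assert blocks_to_flush_early({0: 3, 1: 64, 2: 200}) == [1, 2]
```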
- FIG. 6 shows a schematic structural diagram of an apparatus for processing Bitmap data provided by an embodiment of the present specification.
- the apparatus includes: a setting module 602 , a write-back disk response module 604 , a merge determination module 606 and a merge record module 608 .
- the setting module 602 may be configured to divide the Bitmap area in the disk into multiple partitions in advance and set the update area in the disk.
- the write-back-to-disk response module 604 may be configured to, in response to the condition of write-back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the plurality of partitions in the memory.
- the merging determination module 606 may be configured to find, from the plurality of partitions, according to the respective amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions having a sufficient amount of dirty data to be merged into the update area.
- the merging and recording module 608 may be configured to merge the dirty data corresponding to the plurality of second partitions in the memory and record them to the update area in the disk through one or more I/Os.
- since the apparatus divides the Bitmap area on the disk into multiple partitions in advance and sets an update area on the disk, it can, in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory and, according to those amounts, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area.
- the dirty data corresponding in memory to multiple second partitions with less dirty data is merged and then updated to the disk through one or more I/Os.
- this transfers the aggregated updates of multiple partitions with less dirty data on the Bitmap to the update area set on the disk. Since the number of I/Os required to persist the aggregated dirty data is smaller than the number of I/Os required to update the dirty data of the multiple partitions to the disk separately, the cost of updating a large number of dirty bits in the Bitmap in place is avoided, reducing the amount of I/O issued and improving system efficiency.
- Fig. 7 shows a schematic structural diagram of an apparatus for processing Bitmap data provided by another embodiment of the present specification.
- the merging and recording module 608 may include: an encoding sub-module 6082 , a merging sub-module 6084 and a recording sub-module 6086 .
- the encoding sub-module 6082 can be configured to encode the update operations in the dirty data corresponding to the second partitions in memory, the encoding recording the position of each update operation within its second partition.
- the merging sub-module 6084 may be configured to merge the encoded update operations corresponding to the plurality of second partitions respectively into an area corresponding to the update area in the memory.
- the recording sub-module 6086 may be configured to record the merged, encoded update operations from the corresponding memory area to the update area on the disk through one or more I/Os.
- the write-back-to-disk response module 604 of the apparatus may be configured to obtain the amount of dirty data corresponding to each of the multiple partitions in memory in response to a checkpoint event being triggered, and to obtain it in response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching a preset dirty-data flush threshold.
- because the amount of dirty data of each partition is compared with the preset flush threshold, when the system is running normally and the dirty bits accumulated in a Bitmap block have reached that block's flush threshold, the block is proactively flushed in advance; in this way, when a checkpoint actually occurs, the dirty bits on that Bitmap block have a higher probability of being merged into the update area, reducing the number of Bitmap blocks that must be flushed directly to their partitions, further reducing I/O pressure during checkpoints and making I/O occur more smoothly.
- the apparatus may further include a partition flush module 612, configured to find from the plurality of partitions, when the condition for writing back to the disk is satisfied and according to their respective amounts of dirty data, first partitions whose amount of dirty data does not qualify for merging into the update area, and to update the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time.
- the concentrated dirty data is flushed directly to the disk, while the dirty data of multiple partitions with a small amount of dirty data is aggregated and updated, avoiding the cost of updating a large number of dirty bits in the Bitmap in place.
- the apparatus may further include a partition data recovery module 614, which may be configured to load, in response to the condition for writing back to memory being satisfied, the full data from the Bitmap area of the disk into the corresponding memory area.
- the update data recovery module 616 may be configured to read the update operation from the update area of the disk and apply the update operation to the corresponding position in the memory.
- the incremental data recovery module 618 may be configured to apply the incremental updates in the log after the log number corresponding to the most recent write-back to the disk to the memory area corresponding to the Bitmap area.
- when the condition for writing back to memory is satisfied, for example when the system recovers from a fault, the data can be restored by loading the full Bitmap data into memory, reading the update operations from the update area and applying them to memory, and replaying the log.
- fault recovery can be completed efficiently and accurately, and the system efficiency can be improved.
- the apparatus may further include a partition header presetting module 620, which may be configured to set a corresponding partition header for each partition of the Bitmap area in advance, the partition header recording the latest log number of the log at the time data is written to the corresponding partition.
- the update header presetting module 622 may be configured to set a corresponding update header for the update area in advance, the update header recording the latest log number of the log at the time data is written to the update area.
- the update data recovery module 616 may include a header comparison sub-module 6162, which may be configured to compare the log number recorded in the update header with the log number recorded in the partition header of the second partition.
- the update application sub-module 6164 can be configured to read the update operations from the update area of the disk and apply them to the corresponding positions in memory if the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition.
- the amount of updating operation applications can be reduced to a certain extent, and the system efficiency can be improved.
- the apparatus may further include a check code setting module 624, which may be configured to set corresponding partition headers for each partition of the Bitmap area in advance, the partition headers recording a cyclic redundancy check code of the data in the corresponding partition.
- the verification module 626 may be configured to verify the correctness of the data loaded into the corresponding area of the memory according to the cyclic redundancy check code recorded in the partition header. Through this embodiment, the correctness of the Bitmap block can be verified, and the reliability of the system can be increased.
- the update application submodule 6164 may include: an operation reading submodule 6164a, an operation decoding submodule 6164b, and an operation application submodule 6164c.
- the operation reading sub-module 6164a can be configured to read the encoded update operations from the update area of the disk if the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition.
- the operation decoding sub-module 6164b may be configured to decode the encoded update operation to obtain the update operation and the corresponding position of the update operation in the second partition.
- the operation application sub-module 6164c may be configured to apply the update operation to the location in memory.
- combined with the implementation that encodes the update operations, the update operations can be decoded during fault recovery to obtain their corresponding positions, making it easier to apply the update operations to the corresponding positions.
- FIG. 8 shows a structural block diagram of a computing device 800 provided according to an embodiment of the present specification.
- Components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820 .
- the processor 820 is connected to the memory 810 through the bus 830, and the database 850 is used for saving data.
- Computing device 800 also includes access device 840 that enables computing device 800 to communicate via one or more networks 860 .
- networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet.
- Access device 840 may include one or more of any type of network interface (e.g., a network interface card (NIC)), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and the like.
- the above components of computing device 800, as well as other components not shown in FIG. 8, may also be connected to each other, such as through a bus.
- FIG. 8 the structural block diagram of the computing device shown in FIG. 8 is only for the purpose of example, rather than limiting the scope of this specification. Those skilled in the art can add or replace other components as required.
- Computing device 800 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (eg, tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (eg, smart phones) ), wearable computing devices (eg, smart watches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs.
- Computing device 800 may also be a mobile or stationary server.
- the processor 820 is configured to execute the following computer-executable instructions: divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk; in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory; according to the respective amounts of dirty data, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area; and merge the dirty data corresponding to the multiple second partitions in memory and record it to the update area on the disk through one or more I/Os.
- the above is a schematic solution of a computing device according to this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned method for processing Bitmap data belong to the same concept, and the details that are not described in detail in the technical solution of the computing device can be referred to the above-mentioned technical solution of the method for processing Bitmap data. description of.
- An embodiment of the present specification further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for processing Bitmap data, including: merging the dirty data corresponding to the plurality of second partitions in memory and recording it to the update area on the disk through one or more I/Os.
- the computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.
- the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
Abstract
A method and apparatus for processing Bitmap data, wherein the method for processing Bitmap data includes: dividing the Bitmap area on the disk into multiple partitions in advance and setting an update area on the disk (102); in response to the condition for writing back to the disk being satisfied, obtaining the amount of dirty data corresponding to each of the multiple partitions in memory (104); according to the amount of dirty data corresponding to each of the multiple partitions, finding from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area (106); and merging the dirty data corresponding to the multiple second partitions in memory and recording it to the update area on the disk through one or more I/Os (108).
Description
This application claims priority to Chinese Patent Application No. 202010665109.2, filed on July 10, 2020 and entitled "处理Bitmap数据的方法以及装置" (Method and Apparatus for Processing Bitmap Data), the entire contents of which are incorporated herein by reference.

The embodiments of this specification relate to the field of database technology, and in particular to a method for processing Bitmap data. One or more embodiments of this specification also relate to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium.

Bitmap: a disk space management structure representing a dense set over a finite domain, in which every element appears at least once; it is widely used in indexing, data compression, and other areas.

In the Bitmap management structure, the data structures in memory and on disk are kept consistent. With this management structure, the condition for writing back to the disk is typically considered satisfied when the log is nearly full, on a crash, or in other situations; by triggering, for example, a checkpoint event, the increment in the log (that is, all unpersisted data, also called dirty data, in the memory area corresponding to the Bitmap after the log number of the previous checkpoint event) is flushed to the Bitmap area on the disk.

When the condition for writing back to the disk is satisfied, for example when a checkpoint event occurs, flushing the Bitmap directly requires multiple I/Os to complete. When the disk space managed by the Bitmap is very large or its granularity is very fine, the amount of I/O required for persistence grows accordingly, increasing the burden on the system.
Summary
In view of this, the embodiments of this specification provide a method for processing Bitmap data. One or more embodiments of this specification also relate to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium, so as to solve the technical defects existing in the prior art.

According to a first aspect of the embodiments of this specification, a method for processing Bitmap data is provided, including: dividing the Bitmap area on the disk into multiple partitions in advance and setting an update area on the disk; in response to the condition for writing back to the disk being satisfied, obtaining the amount of dirty data corresponding to each of the multiple partitions in memory; according to the amount of dirty data corresponding to each of the multiple partitions, finding from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area; and merging the dirty data corresponding to the multiple second partitions in memory and recording it to the update area on the disk through one or more I/Os.

Optionally, merging the dirty data corresponding to the multiple second partitions in memory and recording it to the update area on the disk through one or more I/Os includes: encoding the update operations in the dirty data corresponding to the multiple second partitions in memory, the encoding recording the position of each update operation within its second partition; merging the encoded update operations corresponding to the multiple second partitions into the memory area corresponding to the update area; and recording the merged, encoded update operations from that memory area to the update area on the disk through one or more I/Os.

Optionally, the size of the update area is less than or equal to the size of a single I/O data block.

Optionally, obtaining the amount of dirty data corresponding to each of the multiple partitions in memory in response to the condition for writing back to the disk being satisfied includes: obtaining the amount of dirty data corresponding to each of the multiple partitions in memory in response to a checkpoint event being triggered; or obtaining it in response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching a preset dirty-data flush threshold.

Optionally, the method further includes: when the condition for writing back to the disk is satisfied, finding from the multiple partitions, according to their respective amounts of dirty data, first partitions whose amount of dirty data does not qualify for merging into the update area; and updating the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time.

Optionally, the size of a single partition among the multiple partitions is less than or equal to the size of a single I/O data block.
Optionally, the method further includes: in response to the condition for writing back to memory being satisfied, loading the full data from the Bitmap area of the disk into the corresponding memory area; reading update operations from the update area of the disk and applying them to the corresponding positions in memory; and applying the incremental updates in the log after the log number corresponding to the most recent write-back to the disk to the memory area corresponding to the Bitmap area.

Optionally, the method further includes: setting a corresponding partition header for each partition of the Bitmap area in advance, the partition header recording the latest log number of the log at the time data is written to the corresponding partition; and setting a corresponding update header for the update area in advance, the update header recording the latest log number of the log at the time data is written to the update area. Reading update operations from the update area of the disk and applying them to the corresponding positions in memory includes: comparing the log number recorded in the update header with the log number recorded in the partition header of a second partition; and, if the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition, reading the update operations from the update area of the disk and applying them to the corresponding positions in memory.

Optionally, the method further includes: setting a corresponding partition header for each partition of the Bitmap area in advance, the partition header recording a cyclic redundancy check code of the data in the corresponding partition; and verifying the correctness of the data loaded into the corresponding memory area according to the cyclic redundancy check code recorded in the partition header.

Optionally, reading update operations from the update area of the disk and applying them to the corresponding positions in memory includes: reading the encoded update operations from the update area of the disk; decoding the encoded update operations to obtain the update operations and their corresponding positions in the second partitions; and applying the update operations to those positions in memory.
According to a second aspect of the embodiments of this specification, an apparatus for processing Bitmap data is provided, including: a setting module configured to divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk; a write-back-to-disk response module configured to obtain, in response to the condition for writing back to the disk being satisfied, the amount of dirty data corresponding to each of the multiple partitions in memory; a merge determination module configured to find, from the multiple partitions and according to their respective amounts of dirty data, multiple second partitions whose amount of dirty data qualifies for merging into the update area; and a merge recording module configured to merge the dirty data corresponding to the multiple second partitions in memory and record it to the update area on the disk through one or more I/Os.

According to a third aspect of the embodiments of this specification, a computing device is provided, including a memory and a processor; the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to: divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk; in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory; according to the respective amounts of dirty data, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area; and merge the dirty data corresponding to the multiple second partitions in memory and record it to the update area on the disk through one or more I/Os.

According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided, storing computer instructions that, when executed by a processor, implement the steps of the method for processing Bitmap data described in any embodiment of this specification.

One embodiment of this specification implements a method for processing Bitmap data. Because the method divides the Bitmap area on the disk into multiple partitions in advance and sets an update area on the disk, it can, in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory and, according to those amounts, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area. Since the dirty data corresponding to the multiple second partitions can be merged into the update area (that is, the partitions with less dirty data on the Bitmap have been identified), merging their dirty data in memory and updating it to the update area on the disk through one or more I/Os transfers the aggregated small updates on the Bitmap to the update area set on the disk. Because the number of I/Os required to persist the aggregated dirty data is smaller than the number of I/Os required to update the dirty data of the multiple partitions to the disk separately, the cost of updating a large number of dirty bits in the Bitmap in place is avoided, the amount of I/O issued is reduced, and system efficiency is improved.
FIG. 1 is a flowchart of a method for processing Bitmap data provided by an embodiment of this specification;

FIG. 2 is a schematic diagram of the division of the Bitmap area on the disk provided by an embodiment of this specification;

FIG. 3 is a schematic diagram of a Bitmap checkpoint process provided by an embodiment of this specification;

FIG. 4 is a schematic diagram of a Bitmap recover process provided by an embodiment of this specification;

FIG. 5 is a processing flowchart of a method for processing Bitmap data provided by an embodiment of this specification;

FIG. 6 is a schematic structural diagram of an apparatus for processing Bitmap data provided by an embodiment of this specification;

FIG. 7 is a schematic structural diagram of an apparatus for processing Bitmap data provided by another embodiment of this specification;

FIG. 8 is a structural block diagram of a computing device provided by an embodiment of this specification.
Many specific details are set forth in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its essence; therefore this specification is not limited to the specific implementations disclosed below.

The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

First, the terms involved in one or more embodiments of this specification are explained.

Checkpoint: after this event is triggered, the system needs to write the unpersisted data (dirty data) in memory to the disk to ensure that the data in memory and on disk are consistent. After persistence completes, one checkpoint process ends, and the system can advance the checkpoint LSN and reclaim the space corresponding to the dirty data. On restart recovery, recovery only needs to start from the largest checkpoint LSN, shortening the system's recovery time after a restart. For example, an implementation of the Bitmap management structure can work with WAL (Write-Ahead Logging) to guarantee persistence and fault recovery. The log is usually used to record persistent modifications to data in the system; it can aggregate modifications to data and, when the system recovers from a crash, restore the data to a state consistent with the moment the crash occurred by replaying the log. In practice, the log capacity is generally not unlimited; when the log is nearly full, the modifications corresponding to the incremental updates in the log (after the previous checkpoint LSN and before the current checkpoint LSN) need to be persisted to the storage medium through a checkpoint, after which the checkpoint LSN is advanced and part of the log space is reclaimed.

LSN (Log Sequence Number): for example, in a WAL system, each transaction operation corresponds to a log record, and each log record has a unique ID to identify it, namely the LSN.

Checkpoint LSN: the most recent log number of the system log when a checkpoint event is triggered.

This specification provides a method for processing Bitmap data, and also relates to an apparatus for processing Bitmap data, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
FIG. 1 shows a flowchart of a method for processing Bitmap data provided by an embodiment of this specification, including steps 102 to 108.

Step 102: Divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk.

For example, as shown in the schematic diagram of the division of the Bitmap area on the disk in FIG. 2, the Bitmap area is divided into multiple partitions, such as several fixed-size intervals, each partition being one Bitmap block, and an update area is set on the disk. The Bitmap area on the disk has a corresponding area with the same structure in memory, so no encoding or decoding is needed. The update area may contain, for example but not limited to, records of Bitmap updates on the disk, Bitmap metadata, the latest LSN at flush time, the checkpoint position, and so on. The size of the update area can, for example, be kept the same as the Bitmap block size. The size of a single partition among the multiple partitions can be less than or equal to the size of a single I/O data block, so that the division of the Bitmap block size is aligned with the I/O block size (e.g., 4KB, 16KB), thereby guaranteeing write atomicity.
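The alignment constraint above can be expressed as a small check. This is an illustrative sketch: the helper name, the concrete 4KB I/O block size, and the extra divisibility condition (so that partitions tile the I/O block exactly) are assumptions added here, not requirements stated by the text, which only requires the partition to be no larger than a single I/O block.

```python
IO_BLOCK = 4096  # bytes; 4KB is one of the example sizes mentioned

def valid_partition_size(size):
    # A partition must fit within one I/O block so a partition write is a
    # single atomic I/O; requiring divisibility keeps partitions aligned.
    return size <= IO_BLOCK and IO_BLOCK % size == 0

assert valid_partition_size(4096)      # one partition per I/O block
assert valid_partition_size(1024)      # four aligned partitions per block
assert not valid_partition_size(8192)  # would span two I/O blocks
```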
It should be noted that the embodiments of this specification do not limit the size or number of Bitmap blocks, or the size of the update area.

The data on the disk can be organized in multiple ways. For example, it can be one group of Bitmap blocks as shown in FIG. 2. As another example, multiple groups of Bitmap blocks can be used to achieve atomicity across a series of checkpoint I/Os: if one group was written previously, the next checkpoint writes to another group, and the group information is recorded in the update area so that the positions to which update operations apply can be determined during fault recovery.

As another example, to guarantee write atomicity, as shown in FIG. 2, the size of the update area can be less than or equal to the size of a single I/O data block.
Step 104: In response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory.

It can be understood that when the Bitmap performs a checkpoint, some updates may have accumulated on each Bitmap block, each update corresponding to one dirty bit in the Bitmap block. Statistics on the amount of dirty data of all Bitmap blocks can be obtained in memory.

Step 106: According to the amount of dirty data corresponding to each of the multiple partitions, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area.

It can be understood that the purpose of finding the multiple second partitions is to identify the partitions with less dirty data so that the dirty data of multiple such partitions can be merged to reduce the amount of I/O issued. Therefore, the embodiments of this specification do not limit how the second partitions whose amount of dirty data qualifies for merging into the update area are found, and search rules can be preset as the scenario requires. For example, all partitions can be sorted by the amount of dirty data so that several partitions with less dirty data can be selected as second partitions. Specifically, the multiple partitions can be ranked by the amount of dirty data from large to small, with the higher-ranked partitions having more dirty data taken as first partitions and the lower-ranked partitions having less dirty data taken as second partitions. As another example, a dirty-data threshold can be preset, with partitions whose amount of dirty data is greater than or equal to the preset threshold taken as first partitions and partitions below the threshold taken as second partitions.
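The threshold variant of the selection rule just described can be sketched as follows. The function name, dictionary layout, and threshold value are illustrative assumptions; the classification rule itself (at or above the threshold: first partition, below it: second partition) comes from the text.

```python
def classify(dirty_amounts, threshold):
    """dirty_amounts maps partition id -> amount of dirty data."""
    # First partitions: too dirty to merge, flushed directly to disk.
    first = [p for p, d in dirty_amounts.items() if d >= threshold]
    # Second partitions: little dirty data, merged into the update area.
    second = [p for p, d in dirty_amounts.items() if d < threshold]
    return first, second

first, second = classify({"A": 120, "B": 2, "C": 5}, threshold=10)
assert first == ["A"] and second == ["B", "C"]
```

The sorting-based variant would instead rank partitions by dirty amount and cut the ranking at a chosen point; both rules serve the same goal of isolating the low-dirty partitions for merging.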
Step 108: Merge the dirty data corresponding to the multiple second partitions in memory and record it to the update area on the disk through one or more I/Os.

For example, in the schematic diagram of the Bitmap checkpoint process shown in FIG. 3, partition 301 and partition 302 are second partitions with a small amount of dirty data, which qualifies for merging into the update area; therefore, as shown in FIG. 3, the dirty data of partition 301 and partition 302 can be merged and updated to the update area of the disk to reduce the amount of I/O issued.

It should be noted that, when the condition for writing back to the disk is satisfied, the embodiments of this specification can also find from the multiple partitions, according to their respective amounts of dirty data, first partitions whose amount of dirty data does not qualify for merging into the update area, and update the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time. For example, as shown in FIG. 3, partition 303 has a large amount of dirty data that does not meet the requirement for merging into the update area, so its dirty data can be flushed directly to the corresponding partition on the disk.

It can be seen that, because the method divides the Bitmap area on the disk into multiple partitions in advance and sets an update area on the disk, it can, in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory and, according to those amounts, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area. Since the dirty data corresponding to the multiple second partitions can be merged into the update area (that is, the partitions with less dirty data on the Bitmap have been identified), merging their dirty data in memory and updating it to the update area on the disk through one or more I/Os transfers the aggregated small updates on the Bitmap to the update area set on the disk. Because the number of I/Os required to persist the aggregated dirty data is smaller than the number of I/Os required to update the dirty data of the multiple partitions to the disk separately, the cost of updating a large number of dirty bits in the Bitmap in place is avoided, the amount of I/O issued is reduced, and system efficiency is improved.

For example, the method for processing Bitmap data provided by the embodiments of this specification can be applied to the PolarStore multi-version storage engine. This storage engine uses a Bitmap structure to manage chunk space; with the method provided by the embodiments of this specification, the amplification of read/write counts when persisting a large total Bitmap can be avoided, improving system performance and stability.

In one or more embodiments of this specification, to make it easier to apply update operations to their corresponding positions when writing back to memory, the positions of the update operations can be recorded by encoding the update operations in the dirty data of the second partitions. Specifically, for example, the update operations in the dirty data corresponding to the multiple second partitions in memory can be encoded, the encoding recording the position of each update operation within its second partition; the encoded update operations corresponding to the multiple second partitions can be merged into the memory area corresponding to the update area; and the merged, encoded update operations can be recorded from that memory area to the update area on the disk through one or more I/Os.
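The merge-then-persist step can be sketched as below. This is an assumed illustration: `persist_update_area`, the `write_io` callback, and the 4KB I/O size are names and values introduced here; the source only specifies that the encoded operations are merged into the in-memory image of the update area and then recorded to disk through one or more I/Os.

```python
IO_SIZE = 4096  # illustrative single-I/O payload size in bytes

def persist_update_area(encoded_ops, write_io):
    """Concatenate encoded update operations into the in-memory image of
    the update area, then persist it in IO_SIZE-sized writes.
    write_io is a stand-in for issuing one disk I/O; returns I/O count."""
    buf = b"".join(encoded_ops)                  # merge into one buffer
    ios = 0
    for off in range(0, len(buf), IO_SIZE):      # one or more I/Os
        write_io(buf[off:off + IO_SIZE])
        ios += 1
    return ios

writes = []
n = persist_update_area([b"\x01" * 3000, b"\x02" * 3000], writes.append)
assert n == 2  # 6000 bytes of aggregated updates need only two I/Os
```

The point of the aggregation is visible here: two partitions' updates cost two I/Os together, instead of at least one full-block write per partition.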
In one or more embodiments of this specification, the condition for writing back to the disk being satisfied may include, for example, a checkpoint event being triggered, and the amount of dirty data in the memory area corresponding to any one or more partitions reaching a preset dirty-data flush threshold. Specifically, obtaining the amount of dirty data corresponding to each of the multiple partitions in memory in response to the condition for writing back to the disk being satisfied may include: obtaining it in response to a checkpoint event being triggered; and obtaining it in response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching the preset dirty-data flush threshold. In this implementation, because the amount of dirty data of each partition is compared with the preset flush threshold, when the system is running normally and the dirty bits accumulated in some Bitmap block have reached that block's flush threshold, the block is proactively flushed in advance. In this way, when a checkpoint actually occurs, the dirty bits on that Bitmap block have a higher probability of being merged into the update area, reducing the number of Bitmap blocks that need to be flushed directly to their partitions, further reducing I/O pressure during checkpoints and making I/O occur more smoothly.

In one or more embodiments of this specification, when the condition for writing back to the disk is satisfied, first partitions whose amount of dirty data does not qualify for merging into the update area can also be found from the multiple partitions according to their respective amounts of dirty data, and the dirty data corresponding to the first partitions in memory can be updated to the first partitions on the disk, one partition at a time. In this embodiment, because the Bitmap area is segmented, the dirty data of partitions with more dirty data is flushed directly to the disk while the dirty data of multiple partitions with less dirty data is aggregated and updated, avoiding the cost of updating a large number of dirty bits in the Bitmap in place, reducing the amount of I/O issued, and improving the system's processing efficiency.

After the write-back to disk finishes, in response to the condition for writing back to memory being satisfied, the full data can also be loaded from the Bitmap area of the disk into the corresponding memory area; update operations can be read from the update area of the disk and applied to the corresponding positions in memory; and the incremental updates in the log after the log number corresponding to the most recent write-back to the disk can be applied to the memory area corresponding to the Bitmap area. In this embodiment, when the system recovers from a fault, the data can be restored by loading the full Bitmap data into memory, reading update operations from the update area and applying them to memory, and replaying the log.

For example, combined with the implementation in which update operations are applied to the update area during the checkpoint event described above, during fault recovery the encoded update operations can be read from the update area of the disk; the encoded update operations are decoded to obtain the update operations and their corresponding positions in the second partitions; and the update operations are applied to those positions in memory.

It can be seen that in this embodiment, when the condition for writing back to the disk is satisfied, one or more Bitmap blocks with a larger number of dirty bits can be selected and updated to the disk, while the remaining update operations are encoded and recorded in the update area, completing persistence through one or more I/Os. When the write to the update area completes, the checkpoint is considered complete. When the condition for writing back to memory is satisfied, the Bitmap blocks on the disk are loaded and applied to the corresponding structures in memory; the data in the update area is read and decoded, converted into individual update operations, and applied to the corresponding Bitmap blocks; the incremental updates after the checkpoint LSN in the log are then applied to the Bitmap, completing fault recovery.

For example, in the schematic diagram of the Bitmap recover process shown in FIG. 4, in step 402 the Bitmap blocks on the disk are first loaded and applied to the corresponding structures in memory. Since the data in the update area consists of updates to the Bitmap blocks, in step 404 that data is decoded and converted into individual update operations, which are applied to the corresponding Bitmap blocks. Finally, in step 406 the incremental updates in the log after the most recent write-back-to-disk event are applied to the memory area corresponding to the Bitmap. It can be seen that fault recovery can be completed efficiently and accurately through the above implementation, improving system efficiency.

To further improve the efficiency of fault recovery, in one or more embodiments of this specification a corresponding partition header can also be set for each partition of the Bitmap area in advance, the partition header recording the latest log number of the log at the time data is written to the corresponding partition; a corresponding update header is set for the update area in advance, the update header recording the latest log number of the log at the time data is written to the update area. Reading update operations from the update area of the disk and applying them to the corresponding positions in memory includes: comparing the log number recorded in the update header with the log number recorded in the partition header of a second partition; if the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition, reading the update operations from the update area of the disk and applying them to the corresponding positions in memory.

To make it possible to verify the correctness of the Bitmap blocks and increase system reliability, in one or more embodiments of this specification a corresponding partition header can also be set for each partition of the Bitmap area in advance, the partition header recording a cyclic redundancy check code of the data in the corresponding partition; the correctness of the data loaded into the corresponding memory area is verified according to the cyclic redundancy check code recorded in the partition header.

For example, a partition header can be added to each Bitmap block. The header can be located at any position in the Bitmap block, such as the beginning, middle, or end. When the header occupies part of the space, the Bitmap block can be reduced accordingly. For example, the header can contain metadata such as the system's latest LSN at the time the Bitmap block is written and a cyclic redundancy check code (the CRC of the Bitmap block). The LSNs recorded in the headers of the Bitmap blocks and the update area can be used, during recovery after a system crash, to distinguish whether the update operations in the update area need to be applied to the corresponding Bitmap block. If the LSN recorded in the update area's header is larger, the update operations in the update area can be applied to the corresponding positions in memory; otherwise, they can be skipped. Through this embodiment, the number of update operations that must be applied can be reduced to a certain extent, improving system efficiency.
The following describes in detail, with reference to FIG. 5, an implementation combining the above embodiments. FIG. 5 shows a processing flowchart of a method for processing Bitmap data provided by an embodiment of this specification, with specific steps including steps 502 to 528.

Step 502: Divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk; set a corresponding partition header for each partition of the Bitmap area in advance; set a corresponding update header for the update area in advance.

Step 504: In response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching the preset dirty-data flush threshold, sort all partitions by the amount of dirty data, select from them according to the ordering multiple partitions with less dirty data as second partitions, and take the remaining partitions as first partitions. Go to step 508.

Step 506: In response to a checkpoint event being triggered, sort all partitions by the amount of dirty data, select from them according to the ordering multiple partitions with less dirty data as second partitions, and take the remaining partitions as first partitions. Go to step 508.

Step 508: Update the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time.

Step 510: Encode the update operations in the dirty data corresponding to the multiple second partitions in memory, the encoding recording the position of each update operation within its second partition.

Step 512: Merge the encoded update operations corresponding to the multiple second partitions into the memory area corresponding to the update area.

Step 514: Record the merged, encoded update operations from the corresponding memory area to the update area on the disk through one or more I/Os.

Step 516: In response to a fault recovery event being triggered, load the full data from the Bitmap area of the disk into the corresponding memory area.

Step 518: Compare the log number recorded in the update header with the log number recorded in the partition header of a second partition.

Step 520: If the log number recorded in the update header is greater than the log number recorded in the partition header of the second partition, read the encoded update operations from the update area of the disk.

Step 522: Decode the encoded update operations to obtain the update operations and their corresponding positions in the second partitions.

Step 524: Apply the update operations to those positions in memory.

Step 526: Apply the incremental updates in the log after the log number corresponding to the most recent checkpoint to the memory area corresponding to the Bitmap area.

Step 528: Verify the correctness of the data loaded into the corresponding memory area according to the cyclic redundancy check code recorded in the partition header.

In this embodiment, based on the log mechanism, the Bitmap area is segmented and an update area is introduced; the updates of multiple partitions with less dirty data on the Bitmap are merged and transferred to the update area, aggregating updates and avoiding the cost of updating a large number of dirty bits in the Bitmap in place, reducing the amount of I/O issued and improving the system's checkpoint efficiency. Based on the preset dirty-data flush threshold, Bitmap blocks can be proactively flushed in advance, reducing the number of Bitmap blocks that need to be flushed directly to their partitions, further reducing I/O pressure during checkpoints and making I/O occur more smoothly. By encoding the update operations in the dirty data of the second partitions to record their positions, it becomes easier to apply the update operations to the corresponding positions during fault recovery. In addition, the LSNs recorded in the partition headers and the update header are used to distinguish whether the update operations in the update area need to be applied to the corresponding Bitmap blocks, reducing the number of update operations that must be applied and improving system efficiency.
Corresponding to the above method embodiments, this specification also provides embodiments of an apparatus for processing Bitmap data. FIG. 6 shows a schematic structural diagram of an apparatus for processing Bitmap data provided by an embodiment of this specification. As shown in FIG. 6, the apparatus includes: a setting module 602, a write-back-to-disk response module 604, a merge determination module 606, and a merge recording module 608.

The setting module 602 may be configured to divide the Bitmap area on the disk into multiple partitions in advance and set an update area on the disk.

The write-back-to-disk response module 604 may be configured to obtain, in response to the condition for writing back to the disk being satisfied, the amount of dirty data corresponding to each of the multiple partitions in memory.

The merge determination module 606 may be configured to find, from the multiple partitions and according to their respective amounts of dirty data, multiple second partitions whose amount of dirty data qualifies for merging into the update area.

The merge recording module 608 may be configured to merge the dirty data corresponding to the multiple second partitions in memory and record it to the update area on the disk through one or more I/Os.

Because the apparatus divides the Bitmap area on the disk into multiple partitions in advance and sets an update area on the disk, it can, in response to the condition for writing back to the disk being satisfied, obtain the amount of dirty data corresponding to each of the multiple partitions in memory and, according to those amounts, find from the multiple partitions multiple second partitions whose amount of dirty data qualifies for merging into the update area. Since the multiple partitions with less dirty data on the Bitmap have been identified, merging the dirty data corresponding in memory to the multiple second partitions and updating it to the update area on the disk through one or more I/Os transfers the aggregated updates of the partitions with less dirty data to the update area set on the disk. Because the number of I/Os required to persist the aggregated dirty data is smaller than the number of I/Os required to update the dirty data of the multiple partitions to the disk separately, the cost of updating a large number of dirty bits in the Bitmap in place is avoided, the amount of I/O issued is reduced, and system efficiency is improved.
FIG. 7 shows a schematic structural diagram of an apparatus for processing Bitmap data provided by another embodiment of this specification. As shown in FIG. 7, the merge recording module 608 may include: an encoding submodule 6082, a merge submodule 6084, and a recording submodule 6086.

The encoding submodule 6082 may be configured to encode the update operations in the dirty data corresponding to the multiple second partitions in memory, the encoding recording the position of each update operation within its second partition.

The merge submodule 6084 may be configured to merge the encoded update operations corresponding to the multiple second partitions into the memory area corresponding to the update area.

The recording submodule 6086 may be configured to record the merged, encoded update operations from the corresponding memory area to the update area on the disk through one or more I/Os.

In this embodiment, because the positions of the update operations are recorded by encoding the update operations in the dirty data of the second partitions, it is easier to apply the update operations to the corresponding positions during fault recovery.

Optionally, as shown in FIG. 7, the write-back-to-disk response module 604 of the apparatus may be configured to obtain the amount of dirty data corresponding to each of the multiple partitions in memory in response to a checkpoint event being triggered, and to obtain it in response to the amount of dirty data in the memory area corresponding to any one or more partitions reaching the preset dirty-data flush threshold.

In this embodiment, because the amount of dirty data of each partition is compared with the preset flush threshold, when the system is running normally and the dirty bits accumulated in some Bitmap block have reached that block's flush threshold, the block is proactively flushed in advance, so that when a checkpoint actually occurs, the dirty bits on that Bitmap block have a higher probability of being merged into the update area, reducing the number of Bitmap blocks that need to be flushed directly to their partitions, further reducing I/O pressure during checkpoints and making I/O occur more smoothly.

Optionally, as shown in FIG. 7, the apparatus may further include a partition flush module 612, configured to find from the multiple partitions, when the condition for writing back to the disk is satisfied and according to their respective amounts of dirty data, first partitions whose amount of dirty data does not qualify for merging into the update area, and to update the dirty data corresponding to the first partitions in memory to the first partitions on the disk, one partition at a time. In this embodiment, because the Bitmap area is segmented, the concentrated dirty data is flushed directly to the disk while the dirty data of multiple partitions with less dirty data is aggregated and updated, avoiding the cost of updating a large number of dirty bits in the Bitmap in place, reducing the amount of I/O issued, and improving the system's processing efficiency.
Optionally, as shown in FIG. 7, the apparatus may further include: a partition data recovery module 614, which may be configured to load, in response to a memory write-back condition being satisfied, the full data from the Bitmap area of the disk into the corresponding area in memory; an update data recovery module 616, which may be configured to read the update operations from the update area of the disk and apply them to the corresponding positions in memory; and an incremental data recovery module 618, which may be configured to apply the incremental updates in the log after the log sequence number corresponding to the most recent disk write-back to the area in memory corresponding to the Bitmap area. In this embodiment, when the memory write-back condition is satisfied, for example during system failure recovery, data can be recovered by loading the full Bitmap data into memory, reading the update operations from the update area and applying them to memory, and replaying the log. Through this embodiment, failure recovery can be completed efficiently and accurately, improving system efficiency.
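The three recovery steps above (load the full Bitmap image, apply the update-area operations, then redo only log entries newer than the last write-back's log sequence number) can be sketched as below; all names and the single-integer operation format are illustrative assumptions:

```python
def recover(disk_bitmap, update_ops, log_entries, last_writeback_lsn, apply):
    """Rebuild the in-memory Bitmap after a failure (illustrative sketch)."""
    bitmap = bytearray(disk_bitmap)       # step 1: full load into memory
    for op in update_ops:                 # step 2: replay update-area ops
        apply(bitmap, op)
    for lsn, op in log_entries:           # step 3: incremental redo from log
        if lsn > last_writeback_lsn:      # only entries after last write-back
            apply(bitmap, op)
    return bitmap

def set_bit(bitmap, bit):
    bitmap[bit // 8] |= 1 << (bit % 8)

mem = recover(bytes(2), update_ops=[3],
              log_entries=[(9, 1), (11, 6)],
              last_writeback_lsn=10, apply=set_bit)
# Bit 3 comes from the update area; only the log entry with LSN 11 > 10 is
# redone (bit 6), while the already-persisted LSN-9 entry is skipped.
```

The LSN filter in step 3 is what keeps the redo pass from re-applying updates that the last write-back already persisted.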
Optionally, as shown in FIG. 7, the apparatus may further include: a partition header presetting module 620, which may be configured to set, in advance, a corresponding partition header for each partition of the Bitmap area, the partition header being used to record the latest log sequence number of the log at the time the corresponding partition is written; and an update header presetting module 622, which may be configured to set, in advance, a corresponding update header for the update area, the update header being used to record the latest log sequence number of the log at the time the update area is written. In this embodiment, the update data recovery module 616 may include: a header comparison sub-module 6162, which may be configured to compare the log sequence number recorded in the update header with the log sequence number recorded in the partition header of the second partition; and an update application sub-module 6164, which may be configured to, if the log sequence number recorded in the update header is greater than the log sequence number recorded in the partition header of the second partition, read the update operations from the update area of the disk and apply them to the corresponding positions in memory. Through this embodiment, the number of update operations to be applied can be reduced to a certain extent, improving system efficiency.
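The header comparison reduces to a single LSN check per partition; a sketch with hypothetical header values:

```python
def needs_update_replay(update_header_lsn: int, partition_header_lsn: int) -> bool:
    """Apply the update area's operations to a partition only when the update
    header is newer than that partition's own header (illustrative)."""
    return update_header_lsn > partition_header_lsn

# The partition image on disk is older than the update area: replay needed.
assert needs_update_replay(update_header_lsn=120, partition_header_lsn=100)
# The partition was written at or after the update area: skip replay.
assert not needs_update_replay(update_header_lsn=100, partition_header_lsn=100)
```

Skipping partitions whose header LSN is already current is what cuts down the volume of update operations applied during recovery.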
Optionally, as shown in FIG. 7, the apparatus may further include: a check code setting module 624, which may be configured to set, in advance, a corresponding partition header for each partition of the Bitmap area, the partition header being used to record a cyclic redundancy check code of the data in the corresponding partition; and a verification module 626, which may be configured to verify, according to the cyclic redundancy check code recorded in the partition header, the correctness of the data loaded into the corresponding area in memory. Through this embodiment, the correctness of a Bitmap block can be verified, increasing the reliability of the system.
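A sketch of the CRC verification, using `zlib.crc32` as the cyclic redundancy check; the embodiment does not prescribe a particular CRC polynomial, so this choice, like the dict-based header, is an assumption:

```python
import zlib

def make_header(partition_bytes: bytes) -> dict:
    """Record the partition's CRC in its header when the partition is written."""
    return {"crc": zlib.crc32(partition_bytes)}

def verify(partition_bytes: bytes, header: dict) -> bool:
    """On load, recompute the CRC and compare it with the stored one."""
    return zlib.crc32(partition_bytes) == header["crc"]

block = bytes(range(16))
hdr = make_header(block)
assert verify(block, hdr)                     # intact block passes
assert not verify(b"\xff" + block[1:], hdr)   # a corrupted byte is detected
```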
Optionally, as shown in FIG. 7, the update application sub-module 6164 may include: an operation reading sub-module 6164a, an operation decoding sub-module 6164b, and an operation application sub-module 6164c.
The operation reading sub-module 6164a may be configured to read the encoded update operations from the update area of the disk if the log sequence number recorded in the update header is greater than the log sequence number recorded in the partition header of the second partition.
The operation decoding sub-module 6164b may be configured to decode the encoded update operations to obtain the update operations and the positions of the update operations within the second partitions.
The operation application sub-module 6164c may be configured to apply the update operations to those positions in memory.
In this embodiment, in combination with the implementation that encodes the update operations, the positions corresponding to the update operations can be obtained by decoding them during failure recovery, which makes it easier to apply the update operations to the corresponding positions.
The above is a schematic solution of an apparatus for processing Bitmap data according to this embodiment. It should be noted that the technical solution of the apparatus for processing Bitmap data belongs to the same concept as the technical solution of the above method for processing Bitmap data; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of the above method for processing Bitmap data.
FIG. 8 is a structural block diagram of a computing device 800 according to one embodiment of this specification. Components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820. The processor 820 is connected to the memory 810 through a bus 830, and a database 850 is used to store data.
The computing device 800 further includes an access device 840 that enables the computing device 800 to communicate via one or more networks 860. Examples of these networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 840 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.
In one embodiment of this specification, the above components of the computing device 800, as well as other components not shown in FIG. 8, may also be connected to one another, for example through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 8 is for illustrative purposes only and does not limit the scope of this specification. Those skilled in the art may add or replace other components as needed.
The computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, a netbook, and the like), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch, smart glasses, and the like), or another type of mobile device, or a stationary computing device such as a desktop computer or a PC. The computing device 800 may also be a mobile or stationary server.
The processor 820 is configured to execute the following computer-executable instructions:
dividing, in advance, a Bitmap area on a disk into a plurality of partitions, and setting an update area on the disk;
obtaining, in response to a disk write-back condition being satisfied, an amount of dirty data corresponding to each of the plurality of partitions in memory;
finding, according to the amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amounts of dirty data qualify for merging into the update area; and
merging the dirty data corresponding to the plurality of second partitions in memory, and recording the merged dirty data to the update area on the disk through one or more I/Os.
The above is a schematic solution of a computing device according to this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the above method for processing Bitmap data; for details not described in the technical solution of the computing device, reference may be made to the description of the technical solution of the above method for processing Bitmap data.
An embodiment of this specification further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, are used for:
dividing, in advance, a Bitmap area on a disk into a plurality of partitions, and setting an update area on the disk;
obtaining, in response to a disk write-back condition being satisfied, an amount of dirty data corresponding to each of the plurality of partitions in memory;
finding, according to the amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amounts of dirty data qualify for merging into the update area; and
merging the dirty data corresponding to the plurality of second partitions in memory, and recording the merged dirty data to the update area on the disk through one or more I/Os.
The above is a schematic solution of a computer-readable storage medium according to this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above method for processing Bitmap data; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the above method for processing Bitmap data.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of this specification are not limited by the described order of actions, because according to the embodiments of this specification, some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the embodiments of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are merely intended to help illustrate this specification. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to only the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the embodiments of this specification. These embodiments were selected and specifically described in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and use this specification. This specification is limited only by the claims and their full scope and equivalents.
Claims (13)
- A method for processing Bitmap data, comprising: dividing, in advance, a Bitmap area on a disk into a plurality of partitions, and setting an update area on the disk; in response to a disk write-back condition being satisfied, obtaining an amount of dirty data corresponding to each of the plurality of partitions in memory; finding, according to the amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amounts of dirty data qualify for merging into the update area; and merging the dirty data corresponding to the plurality of second partitions in memory, and recording the merged dirty data to the update area on the disk through one or more I/Os.
- The method according to claim 1, wherein merging the dirty data corresponding to the plurality of second partitions in memory and recording the merged dirty data to the update area on the disk through one or more I/Os comprises: encoding update operations in the dirty data corresponding to the plurality of second partitions in memory, the encoding being used to record positions of the update operations within the second partitions; merging the encoded update operations respectively corresponding to the plurality of second partitions into an area in memory corresponding to the update area; and recording the merged, encoded update operations from the corresponding area in memory to the update area on the disk through one or more I/Os.
- The method according to claim 1, wherein a size of the update area is less than or equal to a size of a data block of a single I/O.
- The method according to claim 1, wherein obtaining, in response to the disk write-back condition being satisfied, the amount of dirty data corresponding to each of the plurality of partitions in memory comprises: in response to a checkpoint event being triggered, obtaining the amount of dirty data corresponding to each of the plurality of partitions in memory; and in response to the amount of dirty data in an in-memory area corresponding to any one or more partitions reaching a preset dirty-data flush threshold, obtaining the amount of dirty data corresponding to each of the plurality of partitions in memory.
- The method according to claim 1, further comprising: in a case where the disk write-back condition is satisfied, finding, according to the amounts of dirty data corresponding to the plurality of partitions, a first partition whose amount of dirty data does not qualify for merging into the update area; and updating the dirty data corresponding to the first partition in memory to the first partition on the disk on a per-partition basis.
- The method according to claim 5, wherein a size of a single partition among the plurality of partitions is less than or equal to a size of a data block of a single I/O.
- The method according to claim 5, further comprising: in response to a memory write-back condition being satisfied, loading full data from the Bitmap area of the disk into a corresponding area in memory; reading update operations from the update area of the disk and applying them to corresponding positions in memory; and applying incremental updates in a log after a log sequence number corresponding to a most recent disk write-back to the area in memory corresponding to the Bitmap area.
- The method according to claim 7, further comprising: setting, in advance, a corresponding partition header for each partition of the Bitmap area, the partition header being used to record the latest log sequence number of the log at the time the corresponding partition is written; and setting, in advance, a corresponding update header for the update area, the update header being used to record the latest log sequence number of the log at the time the update area is written; wherein reading the update operations from the update area of the disk and applying them to the corresponding positions in memory comprises: comparing the log sequence number recorded in the update header with the log sequence number recorded in the partition header of the second partition; and if the log sequence number recorded in the update header is greater than the log sequence number recorded in the partition header of the second partition, reading the update operations from the update area of the disk and applying them to the corresponding positions in memory.
- The method according to claim 7, further comprising: setting, in advance, a corresponding partition header for each partition of the Bitmap area, the partition header being used to record a cyclic redundancy check code of the data in the corresponding partition; and verifying, according to the cyclic redundancy check code recorded in the partition header, the correctness of the data loaded into the corresponding area in memory.
- The method according to claim 7, wherein reading the update operations from the update area of the disk and applying them to the corresponding positions in memory comprises: reading encoded update operations from the update area of the disk; decoding the encoded update operations to obtain the update operations and the positions of the update operations within the second partitions; and applying the update operations to the positions in memory.
- An apparatus for processing Bitmap data, comprising: a setting module, configured to divide, in advance, a Bitmap area on a disk into a plurality of partitions and to set an update area on the disk; a disk write-back response module, configured to obtain, in response to a disk write-back condition being satisfied, an amount of dirty data corresponding to each of the plurality of partitions in memory; a merge determination module, configured to find, according to the amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amounts of dirty data qualify for merging into the update area; and a merge recording module, configured to merge the dirty data corresponding to the plurality of second partitions in memory and record the merged dirty data to the update area on the disk through one or more I/Os.
- A computing device, comprising a memory and a processor, wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to: divide, in advance, a Bitmap area on a disk into a plurality of partitions and set an update area on the disk; obtain, in response to a disk write-back condition being satisfied, an amount of dirty data corresponding to each of the plurality of partitions in memory; find, according to the amounts of dirty data corresponding to the plurality of partitions, a plurality of second partitions whose amounts of dirty data qualify for merging into the update area; and merge the dirty data corresponding to the plurality of second partitions in memory and record the merged dirty data to the update area on the disk through one or more I/Os.
- A computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the method for processing Bitmap data according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/095,459 US20230161702A1 (en) | 2020-07-10 | 2023-01-10 | Method and Apparatus for Processing Bitmap Data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010665109.2 | 2020-07-10 | ||
CN202010665109.2A CN111563053B (zh) | 2020-07-10 | 2020-07-10 | Method and apparatus for processing Bitmap data |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/095,459 Continuation US20230161702A1 (en) | 2020-07-10 | 2023-01-10 | Method and Apparatus for Processing Bitmap Data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022007937A1 (zh) | 2022-01-13 |
Family
ID=72070341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/105416 WO2022007937A1 (zh) | Method and apparatus for processing Bitmap data | | 2021-07-09 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230161702A1 (zh) |
CN (1) | CN111563053B (zh) |
WO (1) | WO2022007937A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116226098A (zh) * | 2023-05-09 | 2023-06-06 | 北京尽微致广信息技术有限公司 | Data processing method and apparatus, electronic device, and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563053B (zh) * | 2020-07-10 | 2020-12-11 | 阿里云计算有限公司 | Method and apparatus for processing Bitmap data |
CN113655955B (zh) * | 2021-07-16 | 2023-05-16 | 深圳大普微电子科技有限公司 | Cache management method, solid-state drive controller, and solid-state drive |
CN113886143B (zh) * | 2021-10-19 | 2022-09-13 | 深圳市木浪云科技有限公司 | Continuous data protection method and apparatus for virtual machines, and data recovery method and apparatus |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102725752A (zh) * | 2011-10-20 | 2012-10-10 | 华为技术有限公司 | Method and apparatus for processing dirty data |
CN103365926A (zh) * | 2012-03-30 | 2013-10-23 | 伊姆西公司 | Method and apparatus for saving snapshots in a file system |
CN103530322A (zh) * | 2013-09-18 | 2014-01-22 | 深圳市华为技术软件有限公司 | Data processing method and apparatus |
US20150058295A1 (en) * | 2012-05-02 | 2015-02-26 | Huawei Technologies Co., Ltd. | Data Persistence Processing Method and Apparatus, and Database System |
CN109783023A (zh) * | 2019-01-04 | 2019-05-21 | 平安科技(深圳)有限公司 | Data flushing method and related apparatus |
CN111563053A (zh) * | 2020-07-10 | 2020-08-21 | 阿里云计算有限公司 | Method and apparatus for processing Bitmap data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4366298B2 (ja) * | 2004-12-02 | 2009-11-18 | 富士通株式会社 | Storage device, control method therefor, and program |
CN104731872B (zh) * | 2015-03-05 | 2018-04-03 | 长沙新弘软件有限公司 | Bitmap-based storage space management system and method |
JP6589981B2 (ja) * | 2015-05-25 | 2019-10-16 | ソニー株式会社 | Recording device, recording method, and recording medium |
CN108427648B (zh) * | 2017-02-14 | 2023-12-01 | 中兴通讯股份有限公司 | Method and apparatus for indexing dirty data within a page in a storage system |
- 2020-07-10: CN application CN202010665109.2 filed; granted as CN111563053B (status: Active)
- 2021-07-09: PCT application PCT/CN2021/105416 filed (published as WO2022007937A1; status: Application Filing)
- 2023-01-10: US continuation US18/095,459 filed (published as US20230161702A1; status: Pending)
Also Published As
Publication number | Publication date |
---|---|
CN111563053A (zh) | 2020-08-21 |
US20230161702A1 (en) | 2023-05-25 |
CN111563053B (zh) | 2020-12-11 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21837387; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21837387; Country of ref document: EP; Kind code of ref document: A1