WO2012083754A1 - Method and device for processing dirty data - Google Patents
- Publication number
- WO2012083754A1 (application PCT/CN2011/081046)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tuple
- storage block
- data
- cache
- memory
- Prior art date
Links
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
Definitions
- The present invention relates to the field of storage technologies, and in particular to a method and apparatus for processing dirty data.
Background
- A database is a repository that organizes, stores, and manages data according to a data structure. In daily work, it is often necessary to put related data into such a "warehouse" and process it according to management needs.
- In traditional database systems and other storage-related engines, modified data needs to be written to disk immediately (or within a short time) to ensure the integrity of transactions and the reliability of the data in the database. While the modified data is being written to the disk, data cannot be written to the memory, and the memory has to suspend external service, which limits the memory throughput and the read/write performance of the system.
- One existing approach improves the read/write performance of the system by adding a flash device such as a solid-state disk (SSD) as a cache:
- the memory writes the modified data to the SSD in units of storage blocks.
- Data that has been modified in the cache but has not yet been written to the disk is dirty data.
- However, the SSD can process only a small amount of dirty data each time it reads or writes a data block, which results in low data throughput and low read/write performance for the database system, causing system response delays and even database crashes.
- Embodiments of the present invention provide a method and apparatus for processing dirty data, which can improve data throughput and read and write performance of a database system.
- An embodiment of the present invention provides a method for processing dirty data, including: determining, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
- combining the tuples marked as dirty data in the memory and writing them into the first storage block; and
- writing the dirty data in the first storage block to the cache, and writing the dirty data to a disk through the cache.
- An embodiment of the present invention provides an apparatus for processing dirty data, including: a determining unit, configured to determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
- a first writing unit, configured to combine the tuples marked as dirty data in the memory and write them into the first storage block; and
- a second writing unit, configured to write the dirty data in the first storage block to the cache, and write the dirty data to a disk through the cache.
- The method and apparatus for processing dirty data provided by the embodiments of the present invention can combine the tuples marked as dirty data in the memory, write them to the cache together, and then write the dirty data to the disk through the cache.
- The method and apparatus provided by the embodiments of the present invention can improve the data throughput and the read/write performance of the database system, and can also reduce the read/write frequency of the cache and prolong the service life of the cache.
- FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a method according to another embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a device according to another embodiment of the present invention.
- FIG. 4 is another schematic structural diagram of a device according to another embodiment of the present invention.
- FIG. 5 is still another schematic structural diagram of a device according to another embodiment of the present invention.
- FIG. 6 is still another schematic structural diagram of a device according to another embodiment of the present invention.
Detailed description
- The dirty data in the embodiments of the present invention may be data that has been modified in the cache but has not yet been written to the disk.
- The embodiments of the present invention can be applied to various types of databases and data warehouse systems, including DB databases, Oracle databases, SQL databases, and the like.
- An embodiment of the present invention provides a method for processing dirty data. As shown in FIG. 1, the method includes:
- The method for processing dirty data provided by the embodiment of the present invention can combine the tuples marked as dirty data in the memory, write them to the cache together, and then write the dirty data to the disk through the cache.
- The method provided by the embodiment of the present invention can improve the data throughput and the read/write performance of the database system, and can also reduce the read/write frequency of the cache, thereby prolonging the service life of the cache.
- Another embodiment of the present invention further provides a method for processing dirty data, as shown in FIG. 2, including:
- The cache is a cache device that connects the memory and the disk; the write specification of the cache refers to the maximum amount of data that the cache can write each time it is refreshed.
- The read/write speed of the cache is much higher than that of the disk.
- A storage space whose size is the same as, or close to, the write specification of the cache can be determined in the memory as the first storage block. Specifically, the free space in the memory may be integrated to obtain the first storage block; alternatively, a storage space that meets the specification of the first storage block may be reserved in the memory as the first storage block. This is not limited herein.
- The cache may be a flash device such as a solid-state disk (SSD), but is not limited thereto.
- The first storage block stores the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to the data of each tuple. A tuple may be a storage unit that stores dirty data, or may represent a connection of a plurality of storage units, but is not limited thereto.
- Through the first storage block, the tuples marked as dirty data can be integrated and written to the cache together, thereby improving the read/write efficiency for dirty data.
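The contents of the first storage block described above can be pictured with a minimal Python sketch. The class names, field names, and byte-based capacity check are illustrative assumptions and do not come from the patent text; the sketch only shows each record carrying its original storage block, its data, and a pointer (here, a list index) to that data.

```python
from dataclasses import dataclass, field


@dataclass
class TupleRecord:
    tuple_id: int      # hypothetical identifier of the dirty tuple
    origin_block: int  # original storage block the tuple belongs to
    data: bytes        # the modified tuple data


@dataclass
class FirstStorageBlock:
    capacity: int  # bytes; assumed equal to the cache write specification
    records: list = field(default_factory=list)
    pointers: dict = field(default_factory=dict)  # tuple_id -> index in records
    used: int = 0

    def try_add(self, rec: TupleRecord) -> bool:
        """Append a dirty tuple if it still fits; return False when the block is full."""
        size = len(rec.data)
        if self.used + size > self.capacity:
            return False
        self.pointers[rec.tuple_id] = len(self.records)
        self.records.append(rec)
        self.used += size
        return True
```

When `try_add` returns False, the caller would flush the block to the cache and start filling it again.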
- the first mapping table is used to record the first specific.
- The first storage block may be used to write dirty data to the cache multiple times, so that a plurality of different versions of first storage block information are stored in the cache.
- The first storage block information may be numbered in the order in which it is written to the cache, which determines the time version number of each piece of first storage block information. All tuples in the same version of the first storage block information have the same time version number, and the time version number of each tuple is used to represent the first storage block information to which the tuple belongs in the cache.
- When the storage space occupied by the tuples marked as dirty data in the memory is larger than the storage space of the first storage block, the tuples marked as dirty data need to be written to the cache through the first storage block in multiple passes, so that different versions of the first storage block information are stored in the cache.
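The multi-pass write with time version numbers can be sketched as follows. This is a simplified illustration under stated assumptions: the cache is modeled as a plain dict keyed by version number, dirty tuples are `(tuple_id, origin_block, data)` triples, and the function name `flush_in_versions` is hypothetical.

```python
def flush_in_versions(dirty, block_capacity, cache, next_version=1):
    """Write dirty tuples to the cache in block-sized batches.

    dirty: list of (tuple_id, origin_block, data) triples (assumed layout).
    cache: dict mapping time version number -> list of triples.
    Returns the next unused version number.
    Simplification: a single tuple larger than block_capacity still gets
    its own batch instead of being rejected or split.
    """
    batch, used = [], 0
    for item in dirty:
        size = len(item[2])
        if batch and used + size > block_capacity:
            cache[next_version] = batch  # one full first storage block
            next_version += 1
            batch, used = [], 0
        batch.append(item)
        used += size
    if batch:
        cache[next_version] = batch      # final, possibly partial block
        next_version += 1
    return next_version
```

Every tuple written in the same pass shares one version number, which matches the numbering rule described above.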
- Tuples marked as dirty data in the memory may be modified many times, so multiple values of the same tuple are often recorded in the cache. When the dirty data in the cache is written to the disk, however, only the final value of each tuple needs to be written.
- To improve the efficiency of reading and writing data, the method provided in this embodiment further includes:
- The valid tuples can be determined by using, but not limited to, the following method:
- the first storage block information of each version is read in version order, starting from the lowest version; for each tuple in the current version of the first storage block information, the time version numbers are used to check whether the tuple is modified again in a higher version. If it is, the current tuple is ignored; if not, the current tuple is retained and marked as a valid tuple.
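The version scan described above can be sketched in a few lines of Python. The cache layout (a dict from version number to a dict of tuple values) and the function name are assumptions made for illustration only.

```python
def find_valid_tuples(cache_versions):
    """Return {tuple_id: (version, value)} holding only each tuple's final value.

    cache_versions: dict mapping time version number -> {tuple_id: value}
    (an assumed in-memory picture of the versions stored in the cache).
    """
    versions = sorted(cache_versions)  # lowest version first
    valid = {}
    for i, version in enumerate(versions):
        higher = versions[i + 1:]
        for tuple_id, value in cache_versions[version].items():
            # Ignore the tuple if any higher version modifies it again.
            modified_later = any(tuple_id in cache_versions[h] for h in higher)
            if not modified_later:
                valid[tuple_id] = (version, value)
    return valid
```

A single pass from the highest version downward would give the same result with less work; the sketch follows the low-to-high order of the text instead.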
- The original storage block information to which each valid tuple belongs may be determined according to the first mapping table, and the tuples belonging to the same original storage block may then be combined and written to the disk together to improve data read/write efficiency.
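Grouping valid tuples by original storage block might look like the sketch below. It assumes the first mapping table can be read as a dict from tuple id to original block id; that shape, and the function name, are illustrative rather than taken from the patent.

```python
from collections import defaultdict


def group_by_origin_block(valid_tuples, mapping_table):
    """Group valid tuples so each original storage block is written once.

    valid_tuples: {tuple_id: value}
    mapping_table: {tuple_id: origin_block_id} (assumed view of the first mapping table)
    """
    groups = defaultdict(list)
    for tuple_id, value in valid_tuples.items():
        groups[mapping_table[tuple_id]].append((tuple_id, value))
    return dict(groups)
```

Each resulting group can then be issued as one disk write per original block instead of one write per tuple.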
- When a specified tuple needs to be looked up, the first mapping table may be consulted to determine whether the data of the specified tuple is included in the cache. If not, the tuple data is read from the disk; if so, the first storage block information in the cache that contains the specified tuple is determined according to the first mapping table, and the data of the specified tuple is determined from it.
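The lookup path can be illustrated with a short sketch. The dict-based cache and disk, the mapping table shape (tuple id to the time version number holding the final value), and the name `read_tuple` are all assumptions for illustration.

```python
def read_tuple(tuple_id, mapping_table, cache, disk):
    """Read a tuple from the cache when the mapping table lists it, else from disk.

    mapping_table: {tuple_id: version} pointing at the version with the final value
    cache: {version: {tuple_id: value}}
    disk: {tuple_id: value}
    """
    version = mapping_table.get(tuple_id)
    if version is None:
        return disk[tuple_id]        # not cached: fall back to the disk copy
    return cache[version][tuple_id]  # cached: read the final value directly
```

Because the mapping table points straight at the right version, the lookup does not need to scan every version stored in the cache.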
- The corresponding tuples are obtained from the cache to overwrite the specified storage block on the disk. When the system needs to modify a single tuple, the modification of the specified tuple is completed according to the method provided in this embodiment.
- When the server is shut down, the first mapping table in the memory may be stored in the cache, so that after the server restarts, the first storage block information of the remaining versions in the cache is written to the disk according to the first mapping table.
- For the method of writing the dirty data in the cache to the disk, refer to the description above in this embodiment; details are not described herein again.
- In the method for processing dirty data provided by the embodiment of the present invention, the first storage block is determined in the memory, the tuples marked as dirty data in the memory are integrated and written to the cache, and the dirty data in the cache is written to the disk during an idle period of the service.
- The method provided by the embodiment of the present invention can significantly improve the data throughput and read/write performance of the database system, and makes it convenient for the system to look up or modify specified tuple data; it can also reduce the read/write frequency of the cache and prolong the service life of the cache.
- A further embodiment of the present invention provides an apparatus for processing dirty data, which can implement the foregoing method embodiments.
- The apparatus includes: a determining unit 31, configured to determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
- a first writing unit 32, configured to combine the tuples marked as dirty data in the memory and write them into the first storage block; and
- a second writing unit 33, configured to write the dirty data in the first storage block to the cache, and write the dirty data to the disk through the cache.
- The determining unit 31 may further include an integration subunit 311 or a reservation subunit 312, where:
- the integration subunit 311 is configured to integrate the free space in the memory to obtain the first storage block; and
- the reservation subunit 312 is configured to reserve, in the memory, a storage space conforming to the specification of the first storage block as the first storage block.
- The first writing unit 32 is further configured to write related information of the tuples marked as dirty data in the memory to the first storage block, where the related information includes the original storage block information to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to the data of each tuple.
- The apparatus further includes a processing unit 34.
- The second writing unit 33 specifically includes a first processing subunit 331, a first lookup subunit 332, and a second processing subunit 333, where:
- the processing unit 34 is configured to establish a first mapping table in the memory, where the first mapping table records the time version number of each tuple and the original storage block information to which the tuple belongs, and the time version number of each tuple is used to represent the first storage block information to which the tuple belongs in the cache;
- the first processing subunit 331 is configured to write the dirty data in the first storage block to the cache, so that the dirty data is written to the disk through the cache;
- the first lookup subunit 332 is configured to: when a tuple marked as dirty data in the memory has been modified multiple times, look up, according to the first mapping table, the time version number of the final value of the data of each tuple in the dirty data, determine the first storage block information corresponding to that time version number in the cache, and mark the tuple in that first storage block information that stores the final value as a valid tuple;
- the second processing subunit 333 is configured to write the valid tuples determined by the first lookup subunit 332 to the disk, and delete the tuple data information corresponding to the valid tuples in the cache; and
- the processing unit 34 is further configured to delete, after the second processing subunit 333 writes the valid tuples to the disk, the time version number information corresponding to the valid tuples in the first mapping table.
- The second writing unit 33 may further include a second lookup subunit 334 and a third processing subunit 335, and the apparatus further includes a first searching unit 35 and a second searching unit 36, where:
- the second lookup subunit 334 is configured to determine, according to the first mapping table, the original storage block information to which each of the valid tuples belongs;
- the third processing subunit 335 is configured to combine the tuples belonging to the same original storage block and write them to the disk together, and delete the tuple data information corresponding to those tuples in the cache;
- the first searching unit 35 is configured to: when a specified tuple needs to be looked up, consult the first mapping table and determine whether the specified tuple is included in the cache; and
- the second searching unit 36 is configured to: when the specified tuple is included in the cache, determine, according to the first mapping table, the first storage block information in the cache that contains the final value of the data of the specified tuple, and determine the data of the specified tuple.
- The processing unit 34 is further configured to: when an abnormal situation terminates the process of writing dirty data to the disk, reconstruct the first mapping table after the server restarts, according to the first storage block information of the remaining versions in the cache.
- The first lookup subunit 332 is further configured to look up, according to the first mapping table reconstructed by the processing unit 34, the time version number of the final value of the data of each tuple in the remaining dirty data in the cache, determine the first storage block information corresponding to that time version number in the cache, and mark the tuple in that first storage block information that stores the final value as a valid tuple.
- The second processing subunit 333 is further configured to write the valid tuples determined by the first lookup subunit 332 to the disk, and delete the tuple data information corresponding to the valid tuples in the cache; the processing unit 34 is further configured to delete, after the second processing subunit 333 writes the valid tuples to the disk, the time version number information corresponding to the valid tuples in the first mapping table.
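The recovery path, in which the first mapping table is reconstructed from the versions left in the cache, can be sketched as below. This assumes each cached version stores `(tuple_id, origin_block, data)` triples, which is consistent with the first storage block carrying original storage block information but is otherwise an illustrative layout; the function name is hypothetical.

```python
def rebuild_mapping_table(cache_versions):
    """Rebuild {tuple_id: (final_version, origin_block)} after an abnormal restart.

    cache_versions: {version: [(tuple_id, origin_block, data), ...]}
    Versions are scanned in ascending order, so a later version overwrites
    the entry left by an earlier one and the final value wins.
    """
    table = {}
    for version in sorted(cache_versions):
        for tuple_id, origin_block, _data in cache_versions[version]:
            table[tuple_id] = (version, origin_block)
    return table
```

With the table rebuilt, the remaining dirty data can be flushed using the same valid-tuple procedure as in normal operation.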
- The processing unit 34 is further configured to: when the server is shut down, store the first mapping table in the memory to the cache, so that after the server restarts, the first storage block information of the remaining versions in the cache is written to the disk according to the first mapping table.
- In the apparatus for processing dirty data, the determining unit 31 determines the first storage block in the memory, and the first writing unit 32 integrates the tuples marked as dirty data in the memory and writes them into the first storage block.
- The dirty data in the first storage block is written to the cache by the second writing unit 33 during the service idle period, and the dirty data is written to the disk through the cache.
- The apparatus provided by the embodiment of the present invention can significantly improve the data throughput and read/write performance of the database system, and makes it convenient for the system to look up or modify specified tuple data; it can also reduce the read/write frequency of the cache and prolong the service life of the cache.
- Embodiments of the present invention also provide a memory including the apparatus described in Figures 3 through 6 and a processor for controlling the apparatus for processing dirty data.
- This memory is capable of handling dirty data. It should be noted that the memory may be used as a memory or as a cache, which is not limited herein.
- The present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware alone, but in many cases the former is the better implementation.
- The part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This invention relates to a method and device for processing dirty data. The method comprises: determining a first storage block in the memory, the size of which matches the write specification of the cache; combining the elements marked as dirty data and writing them into the first storage block; and writing the dirty data from the first storage block to a disk via the cache. The invention improves the data throughput and the read/write performance of the database system.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/081046 WO2012083754A1 (fr) | 2011-10-20 | 2011-10-20 | Method and device for processing dirty data |
CN201180002177.XA CN102725752B (zh) | 2011-10-20 | 2011-10-20 | Method and apparatus for processing dirty data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/081046 WO2012083754A1 (fr) | 2011-10-20 | 2011-10-20 | Method and device for processing dirty data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012083754A1 true WO2012083754A1 (fr) | 2012-06-28 |
Family
ID=46313122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2011/081046 WO2012083754A1 (fr) | 2011-10-20 | 2011-10-20 | Procédé et dispositif de traitement de données douteuses |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN102725752B (fr) |
WO (1) | WO2012083754A1 (fr) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218430B (zh) * | 2013-04-11 | 2016-03-02 | 华为技术有限公司 | Method, system, and device for controlling data writing |
CN103513941B (zh) * | 2013-10-18 | 2016-08-17 | 华为技术有限公司 | Method and apparatus for writing data |
CN103714121B (zh) * | 2013-12-03 | 2017-07-14 | 华为技术有限公司 | Method and apparatus for managing index records |
CN103631940B (zh) * | 2013-12-09 | 2017-02-08 | 中国联合网络通信集团有限公司 | Data writing method and system applied to an HBase database |
CN104331452B (zh) * | 2014-10-30 | 2017-07-28 | 北京思特奇信息技术股份有限公司 | Method and system for processing dirty data |
WO2017113247A1 (fr) * | 2015-12-30 | 2017-07-06 | 华为技术有限公司 | Method for reducing power consumption of a memory, and computer device |
CN106802950A (zh) * | 2017-01-16 | 2017-06-06 | 郑州云海信息技术有限公司 | Method for optimizing small-file write caching in a distributed file system |
CN110704468A (zh) * | 2019-10-17 | 2020-01-17 | 武汉微派网络科技有限公司 | Data update method, apparatus, and controller |
CN111563053B (zh) * | 2020-07-10 | 2020-12-11 | 阿里云计算有限公司 | Method and apparatus for processing Bitmap data |
CN112115073A (zh) * | 2020-09-04 | 2020-12-22 | 北京易捷思达科技发展有限公司 | Reclamation method and apparatus applied to Bcache |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1851677A (zh) * | 2005-11-25 | 2006-10-25 | 华为技术有限公司 | Embedded processor system and data operation method thereof |
CN101178689A (zh) * | 2007-12-06 | 2008-05-14 | 浙江科技学院 | Dynamic management method for NAND Flash memory |
CN101916290A (zh) * | 2010-08-18 | 2010-12-15 | 中兴通讯股份有限公司 | Method and apparatus for managing an in-memory database |
US20110191535A1 (en) * | 2010-02-01 | 2011-08-04 | Fujitsu Limited | Method for controlling disk array apparatus and disk array apparatus |
WO2011114384A1 (fr) * | 2010-03-19 | 2011-09-22 | Hitachi, Ltd. | Storage system and method for changing the configuration of a cache memory for the storage system |
-
2011
- 2011-10-20 WO PCT/CN2011/081046 patent/WO2012083754A1/fr active Application Filing
- 2011-10-20 CN CN201180002177.XA patent/CN102725752B/zh active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
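CN103593352A (zh) * | 2012-08-15 | 2014-02-19 | 阿里巴巴集团控股有限公司 | Method and apparatus for cleaning massive data |
CN103593352B (zh) * | 2012-08-15 | 2016-10-12 | 阿里巴巴集团控股有限公司 | Method and apparatus for cleaning massive data |
CN105763351A (zh) * | 2014-12-17 | 2016-07-13 | 华为技术有限公司 | Method for deploying value-added services, forwarding device, detection device, and management device |
CN105763351B (zh) * | 2014-12-17 | 2019-09-03 | 华为技术有限公司 | Method for deploying value-added services, forwarding device, detection device, and management device |
CN108319609A (zh) * | 2017-01-16 | 2018-07-24 | 医渡云(北京)技术有限公司 | ETL data processing method and system, and data cleaning method and apparatus |
JP2020510905A (ja) * | 2017-02-06 | 2020-04-09 | 中興通訊股▲ふん▼有限公司Zte Corporation | Flash memory file system and data management method thereof |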
Also Published As
Publication number | Publication date |
---|---|
CN102725752A (zh) | 2012-10-10 |
CN102725752B (zh) | 2014-07-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180002177.X Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11851101 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11851101 Country of ref document: EP Kind code of ref document: A1 |