WO2012083754A1 - Method and device for processing dirty data - Google Patents

Method and device for processing dirty data

Info

Publication number
WO2012083754A1
Authority
WO
WIPO (PCT)
Prior art keywords
tuple
storage block
data
cache
memory
Prior art date
Application number
PCT/CN2011/081046
Other languages
English (en)
Chinese (zh)
Inventor
时家幸
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2011/081046 priority Critical patent/WO2012083754A1/fr
Priority to CN201180002177.XA priority patent/CN102725752B/zh
Publication of WO2012083754A1 publication Critical patent/WO2012083754A1/fr


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G06F12/126 Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning

Definitions

  • The present invention relates to the field of storage technologies, and in particular, to a method and apparatus for processing dirty data.
  • Background
  • A database is a repository that organizes, stores, and manages data according to a data structure. In daily work, related data is often placed in such a "warehouse" and processed according to management needs.
  • When a traditional database system or another storage-related engine works, modified data needs to be written to disk immediately (or within a short time) to ensure the integrity of transactions or the reliability of the data in the database. While the modified data is being written to the disk, no data can be written to the memory, and the memory has to suspend external service, which limits the memory throughput and the read/write performance of the system.
  • The read/write performance of the system can be improved by adding a flash device such as a Solid State Disk (SSD) as a cache: the memory writes the modified data to the SSD in units of memory blocks.
  • Data that has been modified in the cache and has not yet been written to the disk is dirty data.
  • However, the SSD can only process a small amount of dirty data each time it reads or writes a data block, which results in low data throughput and low read/write performance of the database system, and causes system response delays and even database crashes.
  • Embodiments of the present invention provide a method and apparatus for processing dirty data, which can improve data throughput and read and write performance of a database system.
  • An embodiment of the present invention provides a method for processing dirty data, including: determining, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
  • combining the tuples marked as dirty data in the memory and writing them into the first storage block; and
  • writing the dirty data in the first storage block to the cache, and writing the dirty data to a disk by using the cache.
  • An embodiment of the present invention provides an apparatus for processing dirty data, including: a determining unit, configured to determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
  • a first writing unit, configured to combine the tuples marked as dirty data in the memory and write them into the first storage block; and
  • a second writing unit, configured to write the dirty data in the first storage block to the cache, and write the dirty data to a disk by using the cache.
  • The method and apparatus for processing dirty data provided by the embodiments of the present invention can combine the tuples marked as dirty data in the memory, write them to the cache together, and then write the dirty data to the disk through the cache.
  • The method and apparatus provided by the embodiments of the present invention can improve the data throughput and the read/write performance of the database system, and can also reduce how often the cache is read and written, which prolongs the service life of the cache.
  • FIG. 1 is a schematic flowchart of a method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a device according to another embodiment of the present invention.
  • FIG. 4 is another schematic structural diagram of a device according to another embodiment of the present invention.
  • FIG. 5 is still another schematic structural diagram of a device according to another embodiment of the present invention.
  • FIG. 6 is still another schematic structural diagram of a device according to another embodiment of the present invention.
  • Detailed description
  • In the embodiments of the present invention, dirty data may be data that has been modified in the cache and has not yet been written to the disk.
  • The embodiments of the present invention can be applied to various types of databases and data warehouse systems, including DB databases, Oracle databases, SQL databases, and the like.
  • An embodiment of the present invention provides a method for processing dirty data. As shown in FIG. 1, the method includes:
  • The method for processing dirty data provided by this embodiment of the present invention can combine the tuples marked as dirty data in the memory, write them to the cache together, and then write the dirty data to the disk through the cache.
  • The method provided by this embodiment of the present invention can improve the data throughput and the read/write performance of the database system, and can also reduce how often the cache is read and written, which prolongs the service life of the cache.
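  • As a rough illustration of the flow described above, the following Python sketch packs dirty tuples into blocks whose size does not exceed a cache write size, writes each block to a cache, and then drains the cache to a disk. The names `Record`, `pack_dirty`, and `flush`, and the use of a list and a dict to stand in for the cache and the disk, are assumptions made for this example and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str             # hypothetical tuple identifier
    value: bytes         # tuple payload
    dirty: bool = False  # True if modified and not yet persisted


def pack_dirty(records, block_size):
    """Group dirty records into blocks whose payload fits within block_size bytes."""
    blocks, current, used = [], [], 0
    for rec in records:
        if not rec.dirty:
            continue
        size = len(rec.value)
        if size > block_size:
            raise ValueError(f"record {rec.key!r} exceeds the block size")
        if current and used + size > block_size:
            blocks.append(current)   # current block is full, start a new one
            current, used = [], 0
        current.append(rec)
        used += size
    if current:
        blocks.append(current)
    return blocks


def flush(records, block_size, cache, disk):
    """Write dirty records to the cache block by block, then drain the cache to disk."""
    for block in pack_dirty(records, block_size):
        cache.append(block)          # one cache write per packed block
        for rec in block:
            rec.dirty = False
    while cache:
        for rec in cache.pop(0):     # oldest block first
            disk[rec.key] = rec.value
    return disk
```

  • In this sketch the number of cache writes equals the number of packed blocks rather than the number of dirty tuples, which is the effect the embodiment attributes to combining tuples before writing.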
  • Another embodiment of the present invention further provides a method for processing dirty data, as shown in FIG. 2, including:
  • The cache is a cache device that connects the memory and the disk; the write specification of the cache refers to the maximum amount of data that the cache can write each time it is refreshed.
  • The read/write speed of the cache is much higher than the read/write speed of the memory.
  • A storage space whose size is the same as or close to the write specification of the cache can be determined in the memory as the first storage block. Specifically, the free space in the memory may be integrated to obtain the first storage block, or a storage space that meets the specification of the first storage block may be reserved in the memory as the first storage block; this is not limited herein.
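  • The two options above (integrating free space, or reserving space in advance) can be sketched as follows. This is only one possible reading: it assumes that free memory is described by hypothetical `(offset, length)` extents and that "integrating" means collecting extents until their combined size equals the cache write size. The function name `choose_first_block` is likewise an assumption.

```python
def choose_first_block(free_extents, write_size, reserved=None):
    """Return the (offset, length) extents that make up the first storage block.

    free_extents: iterable of (offset, length) pairs describing free memory.
    write_size:   the cache write specification, in the same unit as length.
    reserved:     optional (offset, length) region set aside in advance.
    """
    # Option 1: a region reserved in advance that is large enough.
    if reserved is not None and reserved[1] >= write_size:
        return [(reserved[0], write_size)]

    # Option 2: integrate free extents until the write size is reached.
    chosen, total = [], 0
    for offset, length in sorted(free_extents):
        take = min(length, write_size - total)
        chosen.append((offset, take))
        total += take
        if total == write_size:
            return chosen
    raise MemoryError("not enough free space for the first storage block")
```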
  • The cache may be a flash device such as a Solid State Disk (SSD), but is not limited thereto.
  • The first storage block stores the information of the original storage block to which each tuple marked as dirty data belongs, the data of each tuple, and a pointer to the data of each tuple. A tuple may be a storage unit that stores dirty data, or may represent a connection of multiple storage units, but is not limited thereto.
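  • One way to picture the contents of the first storage block is a data area plus a directory of entries, where each entry records the original storage block and a pointer (offset and length) to the tuple data. The class name `FirstStorageBlock` and the entry layout are assumptions made for this sketch; the patent text does not specify a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass
class FirstStorageBlock:
    capacity: int                                  # matches the cache write size
    data: bytearray = field(default_factory=bytearray)
    # Each entry: (tuple_id, original_block, offset, length)
    entries: list = field(default_factory=list)

    def add(self, tuple_id, original_block, payload):
        """Append a dirty tuple; return False if the block has no room left."""
        if len(self.data) + len(payload) > self.capacity:
            return False
        offset = len(self.data)                    # pointer to the tuple data
        self.data += payload
        self.entries.append((tuple_id, original_block, offset, len(payload)))
        return True

    def read(self, tuple_id):
        """Follow the stored pointer to return the most recently added value."""
        for tid, _origin, offset, length in reversed(self.entries):
            if tid == tuple_id:
                return bytes(self.data[offset:offset + length])
        raise KeyError(tuple_id)
```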
  • By using the first storage block, the tuples marked as dirty data can be integrated and written to the cache together, which improves the efficiency of reading and writing dirty data.
  • the first mapping table is used to record the first specific.
  • The first storage block may be used to write dirty data to the cache multiple times, so that multiple different versions of the first storage block information are stored in the cache.
  • The first storage block information may be numbered in the order in which it is written to the cache, which determines the time version number of each piece of first storage block information. All tuples in the same version of the first storage block information have the same time version number, and the time version number of each tuple identifies the first storage block information to which the tuple belongs in the cache.
  • When the storage space occupied by the tuples marked as dirty data in the memory is larger than the storage space of the first storage block, the tuples marked as dirty data need to be written to the cache through the first storage block in multiple passes, so that different versions of the first storage block information are stored in the cache.
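  • The versioning scheme can be sketched with a small in-memory model: every write of the first storage block to the cache receives the next time version number, and a mapping table remembers the latest version that holds each tuple. The class `VersionedCache` and its attribute names are hypothetical, and the assumption that the mapping table keeps only the latest version per tuple is a simplification for this example.

```python
class VersionedCache:
    """Toy model of a cache holding several versions of the first storage block."""

    def __init__(self):
        self.blocks = {}       # time version number -> {tuple_id: value}
        self.mapping = {}      # tuple_id -> latest time version number
        self.next_version = 1

    def write_block(self, block):
        """Store one first-storage-block snapshot and return its version number."""
        version = self.next_version
        self.next_version += 1
        self.blocks[version] = dict(block)
        for tuple_id in block:
            # All tuples written together share the same time version number.
            self.mapping[tuple_id] = version
        return version
```

  • For example, writing `{"a": 1, "b": 2}` and then `{"a": 3}` leaves two versions in the cache, with tuple `a` mapped to version 2 and tuple `b` still mapped to version 1.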
  • A tuple marked as dirty data in the memory may be modified many times, so multiple values of that tuple are often recorded in the cache. However, when the dirty data in the cache is written to the disk,
  • only the final value of each tuple needs to be written. To improve the efficiency of reading and writing data, the method provided in this embodiment further includes:
  • The valid tuples can be determined by using, but not limited to, the following method:
  • Each version of the first storage block information is read in order of storage block version, starting from the lowest version. The time version numbers are used to detect whether a tuple in the current version of the first storage block is modified again in a higher-version storage block; if it is, the current tuple is ignored; if not, the current tuple is retained and marked as a valid tuple.
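  • The scan described above can be sketched in a few lines. The function name `valid_tuples` and the representation of the cache as a dict from version number to `{tuple_id: value}` are assumptions for illustration only.

```python
def valid_tuples(blocks):
    """Return {tuple_id: (version, value)} for the final value of each tuple.

    blocks: dict mapping a time version number to {tuple_id: value}.
    """
    versions = sorted(blocks)                      # lowest version first
    valid = {}
    for i, version in enumerate(versions):
        for tuple_id, value in blocks[version].items():
            # A tuple is superseded if any higher version also contains it.
            superseded = any(tuple_id in blocks[later] for later in versions[i + 1:])
            if not superseded:
                valid[tuple_id] = (version, value)
    return valid
```

  • This follows the lowest-to-highest scan literally; scanning from the highest version downward and skipping tuples already seen would give the same result with less work, but that variant is not what the text describes.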
  • The original storage block to which each valid tuple belongs may be determined according to the first mapping table, and the tuples belonging to the same original storage block may then be combined and written to the disk together, which improves data read/write efficiency.
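  • A minimal sketch of this grouping step follows, assuming a hypothetical `origin_of` lookup (tuple identifier to original storage block) derived from the first mapping table, and a dict standing in for the disk.

```python
from collections import defaultdict


def group_by_origin(valid, origin_of):
    """Group valid tuples by the original storage block they belong to."""
    groups = defaultdict(dict)
    for tuple_id, value in valid.items():
        groups[origin_of[tuple_id]][tuple_id] = value
    return dict(groups)


def write_groups(groups, disk):
    """Write each group with a single update per original block; return the write count."""
    writes = 0
    for origin, tuples in sorted(groups.items()):
        disk.setdefault(origin, {}).update(tuples)
        writes += 1
    return writes
```

  • With three valid tuples spread over two original storage blocks, this issues two disk writes instead of three.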
  • When a specified tuple needs to be looked up, the first mapping table may be accessed to determine whether the data of the specified tuple is included in the cache; if not, the tuple data is read from the disk; if so, the first storage block information in the cache that contains the specified tuple is determined according to the first mapping table, and the data of the specified tuple is obtained from it.
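  • The lookup path can be sketched as below. The function `read_tuple` and the dict-based stand-ins for the mapping table, the cache, and the disk are assumptions made for this example.

```python
def read_tuple(tuple_id, mapping, cache_blocks, disk):
    """Return the current value of a tuple, preferring the cache over the disk.

    mapping:      tuple_id -> time version number of the block holding its final value
    cache_blocks: time version number -> {tuple_id: value}
    disk:         tuple_id -> value
    """
    version = mapping.get(tuple_id)
    if version is None:
        return disk[tuple_id]             # not in the cache: read from disk
    return cache_blocks[version][tuple_id]
```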
  • The corresponding tuple is obtained from the cache to overwrite the specified storage block in the disk; when the system needs to modify a single tuple, the modification of the specified tuple is completed according to the method provided in this embodiment.
  • The first mapping table in the memory may be stored in the cache, so that after the server restarts, the first storage block information of the remaining versions in the cache can be determined according to the first mapping table and written to the disk.
  • For the method of writing the dirty data in the cache to the disk, refer to the description in this embodiment; details are not described herein again.
  • In the method for processing dirty data provided by this embodiment of the present invention, the first storage block is determined in the memory, and the tuples marked as dirty data in the memory are integrated and written to the cache; the dirty data in the cache is written to the disk during idle periods of the service.
  • The method provided by this embodiment of the present invention can significantly improve the data throughput and the read/write performance of the database system, and makes it easier for the system to find or modify specified tuple data; it can also reduce how often the cache is read and written, which prolongs the service life of the cache.
  • a further embodiment of the present invention provides a device for processing dirty data, which can implement the foregoing method embodiment.
  • The device includes: a determining unit 31, configured to determine, in a memory, a first storage block, where the size of the first storage block matches a write specification of a cache;
  • a first writing unit 32, configured to combine the tuples marked as dirty data in the memory and write them into the first storage block; and
  • a second writing unit 33, configured to write the dirty data in the first storage block to the cache, and write the dirty data to the disk by using the cache.
  • the determining unit 31 may further include an integration subunit 311 or a reservation subunit 312, where:
  • the integration subunit 311 is configured to integrate the free space in the memory to obtain the first storage block.
  • the reservation subunit 312 is configured to reserve, in the memory, a storage space conforming to the first storage block specification as the first storage block.
  • the first writing unit 32 is further configured to write related information of a tuple marked as dirty data in the memory to the first storage block, where the related information of the tuple includes each tuple marked as dirty data.
  • the apparatus further includes a processing unit 34.
  • the second writing unit 33 specifically includes a first processing sub-unit 331, a first searching sub-unit 332, and a second processing sub-unit 333, where:
  • The processing unit 34 is configured to establish a first mapping table in the memory, where the first mapping table uses the initial storage block information, and the time version number of each tuple is used to represent the first storage block information to which the tuple belongs in the cache.
  • The first processing sub-unit 331 is configured to write the dirty data in the first storage block to the cache, and write the dirty data to the disk by using the cache;
  • the first search sub-unit 332 is configured to: when a tuple marked as dirty data in the memory has been modified multiple times, search the first mapping table for the time version number of the final value of each tuple in the dirty data, determine the first storage block information corresponding to that time version number in the cache, and mark the tuple in that first storage block information that stores the final value as a valid tuple;
  • the second processing sub-unit 333 is configured to write the valid tuple determined by the first search sub-unit 332 to the disk, and delete the tuple data information corresponding to the valid tuple in the cache;
  • the processing unit 34 is further configured to delete, after the second processing sub-unit 333 writes the valid tuple to the disk, the time version number information corresponding to the valid tuple in the first mapping table.
  • the second writing unit 33 may further include a second searching subunit 334 and a third processing subunit 335, and the apparatus further includes a first searching unit 35 and a second searching unit 36, where:
  • the second lookup subunit 334 is configured to determine, according to the first mapping table, original storage block information to which each tuple in the valid tuple belongs;
  • the third processing sub-unit 335 is configured to combine the tuples belonging to the same original storage block and write them to the disk together, and delete the tuple data information corresponding to those tuples in the cache.
  • The first searching unit 35 is configured to: when a specified tuple needs to be searched for, look up the first mapping table and determine whether the specified tuple is included in the cache;
  • the second searching unit 36 is configured to: when the specified tuple is included in the cache, determine, according to the first mapping table, the first storage block information in the cache that includes the final value of the specified tuple data, and determine the data of the specified tuple.
  • The processing unit 34 is further configured to: when an abnormal situation causes the process of writing dirty data to the disk to be terminated, reconstruct the first mapping table after the server is restarted, according to the first storage block information of the remaining versions in the cache.
  • The first search sub-unit 332 is further configured to search, according to the first mapping table determined by the processing unit 34, for the time version number of the final value of each tuple in the dirty data remaining in the cache, determine the first storage block information corresponding to that time version number in the cache, and mark the tuple in that first storage block information that stores the final value as a valid tuple;
  • the second processing sub-unit 333 is further configured to write the valid tuple determined by the first search sub-unit 332 to the disk, and delete the tuple data information corresponding to the valid tuple in the cache; the processing unit 34 is further configured to delete, after the second processing sub-unit 333 writes the valid tuple to the disk, the time version number information corresponding to the valid tuple in the first mapping table.
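  • The recovery sequence above (rebuild the mapping table from what is left in the cache, then write the final values to the disk and clean up) can be sketched as follows. The functions `rebuild_mapping` and `drain`, and the dict-based cache and disk, are hypothetical and only illustrate the order of operations.

```python
def rebuild_mapping(cache_blocks):
    """Rebuild tuple_id -> latest time version number from the blocks left in the cache."""
    mapping = {}
    for version in sorted(cache_blocks):           # higher versions overwrite lower ones
        for tuple_id in cache_blocks[version]:
            mapping[tuple_id] = version
    return mapping


def drain(mapping, cache_blocks, disk):
    """Write each tuple's final value to disk, then drop it from the cache and the mapping."""
    for tuple_id, version in list(mapping.items()):
        disk[tuple_id] = cache_blocks[version][tuple_id]   # final value only
        del mapping[tuple_id]                               # delete version info
    cache_blocks.clear()                                    # delete cached tuple data
    return disk
```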
  • The processing unit 34 is further configured to: when the server is shut down, store the first mapping table in the memory in the cache, so that after the server restarts, the first storage block information of the remaining versions in the cache is written to the disk according to the first mapping table.
  • In the apparatus for processing dirty data, the determining unit 31 determines the first storage block in the memory, and the first writing unit 32 integrates the tuples marked as dirty data in the memory and writes them into the first storage block.
  • During idle periods of the service, the second writing unit 33 writes the dirty data in the first storage block to the cache, and the dirty data is written to the disk by using the cache.
  • The device provided by this embodiment of the present invention can significantly improve the data throughput and the read/write performance of the database system, and makes it easier for the system to find or modify specified tuple data; it can also reduce how often the cache is read and written, which prolongs the service life of the cache.
  • Embodiments of the present invention also provide a memory including the apparatus described in Figures 3 through 6 and a processor for controlling the apparatus for processing dirty data.
  • This memory is capable of handling dirty data. It should be noted that the memory may be used as a memory or as a cache, which is not limited herein.
  • The invention can be implemented by means of software plus the necessary general-purpose hardware, and of course also by hardware alone, but in many cases the former is the better implementation.
  • Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and a device for processing dirty data. The method includes: determining a first storage block in the memory, where the size of the block matches the write specification of the cache; combining the tuples marked as dirty data and writing them into the first storage block; and writing the dirty data from the first storage block to a disk via the cache. The invention can improve the data throughput and the read/write performance of the database system.
PCT/CN2011/081046 2011-10-20 2011-10-20 Method and device for processing dirty data WO2012083754A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/081046 WO2012083754A1 (fr) 2011-10-20 2011-10-20 Method and device for processing dirty data
CN201180002177.XA CN102725752B (zh) 2011-10-20 2011-10-20 Method and apparatus for processing dirty data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/081046 WO2012083754A1 (fr) 2011-10-20 2011-10-20 Method and device for processing dirty data

Publications (1)

Publication Number Publication Date
WO2012083754A1 (fr) 2012-06-28

Family

ID=46313122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/081046 WO2012083754A1 (fr) 2011-10-20 2011-10-20 Method and device for processing dirty data

Country Status (2)

Country Link
CN (1) CN102725752B (fr)
WO (1) WO2012083754A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218430B (zh) * 2013-04-11 2016-03-02 华为技术有限公司 Method, system, and device for controlling data writing
CN103513941B (zh) * 2013-10-18 2016-08-17 华为技术有限公司 Method and apparatus for writing data
CN103714121B (zh) * 2013-12-03 2017-07-14 华为技术有限公司 Method and apparatus for managing index records
CN103631940B (zh) * 2013-12-09 2017-02-08 中国联合网络通信集团有限公司 Data writing method and system applied to an HBase database
CN104331452B (zh) * 2014-10-30 2017-07-28 北京思特奇信息技术股份有限公司 Method and system for processing dirty data
WO2017113247A1 (fr) * 2015-12-30 2017-07-06 华为技术有限公司 Method for reducing power consumption of a memory, and computer device
CN106802950A (zh) * 2017-01-16 2017-06-06 郑州云海信息技术有限公司 Method for optimizing small-file write caching in a distributed file system
CN110704468A (zh) * 2019-10-17 2020-01-17 武汉微派网络科技有限公司 Data update method, apparatus, and controller
CN111563053B (zh) * 2020-07-10 2020-12-11 阿里云计算有限公司 Method and apparatus for processing Bitmap data
CN112115073A (zh) * 2020-09-04 2020-12-22 北京易捷思达科技发展有限公司 Reclamation method and apparatus applied to Bcache

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1851677A (zh) * 2005-11-25 2006-10-25 华为技术有限公司 Embedded processor system and data operation method thereof
CN101178689A (zh) * 2007-12-06 2008-05-14 浙江科技学院 Dynamic management method for NAND Flash memory
CN101916290A (zh) * 2010-08-18 2010-12-15 中兴通讯股份有限公司 Method and apparatus for managing an in-memory database
US20110191535A1 (en) * 2010-02-01 2011-08-04 Fujitsu Limited Method for controlling disk array apparatus and disk array apparatus
WO2011114384A1 (fr) * 2010-03-19 2011-09-22 Hitachi, Ltd. Storage system and method for changing the configuration of a cache memory for the storage system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
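CN103593352A (zh) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Method and apparatus for cleaning massive data
CN103593352B (zh) * 2012-08-15 2016-10-12 阿里巴巴集团控股有限公司 Method and apparatus for cleaning massive data
CN105763351A (zh) * 2014-12-17 2016-07-13 华为技术有限公司 Method for deploying value-added services, forwarding device, detection device, and management device
CN105763351B (zh) * 2014-12-17 2019-09-03 华为技术有限公司 Method for deploying value-added services, forwarding device, detection device, and management device
CN108319609A (zh) * 2017-01-16 2018-07-24 医渡云(北京)技术有限公司 ETL data processing method and system, and data cleaning method and apparatus
JP2020510905A (ja) * 2017-02-06 2020-04-09 中興通訊股▲ふん▼有限公司Zte Corporation Flash memory file system and data management method thereof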

Also Published As

Publication number Publication date
CN102725752A (zh) 2012-10-10
CN102725752B (zh) 2014-07-16


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201180002177.X; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11851101; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11851101; Country of ref document: EP; Kind code of ref document: A1)