WO2024077863A1 - Recovery method for an all-flash storage system and related apparatus - Google Patents

Recovery method for an all-flash storage system and related apparatus

Info

Publication number
WO2024077863A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
logical volume
state
clean
storage system
Application number
PCT/CN2023/081445
Other languages
English (en)
French (fr)
Inventor
张凯
刚亚州
王见
Original Assignee
浪潮电子信息产业股份有限公司
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2024077863A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Definitions

  • the present application relates to the field of storage technology, and in particular to a recovery method for an all-flash storage system; and also to a recovery device and equipment for an all-flash storage system and a computer non-volatile readable storage medium.
  • Metadata refers to data that describes data.
  • Metadata management mainly manages L-P mapping (the mapping relationship from Logical Block Address to Physical Block Address), P-L mapping (the mapping relationship from Physical Block Address to Logical Block Address), and so on. Because it involves a large volume of highly concurrent, low-latency data access, metadata in an all-flash storage system is usually organized in a tree data structure. Because memory capacity is limited, a large amount of managed metadata must be persisted, which involves flushing to disk and allocating on-disk metadata space.
  • the length of the repair time of the all-flash storage system determines the duration of customer business interruption, and the length of the repair time of the all-flash storage system also reflects the availability, reliability, and security of the entire storage system.
  • the purpose of the present application is to provide a method for recovering an all-flash storage system, which can achieve rapid recovery after a power failure of the all-flash storage system, shorten the repair time, and improve the availability, reliability, and security of the entire storage system.
  • Another purpose of the present application is to provide a recovery device, equipment, and computer non-volatile readable storage medium for an all-flash storage system, all of which have the above technical effects.
  • the present application provides a recovery method of an all-flash storage system, comprising:
  • marking the state of the metadata of the logical volume as clean includes:
  • marking the state of the metadata of the logical volume as a clean state includes:
  • restoring the system online includes:
  • the forward metadata is metadata indicating the mapping from logical block addresses to physical block addresses.
  • marking the state of the metadata of the logical volume as clean includes:
  • marking the state of metadata of the logical volume as clean in the logical volume includes:
  • the state of the metadata of the logical volume is marked as clean in the super block at the head of the logical volume.
  • marking the state of the metadata of the logical volume as clean includes:
  • the root node address of the tree structure where the metadata is located is written into the super block at the head of the logical volume.
  • writing the root node address of the tree structure where the metadata is located into the super block at the head of the logical volume includes:
  • the root node address of the B+ tree where the metadata is located is written into the super block at the head of the logical volume.
  • writing the root node address of the B+ tree where the metadata is located into the super block at the head of the logical volume includes:
  • marking the state of the metadata of the logical volume as dirty includes:
  • marking the state of the metadata of the logical volume as dirty in the logical volume includes:
  • the state of the metadata of the logical volume is marked as dirty in the super block at the head of the logical volume.
  • marking the state of the metadata of the logical volume as dirty includes:
  • reconstructing the forward metadata includes:
  • the present application also provides a recovery device for an all-flash storage system, comprising:
  • a status marking module is configured to mark the metadata of the logical volume as being in a clean state when the metadata of the logical volume is clean;
  • a status reading module is configured to read the status of the metadata of the logical volume after the all-flash storage system is powered on again;
  • the recovery module is configured to recover the logical volume if the metadata of the logical volume is in a clean state.
  • the present application also provides recovery equipment for an all-flash storage system, including:
  • a memory configured to store a computer program;
  • a processor configured to implement the steps of any of the above all-flash storage system recovery methods when executing the computer program.
  • the present application also provides a computer non-volatile readable storage medium, on which a computer program is stored.
  • when the computer program is executed by a processor,
  • the steps of the all-flash storage system recovery method as described in any of the above items are implemented.
  • the recovery method of the all-flash storage system includes: when the metadata of the logical volume is clean, marking the state of the metadata of the logical volume as a clean state; when the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; if the metadata of the logical volume is in a clean state, restoring the system online.
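The decision described in this paragraph can be sketched as follows. This is a minimal illustration, not the application's implementation. `MetaState`, `recover_volume`, and the two callbacks `load_forward` and `rebuild_forward` are hypothetical names. The callbacks stand in for reading the flushed forward metadata and for rebuilding it from reverse metadata.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

ForwardMap = Dict[int, int]  # logical block address -> physical block address


class MetaState(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"


def recover_volume(
    state: MetaState,
    load_forward: Callable[[], ForwardMap],
    rebuild_forward: Callable[[], ForwardMap],
) -> Tuple[str, ForwardMap]:
    """Pick the recovery path from the state read after power-on."""
    if state is MetaState.CLEAN:
        # Forward metadata on disk is complete: load it and go online directly.
        return "online", load_forward()
    # Dirty: on-disk forward metadata may be stale, so rebuild it first.
    return "online-after-rebuild", rebuild_forward()


print(recover_volume(MetaState.CLEAN, lambda: {0: 102}, lambda: {}))
# ('online', {0: 102})
```

Only the clean path skips the rebuild, and that is where the shorter repair time comes from.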
  • the recovery method of the all-flash storage system will mark the status of the metadata as clean when the metadata of the logical volume is clean. After the all-flash storage system subsequently loses power and is restored to power, if the status of the metadata of the logical volume is clean, it will be directly restored online without the need to reconstruct the forward metadata, thereby enabling the all-flash storage system to recover quickly after a power failure, shortening the repair time, and improving the availability, reliability, and security of the entire storage system.
  • the recovery device, equipment and computer non-volatile readable storage medium of the all-flash storage system provided in this application all have the above-mentioned technical effects.
  • FIG. 1 is a schematic flow chart of a recovery method for an all-flash storage system provided in an embodiment of the present application.
  • FIG. 2 is a TO_CLEAN flow chart provided in an embodiment of the present application.
  • FIG. 3 is a TO_DIRTY flow chart provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a recovery device for an all-flash storage system provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of recovery equipment for an all-flash storage system provided in an embodiment of the present application.
  • the core of this application is to provide a method for recovering an all-flash storage system, which can achieve rapid recovery after a power failure of the all-flash storage system, shorten the repair time, and improve the availability, reliability and security of the entire storage system.
  • Another core of this application is to provide a recovery device, equipment and computer non-volatile readable storage medium for an all-flash storage system, all of which have the above technical effects.
  • the present application provides a recovery method of an all-flash storage system, which can realize rapid recovery of the all-flash storage system after a power failure and shorten the repair time.
  • FIG. 1 is a schematic flow chart of a method for recovering an all-flash storage system provided in an embodiment of the present application.
  • the method includes:
  • Clean means that the metadata of the logical volume has been flushed. If the metadata has not been flushed, that is, there is metadata in the memory that has not been flushed, the metadata of the logical volume is dirty. When the all-flash storage system is normal, if the metadata of the logical volume is clean, the status of the metadata of the logical volume will be marked as clean.
  • marking the state of the metadata of the logical volume as a clean state may include: when no IO is issued to the logical volume within a preset timing period and the metadata of the logical volume has been flushed, marking the state of the metadata of the logical volume as a clean state.
  • the condition for the metadata of the logical volume to be clean is that within a preset timing period, no IO is issued to the logical volume and the metadata of the logical volume is flushed. If the above conditions are met, the metadata of the logical volume is clean. Otherwise, the metadata of the logical volume is dirty.
  • the location where the state of the metadata of the logical volume is marked can be chosen in different ways.
  • the metadata of the logical volume can be marked as clean in the logical volume itself.
  • the metadata of the logical volume can also be marked as clean in other locations outside the logical volume.
  • marking the status of the metadata of the logical volume as a clean status includes: marking the status of the metadata of the logical volume as a clean status in the logical volume.
  • the clean status of the metadata of the logical volume is marked in the logical volume.
  • when the state of the metadata of the logical volume needs to be read, it can be read from the logical volume.
  • the step of marking the state of metadata of the logical volume as a clean state in the logical volume may include:
  • the state of the metadata of the logical volume is marked as clean in the super block at the head of the logical volume.
  • the area used to mark the clean state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is clean, the state of the metadata of the logical volume is marked as clean in the superblock of the logical volume.
  • marking the state of the metadata of the logical volume as a clean state may include:
  • the client's timer is set to 2 minutes (other durations are also possible). The idle-time flush task is started, and the client determines whether the metadata of the logical volume has been flushed. If it has, the client requests that the metadata state be changed to clean.
  • the state machine control end triggers the state machine to run and initiates the TO_CLEAN task, that is, the clean task. The client then executes the task and marks the state of the metadata of the logical volume as clean in the superblock of the logical volume.
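The following is a sketch of the client-side timer check. It assumes the caller passes in a deterministic clock, so the logic can be followed without real timers. `IdleCleanChecker`, `flush`, and `submit` are hypothetical names. The 120-second period mirrors the 2-minute example above, and the test "no IO within the period and nothing left unflushed" follows the clean condition described earlier.

```python
from typing import Callable, List, Optional


class IdleCleanChecker:
    """Client-side timer task (hypothetical names).

    on_tick() is called once per timer period. If no IO arrived during the
    last full period, the idle-time flush runs; if nothing is left unflushed,
    a TO_CLEAN request is handed to the state machine control end.
    """

    def __init__(self, period: float, flush: Callable[[], int],
                 submit: Callable[[str], None]) -> None:
        self.period = period
        self.flush = flush      # flushes dirty metadata, returns how much is still dirty
        self.submit = submit    # passes a task name to the state machine control end
        self.last_io: Optional[float] = None
        self.clean_requested = False

    def on_io(self, now: float) -> None:
        self.last_io = now
        self.clean_requested = False  # new IO means a later clean request is needed again

    def on_tick(self, now: float) -> None:
        idle = self.last_io is None or now - self.last_io >= self.period
        if not idle or self.clean_requested:
            return
        if self.flush() == 0:
            self.submit("TO_CLEAN")
            self.clean_requested = True


# Usage with a simulated clock (seconds) and a 2-minute period.
tasks: List[str] = []
dirty_pages = [3]


def flush() -> int:
    dirty_pages[0] = 0  # pretend every dirty page was written successfully
    return dirty_pages[0]


checker = IdleCleanChecker(120.0, flush, tasks.append)
checker.on_io(10.0)
checker.on_tick(120.0)  # only 110 s since the last IO: nothing happens
checker.on_tick(240.0)  # idle for a full period: flush, then request TO_CLEAN
checker.on_tick(360.0)  # already requested: no duplicate task
print(tasks)  # ['TO_CLEAN']
```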
  • the state of the metadata of the logical volume is read first.
  • the state of the metadata of the logical volume marked in the superblock of the logical volume is read first. If the state read is clean, the forward metadata is already persisted on the hard disk or magnetic disk.
  • the forward metadata can therefore be read directly from disk into memory, and the system can be restored online directly without reconstructing the forward metadata.
  • Forward metadata refers to the metadata of the mapping of logical block addresses to physical block addresses.
  • it may also include:
  • the root node address of the tree structure where the metadata is located is written into the super block at the head of the logical volume.
  • the metadata of the logical volume is organized using a tree structure.
  • when a new IO is written, a new L-P mapping relationship, namely the mapping from Logical Block Address to Physical Block Address, is inserted into the tree;
  • inserting this mapping makes at least one node of the tree dirty, so the whole tree is in a dirty state.
  • the metadata of the logical volume can be marked as clean in the superblock of the logical volume, and the root node address of the tree can be written at the same time.
  • writing the root node address of the tree structure where the metadata is located into the super block at the head of the logical volume includes:
  • the root node address of the B+ tree where the metadata is located is written into the super block at the head of the logical volume.
  • a B+ tree index has a search time complexity of O(log n) and a space utilization rate of 75% (non-leaf nodes serve only as index nodes and do not store data).
  • a B+ tree search starts at the root node and descends level by level until it reaches a leaf node. Non-leaf nodes therefore lie on the path of every query and are the most frequently accessed nodes, and the closer a node is to the root, the more often it is accessed. Upper-level non-leaf nodes should therefore be kept in memory where possible.
  • because of its search efficiency, the B+ tree is well suited to organizing metadata objects. The tree in this embodiment is therefore a B+ tree, which supports efficient lookup of metadata objects within the all-flash storage system.
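The sketch below shows why upper-level nodes are worth caching. It performs a read-only B+ tree lookup over a simulated disk and counts node reads. The node layout, the separator convention, and the depth-based cache policy (`cache_depth`) are assumptions made for illustration; the application does not specify them.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List[int] = field(default_factory=list)  # child addresses (internal nodes)
    values: List[int] = field(default_factory=list)    # PBAs (leaf nodes)

    @property
    def is_leaf(self) -> bool:
        return not self.children


class BPlusTreeReader:
    """Lookup over a simulated disk (dict of address -> Node).

    Nodes at depth < cache_depth stay in memory after their first read,
    so the most frequently visited non-leaf nodes stop costing disk reads.
    """

    def __init__(self, disk: Dict[int, Node], root_addr: int, cache_depth: int) -> None:
        self.disk = disk
        self.root_addr = root_addr
        self.cache_depth = cache_depth
        self.cache: Dict[int, Node] = {}
        self.disk_reads = 0

    def _load(self, addr: int, depth: int) -> Node:
        if addr in self.cache:
            return self.cache[addr]
        self.disk_reads += 1
        node = self.disk[addr]
        if depth < self.cache_depth:
            self.cache[addr] = node
        return node

    def lookup(self, lba: int) -> Optional[int]:
        node, depth = self._load(self.root_addr, 0), 0
        while not node.is_leaf:
            # Child i holds keys < keys[i]; keys equal to a separator go right.
            i = bisect_right(node.keys, lba)
            depth += 1
            node = self._load(node.children[i], depth)
        j = bisect_left(node.keys, lba)
        if j < len(node.keys) and node.keys[j] == lba:
            return node.values[j]
        return None


disk = {
    1: Node(keys=[10], children=[2, 3]),
    2: Node(keys=[1, 5], values=[101, 105]),
    3: Node(keys=[10, 20], values=[110, 120]),
}
reader = BPlusTreeReader(disk, root_addr=1, cache_depth=1)
print(reader.lookup(5), reader.lookup(20), reader.lookup(7))  # 105 120 None
print(reader.disk_reads)  # 4: the root once, then one leaf per lookup
```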
  • when the metadata of the logical volume is in the clean state, every new IO write inserts a new L-P mapping relationship into the B+ tree, which makes at least one node of the B+ tree dirty; the entire B+ tree is then in the dirty state. When no IO is issued to the logical volume within a timing cycle and all dirty metadata has been flushed, the entire B+ tree is in the clean state. At this point the metadata of the logical volume can be marked as clean in the superblock of the logical volume, and the root node address of the B+ tree can be written at the same time.
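The dirty-tracking rule in this paragraph can be sketched as follows. To keep the example short, the B+ tree is replaced by a hypothetical two-level stand-in (`MetadataTree`, whose leaves each cover `leaf_span` consecutive LBAs), and the superblock is a plain dictionary. Only the rule itself follows the text: any insert dirties the tree, and the clean mark and the root node address are written together only when nothing is left dirty. The idle-period check is omitted here.

```python
from typing import Dict, Set


class MetadataTree:
    """Simplified stand-in for a volume's B+ tree, tracking only dirty nodes."""

    ROOT = -1  # hypothetical id of the root node

    def __init__(self, root_addr: int, leaf_span: int = 256) -> None:
        self.root_addr = root_addr    # on-disk address of the root node
        self.leaf_span = leaf_span    # LBAs covered by one leaf (assumption)
        self.leaves: Dict[int, Dict[int, int]] = {}
        self.dirty: Set[int] = set()

    def insert(self, lba: int, pba: int) -> None:
        leaf_id = lba // self.leaf_span
        self.leaves.setdefault(leaf_id, {})[lba] = pba
        # The modified leaf and its parent (here the root) become dirty.
        self.dirty.update({leaf_id, self.ROOT})

    def flush_all(self) -> None:
        self.dirty.clear()  # stand-in for writing every dirty node to disk

    def is_clean(self) -> bool:
        return not self.dirty


def mark_clean_if_possible(tree: MetadataTree, superblock: Dict[str, object]) -> bool:
    """Write the clean mark and the root node address together, or do nothing."""
    if not tree.is_clean():
        return False
    superblock.update(state="clean", root_addr=tree.root_addr)
    return True


sb: Dict[str, object] = {"state": "dirty"}
tree = MetadataTree(root_addr=4096)
tree.insert(7, 1007)
print(mark_clean_if_possible(tree, sb), sb)  # False {'state': 'dirty'}
tree.flush_all()
print(mark_clean_if_possible(tree, sb), sb)  # True {'state': 'clean', 'root_addr': 4096}
```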
  • after the all-flash storage system is powered on again, if the metadata of the logical volume is in the clean state, the system can be restored online directly, and the forward metadata of the logical volume can be accessed through the root node address.
  • it may also include:
  • when the metadata of the logical volume is clean, its state is marked as clean.
  • when the metadata of the logical volume is dirty, its state is likewise marked as dirty.
  • when the metadata of the logical volume is clean, every new IO write generates a new L-P mapping relationship, and the metadata of the logical volume becomes dirty; its state is then marked as dirty. If the all-flash storage system later loses power, reading this mark after power is restored shows whether the metadata of the logical volume was dirty before the power failure.
  • the location where the dirty state of the metadata of the logical volume is marked can likewise be chosen in different ways.
  • marking the status of the metadata of the logical volume as dirty may include:
  • the dirty state of the metadata of the logical volume is also marked in the logical volume.
  • the logical volume can be read to know whether the metadata of the logical volume is in the clean state or the dirty state.
  • marking the state of the metadata of the logical volume as a dirty state in the logical volume may include: marking the state of the metadata of the logical volume as a dirty state in a super block at the head of the logical volume.
  • the area used to mark the dirty state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is dirty, the state of the metadata of the logical volume is marked as dirty in the superblock of the logical volume.
  • marking the state of the metadata of the logical volume as dirty may include:
  • the client determines whether the metadata of the logical volume is in the clean state. If the metadata of the logical volume is in the clean state, the client requests that the metadata state be changed to dirty.
  • the state machine control end triggers the state machine to run and initiates the client's TO_DIRTY task, that is, the dirty task.
  • the client executes the task and marks the dirty state in the superblock of the logical volume.
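The TO_CLEAN and TO_DIRTY flows of FIG. 2 and FIG. 3 can be sketched as a small task queue. The queue, `VolumeStateMachine`, `VolumeClient`, and the in-memory `superblock` dictionary are hypothetical. A real client would persist the mark with a durable superblock write.

```python
from collections import deque
from typing import Callable, Deque, Dict


class VolumeStateMachine:
    """Control end: queues TO_CLEAN / TO_DIRTY tasks and runs them in order."""

    TARGET = {"TO_CLEAN": "clean", "TO_DIRTY": "dirty"}

    def __init__(self, write_state: Callable[[str], None]) -> None:
        self.write_state = write_state  # client callback that records the mark
        self.pending: Deque[str] = deque()

    def request(self, task: str) -> None:
        if task not in self.TARGET:
            raise ValueError(f"unknown task {task!r}")
        self.pending.append(task)

    def run(self) -> None:
        while self.pending:
            self.write_state(self.TARGET[self.pending.popleft()])


class VolumeClient:
    """Client that owns the (simulated) superblock of one logical volume."""

    def __init__(self) -> None:
        self.superblock: Dict[str, str] = {"state": "clean"}
        self.sm = VolumeStateMachine(self._persist_state)

    def _persist_state(self, state: str) -> None:
        self.superblock["state"] = state  # stand-in for a durable superblock write

    def before_write(self) -> None:
        # FIG. 3: if the metadata is currently clean, request TO_DIRTY and
        # let it complete before the new mapping is applied.
        if self.superblock["state"] == "clean":
            self.sm.request("TO_DIRTY")
            self.sm.run()


client = VolumeClient()
client.before_write()
print(client.superblock)  # {'state': 'dirty'}
client.sm.request("TO_CLEAN")  # e.g. after the idle-time flush succeeded
client.sm.run()
print(client.superblock)  # {'state': 'clean'}
```

In this sketch `before_write` runs the TO_DIRTY task synchronously, so the dirty mark is recorded before the new mapping is applied. The application does not specify this ordering; it is an assumption. The reason for it: a clean mark that outlives an unflushed write would send recovery down the fast path with stale forward metadata.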
  • the superblock at the head of the logical volume can also mark information such as grainsize.
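Below is one possible encoding of the superblock fields mentioned above (metadata state, B+ tree root node address, grainsize), written with Python's standard `struct` and `zlib` modules. The field order, field widths, magic value, and CRC-32 trailer are all assumptions. The application names the fields but not their on-disk format, and it does not say that a checksum is used.

```python
import struct
import zlib
from typing import NamedTuple

# Hypothetical little-endian layout: magic, state, root_addr, grain_size.
_FMT = "<4sBQI"
_MAGIC = b"LVSB"             # hypothetical magic number
STATE_DIRTY, STATE_CLEAN = 0, 1


class Superblock(NamedTuple):
    state: int
    root_addr: int
    grain_size: int          # corresponds to "grainsize" in the text


def pack_superblock(sb: Superblock) -> bytes:
    body = struct.pack(_FMT, _MAGIC, sb.state, sb.root_addr, sb.grain_size)
    # A CRC-32 trailer helps detect a corrupted or partially written superblock.
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_superblock(raw: bytes) -> Superblock:
    size = struct.calcsize(_FMT)
    body, (crc,) = raw[:size], struct.unpack("<I", raw[size:size + 4])
    if zlib.crc32(body) != crc:
        raise ValueError("superblock checksum mismatch")
    magic, state, root_addr, grain_size = struct.unpack(_FMT, body)
    if magic != _MAGIC:
        raise ValueError("not a logical-volume superblock")
    return Superblock(state, root_addr, grain_size)


raw = pack_superblock(Superblock(STATE_CLEAN, root_addr=0x10000, grain_size=8192))
print(len(raw))                # 21
print(unpack_superblock(raw))  # Superblock(state=1, root_addr=65536, grain_size=8192)
```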
  • it may also include:
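  • if the state of the metadata of the logical volume is dirty, the forward metadata must be reconstructed first, and the system is restored online after the forward metadata has been reconstructed.
  • reconstructing the forward metadata may include: reading the logical partition space of the logical volume on the physical disk, and reconstructing the forward metadata of the logical volume from the reverse metadata.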
  • Reverse metadata refers to the metadata of the mapping from physical block addresses to logical block addresses.
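The following is a minimal sketch of rebuilding forward metadata from reverse metadata. It assumes that each reverse record read from the logical partition space carries a write sequence number, so that the newest copy of a rewritten LBA wins. The record layout and `rebuild_forward` are hypothetical; the application leaves the reconstruction procedure to the prior art.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical reverse-metadata record: (pba, lba, seq), where seq is an
# assumed monotonically increasing write sequence number.
ReverseRecord = Tuple[int, int, int]


def rebuild_forward(records: Iterable[ReverseRecord]) -> Dict[int, int]:
    """Rebuild the L-P (forward) map from P-L (reverse) records."""
    newest: Dict[int, Tuple[int, int]] = {}  # lba -> (seq, pba)
    for pba, lba, seq in records:
        current = newest.get(lba)
        if current is None or seq > current[0]:
            newest[lba] = (seq, pba)  # a later write supersedes older copies
    return {lba: pba for lba, (_, pba) in newest.items()}


scan = [(100, 0, 1), (101, 1, 2), (102, 0, 3)]  # LBA 0 was rewritten at seq 3
print(rebuild_forward(scan))  # {0: 102, 1: 101}
```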
  • by marking the state of the metadata and restoring online according to that state, the recovery method provided in the above embodiment enables rapid recovery of the all-flash storage system in scenarios such as: an unplanned power loss of the system; an abnormal, unavailable cluster state caused by a software failure; non-volatile memory that cannot be saved because of a system software failure; non-volatile memory lost because of a system software failure; and non-volatile memory lost because of a system hardware failure.
  • the recovery method of the all-flash storage system includes: when the metadata of the logical volume is clean, marking the state of the metadata of the logical volume as a clean state; when the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; if the metadata of the logical volume is in a clean state, then the system is restored online.
  • the recovery method of the all-flash storage system will mark the metadata as a clean state when the metadata of the logical volume is clean, and after the all-flash storage system subsequently loses power and is powered on again, if the state of the metadata of the logical volume is in a clean state, the system will be directly restored online without the need to reconstruct the forward metadata, thereby enabling the all-flash storage system to recover quickly after a power failure, shortening the repair time, and improving the availability, reliability, and security of the entire storage system.
  • the present application also provides a recovery device for an all-flash storage system, and the device described below can be referred to in correspondence with the method described above.
  • the status marking module 10 is configured to mark the metadata of the logical volume as being in a clean state when the metadata of the logical volume is clean;
  • the status reading module 20 is configured to read the status of the metadata of the logical volume after the all-flash storage system is powered on again;
  • the recovery module 30 is configured to recover the logical volume if the metadata of the logical volume is in a clean state.
  • the status of the metadata of the logical volume is marked as clean.
  • the status of the metadata of the logical volume is read first. If the status of the metadata of the read logical volume is clean, it is directly restored online without the need to reconstruct the forward metadata.
  • the state marking module 10 is configured as follows:
  • the condition for the metadata of the logical volume to be clean is that within a preset timing period, no IO is issued to the logical volume and the metadata of the logical volume is flushed. If the above conditions are met, the metadata of the logical volume is clean. Otherwise, the metadata of the logical volume is dirty.
  • the state marking module 10 is configured as follows:
  • the clean status of the metadata of the logical volume is marked in the logical volume.
  • when the status of the metadata of the logical volume needs to be read, it can be read from the logical volume.
  • the state marking module 10 is configured as follows:
  • the state of the metadata of the logical volume is marked as clean in the super block at the head of the logical volume.
  • the area used to mark the clean state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is clean, the state of the metadata of the logical volume is marked as clean in the superblock of the logical volume.
  • the state marking module 10 is configured as follows:
  • the address marking module is configured to write the root node address of the tree structure where the metadata is located into the super block at the head of the logical volume when the metadata of the logical volume is clean.
  • the metadata of the logical volume is organized using a tree structure.
  • writing a new L-P mapping relationship (that is, the mapping from Logical Block Address to Physical Block Address) makes at least one node of the tree dirty, so the entire tree is in a dirty state.
  • the metadata of the logical volume can be marked as clean in the superblock of the logical volume, and the root node address of the tree can be written at the same time.
  • the address marking module is configured as follows:
  • the root node address of the B+ tree where the metadata is located is written into the super block at the head of the logical volume.
  • a B+ tree index has a search time complexity of O(log n) and a space utilization rate of 75% (non-leaf nodes serve only as index nodes and do not store data).
  • a B+ tree search starts at the root node and descends level by level until it reaches a leaf node. Non-leaf nodes therefore lie on the path of every query and are the most frequently accessed nodes, and the closer a node is to the root, the more often it is accessed. Upper-level non-leaf nodes should therefore be kept in memory where possible.
  • because of its search efficiency, the B+ tree is well suited to organizing metadata objects. The tree in this embodiment is therefore a B+ tree, which supports efficient lookup of metadata objects within the all-flash storage system.
  • when the metadata of the logical volume is in the clean state, every new IO write inserts a new L-P mapping relationship into the B+ tree, which makes at least one node of the B+ tree dirty; the entire B+ tree is then in the dirty state. When no IO is issued to the logical volume within a timing cycle and all dirty metadata has been flushed, the entire B+ tree is in the clean state. At this point the metadata of the logical volume can be marked as clean in the superblock of the logical volume, and the root node address of the B+ tree can be written at the same time.
  • the address reading module is configured to read the root node address
  • the metadata access module is configured to access the forward metadata of the logical volume according to the root node address.
  • after the all-flash storage system is powered on again, if the metadata of the logical volume is in the clean state, the system can be restored online directly, and the forward metadata of the logical volume can be accessed through the root node address.
  • the status marking module 10 is further configured to:
  • when the metadata of the logical volume is clean, its state is marked as clean.
  • when the metadata of the logical volume is dirty, its state is likewise marked as dirty.
  • when the metadata of the logical volume is clean, every new IO write generates a new L-P mapping relationship, and the metadata of the logical volume becomes dirty; its state is then marked as dirty. If the all-flash storage system later loses power, reading this mark after power is restored shows whether the metadata of the logical volume was dirty before the power failure.
  • the state marking module 10 is configured as follows:
  • the dirty state of the metadata of the logical volume is also marked in the logical volume.
  • the logical volume can be read to know whether the metadata of the logical volume is in the clean state or the dirty state.
  • the state marking module 10 is configured as follows:
  • the state of the metadata of the logical volume is marked as dirty in the super block at the head of the logical volume.
  • the area used to mark the dirty state of the metadata of the logical volume is the superblock at the head of the logical volume. If the state of the metadata of the logical volume is dirty, the state of the metadata of the logical volume is marked as dirty in the superblock of the logical volume.
  • the metadata reconstruction module is configured to reconstruct the forward metadata if the state of the metadata of the logical volume is dirty, and resume online after reconstructing the forward metadata.
  • the state of the metadata of the logical volume is marked as dirty. If the state read after power-on is dirty, the forward metadata must be reconstructed first, and the system is restored online after the reconstruction.
  • the metadata reconstruction module is configured as follows:
  • Reverse metadata refers to metadata that maps physical block addresses to logical block addresses. After the all-flash storage system is powered on again, if the metadata of the logical volume is in a dirty state, the logical partition space of the logical volume on the physical disk is first read, and the forward metadata of the logical volume is reconstructed through reverse metadata. The implementation process of reconstructing forward metadata through reverse metadata is not described in detail in this application, and reference may be made to the prior art.
  • the recovery device of the all-flash storage system will mark the status of the metadata as clean when the metadata of the logical volume is clean. After the all-flash storage system subsequently loses power and is restored to power, if the status of the metadata of the logical volume is clean, it will be directly restored online without the need to reconstruct the forward metadata, thereby enabling the all-flash storage system to quickly recover after a power failure, shortening the repair time, and improving the availability, reliability, and security of the entire storage system.
  • the present application also provides recovery equipment for an all-flash storage system.
  • the equipment includes a memory 1 and a processor 2.
  • a memory 1 configured to store a computer program
  • Processor 2 is configured to execute a computer program to implement the following steps:
  • the state of the metadata of the logical volume is marked as clean; when the all-flash storage system is powered on again, the state of the metadata of the logical volume is read; if the metadata of the logical volume is clean, the forward metadata of the logical volume is accessed.
  • the recovery equipment of the all-flash storage system marks the state of the metadata as clean when the metadata of the logical volume is clean. After the all-flash storage system subsequently loses power and is powered on again, if the state of the metadata of the logical volume is clean, the system is restored online directly without reconstructing the forward metadata. This enables the all-flash storage system to recover quickly after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.
  • the present application also provides a computer non-volatile readable storage medium, on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented:
  • the state of the metadata of the logical volume is marked as clean; when the all-flash storage system is powered on again, the state of the metadata of the logical volume is read; if the metadata of the logical volume is clean, the forward metadata of the logical volume is accessed.
  • the computer non-volatile readable storage medium may include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
  • the computer non-volatile readable storage medium provided in the present application marks the state of the metadata as clean when the metadata of the logical volume is clean. After the all-flash storage system loses power and is powered on again, if the state of the metadata of the logical volume is clean, the system is restored online directly without reconstructing the forward metadata. This enables rapid recovery of the all-flash storage system after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.
  • the steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly using hardware, a software module executed by a processor, or a combination of the two.
  • the software module may be placed in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

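The present application discloses a recovery method for an all-flash storage system, relating to the field of storage technology and comprising: when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as the clean state; after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; and if the metadata of the logical volume is in the clean state, accessing the forward metadata of the logical volume. The method enables the all-flash storage system to recover quickly after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system. The present application further discloses a recovery apparatus and device for an all-flash storage system and a computer non-volatile readable storage medium, all of which have the above technical effects.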

Description

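Recovery method for an all-flash storage system and related apparatus
Cross-Reference to Related Application
This application claims priority to Chinese patent application No. 202211231242.2, entitled "Recovery method for an all-flash storage system and related apparatus", filed with the China Patent Office on October 10, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of storage technology, and in particular to a recovery method for an all-flash storage system; it further relates to a recovery apparatus and device for an all-flash storage system and a computer non-volatile readable storage medium.
Background
Metadata is data that describes data. Metadata management is critical in an all-flash storage system; it mainly manages the L-P mapping (from Logical Block Address to Physical Block Address), the P-L mapping (from Physical Block Address to Logical Block Address), and so on. Because an all-flash storage system handles a large volume of highly concurrent, low-latency data accesses, its metadata is usually organized in a tree data structure. Because memory capacity is limited, most of the metadata must be persisted, which involves flushing to disk and allocating on-disk metadata space. A software or hardware fault in which non-volatile memory is lost on power failure causes storage node failures, which in turn make the all-flash storage system unavailable until it is repaired and can resume serving requests. The repair time of the all-flash storage system determines how long customer services are interrupted, and it also reflects the availability, reliability, and security of the entire storage system.
Therefore, how to shorten the repair time and improve the availability, reliability, and security of the entire storage system has become a technical problem that those skilled in the art urgently need to solve.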
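Summary
The purpose of the present application is to provide a recovery method for an all-flash storage system that enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system. Another purpose of the present application is to provide a recovery apparatus and device for an all-flash storage system and a computer non-volatile readable storage medium, all of which have the above technical effects.
To solve the above technical problem, the present application provides a recovery method for an all-flash storage system, comprising:
when the metadata of a logical volume is clean (blank), marking the state of the metadata of the logical volume as the clean state;
after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume;
if the metadata of the logical volume is in the clean state, coming back online.
Optionally, marking the state of the metadata of the logical volume as the clean state when the metadata of the logical volume is clean comprises:
when, within a preset timer period, no IO (Input/Output) has been issued to the logical volume and the metadata of the logical volume has been fully flushed, marking the state of the metadata of the logical volume as the clean state.
Optionally, marking the state of the metadata of the logical volume as the clean state when, within a preset timer period, no IO has been issued to the logical volume and its metadata has been fully flushed comprises:
starting an idle-time flush task, and determining whether the metadata of the logical volume has been fully flushed;
if the metadata of the logical volume has been fully flushed, issuing a request to change the metadata state to clean, so that the state machine controller triggers the state machine to run and initiates a to-clean task;
executing the to-clean task to mark the state of the metadata of the logical volume as the clean state.
Optionally, coming back online if the metadata of the logical volume is in the clean state comprises:
if the metadata of the logical volume is in the clean state, determining that forward metadata exists on the hard disk, wherein the forward metadata is metadata indicating the mapping from logical block addresses to physical block addresses;
reading the forward metadata from the hard disk into memory, and coming back online.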
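Optionally, marking the state of the metadata of the logical volume as the clean state comprises:
marking, in the logical volume, the state of the metadata of the logical volume as the clean state.
Optionally, marking, in the logical volume, the state of the metadata of the logical volume as the clean state comprises:
marking, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the clean state.
Optionally, marking the state of the metadata of the logical volume as the clean state comprises:
marking the metadata of the logical volume as the clean state at a location other than the logical volume.
Optionally, the method further comprises:
when the metadata of the logical volume is clean, writing the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume.
Optionally, writing the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume comprises:
writing the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume.
Optionally, writing the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume comprises:
when no IO has been issued to the logical volume within the timer period and all dirty metadata has been flushed, determining that all nodes of the B+ tree are in the clean state, wherein, when an IO write occurs, the corresponding node of the B+ tree becomes dirty;
marking, in the superblock of the logical volume, the metadata of the logical volume as the clean state, and simultaneously writing the root node address of the B+ tree.
Optionally, the method further comprises:
reading the root node address;
accessing the forward metadata of the logical volume according to the root node address.
Optionally, the method further comprises:
when the metadata of the logical volume is dirty (written), marking the state of the metadata of the logical volume as the dirty state.
Optionally, marking the state of the metadata of the logical volume as the dirty state comprises:
marking, in the logical volume, the state of the metadata of the logical volume as the dirty state.
Optionally, marking, in the logical volume, the state of the metadata of the logical volume as the dirty state comprises:
marking, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the dirty state.
Optionally, marking the state of the metadata of the logical volume as the dirty state when the metadata of the logical volume is dirty comprises:
when IO is issued to the logical volume, determining whether the state of the metadata of the logical volume is the clean state;
if the state of the metadata of the logical volume is the clean state, issuing a request to change the metadata state to dirty, so that the state machine controller triggers the state machine to run and initiates a to-dirty task;
executing the to-dirty task to mark the state of the metadata of the logical volume as the dirty state.
Optionally, the method further comprises:
if the state of the metadata of the logical volume is the dirty state, rebuilding the forward metadata, and coming back online after the forward metadata has been rebuilt.
Optionally, rebuilding the forward metadata comprises:
reading the logically partitioned space of the logical volume on the physical disk, and rebuilding the forward metadata from the reverse metadata.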
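To solve the above technical problem, the present application further provides a recovery apparatus for an all-flash storage system, comprising:
a state marking module, configured to mark the metadata of a logical volume as the clean state when the metadata of the logical volume is clean;
a state reading module, configured to read the state of the metadata of the logical volume after the all-flash storage system is powered on again;
an online recovery module, configured to come back online if the state of the metadata of the logical volume is the clean state.
To solve the above technical problem, the present application further provides a recovery device for an all-flash storage system, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any one of the above recovery methods for an all-flash storage system when executing the computer program.
To solve the above technical problem, the present application further provides a computer non-volatile readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above recovery methods for an all-flash storage system.
The recovery method for an all-flash storage system provided by the present application comprises: when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as the clean state; after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; and if the metadata of the logical volume is in the clean state, coming back online.
It can be seen that, with this method, the metadata state is marked clean whenever the metadata of the logical volume is clean. If the all-flash storage system later suffers a power failure and is powered on again, and the state of the metadata of the logical volume is clean, the system comes back online directly without rebuilding the forward metadata. This enables the all-flash storage system to recover quickly after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.
The recovery apparatus, device, and computer non-volatile readable storage medium for an all-flash storage system provided by the present application all have the above technical effects.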
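Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a recovery method for an all-flash storage system according to an embodiment of the present application;
FIG. 2 is a TO_CLEAN flowchart according to an embodiment of the present application;
FIG. 3 is a TO_DIRTY flowchart according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a recovery apparatus for an all-flash storage system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a recovery device for an all-flash storage system according to an embodiment of the present application.
Detailed Description
The core of the present application is to provide a recovery method for an all-flash storage system that enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system. Another core of the present application is to provide a recovery apparatus and device for an all-flash storage system and a computer non-volatile readable storage medium, all of which have the above technical effects.
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
In conventional solutions, the state of the metadata of a logical volume is not marked while the all-flash storage system is operating normally. When a power failure occurs, the forward metadata is always rebuilt first after power is restored, regardless of the state of the metadata before the failure, which makes recovery slow. To overcome this drawback, the present application provides a recovery method for an all-flash storage system that enables rapid recovery after a power failure and shortens the repair time.
Please refer to FIG. 1, which is a schematic flowchart of a recovery method for an all-flash storage system according to an embodiment of the present application. As shown in FIG. 1, the method comprises: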
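S101: when the metadata of the logical volume is clean, marking the state of the metadata of the logical volume as the clean state;
Clean means that the metadata of the logical volume has been fully flushed. If the metadata has not been fully flushed, that is, unflushed metadata remains in memory, the metadata of the logical volume is dirty. While the all-flash storage system is operating normally, if the metadata of the logical volume is clean, its state is marked as the clean state.
In some embodiments, marking the state of the metadata of the logical volume as the clean state when the metadata is clean may comprise: when, within a preset timer period, no IO has been issued to the logical volume and the metadata of the logical volume has been fully flushed, marking the state of the metadata of the logical volume as the clean state.
In this embodiment, the condition for the metadata of the logical volume to be clean is that, within the preset timer period, no IO has been issued to the logical volume and its metadata has been fully flushed. If this condition is met, the metadata of the logical volume is clean; otherwise, it is dirty.
While the metadata of the logical volume is clean, every new IO write produces a new L-P mapping, and the metadata of the logical volume becomes dirty. Once no IO has been issued to the logical volume for a preset timer period and all dirty metadata has been flushed, the metadata becomes clean again, and its state is then marked as the clean state. If the all-flash storage system later suffers a power failure, after power is restored the mark can be read to determine whether the metadata of the logical volume was clean before the failure.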
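The clean/dirty rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `VolumeActivity`, the use of a set of LBAs to stand for unflushed metadata, and the explicit `now` timestamps are all hypothetical. The 120-second default mirrors the 2-minute timer given for the TO_CLEAN flow.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VolumeActivity:
    """Hypothetical per-volume bookkeeping for the clean/dirty condition."""

    period_s: float = 120.0   # preset timer period (2 minutes in the TO_CLEAN example)
    last_io_ts: float = 0.0   # time at which the most recent IO was issued
    unflushed: Set[int] = field(default_factory=set)  # LBAs whose L-P entries exist only in memory

    def on_write(self, lba: int, now: float) -> None:
        # Every new write creates a new L-P mapping, so the metadata becomes dirty.
        self.last_io_ts = now
        self.unflushed.add(lba)

    def flush_all(self) -> None:
        # Flushing persists all in-memory metadata to disk.
        self.unflushed.clear()

    def is_clean(self, now: float) -> bool:
        # Clean = no IO issued during the last full period AND nothing left to flush.
        idle = (now - self.last_io_ts) >= self.period_s
        return idle and not self.unflushed


vol = VolumeActivity()
vol.on_write(lba=7, now=0.0)
print(vol.is_clean(now=130.0))  # False: idle long enough, but metadata not flushed
vol.flush_all()
print(vol.is_clean(now=130.0))  # True
print(vol.is_clean(now=60.0))   # False: still inside the timer period after the write
```

Both conditions are checked together because either one alone is insufficient: an idle volume may still hold unflushed metadata, and a fully flushed volume may be about to receive more writes.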
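It should be noted that the marked clean state must survive the power failure and recovery of the all-flash storage system, so that it can still be read correctly afterwards. Under this premise, the way the state is marked can vary. For example, the metadata of the logical volume may be marked as clean in the logical volume itself, or at a location other than the logical volume.
To make the state marking more targeted and the state easier to read, in some embodiments, marking the state of the metadata of the logical volume as the clean state comprises: marking, in the logical volume, the state of the metadata of the logical volume as the clean state.
In this embodiment, the clean state of the metadata of the logical volume is marked in the logical volume. When the state of the metadata of the logical volume needs to be read, it is simply read from the logical volume.
Marking, in the logical volume, the state of the metadata of the logical volume as the clean state may comprise:
marking, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the clean state.
In this embodiment, the region used to mark the clean state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is clean, its state is marked as the clean state in the superblock of the logical volume.
In addition, marking the state of the metadata of the logical volume as the clean state when, within a preset timer period, no IO has been issued to the logical volume and its metadata has been fully flushed may comprise:
starting an idle-time flush task, and determining whether the metadata of the logical volume has been fully flushed;
if the metadata of the logical volume has been fully flushed, issuing a request to change the metadata state to clean, so that the state machine controller triggers the state machine to run and initiates a to-clean task;
executing the to-clean task to mark the state of the metadata of the logical volume as the clean state.
Referring to the TO_CLEAN flow shown in FIG. 2, the client's timer is set to 2 minutes (other durations may also be used). The client starts the idle-time flush task and determines whether the metadata of the logical volume has been fully flushed. If so, the client requests that the metadata state change to clean. The state machine controller triggers the state machine to run and initiates the TO_CLEAN task, that is, the to-clean task. The client then executes the task and marks the state of the metadata of the logical volume as the clean state in the superblock of the logical volume.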
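A minimal sketch of the TO_CLEAN flow, assuming a single process in which the state machine controller simply runs queued transition tasks synchronously. `StateMachineController`, `VolumeClient`, `pending_flush`, and the per-tick flush `budget` are hypothetical names chosen for illustration; a real controller would likely be asynchronous and shared across volumes.

```python
from collections import deque
from typing import Callable, Deque


class StateMachineController:
    """Hypothetical controller: runs transition tasks requested by clients."""

    def __init__(self) -> None:
        self.queue: Deque[Callable[[], None]] = deque()

    def request(self, task: Callable[[], None]) -> None:
        # A request triggers the state machine, which then initiates the task.
        self.queue.append(task)
        while self.queue:
            self.queue.popleft()()


class VolumeClient:
    def __init__(self, controller: StateMachineController, pending_flush: int) -> None:
        self.controller = controller
        self.pending_flush = pending_flush   # metadata pages not yet on disk
        self.superblock_state = "dirty"

    def idle_flush_tick(self, budget: int) -> None:
        """Fired by the client's timer while the volume is idle; flushes up to `budget` pages."""
        self.pending_flush -= min(budget, self.pending_flush)
        if self.pending_flush == 0 and self.superblock_state != "clean":
            self.controller.request(self._to_clean_task)

    def _to_clean_task(self) -> None:
        # TO_CLEAN: record the clean state in the volume's superblock.
        self.superblock_state = "clean"


client = VolumeClient(StateMachineController(), pending_flush=5)
client.idle_flush_tick(budget=3)
print(client.superblock_state)  # dirty: 2 pages still unflushed
client.idle_flush_tick(budget=3)
print(client.superblock_state)  # clean
```

Routing the transition through a controller rather than writing the superblock inline keeps state changes serialized, which is one plausible reason for the client/controller split described in FIG. 2.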
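S102: after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume;
S103: if the state of the metadata of the logical volume is the clean state, accessing the forward metadata of the logical volume.
After the all-flash storage system suffers a power failure and is powered on again, the state of the metadata of the logical volume is read first. When the state is marked in the superblock of the logical volume, the state recorded there is read first. If the state read is clean, the forward metadata is present on disk and can be read directly into memory; the system can then come back online immediately without rebuilding the forward metadata. Forward metadata is metadata describing the mapping from logical block addresses to physical block addresses.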
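The power-on decision in S102 and S103 can be sketched as a single branch on the recorded state. This is an illustration under assumptions: `recover_on_power_up`, `load_forward`, and `rebuild_forward` are hypothetical names, the superblock is modeled as a plain mapping, and treating a missing or unreadable mark like a dirty one is a conservative choice of this sketch rather than something the text specifies.

```python
from typing import Any, Callable, Dict, Mapping, Tuple

ForwardMap = Dict[int, int]  # LBA -> PBA


def recover_on_power_up(
    superblock: Mapping[str, Any],
    load_forward: Callable[[Mapping[str, Any]], ForwardMap],
    rebuild_forward: Callable[[], ForwardMap],
) -> Tuple[str, ForwardMap]:
    """Choose the recovery path from the state recorded before the power failure."""
    if superblock.get("state") == "clean":
        # Forward metadata on disk is complete: read it into memory, no rebuild.
        return "load", load_forward(superblock)
    # Dirty, or no readable mark: fall back to the slow rebuild path.
    return "rebuild", rebuild_forward()


path, fwd = recover_on_power_up(
    {"state": "clean"},
    load_forward=lambda sb: {0: 100, 1: 101},
    rebuild_forward=lambda: {},
)
print(path, fwd)  # load {0: 100, 1: 101}
```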
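In some embodiments, the method may further comprise:
when the metadata of the logical volume is clean, writing the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume.
In this embodiment, the metadata of the logical volume is organized as a tree. While the metadata is in the clean state, every new IO write inserts a new L-P mapping (a Logical Block Address to Physical Block Address mapping) into the tree, which makes at least one node dirty; the whole tree is then dirty. Once no IO has been issued to the logical volume for a preset timer period and all dirty metadata has been flushed, the whole tree is clean; at this point the metadata of the logical volume can be marked clean in its superblock, and the root node address of the tree written at the same time.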
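How a single write dirties the whole tree, and how the clean mark and root address are recorded together, can be sketched with a toy two-level tree. Everything here is hypothetical: `TreeNode`, `MetadataTree`, `try_mark_clean`, the fixed `span` per leaf, and the made-up node addresses stand in for a real B+ tree and its on-disk allocator.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TreeNode:
    addr: int                                              # on-disk address (hypothetical)
    entries: Dict[int, int] = field(default_factory=dict)  # LBA -> PBA (leaves only)
    dirty: bool = False


class MetadataTree:
    """Toy two-level tree: one root over leaves that each cover `span` LBAs."""

    def __init__(self, span: int = 1024) -> None:
        self.span = span
        self.root = TreeNode(addr=0)
        self.leaves: Dict[int, TreeNode] = {}

    def insert(self, lba: int, pba: int) -> None:
        key = lba // self.span
        if key not in self.leaves:
            self.leaves[key] = TreeNode(addr=len(self.leaves) + 1)
        leaf = self.leaves[key]
        leaf.entries[lba] = pba
        # The new L-P mapping dirties the leaf and the root above it.
        leaf.dirty = True
        self.root.dirty = True

    def is_dirty(self) -> bool:
        # One dirty node is enough to make the whole tree dirty.
        return self.root.dirty or any(n.dirty for n in self.leaves.values())

    def flush(self) -> None:
        # Persisting every node clears all dirty flags.
        for node in [self.root, *self.leaves.values()]:
            node.dirty = False


def try_mark_clean(tree: MetadataTree, superblock: Dict[str, object], idle: bool) -> bool:
    """Mark clean and record the root address together, only when the whole tree is clean."""
    if idle and not tree.is_dirty():
        superblock.update(state="clean", root_addr=tree.root.addr)
        return True
    return False


tree = MetadataTree()
sb: Dict[str, object] = {"state": "dirty"}
tree.insert(5, 500)
print(try_mark_clean(tree, sb, idle=True))      # False: tree still dirty
tree.flush()
print(try_mark_clean(tree, sb, idle=True), sb)  # True {'state': 'clean', 'root_addr': 0}
```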
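Writing the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume comprises:
writing the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume.
A B+ tree index has O(log n) lookup time complexity and 75% space utilization (internal nodes serve only as index nodes and do not store data). A B+ tree lookup starts at the root and descends level by level to a leaf, so internal nodes are the key nodes on the query path and the most frequently accessed; the closer a node is to the root, the more often it is accessed, so upper-level internal nodes should be kept in memory as far as possible. Because the B+ tree offers better search efficiency and is well suited to organizing metadata objects, this embodiment uses a B+ tree to support efficient lookup of metadata objects inside the all-flash storage system.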
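The root-to-leaf descent described above can be sketched as follows. This shows only the read path of a generic B+ tree; insertion, node splits, and on-disk paging are omitted, and the `Leaf`/`Inner` classes and the convention that keys equal to a separator go to the right child are choices of this sketch, not details from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]     # sorted LBAs
    values: List[int]   # PBAs, parallel to keys


@dataclass
class Inner:
    keys: List[int]                        # sorted separator keys
    children: List[Union["Inner", Leaf]]   # len(children) == len(keys) + 1


def lookup(root: Union[Inner, Leaf], lba: int) -> Optional[int]:
    node = root
    # Descend through index-only internal nodes: keys >= separator go right.
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, lba)]
    # Only leaves hold LBA -> PBA entries.
    i = bisect_right(node.keys, lba) - 1
    if i >= 0 and node.keys[i] == lba:
        return node.values[i]
    return None


root = Inner(keys=[30], children=[Leaf([10, 20], [1000, 2000]), Leaf([30, 40], [3000, 4000])])
print(lookup(root, 30))  # 3000
print(lookup(root, 25))  # None
```

Every lookup touches the root and each internal level exactly once, which is why caching the upper levels in memory saves the most disk reads.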
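While the metadata of the logical volume is in the clean state, every new IO write inserts a new L-P mapping into the B+ tree, which makes at least one node of the B+ tree dirty; the whole B+ tree is then dirty. Once no IO has been issued to the logical volume for one timer period and all dirty metadata has been flushed, the whole B+ tree is clean; at this point the metadata of the logical volume can be marked clean in its superblock, and the root node address of the B+ tree written at the same time.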
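On the basis of organizing the metadata in a tree and recording the root node address of that tree, the method may further comprise:
reading the root node address;
accessing the forward metadata of the logical volume according to the root node address.
After the all-flash storage system is powered on again, if the metadata of the logical volume is in the clean state, the system can come back online directly and access the forward metadata of the logical volume through the root node address.
In some embodiments, the method may further comprise:
when the metadata of the logical volume is dirty, marking the state of the metadata of the logical volume as the dirty state.
In this embodiment, the state of the metadata of the logical volume is marked clean when the metadata is clean, and is also marked dirty when the metadata is dirty. While the state is clean, every new IO write produces a new L-P mapping and the metadata becomes dirty; the state is then marked as the dirty state. If the all-flash storage system later suffers a power failure, after power is restored the mark can be read to determine whether the metadata of the logical volume was dirty before the failure.
Likewise, the metadata of the logical volume may be marked dirty in the logical volume itself or at a location other than the logical volume.
To make the state marking more targeted and the state easier to read, marking the state of the metadata of the logical volume as the dirty state may comprise:
marking, in the logical volume, the state of the metadata of the logical volume as the dirty state.
In this embodiment, the dirty state of the metadata of the logical volume is also marked in the logical volume. When the state needs to be read, reading the logical volume reveals whether its metadata is clean or dirty.
Marking, in the logical volume, the state of the metadata of the logical volume as the dirty state may comprise: marking, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the dirty state.
In this embodiment, the region used to mark the dirty state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is dirty, its state is marked as the dirty state in the superblock of the logical volume.
In addition, marking the state of the metadata of the logical volume as the dirty state when the metadata is dirty may comprise:
when IO is issued to the logical volume, determining whether the state of the metadata of the logical volume is the clean state;
if the state of the metadata of the logical volume is the clean state, issuing a request to change the metadata state to dirty, so that the state machine controller triggers the state machine to run and initiates a to-dirty task;
executing the to-dirty task to mark the state of the metadata of the logical volume as the dirty state.
Referring to the TO_DIRTY flow shown in FIG. 3, when IO is issued, the client determines whether the metadata of the logical volume is in the clean state. If it is, the client requests that the metadata state change to dirty. The state machine controller triggers the state machine to run and initiates the client's TO_DIRTY task, that is, the to-dirty task. The client executes the task and marks the dirty state in the superblock of the logical volume.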
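A minimal sketch of the TO_DIRTY check on the IO path. The class, its callbacks, and the synchronous stand-in for the state machine controller are hypothetical. The sketch also assumes an ordering the text does not spell out: the dirty mark is persisted before the triggering write is processed, since otherwise a power loss right after the write could leave a stale clean mark and the recovery path would trust outdated forward metadata.

```python
from typing import Callable, List


class VolumeClient:
    """Hypothetical client showing the TO_DIRTY check on the IO path."""

    def __init__(self, persist_superblock: Callable[[str], None]) -> None:
        self.state = "clean"
        self.persist_superblock = persist_superblock
        self.tasks_run: List[str] = []

    def _to_dirty_task(self) -> None:
        # TO_DIRTY: persist the dirty mark in the superblock, then update memory.
        self.persist_superblock("dirty")
        self.state = "dirty"
        self.tasks_run.append("TO_DIRTY")

    def on_io(self, do_write: Callable[[], None]) -> None:
        # Only the first IO after a clean period triggers a transition;
        # later IOs see "dirty" and skip the superblock write entirely.
        if self.state == "clean":
            self._to_dirty_task()   # stand-in for: request -> controller -> task
        do_write()


log: List[str] = []
vc = VolumeClient(persist_superblock=log.append)
vc.on_io(lambda: log.append("write#1"))
vc.on_io(lambda: log.append("write#2"))
print(log)  # ['dirty', 'write#1', 'write#2']
```

Checking the in-memory state first means a burst of writes costs at most one superblock update.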
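Besides the state of the metadata and the root node address, the superblock at the head of the logical volume may also record other information such as grainsize.
In some embodiments, the method may further comprise:
if the state of the metadata of the logical volume is the dirty state, rebuilding the forward metadata.
When the metadata of the logical volume is marked as dirty and the state read from the mark is the dirty state, the forward metadata must first be rebuilt, and the system comes back online only after the rebuild.
Rebuilding the forward metadata may comprise:
reading the logically partitioned space of the logical volume on the physical disk, and rebuilding the forward metadata from the reverse metadata.
Reverse metadata is metadata describing the mapping from physical block addresses to logical block addresses. After the all-flash storage system is powered on again, if the state of the metadata of the logical volume is dirty, the logically partitioned space of the logical volume on the physical disk is read first, and the forward metadata of the logical volume is rebuilt from the reverse metadata. The process of rebuilding forward metadata from reverse metadata is not described in detail here; reference may be made to the prior art.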
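The text defers the rebuild procedure to the prior art, so the following is only a generic sketch of inverting P-L records into an L-P map. It assumes each reverse-metadata record carries a monotonically increasing write sequence number `seq`, so that when an LBA was overwritten and several physical blocks still claim it, the newest copy wins. That field and the `rebuild_forward` name are hypothetical; real systems may instead use validity bitmaps or other mechanisms.

```python
from typing import Dict, Iterable, Tuple

# (pba, lba, seq): reverse-metadata record for one physical block.
# `seq` is a hypothetical write sequence number used to pick the newest copy.
ReverseEntry = Tuple[int, int, int]


def rebuild_forward(reverse: Iterable[ReverseEntry]) -> Dict[int, int]:
    """Invert P->L records into an L->P (forward) map, keeping the newest write per LBA."""
    forward: Dict[int, int] = {}
    newest_seq: Dict[int, int] = {}
    for pba, lba, seq in reverse:
        if lba not in newest_seq or seq > newest_seq[lba]:
            newest_seq[lba] = seq
            forward[lba] = pba
    return forward


# LBA 0 was written twice (PBA 100, then PBA 102); the older copy is discarded.
print(rebuild_forward([(100, 0, 1), (101, 1, 2), (102, 0, 3)]))  # {0: 102, 1: 101}
```

The rebuild must scan every reverse record in the volume's space, which is why skipping it when the superblock says clean shortens recovery.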
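The recovery process of an all-flash storage system after a power failure is illustrated below with an optional embodiment:
While the metadata of the logical volume is in the clean state, every new IO write inserts a new L-P mapping into the B+ tree, making at least one node of the B+ tree dirty; the whole B+ tree is then dirty, and the dirty state is marked in the superblock of the logical volume;
once no IO has been issued to the logical volume for one timer period and all dirty metadata has been flushed, the whole B+ tree is clean; the clean state is then marked in the superblock of the logical volume, and the root node address of the B+ tree is written at the same time.
In fault scenarios such as a power failure that loses non-volatile memory, when the all-flash storage system is powered on again it first checks whether the superblock is marked clean or dirty.
If it is marked clean, the system can come back online immediately, obtain the root node address, and access all forward metadata of the logical volume through that root node address.
If it is marked dirty, the logically partitioned space of the logical volume on the physical disk must first be read and the forward metadata of the volume rebuilt from the reverse metadata before the system comes back online.
By marking the metadata state and coming back online according to that state, the recovery method of the above embodiments enables rapid recovery of the all-flash storage system in scenarios such as: an unplanned power loss; an abnormal or unavailable cluster state caused by a software fault; a software fault that prevents non-volatile memory from being saved; a software fault that causes non-volatile memory to be lost; and a hardware fault that causes non-volatile memory to be lost.
In summary, the recovery method for an all-flash storage system provided by the present application comprises: when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as the clean state; after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; and if the metadata of the logical volume is in the clean state, coming back online. With this method, the metadata is marked clean whenever the metadata of the logical volume is clean; if the all-flash storage system later suffers a power failure and is powered on again, and the state of the metadata of the logical volume is clean, the system comes back online directly without rebuilding the forward metadata. This enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.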
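The present application further provides a recovery apparatus for an all-flash storage system; the apparatus described below and the method described above may be cross-referenced. Please refer to FIG. 4, which is a schematic diagram of a recovery apparatus for an all-flash storage system according to an embodiment of the present application. As shown in FIG. 4, the apparatus comprises:
a state marking module 10, configured to mark the metadata of the logical volume as the clean state when the metadata of the logical volume is clean;
a state reading module 20, configured to read the state of the metadata of the logical volume after the all-flash storage system is powered on again;
an online recovery module 30, configured to come back online if the state of the metadata of the logical volume is the clean state.
While the all-flash storage system is operating normally, the state of the metadata of the logical volume is marked as clean whenever the metadata is clean. After the all-flash storage system suffers a power failure and is powered on again, the state of the metadata of the logical volume is read first. If it is clean, the system comes back online directly without rebuilding the forward metadata.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
when, within a preset timer period, no IO has been issued to the logical volume and the metadata of the logical volume has been fully flushed, mark the state of the metadata of the logical volume as the clean state.
In this embodiment, the condition for the metadata of the logical volume to be clean is that, within the preset timer period, no IO has been issued to the logical volume and its metadata has been fully flushed. If this condition is met, the metadata of the logical volume is clean; otherwise, it is dirty.
While the metadata of the logical volume is clean, every new IO write produces a new L-P mapping, and the metadata of the logical volume becomes dirty. Once no IO has been issued to the logical volume for a preset timer period and all dirty metadata has been flushed, the metadata becomes clean again, and its state is then marked as the clean state. If the all-flash storage system later suffers a power failure, after power is restored the mark can be read to determine whether the metadata of the logical volume was clean before the failure.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
mark, in the logical volume, the state of the metadata of the logical volume as the clean state.
To make the state marking more targeted and the state easier to read, in this embodiment the clean state of the metadata of the logical volume is marked in the logical volume. When the state needs to be read, it is simply read from the logical volume.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
mark, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the clean state.
In this embodiment, the region used to mark the clean state of the metadata of the logical volume is the superblock at the head of the logical volume. If the metadata of the logical volume is clean, its state is marked as the clean state in the superblock of the logical volume.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
start an idle-time flush task, and determine whether the metadata of the logical volume has been fully flushed;
if the metadata of the logical volume has been fully flushed, issue a request to change the metadata state to clean, so that the state machine controller triggers the state machine to run and initiates a to-clean task;
execute the to-clean task to mark the state of the metadata of the logical volume as the clean state.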
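On the basis of the above embodiment, as an optional implementation, the apparatus further comprises:
an address marking module, configured to write, when the metadata of the logical volume is clean, the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume.
In this embodiment, the metadata of the logical volume is organized as a tree. While the metadata is clean, every new IO write inserts a new L-P mapping (a Logical Block Address to Physical Block Address mapping) into the tree, which makes at least one node dirty; the whole tree is then dirty. Once no IO has been issued to the logical volume for a preset timer period and all dirty metadata has been flushed, the whole tree is clean; at this point the metadata of the logical volume can be marked clean in its superblock, and the root node address of the tree written at the same time.
On the basis of the above embodiment, as an optional implementation, the address marking module is configured to:
write the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume.
A B+ tree index has O(log n) lookup time complexity and 75% space utilization (internal nodes serve only as index nodes and do not store data). A B+ tree lookup starts at the root and descends level by level to a leaf, so internal nodes are the key nodes on the query path and the most frequently accessed; the closer a node is to the root, the more often it is accessed, so upper-level internal nodes should be kept in memory as far as possible. Because the B+ tree offers better search efficiency and is well suited to organizing metadata objects, this embodiment uses a B+ tree to support efficient lookup of metadata objects inside the all-flash storage system.
While the metadata of the logical volume is in the clean state, every new IO write inserts a new L-P mapping into the B+ tree, which makes at least one node of the B+ tree dirty; the whole B+ tree is then dirty. Once no IO has been issued to the logical volume for one timer period and all dirty metadata has been flushed, the whole B+ tree is clean; at this point the metadata of the logical volume can be marked clean in its superblock, and the root node address of the B+ tree written at the same time.
On the basis of the above embodiment, as an optional implementation, the apparatus further comprises:
an address reading module, configured to read the root node address;
a metadata access module, configured to access the forward metadata of the logical volume according to the root node address.
After the all-flash storage system is powered on again, if the metadata of the logical volume is in the clean state, the system can come back online directly and access the forward metadata of the logical volume through the root node address.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is further configured to:
mark the state of the metadata of the logical volume as the dirty state when the metadata of the logical volume is dirty.
In this embodiment, the state of the metadata of the logical volume is marked clean when the metadata is clean, and is also marked dirty when the metadata is dirty. While the metadata is clean, every new IO write produces a new L-P mapping and the metadata becomes dirty; the state is then marked as the dirty state. If the all-flash storage system later suffers a power failure, after power is restored the mark can be read to determine whether the metadata of the logical volume was dirty before the failure.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
mark, in the logical volume, the state of the metadata of the logical volume as the dirty state.
In this embodiment, the dirty state of the metadata of the logical volume is also marked in the logical volume. When the state needs to be read, reading the logical volume reveals whether its metadata is clean or dirty.
On the basis of the above embodiment, as an optional implementation, the state marking module 10 is configured to:
mark, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the dirty state.
In this embodiment, the region used to mark the dirty state of the metadata of the logical volume is the superblock at the head of the logical volume. If the state of the metadata of the logical volume is dirty, it is marked as the dirty state in the superblock of the logical volume.
On the basis of the above embodiment, as an optional implementation, the apparatus further comprises:
a metadata rebuilding module, configured to rebuild the forward metadata if the state of the metadata of the logical volume is the dirty state, and to come back online after the forward metadata has been rebuilt.
When the metadata of the logical volume is marked as dirty and the state read from the mark is the dirty state, the forward metadata must first be rebuilt, and the system comes back online only after the rebuild.
On the basis of the above embodiment, as an optional implementation, the metadata rebuilding module is configured to:
read the logically partitioned space of the logical volume on the physical disk, and rebuild the forward metadata from the reverse metadata.
Reverse metadata is metadata describing the mapping from physical block addresses to logical block addresses. After the all-flash storage system is powered on again, if the state of the metadata of the logical volume is dirty, the logically partitioned space of the logical volume on the physical disk is read first, and the forward metadata of the logical volume is rebuilt from the reverse metadata. The process of rebuilding forward metadata from reverse metadata is not described in detail here; reference may be made to the prior art.
With the recovery apparatus for an all-flash storage system provided by the present application, the metadata state is marked clean whenever the metadata of the logical volume is clean. If the all-flash storage system later suffers a power failure and is powered on again, and the state of the metadata of the logical volume is clean, the system comes back online directly without rebuilding the forward metadata. This enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.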
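The present application further provides a recovery device for an all-flash storage system. As shown in FIG. 5, the device comprises a memory 1 and a processor 2.
The memory 1 is configured to store a computer program;
the processor 2 is configured to execute the computer program to implement the following steps:
when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as the clean state; after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; and if the metadata of the logical volume is in the clean state, accessing the forward metadata of the logical volume.
With the recovery device for an all-flash storage system provided by the present application, the metadata state is marked clean whenever the metadata of the logical volume is clean. If the all-flash storage system later suffers a power failure and is powered on again, and the state of the metadata of the logical volume is clean, the system comes back online directly without rebuilding the forward metadata. This enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.
For a description of the device provided by the present application, refer to the above method embodiments; it is not repeated here.
The present application further provides a computer non-volatile readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as the clean state; after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume; and if the metadata of the logical volume is in the clean state, accessing the forward metadata of the logical volume.
The computer non-volatile readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
With the computer non-volatile readable storage medium provided by the present application, the metadata is marked clean whenever the state of the metadata of the logical volume is clean. If the all-flash storage system later suffers a power failure and is powered on again, and the state of the metadata of the logical volume is clean, the system comes back online directly without rebuilding the forward metadata. This enables rapid recovery after a power failure, shortens the repair time, and improves the availability, reliability, and security of the entire storage system.
For a description of the computer non-volatile readable storage medium provided by the present application, refer to the above method embodiments; it is not repeated here.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the apparatus, device, and computer non-volatile readable storage medium disclosed in the embodiments correspond to the disclosed method, they are described relatively briefly; for relevant details, refer to the description of the method.
Professionals may further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in a random access memory (RAM), memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The recovery method, apparatus, and device for an all-flash storage system and the computer non-volatile readable storage medium provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present application.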

Claims (20)

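  1. A recovery method for an all-flash storage system, characterized by comprising:
    when the metadata of a logical volume is clean, marking the state of the metadata of the logical volume as a clean state;
    after the all-flash storage system is powered on again, reading the state of the metadata of the logical volume;
    if the metadata of the logical volume is in the clean state, coming back online.
  2. The recovery method for an all-flash storage system according to claim 1, characterized in that marking the state of the metadata of the logical volume as a clean state when the metadata of the logical volume is clean comprises:
    when, within a preset timer period, no IO has been issued to the logical volume and the metadata of the logical volume has been fully flushed, marking the state of the metadata of the logical volume as the clean state.
  3. The recovery method for an all-flash storage system according to claim 2, characterized in that marking the state of the metadata of the logical volume as the clean state when, within a preset timer period, no IO has been issued to the logical volume and the metadata of the logical volume has been fully flushed comprises:
    starting an idle-time flush task, and determining whether the metadata of the logical volume has been fully flushed;
    if the metadata of the logical volume has been fully flushed, issuing a request to change the metadata state to clean, so that a state machine controller triggers a state machine to run and initiates a to-clean task;
    executing the to-clean task to mark the state of the metadata of the logical volume as the clean state.
  4. The recovery method for an all-flash storage system according to claim 1, characterized in that coming back online if the metadata of the logical volume is in the clean state comprises:
    if the metadata of the logical volume is in the clean state, determining that forward metadata exists on a hard disk, wherein the forward metadata is metadata indicating the mapping from logical block addresses to physical block addresses;
    reading the forward metadata from the hard disk into memory, and coming back online.
  5. The recovery method for an all-flash storage system according to claim 1, characterized in that marking the state of the metadata of the logical volume as a clean state comprises:
    marking, in the logical volume, the state of the metadata of the logical volume as the clean state.
  6. The recovery method for an all-flash storage system according to claim 5, characterized in that marking, in the logical volume, the state of the metadata of the logical volume as the clean state comprises:
    marking, in a superblock at the head of the logical volume, the state of the metadata of the logical volume as the clean state.
  7. The recovery method for an all-flash storage system according to claim 1, characterized in that marking the state of the metadata of the logical volume as a clean state comprises:
    marking the metadata of the logical volume as the clean state at a location other than the logical volume.
  8. The recovery method for an all-flash storage system according to claim 1, characterized by further comprising:
    when the metadata of the logical volume is clean, writing the root node address of the tree structure holding the metadata into a superblock at the head of the logical volume.
  9. The recovery method for an all-flash storage system according to claim 6, characterized in that writing the root node address of the tree structure holding the metadata into the superblock at the head of the logical volume comprises:
    writing the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume.
  10. The recovery method for an all-flash storage system according to claim 9, characterized in that writing the root node address of the B+ tree holding the metadata into the superblock at the head of the logical volume comprises:
    when no IO has been issued to the logical volume within the timer period and all dirty metadata has been flushed, determining that all nodes of the B+ tree are in the clean state, wherein, when an IO write occurs, the corresponding node of the B+ tree becomes dirty;
    marking, in the superblock of the logical volume, the metadata of the logical volume as the clean state, and simultaneously writing the root node address of the B+ tree.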
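  11. The recovery method for an all-flash storage system according to claim 8, characterized by further comprising:
    reading the root node address;
    accessing the forward metadata of the logical volume according to the root node address.
  12. The recovery method for an all-flash storage system according to claim 1, characterized by further comprising:
    when the metadata of the logical volume is dirty, marking the state of the metadata of the logical volume as a dirty state.
  13. The recovery method for an all-flash storage system according to claim 12, characterized in that marking the state of the metadata of the logical volume as a dirty state comprises:
    marking, in the logical volume, the state of the metadata of the logical volume as the dirty state.
  14. The recovery method for an all-flash storage system according to claim 13, characterized in that marking, in the logical volume, the state of the metadata of the logical volume as the dirty state comprises:
    marking, in the superblock at the head of the logical volume, the state of the metadata of the logical volume as the dirty state.
  15. The recovery method for an all-flash storage system according to claim 11, characterized in that marking the state of the metadata of the logical volume as a dirty state when the metadata of the logical volume is dirty comprises:
    when IO is issued to the logical volume, determining whether the state of the metadata of the logical volume is the clean state;
    if the state of the metadata of the logical volume is the clean state, issuing a request to change the metadata state to dirty, so that the state machine controller triggers the state machine to run and initiates a to-dirty task;
    executing the to-dirty task to mark the state of the metadata of the logical volume as the dirty state.
  16. The recovery method for an all-flash storage system according to claim 12, characterized by further comprising:
    if the state of the metadata of the logical volume is the dirty state, rebuilding forward metadata, and coming back online after the forward metadata has been rebuilt.
  17. The recovery method for an all-flash storage system according to claim 16, characterized in that rebuilding the forward metadata comprises:
    reading the logically partitioned space of the logical volume on a physical disk, and rebuilding the forward metadata from reverse metadata.
  18. A recovery apparatus for an all-flash storage system, characterized by comprising:
    a state marking module, configured to mark the metadata of a logical volume as a clean state when the metadata of the logical volume is clean;
    a state reading module, configured to read the state of the metadata of the logical volume after the all-flash storage system is powered on again;
    an online recovery module, configured to come back online if the state of the metadata of the logical volume is the clean state.
  19. A recovery device for an all-flash storage system, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the recovery method for an all-flash storage system according to any one of claims 1 to 17 when executing the computer program.
  20. A computer non-volatile readable storage medium, characterized in that a computer program is stored on the computer non-volatile readable storage medium, and when the computer program is executed by a processor, the steps of the recovery method for an all-flash storage system according to any one of claims 1 to 17 are implemented.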
PCT/CN2023/081445 2022-10-10 2023-03-14 Recovery method for an all-flash storage system and related apparatus WO2024077863A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211231242.2 2022-10-10
CN202211231242.2A CN115309591B (zh) 2022-10-10 2022-10-10 Recovery method for an all-flash storage system and related apparatus

Publications (1)

Publication Number Publication Date
WO2024077863A1 true WO2024077863A1 (zh) 2024-04-18

Family

ID=83866576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081445 WO2024077863A1 (zh) 2022-10-10 2023-03-14 Recovery method for an all-flash storage system and related apparatus

Country Status (2)

Country Link
CN (1) CN115309591B (zh)
WO (1) WO2024077863A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309591B (zh) * 2022-10-10 2023-03-24 浪潮电子信息产业股份有限公司 Recovery method for an all-flash storage system and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532201A (zh) * 2019-08-23 2019-12-03 北京浪潮数据技术有限公司 Metadata processing method and apparatus
CN111581020A (zh) * 2020-04-22 2020-08-25 上海天玑科技股份有限公司 Method and apparatus for data recovery in a distributed block storage system
US20220129332A1 (en) * 2020-10-22 2022-04-28 Sap Se Handling of Metadata for Microservices Processing
CN114816266A (zh) * 2022-05-30 2022-07-29 苏州浪潮智能科技有限公司 Metadata repair method, system, storage medium, and device
CN115309591A (zh) * 2022-10-10 2022-11-08 浪潮电子信息产业股份有限公司 Recovery method for an all-flash storage system and related apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
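JP4282319B2 (ja) * 2002-12-13 2009-06-17 株式会社ルネサステクノロジ Semiconductor memory device
CN103761058B (zh) * 2014-01-23 2016-08-17 天津中科蓝鲸信息技术有限公司 Network storage system and method with a hybrid RAID1 and RAID4 structure
CN109445713A (zh) * 2018-11-09 2019-03-08 郑州云海信息技术有限公司 Method and system for recording the storage state of a metadata volume, and related components
CN110377529A (zh) * 2019-06-27 2019-10-25 苏州浪潮智能科技有限公司 Method, apparatus, and device for data management in an all-flash storage system
CN112214247B (zh) * 2019-07-12 2022-05-17 华为技术有限公司 System startup method and related device
CN110673791B (zh) * 2019-09-06 2022-07-22 苏州浪潮智能科技有限公司 Metadata flushing method, apparatus, device, and readable storage medium
CN111124283A (zh) * 2019-11-29 2020-05-08 浪潮(北京)电子信息产业有限公司 Storage space management method and system, electronic device, and storage medium
CN111752487B (zh) * 2020-06-18 2024-01-12 深圳大普微电子科技有限公司 Data recovery method and apparatus, and solid-state drive
CN112099999A (zh) * 2020-10-12 2020-12-18 苏州浪潮智能科技有限公司 Method and system for recovering metadata in a cluster structure of a storage system
CN112463079B (zh) * 2020-12-17 2023-12-22 北京浪潮数据技术有限公司 Data storage control method, apparatus, device, and readable storage medium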


Also Published As

Publication number Publication date
CN115309591A (zh) 2022-11-08
CN115309591B (zh) 2023-03-24

Similar Documents

Publication Publication Date Title
US10101930B2 (en) System and method for supporting atomic writes in a flash translation layer
US9348760B2 (en) System and method for efficient flash translation layer
US11301379B2 (en) Access request processing method and apparatus, and computer device
JP6026538B2 (ja) Non-volatile media journaling of verified data sets
US9298578B2 (en) Method and apparatus for power loss recovery in a flash memory-based SSD
US8448023B2 (en) Approach for data integrity in an embedded device environment
CN104881371A (zh) Persistent memory transaction processing cache management method and apparatus
CN103577121A (zh) Highly reliable linear file access method based on NAND flash
US9785438B1 (en) Media cache cleaning based on workload
US11030092B2 (en) Access request processing method and apparatus, and computer system
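WO2012083754A1 (zh) Method and apparatus for processing dirty data
WO2024077863A1 (zh) Recovery method for an all-flash storage system and related apparatus
CN109213690A (zh) L2P table rebuilding method and related apparatus
JP2007188497A (ja) Log information management system and method for transaction processing
CN110502523A (zh) Service data storage method and apparatus, server, and computer-readable storage medium
CN106469123A (zh) NVDIMM-based write cache allocation and release method and apparatus
TW202134882A (zh) Data synchronization method
CN107402819A (zh) Client cache management method and system
CN116755625A (zh) Data processing method, apparatus, device, and readable storage medium
CN103761156A (zh) Online repair method for a file system
WO2022166265A1 (zh) Data recovery method, apparatus, device, and medium
CN111813603B (zh) Thin volume metadata backup method, apparatus, device, and readable storage medium
CN100435118C (zh) Cache data write-back method
CN111602121B (zh) Bit-accurate tracking analysis using applied memory region lifetimes
CN116431067A (zh) Disk replacement method, apparatus, and medium for a distributed storage system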

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876076

Country of ref document: EP

Kind code of ref document: A1