WO2014123372A1 - Flash translation layer design framework for provable and accurate error recovery - Google Patents


Info

Publication number
WO2014123372A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing module
block
log
data
information
Application number
PCT/KR2014/001028
Other languages
French (fr)
Korean (ko)
Inventor
민상렬
남이현
이수관
윤진혁
성윤제
김홍석
최진용
박정수
Original Assignee
서울대학교 산학협력단
Priority to KR10-2013-0013659 (KR20130013659)
Application filed by 서울대학교 산학협력단
Priority to KR10-2014-0013590 (KR1020140013590A; related publication KR101526110B1)
Publication of WO2014123372A1

Classifications

    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery

Abstract

A flash translation layer design framework for a flash memory is disclosed. A flash translation layer structure, according to one embodiment, comprises: a first log for processing data; a second log for processing mapping information; and a third log for processing checkpoint information, wherein the first and second logs can recover from errors by using the checkpoint information.

Description

Flash translation layer design framework for provable and accurate error recovery

The embodiments below relate to a flash translation layer design framework for flash memory.

Unlike an HDD, flash memory cannot be updated in place; its basic operations are reading and programming in units of pages and erasing in units of blocks, it allows bad blocks, and it has a limited lifetime. Therefore, to build a high-performance, high-reliability storage device with flash memory as the storage medium, the advantages of flash memory must be exploited effectively and its limitations overcome, and this role is mainly played by the flash translation layer (FTL). The FTL introduces a mapping between logical sector addresses and flash memory physical addresses to overcome the inability of flash memory to update in place, thereby presenting the host system with a block storage device that appears to support in-place updates. The FTL also uses the mapping to keep bad blocks that arise during operation from being used again, and to perform wear leveling so that particular physical blocks are not erased excessively.

Recently, as flash memory processes have been miniaturized, reliability characteristics have deteriorated, so the role of the FTL in ensuring the reliability of flash storage systems is growing. In particular, the probability of bit inversion errors, in which correctly written data is corrupted after some time, is increasing, and the number of erase/program cycles allowed per block is gradually decreasing. In addition, newly identified error situations related to power failures are emerging as important issues that threaten the reliability of flash memory storage devices.

A block or page on which an operation (program/erase) was executing at the moment of a power failure may be left with a residual effect that the FTL did not intend, and if a power failure occurs while a page of MLC flash memory is being programmed, data in its sibling page that had previously been written successfully can be lost as well. Moreover, it is difficult to identify the block in which the power failure occurred during the recovery process, and another power failure may overlap the recovery itself. To overcome these power failure recovery problems and provide a reliable storage system, the FTL must always take recoverability into account during normal operation, and the recovery process must remove the physical abnormalities of the memory and restore the logical consistency of the data.

To overcome the limitations of existing power failure recovery methods, which are incomplete and FTL dependent, the embodiments propose the HIL (Hierarchically Interacting a set of Logs) framework, which enables complete, systematic, and formally verifiable power failure recovery and allows a variety of FTLs to be developed compositionally.

The HIL framework provides the log as the building block for the design and implementation of an FTL. A log provides a linear address space that can be updated in place for persistently recording data, and acts as a container that partitions the host data and FTL metadata that make up the storage system. FTL developers can design and implement any of a variety of FTLs by combining logs.

The power failure recovery technique provided by the HIL framework guarantees complete recovery even when power failures occur at arbitrary times, taking into account residual effects, the sibling page problem, and overlapping power failures. Because the HIL framework is oriented toward design for provability, the correctness of its power failure recovery can be formally proven.

Each log that makes up the HIL framework is implemented as a separate thread, and each thread generates its own stream of flash memory operations, so the FTL naturally exploits both thread-level parallelism and flash-level parallelism. Each log thread can run independently and in parallel, except for a synchronization interface used to guarantee data coherence and recoverability.

According to one or more exemplary embodiments, a flash translation layer structure includes a first processing module configured to process data; a second processing module configured to process mapping information of the data; and a third processing module configured to process checkpoint information including information on an uninterpreted block of the first processing module and information on an uninterpreted block of the second processing module. The first processing module may recover from an error using the uninterpreted block of the first processing module, and the second processing module may recover from an error using the uninterpreted block of the second processing module.

In this case, for error recovery, the first processing module may detect an error page, copy the valid pages of the error block containing the error page to the uninterpreted block of the first processing module, and, when the copy is complete, logically swap the error block and the uninterpreted block of the first processing module.

The checkpoint information of the first processing module may further include a block write list of the first processing module, and the first processing module may detect the error page using the block write list.

The checkpoint information of the first processing module may further include a block write list of the first processing module and a write pointer of the first processing module, and the first processing module may detect the error page by checking whether each page is in error along the block write list of the first processing module, starting from the page indicated by the write pointer of the first processing module.

In addition, the first processing module may transmit checkpoint information updated by the logical swap between the error block and the uninterpreted block of the first processing module to the third processing module.

In addition, after recovering from an error using the uninterpreted block of the first processing module, the first processing module may obtain the mapping information of the data from the pages where the data is stored and transmit the mapping information of the data to the second processing module.

The checkpoint information of the first processing module may further include a block write list of the first processing module, a write pointer of the first processing module, and a reproduction pointer of the first processing module, and the first processing module may obtain mapping information from the page indicated by the reproduction pointer of the first processing module up to the page indicated by the write pointer of the first processing module along the block write list of the first processing module, and transmit the mapping information to the second processing module.

The first processing module may store data corresponding to a write command in a cache when the write command is received, and may determine whether data corresponding to a read command exists in the cache when the read command is received.

The first processing module may store the data and the mapping information of the data in the same page of a flash memory. The first processing module may store the data in a flash memory and then transmit mapping information of the data to the second processing module.

The first processing module may store the data in a flash memory in units of pages and, when the data in units of pages is stored, advance the write pointer along the block write list.

The first processing module may transmit a persistence request signal to the second processing module when the write pointer crosses a block boundary; in response to the persistence request signal, the second processing module may store the mapping information of the block corresponding to the persistence request signal in the flash memory; and the first processing module may advance the reproduction pointer along the block write list when it receives a persistence completion signal from the second processing module.
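As an illustration only, the following minimal C sketch outlines this write-pointer / reproduction-pointer handshake between a lower log and its upper mapping log; all identifiers (send_persistence_request, log_pointers, and so on) are assumptions introduced here and are not the publication's API.

    /* Minimal sketch of the persistence handshake (assumed names). */
    #include <stddef.h>

    typedef struct {
        size_t write_ptr;        /* next page to program, in block write list order */
        size_t replay_ptr;       /* reproduction pointer: first page whose mapping   */
                                 /* may not yet be persistent in the upper log       */
        size_t pages_per_block;
    } log_pointers;

    /* Hypothetical upcall: ask the upper mapping log to persist the mapping
     * information covering one completed block of this log.                  */
    extern void send_persistence_request(size_t completed_block);

    /* Invoked after each page-unit write completes in this log. */
    static void after_page_write(log_pointers *p)
    {
        p->write_ptr++;
        if (p->write_ptr % p->pages_per_block == 0)        /* crossed a block boundary */
            send_persistence_request(p->write_ptr / p->pages_per_block - 1);
    }

    /* Invoked when the upper log signals persistence completion for that block. */
    static void on_persistence_complete(log_pointers *p, size_t completed_block)
    {
        p->replay_ptr = (completed_block + 1) * p->pages_per_block;
    }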

In addition, for error recovery, the second processing module may detect an error page, copy the valid pages of the error block containing the error page to the uninterpreted block of the second processing module, and, when the copy is complete, logically swap the error block and the uninterpreted block of the second processing module.

The checkpoint information of the second processing module may further include a block write list of the second processing module, and the second processing module may detect the error page using the block write list.

The checkpoint information of the second processing module may further include a block write list of the second processing module and a write pointer of the second processing module, and the second processing module may detect the error page by checking whether each page is in error along the block write list of the second processing module, starting from the page indicated by the write pointer of the second processing module.

In addition, the second processing module may transmit checkpoint information updated by the logical swap between the error block and the uninterpreted block of the second processing module to the third processing module.

The flash translation layer structure may further include an upper processing module configured to process upper mapping information of the mapping information, and after recovering from an error using the uninterpreted block of the second processing module, the second processing module may obtain the upper mapping information of the mapping information from the pages where the mapping information is stored and transmit the upper mapping information to the upper processing module.

The checkpoint information of the second processing module may further include a block write list of the second processing module, a write pointer of the second processing module, and a reproduction pointer of the second processing module, and the second processing module may obtain upper mapping information from the page indicated by the reproduction pointer of the second processing module up to the page indicated by the write pointer of the second processing module along the block write list of the second processing module, and transmit the upper mapping information to the upper processing module.

The second processing module may store mapping information corresponding to a mapping command in a cache when the mapping command is received, and may determine whether mapping information corresponding to a read command exists in the cache when the read command is received.

The second processing module may store the mapping information and the upper mapping information of the mapping information in the same page of the flash memory.

The flash translation layer structure may further include an upper processing module configured to process upper mapping information of the mapping information, and the second processing module may store the mapping information in the flash memory and then transmit the upper mapping information of the mapping information to the upper processing module.

The second processing module may store the mapping information in a flash memory in units of pages, and advance the write pointer along the block write list when the mapping information in units of pages is stored.

The flash translation layer structure may further include an upper processing module configured to process upper mapping information of the mapping information; the second processing module may transmit a persistence request signal to the upper processing module when its write pointer crosses a block boundary; in response, the upper processing module may store the upper mapping information of the block corresponding to the persistence request signal in the flash memory; and the second processing module may advance its reproduction pointer along the block write list when it receives a persistence completion signal from the upper processing module.

In addition, the error may include a power failure that occurs asynchronously.

A processing module included in the flash translation layer may include an interface unit connected to at least one of a host, another processing module, and a flash memory; a cache unit including volatile memory; and a processing unit configured to process data or information according to the type of the processing module, using the interface unit and the cache unit.

The flash translation layer structure may further include a fourth processing module configured to process block state information, the checkpoint information processed by the third processing module may further include information on an uninterpreted block of the fourth processing module, and the fourth processing module may recover from an error using the uninterpreted block of the fourth processing module.

Further, the flash translation layer structure may further include a fifth processing module that operates as a nonvolatile buffer for another processing module, the checkpoint information processed by the third processing module may further include information on an uninterpreted block of the fifth processing module, and the fifth processing module may recover from an error using the uninterpreted block of the fifth processing module.

According to another aspect of the present invention, a flash memory controller includes a manager configured to manage synchronous errors caused by flash memory operations, and a flash translation layer unit including a first processing module that processes data, a second processing module that processes mapping information of the data, and a third processing module that processes checkpoint information including information on an uninterpreted block of the first processing module and information on an uninterpreted block of the second processing module. Here, the first processing module may recover from asynchronous errors using the uninterpreted block of the first processing module, and the second processing module may recover from asynchronous errors using the uninterpreted block of the second processing module.

According to another aspect, a flash translation layer structure includes a D-log for processing data and a plurality of M-logs for hierarchically processing mapping information of the data.

In this case, each of the plurality of M-logs stores the information received from its lower log in the flash memory resources allocated to it, and when the size of the information received from the lower log is larger than a predetermined size, may transmit the mapping information of those flash memory resources to its upper log.

In addition, among the plurality of M-logs, a log for which the size of the information received from its lower log is smaller than or equal to the predetermined size may be determined to be the highest M-log.

In addition, the highest M-log may store information received from the lower log in a flash memory resource allocated thereto, and transmit mapping information of the flash memory resource to a C-log that processes checkpoint information.

In addition, the characteristics of each of the plurality of M-logs may be set for each individual M-log, and the characteristics of each M-log may include at least one of the mapping unit of that M-log and the cache management policy of that M-log.

In addition, the flash translation layer structure may further include an L-log for processing block state information and a plurality of LM-logs for hierarchically processing mapping information of the block state information.

Each of the plurality of LM-logs stores the information received from its lower log in the flash memory resources allocated to it, and when the size of the information received from the lower log is larger than a predetermined size, may transmit the mapping information of those flash memory resources to its upper log.

In addition, among the plurality of LM-logs, a log for which the size of the information received from its lower log is smaller than or equal to the predetermined size may be determined to be the highest LM-log.

In addition, the highest LM-log may store information received from the lower log in a flash memory resource allocated thereto, and transmit mapping information of the flash memory resource to a C-log that processes checkpoint information.

In addition, the characteristics of each of the plurality of LM-logs may be set for each individual LM-log, and the characteristics of each LM-log may include at least one of the mapping unit of that LM-log and the cache management policy of that LM-log.

According to another aspect, a method of designing a flash translation layer includes providing a plurality of building blocks for constructing a flash translation layer. The plurality of building blocks may include a first processing block that processes data; at least one second processing block that hierarchically processes mapping information of the data; and a third processing block that processes checkpoint information including information on an uninterpreted block of the first processing block and information on an uninterpreted block of the at least one second processing block. The first processing block may recover from an error using the uninterpreted block of the first processing block, and the at least one second processing block may recover from an error using the uninterpreted block of the at least one second processing block.

In this case, the method of designing the flash translation layer may include receiving a setting related to the design of the flash translation layer, and generating the flash translation layer based on the plurality of building blocks and the setting.

The setting may include a setting related to the number of threads implementing the flash translation layer; a setting related to the number of cores driving the flash translation layer; a setting related to the number of threads processed per core; and a setting related to the mapping between the cores and the threads.

The method of designing the flash translation layer may also include receiving a second setting related to the design of the flash translation layer, and adaptively regenerating the flash translation layer based on the plurality of building blocks and the second setting.

FIG. 1 is a diagram illustrating a classification of flash memory faults according to one embodiment.

FIG. 2 illustrates a storage system architecture according to one embodiment.

FIG. 3 illustrates a log interface according to one embodiment.

FIG. 4 is a diagram illustrating a flash translation layer structure built by combining logs, according to one embodiment.

FIG. 5 illustrates a hierarchical mapping structure for a storage device providing a logical address space according to one embodiment.

FIG. 6 illustrates a host write command process according to one embodiment.

FIG. 7 is a diagram illustrating recursive processing of mapping data on a hierarchical mapping structure according to one embodiment.

FIG. 8 illustrates a host read command process according to one embodiment.

FIG. 9 is a diagram illustrating a method of guaranteeing data consistency according to one embodiment.

FIG. 10 illustrates an asynchronous error according to one embodiment.

FIG. 11 illustrates removal of residual effects during the structural recovery process, according to one embodiment.

FIG. 12 illustrates the structural recovery process from the perspective of the storage system, according to one embodiment.

FIG. 13 is a diagram illustrating reproduction of mapping information during the functional recovery process according to one embodiment.

FIG. 14 is a diagram illustrating cases in which recoverability is broken when there is no synchronization between logs, according to one embodiment.

FIG. 15 illustrates an interface for guaranteeing recoverability according to one embodiment.

FIG. 16 illustrates the functional recovery process from the perspective of the storage system, according to one embodiment.

FIG. 17 illustrates steps of formal verification according to one embodiment.

FIG. 18 illustrates timeliness of a storage device system according to one embodiment.

FIG. 19 illustrates a guarantee of data consistency according to one embodiment.

FIG. 20 illustrates a guarantee of recoverability according to one embodiment.

Classification of flash memory faults according to one embodiment

A NAND flash memory chip is composed of a plurality of blocks, and each block contains a plurality of pages. Each page is divided into a data area for storing host data and a spare area for storing metadata associated with the host data, such as mapping information or ECC information. Flash memory supports three operations: read, program, and erase. Because in-place update is impossible, an erase operation must be performed before data can be written again. The minimum unit of the erase operation is a block, and the minimum unit of the program and read operations is a page. In terms of reliability, NAND flash memory permits bad blocks that cannot guarantee normal operation; bad blocks may appear not only at manufacturing time but also during operation. In addition, bit inversion errors may occur in which bits of previously written page data are flipped by disturbance mechanisms.
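For illustration only, the following C sketch models this page/block organization and the three operations; the constants and names are assumptions chosen to match the 4 KB page example used later, not values prescribed by the publication.

    /* Illustrative model of NAND geometry and operations (assumed names/sizes). */
    #include <stdint.h>

    #define PAGE_DATA_BYTES   4096   /* data area, matching the 4 KB example below */
    #define PAGE_SPARE_BYTES   128   /* spare area for mapping info, ECC, CRC, ... */
    #define PAGES_PER_BLOCK    128   /* assumed; device dependent                  */

    typedef struct {
        uint8_t data[PAGE_DATA_BYTES];    /* host data or FTL metadata        */
        uint8_t spare[PAGE_SPARE_BYTES];  /* per-page metadata (ECC, mapping) */
    } nand_page;

    typedef struct {
        nand_page pages[PAGES_PER_BLOCK]; /* programmed in ascending page order */
        int       is_bad;                 /* bad blocks may appear in the field */
    } nand_block;

    /* The three NAND operations: read/program act on pages, erase on blocks.
     * Return 0 on success, nonzero on a synchronous error handled below the FTL. */
    int nand_read   (uint32_t block, uint32_t page, nand_page *out);
    int nand_program(uint32_t block, uint32_t page, const nand_page *in);
    int nand_erase  (uint32_t block);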

Referring to FIG. 1, faults of a flash memory can be classified into synchronous faults caused by internal factors of the flash memory and asynchronous faults caused by external environmental factors. A synchronous error means a failure of an internal flash memory operation such as erase, program, or read. Erase and program errors are caused by damaged cells, whether or not the block has reached the end of its life, and have a permanent effect on the block; blocks with such errors must therefore be handled so that they are never used again in the system. A read error, on the other hand, refers to a bit inversion error caused by program disturb, read disturb, or data retention loss. Bit inversion errors are transient: the page returns to normal once it is erased and reused. Because synchronous errors occur as the result of a flash memory operation, they can be handled exclusively in lower layers (a bad block management module and an error correction module) independent of the FTL, which can then be given the illusion of flash memory free of synchronous errors. The embodiments adopt this layered approach and assume that the FTL handles only asynchronous power failures. This is an important assumption that greatly reduces the complexity of the power failure recovery algorithm and thereby makes it possible to prove correct recovery in any power failure situation.

The asynchronous error that the FTL must handle exclusively is an error that can occur at any moment, with no temporal correlation to internal flash memory operations such as erase, program, or read; concretely, it is a power failure. Because the erase/program operations that change the state of a flash memory page are not atomic, a page can be left in various states if a power failure occurs during the operation. In particular, even if data appears to have been written successfully to a page that experienced a power failure during programming, that page is highly vulnerable to read disturb and may return nondeterministic, incorrect data on subsequent reads; such a page is called a residual-effect page. The data contained in residual-effect pages can turn into a source of system error at any moment, so if these pages are not identified and removed during power failure recovery, the consistency of the system can be fatally damaged. What makes the problem even harder is the sibling page problem of MLC-type flash memory chips: FTL metadata (or host data) that was already written successfully in the past can be lost at any time if a power failure later occurs while its sibling page is being programmed. From the FTL's point of view, this means that the successful completion of a program operation on a physical page no longer guarantees the persistence of the data written to that page.

Flash translation layer design framework according to one embodiment

FIG. 2 shows a flash memory storage architecture consisting of a flash memory subsystem, composed of a flash memory controller and a BMS layer, and an FTL implemented with the HIL framework. It is assumed that the flash memory controller can control a plurality of flash memory chips, supports flash memory read/write/erase operations, and can detect and report errors that occur while an operation executes. It is also assumed that the BMS layer is responsible for handling all synchronous errors that occur during the execution of flash memory operations. To this end, a typical BMS layer reserves spare blocks for replacing blocks on which a write/erase error occurs and includes error control logic such as an ECC encoder/decoder to handle read errors. The BMS layer can therefore provide the upper layer with a virtual physical address space in which flash memory operation errors do not occur, until all spare blocks are exhausted or a read error occurs that the ECC circuit cannot correct.

The FTL is implemented with the HIL framework on top of this flash memory subsystem, which exposes a virtual physical address space free of synchronous flash memory errors. The HIL framework defines the log as an object that implements an in-place updatable address space on top of flash memory blocks in the virtual physical address space, one for each kind of data the FTL manages, such as host data, mapping data, and block state data. Users can combine logs to implement various FTLs. A log consists of a flash log, a series of flash memory blocks in which data is written persistently, and a cache implemented in volatile memory for better performance.

A log processes host data, mapping data, or block state data, and thus may also be referred to as a processing module. Because the HIL framework provides the various types of logs for designing an FTL as building blocks, a log may also be referred to as a processing block. A log may be implemented in software, in hardware, or in a combination of the two.

FIG. 3 shows the interface of a typical log. A log requires (1) a read/write interface to provide other logs or an external host system with a linear, in-place updatable address space for the (meta)data it manages. Because these reads and writes are converted into requests to the flash memory subsystem, the log also needs (2) an interface to the flash memory subsystem. When a log writes data to flash memory, it must update (3) the mapping information for that data and (4) the block state information needed to reclaim the space invalidated by the write, so interfaces to the corresponding logs are required. Finally, persistently written data must remain recoverable even after a sudden power failure, so every log needs (5) an interface for periodically checkpointing all information necessary for recovery.
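A minimal sketch of these five interfaces as a C structure follows; the member names are assumptions introduced here for illustration, not identifiers from the publication.

    /* Sketch of the five log interfaces described above (names are assumed). */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t pla_addr;   /* address in the pseudo logical address space */

    typedef struct log log_t;

    struct log {
        /* (1) in-place updatable linear address space offered to others */
        int (*read) (log_t *self, pla_addr addr, void *buf, size_t len);
        int (*write)(log_t *self, pla_addr addr, const void *buf, size_t len);

        /* (2) requests issued to the flash memory subsystem (BMS layer) */
        int (*flash_read) (uint32_t block, uint32_t page, void *buf);
        int (*flash_write)(uint32_t block, uint32_t page, const void *buf);
        int (*flash_erase)(uint32_t block);

        /* (3) install mapping info for newly written data into the upper log */
        int (*install_mapping)(log_t *upper, pla_addr addr, uint32_t block, uint32_t page);

        /* (4) report block state (liveness/invalidation) changes to the L-type log */
        int (*update_block_state)(log_t *l_log, uint32_t block, uint32_t page);

        /* (5) hand checkpoint info (block write list, pointers, uninterpreted
         *     block) to the C-type log periodically */
        int (*checkpoint)(log_t *c_log, const void *ckpt, size_t len);
    };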

Depending on the type of data managed by the FTL, logs are divided into D-type logs that process host data, M-type logs that process mapping information, L-type logs that process block state information, and C-type logs that process checkpoint information for power failure recovery. In some cases the log types may further include a W-type log that operates as a nonvolatile buffer for another log. The log types may also include an LM-type log that processes the mapping information of block state information, a concept distinct from the M-type log that processes the mapping information of host data. D-type, M-type, L-type, C-type, W-type, and LM-type logs may be referred to as D-log, M-log, L-log, C-log, W-log, and LM-log, respectively.

Although the interface and behavior differ slightly depending on the log type, all logs are alike in that each provides, to other logs or to the host system, a linear address space in which the data it manages can be updated in place using the resources allocated to it. The HIL framework integrates and manages the in-place updatable address spaces of all the logs that make up the FTL and defines the result as the pseudo logical address space (PLA), also referred to as the virtual logical address space. As can be seen in FIG. 3, each log occupies its own dedicated region of the virtual logical address space.

The HIL framework can design and implement any of a variety of FTLs through such combinations of different kinds of logs. The connection structure of the logs is determined by the hierarchy of mappings and by the attributes of the data managed by each log. FIG. 4 shows an example of the connection structure between logs. When a log writes data to flash memory, mapping information is created, and that mapping information is managed by the upper mapping log; therefore, in the HIL framework, a plurality of logs form a fixed hierarchy through the mapping relationship, as shown in FIG. 4. To process read/write commands sent from the host computer, the FTL must perform address translation with reference to the mapping information, and in the HIL framework this address translation is implemented by the multiple logs in the mapping hierarchy and the interworking between them. The hierarchical extension of the mapping structure and the processing of host reads and writes using it are described later.

When a log writes data, it also needs to update the state information of the physical block, and the HIL framework assumes that the L-type log is responsible for the block state information updates that all logs generate; this relationship is represented by connection lines through which all logs deliver block state information to the L-type log. The C-type log records information for power failure recovery. Since every log has its own checkpoint information and the amount of checkpoint information is generally small, it is assumed that one checkpoint log manages the checkpoint information of all logs. Hereinafter, for convenience of description, the embodiments are described assuming an FTL composed of a D-type log, M-type logs, and a C-type log; however, the same discussion applies to the L-type log, which manages the liveness and invalidation of blocks or pages for garbage collection, and to the W-type log, which operates as a nonvolatile buffer for another log in a hybrid-mapping FTL, described later.

I. Hierarchical Mapping Structure

When a log writes data to flash memory, mapping information is generated. The mapping information is much smaller than the host data because it is only a pointer to the physical flash memory address where the host data is stored. The total size of the mapping information is determined by the size of the host data area, the mapping unit, and the size of a mapping entry. As shown in FIG. 5, when the flash memory page size is 4 KB and a mapping entry is 4 B, the mapping information for a 4 GB logical address space is 4 MB, i.e. 1/1024 of 4 GB. If the mapping data is viewed from the same point of view as the host data, the mapping data also forms a linear address space of its own. Just as the 4 GB user address space is written to flash memory in units of pages and produces 4 MB of mapping information, the 4 MB address space of mapping data is itself recursively divided into page-sized units and written to flash memory, producing mapping data about the mapping data that occupies only 4 KB. That is, in FIG. 5 the relationship between the D-type log and the M0 log is the same as the relationship between the M0 log and the M1 log. Because upper-level mapping data always has a smaller address space than the data it maps and the mapping relationship is recursive, the mapping hierarchy between logs induced by host data writes can in general be formed in the same way. Thus, the embodiments provide a general hierarchical structure with no limit on how many levels of hierarchical mapping are used.
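As an illustration of this arithmetic, the following sketch reproduces the 4 KB/4 B example above in C (the program structure is an assumption added here for clarity):

    /* Worked example of the hierarchical mapping sizes from FIG. 5. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t page_size  = 4096;            /* 4 KB flash page            */
        const uint64_t entry_size = 4;               /* 4 B mapping entry          */
        const uint64_t capacity   = 4ULL << 30;      /* 4 GB logical address space */

        uint64_t level_size = capacity;
        for (int level = 0; level_size > page_size; level++) {
            /* one entry per page of the level below: shrinks by 4096/4 = 1024x */
            level_size = level_size / page_size * entry_size;
            printf("M%d log address space: %llu bytes\n",
                   level, (unsigned long long)level_size);
        }
        /* Prints M0 = 4194304 bytes (4 MB) and M1 = 4096 bytes (4 KB):
         * the hierarchy terminates once a level fits in a single page. */
        return 0;
    }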

In the HIL framework, the mapping hierarchy can be continued until the top-level mapping data fits within some small bound (e.g., a single page). In the example of FIG. 5, the address space of the M0 log is 4 MB, so the address space of the M1 log is 4 KB, the flash memory page size, and the mapping hierarchy can be terminated at the M1 log. An FTL implemented with the HIL framework keeps only the top-level mapping information in volatile memory during operation, and all of the latest mapping information and the locations of the data can be found by following the mapping hierarchy in order. Since the entire mapping information need not be held in volatile memory, an FTL for a very large capacity (e.g., a page-mapped FTL) can be supported even on a flash memory storage system with a limited amount of volatile memory. The FTL checkpoints the location of the top-level mapping information periodically during operation and, when the storage device restarts, scans only the checkpoint area to find it quickly. For example, information about where in flash memory the mapping information processed by the top log (e.g., the M1 log) is stored may be kept in the C-type log.

FIG. 5 also shows the regions occupied by the D-log, the M0-log, and the M1-log when the HIL internally maintains the virtual logical address space as byte addresses in order to provide a 4 GB logical address space to the host system. The D-log provides the logical address space to the host system and accesses the virtual logical address space implemented by the M0-log. Similarly, the M0-log provides a virtual logical address space to the D-log and accesses the virtual logical address space managed by the M1-log. Thus, read and write commands from the host system start at the data log and propagate along the log hierarchy, being address-translated through the virtual logical addresses. As shown in FIG. 5, since the upper and lower interfaces of the logs are identical in the mapping relationship, the logs in the mapping hierarchy can easily be interconnected. Therefore, the HIL framework can easily build various kinds of hierarchical mapping structures even if the storage capacity increases or the mapping units of the logs change.

II. Host Write and Read Behavior

An FTL implemented with the HIL framework provides the host system with a storage interface that can read and write an arbitrary number of sectors. Like a typical storage device such as an HDD, it must satisfy the requirements of data coherence and data durability. In addition, an FTL implemented with the HIL framework must naturally accommodate the command queuing features supported by traditional storage interfaces, such as SATA's Native Command Queuing (NCQ) and SCSI's Tagged Command Queuing (TCQ).

1. Host Write

FIG. 6 shows the process of handling a host write request in a three-level mapping hierarchy. The host write procedure is divided broadly into a foreground task, which stores the host data in the D-type log's cache and responds with write completion, and a background task, which serially updates the mapping information along the mapping hierarchy of the logs.

A sector-based host write request for the logical address space is converted by the write thread into page-level install_data operations on the virtual logical address space and forwarded to the D-type log (1). Among the logs in the HIL framework, the write thread has an interface only with the D-type log; it executes independently of the logs, but it is not a log because it leaves no nonvolatile state. The D-type log stores the data requested by the host in volatile memory (2) and immediately responds with installed_v to signal completion (3). When the write thread has received the installed_v response for all the page writes that make up one host write request, it can respond to the host system with write_ack for that host write request. If a host read request for the same logical addresses as the write request arrives after the write thread's completion response but before the data in the volatile buffer has been written to flash memory, the FTL must serve the read request with the latest data in the buffer, so that data consistency is guaranteed while the storage device is in operation.
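The foreground path might be sketched as follows; this is illustrative C only, where install_data, installed_v, and write_ack follow the description above and the cache/acknowledge helpers are assumptions.

    /* Foreground handling of one host write request (illustrative sketch). */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t pla_addr;

    /* Hypothetical helpers: store one page in the D-log's volatile cache
     * (install_data; the D-log answers installed_v once cached) and send
     * write_ack to the host. Both names are assumptions.                  */
    extern void dlog_cache_install(pla_addr page, const void *data);
    extern void host_write_ack(uint32_t tag);

    /* Write thread: convert a sector request into page-level install_data ops. */
    static void write_thread_handle(uint32_t tag, pla_addr first_page,
                                    size_t npages, const uint8_t *buf,
                                    size_t page_bytes)
    {
        for (size_t i = 0; i < npages; i++) {
            /* (1) install_data to the D-type log; modeled as returning once
             * the data sits in volatile memory, i.e. installed_v received. */
            dlog_cache_install(first_page + i, buf + i * page_bytes);
        }
        /* All installed_v responses received: complete the host request. */
        host_write_ack(tag);
    }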

Unless a separate command or setting from the host system enforces data persistence, the D-type log writes the data held in its cache to flash memory at an appropriate time according to its internal buffer management policy (4). The location at which a logical page is written constitutes its mapping information, which is transferred to the upper M0 log by the install_mapping command (5). The install_mapping command received by the M0 log is conceptually the same as the write command that the D-log receives from the write thread; therefore, the M0 log saves the mapping information in volatile memory and immediately responds to the D-log with installed_v (6). After this completion response, the M0 log must guarantee consistency of the mapping data to the D-log by returning the most recently installed mapping data whenever the D-log queries the mapping information, just as the D-log, after its data write completion response, guarantees consistency of the host data to the host system while the storage device is in operation.

The way mapping data is processed in the M0 log is essentially the same as the way host data is processed in the data log. Writing the mapping data from the M0 log into flash memory creates mapping information about that mapping data, and a new install_mapping command for the in-place updatable address space of the M1 log must then be sent to the M1 log. The chained write requests propagated through the address spaces of the hierarchical logs by this uniform write-processing structure are repeated recursively until the top mapping log is reached. FIG. 7 illustrates this recursive procedure, in which multiple mapping data delivered from a lower log are collected in a unit of the mapping log, written to one page, and the single piece of mapping information for that page is passed on to the next higher log. Moreover, since a mapping entry is much smaller than the data it points to (e.g., 4 B vs. 4 KB), page-unit operations in a lower log are converted into request-unit operations of the upper log before being transferred.

The cache maintained by the log at each level improves the response speed for write requests between the host and the D-log and between lower and upper logs, and increases the efficiency of flash memory write operations. In addition, because the logs at each level execute in parallel by default, each can independently write the data in its volatile memory to flash memory. Therefore, an FTL implemented with the HIL framework can optimize performance by running the tasks of each log in different threads at the same time and by issuing read/write requests to several flash memory chips simultaneously.

2. Host Read

FIG. 8 shows how a read request for a logical address p is processed on a three-level mapping hierarchy. Because host writes propagate serially from the data log's writes up to the higher mapping logs, the latest data or mapping information always propagates bottom-up. In addition, since each log maintains volatile memory, the most recently updated or referenced data can be held there. Therefore, to maintain data coherence, an FTL implemented with the HIL framework basically operates with bottom-up queries, mirroring the write request processing.

The read thread first queries the D-log; if the data is present, the D-log can transfer the data in its volatile memory to the host system and complete the read without reading flash memory. Otherwise, the data must be read from flash memory, and a bottom-up query process begins in order to obtain the necessary mapping information. The query proceeds to the next upper log until some log holds the queried data in its volatile memory. If none of the logs in the mapping hierarchy holds the queried data in volatile memory, the query reaches the top mapping log; since the HIL framework assumes that all mapping information managed by the top-level mapping log is kept in volatile memory, a query to the top-level mapping log always succeeds. When the queried data is not in a lower log's volatile memory but is in an upper log, the upper log's data is the mapping information of the lower log, so the read thread re-queries the lower log top-down (requery) using the mapping information obtained from the upper log. A log receiving a requery must always return the queried data by reading it from flash memory using the mapping information delivered with the requery command. This top-down requery process is repeated from the log at which the queried data was found in volatile memory during the bottom-up query phase down to the data log, until the queried data is finally returned correctly.

FIG. 8 illustrates the bottom-up query and top-down requery process using an example in which the queried data is not in the D-log but is in the M0 log. First, the read thread converts a sector-based host read request into a page query request to the data log (1). A read thread, like a write thread, is distinguished from a log in that it maintains only volatile data structures for its internal operation and writes no nonvolatile data to flash memory. Since the D-log does not have the data for the queried virtual logical address pla_d in its volatile cache, it returns a cache miss response, Query_ack(miss, null) (2), and the read thread forwards a new query request for the mapping information of the pla_d address to the M0 log (3). The query to the upper mapping log has the same interface as the query to the data log. In the scenario of FIG. 8, a cache hit occurs in the M0 log, so the M0 mapping log returns the cache hit together with the query result (B_i, P_j), the mapping information for the logical page pla_d, to the read thread as the Query_ack response (4). The read thread then resends the query command to the D-log with the mapping information obtained from the upper mapping log (5); the D-log reads the data from the given flash memory address (6, 7) and sends a completion response to the read thread (8). After receiving completion responses from the data log through this same bottom-up query for all logical pages that make up a host request, the read thread can finally send the collected data to the host system. Since the interface between the read thread and the data log is the same as that between the read thread and the M0 log for any higher mapping log Mi, the bottom-up query easily extends the scenario of FIG. 8 to the case where the cache hit occurs in the Mi log.
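A compact C sketch of this bottom-up query / top-down requery loop follows; the query and requery callbacks are hypothetical helpers standing in for the Query/Query_ack and requery interactions described above.

    /* Bottom-up query then top-down requery for one logical page (sketch). */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t pla_addr;
    typedef struct { uint32_t block, page; } map_entry;   /* (B_i, P_j) */

    typedef struct {
        /* Query the log's volatile cache; on a hit, fill *out with the
         * cached content (data location or mapping entry).              */
        bool (*query)  (int level, pla_addr addr, map_entry *out);
        /* Requery: read flash at 'loc' and return the queried content.  */
        void (*requery)(int level, pla_addr addr, map_entry loc, map_entry *out);
        int  levels;                       /* 0 = D-log, 1 = M0, 2 = M1, ... */
    } log_stack;

    /* Resolve the flash location of logical page 'addr' down to the D-log. */
    static map_entry read_thread_lookup(const log_stack *ls, pla_addr addr)
    {
        map_entry found = {0, 0};
        int lvl = 0;

        /* Bottom-up: climb until some log's volatile memory answers the query.
         * The top mapping log always hits, by assumption of the framework.    */
        while (lvl < ls->levels && !ls->query(lvl, addr, &found))
            lvl++;

        /* Top-down: requery each lower log with the mapping info from above. */
        while (lvl > 0) {
            lvl--;
            ls->requery(lvl, addr, found, &found);
        }
        return found;
    }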

Unlike the write thread, read threads interact directly with all mapping logs, including the data log, in order to process host read requests in parallel as much as possible. A host write command is a non-blocking request: it can be answered as soon as the data is written to volatile memory, and actually writing the data to flash memory is a deferred task that can be spread over multiple flash memory chips to improve performance. A host read command, however, is a blocking request that can complete only after the requested data has been obtained, so performance is improved by issuing its requests to as many flash memory chips as possible. In particular, if the host interface supports command queuing such as NCQ, each host read command can be distributed among multiple read threads, and because each read thread interacts with the logs independently and can issue read requests to the flash memory chips simultaneously, the rate at which host read requests are processed can be greatly improved. With this extension in mind, the HIL framework is designed so that read threads interact directly with each log.

To guarantee data consistency between host write and read operations, data that has been written to flash may only be released from the buffer after the installed_v response for its mapping has been received from the parent mapping log. FIG. 9 illustrates the problem that can occur if data is released from the buffer after the write completes but before installed_v is received. If the mapping information for the latest data disappears from the lower log before it is certain that it has been reflected in the upper mapping log, there is a window of time, shown as the shaded portion of FIG. 9, during which the latest data appears to have vanished from the storage device. The thread that handles read operations runs concurrently and independently of the log threads involved in the write operation, and if a read thread executes a read at this point, it will return an old version of the data for that logical page rather than the latest version. This situation must be avoided because it violates the data coherence requirement that a read of any logical address must return the result of the last successfully acknowledged write to the same logical address. Therefore, all logs must obey the synchronization rule of releasing information about a logical page from the cache only after receiving the installed_v response, as shown on the right side of FIG. 9, and this rule is an important design-for-provability element that makes it possible to prove the correctness of the HIL framework.
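A minimal sketch of this release rule, assuming a per-entry state flag (all names are illustrative assumptions):

    /* Cache entry release rule: evict only after installed_v from the upper log. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef enum {
        ENTRY_DIRTY,          /* cached, not yet written to flash                */
        ENTRY_FLASHED,        /* written to flash, install_mapping sent upward   */
        ENTRY_INSTALLED       /* installed_v received: mapping is in upper cache */
    } entry_state;

    typedef struct {
        uint64_t    pla;      /* logical page in the log's address space         */
        entry_state state;
        void       *data;
    } cache_entry;

    /* Upper log acknowledged the mapping (installed_v). */
    static void on_installed_v(cache_entry *e) { e->state = ENTRY_INSTALLED; }

    /* The synchronization rule: an entry may be released only once the upper
     * log is guaranteed to return the latest mapping for it.                   */
    static bool may_release(const cache_entry *e)
    {
        return e->state == ENTRY_INSTALLED;
    }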

Asynchronous error recovery according to one embodiment

The power failure recovery technique built into the HIL framework ensures complete power failure recovery even if power failures occur at arbitrary times. The effects of a power failure are described first, followed by the power failure recovery technique that remedies them.

FIG. 10 shows the two effects, residual effect and lost update, that arise when a power failure occurs during the execution of an operation that changes the internal state of the flash memory. A residual effect is defined as an unintended state left in the flash memory by a power failure during a flash memory write or erase operation. Since the flash memory chip does not guarantee the atomicity of an operation, if a power failure occurs during a write operation, as shown on the right side of FIG. 10, then 1) no data may have been written, 2) only part of the intended data may have been written, or 3) all of it may have been written. In cases 1) and 2) the page failed to record the intended data, so such pages can be detected, for example with ECC, and reused during the power failure recovery process. In case 3), however, because the write operation was not terminated atomically as the flash chip requires, the data cannot be fully trusted even if it has no ECC error: pages that experienced a power failure during writing are vulnerable to read disturb, so even if they initially return normal data, they are likely to return incorrect data on later reads. The effects of errors that occur when pages or blocks with residual effects are not handled properly are latent and unpredictable, and as power failures repeat, such latent error pages can accumulate. If a page with a residual effect contains system metadata such as mapping information, the information may be readable for a while even if the FTL does not handle it properly, but it can be silently corrupted after some time and cause serious damage to the consistency of the storage device.

A lost update is the situation in which a power failure strikes while the FTL is still in the middle of the series of internal operations needed to handle a single external command, such as a host write request, so that consistency between the data structures inside the FTL is lost. The kinds of consistency that can be destroyed depend on when the power failure occurs and on the FTL implementation. Most basically, if the data has been written to flash memory but the mapping information for it has not, or if the power failure occurs while the mapping information is being written as shown in FIG. 10, the mapping information held in volatile memory disappears and the mapping for the written data cannot be recovered during the recovery process. Furthermore, if the FTL has guaranteed the persistence of data in response to a flush cache command or the like but the data cannot be recovered because its mapping information is lost, a file system or database system may be left in an unrecoverable state even though the storage device itself recovers from the power failure. Conversely, if the mapping information is written to flash memory first and a power failure occurs before the data is written, the mapping information may point to incorrect data, and the storage device cannot guarantee data integrity. Therefore, whenever the FTL performs an operation that changes the state of the flash memory, it must take recoverability into account, and the recovery process must be able to heal both effects, residual effects and lost updates, in a systematic and verifiable way.

Power failure recovery in the HIL framework consists of two phases: structural recovery and functional recovery. Structural recovery is the step that atomically removes all residual effects from the storage device; structural recovery in the HIL framework is a new concept that does not exist in previously known recovery techniques. Functional recovery is the step that restores consistency between data and metadata; for functional recovery, each log reproduces the work it would have performed during normal operation had there been no power failure.

I. Structural Recovery

In structural recovery, each log removes residual effects independently of the other logs. To remove residual effects, 1) it must be possible to bound the blocks on which an operation that changes the state of the flash memory could have been in progress at the moment of the power failure, and to determine the block and page where the power failure occurred; 2) a spare block must be prepared to which the normal data of the affected block can be copied; 3) a method is needed to handle power failures that overlap arbitrarily until the removal of residual effects has been completed atomically across all logs that make up the HIL framework; and 4) a method is needed to reduce the time required to find the block in which the power failure occurred.

Calling the block or page of flash memory that carries a residual effect the crash frontier, the HIL framework defines, for each log, a block write list: the blocks the log is currently using or will use in the near future, together with the order in which they are written. In normal operation, a log performs flash memory write operations on blocks in the order specified in the block write list and writes pages within a block in ascending order. During recovery, the FTL scans only the blocks in the block write list, in order, rather than the entire flash memory space, so that the log can determine which block was being written at the moment of the power failure. Whether a residual effect actually occurred is determined using the ECC and CRC information recorded in the spare area together with the data. During recovery, the log scans the block write list and, within the crash frontier block, uses the data only up to the page immediately preceding the page at which a CRC error occurred.
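This scan could be sketched as follows (illustrative C; page_intact is an assumed helper standing in for the ECC/CRC check against the spare area):

    /* Locate the crash frontier by scanning the block write list (sketch). */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint32_t block, page; } frontier;

    /* Assumed helper: true if the page passes ECC and spare-area CRC checks. */
    extern bool page_intact(uint32_t block, uint32_t page);

    /* block_write_list: blocks in write order; start: position of the write
     * pointer (pages before it are already known to be persistent).          */
    static frontier find_crash_frontier(const uint32_t *block_write_list,
                                        size_t nblocks, size_t pages_per_block,
                                        frontier start)
    {
        uint32_t page = start.page;
        for (size_t i = start.block; i < nblocks; i++) {
            for (; page < pages_per_block; page++) {
                if (!page_intact(block_write_list[i], page))
                    return (frontier){ (uint32_t)i, page };  /* first bad page */
            }
            page = 0;   /* next block in the list starts from its first page */
        }
        return (frontier){ (uint32_t)nblocks, 0 };           /* no error found */
    }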

During recovery, the crash frontier page may or may not read back normally. Even a successful read means only that the page is a latent error page that might return incorrect data in the future because of read disturb. Structural recovery in the HIL framework eliminates such residual-effect pages by identifying the power failure block and then rewriting its valid pages into a new block: if a residual-effect page returns a valid value, it is moved to a new page and rewritten, so the residual effect is removed naturally, and if it does not return a valid value, it is excluded from the copy and thereby removed from the system.

The valid-page copy operation that removes residual-effect pages from a log may itself be interrupted by another power failure, so the page copy operation of structural recovery in the HIL framework is performed on a special block called a don't care block, or uninterpreted block. An uninterpreted block is a block allocated solely for structural recovery, maintained for each log together with its block write list, and whose contents are never interpreted by the system. If a power failure occurs while an uninterpreted block is being programmed during structural recovery, the block is still the uninterpreted block after the system reboots and therefore does not affect the integrity of the system. The uninterpreted block is erased before each valid-page copy operation of structural recovery and used in a fresh state. After all data has been copied into the uninterpreted block, structural recovery is completed the moment the roles of the uninterpreted block and the power failure block on the block write list are swapped and the change of use is committed. In other words, even if a power failure occurs again during structural recovery, all write commands were executed on the uninterpreted block, so at the next recovery the system can start over from the beginning as if nothing had happened. Each log can also keep a write pointer to find the power failure block quickly: the log performs write operations in the order specified by the block write list and advances the write pointer after each write. Pages before the write pointer are guaranteed to be persistent and need not be scanned during power failure recovery, so the scanning time is shorter than when scanning from the beginning of the block write list. FIG. 11 summarizes the key data structures and mechanisms of a log for removing residual effects.
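A per-log structural recovery step might look like the following sketch (C; the flash helpers and field names are assumptions, not the publication's API):

    /* Per-log structural recovery: copy valid pages into the uninterpreted
     * block, then logically swap it with the power failure block (sketch).  */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t *block_write_list;   /* blocks in write order                  */
        uint32_t  frontier_block;     /* index of the power failure block       */
        uint32_t  frontier_page;      /* first page with an ECC/CRC error       */
        uint32_t  uninterpreted_blk;  /* don't care block reserved for recovery */
        uint32_t  pages_per_block;
    } log_ckpt;

    /* Assumed flash helpers (synchronous errors handled below the FTL). */
    extern void flash_erase(uint32_t block);
    extern void flash_copy_page(uint32_t src_blk, uint32_t page, uint32_t dst_blk);
    extern bool page_intact(uint32_t block, uint32_t page);

    static void structural_recovery_step(log_ckpt *c)
    {
        uint32_t bad = c->block_write_list[c->frontier_block];

        flash_erase(c->uninterpreted_blk);            /* always start it fresh */
        for (uint32_t p = 0; p < c->frontier_page; p++)
            if (page_intact(bad, p))
                flash_copy_page(bad, p, c->uninterpreted_blk);

        /* Volatile logical swap: the old power failure block becomes the new
         * uninterpreted block. Nothing becomes visible until the C-type log
         * writes the updated checkpoint information of all logs atomically.   */
        c->block_write_list[c->frontier_block] = c->uninterpreted_blk;
        c->uninterpreted_blk = bad;
    }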

FIG. 12 shows the structural recovery process at the storage level. The FTL constituting the HIL framework can consist of multiple logs, each of which may run in parallel. The key to structural recovery is therefore to handle the removal of the residual effects of multiple logs atomically. To this end, structural recovery proceeds as follows.

Structural recovery begins with the C-type log looking for the most recent checkpoint information in the checkpoint area. The checkpoint area is bounded in size and every page storing checkpoint information includes a timestamp, so the latest checkpoint information can be found by scanning the checkpoint area in a short time (1). The checkpoint information includes what each log needs for power failure recovery, such as its block write list and its uninterpreted block, and the C-type log delivers this information to the corresponding logs (2). Using the received checkpoint information, each log scans the blocks on its block write list starting from the write pointer (3), finds the power failure block and the position of the ECC/CRC error within it, and copies the pages that read back normally from the power failure block into its uninterpreted block (4).

After copying, the log swaps, in volatile memory, the block numbers of the uninterpreted block and the power failure block in its checkpoint information, as shown on the right side of FIG. 12 (5). This volatile update of the HIL metadata gives each log a shadow block consisting only of the valid pages of the power failure block, so the residual effect is completely eliminated. The checkpoint information changed by the copy operation and the HIL metadata updates performed in parallel in each log are then retransmitted to the checkpoint log (6). When the changed HIL metadata of all logs has arrived at the C-type log, a complete shadow tree exists. The C-type log gathers all the checkpoint information passed to it and writes it to the checkpoint area; when this write completes successfully, all of the changes described above take effect atomically (7). The C-type log then explicitly broadcasts to all logs that structural recovery has completed successfully, and each log begins functional recovery.

Throughout the entire structural recovery process, every erase and write operation that changes the state of the flash memory is performed on uninterpreted blocks, except for the single final checkpoint page written by the C-type log; all other operations take place in volatile memory. Thus, if a power failure occurs again at any moment during structural recovery, the logical state of the storage device is unchanged and the system simply returns to the state at which power failure recovery started. The complex residual effect removal operations performed in each log are gathered together and take effect atomically through the single write of checkpoint information by the C-type log.
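
The atomicity argument above can be pictured with a small sketch (the types and the write_checkpoint_page() helper are assumptions for illustration): all per-log updates are staged in volatile memory and become durable through one checkpoint write by the C-type log.

    #include <stdbool.h>
    #include <stdint.h>

    struct log_recovery_state;   /* per-log checkpoint contents, as sketched earlier */

    /* Assumed helper: program one checkpoint page carrying all logs' checkpoint
     * information and a timestamp newer than any previously written checkpoint. */
    extern bool write_checkpoint_page(const struct log_recovery_state *all,
                                      unsigned n_logs, uint64_t timestamp);

    bool commit_structural_recovery(const struct log_recovery_state *updated,
                                    unsigned n_logs, uint64_t new_timestamp)
    {
        /* A single successful flash write makes every log's residual-effect removal
         * visible at once; if power fails before it completes, the most recent valid
         * checkpoint is still the old one and recovery restarts from the beginning. */
        return write_checkpoint_page(updated, n_logs, new_timestamp);
    }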

II. Functional recovery

After structural recovery, each log begins functional recovery. From the point of view of each log, functional recovery reproduces the install_mapping processing that would have been performed had no power failure occurred, restoring the logical address space of each log to its state before the power failure.

When a power failure occurs, all information that the FTL kept in volatile memory disappears. For replay to be possible, during normal operation the log must therefore 1) keep, in nonvolatile form, whatever information is needed to restore the logical state of the data whenever data is written to flash memory, and 2) synchronize with the other logs and with the flash memory so that unrecoverable situations cannot arise. In the replay process itself, 3) the difference between the state after residual effects have been removed and the state before the power failure must be taken into account, and 4) a method for reducing the replay time is needed.

The object of replay is the mapping information for data written to the flash memory. Because a flash memory page consists of a data area and a spare area for recording meta-information, the log writes the mapping information to be used during replay together with the data in the same flash memory page, as shown in FIG. 13. Since the data and its recovery information (the mapping information) are stored together in a flash memory page by a single flash write operation, the log does not need to consider the order in which data and recovery information are stored, nor manage separate space for the recovery information. The HIL framework therefore makes it easy to maintain recovery information during normal operation, and the replay process is also simple because the data and its mapping information are always on the same page.
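
For illustration, a page layout along these lines could be described as below; the field names and sizes are assumptions, the point being that one program operation covers both the data area and the spare area, so the mapping (recovery) information always travels with its data.

    #include <stdint.h>

    #define DATA_AREA_SIZE  4096   /* assumed page geometry */
    #define SPARE_AREA_SIZE 64

    /* Spare area written in the same flash program operation as the data area. */
    struct page_spare {
        uint32_t logical_addr;   /* mapping information replayed during recovery */
        uint32_t crc;            /* integrity check used to locate the frontier  */
        uint8_t  reserved[SPARE_AREA_SIZE - 2 * sizeof(uint32_t)];
    };

    struct flash_page {
        uint8_t           data[DATA_AREA_SIZE];
        struct page_spare spare;
    };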

In principle, each log can run independently, but to recover from power failures a minimal amount of synchronization between logs is required during normal operation. Fundamentally, data must be written to flash memory before the mapping information for that data. If the logs were completely independent and the upper log wrote the mapping information to flash memory while the lower log had not yet written the data, as in the first (leftmost) diagram of FIG. 14, the mapping information would point to data that does not actually exist, compromising the data integrity of the storage device. The logs therefore follow the rule that mapping information is passed to the parent log only after the child log has written the corresponding data, which preserves the flash memory write order constraint between child log and parent log.

In MLC flash memory with a sibling page structure, successfully writing data to a particular page does not by itself guarantee persistence; the data in that page is not guaranteed to be durable until all pages in the sibling relationship have been used. The middle diagram of FIG. 14 shows a situation in which the data of the first page, which had been written successfully, is lost because of a power failure that occurred while data was being written to its sibling second page. If the mapping log wrote mapping data without considering this situation, the mapping information would end up referring to the lost data.

Finally, the third diagram of FIG. 14 shows a mapping/data consistency problem that arises during power failure recovery. During recovery, the power failure block is replaced by the uninterpreted block to remove residual effects. If the upper mapping log has already written mapping data for the valid data that was stored in the power failure block, then after the residual effects are removed the mapping information does not point to the data now held in the currently valid block (the block that was the uninterpreted block before the power failure); instead, it refers to meaningless data in the block that was valid at the time of the power failure but has since become the uninterpreted block.

To solve these problems, the HIL framework introduces a synchronization interface consisting of NV_grant, a command by which the child log allows the parent log to write the mapping data to flash memory, and Installed_NV, a notification by which the parent log informs the child log that the mapping data has been permanently written. FIG. 15 gives an overview of the operation of NV_grant and Installed_NV. The NV_grant command for the data contained in a particular page may be sent to the upper log only after the whole block to which the page belongs has been written to flash memory and the write operation for the next block on the block write list has started. This condition ensures (1) that the data cannot be lost in a power failure through the sibling page structure of an MLC chip, and (2) that the page is not part of a power failure block. The upper log receives the mapping information and keeps it in its cache until the NV_grant command arrives; once NV_grant is received from the lower log, the mapping information may be written. When the mapping information has not only been written to flash memory but made persistent, the upper log notifies the lower data log with Installed_NV. The lower log receiving the Installed_NV message may then advance the replay pointer of its block write list past the pages whose mapping information has been persisted, as shown in (7) of FIG. 15. Pages before the replay pointer no longer need to have their mapping information updates replayed during power failure recovery, because both the data and the mapping data for that data are persistent. On power failure recovery, the replay process starts from the replay pointer, so the replay time is reduced compared with replaying from the beginning of the block write list. Blocks that the replay pointer has already passed can be truncated from the log's block write list, since they are no longer needed at recovery time.
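
The handshake can be summarized in a sketch like the one below (the queueing, threading, and helper names are assumptions): the lower log issues NV_grant only at a block boundary, the upper log persists the cached mapping information and answers with Installed_NV, and the lower log then advances its replay pointer.

    /* Hypothetical message hooks between a lower (data) log and its upper log. */
    extern void send_nv_grant(unsigned block);            /* lower -> upper */
    extern void send_installed_nv(unsigned block);        /* upper -> lower */
    extern void persist_cached_mapping(unsigned block);   /* upper log flash write */

    struct lower_log {
        unsigned write_idx;    /* block currently being written (block write list) */
        unsigned replay_idx;   /* blocks before this index need no replay          */
    };

    /* Lower log: called when the write pointer crosses into the next list entry.
     * Only now is NV_grant allowed, so the finished block can no longer lose data
     * through sibling pages and cannot be the power failure block. */
    void lower_on_block_boundary(struct lower_log *lg, unsigned finished_block)
    {
        lg->write_idx++;
        send_nv_grant(finished_block);
    }

    /* Upper log: mapping entries for the block were cached until now. */
    void upper_on_nv_grant(unsigned block)
    {
        persist_cached_mapping(block);     /* write the mapping data to flash      */
        send_installed_nv(block);          /* ...then confirm persistence downward */
    }

    /* Lower log: both data and its mapping are now durable, so replay can skip them. */
    void lower_on_installed_nv(struct lower_log *lg, unsigned block)
    {
        (void)block;
        lg->replay_idx++;
    }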

After structural recovery completes, the log begins functional recovery through a replay process. Once the residual effects have been removed by structural recovery, the state of the log is conceptually identical to its state before the power failure, except that 1) all data structures held in volatile memory have been lost, and 2) it is unknown whether the upper log has written to flash memory the mapping data for data located after the replay pointer. Consequently, the mechanism dedicated to replay is minimized and most of the install_mapping processing used in normal operation is reused. The replay process scans pages along the block write list, starting at the replay pointer, and retransmits the mapping information to the parent log. Unlike normal operation, which requires both writing the data and transmitting its mapping information, the data is already written, so only the mapping information stored with the data needs to be transmitted; it can be obtained from the spare area during the scan. The transmitted mapping data may already have been written to flash memory by the upper mapping log before the power failure, but updating the mapping information is idempotent: if the mapping information is sent in the same order, the result is the same even if it is applied several times, which guarantees that the mapping data can safely be written to flash memory multiple times.
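
A minimal sketch of this replay loop, assuming a hypothetical read_spare() helper and reusing the normal install_mapping() path, might look as follows; error handling and the exact treatment of the frontier page are simplified.

    #include <stdbool.h>
    #include <stddef.h>

    #define PAGES_PER_BLOCK 128    /* assumed geometry, as in the earlier sketches */

    /* Assumed helpers: read the spare-area mapping entry of a page, and the normal
     * install_mapping path that forwards a mapping update to the parent log. */
    extern bool read_spare(unsigned block, unsigned page, unsigned *logical_addr);
    extern void install_mapping(unsigned logical_addr, unsigned block, unsigned page);

    void replay_mapping_updates(const unsigned *block_write_list,
                                size_t replay_idx, size_t frontier_idx,
                                unsigned frontier_page)
    {
        for (size_t i = replay_idx; i <= frontier_idx; i++) {
            unsigned last = (i == frontier_idx) ? frontier_page : PAGES_PER_BLOCK;
            for (unsigned p = 0; p < last; p++) {
                unsigned laddr;
                if (!read_spare(block_write_list[i], p, &laddr))
                    return;                          /* past the last valid page */
                /* Data is already on flash; only the mapping update is re-sent.
                 * The update is idempotent, so repeating it is harmless. */
                install_mapping(laddr, block_write_list[i], p);
            }
        }
    }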

FIG. 16 illustrates functional recovery in a storage system with a three-level mapping hierarchy. In the hierarchical log configuration of the HIL framework, functional recovery completes from the top log downwards, each log becoming ready to receive mapping information update requests from its sublog. If the mapping information of the top-level mapping log fits in one physical page, its write pointer and replay pointer are at the same position, and the functional recovery of the logical address space of the top-level mapping log completes simply by reading the page indicated by the most recent write pointer. The restored logical address space of the highest mapping log, M1, represents the mapping data of the lower mapping log, M0; in the M0 log, the mapping information from the replay pointer up to the last page successfully written is replayed (retransmitted to the upper log) to complete the functional recovery of its logical address space. This top-down, chained functional recovery continues until the final data log is reached. When functional recovery in the data log completes, the logical address space for the host data is finally restored to its state before the power failure.

Verifying the correctness of the flash translation layer design framework according to one embodiment

Since the HIL framework aims at the generality of designing any of a variety of FTLs, implementing one particular FTL instance and then verifying it through testing does not fully verify the correctness of the HIL framework itself. Therefore, formal verification of correctness was carried out on an abstracted system model of the HIL framework, and to this end design for provability was pursued from the system design stage.

FIG. 17 shows the steps of the formal verification process for the HIL framework. First, the assumptions about the system were established; the next step was to establish correctness criteria stating clearly what it means for the storage system to operate correctly. Next, a formal specification of the HIL framework was created. This involved abstract modeling of the HIL framework's behavior, clear definition of its concepts, and a formal representation of its key rules. Finally, based on the system assumptions, the correctness criteria, and the formal specification established in the preceding steps, it was proved mathematically, by deductive reasoning, that the theorem about the correctness of the HIL framework is true.

Table 1 shows the correctness criterion for the storage system, which is the starting point of the proof of correctness.

[Table 1 is reproduced as an image in the original publication; as discussed below, it states the correctness criterion that the storage device must always return the most recent data in response to a read request.]

Two notions in the above correctness criterion need to be defined more precisely: what "the most recent" data means, and what "always" means in time. First, regarding the notion of time, as shown in FIG. 18, the storage system can be regarded as going through a finite number of alternations of normal operation and power failure recovery from initialization until the end of its life.

The recency of data has a different meaning in normal operation and in power failure recovery. While the storage system is operating normally, the most recent data is the data of the most recent volatile write success response. Once power failure recovery has completed, the most recent data is the data for which a durable (persistent) response was last given before the power failure. The correctness criterion can therefore be subdivided: in normal operation, the requirement is to "return, for a read request, the data of the most recent volatile write success response", i.e., data consistency (coherence); in power failure recovery, it is to "recover the data for which a durable response was last given before the power failure", i.e., recoverability. Thus, proving that the HIL framework satisfies these two properties (data consistency and recoverability) is equivalent to demonstrating that the "always" criterion is satisfied on a time axis in which n pairs of normal operation and power failure recovery repeat.

The HIL framework introduces rules that log operations must follow in order to guarantee data consistency and recoverability. First, consider the system rule, and the reasoning used in the proof, introduced to guarantee data consistency. In FIG. 19, when Data(X) in Cache_i is written to Flash_log_i, new mapping information Link(X) is generated. Link(X) is then carried through the inter-log interface to Cache_{i+1} of the parent log (log_{i+1}), and, according to the Installed_V rule of the HIL framework described earlier, Data(X) may be removed from Cache_i only after it is confirmed that Link(X) has been successfully installed in Cache_{i+1}. By following this rule it is guaranteed that the link to Data(X) is never lost at any point during normal operation, so a read request for Data(X) can always be served. The principle is the same as in rock climbing, where for safety one hand releases its hold only after the other, outstretched hand has confirmed a firm grip on the next hold.
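
Read as pseudocode, the rule amounts to the small sketch below (the structure and helper names are illustrative): an entry may be evicted from Cache_i only when its data has been written to Flash_log_i and Installed_V has confirmed that Link(X) sits in Cache_{i+1}.

    #include <stdbool.h>

    struct cache_entry {
        unsigned x;               /* identity of Data(X)               */
        bool     data_on_flash;   /* Data(X) written to Flash_log_i    */
        bool     link_installed;  /* Installed_V received for Link(X)  */
    };

    extern void send_link_to_parent(unsigned x);   /* carries Link(X) upward (assumed) */
    extern void free_cache_slot(struct cache_entry *e);

    void on_data_flash_write_done(struct cache_entry *e)
    {
        e->data_on_flash = true;
        send_link_to_parent(e->x);        /* Link(X) travels to Cache_{i+1} */
    }

    void on_installed_v(struct cache_entry *e)
    {
        e->link_installed = true;
    }

    void maybe_evict(struct cache_entry *e)
    {
        /* The climbing rule: let go only after the next hold is confirmed. */
        if (e->data_on_flash && e->link_installed)
            free_cache_slot(e);
    }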

Next, the rule introduced in the HIL framework to guarantee recoverability is described with reference to FIG. 20. To improve recovery performance, a flash log in the HIL framework does not replay every mapping in the log; instead it maintains a small replay set, Redo_set, based on the replay pointer. According to the Installed_NV rule of the HIL framework described with reference to FIG. 15, Data(X) recorded in the flash log Flash_log_i may be removed from the replay set Redo_set_i only after it has been confirmed that Link(X) is persistent in the flash log of the upper log, Flash_log_{i+1}. This rule guarantees that the link to Data(X) is not lost in any power failure situation, because the link to Data(X) is either regenerated by log_i's recovery process or already present, durably, in the parent log.
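
The recoverability rule can be stated the same way (again with illustrative names): an entry leaves Redo_set_i only once Installed_NV has confirmed that Link(X) is durable in Flash_log_{i+1}.

    #include <stdbool.h>

    struct redo_entry {
        unsigned x;                      /* identity of Data(X) in Flash_log_i */
        bool     link_durable_upstream;  /* Installed_NV received for Link(X)  */
    };

    extern void redo_set_remove(struct redo_entry *e);   /* assumed helper */

    void on_installed_nv(struct redo_entry *e)
    {
        e->link_durable_upstream = true;
        /* Until this point, log_i's replay could regenerate Link(X); from now on the
         * parent holds it durably, so the entry is no longer needed for recovery. */
        redo_set_remove(e);
    }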

According to embodiments, any of a variety of FTLs can be configured through a combination of logs. HIL is a generic framework for FTL design that can cover a variety of mapping units, write buffering or caching policies, and garbage collection policies. In addition, the HIL framework according to the embodiments ensures successful recovery in any power failure situation, and its correctness has been verified; HIL thereby addresses the reliability problem of modern flash memory comprehensively and completely. Furthermore, the embodiments provide the possibility of performance optimization through thread-level parallelism, which translates effectively into parallelism of flash memory requests.

Design of a Flash Translation Layer According to an Embodiment

According to an FTL design technique according to an embodiment, a plurality of building blocks for composing a flash translation layer may be provided to a user. The plurality of building blocks may include a first processing block (e.g., a D-type log) for processing data, at least one second processing block (e.g., an M-type log) for hierarchically processing the mapping information of the data, and a third processing block (e.g., a C-type log) for processing checkpoint information for the first processing block and the at least one second processing block. For example, a user interface for constructing a flash translation layer from the plurality of building blocks may be provided to the user, who can then design the desired FTL by combining them, as in the sketch below.
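
The sketch below illustrates one way such a composition could be expressed (the type names are assumptions for this example, not an API defined by the embodiment): a D-type data log, a two-level hierarchy of M-type mapping logs, and a C-type checkpoint log wired together by parent links.

    enum log_type { LOG_D, LOG_M, LOG_C };

    struct log_block {
        enum log_type     type;
        struct log_block *parent;   /* upper log that receives mapping updates */
    };

    struct ftl_instance {
        struct log_block d;         /* data log                         */
        struct log_block m0, m1;    /* mapping hierarchy (lower, upper) */
        struct log_block c;         /* checkpoint log                   */
    };

    /* Wire the three-level example used elsewhere in the description: D -> M0 -> M1,
     * with the C-type log holding checkpoint information for all of them. */
    static void build_example_ftl(struct ftl_instance *f)
    {
        f->d  = (struct log_block){ LOG_D, &f->m0 };
        f->m0 = (struct log_block){ LOG_M, &f->m1 };
        f->m1 = (struct log_block){ LOG_M, &f->c  };
        f->c  = (struct log_block){ LOG_C, 0      };
    }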

In addition, a user interface may be provided through which the user enters settings related to the design of the FTL. Here, the settings related to the design of the FTL may include a setting for the number of threads implementing the FTL, a setting for the number of cores driving the FTL, a setting for the number of threads processed per core, and a setting for the mapping between the plurality of cores and the plurality of threads. For example, individual logs may be set to be implemented as functions, as a single thread, or as multiple threads. If individual logs are implemented as functions, the FTL can be implemented as a single thread; if individual logs are implemented as single threads or as multiple threads, the FTL can be implemented as a multi-threaded program.

When the FTL is implemented as multi-threaded and is set to be driven by multiple cores, a thread-core mapping may be set. For example, if the number of cores is two and the FTL is implemented with ten threads, the mapping between the two cores and the ten threads may be set as illustrated below.
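
Such a configuration could be captured in a structure like the one below (field names are assumptions); the example function fills it for the two-core, ten-thread case with a simple round-robin thread-to-core mapping.

    #define MAX_THREADS 32

    struct ftl_design_config {
        unsigned n_threads;                    /* threads implementing the FTL */
        unsigned n_cores;                      /* cores driving the FTL        */
        unsigned core_of_thread[MAX_THREADS];  /* thread -> core mapping       */
    };

    /* Example: 10 threads spread over 2 cores, assigned round-robin. */
    static struct ftl_design_config example_two_core_config(void)
    {
        struct ftl_design_config c = { .n_threads = 10, .n_cores = 2 };
        for (unsigned t = 0; t < c.n_threads; t++)
            c.core_of_thread[t] = t % c.n_cores;
        return c;
    }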

An FTL may then be generated based on the plurality of building blocks used in its design and the settings related to its design. Here, the HIL framework provides the essential synchronization interface needed to satisfy data consistency and recoverability even when multiple threads execute the FTL in parallel. For example, individual logs check for cache hits first in order to satisfy data consistency, and the upper and lower logs exchange NV_grant and Installed_NV in order to satisfy recoverability.

Because of this, a single log can be implemented as multiple threads executing in parallel. Alternatively, a log may be implemented as a function, in which case several logs may be combined into one thread. An FTL built on the HIL framework can therefore be used without code changes from single-core environments to multi-core environments and, in each case, a single thread or multiple threads may be driven per core. In this way, the HIL framework can be configured in various forms in a multi-core, multi-threaded environment.

In addition, according to the FTL design technique according to an embodiment, a previously generated FTL may be adaptively regenerated according to changed settings. For example, a user interface may be provided through which the user changes the settings related to the design of the FTL. In this case, the FTL may be adaptively regenerated based on the new settings and the plurality of building blocks used for the previously generated FTL. In other words, even if a setting such as the hardware configuration is changed, the FTL can be regenerated adaptively without changing any code.

The embodiments described above may be implemented as hardware components, software components, and/or combinations of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For convenience of explanation, the processing device is sometimes described as a single device, but one of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or in a transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. The software may also be distributed over networked computer systems so that it is stored and executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be embodied in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the purposes of the embodiments, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM and DVD; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to the limited embodiments and drawings, various modifications and variations are possible to those skilled in the art from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order than described, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a different form than described or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims (42)

  1. A first processing module for processing data;
    A second processing module for processing the mapping information of the data; And
    A third processing module for processing checkpoint information including information on an uninterpreted block of the first processing module and information on an uninterpreted block of the second processing module
    Including,
The first processing module recovers an error using the uninterpreted block of the first processing module, and the second processing module recovers an error using the uninterpreted block of the second processing module.
  2. The method of claim 1,
    To recover from the error,
The first processing module detects an error page, copies valid pages in the error block including the error page to the uninterpreted block of the first processing module, and, when the copying is completed, logically swaps the error block and the uninterpreted block of the first processing module.
  3. The method of claim 2,
    And the checkpoint information of the first processing module further includes a block write list of the first processing module, wherein the first processing module detects the error page using the block write list.
  4. The method of claim 2,
The checkpoint information of the first processing module further includes a block write list of the first processing module and a write pointer of the first processing module, and the first processing module detects the error page by checking whether each page is in error along the block write list of the first processing module, starting from the page indicated by the write pointer of the first processing module.
  5. The method of claim 2,
    And the first processing module transmits updated checkpoint information to the third processing module due to logical swapping of the error block and the uninterpreted block of the first processing module.
  6. The method of claim 1,
    After recovering the error using the non-interpreted block of the first processing module,
    And the first processing module obtains the mapping information of the data from the page in which the data is stored, and transmits the mapping information of the data to the second processing module.
  7. The method of claim 6,
    The checkpoint information of the first processing module further includes a block write list of the first processing module, a write pointer of the first processing module, and a reproduction pointer of the first processing module,
The first processing module acquires the mapping information from the pages from the page indicated by the reproduction pointer of the first processing module to the last page normally written after structural recovery, along the block write list of the first processing module, and transmits the mapping information to the second processing module.
  8. The method of claim 1,
The first processing module stores data corresponding to a write command in a cache when receiving the write command, and determines whether data corresponding to a read command exists in the cache when receiving the read command.
  9. The method of claim 1,
    And the first processing module stores the data and mapping information of the data in the same page of a flash memory.
  10. The method of claim 1,
    And the first processing module stores the data in a flash memory and transmits mapping information of the data to the second processing module.
  11. The method of claim 1,
    And the first processing module stores the data in a flash memory in units of pages and advances a write pointer along a block write list when data in units of pages is stored.
  12. The method of claim 11,
The first processing module transmits a persistence request signal to the second processing module when the write pointer crosses a block boundary; the second processing module stores, in a flash memory, mapping information of the block corresponding to the persistence request signal in response to the persistence request; and the first processing module advances a reproduction pointer along the block write list upon receiving a persistence completion signal from the second processing module.
  13. The method of claim 1,
    To recover from the error,
The second processing module detects an error page, copies valid pages in the error block including the error page to the uninterpreted block of the second processing module, and, when the copying is completed, logically swaps the error block and the uninterpreted block of the second processing module.
  14. The method of claim 13,
    And the checkpoint information of the second processing module further includes a block write list of the second processing module, wherein the second processing module detects the error page using the block write list.
  15. The method of claim 13,
The checkpoint information of the second processing module further includes a block write list of the second processing module and a write pointer of the second processing module, and the second processing module detects the error page by checking whether each page is in error along the block write list of the second processing module, starting from the page indicated by the write pointer of the second processing module.
  16. The method of claim 13,
    And the second processing module transmits updated checkpoint information to the third processing module due to logical swapping of the error block and the uninterpreted block of the second processing module.
  17. The method of claim 1,
    Upper processing module for processing higher mapping information of the mapping information
    More,
    After recovering the error using the non-interpreted block of the second processing module,
    And the second processing module obtains higher mapping information of the mapping information from a page in which the mapping information is stored, and transmits higher mapping information of the mapping information to the higher processing module.
  18. The method of claim 17,
    The checkpoint information of the second processing module further includes a block write list of the second processing module, a write pointer of the second processing module, and a reproduction pointer of the second processing module,
The second processing module acquires the upper mapping information from the pages from the page indicated by the reproduction pointer of the second processing module to the last page normally written after structural recovery, along the block write list of the second processing module, and transmits the upper mapping information to the upper processing module.
  19. The method of claim 1,
The second processing module stores mapping information corresponding to a mapping command in a cache when receiving the mapping command, and determines whether mapping information corresponding to a read command exists in the cache when receiving the read command.
  20. The method of claim 1,
    And the second processing module stores the mapping information and higher mapping information of the mapping information in the same page of a flash memory.
  21. The method of claim 1,
    Upper processing module for processing higher mapping information of the mapping information
    More,
    And the second processing module stores the mapping information in a flash memory and transmits higher mapping information of the mapping information to the higher processing module.
  22. The method of claim 1,
    And the second processing module stores the mapping information in a flash memory on a page basis, and advances a write pointer along a block write list when the mapping information on a page basis is stored.
  23. The method of claim 22,
    Upper processing module for processing higher mapping information of the mapping information
    More,
The second processing module transmits a persistence request signal to the upper processing module when the write pointer crosses a block boundary; the upper processing module stores, in a flash memory, upper mapping information of the block corresponding to the persistence request signal in response to the persistence request; and the second processing module advances a reproduction pointer along the block write list upon receiving a persistence completion signal from the upper processing module.
  24. The method of claim 1,
    The error includes a power failure occurring asynchronously.
  25. The method of claim 1,
    The processing module included in the flash translation layer
    An interface unit connected to at least one of a host, another processing module, and a flash memory;
    A cache unit including volatile memory; And
    A processing unit for processing data or information according to the type of processing module using the interface unit and the cache unit
    Including, flash conversion hierarchy.
  26. The method of claim 1,
    A fourth processing module for processing block state information
    More,
    The third processing module further includes information on the uninterpreted block of the fourth processing module, wherein the fourth processing module recovers an error using the non-interpreted block of the fourth processing module. .
  27. The method of claim 1,
    Fifth processing module acting as a nonvolatile buffer for another processing module
    More,
    The third processing module further includes information on the uninterpreted block of the fifth processing module, wherein the fifth processing module recovers an error using the non-interpreted block of the fifth processing module. .
  28. A D-log for processing data;
    A plurality of M-logs hierarchically processing the mapping information of the data
    Including, flash conversion hierarchy.
  29. The method of claim 28,
    Each of the plurality of M-logs is
    Save the information received from the sub-log to the flash memory resource allocated to it,
    And transmitting mapping information of the flash memory resource to an upper log when a size of information received from the lower log is larger than a predetermined size.
  30. The method of claim 28,
    And a log having a size less than or equal to a predetermined size of information received from a lower log among the plurality of M-logs is determined as the highest M-log.
  31. The method of claim 30,
    The top M-log is
    Store the information received from the lower log in a flash memory resource allocated to it;
    And transmitting mapping information of the flash memory resource to a C-log that processes checkpoint information.
  32. The method of claim 28,
    The characteristics of each of the plurality of M-logs are set for individual M-logs,
    The characteristic of each of the plurality of M-logs comprises at least one of a mapping unit of each of the plurality of M-logs and a cache management policy of each of the plurality of M-logs.
  33. The method of claim 28,
    An L-log for processing block state information; And
    A plurality of LM-logs hierarchically processing mapping information of the block state information;
    The flash conversion hierarchy further comprising.
  34. The method of claim 33, wherein
    Each of the plurality of LM-logs is
    Save the information received from the sub-log to the flash memory resource allocated to it,
    And transmitting mapping information of the flash memory resource to an upper log when a size of information received from the lower log is larger than a predetermined size.
  35. The method of claim 33, wherein
    And a log having a size less than or equal to a predetermined size of information received from a lower log among the plurality of LM-logs is determined as the highest LM-log.
  36. 36. The method of claim 35 wherein
    The highest LM-log is
    Store the information received from the lower log in a flash memory resource allocated to it;
    And transmitting mapping information of the flash memory resource to a C-log that processes checkpoint information.
  37. The method of claim 33, wherein
    The characteristics of each of the plurality of LM-logs are set for each individual LM-log,
    The characteristic of each of the plurality of LM-logs comprises at least one of a mapping unit of each of the plurality of LM-logs and a cache management policy of each of the plurality of LM-logs.
  38. Providing a plurality of building blocks for constructing a flash translation layer
    Including,
    The plurality of building blocks
    A first processing block for processing data;
    At least one second processing block hierarchically processing the mapping information of the data; And
    A third processing block for processing checkpoint information for the first processing block and the at least one second processing block;
    Wherein the plurality of building blocks provide a synchronization interface that satisfies data consistency and data recoverability.
  39. The method of claim 38,
    Receiving a setting related to the design of the flash translation layer; And
    Generating the flash translation layer based on the plurality of building blocks and the configuration
    Further comprising a method of designing a flash translation layer.
  40. The method of claim 39,
    The above settings
    A setting associated with the number of threads implementing the flash translation layer;
    A setting related to the number of cores driving the flash translation layer;
    A setting related to the number of threads processed per core; And
    Configuration related to mapping between multiple cores and multiple threads
    And at least one of: a flash translation layer.
  41. The method of claim 39,
    Receiving a second setting related to the design of the flash translation layer; And
    Adaptively regenerating the flash translation layer based on the plurality of building blocks and the second configuration
    Further comprising a method of designing a flash translation layer.
  42. A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 38-41.
PCT/KR2014/001028 2013-02-07 2014-02-06 Flash translation layer design framework for provable and accurate error recovery WO2014123372A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR10-2013-0013659 2013-02-07
KR20130013659 2013-02-07
KR1020140013590A KR101526110B1 (en) 2013-02-07 2014-02-06 Flash transition layor design framework for provably correct crash recovery
KR10-2014-0013590 2014-02-06

Publications (1)

Publication Number Publication Date
WO2014123372A1 true WO2014123372A1 (en) 2014-08-14

Family

ID=51299917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2014/001028 WO2014123372A1 (en) 2013-02-07 2014-02-06 Flash translation layer design framework for provable and accurate error recovery

Country Status (1)

Country Link
WO (1) WO2014123372A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060179212A1 (en) * 2005-02-07 2006-08-10 Kim Jin-Hyuk Flash memory control devices that support multiple memory mapping schemes and methods of operating same
US20080098195A1 (en) * 2006-10-19 2008-04-24 Cheon Won-Moon Memory system including flash memory and mapping table management method
US20090089610A1 (en) * 2007-09-27 2009-04-02 Microsoft Corporation Rapid crash recovery for flash storage
WO2012008731A2 (en) * 2010-07-12 2012-01-19 (주)이더블유비엠코리아 Device and method for managing flash memory using block unit mapping


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YOON, JI HYEOK: "X-BMS: A Provably-Correct Bad Block Management Scheme for Flash Memory Based Storage Systems", SEOUL NATIONAL UNIVERSITY DOCTORAL THESIS, COMPUTER ENGINEERING, February 2011 (2011-02-01) *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14748958

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14748958

Country of ref document: EP

Kind code of ref document: A1