WO2021139166A1 - Error page identification method based on three-dimensional flash storage structure - Google Patents

Error page identification method based on three-dimensional flash storage structure

Info

Publication number
WO2021139166A1
Authority
WO
WIPO (PCT)
Prior art keywords
flash memory
error
memory storage
storage structure
physical
Prior art date
Application number
PCT/CN2020/110817
Other languages
French (fr)
Chinese (zh)
Inventor
黄敏
杜雅芝
肖仲喆
吴迪
顾济华
Original Assignee
苏州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州大学
Publication of WO2021139166A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature

Definitions

  • TLC: Triple Level Cell
  • SLC: Single Level Cell
  • MLC: Multi-Level Cell
  • LDPC: low-density parity-check (code)
  • RAID: Redundant Array of Independent Disks
  • OCSSD: Open-Channel SSD
  • FTL: Flash Translation Layer
  • PPA: Physical Page Address

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Quality & Reliability
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Techniques For Improving Reliability Of Storages
  • Read Only Memory

Abstract

An error page identification method based on a three-dimensional flash storage structure. The read speed of each physical page under the OCSSD structure of a three-dimensional flash storage system is used to characterize that page's error rate. A machine learning method classifies all physical pages into reliability levels; the lower the level, the higher the error rate. Physical pages with a high error rate are selected for real-time data migration, thereby effectively reducing the error rate and improving the reliability of the three-dimensional flash storage system.

Description

Error page identification method based on a three-dimensional flash memory storage structure
Technical field
The invention relates to the field of three-dimensional flash memory storage structures, and in particular to an error page identification method based on a three-dimensional flash memory storage structure.
Background
Compared with SLC (Single Level Cell) and MLC (Multi-Level Cell) flash memory devices, TLC (Triple Level Cell) flash memory devices offer higher storage density and lower cost, and they are widely used, particularly in three-dimensional flash memory storage systems. Three-dimensional flash memory storage systems use 3D stacking, that is, vertical stacking of TLC flash memory cells, which rapidly increases storage density while steadily reducing reliability.
The conventional technology has the following technical problems:
To reduce the error rate of data stored in three-dimensional flash memory storage systems, current research addresses the rising error rate from two angles, approaches that rely on error correction codes and approaches that do not, with the aim of improving overall storage system reliability. Error correction codes are one of the effective means of containing the rising error rate of flash memory storage systems. Hamming codes were the most common choice in early flash memory devices and are sufficient for the error correction requirements of SLC devices. With the advent of MLC flash memory devices, the error correction capability of Hamming codes no longer met demand. RS codes and BCH codes were then introduced to flash storage systems, but as cell density increased, and given the structural characteristics of flash memory cells, these codes could not keep up with the growing probability of data storage errors either. Three-dimensional flash memory storage systems generally use TLC as the storage medium, which places even greater demands on error correction codes; to ensure data reliability in today's flash-based solid-state storage systems, powerful correction algorithms such as low-density parity-check (LDPC) codes have become essential.
Among strategies that suppress the error rate without relying on error correction codes, by studying the error characteristics of NAND flash itself, redundant backup is an effective measure for reducing the storage system error rate. RAID (Redundant Array of Independent Disks) technology is widely used in flash storage systems: it adds redundant data to extend data retention time, reduce the error rate, and improve storage system reliability. In addition, because mixing "hot data" blocks with "cold data" blocks can incur very high costs, methods that separate hot data blocks from cold data blocks have been proposed. Other work combines the advantages of redundant backup and hot/cold data separation to improve system endurance and reliability.
Although the prior art reduces the data storage error rate to some extent and improves the reliability of three-dimensional flash memory storage systems, problems remain, such as low space utilization, limited practicality, and high hardware cost.
Summary of the invention
The technical problem to be solved by the present invention is to provide an error page identification method based on a three-dimensional flash memory storage structure, referred to as error page identification technology, which can accurately identify the error rate of every physical page in the current three-dimensional flash memory storage system and can effectively improve the reliability of that system. The basic storage cells of three-dimensional flash memory storage systems mainly use TLC as the storage medium; compared with SLC and MLC, TLC has the worst reliability and service life of the three because of its structural characteristics. The read speed of each physical page under the OCSSD structure of the three-dimensional flash memory storage system is used to characterize that page's error rate, and a machine learning method classifies all physical pages into reliability levels; the lower the level, the higher the error rate. Physical pages with high error rates are singled out for real-time data migration, which effectively reduces the error rate and improves the reliability of the three-dimensional flash memory storage system.
To solve the above technical problem, the present invention provides an error page identification method based on a three-dimensional flash memory storage structure, comprising:
Data collection: while user space operates under different workloads, collect read speed data for every physical page, and at the same time collect the characteristics of the current workload;
Error page detection: by comparing the current read speed of each physical page with its initial read speed, classify the reliability of all physical pages into five levels from high to low, and flag the physical pages in the two levels with the highest error rates as having a high error rate;
Real-time data migration: modify the basic NAND flash copyback command at the MTD layer, with no changes to software or hardware outside the MTD layer; use the copyback operation to program the data to a specified target address.
In one embodiment, the read speed comprises the initial speed of the physical page, the current speed of the physical page, and the overall average speed of the flash memory.
In one embodiment, the characteristics of the current workload comprise random requests and sequential requests.
In one embodiment, if the current speed of a physical page is already greater than the overall average speed, the page does not need to be flagged as having a high error rate.
In one embodiment, the reliability of all physical pages is classified from high to low into the following five levels: Best, Good, Normal, Weak, and Worst.
In one embodiment, the copyback operation comprises a copyback read instruction and a copyback program instruction.
In one embodiment, the physical pages in the two high-error-rate levels are stored in a list, and real-time data migration is performed on them when the storage system is idle.
In one embodiment, the copy of the physical page with the highest error rate is released to the NAND flash memory controller at its target address; the physical page with the next highest error rate can be loaded into the NAND flash memory controller at the same time.
Based on the same inventive concept, this application also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any of the above methods when executing the program.
Based on the same inventive concept, this application also provides a solid-state drive that applies the error page identification method based on a three-dimensional flash memory storage structure.
Based on the same inventive concept, this application also provides a computer comprising the solid-state drive.
Beneficial effects of the present invention:
The present invention efficiently identifies physical pages with a higher error rate, using a machine learning algorithm to classify the physical pages in the current three-dimensional flash memory storage system. While reducing the error rate of the three-dimensional flash memory storage system, it performs real-time data migration with the copyback operation, which improves system response time.
Description of the drawings
FIG. 1 is a diagram of the overall error page identification structure in the error page identification method based on a three-dimensional flash memory storage structure of the present invention.
FIG. 2 is a schematic diagram of the data collection strategy in the error page identification method based on a three-dimensional flash memory storage structure of the present invention.
FIG. 3 shows the error page detection model in the error page identification method based on a three-dimensional flash memory storage structure of the present invention.
FIG. 4 is a schematic diagram of real-time data migration in the error page identification method based on a three-dimensional flash memory storage structure of the present invention.
FIG. 5 is an example of the error page identification method based on a three-dimensional flash memory storage structure of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement it; the embodiments given do not limit the present invention.
When improving the reliability of three-dimensional flash memory storage systems, the problem of a rising error rate is especially prominent. Take Open-Channel SSD (OCSSD) as an example of a three-dimensional flash memory storage system: it moves the Flash Translation Layer (FTL) of the flash storage structure to the host side and, through a dedicated PPA (Physical Page Address) I/O interface, exposes the physical structure of the underlying device directly to upper-layer user space. The upper layer therefore knows the underlying physical layout before it uses the device, whereas under a conventional flash storage architecture it cannot obtain exact information about the underlying device and can only operate on logical addresses, without reading physical address information. Consequently, under the OCSSD structure the error rate level can be identified accurately from the physical page read speed of the flash storage system, and the structure can also meet the computational load and computation speed that machine learning algorithms require.
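To make the idea of host-visible physical addressing concrete, the following is a minimal sketch of how a host might enumerate physical page addresses once the device geometry is exposed. The `PPA` class, its field names, and `enumerate_pages` are hypothetical and are not taken from the patent or from any real Open-Channel SSD interface; the geometry mirrors the small FIG. 5 example (2 channels, 1 plane per channel, 4 blocks per plane), with 8 pages per block chosen arbitrarily in place of n.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator


@dataclass(frozen=True)
class PPA:
    """Hypothetical physical page address: channel, plane, block, page."""
    channel: int
    plane: int
    block: int
    page: int


def enumerate_pages(channels: int, planes: int, blocks: int, pages: int) -> Iterator[PPA]:
    """Yield every physical page address for the given (assumed) geometry."""
    for ch, pl, blk, pg in product(range(channels), range(planes),
                                   range(blocks), range(pages)):
        yield PPA(ch, pl, blk, pg)


# 2 channels x 1 plane x 4 blocks x 8 pages; the page count is an arbitrary stand-in for n.
all_pages = list(enumerate_pages(channels=2, planes=1, blocks=4, pages=8))
print(len(all_pages), all_pages[0], all_pages[-1])
```

Because every page has an explicit address, per-page statistics such as read speed can be keyed directly on it, which is what the later steps rely on.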
This patent proposes an error page identification method, a strategy for enhancing system reliability. The physical page read speed under the OCSSD structure of the three-dimensional flash memory storage system is used to characterize the physical page error rate: the slower a physical page is read, the higher its error rate, and vice versa. A machine learning method classifies all physical pages into reliability levels, and physical pages with high error rates are singled out for real-time data migration, which reduces the error rate and improves the reliability of the three-dimensional flash memory storage system.
As shown in the overall error page identification scheme in FIG. 1, we propose an effective error identification management unit on the host side that works together with the file system and the FTL. Three strategies implement error rate identification and real-time data migration for every physical page in the three-dimensional flash memory storage system:
(1) Data collection. As shown in FIG. 2, while user space operates under different workloads, read speed data is collected for every physical page (the page's initial speed, the page's current speed, and the overall average speed of the flash memory), together with the characteristics of the current workload (random or sequential requests).
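The data collection step can be sketched as a small in-memory collector. This is only an illustration under stated assumptions: the patent does not specify data structures, units, or how the average is computed, so `ReadSpeedCollector`, `PageStats`, `Workload`, the MB/s unit, and the choice to average the current speeds of all observed pages are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import Dict, List, Tuple

PageKey = Tuple[int, int, int, int]  # (channel, plane, block, page), assumed layout


class Workload(Enum):
    RANDOM = "random"
    SEQUENTIAL = "sequential"


@dataclass
class PageStats:
    initial_speed: float   # first read speed observed for the page (MB/s assumed)
    current_speed: float   # most recent read speed observed for the page
    workloads: List[Workload] = field(default_factory=list)


class ReadSpeedCollector:
    """Keeps the initial and current read speed of every page that has been read."""

    def __init__(self) -> None:
        self.pages: Dict[PageKey, PageStats] = {}

    def record(self, ppa: PageKey, speed: float, workload: Workload) -> None:
        stats = self.pages.get(ppa)
        if stats is None:
            # First observation defines the page's initial speed.
            self.pages[ppa] = PageStats(speed, speed, [workload])
        else:
            stats.current_speed = speed
            stats.workloads.append(workload)

    def average_speed(self) -> float:
        """Overall average, taken here over the current speed of all observed pages."""
        return mean(s.current_speed for s in self.pages.values())


collector = ReadSpeedCollector()
collector.record((0, 0, 0, 2), 100.0, Workload.SEQUENTIAL)
collector.record((0, 0, 0, 2), 60.0, Workload.RANDOM)      # the page has slowed down
collector.record((0, 0, 0, 4), 100.0, Workload.SEQUENTIAL)
print(collector.average_speed())
```

The three quantities named in the text (initial speed, current speed, overall average) are exactly what the collector exposes; the workload tag is stored alongside so a model could use it as a feature.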
(2) Error page detection. The collected data is fed into the machine learning training model shown in FIG. 3. By comparing each physical page's current read speed with its initial read speed, the model classifies the reliability of all physical pages into five levels from high to low (Best, Good, Normal, Weak, and Worst). Pages in the Weak and Worst levels have relatively high error rates, meaning they have exceeded the correction capability of the error correction code, and they are the source of the rising error rate in three-dimensional flash memory storage systems. These two levels of physical pages therefore need to be flagged as having a high error rate. However, to avoid the excessive power consumption and system latency that unnecessary operations would cause, a physical page whose current speed is already greater than the overall average speed is not flagged.
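The flagging logic can be illustrated with a deliberately simple stand-in. The patent uses a trained machine learning model whose features and parameters are not given; the sketch below replaces it with fixed cut-offs on the ratio of current to initial read speed, purely to show the five levels and the average-speed exemption. The `Grade` enum, the threshold values, and the function names are assumptions, not the patent's model.

```python
from enum import IntEnum


class Grade(IntEnum):
    WORST = 0
    WEAK = 1
    NORMAL = 2
    GOOD = 3
    BEST = 4


# Hypothetical cut-offs on current_speed / initial_speed, highest first.
# A real system would learn the decision boundaries from collected data.
_THRESHOLDS = (
    (0.95, Grade.BEST),
    (0.85, Grade.GOOD),
    (0.70, Grade.NORMAL),
    (0.50, Grade.WEAK),
)


def grade_page(initial_speed: float, current_speed: float) -> Grade:
    """Map the speed ratio of one page to one of the five reliability levels."""
    if initial_speed <= 0:
        raise ValueError("initial_speed must be positive")
    ratio = current_speed / initial_speed
    for cutoff, grade in _THRESHOLDS:
        if ratio >= cutoff:
            return grade
    return Grade.WORST


def should_flag(initial_speed: float, current_speed: float, average_speed: float) -> bool:
    """Flag Weak/Worst pages, except those still faster than the overall average."""
    if current_speed > average_speed:
        return False  # exemption described in the text: avoid unnecessary migrations
    return grade_page(initial_speed, current_speed) <= Grade.WEAK


print(grade_page(100.0, 60.0).name, should_flag(100.0, 60.0, 80.0))
```

Using an `IntEnum` keeps the levels ordered, so "Weak or worse" is a single comparison; swapping the threshold table for a trained classifier would leave `should_flag` unchanged.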
(3) Real-time data migration. The basic NAND flash copyback command is modified at the MTD (Memory Technology Device) layer, with no changes to software or hardware outside the MTD layer. The copyback operation consists of two parts, a read and a program, and allows the target address to be specified for the program step. As shown in FIG. 4, physical pages in the two high-error-rate levels are stored in a list, and real-time data migration is performed on them when the storage system is idle. The copy of the physical page with the highest error rate (for example, page2#) is released to the NAND flash memory controller at its target address. The physical page with the next highest error rate (for example, page4) can be loaded into the NAND flash memory controller at the same time. Once programming of page2# completes, the remaining operations of programming page4 continue, which reduces page reads and writes during rewriting, improves storage system I/O performance, and reduces storage system overhead.
Figure 5 shows an example of the error page identification method. In this example, we assume that the three-dimensional flash memory storage system (OCSSD) has two channels (Channel0 and Channel1), each channel has one plane, each plane has four physical blocks (Block0, Block1, Block2, and Block3), and each block has n physical pages (Page0, Page1 … Pagen).

First, read-speed data is collected for every physical page in the three-dimensional flash memory system, chiefly the read speed of each physical page (the initial speed of the physical page, the current speed of the physical page, and the average speed of the flash memory as a whole). Next, the trained model divides the collected data into five classes (Best, Good, Normal, Weak, and Worst) and marks the physical pages with higher error rates; here these are the pages classified as Weak or Worst. Finally, using the copyback operation, the marked physical pages are stored in a list sorted by error rate, and during idle time the page with the highest error rate is migrated first. In Figure 5, page2 currently has the highest error rate, so the copy of that physical page (for example, page2#) undergoes the copyback operation in the storage register. page2# is released to the NAND flash controller at the specified target address, and page4, which has the next-highest error rate, can be loaded into the NAND flash controller at the same time. Once programming of page2# has finished, the remaining steps of programming page4 continue. This reduces page read and write operations during rewriting, improves the I/O performance of the storage system, and reduces the resources the storage system consumes.
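The classification and marking steps of this example can be sketched as follows. The text assigns the five classes with a trained model; the fixed ratio thresholds below are a hypothetical stand-in for that model, and the names `classify_page`, `needs_migration`, and `bounds` are assumptions made for illustration only. Read speeds are treated as "higher is better" values in arbitrary units.

```python
CLASSES = ["Best", "Good", "Normal", "Weak", "Worst"]


def classify_page(initial, current, bounds=(0.95, 0.85, 0.70, 0.50)):
    """Map a page's read speeds to one of the five reliability classes.

    The ratio current/initial is compared against descending thresholds.
    These thresholds are illustrative placeholders, not values from the
    source, which uses a trained model for this step.
    """
    ratio = current / initial
    for label, bound in zip(CLASSES, bounds):
        if ratio >= bound:
            return label
    return "Worst"


def needs_migration(initial, current, average, bounds=(0.95, 0.85, 0.70, 0.50)):
    """Return True if the page should be marked as having a high error rate.

    A page whose current speed already exceeds the overall average speed
    is never marked; otherwise only Weak and Worst pages are marked.
    """
    if current > average:
        return False
    return classify_page(initial, current, bounds) in ("Weak", "Worst")


if __name__ == "__main__":
    # (initial, current) read speeds per page, hypothetical values.
    pages = {"page2": (100, 40), "page4": (100, 60), "page7": (100, 98)}
    average = 80
    marked = [n for n, (i, c) in pages.items() if needs_migration(i, c, average)]
    print(marked)
```

The marked pages would then be sorted by error rate and handed to the copyback migration during idle time.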
The error page identification method based on a three-dimensional flash memory storage structure provided by the present invention has been described in detail above. The following points should also be noted:
Based on the three-dimensional flash memory storage system structure, namely the OCSSD structure, physical pages with higher error rates can be accurately identified from the read speed of the physical pages of the flash storage system. The structure also meets the computation volume and computation rate required by the machine learning algorithm, and supports the real-time data migration method based on the copyback operation. The error page identification method proposed by the present invention effectively identifies physical pages with higher error rates and migrates their data in real time, improving the reliability of the overall system as well as the system response time.
The embodiments described above are merely preferred embodiments given to fully explain the present invention, and the protection scope of the present invention is not limited to them. Equivalent substitutions or alterations made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.

Claims (10)

  1. An error page identification method based on a three-dimensional flash memory storage structure, characterized by comprising:
    data collection: while user space operates under different workloads, collecting read-speed data for each physical page and, at the same time, collecting the load characteristics of the current workload;
    error page detection: by comparing the current read speed of each physical page with its initial read speed, dividing all physical pages into five classes by reliability, from high to low, and marking the physical pages in the two classes with the highest error rates as having a high error rate;
    real-time data migration: modifying the basic NAND flash copyback command at the MTD layer, without modifying any software or hardware outside the MTD layer, and using the copyback operation to program the target address.
  2. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that the read speed comprises the initial speed of the physical page, the current speed of the physical page, and the average speed of the flash memory as a whole.
  3. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that the load characteristics of the current workload comprise random requests and sequential requests.
  4. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that if the current speed of a physical page is already greater than the overall average speed, the page is not marked as having a high error rate.
  5. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that all physical pages are divided by reliability, from high to low, into the following five classes: Best, Good, Normal, Weak, and Worst.
  6. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that the copyback operation comprises a copyback read command and a copyback program command.
  7. The error page identification method based on a three-dimensional flash memory storage structure according to claim 1, characterized in that the physical pages in the two classes with the highest error rates are stored in a list, and real-time data migration is performed on them while the storage system is idle.
  8. The error page identification method based on a three-dimensional flash memory storage structure according to claim 7, characterized in that the copy of the physical page with the highest error rate is released to the NAND flash controller at its target address, and the physical page with the next-highest error rate can be loaded into the NAND flash controller at the same time.
  9. A solid-state drive, characterized by applying the error page identification method based on a three-dimensional flash memory storage structure according to any one of claims 1 to 8.
  10. A computer, characterized by comprising the solid-state drive according to claim 9.
PCT/CN2020/110817 2020-01-07 2020-08-24 Error page identification method based on three-dimensional flash storage structure WO2021139166A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010015474.9A CN111240887A (en) 2020-01-07 2020-01-07 Error page identification method based on three-dimensional flash memory storage structure
CN202010015474.9 2020-01-07

Publications (1)

Publication Number Publication Date
WO2021139166A1 true WO2021139166A1 (en) 2021-07-15

Family

ID=70874317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110817 WO2021139166A1 (en) 2020-01-07 2020-08-24 Error page identification method based on three-dimensional flash storage structure

Country Status (2)

Country Link
CN (1) CN111240887A (en)
WO (1) WO2021139166A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111240887A (en) * 2020-01-07 2020-06-05 苏州大学 Error page identification method based on three-dimensional flash memory storage structure
CN112732182A (en) * 2020-12-29 2021-04-30 北京浪潮数据技术有限公司 NAND data writing method and related device
US20220343177A1 (en) * 2021-04-26 2022-10-27 Micron Technology, Inc. Artificial neural network remapping in memory
CN114281271B (en) * 2022-03-07 2022-05-13 北京得瑞领新科技有限公司 Method for judging reliability of NAND flash memory data, storage medium and storage device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120272123A1 (en) * 2011-04-21 2012-10-25 Phison Electronics Corp. Data writing method, memory controller and memory storage apparatus
CN104615503A (en) * 2015-01-14 2015-05-13 广东省电子信息产业集团有限公司 Flash error detection method and device for reducing influence on performance of interface of storage
CN105677242A (en) * 2015-12-31 2016-06-15 杭州华为数字技术有限公司 Hot and cold data separation method and device
CN107220185A (en) * 2017-05-23 2017-09-29 建荣半导体(深圳)有限公司 Date storage method, device and flash chip based on flash memory
CN111240887A (en) * 2020-01-07 2020-06-05 苏州大学 Error page identification method based on three-dimensional flash memory storage structure

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7512847B2 (en) * 2006-02-10 2009-03-31 Sandisk Il Ltd. Method for estimating and reporting the life expectancy of flash-disk memory
US8365041B2 (en) * 2010-03-17 2013-01-29 Sandisk Enterprise Ip Llc MLC self-raid flash data protection scheme
CN102163165B (en) * 2011-05-26 2012-11-14 忆正存储技术(武汉)有限公司 Error estimation module and estimation method thereof for flash memory
CN102591790B (en) * 2011-12-30 2015-11-25 记忆科技(深圳)有限公司 Data based on solid state hard disc store snapshot implementing method and solid state hard disc
CN108415851B (en) * 2018-01-18 2021-02-12 珠海全志科技股份有限公司 Method and device for improving starting speed of flash memory equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, MIN ET AL.: "Implicit Programming: A Fast Programming Strategy for NAND Flash Memory Storage Systems Adopting Redundancy Methods", IEEE EMBEDDED SYSTEMS LETTERS, vol. 9, no. 2, 30 June 2017 (2017-06-30), XP011650950, DOI: 10.1109/LES.2017.2670140 *

Also Published As

Publication number Publication date
CN111240887A (en) 2020-06-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911591

Country of ref document: EP

Kind code of ref document: A1