TW201915736A - Methods of proactive ecc failure handling - Google Patents

Publication number
TW201915736A
Authority
TW
Taiwan
Prior art keywords
command
error correction
data
open channel
solid state
Prior art date
Application number
TW107109044A
Other languages
Chinese (zh)
Other versions
TWI670595B (en)
Inventor
林聖嵂
Original Assignee
慧榮科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 慧榮科技股份有限公司 filed Critical 慧榮科技股份有限公司
Priority to CN201810668594.1A priority Critical patent/CN109558266B/en
Priority to US16/034,915 priority patent/US11016841B2/en
Publication of TW201915736A publication Critical patent/TW201915736A/en
Application granted granted Critical
Publication of TWI670595B publication Critical patent/TWI670595B/en

Abstract

An embodiment of a method and apparatus of proactive ECC failure handling is introduced. The method comprises: fetching a completion element from a completion queue; determining whether an unsafe value is stored in an execution reply table of the completion element; if so, reassigning a new physical address to the user data corresponding to the unsafe value; and issuing a data programming command to a submission queue in order to program the user data to the new physical address, wherein the submission and completion queues are both located in a host.

Description

Proactive error correction failure handling method

The present invention relates to flash memory, and more particularly to a method of proactive error correction failure handling for flash memory and an apparatus using the method.

Flash memory devices are generally classified into NOR flash devices and NAND flash devices. A NOR flash device is a random access device: a host can present any address on the address pins and promptly obtain the user data stored at that address from the data pins of the NOR flash device. A NAND flash device, by contrast, is not random access but serial access. It cannot be accessed at an arbitrary address the way a NOR flash device can; instead, the host must write a sequence of bytes into the NAND flash device to define the type of the requested command (e.g., read, write, erase) and the address for that command. The address may point to a page (the smallest unit of data for a write operation in flash memory) or a block (the smallest unit of data for an erase operation in flash memory). In practice, a NAND flash device typically reads or writes complete pages of data from or to the memory cells. After a full page of data has been read from the array into a buffer in the device, the host can access the data byte by byte or word by word by clocking out the contents sequentially using a strobe signal.

An open-channel solid state disk (SSD) does not implement a flash translation layer (FTL) on the device side; instead, management of the physical SSD is handed over to the host. Unlike a conventional SSD, an open-channel SSD exposes its internal parallelism to the host and allows the host to manage it. An open-channel SSD contains an encoder that generates an error correction code from the data the host wants to write, and writes the data together with the error correction code into the storage unit. It also contains an error correction circuit that uses the error correction code to correct errors in read data without the host's involvement. However, as the number of accesses to the storage unit grows, the number of error bits in the stored data rises. Because the host is unaware of the error-rate trend of the read data, it cannot instruct the open-channel SSD to perform a data movement operation that moves the data to a less-used location.

Therefore, a method of proactive error correction failure handling for flash memory, and an apparatus using the method, are needed to address the problem described above.

An embodiment of the invention introduces a method of proactive error correction failure handling, comprising: fetching a completion element from a completion queue; determining whether an execution reply table of the completion element contains an unsafe value and, if so, reassigning a physical address to the user data corresponding to the unsafe value; and issuing a data write command to a submission queue to write the user data to the reassigned physical address, wherein the completion queue and the submission queue are both located in a host.
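
The host-side steps above can be illustrated with a minimal sketch. It assumes that the execution reply table is a bitmap with one bit per data unit and that a set bit means "unsafe"; the names `CompletionElement`, `handle_completion` and `free_addresses`, and the tuple used as a write command, are hypothetical and not taken from the patent.

```python
from collections import deque
from dataclasses import dataclass

UNSAFE = 1  # assumed encoding: a set bit marks the data unit as unsafe


@dataclass
class CompletionElement:
    command_id: int
    reply_table: int  # assumed bitmap: bit k describes data unit k
    units: int        # number of data units covered by the command


def handle_completion(completion_queue, submission_queue, l2p, lbas, data, free_addresses):
    """Fetch one completion element and re-program every unit flagged unsafe."""
    element = completion_queue.popleft()
    for k in range(element.units):
        if (element.reply_table >> k) & 1 == UNSAFE:
            new_address = free_addresses.pop(0)   # reassign a physical address
            l2p[lbas[k]] = new_address            # the host maintains the L2P table
            submission_queue.append(("write", new_address, data[k]))
    return element


# Data unit 2 of a four-unit read was reported as unsafe.
cq = deque([CompletionElement(command_id=7, reply_table=0b0100, units=4)])
sq = deque()
l2p = {100: 10, 101: 11, 102: 12, 103: 13}
handle_completion(cq, sq, l2p, [100, 101, 102, 103], [b"a", b"b", b"c", b"d"], [900, 901])
```

In this sketch only the flagged unit is rewritten; the other three mappings are left untouched.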

An embodiment of the invention introduces a method of proactive error correction failure handling, comprising: receiving a parameter setting command; setting an error bit threshold according to the parameter setting command; receiving a data read command; reading user data from a source address according to the data read command and, when the number of error bits in the user data is greater than or equal to the error bit threshold, setting the bit of an execution reply table that corresponds to the user data to an unsafe value; and writing a completion element containing the execution reply table to a completion queue, wherein the parameter setting command and the data read command are both received by an open-channel solid state disk.
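
The device-side steps above can be sketched in the same spirit. This is a toy model under stated assumptions: the reply table is again taken to be a one-bit-per-unit bitmap, the per-unit error counts are passed in directly instead of coming from an ECC decoder, and `OpenChannelSsdModel`, `set_parameter` and `read` are hypothetical names.

```python
from dataclasses import dataclass, field

UNSAFE = 1  # assumed one-bit encoding per data unit


@dataclass
class OpenChannelSsdModel:
    """Toy device model; real firmware would get error counts from the ECC decoder."""
    error_bit_threshold: int = 0
    completion_queue: list = field(default_factory=list)

    def set_parameter(self, threshold):
        # Parameter setting command: record the error bit threshold.
        self.error_bit_threshold = threshold

    def read(self, command_id, error_bit_counts):
        # Data read command: error_bit_counts[k] stands in for the number of
        # error bits found in data unit k.
        reply_table = 0
        for k, errors in enumerate(error_bit_counts):
            if errors >= self.error_bit_threshold:
                reply_table |= UNSAFE << k
        self.completion_queue.append(
            {"command_id": command_id, "reply_table": reply_table}
        )
        return reply_table


ssd = OpenChannelSsdModel()
ssd.set_parameter(60)
table = ssd.read(command_id=1, error_bit_counts=[3, 60, 59, 80])
```

With a threshold of 60, units 1 and 3 are flagged because the comparison is "greater than or equal to", while unit 2 (59 error bits) is not.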

110‧‧‧Host

120‧‧‧Data buffer

130‧‧‧Open-channel solid state disk

133‧‧‧Processing unit

135‧‧‧Flash controller

137‧‧‧Access interface

137_0~137_j‧‧‧Access sub-interfaces

139‧‧‧Storage unit

139_0_0~139_j_i‧‧‧Storage sub-units

310_0‧‧‧Data line

320_0_0~320_0_i‧‧‧Chip enable control signals

410, 430, 450, 470‧‧‧Input/output channels

410_0~410_m, 430_0~430_m, 450_0~450_m, 470_0~470_m‧‧‧Data planes

490_0~490_n‧‧‧Super pages

P#0~P#(n)‧‧‧Physical pages

510‧‧‧Submission queue

530‧‧‧Completion queue

S611~S636‧‧‧Method steps

710‧‧‧Device identification phase

730‧‧‧Parameter setting phase

S711~S735‧‧‧Method steps

S811~S827‧‧‧Method steps

900‧‧‧Completion element

910‧‧‧Execution reply table

920‧‧‧Status field

930‧‧‧Command identifier

FIG. 1 is a schematic diagram of the system architecture of a flash memory according to an embodiment of the invention.

FIG. 2 is a block diagram of an access interface and a storage unit according to an embodiment of the invention.

FIG. 3 is a schematic diagram of the connection between one access sub-interface and multiple storage sub-units according to an embodiment of the invention.

FIG. 4 is a schematic diagram of a storage unit.

FIG. 5 is a schematic diagram of command queues.

FIG. 6 is a flowchart of the steps for executing an administration command or a data access command.

FIG. 7 is a flowchart of a device parameter method for a flash memory according to an embodiment of the invention.

FIG. 8 is a flowchart of a method of proactive error correction failure handling according to an embodiment of the invention.

FIG. 9 is a diagram of the data format of a completion element.

The following description presents preferred implementations of the invention. It is intended to convey the basic spirit of the invention and is not intended to limit it. The actual scope of the invention is defined by the claims that follow.

It must be understood that the terms "comprise" and "include", as used in this specification, indicate the presence of specific technical features, values, method steps, operations, elements and/or components, but do not preclude the addition of further technical features, values, method steps, operations, elements, components, or any combination of the above.

Terms such as "first", "second" and "third" are used in the claims to modify claim elements. They do not indicate a priority or precedence, that one element comes before another, or the chronological order in which method steps are performed; they serve only to distinguish elements that share the same name.

FIG. 1 is a schematic diagram of the architecture of an open-channel SSD system 100 according to an embodiment of the invention. The architecture of the open-channel SSD system 100 includes a host 110, a data buffer 120 and an open-channel solid state disk (SSD) 130. During operation, the host 110 may create, as needed, queues, a physical storage mapping table (also called an L2P, Logical-to-Physical, table) and usage records. The system architecture may be implemented in electronic products such as personal computers, laptop PCs, tablets, mobile phones, digital cameras and digital camcorders. The data buffer 120, the queues, the physical storage mapping table and the usage records may be implemented in specific regions of a random access memory (RAM). The host 110 communicates with the open-channel SSD 130 through an open-channel SSD NVMe (Non-Volatile Memory express) interface. The host 110 may be implemented in many ways, for example with general-purpose hardware (e.g., a single processor, multiple processors with parallel processing capability, a graphics processor, or another processor with computing capability) that provides the functions described below when executing instructions, macrocode or microcode. The host 110 may include an arithmetic and logic unit (ALU) and a bit shifter. The ALU performs Boolean operations (such as AND, OR, NOT, NAND, NOR, XOR and XNOR) and arithmetic operations (such as addition, subtraction, multiplication and division), while the bit shifter performs shift operations and bit rotations. The open-channel SSD NVMe specification, for example version 1.2 published in April 2016, supports multiple I/O channels, each connected to logical unit numbers (LUNs) that correspond respectively to multiple storage sub-units of the storage unit 139. Under the open-channel SSD NVMe specification, the host 110 integrates the flash translation layer (FTL) that was previously implemented on the device side, in order to optimize the workload. A conventional FTL maps the logical block addresses (LBAs) recognized by the host or file system to physical addresses of the storage unit 139 (also known as logical-to-physical mapping). Under the open-channel SSD NVMe specification, the host 110 may instruct the open-channel SSD 130 to store user data at a physical address of the storage unit 139. The host 110 is therefore responsible for maintaining the physical storage mapping table and for recording the physical address of the storage unit 139 at which the user data of each logical block address is actually stored.

The open-channel SSD 130 includes a processing unit 133. The processing unit 133 may communicate with the host 110 using the open-channel SSD NVMe protocol to receive data access commands containing physical addresses, and instructs the flash controller 135 to perform an erase, read or write according to each data access command. Note that the processing unit 133 may be implemented with a lightweight general-purpose processor.

The open-channel SSD 130 further includes a flash controller 135, an access interface 137 and a storage unit 139. The flash controller 135 communicates with the storage unit 139 through the access interface 137, specifically using a double data rate (DDR) protocol such as Open NAND Flash Interface (ONFI), DDR Toggle, or another interface. Through the access interface 137, the flash controller 135 of the open-channel SSD 130 writes user data to a designated destination address (a physical address) of the storage unit 139 and reads user data from a designated source address (a physical address) of the storage unit 139. The access interface 137 uses several electrical signals to coordinate the transfer of data and commands between the flash controller 135 and the storage unit 139, including data lines, a clock signal and control signals. The data lines carry commands, addresses, and data read out or to be written; the control signal lines carry control signals such as chip enable (CE), address latch enable (ALE), command latch enable (CLE) and write enable (WE). The processing unit 133 and the flash controller 135 may exist separately or be integrated into a single chip.

The flash controller 135 includes an error correction code (ECC) encoder and an ECC decoder. In a write operation, the ECC encoder applies an encoding algorithm to the data sent from the host 110 to generate an error correction code, and the host's data together with the error correction code (collectively, the user data) is written to the storage unit 139. In a read operation, the ECC decoder uses the corresponding decoding algorithm to check the correctness of the user data read from the storage unit 139 and attempts to correct any error bits in it. If the user data is correct, the flash controller 135 simply discards the error correction code and replies to the host 110 with the data as originally read, through the processing unit 133. If the user data contains error bits that the ECC decoder has corrected, the flash controller 135 discards the corrected error correction code and replies to the host 110 with the corrected data through the processing unit 133. If there are too many error bits to recover, the flash controller 135 replies to the host 110 with a data read error message through the processing unit 133. The error correction code may be a low-density parity-check (LDPC) code, a BCH (Bose-Chaudhuri-Hocquenghem) code, or the like. In general, an LDPC code offers better error-bit correction capability than a BCH code: for example, for every 1 KB of user data, a BCH code can correct up to 76 error bits, whereas an LDPC code can correct up to 120 error bits.

At system boot, the host 110 obtains from the open-channel SSD 130 the operating parameters needed to control the open-channel SSD 130, such as the number of blocks, the number of bad blocks, the latency, the total number of I/O channels, and whether the error correction function is enabled.

The storage unit 139 may include multiple storage sub-units, each of which communicates with the flash controller 135 through an associated access sub-interface. One or more storage sub-units may be packaged in a single die. FIG. 2 is a block diagram of the access interface and the storage unit according to an embodiment of the invention. The open-channel SSD 130 may include j+1 access sub-interfaces 137_0 to 137_j, each connected to i+1 storage sub-units. An access sub-interface together with the storage sub-units connected behind it may be referred to collectively as an I/O channel and may be identified by a logical unit number. In other words, i+1 storage sub-units share one access sub-interface. For example, when the open-channel SSD 130 contains 4 I/O channels (j=3) and each I/O channel connects 4 storage sub-units (i=3), the open-channel SSD 130 has 16 storage sub-units 139_0_0 to 139_j_i in total. The flash controller 135 may drive one of the access sub-interfaces 137_0 to 137_j to read data from a designated storage sub-unit. Each storage sub-unit has its own chip enable (CE) control signal. In other words, to read data from a designated storage sub-unit, the associated access sub-interface must be driven to assert the chip enable control signal of that storage sub-unit. FIG. 3 is a schematic diagram of the connection between one access sub-interface and multiple storage sub-units according to an embodiment of the invention. Through the access sub-interface 137_0, the flash controller 135 may use the independent chip enable control signals 320_0_0 to 320_0_i to select one of the connected storage sub-units 139_0_0 to 139_0_i, and then read user data from the source address of the selected storage sub-unit over the shared data line 310_0.
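
The channel and sub-unit arithmetic above (j+1 access sub-interfaces, each with i+1 storage sub-units, one chip enable signal per sub-unit) can be checked with a short sketch. The function names are hypothetical, and the chip enable signals are modeled as plain booleans with no claim about the electrical polarity used by real NAND parts.

```python
def storage_subunits(channels, per_channel):
    """Enumerate (channel, sub-unit) index pairs, i.e. 139_j_i in the text."""
    return [(j, i) for j in range(channels) for i in range(per_channel)]


def chip_enable_signals(selected, per_channel):
    """One CE flag per sub-unit on a channel; only the selected one is asserted."""
    if not 0 <= selected < per_channel:
        raise ValueError("selected sub-unit is out of range")
    return [i == selected for i in range(per_channel)]


# The example from the text: 4 I/O channels (j = 3), 4 sub-units each (i = 3).
subunits = storage_subunits(4, 4)
enables = chip_enable_signals(2, 4)
```

Because the data line of a channel is shared, exactly one entry of `enables` is true at a time in this model.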

FIG. 4 is a schematic diagram of the storage unit 139. The storage unit 139 includes multiple data planes 410_0 to 410_m, 430_0 to 430_m, 450_0 to 450_m and 470_0 to 470_m, with one or more data planes placed in each logical unit number. The data planes 410_0 to 410_m and their shared access sub-interface are referred to as I/O channel 410; the data planes 430_0 to 430_m and their shared access sub-interface are referred to as I/O channel 430; the data planes 450_0 to 450_m and their shared access sub-interface are referred to as I/O channel 450; and the data planes 470_0 to 470_m and their shared access sub-interface are referred to as I/O channel 470, where m may be an integer power of 2 (e.g., 2, 4, 8, 16 or 32), and the I/O channels 410, 430, 450 and 470 may be identified by logical unit numbers. Each of the data planes 410_0 to 470_m includes multiple physical blocks, each physical block includes multiple pages P#0 to P#(n), and each page includes one or more sectors, for example 4, where n may be 767, 1023, or the like. Each page includes multiple NAND memory cells, which may be single-level cells (SLCs), multi-level cells (MLCs), triple-level cells (TLCs) or quad-level cells (QLCs). In some embodiments, when each NAND memory cell is an SLC capable of recording 2 states, the pages P#0 of the data planes 410_0 to 470_0 may virtually form a super page 490_0, the pages P#1 of the data planes 410_0 to 470_0 may virtually form a super page 490_1, and so on. In other embodiments, when each NAND memory cell is an MLC capable of recording 4 states, one physical word line may include page P#0 (referred to as the MSB, Most Significant Bit, page), page P#1 (referred to as the LSB, Least Significant Bit, page), and so on. In still other embodiments, when each NAND memory cell is a TLC capable of recording 8 states, one physical word line may include page P#0 (the MSB page), page P#1 (referred to as the CSB, Center Significant Bit, page) and page P#2 (the LSB page). When each NAND memory cell is a QLC capable of recording 16 states, a TSB (Top Significant Bit) page is included in addition to the MSB, CSB and LSB pages.
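
The state counts quoted above follow from the number of bits stored per cell (2, 4, 8 and 16 states for 1, 2, 3 and 4 bits), and a super page simply groups the same page index across several data planes. The sketch below illustrates both; the helper names and the string labels for the planes are assumptions made for illustration.

```python
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def states_per_cell(cell_type):
    """A cell storing b bits must distinguish 2**b states."""
    return 2 ** BITS_PER_CELL[cell_type]


def super_page(page_index, planes):
    """Group page P#page_index of every listed plane into one virtual super page."""
    return [(plane, page_index) for plane in planes]


states = {name: states_per_cell(name) for name in BITS_PER_CELL}
# Super page 490_0: page P#0 of plane 0 on each of the four I/O channels.
sp0 = super_page(0, ["410_0", "430_0", "450_0", "470_0"])
```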

When the storage unit 139 operates, the page is the smallest unit for writing (programming) data, with a size of, for example, 4 KB; in this case a physical address may be expressed as a page number. If a page contains multiple sectors, for example a 16 KB page in which each sector stores 4 KB of data, the sector may be the smallest unit of data management; in this case a physical address may be expressed as the sector number of a page or the offset of the sector within the page. In addition, the block is generally the smallest unit for erasing data.
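
As a worked example of the 16 KB page with 4 KB sectors described above, the following sketch converts a byte offset into a page number and a sector number within that page. Addressing by byte offset within a block is an assumption made only for the arithmetic; the text itself addresses pages and sectors directly.

```python
PAGE_SIZE = 16 * 1024    # example page size from the text
SECTOR_SIZE = 4 * 1024   # example sector size from the text
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE


def locate(byte_offset):
    """Return (page number, sector number within the page) for a byte offset."""
    page, remainder = divmod(byte_offset, PAGE_SIZE)
    return page, remainder // SECTOR_SIZE


position = locate(20 * 1024)  # 20 KB into the block
```

Here 20 KB is 16 KB plus 4 KB, so the offset falls in the second sector (index 1) of the second page (index 1).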

Physical blocks may be classified by usage state into active blocks, data blocks and idle blocks. An active block is a physical block to which data is currently being written, i.e., one in which End of Block information has not yet been written. The active block may be chosen according to parameters such as the lowest erase count or the oldest user-data write time. When an active block has been filled with user data, or no more user data will be written to it, End of Block information is written to the active block and the active block is thereafter treated as a data block. An idle block, which stores no valid user data, may be selected to become an active block. Usually, after an idle block is selected, an erase operation must be performed before it can become an active block.

In some embodiments, the physical address that the host 110 sends to the open-channel SSD 130 may include an I/O channel number, a logical unit number, a data plane number, a physical block number, a physical page number and a sector number, indicating that user data is to be read from or written to a specific sector of a specific physical page of a specific physical block of a specific data plane of a specific I/O channel. In some embodiments, a column number is sometimes used in place of the sector number. In other embodiments, the physical address that the host 110 sends to the open-channel SSD 130 may include a logical unit number, a data plane number and a physical block number, indicating that a specific data block of a specific data plane of a specific I/O channel is to be erased.
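
One way to picture such a physical address is as a set of bit fields packed into a single integer. The sketch below is illustrative only: the field widths in `FIELDS` are hypothetical, chosen so the example values fit, and are not taken from the patent or from any open-channel SSD specification.

```python
# (field name, width in bits); widths are hypothetical.
FIELDS = (
    ("channel", 4),
    ("lun", 4),
    ("plane", 4),
    ("block", 12),
    ("page", 12),
    ("sector", 4),
)


def pack(**values):
    """Pack the address fields into one integer, most significant field first."""
    packed = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack(packed):
    """Recover the field values from a packed address."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values


address = dict(channel=2, lun=1, plane=3, block=100, page=767, sector=2)
roundtrip = unpack(pack(**address))
```

An erase command would only need the leading fields down to the block number, which matches the shorter address form described in the text.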

FIG. 5 is a schematic diagram of the command queues. The queues 115 may include a submission queue 510 and a completion queue 530, which temporarily store host commands and completion elements, respectively. Each of the submission queue 510 and the completion queue 530 contains a collection of entries. Each entry of the submission queue 510 stores one host command, which may be an administration command, such as a device identification or parameter setting command, or an I/O command, such as an erase, read or write command, also called a data access command.

Each entry of the completion queue 530 stores a completion element associated with one administration command or data access command; the completion element functions like an acknowledgment. The entries of the collection are stored in order. The basic operating principle of the collection is that entries are added at the end position (the tail of the queue), which is called enqueuing, and the entry at the start position (the head of the queue) is executed, which is called dequeuing; the number of entries enqueued or dequeued at one time may be one or more. The first command or message added to the submission queue 510 or the completion queue 530 will later also be the first to be replaced or updated. The host 110 may write administration or data access commands to the submission queue 510, and the processing unit 133 reads (fetches) the earliest-arrived administration or data access command from the submission queue 510 and executes it. After the administration or data access command has been executed, the processing unit 133 writes a completion element to the completion queue 530, and the host 110 may read (fetch) the completion element to determine the execution result of the administration or data access command.
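
The head and tail behavior described above is that of a first-in, first-out ring. A minimal sketch follows; the class `RingQueue` and its explicit `count` field are simplifications chosen for readability and are not meant to reproduce how an NVMe queue tracks fullness.

```python
class RingQueue:
    """Fixed-size FIFO: entries are added at the tail and removed at the head."""

    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0    # index of the next entry to dequeue
        self.tail = 0    # index of the next free slot
        self.count = 0   # simplification: track occupancy explicitly

    def enqueue(self, item):
        if self.count == len(self.entries):
            raise OverflowError("queue is full")
        self.entries[self.tail] = item
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        item = self.entries[self.head]
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return item


sq = RingQueue(4)
for command in ("identify", "set_parameter", "read"):
    sq.enqueue(command)
first = sq.dequeue()
```

The earliest command written is the first one fetched, which is the ordering the paragraph above relies on.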

FIG. 6 is a flowchart of the steps for executing an administration command or a data access command. The host 110 generates an administration command or a data access command and writes it to the submission queue 510 (step S611). A data access command carries physical address information, namely a source address or a destination address, which points to a physical address of the storage unit 139 or the data buffer 120, such as a specific block, page or sector address, rather than a logical block address. Next, the host 110 issues a submission doorbell to the processing unit 133 (step S612) to notify the processing unit 133 that an administration command or a data access command has been written to the submission queue 510, and updates the tail value of the submission queue 510. Note that steps S611 and S612 may together be described as the host 110 issuing an administration command or a data access command to the open-channel SSD 130. After receiving the submission doorbell (step S631), the processing unit 133 reads the administration command or data access command at the head of the submission queue 510 (step S632) and instructs the flash controller 135 according to that command to complete the designated operation (e.g., device identification, parameter setting, erase, data read or write) (step S633).

Note that steps S631 and S632 may together be referred to as the open-channel solid state drive 130 receiving an administration command or data access command issued by the host device 110. When the specified operation is complete, the processing unit 133 generates a completion element and writes it into the completion queue 530 (step S634) to inform the host device 110 of the execution status of the operation corresponding to the specific command, and issues an interrupt to the host device 110 (step S635). After receiving the interrupt, the host device 110 reads the completion element at the head of the completion queue 530 (step S613) and then issues a completion doorbell to the processing unit 133 (step S614). After receiving the completion doorbell (step S636), the processing unit 133 updates the head value of the completion queue 530. Note that steps S634 and S635 may together be referred to as the open-channel solid state drive 130 replying to the host device 110 with the result of executing the command, and that steps S613 and S614 may together be referred to as the host device 110 receiving that result from the open-channel solid state drive 130.
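
The ordering of the steps in FIG. 6 can be traced with a small in-memory model. This is a sketch under stated assumptions: the doorbells and the interrupt are modeled as direct function calls, the flash operation is stubbed out, and all function and field names are hypothetical.

```python
from collections import deque

sq = deque()      # stands in for the submission queue 510
cq = deque()      # stands in for the completion queue 530
log = []          # order in which the notifications occur
results = {}      # execution status seen by the host, keyed by command identifier


def host_submit(command):
    sq.append(command)                       # S611: write the command at the SQ tail
    log.append("S612 submission doorbell")
    device_on_submission_doorbell()          # S612: ring the submission doorbell


def device_on_submission_doorbell():
    command = sq.popleft()                   # S631-S632: fetch from the SQ head
    status = "success"                       # S633: flash operation stubbed out
    cq.append({"cid": command["cid"], "status": status})   # S634: write the CE
    log.append("S635 interrupt")
    host_on_interrupt()                      # S635: interrupt the host


def host_on_interrupt():
    element = cq.popleft()                   # S613: read the CE at the CQ head
    results[element["cid"]] = element["status"]
    log.append("S614 completion doorbell")   # S614; the device then updates the CQ head (S636)


host_submit({"cid": 7, "opcode": "read"})
```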

In steps S612 and S614, the host device 110 may set the corresponding registers to issue the submission doorbell and the completion doorbell, respectively, to the processing unit 133.

A single data access command may process multiple pieces of user data, for example 64. In that case the completion element may include a 64-bit execution result table in which each bit indicates the execution result of one piece of user data, for example "0" for success and "1" for failure. A data access command contains an opcode field that indicates the command type (for example, erase, read, or write). A completion element contains a status field that stores the execution status of the corresponding data access command (for example, success or failure). In addition, because the processing unit 133 may execute data access commands out of order or in priority order, both the data access command and the completion element contain a command identifier that allows the host device 110 and the processing unit 133 to associate each completion element with a specific data access command.
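
As a minimal sketch of how a host might use these fields, the snippet below matches a completion element to its pending command by command identifier and lists the entries whose bit is set. The dictionary layout, the helper name `flagged_indices`, and the assumption that bit i corresponds to the i-th piece of user data are illustrative choices, not details taken from the specification.

```python
def flagged_indices(result_table, entries=64):
    """Return the indices whose bit is 1 in the execution result table.

    Assumes bit i describes the i-th piece of user data of the command.
    """
    return [i for i in range(entries) if (result_table >> i) & 1]


# Pending commands keyed by command identifier, so that completions arriving
# out of order can still be matched to the command that produced them.
pending = {0x0010: "read A", 0x0011: "read B"}

completion = {"cid": 0x0011, "result_table": 0b1001}   # bits 0 and 3 are set
command = pending.pop(completion["cid"])
flagged = flagged_indices(completion["result_table"])
```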

For example, an idle block must be erased before it can be written and become an active block. The host device 110 may write an erase command into the submission queue 510 (step S611) to direct the open-channel solid state drive 130 (specifically, the processing unit 133) to erase a specific idle block in a specific I/O channel. In response to the erase command, the processing unit 133 directs the flash controller 135 to drive the access interface 137 and complete the specified erase operation in the storage unit 139 (step S633). When the erase operation is complete, the processing unit 133 writes a completion element into the completion queue 530 (step S634) to inform the host device 110 that the corresponding erase operation has finished.

As another example, the host device 110 may write a data read command into the submission queue 510 (step S611) to direct the open-channel solid state drive 130 to read user data from (a specific sector of) a specific physical page in a specific physical block in a specific plane of a specific I/O channel. In response to the data read command, the processing unit 133 directs the flash controller 135 to drive the access interface 137, read the user data from the specified source address in the storage unit 139, and store the user data in the data buffer 120 at the location specified by the data read command (step S633). When the read operation is complete, the processing unit 133 writes a completion element into the completion queue 530 (step S634) to inform the host device 110 that the corresponding data read operation has finished.

As a further example, the host device 110 may store the user data to be written in the data buffer 120 and write a write command into the submission queue 510 (step S611) to direct the open-channel solid state drive 130 to write the user data in the data buffer 120 to (a specific sector of) a specific physical page in a specific active block. The write command carries the destination address (a physical address) of the specific physical page (or sector) and the source address (a physical address) of the user data. In response to the write command, the processing unit 133 reads the user data from the source address in the data buffer 120 and directs the flash controller 135 to drive the access interface 137 and program the user data to the destination address in the storage unit 139 specified by the write command (step S633). When the write operation is complete, the processing unit 133 writes a completion element into the completion queue 530 (step S634) to inform the host device 110 that the corresponding write operation has finished.
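
The three examples above share one idea: the host addresses the flash by physical location rather than by logical block address. The sketch below shows one way such commands could be represented. The field names, the address hierarchy (channel, plane, block, page, sector), and the helper functions are assumptions for illustration and are not a command format defined by the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhysicalAddress:
    """Hypothetical physical flash location."""
    channel: int
    plane: int
    block: int
    page: int
    sector: int = 0


@dataclass(frozen=True)
class Command:
    cid: int                               # command identifier
    opcode: str                            # "erase", "read" or "write"
    flash_address: PhysicalAddress         # location in the storage unit
    buffer_address: Optional[int] = None   # location in the data buffer, if any


def make_erase(cid, block_address):
    # An erase needs only the flash location of the block.
    return Command(cid, "erase", block_address)


def make_read(cid, source, buffer_address):
    # Source is in the flash; the data lands in the data buffer.
    return Command(cid, "read", source, buffer_address)


def make_write(cid, buffer_address, destination):
    # Source is in the data buffer; destination is in the flash.
    return Command(cid, "write", destination, buffer_address)


target = PhysicalAddress(channel=1, plane=0, block=42, page=3)
write_cmd = make_write(cid=5, buffer_address=0x1000, destination=target)
```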

Although FIG. 5 shows only two queues 510 and 530, those skilled in the art may divide the submission queue 510 into an administration submission queue and an I/O submission queue, which temporarily store administration commands and data access commands from the host device 110, respectively, and divide the completion queue 530 into an administration completion queue and an I/O completion queue, which store the completion elements associated with administration commands and data access commands, respectively.

As the erase count of a block increases, its data retention capability gradually weakens, so user data stored in its physical pages may contain more error bits. Because the error correction decoder automatically checks and corrects error bits in the user data, the host device 110 cannot tell how far the error bit rate of a physical page has risen and therefore cannot take appropriate preventive action. Eventually, when the error correction decoder can no longer correct the error bits in the user data, the open-channel solid state drive 130 returns a data read error message to the host device 110, and the host device 110 can only start an advanced data correction mechanism, for example using a redundant array of independent disks (RAID), to recover the user data. RAID recovery, however, consumes a large amount of computing resources and time on both the host device 110 and the open-channel solid state drive 130, as well as data bandwidth between them. To avoid this drawback, embodiments of the invention propose an error prevention mechanism that reduces the likelihood or frequency of data read errors.

Broadly speaking, the error prevention mechanism of the invention first obtains the operating parameters of the open-channel solid state drive 130 and then sets them, where the operating parameters include an error bit threshold. These steps are combined with a proactive ECC failure handling method to achieve the purpose of the invention.

FIG. 7 is a flowchart of the method of the invention for obtaining and setting device parameters of the flash memory. The overall flow comprises a device identification phase 710 and a parameter setting phase 730. The device identification phase 710 includes steps S711 to S715. The host device 110 writes a device identification command into the submission queue 510 (step S711). The device identification command requests the open-channel solid state drive 130 (specifically, the processing unit 133) to provide its operating parameters, including the number of blocks, the number of bad blocks, the latency, the total number of I/O channels, whether the error correction function is enabled, and the maximum capability value of the error correction function (in error bits per data length); the parameters may further include the error bit threshold (in error bits per data length). The open-channel solid state drive 130 receives the device identification command from the submission queue 510 (step S713). After receiving the command, the open-channel solid state drive 130 stores the operating parameters at the memory address specified by the device identification command and then writes the completion element corresponding to the command into the completion queue 530 (step S715).

The parameter setting phase 730 includes steps S731 to S735. The host device 110 may configure the operating parameters returned by the open-channel solid state drive 130, for example enabling the error correction function and setting the error bit threshold to a value such as 100, where the error bit threshold is less than the maximum capability value of the enabled error correction function, for example 120. The host device 110 then stores these settings in a parameter setting command and writes the command into the submission queue 510 (step S731). The open-channel solid state drive 130 receives the parameter setting command from the submission queue 510 (step S733). After obtaining the command, the open-channel solid state drive 130 sets its operating parameters according to the values in the command, for example enabling the error correction function and setting the error bit threshold to 100. When the values in the parameter setting command are valid and the operating parameters have been set accordingly, the open-channel solid state drive 130 writes the completion element corresponding to the parameter setting command into the completion queue 530 (step S735).
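
The constraint that the error bit threshold stays below the maximum capability value can be checked on the host before the parameter setting command is queued. The following is a minimal sketch: the dictionary keys, the function name `build_set_parameters`, and the choice to raise `ValueError` are all assumptions for illustration.

```python
def build_set_parameters(identify, enable_ecc, threshold):
    """Build a parameter setting command from identify data (hypothetical layout)."""
    max_capability = identify["ecc_max_capability"]
    # The threshold must be less than the maximum capability of the ECC function.
    if threshold >= max_capability:
        raise ValueError("threshold must be below the ECC maximum capability")
    return {
        "opcode": "set_parameters",
        "ecc_enabled": enable_ecc,
        "error_bit_threshold": threshold,
    }


# Operating parameters as they might be returned in the device identification phase.
identify = {
    "blocks": 1024,
    "bad_blocks": 3,
    "channels": 4,
    "ecc_enabled": False,
    "ecc_max_capability": 120,   # error bits per data length
}
command = build_set_parameters(identify, enable_ecc=True, threshold=100)
```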

Once the error bit threshold has been set, the host device 110 can start the proactive ECC failure handling method of the invention. FIG. 8 is a flowchart of the proactive ECC failure handling method performed by the host device 110 according to an embodiment of the invention. The host device 110 outputs a data read command to the open-channel solid state drive 130 (step S811); for the way the data read command is output, see steps S611 and S612.

Next, the open-channel solid state drive 130 reads the user data from the specified source address (a physical address) according to the data read command (step S813); for the details of step S813, see steps S631 to S633. After the user data has been obtained, the error correction decoder of the open-channel solid state drive 130 automatically checks and corrects the error bits in the read user data and counts the total number of error bits.

Next, the flash controller 135 of the open-channel solid state drive 130 determines whether the total number of error bits in the read user data is equal to or greater than the error bit threshold (step S815). If so (the "yes" path of step S815), the bit in the execution result table corresponding to that user data is set to "1" (step S817). Suppose a data read command requests 64 pieces of user data. Each bit of the execution result table then indicates whether the total number of error bits of one piece of user data is equal to or greater than the error bit threshold: if not, the bit is "0", the safe value, meaning the user data is still stored safely; if so, the bit is "1", the unsafe value, meaning the stored user data may be at risk.
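
Steps S815 and S817 amount to building a bitmap from per-entry error counts. The sketch below assumes that bit i of the table corresponds to the i-th piece of user data; the function name and the sample counts are hypothetical.

```python
def build_result_table(error_bit_counts, threshold):
    """Set bit i to 1 (unsafe) when entry i has at least `threshold` error bits."""
    table = 0
    for index, count in enumerate(error_bit_counts):
        if count >= threshold:        # S815: equal to or greater than the threshold
            table |= 1 << index       # S817: mark the entry as unsafe
    return table


# Corrected error bit counts reported by the ECC decoder for four entries.
counts = [12, 100, 99, 130]
table = build_result_table(counts, threshold=100)
```

Note that the data in every entry here was still corrected successfully; the bitmap only flags entries that are getting close to the limit of the error correction capability.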

Next, the open-channel solid state drive 130 stores the user data at the destination address specified by the data read command (step S819), where the destination address is preferably a physical address in the data buffer 120. The open-channel solid state drive 130 then writes the completion element into the completion queue 530 (step S821).

Next, the host device 110 obtains the execution result table of the completion element from the completion queue 530 (step S823) and determines whether the execution result table contains any "1" (step S825). If not, the method ends.

If the execution result table does contain a "1", the host device 110 reallocates a physical address to the user data corresponding to each "1" in the execution result table (step S827), where the reallocated physical address lies in an active block.

Next, the host device 110 outputs a data write command to the open-channel solid state drive 130 to write the user data corresponding to each "1" in the execution result table to the reallocated physical address (step S829); for the way the data write command is output, see steps S611 and S612.
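
The host-side steps S825 to S829 can be sketched as a single function that turns an execution result table into a list of write commands. The allocator, the command dictionary layout, and the buffer slot names are assumptions for this sketch; a real host would take new addresses from its own mapping of active blocks.

```python
def handle_completion(result_table, buffer_slots, allocate_page):
    """Return one write command per entry flagged unsafe (S825 to S829)."""
    writes = []
    for index, slot in enumerate(buffer_slots):
        if (result_table >> index) & 1:       # S825: unsafe value found
            new_address = allocate_page()     # S827: new address in an active block
            writes.append({
                "opcode": "write",
                "source": slot,               # data already sits in the data buffer
                "destination": new_address,
            })                                # S829: to be placed in the submission queue
    return writes


# Hypothetical allocator that hands out consecutive pages of one active block.
free_pages = iter([("block 9", page) for page in range(4)])
writes = handle_completion(
    0b0101,
    ["buf0", "buf1", "buf2", "buf3"],
    lambda: next(free_pages),
)
```

Because the read has already placed the corrected user data in the data buffer, the relocation in this sketch needs no second read before the write is issued.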

FIG. 9 shows the data format of the completion element. The completion element 900 may be a 16-byte message. Bytes 0 to 1 of doubleword 3 (DW3) of the completion element 900 record a command identifier 930, whose content should match the command identifier of the data read command issued by the host device 110, so that the completion element 900 can be associated with that command. DW0 and DW1 of the completion element 900 store an execution result table 910 of 64 bits in total. Each bit records the outcome of step S817 for one piece of user data, that is, whether the total number of error bits of that user data is equal to or greater than the error bit threshold. Bits 17 to 31 of DW3 of the completion element 900 record a status field 920, which indicates whether the data read command issued by the host device 110 was executed successfully.
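
The field positions given for FIG. 9 can be decoded with Python's standard `struct` module. This is a sketch under stated assumptions: little-endian byte order, DW0 holding the low half of the 64-bit table, and DW2 being ignored are choices made here because the text does not specify them.

```python
import struct


def parse_completion_element(raw):
    """Decode a 16-byte completion element (assumed little-endian layout)."""
    if len(raw) != 16:
        raise ValueError("a completion element is 16 bytes")
    # DW0-DW1 as one 64-bit value, then DW2 and DW3 as 32-bit values.
    result_table, _dw2, dw3 = struct.unpack("<QII", raw)
    return {
        "result_table": result_table,       # execution result table 910
        "command_id": dw3 & 0xFFFF,         # DW3, bytes 0 to 1 (command identifier 930)
        "status": (dw3 >> 17) & 0x7FFF,     # DW3, bits 17 to 31 (status field 920)
    }


# Build a sample element: entries 0 and 2 flagged, command identifier 0x0042.
raw = struct.pack("<QII", 0b101, 0, (0 << 17) | 0x0042)
element = parse_completion_element(raw)
```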

The error bit threshold may be set using the following formula: $Tr = MAX_i \times \alpha$, where $Tr$ is the threshold, $MAX_i$ is the maximum capability value of a specific error correction capability (in error bits per data length), for example 120 bits per 1 KB, and $\alpha$ is a coefficient between 0 and 1, which may satisfy $0.6 \le \alpha \le 1$. For example, if the initial value of $\alpha$ is 0.8, the initial error bit threshold is $120 \times 0.8 = 96$. In this case, for example, when 64 pieces of user data are read, the status field of the completion element indicates success and every bit of the execution result table is "0".
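
The threshold formula is simple enough to check by hand, but a short helper makes the arithmetic explicit. Expressing the coefficient as an integer percentage is a choice made for this sketch to keep the result an exact integer; the function name is hypothetical.

```python
def error_bit_threshold(max_capability_bits, alpha_percent):
    """Threshold = maximum capability x alpha, with alpha given in percent."""
    if not 0 <= alpha_percent <= 100:
        raise ValueError("alpha must lie between 0 and 1 (0 to 100 percent)")
    return max_capability_bits * alpha_percent // 100


initial = error_bit_threshold(120, 80)   # 120 bits per 1 KB, alpha = 0.8
```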

As the erase count increases and the storage unit 139 ages, the open-channel solid state drive 130 returns warnings (unsafe values) more and more frequently. The host device 110 may therefore raise $\alpha$ as appropriate, for example to 0.9, which changes the error bit threshold to 108, to avoid unnecessary data movement. For example, when 64 pieces of user data are read, the status field of the completion element indicates success but the execution result table contains 32 "1"s, meaning that the error bit count of 32 pieces of user data has reached or exceeded 96. At this point the host device 110 may change the error bit threshold to 108, store it in a parameter setting command, and issue the parameter setting command to the open-channel solid state drive 130 so that the drive changes its error bit threshold to 108, which effectively reduces the number of "1"s in the execution result table. The host device 110 then needs to reallocate a physical address only for the user data still marked "1". Because the reallocated physical address lies in an active block, and active blocks usually have a lower erase count and therefore better data retention, this effectively overcomes the shortcomings of the prior art.
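
The text gives one example of raising the coefficient but does not define a policy for when to do so. The sketch below shows one possible policy, purely as an assumption: raise the coefficient by a fixed step whenever at least half of the entries in a completion are flagged. The trigger ratio, step size, ceiling, and function name are all hypothetical.

```python
def next_alpha_percent(result_table, entries, alpha_percent,
                       unsafe_ratio_limit=0.5, step=10, ceiling=100):
    """Raise alpha when too many entries are flagged (hypothetical policy)."""
    unsafe = bin(result_table).count("1")     # number of "1" bits in the table
    if unsafe / entries >= unsafe_ratio_limit:
        return min(alpha_percent + step, ceiling)
    return alpha_percent


table = (1 << 32) - 1                  # 32 of 64 entries flagged, as in the example
alpha = next_alpha_percent(table, 64, 80)
threshold = 120 * alpha // 100         # new threshold for the parameter setting command
```

A policy like this trades fewer relocations against a smaller safety margin, so in practice the ceiling would likely be kept below the maximum capability value.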

Although the invention has been described by way of the above embodiments, these descriptions are not intended to limit the invention. On the contrary, the invention covers modifications and similar arrangements that would be apparent to those skilled in the art. The scope of the claims is therefore to be interpreted in the broadest manner so as to encompass all such apparent modifications and similar arrangements.

Claims (16)

1. A proactive ECC failure handling method, comprising: obtaining a completion element from a completion queue; determining whether an execution result table of the completion element includes an unsafe value and, if so, reallocating a physical address to user data corresponding to the unsafe value; and outputting a data write command to a submission queue to write the user data to the reallocated physical address.

2. The proactive ECC failure handling method of claim 1, wherein the completion queue and the submission queue are both located in a host device.

3. The proactive ECC failure handling method of claim 2, wherein an open-channel solid state drive obtains the data write command from the submission queue, and the open-channel solid state drive writes the completion element into the completion queue.

4. The proactive ECC failure handling method of claim 3, wherein the physical address is located in the open-channel solid state drive.

5. The proactive ECC failure handling method of claim 1, further comprising: outputting a data read command to an open-channel solid state drive to read the user data.

6. The proactive ECC failure handling method of claim 1, further comprising: outputting a parameter setting command to an open-channel solid state drive to set an error bit threshold.

7. The proactive ECC failure handling method of claim 6, wherein the error bit threshold is less than a maximum capability value of an error correction function of the open-channel solid state drive.

8. The proactive ECC failure handling method of claim 1, further comprising: outputting a device identification command to an open-channel solid state drive to obtain at least one operating parameter of the open-channel solid state drive.

9. A proactive ECC failure handling method, comprising: receiving a parameter setting command; setting an error bit threshold according to the parameter setting command; receiving a data read command; reading user data from a source address according to the data read command and, if a number of error bits of the user data is greater than or equal to the error bit threshold, setting a bit corresponding to the user data in an execution result table to an unsafe value; and writing a completion element that includes the execution result table into a completion queue.

10. The proactive ECC failure handling method of claim 9, further comprising: when the data read command is executed, if the number of error bits of the read user data is less than the error bit threshold, setting the bit corresponding to the user data in the execution result table to a safe value.

11. The proactive ECC failure handling method of claim 9, further comprising: enabling an error correction function according to the parameter setting command.

12. The proactive ECC failure handling method of claim 9, wherein the parameter setting command and the data read command are both received by an open-channel solid state drive.

13. The proactive ECC failure handling method of claim 12, wherein the error bit threshold is less than a maximum capability value of an error correction function of the open-channel solid state drive.

14. The proactive ECC failure handling method of claim 12, wherein the completion queue and the submission queue are both located in a host device.

15. The proactive ECC failure handling method of claim 9, wherein the parameter setting command is stored in a submission queue.

16. The proactive ECC failure handling method of claim 9, further comprising: storing the user data at a destination address specified by the data read command.
TW107109044A 2017-09-26 2018-03-16 Methods of proactive ecc failure handling TWI670595B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810668594.1A CN109558266B (en) 2017-09-26 2018-06-26 Failure processing method for active error correction
US16/034,915 US11016841B2 (en) 2017-09-26 2018-07-13 Methods and apparatuses for proactive ECC failure handling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762563115P 2017-09-26 2017-09-26
US62/563,115 2017-09-26

Publications (2)

Publication Number Publication Date
TW201915736A true TW201915736A (en) 2019-04-16
TWI670595B TWI670595B (en) 2019-09-01

Family

ID=66991784

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107109044A TWI670595B (en) 2017-09-26 2018-03-16 Methods of proactive ecc failure handling

Country Status (1)

Country Link
TW (1) TWI670595B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI730454B (en) * 2019-07-10 2021-06-11 慧榮科技股份有限公司 Apparatus and method and computer program product for executing host input-output commands
CN112988043A (en) * 2019-12-12 2021-06-18 西部数据技术公司 Error recovery for commit queue fetch errors
TWI748507B (en) * 2020-06-08 2021-12-01 瑞昱半導體股份有限公司 Data access system, and method for operating a data access system
TWI760695B (en) * 2019-05-24 2022-04-11 華邦電子股份有限公司 Semiconductor apparatus and continuous readout method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI382422B (en) * 2008-07-11 2013-01-11 Genesys Logic Inc Storage device for refreshing data pages of flash memory based on error correction code and method for the same
TWI479359B (en) * 2013-08-01 2015-04-01 Phison Electronics Corp Command executing method, memory controller and memory storage apparatus
TWI467578B (en) * 2014-01-09 2015-01-01 Phison Electronics Corp Error handling method, memory storage device and memory controlling circuit unit
WO2015116168A1 (en) * 2014-01-31 2015-08-06 Hewlett-Packard Development Company, L.P. Rendering data invalid in a memory array
TWI585770B (en) * 2015-08-11 2017-06-01 群聯電子股份有限公司 Memory management method, memory control circuit unit and memory storage device
TWI601059B (en) * 2015-11-19 2017-10-01 慧榮科技股份有限公司 Data storage device and data storage method
US20170177349A1 (en) * 2015-12-21 2017-06-22 Intel Corporation Instructions and Logic for Load-Indices-and-Prefetch-Gathers Operations
TWI591640B (en) * 2016-01-08 2017-07-11 群聯電子股份有限公司 Memory management method, memory control circuit unit and memory storage device

Also Published As

Publication number Publication date
TWI670595B (en) 2019-09-01
