TW202020675A - Computer system - Google Patents


Info

Publication number
TW202020675A
Authority
TW
Taiwan
Prior art keywords
erasure coding
coding logic
data
logic
statement
Prior art date
Application number
TW108129186A
Other languages
Chinese (zh)
Other versions
TWI791880B (en
Inventor
桑龐 保羅 歐拉利格
佛瑞德 沃里
奧斯卡P 品托
Original Assignee
南韓商三星電子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/207,080 (US10635609B2)
Application filed by 南韓商三星電子股份有限公司
Publication of TW202020675A
Application granted
Publication of TWI791880B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1678 Details of memory controller using bus width
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282 Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0658 Controller construction arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13 Linear codes
    • H03M13/15 Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
    • H03M13/151 Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
    • H03M13/154 Error and erasure correction, e.g. by using the error and erasure locator or Forney polynomial
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)
  • Lock And Its Accessories (AREA)

Abstract

A topology is disclosed. The topology may include at least one Non-Volatile Memory Express (NVMe) Solid State Drive (SSD), a Field Programmable Gate Array (FPGA) to implement one or more functions supporting the NVMe SSD, such as data acceleration, data deduplication, data integrity, data encryption, and data compression, and a Peripheral Component Interconnect Express (PCIe) switch. The PCIe switch may communicate with both the FPGA and the NVMe SSD.

Description

Method for supporting erasure code data protection with an embedded PCIe switch inside FPGA+SSD

The inventive concept relates generally to computer systems, and more particularly to erasure coding within a Peripheral Component Interconnect Express (PCIe) switch.

Currently, most Non-Volatile Memory Express (NVMe)-based Solid State Drives (SSDs) with Redundant Array of Independent Disks (RAID) protection are implemented through external PCIe Add-In-Cards (AICs). To optimize bus bandwidth between the host Central Processing Unit (CPU) and the AIC RAID controller, the bus usually supports x16 PCIe lanes. However, due to the physical limitations of the standard PCIe card form factor, each AIC RAID controller supports only a small number of U.2 connectors (currently the preferred connector for NVMe SSDs): usually just two or four.

To support up to 24 NVMe SSDs inside a 2U chassis, six AIC RAID controllers are needed, which forms six different RAID domains. This configuration adds the cost and complexity of managing six RAID domains. In addition, each AIC RAID controller currently costs close to $400. So even for a single 2U chassis, the AIC RAID controllers alone in a complete RAID solution exceed $2,400, before counting the cost of the NVMe SSDs.

Because cost-effective RAID data protection for large data sets is lacking, adoption of NVMe SSDs in the enterprise market has been limited. Software RAID solutions are adequate for relatively small data sets, but not for Big Data.

Using AIC RAID controllers raises other problems as well:

1) As noted above, having multiple RAID domains inside the chassis increases management complexity.

2) As a corollary of this RAID domain management complexity, the chassis does not have a single RAID domain, which would be preferable.

3) The Central Processing Unit (CPU) needs to support a large number of PCIe lanes: 16 PCIe lanes per AIC RAID controller × 6 AIC RAID controllers per chassis = 96 PCIe lanes for the AIC RAID controllers alone. Currently, only much more expensive high-end CPUs support that many PCIe lanes.

4) Since each AIC RAID controller may consume 25 watts, six AIC RAID controllers add up to 150 watts of power consumption per chassis.

5) A chassis often has only a few PCIe slots, which may limit the number of AIC RAID controllers that can be added and indirectly reduce the number of NVMe SSDs in the chassis that can be protected by RAID.

6) Software RAID solutions often support relatively few RAID levels and add CPU overhead.

7) When used over a network, SSD access may be slower because of the time required to send data accesses across the network. In addition, in some examples, networked memory may require a software RAID implementation, which adds CPU overhead.
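The figures in the list above can be tied together with a short calculation. The sketch below is illustrative only: the variable names are invented here, four U.2 connectors per controller is the upper end of the "two or four" range quoted earlier, and the per-controller price is the approximate $400 figure taken at face value.

```python
# Figures quoted in the text; the variable names are illustrative.
SSDS_PER_CHASSIS = 24
SSDS_PER_CONTROLLER = 4      # upper end of "two or four" U.2 connectors
LANES_PER_CONTROLLER = 16    # x16 PCIe link per AIC RAID controller
WATTS_PER_CONTROLLER = 25
USD_PER_CONTROLLER = 400     # "close to $400", taken at face value

# Ceiling division: controllers needed to attach every SSD in the chassis.
controllers = -(-SSDS_PER_CHASSIS // SSDS_PER_CONTROLLER)
pcie_lanes = controllers * LANES_PER_CONTROLLER
power_watts = controllers * WATTS_PER_CONTROLLER
cost_usd = controllers * USD_PER_CONTROLLER

print(controllers, pcie_lanes, power_watts, cost_usd)
```

With these inputs the script reports 6 controllers, 96 PCIe lanes, 150 watts, and $2,400.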

There remains a need for a way to support erasure coding across a large number of storage devices without the limitations imposed by AIC RAID controllers and software RAID solutions.

[Object of the invention]

Exemplary embodiments of the present disclosure may provide a system that uses erasure codes to support data protection.

An exemplary embodiment provides a system that may include a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD), a Field Programmable Gate Array (FPGA) that implements a function supporting the NVMe SSD, and a Peripheral Component Interconnect Express (PCIe) switch. The function supporting the NVMe SSD is drawn from a set including data acceleration, data deduplication, data integrity, data encryption, and data compression. The PCIe switch communicates with the FPGA and the NVMe SSD.

Another exemplary embodiment provides a system that may include an NVMe SSD and an FPGA, the FPGA including a first FPGA part and a second FPGA part. The first FPGA part implements a function supporting the NVMe SSD. The second FPGA part implements a PCIe switch. The function supporting the NVMe SSD is drawn from a set including data acceleration, data deduplication, data integrity, data encryption, and data compression. The PCIe switch communicates with the FPGA and the NVMe SSD. The FPGA and the NVMe SSD are located inside a common housing.

Yet another exemplary embodiment provides a system that may include an NVMe SSD and a PCIe switch with erasure coding logic. The PCIe switch may include an external connector that enables the PCIe switch to communicate with a processor, at least one connector that enables the PCIe switch to communicate with the NVMe SSD, a Power Processing Unit (PPU) for configuring the PCIe switch, and an erasure coding controller that includes circuitry for applying an erasure coding scheme to data stored on the NVMe SSD.

[Effect of the invention]

According to embodiments of the present invention, using a PCIe switch that includes Look-Aside Erasure Coding logic moves erasure coding closer to the storage devices, which can reduce the time spent moving data back and forth. In addition, placing the erasure coding controller with the PCIe switch removes the need for expensive RAID add-in cards and allows larger arrays to be used, even arrays spanning multiple chassis.

Reference will now be made in detail to embodiments of the inventive concept, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth to enable a thorough understanding of the inventive concept. It should be understood, however, that persons having ordinary skill in the art may practice the inventive concept without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first module could be termed a second module, and, similarly, a second module could be termed a first module, without departing from the scope of the inventive concept.

The terminology used in the description of the inventive concept herein is for the purpose of describing particular embodiments only and is not intended to limit the inventive concept. As used in the description of the inventive concept and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features shown in the drawings are not necessarily drawn to scale.

Field Programmable Gate Arrays (FPGAs) have enough intelligence, computing resources, and high-speed Input/Output (I/O) connectivity to perform Redundant Array of Independent Disks (RAID)/Erasure Code parity generation and data discovery when necessary. An FPGA+Solid State Drive (SSD) may require an embedded Peripheral Component Interconnect Express (PCIe) switch to support more co-controllers/co-processors, such as one or more SSDs, Graphical Processing Units (GPUs), Tensor Processing Units (TPUs), and so on. Multiple co-processors also require more NAND flash memory channels.

Embodiments of the present invention support erasure codes within a PCIe switch inside an FPGA. Embodiments of the inventive concept may also allow users to configure the RAID engine (inside the FPGA) remotely through a Baseboard Management Controller (BMC). Users may use standard interfaces, such as PCIe (used as a control plane) or the System Management Bus (SMBus), to pre-configure the RAID-on-a-Chip (RoC) or erasure code controller. Being able to configure storage in this way can be useful for users who lease computing resources: when finished, a user may want to destroy the data quickly before the next user can use the same computing resources. In that situation, the BMC may send an erase command to all embedded PCIe switches inside multiple FPGA+SSDs. Upon receiving the erase command, the FPGA's RoC/erasure code controller erases both the data and the parity data specified by the command's Logical Block Address (LBA) range.

Today, PCIe switches expose virtual switches or virtual grouping, in which more than one switch is exposed to the administrator. These configurations are useful in virtualized environments, where the network, CPU-GPU, FPGA, and memory behind these virtual domains can be grouped together. In one embodiment, such virtual grouping may be applied to memory by creating RAID sub-groups that are exposed to user groups for a virtualized environment, or alternatively it may be used for RAID grouping (for example RAID 10, RAID 50, RAID 60, etc.). These layered RAID groups create small groups and apply an additional RAID layer on top to create a larger RAID solution. The virtual switches manage the smaller RAID groups, while the main switch manages the overall RAID configuration.

Because the data protection scheme is enabled and its management is brought closer to the storage units, the solution offers benefits that are important differentiators in enterprise and data center environments. Embodiments of the inventive concept provide higher density and performance at lower power consumption.

The solution may consist of one embedded PCIe switch with an integrated RoC or erasure code controller in the data path between the host and the SSDs. The PCIe switch + RoC component may be managed by the BMC for configuration and control, and may expose an interface to software for specific configurations before being released to a new user.

When operating in erasure code/RAID mode, all incoming Non-Volatile Memory Express (NVMe) or NVMe over Fabrics (NVMe-oF) traffic to or from the embedded PCIe switch may be snooped by the RoC or erasure code controller (which may be termed a Look-Aside RoC or erasure code controller). The RoC or erasure code controller can determine whether the data in the traffic results in a cache hit in its local cache. If there is a cache hit, the transaction (read or write) need not be forwarded to the appropriate SSD. Requested read data can be served directly from the RoC's cache. Write data is updated directly in the RoC's local cache and marked as "modified" or "dirty" data.
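The cache-hit behavior described above can be modeled in a few lines. This is a minimal sketch, not the controller's actual design: the `LookAsideCache` class, its dictionary-backed "SSD", and the choice to forward a write miss straight to the SSD without caching it are all assumptions made for illustration.

```python
class LookAsideCache:
    """Toy model of a RoC local cache keyed by LBA (all names hypothetical)."""

    def __init__(self, backing):
        self.backing = backing   # dict standing in for the SSDs: LBA -> bytes
        self.lines = {}          # cached LBA -> bytes
        self.dirty = set()       # LBAs modified in cache but not yet on an SSD

    def read(self, lba):
        if lba in self.lines:            # cache hit: served from the cache
            return self.lines[lba], "hit"
        data = self.backing[lba]         # miss: transaction goes to the SSD
        self.lines[lba] = data
        return data, "miss"

    def write(self, lba, data):
        if lba in self.lines:            # hit: update cache and mark dirty
            self.lines[lba] = data
            self.dirty.add(lba)
            return "hit"
        self.backing[lba] = data         # miss: forwarded to the SSD (assumption)
        return "miss"


ssd = {0: b"old", 1: b"other"}
cache = LookAsideCache(ssd)
cache.read(0)               # first read misses and fills the cache
cache.write(0, b"new")      # hit: the SSD copy is now stale, LBA 0 is dirty
```

After the last call, `ssd[0]` still holds the old bytes while `cache.dirty` contains LBA 0, which is the "modified" state the text refers to.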

For the SSDs, parity may be distributed among the connected SSDs. For example, if RAID 4 is selected, the last SSD may be used only to store parity, while the other SSDs are used to store data.
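As a concrete illustration of the RAID 4 layout, the sketch below computes XOR parity over the data blocks of one stripe, places it on the last device, and rebuilds a lost block from the survivors. The function names and the two-byte blocks are illustrative assumptions; a real controller would work on full-size blocks in hardware.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return the stripe as stored: data SSDs first, parity on the last SSD."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover one missing block by XOR-ing the surviving blocks of the stripe."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = write_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
recovered = rebuild(stripe, 1)   # pretend the second data SSD failed
```

Here `stripe[-1]` is the parity block and `recovered` equals the original second data block, since XOR-ing all surviving blocks cancels everything except the missing one.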

Placing an external PCIe switch between the host and the SSD devices makes it possible to support virtual I/O addresses. In this case, a primary RoC that is part of the host PCIe switch may virtualize all SSD addresses. In other words, the addresses and devices are not visible to the host Operating System (OS). In such embodiments of the inventive concept, peer-to-peer transactions between at least two SSDs acting as peers are allowed and supported. This option can enhance some forms of SSD redundancy and/or availability by striping across more than one SSD. In this mode, the embedded RoC or erasure code controller within the FPGA may be disabled (if present). The only RoC/erasure code controller that is enabled is the one in the host PCIe switch.

If the storage device operates in single-device mode, all incoming NVMe/PCIe traffic may be forwarded to the SSD that holds the requested data.

If paired mode is enabled, the RoC/erasure code controller can determine whether the address of the requested data belongs to its own Base Address Register (BAR) domain. If so, the transaction can be completed by the local RoC. For write transactions, a posted write buffer or a write cache (using some embedded Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM)) may be used. If there is a write cache hit (a previous write has occurred and the data is still held in the write cache buffer), handling depends on the write cache policy. For example, if the cache policy is write-back, the write command is completed and terminated by the RoC cache. If the cache policy is write-through, the write command completes when the write data has been successfully transferred to the drive. In this case, the RoC may terminate the write command to the host once the write data has been successfully updated in its local cache.
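The difference between the two write cache policies comes down to what must happen before the command is reported complete. The following sketch uses the conventional definitions of write-back and write-through; the `handle_write` function, the dictionaries standing in for the cache and the drive, and the step labels are hypothetical.

```python
def handle_write(cache, drive, lba, data, policy):
    """Apply one write and return the steps taken before it is acknowledged."""
    if policy not in ("write-back", "write-through"):
        raise ValueError(f"unknown cache policy: {policy}")

    steps = []
    cache[lba] = data
    steps.append("cache updated")

    if policy == "write-through":
        # The drive must hold the data before the command can complete.
        drive[lba] = data
        steps.append("drive updated")
    # Under write-back the drive is updated later, when the dirty line is flushed.

    steps.append("command completed to host")
    return steps


cache, drive = {}, {}
back = handle_write(cache, drive, 7, b"x", "write-back")
through = handle_write(cache, drive, 8, b"y", "write-through")
```

In this model the write-back path acknowledges the host with LBA 7 present only in the cache, whereas the write-through path has already placed LBA 8 on the drive.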

The RoC may virtualize the set of devices it claims and present them as a single device or as fewer devices, as a protection scheme against data or device failure. Data protection schemes are by nature distributed across a set of devices so that, when data is lost on any one device, it can be rebuilt from the others. RAID and Erasure Coding (EC) are commonly used data protection schemes that employ distributed algorithms to guard against such loss.

To virtualize the devices beneath the RoC, the devices may be terminated at the RoC and made invisible to the host. That is, the PCIe switch may be connected to all known devices, and the RoC may be connected to the switch. To manage the devices, the RoC may discover and configure the individual devices through the PCIe switch. Alternatively, the RoC may be transparent in a default/factory mode and allow host software to configure the RoC. The host software may be specially tailored to work with the PCIe switch + RoC hardware. Once configured, the RoC may terminate the devices and make them invisible to the host.

The PCIe switch + RoC can be configured for RAID and EC modes in a number of ways. Additional PCIe switches may be present downstream to create a larger fan-out configuration that supports more devices. In addition, more than one such hardware combination may be associated together to form a larger setup. For example, two PCIe switches + RoCs may work together to form an alternative configuration. Alternatively, the two PCIe switches + RoCs may work independently.

When the PCIe switches + RoCs work independently, the host instantiates each RoC and PCIe switch combination as a separate device. Here, the host may have a standard OS driver that sees all of the SSDs virtualized by the RoC. For example, suppose six SSDs are grouped under the PCIe switch and one SSD is exposed to the host by the RoC; a second RoC and PCIe switch combination may expose a similar setup to the host. Across all RoC controller devices, the host then discovers two SSDs (one per device). Each RoC controller may expose a separate device space for each exposed SSD. The host may not see any of the devices that back an exposed SSD and sit behind it. The RoC manages the hardware I/O path through the PCIe switch.

This approach can be used in an active-passive setup, in which the second controller is a backup path in case the first controller path fails. Here the host actively uses only the first controller, and no I/O is sent to the second RoC controller. If an active-passive setup is used, the two RoC controllers may replicate data internally. As in a RAID 1 data protection setup, this can be done by having the first, active controller send all writes to the second RoC controller.
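A minimal sketch of the replication described above, assuming each controller can be modeled as an object with its own store: the `RoC` class and its `peer` attribute are invented for illustration and say nothing about how the controllers are actually linked.

```python
class RoC:
    """Toy RoC controller with an optional standby peer (names hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}      # LBA -> data held behind this controller
        self.peer = None     # standby controller, if one is configured

    def write(self, lba, data):
        self.store[lba] = data
        if self.peer is not None:
            # Replicate every write to the standby, RAID 1 style.
            self.peer.store[lba] = data

    def read(self, lba):
        return self.store[lba]


active, standby = RoC("roc0"), RoC("roc1")
active.peer = standby
active.write(3, b"payload")   # the host talks only to the active controller
```

If the active path fails, the host can switch to `standby`, which already holds the same data for LBA 3.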

A second active-passive setup is also possible, in which the second RoC and PCIe switch have no SSDs of their own behind them and serve only as a backup controller path. In this case, since the two RoC controllers refer to the same set of SSDs, no I/O need be sent between them. This is the standard active-passive setup.

The SSDs behind each RoC may also not be coordinated with each other, in which case the two SSDs are treated as separate SSDs that share no protection scheme.

In yet another usage, both paths may be used in an active-active setup. Such a setup may be used for load balancing. Here, the host may use both paths, with a special software layer distributing the I/O workload across them. The two RoC controllers may coordinate their writes between them so that the two SSDs stay in sync. That is, the SSD exposed by each RoC controller may contain the same data, as in a RAID 1 setup.

In yet another configuration, the 2 RoC controllers communicate so that their I/O is kept distributed in a custom setup. Here, the host uses only one RoC controller; the other RoC controller is connected to the first RoC controller. The first RoC controller may expose one or more virtual NVMe SSDs to the host. The 2 RoCs may be set up to divide the odd and even LBA spaces between them. Because NVMe uses a pull model for data from the device side, the host sends commands only to the SSD exposed by the first RoC controller. That RoC controller may send a copy of each message to the second RoC controller over its side channel connection. Each RoC controller may be set up to service only odd or only even LBAs, stripes, zones, and so on. This setup provides internal load balancing that need not be managed by the host and can be managed transparently by the RoC and PCIe switch combinations. Each RoC controller may process only the odd or the even LBA range and satisfy requests to the host buffers. Because both RoC controllers have access to the host, each can fill in the data for its odd or even share.

For example, the host may send the first RoC controller a command to read four consecutive LBAs (LBA 0, LBA 1, LBA 2, and LBA 3), and the first RoC controller sends a copy to the second RoC controller. The first RoC controller then reads the data for LBA 0 and LBA 2 from the first two SSDs on its PCIe switch, while the second RoC controller reads the data for LBA 1 and LBA 3 from the first two SSDs on its PCIe switch. The second RoC controller may then report to the first RoC controller that it has completed its operation, and the first RoC controller may then report completion of the transaction to the host.
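The odd/even division in this example can be illustrated with a minimal Python sketch. The function name `split_by_parity` is hypothetical and chosen only for illustration; the sketch models just the partitioning of LBAs, not the side channel or the completion reporting.

```python
def split_by_parity(lbas):
    """Partition LBAs between two controllers.

    Even LBAs go to the first RoC controller and odd LBAs to the second,
    mirroring the LBA 0..3 example in the text. (Illustrative only.)
    """
    first = [lba for lba in lbas if lba % 2 == 0]
    second = [lba for lba in lbas if lba % 2 == 1]
    return first, second


# A read of four consecutive LBAs, as in the example above.
first_roc, second_roc = split_by_parity([0, 1, 2, 3])
```

Here `first_roc` holds the LBAs served by the first controller and `second_roc` those served by the second.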

The odd/even LBA/stripe/zone pairing is one example; the same approach may be applied to other load distribution usages.

Embodiments of the inventive concept may support SSD failure, removal, and hot addition. When an SSD stops working properly or is removed from its slot, the RoC in the PCIe switch needs to detect that condition. When the PCIe switch detects such a condition, the RoC may begin a rebuild operation for the failed or removed SSD. The RoC may also handle any I/O operations during the rebuild period by prioritizing the data from the associated stripes.

There are at least two ways to report an SSD failure or removal to the RoC in the PCIe switch. In one embodiment of the inventive concept, every SSD has a Present pin connected to the BMC. When an SSD is pulled out of the chassis, the BMC detects the removal. The BMC then reports the affected slot number to the RoC in the PCIe switch. The BMC may also periodically monitor the health of the SSDs. If the BMC detects any fatal error condition reported by an SSD, the BMC may decide to take that SSD out of service. The BMC may then report the failed slot number to the RoC so that a new SSD can be rebuilt.

In another embodiment of the inventive concept, the PCIe switch may support hot plug, in which all SSDs are connected through PCIe sideband signals and certain error conditions can be detected. The PCIe switch may detect when an SSD is pulled out or added, or when the PCIe link to an SSD is no longer up. In such an error scenario, the RoC in the PCIe switch may isolate the failed SSD, or the BMC may isolate the failed SSD by disabling power to the failed drive and immediately starting a rebuild of that drive.

When asserted, the presence (PRSNT#) pin of each U.2 connector may indicate that a new device is present in the chassis. The signal is connected to the PCIe switch and/or the BMC. The RoC may configure the new drive into its existing domain as appropriate under the current data protection policy.

All incoming traffic from the host needs to be forwarded to the probing P2P and address translation (physical-to-logical) logic. During PCIe enumeration, all configuration cycles from all ports need to be forwarded to the probing P2P logic. Depending on the selected mode of operation, the behavior of the PCIe switch with RoC is defined as follows:

Figure 108129186-A0304-0001

The RoC may also sit between the PCIe switch and the host processor, in line with them. In such embodiments of the inventive concept, the RoC may be termed a Look-Through RoC. When a Look-Through RoC is used and the PCIe switch operates as a normal PCIe switch, the RoC is disabled and becomes a re-timer for all ports. In this case, all upstream ports are allowed to connect as in the normal use case.

If the RoC is enabled, a small number of non-transparent bridge (NTB) ports are connected to the host. In this case, the RoC may virtualize incoming addresses into logical addresses according to the selected RAID or erasure coding level.

Whether the RoC is a Look-Aside RoC or a Look-Through RoC, every incoming memory read/write request may be checked against the RoC's local cache to determine whether there is a cache hit or a cache miss. On a cache hit, the requested read data may be supplied from the RoC's local cache instead of from the SSD. For a memory write hit, the write data may be updated in the cache immediately. The same write data may be written to the SSD later. Such an implementation may lower the overall latency of memory writes and thereby improve system performance.

On a cache miss, the RoC controller may determine which SSD is the correct drive from which to access the data.
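The hit/miss behavior described above resembles a write-back cache. The following Python sketch is only an illustration under stated assumptions: the class name `WriteBackCache`, the use of a dictionary to stand in for the SSDs, and the decision to cache every write (not only write hits) are all hypothetical simplifications, not details taken from the patent.

```python
class WriteBackCache:
    """Toy write-back cache in front of a backing store (a dict standing in for SSDs)."""

    def __init__(self, backing):
        self.backing = backing  # hypothetical stand-in for the SSDs
        self.lines = {}         # cached LBA -> data
        self.dirty = set()      # LBAs written to the cache but not yet to the backing store

    def read(self, lba):
        """Return (data, hit). On a miss, fetch from the backing store and cache it."""
        if lba in self.lines:
            return self.lines[lba], True
        data = self.backing[lba]
        self.lines[lba] = data
        return data, False

    def write(self, lba, data):
        """Update the cache immediately; the backing store is updated later by flush()."""
        self.lines[lba] = data
        self.dirty.add(lba)

    def flush(self):
        """Write all dirty entries back to the backing store."""
        for lba in sorted(self.dirty):
            self.backing[lba] = self.lines[lba]
        self.dirty.clear()
```

A write returns as soon as the cache is updated, which is where the latency benefit described in the text would come from; the cost is that dirty data must eventually be flushed to the SSDs.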

To address a PCIe device, the device must be enabled by being mapped into the system's I/O port address space or memory-mapped address space. The system's firmware, device drivers, or operating system program the base address registers (BARs) to inform the device of its address mapping by writing configuration commands to the PCI controller. Because all PCIe devices are in an inactive state at system reset, they have no addresses assigned by which the operating system or device drivers could communicate with them. The basic input/output system (BIOS) or the operating system addresses the PCIe slots geographically (for example, the first PCIe slot, the second PCIe slot, or the third PCIe slot on the motherboard) through the PCIe controller, using a per-slot Initialization Device Select (IDSEL) signal.

Figure 108129186-A0304-0002

Because the BIOS or operating system has no direct way to determine which PCIe slots have devices installed (nor which functions those devices implement), the PCI bus is enumerated. Bus enumeration may be performed by attempting to read the vendor ID and device ID (VID/DID) registers at function 15 of the device for each combination of bus number and device number. Note that the device number, unlike the DID, is merely the device's sequence number on that bus. In addition, after a new bridge is detected, a new bus number is defined and device enumeration restarts at device number zero.

If no response is received from function 15 of the device, the bus master may perform an abort and return an all-bits-on value (FFFFFFFF in hexadecimal), which is an invalid VID/DID value. In this way, a device driver can tell that the specified bus/device_number/function (B/D/F) combination is not present. Thus, when a read of function ID zero for a given bus/device pair causes the master (initiator) to abort, the device driver may conclude that no working device exists at that location on the bus (devices are required to implement function number zero). In this case, there is no need to read the remaining function numbers (1 through 7), because they will not exist either.

When a read of the vendor ID register for a specified B/D/F combination succeeds, the device driver knows that the device exists. The device driver may write all ones to the device's BARs and read back the device's requested memory size in encoded form. This design implies that all address space sizes are powers of two and are naturally aligned.
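The "encoded form" read back after writing all ones can be decoded as in the following Python sketch. It assumes the conventional 32-bit PCI BAR layout, in which the low 4 bits of a memory BAR (or the low 2 bits of an I/O BAR) carry type information rather than address bits; the function name `bar_size` is hypothetical, and 64-bit BARs are not handled.

```python
def bar_size(readback, is_io=False):
    """Decode the size requested by a 32-bit BAR.

    `readback` is the value read from the BAR after all ones were written to it.
    The low bits hold type flags, so they are masked off before decoding.
    Returns 0 if the BAR is not implemented.
    """
    mask = 0xFFFFFFFC if is_io else 0xFFFFFFF0
    address_bits = readback & mask
    if address_bits == 0:
        return 0  # unimplemented BAR
    # Two's complement of the writable address bits gives the region size.
    return (~address_bits + 1) & 0xFFFFFFFF


size_4k = bar_size(0xFFFFF000)  # a memory BAR requesting a 4 KiB region
```

Because only the upper bits of the BAR are writable, the decoded size is always a power of two, which matches the natural-alignment property stated above.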

At this point, the BIOS or operating system may program the memory-mapped and I/O port addresses into the device's BAR configuration registers. These addresses remain valid as long as the system stays powered on. On power-off, all of these settings are lost, and the process is repeated the next time the system is powered on. Because the whole process is fully automatic, the user does not have to configure newly added hardware manually by changing DIP switches on the cards. This automatic device discovery and address space assignment is how plug and play is implemented.

If a PCIe-to-PCIe bridge is found, the system may assign a non-zero bus number to the secondary PCI bus beyond the bridge and then enumerate the devices on that secondary bus. If more PCIe bridges are found, discovery may continue recursively until all possible domain/bus/device combinations have been scanned.
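The recursive discovery described above can be sketched in Python against a simulated configuration space. Everything here is an illustrative assumption: `read_vid_did` is a hypothetical callback standing in for a configuration read, `bridges` maps a bridge's bus/device/function to an already-chosen secondary bus number (a real system assigns these during the scan), and the sketch probes function 0 first, following the statement that devices are required to implement function number zero.

```python
INVALID_VID_DID = 0xFFFFFFFF  # all-bits-on value returned when nothing responds


def enumerate_bus(read_vid_did, bridges, bus=0, found=None):
    """Recursively list the (bus, device, function) triples that respond.

    read_vid_did(bus, device, function) -> 32-bit VID/DID value (hypothetical callback).
    bridges: dict mapping (bus, device, function) of a bridge to its secondary bus number.
    """
    if found is None:
        found = []
    for device in range(32):  # up to 32 devices per bus
        if read_vid_did(bus, device, 0) == INVALID_VID_DID:
            continue  # function 0 absent, so functions 1-7 are not probed
        for function in range(8):  # up to 8 functions per device
            if read_vid_did(bus, device, function) == INVALID_VID_DID:
                continue
            found.append((bus, device, function))
            secondary = bridges.get((bus, device, function))
            if secondary is not None:
                # Descend into the secondary bus behind the bridge.
                enumerate_bus(read_vid_did, bridges, secondary, found)
    return found


# A tiny simulated topology: two devices on bus 0, one of which is a bridge to bus 1.
topology = {
    (0, 0, 0): 0x12345678,
    (0, 1, 0): 0xABCD0001,  # bridge
    (1, 0, 0): 0x11112222,
    (1, 0, 1): 0x11112223,
    (0, 2, 3): 0x99990000,  # never found: its device has no function 0
}
bridge_map = {(0, 1, 0): 1}
devices = enumerate_bus(lambda b, d, f: topology.get((b, d, f), INVALID_VID_DID), bridge_map)
```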

Each non-bridge PCIe device function may implement up to 6 BARs, each of which may respond to different addresses in the I/O port and memory-mapped address spaces. Each BAR describes a region.

A PCIe device may also have an optional read-only memory (ROM), which may contain driver code or configuration information.

The BMC may configure the RoC setup directly. The BMC may have a hard-coded path in which a specific data protection scheme is applied, or a configurable setup. The latter may expose an interface to this configuration as a BIOS option, or additionally to software through a hardware-exposed interface. The hard-coded scheme may be built into the BIOS firmware and may still provide an option to enable or disable protection.

To handle device failures, the BMC may detect through the control path when a drive goes bad or is removed. The BMC may also determine through Self-Monitoring Analysis and Reporting Technology (SMART) that a device is expected to go bad soon. In these cases, the BMC may reconfigure the RoC hardware to enable a failure scenario or to warn the user of the situation. The BMC is in the control path only, not in the data path. When a new drive is inserted, the BMC may intervene again and configure the new drive as part of the protected group, or initiate a rebuild operation. The RoC hardware may handle the actual rebuild and the recovery path in this setup so as to keep the performance impact as small as possible while keeping latency low in the data access path.

FIG. 1 shows a machine including a Peripheral Component Interconnect Express (PCIe) switch with Look-Aside Erasure Coding logic, according to an embodiment of the inventive concept. In FIG. 1, machine 105 is shown. Machine 105 may include processor 110. Processor 110 may be any variety of processor: for example, an Intel Xeon, Celeron, Itanium, or Atom processor, an Advanced Micro Devices (AMD) Opteron processor, an Advanced RISC Machine (ARM) processor, and so on. Although FIG. 1 shows a single processor 110 in machine 105, machine 105 may include any number of processors, each of which may be a single-core or multi-core processor, mixed in any desired combination.

Machine 105 may also include memory 115, which may be managed by memory controller 120. Memory 115 may be any variety of memory, such as flash memory, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Persistent Random Access Memory, Ferroelectric Random Access Memory (FRAM), or Non-Volatile Random Access Memory (NVRAM) such as Magnetoresistive Random Access Memory (MRAM). Memory 115 may also be any desired combination of different memory types.

Machine 105 may also include Peripheral Component Interconnect Express (PCIe) switch 125 with Look-Aside Erasure Coding logic. PCIe switch 125 may be any desired PCIe switch that supports Look-Aside Erasure Coding logic.

Machine 105 may also include storage device 130, which may be controlled by device driver 135. Storage device 130 may be any desired form of storage device capable of communicating with PCIe switch 125. For example, storage device 130 may be a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD).

Although FIG. 1 depicts machine 105 as a server (which may be a standalone server or a rack server), embodiments of the inventive concept may include any desired type of machine 105 without limitation. For example, machine 105 may be replaced with a desktop or laptop computer or any other machine that may benefit from embodiments of the inventive concept. Machine 105 may also include specialized portable computing devices, tablet computers, smartphones, and other computing devices.

FIG. 2 shows additional details of the machine of FIG. 1. In FIG. 2, machine 105 typically includes one or more processors 110, which may include memory controller 120 and clock 205; clock 205 may be used to coordinate the operation of the components of machine 105. Processors 110 may also be coupled to memory 115, which may include, for example, random access memory (RAM), read-only memory (ROM), or other state-preserving media. Processors 110 may also be coupled to storage device 130 and to network connector 210, which may be, for example, an Ethernet connector or a wireless connector. Processors 110 may also be connected to bus 215, to which may be attached user interface 220 and input/output interface ports that may be managed using input/output engine 225, among other components.

FIG. 3 shows additional details of machine 105 of FIG. 1, including switchboards and a midplane that connects PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1 to the storage devices. In FIG. 3, machine 105 may include midplane 305 and switchboards 310 and 315. Switchboard 310 may include PCIe switch 125 with Look-Aside Erasure Coding logic and Baseboard Management Controller 325, and switchboard 315 may include PCIe switch 320 with Look-Aside Erasure Coding logic and Baseboard Management Controller 330. (Switchboards 310 and 315 may also include additional components not shown in FIG. 3; FIG. 3 focuses on the elements most relevant to embodiments of the inventive concept.)

In some embodiments of the inventive concept, each of PCIe switches 125 and 320 with Look-Aside Erasure Coding logic may support up to 96 PCIe lanes in total. PCIe switches 125 and 320 with Look-Aside Erasure Coding logic are connected to storage devices 130-1 through 130-6 using U.2 connectors, and each U.2 connector supports up to 4 PCIe lanes per device. With two x4 lanes in use (one x4 lane for each direction of communication), each PCIe switch can support up to 96 ÷ 8 = 12 devices. Thus FIG. 3 shows 12 storage devices 130-1 through 130-3 communicating with PCIe switch 125 with Look-Aside Erasure Coding logic, and 12 storage devices 130-4 through 130-6 communicating with PCIe switch 320 with Look-Aside Erasure Coding logic. However, the number of storage devices communicating with PCIe switches 125 and 320 with Look-Aside Erasure Coding logic is bounded only by the number of PCIe lanes those switches provide and the number of PCIe lanes used by each of storage devices 130-1 through 130-6.
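The device-count arithmetic above can be written as a short Python sketch. The function name `max_devices` and its parameters are hypothetical; the sketch only restates the division given in the text and uses integer division so that a partially served device is not counted.

```python
def max_devices(total_lanes, lanes_per_device):
    """Number of devices a switch can serve, given its lane budget and per-device lane use."""
    return total_lanes // lanes_per_device


# 96 lanes per switch, 8 lanes consumed per device (two x4 lanes, one per direction).
devices_per_switch = max_devices(96, 8)
```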

In some embodiments of the inventive concept, PCIe switches 125 and 320 with Look-Aside Erasure Coding logic may be implemented using custom circuitry. In other embodiments of the inventive concept, PCIe switches 125 and 320 with Look-Aside Erasure Coding logic may be implemented using a suitably programmed Field Programmable Gate Array (FPGA) or an Application-Specific Integrated Circuit (ASIC).

BMCs 325 and 330 may be used to configure storage devices 130-1 through 130-6. For example, BMCs 325 and 330 may initialize storage devices 130-1 through 130-6, erasing any data present on them: at startup, when storage devices 130-1 through 130-6 are added to the erasure coding scheme, or both. Alternatively, this functionality may be supported by a processor (either processor 110 of FIG. 1 or a local processor present, but not shown, on switchboards 310 and 315). BMCs 325 and 330 (or processor 110 of FIG. 1, or a local processor present, but not shown, on switchboards 310 and 315) may also be responsible for the initial configuration of the Look-Aside Erasure Coding logic of PCIe switches 125 and 320.

FIG. 3 shows an example full setup of data protection with two PCIe switches 125 and 320 with Look-Aside Erasure Coding logic: BMCs 325 and 330 may configure the Look-Aside Erasure Coding logic directly. BMCs 325 and 330 may have a hard-coded path in which a specific data protection scheme is applied, or a configurable setup. The latter may expose an interface to this configuration as a Basic Input/Output System (BIOS) option, or to additional software through a hardware-exposed interface. The hard-coded scheme may be built into the BIOS firmware and may still provide an option to enable or disable protection.

In the case of a storage device failure, BMCs 325 and 330 may detect via the control path when a storage device goes bad or is removed. BMCs 325 and 330 may then reconfigure the Look-Aside Erasure Coding logic to enable the failure scenario. BMCs 325 and 330 may be connected to the control path but not to the data path. Similarly, when a new storage device is inserted, BMCs 325 and 330 may intervene and configure the new storage device as part of an established group, or initiate a rebuild operation. The Look-Aside Erasure Coding logic may handle the actual rebuild; ideally, the recovery path in this setup should minimize the performance impact on data access while reconstructing the data on the rebuilt storage device from the remaining storage devices.

At this point it is worthwhile to define the term "erasure coding". Erasure coding is intended to describe any desired approach for encoding data on multiple storage devices. Erasure coding may require at least two storage devices, or at least two portions of a storage device (for example, a single shell or housing containing two or more NAND flash channels), because if only one storage device is used, the data may simply be stored using conventional data access techniques appropriate to that storage device. In other words, erasure coding is defined to mean an approach to storing data across two or more storage devices, two or more portions of a single storage device, or any combination thereof, in a manner that uses the storage devices more efficiently and/or provides data redundancy.

A Redundant Array of Independent Disks (RAID) represents a subset of erasure coding; put another way, the RAID levels represent specific implementations of various erasure coding schemes. However, there may be other erasure coding schemes that can be defined beyond the conventional RAID levels.

Typically, implementing erasure coding (or RAID) uses two or more physically distinct storage devices. But in some embodiments of the inventive concept, a single shell or housing may include multiple portions of a storage device that may be treated as separate storage devices for erasure coding purposes. For example, a single NVMe SSD shell or housing may include multiple NAND flash channels. For erasure coding purposes, each NAND flash channel may be treated as a separate storage device, with data striped (or otherwise encoded) across the NAND flash channels. In some embodiments of the inventive concept, this makes it possible to implement erasure coding using a single storage device. In addition, PCIe switch 125 with Look-Aside Erasure Coding logic may support an Error Correcting Code (either built in somewhere within PCIe switch 125 with Look-Aside Erasure Coding logic, or via additional logic) or other functionality that may be used with a single storage device.

FIG. 4 shows storage devices 130-1 through 130-6 of FIG. 3 used to achieve different erasure coding schemes. In FIG. 4, storage devices 130-1 through 130-6 may be used in a RAID 0 configuration, as shown in erasure coding scheme 405. RAID 0 stripes data across the storage devices. That is, the data is divided into logical units appropriate to the storage devices, and each logical unit is written to a different storage device, up to the number of storage devices in the array; once one logical unit of data has been written to every storage device, data is written to the first storage device again, and so on.
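The round-robin placement described for RAID 0 can be expressed as a small mapping. The Python sketch below is illustrative only: the function name `raid0_location` is hypothetical, and it assumes logical units are numbered from zero and devices are indexed from zero.

```python
def raid0_location(unit_index, num_devices):
    """Map a logical unit index to (device index, row on that device) under RAID 0 striping."""
    device = unit_index % num_devices   # units rotate across the devices
    row = unit_index // num_devices     # after one pass over all devices, move to the next row
    return device, row


# With 12 devices, unit 12 wraps back to the first device, one row down.
placement = [raid0_location(i, 12) for i in range(14)]
```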

RAID 0 has advantages over using a single storage device alone, or even over an unorganized group of disks (such as Just a Bunch of Disks (JBOD) or Just a Bunch of Flash (JBOF)). Because the data is stored on multiple storage devices, it can be read and written faster, with each storage device operating in parallel. Thus, for example, by dividing data across the 12 storage devices 130-1 through 130-6 as shown in FIG. 4, each storage device 130-1 through 130-6 needs to read or write only one twelfth of the total data, which is faster than reading or writing all of it. The total capacity of the array may be calculated as the number of storage devices in the array multiplied by the capacity of the smallest storage device in the array. So in FIG. 4, because the array includes 12 data storage devices, the total capacity of the array is 12 times the capacity of the smallest storage device in the array.

The disadvantage of RAID 0 is that there is no protection against storage device failure: if any storage device in the array fails, data is lost. In fact, RAID 0 may be considered riskier than JBOD or JBOF: because data is striped across multiple storage devices, all of the data is lost if any individual storage device fails. (In contrast, with JBOD or JBOF, a file is typically written to only one storage device. So although the failure of a single storage device in a JBOD or JBOF setup may cause some data loss, not all data is necessarily lost.)

RAID 0 does not include any redundancy, and so is technically not a Redundant Array of Independent Disks. But RAID 0 is traditionally considered a RAID level, and RAID 0 can certainly be considered an erasure coding scheme.

Erasure coding scheme 410 shows RAID 5, which is a common RAID scheme. In RAID 5, a parity block may be calculated for the data stored on the other storage devices in the stripe. Thus, in FIG. 4, because the RAID 5 array includes a total of 12 storage devices, 11 storage devices are used as data drives and 1 storage device is used as a parity drive. (In RAID 5, the parity data is not confined to a parity drive but is distributed across the storage devices like any other data. RAID 4, which is no longer in common use, stores all parity information on a single drive.) The total capacity of an array of n storage devices may be calculated as n - 1 times the capacity of the smallest storage device. Because each stripe includes one parity block, erasure coding scheme 410 can tolerate the failure of up to one storage device and still provide access to all data (the data on the failed storage device can be recovered from the data on the functioning storage devices in combination with the parity block).
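Single-block recovery from parity can be illustrated with bytewise XOR, which is the usual way RAID 5 parity is computed; the patent text itself does not name the parity function, so XOR is an assumption here. The Python sketch below works on one stripe of equal-length blocks, ignores the rotation of parity across devices, and uses hypothetical function names.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks; used both to compute parity and to recover."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe with three data blocks (a real array would have more).
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)

# Pretend the second block's device failed and rebuild it from the rest.
rebuilt = recover_missing([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, the same routine serves for both computing parity and rebuilding a lost block, which is why one parity block per stripe tolerates exactly one failure.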

Note that, compared with RAID 0, RAID 5 offers less total storage but provides some protection against storage device failure. This is an important trade-off when deciding among RAID levels: the relative importance of total storage capacity versus redundancy.

Other RAID levels not shown in FIG. 4 may also be used as erasure coding schemes. For example, RAID 6 uses two storage devices to store parity information, reducing the total storage capacity to n - 2 times the capacity of the smallest storage device, but tolerating up to two storage device failures. Hybrid schemes are also possible: for example, RAID 0+1, RAID 1+0, RAID 5+0, RAID 6+0, and other RAID schemes are all possible, each offering a different total storage capacity and a different tolerance for storage device failures. For example, five of storage devices 130-1 to 130-6 may be used to form one RAID 5 array, another five of storage devices 130-1 to 130-6 may be used to form a second RAID 5 array, and these two groups, combined with the remaining two storage devices, may be used to form a larger RAID 5 array. Or storage devices 130-1 to 130-6 may be divided into two groups, each group implementing a RAID 0 array, with the two groups acting as a larger RAID 1 array (thereby implementing a RAID 0+1 setup). It should be noted that RAID and erasure coding techniques use fixed or rotating codes, and the fixed code/parity drive notation above is for illustrative purposes only.

Erasure coding scheme 415 represents a more general description, applicable to all RAID levels and to any other desired erasure coding scheme. Given the array of storage devices 130-1 to 130-6, the storage devices may be divided into two groups: one group used to store data, and the other used to store codes. The codes may be parity information or any other desired coding information that permits recovery of missing data from a subset of the data in the data group and some of the codes in the code group. As shown in FIG. 4, erasure coding scheme 415 may include up to X data storage devices and Y code storage devices. Given any combination of X storage devices from the array, it is expected to be possible to access or reconstruct the data from all X data storage devices. Thus, erasure coding scheme 415 may generally tolerate the failure of up to Y storage devices in the array and still be able to access all data stored in the array. In terms of capacity, the total capacity of erasure coding scheme 415 is X times the capacity of the smallest storage device.
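
The capacity rule for the general X-data, Y-code scheme can be written down as a short sketch. It assumes, as the text does, that every device contributes only as much as the smallest device; the function name `usable_capacity` and its parameters are hypothetical.

```python
def usable_capacity(device_sizes, code_devices):
    """Usable capacity of an array that devotes `code_devices` (Y) devices'
    worth of space to codes; the remaining X devices hold data.

    Assumption: each device contributes only the capacity of the smallest one.
    """
    data_devices = len(device_sizes) - code_devices   # X
    if data_devices < 1:
        raise ValueError("need at least one data device")
    return data_devices * min(device_sizes)

TB = 2**40
raid0 = usable_capacity([TB] * 12, code_devices=0)   # n times the smallest device
raid5 = usable_capacity([TB] * 12, code_devices=1)   # n - 1 times
raid6 = usable_capacity([TB] * 12, code_devices=2)   # n - 2 times
```

The same function covers RAID 0 (Y = 0), RAID 5 (Y = 1), and RAID 6 (Y = 2) under these assumptions; mirrored and hybrid levels would need a different formula.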

Note that in the discussion above, the total capacity of any erasure coding scheme is described relative to "the capacity of the smallest storage device". For some erasure coding schemes, the storage devices may have different capacities and still be fully utilized. But some erasure coding schemes (such as RAID 0 or RAID 1) expect all storage devices to have the same capacity, and will discard any additional capacity that larger storage devices might include. Thus, the phrase "the capacity of the smallest storage device" should be understood as a relative phrase, and the total capacity offered by an array using any particular erasure coding scheme may be greater than the formulas above indicate.

Returning to FIG. 3, regardless of the particular erasure coding scheme used, the look-aside erasure coding logic of PCIe switches 125 and 320 effectively creates a new storage device out of physical storage devices 130-1 to 130-6. Since the storage device presented by the erasure coding scheme does not physically exist, this new storage device may be thought of as a virtual storage device. And since this virtual storage device uses physical storage devices 130-1 to 130-6, physical storage devices 130-1 to 130-6 should be hidden from the host. After all, it would be problematic for the host to attempt to directly access blocks on storage devices 130-1 to 130-6 when the data stored there may have been encoded in a way the host does not know about.

To support the use of this virtual storage device, PCIe switches 125 and/or 320 with look-aside erasure coding logic may inform processor 110 of FIG. 1 of the capacity of the virtual storage device. For example, if storage devices 130-1 to 130-6 include five NVMe SSDs, each storing 1 terabyte (TB) of data (for mathematical simplicity, 1 TB is taken to be 2^40 bytes rather than 10^12 bytes), and the erasure coding scheme implements a RAID 5 array, then the effective storage capacity of the virtual storage device is 4 TB. (Other implementations of erasure coding, using fewer or more storage devices, each of which may store less than or more than 1 TB, may result in virtual storage devices with different capacities.) PCIe switches 125 and/or 320 with look-aside erasure coding logic may notify processor 110 that they connect to a virtual storage device offering a total of 4 TB (or 2^42 bytes) of storage capacity. As described further below with reference to FIG. 5, processor 110 of FIG. 1 may then write data to blocks in this virtual storage device, and the look-aside erasure coding logic may handle the actual storage of the data. For example, if the blocks on the NVMe SSDs are each 4 kilobytes (KB) in size, processor 110 may request that data be written to logical blocks numbered between 0 and 2^30 - 1.
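
The arithmetic in this example can be checked with a few lines, using the same simplifications as the text (1 TB taken as 2^40 bytes, 4 KB blocks, one device's worth of parity). The variable names are illustrative only.

```python
TB = 2**40            # 1 TB taken as 2^40 bytes, as in the text
BLOCK = 4 * 2**10     # 4 KB block size

devices, parity_devices = 5, 1                       # RAID 5 over five NVMe SSDs
virtual_bytes = (devices - parity_devices) * TB      # capacity reported to the host
host_lbas = virtual_bytes // BLOCK                   # logical blocks on the virtual device
device_lbas = TB // BLOCK                            # logical blocks on each SSD
```

Under these assumptions the virtual device exposes host logical blocks 0 through `host_lbas - 1`, while each SSD exposes blocks 0 through `device_lbas - 1`.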

Alternatively, PCIe switches 125 and/or 320 with look-aside erasure coding logic may request a block of host memory addresses from processor 110 of FIG. 1, which represents a method for communicating with the virtual storage device. When processor 110 of FIG. 1 wants to read or write data, a transmission that includes the appropriate address within the block of host memory addresses may be sent to PCIe switches 125 and/or 320 with look-aside erasure coding logic. This block of host memory addresses should be at least as large as the virtual storage device implemented using the erasure coding scheme (and may be larger than the initial capacity of the virtual storage device, if additional storage devices are expected to be added to the erasure coding scheme during use).

FIG. 5 shows details of PCIe switch 125 with look-aside erasure coding logic of FIG. 1. In FIG. 5, PCIe switch 125 with look-aside erasure coding logic may include various components, such as connectors 505, PCIe-to-PCIe stacks 510-1 to 510-6, PCIe switch core 515, and Power Processing Unit (PPU) 520. Connectors 505 enable PCIe switch 125 with look-aside erasure coding logic to communicate with various other components in machine 105 of FIG. 1, such as processor 110 of FIG. 1 and storage devices 130-1 to 130-6 of FIG. 3. One or more of connectors 505 may be termed "external" connectors, because they connect to upstream components (such as processor 110 of FIG. 1); the remaining connectors 505 may be termed "internal" or "downstream" connectors, because they connect to downstream devices (such as storage devices 130-1 to 130-6 of FIG. 3). PCIe-to-PCIe stacks 510-1 to 510-6 permit the exchange of data between PCIe devices. For example, storage device 130-1 of FIG. 3 might be sending data to storage device 130-3 of FIG. 3. Or processor 110 of FIG. 1 might be requesting that one or more of storage devices 130-1 to 130-6 of FIG. 3 perform a read or write request. PCIe-to-PCIe stacks 510-1 to 510-6 may include buffers to temporarily store data: for example, if the destination device of a particular transmission is currently busy, a buffer in PCIe-to-PCIe stacks 510-1 to 510-6 may hold the transmission until the destination device is free. PPU 520 may act as a configuration center, handling any configuration requests for PCIe switch 125 with look-aside erasure coding logic. While FIG. 5 shows six PCIe-to-PCIe stacks 510-1 to 510-6, embodiments of the inventive concept may include any number of PCIe-to-PCIe stacks. PCIe switch core 515 operates to route data from one PCIe port to another.

Before turning to the operation of snooping logic 525 and erasure coding controller 530, it is helpful to understand that at least two different "addresses" are used for data stored on storage devices 130-1 to 130-6 of FIG. 3. On any storage device, data is written to a particular address associated with the hardware structure: this address may be thought of as a "physical" address. In the context of NVMe SSDs, the "physical" address is commonly called a Physical Block Address (PBA).

The flash memory used in NVMe SSDs typically does not permit data to be overwritten in place. Instead, when data needs to be overwritten, the old data is invalidated and the new data is written to a new block elsewhere on the NVMe SSD. Thus, the PBA at which the data associated with a particular data structure (whether a file, an object, or any other data structure) is written may change over time.

There are also other reasons for relocating data in flash memory. Data is typically erased from flash memory in units larger than the units used when writing data to flash memory. If valid data is stored elsewhere in the unit to be erased, that valid data must be written somewhere else in the flash memory before the unit can be erased. This erasure process is commonly called Garbage Collection, and the process of copying valid data out of the unit to be erased is called Programming. Wear Levelling (a process that attempts to keep the cells in flash memory used roughly equally) may also relocate data within the flash memory.

The host could be notified each time a particular block of data is moved, and told the new storage location of the data. But notifying the host in this way would place a significant burden on the host. Thus, most flash memory devices inform the host of a Logical Block Address (LBA) at which the data is stored, and maintain a table that maps LBAs to PBAs (typically in the Flash Translation Layer (FTL)). Then, whenever the data in question is moved to a new PBA, the flash memory may update the LBA-to-PBA mapping table in the FTL rather than notifying the host of the new address. Thus, for each storage device, there may be both a PBA and an LBA associated with the data.
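
A toy model may help make the LBA-to-PBA indirection concrete. The sketch below is not how any real flash translation layer is implemented; the class `TinyFTL`, its fields, and the append-only allocation policy are assumptions chosen only to show that the host-visible LBA stays fixed while the PBA changes on every rewrite.

```python
class TinyFTL:
    """Toy flash translation layer: maps LBAs to PBAs and never overwrites in place."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages, indexed by PBA
        self.lba_to_pba = {}              # the mapping table kept by the device
        self.invalid = set()              # stale pages awaiting garbage collection
        self.next_free = 0                # next unwritten physical page

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages (garbage collection is not modeled)")
        old = self.lba_to_pba.get(lba)
        if old is not None:
            self.invalid.add(old)         # invalidate the old copy
        pba = self.next_free
        self.next_free += 1
        self.flash[pba] = data
        self.lba_to_pba[lba] = pba        # only the table changes; the LBA does not
        return pba

    def read(self, lba):
        return self.flash[self.lba_to_pba[lba]]

ftl = TinyFTL(num_pages=8)
first_pba = ftl.write(7, b"old")
second_pba = ftl.write(7, b"new")         # same LBA, different physical page
```

The host only ever refers to LBA 7; the move from one physical page to another is absorbed by the mapping table, which is the burden the text says the host is spared.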

Adding the concept of the virtual storage device presented by the look-aside erasure coding logic introduces yet another level to this structure. Recall the example presented above with reference to FIG. 3, in which the erasure coding scheme includes five 1 TB NVMe SSDs, each using blocks 4 KB in size. Each NVMe SSD may include LBAs numbered from 0 to 2^28 - 1. But the virtual storage device presented to the host includes LBAs numbered from 0 to 2^30 - 1.

Thus, the LBA range seen by the host may represent a combination of multiple LBA ranges from the various storage devices. To distinguish between the LBA range used by the host and the LBA ranges of the individual storage devices, the LBAs used by the host may be termed "host LBAs", "global LBAs", or "operating system (O/S)-aware LBAs", whereas the LBAs used by the storage devices may be termed "device LBAs", "local LBAs", or "LBAs behind RoC". The host LBA range may be divided among the various storage devices in any desired manner. For example, the host LBA range may be divided into contiguous blocks, with each block assigned to a particular storage device. Using such a scheme, host LBA 0 through LBA 2^28 - 1 may be mapped to device LBA 0 through LBA 2^28 - 1 of storage device 130-1, host LBA 2^28 through LBA 2^29 - 1 may be mapped to device LBA 0 through LBA 2^28 - 1 of storage device 130-2, and so on. Alternatively, individual bits in the host LBA may be used to determine the appropriate storage device and the device LBA at which the data is stored: for example, the low-order bits of the host LBA may identify the device, and those bits may be stripped off to produce the device LBA used by the storage device. But however host LBAs are mapped to device LBAs, there may be two, three, or potentially even more different addresses that represent where data is stored.
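
The two mapping options just described can be sketched as follows. Both functions are illustrative assumptions: device indices are zero-based (index 0 standing for storage device 130-1), each device is assumed to hold 2^28 blocks, and neither function accounts for parity placement.

```python
LBAS_PER_DEVICE = 2**28   # a 1 TB device with 4 KB blocks, as in the example

def map_contiguous(host_lba):
    """Each device owns one contiguous range of host LBAs.

    Returns (device index, device LBA).
    """
    return divmod(host_lba, LBAS_PER_DEVICE)

def map_low_bits(host_lba, device_bits=2):
    """Low-order bits select the device; stripping them yields the device LBA.

    Returns (device index, device LBA).
    """
    device = host_lba & ((1 << device_bits) - 1)
    return device, host_lba >> device_bits

first_of_second_device = map_contiguous(2**28)      # host LBA 2^28
last_of_second_device = map_contiguous(2**29 - 1)   # host LBA 2^29 - 1
bit_mapped = map_low_bits(0b1101)                   # low bits 01, remainder 11
```

The contiguous scheme keeps neighboring host blocks on one device, while the low-order-bit scheme spreads neighboring host blocks across devices, which is closer to striping.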

Of course, there is no requirement that the storage devices be homogeneous: they may have different sizes and therefore different numbers of LBAs, and they may even be of different device types, for example mixing SSDs and hard disk drives.

Note that, for simplicity of description, the term "device LBA" may be used even if the address provided to the storage device is not a logical block address (for example, with a hard disk drive). If the "device LBA" is the actual address at which the data is stored on the storage device, the storage device need not map the device LBA to a different address before accessing the data.

Returning now to FIG. 5, snooping logic 525 and erasure coding controller 530 act as the look-aside erasure coding logic of PCIe switch 125 with look-aside erasure coding logic. Snooping logic 525 may "snoop" a transmission (for example, by intercepting the request before it is delivered to its destination) and determine the appropriate destination using capture interfaces 535-1 to 535-6, which may be passed to snooping logic 525 via multiplexer 540. As discussed above, processor 110 only "sees" a virtual storage device of a given capacity (or a block of host memory addresses of a particular size), and issues commands to read or write data based on host LBAs (associated with the virtual storage device). Snooping logic 525 may translate these host LBAs into device LBAs on one or more specific physical storage devices and change the transmission accordingly to direct the request. Snooping logic 525 may manage this translation in any desired manner. For example, snooping logic 525 may include a table that maps a first range of host LBAs to storage device 130-1 of FIG. 3, a second range of host LBAs to storage device 130-2 of FIG. 3, and so on, with the device LBA depending on factors that may relate to how the look-aside erasure coding logic operates: for example, the erasure coding scheme itself (such as the RAID level), the stripe size, the number of storage devices, and so on. Or snooping logic 525 may use particular bits in the host LBA to decide which of storage devices 130-1 to 130-6 of FIG. 3 stores the data in question: for example, if the array includes only two storage devices, snooping logic 525 may use the low-order bit (or some other bit in the logical block address) to determine whether the data is to be written to the first or the second storage device. (Obviously, as more storage devices are included in the array, more bits may be used, with appropriate care taken to ensure that no logical block address includes a combination of bits that "identifies" a non-existent storage device. For example, FIG. 3 shows a total of 24 storage devices 130-1 to 130-6, which may use bit values 00000 through 10111; bit values between 11000 and 11111 should be avoided.) Embodiments of the inventive concept may use any other desired approach to map logical block addresses received from the host to block addresses on the (appropriate) storage devices.
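
As one illustration of how the device LBA can depend on the RAID level, the stripe size, and the number of storage devices, the sketch below maps a host LBA onto a RAID 5 array with parity that rotates from stripe to stripe. The layout, the function name `raid5_map`, the `chunk_blocks` parameter, and the direction of rotation are all assumptions for illustration; the text does not prescribe a particular layout.

```python
def raid5_map(host_lba, num_devices, chunk_blocks=1):
    """Map a host LBA to (data device, device LBA, parity device).

    Assumed layout: each stripe holds (num_devices - 1) data chunks of
    `chunk_blocks` blocks plus one parity chunk, and the parity chunk
    moves to a different device on each successive stripe.
    """
    data_per_stripe = (num_devices - 1) * chunk_blocks
    stripe, offset = divmod(host_lba, data_per_stripe)
    chunk, within = divmod(offset, chunk_blocks)
    parity_dev = (num_devices - 1 - stripe) % num_devices   # rotate parity per stripe
    data_dev = chunk if chunk < parity_dev else chunk + 1    # skip the parity device
    return data_dev, stripe * chunk_blocks + within, parity_dev

first = raid5_map(0, num_devices=5)    # stripe 0: parity on the last device
later = raid5_map(7, num_devices=5)    # stripe 1: parity has moved to device 3
```

A table-driven implementation, as the text also suggests, would reach the same kind of result by lookup rather than arithmetic.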

As an example, consider processor 110 of FIG. 1 sending a write request with enough data to fill an entire stripe across all of storage devices 130-1 to 130-6 (after factoring in the erasure coding). Snooping logic 525 may divide the data into separate logical units, and erasure coding controller 530 may supplement or modify the data, as discussed below. Snooping logic 525 may then generate one transmission, carrying the appropriate data, destined for each of storage devices 130-1 to 130-6.

Note that when snooping logic 525 replaces the original host LBA with a device LBA appropriate to the storage device in question, this device LBA does not have to be a physical block address. In other words, the device LBA used by snooping logic 525 may itself be another logical block address. Such a structure enables a physical storage device to continue to manage its own data storage as appropriate. For example, if the physical storage device is an NVMe SSD, the SSD may move data around to perform garbage collection or wear leveling, using its flash translation layer to manage the association of the provided device LBA with a PBA on one of the NAND flash memory chips. Such operations may occur without the knowledge of snooping logic 525. But if the storage device in question does not relocate data unless instructed to do so by the host, then the device LBA provided by snooping logic 525 may be a physical address on the storage device.

As described above, erasure coding controller 530 may implement the erasure coding scheme. Depending on the erasure coding scheme, erasure coding controller 530 may simply generate the appropriate parity data (for example, when using a RAID 5 or RAID 6 erasure coding scheme), leaving the original data (as provided by processor 110 of FIG. 1) untouched. But in some embodiments of the inventive concept, erasure coding controller 530 may also modify the original data. For example, erasure coding controller 530 may apply an error correcting code to the original data, so that the blocks stored on the individual storage devices 130-1 to 130-6 of FIG. 3 may be read properly even when errors occur. Or erasure coding controller 530 may encrypt the data written to storage devices 130-1 to 130-6 of FIG. 3, making that data unreadable without the encryption key (or worse, leading erasure coding controller 530 to consider storage devices 130-1 to 130-6 corrupted if processor 110 of FIG. 1 were to write data to them directly). Or erasure coding controller 530 may introduce parity information (or a similar type of information) into the data written to each of storage devices 130-1 to 130-6 of FIG. 3. The specific operations that erasure coding controller 530 performs on the data depend on the erasure coding scheme in use.

Snooping logic 525 and erasure coding controller 530 may be implemented in any desired manner. For example, snooping logic 525 and erasure coding controller 530 may be implemented using a processor with suitable software stored thereon. But since PCIe switches are generally implemented as hardware circuitry (which is typically faster than software running on a processor, and a device such as a PCIe switch generally does not need to implement a large number of functions), snooping logic 525 and erasure coding controller 530 may be implemented using suitable circuitry. Such circuitry may include a suitably programmed FPGA, an ASIC, or any other desired hardware implementation.

In the most basic embodiment, the look-aside erasure coding logic may be implemented using only snooping logic 525 and erasure coding controller 530. But including cache 545 and/or write buffer 550 in the look-aside erasure coding logic may offer significant benefits.

Cache 545 may store a subset of the data stored in the virtual storage device. In general, cache 545 is smaller in capacity than the virtual storage device as a whole, but faster to access. Thus, by storing some data in cache 545, a cache hit on cache 545 may give the virtual storage device faster performance than accessing the data from the underlying physical storage devices. For example, cache 545 may store the data most recently accessed from the virtual storage device, using any desired algorithm (such as a Least Recently Used algorithm or a Least Frequently Used algorithm) to identify which data to replace as it becomes stale. Cache 545 may be implemented using any desired memory structure, such as DRAM, SRAM, MRAM, or any other desired memory structure. Cache 545 might even be implemented using a memory structure faster than conventional memory, such as those used in the L1 or L2 caches of a processor. Finally, although cache 545 is shown as part of PCIe switch 125 with look-aside erasure coding logic, cache 545 may also be stored in memory 115 of FIG. 1 and accessed from memory 115 by PCIe switch 125 with look-aside erasure coding logic.
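
A Least Recently Used replacement policy of the kind mentioned above can be sketched in a few lines. This is a software illustration only, keyed by host LBA; the class name `LRUCache` and its methods are hypothetical and say nothing about how cache 545 would be built in hardware.

```python
from collections import OrderedDict

class LRUCache:
    """Small read cache keyed by host LBA with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # host LBA -> block data, oldest first

    def get(self, lba):
        if lba not in self.entries:
            return None                       # miss: caller reads the storage devices
        self.entries.move_to_end(lba)         # mark as most recently used
        return self.entries[lba]

    def put(self, lba, data):
        self.entries[lba] = data
        self.entries.move_to_end(lba)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put(1, b"a")
cache.put(2, b"b")
cache.get(1)          # touch LBA 1, so LBA 2 becomes the eviction candidate
cache.put(3, b"c")    # exceeds capacity and evicts LBA 2
```

A Least Frequently Used policy would track access counts instead of recency, but the hit and miss paths would look the same.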

Write buffer 550 provides a mechanism to expedite write requests. A write operation to a virtual storage device that uses erasure coding to span multiple physical storage devices may take longer than a similar write request to a single physical storage device. Performing the write operation may involve reading data from other storage devices in the same block, after which the new data may be merged in, and the merged data may then be written back to the appropriate storage devices. Performing the merge may also involve calculating parity or other code information. And if the underlying physical storage devices are busy performing other operations (for example, processing read requests), the write request may also be delayed. Delaying the software running on processor 110 of FIG. 1 while it waits for the write request to complete may be undesirable. So, rather than blocking the software running on processor 110 of FIG. 1, write buffer 550 may temporarily store the data until the writes to the underlying physical storage devices complete, while snooping logic 525 informs the software running on processor 110 of FIG. 1 that the write request has completed. This approach is similar to a write-back cache policy, as opposed to a write-through cache policy, in which the write operation completes on the underlying physical storage devices before the software running on processor 110 is informed that the write has completed. Like cache 545, write buffer 550 may be implemented using any desired memory structure, such as DRAM, SRAM, MRAM, or L1 or L2 cache structures, among other possibilities.

As part of performing a write operation, the look-aside erasure coding logic may check whether any of the data needed to complete the write operation is currently in cache 545. For example, when processor 110 of FIG. 1 sends a write request to the virtual storage device, the erasure coding scheme may need to read the entire stripe to calculate parity or other code information. If some (or all) of that data resides in cache 545, the data may be accessed from cache 545 rather than read from the underlying physical storage devices. In addition, the caching policy may suggest that the data being written should also be cached in cache 545, in case the data is requested again in the near future.

Although FIG. 5 shows cache 545 and write buffer 550 as separate elements, embodiments of the inventive concept may combine the two into a single element (which may be termed simply a "cache"). In such embodiments of the inventive concept, the cache may include a bit indicating whether the data stored therein is "clean" or "dirty". "Clean" data is data that has only been read, and not modified, since it was last written to the underlying physical storage devices; "dirty" data has been modified since it was last written to the underlying physical storage devices. If the cache includes "dirty" data, the look-aside erasure coding logic may need to write the "dirty" data back to the underlying storage devices when the data is removed from the cache under the caching policy. In addition, embodiments of the inventive concept may include cache 545, write buffer 550, both (separately or combined into a single element), or neither.
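
The clean/dirty bookkeeping for a combined cache and write buffer can be sketched as below. The sketch assumes least-recently-used eviction and uses a plain dictionary to stand in for the underlying storage devices; the class `WriteBackCache` and its methods are hypothetical and omit the erasure coding step entirely.

```python
from collections import OrderedDict

class WriteBackCache:
    """Combined cache and write buffer with a per-entry dirty flag."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing                 # dict standing in for the storage devices
        self.entries = OrderedDict()           # lba -> [data, dirty]

    def read(self, lba):
        if lba not in self.entries:
            self._insert(lba, self.backing[lba], dirty=False)   # fill as clean data
        self.entries.move_to_end(lba)
        return self.entries[lba][0]

    def write(self, lba, data):
        # The write is acknowledged here, before it reaches the backing store.
        self._insert(lba, data, dirty=True)

    def _insert(self, lba, data, dirty):
        self.entries[lba] = [data, dirty]
        self.entries.move_to_end(lba)
        while len(self.entries) > self.capacity:
            victim, (vdata, vdirty) = self.entries.popitem(last=False)
            if vdirty:
                self.backing[victim] = vdata   # only dirty data needs writing back

devices = {0: b"x", 1: b"y"}
cache = WriteBackCache(capacity=1, backing=devices)
cache.write(0, b"new")      # dirty entry; the backing store still holds b"x"
before = devices[0]
cache.read(1)               # evicts LBA 0, which triggers the write-back
after = devices[0]
```

Clean entries are simply dropped on eviction, which is why tracking the dirty bit avoids unnecessary writes to the underlying devices.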

As discussed above, the Look-Aside Erasure Coding logic in PCIe switch with Look-Aside Erasure Coding logic 125 may "create" a virtual storage device from the underlying physical storage devices, and it would be problematic if processor 110 of FIG. 1 were to gain direct access to physical storage devices 130-1 through 130-6 of FIG. 3. Thus, when machine 105 of FIG. 1 initially boots (i.e., starts up or powers up) and attempts to enumerate the various accessible PCIe devices, PCIe switch with Look-Aside Erasure Coding logic 125 may determine that it is to use the Look-Aside Erasure Coding logic with its attached storage devices. In that case, PCIe switch with Look-Aside Erasure Coding logic 125 should prevent enumeration of any PCIe devices downstream of itself. By preventing such enumeration, PCIe switch with Look-Aside Erasure Coding logic 125 may "create" the virtual storage device without concern that processor 110 of FIG. 1 might directly access the data on storage devices 130-1 through 130-6 of FIG. 3 (which could corrupt the data used in the erasure coding scheme). But as discussed below with reference to FIGS. 9 and 10, there may be situations in which PCIe switch with Look-Aside Erasure Coding logic 125 should permit downstream enumeration of PCIe devices.

Probe logic 525 may also pass configuration commands to PPU 520. In this way, probe logic 525 may also operate as a PCIe-to-PCIe stack for the purpose of connecting PCIe switch core 515 with PPU 520.

Finally, probe logic 525 may receive Erasure Coding Enable signal 555 from processor 110 of FIG. 1 (possibly via a pin on PCIe switch with Look-Aside Erasure Coding logic 125). Erasure Coding Enable signal 555 may be used to enable or disable the erasure coding logic in PCIe switch with Look-Aside Erasure Coding logic 125.

FIG. 6 shows details of a PCIe switch with Look-Through Erasure Coding logic according to another embodiment of the inventive concept. As a comparison of FIGS. 5 and 6 shows, the primary difference between PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 5 and PCIe switch with Look-Through Erasure Coding logic 605 of FIG. 6 is where the erasure coding logic is placed. In PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 5, the erasure coding logic sits to the "side" of the PCIe switch, whereas in PCIe switch with Look-Through Erasure Coding logic 605 of FIG. 6, the erasure coding logic is "inline" with the PCIe switch.

Using Look-Aside Erasure Coding logic has technical advantages and disadvantages relative to Look-Through Erasure Coding logic. The Look-Aside Erasure Coding logic of FIG. 5 is the more complex implementation, because probe logic 525 is needed to intercept and manage the redirection of data from the host. In contrast, the Look-Through Erasure Coding logic of FIG. 6 is simpler to implement, because all data between the host and storage devices 130-1 through 130-6 of FIG. 3 passes through Erasure Coding Controller 530. On the other hand, when the erasure coding logic is disabled, the presence of Look-Aside Erasure Coding logic introduces no additional latency into the operation of PCIe switch 125. In contrast, the Look-Through Erasure Coding logic of FIG. 6 may act as a PCIe endpoint. It may buffer data between the host and storage devices 130-1 through 130-6 of FIG. 3, which may increase the latency of communication. In the Look-Through Erasure Coding logic of FIG. 6, Erasure Coding Controller 530 may also include elements such as a Frame Buffer, a Route Table, Port Arbitration logic, and a Scheduler (not shown in FIG. 6): elements typically included within PCIe switch core 515.

In addition, PCIe switches typically use the same number of ports for upstream (to the host) traffic and downstream (to storage devices and other connected devices) traffic. For example, if PCIe switch 605 includes a total of 96 ports, typically 48 are used for upstream traffic and 48 for downstream traffic. But where the Look-Through Erasure Coding logic of FIG. 6 is enabled, Erasure Coding Controller 530 may virtualize all of the downstream devices. In that case, typically only 16 or perhaps 32 upstream ports are needed to communicate with the host. If PCIe switch 605 includes more than 32 or 64 ports, the additional ports may be used to connect additional downstream devices, which may in turn be used to increase the capacity of the virtual storage device. To this end, Erasure Coding Controller 530 of FIG. 6 may use a non-transparent bridge (NTB) port to communicate with the host.

FIG. 6 shows PCIe switch 605 as including the Look-Through Erasure Coding logic. But embodiments of the inventive concept may separate the Look-Through Erasure Coding logic from PCIe switch 605. For example, the Look-Through Erasure Coding logic may be implemented, using an FPGA or ASIC, as a component separate from PCIe switch 605.

But although there are implementation and technical differences between the Look-Aside Erasure Coding logic shown in FIG. 5 and the Look-Through Erasure Coding logic shown in FIG. 6, functionally the two achieve similar results. Accordingly, the Look-Aside Erasure Coding logic shown in FIG. 5 and the Look-Through Erasure Coding logic shown in FIG. 6 may be interchanged as desired. Any reference in this document to Look-Aside Erasure Coding logic is intended to encompass Look-Through Erasure Coding logic as well.

FIGS. 7 through 10 show various topologies that use PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 1. Regardless of the topology in use, the operation of PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 1 is the same: it provides connectivity to the various attached storage devices and supports erasure coding across those storage devices.

FIG. 7 shows a first topology using PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 1, according to one embodiment of the inventive concept. In FIG. 7, PCIe switch with Look-Aside Erasure Coding logic 125 is shown, which may be implemented as a separate component of machine 105 of FIG. 1. That is, PCIe switch with Look-Aside Erasure Coding logic 125 may be manufactured and sold separately from any other component, such as processor 110 of FIG. 1 or storage device 130.

PCIe switch with Look-Aside Erasure Coding logic 125 may be connected to storage device 130. In FIG. 7, PCIe switch with Look-Aside Erasure Coding logic 125 is shown connected to only a single storage device, which may not support erasure coding: erasure coding requires at least two storage devices, or at least two portions of a storage device, to perform striping, chunking, grouping, and the use of parity or code information. But even with a single storage device, PCIe switch with Look-Aside Erasure Coding logic 125 may offer some advantages. For example, PCIe switch with Look-Aside Erasure Coding logic 125 may support the use of error correcting codes with storage device 130, or encryption of the data stored on storage device 130, if storage device 130 does not provide those services natively.

Storage device 130 may also be connected to FPGA 705. FPGA 705 may support acceleration. In short, there may be situations in which data needs to be processed and then discarded. Loading all of that data into processor 110 of FIG. 1 to perform the processing may be expensive and time-consuming: the computation may be performed more easily closer to the data. FPGA 705 may support performing such computations closer to the storage, so that the data no longer needs to be loaded into processor 110 of FIG. 1: this concept is referred to as "acceleration". FPGA-based acceleration is discussed further in U.S. Patent Application No. 16/122,865, filed September 5, 2018, which claims the benefit of U.S. Provisional Patent Application No. 62/642,568, filed March 13, 2018, U.S. Provisional Patent Application No. 62/641,267, filed March 9, 2018, and U.S. Provisional Patent Application No. 62/638,904, filed March 5, 2018, all of which are incorporated by reference herein, and in U.S. Patent Application No. 16/124,179, filed September 6, 2018, U.S. Patent Application No. 16/124,182, filed September 6, 2018, and U.S. Patent Application No. 16/124,182, filed September 6, 2018, all of which are continuations of U.S. Patent Application No. 16/122,865, filed September 5, 2018, and are incorporated by reference herein. Because the purpose of acceleration is to process data without transferring it to processor 110 of FIG. 1, FIG. 7 shows FPGA 705 closer to storage device 130. Note, however, that the particular arrangement shown in FIG. 7 is not required: FPGA 705 may be located between PCIe switch with Look-Aside Erasure Coding logic 125 and storage device 130.

Aside from data acceleration, FPGA 705 may offer other functions to support storage device 130. For example, FPGA 705 may implement data deduplication on storage device 130, to attempt to reduce the number of times the same data is stored on storage device 130. FPGA 705 may determine whether particular data is stored on storage device 130 more than once, establish an association between the various Logical Block Addresses (or other information used by the host to identify the data) and the location where the data is stored on storage device 130, and delete the additional copies.
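The association between Logical Block Addresses and a single stored copy can be sketched as follows. This is only an illustration under stated assumptions: it identifies duplicate data by a SHA-256 digest (the text does not specify how duplicates are detected), and the class `DedupStore` is hypothetical.

```python
import hashlib


class DedupStore:
    """Maps many LBAs onto one stored copy of identical data."""

    def __init__(self):
        self.lba_to_digest = {}  # host LBA -> content digest
        self.blocks = {}         # content digest -> data, stored once

    def write(self, lba, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # store only the first copy
        self.lba_to_digest[lba] = digest

    def read(self, lba):
        return self.blocks[self.lba_to_digest[lba]]

    def stored_copies(self):
        return len(self.blocks)
```

A complete implementation would also need reference counting so that a block can be reclaimed once no LBA refers to it; that is omitted here for brevity.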

Alternatively, FPGA 705 may implement data integrity functions on storage device 130, such as adding error correcting codes to protect against data loss caused by operational errors of storage device 130, or T10 DIF (Data Integrity Field) end-to-end protection using a Cyclic Redundancy Check (CRC). In this way, FPGA 705 may be able to detect when data has been written or read incorrectly, whether on storage device 130 or in transit, and to recover the original data. Note that FPGA 705 may implement data integrity functions without the host being aware that they are being provided: the host may see only the data itself, and none of the error correcting codes.
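As an illustration of CRC-based protection, the sketch below computes a 16-bit CRC with the polynomial 0x8BB7, which is the polynomial used for the T10 DIF guard tag, and appends it to a block. The helper names `protect` and `verify` are hypothetical, and the sketch covers only the guard value, not the application and reference tags of a full DIF tuple. A CRC on its own detects corruption; recovering the original data would additionally require an error correcting code or a redundant copy.

```python
T10_DIF_POLY = 0x8BB7  # generator polynomial of the T10 DIF guard CRC


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 (initial value 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(block: bytes) -> bytes:
    """Append the 2-byte guard value to a data block."""
    return block + crc16_t10dif(block).to_bytes(2, "big")


def verify(protected: bytes) -> bool:
    """Recompute the CRC and compare it with the stored guard value."""
    block, guard = protected[:-2], protected[-2:]
    return crc16_t10dif(block) == int.from_bytes(guard, "big")
```

Because the guard value is stored alongside the data rather than returned with it, the host can remain unaware that the check is being performed.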

Alternatively, FPGA 705 may implement data encryption on storage device 130, to prevent unauthorized parties from accessing the data on storage device 130: without the appropriate encryption key, the data returned from FPGA 705 may be meaningless to the requester. The host may provide the encryption key to be used when writing and reading data. Or, FPGA 705 may perform data encryption and decryption automatically: FPGA 705 may store the encryption keys (and may even generate them on behalf of the host) and determine the appropriate encryption key to use based on who is requesting the data.

Alternatively, FPGA 705 may implement data compression on storage device 130, to reduce the amount of space needed to store data on storage device 130. When data is written to storage device 130, FPGA 705 may compress the data provided by the host into a smaller amount of storage, then store the compressed data (along with any information needed to recover the original data when it is read from storage device 130). When data is read from storage device 130, FPGA 705 may read the compressed data (along with any information needed to recover the original data from it) and remove the compression to recover the original data.
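The write and read paths described above can be sketched with a general-purpose compressor. The example assumes DEFLATE compression via Python's standard `zlib` module and a hypothetical record layout that keeps the original length as the "information needed to recover the original data"; the embodiments are not limited to any particular algorithm.

```python
import zlib


def compress_for_storage(data: bytes) -> dict:
    """Write path: compress host data and keep recovery metadata."""
    return {"original_length": len(data), "payload": zlib.compress(data)}


def restore_from_storage(record: dict) -> bytes:
    """Read path: remove the compression and check the recovered length."""
    data = zlib.decompress(record["payload"])
    if len(data) != record["original_length"]:
        raise ValueError("decompressed length does not match stored metadata")
    return data
```

The stored record is smaller than the original only when the data is compressible; highly random data may not shrink at all.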

Any desired implementation of data deduplication, data integrity, data encryption, and data compression may be used. Embodiments of the inventive concept are not limited to a particular implementation of any of these functions.

FPGA 705 may also implement any combination of functions on storage device 130, as desired. For example, FPGA 705 may implement both data compression and data integrity (since data compression may increase the sensitivity of data to errors: a single error in the data as stored on storage device 130 may render a large amount of data unusable). Or FPGA 705 may implement both data encryption and data compression (to protect the data while using as little storage as possible for it). FPGA 705 may also offer other combinations of two or more functions.

In terms of overall operation, when implementing any of these functions, FPGA 705 may read the data from an appropriate source. Note that although the term "source" is a singular noun, embodiments of the inventive concept may read data from multiple sources (such as multiple storage devices) where appropriate. FPGA 705 may then perform the appropriate operations on the data: data acceleration, data integrity, data encryption, and/or data compression. FPGA 705 may then take an appropriate action with the result of the operation: for example, sending the result to host 105 of FIG. 1, or writing the data to storage device 130.

Although the functions above are described with reference to FPGA 705 of FIG. 7, embodiments of the inventive concept may include these functions anywhere in a system that includes an FPGA. Further, embodiments of the inventive concept may have FPGA 705 access data from a "remote" storage device. For example, returning momentarily to FIG. 3, assume that storage device 130-1 includes an FPGA similar to FPGA 705, but that storage device 130-2 lacks such an FPGA. The FPGA included in storage device 130-1 may be used to apply its functions to storage device 130-2 by sending requests to storage device 130-2. For example, if the FPGA in storage device 130-1 offers data acceleration, it may send a request to read data from storage device 130-2, perform the appropriate acceleration, and then send the result to the appropriate destination (such as host 105 of FIG. 1).

In FIG. 7 (and in the topologies shown in FIGS. 8 through 10 below), PCIe switch with Look-Aside Erasure Coding logic 125 may be attached to devices that do not qualify for erasure coding. For example, PCIe switch with Look-Aside Erasure Coding logic 125 may be attached to other storage devices that have built-in erasure coding functionality, or to devices that are not storage devices, such as FPGA 705 of FIG. 7 or a graphics processing unit (GPU). All such devices may be described as devices that do not qualify for erasure coding (or at least, not for the erasure coding provided by PCIe switch with Look-Aside Erasure Coding logic 125).

When PCIe switch with Look-Aside Erasure Coding logic 125 is connected to devices that do not qualify for erasure coding, the system has several alternatives available. In one embodiment of the inventive concept, the presence of any device that does not qualify for erasure coding may cause the Look-Aside Erasure Coding logic of PCIe switch with Look-Aside Erasure Coding logic 125 to be disabled. Thus, for example, if PCIe switch with Look-Aside Erasure Coding logic 125 is connected to FPGA 705 of FIG. 7, to a GPU, or to a storage device with native erasure coding logic, then none of the storage devices connected to PCIe switch with Look-Aside Erasure Coding logic 125 may be used with erasure coding. Note that the decision to disable the Look-Aside Erasure Coding logic of PCIe switch with Look-Aside Erasure Coding logic 125 does not necessarily carry over to other PCIe switches with Look-Aside Erasure Coding logic, whether in the same chassis or in other chassis. For example, FIG. 3 shows two PCIe switches with Look-Aside Erasure Coding logic 125 and 320, one of which may have its Look-Aside Erasure Coding logic enabled while the other has it disabled.

Another embodiment of the inventive concept may disable the devices that do not qualify for erasure coding, treating them as though they were not connected to PCIe switch with Look-Aside Erasure Coding logic 125 at all. In this embodiment of the inventive concept, PCIe switch with Look-Aside Erasure Coding logic 125 may enable the Look-Aside Erasure Coding logic for storage device 130 and may disable any other devices that do not qualify for erasure coding, as though they were not connected to PCIe switch with Look-Aside Erasure Coding logic 125.

In yet another embodiment of the inventive concept, PCIe switch with Look-Aside Erasure Coding logic 125 may enable the Look-Aside Erasure Coding logic for the storage devices that it can cover, while still allowing the other devices that do not qualify for erasure coding to be accessed. This embodiment of the inventive concept is the most complex implementation: PCIe switch with Look-Aside Erasure Coding logic 125 needs to determine which devices qualify for erasure coding and which do not, and then analyze the traffic to determine whether it is destined for the virtual storage device (in which case the traffic is intercepted by the Look-Aside Erasure Coding logic) or not (in which case the traffic is delivered to its original destination).
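The two steps of this third alternative, classifying devices and then steering traffic, can be sketched as follows. The device description format (`kind`, `native_ec`), the `Action` names, and the choice to reject direct access to storage devices that sit behind the virtual device are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    INTERCEPT = "intercept"  # hand to the erasure coding logic
    FORWARD = "forward"      # deliver to the original destination
    REJECT = "reject"        # device is hidden behind the virtual device


def classify_devices(devices):
    """Split devices into erasure-coding-eligible storage and everything else."""
    eligible = {
        name
        for name, info in devices.items()
        if info["kind"] == "storage" and not info["native_ec"]
    }
    return eligible, set(devices) - eligible


def route(destination, virtual_device, eligible):
    """Decide what the switch does with traffic for `destination`."""
    if destination == virtual_device:
        return Action.INTERCEPT
    if destination in eligible:
        return Action.REJECT  # assumed: no direct host access to member devices
    return Action.FORWARD
```

Under these assumptions, a GPU or a storage device with native erasure coding remains directly reachable, while traffic for the virtual storage device is handled by the erasure coding logic.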

In embodiments of the inventive concept in which machine 105 ultimately does not offer the full functionality of the installed devices (that is, embodiments in which erasure coding is disabled because of the presence of devices that do not qualify for erasure coding, or in which such devices are disabled by PCIe switch with Look-Aside Erasure Coding logic 125), machine 105 may notify the user of this fact. The notification may be provided by processor 110 of FIG. 1, BMC 325 of FIG. 3, or PCIe switch with Look-Aside Erasure Coding logic 125. Aside from informing the user that some functionality has been disabled, the notification may also inform the user how to reconfigure machine 105 to enable the additional functionality. For example, the notification may suggest that devices that do not qualify for erasure coding be connected to particular slots in midplane 305 of FIG. 3 (perhaps those connected to PCIe switch with Look-Aside Erasure Coding logic 320) and that storage devices that qualify for erasure coding be connected to other slots (such as those connected to PCIe switch with Look-Aside Erasure Coding logic 125). In this way, at least some of the storage devices that qualify for erasure coding may benefit from the erasure coding scheme without blocking access to the other devices that do not qualify.

FIG. 8 shows a second topology using PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 1, according to another embodiment of the inventive concept. In FIG. 8, PCIe switch with Look-Aside Erasure Coding logic 125 may be located within FPGA 705: that is, FPGA 705 may also implement PCIe switch with Look-Aside Erasure Coding logic 125. FPGA 705 and PCIe switch with Look-Aside Erasure Coding logic 125 may then be connected to storage devices 130-1 through 130-4. Although FIG. 8 shows FPGA 705 and PCIe switch with Look-Aside Erasure Coding logic 125 connected to four storage devices 130-1 through 130-4, embodiments of the inventive concept may include any number of storage devices 130-1 through 130-6.

Typically, the topology shown in FIG. 8 may be implemented within a single shell or housing that contains all of the elements shown (SSDs 130-1 through 130-4 may be separate flash memories rather than self-contained SSDs). That is, the entire structure shown in FIG. 8 may be sold as a single unit rather than as separate elements. But embodiments of the inventive concept may also include a riser card that connects at one end to machine 105 of FIG. 1 (perhaps to midplane 305 of FIG. 3) and has connectors at the other end (such as U.2, M.3, or SFF-TA-1008 connectors) for connecting to storage devices 130-1 through 130-4. And although FIG. 8 shows PCIe switch with Look-Aside Erasure Coding logic 125 as part of FPGA 705, PCIe switch with Look-Aside Erasure Coding logic 125 may also be implemented as part of a Smart SSD.

FIG. 9 shows a third topology using PCIe switch with Look-Aside Erasure Coding logic 125 of FIG. 1, according to yet another embodiment of the inventive concept. In FIG. 9, two PCIe switches with Look-Aside Erasure Coding logic 125 and 320 are shown, between them connecting up to 24 storage devices 130-1 through 130-6. As described above with reference to FIG. 3, each PCIe switch with Look-Aside Erasure Coding logic 125 and 320 may include 96 PCIe lanes, with four PCIe lanes used in each direction to communicate with one of storage devices 130-1 through 130-6: each PCIe switch with Look-Aside Erasure Coding logic 125 and 320 may therefore support up to 12 storage devices. To support erasure coding across storage devices served by multiple PCIe switches with Look-Aside Erasure Coding logic 125 and 320, one PCIe switch with Look-Aside Erasure Coding logic may be designated as responsible for erasure coding across all of the devices and may have its Look-Aside Erasure Coding logic enabled. The other PCIe switch with Look-Aside Erasure Coding logic 320 may operate purely as a PCIe switch, with its Look-Aside Erasure Coding logic disabled. The choice of which PCIe switch handles erasure coding may be made in any desired manner: for example, the two PCIe switches may negotiate this between themselves, or the PCIe switch that is enumerated first may be designated to handle erasure coding. The PCIe switch selected to handle erasure coding may then report the virtual storage device (spanning both PCIe switches), while the PCIe switch that does not handle erasure coding may report no downstream devices (to prevent processor 110 of FIG. 1 from attempting to access storage devices that are part of the erasure coding scheme).
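The device counts in the paragraph above follow from simple lane arithmetic, shown here as a small sketch; the function name `max_devices` is hypothetical.

```python
def max_devices(total_lanes: int, lanes_per_direction: int) -> int:
    """Devices one switch can serve when each needs lanes in both directions."""
    lanes_per_device = 2 * lanes_per_direction
    return total_lanes // lanes_per_device


# 96 lanes, four lanes in each direction per storage device
per_switch = max_devices(96, 4)   # 96 // 8 == 12
across_two_switches = 2 * per_switch
```

With 96 lanes per switch and eight lanes per storage device, each switch serves 12 devices, so two switches together serve 24.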

Note that although PCIe switches with Look-Aside Erasure Coding logic 125 and 320 may both be in the same chassis, they may also be in different chassis. That is, the erasure coding scheme may span storage devices across multiple chassis. All that is required is that the PCIe switches in the various chassis be able to negotiate with one another the locations of the storage devices that are to be part of the erasure coding scheme. Nor are embodiments of the inventive concept limited to two PCIe switches with Look-Aside Erasure Coding logic 125 and 320: the storage devices included in the erasure coding scheme may be connected to any number of PCIe switches with Look-Aside Erasure Coding logic 125 and 320.

Host LBAs may be divided across PCIe switches with Look-Aside Erasure Coding logic 125 and 320 in any desired manner. For example, the least significant bit of the host LBA may be used to identify which PCIe switch with Look-Aside Erasure Coding logic 125 or 320 includes the storage device that stores the data at that host LBA. With more than two PCIe switches with Look-Aside Erasure Coding logic, multiple bits may be used to determine which PCIe switch with Look-Aside Erasure Coding logic manages the storage device that stores the data. Once the appropriate PCIe switch with Look-Aside Erasure Coding logic has been identified (and probe logic 525 of FIG. 5 has modified the transmission), the transmission may be routed to that PCIe switch with Look-Aside Erasure Coding logic (assuming that the destination of the transmission is not a storage device connected to the PCIe switch whose Look-Aside Erasure Coding logic is enabled).
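One way to express this bit-based selection is sketched below. It assumes the number of switches is a power of two so that a fixed number of low-order LBA bits selects a switch; the function name `switch_for_lba` is hypothetical, and other ways of dividing the LBA space are equally possible.

```python
def switch_for_lba(host_lba: int, num_switches: int) -> int:
    """Pick a switch index from the low-order bits of the host LBA."""
    if num_switches < 1 or num_switches & (num_switches - 1):
        raise ValueError("this sketch assumes a power-of-two switch count")
    return host_lba & (num_switches - 1)
```

With two switches, the mask is 1, so even LBAs map to switch 0 and odd LBAs map to switch 1; with four switches, the two least significant bits are used.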

In another embodiment of the inventive concept, rather than having a single PCIe switch with Look-Aside Erasure Coding logic be responsible for virtualizing all storage devices connected to both PCIe switches with Look-Aside Erasure Coding logic, each PCIe switch with Look-Aside Erasure Coding logic may create a separate virtual storage device (with a separate Erasure Coding domain). In this way, different Erasure Coding domains may be created for different customers, albeit with smaller capacities.

FIG. 9 may also represent a further embodiment of the inventive concept. Although FIG. 9 implies that only storage devices 130-1 through 130-16 are connected to PCIe switches 125 and 320 with Look-Aside Erasure Coding logic, and that all storage devices 130-1 through 130-6 may be used with the Erasure Coding scheme, as discussed above embodiments of the inventive concept are not so limited: PCIe switches 125 and 320 with Look-Aside Erasure Coding logic may have devices connected to them that are not eligible for Erasure Coding. Such devices may be grouped under a single PCIe switch with Look-Aside Erasure Coding logic, with the storage devices that are eligible for Erasure Coding grouped under a different PCIe switch 125 with Look-Aside Erasure Coding logic. In this way, the best functionality of machine 105 of FIG. 1 may be achieved, with one (or some) PCIe switches with Look-Aside Erasure Coding logic enabling the Look-Aside Erasure Coding logic and one (or some) PCIe switches with Look-Aside Erasure Coding logic disabling it.

FIG. 10 shows a fourth topology for using PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1, according to yet another embodiment of the inventive concept. In FIG. 10, in contrast to FIG. 9, PCIe switches 125, 320, and 1005 with Look-Aside Erasure Coding logic may be organized in a hierarchy. At the top of the hierarchy, PCIe switch 125 with Look-Aside Erasure Coding logic may manage Erasure Coding for all storage devices below it in the hierarchy, and so may have its Look-Aside Erasure Coding logic enabled. PCIe switches 320 and 1005 with Look-Aside Erasure Coding logic, on the other hand, may have their Look-Aside Erasure Coding logic disabled (since their storage devices are managed by the Look-Aside Erasure Coding logic of PCIe switch 125).

Although FIG. 10 shows three PCIe switches 125, 320, and 1005 with Look-Aside Erasure Coding logic organized in a two-tier hierarchy, embodiments of the inventive concept are not limited in the number of PCIe switches included or in their hierarchical arrangement. Embodiments of the inventive concept may therefore support any number of PCIe switches with Look-Aside Erasure Coding logic, arranged in any desired hierarchy.

The embodiments of the inventive concept described above with reference to FIGS. 1 to 10 focus on single-port storage devices. But embodiments of the inventive concept may extend to dual-port storage devices, where one (or more) storage devices communicate with multiple PCIe switches with Look-Aside Erasure Coding logic. In such embodiments of the inventive concept, if PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 is unable to communicate with the dual-port storage device, PCIe switch 125 may send a transmission to PCIe switch 320 with Look-Aside Erasure Coding logic to attempt to communicate with the storage device. PCIe switch 320 with Look-Aside Erasure Coding logic then effectively acts as a bridge that lets PCIe switch 125 with Look-Aside Erasure Coding logic communicate with the storage device.

Embodiments of the inventive concept may also support detecting and handling storage device failures. For example, consider FIG. 4 again and assume that storage device 130-1 fails. Storage device 130-1 may fail for any number of reasons: a power surge may have damaged the electronics; the wiring (within storage device 130-1 or in the connection between storage device 130-1 and PCIe switch 125 with Look-Aside Erasure Coding logic) may have failed; storage device 130-1 may have detected too many errors and shut itself down; or storage device 130-1 may have failed for some other reason. Storage device 130-1 may also have been removed from its slot by a user (perhaps to replace it with a newer, more reliable, or larger storage device). Whatever the reason, storage device 130-1 may become unavailable.

PCIe switch 125 with Look-Aside Erasure Coding logic may detect the failure of storage device 130-1 via a presence pin on the connector for storage device 130-1. If storage device 130-1 is removed from the chassis, or if storage device 130-1 has shut down, it may no longer assert its presence via the presence pin on the connector, which may trigger an interrupt in PCIe switch 125 with Look-Aside Erasure Coding logic. Alternatively, PCIe switch 125 with Look-Aside Erasure Coding logic (or BMC 325 of FIG. 3) may send occasional messages to storage device 130-1 to check that it is still active (a process sometimes called a "heartbeat"): if storage device 130-1 does not respond to such messages, PCIe switch 125 with Look-Aside Erasure Coding logic or BMC 325 of FIG. 3 may conclude that storage device 130-1 has failed.
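The two detection mechanisms above (presence pin and heartbeat) can be combined in a short sketch. The `DeviceStatus` record, its field names, and the `max_missed` threshold are all hypothetical; the text only says that a device that does not respond may be concluded to have failed, without specifying how many missed responses are tolerated.

```python
from dataclasses import dataclass


@dataclass
class DeviceStatus:
    """Hypothetical per-device state tracked by the switch or BMC."""
    present: bool           # True while the presence pin is asserted
    missed_heartbeats: int  # consecutive heartbeat messages left unanswered


def has_failed(status: DeviceStatus, max_missed: int = 3) -> bool:
    """Treat a device as failed if its presence pin is deasserted or it
    has ignored max_missed heartbeats in a row (threshold is an assumption)."""
    return (not status.present) or status.missed_heartbeats >= max_missed
```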

If (and when) storage device 130-1 fails, PCIe switch 125 with Look-Aside Erasure Coding logic may manage the situation by accessing, through other means, any data that would normally be requested from storage device 130-1. For example, if a mirror of storage device 130-1 exists, PCIe switch 125 with Look-Aside Erasure Coding logic may request the data from the mirror of storage device 130-1. Or PCIe switch 125 with Look-Aside Erasure Coding logic may request the rest of the stripe containing the desired data from the other storage devices in the array and use the Erasure Coding information to reconstruct the data from storage device 130-1. Other mechanisms may also exist by which PCIe switch 125 with Look-Aside Erasure Coding logic may access the data stored on failed storage device 130-1.
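As one concrete instance of reconstructing data from the rest of a stripe, the sketch below assumes a single-parity scheme in which the parity block is the bytewise XOR of the data blocks (as in RAID 5). The text does not fix a particular code, so this is only an illustration; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks):
    """Recover the one missing block of a single-parity stripe.

    surviving_blocks must hold every other block of the stripe, data and
    parity alike; XOR-ing them yields the block from the failed device.
    """
    return xor_blocks(surviving_blocks)
```

For example, with data blocks `d0`, `d1`, `d2` and `parity = xor_blocks([d0, d1, d2])`, the call `reconstruct_missing([d0, d2, parity])` returns `d1`. Schemes that tolerate more than one failure need a different code and are not covered by this sketch.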

Embodiments of the inventive concept may also support detecting and handling the insertion of a new storage device into the array. As with detecting a failed storage device, PCIe switch 125 with Look-Aside Erasure Coding logic (or BMC 325 of FIG. 3) may detect the insertion of a new storage device using a presence pin on the connector, by occasionally pinging devices to see what is connected, or by any other desired mechanism (and, as with detecting a failed storage device, using the presence pin to detect a new storage device may trigger an interrupt in PCIe switch 125 with Look-Aside Erasure Coding logic). When a new storage device is detected, PCIe switch 125 with Look-Aside Erasure Coding logic may add the new storage device to the array. Adding a new storage device to the array does not necessarily involve changing the Erasure Coding scheme: such a change might require changing all the data stored on the storage devices. (For example, consider changing from RAID 5 to RAID 6: each stripe would now need two parity blocks, which would need to be rotated across the storage devices, requiring a large amount of data to be computed and moved.) But adding a new storage device to an existing Erasure Coding scheme may not require moving large amounts of data around. So although adding a new storage device might not increase the array's tolerance for storage device failures, it may still increase the capacity of the virtual storage device.

If a failed storage device already exists in the array, the insertion of a new storage device may be used to rebuild the failed storage device. Erasure Coding Controller 530 of FIG. 5 may compute the data that was stored on the failed storage device and store that data at the appropriate block addresses on the replacement storage device. For example, original data on the failed storage device may be computed from the data on the other storage devices (both original data and parity or code information); parity or code information stored on the failed storage device may be recomputed from the original data on the other storage devices. (Of course, if a mirror of the failed storage device exists, Erasure Coding Controller 530 of FIG. 5 may simply instruct that the data be copied from the mirror to the replacement storage device.)

Rebuilding a failed storage device may be a time-consuming process. In some embodiments of the inventive concept, the rebuild may take place as soon as the replacement storage device is installed. In other embodiments of the inventive concept, to the extent that the storage device can be rebuilt during idle periods, Erasure Coding Controller 530 of FIG. 5 may rebuild the storage device during those idle periods. If the virtual storage device is busy, however, Erasure Coding Controller 530 of FIG. 5 may defer rebuilding the replacement storage device until idle time occurs, and may reconstruct data from the failed storage device on demand, based on requests from processor 110 of FIG. 1. (Of course, data reconstructed in this way may be written to the replacement storage device without waiting for the complete rebuild, so that it does not need to be recomputed later.)
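The deferred-rebuild behavior described above can be sketched as a small bookkeeping class. Everything here is an assumption for illustration: the class name `RebuildTracker`, the per-stripe granularity, the `reconstruct` callback (which stands in for whatever Erasure Coding recovery is in use), and the `is_idle` predicate.

```python
class RebuildTracker:
    """Hypothetical tracker for rebuilding a replacement storage device."""

    def __init__(self, num_stripes, reconstruct):
        self.pending = set(range(num_stripes))  # stripes not yet rebuilt
        self.reconstruct = reconstruct          # callable: stripe index -> bytes
        self.replacement = {}                   # rebuilt blocks on the new device

    def _rebuild(self, stripe):
        self.replacement[stripe] = self.reconstruct(stripe)
        self.pending.discard(stripe)

    def read(self, stripe):
        """Serve a host read; reconstruct on demand and keep the result so
        the same stripe never has to be recomputed later."""
        if stripe in self.pending:
            self._rebuild(stripe)
        return self.replacement[stripe]

    def rebuild_when_idle(self, is_idle):
        """Rebuild remaining stripes, but only while is_idle() returns True."""
        while self.pending and is_idle():
            self._rebuild(min(self.pending))
```

Passing `lambda: True` as `is_idle` models the embodiment in which the rebuild starts as soon as the replacement device is installed.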

Embodiments of the inventive concept may also support initialization of storage devices. When a new storage device is added to the array (either as a replacement for a failed storage device or to increase the capacity of the virtual storage device), the new storage device may be initialized. Initialization may include preparing the storage device for the Erasure Coding scheme.

Initialization of a new storage device may also involve erasing existing data from it. For example, consider a situation in which a particular storage device was leased to a customer. That customer's lease has ended, and the storage device may be reused for a new customer. But the storage device may still hold data from the original customer. To prevent a later customer from gaining access to the earlier customer's data, the data on the storage device may be erased using any desired mechanism. For example, the tables that store information about where data is located may be erased. Or the data itself may be overwritten with new data (to defeat later attempts to recover any information that may have been deleted): the new data may use patterns designed to help ensure that the original data cannot be recovered. For example, the U.S. Department of Defense (DOD) has published standards on how to erase data so that it cannot be recovered: these standards may be used to erase the old data on a storage device before the storage device is reused for a new customer.
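A multi-pass overwrite of the kind mentioned above might look like the following sketch. The `bytearray` stands in for the device's storage, and the three patterns are purely illustrative; they are not taken from any DOD standard, and real devices would typically use their own sanitize or secure-erase facilities.

```python
def overwrite(device: bytearray, patterns=(0x00, 0xFF, 0x00)) -> None:
    """Overwrite every byte of the (simulated) device once per pattern.

    The pattern sequence is a placeholder assumption, not a standard.
    """
    for pattern in patterns:
        device[:] = bytes([pattern]) * len(device)
```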

Initialization need not be limited to when a new storage device is hot-added to an existing array. Initialization may also take place when the storage devices, PCIe switch 125 with Look-Aside Erasure Coding logic, or machine 105 of FIG. 1 as a whole are first powered up.

FIGS. 11A to 11D show a flowchart of an example procedure for PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1 to support Erasure Coding schemes 405, 410, and 415 of FIG. 4, according to an embodiment of the inventive concept. In FIG. 11A, at block 1103, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may be initialized (possibly by BMC 325 of FIG. 3 or processor 110 of FIG. 1). At block 1106, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may receive a transmission. This transmission may be a read or write request from processor 110 of FIG. 1, a control transmission from processor 110 of FIG. 1 or BMC 325 of FIG. 3, or a transmission sent by storage devices 130-1 through 130-6 of FIG. 3 in response to a read or write request from processor 110 of FIG. 1.

At block 1109, snooping logic 525 of FIG. 5 may determine whether the transmission is a control transmission from processor 110 of FIG. 1. If so, then at block 1112, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may deliver the control transmission to PPU 520 of FIG. 5, after which processing ends.

If the transmission is not a control transmission from processor 110 of FIG. 1, then at block 1115 (FIG. 11B), snooping logic 525 of FIG. 5 may determine whether the transmission is a read or write request from the host. If the transmission is not a read or write request from the host, then at block 1118, snooping logic 525 of FIG. 5 may replace the device LBA in the transmission with the host LBA appropriate for the host. Snooping logic 525 of FIG. 5 may also modify the transmission to suggest that it comes from the virtual storage device rather than from the physical storage device that stores the actual data. At block 1121, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may deliver the transmission to processor 110 of FIG. 1, after which processing ends.

If, on the other hand, the transmission is a read or write request from processor 110 of FIG. 1, then at block 1124, snooping logic 525 of FIG. 5 may determine whether the data in question is available in cache 545 of FIG. 5 or write buffer 550 of FIG. 5. If the data is available in cache 545 of FIG. 5 or write buffer 550 of FIG. 5, then at block 1127 (FIG. 11C), Erasure Coding Controller 530 of FIG. 5 may access the data from the appropriate location.

If the data is not available in cache 545 of FIG. 5 or write buffer 550 of FIG. 5, then at block 1130, snooping logic 525 of FIG. 5 may modify the transmission to replace the host LBA provided by the host with the device LBA from which the storage device should read the data. Snooping logic 525 of FIG. 5 may also modify the transmission to identify the appropriate storage device to receive it. Then, at block 1133, snooping logic 525 may deliver the transmission to the appropriate storage device.

Whether the data in question was accessed from the cache or read from the storage devices, at this point PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 has the needed data, and processing may diverge. If the transmission was a read request from processor 110 of FIG. 1, then at block 1136, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may return the data to processor 110 of FIG. 1. As shown in block 1139, snooping logic 525 of FIG. 5 may also store the data in cache 545 of FIG. 5; block 1139 is optional and may be omitted, as shown by dashed line 1142. At this point, processing ends.
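The read path through blocks 1124 to 1139 can be sketched as below. The dictionary used as a cache, the `read_from_devices` callback (standing in for the LBA translation and device access of blocks 1130 and 1133), and the `populate_cache` flag are all illustrative assumptions rather than details from the flowchart.

```python
def handle_read(host_lba, cache, read_from_devices, populate_cache=True):
    """Return the data for host_lba, preferring the cache.

    cache: dict mapping host LBA -> data (stand-in for cache 545).
    read_from_devices: hypothetical callable that translates the host LBA
        and fetches the data from the storage devices.
    """
    if host_lba in cache:                 # block 1124: is the data cached?
        return cache[host_lba]            # block 1127: access it directly
    data = read_from_devices(host_lba)    # blocks 1130 and 1133
    if populate_cache:
        cache[host_lba] = data            # optional block 1139
    return data                           # block 1136: return to the host
```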

If, on the other hand, the transmission from processor 110 of FIG. 1 was a write request, then at block 1145, Erasure Coding Controller 530 of FIG. 5 may read the stripe across storage devices 130-1 through 130-6 of FIG. 3. Block 1145 is effectively a restatement of blocks 1127, 1130, and 1133 and may not be needed; it is included in FIG. 11C to emphasize that writing data to the virtual storage device may involve reading the entire stripe across storage devices 130-1 through 130-6. At block 1148, Erasure Coding Controller 530 of FIG. 5 may merge the data received from processor 110 of FIG. 1 with the stripe of data accessed from the cache or from storage devices 130-1 through 130-6.

At this point, processing may diverge again, depending on whether PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 includes write buffer 550 of FIG. 5. If it does, then at block 1151 (FIG. 11D), Erasure Coding Controller 530 of FIG. 5 may write the merged stripe of data to write buffer 550 of FIG. 5 (marking the data as dirty and needing to be flushed to storage devices 130-1 through 130-6). Then, at block 1154, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may report completion of the write request to processor 110 of FIG. 1. Note that block 1154 is appropriate if write buffer 550 of FIG. 5 uses a write-back cache policy; if write buffer 550 of FIG. 5 uses a write-through cache policy, block 1154 may be omitted, as shown by dashed line 1157.

Eventually, either because PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 does not include write buffer 550 of FIG. 5, or because the data in write buffer 550 of FIG. 5 is being flushed to storage devices 130-1 through 130-6 of FIG. 3, at block 1160, Erasure Coding Controller 530 of FIG. 5 may write the updated stripe back to storage devices 130-1 through 130-6 of FIG. 3. Then, at block 1163, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may report completion of the write request to processor 110 of FIG. 1. Note that block 1163 is not needed if the merged data was stored in write buffer 550 of FIG. 5 and write buffer 550 of FIG. 5 uses a write-back cache policy: PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 has already reported completion of the write request (at block 1154). In that situation, block 1163 may be omitted, as shown by dashed line 1166. At this point, processing ends.
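The ordering of blocks 1151 through 1163 under the two cache policies can be made explicit with a small simulation. This is a sketch under stated assumptions: lists stand in for write buffer 550 and for the storage devices, the policy is passed as a string, and the flush to storage is performed immediately, whereas a real write-back buffer would flush at some later time.

```python
def handle_write(merged_stripe, write_buffer, storage, policy):
    """Return the flowchart blocks executed for one write request.

    write_buffer: list acting as the buffer, or None if the switch has none.
    storage: list standing in for the storage devices.
    policy: "write-back" or "write-through" (ignored without a buffer).
    """
    steps = []
    reported = False
    if write_buffer is not None:
        write_buffer.append(merged_stripe)  # block 1151: buffer, mark dirty
        steps.append(1151)
        if policy == "write-back":
            steps.append(1154)              # block 1154: report completion early
            reported = True
    storage.append(merged_stripe)           # block 1160: write stripe to devices
    steps.append(1160)
    if not reported:
        steps.append(1163)                  # block 1163: report completion now
    return steps
```

In every case the host receives exactly one completion report: at block 1154 for a write-back buffer, and at block 1163 otherwise.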

FIGS. 12A to 12B show an example procedure for PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1 to perform initialization, according to an embodiment of the inventive concept. In FIG. 12A, at block 1205, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 determines whether the only devices connected to it are storage devices that may have Erasure Coding managed by PCIe switch 125. If any connected device is either not a storage device or is a storage device that may not have Erasure Coding managed by PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3, then in some embodiments of the inventive concept, at block 1210, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may disable the Look-Aside Erasure Coding logic, after which processing ends.

But in other embodiments of the inventive concept, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may manage Erasure Coding even if devices that are not eligible for Erasure Coding are connected to it. In these embodiments of the inventive concept, or if only storage devices that are eligible for Erasure Coding are connected to PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3, then at block 1215, PCIe switch 125 may enable the Look-Aside Erasure Coding logic. Then, at block 1220 (FIG. 12B), PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may be configured to use an Erasure Coding scheme (possibly by BMC 325 of FIG. 3 or processor 110 of FIG. 1).

At block 1225, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may disable devices that are not eligible for Erasure Coding. Note that block 1225 is optional, as shown by dashed line 1230: there might be no devices connected to PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 that are ineligible for Erasure Coding, or PCIe switch 125 may allow processor 110 of FIG. 1 to access the ineligible devices even though Erasure Coding is used for the other devices.

At block 1235, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may terminate enumeration downstream of itself for any devices that are subject to Erasure Coding. At block 1240, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may report a virtual storage device to processor 110 of FIG. 1, based on storage devices 130-1 through 130-6 of FIG. 3 being subject to Erasure Coding. PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 may also report to processor 110 of FIG. 1 any other PCIe devices that may be enumerated. At this point, processing ends.
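The initialization decisions of FIGS. 12A to 12B can be summarized in a short sketch. The function name, the `(name, eligible)` tuples, the `manage_mixed` flag (selecting between the two families of embodiments), and the `"virtual-storage"` label are all hypothetical. What the host sees when the logic is disabled is not stated in the text; reporting every device individually is an assumption here.

```python
def initialize_switch(devices, manage_mixed=False):
    """Decide whether to enable Erasure Coding and what to report upstream.

    devices: list of (name, eligible) tuples, where eligible is True for a
        storage device whose Erasure Coding this switch may manage.
    manage_mixed: if True, manage Erasure Coding even when ineligible
        devices are attached (the "other embodiments" path).
    """
    eligible = [name for name, ok in devices if ok]
    others = [name for name, ok in devices if not ok]
    if others and not manage_mixed:
        # Block 1210: disable the logic. Reporting all devices is assumed.
        return {"ec_enabled": False, "members": [],
                "reported": [name for name, _ in devices]}
    # Block 1215: enable the logic. Block 1235: hide the member devices.
    # Block 1240: report one virtual device plus any other PCIe devices.
    return {"ec_enabled": True, "members": eligible,
            "reported": ["virtual-storage"] + others}
```

This sketch omits optional block 1225, in which the switch may instead disable the ineligible devices rather than report them.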

FIG. 13 shows a flowchart of an example procedure for PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1 to incorporate a new storage device into Erasure Coding schemes 405, 410, and 415 of FIG. 4, according to an embodiment of the inventive concept. In FIG. 13, at block 1305, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 (or BMC 325 of FIG. 3) may check for a new storage device. If a new storage device is detected, then at block 1310, Erasure Coding Controller 530 of FIG. 5 may add the new storage device to the array behind the virtual storage device. Finally, at block 1315, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 (or BMC 325 of FIG. 3, or processor 110 of FIG. 1) may initialize the new storage device. At this point, processing may end, or may return to block 1305, as shown by dashed line 1320, to check for additional new storage devices.

FIG. 14 shows a flowchart of an example procedure for PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 1 to handle a failed storage device, according to an embodiment of the inventive concept. In FIG. 14, at block 1405, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 (or BMC 325 of FIG. 3) may check for a failed (or removed) storage device. If a failed storage device is detected, then at block 1410, when a read request arrives that would have accessed data from the failed storage device, Erasure Coding Controller 530 of FIG. 5 may perform Erasure Coding recovery of the data that had been stored on the failed storage device. This recovery may involve reading, from the other storage devices, the rest of the stripe that contains the requested data and computing the requested data from the remaining data in the stripe.

At block 1415, PCIe switch 125 with Look-Aside Erasure Coding logic of FIG. 3 (or BMC 325 of FIG. 3) may determine whether a replacement storage device has been added to the array behind the virtual storage device. If so, then at block 1420, Erasure Coding Controller 530 of FIG. 5 may use the replacement storage device to rebuild the failed storage device. At this point, processing may end, or may return to block 1405, as shown by dashed line 1425, to check for additional failed storage devices.

FIGS. 11A to 14 show some embodiments of the inventive concept. But a person skilled in the art will recognize that other embodiments of the inventive concept are also possible, by changing the order of the blocks, by omitting blocks, or by including links not shown in the drawings. All such variations of the flowcharts are considered embodiments of the inventive concept, whether or not expressly described.

Embodiments of the inventive concept offer technical advantages over the prior art. Using a PCIe switch with Look-Aside Erasure Coding logic moves Erasure Coding closer to the storage devices, which reduces the time needed to move data around. Moving Erasure Coding off the processor reduces the load on the processor, allowing the processor to execute more instructions for applications. By using a configurable Erasure Coding Controller, any desired Erasure Coding scheme may be used, rather than only the limited set of schemes supported by hardware and software Erasure Coding vendors. And by placing the Erasure Coding Controller with the PCIe switch, expensive RAID add-in cards are no longer needed, and larger arrays may be used, even spanning multiple chassis.

The following discussion is intended to provide a brief, general description of a suitable machine or machines in which certain aspects of the inventive concept may be implemented. The machine or machines may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signal. As used herein, the term "machine" is intended to broadly encompass a single machine, a virtual machine, or a system of communicatively coupled machines, virtual machines, or devices operating together. Exemplary machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc., as well as transportation devices, such as private or public transportation, e.g., automobiles, trains, cabs, etc.

The machine or machines may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine or machines may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks, wide area networks, etc. One skilled in the art will appreciate that network communication may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth, optical, infrared, cable, laser, etc.

Embodiments of the present inventive concept may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc., which when accessed by a machine result in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, volatile and/or non-volatile memory, e.g., RAM, ROM, etc., or in other storage devices and their associated storage media, including hard drives, floppy disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including the physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for machine access.

Embodiments of the inventive concept may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions comprising instructions to perform the elements of the inventive concept as described herein.

The various operations of the methods described above may be performed by any suitable means capable of performing the operations, such as various hardware and/or software components, circuits, and/or modules. The software may comprise an ordered listing of executable instructions for implementing logical functions, and may be embodied in any "processor-readable medium" for use by or in connection with an instruction execution system, apparatus, or device, such as a single-core or multi-core processor or a processor-containing system.

The blocks or steps of a method or algorithm and the functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software, the functions may be stored on or transmitted over a tangible, non-transitory computer-readable medium as one or more instructions or code. A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD ROM, or any other form of storage medium known in the art.

Having described and illustrated the principles of the inventive concept with reference to illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles, and may be combined in any desired manner. And, although the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as "according to an embodiment of the inventive concept" are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the inventive concept to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.

The foregoing illustrative embodiments are not to be construed as limiting the inventive concept. Although a few embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible to those embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of the inventive concept as defined in the claims.

Embodiments of the inventive concept may extend to the following statements, without limitation:

Statement 1. An embodiment of the inventive concept includes a Peripheral Component Interconnect Express (PCIe) switch with Erasure Coding logic, comprising:

an external connector to enable the PCIe switch to communicate with a processor;

at least one connector to enable the PCIe switch to communicate with at least one storage device;

a Power Processing Unit (PPU) to handle configuration of the PCIe switch;

an Erasure Coding Controller including circuitry to apply an Erasure Coding scheme to data stored on the at least one storage device; and

snooping logic including circuitry to intercept a data transmission received at the PCIe switch and to modify the data transmission responsive to the Erasure Coding scheme.

Statement 2. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 1, wherein the Erasure Coding logic includes at least one of Look-Aside Erasure Coding logic and Look-Through Erasure Coding logic.

Statement 3. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 1, wherein the at least one storage device includes at least one Non-Volatile Memory Express (NVMe) Solid State Drive (SSD).

Statement 4. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the snooping logic is operative to intercept a control transmission received at the PCIe switch and forward the control transmission to the PPU.

Statement 5. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the snooping logic is operative to intercept a data transmission received at the PCIe switch from a host and to replace, in the data transmission, a host Logical Block Address (LBA) used by the host with a device LBA used by the at least one NVMe SSD.

Statement 6. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 5, wherein the snooping logic is further operative to direct the data transmission to the at least one NVMe SSD.

Statement 7. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the snooping logic is operative to intercept a data transmission received at the PCIe switch from one of the at least one NVMe SSD and to replace, in the data transmission, a device LBA used by the one of the at least one NVMe SSD with a host LBA used by the host.
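
The substitution between a host LBA and a device LBA described in statements 5 and 7 can be illustrated with a minimal sketch. It assumes plain round-robin striping without parity rotation; the function names and the `num_devices` and `blocks_per_unit` parameters are hypothetical and chosen only for this example, since the statements do not fix any particular mapping.

```python
def host_to_device(host_lba, num_devices, blocks_per_unit):
    """Map a host LBA to (device index, device LBA) under round-robin striping."""
    unit, offset = divmod(host_lba, blocks_per_unit)   # which stripe unit, and where in it
    row, device = divmod(unit, num_devices)            # which stripe row, and which device
    return device, row * blocks_per_unit + offset


def device_to_host(device, device_lba, num_devices, blocks_per_unit):
    """Inverse mapping, used for a transmission travelling from an SSD to the host."""
    row, offset = divmod(device_lba, blocks_per_unit)
    unit = row * num_devices + device
    return unit * blocks_per_unit + offset


# Hypothetical array of 3 devices with 4 blocks per stripe unit.
device, device_lba = host_to_device(10, num_devices=3, blocks_per_unit=4)
host_lba = device_to_host(device, device_lba, num_devices=3, blocks_per_unit=4)
```

The two functions are inverses of each other, so a response from an SSD can be relabelled with the address the host originally used.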

Statement 8. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, further comprising a cache.

Statement 9. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 8, wherein the snooping logic is operative to return a response to a data transmission from the host based at least in part on the data requested in the data transmission being present in the cache.

Statement 10. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein:

the PCIe switch is located in a chassis; and

the chassis includes memory used by the Erasure Coding Controller as an external cache.

Statement 11. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, further comprising a write buffer.

Statement 12. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 11, wherein:

the data transmission includes a write operation from the host; and

the Erasure Coding Controller is operative to complete the write operation after sending a response to the data transmission to the host.

Statement 13. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 11, wherein the Erasure Coding Controller is operative to store the data in the write operation in the write buffer.

Statement 14. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to enable the Erasure Coding Controller and the snooping logic based at least in part on all of the at least one NVMe SSD being usable with the Erasure Coding Controller.

Statement 15. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to disable the Erasure Coding Controller and the snooping logic based at least in part on the at least one NVMe SSD including built-in Erasure Coding functionality.

Statement 16. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 15, wherein the PCIe switch is operative to inform a user that the Erasure Coding Controller and the snooping logic are disabled based at least in part on the at least one NVMe SSD including built-in Erasure Coding functionality.

Statement 17. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to disable the Erasure Coding Controller and the snooping logic based at least in part on at least one non-storage device being connected to the PCIe switch using the at least one connector.

Statement 18. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 17, wherein the PCIe switch is operative to inform a user that the Erasure Coding Controller and the snooping logic are disabled based at least in part on the at least one non-storage device being connected to the PCIe switch using the at least one connector.

Statement 19. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to enable the Erasure Coding Controller and the snooping logic for use with the at least one NVMe SSD and to block access to a non-storage device connected to the PCIe switch using the at least one connector.

Statement 20. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 19, wherein the PCIe switch is operative to inform a user that access to the non-storage device connected to the PCIe switch is blocked.
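
The alternatives in statements 14 through 20 can be read as a configuration decision made by the switch. The following sketch is one possible reading, not a definitive implementation: the `Device` record, the `configure` function, the `block_non_storage` policy flag, and the notice strings are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    is_storage: bool = True        # False for, e.g., a GPU or NIC behind the switch
    has_builtin_ec: bool = False   # True if the SSD performs its own erasure coding


def configure(devices, block_non_storage=False):
    """Return (erasure_coding_enabled, blocked_device_names, user_notices)."""
    notices = []
    non_storage = [d.name for d in devices if not d.is_storage]

    # Statements 15-16: an SSD with built-in erasure coding disables the logic.
    if any(d.is_storage and d.has_builtin_ec for d in devices):
        notices.append("erasure coding disabled: an SSD has built-in erasure coding")
        return False, [], notices

    # Statements 17-18: a non-storage device disables the logic, unless the
    # policy of statements 19-20 (block the device instead) is selected.
    if non_storage and not block_non_storage:
        notices.append("erasure coding disabled: non-storage device attached")
        return False, [], notices

    for name in non_storage:
        notices.append(f"access to {name} blocked")
    return True, non_storage, notices
```

With only plain SSDs attached, the sketch enables erasure coding and produces no notices. With a non-storage device attached, the outcome depends on which of the two policies is selected.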

Statement 21. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to use the Erasure Coding Controller and the snooping logic to manage the Erasure Coding scheme on at least one additional NVMe SSD connected to a second PCIe switch.

Statement 22. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 21, wherein the second PCIe switch is operative to disable a second Erasure Coding Controller and second snooping logic in the second PCIe switch.

Statement 23. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 22, wherein:

the PCIe switch is located in a first chassis; and

the second PCIe switch is located in a second chassis.

Statement 24. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is implemented using a Field Programmable Gate Array (FPGA).

Statement 25. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein:

the at least one NVMe SSD includes at least two NVMe SSDs; and

the PCIe switch and the at least two NVMe SSDs are internal to a common housing.

Statement 26. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch and the at least one NVMe SSD are located in separate housings.

Statement 27. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein:

the PCIe switch is operative to detect a failed NVMe SSD of the at least one NVMe SSD; and

the Erasure Coding Controller is operative to handle the data transmission to account for the failed NVMe SSD.

Statement 28. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 27, wherein the Erasure Coding Controller is operative to perform Erasure Coding recovery of data stored on the failed NVMe SSD.

Statement 29. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 28, wherein the Erasure Coding Controller is operative to rebuild a replacement NVMe SSD for the failed NVMe SSD.

Statement 30. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein:

the PCIe switch is operative to detect a new NVMe SSD; and

the Erasure Coding Controller is operative to use the new NVMe SSD as part of the Erasure Coding scheme.

Statement 31. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 30, wherein the Erasure Coding Controller is operative to perform a capacity addition using the new NVMe SSD.

Statement 32. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 30, wherein the PCIe switch is operative to detect the new NVMe SSD connected to one of the at least one connector.

Statement 33. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 30, wherein the PCIe switch is operative to detect the new NVMe SSD through a message from a second PCIe switch.

Statement 34. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 33, wherein the new NVMe SSD is connected to a second connector on the second PCIe switch.

Statement 35. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the at least one connector includes a presence pin to detect both a failed NVMe SSD and a new NVMe SSD.

Statement 36. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is operative to present itself as a single device to the host and to prevent downstream PCIe bus enumeration of the at least one NVMe SSD.

Statement 37. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 36, wherein the PCIe switch is further operative to prevent downstream PCIe bus enumeration of a second PCIe switch downstream from the PCIe switch.

Statement 38. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 36, wherein the PCIe switch is operative to virtualize the at least one NVMe SSD.

Statement 39. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the Erasure Coding Controller is operative to initialize a new NVMe SSD connected to one of the at least one connector.

Statement 40. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 39, wherein the Erasure Coding Controller is operative to initialize the new NVMe SSD after a hot insertion event.

Statement 41. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 39, wherein the Erasure Coding Controller is further operative to initialize the at least one NVMe SSD at startup.

Statement 42. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the PCIe switch is part of a system including a Baseboard Management Controller (BMC), the BMC operative to initialize a new NVMe SSD connected to one of the at least one connector.

Statement 43. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 42, wherein the BMC is operative to initialize the at least one NVMe SSD at startup.

Statement 44. An embodiment of the inventive concept includes the PCIe switch with Erasure Coding logic according to statement 3, wherein the Erasure Coding Controller includes a stripe manager to stripe data across the at least one NVMe SSD.
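
As a rough illustration of what a stripe manager such as the one in statement 44 might do, the sketch below distributes a byte string across devices in fixed-size units. The function name `stripe_data`, the `unit_size` parameter, and the round-robin placement are assumptions for this example; the statement does not prescribe a layout.

```python
def stripe_data(data, num_devices, unit_size):
    """Split data into unit_size chunks and place them round-robin on the devices.

    Returns one list of chunks per device, in the order they would be written.
    """
    devices = [[] for _ in range(num_devices)]
    for start in range(0, len(data), unit_size):
        unit_index = start // unit_size
        devices[unit_index % num_devices].append(data[start:start + unit_size])
    return devices


# Hypothetical example: 8 bytes striped over 3 devices in 2-byte units.
layout = stripe_data(b"abcdefgh", num_devices=3, unit_size=2)
```

In this example the fourth unit wraps around to the first device, which is the behaviour that lets sequential host accesses be spread over all devices.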

Statement 45. An embodiment of the inventive concept includes a method, comprising:

receiving a transmission at a Peripheral Component Interconnect Express (PCIe) switch with Erasure Coding logic;

processing the transmission using snooping logic in the Erasure Coding logic; and

delivering the transmission to its destination by the PCIe switch.

Statement 46. An embodiment of the inventive concept includes the method according to statement 45, wherein the Erasure Coding logic includes at least one of Look-Aside Erasure Coding logic and Look-Through Erasure Coding logic.

Statement 47. An embodiment of the inventive concept includes the method according to statement 45, wherein:

processing the transmission using snooping logic in the Erasure Coding logic includes determining, by the snooping logic, that the transmission includes a control transmission; and

delivering the transmission to its destination by the PCIe switch includes delivering the transmission to a Power Processing Unit (PPU).

Statement 48. An embodiment of the inventive concept includes the method according to statement 45, wherein processing the transmission using snooping logic in the Erasure Coding logic includes processing the transmission using the snooping logic based at least in part on the Erasure Coding logic being active.
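
The receive, process, and deliver steps of statements 45, 47, and 48 can be sketched as a small routing function. The dictionary layout of a transmission (`kind`, `source`, `target`), the destination labels, and the pass-through behaviour when the Erasure Coding logic is inactive are assumptions made for this example only.

```python
def route(transmission, ec_active):
    """Return a label for where the switch would deliver a received transmission."""
    if not ec_active:
        # Assumed behaviour: with the Erasure Coding logic inactive, the switch
        # forwards the transmission to its addressed target unchanged.
        return transmission["target"]
    if transmission["kind"] == "control":
        # Control transmissions are handed to the PPU (statement 47).
        return "PPU"
    if transmission["source"] == "host":
        # Data transmissions from the host go on to an SSD.
        return "ssd"
    # Data transmissions from an SSD go back to the host.
    return "host"


control = {"kind": "control", "source": "host", "target": "ssd0"}
read_request = {"kind": "data", "source": "host", "target": "virtual0"}
```

A fuller model would also rewrite addresses and identifiers in the data transmissions before delivery; that step is omitted here to keep the routing decision visible.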

Statement 49. An embodiment of the inventive concept includes the method according to statement 45, wherein:

receiving a transmission at a Peripheral Component Interconnect Express (PCIe) switch with Erasure Coding logic includes receiving a read request from a host;

processing the transmission using snooping logic in the Erasure Coding logic includes replacing a host Logical Block Address (LBA) with a device LBA in the read request; and

delivering the transmission to its destination by the PCIe switch includes delivering the read request to a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD).

Statement 50. An embodiment of the inventive concept includes the method according to statement 49, wherein processing the transmission using snooping logic in the Erasure Coding logic further includes identifying the NVMe SSD to which the read request should be delivered.

Statement 51. An embodiment of the inventive concept includes the method according to statement 49, wherein:

processing the transmission using snooping logic in the Erasure Coding logic further includes accessing the data requested by the host in the read request from a cache based at least in part on the data being resident in the cache;

replacing a host Logical Block Address (LBA) with a device LBA in the read request includes replacing the host LBA with the device LBA in the read request based at least in part on the data not being resident in the cache; and

delivering the transmission to its destination by the PCIe switch includes delivering the read request to the NVMe SSD based at least in part on the data not being resident in the cache.
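
The cache-dependent read handling of statement 51 can be sketched as follows. The `ReadPath` class, the dictionary-backed devices, and the `translate` callback are hypothetical constructs used only to show the branch between a cache hit and a forwarded read.

```python
class ReadPath:
    """Toy model of a read request arriving at the switch."""

    def __init__(self, devices, translate):
        self.devices = devices        # list of dicts: device LBA -> data
        self.translate = translate    # host LBA -> (device index, device LBA)
        self.cache = {}               # host LBA -> data

    def read(self, host_lba):
        """Return (data, origin), where origin is 'cache' or 'ssd'."""
        if host_lba in self.cache:
            # Data is resident in the cache: answer without touching an SSD.
            return self.cache[host_lba], "cache"
        # Data is not resident: substitute the device LBA and forward the read.
        device, device_lba = self.translate(host_lba)
        data = self.devices[device][device_lba]
        self.cache[host_lba] = data   # assumed policy: fill the cache on a miss
        return data, "ssd"


# Hypothetical two-device array with one block per stripe unit.
path = ReadPath(
    devices=[{0: b"a", 1: b"c"}, {0: b"b", 1: b"d"}],
    translate=lambda lba: (lba % 2, lba // 2),
)
first = path.read(3)
second = path.read(3)
```

The first read of a block goes to an SSD and the repeat is served from the cache. Filling the cache on a miss is a policy choice of this sketch and is not stated in the text.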

Statement 52. An embodiment of the inventive concept includes the method according to statement 45, wherein:

receiving a transmission at a Peripheral Component Interconnect Express (PCIe) switch with Erasure Coding logic includes receiving a write request from a host;

processing the transmission using snooping logic in the Erasure Coding logic includes replacing a host LBA with a device LBA in the write request; and

delivering the transmission to its destination by the PCIe switch includes delivering the write request to an NVMe SSD.

Statement 53. An embodiment of the inventive concept includes the method according to statement 52, wherein processing the transmission using snooping logic in the Erasure Coding logic further includes identifying the NVMe SSD to which the write request should be delivered.

Statement 54. An embodiment of the inventive concept includes the method according to statement 52, further comprising:

reading a block stripe from at least one NVMe SSD;

merging the data in the write request with the block stripe to form an updated block stripe; and

writing the updated block stripe to the at least one NVMe SSD.

Statement 55. An embodiment of the inventive concept includes the method according to statement 54, wherein merging the data in the write request includes calculating additional data to be written to the at least one NVMe SSD in addition to the data in the write request.
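
The read, merge, and write sequence of statements 54 and 55 can be sketched as below, where the "additional data" is assumed to be a single XOR parity block stored as the last block of the stripe. The helper names and the stripe layout are assumptions of this example; other erasure coding schemes would compute different additional data.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_stripe(stripe, index, new_data):
    """Merge new_data into a stripe and recompute its parity.

    stripe: list of equal-length blocks; the last block is assumed to be parity.
    index:  position of the data block being overwritten.
    Returns the updated stripe, ready to be written back.
    """
    data_blocks = list(stripe[:-1])
    data_blocks[index] = new_data
    parity = bytes(len(new_data))          # all-zero block of the right length
    for block in data_blocks:
        parity = xor_bytes(parity, block)
    return data_blocks + [parity]


# Hypothetical stripe: two data blocks and their parity (0x01 ^ 0x02 = 0x03).
old_stripe = [b"\x01", b"\x02", b"\x03"]
new_stripe = update_stripe(old_stripe, 0, b"\x04")
```

Only one data block changes in this example, yet the parity block must be rewritten as well, which is why the whole stripe is read before the write is applied.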

Statement 56. An embodiment of the inventive concept includes the method according to statement 54, wherein:

the method further comprises reading the block stripe from a cache based at least in part on the block stripe being resident in the cache; and

reading a block stripe from at least one NVMe SSD includes reading the block stripe from the at least one NVMe SSD based at least in part on the block stripe not being resident in the cache.

Statement 57. An embodiment of the inventive concept includes the method according to statement 54, wherein writing the updated block stripe to the at least one NVMe SSD includes writing the updated block stripe to a write buffer.

Statement 58. An embodiment of the inventive concept includes the method according to statement 57, further comprising responding to the host that the write has completed after the updated block stripe is written to the write buffer and before the updated block stripe is written to the at least one NVMe SSD.
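
The early acknowledgement in statements 57 and 58 is the behaviour of a write-back buffer. The sketch below models it with a hypothetical `WriteBuffer` class backed by a dictionary standing in for the SSD; the class, its method names, and the "complete" status string are assumptions for illustration.

```python
from collections import deque


class WriteBuffer:
    """Toy write-back buffer placed in front of a dictionary that stands in for an SSD."""

    def __init__(self, ssd):
        self.ssd = ssd
        self.pending = deque()

    def write(self, lba, data):
        """Queue the write and acknowledge immediately, before the SSD is updated."""
        self.pending.append((lba, data))
        return "complete"

    def flush(self):
        """Drain queued writes to the SSD in arrival order."""
        while self.pending:
            lba, data = self.pending.popleft()
            self.ssd[lba] = data


ssd = {}
buffer = WriteBuffer(ssd)
status = buffer.write(5, b"x")
acknowledged_before_flush = 5 not in ssd
buffer.flush()
```

The host sees the write as finished while the data still sits only in the buffer. A real design would therefore need the buffer contents to survive a power loss, a point this sketch does not address.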

Statement 59. An embodiment of the inventive concept includes the method according to statement 45, wherein:

receiving the transmission at the Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic includes receiving a response from an NVMe SSD;

processing the transmission using the snooping logic in the erasure coding logic includes replacing a device LBA with a host LBA in the response; and

delivering the transmission to its destination by the PCIe switch includes delivering the response to a host.

Statement 60. An embodiment of the inventive concept includes the method according to statement 59, wherein processing the transmission using the snooping logic in the erasure coding logic further includes replacing an identifier of the NVMe SSD with an identifier of a virtual storage device.

Statement 61. An embodiment of the inventive concept includes the method according to statement 45, wherein delivering the transmission to its destination by the PCIe switch includes delivering the transmission to a second PCIe switch to which an NVMe SSD is connected, the NVMe SSD being the destination.

Statement 62. An embodiment of the inventive concept includes the method according to statement 61, wherein the PCIe switch is located in a first chassis and the second PCIe switch is located in a second chassis.

Statement 63. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising initializing at least one NVMe SSD connected to the PCIe switch for use with erasure coding.

Statement 64. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising:

detecting that a new NVMe SSD is connected to the PCIe switch; and

adding the new NVMe SSD to the capacity of a virtual storage device.

Statement 65. An embodiment of the inventive concept includes the method according to statement 64, the method further comprising initializing the new NVMe SSD for use with erasure coding.

Statement 66. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising:

detecting a failed NVMe SSD connected to the PCIe switch; and

performing erasure coding recovery of data stored on the failed NVMe SSD.

Statement 67. An embodiment of the inventive concept includes the method according to statement 66, the method further comprising:

detecting a replacement NVMe SSD for the failed NVMe SSD; and

rebuilding the failed NVMe SSD using the replacement NVMe SSD.
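By way of a non-limiting illustration, the recovery and rebuild of statements 66 and 67 may be sketched as follows. The statements do not specify an erasure coding scheme; this sketch assumes a single XOR parity chunk per stripe, and the names `xor_chunks` and `rebuild_failed_chunk` are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_failed_chunk(stripe, failed_index):
    """Recompute the chunk held by the failed SSD from the surviving chunks.

    Assumption: the stripe is data chunks plus one XOR parity chunk, so the
    XOR of all surviving chunks equals the missing chunk.
    """
    survivors = [chunk for i, chunk in enumerate(stripe) if i != failed_index]
    return xor_chunks(survivors)


# Three data chunks (one per SSD) plus one parity chunk on a fourth SSD.
data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
stripe = data + [xor_chunks(data)]

# The SSD holding chunk 1 fails; its content is rebuilt for the replacement SSD.
recovered = rebuild_failed_chunk(stripe, failed_index=1)
```

A scheme that tolerates more than one failure (for example, Reed-Solomon coding) would replace the XOR step but keep the same overall flow.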

Statement 68. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising:

detecting that only NVMe SSDs without erasure coding functionality are connected to the PCIe switch; and

enabling the erasure coding logic in the PCIe switch.

Statement 69. An embodiment of the inventive concept includes the method according to statement 68, the method further comprising terminating PCIe bus enumeration downstream of the PCIe switch.

Statement 70. An embodiment of the inventive concept includes the method according to statement 68, the method further comprising reporting a virtual storage device to a host, a capacity of the virtual storage device being based at least in part on the capacities of the NVMe SSDs connected to the PCIe switch and on an erasure coding scheme.
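By way of a non-limiting illustration, the capacity reported for the virtual storage device in statement 70 may be derived as sketched below. The statement does not give a formula; this sketch assumes a k+m scheme (k data chunks and m parity chunks per stripe) striped evenly across all SSDs, with each SSD contributing only as much space as the smallest SSD. The function name `virtual_capacity` and its parameters are hypothetical.

```python
def virtual_capacity(ssd_capacities, data_chunks, parity_chunks):
    """Estimate the usable capacity of the virtual storage device.

    Assumptions: every stripe spans all SSDs evenly, so each SSD contributes
    only as much space as the smallest one, and the fraction
    data_chunks / (data_chunks + parity_chunks) of raw space holds host data.
    """
    width = data_chunks + parity_chunks
    if len(ssd_capacities) < width:
        raise ValueError("not enough SSDs for the erasure coding scheme")
    raw = min(ssd_capacities) * len(ssd_capacities)
    return raw * data_chunks // width


# Four 1000 GB SSDs under a 3+1 scheme.
reported = virtual_capacity([1000, 1000, 1000, 1000], data_chunks=3, parity_chunks=1)
```

Under these assumptions, a larger SSD mixed with smaller ones adds no usable capacity, which is one reason an implementation might prefer uniformly sized devices.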

Statement 71. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising:

detecting that at least one non-storage device or at least one NVMe SSD with erasure coding functionality is connected to the PCIe switch; and

disabling the erasure coding logic in the PCIe switch.

Statement 72. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising:

detecting that at least one non-storage device or at least one NVMe SSD with erasure coding functionality is connected to the PCIe switch;

enabling the erasure coding logic in the PCIe switch; and

disabling the at least one non-storage device or the at least one NVMe SSD with erasure coding functionality.
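By way of a non-limiting illustration, the alternatives of statements 68, 71, and 72 may be combined into a single policy decision as sketched below. The enumeration `Kind`, the `Decision` record, and the `prefer_erasure_coding` flag that chooses between the behaviors of statements 71 and 72 are all hypothetical and are not taken from the statements.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    NVME_SSD = "NVMe SSD without erasure coding functionality"
    NVME_SSD_WITH_EC = "NVMe SSD with erasure coding functionality"
    NON_STORAGE = "non-storage device"


@dataclass
class Decision:
    erasure_coding_enabled: bool
    disabled_devices: list = field(default_factory=list)


def decide(devices, prefer_erasure_coding=False):
    """Choose whether the switch enables its erasure coding logic.

    devices is a list of (name, Kind) pairs detected on the downstream ports.
    """
    incompatible = [name for name, kind in devices if kind is not Kind.NVME_SSD]
    if not incompatible:
        # Statement 68: only plain NVMe SSDs are attached, so enable the logic.
        return Decision(True)
    if prefer_erasure_coding:
        # Statement 72: enable the logic and disable the incompatible devices.
        return Decision(True, incompatible)
    # Statement 71: disable the logic and leave every device accessible.
    return Decision(False)


attached = [("ssd0", Kind.NVME_SSD), ("ssd1", Kind.NVME_SSD), ("gpu0", Kind.NON_STORAGE)]
keep_all = decide(attached)
keep_ec = decide(attached, prefer_erasure_coding=True)
```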

Statement 73. An embodiment of the inventive concept includes the method according to statement 72, the method further comprising terminating PCIe bus enumeration downstream of the PCIe switch.

Statement 74. An embodiment of the inventive concept includes the method according to statement 72, the method further comprising reporting a virtual storage device to a host, a capacity of the virtual storage device being based at least in part on the capacities of the NVMe SSDs connected to the PCIe switch and on an erasure coding scheme.

Statement 75. An embodiment of the inventive concept includes the method according to statement 45, the method further comprising configuring the PCIe switch with erasure coding logic to use an erasure coding scheme.

Statement 76. An embodiment of the inventive concept includes the method according to statement 75, wherein configuring the PCIe switch with erasure coding logic to use the erasure coding scheme includes configuring the PCIe switch with erasure coding logic to use the erasure coding scheme using a Baseboard Management Controller (BMC).

Statement 77. An embodiment of the inventive concept includes an article comprising a non-transitory storage medium, the non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in:

receiving a transmission at a Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic;

processing the transmission using snooping logic in the erasure coding logic; and

delivering the transmission to its destination by the PCIe switch.

Statement 78. An embodiment of the inventive concept includes the article according to statement 77, wherein the erasure coding logic includes at least one of Look-Aside erasure coding logic and Look-Through erasure coding logic.

Statement 79. An embodiment of the inventive concept includes the article according to statement 77, wherein:

processing the transmission using the snooping logic in the erasure coding logic includes determining, by the snooping logic, that the transmission includes a control transmission; and

delivering the transmission to its destination by the PCIe switch includes delivering the transmission to a Power Processing Unit (PPU).

Statement 80. An embodiment of the inventive concept includes the article according to statement 77, wherein processing the transmission using the snooping logic in the erasure coding logic includes processing the transmission using the snooping logic based at least in part on the erasure coding logic being active.

Statement 81. An embodiment of the inventive concept includes the article according to statement 77, wherein:

receiving the transmission at the Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic includes receiving a read request from a host;

processing the transmission using the snooping logic in the erasure coding logic includes replacing a host logical block address (LBA) with a device LBA in the read request; and

delivering the transmission to its destination by the PCIe switch includes delivering the read request to a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD).

Statement 82. An embodiment of the inventive concept includes the article according to statement 81, wherein processing the transmission using the snooping logic in the erasure coding logic further includes identifying the NVMe SSD to which the read request should be delivered.
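By way of a non-limiting illustration, the address replacement of statement 81 and the SSD identification of statement 82 may be sketched as follows. The statements do not specify the mapping; this sketch assumes simple round-robin striping of fixed-size chunks across the SSDs and ignores parity placement. The names `host_to_device`, `device_to_host`, and `chunk_blocks` are hypothetical.

```python
def host_to_device(host_lba, num_ssds, chunk_blocks):
    """Map a host LBA to (SSD index, device LBA) under round-robin striping."""
    chunk_index, offset = divmod(host_lba, chunk_blocks)
    stripe_index, ssd_index = divmod(chunk_index, num_ssds)
    return ssd_index, stripe_index * chunk_blocks + offset


def device_to_host(ssd_index, device_lba, num_ssds, chunk_blocks):
    """Inverse mapping, as would be applied to a response from an SSD."""
    stripe_index, offset = divmod(device_lba, chunk_blocks)
    chunk_index = stripe_index * num_ssds + ssd_index
    return chunk_index * chunk_blocks + offset


# Host LBA 1000 with four SSDs and 128-block chunks.
target_ssd, device_lba = host_to_device(1000, num_ssds=4, chunk_blocks=128)
round_trip = device_to_host(target_ssd, device_lba, num_ssds=4, chunk_blocks=128)
```

The inverse function corresponds to the case in which a response from an NVMe SSD has its device LBA replaced with the host LBA before delivery to the host.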

Statement 83. An embodiment of the inventive concept includes the article according to statement 81, wherein:

processing the transmission using the snooping logic in the erasure coding logic further includes accessing the data requested by the host in the read request from a cache based at least in part on the data residing in the cache;

replacing the host logical block address (LBA) with the device LBA in the read request includes replacing the host LBA with the device LBA in the read request based at least in part on the data not residing in the cache; and

delivering the transmission to its destination by the PCIe switch includes delivering the read request to the NVMe SSD based at least in part on the data not residing in the cache.
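By way of a non-limiting illustration, the cache-dependent handling of a read request in statement 83 may be sketched as follows. The cache is modeled as a dictionary keyed by host LBA, and the `translate` and `forward` callables are hypothetical stand-ins for the address replacement and for delivery to the NVMe SSD.

```python
def handle_read(host_lba, cache, translate, forward):
    """Serve a read from the cache on a hit; otherwise translate and forward it."""
    if host_lba in cache:
        # Data resides in the cache: no request reaches the NVMe SSD.
        return cache[host_lba]
    # Data does not reside in the cache: replace the host LBA with the
    # device LBA and deliver the read request to the identified SSD.
    ssd_index, device_lba = translate(host_lba)
    return forward(ssd_index, device_lba)


forwarded = []


def fake_forward(ssd_index, device_lba):
    """Record the forwarded request and return placeholder data."""
    forwarded.append((ssd_index, device_lba))
    return b"from-ssd"


cache = {42: b"from-cache"}
hit = handle_read(42, cache, lambda lba: (0, lba), fake_forward)
miss = handle_read(43, cache, lambda lba: (1, lba // 2), fake_forward)
```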

Statement 84. An embodiment of the inventive concept includes the article according to statement 77, wherein:

receiving the transmission at the Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic includes receiving a write request from a host;

processing the transmission using the snooping logic in the erasure coding logic includes replacing a host LBA with a device LBA in the write request; and

delivering the transmission to its destination by the PCIe switch includes delivering the write request to an NVMe SSD.

Statement 85. An embodiment of the inventive concept includes the article according to statement 84, wherein processing the transmission using the snooping logic in the erasure coding logic further includes identifying the NVMe SSD to which the write request should be delivered.

Statement 86. An embodiment of the inventive concept includes the article according to statement 84, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

reading a block stripe from at least one NVMe SSD;

merging the data in the write request with the block stripe to form an updated block stripe; and

writing the updated block stripe to the at least one NVMe SSD.

Statement 87. An embodiment of the inventive concept includes the article according to statement 86, wherein merging the data in the write request includes computing additional data to be written to the at least one NVMe SSD in addition to the data in the write request.
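By way of a non-limiting illustration, the read-merge-write sequence of statement 86 and the additional data of statement 87 may be sketched as follows. The statements do not say what the additional data is; this sketch assumes it is a single XOR parity chunk, and the names `xor_bytes` and `update_stripe` are hypothetical.

```python
def xor_bytes(a, b):
    """Return the byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_stripe(data_chunks, chunk_index, new_chunk):
    """Merge new data into a stripe already read and recompute its parity.

    Returns the updated data chunks and the parity chunk (the assumed
    "additional data") to be written alongside them.
    """
    if len(new_chunk) != len(data_chunks[chunk_index]):
        raise ValueError("new chunk must match the stripe's chunk size")
    updated = list(data_chunks)
    updated[chunk_index] = new_chunk
    parity = bytes(len(new_chunk))  # all-zero chunk as the XOR identity
    for chunk in updated:
        parity = xor_bytes(parity, chunk)
    return updated, parity


# Stripe read from three SSDs; the write request replaces the middle chunk.
old_stripe = [b"\x01\x01", b"\x02\x02", b"\x04\x04"]
new_stripe, new_parity = update_stripe(old_stripe, 1, b"\xff\x00")
```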

Statement 88. An embodiment of the inventive concept includes the article according to statement 86, wherein:

the non-transitory storage medium has stored thereon further instructions that, when executed by the machine, result in reading the block stripe from a cache based at least in part on the block stripe residing in the cache; and

reading the block stripe from the at least one NVMe SSD includes reading the block stripe from the at least one NVMe SSD based at least in part on the block stripe not residing in the cache.

Statement 89. An embodiment of the inventive concept includes the article according to statement 86, wherein writing the updated block stripe to the at least one NVMe SSD includes writing the updated block stripe to a write buffer.

Statement 90. An embodiment of the inventive concept includes the article according to statement 89, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in responding to the host that the write has completed, after the updated block stripe is written to the write buffer and before the updated block stripe is written to the at least one NVMe SSD.
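By way of a non-limiting illustration, the early response of statement 90 may be sketched as follows. The class `WriteBuffer`, its methods, and the `"complete"` status string are hypothetical; a practical write buffer would also need protection against power loss, which this sketch does not address.

```python
from collections import deque


class WriteBuffer:
    """Hold updated block stripes until they are written to the SSDs."""

    def __init__(self, write_to_ssd):
        self._pending = deque()
        self._write_to_ssd = write_to_ssd

    def write(self, stripe_id, stripe):
        """Buffer the stripe and report completion to the host immediately."""
        self._pending.append((stripe_id, stripe))
        return "complete"

    def flush(self):
        """Write every buffered stripe to the SSDs; return how many were written."""
        flushed = 0
        while self._pending:
            stripe_id, stripe = self._pending.popleft()
            self._write_to_ssd(stripe_id, stripe)
            flushed += 1
        return flushed


ssd = {}  # stands in for the at least one NVMe SSD
buffer = WriteBuffer(ssd.__setitem__)
status = buffer.write(7, b"updated stripe")
acknowledged_before_flush = 7 not in ssd
flushed_count = buffer.flush()
```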

Statement 91. An embodiment of the inventive concept includes the article according to statement 77, wherein:

receiving the transmission at the Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic includes receiving a response from an NVMe SSD;

processing the transmission using the snooping logic in the erasure coding logic includes replacing a device LBA with a host LBA in the response; and

delivering the transmission to its destination by the PCIe switch includes delivering the response to a host.

Statement 92. An embodiment of the inventive concept includes the article according to statement 91, wherein processing the transmission using the snooping logic in the erasure coding logic further includes replacing an identifier of the NVMe SSD with an identifier of a virtual storage device.

Statement 93. An embodiment of the inventive concept includes the article according to statement 77, wherein delivering the transmission to its destination by the PCIe switch includes delivering the transmission to a second PCIe switch to which an NVMe SSD is connected, the NVMe SSD being the destination.

Statement 94. An embodiment of the inventive concept includes the article according to statement 93, wherein the PCIe switch is located in a first chassis and the second PCIe switch is located in a second chassis.

Statement 95. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in initializing at least one NVMe SSD connected to the PCIe switch for use with erasure coding.

Statement 96. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting that a new NVMe SSD is connected to the PCIe switch; and

adding the new NVMe SSD to the capacity of a virtual storage device.

Statement 97. An embodiment of the inventive concept includes the article according to statement 96, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in initializing the new NVMe SSD for use with erasure coding.

Statement 98. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting a failed NVMe SSD connected to the PCIe switch; and

performing erasure coding recovery of data stored on the failed NVMe SSD.

Statement 99. An embodiment of the inventive concept includes the article according to statement 98, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting a replacement NVMe SSD for the failed NVMe SSD; and

rebuilding the failed NVMe SSD using the replacement NVMe SSD.

Statement 100. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting that only NVMe SSDs without erasure coding functionality are connected to the PCIe switch; and

enabling the erasure coding logic in the PCIe switch.

Statement 101. An embodiment of the inventive concept includes the article according to statement 100, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in terminating PCIe bus enumeration downstream of the PCIe switch.

Statement 102. An embodiment of the inventive concept includes the article according to statement 100, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in reporting a virtual storage device to a host, a capacity of the virtual storage device being based at least in part on the capacities of the NVMe SSDs connected to the PCIe switch and on an erasure coding scheme.

Statement 103. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting that at least one non-storage device or at least one NVMe SSD with erasure coding functionality is connected to the PCIe switch; and

disabling the erasure coding logic in the PCIe switch.

Statement 104. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in:

detecting that at least one non-storage device or at least one NVMe SSD with erasure coding functionality is connected to the PCIe switch;

enabling the erasure coding logic in the PCIe switch; and

disabling the at least one non-storage device or the at least one NVMe SSD with erasure coding functionality.

Statement 105. An embodiment of the inventive concept includes the article according to statement 104, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in terminating PCIe bus enumeration downstream of the PCIe switch.

Statement 106. An embodiment of the inventive concept includes the article according to statement 104, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in reporting a virtual storage device to a host, a capacity of the virtual storage device being based at least in part on the capacities of the NVMe SSDs connected to the PCIe switch and on an erasure coding scheme.

Statement 107. An embodiment of the inventive concept includes the article according to statement 77, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in configuring the PCIe switch with erasure coding logic to use an erasure coding scheme.

Statement 108. An embodiment of the inventive concept includes the article according to statement 107, wherein configuring the PCIe switch with erasure coding logic to use the erasure coding scheme includes configuring the PCIe switch with erasure coding logic to use the erasure coding scheme using a Baseboard Management Controller (BMC).

Statement 109. An embodiment of the inventive concept includes a system, comprising:

a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD);

a Field Programmable Gate Array (FPGA), the FPGA implementing one or more functions supporting the NVMe SSD, the functions including at least one of data acceleration, data deduplication, data integrity, data encryption, and data compression; and

a Peripheral Component Interconnect Express (PCIe) switch,

wherein the PCIe switch communicates with the FPGA and the NVMe SSD.

Statement 110. An embodiment of the inventive concept includes the system according to statement 109, wherein the FPGA and the NVMe SSD are located inside a common housing.

Statement 111. An embodiment of the inventive concept includes the system according to statement 110, wherein the PCIe switch is located outside the common housing that includes the FPGA and the NVMe SSD.

Statement 112. An embodiment of the inventive concept includes the system according to statement 109, wherein:

the PCIe switch is connected to the FPGA; and

the FPGA is connected to the NVMe SSD.

Statement 113. An embodiment of the inventive concept includes the system according to statement 109, wherein:

the PCIe switch is connected to the NVMe SSD; and

the NVMe SSD is connected to the FPGA.

Statement 114. An embodiment of the inventive concept includes the system according to statement 109, wherein the PCIe switch includes erasure coding logic, the erasure coding logic including an erasure coding controller.

Statement 115. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic includes at least one of Look-Aside erasure coding logic and Look-Through erasure coding logic.

Statement 116. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic is operable to return a response to a read request from a host based at least in part on the data requested in the read request being present in a cache.

Statement 117. An embodiment of the inventive concept includes the system according to statement 116, wherein the erasure coding logic further includes the cache.

Statement 118. An embodiment of the inventive concept includes the system according to statement 116, wherein:

the PCIe switch is located in a chassis; and

the chassis includes memory used as the cache by the erasure coding logic.

Statement 119. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic is operable to return a response to a write request to a host before the write request is completed.

Statement 120. An embodiment of the inventive concept includes the system according to statement 119, wherein:

the PCIe switch further includes a write buffer; and

the erasure coding controller is operable to store the data in the write request in the write buffer.

Statement 121. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic includes Look-Aside erasure coding logic, the Look-Aside erasure coding logic including snooping logic.

Statement 122. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic is operable to intercept a control transmission received at the PCIe switch and to forward the control transmission to a Power Processing Unit (PPU).

Statement 123. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic is operable to intercept a data transmission received at the PCIe switch from a host and to replace, in the data transmission, a host logical block address (LBA) used by the host with a device LBA used by the NVMe SSD.

Statement 124. An embodiment of the inventive concept includes the system according to statement 123, wherein the erasure coding logic is further operable to direct the data transmission to the NVMe SSD.

Statement 125. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic is operable to intercept a data transmission received at the PCIe switch from the NVMe SSD and to replace, in the data transmission, a device LBA used by the NVMe SSD with a host LBA used by a host.

Statement 126. An embodiment of the inventive concept includes the system according to statement 114, wherein the erasure coding logic defines a virtual storage device spanning the NVMe SSD and a second NVMe SSD.

Statement 127. An embodiment of the inventive concept includes the system according to statement 114, wherein the PCIe switch is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic.

Statement 128. An embodiment of the inventive concept includes the system according to statement 114, the system further comprising a second device connected to the PCIe switch with erasure coding logic.

Statement 129. An embodiment of the inventive concept includes the system according to statement 128, wherein the second device includes at least one of a storage device, an SSD with a Field Programmable Gate Array (FPGA), and a Graphics Processing Unit (GPU).

Statement 130. An embodiment of the inventive concept includes the system according to statement 128, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch is operable to disable the erasure coding logic based at least in part on the second device not being usable with the erasure coding logic.

Statement 131. An embodiment of the inventive concept includes the system according to statement 128, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic, and to enable access to the second device without using the erasure coding logic.

Statement 132. An embodiment of the inventive concept includes the system according to statement 128, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic, and to disable access to the second device.

Statement 133. An embodiment of the inventive concept includes a system, comprising:

a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD); and

a Field Programmable Gate Array (FPGA), the FPGA including a first FPGA part and a second FPGA part, the first FPGA part implementing one or more functions supporting the NVMe SSD, the functions including at least one of data acceleration, data deduplication, data integrity, data encryption, and data compression, and the second FPGA part implementing a Peripheral Component Interconnect Express (PCIe) switch,

wherein the PCIe switch communicates with the FPGA and the NVMe SSD, and

wherein the FPGA and the NVMe SSD are located inside a common housing.

Statement 134. An embodiment of the inventive concept includes the system according to statement 133, wherein the PCIe switch includes erasure coding logic, the erasure coding logic including an erasure coding controller.

Statement 135. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic defines a virtual storage device spanning at least two parts of the NVMe SSD.

Statement 136. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic defines a virtual storage device spanning the NVMe SSD and a second NVMe SSD.

Statement 137. An embodiment of the inventive concept includes the system according to statement 136, wherein the second NVMe SSD is located inside the common housing.

Statement 138. An embodiment of the inventive concept includes the system according to statement 136, wherein the second NVMe SSD is located outside the common housing.

Statement 139. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic includes at least one of Look-Aside Erasure Coding logic and Look-Through Erasure Coding logic.

Statement 140. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic is operable to return a response to a read request from a host based at least in part on the data requested in the read request being present in a cache.

Statement 141. An embodiment of the inventive concept includes the system according to statement 140, wherein the FPGA further includes the cache.

Statement 142. An embodiment of the inventive concept includes the system according to statement 140, wherein:

the common housing is located in a chassis; and

the chassis includes memory used as the cache by the erasure coding logic.

Statement 143. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic is operable to return a response to a write request to a host before completing the write request.

Statement 144. An embodiment of the inventive concept includes the system according to statement 143, wherein:

the FPGA further includes a write buffer; and

the erasure coding controller is operable to store the data in the write request in the write buffer.
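The cache behavior of statements 140 to 142 and the early write acknowledgement of statements 143 and 144 can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `BufferedFrontEnd`, the LRU eviction policy, the `flush` method, and the use of a Python dict to stand in for the NVMe SSD are all hypothetical and are not taken from the patent, which describes hardware logic rather than software.

```python
from collections import OrderedDict


class BufferedFrontEnd:
    """Toy model of a read cache plus write buffer in front of slow storage.

    All names here are illustrative assumptions, not terms from the patent.
    """

    def __init__(self, backing, cache_lines=4):
        self.backing = backing          # dict standing in for the NVMe SSD
        self.cache = OrderedDict()      # LBA -> data, kept in LRU order
        self.cache_lines = cache_lines
        self.write_buffer = []          # (LBA, data) pairs not yet on the SSD

    def read(self, lba):
        """Return (data, served_from_cache)."""
        if lba in self.cache:
            # Cache hit: respond without touching the backing store.
            self.cache.move_to_end(lba)
            return self.cache[lba], True
        data = self.backing[lba]
        self._remember(lba, data)
        return data, False

    def write(self, lba, data):
        """Buffer the write and acknowledge before it reaches the SSD."""
        self.write_buffer.append((lba, data))
        self._remember(lba, data)       # later reads see the newest data
        return "ack"

    def flush(self):
        """Complete the buffered writes against the backing store."""
        for lba, data in self.write_buffer:
            self.backing[lba] = data
        self.write_buffer.clear()

    def _remember(self, lba, data):
        self.cache[lba] = data
        self.cache.move_to_end(lba)
        if len(self.cache) > self.cache_lines:
            self.cache.popitem(last=False)  # evict the least recently used
```

In this model the host receives `"ack"` from `write` while the data still sits only in the write buffer; the backing store is updated later by `flush`. A real design would also need to protect buffered data against power loss, which this sketch ignores.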

Statement 145. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic includes Look-Aside Erasure Coding logic, the Look-Aside Erasure Coding logic including snooping logic.

Statement 146. An embodiment of the inventive concept includes the system according to statement 145, wherein the snooping logic is operable to intercept a control transmission received at the PCIe switch and forward the control transmission to a Power Processing Unit (PPU).

Statement 147. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic is operable to intercept a data transmission received from a host at the PCIe switch and to replace, in the data transmission, a host Logical Block Address (LBA) used by the host with a device LBA used by the NVMe SSD.

Statement 148. An embodiment of the inventive concept includes the system according to statement 147, wherein the erasure coding logic is further operable to direct the data transmission to the NVMe SSD.

Statement 149. An embodiment of the inventive concept includes the system according to statement 134, wherein the erasure coding logic is operable to intercept a data transmission received from the NVMe SSD at the PCIe switch and to replace, in the data transmission, the device LBA used by the NVMe SSD with the host LBA used by the host.
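Statements 147 to 149 describe translating between host LBAs and device LBAs in both directions, without fixing any particular mapping. A minimal sketch follows, assuming a plain round-robin striping layout with no parity rotation; the `Layout` type, both function names, and the striping rule itself are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical striping layout; not specified by the patent."""
    devices: int        # number of SSDs behind the switch
    stripe_blocks: int  # consecutive blocks placed on one SSD before moving on


def host_to_device(layout, host_lba):
    """Map a host LBA to (device index, device LBA)."""
    stripe, offset = divmod(host_lba, layout.stripe_blocks)
    row, device = divmod(stripe, layout.devices)
    return device, row * layout.stripe_blocks + offset


def device_to_host(layout, device, device_lba):
    """Inverse mapping, used for transmissions travelling back to the host."""
    row, offset = divmod(device_lba, layout.stripe_blocks)
    stripe = row * layout.devices + device
    return stripe * layout.stripe_blocks + offset
```

With three devices and four blocks per stripe unit, host LBA 13 lands on device 0 at device LBA 5, and `device_to_host` maps that pair back to 13. Because the two functions are exact inverses, the host never needs to know the device-level addresses.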

Statement 150. An embodiment of the inventive concept includes the system according to statement 134, wherein the PCIe switch with erasure coding logic is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic.

Statement 151. An embodiment of the inventive concept includes the system according to statement 134, wherein the PCIe switch with erasure coding logic is operable to disable the erasure coding logic based at least in part on the NVMe SSD not being usable with the erasure coding logic.

Statement 152. An embodiment of the inventive concept includes a system, comprising:

a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD); and

a Peripheral Component Interconnect Express (PCIe) switch with erasure coding logic, including:

an external connector enabling the PCIe switch to communicate with a processor;

at least one connector enabling the PCIe switch to communicate with the NVMe SSD;

a Power Processing Unit (PPU) to configure the PCIe switch; and

an erasure coding controller including circuitry to apply an erasure coding scheme to data stored on the NVMe SSD.
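The statement above leaves the erasure coding scheme open. As one concrete instance, the sketch below uses single-parity XOR (the idea behind RAID 5 style protection), which tolerates the loss of any one block in a stripe. The function names are hypothetical, and this is a software illustration of the arithmetic only, not the circuitry the statement refers to.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, missing_index):
    """Rebuild the block at missing_index from the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)
```

Each block of a stripe would normally live on a different SSD, so a single failed drive costs at most one block per stripe. Schemes that survive multiple failures (for example Reed-Solomon codes) need more parity blocks and finite-field arithmetic, which is beyond this sketch.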

Statement 153. An embodiment of the inventive concept includes the system according to statement 152, wherein:

the system further comprises a second NVMe SSD; and

the PCIe switch with erasure coding logic includes a second connector enabling the PCIe switch with erasure coding logic to communicate with the second NVMe SSD.

Statement 154. An embodiment of the inventive concept includes the system according to statement 152, wherein:

the system further comprises:

a second NVMe SSD; and

a second PCIe switch, including:

a second external connector enabling the second PCIe switch to communicate with the processor;

a second connector enabling the second PCIe switch to communicate with the second NVMe SSD; and

a third connector enabling the second PCIe switch to communicate with the PCIe switch with erasure coding logic; and

the PCIe switch with erasure coding logic includes a fourth connector enabling the PCIe switch with erasure coding logic to communicate with the second PCIe switch,

wherein the erasure coding scheme is applied to data stored on the NVMe SSD and the second NVMe SSD.

Statement 155. An embodiment of the inventive concept includes the system according to statement 154, wherein the second PCIe switch further includes second erasure coding logic that is disabled.

Statement 156. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic includes at least one of Look-Aside Erasure Coding logic and Look-Through Erasure Coding logic.

Statement 157. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic is operable to return a response to a read request from a host based at least in part on the data requested in the read request being present in a cache.

Statement 158. An embodiment of the inventive concept includes the system according to statement 157, wherein the erasure coding logic further includes the cache.

Statement 159. An embodiment of the inventive concept includes the system according to statement 157, wherein:

the PCIe switch with erasure coding logic is located in a chassis; and

the chassis includes memory used as the cache by the erasure coding logic.

Statement 160. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic is operable to return a response to a write request to a host before completing the write request.

Statement 161. An embodiment of the inventive concept includes the system according to statement 160, wherein:

the PCIe switch with erasure coding logic further includes a write buffer; and

the erasure coding controller is operable to store the data in the write request in the write buffer.

Statement 162. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic includes Look-Aside Erasure Coding logic, the Look-Aside Erasure Coding logic including snooping logic.

Statement 163. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic is operable to intercept a control transmission received at the PCIe switch and forward the control transmission to the Power Processing Unit (PPU).

Statement 164. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic is operable to intercept a data transmission received from a host at the PCIe switch and to replace, in the data transmission, a host Logical Block Address (LBA) used by the host with a device LBA used by the NVMe SSD.

Statement 165. An embodiment of the inventive concept includes the system according to statement 164, wherein the erasure coding logic is further operable to direct the data transmission to the NVMe SSD.

Statement 166. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic is operable to intercept a data transmission received from the NVMe SSD at the PCIe switch and to replace, in the data transmission, the device LBA used by the NVMe SSD with the host LBA used by the host.

Statement 167. An embodiment of the inventive concept includes the system according to statement 152, wherein the erasure coding logic defines a virtual storage device spanning the NVMe SSD and a second NVMe SSD.

Statement 168. An embodiment of the inventive concept includes the system according to statement 152, wherein the PCIe switch with erasure coding logic is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic.

Statement 169. An embodiment of the inventive concept includes the system according to statement 152, the system further comprising a second device connected to the PCIe switch with erasure coding logic.

Statement 170. An embodiment of the inventive concept includes the system according to statement 169, wherein the second device includes at least one of a storage device, an SSD with a Field Programmable Gate Array (FPGA), and a Graphics Processing Unit (GPU).

Statement 171. An embodiment of the inventive concept includes the system according to statement 169, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch with erasure coding logic is operable to disable the erasure coding logic based at least in part on the second device not being usable with the erasure coding logic.

Statement 172. An embodiment of the inventive concept includes the system according to statement 169, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch with erasure coding logic is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic, and to enable access to the second device without using the erasure coding logic.

Statement 173. An embodiment of the inventive concept includes the system according to statement 169, wherein:

the second device is not usable with the erasure coding logic; and

the PCIe switch with erasure coding logic is operable to enable the erasure coding logic based at least in part on the NVMe SSD being usable with the erasure coding logic, and to disable access to the second device.
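Statements 171 to 173 give three alternative responses when a connected device cannot be used with the erasure coding logic. The sketch below models them as a policy choice. The `Policy` enum, the `configure` function, and the `"ec"`, `"direct"`, and `"blocked"` labels are hypothetical names introduced for illustration; the statements do not say how a switch selects among the three behaviors.

```python
from enum import Enum


class Policy(Enum):
    """Illustrative labels for the three alternatives; not patent terms."""
    DISABLE_EC = "disable erasure coding"             # as in statement 171
    BYPASS_OTHERS = "pass other devices through"      # as in statement 172
    BLOCK_OTHERS = "disable access to other devices"  # as in statement 173


def configure(devices, policy):
    """Decide how each attached device is exposed.

    devices maps a device name to True when it is usable with erasure coding.
    Returns (ec_enabled, {name: "ec" | "direct" | "blocked"}).
    """
    if all(devices.values()):
        # Every device qualifies, so the policy choice does not matter.
        return True, {name: "ec" for name in devices}
    if policy is Policy.DISABLE_EC or not any(devices.values()):
        # Fall back to acting as a plain switch for every device.
        return False, {name: "direct" for name in devices}
    other = "direct" if policy is Policy.BYPASS_OTHERS else "blocked"
    return True, {
        name: ("ec" if usable else other) for name, usable in devices.items()
    }
```

For a switch with one usable SSD and one GPU, `Policy.BYPASS_OTHERS` keeps the GPU reachable without erasure coding, while `Policy.BLOCK_OTHERS` hides it. Which choice is appropriate depends on whether mixed traffic through the same switch is acceptable.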

Consequently, in view of the wide variety of permutations of the embodiments described herein, this detailed description and the accompanying material are intended to be illustrative only and should not be taken as limiting the scope of the inventive concept. What is claimed as the inventive concept, therefore, is all such modifications as may come within the scope and spirit of the claims and their equivalents.

105: machine/host
110: processor
115: memory
120: memory controller
125, 320, 605, 1005: Peripheral Component Interconnect Express (PCIe) switch
130: storage device
130-1, 130-2, 130-3, 130-4, 130-5, 130-6: solid state drive (SSD)/storage device/physical storage device
135: device driver
205: clock
210: network connector
215: bus
220: user interface
225: input/output engine
305: midplane
310, 315: power distribution board
325, 330: Baseboard Management Controller (BMC)
405, 410, 415: erasure coding scheme
505: connector
510-1, 510-2, 510-3, 510-4, 510-5, 510-6: PCIe-to-PCIe stack
515: PCIe switch core
520: Power Processing Unit (PPU)
525: snooping logic
530: erasure coding controller
535-1, 535-2, 535-3, 535-4, 535-5, 535-6: capture interface
540: multiplexer
545: cache
550: write buffer
555: erasure coding enable signal
705: Field Programmable Gate Array (FPGA)
1103, 1106, 1109, 1112, 1115, 1118, 1121, 1124, 1127, 1130, 1133, 1136, 1139, 1145, 1148, 1151, 1154, 1160, 1163, 1205, 1210, 1215, 1220, 1225, 1235, 1240, 1305, 1310, 1315, 1405, 1410, 1415, 1420: block

FIG. 1 shows a machine including a Peripheral Component Interconnect Express (PCIe) switch with Look-Aside Erasure Coding logic, according to an embodiment of the inventive concept.

FIG. 2 shows additional details of the machine of FIG. 1.

FIG. 3 shows additional details of the machine of FIG. 1, including a power distribution board and a midplane connecting the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1 to storage devices.

FIG. 4 shows the storage devices of FIG. 3 used to implement different erasure coding schemes.

FIG. 5 shows details of the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1.

FIG. 6 shows details of a PCIe switch with Look-Through Erasure Coding logic, according to another embodiment of the inventive concept.

FIG. 7 shows a first topology using the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1, according to one embodiment of the inventive concept.

FIG. 8 shows a second topology using the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1, according to another embodiment of the inventive concept.

FIG. 9 shows a third topology using the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1, according to yet another embodiment of the inventive concept.

FIG. 10 shows a fourth topology using the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1, according to yet another embodiment of the inventive concept.

FIGS. 11A to 11D show a flowchart of an example procedure for the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1 to support an erasure coding scheme, according to an embodiment of the inventive concept.

FIGS. 12A and 12B show an example procedure for the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1 to perform initialization, according to an embodiment of the inventive concept.

FIG. 13 shows a flowchart of an example procedure for the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1 to incorporate a new storage device into an erasure coding scheme, according to an embodiment of the inventive concept.

FIG. 14 shows a flowchart of an example procedure for the PCIe switch with Look-Aside Erasure Coding logic of FIG. 1 to handle a failed storage device, according to an embodiment of the inventive concept.

125: PCIe switch

505: connector

510-1 to 510-6: PCIe-to-PCIe stack

515: PCIe switch core

520: PPU

525: snooping logic

530: erasure coding controller

535-1 to 535-6: capture interface

540: multiplexer

545: cache

550: write buffer

555: erasure coding enable signal

Claims (20)

1. A system, comprising: a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD); a Field Programmable Gate Array (FPGA), the FPGA implementing one or more functions supporting the NVMe SSD, the functions including at least one of data acceleration, data deduplication, data integrity, data encryption, and data compression; and a Peripheral Component Interconnect Express (PCIe) switch; wherein the PCIe switch communicates with the FPGA and the NVMe SSD.

2. The system of claim 1, wherein the PCIe switch includes erasure coding logic, the erasure coding logic including an erasure coding controller.

3. The system of claim 2, wherein the erasure coding logic is operative to return a response to a read request from a host based on at least part of the data requested in the read request being present in a cache.

4. The system of claim 2, wherein the erasure coding logic is operative to return a response to a write request to a host before completing the write request.

5. The system of claim 2, wherein the erasure coding logic includes Look-Aside Erasure Coding logic, the Look-Aside Erasure Coding logic including snooping logic.

6. The system of claim 2, wherein the erasure coding logic is operative to intercept a data transmission received from a host at the PCIe switch and to replace a host Logical Block Address (LBA) used by the host in the data transmission with a device LBA of the NVMe SSD.

7. The system of claim 2, wherein the PCIe switch is operative to enable the erasure coding logic based at least in part on the NVMe SSD not including native erasure coding logic.

8. The system of claim 2, further comprising a second device connected to the PCIe switch with erasure coding logic.

9. The system of claim 8, wherein: the second device includes at least one of a non-storage device and a storage device with native erasure coding logic; and the PCIe switch is operative to disable the erasure coding logic based at least in part on the second device.

10. The system of claim 8, wherein: the second device includes at least one of a non-storage device and a storage device with native erasure coding logic; and the PCIe switch is operative to enable the erasure coding logic based at least in part on the NVMe SSD not including native erasure coding logic, and to enable access to the second device without using the erasure coding logic.

11. A system, comprising: a Non-Volatile Memory Express (NVMe) Solid State Drive (SSD); and a Field Programmable Gate Array (FPGA), the FPGA including a first FPGA part and a second FPGA part, the first FPGA part implementing one or more functions supporting the NVMe SSD, the functions including at least one of data acceleration, data deduplication, data integrity, data encryption, and data compression, and the second FPGA part implementing a Peripheral Component Interconnect Express (PCIe) switch, wherein the PCIe switch communicates with the FPGA and the NVMe SSD, and wherein the FPGA and the NVMe SSD are located inside a common housing.

12. The system of claim 11, wherein the PCIe switch includes erasure coding logic, the erasure coding logic including an erasure coding controller.

13. The system of claim 12, wherein the erasure coding logic includes at least one of Look-Aside Erasure Coding logic and Look-Through Erasure Coding logic.

14. The system of claim 12, wherein the erasure coding logic is operative to return a response to a read request from a host based at least in part on the data requested in the read request being present in a cache.

15. The system of claim 12, wherein the erasure coding logic is operative to return a response to a write request to a host before completing the write request.

16. The system of claim 12, wherein the erasure coding logic includes Look-Aside Erasure Coding logic, the Look-Aside Erasure Coding logic including snooping logic.

17. The system of claim 12, wherein the erasure coding logic is operative to intercept a data transmission received from a host at the PCIe switch and to replace, in the data transmission, a host Logical Block Address (LBA) used by the host with a device LBA used by the NVMe SSD.

18. The system of claim 12, wherein the erasure coding logic is operative to intercept a data transmission received from the NVMe SSD at the PCIe switch and to replace, in the data transmission, a device Logical Block Address (LBA) used by the NVMe SSD with a host LBA used by a host.

19. The system of claim 12, wherein the PCIe switch with erasure coding logic is operative to enable the erasure coding logic based at least in part on the NVMe SSD not including native erasure coding logic.

20. The system of claim 12, wherein the PCIe switch with erasure coding logic is operative to disable the erasure coding logic based at least in part on the NVMe SSD including native erasure coding logic.
TW108129186A 2018-10-12 2019-08-16 Computuer system TWI791880B (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201862745261P 2018-10-12 2018-10-12
US62/745,261 2018-10-12
US16/207,080 US10635609B2 (en) 2018-03-02 2018-11-30 Method for supporting erasure code data protection with embedded PCIE switch inside FPGA+SSD
US16/207,080 2018-11-30
US16/226,629 US10838885B2 (en) 2018-03-02 2018-12-19 Method for supporting erasure code data protection with embedded PCIE switch inside FPGA+SSD
US16/226,629 2018-12-19
US16/260,087 US11860672B2 (en) 2018-03-02 2019-01-28 Method for supporting erasure code data protection with embedded PCIE switch inside FPGA+SSD
US16/260,087 2019-01-28

Publications (2)

Publication Number Publication Date
TW202020675A true TW202020675A (en) 2020-06-01
TWI791880B TWI791880B (en) 2023-02-11

Family

ID=70219044

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108129186A TWI791880B (en) 2018-10-12 2019-08-16 Computuer system

Country Status (4)

Country Link
JP (1) JP7370801B2 (en)
KR (1) KR20200041815A (en)
CN (1) CN111045597B (en)
TW (1) TWI791880B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI784804B (en) * 2021-11-19 2022-11-21 群聯電子股份有限公司 Retiming circuit module, signal transmission system and signal transmission method
TWI817223B (en) * 2021-06-21 2023-10-01 日商鎧俠股份有限公司 Memory system and control method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102225577B1 (en) * 2020-08-21 2021-03-09 (주)테온 Method and device for distributed storage of data using hybrid storage
CN112148227B (en) * 2020-09-25 2023-03-24 中国科学院空天信息创新研究院 Storage device and information processing method
CN112732477B (en) * 2021-04-01 2021-06-29 四川华鲲振宇智能科技有限责任公司 Method for fault isolation by out-of-band self-checking

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8572320B1 (en) * 2009-01-23 2013-10-29 Cypress Semiconductor Corporation Memory devices and systems including cache devices for memory modules
CN102819517A (en) * 2011-06-08 2012-12-12 鸿富锦精密工业(深圳)有限公司 PCIE (peripheral component interconnect-express) interface card
US20130232293A1 (en) * 2012-03-05 2013-09-05 Nguyen P. Nguyen High performance storage technology with off the shelf storage components
US9111621B2 (en) * 2012-06-20 2015-08-18 Pfg Ip Llc Solid state drive memory device comprising secure erase function
JP2014063497A (en) * 2012-09-21 2014-04-10 Plx Technology Inc Pci express switch with logical device capability
US8954657B1 (en) * 2013-09-27 2015-02-10 Avalanche Technology, Inc. Storage processor managing solid state disk array
US9298648B2 (en) * 2013-05-08 2016-03-29 Avago Technologies General Ip (Singapore) Pte Ltd Method and system for I/O flow management using RAID controller with DMA capabilitiy to directly send data to PCI-E devices connected to PCI-E switch
US9336173B1 (en) * 2013-12-20 2016-05-10 Microsemi Storage Solutions (U.S.), Inc. Method and switch for transferring transactions between switch domains
US9940036B2 (en) * 2014-09-23 2018-04-10 Western Digital Technologies, Inc. System and method for controlling various aspects of PCIe direct attached nonvolatile memory storage subsystems
US20160259754A1 (en) * 2015-03-02 2016-09-08 Samsung Electronics Co., Ltd. Hard disk drive form factor solid state drive multi-card adapter
US10007443B1 (en) * 2016-03-31 2018-06-26 EMC IP Holding Company LLC Host to device I/O flow
CN108073833A (en) * 2016-11-10 2018-05-25 苏州韦科韬信息技术有限公司 Solid state disk secrecy system and method based on PCIE interfaces
TW201823916A (en) * 2016-12-27 2018-07-01 英業達股份有限公司 Server system
US10255134B2 (en) * 2017-01-20 2019-04-09 Samsung Electronics Co., Ltd. Control plane method and apparatus for providing erasure code protection across multiple storage devices

Also Published As

Publication number Publication date
CN111045597B (en) 2024-08-20
CN111045597A (en) 2020-04-21
JP2020061149A (en) 2020-04-16
JP7370801B2 (en) 2023-10-30
TWI791880B (en) 2023-02-11
KR20200041815A (en) 2020-04-22

Similar Documents

Publication Publication Date Title
US11860672B2 (en) Method for supporting erasure code data protection with embedded PCIE switch inside FPGA+SSD
US11797181B2 (en) Hardware accessible external memory
US11360679B2 (en) Paging of external memory
TWI791880B (en) Computuer system
US8560772B1 (en) System and method for data migration between high-performance computing architectures and data storage devices
US11086525B2 (en) Resilient external memory
US11782634B2 (en) Dynamic use of non-volatile ram as memory and storage on a storage system
US20210271393A1 (en) Method and apparatus for performing data access management of all flash array server
US20180307427A1 (en) Storage control apparatus and storage control method
US20240095196A1 (en) Method for supporting erasure code data protection with embedded pcie switch inside fpga+ssd
WO2018055686A1 (en) Information processing system