JP2008033921A

JP2008033921A - Distributed storage system with accelerated striping

Info

Publication number: JP2008033921A
Application number: JP2007172961A
Authority: JP
Inventors: Stephen J. Sicola
Original assignee: Seagate Technology LLC
Current assignee: Seagate Technology LLC
Priority date: 2006-06-29
Filing date: 2007-06-29
Publication date: 2008-02-14

Abstract

PROBLEM TO BE SOLVED: To provide an intelligent data storage subsystem that self-deterministically allocates, manages, and protects its respective data storage capacity and presents that capacity to a network as a virtual storage space in order to accommodate global storage requirements.

SOLUTION: A data storage device is provided with a software system that is resident in a memory space and configured to encode data read from a first number of logical units into a single channel in order to store the data in a second number of logical units.

COPYRIGHT: (C)2008, JPO&INPIT

Description

(Related application)
This application is a continuation-in-part of U.S. patent application Ser. No. 11/145,403, filed June 3, 2005, and assigned to the assignee of the present application.

The present invention relates generally to the field of distributed data storage systems and more particularly, but without limitation, to methods and apparatus for restriping data stored in memory.

Computer networks began to proliferate when the data transfer rates of industry standard architectures could no longer keep pace with the access rate of the 80386 processor made by Intel Corporation. Local area networks (LANs) evolved into storage area networks (SANs) by consolidating data storage capacity in the network. By consolidating equipment, and the associated data handled by that equipment, in SANs, users have realized significant benefits, such as the ability to handle an order of magnitude more storage than is possible with direct attached storage, and have done so at a manageable cost.

More recently, the trend has been toward a network-centric approach to controlling data storage subsystems. That is, just as the storage was consolidated, the systems that control the functions of the storage are being moved off the servers and into the network itself. Host-based software, for example, can delegate maintenance and management tasks to intelligent switches or to a specialized network storage services platform. Appliance-based solutions eliminate the need for software running on the hosts and operate within computers placed as nodes in the enterprise. In any event, intelligent network solutions can centralize such things as storage allocation routines, backup routines, and fault tolerance schemes independently of the hosts.

While moving the intelligence from the hosts to the network has resolved some of these problems, it does not resolve the inherent difficulty that flexibility in altering the presentation of virtual storage to the hosts is generally lacking. For example, stored data may need to be moved for reliability reasons, or more storage capacity may need to be added to accommodate a growing network. In these cases either the host or the network must be modified to make it aware of the new or changed storage space. What is needed is an intelligent data storage subsystem that self-deterministically allocates, manages, and protects its respective data storage capacity and presents that capacity to the network as a virtual storage space in order to accommodate global storage requirements. A distributed computing environment uses these intelligent storage devices for global provisioning as well as for global striping and restriping of stored data. It is to this solution that embodiments of the present invention are directed.

Embodiments of the present invention are generally directed to a distributed storage system with a restriping capability that avoids the intensive overhead of backing up the originally stored data and then restoring it to the restriped configuration.

In some embodiments a data storage device is provided with a software system that is resident in a memory space and configured to encode data read from a first number of logical units into a single channel in order to store the data in a second number of logical units.

In some embodiments a method is provided for encoding data striped across a first number of logical units into a single channel, and decoding the encoded data to a second number of logical units.

In some embodiments a data storage system is provided with an intelligent storage element that has a memory space and a device that restripes data across logical units of the memory space.

These and various other features and advantages that characterize the claimed invention will become apparent upon reading the following detailed description and reviewing the associated drawings.

FIG. 1 shows an illustrative computer system 100 in which embodiments of the present invention are useful. One or more hosts 102 are networked to one or more network-attached servers 104 via a local area network (LAN) and/or wide area network (WAN) 106. Preferably, the LAN/WAN 106 uses Internet Protocol (IP) networking infrastructure for communicating over the World Wide Web. The hosts 102 access applications resident in the servers 104 that routinely need data stored on one or more of a number of intelligent storage elements ("ISEs") 108. Accordingly, a SAN 110 connects the servers 104 to the ISEs 108 for access to the stored data. The ISEs 108 provide blocks of data storage capacity 109 for storing the data over various selected communication protocols, such as Serial ATA and Fibre Channel, with enterprise or desktop class storage media within.

FIG. 2 is a simplified diagrammatic view of the computer system 100 of FIG. 1. The hosts 102 interact with each other, and with a pair of the ISEs 108 (denoted A and B, respectively), via the network or fabric 110. Each ISE 108 includes dual redundant controllers 112 (denoted A1, A2 and B1, B2) that preferably operate on the data storage capacity 109 as a set of data storage devices characterized as a redundant array of independent drives (RAID). The controllers 112 and data storage capacity 109 preferably use a fault tolerant arrangement, so that the various controllers 112 use parallel, redundant links and at least some of the user data stored by the system 100 is stored in at least one set of the data storage capacity 109.

It is further contemplated that the A host computer 102 and the A ISE 108 can be physically located at a first site, the B host computer 102 and the B ISE 108 can be physically located at a second site, and the C host computer 107 can be at yet a third site, although this is merely illustrative and not limiting. All entities on the distributed computer system are connected over some type of computer network.

FIG. 3 illustrates an ISE 108 constructed in accordance with embodiments of the present invention. A shelf 114 defines cavities that receivingly engage the controllers 112 in electrical connection with a midplane 116. The shelf 114 is, in turn, supported within a cabinet (not shown). A pair of multiple disk assemblies (MDAs) 118 are receivingly engaged with the shelf 114 on the same side of the midplane 116. Connected to the opposite side of the midplane 116 are dual batteries 122 that provide emergency power, dual alternating current power supplies 124, and dual interface modules 126. Preferably, the dual components are configured to operate either of the MDAs 118, or both simultaneously, thereby providing backup protection in the event of a component failure.

FIG. 4 is an enlarged, partially exploded isometric view of an MDA 118 constructed in accordance with some embodiments of the present invention. The MDA 118 has an upper partition 130 and a lower partition 132, each supporting five data storage devices 128. The partitions 130, 132 align the data storage devices 128 for connection with a common circuit board 134 having a connector 136 that operably engages the midplane 116 (FIG. 3). A wrapper 138 provides electromagnetic interference shielding. This illustrative embodiment of the MDA 118 is the subject matter of patent application No. 10/884,605, entitled "Carrier Device and Method for a Multiple Disk Array," which is assigned to the assignee of the present invention and incorporated herein by reference. Another illustrative embodiment of the MDA is the subject matter of patent application No. 10/817,378, of the same title, which is likewise assigned to the assignee of the present invention and incorporated herein by reference. In alternative equivalent embodiments the MDA 118 can be provided within a sealed enclosure, as discussed below.

FIG. 5 is an isometric view of an illustrative data storage device 128 suited for use with embodiments of the present invention, in the form of a rotating media disk drive. Although a rotating spindle with moving data storage media is used in the discussion below, alternative equivalent embodiments use a non-rotating media device such as a solid-state memory device. A data storage disk 140 is rotated by a motor 142 to present data storage locations of the disk 140 to a read/write head ("head") 143. The head 143 is supported at the distal end of a rotary actuator 144 that can move the head 143 radially between the inner and outer tracks of the disk 140. The head 143 is electrically connected to a circuit board 145 by way of a flex circuit 146. The circuit board 145 is adapted to send and receive control signals that control the functions of the data storage device 128. A connector 148 is electrically connected to the circuit board 145 and connects the data storage device 128 with the circuit board 134 (FIG. 4) of the MDA 118.

FIG. 6 is a diagrammatic view of an ISE 108 constructed in accordance with embodiments of the present invention. The controllers 112 operate in conjunction with intelligent storage processors (ISPs) 150 to provide managed reliability of the data integrity. The ISPs 150 can reside in the controllers 112, in the MDAs 118, or elsewhere within the ISE 108.

The controllers 112 respond to remote access commands directed to data packs 151, 153 via communication ports 155, 157, respectively. For purposes of this description, the controller 112 of FIG. 6 creates logical unit ("LUN") 1 and LUN2 from the data pack 151, each in response to a host command for storage capacity. Also for purposes of this description, the data packs 151, 153 are each assumed to contain eight data storage devices 128 for data storage and two spare data storage devices 128.

Aspects of the managed reliability include invoking reliable data storage formats such as RAID strategies. For example, providing a system that selectively employs a selected one of a plurality of different RAID formats creates a relatively more robust system for storing data, permits optimization of firmware algorithms that reduce the complexity of the software used to manage the MDA 118, and permits relatively quick recovery from storage fault conditions. These and other aspects of this multiple RAID format system are described in patent application No. 10/817,264, entitled "Storage Media Data Structure and Method," which is assigned to the present assignee and incorporated herein by reference.

Managed reliability can also include scheduling diagnostic and correction routines based on monitored usage of the system. Data recovery operations are executed for copying and reconstructing data. The ISP 150 is integrated with the MDAs 118 in such a way as to facilitate "self-healing" of the overall data storage capacity without data loss. These and other aspects of the managed reliability features contemplated herein are described in patent application No. 10/817,617, entitled "Managed Reliability Storage System and Method," which is assigned to the present assignee and incorporated herein by reference. Other aspects of the managed reliability include responsiveness to predictive failure indications in relation to predetermined rules, as disclosed for example in patent application No. 11/040,410, entitled "Deterministic Preventive Recovery from a Predicted Failure in a Distributed Storage System," which is assigned to the present assignee and incorporated herein by reference.

FIG. 7 is a diagrammatic illustration of an ISP circuit board 152 on which a pair of redundant ISPs 150 resides. The ISP 150 interfaces the data storage capacity 109 to the SAN fabric 110. Each ISP 150 can manage assorted storage services such as routing, volume management, and data migration and replication. The ISPs 150 divide the board 152 into two ISP subsystems 154, 156 coupled by a bus 158. The ISP subsystem 154 includes the ISP 150 denoted "B," which is connected to the fabric 110 and the storage capacity 109 by links 160, 162, respectively. The ISP subsystem 154 also includes a policy processor 164 executing a real-time operating system. The ISP 150 and the policy processor 164 communicate over a bus 166, and both communicate with memory 168.

FIG. 8 is a diagrammatic view of an illustrative ISP subsystem 154 constructed in accordance with embodiments of the present invention. The ISP 150 includes a number of function controllers (170-180) that communicate with list managers 182, 184 via a cross point switch (CPS) 186 message crossbar. Accordingly, the controllers (170-180) can each generate CPS messages in response to a given condition and send the messages through the CPS 186 to a list manager 182, 184 in order to access a memory module and/or invoke an ISP operation. Likewise, responses from a list manager 182, 184 can be communicated to any of the controllers (170-180) via the CPS 186. The arrangement of FIG. 8 and the associated discussion are illustrative and do not limit the contemplated embodiments of the present invention.

The policy processor 164 can be programmed to execute desired operations via the ISP 150. For example, the policy processor 164 can communicate with the list managers 182, 184, sending and receiving messages via the CPS 186. Responses to the policy processor 164 can serve as interrupts that signal the reading of memory 168 registers.

FIG. 9 is a diagrammatic illustration of the flexibility advantages of the ISE 108, by way of the intelligent controllers 112, in communicating with a host 102 in any of a preselected plurality of communication protocols, such as FC, iSCSI, or SAS. The ISE 108 can be programmed to ascertain the abstraction level of a host command and, accordingly, to map a virtual storage volume to the physical storage 109 associated with the command.

For present purposes, the term "virtual storage volume" means a logical entity that generally corresponds to a logical abstraction of physical storage. A "virtual storage volume" can include, for example, an entity that is treated (logically) as though it were consecutively addressed blocks in a fixed block architecture or records in a count-key-data architecture. A virtual storage volume can be physically located on one or more storage elements.

FIG. 10 is a diagrammatic illustration of the types of data management services that can be conducted by the ISE 108 independently of any host 102. For example, RAID management can be controlled locally, for fault tolerant data integrity, by striping data across a desired number of the data storage devices 128₁, 128₂, 128₃, ... 128ₙ. Virtualization services can be controlled locally to allocate and/or deallocate memory capacity to logical entities. Application routines, such as the managed reliability schemes discussed above and data migration between logical volumes within the same ISE 108, can likewise be controlled locally.

In FIG. 11 the ISE 108 has created another LUN, LUN3, in response to a host command for additional storage capacity. The present embodiments contemplate an accelerated method of restriping the data already present in LUN1 and LUN2 across all three LUNs. Advantageously, the accelerated restriping of the present embodiments does not require the relatively time-consuming process of related art solutions, which back up the existing data and then restore it to the newly configured storage space. That is, in the present solution the availability of the ISE 108 to the system 100 depends only on when the restriping process described herein is started and on the rate at which the restriping proceeds. The start time and the data transfer rate can be varied in relation to other system 100 resource demands in order to minimize any adverse effect of the restriping process on system 100 performance. Also, by simplifying the inputs to the restriping application and offloading the application from the remote host 102, the present embodiments accelerate the process with little overhead expended on the distributed system 100 as a whole.

For purposes of illustration, FIG. 12 shows LUN1 and LUN2 both filled to capacity with stored data. Columns A-H represent each of the data storage devices 128 (or "domains") in the data pack 151. Each column and row intersection, such as the one labeled "A1," represents a chunk of storage capacity allocated in the respective domain. The chunks in each row form a stripe of storage capacity across all of the domains. Thus, the first stripe of LUN1 consists of chunks A1-H1 and the first stripe of LUN2 consists of chunks A5-H5. The space shown between the end of LUN1 and the beginning of LUN2 is merely for illustration; it is assumed that there is in fact no gap in the storage space between these or other consecutive LUNs.
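The layout of FIG. 12 can be modeled in a few lines of Python. This is only an illustrative sketch: the names `DOMAINS`, `ROWS_PER_LUN`, `stripe`, and `lun_stripes` are hypothetical, and the value of four stripes per LUN is an inference from the text (LUN1 begins at row 1, LUN2 begins at row 5, and no gap is assumed between them).

```python
from string import ascii_uppercase

DOMAINS = ascii_uppercase[:8]  # columns A-H: the eight data storage devices
ROWS_PER_LUN = 4               # assumption inferred from the text, not stated outright


def stripe(row):
    """Return the chunk labels of one stripe (one row across all domains)."""
    return [f"{domain}{row}" for domain in DOMAINS]


def lun_stripes(lun_number):
    """Return the stripes of a LUN, assuming LUNs occupy consecutive rows with no gap."""
    first_row = (lun_number - 1) * ROWS_PER_LUN + 1
    return [stripe(row) for row in range(first_row, first_row + ROWS_PER_LUN)]


lun1 = lun_stripes(1)
lun2 = lun_stripes(2)
print(lun1[0])  # first stripe of LUN1
print(lun2[0])  # first stripe of LUN2
```

Each stripe is simply one row of the grid, so a LUN is a list of rows and a restriping operation only needs to move whole rows between LUNs.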

FIG. 13 diagrammatically illustrates the ISP 150 executing a software (or firmware) system 199 that is resident in memory and configured to read data from a selected plurality of LUNs via paths 200 and to encode that data into a single channel via path 202. The software system 199 then decodes the encoded data, via paths 204, in order to restripe the data across a second plurality of logical units.

Preferably, the software system 199 executes a multiplexing operation ("multiplex") 206 to encode the data and a demultiplexing operation ("demultiplex") 208 to decode the encoded data. For purposes of the present description, the data originally stored in LUN1 and LUN2 (FIG. 12) is restriped across LUN1 through LUN3 in accordance with embodiments of the present invention. In other words, the currently stored data is restriped from two LUNs to three LUNs. Thus, in FIG. 13, n = 2 and m = 3.

FIG. 14 illustrates the multiplex-demultiplex software control circuit beginning by reading data stripe A1-H1 of LUN1 (from cache 168) and storing it in LUN1. FIG. 15 illustrates how the multiplex-demultiplex circuit advances synchronously and sequentially over the input sources and the output destinations, next reading data stripe A5-H5 from cache 168 and storing it in LUN2. FIG. 16 illustrates how the multiplex-demultiplex circuit loops serially over the input sources and the output destinations: after cycling through all of the input sources in turn, the multiplex operation returns to LUN1 even though the demultiplex operation has not yet cycled through all of the output destinations.

In FIG. 17 data stripe A6-H6 is read from cache 168 and stored in LUN1. In FIG. 18 the multiplex-demultiplex circuit cycles to next read data stripe A3-H3 from cache 168 and store it in LUN2. In FIG. 19 data stripe A7-H7 is stored in LUN3. In FIG. 20 data stripe A4-H4 is stored in LUN1. Finally, in FIG. 21 data stripe A8-H8 is stored in LUN2.
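The walkthrough of FIGS. 14-21 can be sketched as a round-robin multiplexer feeding a round-robin demultiplexer. The code below is a minimal illustration under stated assumptions, not the patented implementation: the function names `multiplex` and `demultiplex` are hypothetical, each stripe is represented by a label string, and the cache 168 is not modeled.

```python
from itertools import chain, zip_longest

_SKIP = object()  # sentinel marking a source that has run out of stripes


def multiplex(sources):
    """Encode stripes from n sources into one channel, visiting the sources in turn."""
    interleaved = chain.from_iterable(zip_longest(*sources, fillvalue=_SKIP))
    return (item for item in interleaved if item is not _SKIP)


def demultiplex(channel, m):
    """Decode the channel onto m destinations, visiting the destinations in turn."""
    destinations = [[] for _ in range(m)]
    for position, item in enumerate(channel):
        destinations[position % m].append(item)
    return destinations


lun1 = ["A1-H1", "A2-H2", "A3-H3", "A4-H4"]
lun2 = ["A5-H5", "A6-H6", "A7-H7", "A8-H8"]

# n = 2 input LUNs, m = 3 output LUNs
restriped = demultiplex(multiplex([lun1, lun2]), m=3)
for number, stripes in enumerate(restriped, start=1):
    print(f"LUN{number}: {stripes}")
```

Under these assumptions the single channel carries the stripes in the order A1, A5, A2, A6, A3, A7, A4, A8, and the destinations receive them in the same sequence the figures describe: A1-H1 to LUN1, A5-H5 to LUN2, and so on through A8-H8 to LUN2.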

The restriping utility illustrated in FIGS. 14-21 can be reversed by executing essentially the same process with n = 3 and m = 2. Starting from data stored in the state shown in FIG. 21, FIG. 22 begins the restoration process by storing data stripe A1-H1 in LUN1. In FIG. 23 the multiplex-demultiplex circuit cycles sequentially to store data stripe A5-H5 in LUN2. FIG. 24 illustrates the circuit next storing data stripe A2-H2 in LUN1; in FIG. 25 data stripe A6-H6 is stored in LUN2; in FIG. 26 data stripe A3-H3 is stored in LUN1; in FIG. 27 data stripe A7-H7 is stored in LUN2; and in FIG. 28 data stripe A4-H4 is stored in LUN1. Finally, in FIG. 29 data stripe A8-H8 is stored in LUN2.
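The reversal can be checked with a compact index-arithmetic sketch. The function name `restripe` and the list-of-labels representation are assumptions made for illustration. The sketch also assumes the source LUNs were themselves filled in round-robin order, so that their lengths differ by at most one and the longer ones come first, which is the state described for FIG. 21.

```python
def restripe(sources, m):
    """Move stripes from n source LUNs to m destination LUNs in round-robin order.

    Step k reads position k // n of source k % n and appends it to
    destination k % m. Valid only when the sources were filled round-robin.
    """
    n = len(sources)
    total = sum(len(source) for source in sources)
    destinations = [[] for _ in range(m)]
    for k in range(total):
        destinations[k % m].append(sources[k % n][k // n])
    return destinations


# State after the two-to-three restripe (FIG. 21), written as stripe labels.
three_luns = [
    ["A1-H1", "A6-H6", "A4-H4"],
    ["A5-H5", "A3-H3", "A8-H8"],
    ["A2-H2", "A7-H7"],
]

two_luns = restripe(three_luns, m=2)  # n = 3, m = 2
print(two_luns)
```

Running the same function on `two_luns` with `m=3` reproduces `three_luns`, which is one way to see that the forward and reverse operations are the same process with n and m exchanged.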

FIG. 30 is a flowchart of a method 250 for restriping in accordance with embodiments of the present invention. The method 250 begins in block 252 by selecting the number of input nodes, n, and the number of output nodes, m, for the multiplex-demultiplex operation. For example, for restriping data from two LUNs to three LUNs as in FIGS. 14-21, n = 2 and m = 3. In block 254 data is synchronously encoded from an input node and decoded to an output node. In block 256 it is determined whether the last data stripe has been restriped to the newly configured logical volumes. If the determination of block 256 is yes, the method 250 ends; otherwise, control passes to block 258. In block 258 the multiplex and demultiplex operations are each advanced sequentially, in a loop, to their respective next nodes, and control returns to block 254, where data is encoded and decoded at the next nodes.
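The control flow of method 250 can be traced in code with one comment per flowchart block. This is a hedged sketch: the function name `method_250`, the per-node read cursors, and the rule for skipping drained input nodes are assumptions added to make the loop runnable, and they are not taken from the flowchart itself.

```python
def method_250(input_nodes, m):
    """Restripe the stripes held in n input nodes onto m output nodes."""
    # Block 252: select the number of input nodes n and output nodes m.
    n = len(input_nodes)
    output_nodes = [[] for _ in range(m)]
    cursors = [0] * n  # position of the next unread stripe in each input node
    remaining = sum(len(node) for node in input_nodes)
    if remaining == 0:
        return output_nodes
    i = o = 0  # current input node and current output node
    while cursors[i] >= len(input_nodes[i]):  # start at the first non-empty input
        i = (i + 1) % n
    while True:
        # Block 254: encode one stripe from input node i, decode it to output node o.
        output_nodes[o].append(input_nodes[i][cursors[i]])
        cursors[i] += 1
        remaining -= 1
        # Block 256: has the last data stripe been restriped?
        if remaining == 0:
            return output_nodes
        # Block 258: advance both operations to their next nodes, wrapping around.
        i = (i + 1) % n
        while cursors[i] >= len(input_nodes[i]):  # assumption: skip drained inputs
            i = (i + 1) % n
        o = (o + 1) % m


lun1 = ["A1-H1", "A2-H2", "A3-H3", "A4-H4"]
lun2 = ["A5-H5", "A6-H6", "A7-H7", "A8-H8"]
result = method_250([lun1, lun2], m=3)
print(result)
```

Because the input and output pointers wrap independently (modulo n and modulo m), the input side returns to its first node before the output side has visited every destination whenever n is smaller than m.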

Finally, FIG. 31 is a view similar to FIG. 4, but with the plurality of data storage devices 128 and the circuit board 134 contained within a sealed enclosure made from a base 190 and a cover 192 sealingly attached thereto. Sealingly enclosing the data storage devices 128 that form the MDA 118A gives the user numerous advantages, including the guarantee that the arrangement of the data storage devices 128 is not altered from a preselected optimal arrangement. Such an arrangement also permits the MDA 118A manufacturer to tune the system for optimal performance, because the number, size, and type of the data storage devices 128 can be clearly defined.

The sealed MDA 118A also allows the manufacturer to maximize the reliability and fault tolerance of the group of storage media within. This is done by optimizing the drives for a multiple spindle arrangement. Design optimization reduces cost, increases performance, increases reliability, and generally extends the life of the data within the MDA 118A. Furthermore, the design of the MDA 118A itself provides an environment of almost zero rotational vibration and high cooling efficiency, which is the subject matter of co-pending U.S. patent application Ser. No. 11/145,404, entitled "Enhanced RVI Storage Array" and assigned to the assignee of the present application. This allows the storage media within to be manufactured to less costly standards without compromising the reliability, performance, or capacity of the MDA 118. The sealed MDA 118A thus has no single point of failure and provides nearly complete rotational vibration avoidance and cooling efficiency. This allows the MDA 118A to be designed for optimal disk media characteristics while reducing cost and, at the same time, increasing reliability and performance.

In summary, a self-contained ISE for a distributed storage system is provided that includes a plurality of rotatable spindles, each supporting a storage medium adjacent a respective independently movable actuator in a data storing and retrieving relationship with the storage medium. The ISE further includes an ISP adapted for mapping a virtual storage volume to the plurality of media for use by a remote device of the distributed storage system.

In some embodiments, the ISE has the plurality of spindles and media contained within a common sealed housing. The ISP preferably allocates memory in the virtual storage volume for storing data in a fault-tolerant manner, such as by a RAID methodology. The ISP can further carry out managed-reliability methodologies in the data storage process, such as initiating in-situ deterministic preventive recovery steps in response to an observed predicted storage failure. The ISE is preferably made up of a plurality of data storage devices, each having a disk stack of two or more disks of data storage media.

In other embodiments, the ISE is contemplated as a distributed storage system that includes a self-contained plurality of discrete data storage devices and an ISP that communicates with the data storage devices and is adapted to abstract a command received from a remote device and associate the related memory accordingly. The ISP is preferably adapted to map a virtual storage volume to the plurality of data storage devices for use by one or more remote devices of the distributed storage system. As before, the plurality of data storage devices and media can be contained within a common sealed housing. The ISP preferably allocates memory in the virtual storage volume for storing data in a fault-tolerant manner, such as by a RAID methodology. The ISP can further initiate in-situ deterministic preventive recovery steps in the data storage devices in response to an observed predicted storage failure.

In yet other embodiments, a distributed storage system is provided that includes a host, a back-end storage subsystem that communicates with the host over a network, and a device that virtualizes a self-contained storage capacity independently of the host.

The virtualization device can be characterized by a plurality of discrete, individually accessible data storage devices. The virtualization device can be characterized by mapping virtual blocks of the storage capacity associated with the plurality of data storage devices. The virtualization device can be characterized by sealingly containerizing the plurality of data storage devices and the associated controls. The virtualization device can be characterized by storing data in a fault-tolerant manner, such as, without limitation, by a RAID methodology. The virtualization device can be characterized by initiating in-situ deterministic preventive recovery steps in response to an observed predicted storage failure. The virtualization device can be characterized by a multiple-spindle data storage array.

For purposes of this specification, the term "virtualization device" expressly does not contemplate previously attempted solutions in which the system intelligence that maps the data storage space resides anywhere other than within the respective data storage subsystem. For example, "virtualization device" does not contemplate the use of a storage manager to control the functions of the data storage subsystem, nor does it contemplate placing a manager or switch within the SAN fabric or within the host.

The present embodiments further contemplate a data storage device having a software system, resident in a memory space, that is configured to encode data read from a first number of logical units into a single channel in order to store the data in a second number of logical units. The software system is preferably configured to decode the encoded data before storing the data in the second number of logical units.

For example, the software system can be configured to encode the data by multiplexing the data originally striped across the first number of logical units. In that case, the software system can be configured to demultiplex the encoded data in order to restripe the data across the second number of logical units. The software system preferably loops synchronously through data multiplexing and data demultiplexing over one or more input sources and one or more output destinations, respectively.
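
The multiplexing and demultiplexing described above can be illustrated with a minimal sketch. It assumes that each logical unit is modeled as a Python list of stripe units and that striping is round-robin; the function names `multiplex`, `demultiplex`, and `restripe` are hypothetical and are not taken from this specification.

```python
from typing import List, Sequence


def multiplex(luns: Sequence[Sequence[str]]) -> List[str]:
    """Encode: interleave stripe units from the source LUNs into one channel.

    Assumes round-robin striping, i.e. LUN i of n holds stripe units
    i, i + n, i + 2n, ... of the original data.
    """
    channel: List[str] = []
    depth = max((len(lun) for lun in luns), default=0)
    for row in range(depth):
        for lun in luns:
            if row < len(lun):  # the last row may be only partly filled
                channel.append(lun[row])
    return channel


def demultiplex(channel: Sequence[str], count: int) -> List[List[str]]:
    """Decode: deal the single channel round-robin onto `count` LUNs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    luns: List[List[str]] = [[] for _ in range(count)]
    for index, unit in enumerate(channel):
        luns[index % count].append(unit)
    return luns


def restripe(luns: Sequence[Sequence[str]], count: int) -> List[List[str]]:
    """Restripe by encoding to a single channel, then decoding."""
    return demultiplex(multiplex(luns), count)


# Data originally striped across two LUNs (illustrative labels only).
two_luns = [["A0", "A2", "A4"], ["A1", "A3", "A5"]]
three_luns = restripe(two_luns, 3)
print(three_luns)  # [['A0', 'A3'], ['A1', 'A4'], ['A2', 'A5']]
```

In this sketch the single channel is simply the data in its original logical order, so the source and destination LUN counts are free to differ.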

The first number of logical units and the second number of logical units can differ; that is, in some cases the second number is greater than the first number, and in other cases the first number is greater than the second number.

The software system can also restore the original striping of the data on the first number of logical units by multiplexing the restriped data from the second number of logical units.
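
As a hedged illustration of restoring the original striping, the sketch below restripes from two logical units to three and back again. It assumes round-robin striping and models each logical unit as a list; the `restripe` helper is a hypothetical name, not the specification's implementation.

```python
from itertools import chain, zip_longest
from typing import List, Sequence

_MISSING = object()  # sentinel for slots past the end of a shorter LUN


def restripe(luns: Sequence[Sequence[str]], count: int) -> List[List[str]]:
    """Multiplex the source LUNs into one channel, then demultiplex it."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Multiplex: read row by row across the source LUNs.
    rows = zip_longest(*luns, fillvalue=_MISSING)
    channel = [u for u in chain.from_iterable(rows) if u is not _MISSING]
    # Demultiplex: deal the channel round-robin onto the destination LUNs.
    return [channel[i::count] for i in range(count)]


original = [["D0", "D2", "D4"], ["D1", "D3"]]  # five units on two LUNs
expanded = restripe(original, 3)               # restriped onto three LUNs
restored = restripe(expanded, 2)               # back onto two LUNs

print(expanded)              # [['D0', 'D3'], ['D1', 'D4'], ['D2']]
print(restored == original)  # True
```

Because the same encode/decode path is used in both directions, restoring needs no separate backup copy of the data in this model.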

In some embodiments, the software system can reside in an intelligent storage element of a distributed storage system.

In other embodiments, a method is provided for encoding data on a first number of logical units and decoding the encoded data onto a second number of logical units. The encoding step is characterized by multiplexing the data originally striped across the first number of logical units, and the decoding step is characterized by demultiplexing the encoded data in order to restripe the data across the second number of logical units. The encoding and decoding steps are characterized by looping synchronously through data multiplexing and data demultiplexing over one or more input sources and one or more output destinations, respectively. The method can also contemplate restoring the original striping of the data on the first number of logical units by multiplexing the restriped data from the second number of logical units.
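
One way to picture the synchronous looping is a streaming sketch in which the encoder loops over the input sources and the decoder loops over the output destinations, one stripe unit at a time, so the whole channel is never buffered. This is an assumption-laden illustration: the names `encode` and `decode`, the use of Python generators, and the round-robin layout are all hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator, List, Sequence


def encode(sources: Sequence[Iterable[bytes]]) -> Iterator[bytes]:
    """Loop over the input sources in turn, yielding one unit per visit.

    Assumes round-robin striping, so the first exhausted source marks
    the end of the data.
    """
    iterators = [iter(source) for source in sources]
    while iterators:
        for iterator in iterators:
            try:
                yield next(iterator)
            except StopIteration:
                return


def decode(channel: Iterable[bytes], destinations: List[List[bytes]]) -> None:
    """Loop over the output destinations in turn, storing one unit per visit."""
    if not destinations:
        raise ValueError("at least one destination is required")
    for unit, destination in zip(channel, cycle(destinations)):
        destination.append(unit)


lun1 = [b"a0", b"a2", b"a4", b"a6"]
lun2 = [b"a1", b"a3", b"a5"]
outputs: List[List[bytes]] = [[], [], []]

# Each unit pulled from a source is written to a destination before the
# next unit is read, so reading and writing advance in lockstep.
decode(encode([lun1, lun2]), outputs)
print(outputs)
# [[b'a0', b'a3', b'a6'], [b'a1', b'a4'], [b'a2', b'a5']]
```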

In other embodiments, a data storage system is provided that has an intelligent storage element with a memory space, and a device for restriping data across logical units of the memory space. For purposes of this description and the appended claims, the phrase "restriping device" expressly requires an encoding/decoding operation as described above, such as, but not limited to, the multiplexing and demultiplexing operations described above. The phrase "restriping device" expressly does not include previously attempted solutions that involve backing up the originally stored data and then restoring it to a new stripe arrangement.

Although numerous characteristics and advantages of various embodiments of the present invention have been set forth in the foregoing description, together with details of the structure and function of those embodiments, this detailed description is illustrative only. It is to be understood that changes may be made in detail, especially in matters of structure and arrangement of parts, within the principles of the present invention, to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, particular elements may vary depending on the particular processing environment without departing from the scope and spirit of the present invention.

In addition, although the embodiments described herein are directed to a data storage array, those skilled in the art will appreciate that the claimed subject matter is not so limited and that various other processing systems can be utilized without departing from the scope and spirit of the claimed invention.

A diagrammatic representation of a computer system in which embodiments of the present invention are useful.
A simplified diagrammatic representation of the computer system of FIG. 1.
An exploded isometric view of an intelligent storage element constructed in accordance with embodiments of the present invention.
A partially exploded isometric view of a multiple disk array of the intelligent storage element of FIG. 3.
An exemplary data storage device used in the multiple disk array of FIG. 4.
A functional block diagram of the intelligent storage element of FIG. 3.
A functional block diagram of an intelligent storage processor circuit board of the intelligent storage element of FIG. 3.
A functional block diagram of an intelligent storage processor of the intelligent storage element of FIG. 3.
A functional block diagram representation of the command abstraction and associated memory mapping services performed by the intelligent storage element of FIG. 3.
A functional block diagram of other exemplary data services performed by the intelligent storage element of FIG. 3.
A view similar to FIG. 6, but following a host command to allocate new storage space.
A matrix representation of the data stored in LUN1 and LUN2 in FIG. 6.
A diagrammatic representation of a software system configured to operate in accordance with embodiments of the present invention.
Eight sequential diagrammatic representations of the successive synchronous multiplexing and demultiplexing associated with restriping, across three LUNs, data originally stored in two LUNs.
Eight sequential diagrammatic representations of restoring the data from three LUNs to two LUNs.
A flowchart illustrating steps of a method of restriping in accordance with embodiments of the present invention.
An exploded isometric view similar to FIG. 4, but with the data storage devices and circuit board contained within a sealed enclosure.

Explanation of symbols

108 ISE
109 Data storage
112 Controller
118 MDA
128 Data storage device
150 ISP
154, 156 ISP subsystems
164 Policy processor
170-180 Function controllers
182, 184 List managers
186 CSP
199 Software system
206 Multiplexing operation
208 Demultiplexing operation

Claims

1. A data storage device comprising a software system, resident in a memory space, that is configured to encode data retrieved from a first number of logical units into a single channel in order to store the data in a second number of logical units.

2. The apparatus of claim 1, wherein the software system is configured to decode the encoded data before storing the data in the second number of logical units.

3. The apparatus of claim 2, wherein the software system is configured to multiplex data originally striped across the first number of logical units in order to encode the data.

4. The apparatus of claim 3, wherein the software system is configured to demultiplex the encoded data in order to restripe the data across the second number of logical units.

5. The apparatus of claim 1, wherein the first number and the second number are different.

6. The apparatus of claim 5, wherein the second number is greater than the first number.

7. The apparatus of claim 5, wherein the first number is greater than the second number.

8. The apparatus of claim 4, wherein the software system is configured to restore the original striping of the data on the first number of logical units by multiplexing the restriped data from the second number of logical units.

9. The apparatus of claim 4, wherein the software system is configured to continuously synchronize data multiplexing and data demultiplexing on each of one or more input sources and one or more output destinations.

10. The apparatus of claim 9, wherein the software system is configured to loop serially between the input sources and the output destinations during data multiplexing and data demultiplexing, respectively.

11. The apparatus of claim 10, wherein the software system is resident in an intelligent storage element of a distributed storage system.

12. A method comprising:
encoding data striped across a first number of logical units into a single channel; and
decoding the encoded data onto a second number of logical units.

13. The method of claim 12, wherein the encoding step multiplexes the data originally striped across the first number of logical units.

14. The method of claim 13, wherein the decoding step demultiplexes the encoded data in order to restripe the data across the second number of logical units.

15. The method of claim 12, wherein the first number is different from the second number.

16. The method of claim 15, wherein the first number is greater than the second number.

17. The method of claim 15, wherein the first number is less than the second number.

18. The method of claim 14, further comprising restoring the original striping of the data on the first number of logical units by multiplexing the restriped data from the second number of logical units.

19. The method of claim 14, wherein the encoding step and the decoding step loop synchronously through data multiplexing and data demultiplexing over one or more input sources and one or more output destinations, respectively.

20. A data storage device comprising:
an intelligent storage element having a memory space; and
a device for restriping data across logical units of the memory space.