JPWO2013093994A1

JPWO2013093994A1 - Storage system, data rebalancing program, and data rebalancing method

Info

Publication number: JPWO2013093994A1
Application number: JP2013549973A
Authority: JP
Inventors: 敬司桑山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-12-19
Filing date: 2011-12-19
Publication date: 2015-04-27
Anticipated expiration: 2031-12-19
Also published as: US20140297950A1; WO2013093994A1; JP6056769B2; US9703504B2

Abstract

A storage system includes: a plurality of storage devices that store data; a cache memory that holds data; an access control unit that, upon receiving from an information processing terminal an access request to read or write target data, accesses one of the storage devices and stores the target data in the cache memory; and a writing unit that writes the target data stored in the cache memory to a storage device, among the plurality of storage devices, in which the target data is not stored.

Description

The technology described in this specification relates to a storage system.

A storage system includes a plurality of storage devices and a server device that controls them. In a scale-out storage system, data rebalancing (data relocation) is performed between storage devices to keep data evenly distributed when a storage device is added. In many cases, data rebalancing copies data that is unevenly concentrated on one storage device to another storage device. Network bandwidth and system resources are therefore consumed while data rebalancing is in progress.

Regarding load distribution for logical volumes, the following technique is known. In a storage device, a performance measurement mechanism measures the load status of a logical volume based on the amount of data transferred by a data transfer mechanism and the amount of commands processed. Based on the measurement result, a copy mechanism copies the contents of a logical volume set in a physical volume to a logical volume set in a spare physical volume. Holding multiple copies of the data distributes the data access load.

JP 2006-053601 A; JP 2007-149068 A; JP 2005-284632 A; JP 2006-99748 A; JP 2004-5634 A; JP 2007-72538 A

In ordinary data rebalancing, data stored in an existing storage device is copied to the added storage device, which avoids a concentration of access on a particular storage device. This copy uses the network bandwidth between the storage devices. Because copying data between storage devices loads the network, one option is to perform the copy during a period when the load on the storage devices is low.

However, a storage device used for storage consolidation is shared by multiple systems, so there may be no period in which the network load is low. In addition, a data copy consumes a large share of the network bandwidth at once, which can make the copy an I/O bottleneck between the storage devices and the servers.

In one aspect, the present invention provides a technique for making data rebalancing in a storage system more efficient.

The storage system includes a plurality of storage devices, a cache memory, an access control unit, and a writing unit. The storage devices store data. The cache memory holds data. When an information processing terminal issues an access request to read or write target data, the access control unit accesses one of the storage devices and stores the target data in the cache memory. The writing unit writes the target data stored in the cache memory to a storage device, among the plurality of storage devices, in which the target data is not stored.

According to one embodiment, data rebalancing in a storage system can be made more efficient.

FIG. 1 shows the configuration of the storage system in this embodiment.
FIG. 2 shows the overall configuration of the system in this embodiment.
FIG. 3 shows a block diagram of the volume management server 15 and the I/O server 18.
FIG. 4 shows an example of the replica table 31.
FIG. 5 shows an example of the write hold table 32.
FIG. 6 shows an example of the I/O periodic table 33.
FIG. 7 shows an example of the read/write access load correspondence table 34.
FIG. 8 shows an example of the read/write disk access load correspondence table 35.
FIG. 9 shows an example of the replica completeness table 36.
FIG. 10 shows an example of the disk I/O periodic table 37.
FIG. 11 shows an example of the read/write position access load correspondence table 38.
FIG. 12 shows an example of the storage capacity table 39.
FIG. 13 shows an example of the load statistical information 40.
FIG. 14 shows the data read flow in this embodiment.
FIG. 15 shows the data write flow in this embodiment.
FIG. 16 shows the replica creation processing flow (part 1) in this embodiment.
FIG. 17 shows the replica creation processing flow (part 2) in this embodiment.
FIG. 18 shows the replica deletion processing flow (part 1) in this embodiment.
FIG. 19 shows the replica deletion processing flow (part 2) in this embodiment.
FIG. 20 shows the predictive replica creation processing flow that uses the I/O periodic table.
FIG. 21 shows a work table (A) that totals, for each hour, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for storage device A, and the corresponding graph (B).
FIG. 22 shows a work table (A) that totals, for each hour, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for storage device B, and the corresponding graph (B).
FIG. 23 shows a work table (A) that totals, for each hour, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for storage device C, and the corresponding graph (B).
FIG. 24 shows the hourly averages of the access frequencies in the work tables (A) of FIGS. 21 to 23, and the corresponding graph (B).
FIG. 25 shows a work table (A) that totals, for each minute, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for disk B1 of storage device B, and the corresponding graph (B).
FIG. 26 shows a work table (A) that totals, for each minute, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for disk B2 of storage device B, and the corresponding graph (B).
FIG. 27 shows a work table (A) that totals, for each minute, the access frequency obtained by adding the read frequency and the write frequency in the I/O periodic table 33 for disk B3 of storage device B, and the corresponding graph (B).
FIG. 28 shows the per-minute averages of the access frequencies in the work tables (A) of FIGS. 25 to 27, and the corresponding graph (B).

The logical volume load distribution technique described above distributes load by analyzing access frequency and independently creating replicas and copies of part of the data. In this case, however, bandwidth for copying is required in addition to the input/output (I/O) used by business operations, and the copy may affect those operations. Furthermore, the performance of a logical volume can be expanded only at the initial setup when the logical volume is defined; on-demand expansion is not possible. When multiple mirrors have been created and are written to, the mirrors are broken up if the write load increases. Breaking up mirrors because of a temporary load, however, can itself degrade performance. In addition, reading data uses network bandwidth, which puts pressure on the network bandwidth between the storage devices and the clients. For these reasons, the logical volume load distribution technique described above cannot be used in a large-scale storage system.

This embodiment therefore aims to make data rebalancing more efficient when a storage device is added or when the load becomes unbalanced during operation. When copies of data are created on multiple storage devices, access requests from information processing terminals are used efficiently to improve the I/O performance of the storage system.

FIG. 1 shows the configuration of the storage system in this embodiment. The storage system 1 in this embodiment includes a plurality of storage devices 2, a cache memory 3, an access control unit 4, and a writing unit 5.

The storage device 2 stores data. An example of the storage device 2 is the storage device 25. The cache memory 3 holds data. An example of the cache memory 3 is the cache memory 18d.

When the information processing terminal 9 issues an access request to read or write target data, the access control unit 4 accesses one of the storage devices 2 and stores the target data in the cache memory 3. An example of the access control unit 4 is the I/O server 18.

The writing unit 5 writes the target data stored in the cache memory 3 to a storage device 2, among the plurality of storage devices 2, in which the target data is not stored. An example of the writing unit 5 is the I/O server 18.
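The interplay of the cache memory 3, the access control unit 4, and the writing unit 5 can be illustrated with a minimal Python sketch. The class names `StorageDevice` and `IOServer`, the dictionary-based block store, and the method names are assumptions made for illustration only; they are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical stand-in for a storage device 2."""
    name: str
    blocks: dict = field(default_factory=dict)  # position -> data


class IOServer:
    """Hypothetical model of the access control unit 4 and the writing unit 5."""

    def __init__(self, devices):
        self.devices = devices
        self.cache = {}  # position -> data; stands in for the cache memory 3

    def read(self, position):
        # Access a device that already stores the target data.
        source = next(d for d in self.devices if position in d.blocks)
        data = source.blocks[position]
        self.cache[position] = data  # keep the data for later replication
        return data

    def write(self, position, data, target):
        # Write to one device and keep the data in the cache as well.
        target.blocks[position] = data
        self.cache[position] = data

    def flush_to_missing(self):
        """Write cached data to every device that does not yet store it."""
        copied = 0
        for position, data in self.cache.items():
            for device in self.devices:
                if position not in device.blocks:
                    device.blocks[position] = data
                    copied += 1
        self.cache.clear()
        return copied
```

In this sketch the data reaches the second device from the cache, so no separate device-to-device copy is issued; the client request that filled the cache is reused.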

This configuration makes rebalancing between storage devices more efficient. That is, when a storage device is added, access requests from the information processing terminal 9 are put to effective use, so that rebalancing between storage devices can be performed efficiently without placing an additional load on the network.

The storage system further includes a load monitoring unit 6. The load monitoring unit 6 monitors the access load that reads and writes place on the storage devices 2. An example of the load monitoring unit 6 is the load monitoring unit 17.

When the information processing terminal 9 issues an access request, the access control unit 4 accesses, based on the monitoring result, the storage device 2 with the lowest access load and stores the target data in the cache memory 3.

When the information processing terminal 9 issues a request to read target data, the access control unit 4 obtains the target data, based on the monitoring result, from the storage device with the lowest access load among the storage devices 2 that store the target data. The access control unit 4 stores the target data in the cache memory 3 and sends the target data to the information processing terminal 9.

When the information processing terminal 9 issues a request to write target data, the access control unit 4 writes the target data, based on the monitoring result, to the storage device with the lowest access load among the storage devices 2, and stores the target data in the cache memory 3.

The writing unit 5 writes the target data stored in the cache memory 3 to the storage devices 2 other than the storage device 2 to which the data was written.

With this configuration, the target data can be written, based on the monitoring result, to the storage device 2 with the lowest access load.
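The load-based selection for reads and writes might look like the following sketch. The function names and the representation of the monitoring result as a dictionary from device name to a numeric load are assumptions for illustration.

```python
def pick_read_device(loads, holders):
    """Return the least-loaded device among those that store the target data.

    loads:   device name -> current access load (assumed monitoring result)
    holders: names of the devices that store the target data
    """
    return min(holders, key=lambda name: loads[name])


def pick_write_devices(loads):
    """Return (immediate_target, deferred_targets) for a write request.

    The least-loaded device is written first; the remaining devices would be
    written later from the cache.
    """
    ordered = sorted(loads, key=lambda name: loads[name])
    return ordered[0], ordered[1:]
```

For example, with loads `{"A": 30, "B": 10, "C": 20}`, a read of data held on A and C goes to C, while a write goes to B first and is deferred for C and A.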

The storage system 1 further includes a replica deletion unit 7. When the write access load on the replicas of a virtual volume, which virtualizes volumes of the plurality of storage devices 2, exceeds a threshold and three or more replicas of the virtual volume exist on the plurality of storage devices 2, the replica deletion unit 7 deletes the replica stored in one of the storage devices 2. Alternatively, when the free capacity of a storage device 2 is below a first threshold during replica creation and three or more replicas of the virtual volume exist on the plurality of storage devices 2, the replica deletion unit 7 deletes the replica stored in one of the storage devices 2.

With this configuration, when the information processing terminal issues write requests frequently, or when the free capacity of a storage device runs low, one replica of a virtual volume that has three or more replicas can be deleted.
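The two deletion triggers can be expressed as a single predicate. This is a sketch only; the function and parameter names are hypothetical, and the units of load and capacity are left unspecified as in the text.

```python
def should_delete_replica(replica_count, write_load, write_threshold,
                          free_capacity, first_threshold):
    """Return True if one replica of the virtual volume may be deleted.

    Deletion is considered only when three or more replicas exist, and then
    only if the write access load exceeds its threshold or the free capacity
    has fallen below the first threshold.
    """
    if replica_count < 3:
        return False
    return write_load > write_threshold or free_capacity < first_threshold
```

Requiring three or more replicas means that at least two replicas remain after a deletion.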

Based on the monitoring result, the replica deletion unit 7 selects the replica on the storage device 2 with the lowest access load among the storage devices 2 that hold a replica. Using access frequency information, which indicates the access frequency for each storage device 2 totaled for each time of day, the replica deletion unit 7 deletes the selected replica if it determines that the selected replica will receive no read access whose frequency exceeds a second threshold from the current time onward.

With this configuration, when access exceeding the threshold continues for a certain period after a write, the replica of the virtual volume can be excluded from deletion.
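A possible reading of this selection step is sketched below. The function name, the integer hour keys, and the shape of the access frequency information are assumptions made for illustration.

```python
def choose_replica_to_delete(current_load, forecast_reads, now, second_threshold):
    """Pick the replica on the least-loaded device, unless reads are expected.

    current_load:   device name -> current access load (monitoring result)
    forecast_reads: device name -> {hour: read frequency}, assumed to come
                    from the per-time access frequency information
    now:            current hour
    Returns the device whose replica should be deleted, or None.
    """
    candidate = min(current_load, key=current_load.get)
    upcoming = [freq for hour, freq in forecast_reads[candidate].items()
                if hour >= now]
    if any(freq > second_threshold for freq in upcoming):
        return None  # heavy read access is expected, so keep the replica
    return candidate
```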

The storage system 1 further includes a replica creation unit 8. The replica creation unit 8 performs the following processing when, among the per-time access loads on one of the storage devices 2, the difference between the access load in a given time slot and the average access load on all the storage devices 2 in that time slot exceeds a third threshold. First, the replica creation unit 8 subdivides the time slot. It then determines whether the difference between the access load on a disk of that storage device 2 in one of the subdivided time slots and the average access load on all disks of that storage device 2 in the same subdivided time slot is smaller than a fourth threshold. If the difference is smaller than the fourth threshold, the replica creation unit 8 creates a replica on the disk with the highest access load, at a time that precedes the subdivided time slot and at which the access load on that storage device does not exceed a fifth threshold.

With this configuration, for a storage device in which the access load on each individual disk is low but the load on the storage device as a whole is high, a replica can be created on the disk with the highest access load.
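The three threshold checks can be traced with the following sketch, which follows a literal reading of the description. The function name, the input layout, the use of minutes as the subdivided time slots, and the use of the summed disk loads as the device load are all assumptions; the specification's flowcharts may differ in detail.

```python
from statistics import mean


def plan_replica(device_hourly, disk_minutely, device, hour, th3, th4, th5):
    """Return (disk, start_minute) for a predictive replica, or None.

    device_hourly: device name -> {hour: access load}
    disk_minutely: disk name -> {minute: access load} for `device` in `hour`
    """
    # Step 1: compare the device's load with the average over all devices.
    average = mean(loads[hour] for loads in device_hourly.values())
    if device_hourly[device][hour] - average <= th3:
        return None
    minutes = sorted(next(iter(disk_minutely.values())))
    for index, minute in enumerate(minutes):
        per_disk = {disk: loads[minute] for disk, loads in disk_minutely.items()}
        busiest = max(per_disk, key=per_disk.get)
        # Step 2: the load is spread over the disks, not caused by one disk.
        if per_disk[busiest] - mean(per_disk.values()) >= th4:
            continue
        # Step 3: find an earlier minute in which the device is quiet enough.
        for earlier in reversed(minutes[:index]):
            device_load = sum(loads[earlier] for loads in disk_minutely.values())
            if device_load <= th5:
                return busiest, earlier
    return None
```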

The storage system in this embodiment has two or more storage devices. The following description assumes two storage devices, A and B. A disk in storage device A is assumed to be mirrored with a disk in storage device B (that is, there are two replicas). The I/O server presents these two disks as a single virtual disk.

The mirrored state is maintained so that data is not lost and operation does not stop even if a disk or a storage device fails. Multiple client devices are assumed to access the storage, as in a cloud system.

FIG. 2 shows the overall configuration of the system in this embodiment. In this system, a client device 11 and a storage system 14 are connected by a LAN (Local Area Network) 13. The client device 11 is connected to the LAN 13 via a communication interface 12 (hereinafter, an interface is referred to as an "IF"). The LAN 13 is only an example; the network may be the Internet, an intranet, or another network.

The storage system 14 includes a volume management server 15, an I/O server 18, storage devices 25, a management LAN 27, and an I/O LAN 28.

The volume management server 15 controls the entire storage system 14 and manages the virtual volumes 21 used by the I/O server 18. The volume management server 15 includes a control unit 15a, a management LAN IF 15b, and a storage unit 15c. The control unit 15a controls the operation of the volume management server 15 and is, for example, a CPU (central processing unit). The management LAN IF 15b is a communication interface for connecting to the management LAN 27. The storage unit 15c stores a program that operates the control unit 15a, the program according to this embodiment, the tables described later, and so on. The storage unit 15c may be, for example, a ROM (read-only memory), a RAM (random access memory), a hard disk, or a flash memory.

The I/O server 18 controls input from and output to the client device 11. The I/O server 18 includes an external communication IF 18a, a control unit 18b, a storage unit 18c, a cache memory 18d (hereinafter, the "cache"), a storage communication IF 18e, and a management LAN IF 18f. The external communication IF 18a is a communication interface for connecting to the LAN 13. The control unit 18b controls the operation of the I/O server 18 and controls the virtual volumes obtained by virtualizing the storage disks; it is, for example, a CPU (central processing unit). The storage unit 18c stores a program that operates the control unit 18b, the program according to this embodiment, the tables described later, the virtual volumes, and so on. The storage unit 18c may be, for example, a ROM (read-only memory), a RAM (random access memory), a hard disk, or a flash memory. The storage communication IF 18e is a communication interface for connecting to the I/O LAN 28. The management LAN IF 18f is a communication interface for connecting to the management LAN 27.

The storage device 25 includes a plurality of disks 26. The storage device 25 is connected to the management LAN 27 and the I/O LAN 28. The management LAN 27 is a network used among the volume management server 15, the I/O server 18, and the storage devices 25 for purposes such as issuing commands and monitoring load. The I/O LAN 28 is a network for transferring data stored in, or to be stored in, the storage devices 25. The management LAN 27 and the I/O LAN 28 are not limited to LANs and may be, for example, a SAN (storage area network) or another network.

A program that implements the processing described in the following embodiment may be supplied by the program provider over the LAN 13 and stored in, for example, the storage units 15c and 18c. The program may also be stored on a commercially available, distributed portable storage medium. In that case, the portable storage medium may be set in a reading device (not shown) of the storage system 14, and the control units 15a and 18b may read and execute the program. Various types of portable storage media can be used, such as a CD-ROM, a flexible disk, an optical disc, a magneto-optical disk, an IC card, and a USB memory device. The reading device reads the program stored on such a storage medium.

FIG. 3 shows a block diagram of the volume management server 15 and the I/O server 18. The volume management server 15 includes a volume management unit 16 and a load monitoring unit 17. The volume management unit 16 manages the virtual volumes 21 of each I/O server 18. The load monitoring unit 17 monitors the load information of each storage device 25 and the load on the volumes in the storage devices 25.

The volume management unit 16 includes a replica table 31, a write hold table 32, an I/O periodic table 33, a read/write access load correspondence table 34, and a read/write disk access load correspondence table 35. These tables are stored in the storage unit 15c of the volume management server 15.

The I/O server 18 includes an I/O management unit 20 and a virtual volume 21. The I/O management unit 20 controls access from the client device 11 to the virtual volume 21 and output from the virtual volume 21 to the client device 11. The virtual volume 21 is stored in the storage unit 18c of the I/O server 18.

The I/O management unit 20 includes a replica completeness table 36, a disk I/O periodic table 37, and a read/write position access load correspondence table 38. These tables are stored in the storage unit 18c of the I/O server 18.

FIG. 4 shows an example of the replica table 31. The replica table 31 indicates on which storage devices 25 replicas of a virtual volume 21 have been created and what state each replica is in.

The replica table 31 includes the data items "virtual volume ID", "storage ID 1", "storage ID 2", and so on. "Virtual volume ID" stores identification information that identifies a virtual volume 21. "Storage ID 1", "storage ID 2", and so on hold the degree of completion of the replica on the storage device 25 corresponding to the virtual volume 21. When the degree of completion is 100%, "Created" is set. When the degree of completion is greater than 0% and less than 100%, "Copying" is set. When the degree of completion is 0%, "None" is set. When the replica is being deleted, "Deleting" is set.
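The mapping from degree of completion to the value stored in the replica table 31 can be written as a small function. The function name and the `deleting` flag are hypothetical; only the four status strings come from the text.

```python
def replica_status(completeness_percent, deleting=False):
    """Map a replica's degree of completion to its replica-table status."""
    if deleting:
        return "Deleting"
    if completeness_percent >= 100:
        return "Created"
    if completeness_percent > 0:
        return "Copying"
    return "None"


# Hypothetical row of the replica table for one virtual volume.
row = {"storage ID 1": replica_status(100), "storage ID 2": replica_status(40)}
```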

FIG. 5 shows an example of the write hold table 32. The write hold table 32 manages write-target data held in the cache 18d. It includes the data items "virtual disk ID", "storage ID", "position", and "date". "Virtual disk ID" stores identification information that identifies a virtual volume 21. "Storage ID" stores information (a storage ID) that identifies the storage device 25 corresponding to the virtual volume 21. "Position" stores the write position in the storage device 25 to which the data was written. "Date" stores the date of the write.

FIG. 6 shows an example of the I/O periodic table 33. The I/O periodic table 33 totals the number of accesses (reads and writes) per unit time to each disk of each storage device 25. An I/O periodic table 33 is created for each disk of a storage device 25.

The I/O periodic table 33 includes the data items "time", "read", and "write". "Time" stores times at fixed intervals. "Read" stores the number of read accesses at that time. "Write" stores the number of write accesses at that time.
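The hourly work tables derived from the I/O periodic table 33 add the read and write counts and total them per hour. A minimal sketch of that aggregation follows; the `"HH:MM"` time format and the tuple layout of each row are assumptions, since the text does not fix a concrete representation.

```python
from collections import defaultdict


def hourly_access_frequency(rows):
    """Total read + write counts per hour.

    rows: iterable of (time, read_count, write_count), where time is an
          "HH:MM" string (an assumed format for the "time" item).
    Returns a dict mapping hour -> access frequency.
    """
    totals = defaultdict(int)
    for time, reads, writes in rows:
        hour = int(time.split(":")[0])
        totals[hour] += reads + writes
    return dict(totals)
```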

FIG. 7 shows an example of the read/write access load correspondence table 34. The read/write access load correspondence table 34 associates the disk IDs in the read/write disk access load correspondence table 35 with the disk IDs in the read/write position access load correspondence table 38. The read/write disk access load correspondence table 35 and the read/write position access load correspondence table 38 are described later.

FIG. 8 shows an example of the read/write disk access load correspondence table 35. The read/write disk access load correspondence table 35 counts the number of read accesses and write accesses to each disk in all the storage devices 25.

The read/write disk access load correspondence table 35 is kept per storage device and includes the data items “disk ID”, “Read access frequency”, and “Write access frequency”. “Disk ID” holds identification information that identifies a disk 26 in the storage device 25. “Read access frequency” holds the read access frequency for that disk. “Write access frequency” holds the write access frequency for that disk.

FIG. 9 shows an example of the replica completeness table 36. The replica completeness table 36 shows, for each position, the completeness of the replica of a virtual volume on each storage device 25. It stores the “position” at which the replica is stored in the storage device 25, the “replica completeness” for each storage device, and the “state” of the replica stored in the storage device 25.

The replica completeness is a value indicating how far creation of the replica has progressed. The replica “state” takes one of four values: “None”, “Copying”, “Created”, and “Deleting”. “None” indicates that no replica has been created. “Copying” indicates that the write to the replica creation destination has not yet been issued. “Created” indicates that replica creation is complete. “Deleting” indicates that the replica is being deleted.
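The four states and the completeness value can be sketched as follows. The source does not give a formula for replica completeness; computing it as the percentage of positions in the “Created” state is an assumption made here for illustration, as are the names `ReplicaState` and `replica_completeness`.

```python
from enum import Enum


class ReplicaState(Enum):
    NONE = "None"          # no replica has been created
    COPYING = "Copying"    # write to the destination not yet issued
    CREATED = "Created"    # replica creation complete
    DELETING = "Deleting"  # replica is being deleted


def replica_completeness(states):
    """Percentage of positions whose state is Created (assumed definition)."""
    if not states:
        return 0.0
    created = sum(1 for s in states if s is ReplicaState.CREATED)
    return 100.0 * created / len(states)


per_position = [
    ReplicaState.CREATED,
    ReplicaState.COPYING,
    ReplicaState.NONE,
    ReplicaState.CREATED,
]
completeness = replica_completeness(per_position)
```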

FIG. 10 shows an example of the disk I/O periodic table 37. It tabulates the number of accesses per unit time to each disk of the storage devices 25; specifically, it counts how many times the I/O server concerned accessed each disk 26 of the storage device 25 in each fixed time interval. One disk I/O periodic table 37 is created for each disk of the storage device 25.

The disk I/O periodic table 37 includes the data items “time”, “read”, and “write”. “Time” stores times at fixed intervals. “Read” stores the number of read accesses at that time. “Write” stores the number of write accesses at that time.

FIG. 11 shows an example of the read/write position access load correspondence table 38. The read/write position access load correspondence table 38 manages the read/write access load for each position on the disks 26 that the I/O server 18 is accessing. Here, a read/write position on a disk 26 means the position, relative to the start of the disk 26, at which data is read or written.

The read/write position access load correspondence table 38 is provided per disk and includes the data items “disk ID”, “Read access frequency”, and “Write access frequency”. “Disk ID” holds identification information that identifies a disk 26 in the storage device 25. “Read access frequency” holds the read access frequency for that disk 26. “Write access frequency” holds the write access frequency for that disk 26.

FIG. 12 shows an example of the storage capacity table 39. The storage capacity table 39 stores the capacity of each storage device 25. It lists the capacity in use on each storage device, as acquired from each storage device 25 by the load monitoring unit 17 of the volume management server 15.

FIG. 13 shows an example of the load statistical information 40. The load statistical information 40 is a table that stores, for each storage device 25, the read/write count per unit time (I/O per second) and the bandwidth usage rate. The volume management server 15 acquires the load statistical information 40 from each storage device 25 using the load monitoring unit 17.

In this embodiment, the storage system 14 to which a storage device is added already has two or more storage devices. For example, the storage system 14 has two storage devices, storage device A and storage device B. The disks in storage device A are assumed to be mirrored with the disks in storage device B (that is, there are two replicas). The storage device added to the storage system 14 is referred to as storage device C.

(1) Storage device C is added to the storage system 14.

(2) The following processing is performed whenever a user reads or writes data using the client 11.
(2-1) Using the load monitoring unit 17, a storage device 25 with a low load is selected.
(2-2) Data is read from the selected storage device 25.
(2-3) The read data is returned to the client 11.

(3) The data held in the cache when it was read is written to the disk of storage device C at a time when the network bandwidth of storage device C is not under pressure (at this point the number of replicas becomes three). Because the data thus reaches the lightly loaded storage device C, read performance for subsequent reads can be improved.

(4) When only reads occur at high frequency and read performance has degraded on every replica, a replica is created if a newly added storage device exists.

(5) When the free capacity of a storage device 25 runs low (judged against a set threshold, for example 80%), free capacity is increased by deleting replicas of data that has three or more replicas. Here, using the I/O periodic table 33, replicas of volumes for which a period of heavy reading follows a write are excluded from deletion. When there are three or more replicas and writes occur frequently, the replica on the heavily loaded storage device is deleted, on condition that at least two replicas are maintained.

The details of this embodiment are described below.
The storage device added to the storage system 14 is referred to as storage device C (25c). This embodiment is described for the case where a storage device 25 is newly added, but rebalancing can also be performed with the same algorithm after operation has started.

(1) Addition of Storage Device C (25c)
Storage device C (25c) is installed in the storage system 14 and connected to the I/O LAN 27 so that it can be accessed from all the clients 11.

(2) Data Reading
FIG. 14 shows the data read flow in this embodiment. The operations of the volume management server 15 and the I/O server 18 are described below with reference to FIG. 14.

[Selection of the read storage]
The load monitoring unit 17 of the volume management server 15 monitors the load status of each storage device 25 and the load of the volumes in the storage devices (S11). Here, the load monitoring unit 17 acquires the load statistical information 40 transmitted from the storage devices 25.

The volume management server 15 periodically transmits the load statistical information acquired from the storage devices 25 to the I/O server 18 (S12).

[Holding read data in the cache]
When a data read request arrives from the client 11, the I/O server 18 reads the data from a disk 26 in a storage device 25 (S21). If it determines, by referring to the replica completeness table 36, that multiple replicas exist, the I/O server 18 selects either the storage device 25 with the lowest access load or the storage device 25 that has the disk 26 with the lowest access load (S22). For example, the I/O server 18 uses the bandwidth usage rate or the I/O per second in the load statistical information received from the volume management server 15 to select the storage device 25 with the lowest access load. Alternatively, the I/O server 18 uses the read/write position access load correspondence table 38 to select the storage device 25 that has the disk 26 with the lowest access load.
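The selection in S22 can be sketched as a minimum over the storage devices that hold a replica. This is only an illustration: the function name `select_read_source`, the dictionary layout of the load statistics, and the choice of bandwidth usage as the metric are assumptions, not details given in the source.

```python
def select_read_source(replica_holders, load_stats, metric="bandwidth_usage"):
    """Return the storage ID with the lowest load among devices holding a replica.

    replica_holders: storage IDs that hold a replica of the requested data
    load_stats: storage ID -> {"bandwidth_usage": float, "iops": int} (assumed layout)
    """
    if not replica_holders:
        raise ValueError("no replica available for the requested data")
    return min(replica_holders, key=lambda sid: load_stats[sid][metric])


# Hypothetical load statistics for storage devices A, B, and C.
stats = {
    "A": {"bandwidth_usage": 0.70, "iops": 900},
    "B": {"bandwidth_usage": 0.40, "iops": 500},
    "C": {"bandwidth_usage": 0.10, "iops": 50},
}
source_before_replica = select_read_source(["A", "B"], stats)
source_after_replica = select_read_source(["A", "B", "C"], stats)
```

Passing `metric="iops"` would select on I/O per second instead, which corresponds to the other criterion the text mentions.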

The I/O server 18 reads the data from the storage device 25 selected in S22, that is, the storage device 25 with the lowest access load or the storage device 25 that has the disk 26 with the lowest access load (S23). At this time, the I/O server 18 updates, in the read/write position access load correspondence table 38, the read access frequency for the position read on the disk 26 (S24).

When reading data, the I/O server 18 holds the read data in the cache 18d to improve performance when the same data is read again (S25). The I/O server 18 transmits the data held in the cache 18d to the client 11 (S26). This completes the data read.

When reading a file on the file system of a storage device 25, the I/O server 18 prefetches data and holds the prefetched data in the cache 18d.

[Using the cached read data]
Using the data held in the cache 18d at read time (including prefetched data, if any), the I/O server 18 issues, in the background of the data read request from the client 11, a write request to the replica on the added storage device 25c (S27). Because the data held in the cache 18d is used, no separate read is needed for the copy and only the write to the destination remains, so the network load of data copying for rebalancing can be reduced to 1/2.

If data is already held in the cache 18d before the replica start instruction arrives from the volume management server 15, the I/O server 18 proceeds as follows. As described for S56 onward in FIG. 16, the I/O server 18 writes that data to the replica creation destination once the replica creation destination to be newly added to the virtual volume 21 has been selected. Using the data held in the cache 18d in this way enables efficient replica creation.

(3) Data Writing
FIG. 15 shows the data write flow in this embodiment. The operations of the volume management server 15 and the I/O server 18 are described below with reference to FIG. 15.

When a data write request arrives from the client 11, the I/O server 18 starts data write processing to the real disks 26 of the virtual volume 21 (S31). At this time, to maintain data consistency, the I/O server 18 reads the replica completeness table 36 (S33) and acquires information on the storage devices that hold the replicas to be written (the target storage devices) (S32).

The I/O server 18 checks the load status of the target storage devices 25 using the load statistical information received from the volume management server 15 in S12 (S34). Using the load statistical information, the I/O server 18 determines whether all the target storage devices 25 are free of load, that is, whether the bandwidth usage rate of every target storage device 25 is below a threshold (S35). If the bandwidth usage rate of every target storage device 25 is below the threshold (“Yes” in S35), the I/O server 18 writes the data to all the target storage devices 25 (S42). The I/O server 18 updates the read/write position access load correspondence table 38 for the target storage devices 25 to which the data was written (S43). The I/O server 18 notifies the client 11 that issued the write request that the data write is complete (S44).

[Writing to multiple replicas]
If the load of any target storage device 25 is above the threshold (“No” in S35), the I/O server 18 proceeds as follows. Using the load statistical information or the read/write position access load correspondence table 38, the I/O server 18 writes the write target data to the storage device 25 with the lowest load or to the disk with the lowest load. As soon as one write has completed, the I/O server 18 notifies the client 11 that issued the write request that the data write is complete (S36).

[Handling storage devices whose writes were held]
The I/O server 18 must also write to the other storage devices 25, but writing data under high load would degrade performance. Therefore, the I/O server 18 holds in the cache 18d the write data destined for any storage device 25 whose load exceeds the threshold and that has no I/O headroom, and temporarily defers that write. The data held in the cache 18d is not discarded. This is described below.

When write target data for a storage device 25 is held, the I/O server 18 notifies the volume management server 15 of hold information (S37). The hold information includes the storage ID of the target storage device 25 that was not written, the disk ID, the write position on the disk, and the write data. The volume management server 15 updates the write hold table 32 using the hold information notified by the I/O server 18 (S38). The I/O server 18 holds the write data in the cache 18d (S39).
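The split between immediate and held writes (S35 to S39) can be sketched as below. This is one reading of the flow, not the patented implementation: the function name `plan_write`, the treatment of a device exactly at the threshold as busy, and the decision to write immediately to every device under the threshold are assumptions.

```python
def plan_write(targets, bandwidth_usage, threshold):
    """Split the target storage devices into immediate and held writes.

    The least-loaded target is always written first; its completion is what
    lets the client be acknowledged. Remaining targets at or above the
    threshold are held in the cache for a later write.
    """
    if not targets:
        raise ValueError("no target storage device")
    ordered = sorted(targets, key=lambda t: bandwidth_usage[t])
    first, rest = ordered[0], ordered[1:]
    immediate = [first] + [t for t in rest if bandwidth_usage[t] < threshold]
    held = [t for t in rest if bandwidth_usage[t] >= threshold]
    return immediate, held


# Hypothetical bandwidth usage rates for three target devices.
usage = {"A": 0.9, "B": 0.3, "C": 0.5}
immediate, held = plan_write(["A", "B", "C"], usage, threshold=0.6)
```

In this example the write to storage device A would be held in the cache and reported to the volume management server as hold information.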

When another I/O server 18 reads the data, that I/O server 18 refers to the write hold table 32 and reads from a storage device 25 on which the write has completed and which therefore holds the latest data. This ensures that the latest data is read.
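The lookup against the write hold table can be sketched as a filter that excludes storage devices with a pending write for the requested position. The function name `readable_storages` and the row layout as dictionaries are assumptions; the keys mirror the data items named for the write hold table 32.

```python
def readable_storages(replica_holders, write_hold_table, virtual_disk_id, position):
    """Return the replica holders with no held (pending) write for this position."""
    pending = {
        row["storage_id"]
        for row in write_hold_table
        if row["virtual_disk_id"] == virtual_disk_id and row["position"] == position
    }
    return [s for s in replica_holders if s not in pending]


# Hypothetical hold table: the write to position 4096 on storage A is still held.
hold_table = [
    {"virtual_disk_id": "vd1", "storage_id": "A", "position": 4096, "date": "2011-12-19"},
]
fresh = readable_storages(["A", "B"], hold_table, "vd1", 4096)
unaffected = readable_storages(["A", "B"], hold_table, "vd1", 8192)
```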

Thereafter, using the load statistical information or the read/write position access load correspondence table 38, the I/O server 18 writes the data held in the cache 18d to the storage device 25 once the load of that storage device 25 has fallen below the threshold (S40). At this time, the I/O server 18 updates, in the read/write position access load correspondence table 38, the write access frequency for the position written on the disk 26 of the storage device 25 (S41). When the write to the storage device 25 has completed, the I/O server 18 sends a completion notification to the volume management server 15. The same data can then be read from all the storage devices 25.
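The deferred write in S40 can be sketched as a pass over the held entries that writes out those whose target has dropped below the threshold. The function name `flush_held_writes`, the tuple layout of a held entry, and the `write_fn` callback are hypothetical stand-ins for the actual write path.

```python
def flush_held_writes(held, bandwidth_usage, threshold, write_fn):
    """Write held entries whose target load is now below the threshold.

    held: list of (storage_id, position, data) tuples kept in the cache
    write_fn: callable performing the actual write (hypothetical)
    Returns the entries that must stay held.
    """
    remaining = []
    for storage_id, position, data in held:
        if bandwidth_usage[storage_id] < threshold:
            write_fn(storage_id, position, data)
        else:
            remaining.append((storage_id, position, data))
    return remaining


written = []
held_entries = [("A", 4096, b"x"), ("C", 0, b"y")]
still_held = flush_held_writes(
    held_entries,
    {"A": 0.2, "C": 0.9},
    threshold=0.6,
    write_fn=lambda sid, pos, data: written.append((sid, pos)),
)
```

After each successful write, the real flow would also update the write access frequency and notify the volume management server; those steps are omitted here.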

If the I/O server 18 goes down while write target data is held in the cache 18d, the held write target data is lost, and it is no longer possible to tell which replica is the latest. Therefore, when the I/O server 18 starts up after going down, the volume management server 15 checks the write hold table 32 for write target data that was held in the cache 18d and creates replicas as necessary.

When the data held in the cache 18d has been written to the storage device 25 for which the replica is being created, storage devices A and B and the added storage device C hold the same data. At this point, storage device A and storage device B remain mirrored. On storage device C, only the portions that were written contain data.

On the added storage device C, the data held in the cache 18d through read/write processing (that is, frequently accessed data) can be kept mirrored, but the disk as a whole has not been mirrored. Therefore, at this point the data of the entire original disk of the replica cannot be deleted. However, because the data that is being read is mirrored (a replica of it exists), the data reads that the I/O server 18 issues to the storage devices 25 can be distributed. As a result, access (load) is not concentrated on a single read source storage device, so the performance of the storage system 14 can be improved.

(4) Access Load Distribution by Replica Creation
[Replica creation method]
FIGS. 16 and 17 show the replica creation processing flow in this embodiment. The operations of the volume management server 15 and the I/O server 18 are described below with reference to FIGS. 16 and 17.

The volume management server 15 reflects the information obtained from the load monitoring unit 17 in the read/write disk access load correspondence table 35 as needed. In addition, the volume management server 15 creates the read/write access load correspondence table 34 from the read/write disk access load correspondence table 35 and the read/write position access load correspondence tables 38 that are sent periodically from all the I/O servers 18.

First, the volume management server 15 uses the read/write access load correspondence table 34 to determine, for each storage device, whether the access load (read/write access frequency) exceeds a preset threshold (S51).

If the access load exceeds the threshold (“Yes” in S51), the volume management server 15 refers to the read/write access frequency of the disks in the read/write access load correspondence table 34 and identifies the position of the data with a high access load (S52). That is, using the read/write access load correspondence table 34, the volume management server 15 determines which position on which disk in the storage device 25 whose access load (read/write access frequency) exceeds the threshold is receiving many reads and writes.

Next, the volume management server 15 determines the replica creation destination using the read/write access load correspondence table 34 (S53). Here, the volume management server 15 uses the read/write access load correspondence table 34 to select the storage devices 25 with the lowest access load. From the selected storage devices 25, it then selects the one with the most free capacity as the replica creation destination.
The volume management server 15 then notifies the I/O server 18 of the replica creation destination (S54).
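The destination choice in S53 can be sketched as a two-stage selection: lowest access load first, then most free capacity among the devices tied at that load. The function name `choose_replica_destination` and the `exclude` parameter (used here to skip devices that already hold a replica) are assumptions for illustration.

```python
def choose_replica_destination(access_load, free_capacity, exclude=()):
    """Pick the least-loaded storage device, breaking ties by free capacity.

    access_load: storage ID -> read/write access frequency
    free_capacity: storage ID -> free capacity (any consistent unit)
    exclude: storage IDs that must not be chosen (assumed: existing replica holders)
    """
    candidates = [s for s in access_load if s not in exclude]
    if not candidates:
        raise ValueError("no candidate storage device")
    lowest = min(access_load[s] for s in candidates)
    least_loaded = [s for s in candidates if access_load[s] == lowest]
    return max(least_loaded, key=lambda s: free_capacity[s])


# Hypothetical values: B and C are tied on load, C has more free capacity.
load = {"A": 50, "B": 10, "C": 10}
free = {"A": 100, "B": 200, "C": 800}
destination = choose_replica_destination(load, free)
```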

On receiving notification of the replica creation destination from the volume management server 15, the I/O server 18 creates the replica completeness table 36 for the virtual volume 21 (S55). Note that read processing (FIG. 14) and write processing (FIG. 15) occur between S55 and S56.

After creating the replica completeness table 36, the I/O server 18 adds the replica destination disk as one of the real disks of the virtual volume 21 concerned (S56). The I/O server 18 notifies the volume management server 15 that it will start writing the data held in the cache to the replica destination disk (S57).

On receiving from the I/O server 18 the notification that writing of the cached data to the replica destination disk will start, the volume management server 15 updates the corresponding entry in the replica table 31 to “Copying” (S58).

[Writing data to the replica]
If the replica source data exists in the cache 18d (“Yes” in S61), the I/O server 18 starts replica creation processing using the cache 18d, following the processing that reads data for the client 11 from the replica source storage device 25 (S62). In addition, for data that was already held in the cache 18d before the replica creation request arrived from the volume management server 15, the I/O server 18 likewise updates the replica completeness table 36 and writes to the replica creation target disk. Using data already held in the cache 18d allows the replica update processing to proceed.

Using the information notified by the volume management server 15 (replica creation destination and write time), the I/O server 18 writes the data held in the cache 18d at read time to the replica destination at the preset time (S63). After the write completes, the I/O server 18 updates the entry for the written disk 26 of the storage device 25 to “Created” in the replica completeness table 36 (S64).

When the replica completeness in the replica completeness table 36 reaches 100%, the I/O server 18 notifies the volume management server 15 that replica creation is complete (S65). The volume management server 15 updates the corresponding entry in the replica table 31 to “Created” (S66).

The I/O server 18 notifies the volume management server 15 that it will start the processing that writes the data held in the cache 18d to the replica destination disk (S67).

[Replica deletion method]
When write requests from the client 11 are frequent, writes to all the replicas also occur frequently, which raises the load on the network bandwidth. When writes occur frequently, the I/O server 18 decides whether to delete a replica by the following processing.

(i) If the I/O server 18 determines, from the read and write volumes recorded in the disk I/O periodic table 37 for the write target data, that a period of heavy reading will follow, it does not delete the replica.
(ii) If the I/O server 18 determines, using the disk I/O periodic table 37, that a write period will continue for a certain time, it deletes the replica.
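Rules (i) and (ii) can be sketched as a check over the rows of the disk I/O periodic table that cover the coming period. The source does not define how a “write period” is recognized; treating it as writes outnumbering reads in every interval is an assumption, as are the function name `should_delete_replica` and the row layout.

```python
def should_delete_replica(upcoming, read_threshold):
    """Decide deletion from periodic-table rows covering the coming period.

    upcoming: list of {"read": int, "write": int} rows (assumed layout)
    read_threshold: read count per interval regarded as heavy reading
    """
    if not upcoming:
        return False  # no history to judge from: keep the replica
    # Rule (i): heavy reading ahead, so the replica is kept.
    if any(row["read"] > read_threshold for row in upcoming):
        return False
    # Rule (ii): writes dominate throughout the period (assumed criterion).
    return all(row["write"] > row["read"] for row in upcoming)


write_heavy = [{"read": 5, "write": 40}, {"read": 3, "write": 55}]
read_burst = [{"read": 500, "write": 40}]
delete_write_heavy = should_delete_replica(write_heavy, read_threshold=100)
delete_read_burst = should_delete_replica(read_burst, read_threshold=100)
```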

FIGS. 18 and 19 show the replica deletion processing flow in this embodiment. The operations of the volume management server 15 and the I/O server 18 are described below with reference to FIGS. 18 and 19.

The volume management server 15 notifies the I/O server 18 to start replica deletion processing (S73) when it determines that either of the following conditions (S71, S72) is met. The S71 condition is that the volume management server 15 determines, from the write access frequency in the read/write access load correspondence table 34, that the write access frequency to data in a replica of any virtual volume has exceeded a threshold. The S72 condition is that the volume management server 15 determines, using the storage capacity table 39, that the free capacity of a storage device 25 would fall below a threshold when a replica of any virtual volume is created.

-Method of determining the replica to delete-
In replica deletion processing, the volume management server 15 refers to the replica table 31 and determines whether any virtual volume has three or more replicas in the “Created” state (S74). If the number of replicas is less than three (“No” in S74), the volume management server 15 does not execute replica deletion processing.

If there are three or more replicas, the volume management server 15 uses the write access frequency item of the read/write access load correspondence table 34 to select the replica located in the storage device 25 that is under the highest access load (S75).
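Steps S74 and S75 can be sketched together: deletion is considered only when at least three replicas are in the “Created” state, and the candidate is the replica on the device with the highest write access frequency. The function name `pick_replica_to_delete` and the dictionary inputs are assumptions for illustration.

```python
def pick_replica_to_delete(replica_states, write_frequency, min_replicas=3):
    """Return the storage ID whose replica is the deletion candidate, or None.

    replica_states: storage ID -> replica state string for one virtual volume
    write_frequency: storage ID -> write access frequency
    """
    created = [s for s, state in replica_states.items() if state == "Created"]
    if len(created) < min_replicas:
        return None  # fewer than three completed replicas: nothing is deleted
    return max(created, key=lambda s: write_frequency[s])


# Hypothetical volume with three completed replicas.
states = {"A": "Created", "B": "Created", "C": "Created"}
writes = {"A": 120, "B": 30, "C": 75}
candidate = pick_replica_to_delete(states, writes)
```

The candidate is still subject to the later read-access check before it is actually deleted.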

The volume management server 15 relates the read/write access load correspondence table 34 to the I/O periodic table 33 and determines whether, within a predetermined time from the current time, the disk holding the selected replica will receive read accesses exceeding a threshold (S76).

If it determines that the selected replica will receive read accesses exceeding the threshold within the predetermined time from the current time (“Yes” in S76), the volume management server 15 does not execute replica deletion processing for that virtual volume 21. In this case, the volume management server 15 searches for replicas of other virtual volumes 21 (S78).

If a replica of another virtual volume exists (“Yes” in S79), the volume management server 15 performs the processing from S73 onward for that other virtual volume.
If no replica of another virtual volume exists (“No” in S79), the volume management server 15 notifies the I/O server 18 to dynamically lower the replica completeness threshold used in S84 by a predetermined value (S80). The volume management server 15 then performs the processing from S73 onward again for the virtual volume concerned. On receiving the notification, the I/O server 18 dynamically lowers the replica completeness threshold used in S84 by the predetermined value.

- Execution of replica deletion
When it determines that read access exceeding the threshold will be made to the selected replica within the predetermined time from the current time (“No” in S76), the volume management server 15 updates the relevant entry of the replica table 31 to “Deleting”. The volume management server 15 then sends the deletion information for the selected replica (virtual volume ID and data placement location) to the I/O server 18 that is accessing that replica (S77).

The I/O server 18 receives the deletion information for the selected replica from the volume management server 15. Using the replica completeness table 36, the I/O server 18 determines whether three or more replicas with a replica completeness of 100% exist for the notified virtual volume ID (S81).

If the replica completeness table 36 does not contain three or more replicas with a replica completeness of 100% (“No” in S81), the I/O server 18 lets replica creation continue for any replica that is still being created. After the replica is complete, the I/O server 18 deletes the replica in the heavily loaded storage device 25 selected in S75 (S83) and ends the replica deletion process.

The I/O server 18 determines, from the replica completeness table 36, whether the completeness of all replicas of the notified virtual volume ID is at or below a predetermined threshold (S84). If the completeness of all replicas of the notified virtual volume ID is at or below the predetermined threshold (for example, 80%) (“Yes” in S84), the I/O server 18 asks the volume management server 15 to change the deletion target to a replica of another virtual volume (S85).

If a replica of another virtual volume exists (“Yes” in S79), the volume management server 15 performs the processing from S73 onward for that other virtual volume. If no replica of another virtual volume exists (“No” in S79), the volume management server 15 notifies the I/O server 18 to dynamically lower, by a predetermined value, the replica completeness threshold used in S84 (S80). The volume management server 15 then performs the processing from S73 onward again for the virtual volume in question. On receiving the notification, the I/O server 18 dynamically lowers the replica completeness threshold used in S84 by the predetermined value. This allows the volume management server 15 to send a replica deletion request to the I/O server 18 again.

If the completeness of all replicas of the notified virtual volume ID is above the predetermined threshold (for example, 80%) (“No” in S84), the I/O server 18 performs the following processing. Using the replica creation destination information notified by the volume management server 15 in S54, the I/O server 18 writes the data that it held in the cache 18d when the data was read to the replica creation destination, bringing the replica completeness to 100% (S86).
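The interplay between the completeness check in S84, the fallback in S85, the completion step in S86, and the threshold lowering in S80 can be sketched as below. This is a rough sketch, not the implementation described in the specification. The class name, the returned action strings, the use of a single completeness percentage for the volume, and the step size of 10 points are all assumptions; the text only says the threshold is lowered by "a predetermined value" and gives 80% as an example threshold.

```python
from dataclasses import dataclass


@dataclass
class CompletenessGate:
    threshold: float = 80.0  # percent; 80% is the example value given in the text
    step: float = 10.0       # assumed size of the "predetermined value" in S80

    def lower(self) -> None:
        """S80: lower the threshold by the predetermined step, never below zero."""
        self.threshold = max(0.0, self.threshold - self.step)

    def decide(self, completeness: float) -> str:
        """S84: route to S85 (change target) or S86 (finish from cache)."""
        if completeness <= self.threshold:
            return "request_target_change"  # "Yes" in S84, leading to S85
        return "complete_from_cache"        # "No" in S84, leading to S86
```

A replica at 80% completeness is rejected at the initial threshold but is accepted once the threshold has been lowered, which is how a repeated deletion request can eventually proceed.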

If the replica completeness table 36 contains three or more replicas with a replica completeness of 100% for the notified virtual volume ID (“Yes” in S81), or after the processing of S86 is complete, the I/O server 18 performs the following processing. The I/O server 18 determines whether the free capacity of the storage device 25 has fallen below a threshold (S87). If the free capacity of the storage device is above the threshold (“No” in S87), the replica deletion process ends.

If the free capacity of the storage device 25 has fallen below the threshold (“Yes” in S87), the I/O server 18 uses the read/write position access load correspondence table 38 to select, from the replicas with a replica completeness of 100%, a replica whose read access frequency is below a threshold (S88). Among the storage devices 25 that hold a replica with a replica completeness of 100%, the I/O server 18 selects the replica in the storage device 25 with the highest load. The I/O server 18 deletes the selected replica (S89).
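The selection in S87 to S89 can be sketched as follows. The `Replica` record, the function name, and the parameter names are hypothetical; treating a free capacity equal to the threshold as "not below" is also an assumption, since the text does not say how equality is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    device: str          # storage device holding the replica
    completeness: int    # percent
    read_frequency: int  # read accesses in the observed period


def pick_replica_to_free_space(
    replicas: list[Replica],
    device_load: dict[str, int],
    free_capacity: int,
    capacity_threshold: int,
    read_threshold: int,
) -> Optional[Replica]:
    # S87: nothing to do unless free capacity has fallen below the threshold.
    if free_capacity >= capacity_threshold:
        return None
    # S88: only complete replicas with a low read frequency are candidates.
    candidates = [
        r for r in replicas
        if r.completeness == 100 and r.read_frequency < read_threshold
    ]
    if not candidates:
        return None
    # Prefer the candidate on the most heavily loaded storage device;
    # the caller would then delete it (S89).
    return max(candidates, key=lambda r: device_load[r.device])
```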

The I/O server 18 then notifies the volume management server 15 that the replica deletion process is complete (S90). On receiving this completion notification from the I/O server 18, the volume management server 15 updates the relevant entry of the replica table 31 from “Deleting” to “None” (S91) and ends the replica deletion process.
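The state changes that the replica table 31 goes through during deletion can be sketched as a small transition check. The table layout (a dictionary keyed by virtual volume ID and storage device), the function name, and the idea that an entry is “Created” before it becomes “Deleting” are assumptions made for illustration; the text names the three states but does not define the table format.

```python
# Transitions described for replica deletion (other transitions, such as
# those used during replica creation, are outside this sketch).
ALLOWED_TRANSITIONS = {
    ("Created", "Deleting"),  # set before the deletion information is sent in S77
    ("Deleting", "None"),     # S91: set after the completion notice of S90
}


def update_replica_state(
    table: dict[tuple[str, str], str],
    key: tuple[str, str],
    new_state: str,
) -> None:
    """Update one entry, rejecting transitions the deletion flow does not use."""
    current = table[key]
    if (current, new_state) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unexpected transition {current!r} -> {new_state!r}")
    table[key] = new_state
```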

[Predictive replica creation process using the I/O periodic table]
When there are many disks whose access load is too low to be caught by “(4) Access load distribution method by replica creation” above, the load can still be high when viewed per storage device. For data rebalancing in such a situation, a method is used that distributes the load by creating replicas on the basis of the I/O periodic table.

FIG. 20 shows the flow of the predictive replica creation process using the I/O periodic table. For each disk in the storage device 25, the I/O server 18 totals the input/output (I/O), that is, the number of read accesses and the number of write accesses, at fixed intervals (cycles) over a preset period, and stores the totals in the disk I/O periodic table 37. If the work follows a weekly cycle, the I/O server 18 can also total over one week. The processing of FIG. 20 may be executed as part of the processing of S54 in FIG. 16.

At a predetermined time, the volume management server 15 collects the disk I/O periodic tables 37 from all the I/O servers 18. Using all the collected disk I/O periodic tables 37, the volume management server 15 totals the number of read accesses and the number of write accesses for each disk of each storage device at each time, and creates the I/O periodic table 33.

From the created I/O periodic table 33, the volume management server 15 totals, at a fixed cycle, the I/O per unit time for each storage device 25 (the sum of the number of read accesses and the number of write accesses) and calculates the access load of each storage device 25 (S101). For example, the volume management server 15 calculates the hourly access load of each storage device 25 for one day. Here, the access load is the sum of the number of read accesses and the number of write accesses.
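The aggregation in S101 can be sketched as follows. The `DiskSample` record and the function name are hypothetical, and the I/O periodic table 33 is modeled simply as an iterable of per-disk, per-hour read and write counts, which is an assumption about its layout.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class DiskSample(NamedTuple):
    device: str  # storage device name
    disk: str    # disk name within the device
    hour: int    # hour of day, 0 to 23
    reads: int   # number of read accesses
    writes: int  # number of write accesses


def hourly_access_load(samples: Iterable[DiskSample]) -> dict[str, dict[int, int]]:
    """Sum reads and writes over all disks, per storage device and hour."""
    load: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for s in samples:
        load[s.device][s.hour] += s.reads + s.writes
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {dev: dict(hours) for dev, hours in load.items()}
```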

Next, the volume management server 15 calculates the average access load over all the storage devices 25 (S102). The volume management server 15 then determines, for each totaled time period, whether the difference between the average and the access load of each storage device 25 exceeds a preset threshold (S103). If the access load difference does not exceed the threshold (“No” in S103), the volume management server 15 ends the prediction process.

If the access load difference exceeds the threshold (“Yes” in S103), the volume management server 15 performs the following processing. For the relevant time of the relevant storage device 25, the volume management server 15 obtains, in finer time units, the difference between the access load of each disk 26 belonging to that storage device 25 and the average access load of all the disks 26. A finer time unit is, for example, one minute.

The volume management server 15 determines whether that difference is below a preset threshold (S104). If the difference is below the preset threshold (“Yes” in S104), the volume management server 15 uses the I/O periodic table 33 to select, among the disks 26 in the relevant storage device 25, the disk 26 with the most read processing. The volume management server 15 creates a replica on the selected disk 26 at a time that precedes the relevant time and at which the access load of the storage device 25 does not exceed a preset threshold (S105).
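The disk selection and timing in S105 can be sketched as follows. The function name and inputs are hypothetical, and choosing the latest qualifying hour before the busy hour is an assumption; the text only requires a time that precedes the relevant time and at which the device's access load does not exceed the threshold.

```python
from typing import Optional


def plan_predictive_replica(
    disk_reads: dict[str, int],
    hourly_load: list[int],
    hot_hour: int,
    load_threshold: int,
) -> tuple[str, Optional[int]]:
    """Return (disk to replicate, hour in which to create the replica).

    disk_reads: read count per disk of the overloaded storage device.
    hourly_load: access load of that device, indexed by hour.
    hot_hour: the hour in which the device was found to be overloaded.
    """
    # Select the disk with the most read processing.
    target_disk = max(disk_reads, key=lambda disk: disk_reads[disk])
    # Walk backwards from the hot hour to find a quiet hour (assumed policy).
    for hour in range(hot_hour - 1, -1, -1):
        if hourly_load[hour] <= load_threshold:
            return target_disk, hour
    return target_disk, None  # no quiet hour was found before the hot hour
```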

The following is an example of the predictive replica creation process, using the I/O periodic table 33, for data whose access load is high per storage device 25 but low per disk. First, the processing of S101 to S103 in FIG. 20 is described with reference to FIGS. 21 to 24.

FIG. 21 shows a work table (A) in which the access frequency for storage device A, obtained by adding the read frequency and the write frequency in the I/O periodic table 33, is totaled every hour, together with its graph (B). FIG. 22 shows the corresponding hourly work table (A) and graph (B) for storage device B. FIG. 23 shows the corresponding hourly work table (A) and graph (B) for storage device C. FIG. 24 shows the hourly average of the access frequencies in the work tables (A) of FIGS. 21 to 23, together with its graph (B).

After collecting the disk I/O periodic tables 37 from all the I/O servers 18 and creating the I/O periodic table 33, the volume management server 15 calculates the access loads of the three storage devices A, B, and C, as shown in FIGS. 21 to 23. It then calculates the average access load over the three storage devices A, B, and C, as shown in FIG. 24. The description here focuses on the access load in the time period 4:00 to 5:00.

As shown in FIG. 21, the access load of storage device A is 545. As shown in FIG. 22, the access load of storage device B is 5435. As shown in FIG. 23, the access load of storage device C is 23. As shown in FIG. 24, the average access load is therefore (545 + 5435 + 23) / 3 = 2001.

The volume management server 15 calculates the difference between the access load of each storage device 25 and the average. For storage device B, the difference from the average is 5435 − 2001 = 3434. Assume that the preset threshold is 2000. Because the difference between the access load of storage device B and the average (3434) exceeds the threshold (2000), the volume management server 15 executes the processing of S104 and S105 in FIG. 20. This is described with reference to FIGS. 25 to 28.
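The check in S102 and S103 can be reproduced with the figures from this example. The function name is hypothetical, and comparing the signed difference (device load minus average) against the threshold is an assumption; the text speaks only of "the difference", and for these figures the signed and absolute differences flag the same device.

```python
def overloaded_devices(loads: dict[str, int], threshold: int) -> dict[str, float]:
    """Return, per device, the excess over the average when it exceeds the threshold."""
    average = sum(loads.values()) / len(loads)  # S102
    return {
        device: load - average
        for device, load in loads.items()
        if load - average > threshold           # S103
    }


# Access loads for 4:00 to 5:00 taken from FIGS. 21 to 23.
loads_4_to_5 = {"A": 545, "B": 5435, "C": 23}
print(overloaded_devices(loads_4_to_5, threshold=2000))  # {'B': 3434.0}
```

Only storage device B is flagged, which matches the walk-through above.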

FIG. 25 shows a work table (A) in which the access frequency for disk B1 of storage device B, obtained by adding the read frequency and the write frequency in the I/O periodic table 33, is totaled every minute, together with its graph (B). FIG. 26 shows the corresponding per-minute work table (A) and graph (B) for disk B2 of storage device B. FIG. 27 shows the corresponding per-minute work table (A) and graph (B) for disk B3 of storage device B. FIG. 28 shows the per-minute average of the access frequencies in the work tables (A) of FIGS. 25 to 27, together with its graph (B).

For the relevant time (the time period 4:00 to 5:00) of a storage device 25 whose access load differs from the average by more than the threshold, the volume management server 15 performs the following processing. In one-minute units, the volume management server 15 obtains the difference between the access load of each disk belonging to that storage device 25 and the average access load of all its disks. Here, storage device B is assumed to contain three disks B1, B2, and B3, and the description focuses on the time period 4:31 to 4:32.

First, the average per-minute access load of storage device B is 5435 / 60 ≈ 91. Because storage device B contains three disks, the average per-minute access load of each disk is 91 / 3 ≈ 31. The access loads of disks B1, B2, and B3 in the time period 4:31 to 4:32 are 27, 24, and 29, respectively, so their differences from the average are 4, 7, and 2. If the preset threshold is 10, the differences for disks B1, B2, and B3 are all below the threshold. In all other time periods as well, the differences between the access loads of disks B1, B2, and B3 and the average are below the threshold. This shows that the access load on storage device B is high while the access load on each of its disks B1, B2, and B3 is low. In this example, the replica creation process would exclude storage device B from rebalancing, so rebalancing to distribute the access load of storage device B could not be carried out.
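The per-disk check in S104 can be reproduced with the figures from this example. The function names are hypothetical. Rounding up at each division is an assumption: it is the integer rounding that yields the values 91 and 31 given in the text.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (assumed rounding, see the note above)."""
    return -(-a // b)


def disks_evenly_loaded(
    hourly_device_load: int,
    disk_loads_per_minute: dict[str, int],
    threshold: int,
) -> bool:
    """True when every disk stays within the threshold of the per-disk average."""
    per_minute = ceil_div(hourly_device_load, 60)                       # 5435 -> 91
    per_disk_average = ceil_div(per_minute, len(disk_loads_per_minute))  # 91 -> 31
    return all(
        abs(load - per_disk_average) < threshold
        for load in disk_loads_per_minute.values()
    )


# Loads of disks B1, B2, B3 for 4:31 to 4:32; the differences are 4, 7 and 2.
print(disks_evenly_loaded(5435, {"B1": 27, "B2": 24, "B3": 29}, threshold=10))  # True
```

A result of `True` corresponds to “Yes” in S104: the device is busy as a whole but no single disk stands out, which is the case the predictive replica creation process is meant to handle.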

The volume management server 15 therefore uses the predictive replica creation process based on the I/O periodic table 33 to select, among disks B1, B2, and B3, the disk with the most read processing. The volume management server 15 instructs the I/O server 18 to create a replica of the data on the selected disk 26.

Rebalancing between storage devices as described above ensures that no data is lost even if an abnormality occurs in a storage device or in a disk within a storage device. For example, if storage device B fails while data is being migrated from storage device A to storage device C, the replica redundancy drops, but the data in storage device A and storage device C together is guaranteed to be up to date. In addition, when the volume management server 15 recognizes that the replica redundancy has dropped, it forcibly performs the replica creation process so that the replica redundancy becomes 2.

In an environment such as a large-scale cloud with multiple storage devices 25 and I/O servers 18, the bandwidth between the I/O servers 18 and the storage devices 25 is important. By making effective use of the data held in the cache for the reads and writes that serve input/output requests, the storage devices 25 can be rebalanced without placing a load on the network bandwidth.

According to the present embodiment, when a storage device is added, rebalancing between storage devices can be performed efficiently, without placing a load on the network bandwidth, by making effective use of access requests from client devices. The input/output performance of the storage system can also be used to the fullest. Using the storage system of the present embodiment therefore improves the efficiency of operations and services without affecting operations.

In addition, even if an abnormality occurs in a storage device or disk included in the storage system of the present embodiment, the data is always kept redundant, so operations can continue without data loss.

The present invention is not limited to the embodiment described above, and various configurations or embodiments can be adopted without departing from the gist of the present invention.

Description of symbols
1 Storage system
2 Storage device
3 Cache memory
4 Access control unit
5 Writing unit
6 Load monitoring unit
7 Replica deletion unit
8 Replica creation unit
9 Information processing terminal
11 Client device
12 Communication interface
13 LAN
14 Storage system
15 Volume management server
16 Volume management unit
17 Load monitoring unit
18 I/O server
18d Cache memory
20 I/O management unit
21 Virtual volume
25 Storage device
26 Disk
27 Management LAN
28 I/O LAN
31 Replica table
32 Write pending table
33 I/O periodic table
34 read/write access load correspondence table
35 read/write disk access load correspondence table
36 Replica completeness table
37 Disk I/O periodic table
38 read/write position access load correspondence table

Claims

A storage system comprising:
a plurality of storage devices that store data;
a cache memory that holds data;
an access control unit that, when an information processing terminal issues an access request for reading target data or writing target data, accesses one of the storage devices and stores the target data in the cache memory; and
a writing unit that writes the target data stored in the cache memory to a storage device, among the plurality of storage devices, in which the target data is not stored.

The storage system according to claim 1, further comprising:
a load monitoring unit that monitors the access load caused by reads from or writes to the storage devices,
wherein, when the access request is received from the information processing terminal, the access control unit accesses, based on the monitoring result, the storage device with the lowest access load among the storage devices and stores the target data in the cache memory.

The storage system according to claim 2, wherein, when the information processing terminal issues a read request for the target data, the access control unit acquires the target data, based on the monitoring result, from the storage device with the lowest access load among the storage devices that store the target data, stores the target data in the cache memory, and transmits the target data to the information processing terminal.

The storage system according to claim 2, wherein, when the information processing terminal issues a write request for the target data, the access control unit writes the target data, based on the monitoring result, to the storage device with the lowest access load among the storage devices and stores the target data in the cache memory, and
the writing unit writes the target data stored in the cache memory to a storage device other than the storage device to which the write was performed.

The storage system according to claim 1, further comprising:
a replica deletion unit that, when the write access load on a replica of a virtual volume obtained by virtualizing the volumes of the plurality of storage devices exceeds a threshold, or when the free capacity of a storage device falls below a first threshold during creation of the replica, deletes the replica stored in one of the storage devices if three or more replicas of the virtual volume exist in the plurality of storage devices.

The storage system according to claim 5, wherein the replica deletion unit selects, based on the monitoring result, the replica in the storage device with the lowest access load among the storage devices that hold the replica, and deletes the selected replica when it determines, based on access frequency information indicating the frequency of access to the storage devices totaled for each time, that no read access with an access frequency exceeding a second threshold will be made to the selected replica after the current time.

The storage system according to claim 1, further comprising:
a replica creation unit that, when the difference between the access load of one of the storage devices in a given time period, among the access loads for each time, and the average of the access loads of all the storage devices in that time period exceeds a third threshold, subdivides the time period and, when the difference between the access load on any disk of that storage device in one of the subdivided time periods and the average access load of all the disks of that storage device in that subdivided time period is smaller than a fourth threshold, creates a replica on the disk with the highest access load during a time that precedes the subdivided time period and in which the access load of that storage device does not exceed a fifth threshold.

A data rebalancing program that causes a computer to execute a process comprising:
when an information processing terminal issues an access request for reading target data or writing target data, accessing one of a plurality of storage devices that store data and storing the target data in a cache memory; and
writing the target data stored in the cache memory to a storage device, among the plurality of storage devices, in which the target data is not stored.

The data rebalancing program according to claim 8, further causing the computer to execute:
monitoring the access load caused by reads from or writes to the storage devices; and
when the access request is received from the information processing terminal, accessing, based on the monitoring result, the storage device with the lowest access load among the storage devices and storing the target data in the cache memory.

The data rebalancing program according to claim 9, wherein, when the information processing terminal issues a read request for the target data, the program causes the computer to acquire the target data, based on the monitoring result, from the storage device with the lowest access load among the storage devices that store the target data, store the target data in the cache memory, and transmit the target data to the information processing terminal.

The data rebalancing program according to claim 9, wherein, when the information processing terminal issues a write request for the target data, the program causes the computer to write the target data, based on the monitoring result, to the storage device with the lowest access load among the storage devices and store the target data in the cache memory, and
to write the target data stored in the cache memory to a storage device other than the storage device to which the write was performed.

The data rebalancing program according to claim 8, further causing the computer to execute:
when the write access load on a replica of a virtual volume obtained by virtualizing the volumes of the plurality of storage devices exceeds a threshold, or when the free capacity of a storage device falls below a first threshold during creation of the replica, deleting the replica stored in one of the storage devices if three or more replicas of the virtual volume exist in the plurality of storage devices.

The data rebalancing program according to claim 12, wherein the program causes the computer to select, based on the monitoring result, the replica in the storage device with the lowest access load among the storage devices that hold the replica, and to delete the selected replica when it is determined, based on access frequency information indicating the frequency of access to the storage devices totaled for each time, that no read access with an access frequency exceeding a second threshold will be made to the selected replica after the current time.

The data rebalancing program according to claim 8, further causing the computer to execute:
when the difference between the access load of one of the storage devices in a given time period, among the access loads for each time, and the average of the access loads of all the storage devices in that time period exceeds a third threshold, subdividing the time period and, when the difference between the access load on any disk of that storage device in one of the subdivided time periods and the average access load of all the disks of that storage device in that subdivided time period is smaller than a fourth threshold, creating a replica on the disk with the highest access load during a time that precedes the subdivided time period and in which the access load of that storage device does not exceed a fifth threshold.

A data rebalancing method for rebalancing data between storage devices, the method being executed by a computer and comprising:
by the computer,
when an information processing terminal issues an access request for reading target data or writing target data, accessing one of a plurality of storage devices that store data and storing the target data in a cache memory; and
writing the target data stored in the cache memory to a storage device, among the plurality of storage devices, in which the target data is not stored.

The data rebalancing method according to claim 15, wherein the computer further:
monitors the access load caused by reads from or writes to the storage devices; and
when the access request is received from the information processing terminal, accesses, based on the monitoring result, the storage device with the lowest access load among the storage devices and stores the target data in the cache memory.

The data rebalancing method according to claim 16, wherein the computer,
when the information processing terminal issues a read request for the target data, acquires the target data, based on the monitoring result, from the storage device with the lowest access load among the storage devices that store the target data, stores the target data in the cache memory, and transmits the target data to the information processing terminal.

The data rebalancing method according to claim 16, wherein the computer,
when the information processing terminal issues a write request for the target data, writes the target data, based on the monitoring result, to the storage device with the lowest access load among the storage devices and stores the target data in the cache memory, and
writes the target data stored in the cache memory to a storage device other than the storage device to which the write was performed.