JP5415338B2

JP5415338B2 - Storage system, load balancing management method and program thereof

Info

Publication number: JP5415338B2
Application number: JP2010080212A
Authority: JP
Inventors: 洋俊赤池; 和久藤本
Original assignee: Tohoku University NUC; Hitachi Ltd
Current assignee: Tohoku University NUC; Hitachi Ltd
Priority date: 2010-03-31
Filing date: 2010-03-31
Publication date: 2014-02-12
Anticipated expiration: 2030-03-31
Also published as: JP2011215677A

Description

The present invention relates to a computer system comprising a plurality of file servers that provide file services to computers and a storage device system (hereinafter, "storage system") that provides storage areas to those file servers. The invention also relates to a management method and a program for realizing load distribution on such a system.

In recent years, the access load on storage systems has grown with the rapid increase in the amount of data handled in data centers. Online transaction systems and HPC (High Performance Computing) systems in particular input and output large volumes of data, so the access load on file servers is increasing. File servers are therefore required to provide a high-performance file service that can cope with a high access load.

To meet this performance requirement, many data centers provide high-performance file services by operating a plurality of file servers in parallel. Many data centers also provide a shared file system namespace that unifies the file servers. Clients can thus receive file services as if the file servers together provided a single shared file system.

In conventional systems, however, load is not always allocated evenly across the file servers and may be concentrated on a small number of them. In that case, the performance of the entire shared file system is limited by the performance of the few file servers on which the load is concentrated. As a result, file read/write speed drops and response time increases, degrading the performance of the file service provided by the shared file system.

To solve this problem, Patent Document 1 discloses a technique for distributing client file accesses across a plurality of file servers. In the method of Patent Document 1, a session management unit and a server selection unit placed between the clients and the servers break the clients' file accesses down into individual accesses and send each one to a server selected by the round-robin method. This distributes the load on the file servers.

As another means of solving the problem, Patent Document 2 discloses a technique for moving part of the file system managed by one file server to another file server. In the method of Patent Document 2, when a file management program that monitors the state of access from clients to file servers detects that client file accesses are concentrated on a certain file server, it moves part of the file system managed by that file server to another file server. This distributes the load on the file servers.

Patent documents cited: JP 2002-351760 A; JP 2004-139200 A

Patent Document 1, a method for distributing the load of a storage system, distributes file server load by dispatching file accesses between the clients and the file servers. However, if the file server to which an access is dispatched does not manage the target file or file system, the access must be redirected to the file server that actually manages it, and the overhead of this redirection degrades file service performance.

The technique of Patent Document 2 distributes file server load by moving part of the file system from a file server on which file accesses are concentrated to another file server. However, when the load on the file server is already high because of concentrated access, for example when CPU utilization is close to 100%, moving the file system is difficult. Moreover, although the technique is meant to shift file accesses for load distribution, the move itself places additional load on the file server, which degrades file service performance.

To solve the above problems, one embodiment of the present invention has the following configuration. Specifically, it is a storage system comprising one or more file servers connected to a plurality of computers to which a first management device is connected, a storage device that is connected to the file servers and has one or more volumes, and a second management device connected to the file servers and the first management device. The first management device has an area for storing information on jobs executed sequentially on the computers (job information) and information on job queues (job queue information), an execution queue, and a queue in which jobs wait until they are executed.

The second management device has means for collecting the job information, means for collecting the job queue information, means for analyzing the collected job information and job queue information, means for managing load, and means for managing file systems. The means for analyzing the job information and job queue information identifies, from the job information, the files, file system, and file server that each job accesses.

The means for managing load calculates a predicted load based on the job information and the job queue information and executes load distribution. The means for managing file systems, in accordance with the load distribution determined by the means for managing load, moves a file system from a file server with a high predicted load to a file server with a low predicted load.

Other problems disclosed in this application, and their solutions, will become apparent from the description of the embodiments and the drawings.

According to the present invention, in batch-processing applications that demand high performance, load is distributed among a plurality of file servers operating in parallel on the basis of actual load and prediction, even when the load changes dynamically and abruptly. This makes it possible to provide a storage system that delivers a high-performance file service.

A diagram showing a configuration example of the storage system of the present invention and the computers and management servers connected to it.
A diagram showing a configuration example of the computer management server and the storage management server of the present invention.
A diagram excerpting a configuration example of the shared file system from the storage system of FIG. 1.
A diagram showing an example of the queues, computers, file servers, file systems, and predicted load before file server load distribution is executed.
A diagram showing an example of load distribution by switching file system mounts.
A diagram showing an example of the queues, computers, file servers, file systems, and predicted load after file server load distribution is executed.
A diagram showing an example of events that occur in the queue and the execution queue.
A diagram showing an example of the procedure for determining the timing of file server load distribution and executing it according to the present invention.
A diagram showing an example of the procedure for creating the predicted load according to the present invention.
A structural diagram showing an example of the load distribution target list and the load list of the present invention.
A structural diagram showing an example of the file system management table and the file server management table of the present invention.
A diagram showing an example of the procedure for calculating the threshold s1.
A diagram showing an example of the procedure for calculating the threshold s2.
A diagram showing an example of a computer execution script.

Embodiments of the invention are described below with reference to the drawings. The embodiments described below are examples; the present invention also covers system configurations that combine any of the functions described in this specification, and system configurations that combine all or some of those functions with well-known techniques. The functions executed in the embodiments are described as being implemented as programs executed on a computer. However, part or all of a program may be implemented in hardware.

[Example 1]
FIG. 1 shows the configuration of a system including the storage device of the first embodiment. The computer system 1 has computers 11, an IP switch 2, a storage system 9, and a computer management server 7. The storage system 9 has file servers 3, a Fibre Channel (FC) switch 4, a storage device 5, and a storage management server 8.

As shown in FIG. 1, the computers 11 are connected to the storage system 9 by connecting the computers 11 and the file servers 3 via the IP switch 2. The computers 11, the file servers 3, the storage device 5, and the storage management server 8 are also connected to one another via a LAN (Local Area Network) 6, which serves as the management network.

The interface connecting the file servers 3 and the storage device 5 is typically one that uses a protocol for sending block data, such as Fibre Channel or iSCSI. The file servers 3 and the storage device 5 may be connected directly, but in FIG. 1 they are connected via the FC switch 4.

The storage device 5 has a controller 51 and a hard disk mounting unit 58 that houses hard disks 59. The controller 51 has a CHA (channel adapter) 54, an interface that handles data write/read commands from host devices such as file servers or computers; a DKA (disk adapter) 56, a disk interface that is connected to the hard disks 59 and handles write/read commands to them; a cache memory 52; a shared memory 53; an SW 55; an internal LAN 57; and a management terminal 60.

The cache memory 52 and the shared memory 53 are memory devices shared by the CHA 54 and the DKA 56. The shared memory 53 is used mainly to store control information, commands, and the like. The cache memory 52 is used mainly to store data.

The SW 55 interconnects the cache memory 52, the shared memory 53, the CHA 54, and the DKA 56. Commands and data are exchanged among these components via the SW 55. The SW 55 typically consists of one or more switch devices that transfer data by high-speed switching, but it may instead consist of one or more common buses.

The hard disk mounting unit 58 has one or more groups of hard disks 59 that make up a RAID (Redundant Arrays of Inexpensive Disks). Such a group of hard disks 59 is called a RAID group (RAID Gr. 57). In the storage device, logical volumes are configured, each combining the storage space of one or more RAID Gr. 57. The storage device provides the logical volumes to host devices as storage areas, and a host device issues data write/read commands to these logical volumes.

When the CHA 54 receives a data write/read command from a host device, it controls the data transfer to and from the cache memory 52. The DKA 56 controls the data transfer to and from the cache memory 52 when data is written to or read from the hard disks 59. In doing so, the DKA 56 converts the data access request sent from the CHA 54, which specifies a logical address, into a data access request that specifies a physical address, and writes data to or reads data from the hard disks 59. Through this exchange of data between the CHA 54 and the DKA 56 via the cache memory 52, data is written from the host device to the hard disks 59 and read back from them. To perform this control, the CHA 54 and the DKA 56 each have one or more processors (not shown).

The CHA 54, the DKA 56, and the management terminal 60 are connected via the internal LAN 57. The storage management server 8, which is located outside the storage device, is connected to the internal LAN 57 via the LAN 6. An administrator can configure the logical volumes, the CHA 54, and the DKA 56 by operating the management terminal 60 through an input device (not shown).

The configuration of the controller 51 described above is only one example, and the configuration is not limited to it. It suffices that the controller 51 can write data to and read data from the hard disks 59 in response to data write/read requests from the computers 11.

The computer management server 7 has a CPU 71, a memory 72, a job scheduler 73, and an IP interface 74. The job scheduler 73 is executed by the CPU 71 using part of the memory 72 and manages the jobs executed by the computers 11.

The storage management server 8 has a CPU 81, a memory 82, an information collection unit 83, an information analysis unit 84, a load management unit 85, a file system management unit 86, and an IP interface 87. The information collection unit 83, the information analysis unit 84, the load management unit 85, and the file system management unit 86 are executed by the CPU 81 using part of the memory 82. The storage management server 8 uses these functions to execute load distribution for the file servers 3.

FIG. 2 shows the functions of the computer management server 7 and the storage management server 8. The job scheduler 73 of the computer management server 7 has a job management unit 201 that manages the jobs 211 to be executed by the computers 11, a queue 212 in which jobs wait until they are executed, an execution queue 213 that holds the jobs being executed by the computers, and a computer management unit 221 that manages job information and job queue information. The computer management unit 221 manages the computers by, for example, instructing them to start and end jobs, monitoring job execution status, and obtaining job execution results.

The jobs 211 in the queue 212 are arranged in execution order under the management of the job scheduler 73.

The storage management server 8 has an information collection unit 83, an information analysis unit 84, a load management unit 85, and a file system management unit 86. The information collection unit 83 collects from the job scheduler 73 the information on the jobs 211 (job information) and the information on the queue 212 and the execution queue 213 (job queue information), and the information analysis unit 84 analyzes the collected information.

The load management unit 85 has a load distribution timing determination unit 231 that determines when to execute load distribution, a predicted load creation unit 232 that predicts the maximum load up to a certain time ahead, a load distribution execution unit 233 that executes load distribution based on the predicted load, a file server management table 234, a load distribution target list 235, and a load list 236.

The file system management unit 86 has a file system management table 241 for managing the file systems of the file servers 3, and a file system moving unit 242.

Before load concentration actually occurs, the load distribution execution unit 233 instructs the file system management unit 86 to move a file system, and the file system moving unit 242 moves the file system between file servers. Specifically, a file system managed by a heavily loaded file server is moved to a lightly loaded file server. Load distribution based on prediction is executed in this way. The specific method of moving a file system is described later.

FIG. 3 shows the functions of the computers 11 and the file servers 3. Each computer 11 has a job execution unit 303 that executes jobs received from the job scheduler 73, and it executes the calculation program 304 specified by a job. The calculation program 304 performs file access to a file server 3. The file server 3 manages the file system 322 in the volume 321 that the storage device 5 provides as a storage area. The file server 3 also converts the file accesses it receives into block accesses and accesses the file system.

FIG. 4 shows an example of how the predicted load is created. The following processing is executed by the load management unit 85. In the example, jobs JOB#1, #2, and #3 in the execution queue 213 are being executed, and JOB#4, #5, #6, #7, and #8 are waiting in the queue 212. As shown in the figure, JOB#1 to JOB#8 access file systems FS#1 to FS#8, respectively. NAS#1 manages FS#1, #4, #5, #6, and #8; NAS#2 manages FS#2, #3, #7, and #9; and NAS#3 manages FS#10 and #11. The predicted load 403 is the per-NAS total of the loads placed on file systems FS#1 to FS#6 by JOB#1, #2, and #3 in the execution queue 213 and by JOB#4, #5, and #6, whose positions in the queue 212 are ahead of the threshold s2 401 calculated by a predetermined method.

As shown in the figure, the loads 405 that the jobs place on FS#1 to FS#6 are labeled load 1 to load 6, following the order of the jobs 211 in the execution queue 213 and the queue 212. When the predicted load 403 calculated for a NAS is greater than the predetermined load threshold 404, the load management unit 85 judges that NAS to be under high load. In the figure, only the predicted load 403 of NAS#1 is higher than the load threshold 404, so the load management unit 85 judges that NAS#1 is under high load.
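The per-NAS summation and threshold check described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the unit load values, and the threshold of 3.5 are hypothetical, and s2 is assumed here to be the number of waiting jobs taken into account.

```python
def predicted_load(running, waiting, s2, job_fs, fs_nas, job_load):
    """Return {NAS: predicted load} for running jobs plus the first s2 waiting jobs."""
    totals = {nas: 0.0 for nas in set(fs_nas.values())}
    for job in list(running) + list(waiting)[:s2]:
        totals[fs_nas[job_fs[job]]] += job_load[job]
    return totals


def high_load_servers(totals, threshold):
    """Return the NAS names whose predicted load exceeds the threshold."""
    return sorted(nas for nas, load in totals.items() if load > threshold)


# Layout of FIG. 4: JOB#n accesses FS#n. The load values are made up.
job_fs = {n: n for n in range(1, 9)}
fs_nas = {1: "NAS#1", 4: "NAS#1", 5: "NAS#1", 6: "NAS#1", 8: "NAS#1",
          2: "NAS#2", 3: "NAS#2", 7: "NAS#2", 9: "NAS#2",
          10: "NAS#3", 11: "NAS#3"}
job_load = {n: 1.0 for n in range(1, 9)}

totals = predicted_load([1, 2, 3], [4, 5, 6, 7, 8], 3, job_fs, fs_nas, job_load)
overloaded = high_load_servers(totals, threshold=3.5)
print(totals, overloaded)
```

With these assumed values, only NAS#1 is reported as overloaded, which matches the situation drawn in FIG. 4.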

FIG. 5 shows an example of the method the file system moving unit 242 uses to move a file system between file servers. In the example, NAS#1 501 provides a file service to the computer 11, and the computer 11 has NFS-mounted the NFS share 511 on the directory 521. By accessing the directory 521, the computer 11 can access files as if the file system 322 in the volume 321 were inside the computer 11 itself.

The file system is moved in the following four steps.
(1) The file system moving unit 242 issues an instruction to the computer 11. Following the instruction, the computer 11 NFS-unmounts the NFS share 511.
(2) The file system moving unit 242 issues an instruction to NAS#1 501. Following the instruction, NAS#1 501 stops the NFS share 511 and then unmounts the file system 322.
(3) The file system moving unit 242 issues an instruction to NAS#3 502. Following the instruction, NAS#3 502 mounts the file system 322 and then starts the NFS share 512.
(4) The file system moving unit 242 issues an instruction to the computer 11. Following the instruction, the computer 11 NFS-mounts the NFS share 512 on the directory 521.
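The ordering of the four steps can be made explicit with a small sketch that only builds the instruction sequence and executes nothing. The function name, the host and directory names, and the wording of each action are hypothetical placeholders, not commands of any real NFS implementation.

```python
def build_move_plan(client, src_nas, dst_nas, filesystem, mount_dir):
    """Return the ordered (target, action) instructions for one file system move."""
    return [
        (client, f"NFS-unmount {mount_dir}"),                      # step (1)
        (src_nas, f"stop NFS share for {filesystem}"),             # step (2)
        (src_nas, f"unmount {filesystem}"),                        # step (2)
        (dst_nas, f"mount {filesystem}"),                          # step (3)
        (dst_nas, f"start NFS share for {filesystem}"),            # step (3)
        (client, f"NFS-mount share of {dst_nas} on {mount_dir}"),  # step (4)
    ]


plan = build_move_plan("computer11", "NAS#1", "NAS#3", "FS322", "/dir521")
for target, action in plan:
    print(f"{target}: {action}")
```

The point of the ordering is that the client releases the share before the source server stops exporting it, and the destination server exports the file system before the client mounts it again.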

As a result of the move, NAS#3 manages the file system 322 in place of NAS#1, and NAS#3 provides the file service when the computer 11 accesses the directory 521.

In the above example, the file system moving unit 242 directly instructs the computer 11 to perform the NFS unmount and mount in steps (1) and (4). Alternatively, the file system moving unit 242 may control the NFS unmount and mount on the computer 11 by issuing instructions to a system that shares information among computers on a network, such as NIS (Network Information Service).

FIG. 6 shows an example of the load distribution operation based on the predicted load created in FIG. 4, together with the resulting change in predicted load. Load distribution performs the operation shown in FIG. 5, but first the source and destination file servers and the target file system are selected. In the case of FIG. 4, NAS#1 was judged to be under high load, and load 6 exceeded the load threshold 404.

In this case, the file server with the highest predicted load, NAS#1 in FIG. 6, is selected as the source of the file system move.

The file system whose load exceeds the load threshold 404, file system FS#6 carrying load 6 in FIG. 6, is selected as the target file system for load distribution.

The file server with the lowest predicted load, NAS#3 in FIG. 6, is selected as the destination of the file system move.

Next, the file system is actually moved by the move operation shown in FIG. 5, using the selected source file server, destination file server, and target file system. That is, the load distribution of FIG. 6 is achieved by moving file system FS#6 from NAS#1 to NAS#3 by the method shown in FIG. 5. This operation is carried out in advance, before JOB#6, which corresponds to load 6, starts executing. The move shifts load 6 from NAS#1 to NAS#3, which keeps the predicted load of NAS#1 below the load threshold 404. As a result, load can be distributed among the file servers before jobs run, even when job execution changes the load dynamically.
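The selection of source, target file system, and destination can be sketched as below. This is an illustration under assumptions: the function name and data layout are hypothetical, the loads are made-up unit values, and the target is taken to be the first file system whose load, stacked in job order, pushes the source server past the threshold.

```python
def choose_move(per_nas_loads, threshold):
    """per_nas_loads maps NAS -> [(file system, load), ...] in job order.

    Returns (source NAS, target file system, destination NAS),
    or None when no server exceeds the threshold.
    """
    totals = {nas: sum(load for _, load in items)
              for nas, items in per_nas_loads.items()}
    src = max(totals, key=totals.get)   # highest predicted load
    if totals[src] <= threshold:
        return None
    dst = min(totals, key=totals.get)   # lowest predicted load
    stacked = 0.0
    for fs, load in per_nas_loads[src]:
        stacked += load
        if stacked > threshold:         # this load crosses the threshold
            return src, fs, dst
    return None


per_nas_loads = {
    "NAS#1": [("FS#1", 1.0), ("FS#4", 1.0), ("FS#5", 1.0), ("FS#6", 1.0)],
    "NAS#2": [("FS#2", 1.0), ("FS#3", 1.0)],
    "NAS#3": [],
}
move = choose_move(per_nas_loads, threshold=3.5)
print(move)
```

For the assumed values this selects FS#6 to be moved from NAS#1 to NAS#3, as in FIG. 6.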

Executing load distribution continuously should be avoided, because load distribution itself places load on the file servers. The load on the file servers can be reduced by monitoring the events that occur in the queue 212 and the execution queue 213 and executing load distribution only when an event occurs.

FIG. 7 shows the events in the queue 212 and the execution queue 213 that are used to determine when to execute load distribution. The states of the jobs in the queue 212 and the execution queue 213 are together called the job state.

Initially, in job state (t=t1), the execution queue 213 contains only JOB#2, and the queue 212 contains JOB#3, #4, #5, #6, and #7.

In job state (t=t2), JOB#3 214 has moved from the queue 212 to the execution queue 213. This means that JOB#3 215 has started executing on a computer 11. A job execution start event is detected from the change between job states (t=t1) and (t=t2).

In job state (t=t4), JOB#6 216, which was in the queue 212 in job state (t=t3), has disappeared. This means that JOB#6 was canceled by a user and the scheduler deleted it from the queue 212. A job cancel event is detected from the change between job states (t=t3) and (t=t4).

Next, in job state (t=t6), JOB#2 217, which was in the execution queue 213 in job state (t=t5), has disappeared. This means that JOB#2 has finished executing. A job execution end event is detected from the change between job states (t=t5) and (t=t6).
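
The event detection described above can be sketched as a comparison of two consecutive snapshots of the queues. This is a minimal illustration, not the patented implementation: the `JobState` type, its field names, and the event labels are assumptions introduced here. It also assumes snapshots are taken often enough that a job cannot start and finish between two of them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobState:
    """One snapshot of the scheduler (field names are illustrative)."""
    waiting: tuple   # job IDs in queue 212, in queue order
    running: tuple   # job IDs in execution queue 213

def detect_events(prev, curr):
    """Return (event, job_id) pairs implied by two consecutive snapshots."""
    events = []
    for job in prev.waiting:
        if job in curr.running:
            events.append(("start", job))    # moved from queue 212 to 213
        elif job not in curr.waiting:
            events.append(("cancel", job))   # disappeared from queue 212
    for job in prev.running:
        if job not in curr.running:
            events.append(("end", job))      # disappeared from queue 213
    return events

s_t1 = JobState(waiting=(3, 4, 5, 6, 7), running=(2,))
s_t2 = JobState(waiting=(4, 5, 6, 7), running=(2, 3))
print(detect_events(s_t1, s_t2))   # [('start', 3)]
```

The two snapshots mirror job states (t=t1) and (t=t2) in the text, so the only event reported is the start of JOB#3.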

However, the timing of load distribution is not limited to the method described above. For example, load distribution may be performed (1) each time events have occurred a preset number of times, (2) each time the total change in load caused by events exceeds a preset value, or (3) at preset time intervals.
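
The three alternative timing policies can be combined in one small trigger object, sketched below. The class name, parameter names, and the choice to sum absolute load changes are assumptions made for illustration; the text does not specify how the change in load is accumulated.

```python
class BalanceTrigger:
    """Decides when load distribution is due (all names are hypothetical)."""

    def __init__(self, max_events=None, max_load_change=None, period_s=None):
        self.max_events = max_events            # policy (1): event count
        self.max_load_change = max_load_change  # policy (2): accumulated load change
        self.period_s = period_s                # policy (3): fixed interval in seconds
        self.events = 0
        self.load_change = 0.0
        self.last_run = 0.0

    def on_event(self, load_delta):
        """Record one queue event and the load change it causes."""
        self.events += 1
        self.load_change += abs(load_delta)

    def should_run(self, now):
        """Return True if any enabled policy fires, then reset the counters."""
        due = (
            (self.max_events is not None and self.events >= self.max_events)
            or (self.max_load_change is not None
                and self.load_change > self.max_load_change)
            or (self.period_s is not None
                and now - self.last_run >= self.period_s)
        )
        if due:
            self.events, self.load_change, self.last_run = 0, 0.0, now
        return due
```

A policy left as `None` is disabled, so the same object covers each of the three variants on its own or in combination.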

The load referred to above means the load that a job places on a file server by accessing a file system. For example, the file read/write transfer rate, IOPS (input/output operations per second), or the CPU utilization of the NAS file service can be defined as the load. The load management unit 85 uses these values as the magnitude of the load. The value giving the magnitude of the load is entered in the load magnitude 1027 of the load list 1003 managed in the load management unit 85 and is used in load distribution processing. Besides these definitions, a value calculated by some other method, such as a weighted sum of the above values, can also be defined as the load.
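
As one illustration of a weighted sum, the sketch below folds the three example metrics into a single load value. The function name, the units, and the weights are assumptions chosen for the example; the text does not prescribe any of them.

```python
def combined_load(throughput_mb_s, iops, cpu_percent, weights=(1.0, 0.5, 2.0)):
    """Weighted sum of transfer rate, IOPS, and CPU utilization.

    The weights are arbitrary illustrative values, not taken from the source.
    """
    w_tp, w_iops, w_cpu = weights
    return w_tp * throughput_mb_s + w_iops * iops + w_cpu * cpu_percent

print(combined_load(100.0, 200.0, 25.0))   # 250.0
```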

A load threshold 404, which is the upper limit of the load, is set in advance for each file server. This load threshold 404 is stored in advance as the load threshold 1122 of the file server management table 1102 and is used in load distribution processing.

FIG. 8 shows the load distribution procedure for the file servers of the present invention; the load management unit 85 executes this processing. First, in step 801, the load management unit 85 collects the job state from the job scheduler 73. In the next step 802, the load management unit 85 analyzes the job state and detects events. The event detection method is as described above with reference to FIG. 7.

In step 803, the load management unit 85 determines whether the event that occurred is one that triggers load distribution. If the event is a job execution start, job cancel, or job execution end event, the load management unit 85 proceeds to step 804. Otherwise, the load management unit 85 returns to step 801.

In step 804, the load management unit 85 executes load distribution. The details of load distribution are described below.

First, in step 811, the load management unit 85 creates the predicted load. This procedure is described in detail with reference to FIG. 9. Next, in step 812, the load management unit 85 creates the load distribution target list 1001, which collects the loads that exceed the load threshold 404. The load distribution target list 1001 resides in, and is managed by, the load management unit 85.

In the following step 813, the load management unit 85 deletes from the load distribution target list 1001 every load whose queue order 1014 ≤ threshold s1 402. The method for determining the threshold s1 is described in detail with reference to FIG. 12. In the next step 814, the load management unit 85 determines whether the size of the load distribution target list 1001 is 1 or more. If the size is 0, the load management unit 85 ends load distribution processing. If the size is 1 or more, the load management unit 85 proceeds to step 815.

In step 815, the load management unit 85 selects, from the load distribution target list 1001, the load distribution target 1002 whose mount-source NAS 1015 has the smallest number. If several load distribution targets 1002 share that smallest number, it further selects among them the one with the smallest queue order 1014. In step 816, it selects NAS#b, the NAS with the smallest predicted load 403, as the destination of the file system move.

In step 817, the load management unit 85 issues a mount switching instruction to the file system moving unit 242, specifying the file system 1013 recorded in the load distribution target 1002 selected in step 815 as the target, the NAS 1015 as the mount source, and NAS#b as the mount switching destination. The mount switch is carried out by the file system moving unit 242 of the file system management unit 86, using the method shown in FIG. 5 as described above. When step 817 is complete, the load management unit 85 returns to step 811.
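
Steps 811 to 817 can be sketched as the loop below. This is a simplified illustration under stated assumptions: each load is a plain dictionary with hypothetical keys `fs`, `nas`, `order`, and `size`; running jobs are given queue order 0 so that step 813 always excludes them; and the `max_moves` guard and the check that the destination differs from the source are additions made here to guarantee termination, not steps from FIG. 8.

```python
def rebalance(loads, thresholds, s1, max_moves=100):
    """Repeat steps 811-817 until no load distribution target remains.

    loads: list of dicts with keys fs, nas, order, size (mutated in place).
    thresholds: dict mapping NAS number to its load threshold.
    Returns the list of (file_system, source_nas, destination_nas) moves.
    """
    moves = []
    for _ in range(max_moves):                       # termination guard (assumption)
        predicted = {nas: 0.0 for nas in thresholds}           # step 811
        for ld in loads:
            predicted[ld["nas"]] += ld["size"]
        targets = [ld for ld in loads
                   if predicted[ld["nas"]] > thresholds[ld["nas"]]  # step 812
                   and ld["order"] > s1]                            # step 813
        if not targets:                                             # step 814
            break
        target = min(targets, key=lambda ld: (ld["nas"], ld["order"]))  # step 815
        dest = min(predicted, key=predicted.get)                        # step 816
        if dest == target["nas"]:
            break                                    # nowhere better to move (assumption)
        moves.append((target["fs"], target["nas"], dest))               # step 817
        for ld in loads:                             # every job on that file system follows it
            if ld["fs"] == target["fs"]:
                ld["nas"] = dest
    return moves

loads = [
    {"fs": "FS#2", "nas": 1, "order": 0, "size": 6.0},   # running job
    {"fs": "FS#6", "nas": 1, "order": 4, "size": 6.0},
    {"fs": "FS#3", "nas": 2, "order": 1, "size": 5.0},
    {"fs": "FS#4", "nas": 3, "order": 2, "size": 2.0},
]
moves = rebalance(loads, {1: 10.0, 2: 10.0, 3: 10.0}, s1=1)
print(moves)   # [('FS#6', 1, 3)]
```

With these made-up load values, NAS#1 starts at a predicted load of 12 against a threshold of 10, so FS#6 is moved to NAS#3, which has the smallest predicted load. The second pass finds no server over its threshold and the loop ends.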

FIG. 9 shows the procedure for creating the predicted load 403 in step 811. This processing is executed by the predicted load creation unit 232 of the load management unit 85. First, in step 901, the predicted load creation unit 232 initializes the load list 1003 (236 in FIG. 2).

Next, in step 902, the predicted load creation unit 232 acquires the job script 1401 of each job from the job scheduler 73. In step 903, the predicted load creation unit 232 analyzes each of the acquired job scripts 1401 and, from the result, determines the file system that the job accesses. Using the file system management table 1101, the predicted load creation unit 232 then determines the mount-source NAS of that file system and adds the result to the load list 1003 as load data 1004.

In the next step 904, the predicted load creation unit 232 acquires from the job scheduler 73 the position in the queue 212 and in the execution queue 213 of the job identified by the job ID 1022 of each load data 1004 in the load list 1003, and enters them in the order in queue 1024 and the order in execution queue 1025 of the load list 1003, respectively.

In the next step 905, the predicted load creation unit 232 enters the magnitude of the load corresponding to the job ID 1022 of each load data 1004 in the load list 1003 into the load magnitude 1027 of the load list 1003.

In the next step 906, the predicted load creation unit 232 deletes from the load list 1003 every load data 1004 whose order in queue 1024 ≥ threshold s2 401. The method for determining the threshold s2 is described in detail with reference to FIG. 13.

In the next step 907, the predicted load creation unit 232 initializes the predicted load 403 of each NAS to 0.

In the final step 908, for each load data 1004, the predicted load creation unit 232 adds the load 405 corresponding to the load magnitude 1027 to the predicted load 403 of the mount-source NAS 1026 recorded in that load data, and then ends the processing.
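
The steps of FIG. 9 can be condensed into the sketch below. It assumes the job script analysis of steps 902 and 903 has already produced, for each job, the accessed file system and a load magnitude; the dictionary keys, the use of `None` as the queue order of a running job, and the function name are illustrative choices, not part of the source.

```python
def build_predicted_load(jobs, fs_to_nas, nas_names, s2):
    """Rebuild the load list and the per-NAS predicted load from scratch.

    jobs: list of dicts with keys job_id, fs, queue_order, size;
          queue_order is None for a job in the execution queue.
    fs_to_nas: mapping from file system name to mount-source NAS
               (stands in for file system management table 1101).
    """
    load_list = []                                        # step 901
    for job in jobs:                                      # steps 902-905
        load_list.append({**job, "nas": fs_to_nas[job["fs"]]})
    load_list = [ld for ld in load_list                   # step 906
                 if ld["queue_order"] is None or ld["queue_order"] < s2]
    predicted = {nas: 0.0 for nas in nas_names}           # step 907
    for ld in load_list:                                  # step 908
        predicted[ld["nas"]] += ld["size"]
    return predicted, load_list

fs_to_nas = {"FS#2": "NAS#1", "FS#6": "NAS#1", "FS#7": "NAS#2"}
jobs = [
    {"job_id": 2, "fs": "FS#2", "queue_order": None, "size": 4.0},
    {"job_id": 6, "fs": "FS#6", "queue_order": 2, "size": 3.0},
    {"job_id": 7, "fs": "FS#7", "queue_order": 5, "size": 9.0},
]
predicted, load_list = build_predicted_load(jobs, fs_to_nas, ["NAS#1", "NAS#2"], s2=5)
print(predicted)   # {'NAS#1': 7.0, 'NAS#2': 0.0}
```

In this made-up example the job at queue position 5 is dropped by step 906 because its position is not below s2, so it contributes nothing to NAS#2.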

In the procedure described here, the load list 1003 is rebuilt from an initialized state every time the predicted load is created. Rebuilding guarantees a correct load prediction, but it is inefficient because much of the same processing is repeated each time.

In particular, when the file server 3 manages a large number of file systems 322, rebuilding increases the amount of processing needed to create the predicted load. When the load changes frequently, the predicted load is created frequently, which requires even more processing.

Alternatively, instead of rebuilding the predicted load, it can be created efficiently by updating only what has changed. For example, when a job in the queue 212 reaches a position smaller than the threshold s2, the predicted load 403 is updated by adding the load corresponding to that job. When a job execution end event or a job cancel event occurs, the predicted load creation unit 232 updates the predicted load 403 by removing the load corresponding to the job that ended or was canceled. With this approach, however, every change to the predicted load must be applied reliably.

Therefore, if the management server 8 is ever unable to apply a change to the load prediction, for example because of a failure, the load prediction is initialized and rebuilt. Another option for keeping the load prediction reliable is to initialize and rebuild it periodically.
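
The incremental approach, with a full rebuild as the fallback, might look like the sketch below. The class and method names are hypothetical. Keeping a per-job record alongside the per-NAS totals is one way to make additions and removals idempotent, which is an assumption made here to address the requirement that changes be applied reliably.

```python
class PredictedLoad:
    """Per-NAS predicted load maintained by incremental updates."""

    def __init__(self, nas_names):
        self.total = {nas: 0.0 for nas in nas_names}
        self.by_job = {}                     # job_id -> (nas, size)

    def add(self, job_id, nas, size):
        """Called when a queued job reaches a position smaller than s2."""
        if job_id in self.by_job:
            return                           # already counted
        self.by_job[job_id] = (nas, size)
        self.total[nas] += size

    def remove(self, job_id):
        """Called on a job execution end event or a job cancel event."""
        entry = self.by_job.pop(job_id, None)
        if entry is not None:
            nas, size = entry
            self.total[nas] -= size

    def rebuild(self, entries):
        """Fallback: reinitialize and recreate from (job_id, nas, size) tuples."""
        for nas in self.total:
            self.total[nas] = 0.0
        self.by_job.clear()
        for job_id, nas, size in entries:
            self.add(job_id, nas, size)
```

`rebuild` corresponds to the recovery path after a missed update and to the periodic reinitialization mentioned above.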

FIG. 12 shows the procedure for calculating the threshold s1 402. This processing is also executed by the load management unit 85. First, in step 1202, the times at which job execution events occurred are collected as samples.

Next, in step 1203, the probability P1 of how many jobs execute within time t1 is calculated from the collected samples of job execution event times. For example, when the samples span time ts to time te, the probability P1(1) that one or fewer jobs execute is calculated as follows. First, for each time t at one-minute intervals from ts to te, the samples are used to count how many times one or fewer jobs executed between time t and time t+t1. That count is then divided by the number of times t at one-minute intervals from ts to te, which gives P1(1). The probability P1 is calculated in the same way for P1(2), the probability that two or fewer jobs execute, P1(3), the probability that three or fewer jobs execute, and so on.
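
The counting procedure for P1(k) can be written directly as below. The sketch assumes event times are integer minutes and treats each window as the half-open interval from t up to, but not including, t+t1; the boundary handling is an assumption, since the text does not state it. The function name and sample data are made up.

```python
def prob_at_most(start_minutes, ts, te, window, k):
    """Fraction of windows [t, t + window) holding at most k job starts.

    One window is evaluated for every minute t from ts to te inclusive.
    """
    hits = 0
    total = 0
    for t in range(ts, te + 1):
        n = sum(1 for s in start_minutes if t <= s < t + window)
        if n <= k:
            hits += 1
        total += 1
    return hits / total

starts = [0, 3, 4, 10]                       # made-up job start times in minutes
print(prob_at_most(starts, 0, 9, 3, 1))      # 0.8
```

Here 8 of the 10 three-minute windows contain at most one job start, so the estimate for k=1 is 0.8.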

Any other calculation method may be used as long as it yields the probability P1 of how many jobs execute within time t1 from the collected samples of job execution event times. For example, the distribution of the probability P1 of job execution may be estimated statistically from the samples of job execution event times, and the probability that jobs execute within time t1 may be calculated from the estimated distribution.

Graph 1210 in FIG. 12 shows an example of the probability P1. Next, in step 1204, the largest queue position k for which probability P1 > threshold Pth1 is calculated. In the figure, k=1. In step 1205, the threshold s1 is set to the k calculated in step 1204. Setting the threshold s1 excludes from the load distribution candidates the loads of jobs in the queue 212 whose position k ≤ s1, so that no job starts executing while its file system is being moved.
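
Steps 1204 and 1205 reduce to a lookup over a table of probabilities, sketched below. The table values are invented for illustration and are not read from graph 1210. Returning 0 when no position exceeds the threshold is also an assumption; it has the effect of excluding no queued job.

```python
def select_s1(p1, p_th1):
    """Largest queue position k with P1(k) > Pth1, or 0 if there is none.

    p1 maps each queue position k to its probability P1(k); the function
    makes no assumption about how P1 varies with k.
    """
    over = [k for k, p in p1.items() if p > p_th1]
    return max(over) if over else 0

p1 = {1: 0.02, 2: 0.00005, 3: 0.000001}      # illustrative values only
s1 = select_s1(p1, p_th1=0.0001)
print(s1)                                    # 1

# Queue positions still eligible for load distribution (position k > s1).
print([k for k in (1, 2, 3, 4, 5) if k > s1])   # [2, 3, 4, 5]
```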

For this reason, time t1 is set to the time required to move a file system. For example, it may be set to the maximum file system move time given in the file server specifications, or to the longest move time observed in past file system moves plus a margin of several minutes.
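
The two ways of choosing t1 can be expressed as a small helper. The function name, the parameter names, and the default margin of 3 minutes are assumptions; the text says only "several minutes". Applying the margin to the observed maximum but not to the specification value is one reading of the sentence above.

```python
def move_time_budget(past_move_minutes, spec_max_minutes=None, margin_minutes=3.0):
    """Return t1 in minutes.

    Uses the specification maximum when one is given; otherwise the longest
    observed move time plus a safety margin (the margin value is hypothetical).
    """
    if spec_max_minutes is not None:
        return spec_max_minutes
    return max(past_move_minutes) + margin_minutes

print(move_time_budget([4.0, 7.5, 6.0]))                         # 10.5
print(move_time_budget([4.0, 7.5, 6.0], spec_max_minutes=15.0))  # 15.0
```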

The threshold Pth1 is set to the acceptable probability that a job starts executing while a file system is being moved. For example, a sufficiently small probability such as Pth1 = 0.0001 is set. The threshold s1 is called the load distribution selection threshold.

FIG. 13 shows the procedure for calculating the threshold s2 401. This processing is also executed by the load management unit 85. First, in step 1302, the times at which job execution events occurred are collected as samples.

Next, in step 1303, the probability P2 of how many jobs execute within time t2 is calculated from the collected samples of job execution event times. For example, when the samples span time ts to time te, the probability P2(1) that one or fewer jobs execute is obtained as follows. For each time t at one-minute intervals from ts to te, the samples are used to count how many times one or fewer jobs executed between time t and time t+t2.

That count is then divided by the number of times t at one-minute intervals from ts to te, which gives P2(1). The probability P2 is calculated in the same way for P2(2), the probability that two or fewer jobs execute, P2(3), the probability that three or fewer jobs execute, and so on.

Any other calculation method may be used as long as it yields the probability P2 of how many jobs execute within time t2 from the collected samples of job execution event times. For example, the distribution of the probability P2 of job execution may be estimated statistically from the samples of job execution event times, and the probability that jobs execute within time t2 may be calculated from the estimated distribution. Graph 1310 in FIG. 13 shows an example of the probability P2.

Next, in step 1304, the smallest queue position k for which probability P2 < threshold Pth2 is calculated. In the figure, k=5. In step 1305, the threshold s2 is set to the k calculated in step 1304. With the threshold s2 set, a job in the queue 212 whose position k ≥ s2 can be judged to have a long time remaining until it executes.

Excluding the loads of those jobs from the predicted load improves the accuracy of the predicted load for job execution. For this reason, time t2 is set to the time range used for calculating the predicted load. Because the predicted load is calculated by collecting the loads of jobs that may execute at the same time, t2 is set, for example, to the longest execution time among the jobs executed so far.

If the longest job execution time was abnormally long on that one occasion and is considered unreliable, the second-longest job execution time, for example, may be used instead. The threshold Pth2 is set to the acceptable probability that a job expected to fall outside the time range used for calculating the predicted load nevertheless executes within it. For example, a sufficiently small probability such as Pth2 = 0.0001 is set. The threshold s2 is called the predicted load exclusion threshold, and time t2 is called the predicted load target time.

FIG. 10 shows an example of the structure of the load distribution target list 1001 (235 in FIG. 2) and the load list 1003 (236 in FIG. 2). The load distribution target list 1001 consists of a load management number 1011, a job ID 1012, the file system 1013 accessed by the job, the order in queue 1014 corresponding to the job ID, the mount-source NAS 1015 of the file system, and a load magnitude 1016. Each row of the list holds one load distribution target 1002. The load list 1003 consists of a load management number 1021, a job ID 1022, the file system 1023 accessed by the job, the order in queue 1024 corresponding to the job ID, the order in execution queue 1025 corresponding to the job ID, the mount-source NAS 1026 of the file system, and the load magnitude 1027 corresponding to the job ID.

FIG. 11 shows an example of the structure of the file system management table 1101 (241 in FIG. 2) and the file server management table 1102 (234 in FIG. 2). The file system management table 1101 consists of a file system name 1111, a directory name 1112 corresponding to the directory 521 on which the computer NFS-mounts the file system, and a mount-source NAS 1113. The file server management table 1102 consists of a NAS name 1121 and a load threshold 1122.
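
For orientation, the lists of FIG. 10 and the tables of FIG. 11 map naturally onto record types. The sketch below is one possible representation: the class names, field names, and field types are assumptions, while the reference numerals in the comments come from the text. The directory path in the usage lines is made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BalanceTarget:                 # one row of load distribution target list 1001
    number: int                      # 1011 load management number
    job_id: str                      # 1012
    file_system: str                 # 1013
    queue_order: int                 # 1014
    mount_nas: str                   # 1015
    size: float                      # 1016 load magnitude

@dataclass
class LoadEntry:                     # one row of load list 1003
    number: int                      # 1021 load management number
    job_id: str                      # 1022
    file_system: str                 # 1023
    queue_order: Optional[int]       # 1024, None if the job is not waiting
    exec_queue_order: Optional[int]  # 1025, None if the job is not running
    mount_nas: str                   # 1026
    size: float                      # 1027 load magnitude

@dataclass
class FileSystemRow:                 # one row of file system management table 1101
    name: str                        # 1111
    directory: str                   # 1112
    mount_nas: str                   # 1113

@dataclass
class FileServerRow:                 # one row of file server management table 1102
    nas_name: str                    # 1121
    load_threshold: float            # 1122

def mount_nas_of(fs_table, fs_name):
    """Look up the mount-source NAS of a file system in table 1101."""
    for row in fs_table:
        if row.name == fs_name:
            return row.mount_nas
    raise KeyError(fs_name)

fs_table = [FileSystemRow("FS#6", "/mnt/fs6", "NAS#1")]   # path is hypothetical
print(mount_nas_of(fs_table, "FS#6"))                      # NAS#1
```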

FIG. 14 shows an example of the contents of the job script 1401. The job script 1401 contains the number of CPUs 1411 used by the job, the maximum execution time 1412 of the job, the maximum amount of memory 1413 used by the job, the input file name 1414 that the job reads as input data, the output file name 1415 to which the job writes its data, and the executable file name 1416 of the program that the job runs on the computer 11.

1 Computer system
2 IP switch
3 File server
4 FC switch
5 Storage device
6 LAN
7 Computer management server
8 Storage management server
51 Controller
57 RAID Gr.
58 Hard disk mounting unit
59 Hard disk
73 Job scheduler
83 Information collection unit
84 Information analysis unit
85 Load management unit
86 File system management unit

Claims

A storage system comprising: two or more file servers connected to a plurality of computers to which a first management device is connected; a storage device connected to the file servers and having one or more volumes; and a second management device connected to the file servers and the first management device, wherein
the first management device has an area for storing information on jobs executed sequentially on the computers (job information) and information on a job queue (job queue information), an execution queue, and a queue in which jobs wait until execution,
the second management device has means for collecting the job information, means for collecting the job queue information, means for analyzing the collected job information and job queue information, means for managing the load, and means for managing the file system,
the means for analyzing the job information and the job queue information identifies, based on the job information, the file, file system, and file server that a job accesses,
the means for managing the load calculates a predicted load for each file server based on the job information and the job queue information, and decides to execute load distribution when the predicted load exceeds a predetermined load threshold, and
when the means for managing the load decides to execute load distribution, the means for managing the file system moves the file system corresponding to the load from a file server with a high predicted load to a file server with a low predicted load.

The storage system according to claim 1, wherein
the means for managing the load selects, based on the predicted loads calculated for the respective file servers, a file server whose predicted load exceeds a preset load threshold, selects as a load distribution target a part of the load that exceeds the load threshold, and selects the file server with the smallest predicted load as the destination file server for that load.

The storage system according to claim 2, wherein
the means for managing the load identifies, based on the job information and the job queue information, the file system and file server accessed by each job in the job queue, and calculates the predicted load by summing, for each file server, the loads imposed by the jobs in the job queue.

The storage system according to claim 3, wherein
the means for managing the load removes a load from the load distribution targets when the position in the job queue of the job corresponding to that load is equal to or less than a load distribution selection threshold.

The storage system according to claim 4, wherein
the means for managing the load deletes a load imposed by a job whose position in the job queue is equal to or greater than a predicted load exclusion threshold.

The storage system according to claim 5, wherein
the means for managing the load calculates, by a predetermined method based on the frequency of occurrence of events, the maximum number of jobs executed within a predetermined predicted load target time within a predetermined probability range, and sets that number as the predicted load exclusion threshold.

The storage system according to claim 6, wherein
the means for managing the load detects an event occurring in the job queue by taking the difference between the current and past contents of the job queue information.

The storage system according to claim 7, wherein
the means for managing the load determines the timing of load distribution by a predetermined method based on the occurrence of the event.

The storage system according to claim 7, wherein
the means for managing the load calculates, by a predetermined method based on the frequency of occurrence of the event, the maximum number of jobs executed within a predetermined file system move time within a predetermined probability range, and sets that number as the load distribution selection threshold.

A load balancing management method for a storage system having two or more file servers connected to a plurality of computers to which a first management device is connected, a storage device connected to the file servers and having one or more volumes, and a second management device connected to the file servers and the first management device, wherein,
where the first management device has an area for storing information on jobs executed sequentially on the computers (job information) and information on a job queue (job queue information), an execution queue, and a queue in which jobs wait until execution,
the second management device has a process of collecting the job information, a process of collecting the job queue information, a process of analyzing the collected job information and job queue information, a process of managing the load, and a process of managing the file system,
the process of analyzing the job information and the job queue information identifies, based on the job information, the file, file system, and file server that a job accesses,
the process of managing the load calculates a predicted load for each file server based on the job information and the job queue information, and decides to execute load distribution when the predicted load exceeds a predetermined load threshold, and
when the process of managing the load decides to execute load distribution, the process of managing the file system moves the file system corresponding to the load from a file server with a high predicted load to a file server with a low predicted load.

A program that causes a computer to execute load balancing management processing for a storage system having two or more file servers connected to a plurality of computers to which a first management device is connected, a storage device connected to the file servers and having one or more volumes, and a second management device connected to the file servers and the first management device, wherein,
where the first management device has an area for storing information on jobs executed sequentially on the computers (job information) and information on a job queue (job queue information), an execution queue, and a queue in which jobs wait until execution,
the program causes the computer mounted on the second management device to execute a process of collecting the job information, a process of collecting the job queue information, a process of analyzing the collected job information and job queue information, a process of managing the load, and a process of managing the file system,
the process of analyzing the job information and the job queue information identifies, based on the job information, the file, file system, and file server that a job accesses,
the process of managing the load calculates a predicted load for each file server based on the job information and the job queue information, and decides to execute load distribution when the predicted load exceeds a predetermined load threshold, and
when the process of managing the load decides to execute load distribution, the process of managing the file system moves the file system corresponding to the load from a file server with a high predicted load to a file server with a low predicted load.