JP2012173751A - Shared resource management system and resource management server device

Info

Publication number: JP2012173751A
Application number: JP2011031746A
Authority: JP
Inventors: Yasushi Kobayashi; 靖司小林
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-02-17
Filing date: 2011-02-17
Publication date: 2012-09-10
Anticipated expiration: 2031-02-17
Also published as: JP5732893B2

Abstract

PROBLEM TO BE SOLVED: To improve the availability of an information processing center in which shared resources are deployed. SOLUTION: A configuration catalog 140 describes the configuration of each system, with resources as its components; a center manager can update its contents through appropriate input means. A function module 150 obtains state information and failure information from each resource constituting the servers (1000-5000) and notifies a function module 120 of this information. The function module 120 collects the notified state information and failure information and stores it in a database 110. A function module 130 refers to the state information and failure information, rearranges the resources, works out the configuration of the system requested in the configuration catalog 140, and mounts the hardware constituting that system, including its wiring.

Description

The present invention relates to a shared resource management system and a resource management server device, and in particular to a shared resource management system and a resource management server device that manage the state of resources shared by a plurality of systems, including their state at the time of a failure, and that can reallocate the function of a failed resource to other resources when a failure or the like occurs.

In recent years, large-scale data centers (resource sharing management center systems) that manage customers' server devices and the like have increasingly adopted resource-sharing information processing systems. In such a system, the various kinds of data-center-side resources, including the data-center-side server devices, are managed centrally, and the plurality of systems made up of customer-side server devices and the like use these data-center-side resources as shared resources.
Here, the data-center-side resources are the CPU devices, memory devices, disk devices, network devices (including interfaces), and the like that constitute a data-center-side server device; the data-center-side server device made up of these devices is itself also treated as a resource.

Accordingly, a large-scale data center or the like that manages customer-side server devices holds a plurality of these resources and uses them to operate a plurality of systems that include the customer-side server devices.
Some of the data-center-side server groups constituting these systems have been developed with a cluster configuration that applies availability techniques so that the system can remain in continuous use (in availability terms, hot standby or cold standby). In this case, when some of the resources of a server device in the cluster fail, the server device containing the failed resource may have to be disconnected from the system and repaired; resolving this was one of the problems addressed by the present invention.

In addition, the data center side then has to manage the server devices and the like disconnected from the system manually; resolving this was also one of the problems addressed by the present invention.
Furthermore, resources that can take the place of operating server devices need to be pooled in advance on the data center side as substitute server devices, so that when a failure occurs the operating server device can be disconnected from the system and the substitute server device brought into operation in real time. Providing such a technique was also one of the problems addressed by the present invention.

Moreover, in preparation for a request from the data center manager to configure a new server device or the like, resources able to meet the configuration requirements of that new server device are pooled in advance under centralized management, and when such a request is issued, a server device meeting the requirements is automatically mounted, including its wiring. Achieving this was also one of the problems addressed by the present invention.

As known art in this field, Patent Document 1, for example, discloses a technique that provides an "automatic" mechanism for defining resource groups in a high-availability cluster network configuration. Specifically, the administrator only needs to identify the set of resources that must be collocated with a given application in case the computer currently running that application fails. The resource groups are then generated automatically using a set of collocation "constraints," or rules. The first collocation constraint preferably enforces any user-defined collocation for a given application, and the second constraint collocates disk partition resources that reside on the same physical disk.

Patent Document 2, for example, discloses a technique that lets an administrator grasp and manage resource usage for each node, with the aim of reducing the management cost of a cluster-configuration storage system. Specifically, the management server holds the integrated data physical location information of the nodes and includes means for presenting it in association with the configuration of the cluster-configuration storage system and the configuration of each node. Each node includes means for acquiring the amount of resources used in processing and the load on them. The management server includes means for collecting and aggregating the usage of each resource from each server, and means for presenting the resource usage, the data information, and the configuration information in association with one another in a hierarchy. Means are further provided for moving a logical volume between nodes transparently to the host computer. The management server also includes means for supporting, through information display, the selection of the source data and the destination physical location, and means for receiving the move instruction and instructing the cluster-configuration storage system to perform the move.

Patent Document 3, for example, discloses a technique for performing flexible autonomous control according to control requirements (policies) in an autonomous control system composed of servers, storage, and network devices connected by a network. Specifically, various policies are stored in a policy DB or the like, and each node constituting the system cooperates with other nodes and performs autonomous control based on the policies. Spare resources are managed using a shared pool, a bare metal pool, and a standby pool, and at the time of a failure or performance degradation a workgroup system resource manager selects spare resources from the standby pool first and then from the bare metal pool.

Patent Document 4, for example, discloses means for finding resources whose use has no adverse effect even when another information processing system is already in operation, so that business services can continue to run and shared resources can be used effectively. Specifically, it is a resource management system comprising: a storage unit that stores, for each business application, information identifying the computer systems that execute the business application and resource utilization information including the utilization of the resources constituting each computer system; and an impact range investigation unit that, when a business application newly requests use of the resources of a computer system, uses the resource utilization information to aggregate and output the post-change utilization of each resource constituting that computer system.

Patent Document 5, for example, discloses a technique for dynamically providing software resources such as operating systems, application programs, and software drivers.

Patent Document 1: JP 2001-109639 A
Patent Document 2: JP 2003-296039 A
Patent Document 3: JP 2005-346204 A
Patent Document 4: JP 2008-033852 A
Patent Document 5: JP 2008-502967 A (published Japanese translation of a PCT application)

In the conventional shared resource management systems described in the background art above, a large-scale data center or the like that manages a plurality of systems built to include customers' servers holds and manages the plurality of resources. Some of the data-center-side server groups constituting each system have a cluster configuration that applies availability techniques.
As noted above, when some of the resources of a server in such a cluster fail, the server containing the failed resource must be disconnected from the system and managed manually, and in some cases repaired. Conventional resource-sharing information processing systems, however, cannot accommodate this kind of management and repair.

On the data center side, resources that can take the place of operating server devices need to be pooled in advance as substitute server devices, so that when a failure or the like occurs the operating server device can be disconnected from the system and the substitute server device brought into operation in real time. Conventional resource-sharing information processing systems, however, do not go so far as to pool such substitute server devices in advance and supply them when a failure or the like occurs.

Furthermore, conventional resource-sharing information processing systems do not go so far as to extract, from resources pooled in advance under centralized management, the resources that match the configuration requirements of a new system (server device or the like) specifically requested by the data center manager, and to supply those resources when a failure or the like occurs.
The central problem addressed by the present invention was therefore to build a system that does not have to stop providing its service even when a resource constituting the system fails, that is, a system that places greater emphasis on availability.

In recent years, technologies for realizing virtual environments and the like have advanced rapidly, driven by the need to reduce the maintenance and operating costs of physical equipment and by technical considerations. Reducing system management and operating costs by using physical resources such as equipment effectively and by simplifying operators' work has consequently become an important issue.
Raising the utilization of the resources of the system environment as a whole and lightening the workload of center operation managers, by adopting visualization techniques that mitigate the growing complexity of virtualization technology, has also become an important issue.

With the technique described in Patent Document 1, when a computer executing a given application fails, the administrator merely identifies the set of resources to be collocated with that application, after which resource groups are generated automatically using a set of collocation "constraints," or rules. Unlike the present invention, it does not collect failure information and reallocate the function of a failed resource to other resources as a substitute, nor does it try to make as much use as possible of those devices within a failed resource that could still be used to configure another system.

The technique described in Patent Document 2 is limited to reducing the management cost of a cluster-configuration storage system, so the only resource it considers is the storage system. Unlike the present invention, it does not treat servers, and the CPU devices, memory devices, disk devices, and network devices (including interfaces) constituting those servers, as resources.

The technique described in Patent Document 3 defines service policies in terms of layers such as the Front layer, Web layer, AP layer, and DB layer, and adds or switches server groups according to the per-layer policy when a failure occurs in a layer. The present invention, by contrast, is not concerned with what the upper-level servers in the Web layer, AP layer, and so on are running. Because it is not layer-aware, the components of a system in the present invention are not layer services but hardware-level resources (for example, CPU devices, memory devices, disk devices, and network devices (including interfaces)).

The technique described in Patent Document 4 defines business applications as business configuration objects, whereas the present invention does not tie business configuration objects to resource pooling. In addition, while the present invention is characterized by the prompt provision of substitute resources when a failure occurs, the failure countermeasure and recovery functions for the configuration requirements are left to each subsystem, and the management server manages the configuration of each subsystem separately from the management of the pooled resources.

The technique described in Patent Document 5 is mainly a software-side measure that dynamically provides, in particular, operating systems, application programs, and software drivers. It therefore does not solve the problem addressed by the present invention, namely providing hardware resources whose configuration can be changed flexibly and dynamically.

In short, the present invention concerns means for sharing resources with other system groups and automatically changing resources at the time of a failure or the like. Its gist is that a central management center (management server) collects the state information of the failed resource, checks the business catalog (configuration catalog, requirements) of another system requested by the customer (center manager) against the accumulated (pooling) information on resources managed in an integrated manner in advance, and, when a substitutable resource exists, incorporates that resource to build the other system requested by the customer.
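The substitution step described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Resource` record, its fields, the `find_substitute` helper, and the identifiers such as `nic-5100` are hypothetical names chosen for the example, and the rule "same kind, not failed, not in use" is one plausible reading of "substitutable", not a rule stated in the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    resource_id: str       # hypothetical identifier, e.g. "nic-3100"
    kind: str              # e.g. "cpu", "memory", "disk", "network_if", "server"
    failed: bool = False   # taken from the collected failure information
    in_use: bool = False   # True while some configured system uses the resource


def find_substitute(failed: Resource, pool: List[Resource]) -> Optional[Resource]:
    """Return the first pooled resource of the same kind that is healthy and idle."""
    for candidate in pool:
        if (candidate.kind == failed.kind
                and not candidate.failed
                and not candidate.in_use):
            return candidate
    return None


# Assumed pooling information held by the management server.
pool = [
    Resource("nic-3100", "network_if", failed=True, in_use=True),
    Resource("nic-3200", "network_if", in_use=True),
    Resource("nic-5100", "network_if"),
    Resource("disk-5001", "disk"),
]

substitute = find_substitute(pool[0], pool)
print(substitute.resource_id if substitute else "no substitute available")
```

With the assumed pool above, the idle network interface is selected; when no healthy idle resource of the same kind exists, the helper returns `None` and the caller has to decide how to proceed.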

The present invention has been made in view of the above conventional problems. Its object is to provide a shared resource management system and a resource management server device that, when a failure occurs in some of the resources shared by a plurality of systems configured via a network, can reallocate the function of the failed resource to other resources on the basis of information on the failure and the new server requirements requested by the system administrator.

To solve the above problems, a shared resource management system according to the present invention comprises: a plurality of resources serving as hardware resources; means for collecting status information indicating the state of each of the plurality of resources and failure information indicating the failure status of each of the plurality of resources; means for registering the collected status information and failure information in a database; a configuration catalog in which the hardware configuration of each of a plurality of already-configured systems is described with at least some of the plurality of resources as components, and in which the hardware configuration of a system to be newly configured can likewise be described with at least some of the plurality of resources as components; and means for, when the configuration catalog is updated, referring to the status information and the failure information, confirming that each of the resources constituting the hardware of the new system specified in the configuration catalog is operable, and then configuring and mounting the hardware of the system that uses those resources.

A resource management server device according to the present invention is hardware-connected to a plurality of resources serving as hardware resources and comprises: means for collecting status information indicating the state of each of the plurality of resources and failure information indicating the failure status of each of the plurality of resources, and registering them in a database; a configuration catalog in which the hardware configuration of each of a plurality of already-configured systems is described with at least some of the plurality of resources as components, and in which the hardware configuration of a system to be newly configured can likewise be described with at least some of the plurality of resources as components; and means for, when the configuration catalog is updated, referring to the status information and the failure information, confirming that each of the resources constituting the hardware of the new system specified in the configuration catalog is operable, and then configuring and mounting the hardware of the system that uses those resources.

As described above, in the shared resource management system of the present invention, resources are managed and accumulated (pooled) in an integrated manner in advance. When a failure occurs in some of the resources shared by a plurality of systems configured via a network, the new server requirements requested by the system administrator are checked against this managed and accumulated information, and when a substitutable resource exists, the system is reconfigured using that resource and the corresponding hardware, such as a server device, is mounted, including its wiring. This has the effect of improving the availability of the information processing center in which the shared resources are deployed.

FIG. 1 is a configuration diagram showing the overall configuration of a shared resource management system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing that a failure has occurred in network I/F 3100, which is one of the resources of server 3000, in the shared resource management system of the embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of operation when a failure occurs in network I/F 3100, which is one of the resources of server 3000.
FIG. 4 is an explanatory diagram showing another example of operation when a failure occurs in network I/F 3100, which is one of the resources of server 3000.
FIG. 5 is a sequence diagram showing an example of the order of operations of the shared resource management system according to the embodiment of the present invention.

The shared resource management system of the present invention is characterized in that, in a large-scale data center or the like that manages customer-side server devices, it manages the state of the resources deployed on the data center side so that they can be shared among the plurality of systems configured to include the customer-side server devices.
Here, the resources deployed on the data center side are the hardware resources allocated to building a system, for example the CPU devices, memory devices, disk devices, and network devices (which provide the interface to the network) constituting a data-center-side server device. In the present invention, the server device made up of these devices is itself also regarded as a resource.

The present invention is also characterized in that resources, including those not currently in use, are pooled and managed centrally.
Here, pooling means holding a certain number of resources.
The present invention is further characterized by a function that uses the pooled resources to configure a system meeting the configuration requirements of a new server device or the like requested by the data center manager, and that automatically builds that configuration (mounts it as hardware, including wiring and the like).

In the present invention, the hardware of a system with a given configuration is built from various resources, and some of these resources may be left unused for operational reasons, for configuration reasons, or because of a failure.
The shared resource management system of the present invention manages the state of all resources, including such unused ones. From the pooled resources it extracts those that meet the new server requirements specified by the center manager, and if it judges a resource to be valid (which includes verifying that it is operable), it configures a system that uses the resource and builds the corresponding hardware, such as a server device (that is, mounts it, including the wiring).
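The extraction of suitable resources from the pool can be sketched as below. This is only a sketch: expressing the server requirements as a count per resource kind, the `PooledResource` record, and the all-or-nothing behaviour of `allocate_for_requirements` are assumptions made for the example, and the physical mounting and wiring step is outside what code like this can show.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PooledResource:
    resource_id: str          # hypothetical identifier
    kind: str                 # e.g. "cpu", "memory", "disk", "network_if"
    operable: bool            # outcome of the operability check (assumed to be recorded)
    allocated: bool = False   # True once the resource belongs to a configured system


def allocate_for_requirements(requirements: Dict[str, int],
                              pool: List[PooledResource]) -> List[PooledResource]:
    """Select operable, unallocated resources for {kind: count}; all or nothing."""
    chosen: List[PooledResource] = []
    for kind, count in requirements.items():
        candidates = [r for r in pool
                      if r.kind == kind and r.operable and not r.allocated]
        if len(candidates) < count:
            raise LookupError(f"not enough operable '{kind}' resources in the pool")
        chosen.extend(candidates[:count])
    # Mark resources as allocated only after every requirement has been met.
    for resource in chosen:
        resource.allocated = True
    return chosen


pool = [
    PooledResource("cpu-a", "cpu", operable=True),
    PooledResource("nic-a", "network_if", operable=False),  # left over after a failure
    PooledResource("nic-b", "network_if", operable=True),
]

chosen = allocate_for_requirements({"cpu": 1, "network_if": 1}, pool)
print([r.resource_id for r in chosen])
```

Marking resources as allocated only after every requirement is satisfied avoids leaving the pool half-reserved when a request cannot be met; whether the actual system behaves this way is not stated in the text.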

Embodiments of the resource sharing management center system and the resource management server device of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a configuration diagram showing the overall configuration of a shared resource management system according to an embodiment of the present invention.
As shown in the figure, the shared resource management system of this embodiment includes a resource management server 100 (the resource management server device according to the embodiment of the present invention), which is a server device that manages a plurality of resources, and servers 1000 to 5000, which are server devices provided as resources to execute programs.
Server 5000 is assumed to be a server device that is not actually in operation at present (a server device at the planning stage).

The resource management server 100 includes a database 110 that stores data; a function module 120 that collects the state information and failure information of the resources (hardware resources) described later; a function module 130 that is responsible for rearranging and mounting resources; and a configuration catalog 140 that describes the configuration requirements of the systems requested by the center manager, including the specific resources involved, such as server devices.
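How these four components might fit together can be sketched as follows. This is a minimal model under stated assumptions: the class names, the listener mechanism that triggers reconfiguration when the catalog is updated, and the dictionary standing in for database 110 are all hypothetical, and "building" a system is reduced to recording its name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConfigurationCatalog:
    """Stands in for configuration catalog 140: system name -> resource IDs."""
    systems: Dict[str, List[str]] = field(default_factory=dict)
    listeners: List[Callable[[str, List[str]], None]] = field(default_factory=list)

    def update(self, system: str, resource_ids: List[str]) -> None:
        self.systems[system] = list(resource_ids)
        for listener in self.listeners:       # notify on every catalog update
            listener(system, resource_ids)


@dataclass
class ResourceManagementServer:
    """Stands in for resource management server 100."""
    database: Dict[str, dict] = field(default_factory=dict)   # role of database 110
    catalog: ConfigurationCatalog = field(default_factory=ConfigurationCatalog)
    built: List[str] = field(default_factory=list)            # systems built so far

    def __post_init__(self) -> None:
        self.catalog.listeners.append(self.reconfigure)

    def collect(self, resource_id: str, state: str, failed: bool) -> None:
        """Role of function module 120: record the latest state and failure info."""
        self.database[resource_id] = {"state": state, "failed": failed}

    def reconfigure(self, system: str, resource_ids: List[str]) -> None:
        """Role of function module 130: build only if every resource is operable."""
        unusable = [rid for rid in resource_ids
                    if rid not in self.database or self.database[rid]["failed"]]
        if unusable:
            raise RuntimeError(f"cannot build {system}: unusable resources {unusable}")
        self.built.append(system)


server = ResourceManagementServer()
server.collect("cpu-5000", "idle", failed=False)
server.collect("nic-5100", "idle", failed=False)
server.catalog.update("system-5000", ["cpu-5000", "nic-5100"])
print(server.built)
```

In this sketch a resource with no record in the database is treated as unusable, which is a conservative choice made for the example rather than something the specification states.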

The servers (1000 to 5000) each include a function module 150 that sends resource state information and failure information to the resource management server 100. They also include resources such as network I/Fs (1100 to 5100) and network I/Fs (1200, 3100, 3200), which provide the interface to the network, and disk storage devices (reference numerals omitted).

Although the resource management server 100 is shown here without a network I/F, in the present invention the resource management server 100 can generally also be provided with one.
In general, the servers (1000 to 5000) are connected to users' information processing devices via a network (such as the Internet) using the communication protocols provided by the network I/Fs (1100 to 5100) and the network I/Fs (1200 to 3200).
Here, "resources" refers to the hardware resources needed to build a system in general, and includes, for example, CPU devices, memory devices, built-in disk devices, and network I/Fs.

The functions of the shared resource management system of this embodiment are described below with reference to FIG. 1.
The shared resource management system according to the present invention (FIG. 1) uses a single server to manage a plurality of resources located in, for example, a large-scale data center that manages customers' servers, so that the resources can be shared among a plurality of systems.
That is, both the status information of the resources of a failed server among clustered server devices and the failure information indicating the failure status of a resource in a single server device are reported in real time to the resource management server 100, which manages the resources.
In addition, when the center manager requests new configuration requirements for a system, the pooling information of the centrally managed resources is checked against those requirements to configure the new system, and the hardware that constitutes the system is implemented.

In FIG. 1, the resource management server 100 is a server device that manages the resources described above, including the servers (1000 to 5000), and has a function of changing the configuration of a system built from the managed resources.
As described above, the resource management server 100, which manages these resources, comprises the database 110 that stores data, the function module 120 that collects resource status information and failure information, and the function module 130 responsible for rearranging and implementing resources.

Here, the database 110 stores two kinds of data: the resource status information and failure information, and the configuration catalog 140, which describes the configuration requirements of the system requested by the center manager in terms of resources such as server devices. In the present invention, however, the status and failure information and the configuration catalog 140 can, in general, also be stored in separate databases.
The configuration catalog 140 lists the resources that each system requires (that is, the names and wiring of the resources needed to build the system). The center manager can update its contents through suitable input means.
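As a rough illustration of what the configuration catalog 140 holds, the following Python sketch models one catalog entry as a list of resource names plus the wiring between them. The class and field names (`Wire`, `CatalogEntry`, `ConfigurationCatalog`, `revision`) and the resource labels are assumptions made for this example; the description does not prescribe a data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Wire:
    """A connection between two named resources (hypothetical shape)."""
    source: str
    target: str


@dataclass
class CatalogEntry:
    """Resources and wiring that one system requires."""
    system_name: str
    resources: List[str] = field(default_factory=list)
    wiring: List[Wire] = field(default_factory=list)


class ConfigurationCatalog:
    """Stand-in for configuration catalog 140, keyed by system name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}
        # Bumped on every update so other modules can detect a change.
        self.revision = 0

    def register(self, entry: CatalogEntry) -> None:
        """Add or overwrite an entry, as the center manager would."""
        self._entries[entry.system_name] = entry
        self.revision += 1

    def get(self, system_name: str) -> CatalogEntry:
        return self._entries[system_name]


# Hypothetical entry: the labels below are illustrative only.
catalog = ConfigurationCatalog()
catalog.register(CatalogEntry(
    system_name="new-system",
    resources=["server-5000", "nic-5100"],
    wiring=[Wire("nic-5100", "switch-port-7")],
))
```

The `revision` counter is one simple way to let a consumer notice that the catalog has been updated; any change-notification mechanism would serve the same purpose.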

The function module 150 notifies the function module 120 of the status information and failure information of each resource that makes up each server managed by the resource management server 100.
The function module 150 obtains this status information and failure information from each resource that makes up each managed server (that is, the servers (1000 to 5000)).
The function module 120 collects the status information and failure information reported by the function module 150 and stores them in the database 110.
The function module 130 refers to the status information and failure information, rearranges the server devices managed by the resource management server 100 and the resources that make up those server devices, performs the calculation for building the system requested in the configuration catalog 140, and implements the hardware that constitutes the system, including wiring.
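The reporting path from function module 150 through function module 120 to the database 110 can be sketched as follows. This is a minimal in-memory model under stated assumptions: the class names (`Agent`, `Collector`, `Database`), the `Report` fields, and the resource labels are hypothetical, and the actual transport between servers is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Report:
    """One status/failure record for a single resource (assumed fields)."""
    server_id: int
    resource_id: str
    operable: bool
    detail: str = ""


class Database:
    """Stand-in for database 110: keeps the latest report per resource."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[int, str], Report] = {}

    def save(self, report: Report) -> None:
        self.latest[(report.server_id, report.resource_id)] = report


class Collector:
    """Stand-in for function module 120 on the resource management server."""

    def __init__(self, database: Database) -> None:
        self.database = database

    def notify(self, report: Report) -> None:
        # Collect the report and persist it.
        self.database.save(report)


class Agent:
    """Stand-in for function module 150 running on one managed server."""

    def __init__(self, server_id: int, collector: Collector) -> None:
        self.server_id = server_id
        self.collector = collector

    def report(self, resource_id: str, operable: bool, detail: str = "") -> None:
        self.collector.notify(
            Report(self.server_id, resource_id, operable, detail)
        )


db = Database()
agent_3000 = Agent(3000, Collector(db))
agent_3000.report("nic-3200", operable=True)
agent_3000.report("nic-3100", operable=False, detail="link down")
```

Keeping only the latest report per resource is a design choice of this sketch; a history of reports could be kept instead if past failures matter.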

More specifically, it assembles the server devices that constitute the system and the resources needed to configure those server devices (for example, CPU devices, memory devices, built-in disk devices, and network I/Fs), and connects network cables to the appropriate domains.
For disk equipment such as built-in disk devices, it wires the required components (for example, controllers, enclosures, cache memory, and disks) to the management cables (network) and the controller cables (Fibre Channel).
For network equipment, it performs IP settings, port settings, and VLAN settings, and connects network cables to the appropriate domains.

The operation of the shared resource management system of this embodiment is described below case by case.
FIG. 2 is an explanatory diagram showing that a failure has occurred in the network I/F 3100, which is one of the resources of the server 3000, in the shared resource management system of the embodiment of the present invention.
FIG. 2 shows that the server 1000, the server 2000, and the server 3000 form a cluster and that a failure has occurred in the network I/F 3100.
In this case, the cluster determines that the program on the server 3000 cannot run and switches execution of the program to the server 1000.

FIG. 3 is an explanatory diagram showing an operation example when a failure occurs in the network I/F 3100, which is one of the resources of the server 3000.
Because a failure has occurred in the network I/F 3100 (FIG. 2), the server 3000 is removed from the cluster of the server 1000, the server 2000, and the server 3000, and starts up as a standalone server 3000.
The function module 150 collects this state together with the failure information of the network I/F 3100 and sends the information to the function module 120, which collects status and failure information; the function module 120 then stores the information in the database 110.

Meanwhile, based on a request from a general user, the center manager creates the configuration requirements (generally including server devices) needed to configure a new system and newly registers them in the configuration catalog 140 of the database 110.
Because the configuration catalog 140 stored in the database 110 has been updated, the function module 130 recalculates the resources it manages in order to configure the new system.
More specifically, referring to the status and failure information stored in the database 110, it re-verifies whether each of the resources (including server devices) that make up the new system described in the configuration catalog 140 is currently free of failures and operable.
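The re-verification step can be sketched as a lookup of each required resource in the stored status data. The function name `verify_system`, the status strings, and the rule that a resource with no record is treated as unusable are assumptions of this example, not details given in the description.

```python
from typing import Dict, List, Tuple


def verify_system(required: List[str],
                  status: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (usable, blocking) for the resources a catalog entry requires.

    A resource counts as operable only if its recorded status is "ok".
    A resource with no record is treated as blocking (an assumption).
    """
    blocking = [name for name in required if status.get(name) != "ok"]
    return (not blocking, blocking)


# Hypothetical snapshot of the status/failure data in database 110.
status = {"server-1000": "ok", "nic-1100": "ok", "nic-3100": "failed"}

usable, blocking = verify_system(["nic-3100", "nic-1100"], status)
```

Returning the list of blocking resources, rather than only a yes/no answer, makes it easier to decide which part of the requested system has to be moved elsewhere.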

In the case shown in FIG. 3, a resource that satisfies the configuration requirements of the new system exists in the configuration catalog 140 as the server 1000.
The function module 130 therefore configures the new system with the server 1000 and implements the hardware, including wiring.

FIG. 4 is an explanatory diagram showing another operation example when a failure occurs in the network I/F 3100, which is one of the resources of the server 3000.
In this case too, as in FIG. 3, a failure has occurred in the network I/F 3100. Here, however, the function module 130 moves the system functions that had been provided via the network I/F 3200 to the server 5000, which matches the catalog configuration requirements, rather than transferring them to the server 3000. This distributes the system functions and so eases load concentration.

In this case, the function module 130 implements the hardware, including wiring, so that the server 3000 handles only those system functions, among the ones it had been handling, that were provided via the network I/F 3100. For the system functions that were provided via the network I/F 3200, it configures a system in which the server 5000 is newly assigned to handle those functions, and likewise implements the hardware, including wiring.
The resource status information of the newly built server 5000 is also reported from the function module 150 to the function module 120, and the function module 120 registers the information in the resource database 110.

FIG. 5 is a sequence diagram showing an example of the operation sequence of the shared resource management system according to the embodiment of the present invention.
In the figure, the server 1000, the server 2000, and the server 3000 are a group of server devices that form a cluster. The server 1000 is a standby server device, and the server 2000 and the server 3000 are active server devices.
The server 5000 is a single server device.

Under the control of the resource management server 100, the function module 120 collects the status information and failure information reported via the function module 150 from each server in the server device group and from the resources that make up those servers, and stores the information in the database 110.
The following describes the case where a failure occurs in the network I/F 3100, which is one of the resources of the server 3000.
Because the server 1000, the server 2000, and the server 3000 form a cluster, when the network I/F 3100 fails the cluster determines that the program on the server 3000 can no longer run and switches execution of the program to the server 1000.
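The cluster switchover described here can be illustrated with a small sketch in which the failed active server leaves the cluster and a standby server takes over. The function `fail_over`, the role strings, and the rule of picking the lowest-numbered standby are assumptions of this example; the description does not specify how the cluster software selects the takeover server.

```python
from typing import Dict, Tuple


def fail_over(roles: Dict[int, str], failed: int) -> Tuple[int, Dict[int, str]]:
    """Remove a failed active server and promote a standby server.

    `roles` maps a server number to "active" or "standby".
    Returns the takeover server and the new role map; the input is not modified.
    """
    if roles.get(failed) != "active":
        raise ValueError(f"server {failed} is not an active cluster member")
    standbys = [server for server, role in roles.items() if role == "standby"]
    if not standbys:
        raise RuntimeError("no standby server available")
    takeover = min(standbys)  # selection rule assumed for this sketch
    new_roles = dict(roles)
    del new_roles[failed]  # the failed server leaves the cluster
    new_roles[takeover] = "active"
    return takeover, new_roles


# Roles as described for FIG. 5: server 1000 standby, servers 2000 and 3000 active.
cluster = {1000: "standby", 2000: "active", 3000: "active"}
takeover, cluster_after = fail_over(cluster, failed=3000)
```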

First, at timing T1, a failure occurs in the network I/F 3100, so the server 3000 is removed from the cluster of the server 1000, the server 2000, and the server 3000 and starts up as a standalone server device. The function module 150 collects information indicating this state and the failure information of the network I/F 3100.
At timing T2, the function module 150 sends the status and failure information to the function module 120.
At timing T3, the function module 120 stores the status and failure information in the database 110.

Then, at timing T4, because new resources (generally including server devices) are needed to keep serving the general user's system request, the center manager registers a new configuration catalog describing that configuration in the database 110 (that is, updates the configuration catalog 140).
At timing T5, because the contents of the configuration catalog 140 in the database 110 have been updated, the function module 130 (the function module that calculates resource deployment) refers to the failure information and other data in the database 110 and verifies whether any of the resources listed in the configuration catalog 140 are operable and can be configured.

Here, among the resources listed in the configuration catalog 140, the network I/F 3100 cannot be used to configure the requested new system, but the network I/F 3200 can. Moreover, the system configuration corresponding to the system functions that were provided via the network I/F 3200 matches the configuration catalog conditions that, before the failure, allowed the program to run in the configuration including the server 3000. The server 5000, which meets the catalog's configuration requirements, is therefore judged to be an implementable resource.
This judgment is made by checking the resources listed in the configuration catalog against the configuration requirements of the newly requested resources, including server devices (CPU device, memory device, built-in disk device, and network I/F).

For a CPU device, the factors for judging whether it can be implemented are the operating frequency and the number of cores (whether they satisfy the resource requirements).
For a memory device, the criterion is whether the memory capacity satisfies the resource requirements.
For a built-in disk device, the factors are the disk capacity and the RAID configuration (whether they satisfy the resource requirements).

For a network I/F, the factors are the transfer rate and the number of network I/Fs (whether they satisfy the resource requirements).
Network cables do not need to be connected by hand. All cables are plugged into the network I/Fs from the start, and connections are made at the data link layer and above using the switch's port VLANs and tagged VLANs. Whether a connection to the appropriate domain can be made is judged by whether the resource requirements are satisfied.
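The judgment factors listed above for CPU, memory, built-in disk, and network I/F can be gathered into a single check, sketched below. The `Spec` fields, the units, the "at least as large" comparisons, and the exact-match rule for the RAID configuration are assumptions of this example; the description states only which attributes are examined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spec:
    """Attributes named in the description; units are assumed."""
    cpu_ghz: float      # CPU operating frequency
    cpu_cores: int      # number of CPU cores
    memory_gb: int      # memory capacity
    disk_gb: int        # built-in disk capacity
    raid_level: str     # RAID configuration, e.g. "RAID1"
    nic_gbps: float     # network I/F transfer rate
    nic_count: int      # number of network I/Fs


def unmet_requirements(offered: Spec, required: Spec) -> List[str]:
    """List the factors for which `offered` fails to satisfy `required`."""
    checks = {
        "cpu frequency": offered.cpu_ghz >= required.cpu_ghz,
        "cpu cores": offered.cpu_cores >= required.cpu_cores,
        "memory capacity": offered.memory_gb >= required.memory_gb,
        "disk capacity": offered.disk_gb >= required.disk_gb,
        "raid configuration": offered.raid_level == required.raid_level,
        "network transfer rate": offered.nic_gbps >= required.nic_gbps,
        "network I/F count": offered.nic_count >= required.nic_count,
    }
    return [name for name, satisfied in checks.items() if not satisfied]


# Hypothetical numbers for illustration only.
required = Spec(2.0, 4, 16, 500, "RAID1", 1.0, 2)
candidate = Spec(2.6, 8, 32, 1000, "RAID1", 10.0, 2)
implementable = not unmet_requirements(candidate, required)
```

An empty result means the candidate satisfies every listed factor and can be treated as implementable in the sense used above.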

At timing T6, the function module 130 judges that the server 5000 can be implemented, configures the new resources described above based on the new configuration information in the configuration catalog 140, and implements the server 5000.
Here, "implement" means reconfiguring the hardware for the requested new system solely by changing the connections between resources, without physically moving the resources the server needs, such as CPU devices, memory devices, built-in disk devices, and network I/Fs.
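Because the cables stay plugged in, "changing the connections" amounts to changing VLAN membership on the switch. The sketch below models this with an in-memory table; the `Switch` class and its methods are hypothetical and do not correspond to any real switch API. The valid VLAN ID range of 1 to 4094 follows IEEE 802.1Q.

```python
from typing import Dict, List, Optional, Set


class Switch:
    """In-memory model of port VLAN and tagged VLAN membership."""

    def __init__(self, ports: List[int]) -> None:
        self.port_vlan: Dict[int, Optional[int]] = {p: None for p in ports}
        self.tagged: Dict[int, Set[int]] = {p: set() for p in ports}

    def _check(self, port: int, vlan_id: int) -> None:
        if port not in self.port_vlan:
            raise KeyError(f"unknown port {port}")
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID {vlan_id}")

    def set_port_vlan(self, port: int, vlan_id: int) -> None:
        """Assign the untagged (port-based) VLAN of a port."""
        self._check(port, vlan_id)
        self.port_vlan[port] = vlan_id

    def add_tagged_vlan(self, port: int, vlan_id: int) -> None:
        """Allow a tagged VLAN on a port in addition to its port VLAN."""
        self._check(port, vlan_id)
        self.tagged[port].add(vlan_id)

    def ports_in(self, vlan_id: int) -> List[int]:
        """Ports that belong to the given VLAN, tagged or untagged."""
        return sorted(
            port for port in self.port_vlan
            if self.port_vlan[port] == vlan_id or vlan_id in self.tagged[port]
        )


switch = Switch([1, 2, 3])
switch.set_port_vlan(1, 10)
switch.set_port_vlan(2, 10)
switch.add_tagged_vlan(3, 10)
switch.set_port_vlan(2, 20)  # move port 2 to another domain without recabling
```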

At timing T7, the resource status of the newly built server 5000 is also reported to the resource management server 100 via the function module 150.
At timing T8, the function module 120 also registers the resource status of the newly built server 5000 in the resource database 110.

According to the shared resource management system of the embodiment of the present invention, the status of a failed resource is collected and the resource status information is accumulated in the resource database 110, so the system can verify and confirm whether there is any job that can run under another system's business catalog.
In addition, when a failed resource exists, switching to some of the resources of another system, such as a spare, allows part of the functions of the system built from the failed resource to be switched automatically to part of that other system.
In particular, in a cluster configuration in which multiple server devices are active and one server device is on standby, the present invention can be applied as a condition for system switchover.

The invention can also be applied to a platform that serves as the foundation for building and running software, such as PaaS (Platform as a Service), so that resources can be shared between systems.
Likewise, adopting the invention in a central management center that operates a cloud computing system (a dedicated center for housing computers and switches) allows resources to be shared between systems.

Furthermore, applying the present invention to a grid computing system (a system that links multiple computing resources on a wide-area network such as the Internet, for example computing functions such as CPU devices and information storage areas such as hard disk devices, and provides services as a single composite computer system) allows individual resources to be shared among the computing modules of multiple computers.

100 Resource management server (for management)
110 Database
120, 130, 150 Function modules
140 Configuration catalog
1000 to 5000 Servers (for resources)
1100 to 5100 Network I/Fs (interfaces)
3100, 3200 Network I/Fs

Claims

A shared resource management system comprising:
a plurality of resources serving as hardware resources;
means for collecting status information indicating the status of each of the plurality of resources and failure information indicating the failure status of each of the plurality of resources;
means for registering the collected status information and failure information in a database;
a configuration catalog in which the hardware configuration of each of a plurality of configured systems is described using at least some of the plurality of resources as components, and in which the hardware configuration of a system to be newly configured can be described using at least some of the plurality of resources as components; and
means for, when the configuration catalog is updated, referring to the status information and the failure information to confirm that each of the resources constituting the hardware of the new system specified in the configuration catalog is operable, and then configuring and implementing the hardware of the system that uses those resources.

The shared resource management system according to claim 1, wherein the category of resources includes server devices and the CPU devices, memory devices, built-in disk devices, and network I/Fs (interfaces) needed to configure the server devices.

3. The shared resource management system according to claim 1, wherein the mounting is performed including connection between the resources.

The shared resource management system according to claim 1, wherein the plurality of configured systems and the system to be newly configured have a hardware configuration that includes general user terminal devices connected via a network.

3. The shared resource management system according to claim 2, wherein the category of the network includes an Internet network.

A resource management server device connected by hardware to a plurality of resources serving as hardware resources, the device comprising:
means for collecting status information indicating the status of each of the plurality of resources and failure information indicating the failure status of each of the plurality of resources, and registering them in a database;
a configuration catalog in which the hardware configuration of each of a plurality of configured systems is described using at least some of the plurality of resources as components, and in which the hardware configuration of a system to be newly configured can be described using at least some of the plurality of resources as components; and
means for, when the configuration catalog is updated, referring to the status information and the failure information to confirm that each of the resources constituting the hardware of the new system specified in the configuration catalog is operable, and then configuring and implementing the hardware of the system that uses those resources.