JP2007506157A

JP2007506157A - Hierarchical management of dynamic resource allocation in multi-node systems

Info

Publication number: JP2007506157A
Application number: JP2006523412A
Authority: JP
Inventors: Benny Souder; Debashish Chatterjee; Lakshminarayanan Chidambaran; David Brower; Carol Colrain; Daniel Semler; Christopher Kantarjiev; James W. Stamos
Original assignee: Oracle International Corporation
Priority date: 2003-08-14
Filing date: 2004-08-13
Publication date: 2007-03-15
Anticipated expiration: 2024-08-13
Also published as: AU2004266017B2; JP4970939B2; WO2005017745A2; EP1654649B1; EP1654648A2; WO2005017745A3; CA2533744C; WO2005017783A2; JP4805150B2; CA2534807C; CA2534807A1; CA2533744A1; AU2004266019A1; EP1654649A2; JP2007502468A; AU2004266019B2; EP1654648B1; AU2004266019A2; AU2004266017A1; WO2005017783A3

Abstract

Approaches are used for efficiently and effectively managing the dynamic allocation of the resources of a multi-node database system among the services provided by a multi-node database server. A service is a category of work hosted on the database server. The approaches manage resource allocation at different levels. For the services that use a particular database, the performance realized by those services is monitored. Resources allotted to the database are allocated among these services to ensure that the performance goals for each service are met. Resources allotted to a cluster of nodes are allocated among the databases to ensure that the performance goals for all services using those databases are met. Resources allotted to a farm of clusters are allocated among the clusters based on service-level agreements and back-end policies. The approach uses a hierarchy of directors to manage resources at the different levels.

Description

Related Applications
This application claims priority to U.S. Provisional Application No. 60/495,368, "Computer Resource Provisioning", filed August 14, 2003, which is incorporated herein by reference. This application claims priority to U.S. Provisional Application No. 60/500,096, "Service Based Workload Management and Measurement in a Distributed System", filed September 3, 2003, which is incorporated herein by reference. This application claims priority to U.S. Provisional Application No. 60/500,050, "Automatic And Dynamic Provisioning Of Databases", filed September 3, 2003, which is incorporated herein by reference.

This application is related to the following U.S. applications, each of which is incorporated herein by reference:
U.S. Application No. XX/XXX,XXX, "Hierarchical Management of the Dynamic Allocation of Resources in a Multi-Node System" (Attorney Docket No. 50277-2382), filed August 12, 2004 by Benny Souder et al.;
U.S. Application No. 10/718,747, "Automatic and Dynamic Provisioning of Databases" (Attorney Docket No. 50277-2343), filed November 21, 2003;
U.S. Application No. XX/XXX,XXX, "Transparent Session Migration Across Servers" (Attorney Docket No. 50277-2383), filed August 12, 2004 by Sanjay Kaluskar et al.;
U.S. Application No. XX/XXX,XXX, "Calculation of Service Performance Grades in a Multi-Node Environment That Hosts the Services" (Attorney Docket No. 50277-2410), filed August 12, 2004 by Lakshminarayanan Chidambaran et al.;
U.S. Application No. XX/XXX,XXX, "Incremental Run-Time Session Balancing in a Multi-Node System" (Attorney Docket No. 50277-2411), filed August 12, 2004 by Lakshminarayanan Chidambaran et al.;
U.S. Application No. XX/XXX,XXX, "Service Placement for Enforcing Performance and Availability Levels in a Multi-Node System" (Attorney Docket No. 50277-2412), filed August 12, 2004 by Lakshminarayanan Chidambaran et al.;
U.S. Application No. XX/XXX,XXX, "On Demand Node and Server Instance Allocation and De-Allocation" (Attorney Docket No. 50277-2413), filed August 12, 2004 by Lakshminarayanan Chidambaran et al.;
U.S. Application No. XX/XXX,XXX, "Recoverable Asynchronous Message Driven Processing in a Multi-Node System" (Attorney Docket No. 50277-2414), filed August 12, 2004 by Lakshminarayanan Chidambaran et al.; and
U.S. Application No. XX/XXX,XXX, "Managing Workload by Service" (Attorney Docket No. 50277-2337), filed August 12, 2004 by Carol Colrain et al.

This application is related to the following international applications, each of which is incorporated herein by reference:
International Application No. PCT/XXXX/XXXXX, entitled "Automatic and Dynamic Provisioning of Databases", filed August 9, 2004 by Oracle International Corporation in the United States Receiving Office;
International Application No. PCT/XXXX/XXXXX, "Transparent Session Migration Across Servers" (Attorney Docket No. 50277-2593), filed August 13, 2004 by Oracle International Corporation in the United States Receiving Office;
International Application No. PCT/XXXX/XXXXX, "Transparent Migration of Stateless Sessions Across Servers" (Attorney Docket No. 50277-2594), filed August 13, 2004 by Oracle International Corporation in the United States Receiving Office; and
International Application No. PCT/XXXX/XXXXX, "On Demand Node and Server Instance Allocation and De-Allocation" (Attorney Docket No. 50277-2595), filed August 13, 2004 by Oracle International Corporation in the United States Receiving Office.

Field of the Invention
The present invention relates to workload management, and in particular to workload management within a multi-node computer system.

Background of the Invention
Enterprises are looking for ways to reduce the cost and increase the efficiency of their data processing systems. A typical enterprise data processing system allocates separate resources to each of the enterprise's applications. Each application is pre-assigned enough resources to handle its estimated peak load. Each application has different load characteristics: some applications are busy during the day and others at night, and some reports are run once a week while others are run once a month. As a result, a great deal of resource capacity sits unused. Grid computing makes it possible to use or eliminate this unused capacity. In fact, grid computing is poised to change the economics of computing drastically.

A grid is a collection of computing elements that provide processing and some degree of shared storage. The resources of the grid are allocated dynamically to meet the computational needs and priorities of its clients. Grid computing can dramatically lower the cost of computing, extend the availability of computing resources, and deliver higher productivity and higher quality. The basic idea of grid computing is the notion of computing as a utility, analogous to the electric power grid or the telephone network. A client of the grid does not care where its data resides or where a computation is performed. The client simply wants the computation done and the information delivered when the client needs it.

This is analogous to the way electric utilities work: a customer does not know where the generator is or how the electric grid is wired. The customer simply asks for electricity and gets it. The goal is to make computing a utility, that is, a ubiquitous commodity. Hence the name "grid".

This view of grid computing as a utility is, of course, the client-side view. From the server side, that is, behind the scenes, the grid is about resource allocation, information sharing, and high availability. Resource allocation ensures that everything that needs or requests resources gets what it needs, so that resources do not stand idle while requests go unserviced. Information sharing ensures that the information clients and applications need is available where and when it is needed. High availability ensures that all data and computation are always there, just as a utility company always provides electric power.

Grid Computing for Databases
One area of computer technology that can benefit from grid computing is database technology. A grid can support multiple databases and dynamically allocate and reallocate resources as needed to support the current demand on each database. As the demand on a database increases, more resources are allocated to that database while resources are deallocated from another database. For example, on an enterprise grid, a database is served by one database server running on one server blade of the grid. The number of users requesting data from that database increases. In response, the database server for another database is removed from a server blade, and that server blade is provisioned with a database server for the database experiencing the increased user requests.

Grid computing for databases requires the allocation and management of resources at different levels. At the level of a single database, the performance provided to the users of that database must be monitored, and the resources of the database must be allocated among the users so that the performance goals for each user are met. Across databases, the allocation of grid resources among the databases must be managed so that the performance goals for the users of all the databases are met. Managing the allocation of resources at these different levels, and managing the information needed to perform such management, is very complex. Therefore, a grid computing system needs a mechanism that simplifies resource management and handles it efficiently, not only for database systems but also for other types of systems that allocate resources at different levels within a grid.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, no approach described in this section should be assumed to qualify as prior art merely by virtue of its inclusion in this section.

The present invention is illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which like reference numerals refer to like elements.

Detailed Description of the Invention
A method and apparatus for managing the allocation of resources in a multi-node environment are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

Described herein are approaches for efficiently and effectively managing the dynamic allocation of the resources of a multi-node database system among the services provided by that system. A service is a particular type or category of work performed for the benefit of one or more clients. The work performed as part of a service includes any use or consumption of computer resources, including, for example, CPU processing time, storing and accessing data in volatile memory, reading from and writing to persistent storage (i.e., disk storage), and the use of network or bus bandwidth. A service may be, for example, the work performed for a particular application on a client of a database server.

The approaches manage resource allocation at different levels. For the services that use a particular database, the performance realized by those services is monitored. Resources allotted to the database are allocated among the services to ensure that the performance goals for each service are met. Resources allotted to a cluster of nodes are allocated among the databases to ensure that the performance goals for all services using those databases are met.

The approach uses a hierarchy of directors to manage resources at the different levels. One type of director, a database director, manages the resources allocated to a database among the services that use the database and its database instances. The database director manages the allocation of database instances among the services. A cluster director manages the resources of a cluster's nodes among the databases whose database servers are hosted on the cluster. Yet another director, a farm director, manages the resources allocated among clusters.
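The three director levels can be pictured as a simple containment hierarchy. The following is a minimal sketch only, not the patented implementation: the class names mirror the director types named in the text, while the fields, the `assign` and `levels` methods, and the instance counts are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatabaseDirector:
    """Manages one database's resources among the services that use it."""
    database: str
    # service name -> number of database instances assigned (hypothetical metric)
    instances_per_service: dict[str, int] = field(default_factory=dict)

    def assign(self, service: str, instances: int) -> None:
        self.instances_per_service[service] = instances


@dataclass
class ClusterDirector:
    """Manages one cluster's nodes among the databases hosted on the cluster."""
    cluster: str
    database_directors: list[DatabaseDirector] = field(default_factory=list)


@dataclass
class FarmDirector:
    """Manages the resources allocated among the clusters of a farm."""
    cluster_directors: list[ClusterDirector] = field(default_factory=list)

    def levels(self) -> list[str]:
        """Walk the hierarchy top-down, one line per director."""
        lines = ["farm"]
        for cd in self.cluster_directors:
            lines.append(f"  cluster {cd.cluster}")
            for dd in cd.database_directors:
                lines.append(f"    database {dd.database}: {dd.instances_per_service}")
        return lines


# Illustrative values only; the reference numerals are borrowed from the figures.
db220 = DatabaseDirector("220")
db220.assign("FIN", 2)
db220.assign("PAY", 1)
farm = FarmDirector([ClusterDirector("110", [db220, DatabaseDirector("230")])])
```

Each level only needs to know about the level directly beneath it, which is the property that keeps management at any one level simple.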

FIG. 1 shows a multi-node computer system that may be used to implement an embodiment of the present invention. Referring to FIG. 1, it shows cluster farm 101. A cluster farm is a set of nodes organized into groups of nodes referred to as clusters. A cluster provides some degree of shared storage (e.g., shared access to a set of disk drives) among the nodes in the cluster.

The nodes in a cluster farm may take the form of computers (e.g., workstations, personal computers) interconnected via a network. Alternatively, the nodes may be the nodes of a grid. A grid is composed of nodes in the form of server blades interconnected with other server blades on a rack. Each server blade is a self-contained computer system with a processor, memory, network connections, and associated electronics on a single motherboard. Typically, server blades do not include on-board storage (other than volatile memory); they share storage units (e.g., shared disks) along with the power supply, cooling system, and cabling within the rack.

A defining characteristic of the clusters in a cluster farm is that a cluster's nodes can be transferred automatically between the clusters of the farm under software control, without any need to physically reconnect a node from one cluster to another. Clusters are controlled and managed by software utilities referred to herein as clusterware. Clusterware may be executed to remove a node from a cluster and to provision a node to a cluster. Clusterware provides a command line interface that accepts requests from a human administrator, allowing the administrator to enter commands that provision nodes to, and remove nodes from, a cluster. The interfaces may also take the form of Application Program Interfaces ("APIs"), which may be called by other software executing within the cluster farm. Clusterware uses and maintains metadata that defines the configuration of the clusters within the farm. This metadata includes cluster configuration metadata, which defines the topology of the clusters in the farm, including which particular nodes are in each cluster. The metadata is modified to reflect the changes that clusterware makes to the clusters in the cluster farm. An example of clusterware is software developed by Oracle (registered trademark), such as Oracle9i Real Application Clusters or Oracle Real Application Clusters 10g. Oracle9i Real Application Clusters is described in Oracle9i RAC: Oracle Real Application Clusters Configuration and Internals, by Mike Ault and Madhu Tumma, 2nd edition (August 2, 2003).

Cluster farm 101 includes clusters 110, 120, and 130. Each of the clusters hosts one or more multi-node database servers that provide and manage access to databases.

Clusters and Multi-Node Database Servers
FIG. 2 shows cluster 110 according to an embodiment of the present invention. A defining characteristic of a cluster is that it is treated as a single unit or entity by its clients. This is because, as described in greater detail below, a client issues a request for a service hosted by the cluster without specifying which particular node or nodes will carry out the request.

Cluster 110 includes multi-node database servers 222, 232, and 242. Multi-node database servers 222, 232, and 242 reside on one or more nodes of cluster 110. A server, such as a multi-node server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components on a processor, where the combination of software and computational resources is dedicated to performing a particular function on behalf of one or more clients. Among other database management functions, a multi-node database server governs and facilitates access to a particular database, processing requests by clients to access the database. Multi-node database servers 222, 232, and 242 govern and provide access to databases 220, 230, and 240, respectively. Another example of a server is a web server.

Resources from multiple nodes in a multi-node computing system can be allocated to run a server's software. Each combination of the software and an allocation of resources from a node is a server referred to herein as a "server instance" or "instance". Thus, a multi-node database server comprises multiple server instances that can run on multiple nodes. Several instances of a multi-node database server can even run on the same node. A multi-node database server comprises multiple "database instances", each of which runs on a node and governs and facilitates access to a particular database. Each such instance may therefore be referred to herein as a database instance of that particular database. Clusters are often used to host multi-node database servers.
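The relationship between a multi-node database server, its database instances, and the nodes they run on can be sketched as a small data structure. This is a hedged illustration: the `DatabaseInstance` type, the placement shown, and the node names are assumptions made for the example and are not taken from the figures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseInstance:
    database: str  # database the instance governs access to
    node: str      # node whose resources the instance uses


# Hypothetical placement: database 220 has instances on two nodes,
# and node2 hosts instances of two different databases.
instances = [
    DatabaseInstance("220", "node1"),
    DatabaseInstance("220", "node2"),
    DatabaseInstance("230", "node2"),
    DatabaseInstance("240", "node3"),
]


def nodes_by_database(items: list[DatabaseInstance]) -> dict[str, list[str]]:
    """Group instances so each database maps to the nodes serving it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for inst in items:
        grouped[inst.database].append(inst.node)
    return dict(grouped)
```

Viewed this way, a multi-node database server is simply the set of instances that share a `database` value.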

The clients of cluster 110 and of multi-node database servers 222, 232, and 242 include clients 203 and 205. Clients 203 and 205 execute applications on computers interconnected to cluster 110 via, for example, a network. An application, as the term is used herein, is a unit of software that is configured to interact with a server and to use its functionality. In general, an application comprises an integrated set of software modules (e.g., programs made up of machine-executable or interpretable code, dynamically linked libraries) that perform a set of related functions.

Client 203 includes, for example, computer processes that execute a FIN application and a PAY application. The FIN application includes software that generates and analyzes an enterprise's accounting and financial information. The PAY application generates and tracks information about the pay of the enterprise's employees.

The clients of database servers 222, 232, and 242 are not limited to computers interconnected to cluster 110 via a network. For example, database server 222 may be a client of database server 232.

A database, such as databases 220, 230, and 240, is a collection of database objects. Database objects include any form of structured data. Structured data is data structured according to a metadata description that defines the structure. Structured data includes relational tables, object tables, object-relational tables, and bodies of data structured according to the Extensible Markup Language ("XML"), such as XML documents.

Sessions
For a client to interact with a database server on cluster 110, a session is established for the client. A session, such as a database session, is a particular connection established for a client to a server, such as a database instance, through which the client issues a series of requests (e.g., requests to execute database statements). For each database session established on a database instance, session state data is maintained that reflects the current state of the database session. Such information includes, for example, the identity of the client for which the session was established and temporary variable values generated by processes executing software within the database session.

A client establishes a database session by transmitting a database connection request to cluster 110.

Clients of cluster 110, such as clients 203 and 205, can interact with cluster 110 through client-side interface components that reside on the same computer as the client. The client-side interface components include API functions that are called by the applications executed by clients 203 and 205. When a connection to a database server is allotted, connection information identifying the node is received by the client-side interface component, which uses it to send the client's subsequent requests to that node.

A database session allotted to a client may be migrated to another database instance. Migrating a database session entails creating a new database session on another node. Although information about the migration and the connection is transmitted to the client-side interface component, the application cannot access this information and is "unaware" of the migration. In this way, the migration is performed transparently to the application. Requests associated with the earlier database session are executed within the new database session. Techniques for migrating database sessions in this manner are described in "Transparent Session Migration Across Servers" (50277-2383).
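The way a client-side interface component can hide a migration from the application can be sketched as follows. This is only a toy model under stated assumptions: `DatabaseSession`, `ClientSideInterface`, `on_migration`, and the instance names are hypothetical, and the actual migration techniques are those described in the referenced application, not this sketch.

```python
class DatabaseSession:
    """A session bound to one database instance on behalf of one client."""

    def __init__(self, instance: str, client: str) -> None:
        self.instance = instance
        self.client = client

    def execute(self, statement: str) -> str:
        return f"{self.instance} ran {statement!r} for {self.client}"


class ClientSideInterface:
    """Hypothetical client-side interface component.

    The application only ever calls execute(); which instance serves
    the request is an internal detail of this component.
    """

    def __init__(self, client: str, instance: str) -> None:
        self._session = DatabaseSession(instance, client)

    def execute(self, statement: str) -> str:
        return self._session.execute(statement)

    def on_migration(self, new_instance: str) -> None:
        """Receive migration information and switch to a new session."""
        self._session = DatabaseSession(new_instance, self._session.client)


conn = ClientSideInterface(client="FIN application", instance="instance-A")
before = conn.execute("SELECT 1")
conn.on_migration("instance-B")   # invisible to the application code
after = conn.execute("SELECT 1")  # same call, now served elsewhere
```

The application code before and after the migration is identical, which is what "transparent" means here.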

Service
As mentioned earlier, a service is work of a particular type or category that is performed for the benefit of one or more clients. Cluster 110 provides clients 203 and 205 with a database service for accessing database 220, a database service for accessing database 230, and a database service for accessing database 240. In general, a database service is work performed by a database server on behalf of a client; this work typically includes query processing and/or computation that requires access to a database. As used here, the term query refers to a database statement that conforms to a database language such as SQL. It includes statements that specify operations to add, delete, or modify data, and statements that create and modify database objects such as tables, object views, and executable routines.

A database service, like any service, can be further divided or categorized into subcategories. The database service for database 220 is further divided into the FIN service and the PAY service. The FIN service is the database service performed by database server 222 for the FIN application. In general, it involves accessing database objects on database 220 that store database data for the FIN application. The PAY service is the database service performed by database server 222 for the PAY application. In general, it involves accessing database objects on database 220 that store database data for the PAY application.

Work performed by a cluster can be divided or categorized into service categories and subcategories in various ways, and the invention is not limited to any particular way. For example, work may be divided into services based on the user or group of users (an enterprise, or a department within an enterprise) for whom the work is performed.

Participants That Provide Services
FIG. 3 shows the components of cluster 110 and multi-node database server 222 that take part in providing the various services for database 220. Referring to FIG. 3, multi-node database server 222 includes database instances 322, 332, 342, 352, and 362, which reside on nodes 320, 330, 340, 350, and 360, respectively, and which manage access to database 220. A database instance can be provisioned on, or removed from, a particular node by using an instance management application. An instance management application is available, for example, as part of Oracle9i Real Application Clusters or Oracle Real Application Clusters 10g. It provides a command-line interface or an API that an administrator or a client can invoke to provision a database instance on a node or to remove a database instance from a node.

The database instances of multi-node database server 222 are allocated to particular services. Database instances 322 and 332 are allocated to service FIN. Database instances 342 and 352 are allocated to service PAY. Instance 362 is not allocated to any service. When an instance, node, or cluster has been allocated to perform a service, the service is said to be running on, to reside on, or to be hosted by that instance, node, or cluster. Thus, the FIN service is said to be running on, or to reside on, database instances 322 and 332 and nodes 320 and 330.

Listener 390 is a process that runs on cluster 110, receives database connection requests from clients, and directs those requests to database instances within cluster 110. Each client connection request received is associated with a service (for example, service FIN or PAY). The request is directed to a database instance that hosts that service, where a database session is established for the client. As mentioned earlier, a session can be migrated to another database instance. Listener 390 directs requests to a particular database instance and/or node in a manner that is transparent to the application. Listener 390 may run on any node of cluster 110. Once a database session has been established for a client, the client can issue further requests, which may take the form of function calls or remote procedure calls, and which include requests to begin a transaction, execute queries, perform updates and other kinds of transaction operations, commit or otherwise end a transaction, and end the database session.
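
The routing role of the listener can be sketched as follows. This is a minimal illustration only: the `Listener` class, its `route` method, and the round-robin choice among hosting instances are assumptions made for the example, and the text does not specify how listener 390 is implemented. The instance numbers mirror FIG. 3.

```python
from itertools import cycle


class Listener:
    """Hypothetical listener that maps a requested service to a hosting instance."""

    def __init__(self, service_hosts):
        # service_hosts maps a service name to the instances that host it.
        self._next_host = {
            service: cycle(instances)
            for service, instances in service_hosts.items()
        }

    def route(self, service):
        """Return the instance that should receive the next connection request."""
        if service not in self._next_host:
            raise LookupError(f"no instance hosts service {service!r}")
        # Round-robin is an assumption; any placement rule could be used here.
        return next(self._next_host[service])


listener = Listener({"FIN": [322, 332], "PAY": [342, 352]})
first_fin_instance = listener.route("FIN")
```

The application only names the service it wants; which instance it lands on is decided by the listener, which is what keeps placement transparent to the application.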

Workload Monitoring
Resources are allocated and reallocated to meet levels of performance and cardinality constraints on the resources. The levels of performance and resource availability established for a particular service are referred to here as a service level agreement. Levels of performance and cardinality constraints on resources that apply to the multi-node system in general, and not necessarily to a particular service, are referred to here as policies. For example, a service level agreement for service FIN may require, as a level of performance, that the average transaction time for service FIN not exceed a given threshold and, as an availability requirement, that at least two instances host service FIN. A policy may require that the CPU utilization of any node not exceed 80%.

Policies may also be referred to here as backend policies, because backend administrators use them to manage the performance of the system as a whole and to allocate resources among a set of services when the resources are deemed insufficient to meet the service level agreements of all the services in the set. For example, a policy assigns one database a higher priority than another. When there are not enough resources to meet the service level agreements of the services of both databases, the database with the higher priority, and the services that use it, are favored when resources are allocated.

To meet service level agreements, a mechanism is needed that monitors and measures the workload placed on the various resources. These workload measurements are used to determine whether service level agreements are being met and, where necessary, to adjust the allocation of resources so that they are met.

A workload monitor, such as workload monitor 388, is a distributed set of processes that run on the nodes of a cluster to monitor and measure the cluster's workload and to generate "performance metrics". Workload monitor 388 runs on cluster 110. Performance metrics are data that indicate, based on performance measurements, the level of performance of one or more resources or services. Approaches for performing these functions are described in "Measuring Workload by Service (50277-2337)". As explained in more detail below, the generated information is accessible to the various components within multi-node database server 222 that are responsible for managing the allocation of resources to meet service level agreements.

A particular kind of performance metric that can be used to gauge a characteristic or condition indicating a level of performance or workload is referred to here as a performance measure. Performance measures include, for example, transaction execution time and CPU utilization. In general, a service level agreement that calls for a level of performance can be defined by thresholds and criteria based on performance measures.

For example, transaction execution time is a performance measure. A service level agreement based on this measure is that transactions for service FIN should execute within 300 milliseconds. Another performance measure is the CPU utilization of a node. A backend policy based on this measure is that no node should exceed 80% utilization.
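
A check of measured values against a service level agreement and a backend policy might look like the sketch below. The class and function names are hypothetical and chosen only for illustration; the thresholds (300 milliseconds, two hosting instances, 80% CPU) are the example values given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Hypothetical per-service agreement: a performance level plus availability."""
    service: str
    max_avg_txn_ms: float
    min_instances: int


@dataclass(frozen=True)
class BackendPolicy:
    """Hypothetical system-wide policy, not tied to any one service."""
    max_cpu_utilization: float  # fraction between 0 and 1


def sla_violations(sla, avg_txn_ms, hosting_instances):
    """Return the names of the agreement terms that are currently violated."""
    violated = []
    if avg_txn_ms > sla.max_avg_txn_ms:
        violated.append("performance")
    if hosting_instances < sla.min_instances:
        violated.append("availability")
    return violated


def policy_violations(policy, cpu_by_node):
    """Return the nodes whose CPU utilization exceeds the policy limit."""
    return sorted(
        node for node, cpu in cpu_by_node.items()
        if cpu > policy.max_cpu_utilization
    )


fin_sla = ServiceLevelAgreement("FIN", max_avg_txn_ms=300.0, min_instances=2)
policy = BackendPolicy(max_cpu_utilization=0.80)
```

Keeping the agreement (per service) and the policy (system-wide) as separate types reflects the distinction the text draws between the two.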

Performance metrics may indicate the performance of a cluster, of a service running on the cluster, of a node within the cluster, or of a particular database instance. A performance metric or measure that is specific to a service is referred to here as a service performance metric or measure. For example, a service performance measure for service FIN is the transaction time of transactions executed for service FIN.

Hierarchical Resource Allocation
Grid computing requires that computer resources be allocated dynamically to meet service level agreements. In one embodiment, computer resources are balanced or adjusted at one or more levels of a resource allocation hierarchy. Each level of the hierarchy has a different set of resource pools that is allocated among uses (services, for example). A resource pool is a group of resources of a particular kind, for example, the nodes and database instances available to a service, the nodes available to a database, or the nodes available to a cluster. The three levels of resource allocation are the database level, the cluster level, and the farm level.

Database Level
At the database level, the resource pool being allocated is the one currently in use for a particular database; it includes the database instances of the database and the nodes that host those instances. The database-level resource pool (that is, the resources that can be allocated at the database level) is allocated among the services of the database so that service level agreements are met. In general, this entails placing services on instances and placing sessions on instances.

Sessions can be placed in several ways. The first is referred to here as connection-time balancing. In connection-time balancing, listener 390 balances the workload among service instances by directing database connection requests that ask for a particular service to instances of database 220. For example, suppose that service FIN is delivering better service performance on database instance 322 than on other instances. In response, listener 390 directs a larger share of the database connection requests for the FIN service to database instance 322.

The second way of placing sessions is referred to as run-time session balancing. In run-time session balancing, database sessions are migrated from one database instance to another. The sessions are moved using transparent session migration. As mentioned earlier, techniques for doing so are described in "Transparent Session Migration Across Servers".

Placing services entails expanding and contracting services. When a service is expanded or contracted, a database instance is allocated to host the service or is deallocated from hosting it. When a database instance is allocated to host a service, more database sessions can be created for that service on that instance, which increases the number of database sessions associated with, and available to, the service. For example, to meet the service level agreement for service FIN, instances 322 and 332 are allocated to run service FIN. As demand for the service grows, the service level agreement ceases to be met. When a service level agreement is not met, it is said to be violated. In response to this service level violation, a further instance, instance 342, is allocated to the FIN service. An instance may run more than one service. For purposes of illustration, however, each instance here runs only one service. Thus, when FIN is added to instance 342, service PAY is "quiesced" on instance 342; that is, instance 342 is deallocated as a resource used for the PAY service, and the service is stopped on that instance.
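
The expansion of FIN onto instance 342, with PAY taken out of service (quiesced) on that instance, can be modeled with a small sketch. The `expand_service` function and the dictionary representation are assumptions for illustration; following the text's simplification, each instance hosts at most one service.

```python
def expand_service(assignment, service, target):
    """Allocate `target` to `service` and report which service it displaced.

    `assignment` maps an instance id to the single service it hosts, or to
    None when the instance hosts no service. The input is left unchanged.
    """
    updated = dict(assignment)
    displaced = updated.get(target)  # service quiesced on the target, if any
    updated[target] = service
    return updated, displaced


# Allocation shown in FIG. 3: FIN on 322 and 332, PAY on 342 and 352, 362 idle.
before = {322: "FIN", 332: "FIN", 342: "PAY", 352: "PAY", 362: None}
after, quiesced = expand_service(before, "FIN", 342)
```

In this example FIN gains a third instance while PAY is left with instance 352 only, so a real director would first need to confirm that PAY can still meet its own agreement.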

Cluster Level
At the cluster level, the resources currently allocated to a cluster are balanced among databases (that is, among database services) so that service level agreements are met. The resource pool balanced at this level consists of database instances and the nodes that host them. In general, balancing resources at this level entails provisioning and quiescing instances among the existing nodes of the cluster. For example, to meet a service level agreement in response to a service level violation, instance 362 is provisioned on node 360, a node that already exists in the cluster for multi-node database server 222.

Farm Level
At this level, the resources that can be allocated among services are the nodes of a cluster. The pool of nodes for a cluster is dynamic. For example, in response to a service level violation, node 370 is added to the cluster of multi-node database server 222.

Hierarchy of Actions for Adjusting Resource Allocation
In general, adjusting resources at a lower level of the resource allocation hierarchy is less disruptive than adjusting resources at a higher level. Moving database sessions and services among running instances at the database level is less disruptive and less costly than provisioning or quiescing a database instance at the cluster level. Reshuffling resources among entities to which they have already been allocated costs less than requesting that more resources be allocated from a higher level. Placing services and migrating sessions at the database level costs less than changing the number of database instances for a database at the cluster level. Within the database level, migrating sessions to reshuffle them among instances that already host the service costs less than expanding or contracting the service, because the latter affects the topology of the service and can have a greater overall effect on the distribution of load.

To remedy a service level violation, the allocation of resources is adjusted starting at the lowest level of the resource allocation hierarchy. In this way, service level violations are generally remedied in the least disruptive and least costly manner. If a service level violation can be resolved by adjusting the allocation of resources at the database level, allocation at higher levels is not resorted to. Some service level violations require adjustments at several levels or at all levels.
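
The lowest-level-first rule can be illustrated by ranking the possible remedies and taking the least disruptive one that is feasible. The `Remedy` enumeration, its numeric ranks, and `cheapest_remedy` are assumptions made for this sketch; the text states only the relative ordering of cost and disruption.

```python
from enum import IntEnum


class Remedy(IntEnum):
    """Hypothetical remedies, ordered from least to most disruptive."""
    MIGRATE_SESSIONS = 1    # database level: reshuffle sessions among hosts
    EXPAND_SERVICE = 2      # database level: change the service topology
    PROVISION_INSTANCE = 3  # cluster level: add an instance on an existing node
    ADD_NODE = 4            # farm level: add a node to the cluster


def cheapest_remedy(feasible):
    """Pick the least disruptive feasible remedy, or None if none is feasible."""
    return min(feasible) if feasible else None


choice = cheapest_remedy({Remedy.ADD_NODE, Remedy.EXPAND_SERVICE})
```

Here `choice` is `Remedy.EXPAND_SERVICE`: the farm-level remedy is available, but a database-level one is preferred because it ranks lower.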

In response to a service level violation, FIN is expanded onto instance 352 and PAY is quiesced on that instance, but only if quiescing PAY does not violate the service level agreement for PAY. Otherwise, service FIN is expanded to another database instance. If, however, no instance is available to which service FIN can be expanded, a new instance is provisioned. This is done by making an allocation at the cluster level, that is, by provisioning new instance 362 on node 360, a node that already exists in the cluster. An allocation can then be made at the database level by expanding service FIN to instance 362. At the time the service is expanded to it, instance 362 is an instance that has already been allocated to database 220.

Director Hierarchy
According to one embodiment, a distributed system component referred to as a distributed director is responsible for managing workload and the allocation of resources at each level. A system component, as the term is used here, is a combination of software, data, and one or more processes that execute the software and that use and store the data to perform a particular function. A distributed system component runs on multiple nodes. Preferably, though not necessarily, a director is a system component of a database instance and operates under the control of that database instance. The distributed director comprises directors that run on multiple nodes of cluster farm 101.

According to one embodiment, a director is responsible for managing the allocation of resources at one or more levels of the resource allocation hierarchy. Specifically, for each database, a director serves as the database director, which manages the allocation of resources at the database level. Other database instances of the database may have directors that serve as standby database directors. A standby database director stands ready to take over from the currently "active" database director if the latter becomes unable to perform its role, for example because of a system failure.

For each cluster, a director serves as the cluster director. The cluster director is responsible for managing the allocation of resources at the cluster level for that cluster. Other directors in the cluster serve as standby cluster directors.

Finally, a director serves as the farm director. The farm director is responsible for managing the allocation of resources at the farm level. Other directors in the cluster serve as standby farm directors.

A director receives, stores, and generates the information needed to manage workload at the level for which it is responsible.

Referring to FIG. 3, director 380 runs on database instance 342. Director 380 serves as the database director for database 220, the cluster director for cluster 110, and the farm director for cluster farm 101. Other directors serve as the database directors for databases 230 and 240, respectively, and as the cluster directors for clusters 120 and 130, respectively.

In general, resolving a service level violation may involve the directors at every level of the resource allocation hierarchy. The database director attempts to remedy the violation by adjusting the allocation of resources at the database level. If the violation requires an adjustment at the next higher level of the hierarchy, the cluster level, the database director escalates resolution of the violation to the cluster director. If the violation requires an adjustment at the highest level of the hierarchy, the farm level, the cluster director escalates resolution of the violation to the farm director. Remedying a service level violation may require that all of the directors adjust the allocation of resources.
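
The escalation path from database director to cluster director to farm director resembles a chain of responsibility. The sketch below is illustrative only: the `Director` class, its `can_resolve` callback, and the dictionary used to describe a violation are assumptions, not part of the described system.

```python
class Director:
    """Hypothetical director that resolves a violation or escalates it upward."""

    def __init__(self, level, can_resolve, parent=None):
        self.level = level              # "database", "cluster", or "farm"
        self.can_resolve = can_resolve  # callable: violation -> bool
        self.parent = parent            # director at the next higher level

    def handle(self, violation):
        """Return the level that resolved the violation, or None if unresolved."""
        if self.can_resolve(violation):
            return self.level
        if self.parent is None:
            return None
        return self.parent.handle(violation)


farm = Director("farm", lambda v: v["needs"] == "node")
cluster = Director("cluster", lambda v: v["needs"] == "instance", parent=farm)
database = Director("database", lambda v: v["needs"] == "sessions", parent=cluster)

resolved_at = database.handle({"needs": "instance"})
```

Every violation enters at the database level, so a higher level is consulted only after the lower one has declined.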

Directors communicate with one another using message queues. According to one embodiment, a message queue is a table stored in the database of the database instance that hosts the director. A record, or row, of the table corresponds to a message in the queue. The record indicates the status of any action taken in response to the message by the director responsible for responding to that kind of message. For example, a database director may request a database instance from the cluster director. The request is added to the queue. The cluster director scans the message queue, detects the request, acts on it, and updates the record to reflect the actions undertaken in response. The cluster director then sends a message to the database director, and that message is placed in the database director's message queue.

An advantage of using tables in database 220 is that the message queues can be stored persistently and recoverably by drawing on the power and capabilities of a transaction-oriented database system such as multi-node database server 222. If the active database director, cluster director, or farm director fails, the standby director that steps into its place can access the message queue in a state consistent with how the failed director left the messages at the time of failure. Such message queues and their use are described in "Recoverable Asynchronous Message Driven Processing in a Multi-Node System (50277-2414)".
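
A table-backed message queue of this kind can be sketched with SQLite standing in for the database. The table name `director_messages`, its columns, and the helper functions are hypothetical; only the idea of one row per message, with a status that the responding director updates, comes from the text.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) a message-queue table; each row is one message."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS director_messages ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn, recipient, body):
    """Add a request to the queue inside a transaction and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO director_messages (recipient, body) VALUES (?, ?)",
            (recipient, body),
        )
    return cur.lastrowid


def process_pending(conn, recipient, handler):
    """Scan for pending messages, act on each, and record the outcome."""
    rows = conn.execute(
        "SELECT id, body FROM director_messages"
        " WHERE recipient = ? AND status = 'pending' ORDER BY id",
        (recipient,),
    ).fetchall()
    for msg_id, body in rows:
        outcome = handler(body)
        with conn:
            conn.execute(
                "UPDATE director_messages SET status = ? WHERE id = ?",
                (outcome, msg_id),
            )
    return len(rows)
```

Because each status change is committed transactionally, a standby that opens the same table after a failure sees which requests were already handled and which are still pending.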

Database Director
The database director is responsible for monitoring the service performance of services to ensure that service level agreements are met, for expanding services to and contracting them from the database instances of database 220, and for migrating database sessions among the database instances of database 220. The database director also has access to, and stores, the service performance metrics and the service level agreement for each service that uses the database. Based on these service performance metrics and service level agreements, the database director ensures that service level agreements are met in two ways: (1) it maintains the compliance of service performance with service level agreements by generating and sending to the listener information that enables the listener to balance the workload among service instances; and (2) it remedies service level violations by detecting them and initiating adjustments to the allocation of resources.

Connection-time balancing is used to maintain the compliance of service performance with service level agreements. Specifically, director 380 generates information referred to here as performance grades and provides it to listener 390. The performance grades guide listener 390 as it balances the workload so that service performance remains compliant. A performance grade indicates the service performance of a service on one instance relative to other instances. Based on the performance grades, listener 390 shifts the distribution of connection requests for a service toward the database instances that deliver better service performance. Techniques for generating performance grades, and for distributing user requests in this manner based on them, are described in "Calculation of Service Performance Grades in a Multi-Node Environment That Hosts the Services (50277-2410)".
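
One way a listener could turn performance grades into a distribution of connection requests is to weight each instance by its grade. This is a sketch under stated assumptions: the text does not define the numeric form of a grade, so the example assumes non-negative numbers where a higher value means better relative performance, and the function names are hypothetical.

```python
import random


def connection_shares(grades):
    """Convert per-instance grades into the fraction of connections each gets."""
    total = sum(grades.values())
    if total <= 0:
        raise ValueError("at least one instance needs a positive grade")
    return {instance: grade / total for instance, grade in grades.items()}


def pick_instance(grades, rng=random):
    """Choose an instance for one connection request, weighted by grade."""
    shares = connection_shares(grades)
    instances = list(shares)
    weights = [shares[i] for i in instances]
    return rng.choices(instances, weights=weights, k=1)[0]


# Assumed grades: instance 322 currently performs three times as well as 332.
shares = connection_shares({322: 3.0, 332: 1.0})
```

With these assumed grades, instance 322 receives three quarters of new FIN connections, which shifts load gradually rather than moving existing sessions.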

Remedying a service level violation requires detecting it. Director 380 detects a service level violation for a service by comparing the service's performance metrics with its service level agreement. For example, director 380 receives from workload monitor 388 service performance metrics indicating that the average transaction time for service FIN exceeds 30 milliseconds. The service level agreement for service FIN requires that the average transaction time not exceed 20 milliseconds. By comparing the actual average transaction time with the service level agreement, director 380 detects the service level violation.

When the database director detects a service level violation for a service, it follows a hierarchy of resource allocation: it attempts the least disruptive and least costly adjustments to resource allocation before resorting to more disruptive and more costly ones. The database director therefore first determines whether the violation can be remedied by balancing workload among the instances that already host the service, that is, by migrating database sessions assigned to the service to another database instance where service performance is better. The number of database sessions to migrate is chosen with the goal of achieving a balanced load among the service instances.

If the database director determines that rebalancing load among the existing service instances will not resolve the service level violation, director 380 attempts to remedy the violation by expanding the service, that is, by allocating another existing database instance, referred to herein as the target database instance, to host the service. If the target database instance is not hosting any service, the service is expanded by allocating the target database instance to host it. If the target database instance is hosting another service, the database director may quiesce that other service, provided it determines that doing so will not cause a service level violation for the other service.

FIG. 4 is a flow diagram of the process followed by director 380, acting as the database director for database 220, when a service on a further database instance (the "target database instance") must first be quiesced before another service can be expanded to that instance. For purposes of illustration, service PAY is hosted by database instances 342, 352, and 362, and database instance 362 resides on node 360. Service FIN is being expanded to database instance 342. The service instance of service PAY running on instance 342 is being quiesced.

Referring to FIG. 4, at step 410, director 380 sends a blocking message to listener 390. The blocking message instructs listener 390 to stop distributing user requests for service PAY to target database instance 342.

At step 420, database director 380 migrates the database sessions on the target database instance to other database instances that host service FIN. These sessions are distributed among the other database instances in a manner that balances workload among them.

At step 430, director 380 sends a service activation message to listener 390. The message notifies listener 390 that service FIN is running on instance 362.

At step 440, director 380 uses run-time balancing to balance workload among the instances on which service FIN is running (i.e., instances 322, 332, and 342).
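The redistribution performed at step 420 is described only as balancing workload among the remaining instances. One plausible reading, offered here as an assumption rather than as the patented method, is a greedy placement that always hands the next session to the currently least-loaded instance. The function name, the session labels, and the use of a session count as the load measure are all hypothetical.

```python
import heapq


def redistribute(sessions_to_move, load_by_instance):
    """Assign each session to whichever instance currently has the lowest load.

    load_by_instance maps an instance name to its current session count
    (an assumed load measure; the text does not define one).
    """
    heap = [(load, name) for name, load in load_by_instance.items()]
    heapq.heapify(heap)
    placement = {}
    for session in sessions_to_move:
        load, name = heapq.heappop(heap)         # least-loaded instance so far
        placement[session] = name
        heapq.heappush(heap, (load + 1, name))   # account for the moved session
    return placement


placement = redistribute(["s1", "s2", "s3"], {"inst_a": 5, "inst_b": 3})
```

A heap keeps each placement decision logarithmic in the number of instances, which matters little at this scale but keeps the sketch honest for larger clusters.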

In some cases, director 380 may determine that no service can be shrunk to make room for expanding another service without violating a service level agreement on the target instance. In that case, director 380 may choose to expand the service to a database instance that is not yet hosting any service. If no such database instance is available, director 380 requests a database instance from the cluster director. In response, the cluster director provisions another database instance as requested and notifies director 380. Director 380 then expands the service to the new database instance.
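The database director's preference for the least disruptive adjustment can be summarized as a simple decision ladder. The sketch below is one reading of the preceding paragraphs; the function name, the three boolean inputs, and the returned labels are hypothetical, and a real director would derive those inputs from performance metrics and service level agreements.

```python
def choose_adjustment(rebalancing_suffices: bool,
                      quiescable_target_exists: bool,
                      idle_instance_exists: bool) -> str:
    """Return the least disruptive adjustment that is still expected to work."""
    if rebalancing_suffices:
        # Cheapest option: migrate sessions among instances already hosting the service.
        return "rebalance-existing-instances"
    if quiescable_target_exists:
        # Another service can be quiesced on a target instance without violating its agreement.
        return "quiesce-other-service-and-expand"
    if idle_instance_exists:
        # An existing instance hosts no service yet.
        return "expand-to-idle-instance"
    # Nothing left at the database level: escalate to the cluster director.
    return "send-NEED-INSTANCE-to-cluster-director"
```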

A service level agreement may constrain the cardinality of a service. For example, the service level agreements for service FIN require that at least one but no more than three database instances host service FIN, and that service PAY be hosted by at least three database instances.

A service may be quiesced for reasons other than freeing a database instance for the expansion of another service. For example, the cardinality constraints for service FIN may vary with the time of day. During normal business hours the cardinality of service FIN may be as high as three, while during off-hours it may be no greater than one. At the start of off-hours, three database instances are hosting service FIN. Database director 380 shrinks service FIN by quiescing the service on two of those database instances.
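A time-dependent cardinality constraint of this kind can be sketched as follows. The business-hours window of 09:00 to 17:00, the function names, and the choice of which instances to quiesce are assumptions made for illustration; the text states only the limits of three and one.

```python
from datetime import time


def max_cardinality(now: time,
                    business_start: time = time(9, 0),
                    business_end: time = time(17, 0)) -> int:
    """Upper bound on instances hosting service FIN (window boundaries are assumed)."""
    return 3 if business_start <= now < business_end else 1


def instances_to_quiesce(hosting: list, now: time) -> list:
    """Return the instances on which the service should be quiesced."""
    excess = len(hosting) - max_cardinality(now)
    # Which instances are dropped is not specified; the last ones are taken here.
    return hosting[-excess:] if excess > 0 else []


off_hours = instances_to_quiesce(["i1", "i2", "i3"], time(22, 0))
```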

A database director may have to respond to requests made by the cluster director. Such actions include quiescing a database instance in response to a request by the cluster director, that is, quiescing the services currently hosted by that database instance. This step is needed when the cluster director wishes to replace a database instance for one database with a database instance for another database.

Cluster Director
The cluster director is responsible for provisioning database instances on existing nodes in the cluster and for removing database instances from those nodes. The cluster director also enforces database-level policies. A database-level policy may require, for example, that the cardinality of database instances for a database fall within a minimum and/or a maximum, or that, when there are insufficient resources to meet all service level agreements, the cluster director shift database instance allocations toward databases designated as having higher priority for resource allocation. Shifting allocations between databases in this way also shifts resource allocation toward the services that use the higher-priority databases. The cluster director has access to, and stores, data that specifies the priorities for allocating resources among databases. Such data may be configured by an administrator of the cluster.

The cluster director provisions and removes database instances in response to database directors' requests for a database instance ("NEED-INSTANCE" requests). If the cluster contains a node that is not hosting a database instance (a "free node"), the cluster director allocates another node to the database by provisioning a database instance on that free node.

If the cluster has no free node, the cluster director may request a node from the farm director by issuing a "NEED-NODE" request to it. If the farm director cannot provide a node, the cluster director arbitrates the allocation of database instances among the databases hosted by the cluster. Arbitration entails removing a database instance of one database from a node in the cluster and provisioning a database instance for the database for which the NEED-INSTANCE request was generated.

FIG. 5 is a flow diagram of a process for arbitrating the allocation of database instances among databases. The database director for database 220 determines that another database instance is needed for a service and generates a NEED-INSTANCE request to its cluster director, director 380. Director 380, in its capacity as cluster director, determines that a database instance for another database can be removed from a node in cluster 110 so that the node can be used to provision another database instance for database 220.

Referring to FIG. 5, at step 510, director 380, as cluster director, sends a "VOLUNTEER-TO-QUIESCE" request to the database directors other than the requesting database director (i.e., the director that issued the NEED-INSTANCE request). The purpose of the VOLUNTEER-TO-QUIESCE request is to ask each database director whether it can quiesce a database instance, that is, whether it can reduce the cardinality of database instances for its database. A database director may respond by volunteering, that is, by sending a message indicating that a database instance for its database may be quiesced. Alternatively, a database director may decline to quiesce a database instance, that is, decline to reduce the cardinality of its database instances. One reason a database director may decline is that all of its database instances are needed to meet the availability requirements of a service.

If database director 380 receives from database directors at least one message affirming that a database instance may be quiesced, that is, if two or more database directors volunteer, then at step 520 director 380 selects a database from among those volunteered. A database with lower priority for resource allocation may be selected in favor of databases with higher priority. For purposes of illustration, director 380 selects database 240. Cluster director 384 sends a message to the database director of database 240 to quiesce a database instance of database 240 on the node that hosts it.

If database director 380 receives no message from a database director affirming that a database instance may be quiesced, that is, if no database director volunteers, then at step 530 director 380 selects a database with lower priority for resource allocation. Director 380 then sends a message to the database director of the selected database to quiesce a database instance. Techniques for selecting a database instance to quiesce are described in greater detail in "On-Demand Node and Server Allocation and Deallocation (50277-2413)".

At step 540, the database director of database 240 quiesces the database instance on the node and sends the cluster director, director 380, a notification that the database instance has been quiesced (an "INSTANCE-IDLE" notification). At step 550, director 380 receives the notification.

At step 560, cluster director 380 removes the database instance from the node and provisions a database instance for database 230 on that node, for example by using an instance-provisioning API of the clusterware.
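The selection made at steps 520 and 530 can be sketched as below. This is an illustration under stated assumptions: priorities are modeled as integers where a larger number means higher priority, the lowest-priority candidate is always chosen (the text says only that a lower-priority database "may be selected"), and the function and database labels are hypothetical.

```python
def select_database_to_shrink(requester, priorities, volunteers):
    """Pick the database that gives up an instance.

    priorities: database name -> integer priority (larger = higher; assumed encoding).
    volunteers: databases whose directors answered VOLUNTEER-TO-QUIESCE affirmatively.
    """
    # Step 520: prefer databases whose directors volunteered.
    candidates = [db for db in volunteers if db != requester]
    if not candidates:
        # Step 530: nobody volunteered, so consider every other database.
        candidates = [db for db in priorities if db != requester]
    if not candidates:
        return None  # the requester is the only database in the cluster
    return min(candidates, key=lambda db: priorities[db])


priorities = {"db220": 3, "db230": 2, "db240": 1}
with_volunteer = select_database_to_shrink("db220", priorities, {"db230"})
without_volunteer = select_database_to_shrink("db220", priorities, set())
```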

Arbitrating the allocation of database instances among databases may cause the services provided by the database that gave up a node to suffer service level violations. As will be explained in greater detail, the cardinality of nodes in the cluster may increase, making free nodes available, and these free nodes can be used to remedy such violations. Director 380, as cluster director, can monitor cluster configuration metadata to detect when more free nodes become available and allocate them to databases suffering service level violations. In the current example, the database director for database 220 continues to detect service level violations after giving up a database instance and, in response, sends NEED-INSTANCE requests to its cluster director, director 380. Consequently, after detecting that more free nodes have been added to cluster 110, director 380 can respond to one of the requests by provisioning a database instance on a free node.

Farm Director
The farm director is responsible for the allocation of nodes among the clusters in a cluster farm, that is, for reallocating nodes between clusters by removing a node from one cluster and provisioning it to another. The cluster director also enforces cluster-wide policies. A cluster-wide policy may require, for example, that the cardinality of nodes in a cluster fall within a minimum and/or a maximum.

In response to a NEED-NODE request from a cluster director, the farm director provisions a node to the cluster, removing a free node from another cluster subject to that cluster's service level agreements. The farm director communicates with the cluster directors in the cluster farm to arbitrate the allocation of nodes. This process entails sending "RELINQUISH-NODE" requests to cluster directors, each of which responds to indicate whether it can relinquish a node. Based on these responses, the farm director selects a cluster and interacts with that cluster's director to remove a node from the cluster. Removing a node from a cluster may entail quiescing a database instance. Once the database instance is quiesced, the farm director removes the node from the selected cluster and provisions it to the cluster that needs it by calling the clusterware through an API provided for this purpose.

In addition, the farm director monitors the performance of the clusters in the cluster farm using performance metrics received from workload monitors running on each cluster. If the performance metrics indicate that a cluster is operating in violation of its service level agreements, or is performing worse than other clusters, the farm director shifts one or more nodes from better-performing clusters to the worst-performing cluster.
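As a rough sketch of the node shift just described, the function below picks a donor and a recipient cluster from a single metric. The metric (average response time in milliseconds, lower being better), the function name, and the cluster labels are assumptions; the text does not say which metrics the farm director compares or how many nodes it moves.

```python
def plan_node_shift(avg_response_ms):
    """Return (donor, recipient) cluster names, or None if no shift is indicated.

    avg_response_ms maps a cluster name to an assumed performance metric
    where a lower value means better performance.
    """
    if not avg_response_ms:
        return None
    recipient = max(avg_response_ms, key=avg_response_ms.get)  # worst-performing cluster
    donor = min(avg_response_ms, key=avg_response_ms.get)      # best-performing cluster
    if donor == recipient:
        return None  # a single cluster, or all clusters perform equally
    return donor, recipient


shift = plan_node_shift({"cluster_a": 12.0, "cluster_b": 45.0, "cluster_c": 20.0})
```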

Director Selection
As mentioned previously, among the database instances of a database there are multiple directors, each of which can serve as either the active database director or a standby director. A mechanism is therefore needed to determine which director is the active one. Furthermore, when the active director fails, a standby database director must be selected to become the active database director. The same need exists for cluster directors and farm directors. The process of choosing an active director is referred to herein as director selection.

There are various ways in which a database director may be selected. The first involves the use of a database global lock. Database global locks are used to synchronize processes running under the control of all the database instances of a database. When a director starts, it requests an exclusive lock. If no director holds the lock, the requesting director is granted the lock and assumes the role of database director. Directors that request the lock subsequently are not granted it and assume the role of standby director until the lock is granted to them, if ever. Their requests remain pending until they are granted or are cancelled by the requester.

The multi-node database server detects when the holder of the database global lock suffers a system failure. In that case, the failed holder's database global lock is cancelled or released, and a pending request for the lock is granted to a standby database director. The standby database director that is granted the lock then assumes the role of database director.
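Lock-based director selection can be modeled in a few lines. The class below is a single-process toy, not a distributed lock manager: the class and method names are hypothetical, and granting pending requests in arrival order is an assumption, since the text does not say which pending request is granted after a failure.

```python
from collections import deque


class GlobalLock:
    """Toy model of an exclusive database global lock with pending requests."""

    def __init__(self):
        self.holder = None
        self.pending = deque()

    def request(self, director):
        """Grant the lock if it is free; otherwise queue the request."""
        if self.holder is None:
            self.holder = director
            return "active"
        self.pending.append(director)
        return "standby"

    def holder_failed(self):
        """Release the failed holder's lock and grant it to a waiting director, if any."""
        self.holder = self.pending.popleft() if self.pending else None
        return self.holder


lock = GlobalLock()
role_d1 = lock.request("d1")
role_d2 = lock.request("d2")
new_active = lock.holder_failed()
```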

Another technique for director selection involves the use of process groups. A process group is a group that processes running on any node in the cluster can join. Members of the group are notified when another member ceases to run (for example, because of a system failure) or leaves the process group. In addition, each member is assigned an id when it joins the group.

When a director starts, it joins the process group for its database. If, at the time of joining, the director has the highest member id, it assumes the role of database director for that database. When the active director ceases to run, the members of the process group are notified, and the member with the highest member id assumes the role of database director.
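The process-group rule (the member with the highest id is the active director) can be illustrated with the toy class below. The names are hypothetical, and member ids are passed in by the caller purely to keep the example deterministic; in the scheme described, the group assigns the id when a member joins.

```python
class ProcessGroup:
    """Toy process group: tracks members and reports the highest-id member."""

    def __init__(self):
        self.members = {}  # member id -> director name

    def join(self, member_id, director):
        # In the described scheme the group assigns the id; it is supplied here for clarity.
        self.members[member_id] = director

    def leave(self, member_id):
        """Called when a member stops running or leaves the group."""
        self.members.pop(member_id, None)

    def active_director(self):
        """The member holding the highest id acts as the active director."""
        if not self.members:
            return None
        return self.members[max(self.members)]


group = ProcessGroup()
group.join(7, "dir_a")
group.join(3, "dir_b")
first_active = group.active_director()
group.leave(7)  # the active director stops running
second_active = group.active_director()
```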

Similar techniques can be used to select a cluster director. Director selection can use a cluster-wide lock or a process group for the cluster.

With such techniques, it is possible for the director that assumes the role of cluster director to be a director for a database different from that of the previously active cluster director. As a result, the message queue may reside on a different database. A table in a database can be accessed more efficiently by processes residing on database instances for that database than by processes residing on database instances of another database.

A technique that can be used to ensure that the standby director that assumes the role of active director is hosted by a database instance for the same database is the static database designation technique. Under this technique, one database is designated as the database that hosts the cluster director for the cluster (that is, whose database instances host it). Only directors for that database serve as cluster director for the cluster. Director selection among these directors can be performed using either a global database lock or a process group for the database.

The database may be designated by an administrator using an interface provided by the clusterware for this purpose. To increase the availability of standby cluster directors, a database with a high priority and high minimum/maximum cardinality requirements can be designated, ensuring a relatively large number of standby directors. The static database designation approach can also be used to select the active farm director.

Organizing the management of resources within a cluster farm using a hierarchy of directors facilitates the generation and exchange of the information needed to manage those resources. In general, a process running within a particular database (i.e., a process of a database instance for that database) can communicate with other processes within that database more efficiently than processes outside the database can. Thus, a director that serves as the database director for a database and obtains service performance metrics from one or more workload monitors running within that database can obtain the data more efficiently than a director running within another database could. In addition, because there is only one active database director within a database, the work of obtaining and generating information about the services that run on the database, such as service performance metrics, service level agreements, and message queue data, needs to be performed by only one director.

Likewise for the cluster director: only one director in the cluster has to access the message queue and perform the work of obtaining the information needed to track which nodes are in the cluster and which database instances of which databases each node hosts, along with the cluster's service level agreements.

The way in which information and the exchange of information are distributed defines which actions a director must itself take to manage service performance and which actions it escalates or delegates to another director. For example, reallocating database instances among databases requires knowing which nodes are available in the cluster and which nodes host which database instances. Thus, when a database director, which is not privy to such information, detects a service level violation that calls for reallocating database instances among databases, it escalates the action to the cluster director, which is privy to the information.

Examples of Alternative Embodiments
An embodiment of the present invention has been illustrated by the dynamic allocation of the resources of a multi-node system among database services and subcategories of database services. However, the present invention is not so limited.

For example, an embodiment of the present invention may be used to allocate the computer resources of a multi-node system that hosts an application server among the services provided by that application server. An application server is, for example, part of a three-tier architecture in which the application server sits between clients and a database server. The application server is used primarily to store, provide access to, and execute application code, while the database server is used primarily to store and provide access to a database for the application server. The application server transmits requests for data to the database server. The requests may be generated by the application server in response to executing the application code it stores. Examples of an application server are the Oracle 9i Application Server and the Oracle 10g Application Server. As with the multi-node server examples described herein, an application server may be distributed as multiple server instances running on multiple nodes, with the server instances hosting multiple sessions that can be migrated between server instances.

The present invention is also not limited to homogeneous multi-node servers comprised only of server instances that run copies of the same software product or the same version of a software product. For example, a multi-node database server may be comprised of several groups of server instances, each group running different database server software from a different vendor, or running a different version of the database server software from the same vendor.

Transparent Session Migration
Transparent session migration allows a client to be switched from a session on one server to a session on another server in a manner that is transparent to the application for which the initial session was established. The term migrate refers to an operation in which a client of an existing session on a server is switched from that session to another session, allowing the existing session to be terminated and the client to use the other session in its place. The existing session is said herein to have been migrated. The term "transparent", with respect to a unit of software, refers to performing an operation in a manner that does not require the unit to execute instructions tailored to perform the operation. Thus, under transparent session migration, a client is switched between sessions without executing application instructions tailored to accomplish the migration. Instead, the client-side interface component through which the application interacts with the server handles the details of migration, modifying its own internal state to effect it. Legacy applications do not need to be modified in order to use the techniques described herein.

To migrate a session, the session's state is captured and restored. Capturing session state entails producing a stream of bytes that is a faithful copy of the session's state; the byte stream can be stored in an object, file, or other type of data structure and accessed later to restore the session. Under transparent session migration, the client's session is captured on the source server, producing a stream of bytes that is stored in a data structure and transported to the destination server. The destination server then restores the session by loading the byte stream into a session established for the client on the destination server.
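The capture-and-restore round trip described above can be illustrated with a minimal sketch. It assumes that session state is plain data that can be serialized as JSON; the `SessionState` class, its fields, and the `capture` and `restore` helpers are hypothetical names introduced only for illustration, not the serialization format of any actual database server.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionState:
    """Hypothetical container for the state of one client session."""
    user: str
    parameters: dict = field(default_factory=dict)


def capture(state: SessionState) -> bytes:
    # Produce a byte stream that is a faithful copy of the session state.
    return json.dumps(asdict(state), sort_keys=True).encode("utf-8")


def restore(stream: bytes) -> SessionState:
    # Load the byte stream into a session established on the destination server.
    return SessionState(**json.loads(stream.decode("utf-8")))


source_state = SessionState(user="scott", parameters={"language": "ja"})
stream = capture(source_state)        # captured on the source server
destination_state = restore(stream)   # restored on the destination server
```

In this sketch the byte stream is the only thing that travels between servers, so it could equally be written to a file or stored in an object before being loaded on the destination.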

Session Migration
A database session is considered "stateful" when future application calls depend on session state generated by earlier application calls. Because future application calls potentially depend on the session state of a stateful session, migrating a stateful database session entails determining whether the portion of the session state stored on the source instance can be transferred to another database instance and, if it can, transferring a copy of the session state between the source database instance and the destination instance. In one embodiment, various migration checks are performed to make this determination. These migration checks include determining whether the database session is at a transaction boundary, at a call boundary, or at a component boundary.

A database session is at a transaction boundary if no active transaction is currently being executed for the session. A transaction is a logical unit of work that is performed as an atomic unit. In a database system, the database must reflect either all of the changes made by a transaction or none of them, so that the integrity of the database is preserved. Consequently, changes made by a transaction are not permanently applied to the database until the transaction has been fully executed. A transaction is said to "commit" when the changes it made become permanent. A transaction is active if it has not been committed, aborted, or otherwise terminated.

A session is at a call boundary if the database instance has finished executing a client call, rather than being at an intermediate stage of processing the call. For example, to process a call to execute a database statement, a database instance passes through stages, each corresponding to a particular type of operation. These stages are (1) creating a cursor, (2) parsing the database statement and binding its variables, (3) executing the database statement, (4) fetching the rows to return for the query, and (5) closing the cursor. These stages are described in greater detail in Oracle8 Server Concepts, Release 8.0, Volume 3, Chapter 23, the contents of which are incorporated herein by reference. Intermediate stages are the operations performed before processing of the call is complete; in this example, they are stages (1) through (5). Once the instance has performed stage (5) in response to a call, the source session is at a call boundary.

A session is at a component boundary if each of the session's "database components" is at its respective component boundary. A database component is a set of software modules on a database server that provides specialized and related functions. Component session state is generated and used specifically by a database component. According to one embodiment, session state can be viewed as the union or combination of component session states. The following are examples of database components. The cursor component manages cursors within a database instance. A cursor is an area of memory used to store information about a parsed database statement and other information related to processing the statement. The PL/SQL component is the database component responsible for executing code (such as procedures) written in PL/SQL, a procedural database language published by Oracle Corporation. The session parameter component is the database component responsible for managing attributes that generally control how calls and requests associated with a session are processed. These attributes are stored in the component's session state. For example, session parameters may include an attribute that determines the human language used for the results returned by executing a query. The Java (registered trademark) component is the database component responsible for executing code (such as class and object methods) written in Java. It uses component session state to store information related to the execution of Java code.

A database session is at a component boundary with respect to a particular database component if that component's session state can be migrated to another session. Each database component provides a function that returns a value indicating whether its component session state can be migrated. One reason the component session state of a database session might not be migratable is that it contains a file descriptor for an open file. A file descriptor contains information that is valid only for the instance hosting the database session.

This function is part of an interface of callback functions supported by each database component. The callback functions are invoked not only to capture and restore component session state, but also to determine whether the component session state allows the session to be migrated.
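The migration checks and the per-component callback interface can be sketched as follows. This is only an illustration under stated assumptions: the method names `can_migrate`, `capture`, and `restore`, the two example component classes, and the rule that all three boundary conditions must hold at once are hypothetical choices made for the sketch, not details given in the text.

```python
from typing import Iterable, Optional, Protocol


class ComponentCallbacks(Protocol):
    """Hypothetical callback interface supported by each database component."""

    def can_migrate(self) -> bool: ...
    def capture(self) -> bytes: ...
    def restore(self, stream: bytes) -> None: ...


class SessionParameters:
    """Component whose state is plain data, so it can always be moved."""

    def __init__(self, language: str = "en") -> None:
        self.language = language

    def can_migrate(self) -> bool:
        return True

    def capture(self) -> bytes:
        return self.language.encode("utf-8")

    def restore(self, stream: bytes) -> None:
        self.language = stream.decode("utf-8")


class FileAccess:
    """Component that may hold a file descriptor valid only on its host instance."""

    def __init__(self, open_fd: Optional[int] = None) -> None:
        self.open_fd = open_fd

    def can_migrate(self) -> bool:
        # An open file descriptor is meaningless on another instance.
        return self.open_fd is None

    def capture(self) -> bytes:
        return b""

    def restore(self, stream: bytes) -> None:
        self.open_fd = None


def session_can_migrate(transaction_active: bool, call_in_progress: bool,
                        components: Iterable[ComponentCallbacks]) -> bool:
    """Assumed rule: every boundary check must pass before migrating."""
    at_transaction_boundary = not transaction_active
    at_call_boundary = not call_in_progress
    at_component_boundary = all(c.can_migrate() for c in components)
    return at_transaction_boundary and at_call_boundary and at_component_boundary
```

For example, `session_can_migrate(False, False, [SessionParameters(), FileAccess(open_fd=3)])` evaluates to `False`, because the component holding an open file descriptor reports that its state cannot be moved.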

Generating Performance Metrics
Performance metrics are data that indicate the quality of service realized by a service with respect to one or more resources. A background process generates performance metrics from the performance statistics produced for each session and service hosted on a database instance. Like performance metrics, performance statistics can indicate quality of performance. However, performance statistics generally contain more detailed information about specific uses of specific resources. Performance statistics include, for example, how much CPU time a session used, the call rate, the number of calls a session made, the response time required to complete the calls for a session, how much CPU time was spent parsing queries for the session, how much CPU time was spent executing queries, how many logical and physical reads were performed for the session, and wait times for input/output operations on various resources, such as the wait time to read or write a particular set of data blocks. Performance statistics generated for a session are aggregated by the service associated with the session.

For each session established on a database instance, a session object is created as part of establishing the session. The session object contains items of information that the database instance uses to manage the session. Among these items is the service identifier specified for the session. In one example, the service id is a hash value generated from the service name of the service.
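As a rough illustration of a service id derived from a service name, the sketch below truncates a SHA-256 digest to 32 bits. The text does not say which hash function or id width is used, so both are assumptions; `SessionObject` and `establish_session` are likewise hypothetical names.

```python
import hashlib
from dataclasses import dataclass


def service_id(service_name: str) -> int:
    """Derive a stable 32-bit id from a service name.

    Assumption: SHA-256 truncated to 4 bytes; the text names no hash function.
    """
    digest = hashlib.sha256(service_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


@dataclass
class SessionObject:
    """Hypothetical subset of the information kept for each session."""
    session_id: int
    service_name: str
    service_id: int


def establish_session(session_id: int, service_name: str) -> SessionObject:
    # The session object is created as part of establishing the session.
    return SessionObject(session_id, service_name, service_id(service_name))
```

Because the id depends only on the name, every session established for the same service carries the same id, which makes it usable as a key when statistics are aggregated by service.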

Performance statistics are generated and aggregated by the processes that perform the work requested within a session. For example, a database session is established for a client that sends a request to execute a query. A process for that session runs the database server software, parses the query, and devises an execution plan for computing the query in parallel. The database server software that parses the query, devises the execution plan, and computes the query also generates and aggregates performance statistics. It aggregates the statistics both by session and by the session's service. The statistics are stored in a repository of performance statistics, which may be, for example, in-memory fixed tables associated either with a session or with a service independently of any session. Performance statistics for a session are aggregated and stored in the in-memory table for that session. Performance statistics for a service are aggregated and stored in the in-memory table for that service.

For example, a session is associated with the service PAY. A database process assigned to the session uses 0.4 seconds of CPU time to compute a query. The process adds 0.4 to the subtotal in the in-memory table for the service PAY.
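The PAY example can be written out as a small sketch of a statistics repository that keeps one running subtotal per session and one per service. The class name `StatsRepository`, the statistic name `cpu_seconds`, and the second session are hypothetical; only the service name PAY and the 0.4 seconds of CPU time come from the example above.

```python
from collections import defaultdict


class StatsRepository:
    """Hypothetical in-memory tables of performance statistics."""

    def __init__(self) -> None:
        # One table keyed by session, one keyed by service.
        self.by_session = defaultdict(lambda: defaultdict(float))
        self.by_service = defaultdict(lambda: defaultdict(float))

    def record(self, session_id: int, service: str, stat: str, amount: float) -> None:
        # The process doing the work adds to both subtotals.
        self.by_session[session_id][stat] += amount
        self.by_service[service][stat] += amount


repo = StatsRepository()
# A process for session 1 (service PAY) uses 0.4 s of CPU time on a query.
repo.record(session_id=1, service="PAY", stat="cpu_seconds", amount=0.4)
# A second, hypothetical session of the same service adds to the same service subtotal.
repo.record(session_id=2, service="PAY", stat="cpu_seconds", amount=0.25)
```

After these two calls, the per-session tables hold 0.4 and 0.25 respectively, while the subtotal for the service PAY holds their sum.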

A workload monitor periodically (for example, every 5 seconds) accesses the repository of performance statistics, generates performance metrics, and stores them in a repository of performance metrics. The performance metrics repository is preferably an in-memory database table. The performance metrics generated on each pass can include performance grades. Performance grades are sent to a listener for connection-time balancing. The listener uses the grades to balance load at connection time: when it receives a connection request for a service, it establishes the session on a database instance that is providing better performance for that service.
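A minimal sketch of how a listener might use performance grades at connection time is shown below. It assumes that grades are numbers reported per instance and per service and that a higher grade means better performance; the text does not define the scale, and the function name `choose_instance` and the instance names are hypothetical.

```python
from typing import Dict


def choose_instance(grades: Dict[str, Dict[str, float]], service: str) -> str:
    """Pick the instance with the best grade for the requested service.

    grades maps instance name -> {service name -> performance grade}.
    Assumption: a higher grade means better performance.
    """
    candidates = {
        instance: per_service[service]
        for instance, per_service in grades.items()
        if service in per_service
    }
    if not candidates:
        raise LookupError(f"no instance hosts service {service!r}")
    return max(candidates, key=candidates.get)


# Grades as the workload monitor might report them on one pass.
latest_grades = {
    "inst1": {"PAY": 0.6, "FIN": 0.9},
    "inst2": {"PAY": 0.8},
}
target = choose_instance(latest_grades, "PAY")
```

Here a connection request for PAY would be directed to `inst2`, and a request for FIN to `inst1`, the only instance hosting it.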

The workload monitor can also compare the generated performance metrics against service level agreements to detect violations. When a performance violation is detected for a service, the workload monitor sends a message to alert the director (such as a database director) responsible for responding to the service level violation.
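The violation check can be sketched as one pass of a monitor that derives a metric from aggregated statistics and compares it with a threshold. The choice of average response time as the metric, the threshold values, and the names `ServiceStats`, `Director`, and `monitor_pass` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceStats:
    """Hypothetical per-service subtotals from the statistics repository."""
    calls: int
    elapsed_seconds: float


@dataclass
class Director:
    """Stand-in for the director that responds to service level violations."""
    alerts: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.alerts.append(message)


def monitor_pass(stats: Dict[str, ServiceStats],
                 max_avg_response: Dict[str, float],
                 director: Director) -> List[str]:
    """Generate one metric per service and alert the director on violations."""
    violated = []
    for service, s in stats.items():
        if s.calls == 0 or service not in max_avg_response:
            continue  # nothing to measure, or no agreement defined
        avg_response = s.elapsed_seconds / s.calls
        if avg_response > max_avg_response[service]:
            violated.append(service)
            director.notify(f"{service}: average response {avg_response:.3f}s "
                            f"exceeds {max_avg_response[service]:.3f}s")
    return violated


director = Director()
violations = monitor_pass(
    {"PAY": ServiceStats(calls=100, elapsed_seconds=80.0),
     "FIN": ServiceStats(calls=50, elapsed_seconds=10.0)},
    {"PAY": 0.5, "FIN": 0.5},
    director,
)
```

In a running system a pass like this would be repeated on the monitor's interval (for example, every 5 seconds); the sketch shows a single pass so that it stays self-contained.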

Services, their various attributes, and their quality-of-service requirements (such as service level agreements) need to be defined. According to one embodiment, the database server provides a command line interface that accepts commands from a human administrator to create and modify definitions of the database server's services. These definitions are stored as database configuration data, in the form of "service profiles", in a dictionary within the database.

Hardware Overview
FIG. 6 is a block diagram that illustrates a computer system 600 upon which an embodiment of the invention may be implemented. Computer system 600 includes a bus 602 or other communication mechanism for communicating information, and a processor 604 coupled with bus 602 for processing information. Computer system 600 also includes a main memory 606, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604. Main memory 606 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 604. Computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604. A storage device 610, such as a magnetic disk or optical disk, is provided and coupled to bus 602 for storing information and instructions.

Computer system 600 may be coupled via bus 602 to a display 612, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 614, including alphanumeric and other keys, is coupled to bus 602 for communicating information and command selections to processor 604. Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.

The invention is related to the use of computer system 600 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another computer-readable medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 604 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610. Volatile media includes dynamic memory, such as main memory 606. Transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that make up bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.

Common forms of computer-readable media include, for example, a floppy (registered trademark) disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described below, or any other medium from which a computer can read.

Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send them over a telephone line using a modem. A modem local to computer system 600 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 602. Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions. The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.

Computer system 600 also includes a communication interface 618 coupled to bus 602. Communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to a local network 622. For example, communication interface 618 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP) 626. ISP 626 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the "Internet" 628. Local network 622 and Internet 628 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are exemplary forms of carrier waves transporting the information.

Computer system 600 can send messages and receive data, including program code, through the network(s), network link 620, and communication interface 618. In the Internet example, a server 630 might transmit requested code for an application program through Internet 628, ISP 626, local network 622, and communication interface 618.

The received code may be executed by processor 604 as it is received, and/or stored in storage device 610 or other non-volatile storage for later execution. In this manner, computer system 600 may obtain application code in the form of a carrier wave.

In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what the invention is, and of what the applicants intend the invention to be, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage, or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.

A block diagram illustrating a multi-node computer system on which an embodiment of the present invention may be implemented.
A block diagram illustrating a cluster of nodes according to an embodiment of the present invention.
A block diagram illustrating the components of a cluster of nodes and of the multi-node database servers that participate in providing various services for a database, according to an embodiment of the present invention.
A flow diagram illustrating a procedure for expanding a service to a target database instance and for quiescing a service on a target database instance.
A flow diagram illustrating a procedure for performing a two-phase quiesce.
A block diagram of a computer system that may be used in an embodiment of the present invention.

Claims

1. A method for dynamically allocating computer resources of a multi-node computer system, comprising the computer-implemented steps of:
monitoring the performance realized by a plurality of services running on the multi-node computer system, wherein the plurality of services includes a first service and a second service;
based on the step of monitoring the performance of the plurality of services, generating performance metrics that indicate the performance realized by each service of the plurality of services;
the multi-node computer system detecting, based on the performance metrics, a violation of a service level agreement for the first service; and
in response to the multi-node computer system detecting the violation of the service level agreement, adjusting the allocation of computer resources of the multi-node system between the first service and the second service.

2. The method of claim 1, wherein the step of adjusting resource allocation comprises allocating another node of the multi-node system to host the first service.

3. The method of claim 1, wherein a first server runs on a first node of the multi-node system, and the step of adjusting resource allocation includes expanding the first service to the first server.

4. The method of claim 3, wherein the step of adjusting resource allocation includes provisioning the first server on the first node in order to perform the step of expanding the first service to the first server.

5. The method of claim 1, wherein:
the multi-node system includes a first node and a second node;
a first group of sessions is established for the first service on the first node;
a second group of sessions is established for the first service on the second node; and
the step of adjusting resource allocation comprises migrating at least one session from the second group to the first node.

6. The method of claim 1, wherein:
the computer resources include pools of resources;
the pools of resources include a first pool of resources and a second pool of resources; and
the step of adjusting the allocation of computer resources includes attempting to resolve the performance violation by adjusting the allocation of the first pool of resources, which is at a lower level in a hierarchy, before attempting to adjust, for the first service, the allocation of the second pool of resources, which is at a higher level in the hierarchy.

7. The method of claim 6, wherein:
the pools of resources include a first pool of servers as the first pool of resources and a second pool of nodes as the second pool of resources; and
the step of attempting to resolve the performance violation includes attempting to allocate another server from the first pool to host the first service before attempting to add another node from the second pool to host the first service.

8. The method of claim 7, wherein the steps further include provisioning another server on the other node in response to adding the other node.

9. The method of claim 6, wherein:
a first multi-node server includes a first server, a second server, and a third server;
the first server hosts a plurality of sessions for the first service; and
the step of attempting to resolve the performance violation includes attempting to migrate at least one of the plurality of sessions to the second server before allocating the third server to host the first service.

10. The method of claim 1, wherein:
each service of the plurality of services is a category of work performed by the multi-node system; and
the step of adjusting the allocation of computer resources of the multi-node system includes adjusting the allocation of computer resources for each category of a plurality of categories of work.

11. The method of claim 10, wherein:
a plurality of units of software run on a set of client computers interconnected to the multi-node system; and
each category of work corresponds to a particular unit of software among the plurality of units.

12. The method of claim 11, wherein each category of work corresponds to work performed in response to requests generated by a particular unit of software among the plurality of units.

A method for dynamically allocating computer resources of a multi-node computer system, comprising the steps of:
monitoring the performance of a plurality of services hosted on the multi-node system; and
generating, based on the monitoring, performance metrics indicating the performance realized by each service of the plurality of services;
wherein the multi-node system includes a first set of nodes and a second set of nodes;
wherein the steps further comprise running a first multi-node server and a second multi-node server on the first set of nodes;
wherein the plurality of services includes a first service and a second service hosted by the first multi-node server; and
wherein the steps further comprise:
a first system component operating on the first set of nodes adjusting the allocation of resources of the first multi-node server between the first service and the second service based on the performance metrics; and
a second system component operating on the first set of nodes adjusting the allocation of resources of the first set of nodes between the first multi-node server and the second multi-node server based on the performance metrics.
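
The two levels of adjustment in this claim can be sketched in Python as two functions, one per system component. Everything here is an assumption made for illustration: the function names, the use of response time in seconds as the performance metric, the percentage shares, the fixed `step`, and the rule of moving one node from the least loaded to the most loaded server. The claim itself only requires that each component adjust allocation at its own level based on the performance metrics.

```python
def adjust_service_shares(shares, metrics, goals, step=10):
    """First component: shift a share of one multi-node server's resources
    between the services it hosts (shares are hypothetical percentages)."""
    violating = [s for s in shares if metrics[s] > goals[s]]
    healthy = [s for s in shares if metrics[s] <= goals[s] and shares[s] > step]
    if violating and healthy:
        donor = max(healthy, key=lambda s: shares[s])
        shares[donor] -= step
        shares[violating[0]] += step
    return shares


def adjust_node_assignment(nodes, server_load):
    """Second component: move one node of the first set of nodes from the
    least loaded multi-node server to the most loaded one."""
    busiest = max(server_load, key=server_load.get)
    idlest = min(server_load, key=server_load.get)
    if busiest != idlest and len(nodes[idlest]) > 1:
        nodes[busiest].append(nodes[idlest].pop())
    return nodes


# Level 1: svc1 misses its (assumed) 1.0 s response-time goal, svc2 meets it.
shares = adjust_service_shares(
    {"svc1": 50, "svc2": 50},
    metrics={"svc1": 2.5, "svc2": 0.4},
    goals={"svc1": 1.0, "svc2": 1.0},
)

# Level 2: server1 is heavily loaded, so it receives a node from server2.
nodes = adjust_node_assignment(
    {"server1": ["n1", "n2"], "server2": ["n3", "n4"]},
    server_load={"server1": 0.9, "server2": 0.2},
)
```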

The steps further comprise:
the first system component detecting a violation of a service level for the first service; and
in response to detecting the violation of the service level for the first service, the first system component sending a first request to the second system component to adjust the resource allocation of the first set of nodes between the first multi-node server and the second multi-node server.

The method of claim 13, wherein the step of the second system component adjusting resource allocation of the first set of nodes between the first multi-node server and the second multi-node server comprises the second system component assigning a node between the first multi-node server and the second multi-node server.

The steps further comprise:
the first system component detecting a violation of a service level for the first service; and
in response to detecting the violation of the service level for the first service, the first system component adjusting the resource allocation of the first multi-node server between the first service and the second service.

The method of claim 16, wherein the first multi-node server includes a first server instance and a second server instance, and
wherein the step of adjusting resource allocation of the first multi-node server comprises causing the second server instance to host the first service.

The method of claim 16, wherein:
the first multi-node server includes a first server instance and a second server instance;
the first server instance hosts at least one session for the first service;
the second server instance hosts at least one session for the first service; and
the step of adjusting resource allocation of the first multi-node server includes adjusting a balance of sessions for the first service between the first server instance and the second server instance.
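
One simple way to picture "adjusting a balance of sessions" is an even split of the first service's sessions across the two server instances. The Python sketch below assumes that policy; the claim does not say that the balance must be even, and the function name `rebalance` is hypothetical.

```python
def rebalance(sessions_a: int, sessions_b: int) -> tuple:
    """Return new session counts whose difference is at most one.

    Assumption: an even split is the target balance. When the total is odd,
    the instance that had more sessions keeps the extra one.
    """
    total = sessions_a + sessions_b
    if sessions_a >= sessions_b:
        new_a = total // 2 + total % 2
    else:
        new_a = total // 2
    return new_a, total - new_a


balanced = rebalance(7, 2)
```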

The method of claim 14, wherein in response to receiving the first request, the second system component provides another server instance of the first multi-node server to another node.

The method of claim 19, wherein:
the first set of nodes is a first cluster of nodes sharing access to at least one persistent storage;
the second system component sends a second request to a third system component running on the multi-node system to request another node; and
the third system component assigns a node between the first cluster and another cluster in the multi-node system based on the performance metrics.

The method of claim 19, wherein providing another server instance includes causing the first system component to send a message to the second system component indicating that the other server instance is available.