JP2004005669A

JP2004005669A - Network server allocation system

Info

Publication number: JP2004005669A
Application number: JP2003139991A
Authority: JP
Inventors: Hidekazu Takahashi; 高橋　英一; Takeshi Aoki; 青木　武司; Ken Yokoyama; 横山　乾; Shinji Kikuchi; 菊池　慎司
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-05-19
Filing date: 2003-05-19
Publication date: 2004-01-08
Anticipated expiration: 2018-09-08
Also published as: JP3510623B2

Abstract

PROBLEM TO BE SOLVED: To provide a server load recognition method and a server allocation method with high precision and efficiency.

SOLUTION: A device that transfers data from a client to a plurality of servers through a network is provided with a relay means that changes the destination address of data sent from the client and transfers it to one of the servers, a connection management means that holds the correspondence between the data and the server and instructs the relay means of the destination, and a server allocation means that measures the processing capacity of the servers, the client, and the route, determines the correspondence between the data and the server using a function that follows a service distribution ratio based on those measurements, and passes that correspondence to the connection management means.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、サーバ資源の割り当てに関するものである。
【０００２】
【従来の技術】
近年、インターネット／イントラネットの急速な普及により、ネットワークサービスサーバの効率的利用およびサービス安定性が要求されてきている。サーバの効率的利用および安定したサービス供給にはサーバへのサービスの最適な割り振りが不可欠で、そのためにはサーバの負荷を正確に認識する必要がある。
【０００３】
従来技術におけるサーバの負荷認識方法としては以下に示すものが知られている。
（１）．エージェント方式
サーバ上にＣＰＵやメモリなどの資源使用率を計測するプログラムをおく方式であるが、エージェント自身がサーバの負荷を上げ、エージェントが外部と通信を行う場合、そのために帯域を消費するなど、エージェントによる負荷計測精度への干渉が生じる。また、サーバへエージェントプログラムをインストールしなければならないため、汎用性に欠け指導コストが大きいという問題があった。
（２）．負荷計測通信方式
サーバに対しｐｉｎｇや擬似的なサービス通信などを行い、レスポンス時間などからサーバ負荷を求める方式であるが、計測のための通信で経路の帯域を消費してしまい、サーバも応答のための負荷を負うので負荷計測への干渉が生じる。また、計測に用いるプロトコルなどをサーバがサポートしている必要があり汎用性に欠けるといった問題があった。
（３）．ＶＣ数、接続時間、接続頻度、接続エラー率、レスポンス時間の計測
これらは、クライアントからのパケットをサーバへ中継する装置上にあって、中継時に計測したサーバへのＶＣ数、接続時間、接続頻度、接続エラー率、レスポンス時間からサーバ負荷を求める方式であるが、接続時のサーバの振る舞いに基づくため誤差が大きい。精度を上げるためには多量の接続を必要とするため、少ない接続で大量の通信を行うサービスには適さない。また、中継が必須であるため、計測装置のスループットでサーバのスループットが制限されるといった問題があった。
（４）．ヒット回数、ヒット率計算方式
ＷＷＷサーバなどへのパケットを調べ、アクセス対象であるファイルなどのコンテンツ毎にアクセス回数（ヒット回数）やアクセス頻度（ヒット率）を計測し、結果からサーバ負荷を求める方式であるが、アクセス対象ファイルを特定するためにはプロトコル毎のパケット解析処理が必要になり新規サービスに対応できない。さらにサーバの性能が既知でなければならない。サーバ性能をあらかじめ与えるにはサーバ性能をカタログ値や経験で求めるしかないが、サーバ性能はシステム構成や運用形態に大きく影響されるため、標準的な構成や形態に基づいたカタログ性能値は正確でなく、経験的に求める場合は少なくとも１回の障害を避けることができないといった問題があった。以上のように、いずれの方式もサーバに負担をかけずに高速かつ効率的にサーバの負荷を検出できるものではなかった。
【０００４】
さらに、このようなサーバの負荷が正確に認識できないために、サーバで提供されるサービスを割り振ることも難しかった。サービスの割り振りという視点だけからは以下のような方式が提案されている。
（５）．ラウンドロビンＤＮＳ方式
ＤＮＳ（Ｄｏｍａｉｎ　Ｎａｍｅ　Ｓｙｓｔｅｍ）サービスにおいて、１つのドメイン名に対し複数のサーバのＩＰアドレスを対応させるようエントリ表に設定しておき、クライアントからのサーバＩＰアドレスの問い合わせ要求に対し、各サーバをエントリ表にしたがい循環的（ラウンドロビン）に割り当て、割り当てられたサーバのＩＰアドレスを選択してクライアントに応えることで、サービスを複数のサーバへ分配する方式である。
【０００５】
しかし、このラウンドロビンＤＮＳ方式では、サービスの分配率は、均等であるかあるいは単純な比率でしか行えず、各サーバは、それぞれの能力や動的な負荷状況に関係なく、割り当てられた分配率に応じてサービスを行わなければならないので、各サーバの負荷状況に差が生じてしまい、全体として非効率になってしまっていた。また、ＤＮＳ問い合わせ情報は、通常クライアント側でキャッシングされてしまうので、比率変更が生じてもそれをただちに反映できないという問題もあった。
（６）．ハッシュテーブルを用いた分配方式
コネクションを管理するハッシュテーブルのエントリをサーバへ割り当て、割り当てるエントリ数に応じた比率でサービスをサーバへ分配する方式である。
【０００６】
この方式ではまず、クライアントからのサービス要求時にクライアントアドレスやサービスからエントリを決め、そのエントリが割り当てられているサーバへ要求を送る。そして、割り当てたエントリ数の比率に応じた数のサービスが各サーバへ振り分けられるので、高性能サーバへ多くのエントリを割り振ったり、高負荷になったサーバへの割り当てたエントリをそれ程負荷の高くないサーバへ割り当て直したりすることでサーバの効率的利用を実現している。
【０００７】
しかし、このハッシュテーブルを用いた分配方式では、ハッシュエントリ数の比率をサービス分配率へ正しく反映させるためには、偏りのないハッシュ値を生成するハッシュ関数が必要であるが、一般にハッシュキー（クライアントアドレスやポート番号など）のあらゆる分布に対し偏りのないハッシュ値を生成するハッシュ関数を見つけることは不可能である。また、分配率の精度はハッシュエントリ数に比例するため、精度を上げるためには多数のエントリ数が必要となり、記憶資源（バッファ）を多く消費してしまう。そして、結果的にコネクション管理に使用できる記憶資源（バッファ）が少なくなってしまい、大量なアクセスを扱うことができなくなるといった問題があった。
（７）．サーバの状態や性能にしたがった分配方式サーバに対しｐｉｎｇなどでレスポンス時間を計測したり、クライアントからのパケットを中継して、中継時に接続時間、接続エラー率などを計測してサーバの負荷の高低を予測したり、サーバ間の性能比率を予測する等して、負荷や性能比に応じた量のサービスを分配する方式である。
【０００８】
しかし、この方式では、クライアントの処理能力、クライアントまでの経路の長さや帯域幅などとは無関係にどのクライアントへのサービスも平等にサーバへ分配するため、サーバの利用効率を最大にできなかった。
【０００９】
たとえば経路の帯域幅が狭かったり長かったりして経路がボトルネックとなっているクライアントや処理能力の低いクライアントにとってサーバの性能（特にスピード）差や負荷はサービス品質には現れてこない。
【００１０】
逆に、近くや高速な回線で接続しているクライアントあるいは処理能力が高いクライアントにとってサーバの性能差や負荷はサービス品質に大きく影響することになる。そこで全てのクライアントへのサービスを平等に振り分けようとすると、クライアントにとって必要以上のサーバ資源を振り分けたり、不足したサーバ資源を振り分けたりせざるを得なくなってしまうといった問題があった。
【発明が解決しようとする課題】
以上述べたように、従来技術におけるサーバの負荷認識方法ならびにサーバ割当方法はいずれも問題があった。
【００１１】
本発明は、上記問題点を解決するべく提案されたものであり、その目的は、サーバの負荷をサーバに負担をかけずに高速かつ効率的に認識するとともに、サーバにおいて動的な負荷状況に応じたサービス分配を行い、設定や調整で得たサービス分配率を正確にサービス分配に反映させるとともに、クライアント毎に必要なサーバ資源を見積もりながらサービスを分配することでサーバの利用効率を最大にすることにある。
【００１２】
【課題を解決するための手段】
本発明は、前記課題を解決するため、以下の手段を採用した。
【００１３】
本発明の第１の手段は、データをネットワークを介してクライアントから複数のサーバに転送する装置において、クライアントから送信されるデータを宛先を変えてサーバのいずれかに転送する中継手段と、データとサーバの対応関係を保持し中継手段へ宛先を指示する接続管理手段と、サーバ、クライアントおよび経路の処理能力を計測して求め、それに基づいたサービス分配率にしたがった関数を用いてデータとサーバの対応を決定し接続管理手段へ伝えるサーバ割当手段とからなるネットワークサーバ割当装置である。
【００１４】
サーバの性能・負荷およびクライアント側の性能・負荷を計測して求めた方法にしたがってサービスを分配するので、サーバの動的な負荷状況の変化に自動的に対応でき、さらに、クライアントから見えるサービス品質を維持するために必要なだけのサーバを割り当てることができ、サーバの利用効率を最大にするという効果がある。さらに、サーバ割当決定を関数を用いて行うので、サービス分配率を正確に分配に反映できる効果がある。さらに、単一の接続管理手段のみで十分な効果を得られる。
【００１５】
本発明の第２の手段は、前記第１の手段でのサーバ割当手段において、サーバの処理能力に応じた確率分布に対し、クライアントおよび経路の処理能力が低いほど一様分布に近づける修正を行うことで求めた修正確率分布を分配率とするネットワークサーバ割当装置である。
【００１６】
クライアントおよび経路の処理能力の高さとサーバの処理能力がサービス品質に与える影響の大小の比例関係をサービス分配率へ反映するので、サーバ処理能力の影響がサービス品質に与える影響が大きいクライアントへ処理能力が高いサーバを優先的に割り当てることができる効果がある。
【００１７】
本発明の第３の手段は、前記第１の手段でのサーバ割当手段において、現在サービス中のクライアントについてのクライアントおよび経路の処理能力の分布を求め、新規接続クライアントおよび経路の処理能力が分布に対して低いほど、サーバの処理能力に応じた確率分布を一様分布に近づけ、逆に高いほど各サーバの処理能力を際立たせる修正を行い修正確率分布を求め、それを分配率とするようにした。
【００１８】
サービス分配率を現在サービス中のクライアントおよび経路の処理能力分布との関係で調節するので、遠地からと近地からのクライアントの比率が変化するような場合でも自動的に対応できる効果がある。
【００１９】
本発明の第４の手段は、前記第１の手段において、サーバ割当手段を複数分であり、接続管理手段の規模はサービス分配に依存しないため、記憶資源（バッファ）の利用効率を上げる効果がある。
【００２０】
サービス別やクライアント別に割当対象サーバ群を使い分けたりサービス分配ポリシを切り替えたりするなど多様なサーバ割り当てを単一の装置で行える効果がある。本発明の第５の手段は、クライアントからサーバへの通信を監視し、接続当たりの通信データサイズをサーバの負荷として計測するステップと、接続当たりの通信データサイズの変化を検出し、最大値を記録するステップと、前記最大値に対するその時点での接続当たりの通信データサイズが小さくなればサーバが高負荷であると判断するステップとからなるネットワークサーバ負荷検出方法である。
【００２１】
ここで、ＴＣＰ等では、サーバはクライアントから送られたパケットデータを保持するための記憶資源（バッファ）を接続毎に均等に割り当てているが、サーバは次回の受信で記憶資源（バッファ）に保持できるデータサイズをクライアントへ通知し、クライアントはサーバから通知されたサイズのデータをサーバに送るようになっている。
【００２２】
したがって、サーバが高負荷になるとクライアントから送られたデータをただちに処理できなくなるので、データの全てあるいは一部がサーバの記憶資源（バッファ）内に残留することになり、結果としてサーバは記憶資源（バッファ）内の残留データの分だけ小さいサイズをクライアントに通知せざるを得なくなる。
【００２３】
したがって、通信回線上での接続時間当たりでのデータサイズを検出することによって、サーバの高負荷状態を検出することが可能となる。本発明の第６の手段は、前記第５の手段において、監視通信最小数および監視最小時間を用いて、監視した通信の数が監視通信最小数に達し、かつ、計測時間が監視最小時間に達するまで、接続数および通信データサイズを計測するようにした。
【００２４】
本発明の第７の手段は、前記第５の手段において、接続開始および接続終了の通信を認識し、接続開始および接続終了の通信データサイズを負荷検出対象から除外するようにした。
【００２５】
接続開始と終了の通信データは小さくサーバの負荷には依存しないので、通信総データサイズ計上から除外することで、負荷計測および高負荷判断の精度を上げる効果がある。
【００２６】
本発明の第８の手段は、前記第５の手段において、接続開始通信の情報を接続終了または接続確立まで保持するステップと、クライアントが接続失敗と判断して行う再接続のための接続開始通信を前記保持された情報に基づいて検出するステップと、接続開始通信回数に占める再接続通信の割合をサーバの負荷とし、この割合が高い場合にサーバが高負荷であると判断するものである。
【００２７】
サーバの負荷が大きい場合には、サーバはクライアントからの接続要求に応答通知を返信しなくなる。これに対してクライアントは接続要求を再送することになる。したがって、通信回線上でのクライアントの接続要求の再送を検出することによりサーバの高負荷を判定できる。
【００２８】
本発明の第９の手段は、前記第５の手段において、さらに、クライアントからの通信データサイズの分布を求めるステップと、前記分布からサーバの負荷に関係しない極端に小さな通信データを識別するステップと、前記極端に小さな通信データを負荷判定から除外するステップとを含むものである。
【００２９】
サーバ負荷に関係しない極端に小さな通信データを計測から除外することで負荷計測および高負荷検出の精度を上げる効果がある。本発明の第１０の手段は、前記第５の手段において、クライアントからサーバへの通信から少なくともシーケンス番号を求めるステップと、前記シーケンス番号の最大値を、接続開始から終了まで保持するステップと、受信した通信のシーケンス番号を前記で保持されたシーケンス番号と比較するステップと、通信から得られたシーケンス番号が保持されたシーケンス番号よりも小さい場合、その通信を計測から除外するようにした。
【００３０】
シーケンス番号は通常昇順であるが、通信回線上での輻輳などによって通信の順序性の破壊もしくは欠損が起きると順序が昇順でなくなる。サーバは到着していないデータ以降のデータを処理できないので、サーバ負荷に関わらずサーバの受信可能データサイズが小さくなり、クライアントの通信データサイズも合わせて小さくなる。経路の影響を上記の方法で回避することで、サーバ負荷計測および高負荷検出の精度を上げる効果がある。
【００３１】
本発明の第１１の手段は、前記第５の手段において、前記通信から得られたシーケンス番号が保持されたシーケンス番号よりも小さい場合、当該通信データを重み付け処理を行ったの後に計上する、または両シーケンス番号から経路上に問題がなかったときの通信データサイズを予測して、予測したサイズを負荷検出に計上するようにした。
【００３２】
本発明の第１２の手段は、サーバからクライアントへの通信を監視し、サーバがクライアントへ通知する受信可能データサイズおよび接続数を計測するステップと、接続当たりの受信可能データサイズをサーバ負荷として求めるステップと、接続当たりの受信可能データサイズの最大値を記憶し、当該最大値に対する現接続当たりの受信可能データサイズが小さくなることでサーバが高負荷であると判断するネットワークサーバ負荷検出方法である。
【００３３】
本発明の第１３の手段は、クライアントからサーバへの通信を監視し、サーバの負荷状態を検出するサーバ負荷検出装置であって、接続当たりの通信データのサイズを計算するデータサイズ計算手段と、接続当たりの通信データサイズの変化を検出し、最大値を記憶する記憶手段と、前記最大値に対するその時点での接続当たりの通信データサイズが一定値以下となったときにサーバの高負荷を検出する負荷検出手段とからなるネットワークサーバ負荷検出装置である。
【００３４】
【発明の実施の形態】
【実施例１】
図１は、本実施例１におけるサーバ負荷検出装置４の機能構成を示したものである。同図に示すように、サーバ負荷検出装置４は、クライアント１とサーバ２の通信回線３に接続されており、具体的には、ルータ等に実装することも可能である。
【００３５】
このサーバ負荷検出装置４は、同図に示すように、通信回線３を伝送されるパケットデータ（ＴＣＰパケット：Ｔｒａｎｓｍｉｓｓｉｏｎ　Ｃｏｎｔｒｏｌ　Ｐｒｏｔｏｃｏｌ　Ｐａｃｋｅｔ）を取り込む通信データ取込部５を有している。この通信データ取込部５には、接続数検出部６、パケット数計算部８およびパケットサイズ計算部７が接続されている。
【００３６】
接続数検出部６は、通信データ取込部５が取り込んだＴＣＰパケットから単位時間当たりの接続数Ｃを検出する機能を有している。この接続数検出部６は、先頭パケットを意味するＳＹＮパケットを検出すると＋１とし、最後のパケットを意味するＦＩＮパケットを検出すると−１とする。これによって、当該サーバに現在接続されているクライアントの数が検出できることになる。
【００３７】
パケット数計算部８は、前記通信データ取込部５が取り込んだ単位時間当たりのＴＣＰパケットの数Ｎをカウントする機能を有しており、パケットサイズ計算部７は、前記通信データ取込部５が取り込んだ単位時間当たりのＴＣＰパケットの合計サイズＳを計算する機能を有している。
【００３８】
これらの各部の計算・計数データは、負荷検出部１０に送られて後述の所定の演算処理に基づいて負荷が判定される。パケットサイズ計算部７によって計算されるパケット合計サイズＳは、計測開始時に０とし、パケットが到着したらそのパケットサイズ分だけ順次増やしていく。なお、ＳＹＮ，ＦＩＮパケットについてはデータパケットに較べてそのサイズが小さく、サーバ負荷への影響が小さいため、無視してもよい。
【００３９】
パケット数計算部８によってカウントされるパケット数Ｎは、計測開始時に０とし、パケットが到着する度に＋１とする。なお、ここでもＳＹＮ，ＦＩＮパケットについては前述した理由によりカウントを無視してもよい。
【００４０】
パケット数計算部８によるカウントはＮがある値Ｎｍｉｎを超えるまで継続するが、このＮｍｉｎを超えても計測開始からの時間があらかじめ設定された時間Ｔｍｉｎよりも短かいときには、時間Ｔｍｉｎが経過するまでカウントを継続する。
【００４１】
ここで、ＮｍｉｎおよびＴｍｉｎはあらかじめパケット数計算部８に設定しておく。このようにＮｍｉｎ，Ｔｍｉｎを併用することで、負荷検出のためのパケットのサンプル数が少ないために生じる計算誤差を減じることができ、また、サンプル数が多すぎるために生じるオーバーフローを回避することもでき、負荷検出精度を高めることができる。
【００４２】
負荷検出部１０における負荷検出は以下の演算処理を行うことにより行われる。まず、負荷検出部１０は、接続数検出部６から接続数Ｃを、パケットサイズ計算部７よりパケットサイズＳを受け取ると、下記の式に基づいてサーバ負荷指標値Ｌを求める。
【００４３】
なお、ここでＴはタイマ１１により計測された計測時間である。ここで設定されたＴｍｉｎ経過時にサンプルとなるカウント数ＮがＮｍｉｎを超えているときにはＴ＝Ｔｍｉｎとする。
【００４４】
Ｌ＝（Ｓ／Ｃ）／Ｔここで、Ｌは単位時間での１接続当たりのデータ転送量を意味することになる。このＬを用いてサーバ２の負荷を検出することができる。
【００４５】
また、負荷検出部１０では、サーバ２の処理能力限界予測値Ｌｍａｘを更新する。ここでＬｍａｘは０を初期値とし、ＬがＬｍａｘを超えた場合にはＬｍａｘの値をＬとする。ここでもし、ＬとＬｍａｘとの間に以下の関係が成立すればサーバは高負荷であると判断することができる。
Ｌ＜αＬｍａｘ　ただし　０＜α＜＝１　・・・（１）
上式（１）において、αはあらかじめ設定した定数である。図３は、前述した負荷検出部１０における負荷検出をフロー図で示したものである。
【００４６】
まず、計測が開始されると、カウント数Ｎおよびサーバ負荷指標値Ｌはリセットされ、タイマ１１がスタートされる（ステップ３０１）。そして、通信データ取込部５を介してパケットの受信が開始されると（３０２）、接続開始パケットＳＹＮであるか（３０３）、接続終了パケットＦＩＮであるか（３０５）がそれぞれ判定される。ここで、接続開始パケットＳＹＮである場合には、変数Ｖが＋１される（３０４）。また、接続終了パケットＦＩＮである場合には変数Ｖが−１される（３０６）。
【００４７】
次に、新たなパケットが受信される度にＮが＋１されてサーバ負荷指標値Ｌが負荷検出部１０で計算される（３０７）。この計算は前に説明した計算式に基づいて行われる。そして、前述の（１）式を用いて、サーバ負荷指標値ＬがαＬｍａｘよりも小さい場合には、サーバは高負荷状態になっていると判定される。
【００４８】
このような高負荷判定は、タイマ値があらかじめ設定されたＴｍｉｎ以上となり、かつパケットのカウント数Ｎがあらかじめ設定されたＮｍｉｎ以上となったときに終了する（３０８）。
【００４９】
ここで、ＴＣＰでは、サーバ２はクライアント１から送られたパケットデータを保持するための記憶資源（バッファ）を接続毎に均等に割り当てている。サーバ２は次回の受信で記憶資源（バッファ）に保持できるデータサイズをクライアント１へ通知し、クライアント１はサーバ２から通知されたサイズのデータを通信回線３を通じてサーバ２に送るようになっている。
【００５０】
したがって、サーバ２が高負荷になるとクライアント１から送られたデータをただちに処理できなくなるので、データの全てあるいは一部がサーバ２の記憶資源（バッファ）内に残留することになり、結果としてサーバ２は記憶資源（バッファ）内の残留データの分だけ小さいサイズをクライアントに通知せざるを得なくなる。
【００５１】
ここで、ＴＣＰはできるだけ大きなサイズのデータをやり取りするよう設計されたプロトコルであるため、サーバ２が高負荷となる前の状態では、クライアント１からサーバ２に送られるデータサイズは最大となり、その後、サーバ２の負荷が大きくなると通信回線３上を伝送されるデータサイズも小さくなる。本実施例では、図２に示すように、このデータサイズが小さくなることに着目してサーバの高負荷状態を検出している。
【００５２】
本実施例では、サーバ２が高負荷となる前の状態のときの通信回線３を伝送されるデータサイズが最大の値をＬｍａｘとしてデータベース１２に保持するようにしている。そして、（１）式で示したように、このＬｍａｘに定数αを乗じた値（しきい値）とＬとを比較し、このＬがしきい値以下となったときにサーバ２が高負荷状態であると判定している。
【００５３】
このように本実施例では、接続当たりのサイズを調べるため、接続数自体が減少することによるデータ合計サイズの減少で判断を誤ることを防ぐことができ、さらにαを用いることにより、外乱で生じるＬの変動による高負荷誤検出を防ぐことができる。
【００５４】
なお、図４のフロー図は、図３で説明したフロー図とほぼ同様であるが、通信開始パケットＳＹＮと通信終了パケットＦＩＮとを考慮しないで高負荷を判定する手順を示したものである。
【００５５】
【実施例２】
本実施例２は、クライアント１からサーバ２への再送処理を利用した高負荷検出方法である。
【００５６】
本実施例２で用いる装置構成は実施例１の図１で示したものとほぼ同様であるので図示は省略する。本実施例２では、個々の開始パケットＳＹＮの情報をデータベース１２に記録している（図６（ａ）〜（ｃ）参照）。そして、それぞれの開始パケットＳＹＮの情報は、クライアントアドレス（ＩＰ）、クライアントポート番号（ｓｐ）、サーバポート番号（ｄｐ）の組で識別するようになっている。
【００５７】
ＴＣＰではサーバ２がクライアント１からの開始パケットＳＹＮを受信すると、クライアント１に対してＳＹＮ受信確認パケットを返信する。ここで、クライアント１がサーバ２からのＳＹＮ受信確認パケットを一定時間経過しても受信できない場合、再度開始パケットＳＹＮをサーバ２に対して再送信している。
【００５８】
この概念を示したものが図５である。同図（ａ）において、まずクライアント１ａより接続要求（開始パケットＳＹＮ）がサーバ２に送信される。一方、別のクライアント１ｂからも接続要求（開始パケットＳＹＮ）がサーバ２に送信される。ここで、サーバ２のバッファ５１に余裕のある場合、すなわち低負荷状態の場合には、サーバ２は、クライアント１ａおよび１ｂに対して応答通知（受信確認パケット）を送信する。しかし、サーバ２のバッファ５１に余裕のない場合には同図（ｂ）に示すように、クライアント１からの接続要求（開始パケットＳＹＮ）に対して応答ができない。そこでクライアント１は、同図（ｃ）に示すように、一定時間内にサーバ２からの応答通知（受信確認パケット）を受領できない場合には、サーバ２に対して接続要求を再送する本実施例２では、開始パケットＳＹＮの数Ｃｓを接続数検出部６でカウントし、クライアント１からの開始パケットＳＹＮの再送回数を検出してＣｓに対する開始パケットＳＹＮの再送回数の比率Ｒｓを算出し、これをサーバ負荷指標値Ｃｒｓとする。
【００５９】
ここで、開始パケットＳＹＮの再送は、開始パケットＳＹＮから抽出したＳＹＮ情報がデータベース１２に既に記録済みであれば再送であると判別できる。このことを示したのが図６である。同図（ａ）において、負荷検出装置４のデータベースには、ＳＹＮ情報として、ＳＹＮ１（ＩＰ１，ｓｐ１，ｄｐ１）、ＳＹＮ２（ＩＰ２，ｓｐ２，ｄｐ２）およびＳＹＮ３（ＩＰ３，ｓｐ３，ｄｐ３）が記録されている。このときクライアント１からサーバに対して接続要求（開始パケットＳＹＮ４）が通信回線３を通じて発信される。負荷検出装置４は、この接続要求が、自身のデータベース１２に格納されていない接続要求、すなわち初めての接続要求である場合には、この接続要求（ＳＹＮ４：ＩＰ４，ｓｐ４，ｄｐ４）を当該データベース１２に格納する（図６（ｂ））。そして、この接続要求（ＳＹＮ４）に対してサーバ２からクライアント１に応答通知がなされないときには、クライアント１よりサーバ２に対して当該接続要求（ＳＹＮ４）が再送される。負荷検出装置４は、この接続要求（ＳＹＮ４）を通信データ取込部５で取り込んで、負荷検出部１０がデータベース１２を検索することにより、既に自身が格納している接続要求であることを知り、その結果当該接続要求（ＳＹＮ４）が再接続要求であると判定する。
【００６０】
負荷検出部１０における具体的な計測方法は実施例１で説明した接続数ＣおよびパケットサイズＳの計数・検出方法にしたがう。ここで、求めた開始パケットＳＹＮの再送回数の比率Ｒｓ、すなわちＣｒｓにより次式（２）が成立すればサーバ１は高負荷であると判定する。
Ｃｒｓ＞β　ただし、０＜β＜１　・・・（２）
上式（３）において、βはあらかじめ設定された定数である。
【００６１】
サーバ２は接続毎にクライアント１からのデータを保持するバッファ５１を割り当てるが、割り当てるバッファ５１が枯渇すると接続を行わずに応答通知（ＳＹＮ受信確認パケット）をクライアント１に返さない。そのため、クライアント１は開始パケットＳＹＮの再送数の割合が増加することになる。したがって、式（２）よりサーバの高負荷を検出することが可能になる。図６（ｄ）は、このような再送率（再送回数／通信回数）とサーバ負荷の関係を示したグラフ図である。
【００６２】
なお、上式（２）の定数βは、外乱や瞬間的な高負荷状態による誤検出を防ぐためのものである。瞬間的な高負荷状態は生起確率が小さくかつ長くは続かないため、無視してよい。
【００６３】
【実施例３】
本実施例３は、負荷検出に際して、通信データサイズによって計測対象を弁別する技術である。なお、本実施例３も装置構成は図１と同様であるので、図１を用いて説明する。
【００６４】
本実施例３では、負荷検出部１０において、クライアント１からのパケットサイズＳｉとＤｓとの間に以下の関係が成立する場合はＳｉをパケット合計サイズＬに加算しないで負荷検出を行う。
Ｓｉ＜γＤｓ　・・・（３）
ただし、０＜γ＜１，　Ｄｓ＝ｆ（Ｓ１，Ｓ２，．．．．Ｓｉ−１）とする。
【００６５】
ここで、γはあらかじめ設定された定数である。Ｄｓは計測したパケットサイズの分布指標を求める関数で、たとえば平均値としてよい。またＤｓの結果値が複数の値であるならば、重み付き加算や選択などによって単一の値としてもよい。
【００６６】
ＴＣＰではクライアント１は接続後、送信データサイズをサーバ２から通知されたデータサイズよりも小さなサイズにして送信を開始し、徐々に通知データサイズまで大きくしていく。そのため接続開始後間もないクライアント１からのパケットサイズはサーバ２の負荷に関わらず小さい。
【００６７】
したがって、接続開始後間もないクライアント１の数が多ければ多数の小さな送信データのために式（１）のＬを小さく見積もってしまい負荷計測、高負荷検出の精度を落としてしまう可能性がある。
【００６８】
このことを概念的に示したものが図７である。同図（ａ）では、クライアント１ａはサーバ２に対して比較的大きなサイズのパケットデータＡを送信しているが、クライアント１ｂは、通信開始後間もないため、コマンドや応答信号など、比較的小さなサイズのパケットデータＢを送信している。このような小さなパケットデータはサーバの負荷検出に際して無視して問題ない。
【００６９】
そこで、本実施例３では、式（３）を用いることにより、接続開始後間もないクライアント１からのパケットを検出してこれを計測対象から外すことで、負荷計測と高負荷検出の精度を上げている。
【００７０】
サーバが高負荷になれば、接続している全クライアントからのデータサイズが小さくなるが、データ保持のためのバッファ５１の減少は比較的緩やかであるため、上のＬの減少も緩やかである。また、全クライアントが一斉に新たに接続を開始することは確率的に低いため、式（３）で十分である。
【００７１】
精度を上げるために式（３）に適用条件としてＤｓの下限値Ｄｓｍｉｎを設定し、ＤｓがＤｓｍｉｎ以下であれば式（３）を適用しない、つまりＳｉをＬに加算するようにしてもよい。
【００７２】
【実施例４】
本実施例４は、実施例１で説明した負荷検出において、通信回線上での輻輳等により生じるパケットの矛盾によりサーバ高負荷が誤検出されてしまうことを防止するための技術である。
【００７３】
本実施例４の装置構成は図１と同様である。ここで、クライアント１からサーバ２へのパケットは、クライアントアドレス（ＩＰ）とクライアントポート番号（ｓｐ）およびサーバポート番号（ｄｐ）の組（パケット識別子）およびシーケンス番号を接続開始から終了までがデータベース１２に保持される。このとき保持されるシーケンス番号は最大値（その時点での最終値）とする。
【００７４】
負荷検出装置４がクライアント１からのサーバ２へのパケットを受信したら、そのパケットからパケット識別子とシーケンス番号Ｐｉを求め、データベース１２に保持している同一のパケット識別子のシーケンス番号Ｐｊと比較する。
【００７５】
ここで、負荷検出部１０の判定により、Ｐｉ＜Ｐｊが成り立てば、通信回線３上でパケットの追い越しが起きたか、途中のパケットが消失したことによって再送されたことがわかる。
【００７６】
いずれにしても、このような状態でサーバ２が受け取るデータには途中で欠損が生じることになり、欠損個所以降のデータをサーバ２は処理することができず、欠損個所以降のデータはバッファ５１に残留することになる。これによりサーバ２は受信可能なデータサイズが小さくなるが、原因はサーバ負荷ではなくクライアント−サーバ間の経路での輻輳などである。このことを概念的に示したものが図８である。同図では、クライアント１からサーバ２に対してパケットデータ「１〜３」が送信されているが、これが経路輻輳等の要因でパケットデータ「２」のみが消失している。クライアント２は、受信したパケットデータ「１，３」をバッファ５１に格納する。ここで、クライアント１に対して応答通知（パケットデータ「２」の再送要求）を送るが、自身のバッファ内ではパケットデータ「２」が受信されていないため、既に到着しているパケットデータ「３」以降を処理できない状態となっている。
【００７７】
クライアント１はパケットデータ「２」に関する応答通知を重複して受信するとパケットデータ「２」の再送を行う。このようにして、パケットデータ「２〜５」が揃うことによりサーバ２は受信したこれらのパケットデータを処理できる状態となるが、ただちに処理には移行できないため、クライアント１に通知するバッファの空きサイズは本来のバッファの大きさＮよりもはるかに小さいｎとなる。
【００７８】
次に、クライアント１はサーバ２から通知されたサイズｎに格納可能なサイズのパケットデータ「６」を送信するが、実際にはこのパケットデータ「６」を受信する段階では、パケットデータ「１〜５」が処理されているため、バッファには広い空き空間が存在しており高負荷状態とはなっていない。
【００７９】
つまり、本実施例４では、このような図８に示した状態を高負荷と判定しないようにしている。以上のような理由により、Ｐｉ＜Ｐｊが成立したパケットＰｉは計測から除外する。あるいはある重み付けを行い計上してもよく、さらに負荷検出部１０において、Ｐｊ−Ｐｉをパケットサイズに加えて計上してもよい。
【００８０】
ここで、Ｐｊ−Ｐｉの算出は、サーバ２内のバッファ５１に残留しているデータの予測サイズをパケットサイズに加えることでデータの欠損が生じなかった場合のサーバ２からクライアント１へ通知される受信可能データサイズすなわち現パケットのサイズの予測を行うことを意味している。
【００８１】
【実施例５】
本実施例５は、サーバ２がクライアント１に送信するパケットデータを監視することでサーバ２の負荷を判定するものである。
【００８２】
本実施例５の負荷検出装置４は、サーバ２がクライアント１へ送るパケット中のウィンドウサイズ合計値Ｓｗと接続数Ｃとを監視する。ウィンドウサイズはサーバ２がクライアント１へ通知する受信可能なデータサイズである。
【００８３】
Ｃの値は、サーバ２からクライアント１への開始パケットＳＹＮを検出したときに１増やし、終了パケットＦＩＮを検出した場合１減じることで求める。ここで、ＳｗとＣの計測は実施例１と同様である。
【００８４】
サーバ２の負荷指標値Ｌ３は次式で求められる。Ｔは実施例１のＴと同様であるが必ずしも必須ではない。
Ｌ３＝（Ｓｗ／Ｃ）／Ｔ　・・・（４）
Ｌ３は接続当たりのウィンドウサイズを意味する。Ｌ３を用いてサーバ２の高負荷を検出する方法は次の通りである。
【００８５】
まず、サーバ２の処理能力限界予測値Ｌ３ｍａｘを更新する。Ｌ３ｍａｘは、０を初期値としＬ３がＬ３ｍａｘを超えた場合にＬ３ｍａｘの値をＬ３とすることで行う。
【００８６】
ここでもし、Ｌ３とＬ３ｍａｘとの間に以下の関係が成立すれば、サーバ２は高負荷であると判定する。
Ｌ３＜α３・Ｌ３ｍａｘ　ただし、０＜α３＜＝１　・・・（５）
上式（５）において、α３はあらかじめ設定された定数である。
【００８７】
サーバ２は、クライアント１に対して、自身が処理できるバッファ５１の空きサイズ、すなわちウィンドウサイズを通知しているが（図９（ａ））、ここで、サーバ２の負荷が上昇しクライアント１から送られたデータを完全に処理できなくなると、図９（ｂ）に示すように、サーバ２はクライアント１に対して以前より小さいウィンドウサイズｎを通知する（より具体的には次回受信可能なデータサイズである）。このように、サーバ２からクライアントに通知されるウィンドウサイズと時間の関係をグラフで示したものが図９（ｂ）である。
【００８８】
サーバ２の負荷は接続した全てのクライアントに影響するため、Ｌ３はサーバ負荷の上昇に伴い減少する。したがって、式（４）よりサーバ負荷を計測することができ、式（５）で高負荷を検出することができる。
【００８９】
【実施例６】
本実施例６は、本発明のサーバ割当装置をクライアント・サーバ間でＴＣＰパケットを中継する装置として実現した場合である。
【００９０】
図１０において、宛先変換・パケット中継手段１００２は、クライアント１から受信したパケット１０１０が接続要求を意味する開始パケットＳＹＮであると、サービスを割り当てるサーバを決定するためにサーバ割当手段１００１のサーバ選択手段１００７に対してサーバ割当指示１０２０を出力し、クライアント側処理能力計測手段１００８に計測指示１０２１を行う。
【００９１】
サーバ処理能力計測手段１００４は、各サーバの処理能力を計算し、結果データ１０１３をサーバ割当確率計算手段１００６に送出する。各サーバの処理能力は、サーバ２に対してｐｉｎｇなどを発信してそれに対する応答時間から算出することもできるし、ユーザがあらかじめ設定しておいてもよい。また、本実施例１〜５で述べたサーバ負荷検出装置を用いることもできる。
【００９２】
クライアント側処理能力計測手段１００８は、中継手段１００２からの指示があると、クライアント１および通信回線３の処理能力１０１８を計算し、サーバ割当確率修正情報生成手段１００９に報告する。ここで、クライアント側処理能力は、たとえばクライアントに対してｐｉｎｇなどを発信してその応答時間から求めることもできる。また、Ｂｐｒｏｂなどの帯域計測手法を用いたり、クライアント１の通信についての過去の記録、パケットから抽出されるウィンドウサイズやＴＴＬから求めることもできる。
【００９３】
サーバ割当修正情報生成手段１００９は、クライアント側処理能力１０１８から、サーバ割当確率分布に対する修正関数１０２２を生成する。図１１にサーバ割当確率分布ＰｓＤの例を、図１２下図に修正関数Ｍ（１０２２）の例を示す。
【００９４】
サーバ割当確率計算手段１００６は、サーバ割当確率分布ＰｓＤに修正関数Ｍ（１０２２）を適用してサーバ割当確率分布ＭＰｓＤを求める。確率分布ＰｓＤは、現時点で処理能力が高いサーバほど割当確率が高くなるような分布とする。たとえば、現時点での各サーバの処理能力値（後述）をｐ１，ｐ２，．．．ｐｎ（ｎはサーバの数）とすれば、サーバＳｉへの割当確率Ｐｉを次式で求めることもできる。
Ｐｉ＝ｐｉ／（ｐ１＋ｐ２＋．．．＋ｐｎ）　・・・（６）
確率修正関数Ｍは、図１２に示すように、クライアント側の処理能力が低いほどＰｓＤを一様分布に近づけるように修正する関数となる。たとえばｐｉｎｇなどによる応答時間Ｔｐｉｎｇをクライアント処理能力とすれば、各サーバ処理能力Ｐｉについて次の式から修正Ｐｉ’を求めてもよい。
Ｐｉ’＝Ｐｉ＋（Ｐａｖ−Ｐｉ）
＊２／π＊ａｒｃ＿ｔａｎ（α＊Ｔｐｉｎｇ）　・・・（７）
ここで、Ｐａｖは、Ｐｉの平均値であり、αはあらかじめ設定された０より大きい数である。また、ａｒｃ＿ｔａｎ（ｘ）はｔａｎ−１（ｘ）を意味する。
【００９５】
このＰｉ’から修正確率分布ＭＰｓＤを求める。サーバ割当確率計算手段１００６は、求めたＭＰｓＤをサーバ選択手段（１００７）に送る。サーバ選択手段（１００７）は、ＭＰｓＤから図１３に示すテーブルを生成し、０〜１の任意の値をとる一様乱数値を用いて実現する。同図のテーブルは、たとえばサーバ数の要素を持つ配列で実現し、各要素に０〜１までの範囲Ｐｉの最大値および最小値とサーバアドレスの組をおき、一様乱数値を含む範囲を持つ要素のサーバアドレスをサービス割当サーバアドレスとしてよい。ただし、各要素の範囲は他の要素の範囲と重複しないようにする。
【００９６】
ＰｓＤおよびＭＰｓＤの確率分布については、サーバ処理能力値ＰｉおよびＰｉ’を度数分布として実現してもよい。この場合、一様乱数値は０から全Ｐｉの合計値までの範囲をとるようにする。
【００９７】
サーバ選択手段１００７は、割当サーバを決定したらそのサーバアドレス１０１２を接続管理手段１００３に送る。接続管理手段１００３は、宛先変換・パケット中継部より受け取った開始パケットＳＹＮまたはその一部からクライアントアドレス（ＩＰ）、クライアントポート番号（ｓｐ）、宛先ポート番号（ｄｐ）の組情報を抽出し、組情報とサーバ割当手段１００１とから受け取ったサーバアドレスの対を記録する。ここで、記録には組情報をキーとするハッシュテーブルを用いてもよい。接続管理手段１００３は、宛先変換・パケット中継手段１００２へサーバアドレス１０１２を送る。
【００９８】
宛先変換・パケット中継手段１００２は、受信したクライアント１からのパケットの宛先を接続管理手段１００３から受け取ったサーバアドレス１０１２に変換してサーバ２に送信する。
【００９９】
サービス中、宛先変換・パケット中継手段１００２は、接続管理手段１００３にパケット１０１４を送り、この接続管理手段１００３は、パケット１０１４から求めた組情報より割当サーバアドレス１０１２を求めて宛先変換・パケット中継手段１００２に送る。開始パケットＳＹＮと同様に、宛先変換・パケット中継手段１００２は、受信したクライアント１からのパケットの宛先を、接続管理手段１００３から受けたサーバアドレス１０１２に変換してサーバ２へ送信する。
【０１００】
サービス終了時、すなわち終了パケットＦＩＮ受信時はサービス中と同様であるが、これを受信した接続管理手段１００３は、パケットに対応した組情報を破棄する。
【０１０１】
本実施例では、確率分布を用いてサービス割当を決定することで、処理能力が高いクライアントほど処理能力が高いサーバを割り当て易くなるので、応答時間などのサービス品質に対するサーバ処理能力の影響力の大小にしたがったサービス割り当てが可能になる。
【０１０２】
サーバ割当手段１００１のサーバ割当確率修正情報生成手段１００９は、過去のクライアント側処理能力の分布（図１４）を求め、新規接続クライアントのクライアント処理能力値の分布からの隔たりδを求め、新規接続クライアントのクライアント側処理能力値の分布からの隔たりδｃを求める。そしてδｃを修正関数Ｍに加味して確率分布を修正する（図１５）。ここで、たとえばδｃを以下の式で求める。
δｃ＝Ｐｃａ−ｐｃｉ
ここで、Ｐｃａは過去のクライアント側処理能力平均値であり、ｐｃｉは新規接続クライアントのクライアント側処理能力値である。δｃが小さいほどサーバ処理能力値ｐｉを全ｐｉの平均値に近づけ大きいほど平均値からの隔たりを大きくするように修正関数Ｍを定める。ただし、平均値からの隔たりを大きくする場合、ｐｉの修正値ｐｉ’が負数にならないようにする。たとえば、式（７）を以下のようにしてもよい。
ｐｉ’＝ｐｉ＋（Ｐａｖ−ｐｉ）＊β＊２／π＊ａｒｃ＿ｔａｎ（α＊δｃ＋γ）・・・（７’）
ここで、Ｐａｖはｐｉの平均値であり、α、γはあらかじめ設定された０より大きい数である。また、ａｒｃ＿ｔａｎ（ｘ）はｔａｎ−１（ｘ）を意味する。βは−１のとき、δｃ＜０であり、−ｄｐｊ／ｐｊのときδ＞＝０である。ｐｊはｐｉの最小値であり、ｄｐｊ＝Ｐａｖ−ｐｊとなる。
【０１０３】
このように、新規接続クライアントのクライアント側処理能力値の過去のクライアント側処理能力値の分布に対する隔たりにしたがってサーバ割当を行うことにより、各時点におけるクライアントに応じたサーバ割当が可能となる。たとえば、遠隔地からと近隣地からのクライアントの比率が時間帯によって変動する場合などに自動的に対応することが可能となる。
【０１０４】
さらに、本実施例では、サーバ割当手段（１００１）を複数配置して、それぞれをクライアントアドレス、クライアントポート番号、サービスポート番号などに応じて選択してもよい。
【０１０５】
サービス毎やクライアント毎に割当対象サーバ群を使い分けたり、サービス分配ポリシを切り替えたりすることが可能となり、多様なサービス割当を一つの装置で行うことができる。
本発明は、サーバの負荷計測および高負荷検出をクライアント・サーバ間の通信を監視して行うので、サーバへ手を加える必要がなく、サービス以外のパケットを出さない。したがって、いかなるサーバへも対処でき導入コストが低く負荷への干渉が一切ないという効果がある。また、プロトコルに依存しない指標で負荷計測、高負荷検出を行うので、いかなるサービスへも対処できるという効果もあり、サービス中の通信状態を監視するために外乱の影響が小さく精度が高いという効果もある。
さらに、サーバの提供するサービスを複数のサーバに分担させる際、サーバ構成の変更やサーバの状態の変化に対し、クライアントから見えるサービス品質に対するサーバ処理能力の影響の大小に応じて各サーバの負荷分担を自動的に、かつ、効率的に割り振るので、クライアントにとって、迅速なサービスの供給を受けることができる効果がある。
（その他）
上記実施形態は以下の発明を開示する。
（付記１）データをネットワークを介してクライアントから複数のサーバに転送する装置において、クライアントから送信されるデータを宛先を変えてサーバのいずれかに転送する中継手段と、データとサーバの対応関係を保持し中継手段へ宛先を指示する接続管理手段と、サーバ、クライアントおよび経路の処理能力を計測して求め、それに基づいたサービス分配率にしたがった関数を用いてデータとサーバの対応を決定し接続管理手段へ伝えるサーバ割当手段とからなるネットワークサーバ割当装置。
（付記２）前記付記１のサーバ割当手段において、サーバの処理能力に応じた確率分布に対し、クライアントおよび経路の処理能力が低いほど一様分布に近づける修正を行うことで求めた修正確率分布を分配率とするネットワークサーバ割当装置。
（付記３）前記付記１のサーバ割当手段において、現在サービス中のクライアントについてのクライアントおよび経路の処理能力の分布を求め、新規接続クライアントおよび経路の処理能力が分布に対して低いほど、サーバの処理能力に応じた確率分布を一様分布に近づけ、逆に高いほど各サーバの処理能力を際立たせる修正を行い修正確率分布を求め、それを分配率とするネットワークサーバ割当装置。
（付記４）前記付記１において、サーバ割当手段を複数有しており、当該サーバ割当手段は、クライアントやサービス毎に選択されるネットワークサーバ割当装置。
【０１０６】
【発明の効果】
以上説明してきたように、本発明は、サーバの性能・負荷およびクライアント側の性能・負荷を計測して求めた方法にしたがってサービスを分配するので、サーバの動的な負荷状況の変化に自動的に対応でき、さらに、クライアントから見えるサービス品質を維持するために必要なだけのサーバを割り当てることができ、サーバの利用効率を最大にするという効果がある。さらに、サーバ割当決定を関数を用いて行うので、サービス分配率を正確に分配に反映できる効果がある。
【０１０７】
また、クライアント側処理能力の高さとサーバの処理能力がサービス品質に与える影響の大小の比例関係をサービス分配率へ反映するので、サーバ処理能力の影響がサービス品質に与える影響が大きいクライアントへ処理能力が高いサーバを優先的に割り当てることができる効果がある。
また、サービス分配率を現在サービス中のクライアント側処理能力分布との関係で調節するので、遠地からと近地からのクライアントの比率が変化するような場合でも自動的に対応できる効果がある。
【図面の簡単な説明】
【図１】本発明の実施形態である負荷検出装置の接続構成を示す図
【図２】実施形態におけるサーバの高負荷判定を行うためのデータサイズと時間との関係を示したグラフ図
【図３】実施例１におけるパケット監視方法を示すフロー図（１）
【図４】実施例１におけるパケット監視方法を示すフロー図（２）
【図５】クライアントからサーバへの接続要求と、バッファの状態に応じた応答処理を説明するための図
【図６】クライアントからサーバへの接続要求の再送を説明するための図
【図７】負荷検出においてデータサイズによって対象データとするか否かの弁別を行う例を示す説明図
【図８】シーケンス番号に基づくサーバの処理を説明するための図
【図９】サーバからクライアントへの通信を監視する説明図
【図１０】実施形態のサーバ割当装置の構成を示すブロック図
【図１１】サーバ割当確率分布ＰｓＤを説明するためのグラフ図
【図１２】修正関数を説明するための図（１）
【図１３】実施形態において、サーバ選択手段により生成されるテーブルの一例を示す図
【図１４】過去のクライアント側処理能力値の分布例を示すグラフ図
【図１５】修正関数を説明するための図（２）
[0001]
[Technical field of the invention]
The present invention relates to server resource allocation.
[0002]
[Prior art]
In recent years, with the rapid spread of the Internet / intranet, there has been a demand for efficient use of network service servers and service stability. Optimal allocation of services to servers is essential for efficient use of servers and stable supply of services, for which it is necessary to accurately recognize the server load.
[0003]
The following are known as server load recognition methods in the prior art.
(1). Agent method
In this method, a program (agent) that measures the utilization of resources such as CPU and memory is placed on the server. However, the agent itself raises the load on the server, and when the agent communicates with the outside it consumes bandwidth, so the agent interferes with the accuracy of the load measurement. In addition, because the agent program must be installed on the server, the method lacks versatility and has a high introduction cost.
(2). Load measurement communication method
In this method, the server is pinged or sent simulated service traffic, and the server load is derived from the response time and similar values. However, the measurement traffic consumes bandwidth on the route, and the server also bears the load of responding, so the measurement interferes with the load being measured. In addition, the server must support the protocol used for the measurement, so the method lacks versatility.
(3). Measurement of VC count, connection time, connection frequency, connection error rate, response time
These methods run on a device that relays packets from clients to a server, and derive the server load from the number of VCs, connection time, connection frequency, connection error rate, and response time measured during relaying. However, because they are based on the behavior of the server at connection time, the error is large. Raising the accuracy requires a large number of connections, so these methods are not suited to services that carry a large amount of traffic over a small number of connections. Further, because relaying is mandatory, the throughput of the server is limited by the throughput of the measuring device.
(4). Hit count, hit rate calculation method
This method examines packets sent to WWW servers and the like, measures the number of accesses (hit count) and the access frequency (hit rate) for each item of content, such as a file, that is accessed, and derives the server load from the result. However, identifying the accessed file requires packet analysis for each protocol, so the method cannot cope with new services. In addition, the performance of the server must be known. The only ways to supply server performance in advance are catalog values or experience. Because server performance is strongly affected by the system configuration and the mode of operation, catalog values based on a standard configuration and mode are not accurate, and when performance is determined empirically, at least one failure cannot be avoided. As described above, none of these methods can detect the server load quickly and efficiently without imposing a load on the server.
[0004]
Furthermore, because the server load could not be recognized accurately, it was also difficult to allocate the services provided by the servers. Looking at service allocation alone, the following methods have been proposed.
(5). Round robin DNS system
In a DNS (Domain Name System) service, an entry table is set up so that one domain name corresponds to the IP addresses of a plurality of servers. When a client queries the server IP address, the servers are assigned cyclically (round-robin) according to the entry table, and the IP address of the assigned server is returned to the client. In this way, services are distributed to a plurality of servers.
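The cyclic assignment described above can be illustrated with a minimal Python sketch. The class name `RoundRobinDNS`, the domain name, and the addresses are hypothetical and chosen only for illustration; a real DNS server is not modeled here.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy resolver: one domain name maps to several server IP addresses."""

    def __init__(self, entries):
        # entries: dict mapping a domain name to its list of server addresses
        self._cycles = {name: cycle(ips) for name, ips in entries.items()}

    def resolve(self, name):
        # Answer each query with the next address in cyclic order.
        return next(self._cycles[name])


dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
answers = [dns.resolve("www.example.com") for _ in range(4)]
```

Each address is returned in turn regardless of how heavily the corresponding server is loaded, which is the limitation the following paragraph points out.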
[0005]
However, in the round-robin DNS method, services can only be distributed equally or in a simple ratio, and each server must provide service according to its assigned distribution ratio regardless of its capacity or dynamic load. As a result, the load differs from server to server and the system as a whole becomes inefficient. Further, because DNS query results are usually cached on the client side, a change in the ratio cannot be reflected immediately.
(6). Distribution method using hash table
In this method, entries in a hash table for managing connections are allocated to servers, and services are distributed to servers at a ratio corresponding to the number of entries to be allocated.
[0006]
In this method, when a client requests a service, an entry is first determined from the client address and the service, and the request is sent to the server to which that entry is assigned. Because services are allocated to each server in proportion to the number of entries assigned to it, efficient use of the servers is achieved by assigning many entries to high-performance servers and by reassigning entries from a heavily loaded server to a server whose load is not so high.
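A minimal Python sketch of this prior-art scheme follows. The class `HashTableDistributor`, the table size, and the use of `zlib.crc32` as the hash function are assumptions made for illustration; the source does not name a particular hash function.

```python
import zlib


class HashTableDistributor:
    """Toy model: hash table entries are owned by servers."""

    def __init__(self, shares, table_size=16):
        # shares: dict mapping a server to the number of entries it owns
        assert sum(shares.values()) == table_size
        self.table = [srv for srv, n in shares.items() for _ in range(n)]

    def entry_for(self, client_addr, service_port):
        # Hash the client address and service into an entry index.
        key = f"{client_addr}:{service_port}".encode()
        return zlib.crc32(key) % len(self.table)

    def server_for(self, client_addr, service_port):
        return self.table[self.entry_for(client_addr, service_port)]

    def reassign(self, from_server, to_server, count):
        # Move up to `count` entries away from a heavily loaded server.
        moved = 0
        for i, srv in enumerate(self.table):
            if moved == count:
                break
            if srv == from_server:
                self.table[i] = to_server
                moved += 1
        return moved


dist = HashTableDistributor({"A": 12, "B": 4})
first = dist.server_for("198.51.100.7", 80)
moved = dist.reassign("A", "B", 4)
```

The share of requests each server actually receives matches its share of entries only if the hash values are spread evenly over the table, which is the weakness discussed next.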
[0007]
However, in the distribution method using a hash table, a hash function that generates unbiased hash values is needed in order to reflect the ratio of hash entries correctly in the service distribution ratio. In general, it is impossible to find a hash function that generates unbiased hash values for every distribution of hash keys (client addresses, port numbers, and so on). Further, because the accuracy of the distribution ratio is proportional to the number of hash entries, a large number of entries is required to raise the accuracy, and a large amount of storage resources (buffers) is consumed. As a result, fewer storage resources (buffers) remain available for connection management, and a large volume of accesses cannot be handled.
(7). Distribution method according to server status and performance
In this method, the response time is measured by pinging the server, or packets from the client are relayed and the connection time, connection error rate, and so on are measured during relaying. From these measurements the level of load on each server or the performance ratio between servers is predicted, and an amount of service corresponding to the load or performance ratio is distributed.
[0008]
However, in this method, the utilization efficiency of the servers cannot be maximized, because services for all clients are distributed to the servers equally, regardless of the processing capacity of the client or the length and bandwidth of the route to the client.
[0009]
For example, for a client whose route is a bottleneck because it is long or has narrow bandwidth, or for a client with low processing capacity, differences in server performance (especially speed) and load do not show up in the service quality.
[0010]
Conversely, for a client that is nearby or connected over a high-speed line, or for a client with high processing capacity, differences in server performance and load greatly affect the service quality. Therefore, distributing services to all clients equally inevitably gives some clients more server resources than they need and others fewer than they need.
[Problems to be solved by the invention]
As described above, both the server load recognition method and the server assignment method in the related art have problems.
[0011]
The present invention has been proposed to solve the above problems. Its object is to recognize the server load quickly and efficiently without imposing a load on the server, to distribute services according to the dynamic load of the servers, to reflect a service distribution ratio obtained by setting or adjustment accurately in the actual distribution, and to maximize the utilization efficiency of the servers by distributing services while estimating the server resources required for each client.
[0012]
[Means for Solving the Problems]
The present invention employs the following means in order to solve the above problems.
[0013]
A first means of the present invention is a network server allocating device that transfers data from a client to a plurality of servers via a network, comprising: a relay means that changes the destination of data sent from the client and transfers it to one of the servers; a connection management means that holds the correspondence between the data and the server and instructs the relay means of the destination; and a server allocating means that measures the processing capacity of the servers, the client, and the route, determines the correspondence between the data and the server using a function that follows a service distribution ratio based on those measurements, and passes that correspondence to the connection management means.
[0014]
Because services are distributed by a method derived from measurements of the performance and load of the servers and of the client side, the device adapts automatically to changes in the dynamic load of the servers. It can also allocate only as many servers as are needed to maintain the service quality seen by the client, which maximizes the utilization efficiency of the servers. Further, because the server allocation is decided using a function, the service distribution ratio is reflected accurately in the actual distribution. A single connection management means is sufficient to obtain these effects.
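As a rough illustration of deciding the allocation with a function, the following Python sketch follows the approach given for Example 6 in the description: allocation probabilities are taken as Pi = pi / (p1 + ... + pn), each server owns a non-overlapping range of [0, 1), and a uniform random value selects the server. The function and class names, and the capacity values, are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def allocation_probabilities(capacities):
    # Pi = pi / (p1 + p2 + ... + pn)
    total = sum(capacities.values())
    return {srv: p / total for srv, p in capacities.items()}


def select_server(probabilities, u):
    """Return the server whose range of [0, 1) contains u."""
    servers = list(probabilities)
    upper_bounds = list(accumulate(probabilities[s] for s in servers))
    index = bisect_right(upper_bounds, u)
    return servers[min(index, len(servers) - 1)]  # guard against rounding


class ConnectionManager:
    """Holds the (client IP, client port, server port) -> server mapping."""

    def __init__(self):
        self.table = {}

    def assign(self, client_ip, client_port, server_port, server):
        self.table[(client_ip, client_port, server_port)] = server

    def lookup(self, client_ip, client_port, server_port):
        return self.table[(client_ip, client_port, server_port)]

    def close(self, client_ip, client_port, server_port):
        self.table.pop((client_ip, client_port, server_port), None)


probs = allocation_probabilities({"s1": 3.0, "s2": 1.0})
manager = ConnectionManager()
manager.assign("198.51.100.7", 40000, 80, select_server(probs, random.random()))
```

Because the choice depends only on the probabilities and a random value, the size of the connection table does not affect how precisely the distribution ratio is honored.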
[0015]
A second means of the present invention is a network server allocating device in which the server allocating means of the first means takes a probability distribution based on the processing capacity of the servers, corrects it so that it approaches a uniform distribution as the processing capacity of the client and the route becomes lower, and uses the resulting corrected probability distribution as the distribution ratio.
[0016]
The higher the processing capacity of the client and the route, the more strongly the processing capacity of the server affects the service quality, and this relationship is reflected in the service distribution ratio. As a result, servers with high processing capacity can be preferentially assigned to clients whose service quality is strongly influenced by server processing capacity.
[0017]
According to a third means of the present invention, in the server allocating means of the first means, the distribution of the processing capacity of the client and the route is obtained for the clients currently in service. The lower the processing capacity of a newly connected client and its route falls relative to that distribution, the closer the probability distribution according to the processing capacity of the servers is brought to a uniform distribution; the higher it is, the more the correction emphasizes the differences in processing capacity among the servers. The corrected probability distribution thus obtained is used as the distribution ratio.
[0018]
Since the service distribution rate is adjusted in relation to the clients currently being served and the processing capacity distribution of the route, there is an effect that it is possible to automatically cope with the case where the ratio of clients from a distant place and a near place changes.
[0019]
According to a fourth means of the present invention, in the first means, the server allocating means is divided into a plurality of units so that the scale of the connection managing means does not depend on the service distribution.
[0020]
This has the effect that a single device can perform various server assignments, such as using a different group of servers for each service or client, or switching the service distribution policy.
A fifth means of the present invention is a network server load detection method comprising: a step of monitoring communication from a client to a server and measuring the communication data size per connection as the load on the server; a step of detecting changes in the communication data size per connection and recording its maximum value; and a step of determining that the server is under high load if the current communication data size per connection has decreased relative to the maximum value.
[0021]
Here, in TCP and the like, the server equally allocates to each connection a storage resource (buffer) for holding packet data sent from the client. The server notifies the client of the data size that the storage resource (buffer) can hold at the next reception, and the client sends data of the notified size to the server.
[0022]
Therefore, when the load of the server becomes high, the data sent from the client cannot be processed immediately, and all or a part of the data remains in the storage resource (buffer) of the server. As a result, the server must notify the client of a size reduced by the amount of data remaining in the buffer.
[0023]
Therefore, by detecting the data size per connection on the communication line, a high-load state of the server can be detected.
According to a sixth means of the present invention, in the fifth means, a minimum monitored communication count and a minimum monitoring time are used, and the number of connections and the communication data size are measured until the number of monitored communications reaches the minimum count and the measurement time reaches the minimum monitoring time.
[0024]
According to a seventh means of the present invention, in the fifth means, the communication of the connection start and the connection end is recognized, and the communication data size of the connection start and the connection end is excluded from the load detection targets.
[0025]
Since the communication data at the start and end of the connection is small and does not depend on the load on the server, excluding it from the total communication data size calculation has the effect of increasing the accuracy of load measurement and high load judgment.
[0026]
According to an eighth means of the present invention, the fifth means further comprises: a step of holding the information of each connection-start communication until the connection is established or ended; and a step of detecting, based on the held information, connection-start communications that are reconnection attempts made by the client after it judges that a connection has failed. The ratio of reconnection communications to the number of connection-start communications is taken as the load on the server, and the server is determined to be under high load if the ratio is high.
[0027]
When the load on the server is heavy, the server does not return a response notice to the connection request from the client. On the other hand, the client resends the connection request. Therefore, high load on the server can be determined by detecting retransmission of a client connection request on the communication line.
[0028]
According to a ninth means of the present invention, the fifth means further comprises: a step of obtaining the distribution of communication data sizes from the client; and a step of identifying, from the distribution, extremely small communication data that is unrelated to the load on the server and excluding that data from the load determination.
[0029]
Excluding extremely small communication data unrelated to the server load from the measurement improves the accuracy of load measurement and high load detection.
According to a tenth means of the present invention, the fifth means further comprises: a step of obtaining at least a sequence number from a communication from the client to the server; a step of holding the maximum value of the sequence number from the start to the end of the connection; and a step of comparing the sequence number obtained from a communication with the held sequence number and excluding the communication from the measurement if the obtained sequence number is smaller than the held one.
[0030]
The sequence numbers are usually in ascending order, but if the order of communication is broken or lost due to congestion on a communication line or the like, the order will not be ascending. Since the server cannot process data after the data that has not arrived, the receivable data size of the server is reduced regardless of the server load, and the communication data size of the client is also reduced. Avoiding the influence of the route by the above method has an effect of increasing the accuracy of server load measurement and high load detection.
[0031]
According to an eleventh means of the present invention, in the fifth means, if the sequence number obtained from the communication is smaller than the held sequence number, either the communication data is counted after a weighting process is applied, or the communication data size that would have occurred had there been no problem on the route is predicted from the two sequence numbers and the predicted size is included in the load detection.
[0032]
A twelfth means of the present invention is a network server load detection method that monitors communication from the server to the client, measures the receivable data size notified by the server to the client and the number of connections, obtains the receivable data size per connection as the server load, stores the maximum value of the receivable data size per connection, and determines that the server is under high load when the current receivable data size per connection has decreased relative to that maximum value.
[0033]
A thirteenth means of the present invention is a network server load detection device that monitors communication from a client to a server and detects the load state of the server, comprising: data size calculation means for calculating the communication data size per connection; storage means for detecting changes in the communication data size per connection and storing its maximum value; and load detection means for detecting a high load on the server when the current communication data size per connection, relative to the maximum value, becomes equal to or smaller than a certain value.
[0034]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiment 1
FIG. 1 illustrates a functional configuration of the server load detection device 4 according to the first embodiment. As shown in the figure, the server load detecting device 4 is connected to the communication line 3 between the client 1 and the server 2, and can be specifically mounted on a router or the like.
[0035]
As shown in FIG. 1, the server load detection device 4 has a communication data capturing unit 5 that captures packet data (TCP packets: Transmission Control Protocol packets) transmitted through the communication line 3. The communication data capturing unit 5 is connected to a connection number detection unit 6, a packet number calculation unit 8, and a packet size calculation unit 7.
[0036]
The connection number detecting unit 6 has a function of detecting the number of connections C per unit time from the TCP packet captured by the communication data capturing unit 5. The connection number detection unit 6 sets +1 when detecting a SYN packet indicating the first packet, and sets −1 when detecting a FIN packet indicating the last packet. As a result, the number of clients currently connected to the server can be detected.
[0037]
The packet number calculation unit 8 has a function of counting the number N of TCP packets per unit time captured by the communication data capturing unit 5, and the packet size calculation unit 7 has a function of calculating the total size S of the TCP packets captured per unit time.
[0038]
The calculation / count data of each of these units is sent to the load detection unit 10, and the load is determined based on a predetermined calculation process described later. The total packet size S calculated by the packet size calculation unit 7 is set to 0 at the start of measurement, and when a packet arrives, it is sequentially increased by the packet size. Note that the SYN and FIN packets are smaller in size than the data packets and have less influence on the server load, and may be ignored.
[0039]
The packet number N counted by the packet number calculation unit 8 is set to 0 at the start of measurement, and is set to +1 each time a packet arrives. Here, the count of the SYN and FIN packets may be ignored for the above-described reason.
[0040]
The count by the packet number calculation unit 8 continues until N exceeds a certain value Nmin. However, if the time elapsed since the start of measurement is still shorter than the preset time Tmin when N exceeds Nmin, counting continues until Tmin has elapsed.
[0041]
Here, Nmin and Tmin are set in the packet number calculation unit 8 in advance. Using Nmin and Tmin together in this way reduces the calculation error caused by too few packet samples and avoids the overflow caused by too many samples, so the load detection accuracy is improved.
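The measurement window governed by Nmin and Tmin can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `MeasurementWindow`, its field names, and the use of caller-supplied timestamps are assumptions, and the sketch follows the option of ignoring SYN and FIN packets in N and S.

```python
from dataclasses import dataclass


@dataclass
class MeasurementWindow:
    """Hypothetical accumulator for one measurement period."""
    n_min: int          # minimum packet count (Nmin)
    t_min: float        # minimum measurement time in seconds (Tmin)
    start: float = 0.0  # timestamp at which the measurement started
    n: int = 0          # packet count N
    s: int = 0          # total packet size S in bytes
    c: int = 0          # current connection count C

    def add(self, now: float, size: int, syn: bool = False, fin: bool = False) -> bool:
        """Record one captured packet; return True once the window is complete."""
        if syn:
            self.c += 1          # connection opened
        elif fin:
            self.c -= 1          # connection closed
        else:
            # Assumption: SYN and FIN packets are left out of N and S.
            self.n += 1
            self.s += size
        # The window ends only when both the count and the time condition hold.
        return self.n >= self.n_min and (now - self.start) >= self.t_min
```

With `n_min=3` and `t_min=1.0`, three data packets arriving within 0.3 s do not end the window; it ends on the first packet observed after 1.0 s.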
[0042]
The load detection in the load detection unit 10 is performed by performing the following arithmetic processing. First, upon receiving the number of connections C from the number-of-connections detecting unit 6 and the packet size S from the packet-size calculating unit 7, the load detecting unit 10 obtains a server load index value L based on the following equation.
[0043]
Here, T is the measurement time measured by the timer 11. If the count N has already exceeded Nmin when Tmin elapses, T = Tmin.
[0044]
L = (S / C) / T
Here, L means the data transfer amount per connection per unit time. Using this L, the load on the server 2 can be detected.
[0045]
Further, the load detecting unit 10 updates the processing capacity limit predicted value Lmax of the server 2. Here, Lmax is set to 0 as an initial value, and when L exceeds Lmax, the value of Lmax is set to L. Here, if the following relationship is established between L and Lmax, it can be determined that the server has a high load.
L < α · Lmax where 0 < α <= 1 (1)
In the above equation (1), α is a preset constant. FIG. 3 is a flowchart illustrating load detection in the load detection unit 10 described above.
[0046]
First, when the measurement is started, the count number N and the server load index value L are reset, and the timer 11 is started (step 301). Then, when packet reception is started via the communication data acquisition unit 5 (302), it is determined whether the packet is the connection start packet SYN (303) or the connection end packet FIN (305). Here, in the case of the connection start packet SYN, the variable V is incremented by 1 (304). If it is the connection end packet FIN, the variable V is decremented by one (306).
[0047]
Next, every time a new packet is received, N is incremented by 1 and the server load index value L is calculated by the load detection unit 10 (307). This calculation is performed based on the above-described calculation formula. If the server load index value L is smaller than αLmax using the above-described equation (1), it is determined that the server is in a high load state.
[0048]
Such a high load determination ends when the timer value becomes equal to or more than the preset Tmin and the packet count number N becomes equal to or more than the preset Nmin (308).
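The load index L = (S / C) / T and the test of equation (1) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `LoadDetector` and the method `update` are hypothetical, and the caller is assumed to pass C > 0 and T > 0.

```python
from typing import Tuple


class LoadDetector:
    """Hypothetical tracker for the load index L and its running maximum Lmax."""

    def __init__(self, alpha: float):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must satisfy 0 < alpha <= 1")
        self.alpha = alpha
        self.l_max = 0.0  # Lmax starts at 0

    def update(self, s: float, c: int, t: float) -> Tuple[float, bool]:
        """Return (L, high_load) for one measurement of S, C and T."""
        load = (s / c) / t                     # L = (S / C) / T
        self.l_max = max(self.l_max, load)     # Lmax follows the largest L seen
        high = load < self.alpha * self.l_max  # equation (1)
        return load, high
```

For example, with `alpha=0.5` and a peak of 1000 bytes per connection per second, a later measurement of 800 is not flagged, while 400 is.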
[0049]
Here, in TCP, the server 2 equally allocates to each connection a storage resource (buffer) for holding packet data sent from the client 1. The server 2 notifies the client 1 of the data size that can be held in the storage resource (buffer) at the next reception, and the client 1 sends data of the size notified from the server 2 to the server 2 through the communication line 3.
[0050]
Therefore, if the server 2 is overloaded, the data sent from the client 1 cannot be processed immediately, so that all or a part of the data remains in the storage resources (buffer) of the server 2 and as a result, the server 2 Must notify the client of a smaller size by the amount of residual data in the storage resource (buffer).
[0051]
Here, since TCP is a protocol designed to exchange data in sizes as large as possible, the data size transmitted from the client 1 to the server 2 reaches its maximum before the server 2 comes under high load; thereafter, as the load on the server 2 increases, the size of data transmitted on the communication line 3 decreases. In the present embodiment, as shown in FIG. 2, attention is paid to this reduction in data size to detect the high-load state of the server.
[0052]
In the present embodiment, the maximum value of the data size transmitted over the communication line 3 before the server 2 comes under high load is held in the database 12 as Lmax. Then, as shown in equation (1), L is compared with a threshold obtained by multiplying Lmax by a constant α, and when L falls below this threshold, the server 2 is determined to be in a high-load state.
[0053]
As described above, in this embodiment the size per connection is checked. This prevents erroneous detection of a high load when the total data size falls merely because the number of connections itself has decreased.
[0054]
The flow chart of FIG. 4 is substantially the same as the flow chart described with reference to FIG. 3, but shows a procedure for determining a high load without considering the communication start packet SYN and the communication end packet FIN.
[0055]
Embodiment 2
The second embodiment is a high load detection method using retransmission processing from the client 1 to the server 2.
[0056]
The configuration of the apparatus used in the second embodiment is substantially the same as that shown in FIG. In the second embodiment, information of each start packet SYN is recorded in the database 12 (see FIGS. 6A to 6C). The information of each start packet SYN is identified by a set of a client address (IP), a client port number (sp), and a server port number (dp).
[0057]
In the TCP, when the server 2 receives the start packet SYN from the client 1, it returns a SYN reception confirmation packet to the client 1. Here, if the client 1 cannot receive the SYN reception confirmation packet from the server 2 even after a certain period of time, it retransmits the start packet SYN to the server 2 again.
[0058]
FIG. 5 illustrates this concept. In part (a) of the figure, a connection request (start packet SYN) is first transmitted from the client 1a to the server 2, and a connection request (start packet SYN) is also transmitted to the server 2 from another client 1b. When there is room in the buffer 51 of the server 2, that is, when the load is low, the server 2 transmits a response notification (acknowledgment packet) to the clients 1a and 1b. However, if the buffer 51 of the server 2 has no room, as shown in part (b), the server 2 cannot respond to the connection request (start packet SYN) from the client 1. Therefore, as shown in part (c), the client 1 retransmits the connection request to the server 2 when it does not receive a response notification (acknowledgment packet) from the server 2 within a predetermined time. In the second embodiment, the connection number detection unit 6 counts the number Cs of start packets SYN, detects retransmissions of the start packet SYN from the client 1, and calculates the ratio Rs of the number of retransmitted start packets SYN to Cs. This ratio is used as the server load index value Crs.
[0059]
Here, a start packet SYN can be judged to be a retransmission if the SYN information extracted from it has already been recorded in the database 12. FIG. 6 shows this. In part (a) of the figure, SYN1 (IP1, sp1, dp1), SYN2 (IP2, sp2, dp2), and SYN3 (IP3, sp3, dp3) are recorded as SYN information in the database 12 of the load detection device 4. At this time, a connection request (start packet SYN4) is transmitted from the client 1 to the server 2 via the communication line 3. If the connection request is not yet stored in the database 12, that is, if it is a first connection request, the load detection device 4 records it (SYN4: IP4, sp4, dp4) in the database 12 (FIG. 6B). When the server 2 does not return a response to the connection request (SYN4), the client 1 retransmits the connection request (SYN4) to the server 2. The load detection device 4 captures the connection request (SYN4) with the communication data capturing unit 5, and the load detection unit 10 searches the database 12 and finds that the connection request is already stored. As a result, the connection request (SYN4) is determined to be a reconnection request.
[0060]
A specific measuring method in the load detecting unit 10 follows the counting and detecting method of the number of connections C and the packet size S described in the first embodiment. Here, if the following equation (2) is satisfied by the calculated ratio Rs of the number of retransmissions of the start packet SYN, that is, Crs, the server 2 is determined to be under high load.
Crs > β where 0 < β < 1 (2)
In the above equation (2), β is a preset constant.
[0061]
The server 2 allocates a buffer 51 for holding data from the client 1 to each connection, but when the buffers 51 available for allocation are exhausted, it neither establishes the connection nor returns a response notification (SYN reception confirmation packet) to the client 1. The ratio of retransmitted start packets SYN from the client 1 therefore increases, so a high load on the server can be detected from equation (2). FIG. 6D is a graph showing the relationship between the retransmission rate (the number of retransmissions / the number of communications) and the server load.
[0062]
Note that the constant β in the above equation (2) is for preventing erroneous detection due to disturbance or an instantaneous high load state. The momentary high load state can be ignored because the probability of occurrence is small and does not last long.
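The SYN bookkeeping of the second embodiment can be sketched as follows. This is a minimal sketch, not the device itself: an in-memory set stands in for the database 12, the class and method names are hypothetical, and it is an assumption of this sketch that retransmitted SYN packets are also counted in Cs.

```python
class SynRetransmissionMonitor:
    """Hypothetical tracker of start packets keyed by (IP, sp, dp)."""

    def __init__(self, beta: float):
        self.beta = beta       # threshold of equation (2), 0 < beta < 1
        self.pending = set()   # SYN information held until establishment or end
        self.cs = 0            # number of start packets observed (Cs)
        self.retransmits = 0   # number of start packets judged to be retransmissions

    def on_syn(self, ip: str, sp: int, dp: int) -> bool:
        """Record a start packet; return True if it is a retransmission."""
        key = (ip, sp, dp)
        self.cs += 1
        if key in self.pending:
            self.retransmits += 1
            return True
        self.pending.add(key)
        return False

    def on_established_or_closed(self, ip: str, sp: int, dp: int) -> None:
        """Drop the held SYN information once the connection is set up or ended."""
        self.pending.discard((ip, sp, dp))

    def ratio(self) -> float:
        """Crs: retransmitted start packets divided by Cs."""
        return self.retransmits / self.cs if self.cs else 0.0

    def is_high_load(self) -> bool:
        return self.ratio() > self.beta  # equation (2)
```

A set (or hash table) keyed by the three-field tuple keeps the retransmission check at constant time per packet, which matters for a device sitting on the communication line.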
[0063]
Embodiment 3
The third embodiment is a technique for discriminating a measurement target according to a communication data size when detecting a load. Note that the device configuration of the third embodiment is the same as that of FIG. 1, and will be described with reference to FIG. 1.
[0064]
In the third embodiment, when the following relationship is established between the packet size Si from the client 1 and Ds, the load detection unit 10 detects the load without adding Si to the total packet size L.
Si < γ · Ds (3)
where 0 < γ < 1 and Ds = f(S1, S2, ..., Si−1).
[0065]
Here, γ is a preset constant. Ds is a function for obtaining a distribution index of the measured packet size, and may be, for example, an average value. If the result value of Ds is a plurality of values, a single value may be obtained by weighted addition or selection.
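The filter of Expression (3) can be sketched as follows, taking the average as the function f, which the text names as one possible choice. The function name `filter_small_packets` is hypothetical, and it is an assumption of this sketch that Ds is computed over all previously observed sizes, including those that were excluded.

```python
def filter_small_packets(sizes, gamma):
    """Return the packet sizes that are kept for load measurement.

    A size Si is dropped when Si < gamma * Ds, where Ds is the mean of the
    sizes S1 .. S(i-1) observed before it (assumption: f is the average).
    """
    kept, seen = [], []
    for si in sizes:
        ds = sum(seen) / len(seen) if seen else 0.0
        if not si < gamma * ds:   # Expression (3) does not hold: keep the packet
            kept.append(si)
        seen.append(si)
    return kept
```

For the sizes 1000, 1200, 100, 1100 with `gamma=0.5`, only the 100-byte packet is dropped, because it is below half of the running mean of 1100.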
[0066]
In TCP, after connection, the client 1 starts transmission by setting the transmission data size to a size smaller than the data size notified from the server 2, and gradually increases the transmission data size to the notification data size. Therefore, the packet size from the client 1 shortly after the connection is started is small regardless of the load on the server 2.
[0067]
Therefore, if many clients 1 have only just started their connections, L of Expression (1) may be underestimated because of the many small transmissions, and the accuracy of load measurement and high load detection may be reduced.
[0068]
FIG. 7 conceptually illustrates this. In part (a) of the figure, the client 1a transmits packet data A of a relatively large size to the server 2, whereas the client 1b, for which only a short time has passed since the start of communication, transmits packet data B of a small size. Such small packet data can be ignored when detecting the load on the server.
[0069]
Therefore, in the third embodiment, Expression (3) is used to detect packets from a client 1 shortly after its connection has started and to remove them from the measurement target, which raises the accuracy of load measurement and high load detection.
[0070]
When the load of the server becomes high, the data size from all the connected clients becomes small, but the free space of the buffer 51 that holds the data decreases relatively slowly, so the decrease in L is also slow. In addition, since it is improbable that all clients start a new connection at the same time, equation (3) is sufficient.
[0071]
In order to increase the accuracy, the lower limit value Dsmin of Ds may be set as an application condition in Expression (3), and if Ds is equal to or less than Dsmin, Expression (3) may not be applied, that is, Si may be added to L.
[0072]
Embodiment 4
The fourth embodiment is a technique for preventing a server high load from being erroneously detected due to packet inconsistency caused by congestion or the like on a communication line in the load detection described in the first embodiment.
[0073]
The device configuration of the fourth embodiment is the same as that of FIG. Here, for packets from the client 1 to the server 2, the database 12 holds, from the start to the end of each connection, a set (packet identifier) of a client address (IP), a client port number (sp), and a server port number (dp), together with a sequence number. The sequence number held at any time is the maximum value so far (the latest value at that time).
[0074]
When the load detection device 4 receives a packet from the client 1 to the server 2, it obtains a packet identifier and a sequence number Pi from the packet, and compares it with the sequence number Pj of the same packet identifier stored in the database 12.
[0075]
Here, if the load detection unit 10 determines that Pi < Pj, the packet either was overtaken by a later packet on the communication line 3 or is a retransmission of a packet lost in transit.
[0076]
In either case, part of the data bound for the server 2 has been lost on the way; the server 2 cannot process the data that follows the lost portion, so that data remains in the buffer 51. As a result, the receivable data size of the server 2 is reduced, but the cause is not the server load but congestion on the path between the client and the server. FIG. 8 conceptually illustrates this. In the figure, packet data “1” to “3” are transmitted from the client 1 to the server 2, but only packet data “2” is lost due to factors such as path congestion. The server 2 stores the received packet data “1, 3” in the buffer 51. The server 2 sends a response notification (a retransmission request for packet data “2”) to the client 1, but because packet data “2” has not yet arrived in its buffer, it cannot process the already-arrived packet data “3” and subsequent data.
[0077]
When the client 1 receives duplicate response notifications concerning packet data “2”, it retransmits packet data “2”. When packet data “2 to 5” are thus all present, the server 2 can process the received packet data. However, because the processing cannot be completed immediately, the free buffer space notified to the client 1 becomes n, which is much smaller than the original buffer size N.
[0078]
Next, the client 1 transmits packet data “6” of a size that fits in the size n notified from the server 2. By the time the server 2 receives packet data “6”, however, packet data “1” to “5” have actually been processed, so a large free space exists in the buffer and the server is not in a high-load state.
[0079]
That is, in the fourth embodiment, the state shown in FIG. 8 is not determined to be a high load. For the above reasons, a packet Pi for which Pi < Pj holds is excluded from the measurement. Alternatively, the packet may be counted after a certain weighting is applied, or the load detection unit 10 may add Pj-Pi to the packet size.
[0080]
Here, adding Pj-Pi adds to the packet size the predicted size of the data remaining in the buffer 51 of the server 2. This amounts to predicting the receivable data size that the server 2 would have notified to the client 1 had no data loss occurred, that is, the size the current packet would have had.
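The exclusion rule of the fourth embodiment can be sketched as follows. This is a minimal sketch under stated assumptions: a dictionary stands in for the database 12, the class name `SequenceFilter` is hypothetical, and sequence-number wraparound is ignored for simplicity.

```python
class SequenceFilter:
    """Hypothetical per-connection tracker of the largest sequence number seen."""

    def __init__(self):
        self.max_seq = {}  # (IP, sp, dp) -> largest sequence number so far (Pj)

    def accept(self, ip: str, sp: int, dp: int, seq: int) -> bool:
        """Return False when Pi < Pj, i.e. the packet is excluded from measurement."""
        key = (ip, sp, dp)
        pj = self.max_seq.get(key)
        if pj is not None and seq < pj:
            return False               # overtaken or retransmitted packet
        self.max_seq[key] = seq        # new maximum for this connection
        return True

    def close(self, ip: str, sp: int, dp: int) -> None:
        """Forget the connection when it ends."""
        self.max_seq.pop((ip, sp, dp), None)
```

A caller that prefers the alternative described in the text could, instead of discarding the packet, add `pj - seq` to the packet size before counting it.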
[0081]
Embodiment 5
In the fifth embodiment, the load on the server 2 is determined by monitoring the packet data transmitted from the server 2 to the client 1.
[0082]
The load detection device 4 according to the fifth embodiment monitors the total window size Sw and the number of connections C in the packet sent from the server 2 to the client 1. The window size is a receivable data size that the server 2 notifies the client 1.
[0083]
The value of C is obtained by incrementing it by 1 when a start packet SYN from the server 2 to the client 1 is detected and decrementing it by 1 when an end packet FIN is detected. Here, the measurement of Sw and C is the same as in the first embodiment.
[0084]
The load index value L3 of the server 2 is obtained by the following equation. T is the same as T in the first embodiment, but is not necessarily required.
L3 = (Sw / C) / T (4)
L3 means the window size per connection. A method for detecting a high load on the server 2 using L3 is as follows.
[0085]
First, the processing capacity limit predicted value L3max of the server 2 is updated. L3max is set by setting 0 as an initial value and setting L3max to L3 when L3 exceeds L3max.
[0086]
Here, if the following relationship holds between L3 and L3max, the server 2 is determined to be under high load.
L3 < α3 · L3max where 0 < α3 <= 1 (5)
In the above equation (5), α3 is a preset constant.
[0087]
The server 2 notifies the client 1 of the free size of the buffer 51 that the server 2 can process, that is, the window size (FIG. 9A). When the server 2 cannot completely process the transmitted data, as shown in FIG. 9B, it notifies the client 1 of a smaller window size n (that is, the data size that can be received next time). FIG. 9B is a graph showing the relationship between the window size and time notified from the server 2 to the client.
[0088]
Since the load on the server 2 affects all connected clients, L3 decreases as the server load increases. Therefore, the server load can be measured from Expression (4), and a high load can be detected from Expression (5).
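Equations (4) and (5) can be sketched as follows for a series of measurements. This is an illustrative sketch only: the function name `window_load_states` is hypothetical, each sample is assumed to be a pair (Sw, C) with C > 0, and T is omitted, which the text permits.

```python
def window_load_states(samples, alpha3):
    """For each (Sw, C) sample, report whether equation (5) flags a high load."""
    l3_max = 0.0
    states = []
    for sw, c in samples:
        l3 = sw / c                    # window size per connection, T omitted
        l3_max = max(l3_max, l3)       # L3max follows the largest L3 seen
        states.append(l3 < alpha3 * l3_max)
    return states
```

With `alpha3=0.5`, a drop from 65535 to 60000 bytes per connection is not flagged, while a drop to 20000 bytes is.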
[0089]
Embodiment 6
In the sixth embodiment, the server allocation device of the present invention is realized as a device that relays a TCP packet between a client and a server.
[0090]
In FIG. 10, if the packet 1010 received from the client 1 is a start packet SYN indicating a connection request, the destination conversion / packet relay means 1002 outputs a server allocation instruction 1020 to the server selection means 1007 so that the server allocation means 1001 determines the server to which the service is to be allocated, and issues a measurement instruction 1021 to the client-side processing capacity measuring means 1008.
[0091]
The server processing capacity measuring means 1004 calculates the processing capacity of each server, and sends the result data 1013 to the server allocation probability calculating means 1006. The processing capacity of each server can be calculated from the response time to the server 2 by sending a ping or the like to the server 2 or can be set in advance by the user. Further, the server load detection devices described in the first to fifth embodiments can also be used.
[0092]
Upon receiving an instruction from the relay unit 1002, the client-side processing capability measuring unit 1008 calculates the processing capability 1018 of the client 1 and the communication line 3, and reports the same to the server allocation probability correction information generating unit 1009. Here, the client-side processing capability can be obtained from, for example, a ping to the client and the response time. In addition, a bandwidth measurement method such as Bprob can be used, or past records of communication of the client 1 can be obtained from a window size or TTL extracted from a packet.
[0093]
The server allocation correction information generation unit 1009 generates a correction function 1022 for the server allocation probability distribution from the client-side processing capacity 1018. FIG. 11 shows an example of the server allocation probability distribution PsD, and the lower part of FIG. 12 shows an example of the correction function M (1022).
[0094]
The server allocation probability calculation means 1006 calculates the server allocation probability distribution MPsD by applying the correction function M (1022) to the server allocation probability distribution PsD. The probability distribution PsD is a distribution in which a server having a higher processing capacity at the current time has a higher allocation probability. For example, assuming that the current processing capacity values (described later) of the servers are p1, p2, ..., pn (n is the number of servers), the allocation probability Pi to the server Si can be obtained by the following equation.
Pi = pi / (p1 + p2 + ... + pn) (6)
As shown in FIG. 12, the probability correction function M is a function that corrects PsD to be closer to a uniform distribution as the processing capability on the client side is lower. For example, if the response time Tping due to ping or the like is used as the client processing capacity, a correction Pi ′ may be obtained for each server processing capacity Pi from the following equation.
Pi′ = Pi + (Pav − Pi) * 2 / π * arc_tan(α * Tping) (7)
Here, Pav is the average value of Pi, and α is a preset number larger than 0. Also, arc_tan(x) means tan⁻¹(x).
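Equations (6) and (7) can be illustrated with a minimal Python sketch. The function names, the sample capacity values, and the value of α are assumptions chosen only for illustration and do not appear in the embodiment.

```python
import math


def allocation_probabilities(capacities):
    """Equation (6): Pi = pi / (p1 + p2 + ... + pn)."""
    total = sum(capacities)
    if total <= 0:
        raise ValueError("total server processing capacity must be positive")
    return [p / total for p in capacities]


def correct_toward_uniform(probs, t_ping, alpha):
    """Equation (7): Pi' = Pi + (Pav - Pi) * 2/pi * arctan(alpha * Tping)."""
    p_av = sum(probs) / len(probs)
    # The weight is 0 when t_ping is 0 and approaches 1 as t_ping grows,
    # so a slow client pulls every Pi toward the average Pav.
    weight = 2.0 / math.pi * math.atan(alpha * t_ping)
    return [p + (p_av - p) * weight for p in probs]


# Hypothetical capacities for three servers.
psd = allocation_probabilities([10.0, 30.0, 60.0])
mpsd_fast = correct_toward_uniform(psd, t_ping=0.0, alpha=5.0)
mpsd_slow = correct_toward_uniform(psd, t_ping=2.0, alpha=5.0)
```

Because the corrections (Pav − Pi) sum to zero over all servers, the corrected values still sum to 1 in this sketch; only the spread between servers changes.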
[0095]
The corrected probability distribution MPsD is obtained from Pi′. The server allocation probability calculation means 1006 sends the obtained MPsD to the server selection means 1007. The server selection means 1007 generates the table shown in FIG. 13 from MPsD and selects a server from the table using a uniform random number value that takes any value from 0 to 1. The table shown in the figure is realized by, for example, an array having as many elements as there are servers. Each element holds the maximum and minimum values of a range of width Pi′ within 0 to 1, together with a server address. The server address held by the element whose range contains the uniform random number value is used as the service assignment server address. The range of each element must not overlap with the ranges of the other elements.
[0096]
With regard to the probability distributions PsD and MPsD, the server processing capability values Pi and Pi′ may instead be realized as a frequency distribution. In this case, the uniform random number value ranges from 0 to the sum of all the Pi values.
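The table-based selection of FIG. 13, including the frequency-distribution variant, can be sketched as follows. This is only an illustrative sketch: the function names, the tuple layout of the table elements, and the server addresses are assumptions.

```python
import random


def build_table(weights, addresses):
    """Build (minimum, maximum, server address) elements with
    contiguous, non-overlapping ranges, one element per server."""
    table, lo = [], 0.0
    for w, addr in zip(weights, addresses):
        if w < 0:
            raise ValueError("weights must be non-negative")
        table.append((lo, lo + w, addr))
        lo += w
    return table


def select_server(table, r=None, rng=random):
    """Return the address of the element whose range contains r.
    If r is omitted, draw it uniformly from 0 up to the total weight."""
    total = table[-1][1]
    if r is None:
        r = rng.random() * total
    for lo, hi, addr in table:
        if lo <= r < hi:
            return addr
    # Guard against floating-point rounding at the upper boundary.
    return table[-1][2]


# Hypothetical server addresses.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
prob_table = build_table([0.2, 0.3, 0.5], servers)      # probabilities, total 1
freq_table = build_table([10.0, 30.0, 60.0], servers)   # frequencies, total 100
```

The same routine serves both cases because the random number is scaled by the total of the weights, which is 1 for a probability distribution and the sum of all values for a frequency distribution.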
[0097]
When the server selection means 1007 determines the server to be assigned, it sends the server address 1012 to the connection management means 1003. The connection management means 1003 extracts the set information consisting of the client address (IP), the client port number (sp), and the destination port number (dp) from the start packet SYN, or a part of it, received from the destination conversion / packet relay means 1002, and records the pair of this set information and the server address received from the server allocating means 1001. Here, a hash table using the set information as a key may be used for the recording. The connection management means 1003 then sends the server address 1012 to the destination conversion / packet relay means 1002.
[0098]
The destination conversion / packet relay unit 1002 converts the destination of the received packet from the client 1 into the server address 1012 received from the connection management unit 1003, and transmits it to the server 2.
[0099]
During the service, the destination conversion / packet relay unit 1002 sends a packet 1014 to the connection management unit 1003, and the connection management unit 1003 looks up the assigned server address 1012 using the set information obtained from the packet 1014 and sends it to the destination conversion / packet relay unit 1002. As with the start packet SYN, the destination conversion / packet relay unit 1002 converts the destination of the packet received from the client 1 into the server address 1012 received from the connection management unit 1003, and transmits the packet to the server 2.
[0100]
At the end of the service, that is, upon receiving the end packet FIN, the processing is the same as during the service, except that the connection management unit 1003, after handling this packet, discards the set information corresponding to it.
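The handling of the start packet SYN, packets during the service, and the end packet FIN by the connection management unit can be sketched with a hash table keyed by the set information (IP, sp, dp). The class and method names below are assumptions made for illustration, and the sketch omits actual packet parsing.

```python
class ConnectionManager:
    """Holds the correspondence between set information and server address."""

    def __init__(self):
        # Hash table: (client_ip, sp, dp) -> assigned server address.
        self._table = {}

    def on_syn(self, client_ip, sp, dp, server_addr):
        """Record the server chosen for a new connection and return it."""
        self._table[(client_ip, sp, dp)] = server_addr
        return server_addr

    def on_data(self, client_ip, sp, dp):
        """Look up the assigned server during the service (None if unknown)."""
        return self._table.get((client_ip, sp, dp))

    def on_fin(self, client_ip, sp, dp):
        """Return the assigned server for the FIN packet, then discard the entry."""
        return self._table.pop((client_ip, sp, dp), None)


# Hypothetical client and server addresses.
cm = ConnectionManager()
syn_dest = cm.on_syn("192.0.2.10", 40000, 80, "10.0.0.2")
data_dest = cm.on_data("192.0.2.10", 40000, 80)
fin_dest = cm.on_fin("192.0.2.10", 40000, 80)
after_fin = cm.on_data("192.0.2.10", 40000, 80)
```

The relay unit would rewrite the destination of each packet to the returned address; after the FIN has been relayed, the entry no longer exists.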
[0101]
In the present embodiment, the service allocation is determined using the probability distribution, so that a client having a higher processing capability is more likely to be allocated a server having a higher processing capability. Service allocation is therefore performed according to how strongly the server processing capability influences the service quality, such as response time.
[0102]
The server allocation probability correction information generating means 1009 of the server allocating means 1001 obtains the distribution of past client-side processing capacity values (FIG. 14), and obtains the distance δc between that distribution and the client-side processing capacity value of the newly connected client. Then, the probability distribution is corrected by incorporating δc into the correction function M (FIG. 15). Here, for example, δc is obtained by the following equation.
δc = Pca−pci
Here, Pca is the average of the past client-side processing capacity values, and pci is the client-side processing capacity value of the newly connected client. The correction function M is determined so that the server processing capability value pi approaches the average value of all pi as δc decreases, and moves farther from the average value as δc increases. However, when the distance from the average value is increased, the corrected value pi′ of pi must not become a negative number. For example, equation (7) may be modified as follows.
pi ′ = pi + (Pav−pi) * β * 2 / π * arc_tan (α * δc + γ) (7 ′)
Here, Pav is the average value of pi, and α and γ are preset numbers larger than 0. Also, arc_tan(x) means tan⁻¹(x). β is −1 when δc < 0, and −dpj / pj when δc >= 0, where pj is the minimum value of pi and dpj = Pav − pj.
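Equation (7′) can be sketched as follows. The sketch assumes that β is −1 when δc < 0 and −dpj / pj when δc >= 0, which is one reading of the text above. The function name and the sample values are hypothetical, and the sketch requires the minimum capacity pj to be positive.

```python
import math


def correct_by_distance(capacities, delta_c, alpha, gamma):
    """Equation (7'):
    pi' = pi + (Pav - pi) * beta * 2/pi * arctan(alpha * delta_c + gamma)."""
    p_av = sum(capacities) / len(capacities)
    p_min = min(capacities)              # pj in the text
    if delta_c < 0:
        beta = -1.0
    else:
        if p_min <= 0:
            raise ValueError("minimum capacity pj must be positive")
        beta = -(p_av - p_min) / p_min   # -dpj / pj, with dpj = Pav - pj
    factor = beta * 2.0 / math.pi * math.atan(alpha * delta_c + gamma)
    return [p + (p_av - p) * factor for p in capacities]


# Hypothetical capacities: Pav = 2, pj = 1, dpj = 1.
caps = [1.0, 2.0, 3.0]
spread = correct_by_distance(caps, delta_c=1.0, alpha=1.0, gamma=0.5)
flattened = correct_by_distance(caps, delta_c=-2.0, alpha=1.0, gamma=0.5)
```

With these sample values, a positive δc moves the values away from the average and a sufficiently negative δc moves them toward it, while the total of the values is unchanged. The sketch does not by itself guarantee non-negative results for arbitrary inputs.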
[0103]
In this way, by allocating servers according to the distance between the client-side processing capacity value of the newly connected client and the distribution of past client-side processing capacity values, server allocation according to the client at each point in time becomes possible. For example, it is possible to automatically cope with a case where the ratio of clients from a remote place and a nearby place fluctuates depending on the time zone.
[0104]
Further, in this embodiment, a plurality of server allocating means (1001) may be arranged, and each may be selected according to a client address, a client port number, a service port number, and the like.
[0105]
This makes it possible to use a different allocation target server group for each service or each client, or to switch the service distribution policy, so that various kinds of service allocation can be performed with a single device.
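Selecting one of several server allocating means according to the client address and service port number can be sketched as follows. The rule format, the class name, and the use of plain strings to stand in for the allocating means are assumptions made for illustration only.

```python
import ipaddress


class AllocatorSelector:
    """Chooses a server allocating means per client network and service port."""

    def __init__(self, default_allocator):
        self._rules = []                 # (network, service port or None, allocator)
        self._default = default_allocator

    def add_rule(self, client_network, service_port, allocator):
        """Register an allocator; service_port=None matches any port."""
        net = ipaddress.ip_network(client_network)
        self._rules.append((net, service_port, allocator))

    def select(self, client_ip, service_port):
        """Return the first matching allocator, or the default one."""
        addr = ipaddress.ip_address(client_ip)
        for net, port, allocator in self._rules:
            if addr in net and (port is None or port == service_port):
                return allocator
        return self._default


# Strings stand in for separate server allocating means (hypothetical).
selector = AllocatorSelector(default_allocator="general-pool")
selector.add_rule("192.0.2.0/24", 80, "web-pool-local")
selector.add_rule("0.0.0.0/0", 21, "ftp-pool")
```

For example, `selector.select("192.0.2.7", 80)` returns the allocator registered for local web clients, while a request that matches no rule falls back to the default.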
According to the present invention, the load measurement and high load detection of a server are performed by monitoring the communication between the client and the server, so there is no need to modify the server and no packets other than those of the service are generated. Therefore, any server can be supported, the introduction cost is low, and the measurement does not interfere with the load. In addition, since load measurement and high load detection use indices that do not depend on the protocol, any service can be supported, and since the communication state during the service is monitored, the effect of disturbance is small and the accuracy is high.
Furthermore, when a service is shared among multiple servers, the load is automatically and efficiently allocated to each server according to the effect that the server processing capacity has on the quality of service seen by the client, even when the server configuration or server status changes, so that the client can receive prompt service.
(Other)
The above embodiment discloses the following invention.
(Supplementary Note 1) A network server allocating device for transferring data from a client to a plurality of servers via a network, comprising: relay means for transferring data transmitted from the client to one of the servers by changing its destination; connection management means for holding the correspondence between the data and the server and instructing the relay means on the destination; and server allocating means for measuring and determining the processing capacity of the server, the client, and the route, determining the correspondence between the data and the server using a function according to a service distribution rate based on the result, and transmitting the correspondence to the connection management means.
(Supplementary Note 2) The network server allocating device according to Supplementary Note 1, wherein the server allocating means uses, as the distribution rate, a corrected probability distribution obtained by correcting the probability distribution according to the processing capacity of the server so that it becomes closer to a uniform distribution as the processing capacity of the client and the route is lower.
(Supplementary Note 3) The network server allocating device according to Supplementary Note 1, wherein the server allocating means obtains the distribution of the processing capacity of the client and the route for clients currently in service, obtains a corrected probability distribution by correcting the probability distribution according to the processing capacity of the server so that it becomes closer to a uniform distribution as the processing capacity of the newly connected client and the route is lower and, conversely, so that the processing capacity of each server becomes more prominent as that capacity is higher, and uses the corrected distribution as the distribution rate.
(Supplementary Note 4) The network server assigning device according to Supplementary Note 1, wherein a plurality of server assigning means are provided, and the server assigning means is selected for each client or service.
[0106]
[Effects of the Invention]
As described above, according to the present invention, services are distributed according to a method obtained by measuring the performance / load of the server and the performance / load of the client side. In addition, it is possible to allocate as many servers as necessary to maintain the quality of service that can be seen from the client, which has the effect of maximizing the efficiency of server use. Further, since the server allocation is determined using the function, the service distribution ratio can be accurately reflected in the distribution.
[0107]
In addition, since the proportional relationship between the processing capacity on the client side and the influence of the server processing capacity on the service quality is reflected in the service distribution rate, a server with high processing capacity can be preferentially assigned to a client for which the server processing capacity has a large influence on the service quality.
Further, since the service distribution rate is adjusted in relation to the distribution of the processing capacity of the client currently being serviced, there is an effect that it is possible to automatically cope with the case where the ratio of clients from a distant place and a near place changes.
[Brief description of the drawings]
FIG. 1 is a diagram showing a connection configuration of a load detection device according to an embodiment of the present invention.
FIG. 2 is a graph showing a relationship between data size and time for performing a high load determination of a server according to the embodiment.
FIG. 3 is a flowchart illustrating a packet monitoring method according to the first embodiment (1).
FIG. 4 is a flowchart illustrating a packet monitoring method according to the first embodiment (2).
FIG. 5 is a diagram for explaining a connection request from a client to a server and a response process according to a buffer state.
FIG. 6 is a diagram for explaining retransmission of a connection request from a client to a server.
FIG. 7 is an explanatory diagram showing an example of determining, according to the data size, whether or not data is to be used as a target in load detection.
FIG. 8 is a view for explaining server processing based on a sequence number.
FIG. 9 is an explanatory diagram for monitoring communication from a server to a client.
FIG. 10 is a block diagram illustrating the configuration of a server allocation device according to the embodiment.
FIG. 11 is a graph for explaining a server allocation probability distribution PsD.
FIG. 12 is a diagram for explaining a correction function (1).
FIG. 13 is a diagram illustrating an example of a table generated by a server selection unit in the embodiment.
FIG. 14 is a graph showing a distribution example of past client-side processing capacity values.
FIG. 15 is a diagram for explaining a correction function (2).

Claims

In a device for transferring data from a client to a plurality of servers via a network,
Relay means for transferring data transmitted from the client to one of the servers by changing the destination;
Connection management means for holding the correspondence between the data and the server and indicating the destination to the relay means;
Server allocation means for measuring and determining the processing capacity of the server and the processing capacity on the client side, determining the correspondence between the data and the server using a function according to the service distribution rate based on the measurement, and transmitting the correspondence to the connection management means,
A network server allocating apparatus, wherein the server allocating means uses, as the distribution rate, a corrected probability distribution obtained by correcting the probability distribution according to the processing capacity of the server so that it becomes closer to a uniform distribution as the client-side processing capacity becomes lower.

In a device for transferring data from a client to a plurality of servers via a network,
Relay means for transferring data transmitted from the client to one of the servers by changing the destination;
Connection management means for holding the correspondence between the data and the server and indicating the destination to the relay means;
Server allocation means for measuring and determining the processing capacity of the server and the processing capacity on the client side, determining the correspondence between the data and the server using a function according to the service distribution rate based on the measurement, and transmitting the correspondence to the connection management means,
A network server allocating apparatus, wherein the server allocating means obtains the distribution of the client-side processing capacity of clients currently being serviced, obtains a corrected probability distribution by correcting the probability distribution according to the processing capacity of the server so that it becomes closer to a uniform distribution as the client-side processing capacity of the newly connected client is lower and, conversely, so that the processing capacity of each server becomes more prominent as that capacity is higher, and uses the corrected distribution as the distribution rate.

In a device for transferring data from a client to a plurality of servers via a network,
Relay means for transferring data transmitted from the client to one of the servers by changing the destination;
Connection management means for holding the correspondence between the data and the server and indicating the destination to the relay means;
Server allocation means for measuring and determining the processing capacity of the server and the processing capacity on the client side, determining the correspondence between the data and the server using a function according to the service distribution rate based on the measurement, and transmitting the correspondence to the connection management means,
A network server allocating apparatus comprising a plurality of the server allocating means, wherein the server allocating means is selected for each client or service so that a service distribution policy is switched for each service or each client.