JP2004126968A - Job scheduling system for parallel computer - Google Patents

Publication number
JP2004126968A
JP2004126968A JP2002290695A JP2002290695A JP2004126968A JP 2004126968 A JP2004126968 A JP 2004126968A JP 2002290695 A JP2002290695 A JP 2002290695A JP 2002290695 A JP2002290695 A JP 2002290695A JP 2004126968 A JP2004126968 A JP 2004126968A
Authority
JP
Japan
Prior art keywords
job
computer
temperature
computers
job scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2002290695A
Other languages
Japanese (ja)
Inventor
Masazumi Matsubara
松原 正純
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP2002290695A
Publication of JP2004126968A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

PROBLEM TO BE SOLVED: To solve thermal problems, save energy, and reduce the running cost of the whole system by performing job scheduling based on temperature distribution information obtained from temperature sensors. SOLUTION: Computers have built-in temperature sensors 10, 11, 12 and 13. Temperature monitoring daemons 100b, 101b, 102b and 103b read values from the temperature sensors and transfer them to a temperature information management server 110a on a scheduler computer 110. The temperature information management server 110a manages the temperature information using a temperature information management table. Before submitting a new job to the computers 100, 101, 102, and 103, a scheduler 110b looks up the computer with the lowest temperature in the temperature information management table and submits the job to that computer. Alternatively, the jobs on each computer may be reassigned based on temperature. COPYRIGHT: (C)2004,JPO

Description

【0001】
【発明の属する技術分野】
本発明は、分散メモリ型並列計算機やPCクラスタに代表される並列計算機システムでのジョブスケジューリング装置に関する。
【0002】
【従来の技術】
従来、並列計算機システムにおけるジョブスケジューリングと言えば、各計算機の負荷を均等化することで実行性能の改善を目指したものがほとんどであった(例えば特許文献1,特許文献2,特許文献3参照)。
【0003】
【特許文献1】
特開平11−312149号公報
【特許文献2】
特開2001−14286号公報
【特許文献3】
特開平8−83257号公報
【0004】
【発明が解決しようとする課題】
従来のジョブスケジューリングでは性能にのみ着目している。しかし、大規模な並列計算機システムを構築する場合、性能以外にも熱、設置スペースなどの問題も解決しなくては実用化は難しい。
設置スペースの問題については、部品を小型化し、拡張性を犠牲にして高密度な計算機を構築することで改善できる。ただし、その場合は小さいスペースにより多くの熱源が存在することになるため、熱問題がさらに深刻になる。
この結果、十分に排熱しきれないために、システムに障害をきたし、結局は性能低下につながることも十分に有り得る。冷却装置を強力にすることで解決する方法があるが、やみくもにシステム全体を冷却したのではコストが嵩んでしまう。つまり、数千、数万といった超並列計算機システムにおいては、従来方式の性能に加えて計算機の温度というファクタを加味してスケジューリングすべきである。
本発明は上記事情に鑑みなされたものであって、本発明の目的は、並列計算機システムを構成する各計算機に温度センサを設け、センサから得られる温度分布情報をもとにジョブスケジューリングを行うことにより、上記熱問題を解決し、省エネルギーを実現するとともに、システム全体のランニングコストを下げることである。
【0005】
【課題を解決するための手段】
上記課題を本発明においては、次のように解決する。
(1)並列計算機システムを構成する各計算機に温度センサを設け、センサから得られる値を集計して、その温度分布情報をもとにジョブの投入先を決定する。
(2)並列計算機システムを構成する各々の計算機に温度センサと、現在システムで実行中のジョブを監視する監視装置とを設け、該監視装置による監視結果と、上記温度センサによって取得した値をもとに、実行中のジョブを他の計算機に割り付け直す。
(3)並列計算機システムを構成する各計算機に温度センサを設け、また、システム内のどの計算機上で各ジョブを実行しているのか監視する装置を設け、近傍の計算機間で温度情報を交換し、現在の実行中の計算機よりも温度の低い計算機にジョブを割り付けになおす。
(4)上記(1)〜(3)において、並列計算機システムを構成する各々の計算機は、冷却装置を設け、温度センサによって取得した値を基に上記冷却装置を制御する。
本発明の請求項1,5の発明においては、上記(1)のように並列計算機システムを構成する各計算機に温度センサを設け、センサから得られる値を集計しその温度分布情報をもとにジョブの投入先を決定しているので、システムの温度分布を把握して、最適な計算機にジョブを割り付けることが可能となる。
本発明の請求項2の発明においては、各計算機の温度センサから取得した値と現在実行しているジョブがどの計算機で実行されているかを監視し、実行中のジョブを他の計算機に割り付け直しているので、システムの温度分布を把握して、最適な計算機にジョブを割り付けることが可能となる。
本発明の請求項3の発明においては、近傍の計算機間で温度情報を交換し、現在の実行中の計算機よりも温度の低い計算機にジョブを割り付けになおしているので、局所的なスケジューリングであり、最適なジョブ再割り付けではないものの、ある程度の省エネルギーが見込め、さらにスケジューリングのオーバヘッドを抑えることが可能となる。
本発明の請求項4の発明においては、冷却装置を設け、該冷却装置をジョブスケジューリング装置から制御可能にしたので、システム状況に適した冷却強度で設定でき、過剰な電力を抑えて、コスト削減を図ることができる。
【0006】
また、本発明においては、以下のように構成することもできる。
(イ)上記(1)と(2)を組み合わせ、監視装置による監視結果と、上記温度センサによって取得した値をもとに、ジョブの投入先計算機を決定するとともに、実行中のジョブを他の計算機に割り付け直す。
上記構成とすることにより、ジョブ投入前、投入後どちらでもジョブを制御することができるようになり、より効率的なスケジューリングが可能となる。
(ロ)上記(1)のジョブスケジューリングを行う第1のジョブスケジューリング装置と、上記(3)のジョブスケジューリングを行う第2のジョブスケジューリング装置を設けて、階層型のジョブスケジューリングを行うことにより、短いインターバルでは近傍の計算機間でのジョブスケジューリングを適用し、それよりも長いインターバルでシステム全体を対象としたジョブスケジューリングを適用することができ、スケジューリングのオーバヘッドを抑えつつ、最適化効率を上げることが可能となる。
【0007】
【発明の実施の形態】
図1は本発明の第1の実施例を示す図である。
ネットワーク120によってスケジューラ計算機110と計算機100,101,102,103が結合されている。スケジューラ計算機110では、温度情報管理サーバ110aとジョブスケジューラ110bが走っている。各計算機には温度センサ10,11,12,13が内蔵され、また温度監視デーモン100a,101a,102a,103aが走っている。なお、図1では4台の計算機の場合を示しているがこの台数は任意である。
各計算機100,101,102,103上の温度監視デーモン100b,101b,102b,103bは同機上の温度センサ10,11,12,13から値を読みとり、その結果をスケジューラ計算機110上の温度情報管理サーバ110aに伝達する。
【0008】
温度情報管理サーバ110aでは図2に示す温度情報管理テーブルを用いて、各計算機100,101,102,103の温度情報を管理する。温度情報管理テーブルは同図に示すように、各計算機100,101,102,103の計算機番号と、各計算機の温度を管理するテーブルであり、このテーブルのサイズは計算機台数に比例する。
スケジューラ110bは新規ジョブを計算機100,101,102,103上に投入する前に、この温度情報管理テーブルを参照して最も温度の低い計算機を検索する。
検索の結果選ばれた計算機が最も適切な計算機であり、スケジューラ110bはその計算機に対してジョブを投入する。新規ジョブが複数台の計算機を必要とする場合は、温度の低い順に必要台数分計算機を検索し、それらの計算機上にジョブを投入する。
図1ではスケジューラ計算機110とその他の計算機は分かれているが、計算機100,101,102,103のいずれかがスケジューラ計算機110を兼ねても良い。
【0009】
図3は第2実施例を示す図である。
ネットワーク220によってスケジューラ計算機210と計算機200, 201,202,203が結合されている。スケジューラ計算機210では、ジョブ管理サーバ210a、温度情報管理サーバ210bとジョブスケジューラ210cが走っている。
各計算機200, 201,202,203には温度センサ20,21,22,23が内蔵され、また温度監視デーモン200a,201a,202a,203aが走っている。なお、図3では4台の計算機の場合を示しているがこの台数は任意である。
各計算機200, 201,202,203上の温度監視デーモン200a,201a,202a,203aは同機上の温度センサ20,21,22,23から値を読みとり、その結果をスケジューラ計算機210上の温度情報管理サーバ210bに伝達する。
温度情報管理サーバ210bでは図2に示す温度情報管理テーブルを用いて、各計算機200, 201,202,203の温度情報を管理する。
【0010】
スケジューラ210cは新規ジョブを計算機200, 201,202,203上に投入する前に、この温度情報管理テーブルを参照して最も温度の低い計算機を検索する。
検索の結果選ばれた計算機が最も適切な計算機であり、スケジューラ210cはその計算機に対してジョブを投入する。新規ジョブが複数台の計算機を必要とする場合は、温度の低い順に必要台数分計算機を検索し、それらの計算機上にジョブを投入する。
この際、ジョブ管理サーバ210aに図4に示すジョブ管理テーブルへのジョブ登録を通知する。ジョブ管理テーブルは、同図に示すようにジョブ番号と使用計算機を管理するテーブルである。
ジョブ管理サーバ210aではスケジューラ210cからのジョブ登録以外に、各計算機200, 201,202,203からジョブ終了通知を受けとってジョブ管理テーブルからのジョブ削除も行なう。
【0011】
スケジューラ210cは定期的に現在実行中のジョブについて、最適な計算機上で実行されているかどうかチェックする。
すなわち、図4のジョブ管理テーブルに登録されている使用計算機群について、温度情報管理テーブルから温度情報を取得し、それらの計算機の温度が高いか調べる。温度が高いかどうかの判断は、閾値以上かどうか調べる絶対評価や、もしくはその他の計算機と比較する相対評価などがある。
1台でも温度が高いと判断された場合は、その計算機上のジョブを他の温度の低い計算機に移動する。その後、ジョブ情報管理テーブルの該当エントリを更新する。
図3ではスケジューラ計算機とその他の計算機は分かれているが、計算機200,201,202,203のいずれかがスケジューラ計算機210を兼ねても良い。
【0012】
図5は本発明の第3の実施例を示す図である。
ネットワーク320によって計算機300,301,302,303が結合されている。各計算機300,301,302,303には温度センサ30,31,32,33が内蔵され、また温度監視デーモン300a,301a,302a,303a及びスケジューラ300b,301b,302b,303bが走っている。なお、図5では4台の計算機の場合を示しているがこの台数は任意である。
各計算機300,301,302,303上の温度監視デーモン300a,301a,302a,303aは同機上の温度センサから値を読みとり、その結果をスケジューラ300b,301b,302b,303bに伝達する。スケジューラ300b,301b,302b,303bは定期的に隣接する計算機と温度情報を交換し、隣の計算機の温度がある温度以下、自機の温度よりもある一定値以上低い場合に、自機上のジョブをその隣接計算機に移動する。
図5ではスケジューラと温度監視デーモンが分かれているが、1つのプロセスが両方の機能を兼ねても良い。
【0013】
図6は本発明の第4の実施例を示す図であり、本実施例は、上記第2の実施例に冷却装置を設けた実施例を示す。
ネットワーク420によってスケジューラ計算機410と計算機400,401,402,403が結合されている。スケジューラ計算機410では、ジョブ管理サーバ410a、温度情報管理サーバ410b、ジョブスケジューラ410cが走っている。
各計算機には温度センサ40,41,42,43、冷却装置50,51,52,53が内蔵され、また温度監視デーモン400a,401a,402a,403aが走っている。この冷却装置50,51,52,53はスケジューラ計算機410上のスケジューラ410bから制御可能である。図6では4台の計算機の場合を示しているがこの台数は任意である。
【0014】
各計算機上の温度監視デーモン400a,401a,402a,403aは同機上の温度センサ40,41,42,43から値を読みとり、その結果をスケジューラ計算機410上の温度情報管理サーバ410bに伝達する。
温度情報管理サーバ410bでは前記図2に示す温度情報管理テーブルを用いて、各計算機400,401,402,403の温度情報を管理する。
スケジューラ410cは新規ジョブを計算機400,401,402,403上に投入する前に、この温度情報管理テーブルを参照して各計算機400,401,402,403の温度を調べる。また、ジョブ管理サーバ410aは、前記したように、ジョブ管理テーブルにより実行中のジョブを管理する。
ここで、既に計算機400にジョブ1が、計算機401にジョブ2が走っているものとする。
【0015】
この場合、スケジューラ410cは以下に説明する図7のフローチャートに従い、スケジューリングを行なう。
まず、CPU使用率などの負荷情報を取得し(ステップS1)、ジョブ1、2を同一計算機にまとめてしまっても良いかどうかを性能面から判断する(ステップS2)。すなわちジョブ1,2の負荷の和が100%以下であり、例えば、計算機401のジョブを計算機400にまとめてしまってもよいかを判断する。
ジョブ1、2を同一計算機にまとめることができない場合には、スケジューリング処理を終了する。
【0016】
また、ジョブ1,2を同一計算機にまとめてしまっても、性能的には問題ないとなった場合に、続いてコスト面から評価するために現在の温度、消費電力情報を取得し(ステップS3)、ジョブ移動時の消費電力を見積もる(ステップS4)。この時の見積りは、現在の冷却装置強度から、最低に落した場合と最高まで上げた場合の差分程度の単純な計算でも良い。
ジョブを同一計算機上にまとめたほうが総消費電力が下がるという結論に達した場合には、実際にジョブを移動させる(ステップS6)。例えば、ジョブ2を計算機400に移動する。そして、それぞれの計算機の冷却装置の強度を適切に設定する。例えば計算機400の冷却装置400bの強度を上げ、計算機401の冷却装置401bの強度を下げる。
また、ジョブを同一計算機上にまとめても総消費電力が下がらない場合には、スケジューリング処理を終了する。
【0017】
なお、上記第2、第3の実施例を組み合わせ、階層型のジョブスケジューリングを行うようにしてもよい。すなわち、並列計算機システムを構成する各々の計算機に温度センサと、温度監視デーモンと、スケジューラを設け、また、スケジューラ計算機に現在システムで実行中のジョブを監視するジョブ情報管理サーバと、スケジューラと、温度情報管理サーバを設け、上記スケジューラ計算機により、長いインターバルでシステム全体を対象とするジョブスケジューリングを行い、各並列計算機に設けたスケジューラにより、近傍の計算機間で短いインターバルでジョブスケジューリングを行うようにする。
これにより、スケジューリングのオーバヘッドを抑えつつ、最適化効率を上げることができる。
さらに、前記第1〜第3の実施例において、各並列計算機システムを構成する各々の計算機に冷却装置を設け、スケジューラにより冷却装置を制御するように構成してもよい。
【0018】
(付記1) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリング装置であって、
上記並列計算機システムを構成する各々の計算機に温度センサを設け、
上記ジョブスケジューリング装置は、上記温度センサによって取得した値をもとにジョブの投入先計算機を決定する
ことを特徴としたジョブスケジューリング装置。
(付記2) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリング装置であって、
上記並列計算機システムを構成する各々の計算機に温度センサと、現在システムで実行中のジョブを監視する監視装置とを設け、
上記ジョブスケジューリング装置は、上記監視装置による監視結果と、上記温度センサによって取得した値をもとに、実行中のジョブを他の計算機に割り付け直す
ことを特徴としたジョブスケジューリング装置。
(付記3) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリング装置であって、
上記並列計算機システムを構成する各々の計算機に温度センサと、現在システムで実行中のジョブを監視する監視装置とを設け、
上記ジョブスケジューリング装置は、上記監視装置による監視結果と、上記温度センサによって取得した値をもとに、ジョブの投入先計算機を決定するとともに、実行中のジョブを他の計算機に割り付け直す
ことを特徴としたジョブスケジューリング装置。
(付記4) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリング装置であって、
上記並列計算機システムを構成する各々の計算機に温度センサを備え、
上記ジョブスケジーリング装置が、並列計算機システムを構成する各計算機に分散配置されており、
上記ジョブスケジーリング装置は、近傍の計算機間で温度情報を交換し、取得した温度情報に基づき、実行中のジョブを近傍の計算機に割り付け直すことを特徴とするジョブスケジューリング装置。
(付記5) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリング装置であって、
上記並列計算機システムを構成する各々の計算機に温度センサと、現在システムで実行中のジョブを監視する監視装置と、
上記並列計算機を構成する各計算機のジョブスケジューリングを行う第1のジョブスケジューリング装置と、
各計算機に分散配置された第2のジョブスケジューリング装置を設け、
上記第1のジョブスケジューリング装置は、上記監視装置による監視結果と、上記温度センサによって取得した値をもとに、ジョブの投入先計算機を決定するとともに、実行中のジョブを他の計算機に割り付け直し、
上記第2のジョブスケジューリング装置は、近傍の計算機間で温度情報を交換し、取得した温度情報に基づき、実行中のジョブを近傍の計算機に割り付け直すことを特徴としたジョブスケジューリング装置。
(付記6) 上記並列計算機システムを構成する各々の計算機は、冷却装置を備え、
上記ジョブスケジューリング装置は、上記温度センサによって取得した値を基に上記冷却装置を制御する
ことを特徴とする付記1,2,3,4または付記5のジョブスケジューリング装置。
(付記7) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリングプログラムであって、
上記ジョブスケジューリングプログラムは、並列計算機システムを構成する各々の計算機に設けられた温度センサによって取得した値をもとにジョブの投入先計算機を決定する処理をコンピュータに実行させる
ことを特徴とするジョブスケジューリングプログラム。
(付記8) 複数の計算機をネットワークを介して結合した並列計算機システムにおけるジョブスケジューリングプログラムであって、
上記ジョブスケジューリングプログラムは、並列計算機システムを構成する各々の計算機に設けられた温度センサによって取得した値をもとに実行中のジョブを他の計算機に割り付け直す処理をコンピュータに実行させる
ことを特徴とするジョブスケジューリングプログラム。
【0019】
【発明の効果】
以上説明したように、本発明においては、並列計算機システムを構成する各計算機に温度センサを設け、センサから得られる温度分布情報をもとにジョブスケジューリングを行っているので、充分に排熱しきれないためシステムに障害をきたし、性能低下につながるといった問題を解決することができる。また、省エネルギーを実現でき、システム全体のランニングコストを下げることができる。
【図面の簡単な説明】
【図1】本発明の第1の実施例を示す図である。
【図2】温度情報管理テーブルの構成例を示す図である。
【図3】本発明の第2の実施例を示す図である。
【図4】ジョブ情報管理テーブルの構成例を示す図である。
【図5】本発明の第3の実施例を示す図である。
【図6】本発明の第4の実施例を示す図である。
【図7】第4の実施例における処理を示すフローチャートである。
【符号の説明】
100,101,102,103     計算機
200,201,202,203     計算機
300,301,302,303     計算機
400,401,402,403     計算機
120,220,320    ネットワーク
110,210,410    スケジューラ計算機
10〜13,20〜23    温度センサ
30〜33,40〜43    温度センサ
50〜53          冷却装置
[0001]
[Technical Field of the Invention]
The present invention relates to a job scheduling device in a parallel computer system represented by a distributed memory parallel computer or a PC cluster.
[0002]
[Prior art]
Conventionally, most job scheduling in parallel computer systems has aimed at improving execution performance by equalizing the load on each computer (for example, see Patent Document 1, Patent Document 2, and Patent Document 3).
[0003]
[Patent Document 1]
Japanese Patent Application Laid-Open No. H11-312149
[Patent Document 2]
Japanese Patent Application Laid-Open No. 2001-14286
[Patent Document 3]
Japanese Patent Application Laid-Open No. H8-83257
[0004]
[Problems to be solved by the invention]
Conventional job scheduling focuses only on performance. However, when constructing a large-scale parallel computer system, it is difficult to put it into practical use unless problems other than performance, such as heat and installation space, are solved.
The problem of installation space can be improved by reducing the size of parts and building a high-density computer at the expense of expandability. However, in that case, more heat sources will be present in the small space, and the heat problem will be more serious.
As a result, heat may not be exhausted sufficiently, causing system failures that ultimately lead to performance degradation. One solution is to make the cooling device more powerful, but cooling the entire system indiscriminately drives up cost. In other words, in a massively parallel computer system with thousands or tens of thousands of computers, scheduling should take the temperature of the computers into account in addition to the performance considered by conventional methods.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a temperature sensor in each computer constituting a parallel computer system and perform job scheduling based on temperature distribution information obtained from the sensor. Thus, the above-mentioned heat problem can be solved, energy can be saved, and the running cost of the entire system can be reduced.
[0005]
[Means for Solving the Problems]
In the present invention, the above problems are solved as follows.
(1) A temperature sensor is provided in each computer constituting the parallel computer system, the values obtained from the sensors are totaled, and the job submission destination is determined based on the temperature distribution information.
(2) Each computer constituting the parallel computer system is provided with a temperature sensor and a monitoring device for monitoring the jobs currently being executed in the system, and a running job is reassigned to another computer based on the monitoring result of the monitoring device and the value obtained by the temperature sensor.
(3) A temperature sensor is provided for each computer constituting the parallel computer system, a device is provided for monitoring which computer in the system is executing each job, temperature information is exchanged between nearby computers, and a job is reassigned to a computer whose temperature is lower than that of the computer currently executing it.
(4) In the above (1) to (3), each computer constituting the parallel computer system is provided with a cooling device, and controls the cooling device based on a value obtained by a temperature sensor.
In the inventions of claims 1 and 5, a temperature sensor is provided in each computer constituting the parallel computer system as described in (1) above, the values obtained from the sensors are aggregated, and the job submission destination is determined based on the resulting temperature distribution information. This makes it possible to grasp the temperature distribution of the system and allocate the job to the most suitable computer.
In the invention of claim 2, the values obtained from the temperature sensor of each computer and the computers on which the current jobs are running are monitored, and running jobs are reassigned to other computers. This makes it possible to grasp the temperature distribution of the system and assign jobs to the optimal computer.
In the invention of claim 3, temperature information is exchanged between nearby computers, and a job is reassigned to a computer whose temperature is lower than that of the computer currently executing it. Because this scheduling is local, the reassignment is not optimal, but some energy savings can be expected and the scheduling overhead can be kept low.
In the invention of claim 4, a cooling device is provided and can be controlled from the job scheduling device. The cooling intensity can therefore be set to suit the system status, which suppresses excess power consumption and reduces costs.
[0006]
Further, in the present invention, the following configuration is also possible.
(A) Combining (1) and (2) above: based on the monitoring result of the monitoring device and the value obtained by the temperature sensor, the computer to which a job is to be submitted is determined, and running jobs are reassigned to other computers.
With the above configuration, the job can be controlled both before and after the job is input, so that more efficient scheduling can be performed.
(B) A first job scheduling device that performs the job scheduling of (1) above and a second job scheduling device that performs the job scheduling of (3) above are provided to perform hierarchical job scheduling. Job scheduling between nearby computers is applied at short intervals, and job scheduling covering the entire system is applied at longer intervals, so that optimization efficiency can be increased while scheduling overhead is kept low.
[0007]
[Embodiments of the Invention]
FIG. 1 is a diagram showing a first embodiment of the present invention.
A scheduler computer 110 and computers 100, 101, 102, and 103 are connected by a network 120. In the scheduler computer 110, a temperature information management server 110a and a job scheduler 110b are running. Each computer has built-in temperature sensors 10, 11, 12, and 13, and temperature monitoring daemons 100a, 101a, 102a, and 103a are running. FIG. 1 shows the case of four computers, but the number is arbitrary.
The temperature monitoring daemons 100b, 101b, 102b, and 103b on the computers 100, 101, 102, and 103 read values from the temperature sensors 10, 11, 12, and 13 on the same machines and transmit the results to the temperature information management server 110a on the scheduler computer 110.
[0008]
The temperature information management server 110a manages the temperature information of each of the computers 100, 101, 102, and 103 using the temperature information management table shown in FIG. As shown in the figure, the temperature information management table is a table for managing the computer numbers of the computers 100, 101, 102, and 103 and the temperature of each computer, and the size of this table is proportional to the number of computers.
Before submitting a new job to the computers 100, 101, 102, and 103, the scheduler 110b refers to the temperature information management table to search for a computer with the lowest temperature.
The computer selected as a result of the search is the most appropriate computer, and the scheduler 110b submits a job to the computer. When a new job requires a plurality of computers, the required number of computers are searched in ascending order of temperature, and jobs are input to those computers.
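The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature information management table is modeled as a plain dictionary from computer number to temperature, and the function name `pick_coolest` and the sample temperatures are hypothetical, not taken from the patent.

```python
from typing import Dict, List


def pick_coolest(temperature_table: Dict[int, float], needed: int = 1) -> List[int]:
    """Return the `needed` computer numbers with the lowest temperatures."""
    if needed > len(temperature_table):
        raise ValueError("job needs more computers than the table lists")
    # Sort by temperature; ties are broken by computer number so the
    # result is deterministic.
    ranked = sorted(temperature_table, key=lambda n: (temperature_table[n], n))
    return ranked[:needed]


# Hypothetical readings (degrees Celsius) for computers 100 to 103.
table = {100: 41.5, 101: 38.0, 102: 44.2, 103: 39.1}
print(pick_coolest(table))     # [101]
print(pick_coolest(table, 2))  # [101, 103]
```

A job that needs several computers simply takes the first `needed` entries of the ranking, which matches the "ascending order of temperature" rule in the text.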
Although the scheduler computer 110 and the other computers are separated in FIG. 1, any one of the computers 100, 101, 102, and 103 may serve as the scheduler computer 110.
[0009]
FIG. 3 shows the second embodiment.
A scheduler computer 210 and computers 200, 201, 202, and 203 are connected by a network 220. In the scheduler computer 210, a job management server 210a, a temperature information management server 210b, and a job scheduler 210c are running.
Each of the computers 200, 201, 202, and 203 has a built-in temperature sensor 20, 21, 22, 23, and a temperature monitoring daemon 200a, 201a, 202a, 203a. FIG. 3 shows the case of four computers, but the number is arbitrary.
The temperature monitoring daemons 200a, 201a, 202a, and 203a on the computers 200, 201, 202, and 203 read values from the temperature sensors 20, 21, 22, and 23 on the same machines and transmit the results to the temperature information management server 210b on the scheduler computer 210.
The temperature information management server 210b manages the temperature information of each of the computers 200, 201, 202, and 203 using the temperature information management table shown in FIG.
[0010]
Before submitting a new job to the computers 200, 201, 202, and 203, the scheduler 210c refers to the temperature information management table to search for a computer with the lowest temperature.
The computer selected as a result of the search is the most appropriate computer, and the scheduler 210c submits a job to the computer. When a new job requires a plurality of computers, the required number of computers are searched in ascending order of temperature, and jobs are input to those computers.
At this time, the job management server 210a is notified of the job registration in the job management table shown in FIG. The job management table is a table for managing job numbers and computers used as shown in FIG.
In addition to registering a job from the scheduler 210c, the job management server 210a receives a job end notification from each of the computers 200, 201, 202, and 203 and deletes the job from the job management table.
[0011]
The scheduler 210c periodically checks whether the currently executing job is being executed on an optimal computer.
That is, for the computers registered as in use in the job management table shown in FIG. 4, temperature information is acquired from the temperature information management table, and it is checked whether the temperatures of those computers are high. Whether a temperature is high can be judged by an absolute evaluation, which checks whether the temperature is at or above a threshold, or by a relative evaluation, which compares it with the other computers.
If it is determined that at least one of the computers has a high temperature, the job on that computer is moved to another low-temperature computer. After that, the corresponding entry in the job information management table is updated.
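The periodic check and job move described in this paragraph might look like the following Python sketch. Everything concrete here is an assumption made for illustration: the 70-degree threshold, the use of the mean of the other computers for the relative evaluation, the dictionary layouts of the two tables, and the names `is_hot` and `rebalance`. The sketch also does not update temperatures after a move, so several jobs could be sent to the same cool computer.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def is_hot(computer: int, temps: Dict[int, float],
           threshold: float = 70.0, margin: Optional[float] = None) -> bool:
    """Absolute evaluation: temperature >= threshold.
    Optional relative evaluation: temperature exceeds the mean of the
    other computers by at least `margin`."""
    t = temps[computer]
    if t >= threshold:
        return True
    if margin is not None:
        others = [v for k, v in temps.items() if k != computer]
        return bool(others) and t - mean(others) >= margin
    return False


def rebalance(job_table: Dict[int, List[int]], temps: Dict[int, float],
              threshold: float = 70.0) -> List[Tuple[int, int, int]]:
    """Move each job off hot computers; update job_table in place and
    return the moves as (job, source, destination) tuples."""
    moves = []
    for job, computers in job_table.items():
        for i, src in enumerate(computers):
            if not is_hot(src, temps, threshold):
                continue
            # Candidate targets: not hot and not already used by this job.
            candidates = [c for c in temps
                          if c not in computers and not is_hot(c, temps, threshold)]
            if not candidates:
                continue  # nowhere cooler to go; leave the job in place
            dst = min(candidates, key=lambda c: (temps[c], c))
            computers[i] = dst  # update the job management table entry
            moves.append((job, src, dst))
    return moves


# Hypothetical state: job 1 runs on computer 200, job 2 on computer 201.
temps = {200: 72.0, 201: 45.0, 202: 40.0, 203: 50.0}
job_table = {1: [200], 2: [201]}
print(rebalance(job_table, temps))  # [(1, 200, 202)]
print(job_table)                    # {1: [202], 2: [201]}
```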
In FIG. 3, the scheduler computer and other computers are separated, but any of the computers 200, 201, 202, and 203 may also serve as the scheduler computer 210.
[0012]
FIG. 5 is a diagram showing a third embodiment of the present invention.
Computers 300, 301, 302, and 303 are connected by a network 320. Each of the computers 300, 301, 302, and 303 has a built-in temperature sensor 30, 31, 32, and 33, and a temperature monitoring daemon 300a, 301a, 302a, and 303a and schedulers 300b, 301b, 302b, and 303b are running. FIG. 5 shows the case of four computers, but the number is arbitrary.
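The temperature monitoring daemons 300a, 301a, 302a, and 303a on the computers 300, 301, 302, and 303 read values from the temperature sensors on the same machines and transmit the results to the schedulers 300b, 301b, 302b, and 303b. The schedulers 300b, 301b, 302b, and 303b periodically exchange temperature information with adjacent computers, and when the temperature of a neighboring computer is at or below a certain temperature and is lower than the temperature of the local machine by at least a certain amount, the job on the local machine is moved to that neighboring computer.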
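The Japanese original of this paragraph gives the migration condition: the neighbor must be at or below a certain temperature and cooler than the local machine by at least a fixed amount. A minimal Python sketch of that local decision follows. The numeric limits (50 degrees, a 10-degree gap) and the names `should_migrate` and `pick_neighbor` are assumptions for illustration; the patent does not specify values.

```python
from typing import Dict, Optional


def should_migrate(own_temp: float, neighbor_temp: float,
                   max_neighbor_temp: float = 50.0, min_gap: float = 10.0) -> bool:
    """True when the neighbor is cool enough in absolute terms and
    cooler than the local machine by at least `min_gap`."""
    return neighbor_temp <= max_neighbor_temp and own_temp - neighbor_temp >= min_gap


def pick_neighbor(own_temp: float, neighbor_temps: Dict[int, float],
                  max_neighbor_temp: float = 50.0,
                  min_gap: float = 10.0) -> Optional[int]:
    """Return the coolest eligible neighbor, or None if no neighbor qualifies."""
    eligible = {n: t for n, t in neighbor_temps.items()
                if should_migrate(own_temp, t, max_neighbor_temp, min_gap)}
    if not eligible:
        return None
    return min(eligible, key=lambda n: (eligible[n], n))


# Hypothetical view from computer 302, whose neighbors are 301 and 303.
print(pick_neighbor(65.0, {301: 58.0, 303: 48.0}))  # 303
print(pick_neighbor(52.0, {301: 58.0, 303: 48.0}))  # None
```

Because each scheduler only looks at its neighbors, the decision needs no global table, which is where the reduced scheduling overhead comes from.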
Although the scheduler and the temperature monitoring daemon are separated in FIG. 5, one process may have both functions.
[0013]
FIG. 6 is a diagram showing a fourth embodiment of the present invention. This embodiment shows an embodiment in which a cooling device is provided in the second embodiment.
A scheduler computer 410 and computers 400, 401, 402, and 403 are connected by a network 420. In the scheduler computer 410, a job management server 410a, a temperature information management server 410b, and a job scheduler 410c are running.
Each computer has built-in temperature sensors 40, 41, 42, 43 and cooling devices 50, 51, 52, 53, and runs temperature monitoring daemons 400a, 401a, 402a, 403a. The cooling devices 50, 51, 52, 53 can be controlled from the scheduler 410c on the scheduler computer 410. FIG. 6 shows the case of four computers, but the number is arbitrary.
[0014]
The temperature monitoring daemons 400a, 401a, 402a, and 403a on each computer read the values from the temperature sensors 40, 41, 42, and 43 on the computer, and transmit the results to the temperature information management server 410b on the scheduler computer 410.
The temperature information management server 410b manages the temperature information of each of the computers 400, 401, 402, and 403 using the temperature information management table shown in FIG.
Before submitting a new job to the computers 400, 401, 402, and 403, the scheduler 410c refers to the temperature information management table to check the temperatures of the computers 400, 401, 402, and 403. Further, the job management server 410a manages the running job based on the job management table as described above.
Here, it is assumed that the job 1 is already running on the computer 400 and the job 2 is running on the computer 401.
[0015]
In this case, the scheduler 410c performs scheduling according to the flowchart of FIG. 7 described below.
First, load information such as the CPU usage rate is obtained (step S1), and it is determined from a performance point of view whether jobs 1 and 2 may be consolidated on the same computer (step S2). That is, it is determined whether the sum of the loads of jobs 1 and 2 is 100% or less, so that, for example, the job on the computer 401 could be consolidated onto the computer 400.
If the jobs 1 and 2 cannot be combined on the same computer, the scheduling process ends.
[0016]
If consolidating jobs 1 and 2 on the same computer poses no performance problem, the current temperature and power consumption information is then obtained for evaluation from a cost perspective (step S3), and the power consumption after moving the job is estimated (step S4). This estimate may be a simple calculation, on the order of the difference between lowering the cooling device strength from its current level to the minimum and raising it to the maximum.
If the conclusion is reached that the total power consumption is reduced when the jobs are grouped on the same computer, the jobs are actually moved (step S6). For example, the job 2 is moved to the computer 400. Then, the strength of the cooling device of each computer is set appropriately. For example, the strength of the cooling device 400b of the computer 400 is increased, and the strength of the cooling device 401b of the computer 401 is reduced.
If the total power consumption does not decrease even if the jobs are grouped on the same computer, the scheduling process ends.
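The decision flow of FIG. 7 as described above can be sketched in Python as follows. This is one possible reading, not the patent's definitive procedure: it assumes the emptied computer's cooling drops to its minimum power while the receiving computer's cooling rises to its maximum, and the `Node` type, the wattage figures, and the function name `should_consolidate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    load: float           # CPU usage in percent (gathered in step S1)
    cooling_power: float  # current cooling power draw in watts (step S3)


def should_consolidate(src: Node, dst: Node,
                       cooling_min_w: float, cooling_max_w: float) -> bool:
    """Decide whether to move the job on `src` onto `dst`."""
    # Step S2: performance check; the combined load must fit on one computer.
    if src.load + dst.load > 100.0:
        return False
    # Step S4: simple estimate. Assume src cooling falls to the minimum
    # and dst cooling rises to the maximum after the move.
    power_before = src.cooling_power + dst.cooling_power
    power_after = cooling_min_w + cooling_max_w
    # Move the job only if the estimated total power goes down.
    return power_after < power_before


# Hypothetical figures: job 2 on computer 401 (src), job 1 on computer 400 (dst).
print(should_consolidate(Node(30.0, 40.0), Node(50.0, 45.0), 10.0, 60.0))  # True
print(should_consolidate(Node(70.0, 40.0), Node(50.0, 45.0), 10.0, 60.0))  # False
```

The second call fails at the performance check (120% combined load), so the power estimate is never reached, mirroring the early exit in the flowchart.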
[0017]
Note that the second and third embodiments may be combined to perform hierarchical job scheduling. That is, each computer constituting the parallel computer system is provided with a temperature sensor, a temperature monitoring daemon, and a scheduler, and the scheduler computer is provided with a job information management server that monitors the jobs currently running in the system, a scheduler, and a temperature information management server. The scheduler computer performs job scheduling for the entire system at long intervals, and the scheduler on each computer performs job scheduling among nearby computers at short intervals.
As a result, the optimization efficiency can be increased while suppressing the scheduling overhead.
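The two-level timing can be illustrated with a small Python sketch. The interval lengths (a local pass every tick, a system-wide pass every tenth tick) and the name `passes_for_tick` are assumptions chosen only to show the idea; the patent gives no concrete intervals.

```python
from typing import List


def passes_for_tick(tick: int, local_every: int = 1, global_every: int = 10) -> List[str]:
    """Return which scheduling passes run at a given tick.

    "local" stands for neighbor-to-neighbor scheduling on each computer,
    "global" for system-wide scheduling on the scheduler computer."""
    passes = []
    if tick % local_every == 0:
        passes.append("local")
    if tick % global_every == 0:
        passes.append("global")
    return passes


print(passes_for_tick(3))   # ['local']
print(passes_for_tick(10))  # ['local', 'global']
```

The cheap local pass runs often, while the expensive pass that needs the full temperature and job tables runs rarely, which is the trade-off the text describes.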
Further, in the first to third embodiments, a cooling device may be provided in each computer constituting each parallel computer system, and the cooling device may be controlled by a scheduler.
[0018]
(Supplementary Note 1) A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
A temperature sensor is provided for each computer constituting the parallel computer system,
The job scheduling apparatus is characterized in that a job submission computer is determined based on a value acquired by the temperature sensor.
(Supplementary Note 2) A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
A temperature sensor is provided for each computer constituting the parallel computer system, and a monitoring device for monitoring a job currently being executed in the system is provided.
The job scheduling device, wherein the job being executed is reassigned to another computer based on a monitoring result of the monitoring device and a value acquired by the temperature sensor.
(Supplementary Note 3) A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
A temperature sensor is provided for each computer constituting the parallel computer system, and a monitoring device for monitoring a job currently being executed in the system is provided.
The job scheduling device is characterized in that it determines a computer to which a job is to be submitted, and reassigns a running job to another computer, based on a result of monitoring by the monitoring device and a value obtained by the temperature sensor.
(Supplementary Note 4) A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
Each computer constituting the parallel computer system has a temperature sensor,
The job scheduling device is distributed and arranged in each computer constituting the parallel computer system,
The job scheduling apparatus is characterized in that temperature information is exchanged between nearby computers, and a running job is reassigned to a nearby computer based on the acquired temperature information.
(Supplementary Note 5) A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
Each computer constituting the parallel computer system is provided with a temperature sensor and a monitoring device for monitoring a job currently being executed in the system,
A first job scheduling device that performs job scheduling for each computer constituting the parallel computer system and
a second job scheduling device distributed in each computer are provided,
The first job scheduling device determines a computer to which a job is to be submitted, and reassigns a running job to another computer, based on a result of monitoring by the monitoring device and a value obtained by the temperature sensor,
The second job scheduling device exchanges temperature information between nearby computers and reassigns a running job to a nearby computer based on the acquired temperature information.
(Supplementary Note 6) Each computer constituting the parallel computer system includes a cooling device,
The job scheduling device according to attachments 1, 2, 3, 4 or 5, wherein the job scheduling device controls the cooling device based on a value acquired by the temperature sensor.
(Supplementary Note 7) A job scheduling program in a parallel computer system in which a plurality of computers are connected via a network,
The job scheduling program is characterized in that it causes a computer to execute a process of determining a computer to which a job is to be submitted based on a value obtained by a temperature sensor provided in each computer constituting the parallel computer system.
(Supplementary Note 8) A job scheduling program in a parallel computer system in which a plurality of computers are connected via a network,
The job scheduling program causes a computer to execute a process of reassigning a running job to another computer, based on a value obtained by a temperature sensor provided in each computer constituting the parallel computer system.
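The neighbor-based reassignment described in Supplementary Notes 4 and 5 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name pick_migration_target, the margin parameter, and the neighbor labels are hypothetical, and the rule "migrate only when the local computer is hotter than its coolest neighbor by at least a margin" is an assumed policy chosen for illustration.

```python
from typing import Dict, Optional


def pick_migration_target(own_temp: float,
                          neighbor_temps: Dict[str, float],
                          margin: float = 5.0) -> Optional[str]:
    """Return the nearby computer that should receive a running job,
    or None if the job should stay where it is.

    own_temp       -- temperature read from this computer's sensor
    neighbor_temps -- temperatures received from nearby computers
    margin         -- assumed minimum temperature gap that justifies a move
    """
    if not neighbor_temps:
        return None
    # Pick the coolest neighbor among those that reported a temperature.
    coolest = min(neighbor_temps, key=neighbor_temps.get)
    # Migrate only if this computer is clearly hotter than that neighbor.
    if own_temp - neighbor_temps[coolest] >= margin:
        return coolest
    return None


if __name__ == "__main__":
    print(pick_migration_target(72.0, {"left": 55.0, "right": 60.0}))
    print(pick_migration_target(58.0, {"left": 55.0, "right": 60.0}))
```

Because each computer decides using only its neighbors' values, no central scheduler is needed for this step; the margin is one simple way to keep a job from bouncing back and forth between two computers at similar temperatures.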
[0019]
[Effect of the Invention]
As described above, in the present invention, a temperature sensor is provided in each computer constituting the parallel computer system, and job scheduling is performed based on the temperature distribution information obtained from the sensors. This makes it possible to solve the problem in which heat cannot be sufficiently exhausted, causing failures in the system and degrading its performance. Further, energy saving can be realized, and the running cost of the entire system can be reduced.
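As a rough illustration of scheduling from temperature distribution information, the following Python sketch selects a submission target from a table of sensor readings. The table layout, the busy set, the 70-degree limit, and the function name choose_submission_target are assumptions made for this example; the integer keys merely reuse the reference numerals 100 to 103 as hypothetical computer IDs.

```python
from typing import Dict, Optional, Set


def choose_submission_target(temps: Dict[int, float],
                             busy: Set[int],
                             limit: float = 70.0) -> Optional[int]:
    """Pick the coolest computer that is idle and below the temperature limit.

    temps -- latest sensor value per computer (a stand-in for the
             temperature information management table)
    busy  -- computers that cannot accept a new job right now
    limit -- assumed upper temperature bound for accepting a job
    """
    candidates = {cid: t for cid, t in temps.items()
                  if cid not in busy and t < limit}
    if not candidates:
        return None  # no suitable computer; the job would have to wait
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    temperature_table = {100: 62.5, 101: 48.0, 102: 71.0, 103: 45.5}
    # Computer 103 is the coolest but busy, and 102 is over the limit.
    print(choose_submission_target(temperature_table, busy={103}))  # prints 101
```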
[Brief description of the drawings]
FIG. 1 is a diagram showing a first embodiment of the present invention.
FIG. 2 is a diagram illustrating a configuration example of a temperature information management table.
FIG. 3 is a diagram showing a second embodiment of the present invention.
FIG. 4 is a diagram illustrating a configuration example of a job information management table.
FIG. 5 is a diagram showing a third embodiment of the present invention.
FIG. 6 is a diagram showing a fourth embodiment of the present invention.
FIG. 7 is a flowchart illustrating a process according to the fourth embodiment.
[Explanation of symbols]
100, 101, 102, 103 Computers
200, 201, 202, 203 Computers
300, 301, 302, 303 Computers
400, 401, 402, 403 Computers
120, 220, 320 Networks
110, 210, 410 Scheduler computers
10 to 13, 20 to 23 Temperature sensors
30 to 33, 40 to 43 Temperature sensors
50 to 53 Cooling devices

Claims (5)

A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
wherein a temperature sensor is provided in each computer constituting the parallel computer system, and
the job scheduling device determines the computer to which a job is to be submitted based on a value acquired by the temperature sensor.
A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
wherein each computer constituting the parallel computer system is provided with a temperature sensor and a monitoring device that monitors jobs currently being executed in the system, and
the job scheduling device reallocates a running job to another computer based on a result of monitoring by the monitoring device and a value acquired by the temperature sensor.
A job scheduling device in a parallel computer system in which a plurality of computers are connected via a network,
wherein each computer constituting the parallel computer system has a temperature sensor,
the job scheduling device is distributed and arranged in each computer constituting the parallel computer system, and
the job scheduling device exchanges temperature information between nearby computers and reassigns a running job to a nearby computer based on the acquired temperature information.
The job scheduling device according to claim 1, 2, or 3,
wherein each computer constituting the parallel computer system includes a cooling device, and
the job scheduling device controls the cooling device based on a value acquired by the temperature sensor.
A job scheduling program in a parallel computer system in which a plurality of computers are connected via a network,
wherein the job scheduling program causes a computer to execute a process of determining the computer to which a job is to be submitted, based on a value acquired by a temperature sensor provided in each computer constituting the parallel computer system.
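The reallocation of running jobs from monitoring results and the temperature-based cooling control recited in the claims above can be sketched together in Python. This is only an illustration under stated assumptions: the job table format, the 70-degree limit, the tie-breaking by job count, the linear cooling ramp, and the names plan_reallocations and cooling_level are all hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plan_reallocations(temps: Dict[int, float],
                       jobs: Dict[str, int],
                       limit: float = 70.0) -> List[Tuple[str, int, int]]:
    """Plan moves of running jobs away from computers at or above the limit.

    temps -- latest sensor value per computer
    jobs  -- monitoring result: job name -> computer currently running it
    Returns a list of (job, source, destination) tuples.
    """
    load = Counter(jobs.values())          # running jobs per computer
    cool = [c for c in temps if temps[c] < limit]
    moves = []
    for job, src in sorted(jobs.items()):
        if temps[src] < limit or not cool:
            continue                       # job stays where it is
        # Prefer the least-loaded cool computer, then the coolest one.
        dst = min(cool, key=lambda c: (load[c], temps[c]))
        moves.append((job, src, dst))
        load[dst] += 1
        load[src] -= 1
    return moves


def cooling_level(temp: float, low: float = 50.0, high: float = 70.0) -> float:
    """Map a temperature to a cooling output between 0.0 (off) and 1.0 (full)."""
    if temp <= low:
        return 0.0
    if temp >= high:
        return 1.0
    return (temp - low) / (high - low)


if __name__ == "__main__":
    temps = {200: 75.0, 201: 50.0, 202: 55.0}
    jobs = {"a": 200, "b": 200, "c": 201}
    print(plan_reallocations(temps, jobs))
    print(cooling_level(60.0))
```

Spreading the moved jobs by load before temperature is one way to avoid simply shifting the hot spot onto a single cool computer; a real scheduler would also need to weigh the cost of migrating each job.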
JP2002290695A 2002-10-03 2002-10-03 Job scheduling system for parallel computer Pending JP2004126968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2002290695A JP2004126968A (en) 2002-10-03 2002-10-03 Job scheduling system for parallel computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2002290695A JP2004126968A (en) 2002-10-03 2002-10-03 Job scheduling system for parallel computer

Publications (1)

Publication Number Publication Date
JP2004126968A true JP2004126968A (en) 2004-04-22

Family

ID=32282479

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2002290695A Pending JP2004126968A (en) 2002-10-03 2002-10-03 Job scheduling system for parallel computer

Country Status (1)

Country Link
JP (1) JP2004126968A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0816531A (en) * 1994-06-28 1996-01-19 Hitachi Ltd Process schedule system
JPH11296488A (en) * 1998-04-09 1999-10-29 Hitachi Ltd Electronic equipment

Cited By (91)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9904583B2 (en) 2004-04-15 2018-02-27 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US9037833B2 (en) 2004-04-15 2015-05-19 Raytheon Company High performance computing (HPC) node having a plurality of switch coupled processors
US10289586B2 (en) 2004-04-15 2019-05-14 Raytheon Company High performance computing (HPC) node having a plurality of switch coupled processors
US10621009B2 (en) 2004-04-15 2020-04-14 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US9594600B2 (en) 2004-04-15 2017-03-14 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US9178784B2 (en) 2004-04-15 2015-11-03 Raytheon Company System and method for cluster management based on HPC architecture
US10769088B2 (en) 2004-04-15 2020-09-08 Raytheon Company High performance computing (HPC) node having a plurality of switch coupled processors
US11093298B2 (en) 2004-04-15 2021-08-17 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US8984525B2 (en) 2004-04-15 2015-03-17 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US8910175B2 (en) 2004-04-15 2014-12-09 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US9189278B2 (en) 2004-04-15 2015-11-17 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US9832077B2 (en) 2004-04-15 2017-11-28 Raytheon Company System and method for cluster management based on HPC architecture
US9928114B2 (en) 2004-04-15 2018-03-27 Raytheon Company System and method for topology-aware job scheduling and backfilling in an HPC environment
US7520669B2 (en) 2004-06-04 2009-04-21 Sony Computer Entertainment Inc. Processor, processor system, temperature estimation device, information processing device, and temperature estimation method
KR100831854B1 (en) * 2004-06-04 2008-05-22 가부시키가이샤 소니 컴퓨터 엔터테인먼트 Processor, processor system, temperature estimation device, information processing device, and temperature estimation method
WO2005119402A1 (en) 2004-06-04 2005-12-15 Sony Computer Entertainment Inc. Processor, processor system, temperature estimation device, information processing device, and temperature estimation method
KR100634931B1 (en) 2004-06-14 2006-10-17 인텔 코오퍼레이션 A temperature-aware steering mechanism
US7831842B2 (en) 2004-06-22 2010-11-09 Sony Computer Entertainment Inc. Processor for controlling performance in accordance with a chip temperature, information processing apparatus, and method of controlling processor
CN100432943C (en) * 2004-06-22 2008-11-12 索尼计算机娱乐公司 Processor, information processing device and method for controlling processor
WO2005124550A1 (en) * 2004-06-22 2005-12-29 Sony Computer Entertainment Inc. Processor, information processor and control method of processor
US8086880B2 (en) 2004-07-05 2011-12-27 Sony Corporation Information processing apparatus, information processing method, and computer program
JP2006018758A (en) * 2004-07-05 2006-01-19 Sony Corp Information processor, information processing method, and program
JP2008521127A (en) * 2004-11-17 2008-06-19 レイセオン カンパニー Fault tolerance and recovery in high performance computing (HPC) systems
WO2006134775A1 (en) * 2005-06-15 2006-12-21 Matsushita Electric Industrial Co., Ltd. Electronic circuit
JP2007179437A (en) * 2005-12-28 2007-07-12 Fujitsu Ltd Management system, management program and management method
US8751653B2 (en) 2005-12-28 2014-06-10 Fujitsu Limited System for managing computers and pieces of software allocated to and executed by the computers
US7721052B2 (en) 2006-09-29 2010-05-18 Hitachi, Ltd. System and method of reducing power consumption of a main memory
JP2008165301A (en) * 2006-12-27 2008-07-17 Fujitsu Ltd Load aggregation program, recording medium storing it, load aggregation device, and load aggregation method
JP2009134716A (en) * 2007-11-28 2009-06-18 Internatl Business Mach Corp <Ibm> Method for giving sharing cache line in multiprocessor data processing system, computer readable medium and multiprocessor data processing system
JP2009193509A (en) * 2008-02-18 2009-08-27 Fujitsu Ltd Information processor, information processing method, and information processing program
US9389664B2 (en) 2008-04-09 2016-07-12 Hitachi, Ltd. Operations management methods and devices thereof in systems
JP4724730B2 (en) * 2008-04-09 2011-07-13 株式会社日立製作所 Information processing system operation management method, operation management program, operation management apparatus, and information processing system
JP2011040083A (en) * 2008-04-09 2011-02-24 Hitachi Ltd Operation management method, operation management program, and operation management device for information processing system, and information processing system
US9128704B2 (en) 2008-04-09 2015-09-08 Hitachi, Ltd. Operations management methods and devices thereof in information-processing systems
JP2009252056A (en) * 2008-04-09 2009-10-29 Hitachi Ltd Method and device for operation management of information-processing system
EP2109028A2 (en) 2008-04-09 2009-10-14 Hitachi Ltd. Operations management methods and devices in information processing systems
US20090271608A1 (en) * 2008-04-25 2009-10-29 Gooding Thomas M Temperature Threshold Application Signal Trigger for Real-Time Relocation of Process
US8250383B2 (en) * 2008-04-25 2012-08-21 International Business Machines Corporation Temperature threshold application signal trigger for real-time relocation of process
JP2010015192A (en) * 2008-06-30 2010-01-21 Hitachi Ltd Information processing system, and electric power saving control method in the same
US8200995B2 (en) 2008-06-30 2012-06-12 Hitachi, Ltd. Information processing system and power-save control method for use in the system
US8145927B2 (en) 2008-09-17 2012-03-27 Hitachi, Ltd. Operation management method of information processing system
CN102099791B (en) * 2008-09-17 2012-11-07 株式会社日立制作所 Operation management method of infromation processing system
JPWO2010032501A1 (en) * 2008-09-17 2012-02-09 株式会社日立製作所 Operation management method of information processing system
JP4751962B2 (en) * 2008-09-17 2011-08-17 株式会社日立製作所 Operation management method of information processing system
WO2010032501A1 (en) 2008-09-17 2010-03-25 株式会社日立製作所 Operation management method of infromation processing system
US8155793B2 (en) 2008-09-25 2012-04-10 Hitachi, Ltd. System and method for controlling air conditioning facilities, and system and method for power management of computer room
US8397089B2 (en) 2008-10-29 2013-03-12 Hitachi, Ltd. Control method with management server apparatus for storage device and air conditioner and storage system
US8429434B2 (en) 2008-10-29 2013-04-23 Hitachi, Ltd. Control method with management server apparatus for storage device and air conditioner and storage system
JP4768082B2 (en) * 2008-10-30 2011-09-07 株式会社日立製作所 Information management system operation management device
US8127298B2 (en) 2008-10-30 2012-02-28 Hitachi, Ltd. Operations management apparatus of information-processing system
WO2010050249A1 (en) * 2008-10-30 2010-05-06 株式会社日立製作所 Operation management apparatus of information processing system
US9672055B2 (en) 2009-03-03 2017-06-06 Sony Corporation Information processing system having two sub-systems with different hardware configurations which enable switching therebetween
JP2010204962A (en) * 2009-03-03 2010-09-16 Sony Corp Information-processing system
US8584134B2 (en) 2009-07-07 2013-11-12 Fujitsu Limited Job assigning apparatus and job assignment method
US9665407B2 (en) 2009-08-18 2017-05-30 International Business Machines Corporation Decentralized load distribution to reduce power and/or cooling costs in an event-driven system
JP2013502642A (en) * 2009-08-18 2013-01-24 インターナショナル・ビジネス・マシーンズ・コーポレーション Decentralized load balancing method and computer program in event-driven system
JP2011076158A (en) * 2009-09-29 2011-04-14 Nec Corp Server operation system and server operation method
JP2011141672A (en) * 2010-01-06 2011-07-21 Nec Computertechno Ltd Information processor and method for controlling the same
JP2011170647A (en) * 2010-02-19 2011-09-01 Nec Corp Signal processing apparatus, signal processing method and program
JP2011170751A (en) * 2010-02-22 2011-09-01 Nec Corp Bus system
JP2011197715A (en) * 2010-03-17 2011-10-06 Fujitsu Ltd Load distribution system and computer program
US9152472B2 (en) 2010-03-17 2015-10-06 Fujitsu Limited Load distribution system
US9037888B2 (en) 2010-03-31 2015-05-19 Fujitsu Limited Multi-core processor system, electrical power control method, and computer product for migrating process from one core to another
JP5472449B2 (en) * 2010-03-31 2014-04-16 富士通株式会社 Multi-core processor system, power control method, and power control program
WO2011121786A1 (en) * 2010-03-31 2011-10-06 富士通株式会社 Multi-core processor system, power control method, and power control program
JP2012021711A (en) * 2010-07-15 2012-02-02 Fujitsu Ltd System and method for controlling air conditioning
JP2013058257A (en) * 2010-09-16 2013-03-28 Hitachi Ltd Operation management method, operation management program, and operation management device for information processing system, and information processing system
US8782660B2 (en) 2010-09-28 2014-07-15 Fujitsu Limited Computing system and job allocation method
JP2012104576A (en) * 2010-11-09 2012-05-31 Ntt Facilities Inc Cooperative control method of air conditioning device with data processing distribution
WO2012081092A1 (en) * 2010-12-15 2012-06-21 富士通株式会社 Air-conditioning control system and airflow adjustment device
JP2012169494A (en) * 2011-02-15 2012-09-06 Nec Corp Cooling system and method for efficiently cooling apparatus
JP2013037439A (en) * 2011-08-04 2013-02-21 Fujitsu Ltd Information processing system and information processing method
EP2575003A1 (en) 2011-09-28 2013-04-03 Hitachi Ltd. Method for determining assignment of loads of data center and information processing system
EP2587339A2 (en) 2011-10-27 2013-05-01 Hitachi Ltd. Information processing system, and its power-saving control method and device
US9507392B2 (en) 2011-10-27 2016-11-29 Hitachi, Ltd. Information processing system, and its power-saving control method and device
US9883617B2 (en) 2013-02-28 2018-01-30 Hitachi, Ltd. Air-conditioning control apparatus for data center
CN103530192A (en) * 2013-10-25 2014-01-22 上海交通大学 Low-energy-consumption reliability scheduling method based on solar energy sensing
US9588577B2 (en) 2013-10-31 2017-03-07 Samsung Electronics Co., Ltd. Electronic systems including heterogeneous multi-core processors and methods of operating same
JP2015161489A (en) * 2014-02-28 2015-09-07 富士通株式会社 Data center, control program of control device and control method of data center
JP2015230686A (en) * 2014-06-06 2015-12-21 富士通株式会社 Information processor, control method of information processor and control program of information processor
US10203670B2 (en) 2014-06-06 2019-02-12 Fujitsu Limited Information processing equipment and method for controlling information processing equipment
JP2015195031A (en) * 2015-04-14 2015-11-05 任天堂株式会社 Information processor, information processing system and information processing method
CN106095640A (en) * 2016-05-31 2016-11-09 联想(北京)有限公司 A kind of control method and electronic equipment
US10359820B2 (en) 2016-05-31 2019-07-23 Lenovo (Beijing) Co., Ltd. Electronic device and control method thereof
CN106095640B (en) * 2016-05-31 2019-07-26 联想(北京)有限公司 A kind of control method and electronic equipment
JP2018018340A (en) * 2016-07-28 2018-02-01 富士通株式会社 Program, management method and management device
US10261837B2 (en) 2017-06-30 2019-04-16 Sas Institute Inc. Two-part job scheduling with capacity constraints and preferences
US10310896B1 (en) 2018-03-15 2019-06-04 Sas Institute Inc. Techniques for job flow processing
US11526378B2 (en) 2019-01-16 2022-12-13 Toyota Jidosha Kabushiki Kaisha Information processing device and information processing method
CN111615298A (en) * 2020-05-15 2020-09-01 维沃移动通信有限公司 Heat dissipation method and device
JP7448703B2 (en) 2020-12-24 2024-03-12 株式会社日立製作所 Information processing system and data arrangement method in the information processing system

Legal Events

Date Code Title Description
2005-09-15 A621 Written request for application examination (Free format text: JAPANESE INTERMEDIATE CODE: A621)
2007-04-05 A977 Report on retrieval (Free format text: JAPANESE INTERMEDIATE CODE: A971007)
2007-04-10 A131 Notification of reasons for refusal (Free format text: JAPANESE INTERMEDIATE CODE: A131)
2007-06-06 A521 Written amendment (Free format text: JAPANESE INTERMEDIATE CODE: A523)
2007-11-06 A02 Decision of refusal (Free format text: JAPANESE INTERMEDIATE CODE: A02)