JP4209109B2

JP4209109B2 - Container terminal operation optimization system

Info

Publication number: JP4209109B2
Application number: JP2001388533A
Authority: JP
Inventors: 一浩武多; 昭井上; 洋一平嶋
Original assignee: Mitsubishi Heavy Industries Ltd
Current assignee: Mitsubishi Heavy Industries Ltd
Priority date: 2001-12-20
Filing date: 2001-12-20
Publication date: 2009-01-14
Anticipated expiration: 2021-12-20
Also published as: JP2003182854A

Description

【０００１】
【発明の属する技術分野】
本発明は、コンテナターミナル運用最適化システムに関し、特にコンテナを効率良くコンテナ船に積み込む技術に関する。
【０００２】
【従来の技術】
船舶を利用した貨物輸送は世界各国の港で従来から盛んに行われてきた。通常、各貨物はコンテナによって管理されるため、港では船舶に対してコンテナの荷役作業が不可欠である。コンテナを船舶に積み込む場合、各コンテナの輸送先が決まっており、船舶内ではコンテナを移動することができないので、予め決定された積載順にコンテナを並べておく必要がある。コンテナは各地からトラックによって運び込まれ、所定の領域に積み上げられる。この領域をベイ、ベイの集合をヤードエリアと呼ぶ。ヤードエリアのコンテナは積載順を無視して並べられるため、船舶への積み込み前にコンテナを配置換えする必要が生じる。配置換え後、コンテナはバッファエリアと呼ばれる領域に保管される。
【０００３】
一方、港に停泊する船舶からは停泊料が徴収される。料金は停泊時間を基準にして決定されるため、長時間停泊すると料金が高額になってしまう。このためコンテナの配置替えを効率よく行うことによって停泊時間をできるだけ短くする方法が求められている。
【０００４】
コンテナの配置替えを行う場合、コンテナの配置と移動の組み合わせを全部記憶しておくことによって、最短時間の配置替え順序を探索することができる。しかし、配置と移動の組み合わせ数は、コンテナ数が増加するにつれて指数関数的に増大するため、コンテナ数が大きい場合は全ての組み合わせを記憶・利用することは困難である。そこで、初期状態から局所的な探索をはじめ、解の改善を行う手法が必要になる。
【０００５】
上記説明と関連して、特開平９−１２１１６号公報には、コンテナ配置替え計画作成方法が開示されている。この引例では、Ｎ台の搬送手段が同期して搬送物をセットとして一度に運ぶ場合において、セットの搬送時間を最小にすることを目的としている。
【０００６】
また、特開平９−２６７９１８号公報にはコンテナターミナルにおける本線荷役計画作成方法が開示されている。この引例では、岸壁荷役用クレーンの作業時間とコンテナヤード荷役用クレーンの作業時間の調整について述べている。
【０００７】
【発明が解決しようとする課題】
従って、本発明の課題は、短時間で荷役作業を行うことができるコンテナターミナル運用最適化システムとその方法を提供することである。
【０００８】
本発明の他の課題は、コンテナの最小移動を通して荷役作業を完了することができるように荷役作業をシミュレーションし、その結果に基づいて荷役作業を行うことができるコンテナターミナル運用最適化システムとその方法を提供することである。
【０００９】
本発明の他の課題は、上記シミュレーションにQ-Learning法を適用してシミュレーションの効率を改善したコンテナターミナル運用最適化システムとその方法を提供することである。
【００１０】
本発明の他の課題は、Q-Learning法の適用に際し、必要とされるメモリ容量を減らすことができるコンテナターミナル運用最適化システムとその方法を提供することである。
【００１１】
【課題を解決するための手段】
以下に、［発明の実施の形態］で使用する番号・符号を用いて、課題を解決するための手段を説明する。これらの番号・符号は、[特許請求の範囲]の記載と［発明の実施の形態］の記載との対応関係を明らかにするために付加されたものであるが、［特許請求の範囲］に記載されている発明の技術的範囲の解釈に用いてはならない。
【００１２】
本発明の第１の観点で、コンテナターミナル運用最適化システムは、ヤードエリアに保管されているコンテナの配置位置を示すヤード配置データを格納するヤード配置データベース（３６）と、バッファエリアに保管されるべき移動対象コンテナの希望配置位置を示すバッファ配置データを格納するバッファ配置データベース（３８）と、コンテナ船の輸送先を示すデータから前記バッファ配置データを作成し、Ｑ−ｌｅａｒｎｉｎｇ法を用いて、前記ヤード配置データと前記バッファ配置データから最小移動回数で前記移動対象コンテナを前記ヤードエリアから前記バッファエリアに移動する手順を決定する制御部（４０）とを具備するコンテナターミナル運用最適化システムは、前記移動対象コンテナを移動するためのトランスファクレーン（１０）を更に具備し、前記制御部（４０）は、前記決定された手順に従って、前記移動対象コンテナを前記ヤードエリア（４）から前記バッファエリア（６）に移動する順番を前記トランスファクレーン（１０）に指示する制御指示装置（４４）を更に具備する。
【００１３】
前記トランスファクレーン（１０）は、前記ヤードエリアに沿って設けられたレール（１２）上を移動する。
【００１４】
前記制御部（４０）は、Ｑ−ｌｅａｒｎｉｎｇ法を実行するシミュレーション部（４２）を具備し、前記シミュレーション部（４２）は、Ｑ−ｌｅａｒｎｉｎｇ法を実行するとき、移動可能な前記移動対象コンテナを前記ヤードエリア（４）から前記バッファエリア（６）に移動し、残りの前記コンテナに前記Ｑ−ｌｅａｒｎｉｎｇ法を実行する。このとき、前記残りのコンテナ中の残りの前記移動対象コンテナの順列について前記Ｑ−ｌｅａｒｎｉｎｇ法を実行する。
【００１５】
本発明の第２の観点では、コンテナターミナル運用最適化方法は、コンテナ船の輸送先を示すデータからバッファ配置データを作成するステップと、前記バッファ配置データは、バッファエリアに保管されるべき移動対象コンテナの希望配置位置を示し、Ｑ−ｌｅａｒｎｉｎｇ法を用いて、ヤード配置データと前記バッファ配置データから最小移動回数で移動対象コンテナをヤードエリアからバッファエリアに移動する手順を決定するステップと、前記ヤード配置データは、前記ヤードエリアに保管されているコンテナの配置位置を示し、前記決定された手順に従って、前記移動対象コンテナを前記ヤードエリアから前記バッファエリアに移動するステップとを具備する。
【００１６】
前記決定するステップは、移動可能な前記移動対象コンテナを前記ヤードエリアから前記バッファエリアに移動し、残りの前記コンテナに前記Ｑ−ｌｅａｒｎｉｎｇ法を実行するステップを具備する。
【００１７】
前記実行するステップは、残りのコンテナ中の残りの前記移動対象コンテナの順列について前記Ｑ−ｌｅａｒｎｉｎｇ法を実行する。
【００１８】
本発明の第３の観点では、コンテナターミナル運用最適化方法は、（ａ）配置替え可能な移動対象コンテナが存在するか否かを判定するステップと、（ｂ）配置替え可能な前記移動対象コンテナが存在するとき、全ての前記移動可能コンテナをヤードエリアからバッファエリアに移動するステップと、（ｃ）配置替え可能ではない前記移動対象コンテナの特定のものを選択するステップと、（ｄ）Ｑ値を更新しながら前記移動対象コンテナを前記ヤードエリアから前記バッファエリアに移動するステップと、（ｅ）前記移動対象コンテナが前記ヤードエリアに存在しなくなるまで、前記ステップ（ａ）から（ｄ）までを繰り返すステップとを具備する。
【００１９】
前記ステップ（ａ）から（ｅ）までを、前記移動対象コンテナを並べたときの全ての順列に対して実行して、コンテナの移動回数が最も少ない順列を決定する。また、前記ステップ（ｄ）は、前記移動対象コンテナ以外のコンテナをヤード内で移動して前記特定の移動対象コンテナを見いだし、前記特定の移動対象コンテナを前記ヤードエリアから前記バッファエリアに移動するステップを具備する。
【００２０】
近年、わが国のコンテナターミナルの整備の立ち遅れやサービス水準の低下が指摘されており、中枢国際港湾の整備が緊急の課題となっている。このような状況の中でコンテナターミナルのオペレーションに着目し、そのスピードアップを図るために、試行錯誤を通じて最適解を導出する強化学習の一手法であるQ-Learning法をコンテナの配置替え荷役作業に適用し、短時間で荷役作業を終えることができる最適な運用化システムが提供される。
【００２１】
【発明の実施の形態】
以下に添付図面を参照して、本発明のコンテナターミナル運用最適化システムについて詳細に説明する。
【００２２】
図１は、本発明の実施の形態が適用されるコンテナターミナルと最適化システムを示している。コンテナターミナルは、ヤードエリア４とバッファエリア６とを有する。ヤードエリア４は、複数のベイ８からなる。ヤードエリア４とバッファエリア６の両側にはトランスファクレーン１０用のレール１２が設けられている。トランスファクレーン１０は、レール１２に沿って、ヤードエリア４とバッファエリア６とを移動する。最適化システムは、制御装置２を含み、制御装置２は、トランスファークレーン１０の動作を監視し、制御する。
【００２３】
コンテナトラックで運ばれてきたコンテナは、トランスファクレーン１０によりいずれかのベイ８に積み上げられる。コンテナ船２０の入港が予定されるときには、制御装置２は、輸送先を示すデータからコンテナの最適な荷役作業を計画し、その計画に従ってトランスファクレーン１０にヤードエリア４からバッファエリア６にコンテナを移動する順番を指示する。その後、コンテナ船２０が埠頭に着岸したときには、トランスファクレーン１０は、制御装置２からの指示に従ってヤードシャーシ１４にコンテナを積む。ヤードシャーシ１４は、コンテナを積んだまま埠頭に移動する。岸壁荷役用クレーン２２は、ヤードシャーシ１４に積載されていたコンテナをコンテナ船２０の内部に順番に移動する。こうして、効率よくコンテナはヤードエリア４からコンテナ船２０に積み込まれることができる。
【００２４】
制御装置２は、ヤード配置データベース３６、バッファ配置データベース３８、制御部４０、入力装置３２、出力装置３４とを有する。制御部４０は、シミュレータ４２と制御指示装置４４とを備えている。ヤード配置データベース３６は、各コンテナのヤードエリア４での保管位置を示すヤード配置データを格納している。また、バッファ配置データベース３８は、バッファエリア６に移動されたコンテナの一時保管位置を示すバッファ配置データを格納している。コンテナ船２０の入港が予定されるときには、コンテナ船２０の輸送先を示すデータが入力装置３２からシミュレータ４２に入力される。シミュレータ４２は、輸送先データとヤード配置データとからバッファ配置データを作成する。その後、シミュレータ４２は、ヤード配置データとバッファ配置データから、コンテナ船２０に積み込まれるべきコンテナの最適な順番をシミュレートして配置替え計画を立案する。立案された計画は出力装置３４に出力される。また、制御指示装置４４は、その計画に従って指示をトランスファクレーン１０に出力する。こうして、コンテナの移動回数が最小となる最適な順番でコンテナはヤードエリア４からバッファエリア８に配置替えされることができる。
【００２５】
次に、コンテナの配置替え作業について説明する。
【００２６】
先ず、図３を参照して、ヤードエリア４とバッファエリア６について説明する。一例として、図３（ａ）に示されるように、ヤードエリア４に２つのベイ８が存在するとする。この場合、ベイ８に積み上げられているコンテナには、図示のように番号がつけられている。また、輸送先はヤード配置データとリンクされている。コンテナ船２０の輸送先データに基づいて、バッファエリア８に積み上げられるべきコンテナの位置が図３（ｂ）に示されるとおりであるとする。実際にベイ８に積み上げられているコンテナの状態が図３（ｃ）に示されている。このとき、コンテナを図３（ｂ）に示されるようにコンテナを最も効率よく積み上げることを考える。
【００２７】
図７は、荷繰り、配置替え操作の第１例を示している。この例では、バッファエリア８に移動され配置されるコンテナの順番は、Ｃ１、Ｃ２，Ｃ３、Ｃ４、Ｃ５、Ｃ６、Ｃ７、Ｃ８、Ｃ９の順である。このための操作を図４と図５を参照して説明する。
【００２８】
図４（ａ）から（ｆ）と図５（ｇ）から（ｍ）は、図７に示される荷役作業計画のステップに対応している。左２つは、ベイ８でのコンテナの保管状況を示し、右端はバッファエリア８でのコンテナの一時保管状況を示している。
【００２９】
図４（ａ）は、最初のステップを示している。図３（ｃ）に示される状態から、コンテナＣ１を配置替えするためにコンテナＣ９が荷繰りされる。次に、第２ステップでは、コンテナＣ１は移動可能であるので、コンテナＣ１がバッファエリア８に移動される。次に、第３ステップでは、コンテナＣ２のためにコンテナＣ７が荷繰りされる。続いて第４ステップでは、コンテナＣ２がバッファエリア８に配置替えされる。第５ステップでは、コンテナＣ５が荷繰りされ、第６ステップでは、コンテナＣ３がバッファエリア８に配置替えされる。第７ステップでは、コンテナＣ４がバッファエリア８に配置替えされ、続いて第８ステップではコンテナＣ５がバッファエリア８に配置替えされる。次に、第９ステップでは、コンテナＣ９が荷繰りされ、第１０ステップではコンテナＣ６がバッファエリア８に配置替えされる。続いて、第１１ステップでは、コンテナＣ７が配置替えされ、第１２ステップではコンテナＣ８がバッファエリア８に配置替えされる。最後に、第１３ステップでコンテナＣ９がバッファエリア８に配置替えされる。荷役作業計画の第１例では、コンテナは、Ｃ１、Ｃ２、Ｃ３、Ｃ４、Ｃ５、Ｃ６、Ｃ７、Ｃ８、Ｃ９の順で配置替えされている。このように、Ｃ１からＣ９までのコンテナを、図３（ｂ）に示される所望の順番に積み上げるために、１３ステップを要することになる。
【００３０】
次に、図８は、荷役作業計画の第２例を示している。この例では、バッファエリア８に移動され配置されるコンテナの順番は、Ｃ２、Ｃ５，Ｃ３、Ｃ６、Ｃ９、Ｃ１、Ｃ４、Ｃ７、Ｃ８の順である。このための操作を図６を参照して説明する。図６（ａ）から（ｊ）は、図８のステップに対応している。左２つは、ベイ８でのコンテナの保管状況を示し、右端はバッファエリア８でのコンテナの一時保管状況を示している。
【００３１】
図６（ａ）は、最初のステップを示している。図３（ｃ）に示される状態から、コンテナＣ２を配置替えするためにコンテナＣ７が荷繰りされる。第１ステップで、ヤードエリア４で荷替え可能コンテナは、Ｃ６、Ｃ９、Ｃ４、Ｃ８、Ｃ７、Ｃ５であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ２、Ｃ３であり、実際に荷替え可能なコンテナは存在しない。そこで、コンテナＣ７が荷繰りされる。ここで、コンテナＣ７が選択されたのは、Ｑ−ｌｅａｒｎｉｎｇ法による学習の結果である。
【００３２】
第２ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ６、Ｃ９、Ｃ４、Ｃ７、Ｃ２、Ｃ５であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ２、Ｃ３であり、コンテナＣ２は配置替え可能である。従って、第２ステップでコンテナＣ２がバッファエリア８に配置替えされる。
【００３３】
次に、第３ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ６、Ｃ９、Ｃ４、Ｃ７、Ｃ５であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ５、Ｃ３であり、コンテナＣ５は配置替え可能である。従って、第３ステップでコンテナＣ５がバッファエリア８に配置替えされる。
【００３４】
次に、第４ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ６、Ｃ９、Ｃ４、Ｃ７、Ｃ３であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ８、Ｃ３であり、コンテナＣ３は配置替え可能である。従って、第４ステップでコンテナＣ３がバッファエリア８に配置替えされる。
【００３５】
次に、第５ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ６、Ｃ９、Ｃ４、Ｃ７であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ８、Ｃ６であり、コンテナＣ６は配置替え可能である。従って、第５ステップでコンテナＣ６がバッファエリア８に配置替えされる。
【００３６】
次に、第６ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ９、Ｃ４、Ｃ７であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ８、Ｃ９であり、コンテナＣ９は配置替え可能である。従って、第６ステップでコンテナＣ９がバッファエリア８に配置替えされる。
【００３７】
次に、第７ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ１、Ｃ４、Ｃ７であるが、バッファエリア８では荷替え可能コンテナはＣ１、Ｃ８であり、コンテナＣ１は配置替え可能である。従って、第７ステップでコンテナＣ１がバッファエリア８に配置替えされる。
【００３８】
次に、第８ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ４、Ｃ７であるが、バッファエリア８では荷替え可能コンテナはＣ４、Ｃ８であり、コンテナＣ４は配置替え可能である。従って、第８ステップでコンテナＣ４がバッファエリア８に配置替えされる。
【００３９】
次に、第９ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ７であるが、バッファエリア８では荷替え可能コンテナはＣ７、Ｃ８であり、コンテナＣ７は配置替え可能である。従って、第９ステップでコンテナＣ７がバッファエリア８に配置替えされる。
【００４０】
最後に、第１０ステップでは、ヤードエリア４で荷替え可能コンテナは、Ｃ８であるが、バッファエリア８では荷替え可能コンテナはＣ８であり、コンテナＣ８は配置替え可能である。従って、第１０ステップでコンテナＣ８がバッファエリア８に配置替えされる。
【００４１】
こうして、荷役作業計画の第２例では、１０ステップでコンテナＣ１からコンテナＣ９が所望の位置に配置替えされている。一方、第１例では、１３ステップかかっている。このように、配置替えの手順を考えるだけで最小の移動回数でコンテナをヤードエリア４からバッファエリア６に配置替えすることができる。
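[0042]
Next, consider applying the Q-learning method, a reinforcement learning method, to simulate such efficient rearrangement. In the Q-learning method, a Q value is the evaluation value of a given state and a given action. For the rearrangement problem, the Q value Q(s,a) is defined by the container arrangement state (s) and an operation (a), which is either a reshuffling operation within yard area 4 or a relocation operation from yard area 4 to buffer area 6. By updating Q(s,a) with the algorithm below so that it approaches its optimal value, a rearrangement plan that minimizes the number of container movements is obtained.
(1) Observe the container state s.
(2) Execute a container movement action a according to an arbitrary action selection method.
(3) Receive the reward r for the movement action.
(4) Observe the container state s' after the rearrangement.
(5) Update the Q value with the following update rule.
Q(s,a) ← (1−α) × Q(s,a) + α[r + γ · max_a' Q(s',a')]
where α is the learning rate (0<α<1) and γ is the discount rate (0<γ<1).
(6) Advance the time step t to t+1 and return to step (1).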
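The update rule in step (5) can be sketched as a small tabular routine. This is a minimal illustration, not the patent's implementation: the function name `q_update`, the string state labels, and the value alpha = 0.5 are assumptions chosen so the arithmetic is easy to follow.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.8):
    """Apply Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    # A terminal state has no further actions, so its best future value is 0.
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0

# Hypothetical two-step episode: a reshuffle, then a relocation that finishes the job.
q_update(q, "one-left", "relocate", r=1.0, s_next="done", next_actions=[])
q_update(q, "start", "reshuffle", r=0.0, s_next="one-left", next_actions=["relocate"])
```

Updating the final move first lets its value flow back to the earlier reshuffle in the same pass, which is the propagation the text relies on.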
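[0043]
In the conventional Q-learning method, let x_i and u_j (i = 1, ..., m; j = 1, ..., n) be the state variables and inputs of the plant, respectively. A Q value is assigned as the evaluation of each combination (x_i, u_j). A set of Q values is called a Q-value table, and each u_j has one corresponding Q-value table. Therefore n Q-value tables are needed to evaluate all combinations (x_i, u_j). The input of a Q-value table is x_i and its output is Q(x_i, u_j). When the plant characteristics are unknown, Q(x_i, u_j) is also unknown. Therefore, letting Q_t(x_i, u_j) be the estimate of Q(x_i, u_j) at time t, it is updated by the following equation (1):
[Equation 1]

Here, 0<γ<1 is the discount rate and α is the learning rate. R_t is the reward, given when the control input at time t−1 is u_j and the plant state at time t is x'. Equation (1) updates only the single Q value corresponding to (x_i, u_j).
[0044]
In the early stage of learning, Q(x_i, u_j) must be updated repeatedly for many (x_i, u_j). For this reason, a method that selects the input u_j according to the Boltzmann distribution of the following equation (2) is widely used.
[Equation 2]

where T is the temperature constant.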
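The image for equation (2) is not reproduced in this text, so the sketch below assumes the usual softmax form of Boltzmann selection, in which the probability of input u_j is proportional to exp(Q(x, u_j)/T). The function names are hypothetical; T = 0.1 is the value reported later in the description.

```python
import math
import random


def boltzmann_probs(q_values, temperature=0.1):
    """Return selection probabilities proportional to exp(Q / T)."""
    # Subtracting the maximum keeps exp() from overflowing at small T.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature=0.1, rng=random):
    """Draw the index of one action according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


probs = boltzmann_probs([1.0, 0.8], temperature=0.1)
choice = select_action([1.0, 0.8])
```

A lower temperature concentrates probability on the highest Q value, while a higher one keeps exploration broad.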
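[0045]
Now consider the rearrangement problem for k containers. Each container has a unique identifier from c_1 to c_k. The containers are stacked randomly in a single bay 8. Because the order in which containers are loaded onto the ship is fixed, the containers are rearranged from bay 8 to a region called buffer area 6. The arrangement in buffer area 6 is determined in advance according to the loading order. The bay has m_y rows and n_y columns, the stacking height is l, and the buffer area has m_b rows and n_b columns. Positions in the yard area are identified by integers from 1 to m_y × n_y × l. The position of container c_i (i = 1, ..., k) in the yard area is denoted x_i (1 ≤ x_i ≤ m_y × n_y × l), and the yard arrangement is written x = [x_1, ..., x_k], where x_i = 0 if c_i is in the buffer area. Containers are moved from the bay to the buffer area and stacked from the bottom up, so the number of containers that can be moved to the buffer area at any time is limited to the number of buffer columns n_b. To rearrange a container, the target container c_T is first chosen from the relocation candidates uc_j (j = n_y, ..., n_b + n_y − 1). Next, if other containers lie on top of c_T, all of them are moved to one of the other columns in the bay, [um_1, ..., um_{n_y−1}]. This is called reshuffling, and a container being reshuffled is denoted c_M. Then c_T is relocated to the buffer area. There are n_y − 1 choices of reshuffling destination per container and n_b choices of relocation, and each reshuffle or relocation changes the state of the yard area. The plant action is therefore expressed as follows.
u_j = um_j (1 ≤ j ≤ n_y − 1)
    = uc_j (n_y ≤ j ≤ n_b + n_y − 1)
In this case, the plant can be expressed by the following equation (3).
[Equation 3]

Here, f() is the function that applies action u_j to the yard and buffer areas. In FIG. 3, m_y = n_y = m_b = n_b = 3, l = 2, and k = 9, and container positions are identified by integers from 1 to 18. For this plant, in FIG. 6(a) the relocation target c_T is chosen as c_2 from among [c_1, c_2, c_3], and container c_7 on top of c_2 is reshuffled. Its reshuffling destination is chosen from [um_1, um_2] = [13, 18], and the next relocation target is c_5. Choosing 13 rather than 18 as the destination of c_7 reduces by one the number of reshuffles needed to relocate c_5. In addition, the relocation order 2, 5, 3, 6, 9, 1, 4, 7, 8 of the second example needs fewer reshuffles than the order 1, 2, 3, 4, 5, 6, 7, 8, 9 of the first example. The changes in x for the two relocation orders are as shown in FIGS. 7 and 8.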
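[0046]
Regarding the application of the Q-learning method to the container rearrangement problem, the following describes a solution search method using Q-learning that was previously difficult to realize.
[0047]
In the Q-learning method, every combination of state and action must have an evaluation (Q value). When Q-learning is applied to the container rearrangement problem, the target container c_T is needed in addition to x to represent the plant state during reshuffling. Therefore a Q value is assigned to each combination of x⁺ = [x_1, ..., x_k, c_T], an extension of x, and u_j (j = 1, ..., n_y + n_b − 1). Since each x_i (i = 1, ..., k) takes values from 1 to m_y × n_y × l and there are n_b possibilities for c_T, the number of possible x⁺ is
[Equation 4]

and the number of Q values is
[Equation 5]

That is, the number of Q values grows exponentially as k increases. In conventional Q-learning, a large number of Q values slows learning and also enlarges the storage needed to hold the Q values.
[0048]
The following therefore describes a Q-learning algorithm that exploits properties of the plant to improve learning speed, and a data structure that reduces the required storage.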
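[0049]
In conventional Q-learning, Q values are updated by equation (1) and propagate while being discounted by γ. Since the aim of the rearrangement problem is to make the number of reshuffles as small as possible, it suffices for the Q value to be an index of the number of reshuffles. Therefore, for a reshuffle u_j (j = 1, ..., n_y − 1), when the relation
[Equation 6]

holds between two yard-area states x and x', the Q value is updated and propagated by the following equation (4).
[Equation 7]

If the reward R_t is given only when all relocations are complete and Q values are propagated using equation (4), the number of discounts grows with the number of reshuffles, so the corresponding Q value becomes smaller. In other words, choosing in each state the reshuffle with the relatively larger Q value reduces the number of reshuffles. The selection probability of each u_j (j = 1, ..., n_y − 1) is computed by the following equation (5).
[Equation 8]
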
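[0050]
Meanwhile, the relocation order affects the total number of reshuffles. If Q values have been obtained for all (x, u_j) (j an integer from 1 to n_y − 1), the minimum total number of reshuffles for each relocation order can be computed. Moreover, selecting the relocation target container does not affect the state x. Using these properties, Q values are propagated without discounting when c_T is determined. That is, letting Q'(x, u_j) (j an integer from 1 to n_y − 1) be the Q value for the pair of yard-area state x and relocation target selection u_j, it is updated by the following equation (6).
[Equation 9]

With equation (6), the Q value remains an index of the number of reshuffles even across relocations, and the c_T that minimizes the number of reshuffles can be found for each x. The selection probability of each u_j with j from n_y to n_b(n_y − 1) is given by the following equation (7).
[Equation 10]

However, when several relocation candidates are in the same column, the topmost candidate is taken as the relocation target. This excludes actions that would obviously increase the total number of reshuffles.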
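[0051]
Furthermore, when no other container lies on top of the relocation target container, it can be relocated unconditionally. In this case there is only one choice of action and no learning is needed, so the Q value is propagated between the Q values corresponding to the states immediately before and immediately after the relocation. That is, when a reshuffle is performed at time t−1 in state x, relocation candidate containers are moved to the buffer at times t+i in states x'^i (i = 1, ..., K−1) (where K is the number of containers), and a new relocation candidate container is determined at time t+K in state x'^K, the following equation (8) results.
[Equation 11]

[0052]
FIGS. 10 and 11 show examples of Q-value propagation that take equations (4) to (8) into account. In pattern 1 of FIG. 10, inputs u_j1 and u_j2 are applied to the plant in order, and the yard-area state changes in the order x, x', x''. Since only reshuffles are repeated, the Q value is propagated twice by equation (4). In pattern 2 of FIG. 11, inputs u_j1, u_j2, and u_j3 are applied in order, but u_j2 determines the relocation target and therefore does not change the state. Also, x' → x'' is a relocation, so no Q value is propagated to x⁺'. Accordingly, Q values are propagated using equation (6) for (x⁺'', u_j2) and equation (8) for (x⁺, u_j1). As a result, discounted propagation of Q values occurs only when a reshuffle is performed.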
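[0053]
Based on the above, the method of determining the optimized cargo handling procedure is described below with reference to FIG. 9.
[0054]
First, all Q values are initialized to 0. In steps S2 and S4, all containers that can be relocated are moved to the buffer area. This reduces the number of permutations to be tried. Next, in step S6, the relocation target container c_T is determined, the Q value is propagated by equation (8), and (x, u_j) is stored. In step S8, the Q value is updated. Then, in steps S10 and S12, if a c_M exists, the Q value is updated by equation (6) and (x', u_j) is stored. If there is a further c_M, it is reshuffled, the Q value is updated by equation (4), and (x⁺, u_j) is stored. This is repeated until no c_M remains. In step S14, c_T is relocated. If containers remain in the yard area, control returns from step S16 to step S2. When the rearrangement is complete, the reward is received in step S18.
[0055]
In this way, the Q value can serve as an index of the number of reshuffles.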
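[0056]
To show the effectiveness of the proposed method, computer simulations were performed on three plants of different scale. Plant 1 has l=1, k=4, m_y=3, n_y=4, m_b=1, n_b=4; plant 2 has l=1, k=5, m_y=3, n_y=4, m_b=2, n_b=3; and plant 3 has l=2, k=36, m_y=n_y=m_b=n_b=6. The initial arrangement of the yard area and the target arrangement of the buffer area are shown in FIG. 12. The numbers of states of plant 1 and plant 2 are 47,520 and 380,160, respectively. The number of states of plant 3 is
[Equation 12]

and its number of Q values is
[Equation 13]
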
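The state counts quoted for plants 1 and 2 can be checked arithmetically. The sketch below is an observation, not the patent's formula: both figures equal the number of ways to place k distinct containers into m_y × n_y × l yard positions, multiplied by 4. Which plant parameter that factor of 4 stands for is an assumption left open here, and the missing plant 3 equations are not reconstructed.

```python
import math


def ordered_placements(m_y, n_y, l, k):
    """Ways to assign k distinct containers to distinct yard positions."""
    return math.perm(m_y * n_y * l, k)


FACTOR = 4  # assumed multiplier; matches both published counts

plant1_states = ordered_placements(3, 4, 1, 4) * FACTOR
plant2_states = ordered_placements(3, 4, 1, 5) * FACTOR
```

Because the count is a falling factorial in the number of positions, it grows very quickly with k, which is why a full table is impractical for plant 3.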
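[0057]
Assuming that 4 bytes are needed to store one Q value, storing all Q values would require about 9.2×10^66 bytes, so the conventional Q-table approach could not be implemented. The parameters of the method of the present invention were set to α=0.99, γ=0.8, and T=0.1. The optimal solution for plants 1 and 2 has one reshuffle, and the proposed method found the optimal solution for both plants. Taking one trial to run from the initial state to completion of the rearrangement, all Q values converged and learning ended after 50 trials for plant 1 and 200 trials for plant 2. FIG. 13 shows all states, actions, and Q values learned by the proposed method for plant 1.
[0058]
FIG. 13 shows that only the states x_ε and x_ω were stored for plant 1 and that two Q values were learned for each state. For plant 2, 11 states were stored, so the number of states needed to build the Q-table was reduced compared with conventional Q-learning. It was also found that selecting, in each state, the action with the largest learned Q value minimizes the total number of reshuffles and relocations. Because plants 1 and 2 have few states and all states and actions can be learned, Q values reflecting the number of reshuffles and relocations were obtained for every path from the initial state to completion of the rearrangement.
[0059]
The change in all Q values learned for plant 1 was examined against the number of trials, and the results of 30 simulations run from different initial conditions were averaged; all learned Q values were found to converge. After an action in state x_ε, the rearrangement is completed with 0 or 1 reshuffles, so the Q value of each action is 1.0 (no reshuffle) or 0.8 (one reshuffle). For x_ω, the rearrangement is completed with 0 reshuffles after either action, so the Q value is 1.0. Thus all Q values were found to converge to their true values.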
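The converged values 1.0 and 0.8 follow from discounting once per reshuffle. A minimal sketch, assuming a reward of 1.0 on completion (an assumption consistent with the reported Q value of 1.0) and the stated γ = 0.8; the function name is hypothetical.

```python
def expected_q(reshuffles_remaining, gamma=0.8, reward=1.0):
    """True Q value when the reward is discounted once per remaining reshuffle."""
    return reward * gamma ** reshuffles_remaining


q_no_reshuffle = expected_q(0)
q_one_reshuffle = expected_q(1)
```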
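[0060]
FIG. 14 shows the result of applying the method of the present invention to plant 3. In FIG. 14, the horizontal axis is the number of trials and the vertical axis is the total number of reshuffles and relocations in each trial. The results show (A) the average number of reshuffles and relocations over every 30 trials and (B) the smallest total number of reshuffles and relocations found up to each trial, each averaged over 30 independent simulations run with different initial values of the action selection algorithm. The number of reshuffles and relocations decreased as learning progressed over the trials. In all simulations, the smallest number of reshuffles and relocations was 43 (7 reshuffles); the computation time per simulation was about 1 minute 30 seconds on a personal computer with an 850 MHz Pentium III CPU, and about 6×10^6 bytes of memory were used.
[0061]
Thus, the present invention provides a container terminal operation optimization system using the Q-learning method, a reinforcement learning method. In the present invention, the container arrangement is treated as the system state and container movement as the control input, and learning and storage are performed so as to reduce the number of container movements. It is known that applying the Q-learning method to a plant with many states normally produces a large number of state-action combinations and requires enormous storage. The method of the present invention, however, stores only data that has been explored, so the number of arrangement-movement combinations that must be held is small. In addition, because explored data can be reused, solutions can be searched and improved quickly.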
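Storing only explored data can be sketched with a dictionary keyed by (state, action), so that memory grows with the states actually visited rather than with the full state space. The class name, method names, and the sample state vector are hypothetical; the patent does not specify the data structure at this level of detail.

```python
class SparseQTable:
    """Q-table that holds entries only for (state, action) pairs seen so far."""

    def __init__(self, default=0.0):
        self._q = {}
        self._default = default

    def get(self, state, action):
        # Unvisited pairs read as the default without being stored.
        return self._q.get((tuple(state), action), self._default)

    def set(self, state, action, value):
        self._q[(tuple(state), action)] = value

    def best_action(self, state, actions):
        """Return the action with the largest stored Q value in this state."""
        return max(actions, key=lambda a: self.get(state, a))

    def __len__(self):
        return len(self._q)


table = SparseQTable()
x = [5, 0, 12, 7]  # yard positions x_i, with 0 meaning "already in the buffer"
table.set(x, "um_1", 0.8)
table.set(x, "um_2", 1.0)
```

Reading an unvisited state does not create an entry, so lookups during action selection do not inflate the table.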
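[0062]
As described above, the present invention provides a Q-learning-based learning method for container handling planning, which was previously difficult because of the explosion in the number of state-action combinations. Reflecting only the number of reshuffles in the Q value improves the learning effect, and storing only the necessary states in the Q-table reduces the required storage. Computer simulation confirmed that optimal solutions are obtained for small problems. Furthermore, for a problem of practical scale, good results were obtained with little storage and short computation time.
[0063]
The container terminal considered here has a dedicated buffer (temporary storage area) for ship loading. In operation, containers to be loaded the next morning are rearranged from the yard to the buffer during the previous night, and are stacked in the buffer in loading order so that no reshuffling is needed while the ship is being loaded. The advantages of the present invention are that (1) no reshuffling is needed during ship loading, (2) yard chassis travel only between the buffer and the container crane, which shortens the travel distance, and (3) there is no interference with external chassis that carry containers in and out of the yard.
[0064]
A problem similar to this work is the blocks-world problem, long studied in fields such as artificial intelligence, for which solutions using GAs, neural networks, and multi-agent methods have recently been proposed. The plant of this work differs from the blocks-world problem in that a buffer area exists and the order in which containers are stacked in the buffer area is learned in addition to the reshuffling order.
[0065]
[Effects of the Invention]
In this rearrangement handling performed during the previous night, the Q-learning method, a reinforcement learning method that derives an optimal solution through trial and error, is applied to the container rearrangement work to plan an optimal operation that can finish the cargo handling work in a short time.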
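[Brief Description of the Drawings]
[FIG. 1] FIG. 1 is a diagram showing a container terminal to which the present invention is applied.
[FIG. 2] FIG. 2 is a block diagram showing the configuration of the control device in the container terminal operation optimization system of the present invention.
[FIG. 3] FIG. 3 shows an example for explaining the optimization of container movement: (a) shows yard arrangement position data indicating the arrangement of containers in a yard area made up of two bays, (b) shows buffer arrangement position data indicating the positions of containers to be placed in the buffer area, and (c) shows the initial state of the containers.
[FIG. 4] FIGS. 4(a) to 4(f) are diagrams showing the movement of containers in the first example.
[FIG. 5] FIGS. 5(g) to 5(m) are diagrams showing the movement of containers in the first example.
[FIG. 6] FIGS. 6(a) to 6(j) are diagrams showing the movement of containers in the second example.
[FIG. 7] FIG. 7 is a diagram showing the order of container reshuffling and relocation in the first example.
[FIG. 8] FIG. 8 is a diagram showing the order of container reshuffling and relocation in the second example.
[FIG. 9] FIG. 9 is a flowchart showing the container terminal operation optimization method of the present invention.
[FIG. 10] FIG. 10 is a diagram showing an example of the Q-value propagation method.
[FIG. 11] FIG. 11 is a diagram showing an example of the Q-value propagation method.
[FIG. 12] FIG. 12 is a diagram showing the initial arrangement of the yard area and the target arrangement of the buffer area.
[FIG. 13] FIG. 13 is a diagram showing all states, actions, and Q values learned by the proposed method for plant 1.
[FIG. 14] FIG. 14 is a diagram showing the result of applying the method of the present invention to plant 3.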
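[Description of Reference Signs]
2: Control device
4: Yard area
6: Buffer area
8: Bay
10: Transfer crane
12: Rail
14: Yard chassis
20: Container ship
22: Container crane
32: Input device
34: Output device
36: Yard arrangement database
38: Buffer arrangement database
40: Control unit
42: Simulator
44: Control instruction device
[0001]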
[Technical Field of the Invention]
The present invention relates to a container terminal operation optimizing system, and more particularly to a technique for efficiently loading containers onto a container ship.
[0002]
[Prior art]
Freight transportation using ships has been actively conducted at ports around the world. Usually, since each cargo is managed by a container, the container handling work is indispensable for a ship at a port. When a container is loaded on a ship, the destination of transportation of each container is determined, and the container cannot be moved in the ship. Therefore, it is necessary to arrange the containers in a predetermined loading order. Containers are transported from various locations by truck and stacked in a predetermined area. This area is called a bay, and the set of bays is called a yard area. Since the containers in the yard area are arranged ignoring the order of loading, it is necessary to rearrange the containers before loading into the ship. After the rearrangement, the container is stored in an area called a buffer area.
[0003]
Meanwhile, a berthing fee is collected from ships berthed at the port. The fee is based on berthing time, so a long stay is expensive. A method is therefore needed that keeps the berthing time as short as possible by rearranging the containers efficiently.
[0004]
When containers are rearranged, the rearrangement order that takes the shortest time can be found by storing all combinations of container arrangement and movement. However, the number of such combinations increases exponentially with the number of containers, so storing and using all of them is difficult when the number of containers is large. A method is therefore needed that starts with a local search from the initial state and then improves the solution.
[0005]
In connection with the above description, Japanese Patent Application Laid-Open No. 9-12116 discloses a container rearrangement plan creation method. This reference aims at minimizing the transport time of a set when N transport means are synchronized and transport a transported object as a set at one time.
[0006]
Japanese Laid-Open Patent Publication No. 9-267918 discloses a main line cargo handling plan creation method in a container terminal. This reference describes the adjustment of the working time of a quay handling crane and the working time of a container yard handling crane.
[0007]
[Problems to be solved by the invention]
Accordingly, an object of the present invention is to provide a container terminal operation optimizing system and method capable of carrying out cargo handling work in a short time.
[0008]
Another object of the present invention is to provide a container terminal operation optimization system and method that simulate the cargo handling operation so that it can be completed with the minimum number of container movements, and that carry out the cargo handling operation based on the result.
[0009]
Another object of the present invention is to provide a container terminal operation optimizing system and method for improving the efficiency of the simulation by applying the Q-Learning method to the simulation.
[0010]
Another object of the present invention is to provide a container terminal operation optimization system and method that can reduce the required memory capacity when applying the Q-Learning method.
[0011]
[Means for Solving the Problems]
The means for solving the problems are described below using the numbers and symbols used in the [Embodiments of the Invention]. These numbers and symbols are added to clarify the correspondence between the description in the [Claims] and the description in the [Embodiments of the Invention]. They must not be used to interpret the technical scope of the invention described in the [Claims].
[0012]
In a first aspect of the present invention, a container terminal operation optimization system comprises: a yard arrangement database (36) that stores yard arrangement data indicating the positions of containers stored in a yard area; a buffer arrangement database (38) that stores buffer arrangement data indicating the desired positions of movement target containers to be stored in a buffer area; and a control unit (40) that creates the buffer arrangement data from data indicating the transport destination of a container ship and uses the Q-learning method to determine, from the yard arrangement data and the buffer arrangement data, a procedure for moving the movement target containers from the yard area to the buffer area with the minimum number of movements. The system further comprises a transfer crane (10) for moving the movement target containers, and the control unit (40) further comprises a control instruction device (44) that instructs the transfer crane (10) on the order in which to move the movement target containers from the yard area (4) to the buffer area (6) according to the determined procedure.
[0013]
The transfer crane (10) moves on a rail (12) provided along the yard area.
[0014]
The control unit (40) includes a simulation unit (42) that executes the Q-learning method. When executing the Q-learning method, the simulation unit (42) moves the movement target containers that can be moved from the yard area (4) to the buffer area (6) and executes the Q-learning method on the remaining containers. In doing so, it executes the Q-learning method on the permutations of the remaining movement target containers among the remaining containers.
[0015]
In a second aspect of the present invention, a container terminal operation optimization method comprises: creating buffer arrangement data from data indicating the transport destination of a container ship, the buffer arrangement data indicating the desired positions of movement target containers to be stored in a buffer area; determining, using the Q-learning method, a procedure for moving the movement target containers from a yard area to the buffer area with the minimum number of movements from yard arrangement data and the buffer arrangement data, the yard arrangement data indicating the positions of containers stored in the yard area; and moving the movement target containers from the yard area to the buffer area according to the determined procedure.
[0016]
The determining step includes moving the movement target containers that can be moved from the yard area to the buffer area and executing the Q-learning method on the remaining containers.
[0017]
The executing step executes the Q-learning method on the permutations of the remaining movement target containers among the remaining containers.
[0018]
In a third aspect of the present invention, a container terminal operation optimization method comprises: (a) determining whether any relocatable movement target container exists; (b) when relocatable movement target containers exist, moving all of them from the yard area to the buffer area; (c) selecting a specific one of the movement target containers that are not relocatable; (d) moving the movement target container from the yard area to the buffer area while updating Q values; and (e) repeating steps (a) to (d) until no movement target container remains in the yard area.
[0019]
Steps (a) to (e) are executed for every permutation of the movement target containers, and the permutation with the fewest container movements is determined. Step (d) includes moving containers other than the movement target containers within the yard to expose the specific movement target container, and moving the specific movement target container from the yard area to the buffer area.
[0020]
In recent years, it has been pointed out that the development of Japan's container terminals has lagged and service levels have declined, making the development of hub international ports an urgent issue. Against this background, attention is focused on container terminal operations. To speed them up, the Q-learning method, a reinforcement learning method that derives an optimal solution through trial and error, is applied to container rearrangement handling, providing an optimized operation system that can finish the cargo handling work in a short time.
[0021]
[Embodiments of the Invention]
Hereinafter, a container terminal operation optimization system according to the present invention will be described in detail with reference to the accompanying drawings.
[0022]
FIG. 1 shows a container terminal and an optimization system to which an embodiment of the present invention is applied. The container terminal has a yard area 4 and a buffer area 6. The yard area 4 includes a plurality of bays 8. Rails 12 for the transfer crane 10 are provided on both sides of the yard area 4 and the buffer area 6. The transfer crane 10 moves between the yard area 4 and the buffer area 6 along the rail 12. The optimization system includes a control device 2 that monitors and controls the operation of the transfer crane 10.
[0023]
Containers brought in by container trucks are stacked in one of the bays 8 by the transfer crane 10. When the container ship 20 is scheduled to enter port, the control device 2 plans the optimum container handling operation from the data indicating the transport destination and, according to that plan, instructs the transfer crane 10 on the order in which to move containers from yard area 4 to buffer area 6. Thereafter, when the container ship 20 berths at the wharf, the transfer crane 10 loads containers onto the yard chassis 14 in accordance with instructions from the control device 2. The yard chassis 14 moves to the wharf with the container loaded. The quay crane 22 moves the containers carried on the yard chassis 14 into the container ship 20 in order. In this way, containers are loaded efficiently from yard area 4 onto the container ship 20.
[0024]
The control device 2 includes a yard arrangement database 36, a buffer arrangement database 38, a control unit 40, an input device 32, and an output device 34. The control unit 40 includes a simulator 42 and a control instruction device 44. The yard arrangement database 36 stores yard arrangement data indicating the storage position of each container in yard area 4. The buffer arrangement database 38 stores buffer arrangement data indicating the temporary storage positions of containers moved to buffer area 6. When the container ship 20 is scheduled to enter port, data indicating the transport destination of the container ship 20 is input from the input device 32 to the simulator 42. The simulator 42 creates buffer arrangement data from the transport destination data and the yard arrangement data. The simulator 42 then draws up a rearrangement plan by simulating, from the yard arrangement data and the buffer arrangement data, the optimum order of the containers to be loaded onto the container ship 20. The resulting plan is output to the output device 34. The control instruction device 44 also outputs instructions to the transfer crane 10 according to the plan. In this way, the containers are rearranged from yard area 4 to buffer area 6 in the optimum order that minimizes the number of container movements.
[0025]
Next, container rearrangement work will be described.
[0026]
First, yard area 4 and buffer area 6 are described with reference to FIG. 3. As an example, assume that yard area 4 contains two bays 8, as shown in FIG. 3A. The containers stacked in the bays 8 are numbered as shown, and each container's transport destination is linked to the yard arrangement data. Assume that, based on the transport destination data of the container ship 20, the positions in which containers are to be stacked in buffer area 6 are as shown in FIG. 3B. The actual state of the containers stacked in the bays 8 is shown in FIG. 3C. Consider how to stack the containers as shown in FIG. 3B most efficiently.
[0027]
FIG. 7 shows a first example of reshuffling and relocation operations. In this example, the containers are moved to and placed in buffer area 6 in the order C1, C2, C3, C4, C5, C6, C7, C8, C9. The operations are described with reference to FIGS. 4 and 5.
[0028]
FIGS. 4A to 4F and FIGS. 5G to 5M correspond to the steps of the cargo handling work plan shown in FIG. 7. In each drawing, the two stacks on the left show how containers are stored in the bays 8, and the stack on the right shows how containers are temporarily stored in buffer area 6.
[0029]
FIG. 4A shows the first step. From the state shown in FIG. 3C, container C9 is reshuffled so that container C1 can be relocated. In the second step, container C1 is now movable and is moved to buffer area 6. In the third step, container C7 is reshuffled to free container C2. In the fourth step, container C2 is relocated to buffer area 6. In the fifth step, container C5 is reshuffled, and in the sixth step, container C3 is relocated to buffer area 6. In the seventh step, container C4 is relocated to buffer area 6, and in the eighth step, container C5 is relocated to buffer area 6. In the ninth step, container C9 is reshuffled, and in the tenth step, container C6 is relocated to buffer area 6. In the eleventh step, container C7 is relocated, and in the twelfth step, container C8 is relocated to buffer area 6. Finally, container C9 is relocated to buffer area 6 in the thirteenth step. In this first example of the cargo handling work plan, the containers are relocated in the order C1, C2, C3, C4, C5, C6, C7, C8, C9. Stacking containers C1 to C9 in the desired order shown in FIG. 3B therefore takes 13 steps.
[0030]
Next, FIG. 8 shows a second example of the cargo handling work plan. In this example, the containers are moved to and placed in buffer area 6 in the order C2, C5, C3, C6, C9, C1, C4, C7, C8. The operations are described with reference to FIG. 6. FIGS. 6A to 6J correspond to the steps in FIG. 8. In each drawing, the two stacks on the left show how containers are stored in the bays 8, and the stack on the right shows how containers are temporarily stored in buffer area 6.
[0031]
FIG. 6A shows the first step. Starting from the state shown in FIG. 3C, container C7 is reshuffled so that container C2 can be relocated. In the first step, the containers that can be moved in yard area 4 are C6, C9, C4, C8, C7, and C5, while the containers that can be placed in buffer area 6 are C1, C2, and C3, so no container can actually be relocated. Container C7 is therefore reshuffled. The choice of container C7 is the result of learning by the Q-learning method.
[0032]
In the second step, the refillable containers in the yard area 4 are C6, C9, C4, C7, C2, and C5. In the buffer area 8, the refillable containers are C1, C2, and C3, and the container C2 is Rearrangement is possible. Accordingly, the container C2 is rearranged in the buffer area 8 in the second step.
[0033]
Next, in the third step, the refillable containers in the yard area 4 are C6, C9, C4, C7, and C5. In the buffer area 8, the refillable containers are C1, C5, and C3, and the container C5 Can be rearranged. Accordingly, the container C5 is rearranged in the buffer area 8 in the third step.
[0034]
Next, in the fourth step, the refillable containers in the yard area 4 are C6, C9, C4, C7, and C3. In the buffer area 8, the refillable containers are C1, C8, and C3, and the container C3 Can be rearranged. Accordingly, the container C3 is rearranged in the buffer area 8 in the fourth step.
[0035]
Next, in the fifth step, the refillable containers in the yard area 4 are C6, C9, C4, and C7. In the buffer area 8, the refillable containers are C1, C8, and C6, and the container C6 can be rearranged. Accordingly, the container C6 is rearranged in the buffer area 8 in the fifth step.
[0036]
Next, in the sixth step, the refillable containers in the yard area 4 are C9, C4, and C7. In the buffer area 8, the refillable containers are C1, C8, and C9, and the container C9 can be rearranged. Accordingly, the container C9 is rearranged in the buffer area 8 in the sixth step.
[0037]
Next, in the seventh step, the refillable containers in the yard area 4 are C1, C4, and C7. In the buffer area 8, the refillable containers are C1 and C8, and the container C1 can be rearranged. Accordingly, the container C1 is rearranged in the buffer area 8 in the seventh step.
[0038]
Next, in the eighth step, the refillable containers in the yard area 4 are C4 and C7, but in the buffer area 8, the refillable containers are C4 and C8, and the container C4 can be rearranged. Accordingly, the container C4 is rearranged in the buffer area 8 in the eighth step.
[0039]
Next, in the ninth step, the refillable container in the yard area 4 is C7, but in the buffer area 8, the refillable containers are C7 and C8, and the container C7 can be rearranged. Accordingly, the container C7 is rearranged in the buffer area 8 in the ninth step.
[0040]
Finally, in the tenth step, the refillable container in the yard area 4 is C8, but in the buffer area 8, the refillable container is C8, and the container C8 can be rearranged. Accordingly, the container C8 is rearranged in the buffer area 8 in the tenth step.
[0041]
Thus, in the second example of the cargo handling work plan, the containers C1 to C9 are rearranged to the desired positions in 10 steps, whereas the first example requires 13 steps. As this shows, the containers can be rearranged from the yard area 4 to the buffer area 6 with the minimum number of movements simply by choosing the rearrangement procedure appropriately.
[0042]
Next, consider applying the Q-learning method, a reinforcement learning method, in order to simulate such efficient rearrangement. In the Q-learning method, a Q value is the evaluation value of a certain operation in a certain state. In the relocation problem, the Q value Q(s, a) is defined by the container arrangement state (s) and the operation (a), which is either an unloading operation within the yard area 4 or a rearrangement operation from the yard area 4 to the buffer area 8. By updating the Q value Q(s, a) with the algorithm shown below so that it approaches its optimum value, a container relocation plan that minimizes the number of container movements is obtained.
(1) Observe container state s.
(2) The container movement action a is executed according to an arbitrary action selection method.
(3) Receive a reward r for moving behavior.
(4) Observe the container state s ′ after the rearrangement.
(5) Update the Q value using the following update formula.
Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max Q(s′, a′)]
where α is the learning rate (0 < α < 1) and γ is the discount rate (0 < γ < 1).
(6) Advance time step t to t + 1 and return to step (1).
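Step (5) of the loop above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `q_update`, the dictionary-based table, and the state and action labels are assumptions made for the example.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.8):
    """Apply Q(s,a) <- (1-alpha)*Q(s,a) + alpha*[r + gamma*max Q(s',a')]."""
    # Best value reachable from the next state; 0.0 if no action is available.
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


# Hypothetical two-state example: the next state already has a learned value.
q = defaultdict(float)
q[("s1", "rearrange")] = 1.0
updated = q_update(q, "s0", "unload", r=0.0, s_next="s1",
                   next_actions=["rearrange"])
```

With these illustrative numbers, the value of the unloading action moves halfway toward the discounted value of its successor.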
[0043]
In the conventional Q-learning method, let x_i and u_j (i = 1, ..., m; j = 1, ..., n) be the state variables and inputs of the plant, respectively. A Q value is given to each combination (x_i, u_j) as its evaluation. A set of Q values is called a Q value table. Each u_j has one corresponding Q value table, so n Q value tables are required to evaluate all (x_i, u_j) combinations. The input of a Q value table is x_i, and its output is Q(x_i, u_j). If the plant characteristics are unknown, Q(x_i, u_j) is also unknown. Therefore, Q(x_i, u_j) at time t is written Q_t(x_i, u_j) and is updated with the following equation (1):
[Expression 1]

Here, γ (0 < γ < 1) is the discount rate and α is the learning rate. R_t is the reward, which is given when the control input at time t−1 is u_j and the plant state at time t is x′. At this time, equation (1) updates only the one Q value corresponding to (x_i, u_j).
[0044]
In the early stage of learning, Q(x_i, u_j) must be updated repeatedly for many (x_i, u_j). For this reason, a method of selecting the input u_j according to the Boltzmann distribution of the following equation (2) is widely used.
[Expression 2]

Where T is a temperature constant.
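Equation (2) itself is not reproduced in this text, so the sketch below assumes the standard Boltzmann (softmax) form, in which the probability of selecting u_j is proportional to exp(Q(x, u_j) / T). The function names are hypothetical.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Selection probabilities proportional to exp(Q / T)."""
    # Subtracting the maximum keeps exp() from overflowing at small T.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_input(q_values, temperature, rng=random):
    """Draw the index j of an input u_j according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Illustrative values only: two candidate inputs with Q = 1.0 and Q = 0.8.
probs = boltzmann_probs([1.0, 0.8], temperature=0.1)
```

A small T concentrates the probability on the input with the largest Q value, while a large T makes the choice nearly uniform, which favors exploration early in learning.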
[0045]
Here, consider the relocation problem of k containers. Each container has a unique identifier from c_1 to c_k. The containers are stacked randomly in one bay 8. Since the order in which the containers are loaded onto the ship is determined, the containers are rearranged from the bay 8 to an area called the buffer area 6. The arrangement in the buffer area 6 is determined in advance according to the loading order of the containers. The bay has m_y rows and n_y columns, the stacking height is l, and the buffer area has m_b rows and n_b columns. A position in the yard area is identified by an integer from 1 to m_y × n_y × l. The position of container c_i (i = 1, ..., k) in the yard area is x_i (x_i is 1 or more and m_y × n_y × l or less), and the yard area layout is x = [x_1, ..., x_k]. If c_i is in the buffer area, x_i = 0. Containers are moved from the bay to the buffer area and stacked from the bottom. Therefore, the number of rearrangement candidates is limited to n_b, the number of columns in the buffer area. When relocating a container, the target container c_T is first selected from the rearrangement candidates uc_j (j = n_y, ..., n_b + n_y − 1). Then every other container on top of c_T is moved to one of the columns [um_1, ..., um_{n_y−1}] in the bay. This is called unloading, and a container to be unloaded is denoted c_M. Then c_T is rearranged to the buffer area. At this time, there are n_y − 1 destination choices per unloaded container and n_b choices for rearrangement, and the state of the yard area changes with each unloading or rearrangement. Therefore, the operation of the plant is expressed as follows.
u_j = um_j (1 ≤ j ≤ n_y − 1)
u_j = uc_j (n_y ≤ j ≤ n_b + n_y − 1)
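The index ranges above can be checked with a small helper. This is only an illustrative sketch; the function name and the "unload"/"select" labels are assumptions, not terms from the patent.

```python
def classify_operation(j, n_y, n_b):
    """Map an input index j to the kind of operation it denotes.

    Indices 1 .. n_y-1 are unloading destinations (um_j);
    indices n_y .. n_b+n_y-1 are rearrangement candidates (uc_j).
    """
    if 1 <= j <= n_y - 1:
        return ("unload", j)
    if n_y <= j <= n_b + n_y - 1:
        return ("select", j)
    raise ValueError(f"j={j} is outside 1..{n_b + n_y - 1}")


# For a plant with n_y = n_b = 3 there are n_y + n_b - 1 = 5 inputs in total.
kinds = [classify_operation(j, n_y=3, n_b=3)[0] for j in range(1, 6)]
```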
In this case, the plant can be expressed by the following equation (3).
[Equation 3]

Here, f() is a function that applies the operation u_j to the yard and buffer areas. In FIG. 3, m_y = n_y = m_b = n_b = 3, l = 2, and k = 9, and the position of a container is identified by an integer from 1 to 18. For this plant, in FIG. 6(a), the relocation target c_T is determined to be c_2 from among [c_1, c_2, c_3], and the container c_7 on top of c_2 is unloaded. The unloading destinations are [um_1, um_2] = [13, 18], and the next relocation target is assumed to be c_5. In this case, the number of unloadings needed to rearrange c_5 differs by one depending on whether c_7 is moved to 13 or to 18. Also, the rearrangement order 2, 5, 3, 6, 9, 1, 4, 7, 8 of the second example completes the rearrangement with fewer unloadings than the order 1, 2, 3, 4, 5, 6, 7, 8, 9 of the first example. The changes in x for the two rearrangement orders are as shown in the figures.
[0046]
Regarding the application of the Q-learning method to the container relocation problem, a solution search method using the Q-learning method, which has been difficult to realize in the past, will be described.
[0047]
In the Q-learning method, every combination of state and action must have an evaluation (Q value). When Q-learning is applied to the container relocation problem, the relocation target container c_T is required in addition to x in order to express the state of the plant during unloading. Therefore, a Q value is assigned to each combination of x⁺ = [x_1, ..., x_k, c_T], which is an extension of x, and u_j (j = 1, ..., n_y + n_b − 1). Since each x_i (i = 1, ..., k) is 1 or more and m_y × n_y × l or less and there are n_b possible choices of c_T, the number of states x⁺ is as follows:
[Expression 4]

And the number of Q values is
[Equation 5]

That is, as k increases, the number of Q values increases exponentially. In conventional Q-learning, as the number of Q values increases, the learning speed decreases and the storage area required to store the Q values grows.
[0048]
Therefore, a Q-learning algorithm that uses plant properties to improve the learning speed and a data structure for reducing the necessary storage area will be described below.
[0049]
In the conventional Q-learning, the Q value is updated by equation (1) and propagated while being discounted by γ. Since the purpose of the relocation problem is to make the number of unloadings as small as possible, the Q value may be used as an index representing the number of unloadings. Therefore, for an unloading u_j (j = 1, ..., n_y − 1) and two states x and x′ of the yard area,
[Formula 6]

when the above relationship holds, the Q value is updated and propagated by the following equation (4).
[Expression 7]

The reward R_t is given only when all the rearrangements are completed. If the Q value is propagated using equation (4), the number of discounts increases as the number of unloadings increases, so the corresponding Q value becomes smaller. That is, in each state, the number of unloadings can be reduced by selecting an unloading with a relatively large Q value. The selection probability of each u_j (j = 1, ..., n_y − 1) is calculated by the following equation (5).
[Equation 8]

[0050]
On the other hand, the rearrangement order affects the total number of unloadings. If the Q values are obtained for all (x, u_j) (j is an integer from 1 to n_y − 1), the minimum total number of unloadings for each rearrangement order can be calculated. In addition, the selection of the relocation target container does not change the state x. Utilizing these properties, the Q value is propagated without discounting when c_T is selected. That is, Q′(x, u_j) (j is an integer from 1 to n_y − 1), the Q value for the pair of the yard area state x and the relocation target selection u_j, is updated by the following equation (6).
[Equation 9]

Even when updated according to equation (6), the Q value still serves as an index of the number of unloadings, and the c_T that gives the smallest number of unloadings can be obtained for each x. The selection probability of each u_j (n_y ≤ j ≤ n_b + n_y − 1) is given by the following equation (7).
[Expression 10]

However, when a plurality of rearrangement candidates exist in the same column, the candidate at the top is taken as the target for rearrangement. This excludes operations that would obviously increase the total number of unloadings.
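The two propagation rules described above, discounted for unloading and undiscounted for target selection, can be sketched as follows. Equations (4) and (6) are not reproduced in this text, so the sketch follows the update forms stated in the claims, Q(x⁺, u) = (1 − α) Q(x⁺, u) + α [R + γ max Q(x⁺′, u′)] and Q(x, u‴) = max Q(x⁺, u″″). The function names, the dictionary-based table, and the string labels are assumptions made for illustration.

```python
def propagate_unloading(q, state, unload, next_state, next_ops,
                        reward=0.0, alpha=0.99, gamma=0.8):
    """Discounted propagation across an unloading operation."""
    best = max((q.get((next_state, op), 0.0) for op in next_ops), default=0.0)
    old = q.get((state, unload), 0.0)
    q[(state, unload)] = (1 - alpha) * old + alpha * (reward + gamma * best)


def propagate_selection(q, state, selection, extended_state, unloads):
    """Undiscounted propagation across the choice of relocation target."""
    q[(state, selection)] = max(
        (q.get((extended_state, u), 0.0) for u in unloads), default=0.0)


# Hypothetical chain: select target -> unload u1 -> state whose best Q is 1.0.
q = {("x+'", "u2"): 1.0}
propagate_unloading(q, "x+", "u1", "x+'", ["u2"], alpha=1.0)
propagate_selection(q, "x", "select c2", "x+", ["u1"])
```

In this example the value is discounted once, for the single unloading, and then passes through the target selection unchanged, so both entries end at 0.8.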
[0051]
Furthermore, if there is no other container on the container to be rearranged, it can be rearranged unconditionally. In this case, since there is only one choice of operation and learning is not necessary, the Q value is propagated between the Q values corresponding to the states immediately before and after the rearrangement. That is, if an unloading is performed at time t−1 in state x, relocation candidate containers are moved to the buffer at times t + i in states x′^i (i = 1, ..., K − 1) (where K is the number of containers), and a new relocation candidate container is determined at time t + K in state x′^K, the Q value is propagated by the following equation (8).
[Equation 11]

[0052]
FIG. 10 and FIG. 11 show examples of Q value propagation methods that take the above equations (4) to (8) into account. In pattern 1 of FIG. 10, inputs u_j1 and u_j2 are given to the plant in order, and the state of the yard area changes in the order x, x′, x″. Since only unloading is repeated, the Q value is propagated twice according to equation (4). In pattern 2 of FIG. 11, inputs u_j1, u_j2, and u_j3 are given in order, but since u_j2 determines the rearrangement target, the state does not change, and the Q value is not propagated to the rearrangement state x⁺′. Therefore, the Q value is propagated to (x⁺″, u_j2) using equation (6) and to (x⁺, u_j1) using equation (8). As a result, the Q value is discounted only when it is propagated across an unloading.
[0053]
Based on the above description, a method for determining a cargo handling work optimization procedure will be described below with reference to FIG. 9.
[0054]
First, all Q values are initialized to 0. In steps S2 and S4, all containers that can be rearranged are moved to the buffer area. This reduces the number of permutations to be tried. Subsequently, in step S6, the relocation target container c_T is selected, the Q value is propagated according to equation (8), and then (x, u_j) is stored. In step S8, the Q value is updated. Subsequently, in steps S10 and S12, the container to be unloaded c_M is selected, the Q value is updated according to equation (6), and then (x′, u_j) is stored. If there is a c_M, it is unloaded, the Q value is updated according to equation (4), and then (x⁺, u_j) is stored. This is repeated until no c_M remains. In step S14, c_T is rearranged. If any container remains in the yard area, the control flow returns from step S16 to step S2. If the rearrangement is completed, a reward is received in step S18.
[0055]
As described above, the Q value can be used as an index of the number of unloadings.
[0056]
In order to show the effectiveness of the proposed method, computer simulations were performed for three plants of different scales. Plant 1 has l = 1, k = 4, m_y = 3, n_y = 4, m_b = 1, n_b = 4; plant 2 has l = 1, k = 5, m_y = 3, n_y = 4, m_b = 2, n_b = 3; and plant 3 has l = 2, k = 36, and m_y = n_y = m_b = n_b. FIG. 12 shows the initial arrangement of the yard area and the target arrangement of the buffer area. The numbers of states of plant 1 and plant 2 are 47,520 and 380,160, respectively. The number of states of plant 3 is
[Expression 12]

And the number of Q values is
[Formula 13]

It becomes.
[0057]
Assuming that 4 bytes are needed to store one Q value, about 9.2 × 10⁶⁶ bytes of storage are required to store all the Q values, so the conventional method using a Q-table cannot be implemented. The parameters of the method of the present invention were set to α = 0.99, γ = 0.8, and T = 0.1. The optimal solution for plants 1 and 2 has an unloading count of 1, and the proposed method found the optimal solution for both plants. Counting one run from the initial state to the completion of the rearrangement as one trial, all Q values converged and learning was completed after 50 trials for plant 1 and 200 trials for plant 2. All the states, operations, and Q values learned by the proposed method for plant 1 are shown in FIG. 13.
[0058]
FIG. 13 shows that the states stored for plant 1 are x_ε, x_ω, and that two Q values are learned for each state. The number of states stored for plant 2 is 11, so the number of states needed to construct the Q-table is reduced compared with conventional Q-learning. It was also found that the total number of unloadings and rearrangements can be minimized by selecting, in each state, the operation having the maximum learned Q value. Since the number of states in plants 1 and 2 is small and all states and operations can be learned, Q values corresponding to the number of unloadings and rearrangements were obtained for all routes from the initial state to the completion of relocation.
[0059]
The change of all Q values learned for plant 1 with the number of trials was examined, taking the ensemble average of the results of 30 simulations performed with different initial values. All the learned Q values were found to converge. In state x_ε, the relocation is completed with zero or one unloading after the operation, so the Q value corresponding to each operation is 1.0 or 0.8, respectively. In state x_ω, the rearrangement is completed with zero unloadings after any operation, so the Q value is 1.0. It was therefore found that all Q values converged to their true values.
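The converged values quoted above follow directly from discounting once per unloading. The sketch below assumes a final reward of 1.0, which is not stated explicitly in the text but is consistent with the Q value of 1.0 reported for zero unloadings; the function name is hypothetical.

```python
def converged_q(unloadings_remaining, gamma=0.8, final_reward=1.0):
    """Q value after convergence when the reward is discounted once per unloading."""
    return final_reward * gamma ** unloadings_remaining


# With gamma = 0.8 as in the simulation settings:
q_zero = converged_q(0)  # no further unloading before completion
q_one = converged_q(1)   # one further unloading before completion
```

Under this assumption, selecting the operation with the larger Q value is the same as selecting the route with fewer remaining unloadings.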
[0060]
FIG. 14 shows the result of applying the method of the present invention to plant 3. In FIG. 14, the horizontal axis represents the number of trials, and the vertical axis represents the total number of unloadings and rearrangements for each trial. The results show (A) the average number of unloadings and rearrangements per 30 trials and (B) the smallest total number of unloadings and rearrangements found up to each trial. In both cases, the ensemble average was taken over 30 independent simulations performed with different initial values of the action selection algorithm. It was found that the number of unloadings and rearrangements decreased as learning progressed through repeated trials. Over all simulations, the smallest number of unloadings and rearrangements was 43 (7 unloadings), the calculation time per simulation was about 1 minute 30 seconds on a personal computer with an 850 MHz Pentium III CPU, and the memory used was about 6 × 10⁶ bytes.
[0061]
Thus, the present invention provides a container terminal operation optimization system using the Q-learning method, which is one method of reinforcement learning. In the present invention, the arrangement of the containers is regarded as the system state and the movement of a container as the control input, and learning and storage are performed so as to reduce the number of container movements. It is known that applying the Q-learning method to a plant with a large number of states leads to an enormous number of combinations of states and actions and hence an enormous storage requirement. The method of the present invention, however, stores only the data that has actually been searched, so the number of combinations of arrangement and movement that must be stored can be reduced. In addition, since searched data can be reused, a solution can be searched for and improved at high speed.
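The idea of storing only the searched states can be illustrated with a table that allocates an entry the first time a state and operation pair is written. This is a minimal sketch under stated assumptions: the class name and interface are hypothetical, and the default of 0 mirrors the initialization of all Q values to 0 described for the flowchart.

```python
class SparseQTable:
    """Q-table that keeps entries only for visited (state, operation) pairs."""

    def __init__(self, default=0.0):
        self._values = {}
        self._default = default

    def get(self, state, op):
        # Unvisited pairs are never stored; they read as the initial value.
        return self._values.get((tuple(state), op), self._default)

    def set(self, state, op, value):
        self._values[(tuple(state), op)] = value

    def __len__(self):
        return len(self._values)


table = SparseQTable()
layout = [3, 7, 0, 12]            # illustrative position vector x; 0 = in buffer
before = table.get(layout, "u1")  # reads the default without allocating
table.set(layout, "u1", 0.8)
```

The memory used then grows with the number of pairs actually visited during the trials rather than with the full number of state and operation combinations.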
[0062]
As described above, in the present invention, a learning method based on the Q-learning method, which has been difficult in the past due to the explosion of the number of combinations of states and operations, is provided for the container handling plan. At that time, the learning effect was improved by reflecting only the number of unloading in the Q value, and the necessary storage capacity was reduced by storing only the necessary states in the Q-table. It was also confirmed by computer simulation that an optimal solution was obtained for a small problem. Furthermore, it was found that good results can be obtained with a small storage capacity and a short calculation time for practical scale problems.
[0063]
The container terminal considered here has a dedicated buffer (temporary storage location) for cargo handling. In operation, the containers to be loaded the next morning are relocated from the yard to the buffer the night before and stacked in the buffer in accordance with the loading order, so that no unloading is needed during cargo handling. This has the following advantages: (1) no unloading is needed during loading, (2) the yard chassis travels only between the buffer and the container crane, so the travel distance is shortened, and (3) there is no interference with external chassis loading and unloading at the yard.
[0064]
A problem similar to the one studied here is the building block problem, which has long been treated in the field of artificial intelligence. Recently, solutions using GA, neural networks, and multi-agent methods have been proposed. The plant of this study differs from the building block problem in that it has a buffer area and in that learning also takes into account the order in which containers are moved into the buffer area, in addition to the unloading order.
[0065]
[Effects of the invention]
By applying the Q-learning method, a reinforcement learning method that derives the optimal solution through trial and error, to the container relocation work, a work plan that is as close to optimal as possible can be made.
[Brief description of the drawings]
FIG. 1 is a diagram showing a container terminal to which the present invention is applied.
FIG. 2 is a block diagram showing a configuration of a control device in the container terminal operation optimizing system of the present invention.
FIG. 3 is a diagram illustrating an example for explaining optimization of container movement; (a) is a diagram showing yard arrangement position data indicating arrangement of containers in a yard area from two yards, (b) is a diagram showing buffer arrangement position data indicating the positions of the containers to be arranged in the buffer area, and (c) is a diagram showing an initial state of the containers.
FIGS. 4A to 4F are views showing container movement in the first example.
FIGS. 5 (g) to 5 (m) are diagrams illustrating container movement in the first example.
FIGS. 6A to 6J are views showing movement of containers in the second example.
FIG. 7 is a diagram illustrating the order of container unloading and rearrangement in the first example.
FIG. 8 is a diagram illustrating the order of container unloading and rearrangement in the second example.
FIG. 9 is a flowchart showing a container terminal operation optimizing method according to the present invention.
FIG. 10 is a diagram illustrating an example of a Q value propagation method.
FIG. 11 is a diagram illustrating an example of a Q value propagation method.
FIG. 12 is a diagram illustrating an initial arrangement of yard areas and a target arrangement of buffer areas.
FIG. 13 is a diagram showing all states, operations, and Q values learned by the proposed method for the plant 1.
FIG. 14 is a diagram showing a result of applying the method of the present invention to the plant 3.
[Explanation of symbols]
2: Control device
4: Yard area
6: Buffer area
8: Bay
10: Transfer crane
12: Rail
14: Yard chassis
20: Container ship
22: Container crane
32: Input device
34: Output device
36: Yard arrangement database
38: Buffer arrangement database
40: Control unit
42: Simulator
44: Control instruction device

Claims

A yard arrangement database for storing yard arrangement data indicating an initial state of a plurality of containers stored in a yard area;
A buffer arrangement database for storing buffer arrangement data indicating desired arrangement positions of the plurality of containers to be stored in a buffer area;
A control unit for determining, from the yard arrangement data and the buffer arrangement data, a procedure for moving all of the plurality of containers from the yard area to the buffer area by a Q-learning method using Q values, each Q value corresponding to a state of the plurality of containers and an operation that shifts the plurality of containers from that state to another state;
The operation includes:
Unloading, which moves one container of the plurality of containers from one position in the yard area to another position in the yard area;
Rearrangement, which moves one container of the plurality of containers from one position in the yard area to one position in the buffer area,
When the state shifts from state x⁺ to state x⁺′ by unloading u and an unloading is executed in state x⁺′, the Q value corresponding to state x⁺′ and an unloading executable in state x⁺′ is propagated, and the Q value corresponding to state x⁺ and unloading u is updated,
A container terminal operation optimization system in which, when the state shifts from state x⁺ to state x⁺′ by unloading u and then from state x⁺′ to state x″ by one or more rearrangements, the Q value corresponding to state x″ and an operation executable in state x″ is propagated, and the Q value corresponding to state x⁺ and unloading u is updated.

In claim 1,
The operation further includes a relocation target container selection operation for selecting a relocation target container from a yard area container disposed in the yard area of the plurality of containers,
The unloading is an operation of moving one container of the yard area containers when another container is arranged on the relocation target container,
The rearrangement is an operation of moving the relocation target container when no other container is arranged on the relocation target container.
Container terminal operation optimization system.

In claim 2,
When the state shifts from state x⁺ to state x⁺′ by unloading u and an unloading is executed in state x⁺′, the Q value corresponding to state x⁺′ and an unloading executable in state x⁺′ is propagated with a discount, and the Q value corresponding to state x⁺ and unloading u is updated,
When the state shifts from state x⁺ to state x⁺′ by unloading u and then from state x⁺′ to state x″ by one or more rearrangements, the Q value corresponding to state x″ and an operation executable in state x″ is propagated with a discount, and the Q value corresponding to state x⁺ and unloading u is updated,
When the state shifts from state x to state x⁺ by relocation target container selection operation u‴ and an unloading is executed in state x⁺, the Q value corresponding to state x⁺ and an unloading executable in state x⁺ is propagated without a discount, and the Q value corresponding to state x and relocation target container selection operation u‴ is updated,
The procedure is determined by selecting, in each state, the operation having the largest value among the Q values corresponding to that state.
Container terminal operation optimization system.

In claim 3,
The Q value Q(x⁺, u) corresponding to state x⁺ and unloading u is,
when an unloading is executed in the state x⁺′ reached from state x⁺ by unloading u, expressed by the following formula using the Q value Q(x⁺′, u′) corresponding to state x⁺′ and an unloading u′ executable in state x⁺′:
Q(x⁺, u) = (1 − α) Q(x⁺, u) + α [R + γ max Q(x⁺′, u′)]
and,
when the state shifts from state x⁺′ to state x″ by one or more rearrangements, expressed by the following formula using the Q value Q(x″, u″) corresponding to state x″ and an operation u″ executable in state x″:
Q(x⁺, u) = (1 − α) Q(x⁺, u) + α [R + γ max Q(x″, u″)]
and
the Q value Q(x, u‴) corresponding to state x and relocation target container selection operation u‴ is, when an unloading is required to rearrange the relocation target container selected by the operation u‴, expressed by the following formula using the Q value Q(x⁺, u″″) corresponding to the state x⁺ reached from state x by the operation u‴ and an unloading u″″ executable in state x⁺:
Q(x, u‴) = max Q(x⁺, u″″)
Container terminal operation optimization system.

In any one of claims 1 to 4,
Further comprising a transfer crane for moving the plurality of containers,
The control unit further comprises a control instruction device that instructs the order in which the containers are moved from the yard area to the buffer area in accordance with the determined procedure. Container terminal operation optimization system.

A yard arrangement database for storing yard arrangement data indicating an initial state of a plurality of containers stored in a yard area;
A buffer arrangement database for storing buffer arrangement data indicating desired arrangement positions of the plurality of containers to be stored in a buffer area;
A control unit for determining, from the yard arrangement data and the buffer arrangement data, a procedure for moving all of the plurality of containers from the yard area to the buffer area by a Q-learning method using Q values, each Q value corresponding to a state of the plurality of containers and an operation that shifts the plurality of containers from that state to another state;
The operation includes:
Unloading, which moves one container of the plurality of containers from one position in the yard area to another position in the yard area;
Rearrangement, which moves one container of the plurality of containers from one position in the yard area to one position in the buffer area,
When the state shifts from state x⁺ to state x⁺′ by unloading u and an unloading is executed in state x⁺′, the Q value corresponding to state x⁺′ and an unloading executable in state x⁺′ is propagated, and the Q value corresponding to state x⁺ and unloading u is updated,
A container terminal operation optimization device in which, when the state shifts from state x⁺ to state x⁺′ by unloading u and then from state x⁺′ to state x″ by one or more rearrangements, the Q value corresponding to state x″ and an operation executable in state x″ is propagated, and the Q value corresponding to state x⁺ and unloading u is updated.

In claim 6 ,
The operation further includes a relocation target container selection operation for selecting a relocation target container from a yard area container disposed in the yard area of the plurality of containers,
The unloading is an operation of moving one container of the yard area containers when another container is arranged on the relocation target container,
The rearrangement is an operation of moving the relocation target container when no other container is arranged on the relocation target container.
Container terminal operation optimization device.

In claim 7,
When the state shifts from state x⁺ to state x⁺′ by unloading u and an unloading is executed in state x⁺′, the Q value corresponding to state x⁺′ and an unloading executable in state x⁺′ is propagated with a discount, and the Q value corresponding to state x⁺ and unloading u is updated,
When the state shifts from state x⁺ to state x⁺′ by unloading u and then from state x⁺′ to state x″ by one or more rearrangements, the Q value corresponding to state x″ and an operation executable in state x″ is propagated with a discount, and the Q value corresponding to state x⁺ and unloading u is updated,
When the state shifts from state x to state x⁺ by relocation target container selection operation u‴ and an unloading is executed in state x⁺, the Q value corresponding to state x⁺ and an unloading executable in state x⁺ is propagated without a discount, and the Q value corresponding to state x and relocation target container selection operation u‴ is updated,
The procedure is determined by selecting, in each state, the operation having the largest value among the Q values corresponding to that state.
Container terminal operation optimization device.

In claim 8,
The Q value Q(x⁺, u) corresponding to state x⁺ and unloading u is,
when an unloading is executed in the state x⁺′ reached from state x⁺ by unloading u, expressed by the following formula using the Q value Q(x⁺′, u′) corresponding to state x⁺′ and an unloading u′ executable in state x⁺′:
Q(x⁺, u) = (1 − α) Q(x⁺, u) + α [R + γ max Q(x⁺′, u′)]
and,
when the state shifts from state x⁺′ to state x″ by one or more rearrangements, expressed by the following formula using the Q value Q(x″, u″) corresponding to state x″ and an operation u″ executable in state x″:
Q(x⁺, u) = (1 − α) Q(x⁺, u) + α [R + γ max Q(x″, u″)]
and
the Q value Q(x, u‴) corresponding to state x and relocation target container selection operation u‴ is, when an unloading is required to rearrange the relocation target container selected by the operation u‴, expressed by the following formula using the Q value Q(x⁺, u″″) corresponding to the state x⁺ reached from state x by the operation u‴ and an unloading u″″ executable in state x⁺:
Q(x, u‴) = max Q(x⁺, u″″)
Container terminal operation optimization device.

Storing the initial state of multiple containers stored in the yard area in a yard arrangement database;
Storing buffer arrangement data indicating a desired arrangement position of the plurality of containers to be stored in a buffer area in a buffer arrangement database;
Determining, from the yard arrangement data and the buffer arrangement data, a procedure for moving all of the plurality of containers from the yard area to the buffer area by a Q-learning method that uses Q values, each corresponding to a state of the plurality of containers and an operation that moves the plurality of containers from that state to another state,
The operation includes:
Unloading, which moves one container of the plurality of containers from one position of the yard area to another position of the yard area;
Rearrangement, which moves one container of the plurality of containers from one position of the yard area to one position of the buffer area,
When the state x+ is changed to the state x+′ by the unloading u and unloading is executed in the state x+′, the Q value corresponding to the state x+′ and the unloading executable in the state x+′ is propagated, and the Q value corresponding to the state x+ and the unloading u is updated,
When the state x+ is changed to the state x+′ by the unloading u and the state x+′ is then changed to a state x″ by one or more rearrangements, the Q value corresponding to the state x″ and an operation executable in the state x″ is propagated, and the Q value corresponding to the state x+ and the unloading u is updated.
Container terminal operation optimization method.

In claim 10,
The operation further includes a relocation target container selection operation for selecting a relocation target container from a yard area container disposed in the yard area of the plurality of containers,
The unloading is an operation of moving one container of the yard area containers when another container is arranged on the relocation target container,
The rearrangement is an operation of moving the relocation target container when no other container is located on top of the relocation target container,
Container terminal operation optimization method.

  In claim 11,
  When the state x+ is changed to the state x+′ by the unloading u and unloading is executed in the state x+′, the Q value corresponding to the state x+′ and the unloading executable in the state x+′ is propagated with a discount, and the Q value corresponding to the state x+ and the unloading u is updated,
  When the state x+ is changed to the state x+′ by the unloading u and the state x+′ is then changed to a state x″ by one or more rearrangements, the Q value corresponding to the state x″ and an operation executable in the state x″ is propagated with a discount, and the Q value corresponding to the state x+ and the unloading u is updated,
  When the state x is changed to the state x+ by the relocation target container selection operation u′″ and unloading is executed in the state x+, the Q value corresponding to the state x+ and the unloading executable in the state x+ is propagated without a discount, and the Q value corresponding to the state x and the relocation target container selection operation u′″ is updated,
  The procedure is determined by selecting the operation having the largest value among the Q values corresponding to the state.
  Container terminal operation optimization method.
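
The final step of this claim, determining the procedure by selecting the operation with the largest Q value in each state, can be sketched as follows. The callbacks `executable_ops`, `step` and `is_done`, the `max_steps` guard, and the default of 0.0 for unseen entries are hypothetical and stand in for a yard model that the claims do not specify at this level.

```python
def greedy_procedure(Q, start, executable_ops, step, is_done, max_steps=1000):
    """Follow the largest-Q operation from each state until is_done(state).

    Q              -- mapping (state, operation) -> value (assumed layout)
    executable_ops -- hypothetical callback: state -> list of operations
    step           -- hypothetical callback: (state, operation) -> next state
    is_done        -- hypothetical callback: True once all containers are
                      in the buffer area
    """
    procedure, state = [], start
    for _ in range(max_steps):  # guard against cycles in an untrained table
        if is_done(state):
            break
        ops = executable_ops(state)
        if not ops:
            break
        # Select the operation whose Q value is largest in this state.
        best = max(ops, key=lambda op: Q.get((state, op), 0.0))
        procedure.append(best)
        state = step(state, best)
    return procedure
```

The returned list is the ordered sequence of operations; in a full system it would correspond to the order of moves handed to the transfer crane.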

In claim 12,
The Q value Q(x+, u) corresponding to the state x+ and the unloading u is expressed,
when unloading is executed in the state x+′ reached from the state x+ by the unloading u, using the Q value Q(x+′, u′) corresponding to the state x+′ and an unloading u′ executable in the state x+′, by the following formula:
Q(x+, u) = (1 − α) Q(x+, u) + α [R + γ max Q(x+′, u′)]
and, when the state x+′ moves to a state x″ by one or more rearrangements, using the Q value Q(x″, u″) corresponding to the state x″ and an operation u″ executable in the state x″, by the following formula:
Q(x+, u) = (1 − α) Q(x+, u) + α [R + γ max Q(x″, u″)]
The Q value Q(x, u′″) corresponding to the state x and the relocation target container selection operation u′″ is expressed, when unloading is required to rearrange the relocation target container selected by the relocation target container selection operation u′″, using the Q value Q(x+, u″″) corresponding to the state x+ reached from the state x by the relocation target container selection operation u′″ and an unloading u″″ executable in the state x+, by the following formula:
Q(x, u′″) = max Q(x+, u″″)
Container terminal operation optimization method.