JP2964902B2

JP2964902B2 - Learning method of neural network for elevator call assignment

Info

Publication number: JP2964902B2
Application number: JP7058080A
Authority: JP
Inventors: Sandor Markon
Original assignee: Fujitec Co., Ltd.
Current assignee: Fujitec Co., Ltd.
Priority date: 1995-02-21
Filing date: 1995-02-21
Publication date: 1999-10-18
Anticipated expiration: 2014-10-18
Also published as: JPH08225258A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an elevator group management control device that assigns calls using a neural network, and more particularly to a learning method for the neural network used for the assignment.

[0002]

[Prior Art] Conventionally, elevator group management control has mainly relied on call assignment using an evaluation function, or on call assignment control by an expert system based on fuzzy theory. Recently, however, a new approach has been proposed in which calls are assigned by a neural network modeled on biological neural circuits.

[0003] A neural network is a network that imitates the human brain. A plurality of nerve cell models (neurons) are interconnected in a complex manner, and by suitably determining the operation of each neuron and the form of the connections between neurons, pattern recognition and knowledge processing functions can be embedded in the network. Neural networks are described, for example, in "Nikkei Electronics", August 10, 1987 issue (No. 427), pp. 115-124, and in the book "PDP Model" published by Sangyo Tosho Co., Ltd. in February 1989. In particular, networks whose neurons are arranged in a layered structure are characterized in that they can use an autonomous learning algorithm called "backpropagation".

[0004] Using such a neural network has the notable advantage that no human needs to devise the assignment algorithm at all, and that a decision system which ends up selecting the optimum car for assignment under various traffic conditions can be generated automatically. Examples of its use for elevator call assignment include Japanese Patent Application Laid-Open No. 1-275381, "Elevator group management control device"; Japanese Patent Application Laid-Open No. 3-31173, "Elevator group management control device"; and Japanese Patent Application No. 5-243817, "Learning method of neural network for elevator call assignment".

[0005] FIG. 3 shows an example of a neural network for call assignment. As shown in FIG. 3, the neural network NN for call assignment is composed of the neurons of an input layer NR1 corresponding to the input pattern (elevator system state data), an output layer NR3 corresponding to the output pattern (assignment suitability), and an intermediate layer NR2 located between the input layer and the output layer.

[0006] The input pattern is obtained by converting various data representing the state of the elevator system (the floor and direction of each hall call, the position and travel direction of each car, car calls, load state, and so on) into a form that can be input to the neural network as the parameters needed for call assignment. The number of neurons in the input layer corresponds to the total number of these parameters.

[0007] When input data is given to each neuron of the input layer, signals propagate layer by layer toward the output layer, and as a result each neuron of the output layer outputs some value. The output layer has one neuron per elevator car, and the network is trained in advance so that, for various input patterns, the neuron corresponding to the car best suited for the assignment outputs "1" (the maximum value) and the other neurons output "0".

[0008] Accordingly, among the output values of the neurons, the neuron that outputs the value closest to "1" (the maximum value) indicates the car best suited for the assignment, and the car corresponding to that neuron is selected as the assigned car. The number of neurons in the intermediate layer (a single layer in the embodiment, although there may be more than one) is determined as appropriate according to the number of elevators, the nature of the building, and so on.
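The selection described in paragraphs [0007] and [0008] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the sigmoid activation, the absence of bias terms, and the names `layer` and `select_car` are assumptions made here. The source specifies only a layered network with one output neuron per car and selection of the car whose neuron gives the largest output.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1). Assumed activation."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Compute one layer's outputs.

    weights[j][i] is the connection weight from input neuron i to neuron j.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def select_car(state, w_hidden, w_out):
    """Propagate the system state forward and pick the car with the largest output."""
    hidden = layer(state, w_hidden)   # intermediate layer NR2
    outputs = layer(hidden, w_out)    # output layer NR3, one neuron per car
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return best, outputs
```

With three rows in `w_out`, the function returns an index from 0 to 2, which stands for the assigned car.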

[0009] Although not shown in the figure, a connection weight (synaptic weight) representing the strength of the connection is set between neurons. These connection weights are initially set to suitable values, and can then be corrected using a learning algorithm called "backpropagation" so that more accurate call assignment becomes possible.

[0010] Backpropagation is well known, so a detailed description is omitted. In brief, it is an algorithm that uses learning samples prepared in advance (each pairing an input pattern with the desired output pattern for that input, i.e., a teacher signal), compares the network's output pattern for an input pattern with the teacher signal for the same input, and corrects the connection weights so as to minimize the error. First, all weights are initialized (for example, set to random values), and the input pattern of a learning sample is given to the neurons of the input layer. The resulting output pattern is then compared with the output pattern of that learning sample (the teacher signal), and the difference (error) is used to correct the values of the connection weights in order, starting from the output layer side, so that the difference becomes smaller.

[0011] When this is repeated with a large number of learning samples until the error converges, a call assignment function of the same level as the teacher signals is automatically embedded in the neural network, and the network becomes able to assign calls at the same level as the teacher signals not only for the input patterns used in learning but also for unknown input patterns.

[0012]

[Problems to Be Solved by the Invention] Learning by backpropagation as described above always requires teacher signals, but in elevator call assignment it is very difficult to obtain optimal teacher signals. When a new hall call occurs, it is relatively easy to find the optimum car at that moment from the car positions and the status of other calls. In practice, however, before the assigned car answers the call, traffic conditions change in various unpredictable ways: another new call occurs, a stop at an intermediate floor takes longer, and so on. It is therefore very difficult to obtain a truly optimal assignment at the moment the hall call occurs. Moreover, learning by backpropagation has the further problem that the network cannot assign calls more accurately than the teacher signals.

[0013] For this reason, it is conceivable to let the neural network organize itself by using a so-called reinforcement learning method, which requires no teacher signal. Reinforcement learning is already well known, so a detailed description is omitted, but in outline it proceeds as follows. First, all connection weights are initialized. Next, a perturbation is applied to any one or more of the connection weights; that is, the values of those weights are changed slightly. The output of the neural network, or the evaluation value of the system controlled by the neural network, is then compared before and after the perturbation. If it has improved, the perturbation is accepted; if not, the weights are returned to their state before the perturbation. When this procedure is repeated, each connection weight gradually converges toward its optimum value. This is the reinforcement learning method.
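The accept-or-revert loop of paragraph [0013] can be sketched as below. This is an illustrative sketch under stated assumptions: `evaluate` is a hypothetical callback that returns a cost (lower is better) for a flat list of weights, and the random choice of weight and sign, the fixed `step`, and the iteration count are choices made here, not details given in the source.

```python
import random


def perturbation_learning(weights, evaluate, step=0.05, iterations=200, seed=0):
    """Perturb one weight at a time; keep the change only if the cost improves."""
    rng = random.Random(seed)
    weights = list(weights)
    best_cost = evaluate(weights)
    for _ in range(iterations):
        i = rng.randrange(len(weights))             # pick a weight to perturb
        old = weights[i]
        weights[i] = old + rng.choice((-step, step))
        cost = evaluate(weights)
        if cost < best_cost:
            best_cost = cost                        # improved: accept the perturbation
        else:
            weights[i] = old                        # not improved: restore previous value
    return weights, best_cost
```

For example, with `evaluate = lambda w: sum((x - 1.0) ** 2 for x in w)` and starting weights `[0.0, 0.0]`, the returned cost is lower than the starting cost of 2.0.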

[0014] When this reinforcement learning method is applied to a neural network for elevator call assignment, the procedure is generally as shown in FIG. 4. First, in step S1, a (+) perturbation is applied to the assignment neural network; that is, the values of any one or more connection weights are slightly increased. The elevators are then operated for a certain time while calls are assigned by the (+)-perturbed neural network, and the waiting time of each call is tallied from the assignment results in that period (step S2).

[0015] Next, in step S3, a (-) perturbation is applied to the initial neural network; that is, the values of the connection weights are this time slightly decreased. The elevators are operated for a certain time while calls are assigned by the (-)-perturbed neural network in the same manner as above, and the waiting time of each call is tallied from the assignment results in that period (step S4).

[0016] Then, for the (+) perturbation and the (-) perturbation, the assignment performance is compared by calculating, for example, the average waiting time from the tallied results (step S5), and the one with the better performance is selected and the connection weights are updated to its values (step S6). When the above procedure is then repeated with the updated neural network as the new assignment neural network, the values of the connection weights eventually converge to their optimum values.

[0017] In the above procedure, to compare accurately the assignment performance of the (+) and (-) perturbations in step S5, the operations of steps S2 and S4 must be carried out under the same conditions, that is, with the same pattern of call occurrence and so on. Learning on a simulation device would achieve this. However, it is extremely difficult to grasp accurately in advance the nature and traffic conditions of the building in which the elevators will actually be installed, and even if they could be grasped, traffic conditions continue to change after installation. In the end, therefore, reinforcement learning must be continued after installation while the elevators are actually operating in the building.

[0018] However, when reinforcement learning is performed during actual on-site operation, it is very difficult to make the operating conditions of steps S2 and S4 the same, as noted above. To collect and compare statistics such as waiting times, each step requires roughly 30 minutes of operation. Even if step S2 is run during the up-peak, for example, traffic may have returned to normal by the time of step S4; and even within the same up-peak period, the pattern of call occurrence can differ greatly between the first 30 minutes and the second 30 minutes.

[0019] Consequently, to perform reinforcement learning during actual on-site operation by the procedure of FIG. 4, the conditions must be made as similar as possible: for example, if the tally for the (+) perturbation is taken in step S2, the tally for the (-) perturbation in step S4 is taken in the same time slot on the next day or, for greater accuracy, in the same time slot on the same day of the following week. In that case, however, a single learning cycle takes a day or a week. Not only does learning take a very long time to complete, but in buildings where traffic conditions change frequently the learning of the neural network cannot keep up with the changes.

[0020] The present invention has been made in view of these problems, and its object is to provide a method by which reinforcement learning can be carried out in a short period during actual on-site operation.

[0021]

[Means for Solving the Problems] To achieve the above object, in the present invention, perturbations are applied to the connection weights of the assignment neural network to create a plurality of nets with different perturbation amounts. For example, a (+) net perturbed to the (+) side and a (-) net perturbed to the (-) side are created. Taking a short-time operation with assignment by the (+) net followed by a short-time operation with assignment by the (-) net as one cycle, the elevators are operated for a predetermined number of cycles. The assignment performance of the (+) net over that time is then compared with that of the (-) net, and the perturbation with the better performance is applied to the connection weights of the assignment neural network to update it. The connection weights of the assignment neural network are corrected by repeating this procedure thereafter.

[0022]

[Operation] In the present invention, the assignment performance is not compared and the connection weights are not corrected after every single cycle consisting of a short-time operation by the (+) net and a short-time operation by the (-) net. Instead, after the operation has been repeated for a predetermined number of cycles, the assignment performance of each net is compared using the statistics over those cycles, and the connection weights are updated according to the result.

[0023]

[Embodiment] An embodiment of the present invention will be described below. FIG. 1 is a flowchart showing the learning procedure of the present invention, in which reinforcement learning is performed while the elevators are actually operated using the assignment neural network.

[0024] First, in step S1, the initial net is copied and used as the assignment neural network. The initial net is a neural network that can already assign calls to a reasonable degree as it is, such as one that has already completed learning by simulation or one that is already in use for assignment at another site.

[0025] In step S2, the perturbation amount for the i-th connection weight of the output layer of the assignment neural network is set, and in steps S3 and S4 a (+) net given the (+)-side perturbation and a (-) net given the (-)-side perturbation are created. The perturbation amount of the weight may simply be set to a suitable value. Alternatively, calls may actually be assigned with the assignment neural network and the results used to calculate an appropriate perturbation amount, namely one that affects the assignment results while keeping that effect within an acceptable range.

[0026] Next, in step S5, calls are assigned by the (+) net, and elevator operation with the (+) net continues until a fixed number of hall calls have been assigned and served (step S7); meanwhile, the actual waiting time of each hall call is recorded in step S6. When the number of hall calls assigned and served by the (+) net reaches the fixed number, hall calls are assigned in the same way by the (-) net until the fixed number is reached, and the waiting time of each hall call is recorded during that time (steps S8 to S10).

[0027] The fixed number of hall calls is, for example, about five, and is set so that each period of assignment by the (+) net or the (-) net is short enough that traffic conditions do not change significantly within it (for example, 1 to 2 minutes, and about 5 minutes at most). One period of short-time operation by the (+) net and one period of short-time operation by the (-) net together make up one cycle. This is illustrated in FIG. 2.

[0028] In FIG. 2, the start of each arrow marks the time and floor at which a hall call occurred, the end of the arrow marks the time at which a car actually served the call, and the length of the arrow represents the actual waiting time of the call. As shown in FIG. 2, when the number of hall calls served under assignment by the (+) net reaches the fixed number (four in this example, calls 1 to 4), assignment switches to the (-) net, and this is repeated as one cycle. A call that spans both the (+) net assignment period and the (-) net assignment period, such as call 5, is influenced by both nets and is therefore excluded from the evaluation.
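The exclusion rule of paragraph [0028] can be sketched as a simple filter. The `HallCall` record, its field names, and the half-open interval convention are assumptions made for illustration; the source states only that a call spanning two assignment periods is left out of the evaluation.

```python
from dataclasses import dataclass


@dataclass
class HallCall:
    registered: float  # time the hall call occurred (seconds)
    served: float      # time a car actually served the call (seconds)


def waits_within(calls, start, end):
    """Waiting times of calls that were both registered and served in [start, end).

    Calls that straddle the period boundary are influenced by two nets
    and are dropped, as with call 5 in FIG. 2.
    """
    return [c.served - c.registered
            for c in calls
            if start <= c.registered and c.served < end]
```

For a period running from 0 to 100 seconds, a call registered at 55 and served at 130 is dropped, while calls completed inside the period contribute their waiting times.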

[0029] Returning to FIG. 1, steps S5 to S10 are repeated via step S11; that is, the short-time operation in which a (+) net assignment period and a (-) net assignment period form one cycle is repeated until a predetermined number of cycles (about 10 to 20) is reached, whereupon the process proceeds from step S11 to step S12. In step S12, the waiting times of the hall calls during these cycles are tallied separately for the hall calls assigned by the (+) net and those assigned by the (-) net, and the average waiting time for each net is calculated. In step S13, these average waiting times, that is, the assignment performances, are compared.

[0030] Because the comparison is made on the tally over the predetermined number of cycles rather than cycle by cycle, even if traffic conditions vary somewhat from one cycle to the next, the (+) net periods and the (-) net periods as a whole cover nearly the same time span and traffic conditions, so the assignment performance of the two nets can be compared accurately under nearly identical conditions.
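The reasoning of paragraph [0030] can be illustrated with a toy model. Everything in it is an assumption made for illustration: waiting time is taken to grow linearly with the period index as traffic builds up, and both nets are treated as identical, so any gap between them is pure measurement bias caused by the schedule.

```python
from statistics import mean


def traffic_wait(t):
    """Hypothetical average wait in period t; traffic builds up linearly."""
    return 20.0 + 0.5 * t


def sequential_gap(periods=20):
    """Bias when one net runs for the whole first block and the other for the second."""
    first = [traffic_wait(t) for t in range(periods)]
    second = [traffic_wait(t) for t in range(periods, 2 * periods)]
    return mean(second) - mean(first)


def interleaved_gap(periods=20):
    """Bias when the two nets alternate in short periods over the same span."""
    plus = [traffic_wait(t) for t in range(0, 2 * periods, 2)]
    minus = [traffic_wait(t) for t in range(1, 2 * periods, 2)]
    return mean(minus) - mean(plus)
```

Under this model `sequential_gap()` is 10.0 seconds while `interleaved_gap()` is 0.5 seconds, which shows why alternating short periods lets a real difference between nets stand out from the drift.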

[0031] If the comparison shows that the (+) net performed better, the i-th connection weight of the output layer of the assignment neural network is updated by the (+) perturbation; otherwise it is updated by the (-) perturbation (step S14). As the above procedure is repeated thereafter, the connection weights gradually converge, and a neural network best suited to the traffic conditions of the building is built.
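One pass through steps S2 to S14 can be sketched as below. This is a simplified sketch, not the patented implementation: the weights are held in a flat list, and `run_period` is a hypothetical callback that operates the elevators with the given net for one short period and returns the recorded waiting times, assumed to be non-empty over the whole run.

```python
from statistics import mean


def learning_step(weights, i, delta, run_period, cycles=10):
    """Try +delta and -delta on weight i over interleaved cycles; keep the better one."""
    plus_net = list(weights)
    plus_net[i] += delta             # (+) net
    minus_net = list(weights)
    minus_net[i] -= delta            # (-) net

    plus_waits, minus_waits = [], []
    for _ in range(cycles):          # one cycle = (+) period followed by (-) period
        plus_waits.extend(run_period(plus_net))
        minus_waits.extend(run_period(minus_net))

    # Compare pooled averages only after all cycles have run.
    if mean(plus_waits) < mean(minus_waits):
        return plus_net
    return minus_net
```

Calling `learning_step` repeatedly, each time with the returned weights and a chosen index `i`, corresponds to repeating the procedure until the weights settle.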

[0032] In the above embodiment only two nets, the (+) net and the (-) net, are created, but the number is not limited to two. A plurality of nets with different perturbation amounts may be created, a short-time operation performed with each in the same manner as above, and these operations taken together as one cycle; after operation for a predetermined number of cycles, the assignment performance of each net is evaluated, and the perturbation of the best-performing net is applied to the assignment neural network to update it.
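The variant of paragraph [0032] with more than two nets might look like the following sketch. The flat weight list, the `deltas` argument, and the `run_period` callback (which returns the waiting times recorded while the given net assigns calls for one short period) are hypothetical names introduced here for illustration.

```python
from statistics import mean


def best_of_perturbations(weights, i, deltas, run_period, cycles=10):
    """Create one net per perturbation amount and return the best-performing net."""
    nets = []
    for d in deltas:
        net = list(weights)
        net[i] += d
        nets.append(net)

    waits = [[] for _ in nets]
    for _ in range(cycles):                  # one cycle = one short period per net
        for k, net in enumerate(nets):
            waits[k].extend(run_period(net))

    # Lowest pooled average waiting time wins.
    best = min(range(len(nets)), key=lambda k: mean(waits[k]))
    return nets[best]
```

With more nets each cycle takes longer, so the number of candidates trades off against how closely the periods share the same traffic conditions.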

[0033] In the above embodiment the assignment performance is evaluated by the average of the actual waiting times of hall calls, but the evaluation is not limited to this. Other indices may be used, such as the long-wait probability (the probability that a wait exceeding a fixed value occurs) or power consumption, and a combined evaluation of several indices may also be used.
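The alternative indices of paragraph [0033] can be sketched as follows. The 60-second threshold and the weighting factors are illustrative assumptions; the source names the long-wait probability and a combined evaluation but gives no values or formula.

```python
def long_wait_probability(waits, threshold=60.0):
    """Fraction of hall calls whose waiting time exceeded the threshold."""
    return sum(1 for w in waits if w > threshold) / len(waits)


def combined_score(waits, threshold=60.0, w_avg=1.0, w_long=100.0):
    """Weighted sum of average wait and long-wait probability (lower is better)."""
    average = sum(waits) / len(waits)
    return w_avg * average + w_long * long_wait_probability(waits, threshold)
```

For waits of 10, 20, 30 and 100 seconds, the long-wait probability is 0.25 and the combined score under these assumed weights is 65.0.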

[0034] In the above embodiment only the weights of the output layer are corrected, but the present invention can of course be applied in the same way to correcting the weights of the input layer and the intermediate layer.

[0035] In the above embodiment each period of assignment by the (+) or (-) net lasts until a fixed number of hall calls have been served, but it may instead be a predetermined time. In that case, the predetermined time should be longer than the longest expected waiting time (so that samples with long waits can be collected), but, as above, is preferably set to about 5 minutes at most.

[0036]

[Effects of the Invention] According to the present invention, learning by the reinforcement learning method can be carried out in a short time during actual on-site operation. In addition, even when traffic conditions change after the elevators are installed, the learning can promptly follow the change, so that highly accurate assignment performance can always be maintained.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the learning procedure according to the present invention.

FIG. 2 is a diagram for explaining the relationship between the assignment periods of the nets in the present invention.

FIG. 3 is a diagram showing an example of a neural network for call assignment.

FIG. 4 is a flowchart showing a conventional learning procedure.

[Explanation of symbols]

NN: neural network for call assignment
NR1: input layer neurons
NR2: intermediate layer neurons
NR3: output layer neurons
1 to 5: hall calls and their waiting times

Claims

(57) [Claims]

1. A learning method of a neural network for elevator call assignment, the neural network receiving as input, when a new hall call occurs, the system state data of the group at that time and being used to select and assign the optimum car from its output, wherein: perturbations are applied to the connection weights of the assignment neural network to create a plurality of nets with different perturbation amounts; short-time operations, one with assignment by each of the plurality of nets, are taken together as one cycle, and the elevators are operated for a predetermined number of cycles; the assignment performances of the nets over that time are then compared, and the perturbation of the net with the best performance is applied to the connection weights of the assignment neural network to update it; and the connection weights of the assignment neural network are corrected by repeating the above procedure thereafter.

2. The learning method according to claim 1, in which, when a new hall call occurs, the system state data of the group at that time is input and the neural network is used to select and assign the optimum car from its output, wherein: a (+) net in which the connection weights of the assignment neural network are perturbed to the (+) side and a (-) net in which they are perturbed to the (-) side are created; a short-time operation with assignment by the (+) net and a short-time operation with assignment by the (-) net are taken together as one cycle, and the elevators are operated for a predetermined number of cycles; the assignment performance of the (+) net over that time is then compared with that of the (-) net, and the perturbation with the better performance is applied to the connection weights of the assignment neural network to update it; and the connection weights of the assignment neural network are corrected by repeating the above procedure thereafter.

3. The learning method for an elevator call assignment neural network according to claim 1, wherein the assignment performance is evaluated based on an actual waiting time of the hall call.

4. The learning method for an elevator call assignment neural network according to claim 1, wherein the assignment performance is evaluated by a combined evaluation of a plurality of indices.