JP2021047625A

JP2021047625A - Evacuation guidance device, and evacuation guidance model learning device

Info

Publication number: JP2021047625A
Application number: JP2019169638A
Authority: JP
Inventors: Masahiro Obuchi (大渕 正博); Yasushi Tsunekawa (恒川 裕史)
Original assignee: Takenaka Komuten Co Ltd
Current assignee: Takenaka Komuten Co Ltd
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2021-03-25
Anticipated expiration: 2039-09-18
Also published as: JP7415293B2

Abstract

To guide evacuees so as to minimize risk, taking into account evacuation simulation results for a disaster. SOLUTION: An acquisition unit 36 of an evacuation guidance device 30 acquires observation information representing the positions or movements of people when a disaster occurs. An evacuation information generation unit 38 then inputs the observation information acquired by the acquisition unit 36 into a trained model and generates evacuation information, which is information about an evacuation route for the disaster. The trained model has been trained in advance by reinforcement learning, based on the results of an evacuation simulation of a disaster, using a reward set according to the number of deaths, the number of injured, and the time until evacuation is completed in the simulated disaster. A control unit 40 controls a display device 42 according to the evacuation information generated by the evacuation information generation unit 38. SELECTED DRAWING: Figure 1

Description

The present invention relates to an evacuation guidance device and an evacuation guidance model learning device.

Evacuation simulation systems are known in the prior art (for example, Patent Document 1). Such a system uses multi-agent simulation to simulate disaster evacuation in high-rise buildings. It models each evacuee as a single unit of action and sequentially reproduces the state of each individual during evacuation, so that the evacuation situation can be tracked at any point in time. Its purpose is to make it easy to identify bottlenecks that hinder safe evacuation and to examine improvement measures.

An evacuation route output device that outputs an evacuation route avoiding damaged areas is also known (for example, Patent Document 2). This device generates a route by which an evacuation site can be reached safely in the event of a disaster.

An evacuation simulation device that can formulate an evacuation plan quickly and appropriately according to the disaster situation is also known (for example, Patent Document 3). This device calculates the flow along each route based on evacuee density and derives a plurality of candidate optimal evacuation routes that minimize the evacuation completion time. It then calculates evacuee behavior by a multi-agent method and selects, from the candidates, the optimal evacuation route with the shortest evacuation completion time.

Patent Document 1: Japanese Patent No. 5372421
Patent Document 2: Japanese Patent No. 5686479
Patent Document 3: Japanese Patent No. 5996689

When people in a building are guided to evacuate in the event of a disaster, the evacuation route must be presented appropriately, and it must be presented promptly.

However, the technique of Patent Document 1 is intended for easily identifying bottlenecks that hinder safe evacuation and examining improvement measures; it is a technique used when evaluating a building at the planning stage.

The technique of Patent Document 2 outputs an evacuation route that avoids damaged areas when a disaster occurs. In an actual disaster, however, various circumstances other than the damaged areas must also be considered, for example the movement of the people who are evacuating.

The technique described in Patent Document 3 runs a simulation according to the actual disaster situation, but running that simulation takes time, so the technique is not suitable from the standpoint of promptness.

In view of the above, an object of the present invention is to evacuate evacuees so as to minimize risk, taking into account the results of evacuation simulations of a disaster.

To achieve the above object, the evacuation guidance device of the present invention includes: an acquisition unit that acquires observation information representing the position or movement of people when a disaster occurs; an evacuation information generation unit that inputs the observation information acquired by the acquisition unit into a trained model and generates evacuation information, which is information about an evacuation route for the disaster, the trained model having been trained in advance by reinforcement learning, based on the results of an evacuation simulation of a disaster, using a reward set according to the number of deaths, the number of injured, and the time until evacuation is completed in the simulated disaster; and a control unit that controls an evacuation information output device according to the evacuation information generated by the evacuation information generation unit. With this evacuation guidance device, evacuees can be evacuated so as to minimize risk, taking into account the results of evacuation simulations of a disaster.

The trained model of the present invention may be a trained model that has been trained in advance by reinforcement learning according to the results of a plurality of types of evacuation simulations. This allows evacuees to be evacuated so as to minimize risk, taking the results of the plurality of types of evacuation simulations into account.

The evacuation guidance model learning device of the present invention includes a learning unit that executes an evacuation simulation of a disaster and, based on the simulation results, uses a reward set according to the number of deaths, the number of injured, and the time until evacuation is completed in the simulated disaster to train, by reinforcement learning, a model for outputting information about an evacuation route for a disaster from observation information representing the position or movement of people when the disaster occurs, thereby obtaining a trained model that outputs the information about the evacuation route from the observation information. With this evacuation guidance model learning device, a trained model for evacuating evacuees so as to minimize risk in the event of a disaster can be obtained.

The learning unit of the present invention may execute a plurality of types of evacuation simulations and obtain the trained model based on the results of those simulations. This makes it possible to obtain a trained model that takes the plurality of types of evacuation simulations into account and evacuates evacuees so as to minimize risk in the event of a disaster.

The present invention has the effect that evacuees can be evacuated so as to minimize risk, taking into account the results of evacuation simulations of a disaster.

FIG. 1 is a block diagram showing the schematic configuration of the evacuation guidance model learning device according to this embodiment.
FIG. 2 is an explanatory diagram for explaining the relationship between the results of an evacuation simulation and the reward.
FIG. 3 is a block diagram showing the schematic configuration of the evacuation guidance device according to this embodiment.
FIG. 4 is a diagram showing an example of how the display devices are installed in a building.
FIG. 5 is a diagram showing an example of the learning processing routine of the first embodiment.
FIG. 6 is a diagram showing an example of the evacuation guidance processing routine of this embodiment.
FIG. 7 is an explanatory diagram for explaining evacuation within a building and evacuation of a city block according to the second embodiment.
FIG. 8 is an explanatory diagram for explaining the relationships among the variables of the second embodiment.
FIG. 9 is a diagram showing an example of the learning processing routine of the second embodiment.
FIG. 10 is a diagram showing an example of the learning processing routine of the second embodiment.

<Outline of this embodiment>

If a disaster occurs while people are in a building, the affected people themselves compare evacuation routes and choose the one they consider best when evacuating. The quality of their judgment depends on their knowledge of disasters and on the information available to them, so in some situations they may choose a dangerous evacuation route.

Against this background, methods for presenting evacuation instructions using evacuation simulation have been proposed (see, for example, Japanese Patent No. 5996689). The conventional techniques, however, identify the input parameter, namely the evacuation instruction, so that the result (output) of the evacuation simulation is optimal. Consequently, when evacuation instructions are optimized by considering the simulation results of multiple methods and multiple models in parallel, this identification becomes harder as the number of methods and models grows.

By contrast, a trained model obtained by machine learning learns the relationship between the inputs and outputs of a simulation and can then select the optimal input. Evacuation simulations of different types are therefore easy to use together, provided that they have the same input and output items.

This embodiment therefore proposes a method that makes it possible to use multiple evacuation simulations together by using a trained model obtained by machine learning to decide evacuation instructions. According to this embodiment, the results of multiple evacuation simulations can be taken into account, and evacuees can be evacuated so as to minimize risk.

Embodiments of the present invention are described in detail below.

<First Embodiment>

<System configuration of evacuation guidance model learning device>

FIG. 1 is a block diagram showing an example of the configuration of an evacuation guidance model learning device 10 according to the first embodiment of the present invention. Functionally, as shown in FIG. 1, the evacuation guidance model learning device 10 can be represented as a configuration including a reception unit 12 and a computer 20.

The reception unit 12 receives information input by a user. The reception unit 12 is implemented by, for example, a keyboard and a mouse. The reception unit 12 receives specification information representing the specifications of a virtual learning building on which the evacuation simulation is to be executed. The learning building is a virtual building on a computer that is used for simulation by the learning unit 24, described later. The specification information includes, for example, information on the arrangement of rooms in the learning building, information representing its structural type, information representing its materials, information on its number of floors, and information on its equipment.

The reception unit 12 also receives condition information, which is information on the various conditions under which the evacuation simulation is executed. The condition information includes disaster occurrence condition information, which describes the conditions under which a virtual disaster occurs in the evacuation simulation, and evacuation instruction condition information, which describes the conditions of the evacuation instructions given to virtual people in the building when the virtual disaster occurs. The condition information also includes information on how the virtual evacuees are distributed.

The setting information storage unit 22 stores the specification information and the condition information received by the reception unit 12. The learning unit 24, described later, executes the evacuation simulation according to the specification information and the condition information stored in the setting information storage unit 22.

The learning unit 24 executes an evacuation simulation of a disaster occurring in the learning building, based on the specification information and the condition information stored in the setting information storage unit 22. While the evacuation simulation is running, the position and movement of each virtual person at each time are recorded sequentially in a predetermined storage area (not shown). The evacuation simulation used in this embodiment is the same as existing evacuation simulations and uses a conventional technique (for example, the technique described in Japanese Patent No. 5372421). The number of evacuation simulation runs can be set in the same way as in conventional reinforcement learning.

Based on the simulation results, the learning unit 24 then trains, by reinforcement learning, a model for outputting evacuation information, which is information about an evacuation route for a disaster, from observation information representing the position or movement of people in the building when the disaster occurs. Through this reinforcement learning, the model learns what evacuation instruction should be issued for a given disaster situation and given evacuee positions.

Reinforcement learning is described below. Reinforcement learning is a method of learning optimal actions through trial and error in an environment. In reinforcement learning, the reward takes the place of training data. With γ as the discount rate of the reward and r_{t+k+1} as the reward at each step, the cumulative reward R_t is defined by equation (1) below, where t denotes time.

$$ R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \tag{1} $$
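As a minimal sketch of the cumulative reward described above, the following computes a discounted sum over a finite list of rewards. The function name `discounted_return` and the truncation to a finite episode are assumptions made for illustration; they do not come from the patent.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence.

    rewards[0] plays the role of r_{t+1}, rewards[1] of r_{t+2}, and so on.
    The finite horizon is an illustrative assumption.
    """
    total = 0.0
    # Work backwards so each step applies R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

A discount rate below 1 weights rewards received sooner more heavily than rewards received later.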

Under a policy π, the value of selecting action a in state s is expressed by the action value function Q^π(s, a) of equation (2) below, where E_π{·} denotes the expected value.

$$ Q^{\pi}(s, a) = E_{\pi}\{ R_t \mid s_t = s,\ a_t = a \} = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s,\ a_t = a \Big\} \tag{2} $$

Using the action value function Q^π(s, a) of equation (2), the action a with the highest value is selected. The optimal action value function Q^* is given by equation (3) below.

$$ Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a) \tag{3} $$

One method of learning the action value function Q^*(s, a) is Q-Learning (see, for example, Watkins, C.J.C.H., "Learning from Delayed Rewards", 1989). As shown in equation (4) below, Q-Learning learns by successively updating the Q value. Here α is a constant set in advance. In this embodiment, the action value function is trained by reinforcement learning using the Q-Learning of equation (4).

$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Big] \tag{4} $$
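The Q-Learning update can be sketched for a tabular Q function as follows. This is an illustration of the standard Q-Learning rule, not the patent's implementation; the state and action labels, the dictionary representation, and the default values of `alpha` and `gamma` are all assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q value.

    q maps (state, action) pairs to values; unseen pairs default to 0.0.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted best next value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)
    actions = ["left", "right"]  # hypothetical evacuation instructions
    print(q_learning_update(q, "s0", "left", 1.0, "s1", actions, alpha=0.5))
```

With `alpha` close to 0 the table changes slowly; with `alpha` close to 1 each new sample largely overwrites the previous estimate.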

In this embodiment, the disaster situation, the positions of the affected people, and so on constitute the state s; the evacuation instruction determined by the state s and the policy π is the action a; and affected people who see a display device showing the evacuation instruction a are assumed to evacuate accordingly. A trained model obtained by Q-Learning can select the evacuation instruction a according to the policy π so that the action value function Q is optimal.

Based on a risk assessment that uses the number of casualties and the time taken on the evacuation route, the learning unit 24 of this embodiment uses a reward r_t set according to the number of deaths D, the number of injured I, and the time T until evacuation is completed in the simulated disaster to train, by reinforcement learning, the model for outputting information about the evacuation route from the observation information. Specifically, this embodiment sets the reward r_t given by equation (5) below.

(5)

In equation (5), D is the number of deaths, I is the number of injured, and T is the time until evacuation is completed. C_d is a coefficient representing the loss per death, C_i is a coefficient representing the loss per injured person, and C_t is a coefficient relating evacuation time to loss. C_d, C_i, and C_t are set in advance.
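The exact form of equation (5) is not reproduced in this text. As a hedged illustration only, the sketch below assumes that the reward is the negative of a weighted sum of losses, which is one form consistent with the definitions of D, I, T, C_d, C_i, and C_t given above. The function name and the default coefficient values are hypothetical.

```python
def evacuation_reward(deaths, injuries, evac_time,
                      c_d=100.0, c_i=10.0, c_t=1.0):
    """Return a reward that grows as the total loss shrinks.

    Assumed form (not confirmed by the source): r = -(C_d*D + C_i*I + C_t*T).
    The default coefficients are placeholders chosen for illustration.
    """
    loss = c_d * deaths + c_i * injuries + c_t * evac_time
    return -loss


if __name__ == "__main__":
    # Fewer casualties and a shorter evacuation give a higher reward.
    print(evacuation_reward(deaths=0, injuries=1, evac_time=12.0))
    print(evacuation_reward(deaths=2, injuries=5, evac_time=30.0))
```

Under this assumed form, the relative sizes of the coefficients determine how evacuation time is traded off against casualties.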

FIG. 2 is an explanatory diagram for explaining the relationship between the results of an evacuation simulation and the reward. As shown in FIG. 2, suppose that a plurality of evacuees U are present in a building and that an evacuation simulation is executed for the case where a fire F, an example of a disaster, breaks out. In this case, when evacuation instruction A is issued, the evacuation time is X1 minutes, there are Y1 deaths and Z1 injured, and the reward is high. When evacuation instruction B is issued, the evacuation time is X2 minutes, there are Y2 deaths and Z2 injured, and the reward is medium. When evacuation instruction C is issued, the evacuation time is X3 minutes, there are Y3 deaths and Z3 injured, and the reward is low. Because the simulation results are tied to rewards in this way, this embodiment trains, by reinforcement learning, the model for outputting evacuation information from observation information on the basis of the reward corresponding to the simulation results.

Specifically, the learning unit 24 trains the model for outputting evacuation information from observation information by reinforcement learning so that the reward r_t given by equation (5) becomes large, and obtains the action value function Q^*(s, a), which is an example of a trained model. When observation information representing the state s is input to the action value function Q^*(s, a), an evacuation instruction representing the action a corresponding to that observation information is output as an example of evacuation information.

Any function may be used as the model of the action value function for outputting evacuation information from observation information. For example, a neural network model can be used as the model of the action value function. Alternatively, a table in which observation information representing the state s is associated with evacuation instructions representing the action a (also called a Q table) may be used.

The trained model storage unit 26 stores the action value function Q^*(s, a), which is an example of the trained model obtained by the learning unit 24. The action value function Q^*(s, a) is used in the evacuation guidance device described later: when observation information is input to Q^*(s, a) at each time, an evacuation instruction, which is an example of evacuation information, is shown on the display devices.

Conventional simulation-based optimization of evacuation instructions for a disaster identifies the input parameter, namely the evacuation instruction, so that the result is optimal. Consequently, when evacuation instructions are optimized by considering the simulation results of multiple methods and multiple models in parallel, the identification becomes harder as the number of methods and models grows.

By contrast, this embodiment converts the results of an evacuation simulation into the variables state s and action a and then selects the optimal policy π. Therefore, even for simulation results from different methods or different models, the optimal policy π can be selected without changing the framework, as long as the results can be converted into a common state s and a common action a. This makes it possible to combine entirely different simulation results in parallel, for example an evacuation simulation for an individual building and a simulation of evacuation from river flooding for a whole area, and to evaluate the optimal evacuation instruction.
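The idea of training one Q function on several simulators that share a common state and action representation can be sketched as follows. Everything here is an assumption made for illustration: the two simulator classes are toy stand-ins with invented rewards, the `reset`/`step` interface is hypothetical, and the round-robin schedule and epsilon-greedy exploration are choices of this sketch rather than of the patent.

```python
import random
from collections import defaultdict

ACTIONS = ["exit_a", "exit_b"]  # hypothetical common evacuation instructions


class BuildingFireSim:
    """Toy stand-in for a building-scale fire evacuation simulator."""

    def reset(self):
        return "start"

    def step(self, action):
        # In this toy scenario exit_a is blocked by the fire.
        reward = -10.0 if action == "exit_a" else -1.0
        return "done", reward, True


class RiverFloodSim:
    """Toy stand-in for an area-scale river flood evacuation simulator."""

    def reset(self):
        return "start"

    def step(self, action):
        reward = -5.0 if action == "exit_a" else -2.0
        return "done", reward, True


def train(simulators, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Train one shared Q table by cycling through the simulators."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for episode in range(episodes):
        sim = simulators[episode % len(simulators)]  # round-robin over simulators
        state, done = sim.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = sim.step(action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    q = train([BuildingFireSim(), RiverFloodSim()])
    print(max(ACTIONS, key=lambda a: q[("start", a)]))
```

Because both simulators expose the same states and actions, adding a third simulator only requires appending it to the list passed to `train`.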

<System configuration of evacuation guidance device>

FIG. 3 is a block diagram showing an example of the configuration of an evacuation guidance device 30 according to the embodiment of the present invention. Functionally, as shown in FIG. 3, the evacuation guidance device 30 can be represented as a configuration including an observation device 32, a computer 34, and a plurality of display devices 42. The display device 42 is an example of the evacuation information output device of the present invention.

The observation device 32 and the plurality of display devices 42 of the evacuation guidance device 30 are installed in a target building equivalent to the learning building. For example, the target building is a building constructed according to the design drawings of the virtual learning building. When a disaster occurs, the evacuation guidance device 30 controls what is shown on the display devices 42 installed in the building, based on the information observed sequentially by the observation device 32 installed in the building, and thereby guides the evacuation of the people in the building. The details are described below.

The observation device 32 is installed in the building and sequentially acquires observation information representing the position or movement of people in the building when a disaster occurs. A terminal with a Global Positioning System (GPS) function, such as a mobile terminal carried by a person, can be used as the observation device 32, for example. For evacuation guidance inside a building, a system that determines the movement of people from image data captured by cameras installed in the building can be used (for example, Vitracom Site View from Kozo Keikaku Engineering (構造計画研究所) or Crowd Walk from the National Institute of Advanced Industrial Science and Technology). The observation device 32 may also observe the disaster situation (for example, how far a fire has spread or the degree to which the building has collapsed due to an earthquake).

The computer 34 includes a CPU (Central Processing Unit), a ROM (Read Only Memory) that stores programs and the like for implementing each processing routine, a RAM (Random Access Memory) that temporarily stores data, a memory serving as storage means, a network interface, and so on. Functionally, as shown in FIG. 3, the computer 34 includes an acquisition unit 36, a trained model storage unit 37, an evacuation information generation unit 38, and a control unit 40.

The acquisition unit 36 acquires the observation information sequentially acquired by the observation device 32. When the observation information is, for example, an image captured by a camera, the acquisition unit 36 detects the position and movement of the people in the image by predetermined image processing.

The trained model storage unit 37 stores the same trained model as the one stored in the trained model storage unit 26 of the evacuation guidance model learning device 10. The trained model of this embodiment is the action value function Q^*(s, a).

The evacuation information generation unit 38 reads out the action value function Q^*(s, a) stored as the trained model in the trained model storage unit 37. The evacuation information generation unit 38 then inputs the observation information s acquired by the acquisition unit 36 into the action value function Q^*(s, a) and generates an evacuation instruction a for the disaster. Evacuees are assumed to act in accordance with the evacuation instruction a.
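Generating an evacuation instruction from a trained action value function amounts to choosing the action with the highest Q value for the observed state. The sketch below assumes a Q table stored as a dictionary keyed by (state, instruction) pairs; the state and instruction labels are hypothetical, and treating unseen pairs as 0.0 is an assumption of this sketch.

```python
def select_instruction(q_table, state, instructions):
    """Return the instruction with the highest Q value for the given state.

    q_table maps (state, instruction) to a value; missing pairs count as 0.0.
    On ties, the first instruction in the list is returned.
    """
    return max(instructions, key=lambda a: q_table.get((state, a), 0.0))


if __name__ == "__main__":
    # Hypothetical trained values for one observed state.
    q_table = {
        ("fire_east", "go_west"): 5.0,
        ("fire_east", "go_east"): -3.0,
    }
    print(select_instruction(q_table, "fire_east", ["go_east", "go_west"]))
```

If the action value function is a neural network rather than a table, the same selection applies to the network's per-instruction outputs.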

制御部４０は、避難情報生成部３８によって生成された避難情報に応じて、災害が発生した建物内に設置された複数の表示装置４２を制御する。 The control unit 40 controls a plurality of display devices 42 installed in the building where the disaster has occurred according to the evacuation information generated by the evacuation information generation unit 38.

複数の表示装置４２の各々は、図４に示されるように、建物の各箇所に設置される。そして、複数の表示装置４２の各々は、制御部４０による制御に応じて各箇所個別に表示を変更させる。表示装置４２に表示される内容は、例えば、避難指示に応じた避難方向を表す矢印又は避難指示に応じた文章（例えば、「右手方向は通行不可です。左手方向から避難してください。」）が表示される。これにより、災害が発生した際に人々を適切に避難誘導することができる。 Each of the plurality of display devices 42 is installed at a location in the building, as shown in FIG. 4. Each display device 42 changes its display individually for its location under the control of the control unit 40. The display device 42 shows, for example, an arrow indicating the evacuation direction corresponding to the evacuation instruction, or a sentence corresponding to the evacuation instruction (for example, "The right-hand direction is impassable. Please evacuate via the left-hand direction."). This makes it possible to appropriately guide people to evacuate when a disaster occurs.
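The per-location display control can be pictured with a small sketch. It assumes a hypothetical instruction format in which each display ID maps to an evacuation direction and an optional blocked direction; the display IDs, the `ARROWS` table, and both function names are illustrative only.

```python
# Hypothetical mapping from an evacuation direction to an arrow glyph.
ARROWS = {"left": "←", "right": "→"}


def render_message(direction, blocked=None):
    """Build the content for one display: an arrow, plus a sentence when a direction is blocked."""
    arrow = ARROWS[direction]
    if blocked is None:
        return arrow
    return (f"{arrow} The {blocked}-hand direction is impassable. "
            f"Please evacuate via the {direction}-hand direction.")


def control_displays(instruction):
    """instruction maps a display ID to (direction, blocked direction or None)."""
    return {display_id: render_message(direction, blocked)
            for display_id, (direction, blocked) in instruction.items()}
```

Because the result is keyed by display ID, each display can show different content, which matches the idea that the display is changed individually at each location.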

＜避難誘導モデル学習装置の作用＞ <Operation of evacuation guidance model learning device>

次に、避難誘導モデル学習装置１０の作用を説明する。避難誘導モデル学習装置１０は、図５の学習処理ルーチンを実行する。 Next, the operation of the evacuation guidance model learning device 10 will be described. The evacuation guidance model learning device 10 executes the learning processing routine shown in FIG.

＜学習処理ルーチン＞ <Learning processing routine>

仕様情報と条件情報とが避難誘導モデル学習装置１０に入力され、受付部１２が仕様情報と条件情報とを受け付けると、設定情報記憶部２２に、仕様情報と条件情報とが格納される。そして、避難誘導モデル学習装置１０は、学習処理の指示信号を受け付けると、図５に示される学習処理ルーチンを実行する。 When the specification information and the condition information are input to the evacuation guidance model learning device 10 and the reception unit 12 receives the specification information and the condition information, the setting information storage unit 22 stores the specification information and the condition information. Then, when the evacuation guidance model learning device 10 receives the instruction signal for the learning process, the evacuation guidance model learning device 10 executes the learning process routine shown in FIG.

ステップＳ１００において、学習部２４は、設定情報記憶部２２に格納された仕様情報と条件情報とを読み込む。 In step S100, the learning unit 24 reads the specification information and the condition information stored in the setting information storage unit 22.

ステップＳ１０２において、学習部２４は、上記ステップＳ１００で読み込まれた条件情報のうちの災害発生条件情報に基づき、避難シミュレーションにおける災害発生条件を設定する。例えば、学習部２４は、災害発生条件情報に基づき、建物内の火災が発生する場所及びその規模等を災害発生条件として設定する。 In step S102, the learning unit 24 sets the disaster occurrence condition in the evacuation simulation based on the disaster occurrence condition information in the condition information read in step S100. For example, the learning unit 24 sets the location and scale of the fire in the building as the disaster occurrence condition based on the disaster occurrence condition information.

ステップＳ１０４において、学習部２４は、上記ステップＳ１００で読み込まれた条件情報のうちの避難指示条件情報に基づき、避難シミュレーションにおける避難指示条件を設定する。例えば、ある場所で火災が発生した際には、被災者はその場所から離れるように避難指示が出されるような避難指示条件が設定される。避難シミュレーションにおいて、避難指示条件に応じた様々な避難指示が出され、その避難指示による被災者の行動と結果に基づき、後述する行動価値関数Ｑ^*（ｓ，ａ）が学習される。 In step S104, the learning unit 24 sets the evacuation instruction condition for the evacuation simulation based on the evacuation instruction condition information in the condition information read in step S100. For example, the evacuation instruction condition is set so that, when a fire breaks out at a certain location, the victims are instructed to move away from that location. In the evacuation simulation, various evacuation instructions are issued according to the evacuation instruction condition, and the action value function Q^*(s, a) described later is learned based on the victims' actions in response to those instructions and on the outcomes.

ステップＳ１０６において、学習部２４は、上記ステップＳ１００で読み込まれた建物の仕様情報と、上記ステップＳ１０２で設定された災害発生条件と、上記ステップＳ１０４で設定された避難指示条件とに基づいて、学習用の建物において災害が発生した際の避難シミュレーションを実行する。 In step S106, the learning unit 24 executes an evacuation simulation of a disaster occurring in the training building, based on the building specification information read in step S100, the disaster occurrence condition set in step S102, and the evacuation instruction condition set in step S104.

ステップＳ１０８において、学習部２４は、上記ステップＳ１０６で実行された避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S108, the learning unit 24 stores the simulation result of the evacuation simulation executed in step S106 in a storage unit (not shown).

ステップＳ１０９において、学習部２４は、上記ステップＳ１０８に格納されたシミュレーション結果に基づいて、報酬ｒ_ｔが大きくなるように、観測情報から避難経路に関する情報を出力するためのモデルを強化学習させ、学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S109, based on the simulation result stored in step S108, the learning unit 24 performs reinforcement learning on the model for outputting information on the evacuation route from the observation information so that the reward r_t becomes large, and obtains the action value function Q^*(s, a), which is an example of the trained model.

ステップＳ１１０において、学習部２４は、所定回数の避難シミュレーションが実行されたか否かを判定する。所定回数の避難シミュレーションが実行された場合には、ステップＳ１１２へ進む。一方、所定回数の避難シミュレーションが実行さていない場合には、ステップＳ１０４へ戻る。これにより、避難指示条件に応じた避難指示のみが変更された避難シミュレーションが必要な回数実行される。 In step S110, the learning unit 24 determines whether or not the evacuation simulation has been executed a predetermined number of times. When the evacuation simulation is executed a predetermined number of times, the process proceeds to step S112. On the other hand, if the evacuation simulation has not been executed a predetermined number of times, the process returns to step S104. As a result, the evacuation simulation in which only the evacuation instruction according to the evacuation instruction condition is changed is executed as many times as necessary.

ステップＳ１１２において、学習部２４は、全ての災害発生条件の避難シミュレーションが実行されたか否かを判定する。全ての災害発生条件の避難シミュレーションが実行された場合には、ステップＳ１１４へ進む。一方、避難シミュレーションが実行さていない災害発生条件が存在する場合には、ステップＳ１０２へ戻る。これにより、災害の災害発生条件のみが変更された避難シミュレーションが実行され、想定される災害についての避難シミュレーションが実行される。 In step S112, the learning unit 24 determines whether or not the evacuation simulation of all the disaster occurrence conditions has been executed. When the evacuation simulation of all the disaster occurrence conditions is executed, the process proceeds to step S114. On the other hand, if there is a disaster occurrence condition for which the evacuation simulation has not been executed, the process returns to step S102. As a result, the evacuation simulation in which only the disaster occurrence conditions of the disaster are changed is executed, and the evacuation simulation for the assumed disaster is executed.

ステップＳ１１４において、学習部２４は、上記ステップＳ１０９で学習された、学習済みの行動価値関数Ｑ^*（ｓ，ａ）を学習済みモデル記憶部２６に格納して、学習処理ルーチンを終了する。 In step S114, the learning unit 24 stores the trained action value function Q^*(s, a) obtained in step S109 in the trained model storage unit 26, and ends the learning processing routine.
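The loop structure of steps S100 to S114 can be sketched as two nested loops around a simulator call and a value update. Everything concrete below is an assumption made for illustration: `simulate` is a toy stand-in for the evacuation simulator, the weights in `reward` are hypothetical (the source defines the reward elsewhere in terms of deaths, injuries, and evacuation time), and each simulation is treated as a single-step episode so the tabular update has no bootstrap term.

```python
def simulate(fire_location, instruction):
    """Toy stand-in for the evacuation simulator: returns (deaths, injuries, time in seconds)."""
    toward_fire = (fire_location, instruction) in {("west", "go_west"), ("east", "go_east")}
    return (3, 10, 300.0) if toward_fire else (0, 1, 120.0)


def reward(deaths, injuries, time_s):
    """Hypothetical reward: fewer deaths, fewer injuries, and shorter evacuation score higher."""
    return -(100.0 * deaths + 10.0 * injuries + 0.1 * time_s)


def train(disaster_conditions, instructions, runs_per_condition, alpha=0.5):
    q = {}
    for fire in disaster_conditions:                    # S102, repeated via S112
        for run in range(runs_per_condition):           # S104, repeated via S110
            action = instructions[run % len(instructions)]  # only the instruction varies
            r = reward(*simulate(fire, action))         # S106 and S108
            old = q.get((fire, action), 0.0)
            q[(fire, action)] = old + alpha * (r - old)  # S109, single-step update
    return q                                             # stored in S114
```

With these toy numbers, `train(["west", "east"], ["go_west", "go_east"], 4)` ends up valuing the instruction that leads away from the fire more highly in both disaster conditions.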

＜避難誘導装置の作用＞ <Operation of evacuation guidance device>

次に、避難誘導装置３０の作用を説明する。避難誘導装置３０は、図６の避難誘導処理ルーチンを実行する。 Next, the operation of the evacuation guidance device 30 will be described. The evacuation guidance device 30 executes the evacuation guidance processing routine shown in FIG.

＜避難誘導処理ルーチン＞ <Evacuation guidance processing routine>

避難誘導モデル学習装置１０によって学習された学習済みの行動価値関数Ｑ^*（ｓ，ａ）が避難誘導装置３０へ入力されると、学習済みの行動価値関数Ｑ^*（ｓ，ａ）は学習済みモデル記憶部３７へ格納される。 When the trained action value function Q^*(s, a) learned by the evacuation guidance model learning device 10 is input to the evacuation guidance device 30, it is stored in the trained model storage unit 37.

そして、避難誘導装置３０が設置された建物内において災害が発生したことが検知されると、避難誘導装置３０は、図６に示す避難誘導処理ルーチンを実行する。避難誘導装置３０は、観測装置３２によって観測情報が得られる毎に、図６に示す避難誘導処理ルーチンを実行する。 Then, when it is detected that a disaster has occurred in the building where the evacuation guidance device 30 is installed, the evacuation guidance device 30 executes the evacuation guidance processing routine shown in FIG. The evacuation guidance device 30 executes the evacuation guidance processing routine shown in FIG. 6 every time observation information is obtained by the observation device 32.

ステップＳ２００において、取得部３６は、観測装置３２によって取得された観測情報を取得する。観測情報は、被災した建物内の被災者の位置及び動き等である。 In step S200, the acquisition unit 36 acquires the observation information acquired by the observation device 32. The observation information is the position and movement of the victim in the damaged building.

ステップＳ２０２において、避難情報生成部３８は、学習済みモデル記憶部３７に格納された学習済みモデルとしての行動価値関数Ｑ^*（ｓ，ａ）を読み出す。 In step S202, the evacuation information generation unit 38 reads out the action value function Q^*(s, a) stored as the trained model in the trained model storage unit 37.

ステップＳ２０４において、避難情報生成部３８は、上記ステップＳ２００で取得された観測情報を、上記ステップＳ２０２で読み出された行動価値関数Ｑ^*（ｓ，ａ）へ入力して、避難経路に関する情報である避難情報を生成する。具体的には、観測情報が行動価値関数Ｑ^*（ｓ，ａ）へ入力されると、行動価値関数Ｑ^*（ｓ，ａ）から避難情報の一例である避難指示が出力される。 In step S204, the evacuation information generation unit 38 inputs the observation information acquired in step S200 into the action value function Q^*(s, a) read out in step S202 to generate evacuation information, which is information on the evacuation route. Specifically, when the observation information is input to the action value function Q^*(s, a), an evacuation instruction, which is an example of the evacuation information, is output from the action value function Q^*(s, a).

ステップＳ２０６において、制御部４０は、避難情報生成部３８によって生成された避難指示に応じて、災害が発生した建物内に設置された複数の表示装置４２を制御して、避難誘導処理ルーチンを終了する。 In step S206, the control unit 40 controls the plurality of display devices 42 installed in the building where the disaster has occurred in accordance with the evacuation instruction generated by the evacuation information generation unit 38, and ends the evacuation guidance processing routine.

複数の表示装置４２の各々は、制御部４０による制御に応じて表示を変更させる。建物内の避難者は、複数の表示装置４２の各々に表示された避難指示に従って避難する。 Each of the plurality of display devices 42 changes the display according to the control by the control unit 40. The evacuees in the building evacuate according to the evacuation instructions displayed on each of the plurality of display devices 42.
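Steps S200 to S206 amount to one pass of observe, select, and display, repeated each time new observation information arrives. The following is a minimal sketch of a single pass, assuming a tabular Q and a greedy choice over a fixed list of candidate instructions; the callback parameters `read_observation` and `apply_to_displays` are hypothetical placeholders for the observation device 32 and the display control.

```python
def guidance_step(read_observation, q_table, actions, apply_to_displays):
    """Run one pass of the guidance routine and return the chosen instruction."""
    state = read_observation()                                              # S200
    best = max(actions, key=lambda action: q_table.get((state, action), 0.0))  # S202 to S204
    apply_to_displays(best)                                                 # S206
    return best
```

A caller would invoke `guidance_step` once per new observation, so the displayed instruction can change as people move through the building.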

以上詳細に説明したように、本実施形態の避難誘導装置は、災害が発生した際の人の位置又は動きを表す観測情報を、避難シミュレーションの災害における死者数、負傷者数、及び避難が完了するまでの時間に応じて設定された報酬を用いて予め強化学習された学習済みモデルへ入力して、災害が発生した際の避難経路に関する情報である避難情報を生成し、避難情報に応じて建物内に設置された表示装置を制御する。これにより、災害が発生した際の避難シミュレーション結果を考慮して、リスクを最小化するように避難者を避難させることができる。 As described in detail above, the evacuation guidance device of the present embodiment inputs observation information representing the position or movement of people when a disaster occurs into a trained model that has been trained in advance by reinforcement learning using a reward set according to the number of deaths, the number of injuries, and the time until evacuation is completed in the disaster of the evacuation simulation, generates evacuation information, which is information on the evacuation route when the disaster occurs, and controls the display devices installed in the building according to the evacuation information. This makes it possible to evacuate the evacuees so as to minimize risk, taking into account the results of the evacuation simulation of the disaster.

また、本実施形態の避難誘導モデル学習装置は、災害が発生した際の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、避難シミュレーションの災害における死者数、負傷者数、及び避難が完了するまでの時間に応じて設定された報酬を用いて、災害が発生した際の人の位置又は動きを表す観測情報から災害が発生した際の避難経路に関する情報である避難情報を出力するためのモデルを強化学習させて、学習済みモデルを得る。これにより、災害が発生した場合に、リスクを最小化するように避難者を避難させるための学習済みモデルを取得することができる。 The evacuation guidance model learning device of the present embodiment executes an evacuation simulation of a disaster and, based on the simulation result, uses a reward set according to the number of deaths, the number of injuries, and the time until evacuation is completed in the simulated disaster to perform reinforcement learning on a model for outputting evacuation information, which is information on the evacuation route when a disaster occurs, from observation information representing the position or movement of people when the disaster occurs, thereby obtaining a trained model. This makes it possible to obtain a trained model for evacuating evacuees so as to minimize risk when a disaster occurs.

また、本実施形態では、避難シミュレーションのシミュレーション結果を状態ｓ，行動ａ，方策πという変数に変換した後、これらの３変数の関係性をモデルに学習させることで、避難指示を最適化することができる。すなわち、最適な避難指示の判定において、強化学習によって得られる学習済みモデルを利用することで、異なる手法や異なるモデルによるシミュレーション結果を並列的に考慮することができる。 Further, in the present embodiment, after converting the simulation result of the evacuation simulation into the variables of the state s, the action a, and the policy π, the evacuation instruction is optimized by learning the relationship between these three variables in the model. Can be done. That is, in determining the optimum evacuation instruction, by using the trained model obtained by reinforcement learning, it is possible to consider different methods and simulation results by different models in parallel.

＜第２実施形態＞ <Second Embodiment>

次に、第２実施形態について説明する。第２実施形態では、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて学習済みモデルを得て、その学習済みモデルを用いて避難情報を取得する点が第１実施形態と異なる。なお、第２実施形態に係る各装置の構成は、第１実施形態と同様の構成となるため、同一符号を付して説明を省略する。 Next, the second embodiment will be described. The second embodiment is different from the first embodiment in that a plurality of types of evacuation simulations are executed, a trained model is obtained based on the simulation results, and evacuation information is acquired using the trained model. Since the configuration of each device according to the second embodiment has the same configuration as that of the first embodiment, the same reference numerals are given and the description thereof will be omitted.

第２実施形態では、異なる種類の避難シミュレーション（例えば、建物内を対象とした避難シミュレーション及び建物外の街区内を対象とした避難シミュレーション）を実行し、そのシミュレーション結果を学習済みモデルへ反映させる。 In the second embodiment, different types of evacuation simulations (for example, an evacuation simulation targeting the inside of a building and an evacuation simulation targeting a block outside the building) are executed, and the simulation results are reflected in the trained model.

第２実施形態では、建物単独を対象とした避難シミュレーション（建物内部から外への避難）である第１の避難シミュレーションと、建物の外における街区レベルの避難シミュレーション（建物の外の街区内での避難）である第２の避難シミュレーションとを想定する。例えば、第２実施形態では、図７に示されるように、避難者Ｕは建物Ａｘから外へ出て街区における避難も行う場合を想定する。なお、本実施形態においては、２つの異なる種類の避難シミュレーションを実行する場合を例に説明するが、２つよりも多い複数種類の避難シミュレーションを本実施形態へ適用することも可能である。 In the second embodiment, the first evacuation simulation, which is an evacuation simulation for the building alone (evacuation from the inside of the building to the outside), and the evacuation simulation at the district level outside the building (inside the district outside the building). Evacuation) is assumed as the second evacuation simulation. For example, in the second embodiment, as shown in FIG. 7, it is assumed that the evacuee U goes out from the building Ax and also evacuates in the block. In the present embodiment, the case where two different types of evacuation simulations are executed will be described as an example, but it is also possible to apply a plurality of types of evacuation simulations, which is more than two, to the present embodiment.

第２実施形態の学習部２４は、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、観測情報から避難情報を出力するためのモデルを強化学習させて、学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）を得る。 The learning unit 24 of the second embodiment executes a plurality of types of evacuation simulations and, based on the simulation results, performs reinforcement learning on the model for outputting evacuation information from observation information, thereby obtaining the action value function Q^*(s, a), which is an example of the trained model.

具体的には、第２実施形態においては、災害が発生したときの建物の内部の状態をｓ_０とし、建物から外への避難に関する方策をπ_１とし、建物から外への避難指示をａ_１とする。なお、避難者は避難指示ａ_１に応じた行動をとるものとする。 Specifically, in the second embodiment, the state inside the building when a disaster occurs is denoted s_0, the policy for evacuation from the building to the outside is denoted π_1, and the evacuation instruction for evacuation from the building to the outside is denoted a_1. Evacuees are assumed to act in accordance with the evacuation instruction a_1.

また、建物から外への避難が完了したときの状態をｓ_１とし、建物の外の街区における避難に関する方策をπ_２とし、建物の外の街区における避難に関する行動をａ_２とし、建物の外の街区における避難が完了したときの状況をｓ_２とする。 The state when evacuation from the building to the outside is completed is denoted s_1, the policy for evacuation in the block outside the building is denoted π_2, the action for evacuation in the block outside the building is denoted a_2, and the state when evacuation in the block outside the building is completed is denoted s_2.

この場合、上記の変数間の関係は、状態ｓ_０に対して方策π_１を適用することで避難指示ａ_１が出され、避難者が避難指示ａ_１に応じた避難行動をとり、その結果として建物からの避難が完了した時点の状態がｓ_１となる。そして、この状態ｓ_１に対して方策π_２を適用することで避難指示ａ_２が出され、避難者が避難指示ａ_２に応じた避難行動をとり、その結果として避難所等への避難が完了したときの状態がｓ_２となる。 In this case, the variables are related as follows. Applying the policy π_1 to the state s_0 yields the evacuation instruction a_1; the evacuees take evacuation actions according to the evacuation instruction a_1, and as a result the state at the time evacuation from the building is completed is s_1. Applying the policy π_2 to this state s_1 then yields the evacuation instruction a_2; the evacuees take evacuation actions according to the evacuation instruction a_2, and as a result the state when evacuation to a shelter or the like is completed is s_2.
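The chain from s_0 to s_2 can be written as a two-stage rollout. The sketch below only mirrors the order in which the variables are produced; the callables passed in (two policies and two simulators) and the string labels used in the usage note are hypothetical.

```python
def rollout(s0, policy1, simulate1, policy2, simulate2):
    """Trace one evacuation through both stages and return every intermediate variable."""
    a1 = policy1(s0)        # policy pi_1 applied to s_0 gives instruction a_1
    s1 = simulate1(s0, a1)  # evacuees follow a_1; leaving the building ends in s_1
    a2 = policy2(s1)        # policy pi_2 applied to s_1 gives instruction a_2
    s2 = simulate2(s1, a2)  # evacuees follow a_2; reaching the shelter ends in s_2
    return {"a1": a1, "s1": s1, "a2": a2, "s2": s2}
```

For instance, with a first policy that always returns `"use_east_exit"` and a first simulator that maps that instruction to `"outside_east"`, the second stage starts from `"outside_east"`, which makes explicit that the second stage depends on where the first stage ends.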

Q-Learningにおいては、学習のためには状態ｓ_１に応じた報酬ｒ_１と、状態ｓ_２に応じた報酬ｒ_２とを算出する必要がある。これらの変数間の関係は図８のようになる。 In Q-learning, training requires calculating the reward r_1 corresponding to the state s_1 and the reward r_2 corresponding to the state s_2. The relationship between these variables is shown in FIG. 8.

本実施形態においては、最終的な状態ｓ_２の時点における報酬ｒ_２を最大化させる避難指示の方法を、強化学習によって学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）へ反映させることが目的となる。なお、報酬ｒ_１及び報酬ｒ_２は、上記式（５）に示されるように、死者数Ｄ、負傷者数Ｉ、及び避難時間Ｔ等に基づく関数として設定される。 In the present embodiment, the aim is to use reinforcement learning to reflect, in the action value function Q^*(s, a), which is an example of the trained model, the way of issuing evacuation instructions that maximizes the reward r_2 at the time of the final state s_2. The reward r_1 and the reward r_2 are set as functions based on the number of deaths D, the number of injuries I, the evacuation time T, and the like, as shown in equation (5) above.

第２の避難シミュレーションである街区レベルの避難シミュレーションでは、シミュレーション結果から避難の状態ｓ_２が算出される。このため、第２の避難シミュレーションのシミュレーション結果に基づいて、報酬ｒ_２を最大化するように、避難指示を出力する行動価値関数Ｑ^*（ｓ，ａ）を得ることができる。 In the block-level evacuation simulation, which is the second evacuation simulation, the evacuation state s_2 is calculated from the simulation result. Therefore, based on the simulation result of the second evacuation simulation, an action value function Q^*(s, a) that outputs evacuation instructions so as to maximize the reward r_2 can be obtained.

一方、第１の避難シミュレーションである建物内の避難シミュレーションのシミュレーション結果のみでは、避難の状態ｓ_１しか算出されない。このため、第１の避難シミュレーションによって報酬ｒ_１を算出することはできても、報酬ｒ_２を直接算出することはできない。 On the other hand, the simulation result of the in-building evacuation simulation, which is the first evacuation simulation, yields only the evacuation state s_1. Therefore, although the reward r_1 can be calculated from the first evacuation simulation, the reward r_2 cannot be calculated directly.

このため、例えば、第１の避難シミュレーションのシミュレーション結果に基づき報酬ｒ_１を用いて強化学習を行い、建物内から外への避難指示をモデルに学習させた場合には、建物から外への避難が完了するまでを最適化するような行動価値関数Ｑ^*（ｓ，ａ）が得られるが、この行動価値関数Ｑ^*（ｓ，ａ）は、報酬ｒ_２を最大化するような避難指示を必ずしも出力するわけではない。 Therefore, for example, if reinforcement learning is performed using the reward r_1 based on the simulation result of the first evacuation simulation so that the model learns evacuation instructions from inside the building to the outside, the resulting action value function Q^*(s, a) optimizes evacuation up to the point of leaving the building, but it does not necessarily output evacuation instructions that maximize the reward r_2.

例えば、西側と東側とに出口を有する建物であって、かつ西側の出口の方が東側の出口口よりも広い建物を想定する。また、この建物の東側には避難所となる公園が存在すると想定する。この場合、建物から外へ避難するのみにおいては、西側の出口を利用した方が好ましい。しかし、建物から避難所まで避難することを考慮すると、建物の東口の出口を利用した方がより好ましいと考えられる。このような場合、第１の避難シミュレーションのシミュレーション結果のみに基づいて強化学習を行い、行動価値関数Ｑ^*（ｓ，ａ）を得ることは不適切である。 For example, assume a building that has exits on the west side and the east side, with the west exit wider than the east exit. Also assume that a park serving as a shelter lies to the east of this building. In this case, if the only goal is to get out of the building, using the west exit is preferable. However, when evacuation from the building to the shelter is considered, using the east exit is considered more preferable. In such a case, it is inappropriate to obtain the action value function Q^*(s, a) by performing reinforcement learning based only on the simulation result of the first evacuation simulation.
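The two-exit example can be made concrete with invented numbers. The times below are purely hypothetical and chosen only so that the wider west exit is faster for leaving the building while the east exit is closer to the park; they are not taken from the source.

```python
# Hypothetical times in seconds.
EXIT_TIME = {"west": 60.0, "east": 90.0}      # time to leave the building (west exit is wider)
WALK_TO_PARK = {"west": 240.0, "east": 30.0}  # time from each exit to the park on the east side


def best_exit_building_only():
    """Exit that minimizes only the time to get outside (first simulation alone)."""
    return min(EXIT_TIME, key=EXIT_TIME.get)


def best_exit_to_shelter():
    """Exit that minimizes the total time to reach the shelter (both stages)."""
    return min(EXIT_TIME, key=lambda e: EXIT_TIME[e] + WALK_TO_PARK[e])
```

With these numbers the building-only criterion picks the west exit (60 s versus 90 s), while the end-to-end criterion picks the east exit (120 s versus 300 s), which is the mismatch the paragraph describes.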

このため、報酬ｒ_２に基づいて、第１の避難シミュレーションにおける避難指示を強化学習させる手法が必要となる。そこで、本実施形態では、第１の避難シミュレーションにおける報酬ｒ_１と第２の避難シミュレーションにおける報酬ｒ_２とを尤度関数によって関係付けることにより、観測情報から避難情報を出力するためのモデルを強化学習させる。 Therefore, a method is needed for performing reinforcement learning on the evacuation instructions in the first evacuation simulation based on the reward r_2. In the present embodiment, the reward r_1 in the first evacuation simulation and the reward r_2 in the second evacuation simulation are related by a likelihood function, and the model for outputting evacuation information from observation information is trained by reinforcement learning on that basis.

具体的には、まず、学習部２４は、様々な避難の状態ｓ_１を初期条件として設定し、設定された各初期条件に基づき第２の避難シミュレーションを行う。そして、学習部２４は、設定された各初期条件に基づき第２の避難シミュレーションによって得られた避難の状態ｓ_２を取得する。 Specifically, the learning unit 24 first sets various evacuation states s_1 as initial conditions and performs the second evacuation simulation based on each of the set initial conditions. The learning unit 24 then acquires the evacuation state s_2 obtained by the second evacuation simulation for each of the set initial conditions.

次に、学習部２４は、第２の避難シミュレーションによって得られた避難の状態ｓ_２に応じた報酬ｒ_２に基づいて強化学習を行い、行動価値関数Ｑ^*（ｓ，ａ）を得る。 Next, the learning unit 24 performs reinforcement learning based on the reward r_2 corresponding to the evacuation state s_2 obtained by the second evacuation simulation, and obtains the action value function Q^*(s, a).

次に、学習部２４は、第２の避難シミュレーションによって得られた避難の状態ｓ_１及び避難の状態ｓ_２に基づいて、報酬ｒ_１と報酬ｒ_２とを算出する。そして、学習部２４は、算出された報酬ｒ_１と報酬ｒ_２とを対応付ける。 Next, the learning unit 24 calculates the reward r_1 and the reward r_2 based on the evacuation state s_1 and the evacuation state s_2 obtained in the second evacuation simulation. The learning unit 24 then associates the calculated reward r_1 with the calculated reward r_2.

次に、学習部２４は、対応付けられた報酬ｒ_１と報酬ｒ_２とを用いて、報酬ｒ_２に対する報酬ｒ_１の尤度Ｌ（ｒ_１｜ｒ_２）を算出する。尤度Ｌ（ｒ_１｜ｒ_２）は、報酬ｒ_２が観測されたときの報酬ｒ_１の尤もらしさを表す指標である。この尤度Ｌ（ｒ_１｜ｒ_２）によって、報酬ｒ_１と報酬ｒ_２とは、以下の式（６）に示されるような関係となる。 Next, the learning unit 24 uses the associated rewards r_1 and r_2 to calculate the likelihood L(r_1 | r_2) of the reward r_1 with respect to the reward r_2. The likelihood L(r_1 | r_2) is an index representing how plausible the reward r_1 is when the reward r_2 is observed. Through this likelihood L(r_1 | r_2), the reward r_1 and the reward r_2 are related as shown in the following equation (6).

Ｐ（ｒ_２）＝Ｐ（ｒ_１）×Ｌ（ｒ_１｜ｒ_２）　（６）
$P(r_2) = P(r_1) \times L(r_1 \mid r_2)$ (6)

なお、上記式（６）における、Ｐ（ｒ_２）は報酬がｒ_２となる確率を表し、Ｐ（ｒ_１）は報酬がｒ_１となる確率を表す。 In equation (6) above, P(r_2) represents the probability that the reward is r_2, and P(r_1) represents the probability that the reward is r_1.
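The source does not state how the likelihood L(r_1 | r_2) is estimated from the associated rewards. One simple possibility, shown here purely as an assumption, is to discretize both rewards into labelled bins and use the empirical conditional frequency of each r_1 bin among the runs that ended in a given r_2 bin. The function name and the bin labels are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_likelihood(pairs):
    """Estimate a table with table[r2][r1] = fraction of runs ending in r2 whose stage-one reward was r1.

    pairs: list of (r1_bin, r2_bin) tuples, one per run of the second simulation.
    """
    joint = Counter(pairs)
    r2_totals = Counter(r2 for _, r2 in pairs)
    table = defaultdict(dict)
    for (r1, r2), count in joint.items():
        table[r2][r1] = count / r2_totals[r2]
    return dict(table)
```

For the paired samples `[("high", "low"), ("high", "low"), ("low", "high"), ("high", "high")]`, every run that ended with a low r_2 had a high r_1, so the entry for r_1 = high given r_2 = low is 1.0, while the two runs that ended with a high r_2 split evenly between the two r_1 bins.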

次に、学習部２４は、尤度Ｌ（ｒ_１｜ｒ_２）を算出した後、避難の状態ｓ₀を初期条件として、第１の避難シミュレーションを実行し、シミュレーション結果を得る。 Next, after calculating the likelihood L(r_1 | r_2), the learning unit 24 executes the first evacuation simulation with the evacuation state s_0 as the initial condition and obtains a simulation result.

次に、学習部２４は、第１の避難シミュレーションのシミュレーション結果に基づいて、報酬ｒ_１を算出する。そして、学習部２４は、第１の避難シミュレーションのシミュレーション結果から算出された報酬ｒ_１に対してＬ（ｒ_１｜ｒ_２）を乗じることにより、報酬ｒ_２の確率分布を算出し、報酬ｒ_２の期待値を最大化するように、第２の避難シミュレーションの結果によって既に強化学習された行動価値関数Ｑ^*（ｓ，ａ）を更に強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 Next, the learning unit 24 calculates the reward r_1 based on the simulation result of the first evacuation simulation. The learning unit 24 then multiplies the reward r_1 calculated from the simulation result of the first evacuation simulation by L(r_1 | r_2) to calculate the probability distribution of the reward r_2, and further trains, by reinforcement learning, the action value function Q^*(s, a) already trained on the results of the second evacuation simulation so as to maximize the expected value of the reward r_2, thereby obtaining the action value function Q^*(s, a).
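One possible reading of this step is sketched below: given an observed r_1 from the first simulation, the likelihood values L(r_1 | r_2) for each candidate r_2 are used as weights, normalized into a distribution over r_2, and averaged to give an expected r_2 that can serve as the learning target. The normalization, the discretization into labelled bins, and the assumption that a table of the form `likelihood[r2][r1]` is already available are all choices made for this illustration; equation (6) in the source does not spell them out.

```python
def expected_r2(r1, likelihood, r2_values):
    """Estimate the expected final reward r_2 implied by an observed stage-one reward r_1.

    likelihood: dict with likelihood[r2][r1] = L(r1 | r2) for discretized reward bins.
    r2_values: dict mapping each r2 bin label to a representative numeric reward.
    """
    weights = {r2: row.get(r1, 0.0) for r2, row in likelihood.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observed r1 was never paired with any r2")
    return sum(r2_values[r2] * weight / total for r2, weight in weights.items())
```

The value returned here would replace r_1 as the quantity to be maximized when the action value function is trained further on the first simulation.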

第２実施形態の避難情報生成部３８は、取得部３６によって取得された観測情報ｓを、第２実施形態の学習部２４によって学習された行動価値関数Ｑ^*（ｓ，ａ）へ入力して、災害が発生した際の避難指示ａを生成する。 The evacuation information generation unit 38 of the second embodiment inputs the observation information s acquired by the acquisition unit 36 into the action value function Q^*(s, a) trained by the learning unit 24 of the second embodiment, and generates an evacuation instruction a for the disaster.

次に、第２実施形態の避難誘導モデル学習装置１０の作用を説明する。第２実施形態の避難誘導モデル学習装置１０は、図９及び図１０の学習処理ルーチンを実行する。 Next, the operation of the evacuation guidance model learning device 10 of the second embodiment will be described. The evacuation guidance model learning device 10 of the second embodiment executes the learning processing routines of FIGS. 9 and 10.

＜学習処理ルーチン＞ <Learning processing routine>

仕様情報と条件情報とが避難誘導モデル学習装置１０に入力され、受付部１２が仕様情報と条件情報とを受け付けると、設定情報記憶部２２に、仕様情報と条件情報とが格納される。そして、第２実施形態の避難誘導モデル学習装置１０は、学習処理の指示信号を受け付けると、図９及び図１０に示される学習処理ルーチンを実行する。 When the specification information and the condition information are input to the evacuation guidance model learning device 10 and the reception unit 12 receives the specification information and the condition information, the setting information storage unit 22 stores the specification information and the condition information. Then, when the evacuation guidance model learning device 10 of the second embodiment receives the instruction signal of the learning process, it executes the learning process routine shown in FIGS. 9 and 10.

ステップＳ１００〜ステップＳ１０４は、第１実施形態と同様に実行される。 Steps S100 to S104 are executed in the same manner as in the first embodiment.

ステップＳ３０６において、学習部２４は、上記ステップＳ１００で読み込まれた建物の仕様情報と、上記ステップＳ１０２で設定された災害発生条件と、上記ステップＳ１０４で設定された避難指示条件とに基づいて、街区レベルの避難シミュレーションである第２の避難シミュレーションを実行する。 In step S306, the learning unit 24 executes the second evacuation simulation, which is the block-level evacuation simulation, based on the building specification information read in step S100, the disaster occurrence condition set in step S102, and the evacuation instruction condition set in step S104.

ステップＳ３０８において、学習部２４は、上記ステップＳ３０６で実行された第２の避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S308, the learning unit 24 stores the simulation result of the second evacuation simulation executed in step S306 in the storage unit (not shown).

ステップＳ３０９において、学習部２４は、上記ステップＳ３０８で第２の避難シミュレーションによって得られた結果である避難の状態ｓ_２に応じた報酬ｒ_２に基づいて、モデルを強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S309, the learning unit 24 performs reinforcement learning on the model based on the reward r_2 corresponding to the evacuation state s_2 obtained as the result of the second evacuation simulation in step S308, and obtains the action value function Q^*(s, a).

ステップＳ１１０〜ステップＳ１１２は、第１実施形態と同様に実行される。 Steps S110 to S112 are executed in the same manner as in the first embodiment.

次に、図１０に示すステップＳ３１６において、学習部２４は、上記ステップＳ３０８で記憶された、第２の避難シミュレーションによって得られた避難の状態ｓ_１及び避難の状態ｓ_２に基づいて、報酬ｒ_１と報酬ｒ_２とを算出する。そして、学習部２４は、上記ステップＳ３１６で算出された報酬ｒ_１の各々と報酬ｒ_２の各々とを対応付ける。 Next, in step S316 shown in FIG. 10, the learning unit 24 calculates the reward r_1 and the reward r_2 based on the evacuation state s_1 and the evacuation state s_2 obtained by the second evacuation simulation and stored in step S308. The learning unit 24 then associates each of the rewards r_1 calculated in step S316 with each of the rewards r_2.

ステップＳ３１８において、学習部２４は、上記ステップＳ３１６で対応付けられた報酬ｒ_１の各々と報酬ｒ_２の各々とを用いて、各報酬ｒ_２に対する各報酬ｒ_１の尤度Ｌ（ｒ_１｜ｒ_２）を算出する。 In step S318, the learning unit 24 uses each of the rewards r_1 and each of the rewards r_2 associated in step S316 to calculate the likelihood L(r_1 | r_2) of each reward r_1 with respect to each reward r_2.

ステップＳ３２０は、上記ステップＳ１０２と同様に実行される。 Step S320 is executed in the same manner as in step S102.

ステップＳ３２２は、上記ステップＳ１０４と同様に実行される。 Step S322 is executed in the same manner as in step S104.

ステップＳ３２４において、学習部２４は、避難の状態ｓ_０を初期条件として、第１の避難シミュレーションを実行し、シミュレーション結果を得る。 In step S324, the learning unit 24 executes the first evacuation simulation with the evacuation state s_0 as the initial condition and obtains a simulation result.

ステップＳ３２６において、学習部２４は、第１の避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S326, the learning unit 24 stores the simulation result of the first evacuation simulation in the storage unit (not shown).

ステップＳ３２７において、学習部２４は、上記ステップＳ３２６で記憶部（図示省略）に格納された、第１の避難シミュレーションのシミュレーション結果から算出された報酬ｒ_１に対してＬ（ｒ_１｜ｒ_２）を乗じる。そして、学習部２４は、上記式（６）に従って、報酬ｒ_２の確率分布を算出し、報酬ｒ_２の期待値を最大化するように、第２の避難シミュレーションの結果によって既に強化学習された行動価値関数Ｑ^*（ｓ，ａ）を更に強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S327, the learning unit 24 multiplies the reward r_1, calculated from the simulation result of the first evacuation simulation stored in the storage unit (not shown) in step S326, by L(r_1 | r_2). The learning unit 24 then calculates the probability distribution of the reward r_2 according to equation (6) above, and further trains, by reinforcement learning, the action value function Q^*(s, a) already trained on the results of the second evacuation simulation so as to maximize the expected value of the reward r_2, thereby obtaining the action value function Q^*(s, a).

ステップＳ３２８において、上記ステップＳ１１０と同様に実行される。 Step S328 is executed in the same manner as step S110.

ステップＳ３３０において、上記ステップＳ１１２と同様に実行される。 Step S330 is executed in the same manner as step S112.

ステップＳ３３２において、学習部２４は、上記ステップＳ３２７で得られた行動価値関数Ｑ^*（ｓ，ａ）を学習済みモデル記憶部２６へ格納して、学習処理ルーチンを終了する。 In step S332, the learning unit 24 stores the action value function Q^*(s, a) obtained in step S327 in the trained model storage unit 26, and ends the learning processing routine.
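Taken together, the routine of FIGS. 9 and 10 trains one action value function in two phases: first on block-level runs using r_2 directly, then on in-building runs using an r_2 estimate derived from r_1. The sketch below assumes a tabular Q, single-step updates, and a caller-supplied `estimate_r2` function standing in for the likelihood-based conversion; all names and the data layout are hypothetical.

```python
def update(q, key, target, alpha=0.5):
    """Move the stored value for `key` a fraction alpha toward `target`."""
    old = q.get(key, 0.0)
    q[key] = old + alpha * (target - old)


def two_phase_training(block_runs, building_runs, estimate_r2):
    """block_runs: (state, action, r2) tuples; building_runs: (state, action, r1) tuples."""
    q = {}
    for state, action, r2 in block_runs:        # S306 to S309: learn from r_2 directly
        update(q, (state, action), r2)
    for state, action, r1 in building_runs:     # S324 to S327: learn from estimated r_2
        update(q, (state, action), estimate_r2(r1))
    return q                                     # stored in S332
```

Because both phases write into the same table, the in-building instructions end up ranked by their estimated effect on the final reward rather than by r_1 alone.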

以上詳細に説明したように、第２実施形態の避難誘導モデル学習装置は、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、学習済みモデルを得る。これにより、複数の避難シミュレーションを考慮して、災害が発生した場合に被災者のリスクを最小化するように避難させるための学習済みモデルを取得することができる。 As described in detail above, the evacuation guidance model learning device of the second embodiment executes a plurality of types of evacuation simulations and obtains a learned model based on the simulation results. This makes it possible to take into account multiple evacuation simulations and obtain a trained model for evacuating to minimize the risk of disaster victims in the event of a disaster.

なお、本発明は、上述した実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the present invention.

例えば、上記実施形態では、避難情報出力装置が表示装置４２である場合を例に説明したが、これに限定されるものではない。例えば、避難情報出力装置は、音声出力装置であってもよく、この場合には、避難指示が音声によって出力される。また、避難情報出力装置は、各避難者が保有しているスマートフォン等の端末であってもよい。 For example, in the above embodiments, the case where the evacuation information output device is the display device 42 has been described as an example, but the present invention is not limited to this. For example, the evacuation information output device may be an audio output device, in which case the evacuation instruction is output as audio. The evacuation information output device may also be a terminal such as a smartphone carried by each evacuee.

また、上記では本発明に係るプログラムが記憶部（図示省略）に予め記憶（インストール）されている態様を説明したが、本発明に係るプログラムは、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ及びマイクロＳＤカード等の記録媒体に記録されている形態で提供することも可能である。 In the above, a mode in which the program according to the present invention is stored (installed) in advance in a storage unit (not shown) has been described, but the program according to the present invention can also be provided in a form recorded on a recording medium such as a CD-ROM, a DVD-ROM, or a micro SD card.

１０避難誘導モデル学習装置 10 Evacuation guidance model learning device
１２受付部 12 Reception unit
２０コンピュータ 20 Computer
２２設定情報記憶部 22 Setting information storage unit
２４学習部 24 Learning unit
２６学習済みモデル記憶部 26 Trained model storage unit
３０避難誘導装置 30 Evacuation guidance device
３２観測装置 32 Observation device
３４コンピュータ 34 Computer
３６取得部 36 Acquisition unit
３７学習済みモデル記憶部 37 Trained model storage unit
３８避難情報生成部 38 Evacuation information generation unit
４０制御部 40 Control unit
４２表示装置 42 Display device

Claims

1. An evacuation guidance device comprising:
an acquisition unit that acquires observation information representing the position or movement of a person when a disaster occurs;
an evacuation information generation unit that generates evacuation information, which is information on an evacuation route when the disaster occurs, by inputting the observation information acquired by the acquisition unit into a trained model that has been trained in advance by reinforcement learning, based on the result of an evacuation simulation of a disaster, using a reward set according to the number of deaths, the number of injuries, and the time until evacuation is completed in the disaster of the evacuation simulation; and
a control unit that controls an evacuation information output device according to the evacuation information generated by the evacuation information generation unit.

2. The evacuation guidance device according to claim 1,
wherein the trained model is a trained model that has been trained in advance by reinforcement learning according to the results of a plurality of types of evacuation simulations.

3. An evacuation guidance model learning device comprising:
a learning unit that executes an evacuation simulation of a disaster and, based on the result of the evacuation simulation, uses a reward set according to the number of deaths, the number of injuries, and the time until evacuation is completed in the disaster of the evacuation simulation to perform reinforcement learning on a model for outputting information on an evacuation route when a disaster occurs from observation information representing the position or movement of a person when the disaster occurs, thereby obtaining a trained model that outputs the information on the evacuation route from the observation information.

4. The evacuation guidance model learning device according to claim 3,
wherein the learning unit executes a plurality of types of the evacuation simulations and obtains the trained model based on the results of the evacuation simulations.