JP7415293B2

JP7415293B2 - Evacuation guidance device and evacuation guidance model learning device

Info

Publication number: JP7415293B2
Application number: JP2019169638A
Authority: JP
Inventors: 正博大渕; 裕史恒川
Original assignee: Takenaka Corp
Current assignee: Takenaka Corp
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2024-01-17
Anticipated expiration: 2039-09-18
Also published as: JP2021047625A

Description

The present invention relates to an evacuation guidance device and an evacuation guidance model learning device.

Evacuation simulation systems are known in the art (for example, Patent Document 1). Such a system uses multi-agent simulation technology to simulate disaster evacuation methods for high-rise buildings. It models each evacuee as an individual unit of behavior and sequentially reproduces the state of each individual during evacuation, which allows the evacuation situation to be tracked at any point in time. Its purpose is to make it easy to identify bottlenecks that impede safe evacuation and to examine improvement measures.

An evacuation route output device that outputs an evacuation route avoiding damaged areas is also known (for example, Patent Document 2). This device generates a route by which people can safely reach an evacuation site during a disaster.

An evacuation simulation device that can formulate an evacuation plan quickly and appropriately according to the disaster situation is also known (for example, Patent Document 3). This device calculates the flow along routes based on evacuee density and derives a plurality of optimal evacuation route candidates that minimize the evacuation completion time. It then calculates evacuee behavior by a multi-agent method and selects, from the candidates, the optimal evacuation route with the shortest evacuation completion time.

Patent Document 1: Japanese Patent No. 5372421
Patent Document 2: Japanese Patent No. 5686479
Patent Document 3: Japanese Patent No. 5996689

When guiding people in a building to evacuate after a disaster occurs, evacuation routes must be presented appropriately, and they must be presented promptly.

However, the technology of Patent Document 1 is intended for easily identifying bottlenecks that impede safe evacuation and examining improvement measures; it is used to evaluate a building at the planning stage.

The technology of Patent Document 2 outputs an evacuation route that avoids damaged areas when a disaster occurs. In an actual disaster, however, various circumstances other than the damaged areas must be considered, for example the movement of the people evacuating.

The technology described in Patent Document 3 runs a simulation according to the actual disaster situation, but running the simulation takes time, so the technology is not suitable where promptness is required.

In view of the above, an object of the present invention is to evacuate evacuees in a manner that minimizes risk, taking into account the results of an evacuation simulation of a disaster.

To achieve the above object, the evacuation guidance device of the present invention includes: an acquisition unit that acquires observation information representing the position or movement of people when a disaster occurs; an evacuation information generation unit that inputs the observation information acquired by the acquisition unit into a trained model and thereby generates evacuation information, which is information on evacuation routes for the disaster, the trained model having been trained in advance by reinforcement learning, based on the results of an evacuation simulation of a disaster, using a reward set according to the number of deaths, the number of injured, and the time until evacuation is completed in the simulated disaster; and a control unit that controls an evacuation information output device according to the evacuation information generated by the evacuation information generation unit. According to the evacuation guidance device of the present invention, evacuees can be evacuated in a manner that minimizes risk, taking into account the results of an evacuation simulation of a disaster.

The trained model of the present invention may be a trained model that has undergone the reinforcement learning in advance according to the results of a plurality of types of evacuation simulation. This makes it possible to evacuate evacuees in a manner that minimizes risk, taking into account the results of multiple types of evacuation simulation.

The evacuation guidance model learning device of the present invention includes a learning unit that executes an evacuation simulation of a disaster and, based on the results of the evacuation simulation, uses a reward set according to the number of deaths, the number of injured, and the time until evacuation is completed in the simulated disaster to train, by reinforcement learning, a model for outputting information on evacuation routes for a disaster from observation information representing the position or movement of people when the disaster occurs, thereby obtaining a trained model that outputs the information on evacuation routes from the observation information. According to the evacuation guidance model learning device of the present invention, a trained model can be obtained for evacuating evacuees in a manner that minimizes risk when a disaster occurs.

The learning unit of the present invention may execute a plurality of types of evacuation simulation and obtain the trained model based on the results of those evacuation simulations. This makes it possible to obtain a trained model for evacuating evacuees in a manner that minimizes risk when a disaster occurs, taking multiple types of evacuation simulation into account.

The present invention has the effect that evacuees can be evacuated in a manner that minimizes risk, taking into account the results of an evacuation simulation of a disaster.

FIG. 1 is a block diagram showing a schematic configuration of the evacuation guidance model learning device according to the present embodiment.
FIG. 2 is an explanatory diagram for explaining the relationship between evacuation simulation results and rewards.
FIG. 3 is a block diagram showing a schematic configuration of the evacuation guidance device according to the present embodiment.
FIG. 4 is a diagram showing an example of how the display devices are installed in a building.
FIG. 5 is a diagram showing an example of the learning processing routine of the first embodiment.
FIG. 6 is a diagram showing an example of the evacuation guidance processing routine of the present embodiment.
FIG. 7 is an explanatory diagram for explaining evacuation within a building and evacuation of a city block according to the second embodiment.
FIG. 8 is an explanatory diagram for explaining the relationship between variables in the second embodiment.
FIG. 9 is a diagram showing an example of the learning processing routine of the second embodiment.
FIG. 10 is a diagram showing an example of the learning processing routine of the second embodiment.

<Overview of this embodiment>

If a disaster occurs while people are inside a building, the affected people themselves compare evacuation routes and choose the one they consider best when leaving the building. The quality of that judgment depends on their knowledge of disasters and on the information available to them, so in some situations they may choose a dangerous evacuation route.

Against this background, methods for presenting evacuation instructions using evacuation simulation have been proposed (see, for example, Japanese Patent No. 5996689). These conventional techniques, however, identify the evacuation instruction as an input parameter such that the result (output) of the evacuation simulation becomes optimal. Consequently, when evacuation instructions are to be optimized by considering, in parallel, simulation results from multiple methods or multiple models, the identification becomes harder as the number of methods and models grows.

A trained model obtained by machine learning, by contrast, can learn the relationship between simulation inputs and outputs and then select the optimal input. Different types of evacuation simulation can therefore easily be used together, provided they share the same input and output items.

This embodiment therefore proposes a method that uses a trained model obtained by machine learning to determine evacuation instructions, which makes it possible to use multiple evacuation simulations together. According to this embodiment, the results of multiple evacuation simulations can be taken into account, and evacuees can be evacuated in a manner that minimizes risk.

Embodiments of the present invention are described in detail below.

<First embodiment>

<System configuration of evacuation guidance model learning device>

FIG. 1 is a block diagram showing an example of the configuration of an evacuation guidance model learning device 10 according to a first embodiment of the present invention. Functionally, as shown in FIG. 1, the evacuation guidance model learning device 10 can be represented as a configuration including a reception unit 12 and a computer 20.

The reception unit 12 receives information input by a user and is implemented by, for example, a keyboard and a mouse. The reception unit 12 receives specification information representing the specifications of a virtual learning building on which the evacuation simulation is to be run. The learning building is a virtual building on a computer that the learning unit 24, described later, uses for simulation. The specification information includes, for example, information on the arrangement of rooms in the learning building, its structural type, its materials, its number of floors, and its equipment.

The reception unit 12 also receives condition information, which is information on the various conditions under which the evacuation simulation is run. The condition information includes disaster occurrence condition information, which describes the conditions under which a virtual disaster occurs in the evacuation simulation, and evacuation instruction condition information, which describes the conditions for issuing evacuation instructions to the virtual people in the building when the virtual disaster occurs. The condition information also includes information on where the virtual evacuees are placed.

The setting information storage unit 22 stores the specification information and condition information received by the reception unit 12. The learning unit 24, described later, runs the evacuation simulation according to the specification information and condition information stored in the setting information storage unit 22.
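As a non-limiting illustration, the specification information and condition information described above could be held in simple records such as the following. This is a minimal sketch: the class names, field names, and example values are all hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecificationInfo:
    """Specifications of the virtual learning building (field names are hypothetical)."""
    room_layout: dict          # e.g. room id -> list of adjacent room ids
    structure_type: str        # e.g. "steel"
    materials: list
    floors: int
    equipment: list = field(default_factory=list)


@dataclass
class ConditionInfo:
    """Conditions for one evacuation simulation run (field names are hypothetical)."""
    disaster_conditions: dict      # e.g. {"type": "fire", "origin": "room_101"}
    instruction_conditions: dict   # e.g. {"rule": "move_away_from_origin"}
    evacuee_placement: dict        # e.g. room id -> number of virtual evacuees


@dataclass
class SettingInfoStore:
    """Stand-in for the setting information storage unit 22."""
    spec: Optional[SpecificationInfo] = None
    conditions: Optional[ConditionInfo] = None

    def store(self, spec: SpecificationInfo, conditions: ConditionInfo) -> None:
        # Keep the most recently received specification and condition information.
        self.spec = spec
        self.conditions = conditions


store = SettingInfoStore()
store.store(
    SpecificationInfo({"room_101": ["corridor_1"]}, "steel", ["concrete"], 3),
    ConditionInfo({"type": "fire", "origin": "room_101"},
                  {"rule": "move_away_from_origin"},
                  {"room_101": 12}),
)
```

Keeping the building specification separate from the per-run conditions mirrors the distinction drawn in the text between specification information and condition information.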

Based on the specification information and condition information stored in the setting information storage unit 22, the learning unit 24 executes an evacuation simulation of a disaster occurring in the learning building. While the evacuation simulation is running, the position and movement of each virtual person at each time are recorded sequentially in a predetermined storage area (not shown). The evacuation simulation used in this embodiment is the same as existing evacuation simulations and uses conventional technology (for example, the technology described in Japanese Patent No. 5372421). The number of evacuation simulation runs can be set in the same way as in conventional reinforcement learning.

Then, based on the simulation results, the learning unit 24 trains, by reinforcement learning, a model for outputting evacuation information, which is information on evacuation routes for a disaster, from observation information representing the position or movement of people in the building when the disaster occurs. Through this reinforcement learning, the model learns which evacuation instruction should be issued for a given disaster situation and given evacuee positions.

Reinforcement learning is explained below. Reinforcement learning is a method of learning optimal behavior through trial and error in an environment. In reinforcement learning, the reward takes the place of training data. The cumulative reward R_t is defined by equation (1) below, where γ is the discount rate of the reward and r_{t+k+1} is the reward at each step. Here, t denotes time.

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \tag{1}$$
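As a non-limiting illustration, the discounted cumulative reward can be computed for a finite sequence of future rewards as follows. This is a minimal sketch: the function name is hypothetical, and truncating the sum to a finite episode is an assumption made for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], where rewards[k] plays the role of r_{t+k+1}."""
    total = 0.0
    # Work backwards through the episode: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With a discount rate below 1, rewards received later contribute less to the cumulative reward than rewards received sooner.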

Under a policy π, the value of selecting action a in state s is expressed by the action value function Q^π(s, a) given in equation (2) below, where E_π{·} denotes the expected value.

$$Q^{\pi}(s, a) = E_{\pi}\{\, R_t \mid s_t = s,\ a_t = a \,\} \tag{2}$$

Using the action value function Q^π(s, a) of equation (2), the action a with the highest value is selected. The optimal action value function Q^* is expressed by equation (3) below.

$$Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a) \tag{3}$$

One method of learning the action value function Q^*(s, a) is Q-Learning (see, for example, Watkins, C.J.C.H., "Learning from Delayed Rewards", 1989). Q-Learning learns by sequentially updating the Q value as shown in equation (4) below, where α is a constant set in advance. In this embodiment, the action value function is trained by reinforcement learning using the Q-Learning of equation (4).

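$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{4}$$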
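As a non-limiting illustration, one step of the standard tabular Q-Learning update can be sketched as follows. The function name, the action labels, and the default values of α and γ are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update in place and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move Q(state, action) toward the one-step target by the step size alpha.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical action set: each action is one candidate evacuation instruction.
ACTIONS = ["guide_to_stair_A", "guide_to_stair_B"]
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

Repeating this update over many simulated transitions is what lets the Q values approach the optimal action value function.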

In this embodiment, the state s is the disaster situation, the positions of the affected people, and the like; the action a is the evacuation instruction issued according to the state s and the policy π; and the affected people are assumed to evacuate after seeing the display device on which the evacuation instruction a is displayed. A trained model obtained by Q-Learning can select the evacuation instruction a according to the policy π such that the action value function Q becomes optimal.

Based on a risk evaluation that uses the number of casualties and the evacuation time, the learning unit 24 of this embodiment trains, by reinforcement learning, a model for outputting information on evacuation routes from observation information, using a reward r_t set according to the number of deaths D, the number of injured I, and the time T until evacuation is completed in the simulated disaster. Specifically, this embodiment sets the reward r_t given by equation (5) below.

(5)

In equation (5), D is the number of deaths, I is the number of injured, and T is the time until evacuation is completed. C_d is a coefficient representing the loss per death, C_i is a coefficient representing the loss per injured person, and C_t is a coefficient relating evacuation time to loss. C_d, C_i, and C_t are set in advance.

FIG. 2 is an explanatory diagram of the relationship between evacuation simulation results and rewards. As shown in FIG. 2, suppose that an evacuation simulation is run for a case in which a fire F, an example of a disaster, breaks out while a plurality of evacuees U are in the building. The figure shows that when evacuation instruction A is issued, the evacuation time is X1 minutes, there are Y1 deaths and Z1 injured, and the reward is high. When evacuation instruction B is issued, the evacuation time is X2 minutes, there are Y2 deaths and Z2 injured, and the reward is medium. When evacuation instruction C is issued, the evacuation time is X3 minutes, there are Y3 deaths and Z3 injured, and the reward is low. Because simulation results are tied to rewards in this way, this embodiment trains the model for outputting evacuation information from observation information by reinforcement learning based on the reward corresponding to the simulation results.
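As a non-limiting illustration of how simulation results might be tied to rewards, the sketch below assumes that the reward is the negative of a weighted sum of losses, so that fewer deaths, fewer injured, and a shorter evacuation time all give a higher reward. This functional form is an assumption made for illustration and may differ from equation (5); the coefficient values and the three simulation outcomes are likewise hypothetical.

```python
def evacuation_reward(deaths, injured, evac_minutes, c_d=100.0, c_i=10.0, c_t=1.0):
    """Return a reward that rises as the weighted loss falls (assumed form)."""
    loss = c_d * deaths + c_i * injured + c_t * evac_minutes
    return -loss


# Hypothetical outcomes per instruction: (deaths, injured, evacuation time in minutes).
outcomes = {"A": (0, 1, 5.0), "B": (1, 3, 8.0), "C": (4, 6, 15.0)}
rewards = {name: evacuation_reward(*o) for name, o in outcomes.items()}

# Instructions ordered from highest to lowest reward.
ranking = sorted(rewards, key=rewards.get, reverse=True)
```

Under these hypothetical values, instruction A ranks highest and instruction C lowest, which corresponds to the high, medium, and low rewards described for FIG. 2.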

Specifically, the learning unit 24 trains the model for outputting evacuation information from observation information by reinforcement learning so that the reward r_t of equation (5) becomes large, and obtains the action value function Q^*(s, a), which is an example of a trained model. When observation information representing the state s is input to the action value function Q^*(s, a), an evacuation instruction representing the action a corresponding to that observation information is output as an example of evacuation information.

Any function may be used as the model of the action value function for outputting evacuation information from observation information. For example, a neural network model can be used. Alternatively, a table (also called a Q table) that associates observation information representing the state s with evacuation instructions representing the action a may be used.

The trained model storage unit 26 stores the action value function Q^*(s, a), an example of a trained model, learned by the learning unit 24. The action value function Q^*(s, a) is used in the evacuation guidance device described later: when observation information is input to Q^*(s, a) at each time, an evacuation instruction, which is an example of evacuation information, is displayed on the display devices.

Conventional simulation-based optimization of evacuation instructions for disasters identifies the evacuation instruction as an input parameter such that the result is optimal. Consequently, when evacuation instructions are optimized by considering, in parallel, simulation results from multiple methods or multiple models, the identification becomes harder as the number of methods and models grows.

This embodiment, by contrast, converts the results of the evacuation simulation into the variables state s and action a and then selects the optimal policy π. Therefore, even for simulation results from different methods or different models, the optimal policy π can be selected without changing the framework, as long as the results can be converted into a common state s and a common action a. This makes it possible to combine entirely different simulation results in parallel, for example an evacuation simulation for an individual building and a simulation of evacuation from river flooding for a region, and to evaluate the optimal evacuation instruction.
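As a non-limiting illustration of converting results from different simulators into a common state s and action a, the sketch below maps two differently shaped records onto one shared transition type. The record layouts, field names, and converter names are all hypothetical; the actual simulators and their output formats are not specified in the disclosure.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """Common representation shared by all simulators (hypothetical)."""
    state: tuple       # (hazard location, crowd location)
    action: str        # evacuation instruction
    reward: float
    next_state: tuple


def from_building_sim(step: dict) -> Transition:
    """Convert one step of a building simulator that emits dictionaries."""
    return Transition(
        state=(step["hazard_zone"], step["crowd_zone"]),
        action=step["instruction"],
        reward=step["reward"],
        next_state=(step["next_hazard_zone"], step["next_crowd_zone"]),
    )


def from_flood_sim(row: tuple) -> Transition:
    """Convert one row of a flood simulator that emits flat tuples."""
    flooded, crowd, instruction, reward, next_flooded, next_crowd = row
    return Transition((flooded, crowd), instruction, reward,
                      (next_flooded, next_crowd))
```

Once every simulator's output has been converted to the same transition type, a single learning routine can consume all of them without knowing which simulator produced each transition.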

<System configuration of evacuation guidance device>

FIG. 3 is a block diagram showing an example of the configuration of an evacuation guidance device 30 according to the embodiment of the present invention. Functionally, as shown in FIG. 3, the evacuation guidance device 30 can be represented as a configuration including an observation device 32, a computer 34, and a plurality of display devices 42. The display device 42 is an example of the evacuation information output device of the present invention.

The observation device 32 and the plurality of display devices 42 of the evacuation guidance device 30 are installed in a target building equivalent to the learning building. For example, the target building is a building constructed from the design drawings of the virtual learning building. When a disaster occurs, the evacuation guidance device 30 controls what the display devices 42 installed in the building show, based on information observed sequentially by the observation device 32 installed in the building, and thereby guides the people in the building to evacuate. The details are described below.

The observation device 32 is installed in the building and sequentially acquires observation information representing the positions or movements of people in the building when a disaster occurs. A terminal with a Global Positioning System (GPS) function, such as a mobile terminal carried by a person, can be used as the observation device 32, for example. For evacuation guidance within a building, a system that determines people's movements from image data captured by cameras installed in the building (for example, Vitracom Site View from Kozo Keikaku Engineering or Crowd Walk from the National Institute of Advanced Industrial Science and Technology) can be used. The observation device 32 may also observe the disaster situation (for example, how far a fire has spread or the degree to which a building has collapsed in an earthquake).

The computer 34 includes a CPU (Central Processing Unit), a ROM (Read Only Memory) storing programs and the like for implementing each processing routine, a RAM (Random Access Memory) for temporarily storing data, a memory serving as storage means, a network interface, and the like. Functionally, as shown in FIG. 3, the computer 34 includes an acquisition unit 36, a trained model storage unit 37, an evacuation information generation unit 38, and a control unit 40.

The acquisition unit 36 acquires the observation information sequentially acquired by the observation device 32. When the observation information is, for example, an image captured by a camera, the acquisition unit 36 detects the positions and movements of the people in the image by predetermined image processing.

The trained model storage unit 37 stores the same trained model as the one stored in the trained model storage unit 26 of the evacuation guidance model learning device 10. In this embodiment, the trained model is the action value function Q^*(s, a).

The evacuation information generation unit 38 reads the action value function Q^*(s, a) stored as the trained model in the trained model storage unit 37. It then inputs the observation information s acquired by the acquisition unit 36 into the action value function Q^*(s, a) and generates the evacuation instruction a for the disaster. Evacuees are assumed to act in accordance with the evacuation instruction a.
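As a non-limiting illustration, when the trained model is held as a Q table, generating the evacuation instruction amounts to choosing the action with the highest Q value for the observed state. The sketch below assumes a dictionary keyed by (state, action) pairs; the function name, state encoding, and action labels are hypothetical.

```python
def select_instruction(q_table, state, actions):
    """Return the action with the highest Q value for the observed state."""
    # (state, action) pairs missing from the table are treated as 0.0.
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


# Hypothetical trained table for one observed state.
observed = ("fire_east", "crowd_west")
trained_q = {
    (observed, "go_west"): -5.0,
    (observed, "go_north"): -1.0,
}
instruction = select_instruction(trained_q, observed, ["go_west", "go_north"])
```

Because only a table lookup and a comparison are needed at guidance time, no simulation has to be run while the disaster is in progress.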

The control unit 40 controls the plurality of display devices 42 installed in the building where the disaster has occurred, according to the evacuation information generated by the evacuation information generation unit 38.

Each of the display devices 42 is installed at a location in the building, as shown in FIG. 4, and changes its display individually for its location under the control of the control unit 40. The display device 42 shows, for example, an arrow indicating the evacuation direction according to the evacuation instruction, or text according to the evacuation instruction (for example, "The route to the right is impassable. Please evacuate to the left."). This makes it possible to guide people appropriately when a disaster occurs.
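As a non-limiting illustration of per-location display control, the sketch below assumes that an evacuation instruction is expressed as a mapping from display identifier to evacuation direction. The identifiers, the arrow symbols, and the fallback message are hypothetical.

```python
# Hypothetical arrow symbols for each evacuation direction.
ARROWS = {"left": "<-", "right": "->", "straight": "^"}


def build_display_contents(instruction, display_ids,
                           default="Follow staff directions"):
    """Return the content to show on each display for one evacuation instruction."""
    contents = {}
    for display_id in display_ids:
        # Displays without a direction in the instruction show the fallback text.
        direction = instruction.get(display_id)
        contents[display_id] = ARROWS.get(direction, default)
    return contents


contents = build_display_contents(
    {"corridor_1": "left", "lobby": "straight"},
    ["corridor_1", "lobby", "stair_B"],
)
```

Each display receives its own content, which matches the description that the display at each location is changed individually.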

＜避難誘導モデル学習装置の作用＞ <Operation of evacuation guidance model learning device>

次に、避難誘導モデル学習装置１０の作用を説明する。避難誘導モデル学習装置１０は、図５の学習処理ルーチンを実行する。 Next, the operation of the evacuation guidance model learning device 10 will be explained. The evacuation guidance model learning device 10 executes the learning processing routine shown in FIG.

＜学習処理ルーチン＞ <Learning processing routine>

仕様情報と条件情報とが避難誘導モデル学習装置１０に入力され、受付部１２が仕様情報と条件情報とを受け付けると、設定情報記憶部２２に、仕様情報と条件情報とが格納される。そして、避難誘導モデル学習装置１０は、学習処理の指示信号を受け付けると、図５に示される学習処理ルーチンを実行する。 When the specification information and condition information are input to the evacuation guidance model learning device 10 and the reception unit 12 receives them, the specification information and condition information are stored in the setting information storage unit 22. When the evacuation guidance model learning device 10 then receives an instruction signal for the learning process, it executes the learning processing routine shown in FIG. 5.

ステップＳ１００において、学習部２４は、設定情報記憶部２２に格納された仕様情報と条件情報とを読み込む。 In step S100, the learning section 24 reads the specification information and condition information stored in the setting information storage section 22.

ステップＳ１０２において、学習部２４は、上記ステップＳ１００で読み込まれた条件情報のうちの災害発生条件情報に基づき、避難シミュレーションにおける災害発生条件を設定する。例えば、学習部２４は、災害発生条件情報に基づき、建物内の火災が発生する場所及びその規模等を災害発生条件として設定する。 In step S102, the learning unit 24 sets disaster occurrence conditions in the evacuation simulation based on the disaster occurrence condition information among the condition information read in step S100. For example, the learning unit 24 sets the location where a fire occurs in a building, its scale, etc. as disaster occurrence conditions based on the disaster occurrence condition information.

ステップＳ１０４において、学習部２４は、上記ステップＳ１００で読み込まれた条件情報のうちの避難指示条件情報に基づき、避難シミュレーションにおける避難指示条件を設定する。例えば、ある場所で火災が発生した際には、被災者はその場所から離れるように避難指示が出されるような避難指示条件が設定される。避難シミュレーションにおいて、避難指示条件に応じた様々な避難指示が出され、その避難指示による被災者の行動と結果に基づき、後述する行動価値関数Ｑ^*（ｓ，ａ）が学習される。 In step S104, the learning unit 24 sets evacuation instruction conditions in the evacuation simulation based on the evacuation instruction condition information of the condition information read in step S100. For example, evacuation instruction conditions are set such that when a fire breaks out in a certain location, evacuation instructions are issued to disaster victims to leave the location. In the evacuation simulation, various evacuation orders are issued according to evacuation order conditions, and an action value function Q ^* (s, a), which will be described later, is learned based on the actions and results of disaster victims according to the evacuation orders.

ステップＳ１０６において、学習部２４は、上記ステップＳ１００で読み込まれた建物の仕様情報と、上記ステップＳ１０２で設定された災害発生条件と、上記ステップＳ１０４で設定された避難指示条件とに基づいて、学習用の建物において災害が発生した際の避難シミュレーションを実行する。 In step S106, the learning unit 24 executes an evacuation simulation of a disaster occurring in the learning building, based on the building specification information read in step S100, the disaster occurrence conditions set in step S102, and the evacuation instruction conditions set in step S104.

ステップＳ１０８において、学習部２４は、上記ステップＳ１０６で実行された避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S108, the learning unit 24 stores the simulation result of the evacuation simulation executed in step S106 in the storage unit (not shown).

ステップＳ１０９において、学習部２４は、上記ステップＳ１０８に格納されたシミュレーション結果に基づいて、報酬ｒ_ｔが大きくなるように、観測情報から避難経路に関する情報を出力するためのモデルを強化学習させ、学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S109, based on the simulation results stored in step S108, the learning unit 24 performs reinforcement learning on a model for outputting information regarding the evacuation route from observation information so that the reward r_t becomes large, and obtains the action value function Q^*(s, a), which is an example of a trained model.

ステップＳ１１０において、学習部２４は、所定回数の避難シミュレーションが実行されたか否かを判定する。所定回数の避難シミュレーションが実行された場合には、ステップＳ１１２へ進む。一方、所定回数の避難シミュレーションが実行さていない場合には、ステップＳ１０４へ戻る。これにより、避難指示条件に応じた避難指示のみが変更された避難シミュレーションが必要な回数実行される。 In step S110, the learning unit 24 determines whether the evacuation simulation has been executed a predetermined number of times. If the evacuation simulation has been executed a predetermined number of times, the process advances to step S112. On the other hand, if the predetermined number of evacuation simulations have not been executed, the process returns to step S104. As a result, an evacuation simulation in which only the evacuation instructions according to the evacuation instruction conditions are changed is executed as many times as necessary.

ステップＳ１１２において、学習部２４は、全ての災害発生条件の避難シミュレーションが実行されたか否かを判定する。全ての災害発生条件の避難シミュレーションが実行された場合には、ステップＳ１１４へ進む。一方、避難シミュレーションが実行さていない災害発生条件が存在する場合には、ステップＳ１０２へ戻る。これにより、災害の災害発生条件のみが変更された避難シミュレーションが実行され、想定される災害についての避難シミュレーションが実行される。 In step S112, the learning unit 24 determines whether evacuation simulations have been executed for all disaster occurrence conditions. If they have, the process advances to step S114. If there is a disaster occurrence condition for which the evacuation simulation has not yet been executed, the process returns to step S102. As a result, evacuation simulations in which only the disaster occurrence conditions are changed are executed, covering each assumed disaster.

ステップＳ１１４において、学習部２４は、上記ステップＳ１０９で学習された、学習済みの行動価値関数Ｑ^*（ｓ，ａ）を学習済みモデル記憶部２６に格納して、学習処理ルーチンを終了する。 In step S114, the learning unit 24 stores the learned action value function Q ^* (s,a) learned in the above step S109 in the learned model storage unit 26, and ends the learning processing routine.
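
The loop structure of steps S100 to S114 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in this document: the toy simulator `simulate_evacuation`, the state and action names, the reward values, and the hyperparameters `ALPHA` and `GAMMA` are all hypothetical, and a tabular Q-learning update stands in for whatever model the learning unit 24 actually trains.

```python
import random
from collections import defaultdict

ACTIONS = ["guide_west", "guide_east"]  # hypothetical evacuation instructions
ALPHA, GAMMA = 0.5, 0.9                 # assumed learning rate and discount factor


def simulate_evacuation(fire_location, action):
    """Toy stand-in for the evacuation simulation (S106).

    Returns (state, action, reward, next_state). Guiding people away from
    the fire yields the higher reward; the values are made up.
    """
    state = f"fire_{fire_location}"
    safe = action != f"guide_{fire_location}"
    reward = 1.0 if safe else -1.0
    return state, action, reward, "evacuated"


def train(disaster_conditions, runs_per_condition, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                     # Q[(state, action)]
    for fire_location in disaster_conditions:  # S102 / S112: each disaster condition
        for _ in range(runs_per_condition):    # S104 / S110: vary the instruction
            action = rng.choice(ACTIONS)
            s, a, r, s_next = simulate_evacuation(fire_location, action)  # S106-S108
            # S109: one tabular Q-learning update
            best_next = max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
    return q                                   # S114: the learned Q is stored


q_table = train(["west", "east"], runs_per_condition=50)
```

The outer loop corresponds to the return from S112 to S102 and the inner loop to the return from S110 to S104, so only the evacuation instruction varies within one disaster condition.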

＜避難誘導装置の作用＞ <Operation of evacuation guidance device>

次に、避難誘導装置３０の作用を説明する。避難誘導装置３０は、図６の避難誘導処理ルーチンを実行する。 Next, the operation of the evacuation guidance device 30 will be explained. The evacuation guidance device 30 executes the evacuation guidance processing routine shown in FIG. 6.

＜避難誘導処理ルーチン＞ <Evacuation guidance processing routine>

避難誘導モデル学習装置１０によって学習された学習済みの行動価値関数Ｑ^*（ｓ，ａ）が避難誘導装置３０へ入力されると、学習済みの行動価値関数Ｑ^*（ｓ，ａ）は学習済みモデル記憶部３７へ格納される。 When the trained action value function Q^*(s, a) learned by the evacuation guidance model learning device 10 is input to the evacuation guidance device 30, it is stored in the learned model storage unit 37.

そして、避難誘導装置３０が設置された建物内において災害が発生したことが検知されると、避難誘導装置３０は、図６に示す避難誘導処理ルーチンを実行する。避難誘導装置３０は、観測装置３２によって観測情報が得られる毎に、図６に示す避難誘導処理ルーチンを実行する。 Then, when it is detected that a disaster has occurred in the building where the evacuation guidance device 30 is installed, the evacuation guidance device 30 executes the evacuation guidance processing routine shown in FIG. 6. The evacuation guidance device 30 executes this routine every time observation information is obtained by the observation device 32.

ステップＳ２００において、取得部３６は、観測装置３２によって取得された観測情報を取得する。観測情報は、被災した建物内の被災者の位置及び動き等である。 In step S200, the acquisition unit 36 acquires observation information acquired by the observation device 32. The observation information includes the location and movement of disaster victims within the damaged building.

ステップＳ２０２において、避難情報生成部３８は、学習済みモデル記憶部３７に格納された学習済みモデルとしての行動価値関数Ｑ^*（ｓ，ａ）を読み出す。 In step S202, the evacuation information generation unit 38 reads out the action value function Q ^* (s, a) as a learned model stored in the learned model storage unit 37.

ステップＳ２０４において、避難情報生成部３８は、上記ステップＳ２００で取得された観測情報を、上記ステップＳ２０２で読み出された行動価値関数Ｑ^*（ｓ，ａ）へ入力して、避難経路に関する情報である避難情報を生成する。具体的には、観測情報が行動価値関数Ｑ^*（ｓ，ａ）へ入力されると、行動価値関数Ｑ^*（ｓ，ａ）から避難情報の一例である避難指示が出力される。 In step S204, the evacuation information generation unit 38 inputs the observation information acquired in step S200 into the action value function Q^*(s, a) read out in step S202, and generates evacuation information, which is information regarding the evacuation route. Specifically, when the observation information is input into the action value function Q^*(s, a), an evacuation instruction, which is an example of evacuation information, is output from it.

ステップＳ２０６において、制御部４０は、避難情報生成部３８によって生成された避難指示に応じて、災害が発生した建物内に設置された複数の表示装置４２を制御して、避難誘導処理ルーチンを終了する。 In step S206, the control unit 40 controls the plurality of display devices 42 installed in the building where the disaster occurred in accordance with the evacuation instruction generated by the evacuation information generation unit 38, and then ends the evacuation guidance processing routine.

複数の表示装置４２の各々は、制御部４０による制御に応じて表示を変更させる。建物内の避難者は、複数の表示装置４２の各々に表示された避難指示に従って避難する。 Each of the plurality of display devices 42 changes the display according to control by the control unit 40. Evacuees in the building evacuate according to the evacuation instructions displayed on each of the plurality of display devices 42.
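
One pass of steps S200 to S206 might look like the sketch below. The source states only that the observation information is input into Q^*(s, a) and that an evacuation instruction is output; choosing the instruction with the highest action value (a greedy argmax) is an assumption made here, as are the function names, the dictionary-based Q table, and the display identifiers.

```python
def choose_instruction(q, state, actions):
    """S204: pick the instruction with the highest learned action value (assumed greedy)."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def guidance_step(observation, q, actions, displays):
    """One pass of S200-S206 for a single observation."""
    instruction = choose_instruction(q, observation, actions)  # S202-S204
    # S206: in this sketch every display simply shows the same instruction
    return {display_id: instruction for display_id in displays}


# Hypothetical learned values and display identifiers
q = {("fire_west", "guide_east"): 0.8, ("fire_west", "guide_west"): -0.6}
shown = guidance_step("fire_west", q, ["guide_west", "guide_east"], ["d1", "d2"])
```

A real control unit would translate the instruction into a location-specific arrow or message for each display rather than repeating one string.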

以上詳細に説明したように、本実施形態の避難誘導装置は、災害が発生した際の人の位置又は動きを表す観測情報を、避難シミュレーションの災害における死者数、負傷者数、及び避難が完了するまでの時間に応じて設定された報酬を用いて予め強化学習された学習済みモデルへ入力して、災害が発生した際の避難経路に関する情報である避難情報を生成し、避難情報に応じて建物内に設置された表示装置を制御する。これにより、災害が発生した際の避難シミュレーション結果を考慮して、リスクを最小化するように避難者を避難させることができる。 As explained in detail above, the evacuation guidance device of this embodiment inputs observation information representing the positions or movements of people when a disaster occurs into a trained model that has undergone reinforcement learning in advance using a reward set according to the number of dead, the number of injured, and the time until evacuation is completed in the disaster of the evacuation simulation; generates evacuation information, which is information regarding the evacuation route when the disaster occurs; and controls the display devices installed in the building according to the evacuation information. This makes it possible to evacuate people so as to minimize risk, taking the results of evacuation simulations into account.

また、本実施形態の避難誘導モデル学習装置は、災害が発生した際の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、避難シミュレーションの災害における死者数、負傷者数、及び避難が完了するまでの時間に応じて設定された報酬を用いて、災害が発生した際の人の位置又は動きを表す観測情報から災害が発生した際の避難経路に関する情報である避難情報を出力するためのモデルを強化学習させて、学習済みモデルを得る。これにより、災害が発生した場合に、リスクを最小化するように避難者を避難させるための学習済みモデルを取得することができる。 In addition, the evacuation guidance model learning device of this embodiment executes an evacuation simulation of a disaster and, based on the simulation results, performs reinforcement learning on a model that outputs evacuation information (information regarding the evacuation route when a disaster occurs) from observation information representing the positions or movements of people when the disaster occurs. The reinforcement learning uses a reward set according to the number of dead, the number of injured, and the time until evacuation is completed in the disaster of the evacuation simulation, and yields a trained model. This makes it possible to obtain a trained model for evacuating people so as to minimize risk when a disaster occurs.

また、本実施形態では、避難シミュレーションのシミュレーション結果を状態ｓ，行動ａ，方策πという変数に変換した後、これらの３変数の関係性をモデルに学習させることで、避難指示を最適化することができる。すなわち、最適な避難指示の判定において、強化学習によって得られる学習済みモデルを利用することで、異なる手法や異なるモデルによるシミュレーション結果を並列的に考慮することができる。 In addition, in this embodiment, the simulation results of the evacuation simulation are converted into the variables state s, action a, and policy π, and the model then learns the relationship among these three variables, which allows the evacuation instructions to be optimized. That is, by using the trained model obtained by reinforcement learning to determine the optimal evacuation instruction, simulation results from different methods and different models can be considered in parallel.

＜第２実施形態＞ <Second embodiment>

次に、第２実施形態について説明する。第２実施形態では、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて学習済みモデルを得て、その学習済みモデルを用いて避難情報を取得する点が第１実施形態と異なる。なお、第２実施形態に係る各装置の構成は、第１実施形態と同様の構成となるため、同一符号を付して説明を省略する。 Next, a second embodiment will be described. The second embodiment differs from the first embodiment in that a plurality of types of evacuation simulations are executed, a learned model is obtained based on the simulation results, and evacuation information is obtained using the learned model. Note that the configuration of each device according to the second embodiment is the same as that of the first embodiment, so the same reference numerals are given and explanations are omitted.

第２実施形態では、異なる種類の避難シミュレーション（例えば、建物内を対象とした避難シミュレーション及び建物外の街区内を対象とした避難シミュレーション）を実行し、そのシミュレーション結果を学習済みモデルへ反映させる。 In the second embodiment, different types of evacuation simulations (for example, an evacuation simulation for the inside of the building and an evacuation simulation for the inside of the city block outside the building) are executed, and the simulation results are reflected in the learned model.

第２実施形態では、建物単独を対象とした避難シミュレーション（建物内部から外への避難）である第１の避難シミュレーションと、建物の外における街区レベルの避難シミュレーション（建物の外の街区内での避難）である第２の避難シミュレーションとを想定する。例えば、第２実施形態では、図７に示されるように、避難者Ｕは建物Ａｘから外へ出て街区における避難も行う場合を想定する。なお、本実施形態においては、２つの異なる種類の避難シミュレーションを実行する場合を例に説明するが、２つよりも多い複数種類の避難シミュレーションを本実施形態へ適用することも可能である。 In the second embodiment, the first evacuation simulation is an evacuation simulation for a single building (evacuation from inside the building to the outside), and the evacuation simulation at the block level outside the building (evacuation within the block outside the building). A second evacuation simulation is assumed. For example, in the second embodiment, as shown in FIG. 7, it is assumed that the evacuee U leaves the building Ax and also evacuates in the city block. In addition, in this embodiment, the case where two different types of evacuation simulations are performed is demonstrated as an example, but it is also possible to apply multiple types of evacuation simulations, which are more than two, to this embodiment.

第２実施形態の学習部２４は、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、観測情報から避難情報を出力するためのモデルを強化学習させて、学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）を得る。 The learning unit 24 of the second embodiment executes multiple types of evacuation simulations, performs reinforcement learning on a model for outputting evacuation information from observation information based on the simulation results, and is an example of a learned model. Obtain the action value function Q ^* (s, a).

具体的には、第２実施形態においては、災害が発生したときの建物の内部の状態をｓ_０とし、建物から外への避難に関する方策をπ_１とし、建物から外への避難指示をａ_１とする。なお、避難者は避難指示ａ_１に応じた行動をとるものとする。 Specifically, in the second embodiment, the state inside the building when a disaster occurs is s₀, the policy for evacuation from the building to the outside is π₁, and the instruction to evacuate from the building to the outside is a₁. Evacuees are assumed to act in accordance with the evacuation instruction a₁.

また、建物から外への避難が完了したときの状態をｓ_１とし、建物の外の街区における避難に関する方策をπ_２とし、建物の外の街区における避難に関する行動をａ_２とし、建物の外の街区における避難が完了したときの状況をｓ_２とする。 Further, the state when evacuation from the building to the outside is completed is s₁, the policy for evacuation in the city block outside the building is π₂, the action for evacuation in the city block outside the building is a₂, and the situation when evacuation in the city block outside the building is completed is s₂.

この場合、上記の変数間の関係は、状態ｓ_０に対して方策π_１を適用することで避難指示ａ_１が出され、避難者が避難指示ａ_１に応じた避難行動をとり、その結果として建物からの避難が完了した時点の状態がｓ_１となる。そして、この状態ｓ_１に対して方策π_２を適用することで避難指示ａ_２が出され、避難者が避難指示ａ_２に応じた避難行動をとり、その結果として避難所等への避難が完了したときの状態がｓ_２となる。 In this case, the variables are related as follows: applying the policy π₁ to the state s₀ issues the evacuation instruction a₁, the evacuees take evacuation actions according to a₁, and as a result the state at the time evacuation from the building is completed is s₁. Then, applying the policy π₂ to this state s₁ issues the evacuation instruction a₂, the evacuees take evacuation actions according to a₂, and as a result the state when evacuation to a shelter or the like is completed is s₂.

Q-Learningにおいては、学習のためには状態ｓ_１に応じた報酬ｒ_１と、状態ｓ_２に応じた報酬ｒ_２とを算出する必要がある。これらの変数間の関係は図８のようになる。 In Q-learning, learning requires calculating a reward r₁ corresponding to the state s₁ and a reward r₂ corresponding to the state s₂. The relationship between these variables is shown in FIG. 8.

本実施形態においては、最終的な状態ｓ_２の時点における報酬ｒ_２を最大化させる避難指示の方法を、強化学習によって学習済みモデルの一例である行動価値関数Ｑ^*（ｓ，ａ）へ反映させることが目的となる。なお、報酬ｒ_１及び報酬ｒ_２は、上記式（５）に示されるように、死者数Ｄ、負傷者数Ｉ、及び避難時間Ｔ等に基づく関数として設定される。 In this embodiment, the aim is to have reinforcement learning reflect, in the action value function Q^*(s, a) (an example of a trained model), the way of issuing evacuation instructions that maximizes the reward r₂ at the final state s₂. The rewards r₁ and r₂ are set as functions based on the number of dead D, the number of injured I, the evacuation time T, and so on, as shown in equation (5) above.
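
As an illustration of a reward that depends on D, I, and T, the sketch below uses a negative weighted sum. Equation (5) is not reproduced in this part of the document, so the linear form, the weights `w_death`, `w_injury`, and `w_time`, and the function name `reward` are all assumptions chosen only to show the intended direction: fewer casualties and a shorter evacuation give a larger reward.

```python
def reward(deaths, injuries, evacuation_time_s,
           w_death=100.0, w_injury=10.0, w_time=0.01):
    """Hypothetical reward: fewer casualties and a shorter evacuation score higher."""
    return -(w_death * deaths + w_injury * injuries + w_time * evacuation_time_s)


# Two made-up simulation outcomes
r_fast_safe = reward(deaths=0, injuries=1, evacuation_time_s=300)
r_slow_harm = reward(deaths=1, injuries=5, evacuation_time_s=600)
```

The same function could be applied to the state s₁ to obtain r₁ and to the state s₂ to obtain r₂, provided both states record the casualty counts and elapsed time.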

第２の避難シミュレーションである街区レベルの避難シミュレーションでは、シミュレーション結果から避難の状態ｓ_２が算出される。このため、第２の避難シミュレーションのシミュレーション結果に基づいて、報酬ｒ_２を最大化するように、避難指示を出力する行動価値関数Ｑ^*（ｓ，ａ）を得ることができる。 In the second evacuation simulation, the block-level evacuation simulation, the evacuation state s₂ is calculated from the simulation results. Therefore, based on the simulation results of the second evacuation simulation, it is possible to obtain an action value function Q^*(s, a) that outputs evacuation instructions so as to maximize the reward r₂.

一方、第１の避難シミュレーションである建物内の避難シミュレーションのシミュレーション結果のみでは、避難の状態ｓ_１しか算出されない。このため、第１の避難シミュレーションによって報酬ｒ_１を算出することはできても、報酬ｒ_２を直接算出することはできない。 On the other hand, the simulation results of the first evacuation simulation, the evacuation simulation inside the building, yield only the evacuation state s₁. Therefore, although the reward r₁ can be calculated from the first evacuation simulation, the reward r₂ cannot be calculated directly.

このため、例えば、第１の避難シミュレーションのシミュレーション結果に基づき報酬ｒ_１を用いて強化学習を行い、建物内から外への避難指示をモデルに学習させた場合には、建物から外への避難が完了するまでを最適化するような行動価値関数Ｑ^*（ｓ，ａ）が得られるが、この行動価値関数Ｑ^*（ｓ，ａ）は、報酬ｒ_２を最大化するような避難指示を必ずしも出力するわけではない。 Consequently, if, for example, reinforcement learning is performed using the reward r₁ based on the simulation results of the first evacuation simulation, and the model learns instructions for evacuating from inside the building to the outside, the resulting action value function Q^*(s, a) optimizes evacuation only up to leaving the building; it does not necessarily output evacuation instructions that maximize the reward r₂.

例えば、西側と東側とに出口を有する建物であって、かつ西側の出口の方が東側の出口口よりも広い建物を想定する。また、この建物の東側には避難所となる公園が存在すると想定する。この場合、建物から外へ避難するのみにおいては、西側の出口を利用した方が好ましい。しかし、建物から避難所まで避難することを考慮すると、建物の東口の出口を利用した方がより好ましいと考えられる。このような場合、第１の避難シミュレーションのシミュレーション結果のみに基づいて強化学習を行い、行動価値関数Ｑ^*（ｓ，ａ）を得ることは不適切である。 For example, assume a building that has exits on the west and east sides, and the west exit is wider than the east exit. It is also assumed that there is a park on the east side of this building that can be used as an evacuation center. In this case, it is preferable to use the exit on the west side only to evacuate from the building. However, when considering evacuation from the building to the evacuation center, it is considered more preferable to use the east exit of the building. In such a case, it is inappropriate to perform reinforcement learning based only on the simulation results of the first evacuation simulation and obtain the action value function Q ^* (s,a).
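
The two-exit example can be made concrete with a few invented numbers. None of the times below come from the source; they are hypothetical values chosen so that the wider west exit is faster for leaving the building while the east exit is faster overall because the park lies to the east.

```python
# Hypothetical evacuation times in seconds; none of these numbers come from the source.
time_to_leave_building = {"west_exit": 120, "east_exit": 180}  # west exit is wider
time_exit_to_shelter = {"west_exit": 400, "east_exit": 60}     # park lies to the east

# Optimizing only the in-building stage (what a reward like r1 would capture)
best_for_building_only = min(time_to_leave_building, key=time_to_leave_building.get)

# Optimizing the whole evacuation to the shelter (what a reward like r2 would capture)
total_time = {e: time_to_leave_building[e] + time_exit_to_shelter[e]
              for e in time_to_leave_building}
best_for_whole_evacuation = min(total_time, key=total_time.get)
```

With these numbers the in-building optimum is the west exit, but the end-to-end optimum is the east exit, which is the mismatch the paragraph describes.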

このため、報酬ｒ_２に基づいて、第１の避難シミュレーションにおける避難指示を強化学習させる手法が必要となる。そこで、本実施形態では、第１の避難シミュレーションにおける報酬ｒ_１と第２の避難シミュレーションにおける報酬ｒ_２とを尤度関数によって関係付けることにより、観測情報から避難情報を出力するためのモデルを強化学習させる。 Therefore, a method is needed that performs reinforcement learning of the evacuation instructions in the first evacuation simulation based on the reward r₂. In this embodiment, the reward r₁ in the first evacuation simulation and the reward r₂ in the second evacuation simulation are related by a likelihood function, and the model for outputting evacuation information from observation information is trained by reinforcement learning on that basis.

具体的には、まず、学習部２４は、様々な避難の状態ｓ_１を初期条件として設定し、設定された各初期条件に基づき第２の避難シミュレーションを行う。そして、学習部２４は、設定された各初期条件に基づき第２の避難シミュレーションによって得られた避難の状態ｓ_２を取得する。 Specifically, the learning unit 24 first sets various evacuation states s₁ as initial conditions and performs the second evacuation simulation for each of them. The learning unit 24 then acquires the evacuation state s₂ obtained by the second evacuation simulation for each initial condition.

次に、学習部２４は、第２の避難シミュレーションによって得られた避難の状態ｓ_２に応じた報酬ｒ_２に基づいて強化学習を行い、行動価値関数Ｑ^*（ｓ，ａ）を得る。 Next, the learning unit 24 performs reinforcement learning based on the reward r ₂ corresponding to the evacuation state s ₂ obtained by the second evacuation simulation, and obtains the action value function Q ^* (s, a).

次に、学習部２４は、第２の避難シミュレーションによって得られた避難の状態ｓ_１及び避難の状態ｓ_２に基づいて、報酬ｒ_１と報酬ｒ_２とを算出する。そして、学習部２４は、算出された報酬ｒ_１と報酬ｒ_２とを対応付ける。 Next, the learning unit 24 calculates the reward r ₁ and the reward r ₂ based on the evacuation state s ₁ and the evacuation state s ₂ obtained by the second evacuation simulation. The learning unit 24 then associates the calculated reward r ₁ with the calculated reward r ₂ .

次に、学習部２４は、対応付けられた報酬ｒ_１と報酬ｒ_２とを用いて、報酬ｒ_２に対する報酬ｒ_１の尤度Ｌ（ｒ_１｜ｒ_２）を算出する。尤度Ｌ（ｒ_１｜ｒ_２）は、報酬ｒ_２が観測されたときの報酬ｒ_１の尤もらしさを表す指標である。この尤度Ｌ（ｒ_１｜ｒ_２）によって、報酬ｒ_１と報酬ｒ_２とは、以下の式（６）に示されるような関係となる。 Next, the learning unit 24 uses the correlated rewards r ₁ and rewards r ₂ to calculate the likelihood L (r ₁ | r ₂ ) of the reward r ₁ with respect to the reward r ₂ . The likelihood L(r ₁ | r ₂ ) is an index representing the likelihood of reward r ₁ when reward r ₂ is observed. Due to this likelihood L(r ₁ |r ₂ ), the reward r ₁ and the reward r ₂ have a relationship as shown in equation (6) below.

$$P(r_2) = P(r_1) \times L(r_1 \mid r_2) \tag{6}$$

なお、上記式（６）における、Ｐ（ｒ_２）は報酬がｒ_２となる確率を表し、Ｐ（ｒ_１）は報酬がｒ_１となる確率を表す。 In the above formula (6), P(r ₂ ) represents the probability that the reward will be r ₂ , and P(r ₁ ) represents the probability that the reward will be r ₁ .
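
One possible reading of the likelihood step and equation (6) is sketched below for discretized rewards. The source does not say how L(r₁ | r₂) is estimated, so several choices here are assumptions: rewards are binned into a few discrete values, L is estimated by counting paired samples, a single observed r₁ is treated as P(r₁) = 1, and the result is normalized so that it sums to 1. The function names are hypothetical.

```python
from collections import Counter


def estimate_likelihood(pairs):
    """Estimate L(r1 | r2) as count(r1, r2) / count(r2) from paired rewards."""
    pair_counts = Counter(pairs)
    r2_counts = Counter(r2 for _, r2 in pairs)
    return {(r1, r2): n / r2_counts[r2] for (r1, r2), n in pair_counts.items()}


def r2_distribution(r1_observed, likelihood):
    """Apply equation (6) with P(r1) = 1 at the observed r1, then normalize (assumed)."""
    weights = {r2: lik for (r1, r2), lik in likelihood.items() if r1 == r1_observed}
    total = sum(weights.values())
    return {r2: w / total for r2, w in weights.items()}


# Hypothetical discretized (r1, r2) pairs from runs of the second simulation
pairs = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
likelihood = estimate_likelihood(pairs)
p_r2 = r2_distribution(1, likelihood)
expected_r2 = sum(r2 * p for r2, p in p_r2.items())
```

The expected value computed at the end is the quantity that the later training step would try to maximize when only the first simulation's reward r₁ is available.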

次に、学習部２４は、尤度Ｌ（ｒ_１｜ｒ_２）を算出した後、避難の状態ｓ₀を初期条件として、第１の避難シミュレーションを実行し、シミュレーション結果を得る。 Next, after calculating the likelihood L (r ₁ | r ₂ ), the learning unit 24 executes a first evacuation simulation using the evacuation state s ₀ as an initial condition, and obtains a simulation result.

次に、学習部２４は、第１の避難シミュレーションのシミュレーション結果に基づいて、報酬ｒ_１を算出する。そして、学習部２４は、第１の避難シミュレーションのシミュレーション結果から算出された報酬ｒ_１に対してＬ（ｒ_１｜ｒ_２）を乗じることにより、報酬ｒ_２の確率分布を算出し、報酬ｒ_２の期待値を最大化するように、第２の避難シミュレーションの結果によって既に強化学習された行動価値関数Ｑ^*（ｓ，ａ）を更に強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 Next, the learning unit 24 calculates the reward r₁ based on the simulation result of the first evacuation simulation. The learning unit 24 then multiplies the reward r₁ calculated from that simulation result by L(r₁ | r₂) to calculate the probability distribution of the reward r₂, and further trains the action value function Q^*(s, a), which has already undergone reinforcement learning on the results of the second evacuation simulation, so as to maximize the expected value of the reward r₂, thereby obtaining the action value function Q^*(s, a).

第２実施形態の避難情報生成部３８は、取得部３６によって取得された観測情報ｓを、第２実施形態の学習部２４によって学習された行動価値関数Ｑ^*（ｓ，ａ）へ入力して、災害が発生した際の避難指示ａを生成する。 The evacuation information generation unit 38 of the second embodiment inputs the observation information s acquired by the acquisition unit 36 into the action value function Q ^* (s, a) learned by the learning unit 24 of the second embodiment. , generates an evacuation instruction a when a disaster occurs.

次に、第２実施形態の避難誘導モデル学習装置１０の作用を説明する。第２実施形態の避難誘導モデル学習装置１０は、図９及び図１０の学習処理ルーチンを実行する。 Next, the operation of the evacuation guidance model learning device 10 of the second embodiment will be explained. The evacuation guidance model learning device 10 of the second embodiment executes the learning processing routines shown in FIGS. 9 and 10.

＜学習処理ルーチン＞ <Learning processing routine>

仕様情報と条件情報とが避難誘導モデル学習装置１０に入力され、受付部１２が仕様情報と条件情報とを受け付けると、設定情報記憶部２２に、仕様情報と条件情報とが格納される。そして、第２実施形態の避難誘導モデル学習装置１０は、学習処理の指示信号を受け付けると、図９及び図１０に示される学習処理ルーチンを実行する。 When the specification information and condition information are input to the evacuation guidance model learning device 10 and the reception unit 12 receives the specification information and condition information, the specification information and condition information are stored in the setting information storage unit 22. When the evacuation guidance model learning device 10 of the second embodiment receives the learning process instruction signal, it executes the learning process routine shown in FIGS. 9 and 10.

ステップＳ１００～ステップＳ１０４は、第１実施形態と同様に実行される。 Steps S100 to S104 are executed in the same manner as in the first embodiment.

ステップＳ３０６において、学習部２４は、上記ステップＳ１００で読み込まれた建物の仕様情報と、上記ステップＳ１０２で設定された災害発生条件と、上記ステップＳ１０４で設定された避難指示条件とに基づいて、街区レベルの避難シミュレーションである第２の避難シミュレーションを実行する。 In step S306, the learning unit 24 determines the city block based on the building specification information read in step S100, the disaster occurrence conditions set in step S102, and the evacuation order conditions set in step S104. A second evacuation simulation, which is a level evacuation simulation, is performed.

ステップＳ３０８において、学習部２４は、上記ステップＳ３０６で実行された第２の避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S308, the learning unit 24 stores the simulation result of the second evacuation simulation executed in step S306 in the storage unit (not shown).

ステップＳ３０９において、学習部２４は、上記ステップＳ３０８で第２の避難シミュレーションによって得られた結果である避難の状態ｓ_２に応じた報酬ｒ_２に基づいて、モデルを強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S309, the learning unit 24 performs reinforcement learning on the model based on the reward r₂ corresponding to the evacuation state s₂, which is the result of the second evacuation simulation stored in step S308, and obtains the action value function Q^*(s, a).

ステップＳ１１０～ステップＳ１１２は、第１実施形態と同様に実行される。 Steps S110 to S112 are executed in the same manner as in the first embodiment.

次に、図１０に示すステップＳ３１６において、学習部２４は、上記ステップＳ３０８で記憶された、第２の避難シミュレーションによって得られた避難の状態ｓ_１及び避難の状態ｓ_２に基づいて、報酬ｒ_１と報酬ｒ_２とを算出する。そして、学習部２４は、上記ステップＳ３１６で算出された報酬ｒ_１の各々と報酬ｒ_２の各々とを対応付ける。 Next, in step S316 shown in FIG. 10, the learning unit 24 calculates the rewards r₁ and r₂ based on the evacuation states s₁ and s₂ obtained by the second evacuation simulation and stored in step S308. The learning unit 24 then associates each calculated reward r₁ with the corresponding reward r₂.

ステップＳ３１８において、学習部２４は、上記ステップＳ３１６で対応付けられた報酬ｒ_１の各々と報酬ｒ_２の各々とを用いて、各報酬ｒ_２に対する各報酬ｒ_１の尤度Ｌ（ｒ_１｜ｒ_２）を算出する。 In step S318, the learning unit 24 uses each of the rewards r₁ and each of the rewards r₂ associated in step S316 to calculate the likelihood L(r₁ | r₂) of each reward r₁ with respect to each reward r₂.

ステップＳ３２０は、上記ステップＳ１０２と同様に実行される。 Step S320 is executed in the same manner as step S102 above.

ステップＳ３２２は、上記ステップＳ１０４と同様に実行される。 Step S322 is executed in the same manner as step S104 above.

ステップＳ３２４において、学習部２４は、避難の状態ｓ_０を初期条件として、第１の避難シミュレーションを実行し、シミュレーション結果を得る。 In step S324, the learning unit 24 executes the first evacuation simulation using the evacuation state s₀ as the initial condition, and obtains a simulation result.

ステップＳ３２６において、学習部２４は、第１の避難シミュレーションのシミュレーション結果を記憶部（図示省略）に格納する。 In step S326, the learning unit 24 stores the simulation result of the first evacuation simulation in the storage unit (not shown).

ステップＳ３２７において、学習部２４は、上記ステップＳ３２６で記憶部（図示省略）に格納された、第１の避難シミュレーションのシミュレーション結果から算出された報酬ｒ_１に対してＬ（ｒ_１｜ｒ_２）を乗じる。そして、学習部２４は、上記式（６）に従って、報酬ｒ_２の確率分布を算出し、報酬ｒ_２の期待値を最大化するように、第２の避難シミュレーションの結果によって既に強化学習された行動価値関数Ｑ^*（ｓ，ａ）を更に強化学習させ、行動価値関数Ｑ^*（ｓ，ａ）を得る。 In step S327, the learning unit 24 multiplies the reward r₁, calculated from the simulation result of the first evacuation simulation stored in the storage unit (not shown) in step S326, by L(r₁ | r₂). The learning unit 24 then calculates the probability distribution of the reward r₂ according to equation (6) above and further trains the action value function Q^*(s, a), which has already undergone reinforcement learning on the results of the second evacuation simulation, so as to maximize the expected value of the reward r₂, thereby obtaining the action value function Q^*(s, a).

ステップＳ３２８において、上記ステップＳ１１０と同様に実行される。 In step S328, the same process as in step S110 is performed.

ステップＳ３３０において、上記ステップＳ１１２と同様に実行される。 In step S330, the same process as in step S112 is performed.

ステップＳ３３２において、学習部２４は、上記ステップＳ３２７で得られた行動価値関数Ｑ^*（ｓ，ａ）を学習済みモデル記憶部２６へ格納して、学習処理ルーチンを終了する。 In step S332, the learning unit 24 stores the action value function Q ^* (s,a) obtained in step S327 above in the learned model storage unit 26, and ends the learning processing routine.

以上詳細に説明したように、第２実施形態の避難誘導モデル学習装置は、複数種類の避難シミュレーションを実行し、当該シミュレーション結果に基づいて、学習済みモデルを得る。これにより、複数の避難シミュレーションを考慮して、災害が発生した場合に被災者のリスクを最小化するように避難させるための学習済みモデルを取得することができる。 As described above in detail, the evacuation guidance model learning device of the second embodiment executes a plurality of types of evacuation simulations and obtains a trained model based on the simulation results. This makes it possible to obtain a trained model that takes a plurality of evacuation simulations into account and evacuates people so as to minimize the risk to disaster victims when a disaster occurs.

なお、本発明は、上述した実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 Note that the present invention is not limited to the embodiments described above, and various modifications and applications can be made without departing from the gist of the present invention.

例えば、上記実施形態では、避難情報出力装置が表示装置４２である場合を例に説明したが、これに限定されるものではない。例えば、避難情報出力装置は、音声出力装置であってもよく、この場合には、避難指示が音声によって出力される。また、避難情報出力装置は、各避難者が保有しているスマートフォン等の端末であってもよい。 For example, in the above embodiment, the evacuation information output device is the display device 42, but the present invention is not limited to this. For example, the evacuation information output device may be an audio output device, and in this case, the evacuation instructions are output by voice. Further, the evacuation information output device may be a terminal such as a smartphone owned by each evacuee.

また、上記では本発明に係るプログラムが記憶部（図示省略）に予め記憶（インストール）されている態様を説明したが、本発明に係るプログラムは、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ及びマイクロＳＤカード等の記録媒体に記録されている形態で提供することも可能である。 In addition, although the above describes a mode in which the program according to the present invention is stored (installed) in advance in a storage unit (not shown), the program can also be provided in a form recorded on a recording medium such as a CD-ROM, a DVD-ROM, or a micro SD card.

１０避難誘導モデル学習装置 10 Evacuation guidance model learning device
１２受付部 12 Reception unit
２０コンピュータ 20 Computer
２２設定情報記憶部 22 Setting information storage unit
２４学習部 24 Learning unit
２６学習済みモデル記憶部 26 Learned model storage unit
３０避難誘導装置 30 Evacuation guidance device
３２観測装置 32 Observation device
３４コンピュータ 34 Computer
３６取得部 36 Acquisition unit
３７学習済みモデル記憶部 37 Learned model storage unit
３８避難情報生成部 38 Evacuation information generation unit
４０制御部 40 Control unit
４２表示装置 42 Display device

Claims

An evacuation guidance device comprising:
an acquisition unit that acquires observation information representing the position or movement of people when a disaster occurs;
an evacuation information generation unit that generates evacuation information, which is information regarding an evacuation route when the disaster occurs, by inputting the observation information acquired by the acquisition unit into a trained model that has undergone reinforcement learning in advance, based on the results of an evacuation simulation of a disaster, using a reward set according to the number of dead, the number of injured, and the time until evacuation is completed in the simulated disaster; and
a control unit that controls an evacuation information output device according to the evacuation information generated by the evacuation information generation unit,
wherein the trained model has undergone reinforcement learning in advance according to the results of a first evacuation simulation targeting the inside of a building and the results of a second evacuation simulation targeting a city block outside the building.

The evacuation guidance device according to claim 1, wherein, when the reinforcement learning is executed,
the reward r₁ in the first evacuation simulation and the reward r₂ in the second evacuation simulation are related by a likelihood function L(r₁ | r₂) representing the likelihood of the reward r₁ when the reward r₂ is observed,
the probability distribution P(r₂) of the reward r₂ is calculated using the likelihood function L(r₁ | r₂) as shown in equation (1) below,
$$P(r_2) = P(r_1) \times L(r_1 \mid r_2) \tag{1}$$
and reinforcement learning is performed on the trained model so that the expected value of the reward r₂ is maximized.

An evacuation guidance model learning device comprising:
a learning unit that executes an evacuation simulation of a disaster and, based on the results of the evacuation simulation, performs reinforcement learning, using a reward set according to the number of dead, the number of injured, and the time until evacuation is completed in the simulated disaster, on a model for outputting evacuation information, which is information regarding an evacuation route when a disaster occurs, from observation information representing the position or movement of people when the disaster occurs, thereby obtaining a trained model that outputs the evacuation information from the observation information,
wherein the learning unit obtains the trained model by performing reinforcement learning on the model according to the results of a first evacuation simulation targeting the inside of a building and the results of a second evacuation simulation targeting a city block outside the building.

The evacuation guidance model learning device according to claim 3, wherein the learning unit, when performing the reinforcement learning,
relates the reward r₁ in the first evacuation simulation and the reward r₂ in the second evacuation simulation by a likelihood function L(r₁ | r₂) representing the likelihood of the reward r₁ when the reward r₂ is observed,
calculates the probability distribution P(r₂) of the reward r₂ using the likelihood function L(r₁ | r₂) as shown in equation (1) below,
$$P(r_2) = P(r_1) \times L(r_1 \mid r_2) \tag{1}$$
and performs reinforcement learning on the model so that the expected value of the reward r₂ is maximized.