JPH06161551A

JPH06161551A - Obstacle evasion system for autonomously moving object

Info

Publication number: JPH06161551A
Application number: JP4310485A
Authority: JP
Inventors: Tetsuo Shigeta; 哲雄繁田; Yukinori Kakazu; 侑昇嘉数; Sadayoshi Mikami; 貞芳三上
Original assignee: Mitsubishi Heavy Industries Ltd
Current assignee: Mitsubishi Heavy Industries Ltd
Priority date: 1992-11-19
Filing date: 1992-11-19
Publication date: 1994-06-07

Abstract

PURPOSE: To automatically determine an action appropriate to the environment and to cope flexibly and automatically with environmental changes. CONSTITUTION: Environmental information on the external environment 9 of a moving object is sensed by a direction sensor 5 and a distance sensor 6. A recognition block 11 recognizes an environmental state from the information sensed by the sensors 5 and 6. An action determination block 14 derives an action selection probability set 18 corresponding to the environmental state recognized by the recognition block 11 and determines an action command from it. The action command operates an actuator 8, which carries out the actual action. An evaluation block 12 identifies the environmental response from the change in the information sensed by the sensors 5 and 6 before and after the moving object moves. A learning block 15 updates the action selection probability set according to the evaluated environmental response to the moving object's action. In this way, an action suited to the environment can be selected.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an obstacle avoidance system for an autonomously moving object that avoids obstacles by using stochastic learning automaton theory (hereinafter abbreviated as SLA theory).

[0002]

[Prior Art] For autonomously moving objects such as automobiles with a navigation system, robots with various sensors, and unguided self-propelled automated guided vehicles, how to avoid obstacles while traveling is an important problem, and various methods have been proposed. To avoid unpredictable obstacles (including moving obstacles), methods that issue operation commands in real time based on environmental information have been adopted, for example methods using fuzzy logic, heuristics, or If-Then rules.

[0003]

FIG. 7 shows a conventional obstacle avoidance method for an autonomously moving object. First, sensors detect the state of the environment as the autonomously moving object moves, the environmental state is recognized from the environmental information collected by the sensors, and an action goal is determined. An action plan is then made for this goal, and an action command is given to the actuator. The action plan includes an avoidance strategy for avoiding an obstacle when one is present; the action command given to the actuator is based on that strategy, and the obstacle avoidance operation is carried out.

[0004]

[Problems to Be Solved by the Invention] Conventionally, however, the avoidance strategy used to determine the action command is fixed, which causes the following problems.

[0005]

(a) It is not always possible to obtain enough obstacle information to determine an avoidance action uniquely. One example is the future behavior of a moving obstacle. Conventional methods assume, for instance, that a moving obstacle travels in a straight line, and this assumption does not necessarily hold in a real environment.

[0006]

(b) Because the avoidance strategy is fixed, it cannot respond to changes in the environment. For example, an avoidance strategy adapted to an environment containing only stationary obstacles clearly cannot cope adequately with an environment containing moving obstacles. Conversely, when a moving object whose strategy is designed for moving obstacles acts in a static environment, it avoids more than necessary, so the strategy is not appropriate for that environment.

[0007]

(c) To act appropriately in every situation in the face of the problems above, the number of rules describing the action required in each situation, for example "if the environment is in state A, take action a", becomes enormous, and storing them becomes impossible.

[0008]

(d) The basis on which the avoidance strategy is set is ambiguous. For example, a heuristic method avoids obstacles by rules such as "if an obstacle is on the right, steer to the left", but it cannot be verified whether such a rule is optimal for the obstacle situation.

[0009]

The present invention has been made in view of the above circumstances. Its object is to provide an obstacle avoidance system for an autonomously moving object that can automatically determine an action appropriate to the environment and can respond flexibly and automatically to environmental changes.

[0010]

[Means for Solving the Problems] The obstacle avoidance system for an autonomously moving object according to the present invention comprises: a sensor that senses environmental information about the external environment of the moving object at the current time; recognition means for recognizing, from the environmental information sensed by the sensor, an environmental state that characteristically represents the environment; action determination means for deriving an action selection probability set for the environmental state recognized by the recognition means and probabilistically determining the action of the moving object in the current state; means for carrying out the actual action based on the action determined by the action determination means; evaluation means for identifying an environmental response from the change, obtained from the sensor, in information before and after the moving object moves; and learning means for receiving the environmental response, evaluated by the evaluation means, to the action of the moving object and updating the action selection probability set.

[0011]

[Operation] Environmental information about the external environment is sensed by the sensor as the moving object moves and is input to the recognition means. The recognition means recognizes the environmental state from the sensed information by a simple classification, for example by the direction and distance of obstacles, and inputs it to the action determination means. The action determination means holds a probability matrix, derives the action selection probability set corresponding to the current environmental state recognized by the recognition means, and on that basis determines an action command from the action set. Based on the action so determined, an actuator or the like is operated to drive the drive system and carry out the actual action.

[0012]

Meanwhile, the change in information before and after the moving object moves is sensed by the sensor and input to the evaluation means. The evaluation means identifies the environmental response from this change. That is, if the distance to an obstacle has decreased after the move, the risk of the two objects colliding has increased, so the environmental response in this case represents a bad evaluation. Conversely, if the distance to the obstacle has increased, the environmental response represents a good evaluation. The learning means then updates the action selection probability set according to the environmental response, evaluated by the evaluation means, to the action of the moving object. That is, it raises the selection probability of actions that receive a good evaluation from the environment and lowers the selection probability of actions that receive a bad evaluation. The evaluation criterion is set in the object in advance according to its goal; the object is then actually made to act in the environment, and the environmental response to its actions is obtained. As a result, highly evaluated actions, that is, actions suited to the environment, come to be selected.

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows an example in which the autonomously moving object is a front-wheel-drive automobile.

[0014]

In FIG. 1, reference numeral 1 denotes a moving device, which in this embodiment is a front-wheel-drive automobile equipped with a navigation system. The moving device 1 has front wheels 2a and 2b and rear wheels 3a and 3b, and the front wheels 2a and 2b are driven by a drive system 4. The drive system 4 stands for all the equipment that receives operation commands and moves the autonomously moving object; in this embodiment it comprises, for example, the power train from the engine to the front wheels 2a and 2b and the operating parts such as the steering wheel and the accelerator.

[0015]

The moving device 1 is provided with a direction sensor 5, for example a radar, that measures the position of an obstacle, and a distance sensor 6, for example an ultrasonic measuring device, that measures the distance to the obstacle. The information measured by the direction sensor 5 and the distance sensor 6 is sent to a computer 7 programmed with SLA theory. The computer 7 processes this information using SLA theory and outputs an action command for avoiding the obstacle to an actuator 8. The actuator 8 transmits the action command from the computer 7 to the drive system 4, which performs the motion that avoids the obstacle. The operation of the embodiment is described next with reference to the flowcharts shown in FIGS. 2(a) and 2(b).

[0016]

The environmental information of the moving device 1 measured by the direction sensor 5 and the distance sensor 6 is input to the computer 7. Using SLA theory, the computer 7 determines the action for the environmental state at the current time from the action selection probability set, updates the probability set based on the response (evaluation) from the environment to that action, thereby obtains an action suited to the environment, and gives an action command to the actuator 8.

[0017]

In SLA theory the environmental response is generally supplied by the environment. In this embodiment, however, the environmental information before and after a move is input to the computer 7 via the sensors 5 and 6, and the increase or decrease in collision risk obtained by comparing them is used in place of that response.

[0018]

SLA theory is briefly explained here. In SLA theory, the action for each state is selected probabilistically from among the actions the object is able to perform. A state here means an environmental state based on the external environment or an internal state specific to the object. Expressed formally, in environmental state $S_i$:

the probability of selecting action $A_1$ is $P_1$,
the probability of selecting action $A_2$ is $P_2$,
...
the probability of selecting action $A_r$ is $P_r$.

Here $P_1, \dots, P_r$ are each constants in $[0, 1]$ and satisfy

[0019]

[Equation 1]

$$\sum_{k=1}^{r} P_k = 1.$$

[0020]

To determine the probability set $P_i = [P_1, \dots, P_r]$, learning based on the environmental response is performed. That is, the selection probability of actions that receive a good evaluation from the environment is raised, and the selection probability of actions that receive a bad evaluation is lowered. The evaluation criterion is set in the object in advance according to its goal; the object is then actually made to act in the environment, and the environmental response to its actions is obtained. As a result, highly evaluated actions, that is, actions suited to the environment, come to be selected. The operation of the computer 7 based on SLA theory is described in detail below.

[0021]

As shown in FIG. 2(a), a recognition block 11, an evaluation block 12, and an SLA block 13 are programmed in the computer 7. As also shown in FIG. 2(a), the SLA block 13 consists of an action determination block 14 and a learning block 15.

[0022]

In the recognition block 11, the computer 7 classifies the environmental information input from the sensors 5 and 6 by its characteristic features, as shown in FIG. 3. This is done for all obstacles within the search range, or for a finite number of nearby obstacles a, b, and c. The search range is divided into four sectors, front left [FL], front right [FR], back left [BL], and back right [BR], which serve as the direction information for an obstacle. The search range is also divided into a near region CLOSE and the entire region beyond it, FAR, which serve as the distance information. FIG. 3 shows the environmental state for three obstacles a, b, and c: obstacle a is at short distance CLOSE in the back right [BR], obstacle b is at long distance FAR in the front right [FR], and obstacle c is at long distance FAR in the front left [FL]. In this way, the information on each obstacle a, b, and c is classified into one of a finite number of states.
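The classification just described can be sketched in Python. This is a minimal illustration, not the patent's implementation: the vehicle-centered coordinate frame, the `CLOSE_RADIUS` threshold, the function name `classify_obstacle`, and the obstacle coordinates are all assumptions, chosen so that the result matches the FIG. 3 example in the text.

```python
import math
from typing import Dict, Tuple

# Assumed threshold between CLOSE and FAR; the patent gives no numeric value.
CLOSE_RADIUS = 5.0


def classify_obstacle(x: float, y: float,
                      close_radius: float = CLOSE_RADIUS) -> Tuple[str, str]:
    """Map an obstacle position to (direction, distance) labels.

    Assumed frame: origin at the vehicle, +x forward, +y to the left.
    """
    front_back = "F" if x >= 0 else "B"
    left_right = "L" if y >= 0 else "R"
    distance = "CLOSE" if math.hypot(x, y) <= close_radius else "FAR"
    return front_back + left_right, distance


# Hypothetical coordinates for the three obstacles of FIG. 3.
obstacles: Dict[str, Tuple[float, float]] = {
    "a": (-2.0, -1.0),   # behind, to the right, near
    "b": (10.0, -4.0),   # ahead, to the right, far
    "c": (12.0, 6.0),    # ahead, to the left, far
}
labels = {name: classify_obstacle(x, y) for name, (x, y) in obstacles.items()}
```

With these inputs, `labels` places obstacle a in (BR, CLOSE), b in (FR, FAR), and c in (FL, FAR), as in the description of FIG. 3.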

[0023]

Next, the states of the obstacles a, b, and c are expressed as a single environmental state, as shown in FIG. 4. A "1" in this environmental state means that an obstacle in the corresponding state is present. Where no obstacle is present, "0" is entered at the position representing that state. The environmental information is thus expressed in the form of a state string. Because the environmental states this system can recognize are expressed as strings that can take only the values "0" and "1", they are finite and known. The actions available to the moving device 1 can likewise be expressed as a finite, discrete set. Therefore, as shown in FIG. 5, the recognizable environmental states U(n) can be set in advance as a state set 16, the actions available to the moving device 1 as an action set, and the corresponding action selection probability sets as a probability matrix 17.
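As a rough sketch of the state string and the probability matrix, the following Python code encodes the occupied (direction, distance) cells as a string of "0" and "1" and builds one probability row per possible string. The bit order, the eight-cell layout (four directions times two distances), the three-element `ACTIONS` tuple, and the uniform initial probabilities are assumptions; FIG. 4 and FIG. 5 may define them differently.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

DIRECTIONS = ("FL", "FR", "BL", "BR")
DISTANCES = ("CLOSE", "FAR")
# Assumed bit order: all CLOSE cells first, then all FAR cells.
CELLS: List[Tuple[str, str]] = [
    (direction, distance) for distance in DISTANCES for direction in DIRECTIONS
]
# Hypothetical action set; the patent does not enumerate the actions.
ACTIONS = ("left", "straight", "right")


def encode_state(observed: Iterable[Tuple[str, str]]) -> str:
    """Return the state string: '1' where an obstacle occupies a cell."""
    occupied = set(observed)
    return "".join("1" if cell in occupied else "0" for cell in CELLS)


def initial_probability_matrix() -> Dict[str, List[float]]:
    """One row of action selection probabilities per state string."""
    uniform = 1.0 / len(ACTIONS)
    return {
        "".join(bits): [uniform] * len(ACTIONS)
        for bits in product("01", repeat=len(CELLS))
    }


# The FIG. 3 situation: a in (BR, CLOSE), b in (FR, FAR), c in (FL, FAR).
state = encode_state([("BR", "CLOSE"), ("FR", "FAR"), ("FL", "FAR")])
matrix = initial_probability_matrix()
row = matrix[state]  # action selection probability set for this state
```

Under these assumptions there are 2^8 = 256 possible state strings, so the whole matrix fits in a small lookup table.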

[0024]

The action determination block 14 holds the probability matrix 17, retrieves the action selection probability set 18 corresponding to the current environmental state, and on that basis determines the action command A(n) from the action set 19. The determination uses a random value α in [0, 1] and the following equation.

[0025]

[Equation 2]

[0026]

Here $P_i(n)$ is the selection probability of the i-th action $a_i$, and A(n) is the action at time n. With this method, as shown in FIG. 6, each action is selected at a rate equal to its selection probability.
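Selecting each action at a rate equal to its probability from a single random value α is commonly done by roulette-wheel selection: action $a_i$ is chosen when $\sum_{k<i} P_k(n) \le \alpha < \sum_{k \le i} P_k(n)$. The equation image is not reproduced in this text, so this form is an assumption consistent with the description, and the function name `select_action` is illustrative.

```python
import random
from itertools import accumulate
from typing import Optional, Sequence


def select_action(probabilities: Sequence[float],
                  alpha: Optional[float] = None) -> int:
    """Return the index whose cumulative-probability interval contains alpha."""
    if alpha is None:
        alpha = random.random()  # uniform random value in [0, 1)
    for index, upper in enumerate(accumulate(probabilities)):
        if alpha < upper:
            return index
    # Reached only if alpha >= the accumulated total (e.g. round-off).
    return len(probabilities) - 1


# Example: three actions with probabilities 0.25, 0.5, 0.25.
chosen = select_action([0.25, 0.5, 0.25], alpha=0.6)
```

Passing `alpha` explicitly makes the choice reproducible; leaving it out draws a fresh random value on each call.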

[0027]

The action determination block 14 selects the action command A(n) as described above and outputs it to the actuator 8. The actuator 8 operates the drive system 4 according to the action command A(n) and moves the moving device 1. After the moving device 1 has moved, the environmental information from the environment 9 in response to this action is obtained by the sensors 5 and 6 and input to the evaluation block 12, which determines the environmental response 20.

[0028]

The environmental information from the sensors 5 and 6 that is used here is the distance information, before and after the move, for the objects recognized as obstacles a, b, and c before the move. If the distance to an obstacle has decreased after the move, the risk of the two objects colliding has increased, and the environmental response 20 in this case represents a bad evaluation. Conversely, if the distance has increased, the environmental response 20 represents a good evaluation. This is done for each of the obstacles a, b, and c, and their linear sum (center of gravity) is then taken to obtain the environmental response 20. If the target point of the moving device 1 is to be taken into account, the environmental response 20 for the target point is added to the linear sum. This response is likewise obtained from the distance to the target point before and after the move, but, contrary to the obstacle case, approaching yields the higher evaluation. A concrete example of this method follows.

Let m be the number of recognized obstacles and let their distances be $L_1(n), L_2(n), \dots, L_m(n)$, where n denotes the current time. The distances after the move are then $L_1(n+1), L_2(n+1), \dots, L_m(n+1)$.

[0029]

In general, the environmental response 20 is expressed as a quantity in [0, 1]; the closer it is to 1, the worse the evaluation. The environmental responses $b_1(n), b_2(n), \dots, b_m(n)$ of the individual obstacles follow the same convention and are obtained from

$$\text{If } L_j(n+1) \le L_j(n), \text{ then } b_j(n) = 1 \tag{2}$$

$$\text{If } L_j(n+1) > L_j(n), \text{ then } b_j(n) = 0 \tag{3}$$

Similarly, if the distances to the target point E before and after the move are $S(n)$ and $S(n+1)$, the environmental response $b'(n)$ for the target point is

$$\text{If } S(n+1) \ge S(n), \text{ then } b'(n) = 1 \tag{4}$$

$$\text{If } S(n+1) < S(n), \text{ then } b'(n) = 0 \tag{5}$$

From these, the environmental response B(n) is calculated as follows.
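The evaluation step of equations (2) to (5) can be sketched in Python as below. The equation image for B(n) is not reproduced in this text, so the combination used here, a plain average of the individual responses, is an assumption based on the "linear sum (center of gravity)" wording; the function names are likewise illustrative.

```python
from typing import List, Optional, Sequence


def obstacle_response(before: float, after: float) -> int:
    """Equations (2)-(3): 1 (bad) unless the obstacle is farther away."""
    return 1 if after <= before else 0


def target_response(before: float, after: float) -> int:
    """Equations (4)-(5): 1 (bad) unless the target point is closer."""
    return 1 if after >= before else 0


def environmental_response(l_before: Sequence[float],
                           l_after: Sequence[float],
                           s_before: Optional[float] = None,
                           s_after: Optional[float] = None) -> float:
    """Assumed form of B(n): the mean of all individual responses."""
    responses: List[int] = [
        obstacle_response(b, a) for b, a in zip(l_before, l_after)
    ]
    if s_before is not None and s_after is not None:
        responses.append(target_response(s_before, s_after))
    if not responses:
        raise ValueError("no obstacle or target distances were supplied")
    return sum(responses) / len(responses)


# Three obstacles (one got closer) and a target point that got closer.
B = environmental_response([4.0, 9.0, 12.0], [3.0, 10.0, 12.5], 20.0, 18.0)
```

Averaging keeps B(n) inside [0, 1] whenever every individual response lies in [0, 1].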

[0030]

[Equation 3]

The learning block 15 then learns from the environmental response B(n).

[0031]

For this learning, the [Equation 4] scheme, one of the reinforcement learning rules of SLA theory, is used. With constants μ and λ in [0, 1] and μ ≫ λ, it is written as

$$\text{If } A(n) \ne a_k:\quad P_k(n+1) = P_k(n) + \mu B(n) P_k(n) - \lambda (1 - B(n)) P_k(n) \tag{7}$$

$$\text{If } A(n) = a_k:\quad P_k(n+1) = P_k(n) + \mu B(n) (1 - P_k(n)) + \lambda (1 - B(n)) (1 - P_k(n)) \tag{8}$$

Only the probability set for the state recognized at time n is updated. The environmental response 20 is taken to be the one given in the example above.

[0032]

It has been proven that, as learning is repeated, the above scheme updates the probability set 18 so that actions appropriate to the environment 9 come to be selected (by the monotonic decrease of the penalty probability).
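The effect of repeated learning can be illustrated with a small Python sketch. It substitutes a textbook linear reward-penalty update, which keeps the probabilities summing to one by construction, for a literal transcription of equations (7) and (8); the patent's convention that B(n) = 1 is a bad evaluation is kept. The function names, the parameter values, and the two-action toy environment are assumptions made for illustration.

```python
import random
from typing import List


def update_probabilities(p: List[float], chosen: int, b: float,
                         mu: float = 0.1, lam: float = 0.01) -> List[float]:
    """Linear reward-penalty update (needs at least two actions).

    b is the environmental response in [0, 1]; 1 means a bad evaluation.
    mu scales the penalty part, lam the reward part.
    """
    r = len(p)
    new_p = []
    for k, pk in enumerate(p):
        if k == chosen:
            # Good response raises the chosen action, bad response lowers it.
            new_p.append(pk + lam * (1 - b) * (1 - pk) - mu * b * pk)
        else:
            # The other actions move the opposite way so the sum stays 1.
            new_p.append(pk - lam * (1 - b) * pk
                         + mu * b * (1.0 / (r - 1) - pk))
    return new_p


def run_toy_trial(steps: int = 500, seed: int = 0) -> List[float]:
    """Toy environment: action 0 is always good, action 1 always bad."""
    rng = random.Random(seed)
    p = [0.5, 0.5]
    for _ in range(steps):
        chosen = 0 if rng.random() < p[0] else 1
        b = 0.0 if chosen == 0 else 1.0
        p = update_probabilities(p, chosen, b)
    return p


final_p = run_toy_trial()
```

In this toy setting the probability of the well-evaluated action rises on every step, whichever action is tried, so `final_p[0]` ends close to 1.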

[0033]

In equation (6), each term can be multiplied by a proportionality coefficient to adjust its influence on the environmental response 20, provided that the environmental response B(n) remains within [0, 1]. This is shown below.

[0034]

[Equation 5]

[0035]

For every coefficient, a larger value means a greater influence. For example, giving large coefficients to obstacles that are close leads the system to learn avoidance behavior that gives priority to those obstacles.
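One way to weight the individual responses while keeping the result in [0, 1] is a weighted mean, sketched below in Python. The weighted equations themselves are not reproduced in this text, so normalizing by the sum of the coefficients is an assumption, as are the function name and the sample values.

```python
from typing import Sequence


def weighted_environmental_response(responses: Sequence[float],
                                    weights: Sequence[float]) -> float:
    """Weighted mean of individual responses (assumed form).

    Dividing by the weight sum keeps the result in [0, 1] whenever every
    response is in [0, 1] and every weight is non-negative.
    """
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * b for w, b in zip(weights, responses)) / total


# A nearby obstacle that got closer (response 1) is given double weight.
B_weighted = weighted_environmental_response([1, 0, 0], [2.0, 1.0, 1.0])
```

Here the doubled coefficient raises the response from 1/3 (equal weights) to 0.5, so the approaching nearby obstacle counts for more in the learning step.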

[0036]

The environmental responses $b_j(n)$ and $b'(n)$ for the obstacles a, b, and c and for the target point can also be expressed as values in [0, 1] proportional to the change in distance, rather than as the two values "0" and "1". For example, the following equations give such values.

$$b_j(n) = \frac{1}{2} - \frac{L_j(n+1) - L_j(n)}{2\,(v + v_j)} \tag{11}$$

$$b'(n) = \frac{1}{2} + \frac{S(n+1) - S(n)}{2v} \tag{12}$$

[0037]

Here $v$ and $v_j$ are the speeds of the moving device and of the obstacle, respectively. The obstacle speed is estimated from the distance information. These equations show that if a distance is unchanged after the move, the corresponding environmental response is 0.5, and that otherwise the response varies in proportion to the difference in distance.
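Equations (11) and (12) translate directly into Python. The sketch below assumes that the speeds are expressed as distance traveled per time step, so that the change in distance cannot exceed v + v_j (or v for the target) and the results stay in [0, 1]; the function names are illustrative.

```python
def proportional_obstacle_response(l_before: float, l_after: float,
                                   v: float, v_j: float) -> float:
    """Equation (11): 0.5 when unchanged, towards 1 as the obstacle nears."""
    return 0.5 - (l_after - l_before) / (2.0 * (v + v_j))


def proportional_target_response(s_before: float, s_after: float,
                                 v: float) -> float:
    """Equation (12): 0.5 when unchanged, towards 0 as the target nears."""
    return 0.5 + (s_after - s_before) / (2.0 * v)


# Vehicle and obstacle each cover 1.0 per step (assumed units).
closing = proportional_obstacle_response(10.0, 8.0, v=1.0, v_j=1.0)
unchanged = proportional_obstacle_response(10.0, 10.0, v=1.0, v_j=1.0)
approaching_target = proportional_target_response(20.0, 19.0, v=1.0)
```

With these inputs the head-on closing case gives the worst response, 1.0, the unchanged distance gives 0.5, and moving straight toward the target gives the best response, 0.0.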

[0038]

Although the above embodiment describes the case of an automobile equipped with a navigation system, the invention can also be applied to other autonomously moving objects such as robots and automated guided vehicles.

[0039]

[Effects of the Invention] As described in detail above, using the obstacle avoidance system for an autonomously moving object according to the present invention provides the following effects.

[0040]

(1) Learning proceeds so that the probability of selecting actions that receive a good evaluation from the external environment becomes higher. Actions appropriate to the environment can therefore be determined automatically. This amounts to acquiring an avoidance strategy.

[0041]

(2) When the environment changes, the action selection probabilities can be adjusted so that highly evaluated actions are selected in the changed environment. The system can therefore respond flexibly and automatically to environmental changes.

[0042]

(3) Even with limited sensor information, appropriate actions can be determined from action selection probabilities based on past experience. Working with finite information also reduces the amount of memory required.

(4) Because the action selection probabilities are updated based on the response from the environment, the actions taken once learning has progressed are guaranteed to be suited to the external environment.

(5) Furthermore, because no input mechanism for the environmental response is required, the system configuration is simplified.

[Brief description of drawings]

FIG. 1 is a schematic diagram of an obstacle avoidance system for an autonomously moving object according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the avoidance operation in the embodiment.

FIG. 3 is an explanatory diagram of the specific state recognition method used in the recognition block.

FIG. 4 is a diagram showing the string representation of an environmental state.

FIG. 5 is an explanatory diagram of the probability matrix in the action determination block.

FIG. 6 is an explanatory diagram of action selection based on probability.

FIG. 7 is a schematic diagram showing a conventional obstacle avoidance method.

[Explanation of symbols]

1 ... moving device, 2a, 2b ... front wheels, 3 ... rear wheels, 4 ... drive system, 5 ... direction sensor, 6 ... distance sensor, 7 ... computer, 8 ... actuator, 9 ... environment, 11 ... recognition block, 12 ... evaluation block, 13 ... SLA block, 14 ... action determination block, 15 ... learning block, 16 ... state set, 17 ... probability matrix, 18 ... action selection probability set, 19 ... action set, 20 ... environmental response.

Continuation of the front page: (72) Inventor Yukinori Kakazu, 52-1 Bunkyodai, Ebetsu-shi, Hokkaido; (72) Inventor Sadayoshi Mikami, 3-6-3 Shirakaba-cho, Hiroshima-cho, Sapporo-gun, Hokkaido

Claims

[Claims]

1. An obstacle avoidance system for an autonomously moving object, comprising: a sensor that senses environmental information about the external environment of the moving object at the current time; recognition means for recognizing, from the environmental information sensed by the sensor, an environmental state that characteristically represents the environment; action determination means for deriving an action selection probability set for the environmental state recognized by the recognition means and probabilistically determining the action of the moving object in the current state; means for carrying out the actual action based on the action determined by the action determination means; evaluation means for identifying an environmental response from the change, obtained from the sensor, in information before and after the moving object moves; and learning means for receiving the environmental response, evaluated by the evaluation means, to the action of the moving object and updating the action selection probability set.