JP2003042718A

JP2003042718A - Apparatus and method for image processing

Info

Publication number: JP2003042718A
Application number: JP2002107013A
Authority: JP
Inventors: Takamasa Echizen; 孝方越膳; Koji Tsujino; 広司辻野; Koji Akatsuka; 浩二赤塚
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2001-05-02
Filing date: 2002-04-09
Publication date: 2003-02-13
Anticipated expiration: 2022-04-09
Also published as: EP1255177A2; US20030007682A1; JP4159794B2; US7221797B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that recognizes the behavior of a moving body with high accuracy on the basis of image information on the external environment acquired while the moving body behaves. SOLUTION: A behavior-command output part 12 outputs a behavior command which makes the moving body 32 behave. A local-feature extraction part 16 extracts local feature information on an image on the basis of the image information on the external environment acquired in the moving body 32 when the behavior command is output. A whole-feature extraction part 18 extracts feature information on the whole region of the image from the local feature information. In a learning part 20, a probability statistical model used to recognize the behavior command given to the moving body 32 is calculated on the basis of the feature information on the whole region. In subsequent movement, the probability statistical model is applied to the image information on the external environment acquired in the moving body 32, and the behavior of the moving body 32 is recognized at high speed and with high accuracy.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an image processing apparatus and an image processing method, and more specifically to an image processing apparatus and method for recognizing the behavior of a moving body with high accuracy using image information of the external environment acquired while the moving body moves.

[0002]

[Prior Art] A well-known conventional method for recognizing the behavior of a moving body from image information of the external environment acquired by the moving body is to detect optical flow by calculating, for example, changes in the image-density gradient across a sequence of input images.

[0003] For example, Japanese Unexamined Patent Publication No. 2000-171250 discloses a method of detecting the current position of a moving body using optical flow. In this method, the moving body is first run along a predetermined travel area; at predetermined distance intervals along that area, the optical flow of the scene near the travel area is detected, and the relationship between each optical flow and the position where it was detected is stored. During later travel, the optical flow of the travel area is detected again and matched against all of the stored optical flows. The stored optical flow that yields the best match is selected, and the position associated with it is recognized as the current travel position of the moving body.

[0004] Japanese Unexamined Patent Publication No. H11-134504 discloses a technique that calculates optical flow from a moving image and processes it in a neural network layer to recognize behavior and determine the necessary processing. With this technique, a neural network of simple structure can determine from a moving image that an obstacle is being approached.

[0005]

[Problems to Be Solved by the Invention] However, recognizing position with the former method requires moving through the predetermined area in advance and storing the relationship between optical flow and position. More generally, extracting features such as optical flow from moving images alone in order to recognize position or behavior has the following problems. First, as the moving body moves, the positional relationship between light sources such as sunlight or fluorescent lamps and the camera mounted on the moving body changes from moment to moment, so image intensity such as brightness varies and accurate feature extraction becomes difficult. Second, vibration of the moving body during movement is transmitted to the camera, so the acquired image sequence is affected by the vibration and feature extraction accuracy drops. Third, if the image information is smoothed over multiple frames to remove the adverse effects of these intensity changes and vibrations, the computational load increases and it becomes difficult to extract features that capture objects moving at high speed with large temporal variation.

[0006] The present invention has been made in view of the above points, and its object is to provide an image processing apparatus and an image processing method that can recognize the behavior of a moving body at high speed and with high accuracy, even in a real environment, using image information of the external environment acquired by the moving body.

[0007]

[Means for Solving the Problems] The principle of the present invention is as follows. The relationship between an action command given to a moving body and the image information of the external environment that the moving body acquires while moving in accordance with that command is learned in advance and stored as a probability statistical model. During subsequent movement, the current behavior is recognized at high speed and with high accuracy on the basis of the image information of the external environment acquired by the moving body and the probability statistical model.

[0008] Conventional methods require processing such as smoothing before feature extraction in order to remove the influence of noise. The present invention dispenses with such processing. Instead, it performs online-style learning in the real environment during a pre-learning stage and uses the noise itself as data for feature extraction, thereby improving robustness to environmental changes and avoiding the ill-posed problem.

[0009] The present invention provides an image processing apparatus including: an action command output unit that outputs an action command for causing a moving body to act; a local feature extraction unit that extracts feature information of local regions of an image from image information of the external environment acquired by the moving body when the action command is output; an overall feature extraction unit that extracts feature information of the entire region of the image using the extracted local-region feature information; and a learning unit that calculates, on the basis of the extracted entire-region feature information, a probability statistical model for recognizing the action command.

[0010] The local feature extraction unit extracts the feature information of the local regions of the image using Gabor filters. The extraction uses the image intensity obtained by applying the positive-component and negative-component Gabor filters. The Gabor filters are preferably applied in eight directions.

[0011] The overall feature extraction unit fuses the feature information of the local regions using a Gaussian function.

[0012] The probability statistical model is preferably calculated by an expectation-maximization algorithm and supervised learning with a neural network, although other learning algorithms can also be used.

[0013] Once the probability model has been formed, applying it to a newly acquired image allows the behavior of the moving body to be recognized with high accuracy. The image processing apparatus of the present invention therefore further includes an action recognition unit that recognizes the behavior of the moving body by applying Bayes' rule, using the probability statistical model, to the image information and calculating a confidence for each action command.
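The Bayes-rule step described above can be sketched as follows. This is only an illustration: the function names, the dictionary layout, and the default uniform prior over commands are assumptions made for the example and are not taken from the patent.

```python
def command_confidences(likelihoods, priors=None):
    """Posterior P(command | image) for each command via Bayes' rule.

    likelihoods: dict mapping command name -> p(image | command)
    priors:      dict mapping command name -> P(command); uniform if omitted
                 (the uniform default is an assumption of this sketch)
    """
    if priors is None:
        priors = {c: 1.0 / len(likelihoods) for c in likelihoods}
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError("all likelihoods are zero; posterior is undefined")
    return {c: joint[c] / evidence for c in joint}


def recognize(likelihoods, priors=None):
    """Return the command with the highest confidence and all confidences."""
    posterior = command_confidences(likelihoods, priors)
    best = max(posterior, key=posterior.get)
    return best, posterior
```

For instance, `recognize({"straight": 0.6, "left": 0.3, "right": 0.1})` selects `"straight"`, and the returned confidences sum to 1.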

[0014] The behavior of the moving body can be recognized as described above, but it is preferable that every recognized behavior have a confidence at or above a certain level. The image processing apparatus of the present invention can therefore further include: an action evaluation unit that evaluates the action recognition by comparing a value based on the confidence calculated by the action recognition unit with a predetermined value; an attention generation unit that generates, according to the evaluation by the action evaluation unit, an attention request for updating the probability statistical model; and an attention modulation unit that changes a predetermined parameter of the overall feature extraction unit in response to the attention request. In this case, the learning unit recalculates the probability statistical model after the parameter is changed, and the action recognition unit repeats the recognition of the moving body's behavior using the recalculated model.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0016] FIG. 1 is a block diagram of an image processing apparatus 10 according to one embodiment of the present invention. The image processing apparatus 10 comprises an action command output unit 12, a local feature extraction unit 16, an overall feature extraction unit 18, a learning unit 20, a storage unit 22, an action recognition unit 24, an action evaluation unit 26, an attention generation unit 28, an attention modulation unit 30, and the like.

[0017] The image processing apparatus 10 operates in two stages. In the pre-learning stage, a moving body 32 equipped with a camera or the like is made to act in a real environment, and the correspondence between the images acquired and the actions taken at those times is learned. In the action recognition stage, the knowledge learned in the pre-learning stage is used to recognize the behavior of the moving body 32 from newly acquired image information.

[0018] The pre-learning stage uses the action command output unit 12, the local feature extraction unit 16, the overall feature extraction unit 18, the learning unit 20, and the storage unit 22 shown in FIG. 1. The action recognition stage additionally uses the action recognition unit 24, the action evaluation unit 26, the attention generation unit 28, and the attention modulation unit 30.

[0019] First, the units used in the pre-learning stage are described.

[0020] The action command output unit 12 outputs action commands to the moving body 32. An action command is a command that causes the moving body 32 to perform an action such as going straight, turning right, or turning left. The action command output unit 12 outputs action commands in response to instruction signals received from outside, but it may instead read out and output a command sequence set in advance in time series in a memory (not shown). Alternatively, the moving body 32 may recognize its own behavior from the acquired images and decide as needed which action to take next, with the action command output unit 12 outputting an action command according to that decision. The output action command is sent to the moving body 32 wirelessly or by wire and is also supplied to the overall feature extraction unit 18, where it is used to generate the overall feature information described later.

[0021] The moving body 32 is fitted with an image acquisition unit 14 such as a CCD camera. The image acquisition unit 14 acquires an image I(t) of the external environment of the moving body 32 at time t at predetermined time intervals and supplies it to the local feature extraction unit 16.

[0022] The local feature extraction unit 16 extracts feature information of the local regions of the image I(t). In this specification, a "local region" is one of the small regions obtained by dividing the entire region of the image I(t) acquired by the image acquisition unit 14 into regions of equal size; each local region contains a plurality of pixels. In this embodiment, optical flow is calculated from two temporally consecutive images I(t) and I(t+1) and used to generate the feature information of each local region (hereinafter "local feature information"). The local feature information extracted by the local feature extraction unit 16 is supplied to the overall feature extraction unit 18.

[0023] The overall feature extraction unit 18 fuses all of the local feature information obtained for the image I(t) to extract feature information for the image as a whole (hereinafter "overall feature information"). The extracted overall feature information is supplied to the learning unit 20.

[0024] The learning unit 20 performs learning on the basis of the overall feature information supplied from the overall feature extraction unit 18 and creates the probability model described later. In this embodiment, the learning uses the well-known expectation-maximization algorithm and supervised learning with a neural network, but other learning algorithms can also be used. The probability statistical model obtained as the learning result is stored in the storage unit 22 and used to recognize the behavior of the moving body 32 in the action recognition stage.

[0025] When the pre-learning stage is complete, the learning result (the probability model) can be applied to newly acquired images to recognize the behavior of the moving body 32 with high accuracy. The units used in the action recognition stage are described next.

[0026] As in the pre-learning stage, the image acquisition unit 14 acquires an image I(t) of the external environment of the moving body 32 at time t at predetermined time intervals; here it supplies the image to the action recognition unit 24.

[0027] The action recognition unit 24 applies the probability model stored in the storage unit 22 to the supplied image I(t), calculates a confidence for each action command, and recognizes the command with the highest confidence as the behavior of the moving body 32. The calculated confidences are supplied to the action evaluation unit 26.

[0028] The action evaluation unit 26 calculates the log-likelihood of the confidence of the action command. If this log-likelihood is equal to or greater than a predetermined value, the attention generation unit 28 does nothing. If it is less than the predetermined value, the recognized behavior is deemed not to have sufficient confidence, and the attention generation unit 28 generates an attention request signal and supplies it to the attention modulation unit 30.
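The threshold test described above might look like the following minimal sketch. It assumes that the log-likelihood is simply the natural logarithm of the winning confidence value; the function name `needs_attention` and the example threshold are hypothetical.

```python
import math


def needs_attention(confidence, threshold):
    """Return True when an attention request should be generated.

    confidence: confidence of the recognized action command, in (0, 1]
    threshold:  predetermined value compared against log(confidence)
    Assumption of this sketch: the log-likelihood is log(confidence).
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must lie in (0, 1]")
    return math.log(confidence) < threshold
```

With a threshold of -0.7 (roughly log 0.5), a confidence of 0.9 passes without an attention request, while a confidence of 0.2 triggers one.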

[0029] On receiving the attention request signal from the attention generation unit 28, the attention modulation unit 30 changes (modulates) a predetermined parameter value in the learning algorithm and causes the learning unit 20 to update the probability model. The learning unit 20 stores the updated probability model in the storage unit 22. The action recognition unit 24 then applies the updated probability model and recognizes the behavior of the moving body 32 again. In this way, every recognized behavior has a confidence at or above a certain level.

[0030] The image acquisition unit 14 must be attached to the moving body 32, but the image processing apparatus 10 may be attached to the moving body 32, either integrally with the image acquisition unit 14 or separately from it, or may be installed at a location away from the moving body 32. Communication between the image acquisition unit 14 and the image processing apparatus 10 may be wired or wireless.

[0031] Next, the pre-learning stage is described in detail with reference to FIGS. 1 and 2. FIG. 2 is a flowchart showing the flow of processing in the pre-learning stage.

[0032] When the moving body 32 acts in accordance with an action command from the action command output unit 12, the image acquisition unit 14 attached to the moving body 32 acquires two temporally consecutive images (step S42). The local feature extraction unit 16 then extracts local feature information from the images acquired by the image acquisition unit 14 (steps S44 to S48). Specifically, Gabor filters in a plurality of directions are applied to each local-region image in the acquired images, and the image intensity E_i(x_t, y_t) in each Gabor filter direction is calculated for each local region (step S44). The image intensity E_i(x_t, y_t) is calculated by the following equation (1).

[0033]

E_i(x_t, y_t) = Img_(t) × Gbr_i(+) + Img_(t+1) × Gbr_i(-)   (1)

Here, Gbr_i(+) and Gbr_i(-) denote the positive-component and negative-component Gabor filters, respectively. The subscript i indicates the direction of the Gabor filter; in this embodiment, i = 1, ..., 8. Img_(t) denotes a local-region image of the image acquired at time t, and Img_(t+1) denotes the local-region image of the image acquired at the next time t+1. (x_t, y_t) are the coordinates of a pixel within the local region at time t. E_i(x_t, y_t) therefore represents the image intensity in direction i for that local region.
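As a rough illustration of equation (1), the sketch below builds one oriented Gabor kernel and combines two consecutive local-region patches with it. Several points are assumptions made for the example and are not stated in the text: the kernel uses a sine carrier, the positive and negative components are taken to be the rectified positive and negative parts of that kernel, the product is an element-wise product summed over the patch to give one scalar per region, and the `wavelength` and `sigma` defaults are arbitrary.

```python
import math


def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Odd-phase Gabor kernel (size x size, size odd) oriented at angle theta."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.sin(2.0 * math.pi * along / wavelength))
        kernel.append(row)
    return kernel


def split_components(kernel):
    """Assumed Gbr(+) and Gbr(-): rectified positive and negative parts."""
    pos = [[max(v, 0.0) for v in row] for row in kernel]
    neg = [[min(v, 0.0) for v in row] for row in kernel]
    return pos, neg


def image_intensity(patch_t, patch_t1, kernel):
    """Scalar E_i for one local region: sum of Img_t*Gbr(+) + Img_t1*Gbr(-)."""
    pos, neg = split_components(kernel)
    total = 0.0
    for r in range(len(kernel)):
        for c in range(len(kernel[r])):
            total += patch_t[r][c] * pos[r][c] + patch_t1[r][c] * neg[r][c]
    return total
```

Calling `image_intensity` once per direction, for example with `theta = i * math.pi / 4` for eight directions, would give the eight values per local region that the following steps operate on.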

[0034] The directions of the Gabor filters and the number of filters applied are arbitrary. This embodiment imitates the receptive fields of human vision and uses Gabor filters in eight directions extending radially at equal angles from the center of the whole image.

[0035] Next, from the image intensities E_i(x_t, y_t) (i = 1, ..., 8) in the eight directions calculated for each local region in step S44, the local feature extraction unit 16 selects, by the following equation (2), the direction j with the greatest image intensity in each local region (step S46).

[0036]

j = argmax_i E_i(x_t, y_t)   (2)

Note that the direction j (j = 1, ..., 8) differs from one local region to another.
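Equation (2) amounts to picking the index of the largest of the eight intensities for a region. A minimal sketch, assuming the intensities for one local region are already available as a list ordered by direction (the function name is hypothetical):

```python
def dominant_direction(intensities):
    """Return (j, E_j): the 1-based direction with the largest intensity.

    intensities: list of E_i values for one local region, ordered i = 1..n
    Ties are resolved in favor of the lowest direction index.
    """
    if not intensities:
        raise ValueError("at least one intensity is required")
    best = max(range(len(intensities)), key=lambda i: intensities[i])
    return best + 1, intensities[best]
```

Running this independently for each local region gives a different j per region, which is the point made in the note above.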

[0037] The local feature extraction unit 16 then applies a Gaussian function to the maximum image intensity E_j(x_t, y_t), as in the following equation (3), to calculate the local feature information Ψ_j(x_t, y_t) for each local region (step S48).

[0038]

[Equation 1]

[0039] In equation (3), μ_j is the mean of the image intensities E_j(x_t, y_t) in the local region, and σ_j is their variance. The local feature information Ψ_j(x_t, y_t) thus expresses, as a probability density distribution, the image intensity E_j(x_t, y_t) for the direction in which the image intensity is greatest in each local region. There are as many values of Ψ_j(x_t, y_t) as there are local regions, but note that the direction j for which Ψ_j(x_t, y_t) is obtained differs from one local region to another.
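Equation (3) itself is not reproduced in this text. One form consistent with the description, assuming it is the ordinary univariate normal density evaluated at E_j(x_t, y_t) with mean μ_j and with σ_j taken as the variance as stated, would be the following; the exact form in the original may differ.

```latex
\Psi_j(x_t, y_t) \;=\; \frac{1}{\sqrt{2\pi\,\sigma_j}}
  \exp\!\left( -\,\frac{\bigl(E_j(x_t, y_t) - \mu_j\bigr)^{2}}{2\,\sigma_j} \right)
```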

[0040] On receiving the action command from the action command output unit 12 and the local feature information Ψ_j(x_t, y_t) from the local feature extraction unit 16, the overall feature extraction unit 18 fuses, for each direction j of maximum image intensity, all of the local feature information Ψ_j(x_t, y_t) obtained for that direction according to the following equation (4), and calculates the overall feature information ρ_j(χ_t|l) (step S50).

[0041]

[Equation 2]

Here, χ_t denotes the two-dimensional Cartesian coordinates (x_t, y_t).

[0042] Each calculated item of overall feature information ρ_j(χ_t|l) is stored in a class determined by the action command that the action command output unit 12 had output when the original image I(t) was acquired (step S52). Here, l denotes the action command. Three action commands (going straight, turning left, and turning right) are used in this embodiment, with l = 1 for going straight, l = 2 for turning left, and l = 3 for turning right. Accordingly, the multiple items of overall feature information ρ_j(χ_t|1) acquired while the moving body was going straight (l = 1), ρ_j(χ_t|2) acquired during left turns (l = 2), and ρ_j(χ_t|3) acquired during right turns (l = 3) are stored in separate classes.
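The class-wise storage of step S52 can be pictured as a mapping from the command index l to a list of eight-direction feature vectors. The sketch below is illustrative only; the class name `AttentionClassStore`, its methods, and the representation of each ρ as a list of eight floats are assumptions.

```python
from collections import defaultdict

# Command indices as used in the embodiment: 1 = straight, 2 = left, 3 = right.
COMMANDS = {1: "straight", 2: "left turn", 3: "right turn"}


class AttentionClassStore:
    """Stores overall feature vectors separately for each action command."""

    def __init__(self):
        self._classes = defaultdict(list)

    def add(self, command, overall_feature):
        """File one eight-direction feature vector under its command index."""
        if command not in COMMANDS:
            raise KeyError(f"unknown action command: {command}")
        if len(overall_feature) != 8:
            raise ValueError("expected one value per Gabor direction (8)")
        self._classes[command].append(list(overall_feature))

    def samples(self, command):
        """Return a copy of all feature vectors stored for a command."""
        return list(self._classes.get(command, []))
```

Supporting more commands would only require extending `COMMANDS`, which matches the remark that the number of classes follows the number of action commands.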

[0043] Each such class is an "attention class" Ω_l. The purpose of an attention class is to update learning efficiently: when new feature information is presented, not all of it is reflected in the learning result; instead, attention is focused on specific feature information.

[0044] The number of attention classes is not limited to three; any number can be set to correspond to the number of action commands.

[0045] Because the overall feature information ρ_j(χ_t|l) is calculated, in association with the action command, for each image I(t) acquired at the predetermined interval, the calculation of equation (4) results in multiple sets of eight-direction overall feature information being stored for each action command.

[0046] FIGS. 3 to 5 show the correspondence among the original image I(t), the local feature information Ψ_j(x_t, y_t), and the overall feature information ρ_j(χ_t|l). FIG. 3 corresponds to an image acquired while the moving body 32 was going straight, FIG. 4 to one acquired during a left turn, and FIG. 5 to one acquired during a right turn.

[0047] Panel (a) of each figure is an example of an image I(t) acquired by the image acquisition unit 14. Panel (b) plots the local feature information for one Gabor filter direction over the whole image, with the Z axis representing the absolute value of Ψ_j(x_t, y_t). In this example, the whole image is divided into 77 × 57 local regions. Panel (c) is a polar map showing, for each direction in which the Gabor filter is applied, the overall feature information ρ_j(χ_t|l) calculated from the local feature information of (b) by equation (4). The numbers 1 through 8 in panel (c) correspond to the directions in which the Gabor filter is applied (up, upper right, and so on).

[0048] Comparing the shapes that appear in the polar maps of panel (c) in FIGS. 3 to 5 shows that obtaining the overall feature information in eight directions for each image captures features characteristic of the behavior of the moving body 32 (action command l).

[0049] Returning to FIG. 2, after the overall feature information ρ_j(χ_t|l) is stored in step S52, the learning unit 20 performs learning on the basis of it (steps S54 to S58). Specifically, the expectation-maximization algorithm (EM algorithm) and supervised learning with a neural network are used to generate a probability model for recognizing the behavior of the moving body 32. The application of the EM algorithm and of supervised learning with a neural network in this embodiment is described below in that order.

[0050] The EM algorithm is an iterative algorithm that estimates the parameter θ giving maximum likelihood when the observed data are incomplete. If the mean of the observed data is μ^l and the covariance is Σ^l, the parameter can be written θ(μ^l, Σ^l). The EM algorithm starts from suitable initial values of θ(μ^l, Σ^l) and successively updates them by alternating two steps, the E step (expectation step) and the M step (maximization step).

[0051] First, in the E step, the conditional expectation ψ(θ|θ^(k)) is calculated by the following equation (5).

【００５２】[0052]

【数３】 [Equation 3]

[0053] Next, in the M step, the parameters μ^l and Σ^l that maximize ψ(θ|θ^(k)) are calculated by the following equation (6), and these become the new estimate θ^(k+1).

[0054]

[Equation 4]

[0055] The E step and the M step are iterated, and the resulting conditional expectation ψ(θ|θ^(k)) is partially differentiated with respect to θ^(k). Setting the result of the partial differentiation to "0" yields the final μ^l and Σ^l. Since the EM algorithm is well known in the art, no further detailed description is given.
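As an illustration of the alternation between the E step and the M step, the following Python sketch fits a two-component, one-dimensional Gaussian mixture. It is a minimal textbook-style example, not the patent's equations (5) and (6), which are not reproduced in this text. The function names, the two-component setting, the sample data, and the variance floor are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, means, variances, weights, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by alternating E and M steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E step: responsibility of each component for each sample,
        # computed with the current estimate theta^(k).
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M step: parameters that maximize the expected log likelihood,
        # giving the new estimate theta^(k+1).
        for c in range(2):
            nc = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # assumed floor to avoid a collapsed component
            weights[c] = nc / len(data)
    return means, variances, weights


# Hypothetical samples drawn around two well-separated centers.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
means, variances, weights = em_two_gaussians(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

With these hypothetical samples the estimated means settle close to the two cluster centers. In the embodiment the same idea would apply per action command l, with the overall feature information as the observed data.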

[0056] With the EM algorithm, the overall feature information of each attention class Ω_l can be represented by a normal distribution (step S54).

[0057] The overall feature extraction unit 18 uses the μ^l and Σ^l calculated for action command l in the following equation (7) to calculate the prior probability p̄(ρ|Ω_l), which is the probability that the overall feature information ρ_j(χ_t|l) belongs to the class Ω_l of action command l (step S56).

[0058]

[Equation 5]

In the above equation, N is the number of dimensions of the overall feature information ρ_j(χ_t|l).
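To make the role of the mean, the covariance, and the dimension N concrete, the following Python sketch evaluates the log density of an N-dimensional normal distribution. It is an illustration under stated assumptions rather than a reconstruction of equation (7), which is not reproduced in this text: it assumes a single Gaussian with a diagonal covariance, whereas later paragraphs indicate that equation (7) involves m Gaussians. The function name and the sample feature vector are hypothetical.

```python
import math


def gaussian_log_density(rho, mean, variances):
    """Log density of an N-dimensional normal distribution.

    Assumes a diagonal covariance, so `variances` holds the diagonal of Sigma.
    """
    n = len(rho)  # N: number of dimensions of the feature vector
    log_det = sum(math.log(v) for v in variances)
    mahalanobis = sum((r - m) ** 2 / v for r, m, v in zip(rho, mean, variances))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)


# Hypothetical 8-direction feature vector evaluated at its own class mean.
rho = [0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.4, 0.3]
mu = [0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.4, 0.3]
sigma2 = [1.0] * 8
log_p = gaussian_log_density(rho, mu, sigma2)
```

Working in the log domain avoids underflow when N is large, and it matches the later use of log likelihoods in the action recognition stage.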

[0059] Next, supervised learning using a neural network is described. In this learning, with the attention class Ω_l as the teacher signal, the conditional probability density function p(I(t)|Ω_l) is calculated for the image I(t) acquired by the image acquisition unit 14 (step S58).

[0060] FIG. 6 shows a configuration example of the hierarchical neural network used for this supervised learning. The hierarchical neural network has three layers of nodes: the input layer 72 corresponds to the original image I(t), the intermediate layer 74 to the overall feature information ρ_j(χ_t|l), and the output layer 76 to the attention class Ω_l of action command l. Although only three nodes are drawn in the input layer 72 for simplicity, there are in fact as many nodes as there are images I(t). Similarly, the intermediate layer 74 has the same number of overall feature information nodes ρ_j(χ_t|l) as the input layer 72 has nodes, and the nodes of the two layers are connected one to one. The output layer 76 has as many nodes as there are attention classes Ω_l (three in this embodiment).

[0061] In FIG. 6, "λ" denotes the synaptic weights of the hierarchical neural network. The EM algorithm has already given the probability that the overall feature information ρ_j(χ_t|l) belongs to each attention class Ω_l, and the overall feature information ρ_j(χ_t|l) is calculated in one-to-one correspondence with each pair of images I(t), I(t+1). Repeating supervised learning with the attention class Ω_l as the teacher signal therefore determines the probabilistic relationship between the image I(t) and the attention class Ω_l (that is, λ in FIG. 6). This probabilistic relationship is the conditional probability density function p(I(t)|Ω_l). Since hierarchical neural networks are well known in the art, no further detailed description is given.

[0062] Supervised learning using such a neural network yields the conditional probability density function p(I(t)|Ω_l), which is the probabilistic correspondence between the image I(t) and the attention class Ω_l.

[0063] The processing of steps S54 to S58 is executed for each action command l. In this embodiment, therefore, the prior probability p̄(ρ|Ω_l) and the conditional probability density function p(I(t)|Ω_l) (collectively called the "probabilistic model") are calculated for each of the action commands l = 1, 2, 3.

[0064] The probabilistic model calculated by the learning unit 20 is stored in the storage unit 22 (step S60). If pre-learning is to continue, the answer at step S62 is "yes", the series of processes from step S42 is repeated, and the probabilistic model is updated. Pre-learning is executed for all images I(t) acquired at predetermined intervals while the moving body 32 is acting. It ends when a probabilistic model accurate enough for action recognition is judged to have been generated, for example when processing has been completed for a predetermined number of images I(t) (step S64).

[0065] Next, the action recognition stage is described in detail with reference to FIGS. 1 and 7. FIG. 7 is a flowchart showing the flow of processing in the action recognition stage.

[0066] The image acquisition unit 14 acquires two images at each predetermined time interval (step S82).

[0067] Next, the probabilistic model calculated during pre-learning, that is, the prior probability p̄(ρ^l|Ω_l) and the conditional probability density function p(I(t)|Ω_l), is used in the following Bayes' rule to calculate the certainty factor (confidence) p(Ω_l(t)) of each attention class Ω_l (step S84). The certainty factor p(Ω_l(t)) represents the probability that the image I(t) acquired by the image acquisition unit 14 belongs to each attention class Ω_l.

[0068]

[Equation 6]

Then, of the three calculated certainty factors p(Ω₁(t)), p(Ω₂(t)), and p(Ω₃(t)), the largest is selected (step S86).

[0069] For the certainty factor p(Ω_l(t)) selected by the action recognition unit 24, the action evaluation unit 26 determines whether the log likelihood log p(Ω_l(t)) is larger than a predetermined value K (step S88). If log p(Ω_l(t)) > K, the action command l corresponding to the attention class Ω_l with the largest certainty factor is recognized as the action actually being performed by the moving body 32 when the image I(t) was acquired (step S92).

[0070] On the other hand, if log p(Ω_l(t)) ≤ K, the attention generation unit 28 issues an attention request. The attention modulation unit 30 increases the Gaussian mixture "m" in equation (7) by a predetermined value (that is, performs attention modulation) (step S90). The learning unit 20 then executes the series of processes of steps S56 to S60 in FIG. 2 again, and the probabilistic model (the prior probability p̄(ρ|Ω_l) and the conditional probability density function p(I(t)|Ω_l)) is updated.

[0071] The process then returns to step S84 in FIG. 7, and steps S84 to S88 are repeated, with the Gaussian mixture m being increased until the log likelihood log p(Ω_l(t)) reaches or exceeds the predetermined value K.
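The evaluate-and-modulate loop of steps S84 to S92 can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: `evaluate` is a hypothetical callback standing in for relearning and Bayes' rule with m Gaussians, `toy_evaluate` is an invented stand-in whose certainty simply grows with m, and the upper bound `m_max` is an added safety limit that the text does not mention.

```python
import math


def recognize_with_attention(evaluate, m, step, threshold, m_max):
    """Increase m until the log of the largest certainty factor exceeds threshold.

    evaluate(m) is a hypothetical callback returning {class: certainty factor}
    from a probabilistic model rebuilt with m Gaussians.
    """
    while True:
        conf = evaluate(m)                      # steps S84 to S86
        best = max(conf, key=conf.get)
        if math.log(conf[best]) > threshold:    # step S88
            return best, m                      # action recognized (step S92)
        if m + step > m_max:
            return None, m                      # assumed safety bound, not in the source
        m += step                               # attention modulation (step S90)


def toy_evaluate(m):
    """Invented stand-in: certainty of class 1 rises with m, then saturates."""
    p1 = min(0.9, 0.4 + 0.01 * m)
    rest = (1.0 - p1) / 2.0
    return {1: p1, 2: rest, 3: rest}


best, m_used = recognize_with_attention(
    toy_evaluate, m=20, step=10, threshold=math.log(0.75), m_max=100)
```

In this toy run the threshold is first exceeded at m = 40, so the loop stops there and reports class 1. A real system would replace `toy_evaluate` with the relearning of steps S56 to S60 followed by the certainty factor calculation.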

[0072] Alternatively, a probabilistic model created in advance may always be used, without this updating process.

[0073] As described above, the present invention does not recognize the action of the moving body from image information alone. Instead, the relationship between the overall feature information extracted from the image information and the action command is learned in advance, and action recognition uses the learning result, so the action of the moving body can be recognized at high speed and with high accuracy even in a real environment.

[0074] Furthermore, even if the moving body 32 no longer moves correctly in response to a given action command, for example because a tire is badly mounted, the true state of movement can be determined from the images.

[0075] Next, an example of the present invention is described. FIG. 8 is a block diagram of a radio-controlled car (hereinafter "RC car") 100 equipped with the image processing apparatus 10 of the present invention. Each unit of the image processing apparatus 10 has the same function as described in connection with FIG. 1. In addition to the image processing apparatus 10, the RC car 100 is provided with an image input camera 114 that acquires images, a steering control unit 132 and a drive control unit 134 that control the action of the RC car 100 according to the action command, and a receiver 136 and a transmitter 138 for communicating with the outside. The receiver 136 receives an action command signal from the outside and supplies it to the action command output unit 12. The RC car 100 takes one of three actions, going straight, turning left, or turning right, according to the action command. The action of the RC car 100 recognized by the action recognition unit 24 is transmitted to the outside via the transmitter 138.

[0076] The RC car 100 was made to acquire images while being given right-turn, straight-ahead, and left-turn action commands in order to complete the pre-learning stage, and was then made to perform action recognition on 24 frames of images. The results are described below.

[0077] FIG. 9 is a graph showing the change in the log likelihood log p(Ω_l(t)) of the certainty factor of the attention class. The horizontal axis represents the value of the Gaussian mixture m in equation (7), and the vertical axis represents the log likelihood log p(Ω_l(t)) of the certainty factor. FIG. 9 shows that the log likelihood saturates when the number of Gaussian mixtures reaches about 50. A large log likelihood of the certainty factor of an attention class for an image means that the image processing apparatus 10 recognizes a sufficiently high probability that the RC car 100 was performing the action corresponding to that attention class when the image was acquired.

[0078] FIG. 10 shows the log likelihood log p(Ω_l) of the certainty factor obtained by equation (8) when the Gaussian mixture (the number of Gaussian functions) m is 20, and FIG. 11 shows the same results for m = 50. The vertical axes in FIGS. 10 and 11 show, in (a), the log likelihood log p(Ω₁) of the certainty factor for the attention class Ω₁ of action command l = 1 (straight ahead); in (b), the log likelihood log p(Ω₂) for the attention class Ω₂ of action command l = 2 (left turn); and in (c), the log likelihood log p(Ω₃) for the attention class Ω₃ of action command l = 3 (right turn). The horizontal axis of each panel corresponds to the 24 images on which action recognition was performed. Of the 24 images, the first eight (images 1 to 8) correspond to the RC car 100 being given the left-turn action command l = 2, the middle eight (images 9 to 16) to the straight-ahead action command l = 1, and the last eight (images 17 to 24) to the right-turn action command l = 3.

[0079] Referring to FIG. 10, (a) shows the largest log likelihood of the certainty factor for the middle eight images (that is, when going straight), (b) for the first eight images (when turning left), and (c) for the last eight images (when turning right). However, the log likelihood varies widely between images, and the action recognition cannot be called sufficient.

[0080] Referring next to FIG. 11, in (a), (b), and (c) alike, as in FIG. 10, the images corresponding to the action command l show the largest log likelihood of the certainty factor. In FIG. 11, however, the log likelihood varies less between images than in FIG. 10 and the curve is smoother. This was achieved by increasing the Gaussian mixture through attention modulation.

[0081] Thus, with the image processing apparatus of the present invention, the attention classes formed bottom-up in the pre-learning stage become more reliable as learning is repeated, and in the action recognition stage the probabilistic model is updated until the log likelihood of the certainty factor exceeds the predetermined value, so the accuracy of action recognition improves.

[0082]

[Effects of the Invention] According to the present invention, the action of the moving body is not recognized from image information alone. Instead, the relationship between the image information and the action command is learned in advance and the learning result is used for the judgment, so the action of the moving body can be recognized at high speed and with high accuracy even in a real environment.

[Brief description of drawings]

FIG. 1 is a functional block diagram of an image processing apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the pre-learning stage of the image processing method according to the present invention.

FIG. 3 is a diagram showing image recognition results when the moving body goes straight (Ω₁).

FIG. 4 is a diagram showing image recognition results when the moving body turns left (Ω₂).

FIG. 5 is a diagram showing image recognition results when the moving body turns right (Ω₃).

FIG. 6 is a diagram showing a configuration example of the hierarchical neural network used for supervised learning using a neural network.

FIG. 7 is a flowchart showing the action recognition stage of the image processing method according to the present invention.

FIG. 8 is a block diagram showing the overall configuration of an RC car using the image processing apparatus according to the present invention.

FIG. 9 is a diagram showing the change in the log likelihood of the certainty factor.

FIG. 10 is a diagram showing the recognition results for the action of the RC car when the Gaussian mixture m = 20.

FIG. 11 is a diagram showing the recognition results for the action of the RC car when the Gaussian mixture m = 50.

[Explanation of symbols]

10 Image processing apparatus
12 Action command output unit
14 Image acquisition unit
16 Local feature extraction unit
18 Overall feature extraction unit
20 Learning unit
22 Storage unit
24 Action recognition unit
26 Action evaluation unit
28 Attention generation unit
30 Attention modulation unit
32 Moving body

Continuation of front page
(51) Int. Cl.⁷ (identification symbol): G06T 7/20 100; // G05D 1/02
FI: G06T 7/20 100; G05D 1/02 K
(72) Inventor: Koji Akatsuka, c/o Honda R&D Co., Ltd., 1-4-1 Chuo, Wako-shi, Saitama
F-terms (reference):
2F065 AA01 BB15 DD06 FF04 JJ03 JJ26 PP01 QQ00 QQ14 QQ17 QQ24 QQ26 QQ27 QQ33 QQ41
5B057 AA05 AA06 BA02 CE06 DA07 DB02 DB09 DC08 DC22 DC34 DC40
5H301 AA01 DD02 GG09
5L096 AA06 BA04 BA05 CA04 FA34 FA67 FA69 GA30 GA55 HA04 HA11 JA18 KA04 KA15

Claims

1. An image processing apparatus comprising: an action command output unit that outputs an action command for causing a moving body to act; a local feature extraction unit that extracts feature information of local regions of an image from image information of the external environment acquired by the moving body when the action command is output; an overall feature extraction unit that extracts feature information of the entire region of the image using the extracted feature information of the local regions; and a learning unit that calculates, based on the extracted feature information of the entire region, a probability statistical model for recognizing the action command.

2. The image processing apparatus according to claim 1, wherein the local feature extraction unit extracts the feature information of the local regions of the image using a Gabor filter.

3. The image processing apparatus according to claim 2, wherein the image intensity obtained by applying a Gabor filter of a positive component and a negative component is used for extracting the characteristic information of the local region of the image.

4. The image processing apparatus according to claim 2, wherein the Gabor filter is an 8-direction Gabor filter.

5. The image processing apparatus according to claim 1, wherein the overall feature extraction unit fuses feature information of local areas using a Gaussian function.

6. The image processing apparatus according to claim 1, wherein the probability statistical model is generated by supervised learning using an expectation-maximization algorithm and a neural network.

7. The image processing apparatus according to claim 1, further comprising an action recognition unit that recognizes the action of the moving body by applying Bayes' rule using the probability statistical model to the image information and calculating a certainty factor for each action command.

8. The image processing apparatus according to claim 7, further comprising: an action evaluation unit that evaluates the action recognition by comparing a value based on the certainty factor calculated by the action recognition unit with a predetermined value; an attention generation unit that generates an attention request for updating the probability statistical model according to the evaluation by the action evaluation unit; and an attention modulation unit that changes a predetermined parameter of the overall feature extraction unit according to the attention request, wherein the learning unit recalculates the probability statistical model after the parameter is changed.

9. An image processing method comprising: outputting an action command for causing a moving body to act; extracting feature information of local regions of an image from image information of the external environment acquired by the moving body when the action command is output; extracting feature information of the entire region of the image using the extracted feature information of the local regions; and calculating, based on the extracted feature information of the entire region, a probability statistical model for recognizing the action command.

10. The image processing method according to claim 9, wherein the extraction of the characteristic information of the local region is performed using a Gabor filter.

11. The image processing method according to claim 10, wherein image intensity obtained by applying a Gabor filter of a positive component and a negative component is used for extracting the characteristic information of the local region.

12. The image processing method according to claim 10, wherein the Gabor filter is an 8-direction Gabor filter.

13. The image processing method according to claim 9, wherein the feature information of the entire region is extracted by fusing feature information of the local region using a Gaussian function.

14. The image processing method according to claim 9, wherein the probability statistical model is generated by supervised learning using an expectation-maximization algorithm and a neural network.

15. The image processing method according to any one of claims 9 to 14, further comprising recognizing the action of the moving body by applying Bayes' rule using the probability statistical model to the image information and calculating a certainty factor for each action command.

16. The image processing method according to claim 15, further comprising: evaluating the action recognition by comparing a value based on the calculated certainty factor with a predetermined value; generating an attention request for updating the probability statistical model according to the evaluation; and changing a predetermined parameter of the overall feature extraction unit according to the attention request, wherein the probability statistical model is recalculated after the parameter is changed.