JP2022181996A

JP2022181996A - Image processing apparatus and vehicle

Info

Publication number: JP2022181996A
Application number: JP2021089266A
Authority: JP
Inventors: 將馬坂本; Shoma Sakamoto
Original assignee: Subaru Corp
Current assignee: Subaru Corp
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2022-12-08

Abstract

To provide an image processing apparatus and the like capable of speeding up the processing for estimating an object position. SOLUTION: An image processing apparatus includes: a region setting unit that sets one or more image regions in a captured image; an extraction unit that extracts two-dimensionally arranged feature quantities included in the image regions of the captured image; and a position estimation unit that estimates the position of an object to be tracked by performing a correlation operation, using a convolution operation on the feature quantities, between the image regions of the captured images obtained in frame periods adjacent along the time axis. The position estimation unit uses a Fourier transform of the feature quantities in the convolution operation, and executes the Fourier transform and the convolution operation on the two-dimensionally arranged feature quantities one dimension at a time. SELECTED DRAWING: Figure 6

Description

The present disclosure relates to an image processing device that estimates the position of an object based on a captured image, and to a vehicle equipped with such an image processing device.

Captured images obtained by an imaging device include images of various objects. For example, Patent Literature 1 discloses an image processing device that estimates the position of an object based on such captured images.

Patent Literature 1: JP 2008-123141 A

In such an image processing device, there is a demand for faster object position estimation processing. It is desirable to provide an image processing device capable of speeding up object position estimation processing, and a vehicle equipped with such an image processing device.

An image processing device according to an embodiment of the present disclosure includes: a region setting unit that sets one or more image regions in a captured image; an extraction unit that extracts two-dimensionally arranged feature quantities included in the image region of the captured image; and a position estimation unit that estimates the position of a tracked object by performing a correlation operation, using a convolution operation on the feature quantities, between the image regions of the captured images obtained in the preceding and following frame periods along the time axis. When performing the convolution operation, the position estimation unit uses a Fourier transform of the feature quantities, and executes the Fourier transform and the convolution operation on the two-dimensionally arranged feature quantities one dimension at a time.

A vehicle according to an embodiment of the present disclosure includes the image processing device according to the embodiment of the present disclosure described above, and a vehicle control unit that performs vehicle control using the estimated position of the tracked object obtained from the position estimation unit.

FIG. 1 is a block diagram showing a schematic configuration example of a vehicle according to an embodiment of the present disclosure.
FIG. 2 is a top view schematically showing an external configuration example of the vehicle shown in FIG. 1.
FIG. 3 is a schematic diagram showing an example of a left image and a right image generated by the stereo camera shown in FIG. 1.
FIG. 4 is a schematic diagram showing an example of an image region set by the region setting unit shown in FIG. 1.
FIG. 5 is a schematic diagram for explaining an overview of object tracking using a correlation operation.
FIG. 6 is a schematic diagram showing a processing example of the correlation operation according to the embodiment.
FIG. 7 is a flowchart showing an example of processing including the correlation operation according to the embodiment.
FIG. 8 is a schematic diagram comparing processing examples of the correlation operation according to a comparative example and a working example.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. The description will be given in the following order.
1. Embodiment (example of setting an image region using distance information or machine learning)
2. Modification

<1. Embodiment>
[Configuration]
FIG. 1 is a block diagram showing a schematic configuration example of a vehicle (vehicle 10) according to an embodiment of the present disclosure. FIG. 2 is a schematic top view showing an example of the exterior configuration of the vehicle 10 shown in FIG. 1.

As shown in FIG. 1, the vehicle 10 includes a stereo camera 11, an image processing device 12, and a vehicle control unit 13. FIG. 1 omits illustration of the driving force source (engine, motor, etc.) of the vehicle 10. The vehicle 10 is, for example, an electrified vehicle such as a hybrid electric vehicle (HEV) or an electric vehicle (EV), or a gasoline vehicle.

(A. Stereo camera 11)
The stereo camera 11 is a camera that generates a pair of images having parallax with each other (a left image PL and a right image PR) by capturing images of the area in front of the vehicle 10, as shown in FIG. 2, for example. As shown in FIGS. 1 and 2, the stereo camera 11 has a left camera 11L and a right camera 11R.

The left camera 11L and the right camera 11R each include, for example, a lens and an image sensor. As shown in FIG. 2, for example, the left camera 11L and the right camera 11R are arranged near the upper portion of the windshield 19 of the vehicle 10, separated from each other by a predetermined distance along the width direction of the vehicle 10. The left camera 11L and the right camera 11R perform imaging operations in synchronization with each other. Specifically, as shown in FIG. 1, the left camera 11L generates the left image PL and the right camera 11R generates the right image PR. The left image PL includes a plurality of pixel values, and the right image PR includes a plurality of pixel values. As shown in FIG. 1, the left image PL and the right image PR constitute a stereo image PIC.

FIG. 3 shows an example of such a stereo image PIC. Specifically, FIG. 3(A) shows an example of the left image PL, and FIG. 3(B) shows an example of the right image PR. Note that x and y in FIG. 3 represent the x-axis and the y-axis, respectively. In this example, another vehicle (preceding vehicle 90) is traveling ahead of the vehicle 10 on the road on which the vehicle 10 is traveling. The left camera 11L generates the left image PL by capturing the preceding vehicle 90, and the right camera 11R generates the right image PR by capturing the preceding vehicle 90.

The stereo camera 11 generates the stereo image PIC, which includes the left image PL and the right image PR described above. The stereo camera 11 also generates a series of stereo images PIC by performing imaging operations at a predetermined frame rate (for example, 60 [fps]).

(B. Image processing device 12)
The image processing device 12 is a device that performs various kinds of image processing (such as tracking an object in front of the vehicle 10) based on the stereo image PIC supplied from the stereo camera 11. As shown in FIG. 1, the image processing device 12 has an image memory 121, a distance information generation unit 122, a region setting unit 123, a feature quantity extraction unit 124, and a position estimation unit 125.

Such an image processing device 12 includes, for example, one or more processors (CPU: Central Processing Unit) that execute programs, and one or more memories communicably connected to these processors. Such memories are composed of, for example, a RAM (Random Access Memory) that temporarily stores processing data and a ROM (Read Only Memory) that stores programs.

Note that the feature quantity extraction unit 124 described above corresponds to a specific example of the "extraction unit" in the present disclosure.

(Image memory 121)
As shown in FIG. 1, the image memory 121 is a memory that temporarily stores the left image PL and the right image PR included in the stereo image PIC. The image memory 121 also sequentially supplies at least one of the stored left image PL and right image PR, as a captured image P, to each of the distance information generation unit 122 and the feature quantity extraction unit 124 (see FIG. 1).

(Distance information generation unit 122)
The distance information generation unit 122 generates distance information Iz by performing predetermined image processing, including stereo matching processing and filtering processing, based on the captured image P read from the image memory 121 (here, the left image PL and the right image PR) (see FIG. 1). Specifically, the distance information generation unit 122 generates a distance image including a plurality of pixel values based on the left image PL and the right image PR. In this example, each of the pixel values is a parallax value. In other words, each pixel value corresponds to the distance to the point in three-dimensional real space that corresponds to that pixel. The configuration is not limited to this example; each pixel value may instead be a distance value indicating the distance to the point in three-dimensional real space that corresponds to that pixel. In this manner, the distance information generation unit 122 generates the distance information Iz, which is information indicating the distance to the point corresponding to each pixel.
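As an illustration of how a parallax value corresponds to a distance, the following minimal sketch uses the standard relation for a rectified stereo pair, Z = f·B/d. The function name, the focal length in pixels, and the baseline length are assumptions introduced for illustration; the disclosure does not specify them.

```python
def disparity_to_distance(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return the distance Z = f * B / d for a rectified stereo pair.

    Hypothetical helper: focal_px (focal length in pixels) and baseline_m
    (camera separation in metres) are assumed calibration values.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

A larger parallax value therefore corresponds to a nearer point, which is why either representation can serve as the distance information Iz.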

(Region setting unit 123)
The region setting unit 123 sets one or more image regions R in the captured image P based on the distance information Iz supplied from the distance information generation unit 122. Specifically, based on the distance information Iz, the region setting unit 123 identifies a plurality of pixels that are located close to each other in the captured image P and have substantially the same parallax value, and sets a rectangular region containing those pixels as an image region R. That is, when an object is present in the captured image P, the pixels in the region corresponding to the object are located close to each other and have substantially the same parallax value. By setting the image region R in this way, the region setting unit 123 therefore sets the image region R so that it surrounds the object.
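The grouping described above can be sketched as follows. This is only one possible reading, assuming 4-connected neighbours, a fixed parallax tolerance, and a parallax of 0 meaning "no measurement"; the function name and the `tol` and `min_pixels` parameters are hypothetical and do not appear in the disclosure.

```python
from collections import deque

def set_image_regions(disparity, tol=1.0, min_pixels=4):
    """Group neighbouring pixels with similar parallax and return bounding rectangles.

    disparity: 2-D list of parallax values (0 is assumed to mean "invalid").
    Returns a list of (x_min, y_min, x_max, y_max) rectangles.
    """
    h, w = len(disparity), len(disparity[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or disparity[y0][x0] <= 0:
                continue
            # Breadth-first flood fill over pixels whose parallax is nearly equal.
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            xs, ys = [x0], [y0]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w) or seen[ny][nx]:
                        continue
                    if disparity[ny][nx] <= 0:
                        continue
                    if abs(disparity[ny][nx] - disparity[y][x]) <= tol:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
                        xs.append(nx)
                        ys.append(ny)
            if len(xs) >= min_pixels:
                # Rectangle that surrounds the grouped pixels (image region R).
                regions.append((min(xs), min(ys), max(xs), max(ys)))
    return regions
```

For a small parallax map containing two separate blobs, the function returns one rectangle per blob, in row-major order of the first pixel found.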

FIG. 4 schematically shows an example of the image regions R set by the region setting unit 123. In the example shown in FIG. 4, an image region R is set for each of two vehicles in the captured image P (here, one of the left image PL and the right image PR). Although the region setting unit 123 sets the image regions R on vehicles in this example, the configuration is not limited to this; for example, image regions R may also be set on people, guardrails, walls, and the like.

Information about the one or more image regions R set by the region setting unit 123 in this manner is supplied to the feature quantity extraction unit 124, as shown in FIG. 1.

In the example shown in FIG. 1, the region setting unit 123 sets the image region R using the distance information Iz, but the configuration is not limited to this example. That is, the region setting unit 123 may set the image region R, which is a rectangular region, by using a trained model such as a DNN (Deep Neural Network) to identify an object in the captured image P and output the coordinates of the identified object. In other words, the region setting unit 123 may set the image region R using machine learning.

(Feature quantity extraction unit 124)
The feature quantity extraction unit 124 extracts the feature quantity F included in the one or more image regions R in the captured image P (here, one of the left image PL and the right image PR) (see FIG. 1). As described in detail later (FIG. 6, etc.), the feature quantity F is composed of the pixel values of a plurality of pixels arranged in a matrix (two-dimensional arrangement). Examples of such a feature quantity F include RGB (Red, Green, Blue) feature quantities and HOG (Histograms of Oriented Gradients) feature quantities.

The feature quantity extraction unit 124 extracts such a feature quantity F using, for example, a trained DNN model. In that case, for example, each neural network in the feature quantity extraction unit 124 has a plurality of convolution layers and a plurality of pooling layers.

The feature quantity F extracted by the feature quantity extraction unit 124 in this manner is supplied to the position estimation unit 125 (see FIG. 1).

(Position estimation unit 125)
The position estimation unit 125 estimates the position of the tracked object by performing a correlation operation, using a convolution operation on the feature quantity F described above, between the image regions R of the captured images P obtained in the preceding and following frame periods along the time axis. That is, the position estimation unit 125 performs object tracking by estimating the position of the tracked object using the feature quantity F for each captured image P in each frame period.

Such object tracking is performed using, for example, an object tracking technique that uses a CNN (Convolutional Neural Network), such as KCF (Kernelized Correlation Filter), template matching, or a Siamese network.

FIG. 5 schematically shows an overview of object tracking using such a correlation operation. Note that t in FIG. 5 represents the time axis, and the same applies hereinafter.

For example, as shown in FIGS. 5(A) and 5(B), object tracking using the correlation operation estimates how the position of the tracked object (position coordinate x0) has moved (see the movement amount Δx) across the captured images P0, P1, …, Pn obtained at the timings t0, t1, …, tn along the time axis t. Specifically, during such object tracking, the position at which the correlation value between the captured images P0, P1, …, Pn is largest (takes its maximum value) is estimated as the position coordinate x0 of the tracked object, and this estimate is updated as needed.
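The idea of taking the position of the largest correlation value as the new object position can be sketched in one dimension as follows. This is a simplified illustration, assuming a circular (wrap-around) cross-correlation computed directly rather than through a Fourier transform; the function names are hypothetical.

```python
def circular_cross_correlation(prev, curr):
    """corr[s] = sum over i of prev[i] * curr[(i + s) mod n]."""
    n = len(prev)
    return [sum(prev[i] * curr[(i + s) % n] for i in range(n)) for s in range(n)]

def estimate_shift(prev, curr):
    """Return the shift s (movement amount) at which the correlation peaks."""
    corr = circular_cross_correlation(prev, curr)
    return max(range(len(corr)), key=corr.__getitem__)

# A bright feature moves two samples to the right between frames.
prev_row = [0, 1, 5, 1, 0, 0, 0, 0]
curr_row = [0, 0, 0, 1, 5, 1, 0, 0]
delta_x = estimate_shift(prev_row, curr_row)   # movement amount, here 2
x0 = 2                                         # previous peak position
x0_updated = x0 + delta_x                      # updated position coordinate
```

The updated coordinate is then used as the starting position for the next frame period.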

As an example, when object tracking is performed using the KCF described above, the various operations are carried out as follows. First, machine learning of a coefficient w is performed so as to minimize the value given by equation (1). Once the value of the coefficient w has been optimized, the resulting expression outputs the position coordinates of the tracked object in the input image. When the coefficient w is optimized in this way, the kernel trick described later is used, for example.
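Equation (1) is not reproduced in this text. For reference only, the published KCF formulation learns w by ridge regression, which can be written as below; whether equation (1) of this disclosure has exactly this form is an assumption, and the symbols x_i (training samples), y_i (regression targets), and λ (regularization weight) come from that standard formulation, not from this document.

```latex
\min_{w} \; \sum_{i} \bigl( f(x_i) - y_i \bigr)^{2} + \lambda \lVert w \rVert^{2},
\qquad f(x) = w^{\top} x
```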

FIG. 6 schematically shows a processing example of the correlation operation according to the present embodiment. Specifically, it shows a processing example of the correlation operation using the feature quantity Fn extracted in the image region R of the captured image Pn obtained in the current frame period (see FIG. 6(A)) and the feature quantity Fn+1 acquired in the image region R of the captured image Pn+1 obtained in the immediately following frame period (one frame later: see FIG. 6(B)).

In each of the feature quantities Fn and Fn+1 described above, 16 pixels PX are arranged in a matrix (two-dimensional arrangement) of 4 pixels in the vertical direction, corresponding to the y-axis direction, by 4 pixels in the horizontal direction, corresponding to the x-axis direction (see FIGS. 6(A) and 6(B)). FIG. 6 also shows a correlation map Mc obtained from the correlation operation by the convolution operation using the Fourier transform, which is described below.

In the present embodiment, the position estimation unit 125 uses a Fourier transform of the feature quantity F when performing the convolution operation on the feature quantity F described above. The position estimation unit 125 performs the correlation operation using such a Fourier transform by using, for example, the KCF described above. Furthermore, when this KCF is used, as shown in FIG. 5 for example, the Fourier transform converts feature quantities representing a color space (such as the RGB feature quantity described above) into the frequency space, which speeds up the processing.

Here, when the kernel trick described above is used in such a KCF, the function F(x) that performs the Fourier transform is defined by equation (2). The convolution operation described above, written as (f*g), is defined by equation (3).
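Equations (2) and (3) are not reproduced in this text. As a point of reference, the standard discrete forms of the Fourier transform and of the (circular) convolution for length-N sequences are given below; the notation is an assumption and may differ from the original equations, which may, for example, be written in continuous form.

```latex
\mathcal{F}[f](k) = \sum_{n=0}^{N-1} f(n)\, e^{-2\pi i\, k n / N}

(f * g)(n) = \sum_{m=0}^{N-1} f(m)\, g\bigl((n - m) \bmod N\bigr)
```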

The convolution operation on Fourier-transformed quantities is then expressed by equation (4). That is, as can be seen from the last expression in equation (4), under the Fourier transform the convolution operation becomes equal to the product of the Fourier-transformed quantities. Specifically, in the example of FIG. 6, when the convolution operation is performed on the feature quantities Fn and Fn+1 that have been transformed into the frequency space by the Fourier transform, the operation is carried out by taking the product of these feature quantities Fn and Fn+1 (matrices). Using the KCF in this way reduces the amount of calculation and speeds up the processing.
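The relationship described here is the convolution theorem: the Fourier transform of a convolution equals the element-wise product of the individual Fourier transforms. The following sketch checks this numerically for short one-dimensional sequences, using a naive DFT written with the standard library only. The function names are hypothetical, and the sketch illustrates the general theorem rather than the patent's exact equation (4).

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolution(f, g):
    """Direct circular convolution: (f*g)[t] = sum_m f[m] * g[(t - m) mod n]."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]

def convolution_via_dft(f, g):
    """Same result obtained as the inverse DFT of the product of the two DFTs."""
    product = [a * b for a, b in zip(dft(f), dft(g))]
    return [v.real for v in idft(product)]

f = [1.0, 2.0, 3.0, 4.0]
g = [0.0, 1.0, 0.5, 0.0]
direct = circular_convolution(f, g)
via_dft = convolution_via_dft(f, g)
```

The two results agree to within floating-point rounding. A cross-correlation, as opposed to a convolution, is obtained in the same way with one of the two spectra complex-conjugated before the product is taken.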

In the present embodiment, the position estimation unit 125 executes such a Fourier transform and convolution operation on the two-dimensionally arranged feature quantity F one dimension at a time (see, for example, the dashed arrows shown on the feature quantities Fn and Fn+1 in FIGS. 6(A) and 6(B)). A detailed processing example in the position estimation unit 125 is described later (FIGS. 7 and 8).

(C. Vehicle control unit 13)
The vehicle control unit 13 performs various kinds of vehicle control in the vehicle 10 using the position estimation result (estimated position Pe of the tracked object) supplied from the position estimation unit 125 (see FIG. 1). Specifically, based on the information on the estimated position Pe, the vehicle control unit 13 performs, for example, travel control of the vehicle 10 and operation control of various members in the vehicle 10.

Like the image processing device 12, such a vehicle control unit 13 includes, for example, one or more processors (CPUs) that execute programs, and one or more memories communicably connected to these processors. As in the image processing device 12, such memories are composed of, for example, a RAM that temporarily stores processing data and a ROM that stores programs.

[Operation and effects]
Next, the operation and effects of the present embodiment will be described in detail.

(A. Position estimation processing, etc. of the present embodiment)
First, with reference to FIGS. 7 and 8 in addition to FIGS. 1 to 6, an example of processing that includes the correlation operation described above (position estimation processing, etc.) in the present embodiment will be described. FIG. 7 is a flowchart showing an example of processing that includes such a correlation operation. FIG. 8 schematically shows, side by side for comparison (see reference sign CF indicated by the dashed line), processing examples of the correlation operation according to a comparative example (FIG. 8(A)) and a working example of the present embodiment (FIG. 8(B)).

In the processing example shown in FIG. 7, first, the stereo camera 11 generates a stereo image PIC (left image PL and right image PR) by capturing an image of the area in front of the vehicle 10 (step S100). Specifically, a stereo image PIC (left image PL and right image PR) such as that shown in FIG. 3 is generated.

Next, the image memory 121 in the image processing device 12 temporarily stores the stereo image PIC (left image PL and right image PR) generated in this way as the captured image P (step S101). The region setting unit 123 then determines whether the image region R in the captured image P has already been set (step S102). If it is determined that the image region R has already been set (step S102: Y), the processing proceeds to step S104 described later.

On the other hand, if it is determined that the image region R has not yet been set (step S102: N), the region setting unit 123 sets one or more image regions R in the captured image P using the technique described above (distance information Iz, machine learning, or the like) (step S103). Next, in step S104, the feature quantity extraction unit 124 extracts the feature quantity F (two-dimensional arrangement) described above that is included in the image region R of the captured image P.

Next, the position estimation unit 125 estimates the position of the tracked object by performing a correlation operation, using a convolution operation on the feature quantity F, between the image regions R of the captured images Pn and Pn+1 described above (see FIG. 6) (step S105). Specifically, the position estimation unit 125 estimates the position of the tracked object by executing the Fourier transform and the convolution operation described above on the two-dimensionally arranged feature quantity F one dimension at a time.

More specifically, in the working example of FIG. 8(B), which corresponds to the example of FIG. 6, the position estimation unit 125 performs the position estimation as follows. First, the position estimation unit 125 executes the one-dimensional processing of the Fourier transform and the convolution operation described above along the x-axis direction (row direction) of the feature quantity Fn (see the dashed arrows). The position estimation unit 125 then adds together the results of this one-dimensional processing along the x-axis direction for the plurality of sets (four sets in this example) arranged along the y-axis direction (column direction), thereby executing the Fourier transform and the convolution operation on the two-dimensionally arranged feature quantity Fn.
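The row-by-row processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes that "adding the results together" means summing the per-row one-dimensional correlation results element by element, and it uses a naive DFT and a conjugated spectrum to obtain a cross-correlation. Under this reading the result is a correlation profile along the x-axis only. All function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive one-dimensional DFT of one row."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive one-dimensional inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def rowwise_correlation(feat_prev, feat_curr):
    """1-D correlation along x for every row, accumulated over the rows (y direction)."""
    width = len(feat_prev[0])
    total = [0.0] * width
    for row_prev, row_curr in zip(feat_prev, feat_curr):
        spec_prev, spec_curr = dft(row_prev), dft(row_curr)
        # Product in the frequency domain (conjugate gives correlation, not convolution).
        corr = idft([p.conjugate() * c for p, c in zip(spec_prev, spec_curr)])
        total = [acc + v.real for acc, v in zip(total, corr)]
    return total

# 4x4 feature quantities; every row of the second one is shifted right by one pixel.
feat_n = [[9, 1, 0, 0], [8, 2, 0, 0], [7, 1, 0, 0], [9, 3, 0, 0]]
feat_n1 = [[0, 9, 1, 0], [0, 8, 2, 0], [0, 7, 1, 0], [0, 9, 3, 0]]
profile = rowwise_correlation(feat_n, feat_n1)
shift_x = max(range(len(profile)), key=profile.__getitem__)
```

Here the peak of the accumulated profile falls at a shift of one pixel, matching the displacement applied to the rows.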

Here, the x-axis direction (row direction) described above corresponds to a specific example of the "first direction" in the present disclosure. The y-axis direction (column direction) described above corresponds to a specific example of the "second direction" in the present disclosure.

If it is determined in step S105 that estimating the position of the tracked object is difficult, the position estimation ends, and the series of processing shown in FIG. 7 also ends.

Next, the vehicle control unit 13 performs various kinds of vehicle control in the vehicle 10 (such as the travel control of the vehicle 10 and the operation control of various members described above) using the position estimation result obtained in step S105 (the estimated position Pe of the tracked object described above) (step S106).

This completes the series of processing shown in FIG. 7.
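The flow of steps S101 to S106 for one frame can be summarized by the following skeleton. It is only a structural sketch: the four callables stand in for the region setting unit 123, the feature quantity extraction unit 124, the position estimation unit 125, and the vehicle control unit 13, and their names, signatures, and the use of a dictionary for state are all assumptions made for illustration.

```python
def process_frame(state, stereo_image, set_regions, extract_features,
                  estimate_position, control_vehicle):
    """Run one pass of the FIG. 7 flow; return False when tracking ends."""
    state["image"] = stereo_image                                  # S101: store image
    if state.get("regions") is None:                               # S102: region set?
        state["regions"] = set_regions(stereo_image)               # S103: set regions
    features = extract_features(stereo_image, state["regions"])   # S104: extract F
    position = estimate_position(state.get("prev_features"), features)  # S105
    state["prev_features"] = features
    if position is None:
        # Position estimation judged difficult: the series of processing ends.
        return False
    control_vehicle(position)                                      # S106: vehicle control
    return True
```

A caller would invoke this once per stereo image PIC supplied by the stereo camera (step S100) and stop when it returns False.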

(B. Effects)
In the present embodiment, the following processing is performed when the position of the tracked object is estimated by performing a correlation operation, using a convolution operation on the extracted feature quantity F (two-dimensional arrangement), between the image regions R of the captured images P obtained in the preceding and following frame periods along the time axis. That is, when the convolution operation is performed on the feature quantity F, a Fourier transform of the feature quantity F is used, and the Fourier transform and the convolution operation are each executed on the two-dimensionally arranged feature quantity F one dimension at a time. As a result, in the present embodiment, the amount of arithmetic processing (amount of calculation) is reduced compared with, for example, the case where the Fourier transform and the convolution operation are each executed in two dimensions (corresponding to the comparative example below).

具体的には、図８（Ａ）に示した比較例では、離散フーリエ変換（ＤＦＴ：Discrete Fourier Transform）を用いた場合について考えると、計算量は以下のようになる。すなわち、まず、（Ｎ×Ｎ）の２次元配置の画素ＰＸにおける１次元ＤＦＴの際の計算量のオーダーは、Ｏ（Ｎ²）となる。したがって、この比較例では上記したように、フーリエ変換が２次元で実行される（ｘ軸方向（行方向）およびｙ軸方向（列方向）の各々に沿って実行される：破線の矢印参照）ことから、２次元ＤＦＴの際の計算量のオーダーは、Ｏ（Ｎ²×Ｎ²）＝Ｏ（Ｎ⁴）となる。つまり、図８（Ａ）の例では、（４×４）の２次元配置の画素ＰＸであることから、２次元ＤＦＴの際の計算量のオーダーは、Ｏ（４²×４²）＝Ｏ（４⁴）となる。 Specifically, in the comparative example shown in FIG. 8(A), the amount of computation when a discrete Fourier transform (DFT) is used is as follows. First, the order of the amount of computation for a one-dimensional DFT on the pixels PX in an (N×N) two-dimensional arrangement is O(N²). In this comparative example, as described above, the Fourier transform is executed in two dimensions (along each of the x-axis direction (row direction) and the y-axis direction (column direction); see the dashed arrows), so the order of the amount of computation for the two-dimensional DFT is O(N²×N²) = O(N⁴). That is, since the example of FIG. 8(A) has pixels PX in a (4×4) two-dimensional arrangement, the order of the amount of computation for the two-dimensional DFT is O(4²×4²) = O(4⁴).

これに対して、図８（Ｂ）に示した実施例では、同様にしてＤＦＴを用いた場合について考えると、計算量は以下のようになる。すなわち、この実施例では、ｘ軸に沿って実行した１次元ＤＦＴ（破線の矢印参照）の結果が、ｙ軸方向に沿った複数組（４組）同士で互いに加算されることから、トータルでの計算量のオーダーは、Ｏ（Ｎ²×Ｎ）＝Ｏ（Ｎ³）＝Ｏ（４³）となる。つまり、この実施例では上記した比較例と比べ、計算精度を保持しつつ、特徴量Ｆｎに対するＤＦＴの際の計算量が、削減されることになる。なお、このような計算量の削減については、上記したＤＦＴの場合には限られず、例えば高速フーリエ変換（ＦＦＴ：Fast Fourier Transform）などの、他の種類のフーリエ変換等においても、同様となる。 In contrast, in the example shown in FIG. 8(B), the amount of computation when the DFT is likewise used is as follows. In this example, the results of the one-dimensional DFTs executed along the x-axis (see the dashed arrows) are added together across the plurality of sets (four sets) arranged along the y-axis direction, so the order of the total amount of computation is O(N²×N) = O(N³) = O(4³). That is, compared with the comparative example described above, this example reduces the amount of computation for the DFT of the feature amount Fn while maintaining the calculation accuracy. This reduction in the amount of computation is not limited to the DFT described above, and applies similarly to other types of Fourier transform, such as the fast Fourier transform (FFT).
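The operation counts compared above can be illustrated with a minimal Python sketch. It simply counts multiply-accumulate steps under the accounting used in the text (a direct two-dimensional DFT versus one-dimensional DFTs executed row by row). The function names are hypothetical and are not taken from the disclosure.

```python
def count_direct_2d_dft(n):
    """Count multiply-accumulate steps of a direct 2D DFT on an n x n array."""
    ops = 0
    for u in range(n):
        for v in range(n):          # each of the n*n output coefficients ...
            for x in range(n):
                for y in range(n):  # ... sums over all n*n input pixels
                    ops += 1
    return ops


def count_rowwise_1d_dft(n):
    """Count multiply-accumulate steps of a 1D DFT applied to each of n rows."""
    ops = 0
    for row in range(n):            # n rows along the y-axis direction
        for k in range(n):          # n output coefficients per row ...
            for x in range(n):      # ... each summing over n pixels of the row
                ops += 1
    return ops


if __name__ == "__main__":
    n = 4  # the (4 x 4) arrangement of FIG. 8
    print(count_direct_2d_dft(n), count_rowwise_1d_dft(n))
```

For N = 4 the two counts are 4⁴ = 256 and 4³ = 64, matching the orders given for FIG. 8(A) and FIG. 8(B).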

以上のことから、本実施の形態では、物体位置の推定処理の高速化（処理時間の短縮化）を図ることが可能となる。 As described above, in the present embodiment, it is possible to speed up the object position estimation process (reduce the processing time).

また、本実施の形態では、ｘ軸方向に沿った１次元処理の結果を、ｙ軸方向に沿った複数組同士で互いに加算することによって、２次元配置の特徴量Ｆに対するフーリエ変換および畳み込み演算を実行するようにしたので、以下のようになる。すなわち、このような具体的な演算手法を用いることで、上記したような計算量の削減を、実際に実現することが可能となる。 In addition, in the present embodiment, the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount F are executed by adding together, across the plurality of sets arranged along the y-axis direction, the results of the one-dimensional processing along the x-axis direction. Using such a specific calculation technique makes it possible to actually achieve the reduction in the amount of computation described above.
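The row-wise processing described above might be sketched as follows, under stated assumptions: each row of the feature amount is transformed with a one-dimensional DFT, the per-row frequency-domain products are added together across rows, and a single inverse transform yields a one-dimensional correlation map along the x-axis. This is a plain circular cross-correlation, not the full processing of the embodiment, and all names (`rowwise_correlation_map`, `estimate_shift`, `shift_right`) are hypothetical.

```python
import cmath


def dft(x):
    """Naive 1D discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive 1D inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def rowwise_correlation_map(prev_rows, cur_rows):
    """Correlate two 2D feature arrays using only 1D DFTs along x.

    The per-row products conj(P) * Q are summed over the rows (y direction)
    in the frequency domain, then inverse-transformed once.
    """
    n = len(prev_rows[0])
    acc = [0j] * n
    for p, q in zip(prev_rows, cur_rows):
        P, Q = dft(p), dft(q)
        for k in range(n):
            acc[k] += P[k].conjugate() * Q[k]
    return [v.real for v in idft(acc)]


def estimate_shift(prev_rows, cur_rows):
    """Return the x shift at which the correlation map peaks."""
    mc = rowwise_correlation_map(prev_rows, cur_rows)
    return max(range(len(mc)), key=lambda s: mc[s])


def shift_right(row, d):
    """Circularly shift a row to the right by d pixels."""
    return row[-d:] + row[:-d] if d else list(row)


if __name__ == "__main__":
    f_n = [[1, 5, 2, 0, 0, 3, 0, 0],
           [0, 2, 7, 1, 0, 0, 4, 0],
           [3, 0, 0, 6, 2, 0, 0, 1]]
    f_n1 = [shift_right(r, 3) for r in f_n]  # same pattern moved 3 pixels
    print(estimate_shift(f_n, f_n1))
```

Because the transform is linear, summing the per-row products before the inverse transform gives the same map as summing the per-row correlation results afterward, so only one inverse transform is needed.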

更に、本実施の形態では、追跡対象物体の位置推定を行う際に、前述したＫＣＦを用いて相関演算を行うと共に、フーリエ変換によって、色空間を示す特徴量Ｆから周波数空間への変換を行うようにしたので、更なる高速化（処理時間の更なる短縮化）を図ることが可能となる。 Furthermore, in the present embodiment, when estimating the position of the object to be tracked, the correlation calculation is performed using the KCF described above, and the feature quantity F representing the color space is transformed into the frequency space by Fourier transform. As a result, it is possible to further increase the speed (further shorten the processing time).
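As a rough illustration of how a correlation filter of the KCF family operates in the frequency domain, the sketch below uses a simplified linear-kernel variant restricted to one-dimensional DFTs along x, with the kernel summed across rows. The Gaussian target, the regularization value `lam`, the normalization, and all function names are assumptions made for illustration. The disclosure does not specify these details, and the embodiment's KCF may differ.

```python
import cmath
import math


def dft(x):
    """Naive 1D discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive 1D inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def linear_kernel_hat(rows_a, rows_b):
    """Frequency-domain linear kernel: per-row A * conj(B), summed over rows."""
    n = len(rows_a[0])
    acc = [0j] * n
    for a, b in zip(rows_a, rows_b):
        A, B = dft(a), dft(b)
        for k in range(n):
            acc[k] += A[k] * B[k].conjugate()
    return [v / (n * len(rows_a)) for v in acc]


def train(rows_x, sigma=1.0, lam=1e-4):
    """Learn filter coefficients from the previous frame's feature rows."""
    n = len(rows_x[0])
    # Gaussian regression target peaked at zero shift (circular distance).
    y = [math.exp(-0.5 * (min(t, n - t) / sigma) ** 2) for t in range(n)]
    kxx = linear_kernel_hat(rows_x, rows_x)
    Y = dft(y)
    return [Y[k] / (kxx[k] + lam) for k in range(n)]


def detect(alpha_hat, rows_x, rows_z):
    """Return the response map for the current frame's feature rows."""
    kzx = linear_kernel_hat(rows_z, rows_x)
    resp = idft([alpha_hat[k] * kzx[k] for k in range(len(kzx))])
    return [v.real for v in resp]


if __name__ == "__main__":
    x = [[1, 5, 2, 0, 0, 3, 0, 0],
         [0, 2, 7, 1, 0, 0, 4, 0],
         [3, 0, 0, 6, 2, 0, 0, 1]]
    z = [r[-2:] + r[:-2] for r in x]  # pattern moved 2 pixels to the right
    response = detect(train(x), x, z)
    print(max(range(len(response)), key=lambda s: response[s]))
```

The peak of the response map indicates the estimated displacement along x. Training and detection both reduce to element-wise operations on the transformed rows, which is where the frequency-domain formulation saves time.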

加えて、本実施の形態では、前述した距離情報Ｉｚまたは機械学習を利用して、撮像画像Ｐにおける画像領域Ｒを設定するようにしたので、そのような画像領域Ｒを容易に設定することができ、この点でも、更なる高速化（処理時間の更なる短縮化）を図ることが可能となる。 In addition, in the present embodiment, the above-described distance information Iz or machine learning is used to set the image region R in the captured image P, so that such an image region R can be easily set. In this respect as well, it is possible to achieve further speeding up (further shortening of processing time).

＜２．変形例＞ <2. Modifications>
以上、実施の形態を挙げて本開示を説明したが、本開示はこの実施の形態に限定されず、種々の変形が可能である。 Although the present disclosure has been described above with reference to the embodiment, the present disclosure is not limited to this embodiment, and various modifications are possible.

例えば、車両１０や画像処理装置１２における各部材の構成（形式、形状、配置、個数等）については、上記実施の形態で説明したものには限られない。すなわち、これらの各部材における構成については、他の形式や形状、配置、個数等であってもよい。また、上記実施の形態で説明した各種パラメータの値や範囲、大小関係等についても、上記実施の形態で説明したものには限られず、他の値や範囲、大小関係等であってもよい。 For example, the configuration (type, shape, arrangement, number, etc.) of each member in the vehicle 10 and the image processing device 12 is not limited to that described in the above embodiment. That is, the configuration of each of these members may be of other types, shapes, arrangements, numbers, and the like. Further, the values, ranges, magnitude relationships, etc. of the various parameters described in the above embodiments are not limited to those described in the above embodiments, and may be other values, ranges, magnitude relationships, and the like.

具体的には、例えば上記実施の形態では、ステレオカメラ１１が車両１０の前方を撮像するように構成されていたが、このような構成には限定されず、例えばステレオカメラ１１が、車両１０の側方や後方を撮像するように構成してもよい。また、上記実施の形態では、ステレオカメラ１１を用いた場合の例について説明したが、この例には限られず、例えば単眼のカメラを用いて、上記実施の形態で説明した各種処理を行うようにしてもよい。 Specifically, for example, in the above embodiment, the stereo camera 11 is configured to capture images in front of the vehicle 10, but the configuration is not limited to this; for example, the stereo camera 11 may be configured to capture images to the side of or behind the vehicle 10. In addition, although the above embodiment describes an example in which the stereo camera 11 is used, the present disclosure is not limited to this example; for example, a monocular camera may be used to perform the various kinds of processing described in the above embodiment.

また、例えば、上記実施の形態では、車両１０や画像処理装置１２において行われる各種処理について、具体例を挙げて説明したが、これらの具体例には限られない。すなわち、他の手法を用いて、これらの各種処理を行うようにしてもよい。具体的には、例えば、前述したフーリエ変換および畳み込み演算の１次元処理については、上記実施の形態で説明した手法（ｘ軸方向に沿った１次元処理）には限られず、例えば、ｙ軸方向に沿った１次元処理や、その他の方向（斜め方向など）に沿った１次元処理であってもよい。また、上記実施の形態では、ＫＣＦを用いた処理手法の例について説明したが、この例には限られず、他の処理手法を用いるようにしてもよい。 Further, for example, in the above embodiment, the various kinds of processing performed in the vehicle 10 and the image processing device 12 have been described with specific examples, but the processing is not limited to these specific examples. That is, other methods may be used to perform these kinds of processing. Specifically, for example, the one-dimensional processing of the Fourier transform and the convolution operation described above is not limited to the method described in the above embodiment (one-dimensional processing along the x-axis direction), and may be, for example, one-dimensional processing along the y-axis direction or one-dimensional processing along another direction (such as an oblique direction). In addition, although the above embodiment describes an example of a processing method using KCF, the present disclosure is not limited to this example, and other processing methods may be used.

更に、上記実施の形態で説明した一連の処理は、ハードウェア（回路）で行われるようにしてもよいし、ソフトウェア（プログラム）で行われるようにしてもよい。ソフトウェアで行われるようにした場合、そのソフトウェアは、各機能をコンピュータにより実行させるためのプログラム群で構成される。各プログラムは、例えば、上記コンピュータに予め組み込まれて用いられてもよいし、ネットワークや記録媒体から上記コンピュータにインストールして用いられてもよい。 Furthermore, the series of processes described in the above embodiment may be performed by hardware (circuit) or by software (program). When it is performed by software, the software consists of a program group for executing each function by a computer. Each program, for example, may be installed in the computer in advance and used, or may be installed in the computer from a network or a recording medium and used.

また、上記実施の形態では、画像処理装置１２が車両に設けられている場合の例について説明したが、この例には限られず、そのような画像処理装置１２が、例えば、車両以外の移動体や、移動体以外の装置に設けられているようにしてもよい。 In addition, although the above embodiment describes an example in which the image processing device 12 is provided in a vehicle, the present disclosure is not limited to this example; such an image processing device 12 may be provided in, for example, a mobile body other than a vehicle, or in a device other than a mobile body.

更に、これまでに説明した各種の例を、任意の組み合わせで適用させるようにしてもよい。 Furthermore, the various examples described so far may be applied in any combination.

なお、本明細書中に記載された効果はあくまで例示であって限定されるものではなく、また、他の効果があってもよい。 Note that the effects described in this specification are merely examples and are not limited, and other effects may be provided.

また、本開示は、以下のような構成を取ることも可能である。
（１）
撮像画像において、１または複数の画像領域を設定する領域設定部と、
前記撮像画像の前記画像領域に含まれる、２次元配置の特徴量を抽出する抽出部と、
時間軸に沿った前後のフレーム期間にてそれぞれ得られた、前記撮像画像の前記画像領域同士において、前記特徴量に対する畳み込み演算を利用した相関演算を行うことにより、追跡対象物体の位置を推定する位置推定部と
を備え、
前記位置推定部は、
前記畳み込み演算を行う際に、前記特徴量に対するフーリエ変換を用いると共に、
前記フーリエ変換および前記畳み込み演算をそれぞれ、前記２次元配置の前記特徴量に対して、１次元ずつ実行する
画像処理装置。
（２）
前記位置推定部は、
前記特徴量における第１の方向に沿って、前記フーリエ変換および前記畳み込み演算の１次元処理を、それぞれ実行すると共に、
前記第１の方向に沿った前記１次元処理の結果を、前記第１の方向とは異なる第２の方向に沿った複数組同士で、互いに加算することにより、
前記２次元配置の前記特徴量に対する、前記フーリエ変換および前記畳み込み演算を実行する
上記（１）に記載の画像処理装置。
（３）
前記位置推定部は、
ＫＣＦ（Kernelized Correlation Filter）を用いて前記相関演算を行うと共に、
前記フーリエ変換によって、色空間を示す前記特徴量から周波数空間への変換を行う
上記（１）または（２）に記載の画像処理装置。
（４）
前記領域設定部は、ステレオカメラから得られる前記撮像画像としての左画像および右画像に基づいて生成される距離情報、または、機械学習を利用して、前記画像領域を設定する
上記（１）ないし（３）のいずれかに記載の画像処理装置。
（５）
上記（１）ないし（４）のいずれかに記載の画像処理装置と、
前記位置推定部から供給される前記追跡対象物体の推定位置を利用して、車両制御を行う車両制御部と
を備えた車両。
（６）
１または複数のプロセッサと
前記１または複数のプロセッサに通信可能に接続される１または複数のメモリと
を備え、
前記１または複数のプロセッサは、
撮像画像において、１または複数の画像領域を設定することと、
前記撮像画像の前記画像領域に含まれる、２次元配置の特徴量を抽出することと、
時間軸に沿った前後のフレーム期間にてそれぞれ得られた、前記撮像画像の前記画像領域同士において、前記特徴量に対する畳み込み演算を利用した相関演算を行うことにより、追跡対象物体の位置を推定することと
を行うと共に、
前記畳み込み演算を行う際に、前記特徴量に対するフーリエ変換を用いると共に、
前記フーリエ変換および前記畳み込み演算をそれぞれ、前記２次元配置の前記特徴量に対して、１次元ずつ実行する
画像処理装置。
In addition, the present disclosure can also be configured as follows.
(1)
An image processing device including:
an area setting unit that sets one or more image areas in a captured image;
an extraction unit that extracts a two-dimensionally arranged feature amount included in the image area of the captured image; and
a position estimation unit that estimates the position of a tracked object by performing a correlation operation, using a convolution operation on the feature amount, between the image areas of the captured images obtained in the preceding and following frame periods along the time axis,
wherein the position estimation unit,
when performing the convolution operation, uses a Fourier transform of the feature amount, and
executes the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount one dimension at a time.
(2)
The image processing device according to (1) above, wherein the position estimation unit
executes the one-dimensional processing of each of the Fourier transform and the convolution operation along a first direction in the feature amount, and
adds together the results of the one-dimensional processing along the first direction across a plurality of sets arranged along a second direction different from the first direction,
thereby executing the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount.
(3)
The image processing device according to (1) or (2) above, wherein the position estimation unit
performs the correlation operation using KCF (Kernelized Correlation Filter), and
uses the Fourier transform to transform the feature amount representing a color space into a frequency space.
(4)
The image processing device according to any one of (1) to (3) above, wherein the area setting unit sets the image area using machine learning or distance information generated based on a left image and a right image serving as the captured images obtained from a stereo camera.
(5)
A vehicle including:
the image processing device according to any one of (1) to (4) above; and
a vehicle control unit that performs vehicle control using the estimated position of the tracked object supplied from the position estimation unit.
(6)
An image processing device including:
one or more processors; and
one or more memories communicably connected to the one or more processors,
wherein the one or more processors perform:
setting one or more image areas in a captured image;
extracting a two-dimensionally arranged feature amount included in the image area of the captured image; and
estimating the position of a tracked object by performing a correlation operation, using a convolution operation on the feature amount, between the image areas of the captured images obtained in the preceding and following frame periods along the time axis, and
wherein the one or more processors,
when performing the convolution operation, use a Fourier transform of the feature amount, and
execute the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount one dimension at a time.

１０…車両、１１…ステレオカメラ、１１Ｌ…左カメラ、１１Ｒ…右カメラ、１２…画像処理装置、１２１…画像メモリ、１２２…距離情報生成部、１２３…領域設定部、１２４…特徴量抽出部、１２５…位置推定部、１３…車両制御部、１９…フロントガラス、９０…先行車両、ＰＬ…左画像、ＰＲ…右画像、ＰＩＣ…ステレオ画像、Ｐ，Ｐ０～Ｐｎ，Ｐｎ＋１…撮像画像、Ｒ…画像領域、Ｉｚ…距離情報、Ｆ，Ｆｎ，Ｆｎ＋１…特徴量、Ｍｃ…相関マップ、Ｐｅ…推定位置、ｘ０…位置座標、ＰＸ…画素、ｔ…時間軸、ｔ０～ｔｎ…タイミング。 Reference Signs List: 10: vehicle, 11: stereo camera, 11L: left camera, 11R: right camera, 12: image processing device, 121: image memory, 122: distance information generation unit, 123: area setting unit, 124: feature amount extraction unit, 125: position estimation unit, 13: vehicle control unit, 19: windshield, 90: preceding vehicle, PL: left image, PR: right image, PIC: stereo image, P, P0 to Pn, Pn+1: captured image, R: image area, Iz: distance information, F, Fn, Fn+1: feature amount, Mc: correlation map, Pe: estimated position, x0: position coordinate, PX: pixel, t: time axis, t0 to tn: timing.

Claims

1. An image processing apparatus comprising:
an area setting unit that sets one or more image areas in a captured image;
an extraction unit that extracts a two-dimensionally arranged feature amount included in the image area of the captured image; and
a position estimation unit that estimates the position of a tracked object by performing a correlation operation, using a convolution operation on the feature amount, between the image areas of the captured images obtained in the preceding and following frame periods along the time axis,
wherein the position estimation unit,
when performing the convolution operation, uses a Fourier transform of the feature amount, and
executes the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount one dimension at a time.

2. The image processing apparatus according to claim 1, wherein the position estimation unit
executes the one-dimensional processing of each of the Fourier transform and the convolution operation along a first direction in the feature amount, and
adds together the results of the one-dimensional processing along the first direction across a plurality of sets arranged along a second direction different from the first direction,
thereby executing the Fourier transform and the convolution operation on the two-dimensionally arranged feature amount.

3. The image processing apparatus according to claim 1, wherein the position estimation unit
performs the correlation operation using KCF (Kernelized Correlation Filter), and
uses the Fourier transform to transform the feature amount representing a color space into a frequency space.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the area setting unit sets the image area using machine learning or distance information generated based on a left image and a right image serving as the captured images obtained from a stereo camera.

5. A vehicle comprising:
the image processing apparatus according to any one of claims 1 to 4; and
a vehicle control unit that performs vehicle control using the estimated position of the tracked object supplied from the position estimation unit.