JP7226368B2

JP7226368B2 - Object state identification device

Info

Publication number: JP7226368B2
Application number: JP2020024569A
Authority: JP
Inventors: 大輔橋本
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2020-02-17
Filing date: 2020-02-17
Publication date: 2023-02-21
Anticipated expiration: 2040-02-17
Also published as: JP2021128705A

Description

The present invention relates to an object state identification device that identifies the state of an object represented in an image.

Techniques for detecting objects represented in sensor information, such as images obtained by a camera, have been studied. In recent years, techniques have been proposed that improve detection accuracy by using machine learning methods such as so-called deep neural networks (hereinafter simply referred to as DNNs) to detect objects.

In addition, techniques have been proposed that use a time series of images, or feature values obtained from those images, as input to a neural network in order to track an object represented in the images or to detect an anomaly (see, for example, Patent Documents 1 and 2).

For example, the object tracking method disclosed in Patent Document 1 inputs two or more images that are consecutive in time series to a neural network. The method compares the features of each of those images, as extracted by the neural network, and checks them for similarity. Based on the result, the method outputs, as an identification result, the identification information and position information of one or more objects appearing in a later image that match one or more tracking-candidate objects appearing in an earlier image. The neural network used includes two or more identical structures, each having one or more fully connected layers and zero or more convolutional layers, and shares parameters between corresponding layers of the identical structures.

The anomaly monitoring system disclosed in Patent Document 2 extracts the portion of a monitored image that has changed, inputs that portion to a convolutional neural network to extract features, and inputs the extracted features to a recurrent neural network to generate an image description that summarizes the image.

Patent Document 1: Japanese Patent Application Laid-Open No. 2018-26108
Patent Document 2: Japanese Patent Application Laid-Open No. 2018-101317

Even with the above techniques, the state of an object represented in an image cannot always be identified accurately.

Accordingly, an object of the present invention is to provide an object state identification device capable of identifying the state of an object represented in an image.

According to one embodiment, an object state identification device is provided. The device includes: an object detection unit that inputs a series of images obtained in time series to a first classifier trained in advance to detect a predetermined object, thereby detecting, for each image in the series, an object region on that image that contains the predetermined object and has a predetermined shape; a region segmentation unit that inputs each image in the series to a second classifier trained in advance to distinguish, pixel by pixel, the set of pixels representing the predetermined object from the set of other pixels, thereby dividing each image into a region of interest, which is the set of pixels representing the predetermined object, and the remaining region; and a state identification unit that inputs, in time-series order, to a third classifier having a recurrent structure those features, among the features obtained from pixel values within the object region detected in each image, that are contained in both the object region and the region of interest, thereby identifying a state of the predetermined object that involves a time-series change in appearance.

The object state identification device according to the present invention has the effect of being able to identify the state of an object represented in an image.

FIG. 1 is a schematic configuration diagram of a vehicle control system in which the object state identification device is implemented.
FIG. 2 is a hardware configuration diagram of an electronic control unit, which is one embodiment of the object state identification device.
FIG. 3 is a functional block diagram of the processor of the electronic control unit relating to vehicle control processing that includes object state identification processing.
FIG. 4 is a diagram showing an example of the configuration of a DNN used as the first classifier.
FIG. 5 is a diagram showing an example of a detection target object and an object region represented on an image.
FIG. 6 is a diagram showing an example of an object region and a region of interest.
FIG. 7 is a timing chart of the processing of each unit related to state identification processing.
FIG. 8 is a diagram showing an example of the detected object list.
FIG. 9 is an operation flowchart of vehicle control processing that includes object state identification processing.

The object state identification device is described below with reference to the drawings. For an object to be detected (hereinafter sometimes called a detection target object) represented in a series of images obtained in time series, the device identifies a state that involves a time-series change in appearance. To do so, the device inputs the series of images to a first classifier trained in advance to detect the detection target object, and thereby detects, for each image, a region on that image that contains the detection target object and has a predetermined shape (hereinafter sometimes called an object region). The device then tracks the detection target objects detected in each image, and thereby associates with one another the object regions that represent the same detection target object across images. The device also inputs the series of images to a second classifier trained in advance to distinguish the set of pixels representing a detection target object from the set of other pixels, and thereby divides each image into a region of interest, which is the set of pixels representing the detection target object, and the remaining region (hereinafter sometimes called a mask region). Finally, among the features obtained from pixel values within the object regions that represent the same detection target object across the series of images, the device inputs those features contained in both the object region and the region of interest, in time-series order, to a third classifier having a recurrent structure, and thereby identifies the state of that detection target object.
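The step of keeping only features that lie in both the object region and the region of interest can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the feature map is taken to be single-channel and at the same resolution as the segmentation mask, positions outside the region of interest are simply set to zero, and the names `masked_region_features` and `build_sequence` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in feature-map coordinates; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def masked_region_features(
    feature_map: Sequence[Sequence[float]],
    roi_mask: Sequence[Sequence[int]],
    box: Box,
) -> List[List[float]]:
    """Crop feature_map to box and zero every position outside the region of interest.

    Assumption: roi_mask has the same height and width as feature_map and holds
    1 for region-of-interest pixels and 0 for mask-region pixels.
    """
    left, top, right, bottom = box
    return [
        [feature_map[y][x] if roi_mask[y][x] else 0.0 for x in range(left, right)]
        for y in range(top, bottom)
    ]


def build_sequence(frames) -> List[List[List[float]]]:
    """Apply the masking to each (feature_map, roi_mask, box) triple in time order.

    The resulting list is what would be fed, one element per time step, to a
    classifier with a recurrent structure (not implemented here).
    """
    return [masked_region_features(fm, mask, box) for fm, mask, box in frames]
```

Zeroing is only one way to exclude mask-region features; the choice is made here for brevity.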

For example, suppose the detection target object is a vehicle. A vehicle blinks its turn signal when turning left or right. It also lights its brake lamps when decelerating and blinks its hazard lamps when, for example, stopping. The lighting or blinking of these signals and lamps involves a time-series change in the vehicle's appearance and indicates a state related to the vehicle's behavior. However, an individual image showing a turn signal, brake lamp, or hazard lamp does not reveal how the lamp changes over time, so it is difficult to determine accurately from individual images whether a turn signal or hazard lamp is blinking, or whether a brake lamp is lit or unlit. The object state identification device therefore inputs, as described above, the features contained in both the object region and the region of interest of each image in the time series to a third classifier having a recurrent structure, which allows it to identify accurately whether a turn signal or hazard lamp is blinking and whether a brake lamp is lit or unlit.
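The following toy sketch illustrates why a time series is needed: a single lit frame is consistent with both a steadily lit lamp and a blinking one, and only the sequence separates them. It is a hand-written rule for illustration only, not the recurrent classifier described in this document; the function name `lamp_state` and the two-transition rule are assumptions.

```python
from typing import Sequence


def lamp_state(on_off_sequence: Sequence[int]) -> str:
    """Classify a non-empty per-frame on/off sequence as 'off', 'on', or 'blinking'.

    Illustrative rule (an assumption): two or more on/off transitions within the
    window count as blinking; otherwise the most recent observation decides.
    """
    transitions = sum(
        1 for prev, curr in zip(on_off_sequence, on_off_sequence[1:]) if prev != curr
    )
    if transitions >= 2:
        return "blinking"
    return "on" if on_off_sequence[-1] else "off"
```

With a window of one frame, `lamp_state([1])` can only answer "on", even if the lamp is in fact blinking.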

An example in which the object state identification device is applied to a vehicle control system is described below. In this example, the device executes object state identification processing on a time series of images obtained by a camera mounted on a vehicle, and thereby detects other vehicles around the vehicle as detection target objects. As states of a detected vehicle that involve a change in appearance, the device identifies whether the left or right turn signal or the hazard lamps are blinking, and whether the brake lamps are lit or unlit.

FIG. 1 is a schematic configuration diagram of a vehicle control system in which the object state identification device is implemented. FIG. 2 is a hardware configuration diagram of an electronic control unit, which is one embodiment of the object state identification device. In this embodiment, the vehicle control system 1, which is mounted on a vehicle 10 and controls the vehicle 10, includes a camera 2 for capturing images of the surroundings of the vehicle 10 and an electronic control unit (ECU) 3, which is an example of the object state identification device. The camera 2 and the ECU 3 are communicably connected via an in-vehicle network 4 that conforms to a standard such as Controller Area Network. The vehicle control system 1 may further include a storage device that stores a map used for automated driving control of the vehicle 10. The vehicle control system 1 may also include a ranging sensor such as a LiDAR or radar; a receiver, such as a GPS receiver, for determining the position of the vehicle 10 in accordance with a satellite positioning system; a wireless terminal for wireless communication with other devices; and a navigation device for searching for a planned travel route of the vehicle 10.

The camera 2 is an example of an imaging unit, which is a sensor for detecting objects within a predetermined detection range. It has a two-dimensional detector composed of an array of photoelectric conversion elements sensitive to visible light, such as a CCD or C-MOS, and imaging optics that form an image of the region to be captured on the two-dimensional detector. The camera 2 is mounted, for example, in the cabin of the vehicle 10 so as to face forward. The camera 2 captures the region ahead of the vehicle 10 at a predetermined capture period (for example, 1/30 second to 1/10 second) and generates an image of that region. The image obtained by the camera 2 is preferably a color image. The vehicle 10 may be provided with multiple cameras that differ in capture direction or focal length.

Each time the camera 2 generates an image, it outputs the generated image to the ECU 3 via the in-vehicle network 4.

The ECU 3 controls the vehicle 10. In this embodiment, the ECU 3 controls the vehicle 10 so that it drives automatically on the basis of objects detected from the time series of images obtained by the camera 2. For this purpose, the ECU 3 has a communication interface 21, a memory 22, and a processor 23.

The communication interface 21 is an example of a communication unit and has an interface circuit for connecting the ECU 3 to the in-vehicle network 4. That is, the communication interface 21 is connected to the camera 2 via the in-vehicle network 4. Each time the communication interface 21 receives an image from the camera 2, it passes the received image to the processor 23.

The memory 22 is an example of a storage unit and has, for example, volatile semiconductor memory and nonvolatile semiconductor memory. When the processor 23 has multiple arithmetic units, as described later, the memory 22 may have a dedicated memory circuit for each arithmetic unit. The memory 22 stores the data and parameters used in the object state identification processing executed by the processor 23 of the ECU 3, for example, images received from the camera 2, the parameters that specify each classifier used in the object state identification processing, and a confidence threshold for each object type. The memory 22 also stores, for a certain period, data generated during the object state identification processing, such as a detected object list holding information about detected objects. The memory 22 may further store information used for travel control of the vehicle 10, such as map information.

The processor 23 is an example of a control unit and has one or more CPUs (Central Processing Units) and their peripheral circuits. The processor 23 may further have other arithmetic circuits, such as a logic unit, a numeric unit, or a graphics processing unit (GPU). While the vehicle 10 is traveling, each time the processor 23 receives an image from the camera 2, it executes vehicle control processing, including object state identification processing, on the received image. The processor 23 then controls the vehicle 10 so that it drives automatically on the basis of the detected objects around the vehicle 10.

FIG. 3 is a functional block diagram of the processor 23 of the ECU 3 relating to vehicle control processing that includes object state identification processing. The processor 23 has an object detection unit 31, a tracking unit 32, a region segmentation unit 33, a state identification unit 34, a driving planning unit 35, and a vehicle control unit 36. These units are, for example, functional modules implemented by a computer program running on the processor 23. Alternatively, they may be dedicated arithmetic circuits provided in the processor 23. Of these units, the object detection unit 31, the tracking unit 32, the region segmentation unit 33, and the state identification unit 34 execute the object state identification processing. When the vehicle 10 is provided with multiple cameras, the processor 23 may execute the object state identification processing for each camera on the basis of the images obtained by that camera.

Each time the object detection unit 31 receives an image from the camera 2, it inputs the latest received image to a first classifier for object detection, and thereby detects a region that contains a detection target object (that is, another vehicle) represented in the image and has a predetermined shape (that is, an object region), and identifies the type of the detection target object.

In this embodiment, the object detection unit 31 uses, as the first classifier, a DNN trained in advance to detect an object region containing a detection target object represented in an image and to identify the type of that object. The DNN can be, for example, one with a convolutional neural network (hereinafter simply CNN) architecture, such as Single Shot MultiBox Detector (SSD) or Faster R-CNN.

FIG. 4 shows an example of the configuration of a DNN used as the first classifier. The DNN 400 has a main trunk 401 on the input side, where the image is input, and a position detection unit 402 and a type estimation unit 403 on the output side of the main trunk 401. Based on the output of the main trunk 401, the position detection unit 402 outputs the bounding rectangle of a detection target object represented in the image as an object region. The shape of the object region is not limited to a rectangle and may be, for example, a circle, an ellipse, or a polygon with five or more sides. Based on the output of the main trunk 401, the type estimation unit 403 calculates a confidence for each type of detection target object represented in the object region detected by the position detection unit 402. The position detection unit 402 and the type estimation unit 403 may be formed as a single unit.

The main trunk 401 can be, for example, a CNN having multiple layers connected in series from the input side to the output side. These layers include two or more convolutional layers. They may also include a pooling layer provided for every one or more convolutional layers, and may include one or more fully connected layers. For example, the main trunk 401 can have the same configuration as the base layers of SSD. Alternatively, the main trunk 401 may be configured in accordance with another CNN architecture such as VGG-19, AlexNet, or Network-In-Network.

When an image is input, the main trunk 401 executes the operations of each layer on the image and outputs a feature map calculated from it. The main trunk 401 may output multiple feature maps of different resolutions. For example, it may output a feature map with the same resolution as the input image and one or more feature maps with lower resolution than the input image.

The feature map output from the main trunk 401 is input to both the position detection unit 402 and the type estimation unit 403. Each of these units can be, for example, a CNN having multiple layers connected in series from the input side to the output side. For each unit, the layers of the CNN include two or more convolutional layers and may include a pooling layer provided for every one or more convolutional layers. The convolutional and pooling layers of the CNN may be shared between the position detection unit 402 and the type estimation unit 403. For each unit, the layers may also include one or more fully connected layers. In that case, the fully connected layers are preferably placed on the output side of the convolutional layers. The output of each convolutional layer may also be input directly to the fully connected layers. The output layer of the type estimation unit 403 may be a softmax layer that calculates the confidence for each type of detection target object in accordance with the softmax function, or a sigmoid layer that calculates the confidence for each type in accordance with the sigmoid function.
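The two output-layer options differ in how the per-type confidences relate to one another. The sketch below, in plain Python with hypothetical raw scores, shows the standard definitions: softmax produces confidences that sum to 1 across types, while sigmoid scores each type independently.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax over raw scores; the maximum is subtracted for numerical stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores: Sequence[float]) -> List[float]:
    """Element-wise logistic sigmoid; each confidence is independent of the others."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


# Hypothetical raw scores for three object types (not taken from the source).
raw_scores = [2.0, 0.5, -1.0]
softmax_confidences = softmax(raw_scores)   # sums to 1 across the types
sigmoid_confidences = sigmoid(raw_scores)   # each value lies in (0, 1) on its own
```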

The position detection unit 402 and the type estimation unit 403 are trained to output, for example, a confidence for each type of detection target object for regions of various positions, sizes, and aspect ratios on the image. Thus, when an image is input, the classifier 400 outputs a confidence for each type of detection target object for each of these regions. The position detection unit 402 and the type estimation unit 403 then detect a region in which the confidence for some type of detection target object is at or above a predetermined confidence threshold as an object region representing a detection target object of that type.
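The thresholding step can be sketched as below. The data layout (a list of candidate boxes, each with a dictionary of per-type confidences) and the name `detect_object_regions` are assumptions made for illustration; when several types pass their thresholds for one region, this sketch keeps the most confident one, which is also an assumption.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def detect_object_regions(
    candidates: List[Tuple[Box, Dict[str, float]]],
    thresholds: Dict[str, float],
) -> List[Tuple[Box, str, float]]:
    """Keep candidate regions whose confidence for some type reaches that type's threshold."""
    detections = []
    for box, confidences in candidates:
        # Types whose confidence is at or above their own threshold.
        passing = {t: c for t, c in confidences.items() if c >= thresholds[t]}
        if passing:
            best_type = max(passing, key=passing.get)
            detections.append((box, best_type, passing[best_type]))
    return detections
```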

Each image (teacher image) in the training data used to train the classifier 400 is tagged with, for example, the type of the detection target object (for example, passenger car, bus, truck, or motorcycle) and the bounding rectangle of the detection target object, which is the object region in which it is represented.

The classifier 400 is trained on a large number of such teacher images in accordance with a training method such as error backpropagation. By using the classifier 400 trained in this way, the processor 23 can accurately detect detection target objects in an image.

The object detection unit 31 may also detect objects, other than other vehicles around the vehicle 10, that affect the travel control of the vehicle 10. Such objects include, for example, people, road signs, traffic lights, road markings such as lane lines, and other objects on the road. In this case, the first classifier need only be trained in advance to detect these objects as well. The object detection unit 31 can then detect them by inputting the image to the first classifier.

The object detection unit 31 may further execute non-maximum suppression (NMS) processing to select one object region from among two or more at least partially overlapping object regions that are estimated to represent the same object.
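A common greedy form of NMS is sketched below for rectangular object regions. The source does not specify the overlap criterion; intersection over union (IoU) with a threshold of 0.5 is an assumption, as are the function names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    detections: List[Tuple[Box, float]], iou_threshold: float = 0.5
) -> List[Tuple[Box, float]]:
    """Greedy NMS: visit detections by descending confidence and keep a detection
    only if it does not overlap an already kept one by more than iou_threshold."""
    kept: List[Tuple[Box, float]] = []
    for box, confidence in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, confidence))
    return kept
```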

The object detection unit 31 registers the position and extent of each object region on the image, and the type of object contained in that region, in the detected object list, and stores the list in the memory 22. For each object region, the object detection unit 31 also stores in the memory 22 the feature map that the main trunk of the first classifier calculates from the pixels in that region and that is output to the state identification unit 34. The feature map output to the state identification unit 34 can have the same resolution as the image input to the first classifier. When a pooling layer or the like in the main trunk of the first classifier produces a feature map with lower resolution than the input image, that lower-resolution feature map may be output to the state identification unit 34. Multiple feature maps of different resolutions calculated by the main trunk of the first classifier may also be output to the state identification unit 34.
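One possible in-memory shape for an entry of the detected object list is sketched below. The field names and the `register_detections` helper are hypothetical; the source states only that the position and extent of each object region and the object type are registered, with an identification number added by the tracking unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DetectedObject:
    """One entry of the detected object list (field names are illustrative)."""
    box: Tuple[int, int, int, int]   # position and extent on the image: left, top, right, bottom
    object_type: str                 # for example "passenger car" or "bus"
    track_id: Optional[int] = None   # identification number; unset until tracking assigns one


def register_detections(
    detected_object_list: List[DetectedObject],
    detections: List[Tuple[Tuple[int, int, int, int], str]],
) -> List[DetectedObject]:
    """Append one entry per (box, object_type) pair and return the list."""
    for box, object_type in detections:
        detected_object_list.append(DetectedObject(box=box, object_type=object_type))
    return detected_object_list
```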

For each object region detected in the latest image, the tracking unit 32 tracks the detection target object represented in that region by referring to the detected object list and associating the object with a detection target object detected in past images. When more than a predetermined number (for example, 5 to 10) of detection target objects are being tracked, the tracking unit 32 selects the predetermined number of them as the objects subject to state identification.

The tracking unit 32 tracks the detection target object represented in an object region by applying optical-flow-based tracking processing, such as the Lucas-Kanade method, to the object region of interest in the latest image and the object regions in past images. For this purpose, the tracking unit 32 extracts multiple feature points from the object region of interest by applying a feature point extraction filter such as SIFT or the Harris operator. The tracking unit 32 then calculates the optical flow by identifying, for each feature point, the corresponding point in the object region of a past image in accordance with the tracking method applied. Alternatively, the tracking unit 32 may track the detection target object by applying another tracking method used for moving objects detected in images to the object region of interest in the latest image and the object regions in past images.

Among the detection target objects detected from the latest image, the tracking unit 32 treats each object that was not associated with any detection target object represented in past images as a new tracking target, assigns it an identification number different from those of the other detection target objects being tracked, and registers the assigned identification number in the detected object list. On the other hand, for each detection target object detected from the latest image that was associated with a detection target object represented in past images, that is, for each detection target object being tracked, the tracking unit 32 associates the same identification number as the one already assigned to that tracked object.
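The identification-number bookkeeping described above can be sketched as follows. This is only an illustration: the function name `assign_ids`, the `matches` dictionary (detection index to existing identification number), and the use of a counter as the source of fresh numbers are assumptions, not details given in the text.

```python
from itertools import count

def assign_ids(matches, num_detections, id_source):
    """Return one identification number per detection in the latest image.

    matches maps a detection index to the identification number of the
    tracked object it was associated with (hypothetical representation).
    """
    ids = []
    for index in range(num_detections):
        if index in matches:
            # Associated with an object from past images: keep its number.
            ids.append(matches[index])
        else:
            # Not associated: register as a new tracking target.
            ids.append(next(id_source))
    return ids

# Assume numbers 1 and 2 are already in use, so new tracks start at 3.
id_source = count(start=3)
print(assign_ids({0: 2, 2: 1}, 3, id_source))
```

Drawing new numbers from a single monotonically increasing source is one simple way to guarantee that a new target never reuses the number of an object still being tracked.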

As described above, when more than the predetermined number of detection target objects are being tracked, the tracking unit 32 selects the predetermined number of detection target objects from among those being tracked as the objects subject to state identification.

For example, the closer a detection target object is to the vehicle 10, the greater its influence on the driving control of the vehicle 10. The tracking unit 32 therefore selects the predetermined number of detection target objects from among those being tracked, in order of proximity to the vehicle 10. For example, the larger the object region representing a detection target object on the image, the shorter the distance from the vehicle 10 to that object is estimated to be. Accordingly, the tracking unit 32 selects, for example, the predetermined number of detection target objects in descending order of object region size on the latest image.
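A minimal sketch of the size-based selection, assuming each object region is an axis-aligned box given as (left, top, right, bottom) in pixels and that "size" means box area. The `Track` class and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    box: tuple  # (left, top, right, bottom) in pixels, assumed layout

def box_area(box):
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)

def select_largest(tracks, limit):
    """Keep at most `limit` tracks, largest object region first."""
    if len(tracks) <= limit:
        return list(tracks)
    ranked = sorted(tracks, key=lambda t: box_area(t.box), reverse=True)
    return ranked[:limit]

tracks = [
    Track(1, (0, 0, 10, 10)),
    Track(2, (0, 0, 50, 40)),
    Track(3, (0, 0, 20, 20)),
]
print([t.track_id for t in select_largest(tracks, 2)])
```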

Alternatively, the tracking unit 32 may select the predetermined number of detection target objects based on the position, on the image, of the lower edge of the object region of each detection target object being tracked. When a detection target object is traveling on the same road as the vehicle 10, the position of the lower edge of its object region on the image is presumed to correspond to the position on the road surface where that object is located. The closer the detection target object is to the vehicle 10, the more downward the direction from the camera 2 to that road-surface position, and hence the closer the lower edge of the object region is to the lower edge of the image. It can therefore be estimated that the closer the lower edge of an object region is to the lower edge of the image, the shorter the distance from the vehicle 10 to the detection target object represented in that object region. Accordingly, the tracking unit 32 may select the predetermined number of detection target objects from among those being tracked in order of how close the lower edge of the object region is to the lower edge of the latest image.

Alternatively, for each detection target object being tracked, the tracking unit 32 may estimate the distance from the vehicle 10 to that object based on the ratio between the size (for example, the width) of the object region representing the object and a reference size, namely the size that a reference object of the same type would have on the image if it were located at a predetermined distance from the vehicle 10. Alternatively, when the vehicle control system 1 includes a ranging sensor (not shown) such as a LiDAR or radar sensor, the distance to each detection target object being tracked may be measured by that sensor. In this case, for example, the distance measured in the direction from the ranging sensor that corresponds to the direction from the camera 2 to the centroid of the object region representing the detection target object on the image is taken as the distance from the vehicle 10 to that object. The tracking unit 32 may then select the predetermined number of detection target objects in ascending order of the estimated or measured distance from the vehicle 10.
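The size-ratio estimate and the nearest-first selection can be sketched as below. The sketch assumes that apparent width is inversely proportional to distance (a pinhole-camera simplification that the text does not spell out); the `REFERENCE` table, its values, and the function names are hypothetical.

```python
# Object type -> (reference distance in metres, reference width in pixels).
# Hypothetical calibration values for illustration only.
REFERENCE = {"car": (20.0, 90.0)}

def estimate_distance(obj_type, width_px):
    """Estimate distance from the ratio of reference width to observed width."""
    ref_distance, ref_width = REFERENCE[obj_type]
    return ref_distance * ref_width / width_px

def select_nearest(tracks, limit):
    """tracks: list of (track_id, obj_type, width_px); nearest objects first."""
    ranked = sorted(tracks, key=lambda t: estimate_distance(t[1], t[2]))
    return [track_id for track_id, _, _ in ranked[:limit]]

tracks = [(1, "car", 45.0), (2, "car", 180.0), (3, "car", 90.0)]
print(select_nearest(tracks, 2))
```

When a ranging sensor is available, the measured distance could be substituted for `estimate_distance` without changing the selection step.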

Alternatively, the tracking unit 32 may select, from among the detection target objects being tracked, a number of detection target objects determined for each lane. For example, the tracking unit 32 selects, from among the detection target objects traveling in the same lane as the vehicle 10, the one estimated to be closest to the vehicle 10. The tracking unit 32 further selects the detection target object estimated to be closest to the vehicle 10 from each of the lanes adjacent on the left and right to the lane in which the vehicle 10 is traveling and from each of the lanes further adjacent to those adjacent lanes (that is, two lanes on each of the left and right sides of the lane in which the vehicle 10 is traveling). In this case, for example, when the object detection unit 31 or a localization processing unit (not shown) has detected lane markings from the latest image, the tracking unit 32 may identify the lane in which each detection target object is traveling based on the positional relationship between the lane markings and the object region. For example, the tracking unit 32 may determine that a detection target object of interest is located in the lane bounded by the two lane markings located on either side of the lower edge of the object region containing that object. The tracking unit 32 may then execute, for each lane, processing similar to the selection described above, thereby selecting the detection target object closest to the vehicle 10 among those traveling in that lane. Note that the tracking unit 32 may select two or more detection target objects per lane in order of proximity to the vehicle 10.
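The per-lane selection can be sketched as follows. Several simplifications are assumed here and are not taken from the text: lane markings are supplied by a hypothetical callable `markings_at(row)` that returns their sorted x-positions on a given image row, the lane is decided from the centre of the lower edge of the object region, and a distance for each object is already available.

```python
from bisect import bisect_right

def lane_index(box, markings_at):
    """Index of the lane whose two markings bracket the lower edge of the box."""
    left, top, right, bottom = box
    marking_xs = markings_at(bottom)          # markings on the row of the lower edge
    centre_x = (left + right) / 2.0
    idx = bisect_right(marking_xs, centre_x) - 1
    if idx < 0 or idx >= len(marking_xs) - 1:
        return None                           # not between any two markings
    return idx

def nearest_per_lane(tracks, markings_at):
    """tracks: list of (track_id, box, distance). Returns {lane: track_id}."""
    best = {}
    for track_id, box, distance in tracks:
        lane = lane_index(box, markings_at)
        if lane is None:
            continue
        if lane not in best or distance < best[lane][1]:
            best[lane] = (track_id, distance)
    return {lane: track_id for lane, (track_id, _) in best.items()}

# Toy example: three markings (two lanes) at fixed x-positions on every row.
markings = lambda row: [100.0, 300.0, 500.0]
tracks = [
    (1, (120, 50, 200, 200), 30.0),
    (2, (150, 80, 260, 260), 12.0),
    (3, (350, 60, 450, 220), 25.0),
    (4, (520, 60, 600, 220), 5.0),   # outside the marked lanes
]
print(nearest_per_lane(tracks, markings))
```

Selecting two or more objects per lane would only require keeping a short sorted list per lane instead of a single best entry.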

According to a modified example, the tracking unit 32 may select all of the detection target objects being tracked as objects subject to state identification.

The tracking unit 32 notifies the state identification unit 34 of the identification numbers of the detection target objects subject to state identification. The tracking unit 32 also updates, based on the result of that selection, the value of the index in the detected object list that indicates whether each detection target object is subject to state identification.

Each time an image is received from the camera 2, the region dividing unit 33 inputs the image to a second classifier that has been trained in advance to distinguish the set of pixels representing an object to be detected from the set of other pixels, thereby dividing the image into a region of interest, which is the set of pixels representing a detection target object, and a mask region, which is the remainder.

As described for the object detection unit 31, the object region is set as the circumscribed rectangle of the detection target object or as a region of a predetermined shape. However, the outline of a detection target object as seen from the vehicle 10 is not necessarily rectangular. In addition, things other than the detection target object may be visible from the vehicle 10 through a transparent part of the detection target object (for example, a rear window).

FIG. 5 is a diagram showing an example of detection target objects and object regions represented on an image. The image 500 shows a plurality of vehicles, which are examples of detection target objects, and each vehicle is detected as a detection target object. Among these, the vehicles 502 and 503, which are located ahead of the vehicle 501, appear to overlap the vehicle 501, so that the object region 511 set for the vehicle 501 contains part of each of the vehicles 502 and 503. In particular, in this example, the object region 511 contains the brake lamps and turn signals of the vehicles 502 and 503. Consequently, if all of the features in the object region 511 were used to identify the state of the vehicle 501, the state of the vehicle 501 might not be identified accurately because of the influence of the lighting of the brake lamps or the blinking of the turn signals of the vehicle 502 or the vehicle 503.

Similarly, in the image 500, part of the vehicle 505, which is located ahead of the vehicle 504, is visible through the front and rear windows of the vehicle 504. The object region 514 set for the vehicle 504 therefore contains part of the vehicle 505. Consequently, if all of the features in the object region 514 were used to identify the state of the vehicle 504, the state of the vehicle 504 might not be identified accurately because of the influence of, for example, the lighting of the brake lamps or the blinking of the turn signals of the vehicle 505.

The region dividing unit 33 therefore divides each image into a region of interest and a mask region. Dividing each image in this way allows the state identification unit 34, described later, to replace or attenuate features that represent things other than the detection target object.

For example, the region dividing unit 33 can use, as the second classifier, a DNN having a CNN-type architecture such as a Fully Convolutional Network (FCN), SegNet, or U-Net.

Alternatively, the region dividing unit 33 may use, as the second classifier, a DNN for instance segmentation, which can divide an image into regions for each set of pixels representing a different object (that is, for each instance) even when the objects are of the same type. As such a DNN for instance segmentation, the region dividing unit 33 can use, for example, Mask-RCNN or Instance FCN. By using a classifier for instance segmentation as the second classifier, the region dividing unit 33 can set a region of interest for each detection target object even when a plurality of detection target objects of the same type are represented on the image and partially overlap one another. As a result, even if, for example, the brake lamps or turn signals of a vehicle other than the vehicle of interest are contained in the object region of the vehicle of interest, they are kept from affecting the state identification result for the vehicle of interest (for example, whether the brake lamps are lit, or whether the left or right turn signal is blinking).

The region dividing unit 33 may also use, as the second classifier, a classifier for semantic segmentation based on a technique other than a neural network, for example, one based on a random forest.

The second classifier preferably divides each image into a region of interest and a mask region on a pixel-by-pixel basis. This allows the shape of the region of interest to represent the shape of the detection target object more accurately, making it less likely that features representing things other than the detection target object are input to the third classifier, which is used to identify the state of the detection target object. However, the second classifier may instead divide each image into a region of interest and a mask region in units of pixel groups smaller than the minimum assumed object region size (for example, in units of 2×2 pixels or 4×4 pixels). This reduces the amount of computation performed by the second classifier.

The information representing the region division result is expressed, for example, as a bitmap that has the same size as the image obtained by the camera 2 and holds a different value for each region.

The region dividing unit 33 passes the information representing the region division result to the state identification unit 34.

For each detection target object subject to state identification among those being tracked, each time an image is obtained from the camera 2, the state identification unit 34 refers to the information representing the region division result and extracts, from among the features obtained from the pixel values in the object region containing that object, the features contained in both the object region and the region of interest. The state identification unit 34 then inputs the extracted features to a third classifier having a recurrent structure, thereby identifying the state of the detection target object, which involves changes in appearance over time.

As the features obtained from the pixel values in the object region representing a detection target object, the state identification unit 34 can use, for example, those features of the feature map calculated by the main part of the first classifier that are contained in that object region. This makes it possible to use for state identification not only the features of the detection target object itself but also the features of its surroundings. In the present embodiment, the third classifier can identify the state of a vehicle that is a detection target object while also taking into account the effect of the relative positional relationship between that vehicle and other vehicles, for example, a situation in which part of the turn signal or the like of that vehicle is hidden by another vehicle. For example, when the resolution of the feature map is the same as that of the image input to the first classifier, the features contained in the region of the feature map corresponding to the object region on the image are the features obtained from the pixel values in the object region. When the resolution of the feature map is lower than that of the image input to the first classifier, the region on the feature map corresponding to the object region is given by the position and range obtained by correcting the coordinates of the object region according to the ratio of the feature map resolution to the input image resolution. For example, suppose that the upper-left and lower-right corners of the object region on the input image are (tlX, tlY) and (brX, brY), respectively, and that the feature map is calculated by downsizing the input image to 1/N (where N is an integer of 2 or more). In this case, the upper-left and lower-right corners of the region on the feature map corresponding to the object region on the image are (tlX/N, tlY/N) and (brX/N, brY/N), respectively.
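The coordinate correction can be illustrated with a short sketch. The text gives only the division by N; rounding down to integer feature-map indices is an assumption made here, and the function name is hypothetical.

```python
def to_feature_map_box(box, n):
    """Map an object region on the input image onto a 1/n-resolution feature map.

    box is (tlX, tlY, brX, brY) in input-image pixels. Flooring to integer
    indices is an assumption; the text only specifies division by N.
    """
    tl_x, tl_y, br_x, br_y = box
    return (tl_x // n, tl_y // n, br_x // n, br_y // n)

# Example: a 1/8-resolution feature map.
print(to_feature_map_box((64, 32, 191, 128), 8))
```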

Further, for each detection target object subject to state identification, the state identification unit 34 treats, within the object region containing that object, the region of interest that the information representing the region division result indicates for an object of the same type as that detection target object as the region of interest within the object region, and treats the rest of the object region as the mask region. When a DNN for instance segmentation is used as the second classifier and a region of interest is set individually for each object, the state identification unit 34 takes, among the objects of the same type as the detection target object contained in the object region, the region of interest of the object that occupies the largest area within that object region in the information representing the region division result as the region of interest within the object region.

FIG. 6 is a diagram showing an example of an object region and a region of interest. In the region division result 610 corresponding to the object region 600 shown in FIG. 6, the area is divided into a region of interest 611, which is the set of pixels representing the detection target object 601 contained in the object region 600, and a mask region 612, which is the set of pixels other than the region of interest 611. Accordingly, the region 620, which is the intersection of the object region 600 and the region of interest 611, is the target of feature extraction.

So that features contained in the mask region do not affect identification of the state of the detection target object, the state identification unit 34 replaces with 0 those features in the object region that are contained in the mask region, or attenuates them by multiplying them by a coefficient less than 1. In this way, the features contained in both the object region and the region of interest are extracted.
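A minimal sketch of the replacement and attenuation step, assuming the features cropped to the object region and the region-of-interest flags are equally sized two-dimensional lists. The function name and the `factor` parameter are hypothetical; `factor=0.0` corresponds to replacement with 0, and a value between 0 and 1 corresponds to attenuation.

```python
def suppress_masked(features, roi_mask, factor=0.0):
    """Zero out or attenuate features that fall in the mask region.

    features: 2-D list of feature values inside the object region.
    roi_mask: 2-D list of booleans, True where the region of interest is.
    """
    return [
        [value if keep else value * factor for value, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(features, roi_mask)
    ]

features = [[1.0, 2.0], [3.0, 4.0]]
roi_mask = [[True, False], [False, True]]
print(suppress_masked(features, roi_mask))        # replacement with 0
print(suppress_masked(features, roi_mask, 0.5))   # attenuation
```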

According to a modified example, the state identification unit 34 may use the values of the pixels in the object region representing the detection target object on the image input to the first classifier, as they are, as the features that are obtained from the pixel values in the object region and input to the third classifier. Alternatively, the state identification unit 34 may use, as those features, values obtained by applying predetermined filter processing, such as a convolution operation, to each pixel in the object region. In these cases as well, so that features contained in the mask region do not affect identification of the state of the detection target object, the state identification unit 34 replaces with 0 those pixel values or filtered values in the object region that are contained in the mask region, or attenuates them by multiplying them by a coefficient less than 1.

For each object region, the state identification unit 34 resizes the extracted features to a predetermined size (for example, 32×32) by downsampling or upsampling. Thus, even if the relative distance between the vehicle 10 and the detection target object changes during tracking and the size of the detection target object on the image changes accordingly, the third classifier can treat the input features as having a constant size, which simplifies the configuration of the third classifier.
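The resizing step can be sketched as below. The text does not say which interpolation is used; nearest-neighbour sampling is assumed here purely for brevity, and the function name is hypothetical.

```python
def resize_nearest(features, out_h, out_w):
    """Resize a 2-D feature grid to out_h x out_w by nearest-neighbour sampling.

    Works for both upsampling and downsampling. The interpolation method is
    an assumption; the text only states that features are resized.
    """
    in_h, in_w = len(features), len(features[0])
    return [
        [features[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

small = [[1, 2], [3, 4]]
fixed = resize_nearest(small, 32, 32)   # upsample to the 32x32 example size
print(len(fixed), len(fixed[0]))
```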

As the third classifier having a recurrent structure, the state identification unit 34 can use a neural network having a recurrent structure, such as a Recurrent Neural Network (RNN), Long Short Term Memory (LSTM), or Gated Recurrent Unit (GRU). Because the third classifier only needs to process the features contained in both the object region and the region of interest, its input layer and intermediate layers can be smaller than those of the first classifier, and fewer parameters, such as weight coefficients, are needed to define it. The third classifier therefore requires less computation than the first classifier and places a smaller computational load on the processor 23. The amount of computation required to train the third classifier is also reduced. When the first, second, and third classifiers are each configured as a neural network, these neural networks may be trained jointly by error backpropagation using common training data. During this training, the weight coefficients of the kernels of the convolution layers of the second classifier may be fixed. The classifiers are thereby trained so that the features input to the third classifier include not only the features of the detection target object itself but also those features of its surroundings that are needed to improve the accuracy of state identification, while excluding features that would lower that accuracy.

Because the third classifier has a recurrent structure, it updates its internal state each time features are input to it sequentially in time series. The third classifier can thereby identify the state of a tracked detection target object based on time-series changes in its appearance. In the present embodiment, as described above, the state identification unit 34 identifies, as the state of a detection target object (that is, another vehicle around the vehicle 10), whether the left or right turn signal or the hazard lamps are blinking and whether the brake lamps are lit or unlit. For this purpose, a sigmoid function, for example, is used as the activation function of the output layer of the third classifier. This allows the third classifier to output a confidence score for each state. The state identification unit 34 then compares the confidence score of each state with the corresponding threshold and determines that the detection target object is in each state whose confidence score is equal to or greater than the corresponding threshold. For example, suppose that the confidence score for the state in which the left turn signal of the detection target object is blinking is 0.8, while the confidence score for the state in which the left turn signal is not blinking is 0.2. If the threshold is 0.5, the state identification unit 34 determines that the detection target object is in the state in which the left turn signal is blinking.
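The per-state thresholding on sigmoid outputs can be sketched as follows. The state names, the dictionary layout, and the logit values are hypothetical; the logit for the left turn signal is chosen so that its confidence score comes out at about 0.8, matching the example above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def detect_states(logits, thresholds):
    """Return the states whose confidence score reaches its threshold.

    logits and thresholds are dicts keyed by (hypothetical) state names.
    Each state is judged independently, so several can be active at once.
    """
    confidences = {name: sigmoid(z) for name, z in logits.items()}
    active = [name for name, c in confidences.items() if c >= thresholds[name]]
    return active, confidences

logits = {"left_turn_signal": math.log(4.0), "brake_lamp": -2.0}
thresholds = {"left_turn_signal": 0.5, "brake_lamp": 0.5}
active, confidences = detect_states(logits, thresholds)
print(active, round(confidences["left_turn_signal"], 3))
```

Because each output has its own sigmoid and threshold, this formulation can report, for instance, a blinking turn signal and lit brake lamps at the same time.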

Alternatively, a softmax function may be used as the activation function of the output layer of the third classifier. In this case, the third classifier outputs, as the state of the detection target object, a determination result indicating that the left turn signal is blinking, that the right turn signal is blinking, that the hazard lamps are blinking, that the brake lamps are lit, or that none of these applies. The state identification unit 34 may therefore take the state represented by the determination result output from the third classifier as the state of the detection target object.
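With a softmax output layer the states become mutually exclusive, which can be sketched as below. The state labels, their order, and the example logits are hypothetical; taking the highest-probability class as the determination result is the usual reading of a softmax output and is assumed here.

```python
import math

# Hypothetical labels for the five mutually exclusive outcomes.
STATES = ["left_turn_signal", "right_turn_signal", "hazard_lamps", "brake_lamp", "none"]

def softmax(logits):
    peak = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return the single most probable state and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STATES[best], probs[best]

print(classify([0.1, 2.5, 0.3, -1.0, 0.0])[0])
```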

The state identification unit 34 registers the state identification result for each detection target object subject to state identification in the detected object list and notifies the operation planning unit 35 of the result.

FIG. 7 is a timing chart of the processing performed by the units involved in the state identification processing. The processing of each unit of the processor 23 is managed, for example, by a scheduler (not shown) running on the processor 23 and is executed in accordance with the timing chart shown in FIG. 7. In FIG. 7, the horizontal axis represents time. Each block indicates that the processing named in the block is executed, and each arrow indicates the transfer of data (images, region division results, features, and so on) between processes. For example, when the ECU 3 receives an image from the camera 2 at time t1, the GPU of the processor 23 executes, in parallel, the detection processing by the object detection unit 31 and the region division processing by the region dividing unit 33 on that image. Preprocessing such as contrast correction or color conversion may be applied to the image before the detection processing and the region division processing are performed.

Once the detection processing has been performed, the CPU of the processor 23 performs post-processing of object detection, such as registering the types and object regions of the detected objects in the detected object list, and then the tracking unit 32 executes the tracking processing. After the tracking processing, the state identification unit 34 executes, for each object region, extraction of the features to be input to the third classifier, resizing of the extracted features, and state identification processing using the third classifier. As described above, the amount of computation performed by the third classifier is relatively small, so the computation time required for the state identification processing of each detection target object is short. The resulting state identification results for the detection target objects are then used in the processing of the operation planning unit 35 and the vehicle control unit 36. To minimize the cost of switching tasks between CPU processing and GPU processing and the amount of memory transfer, it is preferable that the region division processing, the feature extraction processing for each detection target object, the state identification processing, and the processing of reading out the state identification results be executed together as batch processing.

FIG. 8 is a diagram showing an example of the detected object list. For each detection target object being tracked, the detected object list 800 stores an index indicating whether the object is a target of state identification, an identification number assigned to the object, a pointer indicating the address in the memory 22 where information about the object is stored, and the number of times the state identification unit 34 has identified its state (that is, the number of times features in the corresponding object region have been input to the third classifier). The detected object list 800 also stores, for each detection target object being tracked, information (not shown) representing the position and extent of the object region and information (not shown) representing the type of the detection target object. In addition, the storage area 801 in the memory 22 indicated by the pointer for each detection target object stores the features input to the third classifier for the latest image, the intermediate-layer state of the third classifier, the output of the third classifier, and so on.
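For illustration only, the detected object list described above could be modeled as follows. This is a minimal sketch, not the data layout of the embodiment: the class and field names (ObjectRecord, DetectedObjectEntry, DetectedObjectList and their members) are hypothetical, and the pointer into the memory 22 is represented by an ordinary object reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ObjectRecord:
    """Stand-in for storage area 801 (all field names are hypothetical)."""
    features: Optional[Any] = None      # features last input to the third classifier
    hidden_state: Optional[Any] = None  # intermediate-layer state of the third classifier
    output: Optional[Any] = None        # last output of the third classifier


@dataclass
class DetectedObjectEntry:
    """One row of the detected object list."""
    object_id: int                      # identification number assigned to the object
    is_state_target: bool               # whether the object is a state identification target
    region: Tuple[int, int, int, int]   # object region as (left, top, right, bottom)
    object_type: str                    # type of the detected object
    identification_count: int = 0       # times features were input to the third classifier
    record: ObjectRecord = field(default_factory=ObjectRecord)  # plays the role of the pointer


class DetectedObjectList:
    def __init__(self) -> None:
        self._entries: Dict[int, DetectedObjectEntry] = {}

    def register(self, entry: DetectedObjectEntry) -> None:
        """Add or overwrite the entry for a tracked object."""
        self._entries[entry.object_id] = entry

    def state_targets(self) -> List[DetectedObjectEntry]:
        """Return the entries currently selected for state identification."""
        return [e for e in self._entries.values() if e.is_state_target]

    def record_identification(self, object_id: int, features: Any,
                              hidden_state: Any, output: Any) -> None:
        """Store the latest classifier data and count one more identification."""
        entry = self._entries[object_id]
        entry.record = ObjectRecord(features, hidden_state, output)
        entry.identification_count += 1
```

Keeping the recurrent state next to each entry is what would let the third classifier resume from the previous image for the same tracked object.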

The driving planning unit 35 refers to the detected object list and generates one or more planned travel routes (trajectories) for the vehicle 10 such that the vehicle 10 does not collide with objects around it. A planned travel route is represented, for example, as a set of target positions of the vehicle 10 at each time from the current time to a predetermined time ahead. For example, the driving planning unit 35 refers to the detected object list and executes viewpoint conversion processing using information such as the mounting position of the camera 2 on the vehicle 10, thereby converting the in-image coordinates of each object in the detected object list into coordinates on a bird's-eye image (bird's-eye coordinates). The driving planning unit 35 then tracks the objects registered in the detected object list by executing tracking processing using a Kalman filter, a particle filter, or the like on the series of bird's-eye coordinates, and, from the trajectory obtained as the tracking result, estimates a predicted trajectory of each object up to a predetermined time ahead. In doing so, the driving planning unit 35 uses the state identification result of each detection target object to estimate the predicted trajectory. For example, when the state of a detection target object of interest is that its left turn signal is blinking, that object is highly likely to change lanes to the left or to turn left. The driving planning unit 35 therefore estimates, for that object, a predicted trajectory in which it changes lanes to the left or turns left. When the state of the detection target object of interest is that its brake lamps are lit or that its hazard lamps are blinking, that object is highly likely to decelerate. The driving planning unit 35 therefore estimates, for that object, a predicted trajectory in which it decelerates relative to the present. Furthermore, when the state of the detection target object of interest is that neither the left and right turn signals nor the hazard lamps are blinking and the brake lamps are off, that object is highly likely to go straight without decelerating. The driving planning unit 35 therefore estimates, for that object, a predicted trajectory in which it goes straight without decelerating.
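The way an identified state can bias the predicted trajectory may be sketched as follows. This is a simplified illustration, not the estimation method of the embodiment: it replaces the Kalman filter or particle filter with a constant-velocity rollout, and the names LightState and predict_trajectory, the lateral drift speed, and the deceleration value are all hypothetical. The right turn signal is handled symmetrically to the left one as an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LightState:
    """Identified lamp state of another vehicle (hypothetical representation)."""
    left_turn: bool = False
    right_turn: bool = False
    hazard: bool = False
    brake: bool = False


def predict_trajectory(x: float, y: float, speed: float, state: LightState,
                       horizon_s: float = 3.0, dt: float = 0.5,
                       lateral_speed: float = 1.0,
                       decel: float = 2.0) -> List[Tuple[float, float]]:
    """Roll out bird's-eye positions (x: lateral, positive to the left; y: forward).

    lateral_speed [m/s] and decel [m/s^2] are illustrative assumptions.
    """
    # Brake lamps lit or hazard lamps blinking: the object is likely to decelerate.
    accel = -decel if (state.brake or state.hazard) else 0.0
    # A single blinking turn signal: the object is likely to move toward that side.
    if state.hazard:
        lateral = 0.0
    elif state.left_turn:
        lateral = lateral_speed
    elif state.right_turn:
        lateral = -lateral_speed
    else:
        lateral = 0.0

    points = []
    v = speed
    for _ in range(int(round(horizon_s / dt))):
        v = max(0.0, v + accel * dt)  # never predict reverse motion
        y += v * dt
        x += lateral * dt
        points.append((x, y))
    return points
```

With all lamps off the rollout is a straight, constant-speed path, which corresponds to the "go straight without decelerating" case in the text.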

Based on the predicted trajectory of each object being tracked and on the position, speed, and attitude of the vehicle 10, the driving planning unit 35 generates the planned travel route of the vehicle 10 such that, for every object being tracked, the predicted distance between that object and the vehicle 10 remains at or above a predetermined distance up to a predetermined time ahead. The driving planning unit 35 can estimate the position, speed, and attitude of the vehicle 10 based on, for example, current position information representing the current position of the vehicle 10 obtained from a GPS receiver (not shown) mounted on the vehicle 10. Alternatively, each time an image is obtained by the camera 2, a localization processing unit (not shown) may detect the lane markings to the left and right of the vehicle 10 in that image and match the detected lane markings against the map information stored in the memory 22, thereby estimating the position, speed, and attitude of the vehicle 10. Furthermore, the driving planning unit 35 may, for example, refer to the current position information of the vehicle 10 and the map information stored in the memory 22 to check the number of lanes in which the vehicle 10 can travel. When there are a plurality of such lanes, the driving planning unit 35 may generate the planned travel route so as to change the lane in which the vehicle 10 travels.
The driving planning unit 35 may also generate a plurality of planned travel routes. In this case, the driving planning unit 35 may select, from among the plurality of planned travel routes, the route that minimizes the sum of the absolute values of the acceleration of the vehicle 10.
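The distance constraint and the route selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions: routes and predicted trajectories are lists of positions sampled at the same instants, acceleration is approximated by the second finite difference of the target positions, and the function names (is_safe, total_abs_acceleration, select_route) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def is_safe(route: Sequence[Point], predicted: Sequence[Sequence[Point]],
            min_distance: float) -> bool:
    """True if the route stays at least min_distance from every predicted trajectory."""
    for trajectory in predicted:
        for (rx, ry), (ox, oy) in zip(route, trajectory):
            if math.hypot(rx - ox, ry - oy) < min_distance:
                return False
    return True


def total_abs_acceleration(route: Sequence[Point], dt: float) -> float:
    """Sum of acceleration magnitudes, using second differences of the positions."""
    total = 0.0
    for p0, p1, p2 in zip(route, route[1:], route[2:]):
        ax = (p2[0] - 2.0 * p1[0] + p0[0]) / (dt * dt)
        ay = (p2[1] - 2.0 * p1[1] + p0[1]) / (dt * dt)
        total += math.hypot(ax, ay)
    return total


def select_route(candidates: Sequence[List[Point]],
                 predicted: Sequence[Sequence[Point]],
                 min_distance: float, dt: float) -> Optional[List[Point]]:
    """Pick the safe candidate with the smallest summed acceleration, if any."""
    safe = [r for r in candidates if is_safe(r, predicted, min_distance)]
    if not safe:
        return None
    return min(safe, key=lambda r: total_abs_acceleration(r, dt))
```

Returning None when no candidate is safe is a choice made for this sketch; the text does not say how that case is handled.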

The driving planning unit 35 notifies the vehicle control unit 36 of the generated planned travel route.

The vehicle control unit 36 controls each part of the vehicle 10 so that the vehicle 10 travels along the notified planned travel route. For example, the vehicle control unit 36 determines the acceleration of the vehicle 10 according to the notified planned travel route and the current speed of the vehicle 10 measured by a vehicle speed sensor (not shown), and sets the accelerator opening or the brake amount so as to achieve that acceleration. The vehicle control unit 36 then determines the fuel injection amount according to the set accelerator opening and outputs a control signal corresponding to that fuel injection amount to the fuel injection device of the engine of the vehicle 10. Alternatively, the vehicle control unit 36 outputs a control signal corresponding to the set brake amount to the brake of the vehicle 10.

Furthermore, when the course of the vehicle 10 has to be changed so that the vehicle 10 travels along the planned travel route, the vehicle control unit 36 determines the steering angle of the vehicle 10 according to the planned travel route and outputs a control signal corresponding to that steering angle to an actuator (not shown) that controls the steered wheels of the vehicle 10.
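As a rough illustration of the longitudinal part of this control, the following sketch converts a target speed taken from the planned travel route and the measured current speed into a normalized accelerator opening or brake amount. The function name longitudinal_command, the linear mapping, and the limits max_accel and max_decel are assumptions made for the example; the text does not specify how the opening or brake amount is computed.

```python
from typing import Dict


def longitudinal_command(target_speed: float, current_speed: float, dt: float,
                         max_accel: float = 3.0,
                         max_decel: float = 6.0) -> Dict[str, float]:
    """Return normalized accelerator and brake commands in the range [0, 1].

    target_speed and current_speed are in m/s; dt is the time in seconds in
    which the target speed should be reached. max_accel and max_decel (m/s^2)
    are hypothetical vehicle limits used only for normalization.
    """
    required_accel = (target_speed - current_speed) / dt
    if required_accel >= 0.0:
        # Speed up (or hold speed): use the accelerator only.
        return {"accelerator": min(required_accel / max_accel, 1.0), "brake": 0.0}
    # Slow down: use the brake only.
    return {"accelerator": 0.0, "brake": min(-required_accel / max_decel, 1.0)}
```

For example, a target of 12 m/s from a current speed of 10 m/s over one second yields an accelerator command of about 0.67 and no braking under these assumed limits.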

FIG. 9 is an operation flowchart of the vehicle control processing, including the object state identification processing, executed by the processor 23. Each time the processor 23 receives an image from the camera 2, it executes the vehicle control processing according to the operation flowchart shown in FIG. 9. In the operation flowchart described below, the processing of steps S101 to S106 corresponds to the object state identification processing.

The object detection unit 31 of the processor 23 inputs the latest image obtained from the camera 2 to the first classifier and detects the detection target objects represented in that image. That is, the object detection unit 31 detects, on the image, object regions of a predetermined shape that contain the detection target objects (step S101). The object detection unit 31 also identifies the type of each detection target object and registers the detected objects in the detected object list.

For each object region containing a detection target object in the latest image, the tracking unit 32 of the processor 23 tracks the detection target object represented in that object region, based on that object region and the object regions in past images (step S102). The tracking unit 32 then selects, from among the detection target objects being tracked, a predetermined number of detection target objects as targets of state identification (step S103).

The region division unit 33 of the processor 23 inputs the latest image to the second classifier and divides the image into regions of interest, in which the individual detection target objects are represented, and the remaining mask region (step S104).

For each detection target object selected as a target of state identification, the state identification unit 34 of the processor 23 extracts, from among the features obtained from the pixel values in the object region representing that object, the features contained in both the object region and the region of interest (step S105). The state identification unit 34 then identifies the state of each such detection target object by inputting the extracted features to the third classifier, which has a recurrent structure (step S106).

The driving planning unit 35 of the processor 23 refers to the detected object list and generates the planned travel route of the vehicle 10 such that, for each detection target object registered in the list, the vehicle 10 stays at least a predetermined distance away from the predicted trajectory of that object, which is estimated with reference to the state identification result (step S107). The vehicle control unit 36 of the processor 23 then controls the vehicle 10 so that it travels along the planned travel route (step S108). The processor 23 then ends the vehicle control processing.
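Steps S105 and S106 above can be illustrated with the following toy sketch. It is not the classifier of the embodiment: features outside the region of interest are simply set to zero (one possible way of keeping only the features contained in both regions), and the recurrent third classifier is replaced by a single scalar tanh cell with hypothetical weights w_in and w_rec, merely to show how a state is carried from one image to the next.

```python
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def extract_masked_features(feature_map: Grid, interest_mask: Mask,
                            region: Region) -> List[List[float]]:
    """Crop the object region and zero every position outside the region of interest."""
    left, top, right, bottom = region
    return [[feature_map[r][c] if interest_mask[r][c] else 0.0
             for c in range(left, right)]
            for r in range(top, bottom)]


def recurrent_step(hidden: float, features: Sequence[Sequence[float]],
                   w_in: float = 1.0, w_rec: float = 0.5) -> float:
    """Toy recurrent update: new state depends on the input and the previous state."""
    flat = [v for row in features for v in row]
    mean = sum(flat) / len(flat) if flat else 0.0
    return math.tanh(w_in * mean + w_rec * hidden)


def run_sequence(frames: Sequence[Tuple[Grid, Mask, Region]]) -> float:
    """Feed the masked features of each frame to the recurrent cell in chronological order."""
    hidden = 0.0
    for feature_map, interest_mask, region in frames:
        hidden = recurrent_step(
            hidden, extract_masked_features(feature_map, interest_mask, region))
    return hidden
```

Because the mask removes everything that does not belong to the tracked object, changes in the background or in neighboring objects do not alter the sequence seen by the recurrent cell.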

As described above, this object state identification device inputs each of a series of images obtained in time series to the first classifier and thereby detects, in each of the images, an object region of a predetermined shape containing a detection target object. The device also inputs each image to the second classifier to divide it into a set of pixels representing the detection target object and a set of pixels representing everything else, and uses this region division result to extract, from among the features in the object region, the features contained in the region of interest in which the detection target object is represented. The device then identifies the state of the detection target object by inputting the extracted features, in chronological order, to the third classifier, which has a recurrent structure. In this way, the device can capture the time-series change in the appearance of the detection target object in the images as a time-series change in the features used for state identification. Furthermore, the device can prevent information in the image other than the detection target object from affecting the identification of that object's state. The device can therefore identify the state of the detection target object accurately. In addition, because the device uses the first classifier, which detects objects in individual images, to extract from each of the time-series images the features to be input to the third classifier, the total amount of computation is smaller than when the entire image is input to a classifier with a recurrent structure to identify the state of the object. Moreover, still images suffice for training the first and second classifiers. Training the third classifier does require moving images, but the individual images in those moving images may be smaller than the images used to train the first and second classifiers. The device therefore reduces the cost of training each classifier (for example, the cost of annotating training images and the cost of collecting them) as well as the amount of computation and the time required for training.

According to a modification, the object detection unit 31 may detect detection target objects in an image using a classifier other than a DNN. For example, the object detection unit 31 may use, as the first classifier, a support vector machine (SVM) trained in advance to take as input a feature amount (for example, HOG) calculated from a window set on the image and to output a confidence that an object to be detected is represented in that window. While varying the position, size, and aspect ratio of the window set on the image, the object detection unit 31 calculates the feature amount from the window and inputs it to the SVM to obtain the confidence for that window. The object detection unit 31 then determines that a detection target object is represented in any window for which the confidence for some type of detection target object is equal to or greater than a predetermined confidence threshold, and treats that window as an object region. An SVM may be prepared for each type of object to be detected. In that case, the object detection unit 31 calculates a confidence for each object type by inputting the feature amount calculated from each window to each of the SVMs. In this modification, the features of the object region that are input to the third classifier of the state identification unit 34 can be the feature amount, such as HOG, that was extracted from the window determined to represent the detection target object (that is, the object region) and input to the SVM.
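The window scan described in this modification can be sketched as follows. The sketch is deliberately library-free: feature_fn stands in for a HOG-like feature extractor and each entry of classifiers stands in for a trained per-type SVM that returns a confidence. Both are hypothetical callables supplied by the caller, and the function names sliding_windows and detect are likewise assumptions.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Window = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
Feature = Sequence[float]


def sliding_windows(width: int, height: int, sizes: Sequence[Tuple[int, int]],
                    stride: int) -> Iterator[Window]:
    """Yield windows of every (w, h) in sizes at every stride-aligned position."""
    for w, h in sizes:
        for top in range(0, height - h + 1, stride):
            for left in range(0, width - w + 1, stride):
                yield (left, top, left + w, top + h)


def detect(image: Image, sizes: Sequence[Tuple[int, int]], stride: int,
           feature_fn: Callable[[Image, Window], Feature],
           classifiers: Dict[str, Callable[[Feature], float]],
           threshold: float) -> List[Tuple[str, Window, float, Feature]]:
    """Return (type, window, confidence, feature) for each window at or above threshold."""
    height, width = len(image), len(image[0])
    detections = []
    for window in sliding_windows(width, height, sizes, stride):
        feature = feature_fn(image, window)          # e.g. a HOG descriptor
        for object_type, score_fn in classifiers.items():
            confidence = score_fn(feature)           # e.g. an SVM decision value
            if confidence >= threshold:
                # Keep the feature so it can later be input to the third classifier.
                detections.append((object_type, window, confidence, feature))
    return detections
```

Passing several (w, h) pairs in sizes covers the variation of window size and aspect ratio mentioned in the text; a practical detector would usually also merge overlapping windows, which is outside the scope of this sketch.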

The object state identification device according to the above embodiment or modifications may be implemented in equipment other than in-vehicle equipment. For example, the object state identification device may be configured to detect an object in images generated by a surveillance camera installed to capture a predetermined outdoor or indoor area at predetermined intervals, and to identify the state of the detected object. When an object is detected over a certain period, the object state identification device may cause a display connected to it to show a message indicating that the object has been detected and the result of identifying the state of that object.

A computer program that implements the functions of each unit of the processor 23 of the object state identification device according to the above embodiment or modifications may be provided in a form recorded on a computer-readable portable recording medium such as a semiconductor memory, a magnetic recording medium, or an optical recording medium.

As described above, those skilled in the art can make various modifications within the scope of the present invention to suit the form in which it is implemented.

1 Vehicle control system
2 Camera
3 Electronic control unit (object state identification device)
4 In-vehicle network
21 Communication interface
22 Memory
23 Processor
31 Object detection unit
32 Tracking unit
33 Region division unit
34 State identification unit
35 Driving planning unit
36 Vehicle control unit

Claims

An object state identification device comprising:
an object detection unit that, by inputting a series of images obtained in time series to a first classifier trained in advance to detect a predetermined object, detects, in each of the series of images, an object region of a predetermined shape containing the object on that image;
a region division unit that, by inputting each of the series of images to a second classifier trained in advance to distinguish the set of pixels representing the object from the set of all other pixels, divides each of the series of images into a region of interest, which is the set of pixels representing the object, and the remaining region; and
a state identification unit that identifies a state of the object accompanied by a time-series change in appearance by inputting, in chronological order, to a third classifier having a recurrent structure, those features which, among the features obtained from the pixel values in the object region detected in each of the series of images, are contained in both the object region and the region of interest.