JP2005311691A

JP2005311691A - Apparatus and method for detecting object

Info

Publication number: JP2005311691A
Application number: JP2004125705A
Authority: JP
Inventors: Mineki Soga; 峰樹曽我; Takeo Kato; 武男加藤
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2004-04-21
Filing date: 2004-04-21
Publication date: 2005-11-04

Abstract

PROBLEM TO BE SOLVED: To determine with high accuracy whether an object in an image obtained by photographing an imaged area is truly a detection target.

SOLUTION: From a stereo image obtained by photographing the imaged area, candidate areas are obtained that are predicted to be three-dimensional objects and whose calculated identification value is at least a predetermined value. Each obtained candidate area is collated with the past candidate-area series a0-a5 and b0-b4 held in a storage. Based on the collation, the storage is updated: the candidate area is appended to a series when it corresponds to that series (a0-a6 and b0-b5), and a new series (c0) is created when no series corresponds. The mean of the identification values is then obtained for each series a0-a6, b0-b4 and c0. For each corresponding look-back time, the mean identification value is compared with one of a plurality of thresholds that become smaller as the look-back time becomes longer, and thereby it is determined whether the series of candidate areas is a detection target.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an object detection apparatus and method, and in particular to an object detection apparatus and method that use images obtained by capturing an imaged area in time series to determine whether an object in the images is truly a detection target.

Automotive safety systems such as pedestrian information presentation systems, warning systems, and automatic braking systems are expected to help prevent accidents involving pedestrians, who are vulnerable road users, and to reduce the harm done to them. Realizing these systems requires the development of a pedestrian detection sensor.

Most conventional image-based techniques for detecting people such as pedestrians are intended for surveillance and security applications that detect intruders and intruding objects. They mainly use background subtraction, temporal differencing, or a combination of the two.

In background subtraction, a background image containing no intruding object is captured and stored in advance, and a region whose brightness differs from the background is detected as an intruding object from the difference between the background image and the image captured during monitoring. In temporal differencing, a region whose brightness changes is detected as a moving object, i.e., an intruding object, from the difference between two images captured a predetermined interval apart. Both methods assume a static background imaged by a fixed camera and are difficult to apply to images captured by a moving camera such as a vehicle-mounted camera.
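
The two prior-art methods can be sketched as pixel-wise absolute differences followed by a threshold. This is a minimal illustration, assuming grayscale images stored as nested lists and an arbitrary threshold of 30; the function names are hypothetical. The last line of the demo shows why a fixed camera is assumed: a one-pixel camera pan over a static brightness edge is flagged as "motion" even though nothing in the scene moved.

```python
# Sketch of background subtraction and temporal (frame) differencing on
# grayscale images stored as nested lists. The threshold is illustrative.
from typing import List

Image = List[List[int]]
Mask = List[List[bool]]


def abs_diff_mask(a: Image, b: Image, threshold: int) -> Mask:
    """Mark pixels whose brightness differs by more than `threshold`."""
    return [
        [abs(pa - pb) > threshold for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def background_subtraction(background: Image, frame: Image, threshold: int = 30) -> Mask:
    # Compare the monitoring frame with a background captured in advance.
    return abs_diff_mask(background, frame, threshold)


def temporal_difference(earlier: Image, later: Image, threshold: int = 30) -> Mask:
    # Compare two frames captured a fixed interval apart.
    return abs_diff_mask(earlier, later, threshold)


if __name__ == "__main__":
    background = [[10, 10, 10], [10, 10, 10]]
    frame = [[10, 200, 10], [10, 10, 10]]  # an intruding object at row 0, column 1
    print(background_subtraction(background, frame))
    # A one-pixel camera pan over a static edge also produces a "detection":
    print(temporal_difference([[0, 0, 100, 100]], [[0, 100, 100, 100]]))
```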

There have also been several studies of pedestrian detection methods applicable to images from vehicle-mounted cameras. These methods mainly detect pedestrians through the following three-stage process.

In the first stage, candidate areas are extracted by relatively fast and simple processing. A candidate area is a small area of the image in which a pedestrian may be present; some candidate areas contain no pedestrian.

Non-Patent Document 1 extracts three-dimensional objects on the road as candidate areas. It exploits the property that, when each point (parallax point) of a disparity image generated from stereo camera images is projected into real space, the spatial density of the projected points becomes high around three-dimensional objects. Non-Patent Document 2 hierarchically models pedestrian contour patterns in advance and extracts candidate areas based on their degree of match with the models. Non-Patent Document 3 extracts candidate areas using the property that pedestrian regions in far-infrared images have relatively high intensity values.

In the second stage, a classifier provisionally determines whether each candidate area extracted in the first stage is a pedestrian. Classifiers are mainly generated by machine learning: Non-Patent Documents 1 and 2 use neural networks, and Non-Patent Document 3 uses a support vector machine.

Narrowing down candidate areas with different criteria in the first and second stages improves the accuracy of candidate extraction. In addition, limiting the number of candidate areas in the relatively fast first stage reduces the overall processing time.

In the third stage, the provisional results of the second stage are corrected in consideration of temporal continuity, and the final determination is made. Non-Patent Document 2 tracks candidates with an αβ filter; it finally determines that a candidate is a pedestrian when the candidate has been tracked a predetermined number of consecutive times, and that it is not a pedestrian when the candidate has been lost a predetermined number of consecutive times. Non-Patent Document 3 performs the first- and second-stage processing every five frames and only predicts positions in the frames in between; the position predicted by a Kalman filter is corrected by a mean-shift filter. In other words, the provisional determination made every five frames is assumed to be correct and is carried forward as the final determination.
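
A consecutive-count final decision of the kind attributed to Non-Patent Document 2 can be sketched as two counters driven by the binary per-frame result. This is a hypothetical sketch: the constants `CONFIRM_HITS` and `DROP_MISSES` and the `TrackState` type are illustrative and are not taken from that paper, and the αβ tracking itself is omitted.

```python
# Hypothetical sketch of a consecutive-count final decision. CONFIRM_HITS and
# DROP_MISSES are illustrative values, not taken from Non-Patent Document 2.
from dataclasses import dataclass

CONFIRM_HITS = 3  # consecutive tracked frames needed to declare "pedestrian"
DROP_MISSES = 2   # consecutive lost frames needed to declare "not a pedestrian"


@dataclass
class TrackState:
    hits: int = 0
    misses: int = 0
    confirmed: bool = False
    dropped: bool = False


def update_track(state: TrackState, found: bool) -> TrackState:
    """Update the counters with this frame's binary provisional result."""
    if found:
        state.hits += 1
        state.misses = 0
        if state.hits >= CONFIRM_HITS:
            state.confirmed = True
    else:
        state.misses += 1
        state.hits = 0  # a single lost frame restarts the count
        if state.misses >= DROP_MISSES:
            state.confirmed = False
            state.dropped = True
    return state


if __name__ == "__main__":
    state = TrackState()
    for frame, found in enumerate([True, True, False, True, True, True]):
        update_track(state, found)
        print(frame, state.confirmed)
```

In the demo, the single missed frame at index 2 resets the hit count, so confirmation is delayed until frame 5. Because only binary per-frame results are kept, how confidently each frame was classified has no influence on the decision.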

Furthermore, although they are not intended for moving cameras, several inventions have been proposed that, as in the third-stage processing above, use past frames to make the final determination of whether an object is a detection target. Two of them are described below as examples.

First, in Patent Document 1, an extraction unit extracts from a captured image partial areas, i.e., candidate areas, that satisfy a predetermined condition; an association unit associates the partial areas in different captured images; a one-dimensional pattern conversion unit generates a plurality of one-dimensional patterns from each partial area; a two-dimensional pattern conversion unit generates a two-dimensional pattern from the one-dimensional patterns of different captured images; and the two-dimensional pattern is compared with a dictionary pattern to determine whether the extracted partial area is a specific target object.

Second, in Patent Document 2, a first processing step extracts moving objects, i.e., candidate areas, by background subtraction combined with dynamic binarization or template matching; a second step extracts feature quantities of each moving object; a third step stores the extracted sequence of feature quantities in association with the moving object; and a fourth step determines from the stored sequence whether the moving object is a specific moving object. Two determination methods are described: one treats a moving object as the specific moving object when its trajectory continues for at least a predetermined time, and the other does so when the variation in its position changes is within a predetermined range.

Non-Patent Document 1: L. Zhao and C. E. Thorpe, "Stereo and Neural Network-Based Pedestrian Detection," IEEE Transactions on Intelligent Transportation Systems, Vol. 1, No. 3, pp. 148-154, 2000.
Non-Patent Document 2: D. M. Gavrila and J. Giebel, "Shape-Based Pedestrian Detection and Tracking," Proc. IEEE Intelligent Vehicles Symposium, 2002.
Non-Patent Document 3: F. Xu and K. Fujimura, "Pedestrian Detection and Tracking with Night Vision," Proc. IEEE Intelligent Vehicles Symposium, 2002.
Patent Document 1: Japanese Patent Laid-Open No. 10-091794
Patent Document 2: Japanese Patent Laid-Open No. 2002-157599

A pedestrian's appearance in an image varies with posture, pose, clothing, carried items, and so on, so pedestrians form a set of highly diverse patterns (appearances). These patterns also change from moment to moment as the camera and the pedestrian move. The evaluation value used for the determination, i.e., the identification value, therefore fluctuates over time, and a final determination based only on the identification value within a single frame is prone to error.

Non-Patent Document 3 carries the provisional determination obtained from a single frame over to the following frames, so it is prone to misjudgment. Moreover, once a misjudgment occurs, a false detection persists, or an undetected object remains undetected until the next frame in which the determination is performed.

The method of Patent Document 1 makes its determination by comparing a time series of feature quantities with a time-series model of the detection target. With a fixed camera, candidate areas can be extracted relatively stably, so the time series of feature quantities can be obtained correctly. With a moving camera, however, extracted candidate areas are prone to temporary dropouts and positional shifts, so the time series of feature quantities cannot be obtained correctly. In addition, treating a detection target with diverse appearances, such as a pedestrian, as a time-series pattern further increases the pattern diversity and may lower identification accuracy.

The methods of Non-Patent Document 2 and Patent Document 2 make a final determination that an object is a specific target when it has been tracked continuously for a predetermined number of frames; that is, the final determination relies on the continuity of provisional determinations made within single frames. As described above, for a detection target whose appearance varies over time, single-frame provisional determinations are error-prone. When provisional errors occur frequently, the provisional results are not continuous and a correct final determination cannot be made. Furthermore, because a final determination that an object is a detection target always takes a predetermined time, detection is delayed.

Patent Document 2 also proposes a determination based on the variation in position change: the amount of position change between frames is stored for a predetermined time, and the object is determined to be a specific target when the variation is small. This method cannot make a correct final determination when the relative motion between the camera and the detection target involves acceleration, or when the candidate extraction position is shifted.
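
The variation criterion can be sketched as a variance test on frame-to-frame displacements. This is a simplified illustration, assuming one-dimensional positions and an arbitrary `max_variance`; the function names are hypothetical. The demo reproduces the weakness described above: a target in steady relative motion passes, but a uniformly accelerating one fails even though its trajectory is perfectly smooth.

```python
# Sketch of the "small variation in position change" criterion, on
# one-dimensional positions. max_variance is illustrative.
from statistics import pvariance
from typing import List, Sequence


def position_changes(positions: Sequence[float]) -> List[float]:
    """Frame-to-frame displacement."""
    return [b - a for a, b in zip(positions, positions[1:])]


def is_specific_object(positions: Sequence[float], max_variance: float) -> bool:
    """Accept when the variance of the displacements is small enough."""
    changes = position_changes(positions)
    if not changes:
        return False  # need at least two positions
    return pvariance(changes) <= max_variance


if __name__ == "__main__":
    constant_velocity = [0, 2, 4, 6, 8]  # displacements 2, 2, 2, 2
    accelerating = [0, 1, 4, 9, 16]      # displacements 1, 3, 5, 7
    print(is_specific_object(constant_velocity, 1.0))  # True
    print(is_specific_object(accelerating, 1.0))       # False (variance 5)
```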

The present invention has been made in view of the above circumstances, and its object is to provide an object detection apparatus and method that can determine more accurately whether an object in an image obtained by imaging an imaged area is truly a detection target.

To achieve the above object, the object detection apparatus of the invention according to claim 1 comprises: imaging means for outputting, in time series, images obtained by imaging an imaged area in time series; acquisition means for acquiring, from each of the images output in time series by the imaging means and based on a model of a detection target created in advance, areas whose identification value, which indicates the degree to which an area is identified as the detection target, is at least a predetermined value; storage means for storing series of areas, each series consisting of one or more mutually associated areas acquired by the acquisition means from the respective images output in time series; collation means for collating whether an area acquired this time by the acquisition means corresponds to a series of areas stored in the storage means; update means for updating the storage means so that, as a result of the collation by the collation means, the area acquired this time is associated with the series of areas when they correspond, and is stored as a new series when they do not correspond; and determination means for determining whether a series of areas is a detection target by comparing, for each corresponding look-back time, a plurality of average values of the identification values of the areas in the updated series stored in the storage means, each calculated going back to a different one of a plurality of predetermined times, with a plurality of thresholds that become smaller as the look-back time becomes longer.

The object detection method of the invention according to claim 6 outputs, in time series, images obtained by imaging an imaged area in time series; acquires, from each of the images output in time series and based on a model of a detection target created in advance, areas whose identification value, indicating the degree to which an area is identified as the detection target, is at least a predetermined value; collates whether the area acquired this time corresponds to a series of areas, each series consisting of one or more mutually associated areas previously acquired from the respective images output in time series; as a result of the collation, associates the area acquired this time with the series of areas when they correspond, and stores the area acquired this time as a new series when they do not; and determines whether the series of areas with which the area acquired this time has been associated is a detection target by comparing, for each corresponding look-back time, a plurality of average values of the identification values of the areas, each calculated going back to a different one of a plurality of predetermined times, with a plurality of thresholds that become smaller as the look-back time becomes longer. Because the operation and effects of this method are the same as those of the invention according to claim 1, only the operation and effects of the invention according to claim 1 are described below, and a separate description for the method is omitted.

That is, the imaging means of the invention according to claim 1 outputs, in time series, images obtained by imaging the imaged area in time series. A stereo camera, for example, can be used as the imaging means.

The acquisition means acquires, from each of the images output in time series by the imaging means and based on a model of the detection target created in advance, areas whose identification value, indicating the degree to which an area is identified as the detection target, is at least a predetermined value. As the identification value, for example, the output of a classifier generated by machine learning can be used before that output is binarized.

The acquisition means may be configured to include: extraction means for extracting, from the images output in time series by the imaging means, candidate areas that may contain the detection target; identification value calculation means for calculating, for each candidate area extracted by the extraction means, an identification value indicating the degree to which the area is identified as the detection target; and selection means for selecting the candidate areas whose identification value calculated by the identification value calculation means is at least a predetermined value as the areas whose identification value is at least the predetermined value.

That is, the extraction means extracts, from the images output in time series by the imaging means, candidate areas that may contain the detection target. In the stereo-camera example above, a distance image is created from the parallax between the left and right stereo images, and the points for which a parallax was obtained (parallax points) are projected onto a map representing real space. When, for example, the imaging means is mounted on a vehicle and this map is created from images of the road as the imaged area, the spatial density of parallax points becomes high around three-dimensional objects on the road. Using this property, three-dimensional objects on the road, including pedestrians, are extracted as detection-target candidates by extracting the regions of the map where the spatial density of parallax points is high.
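
The projection step can be sketched with the standard rectified-stereo relations Z = f·B/d and X = (u − cx)·B/d, followed by counting parallax points per grid cell of a bird's-eye map. This is a minimal sketch under stated assumptions: the focal length `focal_px`, baseline `baseline_m`, principal-point column `cx`, cell size `cell_m` and density threshold `min_points` are all hypothetical tuning values, not parameters given in the text.

```python
# Sketch of candidate extraction from a bird's-eye map of parallax points.
# Assumes a rectified stereo pair; all numeric parameters are illustrative.
from collections import Counter
from typing import Iterable, List, Tuple

ParallaxPoint = Tuple[float, float, float]  # (u, v, disparity) in pixels
Cell = Tuple[int, int]                      # (lateral index, depth index)


def project_to_map(points: Iterable[ParallaxPoint], focal_px: float,
                   baseline_m: float, cx: float, cell_m: float) -> Counter:
    """Count parallax points per real-space grid cell (X lateral, Z depth)."""
    counts: Counter = Counter()
    for u, _v, d in points:
        if d <= 0:
            continue  # no valid parallax at this point
        z = focal_px * baseline_m / d  # depth from disparity
        x = (u - cx) * baseline_m / d  # lateral offset, equal to (u - cx) * z / focal_px
        counts[(int(x // cell_m), int(z // cell_m))] += 1
    return counts


def dense_cells(counts: Counter, min_points: int) -> List[Cell]:
    """Cells whose parallax-point density suggests a three-dimensional object."""
    return sorted(cell for cell, n in counts.items() if n >= min_points)


if __name__ == "__main__":
    # Ten points stacked in one image column at the same disparity (an upright
    # object), plus two isolated points from elsewhere in the scene.
    upright = [(340.0, float(v), 20.0) for v in range(200, 210)]
    scattered = [(100.0, 400.0, 5.0), (600.0, 420.0, 8.0)]
    counts = project_to_map(upright + scattered, focal_px=800.0,
                            baseline_m=0.5, cx=320.0, cell_m=1.0)
    print(dense_cells(counts, min_points=5))  # [(0, 20)]
```

An upright object stacks many image rows into a single map cell, whereas scattered points spread across cells. A practical system would also suppress road-surface points and merge neighbouring dense cells into one candidate; the sketch omits both steps.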

For each candidate area extracted by the extraction means, the identification value calculation means calculates, based on the detection-target model created in advance, an identification value indicating the degree to which the area is identified as the detection target.

The detection-target model created in advance, for example a pedestrian model, can be generated by machine learning using a large number of image samples of pedestrians and of other objects. The machine learning proceeds as follows. First, each sample image is normalized in size by interpolation and converted into feature quantities such as edges. The feature quantities should preferably express the difference between pedestrians and other objects clearly, and identification accuracy can be improved by choosing the features best suited to the detection target and to the properties of the captured images. After the samples have been converted into feature quantities, a classifier, i.e., a model, can be generated with a known method such as a neural network or a support vector machine. The model needs to be generated only once, in advance.
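
The offline steps can be sketched as bilinear size normalization, a simple edge feature, and a linear classifier. This is a toy sketch under loudly stated assumptions: a perceptron stands in for the neural network or SVM named in the text, the 4x4 normalized size and the absolute-difference edge feature are arbitrary choices, and the hypothetical `stripe` samples (vertical stripes labelled +1, horizontal stripes labelled -1) replace real pedestrian images.

```python
# Sketch of offline model generation: bilinear size normalisation, a simple
# edge feature and a perceptron (a stand-in for a neural network or SVM).
from typing import List, Sequence, Tuple

Image = List[List[float]]
Model = Tuple[List[float], float]  # (weights, bias)


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    in_h, in_w = len(img), len(img[0])
    out: Image = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1, fy = min(y0 + 1, in_h - 1), y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1, fx = min(x0 + 1, in_w - 1), x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def edge_features(img: Image) -> List[float]:
    """Absolute horizontal differences followed by absolute vertical differences."""
    h, w = len(img), len(img[0])
    horizontal = [abs(img[i][j + 1] - img[i][j]) for i in range(h) for j in range(w - 1)]
    vertical = [abs(img[i + 1][j] - img[i][j]) for i in range(h - 1) for j in range(w)]
    return horizontal + vertical


def train_perceptron(samples: Sequence[List[float]], labels: Sequence[int],
                     epochs: int = 20, lr: float = 0.1) -> Model:
    """labels are +1 (detection target) or -1 (other)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def identification_value(model: Model, features: Sequence[float]) -> float:
    """Classifier output before binarisation (its sign would be the class)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, features)) + b


def stripe(size: int, vertical: bool) -> Image:
    """Toy sample (hypothetical): a two-pixel-wide bright stripe through the middle."""
    mid = {size // 2 - 1, size // 2}
    return [[1.0 if (j if vertical else i) in mid else 0.0 for j in range(size)]
            for i in range(size)]


def features(img: Image) -> List[float]:
    return edge_features(resize_bilinear(img, 4, 4))


if __name__ == "__main__":
    model = train_perceptron([features(stripe(6, True)), features(stripe(6, False))], [1, -1])
    print(identification_value(model, features(stripe(8, True))) > 0)   # True
    print(identification_value(model, features(stripe(8, False))) < 0)  # True
```

Because samples of different sizes (6x6 for training, 8x8 for testing) are normalized to the same 4x4 grid, one weight vector can score all of them. The continuous `identification_value` is the pre-binarization output that the acquisition means uses.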

Once the model has been generated, the identification value calculation means processes each candidate area extracted by the extraction means as follows. The candidate area is normalized to the same size as in model generation, converted into feature quantities, and processed with the previously generated model. This yields an identification value indicating the degree to which the area is identified as the detection target. With template matching, a simpler method that does not use machine learning, the output correlation value can likewise be regarded as the identification value.
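
Using template matching as the identification value can be sketched as zero-mean normalized cross-correlation (NCC) between the size-normalized candidate and a template. This sketch makes several assumptions: nearest-neighbour resizing is used for normalization, NCC is chosen as the correlation measure, and the tiny template in the demo is hypothetical. The score lies in [-1, 1] and is unaffected by brightness gain and offset.

```python
# Sketch of template matching as an identification value: the candidate area
# is normalised to the template size (nearest-neighbour, an assumption) and
# scored by zero-mean normalised cross-correlation in [-1, 1].
from math import sqrt
from typing import List

Image = List[List[float]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalised cross-correlation of two equally sized images."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    da = [p - ma for p in fa]
    db = [p - mb for p in fb]
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def template_identification_value(candidate: Image, template: Image) -> float:
    normalised = resize_nearest(candidate, len(template), len(template[0]))
    return ncc(normalised, template)


if __name__ == "__main__":
    template = [[0.0, 9.0], [0.0, 9.0]]     # dark-left, bright-right
    candidate = [[0.0, 0.0, 9.0, 9.0]] * 4  # same pattern at twice the size
    print(template_identification_value(candidate, template))  # 1.0
```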

The selection means then selects the candidate areas whose identification value calculated by the identification value calculation means is at least a predetermined value as the areas whose identification value is at least the predetermined value. The predetermined value is set so that candidate areas that can reliably be judged not to be pedestrians are rejected. Reducing the number of candidate areas prevents erroneous associations in the subsequent collation means. It also lets the subsequent determination means decide more accurately: when the identification value is erroneously and temporarily lowered, for example by a shift in the extraction position of a candidate area, that candidate area is rejected and is not reflected in the determination. When the imaging means is moving, candidate areas are prone to positional shifts, and the identification value is also prone to error because the appearance changes over time. The selection means is therefore particularly effective in this case.
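
The three-step acquisition means (extraction, identification value calculation, selection) can be sketched as a small pipeline. This is a sketch only: `extract` and `score` are hypothetical callables standing in for the stereo extraction and the learned model, and the box format and `min_score` of 0.0 are illustrative.

```python
# Sketch of the acquisition means as three steps: extraction, identification
# value calculation and selection. `extract` and `score` are hypothetical.
from typing import Callable, Iterable, List, Tuple, TypeVar

ImageT = TypeVar("ImageT")
Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height)


def acquire_areas(image: ImageT,
                  extract: Callable[[ImageT], Iterable[Box]],
                  score: Callable[[ImageT, Box], float],
                  min_score: float) -> List[Tuple[Box, float]]:
    """Keep only candidates whose identification value reaches min_score."""
    selected = []
    for box in extract(image):
        value = score(image, box)
        if value >= min_score:  # reject areas that are surely not the target
            selected.append((box, value))
    return selected


if __name__ == "__main__":
    scores = {(0, 0, 8, 16): 0.9, (20, 0, 8, 16): -0.8, (40, 0, 8, 16): 0.1}
    result = acquire_areas(None, lambda img: scores.keys(),
                           lambda img, box: scores[box], min_score=0.0)
    print(result)  # [((0, 0, 8, 16), 0.9), ((40, 0, 8, 16), 0.1)]
```

The threshold is deliberately permissive. The marginal candidate scoring 0.1 survives and is judged over time, and only the clear non-target scoring -0.8 is discarded.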

As described above, the acquisition means (the extraction, identification value calculation, and selection means) processes each of the images output in time series by the imaging means individually. Therefore, whether the imaging means is moving or stationary, it can extract detection-target candidates and output their identification values without being affected by background motion. The methods of Non-Patent Document 2 and Patent Document 2 make the final determination using processing results from past frames (times), but those results are binary information indicating whether a detection target is present. In contrast, the information used in the final determination of the present invention, described below, is the identification value indicating the degree to which an area is identified as the detection target, i.e., continuous-valued information. The present invention can therefore make a more accurate final determination than the prior art.

The example above configures the acquisition means from the extraction means, the identification value calculation means, and the selection means. In other words, extracting candidate areas from the captured image and outputting model-based identification values are carried out in two stages, by the extraction means and by the identification value calculation means. Configurations that are not clearly divided into two stages are also possible. For example, every small area in the image can be input to the classifier, and the small areas whose output identification value is at least a predetermined value can be regarded as candidate areas. The final candidate areas can also be extracted by narrowing down the candidates step by step in a multi-stage process.
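
The single-stage alternative, in which every small area is passed to the classifier, can be sketched as a sliding window. This is a minimal sketch, assuming a fixed window size and stride and using a toy mean-brightness function (`mean_brightness`, hypothetical) in place of a learned classifier. A practical detector would also scan several scales.

```python
# Sketch of the single-stage alternative: every window is scored and windows
# whose identification value reaches min_score become candidate areas.
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_window_areas(image: Image, win_h: int, win_w: int, stride: int,
                         classify: Callable[[Image], float],
                         min_score: float) -> List[Tuple[Box, float]]:
    h, w = len(image), len(image[0])
    areas = []
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            patch = [row[x:x + win_w] for row in image[y:y + win_h]]
            value = classify(patch)
            if value >= min_score:
                areas.append(((x, y, win_w, win_h), value))
    return areas


def mean_brightness(patch: Image) -> float:
    """Toy stand-in for a learned classifier."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
    print(sliding_window_areas(image, 2, 2, 2, mean_brightness, 0.5))
    # [((2, 0, 2, 2), 1.0)]
```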

The storage means stores series of areas, each consisting of one or more mutually associated areas acquired by the acquisition means from the respective images output in time series.

The collation means collates whether the area acquired this time by the acquisition means corresponds to a series of areas stored in the storage means. Based on the result of this collation, the update means updates the storage means. The area acquired this time is associated with the series of areas when they correspond, and it is stored as a new series when they do not.

For example, the position at which an area will appear in the current image, or the probability distribution of that position, is predicted from a series of past areas stored in the storage means. Collation is then performed by determining whether an area acquired this time exists at or near the predicted position, or whether one exists within a predetermined range containing the positions where the appearance probability is at least a predetermined value. When such an area exists, it is associated with the stored series of areas.
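
Collation by predicted position can be sketched as a constant-velocity prediction followed by a circular gate. The constant-velocity predictor is an assumption standing in for the αβ or Kalman filters the text names for this step. The gate radius and the nearest-within-gate rule are also illustrative choices.

```python
# Sketch of collation by predicted position: a constant-velocity prediction
# from the last two areas of a series (an assumption) and a circular gate.
from math import hypot
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def predict_position(history: Sequence[Point]) -> Point:
    if len(history) < 2:
        return history[-1]
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def collate(history: Sequence[Point], detections: Sequence[Point],
            gate: float) -> Optional[int]:
    """Index of the detection nearest the prediction within the gate, else None."""
    px, py = predict_position(history)
    best, best_dist = None, gate
    for i, (x, y) in enumerate(detections):
        dist = hypot(x - px, y - py)
        if dist <= best_dist:
            best, best_dist = i, dist
    return best


if __name__ == "__main__":
    series = [(0.0, 0.0), (1.0, 0.0)]  # moving +1 per frame in x
    current = [(5.0, 5.0), (2.2, 0.1)]
    print(collate(series, current, gate=0.5))  # 1
    print(collate(series, current, gate=0.1))  # None
```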

Known methods such as the αβ filter and the Kalman filter can be used to obtain the predicted position or its probability distribution.
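
A standard αβ filter for one coordinate of an area's position can be sketched as a prediction followed by a residual-weighted correction. In this sketch, the gains `alpha` and `beta`, the frame interval `dt`, and the zero initial state are assumptions. Two such filters, one per image axis, would track a 2-D position.

```python
# Minimal alpha-beta filter for one coordinate of an area's position.
# alpha, beta and dt are illustrative; x and v start at zero by assumption.
from dataclasses import dataclass


@dataclass
class AlphaBetaFilter:
    alpha: float
    beta: float
    dt: float = 1.0
    x: float = 0.0  # estimated position
    v: float = 0.0  # estimated velocity

    def predict(self) -> float:
        """Predicted position at the next frame (used for collation)."""
        return self.x + self.v * self.dt

    def update(self, z: float) -> float:
        """Correct the state with the measured position z of the associated area."""
        x_pred = self.predict()
        residual = z - x_pred
        self.x = x_pred + self.alpha * residual
        self.v = self.v + (self.beta / self.dt) * residual
        return self.x


if __name__ == "__main__":
    f = AlphaBetaFilter(alpha=0.5, beta=0.2)
    for k in range(100):  # object moving at a constant 1 unit/frame
        f.update(float(k))
    print(round(f.predict(), 6))  # 100.0
```

For constant-velocity motion, the filter converges to zero prediction error. The value returned by `predict()` is what a gate such as the one in the collation step would be centred on.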

Another association method stores the latest area in a past series as a template and performs the association based on the correlation with that template. The predicted position and the template correlation can also be combined for association.

The series of areas collated here are not limited to those associated with an area in the immediately preceding frame; series that have had no associated area for a predetermined number of frames are also included. Therefore, even when a candidate area temporarily drops out, an appropriate determination can be made using the identification values from the associated frames before and after the dropout.

When the collation by the collation means shows that the area acquired this time corresponds to a series of areas, the update means appends to that series the information about the area that is needed for collation, together with its identification value, and stores them.

The information needed for collation depends on the collation method. For collation based on the predicted position described above, it is the position information. For the method based on correlation with a template, it is a template created from the latest candidate area.

なお、更新手段は、前記照合手段による照合の結果、前記今回取得された領域と前記領域の系列とが対応しない場合には、前記今回取得された領域を新たな系列として記憶するように、前記記憶手段の記憶内容を更新する。すなわち、照合手段により対応付けられる系列がない場合は、新たな系列を生成して、同様に照合に必要な情報と識別値を記憶する。 When, as a result of the collation by the collating means, the region acquired this time does not correspond to any series of regions, the updating means updates the contents of the storage means so that the region acquired this time is stored as a new series. That is, if the collating means finds no corresponding series, a new series is generated, and the information necessary for collation and the identification value are stored for it in the same way.
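
The update rule can be sketched as follows, under the assumption that each series keeps its newest identification value at index 0 (matching the a0, b0 convention of FIG. 4) and that a frame in which a series has no matching region is recorded as `None`. The names `Series` and `update_store` are hypothetical, and only the last position is kept as the information needed for collation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass
class Series:
    position: Position                                           # information used for collation
    scores: List[Optional[float]] = field(default_factory=list)  # scores[0] = newest frame


def update_store(store: List[Series], matches: Sequence[Optional[int]],
                 positions: Sequence[Position], scores: Sequence[float]) -> None:
    """Apply one frame of collation results.

    matches[k] is the index of the series that candidate k was collated with,
    or None if no series corresponds to it.
    """
    n_old = len(store)
    matched = set()
    for k, s_idx in enumerate(matches):
        if s_idx is None:
            # No corresponding series: start a new one.
            store.append(Series(position=positions[k], scores=[scores[k]]))
        else:
            # Corresponding series: add the region as the newest element.
            store[s_idx].position = positions[k]
            store[s_idx].scores.insert(0, scores[k])
            matched.add(s_idx)
    for j in range(n_old):
        if j not in matched:
            store[j].scores.insert(0, None)  # series kept, but no region this frame
```

Keeping unmatched series with a `None` entry, rather than deleting them, is what allows a region that drops out for a frame or two to be re-attached to its history later.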

判定手段は、前記記憶手段に記憶されかつ前記更新された前記領域の系列において、異なる複数の所定時刻各々まで遡って算出した領域の識別値の複数の平均値と、遡る時刻が長いほど小さくなる複数の閾値と、を対応する時刻毎に比較することにより、前記領域の系列が検出対象であるか否かを判定する。 For each updated series of regions stored in the storage means, the determination means calculates a plurality of average identification values, each going back to a different predetermined time, and compares each average with the threshold for the corresponding time, the thresholds becoming smaller the further back the averaging extends. From these comparisons it determines whether the series of regions is a detection target.

ここで、所定の複数時刻まで遡って算出した識別値の複数の平均値とは、例えば、過去３画像分の領域の識別値の平均値、過去４画像分の識別値の平均値、過去５画像分の領域の識別値の平均値を意味する。 Here, the plurality of average values calculated back to a plurality of predetermined times means, for example, the average identification value of the region over the past three images, the average over the past four images, and the average over the past five images.

上記のように判定すると、識別値の大きい対象は検出に要する画像数が少なくなり、識別値の小さいものは検出に要する画像数が多くなる。これにより、判定を誤りにくい対象、即ち、識別値が大きい対象は少ない画像数で即座に検出し、判定を誤りやすい対象、即ち、識別値が小さい対象は多数の画像に基づいて正確に判定される。 With this determination, an object with a large identification value requires few images to be detected, while an object with a small identification value requires many. As a result, an object that is unlikely to be misjudged (one with a large identification value) is detected immediately from a small number of images, and an object that is likely to be misjudged (one with a small identification value) is judged accurately on the basis of many images.
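
A minimal sketch of this determination, assuming the window lengths 3, 4, and 5 from the example above and purely illustrative thresholds 1.0, 0.7, and 0.5: a series is judged a detection target as soon as any window's average reaches the threshold paired with it. Frames without a matching region (`None`) are skipped when averaging, which is one reading of how a temporarily missing region is handled, and windows longer than the series are not evaluated. The function names and threshold values are assumptions.

```python
from typing import List, Optional, Sequence


def window_means(scores: Sequence[Optional[float]],
                 windows: Sequence[int]) -> List[Optional[float]]:
    """Average of the newest w identification values for each window length w."""
    means: List[Optional[float]] = []
    for w in windows:
        if len(scores) < w:
            means.append(None)  # series too short to look back w frames
            continue
        present = [s for s in scores[:w] if s is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


def is_target(scores: Sequence[Optional[float]],
              windows: Sequence[int] = (3, 4, 5),
              thresholds: Sequence[float] = (1.0, 0.7, 0.5)) -> bool:
    """True if any window's average reaches the threshold paired with it."""
    if any(a <= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must shrink as the window gets longer")
    for mean, th in zip(window_means(scores, windows), thresholds):
        if mean is not None and mean >= th:
            return True
    return False


print(is_target([1.2, 1.1, 1.0]))  # True: strong evidence, three frames suffice
print(is_target([0.6, 0.6, 0.6]))  # False: weak evidence, too few frames yet
print(is_target([0.6] * 5))        # True: weak evidence accumulated over five frames
```

The last two calls show the trade-off described above: the same weak score is rejected after three frames but accepted after five, because the longer window is paired with a lower threshold.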

また、固定カメラにより物体検出を行う場合、背景差分法や時間差分法を適用できるので比較的安定して候補領域を検出可能である。一方、撮像手段が動く状態で撮像すると、背景と前景の区別が難しいため、抽出される領域の一時的な欠落や位置ズレが生じやすい。しかし、本発明では領域が一時的に欠落する場合でも、照合手段により欠落前後の領域が時系列で対応付けられているため、欠落しない領域の識別値を用いているので、正しく判定することができる。 When objects are detected with a fixed camera, background subtraction or frame differencing can be applied, so candidate regions can be detected relatively stably. When the imaging means itself is moving, however, the background is hard to distinguish from the foreground, so the extracted regions are prone to temporary dropouts and positional shifts. In the present invention, even if a region is temporarily missing, the collating means links the regions before and after the gap in time series, and the identification values from the frames in which the region is present are used, so a correct determination can still be made.

このように、本発明は、領域の系列において、異なる複数の所定時刻各々まで遡って算出した領域の識別値の複数の平均値と、遡る時刻が長いほど小さくなる複数の閾値と、を対応する時刻毎に比較して、前記領域の系列が検出対象であるか否かを判定するので、識別値の平均値という連続的な情報を用いているため、判定精度を向上させることができる。なお、識別値を一時的に大きく誤るとその誤りが平均値に大きく影響することも考えられるが、本発明では、取得手段において識別値が所定値以上の領域のみを取得するので、大きな誤りを棄却することができ、この難点を除去している。 Thus, the present invention determines whether a series of regions is a detection target by comparing, for each corresponding time, the average identification values calculated back to each of a plurality of different predetermined times with thresholds that become smaller the further back the averaging extends. Because it uses continuous information, namely the average identification value, the determination accuracy can be improved. One concern is that a single, temporarily large error in the identification value could strongly distort the average. In the present invention, however, the acquisition means acquires only regions whose identification value is equal to or greater than a predetermined value, so large errors are rejected and this drawback is eliminated.

ところで、前記記憶手段は、前記領域の系列に対応して、前記判定手段による判定の情報を更に記憶し、前記判定手段は、前記記憶手段に前記領域の系列に対応して記憶された判定の情報を更に考慮して、前記領域の系列が検出対象であるか否かを判定するようにしてもよい。 The storage means may further store, for each series of regions, information on the determinations made by the determination means, and the determination means may additionally take this stored determination information into account when determining whether the series of regions is a detection target.

この場合、判定手段は、前記異なる複数の所定時刻の内の今回に近い所定時刻においては、前記識別値の複数の平均値と、遡る時刻が長いほど小さくなる複数の閾値と、を対応する時刻毎に比較し、前記異なる複数の所定時刻の内の今回に遠い所定時刻においては、前記記憶手段に記憶された判定情報を更に考慮するようにしてもよい。 In this case, for the predetermined times close to the present, the determination means may compare the average identification values with the corresponding thresholds (which become smaller the further back the averaging extends), and for the predetermined times far from the present, it may additionally take into account the determination information stored in the storage means.

なお、上記判定の情報は、前記判定手段による判定の結果、及び、前記領域の系列が検出対象と判定された回数の少なくとも１つとしてもよい。 Note that the determination information may be at least one of a determination result by the determination unit and the number of times that the region series is determined as a detection target.

このように、前記領域の系列に対応して記憶された、判定手段による判定の情報を更に考慮するので、最終的な判定結果が頻繁に反転するのを防止できる。具体的には、一旦検出した検出対象を見失いにくくする効果がある。 In this way, since the determination information by the determination means stored corresponding to the series of the regions is further taken into account, it is possible to prevent the final determination result from being frequently inverted. Specifically, there is an effect of making it difficult to lose sight of the detection target once detected.
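
One way to realize this is sketched below under stated assumptions: the near windows use the plain comparison, while the farthest window's threshold is relaxed when the stored determination for the previous frame was "target". The relaxed value 0.2, the window lengths, and the thresholds are hypothetical, and using only the most recent determination (rather than, for example, the count of past detections) is a design choice of this sketch, not something the document fixes.

```python
from typing import Optional, Sequence


def mean_of_newest(scores: Sequence[Optional[float]], w: int) -> Optional[float]:
    """Average of the newest w values, skipping frames with no region (None)."""
    if len(scores) < w:
        return None
    present = [s for s in scores[:w] if s is not None]
    return sum(present) / len(present) if present else None


def decide_with_history(scores: Sequence[Optional[float]],
                        judgments: Sequence[int],
                        windows: Sequence[int] = (3, 4, 5),
                        thresholds: Sequence[float] = (1.0, 0.7, 0.5),
                        relaxed_far_threshold: float = 0.2) -> bool:
    """judgments[0] is the stored decision for the previous frame (1 = target)."""
    was_target = bool(judgments) and judgments[0] == 1
    last = len(windows) - 1
    for idx, (w, th) in enumerate(zip(windows, thresholds)):
        if idx == last and was_target:
            th = relaxed_far_threshold  # hysteresis: an established target is harder to lose
        mean = mean_of_newest(scores, w)
        if mean is not None and mean >= th:
            return True
    return False
```

With five frames of score 0.3, a series that was a target in the previous frame stays a target, while a series that was not remains rejected, which is the anti-flicker behaviour described above.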

以上説明したように本発明は、領域の系列において、異なる複数の所定時刻各々まで遡って算出した領域の識別値の複数の平均値と、遡る時刻が長いほど小さくなる複数の閾値と、を対応する時刻毎に比較して、前記領域の系列が検出対象であるか否かを判定するので、判定精度を向上させることができる、という効果がある。 As described above, the present invention determines whether a series of regions is a detection target by comparing, for each corresponding time, the average identification values calculated back to each of a plurality of different predetermined times with thresholds that become smaller the further back the averaging extends, which has the effect of improving determination accuracy.

以下、図面を参照して、本発明の実施の形態を詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

図１に示すように、本実施の形態にかかる物体検出装置は、監視領域を時系列に撮像した画像を時系列で出力する撮像部101と、撮像部101に接続され、時系列で出力される画像をフレーム毎に処理して、検出対象の候補領域の位置と形状情報を出力する候補抽出部102と、予め機械学習等によって算出された検出対象のモデルを表すパラメータが記憶された識別器104と、撮像部101、候補抽出部102、及び識別器104に接続され、候補抽出部102で指定された候補領域毎に、識別器104に記憶されたパラメータを用いて識別値を出力する識別値算出部103と、を備えている。 As shown in FIG. 1, the object detection apparatus according to the present embodiment includes: an imaging unit 101 that images a monitoring area in time series and outputs the images in time series; a candidate extraction unit 102, connected to the imaging unit 101, that processes the time-series images frame by frame and outputs the position and shape information of candidate regions for the detection target; a discriminator 104 that stores parameters representing a model of the detection target, calculated in advance by machine learning or the like; and an identification value calculation unit 103, connected to the imaging unit 101, the candidate extraction unit 102, and the discriminator 104, that outputs an identification value for each candidate region designated by the candidate extraction unit 102, using the parameters stored in the discriminator 104.

また、本実施の形態にかかる物体検出装置は、候補抽出部102及び識別値算出部103に接続され、識別値が所定値以上である候補領域のみを選択し、その選択された候補領域の識別値と位置情報を出力する候補選択部105と、候補選択部105により出力された現フレームの候補領域と記憶部107に記憶された過去のフレームの候補領域の系列とを照合し、それらの対応を求め、現フレームの候補領域が過去の候補領域の系列と対応する場合、その系列に現フレームの候補領域の情報を加え、現フレームの候補領域が過去の候補領域の系列と対応しない場合、新たな系列を生成して現フレームの候補領域の情報を記憶するように記憶部107を制御する照合部106と、記憶部107に記憶された候補領域の系列を参照して系列毎に識別値の平均値を算出する演算部108と、演算部108で算出された識別値の平均値に基づいて、候補領域の系列毎に歩行者であるか否かを判定する判定部109と、を備えている。 The object detection apparatus according to the present embodiment further includes: a candidate selection unit 105, connected to the candidate extraction unit 102 and the identification value calculation unit 103, that selects only candidate regions whose identification value is equal to or greater than a predetermined value and outputs the identification value and position information of each selected candidate region; a collation unit 106 that collates the candidate regions of the current frame output by the candidate selection unit 105 with the series of candidate regions from past frames stored in the storage unit 107 to find their correspondence, and controls the storage unit 107 so that, if a candidate region of the current frame corresponds to a past series, its information is added to that series, and otherwise a new series is generated to store its information; a calculation unit 108 that refers to the series of candidate regions stored in the storage unit 107 and calculates the average identification value of each series; and a determination unit 109 that determines, for each series of candidate regions, whether it is a pedestrian based on the average identification value calculated by the calculation unit 108.

次に、本実施の形態にかかる物体検出方法を、図２等を用いて説明する。 Next, an object detection method according to the present embodiment will be described with reference to FIG.

画像入力のステップ201では、ステレオ画像が入力される。候補抽出のステップ202では、ステレオ画像から距離画像を生成し、立体物を候補領域として抽出する。図３(a)は抽出された候補領域を点線の矩形で示している。この例では、歩行者300以外に車両302、標識304、ガードレール306、建物308が立体物として抽出されている。 In step 201 of image input, a stereo image is input. In candidate extraction step 202, a distance image is generated from the stereo image, and a three-dimensional object is extracted as a candidate region. FIG. 3A shows the extracted candidate area as a dotted rectangle. In this example, in addition to the pedestrian 300, a vehicle 302, a sign 304, a guardrail 306, and a building 308 are extracted as three-dimensional objects.

識別値算出のステップ203では、候補領域毎に識別値即ち検出対象との類似度を出力する。候補選択のステップ204では、識別値が所定値以上である候補領域のみを選択する。図３(b)は、候補領域毎に算出された識別値を示しており、識別値が‐0.5以上である候補領域を選択している。選択された候補領域は、実線の矩形で示されたＡ，Ｂ，Ｃである。 In step 203 for calculating the identification value, the identification value, that is, the similarity to the detection target is output for each candidate area. In the candidate selection step 204, only candidate regions whose identification value is greater than or equal to a predetermined value are selected. FIG. 3B shows an identification value calculated for each candidate area, and a candidate area having an identification value of −0.5 or more is selected. The selected candidate areas are A, B, and C indicated by solid rectangles.

照合のステップ205では、選択された候補領域と記憶されている過去の候補領域の系列とを照合して対応付けを行う。図４(a)は記憶された系列の位置（カメラとの相対位置）を実空間上に示したものであり、a0〜a5、b0〜b4の２つの系列が記憶されている。同図中の楕円は、２つの系列から予測した現フレームでの出現位置の範囲を示している。この例では、現フレームで抽出された候補領域Ａと候補領域Ｂはそれぞれ系列ai、bi（iは整数）と対応付けられ、候補Ｃは対応する系列が存在しない。 In step 205 of collation, the selected candidate area is collated with the stored series of past candidate areas to perform association. FIG. 4A shows the position of the stored series (relative position with respect to the camera) in real space, and two series of a0 to a5 and b0 to b4 are stored. The ellipse in the figure shows the range of appearance positions in the current frame predicted from the two sequences. In this example, candidate area A and candidate area B extracted in the current frame are associated with sequences ai and bi (i is an integer), respectively, and candidate C has no corresponding sequence.

系列の更新のステップ206では、照合結果に基づいて記憶された系列を更新する。図４(b)は更新された系列を示している。現フレームで対応する候補が存在する系列ai、biはインデックスiを１増加させ、Ａ，Ｂを最新の候補a0、b0として追加する。対応する系列が存在しない候補Ｃは、新たな系列c0を生成する。 In the series update step 206, the stored series are updated based on the collation result. FIG. 4B shows the updated series. For series ai and bi, which have a corresponding candidate in the current frame, the index i is incremented by 1, and A and B are added as the latest candidates a0 and b0. For candidate C, which has no corresponding series, a new series c0 is generated.

判定のステップ207では、系列ai、bi、ciについてそれぞれの識別値の平均を求め、後述する判定基準に基づいて歩行者であるか否かの判定を行う。図３(c)は、候補Ｂが歩行者であると判定されたことを示している。 In the determination step 207, the average of the respective identification values is obtained for the series ai, bi, and ci, and it is determined whether or not the person is a pedestrian based on a determination criterion described later. FIG. 3C shows that the candidate B is determined to be a pedestrian.

次に各ステップ201〜207を更に詳細に説明する。 Next, each step 201-207 is demonstrated in detail.

まず、ステップ201（撮像部101の動作）を説明する。撮像部101は、同じ被撮影領域を撮像する一対のカメラ（ステレオカメラ）で構成され、各々のカメラでは、同じ被撮像領域を所定間隔離れた位置から時系列に撮像し、得られた画像を時系列に出力する。 First, step 201 (operation of the imaging unit 101) will be described. The imaging unit 101 is composed of a pair of cameras (stereo cameras) that capture the same imaged area, and each camera images the same imaged area in time series from positions separated by a predetermined interval. Output in time series.

ステップ202（候補抽出部102の動作）では、候補領域を抽出する。具体的には、図８に示すように、ステップ150で、距離画像を生成する。すなわち、入力された2枚のステレオ画像の画素毎の対応付けを行い、対応する画素間のズレ量（視差）に基づいて距離画像を生成する。 In step 202 (operation of candidate extraction unit 102), candidate areas are extracted. Specifically, as shown in FIG. 8, in step 150, a distance image is generated. That is, the input two stereo images are associated for each pixel, and a distance image is generated based on a shift amount (parallax) between the corresponding pixels.

２枚のステレオ画像で対応が取れた画素即ち対象までの距離が得られた画素を視差点と呼ぶが、ステップ１５２では、この視差点を実空間を表すマップ上に投影する。車載ステレオカメラにより道路シーンを撮像し、図５(a)のような画像が得られたとする。図５(b)は、この画像から作成した距離画像の視差点を投影した実空間のマップである。図５(b)に示したように、歩行者300、車両302、ガードレール304、標識306等の路上の立体物の周辺では視差点の空間密度が大きくなる。したがって、候補抽出部102は、ステップ154で、マップ上の視差点の空間密度が大きい領域を抽出することにより、路上の立体物を歩行者の候補として抽出し、ステップ１５６で、その位置と形状の情報を出力する。位置の情報とは画像上での位置と実空間上での位置であり、前者は識別値算出部での画像内位置の特定に用いられ、後者は照合部106での対応付けに用いられる。形状の情報は画像内での形状と大きさの情報であり、歩行者の検出では候補領域の形状を所定縦横比の矩形と限定できるので、出力される形状情報はスケールファクターのみである。 A pixel for which a correspondence has been found between the two stereo images, that is, a pixel for which the distance to the object has been obtained, is called a parallax point. In step 152, these parallax points are projected onto a map representing real space. Suppose a road scene is captured by an in-vehicle stereo camera and an image such as that in FIG. 5A is obtained. FIG. 5B is the real-space map onto which the parallax points of the distance image created from this image are projected. As shown in FIG. 5B, the spatial density of parallax points is high around three-dimensional objects on the road, such as the pedestrian 300, the vehicle 302, the guardrail 304, and the sign 306. Therefore, in step 154, the candidate extraction unit 102 extracts regions of the map with a high spatial density of parallax points, thereby extracting three-dimensional objects on the road as pedestrian candidates, and in step 156 it outputs their position and shape information. The position information consists of the position in the image and the position in real space; the former is used by the identification value calculation unit to locate the region in the image, and the latter is used by the collation unit 106 for association. The shape information describes the shape and size in the image; because, for pedestrian detection, the candidate region can be restricted to a rectangle with a predetermined aspect ratio, the only shape information output is a scale factor.
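
The projection and density test of steps 152 and 154 can be sketched with the standard pinhole stereo relations Z = fB/d and X = (u - c_u)Z/f, binning parallax points into a bird's-eye grid over X (lateral) and Z (depth). The grid layout, cell size, count threshold, and camera parameters are assumptions for illustration; the document does not specify how the map or the density is computed, and a fuller implementation would also merge adjacent dense cells into one object.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def disparity_to_xz(u: float, d: float, focal_px: float,
                    baseline_m: float, cu: float) -> Tuple[float, float]:
    """Real-space (X, Z) of image column u with disparity d (pinhole stereo model)."""
    z = focal_px * baseline_m / d
    x = (u - cu) * z / focal_px
    return x, z


def dense_cells(points: Iterable[Tuple[float, float]], focal_px: float, baseline_m: float,
                cu: float, cell_m: float = 0.5, min_count: int = 3) -> List[Tuple[int, int]]:
    """Grid cells of the X-Z map whose parallax-point count reaches min_count."""
    counts: Counter = Counter()
    for u, d in points:
        if d <= 0:
            continue  # no valid parallax for this pixel
        x, z = disparity_to_xz(u, d, focal_px, baseline_m, cu)
        counts[(math.floor(x / cell_m), math.floor(z / cell_m))] += 1
    return sorted(cell for cell, n in counts.items() if n >= min_count)
```

Each returned cell index (column, row) corresponds to a 0.5 m by 0.5 m patch of road in this sketch; its real-space centre is what the collation step would use as the candidate's position.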

本実施の形態では、ステレオ画像から立体物を抽出する候補抽出方法を用いたが他の方法を用いることもできる。例えば、動画像撮像手段として遠赤外線カメラを用いることもできる。遠赤外カメラによる撮像画像では高温の物体の濃度値が大きくなるという性質があるため、候補抽出手段では単純な２値化や動的2値化法などにより歩行者を含む高温物体を候補として抽出することができる。また、候補抽出手段として、ミリ波レーダやレーザレーダ等の他のセンサを用いることも可能である。 In the present embodiment, the candidate extraction method for extracting a three-dimensional object from a stereo image is used, but other methods can also be used. For example, a far-infrared camera can be used as the moving image capturing means. In the image picked up by the far-infrared camera, the density value of the high-temperature object is large, so the candidate extraction means uses a simple binarization or dynamic binarization method as a candidate for high-temperature objects including pedestrians. Can be extracted. Further, other sensors such as a millimeter wave radar and a laser radar can be used as the candidate extraction means.
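
For the far-infrared alternative, the two binarization options can be sketched as a fixed global threshold and, as one possible reading of "dynamic binarization", a local-mean adaptive threshold in which each pixel is compared with the mean of its neighborhood. The neighborhood radius, the offset, and the function names are illustrative assumptions; images are plain lists of intensity rows.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def binarize_fixed(img: Image, threshold: float) -> List[List[int]]:
    """1 where the intensity is at least threshold (warm objects), else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def binarize_local_mean(img: Image, radius: int = 1, offset: float = 0.0) -> List[List[int]]:
    """1 where a pixel exceeds the mean of its neighborhood (clipped at borders) plus offset."""
    h, w = len(img), len(img[0])
    out: List[List[int]] = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(1 if img[y][x] > sum(vals) / len(vals) + offset else 0)
        out.append(row)
    return out
```

The fixed threshold is simplest but depends on the absolute temperature scale; the local-mean version adapts to a warm background such as sunlit asphalt, at the cost of more computation per pixel.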

次に、ステップ203（識別値算出部103の動作）を説明する。 Next, step 203 (operation of the identification value calculation unit 103) will be described.

本実施の形態では、検出対象としての人のモデルを機械学習により予め生成して識別器104に記憶し、識別値算出部103は、これを用いて識別値を算出する。図６は、機械学習により人のモデルを生成する過程を示した概念図である。図６に示すように、歩行者と非歩行者のサンプルを多数用意し、これらの画素サイズが識別に適したサイズとなるように正規化する。 In the present embodiment, a human model as a detection target is generated in advance by machine learning and stored in the classifier 104, and the identification value calculation unit 103 calculates an identification value using this. FIG. 6 is a conceptual diagram showing a process of generating a human model by machine learning. As shown in FIG. 6, a large number of samples of pedestrians and non-pedestrians are prepared, and normalized so that these pixel sizes are suitable for identification.

歩行者を検出対象として識別する場合における横方向のサイズは10〜40画素であり、縦方向サイズは横方向サイズの２，３倍程度である。サイズを正規化されたサンプルは、特徴量に変換される。特徴量としては、例えば画像の濃度値やエッジ等を用いることができる。図６に示した特徴量は、上下方向、左右方向、斜め2方向の４つの方向のエッジである。更にこのエッジを低解像度化して画素数を削減したものは４方向面特徴と呼ばれる特徴量として知られている。特徴量は多次元ベクトルとして表すと、１つのサンプルは多次元空間即ち特徴量空間の中の１つの点である。図６では、多次元の特徴量空間を2次元で模式的に示しており、歩行者のサンプルを●印、非歩行者のサンプルを＋印で示している。機械学習は、多数のサンプルの分布を学習することにより、特徴量空間内での歩行者と非歩行者の分布の境界を求めることに等しい。機械学習の方法の１つであるサポートベクターマシンでは、多数のサンプルから境界付近のサンプルが自動的に抽出される。この抽出されたサンプルの特徴量ベクトルはサポートベクトルと呼ばれ、識別境界はこのサポートベクトルによって表される。識別器104には、サポートベクトルとその乗数が記憶されており、これが学習により生成された検出対象（歩行者）のモデルである。 In the case of identifying a pedestrian as a detection target, the horizontal size is 10 to 40 pixels, and the vertical size is about 2 to 3 times the horizontal size. The sample whose size is normalized is converted into a feature amount. As the feature amount, for example, an image density value or an edge can be used. The feature amounts shown in FIG. 6 are edges in four directions, ie, the up and down direction, the left and right direction, and the two diagonal directions. Further, the resolution of the edge is reduced to reduce the number of pixels, which is known as a feature quantity called a four-direction plane feature. When the feature quantity is expressed as a multidimensional vector, one sample is one point in the multidimensional space, that is, the feature quantity space. In FIG. 6, the multi-dimensional feature amount space is schematically shown in two dimensions, with the pedestrian sample indicated by ● and the non-pedestrian sample indicated by +. Machine learning is equivalent to obtaining the boundary between the distribution of pedestrians and non-pedestrians in the feature space by learning the distribution of a large number of samples. In a support vector machine, which is one of machine learning methods, samples near the boundary are automatically extracted from a large number of samples. 
The feature vector of the extracted sample is called a support vector, and the identification boundary is represented by this support vector. The discriminator 104 stores a support vector and its multiplier, and this is a detection target (pedestrian) model generated by learning.
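
A simplified sketch of four-direction edge features with resolution reduction: absolute intensity differences are taken along the horizontal, vertical, and two diagonal directions, and each of the four edge planes is then block-averaged to reduce the number of elements before the planes are concatenated into one feature vector. This only illustrates the idea; the exact edge operators and pooling of the four-directional plane feature are not given in this passage, and the block size is an assumption.

```python
from typing import List, Sequence


def four_direction_features(img: Sequence[Sequence[float]], block: int = 2) -> List[float]:
    """Edge strength along 4 directions, block-averaged and concatenated into one vector."""
    h, w = len(img), len(img[0])
    # Neighbor offsets (dy, dx): horizontal, vertical, diagonal, anti-diagonal.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]
    features: List[float] = []
    for dy, dx in offsets:
        # Edge plane: absolute intensity difference to the neighbor in this direction.
        plane = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    plane[y][x] = abs(img[yy][xx] - img[y][x])
        # Resolution reduction: average each block x block cell of the plane.
        for by in range(0, h - block + 1, block):
            for bx in range(0, w - block + 1, block):
                cell = [plane[y][x] for y in range(by, by + block)
                        for x in range(bx, bx + block)]
                features.append(sum(cell) / len(cell))
    return features
```

A 4 x 4 patch with `block=2` yields 4 planes x 4 cells = 16 features; a normalized pedestrian window would give a proportionally longer vector, which is the M-dimensional point in feature space used by the SVM.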

図９は本ステップ203の処理フローチャートである。識別値算出部103は、図９のステップ160で、候補抽出部102で抽出された画像中の候補領域において、モデルの生成と同様にサイズを正規化し、ステップ162で、特徴量ベクトルｘを算出する。 FIG. 9 is a flowchart of the processing in step 203. In step 160 of FIG. 9, the identification value calculation unit 103 normalizes the size of each candidate region extracted from the image by the candidate extraction unit 102, in the same way as during model generation, and in step 162 it calculates the feature vector x:

$$\mathbf{x} = (x_1, x_2, \ldots, x_M)^{\mathsf{T}}$$

ここで、Ｍは特徴量ベクトルの次元数である。 Here, M is the number of dimensions of the feature vector.

ステップ164で、識別値ｆを算出する。すなわち、識別器104に記憶されているサポートベクトルをs_i (i=1,…,N)、その乗数をα_iとする。ここで、Nはサポートベクトルの数である。識別値ｆは次式により得られる。 In step 164, the identification value f is calculated. Let the support vectors stored in the discriminator 104 be s_i (i = 1, ..., N) and their multipliers be α_i, where N is the number of support vectors. The identification value f is then given by

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i \, t_i \, K(\mathbf{s}_i, \mathbf{x}) + h$$

ここで、t_iはサンプルのラベルであり、サポートベクターs_iが歩行者サンプルであればt_i=1であり、s_iが非歩行者のサンプルであればt_i=−1である。また、hは定数である。K( )はカーネル関数であり、ガウスカーネルの場合は次式で定義される。 Here, t_i is the label of the sample: t_i = 1 if the support vector s_i is a pedestrian sample, and t_i = −1 if s_i is a non-pedestrian sample. h is a constant. K( ) is the kernel function; for a Gaussian kernel it is defined as

$$K(\mathbf{s}_i, \mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{s}_i \rVert^{2}}{2\sigma^{2}}\right)$$

ここで、σはカーネルの幅を決める定数である。 Here, σ is a constant that sets the width of the kernel.

なお、ガウスカーネル以外にも、シグモイドカーネルや多項式カーネルをカーネル関数として用いることができる。 In addition to the Gaussian kernel, a sigmoid kernel or a polynomial kernel can be used as a kernel function.

そして、ステップ166で、識別値ｆを出力する。 In step 166, the identification value f is output.
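
Steps 164 and 166 can be sketched directly from the SVM decision function f(x) = Σ α_i t_i K(s_i, x) + h with the Gaussian kernel K(s_i, x) = exp(-‖x - s_i‖² / 2σ²). The support vectors, multipliers, labels, h, and σ in the usage note are toy values chosen for illustration, not a trained pedestrian model; in the apparatus they would come from the discriminator 104.

```python
import math
from typing import Sequence


def gaussian_kernel(s: Sequence[float], x: Sequence[float], sigma: float) -> float:
    """K(s, x) = exp(-||x - s||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(s, x))
    return math.exp(-sq / (2.0 * sigma ** 2))


def identification_value(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                         alphas: Sequence[float], labels: Sequence[int],
                         h: float, sigma: float) -> float:
    """f(x) = sum_i alpha_i * t_i * K(s_i, x) + h, with t_i = +1 (pedestrian) or -1."""
    return sum(a * t * gaussian_kernel(s, x, sigma)
               for s, a, t in zip(support_vectors, alphas, labels)) + h
```

With one pedestrian support vector at (0, 0) and one non-pedestrian support vector at (2, 0), unit multipliers, h = 0, and σ = 1, a feature vector at (0, 0) gets a positive value (1 - e^-2), and one midway at (1, 0) gets exactly 0, i.e. it lies on the decision boundary.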

識別値算出部103は、候補抽出部102で特定された位置の周辺の複数の位置及び特定されたスケールに近い複数のスケールで識別値を算出し、算出された複数の識別値の最大値や平均値をその候補領域の識別値として出力することもできる。これにより、候補抽出部102で特定される候補領域の位置やスケールの精度が低い場合でも、識別値を正しく求めることができる。 The identification value calculation unit 103 may also calculate identification values at a plurality of positions around the position specified by the candidate extraction unit 102 and at a plurality of scales close to the specified scale, and output the maximum or the average of these values as the identification value of the candidate region. In this way, the identification value can be obtained correctly even when the position or scale of the candidate region specified by the candidate extraction unit 102 is inaccurate.
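
The search around the extracted position and scale can be sketched generically. Here `score_fn` is a hypothetical stand-in for any function that returns the identification value of the window at (cx, cy) with a given scale (for instance the SVM decision function applied to the normalized window); the pixel shifts and scale factors are assumed values.

```python
from typing import Callable, Sequence


def refined_score(score_fn: Callable[[float, float, float], float],
                  cx: float, cy: float, scale: float,
                  shifts: Sequence[float] = (-2.0, 0.0, 2.0),
                  scale_factors: Sequence[float] = (0.9, 1.0, 1.1),
                  use_mean: bool = False) -> float:
    """Evaluate score_fn on a small grid of positions and scales; return max or mean."""
    values = [score_fn(cx + dx, cy + dy, scale * k)
              for dx in shifts for dy in shifts for k in scale_factors]
    return sum(values) / len(values) if use_mean else max(values)
```

Taking the maximum recovers the score of the best-aligned window when the extracted box is slightly off, while the mean is more conservative; the grid of 3 x 3 positions x 3 scales costs 27 classifier evaluations per candidate.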

次に、ステップ204（候補選択部105の動作）では、識別値が所定値以上である候補領域のみを選択し、識別値及び照合部106で必要な情報を出力する。ここで所定値は、確実に歩行者でないと判断できる候補領域のみ棄却できるように定められ、真に歩行者である候補を棄却することなく候補領域の数を削減できる。この候補選択により次の2つの効果が得られる。第1に、候補領域の数が削減されると後段の照合手段における誤対応を防止することができる。第2に、誤って算出された識別値が後段の判定に用いられる平均値に反映されなくなるため、より正確な判定が可能になる。識別値が誤って算出される原因としては、候補領域が真に歩行者であってもその位置がズレて抽出されることや、識別器の学習が適切でないこと等が挙げられる。候補選択部105の代わりに、図７に示すような特性の変換関数により識別値を変換する手段を設けることにより、上記第２の効果を得ることができる。これは、識別値の下限値をFminにクリップすることにより、一時的に誤って小さくなった識別値の影響を小さくできるためである。 Next, in step 204 (operation of the candidate selection unit 105), only candidate regions whose identification value is equal to or greater than a predetermined value are selected, and their identification values and the information needed by the collation unit 106 are output. The predetermined value is set so that only candidate regions that can be reliably judged not to be pedestrians are rejected; the number of candidate regions can thus be reduced without rejecting candidates that truly are pedestrians. This candidate selection has two effects. First, reducing the number of candidate regions prevents erroneous associations in the subsequent collating means. Second, erroneously calculated identification values are no longer reflected in the average used for the subsequent determination, which makes the determination more accurate. Causes of erroneously calculated identification values include a candidate region that truly is a pedestrian being extracted at a shifted position, and inadequate training of the discriminator. The second effect can also be obtained by replacing the candidate selection unit 105 with a means that converts the identification value using a conversion function with the characteristic shown in FIG. 7. This is because clipping the identification value at the lower limit Fmin reduces the influence of identification values that have temporarily and erroneously become small.
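
Both options can be sketched in a few lines, using the threshold of -0.5 from the FIG. 3B example. The clipping function assumes the simplest characteristic consistent with the description (identity above Fmin, constant Fmin below it); the actual curve of FIG. 7 is not reproduced here, and the function names are hypothetical. The last lines show how clipping limits the damage that a single erroneous value does to an average.

```python
from typing import List, Sequence, Tuple


def select_candidates(candidates: Sequence[Tuple[str, float]],
                      min_score: float = -0.5) -> List[Tuple[str, float]]:
    """Keep only (name, identification value) pairs with value >= min_score."""
    return [c for c in candidates if c[1] >= min_score]


def clip_score(f: float, f_min: float = -0.5) -> float:
    """Clip the identification value from below at f_min."""
    return max(f, f_min)


history = [1.0, 1.0, -3.0]  # one frame with an erroneously low value
raw_mean = sum(history) / len(history)                             # about -0.333
clipped_mean = sum(clip_score(f) for f in history) / len(history)  # 0.5
```

Selection drops the outlier entirely, whereas clipping keeps the frame but bounds its pull on the average; in this example the clipped average stays positive while the raw one turns negative.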

次に、ステップ205（照合部106の動作）について説明する。 Next, step 205 (operation of the collation unit 106) will be described.

記憶部107には、過去に抽出された候補領域の系列毎に、照合に必要な情報及び判定に必要な情報が記憶されている。以下は、１つの系列の記憶内容の一例である。 The storage unit 107 stores, for each series of candidate regions extracted in the past, the information necessary for collation and the information necessary for determination. The following is an example of the contents stored for one series.
（１） F0、F1、…Fn (1) F0, F1, ..., Fn
（２） J0、J1、…Jn (2) J0, J1, ..., Jn
（３）Ｘ，Ｙ，Ｚ (3) X, Y, Z
（４） L (4) L
（５）内部パラメータ (5) Internal parameters

ここで、上記（１）Fi(i=0,..,n)はiフレーム前の識別値であり、対応する候補領域がない場合は識別値がないことを示す値0が記憶されている。 Here, (1) Fi (i = 0, ..., n) is the identification value from i frames earlier; when there is no corresponding candidate region, the value 0 is stored to indicate that no identification value exists.

（２）Ji(i=0,..,n)は、iフレーム前の判定結果であり、歩行者と判定された場合は１であり、非歩行者と判定された場合は0と記憶される。 (2) Ji (i = 0, ..., n) is the determination result from i frames earlier: 1 is stored if the series was determined to be a pedestrian, and 0 if it was determined to be a non-pedestrian.

（３）Ｘ，Ｙ，Ｚは、系列の直前の位置ベクトルである。 (3) X, Y, Z is the most recent position vector of the series.

（４）Lはロストした回数である。 (4) L is the number of times the series has been lost.

（５）は照合に必要な内部パラメータであり、後述するカルマンフィルタを用いる場合は、速度ベクトル(Vx,Vy,Vz)等の内部状態変数やその共分散行列等のパラメータである。 (5) is an internal parameter necessary for collation. When a Kalman filter described later is used, it is an internal state variable such as a velocity vector (Vx, Vy, Vz) or a parameter such as a covariance matrix thereof.
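
The per-series record listed above might be held in a structure like the following. The field names, the fixed history length, and the rule that `lost` is incremented on every frame without a matching region are assumptions for illustration (the document does not say whether L is cumulative or counts consecutive frames). Following the document, 0 is stored when there is no identification value.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SeriesRecord:
    scores: List[float] = field(default_factory=list)   # (1) F0..Fn, 0 = no region
    judgments: List[int] = field(default_factory=list)  # (2) J0..Jn, 1 = pedestrian
    position: Vec3 = (0.0, 0.0, 0.0)                     # (3) X, Y, Z
    lost: int = 0                                        # (4) L
    velocity: Vec3 = (0.0, 0.0, 0.0)                     # (5) e.g. Kalman state
    covariance: List[List[float]] = field(default_factory=list)  # (5) e.g. Kalman covariance

    def push(self, score: Optional[float], judgment: int, max_len: int) -> None:
        """Record one frame as the newest entry (index 0) and drop history beyond max_len."""
        if score is None:
            self.lost += 1
            score = 0.0  # 0 marks "no identification value", as in the document
        self.scores.insert(0, score)
        self.judgments.insert(0, judgment)
        del self.scores[max_len:]
        del self.judgments[max_len:]
```

Storing the newest entry at index 0 makes Fi and Ji line up directly with "i frames earlier", so a window average over the last w frames is simply a slice of the first w elements.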

そして、本ステップ205では、照合部106は、記憶部107に記憶された系列と候補選択部105から出力された候補領域とを照合して対応付けを行い、その対応付けの結果に基づいて記憶部107の記憶内容を更新する。 In this step 205, the collation unit 106 collates the sequence stored in the storage unit 107 with the candidate area output from the candidate selection unit 105, associates them, and stores them based on the result of the association. The stored contents of the unit 107 are updated.

まず、照合部106は、候補選択部105で出力された候補領域と、記憶部107に記憶されている系列のすべての組み合わせについてその対応度C(i,j) (i=1,…,Nc; j=1,…,Ns)を算出する。 First, the matching unit 106 determines the correspondence C (i, j) (i = 1,..., Nc) for all combinations of the candidate area output from the candidate selection unit 105 and the series stored in the storage unit 107. ; j = 1, ..., Ns) is calculated.

ここで、iは候補領域の番号であり、Ncはその数である。ｊは系列の番号であり、Nsはその数である。 Here, i is the number of the candidate area, and Nc is the number thereof. j is the number of the series and Ns is the number.
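
The document does not define the correspondence degree C(i, j). As one hypothetical choice, it can be a Gaussian of the distance between candidate i's real-space position and series j's predicted position, so that C lies in (0, 1] and a larger value means a better match. The `sigma` value is an assumption, and the sketch uses 0-based indices rather than the document's 1-based ones.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def correspondence_matrix(candidates: Sequence[Vec3], predictions: Sequence[Vec3],
                          sigma: float = 0.5) -> List[List[float]]:
    """C[i][j] for every candidate i (rows) and series j (columns)."""
    def degree(a: Vec3, b: Vec3) -> float:
        d2 = sum((p - q) ** 2 for p, q in zip(a, b))
        return math.exp(-d2 / (2.0 * sigma ** 2))
    return [[degree(cand, pred) for pred in predictions] for cand in candidates]
```

A template-based method would fill the same matrix with correlation values instead; the association procedure that follows only needs the matrix and the threshold T.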

上記の対応度を求めた後に、図10に示したフローチャートに基づいて対応付けを行う。 After obtaining the degree of correspondence, association is performed based on the flowchart shown in FIG.

図１０のステップ801では、変数の初期化を行う。Rc(i) (i=1,..,Nc)はi番目の候補領域が対応する系列の番号を格納する変数であり、すべてのiについてRc(i)=0と初期化する。Rc(i)=0はi番目の候補領域に対応する系列が存在しないことを示す。Rs(j) (j=1,..,Ns)はj番目の系列が対応する候補領域の番号を格納する変数であり、すべてのjについてRs(j)=0と初期化する。Rs(j)=0はj番目の系列に対応する候補領域が存在しないことを示す。Mc(i) (i=1,..,Nc)はi番目の候補領域に対して対応度が最大となる系列との対応度を格納する変数であり、すべてのiについてMc(i)=Tと初期化する。Tは候補領域と系列とが対応するための対応度の閾値である。ステップ802で、変数ｉを１に初期化する。 In step 801 in FIG. 10, the variables are initialized. Rc (i) (i = 1,..., Nc) is a variable for storing the sequence number corresponding to the i-th candidate region, and is initialized to Rc (i) = 0 for all i. Rc (i) = 0 indicates that there is no sequence corresponding to the i-th candidate region. Rs (j) (j = 1,..., Ns) is a variable for storing the number of the candidate area corresponding to the jth series, and is initialized to Rs (j) = 0 for all j. Rs (j) = 0 indicates that there is no candidate region corresponding to the j-th sequence. Mc (i) (i = 1,.., Nc) is a variable for storing the degree of correspondence with the series having the maximum degree of correspondence with respect to the i-th candidate region, and Mc (i) = Initialize with T. T is a threshold value of the degree of correspondence for the candidate area and the series to correspond. In step 802, a variable i is initialized to 1.

ステップ803、804、805、806、807のループにおいて、i番目の候補領域に対して最大の対応度Mc(i)をもつ系列の番号Rc(i)が決定される。すなわち、ステップ803で、変数ｊを１に初期化し、ステップ804で、i番目の候補領域とj番目の系列との対応度C(i,j) が、Mc(i) より大きいか否かを判断することにより、閾値Ｔ以上且つj番目より前の系列との対応度より大きいか否かを判断する。 In the loop of steps 803 to 807, the number Rc(i) of the series that has the maximum correspondence Mc(i) with the i-th candidate region is determined. That is, the variable j is initialized to 1 in step 803, and in step 804 it is determined whether the correspondence C(i, j) between the i-th candidate region and the j-th series is greater than Mc(i), i.e., whether it exceeds both the threshold T and the correspondence with every series examined before the j-th.

C(i,j) がMc(i) より大きくないと判断された場合には、ステップ806に進む。C(i,j) がMc(i) より大きいと判断された場合には、ステップ805で、Rc(i)をjにセットする、すなわち、i番目の候補領域に対応する系列として、j番目の系列を示すｊを記憶する。これにより、i番目の候補領域に対応する系列が、Rc(i)に記憶されているｊにより識別される系列であることを把握することができる。 If C(i, j) is determined not to be greater than Mc(i), the process proceeds to step 806. If C(i, j) is determined to be greater than Mc(i), Rc(i) is set to j in step 805; that is, j, which identifies the j-th series, is stored as the series corresponding to the i-th candidate region. This makes it possible to know that the series corresponding to the i-th candidate region is the one identified by the j stored in Rc(i).

また、本ステップ805で、Mc(i)に、i番目の候補領域とj番目の系列との対応度C(i,j)を記憶する。これにより、i番目の候補領域と該i番目の候補領域に対応する系列との対応度がMc(i)に記憶されているC(i,j)により把握される。 In step 805, the correspondence degree C (i, j) between the i-th candidate region and the j-th sequence is stored in Mc (i). As a result, the degree of correspondence between the i-th candidate region and the sequence corresponding to the i-th candidate region is grasped by C (i, j) stored in Mc (i).

ステップ806で、変数jを１インクリメントし、ステップ807で、変数ｊが、系列の総数Ｎｓより大きいか否かを判断することにより、i番目の候補領域と最も対応する系列がどれであるか全ての系列について判断されたか否かを判断することができる。 In step 806, the variable j is incremented by 1, and in step 807 it is determined whether j is greater than the total number of series Ns, that is, whether all series have been examined to find the one that best corresponds to the i-th candidate region.

変数ｊが、系列の総数Ｎｓより大きいと判断されなかった場合には、i番目の候補領域と最も対応する系列がどれであるか全ての系列について判断されていないので、ステップ804に戻って以上の処理（ステップ804〜807）を実行する。なお、ステップ801においてMc(i)=Tと初期化されているので、Ｔより大きい対応度C(i,j) (j=1,…,Ns)が存在しない場合は初期値Rc(i)=0は更新されず、即ち対応する系列はないと判断される。 If j is not greater than Ns, not all series have yet been examined, so the process returns to step 804 and repeats the above processing (steps 804 to 807). Because Mc(i) was initialized to T in step 801, if no correspondence C(i, j) (j = 1, ..., Ns) is greater than T, the initial value Rc(i) = 0 is never updated, meaning that no corresponding series exists.

一方、変数ｊが、系列の総数Ｎｓより大きいと判断された場合には、i番目の候補領域と最も対応する系列がどれであるか全ての系列について判断された、すなわち、i番目の候補領域に対して最大の対応度Mc(i)をもつ系列の番号Rc(i)が決定されたので、ステップ808に進む。 On the other hand, if it is determined that the variable j is greater than the total number Ns of series, it is determined for all the series which is the series most corresponding to the i-th candidate area, that is, the i-th candidate area. Since the sequence number Rc (i) having the maximum correspondence Mc (i) is determined, the process proceeds to step 808.

ステップ808で、Rc(i)が0より大きいか否かを判断する。Rc(i)が0より大きくない、すなわち、０である場合は、i番目の候補領域に対応する系列が存在しないので、i番目の候補領域に対する処理を終了してステップ814へ進む。 In step 808, it is determined whether Rc (i) is greater than zero. If Rc (i) is not greater than 0, that is, 0, there is no sequence corresponding to the i-th candidate region, so the processing for the i-th candidate region is terminated and the process proceeds to step 814.

If Rc(i) is greater than 0, a series corresponding to the i-th candidate region exists. In step 809, it is determined whether the Rc(i)-th series is already associated with another candidate region. If it is not, variables are set in step 810 so that the Rc(i)-th series is associated with the i-th candidate region.

If step 809 determines that the Rc(i)-th series is already associated with the Rs(Rc(i))-th candidate region, the correspondence degrees are compared in step 811. If the i-th candidate region has a higher correspondence degree to the Rc(i)-th series than the already associated Rs(Rc(i))-th candidate region, variables are set in step 812 so that the Rs(Rc(i))-th candidate region no longer has a corresponding series. If, instead, the i-th candidate region has a lower correspondence degree to the Rc(i)-th series than the Rs(Rc(i))-th candidate region, variables are set in step 813 so that the i-th candidate region has no corresponding series.

In step 814, the variable i is incremented by 1. In step 815, it is determined whether i exceeds the total number of candidate regions Nc, that is, whether every candidate region has been processed.

If i does not exceed Nc, some candidate regions have not been processed yet, so the process returns to step 803 and repeats steps 803 to 815. If i exceeds Nc, every candidate region has been processed and the association process ends.
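
The association loop of steps 801 to 815 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the correspondence degrees C(i,j) are assumed to be precomputed into a matrix, indices are 0-based with -1 (rather than 0) meaning "no series", and when two correspondence degrees are exactly equal the earlier candidate keeps the series, a case the text does not specify. The function name `associate` is hypothetical.

```python
from typing import List, Sequence


def associate(C: Sequence[Sequence[float]], T: float) -> List[int]:
    """Greedy candidate-to-series association modelled on steps 801-815.

    C[i][j] is the correspondence degree between candidate i and series j.
    Returns rc, where rc[i] is the series matched to candidate i, or -1.
    """
    nc = len(C)
    ns = len(C[0]) if nc else 0
    rc = [-1] * nc  # Rc(i): series chosen for candidate i (-1 = none)
    rs = [-1] * ns  # Rs(j): candidate currently holding series j (-1 = none)
    for i in range(nc):
        mc, best = T, -1  # Mc(i) starts at the threshold T
        for j in range(ns):  # steps 804-807: scan every series
            if C[i][j] > mc:
                mc, best = C[i][j], j
        rc[i] = best
        if best < 0:  # step 808: no series exceeds T
            continue
        holder = rs[best]
        if holder < 0:  # steps 809-810: the series is still free
            rs[best] = i
        elif C[i][best] > C[holder][best]:  # steps 811-812: candidate i wins
            rc[holder] = -1
            rs[best] = i
        else:  # step 813: the earlier candidate keeps the series (ties included)
            rc[i] = -1
    return rc


# Candidates 0 and 1 both prefer series 0; candidate 2 exceeds T for no series.
print(associate([[0.9, 0.1], [0.8, 0.2], [0.05, 0.05]], T=0.15))  # [0, -1, -1]
```

As in the described procedure, a candidate that loses its best series in step 812 or 813 is not re-matched to its second-best series: in the example, candidate 1 stays unassociated even though C[1][1] = 0.2 exceeds T, and step 924 would later start a new series for it.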

The correspondence degree can be calculated, and the information stored in the storage unit 107 updated, using a Kalman filter, for example. In that case, the storage unit 107 holds the position vector in the previous frame and internal parameters describing the state in the previous frame. The internal parameters include, for example, the covariance matrix of the position vector or of an internal variable vector, and the system noise. From this previous-frame state, the Kalman filter predicts the position vector and state in the current frame. Because the prediction includes the covariance matrix of the position vector, the correspondence degree between a series and a candidate region can be obtained, for example, from the Mahalanobis distance between the predicted position vector and the position vector of the candidate region. Since the Mahalanobis distance measures separation rather than closeness, the correspondence degree is taken as its reciprocal or its negative.
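
The sketch below illustrates the Kalman prediction step and the negative-Mahalanobis-distance correspondence degree in plain Python (no NumPy, so it stays self-contained). The constant-velocity state [px, py, vx, vy], the matrices F, H, Q, R and all their values are hypothetical; the text only states that a position vector and internal state parameters are stored.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def predict(x: Sequence[float], P: Matrix, F: Matrix, Q: Matrix):
    """Kalman prediction: propagate the previous-frame state to the current frame."""
    x_pred = [row[0] for row in matmul(F, [[v] for v in x])]
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    return x_pred, P_pred


def correspondence(x_pred, P_pred, H, R, z) -> float:
    """Negative Mahalanobis distance between the predicted 2-D position and z."""
    S = add(matmul(matmul(H, P_pred), transpose(H)), R)  # predicted-position covariance
    hx = [row[0] for row in matmul(H, [[v] for v in x_pred])]
    y = [z[0] - hx[0], z[1] - hx[1]]  # candidate minus prediction
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse of S
    d2 = sum(y[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))
    return -math.sqrt(d2)  # closer to 0 = better correspondence


# Hypothetical constant-velocity model: state [px, py, vx, vy], one frame per step.
F = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
H = [[1, 0, 0, 0], [0, 1, 0, 0]]  # only the position is observed
Q = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]  # system noise
R = [[1.0, 0.0], [0.0, 1.0]]  # measurement noise
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

x_pred, P_pred = predict([0.0, 0.0, 1.0, 0.0], I4, F, Q)
print(correspondence(x_pred, P_pred, H, R, [1.0, 0.0]))  # -0.0 (on the prediction)
print(correspondence(x_pred, P_pred, H, R, [3.0, 0.0]))  # about -1.153
```

Using the negative distance keeps "larger is better", which matches the maximum search in steps 804 to 807; the reciprocal would also work but diverges when a candidate lies exactly on the prediction.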

As shown in FIG. 11, the information stored in the storage unit 107 is updated as follows. In step 902, a variable k identifying a series is initialized to 0, and in step 904, k is incremented by 1. In step 906, it is determined whether a candidate region corresponding to series k exists. If none exists, the lost count of series k is incremented by 1 in step 908, and in step 910 it is determined whether the lost count is equal to or greater than a predetermined value. If it is, series k is deleted in step 912. If the lost count is less than the predetermined value, the stored contents are updated in step 914 to the position vector and state that were predicted when the correspondence degree was calculated.

If step 906 determines that a corresponding candidate region exists, the position vector and state in the current frame are estimated in step 916 from the position vector of that candidate region and the stored position vector and state of the previous frame, and in step 918 the stored contents are updated to these estimates.

Unless the series is deleted (that is, in steps 914 and 918), the identification values Fi (i = 0, ..., n) and the determination results Ji (i = 0, ..., n) are also updated. J0, the determination result for the current frame, is updated only after the determination in the subsequent stage.

In step 920, it is determined whether k is equal to or greater than the total number of series Nk. If it is not, the process returns to step 904 and repeats steps 904 to 920. If it is, step 922 determines whether any candidate region has no associated series, and for each such candidate region a new series is generated in step 924. At that time, Fi (i = 0, ..., n) and Ji (i = 0, ..., n) are all initialized to 0 and the lost count is initialized to zero. The position vector of the candidate region is stored as the position vector of the new series, and its internal state is set to an appropriate initial state.
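
The update of FIG. 11 can be sketched as follows, under several assumptions that the text does not fix: the history length is n = 6 (as in the examples below), the stored position is copied directly instead of running the Kalman estimate of steps 914 to 918, the lost count restarts at 0 after a successful match, and a new series records its candidate's identification value in F0 after the zero initialization. The names `Series`, `shift`, `update_series` and `max_lost` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

N = 6  # history length n (assumed; matches the n = 6 examples)


@dataclass
class Series:
    position: Tuple[float, float]
    F: List[float] = field(default_factory=lambda: [0.0] * (N + 1))  # F0..Fn
    J: List[int] = field(default_factory=lambda: [0] * (N + 1))  # J0..Jn
    lost: int = 0


def shift(history, newest):
    """Insert the current-frame value at index 0 and drop the oldest one."""
    return [newest] + history[:-1]


def update_series(series: List[Series], rs: Sequence[int],
                  candidates: Sequence[Tuple[Tuple[float, float], float]],
                  max_lost: int = 3) -> List[Series]:
    """rs[k] is the candidate matched to series k (-1 if none);
    candidates[c] is (position, identification value)."""
    kept = []
    for k, s in enumerate(series):
        c = rs[k]
        if c < 0:
            s.lost += 1  # step 908
            if s.lost >= max_lost:  # steps 910-912: delete the series
                continue
            s.F = shift(s.F, 0.0)  # step 914 (stored position kept as the prediction)
        else:
            pos, value = candidates[c]
            s.position = pos  # steps 916-918 (a Kalman estimate in the text)
            s.F = shift(s.F, value)
            s.lost = 0  # assumption: the lost count restarts after a match
        s.J = shift(s.J, 0)  # J0 is filled in after the determination
        kept.append(s)
    matched = {c for c in rs if c >= 0}
    for c, (pos, value) in enumerate(candidates):  # steps 922-924: new series
        if c not in matched:
            s = Series(position=pos)  # F, J and lost count start at 0
            s.F[0] = value  # assumption: record the first identification value
            kept.append(s)
    return kept
```

Shifting the histories by one slot per frame keeps Fi and Ji aligned with "i frames before the current frame", which is what the evaluation values below rely on.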

Next, step 207 (the operation of the calculation unit 108 and the determination unit 109) is described in detail. The calculation unit 108 refers to the information stored in the storage unit 107 and calculates the evaluation values needed to determine whether a series is a pedestrian. The determination unit 109 makes a determination based on the evaluation values calculated by the calculation unit 108 and outputs the result. The evaluation values calculated by the calculation unit 108 are shown below.

Si is the weighted average of the identification values from the current frame back to i frames before, and NJ is the number of times the series was determined to be a pedestrian from n frames before up to 1 frame before. The weights of the weighted average can be defined as follows:

In a frame with no corresponding candidate region (Fi = 0), both the identification value and the weight are 0, so that frame does not contribute to the average. Alternatively, the weights can be defined as follows:

Here, r > 1.

With this definition, identification values from frames closer to the current frame carry more weight in the average, that is, in the evaluation value.
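
The weight formulas themselves are not reproduced in this text, so the sketch below assumes the simplest forms consistent with the description: a frame with Fm = 0 gets weight 0, and any other frame gets weight r^(-m), where r = 1 gives a plain average and r > 1 favours recent frames. NJ is taken as the sum of J1 to Jn. The function name `evaluation_values` is hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluation_values(F: Sequence[float], J: Sequence[int],
                      r: float = 1.0) -> Tuple[List[float], int]:
    """Return (S, NJ) for one series.

    S[i] is a weighted average of F[0..i]. Frames with F[m] == 0 (no associated
    candidate) get weight 0; other frames get weight r**(-m) (assumed form).
    NJ counts the pedestrian judgments J1..Jn.
    """
    S = []
    num = den = 0.0
    for m, f in enumerate(F):
        w = 0.0 if f == 0 else r ** (-m)
        num += w * f
        den += w
        S.append(num / den if den > 0 else 0.0)  # no candidate yet: S = 0
    NJ = sum(J[1:])
    return S, NJ
```

For F = [0.8, 0.0, 0.4] and r = 1, the missing middle frame is skipped, so S0 = S1 = 0.8 and S2 = (0.8 + 0.4) / 2 = 0.6; with r = 2 the older frame is down-weighted to 0.25 and S2 = 0.9 / 1.25 = 0.72.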

The determination unit 109 uses these evaluation values to make a determination based on predetermined determination conditions. Several example conditions for n = 6 are given below.

First example
The average identification values (evaluation values) S0 to S6 of the candidate region, each calculated back to one of several different times up to n = 6, are compared, time by time, with thresholds T0 to T6 that become smaller the further back the average extends.
(1) S0 > T0
(2) S1 > T1
(3) S2 > T2
(4) S3 > T3
(5) S4 > T4
(6) S5 > T5
(7) S6 > T6
where Ti > Ti+1 (i = 0, ..., n-1), i.e., the threshold decreases as the averaging window reaches further back in time.
The object is determined to be a pedestrian when one or more of conditions (1) to (7) is satisfied.
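
The first example reduces to a single "any" test over paired values. In this minimal sketch the threshold list `T_EXAMPLE` is hypothetical (the text gives no numbers), and the function name is illustrative.

```python
from typing import Sequence


def is_pedestrian_rule1(S: Sequence[float], T: Sequence[float]) -> bool:
    """First example: pedestrian if S_i > T_i for at least one i."""
    if any(T[i] <= T[i + 1] for i in range(len(T) - 1)):
        raise ValueError("thresholds must satisfy T_i > T_(i+1)")
    return any(s > t for s, t in zip(S, T))


# Hypothetical thresholds T0..T6, decreasing with the averaging window length.
T_EXAMPLE = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4]
print(is_pedestrian_rule1([0.95, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], T_EXAMPLE))  # True via (1)
print(is_pedestrian_rule1([0.55] * 7, T_EXAMPLE))  # True via (5): 0.55 > 0.5
print(is_pedestrian_rule1([0.35] * 7, T_EXAMPLE))  # False
```

A very high current-frame value clears the strict threshold T0 immediately, while a moderate value can only pass a condition on a longer averaging window with a lower threshold.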

Second example
The evaluation values S0 to S6, each calculated back to one of several different times up to n = 6, are compared, time by time, with values determined by the decreasing thresholds T0 to T6 and by determination information, here the previous determination result J1 (the thresholds are divided by J1 + 1, so they are halved when the series was judged a pedestrian in the previous frame).
(1) S0 > T0 / (J1 + 1)
(2) S1 > T1 / (J1 + 1)
(3) S2 > T2 / (J1 + 1)
(4) S3 > T3 / (J1 + 1)
(5) S4 > T4 / (J1 + 1)
(6) S5 > T5 / (J1 + 1)
(7) S6 > T6 / (J1 + 1)
where Ti > Ti+1 (i = 0, ..., n-1), i.e., the threshold decreases as the averaging window reaches further back in time.
The object is determined to be a pedestrian when one or more of conditions (1) to (7) is satisfied.

Third example
The evaluation values S0 to S6 are compared, time by time, with values determined by the decreasing thresholds T0 to T6 and by determination information, here the number of times NJ the series has been determined to be a pedestrian (the thresholds are multiplied by 6 / (NJ + 6), so they decrease as NJ grows).
(1) S0 > T0 × 6 / (NJ + 6)
(2) S1 > T1 × 6 / (NJ + 6)
(3) S2 > T2 × 6 / (NJ + 6)
(4) S3 > T3 × 6 / (NJ + 6)
(5) S4 > T4 × 6 / (NJ + 6)
(6) S5 > T5 × 6 / (NJ + 6)
(7) S6 > T6 × 6 / (NJ + 6)
where Ti > Ti+1 (i = 0, ..., n-1), i.e., the threshold decreases as the averaging window reaches further back in time.
The object is determined to be a pedestrian when one or more of conditions (1) to (7) is satisfied.

Fourth example
For the times close to the current frame (i = 0 to 3), the evaluation values S0 to S3 are compared, time by time, with the decreasing thresholds T0 to T3. For the times further back (i = 4 to 6), the number of pedestrian determinations NJ is additionally taken into account.
(1) S0 > T0
(2) S1 > T1
(3) S2 > T2
(4) S3 > T3
(5) S4 > T4 and NJ ≥ 1
(6) S5 > T5 and NJ ≥ 2
(7) S6 > T6 and NJ ≥ 3
where Ti > Ti+1 (i = 0, ..., n-1).
The object is determined to be a pedestrian when one or more of conditions (1) to (7) is satisfied.

When a series is determined to be a pedestrian under the conditions above, J0 is set to 1; otherwise J0 is set to 0. The contents of the storage unit 107 are then updated accordingly.
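
The second to fourth examples differ only in how the determination information modifies the comparison, which the following sketch makes explicit. The thresholds are hypothetical, the function names are illustrative, and `n` in `rule3` generalizes the constant 6 used in the text.

```python
from typing import Sequence


def rule2(S: Sequence[float], T: Sequence[float], J1: int) -> bool:
    """Second example: thresholds divided by (J1 + 1)."""
    return any(s > t / (J1 + 1) for s, t in zip(S, T))


def rule3(S: Sequence[float], T: Sequence[float], NJ: int, n: int = 6) -> bool:
    """Third example: thresholds scaled by n / (NJ + n), with n = 6 in the text."""
    return any(s > t * n / (NJ + n) for s, t in zip(S, T))


def rule4(S: Sequence[float], T: Sequence[float], NJ: int) -> bool:
    """Fourth example: conditions (5)-(7) also require NJ >= 1, 2, 3."""
    min_nj = [0, 0, 0, 0, 1, 2, 3]
    return any(s > t and NJ >= k for s, t, k in zip(S, T, min_nj))


def judgment(decided: bool) -> int:
    """J0 = 1 for a pedestrian, 0 otherwise."""
    return 1 if decided else 0


T_EXAMPLE = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4]  # hypothetical thresholds
S = [0.35] * 7
print(rule2(S, T_EXAMPLE, J1=0), rule2(S, T_EXAMPLE, J1=1))  # False True
print(rule3(S, T_EXAMPLE, NJ=0), rule3(S, T_EXAMPLE, NJ=6))  # False True
print(rule4([0.55] * 7, T_EXAMPLE, NJ=0), rule4([0.55] * 7, T_EXAMPLE, NJ=1))  # False True
```

In all three rules, past pedestrian judgments make acceptance easier: rules 2 and 3 relax the thresholds, while rule 4 only allows the long, low-threshold windows to fire once the series has some judgment history.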

In the embodiment described above, the candidate extraction unit 102 and the identification value calculation unit 103 process the output image frame by frame. Candidate regions of the detection target can therefore be extracted and their identification values output without being affected by background motion, whether the camera is moving or stationary. Methods such as that of Non-Patent Document 2 mentioned above also use processing results from past frames (times) for the final determination, but those results are binary information indicating only the presence or absence of the detection target. In this embodiment, by contrast, the past information used in the final determination is continuous-valued information representing the degree of similarity to a pedestrian. This embodiment can therefore make a more accurate final determination than the prior art.

In this embodiment, the candidate selection unit 105 selects only candidate regions whose identification value is equal to or greater than a predetermined value. Because this value is set so that candidate regions that can reliably be judged not to be pedestrians are rejected, the number of candidate regions is reduced; this prevents incorrect associations in the downstream collation unit 106 and makes the determination in the determination unit 109 more accurate. The latter holds because, when the identification value is erroneously and temporarily low, for example because the candidate region was extracted at a shifted position, the candidate region is rejected and is therefore not reflected in the determination. When pedestrians are detected with a moving camera, candidate regions are prone to positional shifts, and identification values are prone to error because the appearance changes over time. Candidate selection is therefore particularly effective in this setting.

Furthermore, in this embodiment, as described with reference to FIG. 11, the series of candidate regions used for collation are not limited to those associated with a candidate region in the immediately preceding frame; they also include series that have had no associated candidate region for up to a predetermined number of frames. Therefore, even when a candidate region is temporarily missing, an appropriate determination can be made using the identification values from the associated frames before and after the gap.

In addition, because the determination in this embodiment is based on time averages of the identification values, the final result is robust to frame-to-frame fluctuations in those values. Moreover, an object with a large identification value needs fewer frames to be detected, and an object with a small identification value needs more. Objects that are unlikely to be misjudged are thus detected quickly from a few frames, while objects that are easily misjudged are judged more accurately from many frames.

When objects are detected with a fixed camera, background subtraction or frame differencing can be applied, so candidate regions can be detected relatively stably. With a moving camera, it is difficult to separate background from foreground, so extracted candidate regions are prone to temporary dropouts and positional shifts. A shifted candidate region also leads to an incorrectly calculated identification value. Moreover, even when a candidate region is extracted at the correct position, identification values are error-prone for objects, such as pedestrians, whose appearance is diverse and varies over time.

As described above, it has conventionally been difficult to determine accurately, from images captured by a moving camera, whether an object whose appearance is diverse and varies over time is a detection target. The present embodiment makes this possible.

In the embodiment described above, the output of a classifier generated by machine learning is used as the identification value. Any other value that expresses how likely a candidate region is to be the detection target can also serve as the identification value. For example, a representative sample of the detection target can be used as a template, and the correlation value with that template can be used as the identification value.
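
One common choice of "correlation value with a template" is zero-mean normalized cross-correlation; the sketch below assumes that choice and assumes the candidate patch has already been resized to the template size. Because Fi = 0 marks frames without a candidate in the scheme above, a correlation in [-1, 1] would presumably have to be mapped to positive values before use; that mapping is not specified here. The function name is hypothetical.

```python
import math
from typing import Sequence


def template_correlation(patch: Sequence[Sequence[float]],
                         template: Sequence[Sequence[float]]) -> float:
    """Zero-mean normalized cross-correlation in [-1, 1] between a candidate
    patch and a template of the same size (both given as lists of rows)."""
    a = [float(v) for row in patch for v in row]
    b = [float(v) for row in template for v in row]
    if len(a) != len(b):
        raise ValueError("resize the patch to the template size first")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # flat patch or template: correlation undefined, treat as 0
    return sum(x * y for x, y in zip(da, db)) / denom


template = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # toy "cross" template
brighter = [[2 * v + 5 for v in row] for row in template]
print(round(template_correlation(brighter, template), 6))  # 1.0
```

Subtracting the means and normalizing makes the score insensitive to brightness and contrast changes, as the `brighter` example shows.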

FIG. 1 is a block diagram of the object detection apparatus according to the present embodiment.
FIG. 2 is a flowchart showing the object detection method according to the present embodiment.
FIG. 3 is an explanatory diagram illustrating the selection of candidate regions.
FIG. 4 is an explanatory diagram illustrating the updating of series of candidate regions.
FIG. 5 is an explanatory diagram illustrating the extraction of candidate regions.
FIG. 6 is an explanatory diagram illustrating the generation of a pedestrian model.
FIG. 7 is a diagram showing the conversion function used to convert identification values.
FIG. 8 is a flowchart of the candidate extraction process.
FIG. 9 is a flowchart of the identification value calculation process.
FIG. 10 is a flowchart of the collation process.
FIG. 11 is a flowchart of the process for updating the information stored in the storage unit 107.

Explanation of symbols

101 Imaging unit (imaging means)
102 Candidate extraction unit (acquisition means, extraction means)
103 Identification value calculation unit (acquisition means, identification value calculation means)
104 Classifier (model)
105 Candidate selection unit (acquisition means, selection means)
106 Collation unit (collation means, update means)
107 Storage unit (storage means)
108 Calculation unit (determination means)
109 Determination unit (determination means)

Claims

1. An object detection apparatus comprising:
imaging means for outputting, in time series, images obtained by imaging an imaged region in time series;
acquisition means for acquiring, from each of the images output in time series by the imaging means, a region whose identification value, indicating the degree to which the region is identified as the detection target based on a model of the detection target created in advance, is equal to or greater than a predetermined value;
storage means for storing series of regions, each series consisting of one or more regions that were acquired by the acquisition means from the respective images output in time series and that are associated with one another;
collation means for collating whether the region acquired this time by the acquisition means corresponds to a series of regions stored in the storage means;
update means for updating the storage means so that, as a result of the collation by the collation means, when the region acquired this time corresponds to a series of regions, the region acquired this time is associated with that series, and when the region acquired this time does not correspond to any series, the region acquired this time is stored as a new series; and
determination means for determining whether a series of regions stored and updated in the storage means is a detection target by comparing, for each corresponding time, a plurality of average values of the identification values of the regions in that series, each calculated retroactively back to one of a plurality of different predetermined times, with a plurality of thresholds that become smaller as the retroactive time becomes longer.

2. The object detection apparatus according to claim 1, wherein
the storage means further stores information on the determination by the determination means in association with the series of regions, and
the determination means determines whether the series of regions is a detection target by further taking into account the determination information stored in the storage means for that series.

3. The object detection apparatus according to claim 2, wherein, at predetermined times close to the current time among the plurality of different predetermined times, the determination means compares, for each corresponding time, the plurality of average values of the identification values with the plurality of thresholds that become smaller as the retroactive time becomes longer, and, at predetermined times far from the current time among the plurality of different predetermined times, further takes into account the determination information stored in the storage means.

4. The object detection apparatus according to claim 2, wherein the determination information is at least one of the result of the determination by the determination means and the number of times the series of regions has been determined to be a detection target.

5. The object detection apparatus according to claim 1, wherein the acquisition means includes:
extraction means for extracting candidate regions that may contain a detection target from the images output in time series by the imaging means;
identification value calculation means for calculating, for each candidate region extracted by the extraction means, an identification value indicating the degree to which the candidate region is identified as the detection target based on a model of the detection target created in advance; and
selection means for selecting a candidate region whose identification value calculated by the identification value calculation means is equal to or greater than the predetermined value as the region whose identification value is equal to or greater than the predetermined value.

6. An object detection method comprising:
outputting, in time series, images obtained by imaging an imaged region in time series;
acquiring, from each of the images output in time series, a region whose identification value, indicating the degree to which the region is identified as the detection target based on a model of the detection target created in advance, is equal to or greater than a predetermined value;
collating whether the region acquired this time corresponds to a series of regions, each series consisting of one or more regions previously acquired from the respective images output in time series and associated with one another;
as a result of the collation, when the region acquired this time corresponds to a series of regions, associating the region acquired this time with that series, and when it does not correspond to any series, storing the region acquired this time as a new series; and
determining whether the series of regions associated with the region acquired this time is a detection target by comparing, for each corresponding time, a plurality of average values of the identification values of the regions, each calculated retroactively back to one of a plurality of different predetermined times, with a plurality of thresholds that become smaller as the retroactive time becomes longer.