JP2012073971A - Moving image object detection device, method and program

Info

Publication number: JP2012073971A
Application number: JP2010220190A
Authority: JP
Inventors: Makoto Yonaha; 誠與那覇
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2010-09-30
Filing date: 2010-09-30
Publication date: 2012-04-12

Abstract

PROBLEM TO BE SOLVED: To suppress both erroneous detection and detection omission of an object. SOLUTION: An object detection means 12 detects an object from a frame image of a moving image and stores the position of the detected object in an object list storage part 14. An object tracking processing means 13 tracks the position of each object stored in the object list storage part 14 across multiple frames after the object is detected, and stores the tracked positions in the object list storage part 14. When the object detection means 12 detects a new object, the object tracking processing means 13 also goes back to times earlier than the frame image in which the object was detected, tracks the position of the newly detected object across multiple frames, and stores the tracked positions in the object list storage part 14.

Description

The present invention relates to a moving image object detection apparatus, method, and program, and more particularly to a moving image object detection apparatus, method, and program for detecting an object from a moving image composed of a plurality of frames.

Techniques for detecting an object from a moving image composed of a plurality of frames are known. Techniques for setting an object detected in a certain frame as a tracking target and tracking that object in subsequent frames are also known. For example, Patent Document 1 describes detecting a face and tracking the region of the detected face. In Patent Document 1, a face region detection unit detects face regions frame by frame using a face detection algorithm. A tracking unit refers to the frame-by-frame detection history of face regions recorded in a face region history storage unit and tracks each detected face region over a plurality of consecutive frames. Even when a face region is detected in a certain frame, the tracking unit judges the detected face region to be invalid if that face region is not detected in a predetermined number or more of consecutive frames following that frame.

Patent Document 2 describes a method of processing time-series moving images and separately recognizing moving objects that overlap one another on the image. In Patent Document 2, an image processing unit recognizes moving objects by comparing each of a plurality of time-series images with a background image, and assigns an identification code to each recognized moving object. The image processing unit obtains the motion vector of each moving object included in the image at time t. Assuming that a region on a moving object included in the image at time t-1 has the same identification code, the image processing unit estimates, based on the motion vector, the region on the image at time t that corresponds to that region, and obtains the degree of correlation between the two regions. By determining the identification code to be assigned to each region based on the relative magnitudes of the degrees of correlation obtained for each of a plurality of identification codes, the image processing unit divides a non-separated moving object included in the image at time t-1.

Patent Document 1: JP 2009-5239 A
Patent Document 2: JP 2002-133421 A
Patent Document 3: JP 2010-72723 A

In Patent Document 1, after a face region is detected, the face region is tracked in the future direction. However, in Patent Document 1, even when a face appears in the image, tracking of that face region does not necessarily start immediately. For example, if the face is hidden, is not facing the front, or is small, the face cannot be detected, and the apparatus may fail to recognize that a new tracking target has appeared in the moving image. In such a case, the tracking process starts only once the person's face has become clearly recognizable, and the motion information for the period from the person's appearance until face detection succeeds cannot be captured. In particular, in images showing many people, faces are likely to be hidden, and detection omissions of the face regions to be detected become pronounced.

In Patent Document 1, if the criterion for face detection is relaxed, faces can be detected from a somewhat earlier stage. In that case, however, although detection omissions can be suppressed, erroneous detections increase, and the reliability of the apparatus itself is reduced. That is, Patent Document 1 cannot suppress both erroneous detection and detection omission. Patent Document 2 merely addresses the situation in which a plurality of moving objects cannot be detected separately up to a certain time because they overlap on the image, and, once the overlap disappears at a certain time and the moving objects can be detected separately, separates the moving objects that were detected as overlapping in the past frames. Accordingly, Patent Document 2 also cannot suppress both erroneous detection and detection omission.

Patent Document 3 is known as a technique that attempts to solve the above problem. In Patent Document 3, a detection means detects, from the sequentially input image of each frame, persons and candidates, a candidate being a target for which it cannot be determined whether it is a person. A person tracking means executes a tracking process with a person as the tracking target and records the position information of the tracked person as tracking information. A candidate tracking means executes a tracking process with a candidate as the target and records the position information of the tracked candidate as tracking information. When the persons detected by the detection means include a person evaluated to be the same target as a candidate being tracked by the candidate tracking means, a changing means changes the tracking information recorded as the position information of the candidate into tracking information of the person.

In Patent Document 3, a target for which it cannot be determined whether it is a person is tracked as a "candidate", and when a target detected as a "person" is the same target as one that was being tracked as a "candidate", the tracking information of the candidate is changed into tracking information of the person. In this way, both erroneous detection and detection omission can be suppressed. However, in Patent Document 3, not only persons but also targets for which it cannot be determined whether they are persons must be tracked in the same way as persons, which results in a high processing load. In particular, if the criterion for such undetermined targets is set loosely in person detection, many targets that are not actually persons must be tracked as "candidates", and the processing load of the tracking process increases needlessly by the amount of this wasted tracking.

In view of the above, an object of the present invention is to provide a moving image object detection apparatus, method, and program capable of suppressing both erroneous detection and detection omission of an object without needlessly increasing the processing load.

In order to achieve the above object, the present invention provides a moving image object detection apparatus comprising: an object detection means for detecting an object from a frame image of a moving image composed of a plurality of frames and storing the position of the detected object in an object list storage unit; and an object tracking processing means including a first tracking means for tracking the position of an object stored in the object list storage unit across a plurality of frames from the time at which the object was detected onward and storing the tracked position in the object list storage unit, and a second tracking means for, when the object detection means newly detects an object, tracking the position of the newly detected object across a plurality of frames going back to times earlier than the time of the frame image in which the object was detected and storing the tracked position in the object list storage unit.

The object detection means may be configured to detect, from one frame image, a number of objects equal to N converted to an integer, where N is given by the following formula, n is the assumed number of appearances (the number of objects assumed to be included in one frame image), S is the assumed number of effective shots (the number of frames in which one object is captured), and P (%) is the object detection probability:
N = (n / S) × (100 / P)

Alternatively, the object detection means may be configured to detect one object from one frame image.

The object tracking processing means may further include a motion vector field measurement means that sequentially receives the frame images and obtains a motion vector field based on the frame image currently being processed and a frame image at a time earlier than that frame image. In this configuration, the first tracking means tracks the position of an object across a plurality of frames based on the motion vector field obtained by the motion vector field measurement means and the position of the object stored in the object list storage unit, and the second tracking means tracks the position of an object back to earlier times based on the motion vector field obtained by the motion vector field measurement means and the position of the object newly detected by the object detection means.

The motion vector field measurement means may include: a motion vector distribution calculating means that, for a target pixel on a target frame for which a motion vector is to be measured, shifts a reference frame relative to the target frame within a predetermined range corresponding to a motion vector detection space and calculates, for each shift amount, a motion vector distribution, which is a distribution of scores representing the correlation between the target pixel and the corresponding pixel of the reference frame; a motion vector detection means that detects the motion vector at the target pixel based on the motion vector distribution; and an erroneous measurement determination means that determines, based on the motion vector distribution, whether the detected motion vector is an erroneous measurement.
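The motion vector distribution described above can be illustrated with a minimal block-matching sketch. This is only an illustration under stated assumptions: the source does not specify how the correlation score is computed, so the negative sum of absolute differences over a small patch is used here as a stand-in, and the names `patch_score`, `motion_vector_distribution`, `detect_motion_vector`, and `make_frame` are hypothetical.

```python
def patch_score(target, reference, x, y, dx, dy, half=1):
    """Score for shift (dx, dy): negative sum of absolute differences between
    the patch around (x, y) in the target frame and the shifted patch in the
    reference frame. Higher means more similar. (Assumed score definition.)"""
    h, w = len(target), len(target[0])
    sad = 0
    for v in range(-half, half + 1):
        for u in range(-half, half + 1):
            ty, tx = y + v, x + u
            ry, rx = ty + dy, tx + dx
            if not (0 <= ty < h and 0 <= tx < w and 0 <= ry < h and 0 <= rx < w):
                return float("-inf")  # shifted patch leaves the frame
            sad += abs(target[ty][tx] - reference[ry][rx])
    return -sad


def motion_vector_distribution(target, reference, x, y, radius=2, half=1):
    """Scores for every shift in the (2*radius+1)^2 detection space."""
    return {(dx, dy): patch_score(target, reference, x, y, dx, dy, half)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)}


def detect_motion_vector(distribution):
    """The detected motion vector is the shift with the best score."""
    return max(distribution, key=distribution.get)


def make_frame(cx, cy, size=7):
    """Toy frame: all zeros except one bright pixel at (cx, cy)."""
    frame = [[0] * size for _ in range(size)]
    frame[cy][cx] = 9
    return frame


target = make_frame(3, 3)     # current frame: bright pixel at (3, 3)
reference = make_frame(2, 3)  # earlier frame: bright pixel at (2, 3)
dist = motion_vector_distribution(target, reference, 3, 3)
print(detect_motion_vector(dist))  # shift pointing to the pixel's earlier position
```

In this toy case the best shift is (-1, 0), that is, the vector from the target pixel to its position in the earlier frame.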

The erroneous measurement determination means may determine whether the detected motion vector is an erroneous measurement based on whether a predetermined relationship holds between the score in the motion vector distribution calculated for the shift amount corresponding to the center position of the motion vector detection space and the score in the motion vector distribution calculated for the shift amount corresponding to the position of the detected motion vector. For example, the erroneous measurement determination means can determine whether the scores satisfy the predetermined relationship based on whether the difference between the scores is equal to or greater than a predetermined threshold. Alternatively, it may make this determination based on whether the ratio between the scores is equal to or greater than a predetermined threshold.
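A minimal sketch of this score comparison follows. The source states only that a difference or ratio is compared with a threshold; which side of the threshold counts as an erroneous measurement, the treatment of the zero vector, and the function name `is_mismeasured` are assumptions made here for illustration. Scores are assumed to be positive correlation values where higher is better.

```python
def is_mismeasured(distribution, vector, threshold, mode="difference"):
    """Return True when the detected vector is judged unreliable.

    Assumption: the detected vector is trusted only if its score exceeds the
    score at the center of the detection space (zero shift) by at least
    `threshold`, measured as a difference or as a ratio.
    """
    if vector == (0, 0):
        return False  # assumption: a zero vector is not tested against itself
    center = distribution[(0, 0)]
    detected = distribution[vector]
    if mode == "difference":
        margin = detected - center
    elif mode == "ratio":
        margin = detected / center if center > 0 else float("inf")
    else:
        raise ValueError("mode must be 'difference' or 'ratio'")
    return margin < threshold


# Hypothetical scores for three shifts in the detection space.
scores = {(0, 0): 0.40, (1, 0): 0.95, (0, 1): 0.42}
print(is_mismeasured(scores, (1, 0), 0.2))           # clear peak
print(is_mismeasured(scores, (0, 1), 0.2))           # barely better than center
print(is_mismeasured(scores, (1, 0), 1.5, "ratio"))  # same check as a ratio
```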

The erroneous measurement determination means may use a discriminator generated by machine learning. The discriminator is trained with positive teacher data consisting of the scores of the target pixel in the motion vector distribution, arranged over the shift amounts, for cases in which the detected motion vector is the true motion vector, and with negative teacher data consisting of the same arrangement of scores for cases in which the detected motion vector is not the true motion vector. The erroneous measurement determination means then inputs the scores of the target pixel in the motion vector distribution, arranged over the shift amounts, into the discriminator and determines whether the measurement is erroneous based on the output of the discriminator.

The motion vector distribution calculating means may further, for the pixel on the reference frame corresponding to the target pixel, shift the target frame relative to the reference frame within a predetermined range corresponding to the motion vector detection space and calculate, for each shift amount, another motion vector distribution, which is a distribution of scores representing the correlation between the pixel on the reference frame corresponding to the target pixel and the corresponding pixel of the target frame. The motion vector detection means may further detect, based on this other motion vector distribution, another motion vector at the pixel on the reference frame corresponding to the target pixel. Instead of, or in addition to, determining whether the detected motion vector is an erroneous measurement based on the motion vector distribution, the erroneous measurement determination means may then determine whether the measurement is erroneous based on the relationship between the motion vector and the other motion vector.

In this case, the erroneous measurement determination means may determine that the detected motion vector is an erroneous measurement when the motion vector and the other motion vector are not inverse vectors of each other.
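The inverse-vector test can be sketched as a simple consistency check between the two measured vectors. The function names and the `tolerance` parameter are assumptions for illustration; the source only requires that the two vectors be inverse to each other.

```python
def is_inverse_pair(forward, backward, tolerance=0):
    """True when backward is (approximately) the negation of forward."""
    return (abs(forward[0] + backward[0]) <= tolerance
            and abs(forward[1] + backward[1]) <= tolerance)


def is_mismeasured_by_consistency(forward, backward, tolerance=0):
    """Flag the detected vector when the two vectors do not cancel out."""
    return not is_inverse_pair(forward, backward, tolerance)


print(is_mismeasured_by_consistency((3, -1), (-3, 1)))     # consistent pair
print(is_mismeasured_by_consistency((3, -1), (-2, 1)))     # off by one pixel
print(is_mismeasured_by_consistency((3, -1), (-2, 1), 1))  # accepted with slack
```

A nonzero tolerance is a design choice that trades strictness for robustness to one-pixel quantization errors.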

The motion vector distribution calculating means may further, for the target pixel, shift the target frame relative to the target frame itself within a predetermined range corresponding to the motion vector detection space and calculate, for each shift amount, a self-motion vector distribution, which is a distribution of scores representing the correlation of the target pixel with itself. Instead of, or in addition to, determining whether the detected motion vector is an erroneous measurement based on the motion vector distribution, the erroneous measurement determination means may then determine whether the measurement is erroneous based on the score in the self-motion vector distribution calculated for the shift amount corresponding to the position of the detected motion vector.

In this case, the erroneous measurement determination means may apply a predetermined threshold to the score in the self-motion vector distribution to determine whether the measurement is erroneous.

The object detection means may include: a smoothing processing means that generates a plurality of smoothed images with different scales from the frame image by repeatedly convolving the image with a smoothing filter whose filter characteristic corresponds to the contour shape of the object to be detected; a difference image generation means that generates a plurality of difference images, each being the difference between two of the smoothed images with mutually different scales, while varying the scale; a summing means that sums the plurality of difference images to generate a summed image; a position estimation means that estimates the position of the object to be detected based on the pixel values of the summed image; and a matching means that detects the object from the frame image in the vicinity of the estimated position.

When the number of objects to be detected by the object detection means is M, the position estimation means may estimate, as the positions of the objects, the pixel positions of the top M or bottom M pixels when the pixel values of the summed image are sorted in descending order.
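Selecting the top M or bottom M pixels of the summed image can be sketched as follows. The function name `top_m_positions` and the toy image are assumptions for illustration; ties are broken by raster order here, which the source does not specify.

```python
def top_m_positions(summed, m, largest=True):
    """Return the (x, y) positions of the m pixels with the largest
    (or, with largest=False, the smallest) values in the summed image."""
    pixels = [(value, (x, y))
              for y, row in enumerate(summed)
              for x, value in enumerate(row)]
    # sorted() is stable, so equal values keep their raster order.
    ordered = sorted(pixels, key=lambda item: item[0], reverse=largest)
    return [position for _, position in ordered[:m]]


summed = [
    [0, 1, 0],
    [2, 9, 3],
    [0, 7, 0],
]
print(top_m_positions(summed, 2))                 # two strongest responses
print(top_m_positions(summed, 1, largest=False))  # weakest response
```

Because neighboring pixels of one strong response can occupy several of the top M slots, a practical implementation might suppress positions too close to an already selected one; that refinement is not described in the source.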

The object detection means may further include a size estimation means that compares the pixel values of the plurality of difference images and estimates the size of the object to be detected based on the scale of the difference image having the maximum or minimum pixel value.

The size estimation means may compare the pixel values of the difference images in the vicinity of the object position estimated by the position estimation means.

The size estimation means may estimate the size of the object based on the scale of the smaller-scale image of the two smoothed images from which the difference image having the maximum or minimum pixel value was generated.
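The scale selection behind this size estimation can be sketched as below. It assumes that the difference images are held in a dictionary keyed by the scale index of their smaller-scale smoothed image; the function name `estimate_scale_index` is hypothetical, and the mapping from the selected scale to an object size in pixels is left out because the source does not specify it here.

```python
def estimate_scale_index(diffs, x, y, use_max=True):
    """Return the scale index j whose difference image has the largest
    (or smallest) pixel value at position (x, y)."""
    pick = max if use_max else min
    return pick(diffs, key=lambda j: diffs[j][y][x])


# Toy difference images for three scale indices (2 x 2 pixels each).
diffs = {
    1: [[0, 2], [1, 0]],
    2: [[0, 5], [1, 0]],
    3: [[0, 3], [4, 0]],
}
print(estimate_scale_index(diffs, 1, 0))                 # strongest response at (1, 0)
print(estimate_scale_index(diffs, 0, 1))                 # strongest response at (0, 1)
print(estimate_scale_index(diffs, 1, 0, use_max=False))  # weakest response at (1, 0)
```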

The smoothing processing means may generate a × k smoothed images $L(x, y, \sigma_i)$ ($i = 1, \dots, a \times k$) with scales from $\sigma_1$ to $\sigma_{a \times k}$ (a and k are integers of 2 or more), and the difference image generation means may generate k difference images $G(x, y, \sigma_j)$ ($j = 1, \dots, k$) with scales from $\sigma_1$ to $\sigma_k$, each based on the difference between the smoothed image $L(x, y, \sigma_j)$ of scale $\sigma_j$ and the smoothed image $L(x, y, \sigma_{j \times a})$ of scale $\sigma_{j \times a}$. In this case, the difference image generation means may generate the difference image $G(x, y, \sigma_j)$ using the following formula:
$$G(x, y, \sigma_j) = L(x, y, \sigma_j) - L(x, y, \sigma_{j \times a})$$
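The indexing in this formula can be checked with a small sketch that builds the k difference images from a stack of a × k smoothed images and then sums them into a single image. The smoothing itself is not implemented: the smoothed images are assumed to be given as a dictionary keyed by scale index, and the names `difference_images` and `sum_images` are hypothetical.

```python
def difference_images(smoothed, a, k):
    """Build G_j = L_j - L_(j*a) for j = 1..k.

    `smoothed` maps scale index i (1..a*k) to a 2-D list of pixel values.
    """
    diffs = {}
    for j in range(1, k + 1):
        small, large = smoothed[j], smoothed[j * a]
        diffs[j] = [[s - l for s, l in zip(row_s, row_l)]
                    for row_s, row_l in zip(small, large)]
    return diffs


def sum_images(images):
    """Pixel-wise sum of equally sized images."""
    images = list(images)
    return [[sum(values) for values in zip(*rows)] for rows in zip(*images)]


# Toy stack with a = 2, k = 2: four 1 x 2 "smoothed" images L_1..L_4.
a, k = 2, 2
smoothed = {i: [[i, 2 * i]] for i in range(1, a * k + 1)}
g = difference_images(smoothed, a, k)
print(g[1])                    # L_1 - L_2
print(g[2])                    # L_2 - L_4
print(sum_images(g.values()))  # summed image
```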

Alternatively, the smoothing processing means may generate r smoothed images $L(x, y, \sigma_i)$ ($i = 1, \dots, r$) with scales from $\sigma_1$ to $\sigma_r$ (r is an integer of 3 or more), and the difference image generation means may generate k − p difference images $G(x, y, \sigma_j)$ ($j = 1, \dots, k - p$) with scales from $\sigma_1$ to $\sigma_{k-p}$ (p is an integer of 1 or more), each based on the difference between the smoothed image $L(x, y, \sigma_j)$ of scale $\sigma_j$ and the smoothed image $L(x, y, \sigma_{j+p})$ of scale $\sigma_{j+p}$. In this case, the difference image generation means may generate the difference image $G(x, y, \sigma_j)$ using the following formula:
$$G(x, y, \sigma_j) = L(x, y, \sigma_j) - L(x, y, \sigma_{j+p})$$

The object detection means may further include a motion region extraction means that extracts a motion region from the frame image and generates a motion region extraction image, and the smoothed image generation means may convolve the smoothing filter with the motion region extraction image. In this case, the motion region extraction means may generate, as the motion region extraction image, a grayscale image in which each pixel has a gradation value corresponding to the amount of motion extracted from the frame image. Furthermore, the motion region extraction means may apply a predetermined contrast reduction process to the grayscale image.

The present invention also provides a moving image object detection method comprising the steps of: detecting an object from a frame image of a moving image composed of a plurality of frames and storing the position of the detected object in an object list storage unit; tracking the position of an object stored in the object list storage unit across a plurality of frames from the time at which the object was detected onward and storing the tracked position in the object list storage unit; and, when an object is newly detected by the object detection means, tracking the position of the newly detected object across a plurality of frames going back to times earlier than the time of the frame image in which the object was detected and storing the tracked position in the object list storage unit.

The present invention further provides a program for causing a computer to execute the steps of: detecting an object from a frame image of a moving image composed of a plurality of frames and storing the position of the detected object in an object list storage unit; tracking the position of an object stored in the object list storage unit across a plurality of frames from the time at which the object was detected onward and storing the tracked position in the object list storage unit; and, when an object is newly detected by the object detection means, tracking the position of the newly detected object across a plurality of frames going back to times earlier than the time of the frame image in which the object was detected and storing the tracked position in the object list storage unit.

In the moving image object detection apparatus, method, and program of the present invention, when an object is detected in the frame image at a certain time, the position of the object is tracked across a plurality of frames at times later than the detection time, and is also tracked across a plurality of frames going back to times earlier than the detection time. Setting the detection condition strictly, so that targets with low object likelihood are not detected as objects, suppresses erroneous detection. At the same time, because an object is tracked in the past-frame direction once it has been detected, the position of an object that could not be detected in past frames can be estimated, which suppresses detection omissions in the past frame images. That is, the present invention achieves both suppression of erroneous detection and suppression of detection omission. In addition, unlike Patent Document 3, the present invention does not need to detect object candidates separately from objects and track those candidates, so the processing load does not increase needlessly.
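The backward tracking idea can be sketched as follows. This is a simplified illustration, not the apparatus itself: the motion vector field is assumed to be available as `flow_to_prev[t]`, a dictionary mapping a pixel position in frame t to its displacement into frame t-1, and the object list is modeled as a plain dictionary. All of these names and data layouts are hypothetical.

```python
def track_backward(position, detected_frame, flow_to_prev, steps):
    """Follow an object from the frame where it was first detected back
    through up to `steps` earlier frames.

    Returns {frame_index: (x, y)} for the earlier frames only.
    """
    track = {}
    x, y = position
    first = max(detected_frame - steps, 0)
    for t in range(detected_frame, first, -1):
        # Displacement from frame t into frame t-1; assume no motion if unknown.
        dx, dy = flow_to_prev[t].get((x, y), (0, 0))
        x, y = x + dx, y + dy
        track[t - 1] = (x, y)
    return track


# An object moving one pixel to the right per frame, first detected in frame 3.
flow_to_prev = {
    1: {(3, 5): (-1, 0)},
    2: {(4, 5): (-1, 0)},
    3: {(5, 5): (-1, 0)},
}
object_list = {"object-1": {3: (5, 5)}}  # position stored at detection time
object_list["object-1"].update(track_backward((5, 5), 3, flow_to_prev, steps=3))
print(sorted(object_list["object-1"].items()))
```

After the update, the object list holds positions for frames 0 to 3 even though the detector only fired in frame 3, which is the effect the paragraph above describes.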

A block diagram showing a moving image object detection apparatus according to one embodiment of the present invention.
A block diagram showing the object tracking processing means.
A block diagram showing a configuration example of the motion vector field measurement means.
A flowchart showing the operation procedure of the motion vector field measurement means.
A block diagram showing a configuration example of the object detection means.
A flowchart showing the operation procedure of the object detection means.
A flowchart showing the operation procedure of the moving image object detection apparatus.
A diagram showing a specific processing example.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 shows a moving image object detection apparatus according to one embodiment of the present invention. The moving image object detection apparatus 10 includes a frame memory 11, an object detection means 12, an object tracking processing means 13, and an object list storage unit 14. The moving image object detection apparatus 10 sequentially processes the frame images constituting a moving image and detects objects included in the moving image. The function of each unit in the moving image object detection apparatus 10 can be realized by a computer system operating according to a predetermined program. Alternatively, the function of each unit can be realized by an IC (Integrated Circuit) or the like.

The frame memory 11 stores frame images (frame data) for at least several frames of the moving image, composed of a plurality of frames, that is input to the moving image object detection apparatus 10. The object detection means 12 refers to the frame memory 11 and detects an object from a frame image. The object detection means 12 detects the region in which an object exists as an ROI (Region of Interest).

The object detection means 12 detects, for example, a predetermined number of objects from each frame image. Let n be the assumed appearance number, that is, the number of objects assumed to be contained in one frame image; let S be the assumed effective shot number, that is, the number of frames in which one object is captured, excluding frames in which the object is hidden and therefore poorly captured; and let P (%) be the object detection probability. The object detection means 12 then detects, from one frame image, as many objects as the integer obtained from N in the following formula:
N = (n / S) × (100 / P)
The object detection means 12 converts N to an integer by, for example, rounding up any fractional part. For example, if the assumed appearance number n is 4, the assumed effective shot number S is several tens, and the detection probability P is 50%, the object detection means 12 need only detect one object from each frame image. Alternatively, the number of objects detected by the object detection means 12 may simply be fixed at one without calculating N from the above formula.
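The calculation of N can be illustrated with a short Python sketch. The function name and the concrete value S = 20 (standing in for "several tens") are assumptions made for illustration only.

```python
import math


def objects_per_frame(n: float, s: float, p_percent: float) -> int:
    """Return N = (n / S) * (100 / P), rounded up to the next integer."""
    if s <= 0 or not (0 < p_percent <= 100):
        raise ValueError("S must be positive and P must lie in (0, 100]")
    return math.ceil((n / s) * (100.0 / p_percent))


# n = 4 objects, S = 20 effective shots (assumed value), P = 50 %
print(objects_per_frame(4, 20, 50))  # prints 1
```

With these values N is 0.4 before rounding, so rounding up gives one object per frame, in line with the example in the text.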

オブジェクト検出手段１２におけるオブジェクトの検出手法は特に問わない。オブジェクト検出手段１２は、例えば各フレーム画像に対してテンプレートマッチングを行い、人物や顔（人物の頭部）などの検出対象のオブジェクトを検出する。また、オブジェクト検出手段１２は、例えば機械学習により生成された識別器を用いて各フレーム画像から検出対象のオブジェクトを検出してもよい。あるいはオブジェクト検出手段１２は、背景画像とフレーム画像との差分からオブジェクトを検出してもよい。オブジェクト検出手段１２は、例えば入口ゲートなど、オブジェクトが画像中に登場する際の位置が決まっている場合は、その部分の近傍でのみでオブジェクト検出を行ってもよい。 The object detection method in the object detection means 12 is not particularly limited. The object detection unit 12 performs template matching on each frame image, for example, and detects an object to be detected such as a person or a face (person's head). Further, the object detection unit 12 may detect an object to be detected from each frame image using, for example, a discriminator generated by machine learning. Alternatively, the object detection unit 12 may detect the object from the difference between the background image and the frame image. If the position at which an object appears in the image, such as an entrance gate, is determined, the object detection means 12 may detect the object only in the vicinity of that portion.

オブジェクトリスト記憶部１４は、検出されたオブジェクトについて、どのフレームのどの位置で検出されたかを示す情報を記憶する。オブジェクトリスト記憶部１４は、例えばオブジェクトの識別情報（オブジェクトＩＤ）と、オブジェクトが検出されたフレーム画像を識別する情報（フレームＩＤ）と、画像中のオブジェクトの位置とを対応付けて記憶する。フレームＩＤには、例えば動画像の開始時刻からの経過時間やフレーム番号などを用いることができる。オブジェクトの位置は、例えばオブジェクトの検出領域を矩形で表わしたときの矩形の始点（左上の頂点）と終点（右下の頂点）との組で表わすことができる。 The object list storage unit 14 stores information indicating at which position in which frame the detected object is detected. The object list storage unit 14 stores, for example, object identification information (object ID), information (frame ID) identifying the frame image in which the object is detected, and the position of the object in the image in association with each other. As the frame ID, for example, an elapsed time from the start time of the moving image, a frame number, or the like can be used. The position of the object can be represented by, for example, a set of a rectangle starting point (upper left vertex) and end point (lower right vertex) when the object detection area is represented by a rectangle.
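One possible in-memory layout for such an object list is sketched below in Python. The class and field names (ObjectRecord, ObjectList, top_left, bottom_right) are hypothetical; the description only requires that an object ID, a frame ID, and a position be stored in association with each other.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectRecord:
    object_id: int                 # identifies the object
    frame_id: int                  # frame number (or elapsed time) of the frame
    top_left: Tuple[int, int]      # start point of the rectangle (x, y)
    bottom_right: Tuple[int, int]  # end point of the rectangle (x, y)


class ObjectList:
    """Minimal stand-in for the object list storage unit 14."""

    def __init__(self) -> None:
        self._records: List[ObjectRecord] = []

    def add(self, record: ObjectRecord) -> None:
        self._records.append(record)

    def positions_of(self, object_id: int) -> Dict[int, ObjectRecord]:
        """Return the stored records of one object, keyed by frame ID."""
        return {r.frame_id: r for r in self._records if r.object_id == object_id}


store = ObjectList()
store.add(ObjectRecord(object_id=1, frame_id=10, top_left=(5, 5), bottom_right=(20, 30)))
store.add(ObjectRecord(object_id=1, frame_id=9, top_left=(4, 5), bottom_right=(19, 30)))
print(sorted(store.positions_of(1)))  # prints [9, 10]
```

Keying the lookup by frame ID makes it straightforward to add positions for frames both after and before the frame in which the object was first detected.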

オブジェクト検出手段１２は、現在フレームにおいてオブジェクトを検出すると、検出したオブジェクトの位置をオブジェクトリスト記憶部１４に記憶する。オブジェクト検出手段１２は、例えば検出したオブジェクトに対して識別情報（オブジェクトＩＤ）を付与し、付与したオブジェクトＩＤとフレームＩＤと画像中のオブジェクトの位置とをオブジェクトリスト記憶部１４に追加記憶する。また、新たなオブジェクトが検出された旨を、オブジェクト追跡処理手段１３に通知する。 When the object detection unit 12 detects an object in the current frame, the object detection unit 12 stores the position of the detected object in the object list storage unit 14. For example, the object detection unit 12 assigns identification information (object ID) to the detected object, and additionally stores the assigned object ID, the frame ID, and the position of the object in the image in the object list storage unit 14. Further, it notifies the object tracking processing means 13 that a new object has been detected.

The object tracking processing means 13 tracks each already-detected object, that is, each object detected in a past frame and stored in the object list storage unit 14, across a plurality of frames from the time the object was detected onward, and estimates the position of the object in the current frame. The object tracking processing means 13 stores the estimated position in the object list storage unit 14. In addition, on receiving a notification that a new object has been detected, the object tracking processing means 13 tracks the newly detected object backward through the past frames and estimates its position in frame images earlier than the current frame. The object tracking processing means 13 stores the estimated positions in the past frame images in the object list storage unit 14.

FIG. 2 shows the object tracking processing means 13. The object tracking processing means 13 includes a motion vector field measuring means 31, a first tracking means 32, and a second tracking means 33. The motion vector field measuring means 31 receives the frame images in sequence and measures a motion vector field based on the current frame image and a past frame image one or more frames before the current frame. The motion vector field measuring means 31 obtains motion vectors at a plurality of pixel positions, for example for every pixel of the frame image. Alternatively, the motion vector field measuring means 31 may divide the image into a plurality of blocks, each consisting of several adjacent pixels, and obtain a motion vector for each block.

The first tracking means 32 estimates the position in the current frame of each object that is stored in the object list storage unit 14 and was already detected in a past frame image. The first tracking means 32 estimates the position of the object in the current frame image based on the position of the object in a past frame image, for example the frame image one frame before the current frame, and on the motion vector field measured by the motion vector field measuring means 31. For example, the first tracking means 32 averages the motion vectors within the region corresponding to the position of the object in the immediately preceding frame image, shifts that region by the average motion vector, and takes the shifted region as the estimated position of the object in the current frame image. The first tracking means 32 stores the estimated position in the object list storage unit 14 together with the object ID and the frame ID.
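The forward estimate can be sketched in Python as follows. This is only an illustration under stated assumptions: the motion vector field is represented as a nested list in which field[y][x] holds the displacement (dx, dy) of the pixel at (x, y), the rectangle is given with inclusive corners, and the mean vector is rounded to whole pixels. None of these representation details come from the description.

```python
from typing import List, Tuple

Vector = Tuple[float, float]      # (dx, dy) displacement in pixels
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def track_forward(prev_box: Box, field: List[List[Vector]]) -> Box:
    """Shift prev_box by the mean motion vector found inside it."""
    left, top, right, bottom = prev_box
    vectors = [field[y][x]
               for y in range(top, bottom + 1)
               for x in range(left, right + 1)]
    mean_dx = sum(v[0] for v in vectors) / len(vectors)
    mean_dy = sum(v[1] for v in vectors) / len(vectors)
    dx, dy = round(mean_dx), round(mean_dy)  # snap to whole pixels (assumption)
    return (left + dx, top + dy, right + dx, bottom + dy)


# A uniform field in which every pixel moves 2 px right and 1 px down.
field = [[(2.0, 1.0)] * 6 for _ in range(6)]
print(track_forward((1, 1, 2, 2), field))  # prints (3, 2, 4, 3)
```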

The second tracking means 33 estimates, for an object newly detected by the object detection means 12, the position of the object in frame images earlier than the current frame. Based on the position of the object detected in the current frame and the motion vector field measured by the motion vector field measuring means 31, the second tracking means 33 estimates the position of the object in, for example, the past frame immediately before the current frame. For example, the second tracking means 33 uses the motion vectors to identify the pixel positions in the past frame whose destinations fall within the region corresponding to the position of the object in the current frame image. The second tracking means 33 averages the motion vectors at the identified pixel positions in the past frame and uses the inverse of that average vector to estimate the position of the object in the past frame. The second tracking means 33 stores the estimated position in the object list storage unit 14 together with the object ID and the frame ID.
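The backward estimate can be sketched in Python in the same spirit. The assumptions are again illustrative: field[y][x] is taken to hold the displacement (dx, dy) of the past-frame pixel at (x, y) toward the current frame, rectangles use inclusive corners, and the function returns None when no past-frame pixel lands inside the current region.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float]      # (dx, dy) displacement in pixels
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def track_backward(cur_box: Box, field: List[List[Vector]]) -> Optional[Box]:
    """Estimate the box in the past frame from the box in the current frame."""
    left, top, right, bottom = cur_box
    # Past-frame pixels whose destination falls inside the current box.
    hits = [(dx, dy)
            for y, row in enumerate(field)
            for x, (dx, dy) in enumerate(row)
            if left <= x + dx <= right and top <= y + dy <= bottom]
    if not hits:
        return None
    mean_dx = sum(dx for dx, _ in hits) / len(hits)
    mean_dy = sum(dy for _, dy in hits) / len(hits)
    # Apply the inverse of the mean vector, snapped to whole pixels.
    bx, by = round(-mean_dx), round(-mean_dy)
    return (left + bx, top + by, right + bx, bottom + by)


field = [[(2.0, 1.0)] * 6 for _ in range(6)]
print(track_backward((3, 2, 4, 3), field))  # prints (1, 1, 2, 2)
```

Repeating this step on successively earlier frames would extend the track further into the past, provided those frames are still held in the frame memory.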

動きベクトル場計測手段３１の具体的な構成例を説明する。図３は、動きベクトル場計測手段３１の構成例を示す。動きベクトル場計測手段３１は、解像度変換手段４０、動きベクトル分布算出手段４１、平均化処理手段４２、動きベクトル検出手段４３、及び誤計測判定手段４４を有する。解像度変換手段４０は、フレームメモリ１１（図１）に記憶されたフレーム画像を、フレーム画像の解像度よりも低い低解像度の画像に変換する。解像度変換手段４０は、ある時刻のフレーム画像に対しては１度だけ解像度変換を行えばよい。言い換えれば、解像度変換手段４０は、同一フレーム画像に対して解像度変換を複数回行う必要はない。 A specific configuration example of the motion vector field measuring unit 31 will be described. FIG. 3 shows a configuration example of the motion vector field measuring means 31. The motion vector field measurement unit 31 includes a resolution conversion unit 40, a motion vector distribution calculation unit 41, an averaging processing unit 42, a motion vector detection unit 43, and an erroneous measurement determination unit 44. The resolution conversion means 40 converts the frame image stored in the frame memory 11 (FIG. 1) into a low resolution image lower than the resolution of the frame image. The resolution conversion unit 40 may perform resolution conversion only once for a frame image at a certain time. In other words, the resolution conversion means 40 does not need to perform resolution conversion a plurality of times for the same frame image.

Various methods can be used for the resolution conversion. For example, the resolution conversion means 40 lowers the resolution by thinning out the pixels of the frame image at a predetermined rate. Alternatively, the resolution conversion means 40 may compute the average of the pixel values over a predetermined number of pixels in the frame image before conversion and use that average as the value of each pixel in the converted image. The resolution conversion means 40 may also repeat the resolution reduction several times to bring the image down to the desired resolution. For example, halving the resolution three times reduces it to 1/8.
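The averaging variant, applied repeatedly, might look like the following Python sketch. The 2 × 2 averaging window and the handling of odd image sizes (the last row or column is dropped) are assumptions chosen for brevity.

```python
from typing import List

Image = List[List[float]]


def halve(image: Image) -> Image:
    """Halve the resolution by averaging each 2 x 2 group of pixels."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(width)]
            for y in range(height)]


def reduce_resolution(image: Image, times: int = 3) -> Image:
    """Apply halve() repeatedly; three passes give 1/8 resolution."""
    for _ in range(times):
        image = halve(image)
    return image


frame = [[10.0] * 16 for _ in range(8)]   # 16 x 8 test image
small = reduce_resolution(frame, times=3)
print(len(small[0]), len(small))          # prints 2 1
```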

動きベクトル場計測手段３１は、解像度変換手段４０が変換した画像を用いて動きベクトル場の計測を行う。より詳細には、動きベクトル場計測手段３１は、現在時刻のフレーム画像を低解像度化した画像を対象フレームとし、過去フレームの画像を低解像度化した画像を参照フレームとして動きベクトル場の計測を行う。低解像度化した画像を用いて動きベクトル場を計測することで、処理を高速化できる。ただし、低解像度化したことで、処理の高速化と引き換えに動きベクトル場の計測精度は低下する。解像度変換手段４０がフレーム画像をどの程度低解像度化するかは、必要とされる演算時間と計測精度とに応じて適宜設定すればよい。画像の低解像度化は必須ではなく、動きベクトル場の計測を、フレーム画像を低解像度化せずに行うことも可能である。つまりフレーム画像の本来の解像度のまま動きベクトル場の計測を行うことも可能である。 The motion vector field measurement means 31 measures the motion vector field using the image converted by the resolution conversion means 40. More specifically, the motion vector field measuring unit 31 measures a motion vector field using an image obtained by reducing the resolution of a frame image at the current time as a target frame and an image obtained by reducing the resolution of a past frame image as a reference frame. . By measuring a motion vector field using an image with a reduced resolution, the processing speed can be increased. However, since the resolution is lowered, the measurement accuracy of the motion vector field is lowered in exchange for the higher processing speed. How much the resolution conversion means 40 reduces the resolution of the frame image may be set as appropriate according to the required calculation time and measurement accuracy. It is not essential to reduce the resolution of the image, and the motion vector field can be measured without reducing the resolution of the frame image. That is, it is possible to measure the motion vector field with the original resolution of the frame image.

The motion vector distribution calculating means 41 calculates, from the target frame and the reference frame, a value (score) indicating the correlation (similarity) between their pixels. For each target pixel on the target frame for which a motion vector is to be measured, the motion vector distribution calculating means 41 shifts the reference frame relative to the target frame within a predetermined range corresponding to the motion vector detection space and calculates a score for each shift amount. For example, the motion vector distribution calculating means 41 sets a block of a predetermined size around the pixel of interest (x, y) in the target frame, sets a block of the same size around the corresponding pixel (x, y) in the reference frame shifted by each shift amount within the predetermined range, and calculates a score C0 indicating how closely the pixels in the two blocks match.

The block size can be, for example, 3 × 3. In that case, with the pixel of interest at (x, y), the motion vector distribution calculating means 41 calculates the score C0 from the pixels at positions (x-1, y-1) through (x+1, y+1) of the target frame and the pixels at positions (x-1, y-1) through (x+1, y+1) of the reference frame shifted relative to the target frame. The score C0 can be the sum of the absolute differences between corresponding pixels in the blocks (mean absolute error). Specifically, with f(x, y) denoting the pixel value (gradation value) at each position of the target frame and g(x, y) the pixel value at each position of the shifted reference frame, the value defined by Equation 1 below can be used as the score C0.
$$C0 = \sum_{p} \sum_{q} \left| f(x+p,\, y+q) - g(x+p,\, y+q) \right| \qquad (1)$$
Here p and q run over the offsets within the block, from -1 to +1 for a 3 × 3 block. A mean squared error may be used instead of the sum of absolute differences.
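As an illustration, the sum of absolute differences over a block can be written in Python as below. The sketch assumes that g is the reference frame after it has already been shifted, that images are nested lists indexed as image[y][x], and that (x, y) lies far enough from the border for the whole block to fit; the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]


def score_c0(f: Image, g: Image, x: int, y: int, radius: int = 1) -> float:
    """Sum of absolute differences over the block centred on (x, y).

    radius = 1 corresponds to a 3 x 3 block.
    """
    return sum(abs(f[y + q][x + p] - g[y + q][x + p])
               for q in range(-radius, radius + 1)
               for p in range(-radius, radius + 1))


f = [[10] * 3 for _ in range(3)]
g = [[12] * 3 for _ in range(3)]
print(score_c0(f, g, 1, 1))  # prints 18 (nine pixels, each differing by 2)
```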

Between the target frame and the reference frame, the exposure value may fluctuate during capture of the moving image, or the brightness may change unexpectedly. In such cases, the absolute difference of the pixel values obtained with Equation 1 may fail to be 0 even where it would otherwise be 0, because of the change in brightness. To allow for this, the score C0 may be calculated after correcting the difference in brightness between the target frame and the reference frame. Specifically, the mean f_m of the pixel values in the block of the target frame and the mean g_m of the pixel values in the block of the shifted reference frame are calculated, and the score C0 may be calculated from them with Equation 2 below, in which each block's mean is subtracted from its pixel values before the difference is taken.
$$C0 = \sum_{p} \sum_{q} \left| \bigl( f(x+p,\, y+q) - f_m \bigr) - \bigl( g(x+p,\, y+q) - g_m \bigr) \right| \qquad (2)$$
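The effect of the brightness correction can be checked with a small Python sketch that subtracts each block's mean before taking the differences. The helper names are hypothetical, g is assumed to be the already-shifted reference frame, and the block is assumed to fit inside both images.

```python
from typing import List

Image = List[List[float]]


def block_values(img: Image, x: int, y: int, radius: int = 1) -> List[float]:
    """Pixel values of the block centred on (x, y), row by row."""
    return [img[y + q][x + p]
            for q in range(-radius, radius + 1)
            for p in range(-radius, radius + 1)]


def score_c0_corrected(f: Image, g: Image, x: int, y: int, radius: int = 1) -> float:
    """Sum of absolute differences after removing each block's mean."""
    fb = block_values(f, x, y, radius)
    gb = block_values(g, x, y, radius)
    fm = sum(fb) / len(fb)   # mean brightness of the target block
    gm = sum(gb) / len(gb)   # mean brightness of the reference block
    return sum(abs((a - fm) - (b - gm)) for a, b in zip(fb, gb))


f = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
g = [[v + 5 for v in row] for row in f]   # same pattern, uniformly brighter
print(score_c0_corrected(f, g, 1, 1))     # prints 0.0
```

A uniform brightness offset between the two blocks cancels out, so two blocks with the same pattern score 0 even though every raw pixel difference is 5.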

The score C0 calculated with Equation 1 or Equation 2 becomes smaller as the correlation between the pixels becomes higher. A score that becomes larger as the correlation becomes higher may be used instead. For example, the motion vector distribution calculating means 41 may calculate, as a score C1, the value obtained by subtracting the score C0 from the maximum pixel value (maximum gradation value). Specifically, when the pixel values are 8-bit gradation values, the motion vector distribution calculating means 41 may calculate the score C1 with the following formula.
C1 = 255 - C0
The following description mainly uses the example in which the motion vector distribution calculating means 41 calculates the score C1. When the score C0 is used instead, the larger/smaller relationships between scores in the following description should be reversed accordingly.

The motion vector distribution calculating means 41 shifts the reference frame relative to the target frame one pixel at a time in the horizontal and vertical directions over the range corresponding to the motion vector detection range (search range), and calculates the score C1 for each pixel position of the target frame at each shift amount. For example, when motion vectors are to be detected in the range from (-2, -2) to (+2, +2), the motion vector distribution calculating means 41 shifts the reference frame one pixel at a time within ±2 pixels in each of the horizontal and vertical directions and calculates the score C1 for each pixel position of the target frame.

Let h be the horizontal coordinate and v the vertical coordinate by which the reference frame is shifted relative to the target frame. The shift amount of the reference frame relative to the target frame can then be written (h, v) = (k, l), where k and l are integers from -2 to +2. The shift amount (h, v) thus takes 25 values. The shift amount (0, 0) corresponds to the case where the target frame and the reference frame coincide, and the shift amount (1, 0) corresponds to the case where the reference frame is shifted relative to the target frame by one pixel in the positive direction, horizontally only.

ここで、各ずらし量（ｈ，ｖ）は、動きベクトル検出空間における動きベクトルとみなすことができる。以下では、ずらし量ごとに求まるスコアＣ１の分布を、動きベクトル分布Ｄと呼ぶ。水平方向及び垂直方向にそれぞれ−２から＋２の範囲を動きベクトル検出範囲とする場合、２５通りの動きベクトル分布Ｄ（ｋ，ｌ）が算出されることになる。２５通りの動きベクトル分布Ｄが得られているということは、対象フレームの各画素位置に対して、各ずらし量に対応した２５個のスコアＣ１が算出されているのと等価である。 Here, each shift amount (h, v) can be regarded as a motion vector in the motion vector detection space. Hereinafter, the distribution of the score C1 obtained for each shift amount is referred to as a motion vector distribution D. When the range of −2 to +2 in the horizontal direction and the vertical direction is set as the motion vector detection range, 25 types of motion vector distributions D (k, l) are calculated. Obtaining 25 different motion vector distributions D is equivalent to calculating 25 scores C1 corresponding to each shift amount for each pixel position of the target frame.
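The 25 distributions can be produced with a Python sketch along the following lines. Several choices here are assumptions rather than part of the description: the block difference is averaged over the block so that C1 stays within 0 to 255, coordinates that fall outside the image are clamped to the border, and a shift of (h, v) is taken to mean that the shifted reference image at (x, y) shows the reference pixel at (x - h, y - v).

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def _clamp(value: int, upper: int) -> int:
    return min(max(value, 0), upper)


def score_c1(target: Image, ref: Image, x: int, y: int, h: int, v: int,
             radius: int = 1, max_value: float = 255.0) -> float:
    """C1 = max_value - mean absolute block difference for shift (h, v)."""
    height, width = len(target), len(target[0])
    total, count = 0.0, 0
    for q in range(-radius, radius + 1):
        for p in range(-radius, radius + 1):
            tx, ty = _clamp(x + p, width - 1), _clamp(y + q, height - 1)
            rx, ry = _clamp(tx - h, width - 1), _clamp(ty - v, height - 1)
            total += abs(target[ty][tx] - ref[ry][rx])
            count += 1
    return max_value - total / count


def motion_vector_distributions(target: Image, ref: Image,
                                search: int = 2) -> Dict[Tuple[int, int], Image]:
    """Return D(k, l): one score image per shift amount in the search range."""
    height, width = len(target), len(target[0])
    return {
        (k, l): [[score_c1(target, ref, x, y, k, l) for x in range(width)]
                 for y in range(height)]
        for l in range(-search, search + 1)
        for k in range(-search, search + 1)
    }


ref = [[0.0] * 7 for _ in range(7)]
ref[3][3] = 200.0                      # bright pixel in the past frame
target = [[0.0] * 7 for _ in range(7)]
target[3][4] = 200.0                   # same pixel, moved one pixel right
D = motion_vector_distributions(target, ref)
print(len(D), D[(1, 0)][3][4])         # prints 25 255.0
```

Each entry of D is a full score image, so looking up the same pixel position across all 25 entries yields the 25 scores C1 for that pixel.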

平均化処理手段４２は、動きベクトル分布Ｄ（ｋ，ｌ）からノイズ成分を除去する。平均化処理手段４２は、例えば動きベクトル分布Ｄ（ｋ，ｌ）に対して空間的な平均化フィルタ処理を行い、平均化された動きベクトル分布Ｄｍ（ｋ，ｌ）を算出する。平均化フィルタのサイズは、例えば３×３のサイズとすることができる。平均化フィルタのサイズは、得たいノイズ除去の程度に応じて適宜設定すればよい。平均化処理手段４２が使用するフィルタは平均化フィルタには限定されない。平均化フィルタに代えて、空間的なメディアンフィルタを用いてもよい。 The averaging processing means 42 removes noise components from the motion vector distribution D (k, l). The averaging processing means 42 performs, for example, a spatial averaging filter process on the motion vector distribution D (k, l), and calculates an averaged motion vector distribution Dm (k, l). The size of the averaging filter can be 3 × 3, for example. The size of the averaging filter may be appropriately set according to the degree of noise removal to be obtained. The filter used by the averaging processing means 42 is not limited to the averaging filter. A spatial median filter may be used in place of the averaging filter.

The motion vector detection means 43 detects the motion vector at each pixel position of the target frame based on as many motion vector distributions as there are shift amounts, for example the 25 averaged motion vector distributions Dm(k, l). For example, with the pixel of interest in the target frame at (x, y), the motion vector detection means 43 compares the scores C1 at the pixel position (x, y) across the 25 averaged motion vector distributions Dm(k, l). The motion vector detection means 43 finds the shift amount (h, v) that gives the largest score C1 and detects that shift amount as the motion vector.

For example, for a given pixel position, the motion vector detection means 43 searches the 25 averaged motion vector distributions Dm for the one in which the score C1 is largest. Suppose that, among the averaged motion vector distributions Dm(-2, -2) through Dm(2, 2), the score C1 in Dm(2, 2) is the largest. In that case, the motion vector detection means 43 detects the motion vector (2, 2) corresponding to that shift amount. Detecting the motion vector from the averaged motion vector distributions Dm(k, l) makes it possible to detect motion vectors with reduced influence of noise.
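Selecting the shift amount with the largest score amounts to an argmax over the distributions, as in the Python sketch below. The dictionary layout (shift amount mapped to a score image indexed as image[y][x]) and the function name are assumptions; when several shifts tie, this sketch returns whichever one appears first in the dictionary.

```python
from typing import Dict, List, Tuple

Shift = Tuple[int, int]
Image = List[List[float]]


def detect_motion_vector(distributions: Dict[Shift, Image], x: int, y: int) -> Shift:
    """Return the shift amount whose score C1 at (x, y) is largest."""
    return max(distributions, key=lambda shift: distributions[shift][y][x])


# Toy distributions for a 1 x 1 image and three candidate shifts.
dm = {(0, 0): [[120.0]], (-1, 0): [[130.0]], (2, 2): [[250.0]]}
print(detect_motion_vector(dm, 0, 0))  # prints (2, 2)
```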

なお、動きベクトル検出手段４３は、解像度変換手段４０で変換された画像の解像度に応じて、検出した動きベクトルの大きさを修正する。例えば解像度変換手段４０で解像度が１／８に変換されていたときは、検出した動きベクトルの大きさを８倍する。また、動きベクトル検出手段４３は、対象フレームと参照フレームとの間の時間差（フレーム間隔）が１でないときは、そのフレーム間隔に応じて検出した動きベクトルの大きさを補正する。例えばフレーム間隔が３のときは、検出した動きベクトルの大きさを１／３倍に修正する。 The motion vector detection unit 43 corrects the magnitude of the detected motion vector according to the resolution of the image converted by the resolution conversion unit 40. For example, when the resolution is converted to 1/8 by the resolution converting means 40, the magnitude of the detected motion vector is multiplied by eight. In addition, when the time difference (frame interval) between the target frame and the reference frame is not 1, the motion vector detection unit 43 corrects the magnitude of the detected motion vector according to the frame interval. For example, when the frame interval is 3, the magnitude of the detected motion vector is corrected to 1/3.
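Both corrections are simple scalings, as the following Python sketch shows. The parameter names are hypothetical: downscale_factor = 8 means the image was reduced to 1/8 resolution, and frame_interval is the number of frames between the target frame and the reference frame.

```python
from typing import Tuple


def rescale_vector(vector: Tuple[float, float],
                   downscale_factor: float = 1.0,
                   frame_interval: int = 1) -> Tuple[float, float]:
    """Convert a detected vector to full-resolution pixels per frame."""
    if downscale_factor <= 0 or frame_interval <= 0:
        raise ValueError("downscale_factor and frame_interval must be positive")
    scale = downscale_factor / frame_interval
    return (vector[0] * scale, vector[1] * scale)


print(rescale_vector((1, -2), downscale_factor=8))   # prints (8.0, -16.0)
print(rescale_vector((3, 6), frame_interval=3))      # prints (1.0, 2.0)
```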

The erroneous measurement determination means 44 determines, based on the motion vector distributions, whether the motion vector that the motion vector detection means 43 detected for each pixel position of the target frame is an erroneous measurement. For example, the erroneous measurement determination means 44 makes this determination based on whether a predetermined relationship holds between the score C1 in the averaged motion vector distribution Dm(0, 0) corresponding to the center of the motion vector detection space, that is, the shift amount (0, 0) (hereinafter also called the score C1s), and the score C1 in the averaged motion vector distribution Dm corresponding to the shift amount of the detected motion vector (hereinafter also called the score C1e).

For example, the erroneous measurement determination means 44 compares the difference between the score C1s and the score C1e with a threshold value and, when the difference is smaller than the threshold value, determines that the motion vector detection means 43 measured the motion vector erroneously. Alternatively, the erroneous measurement determination means 44 may determine that the motion vector was measured erroneously when the ratio of the score C1s to the score C1e satisfies a predetermined relationship. More specifically, with α being a predetermined coefficient smaller than 1, the erroneous measurement determination means 44 may determine that the measurement is erroneous when the scores C1s and C1e satisfy the following inequality.
C1s > C1e × α
The coefficient α in the above inequality may be set as appropriate according to the required measurement accuracy.
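The two tests can be written as small Python predicates. The function names and the numeric threshold and α values in the example are arbitrary assumptions; the difference is taken as C1e minus C1s, which is non-negative because C1e is the maximum score.

```python
def is_mismeasured_by_difference(c1s: float, c1e: float, threshold: float) -> bool:
    """Erroneous if the best score barely exceeds the zero-shift score."""
    return (c1e - c1s) < threshold


def is_mismeasured_by_ratio(c1s: float, c1e: float, alpha: float) -> bool:
    """Erroneous if C1s > C1e * alpha, with alpha a coefficient below 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return c1s > c1e * alpha


# Zero-shift score almost as good as the best score: treated as erroneous.
print(is_mismeasured_by_difference(250.0, 252.0, threshold=10.0))  # prints True
# Zero-shift score clearly worse than the best score: accepted.
print(is_mismeasured_by_ratio(100.0, 250.0, alpha=0.9))            # prints False
```

A larger threshold, or an α closer to 0, rejects more vectors and keeps only those whose best score stands out clearly from the zero-shift score.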

The erroneous measurement determination means 44 uses the above difference or ratio to decide whether a detected motion vector is an erroneous measurement for the following reason. When a small block size such as 3 × 3 is used to calculate the score C1, a motion vector that appears merely by chance because of noise in the frame may be detected. Also, when the pixel of interest in the target frame belongs not to a moving object but to the background behind it, motion vectors may arise in random directions. In such cases, the difference between the score C1 for the shift amount (0, 0), which corresponds to the start point of the detected motion vector, and the score C1 for the shift amount that gives the maximum score is not very large. Therefore, thresholding the difference, or checking whether the ratio satisfies the predetermined relationship, makes it possible to determine whether the measurement is erroneous.

誤計測判定手段４４は、誤計測でないと判定したときは、動きベクトル検出手段４３が検出した動きベクトルを出力する。誤計測判定手段４４は、誤計測と判定すると、その旨を出力する。あるいは誤計測判定手段４４は、誤計測と判定したときに、動きベクトル検出手段４３で検出された動きベクトルに代えて、動きベクトル（０，０）を出力してもよい。 When it is determined that there is no erroneous measurement, the erroneous measurement determination unit 44 outputs the motion vector detected by the motion vector detection unit 43. If the erroneous measurement determination means 44 determines that there is an erroneous measurement, it outputs that fact. Alternatively, the erroneous measurement determination unit 44 may output a motion vector (0, 0) instead of the motion vector detected by the motion vector detection unit 43 when it is determined as an erroneous measurement.

FIG. 4 shows the operation procedure of the motion vector field measuring means 31. The resolution conversion means 40 converts the frame image into a low-resolution image (step S11). For example, each time an image of the current frame to be used as the target frame is input, the resolution conversion means 40 converts that frame image into a low-resolution image. For the past frame image used as the reference frame, the low-resolution image that was generated when that past frame served as the target frame can be reused.

動きベクトル分布算出手段４１は、現在フレームの画像を対象フレーム、過去フレームの画像を参照フレームとして、対象フレームと参照フレームとのずらし量を初期値に設定する（ステップＳ１２）。動きベクトル分布算出手段４１は、例えばずらし量（ｈ，ｖ）＝（０，０）を初期ずらし量として設定する。動きベクトル分布算出手段４１は、対象フレームと参照フレームとを設定したずらし量だけずらし、対象フレームにおける各画素に対してスコアＣ１を算出する（ステップＳ１３）。言い換えれば、設定したずらし量に対する動きベクトル分布Ｄを算出する。 The motion vector distribution calculating means 41 sets the shift amount between the target frame and the reference frame to an initial value using the current frame image as the target frame and the past frame image as the reference frame (step S12). The motion vector distribution calculating unit 41 sets, for example, the shift amount (h, v) = (0, 0) as the initial shift amount. The motion vector distribution calculating unit 41 shifts the target frame and the reference frame by the set shift amount, and calculates a score C1 for each pixel in the target frame (step S13). In other words, the motion vector distribution D for the set shift amount is calculated.

動きベクトル分布算出手段４１は、全てのずらし量に対して動きベクトル分布Ｄを算出したか否かを判断する（ステップＳ１４）。動きベクトル分布算出手段４１は、全てのずらし量に対して動きベクトル分布Ｄを算出していないと判断すると、ずらし量を変更する（ステップＳ１５）。動きベクトル分布算出手段４１は、ずらし量を変更した後にステップＳ１３に戻り、変更したずらし量に対して動きベクトル分布Ｄを算出する。動きベクトル分布算出手段４１は、ステップＳ１４で全てのずらし量に対して動きベクトル分布Ｄを算出したと判断するまでステップＳ１３〜Ｓ１５を繰り返し実行し、動きベクトル検出範囲内の各ずらし量に対して動きベクトル分布Ｄを算出する。 The motion vector distribution calculating unit 41 determines whether or not the motion vector distribution D has been calculated for all the shift amounts (step S14). If the motion vector distribution calculating unit 41 determines that the motion vector distribution D is not calculated for all the shift amounts, the motion vector distribution calculating unit 41 changes the shift amount (step S15). The motion vector distribution calculating means 41 returns to step S13 after changing the shift amount, and calculates the motion vector distribution D for the changed shift amount. The motion vector distribution calculating means 41 repeatedly executes steps S13 to S15 until it determines that the motion vector distribution D has been calculated for all the shift amounts in step S14, and for each shift amount within the motion vector detection range. A motion vector distribution D is calculated.

The averaging processing means 42 applies averaging to the motion vector distribution D that the motion vector distribution calculating means 41 calculated for each shift amount, and generates the averaged motion vector distributions Dm (step S16). This averaging may be omitted. The motion vector detection means 43 uses the averaged motion vector distributions Dm to detect a motion vector for each pixel of the target frame (step S17). The erroneous measurement determination means 44 determines whether each detected motion vector is an erroneous measurement (step S18). The motion vector field measuring means 31 outputs the motion vector measurement result (step S19). For pixels determined not to be erroneous measurements, the motion vector field measuring means 31 outputs the detected motion vector; for pixels determined to be erroneous measurements, it outputs either an indication that no motion vector was measured or a motion vector of magnitude 0.

ここで、ブロックマッチング法により動きベクトルを検出する際に、ブロックサイズを３×３のように小さくすると、ノイズの影響などにより動きベクトルの誤計測が発生しやすくなる。図３に示す構成の動きベクトル場計測手段３１では、誤計測判定手段４４が、検出された動きベクトルが誤計測であるか否かを判定する。この誤計測判定手段４４を用いることで、動きベクトル検出の際のブロックサイズを例えば３×３のように小さくしても、誤計測の影響を低減することができ、動きベクトルを精度よく検出できる。また、ブロックサイズを小さくした分だけ演算量を低減することができ、その結果、動きベクトルを高速に検出することができる。つまり、動きベクトル検出の精度を落とさずに、処理を高速化することができる。 Here, when the motion vector is detected by the block matching method, if the block size is reduced to 3 × 3, an erroneous measurement of the motion vector is likely to occur due to the influence of noise or the like. In the motion vector field measuring means 31 having the configuration shown in FIG. 3, the erroneous measurement determining means 44 determines whether or not the detected motion vector is an erroneous measurement. By using this erroneous measurement determination means 44, even if the block size at the time of motion vector detection is reduced to, for example, 3 × 3, the influence of erroneous measurement can be reduced, and the motion vector can be detected with high accuracy. . In addition, the amount of calculation can be reduced by the reduction in the block size, and as a result, the motion vector can be detected at high speed. That is, the processing can be speeded up without reducing the accuracy of motion vector detection.

なお、動きベクトル場計測手段３１は、各画素位置の周辺の画素位置での動きベクトルの平均を求め、その平均ベクトルを各画素位置の動きベクトルとして出力してもよい。例えば動きベクトル場計測手段３１は、ある画素位置について、その画素位置で検出された動きベクトルと、その周辺の画素位置で検出された動きベクトルとの平均を求める。動きベクトル場計測手段３１は求めた平均ベクトルを、当該画素位置で検出された動きベクトルとして出力することができる。このように複数の画素位置で検出された動きベクトルを平均化する場合、動きベクトルのばらつきを抑えて、より精度よく動きベクトルを計測することができる。 Note that the motion vector field measuring means 31 may compute, for each pixel position, the average of the motion vectors at the surrounding pixel positions and output that average vector as the motion vector at the pixel position. For example, for a given pixel position, the motion vector field measuring means 31 averages the motion vector detected at that pixel position and the motion vectors detected at the surrounding pixel positions. The motion vector field measuring means 31 can output the resulting average vector as the motion vector detected at that pixel position. Averaging the motion vectors detected at multiple pixel positions in this way suppresses variation in the motion vectors and allows them to be measured more accurately.
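
The neighborhood averaging described above might look like the following sketch. The 3 × 3 neighborhood (`radius=1`), the handling of border pixels by averaging only the neighbors that exist, and the function name are assumptions made for illustration.

```python
def average_vector_field(field, radius=1):
    """Replace each vector by the mean of itself and its neighbors."""
    h, w = len(field), len(field[0])
    averaged = []
    for y in range(h):
        row = []
        for x in range(w):
            sum_x = sum_y = count = 0
            # Visit the pixel itself and the surrounding pixels inside the image.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    vx, vy = field[ny][nx]
                    sum_x += vx
                    sum_y += vy
                    count += 1
            row.append((sum_x / count, sum_y / count))
        averaged.append(row)
    return averaged
```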

また誤計測判定手段４４は、スコアＣ１の差分に基づいて誤計測であるか否かを判定するのに代えて、別の判断基準で誤計測であるか否かを判定してもよい。例えば、動きベクトル分布算出手段４１が、上記した動きベクトル分布（第１の動きベクトル分布）に加えて、対象フレームと参照フレームとを入れ替え、過去フレームを対象フレームとし、現在フレームを参照フレームとして別の動きベクトル分布（第２の動きベクトル分布）を算出する。動きベクトル検出手段４３は、第１の動きベクトル分布に基づく動きベクトル（第１の動きベクトル）の検出に加えて、第２の動きベクトル分布に基づいて別の動きベクトル（第２の動きベクトル）を検出する。誤計測判定手段４４は、第１の動きベクトルと第２の動きベクトルとの関係に基づいて、検出された動きベクトルが誤計測であるか否かを判定してもよい。 The erroneous measurement determination means 44 may also determine whether a measurement is erroneous using a different criterion, instead of the difference between the scores C1. For example, in addition to the motion vector distribution described above (first motion vector distribution), the motion vector distribution calculating means 41 swaps the target frame and the reference frame, taking the past frame as the target frame and the current frame as the reference frame, and calculates another motion vector distribution (second motion vector distribution). In addition to detecting a motion vector based on the first motion vector distribution (first motion vector), the motion vector detection means 43 detects another motion vector based on the second motion vector distribution (second motion vector). The erroneous measurement determination means 44 may determine whether the detected motion vector is an erroneous measurement based on the relationship between the first motion vector and the second motion vector.

仮に、第１の動きベクトルが誤計測されたものでないと仮定すると、第１の動きベクトルと第２の動きベクトルとの大きさは同じで、方向は反対方向となると考えられる。つまり、第２の動きベクトルは、第１の動きベクトルの逆ベクトルになると考えられる。誤計測判定手段４４は、第１の動きベクトルと第２の動きベクトルとが逆ベクトルの関係にあるか否かを判断し、逆ベクトルの関係にあるときに誤計測ではないと判定する。逆ベクトルの関係にないときは、誤計測と判定する。誤計測判定手段４４は、第１の動きベクトルの大きさと第２の動きベクトルの大きさとの差が所定のしきい値以内で、かつ、第１の動きベクトルの方向と、第２の動きベクトルの方向を反転させた方向との差が所定のしきい値以内のときに、両者は逆ベクトルの関係にあると判断することができる。 If it is assumed that the first motion vector has not been erroneously measured, the first motion vector and the second motion vector have the same magnitude and the opposite directions. That is, the second motion vector is considered to be an inverse vector of the first motion vector. The erroneous measurement determination unit 44 determines whether or not the first motion vector and the second motion vector have an inverse vector relationship, and determines that there is no erroneous measurement when the inverse vector relationship exists. If there is no inverse vector relationship, it is determined that the measurement is incorrect. The erroneous measurement determination means 44 has a difference between the magnitude of the first motion vector and the magnitude of the second motion vector within a predetermined threshold, the direction of the first motion vector, and the second motion vector. When the difference from the inverted direction is within a predetermined threshold value, it can be determined that the two are in an inverse vector relationship.
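
The inverse-vector test can be sketched as below. The threshold values (`mag_tol`, `angle_tol`) and the function names are hypothetical, since the description only says that predetermined thresholds are used. Treating a pair as consistent when one vector has zero length and the magnitudes already agree is also an assumption, made because a zero vector has no direction.

```python
import math


def is_inverse_pair(v1, v2, mag_tol=1.0, angle_tol=math.radians(20)):
    """True if v2 is approximately the inverse vector of v1."""
    m1 = math.hypot(v1[0], v1[1])
    m2 = math.hypot(v2[0], v2[1])
    if abs(m1 - m2) > mag_tol:
        return False
    if m1 == 0 or m2 == 0:
        return True  # direction undefined; magnitudes already agree (assumption)
    a1 = math.atan2(v1[1], v1[0])
    a2 = math.atan2(-v2[1], -v2[0])  # direction of the second vector, reversed
    diff = abs(a1 - a2)
    diff = min(diff, 2 * math.pi - diff)  # wrap the angle difference into [0, pi]
    return diff <= angle_tol


def is_mismeasured(v1, v2, **tolerances):
    """A vector pair that is not in an inverse relationship is flagged."""
    return not is_inverse_pair(v1, v2, **tolerances)
```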

誤計測判定手段４４は、上記に代えて、対象フレームの自己相関を求め、求めた自己相関を用いて誤計測か否かを判定してもよい。例えば動きベクトル分布算出手段４１は、対象フレームに対して参照フレームをずらしつつ動きベクトル分布を算出するのに加えて、対象フレームに対して対象フレーム自身をずらしつつ動きベクトル分布を算出する。この動きベクトル分布を自己動きベクトル分布と呼ぶこととする。誤計測判定手段４４は、動きベクトル検出手段４３が動きベクトルを検出すると、検出された動きベクトル（ずらし量）に対応する自己動きベクトル分布のスコアを参照し、そのスコアに基づいて誤計測か否かを判定してもよい。例えばある画素位置に対して検出された動きベクトルが（２，２）のとき、誤計測判定手段４４は、ずらし量（２，２）に対応する自己動きベクトル分布における当該座標位置のスコアを参照する。誤計測判定手段４４は、例えば自己動きベクトル分布におけるスコアをしきい値処理し、スコアがしきい値以下であれば誤計測ではないと判定し、スコアがしきい値より大きいとき誤計測であると判定する。 Instead of the above, the erroneous measurement determination means 44 may obtain the autocorrelation of the target frame and use it to determine whether a measurement is erroneous. For example, in addition to calculating the motion vector distribution while shifting the reference frame relative to the target frame, the motion vector distribution calculating means 41 calculates a motion vector distribution while shifting the target frame relative to itself. This motion vector distribution is referred to as the self-motion vector distribution. When the motion vector detection means 43 detects a motion vector, the erroneous measurement determination means 44 may refer to the score of the self-motion vector distribution corresponding to the detected motion vector (shift amount) and determine whether the measurement is erroneous based on that score. For example, when the motion vector detected for a certain pixel position is (2, 2), the erroneous measurement determination means 44 refers to the score at that coordinate position in the self-motion vector distribution corresponding to the shift amount (2, 2). The erroneous measurement determination means 44 then, for example, applies a threshold to the score in the self-motion vector distribution, determining that the measurement is not erroneous if the score is at or below the threshold and that it is erroneous if the score is greater than the threshold.

自己動きベクトル分布を用いた誤計測の判定は、以下の理論に基づく。すなわち、対象フレームのある画素に着目すると、ずらし量０では同一画素間の相関を求めることになるため、自己動きベクトル分布におけるスコアＣ１の値は大きくなる。一方、着目画素が移動物体などに該当する画素である場合に、対象フレームに対して対象フレーム自身をずらしてスコアを算出すると、両者の間で物体の位置がずれることからスコアＣ１の値は小さくなると考えられる。つまり、移動物体などに該当する着目画素に対して算出されるスコアＣ１は、ずらし量０が最大で、ずらし量が大きくなるほどスコアＣ１の値は小さくなると考えられる。 The determination of erroneous measurement using the self-motion vector distribution rests on the following reasoning. Focusing on a given pixel of the target frame, a shift amount of 0 correlates the pixel with itself, so the score C1 in the self-motion vector distribution is large. On the other hand, when the pixel of interest belongs to a moving object or the like, calculating the score while shifting the target frame relative to itself displaces the object between the two, so the score C1 is expected to be small. In other words, the score C1 calculated for a pixel of interest belonging to a moving object or the like is expected to be largest at shift amount 0 and to decrease as the shift amount increases.

一方、着目画素が背景部分などの単調な部分に対応する場合は、上記とは異なり、ずらし量０で算出されたスコアＣ１と、対象フレーム自身をずらして算出されたスコアＣ１とは、それほど差がつかないと考えられる。つまり、単調背景などに該当する着目画素に対して算出されるスコアＣ１は、ずらし量に依存せず、比較的大きな値を取ると考えられる。自己動きベクトル分布において、ずらし量０以外のずらし量においてスコアＣ１の値が大きくなるということは、その大きなスコアとなったずらし量において誤計測が起こっている可能性が高いことを意味すると考えられる。そこで、誤計測判定手段４４は、検出された動きベクトルに対応するずらし量の自己動きベクトル分布のスコアＣ１をしきい値処理し、スコアＣ１の値がしきい値よりも大きいときに、誤計測であると判定できる。 In contrast, when the pixel of interest corresponds to a monotonous portion such as the background, the score C1 calculated at shift amount 0 and the score C1 calculated with the target frame shifted relative to itself are not expected to differ much. That is, the score C1 calculated for a pixel of interest belonging to a monotonous background or the like is expected to take a relatively large value regardless of the shift amount. A large score C1 at a shift amount other than 0 in the self-motion vector distribution is therefore taken to mean that an erroneous measurement is likely at that shift amount. Accordingly, the erroneous measurement determination means 44 can apply a threshold to the score C1 of the self-motion vector distribution at the shift amount corresponding to the detected motion vector and determine that the measurement is erroneous when the score C1 is greater than the threshold.
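
The self-motion vector test can be sketched as follows. The similarity function is a hypothetical stand-in for the score C1 (a negated sum of absolute differences over a 3 × 3 block, so that 0 is the largest possible value), and the threshold value and function names are likewise assumptions. The point of the sketch is only the behavior described above: a flat region keeps a high self-score at a non-zero shift and is flagged, while a distinctive region does not.

```python
def similarity(a, b, x, y, dx, dy, radius=1):
    """Hypothetical stand-in for score C1: 0 is a perfect match, lower is worse."""
    h, w = len(a), len(a[0])
    sad = 0
    for by in range(-radius, radius + 1):
        for bx in range(-radius, radius + 1):
            ay = min(max(y + by, 0), h - 1)
            ax = min(max(x + bx, 0), w - 1)
            sy = min(max(ay + dy, 0), h - 1)
            sx = min(max(ax + dx, 0), w - 1)
            sad += abs(a[ay][ax] - b[sy][sx])
    return -sad


def self_score(frame, x, y, shift):
    """Score of the frame against itself, shifted by the given amount."""
    return similarity(frame, frame, x, y, shift[0], shift[1])


def is_mismeasured(frame, x, y, detected_vector, threshold):
    """Flag the vector if the self-score at that shift is still high."""
    return self_score(frame, x, y, detected_vector) > threshold
```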

別例として、誤計測判定手段４４は、機械学習を用いて生成された判別器を用いて誤計測か否かを判定してもよい。誤計測判定手段４４は、例えば２５通りの動きベクトル分布における各画素位置のスコアを並べたデータ（動きベクトル検出空間のスコア分布）をベクトルデータとして判別器に入力する。判別器は、入力されたデータに対して、誤計測であるか否かを示す信号を出力する。判別器は、動きベクトルが真の動きベクトルであるときの動きベクトル検出空間のスコア分布を正の教師データとし、動きベクトルが誤検出であるときの動きベクトル検出空間のスコア分布を負の教師データとして、正負の教師データを学習することで生成することができる。誤計測判定手段４４は、判別器の出力に基づいて、誤計測か否かを判定することができる。 As another example, the erroneous measurement determination unit 44 may determine whether there is an erroneous measurement by using a discriminator generated using machine learning. For example, the erroneous measurement determination unit 44 inputs data (score distribution in the motion vector detection space) in which the scores of the respective pixel positions in 25 different motion vector distributions are arranged as vector data to the discriminator. The discriminator outputs a signal indicating whether or not the input data is erroneous measurement. The classifier uses the score distribution in the motion vector detection space when the motion vector is a true motion vector as positive teacher data, and the score distribution in the motion vector detection space when the motion vector is a false detection as negative teacher data Can be generated by learning positive and negative teacher data. The erroneous measurement determination means 44 can determine whether or not there is an erroneous measurement based on the output of the discriminator.

上記した誤計測判定の手法は組み合わせて用いてもよい。例えば、スコアの差分に基づいて誤計測か否かを判定する手法と、第１の動きベクトルと第２の動きベクトルとの関係に基づいて誤計測か否かを判定する手法とを組み合わせてもよい。その場合、誤計測判定手段４４は、双方の条件がそろったとき、すなわちスコアの差分がしきい値よりも小さく、かつ、第１の動きベクトルと第２の動きベクトルとが逆ベクトルの関係にないときに誤計測と判定してもよい。あるいは、これに代えて、何れか一方の条件が成立したとき、すなわちスコアの差分がしきい値よりも小さいか、又は第１の動きベクトルと第２の動きベクトルとが逆ベクトルの関係にないときに誤計測と判定してもよい。 The erroneous measurement determination methods described above may be used in combination. For example, the method that determines erroneous measurement based on the score difference may be combined with the method that determines it based on the relationship between the first motion vector and the second motion vector. In that case, the erroneous measurement determination means 44 may determine that a measurement is erroneous when both conditions hold, that is, when the score difference is smaller than the threshold and the first motion vector and the second motion vector are not in an inverse-vector relationship. Alternatively, it may determine that a measurement is erroneous when either condition holds, that is, when the score difference is smaller than the threshold or the first motion vector and the second motion vector are not in an inverse-vector relationship.
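
The two ways of combining the criteria amount to a logical AND or a logical OR of the individual conditions. A minimal sketch, with a hypothetical function name and parameters, is:

```python
def combined_mismeasured(score_gap, gap_threshold, inverse_ok, mode="and"):
    """Combine the score-difference test and the inverse-vector test.

    score_gap  : difference between the scores C1 for the pixel
    inverse_ok : True if the first and second motion vectors are inverse
    mode       : "and" flags only when both conditions hold,
                 "or" flags when either condition holds
    """
    small_gap = score_gap < gap_threshold
    not_inverse = not inverse_ok
    if mode == "and":
        return small_gap and not_inverse
    if mode == "or":
        return small_gap or not_inverse
    raise ValueError("mode must be 'and' or 'or'")
```

The AND form flags fewer vectors as erroneous, while the OR form is stricter and discards more of them.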

続いて、オブジェクト検出手段１２の具体的な構成例を説明する。図５は、オブジェクト検出手段１２の構成例を示す。オブジェクト検出手段１２は、前処理手段２１、平滑化処理手段２２、差分画像生成手段２３、合算手段２４、位置推定手段２５、サイズ推定手段２６、及び、照合手段２７を有する。オブジェクト検出手段１２は、動画像内の特定パターン、例えば人物の頭部をオブジェクトとして検出する。以下では、オブジェクト検出手段１２が、動画像からオブジェクトを１つ検出するものとして説明を行う。 Next, a specific configuration example of the object detection unit 12 will be described. FIG. 5 shows a configuration example of the object detection means 12. The object detection unit 12 includes a preprocessing unit 21, a smoothing processing unit 22, a difference image generation unit 23, a summation unit 24, a position estimation unit 25, a size estimation unit 26, and a collation unit 27. The object detection means 12 detects a specific pattern in the moving image, for example, a human head as an object. In the following description, it is assumed that the object detection unit 12 detects one object from a moving image.

前処理手段２１は、解像度変換手段５１と動き領域抽出手段５２とを有する。解像度変換手段５１は、動画像を構成するフレーム画像を所定の解像度に低解像度化する。解像度変換手段５１は、例えば画像の解像度を縦横それぞれ１／８倍に変換する。解像度変換手段５１が変換する画像の解像度は、動きベクトル場計測手段３１（図３）における解像度変換手段４０が変換する画像の解像度と同一でもよい。なお、オブジェクト検出手段１２と動きベクトル場計測手段３１とが別個に解像度変換手段を有している必要はない。両者に共通の解像度変換手段から低解像度化した画像を提供するようにしてもよい。 The preprocessing unit 21 includes a resolution conversion unit 51 and a motion region extraction unit 52. The resolution conversion means 51 lowers the frame image constituting the moving image to a predetermined resolution. The resolution conversion means 51 converts, for example, the resolution of an image to 1/8 times in the vertical and horizontal directions. The resolution of the image converted by the resolution converting means 51 may be the same as the resolution of the image converted by the resolution converting means 40 in the motion vector field measuring means 31 (FIG. 3). Note that the object detection means 12 and the motion vector field measurement means 31 do not need to have resolution conversion means separately. You may make it provide the image which reduced the resolution from the resolution conversion means common to both.
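
A reduction to 1/8 of the resolution in each direction can be sketched as below. The description does not say how the low-resolution image is computed, so averaging each 8 × 8 block is an assumption, as is the function name; rows and columns that do not fill a whole block are simply dropped.

```python
def downscale(image, factor=8):
    """Reduce resolution by averaging factor x factor blocks (assumed method)."""
    h = len(image) // factor
    w = len(image[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = sum(
                image[y * factor + j][x * factor + i]
                for j in range(factor)
                for i in range(factor)
            )
            row.append(total / (factor * factor))
        out.append(row)
    return out
```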

動き領域抽出手段５２は、動画像を構成するフレーム画像から動き領域を抽出し動き領域抽出画像を生成する。動き領域の抽出には、例えば背景画像やフレーム間画像の差分を算出するなど任意の手法を用いることができる。動き領域抽出手段５２は、抽出された動きの量に応じて、動きがある領域ほど白く（階調値が高く）、動きが少ない領域ほど黒く（階調値が低く）なるようなグレースケール画像を動き領域抽出画像として生成する。動き領域抽出手段５２は、例えば階調数２５６のグレースケール画像に対して所定の関数に従って階調を変換し、白から黒までの階調数を減少させるコントラスト低減処理を実施してもよい。動き領域抽出手段５２は、グレースケール画像に代えて、動き領域を白、背景領域を黒にするような２値化画像を動き領域抽出画像として生成してもよい。 The motion region extraction means 52 extracts motion regions from a frame image constituting the moving image and generates a motion region extraction image. Any method can be used to extract the motion regions, for example calculating a background difference or an inter-frame difference. According to the amount of motion extracted, the motion region extraction means 52 generates, as the motion region extraction image, a grayscale image in which regions with more motion are whiter (higher gradation value) and regions with less motion are blacker (lower gradation value). The motion region extraction means 52 may, for example, convert the gradations of a 256-level grayscale image according to a predetermined function, performing a contrast reduction process that reduces the number of gradations between white and black. Instead of a grayscale image, the motion region extraction means 52 may generate, as the motion region extraction image, a binarized image in which motion regions are white and background regions are black.
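
One possible reading of the motion region extraction is sketched below, using an inter-frame difference. The function names, the uniform quantization used for the contrast reduction (the description only says "a predetermined function"), and the binarization threshold are all assumptions.

```python
def motion_region_image(current, previous):
    """Grayscale motion image: larger inter-frame difference is whiter."""
    return [
        [min(255, abs(c - p)) for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


def reduce_levels(image, levels=4):
    """Contrast reduction by uniform quantization (assumed function)."""
    step = 256 // levels
    return [[(v // step) * step for v in row] for row in image]


def binarize(image, threshold=32):
    """Motion regions become white (255), background black (0)."""
    return [[255 if v >= threshold else 0 for v in row] for row in image]
```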

平滑化処理手段２２には、前処理手段２１で前処理された画像Ｐ（ｘ，ｙ）、すなわち解像度が低解像度化され、動き領域が抽出された画像が入力される。平滑化処理手段２２は、平滑化フィルタを画像に畳み込む処理を繰り返し行うことにより、スケールが異なる複数枚の平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）を生成する。 The image P(x, y) preprocessed by the preprocessing means 21, that is, the image whose resolution has been reduced and from which motion regions have been extracted, is input to the smoothing processing means 22. The smoothing processing means 22 generates a plurality of smoothed images L(x, y, σ_i) with different scales by repeatedly convolving a smoothing filter with the image.

平滑化処理手段２２は、まず画像Ｐ（ｘ，ｙ）に平滑化フィルタを畳み込むことで平滑化画像Ｌ（ｘ，ｙ，σ_１）を生成し、その平滑化画像Ｌ（ｘ，ｙ，σ_１）に更に平滑化フィルタを畳み込むことでスケールσ_２の平滑化画像Ｌ（ｘ，ｙ，σ_２）を生成する。平滑化処理手段２２は、以降同様に平滑化フィルタの畳み込みを繰り返し行い、任意のスケールσ_ｑの平滑化画像Ｌ（ｘ，ｙ，σ_ｑ）から次のスケールσ_ｑ＋１の平滑化画像Ｌ（ｘ，ｙ，σ_ｑ＋１）を生成する。 The smoothing processing means 22 first convolves the smoothing filter with the image P(x, y) to generate the smoothed image L(x, y, σ_1), and then convolves the smoothing filter with that smoothed image L(x, y, σ_1) to generate the smoothed image L(x, y, σ_2) of scale σ_2. The smoothing processing means 22 repeats the convolution in the same way thereafter, generating the smoothed image L(x, y, σ_{q+1}) of the next scale σ_{q+1} from the smoothed image L(x, y, σ_q) of any scale σ_q.

平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）におけるスケール番号ｉは、平滑化フィルタを畳み込んだ回数に相当する。平滑化処理手段２２は、例えばスケールが異なるａ×ｋ枚（ａ及びｋはそれぞれ２以上の整数）の平滑化画像Ｌ（ｘ，ｙ，σ_１）〜Ｌ（ｘ，ｙ，σ_ａ×ｋ）を生成する。平滑化処理手段２２は、例えばａ＝２、ｋ＝３０とすれば２×３０＝６０枚の平滑化画像Ｌ（ｘ，ｙ，σ_１）〜（ｘ，ｙ，σ_６０）を生成する。 The scale number i in the smoothed image L(x, y, σ_i) corresponds to the number of times the smoothing filter has been convolved. The smoothing processing means 22 generates, for example, a × k smoothed images L(x, y, σ_1) to L(x, y, σ_{a×k}) with different scales (a and k are each integers of 2 or more). For example, with a = 2 and k = 30, the smoothing processing means 22 generates 2 × 30 = 60 smoothed images L(x, y, σ_1) to L(x, y, σ_60).
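
The repeated convolution that produces the stack of smoothed images can be sketched as follows. The 3 × 3 kernel coefficients are a hypothetical symmetric Gaussian-like choice and the border handling (clamping) is an assumption; the object-shaped filter described in the text would use different coefficients. In the returned list, entry `i - 1` plays the role of L(x, y, σ_i).

```python
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # hypothetical smoothing operator


def smooth_once(image, kernel=KERNEL):
    """Apply the 3x3 smoothing operator once, clamping at the borders."""
    h, w = len(image), len(image[0])
    norm = sum(sum(row) for row in kernel)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    sy = min(max(y + ky - 1, 0), h - 1)
                    sx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc / norm
    return out


def scale_stack(image, count):
    """Return [L_1, ..., L_count]; L_i is the image smoothed i times."""
    stack = []
    current = image
    for _ in range(count):
        current = smooth_once(current)
        stack.append(current)
    return stack
```

Because each level is computed from the previous one, a × k levels cost a × k applications of the small operator.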

平滑化フィルタには、例えばガウシアンフィルタを用いることができる。平滑化フィルタは、例えば検出対象であるオブジェクトの輪郭形状に合わせたフィルタ特性となる３×３オペレータから成る。例えばオブジェクト検出手段１２で検出対象とするオブジェクトが人物の頭部であれば、平滑化フィルタとして、人物の頭部の輪郭形状に沿って下側のフィルタ係数が小さくなる特性（オメガ形状）を有するフィルタを用いる。このような平滑化フィルタを用いることで、人物の頭部の輪郭形状を有する領域を強調し、それ以外の領域は抑制された平滑化処理を実現できる。 As the smoothing filter, for example, a Gaussian filter can be used. The smoothing filter is composed of, for example, a 3 × 3 operator having a filter characteristic that matches the contour shape of the object to be detected. For example, if the object to be detected by the object detection means 12 is a person's head, the smoothing filter has a characteristic (omega shape) in which the lower filter coefficient decreases along the contour shape of the person's head. Use a filter. By using such a smoothing filter, it is possible to realize a smoothing process in which a region having a contour shape of a person's head is emphasized and other regions are suppressed.

なお、フィルタの形状はオメガ形状には限定されず、例えば特開２００３−２４８８２４号公報等に記載されたものなど、他の公知技術を適用することも可能である。例えば検出対象のオブジェクトの形状が円形、三角形、四角形などの場合には、それぞれのオブジェクト形状に合わせたフィルタ特性を有する平滑化フィルタを用いて平滑化処理を施せばよい。 The shape of the filter is not limited to the omega shape, and other known techniques such as those described in Japanese Patent Application Laid-Open No. 2003-248824 can be applied. For example, when the object to be detected has a circular shape, a triangular shape, a quadrangular shape, or the like, the smoothing process may be performed using a smoothing filter having a filter characteristic matched to each object shape.

差分画像生成手段２３は、平滑化処理手段２２が生成した複数枚の平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）を入力し、スケールが互いに異なる２つの平滑化画像間の差分画像Ｇ（ｘ，ｙ，σ_ｊ）を、スケールを変えつつ複数枚生成する。ここで、差分画像Ｇ（ｘ，ｙ，σ_ｊ）におけるスケール番号ｊの最大値は、平滑化画像Ｌにおけるスケールσ_ｉの最大値（例えばａ×ｋ）よりは小さい。差分画像生成手段２３は、例えばスケール番号ｊに応じたスケールだけ離れた平滑化画像間の差分画像を生成する。具体的には、差分画像生成手段２３は、例えば下記式３を用いて差分画像Ｇ（ｘ，ｙ，σ_ｊ）を生成することができる。
Ｇ（ｘ，ｙ，σ_ｊ）＝Ｌ（ｘ，ｙ，σ_ｊ）−Ｌ（ｘ，ｙ，σ_ｊ×ａ）・・・（３）
差分画像は、差分値の絶対値であってもよい。 The difference image generation means 23 receives the plurality of smoothed images L(x, y, σ_i) generated by the smoothing processing means 22 and generates a plurality of difference images G(x, y, σ_j), each being the difference between two smoothed images of different scales, while varying the scale. Here, the maximum value of the scale number j of the difference images G(x, y, σ_j) is smaller than the maximum scale number i of the smoothed images L (for example, a × k). The difference image generation means 23 generates, for example, a difference image between smoothed images separated by a scale that depends on the scale number j. Specifically, the difference image generation means 23 can generate the difference image G(x, y, σ_j) using, for example, Equation 3 below.
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j×a})   (3)
The difference image may hold the absolute values of the differences.

上記の式３の定義からわかるように、差分画像Ｇ（ｘ，ｙ，σ_ｊ）は、スケールσ_ｊの平滑化画像と、スケールσ_ｊ×ａの平滑化画像との差分として定義される。例えばａ＝２、ｋ＝３０とすると、差分画像生成手段２３は、スケールσ_１とσ_２、スケールσ_２とσ_４、スケールσ_３とσ_６、・・・、スケールσ_３０とσ_６０の組み合わせからなる３０枚の差分画像Ｇ（ｘ，ｙ，σ_１）〜（ｘ，ｙ，σ_３０）を生成する。式３に従って差分画像Ｇ（ｘ，ｙ，σ_ｊ）を生成する場合、ｊは１〜ｋの値を取る。すなわち、差分画像生成手段２３は、ｋ枚の差分画像Ｇ（ｘ，ｙ，σ_１）〜（ｘ，ｙ，σ_ｋ）を生成する。 As the definition in Equation 3 shows, the difference image G(x, y, σ_j) is defined as the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j×a}. For example, with a = 2 and k = 30, the difference image generation means 23 generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_2, σ_2 and σ_4, σ_3 and σ_6, ..., σ_30 and σ_60. When the difference images G(x, y, σ_j) are generated according to Equation 3, j takes values from 1 to k. That is, the difference image generation means 23 generates k difference images G(x, y, σ_1) to G(x, y, σ_k).
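
The pairing of scales in Equation 3 can be checked with a short sketch. The function names and the list layout, in which `stack[i - 1]` holds the smoothed image of scale number i, are assumptions made for illustration.

```python
def scale_pairs_eq3(a, k):
    """Scale-number pairs (j, j*a) used by Equation 3, for j = 1..k."""
    return [(j, j * a) for j in range(1, k + 1)]


def difference_images_eq3(stack, a, k, absolute=False):
    """G_j = L_j - L_{j*a}; stack[i - 1] holds L_i and needs a*k entries."""
    diffs = []
    for j, ja in scale_pairs_eq3(a, k):
        fine, coarse = stack[j - 1], stack[ja - 1]
        diffs.append([
            [abs(f - c) if absolute else f - c for f, c in zip(row_f, row_c)]
            for row_f, row_c in zip(fine, coarse)
        ])
    return diffs
```

For a = 2 and k = 30, `scale_pairs_eq3(2, 30)` yields the 30 pairs from (1, 2) to (30, 60) listed in the text.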

差分画像生成手段２３は、上記に代えて、一定のスケールだけ離れた平滑化画像間の差分を差分画像として生成してもよい。差分画像生成手段２３は、例えばスケールσ_ｊの平滑化画像と、スケールσ_ｊ＋ｐの平滑化画像（ｐは１以上の整数）との差分を差分画像Ｇ（ｘ，ｙ，σ_ｊ）として生成してもよい。具体的には、差分画像生成手段２３は、下記式４を用いて差分画像Ｇ（ｘ，ｙ，σ_ｊ）を生成してもよい。
Ｇ（ｘ，ｙ，σ_ｊ）＝Ｌ（ｘ，ｙ，σ_ｊ）−Ｌ（ｘ，ｙ，σ_ｊ＋ｐ）・・・（４）
この場合、平滑化画像の枚数をｒ（ｒ：３以上の整数）枚とすると、ｊは１〜ｒ−ｐの値を取る。すなわち差分画像生成手段２３は、ｒ−ｐ枚の差分画像Ｇ（ｘ，ｙ，σ_１）〜（ｘ，ｙ，σ_ｒ−ｐ）を生成する。具体的には、ｒ＝６０で、ｐ＝３０の場合、差分画像生成手段２３は、スケールσ_１とσ_３１、スケールσ_２とσ_３２、スケールσ_３とσ_３３、・・・、スケールσ_３０とσ_６０の組み合わせからなる３０枚の差分画像Ｇ（ｘ，ｙ，σ_１）〜（ｘ，ｙ，σ_３０）を生成する。 Instead of the above, the difference image generation means 23 may generate, as the difference image, the difference between smoothed images separated by a fixed scale interval. For example, the difference image generation means 23 may generate the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j+p} (p is an integer of 1 or more) as the difference image G(x, y, σ_j). Specifically, the difference image generation means 23 may generate the difference image G(x, y, σ_j) using Equation 4 below.
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j+p})   (4)
In this case, if the number of smoothed images is r (r: an integer of 3 or more), j takes values from 1 to r − p. That is, the difference image generation means 23 generates r − p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}). Specifically, when r = 60 and p = 30, the difference image generation means 23 generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_31, σ_2 and σ_32, σ_3 and σ_33, ..., σ_30 and σ_60.
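
The fixed-interval pairing of Equation 4 can be sketched in the same way. As before, the function names and the layout in which `stack[i - 1]` holds the smoothed image of scale number i are assumptions.

```python
def scale_pairs_eq4(r, p):
    """Scale-number pairs (j, j+p) used by Equation 4, for j = 1..r-p."""
    return [(j, j + p) for j in range(1, r - p + 1)]


def difference_images_eq4(stack, p):
    """G_j = L_j - L_{j+p}; stack[i - 1] holds L_i, r = len(stack)."""
    diffs = []
    for j, jp in scale_pairs_eq4(len(stack), p):
        fine, coarse = stack[j - 1], stack[jp - 1]
        diffs.append([
            [f - c for f, c in zip(row_f, row_c)]
            for row_f, row_c in zip(fine, coarse)
        ])
    return diffs
```

For r = 60 and p = 30, `scale_pairs_eq4(60, 30)` yields the 30 pairs from (1, 31) to (30, 60) given in the text.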

合算手段２４は、差分画像生成手段２３が生成した複数枚の差分画像Ｇ（ｘ，ｙ，σ_ｊ）を合算し、合算画像ＡＰ（ｘ，ｙ）を生成する。位置推定手段２５は、合算画像ＡＰ（ｘ，ｙ）における画素値に基づいてオブジェクトの位置を推定する。位置推定手段２５は、例えば合算画像ＡＰ（ｘ，ｙ）において画素値（差分値を合計した値）が最も大きくなる位置を調べ、その位置をオブジェクトの位置として推定する。 The summing means 24 sums the plurality of difference images G(x, y, σ_j) generated by the difference image generation means 23 to generate a summed image AP(x, y). The position estimation means 25 estimates the position of the object based on the pixel values in the summed image AP(x, y). For example, the position estimation means 25 finds the position where the pixel value (the sum of the difference values) of the summed image AP(x, y) is largest and takes that position as the estimated position of the object.

サイズ推定手段２６は、複数枚の差分画像Ｇ（ｘ，ｙ，σ_ｊ）の画素値を比較し、最大の画素値を有する差分画像のスケールに基づいて、検出すべきオブジェクトのサイズを推定する。サイズ推定手段２６は、例えば最大の画素値（差分値）を有する差分画像の生成元となった２枚の平滑化画像のうちのスケールが小さい方の平滑化画像内のスケールに基づいてオブジェクトのサイズを推定する。すなわちサイズ推定手段２６は、式３又は式４に従って生成される複数枚の差分画像Ｇ（ｘ，ｙ，σ_ｊ）のうちで、最大の差分値を有するスケールσ_ｊを求め、求めたスケールσ_ｊに基づいてオブジェクトのサイズを推定する。 The size estimation means 26 compares the pixel values of the plurality of difference images G(x, y, σ_j) and estimates the size of the object to be detected based on the scale of the difference image having the maximum pixel value. For example, the size estimation means 26 estimates the size of the object based on the scale of the smaller-scale image of the two smoothed images from which the difference image having the maximum pixel value (difference value) was generated. That is, among the plurality of difference images G(x, y, σ_j) generated according to Equation 3 or Equation 4, the size estimation means 26 finds the scale σ_j having the maximum difference value and estimates the size of the object based on that scale σ_j.
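
The summation, the position estimate, and the choice of scale number can be sketched as follows. The function names are hypothetical, ties are resolved by taking the first maximum found, and the conversion from scale number to an object size in pixels is not shown because this passage does not specify it.

```python
def sum_images(diffs):
    """Summed image AP(x, y): pixel-wise sum of all difference images."""
    h, w = len(diffs[0]), len(diffs[0][0])
    return [[sum(d[y][x] for d in diffs) for x in range(w)] for y in range(h)]


def estimate_position(ap):
    """Return (x, y) of the largest pixel value in the summed image."""
    best = max(
        ((value, x, y) for y, row in enumerate(ap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


def estimate_scale_number(diffs):
    """Return the 1-based scale number j whose difference image peaks highest."""
    peaks = [max(max(row) for row in d) for d in diffs]
    return peaks.index(max(peaks)) + 1
```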

上記のオブジェクトの位置及びサイズの推定について説明する。平滑化処理手段２２は、オブジェクト形状に合わせたフィルタ特性を有する平滑化フィルタを用いて平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）を生成しており、この平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）は、特定の形状を持つ領域が強調され、他の領域が抑制された画像となる。例えば平滑化処理を数十回行ったときでも平滑化画像内にオブジェクトの輪郭成分が残るが、スケールσ_ｉが大きくなるほど、オブジェクトの領域はボケていくと共に広がっていく。 The estimation of the position and size of the object will be described. The smoothing processing means 22 generates a smoothed image L (x, y, σ _i ) using a smoothing filter having a filter characteristic matched to the object shape, and this smoothed image L (x, y, σ _i ) is an image in which a region having a specific shape is emphasized and other regions are suppressed. For example, even when the smoothing process is performed several tens of times, the contour component of the object remains in the smoothed image, but as the scale σ _i increases, the area of the object blurs and expands.

平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）におけるオブジェクトの形状及びサイズは、入力画像内のオブジェクトの形状及びサイズとそれぞれ一致していると仮定する。この平滑化画像Ｌ（ｘ，ｙ，σ_ｉ）でのオブジェクト形状及びサイズの顕著性を算出するために、あるスケールの平滑化画像に対して、そのスケールよりもスケールが大きい平滑化画像を背景として設定する。すなわちスケールσ_ｊの平滑化画像Ｌ（ｘ，ｙ，σ_ｊ）に対して、式３ではスケールσ_ｊ×ａの平滑化画像Ｌ（ｘ，ｙ，σ_ｊ×ａ）を背景画像として設定し、式４ではスケールσ_ｊ＋ｐの平滑化画像Ｌ（ｘ，ｙ，σ_ｊ＋ｐ）を背景として設定する。そして、式３又は式４に従って、スケールσ_ｊの平滑化画像と背景画像として設定する平滑化画像との差分画像Ｇ（ｘ，ｙ，σ_ｊ）が、スケールσ_ｊの平滑化画像Ｌ（ｘ，ｙ，σ_ｊ）におけるオブジェクトの顕著性として算出される。このように差分画像生成手段２３においてオブジェクトの顕著性を数値化し、位置推定手段２５及びサイズ推定手段２６において、差分画像生成手段２３において数値化されたオブジェクトの顕著性に基づいて、オブジェクトの位置及びサイズをそれぞれ推定する。 It is assumed that the shape and size of the object in the smoothed image L(x, y, σ_i) match the shape and size of the object in the input image. To calculate the saliency of the object shape and size in the smoothed image L(x, y, σ_i), a smoothed image of larger scale is set as the background for the smoothed image of a given scale. That is, for the smoothed image L(x, y, σ_j) of scale σ_j, Equation 3 sets the smoothed image L(x, y, σ_{j×a}) of scale σ_{j×a} as the background image, and Equation 4 sets the smoothed image L(x, y, σ_{j+p}) of scale σ_{j+p} as the background. Then, according to Equation 3 or Equation 4, the difference image G(x, y, σ_j) between the smoothed image of scale σ_j and the smoothed image set as the background image is calculated as the saliency of the object in the smoothed image L(x, y, σ_j) of scale σ_j. The difference image generation means 23 thus quantifies the saliency of the object, and the position estimation means 25 and the size estimation means 26 estimate the position and size of the object, respectively, based on the saliency quantified by the difference image generation means 23.

ここで、画像内においてオブジェクトが理想形状、すなわちフィルタ特性に最も合致した形状であって、かつ背景にノイズがない差分画像が、他の差分画像に比べて最大の信号を有する。言い換えれば、前処理済みの画像Ｐ（ｘ，ｙ）内のオブジェクトを構成する各画素の成分がオブジェクトの領域にほぼ等しくなるまで広がったとき、差分画像Ｇ（ｘ，ｙ，σ_ｊ）内の差分値は最大となる。例えば画像Ｐ（ｘ，ｙ）内のオブジェクトが直径１０画素の円形画素から構成される場合、複数の差分画像のうちで、ｊ＝１０の差分画像Ｇ（ｘ，ｙ，σ_１０）（式３ではＬ（ｘ，ｙ，σ_１０）−Ｌ（ｘ，ｙ，σ_ａ×１０）、式４ではＬ（ｘ，ｙ，σ_１０）−Ｌ（ｘ，ｙ，σ_１０＋ｐ））における差分値が、他の差分画像における差分値に比べて大きな値を有することになる。 Here, a difference image in which the object has the ideal shape, that is, the shape that best matches the filter characteristic, and in which the background is free of noise carries the largest signal compared with the other difference images. In other words, the difference values in the difference image G(x, y, σ_j) are largest when the components of the pixels constituting the object in the preprocessed image P(x, y) have spread until they roughly cover the region of the object. For example, when the object in the image P(x, y) consists of a circular group of pixels 10 pixels in diameter, the difference values in the difference image G(x, y, σ_10) with j = 10 (L(x, y, σ_10) − L(x, y, σ_{a×10}) in Equation 3, L(x, y, σ_10) − L(x, y, σ_{10+p}) in Equation 4) are larger than the difference values in the other difference images.

一方で、実際に画像内に映し出されるオブジェクトは、カメラとオブジェクトの位置関係や個体差などに応じて映り方が異なり、オブジェクトの輪郭形状及びサイズは理想形状になるとは限らない。つまり、オブジェクトの輪郭形状及びサイズは変動する。そこで、位置推定手段２５は、複数の差分画像Ｇ（ｘ，ｙ，σ_ｊ）を合算した合算画像ＡＰ（ｘ，ｙ）を用いてオブジェクトの位置を推定する。このようにすることで、オブジェクトの変動を吸収しながらオブジェクトの位置を推定できる。つまり、サイズが小さいオブジェクトからサイズが大きいオブジェクトに含まれる様々な輪郭形状の変動を持つオブジェクトに対して、平滑化画像を加算した合算画像ＡＰ（ｘ，ｙ）から最大値を検出することにより、変動を吸収しながら位置推定を行うことができる。 In practice, however, an object appearing in an image looks different depending on the positional relationship between the camera and the object, individual differences, and so on, and the contour shape and size of the object are not necessarily ideal. That is, the contour shape and size of the object vary. The position estimation means 25 therefore estimates the position of the object using the summed image AP(x, y) obtained by summing the plurality of difference images G(x, y, σ_j). This makes it possible to estimate the position of the object while absorbing its variation. In other words, for objects ranging from small to large and exhibiting various contour shape variations, detecting the maximum value in the summed image AP(x, y) obtained by adding the smoothed images allows position estimation that absorbs the variation.

また、上述したように、式３、４におけるスケール番号ｊは、画像Ｐ（ｘ，ｙ）内における検出対象のオブジェクトのサイズに対応するパラメータである。オブジェクトのサイズが小さい場合にはスケール番号ｊが小さい差分画像Ｇ（ｘ，ｙ，σ_ｊ）から最大値が検出され、オブジェクトのサイズが大きい場合にはスケール番号ｊが大きい差分画像Ｇ（ｘ，ｙ，σ_ｊ）から最大値が検出される。サイズ推定手段２６は、この性質を利用し、複数の差分画像の間で差分値同士を比較し、最大の差分値となる差分画像のスケール番号、すなわち平滑化処理の繰り返し回数からオブジェクトのサイズを推定する。 As described above, the scale number j in Equations 3 and 4 is a parameter corresponding to the size of the object to be detected in the image P(x, y). When the object is small, the maximum value is found in a difference image G(x, y, σ_j) with a small scale number j; when the object is large, it is found in a difference image G(x, y, σ_j) with a large scale number j. Using this property, the size estimation means 26 compares the difference values across the plurality of difference images and estimates the size of the object from the scale number of the difference image with the maximum difference value, that is, from the number of repetitions of the smoothing process.

照合手段２７は、位置推定手段２５から推定されたオブジェクトの位置を入力し、サイズ推定手段２６から推定されたオブジェクトのサイズを入力する。照合手段２７は、入力されたオブジェクトの位置及びサイズの情報を用いて、入力画像（フレーム画像）からオブジェクトを検出する。より詳細には、照合手段２７は、推定された位置の周辺領域をオブジェクトが存在する確率が高い領域として、その周辺領域からオブジェクトを検出する。このとき照合手段２７は、サイズ推定手段２６で推定されたサイズのオブジェクトを検出する。照合手段２７が行うオブジェクト検出には、パターンマッチングやニューラルネットワークを用いたオブジェクト検出など、任意のオブジェクト検出手法を用いることができる。 The collating unit 27 inputs the position of the object estimated from the position estimating unit 25 and inputs the size of the object estimated from the size estimating unit 26. The collating unit 27 detects an object from the input image (frame image) using the input position and size information of the object. More specifically, the collating unit 27 detects an object from the surrounding area by setting the surrounding area of the estimated position as an area where the probability that the object exists is high. At this time, the collating unit 27 detects the object having the size estimated by the size estimating unit 26. For the object detection performed by the matching unit 27, any object detection method such as pattern matching or object detection using a neural network can be used.

FIG. 6 shows the operation procedure of the object detection means 12. The preprocessing means 21 reads a frame image from the frame memory 11 (FIG. 1) and preprocesses it (step S21). That is, the resolution conversion means 51 reduces the frame image to a predetermined resolution, and the motion region extraction means 52 extracts a motion region from the reduced-resolution frame image. The preprocessing means 21 inputs the preprocessed image to the smoothing processing means 22, that is, the image P(x, y) whose resolution has been reduced and which has been converted to grayscale so that the motion region is white and the background region is black. Either or both of the resolution conversion and the motion region extraction in the preprocessing means 21 may be omitted. When both are omitted, the frame image itself is input to the smoothing processing means 22.

The smoothing processing means 22 receives the image P(x, y) and repeatedly convolves it with a smoothing filter, thereby generating a plurality of smoothed images L(x, y, σ_i) with different scales (step S22). The smoothing processing means 22 may instead convolve the smoothing filter with the frame image itself. The difference image generation means 23 calculates the difference between two smoothed images with different scales and generates a difference image G(x, y, σ_j) (step S23). For example, using Expression 3, the difference image generation means 23 generates k difference images G(x, y, σ_1) to G(x, y, σ_k) with scale numbers 1 to k from a × k smoothed images L(x, y, σ_i). Alternatively, using Expression 4, the difference image generation means 23 generates r − p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}) with scale numbers 1 to r − p from r smoothed images L(x, y, σ_i).
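
Steps S22 and S23 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the 3 × 3 mean filter stands in for the unspecified smoothing filter, the sign convention G_j = L_j − L_{a×j} is an assumed reading of Expression 3, and the function names `box_smooth`, `smoothed_stack`, and `difference_images` are hypothetical.

```python
def box_smooth(img):
    """One pass of a 3x3 mean filter with edge replication (stand-in smoothing filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def smoothed_stack(p, count):
    """Return [L_1, ..., L_count], where L_i is p smoothed i times (step S22)."""
    stack, cur = [], p
    for _ in range(count):
        cur = box_smooth(cur)
        stack.append(cur)
    return stack


def difference_images(stack, a, k):
    """Return [G_1, ..., G_k] with G_j = L_j - L_{a*j} (assumed form of Expression 3)."""
    diffs = []
    for j in range(1, k + 1):
        near, far = stack[j - 1], stack[a * j - 1]
        diffs.append([[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(near, far)])
    return diffs


# Toy input: a 3x3 white blob on a 9x9 black background.
P = [[1.0 if 3 <= x <= 5 and 3 <= y <= 5 else 0.0 for x in range(9)] for y in range(9)]
a, k = 2, 3
L = smoothed_stack(P, a * k)      # a * k = 6 smoothed images
G = difference_images(L, a, k)    # k = 3 difference images
```

With a white object on a black background, the less-smoothed image is brighter at the object center than the more-smoothed one, so the difference is positive there under the assumed sign convention.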

The summing means 24 sums the plurality of difference images generated by the difference image generation means 23 to generate a summed image AP(x, y) (step S24). For example, the summing means 24 adds together all the pixel values of the k difference images G(x, y, σ_1) to G(x, y, σ_k) generated by the difference image generation means 23. The position estimation means 25 estimates the position of the object based on the summed image AP(x, y) (step S25). For example, the position estimation means 25 compares the pixel values (summed difference values) at the pixel positions of the summed image AP(x, y) and estimates the pixel position having the maximum pixel value as the position of the object.

The summing means 24 need not sum all the difference images. For example, the summing means 24 may sum any number of the k difference images, with any scale numbers. The summing means 24 may change the number of difference images used in the addition (the scales of the difference images to be summed) according to, for example, the range of size variation to be absorbed. For example, the range of size variation to be absorbed may be set according to the type of object to be detected, and for a certain object the difference images with small scale numbers, specifically G(x, y, σ_1) and G(x, y, σ_2) with scale numbers 1 and 2, may be excluded so that the difference images G(x, y, σ_3) to G(x, y, σ_k) with scale numbers 3 to k are summed. The summing means 24 may also sum the difference images G(x, y, σ_j) from scale number 1 up to an arbitrary scale number smaller than k.
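
The summation of step S24, including the option of summing only selected scale numbers, and the position estimate of step S25 can be sketched as follows. The helper names `sum_images` and `estimate_position` and the toy difference images are illustrative assumptions, not part of the embodiment.

```python
def sum_images(diffs, scales=None):
    """Pixel-wise sum of the selected difference images; scale numbers are 1-based."""
    if scales is None:
        scales = range(1, len(diffs) + 1)   # default: sum all k difference images
    h, w = len(diffs[0]), len(diffs[0][0])
    ap = [[0.0] * w for _ in range(h)]
    for j in scales:
        for y in range(h):
            for x in range(w):
                ap[y][x] += diffs[j - 1][y][x]
    return ap


def estimate_position(ap):
    """Return (x, y) of the pixel with the maximum value in the summed image."""
    best_val, best_xy = None, None
    for y, row in enumerate(ap):
        for x, v in enumerate(row):
            if best_val is None or v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy


# Toy difference images G_1..G_3 (3 rows x 4 columns each).
G = [
    [[0, 1, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 3, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 5], [0, 1, 0, 0], [0, 0, 0, 0]],
]
AP = sum_images(G)                          # all scales
pos_all = estimate_position(AP)
pos_small = estimate_position(sum_images(G, scales=[1, 2]))  # exclude scale 3
```

Restricting `scales` changes which object sizes contribute to the summed image, which is how the range of absorbed size variation would be adjusted in this sketch.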

The size estimation means 26 estimates the size of the object based on the plurality of difference images G(x, y, σ_j) (step S26). For example, the size estimation means 26 compares, across the k difference images, the pixel values (difference values) of the pixels around the object position estimated by the position estimation means 25, and identifies the scale of the difference image that gives the maximum pixel value. Alternatively, the size estimation means 26 may compare the pixel values of all pixels of the difference images, not only those around the estimated object position, and identify the scale of the difference image that gives the maximum pixel value. Because the extent to which the smoothing process spreads (blurs) a feature in the image is known, once the scale giving the maximum difference is found, the size of the object can be estimated from that scale number. Since the size of the object to be detected varies as described above, the size estimation means 26 may estimate the object size as the size estimated from the difference image with the largest difference value, ±α (where α is a predetermined value).
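
A minimal sketch of step S26 follows. The neighborhood `radius`, the linear mapping from scale number to size in `size_range`, and the default values of `pixels_per_scale` and `alpha` are hypothetical; the embodiment only states that the blur extent per scale is known and that a margin ±α may be applied.

```python
def estimate_scale(diffs, pos, radius=1):
    """Return the 1-based scale number whose difference image holds the largest
    value within `radius` pixels of the estimated position pos = (x, y)."""
    px, py = pos
    h, w = len(diffs[0]), len(diffs[0][0])
    best_scale, best_val = None, None
    for j, g in enumerate(diffs, start=1):
        for y in range(max(0, py - radius), min(h, py + radius + 1)):
            for x in range(max(0, px - radius), min(w, px + radius + 1)):
                if best_val is None or g[y][x] > best_val:
                    best_val, best_scale = g[y][x], j
    return best_scale


def size_range(scale, pixels_per_scale=4, alpha=2):
    """Hypothetical linear scale-to-size mapping, returned as (size - alpha, size + alpha)."""
    size = scale * pixels_per_scale
    return (size - alpha, size + alpha)


# Toy difference images G_1..G_3 (3 rows x 4 columns each).
G = [
    [[0, 1, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 3, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 5], [0, 1, 0, 0], [0, 0, 0, 0]],
]
scale = estimate_scale(G, (1, 1))   # the value 5 in G_3 lies outside the neighborhood
sizes = size_range(scale)
```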

The matching means 27 detects the object in the frame image using the estimated position and size of the object (step S27). For example, when the matching means 27 detects the object by pattern matching, it performs the pattern matching in the area surrounding the estimated object position, using a pattern whose size matches the estimated object size. Because the matching means 27 searches only around the estimated position for an object of the estimated size, the object can be detected efficiently.

As a comparative example, consider object detection using DOG images. That approach requires computing all the differences between smoothed images of adjacent scales, so the number of difference images that must be generated is large. With the object detection means 12 shown in FIG. 5, it suffices to generate, as a difference image, the difference between a smoothed image of a given scale and a smoothed image whose scale is a predetermined number of scales away, so fewer difference images are generated than in object detection using DOG images. The position of the object can therefore be estimated efficiently and accurately. In addition, the object detection means 12 configured as shown in FIG. 5 can estimate the size of the object without generating multi-resolution images, so the size can also be estimated efficiently.

In particular, when the smoothing processing means 22 generates a × k smoothed images L(x, y, σ_1) to L(x, y, σ_{a×k}), and the difference image generation means 23 uses Expression 3 to obtain, as the difference image G(x, y, σ_j), the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{a×j}) of scale σ_{a×j}, the position of the object can be estimated accurately across a wide range of variations in object size. The size of the object can also be estimated accurately.

In the above description, the motion region extraction means 52 performs grayscale conversion or binarization so that the motion region (object) is white and the background region is black, but the operation of the motion region extraction means 52 is not limited to this. For example, the motion region extraction means 52 may perform grayscale conversion or binarization so that the motion region is black and the background region is white. In that case, the position estimation means 25 estimates the pixel position with the minimum pixel value in the summed image AP(x, y) as the position of the object, and the size estimation means 26 estimates the size of the object based on the scale of the difference image that gives the minimum pixel value (difference value) among the plurality of difference images.

The above description covers an example in which the object detection means 12 detects a single object from the moving image, but the object detection means 12 may detect a plurality of objects. Let M be the number of objects to be detected by the object detection means 12. In that case, the position estimation means 25 sorts the pixel values of the summed image AP(x, y) in descending order and estimates the top M pixel positions as the positions of the respective objects. The size estimation means 26 then estimates the size of each object based on the scale of the difference image that gives the maximum pixel value around each of the M estimated object positions.
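
Selecting the top M positions from the summed image can be sketched as follows. The function name `top_m_positions` and the toy image are illustrative assumptions. The sketch follows the text literally by ranking individual pixels; separating neighboring pixels that belong to the same object is not addressed here.

```python
def top_m_positions(ap, m):
    """Return the (x, y) positions of the m largest pixel values, largest first."""
    cells = [(v, x, y) for y, row in enumerate(ap) for x, v in enumerate(row)]
    cells.sort(key=lambda c: c[0], reverse=True)   # descending by pixel value
    return [(x, y) for _, x, y in cells[:m]]


# Toy summed image AP (3 rows x 4 columns).
AP = [
    [0, 1, 0, 5],
    [0, 6, 1, 0],
    [0, 0, 0, 0],
]
positions = top_m_positions(AP, 2)
```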

The object detection means 12 excludes from detection any object that is being tracked by the first tracking means 32 of the object tracking processing means 13. For example, the object detection means 12 receives from the first tracking means 32 the position in the current frame of each object being tracked, and generates a mask image in which the pixel value at the position of each tracked object is "0" and the pixel value at every other position is "1". The position estimation means 25 takes the product of the summed image AP(x, y) and the mask image and estimates the position with the maximum pixel value in the resulting image as the position of the object. Likewise, the size estimation means 26 takes the product of each difference image G(x, y, σ_j) and the mask image and estimates the size of the object based on the scale that gives the maximum pixel value among the resulting images. The method of excluding the positions of tracked objects from object detection is not limited to the use of a mask image; any method may be used.
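
The mask-based exclusion can be sketched as follows. The `radius` parameter, which zeroes a square neighborhood around each tracked position, is an assumption standing in for the extent of the tracked object; the function names are hypothetical. The sketch also assumes nonnegative pixel values, since a masked value of 0 could otherwise become the maximum.

```python
def make_mask(width, height, tracked, radius=0):
    """Mask image: 0 within `radius` of each tracked (x, y) position, 1 elsewhere."""
    mask = [[1] * width for _ in range(height)]
    for tx, ty in tracked:
        for y in range(max(0, ty - radius), min(height, ty + radius + 1)):
            for x in range(max(0, tx - radius), min(width, tx + radius + 1)):
                mask[y][x] = 0
    return mask


def apply_mask(img, mask):
    """Pixel-wise product of an image and the mask image."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


def argmax_position(img):
    """Return (x, y) of the maximum pixel value."""
    best_val, best_xy = None, None
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if best_val is None or v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy


# Toy summed image; an object is already being tracked at (1, 1).
AP = [
    [0, 1, 0, 5],
    [0, 6, 1, 0],
    [0, 0, 0, 0],
]
mask = make_mask(4, 3, tracked=[(1, 1)])
masked = apply_mask(AP, mask)
new_pos = argmax_position(masked)   # strongest response outside tracked objects
```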

Next, the overall operation procedure of the moving image object detection device 10 (FIG. 1) will be described. FIG. 7 shows the operation procedure of the moving image object detection device 10. The frame memory 11 stores the sequentially input frame images (step S31). The motion vector field measuring means 31 (FIG. 3) measures the motion vector field based on the current frame image and a past frame image (step S32), for example according to the procedure shown in FIG. 4.

The first tracking means 32 acquires from the object list storage unit 14 the position, in the past frame image, of the object to be processed (step S33). The first tracking means 32 tracks the object in the direction from the past frame image to the current frame image and estimates the position of the object in the current frame image (step S34). This estimate is based on the motion vector field obtained in step S32 and the position of the object in the past frame image.

The first tracking means 32 stores the estimated position of the object in the current frame image in the object list storage unit 14. For example, the first tracking means 32 stores the ID of the object being processed, the frame ID of the current frame image, and the estimated position of the object in the object list storage unit 14 in association with one another.

The first tracking means 32 determines whether any unprocessed object remains in the object list storage unit 14 (step S35). If one does, the first tracking means 32 returns to step S33, selects one of the unprocessed objects, and acquires the position of the selected object.

The first tracking means 32 repeats steps S33 to S35 until no unprocessed object remains; for each object stored in the object list storage unit 14, it estimates the position of the object in the current frame and stores the estimated position in the object list storage unit 14. When no unprocessed object remains, the tracking of objects in the direction from the past frame image to the current frame image ends. When no object at all is stored in the object list storage unit 14, steps S33 to S35 are skipped.
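
The loop of steps S33 to S35 can be sketched as follows. The data layout is an assumption made for illustration: the object list is a dictionary mapping an object ID to a dictionary of frame ID to (x, y) position, and `flow[y][x]` holds the (dx, dy) displacement from the past frame to the current frame at pixel (x, y). Bounds checking is omitted for brevity.

```python
def track_forward(object_list, flow, prev_frame, cur_frame):
    """Steps S33-S35: move each stored object by the motion vector at its past position."""
    for obj_id, track in object_list.items():
        if prev_frame not in track:
            continue                              # no past position to start from
        x, y = track[prev_frame]                  # S33: position in the past frame
        dx, dy = flow[y][x]                       # motion vector at that position
        track[cur_frame] = (x + dx, y + dy)       # S34: estimate, then store


# Toy data: a 4x4 field in which everything moves one pixel to the right.
flow = [[(1, 0)] * 4 for _ in range(4)]
object_list = {"A": {0: (1, 2)}}
track_forward(object_list, flow, prev_frame=0, cur_frame=1)
```

If `object_list` is empty, the loop body never runs, which corresponds to skipping steps S33 to S35.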

The object detection means 12 detects a predetermined number of objects, for example one, from the current frame image (step S36), for example according to the procedure shown in FIG. 6. The object detection means 12 then determines whether an object has been detected (step S37). If the object detection means 12 could not detect an object in the current frame image, the processing for the current frame image ends.

When it detects an object, the object detection means 12 stores the position of the detected object in the object list storage unit 14 (step S38). For example, the object detection means 12 assigns a new object ID to the object detected in the current frame image, and stores the assigned object ID, the frame ID of the current frame image, and the position of the detected object in the object list storage unit 14 in association with one another.

The object detection means 12 notifies the second tracking means 33 that a new object has been detected in the current frame image. The second tracking means 33 tracks the object back to times earlier than that of the current frame image and estimates the position of the object in the past frames (step S39). This estimate is based on the motion vector field obtained in step S32 and the position of the object detected in the current frame. The second tracking means 33 tracks the object backward for, for example, a predetermined number of frames. Alternatively, it may track the object backward until the object can no longer be tracked.
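
Backward tracking in step S39 can be sketched as follows. Several points are assumptions made for illustration: the whole-frame motion vector field of each frame is kept in a dictionary `flows` keyed by frame ID, `flows[t][y][x]` is taken as the (dx, dy) displacement from frame t − 1 to frame t at pixel (x, y), and tracking stops when the estimate leaves the image or no stored field is available.

```python
def track_backward(track, flows, cur_frame, max_steps):
    """Step S39: from the detection in cur_frame, estimate positions in earlier frames."""
    x, y = track[cur_frame]
    frame = cur_frame
    for _ in range(max_steps):
        flow = flows.get(frame)
        if flow is None:                 # no stored field: cannot go further back
            break
        dx, dy = flow[y][x]
        x, y = x - dx, y - dy            # undo the motion to reach the earlier frame
        if not (0 <= y < len(flow) and 0 <= x < len(flow[0])):
            break                        # estimate left the image: stop tracking
        frame -= 1
        track[frame] = (x, y)
    return track


# Toy data: fields stored for frames 1 and 2 on a 4x4 grid.
flows = {
    2: [[(1, 0)] * 4 for _ in range(4)],   # frame 1 -> 2: one pixel to the right
    1: [[(0, 1)] * 4 for _ in range(4)],   # frame 0 -> 1: one pixel down
}
track_b = track_backward({2: (3, 3)}, flows, cur_frame=2, max_steps=5)
```

Setting `max_steps` corresponds to going back a predetermined number of frames; the two `break` conditions correspond to stopping when the object can no longer be tracked.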

If the motion vector field in step S32 were measured only around the objects to be tracked (the objects currently being tracked), the motion vector field would have to be measured again in order to track an object in the past direction. In that case, it would likely be difficult to complete the processing for one frame image within a fixed time. If the motion vector field is instead measured in advance over the entire frame image, rather than only around the objects being tracked, no re-measurement is needed, and the second tracking means 33 can perform its tracking without extra time.

The second tracking means 33 stores the estimated position of the object in the past frame image in the object list storage unit 14. For example, the second tracking means 33 stores the ID of the detected object, the frame ID of the past frame image for which the position was estimated, and the estimated position of the object in the object list storage unit 14 in association with one another. When a plurality of objects are detected in step S36, the second tracking means 33 estimates the position in the past frame image for each of them. When the tracking of the object in the past direction ends, the processing for the current frame image ends.

In the present embodiment, when an object is detected in the frame image at a certain time, the first tracking means 32 (FIG. 2) tracks the position of the object across a plurality of frames at times later than the detection time, and the second tracking means 33 tracks the position of the object across a plurality of frames going back to times earlier than the detection time. Even if an object cannot be detected in the frame image at a certain time, for example because it overlaps another object or some other item in the image, the second tracking means 33 tracks the object back in time once the object is detected in a subsequent frame image. Therefore, even if the detection of an object is missed at a certain time, the position of the object at that time can be estimated at a later time.

In the present embodiment, when an object is detected, its position in past frame images is tracked back in time, so there is no need to force the detection of objects at every time. In other words, missed detections are tolerated at each time. Because missed detections can be tolerated at each time, strict detection conditions can be set for object detection. For example, even when a plurality of objects are present in the current frame image, the number of objects detected by the object detection means 12 can be limited to one. By setting strict detection conditions so that candidates with low object-likeness are not detected as objects, erroneous detection can be suppressed.

Simply setting strict detection conditions suppresses erroneous detection but causes missed detections. To address these missed detections, the present embodiment uses the second tracking means 33, which tracks an object in the past direction when the object is detected. By tracking the object toward past frames once it has been detected, the second tracking means 33 can estimate the position of an object that could not be detected in those past frames. This compensates for missed detections in past frame images. The present embodiment can therefore suppress both erroneous detection and missed detection of objects. Moreover, unlike Patent Document 3, the present embodiment does not need to detect object candidates separately from objects and track those candidates, so the processing load is not increased needlessly.

FIG. 8 shows a specific processing example. Assume that no object was detected before time t and stored in the object list storage unit 14. When the frame image at time t is input, the object detection means 12 detects one object in step S36 of FIG. 7. Two objects appear in the frame image at time t. However, the object on the left side of the drawing is hidden behind a pillar, so its reliability, which indicates object-likeness, is low. In step S36, the object detection means 12 detects the object on the right side of the drawing. This object is referred to as object A.

In step S38, the object detection means 12 stores the object ID of object A, the frame ID of the frame image at time t, and the position of object A in the object list storage unit 14. The object detection means 12 notifies the second tracking means 33 that object A has been detected. Object A first appears in the frame image at time t; even if the second tracking means 33 tracks the object toward past frame images in step S39, it is assumed that no object corresponding to object A exists in the frame images before time t.

When the frame image at time t + 1 is input, the motion vector field measuring means 31 measures the motion vector field from the frame image at time t and the frame image at time t + 1 in step S32. In step S33, the first tracking means 32 acquires from the object list storage unit 14 the position of object A in the frame image at time t. In step S34, the first tracking means 32 moves the position of object A at time t according to the motion vector at that position in the motion vector field, and thereby estimates the position of object A at time t + 1. Since only object A is stored in the object list storage unit 14, once the position of object A in the current frame has been estimated, the process proceeds to object detection in step S36.

In step S36, the object detection means 12 attempts to detect an object in the frame image at time t + 1, excluding the position estimated as that of object A. In addition to the object hidden behind the pillar, which was also present in the frame image at time t, the frame image at time t + 1 contains an object crossing in front of the pillar. However, because these objects overlap the pillar in the image, their reliability is low, and the object detection means 12 detects no object in the frame image at time t + 1.

When the frame image at time t + 2 is input, the motion vector field measuring means 31 measures the motion vector field from the frame image at time t + 1 and the frame image at time t + 2 in step S32. In step S33, the first tracking means 32 acquires from the object list storage unit 14 the position of object A in the frame image at time t + 1. In step S34, the first tracking means 32 moves the position of object A at time t + 1 according to the motion vector at that position in the motion vector field, and thereby estimates the position of object A at time t + 2. Since only object A is stored in the object list storage unit 14, once the position of object A in the current frame has been estimated, the process proceeds to object detection in step S36.

In step S36, the object detection means 12 attempts to detect an object in the frame image at time t + 2, excluding the position estimated as that of object A. In the frame image at time t + 2, the two objects that overlapped the pillar in the frame image at time t + 1 are now located away from the pillar. The object detection means 12 detects whichever of these two objects has the higher reliability, for example the lower one in the drawing. This object is referred to as object B.

In step S38, the object detection means 12 stores the object ID of object B, the frame ID of the frame image at time t + 2, and the position of object B in the object list storage unit 14. The object detection means 12 notifies the second tracking means 33 that object B has been detected. The second tracking means 33 moves the position of object B at time t + 2 according to the motion vector field obtained in step S32 and estimates the position of object B at time t + 1. In this way, the position of object B at time t + 1, where it overlapped the pillar in the image and could not be detected by the object detection means 12, can be stored in the object list storage unit 14.

When the frame image at time t + 3 is input, the motion vector field measuring means 31 measures the motion vector field from the frame image at time t + 2 and the frame image at time t + 3 in step S32. In step S33, the first tracking means 32 acquires from the object list storage unit 14 the position of object A in the frame image at time t + 2. In step S34, the first tracking means 32 moves the position of object A at time t + 2 according to the motion vector at that position in the motion vector field, and thereby estimates the position of object A at time t + 3.

Because object B remains in the object list storage unit 14 as an unprocessed object, the first tracking means 32 returns from step S35 to step S33 and acquires the position of object B in the frame image at time t + 2. In step S34, the first tracking means 32 moves the position of object B at time t + 2 based on the motion vector at that position in the motion vector field, and thereby estimates the position of object B at time t + 3. Once object B has been processed, no unprocessed objects remain in the object list storage unit 14, so processing proceeds to the object detection in step S36.

In step S36, the object detection means 12 attempts to detect an object in the frame image at time t + 3, excluding the regions estimated as the positions of objects A and B. The object detection means 12 detects an object at the position in the frame image at time t + 3 where the object reliability is highest. This object is referred to as object C.
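The detection in step S36, which skips regions already occupied by tracked objects and picks the position of highest reliability, can be sketched as follows. This is an illustration under stated assumptions rather than the detector of the embodiment: the reliability values are given as a dictionary of candidate positions, tracked regions are approximated by a square of half-width `exclusion_radius`, and a candidate is accepted only if its reliability reaches `threshold`. All of these names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]


def detect_one(reliability: Dict[Position, float],
               tracked: List[Position],
               exclusion_radius: int,
               threshold: float) -> Optional[Position]:
    """Return the most reliable candidate outside all tracked regions, or None."""
    best_pos: Optional[Position] = None
    best_score = threshold
    for pos, score in reliability.items():
        # A candidate is excluded if it lies inside the square around any tracked position.
        excluded = any(
            abs(pos[0] - tx) <= exclusion_radius and abs(pos[1] - ty) <= exclusion_radius
            for tx, ty in tracked
        )
        if not excluded and score >= best_score:
            best_pos, best_score = pos, score
    return best_pos


# Hypothetical candidates: the strongest one coincides with the already tracked object A.
candidates = {(10, 10): 0.9, (50, 20): 0.8, (52, 40): 0.7}

object_b = detect_one(candidates, tracked=[(10, 10)], exclusion_radius=5, threshold=0.5)
object_c = detect_one(candidates, tracked=[(10, 10), (50, 20)], exclusion_radius=5, threshold=0.5)
nothing = detect_one(candidates, tracked=[], exclusion_radius=5, threshold=0.95)
```

A strict threshold trades detection omissions in the current frame for fewer erroneous detections, which is the situation that backward tracking is meant to compensate for.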

In step S38, the object detection means 12 stores the object ID of object C, the frame ID of the frame image at time t + 3, and the position of object C in the object list storage unit 14. The object detection means 12 notifies the second tracking means 33 that object C has been detected. The second tracking means 33 moves the position of object C at time t + 3 based on the motion vector field obtained in step S32, and thereby estimates the position of object C at time t + 2. The second tracking means 33 may go back further, to time t + 1 and then time t, and estimate the position of object C in the frame image at each of those times.

The object list storage unit 14 thus stores, for object A, its positions in the frame images at times t, t + 1, t + 2, and t + 3. For object B, it stores the positions in the frame images at times t + 1, t + 2, and t + 3, and for object C, the positions in the frame images at times t + 2 and t + 3. Although object B was detected at time t + 2, tracking it in the past direction as well as the future direction allows its position at time t + 1, which precedes the detection time, to be stored in the object list storage unit 14. Likewise, the position of object C at time t + 2, which precedes its detection time t + 3, can be stored in the object list storage unit 14. Thus, even when the object detection conditions are set strictly in order to suppress erroneous detection, detection omissions can still be suppressed.
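The bookkeeping in this example can be reproduced with a small simulation. It is a sketch under simplifying assumptions, not the embodiment itself: each frame has one uniform motion vector instead of a full motion vector field, detections are supplied as input instead of being computed, backward tracking goes back exactly one frame, and time t is represented by frame 0. The names `run`, `detections`, and `flows` are hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[float, float]
ObjectList = Dict[str, Dict[int, Position]]  # object ID -> {frame ID -> position}


def shift(pos: Position, vec: Position, sign: int) -> Position:
    """Move a position along (sign=+1) or against (sign=-1) a motion vector."""
    return (pos[0] + sign * vec[0], pos[1] + sign * vec[1])


def run(detections: Dict[int, Tuple[str, Position]],
        flows: Dict[int, Position],
        num_frames: int) -> ObjectList:
    """detections[t]: object newly detected in frame t; flows[t]: motion from frame t-1 to t."""
    objects: ObjectList = {}
    for t in range(num_frames):
        if t > 0:
            # First tracking means: carry every known object forward into frame t.
            for track in objects.values():
                if t - 1 in track:
                    track[t] = shift(track[t - 1], flows[t], +1)
        if t in detections:
            # Object detection means: register the newly detected object.
            obj_id, pos = detections[t]
            objects[obj_id] = {t: pos}
            if t > 0:
                # Second tracking means: go back one frame from the detection time.
                objects[obj_id][t - 1] = shift(pos, flows[t], -1)
    return objects


flows = {1: (2.0, 0.0), 2: (2.0, 0.0), 3: (2.0, 0.0)}
detections = {0: ("A", (0.0, 0.0)), 2: ("B", (10.0, 5.0)), 3: ("C", (20.0, 8.0))}
result = run(detections, flows, num_frames=4)
frames_per_object = {obj_id: sorted(track) for obj_id, track in result.items()}
```

With frame 0 standing for time t, object A ends up with positions for frames 0 to 3, object B for frames 1 to 3, and object C for frames 2 and 3, matching the stored contents described above.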

In the above embodiment, the motion vector field measuring means 31 was described as measuring the motion vector field over the entire target frame image, but the invention is not limited to this. For example, if the frame image contains a region in which it is clear that no object will be detected, that region may be excluded from the motion vector field measurement. The operation procedure shown in FIG. 7 is also only an example, and the invention is not necessarily limited to it. For example, the processing related to object tracking (steps S32 to S35) and part of the object detection (step S36) can be performed in parallel. Specifically, the steps in FIG. 4 and the processing up to step S24 in FIG. 6 can be performed in parallel. The processing from the object position estimation (step S25) in FIG. 6 onward need only be performed after the positions of the tracked objects in the frame image at the current time have been obtained.
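The parallel arrangement described here can be sketched with two concurrent tasks followed by a join. The sketch is only an illustration: `track_objects` stands in for steps S32 to S35, `preprocess_for_detection` stands in for the detection processing up to step S24, and `estimate_position` stands in for step S25 onward. The function bodies, names, and data are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]


def track_objects(previous: Dict[str, Position]) -> Dict[str, Position]:
    # Placeholder for steps S32 to S35: move every tracked object 1 px to the right.
    return {obj_id: (x + 1, y) for obj_id, (x, y) in previous.items()}


def preprocess_for_detection(frame_id: int) -> Dict[Position, float]:
    # Placeholder for the processing up to step S24: fixed candidate scores.
    return {(5, 5): 0.9, (11, 3): 0.8}


def estimate_position(scores: Dict[Position, float],
                      tracked: Dict[str, Position]) -> Optional[Position]:
    # Placeholder for step S25 onward: needs the tracked positions, so it runs last.
    occupied = set(tracked.values())
    free = {pos: score for pos, score in scores.items() if pos not in occupied}
    return max(free, key=free.get) if free else None


def process_frame(frame_id: int, previous: Dict[str, Position]):
    with ThreadPoolExecutor(max_workers=2) as pool:
        tracking = pool.submit(track_objects, previous)
        scoring = pool.submit(preprocess_for_detection, frame_id)
        # Both results are required before the position estimation can start.
        tracked = tracking.result()
        scores = scoring.result()
    return tracked, estimate_position(scores, tracked)


tracked_now, new_object = process_frame(1, {"A": (4, 5)})
```

The only ordering constraint kept by the sketch is the one stated in the text: position estimation waits until the current positions of the tracked objects are known.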

The moving image object detection apparatus 10 of this embodiment can be used, for example, to take in a moving image from a camera installed in a store or at an event venue and to record how customers or event participants moved around the sales floor or the venue. It can also be used in moving image compression, where the compression parameters are changed between specific object portions and the remaining portions. For example, by setting the compression rate of the regions corresponding to the object positions stored in the object list storage unit 14 lower than that of the other regions, the overall compression rate of the image can be raised to keep the file size small while limiting the loss of information in the object regions.
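One way to picture the region-dependent compression is a per-block quality map handed to an encoder. The following sketch is an assumption-laden illustration, not a method described in this document: it assumes that object positions are available as pixel boxes `(x0, y0, x1, y1)` with exclusive upper bounds, that the encoder works on square blocks, and that a higher quality value means weaker compression. The name `quality_map` and the values 90 and 40 are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def quality_map(width: int, height: int, block: int, boxes: List[Box],
                q_object: int = 90, q_background: int = 40) -> List[List[int]]:
    """Assign a high quality to blocks touched by an object box, a low one elsewhere."""
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    qmap = [[q_background] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in boxes:
        # Mark every block that overlaps the box.
        for r in range(y0 // block, min(rows, (y1 - 1) // block + 1)):
            for c in range(x0 // block, min(cols, (x1 - 1) // block + 1)):
                qmap[r][c] = q_object
    return qmap


# A 64 x 32 frame split into 16 x 16 blocks, with one object box.
qmap = quality_map(64, 32, 16, [(20, 4, 40, 12)])
```

In this example only the two blocks in the top row that overlap the box receive the high quality value; the remaining six blocks are compressed more strongly.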

The present invention has been described above on the basis of its preferred embodiments. However, the moving image object detection apparatus, method, and program of the present invention are not limited to the above embodiments, and configurations obtained by applying various modifications and changes to the above embodiments are also included in the scope of the present invention.

10: Moving image object detection apparatus
11: Frame memory
12: Object detection means
13: Object tracking processing means
21: Preprocessing means
22: Smoothing processing means
23: Difference image generating means
24: Summing means
25: Position estimating means
26: Size estimating means
27: Collating means
31: Motion vector field measuring means
32: First tracking means
33: Second tracking means
40: Resolution conversion means
41: Motion vector distribution calculating means
42: Averaging processing means
43: Motion vector detection means
44: Erroneous measurement determination means
51: Resolution conversion means
52: Motion region extraction means

Claims

1. A moving image object detection apparatus comprising:
object detection means for detecting an object from a frame image of a moving image composed of a plurality of frames, and storing the position of the detected object in an object list storage unit; and
object tracking processing means including first tracking means for tracking the position of the object stored in the object list storage unit across a plurality of frames after the time at which the object was detected, and storing the tracked position of the object in the object list storage unit, and second tracking means for, when an object is newly detected by the object detection means, tracking the position of the newly detected object across a plurality of frames going back to times before the time of the frame image in which the object was detected, and storing the tracked position of the object in the object list storage unit.

2. The moving image object detection apparatus according to claim 1, wherein the object detection means detects, from one frame image, the number of objects obtained by converting into an integer the value N given by the following formula:
N = (n / S) × (100 / P)
where n is the assumed number of appearances, indicating the number of objects assumed to be included in one frame image, S is the assumed number of effective shots, indicating over how many frames one object is captured, and P (%) is the detection probability of the object.
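As a worked illustration of the formula N = (n / S) × (100 / P), the following sketch evaluates it for two sets of hypothetical values. The claim does not state how N is converted into an integer; rounding up with `math.ceil` is an assumption made here, as is the function name `objects_per_frame`.

```python
import math


def objects_per_frame(n: float, s: float, p_percent: float) -> int:
    """Number of objects to detect per frame, N = (n / S) * (100 / P), rounded up."""
    value = (n / s) * (100.0 / p_percent)
    return math.ceil(value)  # rounding up is an assumption, not stated in the claim


# 10 objects per frame, each visible for 5 frames, detection probability 50 %.
busy_scene = objects_per_frame(10, 5, 50)
# 3 objects per frame, each visible for 10 frames, detection probability 80 %.
quiet_scene = objects_per_frame(3, 10, 80)
```

In the first case N = (10 / 5) × (100 / 50) = 4, so four objects would be detected per frame. In the second case N = 0.375, which rounds up to one object per frame under the assumed rounding.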

3. The moving image object detection apparatus according to claim 1, wherein the object detection means detects one object from one frame image.

4. The moving image object detection apparatus according to claim 1, wherein
the object tracking processing means further includes motion vector field measuring means for sequentially receiving the frame images and obtaining a motion vector field based on the frame image currently being processed and a frame image at a time before that frame image,
the first tracking means tracks the position of the object across a plurality of frames based on the motion vector field obtained by the motion vector field measuring means and the position of the object stored in the object list storage unit, and
the second tracking means tracks the position of the object back to earlier times based on the motion vector field obtained by the motion vector field measuring means and the position of the object newly detected by the object detection means.

5. The object detection apparatus according to claim 4, wherein the motion vector field measuring means comprises:
motion vector distribution calculating means for calculating, for a target pixel on a target frame that is subject to motion vector measurement, a motion vector distribution, which is a distribution of scores each representing, for a respective shift amount, the correlation between the target pixel and the pixel of a reference frame corresponding to the target pixel, while shifting the reference frame relative to the target frame within a predetermined range corresponding to a motion vector detection space;
motion vector detection means for detecting the motion vector at the target pixel based on the motion vector distribution; and
erroneous measurement determination means for determining, based on the motion vector distribution, whether the detected motion vector is an erroneous measurement.

6. The object detection apparatus according to claim 5, wherein the erroneous measurement determination means determines whether the detected motion vector is an erroneous measurement based on whether the score in the motion vector distribution calculated for the shift amount corresponding to the center position of the motion vector detection space and the score in the motion vector distribution calculated for the shift amount corresponding to the position of the detected motion vector satisfy a predetermined relationship.

7. The object detection apparatus according to claim 5, wherein the erroneous measurement determination means uses a discriminator generated by machine learning, in which data obtained by arranging the scores of the target pixel in the motion vector distribution calculated for each shift amount serves as positive training data when the detected motion vector is a true motion vector and as negative training data when the detected motion vector is not a true motion vector, and determines whether the detected motion vector is an erroneous measurement based on the output of the discriminator when the data obtained by arranging the scores of the target pixel in the motion vector distribution calculated for each shift amount is input to the discriminator.

8. The object detection apparatus according to claim 5, wherein
the motion vector distribution calculating means further calculates another motion vector distribution, which is a distribution of scores each representing, for a respective shift amount, the correlation between the pixel on the reference frame corresponding to the target pixel and the corresponding pixel on the target frame, while shifting the target frame relative to the reference frame within the predetermined range corresponding to the motion vector detection space,
the motion vector detection means further detects, based on the other motion vector distribution, another motion vector at the pixel on the reference frame corresponding to the target pixel, and
the erroneous measurement determination means, instead of or in addition to determining whether the detected motion vector is an erroneous measurement based on the motion vector distribution, determines whether the detected motion vector is an erroneous measurement based on the relationship between the detected motion vector and the other motion vector.

9. The object detection apparatus according to claim 8, wherein the erroneous measurement determination means determines that the detected motion vector is an erroneous measurement when the motion vector and the other motion vector are not in an inverse vector relationship.
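The inverse vector test can be sketched as a forward-backward consistency check: if the vector measured from the target frame to the reference frame and the vector measured in the opposite direction are exact inverses, their sum is zero. The tolerance used to decide that two vectors are "not in an inverse vector relationship" is an assumption, as are the names `is_erroneous` and `tolerance`.

```python
from typing import Tuple

Vector = Tuple[float, float]


def is_erroneous(forward: Vector, backward: Vector, tolerance: float = 0.5) -> bool:
    """Flag a measurement whose forward and backward vectors do not cancel out."""
    rx = forward[0] + backward[0]
    ry = forward[1] + backward[1]
    residual = (rx * rx + ry * ry) ** 0.5  # length of the sum of the two vectors
    return residual > tolerance


consistent = is_erroneous((3.0, -1.0), (-3.0, 1.0))   # exact inverse, sum is zero
inconsistent = is_erroneous((3.0, -1.0), (1.0, 2.0))  # sum is (4, 1), far from zero
```

A small tolerance allows for sub-pixel differences between the two measurements; a tolerance of zero would demand exact inverses.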

10. The object detection apparatus according to claim 5, wherein
the motion vector distribution calculating means further calculates a self-motion vector distribution, which is a distribution of scores each representing, for a respective shift amount, the correlation of the target frame with itself at the target pixel, while shifting the target frame relative to itself within the predetermined range corresponding to the motion vector detection space, and
the erroneous measurement determination means, instead of or in addition to determining whether the detected motion vector is an erroneous measurement based on the motion vector distribution, determines whether the detected motion vector is an erroneous measurement based on the score in the self-motion vector distribution calculated for the shift amount corresponding to the position of the detected motion vector.

11. The object detection apparatus, wherein the erroneous measurement determination means applies threshold processing with a predetermined threshold value to the score in the self-motion vector distribution to determine whether the detected motion vector is an erroneous measurement.

12. The moving image object detection apparatus according to claim 1, wherein the object detection means comprises:
smoothing processing means for generating, from the frame image, a plurality of smoothed images having different scales by repeatedly convolving the image with a smoothing filter whose filter characteristics correspond to the contour shape of the object to be detected;
difference image generating means for generating, while changing the scale, a plurality of difference images each between two of the smoothed images having different scales;
summing means for summing the plurality of difference images to generate a summed image;
position estimating means for estimating the position of the object to be detected based on pixel values in the summed image; and
collating means for detecting the object from the frame image in the vicinity of the estimated position.

13. The moving image object detection apparatus according to claim 12, wherein, when the number of objects to be detected by the object detection means is M, the position estimating means estimates, as the positions of the objects, the pixel positions of the top M or bottom M pixels when the pixel values of the summed image are sorted in descending order.

14. The moving image object detection apparatus according to claim 12 or 13, wherein the object detection means further comprises size estimating means for comparing pixel values of the plurality of difference images and estimating the size of the object to be detected based on the scale of the difference image having the maximum or minimum pixel value.

15. The moving image object detection apparatus according to claim 14, wherein the size estimating means compares pixel values of the difference images in the vicinity of the position of the object estimated by the position estimating means.

16. The moving image object detection apparatus according to claim 14 or 15, wherein the size estimating means determines the size of the object based on the scale of the smaller-scale one of the two smoothed images from which the difference image having the maximum or minimum pixel value was generated.

17. The moving image object detection apparatus according to any one of claims 12 to 16, wherein the smoothing processing means generates a × k smoothed images L(x, y, σ_i) (i = 1 to a × k) with scales σ_1 to σ_{a×k} (a and k are integers of 2 or more), and the difference image generating means generates k difference images G(x, y, σ_j) (j = 1 to k) with scales σ_1 to σ_k, each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{j×a}) of scale σ_{j×a}.

18. The moving image object detection apparatus according to claim 17, wherein the difference image generating means generates the difference image G(x, y, σ_j) using the following formula:
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j×a})

19. The moving image object detection apparatus according to claim 12, wherein the smoothing processing means generates r smoothed images L(x, y, σ_i) (i = 1 to r) with scales σ_1 to σ_r (r is an integer of 3 or more), and the difference image generating means generates k − p difference images G(x, y, σ_j) (j = 1 to k − p) with scales σ_1 to σ_{k−p} (p is an integer of 1 or more), each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{j+p}) of scale σ_{j+p}.

20. The moving image object detection apparatus according to claim 19, wherein the difference image generating means generates the difference image G(x, y, σ_j) using the following formula:
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j+p})
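The chain of smoothing, differencing with G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j+p}), summing, and position estimation can be illustrated on a one-dimensional signal. The sketch rests on several assumptions that are not taken from the claims: a 3-tap binomial filter stands in for the smoothing filter matched to the object contour, the scale σ_i is represented by the number of filter passes, the signal is one-dimensional instead of an image, and the number of difference images is simply the number of smoothed images minus p. All function names are hypothetical.

```python
from typing import List


def smooth(signal: List[float]) -> List[float]:
    """One pass of a [1, 2, 1] / 4 filter with replicated edges (stand-in filter)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def smoothed_stack(signal: List[float], r: int) -> List[List[float]]:
    """stack[i - 1] plays the role of L(x, sigma_i): the signal after i filter passes."""
    stack, current = [], list(signal)
    for _ in range(r):
        current = smooth(current)
        stack.append(current)
    return stack


def difference_images(stack: List[List[float]], p: int) -> List[List[float]]:
    """G_j = L_j - L_{j+p} for every j that has a partner p scales further on."""
    return [[a - b for a, b in zip(stack[j], stack[j + p])]
            for j in range(len(stack) - p)]


def sum_images(diffs: List[List[float]]) -> List[float]:
    """Pixel-wise sum of all difference images."""
    return [sum(column) for column in zip(*diffs)]


# A flat signal with one 3-sample bump standing in for an object.
signal = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
stack = smoothed_stack(signal, r=4)
diffs = difference_images(stack, p=2)
total = sum_images(diffs)
peak = total.index(max(total))  # estimated object position
```

For this signal the summed response peaks at index 5, the center of the bump, which is where the position estimation would place the object.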

21. The moving image object detection apparatus according to claim 12, wherein the object detection means further comprises motion region extraction means for extracting a motion region from the frame image and generating a motion region extraction image, and the smoothing processing means convolves the smoothing filter with the motion region extraction image.

22. The moving image object detection apparatus according to claim 21, wherein the motion region extraction means generates, as the motion region extraction image, a grayscale image in which each pixel has a gradation value corresponding to the amount of motion extracted from the frame image.

23. The moving image object detection apparatus according to claim 22, wherein the motion region extraction means applies a predetermined contrast reduction process to the grayscale image.

24. A moving image object detection method comprising the steps of:
detecting an object from a frame image of a moving image composed of a plurality of frames, and storing the position of the detected object in an object list storage unit;
tracking the position of the object stored in the object list storage unit across a plurality of frames after the time at which the object was detected, and storing the tracked position of the object in the object list storage unit; and
when an object is newly detected by the object detection means, tracking the position of the newly detected object across a plurality of frames going back to times before the time of the frame image in which the object was detected, and storing the tracked position of the object in the object list storage unit.

25. A program causing a computer to execute the steps of:
detecting an object from a frame image of a moving image composed of a plurality of frames, and storing the position of the detected object in an object list storage unit;
tracking the position of the object stored in the object list storage unit across a plurality of frames after the time at which the object was detected, and storing the tracked position of the object in the object list storage unit; and
when an object is newly detected by the object detection means, tracking the position of the newly detected object across a plurality of frames going back to times before the time of the frame image in which the object was detected, and storing the tracked position of the object in the object list storage unit.