JP7121708B2 - Object extractor, method and program

Info

Publication number: JP7121708B2
Application number: JP2019149690A
Authority: JP
Inventors: 良亮渡邊
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-08-19
Filing date: 2019-08-19
Publication date: 2022-08-18
Anticipated expiration: 2039-08-19
Also published as: JP2021033407A

Description

The present invention relates to an object extraction device, method, and program, and more particularly to an object extraction device, method, and program that extract objects from each frame of a video by background difference calculation against a background model obtained by statistically processing past frames.

Many techniques for separating foreground and background regions have been proposed, mainly for detecting moving objects in images and for producing free-viewpoint video as shown in Non-Patent Document 1. Among them, the approach that models the features of the background from, for example, a background image captured in advance with no foreground present, and extracts regions that differ strongly from the background model as foreground, is called the background subtraction method.

As an example of the background subtraction method, Non-Patent Document 2 discloses a technique that models the background with a Gaussian mixture, that is, a mixture of several Gaussian distributions. It separates foreground from background by extracting as foreground the pixels that differ from the mixture-modeled background by at least a certain amount.

The background model used in Non-Patent Document 2 has a mechanism that updates it gradually with each frame. Even when the background changes, the model gradually learns the new background features and so stays matched to the current situation.
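The thresholding and gradual update described above can be illustrated with a minimal Python sketch. It is a simplification, not the method of Non-Patent Document 2: a single running average per pixel stands in for the Gaussian mixture, and the function name `subtract_and_update` and the sample values are hypothetical.

```python
def subtract_and_update(background, frame, threshold, update_rate):
    """Classify each pixel, then blend the frame into the background model.

    background, frame: 2-D lists of grayscale values with the same shape.
    Returns (mask, new_background); mask holds 1 for foreground, 0 for background.
    NOTE: a single running average per pixel is an assumed stand-in for the
    Gaussian mixture model described in the text.
    """
    mask, new_background = [], []
    for bg_row, fr_row in zip(background, frame):
        # Foreground if the pixel differs from the model by more than the threshold.
        mask.append([1 if abs(f - b) > threshold else 0
                     for b, f in zip(bg_row, fr_row)])
        # Exponential moving average: the model drifts toward the new frame.
        new_background.append([(1 - update_rate) * b + update_rate * f
                               for b, f in zip(bg_row, fr_row)])
    return mask, new_background


background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[102.0, 180.0], [100.0, 99.0]]
mask, background = subtract_and_update(background, frame,
                                       threshold=30.0, update_rate=0.05)
print(mask)  # [[0, 1], [0, 0]]
```

A small update rate makes the model follow slow illumination changes while keeping a briefly visible object from being absorbed into the background.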

Non-Patent Document 3 discloses a technique that improves the accuracy of background subtraction by adaptively changing, pixel by pixel, the foreground extraction threshold according to the person position obtained by person tracking. Changing parameters on the basis of such person recognition and tracking information can be expected to further improve the accuracy of the background-subtraction technique disclosed in Non-Patent Document 2.

Patent Document 1 relates to an invention that achieves highly accurate foreground extraction by dynamically changing, for each pixel, both the threshold and the update rate used in the background subtraction method on the basis of object recognition results.

In Patent Document 1, the background difference threshold is lowered, on the basis of object recognition results, in regions where an object to be extracted as foreground is likely to exist, which makes the target object easier to extract as foreground. In addition, lowering the background model update rate for pixels where an object exists, again on the basis of object recognition results, prevents a stationary object from developing missing parts.

Patent Document 1: Japanese Patent Application No. 2019-009705

Non-Patent Document 1: H. Sankoh, S. Naito, K. Nonaka, H. Sabirin, J. Chen, "Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes", Proceedings of the 26th ACM international conference on Multimedia, pp. 1724-1732 (2018).
Non-Patent Document 2: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252 (1999).
Non-Patent Document 3: Kenji Terabayashi, Kazunori Umeda, Alessandro Moro, "Real-time Adaptive Threshold Processing of Background Subtraction Using Human Tracking Information", IEEJ General Industry Research Group, GID-09-17, pp. 89-90 (2009).
Non-Patent Document 4: Gunnar Farneback, "Two-frame motion estimation based on polynomial expansion," Proceedings of the 13th Scandinavian conference on Image analysis (SCIA'03), pp. 363-370 (2003).
Non-Patent Document 5: J. F. Henriques, R. Caseiro, P. Martins and J. Batista, "High-Speed Tracking with Kernelized Correlation Filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596 (2015).
Non-Patent Document 6: K. He, G. Gkioxari, P. Dollar and R. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988 (2017).
Non-Patent Document 7: J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517-6525 (2017).

The inventors produced free-viewpoint video of the kind represented by Non-Patent Document 1, using silhouettes extracted by background-subtraction-based methods such as those disclosed in Non-Patent Documents 2 and 3 and Patent Document 1, and studied the results.

The production process of the free-viewpoint video technology disclosed in Non-Patent Document 1 creates silhouette images by the background subtraction method, generates a visual hull by computing the intersection of the silhouettes in three-dimensional space, and builds a 3D model of the target object from it. The study confirmed that the accuracy of silhouette extraction in this process greatly affects the quality of the free-viewpoint video.

The technique disclosed in Non-Patent Document 2 models the background statistically and updates it gradually, which gives it the strength of extracting silhouettes robustly under slow background changes and regular changes in parts of the background. It was, however, difficult to apply to scenes whose background is complex and changes abruptly.

Examples of scenes with a complex background include a sports match in which players and the ball are to be extracted while a liquid crystal display showing frequently switching advertisements stands behind the players, and a scene in bad weather where the background keeps changing greatly.

In such scenes the background changes abruptly and, in many cases, without regularity, so the background is more likely to be extracted as foreground by mistake. The technique of Non-Patent Document 2 is not accurate enough for extraction in these difficult scenes.

To solve this technical problem, techniques such as those of Non-Patent Document 3 and Patent Document 1 perform person tracking or person recognition and change the threshold or update rate of the background subtraction method according to the result. Non-Patent Document 3, however, discloses only a mechanism that lowers the threshold inside the rectangle obtained by tracking so that extraction becomes easier. The target object does become easier to extract, but a new problem can arise: noise tends to appear in the region where the threshold has been lowered.

Patent Document 1, on the other hand, achieves accurate extraction by changing not only the threshold but also the background model update rate according to the person recognition result. The threshold and update rate are adjusted solely on the basis of the per-frame person recognition result, and the threshold is set low especially in regions where a person has been recognized continuously over multiple frames.

However, when a person who has been still for a long time suddenly starts to move, the threshold at the pixel positions where the object had been at rest remains low for a while, so noise tends to appear in that region. In addition, when the object moves into a region where no object had been recognized so far, that is, a region with a high threshold, missing parts tend to occur. Patent Document 1 thus had the problem that noise and missing parts tend to occur when the target object moves irregularly.

An object of the present invention is to solve the above technical problems and to provide an object extraction device, method, and program that enable robust object extraction even when an object moves erratically, for example by moving and stopping irregularly.

To achieve the above object, the present invention provides an object extraction device that extracts objects from each frame of a video by background difference calculation against a background model obtained by statistically processing past frames, characterized by the following configurations.

(1) The device comprises means (102) for detecting an object region from each frame; means (103) for acquiring displacement information of the object region between frames; means (104a) for determining the background difference threshold used for the current frame by moving the background difference threshold used for the previous frame to the corresponding pixel region of the current frame on the basis of the displacement information; means (105) for identifying each pixel of the current frame as either background or foreground by a background difference calculation that compares the difference between the background model and the current frame with the updated background difference threshold; and means (106) for updating the background model on the basis of the current frame.

(2) The device further comprises update rate determination means (104b) for determining, on the basis of the displacement information, the update rate at which the means (106) for updating the background model reflects the current frame in the background model.

(3) The update rate determination means (104b) adjusts the update rate, on the basis of the displacement information, to be higher for pixels with a higher displacement speed.

(4) The update rate determination means (104b) adjusts the update rate to be lower for pixels where an object is detected, on the basis of the object detection result for the current frame.

(5) The means (106) for updating the background model updates pixels identified as foreground at a lower update rate than pixels identified as background, on the basis of the identification result of the identifying means (105).

(6) The means for determining the background difference threshold adjusts the threshold determined on the basis of the displacement information to be lower for pixels where an object is detected, on the basis of the object detection result for the current frame.

(7) The device further comprises means for initializing the update rate of each source pixel from which a background difference threshold has been moved on the basis of the displacement information.

(8) The device further comprises means for initializing the background difference threshold of each source pixel from which a background difference threshold has been moved on the basis of the displacement information.

The configurations above provide the following effects.

(1) Because the background difference threshold used for the current frame is determined by moving the threshold used for the previous frame to the corresponding pixel region of the current frame on the basis of the displacement information of the object region between frames, robust object extraction is possible even when an object moves erratically, for example by moving and stopping irregularly.

(2) Because the update rate at which the current frame is reflected in the background model is determined on the basis of the displacement information, the background model can be adaptively optimized according to the motion of the object.

(3) Because the update rate determination means sets a higher update rate for pixels with a higher displacement speed, an object that stays still for a long time is less prone to missing parts caused by repeated updates, while a fast-moving object receives a high update rate, which makes its contour easier to cut out cleanly.

(4) Because the update rate determination means lowers the update rate of pixels where an object is detected, missing parts in the identification result of a stationary object can be prevented.

(5) Because the means for updating the background model updates pixels identified as foreground at a lower update rate than pixels identified as background, missing parts in the identification result of a stationary object can be prevented.

(6) Because the means for determining the background difference threshold lowers the threshold determined from the displacement information for pixels where an object is detected, regions where an object is detected are more readily identified as foreground by the background difference calculation.

(7) Because means is provided for initializing the update rate of each source pixel from which a background difference threshold has been moved on the basis of the displacement information, noise caused by the update rate remaining at a low value can be prevented.

(8) Because means is provided for initializing the background difference threshold of each source pixel from which a background difference threshold has been moved on the basis of the displacement information, noise caused by the background difference threshold remaining at a low value can be prevented.

FIG. 1 is a functional block diagram showing the configuration of the main parts of an object extraction device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of object detection.
FIG. 3 is a diagram showing an example of object tracking.
FIG. 4 is a diagram showing a method of determining the background difference threshold.
FIG. 5 is a diagram showing an example in which noise occurs because the moved background difference threshold remains at the source pixels.
FIG. 6 is a diagram showing an example of initializing the source background difference threshold after the threshold has been moved.
FIG. 7 is a diagram showing a display example of the background/foreground identification result based on the background difference calculation.
FIG. 8 is a flowchart showing the operation of an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main parts of an object extraction device 1 according to an embodiment of the present invention. In this embodiment, when objects are extracted from each frame of a video by background subtraction, a background model obtained by statistically processing past frame images is used as the background.

The object extraction device 1 can be built by installing an application (program) that implements the functions described below on a general-purpose computer or server equipped with a CPU, memory, interfaces, and a bus connecting them. Alternatively, it can be built as a dedicated or single-function machine in which part of the application is implemented in hardware or as a program.

The frame image acquisition unit 101 acquires frame images I from video captured by the camera 2 and supplies them to the object detection unit 102, the background difference calculation unit 105, and the background model update unit 106. The frame images I may also be taken from video that was captured by a camera or the like and stored on a hard disk or similar storage. The object detection unit 102 detects objects in each frame, as in the example of FIG. 2, and outputs the detection result O(x, y) for the current frame I_N as the object distribution L_N(x, y). Here N is the frame number and (x, y) is the position of each pixel in the frame. In the object distribution L_N(x, y), values in the range 0 to 1 are registered: a high value for pixels where an object is detected and a low value for pixels where none is detected.

In this embodiment, image features are extracted from each frame of camera video in which the objects to be detected were captured under various environments, and each object is detected using a learned model that generalizes the results of deep learning.

For example, if the camera video shows a ball game, the ball and the players are detected as objects. Such object detection can use a deep-learning-based technique that extracts objects down to their shape, as disclosed in Non-Patent Document 6, or a technique that extracts the rectangular region of each object, as disclosed in Non-Patent Document 7. Alternatively, each object may be extracted on the basis of image features such as HOG (Histograms of Oriented Gradients) features.

If the object detection unit 102 is able to identify the class of each object, it may also output the class information. Class information identifies whether an object is, for example, a "person" or a "ball".

If the object detection unit 102 is able to compute a likelihood indicating the confidence of a detection, this likelihood may be reflected in the object detection result O(x, y) and used for parameter determination in the background difference parameter determination unit 104 described later.

Furthermore, the object distribution L_N(x, y) may be obtained, as shown in equation (1), as a weighted sum of the object distribution L_{N-1}(x, y) of the previous frame I_{N-1} and the object detection result O(x, y) of the current frame I_N, added at a predetermined ratio m. Even if object recognition fails in one frame, this reduces the effect of that error on other frames.
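Equation (1) is not reproduced in this text. One plausible reading of the weighted sum, assumed here, is L_N(x, y) = (1 - m) · L_{N-1}(x, y) + m · O(x, y); which term carries the weight m is an assumption. The Python sketch below applies that form to small 2-D lists, and the function name `blend_object_distribution` is hypothetical.

```python
def blend_object_distribution(prev_dist, detection, m):
    """Return (1 - m) * L_{N-1} + m * O element-wise over 2-D lists.

    ASSUMPTION: m weights the current detection; the source only says the two
    terms are added at a predetermined ratio m.
    """
    return [[(1 - m) * p + m * o for p, o in zip(p_row, o_row)]
            for p_row, o_row in zip(prev_dist, detection)]


prev_dist = [[0.0, 1.0]]
detection = [[1.0, 1.0]]  # first pixel: a detection not seen in earlier frames
current = blend_object_distribution(prev_dist, detection, m=0.25)
# current[0][0] is 0.25, so a single-frame detection only partly raises L_N.
```

With a small m, a one-frame misdetection moves the distribution only slightly, which matches the stated purpose of damping recognition errors.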

The displacement information acquisition unit 103 compares the object detection result for the current frame I_N with that for the previous frame I_{N-1} and acquires the displacement direction and displacement amount of each object between the frames as displacement information. If the displacement information is instead obtained by the optical flow computation disclosed in Non-Patent Document 4, prior object detection is generally unnecessary, so the displacement information can be obtained directly from the current frame I_N and the previous frame I_{N-1}. Optical flow yields displacement information relatively quickly and simply, but its accuracy cannot be called sufficient.

Another approach, shown in FIG. 3, obtains displacement information by tracking the detected objects. Starting from the object region detected by the object detection unit 102, the object tracking technique disclosed in Non-Patent Document 5 tracks the position of the object from frame to frame. Object detection may also be run in parallel on every frame, and when a detection result overlaps a tracking result, the detection result may be used as the starting point of a new track so that tracking is repeated. With either approach, the displacement information acquisition unit 103 can acquire the displacement direction and displacement amount of each pixel between frames.

The background difference parameter determination unit 104 includes a background difference threshold determination unit 104a, an update rate determination unit 104b, and an initialization unit 104c. It determines the background difference parameters used when the background difference calculation is applied to the current frame I_N, namely the threshold (background difference threshold) and the update rate of the background (background model).

As shown in FIG. 4, the background difference threshold determination unit 104a determines the background difference threshold applied to the current frame I_N by moving the threshold applied to the object region of the previous frame I_{N-1} to each pixel of the corresponding region of the current frame I_N on the basis of the displacement information. For example, if the displacement from the previous frame I_{N-1} to the current frame I_N is expressed as a displacement vector (Δx, Δy) and the background difference threshold applied to the previous frame I_{N-1} as T_{N-1}(x, y), the displacement-based threshold T_{N_MOV}(x, y) applied to the current frame I_N is given by equation (2).

When pixels of two objects that were separate in the previous frame I_{N-1} move to the same pixel in the current frame I_N, thresholds can arrive at that pixel from two locations. In such a case the displacement-based threshold T_{N_MOV}(x, y) is obtained by, for example, taking the average of the thresholds or preferring the smaller one.

If the front-to-back order of overlapping objects can be determined, for example by depth estimation, a mechanism may be provided that gives priority to the threshold of the moving object in front. Pixels that are not the destination of any movement from the previous frame I_{N-1} can simply adopt the threshold T_{N-1}(x, y) of the corresponding pixel of the previous frame, according to equation (3).
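Equations (2) and (3) are not reproduced in this text. Read from the surrounding description, one plausible form is T_{N_MOV}(x + Δx, y + Δy) = T_{N-1}(x, y) for moved pixels and T_{N_MOV}(x, y) = T_{N-1}(x, y) for pixels that are no destination. The Python sketch below assumes that form, integer displacements, and the "prefer the smaller value" rule for collisions. The function name `move_thresholds` and the dict-based displacement map are illustrative only.

```python
def move_thresholds(prev_thresh, displacement):
    """Shift per-pixel thresholds along integer displacement vectors.

    prev_thresh: 2-D list, T_{N-1}[y][x].
    displacement: dict mapping a source pixel (x, y) to its vector (dx, dy);
                  only object pixels appear in it (hypothetical representation).
    Returns T_{N_MOV} as a new 2-D list.
    """
    height, width = len(prev_thresh), len(prev_thresh[0])
    moved = {}  # destination (x, y) -> threshold carried there
    for (x, y), (dx, dy) in displacement.items():
        tx, ty = x + dx, y + dy
        if 0 <= tx < width and 0 <= ty < height:
            value = prev_thresh[y][x]
            # Two sources landing on one pixel: keep the smaller threshold.
            moved[(tx, ty)] = min(value, moved.get((tx, ty), value))
    # Pixels that are nobody's destination keep their previous threshold.
    return [[moved.get((x, y), prev_thresh[y][x]) for x in range(width)]
            for y in range(height)]


prev_thresh = [[10.0, 30.0, 30.0, 12.0]]
displacement = {(0, 0): (2, 0), (3, 0): (-1, 0)}  # both land on x = 2
print(move_thresholds(prev_thresh, displacement))  # [[10.0, 30.0, 10.0, 12.0]]
```

Note that the source pixels at x = 0 and x = 3 still hold their old low thresholds in this sketch; resetting them is the separate initialization step named in configurations (7) and (8).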

After computing the displacement-based background difference threshold T_{N_MOV}(x, y), the background difference threshold determination unit 104a further computes, by equation (4), an adjustment value T_C(x, y) that reflects the object detection result L_N(x, y) output by the object detection unit 102 in T_{N_MOV}(x, y). T_max is a preset constant giving the maximum change in the threshold between frames.

That is, to make pixels where the object distribution L_N(x, y) indicates an object more likely to be judged foreground, it is desirable to lower their threshold. This embodiment therefore yields a negative adjustment value T_C(x, y) at pixels where an object exists and a positive adjustment value at pixels where no object exists.

The background difference threshold determination unit 104a obtains the final background difference threshold T_N(x, y) applied to the current frame I_N by applying the displacement-based threshold T_{N_MOV}(x, y) and the adjustment value T_C(x, y) to equation (5).
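Equations (4) and (5) are not reproduced in this text. The description only fixes the sign of T_C(x, y) and bounds it by T_max, so the Python sketch below assumes a linear form T_C = T_max · (1 - 2 · L_N) and an additive combination T_N = T_{N_MOV} + T_C. Both forms, and the function name `adjust_threshold`, are assumptions made for illustration.

```python
def adjust_threshold(moved_thresh, object_dist, t_max):
    """Return T_N = T_{N_MOV} + T_C with T_C = t_max * (1 - 2 * L_N).

    ASSUMPTION: the linear form of T_C and the additive combination are
    guesses consistent with the text (negative where L_N is high, positive
    where it is low, magnitude at most t_max).
    """
    return [[t + t_max * (1.0 - 2.0 * l) for t, l in zip(t_row, l_row)]
            for t_row, l_row in zip(moved_thresh, object_dist)]


result = adjust_threshold([[20.0, 20.0, 20.0]], [[1.0, 0.5, 0.0]], t_max=4.0)
print(result)  # [[16.0, 20.0, 24.0]]
```

The pixel with L_N = 1 ends up with a lower threshold and is therefore more easily classified as foreground, while the pixel with L_N = 0 becomes harder to trigger.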

In this embodiment a low threshold is thus set in regions where an object is likely to exist and a high threshold in regions where an object is unlikely to exist, which makes objects easier to extract.

The update rate determination unit 104b determines the update rate U_N(x, y) pixel by pixel on the basis of the displacement information. This is the proportion at which the background model update unit 106, described later, reflects the pixel information of the current frame I_N in the background model built by statistically processing the frames up to the previous frame (through I_{N-1}), thereby updating it to a background model covering the frames through I_N.

In this embodiment, to prevent the update rate U_N(x, y) from changing sharply between frames, the update rate U_N(x, y) for updating the background model used in the background difference calculation for the current frame I_N is obtained by the following equation (6). Its inputs are the update rate U_{N-1}(x, y), which was used to update the background model used in the background difference calculation for the previous frame I_{N-1}, and a correction value U_C(x, y).

Because the update rate U_N(x, y) takes values from 0 to 1, it is set to 0 when the result is less than 0 and to 1 when the result is greater than 1. The correction value U_C(x, y) is the amount by which each pixel's update rate changes at the current frame I_N, and is obtained by the following equation (7).

Here, U_max is a constant giving the maximum change in the update rate between frames and is set manually in advance. M_N(x, y) is an index of each pixel's displacement between frames, that is, of its displacement speed. It is designed so that pixels with a higher displacement speed receive a larger value, and it is normalized to the range 0 to 1. Consequently, U_C(x, y) equals U_max when M_N(x, y) is 1 and equals -U_max when M_N(x, y) is 0, so the update rate changes by at most U_max depending on M_N(x, y).
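Equations (6) and (7) are not reproduced in this text. The sketch below assumes a linear form for equation (7) that matches the two stated endpoints (U_C = U_max at M_N = 1 and U_C = -U_max at M_N = 0), and an additive form for equation (6) followed by the clamp to the range 0 to 1. The function names are hypothetical.

```python
def update_rate_correction(m, u_max):
    """Assumed form of equation (7): linear in m, with m in [0, 1].

    m = 0 gives -u_max and m = 1 gives +u_max, as the text requires.
    The linear interpolation between those endpoints is an assumption.
    """
    return u_max * (2.0 * m - 1.0)


def next_update_rate(u_prev, m, u_max):
    """Assumed form of equation (6): previous rate plus correction, clamped."""
    u = u_prev + update_rate_correction(m, u_max)
    return min(1.0, max(0.0, u))
```

With this form, a pixel whose displacement index stays at 0.5 keeps its update rate unchanged, while a fast-moving pixel gains at most `u_max` per frame.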

Thus, in this embodiment pixels with a high displacement speed are assigned a high update rate and pixels with a low displacement speed are assigned a low update rate, so an object that remains stationary for a long time is less likely to develop missing parts through repeated updates. In addition, because a fast-moving object is assigned a high update rate, its contour is easier to cut out cleanly.

When another object comes to overlap the same position during tracking, two values of M_N(x, y) may be obtained. In that case, their average or their maximum can be used. As with the threshold determination, if the front-to-back order of the overlapping objects can be determined, for example by depth estimation, a mechanism may be provided that preferentially reflects the M_N(x, y) of the moving object in front.

Furthermore, as with the threshold, the object distribution L_N(x, y) may also be reflected in equation (7) above, adding control that lowers the update rate where an object exists. Lowering the update rate of pixels where an object exists can be expected to prevent missing parts in stationary objects.

If the object detection unit 102 can identify the class of each object, the maximum values U_max and T_max may be set for each class. For example, suppose a person's skin color closely resembles the floor color, so the person is prone to missing parts, while a ball's color differs greatly from the background, so the ball is not. Designing the person-class T_max (T_{max_person}) to be large and the ball-class T_max (T_{max_ball}) to be small makes the threshold vary more when the person class is recognized, so extraction becomes easier in regions where a person is likely to exist.

Furthermore, when determining the update rate U_N(x, y) and the threshold T_N(x, y), displacement vectors obtained by a rectangle-based tracking method may be combined with pixel-based displacement vectors obtained from optical flow or the like.

For example, during rectangle-based tracking, displacement vectors may be obtained by optical flow only for the pixels inside the rectangle, yielding a different threshold and update rate for each pixel within it. Rectangle-based tracking gives only one displacement vector per rectangle, but computing and using pixel-based displacement vectors inside the rectangle allows the threshold and update rate to be varied adaptively for each pixel. This can be realized, for example, by computing the M_N(x, y) of each pixel used in equation (7) above as a weighted sum of the rectangle-based M_{N1}(x, y) and the pixel-based M_{N2}(x, y).

Adding such processing makes it more likely that appropriate parameters can be set when the rectangle itself is stationary but the object moves within it. Moreover, because pixel-based displacement vectors are obtained only inside the rectangle, the computation is faster than obtaining them over the whole image.
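The weighted sum of the rectangle-based and pixel-based displacement indices can be sketched as follows. The weight `w_rect` and the function name are hypothetical; the text does not specify how the weights are chosen.

```python
def blend_motion_index(m_rect, m_pixel, w_rect=0.5):
    """Weighted sum of the rectangle-based index M_N1 and pixel-based index M_N2.

    Both inputs are assumed to be normalized to [0, 1]. w_rect is a
    hypothetical weight in [0, 1]; the result is clamped to [0, 1] so it
    can be fed into the update rate correction.
    """
    m = w_rect * m_rect + (1.0 - w_rect) * m_pixel
    return min(1.0, max(0.0, m))


# A stationary rectangle (m_rect = 0) with a moving pixel inside it
# (m_pixel = 1) still receives a non-zero displacement index.
m_inside = blend_motion_index(0.0, 1.0, w_rect=0.5)
```

This illustrates the case described above, where the rectangle is stationary but something moves inside it.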

Furthermore, because the background/foreground identification result changes more readily toward the outer edge of an object tracking region, processing may be added that, for rectangle-based tracking, fine-tunes the update rate to be somewhat higher near the outside of the rectangle than at its center.

When the threshold T_{N-1}(x, y) has been moved based on the displacement vector, the initialization unit 104c initializes the threshold and update rate of each source pixel (x-Δx, y-Δy). As illustrated in FIG. 5, this is done to suppress the noise that tends to arise in the source region as an object moves away.

That is, if the threshold and update rate at the source are left as they are, the threshold stays high and the update rate stays low for a while, which can cause such noise. As shown in FIG. 6, this embodiment therefore initializes the threshold and update rate at the source, which can be expected to suppress the noise.
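A minimal sketch of moving a per-pixel parameter map with the object and initializing the vacated source pixels is shown below. It assumes a single axis-aligned rectangle and an integer displacement. The function name, the rectangle convention, and `init_value` are hypothetical.

```python
def shift_and_reset(param, rect, dx, dy, init_value):
    """Move values inside rect by (dx, dy) and reset the vacated source pixels.

    param: 2D list of per-pixel values (threshold or update rate).
    rect: (x0, y0, x1, y1) in previous-frame coordinates, upper bounds exclusive.
    init_value: hypothetical default to which source pixels are initialized.
    """
    height, width = len(param), len(param[0])
    out = [row[:] for row in param]
    x0, y0, x1, y1 = rect
    # Initialize every source pixel first.
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = init_value
    # Then write the moved values; overlapping pixels keep the moved value.
    for y in range(y0, y1):
        for x in range(x0, x1):
            ty, tx = y + dy, x + dx
            if 0 <= ty < height and 0 <= tx < width:
                out[ty][tx] = param[y][x]
    return out


# One image row: default threshold 5, object threshold 2 at x = 1 and x = 2.
moved = shift_and_reset([[5, 2, 2, 5, 5]], rect=(1, 0, 3, 1), dx=1, dy=0, init_value=5)
```

In this toy row, the pixel at x = 1 is vacated and returns to the default, while the object values now occupy x = 2 and x = 3.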

The background difference calculation unit 105 uses as the background a background model in which each pixel is modeled by a single Gaussian, and identifies each pixel of the current frame I_N acquired by the frame image acquisition unit 101 as either foreground or background by background difference calculation.

In this embodiment, the per-pixel temporal mean μ_{N-1}(x, y) and standard deviation σ_{N-1}(x, y) are obtained from the background model update unit 106, described later, as the background model up to the previous frame I_{N-1}. In the Gaussian-based background difference calculation, pixels (x, y) that satisfy the following equation (8) are identified as background.

Here, I_N(x, y) is the luminance value of each pixel of the current frame I_N, and z is a parameter that sets how many standard deviations are still judged to be background. The larger the background difference threshold T_N(x, y), the more likely the pixel is to be judged as background.

The color space of the image used for the background difference decision may be grayscale or a color space such as RGB or YUV. When there are multiple color channels, it is desirable to process all channels independently and to identify as background only those pixels for which every channel satisfies the background condition.
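Equation (8) is not reproduced in this text. The sketch below assumes the form |I_N - μ_{N-1}| < z·σ_{N-1} + T_N, which keeps the standard deviation term and the threshold term separate and makes a larger threshold favor the background decision. Whether the inequality is strict, and whether one threshold is shared by all channels, are assumptions. The function name is hypothetical.

```python
def is_background(pixel, mean, std, threshold, z=2.0):
    """Return True if every channel satisfies the assumed background test.

    pixel, mean, std: sequences with one entry per color channel.
    threshold: background difference threshold T_N for this pixel
               (assumed here to be shared by all channels).
    z: number of standard deviations still treated as background.
    """
    return all(
        abs(value - mu) < z * sigma + threshold
        for value, mu, sigma in zip(pixel, mean, std)
    )


# Grayscale pixel close to the model mean: judged as background.
gray_result = is_background((100,), (98,), (1.0,), threshold=1.0)
# Three-channel pixel where one channel deviates strongly: judged as foreground.
color_result = is_background((100, 50, 50), (98, 50, 70), (1.0, 1.0, 1.0), threshold=1.0)
```

A grayscale image is handled as the single-channel case of the same function.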

Although equation (8) above separates the standard deviation term from the threshold term, a function may be provided that adjusts the constant z of the standard deviation term according to the background difference threshold T_N(x, y).

In addition, some form of post-processing may be applied to the mask that the background difference calculation unit 105 outputs as its result. Post-processing is a general term for refining the obtained mask, for example by filtering, and many background subtraction algorithms apply it after the background difference calculation.

Typical post-processing includes noise removal by filtering, such as a median filter, and removal of fine noise by repeated dilation and erosion of contours. Noise removal may also unconditionally treat any foreground region smaller than a user-specified size as background, and a small background region enclosed by foreground may be filled in as foreground.
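One of the post-processing options above, treating foreground regions below a given size as background, can be sketched with a simple flood fill. The use of 4-connectivity and the function name are assumptions; the text does not specify how regions are delimited.

```python
def remove_small_regions(mask, min_size):
    """Set foreground regions with fewer than min_size pixels to background.

    mask: 2D list with 1 for foreground and 0 for background.
    Regions are found by flood fill with 4-connectivity (an assumption).
    """
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for start_y in range(height):
        for start_x in range(width):
            if mask[start_y][start_x] != 1 or seen[start_y][start_x]:
                continue
            # Collect one connected foreground region.
            stack = [(start_y, start_x)]
            seen[start_y][start_x] = True
            region = []
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(region) < min_size:
                for y, x in region:
                    out[y][x] = 0
    return out


noisy = [[1, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
cleaned = remove_small_regions(noisy, min_size=2)
```

Filling small background holes enclosed by foreground could be sketched the same way by applying the function to the inverted mask, with extra care for regions that touch the image border.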

The background model update unit 106 updates the background model used as the background, for use with the next frame I_{N+1}, based on the result of the background difference calculation for the current frame. In this embodiment, the mean μ_{N-1}(x, y) and standard deviation σ_{N-1}(x, y) of each pixel up to the previous frame I_{N-1} are updated based on the pixel information of the current frame I_N and the update rate U_N(x, y) determined by the update rate determination unit 104b. That is, the per-pixel mean μ_N(x, y) reflecting the luminance value I_N(x, y) of each pixel of the current frame I_N is calculated by the following equation (9).

Likewise, the per-pixel standard deviation σ_N(x, y) reflecting the luminance value I_N(x, y) of each pixel of the current frame is calculated by the following equations (10) and (11).
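Equations (9) to (11) are not reproduced in this text. The sketch below assumes the common running-average form for a single-Gaussian background model: the mean is blended with the new value at the update rate, the variance is blended with the squared deviation from the new mean, and the standard deviation is its square root. This form and the function name are assumptions, not the patent's confirmed formulas.

```python
import math


def update_pixel_model(mean_prev, std_prev, value, u):
    """Update one pixel's Gaussian background model with update rate u in [0, 1].

    Assumed forms:
      mean = (1 - u) * mean_prev + u * value
      var  = (1 - u) * std_prev**2 + u * (value - mean)**2
      std  = sqrt(var)
    """
    mean = (1.0 - u) * mean_prev + u * value
    var = (1.0 - u) * std_prev ** 2 + u * (value - mean) ** 2
    return mean, math.sqrt(var)


# With u = 0 the model ignores the current frame entirely.
unchanged = update_pixel_model(100.0, 2.0, 150.0, 0.0)
# With u = 0.5 the mean moves halfway toward the new value.
halfway_mean, _ = update_pixel_model(100.0, 2.0, 150.0, 0.5)
```

Under this form, a low update rate at a pixel keeps a stationary object from being absorbed into the background quickly, which is the effect the update rate control aims for.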

The update rate U_N(x, y) may also be set, based on the foreground/background result from the background difference calculation unit 105, so that pixels judged to be foreground are updated at a lower rate than pixels judged to be background. In general, setting a high update rate for background pixels and a low update rate for foreground pixels can be expected to keep regions judged as foreground from being eroded by updates.
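The text does not say how the foreground update rate is lowered. One simple possibility, shown here only as an assumption, is to scale the rate by a factor below 1 for foreground pixels. The parameter `fg_scale` and the function name are hypothetical.

```python
def masked_update_rate(u, is_foreground, fg_scale=0.1):
    """Return the update rate to use for one pixel.

    Foreground pixels get u scaled by fg_scale (a hypothetical factor in
    [0, 1]); background pixels keep u unchanged.
    """
    return u * fg_scale if is_foreground else u


background_rate = masked_update_rate(0.5, is_foreground=False)
foreground_rate = masked_update_rate(0.5, is_foreground=True, fg_scale=0.5)
```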

A mechanism may also be provided that records the background model over a certain period before silhouette extraction is actually attempted and uses that background model for the calculation in the background difference calculation unit.

The calculation result output unit 107 outputs the background/foreground result from the background difference calculation unit 105 as video. The output format may be a color image obtained by masking the background pixel region of each frame, as shown for example in FIG. 7, or a binary mask image that distinguishes background from foreground.

FIG. 8 is a flowchart showing the operation of an embodiment of the present invention. In step S1, the frame image acquisition unit 101 acquires the current frame image (current frame I_N) from the camera video. In step S2, the object detection unit 102 detects object regions in the current frame I_N. In step S3, the displacement information acquisition unit 103 obtains the inter-frame displacement information (Δx, Δy) of the object region based on the object region detected in the previous frame I_{N-1} and the object region detected in the current frame I_N.

In step S4, as detailed with reference to FIG. 4, the displacement-based threshold T_{N_MOV}(x, y) is calculated based on the displacement information (Δx, Δy) and the background difference threshold T_{N-1}(x, y) applied to the previous frame I_{N-1}. In step S5, the adjustment value T_C(x, y) is calculated based on the object detection result for the current frame I_N. In step S6, the displacement-based threshold T_{N_MOV}(x, y) and the adjustment value T_C(x, y) are applied to equation (5) above to determine the background difference threshold T_N(x, y) used in the background difference calculation for the current frame I_N.

In step S7, as detailed with reference to FIG. 6, the initialization unit 104c initializes the background difference threshold and update rate that had been set at each source pixel (x-Δx, y-Δy) when the background difference threshold T_{N-1}(x, y) applied to the previous frame I_{N-1} was moved based on the displacement information. In step S8, the update rate correction value U_C(x, y) is calculated by equation (7) above. In step S9, the correction value U_C(x, y) and the update rate U_{N-1}(x, y) applied to the previous frame I_{N-1} are applied to equation (6) above to determine the update rate U_N(x, y) for updating, for the next frame I_{N+1}, the background model used in the background difference calculation for the current frame I_N.

In step S10, the background model built by statistically processing the pixel information of each frame up to the previous frame I_{N-1} (the per-pixel mean μ_{N-1}(x, y) and standard deviation σ_{N-1}(x, y)) is obtained. In step S11, the background difference calculation unit 105 performs the background difference calculation, which compares the difference between the pixel information of the current frame I_N and the obtained background model with the background difference threshold T_N(x, y), and thereby identifies each pixel of the current frame I_N as either background or foreground.

In step S12, the update rate U_N(x, y) determined in step S9 is corrected, based on the result of the background difference calculation for the current frame I_N, so that pixels judged to be foreground are updated at a lower rate than pixels judged to be background. In step S13, the background model used in the background difference calculation for the previous frame I_{N-1} (mean μ_{N-1}(x, y), standard deviation σ_{N-1}(x, y)), the pixel information I_N(x, y) of the current frame I_N, and the update rate U_N(x, y) are applied to equations (9) and (10) above to update the background model for the next frame I_{N+1}.

In step S14, it is determined whether the video has ended. If it has not, the process returns to step S1 and the above steps are repeated for the next frame I_{N+1}.

1... Object extraction device, 2... Camera, 101... Frame image acquisition unit, 102... Object detection unit, 103... Displacement information acquisition unit, 104... Background difference parameter determination unit, 104a... Background difference threshold determination unit, 104b... Update rate determination unit, 104c... Initialization unit, 105... Background difference calculation unit, 106... Background model update unit, 107... Calculation result output unit

Claims

1. An object extraction device for extracting an object from each frame of a video by background difference calculation using, as the background, a background model obtained by statistically processing past frames, the device comprising:
means for detecting object regions from each frame;
means for obtaining displacement information of an object area between frames;
means for determining a background difference threshold used for the current frame by moving the background difference threshold used for the previous frame to a corresponding pixel region of the current frame based on the displacement information;
means for identifying each pixel of the current frame as either background or foreground by a background difference calculation comparing the difference between the background model and the current frame to the updated background difference threshold;
and means for updating the background model based on the current frame.

2. The object extraction device according to claim 1, further comprising update rate determination means for determining, based on the displacement information, the update rate used when the means for updating the background model updates the background model by reflecting the current frame at a predetermined update rate.

3. The object extracting apparatus according to claim 2, wherein said update rate determining means adjusts the update rate to be higher for pixels having a higher displacement speed based on said displacement information.

4. The object extracting apparatus according to claim 2, wherein said update rate determining means adjusts the update rate of pixels in which an object is detected to be low based on the result of object detection for the current frame.

5. The object extraction device according to any one of claims 1 to 4, wherein the means for updating the background model updates the pixels identified as foreground at a lower update rate than the pixels identified as background, based on the result of identification by the identifying means.

6. The object extraction device according to any one of claims 1 to 5, wherein the means for determining the background difference threshold adjusts the background difference threshold determined based on the displacement information to be lower for pixels where an object is detected, based on the result of object detection for the current frame.

7. The object extracting apparatus according to any one of claims 2 to 5, further comprising means for initializing an update rate of each pixel whose background difference threshold has been moved based on said displacement information.

8. The object extracting apparatus according to any one of claims 1 to 7, further comprising means for initializing a background difference threshold of each pixel whose background difference threshold has been moved based on said displacement information.

9. An object extraction method in which a computer extracts an object from each frame of a video by background difference calculation using, as the background, a background model obtained by statistically processing past frames, the method comprising:
detecting an object region from each frame;
obtaining displacement information of the object region between frames;
moving the background difference threshold used for the previous frame to the corresponding pixel region of the current frame based on the displacement information to determine the background difference threshold used for the current frame;
identifying each pixel of the current frame as either background or foreground by a background difference calculation that compares the difference between the background model and the current frame to the updated background difference threshold;
and updating the background model based on the current frame.

10. The method of extracting an object according to claim 9, wherein an update rate for updating the background model by reflecting the current frame at a predetermined update rate is determined based on the displacement information.

11. The object extraction method according to claim 10, wherein, based on the displacement information, a pixel having a higher displacement speed is adjusted to have a higher update rate.

12. The object extraction method according to claim 10, wherein the update rate of pixels where the object is detected is adjusted to be lower based on the result of object detection for the current frame.

13. The object extraction method according to any one of claims 10 to 12, wherein pixels identified as the foreground are updated at a lower update rate than pixels identified as the background based on the result of the identification.

14. The object extraction method according to any one of claims 9 to 13, wherein the background difference threshold determined based on the displacement information is adjusted lower for pixels where objects are detected, based on the result of object detection for the current frame.

15. An object extraction program that causes a computer to execute the object extraction method according to any one of claims 9 to 14.