JP2020119250A - Object extraction method and device

Info

Publication number: JP2020119250A
Application number: JP2019009705A
Authority: JP
Inventors: 良亮渡邊; Ryosuke Watanabe; 軍陳; Gun Chin
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2020-08-06
Anticipated expiration: 2039-01-23
Also published as: JP7096175B2

Abstract

To provide an object extraction method and device for extracting an object by separating foreground from a moving image.

SOLUTION: An object region detection part 2 detects an object region for each frame from a camera video acquired by a camera video acquisition part 1. A presence likelihood map calculation part 3 calculates a presence likelihood map indicating presence likelihood of an object for each frame on the basis of a result of the detection of the object region. A statistical information calculation part 6 calculates statistical information (an average and a standard deviation) of each pixel for each frame. A background difference calculation part 7 extracts, as the object, a foreground region by a background difference method based on the statistical information and a background difference threshold value. The statistical information calculation part 6 updates past statistical information of each pixel at a predetermined update ratio on the basis of a current pixel value to thereby obtain current statistical information. Further included is an update ratio determination part 5 which determines the update ratio on the basis of the presence likelihood map.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an object extraction method and device, and more particularly to an object extraction method and device that extract an object by separating the foreground from a moving image.

Many methods for separating foreground and background regions have been proposed, mainly for detecting moving objects in images and for producing free-viewpoint video. In particular, the approach of modeling the background region of an image from statistical information or the like, and extracting as foreground the regions where the input image differs greatly from the background model, is called the background subtraction method.

As an example of the background subtraction method, Non-Patent Document 1 discloses a technique that models the background with a Gaussian mixture distribution, obtained by mixing a plurality of Gaussian distributions, and thereby identifies the background region of the input image and extracts only the foreground.

Non-Patent Document 2 discloses a method that models the background with a single Gaussian distribution and computes the background difference while updating the background statistics, typified by the mean and variance, at every frame. In this method, foreground candidate regions are first extracted based on the Gaussian distribution and are then reclassified into foreground and background based on their shape and histogram, which makes it possible to exclude parts such as shadows that should not be treated as foreground.

Non-Patent Document 3 discloses a method that improves the accuracy of background subtraction by adaptively changing the foreground extraction threshold used in background subtraction based on person tracking information.

Meanwhile, techniques that extract the silhouette of a target object using deep learning, as typified by Non-Patent Document 4, have also been proposed in recent years. Such a technique prepares training data in advance and extracts the silhouette of the target based on prior training of a convolutional neural network. Because it extracts target objects based on training data, when applied to foreground/background separation it is unlikely to extract shadows and the like, and it extracts target objects robustly under changes such as lighting conditions.

Non-Patent Document 1: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, Vol. 2 (1999).
Non-Patent Document 2: Q. Yao, H. Sankoh, H. Sabirin and S. Naito, "Accurate silhouette extraction of multiple moving objects for free viewpoint sports video synthesis," 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), pp. 1-6 (2015).
Non-Patent Document 3: Kenji Terabayashi, Kazunobu Umeda and Alessandro Moro, "Real-time adaptive threshold processing of background difference using human tracking information" (in Japanese), Institute of Electrical Engineers of Japan, General Industry Study Group materials, GID-09-17, pp. 89-90 (2009).
Non-Patent Document 4: K. He, G. Gkioxari, P. Dollar and R. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988 (2017).
Non-Patent Document 5: H. Sankoh, S. Naito, K. Nonaka, H. Sabirin and J. Chen, "Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes," Proceedings of the 26th ACM International Conference on Multimedia, pp. 1724-1732 (2018).
Non-Patent Document 6: Z. Cao, T. Simon, S. Wei and Y. Sheikh, "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1302-1310 (2017).
Non-Patent Document 7: J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517-6525 (2017).

The inventors of the present invention have produced free-viewpoint video, as typified by Non-Patent Document 5, using silhouettes extracted by background-subtraction-based methods such as those disclosed in Non-Patent Documents 1 and 2. In the production process of the free-viewpoint video technology disclosed in Non-Patent Document 5, silhouette images are created by the background subtraction method, a visual hull is then generated by computing the intersection of the silhouettes in three-dimensional space, and the target person is modeled in 3D. The accuracy of silhouette extraction at this stage strongly affects the quality of the free-viewpoint video.

Because the methods disclosed in Non-Patent Documents 1 and 2 statistically model and update the background, they have the strength of extracting target objects robustly under gradual background changes and under regular changes in parts of the background. However, they are difficult to apply to scenes in which the background is complex and changes abruptly. Examples of such scenes, when players and the ball are to be extracted from a sports match, include a scene in which a liquid crystal advertising display whose content switches frequently is placed behind the players, and a scene in baseball or the like in which the white lines drawn on the field are trampled and scuffed as players run the bases.

In such scenes, the background changes abruptly and, in addition, the changes have no regularity, so the background is more likely to be erroneously extracted as foreground. The methods of Non-Patent Documents 1 and 2 are not accurate enough to perform extraction in these difficult scenes.

To address this technical problem, a method has been proposed, as in Non-Patent Document 3, that tracks persons and applies the tracking result to the threshold of the background subtraction method. Because Non-Patent Document 3 tracks persons and adjusts the threshold based on the result, it is more robust to lighting changes and the like than Non-Patent Documents 1 and 2.

However, even if the threshold for discriminating foreground from background is adjusted dynamically by a method such as that of Non-Patent Document 3, a problem remains when the background model is updated statistically: if a player stays stationary for a long time, the stationary player is gradually absorbed into the background model, and the target object ends up being judged as background. Conversely, without a mechanism for statistically updating the background, robustness to lighting variation and the like is lost.

On the other hand, deep-learning-based methods such as that of Non-Patent Document 4 detect target objects from features of the whole image, and therefore have the advantages of being robust to lighting changes and unlikely to extract shadows and the like as foreground.

However, when occlusion occurs because the objects to be extracted overlap, many recognition failures occur, and it is also difficult to cut out contours cleanly. Given that silhouettes used for free-viewpoint video production require accurate extraction of the target object's contour, the method of Non-Patent Document 4 is difficult to apply.

An object of the present invention is to solve the above technical problems and to provide an object extraction method and device capable of accurately extracting even objects that remain stationary for a long time.

To achieve the above object, the present invention provides an object extraction device that extracts objects from moving-image video, characterized by the following configurations.

(1) The device comprises means for acquiring video, means for detecting object regions from the acquired video, means for calculating an existence likelihood map based on the object region detection results, statistical information calculation means for calculating statistical information for each pixel, and background difference calculation means for extracting a foreground region as an object by a background subtraction method based on the statistical information and a background difference threshold. The statistical information calculation means obtains the current statistical information by reflecting the current pixel value in the past statistical information of each pixel at a predetermined update rate. The device further comprises update rate determination means for determining the update rate based on the existence likelihood map so that, for example, the higher the existence likelihood of an object at a pixel, the lower the update rate.

(2) The device further comprises threshold calculation means for calculating the background difference threshold based on the existence likelihood map so that, for example, the higher the existence likelihood of an object at a pixel, the lower the background difference threshold.

(3) The threshold calculation means further changes the background difference threshold dynamically based on the match ratio between the object region detection result and the calculation result of the background difference calculation means.

(4) The statistical information calculation means comprises means for calculating a mean and a standard deviation based on the history of pixel values, and the background difference calculation means models the background region with a single Gaussian distribution based on the statistical information.

(5) The means for calculating the existence likelihood map calculates the current existence likelihood map by combining the current object region detection result with the detection results up to the previous frame, weighted by a predetermined learning rate m.

(6) The means for detecting object regions detects object regions with each of a plurality of different detection schemes and integrates the detection results into one.

(7) As post-processing means for refining the extracted object based on the existence likelihood map, the device comprises noise removal means that treats as background any foreground region whose mean existence likelihood is below a predetermined noise threshold, the noise threshold being set lower as the size of the foreground region increases.

(8) As post-processing means for refining the extracted object based on the existence likelihood map, the device comprises hole-filling means that treats as foreground any background region whose mean existence likelihood exceeds a predetermined hole-filling threshold.

(9) The means for calculating the existence likelihood map calculates an existence likelihood map for each object to be extracted, and the background difference threshold and the update rate are determined for each existence likelihood map.

(10) The device further comprises update rate review means for revising the update rate based on the object extraction result and the existence likelihood map so that, for example, the update rate determined based on the existence likelihood map is lower in foreground regions than in background regions.
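Configurations (1) and (4) can be illustrated with a minimal per-pixel sketch in Python. The linear mapping from existence likelihood to update rate, the exponential blending of the mean and variance, the foreground test, and all names and default values (`update_rate`, `rate_min`, `rate_max`, `update_stats`, `is_foreground`) are assumptions made for illustration; the text above states only that a higher existence likelihood should yield a lower update rate and that the background is modeled with a single Gaussian distribution.

```python
import math


def update_rate(likelihood, rate_min=0.001, rate_max=0.05):
    """Hypothetical mapping: higher object likelihood gives a lower update rate."""
    e = min(max(likelihood, 0.0), 1.0)
    return rate_max - (rate_max - rate_min) * e


def update_stats(mean, std, pixel, rate):
    """Blend the current pixel value into the running mean and standard deviation."""
    new_mean = (1.0 - rate) * mean + rate * pixel
    new_var = (1.0 - rate) * std * std + rate * (pixel - new_mean) ** 2
    return new_mean, math.sqrt(new_var)


def is_foreground(pixel, mean, std, threshold):
    """Single-Gaussian test: foreground if the pixel is more than threshold * std from the mean."""
    return abs(pixel - mean) > threshold * max(std, 1e-6)


# A stationary object (value 200) covers a background pixel (mean 100) for 100 frames.
mean_protected, std_protected = 100.0, 5.0   # pixel with existence likelihood 1.0
mean_exposed, std_exposed = 100.0, 5.0       # pixel with existence likelihood 0.0
for _ in range(100):
    mean_protected, std_protected = update_stats(
        mean_protected, std_protected, 200.0, update_rate(1.0))
    mean_exposed, std_exposed = update_stats(
        mean_exposed, std_exposed, 200.0, update_rate(0.0))
```

In this sketch the background mean of the pixel with high existence likelihood stays close to its original value, whereas the mean of the pixel with low existence likelihood converges to the value of the stationary object, which is the behavior that configuration (1) is intended to prevent for detected objects.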

The present invention achieves the following effects.

(1) The update rate at which the current pixel value is reflected in the past statistical information of each pixel to obtain the current statistical information is determined based on the existence likelihood map. Therefore, by making, for example, the range of the background difference threshold set based on the existence likelihood map lower for pixels with a higher object existence likelihood, it becomes possible to solve the technical problem that a stationary object is gradually recognized as background and thus becomes hard to recognize as an object.

(2) The background difference threshold is calculated based on the existence likelihood map. Therefore, by setting, for example, a lower background difference threshold for pixels with a higher object existence likelihood, background regions can be made less likely to be erroneously detected as foreground regions.

(3) The background difference threshold is changed dynamically based on the ratio between the region judged to be foreground and the region judged to be background. Therefore, if, for example, the range of the background difference threshold set based on the existence likelihood map is made lower for pixels with a higher object existence likelihood, foreground regions can be made less likely to be erroneously detected as background regions.

(4) The mean and standard deviation are calculated based on the history of pixel values, and the background difference calculation means models the background region with a single Gaussian distribution based on the statistical information. Therefore, background regions that can be regarded as objects can be extracted accurately.

(5) The current existence likelihood map is calculated by combining the current object region detection result with the detection results up to the previous frame, weighted by a predetermined learning rate m. Therefore, even if an object goes undetected in some frames, it is not overlooked as long as it is detected in the preceding and following frames.

(6) When object regions are detected from video, object detection is executed with each of a plurality of different algorithms and the detection results are integrated into one. Therefore, the object detection methods can compensate for one another's weaknesses.

(7) Noise removal means is provided that treats as background any foreground region whose mean existence likelihood is below a predetermined noise threshold. Therefore, the loss of object extraction accuracy caused by background regions being extracted as foreground can be recovered.

(8) Hole-filling means is provided that treats as foreground any background region whose mean existence likelihood exceeds a predetermined hole-filling threshold. Therefore, the loss of object extraction accuracy caused by foreground regions being extracted as background can be recovered.

(9) If an existence likelihood map is calculated for each object to be extracted and the background difference threshold and update rate are set for each existence likelihood map, each existence likelihood map becomes specific to its object, so the accuracy of object extraction can be improved.

(10) Means for revising the update rate based on the object extraction result and the existence likelihood map is further provided. Therefore, if, for example, the range of the update rate determined based on the existence likelihood map is made lower in foreground regions than in background regions, the technical problem that a stationary object is gradually recognized as background and thus becomes hard to recognize as an object can be solved with even higher reliability.
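The post-processing of effects (7) and (8) can be sketched as follows. This is an illustrative sketch under stated assumptions: regions are supplied as lists of (x, y) pixel coordinates (connected-component labeling is assumed to be done elsewhere), and the size-dependent form of `noise_threshold`, together with the values of `base`, `floor`, `size_scale`, and `fill_threshold`, is hypothetical. The text above specifies only that the noise threshold is lower for larger foreground regions.

```python
def mean_likelihood(region, likelihood_map):
    """Mean existence likelihood over a region given as (x, y) pixel coordinates."""
    return sum(likelihood_map[y][x] for x, y in region) / len(region)


def noise_threshold(size, base=0.5, floor=0.1, size_scale=100.0):
    """Hypothetical form: the threshold falls from base toward floor as the region grows."""
    return floor + (base - floor) / (1.0 + size / size_scale)


def remove_noise(mask, foreground_regions, likelihood_map):
    """Relabel as background each foreground region whose mean likelihood is too low."""
    out = [row[:] for row in mask]
    for region in foreground_regions:
        if mean_likelihood(region, likelihood_map) < noise_threshold(len(region)):
            for x, y in region:
                out[y][x] = 0
    return out


def fill_holes(mask, background_regions, likelihood_map, fill_threshold=0.7):
    """Relabel as foreground each background region whose mean likelihood is high."""
    out = [row[:] for row in mask]
    for region in background_regions:
        if mean_likelihood(region, likelihood_map) > fill_threshold:
            for x, y in region:
                out[y][x] = 1
    return out


likelihood_map = [[0.9, 0.9, 0.0, 0.0],
                  [0.9, 0.9, 0.0, 0.1]]
mask = [[1, 0, 0, 0],
        [1, 1, 0, 1]]
foreground_regions = [[(0, 0), (0, 1), (1, 1)], [(3, 1)]]
background_regions = [[(1, 0)], [(2, 0), (3, 0), (2, 1)]]

cleaned = remove_noise(mask, foreground_regions, likelihood_map)
refined = fill_holes(cleaned, background_regions, likelihood_map)
```

Here the isolated foreground pixel at (3, 1), where the existence likelihood is low, is removed as noise, and the background pixel at (1, 0), where the existence likelihood is high, is filled in as foreground.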

FIG. 1 is a functional block diagram of an object extraction device according to a first embodiment of the present invention.
FIG. 2 is a diagram showing a result of object detection by the Mask R-CNN method.
FIG. 3 is a diagram showing a result of object detection by the OpenPose method.
FIG. 4 is a diagram comparing object detection results.
FIG. 5 is a diagram showing the noise removal method in the post-processing unit.
FIG. 6 is a diagram showing the hole-filling method in the post-processing unit.
FIG. 7 is a diagram showing a hole-filling result in comparison with the prior art.
FIG. 8 is a diagram showing an example of object extraction.
FIG. 9 is a functional block diagram of an object extraction device according to a second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main part of an object extraction device according to a first embodiment of the present invention. The description takes as an example an application to an environment in which a plurality of cameras are installed, but there may be only one camera.

The camera video acquisition unit 1 acquires moving-image camera video from a plurality of cameras (cam) with different fields of view. The object region detection unit 2 includes a plurality of detection units 21 and 22 that use different object detection methods and, using a plurality of object detection methods typified by deep learning, detects an object region N(x, y) frame by frame for each camera video.

In this embodiment, to increase the reliability of object detection, the first detection unit 21 implements the Mask R-CNN method disclosed in Non-Patent Document 4 and the second detection unit 22 implements the OpenPose method disclosed in Non-Patent Document 6. The two detection methods are used together and their detection results are integrated.

This embodiment assumes that objects are extracted from camera video of a volleyball broadcast. The objects to be extracted are limited to two types, players and the ball, and object detection by the OpenPose method targets players only.

FIG. 2 shows an example of an object detection result by the Mask R-CNN method, in which silhouette images of the players and the ball are obtained. FIG. 3 shows an example of an object detection result by the OpenPose method, in which skeleton information is obtained from the image of each player.

The object detection methods to be adopted, and their combination, are not limited to those described above. An algorithm that obtains a rectangle enclosing the object to be extracted, as disclosed in Non-Patent Document 7, may be adopted, as may a detection method based on image features such as HOG (Histograms of Oriented Gradients) features. When object detection is based on deep learning or the like, a model trained in advance on the target objects is required, so it is assumed that this model has been computed and prepared beforehand.

Returning to FIG. 1, the existence likelihood map calculation unit 3 calculates, based on the object region detection result, the probability (likelihood) that an object exists at each position of the frame image, and calculates an existence likelihood map E(x, y) representing the distribution of existence likelihood over the frame image.

In this embodiment, as shown in equations (1) and (2), the existence likelihood map E_t(x, y) at the current time t is calculated as a weighted sum, based on a predetermined existence likelihood map learning rate m, of the existence likelihood map E_{t-1}(x, y) calculated at the previous time t-1 and the current object region detection result N(x, y).

The existence likelihood map learning rate m indicates the proportion of the past existence likelihood map value E_{t-1}(x, y) that is propagated to the next frame, as a countermeasure against recognition failures in the object region detection unit 2 and the like. N(x, y) denotes the value obtained by integrating the detection results obtained from the object region detection unit 2 at time t, R_i denotes the detection result obtained from each detection method, and k_i denotes a predetermined parameter for adjusting the influence of each detection method. When t = 0 (the first frame), the E_{t-1}(x, y) term in equation (1) is taken to be 0.

In this example, because the Mask R-CNN method and the OpenPose method are used as the object detection methods, I = 2; the detection result of the Mask R-CNN method is substituted for R_1(x, y) and the detection result of the OpenPose method for R_2(x, y). Equation (2) weights each detection result and takes the sum, but N(x, y) may instead be calculated as a product.
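Equations (1) and (2) are not reproduced in this text. The following Python sketch shows one form consistent with the description, assuming N(x, y) = sum over i of k_i R_i(x, y) and E_t(x, y) = m E_{t-1}(x, y) + (1 - m) N(x, y). The weight (1 - m) on the current detection result, the function names, and the example values of m and k_i are assumptions made for illustration.

```python
def integrate_detections(results, weights):
    """N(x, y): weighted sum of the per-method results R_i(x, y) with weights k_i."""
    height, width = len(results[0]), len(results[0][0])
    return [[sum(k * r[y][x] for k, r in zip(weights, results))
             for x in range(width)]
            for y in range(height)]


def update_likelihood(prev_map, detection, m):
    """E_t = m * E_{t-1} + (1 - m) * N; the E_{t-1} term is 0 for the first frame."""
    if prev_map is None:
        prev_map = [[0.0] * len(row) for row in detection]
    return [[m * p + (1.0 - m) * n for p, n in zip(prev_row, det_row)]
            for prev_row, det_row in zip(prev_map, detection)]


# Frame 0: both detectors respond at the first pixel, only the second at the other.
r1 = [[1.0, 0.0]]   # e.g. a silhouette-based result
r2 = [[1.0, 1.0]]   # e.g. a skeleton-based result
n0 = integrate_detections([r1, r2], weights=[0.5, 0.5])
e0 = update_likelihood(None, n0, m=0.8)

# Frame 1: both detectors miss the object; the map still carries the past evidence.
n1 = integrate_detections([[[0.0, 0.0]], [[0.0, 0.0]]], weights=[0.5, 0.5])
e1 = update_likelihood(e0, n1, m=0.8)
```

With this form, a frame in which every detector fails lowers the existence likelihood by the factor m rather than resetting it to zero, which corresponds to the stated purpose of the learning rate m.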

In this example, the result R_1(x, y) of the Mask R-CNN method is 1 at positions where the Mask R-CNN method determined that an object to be extracted exists and 0 elsewhere. The result R_2(x, y) of the OpenPose method is 1 within a fixed distance of each extracted skeleton and 0 elsewhere. When the object region detection unit 2 extracts structural information of an object, such as a skeleton, R_2(x, y) may be weighted so that its value decreases as the distance from the central part increases.
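The construction of R_2(x, y) from a skeleton can be sketched as follows. The representation of a skeleton as a list of line segments ("bones") between joint coordinates, the linear fall-off used for the distance-weighted variant, and the names `skeleton_map`, `radius`, and `soft` are assumptions made for illustration and are not taken from the OpenPose output format.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def skeleton_map(width, height, bones, radius, soft=False):
    """R_2(x, y): 1 within radius of any bone, or a linear fall-off when soft=True."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            d = min(point_segment_distance(x, y, *a, *b) for a, b in bones)
            if soft:
                row.append(max(0.0, 1.0 - d / radius))
            else:
                row.append(1.0 if d <= radius else 0.0)
        grid.append(row)
    return grid


# One vertical bone from (2, 0) to (2, 4) in a 5 x 5 frame.
bones = [((2, 0), (2, 4))]
binary = skeleton_map(5, 5, bones, radius=1.0)
weighted = skeleton_map(5, 5, bones, radius=2.0, soft=True)
```

The binary variant corresponds to the fixed-distance rule described above, and the `soft` variant corresponds to the optional weighting in which the value decreases with distance from the skeleton.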

In addition, if the object region detection unit 2 has a mechanism that can calculate the existence probability of the object to be extracted when detecting objects, this existence probability may be reflected in the value of R_i(x, y).

Furthermore, when there are multiple kinds of objects to be extracted, such as players and a ball, an existence likelihood map E(x, y) may be created for each object. In that case, changing the ranges of the threshold and the update rate for each object may enable more accurate foreground extraction. For example, erroneous foreground extraction can be reduced by setting a higher threshold range for objects considered easy to extract and a lower threshold range for objects considered hard to extract.

The threshold calculation unit 4 refers to the existence likelihood map E(x, y) and dynamically determines, for each pixel, the separation threshold T(x, y) used to separate each region of the frame image into foreground and background based on its pixel values. In this embodiment, T(x, y) is determined for each pixel by equation (3) so that a low threshold is set in regions where the existence likelihood of the object is high and a high threshold is set in regions where it is low. As a result, regions in which the extraction target object exists are more readily judged to be foreground.
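Equation (3) itself is not reproduced in this text. As a hedged sketch, the behavior described above (low threshold where the likelihood is high, high threshold where it is low, bounded by T_min and T_max) can be obtained with a linear interpolation. The linear form and the function names are assumptions, not the patent's formula.

```python
def separation_threshold(e, t_min, t_max):
    """Assumed linear mapping: E = 1 gives T_min, E = 0 gives T_max."""
    return t_max - (t_max - t_min) * e

def threshold_map(likelihood, t_min, t_max):
    """Apply the mapping to every pixel of an existence likelihood map (2D list in [0, 1])."""
    return [[separation_threshold(e, t_min, t_max) for e in row] for row in likelihood]
```

For example, with `t_min=10` and `t_max=30`, a pixel with likelihood 0.5 receives a threshold of 20.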

Here, T_min and T_max are the minimum and maximum values that the separation threshold T(x, y) can take. They may be determined manually in view of the target scene. Alternatively, a mechanism may be provided that computes the match ratio between the detection result of the object region detection unit 2 (or the existence likelihood map value E_t(x, y)) and the region actually extracted as foreground, judges that the threshold range is poorly set when the match ratio is low, and automatically changes T_min and T_max so that the match ratio improves.
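The text does not define how the match ratio is computed. One plausible choice, shown here purely as an assumption, is the intersection-over-union between the binarized existence likelihood map and the extracted foreground mask. The name `match_ratio` and the binarization level `e_ref` are hypothetical, and the rule for how T_min and T_max are then adjusted is left open because the source does not state it.

```python
def match_ratio(likelihood, foreground_mask, e_ref=0.5):
    """Intersection-over-union of 'likely object' pixels and extracted foreground pixels.

    likelihood: 2D list of E_t(x, y) values in [0, 1].
    foreground_mask: 2D list with 1 for foreground and 0 for background.
    e_ref: assumed level at which the likelihood map is binarized.
    """
    intersection = 0
    union = 0
    for e_row, m_row in zip(likelihood, foreground_mask):
        for e, m in zip(e_row, m_row):
            detected = e >= e_ref
            extracted = m == 1
            if detected and extracted:
                intersection += 1
            if detected or extracted:
                union += 1
    # When neither source marks any pixel, treat the two as fully matching.
    return intersection / union if union else 1.0
```

A low return value would indicate that the foreground mask and the detector disagree, which is the situation in which the text suggests revisiting the threshold range.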

If an existence likelihood map E(x, y) is created for each extraction target object, T_min and T_max may also be set for each object (for example, player and ball). The final threshold of each pixel may then be computed, for example, as the average or the maximum of T(x, y) over the objects.
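Combining per-object threshold maps into a final per-pixel threshold, by average or by maximum as mentioned above, could look like the following sketch. The function name and the list-of-2D-lists input format are illustrative assumptions.

```python
def combine_thresholds(per_object_maps, mode="mean"):
    """Merge same-sized 2D threshold maps, one per object, into a single map.

    mode: "mean" averages the per-object thresholds, "max" takes the largest.
    """
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    combined = []
    for rows in zip(*per_object_maps):      # rows at the same y across all maps
        combined_row = []
        for values in zip(*rows):           # values at the same (x, y) across all maps
            if mode == "max":
                combined_row.append(max(values))
            else:
                combined_row.append(sum(values) / len(values))
        combined.append(combined_row)
    return combined
```

Taking the maximum is the more conservative choice, since a pixel is then treated as likely foreground only if every object's map assigns it a low threshold.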

The statistical information calculation unit 6 calculates, for each pixel of the acquired frame image, statistical information of the pixel by statistically combining past pixel values with the current pixel value. In this embodiment, for each pixel, the average value calculation unit 61 calculates the average μ(x, y) of the pixel values by equation (4), and the standard deviation calculation unit 62 calculates the standard deviation σ(x, y) of the pixel values by equation (5).

Because the statistical information calculation unit 6 repeats this calculation at each time t at which a new frame image is acquired, the statistics μ(x, y) and σ(x, y) are updated frame by frame. U(x, y) is the update rate, that is, the ratio at which the past statistical information and the current pixel value are blended when the statistics are calculated. The update rate determination unit 5 calculates it by equation (7), using each existence likelihood of the existence likelihood map E(x, y) as a parameter.
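Equations (4) and (5) are not reproduced in this text. A common way to realize a per-frame update of a mean and standard deviation with an update rate is an exponential moving average, sketched below. This form is an assumption chosen to be consistent with claim 1 (the current pixel value is reflected in the past statistics at the update rate); the patent's exact formulas may differ.

```python
import math

def update_statistics(mean_prev, std_prev, pixel, u):
    """One per-pixel update step (assumed exponential moving average).

    mean_prev, std_prev: statistics from the previous frame.
    pixel: current pixel value I_t(x, y).
    u: update rate U(x, y) in [0, 1]; u = 0 leaves the statistics unchanged.
    """
    mean = (1.0 - u) * mean_prev + u * pixel
    variance = (1.0 - u) * std_prev ** 2 + u * (pixel - mean) ** 2
    return mean, math.sqrt(variance)
```

Under this form, a small `u` makes the background model change slowly, which is why a low update rate protects stationary objects from being absorbed into the background.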

U_min and U_max are the minimum and maximum values that the update rate can take. In this embodiment, a low update rate is set in regions where the existence likelihood of the object is high. Therefore, even when the extraction target object remains stationary for a long time, parts of it are prevented from dropping out of the foreground as long as the existence likelihood map value stays high.
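The effect described above can be illustrated numerically. The sketch below assumes a linear mapping from likelihood to update rate (equation (7) is not reproduced here) and an exponential-moving-average mean update; both are assumptions, as are the function names and the default parameter values.

```python
def update_rate(e, u_min, u_max):
    """Assumed linear mapping: E = 1 gives U_min, E = 0 gives U_max."""
    return u_max - (u_max - u_min) * e

def mean_after_static_object(e, frames=100, background=100.0, obj=200.0,
                             u_min=0.001, u_max=0.05):
    """Background mean of one pixel after an object of value `obj` has stood still on it."""
    u = update_rate(e, u_min, u_max)
    mean = background
    for _ in range(frames):
        # Assumed moving-average update of the background mean.
        mean = (1.0 - u) * mean + u * obj
    return mean
```

With the default values, a pixel with likelihood 1 keeps a background mean close to the original background after 100 frames, whereas a pixel with likelihood 0 has a mean that has almost fully converged to the object's value, meaning the object would be absorbed into the background.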

U_min and U_max may be designed to take different values depending on the foreground/background decision for each pixel obtained by the background difference calculation unit 7 described later. In general, it is desirable to set a high update rate for pixels judged to be background and a low update rate for pixels judged to be foreground.

For each frame image, the background difference calculation unit 7 performs foreground/background discrimination pixel by pixel, based on the statistical information of each pixel (in this embodiment, the average μ_t(x, y) and the standard deviation σ_t(x, y)), the separation threshold T(x, y), and the pixel value I_t(x, y), and outputs the result, for example, as a mask.

In this example, as in Non-Patent Document 2, the background is modeled with a single Gaussian model. The processing is described in the YUV color space, also as in Non-Patent Document 2, but the same processing can be applied to other color spaces such as RGB. If the color space obtained by the camera image acquisition unit 1 differs from the color space used by the background difference calculation unit 7, the device is assumed to include a mechanism that converts the color space of the input image. A pixel (x, y) that satisfies the condition of equation (8) is judged to be background.

Here, z is a parameter that adjusts how many standard deviations are still regarded as background, and T(x, y) is the threshold calculated by the threshold calculation unit 4. Therefore, the larger T(x, y) is, the more likely the pixel is to be judged as background. Although this embodiment is described as operating in the YUV color space, when there are multiple color spaces, the above condition is evaluated independently for each color space, and the pixel is judged to be background only if the condition is satisfied in all of them.
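Equation (8) is not reproduced in this text. Based on the description (a standard deviation term scaled by z plus a separate threshold term T), one plausible form is |I − μ| < z·σ + T, evaluated per component and required to hold for all of them. The sketch below uses that assumed form; the function name and the tuple-based inputs are illustrative.

```python
def is_background(pixel, mean, std, z, t):
    """Return True if every component lies within z * sigma + T of its mean.

    pixel, mean, std: equal-length sequences, e.g. (Y, U, V) values.
    z: multiplier on the standard deviation.
    t: separation threshold T(x, y) for this pixel.
    """
    return all(abs(i - m) < z * s + t for i, m, s in zip(pixel, mean, std))
```

For instance, a pixel `(100, 128, 128)` against a mean `(102, 128, 127)` with unit standard deviations and `z=2` is background when `t=1` but foreground when `t=0`, which matches the statement that a larger T makes a background decision more likely.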

Furthermore, although the standard deviation term and the threshold term are separate in equation (8), in practice a function may be provided that adjusts the constant z of the standard deviation term according to T(x, y) or the value of the existence likelihood map.

The post-processing unit 8 includes a noise removal unit 81 that removes noise and a defect filling unit 82 that fills in defect regions, both operating on the basis of the existence likelihood map E_t(x, y). To the mask output by the background difference calculation unit 7, it applies processing such as noise removal by filtering (for example, a median filter) and removal of fine noise by repeated dilation and erosion of contours.

Regarding the noise removal unit 81, the background subtraction method of Non-Patent Document 2 treats each connected foreground region in the mask computed by background subtraction as one block and removes noise by checking the size and aspect ratio of the bounding frame of each block.

However, with such a conventional method, when a small object such as a ball is in the scene, the noise removal parameter must be set to a value smaller than the ball so that the ball does not disappear, which makes it difficult for the noise removal to be effective.

In addition, if a player's mask is split for some reason, a split-off part may be deleted when it is small. Therefore, this embodiment removes noise using not only the size of each separated block but also the value of the existence likelihood map E_t(x, y).

For example, as shown in FIG. 5, when the primary mask output from the background difference calculation unit 7 [FIG. 5(a)] contains three blocks P_j (P₁, P₂, P₃; j is the block identifier), the average value d_u of the existence likelihood map [FIG. 5(b)] inside each block is calculated. Only block P₁, whose average d_u is lower than the noise removal threshold d_ref (for example, d_ref = 0.5), is removed, and the other blocks P₂ and P₃ are kept. This makes it possible to keep small regions located where the existence likelihood is high [FIG. 5(c)], so that noise can be removed with high accuracy.
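The block-wise noise removal described above can be sketched as follows. The use of 4-connectivity, the 2D-list mask format, and the function names are assumptions for illustration; the text only specifies that the mean likelihood d_u inside each connected block is compared with d_ref.

```python
from collections import deque

def connected_blocks(mask, target=1):
    """Return each 4-connected block of `target` pixels as a list of (x, y)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blocks = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != target or seen[y][x]:
                continue
            queue, block = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                block.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] == target and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blocks.append(block)
    return blocks

def remove_noise(mask, likelihood, d_ref=0.5):
    """Clear every foreground block whose mean existence likelihood d_u is below d_ref."""
    cleaned = [row[:] for row in mask]
    for block in connected_blocks(mask):
        d_u = sum(likelihood[y][x] for x, y in block) / len(block)
        if d_u < d_ref:
            for x, y in block:
                cleaned[y][x] = 0
    return cleaned
```

Because the decision depends on likelihood rather than on block size alone, a small block such as a ball survives when the detector considers an object likely at that position.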

The noise removal threshold d_ref may be a constant. Alternatively, a mechanism may be provided that makes d_ref smaller as the size of the target region becomes larger, so that a region is removed as noise only when it can be reliably judged not to be an extraction target object.

Regarding the defect filling unit 82, as shown in FIG. 6, a small region (defect region) in which foreground is misjudged as background can arise, surrounded by the foreground region (white part). Such a defect region occurs when part of the object is misjudged as background, for example because a person's clothing contains a part whose color is similar to the background. In this embodiment, the average value of the existence likelihood map E(x, y) inside the defect region is calculated in the same way as in the noise removal unit 81, and the defect region is repaired by filling the hole when the average exceeds a predetermined threshold.
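The hole filling step can be sketched along the same lines. Here a defect region is taken to be a 4-connected background region that does not touch the image border, which is one assumed way of expressing "surrounded by the foreground region"; the function name and the threshold name `fill_ref` are hypothetical.

```python
from collections import deque

def fill_holes(mask, likelihood, fill_ref=0.5):
    """Fill enclosed background regions whose mean existence likelihood exceeds fill_ref."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    filled = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 0 or seen[y][x]:
                continue
            queue, region, touches_border = deque([(x, y)]), [], False
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                if cx in (0, w - 1) or cy in (0, h - 1):
                    touches_border = True
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if touches_border:
                continue  # open background, not a hole inside the foreground
            mean_e = sum(likelihood[ry][rx] for rx, ry in region) / len(region)
            if mean_e > fill_ref:
                for rx, ry in region:
                    filled[ry][rx] = 1
    return filled
```

A hole whose mean likelihood is low is left untouched, so genuine gaps (for example, the space between an arm and the torso) are not filled merely because they are enclosed.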

FIG. 7 compares the case where defect regions are filled in while raising or lowering the threshold in the method of Non-Patent Document 2 [FIGS. 7(a) and 7(b)] with the case where holes are filled by the defect filling unit 82 on the basis of the existence likelihood map E(x, y) [FIG. 7(c)].

When the threshold is set low in the method of Non-Patent Document 2, the object (player) has few missing parts, but background such as the floor and signboards is misjudged as foreground. When the threshold is set high, missing parts of the object (player) appear here and there. This shows that threshold setting alone is limited in its ability to prevent such defects.

In contrast, in this embodiment the post-processing unit 8 performs noise removal and defect filling on the basis of the existence likelihood map E(x, y), so the object is extracted cleanly while the background region is reliably removed.

The output unit 9 outputs the resulting video (image) based on the background region information calculated by the background difference calculation unit 7 or the post-processing unit 8. The output image may be a color image obtained by masking the input image, as shown in FIG. 7, or a two-valued mask image for distinguishing background from foreground, as shown in FIG. 8.

According to this embodiment, the range of the update rate of the background statistics, set on the basis of the existence likelihood map, becomes lower for pixels with a higher existence likelihood of the object. This solves the technical problem that a stationary object is gradually recognized as background and therefore becomes hard to recognize as an object.

FIG. 9 is a functional block diagram showing the configuration of the main part of an object extraction device according to a second embodiment of the present invention. The same reference signs as above denote the same or equivalent parts, so their description is omitted.

This embodiment is characterized in that the update rate determination unit 5 includes an update rate review unit 51, which reviews the update rate U(x, y) based on a comparison between the output of the background difference calculation unit 7 and the existence likelihood map E(x, y). The output mask that the update rate review unit 51 uses for this review is not limited to the output of the background difference calculation unit 7; the mask output by the post-processing unit 8 may be used instead.

The update rate review unit 51 calculates the update rate U_fore(x, y) of pixels judged to be foreground by the background difference calculation unit 7 according to equation (9), so that a higher update rate is set for pixels judged to be background and a lower update rate is set for pixels judged to be foreground.

On the other hand, the update rate U_back(x, y) of pixels judged to be background by the background difference calculation unit 7 is calculated according to equation (10).

Here, U_minfore and U_minback are the minimum update rates that pixels judged to be foreground and pixels judged to be background can take, respectively, with U_minfore < U_minback. U_maxfore and U_maxback are the corresponding maximum update rates, with U_maxfore < U_maxback.
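Equations (9) and (10) are not reproduced in this text. Assuming the same kind of linear mapping from likelihood to update rate in both cases, differing only in the range, the review step could be sketched as follows. The linear form, the function name, and the argument names are assumptions; only the two ordering constraints come from the description.

```python
def reviewed_update_rate(e, is_foreground,
                         u_min_fore, u_max_fore, u_min_back, u_max_back):
    """Pick the update rate range by the foreground/background decision, then map E to a rate.

    e: existence likelihood E(x, y) in [0, 1].
    is_foreground: decision of the background difference step for this pixel.
    """
    # Constraints stated in the description.
    if not (u_min_fore < u_min_back and u_max_fore < u_max_back):
        raise ValueError("foreground range must lie below the background range")
    if is_foreground:
        u_min, u_max = u_min_fore, u_max_fore
    else:
        u_min, u_max = u_min_back, u_max_back
    # Assumed linear mapping: high likelihood gives the low end of the range.
    return u_max - (u_max - u_min) * e
```

With ranges of 0.0 to 0.25 for foreground and 0.25 to 0.5 for background, a pixel with likelihood 0.5 receives 0.125 if judged foreground and 0.375 if judged background.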

According to this embodiment, the range of the update rate determined on the basis of the existence likelihood map is set lower in the foreground region than in the background region. The technical problem that a stationary object is gradually recognized as background and therefore becomes hard to recognize as an object can thus be solved with even higher reliability.

1... Camera image acquisition unit, 2... Object region detection unit, 3... Existence likelihood map calculation unit, 4... Threshold calculation unit, 5... Update rate determination unit, 6... Statistical information calculation unit, 7... Background difference calculation unit, 8... Post-processing unit, 9... Output unit, 21... First detection unit, 22... Second detection unit, 81... Noise removal unit, 82... Defect filling unit

Claims

1. An object extraction device that extracts an object from a video, comprising:
means for acquiring the video;
means for detecting an object region from the acquired video;
means for calculating an existence likelihood map of the object based on the detection result of the object region;
statistical information calculating means for calculating statistical information of each pixel; and
background difference calculation means for extracting a foreground region as the object by a background difference method based on the statistical information and a background difference threshold value,
wherein the statistical information calculating means obtains current statistical information by reflecting the current pixel value in the past statistical information of each pixel at a predetermined update rate, and
the object extraction device further comprises update rate determining means for determining the update rate based on the existence likelihood map.

2. The object extraction device according to claim 1, wherein the update rate determination unit lowers the update rate for pixels having a higher existence likelihood of the object.

3. The object extracting apparatus according to claim 1, further comprising a threshold value calculating means for calculating the background difference threshold value based on the existence likelihood map.

4. The object extraction device according to claim 3, wherein the threshold value calculation unit lowers the background difference threshold value for pixels having a higher existence likelihood of the object.

5. The object extraction device according to claim 3, wherein the threshold value calculation means dynamically changes the background difference threshold value based on a match ratio between the detection result of the object region and the calculation result of the background difference calculation means.

6. The object extraction device according to any one of claims 1 to 5, wherein the means for calculating the existence likelihood map calculates the current existence likelihood map by weighting the current detection result of the object region and the previous detection results with a predetermined learning rate.

7. The object extraction device according to claim 1, wherein the means for detecting the object region detects object regions by a plurality of different detection methods and integrates the respective detection results into one.

8. The object extracting apparatus according to claim 1, further comprising post-processing means for improving the accuracy of the extracted object based on an existence likelihood map.

9. The object extracting apparatus according to claim 8, wherein the post-processing unit includes a noise removing unit that regards a foreground region whose average value of the existence likelihood map is below a predetermined noise threshold as a background region.

10. The object extraction device according to claim 9, wherein the noise threshold is set lower as the size of the foreground region is larger.

11. The object extraction device according to claim 8, wherein the post-processing unit includes a defect filling unit that regards a background region in which the average value of the existence likelihood map exceeds a predetermined filling threshold as a foreground region.

12. The object extraction apparatus according to claim 1, wherein the means for calculating the existence likelihood map calculates an existence likelihood map for each object to be extracted.

13. The object extraction device according to claim 12, wherein the background difference threshold and the update rate are determined for each existence likelihood map.

14. The object extracting apparatus according to claim 1, further comprising update rate reviewing means for reviewing the update rate based on the extraction result of the object and the existence likelihood map.

15. The object extracting apparatus according to claim 14, wherein the update rate reviewing unit sets the update rate determined based on the existence likelihood map to be lower in the foreground area than in the background area.

16. The object extracting apparatus according to claim 1, wherein the statistical information calculating means includes means for calculating an average value and a standard deviation based on a history of pixel values.

17. The object extraction device according to any one of claims 1 to 16, wherein the background difference calculation means models the background region with a single Gaussian distribution based on the statistical information.

18. An object extraction method for extracting an object from a video, comprising:
a step of acquiring the video and detecting an object region;
a step of calculating an existence likelihood map of the object based on the detection result of the object region;
a step of calculating statistical information of each pixel; and
a step of extracting a foreground region as the object by a background difference method based on the statistical information and a background difference threshold value,
wherein, in the step of calculating the statistical information, current statistical information is obtained by reflecting the current pixel value in the past statistical information of each pixel at a predetermined update rate, and the update rate is determined based on the existence likelihood map.