JP6632134B2

JP6632134B2 - Image processing apparatus, image processing method, and computer program

Info

Publication number: JP6632134B2
Application number: JP2016124761A
Authority: JP
Inventors: 強要; 野中 敬介; 内藤 整
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-06-23
Filing date: 2016-06-23
Publication date: 2020-01-15
Anticipated expiration: 2036-06-23
Also published as: JP2017228146A

Description

The present invention relates to an image processing apparatus, an image processing method, and a computer program.

Techniques have been proposed for generating video from a free viewpoint other than the camera viewpoints (hereinafter referred to as free viewpoint video), for sports scenes and the like. Based on video captured by a plurality of cameras, such a technique synthesizes video from a virtual viewpoint at which no camera is placed and displays the result on a screen, so that the scene can be viewed from a variety of viewpoints.

Accurate extraction of objects by background subtraction is the first step toward obtaining high-quality free viewpoint video. Known background subtraction techniques include those described in Non-Patent Documents 1 and 2. Non-Patent Document 1 discloses a technique for extracting objects automatically. Non-Patent Document 2 discloses a technique for extracting objects accurately by the Grabcut method.

Non-Patent Document 1: Elgammal, Ahmed, David Harwood, and Larry Davis. "Non-parametric model for background subtraction." Computer Vision - ECCV 2000. Springer Berlin Heidelberg, 2000, pp. 751-767.
Non-Patent Document 2: Rother, Carsten, Vladimir Kolmogorov, and Andrew Blake. "Grabcut: Interactive foreground extraction using iterated graph cuts." ACM Transactions on Graphics (TOG), Vol. 23, No. 3. ACM, 2004.
Non-Patent Document 3: Shinji Morita, Kazumasa Yamazawa, Masahiko Terasawa, Naokazu Yokoya. "Network-enabled Remote Monitoring System Using Omnidirectional Image Sensor." IEICE Transactions on Information and Systems (D-II), Vol. J88-D-II, No. 5, pp. 864-875 (2005.5).

However, the technique described in Non-Patent Document 1 requires various parameters to be set manually in advance. This is burdensome for the user, and there is no guarantee that the manually set parameters are optimal. In addition, the Grabcut method described in Non-Patent Document 2 requires user input. Applying the Grabcut method to object extraction from a moving image would therefore require input from the user for every frame of the moving image, which is impractical.

The present invention has been made in view of these problems, and its object is to provide a background subtraction technique that achieves both accurate object extraction and user convenience.

One aspect of the present invention relates to an image processing apparatus. This image processing apparatus comprises: means for acquiring a first mask obtained by performing a first background subtraction on a target frame of a moving image within the target frame; means for acquiring a second mask obtained by performing a second background subtraction on the target frame between frames; means for generating a combined mask by combining the first mask and the second mask; means for generating a reference mask by performing a third background subtraction that differs from both the first background subtraction and the second background subtraction; and means for setting, based on the generated reference mask, at least one of a parameter used in the first background subtraction and a parameter used in the second background subtraction.

Any combination of the above constituent elements, and any mutual substitution of the constituent elements and expressions of the present invention among apparatuses, methods, systems, computer programs, recording media storing computer programs, and the like, are also effective as aspects of the present invention.

According to the present invention, it is possible to provide a background subtraction technique that achieves both accurate object extraction and user convenience.

FIG. 1 is a schematic diagram illustrating a free viewpoint image distribution system including an image processing apparatus according to an embodiment.
FIG. 2 is a block diagram illustrating the functions and configuration of the mobile terminal of FIG. 1.
FIG. 3 is a block diagram illustrating the functions and configuration of the image processing apparatus of FIG. 1.
FIG. 4 is a data structure diagram illustrating an example of the parameter holding unit of FIG. 3.
FIGS. 5(a) and 5(b) are representative screen views of the region of interest setting screen.
FIGS. 6(a), 6(b), and 6(c) are representative screen views of the weight setting screen.
FIG. 7 is a representative screen view of the free viewpoint image playback screen.
FIG. 8 is a representative screen view of the error screen.
FIG. 9 is an explanatory diagram for describing the parameter setting processing in the parameter setting unit of FIG. 3.
FIGS. 10(a), 10(b), and 10(c) are explanatory diagrams for describing how the combined mask differs depending on the weight α.
FIGS. 11(a) and 11(b) are graphs for describing evaluation using the modified F-measure.
FIGS. 12(a) to 12(f) are graphs showing fluctuations in the average pixel intensity of frames.
FIG. 13 is a flowchart illustrating the flow of a series of processes in the image processing apparatus of FIG. 1.
FIG. 14 is a flowchart illustrating the flow of processing in the threshold setting processing step of FIG. 13.
FIG. 15 is a chart illustrating the flow of processing in the combined mask generation processing step of FIG. 13.

In the following, identical or equivalent components, members, and processes shown in the drawings are denoted by the same reference numerals, and redundant description is omitted as appropriate. In each drawing, some members that are not important for the description are omitted.

When extracting an object from a frame of a moving image, the image processing apparatus according to the embodiment generates a mask from the frame by background subtraction in the spatial domain. Because background subtraction in the spatial domain is performed within the frame being processed, it is hereinafter referred to as intra background subtraction. A mask generated by intra background subtraction is referred to as an intra mask. In parallel with the intra background subtraction, the image processing apparatus generates a mask from the frame by background subtraction in the temporal domain. Because background subtraction in the temporal domain is performed across frames including the frame being processed, it is hereinafter referred to as inter background subtraction. A mask generated by inter background subtraction is referred to as an inter mask. The image processing apparatus generates a combined mask by combining the two generated masks, the intra mask and the inter mask, and uses the combined mask to extract the object. This makes highly accurate object extraction possible while reducing the need for instructions and input from the user.

In the present embodiment, background subtraction using a Gaussian Mixture Model (see Non-Patent Document 1) is adopted as the intra background subtraction. In this background subtraction, the background is modeled using a mixture of Gaussian distributions (MoG). First, a background frame containing no objects is prepared, and a mixture-of-Gaussians model of the background is obtained from that background frame. If a pixel of the frame being processed does not belong to the mixture-of-Gaussians model, that pixel is extracted as foreground. A threshold is used in determining whether a pixel belongs to the model. Hereinafter, this threshold is referred to as the intra threshold (gmm_th).
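The membership test described above can be sketched as follows. This is a minimal illustration only: the text does not specify how gmm_th enters the test, so the sketch assumes a single-channel intensity and treats a pixel as background when it lies within gmm_th standard deviations of the mean of at least one mixture component. The function name, the component values, and the threshold value are all hypothetical.

```python
def belongs_to_background(value, components, gmm_th):
    """Return True if `value` matches at least one mixture component.

    components: list of (weight, mean, std) tuples describing the background
    model (assumed to have been fitted beforehand from a background frame).
    ASSUMPTION: a pixel "belongs" to the model when it lies within
    gmm_th standard deviations of some component mean.
    """
    return any(abs(value - mu) <= gmm_th * sigma for _, mu, sigma in components)


# Hypothetical two-component background model for one pixel.
background_model = [(0.7, 100.0, 4.0), (0.3, 180.0, 6.0)]
gmm_th = 2.5  # hypothetical intra threshold

for intensity in (105, 140):
    label = "background" if belongs_to_background(intensity, background_model, gmm_th) else "foreground"
    print(intensity, label)
```

Under this assumed test, a larger gmm_th accepts more pixels as background, which is why the threshold needs tuning against a reference.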

In the present embodiment, background subtraction in which a single Gaussian model is applied to each pixel (see Non-Patent Document 3) is adopted as the inter background subtraction. In this background subtraction, when the coordinates of a pixel in a frame are written as (i, j), a single normal distribution model with mean μ_{i,j} and standard deviation σ_{i,j} is assigned to the intensity of pixel (i, j). The mean μ_{i,j} and the standard deviation σ_{i,j} are calculated with the set of intensities of pixel (i, j) over a plurality of frames of the moving image as the population. If, for all of the Y, U, and V channels, the absolute value of the deviation of the intensity of pixel (i, j) in the frame being processed from the mean is smaller than the standard deviation σ_{i,j} plus a threshold, pixel (i, j) is determined to belong to the background. Otherwise, pixel (i, j) is determined to belong to the foreground. Hereinafter, these thresholds for Y, U, and V are referred to as the inter thresholds (Y_th, U_th, V_th).
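The per-pixel test above can be sketched for a single pixel as follows. This is a minimal sketch, assuming each sample is a (Y, U, V) tuple and that the deviation is measured from the per-channel mean; the function names, sample values, and threshold values are hypothetical.

```python
from statistics import mean, pstdev


def fit_pixel_model(samples):
    """Return [(mu, sigma), ...] per channel for one pixel position.

    samples: list of (Y, U, V) tuples, one per frame. The population
    standard deviation is used because the frames are treated as the
    population.
    """
    channels = list(zip(*samples))
    return [(mean(c), pstdev(c)) for c in channels]


def is_background(pixel, model, thresholds):
    """True if |value - mu| < sigma + th holds on every channel."""
    return all(
        abs(value - mu) < sigma + th
        for value, (mu, sigma), th in zip(pixel, model, thresholds)
    )


# Intensities of one pixel position over four frames (hypothetical data).
history = [(100, 128, 128), (102, 127, 129), (98, 129, 127), (100, 128, 128)]
model = fit_pixel_model(history)
inter_thresholds = (5, 3, 3)  # hypothetical (Y_th, U_th, V_th)

print(is_background((101, 128, 128), model, inter_thresholds))  # small deviation
print(is_background((160, 128, 128), model, inter_thresholds))  # large Y deviation
```

Because the condition must hold on all three channels, a large deviation on any one channel is enough to label the pixel as foreground.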

FIG. 1 is a schematic diagram illustrating a free viewpoint image distribution system 110 including an image processing apparatus 200 according to the embodiment. The free viewpoint image distribution system 110 includes a plurality of cameras 116, 118, and 120, the image processing apparatus 200 connected to those cameras, and a mobile terminal 114 such as a mobile phone, tablet, smartphone, HMD (Head Mounted Display), or notebook PC. The image processing apparatus 200 and the mobile terminal 114 are connected via a network 112 such as the Internet. In the free viewpoint image distribution system 110, the plurality of cameras 116, 118, and 120, placed indoors for example, capture a person 124 standing on a floor 126. The cameras 116, 118, and 120 transmit the captured video to the image processing apparatus 200, and the image processing apparatus 200 processes the video. The user of the mobile terminal 114 specifies a desired viewpoint to the image processing apparatus 200, and the image processing apparatus 200 synthesizes an image of the person 124 as seen from the specified viewpoint (virtual viewpoint) and distributes it to the mobile terminal 114 via the network 112.

Although FIG. 1 describes the case of capturing a person indoors, the technical idea of the present embodiment is not limited to this and can also be applied, for example, to capturing a fitness instructor, a dancer's performance, or a soccer match. A stationary terminal such as a desktop PC, a laptop PC, a TV receiver, or a set-top box may be used instead of the mobile terminal 114. The image processing apparatus 200 may distribute the video by downloading the whole in advance and then playing it back, by streaming, or by progressive download. When distribution is not in real time, the image processing apparatus 200 may calculate the mean μ_{i,j} and the standard deviation σ_{i,j} in advance for the moving image acquired from the cameras and held. When distribution is in real time, the image processing apparatus 200 may calculate the mean μ_{i,j} and the standard deviation σ_{i,j} from the moving image obtained so far and update them by recalculating each time a predetermined amount of new moving image is obtained.
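For the real-time case, one way to keep the mean and standard deviation current is sketched below. Note that this swaps in Welford's incremental update, which folds in one sample at a time, rather than the periodic full recalculation described above; both yield the same population statistics. The class name and sample values are hypothetical.

```python
import math


class RunningStats:
    """Incrementally tracked mean and population standard deviation
    for one channel of one pixel (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


stats = RunningStats()
for intensity in (98, 100, 102):  # hypothetical intensities from new frames
    stats.update(intensity)

print(stats.mean, stats.std)
```

The incremental form avoids storing past frames, at the cost of never discarding old samples; a windowed recalculation, as in the text, would adapt better to gradual lighting changes.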

FIG. 2 is a block diagram illustrating the functions and configuration of the mobile terminal 114 of FIG. 1. Each block shown here can be realized, in hardware terms, by elements and mechanical devices including a computer's CPU (Central Processing Unit) and, in software terms, by a computer program or the like; what is drawn here are functional blocks realized by their cooperation. Those skilled in the art who have read this specification will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The mobile terminal 114 includes a communication unit 246, a display control unit 216, a display 218, and an input unit 220. The communication unit 246 transmits information identifying the viewpoint specified by the user via the input unit 220 to the image processing apparatus 200 via the network 112. The communication unit 246 acquires a moving image from the image processing apparatus 200 via the network 112. The display control unit 216 controls the display 218 and causes it to display various screens. The display control unit 216 causes the display 218 to display the moving image acquired by the communication unit 246. The display 218 may be an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) display. The input unit 220 receives input from the user. The input unit 220 may be a mouse, a keyboard, a touch panel, buttons, or a remote controller.

FIG. 3 is a block diagram illustrating the functions and configuration of the image processing apparatus 200 of FIG. 1. Each block shown here can be realized, in hardware terms, by elements and mechanical devices including a computer's CPU (Central Processing Unit) and, in software terms, by a computer program or the like; what is drawn here are functional blocks realized by their cooperation. Those skilled in the art who have read this specification will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The image processing apparatus 200 uses a combined mask to extract an object, for example the image of the person 124, from the moving image obtained from the cameras, and synthesizes an image from an arbitrary virtual viewpoint based on the extracted object. The image processing apparatus 200 includes a moving image acquisition unit 202, a parameter setting unit 204, a combined mask generation unit 206, an object extraction unit 208, a reset determination unit 210, an update determination unit 212, a parameter update unit 214, a moving image distribution unit 244, a moving image holding unit 222, and a parameter holding unit 224.

The moving image acquisition unit 202 acquires a moving image from each of the cameras 116, 118, and 120 connected to the image processing apparatus 200. The moving image acquisition unit 202 stores the acquired moving image in the moving image holding unit 222. The moving image holding unit 222 holds the moving image. A moving image may be a time series of a plurality of frames.

When object extraction from a moving image begins, the parameter setting unit 204 sets the parameters used in the intra background subtraction and the inter background subtraction. The parameters of the intra background subtraction include gmm_th, and the parameters of the inter background subtraction include Y_th, U_th, and V_th. The parameter setting unit 204 includes a region of interest setting unit 226, a reference mask generation unit 228, a weight setting unit 230, a test mask generation unit 232, a modified F-measure calculation unit 234, and a parameter determination unit 236.

The region of interest setting unit 226 acquires from the moving image holding unit 222 the reference frame of the moving image being processed that is used to generate the reference mask, for example the first frame. The region of interest setting unit 226 generates a region of interest setting screen 400 (described later with reference to FIGS. 5(a) and 5(b)) containing the acquired first frame, and transmits the screen data to the mobile terminal 114 via the network 112. The display control unit 216 of the mobile terminal 114 causes the display 218 to display the region of interest setting screen 400 based on the received screen data. The mobile terminal 114 receives, from the user via the input unit 220, the specification of a region of interest (ROI) in the displayed first frame. The communication unit 246 of the mobile terminal 114 transmits information on the specified region of interest to the image processing apparatus 200 via the network 112. The region of interest setting unit 226 of the image processing apparatus 200 receives the information on the specified region of interest.

The reference mask generation unit 228 generates a reference mask by performing, on the first frame for which the region of interest has been specified, a background subtraction that differs from both the intra background subtraction and the inter background subtraction. The reference mask generation unit 228 generates the reference mask by the Grabcut method (see Non-Patent Document 2), which takes as input the first frame and the region of interest specified for it. The reference mask generation unit 228 may accept edits by the user (painting, brushing, and the like) to the reference mask generated by the Grabcut method.

The Grabcut method takes longer to process than the intra background subtraction or the inter background subtraction and requires user input, namely the specification of a region of interest by the user, but it yields more accurate results than those background subtractions. An objective reference is important in the search for optimal parameters described below. In the present embodiment, a reference mask is obtained from the first frame by the Grabcut method, and that reference mask serves as the reference for the search.

The weight setting unit 230 receives from the user the specification of the weight α used in calculating the modified F-measure described later. The weight setting unit 230 generates a weight setting screen 500 (described later with reference to FIGS. 6(a), 6(b), and 6(c)) and transmits the screen data to the mobile terminal 114 via the network 112. The display control unit 216 of the mobile terminal 114 causes the display 218 to display the weight setting screen 500 based on the received screen data. The mobile terminal 114 acquires the value that the user has entered or specified on the weight setting screen 500 via the input unit 220. The communication unit 246 of the mobile terminal 114 transmits the acquired value to the image processing apparatus 200 via the network 112. The weight setting unit 230 of the image processing apparatus 200 sets the received value as the weight α.

The test mask generation unit 232 selects one inter threshold from the set of inter thresholds (Y_th, U_th, V_th) that can be set for the inter background subtraction. The test mask generation unit 232 generates an inter test mask by performing, on the first frame acquired by the region of interest setting unit 226, the inter background subtraction using the selected inter threshold.

The modified F-measure calculation unit 234 takes the inter test mask generated by the test mask generation unit 232 as the evaluation target and the reference mask generated by the reference mask generation unit 228 as the ground truth, and calculates the weighted harmonic mean of the precision and the recall as the modified F-measure. The modified F-measure is given by Equation 1 below.

… (1)

Here, the weight α is a value from 0 to 2 inclusive and is set by the weight setting unit 230. The weight α reflects the user's preference in object extraction. The precision and the recall are calculated by Equations 2 and 3 below, respectively.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (3)$$

Here, TP (True Positive) is the total number of pixels that belong to the background in both the inter test mask and the reference mask, FN (False Negative) is the total number of pixels that belong to the foreground in the inter test mask but to the background in the reference mask, and FP (False Positive) is the total number of pixels that belong to the background in the inter test mask but to the foreground in the reference mask.
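The evaluation above can be sketched as follows. The counting follows the definitions of TP, FN, and FP given in the text, in which "positive" means background. The exact form of Equation 1 is not reproduced in this text, so the mapping from α to the two weights of the harmonic mean (α for precision, 2 - α for recall) is purely an assumption made for illustration; the function names and mask values are likewise hypothetical.

```python
BG, FG = 0, 1  # mask labels: background and foreground


def confusion_counts(test_mask, reference_mask):
    """Count TP, FP, FN with 'positive' meaning background, as in the text."""
    tp = fp = fn = 0
    for test_row, ref_row in zip(test_mask, reference_mask):
        for t, r in zip(test_row, ref_row):
            if t == BG and r == BG:
                tp += 1
            elif t == BG and r == FG:
                fp += 1
            elif t == FG and r == BG:
                fn += 1
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weighted_harmonic_mean(precision, recall, w_p, w_r):
    """Generic weighted harmonic mean of precision and recall."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (w_p + w_r) / (w_p / precision + w_r / recall)


reference = [[BG, BG, FG], [BG, FG, FG]]  # hypothetical reference mask
test = [[BG, FG, FG], [BG, BG, FG]]       # hypothetical test mask

tp, fp, fn = confusion_counts(test, reference)
precision, recall = precision_recall(tp, fp, fn)

alpha = 1.0  # ASSUMPTION: weights (alpha, 2 - alpha); not taken from Equation 1
score = weighted_harmonic_mean(precision, recall, alpha, 2 - alpha)
print(tp, fp, fn, precision, recall, score)
```

With equal weights the score reduces to the ordinary F-measure; shifting the weights favors either precision or recall, which is how a user preference could be expressed through a single parameter.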

The test mask generation unit 232 and the modified F-measure calculation unit 234 repeat the selection of an inter threshold and the calculation of the modified F-measure until every settable inter threshold has been selected. The parameter determination unit 236 determines the inter threshold so that the modified F-measure is large. Specifically, from the set of pairs of inter threshold and modified F-measure obtained by the test mask generation unit 232 and the modified F-measure calculation unit 234, the parameter determination unit 236 extracts the pair with the largest modified F-measure. The parameter determination unit 236 determines the inter threshold of the extracted pair to be the optimal inter threshold (Y_th_opt, U_th_opt, V_th_opt) and registers it in the parameter holding unit 224.
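The exhaustive search described above can be sketched as follows. This is a minimal sketch in which mask generation and scoring are passed in as callables; the stand-in `make_mask` and `score` functions and the candidate grid are hypothetical and exist only to make the example runnable.

```python
from itertools import product


def search_best_threshold(candidates, make_mask, score):
    """Try every candidate threshold and keep the one with the largest score.

    make_mask(threshold) -> test mask; score(mask) -> evaluation against the
    reference mask (the modified F-measure in the text).
    """
    best_threshold, best_score = None, float("-inf")
    for threshold in candidates:
        current = score(make_mask(threshold))
        if current > best_score:
            best_threshold, best_score = threshold, current
    return best_threshold, best_score


# Hypothetical stand-ins: the "mask" is just the threshold triple, and the
# score peaks at (10, 20, 0).
def make_mask(threshold):
    return threshold


def score(mask):
    y, u, v = mask
    return -((y - 10) ** 2 + (u - 20) ** 2 + v ** 2)


# Hypothetical grid of settable (Y_th, U_th, V_th) values.
candidates = list(product(range(0, 30, 10), repeat=3))
best, best_score = search_best_threshold(candidates, make_mask, score)
print(best, best_score)
```

The cost grows with the product of the per-channel grid sizes, which is acceptable here because the search runs only on the reference frame rather than on every frame.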

The test mask generation unit 232 selects one intra threshold from the set of intra thresholds (gmm_th) that can be set for the intra background subtraction. The test mask generation unit 232 generates an intra test mask by performing, on the first frame acquired by the region of interest setting unit 226, the intra background subtraction using the selected intra threshold. The test mask generation unit 232 generates an inter optimal mask by performing, on the first frame, the inter background subtraction using the optimal inter threshold (Y_th_opt, U_th_opt, V_th_opt) held in the parameter holding unit 224. The test mask generation unit 232 combines the intra test mask and the inter optimal mask by a method that is the same as, or equivalent to, the combining method of the combining unit 242 described later, and thereby generates a combined test mask.

The modified F-measure calculation unit 234 calculates the modified F-measure with the combined test mask generated by the test mask generation unit 232 as the evaluation target and the reference mask generated by the reference mask generation unit 228 as the ground truth. The test mask generation unit 232 and the modified F-measure calculation unit 234 repeat the selection of an intra threshold and the calculation of the modified F-measure until every intra threshold has been selected. The parameter determination unit 236 determines the intra threshold so that the modified F-measure is large. Specifically, from the set of pairs of intra threshold and modified F-measure obtained by the test mask generation unit 232 and the modified F-measure calculation unit 234, the parameter determination unit 236 extracts the pair with the largest modified F-measure. The parameter determination unit 236 determines the intra threshold of the extracted pair to be the optimal intra threshold (gmm_th_opt) and registers it in the parameter holding unit 224.

The combined mask generation unit 206 generates a combined mask by applying the inter background subtraction and the intra background subtraction to a frame of the moving image being processed, with reference to the thresholds held in the parameter holding unit 224. The combined mask generation unit 206 includes an intra background subtraction unit 238, an inter background subtraction unit 240, and a combining unit 242.

The intra background subtraction unit 238 refers to the parameter holding unit 224 and identifies the optimal intra threshold (gmm_th_opt). The intra background subtraction unit 238 generates an intra mask by performing, on the frame, the intra background subtraction using the identified intra threshold.

インター背景差分部２４０は、パラメータ保持部２２４を参照し、最適なインター閾値（Ｙ＿ｔｈ＿ｏｐｔ、Ｕ＿ｔｈ＿ｏｐｔ、Ｖ＿ｔｈ＿ｏｐｔ）を特定する。インター背景差分部２４０は、フレームに対して、特定されたインター閾値を用いたインター背景差分を行うことでインターマスクを生成する。 The inter-background difference unit 240 refers to the parameter storage unit 224 and specifies the optimum inter-threshold values (Y_th_opt, U_th_opt, V_th_opt). The inter-background difference unit 240 generates an inter-mask by performing an inter-background difference on the frame using the specified inter threshold.

合成部２４２は、イントラ背景差分部２３８によって生成されたイントラマスクとインター背景差分部２４０によって生成されたインターマスクとを合成することで合成マスクを生成する。合成部２４２は、画素ごとにイントラマスクとインターマスクとの間で論理積（ＡＮＤ演算、＆＆）を行うことで合成マスクを生成する。ＲＧＢで表した場合、マスクの背景部分の画素値は（０，０，０）すなわち黒色であり、前景部分の画素値は画素のサンプリングビットをｎとすると（２^ｎ−１，２^ｎ−１，２^ｎ−１）すなわち白色である。したがって、合成マスクの背景部分は、イントラマスクおよびインターマスクのうちの少なくとも一方における背景部分であり、合成マスクの前景部分はイントラマスクおよびインターマスクの両方における前景部分である。言い換えると、合成によりマスクの背景部分が増加する。これにより、より正確でシャープなオブジェクトの輪郭が抽出される。 The combining unit 242 generates a combined mask by combining the intra mask generated by the intra background difference unit 238 with the inter mask generated by the inter background difference unit 240. Specifically, the combining unit 242 takes the logical product (AND operation, &&) of the intra mask and the inter mask for each pixel. In RGB representation, the pixel value of the background portion of a mask is (0, 0, 0), that is, black, and the pixel value of the foreground portion is (2^n - 1, 2^n - 1, 2^n - 1), that is, white, where n is the number of sampling bits per pixel. Therefore, the background portion of the combined mask is the portion that is background in at least one of the intra mask and the inter mask, and the foreground portion of the combined mask is the portion that is foreground in both the intra mask and the inter mask. In other words, combining enlarges the background portion of the mask. As a result, a more accurate and sharper object contour is extracted.
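
The per-pixel AND described above can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the function name `combine_masks`, the representation of a mask as a list of rows, and the use of 0 for background and 1 for foreground are all assumptions made for the example.

```python
def combine_masks(intra_mask, inter_mask):
    """Pixel-wise AND of two binary masks (hypothetical helper).

    Each mask is a list of rows; 0 means background and any
    nonzero value means foreground. The result uses 0 and 1.
    """
    if len(intra_mask) != len(inter_mask):
        raise ValueError("masks must have the same height")
    combined = []
    for intra_row, inter_row in zip(intra_mask, inter_mask):
        if len(intra_row) != len(inter_row):
            raise ValueError("masks must have the same width")
        # A pixel is foreground only if it is foreground in both masks.
        combined.append([1 if (a and b) else 0
                         for a, b in zip(intra_row, inter_row)])
    return combined


intra = [[1, 1, 0],
         [1, 1, 1]]
inter = [[1, 0, 0],
         [1, 1, 0]]
combined = combine_masks(intra, inter)
```

Because a pixel survives only when both inputs mark it as foreground, the combined mask can never have more foreground pixels than either input, which matches the statement that combining enlarges the background portion.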

オブジェクト抽出部２０８は、合成部２４２によって生成された合成マスクを用いてフレームからオブジェクトを抽出する。 The object extraction unit 208 extracts an object from the frame using the combined mask generated by the combining unit 242.
動画像配信部２４４は、オブジェクト抽出部２０８における抽出結果を利用して、携帯端末１１４のユーザにより指定された視点からの動画像を合成する。動画像配信部２４４は、合成により得られた合成動画像をネットワーク１１２を介して携帯端末１１４に送信する。 The moving image distribution unit 244 uses the extraction result of the object extraction unit 208 to synthesize a moving image as seen from the viewpoint specified by the user of the portable terminal 114. The moving image distribution unit 244 transmits the resulting synthesized moving image to the portable terminal 114 via the network 112.

再設定判定部２１０は、パラメータ設定部２０４による基準マスクの生成および生成された該基準マスクによるパラメータの再設定が必要か否かを判定する。再設定判定部２１０は、合成部２４２によって生成された合成マスクの平均画素強度の変化量を指標として再設定の要否の判定を行う。合成部２４２によって生成された合成マスクにおける画素（ｉ，ｊ）の強度をＭｏｕｔ（ｉ，ｊ）と表記する。再設定判定部２１０は、以下の式４により合成マスクの平均画素強度Ａｖｅ＿ｍｓｋを算出する。 The reset determination unit 210 determines whether it is necessary for the parameter setting unit 204 to generate a reference mask again and to reset the parameters using that reference mask. The reset determination unit 210 makes this determination using the amount of change in the average pixel intensity of the combined mask generated by the combining unit 242 as an index. The intensity of the pixel (i, j) in the combined mask generated by the combining unit 242 is denoted Mout(i, j). The reset determination unit 210 calculates the average pixel intensity Ave_msk of the combined mask by the following Expression 4.

$$\mathrm{Ave\_msk} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} M_{out}(i, j)$$ … (4)
ここで、Ｍはフレームの幅、Ｎはフレームの高さである。再設定判定部２１０は、現在のフレームから得られた合成マスクの平均画素強度と一つ前のフレームから得られた合成マスクの平均画素強度との差の絶対値｜Δｍｓｋ｜を算出する。再設定判定部２１０は、算出された｜Δｍｓｋ｜が所定の閾値ｔｈ＿ｍｓｋを上回る場合、パラメータの再設定が必要であると判定し、そうでなければ再設定は不要と判定する。 Here, M is the width of the frame and N is the height of the frame. The reset determination unit 210 calculates the absolute value |Δmsk| of the difference between the average pixel intensity of the combined mask obtained from the current frame and that of the combined mask obtained from the immediately preceding frame. When the calculated |Δmsk| exceeds a predetermined threshold th_msk, the reset determination unit 210 determines that the parameters need to be reset; otherwise it determines that resetting is unnecessary.

２つの連続するフレームの内容は通常は互いによく似ているから、オブジェクトの抽出のエラーが小さい場合は｜Δｍｓｋ｜は小さい。オブジェクトの抽出のエラーが大きい場合はそのエラーに起因して｜Δｍｓｋ｜が大きくなる。再設定判定部２１０によると、｜Δｍｓｋ｜がｔｈ＿ｍｓｋより大きい場合はエラーが大きいと判定され、パラメータの再設定が行われる。このエラー検知は自動的に行われる。 Since the contents of two consecutive frames are usually very similar to each other, | Δmsk | is small when the error in object extraction is small. If the error of the object extraction is large, | Δmsk | becomes large due to the error. According to the reset determination unit 210, when | Δmsk | is larger than th_msk, it is determined that the error is large, and the parameters are reset. This error detection is performed automatically.
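
The reset decision described above can be illustrated with a short sketch. It assumes a single-channel mask stored as a list of rows (the description computes the average per Y, U, and V channel), and the names `average_intensity` and `needs_reset` as well as the example threshold values are hypothetical.

```python
def average_intensity(image):
    """Mean pixel intensity of a single-channel image given as a list of rows."""
    height = len(image)
    width = len(image[0])
    return sum(sum(row) for row in image) / (width * height)


def needs_reset(prev_mask, cur_mask, th_msk):
    """Return True when |delta_msk| exceeds th_msk (parameters should be reset)."""
    delta_msk = average_intensity(cur_mask) - average_intensity(prev_mask)
    return abs(delta_msk) > th_msk


prev_mask = [[0, 255],
             [0, 255]]   # average 127.5
cur_mask = [[0, 255],
            [0, 0]]      # average 63.75

reset_strict = needs_reset(prev_mask, cur_mask, th_msk=10.0)
reset_loose = needs_reset(prev_mask, cur_mask, th_msk=100.0)
```

A small `th_msk` makes the check sensitive to modest changes in the mask, while a large value tolerates them; the appropriate value is application dependent and is not specified here.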

更新判定部２１２は、パラメータ保持部２２４に保持されるインター閾値の更新が必要か否かを判定する。更新判定部２１２は、フレーム間の平均画素強度の差に基づいて更新の要否を判定する。位置（ｉ，ｊ）の画素の強度をＩ（ｉ，ｊ）と表記する。更新判定部２１２は、以下の式５によりフレームの平均画素強度Ａｖｅを算出する。 The update determination unit 212 determines whether the inter thresholds held in the parameter holding unit 224 need to be updated. The update determination unit 212 makes this determination based on the difference in average pixel intensity between frames. The intensity of the pixel at position (i, j) is denoted I(i, j). The update determination unit 212 calculates the average pixel intensity Ave of the frame by the following Expression 5.

$$\mathrm{Ave} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} I(i, j)$$ … (5)
更新判定部２１２は、現在のフレームの平均画素強度と一つ前のフレームの平均画素強度との差の絶対値｜Δｉｍｇ｜を算出する。更新判定部２１２は、算出された｜Δｉｍｇ｜が所定の閾値ｔｈ＿ｉｍｇを上回る場合、更新が必要であると判定し、そうでなければ更新は不要と判定する。ｔｈ＿ｉｍｇがゼロに設定される場合、フレームごとに更新が行われる。 The update determination unit 212 calculates the absolute value |Δimg| of the difference between the average pixel intensity of the current frame and that of the immediately preceding frame. When the calculated |Δimg| exceeds a predetermined threshold th_img, the update determination unit 212 determines that an update is necessary; otherwise it determines that no update is needed. If th_img is set to zero, the update is performed for every frame.

なお、上述のマスクやフレームの平均画素強度はＹ、Ｕ、Ｖのチャネルごとに算出され、閾値と比較される。特に、ΔｉｍｇはＹチャネルに係るΔｉｍｇ＿ＹとＵチャネルに係るΔｉｍｇ＿ＵとＶチャネルに係るΔｉｍｇ＿Ｖとからなる。 The above-described average pixel intensity of the mask or frame is calculated for each of the Y, U, and V channels, and is compared with a threshold. In particular, Δimg includes Δimg_Y for the Y channel, Δimg_U for the U channel, and Δimg_V for the V channel.

パラメータ更新部２１４は、更新判定部２１２において更新が必要であると判定された場合、インター閾値を、更新判定部２１２で得られた平均画素強度の差Δｉｍｇに応じて更新する。パラメータ更新部２１４は、パラメータ保持部２２４にアクセスし、保持されているインター閾値に差Δｉｍｇを加算する。特に、Ｙチャネルのインター閾値Ｙ＿ｔｈ＿ｏｐｔにはΔｉｍｇ＿Ｙが加算される。Ｕチャネル、Ｖチャネルについても同様である。 When the update determination unit 212 determines that the update is necessary, the parameter update unit 214 updates the inter threshold according to the average pixel intensity difference Δimg obtained by the update determination unit 212. The parameter updating unit 214 accesses the parameter holding unit 224, and adds the difference Δimg to the held inter threshold. In particular, Δimg_Y is added to the Y-channel inter threshold Y_th_opt. The same applies to the U channel and the V channel.
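
The update rule can be sketched as below. Several points are assumptions made for illustration: the function name `update_inter_thresholds`, the dictionary layout keyed by channel, the use of the signed difference (current minus previous) as the added value, the per-channel update decision, and a single `th_img` shared by all channels.

```python
def update_inter_thresholds(thresholds, prev_ave, cur_ave, th_img):
    """Add the per-channel intensity difference to each inter threshold.

    thresholds, prev_ave and cur_ave are dicts keyed by "Y", "U", "V".
    A channel is updated only when |delta_img| for that channel exceeds
    th_img (assumed here to be decided channel by channel).
    """
    updated = dict(thresholds)
    for channel in ("Y", "U", "V"):
        delta_img = cur_ave[channel] - prev_ave[channel]
        if abs(delta_img) > th_img:
            updated[channel] = thresholds[channel] + delta_img
    return updated


thresholds = {"Y": 20.0, "U": 10.0, "V": 10.0}   # hypothetical Y_th_opt, U_th_opt, V_th_opt
prev_ave = {"Y": 100.0, "U": 128.0, "V": 128.0}
cur_ave = {"Y": 104.0, "U": 128.5, "V": 128.0}

new_thresholds = update_inter_thresholds(thresholds, prev_ave, cur_ave, th_img=1.0)
```

In this example only the Y channel changes by more than `th_img`, so only the Y threshold moves (from 20.0 to 24.0).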

図４は、パラメータ保持部２２４の一例を示すデータ構造図である。パラメータ保持部２２４は、イントラ閾値であるｇｍｍ＿ｔｈ＿ｏｐｔと、インター閾値であるＹ＿ｔｈ＿ｏｐｔ、Ｕ＿ｔｈ＿ｏｐｔ、Ｖ＿ｔｈ＿ｏｐｔと、を対応付けて保持する。 FIG. 4 is a data structure diagram showing an example of the parameter holding unit 224. The parameter holding unit 224 holds gmm_th_opt, which is an intra threshold, and Y_th_opt, U_th_opt, V_th_opt, which are inter thresholds, in association with each other.
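
One possible in-memory shape for such a record is sketched below. The field names follow the identifiers in the description; the class name `ParameterStore`, the helper method, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParameterStore:
    """Holds the intra threshold together with the three inter thresholds."""
    gmm_th_opt: float   # optimal intra threshold
    Y_th_opt: float     # optimal inter threshold, Y channel
    U_th_opt: float     # optimal inter threshold, U channel
    V_th_opt: float     # optimal inter threshold, V channel

    def inter_thresholds(self):
        """Return the inter thresholds as a (Y, U, V) tuple."""
        return (self.Y_th_opt, self.U_th_opt, self.V_th_opt)


# Example values are placeholders, not values from the description.
store = ParameterStore(gmm_th_opt=16.0, Y_th_opt=20.0, U_th_opt=10.0, V_th_opt=10.0)
```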

図５（ａ）、（ｂ）は、関心領域設定画面４００の代表画面図である。関心領域設定画面４００は、フレーム表示領域４０２と、ＯＫボタン４０４と、キャンセルボタン４０６と、を有する。フレーム表示領域４０２は最初のフレームを表示する。ユーザは、図５（ａ）に示される関心領域設定画面４００を見ながらタッチパネルを操作し、関心領域を矩形４０８で指定する。図５（ｂ）はフレーム表示領域４０２に表示される最初のフレームに対して矩形４０８を描いた状態を示す。この状態でＯＫボタン４０４がタップされると、携帯端末１１４は矩形４０８で囲まれる領域を関心領域として取得する。 FIGS. 5(a) and 5(b) are representative screen diagrams of the region-of-interest setting screen 400. The region-of-interest setting screen 400 has a frame display area 402, an OK button 404, and a cancel button 406. The frame display area 402 displays the first frame. The user operates the touch panel while viewing the region-of-interest setting screen 400 shown in FIG. 5(a) and designates the region of interest with a rectangle 408. FIG. 5(b) shows a state in which the rectangle 408 has been drawn on the first frame displayed in the frame display area 402. When the OK button 404 is tapped in this state, the portable terminal 114 acquires the area enclosed by the rectangle 408 as the region of interest.

図６（ａ）、（ｂ）、（ｃ）は、重み設定画面５００の代表画面図である。重み設定画面５００は、重み設定領域５０２と、代表マスク表示領域５０４と、ＯＫボタン５０６と、を有する。重み設定領域５０２はスライダバーであり、ユーザによる重みの値の指定を受け付ける。代表マスク表示領域５０４には重み設定領域５０２でユーザが指定した重みの値を用いたときの合成マスクが表示される。ユーザは、重み設定領域５０２で重みの値を様々に変えながら代表マスク表示領域５０４に表示される合成マスクを確認し、自分の好みに合致する合成マスクを与える重みを選択する。ＯＫボタン５０６がタップされると、携帯端末１１４はそのとき重み設定領域５０２で設定されている値を重みαとして取得する。 FIGS. 6(a), 6(b), and 6(c) are representative screen diagrams of the weight setting screen 500. The weight setting screen 500 has a weight setting area 502, a representative mask display area 504, and an OK button 506. The weight setting area 502 is a slider bar that accepts a weight value specified by the user. The representative mask display area 504 displays the combined mask obtained with the weight value specified in the weight setting area 502. The user checks the combined mask displayed in the representative mask display area 504 while varying the weight value in the weight setting area 502, and selects the weight that yields a combined mask matching his or her preference. When the OK button 506 is tapped, the portable terminal 114 acquires the value set in the weight setting area 502 at that time as the weight α.

図７は、自由視点画像再生画面６００の代表画面図である。表示制御部２１６は、動画像配信部２４４から送信された合成動画像を受信し、自由視点画像再生画面６００をディスプレイ２１８に表示させる。自由視点画像再生画面６００は、自由視点画像表示領域６０２と、プログレッシブバー６０４と、視点指定領域６０６と、操作領域６０８と、ＲＯＩ指定ボタン６１０と、重み指定ボタン６１２と、を有する。自由視点画像表示領域６０２には、合成動画像が表示される。操作領域６０８は合成動画像の再生、一時停止、早送り等の操作を行うための領域である。視点指定領域６０６は、視点の上下方向の位置を指定するためのスライダバーである。ＲＯＩ指定ボタン６１０がタップされると、携帯端末１１４は画像処理装置２００にその旨を通知する。画像処理装置２００は、該通知を受けると、関心領域設定画面４００により関心領域の指定を受け付ける処理を開始する。重み指定ボタン６１２がタップされると、携帯端末１１４は画像処理装置２００にその旨を通知する。画像処理装置２００は、該通知を受けると、重み設定画面５００により重みαの指定を受け付ける処理を開始する。 FIG. 7 is a representative screen diagram of the free viewpoint image reproduction screen 600. The display control unit 216 receives the combined moving image transmitted from the moving image distribution unit 244, and causes the display 218 to display the free viewpoint image reproduction screen 600. The free viewpoint image reproduction screen 600 has a free viewpoint image display area 602, a progressive bar 604, a viewpoint specification area 606, an operation area 608, an ROI specification button 610, and a weight specification button 612. In the free viewpoint image display area 602, a synthesized moving image is displayed. An operation area 608 is an area for performing operations such as playing, pausing, and fast-forwarding the synthesized moving image. The viewpoint designation area 606 is a slider bar for designating the vertical position of the viewpoint. When the ROI designation button 610 is tapped, the portable terminal 114 notifies the image processing device 200 of the fact. Upon receiving the notification, the image processing apparatus 200 starts a process of accepting designation of a region of interest on the region of interest setting screen 400. When the weight designation button 612 is tapped, the portable terminal 114 notifies the image processing device 200 of the fact. Upon receiving the notification, the image processing apparatus 200 starts a process of accepting the designation of the weight α on the weight setting screen 500.

図８は、エラー画面８００の代表画面図である。再設定判定部２１０は、再設定が必要であると判定すると、エラー画面８００を生成し、その画面データをネットワーク１１２を介して携帯端末１１４に送信する。携帯端末１１４の表示制御部２１６は、受信した画面データに基づきエラー画面８００をディスプレイ２１８に表示させる。ＯＫボタン８０２がタップされると、携帯端末１１４は画像処理装置２００にその旨を通知する。画像処理装置２００は、該通知を受けると、関心領域設定画面４００により関心領域の指定を受け付ける処理を開始する。 FIG. 8 is a representative screen diagram of the error screen 800. When determining that the resetting is necessary, the resetting determination unit 210 generates an error screen 800 and transmits the screen data to the portable terminal 114 via the network 112. The display control unit 216 of the mobile terminal 114 displays an error screen 800 on the display 218 based on the received screen data. When the OK button 802 is tapped, the portable terminal 114 notifies the image processing device 200 of the fact. Upon receiving the notification, the image processing apparatus 200 starts a process of accepting designation of a region of interest on the region of interest setting screen 400.

図９は、パラメータ設定部２０４におけるパラメータ設定処理を説明するための説明図である。最初のフレーム９０２において関心領域９０４が指定される。これにGrabcut法を適用することで、基準マスク９０６が生成される。種々のインター閾値を用いたインター背景差分を最初のフレーム９０２に適用することで種々のインターテストマスクが得られる。これらのインターテストマスクのなかから、基準マスク９０６に最も合致する（すなわち、改良Ｆ値が最も大きい）インター最適マスク９０８が選択される。次に、種々のイントラ閾値を用いたイントラ背景差分を最初のフレーム９０２に適用して得られるイントラテストマスクと、インター最適マスク９０８と、が合成されて種々の合成テストマスクが得られる。これらの合成テストマスクのなかから、基準マスク９０６に最も合致する（すなわち、改良Ｆ値が最も大きい）合成最適マスク９１０が選択される。インター最適マスク９０８を与えるインター閾値および合成最適マスク９１０を与えるイントラ閾値が最適な閾値としてパラメータ保持部２２４に登録される。基準マスク９０６と合成最適マスク９１０とを比べると、基準マスク９０６では人物の右脚と左脚との間の領域は前景と判断されていたが、インター最適マスク９０８を合成した後の合成最適マスク９１０ではその領域も正しく背景と認識される。 FIG. 9 is an explanatory diagram of the parameter setting process in the parameter setting unit 204. A region of interest 904 is designated in the first frame 902. Applying the Grabcut method to it generates a reference mask 906. Applying inter background differences with various inter thresholds to the first frame 902 yields various inter test masks. From these inter test masks, the inter optimal mask 908 that best matches the reference mask 906 (that is, has the largest improved F value) is selected. Next, intra test masks, obtained by applying intra background differences with various intra thresholds to the first frame 902, are each combined with the inter optimal mask 908 to obtain various combined test masks. From these combined test masks, the combined optimal mask 910 that best matches the reference mask 906 (that is, has the largest improved F value) is selected. The inter thresholds that give the inter optimal mask 908 and the intra threshold that gives the combined optimal mask 910 are registered in the parameter holding unit 224 as the optimal thresholds. Comparing the reference mask 906 with the combined optimal mask 910, the region between the person's right and left legs is judged to be foreground in the reference mask 906, whereas in the combined optimal mask 910, obtained after combining with the inter optimal mask 908, that region is correctly recognized as background.

Grabcut法において、関心領域９０４の外側は背景部分として扱われる。したがって、基準マスク９０６の背景部分は、指定された関心領域９０４の外部を含む。インター最適マスク９０８の人物の像の右足と合成最適マスク９１０の人物の像の右足とを比べると、後者の方がより正確に足の輪郭を抽出していることが分かる。 In the Grabcut method, the outside of the region of interest 904 is treated as a background portion. Therefore, the background portion of the reference mask 906 includes the outside of the designated region of interest 904. Comparing the right foot of the image of the person in the inter-optimal mask 908 with the right foot of the image of the person in the composite optimal mask 910, it can be seen that the latter extracts the contour of the foot more accurately.

図１０（ａ）、（ｂ）、（ｃ）は、重みαの違いによる合成マスクの違いを説明するための説明図である。図１０（ａ）、（ｂ）、（ｃ）はそれぞれα＝０．３、１．０、１．７のときの合成マスクを示す。式１に関して上述した通り、αが異なると、決定される最適なインター閾値（Ｙ＿ｔｈ＿ｏｐｔ、Ｕ＿ｔｈ＿ｏｐｔ、Ｖ＿ｔｈ＿ｏｐｔ）や最適なイントラ閾値（ｇｍｍ＿ｔｈ＿ｏｐｔ）も異なり、したがって、結果として得られる合成マスクの外観も異なる。具体的には、重みαを大きくすると、輪郭はぼけるが背景でないのに背景とされてしまう部分を減らすことができる。重みαを小さくすると、輪郭はよりくっきりするが、背景でないのに背景とされてしまう部分が増える。ユーザは、この違いを理解しつつ、アプリケーションや好みにより重みαを設定すればよい。 FIGS. 10(a), 10(b), and 10(c) are explanatory diagrams showing how the combined mask differs depending on the weight α. FIGS. 10(a), 10(b), and 10(c) show the combined masks for α = 0.3, 1.0, and 1.7, respectively. As described above with respect to Expression 1, a different α leads to different optimal inter thresholds (Y_th_opt, U_th_opt, V_th_opt) and a different optimal intra threshold (gmm_th_opt), and therefore to a different appearance of the resulting combined mask. Specifically, increasing the weight α blurs the contour but reduces the portions that are classified as background even though they are not background. Decreasing the weight α makes the contour sharper but increases the portions that are classified as background even though they are not background. The user may set the weight α according to the application or preference, with this trade-off in mind.

図１１（ａ）、（ｂ）は、改良Ｆ値による評価を説明するためのグラフである。図１１（ａ）はテストマスク生成部２３２が選んだインター閾値（Ｙ＿ｔｈ、Ｕ＿ｔｈ、Ｖ＿ｔｈ）を特定する番号を横軸、改良Ｆ値算出部２３４により算出された改良Ｆ値を縦軸とするグラフである。改良Ｆ値の最大は符号１５０の箇所で得られ、対応するインター閾値が最適なインター閾値（Ｙ＿ｔｈ＿ｏｐｔ、Ｕ＿ｔｈ＿ｏｐｔ、Ｖ＿ｔｈ＿ｏｐｔ）として決定される。図１１（ｂ）はテストマスク生成部２３２が選んだイントラ閾値ｇｍｍ＿ｔｈの１／１０を横軸、改良Ｆ値算出部２３４により算出された改良Ｆ値を縦軸とするグラフである。改良Ｆ値の最大は符号１５２の箇所で得られ、対応するイントラ閾値が最適なイントラ閾値（ｇｍｍ＿ｔｈ＿ｏｐｔ）として決定される。 FIGS. 11(a) and 11(b) are graphs for explaining evaluation based on the improved F value. FIG. 11(a) is a graph whose horizontal axis is the number identifying the set of inter thresholds (Y_th, U_th, V_th) selected by the test mask generation unit 232 and whose vertical axis is the improved F value calculated by the improved F value calculation unit 234. The maximum improved F value is obtained at the point denoted by reference numeral 150, and the corresponding inter thresholds are determined as the optimal inter thresholds (Y_th_opt, U_th_opt, V_th_opt). FIG. 11(b) is a graph whose horizontal axis is 1/10 of the intra threshold gmm_th selected by the test mask generation unit 232 and whose vertical axis is the improved F value calculated by the improved F value calculation unit 234. The maximum improved F value is obtained at the point denoted by reference numeral 152, and the corresponding intra threshold is determined as the optimal intra threshold (gmm_th_opt).

図１２（ａ）〜（ｆ）は、フレームの平均画素強度の変動を示すグラフである。図１２（ａ）、（ｂ）、（ｃ）はそれぞれ、あるシーンをカメラで撮像して得られる６００フレーム分の動画像について、横軸をフレーム番号、縦軸をＹチャネル、Ｕチャネル、Ｖチャネルの平均画素強度としてプロットしたものである。図１２（ｄ）、（ｅ）、（ｆ）はそれぞれ、同じシーンを別のカメラで撮像して得られる６００フレーム分の動画像について、横軸をフレーム番号、縦軸をＹチャネル、Ｕチャネル、Ｖチャネルの平均画素強度としてプロットしたものである。これらのグラフから分かる通り、一般に、フレーム間の平均画素強度の変動はそれほど大きくない。したがって、フレーム間の平均画素強度の差は、閾値更新の要否の判定のための良い指標と言える。 FIGS. 12(a) to 12(f) are graphs showing the variation in the average pixel intensity of frames. FIGS. 12(a), 12(b), and 12(c) plot, for a 600-frame moving image obtained by capturing a scene with one camera, the frame number on the horizontal axis against the average pixel intensity of the Y channel, the U channel, and the V channel, respectively, on the vertical axis. FIGS. 12(d), 12(e), and 12(f) plot the same quantities for a 600-frame moving image obtained by capturing the same scene with another camera. As these graphs show, the average pixel intensity generally does not vary much between frames. Therefore, the difference in average pixel intensity between frames is a good index for determining whether the thresholds need to be updated.

以上の構成による画像処理装置２００の動作を説明する。 The operation of the image processing apparatus 200 configured as above will now be described.
図１３は、画像処理装置２００における一連の処理の流れを示すフローチャートである。画像処理装置２００は、動画像保持部２２２に保持される処理対象の動画像からひとつのフレームを取得する（Ｓ１２）。画像処理装置２００は、ステップＳ１２で取得されたフレームが最初のフレームであるか否かを判定する（Ｓ１４）。最初のフレームである場合（Ｓ１４のＹＥＳ）、画像処理装置２００はユーザによる入力を要する閾値設定処理（Ｓ１６）を実行する。画像処理装置２００は、ステップＳ１６により設定されたインター閾値、イントラ閾値を用いて、ステップＳ１２で取得された最初のフレームから合成マスクを生成する（Ｓ１８）。最初のフレームを扱っている場合は、画像処理装置２００は以下のステップＳ２０、ステップＳ２２をスキップする。画像処理装置２００は、２番目以降のフレームについて、ステップＳ１８で生成された合成マスクを用いてΔｍｓｋを算出する（Ｓ２０）。画像処理装置２００は、マスクエラーを評価するため、ステップＳ２０で算出されたΔｍｓｋとｔｈ＿ｍｓｋとを比較する（Ｓ２２）。Δｍｓｋ＞ｔｈ＿ｍｓｋの場合（Ｓ２２のＹＥＳ）、処理はステップＳ１６に戻る。Δｍｓｋ≦ｔｈ＿ｍｓｋの場合（Ｓ２２のＮＯ）、画像処理装置２００はステップＳ１８で生成された合成マスクを出力する（Ｓ２４）。合成マスクの出力の後、画像処理装置２００は扱っているフレームが処理対象の動画像の最後のフレームであるか否かを判定する（Ｓ２６）。最後のフレームである場合（Ｓ２６のＹＥＳ）、処理は終了する。最後のフレームでない場合（Ｓ２６のＮＯ）、処理はステップＳ１２に戻る。 FIG. 13 is a flowchart showing the flow of a series of processes in the image processing apparatus 200. The image processing apparatus 200 acquires one frame from the moving image to be processed, which is held in the moving image holding unit 222 (S12). The image processing apparatus 200 determines whether the frame acquired in step S12 is the first frame (S14). If it is the first frame (YES in S14), the image processing apparatus 200 executes a threshold setting process that requires user input (S16). Using the inter thresholds and the intra threshold set in step S16, the image processing apparatus 200 generates a combined mask from the first frame acquired in step S12 (S18). When handling the first frame, the image processing apparatus 200 skips the following steps S20 and S22. For the second and subsequent frames, the image processing apparatus 200 calculates Δmsk using the combined mask generated in step S18 (S20). To evaluate the mask error, the image processing apparatus 200 compares Δmsk calculated in step S20 with th_msk (S22). If Δmsk > th_msk (YES in S22), the process returns to step S16. If Δmsk ≦ th_msk (NO in S22), the image processing apparatus 200 outputs the combined mask generated in step S18 (S24). After outputting the combined mask, the image processing apparatus 200 determines whether the frame being handled is the last frame of the moving image to be processed (S26). If it is the last frame (YES in S26), the process ends. If it is not the last frame (NO in S26), the process returns to step S12.

ステップＳ１２で取得されたフレームが最初のフレームでない場合（Ｓ１４のＮＯ）、画像処理装置２００はΔｉｍｇを算出する（Ｓ２８）。画像処理装置２００は、閾値の更新の要否を判定するため、Δｉｍｇとｔｈ＿ｉｍｇとを比較する（Ｓ３０）。Δｉｍｇ＞ｔｈ＿ｉｍｇの場合（Ｓ３０のＹＥＳ）、画像処理装置２００はインター閾値を更新する（Ｓ３２）。ステップＳ３２の後、処理はステップＳ１８に進み、画像処理装置２００は更新されたインター閾値を用いて合成マスクを生成する。Δｉｍｇ≦ｔｈ＿ｉｍｇの場合（Ｓ３０のＮＯ）、インター閾値は更新されずに処理はステップＳ１８に進む。 If the frame acquired in step S12 is not the first frame (NO in S14), the image processing device 200 calculates Δimg (S28). The image processing device 200 compares Δimg with th_img in order to determine whether the threshold needs to be updated (S30). If Δimg> th_img (YES in S30), the image processing apparatus 200 updates the inter threshold (S32). After step S32, the process proceeds to step S18, and the image processing apparatus 200 generates a combined mask using the updated inter threshold. If Δimg ≦ th_img (NO in S30), the process proceeds to step S18 without updating the inter threshold.
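
The control flow of FIG. 13 can be sketched as a simple loop. This is a simplified illustration under several assumptions: images and masks are flat lists of single-channel intensities, the threshold-setting and mask-generation steps are passed in as callables (`set_thresholds`, `make_mask`), the parameters are a dict with a single `"inter"` entry, and a mask error triggers one re-run of the threshold setting rather than repeated checks. All names and values are hypothetical.

```python
def process_video(frames, set_thresholds, make_mask, mean_intensity, th_msk, th_img):
    """Simplified single-channel version of the per-frame loop (S12 to S32)."""
    params = None
    prev_frame_ave = None
    prev_mask_ave = None
    outputs = []
    for index, frame in enumerate(frames):                 # S12
        frame_ave = mean_intensity(frame)
        if index == 0:                                     # S14: first frame
            params = set_thresholds(frame)                 # S16
        else:
            delta_img = frame_ave - prev_frame_ave         # S28
            if abs(delta_img) > th_img:                    # S30
                params = dict(params, inter=params["inter"] + delta_img)  # S32
        mask = make_mask(frame, params)                    # S18
        mask_ave = mean_intensity(mask)
        if index > 0 and abs(mask_ave - prev_mask_ave) > th_msk:  # S20, S22
            params = set_thresholds(frame)                 # back to S16
            mask = make_mask(frame, params)                # S18 again
            mask_ave = mean_intensity(mask)
        outputs.append(mask)                               # S24
        prev_frame_ave, prev_mask_ave = frame_ave, mask_ave
    return outputs


# Stand-in components for the demonstration only.
calls = []

def demo_set_thresholds(frame):
    calls.append(frame)
    return {"inter": 10.0}

def demo_make_mask(frame, params):
    return [255 if pixel > params["inter"] else 0 for pixel in frame]

def demo_mean(values):
    return sum(values) / len(values)

frames = [[0, 20, 30, 0], [0, 20, 30, 0], [0, 22, 32, 0]]
outputs = process_video(frames, demo_set_thresholds, demo_make_mask, demo_mean,
                        th_msk=50.0, th_img=0.5)
```

In this run the third frame is slightly brighter, so the inter threshold is raised by the intensity difference, the mask stays stable, and the threshold setting is executed only once.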

図１４は、図１３の閾値設定処理ステップＳ１６における処理の流れを示すフローチャートである。画像処理装置２００は、ユーザが利用する携帯端末１１４を介して、関心領域の指定を受け付ける（Ｓ３６）。画像処理装置２００は、指定された関心領域と最初のフレームとに基づくGrabcut法により基準マスクを生成する（Ｓ３８）。画像処理装置２００は、テスト用のインター閾値を選択する（Ｓ４０）。画像処理装置２００は、ステップＳ４０で選択されたインター閾値を用いて最初のフレームからインターテストマスクを生成する（Ｓ４２）。画像処理装置２００は、ステップＳ３８で生成された基準マスクとステップＳ４２で生成されたインターテストマスクとから改良Ｆ値を算出する（Ｓ４４）。選択されていないインター閾値がある場合（Ｓ４６のＮＯ）、処理はステップＳ４０に戻る。全てのインター閾値が選択された場合（Ｓ４６のＹＥＳ）、画像処理装置２００は最大の改良Ｆ値を与えるインター閾値を最適インター閾値として決定する（Ｓ４８）。画像処理装置２００は、テスト用のイントラ閾値を選択する（Ｓ５０）。画像処理装置２００は、ステップＳ５０で選択されたイントラ閾値を用いて最初のフレームからイントラテストマスクを生成する（Ｓ５２）。画像処理装置２００は、ステップＳ５２で生成されたイントラテストマスクとステップＳ４８で決定された最適インター閾値に対応するインター最適マスクとを合成し、合成テストマスクを生成する（Ｓ５４）。画像処理装置２００は、ステップＳ３８で生成された基準マスクとステップＳ５４で生成された合成テストマスクとから改良Ｆ値を算出する（Ｓ５６）。選択されていないイントラ閾値がある場合（Ｓ５８のＮＯ）、処理はステップＳ５０に戻る。全てのイントラ閾値が選択された場合（Ｓ５８のＹＥＳ）、画像処理装置２００は最大の改良Ｆ値を与えるイントラ閾値を最適イントラ閾値として決定する（Ｓ６０）。 FIG. 14 is a flowchart showing the flow of the threshold setting process in step S16 of FIG. 13. The image processing apparatus 200 accepts the designation of a region of interest via the portable terminal 114 used by the user (S36). The image processing apparatus 200 generates a reference mask by the Grabcut method based on the designated region of interest and the first frame (S38). The image processing apparatus 200 selects an inter threshold for testing (S40). The image processing apparatus 200 generates an inter test mask from the first frame using the inter threshold selected in step S40 (S42). The image processing apparatus 200 calculates an improved F value from the reference mask generated in step S38 and the inter test mask generated in step S42 (S44). If any inter threshold has not yet been selected (NO in S46), the process returns to step S40. When all the inter thresholds have been selected (YES in S46), the image processing apparatus 200 determines the inter threshold that gives the maximum improved F value as the optimal inter threshold (S48). The image processing apparatus 200 selects an intra threshold for testing (S50). The image processing apparatus 200 generates an intra test mask from the first frame using the intra threshold selected in step S50 (S52). The image processing apparatus 200 combines the intra test mask generated in step S52 with the inter optimal mask corresponding to the optimal inter threshold determined in step S48 to generate a combined test mask (S54). The image processing apparatus 200 calculates an improved F value from the reference mask generated in step S38 and the combined test mask generated in step S54 (S56). If any intra threshold has not yet been selected (NO in S58), the process returns to step S50. When all the intra thresholds have been selected (YES in S58), the image processing apparatus 200 determines the intra threshold that gives the maximum improved F value as the optimal intra threshold (S60).
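
The two-stage search of FIG. 14 can be sketched as follows. The scoring function is passed in as a parameter because the improved F value is defined by Expression 1 elsewhere in the description; the demonstration below substitutes a plain pixel-agreement ratio, which is explicitly not the improved F value. The function names, the flat-list mask representation, the scalar thresholds, and the stand-in mask generators are all assumptions made for illustration.

```python
def select_thresholds(frame, reference_mask, inter_candidates, intra_candidates,
                      inter_mask_fn, intra_mask_fn, score_fn):
    """Pick the inter threshold first, then the intra threshold (S40 to S60)."""
    # S40-S48: the inter threshold whose test mask scores best against the reference.
    best_inter = max(
        inter_candidates,
        key=lambda th: score_fn(inter_mask_fn(frame, th), reference_mask))
    inter_optimal_mask = inter_mask_fn(frame, best_inter)

    # S50-S60: the intra threshold whose mask, ANDed with the inter optimal
    # mask, scores best against the reference.
    def combined_score(th):
        intra_mask = intra_mask_fn(frame, th)
        combined = [a and b for a, b in zip(intra_mask, inter_optimal_mask)]
        return score_fn(combined, reference_mask)

    best_intra = max(intra_candidates, key=combined_score)
    return best_inter, best_intra


# Stand-ins for the demonstration only.
def threshold_mask(frame, th):
    return [1 if pixel > th else 0 for pixel in frame]

def agreement(mask, reference):
    # Placeholder score: fraction of matching pixels (NOT the improved F value).
    return sum(1 for m, r in zip(mask, reference) if m == r) / len(reference)

frame = [5, 40, 60, 90]
reference_mask = [0, 0, 1, 1]
best_inter, best_intra = select_thresholds(
    frame, reference_mask,
    inter_candidates=[10, 50, 80],
    intra_candidates=[45, 70, 95],
    inter_mask_fn=threshold_mask,
    intra_mask_fn=threshold_mask,
    score_fn=agreement)
```

Searching the inter thresholds first and then the intra threshold keeps the number of evaluated masks at the sum, rather than the product, of the two candidate counts.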

図１５は、図１３の合成マスク生成処理ステップＳ１８における処理の流れを示すチャートである。画像処理装置２００は、処理対象のフレームを取得する（Ｓ６２）。画像処理装置２００は、ステップＳ６２で取得されたフレームに対するイントラ背景差分とインターとを並列に実行する。イントラ背景差分では、画像処理装置２００は、背景フレームを取得し（Ｓ６４）、パラメータ保持部２２４からイントラ閾値を読み出す（Ｓ６６）。画像処理装置２００は、ステップＳ６６で読み出されたイントラ閾値およびステップＳ６４で取得された背景フレームを用いて、ステップＳ６２で取得されたフレームに対してイントラ背景差分を行い、イントラマスクを生成する（Ｓ６８）。インター背景差分では、画像処理装置２００は、パラメータ保持部２２４からインター閾値を読み出す（Ｓ７０）。画像処理装置２００は、ステップＳ７０で読み出されたインター閾値を用いて、ステップＳ６２で取得されたフレームに対してインター背景差分を行い、インターマスクを生成する（Ｓ７２）。画像処理装置２００は、ステップＳ６８で生成されたイントラマスクとステップＳ７２で生成されたインターマスクとを合成し、合成マスクを生成する（Ｓ７４）。 FIG. 15 is a chart showing the flow of the combined mask generation process in step S18 of FIG. 13. The image processing apparatus 200 acquires the frame to be processed (S62). The image processing apparatus 200 executes the intra background difference and the inter background difference on the frame acquired in step S62 in parallel. For the intra background difference, the image processing apparatus 200 acquires a background frame (S64) and reads the intra threshold from the parameter holding unit 224 (S66). Using the intra threshold read in step S66 and the background frame acquired in step S64, the image processing apparatus 200 performs the intra background difference on the frame acquired in step S62 and generates an intra mask (S68). For the inter background difference, the image processing apparatus 200 reads the inter thresholds from the parameter holding unit 224 (S70). Using the inter thresholds read in step S70, the image processing apparatus 200 performs the inter background difference on the frame acquired in step S62 and generates an inter mask (S72). The image processing apparatus 200 combines the intra mask generated in step S68 with the inter mask generated in step S72 to generate a combined mask (S74).

上述の実施の形態において、保持部の例は、ハードディスクや半導体メモリである。また、本明細書の記載に基づき、各部を、図示しないＣＰＵや、インストールされたアプリケーションプログラムのモジュールや、システムプログラムのモジュールや、ハードディスクから読み出したデータの内容を一時的に記憶する半導体メモリなどにより実現できることは本明細書に触れた当業者には理解される。 In the above embodiment, examples of the holding units are a hard disk and a semiconductor memory. Based on the description in this specification, a person skilled in the art who has read this specification will understand that each unit can be realized by a CPU (not shown), a module of an installed application program, a module of a system program, a semiconductor memory that temporarily stores data read from a hard disk, or the like.

本実施の形態に係る画像処理装置２００によると、基本的にはユーザは最初に関心領域を指定するだけで、より正確に抽出されたオブジェクトに基づく自由視点映像を楽しむことができる。関心領域の指定後は、イントラ背景差分およびインター背景差分に必要なパラメータは自動的に設定され、自動的に更新される。したがって、ユーザ利便性が向上する。 According to the image processing apparatus 200 according to the present embodiment, basically, the user can enjoy a free viewpoint video based on a more accurately extracted object simply by first specifying a region of interest. After designating the region of interest, parameters necessary for the intra background difference and the inter background difference are automatically set and updated automatically. Therefore, user convenience is improved.

例えば、背景差分のパラメータの意義や背景差分そのものについての知見を有さないユーザにパラメータの設定を求めても、パラメータが適切に設定される蓋然性は低いし、ユーザも当惑するであろう。これに対して本実施の形態に係る画像処理装置２００では、そのようなパラメータの設定、更新は自動的に行われるので、ユーザは背景差分についての知見を有さなくても自由視点映像を楽しむことができる。 For example, if a user who has no knowledge of the meaning of the background difference parameters, or of background difference itself, is asked to set the parameters, the parameters are unlikely to be set appropriately and the user is likely to be confused. In contrast, in the image processing apparatus 200 according to the present embodiment, such parameters are set and updated automatically, so the user can enjoy free viewpoint video without any knowledge of background difference.

また、本実施の形態に係る画像処理装置２００では、イントラ背景差分の結果とインター背景差分の結果とを合成することで合成マスクを生成する。したがって、イントラ背景差分の結果、インター背景差分の結果のそれぞれに求められる正確さのレベルを下げ、代わりに処理速度を高めることができる。その結果、リアルタイムにも適用可能な程度に高速でありながら、より正確にオブジェクトを抽出できる背景差分を提供できる。 The image processing apparatus 200 according to the present embodiment generates a combined mask by combining the result of the intra background difference and the result of the inter background difference. Therefore, the level of accuracy required for each of the intra background difference result and the inter background difference result can be reduced, and the processing speed can be increased instead. As a result, it is possible to provide a background difference that can extract an object more accurately while being fast enough to be applicable in real time.

また、本実施の形態に係る画像処理装置２００では、マスクのエラーが自動的に検出される。したがって、ユーザが自らマスクのエラーを見つける必要がなくなるので、ユーザ利便性が向上する。 Further, in image processing apparatus 200 according to the present embodiment, a mask error is automatically detected. Therefore, the user does not need to find the mask error by himself, and the user convenience is improved.

また、本実施の形態に係る画像処理装置２００では、改良Ｆ値を用いてテストマスクの良し悪しが評価される。この改良Ｆ値の重みαを変えることで、ユーザは抽出結果を自分の好みに合わせることができる。したがって、ユーザの好みを反映できる柔軟性の高い評価手法が実現される。例えば、ユーザがより正確な輪郭の抽出を望む場合、αを１より小さく設定すればよい。また、ユーザがオブジェクトの抽出漏れの低減を望む場合はαを１より大きく設定すればよい。 In the image processing apparatus 200 according to the present embodiment, the quality of the test mask is evaluated using the improved F value. By changing the weight α of the improved F value, the user can adjust the extraction result to his / her preference. Therefore, a highly flexible evaluation method that can reflect user preferences is realized. For example, if the user wants to extract a more accurate contour, α may be set to be smaller than 1. If the user wants to reduce omission of object extraction, α may be set to be larger than 1.

また、本実施の形態に係る画像処理装置２００では、フレームごとにインター閾値の更新の要否が判定され、必要と判定された場合はフレーム間の平均画素強度の差に基づき自動的に閾値が更新される。したがって、オブジェクトの抽出の堅牢性（Robustness）が向上する。 Further, in the image processing apparatus 200 according to the present embodiment, whether the inter thresholds need to be updated is determined for each frame, and when an update is determined to be necessary, the thresholds are updated automatically based on the difference in average pixel intensity between frames. Therefore, the robustness of object extraction is improved.

以上、実施の形態に係る画像処理装置２００の構成と動作について説明した。この実施の形態は例示であり、各構成要素や各処理の組み合わせにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解される。 The configuration and operation of the image processing device 200 according to the embodiment have been described above. This embodiment is an exemplification, and it is understood by those skilled in the art that various modifications can be made to the combination of each component and each processing, and such modifications are also within the scope of the present invention.

実施の形態では、自由視点映像の配信を例として説明したが、これに限られず、監視カメラから得られる動画像の解析などの一般的な動画像の解析に本実施の形態に係る技術的思想を適用してもよい。 In the embodiment, the distribution of the free viewpoint video has been described as an example, but the present invention is not limited to this, and the technical idea according to the present embodiment is applied to analysis of general moving images such as analysis of a moving image obtained from a surveillance camera. May be applied.

Although the embodiment describes the case where the need to update the inter threshold is determined for each frame, the determination is not limited to this; it may be made once every arbitrary number of frames or at random timing. Moreover, the need to update at least one of the intra threshold and the inter threshold may be determined by any criterion, not only Δimg.

Although the first frame is adopted as the reference frame in the embodiment, this is not limiting, and any frame of the moving image to be processed may be adopted as the reference frame.

Although the embodiment describes the case where the image processing apparatus 200 generates the composite mask by combining the result of the intra background difference with the result of the inter background difference, this is not limiting; for example, the mobile terminal 114 may perform the combining process. In this way, the mobile terminal 114 may implement some or all of the functions of the image processing apparatus 200.
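The combining step is light enough to run on either device. A minimal sketch follows, assuming the rule stated in the claims that a pixel belongs to the background of the composite mask when it is background in at least one of the two masks, which is equivalent to a pixelwise AND of the foregrounds. The function name `combine_masks` and the nested-list layout (1 = foreground, 0 = background) are hypothetical.

```python
def combine_masks(intra_mask, inter_mask):
    """Composite mask: foreground only where both input masks are foreground.

    Equivalently, a pixel is background if it is background in at least
    one of the two masks. Masks are same-sized nested lists of 0/1.
    """
    return [
        [1 if a and b else 0 for a, b in zip(intra_row, inter_row)]
        for intra_row, inter_row in zip(intra_mask, inter_mask)
    ]


# Hypothetical 2x3 masks from the intra and inter background differences.
intra = [[1, 1, 0],
         [0, 1, 1]]
inter = [[1, 0, 0],
         [1, 1, 1]]
composite = combine_masks(intra, inter)
```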

Although the embodiment describes the case where an intra test mask is generated and combined with the inter optimal mask to generate a composite test mask, this is not limiting. For example, the improved F value may be calculated for the intra test mask itself, and the intra threshold that gives the largest improved F value may be determined as the optimal intra threshold. In this case, the inter threshold and the intra threshold are each optimized individually against the reference mask.

The following is an example of program code for determining the optimal inter thresholds (Y_th_opt, U_th_opt, V_th_opt). The optimal intra threshold (gmm_th_opt) may be determined by similar program code.
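The sketch below is not the listing of the specification; it is an illustrative exhaustive search written under stated assumptions. It assumes that the inter background difference marks a pixel as foreground when any of its Y, U, or V values differs from the background by more than the corresponding threshold, and that the improved F value has the F-beta form with weight α. The helper names `inter_mask`, `weighted_f`, and `search_inter_thresholds`, the candidate list, and the flat list-of-tuples frame layout are all hypothetical.

```python
from itertools import product


def inter_mask(frame, background, y_th, u_th, v_th):
    """Assumed rule: foreground if any YUV channel exceeds its threshold."""
    return [
        int(abs(y - yb) > y_th or abs(u - ub) > u_th or abs(v - vb) > v_th)
        for (y, u, v), (yb, ub, vb) in zip(frame, background)
    ]


def weighted_f(test, reference, alpha):
    """Weighted harmonic mean of precision and recall (assumed F-beta form)."""
    tp = sum(1 for t, r in zip(test, reference) if t and r)
    fp = sum(1 for t, r in zip(test, reference) if t and not r)
    fn = sum(1 for t, r in zip(test, reference) if r and not t)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return (1 + alpha ** 2) * precision * recall / (alpha ** 2 * precision + recall)


def search_inter_thresholds(frame, background, reference_mask, candidates, alpha=1.0):
    """Try every (Y, U, V) threshold triple and keep the first best-scoring one."""
    best, best_score = None, -1.0
    for y_th, u_th, v_th in product(candidates, repeat=3):
        test = inter_mask(frame, background, y_th, u_th, v_th)
        score = weighted_f(test, reference_mask, alpha)
        if score > best_score:
            best, best_score = (y_th, u_th, v_th), score
    return best, best_score


# Hypothetical 4-pixel frame: pixel 1 is noise, pixels 2 and 3 are the object.
background = [(100, 128, 128)] * 4
frame = [(100, 128, 128), (104, 128, 128), (160, 128, 128), (100, 128, 180)]
reference_mask = [0, 0, 1, 1]
best, score = search_inter_thresholds(frame, background, reference_mask, (2, 10, 60))
```

The returned triple plays the role of (Y_th_opt, U_th_opt, V_th_opt). An exhaustive search costs the cube of the number of candidates, so a real implementation would likely restrict the candidate range or search one channel at a time.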

110: free viewpoint image distribution system; 112: network; 114: mobile terminal; 200: image processing apparatus.

Claims

An image processing apparatus comprising:
means for acquiring a first mask obtained by performing, on a target frame of a moving image, a first background difference within the target frame;
means for acquiring a second mask obtained by performing, on the target frame, a second background difference between frames;
means for generating a composite mask by combining the first mask and the second mask;
means for generating a reference mask by performing a third background difference different from both the first background difference and the second background difference; and
means for setting, based on the generated reference mask, at least one of a parameter used in the first background difference and a parameter used in the second background difference.

The image processing apparatus according to claim 1, wherein a background portion of the composite mask is a background portion of at least one of the first mask and the second mask.

The image processing apparatus according to claim 1, further comprising means for extracting an object from the target frame using the generated composite mask.

The image processing apparatus according to claim 1, further comprising means for receiving, from a user, designation of a region of interest in a reference frame of the moving image,
wherein a background portion of the reference mask includes an area outside the designated region of interest.

The image processing apparatus according to claim 1, wherein the setting means sets the parameter such that a weighted harmonic mean of precision and recall, calculated with the reference mask taken as the correct answer, becomes large.

The image processing apparatus according to claim 5, further comprising means for receiving, from a user, designation of a weight in the weighted harmonic mean.

The image processing apparatus according to any one of claims 1 to 6, further comprising means for determining whether it is necessary to generate the reference mask and to reset the parameter using the generated reference mask.

The image processing apparatus according to any one of claims 1 to 7, further comprising means for determining whether at least one of a parameter used in the first background difference and a parameter used in the second background difference needs to be updated.

The image processing apparatus according to claim 8, wherein the determining means determines whether the update is necessary based on a difference in average pixel intensity between frames,
and the image processing apparatus further comprises means for updating, when the update is determined to be necessary, at least one of a parameter used in the first background difference and a parameter used in the second background difference in accordance with the difference.

An image processing method comprising:
acquiring a first mask obtained by performing, on a target frame of a moving image, a first background difference within the target frame;
acquiring a second mask obtained by performing, on the target frame, a second background difference between frames;
generating a composite mask by combining the first mask and the second mask;
generating a reference mask by performing a third background difference different from both the first background difference and the second background difference; and
setting, based on the generated reference mask, at least one of a parameter used in the first background difference and a parameter used in the second background difference.

A computer program for causing a computer to implement:
a function of acquiring a first mask obtained by performing, on a target frame of a moving image, a first background difference within the target frame;
a function of acquiring a second mask obtained by performing, on the target frame, a second background difference between frames;
a function of generating a composite mask by combining the first mask and the second mask;
a function of generating a reference mask by performing a third background difference different from both the first background difference and the second background difference; and
a function of setting, based on the generated reference mask, at least one of a parameter used in the first background difference and a parameter used in the second background difference.