WO2022185403A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program Download PDF

Info

Publication number
WO2022185403A1
Authority
WO
WIPO (PCT)
Prior art keywords
foreground
subject
pixel
input image
region
Prior art date
Application number
PCT/JP2021/007878
Other languages
French (fr)
Japanese (ja)
Inventor
翔大 山田
秀信 長田
弘員 柿沼
浩太 日高
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/007878 priority Critical patent/WO2022185403A1/en
Publication of WO2022185403A1 publication Critical patent/WO2022185403A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • The present invention relates to an image processing device, an image processing method, and a program.
  • A subject extraction technique is known that generates a foreground image by extracting only the subject from an image captured by a camera.
  • In such subject extraction, a TRIMAP that classifies the image into a foreground region, a background region, and an unknown region is generated by background subtraction or machine learning, and each pixel in the unknown region is classified as foreground or background based on whether surrounding pixels with similar pixel values are classified as foreground or background. However, when the background involves complex color changes or the subject has the same color as the background, the boundary of the subject cannot be extracted correctly.
  • The present invention has been made in view of the above, and an object of the present invention is to extract a subject more accurately even when the colors of the subject and the background are similar.
  • An image processing device according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes: an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; a classification unit that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and an output unit that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  • An image processing method according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes, by a computer: arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; classifying the input image into a foreground region, a background region, and an unknown region; for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and extracting the group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
  • According to the present invention, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus of this embodiment.
  • FIG. 2 is a diagram showing an example of how subject models are arranged.
  • FIG. 3 is a diagram showing an example of TRIMAP.
  • FIG. 4 is a diagram showing an example of projecting a subject model onto TRIMAP.
  • FIG. 5 is a flowchart showing an example of the flow of processing by the image processing apparatus.
  • FIG. 6 is a flowchart showing an example of the flow of processing for clustering pixels in an unknown region.
  • FIG. 7 is a diagram illustrating an example of a hardware configuration of an image processing apparatus.
  • FIG. 1 is a functional block diagram showing an example of the configuration of an image processing apparatus 1 of this embodiment.
  • The image processing device 1 includes an input unit 11, an arrangement unit 12, a classification unit 13, a boundary determination unit 14, and an output unit 15.
  • The image processing apparatus 1 receives an image captured by a camera, separates the subject from the background, and outputs a foreground image in which only the subject is extracted.
  • The input unit 11 inputs in advance a background image showing only the background.
  • The input unit 11 may also input in advance a lookup table (LUT) used for subject separation.
  • The LUT is a table that holds the probability of being foreground for a value obtained by combining a pixel value of the background image and the corresponding pixel value of the input image.
  • The background image and the LUT are sent to the classification unit 13.
  • The input unit 11 inputs video captured by a camera frame by frame and transmits each input frame (hereinafter referred to as an input image) to the arrangement unit 12 and the classification unit 13.
  • The arrangement unit 12 detects the subject from the input image and arranges a three-dimensional model in the shape of the subject (hereinafter referred to as a subject model) so that it matches the subject in the input image. More specifically, the arrangement unit 12 aligns the position and size of the subject model and arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image. In other words, the arrangement unit 12 arranges the subject model in a virtual space so that the subject model is drawn superimposed on the subject in the input image when the subject model is perspective-transformed.
  • The subject model placed in the virtual space can be photographed by a virtual camera and rendered onto the input image.
  • For example, the arrangement unit 12 can superimpose the subject model on the input image using augmented reality (AR) technology and arrange the subject model so that it overlaps the position of the subject in the input image.
  • When the pose of the subject changes, the arrangement unit 12 estimates the pose of the subject and deforms the pose of the subject model to match the pose of the subject. For example, when the subject is a human, the arrangement unit 12 estimates skeleton data connecting the joint points of the subject from the input image and applies the skeleton data to the subject model to match the posture.
  • FIG. 2 shows an example of how the subject model is arranged.
  • A subject model representing the subject is created in advance, and the arrangement unit 12 holds the subject model.
  • The arrangement unit 12 deforms the posture of the subject model to match the posture of the subject in the input image, and changes the size of the subject model so that the subject model projected onto a two-dimensional plane has the same size as the subject in the input image.
  • When the subject model is projected onto the two-dimensional plane, the subject model is projected onto the position of the subject in the input image, as shown in FIG. 2.
  • The arrangement unit 12 may transmit information on the position and orientation of the arranged subject model to the boundary determination unit 14, or may transmit information identifying the pixels of the input image onto which the subject model is projected (drawn) to the boundary determination unit 14. The boundary determination unit 14 only needs to be able to identify, among the pixels of the input image, the pixels onto which the subject model is projected.
  • The classification unit 13 classifies the input image into a foreground region in which the subject appears, a background region in which the subject does not appear, and an unknown region for which it cannot be determined whether it is foreground or background. Specifically, the classification unit 13 generates a TRIMAP by thresholding the difference between the background image and the input image.
  • The TRIMAP is a region map in which each pixel of the input image is classified as foreground, background, or unknown. Pixels in the foreground region are given a foreground label. Pixels in the background region are given a background label. Pixels in the unknown region are given an unknown label; alternatively, pixels in the unknown region may be left unlabeled.
  • FIG. 3 shows an example of TRIMAP. In the example of FIG. 3, the foreground area is black, the background area is white, and the unknown area is hatched. An unknown region exists between the foreground region and the background region.
  • When generating a TRIMAP using the LUT, the classification unit 13 refers to the LUT and converts the value obtained by combining the pixel values of the pixels at the same coordinates in the background image and the input image into a probability of being foreground. The classification unit 13 assigns a foreground label, a background label, or an unknown label to each pixel based on the foreground probability obtained from the LUT.
  • Note that the classification unit 13 may use any method as long as it can classify the input image into the foreground region, the background region, and the unknown region.
  • For each pixel in the unknown region, the boundary determination unit 14 uses a color model to determine a score indicating whether the pixel is foreground or background. For example, the boundary determination unit 14 selects pixels having a color similar to that of the target pixel from around the target pixel and determines the score of the target pixel based on the labels given to the selected pixels.
  • Then, for each pixel, the boundary determination unit 14 weights the score using information as to whether the target pixel belongs to the projection range of the subject model, and assigns a foreground label or a background label to the pixel based on the weighted score.
  • When the subject model is projected onto the input image, pixels within the range onto which the subject model is projected have a high probability of being foreground, and pixels outside that range have a low probability of being foreground.
  • The boundary determination unit 14 therefore weights pixels included in the projected portion of the subject model so that they tend to be determined as foreground, and weights pixels not included in the projected portion so that they tend to be determined as background. Since the boundary of the subject model may not completely match the boundary of the subject in the input image, the boundary determination unit 14 may increase the degree of weighting for pixels located farther from the boundary of the projected subject model.
  • FIG. 4 shows an example of projecting the subject model onto a TRIMAP.
  • The boundary of the subject model falls within the unknown region of the TRIMAP.
  • In FIG. 4, a portion covered by the subject model has a high probability of being foreground, and a portion protruding from the subject model has a low probability of being foreground.
  • By weighting the score determined using the color model with the projection information of the subject model, the boundary determination unit 14 can classify pixels that are difficult to judge from color alone into foreground or background.
  • The output unit 15 extracts the group of pixels given the foreground label from the input image to generate and output a foreground image. For example, the output unit 15 generates a mask image in which pixels given the foreground label are white and pixels given the background label are black, and generates the foreground image by combining the input image with the mask image.
  • In step S1, the input unit 11 inputs an image from the camera.
  • In step S2, the arrangement unit 12 arranges the subject model so that the subject model overlaps the subject in the input image.
  • In step S3, the classification unit 13 generates a TRIMAP by classifying the input image into a foreground region, a background region, and an unknown region.
  • The processing of step S2 and the processing of step S3 may be performed in parallel, or the processing of step S3 may be performed before the processing of step S2.
  • In step S4, the boundary determination unit 14 clusters each pixel classified into the unknown region by the TRIMAP into the foreground or the background to determine the boundary of the subject. Details of the clustering process will be described later. Through the clustering process, each pixel of the input image is given a foreground label or a background label.
  • In step S5, the output unit 15 extracts the group of pixels given the foreground label from the input image, generates a foreground image, and outputs the generated foreground image.
  • In step S41, the boundary determination unit 14 determines the score of the target pixel based on the color of the target pixel.
  • In step S42, the boundary determination unit 14 determines a weight using information as to whether the target pixel belongs to the projection range of the subject model.
  • In step S43, the boundary determination unit 14 weights and evaluates the score of the target pixel, and determines the label of the target pixel.
  • In step S44, the boundary determination unit 14 assigns a foreground label or a background label to the target pixel based on the determination result of step S43.
  • The boundary determination unit 14 performs the above processing on all pixels in the unknown region.
  • As described above, the image processing apparatus 1 of the present embodiment includes: the arrangement unit 12 that arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image; the classification unit 13 that classifies the input image into a foreground region, a background region, and an unknown region; the boundary determination unit 14 that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the subject model, and classifies the pixel into the foreground region or the background region; and the output unit 15 that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted. As a result, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • A general-purpose computer system including, for example, a central processing unit (CPU) 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906 as shown in FIG. 7 can be used as the image processing apparatus 1 described above.
  • The image processing apparatus 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
  • This program can be recorded on a computer-readable recording medium such as a magnetic disk, optical disk, or semiconductor memory, or distributed via a network.

Abstract

An image processing device 1 that comprises: an arrangement unit 12 that arranges a subject model such that the subject model overlaps a subject in an input image when the subject model is projected onto the input image; a classification unit 13 that divides the input image into a foreground region, a background region, and an unknown region; a boundary determination unit 14 that, for each of the pixels in the unknown region of the input image, calculates a score that indicates whether the pixel is foreground or background on the basis of a pixel value for the pixel, weights the score on the basis of information about whether the pixel is in the projection area of the subject model, and classifies the pixel as being in the foreground region or the background region; and an output unit 15 that extracts a pixel group for the foreground region from the input image and outputs a foreground image in which only the subject has been extracted.

Description

Image processing device, image processing method, and program
The present invention relates to an image processing device, an image processing method, and a program.
A subject extraction technique is known that generates a foreground image by extracting only the subject from an image captured by a camera. In this technique, a TRIMAP that classifies the image into a foreground region, a background region, and an unknown region is generated by background subtraction or machine learning, and each pixel in the unknown region is classified as foreground or background based on whether surrounding pixels with similar pixel values are classified as foreground or background.
However, conventional subject extraction techniques have a problem in that the boundary of the subject cannot be extracted correctly when the background involves complex color changes or when the color of the subject is the same as that of the background.
The present invention has been made in view of the above, and an object of the present invention is to extract a subject more accurately even when the colors of the subject and the background are similar.
An image processing device according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes: an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; a classification unit that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and an output unit that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
An image processing method according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes, by a computer: arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; classifying the input image into a foreground region, a background region, and an unknown region; for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and extracting the group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
According to the present invention, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus of this embodiment.
FIG. 2 is a diagram showing an example of how the subject model is arranged.
FIG. 3 is a diagram showing an example of a TRIMAP.
FIG. 4 is a diagram showing an example of projecting the subject model onto a TRIMAP.
FIG. 5 is a flowchart showing an example of the flow of processing by the image processing apparatus.
FIG. 6 is a flowchart showing an example of the flow of processing for clustering pixels in the unknown region.
FIG. 7 is a diagram showing an example of the hardware configuration of the image processing apparatus.
An embodiment of the present invention will be described below with reference to the drawings. It should be noted that the embodiment described below represents a comprehensive or specific example.
[Configuration of image processing device]
FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus 1 of this embodiment. The image processing apparatus 1 includes an input unit 11, an arrangement unit 12, a classification unit 13, a boundary determination unit 14, and an output unit 15. The image processing apparatus 1 receives an image captured by a camera, separates the subject from the background, and outputs a foreground image in which only the subject is extracted.
The input unit 11 inputs in advance a background image showing only the background. The input unit 11 may also input in advance a lookup table (LUT) used for subject separation. The LUT is a table that holds the probability of being foreground for a value obtained by combining a pixel value of the background image and the corresponding pixel value of the input image. The background image and the LUT are sent to the classification unit 13.
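To make the LUT concrete, the following is a minimal sketch and not the implementation described in this publication; the bit-packing scheme, the table size, and the helper names combined_index and lut_foreground_probability are assumptions introduced only for illustration.

```python
import numpy as np

def combined_index(background_px, input_px, bits=4):
    """Pack a background pixel and an input pixel (both 3-channel uint8) into
    one integer index by keeping the top `bits` bits of each channel.
    The packing scheme is an assumption made for this sketch."""
    shift = 8 - bits
    idx = 0
    for c in (*background_px, *input_px):        # 6 channel values in total
        idx = (idx << bits) | (int(c) >> shift)
    return idx                                   # value in [0, 2**(6*bits))

def lut_foreground_probability(lut, background_px, input_px, bits=4):
    """Return the foreground probability stored in the LUT for this
    background/input pixel-value combination."""
    return lut[combined_index(background_px, input_px, bits)]

# Usage: a LUT with 2**24 entries when bits=4 (hypothetical sizing).
lut = np.full(2 ** 24, 0.5, dtype=np.float32)    # placeholder probabilities
p = lut_foreground_probability(lut, np.array([10, 20, 30]), np.array([12, 22, 200]))
```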
The input unit 11 inputs video captured by a camera frame by frame and transmits each input frame (hereinafter referred to as an input image) to the arrangement unit 12 and the classification unit 13.
The arrangement unit 12 detects the subject from the input image and arranges a three-dimensional model in the shape of the subject (hereinafter referred to as a subject model) so that it matches the subject in the input image. More specifically, the arrangement unit 12 aligns the position and size of the subject model and arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image. In other words, the arrangement unit 12 arranges the subject model in a virtual space so that the subject model is drawn superimposed on the subject in the input image when the subject model is perspective-transformed. The subject model placed in the virtual space can be photographed by a virtual camera and rendered onto the input image. For example, the arrangement unit 12 can superimpose the subject model on the input image using augmented reality (AR) technology and arrange the subject model so that it overlaps the position of the subject in the input image.
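How the arranged subject model could be projected onto the image plane can be sketched with a pinhole-camera model; the intrinsic matrix K, the pose (R, t), the vertex splatting, and the function name project_model_mask are assumptions of this sketch, since the publication only requires that the projected model overlap the subject.

```python
import numpy as np

def project_model_mask(vertices, K, R, t, image_shape):
    """Project 3D model vertices into the image and return a rough binary
    mask of the projection range. This is a sketch: a real renderer would
    rasterize the model's triangles instead of splatting vertices, and it
    assumes every vertex lies in front of the camera (positive depth).

    vertices    : (N, 3) model points in world coordinates.
    K           : (3, 3) camera intrinsic matrix.
    R, t        : rotation (3, 3) and translation (3,) placing the arranged
                  model relative to the camera.
    image_shape : (height, width) of the input image.
    """
    cam = vertices @ R.T + t                 # world -> camera coordinates
    pix = cam @ K.T                          # camera -> homogeneous pixel coordinates
    pix = pix[:, :2] / pix[:, 2:3]           # perspective divide
    h, w = image_shape
    mask = np.zeros(image_shape, dtype=bool)
    u = np.clip(pix[:, 0].round().astype(int), 0, w - 1)
    v = np.clip(pix[:, 1].round().astype(int), 0, h - 1)
    mask[v, u] = True                        # pixels covered by the projected model
    return mask
```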
When the pose of the subject changes, the arrangement unit 12 estimates the pose of the subject and deforms the pose of the subject model to match the pose of the subject. For example, when the subject is a human, the arrangement unit 12 estimates skeleton data connecting the joint points of the subject from the input image and applies the skeleton data to the subject model to match the posture.
FIG. 2 shows an example of how the subject model is arranged. A subject model representing the subject is created in advance, and the arrangement unit 12 holds the subject model. The arrangement unit 12 deforms the posture of the subject model to match the posture of the subject in the input image, and changes the size of the subject model so that the subject model projected onto a two-dimensional plane has the same size as the subject in the input image. When the subject model is projected onto the two-dimensional plane, the subject model is projected onto the position of the subject in the input image, as shown in FIG. 2.
The arrangement unit 12 may transmit information on the position and orientation of the arranged subject model to the boundary determination unit 14, or may transmit information identifying the pixels of the input image onto which the subject model is projected (drawn) to the boundary determination unit 14. The boundary determination unit 14 only needs to be able to identify, among the pixels of the input image, the pixels onto which the subject model is projected.
The classification unit 13 classifies the input image into a foreground region in which the subject appears, a background region in which the subject does not appear, and an unknown region for which it cannot be determined whether it is foreground or background. Specifically, the classification unit 13 generates a TRIMAP by thresholding the difference between the background image and the input image. The TRIMAP is a region map in which each pixel of the input image is classified as foreground, background, or unknown. Pixels in the foreground region are given a foreground label. Pixels in the background region are given a background label. Pixels in the unknown region are given an unknown label; alternatively, pixels in the unknown region may be left unlabeled. FIG. 3 shows an example of a TRIMAP. In the example of FIG. 3, the foreground region is black, the background region is white, and the unknown region is hatched. The unknown region exists between the foreground region and the background region.
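A minimal sketch of generating a TRIMAP by thresholding the background difference might look as follows; the two threshold values and the label encoding (0 = background, 1 = unknown, 2 = foreground) are assumptions, not values given in this publication.

```python
import numpy as np

BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2

def make_trimap(input_img, background_img, low=15, high=60):
    """Classify each pixel as foreground, background, or unknown by
    thresholding the per-pixel difference from the background image."""
    diff = np.abs(input_img.astype(np.int16) - background_img.astype(np.int16))
    diff = diff.max(axis=2)                          # strongest channel difference
    trimap = np.full(diff.shape, UNKNOWN, dtype=np.uint8)
    trimap[diff <= low] = BACKGROUND                 # clearly matches the background
    trimap[diff >= high] = FOREGROUND                # clearly differs from the background
    return trimap
```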
When generating a TRIMAP using the LUT, the classification unit 13 refers to the LUT and converts the value obtained by combining the pixel values of the pixels at the same coordinates in the background image and the input image into a probability of being foreground. The classification unit 13 assigns a foreground label, a background label, or an unknown label to each pixel based on the foreground probability obtained from the LUT.
Note that the classification unit 13 may use any method as long as it can classify the input image into the foreground region, the background region, and the unknown region.
For each pixel in the unknown region, the boundary determination unit 14 uses a color model to determine a score indicating whether the pixel is foreground or background. For example, the boundary determination unit 14 selects pixels having a color similar to that of the target pixel from around the target pixel and determines the score of the target pixel based on the labels given to the selected pixels.
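One way the color-based score could be realized is to let already-labeled pixels with a similar color in a window around the target pixel vote; the window radius, the color-similarity threshold, and the score range of [-1, 1] are assumptions of this sketch.

```python
import numpy as np

def color_score(img, trimap, y, x, radius=7, color_thresh=20.0,
                foreground=2, background=0):
    """Score in [-1, 1]: positive means the unknown pixel at (y, x) looks
    like foreground, negative means it looks like background."""
    h, w, _ = img.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    patch = img[y0:y1, x0:x1].astype(np.float32)
    labels = trimap[y0:y1, x0:x1]
    dist = np.linalg.norm(patch - img[y, x].astype(np.float32), axis=2)
    similar = dist < color_thresh                    # neighbors with a similar color
    fg_votes = np.count_nonzero(similar & (labels == foreground))
    bg_votes = np.count_nonzero(similar & (labels == background))
    total = fg_votes + bg_votes
    return 0.0 if total == 0 else (fg_votes - bg_votes) / total
```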
Then, for each pixel, the boundary determination unit 14 weights the score using information as to whether the target pixel belongs to the projection range of the subject model, and assigns a foreground label or a background label to the pixel based on the weighted score. When the subject model is projected onto the input image, pixels within the range onto which the subject model is projected have a high probability of being foreground, and pixels outside that range have a low probability of being foreground. The boundary determination unit 14 therefore weights pixels included in the projected portion of the subject model so that they tend to be determined as foreground, and weights pixels not included in the projected portion so that they tend to be determined as background. Since the boundary of the subject model may not completely match the boundary of the subject in the input image, the boundary determination unit 14 may increase the degree of weighting for pixels located farther from the boundary of the projected subject model.
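The projection-based weighting could be sketched with a signed distance from the boundary of the projected model, so that pixels deeper inside the projection range are pulled more strongly toward foreground and pixels farther outside are pulled toward background; the use of a Euclidean distance transform and the gain alpha are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def projection_weight(model_mask, alpha=0.05, max_bias=1.0):
    """Per-pixel bias added to the color score.

    model_mask : bool array, True where the subject model is projected.
    Returns a float array: positive inside the projection range (pushes the
    score toward foreground), negative outside (pushes toward background),
    growing with distance from the projected model's boundary.
    """
    inside = distance_transform_edt(model_mask)      # distance to boundary, inside the mask
    outside = distance_transform_edt(~model_mask)    # distance to boundary, outside the mask
    signed = inside - outside
    return np.clip(alpha * signed, -max_bias, max_bias)
```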
FIG. 4 shows an example of projecting the subject model onto a TRIMAP. The boundary of the subject model falls within the unknown region of the TRIMAP. In FIG. 4, a portion covered by the subject model has a high probability of being foreground, and a portion protruding from the subject model has a low probability of being foreground. By weighting the score determined using the color model with the projection information of the subject model, the boundary determination unit 14 can classify pixels that are difficult to judge from color alone into foreground or background.
The output unit 15 extracts the group of pixels given the foreground label from the input image to generate and output a foreground image. For example, the output unit 15 generates a mask image in which pixels given the foreground label are white and pixels given the background label are black, and generates the foreground image by combining the input image with the mask image.
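The output step could be sketched as building a binary mask from the foreground labels and keeping only the masked pixels of the input image; writing the background as black rather than, say, an alpha channel is an assumption of this sketch.

```python
import numpy as np

def compose_foreground(input_img, labels, foreground=2):
    """Return a foreground image: input pixels where the label is foreground,
    black elsewhere. `labels` holds the final per-pixel labels after the
    unknown region has been resolved."""
    mask = (labels == foreground)
    foreground_img = np.zeros_like(input_img)
    foreground_img[mask] = input_img[mask]
    return foreground_img
```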
[Operation of image processing device]
Next, the flow of processing by the image processing apparatus 1 will be described with reference to the flowchart of FIG. 5. It is assumed that the background image or the LUT for generating the TRIMAP has been input to the image processing apparatus 1 in advance and that the image processing apparatus 1 holds the subject model. The processing in FIG. 5 is executed each time a frame of video captured by the camera is input.
In step S1, the input unit 11 inputs an image from the camera.
In step S2, the arrangement unit 12 arranges the subject model so that the subject model overlaps the subject in the input image.
In step S3, the classification unit 13 generates a TRIMAP by classifying the input image into a foreground region, a background region, and an unknown region. The processing of step S2 and the processing of step S3 may be performed in parallel, or the processing of step S3 may be performed before the processing of step S2.
In step S4, the boundary determination unit 14 clusters each pixel classified into the unknown region by the TRIMAP into the foreground or the background to determine the boundary of the subject. Details of the clustering process will be described later. Through the clustering process, each pixel of the input image is given a foreground label or a background label.
In step S5, the output unit 15 extracts the group of pixels given the foreground label from the input image, generates a foreground image, and outputs the generated foreground image.
Next, the flow of processing by the boundary determination unit 14 will be described with reference to the flowchart of FIG. 6. The processing in FIG. 6 is executed for each pixel in the unknown region of the TRIMAP.
In step S41, the boundary determination unit 14 determines the score of the target pixel based on the color of the target pixel.
In step S42, the boundary determination unit 14 determines a weight using information as to whether the target pixel belongs to the projection range of the subject model.
In step S43, the boundary determination unit 14 weights and evaluates the score of the target pixel, and determines the label of the target pixel.
In step S44, the boundary determination unit 14 assigns a foreground label or a background label to the target pixel based on the determination result of step S43.
The boundary determination unit 14 performs the above processing on all pixels in the unknown region.
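Putting steps S41 to S44 together, a per-pixel loop over the unknown region might look as follows; color_score and projection_weight stand for the color-based scoring and the projection-based weighting sketched above, and the zero decision threshold is an assumption.

```python
import numpy as np

BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2

def resolve_unknown_region(input_img, trimap, model_mask,
                           color_score, projection_weight):
    """Assign a foreground or background label to every unknown pixel.

    color_score(img, trimap, y, x) -> float   (step S41, color-based score)
    projection_weight(mask)        -> array   (step S42, per-pixel weighting)
    """
    labels = trimap.copy()
    bias = projection_weight(model_mask)                        # step S42
    ys, xs = np.nonzero(trimap == UNKNOWN)
    for y, x in zip(ys, xs):
        score = color_score(input_img, trimap, y, x)            # step S41
        weighted = score + bias[y, x]                           # step S43
        labels[y, x] = FOREGROUND if weighted > 0 else BACKGROUND  # step S44
    return labels
```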
As described above, the image processing apparatus 1 of the present embodiment includes: the arrangement unit 12 that arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image; the classification unit 13 that classifies the input image into a foreground region, a background region, and an unknown region; the boundary determination unit 14 that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the subject model, and classifies the pixel into the foreground region or the background region; and the output unit 15 that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted. As a result, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
A general-purpose computer system including, for example, a central processing unit (CPU) 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906 as shown in FIG. 7 can be used as the image processing apparatus 1 described above. In this computer system, the image processing apparatus 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902. The program can be recorded on a computer-readable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or can be distributed via a network.
DESCRIPTION OF SYMBOLS
1… Image processing apparatus
11… Input unit
12… Arrangement unit
13… Classification unit
14… Boundary determination unit
15… Output unit

Claims (4)

  1.  An image processing device that generates a foreground image by extracting only a subject from an input image captured by a camera, the image processing device comprising:
     an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image;
     a classification unit that classifies the input image into a foreground region, a background region, and an unknown region;
     a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on a pixel value of the pixel, weights the score based on information as to whether the pixel belongs to a projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and
     an output unit that extracts a group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  2.  The image processing device according to claim 1, wherein the arrangement unit deforms a posture of the three-dimensional model to match a posture of the subject in the input image.
  3.  An image processing method for generating a foreground image by extracting only a subject from an input image captured by a camera, the method comprising, by a computer:
     arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image;
     classifying the input image into a foreground region, a background region, and an unknown region;
     for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on a pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to a projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and
     extracting a group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
  4.  A program that causes a computer to operate as each unit of the image processing device according to claim 1 or 2.
PCT/JP2021/007878 2021-03-02 2021-03-02 Image processing device, image processing method, and program WO2022185403A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Publications (1)

Publication Number Publication Date
WO2022185403A1 true WO2022185403A1 (en) 2022-09-09

Family

ID=83154028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2022185403A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100111370A1 (en) * 2008-08-15 2010-05-06 Black Michael J Method and apparatus for estimating body shape
JP2019192022A (en) * 2018-04-26 2019-10-31 キヤノン株式会社 Image processing apparatus, image processing method, and program
JP2019204333A (en) * 2018-05-24 2019-11-28 日本電信電話株式会社 Video processing device, video processing method, and video processing program
JP2020160812A (en) * 2019-03-27 2020-10-01 Kddi株式会社 Region extraction device and program

Similar Documents

Publication Publication Date Title
Matern et al. Exploiting visual artifacts to expose deepfakes and face manipulations
US11107232B2 (en) Method and apparatus for determining object posture in image, device, and storage medium
US11727577B2 (en) Video background subtraction using depth
US11882357B2 (en) Image display method and device
US20200258196A1 (en) Image processing apparatus, image processing method, and storage medium
CN116018616A (en) Maintaining a fixed size of a target object in a frame
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
CN111291885A (en) Near-infrared image generation method, network generation training method and device
WO2022156626A1 (en) Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN105229697A (en) Multi-modal prospect background segmentation
US20210374972A1 (en) Panoramic video data processing method, terminal, and storage medium
KR20160098560A (en) Apparatus and methdo for analayzing motion
CN111985281B (en) Image generation model generation method and device and image generation method and device
KR20190054702A (en) Method and apparatus for detecting action of object in viedio stream
KR20180087918A (en) Learning service Method of virtual experience for realistic interactive augmented reality
CN111832745A (en) Data augmentation method and device and electronic equipment
JP2019117577A (en) Program, learning processing method, learning model, data structure, learning device and object recognition device
US20190068955A1 (en) Generation apparatus, generation method, and computer readable storage medium
CN110598139A (en) Web browser augmented reality real-time positioning method based on 5G cloud computing
CN112712487A (en) Scene video fusion method and system, electronic equipment and storage medium
WO2023273069A1 (en) Saliency detection method and model training method and apparatus thereof, device, medium, and program
Chen et al. Sound to visual: Hierarchical cross-modal talking face video generation
WO2022185403A1 (en) Image processing device, image processing method, and program
US20230131418A1 (en) Two-dimensional (2d) feature database generation
WO2023086398A1 (en) 3d rendering networks based on refractive neural radiance fields

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1