WO2022185403A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program

Info

Publication number
WO2022185403A1
Authority
WO
WIPO (PCT)
Prior art keywords
foreground
subject
pixel
input image
region
Prior art date
Application number
PCT/JP2021/007878
Other languages
English (en)
Japanese (ja)
Inventor
翔大 山田
秀信 長田
弘員 柿沼
浩太 日高
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/007878
Publication of WO2022185403A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • The present invention relates to an image processing device, an image processing method, and a program.
  • Subject extraction is a technology that generates a foreground image by extracting only a subject from an image captured by a camera.
  • In a conventional approach, a TRIMAP is generated by classifying an image into a foreground region, a background region, and an unknown region using a background subtraction method or machine learning, and each pixel in the unknown region is then classified into foreground or background based on information about whether the surrounding pixels are classified as foreground or background.
  • The present invention has been made in view of the above, and an object of the present invention is to extract a subject more accurately even when the colors of the subject and the background are similar.
  • An image processing device according to the present invention is an image processing device that generates a foreground image by extracting only a subject from an input image captured by a camera, and includes: an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; a classification unit that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and an output unit that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  • An image processing method according to the present invention is an image processing method for generating a foreground image by extracting only a subject from an input image captured by a camera, in which a computer: arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; classifies the input image into a foreground region, a background region, and an unknown region; for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  • According to the present invention, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus of this embodiment.
  • FIG. 2 is a diagram showing an example of how subject models are arranged.
  • FIG. 3 is a diagram showing an example of TRIMAP.
  • FIG. 4 is a diagram showing an example of projecting a subject model onto TRIMAP.
  • FIG. 5 is a flowchart showing an example of the flow of processing by the image processing apparatus.
  • FIG. 6 is a flowchart showing an example of the flow of processing for clustering pixels in an unknown region.
  • FIG. 7 is a diagram illustrating an example of the hardware configuration of the image processing apparatus.
  • FIG. 1 is a functional block diagram showing an example of the configuration of an image processing apparatus 1 of this embodiment.
  • The image processing device 1 includes an input unit 11, an arrangement unit 12, a classification unit 13, a boundary determination unit 14, and an output unit 15.
  • The image processing apparatus 1 receives an image captured by a camera, separates the subject from the background, and outputs a foreground image in which only the subject is extracted.
  • The input unit 11 inputs in advance a background image showing only the background.
  • The input unit 11 may also input in advance a lookup table (LUT) used for subject separation.
  • The LUT is a table that holds a foreground probability for each combination of a pixel value of the background image and a pixel value of the input image.
  • The background image and the LUT are sent to the classification unit 13.
  • The input unit 11 inputs a video captured by the camera frame by frame, and transmits the input frames (hereinafter referred to as input images) to the arrangement unit 12 and the classification unit 13.
  • The arrangement unit 12 detects the subject from the input image and arranges a three-dimensional model in the shape of the subject (hereinafter referred to as a subject model) according to the subject in the input image. More specifically, the arrangement unit 12 adjusts the position and size of the subject model and arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image. In other words, the arrangement unit 12 arranges the subject model in a virtual space so that the subject model is superimposed on the subject in the input image when the subject model is perspectively transformed.
  • A subject model placed in the virtual space can be photographed by a virtual camera and rendered on the input image.
  • The arrangement unit 12 can use augmented reality (AR) technology to superimpose the subject model on the input image and arrange the subject model so that it overlaps the position of the subject in the input image.
  • The arrangement unit 12 estimates the pose of the subject and transforms the pose of the subject model to match the pose of the subject. For example, when the subject is a human, the arrangement unit 12 estimates skeleton data connecting the joint points of the subject from the input image, and applies the skeleton data to the subject model to match the posture.
  • FIG. 2 shows an example of how the subject model is arranged.
  • A subject model representing the subject is created in advance, and the arrangement unit 12 holds the subject model.
  • The arrangement unit 12 transforms the posture of the subject model according to the posture of the subject in the input image, and resizes the subject model so that the subject model projected onto the two-dimensional plane has the same size as the subject in the input image.
  • When the subject model is projected onto the two-dimensional plane, the subject model is projected onto the position of the subject within the input image, as shown in FIG. 2.
  • The arrangement unit 12 may transmit information on the position and orientation of the arranged subject model to the boundary determination unit 14, or may transmit information on the pixels of the input image onto which the subject model is projected (drawn) to the boundary determination unit 14. The boundary determination unit 14 only needs to be able to identify, among the pixels of the input image, the pixels onto which the subject model is projected.
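  • As a rough illustration of how the pixels covered by the projected subject model could be identified, the following sketch projects the vertices of a triangulated subject model through a pinhole virtual camera and rasterizes each face into a binary projection mask. The camera parameters (K, R, t), the model data (vertices, faces), and the function names are assumptions made for illustration only, not part of the disclosed embodiment.

```python
import numpy as np
import cv2


def project_points(vertices, K, R, t):
    """Project Nx3 world-space vertices to pixel coordinates with a pinhole camera.

    K: 3x3 intrinsic matrix, R: 3x3 rotation, t: 3-vector translation
    (all assumed to be known from the virtual-camera setup).
    """
    cam = vertices @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                    # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]     # perspective divide -> Nx2 pixel coordinates


def render_projection_mask(vertices, faces, K, R, t, image_shape):
    """Return a binary mask (H, W) marking the pixels covered by the projected model."""
    h, w = image_shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    pts2d = project_points(vertices, K, R, t)
    for face in faces:                # faces: Mx3 vertex indices of a triangulated model
        tri = np.round(pts2d[face]).astype(np.int32)
        cv2.fillConvexPoly(mask, tri, 255)   # rasterize one triangle into the mask
    return mask
```
  • With such a mask, the boundary determination unit 14 can tell for any pixel of the input image whether it lies inside the projection range of the subject model.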
  • The classification unit 13 classifies the input image into a foreground region in which the subject appears, a background region in which the subject does not appear, and an unknown region for which it cannot be determined whether it is foreground or background. Specifically, the classification unit 13 performs threshold processing on the difference between the background image and the input image to generate a TRIMAP.
  • The TRIMAP is a region map that classifies each pixel of the input image as foreground, background, or unknown. Pixels in the foreground region are given a foreground label, pixels in the background region are given a background label, and pixels in the unknown region are given an unknown label (or may be left unlabeled).
  • FIG. 3 shows an example of TRIMAP. In the example of FIG. 3, the foreground area is black, the background area is white, and the unknown area is hatched. An unknown region exists between the foreground region and the background region.
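  • A minimal sketch of the threshold-based TRIMAP generation described above might look like the following; the two threshold values and the label constants are illustrative assumptions rather than values taken from the embodiment.

```python
import numpy as np
import cv2

FG, BG, UNKNOWN = 255, 0, 128   # label values used in these sketches


def make_trimap(input_bgr, background_bgr, t_low=15.0, t_high=45.0):
    """Label each pixel foreground, background, or unknown by thresholding the
    per-pixel colour distance between the input image and the background image."""
    diff = cv2.absdiff(input_bgr, background_bgr).astype(np.float32)
    dist = np.linalg.norm(diff, axis=2)        # per-pixel colour difference
    trimap = np.full(dist.shape, UNKNOWN, dtype=np.uint8)
    trimap[dist < t_low] = BG                  # clearly matches the background
    trimap[dist > t_high] = FG                 # clearly differs from the background
    return trimap
```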
  • When generating a TRIMAP using the LUT, the classification unit 13 refers to the LUT and converts the value obtained by combining the pixel values of the pixels at the same coordinates in the background image and the input image into a probability of being foreground. The classification unit 13 then assigns a foreground label, a background label, or an unknown label to each pixel based on the foreground probability obtained from the LUT.
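  • If the LUT is used instead, the classification unit could look up a foreground probability for each (background pixel value, input pixel value) pair and convert it into a label, roughly as in the sketch below; the quantization step, the index packing, and the probability thresholds are assumptions for illustration.

```python
import numpy as np

FG, BG, UNKNOWN = 255, 0, 128


def make_trimap_lut(input_bgr, background_bgr, lut, step=8, p_bg=0.2, p_fg=0.8):
    """lut is assumed to be a 2D array indexed by quantized (background, input)
    colour pairs, holding the probability that the pixel is foreground."""
    bins = 256 // step
    q_in = (input_bgr // step).astype(np.int64)
    q_bg = (background_bgr // step).astype(np.int64)
    # Pack each quantized BGR triple into a single index per pixel (illustrative packing).
    idx_in = (q_in[..., 0] * bins + q_in[..., 1]) * bins + q_in[..., 2]
    idx_bg = (q_bg[..., 0] * bins + q_bg[..., 1]) * bins + q_bg[..., 2]
    prob_fg = lut[idx_bg, idx_in]              # foreground probability per pixel
    trimap = np.full(prob_fg.shape, UNKNOWN, dtype=np.uint8)
    trimap[prob_fg < p_bg] = BG
    trimap[prob_fg > p_fg] = FG
    return trimap
```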
  • The classification unit 13 may use any method as long as it can classify the input image into the foreground region, the background region, and the unknown region.
  • The boundary determination unit 14 uses a color model to determine, for each pixel in the unknown region, a score indicating whether the pixel is foreground or background. For example, the boundary determination unit 14 selects pixels having a color similar to that of the target pixel from around the target pixel, and determines the score of the target pixel based on the labels given to the selected pixels.
  • The boundary determination unit 14 weights the score of each pixel using information as to whether the target pixel belongs to the projection range of the subject model, and assigns a foreground label or a background label to the pixel based on the weighted score.
  • The subject model is projected onto the input image so as to overlap the subject, so pixels within the projection range of the subject model have a high probability of being foreground, and pixels outside the projection range have a low probability of being foreground.
  • The boundary determination unit 14 therefore weights pixels included in the projected portion of the subject model so that they are more likely to be determined to be foreground, and weights pixels not included in the projected portion so that they are more likely to be determined to be background. Since the boundary of the projected subject model may not completely match the boundary of the subject in the input image, the boundary determination unit 14 may make the weighting stronger for pixels farther from the boundary of the projected subject model.
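  • The score-and-weight step for a single unknown-region pixel could be sketched as follows: the colour score is computed from labelled pixels in a local window, and the projection information is applied as a multiplicative weight. The window size, the similarity measure, and the weight values are assumptions for illustration, not the specific values of the embodiment.

```python
import numpy as np

FG, BG, UNKNOWN = 255, 0, 128


def color_score(input_bgr, trimap, y, x, win=7, sigma=10.0):
    """Score in [0, 1] for the pixel at (y, x): how foreground-like its colour is,
    judged from foreground- and background-labelled pixels in a local window."""
    h, w = trimap.shape
    y0, y1 = max(0, y - win), min(h, y + win + 1)
    x0, x1 = max(0, x - win), min(w, x + win + 1)
    patch = input_bgr[y0:y1, x0:x1].reshape(-1, 3).astype(np.float32)
    labels = trimap[y0:y1, x0:x1].reshape(-1)
    target = input_bgr[y, x].astype(np.float32)
    sim = np.exp(-np.linalg.norm(patch - target, axis=1) / sigma)   # colour similarity
    fg_sim = sim[labels == FG].sum()
    bg_sim = sim[labels == BG].sum()
    total = fg_sim + bg_sim
    return 0.5 if total == 0 else float(fg_sim / total)


def classify_unknown_pixel(score, inside_projection, w_inside=1.3, w_outside=0.7):
    """Weight the colour score by the projection information and pick a label."""
    weighted = score * (w_inside if inside_projection else w_outside)
    return FG if weighted >= 0.5 else BG
```
  • The weights could also be made a function of the distance from the boundary of the projected subject model, so that pixels near the boundary, where the model and the true subject outline may disagree, are weighted less strongly.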
  • FIG. 4 shows an example in which the subject model is projected onto the TRIMAP.
  • The boundary of the subject model falls within the unknown region of the TRIMAP.
  • Within the unknown region, a portion hidden by the subject model has a high probability of being foreground, and a portion protruding from the subject model has a low probability of being foreground.
  • By weighting the score determined using the color model with the projection information of the subject model, the boundary determination unit 14 can classify even pixels that are difficult to determine from color alone into foreground or background.
  • The output unit 15 extracts the group of foreground-labeled pixels from the input image to generate and output a foreground image. For example, the output unit 15 generates a mask image in which foreground-labeled pixels are white and background-labeled pixels are black, and generates the foreground image by compositing the input image with the mask image.
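  • The output step could then be as simple as the following sketch, in which the final labels are turned into a binary mask and applied to the input image; rendering the background as black is an assumption, and other compositing targets are equally possible.

```python
import numpy as np
import cv2

FG = 255


def extract_foreground(input_bgr, labels):
    """labels: (H, W) array holding FG wherever a pixel was finally labelled foreground."""
    mask = (labels == FG).astype(np.uint8) * 255              # white = foreground
    return cv2.bitwise_and(input_bgr, input_bgr, mask=mask)   # keep only the subject
```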
  • In step S1, the input unit 11 inputs an image from the camera.
  • In step S2, the arrangement unit 12 arranges the subject model so that the subject model overlaps the subject in the input image.
  • In step S3, the classification unit 13 generates a TRIMAP by classifying the input image into a foreground region, a background region, and an unknown region.
  • The processing of step S2 and the processing of step S3 may be performed in parallel, or the processing of step S3 may be performed before the processing of step S2.
  • In step S4, the boundary determination unit 14 clusters each pixel classified into the unknown region by the TRIMAP into the foreground or the background to determine the boundary of the subject. Details of the clustering process will be described later. Each pixel of the input image is given a foreground label or a background label by the clustering process.
  • In step S5, the output unit 15 extracts the group of foreground-labeled pixels from the input image, generates a foreground image, and outputs the generated foreground image.
  • In step S41, the boundary determination unit 14 determines the score of the target pixel based on the color of the target pixel.
  • In step S42, the boundary determination unit 14 determines the weighting using information as to whether the target pixel belongs to the projection range of the subject model.
  • In step S43, the boundary determination unit 14 weights and evaluates the score of the target pixel and determines the label of the target pixel.
  • In step S44, the boundary determination unit 14 assigns a foreground label or a background label to the target pixel based on the determination result of step S43.
  • The boundary determination unit 14 performs the above processing on all pixels in the unknown region.
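  • Putting the sketches above together, the flow of steps S1 to S5 (with S41 to S44 inside step S4) could be exercised roughly as follows. The function names are the hypothetical ones introduced in the earlier sketches, and the subject model (vertices, faces), the virtual-camera parameters (K, R, t), and the file names are assumed inputs.

```python
import numpy as np
import cv2

frame = cv2.imread("input_frame.png")                  # S1: input image from the camera
background = cv2.imread("background.png")              # background image input in advance

proj_mask = render_projection_mask(vertices, faces, K, R, t, frame.shape)   # S2
trimap = make_trimap(frame, background)                                      # S3

labels = trimap.copy()                                 # S4: cluster the unknown pixels
for y, x in zip(*np.where(trimap == UNKNOWN)):
    score = color_score(frame, trimap, y, x)                    # S41
    inside = proj_mask[y, x] > 0                                # S42
    labels[y, x] = classify_unknown_pixel(score, inside)        # S43, S44

foreground = extract_foreground(frame, labels)         # S5: output the foreground image
```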
  • As described above, the image processing apparatus 1 of the present embodiment includes: the arrangement unit 12 that arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image; the classification unit 13 that classifies the input image into a foreground region, a background region, and an unknown region; the boundary determination unit 14 that, for each pixel belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the subject model, and classifies the pixel into the foreground region or the background region; and the output unit 15 that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted. As a result, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • The image processing apparatus 1 described above can be implemented using, for example, a general-purpose computer system including a central processing unit (CPU) 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906, as shown in FIG. 7.
  • The image processing apparatus 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
  • This program can be recorded on a computer-readable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or can be distributed via a network.

Abstract

Provided is an image processing device (1) including: an arrangement unit (12) that arranges a subject model so that the subject model overlaps a subject in an input image when the subject model is projected onto the input image; a classification unit (13) that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit (14) that, for each pixel in the unknown region of the input image, calculates a score indicating whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information indicating whether the pixel falls within the projection range of the subject model, and then classifies the pixel into the foreground region or the background region; and an output unit (15) that extracts the group of pixels of the foreground region from the input image and generates a foreground image in which only the subject has been extracted.
PCT/JP2021/007878 2021-03-02 2021-03-02 Image processing device, image processing method, and program WO2022185403A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (fr) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (fr) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Publications (1)

Publication Number Publication Date
WO2022185403A1 (fr) 2022-09-09

Family

ID=83154028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/007878 WO2022185403A1 (fr) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2022185403A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100111370A1 (en) * 2008-08-15 2010-05-06 Black Michael J Method and apparatus for estimating body shape
JP2019192022A (ja) * 2018-04-26 2019-10-31 キヤノン株式会社 Image processing apparatus, image processing method, and program
JP2019204333A (ja) * 2018-05-24 2019-11-28 日本電信電話株式会社 Video processing device, video processing method, and video processing program
JP2020160812A (ja) * 2019-03-27 2020-10-01 Kddi株式会社 Region extraction device and program


Similar Documents

Publication Publication Date Title
Matern et al. Exploiting visual artifacts to expose deepfakes and face manipulations
US11107232B2 (en) Method and apparatus for determining object posture in image, device, and storage medium
US11727577B2 (en) Video background subtraction using depth
US11882357B2 (en) Image display method and device
US20200258196A1 (en) Image processing apparatus, image processing method, and storage medium
WO2022156640A1 Image gaze correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN116018616A Maintaining a fixed size of a target object in a frame
CN111291885A Near-infrared image generation method, and generation network training method and apparatus
WO2022156626A1 Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN105229697A Multimodal foreground-background segmentation
US20210374972A1 (en) Panoramic video data processing method, terminal, and storage medium
KR20160098560A Motion analysis apparatus and method
CN111985281B Image generation model generation method and apparatus, and image generation method and apparatus
KR20180087918A Immersive interactive augmented reality virtual experience learning service method
CN111832745A Data augmentation method and apparatus, and electronic device
JP2019117577A Program, learning processing method, learning model, data structure, learning device, and object recognition device
US20190068955A1 (en) Generation apparatus, generation method, and computer readable storage medium
CN110598139A Method for real-time positioning of Web browser augmented reality based on 5G cloud computing
CN112712487A Scene video fusion method and system, electronic device, and storage medium
WO2023273069A1 Saliency detection method, and related model training method and apparatus, device, medium, and program
Chen et al. Sound to visual: Hierarchical cross-modal talking face video generation
WO2022185403A1 (fr) Image processing device, image processing method, and program
US20230131418A1 (en) Two-dimensional (2d) feature database generation
WO2023086398A1 (fr) 3D rendering networks based on refraction-based neural radiance fields
EP4020372A1 (fr) Extracteur d'actif d'écriture/de dessin vers le numérique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1