WO2022185403A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program Download PDF

Info

Publication number
WO2022185403A1
Authority
WO
WIPO (PCT)
Prior art keywords
foreground
subject
pixel
input image
region
Prior art date
Application number
PCT/JP2021/007878
Other languages
French (fr)
Japanese (ja)
Inventor
翔大 山田
秀信 長田
弘員 柿沼
浩太 日高
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/007878 priority Critical patent/WO2022185403A1/en
Publication of WO2022185403A1 publication Critical patent/WO2022185403A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • The present invention relates to an image processing device, an image processing method, and a program.
  • A subject extraction technique is known that generates a foreground image by extracting only the subject from an image captured by a camera.
  • In such subject extraction, a TRIMAP that classifies the image into a foreground region, a background region, and an unknown region is generated by background subtraction or machine learning, and each pixel in the unknown region is classified as foreground or background based on whether surrounding pixels with similar pixel values are classified as foreground or background. However, when the background involves complex color changes or the subject has the same color as the background, the boundary of the subject cannot be extracted correctly.
  • The present invention has been made in view of the above, and an object of the present invention is to extract a subject more accurately even when the colors of the subject and the background are similar.
  • An image processing device according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes: an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; a classification unit that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and an output unit that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  • An image processing method according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes, by a computer: arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; classifying the input image into a foreground region, a background region, and an unknown region; for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and extracting the group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
  • According to the present invention, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus of this embodiment.
  • FIG. 2 is a diagram showing an example of how subject models are arranged.
  • FIG. 3 is a diagram showing an example of TRIMAP.
  • FIG. 4 is a diagram showing an example of projecting a subject model onto TRIMAP.
  • FIG. 5 is a flowchart showing an example of the flow of processing by the image processing apparatus.
  • FIG. 6 is a flowchart showing an example of the flow of processing for clustering pixels in an unknown region.
  • FIG. 7 is a diagram illustrating an example of a hardware configuration of an image processing apparatus.
  • FIG. 1 is a functional block diagram showing an example of the configuration of an image processing apparatus 1 of this embodiment.
  • The image processing device 1 includes an input unit 11, an arrangement unit 12, a classification unit 13, a boundary determination unit 14, and an output unit 15.
  • The image processing apparatus 1 receives an image captured by a camera, separates the subject from the background, and outputs a foreground image in which only the subject is extracted.
  • The input unit 11 inputs in advance a background image showing only the background.
  • The input unit 11 may also input in advance a lookup table (LUT) used for subject separation.
  • The LUT is a table that holds the probability of being foreground for a value obtained by combining a pixel value of the background image and the corresponding pixel value of the input image.
  • The background image and the LUT are sent to the classification unit 13.
  • The input unit 11 inputs video captured by a camera frame by frame and transmits each input frame (hereinafter referred to as an input image) to the arrangement unit 12 and the classification unit 13.
  • The arrangement unit 12 detects the subject from the input image and arranges a three-dimensional model in the shape of the subject (hereinafter referred to as a subject model) so that it matches the subject in the input image. More specifically, the arrangement unit 12 aligns the position and size of the subject model and arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image. In other words, the arrangement unit 12 arranges the subject model in a virtual space so that the subject model is drawn superimposed on the subject in the input image when the subject model is perspective-transformed.
  • The subject model placed in the virtual space can be photographed by a virtual camera and rendered onto the input image.
  • For example, the arrangement unit 12 can superimpose the subject model on the input image using augmented reality (AR) technology and arrange the subject model so that it overlaps the position of the subject in the input image.
  • When the pose of the subject changes, the arrangement unit 12 estimates the pose of the subject and deforms the pose of the subject model to match the pose of the subject. For example, when the subject is a human, the arrangement unit 12 estimates skeleton data connecting the joint points of the subject from the input image and applies the skeleton data to the subject model to match the posture.
  • FIG. 2 shows an example of how the subject model is arranged.
  • A subject model representing the subject is created in advance, and the arrangement unit 12 holds the subject model.
  • The arrangement unit 12 deforms the posture of the subject model to match the posture of the subject in the input image, and changes the size of the subject model so that the subject model projected onto a two-dimensional plane has the same size as the subject in the input image.
  • When the subject model is projected onto the two-dimensional plane, the subject model is projected onto the position of the subject in the input image, as shown in FIG. 2.
  • The arrangement unit 12 may transmit information on the position and orientation of the arranged subject model to the boundary determination unit 14, or may transmit information identifying the pixels of the input image onto which the subject model is projected (drawn) to the boundary determination unit 14. The boundary determination unit 14 only needs to be able to identify, among the pixels of the input image, the pixels onto which the subject model is projected.
  • The classification unit 13 classifies the input image into a foreground region in which the subject appears, a background region in which the subject does not appear, and an unknown region for which it cannot be determined whether it is foreground or background. Specifically, the classification unit 13 generates a TRIMAP by thresholding the difference between the background image and the input image.
  • The TRIMAP is a region map in which each pixel of the input image is classified as foreground, background, or unknown. Pixels in the foreground region are given a foreground label. Pixels in the background region are given a background label. Pixels in the unknown region are given an unknown label; alternatively, pixels in the unknown region may be left unlabeled.
  • FIG. 3 shows an example of TRIMAP. In the example of FIG. 3, the foreground area is black, the background area is white, and the unknown area is hatched. An unknown region exists between the foreground region and the background region.
  • When generating a TRIMAP using the LUT, the classification unit 13 refers to the LUT and converts the value obtained by combining the pixel values of the pixels at the same coordinates in the background image and the input image into a probability of being foreground. The classification unit 13 assigns a foreground label, a background label, or an unknown label to each pixel based on the foreground probability obtained from the LUT.
  • Note that the classification unit 13 may use any method as long as it can classify the input image into the foreground region, the background region, and the unknown region.
  • For each pixel in the unknown region, the boundary determination unit 14 uses a color model to determine a score indicating whether the pixel is foreground or background. For example, the boundary determination unit 14 selects pixels having a color similar to that of the target pixel from around the target pixel and determines the score of the target pixel based on the labels given to the selected pixels.
  • Then, for each pixel, the boundary determination unit 14 weights the score using information as to whether the target pixel belongs to the projection range of the subject model, and assigns a foreground label or a background label to the pixel based on the weighted score.
  • When the subject model is projected onto the input image, pixels within the range onto which the subject model is projected have a high probability of being foreground, and pixels outside that range have a low probability of being foreground.
  • The boundary determination unit 14 therefore weights pixels included in the projected portion of the subject model so that they tend to be determined as foreground, and weights pixels not included in the projected portion so that they tend to be determined as background. Since the boundary of the subject model may not completely match the boundary of the subject in the input image, the boundary determination unit 14 may increase the degree of weighting for pixels located farther from the boundary of the projected subject model.
  • FIG. 4 shows an example of projecting the subject model onto a TRIMAP.
  • The boundary of the subject model falls within the unknown region of the TRIMAP.
  • In FIG. 4, a portion covered by the subject model has a high probability of being foreground, and a portion protruding from the subject model has a low probability of being foreground.
  • By weighting the score determined using the color model with the projection information of the subject model, the boundary determination unit 14 can classify pixels that are difficult to judge from color alone into foreground or background.
  • The output unit 15 extracts the group of pixels given the foreground label from the input image to generate and output a foreground image. For example, the output unit 15 generates a mask image in which pixels given the foreground label are white and pixels given the background label are black, and generates the foreground image by combining the input image with the mask image.
  • In step S1, the input unit 11 inputs an image from the camera.
  • In step S2, the arrangement unit 12 arranges the subject model so that the subject model overlaps the subject in the input image.
  • In step S3, the classification unit 13 generates a TRIMAP by classifying the input image into a foreground region, a background region, and an unknown region.
  • The processing of step S2 and the processing of step S3 may be performed in parallel, or the processing of step S3 may be performed before the processing of step S2.
  • In step S4, the boundary determination unit 14 clusters each pixel classified into the unknown region by the TRIMAP into the foreground or the background to determine the boundary of the subject. Details of the clustering process will be described later. Through the clustering process, each pixel of the input image is given a foreground label or a background label.
  • In step S5, the output unit 15 extracts the group of pixels given the foreground label from the input image, generates a foreground image, and outputs the generated foreground image.
  • In step S41, the boundary determination unit 14 determines the score of the target pixel based on the color of the target pixel.
  • In step S42, the boundary determination unit 14 determines a weight using information as to whether the target pixel belongs to the projection range of the subject model.
  • In step S43, the boundary determination unit 14 weights and evaluates the score of the target pixel, and determines the label of the target pixel.
  • In step S44, the boundary determination unit 14 assigns a foreground label or a background label to the target pixel based on the determination result of step S43.
  • The boundary determination unit 14 performs the above processing on all pixels in the unknown region.
  • As described above, the image processing apparatus 1 of the present embodiment includes: the arrangement unit 12 that arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image; the classification unit 13 that classifies the input image into a foreground region, a background region, and an unknown region; the boundary determination unit 14 that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the subject model, and classifies the pixel into the foreground region or the background region; and the output unit 15 that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted. As a result, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
  • A general-purpose computer system including, for example, a central processing unit (CPU) 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906 as shown in FIG. 7 can be used as the image processing apparatus 1 described above.
  • The image processing apparatus 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.
  • This program can be recorded on a computer-readable recording medium such as a magnetic disk, optical disk, or semiconductor memory, or distributed via a network.

Abstract

An image processing device 1 that comprises: an arrangement unit 12 that arranges a subject model such that the subject model overlaps a subject in an input image when the subject model is projected onto the input image; a classification unit 13 that divides the input image into a foreground region, a background region, and an unknown region; a boundary determination unit 14 that, for each of the pixels in the unknown region of the input image, calculates a score that indicates whether the pixel is foreground or background on the basis of a pixel value for the pixel, weights the score on the basis of information about whether the pixel is in the projection area of the subject model, and classifies the pixel as being in the foreground region or the background region; and an output unit 15 that extracts a pixel group for the foreground region from the input image and outputs a foreground image in which only the subject has been extracted.

Description

Image processing device, image processing method, and program
The present invention relates to an image processing device, an image processing method, and a program.
A subject extraction technique is known that generates a foreground image by extracting only the subject from an image captured by a camera. In this technique, a TRIMAP that classifies the image into a foreground region, a background region, and an unknown region is generated by background subtraction or machine learning, and each pixel in the unknown region is classified as foreground or background based on whether surrounding pixels with similar pixel values are classified as foreground or background.
However, conventional subject extraction techniques have a problem in that the boundary of the subject cannot be extracted correctly when the background involves complex color changes or when the color of the subject is the same as that of the background.
The present invention has been made in view of the above, and an object of the present invention is to extract a subject more accurately even when the colors of the subject and the background are similar.
An image processing device according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes: an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; a classification unit that classifies the input image into a foreground region, a background region, and an unknown region; a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and an output unit that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
An image processing method according to one aspect of the present invention generates a foreground image by extracting only a subject from an input image captured by a camera, and includes, by a computer: arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image; classifying the input image into a foreground region, a background region, and an unknown region; for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to the projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and extracting the group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
According to the present invention, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus of this embodiment.
FIG. 2 is a diagram showing an example of how the subject model is arranged.
FIG. 3 is a diagram showing an example of a TRIMAP.
FIG. 4 is a diagram showing an example of projecting the subject model onto a TRIMAP.
FIG. 5 is a flowchart showing an example of the flow of processing by the image processing apparatus.
FIG. 6 is a flowchart showing an example of the flow of processing for clustering pixels in the unknown region.
FIG. 7 is a diagram showing an example of the hardware configuration of the image processing apparatus.
An embodiment of the present invention will be described below with reference to the drawings. It should be noted that the embodiment described below represents a comprehensive or specific example.
[Configuration of image processing device]
FIG. 1 is a functional block diagram showing an example of the configuration of the image processing apparatus 1 of this embodiment. The image processing apparatus 1 includes an input unit 11, an arrangement unit 12, a classification unit 13, a boundary determination unit 14, and an output unit 15. The image processing apparatus 1 receives an image captured by a camera, separates the subject from the background, and outputs a foreground image in which only the subject is extracted.
The input unit 11 inputs in advance a background image showing only the background. The input unit 11 may also input in advance a lookup table (LUT) used for subject separation. The LUT is a table that holds the probability of being foreground for a value obtained by combining a pixel value of the background image and the corresponding pixel value of the input image. The background image and the LUT are sent to the classification unit 13.
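To make the LUT concrete, the following is a minimal sketch and not the implementation described in this publication; the bit-packing scheme, the table size, and the helper names combined_index and lut_foreground_probability are assumptions introduced only for illustration.

```python
import numpy as np

def combined_index(background_px, input_px, bits=4):
    """Pack a background pixel and an input pixel (both 3-channel uint8) into
    one integer index by keeping the top `bits` bits of each channel.
    The packing scheme is an assumption made for this sketch."""
    shift = 8 - bits
    idx = 0
    for c in (*background_px, *input_px):        # 6 channel values in total
        idx = (idx << bits) | (int(c) >> shift)
    return idx                                   # value in [0, 2**(6*bits))

def lut_foreground_probability(lut, background_px, input_px, bits=4):
    """Return the foreground probability stored in the LUT for this
    background/input pixel-value combination."""
    return lut[combined_index(background_px, input_px, bits)]

# Usage: a LUT with 2**24 entries when bits=4 (hypothetical sizing).
lut = np.full(2 ** 24, 0.5, dtype=np.float32)    # placeholder probabilities
p = lut_foreground_probability(lut, np.array([10, 20, 30]), np.array([12, 22, 200]))
```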
The input unit 11 inputs video captured by a camera frame by frame and transmits each input frame (hereinafter referred to as an input image) to the arrangement unit 12 and the classification unit 13.
The arrangement unit 12 detects the subject from the input image and arranges a three-dimensional model in the shape of the subject (hereinafter referred to as a subject model) so that it matches the subject in the input image. More specifically, the arrangement unit 12 aligns the position and size of the subject model and arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image. In other words, the arrangement unit 12 arranges the subject model in a virtual space so that the subject model is drawn superimposed on the subject in the input image when the subject model is perspective-transformed. The subject model placed in the virtual space can be photographed by a virtual camera and rendered onto the input image. For example, the arrangement unit 12 can superimpose the subject model on the input image using augmented reality (AR) technology and arrange the subject model so that it overlaps the position of the subject in the input image.
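How the arranged subject model could be projected onto the image plane can be sketched with a pinhole-camera model; the intrinsic matrix K, the pose (R, t), the vertex splatting, and the function name project_model_mask are assumptions of this sketch, since the publication only requires that the projected model overlap the subject.

```python
import numpy as np

def project_model_mask(vertices, K, R, t, image_shape):
    """Project 3D model vertices into the image and return a rough binary
    mask of the projection range. This is a sketch: a real renderer would
    rasterize the model's triangles instead of splatting vertices, and it
    assumes every vertex lies in front of the camera (positive depth).

    vertices    : (N, 3) model points in world coordinates.
    K           : (3, 3) camera intrinsic matrix.
    R, t        : rotation (3, 3) and translation (3,) placing the arranged
                  model relative to the camera.
    image_shape : (height, width) of the input image.
    """
    cam = vertices @ R.T + t                 # world -> camera coordinates
    pix = cam @ K.T                          # camera -> homogeneous pixel coordinates
    pix = pix[:, :2] / pix[:, 2:3]           # perspective divide
    h, w = image_shape
    mask = np.zeros(image_shape, dtype=bool)
    u = np.clip(pix[:, 0].round().astype(int), 0, w - 1)
    v = np.clip(pix[:, 1].round().astype(int), 0, h - 1)
    mask[v, u] = True                        # pixels covered by the projected model
    return mask
```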
When the pose of the subject changes, the arrangement unit 12 estimates the pose of the subject and deforms the pose of the subject model to match the pose of the subject. For example, when the subject is a human, the arrangement unit 12 estimates skeleton data connecting the joint points of the subject from the input image and applies the skeleton data to the subject model to match the posture.
FIG. 2 shows an example of how the subject model is arranged. A subject model representing the subject is created in advance, and the arrangement unit 12 holds the subject model. The arrangement unit 12 deforms the posture of the subject model to match the posture of the subject in the input image, and changes the size of the subject model so that the subject model projected onto a two-dimensional plane has the same size as the subject in the input image. When the subject model is projected onto the two-dimensional plane, the subject model is projected onto the position of the subject in the input image, as shown in FIG. 2.
The arrangement unit 12 may transmit information on the position and orientation of the arranged subject model to the boundary determination unit 14, or may transmit information identifying the pixels of the input image onto which the subject model is projected (drawn) to the boundary determination unit 14. The boundary determination unit 14 only needs to be able to identify, among the pixels of the input image, the pixels onto which the subject model is projected.
The classification unit 13 classifies the input image into a foreground region in which the subject appears, a background region in which the subject does not appear, and an unknown region for which it cannot be determined whether it is foreground or background. Specifically, the classification unit 13 generates a TRIMAP by thresholding the difference between the background image and the input image. The TRIMAP is a region map in which each pixel of the input image is classified as foreground, background, or unknown. Pixels in the foreground region are given a foreground label. Pixels in the background region are given a background label. Pixels in the unknown region are given an unknown label; alternatively, pixels in the unknown region may be left unlabeled. FIG. 3 shows an example of a TRIMAP. In the example of FIG. 3, the foreground region is black, the background region is white, and the unknown region is hatched. The unknown region exists between the foreground region and the background region.
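A minimal sketch of generating a TRIMAP by thresholding the background difference might look as follows; the two threshold values and the label encoding (0 = background, 1 = unknown, 2 = foreground) are assumptions, not values given in this publication.

```python
import numpy as np

BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2

def make_trimap(input_img, background_img, low=15, high=60):
    """Classify each pixel as foreground, background, or unknown by
    thresholding the per-pixel difference from the background image."""
    diff = np.abs(input_img.astype(np.int16) - background_img.astype(np.int16))
    diff = diff.max(axis=2)                          # strongest channel difference
    trimap = np.full(diff.shape, UNKNOWN, dtype=np.uint8)
    trimap[diff <= low] = BACKGROUND                 # clearly matches the background
    trimap[diff >= high] = FOREGROUND                # clearly differs from the background
    return trimap
```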
When generating a TRIMAP using the LUT, the classification unit 13 refers to the LUT and converts the value obtained by combining the pixel values of the pixels at the same coordinates in the background image and the input image into a probability of being foreground. The classification unit 13 assigns a foreground label, a background label, or an unknown label to each pixel based on the foreground probability obtained from the LUT.
Note that the classification unit 13 may use any method as long as it can classify the input image into the foreground region, the background region, and the unknown region.
For each pixel in the unknown region, the boundary determination unit 14 uses a color model to determine a score indicating whether the pixel is foreground or background. For example, the boundary determination unit 14 selects pixels having a color similar to that of the target pixel from around the target pixel and determines the score of the target pixel based on the labels given to the selected pixels.
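One way the color-based score could be realized is to let already-labeled pixels with a similar color in a window around the target pixel vote; the window radius, the color-similarity threshold, and the score range of [-1, 1] are assumptions of this sketch.

```python
import numpy as np

def color_score(img, trimap, y, x, radius=7, color_thresh=20.0,
                foreground=2, background=0):
    """Score in [-1, 1]: positive means the unknown pixel at (y, x) looks
    like foreground, negative means it looks like background."""
    h, w, _ = img.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    patch = img[y0:y1, x0:x1].astype(np.float32)
    labels = trimap[y0:y1, x0:x1]
    dist = np.linalg.norm(patch - img[y, x].astype(np.float32), axis=2)
    similar = dist < color_thresh                    # neighbors with a similar color
    fg_votes = np.count_nonzero(similar & (labels == foreground))
    bg_votes = np.count_nonzero(similar & (labels == background))
    total = fg_votes + bg_votes
    return 0.0 if total == 0 else (fg_votes - bg_votes) / total
```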
Then, for each pixel, the boundary determination unit 14 weights the score using information as to whether the target pixel belongs to the projection range of the subject model, and assigns a foreground label or a background label to the pixel based on the weighted score. When the subject model is projected onto the input image, pixels within the range onto which the subject model is projected have a high probability of being foreground, and pixels outside that range have a low probability of being foreground. The boundary determination unit 14 therefore weights pixels included in the projected portion of the subject model so that they tend to be determined as foreground, and weights pixels not included in the projected portion so that they tend to be determined as background. Since the boundary of the subject model may not completely match the boundary of the subject in the input image, the boundary determination unit 14 may increase the degree of weighting for pixels located farther from the boundary of the projected subject model.
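The projection-based weighting could be sketched with a signed distance from the boundary of the projected model, so that pixels deeper inside the projection range are pulled more strongly toward foreground and pixels farther outside are pulled toward background; the use of a Euclidean distance transform and the gain alpha are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def projection_weight(model_mask, alpha=0.05, max_bias=1.0):
    """Per-pixel bias added to the color score.

    model_mask : bool array, True where the subject model is projected.
    Returns a float array: positive inside the projection range (pushes the
    score toward foreground), negative outside (pushes toward background),
    growing with distance from the projected model's boundary.
    """
    inside = distance_transform_edt(model_mask)      # distance to boundary, inside the mask
    outside = distance_transform_edt(~model_mask)    # distance to boundary, outside the mask
    signed = inside - outside
    return np.clip(alpha * signed, -max_bias, max_bias)
```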
FIG. 4 shows an example of projecting the subject model onto a TRIMAP. The boundary of the subject model falls within the unknown region of the TRIMAP. In FIG. 4, a portion covered by the subject model has a high probability of being foreground, and a portion protruding from the subject model has a low probability of being foreground. By weighting the score determined using the color model with the projection information of the subject model, the boundary determination unit 14 can classify pixels that are difficult to judge from color alone into foreground or background.
The output unit 15 extracts the group of pixels given the foreground label from the input image to generate and output a foreground image. For example, the output unit 15 generates a mask image in which pixels given the foreground label are white and pixels given the background label are black, and generates the foreground image by combining the input image with the mask image.
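The output step could be sketched as building a binary mask from the foreground labels and keeping only the masked pixels of the input image; writing the background as black rather than, say, an alpha channel is an assumption of this sketch.

```python
import numpy as np

def compose_foreground(input_img, labels, foreground=2):
    """Return a foreground image: input pixels where the label is foreground,
    black elsewhere. `labels` holds the final per-pixel labels after the
    unknown region has been resolved."""
    mask = (labels == foreground)
    foreground_img = np.zeros_like(input_img)
    foreground_img[mask] = input_img[mask]
    return foreground_img
```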
[Operation of image processing device]
Next, the flow of processing by the image processing apparatus 1 will be described with reference to the flowchart of FIG. 5. It is assumed that the background image or the LUT for generating the TRIMAP has been input to the image processing apparatus 1 in advance and that the image processing apparatus 1 holds the subject model. The processing in FIG. 5 is executed each time a frame of video captured by the camera is input.
In step S1, the input unit 11 inputs an image from the camera.
In step S2, the arrangement unit 12 arranges the subject model so that the subject model overlaps the subject in the input image.
In step S3, the classification unit 13 generates a TRIMAP by classifying the input image into a foreground region, a background region, and an unknown region. The processing of step S2 and the processing of step S3 may be performed in parallel, or the processing of step S3 may be performed before the processing of step S2.
In step S4, the boundary determination unit 14 clusters each pixel classified into the unknown region by the TRIMAP into the foreground or the background to determine the boundary of the subject. Details of the clustering process will be described later. Through the clustering process, each pixel of the input image is given a foreground label or a background label.
In step S5, the output unit 15 extracts the group of pixels given the foreground label from the input image, generates a foreground image, and outputs the generated foreground image.
Next, the flow of processing by the boundary determination unit 14 will be described with reference to the flowchart of FIG. 6. The processing in FIG. 6 is executed for each pixel in the unknown region of the TRIMAP.
In step S41, the boundary determination unit 14 determines the score of the target pixel based on the color of the target pixel.
In step S42, the boundary determination unit 14 determines a weight using information as to whether the target pixel belongs to the projection range of the subject model.
In step S43, the boundary determination unit 14 weights and evaluates the score of the target pixel, and determines the label of the target pixel.
In step S44, the boundary determination unit 14 assigns a foreground label or a background label to the target pixel based on the determination result of step S43.
The boundary determination unit 14 performs the above processing on all pixels in the unknown region.
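Putting steps S41 to S44 together, a per-pixel loop over the unknown region might look as follows; color_score and projection_weight stand for the color-based scoring and the projection-based weighting sketched above, and the zero decision threshold is an assumption.

```python
import numpy as np

BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2

def resolve_unknown_region(input_img, trimap, model_mask,
                           color_score, projection_weight):
    """Assign a foreground or background label to every unknown pixel.

    color_score(img, trimap, y, x) -> float   (step S41, color-based score)
    projection_weight(mask)        -> array   (step S42, per-pixel weighting)
    """
    labels = trimap.copy()
    bias = projection_weight(model_mask)                        # step S42
    ys, xs = np.nonzero(trimap == UNKNOWN)
    for y, x in zip(ys, xs):
        score = color_score(input_img, trimap, y, x)            # step S41
        weighted = score + bias[y, x]                           # step S43
        labels[y, x] = FOREGROUND if weighted > 0 else BACKGROUND  # step S44
    return labels
```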
As described above, the image processing apparatus 1 of the present embodiment includes: the arrangement unit 12 that arranges the subject model so that the subject model overlaps the subject in the input image when the subject model is projected onto the input image; the classification unit 13 that classifies the input image into a foreground region, a background region, and an unknown region; the boundary determination unit 14 that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on the pixel value of the pixel, weights the score based on information as to whether the pixel belongs to the projection range of the subject model, and classifies the pixel into the foreground region or the background region; and the output unit 15 that extracts the group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted. As a result, the subject can be extracted more accurately even when the colors of the subject and the background are similar.
A general-purpose computer system including, for example, a central processing unit (CPU) 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906 as shown in FIG. 7 can be used as the image processing apparatus 1 described above. In this computer system, the image processing apparatus 1 is realized by the CPU 901 executing a predetermined program loaded into the memory 902. The program can be recorded on a computer-readable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or can be distributed via a network.
DESCRIPTION OF SYMBOLS
1… Image processing apparatus
11… Input unit
12… Arrangement unit
13… Classification unit
14… Boundary determination unit
15… Output unit

Claims (4)

  1.  An image processing device that generates a foreground image by extracting only a subject from an input image captured by a camera, the image processing device comprising:
     an arrangement unit that arranges a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image;
     a classification unit that classifies the input image into a foreground region, a background region, and an unknown region;
     a boundary determination unit that, for each pixel of the input image belonging to the unknown region, calculates a score representing whether the pixel is foreground or background based on a pixel value of the pixel, weights the score based on information as to whether the pixel belongs to a projection range of the three-dimensional model, and classifies the pixel into the foreground region or the background region; and
     an output unit that extracts a group of pixels of the foreground region from the input image and outputs a foreground image in which only the subject is extracted.
  2.  The image processing device according to claim 1, wherein the arrangement unit deforms a posture of the three-dimensional model to match a posture of the subject in the input image.
  3.  An image processing method for generating a foreground image by extracting only a subject from an input image captured by a camera, the method comprising, by a computer:
     arranging a three-dimensional model in the shape of the subject so that the three-dimensional model overlaps the subject in the input image when the three-dimensional model is projected onto the input image;
     classifying the input image into a foreground region, a background region, and an unknown region;
     for each pixel of the input image belonging to the unknown region, calculating a score representing whether the pixel is foreground or background based on a pixel value of the pixel, weighting the score based on information as to whether the pixel belongs to a projection range of the three-dimensional model, and classifying the pixel into the foreground region or the background region; and
     extracting a group of pixels of the foreground region from the input image and outputting a foreground image in which only the subject is extracted.
  4.  A program that causes a computer to operate as each unit of the image processing device according to claim 1 or 2.
PCT/JP2021/007878 2021-03-02 2021-03-02 Image processing device, image processing method, and program WO2022185403A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Publications (1)

Publication Number Publication Date
WO2022185403A1 true WO2022185403A1 (en) 2022-09-09

Family

ID=83154028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/007878 WO2022185403A1 (en) 2021-03-02 2021-03-02 Image processing device, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2022185403A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100111370A1 (en) * 2008-08-15 2010-05-06 Black Michael J Method and apparatus for estimating body shape
JP2019192022A (en) * 2018-04-26 2019-10-31 キヤノン株式会社 Image processing apparatus, image processing method, and program
JP2019204333A (en) * 2018-05-24 2019-11-28 日本電信電話株式会社 Video processing device, video processing method, and video processing program
JP2020160812A (en) * 2019-03-27 2020-10-01 Kddi株式会社 Region extraction device and program

Similar Documents

Publication Publication Date Title
Matern et al. Exploiting visual artifacts to expose deepfakes and face manipulations
US11107232B2 (en) Method and apparatus for determining object posture in image, device, and storage medium
US11727577B2 (en) Video background subtraction using depth
US11882357B2 (en) Image display method and device
US20200258196A1 (en) Image processing apparatus, image processing method, and storage medium
CN116018616A (en) Maintaining a fixed size of a target object in a frame
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
CN111291885A (en) Near-infrared image generation method, network generation training method and device
WO2022156626A1 (en) Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN105229697A (en) Multi-modal prospect background segmentation
US20210374972A1 (en) Panoramic video data processing method, terminal, and storage medium
KR20160098560A (en) Apparatus and methdo for analayzing motion
CN111985281B (en) Image generation model generation method and device and image generation method and device
KR20190054702A (en) Method and apparatus for detecting action of object in viedio stream
KR20180087918A (en) Learning service Method of virtual experience for realistic interactive augmented reality
CN111832745A (en) Data augmentation method and device and electronic equipment
JP2019117577A (en) Program, learning processing method, learning model, data structure, learning device and object recognition device
US20190068955A1 (en) Generation apparatus, generation method, and computer readable storage medium
CN110598139A (en) Web browser augmented reality real-time positioning method based on 5G cloud computing
CN112712487A (en) Scene video fusion method and system, electronic equipment and storage medium
WO2023273069A1 (en) Saliency detection method and model training method and apparatus thereof, device, medium, and program
Chen et al. Sound to visual: Hierarchical cross-modal talking face video generation
WO2022185403A1 (en) Image processing device, image processing method, and program
US20230131418A1 (en) Two-dimensional (2d) feature database generation
WO2023086398A1 (en) 3d rendering networks based on refractive neural radiance fields

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21928979

Country of ref document: EP

Kind code of ref document: A1