KR20210062388A

KR20210062388A - Image processing apparatus and method for performing object segmentation of image

Info

Publication number: KR20210062388A
Application number: KR1020190150549A
Authority: KR
Inventors: 길종인
Original assignee: 주식회사 케이티
Priority date: 2019-11-21
Filing date: 2019-11-21
Publication date: 2021-05-31

Abstract

Provided is an image processing apparatus for performing object segmentation of an image, which includes: an image input unit that receives an image; a foreground mask generation unit that generates, from the received image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image; a result mask generation unit that generates a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask; and an object segmentation unit that segments an object from the received image by using the result mask.

Description

An image processing apparatus and method for performing object segmentation of an image {IMAGE PROCESSING APPARATUS AND METHOD FOR PERFORMING OBJECT SEGMENTATION OF IMAGE}

The present invention relates to an image processing apparatus and method for performing object segmentation of an image.

Various techniques exist for removing the background region from a video and segmenting the object (foreground).

As one example of removing the background of a video, the video may be converted into a plurality of still images, the object region may be separated in each still image, and the still images may then be recombined into a video.

There is also a method that distinguishes the background from the foreground by using the temporal correlation and motion information between the still images of a video.

A method that segments an object based on its movement over time treats the parts (pixels) of the video that change as the object region and the parts that do not change as the background region. Such a method has difficulty correctly classifying a part as background when the image changes within the background region, for example because of a change in lighting or a moving shadow.

Recently, deep-learning-based techniques have been widely used to separate objects from a video. FIG. 1 shows one still image of a video in which the object region (foreground) has been separated by applying the conventional DeepLab-v3+ model.

Referring to FIG. 1, several problems appear in the result when object segmentation is performed using only a deep learning model. Specifically, some regions at the extremities of the object, such as a person's hands, where movement is large and fast, are not properly detected.

In addition, as shown in FIG. 1, some regions inside the object, such as a person's abdomen and legs, may not be properly detected. As a result, part of the object's interior may be classified as background, creating one or more holes in the object region.

Therefore, when deep-learning-based object segmentation is applied to a video with large object movement, such as a performance video, the hands and feet of the object (a person) may not be properly detected, or holes may appear inside the object region.

Korean Patent Application Publication No. 10-2015-0037091 discloses an image processing apparatus that detects the shape of a person within an image frame, and a control method thereof.

An object of the present invention is to provide an image processing apparatus and method capable of distinguishing the background and foreground regions of an input image.

Another object is to provide an image processing apparatus and method capable of processing an input image to provide an image from which the background has been removed.

In particular, an object is to provide an image processing apparatus and method capable of easily extracting an object from an image in which the object moves substantially.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above technical objects, an embodiment of the present invention may provide an image processing apparatus for performing object segmentation of an image, including: an image input unit that receives an image; a foreground mask generation unit that generates, from the input image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image; a result mask generation unit that generates a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask; and an object segmentation unit that segments an object from the input image by using the result mask.

In an embodiment, the foreground mask generation unit may include a first foreground mask generation unit that generates the first foreground mask from the input image based on the deep learning model.

In an embodiment, the foreground mask generation unit may include a second foreground mask generation unit that generates background information by performing the background modeling on the input image and generates the second foreground mask from the input image based on the background information.

In an embodiment, the foreground mask generation unit may include a third foreground mask generation unit that predicts an expected object region from the input image and generates the reference background image by applying an inpainting algorithm to the expected object region.

In an embodiment, the third foreground mask generation unit may convert the reference background image into a first HSV (Hue, Saturation, Value) color space, extract a first hue component from the first HSV color space, convert the input image into a second HSV color space, extract a second hue component from the second HSV color space, and generate the third foreground mask based on the first hue component and the second hue component.

In an embodiment, the result mask generation unit may generate a ternary image (a three-valued image, i.e., a trimap) based on the result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask.

In an embodiment, the logical operation may include at least one of an AND operation, an OR operation, and an XOR operation.

In an embodiment, the result mask generation unit may generate the ternary image by performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, and then assigning a value of 2 to pixels that all three masks determine to be foreground, a value of 0 to pixels that all three masks determine to be background, and a value of 1 to all other pixels.
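The value assignment described above can be sketched as follows. This is only an illustrative sketch, assuming each mask is a nested Python list of 0/1 values; the function name `ternary_image` is hypothetical and does not come from the patent.

```python
def ternary_image(m1, m2, m3):
    """Combine three binary masks (lists of rows of 0/1) into a ternary image.

    2 = all masks say foreground, 0 = all masks say background, 1 = masks disagree.
    """
    height, width = len(m1), len(m1[0])
    out = [[1] * width for _ in range(height)]  # default: uncertain
    for y in range(height):
        for x in range(width):
            a, b, c = m1[y][x], m2[y][x], m3[y][x]
            if a & b & c:            # AND is 1: every mask says foreground
                out[y][x] = 2
            elif not (a | b | c):    # OR is 0: every mask says background
                out[y][x] = 0
    return out


m1 = [[1, 1, 0, 0]]
m2 = [[1, 0, 1, 0]]
m3 = [[1, 1, 0, 0]]
print(ternary_image(m1, m2, m3))  # [[2, 1, 1, 0]]
```

Pixels with value 1 are the ones on which the three masks disagree, so they are the natural candidates for the "unknown" region of a later matting step.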

In an embodiment, the result mask generation unit may generate a result binary mask by applying an image matting algorithm to the ternary image.

In an embodiment, the object segmentation unit may segment an object from the input image based on the result binary mask and the input image.
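As a rough illustration of this last step, a binary mask can be applied to a frame pixel by pixel. The sketch below assumes frames are nested lists of RGB tuples and that removed background pixels are replaced with a fill color; the function name `segment_object` and the `fill` parameter are hypothetical.

```python
def segment_object(frame, mask, fill=(0, 0, 0)):
    """Keep pixels where mask is 1 and replace all other pixels with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


frame = [[(10, 20, 30), (40, 50, 60)]]
mask = [[1, 0]]
print(segment_object(frame, mask))  # [[(10, 20, 30), (0, 0, 0)]]
```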

Another embodiment of the present invention may provide a method of performing object segmentation of an image, including: receiving an image; generating, from the input image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image; generating a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask; and segmenting an object from the input image by using the result mask.

The above means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the above means for solving the problems, it is possible to provide an image processing apparatus and method that remove the background from an input image and segment the object region.

It is also possible to provide an image processing apparatus and method that accurately extract the object region from an image in which the object moves substantially.

In addition, the cost and time required to remove the background of an image can be reduced.

In addition, the method of segmenting an object of an image can be used for purposes such as producing immersive media content.

FIG. 1 is a diagram showing problems that occur in an image on which object segmentation has been performed using conventional deep learning.
FIGS. 2 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
FIGS. 3A to 3C are diagrams each showing an example still image of an image received by an image processing apparatus according to an embodiment of the present invention.
FIGS. 4A to 4C are example diagrams of the results of performing logical operations on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask.
FIG. 5 shows an example still image of an image on which an image processing apparatus according to an embodiment of the present invention has performed object segmentation.
FIG. 6 is a flowchart of a method of segmenting an object of an image according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice the invention. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element interposed therebetween. In addition, when a part is said to "include" a component, this means that, unless specifically stated to the contrary, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the present specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of an image processing apparatus 100 according to an embodiment of the present invention. Referring to FIG. 2, the image processing apparatus 100 may include an image input unit 110, a foreground mask generation unit 120, a result mask generation unit 130, and an object segmentation unit 140.

The image processing apparatus 100 may be, for example, a server, a desktop, a laptop, a kiosk, a smartphone, or a tablet PC. However, the image processing apparatus 100 is not limited to these examples. That is, the image processing apparatus 100 may be any device equipped with a processor that performs the object segmentation method described below.

The image input unit 110 may receive one or more images. For example, the image input unit 110 may receive an image from an external device such as a user terminal. The image input unit 110 may also receive an image through communication with an external server.

For example, the image received by the image input unit 110 may have been captured by a camera at a fixed position. The input image may be footage of a dancing person, such as a performance video, in which the object moves substantially. The input image may contain no scene changes. One or more objects may be present in the input image, and the objects appearing in a still image may change over time. The input image may include scenes in which an object appears or disappears. A plurality of objects may also be arranged in a line in the input image. For example, the image may follow a pattern in which, when one person's performance ends, that person moves to the back of the line and the next person's performance begins.

FIGS. 3A to 3C show example still images of an image received by an image processing apparatus according to an embodiment of the present invention.

For example, as in FIG. 3A, the image received by the image input unit 110 may be one in which the object occupies a large portion of the frame and moves substantially. That is, because the object moves quickly, blur may occur at a specific part of the object (the hand in FIG. 3A) in a still image of the input image.

As another example, as in FIG. 3B, the image received by the image input unit 110 may include a scene in which a plurality of objects are arranged in a line and two objects stand back to back in a still image.

As yet another example, as in FIG. 3C, the image received by the image input unit 110 may include a scene in which, after one object finishes its performance, that object moves to the back of the line and the next object takes the front position.

Because the object moves substantially in the image received by the image input unit 110, separating the object region using only a deep learning model, as in the prior art, may fail to detect the object's hands and feet properly or may leave holes inside the object region.

In contrast, the image processing apparatus according to an embodiment of the present invention uses, as described below, not only the first foreground mask generated based on the deep learning model but also the second foreground mask generated based on background modeling and the third foreground mask generated based on color information of the reference background image, and can thereby address the problems described above.

In addition, because the image received by the image input unit 110 is object-centered and the object's motion is repetitive, the second and third foreground masks described below are easy to generate.

The foreground mask generation unit 120 may generate, from the input image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image. The foreground mask generation unit 120 may generate one or more foreground masks by assigning a value of 1 to each pixel of the input image that is determined to be foreground and a value of 0 otherwise.

As shown in FIG. 2, the foreground mask generation unit 120 may include a first foreground mask generation unit 121. The first foreground mask generation unit 121 may generate the first foreground mask from the input image based on a deep learning model.

For example, the first foreground mask generation unit 121 may use the DeepLab-v3+ model, one of the deep-learning-based object segmentation techniques, to generate from the input image a first foreground mask on which object segmentation has been performed.

The first foreground mask generation unit 121 is not limited to the deep learning model described above and may generate the first foreground mask using any deep-learning-based algorithm.

As shown in FIG. 2, the foreground mask generation unit 120 may include a second foreground mask generation unit 122. The second foreground mask generation unit 122 may generate the second foreground mask from the input image based on background modeling.

In an embodiment, the second foreground mask generation unit 122 may generate the second foreground mask by processing the input image offline (not in real time). The second foreground mask generation unit 122 may separate the object region by comparing a still image extracted from the input image with the background image generated through background modeling.

The second foreground mask generation unit 122 may generate background information by performing background modeling on the input image. Background modeling is a technique for predicting the background from video data; it produces background information, that is, information about the regions of the input image determined to be background. As described above, because the object in the input image is located at the center of the frame and repeats a preset pattern, the second foreground mask generation unit 122 can perform background modeling easily.

In an embodiment, the second foreground mask generation unit 122 may use a Mixture of Gaussians (MOG) model to perform background modeling. The Mixture of Gaussians model is a technique that models the probability distribution of each of the pixels in an image as a mixture of multiple Gaussian distributions. For example, the second foreground mask generation unit 122 may play back the image received by the image input unit 110 and take in its frames sequentially, from the first frame to the last. Using the Mixture of Gaussians model, the second foreground mask generation unit 122 may determine a plurality of Gaussian distributions for every pixel in the image.

The second foreground mask generation unit 122 may generate the second foreground mask from the input image based on the background information produced by background modeling. The second foreground mask generation unit 122 may obtain a difference image by comparing the background information produced by background modeling with a still image extracted from the input image. The difference image may be derived based on Equation 1 below.

[Equation 1]
$$D(x,y) = \lvert I(x,y) - BG(x,y) \rvert$$

Here, I(x,y) is the pixel at coordinates (x,y) of the input image, BG(x,y) is the pixel at coordinates (x,y) of the background information generated by background modeling, and D(x,y) is the pixel at coordinates (x,y) of the difference image.

The second foreground mask generation unit 122 may obtain the difference image from the difference between the values of corresponding pixels of the input image and the background information.

The second foreground mask generation unit 122 may generate the second foreground mask by applying a threshold to the obtained difference image, thereby binarizing it. The second foreground mask may be derived based on Equation 2 below.

[Equation 2]
$$M_2(x,y) = \begin{cases} 1, & D(x,y) > T \\ 0, & \text{otherwise} \end{cases}$$

Here, D(x,y) is the pixel at coordinates (x,y) of the difference image, T is the threshold, and M₂(x,y) is the pixel at coordinates (x,y) of the second foreground mask.

For the difference image obtained by comparing the background information with the input image, the second foreground mask generation unit 122 may assign a value of 1 where the pixel value of the difference image is greater than the threshold T and a value of 0 otherwise, thereby binarizing the difference image to generate the second foreground mask.
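The difference-and-threshold step of Equations 1 and 2 can be illustrated with a short sketch. It assumes single-channel (grayscale) frames stored as nested lists and assumes that the pixel difference is taken as an absolute value; the function name `second_foreground_mask` is hypothetical.

```python
def second_foreground_mask(frame, background, threshold):
    """Binarize the per-pixel difference between a frame and a background image."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        row = []
        for i, bg in zip(frame_row, bg_row):
            d = abs(i - bg)                        # difference image (absolute value assumed)
            row.append(1 if d > threshold else 0)  # 1 only when strictly above the threshold
        mask.append(row)
    return mask


frame = [[100, 105, 200, 110]]
background = [[100, 100, 100, 100]]
print(second_foreground_mask(frame, background, threshold=10))  # [[0, 0, 1, 0]]
```

Note that a difference exactly equal to the threshold is treated as background, matching the "greater than" wording of the text.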

The second foreground mask generation unit 122 is not limited to the background modeling technique described above and may generate the second foreground mask using any algorithm that performs background modeling.

As shown in FIG. 2, the foreground mask generation unit 120 may include a third foreground mask generation unit 123. The third foreground mask generation unit 123 may generate the third foreground mask from the input image based on color information of the reference background image. Likewise, because the object in the input image is located at the center of the frame and repeats a preset pattern, the third foreground mask generation unit 123 can easily generate the reference background image.

The third foreground mask generation unit 123 may predict an expected object region from the input image. For example, when an image such as those shown in FIGS. 3A to 3C is received, the person region corresponding to the object to be segmented is located in the center of the first frame. In the first frame, the regions other than the center, where the person is present, correspond to the background. That is, the region where the object is likely to be located throughout the image can be predicted from the first frame.

The third foreground mask generation unit 123 may predict and preset the expected object region, that is, the region where the object is likely to be located. The third foreground mask generation unit 123 may assign a value of 0 to all pixels in the portion set as the expected object region, thereby blacking out and removing the portion of the image that corresponds to the object region.

The third foreground mask generation unit 123 may generate the reference background image by applying an inpainting algorithm to the expected object region. Inpainting is a technique that predicts the values of empty pixels from the information of surrounding pixels.

The third foreground mask generation unit 123 may apply the inpainting technique to the removed region, that is, the portion set as the expected object region, using the pixels of the remaining portion. By filling the removed region with new values through the inpainting algorithm, the third foreground mask generation unit 123 may generate a reference background image from the input image.
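To make the idea of filling removed pixels from their surroundings concrete, the following is a deliberately naive stand-in, not the inpainting algorithm of the patent (which is not specified here). It assumes a grayscale image as nested lists and a `hole` map marking the removed pixels with 1; each pass fills hole pixels that touch known pixels with the average of those known 4-neighbours. The function name `naive_inpaint` is hypothetical.

```python
def naive_inpaint(image, hole, max_iters=100):
    """Fill pixels where hole[y][x] == 1 by averaging already-known 4-neighbours."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    known = [[not hole[y][x] for x in range(w)] for y in range(h)]
    for _ in range(max_iters):
        updates = []
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                vals = [out[ny][nx] for ny, nx in neighbours
                        if 0 <= ny < h and 0 <= nx < w and known[ny][nx]]
                if vals:
                    updates.append((y, x, sum(vals) / len(vals)))
        if not updates:          # nothing left to fill (or no known pixels at all)
            break
        for y, x, v in updates:  # apply after the scan so one pass grows by one ring
            out[y][x] = v
            known[y][x] = True
    return out


image = [[10, 0, 30]]
hole = [[0, 1, 0]]
print(naive_inpaint(image, hole))  # [[10, 20.0, 30]]
```

A production system would more likely rely on an established inpainting implementation; this sketch only shows the "predict empty pixels from neighbours" principle.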

The third foreground mask generator 123 may convert the reference background image into a first HSV (Hue, Saturation, Value) color space. The third foreground mask generator 123 may extract a first Hue component from the first HSV color space generated from the reference background image. That is, the third foreground mask generator 123 may extract the first Hue component by converting the reference background image, which is represented in the RGB color space, into the first HSV color space.

The third foreground mask generator 123 may convert the received image into a second HSV color space. The third foreground mask generator 123 may extract a second Hue component from the second HSV color space generated from the received image. That is, the third foreground mask generator 123 may extract the second Hue component by sequentially converting each frame of the received image, which is represented in the RGB color space, into the second HSV color space.

The third foreground mask generator 123 may generate a third foreground mask based on the first Hue component and the second Hue component. The third foreground mask generator 123 compares the first Hue component with the second Hue component; if the difference is greater than or equal to a threshold, the pixel may be determined to belong to the object region, and otherwise to the background region. The third foreground mask generator 123 may generate the third foreground mask by assigning a value to each pixel based on the result of comparing the first Hue component and the second Hue component.
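The Hue comparison described above can be sketched as follows, using only the Python standard library. The threshold value of 0.1, the function names, and the use of a circular hue distance (so that hues near 0 and near 1 count as close) are assumptions; the text only states that the difference is compared with a threshold.

```python
import colorsys


def hue_of(pixel):
    """Hue in [0, 1) of an 8-bit (R, G, B) pixel."""
    r, g, b = (c / 255.0 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)[0]


def hue_distance(h1, h2):
    """Circular distance between two hues, in [0, 0.5] (assumed metric)."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


def third_foreground_mask(frame, reference_background, threshold=0.1):
    """Return a mask with 1 (object) where the frame hue differs from the
    reference background hue by at least `threshold`, else 0 (background)."""
    return [
        [1 if hue_distance(hue_of(p), hue_of(q)) >= threshold else 0
         for p, q in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference_background)
    ]
```

Because Hue is undefined for gray pixels (`colorsys` reports 0 for them), a practical implementation might also consult Saturation or Value; that refinement is outside what the text describes.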

The third foreground mask generator 123 is not limited to the above-described inpainting algorithm, and may generate the third foreground mask using any algorithm that performs inpainting.

The result mask generator 130 may generate a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask. The following describes only the case in which the result mask generator 130 generates the result mask by performing the logical operation on all three of the first foreground mask, the second foreground mask, and the third foreground mask, but the invention is not limited thereto.

The result mask generator 130 may generate a ternary image based on the result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask. The logical operation performed by the result mask generator 130 may include at least one of an AND operation, an OR operation, and an XOR operation.

When the AND operation is performed on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 1 is assigned if the pixel is determined to be foreground in all three masks. If the pixel is determined to be background in any one of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 0 is assigned as the result of the AND operation.

The value assigned to each pixel as the result of the AND operation may be derived based on Equation 3 below. Here, M_AND(x,y) is the value assigned to the pixel at coordinates (x,y) by the AND operation, M₁(x,y) is the value of the pixel at coordinates (x,y) of the first foreground mask, M₂(x,y) is the value of the pixel at coordinates (x,y) of the second foreground mask, and M₃(x,y) is the value of the pixel at coordinates (x,y) of the third foreground mask.
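The image of Equation 3 did not survive in this text. From the definitions given for it, the equation would presumably take the following form (a reconstruction, not the original typesetting):

```latex
M_{AND}(x,y) = M_1(x,y) \land M_2(x,y) \land M_3(x,y)
```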

FIG. 4A shows an example of the result of performing the AND operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask generated by the foreground mask generator 120.

When the OR operation is performed on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 1 is assigned if the pixel is determined to be foreground in any one of the three masks. If the pixel is determined to be background in all of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 0 is assigned as the result of the OR operation.

The value assigned to each pixel as the result of the OR operation may be derived based on Equation 4 below. Here, M_OR(x,y) is the value assigned to the pixel at coordinates (x,y) by the OR operation, M₁(x,y) is the value of the pixel at coordinates (x,y) of the first foreground mask, M₂(x,y) is the value of the pixel at coordinates (x,y) of the second foreground mask, and M₃(x,y) is the value of the pixel at coordinates (x,y) of the third foreground mask.
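The image of Equation 4 is likewise missing from this text. Based on the definitions given for it, it would presumably read as follows (a reconstruction, not the original typesetting):

```latex
M_{OR}(x,y) = M_1(x,y) \lor M_2(x,y) \lor M_3(x,y)
```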

FIG. 4B shows an example of the result of performing the OR operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask generated by the foreground mask generator 120.

When the XOR operation is performed on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 0 is assigned if all three masks determine the pixel to be foreground or all three determine it to be background.

That is, a value of 0 is assigned when all three methods produce the same result. However, when the results of the first foreground mask, the second foreground mask, and the third foreground mask do not all agree, a value of 1 is assigned as the result of the XOR operation.

The value assigned to each pixel as the result of the XOR operation may be derived based on Equation 5 below. Here, M_XOR(x,y) is the value assigned to the pixel at coordinates (x,y) by the XOR operation, M₁(x,y) is the value of the pixel at coordinates (x,y) of the first foreground mask, M₂(x,y) is the value of the pixel at coordinates (x,y) of the second foreground mask, and M₃(x,y) is the value of the pixel at coordinates (x,y) of the third foreground mask.
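The image of Equation 5 is missing from this text, and its exact form cannot be recovered from the description alone. One expression consistent with the behavior described above (0 when all three masks agree, 1 otherwise) is given below as an assumption. Note that this differs from a chained bitwise XOR of the three values, which would yield 1 when all three masks are 1.

```latex
M_{XOR}(x,y) =
\begin{cases}
0, & M_1(x,y) = M_2(x,y) = M_3(x,y) \\
1, & \text{otherwise}
\end{cases}
```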

FIG. 4C shows an example of the result of performing the XOR operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask generated by the foreground mask generator 120.

A pixel determined to be foreground by the AND operation is very likely to belong to the object region. A pixel determined to be background by the OR operation is very likely to belong to the background region rather than the object region. A pixel assigned a value of 1 by the XOR operation belongs to a region for which it is difficult to decide whether it is object or background.
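The three pixel-wise operations can be sketched in a few lines. This is a minimal illustration on masks stored as nested lists of 0/1 values; the function name is hypothetical, and the "XOR" result is computed as disagreement between the masks (OR result differing from AND result), which matches the behavior described in the text rather than a chained bitwise XOR.

```python
def combine_masks(m1, m2, m3):
    """Return (m_and, m_or, m_xor) for three binary masks of equal size.

    m_and: 1 where all three masks say foreground.
    m_or:  1 where at least one mask says foreground.
    m_xor: 1 where the three masks do not all agree.
    """
    m_and, m_or, m_xor = [], [], []
    for r1, r2, r3 in zip(m1, m2, m3):
        and_row = [a & b & c for a, b, c in zip(r1, r2, r3)]
        or_row = [a | b | c for a, b, c in zip(r1, r2, r3)]
        # AND <= OR always holds, so they differ exactly when the masks disagree.
        xor_row = [o ^ a for o, a in zip(or_row, and_row)]
        m_and.append(and_row)
        m_or.append(or_row)
        m_xor.append(xor_row)
    return m_and, m_or, m_xor
```

For example, with `m1 = [[1, 1, 0, 0]]`, `m2 = [[1, 0, 0, 1]]`, and `m3 = [[1, 1, 0, 0]]`, the first pixel is confident foreground, the third is confident background, and the second and fourth are marked as uncertain.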

As a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, the result mask generator 130 may assign a value of 2 to pixels determined to be foreground in all three masks. That is, a value of 2 may be assigned to pixels determined to be foreground by the AND operation.

As a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, the result mask generator 130 may assign a value of 0 to pixels determined to be background in all three masks. That is, a value of 0 may be assigned to pixels determined to be background by the OR operation.

As a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, the result mask generator 130 may assign a value of 1 to the remaining pixels, that is, those that are neither determined to be foreground in all three masks nor determined to be background in all three masks. That is, a value of 1 may be assigned to pixels that were assigned a value of 1 by the XOR operation.

Based on the result of performing the logical operation as described above, the result mask generator 130 may generate a ternary image (trimap) in which a value of 2, 1, or 0 is assigned to each pixel. The value assigned to each pixel of the ternary image may be derived based on Equation 6 below. Here, Trimap(x,y) is the value assigned to the pixel at coordinates (x,y) of the ternary image, M_AND(x,y) is the value assigned to the pixel at coordinates (x,y) by the AND operation, and M_XOR(x,y) is the value assigned to the pixel at coordinates (x,y) by the XOR operation.
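The assignment rule for the ternary image can be sketched as below. The function name is hypothetical, and the inputs are assumed to be the AND and XOR results stored as nested lists of 0/1 values, following the symbols named for Equation 6.

```python
def build_trimap(m_and, m_xor):
    """Build a ternary image from the AND and XOR results.

    2: foreground in all three masks (m_and == 1)
    1: masks disagree (m_xor == 1)
    0: background in all three masks (neither of the above)
    """
    return [
        [2 if a == 1 else (1 if x == 1 else 0)
         for a, x in zip(and_row, xor_row)]
        for and_row, xor_row in zip(m_and, m_xor)
    ]
```

With `m_and = [[1, 0, 0, 0]]` and `m_xor = [[0, 1, 0, 1]]`, the resulting trimap row marks one confident foreground pixel, one confident background pixel, and two uncertain pixels for the matting step to resolve.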

The result mask generator 130 may generate a result binary mask by applying an image matting algorithm to the ternary image.

The result mask generator 130 is not limited to the above-described image matting algorithm, and may generate the result binary mask using any algorithm that performs image matting.

The object segmentation unit 140 may segment an object from the received image using the result mask. The object segmentation unit 140 may segment the object from the received image based on the result binary mask and the received image.

That is, the object segmentation unit 140 may combine the result binary mask with the received image to generate a foreground image from which the background has been removed.
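Combining the result binary mask with the received image can be sketched as a pixel-wise selection. The function name and the choice of black as the fill color for removed background pixels are assumptions; the text does not specify what replaces the background.

```python
def apply_binary_mask(frame, mask, fill=(0, 0, 0)):
    """Keep pixels where mask == 1 and replace the rest with `fill`.

    `frame` is a list of rows of (R, G, B) tuples; `mask` has the same
    shape and holds 0 (background) or 1 (foreground).
    """
    return [
        [px if m == 1 else fill for px, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
```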

FIG. 5 shows an example still frame of an image on which the image processing apparatus according to an embodiment of the present invention has performed object segmentation. Compared with the result of removing the background using only the deep learning model, shown in FIG. 1, it can be seen that the person's hand and foot regions are accurately detected and that no holes occur in the region corresponding to the interior of the object.

FIG. 6 is a flowchart of a method for segmenting an object of an image according to an embodiment of the present invention. The method shown in FIG. 6 includes steps processed in time series by the image processing apparatus 100 according to the embodiment shown in FIG. 2. Therefore, even where omitted below, the description given above of the image processing apparatus 100 according to the embodiment shown in FIG. 2 also applies to the method of FIG. 6.

In step S610, the image processing apparatus 100 may receive an image.

In step S620, the image processing apparatus 100 may generate a first foreground mask based on a deep learning model.

In step S630, the image processing apparatus 100 may generate a second foreground mask based on background modeling.

In step S640, the image processing apparatus 100 may generate a third foreground mask based on color information of a reference background image.

In step S650, the image processing apparatus 100 may generate a result mask by performing a logical operation on the foreground masks.

In step S660, the image processing apparatus 100 may segment an object from the received image using the result mask.
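The order of steps S610 to S660 can be sketched as a small driver function. All parameter names are hypothetical: each mask generator and the combining and segmenting functions are passed in as callables, because the text leaves their internals (deep learning model, background model, matting algorithm) open.

```python
def segment_objects(frames, deep_mask, background_mask, color_mask,
                    make_result_mask, segment):
    """Run the per-frame pipeline in the order of steps S610 to S660.

    Every callable argument is an assumed interface:
      deep_mask(frame), background_mask(frame), color_mask(frame) -> mask
      make_result_mask(m1, m2, m3) -> result mask
      segment(frame, result_mask) -> foreground image
    """
    results = []
    for frame in frames:                              # S610: receive the image
        m1 = deep_mask(frame)                         # S620: deep learning model
        m2 = background_mask(frame)                   # S630: background modeling
        m3 = color_mask(frame)                        # S640: reference background color
        result_mask = make_result_mask(m1, m2, m3)    # S650: logical operation
        results.append(segment(frame, result_mask))   # S660: segment the object
    return results
```

Passing the stages as arguments keeps the sketch independent of any particular model or library, and makes it easy to omit or reorder stages as the description allows.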

In the above description, steps S610 to S660 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative and not restrictive in all respects. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: image input unit
120: foreground mask generator
121: first foreground mask generator
122: second foreground mask generator
123: third foreground mask generator
130: result mask generator
140: object segmentation unit

Claims

An image processing apparatus for performing object segmentation of an image, the apparatus comprising:
an image input unit configured to receive an image;
a foreground mask generator configured to generate, from the received image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image;
a result mask generator configured to generate a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask; and
an object segmentation unit configured to segment an object from the received image using the result mask.

The image processing apparatus of claim 1,
wherein the foreground mask generator comprises
a first foreground mask generator configured to generate the first foreground mask from the received image based on the deep learning model.

The image processing apparatus of claim 1,
wherein the foreground mask generator comprises
a second foreground mask generator configured to generate background information by performing the background modeling on the received image, and to generate the second foreground mask from the received image based on the background information.

The image processing apparatus of claim 1,
wherein the foreground mask generator comprises
a third foreground mask generator configured to predict an expected object area from the received image and to generate the reference background image by applying an inpainting algorithm to the expected object area.

The image processing apparatus of claim 4,
wherein the third foreground mask generator converts the reference background image into a first HSV (Hue, Saturation, Value) color space, extracts a first Hue component from the first HSV color space, converts the received image into a second HSV color space, extracts a second Hue component from the second HSV color space, and generates the third foreground mask based on the first Hue component and the second Hue component.

The image processing apparatus of claim 1,
wherein the result mask generator generates a ternary image based on a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask.

The image processing apparatus of claim 6,
wherein the logical operation includes at least one of an AND operation, an OR operation, and an XOR operation.

The image processing apparatus of claim 6,
wherein the result mask generator generates the ternary image by assigning, as a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 2 to pixels determined to be foreground in all of the masks, a value of 0 to pixels determined to be background in all of the masks, and a value of 1 to the other pixels.

The image processing apparatus of claim 6,
wherein the result mask generator generates a result binary mask by applying an image matting algorithm to the ternary image.

The image processing apparatus of claim 9,
wherein the object segmentation unit segments the object from the received image based on the result binary mask and the received image.

A method for performing object segmentation of an image, the method comprising:
receiving an image;
generating, from the received image, a first foreground mask based on a deep learning model, a second foreground mask based on background modeling, and a third foreground mask based on color information of a reference background image;
generating a result mask by performing a logical operation on at least two of the first foreground mask, the second foreground mask, and the third foreground mask; and
segmenting an object from the received image using the result mask.

The method of claim 11,
wherein generating the first foreground mask comprises generating the first foreground mask from the received image based on the deep learning model.

The method of claim 11,
wherein generating the second foreground mask comprises:
generating background information by performing the background modeling on the received image; and
generating the second foreground mask from the received image based on the background information.

The method of claim 11,
wherein generating the third foreground mask comprises:
predicting an expected object area from the received image; and
generating the reference background image by applying an inpainting algorithm to the expected object area.

The method of claim 14,
wherein generating the third foreground mask further comprises:
converting the reference background image into a first HSV (Hue, Saturation, Value) color space;
extracting a first Hue component from the first HSV color space;
converting the received image into a second HSV color space;
extracting a second Hue component from the second HSV color space; and
generating the third foreground mask based on the first Hue component and the second Hue component.

The method of claim 11,
wherein generating the result mask comprises:
performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask; and
generating a ternary image based on a result of performing the logical operation.

The method of claim 16,
wherein the logical operation includes at least one of an AND operation, an OR operation, and an XOR operation.

The method of claim 16,
wherein generating the result mask comprises generating the ternary image by assigning, as a result of performing the logical operation on corresponding pixels of the first foreground mask, the second foreground mask, and the third foreground mask, a value of 2 to pixels determined to be foreground in all of the masks, a value of 0 to pixels determined to be background in all of the masks, and a value of 1 to the other pixels.

The method of claim 16,
wherein generating the result mask comprises generating a result binary mask by applying an image matting algorithm to the ternary image.

The method of claim 19,
wherein segmenting the object comprises segmenting the object from the received image based on the result binary mask and the received image.