KR102338372B1

KR102338372B1 - Device and method to segment object from image

Info

Publication number: KR102338372B1
Application number: KR1020160022517A
Authority: KR
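Inventors: 유병인 (Byungin Yoo); 후앙 용젠 (Yongzhen Huang); 리앙 왕 (Liang Wang); 김정배 (Jungbae Kim); 최창규 (Changkyu Choi); 한재준 (Jaejoon Han)
Original assignee: 삼성전자주식회사 (Samsung Electronics Co., Ltd.); 중국과학원 자동화연구소 (Institute of Automation, Chinese Academy of Sciences)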
Priority date: 2015-09-30
Filing date: 2016-02-25
Publication date: 2021-12-13
Also published as: KR20170038622A

Abstract

A method and apparatus for segmenting an object from an image are provided. According to an embodiment, the apparatus may segment an object image from an input image by using a pre-trained image model.

Description

Method and apparatus for segmenting an object from an image {DEVICE AND METHOD TO SEGMENT OBJECT FROM IMAGE}

The following description relates to a technique for segmenting an object from an image.

In image-related technology, techniques for recognizing objects such as human faces in images have recently been developed. To recognize such an object, the portion of the image other than the background must first be extracted.

For example, to extract the portion of an image other than the background, a technique that segments an object based on depth information may be used. However, depth-based segmentation separates an object (e.g., a human body) by combining color information and depth information, so it requires a separate module for acquiring depth information in addition to the camera that acquires color information, and processing the depth information may demand a heavy computational load.

Accordingly, a technique for segmenting an object by using color information is needed.

According to an embodiment, a method of segmenting an object from an image may include receiving an input image including an object, generating an output image corresponding to the object from the input image by using an image model, and extracting an object image from the output image.

The extracting of the object image may include classifying each pixel of the output image according to an attribute, and extracting the object image by using the classified pixels.

The classifying may include comparing the pixel value of each pixel with a threshold value, and determining the attribute of each pixel based on a result of the comparison.

The extracting of the object image may include generating a mask image by binarizing the output image based on a result of comparing the pixel value of each pixel of the output image with a threshold value.

The extracting of the object image may further include generating a foreground image based on the mask image and the input image.

The extracting of the object image may include generating a foreground image from the output image based on a result of comparing the pixel value of each pixel of the output image with a threshold value.

The image model may be configured such that the resolution of the object image generated from the input image is the same as the resolution of the input image.

The image model may include a neural network, and an activation function of the neural network may include at least one nonlinear function.

According to an embodiment, an apparatus for segmenting an object from an image may include a memory configured to store an image model, and a processor configured to receive an input image including an object, generate an output image corresponding to the object from the input image by using the image model, and extract an object image from the output image.

The processor may classify each pixel of the output image according to an attribute, and extract an object image by using the classified pixels.

The processor may compare the pixel value of each pixel with a threshold value, and determine the attribute of each pixel based on a result of the comparison.

The processor may generate a mask image by binarizing the output image based on a result of comparing the pixel value of each pixel of the output image with a threshold value.

The processor may generate a foreground image based on the mask image and the input image.

The processor may generate a foreground image from the output image based on a result of comparing the pixel value of each pixel of the output image with a threshold value.

According to an embodiment, a method of learning to segment an object from an image may include receiving a reference training image and a reference object image, and training parameters of an image model, which generates an output image corresponding to an object from an input image including the object, such that a processor segments the reference object image from the reference training image by using the image model.
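This passage does not specify a loss function or optimizer for the training step. As a toy illustration only, the sketch below assumes a per-pixel binary cross-entropy loss and plain gradient descent, and replaces the neural network with a hypothetical single-weight per-pixel logistic model p_i = sigmoid(w * x_i + b). The names `bce_loss` and `train_pixel_model` are illustrative, not taken from the patent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a predicted map and a 0/1 reference mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def train_pixel_model(image, ref_mask, lr=1.0, steps=2000):
    """Fit the toy model p_i = sigmoid(w * x_i + b) to a reference object image.

    image and ref_mask are flattened lists of equal length; x_i is assumed to lie in [0, 1].
    """
    w, b = 0.0, 0.0
    n = len(image)
    for _ in range(steps):
        preds = [sigmoid(w * x + b) for x in image]
        # For sigmoid + BCE, dLoss/dz_i = p_i - t_i; average the gradients over all pixels.
        grad_w = sum((p - t) * x for p, t, x in zip(preds, ref_mask, image)) / n
        grad_b = sum(p - t for p, t in zip(preds, ref_mask)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

A real image model would have many more parameters (the connection weights of the network), but the loop has the same shape: predict a map, compare it with the reference object image, and step the parameters against the gradient.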

The image model may include a neural network that includes at least one nonlinear function as an activation function, and the neural network may be configured such that the resolution of the output image generated from the input image is the same as the resolution of the input image.

The image model may be trained based on images obtained by applying at least one of rotation, resizing, shifting, flipping, and noise addition to the reference training image.
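The five augmentations named above can be sketched on plain nested lists as follows. One assumption beyond the text: any geometric transform (rotation, resize, shift, flip) is applied identically to the reference object image so the pair stays aligned, while noise is added to the training image only so the reference mask stays binary. The helper names are hypothetical.

```python
import random


def flip_h(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift(img, dy, dx, fill=0):
    """Translate by (dy, dx) pixels, filling uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    return [
        [img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill for x in range(w)]
        for y in range(h)
    ]


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize to new_h x new_w."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)] for y in range(new_h)]


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def augment_pair(image, mask, rng, noise_sigma=0.05):
    """Apply one randomly chosen geometric transform to the image and its mask alike,
    then add noise to the image only."""
    ops = [
        flip_h,
        rot90,
        lambda m: shift(m, 1, 1),
        lambda m: resize_nearest(m, 2 * len(m), 2 * len(m[0])),
    ]
    op = rng.choice(ops)
    return add_noise(op(image), noise_sigma, rng), op(mask)
```

Passing a seeded `random.Random` makes an augmentation run reproducible.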

According to another embodiment, a method of segmenting an object from an image may include receiving an input image including an object, generating an intermediate image corresponding to the object from the input image by using a first image model, generating an output image corresponding to the object from the intermediate image by using a second image model, and extracting an object image from the output image.
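The two-stage flow above can be sketched as a simple composition. In this sketch the trained first and second image models are stood in for by arbitrary callables on 2-D lists, and the final extraction step is assumed to be a threshold at 0.5. `normalize` and `sharpen` are hypothetical toy stages used only to make the example runnable.

```python
def cascade_segment(image, first_model, second_model, threshold=0.5):
    """Two-stage segmentation: input -> intermediate -> output -> binary object mask.

    first_model and second_model are any callables that map a 2-D image (list of rows)
    to a 2-D map of the same size; they stand in for the trained image models.
    """
    intermediate = first_model(image)    # first image model: object map from the input
    output = second_model(intermediate)  # second image model sees only the intermediate image
    mask = [[1 if p > threshold else 0 for p in row] for row in output]
    return intermediate, output, mask


def normalize(img):
    """Hypothetical stage 1: scale 0-255 intensities to [0, 1]."""
    return [[v / 255 for v in row] for row in img]


def sharpen(m):
    """Hypothetical stage 2: push values away from 0.5 and clip to [0, 1]."""
    return [[min(1.0, max(0.0, 0.5 + 1.5 * (v - 0.5))) for v in row] for row in m]
```

For example, `cascade_segment([[0, 200], [255, 10]], normalize, sharpen)` runs both stages and returns the intermediate map, the output map, and the binary mask.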

According to another embodiment, a method of learning to segment an object from an image may include: receiving a reference training image and a reference object image; training parameters of a first image model, which generates an intermediate image corresponding to an object from an input image including the object, such that a processor segments the reference object image from the reference training image by using the first image model; generating a reference intermediate image from the reference training image by using the first image model; and training parameters of a second image model, which generates an output image corresponding to the object from the intermediate image, such that the processor segments the reference object image from the reference intermediate image by using the second image model.

FIG. 1 is a diagram illustrating an object image segmented from an image according to an embodiment.
FIG. 2 is a flowchart illustrating a method of segmenting an object from an image according to an embodiment.
FIGS. 3 to 5 are diagrams illustrating examples of generating an output image from an input image by using an image model according to an embodiment.
FIG. 6 is a flowchart illustrating a method of extracting an object image from an output image according to an embodiment.
FIG. 7 is a flowchart illustrating a method of classifying each pixel of an output image according to an attribute according to an embodiment.
FIGS. 8 and 9 are flowcharts illustrating methods of extracting an object image by using classified pixels according to an embodiment.
FIG. 10 is a flowchart illustrating a method of extracting an object image by using pixels of an output image according to an embodiment.
FIG. 11 is a block diagram illustrating a configuration of an apparatus for segmenting an object from an image according to an embodiment.
FIG. 12 is a flowchart illustrating a method of training an image model used to segment an object from an image according to an embodiment.
FIG. 13 is a diagram illustrating a configuration of an apparatus for training an image model used to segment an object from an image according to an embodiment.
FIG. 14 is a diagram illustrating an object image generated from an input image by using the image model trained in FIG. 13 according to an embodiment.
FIG. 15 is a flowchart illustrating a method of segmenting an image according to another embodiment.
FIG. 16 is a diagram illustrating an example of generating an output image from an input image by using an image model according to another embodiment.
FIG. 17 is a flowchart illustrating a method of training an image model used to segment an object from an image according to another embodiment.
FIG. 18 is a diagram illustrating a process of training an image model used to segment an object from an image according to another embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

Various modifications may be made to the embodiments described below. The embodiments described below are not intended to limit the forms of implementation, and should be understood to include all modifications, equivalents, and substitutes thereof.

The terminology used in the embodiments is only for describing particular embodiments and is not intended to limit the embodiments. Singular forms include plural forms unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by one of ordinary skill in the art to which the embodiments pertain. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In addition, in the description with reference to the accompanying drawings, like components are assigned like reference numerals regardless of figure numbers, and redundant descriptions thereof are omitted. In describing the embodiments, detailed descriptions of related known technologies are omitted when they would unnecessarily obscure the gist of the embodiments.

FIG. 1 is a diagram illustrating an object image segmented from an image according to an embodiment.

An apparatus for segmenting an object from an image according to an embodiment may segment an object image from an input image 110 that includes an object.

In this specification, an object may include a subject other than the background, such as a human, an animal, or a thing, and may also include part of a subject, such as a person's face, arm, leg, or another body part.

The input image 110 is a received image that may include an object. The input image 110 is a two-dimensional image, for example, a color image or a grayscale image. The input image 110 may include a plurality of pixels, each having a pixel value. When the input image 110 is a color image, the pixel value may be a color value (e.g., an RGB value, although another color space may be used); when the input image 110 is a grayscale image, the pixel value may be a brightness value or an intensity value. However, the input image 110 is not limited thereto and may be a three-dimensional image, in which case each pixel may further include a depth value.

The object image may be an image representing the object. For example, the object image is an image obtained by excluding the background portion from the input image 110, and may be a foreground image 120 including only the foreground portion or a mask image 130 including only the mask portion. The foreground image 120 is an image in which each pixel of the portion corresponding to the foreground keeps its corresponding pixel value, and the mask image 130 is an image in which the pixels corresponding to the foreground and the remaining pixels are distinguished by specific values (e.g., a pixel value of 1 for the foreground and 0 for the background). For example, as shown in FIG. 1, in the foreground image 120 the pixel values of the portion corresponding to the foreground may remain the same as in the input image 110, and in the mask image 130 the pixel values of the foreground and background portions may be binarized.
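The relationship between the two kinds of object image can be sketched as follows. This is an assumed, minimal reading: the foreground image keeps the input pixel value wherever the mask image is 1 and writes a background value (0 here) elsewhere. It works for grayscale values and for RGB tuples alike. `foreground_from_mask` is a hypothetical name.

```python
def foreground_from_mask(image, mask, background_value=0):
    """Build a foreground image from an input image and a binary mask image.

    Pixels where the mask is 1 keep their input value (grayscale or an RGB tuple);
    all other pixels are set to background_value.
    """
    return [
        [pix if m == 1 else background_value for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

For a color input, pass `background_value=(0, 0, 0)` so background pixels stay 3-tuples.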

Hereinafter, extracting an object image from the input image 110 is described.

FIG. 2 is a flowchart illustrating a method of segmenting an object from an image according to an embodiment.

First, in operation 210, the processor of the apparatus for segmenting an object from an image may receive an input image including an object. For example, the processor may receive the input image from an external source by wire or wirelessly, or may acquire it by capturing an image with a camera in the apparatus.

In operation 220, the processor may generate an output image corresponding to the object from the input image by using the image model trained to output a reference object image from a reference training image. The pixel value p_i of each pixel of the output image may correspond to the probability that the pixel represents the object. For example, when the output image is a mask, its pixel values may range from a minimum of 0 to a maximum of 1, and the closer a pixel value is to 1, the higher the probability that the pixel belongs to the mask.

The image model is a model trained to produce a specific output for a specific input, and may represent, for example, the parameters of a machine learning structure. A machine learning structure may be expressed as a black-box function that generates an output for a given input based on pre-trained parameters. According to an embodiment, the image model may be configured to output an output image representing an object from an input image. For example, the image model may include connection weights as parameters of a neural network, and may be trained to output a reference object image from a reference training image. Training of the image model is described in detail with reference to FIGS. 12 and 13.

Subsequently, in operation 230, the processor may extract an object image from the output image. For example, the processor may label the pixels of the output image by classifying them into pixels corresponding to the foreground and pixels not corresponding to the foreground. Extraction of the object image through pixel classification is described in detail with reference to FIGS. 6 to 10.

FIGS. 3 to 5 are diagrams illustrating examples of generating an output image from an input image by using an image model according to an embodiment.

In FIGS. 3 to 5, a case in which the image model includes connection weights as parameters of a neural network is described as an example. The neural networks corresponding to the image models shown in FIGS. 3 to 5 are already trained, and may be configured so that, for an input image whose i-th pixel (where i is an integer greater than or equal to 1) has a pixel value x_i, they output an output image whose i-th pixel has a pixel value p_i.

The neural network in this specification uses artificial neurons that simplify the functions of biological neurons, and the artificial neurons may be interconnected through connection lines having connection weights. A connection weight, which is a parameter of the neural network, is a specific value of a connection line and may also be referred to as a connection strength. The neural network may perform human cognitive functions or learning processes through the artificial neurons. An artificial neuron may also be referred to as a node.

The neural network may include a plurality of layers. For example, the neural network may include an input layer, a hidden layer, and an output layer. The input layer may receive an input for performing learning and transmit it to the hidden layer, and the output layer may generate an output of the neural network based on signals received from the nodes of the hidden layer. The hidden layer is located between the input layer and the output layer, and may transform the training data received through the input layer into values that are easier to predict. Nodes in the input layer and the hidden layer may be connected to each other through connection lines having connection weights, and nodes in the hidden layer and the output layer may likewise be connected through connection lines having connection weights. Each of the input layer, the hidden layer, and the output layer may include a plurality of nodes.
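The input-hidden-output structure above can be sketched as a forward pass in which each node sums its inputs multiplied by the connection weights. The specific activations are assumptions: the text only requires a nonlinear function, so ReLU is used for the hidden layer and a sigmoid for the output, which keeps each output in [0, 1] like the probability-style pixel values p_i. All names are hypothetical.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: weights[j][i] is the connection weight from input node i to node j."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b for w_row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> one hidden layer (ReLU) -> output layer (sigmoid)."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    return sigmoid(dense(hidden, w_out, b_out))
```

Training, as described elsewhere in this document, amounts to adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` so that the outputs approach the reference values.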

The neural network may include a plurality of hidden layers. A neural network including a plurality of hidden layers is called a deep neural network, and training a deep neural network is called deep learning. A node included in a hidden layer is called a hidden node. The output of a hidden node in a previous time interval may be connected to hidden nodes in the current time interval, and the output of a hidden node in the current time interval may be connected to hidden nodes in the next time interval. A neural network with recurrent connections between hidden nodes across different time intervals is called a recurrent neural network.

In addition, the hidden layer may include, for example, a convolution layer, a pooling layer, a normalization layer, and a fully connected layer. The convolution layer may be used to perform convolution filtering, which filters the information extracted by the previous layer with a filter of a predetermined size, and is denoted "C" in FIGS. 3 to 5. The pooling layer may be used to extract a representative value from the information of the previous layer through pooling (for example, in the pooling layer the processor may slide a window of a predetermined size over the information of the previous layer, such as the pixel values of an image, by a fixed number of cells at a time and extract the maximum value within the window), and is denoted "P" in FIGS. 3 to 5. The normalization layer is a layer in which the pixel values of an image are normalized, and is denoted "N" in FIGS. 3 to 5. The fully connected layer may be connected to all nodes of the previous layer, and is denoted "F" in FIGS. 3 to 5.
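The "C" and "P" operations above can be sketched on a single-channel image as follows. Two assumptions apply. First, like most deep-learning frameworks, this "convolution" is the unflipped cross-correlation. Second, both functions use no padding (a "valid" window), which the text does not specify. The max-pooling function slides a window by `stride` cells and keeps the maximum, as in the pooling example in the text.

```python
def conv2d(img, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - kh + 1, stride):
        row = []
        for x in range(0, w - kw + 1, stride):
            row.append(sum(kernel[i][j] * img[y + i][x + j] for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def max_pool(img, size=3, stride=2):
    """Slide a size x size window by `stride` cells and keep the maximum in each window."""
    h, w = len(img), len(img[0])
    return [
        [max(img[y + i][x + j] for i in range(size) for j in range(size))
         for x in range(0, w - size + 1, stride)]
        for y in range(0, h - size + 1, stride)
    ]
```

A convolution layer with 64 filters would call `conv2d` once per filter (summed over input channels) to produce 64 feature maps.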

According to an embodiment, the neural network shown in FIG. 3 may include an input layer, an output layer, and six hidden layers. The input layer may receive an input image 301. In the first layer 310 shown in FIG. 3, C1(64@5*5+S1) indicates a convolution layer that has, for example, 64 filters of size 5*5 moved with a stride of 1; P(3*3+S2) indicates a pooling layer whose window is, for example, of size 3*3 and moved with a stride of 2; and N indicates a normalization layer. The second layer 320 may include a convolution layer having 64 filters of size 5*5 with a stride of 1, a pooling layer having a 3*3 window with a stride of 2, and a normalization layer. The third layer 330 may include a convolution layer having 64 filters of size 3*3 with a stride of 1. The fourth layer 340 and the fifth layer 350 may each include a fully connected layer having 100 nodes. The sixth layer 360 may include a fully connected layer having 48*48 nodes. Here, the sixth layer 360, immediately before the output layer, may be configured such that the output layer outputs an image having the same resolution as the input layer (48*48 in FIG. 3). The neural network shown in FIG. 3 may be trained to output an output image 309 corresponding to a mask from the input image 301.
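The text does not state the padding of the FIG. 3 layers, so the feature-map sizes below rest on two assumptions: convolutions use "same" zero padding (k // 2), and pooling uses none. The sketch traces the spatial size through C1 to C3. It also shows why the 48*48-node sixth layer yields an output image at the input resolution: its flat output can be reshaped directly to 48x48. `FIG3_LAYERS` and the helper names are hypothetical.

```python
def out_size(size, kernel, stride, pad=0):
    """Spatial size after a conv/pool window: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


# Hypothetical reading of the FIG. 3 stack: (name, kind, filters, window, stride).
FIG3_LAYERS = [
    ("C1", "conv", 64, 5, 1), ("P1", "pool", None, 3, 2), ("N1", "norm", None, None, None),
    ("C2", "conv", 64, 5, 1), ("P2", "pool", None, 3, 2), ("N2", "norm", None, None, None),
    ("C3", "conv", 64, 3, 1),
]


def trace_fig3(input_size=48, input_channels=3):
    """Return per-layer (name, spatial size, channels) and the flattened size fed to F4."""
    size, channels, trace = input_size, input_channels, []
    for name, kind, filters, k, s in FIG3_LAYERS:
        if kind == "conv":
            size, channels = out_size(size, k, s, pad=k // 2), filters  # assumed "same" padding
        elif kind == "pool":
            size = out_size(size, k, s)  # assumed no padding
        # Normalization layers leave the shape unchanged.
        trace.append((name, size, channels))
    return trace, channels * size * size


def reshape_output(vec, height, width):
    """Turn the flat output of the last fully connected layer (height*width nodes) into an image."""
    if len(vec) != height * width:
        raise ValueError("last FC layer must have height*width nodes")
    return [vec[r * width:(r + 1) * width] for r in range(height)]
```

Under these assumptions the map shrinks 48 -> 23 -> 11 through the two pooling layers, so F4 receives 64 * 11 * 11 inputs. F6 then restores the full 48*48 resolution because its node count is chosen to match.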

According to another embodiment, the neural network shown in FIG. 4 may include an input layer, an output layer, and eight hidden layers. The first layer 410 shown in FIG. 4 may include a convolution layer having 48 filters of size 5*5 with a stride of 1, a pooling layer having a 3*3 window with a stride of 2, and a normalization layer. The second layer 420 may include a convolution layer having 128 filters of size 5*5 with a stride of 1, a pooling layer having a 3*3 window with a stride of 2, and a normalization layer. The third layer 430 and the fourth layer 440 may each include a convolution layer having 192 filters of size 3*3 with a stride of 1. The fifth layer 450 may include a convolution layer having 64 filters of size 3*3 with a stride of 1 and a pooling layer having a 3*3 window with a stride of 2. The sixth layer 460 and the seventh layer 470 may each include a fully connected layer having 1024 nodes. The eighth layer 480 may include a fully connected layer having 112*112 nodes, configured such that the resolution of the input image 401 (112*112 in FIG. 4) is the same as the resolution of the output image 409 (112*112 in FIG. 4).

According to yet another embodiment, the first to eighth layers 510, 520, 530, 540, 550, 560, 570, and 580 of the neural network shown in FIG. 5 may have the same structure as the layers in FIG. 4. However, the neural network shown in FIG. 5 may be trained to output an output image 509 corresponding to the foreground from an input image 501. Thus, neural networks with the same structure may still produce different output images for the same input image, depending on the training data.

FIG. 6 is a flowchart illustrating a method of extracting an object image from an output image according to an embodiment.

According to an embodiment, FIG. 6 illustrates step 230 of FIG. 2 in more detail.

First, in step 610, the processor may classify each pixel of the output image according to its attribute. The attribute of a pixel may indicate whether the pixel corresponds to an object, a part of an object, the foreground, or the background of the output image. For example, the attribute may indicate whether or not the pixel belongs to the foreground of the output image. Pixel classification is described in detail with reference to FIG. 7.

Then, in step 620, the processor may extract an object image using the classified pixels. For example, the processor may generate the object image by collecting the pixels classified as belonging to the object. Extraction of the object image from the classified pixels is described in detail with reference to FIGS. 8 to 10.

FIG. 7 is a flowchart illustrating a method of classifying each pixel of an output image according to its attribute, according to an embodiment.

According to an embodiment, FIG. 7 illustrates step 610 of FIG. 6 in more detail.

First, in step 710, the processor may compare the pixel value of each pixel with a threshold. For example, the processor may determine whether the value of each pixel of the output image generated in step 220 of FIG. 2 is greater than the threshold. The threshold may be set to a value suitable for classifying pixels. For example, when segmenting a mask image, pixels corresponding to the background have a value of 0 and pixels corresponding to the mask have a value of 1, so the threshold may be set to 0.5. As another example, when segmenting a foreground image, pixels corresponding to the background may have a value of 0 and the maximum pixel value may be 255, so the threshold may be set to 127. However, the minimum value, the maximum value, and the threshold are not limited to these examples and may be changed according to the design.

Then, in step 720, the processor may determine the attribute of each pixel based on the comparison result. The processor may determine that pixels greater than the threshold have the foreground attribute or the mask attribute, and that pixels at or below the threshold have the background attribute. However, the disclosure is not limited thereto: if, by design, the value corresponding to the background attribute is greater than the value corresponding to the foreground attribute, the processor may instead determine that pixels greater than the threshold have the background attribute.

For example, in mask image segmentation, the closer a pixel value of the output image generated in step 220 of FIG. 2 is to 1, the more likely the pixel belongs to the mask; the closer it is to 0, the more likely it belongs to the background. Accordingly, the processor may assign the mask attribute to pixels of the output image whose value is greater than the threshold of 0.5, and the background attribute to pixels whose value is 0.5 or less. Likewise, in foreground image segmentation, the closer a pixel value is to 0, the more likely the pixel belongs to the background, and the closer it is to 255, the more likely it belongs to the foreground, so the processor may decide whether each pixel is foreground or background using the threshold of 127. However, the disclosure is not limited thereto, and the processor may also determine whether a pixel belongs to an object or to a part of an object.
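
Steps 710 and 720 amount to a per-pixel threshold comparison. The following is a minimal sketch, assuming the output image is a 2-D list of numbers; the function name and the string labels are illustrative, not taken from the text. Passing the labels in the opposite order covers the design in which the background has the larger value.

```python
def classify_pixels(output, threshold, above="foreground", at_or_below="background"):
    """Assign an attribute label to every pixel of a 2-D output image.

    Pixels strictly greater than `threshold` get `above`; pixels at or below
    it get `at_or_below` (step 720). Swap the two labels for a design in which
    the background is encoded with the larger value.
    """
    return [[above if value > threshold else at_or_below for value in row]
            for row in output]


# Mask-style output in [0, 1]: threshold 0.5, "mask" vs. "background".
mask_labels = classify_pixels([[0.1, 0.8], [0.5, 0.9]], 0.5, above="mask")
# Foreground-style output in [0, 255]: threshold 127.
fg_labels = classify_pixels([[0, 127], [128, 255]], 127)
print(mask_labels)
print(fg_labels)
```

Note that a pixel exactly at the threshold (0.5 or 127) is labelled background, matching the "0.5 or less" rule above.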

FIGS. 8 and 9 are flowcharts illustrating methods of extracting an object image using classified pixels according to embodiments.

According to an embodiment, FIG. 8 is a flowchart illustrating an example of how step 620 of FIG. 6 may be performed. To perform the method of FIG. 8, the image model stored in the memory of the apparatus for segmenting an object from an image may be a model trained to output a reference mask image from a reference training image. The reference mask image is the mask image designated as the desired output for the reference training image.

First, in step 810, the processor may generate a mask image by binarizing the output image based on the determined pixel attributes. For example, the processor may set the value of each pixel assigned the mask attribute in step 610 of FIG. 6 to 1 and the value of each pixel assigned the background attribute to 0, producing a mask image in which every pixel has a binary value. However, the disclosure is not limited thereto, and the processor may instead set the value corresponding to the background to 1 and the value corresponding to the mask to 0. In addition, the binary values are not limited to 0 and 1; any two distinct real numbers may be used. The following description assumes a value of 1 for the mask attribute and 0 for the background attribute.
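
Step 810 maps each attribute label to one of two values. A minimal sketch, assuming the labels from step 610 are stored as a 2-D list of strings (the function name and label strings are hypothetical):

```python
def labels_to_mask(labels, mask_label="mask", mask_value=1, background_value=0):
    """Binarize per-pixel attribute labels into a mask image (step 810).

    Pixels labelled `mask_label` get `mask_value`; all others get
    `background_value`. Pass mask_value=0, background_value=1 for the
    inverted convention, or any two distinct numbers.
    """
    if mask_value == background_value:
        raise ValueError("the two binary values must differ")
    return [[mask_value if label == mask_label else background_value for label in row]
            for row in labels]


labels = [["mask", "background"], ["background", "mask"]]
print(labels_to_mask(labels))                                   # 1 = mask, 0 = background
print(labels_to_mask(labels, mask_value=0, background_value=1))  # inverted convention
```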

Then, in step 820, the processor may generate a foreground image based on the input image and the mask image generated in step 810. For example, the processor may generate the foreground image by multiplying the value of each pixel of the mask image by the value of the corresponding pixel of the input image. Because the mask image has a value of 1 in the mask region and 0 elsewhere, this multiplication removes the non-mask portion of the input image and keeps only the pixel values inside the mask.
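
The multiplication in step 820 is an element-wise product of the mask and the input image. The sketch below assumes both are 2-D lists of the same resolution and that color pixels, if present, are tuples of channel values; these data layouts are assumptions for illustration.

```python
def apply_mask(mask, image):
    """Multiply each input-image pixel by the corresponding mask value (step 820).

    `mask` holds 0/1 values; `image` holds grayscale numbers or per-channel
    tuples such as (R, G, B). Both must have the same resolution.
    """
    if len(mask) != len(image) or any(len(m) != len(r) for m, r in zip(mask, image)):
        raise ValueError("mask and input image must have the same resolution")
    result = []
    for mask_row, image_row in zip(mask, image):
        out_row = []
        for m, px in zip(mask_row, image_row):
            if isinstance(px, tuple):          # color pixel: scale every channel
                out_row.append(tuple(m * c for c in px))
            else:                              # grayscale pixel
                out_row.append(m * px)
        result.append(out_row)
    return result


print(apply_mask([[1, 0], [0, 1]], [[10, 20], [30, 40]]))
print(apply_mask([[1, 0]], [[(1, 2, 3), (4, 5, 6)]]))
```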

According to an embodiment, when a mask image is the segmentation target, the processor may skip step 820. Step 820 may be performed when a foreground image is the segmentation target.

According to another embodiment, FIG. 9 is a flowchart illustrating another example of how step 620 of FIG. 6 may be performed. To perform the method of FIG. 9, the image model stored in the memory of the apparatus for segmenting an object from an image may be a model trained to output a reference foreground image from a reference training image. The reference foreground image is the foreground image designated as the desired output for the reference training image.

In step 910, the processor may generate a foreground image from the output image based on the determined pixel attributes. For example, the processor may generate the foreground image by keeping the pixel values of the portion of the output image corresponding to the foreground and resetting the pixel values of the non-foreground portion (for example, setting them to 0).
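
Unlike step 820, step 910 works directly on the model output and needs no input image. A minimal sketch, assuming a 2-D list of 0 to 255 values and treating "greater than the threshold" as the foreground attribute decided in step 720 (the function name is hypothetical):

```python
def foreground_from_output(output, threshold, reset_value=0):
    """Step 910 sketch: keep foreground pixel values, reset the rest.

    A pixel is treated as foreground when its value exceeds `threshold`
    (the attribute decided in step 720); other pixels are set to `reset_value`.
    """
    return [[value if value > threshold else reset_value for value in row]
            for row in output]


print(foreground_from_output([[10, 200], [127, 128]], 127))
```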

FIG. 10 is a flowchart illustrating a method of extracting an object image using pixels of an output image according to an embodiment.

According to an embodiment, for convenience of explanation, FIG. 10 shows in a single flowchart, as part of step 230 of FIG. 2, an example of the process of generating a mask image (step 810) and examples of the processes of generating a foreground image (steps 910 and 820). Depending on the embodiment, only one of steps 810, 910, and 820 in FIG. 10 may be performed. However, the disclosure is not limited thereto, and steps 810, 910, and 820 may be selectively performed according to the design.

First, as described above, the processor may generate the output image 1010 in step 220. The i-th pixel of the output image 1010 may have a pixel value p_i.

Then, in step 1020, the processor may extract the pixels whose value p_i is greater than the threshold τ. The threshold τ was described above with reference to FIG. 7, so a detailed description is omitted. For example, the processor may label pixels with p_i greater than the threshold τ as having the foreground attribute or the mask attribute.

Next, in step 810, the processor may generate a mask image 1030 by collecting the pixels determined to have the mask attribute, as described above with reference to FIG. 8.

In step 910, the processor may generate a foreground image 1040 by collecting the pixels determined to have the foreground attribute, as described above with reference to FIG. 9.

In step 820, the processor may generate the mask image 1030 by collecting the pixels determined to have the mask attribute, as described above with reference to FIG. 8, and then generate a foreground image 1050 using the mask image 1030 and the input image 1001.

FIG. 11 is a block diagram illustrating the configuration of an apparatus for segmenting an object from an image according to an embodiment.

The apparatus 1100 for segmenting an object from an image includes a processor 1110 and a memory 1120.

The processor 1110 may receive an input image including an object, generate an output image from the input image using an image model, and extract an object image from the output image. The detailed operation of the processor 1110 has been described above with reference to FIGS. 1 to 10 and is omitted here.

The memory 1120 may store an image model trained to output a reference object image from a reference training image. The memory 1120 may also store, temporarily or permanently, the inputs, intermediate results, and final results of image processing, such as the input image, the output image, and the object image.

The apparatus 1100 for segmenting an object may further include a camera (not shown), which may acquire the input image by photographing the scene outside the apparatus 1100. The apparatus 1100 may also further include a communication unit (not shown), which may receive the input image from an external source over a wired or wireless connection.

The apparatus 1100 according to an embodiment may use an image model (for example, a neural network) to separate an object from an image by making decisions per image rather than per pixel. For example, instead of determining for each pixel whether a patch around that pixel is foreground or background, the apparatus 1100 segments the object by determining the attributes of all pixels of the entire input image at once. Segmentation therefore takes less time, so it can be fast, and it may also achieve high accuracy. The apparatus 1100 may be implemented as a mobile device such as a smartphone or a stationary device such as a PC, or may be implemented as a chip mounted in a mobile phone, a TV, or the like.

FIG. 12 is a flowchart illustrating a method of training an image model used to segment an object from an image according to an embodiment.

First, in step 1210, the model learning unit of the apparatus for training an image model may receive training data including a reference training image and a reference object image. The reference training image is the image used as input during training, and the reference object image is the image preset as the desired output for a specific reference training image. The training data may include training pairs, each consisting of a reference training image and the reference object image mapped to it.

Then, in step 1220, the model learning unit may augment the training data by applying at least one of rotation, resizing, shifting, flipping, and noise adding to the reference training image. For a given pair of a reference training image and a reference object image, the model learning unit may rotate, resize, shift, flip, or add noise to the reference training image to produce additional reference training images mapped to the same reference object image.

Rotation refers to image processing that rotates the reference training image by a given angle. For example, the model learning unit may rotate the reference training image by an angle randomly selected between -8 and +8 degrees. Resizing refers to image processing that enlarges or shrinks the reference training image. For example, the model learning unit may rescale the reference training image by a factor randomly selected between 0.9 and 1.1. Shifting refers to image processing that crops the reference training image. For example, the model learning unit may crop a region of random size at a random position within the reference training image. Flipping refers to image processing that flips the reference training image vertically or horizontally. For example, the model learning unit may flip the shifted (cropped) reference training image with a probability of 50%. Noise adding refers to image processing that adds Gaussian noise to the reference training image. For example, the model learning unit may add Gaussian noise with a mean of 0 and a deviation of 0.9 to each pixel of the reference training image.
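
The augmentation ranges above can be sketched as a parameter sampler plus the three operations that need no interpolation (crop, flip, noise). This is an illustration only, assuming a 2-D grayscale image stored as nested lists. The minimum crop size, the random choice of flip axis, and reading "deviation 0.9" as a standard deviation are assumptions; rotation and resizing parameters are sampled but not applied, since they would need an image library.

```python
import random


def sample_augmentation(rng, height, width):
    """Draw one set of augmentation parameters using the ranges in the text.

    The minimum crop size (half of each dimension) and the choice between a
    horizontal and a vertical flip are assumptions; the text leaves them open.
    """
    crop_h = rng.randint(height // 2, height)
    crop_w = rng.randint(width // 2, width)
    return {
        "angle_deg": rng.uniform(-8.0, 8.0),   # rotation, -8 to +8 degrees
        "scale": rng.uniform(0.9, 1.1),        # resize factor
        "crop": (rng.randint(0, height - crop_h), rng.randint(0, width - crop_w),
                 crop_h, crop_w),              # shift = random crop (top, left, h, w)
        "flip": rng.random() < 0.5,            # flip with 50% probability
        "flip_axis": rng.choice(["horizontal", "vertical"]),
        "noise_sigma": 0.9,                    # "deviation 0.9" read as std. dev.
    }


def crop(image, top, left, h, w):
    return [row[left:left + w] for row in image[top:top + h]]


def flip(image, axis):
    return [row[::-1] for row in image] if axis == "horizontal" else image[::-1]


def add_gaussian_noise(image, sigma, rng):
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]


def augment(image, rng):
    """Crop, maybe flip, then add noise to a 2-D grayscale image.

    Rotation and resizing need interpolation, so only their parameters are
    sampled here; a real pipeline would apply them with an image library.
    """
    params = sample_augmentation(rng, len(image), len(image[0]))
    out = crop(image, *params["crop"])
    if params["flip"]:
        out = flip(out, params["flip_axis"])
    return add_gaussian_noise(out, params["noise_sigma"], rng), params


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    augmented, params = augment(image, rng)
    print(params)
```

As in the text, only the training image is transformed here; each augmented image would be paired with the original reference object image.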

Next, in step 1230, the model learning unit may train the image model based on the augmented training data. The training process is described in detail with reference to FIG. 13.

FIG. 13 is a diagram illustrating the configuration of an apparatus for training an image model used to segment an object from an image according to an embodiment.

The apparatus 1300 for training an image model includes a model learning unit 1310 and a training data storage 1320. The model learning unit 1310 may include at least one processor and may train the image model. For example, the model learning unit 1310 may receive a reference training image 1301 and a reference object image 1309 as a pair from the training data storage 1320. The training data storage 1320 may include at least one memory and may store training data 1325 used to train the image model. The training data 1325 may include at least one training pair in which a reference training image 1301 and a reference object image 1309 are mapped to each other. The training process is described in detail below, taking as an example the case in which the image model includes the parameters of a neural network.

According to an embodiment, the apparatus 1300 for training an image model may perform a method of learning to segment an object from an image. For example, the apparatus 1300 may train the parameters of an image model (for example, the parameters of the neural network 1311) that generates, from an input image including an object (1401 in FIG. 14), an output image corresponding to the object (1405 in FIG. 14), so that a processor (1110 in FIG. 14) using the image model segments the reference object image 1309 from the reference training image 1301. For example, the apparatus 1300 may train the neural network 1311 through supervised learning. In supervised learning, a reference training image 1301 and its corresponding reference object image 1309 are provided to the neural network 1311 together, and the connection weights are updated so that the network outputs the reference object image 1309 for the reference training image 1301. For example, the apparatus 1300 may update the connection weights between artificial neurons using the delta rule, error backpropagation learning, or the like.

Error backpropagation learning estimates the error for a given reference training image 1301 through forward computation, propagates the estimated error backward from the output layer through the hidden layers toward the input layer, and updates the connection weights so as to reduce the error. The neural network 1311 processes data in the direction input layer → hidden layers → output layer, whereas in error backpropagation learning the connection weights are updated in the direction output layer → hidden layers → input layer. For example, stochastic gradient descent may be used to apply the weight updates in error backpropagation learning. The initial connection weights of each layer may be drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 0.01, and the biases of the convolution layers and the fully connected layers may be initialized to 0. The learning rate may start at 0.001 and be reduced to 0.0001.
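
The initialization and learning-rate values above can be written down directly. A minimal sketch in pure Python: weights are drawn from N(0, 0.01²) and biases start at 0, as stated. The text gives only the start and end learning rates, so the single drop at a hypothetical `decay_step` is an assumption, not the described schedule.

```python
import random


def init_layer(n_in, n_out, rng, std=0.01):
    """Initialize one layer as described: weights ~ N(0, 0.01^2), biases = 0."""
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def learning_rate(step, decay_step, lr_start=1e-3, lr_end=1e-4):
    """Step schedule from 0.001 to 0.0001.

    The text gives only the start and end values; dropping once at a
    hypothetical `decay_step` is an assumption.
    """
    return lr_start if step < decay_step else lr_end


if __name__ == "__main__":
    w, b = init_layer(n_in=4, n_out=2, rng=random.Random(0))
    print(w, b)
    print([learning_rate(s, decay_step=10_000) for s in (0, 9_999, 10_000)])
```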

The apparatus 1300 for training an image model may define an objective function that measures how close the currently set connection weights are to optimal, repeatedly change the connection weights based on the value of the objective function, and iterate the training. For example, the objective function may be an error function that computes the error between the value the neural network 1311 actually outputs for the reference training image 1301 and the desired expected value. The apparatus 1300 may update the connection weights in the direction that reduces the value of the error function. The error function is the squared L2 norm, and the error L_i of the i-th pixel of the output image may be expressed as in Equation 1 below.
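
The equation image itself is not reproduced in this text. Taking "squared L2 norm" literally, with m_i and p_i as defined in the following paragraph, a consistent reading of Equation 1 is the following (a reconstruction, not the original figure):

$$
L_i = \lVert m_i - p_i \rVert_2^2 = (m_i - p_i)^2
$$

Since m_i and p_i are scalars, the norm reduces to a squared difference, and its derivative with respect to p_i is 2(p_i - m_i).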

In Equation 1, m_i may denote the binary value of the i-th pixel of the reference object image 1309 mapped to the reference training image 1301, and p_i may denote the value of the i-th pixel of the output image generated by the neural network 1311 for the reference training image 1301, which may be expressed as in Equation 2 below.
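
Equation 2 is likewise missing from this text. From the definitions of f and g given in the following paragraph, a consistent reading is that the pixel output is g applied to the convolutional features (again a reconstruction, not the original figure):

$$
p_i = g\big(f(x_i)\big)
$$

If g ends in a sigmoid, p_i lies in (0, 1), which matches the 0.5 threshold used for mask images.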

In Equation 2, f(x_i) may denote the value obtained by projecting the reference training image 1301 into a feature space through one or more convolutional filtering operations, and g() may denote the function that produces the final result of the neural network 1311 through the fully connected layers.
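
To make the per-pixel error concrete, the toy sketch below assumes p_i = g(f(x_i)) with a hypothetical f (a single linear projection of a per-pixel feature vector) and g (a sigmoid), sums the squared errors (m_i - p_i)^2 over pixels, and takes gradient steps through the sigmoid. It uses full-batch steps on four pixels for brevity; stochastic gradient descent would use mini-batches, and the real network would use the convolution and fully connected layers of FIG. 4.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, w, b):
    """p_i = g(f(x_i)) with a toy f (linear projection) and g = sigmoid."""
    return [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in features]


def squared_l2_loss(p, m):
    """Sum over pixels of L_i = (m_i - p_i)^2."""
    return sum((mi - pi) ** 2 for pi, mi in zip(p, m))


def sgd_step(features, m, w, b, lr):
    """One gradient step on the summed per-pixel squared error."""
    p = forward(features, w, b)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, pi, mi in zip(features, p, m):
        dz = 2.0 * (pi - mi) * pi * (1.0 - pi)   # dL_i/dz_i through the sigmoid
        for k, xk in enumerate(x):
            grad_w[k] += dz * xk
        grad_b += dz
    return [wk - lr * gk for wk, gk in zip(w, grad_w)], b - lr * grad_b


if __name__ == "__main__":
    # Hypothetical per-pixel feature vectors x_i and binary targets m_i.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    targets = [1, 0, 1, 0]
    w, b = [0.0, 0.0], 0.0
    for step in range(100):
        w, b = sgd_step(features, targets, w, b, lr=0.5)
    print(squared_l2_loss(forward(features, w, b), targets))
```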

According to an embodiment, the image model may be configured so that the resolution of the output image or the object image generated from the input image equals the resolution of the input image. In addition, the image model may include the neural network 1311, and the activation functions of the neural network 1311 may include at least one nonlinear function (for example, a sigmoid neuron function). Furthermore, the image model may be trained on images obtained by applying at least one of rotation, resizing, shifting, flipping, and noise adding to the reference training image 1301.

FIG. 14 is a diagram illustrating an object image generated from an input image using the image model trained in FIG. 13, according to an embodiment.

According to an embodiment, the image model 1121 stored in the memory 1120 of the apparatus 1100 for segmenting an object from an image may be the model trained as described with reference to FIG. 13.

The processor 1110 may generate an output image 1405 from the input image 1401 using the image model 1121 trained in FIG. 13. The value of each pixel of the output image 1405 may represent, for example, the probability that the pixel corresponds to the mask. As shown in the output image 1405 of FIG. 14, some pixels may have a nonzero probability of corresponding to the mask even in regions that are not actually part of the object. As described above with reference to FIG. 10, pixels with a low probability of corresponding to the mask may be removed by comparison with the threshold.

FIG. 15 is a flowchart illustrating a method of segmenting an image according to another embodiment.

First, in step 1510, the processor of the apparatus for segmenting an object from an image may receive an input image including an object. The received input image may be as described above with reference to FIG. 1.

Then, in step 1520, the processor may generate an intermediate image corresponding to the object from the input image using a first image model. The first image model may be configured similarly to the image model described above with reference to FIG. 2 and may be used for the first-stage segmentation of the object image from the input image. For example, the first image model may be trained to output a reference object image from a reference training image. Training of the first image model is described in detail with reference to FIGS. 17 and 18. The intermediate image represents the intermediate result that the processor generates from the input image using the first image model.

Next, in step 1530, the processor may generate an output image corresponding to the object from the intermediate image using a second image model. The second image model may be configured similarly to the image model described above with reference to FIG. 2 and may be used for the second-stage segmentation of the object image from the intermediate image. For example, the second image model may be trained to output the reference object image from a reference intermediate image, which is the result of applying the first image model to the reference training image. Training of the second image model is described in detail with reference to FIGS. 17 and 18. The output image represents the final result that the processor generates from the intermediate image using the second image model.

As described above, the apparatus for segmenting an object from an image according to another embodiment first generates the intermediate image as a rough (coarse) result from the input image by using the first image model, and then generates the output image as a fine result from the intermediate image by using the second image model.
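The coarse-to-fine inference of operations 1520 and 1530 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: images are assumed to be single-channel 2D lists of floats, and `cascade_segment`, `toy_first_model`, and `toy_second_model` are hypothetical names. The toy models merely stand in for the trained first and second image models.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]              # H x W grid of pixel values (single channel assumed)
ImageModel = Callable[[Image], Image]  # any trained model mapping an image to a same-sized map


def cascade_segment(input_image: Image,
                    first_model: ImageModel,
                    second_model: ImageModel) -> Tuple[Image, Image]:
    """Run the two image models in sequence and return (intermediate, output)."""
    intermediate = first_model(input_image)  # operation 1520: rough first-pass result
    output = second_model(intermediate)      # operation 1530: refined second-pass result
    return intermediate, output


def toy_first_model(img: Image) -> Image:
    """Hypothetical stand-in: rescale 8-bit intensities to [0, 1] as a rough map."""
    return [[min(1.0, max(0.0, p / 255.0)) for p in row] for row in img]


def toy_second_model(img: Image) -> Image:
    """Hypothetical stand-in: push values toward 0 or 1 with a smoothstep curve."""
    return [[p * p * (3.0 - 2.0 * p) for p in row] for row in img]


if __name__ == "__main__":
    image = [[0, 128, 255], [255, 255, 0]]
    intermediate, output = cascade_segment(image, toy_first_model, toy_second_model)
    print(output)
```

Because the two stages are passed in as independent callables, nothing in this sketch requires them to share a structure, type, or parameters.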

In operation 1540, the processor may extract the object image from the output image. The process of extracting the object image from the output image is the same as described above with reference to FIGS. 1 to 10.
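The thresholding, binarization, and masking recited in the claims for extracting the object image can be sketched as follows. The threshold of 0.5, the single-channel images, and the function name `extract_object` are assumptions for illustration; the document does not fix a threshold value. For a color input, the same binary mask would be multiplied into each channel.

```python
from typing import List, Tuple

Image = List[List[float]]


def extract_object(output_image: Image,
                   input_image: Image,
                   threshold: float = 0.5) -> Tuple[List[List[int]], Image]:
    """Return (mask_image, foreground_image) from a per-pixel probability map."""
    if len(output_image) != len(input_image) or any(
            len(a) != len(b) for a, b in zip(output_image, input_image)):
        raise ValueError("output and input images must have the same resolution")
    # Mask property: value > threshold -> 1; background property: value <= threshold -> 0.
    mask = [[1 if p > threshold else 0 for p in row] for row in output_image]
    # Foreground image: per-pixel product of the binary mask and the input image.
    foreground = [[m * x for m, x in zip(m_row, x_row)]
                  for m_row, x_row in zip(mask, input_image)]
    return mask, foreground


if __name__ == "__main__":
    probs = [[0.9, 0.2], [0.5, 0.51]]
    pixels = [[10, 20], [30, 40]]
    print(extract_object(probs, pixels))
```

Note that a pixel exactly at the threshold is assigned the background property, matching the "less than or equal to" wording of the claims.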

According to an embodiment, the apparatus 1100 described above with reference to FIG. 11 may perform the method of FIG. 15. For example, the processor 1110 of FIG. 11 may perform operations 1510 to 1540 described above, and the memory 1120 of FIG. 11 may store the first image model and the second image model described above.

FIG. 16 is a diagram illustrating an example of generating an output image from an input image by using image models according to another embodiment.

FIG. 16 illustrates, as an example, a case in which the first image model and the second image model each include connection weights as parameters of a neural network. The first neural network corresponding to the first image model and the second neural network corresponding to the second image model shown in FIG. 16 may already have been trained.

According to an embodiment, the first neural network and the second neural network shown in FIG. 16 may each include an input layer, an output layer, and six hidden layers. For example, the input layer of the first neural network may receive the input image 1601. The first to eighth layers 1611, 1612, 1613, 1614, 1615, 1616, 1617, and 1618 of the first neural network may have the same structure as the layers in FIG. 4. However, the first neural network shown in FIG. 16 may be trained to output an object image from the input image 1601, and its output layer may output an intermediate image 1605.

Likewise, the input layer of the second neural network shown in FIG. 16 may receive the intermediate image 1605 described above. The first to eighth layers 1621, 1622, 1623, 1624, 1625, 1626, 1627, and 1628 of the second neural network may have the same structure as the layers in FIG. 4. However, the second neural network shown in FIG. 16 may be trained to output an object image from the intermediate image 1605, and its output layer may output an output image 1609 corresponding to the object.

Although the first image model and the second image model are illustrated in FIG. 16 as having the same structure, embodiments are not limited thereto, and the two models may be configured to have different structures. Similarly, although both models are described in FIG. 16 as neural-network-type models, they are not limited thereto and may be different types of image models. In addition, the first image model and the second image model may have different parameters depending on the training results, but they may also have the same parameters.

FIG. 17 is a flowchart illustrating a method of training an image model used to segment an object from an image according to another embodiment.

First, in operation 1710, a model trainer of the apparatus for training an image model may receive a reference training image and a reference object image. The reference training image is an image used as an input during training and may be used to train the first image model. The reference object image is an image preset as the output that should be produced for a specific reference training image. The same reference object image may be used to train both the first image model and the second image model. The training data may include training pairs, each consisting of a reference training image and the reference object image mapped to it.

In operation 1720, the apparatus for training the image model may train the parameters of the first image model, which generates an intermediate image corresponding to the object from an input image including the object, such that the processor segments the reference object image from the reference training image by using the first image model. For example, the first image model may be trained through a process similar to that described above with reference to FIG. 13.

Subsequently, in operation 1730, the apparatus for training the image model may generate a reference intermediate image from the reference training image by using the first image model. The reference intermediate image is generated in order to train the second image model and may be mapped to the reference object image.

In operation 1740, the apparatus for training the image model may train the parameters of the second image model, which generates an output image corresponding to the object from an intermediate image, such that the processor segments the reference object image from the reference intermediate image by using the second image model. For example, the second image model may be trained through a process similar to that described above with reference to FIG. 13, except that it is trained to segment the reference object image from the reference intermediate image generated in operation 1730.

According to an embodiment, the apparatus for training the image model may first complete training of the first image model in operation 1720 and then, in operation 1740, train the second image model based on the reference object image and the reference intermediate image generated by the fully trained first image model. However, embodiments are not limited thereto: even while the first image model is still being trained in operation 1720, the apparatus may generate a reference intermediate image in operation 1730 by using the first image model in training, and may train the second image model in operation 1740 by using that reference intermediate image. In other words, training of the first image model for first-pass object segmentation and training of the second image model for second-pass object segmentation need not be performed separately and may be performed simultaneously.
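The sequential variant of operations 1720 to 1740 can be sketched with a deliberately tiny stand-in model. Everything here is hypothetical: `PixelLogisticModel` is a per-pixel logistic regressor (not the eight-layer networks of FIG. 18), and the squared-error loss, plain gradient descent, learning rate, and epoch count are assumptions. The point is only the data flow: the second model is trained on the first model's outputs, paired with the same reference object images.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Pair = Tuple[Image, Image]  # (model input, reference object image)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class PixelLogisticModel:
    """Hypothetical stand-in image model: p = sigmoid(w * x + b) at every pixel."""

    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def __call__(self, img: Image) -> Image:
        return [[_sigmoid(self.w * x + self.b) for x in row] for row in img]

    def fit(self, pairs: List[Pair], lr: float = 0.5, epochs: int = 200) -> None:
        """Full-batch gradient descent on the mean squared error."""
        for _ in range(epochs):
            grad_w = grad_b = 0.0
            count = 0
            for x_img, y_img in pairs:
                for x_row, y_row in zip(x_img, y_img):
                    for x, y in zip(x_row, y_row):
                        p = _sigmoid(self.w * x + self.b)
                        g = 2.0 * (p - y) * p * (1.0 - p)  # d/dz of (p - y)^2
                        grad_w += g * x
                        grad_b += g
                        count += 1
            self.w -= lr * grad_w / count
            self.b -= lr * grad_b / count


def mean_squared_error(pred: Image, ref: Image) -> float:
    diffs = [(p - y) ** 2 for pr, yr in zip(pred, ref) for p, y in zip(pr, yr)]
    return sum(diffs) / len(diffs)


def train_two_stage(pairs: List[Pair],
                    first: PixelLogisticModel,
                    second: PixelLogisticModel) -> None:
    first.fit(pairs)                                        # operation 1720
    intermediate_pairs = [(first(x), y) for x, y in pairs]  # operation 1730
    second.fit(intermediate_pairs)                          # operation 1740


if __name__ == "__main__":
    x = [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]  # toy reference training image
    y = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # toy reference object image
    m1, m2 = PixelLogisticModel(), PixelLogisticModel()
    train_two_stage([(x, y)], m1, m2)
    print(mean_squared_error(m2(m1(x)), y))
```

For the simultaneous variant, one could instead interleave a few `fit` epochs of each model, regenerating the intermediate pairs from the partially trained first model before each round of second-model updates.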

The apparatus 1300 shown in FIG. 13 may also perform the training method shown in FIG. 17.

FIG. 18 is a diagram illustrating a process of training image models used to segment an object from an image according to another embodiment.

The first to eighth layers 1811, 1812, 1813, 1814, 1815, 1816, 1817, and 1818 of the first neural network shown in FIG. 18 may be assumed to have the same structure as the first to eighth layers 1611, 1612, 1613, 1614, 1615, 1616, 1617, and 1618 of the first neural network of FIG. 16. Likewise, the first to eighth layers 1821, 1822, 1823, 1824, 1825, 1826, 1827, and 1828 of the second neural network shown in FIG. 18 may be assumed to have the same structure as the first to eighth layers 1621, 1622, 1623, 1624, 1625, 1626, 1627, and 1628 of the second neural network of FIG. 16.

As shown in FIG. 18, the apparatus for training the image models may train the first image model (for example, the first neural network) such that the error between the reference object image 1809 and the reference intermediate image 1804 output from the reference input image 1801 is minimized. The result of the first image model, however, may be rough. The apparatus may also train the second image model (for example, the second neural network) such that the error between the reference object image 1809 and the output image 1805 output from the reference intermediate image 1804 is minimized. The output image 1805, which is the result of the second image model, may be finer than the result of the first image model.
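Written as optimization problems, the two training objectives of FIG. 18 might be expressed as below. The squared-error (L2) form is an assumption, since the document states only that the error is minimized. Here $f_1$ and $f_2$ denote the first and second image models with parameters $\theta_1$ and $\theta_2$, and each sum runs over training pairs $(x, y)$ of a reference input image and its reference object image.

$$\theta_1^{*} = \arg\min_{\theta_1} \sum_{(x,\,y)} \big\lVert f_1(x;\theta_1) - y \big\rVert_2^2$$

$$\theta_2^{*} = \arg\min_{\theta_2} \sum_{(x,\,y)} \big\lVert f_2\big(f_1(x;\theta_1^{*});\theta_2\big) - y \big\rVert_2^2$$

In this notation $f_1(x;\theta_1^{*})$ plays the role of the reference intermediate image 1804 and $f_2(\cdot\,;\theta_2)$ produces the output image 1805. In the simultaneous-training variant, $\theta_1^{*}$ would be replaced by the current, not yet converged, value of $\theta_1$.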

According to an embodiment, by using the first image model and the second image model trained in two stages, the apparatus for segmenting an object from an image may segment the object image from the input image more accurately.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but those of ordinary skill in the art will understand that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more thereof, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited number of drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

receiving an input image including an object and a background portion;
generating an output image corresponding to the object from the input image by using an image model; and
extracting the object image from the output image such that the object is included in the object image and the background part is excluded
including,
The step of extracting the object image,
determining that a pixel having a pixel value greater than a threshold value among pixels of the output image has a mask property and determining that a pixel having a pixel value less than or equal to the threshold value has a background property;
generating a mask image in which each pixel has a binary value by binarizing the output image, setting to 1 the pixel value of each pixel of the output image determined to have the mask property and setting to 0 the pixel value of each pixel of the output image determined to have the background property;
generating a foreground image by multiplying the pixel value of each pixel of the mask image by the pixel value of the corresponding pixel in the input image;
includes,
The image model is
a neural network configured to generate the output image such that the pixel value of each of a plurality of output image pixels included in the output image indicates a probability that the pixel in the input image corresponding to that output image pixel is included in the object and not included in the background part,
How to segment an object from an image.

delete

According to claim 1,
The image model is
configured such that the resolution of the object image generated from the input image is the same as the resolution of the input image,
How to segment an object from an image.

According to claim 1,
The activation function of the neural network comprises at least one non-linear function,
How to segment an object from an image.

A computer program stored on a medium in combination with hardware to execute the method of any one of claims 1, 7, and 8.

a memory for storing the image model; and
a processor configured to receive an input image including an object and a background part, generate an output image corresponding to the object from the input image by using the image model, and extract the object image from the output image such that the object is included in the object image and the background part is excluded
including,
The processor is configured to
determine that a pixel of the output image having a pixel value greater than a threshold value has a mask property and that a pixel having a pixel value less than or equal to the threshold value has a background property, generate a mask image in which each pixel has a binary value by binarizing the output image, setting to 1 the pixel value of each pixel of the output image determined to have the mask property and setting to 0 the pixel value of each pixel of the output image determined to have the background property, and generate a foreground image by multiplying the pixel value of each pixel of the mask image by the pixel value of the corresponding pixel in the input image;
The image model is
a neural network configured to generate the output image such that the pixel value of each of a plurality of output image pixels included in the output image indicates a probability that the pixel in the input image corresponding to that output image pixel is included in the object and not included in the background part,
A device for segmenting an object from an image.

According to claim 10,
The processor is configured to
classify each pixel of the output image according to a property and extract the object image using the classified pixels,
A device for segmenting an object from an image.

delete

According to claim 10,
The image model is
configured such that the resolution of the object image generated from the input image is the same as the resolution of the input image,
A device for segmenting an object from an image.

According to claim 10,
In the image model,
the activation function of the neural network comprises at least one non-linear function,
A device for segmenting an object from an image.

delete

receiving an input image including an object and a background part;
generating an intermediate image corresponding to the object from the input image by using a first image model;
generating an output image corresponding to the object from the intermediate image using a second image model; and
extracting the object image from the output image such that the object is included in the object image and the background part is excluded
including,
The step of extracting the object image,
determining that a pixel having a pixel value greater than a threshold value among pixels of the output image has a mask property and determining that a pixel having a pixel value less than or equal to the threshold value has a background property;
generating a mask image in which each pixel has a binary value by binarizing the output image, setting to 1 the pixel value of each pixel of the output image determined to have the mask property and setting to 0 the pixel value of each pixel of the output image determined to have the background property;
generating a foreground image by multiplying the pixel value of each pixel of the mask image by the pixel value of the corresponding pixel in the input image;
includes,
The first image model is
a neural network configured to generate the intermediate image such that the pixel value of each of a plurality of intermediate image pixels included in the intermediate image indicates a probability that the pixel in the input image corresponding to that intermediate image pixel is included in the object and not included in the background part,
How to segment an object from an image.

delete