KR20240060163A

KR20240060163A - Method of acquiring object segmentation information through trained neural network and server system performing the same method

Info

Publication number: KR20240060163A
Application number: KR1020220141458A
Authority: KR
Inventors: 김경수; 유인재; 조상현; 방승온
Original assignee: 오지큐 주식회사
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2024-05-08
Also published as: WO2024090692A1

Abstract

The present invention relates to a method for object segmentation. A method of acquiring object segmentation information through a trained neural network, performed in a computing device according to the present invention, comprises: inputting a target image into an encoder to output a feature map in which features of at least one object in the target image are condensed; inputting the output feature map into a decoder to generate a segmentation map in which the features of the object are expanded; and acquiring the object segmentation information by refining an uncertain mask in the image based on edges of the object detected in the segmentation map, wherein the encoder is preferably trained using, as a loss, the difference between an activation map, in which the region of the object in the feature map is activated using the label of the object, and a reconstruction map reconstructed pixel by pixel from the activation map. According to the present invention, more accurate object segmentation results can be generated, and the segmentation results can be used to create labels for training neural networks.

Description

Method of acquiring object segmentation information through trained neural network and server system performing the same method

The present invention relates to a method for object segmentation.

With the development of artificial intelligence technology, artificial intelligence is being applied in various technical fields.

In particular, various deep-learning-based algorithms are being developed that track objects through mathematical operations on the pixel values of an input image, used as features, and classify the tracked objects. Convolutional neural network (CNN) models, which combine multiple layers that perform convolution operations on feature values defined as matrices, are optimized for and used in the domains to which they are applied.

Specifically, the technology is developing into detection, which classifies objects and provides the location of each object in the image as a bounding box, and segmentation, which extracts object regions by dividing objects at the pixel level.

For example, the algorithm called CAM (Class Activation Maps), introduced in a 2015 MIT paper (Learning Deep Features for Discriminative Localization, [Submitted on 14 Dec 2015], https://arxiv.org/abs/1512.04150), uses the feature map that is the final output of a convolutional network to display, as a heat map, which features the neural network model treated as important when making its decision. By presenting rough decision cues over the entire image, it allows them to be used as labeling information for training.

However, to generate pixel-level segmentation results for an object with CAM and use them as label data for weakly supervised learning, a more precise discrimination process may be required, and a concrete method for this therefore needs to be provided.

An object of the present invention is to propose a neural network structure for object segmentation.

Another object of the present invention is to propose a specific training method for a neural network for segmentation.

Another object of the present invention is to propose a method of generating segmentation data for training using the output of a neural network.

Another object of the present invention is to propose a method of defining the optimal loss according to the purpose of the neural network.

Another object of the present invention is to propose a method of precisely compositing objects from a plurality of images using the segmentation data of a neural network.

Another object of the present invention is to propose a method of augmenting high-quality data for training neural networks.

A method of acquiring object segmentation information through a trained neural network, performed in a computing device according to an embodiment of the present invention for solving the above technical problem, comprises: inputting a target image into an encoder to output a feature map in which features of at least one object in the target image are condensed; inputting the output feature map into a decoder to generate a segmentation map in which the features of the object are expanded; and acquiring the object segmentation information by refining an uncertain mask in the image based on edges of the object detected in the segmentation map, wherein the encoder is preferably trained using, as a loss, the difference between an activation map, in which the region of the object in the feature map is activated using the label of the object, and a reconstruction map reconstructed pixel by pixel from the activation map.

The reconstruction map is preferably reconstructed by: generating a first reconstruction map by detecting false negative regions according to the correlation between pixels in the activation map; and generating a second reconstruction map by detecting false positive regions according to the affinity between pixels in the activation map.

The decoder is preferably trained using, as a loss, the difference between the segmentation map and a reconstructed segmentation map obtained by filtering the second reconstruction map according to its per-pixel, per-object probabilities.

The acquiring of the object segmentation information preferably further includes generating a second segmentation map by detecting false positive regions according to the affinity between pixels in the segmentation map, and the object segmentation information is acquired from the generated second segmentation map.

The acquiring of the object segmentation information preferably further includes generating a third segmentation map filtered according to the per-pixel, per-object probabilities of the second segmentation map, and an uncertain mask in the third segmentation map is refined using superpixels generated based on edges of the object detected in the second segmentation map.

The method further includes generating a composite image in which objects are composited, using the object segmentation information of each of a first and a second target image.

In the outputting of the feature map, the encoder outputs a composite feature map in which features of the plurality of objects in the composite image are condensed, and the encoder is preferably trained using, as a loss, the difference between a first activation map, in which the regions of the first and second objects in the composite feature map are activated using the labels of the first and second objects in the composite image, and a second composite map obtained by compositing the reconstruction maps reconstructed from the respective activation maps of the first target image and the second target image.

In the generating of the segmentation map, the decoder generates a composite segmentation map in which the features of the objects in the composite feature map are expanded, and the decoder is preferably trained using, as a loss, the difference between the composite segmentation map and a composite reconstruction map obtained by compositing third reconstruction maps, each filtered according to per-object probabilities, for the first and second target images.

An image augmentation method using a trained neural network, performed in a computing device according to an embodiment of the present invention for solving the above technical problem, comprises: receiving a plurality of target images; inputting each of the target images into an encoder to output a feature map in which features of at least one object in the target image are condensed; inputting the output feature map into a decoder to generate a segmentation map in which the features of the object are expanded; acquiring object segmentation information by refining an uncertain mask in the image based on edges of the object detected in the segmentation map; and generating a composite image in which objects are composited, using the object segmentation information of each target image, wherein the encoder is preferably trained using, as a loss, the difference between an activation map, in which the region of the object in the feature map is activated using the label of the object, and a reconstruction map reconstructed pixel by pixel from the activation map.

According to the present invention, more accurate object segmentation results can be generated.

In addition, the present invention can use the segmentation results to shorten the labeling time needed to train a segmentation model compared with existing approaches, and can train the segmentation model using labels generated accurately and quickly.

In addition, the present invention can achieve a greater performance improvement by training the encoder and the decoder separately using the intermediate outputs of the encoder-decoder neural network model, and can generate more accurate segmentation results without additional resources.

In addition, the present invention can generate a variety of synthetic images by compositing objects using the masks obtained as segmentation results.

FIG. 1 is a diagram showing a process of acquiring object segmentation information according to an embodiment of the present invention.
FIG. 2 is a diagram showing the operation pipeline of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 3 is a diagram showing the detailed operation pipeline of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 4 is a diagram showing the training pipeline of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 5 is a diagram showing the process of generating pseudo labels for the first training of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 6 is a diagram showing the process of generating pseudo labels for the second training of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 7 is a diagram showing the operation pipeline of a neural network for acquiring object segmentation information for a plurality of target images according to an embodiment of the present invention.
FIGS. 8 and 9 are diagrams showing a post-processing procedure for acquiring object segmentation information according to an embodiment of the present invention.
FIG. 10 is a diagram showing a process of obtaining a composite image using object segmentation information according to an embodiment of the present invention.
FIG. 11 is a diagram showing an additional training pipeline of a neural network that acquires object segmentation information according to an embodiment of the present invention.
FIG. 12 is a diagram showing an implementation of a computing device that performs the method of acquiring object segmentation information according to an embodiment of the present invention.

The following merely illustrates the principles of the invention. Therefore, those skilled in the art will be able to devise various devices that, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. In addition, all conditional terms and embodiments listed in this specification are, in principle, expressly intended only to aid understanding of the inventive concept, and should be understood as not being limited to the specifically listed embodiments and conditions.

The above objects, features, and advantages will become clearer through the following detailed description in connection with the accompanying drawings, so that those of ordinary skill in the art to which the invention pertains will be able to readily practice its technical idea.

In describing the invention, detailed descriptions of known technology related to the invention are omitted where they could unnecessarily obscure its gist. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Hereinafter, the segmentation process of the neural network according to this embodiment is described in more detail with reference to FIG. 1.

FIG. 1 is a flowchart showing an object segmentation method of a computing device according to an embodiment of the present invention.

In this embodiment, the computing device may be implemented as a single computer or server including a processor, or as a server system composed of a plurality of servers.

Accordingly, in addition to performing segmentation on received images in a single process, the computing device may be configured for network communication and may be implemented in the cloud to receive and process images captured by various imaging devices.

In addition to the processor, the computing device may include a cloud-based memory device for using the collected images as training data or for augmenting the training data.

For example, the computing device according to this embodiment may be implemented as an edge computer that acquires segmentation information directly from images received from a camera, such as a CCTV camera, installed at an end point, and performs the segmentation process there. Alternatively, it may be implemented as a central server that processes collected images in batches for training or inference of the segmentation neural network and provides the inference results as responses to API calls.

First, the computing device according to this embodiment receives a segmentation target image (I) (S10).

In this embodiment, segmentation may mean going beyond classification of the objects in an image and bounding-box detection of their locations, to delineating the region where each object is located at the pixel level.

Detection of an object in an image may be performed by providing the region where the object is located as a bounding box determined by coordinates, a height, and a size, and the detection result may be used as a label so that the image can be used for training.

However, a region extracted as a bounding box includes parts that are unnecessary for training, which can cause false detections and thus degrade the training performance of the neural network. More precise detection is therefore needed to improve training. In addition, to generate diverse composite images by combining backgrounds and objects, object regions need to be delineated more precisely, pixel by pixel.
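The limitation of bounding boxes described above can be illustrated with a small sketch. It assumes a hypothetical box layout of (x, y, width, height) and a hand-made binary mask, neither of which comes from the patent, and measures what fraction of a tight box is actually object.

```python
def box_fill_ratio(mask, box):
    """Fraction of pixels inside the box that belong to the object mask."""
    x, y, width, height = box  # hypothetical (x, y, width, height) layout
    inside = [mask[row][col]
              for row in range(y, y + height)
              for col in range(x, x + width)]
    return sum(inside) / len(inside)

# Illustrative 4x4 mask of a triangular object (1 = object, 0 = background).
mask = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
tight_box = (0, 0, 4, 4)  # smallest box that contains the whole object
ratio = box_fill_ratio(mask, tight_box)
print(f"object pixels inside the box: {ratio:.1%}")
```

In this toy case even the tightest box is 37.5% background, which is the kind of unnecessary content a pixel-level mask excludes.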

In this embodiment, the neural network's segmentation divides the image pixel by pixel and classifies the pixels based on defined classes. It targets semantic segmentation, but it is also possible to acquire instance segmentation information that distinguishes individual objects of the same class.

The segmentation target image (I) input in this embodiment is captured by a camera device. Depending on the purpose, it may capture a wide area at once for faster tracking, and may be a large-format image.

A large-format image may be acquired, for example, as an aerial photograph: ground imagery captured by a drone in flight is acquired, and objects occupying a very small region relative to the whole captured area can be segmented more precisely.

Next, the computing device according to this embodiment inputs the target image (I) into the encoder 110 and outputs a feature map in which features of at least one object in the target image (I) are condensed (S20).

The encoder 110 may be configured as a convolutional network that performs ordinary convolution operations. The convolutional network consists of a plurality of convolutional layers, each of which performs, channel by channel, a convolution between its internal filters and the pixel values of the image, so that a condensed feature map in which object-related features are emphasized can be output.

Subsequently, the computing device according to this embodiment inputs the output feature map into the decoder 120 to generate a segmentation map in which the features of the object are expanded (S30).

Specifically, the decoder 120 may be implemented in a form symmetric to the encoder 110 and may therefore be configured as a deconvolution network that performs deconvolution operations.

That is, each layer in the deconvolution network takes the feature map output by the encoder 110 as input and expands the feature information in it, while preserving its positional information, so that the features are well represented at the size of the original image.

Owing to the symmetric structure, the size and channels of the final feature map output by the encoder 110 may match the size and channels expected at the input of the decoder 120. As the encoder's output passes through the deconvolution layers inside the decoder 120, a segmentation map is output in which the positional information of the features is restored at the original image size.
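The symmetric size bookkeeping can be sketched as follows. This is only an illustration under stated assumptions: each encoder stage is assumed to halve the spatial size and each decoder stage to double it, and the 256x256 input and three stages are hypothetical values; the patent does not specify strides or layer counts.

```python
def encoder_sizes(height, width, stages):
    """Spatial size after each encoder stage (assumed to halve H and W)."""
    sizes = [(height, width)]
    for _ in range(stages):
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes

def decoder_sizes(height, width, stages):
    """Spatial size after each decoder stage (assumed to double H and W)."""
    sizes = [(height, width)]
    for _ in range(stages):
        height, width = height * 2, width * 2
        sizes.append((height, width))
    return sizes

# Hypothetical 256x256 input passed through 3 stages in each direction.
down = encoder_sizes(256, 256, stages=3)
up = decoder_sizes(*down[-1], stages=3)
print("encoder:", down)
print("decoder:", up)
```

Under these assumptions the decoder retraces the encoder's sizes in reverse, so the segmentation map comes out at the original image size.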

To train the neural network composed of the encoder 110 and the decoder 120 well, accurate labels may be needed for the feature information appearing in the condensed feature map and in the segmentation map expanded by the decoder 120.

However, securing accurate labels for each feature map may require manual labeling. For more efficient training, this embodiment therefore uses images generated through the output of, or processing by, an additional network as pseudo labels, and performs training through weakly-supervised semantic segmentation.

The specific training method is described with reference to the training pipeline including the neural network shown in FIG. 2.

FIG. 2 is an exemplary diagram showing the structure of a training pipeline according to an embodiment of the present invention.

Referring to FIG. 2, as described above, the encoder 110 takes the target image (I) as input and generates and outputs a feature map, and the feature map is output through the decoder 120 in the form of a segmentation map.

The training pipeline may be divided into a first training pipeline for training the encoder 110 and a second training pipeline for training the decoder 120.

Describing the first training pipeline first, the output of the encoder 110 may be turned into an activation map through additional layers.

The network structure that outputs the activation map is described in more detail with reference to FIG. 3. As described above, the final feature map output by the encoder 110 may be formed in the classification unit 130 with a predetermined number of channels and size. Preferably, the channels of the final feature map are determined by the classes to be classified, and each channel of the feature map is a matrix in which the features important for identifying objects of a class are emphasized by position.

The feature map output from these convolution layers may be flattened through a global average pooling (GAP) layer.

Specifically, the global average pooling layer averages the values of each channel's feature map into a single value, yielding a one-dimensional array with one value per channel.
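A minimal sketch of global average pooling, written in plain Python for readability. The two-channel, 2x2 feature map is made up for illustration and is not taken from the patent.

```python
def global_average_pool(feature_maps):
    """Collapse each H x W channel into its mean, giving one value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled

# Hypothetical feature map: 2 channels, each 2x2 (illustrative values only).
feature_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
pooled = global_average_pool(feature_maps)
print(pooled)
```

Each channel contributes exactly one number, so the output length equals the number of channels regardless of the spatial size.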

Next, the output of the global average pooling layer is combined with the weights (W₁, W₂, …, W_n) defined for each class of object to be classified. Each weight may indicate the importance of the corresponding channel for that class. The result of combining the weights and the pooled values is fed to a softmax activation function, which finally outputs the object classification result as probability values. In this embodiment, the class with the highest probability in this classification result is used as the label to generate the activation map.
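The classification step can be sketched as below, assuming the combination of weights and pooled values is a per-class dot product (as in the cited CAM paper). The class names, weights, and pooled values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def classify(pooled, class_weights):
    """Score each class as a dot product of its weights with the pooled values."""
    scores = [sum(w * p for w, p in zip(weights, pooled))
              for weights in class_weights]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Hypothetical pooled vector (one value per channel) and per-class weights.
class_names = ["airplane", "car"]
pooled = [4.0, 1.0]
class_weights = [
    [0.5, 0.0],  # airplane relies on channel 0
    [0.0, 0.5],  # car relies on channel 1
]
label, probs = classify(pooled, class_weights)
print(class_names[label], probs)
```

The index returned as `label` is what the embodiment would then use to select the weights for building the activation map.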

For example, when the classification result (Y) in FIG. 3 indicates that the object in the image is an airplane, this result can be used to generate the activation map through an internal computation network.

Specifically, when the classification result is airplane, the internal computation network multiplies the weights (W₁, W₂, …, W_n) used for the airplane decision by the corresponding pre-pooling feature map of each channel, which reveals which pixel positions in the feature map influenced the airplane decision. Thus, by taking the weighted sum of the per-channel feature maps, an activation map in the form of a two-dimensional image is output, intuitively indicating where the object is and how important its features are.

Finally, the weighted per-channel feature maps are summed, so that relatively important pixel regions are emphasized, producing a heat-map-style activation map that indirectly represents the object's location.
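The weighted sum described above corresponds to the class activation map of the cited CAM paper, M_c(x, y) = Σ_k w_k^c · f_k(x, y), where f_k is the k-th channel before pooling and w_k^c is the weight of that channel for class c. A plain-Python sketch with made-up feature maps and weights:

```python
def class_activation_map(feature_maps, weights):
    """Weighted sum of per-channel feature maps for one class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(feature_maps, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    return cam

# Hypothetical pre-pooling feature maps (2 channels, 2x2) and the weights
# of the predicted class (illustrative values only).
feature_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(feature_maps, weights)
print(cam)
```

The result keeps the spatial layout of the feature maps, which is why it can be rendered as a heat map over the image.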

That is, pixels that strongly influenced the classification have larger values than those that did not and are accordingly shown in emphasized colors, so the object region can be distinguished based on the pixel values.
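One simple way to separate an object region from an activation map is min-max normalization followed by thresholding. This is an assumption for illustration only: the patent text here does not state a normalization scheme or a threshold, and the value 0.5 is hypothetical.

```python
def activation_to_mask(cam, threshold=0.5):
    """Min-max normalize an activation map and keep pixels above a threshold."""
    flat = [value for row in cam for value in row]
    low, high = min(flat), max(flat)
    scale = (high - low) or 1.0  # avoid division by zero on a constant map
    return [[1 if (value - low) / scale >= threshold else 0 for value in row]
            for row in cam]

# Hypothetical 2x2 activation map (illustrative values only).
cam = [[0.5, 1.5], [4.5, 5.5]]
mask = activation_to_mask(cam)
print(mask)
```

A mask obtained this way is coarse, which matches the text's point that the raw activation map needs further refinement.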

However, as described above, the activation map output from the feature map of the encoder 110 can hardly contain fine-grained location information about the object, so additional processing is needed. In this embodiment, therefore, the region of the object in the activation map obtained during the iterative training process is refined in more detail into a reconstruction map, and the encoder 110 is trained recursively using that reconstruction map.

Specifically, referring to FIG. 4, training can be performed using the reconstruction map, generated by reconstructing the activation map obtained in the fifth training cycle (t=5), as a pseudo label. That is, the difference between the activation map generated in that cycle and the reconstruction map reconstructed from that activation map is defined as a loss, and the layers inside the encoder 110 are updated through iterative training in the direction that reduces this loss.

This training process can be repeated, with each update using a reconstruction map generated from the activation map that was just output. As a result, the activation map obtained in the 30th training cycle (t=30) can yield a more detailed segmentation result than the activation maps of earlier cycles.

The number of repetitions and similar settings may be determined by predetermined hyperparameters.
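
As a rough illustration of the loss used in this pipeline, the sketch below measures the difference between an activation map and its reconstruction map as a mean absolute (L1) difference. The choice of L1 is an assumption made for the example, since the text only states that the difference between the two maps is defined as the loss; the function name and values are hypothetical.

```python
def l1_loss(activation_map, reconstruction_map):
    """Mean absolute difference between two maps of equal shape.

    The L1 form is an assumption for illustration; the description
    only says that the difference between the maps is the loss.
    """
    total, count = 0.0, 0
    for act_row, rec_row in zip(activation_map, reconstruction_map):
        for act, rec in zip(act_row, rec_row):
            total += abs(act - rec)
            count += 1
    return total / count

# Toy maps: the reconstruction map acts as the pseudo label.
activation = [[0.2, 0.8]]
pseudo_label = [[0.0, 1.0]]
loss = l1_loss(activation, pseudo_label)
```

In each training cycle, the encoder would be updated to reduce this value, and a new reconstruction map would then be generated from the updated activation map.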

Next, the process of reconstructing the activation map according to this embodiment is described in more detail.

Referring to FIG. 5, the reconstruction process can be performed in two stages.

It is divided into a first reconstruction process, which expands the region information of the object expressed in the activation map, and a second reconstruction process, which refines the expanded first reconstruction map into a more precise region.

To describe the first reconstruction process first: in this embodiment, it computes the similarity between pixels in the feature map and uses the output of an activation function applied to the computed similarity to generate a reconstruction map whose region is expanded according to similarity. For example, a pixel whose similarity to a pixel judged to belong to the object region in the activation map output from the feature map of the encoder 110 is at or above a reference value can be added to the object region.

Specifically, the first reconstruction (SCG, self-correlation map generating) module outputs the similarity between pixels in the feature map and the average similarity between non-adjacent pixels, takes the maximum of the two as the similarity through a pixel-wise comparison, and expands the region where the similarity exceeds a first threshold, thereby generating the first reconstruction map. In other words, the first reconstruction map is generated by detecting and filling in false negative regions.

Accordingly, the first reconstruction map contains an object region that is expanded, according to the similarity between pixels, relative to the activation map.
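
The region expansion of the first reconstruction process can be sketched as below. This is a simplified illustration, not the SCG module itself: it uses only the direct cosine similarity between pixel feature vectors and omits the averaged similarity between non-adjacent pixels, and the function names, flattened pixel layout, and threshold value are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_region(features, mask, threshold):
    """Add pixels to the object region when they resemble a seed pixel.

    features: one feature vector per pixel (flattened image).
    mask:     True where the activation map already marks the object.
    A pixel joins the region if its best similarity to any seed pixel
    exceeds the threshold, which recovers false negatives.
    """
    seeds = [f for f, inside in zip(features, mask) if inside]
    expanded = []
    for f, inside in zip(features, mask):
        best = max((cosine(f, s) for s in seeds), default=0.0)
        expanded.append(inside or best > threshold)
    return expanded

# Toy input: pixel 1 resembles the seed pixel 0, pixel 2 does not.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mask = [True, False, False]
expanded = expand_region(features, mask, threshold=0.8)
```

Here the second pixel is absorbed into the object region because its feature vector is nearly parallel to the seed, while the third pixel stays outside.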

Next, a second reconstruction is performed to refine the expanded region of the first reconstruction map more precisely.

In the second reconstruction process, the second reconstruction (PAMR, pixel-adaptive mask refinement) module takes the first reconstruction map as input and generates the reconstruction map through a separately defined affinity matrix. The computed values are corrected so that pixels with more similar surrounding color values receive more similar probability values, which removes false positive regions. The region removal may also be performed separately for each channel of the first reconstruction map.
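
The idea of color-based affinity refinement can be sketched as follows. This is a simplified stand-in and not the PAMR module as defined in the embodiment: it uses a single-channel intensity image, a Gaussian affinity over the pixel and its four direct neighbors, and a fixed number of averaging iterations, all of which are assumptions chosen for brevity.

```python
import math

def refine_mask(intensity, prob, iterations=10, sigma=0.1):
    """Smooth object probabilities along pixels of similar intensity.

    intensity: H x W grayscale image with values in [0, 1].
    prob:      H x W object probabilities (the map to refine).
    Each pixel is replaced by an affinity-weighted average of itself
    and its four neighbors; the affinity decays with intensity
    difference, so probabilities do not leak across color boundaries.
    """
    height, width = len(prob), len(prob[0])
    offsets = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        updated = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                weighted, total = 0.0, 0.0
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        diff = intensity[y][x] - intensity[ny][nx]
                        affinity = math.exp(-(diff * diff) / (2 * sigma * sigma))
                        weighted += affinity * prob[ny][nx]
                        total += affinity
                updated[y][x] = weighted / total
        prob = updated
    return prob

# Toy strip: the two right pixels share the background color, but the
# third pixel was wrongly given a fairly high object probability.
intensity = [[0.0, 0.0, 1.0, 1.0]]
prob = [[0.9, 0.9, 0.6, 0.1]]
refined = refine_mask(intensity, prob)
```

After refinement the third pixel is pulled toward its same-colored neighbor and falls below 0.5, while the two object pixels on the left keep their high probability.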

In this way, the first training pipeline trains the encoder 110 by using the reconstruction map generated from the output activation map as a pseudo label and updating the layers inside the encoder 110 in the direction that reduces the error with respect to the output of the encoder 110.

Next, the second training pipeline is described.

The second training pipeline is the training process of the decoder 120 based on the output of the decoder 120; training can be performed using a pseudo label for the segmentation map that the decoder 120 outputs.

That is, the feature map refined by the trained encoder 110 can be output through the decoder 120 as fine-grained segmentation information for the region.

For more accurate separation of objects, this segmentation can use the reconstruction map generated in the first training pipeline as the pseudo label. Because the reconstruction map is derived from the activation map, an additional step is performed to convert it into segmentation information.

Referring to FIG. 6, in this embodiment the reconstruction map can be converted into segmentation information according to a specific threshold probability through a filter (CF, CertainFilter). The reconstructed segmentation map generated in this way is used as the pseudo label for the segmentation map output by the decoder.

Accordingly, the decoder 120 can be trained iteratively in the direction that reduces the difference between its output segmentation map and the reconstructed segmentation map, with that difference defined as the loss.
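
The filtering and decoder loss described above might look like the sketch below. It rests on assumptions that the text does not state: pixels whose best class probability falls below the threshold are marked with a hypothetical `IGNORE` value and skipped, and the difference between the decoder output and the pseudo label is measured with cross-entropy.

```python
import math

IGNORE = -1  # hypothetical marker for pixels that are not certain enough

def certain_filter(prob_map, threshold):
    """Keep a class label only where its probability reaches the threshold.

    prob_map: one list of class probabilities per pixel (flattened image).
    """
    labels = []
    for probs in prob_map:
        best = max(range(len(probs)), key=lambda c: probs[c])
        labels.append(best if probs[best] >= threshold else IGNORE)
    return labels

def masked_cross_entropy(pred_probs, labels):
    """Average cross-entropy over the pixels that carry a pseudo label."""
    losses = [-math.log(probs[label])
              for probs, label in zip(pred_probs, labels) if label != IGNORE]
    return sum(losses) / len(losses) if losses else 0.0

# Toy reconstruction map for three pixels and two classes.
reconstruction = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
pseudo_label = certain_filter(reconstruction, threshold=0.8)

# Toy decoder output for the same pixels.
decoder_output = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
loss = masked_cross_entropy(decoder_output, pseudo_label)
```

The middle pixel is too ambiguous to pass the filter, so only the first and last pixels contribute to the loss that the decoder is trained to reduce.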

Through the neural network trained by the above process, the computing device according to this embodiment obtains segmentation information (S40).

In this embodiment, the object segmentation information is obtained based on the edges of the object detected in the segmentation map.

Specifically, referring to FIG. 7, the step of obtaining the object segmentation information can generate a third segmentation map by detecting the edges in the segmentation map, and obtain from the third segmentation map object segmentation information that separates the object from the background.

Here, the segmentation map output by the decoder 120 may be processed further to obtain the object segmentation information; preferably, the segmentation map is refined precisely through the second reconstruction process described above.

Referring to FIG. 8, a second segmentation map can be generated by detecting and removing false positive regions according to the computation of the inter-pixel affinity matrix, and a third segmentation map can be generated by filtering the second segmentation map according to its per-pixel, per-object probabilities.

Specifically, this can be done through a third reconstruction process (EP, EdgePrediction) that obtains the segmentation information by refining the uncertain mask in the second segmentation map using superpixels generated from the edges of the object detected in the second segmentation map.

The third reconstruction process according to this embodiment is described with reference to FIG. 9.

Referring to FIG. 9, the third reconstruction process first detects edges from the second segmentation map, which is obtained by refining the segmentation map that the decoder 120 outputs for the input image. The edges can be detected from the amount of change in pixel values according to a conventional technique (A Computational Approach to Edge Detection, Canny 1986) and take the form of continuous line segments.

Next, superpixels are extracted by grouping, on the basis of the edges, the pixels that belong to the same connected component and treating each group as a single pixel (connected-component labeling, CCL).
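
Connected-component labeling over an edge map can be sketched as follows. The sketch assumes a binary edge map and 4-connectivity, and uses a plain breadth-first flood fill; the function name and the convention that edge pixels keep label 0 are choices made for this example.

```python
from collections import deque

def label_superpixels(edges):
    """Group non-edge pixels into connected components (4-connectivity).

    edges: H x W grid with 1 for edge pixels and 0 otherwise.
    Returns an H x W grid of component ids starting at 1; edge pixels
    keep the id 0. Each component plays the role of one superpixel.
    """
    height, width = len(edges), len(edges[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if edges[start_y][start_x] or labels[start_y][start_x]:
                continue  # edge pixel, or already part of a component
            next_label += 1
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not edges[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels

# Toy edge map: a vertical edge splits the image into two superpixels.
edges = [[0, 1, 0],
         [0, 1, 0]]
labels = label_superpixels(edges)
```

The pixels left of the edge receive id 1 and those right of it receive id 2, so the edge acts as the boundary between two superpixels.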

In addition, the second segmentation map is divided by a filter, using a threshold probability, into a first mask that defines the certain region and a second mask that defines the uncertain region, and the uncertain regions are removed on the basis of the superpixels.

Through this process, the regions that remain after the uncertain regions are removed from the second mask are added to the first mask, generating the third segmentation map that contains the object segmentation information.
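
One way to combine the two masks with the superpixels is sketched below. The text does not say which rule decides whether an uncertain pixel is removed, so the rule used here is an assumption: an uncertain pixel is kept only when its superpixel also contains at least one certain object pixel. The two thresholds `hi` and `lo` and the function name are likewise hypothetical.

```python
def refine_with_superpixels(prob, superpixels, hi=0.8, lo=0.3):
    """Merge the certain mask with the uncertain pixels that survive.

    prob:        H x W object probabilities.
    superpixels: H x W superpixel ids, with 0 marking edge pixels.
    Pixels with prob >= hi form the certain (first) mask; pixels with
    lo <= prob < hi form the uncertain (second) mask. Assumed rule:
    an uncertain pixel survives only if its superpixel contains a
    certain pixel.
    """
    height, width = len(prob), len(prob[0])
    certain = [[prob[y][x] >= hi for x in range(width)] for y in range(height)]
    uncertain = [[lo <= prob[y][x] < hi for x in range(width)]
                 for y in range(height)]
    supported = {superpixels[y][x]
                 for y in range(height) for x in range(width)
                 if certain[y][x] and superpixels[y][x] != 0}
    return [[certain[y][x]
             or (uncertain[y][x] and superpixels[y][x] in supported)
             for x in range(width)] for y in range(height)]

# Toy input: superpixel 1 holds a certain pixel, superpixel 2 does not.
prob = [[0.9, 0.5, 0.5],
        [0.6, 0.5, 0.6]]
superpixels = [[1, 0, 2],
               [1, 0, 2]]
mask = refine_with_superpixels(prob, superpixels)
```

The uncertain pixel below the certain one is kept because both lie in superpixel 1, while the uncertain pixels on the edge and in superpixel 2 are dropped.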

Next, the computing device generates a composite image in which objects are combined, using the object segmentation information extracted from the third segmentation map.

Referring again to FIG. 7, the extraction of object segmentation information described above can be performed individually for each of a plurality of target images (I), so that object segmentation information is extracted for each of them.

A new composite image can then be generated by compositing the image values corresponding to the extracted object segmentation information at a specific position in the target image (I) to be composited into, or onto a third background.
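
The compositing step can be illustrated with a small sketch that copies the masked pixels of one image onto a background at a chosen offset. The function name, the offset parameter, and the single-channel pixel values are assumptions for the example.

```python
def composite(source, mask, background, offset=(0, 0)):
    """Paste the masked pixels of `source` onto a copy of `background`.

    source:     H x W image containing the segmented object.
    mask:       H x W booleans, True where the object is.
    background: image to paste into (left unmodified).
    offset:     (row, column) position of the source's top-left corner.
    Pixels that would land outside the background are skipped.
    """
    result = [row[:] for row in background]
    offset_y, offset_x = offset
    for y, mask_row in enumerate(mask):
        for x, is_object in enumerate(mask_row):
            target_y, target_x = y + offset_y, x + offset_x
            if (is_object and 0 <= target_y < len(result)
                    and 0 <= target_x < len(result[0])):
                result[target_y][target_x] = source[y][x]
    return result

# Toy input: a 2x2 object patch pasted into a 3x3 background.
source = [[7, 7], [7, 7]]
mask = [[True, False], [False, True]]
background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
result = composite(source, mask, background, offset=(1, 1))
```

Only the two pixels selected by the mask are transferred, so the background shows through wherever the object segmentation information marks no object.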

As described above, in this embodiment, extracting more refined object segmentation information through the separately trained encoder 110 and decoder 120 and the output of the decoder 120 preserves the shapes in the composite image as much as possible and reduces any sense of incongruity.

Referring to FIG. 10, in this embodiment object segmentation information is generated from target images corresponding to the airplane and horse classes, and the generated object segmentation information is combined to generate a composite segmentation map.

Next, the regions of each input target image (I) that correspond to the composite segmentation information are extracted and combined, generating a composite image in which the airplane and the horse are combined.

Furthermore, in this embodiment, re-running the above training pipelines of the encoder 110 and the decoder 120 on the composite image can further improve the performance of the neural network in the compositing process.

Referring to FIG. 11, the first and second training pipelines described above perform additional training on the composite image. Using the classes of the individual target images (I), a 1-1 training pipeline for the encoder 110 reduces the error between the activation map for the horse and the airplane and the second reconstruction map, and a 2-1 training pipeline for the decoder 120 reduces the error between the segmentation map and the reconstructed segmentation map, from which uncertain regions have been removed using superpixels derived from the edges of the segmentation map. This can further improve the performance of the neural network.

Hereinafter, with reference to FIG. 12, a specific hardware implementation of a server is described as the computing device that performs the method of obtaining object segmentation information according to this embodiment.

Referring to FIG. 12, in some embodiments of the present invention the server 300 may be implemented in the form of a computing device. One or more of the modules constituting the server 300 are implemented on a general-purpose computing processor, and the server may accordingly include a processor 308, an input/output device 302, a memory 340, an interface 306, and a bus 314. The processor 308, the input/output device 302, the memory 340, and/or the interface 306 may be coupled to one another through the bus 314. The bus 314 corresponds to the path through which data moves.

Specifically, the processor 308 may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), a microprocessor, a digital signal processor, a microcontroller, an application processor (AP), and logic elements capable of performing similar functions.

The input/output device 302 may include at least one of a keypad, a keyboard, a touch screen, and a display device. The memory 340 may store data and/or programs.

The interface 306 may transmit data to a communication network or receive data from a communication network. The interface 306 may be wired or wireless. For example, the interface 306 may include an antenna or a wired or wireless transceiver. The memory 340 may further include high-speed DRAM and/or SRAM as volatile working memory that improves the operation of the processor 308 while protecting personal information.

In addition, the memory 340 stores the programming and data structures that provide the functions of some or all of the modules described herein. For example, it may include logic for performing selected aspects of the training method described above.

A program or application is loaded as a set of instructions, stored in the memory 340, that includes each step of the training method described above, so that the processor can perform each step. For example, the processor may execute a computer program that includes an operation of inputting a target image (I) into the encoder 110 and outputting a feature map in which the features of at least one object in the target image (I) are condensed, an operation of inputting the output feature map into the decoder 120 and generating a segmentation map in which the features of the object are expanded, and an operation of obtaining the object segmentation information by refining an uncertain mask in the image based on the edges of the object detected in the segmentation map.

According to the present invention described above, more accurate object segmentation results can be generated.

In addition, the present invention can generate labels from the segmentation results and use them for training the neural network.

In addition, the present invention trains the encoder 110 and the decoder 120 separately using the intermediate outputs of the neural network composed of the encoder 110 and the decoder 120, which yields a greater performance improvement and produces more accurate segmentation results without additional resources.

Furthermore, the various embodiments described herein may be implemented in a recording medium readable by a computer or a similar device, using, for example, software, hardware, or a combination of the two.

In a hardware implementation, the embodiments described herein may be implemented using at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. In some cases, the embodiments described herein may be implemented by the control module itself.

In a software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described herein. The software code may be implemented as a software application written in an appropriate programming language. The software code may be stored in a memory module and executed by a control module.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention.

Accordingly, the embodiments disclosed in the present invention and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments and drawings. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent to them should be interpreted as falling within the scope of rights of the present invention.

Claims

A method of obtaining object segmentation information through a trained neural network, performed on a computing device, the method comprising:
inputting a target image into an encoder and outputting a feature map in which features of at least one object in the target image are condensed;
inputting the output feature map into a decoder to generate a segmentation map in which the features of the object are expanded; and
obtaining the object segmentation information by refining an uncertain mask in the image based on edges of the object detected in the segmentation map,
wherein the encoder is trained using, as a loss, the difference between an activation map, in which the region of the object in the feature map is activated using a label of the object, and a reconstruction map reconstructed pixel by pixel from the activation map.

The method of claim 1,
wherein the reconstruction map is reconstructed by:
generating a first reconstruction map by detecting a false negative region according to the correlation between pixels in the activation map; and
generating a second reconstruction map by detecting a false positive region according to the affinity between pixels in the activation map.

The method of claim 2,
wherein the decoder is trained using, as a loss, the difference between the segmentation map and a reconstructed segmentation map filtered according to the per-pixel, per-object probabilities of the second reconstruction map.

The method of claim 1,
wherein obtaining the object segmentation information
further comprises generating a second segmentation map by detecting a false positive region according to the similarity between pixels in the segmentation map, and
the object segmentation information is obtained from the generated second segmentation map.

The method of claim 4,
wherein obtaining the object segmentation information
further comprises generating a third segmentation map filtered according to the per-pixel, per-object probabilities of the second segmentation map, and
the uncertain mask in the third segmentation map is refined using superpixels generated based on the edges of the object detected in the second segmentation map.

The method of claim 2,
further comprising generating a composite image in which objects are combined, using the object segmentation information of each of first and second target images.

The method of claim 6,
wherein, in outputting the feature map,
the encoder outputs a composite feature map in which features of a plurality of objects in the composite image are condensed, and
the encoder
is trained using, as a loss, the difference between a first activation map, in which the regions of first and second objects in the composite feature map are activated using labels of the first and second objects in the composite image, and a second composite map obtained by combining the reconstruction maps reconstructed from the activation maps of the first target image and the second target image.

The method of claim 7,
wherein, in generating the segmentation map,
the decoder generates a composite segmentation map in which the features of the objects in the composite feature map are expanded, and
the decoder
is trained using, as a loss, the difference between the composite segmentation map and a composite reconstruction map filtered according to the per-object probabilities for each of the first and second target images.

An image augmentation method using a trained neural network, performed on a computing device, the method comprising:
receiving a plurality of target images as input;
inputting each target image into an encoder and outputting a feature map in which features of at least one object in the target image are condensed;
inputting the output feature map into a decoder to generate a segmentation map in which the features of the object are expanded;
obtaining object segmentation information by refining an uncertain mask in the image based on edges of the object detected in the segmentation map; and
generating a composite image in which objects are combined, using the object segmentation information of each target image,
wherein the encoder is trained using, as a loss, the difference between an activation map, in which the region of the object in the feature map is activated using a label of the object, and a reconstruction map reconstructed pixel by pixel from the activation map.

A computer-readable recording medium storing a program for performing, on a computing device, the method of obtaining object segmentation information through a trained neural network according to any one of claims 1 to 8.