KR102167808B1

KR102167808B1 - Semantic segmentation method and system applicable to AR

Info

Publication number: KR102167808B1
Application number: KR1020200039310A
Authority: KR
Inventors: 이승호; 고태영
Original assignee: 한밭대학교 산학협력단
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2020-10-20
Also published as: WO2021201422A1

Abstract

The present invention relates to a semantic segmentation method and system applicable to augmented reality (AR), and aims to improve the execution speed and accuracy of image analysis so that the analysis can be applied to AR. The method includes: a semantic segmentation image acquisition step (S10) of classifying and labeling objects in an input image to acquire a semantic segmentation image; a modified dilated residual network (DRN) step (S20) of extracting a feature map from the acquired image using atrous convolution; and an atrous pyramid pooling module step (S30) of selectively applying various atrous convolutions according to the extracted feature map, arranging the atrous convolutions in parallel to extract features, and then forming a pyramid-shaped feature map so that objects occupying a small region of the image are extracted effectively.

Description

Semantic segmentation method and system applicable to AR

The present invention relates to a semantic segmentation method and system applicable to AR, and more specifically to a semantic segmentation method and system that improve the execution speed and accuracy of image analysis so that the analysis can be applied to augmented reality (AR).

Since the Fourth Industrial Revolution, the development of artificial intelligence and robots has accelerated, and research into systems that think like humans is expanding. Interest is therefore growing in research on systems that move, make decisions, and act in real time, such as virtual and augmented reality systems, autonomous vehicles, medical robots, and drones.

This research rests on analyzing the images captured by a camera that takes the place of the human eye. Within image analysis, semantic segmentation, which labels each pixel with the class it belongs to, is a fundamental task.

Semantic segmentation divides an image pixel by pixel into classes learned in advance. It does not merely assign a whole photograph to a class; it is a high-level technique for fully understanding the scene in an image, and it is one of the core computer vision technologies needed to fully understand the visual environment.

In addition, a semantic segmentation algorithm must be fast enough for rapid interaction or response and accurate enough for correct decisions. For example, the execution speed and accuracy of semantic segmentation are essential to autonomous-driving technologies such as safety control, driving decisions, and collision avoidance.

However, performing accurate semantic segmentation in real time on actually captured images is difficult. First, visual objects are often subject to deformation, occlusion, and scale variation. Second, background noise makes it difficult to separate objects from the background.

To deal with these problems, an algorithm is needed that is resilient and robust to changes in appearance. At the same time, various kinds of contextual information must be considered to distinguish objects from complex backgrounds.

Korean Patent Application Publication No. 10-2019-0033933 (published April 1, 2019)

Accordingly, the present invention is intended to overcome the shortcomings of the prior art. One object is to improve the accuracy of image analysis by effectively extracting small objects from an input image. Another object is to improve the execution speed of image analysis so that it can be applied to augmented reality (AR).

To achieve these technical objects, a semantic segmentation method applicable to AR according to one aspect of the present invention includes a semantic segmentation image acquisition step (S10) of classifying and labeling objects in an input image to acquire a semantic segmentation image, and a modified dilated residual network (DRN) step (S20) of extracting a feature map from the acquired image using atrous convolution.

The method further includes an atrous pyramid pooling module step (S30) and a modified dilated residual network backpropagation step (S40). To effectively extract objects that occupy a small region of the image, step S30 selectively applies various atrous convolutions according to the extracted feature map, arranges the atrous convolutions in parallel to extract features, and then forms a pyramid-shaped feature map. Step S40 compares the result extracted in step S30 with the result image provided by a standard database and, based on the comparison, corrects the weights to reduce the error rate.

Here, the atrous pyramid pooling module step (S30) includes stacking the feature maps extracted in that step into a pyramid shape and applying a 1x1 convolution to the stacked feature maps to form a single-channel feature map.

A semantic segmentation system applicable to AR according to another aspect of the present invention includes an image input unit, a segmented image acquisition unit, a feature extraction unit, a determination unit, and a storage unit. The image input unit may receive image information captured by an imaging device such as a camera, or image information from a standard database.

The segmented image acquisition unit classifies objects in the image information received through the image input unit and labels the classified objects to acquire a semantic segmentation image.

The feature extraction unit includes a dilated residual network module and an atrous pyramid pooling module. The dilated residual network module uses atrous convolution to extract a feature map from the image acquired by the segmented image acquisition unit.

To effectively extract objects that occupy a small region of the image, the atrous pyramid pooling module selectively applies various atrous convolutions according to the feature map extracted by the dilated residual network module, and arranges the atrous convolutions in parallel to extract feature maps.

The determination unit compares the result extracted by the atrous pyramid pooling module with the result image provided by a preset standard database and, based on the comparison, corrects the weights to reduce the error rate.

As described above, the semantic segmentation method and system applicable to AR according to the present invention apply atrous convolution to the feature map extracted from an image, and can thereby extract small objects effectively and improve the accuracy of image analysis.

In addition, arranging the atrous convolutions in parallel to extract features and then forming a pyramid-shaped feature map can improve the execution speed of semantic segmentation, so that image analysis can be applied to augmented reality (AR). Furthermore, dilated residual network backpropagation can reduce the error rate of semantic segmentation and improve its accuracy.

FIG. 1 is a conceptual diagram illustrating a semantic segmentation method applicable to AR according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a semantic segmentation method applicable to AR according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the semantic segmentation image acquisition process.
FIG. 4 is a diagram showing images from the PASCAL VOC 2012 database.
FIG. 5 is a diagram showing images from the Cityscapes database.
FIG. 6 is a diagram illustrating atrous convolution according to an embodiment of the present invention.
FIG. 7 is a diagram comparing the semantic segmentation results of conventional convolution and of atrous convolution according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a dilated residual network (DRN) according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an atrous pyramid pooling module according to an embodiment of the present invention.
FIG. 10 is a diagram showing the structure of a convolution.
FIGS. 11A and 11B are diagrams showing Karpathy computational graphs of the backpropagation process.
FIG. 12 is a diagram illustrating the backpropagation process according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating an evaluation procedure for a semantic segmentation method applicable to AR according to an embodiment of the present invention.
FIG. 14 is a block diagram illustrating a semantic segmentation system applicable to AR according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "... module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings.

Like reference numerals in the drawings denote like elements.

FIG. 1 is a conceptual diagram illustrating a semantic segmentation method applicable to augmented reality (AR) according to an embodiment of the present invention, and FIG. 2 is a flowchart illustrating the method.

The semantic segmentation method applicable to AR according to an embodiment of the present invention includes a semantic segmentation image acquisition step (S10) of classifying and labeling objects in an input image to acquire a semantic segmentation image, and a modified dilated residual network (DRN) step (S20) of extracting a feature map from the acquired image using atrous convolution.

FIG. 3 illustrates the semantic segmentation image acquisition process, FIG. 4 shows images from the PASCAL VOC 2012 database, and FIG. 5 shows images from the Cityscapes database.

As shown in FIG. 3, the semantic segmentation image acquisition step (S10) may include capturing an image with an imaging device such as a camera (S11), detecting objects in the captured image (S12), and labeling the detected objects to acquire a segmentation image (S13).

Meanwhile, a normalized standard database may be used to evaluate the semantic segmentation method objectively. Accordingly, the method is described below using such a standard database.

For example, semantic segmentation images can be acquired by classifying and labeling the objects in images from the PASCAL VOC 2012 database of FIG. 4 and the Cityscapes database of FIG. 5.

The PASCAL VOC 2012 database of FIG. 4 consists of 20 classes in total: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and TV/monitor.
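The 20 classes listed above can be held in a small lookup table. The sketch below is only illustrative: it assumes the usual PASCAL VOC segmentation convention in which index 0 is background and the object classes follow in alphabetical order, and the names `VOC_CLASSES`, `LABEL_OF`, `NAME_OF`, and `class_histogram` are hypothetical helpers that do not appear in the patent.

```python
# Class names in the order commonly used by the PASCAL VOC segmentation labels.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumption: index 0 is reserved for background, so object classes start at 1.
LABEL_OF = {name: index for index, name in enumerate(VOC_CLASSES, start=1)}
NAME_OF = {0: "background", **{i: n for n, i in LABEL_OF.items()}}


def class_histogram(label_map):
    """Count how many pixels of a 2-D label map belong to each class name."""
    counts = {}
    for row in label_map:
        for label in row:
            name = NAME_OF[label]
            counts[name] = counts.get(name, 0) + 1
    return counts


# Toy 2x3 label map: background, one dog pixel, two person pixels.
label_map = [[0, 0, 15],
             [0, 12, 15]]
print(class_histogram(label_map))
```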

The Cityscapes database of FIG. 5 is a publicly released standard database built from urban street-scene images. It is an image data set captured in about 50 cities on various days and at various times of day, and consists of 5,000 images with about 30 classes.

The semantic segmentation images are preferably resized to a common size for consistency. For example, they may be resized to 513x513.
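As a rough illustration of this resizing, the sketch below scales a label map to 513x513. The patent does not say which interpolation is used; nearest-neighbor sampling is assumed here because it never creates label values that were absent from the original map. The function name `resize_nearest` is hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Resize a 2-D list to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[(y * in_h) // out_h][(x * in_w) // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


# Toy 2x2 label map with four different class indices.
labels = [[0, 1],
          [2, 3]]
resized = resize_nearest(labels, 513, 513)
print(len(resized), len(resized[0]))
```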

FIG. 6 illustrates atrous convolution according to an embodiment of the present invention. Panel (a) of FIG. 6 shows atrous convolution with rate r=1, panel (b) with rate r=2, and panel (c) with rate r=3.

As shown in FIG. 6, the modified dilated residual network (DRN) step (S20) extracts a feature map from the original image using atrous convolution, a variant of convolution that extracts features while preserving as much spatial information as possible.

Influenced by wavelet analysis, atrous convolution inserts zeros inside the filter, which enlarges the window without increasing the number of weights.

Compared with conventional convolution, atrous convolution can capture larger features at the same computational cost, and using atrous convolutions with various dilation rates in parallel allows more spatial features to be extracted.
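The effect of inserting zeros into the filter can be made concrete with a small sketch. It assumes a square 3x3 kernel and the rates 1, 2, and 3 of FIG. 6; the helper name `dilate_kernel` and the kernel values are illustrative only. The window grows with the rate while the number of nonzero weights stays at nine.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighboring weights of a square kernel."""
    k = len(kernel)
    size = k + (k - 1) * (rate - 1)          # effective window size
    dilated = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dilated[i * rate][j * rate] = kernel[i][j]
    return dilated


kernel = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]

for rate in (1, 2, 3):
    d = dilate_kernel(kernel, rate)
    nonzero = sum(1 for row in d for w in row if w != 0.0)
    print(f"rate={rate}: window {len(d)}x{len(d)}, weights {nonzero}")
```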

[Equation 1] below is the case where the rate r is 1 and represents conventional convolution. [Equation 2] below is the case where r is greater than 1 and represents atrous convolution.

[Equation 1]

[Equation 2]

Here, k denotes the kernel and s denotes the stride.
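The role of the rate and the stride can be illustrated with a one-dimensional sketch in the form commonly used in the literature. It is not a reconstruction of Equations 1 and 2; the function name `atrous_conv1d`, the cross-correlation form, and the absence of padding are all assumptions made for illustration.

```python
def atrous_conv1d(x, w, rate=1, stride=1):
    """Valid 1-D atrous convolution (cross-correlation form, no padding)."""
    span = (len(w) - 1) * rate + 1           # input samples covered by one window
    out = []
    for start in range(0, len(x) - span + 1, stride):
        # Each filter tap k reads the sample that lies rate * k positions ahead.
        out.append(sum(w[k] * x[start + rate * k] for k in range(len(w))))
    return out


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]
print(atrous_conv1d(x, w, rate=1))            # rate 1: conventional convolution
print(atrous_conv1d(x, w, rate=2))            # rate 2: taps two samples apart
print(atrous_conv1d(x, w, rate=1, stride=2))  # stride 2: every other window
```

With rate 1 the three taps cover three adjacent samples; with rate 2 the same three weights cover a span of five samples.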

FIG. 7 compares the semantic segmentation results of conventional convolution and of atrous convolution according to an embodiment of the present invention. When semantic segmentation is performed on the small feature map produced by a conventional convolutional network, accuracy decreases.

FIG. 7 shows the difference between the upper image, in which semantic segmentation is performed through down-sampling, convolution, and up-sampling, and the lower image, in which it is performed through atrous convolution.

In the upper image of FIG. 7, for conventional convolution, up-sampling is performed after spatial information has already been lost, so the resolution of the semantic segmentation is degraded.

In the lower image of FIG. 7, by contrast, atrous convolution convolves with an enlarged receptive field, which minimizes the loss of spatial information and yields a high-resolution result.

Accordingly, the semantic segmentation method applicable to AR according to an embodiment of the present invention uses atrous convolution to expand the kernel with empty (zero) weights. The network thereby learns long-range features without relying on pooling functions, and without pooling it can retain more high-spatial-frequency detail.
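The resolution argument can be checked with the standard output-size formula for a convolution. The sketch below is illustrative: the number of layers, the rates 2, 4, and 8, and the choice of padding equal to the rate are assumptions, not values from the patent. Three resolution-halving layers shrink a 513-pixel side to 65, whereas three atrous layers with stride 1 keep it at 513.

```python
def conv_output_size(n, kernel=3, stride=1, padding=0, rate=1):
    """Spatial output size of a convolution along one dimension."""
    effective_kernel = rate * (kernel - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1


size_strided = 513
for _ in range(3):                      # three stride-2 layers (hypothetical depth)
    size_strided = conv_output_size(size_strided, stride=2, padding=1)

size_atrous = 513
for rate in (2, 4, 8):                  # hypothetical rates, padding equal to rate
    size_atrous = conv_output_size(size_atrous, padding=rate, rate=rate)

print(size_strided, size_atrous)
```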

FIG. 8 illustrates a dilated residual network (DRN) according to an embodiment of the present invention, and FIG. 9 illustrates an atrous pyramid pooling module according to an embodiment of the present invention.

That is, FIG. 8 shows a 101-layer residual network modified by applying the DRN principle, and the extraction of a feature map using the modified DRN. [Table 1] below shows the structure of the DRN according to an embodiment of the present invention.

[Table 1] Structure of the dilated residual network (DRN)

The DRN according to an embodiment of the present invention generates a feature map by extracting only the important features from the input image while preserving as much of the spatial characteristics of the whole image as possible.

The semantic segmentation method applicable to AR according to an embodiment of the present invention further includes an atrous pyramid pooling module step (S30) and a modified dilated residual network backpropagation step (S40). To effectively extract objects that occupy a small region of the image, step S30 selectively applies various atrous convolutions according to the extracted feature map, arranges the atrous convolutions in parallel to extract features, and then forms a pyramid-shaped feature map. Step S40 compares the result extracted in step S30 with the result image provided by a standard database and, based on the comparison, corrects the weights to reduce the error rate.

To increase accuracy in semantic segmentation, it is important to extract even small objects accurately. Conventional semantic segmentation, however, has great difficulty when small objects are arranged in a complex layout.

To solve this problem, the semantic segmentation method applicable to AR according to an embodiment of the present invention applies a modified form of the pyramid pooling used in PSPNet (Pyramid Scene Parsing Network).

For example, as shown in FIG. 9, feature maps may be extracted by applying five branches in parallel to the 28x28 feature map obtained from the DRN: an ordinary convolution with rate 1, atrous convolutions with rates 3, 6, and 9, and image pooling applied to the feature map extracted in the modified dilated residual network (DRN) step (S20).

The five extracted feature maps are then stacked in a pyramid shape as shown in FIG. 9, and a 1x1 convolution is applied to obtain a single-channel feature map.

In this way, the atrous pyramid pooling module step (S30) applies various atrous convolutions to the feature map extracted in the modified dilated residual network (DRN) step (S20) in order to effectively extract objects that occupy a small region of the image.

To reduce processing time, the atrous pyramid pooling module step (S30) arranges the atrous convolutions in parallel to extract features, and then stacks the resulting feature maps in a pyramid shape.

Feature maps smaller than a preset reference size are preferably up-sampled so that all maps have the same size before stacking. Once the feature maps have been stacked in a pyramid shape, a 1x1 convolution is applied to generate a single-channel feature map.
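The flow of step S30 can be sketched in plain Python. This is a simplified illustration under several assumptions: a single-channel 28x28 input, 3x3 kernels with zero padding equal to the rate so that every branch keeps the input size, image pooling modeled as a global average replicated back to full size, and hypothetical kernel and fusion weights. The branches are evaluated one after another here, whereas the patent arranges them in parallel.

```python
def atrous_conv2d_same(fmap, kernel, rate):
    """3x3 atrous convolution with zero padding so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    yy = y + (i - 1) * rate
                    xx = x + (j - 1) * rate
                    if 0 <= yy < h and 0 <= xx < w:   # outside counts as zero
                        acc += kernel[i][j] * fmap[yy][xx]
            out[y][x] = acc
    return out


def image_pooling(fmap):
    """Global average pooling, then up-sampling to the input size by replication."""
    h, w = len(fmap), len(fmap[0])
    mean = sum(sum(row) for row in fmap) / (h * w)
    return [[mean] * w for _ in range(h)]


def conv1x1(branches, weights):
    """Fuse the stacked branch maps into one channel with a 1x1 convolution."""
    h, w = len(branches[0]), len(branches[0][0])
    return [[sum(wt * b[y][x] for wt, b in zip(weights, branches))
             for x in range(w)] for y in range(h)]


def atrous_pyramid_pooling(fmap, kernels, fuse_weights, rates=(1, 3, 6, 9)):
    """Four convolution branches plus image pooling, stacked and fused."""
    branches = [atrous_conv2d_same(fmap, k, r) for k, r in zip(kernels, rates)]
    branches.append(image_pooling(fmap))              # fifth branch
    return conv1x1(branches, fuse_weights)


# Toy input: a 28x28 map containing a single small "object" pixel.
fmap = [[0.0] * 28 for _ in range(28)]
fmap[14][14] = 1.0
box = [[1.0 / 9.0] * 3 for _ in range(3)]             # hypothetical averaging kernel
fused = atrous_pyramid_pooling(fmap, [box] * 4, [0.2] * 5)
print(len(fused), len(fused[0]))
```

In this toy case the single pixel still contributes to the fused map nine pixels away through the rate-9 branch, which illustrates how the larger rates widen the context seen around a small object.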

In the modified dilated residual network backpropagation step (S40), the semantic segmentation obtained by convolution from the final feature map of the atrous pyramid pooling module is compared with the result image provided by the database. If the error rate is at or above a preset value, the error is propagated back to the convolutions performed in the modified dilated residual network (DRN) step (S20) to correct their weights and reduce the error rate.
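The comparison in step S40 can be pictured as a per-pixel error rate checked against a threshold. The patent does not define how the error rate is computed or what the preset value is, so the pixel-mismatch measure, the threshold of 0.1, and the function names below are assumptions for illustration only.

```python
def pixel_error_rate(pred, truth):
    """Fraction of pixels whose predicted label differs from the reference label."""
    total = sum(len(row) for row in truth)
    wrong = sum(p != t
                for pred_row, truth_row in zip(pred, truth)
                for p, t in zip(pred_row, truth_row))
    return wrong / total


def needs_weight_update(pred, truth, threshold=0.1):
    """True when the error rate reaches the (hypothetical) preset threshold."""
    return pixel_error_rate(pred, truth) >= threshold


pred = [[0, 1],
        [1, 1]]
truth = [[0, 1],
         [0, 1]]
print(pixel_error_rate(pred, truth), needs_weight_update(pred, truth))
```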

In general, a convolutional neural network (CNN) slides a filter over the input data to extract local features, compresses them by taking the maximum (max pooling) or the average (average pooling), and passes the result to the next layer. Repeating this process is the general structure of a CNN.
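The two compression options mentioned above can be sketched as follows. The window size of 2 with a matching stride and the input values are illustrative assumptions; `pool2d` is a hypothetical helper.

```python
def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling: keep the max or the mean of each size x size window."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(0, h - size + 1, size):
        row = []
        for x in range(0, w - size + 1, size):
            window = [fmap[y + i][x + j] for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
print(pool2d(fmap, mode="max"))      # [[4, 2], [2, 8]]
print(pool2d(fmap, mode="average"))  # [[2.5, 1.0], [0.75, 6.5]]
```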

FIG. 10 shows the structure of a convolution, and FIGS. 11A and 11B show Karpathy computational graphs of the backpropagation process. FIG. 11A shows the backpropagation process for […] based on the convolution structure of FIG. 10, and FIG. 11B shows the backpropagation process for […] based on the same structure.

In FIG. 10 the input is a 5x5 matrix, and […] denotes the element in the i-th row and j-th column of the input. When a convolution with a 3x3 filter is applied to this input, the output has a size of 2x2. For example, in FIG. 10 the value of […] is output as the convolution of […].
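The stated sizes can be reproduced with a small sketch. The patent does not give the stride; without padding, a 5x5 input and a 3x3 filter produce a 2x2 output only when the stride is 2, so stride 2 is assumed here. The input values and the all-ones filter are made up for illustration.

```python
def conv2d(x, w, stride=1):
    """Valid 2-D convolution (cross-correlation form) on square inputs, no padding."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return [[sum(w[a][b] * x[i * stride + a][j * stride + b]
                 for a in range(k) for b in range(k))
             for j in range(n_out)]
            for i in range(n_out)]


x = [[r * 5 + c for c in range(5)] for r in range(5)]   # 5x5 input, values 0..24
w = [[1] * 3 for _ in range(3)]                          # 3x3 filter of ones

y = conv2d(x, w, stride=2)                # assumed stride of 2
print(len(y), len(y[0]))                  # 2 2
print(len(conv2d(x, w, stride=1)))        # 3, for comparison
```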

As shown in FIG. 11A, […] is multiplied only by the weight […] of the 3x3 filter in the forward pass, so backpropagation through it occurs only once. The gradient of […] is obtained by multiplying the incoming gradient […] by the local gradient ([…]) contributed by the other operand.

Likewise, the gradient of […] can be calculated by multiplying the incoming gradient […] by the local gradient ([…]). In addition, as shown in FIG. 11B, backpropagation for […] can be calculated in the same manner as for […].

FIG. 12 is a diagram illustrating the backpropagation process according to an embodiment of the present invention. Applying the backpropagation rule element by element, as in FIGS. 11A and 11B, is laborious.

Therefore, the gradient can be obtained simply by using the backpropagation scheme of FIG. 12. That is, the gradient with respect to the input is obtained by reversing the order of the elements of the filter used in the convolution layer and convolving the reversed filter with the gradient matrix.
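The reversed-filter rule above can be sketched as follows, as a non-limiting illustration. The sketch assumes a stride of 1 and no padding in the forward pass, which the description does not specify; the function names and the 3x3 input, 2x2 filter example sizes are likewise assumptions chosen to keep the example small.

```python
def conv2d_valid(x, w):
    """Forward pass: stride-1 cross-correlation without padding."""
    kh, kw = len(w), len(w[0])
    return [
        [
            sum(x[r + i][c + j] * w[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(x[0]) - kw + 1)
        ]
        for r in range(len(x) - kh + 1)
    ]


def rotate180(w):
    """Reverse the filter elements in both directions."""
    return [row[::-1] for row in w[::-1]]


def input_gradient(dout, w):
    """Gradient w.r.t. the input: reversed filter applied to the padded gradient matrix."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(dout), len(dout[0])
    # Zero-pad the incoming gradient by (filter size - 1) on every side.
    padded = [[0.0] * (ow + 2 * (kw - 1)) for _ in range(oh + 2 * (kh - 1))]
    for r in range(oh):
        for c in range(ow):
            padded[r + kh - 1][c + kw - 1] = dout[r][c]
    return conv2d_valid(padded, rotate180(w))


# Hypothetical 2x2 filter and 2x2 incoming gradient (the input would be 3x3).
w = [[1, 2], [3, 4]]
dout = [[1, 0], [0, 1]]
dx = input_gradient(dout, w)
```

Each input element receives the sum of the filter weights it touched in the forward pass, weighted by the corresponding incoming gradients, which is what the element-by-element procedure of FIGS. 11A and 11B computes one element at a time.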

For example, the gradient of […] can be obtained using [Equation 1] below, with reference to the upper left of FIG. 12.

[Equation 1]

In addition, because the first element […] of the incoming gradient matrix is connected to […], the gradient of the filter can be obtained by multiplying the incoming gradient ([…]) by the local gradient.

Therefore, the gradient of […] can be obtained as in [Equation 2] below.

[Equation 2]
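As a non-limiting illustration of the filter gradient described above, each filter weight accumulates the product of the incoming gradient and the input element it was multiplied with in the forward pass. The sketch assumes a stride of 1 and no padding; the function name `filter_gradient` and the example values are assumptions and are not taken from [Equation 2].

```python
def filter_gradient(x, dout):
    """Gradient w.r.t. the filter for a stride-1, unpadded convolution."""
    oh, ow = len(dout), len(dout[0])
    kh = len(x) - oh + 1
    kw = len(x[0]) - ow + 1
    return [
        [
            # Incoming gradient times local gradient (the input element),
            # summed over every output position that used weight (i, j).
            sum(dout[r][c] * x[r + i][c + j] for r in range(oh) for c in range(ow))
            for j in range(kw)
        ]
        for i in range(kh)
    ]


# Hypothetical 3x3 input and 2x2 incoming gradient, giving a 2x2 filter gradient.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
dout = [[1, 0], [0, 1]]
dw = filter_gradient(x, dout)
```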

Backpropagation is performed by comparing the input image with the result of applying the modified dilated residual network (DRN); however, the result obtained through the atrous pyramid pooling module 320 may affect the actual accuracy.

Therefore, as shown in FIG. 1, the modified dilated residual network backpropagation step (S40) compares the result obtained through the atrous pyramid pooling module 320 with the ground truth provided by the standard database, and performs the modified dilated residual network backpropagation using the corresponding preset error rate.

FIG. 13 is a flowchart illustrating the evaluation procedure of the semantic segmentation method applicable to AR according to an embodiment of the present invention. Through the process of FIG. 13, the method can be evaluated in terms of the time required for recognition as a function of the number of recognitions (elapsed time) and the intersection ratio between the predicted bounding box and the ground-truth bounding box (accuracy).

As shown in FIG. 13, in the training process, the modified dilated residual network (DRN) and the atrous pyramid pooling module are applied and trained on normalized standard databases such as the PASCAL VOC 2012 database and the Cityscapes database.

In the execution process, semantic segmentation is performed based on the previously trained data.

To evaluate accuracy and elapsed time, the images of the PASCAL VOC 2012 database used in the experiment are resized to 513x513. In addition, to evaluate the objective reliability of the semantic segmentation method applicable to AR according to an embodiment of the present invention, its accuracy is compared with that of "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," published in 2018 by Liang-Chieh Chen and four co-authors.

[Table 2] below compares the accuracy on the PASCAL VOC 2012 database of the semantic segmentation method applicable to AR according to an embodiment of the present invention with that of other papers.

[Table 2] Comparison of Semantic Segmentation Techniques and Other Papers with PASCAL VOC 2012 Database Accuracy

That is, [Table 2] takes the accuracy results that "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," by Liang-Chieh Chen and four co-authors, reported when evaluating against techniques from other papers in the same environment, and compares them with the accuracy of the semantic segmentation method applicable to AR according to an embodiment of the present invention.

As shown in [Table 2], the semantic segmentation method applicable to AR according to an embodiment of the present invention achieves higher accuracy than the techniques published in other papers. The DeepLabv3+ technique presented in "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" by Liang-Chieh Chen and four co-authors considers only the forward direction when performing semantic segmentation and does not perform backpropagation, which lowers its accuracy.

In contrast, the semantic segmentation method applicable to AR according to an embodiment of the present invention performs backpropagation when the evaluation metric, mIoU (mean intersection over union), of the final image does not reach a preset value.
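By way of non-limiting illustration, the mIoU check that triggers backpropagation can be sketched as follows. The function names `mean_iou` and `needs_backprop`, the 2x2 label maps, and the threshold of 0.8 are assumptions introduced for this sketch; the description only states that a preset value is used.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection over union across classes present in either label map."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def needs_backprop(miou, threshold):
    """Backpropagation is triggered when mIoU stays below the preset value."""
    return miou < threshold


# Hypothetical predicted and ground-truth label maps with two classes.
pred = [[0, 0], [1, 1]]
truth = [[0, 1], [1, 1]]
miou = mean_iou(pred, truth, num_classes=2)
retrain = needs_backprop(miou, threshold=0.8)  # 0.8 is an assumed preset value
```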

[Table 3] below shows the time required for semantic segmentation of all images in the PASCAL VOC 2012 database.

[Table 3] Duration of semantic segmentation for images in PASCAL VOC 2012 database

As in [Table 3], the time taken to perform semantic segmentation is measured using the images of the PASCAL VOC 2012 database as input. As shown in [Table 3], when the elapsed time is expressed as the number of images segmented per second over all images of the PASCAL VOC 2012 database, the semantic segmentation method applicable to AR according to an embodiment of the present invention achieves 64.3 frames per second (FPS).

Therefore, because the frame rate is 60 FPS or more, it can be confirmed that the method is applicable to AR applications that must keep up with the speed of human movement.
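As a non-limiting illustration, the frames-per-second figure described above (number of segmented images divided by elapsed seconds) and the 60 FPS criterion can be sketched as follows. The function names, the injectable `clock` parameter, and the placeholder `segment` callable are assumptions introduced for this sketch; the description does not disclose the measurement code.

```python
import time


def measure_fps(segment, images, clock=time.perf_counter):
    """Return the number of images processed per second by `segment`."""
    start = clock()
    for image in images:
        segment(image)
    elapsed = clock() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")


def meets_realtime_target(fps, target_fps=60.0):
    """Check the 60 FPS criterion used in the description."""
    return fps >= target_fps


# Deterministic example: a fake clock reports 2 seconds for 4 images.
ticks = iter([0.0, 2.0])
fps = measure_fps(lambda image: None, [1, 2, 3, 4], clock=lambda: next(ticks))
```

In practice `segment` would wrap the trained network's inference call and the default `time.perf_counter` clock would be used.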

Meanwhile, to evaluate the objective reliability of the semantic segmentation method applicable to AR according to an embodiment of the present invention, it is compared with "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," published in 2018 by Liang-Chieh Chen and four co-authors.

To evaluate accuracy, the images of the Cityscapes database used in the experiment are resized to 513x513, as with the images of the PASCAL VOC 2012 database.

[Table 4] below compares the accuracy on the Cityscapes database of the semantic segmentation method applicable to AR according to an embodiment of the present invention with that of other papers.

[Table 4] Comparison of Semantic Segmentation Techniques and Other Papers with Cityscapes Database Accuracy

That is, [Table 4] takes the results that "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," by Liang-Chieh Chen and four co-authors, reported when evaluating against techniques from other papers in the same environment, and compares them with the results of the semantic segmentation method applicable to AR according to an embodiment of the present invention.

As shown in [Table 4], the semantic segmentation method applicable to AR according to an embodiment of the present invention achieves higher accuracy than the techniques published in other papers. The DeepLabv3+ technique of "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" by Liang-Chieh Chen and four co-authors considers only the forward direction when performing semantic segmentation and does not perform backpropagation, which lowers its accuracy.

In contrast, the semantic segmentation method applicable to AR according to an embodiment of the present invention performs backpropagation when the evaluation metric, mIoU, of the final image does not reach a preset value, and thereby achieves better accuracy than the techniques published in other papers.

[Table 5] below shows the time required for semantic segmentation of all images in the Cityscapes database.

[Table 5] Duration of semantic segmentation for images in Cityscapes database

As in [Table 5], the time taken to perform semantic segmentation is measured using the images of the Cityscapes database as input. As shown in [Table 5], when the elapsed time is expressed as the number of images segmented per second on the Cityscapes database, the semantic segmentation method applicable to AR according to an embodiment of the present invention achieves 61 frames per second (FPS).

FIG. 14 is a block diagram illustrating the semantic segmentation system applicable to AR according to an embodiment of the present invention. The semantic segmentation system 10 applicable to AR according to an embodiment of the present invention may include an image input unit 100, a segmented image acquisition unit 200, a feature point extraction unit 300, a determination unit 400, and a storage unit 500.

The image input unit 100 receives image information. That is, the image input unit 100 may receive image information captured by an imaging device such as a camera, or image information from a standard database.

The standard database may include the PASCAL VOC 2012 database and the Cityscapes database.

The segmented image acquisition unit 200 classifies objects in the image information received through the image input unit 100 and labels the classified objects to obtain a semantic segmentation image.

The feature point extraction unit 300 includes a dilated residual network module 310 and an atrous pyramid pooling module 320. The dilated residual network module 310 uses atrous convolution to extract a feature map from the image acquired by the segmented image acquisition unit 200.

To effectively extract objects that occupy a small area of the image, the atrous pyramid pooling module 320 selectively applies various atrous convolutions according to the feature map extracted by the dilated residual network module 310, and arranges the atrous convolutions in parallel to extract feature maps.
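By way of non-limiting illustration, parallel atrous convolutions with different dilation rates, followed by a 1x1 fusion into a single channel, can be sketched as follows. The function names, the dilation rates (1, 2, 3), the all-ones 3x3 kernel, the zero padding, and the equal fusion weights are assumptions introduced for this sketch; image pooling and up-sampling are omitted, and the description does not disclose the actual rates or weights.

```python
def atrous_conv2d(fmap, kernel, rate):
    """Atrous convolution with zero padding so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Filter taps are spaced `rate` pixels apart.
                    rr = r + (i - ch) * rate
                    cc = c + (j - cw) * rate
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += fmap[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def atrous_pyramid(fmap, kernel, rates=(1, 2, 3), mix=None):
    """Run the branches in parallel, stack them, and fuse with 1x1 weights."""
    branches = [atrous_conv2d(fmap, kernel, rate) for rate in rates]
    mix = mix or [1.0 / len(branches)] * len(branches)
    h, w = len(fmap), len(fmap[0])
    # A 1x1 convolution to one channel is a per-pixel weighted sum over branches.
    fused = [
        [sum(m * b[r][c] for m, b in zip(mix, branches)) for c in range(w)]
        for r in range(h)
    ]
    return branches, fused


# Hypothetical 5x5 feature map with a single active pixel at the center.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 1.0
kernel = [[1.0] * 3 for _ in range(3)]
branches, fused = atrous_pyramid(fmap, kernel)
```

The same 3x3 kernel reaches pixels one, two, or three positions away depending on the rate, so the branches cover different receptive fields at the same number of multiplications per output pixel.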

The determination unit 400 compares the result extracted by the atrous pyramid pooling module 320 with the result image provided by a preset standard database. Based on the comparison, the determination unit 400 corrects the weights to reduce the error rate.

That is, when the comparison of the images yields an error rate equal to or greater than a preset error threshold, the determination unit 400 corrects the weights of the atrous convolutions performed by the dilated residual network module 310 so that the feature map of the image is extracted again.

The storage unit 500 may store the image information received through the image input unit 100 or the standard database. The storage unit 500 also stores the feature map extracted by the feature point extraction unit 300.

As described above, the semantic segmentation method and system applicable to AR according to an embodiment of the present invention can effectively extract small objects by applying atrous convolution to the extracted feature map. In addition, the elapsed time can be reduced by arranging the atrous convolutions in parallel to extract features and then forming the feature maps into a pyramid shape.

In addition, the error rate can be reduced and the accuracy improved by using dilated residual network backpropagation.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments and includes all modifications that a person of ordinary skill in the art to which the invention pertains could readily make from the embodiments and that are recognized as equivalent.

10: semantic segmentation system
100: image input unit 200: segmented image acquisition unit
300: feature point extraction unit 310: dilated residual network module
320: atrous pyramid pooling module
400: determination unit 500: storage unit

Claims

A semantic segmentation method applicable to AR, comprising:
a semantic segmentation image acquisition step (S10) of classifying and labeling objects in an input image to obtain a semantic segmentation image;
a modified dilated residual network (DRN) step (S20) of extracting a feature map from the image acquired in the image acquisition step (S10), using atrous convolution, which maintains spatial information through various dilation rates and extracts features with the same amount of computation, and a dilated residual network (DRN), which generates a feature map by selectively extracting only important features from the input image while preserving the spatial features of the entire image; and
an atrous pyramid pooling module step (S30) of selectively applying various atrous convolutions according to the extracted feature map in order to effectively extract objects that occupy a small area of the image, arranging the atrous convolutions in parallel while applying image pooling to the feature map finally extracted in the modified dilated residual network step (S20), stacking the feature maps extracted by the parallel atrous convolutions in a pyramid shape, and applying a 1x1 convolution to extract a one-channel feature map.

The method of claim 1,
further comprising a modified dilated residual network backpropagation step (S40) of comparing the result extracted in the atrous pyramid pooling module step (S30) with the result image provided by a preset standard database, and correcting the weights based on the comparison to reduce the error rate.

The method of claim 1,
wherein, in the atrous pyramid pooling module step (S30), the feature maps extracted in the atrous pyramid pooling module step (S30) are stacked in a pyramid shape, and
the feature maps stacked in the pyramid shape are formed into a one-channel feature map by applying a 1x1 convolution.

The method of claim 3,
wherein a feature map smaller than a preset reference size is adjusted to the same size through up-sampling.

(Deleted)

A semantic segmentation system applicable to AR, comprising:
an image input unit that receives image information;
a segmented image acquisition unit that classifies objects in the image information received through the image input unit and labels the classified objects to obtain a semantic segmentation image;
a dilated residual network module that extracts a feature map from the image acquired by the segmented image acquisition unit, using atrous convolution, which maintains spatial information through various dilation rates and extracts features with the same amount of computation, and a dilated residual network (DRN), which generates a feature map by selectively extracting only important features from the input image while preserving the spatial features of the entire image;
an atrous pyramid pooling module that selectively applies various atrous convolutions according to the feature map extracted by the dilated residual network module in order to effectively extract objects that occupy a small area of the image, arranges the atrous convolutions in parallel while applying image pooling to the feature map finally extracted by the dilated residual network module, stacks the feature maps extracted by the parallel atrous convolutions in a pyramid shape, and applies a 1x1 convolution to extract a one-channel feature map; and
a determination unit that compares the result extracted by the atrous pyramid pooling module with the result image provided by a preset standard database, and corrects the weights based on the comparison to reduce the error rate.

The system of claim 6,
wherein, when the comparison of the images yields an error rate equal to or greater than a preset error threshold, the determination unit corrects the weights of the atrous convolutions performed by the dilated residual network module and re-extracts the feature map of the image.