KR20230099941A

KR20230099941A - Generalization Method and System of construction object segmentation model using self-supervised learning and copy-paste data augmentation

Info

Publication number: KR20230099941A
Application number: KR1020210189394A
Authority: KR
Inventors: 김홍조; 홍예지; 김형관
Original assignee: 연세대학교 산학협력단
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2023-07-05

Abstract

The present invention is a method for generalizing a construction object segmentation model, performed by a computing device that uses a database and a control server with computing capability. The method comprises: step S100, in which a data augmentation unit (100) augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between foreground and background; step S200, in which a semantic segmentation unit (200) performs semantic segmentation based on supervised learning; step S300, in which a denoising and target learning unit (300) performs self-supervised learning by inputting unannotated target domain images into the semantic segmentation model trained on source domain data; and step S400, in which a knowledge distillation unit (400) trains a student model using the model trained by the denoising and target learning unit (300) as the teacher model. The method thereby generalizes a construction object segmentation model using self-supervised learning and copy-paste data augmentation.

Description

Generalization Method and System of construction object segmentation model using self-supervised learning and copy-paste data augmentation

The present invention relates to a method and system for generalizing a construction object segmentation model, and more specifically to a generalization method and system that use self-supervised learning and copy-paste data augmentation.

Image-based recognition of construction objects has been actively studied as a way to automate monitoring tasks such as safety management, productivity management, and maintenance of the built environment in the construction industry.

Among image-based object recognition methods, those using supervised deep learning achieve high accuracy and are regarded as the most competitive.

However, supervised deep learning models perform poorly on data whose distribution differs from that of the data used to train them.

One way to address this problem is to train the model on a large volume of construction object image data covering diverse visual features.

However, this approach is labor-intensive and time-consuming because a person must create the annotation for every image.

To address the sharp drop in model performance when the data domain changes, research on domain adaptation has been conducted in computer science.

Aiming to build models that can handle the mismatch between domain distributions, researchers have identified the state of the art by reporting the performance of their models on the same benchmark datasets. As of August 2021, ProDA achieved the highest performance among studies that performed semantic segmentation of Cityscapes (real road-driving data) using GTA5 (synthetic road-driving data rendered with a game engine).

The ProDA methodology performed well at detecting cars, roads, buildings, pedestrians, and similar objects in real road images using synthetic road images.

However, the ProDA methodology was found not to work well for detecting construction objects (workers, hard hats, etc.) in construction site images.

(Document 1) Korean Registered Patent Publication No. 10-2160224 (2020.09.21)

The method and system for generalizing a construction object segmentation model using self-supervised learning and copy-paste data augmentation according to the present invention address the following problems.

First, to readily detect construction objects such as workers and hard hats in construction site images.

Second, to apply an existing trained model to other sites without having to annotate image data captured at a new construction site.

Third, to automatically monitor construction site safety and the like using the detected construction objects.

The problems addressed by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

The present invention is a method for generalizing a construction object segmentation model, performed by a computing device that uses a database and a control server with computing capability. The method may comprise: step S100, in which a data augmentation unit augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between foreground and background; step S200, in which a semantic segmentation unit performs semantic segmentation based on supervised learning; step S300, in which a denoising and target learning unit performs self-supervised learning by inputting unannotated target domain images into the semantic segmentation model trained on source domain data; and step S400, in which a knowledge distillation unit trains a student model using the model trained by the denoising and target learning unit as the teacher model.

In the present invention, the data augmentation unit in step S100 may randomly select one of the target domain images, remove the region containing the target domain foreground from the selected image, and insert a predetermined part of the foreground of the source domain data into the background of the selected target domain image.

In the present invention, the semantic segmentation unit in step S200 uses DeepLabV2. Before training, the DeepLabV2 model is in a state pretrained on ImageNet, and its parameters may be fine-tuned using source domain images and annotations.

In the present invention, the denoising and target learning unit in step S300 may perform step S310, in which a denoising unit removes noise from pseudo labels using prototypes, and step S320, in which a target structure learning unit makes the features of the target domain cluster compactly.

In the present invention, the denoising unit in step S310 may generate pseudo labels for the target domain using the semantic segmentation model trained by the semantic segmentation unit, and may compute prototypes, which are representative features.

In the present invention, the distance between each prototype and each feature is used in determining the pseudo label so as to remove pseudo-label noise: the farther a feature is from the prototype of the k-th class, the lower the probability that the feature is assigned to the k-th class.
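The prototype-based denoising described above can be illustrated with a small sketch. It is not the patent's implementation: the text only states that a larger distance to the k-th prototype lowers the probability of class k, so the softmax over negative Euclidean distances, the `temperature` parameter, and all function names here are assumptions chosen for illustration. Plain Python lists stand in for feature tensors.

```python
import math

def class_prototypes(features, pseudo_labels, num_classes):
    """Prototype of each class = mean feature of the points pseudo-labelled with it."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feature, k in zip(features, pseudo_labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += feature[d]
    # Classes with no members keep a zero prototype in this sketch.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def prototype_weights(feature, prototypes, temperature=1.0):
    """Assumed form: softmax over negative distances, so farther means smaller weight."""
    exps = [math.exp(-math.dist(feature, p) / temperature) for p in prototypes]
    total = sum(exps)
    return [e / total for e in exps]

def denoise_label(feature, probs, prototypes, temperature=1.0):
    """Re-weight the model's class probabilities by prototype proximity, then take argmax."""
    weights = prototype_weights(feature, prototypes, temperature)
    scores = [w * p for w, p in zip(weights, probs)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: two well-separated classes in a 2-D feature space.
features = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
pseudo_labels = [0, 0, 1, 1]
prototypes = class_prototypes(features, pseudo_labels, num_classes=2)

# A point near class 0 whose raw prediction slightly favours class 1.
noisy_feature, noisy_probs = [0.1, 0.1], [0.45, 0.55]
weights = prototype_weights(noisy_feature, prototypes)
corrected = denoise_label(noisy_feature, noisy_probs, prototypes)
```

In this toy case the raw prediction favours class 1, but the point lies next to the class 0 prototype, so the re-weighted label becomes class 0.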

In the present invention, the knowledge distillation unit in step S400 may use, as the student model, a DeepLabV2 model pretrained with SimCLRv2.

In the present invention, any one of the following may be used as the loss function: the categorical cross-entropy loss on the source domain data, the categorical cross-entropy loss on the target domain data, or the sum of the Kullback-Leibler divergence between the teacher model and the student model.
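As a rough illustration of the loss terms named above, the sketch below computes a per-pixel categorical cross-entropy and a Kullback-Leibler divergence summed over pixels. The direction KL(teacher || student), the `eps` smoothing constant, and the function names are assumptions; the text does not specify them.

```python
import math

def cross_entropy(probs, target_class, eps=1e-12):
    """Categorical cross-entropy for one pixel: -log of the probability of the true class."""
    return -math.log(probs[target_class] + eps)

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one pixel (direction is an assumption)."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

def summed_kl(teacher_pixels, student_pixels):
    """Sum of the per-pixel KL divergence over all pixels."""
    return sum(kl_divergence(t, s) for t, s in zip(teacher_pixels, student_pixels))

# Toy class distributions (background, worker, hard hat) for two pixels.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]]

ce_value = cross_entropy(student[0], target_class=0)
kl_same = summed_kl(teacher, teacher)
kl_diff = summed_kl(teacher, student)
```

The divergence is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions move apart.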

In the present invention, the semantic segmentation models obtained through each of the above steps may be evaluated using the metric mIoU or DAI.
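For reference, mIoU (mean intersection over union) can be computed as in the following sketch. The convention of skipping classes that appear in neither the prediction nor the ground truth is an assumption, as is the function name. DAI is not sketched because its formula is not given in this part of the text.

```python
def mean_iou(pred, truth, num_classes):
    """Mean over classes of |pred AND truth| / |pred OR truth|, computed pixel-wise."""
    ious = []
    for k in range(num_classes):
        inter = union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += (p == k and t == k)
                union += (p == k or t == k)
        if union:  # skip classes absent from both maps (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Class indices: 0 = background, 1 = worker, 2 = hard hat.
truth = [[0, 0],
         [1, 2]]
pred = [[0, 1],
        [1, 2]]
score = mean_iou(pred, truth, num_classes=3)
```

Here the per-class IoU values are 1/2, 1/2, and 1, so the mean is 2/3.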

In claim 9, a DAI of 1 in the DAI evaluation indicates that the domain-adaptive semantic segmentation model performs the same as a model trained by supervised learning on target domain data.

The present invention is also a system for generalizing a construction object segmentation model, implemented by a computing device that uses a database and a control server with computing capability. The system may comprise: a data augmentation unit that augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between foreground and background; a semantic segmentation unit that performs semantic segmentation based on supervised learning; a denoising and target learning unit that performs self-supervised learning by inputting unannotated target domain images into the semantic segmentation model trained on source domain data; and a knowledge distillation unit that trains a student model using the model trained by the denoising and target learning unit as the teacher model.

In the present invention, the data augmentation unit may randomly select one of the target domain images, remove the region containing the target domain foreground from the selected image, and insert a predetermined part of the foreground of the source domain data into the background of the selected target domain image.

In the present invention, the semantic segmentation unit uses DeepLabV2. Before training, the DeepLabV2 model is in a state pretrained on ImageNet, and its parameters may be fine-tuned using source domain images and annotations.

In the present invention, the denoising and target learning unit may include a denoising unit that removes noise from pseudo labels using prototypes, and a target structure learning unit that makes the features of the target domain cluster compactly.

In the present invention, the denoising unit may generate pseudo labels for the target domain using the semantic segmentation model trained by the semantic segmentation unit, and may compute prototypes, which are representative features.

In the present invention, the knowledge distillation unit may use, as the student model, a DeepLabV2 model pretrained with SimCLRv2.

In the present invention, a DAI of 1 in the DAI evaluation indicates that the domain-adaptive semantic segmentation model performs the same as a model trained by supervised learning on target domain data.

The present invention may be implemented as a computer program that is stored on a computer-readable recording medium and that, in combination with hardware, causes a computer to execute the method for generalizing a construction object segmentation model using self-supervised learning and copy-paste data augmentation according to the present invention.

The method and system for generalizing a construction object segmentation model using self-supervised learning and copy-paste data augmentation according to the present invention have the following effects.

First, construction objects such as workers and hard hats can be readily detected in construction site images.

Second, an existing trained model can be applied to other sites without having to annotate image data captured at a new construction site.

Third, construction site safety and the like can be monitored automatically using the detected construction objects.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a flowchart of the method for generalizing a construction object segmentation model using self-supervised learning and copy-paste data augmentation according to the present invention.
FIG. 2 shows an embodiment of an image for semantic segmentation and the annotation of that image in the present invention.
FIG. 3 is a flowchart of domain-adaptive semantic segmentation performed without copy-paste data augmentation.
FIG. 4 is a flowchart of domain-adaptive semantic segmentation performed with copy-paste data augmentation.
FIG. 5 is a flowchart of semantic segmentation performed using target domain data, for comparison with the domain-adaptive semantic segmentation model.
FIG. 6 shows an embodiment of the copy-paste data augmentation method according to the present invention.
FIG. 7 shows example images from the datasets used in the experiments of the present invention.
FIG. 8 is a block diagram of the system for generalizing a construction object segmentation model according to the present invention.

Hereinafter, embodiments of the present invention are described with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. As such persons will readily understand, the embodiments described below may be modified in various forms without departing from the concept and scope of the present invention. Wherever possible, identical or similar parts are indicated by the same reference numerals in the drawings.

The terminology used in this specification is intended only to describe particular embodiments and is not intended to limit the present invention. Singular forms used herein include plural forms unless the wording clearly indicates otherwise.

As used in this specification, "comprising" specifies particular features, regions, integers, steps, operations, elements, and/or components, and does not exclude the presence or addition of other particular features, regions, integers, steps, operations, elements, components, and/or groups.

All terms used in this specification, including technical and scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in dictionaries are further interpreted as having meanings consistent with the related technical literature and the present disclosure, and are not interpreted in an idealized or overly formal sense unless so defined.

Expressions of direction used in this specification, such as front/back/left/right, up/down, and longitudinal/transverse, may be interpreted with reference to the directions shown in the drawings.

Supervised learning is an approach in which every data item is labeled and the labeled data are used to train a model to predict the answer. The model is trained on a dataset that has been labeled, that is, one in which the correct answer is attached to each data item. Its drawback is that datasets are hard to assemble, because labels must be attached to a very large number of data items.

Self-supervised learning is an approach that emerged to overcome this drawback of supervised learning.

Self-supervised learning offers a way to use freely available text and image content to train computer algorithms without human supervision. It means letting the model produce labels by itself from unlabeled data: one part of the input data serves as the supervision for another part. In other words, labels are generated automatically from a large amount of unlabeled raw data through the relationships between parts of the data, and are then used for supervised learning.

With self-supervised learning, the large amounts of video, image, and text data available on the web or in the cloud can be collected and used for training without separate labeling, which reduces the cost and time of building a new dataset.

Semantic segmentation is the process of dividing a digital image into multiple sets of pixels, simplifying the representation of the image into something easier to interpret. Together with object detection, semantic segmentation is widely used in computer vision.

The domain in which the model has been operating is called the source domain, and the new domain is called the target domain. Transferring and reusing existing knowledge when a task or domain changes is called transfer learning, and domain adaptation is a subfield of transfer learning.

Domain adaptation is a research field whose aim, when the domain of application changes somewhat, is to adapt information from the existing domain for use in a new domain that is different but related.

One option is to build as much training data for the target domain as for the source domain and construct the same system anew in the target domain, but this is an obstacle in terms of time and cost. Domain adaptation was proposed to address this problem by preventing a sharp drop in performance relative to the source domain even when only a small amount of target domain training data is built.

Fine-tuning refers to further training a model by taking all of the pretrained weights and adding a minimal set of weights for the downstream task.

Knowledge distillation means transferring the knowledge of a large, well-trained network (the teacher network) to a small network (the student network) that is intended for actual use. In deep learning, knowledge distillation is the process of transferring the knowledge distilled from a large model (the teacher network) to a small model (the student network).

Training a semantic segmentation model by supervised learning requires image annotations (labels). FIG. 2 shows an embodiment of an image for semantic segmentation and the annotation of that image in the present invention. In the annotation of FIG. 2, white denotes the background, red denotes a worker, and yellow denotes a hard hat.
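A color-coded annotation of this kind is usually converted to a map of class indices before training. The sketch below shows one way to do that; the exact RGB values, the class numbering, and the function name are assumptions, since the text only names the colors.

```python
# Assumed RGB values for the colours named in the text; the actual values in
# the annotation files are not stated, and the class numbering is hypothetical.
COLOR_TO_CLASS = {
    (255, 255, 255): 0,  # white  -> background
    (255, 0, 0): 1,      # red    -> worker
    (255, 255, 0): 2,    # yellow -> hard hat
}

def decode_annotation(rgb_rows):
    """Convert rows of (R, G, B) tuples into rows of class indices."""
    return [[COLOR_TO_CLASS[pixel] for pixel in row] for row in rgb_rows]

# A 2 x 2 toy annotation: background on the left, hard hat above a worker on the right.
annotation = [
    [(255, 255, 255), (255, 255, 0)],
    [(255, 255, 255), (255, 0, 0)],
]
mask = decode_annotation(annotation)
```

An unknown color raises `KeyError` in this sketch, which makes annotation errors visible instead of silently mapping them to the background.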

In an acquired image, the foreground refers to the objects of interest, that is, construction objects such as workers and hard hats, and the background refers to the remainder, which is of lesser importance.

With existing supervised deep learning models for construction object recognition, high performance could not be expected on images captured at a new site that were not part of the training data. To apply the model to a new site, the new image data therefore had to be annotated and the model trained again.

The present invention, by contrast, has the technical feature that a model applicable to other sites can be built from a previously trained model without creating annotations for the new image data.

The present invention is described below with reference to the drawings. Note that parts of the drawings may be exaggerated in order to explain the features of the present invention; in such cases they should be interpreted in light of the overall purpose of this specification.

FIG. 1 is a flowchart of the method for generalizing a construction object segmentation model using self-supervised learning and copy-paste data augmentation according to the present invention.

The present invention is a method for generalizing a construction object segmentation model, performed by a computing device that uses a database and a control server with computing capability. The method may comprise: step S100, in which the data augmentation unit (100) augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between foreground and background; step S200, in which the semantic segmentation unit (200) performs semantic segmentation based on supervised learning; step S300, in which the denoising and target learning unit (300) performs self-supervised learning by inputting unannotated target domain images into the semantic segmentation model trained on source domain data; and step S400, in which the knowledge distillation unit (400) trains a student model using the model trained by the denoising and target learning unit (300) as the teacher model.

Step S100 according to the present invention is described below.

Step S100 according to the present invention concerns copy-paste data augmentation.

In step S100, the data augmentation unit (100) augments data so that the semantic segmentation model can learn the visual features of the target domain background and the boundary between foreground and background.

In the present invention, "visual features" means the colors, textures, lines, vertices, and so on observed in an image. A deep learning model learns visual features through the convolution operations involved in training and, on that basis, distinguishes whether a pixel of the image is background, worker, or hard hat.

In the present invention, to increase the generality of the deep learning model, it is preferable to include images with diverse visual features in the training data so that the model can learn diverse visual features.

In the present invention, "foreground" means the non-background classes, that is, workers and hard hats. Providing varied boundaries between workers and the background, and between hard hats and the background, is another way to let the deep learning model learn diverse visual features.

Step S100 augments data so that the semantic segmentation model can learn the visual features of the target domain background and the varied boundaries between foreground and background.

First, one of the target domain images is selected at random. Next, the region containing the target domain foreground (workers and hard hats) is removed from the selected image. Finally, a predetermined part of the foreground of the source domain data is inserted into the target domain background.

In one embodiment, a "worker and hard hat pair" from the source domain data was pasted at a random position and at a randomly changed scale onto the image from which the target domain foreground had been removed, and this operation was repeated several times.
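The three steps above and the embodiment can be sketched as follows. This is an illustrative sketch, not the patent's implementation: how the target foreground region is located, what the removed region is filled with, the scale range, and every name here (`remove_region`, `paste_foreground`, `target_fg_boxes`, `n_pastes`) are assumptions. Images are grayscale nested lists to keep the example dependency-free.

```python
import random

def remove_region(image, box, fill=0):
    """Return a copy of image with the box (top, left, bottom, right) blanked out."""
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out

def rescale(patch, patch_label, scale):
    """Nearest-neighbour resize of a patch and its label map by the same factor."""
    h, w = len(patch), len(patch[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    ys = [min(h - 1, int(y / scale)) for y in range(new_h)]
    xs = [min(w - 1, int(x / scale)) for x in range(new_w)]
    resized = [[patch[y][x] for x in xs] for y in ys]
    resized_label = [[patch_label[y][x] for x in xs] for y in ys]
    return resized, resized_label

def paste_foreground(image, label, patch, patch_label, rng, scale_range=(0.5, 1.5)):
    """Paste one source foreground patch in place at a random position and scale.

    Assumes the rescaled patch still fits inside the image.
    """
    patch, patch_label = rescale(patch, patch_label, rng.uniform(*scale_range))
    ph, pw = len(patch), len(patch[0])
    top = rng.randint(0, len(image) - ph)
    left = rng.randint(0, len(image[0]) - pw)
    for y in range(ph):
        for x in range(pw):
            if patch_label[y][x] != 0:  # copy foreground pixels only
                image[top + y][left + x] = patch[y][x]
                label[top + y][left + x] = patch_label[y][x]

def copy_paste_augment(target_images, target_fg_boxes, source_patches, n_pastes=3, seed=0):
    """Pick a random target image, remove its foreground, paste source foreground."""
    rng = random.Random(seed)
    idx = rng.randrange(len(target_images))
    image = remove_region(target_images[idx], target_fg_boxes[idx])
    label = [[0] * len(image[0]) for _ in image]  # 0 = background
    for _ in range(n_pastes):
        patch, patch_label = rng.choice(source_patches)
        paste_foreground(image, label, patch, patch_label, rng)
    return image, label

# Toy data: one 8 x 8 target image and one source "worker and hard hat pair".
targets = [[[9] * 8 for _ in range(8)]]
boxes = [(2, 2, 5, 5)]                       # hypothetical target foreground box
pair = ([[5, 5], [5, 5]], [[2, 2], [1, 1]])  # pixels, labels (2 = hard hat, 1 = worker)
image, label = copy_paste_augment(targets, boxes, [pair])
```

The function returns both the augmented image and a label map, so the pasted source foreground carries its annotation with it and the result can be used directly for supervised training.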

In the present invention, with respect to domain adaptation, copy-paste data augmentation makes it easier for the model to learn the boundary between the target domain background and the source domain foreground.

Step S200 according to the present invention is described below.

Step S200 according to the present invention relates to semantic segmentation.

In step S200, the semantic segmentation unit 200 performs semantic segmentation based on supervised learning.

Step S200 is supervised semantic segmentation and may use the existing DeepLabV2 model. Before training, the DeepLabV2 model is pretrained on ImageNet. The model parameters may then be fine-tuned using the source domain images and annotations.

Step S300 according to the present invention is described below.

Step S300 according to the present invention relates to prototypical pseudo label denoising and target structure learning.

In step S300, the denoising and target learning unit 300 performs self-supervised learning by inputting target domain images, without annotations, into the semantic segmentation model trained using the source domain data.

Step S300 may consist of a pseudo label denoising step (S310) and a target structure learning step (S320).

Step S310 relates to pseudo label denoising.

In step S310, the denoising unit 310 may use the semantic segmentation model trained by the semantic segmentation unit 200 to generate pseudo labels for the target domain and to compute prototypes, which are representative features.

Because of the distribution difference between the source domain and the target domain, the pseudo labels may contain noise.

In step S310, the distance between each prototype and each feature point may be used in determining the pseudo label, in order to remove pseudo label noise. The farther a feature point is from the prototype of the k-th class, the lower the probability that the feature point is assigned to the k-th class.
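The distance-based rule of step S310 can be illustrated as follows. The description states only that a larger distance to the prototype of the k-th class lowers the probability of the k-th class. The specific choices in this sketch are assumptions: prototypes are taken as per-class mean features, the weighting is a softmax over negative Euclidean distances with a hypothetical `temperature` parameter, and the denoised label is the class maximizing the product of weight and predicted probability.

```python
import math


def class_prototypes(features, labels, num_classes):
    """Prototype of class k = mean feature of the points currently labelled k.
    A class with no points gets an all-zero prototype in this sketch."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for d, value in enumerate(feat):
            sums[lab][d] += value
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def distance_weights(feature, prototypes, temperature=1.0):
    """Softmax over negative distances: a farther prototype gets a smaller weight."""
    dists = [math.dist(feature, proto) for proto in prototypes]
    nearest = min(dists)  # subtracted for numerical stability
    exps = [math.exp(-(d - nearest) / temperature) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]


def denoise_label(feature, class_probs, prototypes, temperature=1.0):
    """Re-weight the predicted class probabilities by prototype distance
    and return the index of the highest-scoring class."""
    weights = distance_weights(feature, prototypes, temperature)
    scores = [w * p for w, p in zip(weights, class_probs)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a pixel whose network prediction slightly favours one class can still be relabelled to another class when its feature lies much closer to that other class's prototype.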

Step S320 relates to target structure learning.

In step S320, the target structure learning unit 320 may make the feature points of the target domain cluster densely.

Target domain data tends to be widely scattered in the feature space because of its distribution difference from the source domain. This scattering hinders pseudo label denoising. To prevent this, step S320 according to the present invention may make the target domain feature points lie densely together in the feature space.

Step S400 according to the present invention is described below.

Step S400 according to the present invention relates to knowledge distillation.

In step S400, the knowledge distillation unit 400 may train a student model using the model trained by the denoising and target learning unit 300 as a teacher model.

In step S400, to improve semantic segmentation performance, the student model may be trained using the model trained in step S300 as the teacher model. A DeepLabv2 model pretrained with SimCLRv2 may be used as the student model.

The loss function in step S400 may be any one of the categorical cross-entropy loss on the source domain data, the categorical cross-entropy loss on the target domain data, or the sum of the Kullback-Leibler divergence between the teacher model and the student model.
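The loss terms named for step S400 can be written out in a small sketch. It assumes that model outputs are already per-pixel class probability lists, that each term is averaged over pixels, and that the divergence is taken in the direction KL(teacher || student). None of these details is specified in the description, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def categorical_cross_entropy(probs, labels):
    """Mean of -log p[label] over pixels.
    probs: list of per-pixel class distributions; labels: list of class ids."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / len(labels)


def kl_divergence(teacher_probs, student_probs):
    """Mean KL(teacher || student) over pixels (direction is an assumption)."""
    total = 0.0
    for t, s in zip(teacher_probs, student_probs):
        total += sum(ti * math.log((ti + EPS) / (si + EPS)) for ti, si in zip(t, s))
    return total / len(teacher_probs)
```

The cross-entropy function applies equally to source domain data with annotations and to target domain data with pseudo labels; only the label source differs.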

The following describes how the models produced by the construction object segmentation model generalization method according to the present invention are evaluated.

First, semantic segmentation models 1-8 are evaluated. To verify the effect of each step of the present invention on semantic segmentation, the performance of the semantic segmentation model obtained after each step (models 1-8) is evaluated.

Next, semantic segmentation model 9 is evaluated.

Unlike semantic segmentation models 1 to 8, semantic segmentation model 9 is a DeepLabv2 model trained by supervised learning on the target domain images and annotations. Model 9 was used as a control to verify the effectiveness of the present invention.

Next, the evaluation metric mIoU (mean Intersection over Union), used to evaluate semantic segmentation models 1-9, is described.

mIoU is the mean of the IoU values. To evaluate a semantic segmentation model, the IoU (Intersection over Union) is computed for each class and then averaged over the classes. IoU is computed as true positive / (true positive + false positive + false negative).
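The mIoU computation can be checked on a small example. The sketch below follows the IoU formula given above; representing predictions and ground truth as flat lists of class ids, and skipping classes that appear in neither list, are assumptions made for illustration.

```python
def per_class_iou(pred, truth, num_classes):
    """IoU_k = TP / (TP + FP + FN) for each class k.
    Returns None for a class absent from both pred and truth."""
    ious = []
    for k in range(num_classes):
        tp = sum(1 for p, t in zip(pred, truth) if p == k and t == k)
        fp = sum(1 for p, t in zip(pred, truth) if p == k and t != k)
        fn = sum(1 for p, t in zip(pred, truth) if p != k and t == k)
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(pred, truth, num_classes):
    """Average the per-class IoU values over the classes that are present."""
    valid = [v for v in per_class_iou(pred, truth, num_classes) if v is not None]
    return sum(valid) / len(valid)


# Six pixels, three classes (0 = background, 1 = worker, 2 = hard hat; ids assumed)
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
print(per_class_iou(pred, truth, 3))  # IoU of 1/3, 2/3 and 1/2 for classes 0, 1, 2
print(mean_iou(pred, truth, 3))
```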

Next, the DAI (Domain Adaptation Index), used to evaluate domain-adaptive semantic segmentation models 1-8, is described.

DAI may be used to verify the effect of a domain-adaptive semantic segmentation model. DAI is the performance of the domain-adaptive semantic segmentation model trained without target domain annotations divided by the performance of the supervised semantic segmentation model trained with target domain annotations.

In general computer science, the main way to evaluate domain-adaptive semantic segmentation models is to report how much the domain-adaptive model trained without target domain annotations improves over a semantic segmentation model trained only on source domain data, when both are tested on target domain data.

However, this evaluation method is not appropriate given the purpose of developing domain-adaptive models in the construction field.

In the construction field, the goal is to build a domain-adaptive semantic segmentation model that performs semantic segmentation as successfully as a supervised model, so it is reasonable to compare against a semantic segmentation model trained by supervised learning on target domain data.

When DAI is 1, the domain-adaptive semantic segmentation model performs the same as the model trained by supervised learning on target domain data, and can be judged able to replace the supervised model.
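The DAI defined above reduces to a single ratio. The sketch below uses mIoU as the performance measure, and the numeric values in the usage lines are hypothetical and chosen only to show the three cases (below, equal to, and above the supervised model).

```python
def domain_adaptation_index(miou_adapted, miou_supervised):
    """DAI = performance of the domain-adaptive model (trained without target
    annotations) / performance of the supervised model (trained with them)."""
    if miou_supervised <= 0:
        raise ValueError("supervised performance must be positive")
    return miou_adapted / miou_supervised


# Hypothetical mIoU values, for illustration only
print(domain_adaptation_index(0.60, 0.80))  # below 1: worse than the supervised model
print(domain_adaptation_index(0.80, 0.80))  # exactly 1: same performance
print(domain_adaptation_index(0.88, 0.80))  # above 1: better than the supervised model
```

A DAI expressed as a percentage, as in the experimental results, is this ratio multiplied by 100.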

The present invention is described below through an embodiment in which it was applied at the extension construction site of Yonsei University's Engineering Building 1. The model according to the present invention is also evaluated.

The present invention combines copy-paste data augmentation, prototypical pseudo label denoising and target structure learning, and knowledge distillation to provide a method for building a generalized model that separates construction objects (for example, workers and hard hats) from the background in construction site images with high accuracy, together with the model itself.

Images were extracted from three videos of different scenes filmed at the extension construction site of Yonsei University's Engineering Building 1 to create one source domain dataset and two target domain datasets, and the performance of the model was evaluated through experiments on these datasets.

Specifically, the source domain used in the experiments has 190 training, 27 validation, and 54 test images. Target domain 1 has 217 training, 31 validation, and 61 test images. Target domain 2 has 185 training, 26 validation, and 52 test images.

After training for the set number of iterations, the test set was evaluated using the weights whose result on the validation set was lowest.

In the data augmentation unit 100, each worker and hard hat pair was rotated randomly when it was added, and its size was varied by multiplying it by a random factor between 0.7 and 2.

For the semantic segmentation unit 200, supervised learning using only source domain data, or only target domain data, used 60,000 iterations, a batch size of 2, and a learning rate of 0.0001 as hyperparameters.

For the denoising and target learning unit 300, self-supervised learning of the supervised semantic segmentation model on target domain data (without annotations) used 100,000 iterations, a batch size of 2, and a learning rate of 0.0001 as hyperparameters.


The knowledge distillation unit 400 used a learning rate of 6e-4 (0.0006).
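For reference, the hyperparameters stated for the embodiment can be collected in one configuration object. The structure and names below are hypothetical. Only the numeric values come from the description, and values the description does not state are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    learning_rate: float
    iterations: Optional[int] = None  # None = not stated in the description
    batch_size: Optional[int] = None  # None = not stated in the description


TRAINING_CONFIG = {
    # semantic segmentation unit 200 (supervised learning)
    "supervised_segmentation": StageConfig(1e-4, 60_000, 2),
    # denoising and target learning unit 300 (self-supervised learning)
    "denoising_target_learning": StageConfig(1e-4, 100_000, 2),
    # knowledge distillation unit 400 (only the learning rate is stated)
    "knowledge_distillation": StageConfig(6e-4),
}

# data augmentation unit 100: random scale factor applied to each pasted pair
AUGMENTATION_SCALE_RANGE = (0.7, 2.0)
```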

Example images from each dataset are shown in FIG. 6.

The performance of the semantic segmentation models trained with the data of target domain 1 and target domain 2 is shown in Table 1 and Table 2, respectively.

For both target domains 1 and 2, performance was highest when copy-paste data augmentation, self-supervised learning, and knowledge distillation were all applied.

For target domain 1, semantic segmentation model 8 achieved an mIoU of 81.98% and a DAI of 109.63%. This exceeds the performance of semantic segmentation model 9, which was trained by supervised learning on target domain data.

For target domain 2, semantic segmentation model 8 achieved an mIoU of 75.90% and a DAI of 91.71%.

For both target domains 1 and 2, these results can be judged good enough to replace a semantic segmentation model trained by supervised learning on target domain data.

For both target domains 1 and 2, with two exceptions, semantic segmentation performance rose with each successive step from semantic segmentation model 1 to 4 and from model 5 to 8. Copy-paste data augmentation, self-supervised learning, and knowledge distillation each contributed to improving semantic segmentation performance.

One exception is that, in both target domains 1 and 2, semantic segmentation model 6 recognized hard hats (a construction object) less accurately than semantic segmentation model 5.

The other exception is that, in target domain 1, semantic segmentation model 2 recognized hard hats (a construction object) slightly less accurately than semantic segmentation model 1.

This was confirmed to be caused by hard hat pixels being misclassified as background pixels during the prototypical pseudo label denoising and target structure learning step. Prototypical pseudo label denoising and target structure learning therefore appear to be sensitive to small construction objects such as hard hats.

Accordingly, this phenomenon might be addressed by using additional data augmentation techniques to give the model more distinguishable features.

Meanwhile, the present invention may be implemented as a construction object segmentation model generalization system, specifically one using self-supervised learning and copy-paste data augmentation.

This generalization system invention is substantially the same invention as the generalization method invention described above and differs only in category. Elements common to the generalization method are therefore covered by the foregoing description, and the following focuses on the gist of the generalization system invention.

FIG. 8 is a block diagram of the construction object segmentation model generalization system according to the present invention.

The present invention is a construction object segmentation model generalization system implemented on a computing device that uses a database and a control server having a computation function, the system comprising: a data augmentation unit 100 that augments data so that the semantic segmentation model can learn the visual features of the target domain background and the boundary between the foreground and the background; a semantic segmentation unit 200 that performs semantic segmentation based on supervised learning; a denoising and target learning unit 300 that performs self-supervised learning by inputting target domain images, without annotations, into the semantic segmentation model trained using source domain data; and a knowledge distillation unit 400 that trains a student model using the model trained by the denoising and target learning unit 300 as a teacher model.

In the present invention, the data augmentation unit 100 may randomly select one image from the target domain images, remove the region containing the target domain foreground from the selected image, and insert a predetermined part of the foreground of the source domain data into the background of the selected target domain image.

In the present invention, the semantic segmentation unit 200 uses DeepLabV2. Before training, the DeepLabV2 model is pretrained on ImageNet, and the model parameters may be fine-tuned using the source domain images and annotations.

In the present invention, the denoising and target learning unit 300 may include a denoising unit 310 that removes pseudo label noise using prototypes, and a target structure learning unit 320 that makes the feature points of the target domain cluster densely.

In the present invention, the denoising unit 310 may use the semantic segmentation model trained by the semantic segmentation unit 200 to generate pseudo labels for the target domain and to compute prototypes, which are representative features.

In the present invention, the knowledge distillation unit 400 may use a DeepLabv2 model pretrained with SimCLRv2 as the student model.

In the present invention, the semantic segmentation models obtained through each step may be evaluated using the mIoU or DAI evaluation metric.

In the present invention, in the DAI evaluation, when DAI is 1 the domain-adaptive semantic segmentation model may be evaluated as performing the same as a model trained by supervised learning on target domain data.

The present invention may also be implemented as a computer program. Specifically, it may be implemented as a computer program that is combined with hardware and stored on a computer-readable recording medium in order to have a computer execute the construction object segmentation model generalization method using self-supervised learning and copy-paste data augmentation according to the present invention.

Methods according to embodiments of the present invention may be implemented as programs readable by various computer means and recorded on a computer-readable recording medium. The recording medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM and DVD; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The embodiments described in this specification and the accompanying drawings merely illustrate part of the technical idea of the present invention. The embodiments disclosed here are intended to explain that technical idea, not to limit it, so the scope of the technical idea is plainly not limited by these embodiments. All modifications and specific embodiments that a person skilled in the art can readily infer within the scope of the technical idea contained in the specification and drawings should be construed as falling within the scope of the present invention.

100: data augmentation unit
200: semantic segmentation unit
300: denoising and target learning unit
310: denoising unit
320: target structure learning unit
400: knowledge distillation unit

Claims

A construction object segmentation model generalization method using self-supervised learning and copy-paste data augmentation, performed by a computing device using a database and a control server having a computation function, wherein the computing device performs:
step S100, in which a data augmentation unit augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between the foreground and the background;
step S200, in which a semantic segmentation unit performs semantic segmentation based on supervised learning;
step S300, in which a denoising and target learning unit performs self-supervised learning by inputting target domain images, without annotations, into the semantic segmentation model trained using source domain data; and
step S400, in which a knowledge distillation unit trains a student model using the model trained by the denoising and target learning unit as a teacher model.

The method of claim 1,
wherein the data augmentation unit in step S100
randomly selects one image from among the target domain images,
removes a region containing the target domain foreground from the selected image, and
inserts a predetermined part of the foreground of the source domain data into the background of the selected target domain image.

The method of claim 1,
wherein the semantic segmentation in step S200 uses DeepLabV2,
the DeepLabV2 model is pretrained on ImageNet before training, and
the model parameters are fine-tuned using source domain images and annotations.

The method of claim 1,
wherein the denoising and target learning unit in step S300 performs:
step S310, in which a denoising unit removes pseudo label noise using prototypes; and
step S320, in which a target structure learning unit makes the feature points of the target domain cluster densely.

The method of claim 4,
wherein the denoising unit in step S310
uses the semantic segmentation model trained by the semantic segmentation unit to generate pseudo labels for the target domain and computes prototypes, which are representative features.

The method of claim 5,
wherein, to remove pseudo label noise, the distance between each prototype and each feature point is used in determining the pseudo label, and
the probability that a feature point is assigned to the k-th class decreases as the distance between the prototype of the k-th class and the feature point increases.

The method of claim 1,
wherein the knowledge distillation unit in step S400
uses a DeepLabv2 model pretrained with SimCLRv2 as the student model.

The method of claim 7,
wherein the loss function is any one of the categorical cross-entropy loss on the source domain data, the categorical cross-entropy loss on the target domain data, or the sum of the Kullback-Leibler divergence between the teacher model and the student model.

The method of claim 1,
wherein the semantic segmentation models obtained through each of the steps are evaluated using the mIoU or DAI evaluation metric.

The method of claim 9,
wherein, in the DAI evaluation, when DAI is 1 the domain-adaptive semantic segmentation model is evaluated as performing the same as a model trained by supervised learning on target domain data.

A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, implemented on a computing device using a database and a control server having a computation function, the system comprising:
a data augmentation unit that augments data so that a semantic segmentation model can learn the visual features of the target domain background and the boundary between the foreground and the background;
a semantic segmentation unit that performs semantic segmentation based on supervised learning;
a denoising and target learning unit that performs self-supervised learning by inputting target domain images, without annotations, into the semantic segmentation model trained using source domain data; and
a knowledge distillation unit that trains a student model using the model trained by the denoising and target learning unit as a teacher model.

The system of claim 11,
wherein the data augmentation unit
randomly selects one image from among the target domain images,
removes a region containing the target domain foreground from the selected image, and
inserts a predetermined part of the foreground of the source domain data into the background of the selected target domain image.

The system of claim 11,
wherein the semantic segmentation unit uses DeepLabV2,
the DeepLabV2 model is pretrained on ImageNet before training, and
the model parameters are fine-tuned using source domain images and annotations.

The method of claim 11,
The noise removal and target learning unit
a noise removal unit that removes noise from the virtual label using the prototype; and a target structure learning unit for concentrating the feature points of the target domain.

The system of claim 14,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that the noise removal unit generates a virtual label of the target domain using the semantic segmentation model trained in the semantic segmentation unit, and calculates a prototype, which is a representative feature.
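
As a rough illustration of how a prototype might be calculated from virtual labels, the sketch below takes the prototype of a class to be the mean feature vector of the points assigned to that class. The claim does not state how the representative feature is computed, so the plain mean, the function name `class_prototypes`, and the flat `(N, D)` feature layout are assumptions.

```python
import numpy as np

def class_prototypes(features, virtual_labels, num_classes):
    """Return a (K, D) array holding one prototype per class.

    features:       (N, D) float array of feature points
    virtual_labels: (N,) int array of labels predicted by the source-trained model
    Assumption: the prototype is the mean feature of each class; classes with
    no assigned points keep an all-zero prototype.
    """
    prototypes = np.zeros((num_classes, features.shape[1]), dtype=features.dtype)
    for k in range(num_classes):
        members = features[virtual_labels == k]
        if len(members):
            prototypes[k] = members.mean(axis=0)
    return prototypes
```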

The system of claim 15,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that, in order to remove the noise of the virtual label, the virtual label is determined using the distance between the prototype and each feature point, and
the probability that a feature point is determined as the k-th class decreases as the distance between the prototype of the k-th class and the feature point increases.
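
One common way to turn prototype distances into class probabilities is a softmax over negative Euclidean distances, under which a feature point close to a class prototype receives a high weight for that class. The sketch below uses that choice to re-weight the model's predictions; the softmax form, the `temperature` parameter, and both function names are assumptions for illustration, not details taken from the claim.

```python
import numpy as np

def prototype_weights(features, prototypes, temperature=1.0):
    """Softmax over negative distances: shape (N, K), each row sums to 1."""
    # Pairwise Euclidean distance between every feature point and prototype.
    dist = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def denoise_virtual_labels(model_probs, features, prototypes, temperature=1.0):
    """Re-weight the model's class probabilities and pick the best class."""
    weights = prototype_weights(features, prototypes, temperature)
    return np.argmax(weights * model_probs, axis=1)
```

With this weighting, a point whose model prediction slightly favours one class can still be relabelled if its feature lies much closer to another class's prototype.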

The system of claim 11,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that the knowledge distillation unit uses a DeepLabV2 model pre-trained with SimCLRv2 as the student model.

The system of claim 17,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that the loss function used is the categorical cross-entropy loss of the source domain data, the categorical cross-entropy loss of the target domain data, or the sum of the Kullback-Leibler divergence between the teacher and student models.
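
The three loss terms named here can be written out as below. The sketch adds them together, which is only one possible reading of the claim wording; the summation, the `kl_weight` factor, and all function names are assumptions, and inputs are taken to be per-pixel class probabilities flattened to shape `(N, K)`.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy of (N, K) probabilities vs (N,) labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + eps).mean())

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """Mean KL(teacher || student) over N points."""
    log_ratio = np.log(teacher_probs + eps) - np.log(student_probs + eps)
    return float((teacher_probs * log_ratio).sum(axis=1).mean())

def distillation_loss(student_src, src_labels,
                      student_tgt, tgt_virtual_labels,
                      teacher_tgt, kl_weight=1.0):
    """Assumed combination: source CE + target CE + weighted teacher-student KL."""
    return (cross_entropy(student_src, src_labels)
            + cross_entropy(student_tgt, tgt_virtual_labels)
            + kl_weight * kl_divergence(teacher_tgt, student_tgt))
```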

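The system of claim 11,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that the semantic segmentation models obtained through each of the above steps are evaluated using the evaluation index mIoU or DAI.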
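
For reference, mIoU (mean intersection over union) is conventionally computed per class and then averaged, as in the sketch below. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions; DAI is not sketched because its formula is not given in this passage.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for k in range(num_classes):
        p, t = pred == k, target == k
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent in both maps: skipped (assumption)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```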

The system of claim 19,
A construction object segmentation model generalization system using self-supervised learning and copy-paste data augmentation, characterized in that, in the DAI evaluation, a DAI of 1 indicates that the domain-adaptive semantic segmentation model achieves the same performance as a model trained by supervised learning on target domain data.

A computer program stored in a computer-readable recording medium for executing, in combination with hardware, the construction object segmentation model generalization method using self-supervised learning and copy-paste data augmentation according to claim 1.