KR20240029239A

KR20240029239A - Image learning processing system for object estimation

Info

Publication number: KR20240029239A
Application number: KR1020220107455A
Authority: KR
Inventors: 전광길
Original assignee: 인천대학교 산학협력단
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2024-03-05

Abstract

The present invention relates to an image learning processing system for object estimation. The system includes: a learning unit that trains an object estimation segment model on one or more first input images so that the model classifies the objects contained in the first input images and extracts the corresponding object information; a test unit that, for one or more second input images, generates correction information indicating how the object classification and object information estimated by the trained object estimation segment model differ from the actual data; and a model estimation unit that, for the one or more second input images and their corresponding correction information, compares the correction information with the ground truth, collects the images classified as training data for additional learning according to the comparison result into an additional learning repository, and retrains the object estimation segment model using the data set of images collected in that repository.

Description

Image learning processing system for object estimation

The present invention relates to an image learning processing system and, in particular, to an image learning processing system for efficiently estimating the objects contained in an image, such as an aerial drone image, and the corresponding object information.

Modern remote sensing technology has advanced considerably in recent years, including the development of small, commercial, intelligent, and inexpensive satellites and the now widespread availability of unmanned aerial vehicles (UAVs). Remote sensing covers techniques that obtain information remotely using high-resolution optical devices (cameras or sensors), and typically involves satellites, unmanned aerial vehicles, and collections of airborne sensors. These advances have led to a rapid increase in the number of remotely sensed images and videos, which vary widely in spatial resolution, data quality, and coverage of the available areas, locations, and scenes. Remotely sensed images and videos can help control and monitor resources efficiently and gather important information.

The continuing development of efficient remote sensing technology offers excellent opportunities for applications such as urban management, land change monitoring, and traffic monitoring. Among these applications, object detection and segmentation from high-resolution images and videos is receiving growing attention in the remote sensing field. However, efficiently classifying, detecting, and segmenting the various objects in remotely sensed images remains difficult because of factors such as camera height, object appearance, and varied background and environmental conditions.

The present invention was devised to solve the problems described above. Its object is to provide an image learning processing system that uses a learning model for recognizing the objects contained in an input image, such as an aerial drone image, and estimating the corresponding object information, so that the various objects in the image can be classified efficiently and their detailed information obtained.

To summarize the features of the present invention, an image learning processing system according to one aspect of the invention for achieving the above object includes: a learning unit that trains an object estimation segment model on one or more first input images so that the model classifies the objects contained in the first input images and extracts the corresponding object information; a test unit that, for one or more second input images, generates correction information indicating how the object classification and object information estimated by the object estimation segment model trained by the learning unit differ from the actual data; and a model estimation unit that, for the one or more second input images and their corresponding correction information, compares the correction information with the ground truth, collects the images classified as training data for additional learning according to the comparison result into an additional learning repository, and retrains the object estimation segment model using the data set of images collected in the additional learning repository.

According to the comparison result, the model estimation unit may classify the false positive (FP) images and false negative (FN) images among the second input images, excluding the true positive (TP) images and true negative (TN) images, as training data for the additional learning.

The model estimation unit calculates, for the second input images, the harmonic mean (F1-score) of the recall value (Rec) and the precision value (Prec) according to the following equations, and classifies the second input images that fall below the harmonic mean (F1-score) as training data for the additional learning:

$$\mathrm{Rec} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

$$\mathrm{Prec} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$

Here, N_{TP} may be the area of the one or more objects in an image estimated to be a true positive (TP) image, N_{FN} the area of the one or more objects in an image estimated to be a false negative (FN) image, and N_{FP} the area of the one or more objects in an image estimated to be a false positive (FP) image.

The object information may include one or more of object classification information, object size, shooting location information, altitude, azimuth, and color.

According to another aspect of the present invention, a recording medium storing computer-readable code for learning-based estimation of the objects contained in an image and their object information may implement: a learning function that trains an object estimation segment model on one or more first input images so that the model classifies the objects contained in the first input images and extracts the corresponding object information; a test function that, for one or more second input images, generates correction information indicating how the object classification and object information estimated by the trained object estimation segment model differ from the actual data; and a model estimation function that, for the one or more second input images and their corresponding correction information, compares the correction information with the ground truth, collects the images classified as training data for additional learning according to the comparison result into an additional learning repository, and retrains the object estimation segment model using the data set of images collected in the additional learning repository.

The image learning processing system according to the present invention trains an object estimation segment model that recognizes the objects contained in an input image, such as an aerial drone image, and estimates the corresponding object information, so that the various objects in the image can be classified efficiently and their detailed information obtained.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the invention and, together with the detailed description, explain its technical idea.
FIG. 1 is a diagram illustrating an image learning processing system 100 for deep learning-based estimation of objects and detailed information according to an embodiment of the present invention.
FIG. 2 illustrates images of the augmented data sets generated by the data augmentation unit 120 according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the operation of the model estimation unit 150 according to an embodiment of the present invention.
FIG. 4 is an example of an input original image, a predicted image based on the object estimation segment model, and true value images (true label masks).
FIG. 5 is a diagram for explaining an example of a method of implementing the image learning processing system 100 according to an embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. The same components are denoted by the same reference numerals in each drawing wherever possible. Detailed descriptions of well-known functions and/or configurations are omitted. The following disclosure focuses on the parts needed to understand the operation of the various embodiments and omits descriptions of elements that could obscure the gist of the explanation. Some components in the drawings may be exaggerated, omitted, or shown schematically. The size of each component does not fully reflect its actual size, so the content described here is not limited by the relative sizes or spacing of the components drawn in each drawing.

In describing the embodiments of the present invention, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of the user or operator; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description serves only to describe embodiments of the invention and must not be construed as limiting. Unless clearly used otherwise, singular expressions include the plural. In this description, expressions such as "including" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and must not be construed to exclude the existence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

FIG. 1 is a diagram illustrating an image learning processing system 100 for deep learning-based estimation of objects and detailed information according to an embodiment of the present invention.

Referring to FIG. 1, the image learning processing system 100 includes a preprocessing unit 110, a data augmentation unit 120, a learning unit 130, a test unit 140, and a model estimation unit 150.

The preprocessing unit 110 performs preprocessing on the collected data set and the corresponding true value images, such as image resizing, shuffling (e.g., rearranging pixel data positions for obfuscation), and normalization (e.g., adjusting brightness and contrast). The collected data set may consist, for example, of aerial images captured with a drone. For training, the collected images are paired with true value images, which serve as the reference actual images, on the basis of labeling data (e.g., object classification information, object size, shooting location information, altitude, azimuth, color). The collected images (e.g., 400 images at a resolution of 6000 x 4000 pixels) and the true value images may be vectorized by embedding them with an image embedding algorithm. The preprocessing unit 110 may perform the preprocessing on the vectorized collected images.
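The preprocessing steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the preprocessing unit 110: the function names, the nearest-neighbour resize, the min-max normalization, and the 256 x 256 target size are all hypothetical choices, and shuffling is treated here as reordering the samples, whereas the text describes it in terms of pixel data positions.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an H x W (x C) array with nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[rows][:, cols]

def normalize(image):
    """Min-max scale pixel values to [0, 1] (assumed normalization scheme)."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:                              # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def preprocess(images, masks, size=(256, 256), seed=0):
    """Resize and normalize images, resize masks, and shuffle the pairs together."""
    order = np.random.default_rng(seed).permutation(len(images))
    out_images = [normalize(resize_nearest(images[i], *size)) for i in order]
    out_masks = [resize_nearest(masks[i], *size) for i in order]
    return out_images, out_masks
```

Nearest-neighbour sampling is used for the masks as well so that label values are never blended during resizing.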

The data augmentation unit 120 applies a data augmentation algorithm to the preprocessed images (data set) to generate an augmented data set for training. The added diversity of the data set gives the architecture sufficient variance and robustness and prevents overfitting, which improves the efficiency of the system.

FIG. 2 illustrates images of the augmented data sets generated by the data augmentation unit 120 according to an embodiment of the present invention.

As shown in FIG. 2, for example, the data augmentation unit 120 augments the data set for training by deriving, from the original preprocessed images, images with a different altitude (vertical shift), images with a different horizontal position (horizontal shift), images mirrored about the horizontal or vertical axis of the photograph (flipped), and rotated images.
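The four augmentation types listed for FIG. 2 can be illustrated with NumPy array operations. This is a sketch under assumptions: the function name `augment` and the shift amount are hypothetical, `np.roll` wraps pixels around the border (a simplification of a true shift), and rotation is limited to 90 degrees.

```python
import numpy as np

def augment(image, shift=2):
    """Return the original image plus shifted, flipped, and rotated variants."""
    return {
        "original": image,
        "vertical_shift": np.roll(image, shift, axis=0),    # rows move down, wrapping around
        "horizontal_shift": np.roll(image, shift, axis=1),  # columns move right, wrapping around
        "flipped_horizontal": np.flip(image, axis=1),       # mirror about the vertical axis
        "flipped_vertical": np.flip(image, axis=0),         # mirror about the horizontal axis
        "rotated_90": np.rot90(image),                      # 90 degrees counter-clockwise
    }
```

For segmentation training, the same transform would need to be applied to each image's true value mask so that the pair stays aligned.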

The learning unit 130 trains the object estimation segment model by performing deep learning-based training on part of the augmented data set from the data augmentation unit 120 (e.g., 200 images selected at random from the 400 collected images). Depending on the application, machine learning models such as VGG-16, ResNet-50, and MobileNet may be used for classifying the objects in an image, and models such as U-Net for segmenting them.

The object estimation segment model classifies and distinguishes the objects in an input image (e.g., roofs, cars, trees, people, roads, warehouses, mountains, rivers) and, by comparison with the true value images on the basis of the labeling data (e.g., object classification information, object size (area), shooting location information, altitude, azimuth, color), extracts the corresponding object information (e.g., object classification information, object size (area), shooting location information, altitude, azimuth, and color for each object). The learning unit 130 receives the augmented data set from the data augmentation unit 120 and, for each image, trains the object estimation segment model to classify the objects contained in the input image and extract the corresponding object information, updating the necessary learning parameters.

The object classification information identifies the class of an object, such as roof, car, tree, person, road, warehouse, mountain, or river. The shooting location information may be, for example, GPS (Global Positioning System) information indicating where the image was captured.

The test unit 140 tests the object estimation segment model trained by the learning unit 130 on the remaining data of the augmented data set from the data augmentation unit 120 (e.g., the 200 of the 400 collected images that were not used for training) and generates correction information.
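The random division into a training portion and a remaining test portion can be sketched as below. The helper name `split_dataset` and the fixed seed are assumptions made for illustration; the text only states that part of the data (e.g., 200 of 400 images) is selected at random for training and the rest is used for testing.

```python
import random

def split_dataset(items, n_train, seed=0):
    """Randomly pick n_train items for training; the remainder is the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    return shuffled[:n_train], shuffled[n_train:]

# Example with the figures quoted in the text: 400 images, 200 for training.
train_ids, test_ids = split_dataset(range(400), n_train=200)
```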

For example, the test unit 140 inputs the test data set (test images) into the object estimation segment model trained by the learning unit 130 and tests whether the classification of the objects contained in each image (e.g., roofs, cars, trees, people, roads, warehouses, mountains, rivers) and the estimation of the corresponding object information are correct. The difference from the actual data (e.g., the actual objects to be classified and their object information) can then be generated as the correction information.

The model estimation unit 150 reflects the correction information from the test unit 140 in the learning parameters of the trained object estimation segment model and trains that model further, so as to complete a corrected object estimation segment model that can precisely estimate the objects and object information of an input image.

FIG. 3 is a diagram for explaining the operation of the model estimation unit 150 according to an embodiment of the present invention.

Referring to FIG. 3, the model estimation unit 150 can classify objects and estimate object information on the basis of a predicted image (see FIG. 4) generated from each image of the test data set through image processing with an algorithm such as feature point extraction, and compares the result with the true value images (true label masks, see FIG. 4) that correspond to the ground truth (the correct answer).

That is, for each image of the test data set and the corresponding correction information, which is based on the estimated object classification and object information (predicted results), the model estimation unit 150 compares the correction information with the ground truth, i.e., the corresponding true values. According to the resulting difference, it classifies each test image into one of four types: a true positive (TP) image 210, a true negative (TN) image 220, a false positive (FP) image 230, or a false negative (FN) image 240.

A true positive (TP) image 210 is one for which the trained object estimation segment model estimates object information and that information matches the ground truth within a predetermined range. A true negative (TN) image 220 is one for which the trained model determines that there is no object and, according to the ground truth, there is indeed no object. A false positive (FP) image 230 is one for which the trained model determines that there is object information but, according to the ground truth, the object does not exist. A false negative (FN) image 240 is one for which the trained model determines that there is no object but, according to the ground truth, the object exists and has object information.
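A minimal sketch of this four-way classification, using the standard confusion-matrix meaning of each label (TP: predicted and present; TN: neither predicted nor present; FP: predicted but absent; FN: present but not predicted). Several points are assumptions rather than part of the text: predictions and ground truth are taken to be binary masks, the "predetermined range" is modelled as a minimum intersection-over-union (`min_iou`, a hypothetical parameter), and an image where both masks contain an object but overlap too little is labelled FP, a case the text does not address.

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:                      # both masks empty: treat as perfect agreement
        return 1.0
    return np.logical_and(pred, true).sum() / union

def classify_image(pred_mask, true_mask, min_iou=0.5):
    """Label a test image as 'TP', 'TN', 'FP', or 'FN'."""
    pred_has = bool(np.any(pred_mask))
    true_has = bool(np.any(true_mask))
    if pred_has and true_has:
        # Assumption: insufficient overlap is treated as a false positive.
        return "TP" if iou(pred_mask, true_mask) >= min_iou else "FP"
    if pred_has:
        return "FP"                     # predicted an object that is not there
    if true_has:
        return "FN"                     # missed an object that is there
    return "TN"                         # correctly predicted no object
```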

Among the images of the test data set, the model estimation unit 150 may classify the false positive (FP) images 230 and false negative (FN) images 240, excluding the true positive (TP) images 210 and true negative (TN) images 220 for which the difference is within the predetermined range, as training data for further training of the trained object estimation segment model, and collect those test images in a designated additional learning repository such as a memory. When a test image contains multiple objects (e.g., 20 roofs), this may include the case where even a single object is misjudged as a false positive or a false negative. The model estimation unit 150 may retrain the object estimation segment model using the data set of images collected in the additional learning repository. The deep learning-based learning algorithm of the learning unit 130 may be used for this.
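The collection rule can be sketched as a simple filter: an image goes to the additional learning repository if any one of its objects was judged FP or FN. The data layout (a dictionary mapping an image identifier to a list of per-object labels) and the function name are assumptions made for illustration; the repository is represented here by a plain list.

```python
RETRAIN_LABELS = {"FP", "FN"}

def select_for_retraining(object_labels_by_image):
    """Return the IDs of images with at least one FP or FN object."""
    return [
        image_id
        for image_id, labels in object_labels_by_image.items()
        if any(label in RETRAIN_LABELS for label in labels)
    ]

# Hypothetical test results: one label per object detected or expected in each image.
results = {
    "img_001": ["TP", "TP", "TP"],
    "img_002": ["TP", "FN"],        # one missed object is enough to collect the image
    "img_003": ["TN"],
    "img_004": ["FP"],
}
additional_learning_repository = select_for_retraining(results)
```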

Furthermore, for the images of the test data set, the model estimation unit 150 may calculate, based on the ground truth and the true positive (TP) images 210, a recall value (Rec) according to [Equation 1] (the proportion of objects the model predicted to be true among the objects that are actually true, with the difference within the predetermined range), a precision value (Prec) according to [Equation 2] (the proportion of objects that are actually true, with the difference within the predetermined range, among the objects the model classified as true), and their harmonic mean (F1-score) according to [Equation 3]. It may then classify the images that fall below the harmonic mean (F1-score) as training data for further training of the trained object estimation segment model and collect those test images in a designated additional learning repository such as a memory. Likewise, the model estimation unit 150 may retrain the object estimation segment model using the data set of images collected in the additional learning repository. The deep learning-based learning algorithm of the learning unit 130 may be used for this.

[Equation 1]

$$\mathrm{Rec} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

[Equation 2]

$$\mathrm{Prec} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$

[Equation 3]

$$F1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$$

Here, N_{TP} is the (pixel) area of the relevant object(s) (e.g., the estimated objects) in a test image estimated to be a true positive (TP) image 210, N_{FN} is the (pixel) area of the relevant object(s) (e.g., the actual objects) in a test image estimated to be a false negative (FN) image 240, and N_{FP} is the (pixel) area of the relevant object(s) (e.g., the estimated objects) in a test image estimated to be a false positive (FP) image 230.
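As a worked illustration of the recall, precision, and harmonic mean (F1-score) calculation, the sketch below derives the three areas from a predicted mask and a true value mask and then evaluates the scores. Computing the areas pixel by pixel from a single mask pair is an assumption about how the areas are obtained, and the function names are hypothetical; the F1-score is the standard harmonic mean of precision and recall.

```python
import numpy as np

def pixel_areas(pred_mask, true_mask):
    """Return (N_TP, N_FN, N_FP) as pixel counts from two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    n_tp = int(np.logical_and(pred, true).sum())    # predicted and actually present
    n_fn = int(np.logical_and(~pred, true).sum())   # present but not predicted
    n_fp = int(np.logical_and(pred, ~true).sum())   # predicted but not present
    return n_tp, n_fn, n_fp

def scores(n_tp, n_fn, n_fp):
    """Return (Rec, Prec, F1); a zero denominator yields 0.0."""
    rec = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    prec = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return rec, prec, f1

# One pixel each of TP, FP, and FN gives Rec = Prec = F1 = 0.5.
pred = np.array([[1, 1, 0, 0]])
true = np.array([[1, 0, 1, 0]])
rec, prec, f1 = scores(*pixel_areas(pred, true))
```

With areas of N_TP = 8, N_FN = 2, and N_FP = 0, the same functions give Rec = 0.8, Prec = 1.0, and F1 = 1.6 / 1.8, or about 0.889.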

FIG. 5 is a diagram for explaining an example of a method of implementing the image learning processing system 100 according to an embodiment of the present invention.

Referring to FIG. 5, the image learning processing system 100 according to an embodiment of the present invention may consist of hardware, software, or a combination of the two. For example, the image learning processing system 100 may be implemented as a computing system 1000 such as that shown in FIG. 5, having at least one processor for performing the functions, steps, and processes described above, or as a server on the Internet.

The computing system 1000 may include at least one processor 1100, a memory 1300, a user interface input device 1400, a user interface output device 1500, a storage 1600, and a network interface 1700, connected through a bus 1200. The processor 1100 may be a central processing unit (CPU) or a semiconductor device that executes the instructions stored in the memory 1300 and/or the storage 1600. The memory 1300 and the storage 1600 may include various kinds of volatile or non-volatile storage media. For example, the memory 1300 may include a ROM (Read Only Memory) 1310 and a RAM (Random Access Memory) 1320.

In addition, the network interface 1700 may include a communication module, such as a modem, that supports wired Internet communication, wireless Internet communication such as WiFi and WiBro, or mobile communication such as WCDMA and LTE in user terminals such as smartphones, laptop PCs, and desktop PCs, or a communication module, such as a modem, that supports short-range wireless communication (e.g., Bluetooth, ZigBee, Wi-Fi).

Accordingly, the steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by the processor 1100, or in a combination of the two. The software module may reside in a storage/recording medium readable by a computer or similar device (i.e., the memory 1300 and/or the storage 1600), such as RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, or a CD-ROM. An exemplary storage medium is coupled to the processor 1100, and the processor 1100 can read information (code) from the storage medium and write information (code) to it. Alternatively, the storage medium may be integral to the processor 1100. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as separate components in a user terminal.

As described above, the image learning processing system 100 according to the present invention trains an object estimation segment model for recognizing objects included in input images, such as aerial drone images, and estimating the corresponding object information, so that various objects in an image can be efficiently classified and their detailed information obtained.

As described above, the present invention has been described with reference to specific details, such as particular components, and to limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention, and the present invention is not limited to the above embodiments. Those of ordinary skill in the field to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Therefore, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set forth below but also all technical ideas that are equal or equivalent to those claims should be interpreted as falling within the scope of rights of the present invention.

Preprocessing unit (110)
Data augmentation unit (120)
Learning unit (130)
Test unit (140)
Model estimation unit (150)

Claims

An image learning processing system comprising:
a learning unit that trains an object estimation segment model on one or more first input images, training the object estimation segment model to classify objects included in the first input images and to extract corresponding object information using the object estimation segment model;
a test unit that generates, for one or more second input images, correction information that the object classification and object information estimated through the object estimation segment model trained by the learning unit exhibit with respect to actual data; and
a model estimation unit that, for the one or more second input images and the correction information corresponding thereto, compares the correction information with ground truth, collects images classified as training data for additional learning in an additional learning storage according to the comparison result, and retrains the object estimation segment model using a data set of the images collected in the additional learning storage.

The image learning processing system according to claim 1,
wherein the model estimation unit,
according to the comparison result, classifies false positive (FP) images and false negative (FN) images among the second input images as the training data for the additional learning, excluding true positive (TP) images and true negative (TN) images.

The image learning processing system according to claim 1,
wherein the model estimation unit calculates, for the second input images, a harmonic mean value (F1-score) of a recall value (Rec) and a precision value (Prec) according to the following equations, and classifies images that are below the harmonic mean value (F1-score) as the training data for the additional learning,
Rec = NTP/(NTP + NFN)
Prec = NTP/(NTP + NFP)

where NTP is the area of one or more corresponding objects in an image estimated to be a true positive (TP) image, NFN is the area of one or more corresponding objects in an image estimated to be a false negative (FN) image, and NFP is the area of one or more corresponding objects in an image estimated to be a false positive (FP) image.

The image learning processing system according to claim 1,
wherein the object information includes one or more of object classification information, object size, shooting location information, altitude, azimuth, or color.

A recording medium on which computer-readable code for learning-based estimation of objects and object information included in images is recorded, the recording medium being for implementing:
a learning function that trains an object estimation segment model on one or more first input images, training the object estimation segment model to classify objects included in the first input images and to extract corresponding object information using the object estimation segment model;
a test function that generates, for one or more second input images, correction information that the object classification and object information estimated through the trained object estimation segment model exhibit with respect to actual data; and
a model estimation function that, for the one or more second input images and the correction information corresponding thereto, compares the correction information with ground truth, collects images classified as training data for additional learning in an additional learning storage according to the comparison result, and retrains the object estimation segment model using a data set of the images collected in the additional learning storage.

The recording medium according to claim 5,
wherein, in the model estimation function,
according to the comparison result, false positive (FP) images and false negative (FN) images among the second input images are classified as the training data for the additional learning, excluding true positive (TP) images and true negative (TN) images.

The recording medium according to claim 5,
wherein, in the model estimation function, a harmonic mean value (F1-score) of a recall value (Rec) and a precision value (Prec) is calculated for the second input images according to the following equations, and images that are below the harmonic mean value (F1-score) are classified as the training data for the additional learning,
Rec = NTP/(NTP + NFN)
Prec = NTP/(NTP + NFP)

where NTP is the area of one or more corresponding objects in an image estimated to be a true positive (TP) image, NFN is the area of one or more corresponding objects in an image estimated to be a false negative (FN) image, and NFP is the area of one or more corresponding objects in an image estimated to be a false positive (FP) image.

The recording medium according to claim 5,
wherein the object information includes one or more of object classification information, object size, shooting location information, altitude, azimuth, or color.