KR102340998B1

KR102340998B1 - Auto labeling method and system

Info

Publication number: KR102340998B1
Application number: KR1020210088481A
Authority: KR
Inventors: 신동민; 박상민; 조이상; 구민기; 최치민
Original assignee: (주) 웨다
Priority date: 2021-07-06
Filing date: 2021-07-06
Publication date: 2021-12-20

Abstract

Disclosed are an auto-labeling method and system. According to one embodiment of the present invention, an auto-labeling method performed by an auto-labeling system includes the steps of: generating an auto-labeling model using a data set uploaded by an operator; performing auto-labeling on the data set uploaded by the operator using the generated auto-labeling model; selecting data to be labeled from the uploaded data set based on the result predicted through the performed auto-labeling; and retraining the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed. The accuracy of auto-labeling and the accuracy of the model are thereby improved.

Description

AUTO LABELING METHOD AND SYSTEM

The description below relates to auto-labeling technology.

With recent advances in artificial intelligence, a variety of high-performance algorithms are being developed, and data labeling, the creation of the data sets needed to train artificial intelligence, is rapidly drawing attention as well.

Data labeling refers to collecting, classifying, and processing the data required for machine learning, and it is important enough to account for roughly 80% or more of the time spent building training data. Current data labeling is mostly crowdsourced: many people label the collected data to produce a data set. This approach takes a long time to produce labels. Because the data is produced by people with a relatively low understanding of machine learning, all of the labeled data must be fully inspected. And because labels can differ from person to person, model accuracy can be affected when the artificial intelligence model is trained.

Accordingly, there is a need for a labeling technique that addresses these problems of data labeling and can be used easily even by beginners with little knowledge of machine learning.

An auto-labeling method and system can be provided that performs prediction on a data set using a pre-trained model and improves the accuracy of auto-labeling and of the model through inspection and correction of the predicted data.

An auto-labeling method and system can be provided for gradually reducing user intervention through repeated retraining of the auto-labeling model.

An auto-labeling method performed by an auto-labeling system may include: generating an auto-labeling model using a data set uploaded by an operator; performing auto-labeling on the data set uploaded by the operator using the generated auto-labeling model; selecting data to be labeled from the uploaded data set based on the result predicted through the performed auto-labeling; and retraining the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed.

The selecting may include: deriving, through the performed auto-labeling, an image label including a label name, accuracy, and position data for each label contained in each image, and delivering a data set within a preset accuracy range to the operator using the derived image labels; and, as the operator corrects the label names and position data contained in the data set within the preset accuracy range, completing inspection and re-labeling and storing in a database the log information on the corrections made through the completed inspection and re-labeling.

The selecting may include: determining, as the auto-labeling is repeatedly performed, whether the predicted result falls within the preset accuracy range; selecting, based on the predicted result, data that falls within the preset accuracy range as weak-point images; and collecting a plurality of learning indicators including the type of the selected weak-point images and the type of class data in which a correction occurred in the selected data to be labeled.

The retraining may include: re-performing auto-labeling, using the retrained auto-labeling model, on the completed data set or on weak-point data that is a part of the completed data set; comparing the result predicted through the re-performed auto-labeling with the data set labeled by the operator; and deriving, through the comparison, quantitative learning indicators of the retrained auto-labeling model and a guide to data that should be additionally uploaded.

The retraining may include: deriving an accuracy based on the error rate between the data set labeled by the operator and the result predicted through the re-performed auto-labeling; providing an additional-data guide that includes the derived accuracy; presenting examples of the selected weak-point image types; displaying the color distribution of the weak-point images; displaying the kinds of classes that the current auto-labeling model has difficulty predicting; recommending classes that should be added first; and presenting, for each class of the current auto-labeling model, the correct answers and the distance from the correct answer.

The performing may include: generating, in order to distribute the auto-labeled data to each operator, a cluster for each class by using a cluster model to group work files with the same or similar attributes; and distributing the generated per-class clusters to the respective operators.

An auto-labeling system may include: a model generation unit that generates an auto-labeling model using a data set uploaded by an operator; a labeling performing unit that performs auto-labeling on the data set uploaded by the operator using the generated auto-labeling model; a data selection unit that selects data to be labeled from the uploaded data set based on the result predicted through the performed auto-labeling; and a retraining unit that retrains the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed.

Auto-labeling and easy-labeling can reduce the working time of operators who perform data labeling and increase consistency.

Labeling and the performance of the labeling model can be improved as the auto-labeling model is updated.

User intervention can be gradually reduced through repeated retraining of the auto-labeling model.

FIG. 1 is a diagram for explaining data labeling.
FIG. 2 is a diagram for explaining an operation of generating an initial model for auto-labeling, according to an embodiment.
FIG. 3 is a diagram for explaining the overall operation of the auto-labeling system, according to an embodiment.
FIG. 4 is a diagram for explaining a manual labeling method.
FIG. 5 is a diagram for explaining an auto-labeling operation, according to an embodiment.
FIG. 6 is a diagram for explaining an inspection operation of auto-labeling, according to an embodiment.
FIG. 7 is a diagram for explaining an acceleration operation of auto-labeling, according to an embodiment.
FIGS. 8 and 9 are diagrams for explaining an operation of providing a guide to additional data, according to an embodiment.
FIG. 10 is a diagram for explaining a per-operator data selection operation for improving work speed, according to an embodiment.
FIG. 11 is a block diagram illustrating the configuration of an auto-labeling system, according to an embodiment.
FIG. 12 is a flowchart illustrating an auto-labeling method in the auto-labeling system, according to an embodiment.
FIG. 13 is a diagram for explaining the accuracy of auto-labeling, according to an embodiment.
FIG. 14 is an example of a result screen of data annotation, according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining data labeling.

In general, image data is labeled, depending on the training purpose, either by drawing a rectangular box around a target class (detection) or by painting it with color to generate a mask (segmentation), as shown in FIG. 1. More specifically, marking a "person" in a photo or video with a box (bounding box) produces a data set for a detection model, while outlining the "person" with a polygon or painting it with a brush so that it is separated from the background and other objects produces a data set for a segmentation model. From such data, the artificial intelligence can learn to distinguish the region of the "person" from the background or from other objects in an input photo or video.

However, because users (operators) label the data one item at a time, the consistency of the data drops when many users perform the labeling, which can ultimately affect the training of the artificial intelligence model. In addition, the artificial intelligence can accurately distinguish the target object only when it is trained on anywhere from as few as 1,000 to as many as several hundred thousand such photos or videos, so data labeling is the most time-consuming part of training an artificial intelligence model.

Accordingly, to address the above problems, the embodiments describe an auto-labeling operation that improves user convenience and shortens working time in labeling work.

FIG. 2 is a diagram for explaining an operation of generating an initial model for auto-labeling, according to an embodiment.

A data set 201 may be uploaded to the auto-labeling system. When the data set is uploaded, labels may be registered (for example, the classes that the artificial intelligence is to predict, such as person or bicycle). When a label is registered, a category for the label may be selected (for example, whether the label is closer to the animal/person type or to the object/industrial-site object type). For auto-labeling, a sample selection algorithm based on unsupervised learning may be applied. Once the data set is uploaded, a clustering model (unsupervised learning) may classify the images into a plurality of (n) types. For example, a data set containing a mix of dogs, cats, and people may be classified into dog, cat, and person images. From the per-cluster selected data sets 202 classified by the clustering model, a portion of the selected data of each cluster may be manually labeled by each operator 203. The data sets obtained through manual labeling may be merged 204 to generate an initial model 210.

For example, 10 to 30 images from each set classified by the clustering model may be given initial manual labels by each operator, and the initially labeled data sets may be merged to generate the initial model. At this point, transfer learning may be performed from a base model (BaseModel) chosen according to the category set for the data set. Performing transfer learning from an existing model of a similar category can provide a high training effect with a short training time. This completes the generation of the initial model.
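The per-cluster sampling step can be illustrated with a minimal sketch. It assumes the clustering model has already assigned each image a cluster id; the function names, the `cluster_of` mapping, and the default of 20 images per cluster (within the 10 to 30 range mentioned above) are hypothetical choices for illustration, not part of the disclosure.

```python
import random
from collections import defaultdict


def sample_for_manual_labeling(cluster_of, per_cluster=20, seed=0):
    """Pick up to `per_cluster` images from every cluster for manual labeling.

    cluster_of: dict mapping image id -> cluster id (assumed to come from
    the clustering model; the clustering itself is not shown here).
    Returns a dict mapping cluster id -> list of sampled image ids.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for image_id, cluster_id in cluster_of.items():
        groups[cluster_id].append(image_id)

    picked = {}
    for cluster_id, members in groups.items():
        members = sorted(members)  # stable order so the seed is reproducible
        count = min(per_cluster, len(members))
        picked[cluster_id] = rng.sample(members, count)
    return picked


def merge_labeled(per_operator_labels):
    """Merge the label dicts returned by each operator into one data set."""
    merged = {}
    for labels in per_operator_labels:
        merged.update(labels)
    return merged
```

The merged result would then serve as the training set for the initial model; the transfer-learning step itself depends on the chosen framework and is left out.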

The manual labeling operation is described with reference to FIG. 4. In manual labeling, the data set is divided among a plurality of operators. When the many operators have finished labeling their distributed portions, a small number of inspectors inspect the labeled data sets, and the data set is completed as re-labeling, including label position and class, proceeds according to the feedback of the inspectors 203. The re-labeled data sets are then gathered by physical means to obtain the re-labeled data set.

FIG. 3 is a diagram for explaining the overall operation of the auto-labeling system, according to an embodiment.

The auto-labeling system first predicts the input vision data once using the auto-labeling model 300. The operator (user) then inspects and corrects the predicted data, which can improve the accuracy of auto-labeling and, further, of the auto-labeling model. Here, the auto-labeling model 300 is a pre-trained model and may include a model trained in advance (shown as the 'SOTA' model in FIG. 3) or a model generated by the user (the initial model).

Through its auto-labeling and easy-labeling functions, the auto-labeling system can reduce the working time of operators performing data labeling and increase consistency. The auto-labeling system can perform inspection at the same time as labeling.

Auto-labeling may include two functions. It may detect all classes in images captured in a general environment (for example, road footage or footage of people passing by) using a model pre-trained on PASCAL VOC and MS COCO, well-known data sets used to evaluate machine learning performance. Alternatively, it may detect pre-designated classes in images captured in a specific environment (for example, product defect detection or identification of suspicious persons) using an artificial intelligence model generated by the user (for example, the initial model described with reference to FIG. 2). The classes may be designated by the user (operator).

Easy-labeling may serve to detect a predicted value for a single class. It detects a class in the data set or a pre-designated class through the model trained in advance (the SOTA model) or the model generated by the user (the initial model).

Classes in the uploaded data set may be detected by the SOTA model or by the initial model generated by the user. The detected classes are guaranteed to have an accuracy at or above a threshold at which they can be recommended to the user. The labeled data may include class information in the form matching the purpose of the predicted data (classification, detection, or segmentation). The operator 203 may verify the detected class information, depending on the purpose, by correcting classification errors, false positives, false negatives, or IoU. The verified information may be stored as a labeled data set. The data corrected through verification may be used as a retraining data set to improve the accuracy of the auto-labeling model 300.
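For the detection case, the four kinds of correction can be told apart by comparing a predicted box with the operator's corrected box. The following is a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, each label is a `(class_name, box)` pair, and the 0.5 IoU threshold and the function names are hypothetical values chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_correction(predicted, corrected, iou_threshold=0.5):
    """Name the kind of correction an operator made to one predicted label.

    predicted / corrected: (class_name, box) or None.
    None as `corrected` means the operator deleted the prediction;
    None as `predicted` means the operator added a missed object.
    """
    if predicted is None and corrected is None:
        raise ValueError("at least one of predicted/corrected is required")
    if corrected is None:
        return "false_positive"
    if predicted is None:
        return "false_negative"
    if predicted[0] != corrected[0]:
        return "classification_error"
    if iou(predicted[1], corrected[1]) < iou_threshold:
        return "iou_correction"
    return "accepted"
```

For example, `classify_correction(("dog", (0, 0, 2, 2)), ("cat", (0, 0, 2, 2)))` reports a classification error, while the same class with a box shifted far enough reports an IoU correction.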

FIG. 5 is a diagram for explaining an auto-labeling operation, according to an embodiment.

A data set may be uploaded to the auto-labeling system by a small number of operators. The auto-labeling system may perform auto-labeling on the uploaded data set, and inspection may proceed at the same time as labeling. Using the auto-labeling model, the auto-labeling system may select the data that needs to be labeled from the uploaded data set (the inspection role). In other words, only a portion of the data is selected for labeling. The small number of operators may then verify the selected data and correct label positions and classes. When labeling is complete, the labeled data is merged automatically to complete the labeled data set. The auto-labeling system may retrain the auto-labeling model using the completed data set.
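One round of this workflow can be sketched as a single function. This is an illustration only: `predict`, `review`, and `train` are hypothetical callables standing in for the auto-labeling model, the operator's inspection, and retraining, and the accuracy bounds `low` and `high` are assumed parameters on a 0 to 100 scale.

```python
def auto_labeling_round(dataset, predict, review, train, low=0.0, high=50.0):
    """Run one predict / select / review / retrain round.

    dataset: iterable of hashable item ids.
    predict(item) -> (label, accuracy) from the current model.
    review(item, label) -> label confirmed or corrected by an operator.
    train(labeled) -> retrains the model on the dict item -> label.
    Returns the completed labeled data set and the number of reviewed items.
    """
    labeled = {}
    reviewed = 0
    for item in dataset:
        label, accuracy = predict(item)
        if low <= accuracy <= high:
            # Only low-accuracy predictions are sent to an operator.
            label = review(item, label)
            reviewed += 1
        labeled[item] = label
    train(labeled)  # the merged data set is used for retraining
    return labeled, reviewed
```

Calling the function repeatedly with an improving `predict` would be expected to lower the `reviewed` count from round to round, which is the reduction in operator involvement described here.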

With auto-labeling, the workflow can shift from one in which inspectors check every label in a large volume of data and then request corrections to one in which artificial intelligence automatically performs most of the inspector's role. Accordingly, inspector intervention can gradually decrease as the auto-labeling model is repeatedly retrained.

FIG. 6 is a diagram for explaining an inspection operation of auto-labeling, according to an embodiment.

The inspection operation may proceed as the auto-labeling model predicts results for the data set. A data set may be uploaded to the auto-labeling system and input to the auto-labeling model, which may perform prediction on all of the data. Results are obtained as the auto-labeling model predicts the data set. An operator may then direct inspection of the weak-point data, that is, the data whose accuracy falls within a preset range. For example, the user or operator may set the accuracy range to between 0 and 50. Each operator 203 may inspect and re-label the weak-point data within the preset accuracy range. When inspection and re-labeling of that weak-point data are complete, the auto-labeling model may be retrained on the resulting data set. Data that does not fall within the preset accuracy range may be input to the auto-labeling model for retraining without being re-labeled. The weak-point data is thus a subset of the input data fed to the auto-labeling model. Retraining the auto-labeling model with weak-point data makes learning efficient, and the auto-labeling model can be fitted quickly.

Alternatively, the accuracy may be set to a single preset value (for example, 50). An operator may direct inspection of the weak-point data whose accuracy is at or below the preset value. Each operator 203 may inspect and re-label the weak-point data at or below the preset accuracy. When inspection and re-labeling of that weak-point data are complete, the auto-labeling model may be retrained on the resulting data set. Here too, the weak-point data is a subset of the input data fed to the auto-labeling model. Retraining the auto-labeling model with weak-point data makes learning efficient, and the auto-labeling model can be fitted quickly.
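Both variants of the selection, a range or a single upper bound, reduce to the same filter. The sketch below assumes each predicted label is a dict with `image_id` and `accuracy` keys on a 0 to 100 scale, and treats an image as a weak point if any of its labels falls in the range; that per-image rule and the field names are assumptions made for illustration.

```python
from collections import defaultdict


def weak_point_images(image_labels, low=0.0, high=50.0):
    """Split predicted labels into weak-point images and the rest.

    image_labels: list of dicts with at least 'image_id' and 'accuracy'.
    An image counts as a weak point if ANY of its labels has an accuracy
    in [low, high] (an assumed rule; the text leaves this open).
    For the single-value variant, keep low=0 and pass the value as `high`.
    Returns (weak, passed), each a dict image_id -> list of its labels.
    """
    by_image = defaultdict(list)
    for label in image_labels:
        by_image[label["image_id"]].append(label)

    weak, passed = {}, {}
    for image_id, labels in by_image.items():
        is_weak = any(low <= item["accuracy"] <= high for item in labels)
        (weak if is_weak else passed)[image_id] = labels
    return weak, passed
```

Only the `weak` images would go to operators; the `passed` images keep their predicted labels and join the retraining set unchanged.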

First, as the auto-labeling model predicts the data set, an image label including the label name, accuracy, position information, and so on may be derived for each label contained in each image. Using the derived image labels, only the data with relatively low accuracy (the weak points that the model finds relatively difficult to predict) may be delivered to the operator. The operator may complete inspection and re-labeling by correcting the label names, positions, and so on in the data. The correction log may be stored in the database. The auto-labeling model may be retrained using the data set that the operator has finished re-labeling.
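A correction log entry can be built by diffing a label before and after the operator's edit. In this sketch the record layout, the `operator_id` field, and the function name are hypothetical; writing the record to an actual database is left out.

```python
from datetime import datetime, timezone


def correction_log(image_id, before, after, operator_id):
    """Return a log record of what the operator changed, or None if nothing.

    before / after: dicts with 'name' (label name) and 'box' (position).
    """
    changes = {}
    if before["name"] != after["name"]:
        changes["name"] = (before["name"], after["name"])
    if tuple(before["box"]) != tuple(after["box"]):
        changes["box"] = (tuple(before["box"]), tuple(after["box"]))
    if not changes:
        return None  # the operator accepted the prediction unchanged
    return {
        "image_id": image_id,
        "operator_id": operator_id,
        "changes": changes,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Keeping the old and new values side by side makes it possible to count later which classes were corrected most often.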

Repeating this process improves the accuracy of auto-labeling and minimizes the amount of data that people actually have to label.

FIG. 7 is a diagram for explaining an acceleration operation of auto-labeling, according to an embodiment.

This embodiment describes a strategy for accelerating auto-labeling. By repeating auto-labeling, the auto-labeling system can improve the efficiency of labeling work and shorten working time. The auto-labeling system can provide quantitative learning indicators for the auto-labeling model and a guide to additional data sets for accelerating training.

As it performs data labeling through repeated auto-labeling, the auto-labeling system continuously collects two learning indicators for inspection: the weak-point image type (an image type based on accuracy) and the type of class data in which corrections occurred.
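Collecting the two indicators amounts to keeping two running counts. The sketch below assumes each reviewed record carries an `image_type` tag (for example a cluster name such as "dark"), an `accuracy` on a 0 to 100 scale, and a `corrected_class` that is `None` when the operator changed nothing; all three field names are hypothetical.

```python
from collections import Counter


def collect_indicators(records, low=0.0, high=50.0):
    """Count weak-point image types and the classes that needed correction.

    records: iterable of dicts with 'image_type', 'accuracy', and
    'corrected_class' (None if the operator made no correction).
    Returns (weak_types, corrected_classes) as two Counters.
    """
    weak_types = Counter()
    corrected_classes = Counter()
    for record in records:
        if low <= record["accuracy"] <= high:
            weak_types[record["image_type"]] += 1
        if record.get("corrected_class") is not None:
            corrected_classes[record["corrected_class"]] += 1
    return weak_types, corrected_classes
```

Because `Counter` objects can be added together, the counts from successive rounds can be accumulated with `+` to track the indicators over time.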

FIG. 8 is a diagram for explaining an operation of providing a guide to additional data, according to an embodiment.

After auto-labeling, the auto-labeling system may retrain the auto-labeling model on the data set corrected as operators inspect and re-label the data. The corrected data set may then be input to the retrained auto-labeling model, and the auto-labeling system may re-perform auto-labeling on the corrected data set using the retrained model. The auto-labeling system may compare the newly predicted result with the data set labeled by the inspector. Using the change in accuracy for the weak-point image types and the error against the operator-labeled (corrected) data, the auto-labeling system may derive quantitative learning indicators for the current auto-labeling model and a guide to the data that should be additionally uploaded (additional data). An administrator may then upload additional data focused on the weak points to the data set.

More specifically, referring to FIG. 9, the auto-labeling system may compare the existing weak-point data and the operator-labeled data (data in which corrections occurred) with the results of the retrained auto-labeling model. For example, it may compare the position of a specific class in the data, the class name, the class type, and the color and brightness of the image. The auto-labeling system may provide the quantitative learning indicators of the current auto-labeling model, the weak-point image types, the class distribution of the additional data, and so on as a guide to additional data. It may derive an accuracy from the error rate between the corrected data and the result data of the retrained auto-labeling model, and may provide a guide to additional data that includes the derived accuracy. It may present examples of weak-point image types and display the color distribution of the weak-point images. For example, it may store information about weak-point images by type and provide examples of weak-point image types based on the stored information, so that operators can add similar types of images. It may display the kinds of classes that the current auto-labeling model has difficulty predicting and recommend classes that should be added first. It may also present, for each class of the current auto-labeling model, the correct answers and the distance from the correct answer.
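Part of such a guide can be computed from paired labels alone. The following minimal sketch assumes the operator's label is treated as the correct answer and compares it with the retrained model's label for the same object; the function name, the dictionary keys, and the `top_k` cutoff are hypothetical, and the image-level items (color distribution, example images) are not covered.

```python
from collections import Counter


def additional_data_guide(pairs, top_k=2):
    """Summarize where the retrained model still disagrees with operators.

    pairs: list of (operator_label, model_label) for the same objects.
    Returns overall accuracy, per-class error rate, and the classes with
    the highest error rate as candidates for additional data.
    """
    totals, errors = Counter(), Counter()
    for truth, predicted in pairs:
        totals[truth] += 1
        if predicted != truth:
            errors[truth] += 1

    total = sum(totals.values())
    accuracy = 1 - sum(errors.values()) / total if total else 0.0
    error_rate = {name: errors[name] / totals[name] for name in totals}
    # Highest error rate first; ties broken alphabetically for stability.
    ranked = sorted(error_rate, key=lambda name: (-error_rate[name], name))
    recommended = [name for name in ranked if error_rate[name] > 0][:top_k]
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "recommended_classes": recommended,
    }
```

A class that appears in `recommended_classes` is one the current model predicts poorly, so uploading more images of that class would be the first suggestion in the guide.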

FIG. 10 is a diagram for explaining an operation of selecting data for each operator to improve work speed, according to an embodiment.

During labeling work, the more frequently tools are used, for example to change a class or correct a position, the slower the work becomes, and over long sessions this can cause inconsistency and reduced accuracy. The auto-labeling system may minimize repetitive actions such as clicks and UX changes, and may apply a data selection technique to improve the specialization of each operator.

To distribute the auto-labeled data to the operators, the auto-labeling system may use a cluster model to group work files with the same or similar attributes into clusters for each class, and may distribute the generated per-class clusters to the operators. In this way, images of the same or similar class and of the same or similar form are concentrated on each operator as far as possible, which improves work efficiency.
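The distribution described above can be illustrated with a minimal sketch. The description does not specify the cluster model, so this example simply groups files by their predicted class as a stand-in for it; the function name `distribute_by_class`, the input layout, and the load-balancing rule are all assumptions made for illustration.

```python
from collections import defaultdict


def distribute_by_class(items, operators):
    """Group auto-labeled files by predicted class and hand out whole groups.

    items: list of (file_name, predicted_class) pairs (hypothetical layout).
    operators: list of operator ids.
    Grouping by predicted class is an assumed stand-in for the cluster model.
    """
    clusters = defaultdict(list)
    for file_name, predicted_class in items:
        clusters[predicted_class].append(file_name)

    assignments = {op: [] for op in operators}
    # Hand out the largest clusters first, each to the least-loaded operator,
    # so that every operator mostly sees a single class.
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _cls, files in ordered:
        target = min(operators, key=lambda op: len(assignments[op]))
        assignments[target].extend(files)
    return assignments


items = [("a.jpg", "car"), ("b.jpg", "dog"), ("c.jpg", "car"),
         ("d.jpg", "dog"), ("e.jpg", "car")]
result = distribute_by_class(items, ["w1", "w2"])
```

Keeping each cluster whole, rather than splitting it for perfect balance, reflects the stated goal of concentrating similar images on one operator.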

FIG. 11 is a block diagram illustrating the configuration of an auto-labeling system according to an embodiment, and FIG. 12 is a flowchart illustrating an auto-labeling method in the auto-labeling system according to an embodiment.

The processor of the auto-labeling system 100 may include a model generator 1110, a labeling performing unit 1120, a data selection unit 1130, and a re-learning unit 1140. These components of the processor may be representations of different functions performed by the processor according to control instructions provided by program code stored in the auto-labeling system. The processor and its components may control the auto-labeling system to perform steps 1210 to 1240 of the auto-labeling method of FIG. 12. Here, the processor and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program contained in memory.

The processor may load into memory the program code stored in a program file for the auto-labeling method. For example, when the program is executed in the auto-labeling system, the processor may control the auto-labeling system to load the program code from the program file into memory under the control of the operating system. Here, the model generator 1110, the labeling performing unit 1120, the data selection unit 1130, and the re-learning unit 1140 may each be a different functional representation of the processor that executes the instructions of the corresponding part of the program code loaded in memory to carry out steps 1210 to 1240.

In step 1210, the model generator 1110 may generate an auto-labeling model using the data set uploaded by the operator.

In step 1220, the labeling performing unit 1120 may perform auto-labeling on the data set uploaded by the operator using the generated auto-labeling model. To distribute the auto-labeled data to the operators, the labeling performing unit 1120 may use a cluster model to group work files with the same or similar attributes into clusters for each class, and may distribute the generated per-class clusters to the operators.

In step 1230, the data selection unit 1130 may select the data to be labeled from the uploaded data set based on the results predicted through the auto-labeling performed. The data selection unit 1130 may derive, through the auto-labeling performed, an image label containing the label name, accuracy, and location data for each label included in each image, and may use the derived image labels to deliver a data set within a preset accuracy range to the operator. When the operator corrects the label names and location data included in the data set within the preset accuracy range, inspection and re-labeling are completed, and the data selection unit 1130 may store log information on the corrections made through the completed inspection and re-labeling in a database. As auto-labeling is performed repeatedly, the data selection unit 1130 may determine whether the predicted results fall within the preset accuracy range, select the data that fall within the preset accuracy range as weakness images based on the predicted results, and collect a plurality of indicators including the types of the selected weakness images and the class data types in which corrections occurred in the selected data to be labeled.
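A minimal sketch of the selection in step 1230 follows. The `ImageLabel` record mirrors the label name, accuracy, and location data named in the description, but its field names, the 0 to 100 accuracy scale, and the function `select_for_review` are assumptions for illustration; the default range of 0 to 50 is taken from the example range given in the description.

```python
from dataclasses import dataclass


@dataclass
class ImageLabel:
    image_id: str
    label_name: str
    accuracy: float   # model confidence, assumed to be on a 0-100 scale
    box: tuple        # location data as (x_min, y_min, x_max, y_max)


def select_for_review(labels, low=0.0, high=50.0):
    """Return ids of images having at least one label inside [low, high].

    The returned images are the ones that would be sent to the operator
    for inspection and re-labeling (the "weakness images").
    """
    selected = []
    for label in labels:
        in_range = low <= label.accuracy <= high
        if in_range and label.image_id not in selected:
            selected.append(label.image_id)
    return selected


labels = [
    ImageLabel("img1", "car", 92.0, (0, 0, 10, 10)),
    ImageLabel("img1", "dog", 35.0, (5, 5, 20, 20)),
    ImageLabel("img2", "cat", 80.0, (1, 1, 8, 8)),
    ImageLabel("img3", "cat", 50.0, (2, 2, 9, 9)),
]
weak_images = select_for_review(labels)
```

An image is selected as soon as any one of its labels falls in the range, since a single uncertain label is enough to require inspection.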

In step 1240, the re-learning unit 1140 may retrain the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed. Using the retrained auto-labeling model, the re-learning unit 1140 may perform auto-labeling again on the completed data set or on the weakness data, which is a part of the completed data set, compare the results predicted through the repeated auto-labeling with the data set labeled by the operator, and derive from the comparison the quantitative learning indicators of the retrained auto-labeling model and a guide for the data to be additionally uploaded. The re-learning unit 1140 may derive accuracy based on the error rate between the data set labeled by the operator and the results predicted through the repeated auto-labeling, provide an additional-data guide that includes the derived accuracy, present examples of the selected weakness image types, display the color distribution of the weakness images, indicate the classes that the current auto-labeling model finds difficult to predict, recommend the classes that should be added first, and present, for each class, the correct answers of the current auto-labeling model and the distance from the correct answer.
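The comparison in step 1240 can be sketched as a per-class error rate between operator labels and the retrained model's predictions, followed by a recommendation of the classes to add first. This is a simplified illustration that assumes one class label per image; the function names and the dictionary layout are hypothetical and are not taken from the description.

```python
from collections import Counter


def per_class_error_rate(operator_labels, model_labels):
    """Error rate of the model for each class, using operator labels as truth.

    Both arguments map image_id -> class name (assumed: one label per image).
    """
    totals, errors = Counter(), Counter()
    for image_id, truth in operator_labels.items():
        totals[truth] += 1
        if model_labels.get(image_id) != truth:
            errors[truth] += 1
    return {cls: errors[cls] / totals[cls] for cls in totals}


def recommend_classes(error_rates, top_n=1):
    """Classes with the highest error rate, i.e. those to supplement first."""
    ranked = sorted(error_rates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cls for cls, _rate in ranked[:top_n]]


operator_labels = {"1": "car", "2": "car", "3": "dog", "4": "dog"}
model_labels = {"1": "car", "2": "car", "3": "cat", "4": "dog"}
rates = per_class_error_rate(operator_labels, model_labels)
recommended = recommend_classes(rates)
```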

Steps 1220 to 1240 may be performed repeatedly.

FIG. 13 is a diagram for explaining the accuracy of auto-labeling according to an embodiment.

The auto-labeling system may reduce the workload by applying the user's change history and the weakness data to auto-labeling and to the training of the auto-labeling model.

The auto-labeling system may preferentially learn the types of data that are changed most often, based on the inspector's corrections (change history). The auto-labeling system may perform weakness-based learning on the auto-labeling model. An accuracy range may be set when auto-labeling is performed. For example, when an accuracy range between 0 and 50 is set, the data set falling within that range, on which the auto-labeling model is uncertain, may be learned first.
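One way to picture this prioritization is to order training samples so that those inside the preset accuracy range come first and, within each group, those with more inspector corrections come earlier. The description does not state how the two signals are combined, so the ordering rule, the field names `edit_count` and `accuracy`, and the function `training_priority` are assumptions for illustration only.

```python
def training_priority(samples, low=0.0, high=50.0):
    """Order sample ids for training.

    samples: list of dicts with keys 'id', 'edit_count', 'accuracy'
             (hypothetical layout; accuracy assumed on a 0-100 scale).
    Samples inside [low, high] come first; ties are broken by the number
    of inspector corrections (more corrections first), then by id.
    """
    def sort_key(sample):
        in_weak_range = low <= sample["accuracy"] <= high
        return (not in_weak_range, -sample["edit_count"], sample["id"])

    return [s["id"] for s in sorted(samples, key=sort_key)]


samples = [
    {"id": "a", "edit_count": 0, "accuracy": 95.0},
    {"id": "b", "edit_count": 3, "accuracy": 40.0},
    {"id": "c", "edit_count": 1, "accuracy": 20.0},
    {"id": "d", "edit_count": 5, "accuracy": 80.0},
]
order = training_priority(samples)
```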

The auto-labeling acceleration strategy uses automatic data analysis to select and recommend the data that the model should learn first, so that the auto-labeling model is trained faster and the efficiency of auto-labeling is maximized.

FIG. 14 is an example of a result screen of data annotation according to an embodiment.

The performance of auto-labeling and easy labeling may be evaluated. Average Precision (AP) may be used for detection, and Mean Intersection over Union (mIoU) may be used for segmentation.
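As an illustration of the segmentation metric, mIoU can be computed by taking the intersection over union of the predicted and ground-truth pixel sets for each class and averaging over the classes. The sketch below uses flat lists of per-pixel class ids and skips classes absent from both inputs; the function name and that skipping convention are assumptions, since the description does not detail the computation.

```python
def mean_iou(pred, truth, num_classes):
    """Mean Intersection over Union for per-pixel class ids.

    pred, truth: flat sequences of class ids with equal length.
    Classes that appear in neither sequence are skipped (assumed convention).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```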

For example, the data sets used in the experiment for performance evaluation may be PascalVoC2012 and MSCoco, which are data sets used for evaluating the performance of auto-labeling models, and Google ImageNet data. All of the data sets consist of image data, and annotation data for each class may be included for each image.

The accuracy of labeling results is generally evaluated using AP for detection and mIoU for segmentation. To calculate AP, IoU is calculated first. IoU is calculated from the predicted bounding box and the ground truth, and may be expressed as Equation 1.

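Equation 1:

$$\mathrm{IoU} = \frac{\operatorname{area}(B_{p} \cap B_{gt})}{\operatorname{area}(B_{p} \cup B_{gt})}$$

where $B_{p}$ is the predicted bounding box and $B_{gt}$ is the ground-truth box.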

In the embodiment, a detection may be judged correct if the IoU is 0.5 or more and incorrect if the IoU is less than 0.5.
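The IoU test can be sketched for axis-aligned boxes as follows. The box layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions for illustration; the 0.5 threshold is the one stated in the embodiment.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when IoU is at or above the threshold."""
    return box_iou(pred_box, gt_box) >= threshold


half_overlap = box_iou((0, 0, 10, 10), (5, 0, 15, 10))  # 50 / 150
```

For instance, a predicted box that covers exactly the upper half of the ground-truth box has an IoU of 0.5 and is still judged correct, because the threshold is inclusive.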

In addition, performance may be verified by measuring the labeling speed of operators and of the labeling proposed in the embodiment. For example, to verify performance, 50 experimenters may be organized into 5 experimental groups of 10 people each. In the experiment, the 50 experimenters annotate 100 random images from each data set three times, the same images are processed with auto-labeling and, for a single class, with easy labeling, and accuracy and speed are measured. The experiment can confirm that the labeling speed and labeling accuracy are higher than those of the operators.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

Claims

An auto-labeling method performed by an auto-labeling system, the method comprising:
generating an auto-labeling model using a data set uploaded by an operator;
performing auto-labeling on the data set uploaded by the operator using the generated auto-labeling model;
selecting data to be labeled from the uploaded data set based on a result predicted through the auto-labeling performed; and
re-learning the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed,
wherein the performing step comprises:
using a cluster model, in order to distribute the auto-labeled data to each operator, to group work files with the same or similar attributes into clusters for each class, and distributing the generated per-class clusters to each operator,
and wherein the re-learning step comprises:
re-performing auto-labeling, using the retrained auto-labeling model, on the completed data set or on weakness data that is a part of the completed data set; comparing the results predicted through the re-performed auto-labeling with the data set labeled by the operator; deriving, through the comparison, quantitative learning indicators of the retrained auto-labeling model and a guide for data to be additionally uploaded; deriving accuracy based on the error rate between the data set labeled by the operator and the results predicted through the re-performed auto-labeling; providing an additional-data guide including the derived accuracy; presenting examples of the selected weakness image types; displaying the color distribution of the weakness images; indicating the classes that the current auto-labeling model finds difficult to predict; recommending the classes to be added first; and presenting, for each class, the correct answers of the current auto-labeling model and the distance from the correct answer.

The method of claim 1,
wherein the selecting step comprises:
deriving, through the auto-labeling performed, an image label including a label name, accuracy, and location data for each label included in each image, and delivering a data set within a preset accuracy range to the operator using the derived image labels; and
completing inspection and re-labeling as the operator corrects the label names and location data included in the data set within the preset accuracy range, and storing log information on the corrections made through the completed inspection and re-labeling in a database.

The method of claim 1,
wherein the selecting step comprises:
determining, as the auto-labeling is performed repeatedly, whether a predicted result falls within a preset accuracy range; selecting, based on the predicted result, data that fall within the preset accuracy range as weakness images; and collecting a plurality of learning indicators including the types of the selected weakness images and the class data types in which corrections occurred in the selected data to be labeled.

delete

An auto-labeling system comprising:
a model generator for generating an auto-labeling model using a data set uploaded by an operator;
a labeling performing unit for performing auto-labeling on the data set uploaded by the operator using the generated auto-labeling model;
a data selection unit for selecting data to be labeled from the uploaded data set based on a result predicted through the auto-labeling performed; and
a re-learning unit for re-learning the auto-labeling model using the data set for which data inspection and re-labeling of the selected data have been completed,
wherein the labeling performing unit,
in order to distribute the auto-labeled data to each operator, uses a cluster model to group work files with the same or similar attributes into clusters for each class, and distributes the generated per-class clusters to each operator,
and wherein the re-learning unit
re-performs auto-labeling, using the retrained auto-labeling model, on the completed data set or on weakness data that is a part of the completed data set; compares the results predicted through the re-performed auto-labeling with the data set labeled by the operator; derives, through the comparison, quantitative learning indicators of the retrained auto-labeling model and a guide for data to be additionally uploaded; derives accuracy based on the error rate between the data set labeled by the operator and the results predicted through the re-performed auto-labeling; provides an additional-data guide including the derived accuracy; presents examples of the selected weakness image types; displays the color distribution of the weakness images; indicates the classes that the current auto-labeling model finds difficult to predict; recommends the classes to be added first; and presents, for each class, the correct answers of the current auto-labeling model and the distance from the correct answer.