KR20240045414A

KR20240045414A - Method and device for quality inspection and automatic annotation of training data based on deep learning ensemble

Info

Publication number: KR20240045414A
Application number: KR1020220124153A
Authority: KR
Inventors: 장용석; 육창근
Original assignee: (주)다울디엔에스
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2024-04-08

Abstract

A method and device for deep learning ensemble-based training data quality inspection and automatic annotation are disclosed. A training data management method for quality inspection and automatic annotation may include: receiving training data consisting of source data and annotation data for the source data; performing a syntax correctness check by parsing the annotation data; performing a validity check, through a deep learning ensemble artificial intelligence model, on training data found to be free of syntax errors by the parsing; and annotating objects recognized in the source data through the deep learning ensemble artificial intelligence model.

Description

Method and device for deep learning ensemble-based training data quality inspection and automatic annotation {METHOD AND DEVICE FOR QUALITY INSPECTION AND AUTOMATIC ANNOTATION OF TRAINING DATA BASED ON DEEP LEARNING ENSEMBLE}

The description below relates to techniques for inspecting the quality of training data.

As research on artificial intelligence has become increasingly active in computer science, a range of algorithms based on deep learning techniques that mimic the human learning system have been developed. Accordingly, many software products now adopt such algorithms.

Deep learning is a form of artificial intelligence that evolved from artificial neural networks and learns from data using information input and output layers that resemble neurons in the brain. Its aim is to build artificial neural networks that, by organizing algorithms into layers, can learn on their own and make intelligent decisions.

A deep learning ensemble is a technique that improves predictive performance by combining multiple learning models. Ensemble learning can be divided into voting, bagging, boosting, and stacking.
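As an illustration of the simplest of these types, the sketch below shows voting over the outputs of several models. It is a minimal example written for this description, not the implementation of the disclosed system; the function names `hard_vote` and `soft_vote` are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label chosen by the most models.

    On a tie, the label that appears first in `predictions` wins.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Average per-class probabilities over models and return the best class index.

    `prob_rows` holds one probability list per model, all of the same length.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)
```

Hard voting only needs the predicted labels, while soft voting needs each model to expose class probabilities.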

Many things matter in learning, and what one learns from is among the most important. Learning from good, correct information yields better efficiency and better results than learning from poor information.

The same holds for machine learning: what data a model learns from is critical. Various methods of constructing (or generating) training data are being studied.

1. Korean Patent Laid-Open Publication No. 10-2019-0044814 (Data generation and data construction method for deep learning training, published May 2, 2019)

Provided is a technique that can automate quality inspection and annotation of training data using a deep learning ensemble model.

Provided is a training data management method executed on a computer device, wherein the computer device includes at least one processor configured to execute computer-readable instructions contained in a memory, the training data management method including: receiving, by the at least one processor, training data consisting of source data and annotation data for the source data; performing, by the at least one processor, a syntax correctness check by parsing the annotation data; and performing, by the at least one processor, a validity check through a deep learning ensemble artificial intelligence model on training data found to be free of syntax errors by the parsing.

According to one aspect, performing the syntax correctness check may include parsing annotations of a given type through a type loader in which a dataset for object detection is set as an annotation definition.

According to another aspect, the deep learning ensemble artificial intelligence model may be a model trained on pairs of the source data and the annotation data through ensemble learning using K-fold cross validation.

According to yet another aspect, the training data management method may further include annotating, by the at least one processor, objects recognized in the source data through the deep learning ensemble artificial intelligence model.

According to yet another aspect, the annotating may include: generating a bounding box type annotation for an object recognized in the source data; and transferring the generated annotation, together with the source data, to a local storage or a remote storage.

According to embodiments of the present invention, automating quality inspection and annotation of training data using a deep learning ensemble model can greatly reduce the share of manual work and lower the cost of generating training data.

FIG. 1 is a block diagram illustrating an example of the internal configuration of a computer device according to an embodiment of the present invention.
FIG. 2 shows the configuration of a deep learning ensemble-based training data quality inspection system according to an embodiment of the present invention.
FIG. 3 shows the configuration of a deep learning ensemble-based training data automatic annotation system according to an embodiment of the present invention.
FIG. 4 is an example diagram illustrating a method of training a deep learning ensemble artificial intelligence model according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of a training data management method, including quality inspection and automatic annotation, that can be performed by a computer device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to techniques for quality inspection and automatic annotation of training data.

Embodiments, including those specifically disclosed herein, can automate quality inspection and annotation of training data using a deep learning ensemble model.

FIG. 1 is a block diagram showing an example of a computer device according to an embodiment of the present invention. For example, a training data management system according to embodiments of the present invention may be implemented by the computer device 100 shown in FIG. 1.

As shown in FIG. 1, the computer device 100 may include, as components for executing the training data management method according to embodiments of the present invention, a memory 110, a processor 120, a communication interface 130, and an input/output interface 140.

The memory 110 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as ROM or a disk drive may also be included in the computer device 100 as a separate permanent storage device distinct from the memory 110. An operating system and at least one program code may be stored in the memory 110. These software components may be loaded into the memory 110 from a computer-readable recording medium separate from the memory 110, such as a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. In another embodiment, the software components may be loaded into the memory 110 through the communication interface 130 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 110 of the computer device 100 based on a computer program installed from files received over a network 160.

The processor 120 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 120 by the memory 110 or the communication interface 130. For example, the processor 120 may be configured to execute received instructions according to program code stored in a recording device such as the memory 110.

The communication interface 130 may provide a function for the computer device 100 to communicate with other devices through the network 160. For example, requests, commands, data, files, and the like that the processor 120 of the computer device 100 generates according to program code stored in a recording device such as the memory 110 may be transmitted to other devices through the network 160 under the control of the communication interface 130. Conversely, signals, commands, data, files, and the like from other devices may be received by the computer device 100 through the communication interface 130 via the network 160. Signals, commands, data, and the like received through the communication interface 130 may be passed to the processor 120 or the memory 110, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 100 may further include.

The communication method is not limited and may include not only methods that use the communication networks the network 160 may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcasting network) but also short-range wired/wireless communication between devices. For example, the network 160 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network 160 may also include any one or more network topologies, including but not limited to a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network.

The input/output interface 140 may be a means for interfacing with an input/output device 150. For example, input devices may include a microphone, keyboard, camera, or mouse, and output devices may include a display or speaker. As another example, the input/output interface 140 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 150 may also be configured as a single device together with the computer device 100.

In other embodiments, the computer device 100 may include fewer or more components than those shown in FIG. 1. However, most conventional components need not be explicitly illustrated. For example, the computer device 100 may be implemented to include at least some of the input/output devices 150 described above, or may further include other components such as a transceiver, a camera, various sensors, and a database.

Specific embodiments of the method and device for deep learning ensemble-based training data quality inspection and automatic annotation are described below.

The computer device 100 according to this embodiment may provide clients with a training data management service for training data quality inspection and automatic annotation, through a dedicated application installed on the client or through access to a web/mobile site associated with the computer device 100. A computer-implemented training data management system may be configured in the computer device 100. For example, the training data management system may be implemented as an independently operating program, or configured in an in-app form within a specific application so that it can operate on that application.

The processor 120 of the computer device 100 may be implemented with components for performing the training data management method described below. Depending on the embodiment, components of the processor 120 may be selectively included in or excluded from the processor 120, and may be separated or merged to represent the functions of the processor 120.

The processor 120 and its components may control the computer device 100 to perform the steps included in the training data management method described below. For example, the processor 120 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program contained in the memory 110.

Here, the components of the processor 120 may be representations of different functions performed by the processor 120 according to instructions provided by program code stored in the computer device 100.

The processor 120 may read the necessary instructions from the memory 110, into which instructions related to control of the computer device 100 have been loaded. In this case, the instructions read may include instructions for controlling the processor 120 to execute the steps described below.

The steps included in the training data management method described below may be performed in an order different from the order shown, and some steps may be omitted or additional processes may be included.

The training data management service according to the present invention may provide a training data loading function, a syntax correctness check function for training data, an automatic annotation tool based on building and training a deep learning ensemble artificial intelligence model, a validity check function for training data, a diversity check function for training data, and the like.

FIG. 2 shows the configuration of a deep learning ensemble-based training data quality inspection system according to an embodiment of the present invention. The training data quality inspection system 200 according to embodiments of the present invention may be implemented by the computer device 100 shown in FIG. 1.

Referring to FIG. 2, the training data quality inspection system 200 may provide a training data quality inspection function so that industrial safety artificial intelligence source data can be used as training data for an object recognition model.

The training data quality inspection system 200 may include a type loader 210, a syntax analyzer 220, and a data validity analyzer 230.

A crowd worker, who takes part in tasks such as collecting, refining, processing, and quality-controlling the data needed to train artificial intelligence, may upload training data for which annotation of the source data has been completed, through a dedicated application or through access to a web/mobile site.

The type loader 210 may preset the definition of the annotation data. As the annotation definition, the type loader 210 may set a dataset for object detection, for example, PASCAL VOC, MS COCO, JSON, or the like.

The syntax analyzer 220 may receive, together with the source data, the data with which a crowd worker has annotated the source data.

The syntax analyzer 220 may receive the source data and the annotation data as training data. The source data may be image data captured from camera footage of an industrial site.

The syntax analyzer 220 may receive the training data uploaded by the crowd worker and perform syntax correctness and data diversity checks by parsing annotations of the type set in the type loader 210.

As the result of parsing the annotations, the syntax analyzer 220 may visualize and present syntax correctness data and diversity distribution information through a statistics view.
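The description does not specify how the diversity distribution is computed. One simple reading, sketched below under that assumption, is a per-class count and share of the annotated objects, which a statistics view could then plot. The function name `class_distribution` is hypothetical.

```python
from collections import Counter


def class_distribution(category_names):
    """Map each class name to (count, share of all annotated objects).

    Returns an empty dict when there are no annotations.
    """
    counts = Counter(category_names)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: (n, n / total) for name, n in counts.items()}
```

A heavily skewed distribution, for example almost all boxes labeled `person`, would indicate low class diversity in the uploaded data.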

For an annotation that falls outside the settings of the type loader 210, that is, an annotation containing a syntax error, the syntax analyzer 220 may request the crowd worker to correct the syntax error.
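A minimal sketch of such a syntax check follows, assuming the type loader is set to an MS COCO-style JSON document with an `annotations` list whose entries carry `image_id`, `category_id`, and a `bbox` of `[x, y, width, height]`. The function name `check_annotation_syntax` and the exact error messages are hypothetical; the returned list is the kind of information that could accompany a correction request.

```python
import json

# Keys every annotation entry is assumed to need in this sketch.
REQUIRED_KEYS = ("image_id", "category_id", "bbox")


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_annotation_syntax(raw_json, known_category_ids):
    """Return a list of syntax errors; an empty list means the file passed."""
    try:
        doc = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    annotations = doc.get("annotations") if isinstance(doc, dict) else None
    if not isinstance(annotations, list):
        return ["top-level 'annotations' list is missing"]
    errors = []
    for i, ann in enumerate(annotations):
        if not isinstance(ann, dict):
            errors.append(f"annotation {i}: not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in ann:
                errors.append(f"annotation {i}: missing '{key}'")
        bbox = ann.get("bbox")
        if bbox is not None:
            well_formed = (isinstance(bbox, list) and len(bbox) == 4
                           and all(_is_number(v) for v in bbox))
            if not well_formed:
                errors.append(f"annotation {i}: bbox must be four numbers")
            elif bbox[2] <= 0 or bbox[3] <= 0:
                errors.append(f"annotation {i}: bbox width and height must be positive")
        if "category_id" in ann:
            category = ann["category_id"]
            if not (isinstance(category, int) and category in known_category_ids):
                errors.append(f"annotation {i}: unknown category_id {category!r}")
    return errors
```

Only files that come back with an empty error list would move on to the validity check.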

The syntax analyzer 220 may pass the pairs of source data and annotation data received as training data to the data validity analyzer 230.

The data validity analyzer 230 may perform a validity check on the training data through the deep learning ensemble artificial intelligence model. The data validity analyzer 230 may perform the validity check on training data whose annotations were found to be free of syntax errors by the parsing of the annotation data.

In this embodiment, a deep learning ensemble artificial intelligence model may be built, through training in a K-fold cross validation setting, such that the overall average accuracy between the source data and the annotation data is at or above a certain level (for example, 95%). Through the deep learning ensemble artificial intelligence model, the data validity analyzer 230 may check the validity of the training data, such as whether the annotations on the source data are correctly labeled and whether the annotation coordinates are correct.
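The description does not state how label and coordinate correctness are measured. A common choice, used here purely as an assumption, is to compare each crowd-worker box with the box the model predicts for the same object: the labels must match and the intersection over union (IoU) of the two boxes must reach a threshold. The names `iou` and `is_annotation_valid` and the default threshold of 0.5 are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def is_annotation_valid(annotated, predicted, iou_threshold=0.5):
    """True when the human label matches the model label and the boxes overlap enough.

    Both arguments are dicts with a 'label' string and a 'bbox' list.
    """
    same_label = annotated["label"] == predicted["label"]
    return same_label and iou(annotated["bbox"], predicted["bbox"]) >= iou_threshold
```

Annotations that fail either test could be flagged for review rather than rejected outright, since the model itself may be wrong.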

The deep learning ensemble artificial intelligence model is described further below.

FIG. 3 shows the configuration of a deep learning ensemble-based training data automatic annotation system according to an embodiment of the present invention. The training data automatic annotation system 300 according to embodiments of the present invention may be implemented by the computer device 100 shown in FIG. 1.

Referring to FIG. 3, the training data automatic annotation system 300 may provide an automatic annotation function for industrial safety artificial intelligence source data.

In this embodiment, a deep learning ensemble artificial intelligence model may be built, through training in a K-fold cross validation setting, such that the overall average accuracy between the source data and the annotation data is at or above a certain level (for example, 95%). The annotation engine 310 may automatically annotate objects recognized in the source data through the deep learning ensemble artificial intelligence model.
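The K-fold setting and the average-accuracy target can be sketched as follows. This is a plain, unshuffled split written for illustration, with hypothetical names `k_fold_indices` and `meets_target`; training and scoring each fold's model are left out, and the 0.95 default simply mirrors the 95% example in the text.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k (train, validation) pairs.

    Each sample lands in exactly one validation fold; earlier folds take the
    remainder when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


def meets_target(fold_accuracies, target=0.95):
    """True when the mean accuracy over all folds reaches the target level."""
    return sum(fold_accuracies) / len(fold_accuracies) >= target
```

In practice one model would be trained per fold on the `train` indices and scored on the `validation` indices, and the per-fold scores passed to `meets_target`.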

In detail, the training data automatic annotation system 300 may include an annotation engine 310 for object recognition and automatic annotation.

For convenience of explanation, the embodiment describes, as an example, an operation of performing annotation using test video information for industrial safety artificial intelligence recognition captured at an industrial site.

The annotation engine 310 may acquire test video for industrial safety artificial intelligence recognition that a camera captures at an industrial site. For example, a camera may be installed on a forklift at the industrial site, and the test video may be captured through that camera.

The annotation engine 310 may capture the acquired test video information for industrial safety artificial intelligence recognition and detect object information in the captured information using an object detection algorithm (for example, YOLO DarkNet).

The annotation engine 310 may feed the test video information for industrial safety artificial intelligence recognition, including the detected object information, into a learning model for object recognition. Using this learning model, the annotation engine 310 may recognize and annotate objects in the test video information that includes the detected object information. In other words, the annotation engine 310 may recognize an object in that information and generate, for the recognized object, an annotation of the MS COCO type, which is one of the bounding box types.

Alternatively, the annotation engine 310 may feed only the detected object information into the learning model for object recognition. Using this learning model, the annotation engine 310 may recognize and annotate objects from the detected object information. In other words, the annotation engine 310 may recognize an object from the detected object information and generate an MS COCO type annotation for the recognized object.

The annotation engine 310 may obtain coordinate information for the region of a recognized object and then annotate that region. For the region of the recognized object, the annotation engine 310 may generate a bounding box type annotation with classes including person, face, upper body, lower body, forklift, battery car, and the like.
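As a rough sketch of what generating such an annotation can look like, the function below converts a detector box given by its corner coordinates into an MS COCO-style annotation entry, whose `bbox` is `[x, y, width, height]`. The class list comes from the paragraph above; the numeric category ids, the corner-coordinate input, and the name `to_coco_annotation` are assumptions made for the example.

```python
# Class names from the description; the id numbering is an arbitrary choice here.
CATEGORIES = ["person", "face", "upper body", "lower body", "forklift", "battery car"]
CATEGORY_IDS = {name: i + 1 for i, name in enumerate(CATEGORIES)}


def to_coco_annotation(ann_id, image_id, label, x1, y1, x2, y2):
    """Build one COCO-style annotation dict from corner coordinates.

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
    """
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        raise ValueError("box must have positive width and height")
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": CATEGORY_IDS[label],
        "bbox": [x1, y1, width, height],
        "area": width * height,
        "iscrowd": 0,
    }
```

For example, `to_coco_annotation(1, 7, "forklift", 10, 20, 110, 70)` yields a box of `[10, 20, 100, 50]` under category id 5 in this numbering.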

More specifically, for example, the annotation engine 310 may first annotate identification information for a recognized object and then annotate detailed information for it. When the recognized object is a person, the annotation engine 310 may first annotate the object as a person and then annotate detailed information about the person (for example, upper body, lower body, and so on). When the recognized object is an inanimate object, the annotation engine 310 may first annotate it as a forklift and then annotate detailed information about the forklift (for example, wheels, steering wheel, and so on). After the annotation engine 310 has generated a bounding box type annotation, the annotation may also be corrected through a confirmation step by a user.
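One way to represent this primary and secondary annotation is a small nested structure, sketched below. The `BoxAnnotation` class, the `annotate_two_stage` function, and the rule that a detail box is kept only if it lies inside the primary box are all assumptions made for illustration; the description does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoxAnnotation:
    label: str
    bbox: List[float]  # [x, y, width, height]
    details: List["BoxAnnotation"] = field(default_factory=list)


def contains(outer, inner):
    """True when box `inner` lies entirely within box `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def annotate_two_stage(primary_label, primary_bbox, part_detections):
    """Create the primary annotation, then attach detail boxes that fall inside it.

    `part_detections` is a list of (label, bbox) pairs from a part detector.
    """
    primary = BoxAnnotation(primary_label, primary_bbox)
    for label, bbox in part_detections:
        if contains(primary_bbox, bbox):
            primary.details.append(BoxAnnotation(label, bbox))
    return primary
```

A user confirmation step could then edit or remove entries in `details` before the annotation is stored.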

The annotation engine 310 may store the captured images and annotation data locally. Here, the local storage may be an embedded electronic device, and the stored captured images and annotation data may be updated whenever new images and annotation data are stored locally.

The annotation engine 310 may transfer the captured images and annotation data to a remote storage, for example, over wireless or wired communication. The remote storage may store the captured images and annotation data received from the annotation engine 310.

FIG. 4 is an exemplary diagram illustrating a method of training a deep learning ensemble artificial intelligence model according to an embodiment of the present invention.

In the present invention, a model for training data quality inspection and automatic annotation can be built through ensemble modeling based on analysis of artificial intelligence models.

At least one ensemble learning type may be selected, and a model for training data quality inspection and automatic annotation may be built through ensemble learning of that type. Referring to FIG. 4, after an artificial intelligence model is selected (①), initial training of the selected model may be performed using a small data set consisting of pairs of source data and annotation data (②). If validation of the initial training result reveals no problems, main training may proceed with a large data set, followed in turn by validation of that training result and inference. Various hyperparameters may be tuned, such as merging artificial intelligence models, partially changing a model (layer changes, LR scheduler changes, GPU acceleration, and the like), or changing the loss function (③). Through the above steps (①, ②, ③), the artificial intelligence model with the best performance may be selected as the final model, based on training results obtained under the decision rule of the artificial intelligence model (④).
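The selection flow of steps ① to ④ can be outlined as a loop over candidate configurations. The sketch below is only an illustration under stated assumptions: `select_final_model`, `min_initial_score`, and the toy `train` and `evaluate` stand-ins are hypothetical names, tuned variants (step ③) are modeled simply as extra candidates, and no real deep learning framework is involved.

```python
def select_final_model(candidates, small_set, full_set, val_set,
                       train, evaluate, min_initial_score=0.5):
    """Return (name, score) of the best candidate, or (None, -inf) if none passes."""
    best_name, best_score = None, float("-inf")
    for name, config in candidates.items():          # (1) pick a candidate model
        draft = train(config, small_set)             # (2) initial training on a small set
        if evaluate(draft, val_set) < min_initial_score:
            continue                                 # initial validation failed, skip main training
        model = train(config, full_set)              # main training on the large set
        score = evaluate(model, val_set)             # validation of the main training result
        if score > best_score:                       # (4) keep the best performer
            best_name, best_score = name, score
    return best_name, best_score


# Toy stand-ins so the sketch runs: a "model" is just a number describing its quality.
def toy_train(config, dataset):
    return config["skill"]


def toy_evaluate(model, dataset):
    return model


# (3) tuned variants appear here as additional candidate configurations.
candidates = {
    "baseline": {"skill": 0.40},
    "model_b": {"skill": 0.90},
    "model_b_tuned": {"skill": 0.97},
}
best = select_final_model(candidates, [], [], [], toy_train, toy_evaluate)
print(best)
```

Gating main training on the small-set result mirrors the text's point that the expensive large-data run happens only when the initial result shows no problems.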

FIG. 5 is a flowchart illustrating an example of a training data management method, including quality inspection and automatic annotation, that can be performed by a computer device according to an embodiment of the present invention.

Referring to FIG. 5, in step S501, the processor 120 may preset a definition of the annotation data in a type loader for object-detection-based annotation.

In step S502, the processor 120 may check the syntactic correctness and data diversity of annotation data produced by crowd workers by parsing annotations of the type set in the type loader.
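A syntax check of this kind can be sketched as follows. The patent does not specify the annotation file format, so this example assumes JSON records with a `label` string and a four-value `bbox` list; `BBOX_TYPE`, `check_syntax`, and `label_distribution` are hypothetical names, and the label count is just one simple way to look at data diversity.

```python
import json
from collections import Counter

# Hypothetical annotation type, standing in for a definition held by the type loader.
BBOX_TYPE = {"required": {"label": str, "bbox": list}, "bbox_length": 4}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_syntax(raw_text, type_def=BBOX_TYPE):
    """Return a list of error messages; an empty list means the syntax is valid."""
    try:
        records = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(records, list):
        return ["top-level value must be a list of annotation records"]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: not an object")
            continue
        for key, expected in type_def["required"].items():
            if key not in rec:
                errors.append(f"record {i}: missing '{key}'")
            elif not isinstance(rec[key], expected):
                errors.append(f"record {i}: '{key}' has the wrong type")
        bbox = rec.get("bbox")
        if isinstance(bbox, list):
            if len(bbox) != type_def["bbox_length"]:
                errors.append(f"record {i}: bbox needs {type_def['bbox_length']} values")
            elif not all(is_number(v) for v in bbox):
                errors.append(f"record {i}: bbox values must be numeric")
    return errors


def label_distribution(records):
    """Count annotations per label as a simple data-diversity indicator."""
    return Counter(rec["label"] for rec in records)


good = ('[{"label": "person", "bbox": [0, 0, 10, 20]}, '
        '{"label": "forklift", "bbox": [5, 5, 50, 40]}]')
bad = '[{"label": "person", "bbox": [0, 0, 10]}]'
print(check_syntax(good))
print(check_syntax(bad))
print(label_distribution(json.loads(good)))
```

Only records that pass `check_syntax` would be forwarded to the model-based validation of the next step.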

In step S503, the processor 120 may use a deep learning ensemble artificial intelligence model to validate the training data whose annotation syntax was found to be free of problems by the parsing of the annotation data. Through training that uses the check results of step S502 and a K-fold cross-validation setting, the processor 120 may build a deep learning ensemble artificial intelligence model whose overall average accuracy between source data and annotation data is 95% or higher, and may use this model to validate the training data.
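K-fold cross-validation against the 95% average-accuracy target can be illustrated with plain Python. This is a minimal sketch under assumptions: the fold splitting is contiguous and unshuffled, `train_majority` is a toy stand-in for real model training, and accuracy is a simple label match rather than whatever metric the embodiment actually uses.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validate(samples, train_fn, k=5):
    """Average validation accuracy over k folds; samples are (source, label) pairs."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx])
        correct = sum(model(samples[i][0]) == samples[i][1] for i in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)


def train_majority(train_samples):
    # Toy "model": always predicts the most common label seen during training.
    majority = Counter(label for _, label in train_samples).most_common(1)[0][0]
    return lambda source: majority


samples = [(i, "person") for i in range(20)]
average_accuracy = cross_validate(samples, train_majority, k=5)
meets_target = average_accuracy >= 0.95  # 95% threshold taken from the description
print(average_accuracy, meets_target)
```

Each sample is used for validation exactly once, so the averaged score reflects the whole data set rather than a single held-out split.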

The processor 120 may visualize and provide not only the results of the syntactic correctness and data diversity checks on the training data but also the validation results for the training data.

In step S504, the processor 120 may automate annotation of objects recognized in the source data by using the deep learning ensemble artificial intelligence model trained with training data consisting of pairs of source data and annotation data.
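One way an ensemble could turn several models' detections into a single annotation is majority voting over overlapping boxes. The patent does not state how the ensemble members are combined, so the scheme below (group by label and IoU, keep groups supported by more than half of the models, average the coordinates) is an assumption for illustration, as are the names `iou` and `ensemble_annotate` and the 0.5 IoU threshold.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ensemble_annotate(predictions, iou_threshold=0.5):
    """Merge per-model detections; predictions holds one list of (label, box) per model."""
    clusters = []
    for model_id, detections in enumerate(predictions):
        for label, box in detections:
            for cluster in clusters:
                # Join a cluster with the same label that this model has not voted in yet.
                if (cluster["label"] == label
                        and model_id not in cluster["models"]
                        and iou(cluster["boxes"][0], box) >= iou_threshold):
                    cluster["boxes"].append(box)
                    cluster["models"].add(model_id)
                    break
            else:
                clusters.append({"label": label, "boxes": [box], "models": {model_id}})

    votes_needed = len(predictions) // 2 + 1  # strict majority of the models
    annotations = []
    for cluster in clusters:
        if len(cluster["models"]) >= votes_needed:
            count = len(cluster["boxes"])
            averaged = tuple(sum(b[i] for b in cluster["boxes"]) / count for i in range(4))
            annotations.append((cluster["label"], averaged))
    return annotations


predictions = [
    [("person", (0, 0, 10, 10))],          # model 0
    [("person", (1, 1, 11, 11))],          # model 1
    [("forklift", (50, 50, 60, 60))],      # model 2, unsupported by the others
]
result = ensemble_annotate(predictions)
print(result)
```

Requiring a majority suppresses detections that only one member produces, at the cost of missing objects that a single model finds correctly.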

The processor 120 may visualize and provide the automatic annotation results, and may store the source data and annotation data in storage (local or remote) as the annotation result.

As described above, according to embodiments of the present invention, automating quality inspection and annotation of training data with a deep learning ensemble model can dramatically reduce the proportion of manual work and lower the cost of generating training data.

The device described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those of ordinary skill in the art will appreciate that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. Here, the medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may also be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A training data management method executed on a computer device,
wherein the computer device includes at least one processor configured to execute computer-readable instructions contained in a memory,
the training data management method comprising:
receiving, by the at least one processor, training data consisting of source data and annotation data for the source data;
performing, by the at least one processor, a syntax correctness check through parsing of the annotation data; and
performing, by the at least one processor, validation through a deep learning ensemble artificial intelligence model on training data found to be free of syntax errors by the parsing.

The training data management method of claim 1,
wherein performing the syntax correctness check comprises:
parsing annotations of the corresponding type through a type loader in which a dataset for object detection is set as an annotation definition.

The training data management method of claim 1,
wherein the deep learning ensemble artificial intelligence model is a model trained on pairs of the source data and the annotation data through ensemble learning using K-fold cross-validation.

The training data management method of claim 1,
further comprising:
annotating, by the at least one processor, an object recognized in the source data through the deep learning ensemble artificial intelligence model.

The training data management method of claim 4,
wherein the annotating comprises:
generating a bounding-box annotation for an object recognized in the source data; and
transferring the generated annotation together with the source data to local storage or remote storage.