KR20230146320A

KR20230146320A - System for labeling and method thereof

Info

Publication number: KR20230146320A
Application number: KR1020220045175A
Authority: KR
Inventors: 신승호
Original assignee: 삼성에스디에스 주식회사
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2023-10-19

Abstract

A labeling system and a method thereof are provided. A labeling system according to some embodiments of the present disclosure may include a model management unit that registers and manages a model set, a ruleset management unit that registers and manages a ruleset, and a labeling unit that provides a labeling function for an unlabeled dataset. The labeling unit may automatically generate labels for the unlabeled dataset using at least one model of the pre-registered model set or at least one rule of the pre-registered ruleset, so that the human and time costs required for labeling can be greatly reduced.

Description

Labeling system and method {SYSTEM FOR LABELING AND METHOD THEREOF}

The present disclosure relates to a labeling system and a method thereof and, more specifically, to a system capable of reducing the human and time costs required for labeling work and to a method performed in that system.

Supervised learning is a machine learning technique that builds a model for performing a target task by learning from a labeled dataset. Therefore, to perform supervised learning on an unlabeled dataset, labeling must be carried out first.

Labeling refers to the task of tagging each sample (individual data item) with a label in order to create a labeled dataset. Because labeling is generally performed manually by annotators, generating a large labeled dataset incurs considerable human and time costs.

Korean Patent Application Publication No. 10-2021-0120489 (published October 7, 2021)

A technical problem to be solved through some embodiments of the present disclosure is to provide a labeling system capable of reducing the human and time costs required for labeling work, and a method performed in the system.

Another technical problem to be solved through some embodiments of the present disclosure is to provide a labeling system capable of ensuring high-quality labeling results, and a method performed in the system.

Yet another technical problem to be solved through some embodiments of the present disclosure is to provide a labeling system capable of providing an integrated labeling function for the various labels of a text dataset, and a method performed in the system.

The technical problems of the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

To solve the above technical problems, a labeling system according to some embodiments of the present disclosure includes a model management unit that registers and manages a model set, a ruleset management unit that registers and manages a ruleset, and a labeling unit that provides a labeling function for an unlabeled dataset. The labeling unit may include a first label generator that generates a first label for the unlabeled dataset using at least one model of the pre-registered model set, and a second label generator that generates a second label for the unlabeled dataset using at least one rule of the pre-registered ruleset.
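The composition described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the disclosed implementation: it assumes that a model and a rule can each be reduced to a plain callable from text to label, and the names `LabelingUnit`, `first_labels`, `second_labels`, `toy_sentiment`, and `ends_with_question_mark` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a model or a rule is simplified to a callable mapping text to a label.
Labeler = Callable[[str], str]


@dataclass
class LabelingUnit:
    models: Dict[str, Labeler] = field(default_factory=dict)  # pre-registered model set
    rules: Dict[str, Labeler] = field(default_factory=dict)   # pre-registered ruleset

    def first_labels(self, model_name: str, dataset: List[str]) -> List[str]:
        """Role of the first label generator: label each sample with a registered model."""
        model = self.models[model_name]
        return [model(sample) for sample in dataset]

    def second_labels(self, rule_name: str, dataset: List[str]) -> List[str]:
        """Role of the second label generator: label each sample with a registered rule."""
        rule = self.rules[rule_name]
        return [rule(sample) for sample in dataset]


unit = LabelingUnit()
# Toy stand-ins for a trained model and a hand-written rule.
unit.models["toy_sentiment"] = lambda text: "positive" if "good" in text else "negative"
unit.rules["ends_with_question_mark"] = (
    lambda text: "question" if text.endswith("?") else "statement"
)

dataset = ["good service", "is it open?"]
model_labels = unit.first_labels("toy_sentiment", dataset)
rule_labels = unit.second_labels("ends_with_question_mark", dataset)
```

Keeping the model set and the ruleset as two separate registries mirrors the separation between the model management unit and the ruleset management unit; a real system would store trained model artifacts rather than lambdas.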

In one embodiment, the pre-registered model set includes a user model obtained from a user, and the first label generator may generate the first label using the user model in response to a request from the user.

In one embodiment, the pre-registered model set includes a system model provided by the labeling system, and the first label generator may generate the first label using the system model in response to a request from a user.

In one embodiment, the model set includes a natural language processing model, and the unlabeled dataset may be a text dataset.

In one embodiment, the first label may include a word-level label, a sentence-level label, and a document-level label.
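One way to picture the three label levels is as nested records. The sketch below is an assumption made for illustration only: the class names, the field names (`intent`, `sentiment`, `topic`), and the use of character offsets for word-level spans are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordLabel:
    start: int  # character offset where the labeled span begins
    end: int    # character offset just past the labeled span
    tag: str    # e.g., a named-entity tag


@dataclass
class SentenceLabel:
    text: str
    intent: str
    sentiment: str
    words: List[WordLabel] = field(default_factory=list)


@dataclass
class DocumentLabel:
    topic: str
    sentences: List[SentenceLabel] = field(default_factory=list)


text = "Book a flight to Seoul"
start = text.index("Seoul")
sentence = SentenceLabel(
    text=text,
    intent="book_flight",
    sentiment="neutral",
    words=[WordLabel(start, start + len("Seoul"), "LOCATION")],
)
document = DocumentLabel(topic="travel", sentences=[sentence])
```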

In one embodiment, the labeling unit provides the labeling function to a worker, and the labeling system may further include a task management unit that, in response to a request from the worker, creates an inspection task for the first label or the second label and assigns the created inspection task to an inspector.

To solve the above technical problems, a labeling method according to some embodiments of the present disclosure is a labeling method performed by at least one computing device and may include: registering a user model obtained from a user; obtaining an unlabeled dataset from the user; generating labels for the unlabeled dataset using the registered user model; creating an inspection task for the generated labels; and assigning the created inspection task to an inspector.
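The five steps of the method can be sketched end to end as follows. The sketch is hedged: the class `LabelingService`, its method names, the toy `intent_model`, and in particular the round-robin assignment of inspection tasks are assumptions made for illustration; the disclosure does not state how inspectors are chosen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a user model is simplified to a callable mapping text to a label.
Model = Callable[[str], str]


@dataclass
class InspectionTask:
    sample: str
    label: str
    inspector: str = ""  # empty until the task is assigned


class LabelingService:
    def __init__(self) -> None:
        self.user_models: Dict[str, Model] = {}

    def register_user_model(self, name: str, model: Model) -> None:
        """Step 1: register a model obtained from the user."""
        self.user_models[name] = model

    def generate_labels(self, name: str, dataset: List[str]) -> List[str]:
        """Step 3: label the unlabeled dataset with the registered user model."""
        model = self.user_models[name]
        return [model(sample) for sample in dataset]

    def create_inspection_tasks(
        self, dataset: List[str], labels: List[str]
    ) -> List[InspectionTask]:
        """Step 4: create one inspection task per generated label."""
        return [InspectionTask(s, l) for s, l in zip(dataset, labels)]

    def assign(self, tasks: List[InspectionTask], inspectors: List[str]) -> None:
        """Step 5: assign tasks to inspectors (round-robin is an assumption)."""
        for index, task in enumerate(tasks):
            task.inspector = inspectors[index % len(inspectors)]


service = LabelingService()
service.register_user_model(
    "intent_model",
    lambda text: "greeting" if text.lower().startswith("hello") else "other",
)
dataset = ["Hello there", "Cancel my order", "hello again"]  # step 2: obtained from the user
labels = service.generate_labels("intent_model", dataset)
tasks = service.create_inspection_tasks(dataset, labels)
service.assign(tasks, ["inspector_a", "inspector_b"])
```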

To solve the above technical problems, a computer program according to some embodiments of the present disclosure may be stored on a computer-readable recording medium in order to execute, in combination with a computing device, the steps of: registering a user model obtained from a user; obtaining an unlabeled dataset from the user; generating labels for the unlabeled dataset using the registered user model; creating an inspection task for the generated labels; and assigning the created inspection task to an inspector.

According to some embodiments of the present disclosure, a labeling system having an automatic labeling function based on a model set and a ruleset may be provided. Accordingly, workers such as users and annotators can perform labeling work quickly using the automatic labels generated by the labeling system, and as a result the human and time costs required to label an unlabeled dataset can be greatly reduced.

In addition, by performing an inspection process on the labeling results, the quality of the labeling results can be ensured.

In addition, by providing registration of user models and a labeling function that uses them, the convenience of users of the labeling system (e.g., individual customers, corporate customers) can be greatly improved. For example, users can perform labeling work using the provided labeling system and their own models without having to build a system themselves.

In addition, an integrated labeling function may be provided for text datasets. For example, a function for generating (tagging) word-level labels (e.g., named entities), sentence-level labels (e.g., the intent, type, or sentiment of a sentence), and/or document-level labels (e.g., the topic or sentiment of a document) may be provided for a text dataset. Accordingly, labeled datasets for various natural language processing tasks can be created easily, which ultimately makes it easier to develop text-based machine learning applications such as chatbots.

The effects according to the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is an exemplary diagram schematically illustrating a labeling system according to some embodiments of the present disclosure.
FIG. 2 is an exemplary block diagram showing the configuration of a labeling system according to some embodiments of the present disclosure.
FIG. 3 is an exemplary flowchart showing a model registration process according to an embodiment of the present disclosure.
FIG. 4 illustrates a model registration screen that may be referenced in some embodiments of the present disclosure.
FIG. 5 is an exemplary flowchart showing a labeling process according to an embodiment of the present disclosure.
FIG. 6 illustrates a labeling task screen that may be referenced in some embodiments of the present disclosure.
FIG. 7 is an exemplary flowchart showing a labeling process according to another embodiment of the present disclosure.
FIG. 8 is an exemplary flowchart showing an inspection process according to an embodiment of the present disclosure.
FIG. 9 illustrates an inspection task list screen that may be referenced in some embodiments of the present disclosure.
FIG. 10 illustrates an inspection task screen that may be referenced in some embodiments of the present disclosure.
FIG. 11 is an exemplary flowchart showing a model recommendation method according to an embodiment of the present disclosure.
FIG. 12 is an exemplary diagram for explaining an additional model training method according to an embodiment of the present disclosure.
FIG. 13 illustrates an exemplary computing device capable of implementing a labeling system according to some embodiments of the present disclosure.

Hereinafter, various embodiments of the present disclosure are described in detail with reference to the accompanying drawings. The advantages and features of the present disclosure, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the technical idea of the present disclosure complete and to fully inform those of ordinary skill in the art to which the present disclosure belongs of its scope, and the technical idea of the present disclosure is defined only by the scope of the claims.

In describing the various embodiments of the present disclosure, a detailed description of a related known configuration or function is omitted where it is judged that it could obscure the gist of the present disclosure.

Unless otherwise defined, terms used in the following embodiments (including technical and scientific terms) may be used with the meanings commonly understood by those of ordinary skill in the art to which the present disclosure belongs, although these meanings may vary depending on the intentions of practitioners in the relevant field, precedents, the emergence of new technologies, and the like. The terms used in the present disclosure are intended to describe the embodiments and not to limit the scope of the present disclosure.

Singular expressions used in the following embodiments include the plural unless the context clearly specifies the singular. Likewise, plural expressions include the singular unless the context clearly specifies the plural.

In addition, terms such as first, second, A, B, (a), and (b) used in the following embodiments serve only to distinguish one component from another, and do not limit the nature, sequence, or order of the components concerned.

First, some terms that may be used in various embodiments of the present disclosure are clarified.

In the following embodiments, a 'sample' may mean one or more individual data items constituting a dataset. In the art, 'sample' may be used interchangeably with terms such as example, instance, and observation.

In the following embodiments, 'labeling' may mean the task of tagging each sample (data item) of an unlabeled dataset with a label, or of generating such a label. In the art, 'labeling' may be used interchangeably with terms such as annotation.

In the following embodiments, a 'label' may mean the information (e.g., the correct answer for a target task) tagged to each sample (data item) of an unlabeled dataset. For example, when a training dataset for a classification model is built from an unlabeled dataset, the label may be the class information of the sample. However, the scope of the present disclosure is not limited thereto. In the art, 'label' may be used interchangeably with terms such as annotation (and, in Korean, with the alternative transliteration '라벨').

In the following embodiments, a 'system model' may mean a model provided by the labeling system. For example, a system model may be a model registered by an administrator of the labeling system (e.g., a general-purpose model), but the scope of the present disclosure is not limited thereto.

In the following embodiments, a 'user model' may mean a model obtained from a user (e.g., an individual customer, a corporate customer). For example, a user model may be a custom model trained on a dataset from the user's own domain (e.g., a domain-specific model), but the scope of the present disclosure is not limited thereto.

Hereinafter, various embodiments of the present disclosure are described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary diagram schematically illustrating a labeling system 10 according to some embodiments of the present disclosure.

As shown in FIG. 1, the labeling system 10 may be a system that provides a labeling function (or service) to a user, or a system that performs labeling on an unlabeled dataset 12. For example, the labeling system 10 may obtain the unlabeled dataset 12 from a user terminal 20, convert the unlabeled dataset 12 into a labeled dataset 13, and provide the labeled dataset 13 to the user terminal 20.

The unlabeled dataset 12 may consist of various types of data, such as text or images. Accordingly, the scope of the present disclosure is not limited to a particular type of data. For ease of understanding, however, the description below assumes that the unlabeled dataset 12 is a text dataset.

The labeling system 10 may provide various functions (or services) related to labeling.

For example, the labeling system 10 may provide registration and management functions for the model set (i.e., a set of trained models) used for labeling. This example is described in more detail with reference to FIGS. 3 and 4.

As another example, the labeling system 10 may provide an automatic labeling function to the user. For instance, the labeling system 10 may automatically generate labels for the unlabeled dataset 12 using a pre-registered model set and/or ruleset. This example is described in more detail with reference to FIGS. 5 and 6.

In addition, the labeling system 10 may further provide a manual labeling function, a management function for labeling tasks, an inspection function for labeling tasks, and the like.

The various functions provided by the labeling system 10, and the configuration and operation of the labeling system 10, are described in more detail with reference to FIG. 2 and the subsequent drawings.

The labeling system 10 may be implemented with at least one computing device. For example, all functions of the labeling system 10 may be implemented on a single computing device, or a first function of the labeling system 10 may be implemented on a first computing device and a second function on a second computing device. Alternatively, a particular function of the labeling system 10 may be implemented across a plurality of computing devices.

A computing device may be any of various kinds of devices having a computing function; see FIG. 13 for an example of such a device.

The user terminal 20 may be a terminal on the side of a user who receives various functions (or services) from the labeling system 10. The user may be, for example, an individual customer or a corporate customer, but the scope of the present disclosure is not limited thereto. The user terminal 20 may be implemented as any kind of device. In the following description, 'user' may refer to the 'user terminal 20'. Similarly, 'annotator', 'inspector', and 'administrator' may refer to the 'annotator terminal (30 in FIG. 2)', the 'inspector terminal (40 in FIG. 2)', and the 'administrator terminal (50 in FIG. 2)', respectively.

As shown, the user terminal 20 and the labeling system 10 may communicate over a network. Here, the network may be implemented as any kind of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, or WiBro (Wireless Broadband Internet).

The labeling system 10 according to some embodiments of the present disclosure has so far been described schematically with reference to FIG. 1. The configuration and operation of the labeling system 10 are described in more detail below with reference to FIG. 2 and the subsequent drawings.

FIG. 2 is an exemplary block diagram showing the configuration of the labeling system 10 according to some embodiments of the present disclosure.

As shown in FIG. 2, the labeling system 10 may include a user interface unit 21, a management unit 22, a labeling unit 23, and a storage 24. Note that FIG. 2 shows only the components relevant to the embodiments of the present disclosure. Those of ordinary skill in the art to which the present disclosure belongs will therefore understand that other general-purpose components (e.g., a processor, memory, a communication interface) may be included in addition to those shown in FIG. 2. Furthermore, the components of the labeling system 10 shown in FIG. 2 represent functionally distinct elements, and a plurality of components may be integrated with one another in an actual physical environment. For example, the model management unit 25 and the ruleset management unit 26 may be implemented as different pieces of logic within the same physical device. Alternatively, a particular functional element may be implemented as a plurality of separate sub-functional elements. Each component is described in detail below.

The user interface unit 21 may provide various user interfaces (e.g., a web interface, a GUI) that make the labeling system 10 easier to use, and the labeling system 10 may exchange various data with the terminals 20 to 50 through the user interface unit 21. For example, the user interface unit 21 may provide a web-based user interface for conveniently using functions such as model management, task management, ruleset management, labeling, and inspection of labeling results. The user interface may be designed in any manner.

In some cases, the user interfaces exemplified above may be provided by other components (e.g., 22, 23). For example, the user interface for model management may be provided by the model management unit 25, and the user interface for ruleset management may be provided by the ruleset management unit 26.

In some cases, the user interface unit 21 may also be implemented on the side of the terminals 20 to 50 that use the labeling system 10. For example, the user interface unit 21 may be implemented in the form of an app on the terminals 20 to 50.

Next, the management unit 22 may provide various management functions for the labeling system 10. For example, the management unit 22 may provide functions such as model management, ruleset management, and task management.

As shown, the management unit 22 according to the embodiments may include a model management unit 25, a ruleset management unit 26, and a task management unit 27.

The model management unit 25 may provide management functions for the model set used for labeling. For example, the model management unit 25 may perform management functions such as registering, modifying, and deleting models. The model management unit 25 may also store registered models in a model storage 29-1 and may provide a training function for the models.
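The register, modify, and delete operations can be sketched as a small registry. This is a minimal sketch under stated assumptions: an in-memory dictionary stands in for the model storage, and the record fields (`kind`, `task`, `uri`) and the class and method names are hypothetical rather than taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelRecord:
    name: str
    kind: str  # "system" or "user" (assumed encoding of the model type)
    task: str  # e.g., "ner" or "sentence_intent"
    uri: str   # hypothetical location of the stored model artifact


class ModelManager:
    def __init__(self) -> None:
        # Assumption: a dict stands in for the model storage.
        self._store: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        if record.name in self._store:
            raise ValueError(f"model already registered: {record.name}")
        self._store[record.name] = record

    def update(self, name: str, **changes: str) -> ModelRecord:
        record = self._store[name]
        for key, value in changes.items():
            if not hasattr(record, key):
                raise AttributeError(f"unknown field: {key}")
            setattr(record, key, value)
        return record

    def delete(self, name: str) -> None:
        del self._store[name]

    def list_names(self) -> List[str]:
        return sorted(self._store)


manager = ModelManager()
manager.register(ModelRecord("general_ner", "system", "ner", "models/general_ner"))
manager.register(ModelRecord("bank_intent", "user", "sentence_intent", "models/bank_intent"))
manager.update("bank_intent", uri="models/bank_intent_v2")
manager.delete("general_ner")
```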

The model set managed by the model management unit 25 may include models of various types and/or for various tasks.

For example, the model set may include system models, user models, custom models, and the like. As described above, a system model is a model provided by the labeling system 10, for example a model registered by an administrator, and a user model is a model provided by a user. A custom model is a model customized for a particular user and may be either a system model or a user model. For example, the model management unit 25 may receive a custom model from a user, or may provide the user with a function for customizing a system model. As a more specific example, the model management unit 25 may obtain from the user a labeled dataset in the same domain as the unlabeled dataset and fine-tune a system model on that labeled dataset, thereby generating a custom model for that user. The labeling unit 23 may then, in response to a request from the user, generate labels using the generated custom model.

As another example, the model set may include models that perform various tasks, such as image classification, semantic segmentation, and natural language processing (NLP). As a more specific example, the model set may include models that perform natural language processing tasks at the word level, sentence level, and/or document level (i.e., natural language processing models). An example of a word-level task is named entity recognition (NER), but the scope of the present disclosure is not limited thereto. Sentence-level tasks may include, for example, intent classification, type classification, and sentiment classification (analysis) of a sentence, and document-level tasks may include, for example, topic (or category) classification and sentiment analysis. However, the scope of the present disclosure is not limited by these examples.

In one embodiment, the model management unit 25 may perform additional training (or fine-tuning) on a registered model. This embodiment is described in detail later with reference to FIG. 12.

For further details on the model management unit 25, refer to the descriptions of FIGS. 3 and 4.

Next, the ruleset management unit 26 may provide management functions for the rulesets used for labeling. For example, the ruleset management unit 26 may perform management functions such as registering, modifying, and deleting rulesets. The ruleset management unit 26 may also store registered rulesets in the ruleset storage 29-2.

The rulesets managed by the ruleset management unit 26 may include various kinds of rules that enable rule-based (or dictionary-based) labeling, for example for named entity recognition.

Next, the task management unit 27 may perform management functions for labeling tasks. Specifically, the task management unit 27 may create a labeling task in response to a request from a user, and may create and assign an inspection task for a labeling result (i.e., the output of a labeling task) in response to a request from the worker (e.g., a user or an annotator) performing the labeling task. The task management unit 27 may also manage the status of labeling tasks.

For further details on the task management unit 27, refer to the descriptions of FIGS. 5 to 8 and the like.

Meanwhile, although not shown in FIG. 2, the management unit 22 may further include a member management unit (not shown). The member management unit (not shown) may provide functions such as member information management, account management, and login. Member information may include, for example, login (account) information (e.g., ID and password), member type (e.g., user, annotator, inspector, or administrator), and the work history of a worker (e.g., an annotator or a user), but the scope of the present disclosure is not limited thereto. Through the login function, the member management unit (not shown) may restrict the functions of the labeling system 10 to registered members.

Next, the labeling unit 23 may provide a labeling function for a given dataset (e.g., an unlabeled dataset). For example, the labeling unit 23 may provide users, annotators, inspectors, and the like with a model-based automatic labeling function and a ruleset-based automatic labeling function. The labeling unit 23 may also provide a manual labeling function.

As shown, the labeling unit 23 according to embodiments may include a first label generation unit 28-1 and a second label generation unit 28-2.

The first label generation unit 28-1 may generate automatic labels for a dataset using at least one model from the registered model set. For example, the first label generation unit 28-1 may generate a named entity label for a given word using a named entity recognition model, and may generate a sentiment label for a given sentence (or document) using a sentiment analysis model. Alternatively, the first label generation unit 28-1 may generate a plurality of labels for a given sample (data) using a plurality of models.
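The case in which several models each contribute a label for the same sample can be sketched as follows. This is only an illustration under stated assumptions: the two keyword-based functions are toy stand-ins for registered models, and the names `generate_labels`, `toy_sentiment_model`, and `toy_intent_model`, as well as the record layout, are hypothetical and do not appear in the disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable returning (label value, confidence score).
Model = Callable[[str], Tuple[str, float]]


def toy_sentiment_model(text: str) -> Tuple[str, float]:
    """Keyword-based stand-in for a trained sentiment analysis model."""
    words = set(text.lower().split())
    if words & {"great", "good", "love"}:
        return ("positive", 0.9)
    if words & {"bad", "terrible", "hate"}:
        return ("negative", 0.9)
    return ("neutral", 0.5)


def toy_intent_model(text: str) -> Tuple[str, float]:
    """Punctuation-based stand-in for a trained intent classification model."""
    if text.strip().endswith("?"):
        return ("question", 0.8)
    return ("statement", 0.6)


def generate_labels(sample: str, models: Dict[str, Model]) -> List[dict]:
    """Run every selected model on the sample and collect one label per model."""
    labels = []
    for name, model in models.items():
        value, confidence = model(sample)
        labels.append({"model": name, "label": value, "confidence": confidence})
    return labels


if __name__ == "__main__":
    selected = {"sentiment": toy_sentiment_model, "intent": toy_intent_model}
    for label in generate_labels("I love this tower", selected):
        print(label)
```

Keeping the producing model's name next to each label is a design choice made for this sketch; it would let a work screen later distinguish labels by their origin.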

The second label generation unit 28-2 may generate automatic labels for a dataset using at least one rule from the registered rulesets. For example, the second label generation unit 28-2 may generate a named entity label for a given word using at least one rule relating to named entities.
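A dictionary-based rule of the kind mentioned above might look like the following minimal sketch. It assumes a rule is simply a mapping from a surface string to an entity label; the `Label` record and the function `label_with_dictionary` are hypothetical names introduced for illustration, and overlapping matches are not resolved.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Label:
    start: int   # character offset where the labeled span begins
    end: int     # character offset one past the end of the span
    text: str    # the matched surface string
    value: str   # the entity label assigned by the rule
    source: str  # origin of the label; always "rule" in this sketch


def label_with_dictionary(text: str, entity_dict: Dict[str, str]) -> List[Label]:
    """Tag every exact occurrence of a dictionary term with its entity label."""
    labels = []
    for term, value in entity_dict.items():
        # re.escape keeps punctuation inside a term from being read as regex syntax.
        for match in re.finditer(re.escape(term), text):
            labels.append(Label(match.start(), match.end(), match.group(), value, "rule"))
    return sorted(labels, key=lambda label: label.start)


if __name__ == "__main__":
    rules = {"Namsan Tower": "LOCATION", "Seoul": "CITY"}
    for label in label_with_dictionary("We visited Namsan Tower in Seoul.", rules):
        print(label)
```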

For further details on the first label generation unit 28-1 and the second label generation unit 28-2, refer to the descriptions of FIGS. 5 to 8.

Meanwhile, although not shown in FIG. 2, the labeling unit 23 may further include a labeling environment setting unit (not shown). The labeling environment setting unit (not shown) may provide various setting functions for the labeling environment. For example, it may provide a function for setting automatic labels and manual labels to be displayed differently (e.g., in a different color, size, or color intensity) on the labeling work screen (see FIG. 6). Alternatively, it may provide a function for setting model-based automatic labels and ruleset-based automatic labels to be displayed differently (e.g., in a different color, size, or color intensity) on the labeling work screen (see FIG. 6). Alternatively, it may provide a function for setting labels to be displayed differently (e.g., in a different color, size, or color intensity) depending on the type of label on the labeling work screen (see FIG. 6). Alternatively, it may provide a function for setting labels to be displayed differently (e.g., in a different color, size, or color intensity) depending on the confidence score of the label on the labeling work screen (see FIG. 6).
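One way such display settings could be expressed is sketched below: the label's origin selects a base color and its confidence score selects the color intensity. The color codes, the `min_alpha` floor, and the names `DisplayLabel` and `display_style` are all assumptions made for this example; the disclosure does not specify any particular mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayLabel:
    value: str               # label value, e.g. an entity type
    source: str              # "manual", "model", or "rule"
    confidence: float = 1.0  # confidence score in [0.0, 1.0]


# Assumed color scheme: one base color per label origin.
SOURCE_COLORS = {"manual": "#1f77b4", "model": "#2ca02c", "rule": "#ff7f0e"}
FALLBACK_COLOR = "#7f7f7f"


def display_style(label: DisplayLabel, min_alpha: int = 64) -> dict:
    """Map a label to a color (by origin) and an alpha value (by confidence)."""
    if not 0.0 <= label.confidence <= 1.0:
        raise ValueError("confidence must lie in [0.0, 1.0]")
    color = SOURCE_COLORS.get(label.source, FALLBACK_COLOR)
    # Higher confidence gives a more opaque (darker) rendering.
    alpha = round(min_alpha + (255 - min_alpha) * label.confidence)
    return {"color": color, "alpha": alpha}


if __name__ == "__main__":
    print(display_style(DisplayLabel("LOCATION", "model", 0.9)))
    print(display_style(DisplayLabel("LOCATION", "rule", 0.4)))
```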

Next, the storage 24 may store various data and information used in the labeling system 10. The storage 24 may be implemented inside the labeling system 10 or outside it (e.g., as cloud storage). Alternatively, part of the storage 24 may be implemented inside the labeling system 10 and the remainder outside it.

As shown, the storage 24 according to embodiments may include a model storage 29-1, a ruleset storage 29-2, and an other storage 29-3.

The model storage 29-1 may store the various model sets that can be used for labeling, the ruleset storage 29-2 may store various rulesets, and the other storage 29-3 may store various other data, such as member information and labeled datasets.

So far, an exemplary configuration of the labeling system 10 has been described with reference to FIG. 2. According to the above, a labeling system 10 equipped with automatic labeling functions based on model sets and rulesets can be provided. Accordingly, workers such as users and annotators can perform labeling tasks quickly using the automatic labels generated by the labeling system 10, and as a result the human cost and time cost of labeling an unlabeled dataset can be greatly reduced.

In addition, by providing functions for registering user models and for labeling with those user models, the convenience of users of the labeling system 10 (e.g., individual customers and corporate customers) can be greatly improved. For example, users can perform labeling tasks using the provided labeling system 10 together with their own models, without having to build a system themselves.

Hereinafter, the operation of the labeling system 10 (or the labeling method performed in the system 10) will be described in detail with reference to FIG. 3 and the subsequent drawings.

The labeling method performed in the labeling system 10 may consist of a combination of various processes (e.g., a model registration process, a labeling process, and an inspection process). Accordingly, for ease of understanding, the labeling method is described process by process.

First, a model registration process according to an embodiment of the present disclosure will be described with reference to FIGS. 3 and 4.

FIG. 3 is an exemplary flowchart illustrating a model registration process according to an embodiment of the present disclosure. However, this is merely a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed. In FIG. 3 and the subsequent drawings, a dotted arrow may denote the delivery of a processing result (e.g., an ACK).

As shown in FIG. 3, the models to be registered may include system models and user models, and the registration process for a system model may begin at step S31, in which the model management unit 25 receives a registration request from the administrator terminal 50. For example, the model management unit 25 may receive a system model (e.g., a trained natural language processing model) and model information together with the registration request.

The information on a system model may include, for example, the model name, the model version, the type of task (label) (e.g., named entity of a word, intent or sentiment of a sentence, topic or sentiment of a document), Docker information (e.g., a Docker identifier and a Python command for running the Docker container), a description of and/or comments on the model, and the label values the model generates (e.g., class names). However, the scope of the present disclosure is not limited thereto.

In step S32, the model management unit 25 may register the system model in response to the request. For example, the model management unit 25 may update the list of registered models and store the requested system model and its model information in the model storage 29-1.
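Steps S31 and S32 can be sketched as a small in-memory registry. This is a hedged illustration only: the field names of `ModelInfo` loosely follow the model information listed above, while the class `ModelRegistry`, the use of (name, version) as a key, and the rejection of duplicates are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelInfo:
    name: str
    version: str
    task_type: str                 # e.g. "word/NER", "sentence/sentiment"
    label_values: Tuple[str, ...]  # class names the model can emit
    description: str = ""
    docker_image: str = ""         # hypothetical field for a Docker identifier
    docker_command: str = ""       # hypothetical field for the run command
    owner: str = "system"          # "system" or "user"


class ModelRegistry:
    """In-memory stand-in for the model management unit and the model storage."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Tuple[ModelInfo, bytes]] = {}

    def register(self, info: ModelInfo, artifact: bytes) -> Tuple[str, str]:
        """Store the model artifact with its information and return its key."""
        key = (info.name, info.version)
        if key in self._store:
            raise ValueError(f"model {key} is already registered")
        self._store[key] = (info, artifact)
        return key

    def list_models(self) -> List[Tuple[str, str]]:
        """Return the registered (name, version) pairs in sorted order."""
        return sorted(self._store)


if __name__ == "__main__":
    registry = ModelRegistry()
    info = ModelInfo("ner-base", "1.0", "word/NER", ("PERSON", "LOCATION"))
    registry.register(info, b"serialized-model-bytes")
    print(registry.list_models())
```

The same shape would serve for user models by setting `owner="user"`, which mirrors the statement that user model registration differs only in where the request originates.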

Next, the registration process for a user model may be performed in the same manner as the system model registration process, except that the registration request is received from the user terminal 20 (see S33 and S34). A description thereof is therefore omitted.

FIG. 4 illustrates a model registration screen 40 that may be referenced in some embodiments of the present disclosure. For example, the screen 40 shown in FIG. 4 may be a screen displayed on the administrator terminal 50 or the user terminal 20 when a model is registered.

As shown in FIG. 4, an administrator and/or a user may register a system model and/or a user model through the model management unit 25 by entering model information such as the model name 41, the version 42, the task type 43, the description 44, and the generated label values 45.

For reference, the ruleset registration process may also be performed by the ruleset management unit 26 in a manner similar to that illustrated in FIG. 3, and the registered rulesets may include system rulesets registered by an administrator and user rulesets obtained from users.

So far, the model registration process according to an embodiment of the present disclosure has been described with reference to FIGS. 3 and 4. Hereinafter, embodiments of the labeling process will be described with reference to FIGS. 5 to 7.

FIG. 5 is an exemplary flowchart illustrating a labeling process according to an embodiment of the present disclosure. However, this is merely a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 5, this embodiment relates to the case in which the user performs the labeling task directly, without requesting an annotator to perform it.

Specifically, this embodiment may begin at step S51, in which the task management unit 27 receives a request to create a labeling task from the user terminal 20. Here, the request to create a labeling task does not necessarily mean an explicit request; it may instead be made implicitly, for example by uploading an unlabeled dataset, selecting a model for labeling, or selecting a type of label.

In step S52, the task management unit 27 may create a labeling task in response to the request.

In steps S53-1 to S53-3, the first label generation unit 28-1 may generate labels for the unlabeled dataset on a model basis in response to a request from the user terminal 20. For example, the first label generation unit 28-1 may generate labels using at least one model (e.g., a system model or a user model) selected by the user from the pre-registered model set. The first label generation unit 28-1 may then provide the label generation result to the user terminal 20.

For ease of understanding, steps S53-1 to S53-3 are further explained with reference to the labeling work screen illustrated in FIG. 6.

FIG. 6 illustrates a labeling work screen 60 that may be referenced in some embodiments of the present disclosure. For example, the screen 60 shown in FIG. 6 may be a screen displayed on the terminal 20, 30 of a worker (e.g., a user or an annotator) during a labeling task. FIG. 6 assumes that the unlabeled dataset is a text dataset.

As shown in FIG. 6, the labeling work screen 60 may include a first area 61 in which the text dataset (or data) and word-level labels 61-1 and 61-3 are displayed, a second area 62 in which sentence-level labels 62-1 and 62-2 are displayed, and a third area 63 in which a document-level label 63-1 is displayed. However, the arrangement of the areas 61 to 63 may vary freely. FIG. 6 illustrates word-level labels 61-1 and 61-3 tagged adjacent to the respective words 61-2 and 61-4 in a similar color.

In addition, the labeling work screen 60 may display an interface (e.g., a button) for requesting automatic labeling, an interface 66 for selecting an unlabeled dataset (e.g., selecting the folder in which the dataset is stored), an interface 65 for requesting inspection, an interface 64 for selecting the type of label, and the like. Through these, the user (or annotator) can perform the labeling task in a user-friendly manner.

On the labeling work screen 60 illustrated in FIG. 6, the worker may also tag manual labels and may tag a comment on each label.

The description now returns to FIG. 5.

In steps S54-1 to S54-3, the second label generation unit 28-2 may generate labels for the unlabeled dataset on a ruleset basis in response to a request from the user terminal 20. For example, the second label generation unit 28-2 may generate labels using at least one rule selected by the user from the pre-registered rulesets. The second label generation unit 28-2 may then provide the label generation result to the user terminal 20.

For reference, FIG. 5 shows, by way of example, steps S53-1 to S53-3 and steps S54-1 to S54-3 being performed sequentially, but the scope of the present disclosure is not limited thereto, and the order of execution may be changed freely.

In step S55, the task management unit 27 may receive a request for task completion or for inspection from the user terminal 20.

In step S56, the task management unit 27 may perform processing according to the request. For example, in response to a task completion request, the task management unit 27 may perform completion processing for the labeling task (e.g., change the status of the task to 'complete'), and in response to an inspection request, it may perform processing for inspecting the labeling task (e.g., change the status of the task to 'under inspection', create an inspection task, and assign it to an inspector). Inspection of labeling tasks is described in detail below with reference to FIGS. 8 to 10.
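The branching in step S56 can be sketched as a small status update. The status names mirror those quoted in the text; the request strings `"complete"` and `"inspect"`, the data classes, and the function `handle_request` are hypothetical and serve only to make the two branches concrete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    IN_PROGRESS = "in progress"
    UNDER_INSPECTION = "under inspection"
    COMPLETE = "complete"


@dataclass
class LabelingTask:
    task_id: str
    status: TaskStatus = TaskStatus.IN_PROGRESS


@dataclass(frozen=True)
class InspectionTask:
    labeling_task_id: str
    inspector: str  # account of the inspector the task is assigned to


def handle_request(task: LabelingTask, request: str,
                   inspector: Optional[str] = None) -> Optional[InspectionTask]:
    """Apply a completion or inspection request to a labeling task."""
    if request == "complete":
        task.status = TaskStatus.COMPLETE
        return None
    if request == "inspect":
        if inspector is None:
            raise ValueError("an inspector is required for an inspection request")
        task.status = TaskStatus.UNDER_INSPECTION
        # Create the inspection task and assign it to the given inspector.
        return InspectionTask(task.task_id, inspector)
    raise ValueError(f"unknown request: {request!r}")


if __name__ == "__main__":
    task = LabelingTask("task-1")
    print(handle_request(task, "inspect", inspector="inspector-a"), task.status)
```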

Hereinafter, a labeling process according to another embodiment of the present disclosure will be described with reference to FIG. 7. For clarity, descriptions that overlap with the preceding embodiments are omitted.

FIG. 7 is an exemplary flowchart illustrating a labeling process according to another embodiment of the present disclosure. However, this is merely a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 7, this embodiment relates to the case in which an annotator performs a labeling task requested (commissioned) by a user.

Specifically, this embodiment may also begin at step S71, in which the task management unit 27 receives a request to create a labeling task from the user terminal 20. As described above, the task management unit 27 may create a labeling task in response to the request (S72).

In step S73, the task management unit 27 may receive a request to perform the labeling task.

In steps S74-1 and S74-2, the task management unit 27 may, in response to the request, assign the labeling task to a specific annotator and notify that annotator's terminal 30 that the task has been assigned. For example, the task management unit 27 may assign the labeling task to a specific annotator's account and send a notification message to that account.

In steps S75 and S76, the first label generation unit 28-1 and the second label generation unit 28-2 may generate labels for the unlabeled dataset in response to a request from the annotator terminal 30. Since this is the same as described above, further description is omitted.

In steps S77 and S78, the task management unit 27 may receive a task completion request or an inspection request from the annotator terminal 30 and perform processing according to the received request. Since this is also the same as described above, further description is omitted.

So far, labeling processes according to some embodiments of the present disclosure have been described with reference to FIGS. 5 to 7. According to the above, performing automatic labeling based on model sets and rulesets can greatly reduce the time cost and human cost of labeling tasks. Moreover, using model sets and rulesets together can also improve the quality of the labeling results.

Hereinafter, an inspection process according to an embodiment of the present disclosure will be described in detail with reference to FIGS. 8 to 10.

FIG. 8 is an exemplary flowchart illustrating an inspection process according to an embodiment of the present disclosure. However, this is merely a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 8, the inspection process according to the embodiment may begin at step S81, in which the task management unit 27 receives an inspection request from the user terminal 20. In another embodiment, however, the task management unit 27 may receive the inspection request from the annotator terminal 30.

In steps S82 and S83, the task management unit 27 may create an inspection task in response to the inspection request, assign the created inspection task to a specific inspector, and notify that inspector that the task has been assigned. Here, creating an inspection task may mean, for example, changing the status of the labeling task performed by the user to 'under inspection'. Assigning a task to an inspector may mean, for example, assigning the inspection task to that inspector's account, and notifying the inspector (or the inspector's terminal 40) of the assignment may mean sending a notification message about the assignment to the inspector's account.

In steps S84 and S85, the task management unit 27 may receive an inspection result from the inspector terminal 40 and perform processing according to that result. For example, if the inspection result is 'confirm', the task management unit 27 may perform completion processing for the labeling task (e.g., change the status of the task to 'complete'). Conversely, if the inspection result is 'reject', the task management unit 27 may request the user (or annotator) to perform the labeling task again.
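The handling of the two inspection results in steps S84 and S85 might be sketched as follows. The 'confirm' and 'reject' values and the 'complete' and 'under inspection' statuses come from the text; the status name 'rework requested', the dictionary layout, and the function `apply_inspection_result` are assumptions introduced for this example.

```python
from typing import Dict


def apply_inspection_result(task: Dict[str, str], result: str) -> Dict[str, str]:
    """Return an updated copy of the task according to the inspection result."""
    if task.get("status") != "under inspection":
        raise ValueError("only tasks under inspection can receive a result")
    if result == "confirm":
        # Confirmed results close the labeling task.
        return {**task, "status": "complete"}
    if result == "reject":
        # Rejected results go back to the original worker for relabeling.
        # "rework requested" is a hypothetical status name for this sketch.
        return {**task, "status": "rework requested", "assignee": task["worker"]}
    raise ValueError(f"unknown inspection result: {result!r}")


if __name__ == "__main__":
    task = {"task_id": "task-1", "worker": "annotator-a", "status": "under inspection"}
    print(apply_inspection_result(task, "confirm"))
    print(apply_inspection_result(task, "reject"))
```

Returning a copy rather than mutating the input is a choice made here so that the task's state before inspection stays available, for example for a work history.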

For ease of understanding, the way in which inspection tasks are processed is further explained with reference to FIGS. 9 and 10.

FIG. 9 illustrates an inspection task list screen 90 that may be referenced in some embodiments of the present disclosure. For example, the screen 90 shown in FIG. 9 may be a screen displayed on the inspector's terminal 40.

As shown in FIG. 9, the inspection task list screen 90 may include a first area 91 in which the processing status of inspection tasks is displayed and a second area 92 in which a list of the inspection tasks 93, 94 assigned to the inspector is displayed. The inspector may select an inspection task in the list (e.g., 93 or 94) and inspect the output of the corresponding labeling task.

FIG. 10 illustrates an inspection work screen 100 that may be referenced in some embodiments of the present disclosure. For example, the screen 100 shown in FIG. 10 may be a work screen displayed on the inspector's terminal 40, and may be displayed in response to an input selecting a specific inspection task (e.g., 93) on the inspection task list screen 90. FIG. 10 assumes that the unlabeled dataset is a text dataset.

As shown in FIG. 10, the inspection work screen 100 may include a first area 101 in which the text dataset (or data) and word-level labels (101-1, etc.) are displayed, a second area 102 in which a sentence-level label 102-1 is displayed, a third area 103 in which a document-level label 103-1 is displayed, and a fourth area 104 in which an interface for entering (selecting) the inspection result is displayed. However, the scope of the present disclosure is not limited thereto, and the inspection work screen 100 may be designed in many other ways.

The inspector can perform the inspection quickly by checking the labels (101-1 to 101-5, 102-1, 103-1) displayed in the areas 101 to 103, and can deliver the inspection result to the task management unit 27 through the interface in the fourth area 104.

In one embodiment, a plurality of candidate labels 101-3 generated by a plurality of models may be displayed for the same data 101-6 (e.g., the word 'Namsan Tower'). For example, the candidate labels 101-3 may be displayed on the inspection work screen 100 in response to a predetermined selection input (e.g., an input selecting the original label 101-2). However, the scope of the present disclosure is not limited thereto. The candidate labels 101-3 may be displayed differently (e.g., in different colors) depending on the value of the label. For example, since the first label 101-4 and the second label 101-5 have different values (e.g., the entity names differ, 'place name' versus 'location'), they may be displayed in different colors. Alternatively, the candidate labels 101-3 may be displayed with different color intensities depending on the confidence score (e.g., the higher the confidence score, the darker the color). In this case, the inspector can perform the inspection quickly and accurately by referring to the candidate labels 101-3. For example, if the correct answer is among the candidate labels 101-3, the inspector can quickly perform the inspection (correction) simply by selecting the correct label.
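Collecting candidate labels from several models for the same span could be sketched as below. The sketch makes two assumptions not stated in the disclosure: duplicate label values are merged by keeping the highest confidence score, and candidates are listed in descending order of confidence. The model names and the function `candidate_labels` are hypothetical.

```python
from typing import Dict, List, Tuple

# (model name, label value, confidence score) for one span of text
Prediction = Tuple[str, str, float]


def candidate_labels(predictions: List[Prediction]) -> List[Tuple[str, str, float]]:
    """Merge per-model predictions into one candidate per distinct label value."""
    best: Dict[str, Tuple[str, float]] = {}
    for model_name, value, score in predictions:
        # Keep only the most confident model for each distinct label value.
        if value not in best or score > best[value][1]:
            best[value] = (model_name, score)
    candidates = [(value, model, score) for value, (model, score) in best.items()]
    # Most confident candidate first, so it can be offered at the top of the list.
    return sorted(candidates, key=lambda candidate: candidate[2], reverse=True)


if __name__ == "__main__":
    predictions = [
        ("ner_a", "PLACE_NAME", 0.82),
        ("ner_b", "LOCATION", 0.64),
        ("ner_c", "PLACE_NAME", 0.71),
    ]
    for value, model, score in candidate_labels(predictions):
        print(f"{value:<12} {score:.2f}  (from {model})")
```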

The inspection process according to an embodiment of the present disclosure has been described above with reference to FIGS. 8 to 10. As described, performing the inspection process on the labeling results helps ensure the quality of those results.

Methods for further improving the quality of the labeling results are described below.

First, a method of recommending a model suitable for labeling a dataset is described with reference to FIG. 11.

FIG. 11 is an exemplary flowchart showing a model recommendation method according to an embodiment of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, and some steps may be added or deleted as needed.

As shown in FIG. 11, this embodiment may begin at step S111, in which the domain of the unlabeled dataset is predicted. For example, the labeling system 10 may predict the domain of the dataset using a trained model that predicts dataset domains. Alternatively, when the unlabeled dataset is a text dataset, the labeling system 10 may predict one or more topics associated with the unlabeled dataset using a topic classification model and then predict the domain from the predicted topics. Alternatively, the labeling system 10 may use a domain term (word) dictionary to predict the domain of the dataset from the terms (words) that appear in the unlabeled dataset.
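As a rough illustration of the dictionary-based variant of step S111, the sketch below counts how many tokens of the dataset appear in each domain's term list and returns the domain with the most hits. The dictionary contents, the `predict_domain` name, and the simple counting rule are assumptions made for this example; the disclosure does not prescribe a specific scoring scheme.

```python
import re
from collections import Counter

# Hypothetical domain term dictionary (illustrative entries only).
DOMAIN_TERMS = {
    "medical": {"patient", "diagnosis", "dose", "clinical"},
    "finance": {"loan", "interest", "stock", "dividend"},
}


def predict_domain(texts, domain_terms=DOMAIN_TERMS):
    """Return the domain whose terms occur most often, or None if none occur."""
    hits = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            for domain, terms in domain_terms.items():
                if token in terms:
                    hits[domain] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

For instance, `predict_domain(["The patient received a clinical diagnosis."])` yields `"medical"` under the dictionary above, while a dataset containing none of the listed terms yields `None`.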

The domain of a dataset can be defined in various ways, and any such definition may be used. For example, the domain may be defined based on the type of text, such as a news, patent, or paper domain, or based on the content (or technical field) of the text, such as a medical, financial, or economic domain.

In step S112, one or more candidate models associated with the predicted domain may be selected. For example, the labeling system 10 may select, as candidate models, one or more models in the pre-registered model set that were trained on a dataset of the predicted domain.

In step S113, one or more of the selected candidate models may be recommended as a model for labeling, based on model performance. For example, the labeling system 10 may recommend to an operator (e.g., a user or an annotator) a candidate model whose performance is at or above a reference value. The way in which model performance is evaluated (judged) can be designed in various ways.

As one example, the performance of a model may be evaluated (judged) based on its evaluation results on a labeled dataset belonging to the same domain as the unlabeled dataset.

As another example, the performance of a model may be evaluated (judged) based on the inspection results for data labeled using that model. For example, the performance may be evaluated based on the number of rejections by the inspector, the degree of modification, and the like (e.g., the more rejections and the greater the degree of modification, the lower the evaluated performance).

As yet another example, the performance of a model may be evaluated (judged) based on evaluation scores given by operators (e.g., users or annotators) who have used the model.

As yet another example, the performance of a model may be evaluated based on a combination of the examples described above. For example, the performance may be evaluated (judged) based on a weighted sum of the performance evaluation scores obtained according to those examples.
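Steps S112 and S113, together with the weighted-sum evaluation, might be sketched as below. All names (`RegisteredModel`, `performance`, `recommend`), the three score fields, the weights, and the threshold are hypothetical; in particular, converting the rejection rate to a score as `1 - reject_rate` is just one possible choice and is not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class RegisteredModel:
    name: str
    domains: set          # domains of the datasets the model was trained on
    eval_score: float     # score on a labeled dataset of the same domain, in [0, 1]
    reject_rate: float    # fraction of this model's labels rejected in inspection
    user_rating: float    # operator rating, normalized to [0, 1]


def performance(model, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three performance signals (weights are assumptions)."""
    w_eval, w_inspect, w_user = weights
    return (w_eval * model.eval_score
            + w_inspect * (1.0 - model.reject_rate)  # fewer rejections is better
            + w_user * model.user_rating)


def recommend(models, predicted_domain, threshold=0.7):
    """S112: keep models tied to the domain; S113: keep those above the threshold."""
    candidates = [m for m in models if predicted_domain in m.domains]
    scored = sorted(((performance(m), m.name) for m in candidates), reverse=True)
    return [name for score, name in scored if score >= threshold]
```

The returned list is ordered from the highest to the lowest score, so the first entry would be the strongest recommendation under these assumed weights.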

For reference, the model recommendation method described above may be performed by the model management unit 25, but the scope of the present disclosure is not limited thereto. For example, the method may also be performed by another component of the labeling system 10.

Next, a method of improving the quality of the labeling results through additional learning of the model 122 is described with reference to FIG. 12.

FIG. 12 is an exemplary diagram illustrating an additional model learning method according to an embodiment of the present disclosure. The description below refers to FIG. 12.

As described above, the labeling system 10 may generate a label 123 for the unlabeled dataset 121 using the model 122. The labeled dataset 124 that includes the generated label 123 may then be inspected by an inspector.

In this case, the labeling system 10 may perform additional learning (or fine-tuning) on the model 122 using the inspected dataset 125. Unlike what is shown in FIG. 12, the model that undergoes additional learning may be a model that was not used for the labeling. In some cases, the labeling system 10 may retrain the model 122 instead of performing additional learning. For example, the labeling system 10 may initialize the weights of the model 122 and train the model 122 again using the inspected dataset 125 (optionally together with the dataset used in the previous training). Because the labels of the inspected dataset 125 are expected to be of high quality (i.e., accurate), training on it can further improve the performance of the model 122.

Meanwhile, the labeling system 10 may label the dataset (121 or 124) again using the additionally trained model 126. For example, the labeling system 10 may re-label the data in the dataset (121 or 124) other than the data whose labels were corrected by the inspector. The labeling system 10 may then request inspection of the labeling results for that remaining data. Alternatively, the labeling system 10 may request inspection only for those labeling results of the remaining data that differ from the labeling result (e.g., 123) of the model 122 before additional learning.

The additional learning and labeling process described above may be repeated until a specified condition is satisfied. In this way, a more thorough inspection can be performed and the quality of the labeling results can be further improved.
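The iterative loop of inspection, additional learning, and re-labeling can be sketched as follows. This is a simplified illustration under several assumptions: the model is represented by a plain `predict` callable, `fine_tune` and `inspect` are caller-supplied hypothetical callables, and the stopping condition (no label changed after re-labeling, or a maximum number of rounds) is only one possible "specified condition".

```python
def relabel_round(items, old_labels, corrected, predict):
    """Re-label everything except inspector-corrected items.

    Returns the new labels and the items whose label changed, i.e. the
    only items for which inspection is requested again.
    """
    labels = {}
    needs_inspection = []
    for item in items:
        if item in corrected:            # keep the inspector's correction
            labels[item] = corrected[item]
            continue
        new_label = predict(item)
        if new_label != old_labels[item]:
            needs_inspection.append(item)
        labels[item] = new_label
    return labels, needs_inspection


def refine(items, predict, fine_tune, inspect, max_rounds=5):
    """Alternate inspection, additional learning, and re-labeling."""
    labels = {item: predict(item) for item in items}
    corrected = {}
    queue = list(items)                  # first round: inspect every label
    for _ in range(max_rounds):
        for item in queue:
            fixed = inspect(item, labels[item])
            if fixed != labels[item]:
                corrected[item] = fixed
                labels[item] = fixed
        # Additional learning on the inspected dataset (hypothetical callable).
        predict = fine_tune(predict, dict(labels))
        labels, queue = relabel_round(items, labels, corrected, predict)
        if not queue:                    # assumed stopping condition
            break
    return labels
```

Restricting the second and later inspection rounds to changed labels mirrors the alternative described above, in which only the differences from the pre-update model's output are sent back to the inspector.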

For reference, the additional model learning method described above may be performed by the model management unit 25, but the scope of the present disclosure is not limited thereto. For example, the method may also be performed by another component of the labeling system 10.

Methods for further improving the quality of the labeling results have been described above with reference to FIGS. 11 and 12. An exemplary computing device 130 capable of implementing the labeling system 10 according to some embodiments of the present disclosure is described below.

FIG. 13 is an exemplary hardware configuration diagram of the computing device 130.

As shown in FIG. 13, the computing device 130 may include one or more processors 131, a bus 133, a communication interface 134, a memory 132 into which a computer program executed by the processor 131 is loaded, and a storage 135 that stores the computer program 136. FIG. 13 shows only the components related to the embodiments of the present disclosure. Accordingly, a person skilled in the art to which the present disclosure pertains will understand that the computing device 130 may further include other general-purpose components in addition to those shown in FIG. 13. In some cases, the computing device 130 may also be configured with some of the components shown in FIG. 13 omitted. Each component of the computing device 130 is described below.

The processor 131 may control the overall operation of each component of the computing device 130. The processor 131 may include at least one of a Central Processing Unit (CPU), a Micro Processor Unit (MPU), a Micro Controller Unit (MCU), a Graphics Processing Unit (GPU), or any other type of processor well known in the technical field of the present disclosure. The processor 131 may also perform computations for at least one application or program that executes operations/methods according to embodiments of the present disclosure. The computing device 130 may include one or more processors.

Next, the memory 132 may store various data, commands, and/or information. The memory 132 may load the computer program 136 from the storage 135 in order to execute operations/methods according to embodiments of the present disclosure. The memory 132 may be implemented as a volatile memory such as RAM, but the technical scope of the present disclosure is not limited thereto.

Next, the bus 133 may provide communication between the components of the computing device 130. The bus 133 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

Next, the communication interface 134 may support wired and wireless Internet communication of the computing device 130. The communication interface 134 may also support various communication methods other than Internet communication. To this end, the communication interface 134 may include a communication module well known in the technical field of the present disclosure.

Next, the storage 135 may store one or more computer programs 136 in a non-transitory manner. The storage 135 may include a non-volatile memory such as Read Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory, a hard disk, a removable disk, or any other type of computer-readable recording medium well known in the technical field to which the present disclosure pertains.

Next, the computer program 136 may include one or more instructions that, when loaded into the memory 132, cause the processor 131 to perform operations/methods according to various embodiments of the present disclosure. That is, the processor 131 may perform those operations/methods by executing the one or more instructions.

For example, the computer program 136 may include one or more instructions that perform an operation of registering a user model obtained from a user, an operation of obtaining an unlabeled dataset from the user, an operation of generating a label for the unlabeled dataset using the registered user model, an operation of creating an inspection task for the generated label, and an operation of assigning the created inspection task to an inspector. In this case, the labeling system 10 according to some embodiments of the present disclosure may be implemented through the computing device 130.
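The five operations listed above can be strung together as in the minimal sketch below. The `LabelingService` class, its method names, the `InspectionTask` type, and the round-robin assignment policy are all hypothetical and serve only to make the sequence of operations concrete; the disclosure does not specify how inspectors are chosen.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Optional


@dataclass
class InspectionTask:
    task_id: int
    labeled: list                     # (data, label) pairs to be inspected
    inspector: Optional[str] = None   # filled in when the task is assigned


class LabelingService:
    def __init__(self, inspectors):
        self._models = {}
        self._inspectors = cycle(inspectors)  # assumed round-robin policy
        self._next_id = 1

    def register_model(self, name, model):
        """Register a user model (any callable mapping data to a label)."""
        self._models[name] = model

    def generate_labels(self, model_name, dataset):
        """Label an unlabeled dataset with a registered model."""
        model = self._models[model_name]
        return [(item, model(item)) for item in dataset]

    def create_inspection_task(self, labeled):
        """Create an inspection task for the generated labels."""
        task = InspectionTask(self._next_id, labeled)
        self._next_id += 1
        return task

    def assign(self, task):
        """Assign the inspection task to the next inspector."""
        task.inspector = next(self._inspectors)
        return task
```

A caller would register a model, pass the user's dataset to `generate_labels`, wrap the result with `create_inspection_task`, and finally call `assign`, which matches the order of operations in the paragraph above.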

An exemplary computing device 130 capable of implementing the labeling system 10 according to some embodiments of the present disclosure has been described above with reference to FIG. 13.

The technical ideas of the present disclosure described so far may be implemented as computer-readable code on a computer-readable medium. A computer program recorded on a computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that device, and may thereby be used on that device.

In addition, although a plurality of components have been described in the above embodiments as being combined into one or operating in combination, the technical idea of the present disclosure is not necessarily limited to such embodiments. That is, within the scope of the purpose of the present disclosure, any of the components may be selectively combined into one or more groups and operated.

Although operations are shown in the drawings in a specific order, this should not be understood to mean that the operations must be performed in that specific or sequential order, or that all illustrated operations must be performed, to obtain the desired results. In certain situations, multitasking and parallel processing may be advantageous. Although various embodiments of the present disclosure have been described above with reference to the attached drawings, those skilled in the art to which the present disclosure pertains will understand that the disclosure may be implemented in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present disclosure should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights defined by the present disclosure.

Claims

A labeling system comprising:
a model management unit that registers and manages a model set;
a ruleset management unit that registers and manages a ruleset; and
a labeling unit that provides a labeling function for an unlabeled dataset,
wherein the labeling unit comprises:
a first label generator that generates a first label for the unlabeled dataset using at least one model from the pre-registered model set; and
a second label generator that generates a second label for the unlabeled dataset using at least one rule from the pre-registered ruleset.

The labeling system of claim 1, wherein
the pre-registered model set includes a user model obtained from a user, and
the first label generator generates the first label using the user model in response to a request from the user.

The labeling system of claim 1, wherein
the pre-registered model set includes a system model provided by the labeling system, and
the first label generator generates the first label using the system model in response to a user's request.

The labeling system of claim 1, wherein
the model set includes a natural language processing model, and
the unlabeled dataset is a text dataset.

The labeling system of claim 4, wherein
the first label includes a word-level label, a sentence-level label, and a document-level label.

The labeling system of claim 5, wherein
the word-level label includes a named entity,
the sentence-level label includes at least one of the intent of a sentence, the type of the sentence, and the emotion of the sentence, and
the document-level label includes the topic of a document.

The labeling system of claim 1, wherein
the labeling unit further provides a manual labeling function that receives a label as input from an operator, and
the input label is displayed on the labeling work screen in a different way from the first label or the second label.

The labeling system of claim 1, wherein
the labeling unit provides the labeling function to an operator,
the labeling system further comprising a task management unit that, in response to a request from the operator, creates an inspection task for the first label or the second label and assigns the created inspection task to an inspector.

The labeling system of claim 8, wherein
the first label includes a plurality of candidate labels generated by a plurality of models for the same data, and
the plurality of candidate labels are displayed on the inspector's work screen.

The labeling system of claim 9, wherein
the plurality of candidate labels are displayed on the work screen in response to a predetermined input, and
are displayed in different ways depending on the value of each label.

The labeling system of claim 1,
further comprising a task management unit that, in response to a user's request, generates a labeling task for the unlabeled dataset and assigns the generated labeling task to an annotator.

The labeling system of claim 1, wherein
the pre-registered model set includes a system model provided by the labeling system,
the model management unit
obtains, from a user, a labeled dataset of the same domain as the unlabeled dataset, and
creates a custom model for the user by fine-tuning the system model using the labeled dataset, and
the labeling unit generates the first label using the custom model in response to a request from the user.

The labeling system of claim 1, wherein
the model management unit
analyzes the unlabeled dataset to predict the domain to which the unlabeled dataset belongs,
selects at least one candidate model associated with the predicted domain from the pre-registered model set, and
recommends, based on model performance, a model for generating the first label from among the selected candidate models.

A labeling method performed by at least one computing device, the method comprising:
registering a user model obtained from a user;
obtaining an unlabeled dataset from the user;
generating a label for the unlabeled dataset using the registered user model;
creating an inspection task for the generated label; and
assigning the created inspection task to an inspector.

A computer program stored on a computer-readable recording medium that, in combination with a computing device, executes the steps of:
registering a user model obtained from a user;
obtaining an unlabeled dataset from the user;
generating a label for the unlabeled dataset using the registered user model;
creating an inspection task for the generated label; and
assigning the created inspection task to an inspector.