KR102413583B1

KR102413583B1 - Method for managing annotation job, apparatus and system supporting the same

Info

Publication number: KR102413583B1
Application number: KR1020200016770A
Authority: KR
Inventors: 이경원; 팽경현
Original assignee: 주식회사 루닛
Priority date: 2020-01-28
Filing date: 2020-02-12
Publication date: 2022-06-27
Also published as: KR20200054138A

Abstract

An annotation job management method capable of performing annotation work efficiently is provided. The annotation job management method may include: obtaining information on a new pathology slide image; determining a dataset type and a panel of the pathology slide image; and assigning, to an annotator account, an annotation job defined by the pathology slide image, the determined dataset type, an annotation task, and a patch that is a partial region of the pathology slide image. Here, the annotation task is defined to include the determined panel; the panel is designated as one of a cell panel, a tissue panel, and a structure panel; and the dataset type indicates the intended use of the pathology slide image and may be designated as either training of a machine learning model or validation of the machine learning model.

Description

Method for managing annotation jobs, and apparatus and system supporting the same

The present disclosure relates to an annotation job management method and to an apparatus and system supporting the same. More specifically, it provides a method that manages annotation jobs more efficiently while also ensuring the accuracy of annotation results, and an apparatus and system supporting the method.

Supervised learning is a machine learning method that, as shown in FIG. 1, builds a target model 3 for performing a target task by learning from a dataset 2 to which label information (i.e., ground-truth information) has been given. Therefore, to perform supervised learning on a dataset 1 that has no label information (indicated by a tag icon), annotation work must be carried out first.

Annotation work means tagging each data item with label information in order to create a training dataset. Because annotation is generally performed by people, creating a large training dataset consumes considerable labor and time. In particular, when building a machine learning model that diagnoses the type or location of a lesion in a pathology image, annotation must be performed by experienced specialist physicians, so the cost is far higher than in other domains.

Conventionally, annotation work has been performed without an established, systematic work process. For example, an administrator would visually inspect the characteristics of each pathology image to decide whether to annotate it, classify the pathology images by hand, and then assign them to a suitable annotator. In addition, the administrator would designate the annotation region on each pathology image one by one before assigning the job to an annotator. That is, the whole process, including pathology image classification, job assignment, and annotation region designation, was carried out manually by the administrator, so annotation work consumed considerable time and labor.

Furthermore, even though machine learning techniques themselves have become sufficiently advanced, the time and cost of annotation work have made it difficult to apply them in many fields.

Accordingly, to broaden the usability of machine learning techniques, a method for performing annotation work more efficiently and systematically is needed.

Korean Patent Application Publication No. 10-2014-0093974 (published July 29, 2014)

A technical problem to be solved by some embodiments of the present disclosure is to provide a method that, by automating annotation work, allows annotation jobs to be performed and managed more efficiently and systematically, and an apparatus and system supporting the method.

Another technical problem to be solved by some embodiments of the present disclosure is to provide a data design artifact or data modeling artifact with which annotation jobs can be managed systematically.

Yet another technical problem to be solved by the present disclosure is to provide a method for automatically assigning annotation jobs to suitable annotators, and an apparatus and system supporting the method.

Yet another technical problem to be solved by the present disclosure is to provide a method for automatically extracting, from a pathology slide image, the patch image on which annotation is to be performed, and an apparatus and system supporting the method.

Yet another technical problem to be solved by the present disclosure is to provide a method capable of ensuring the accuracy of annotation results, and an apparatus and system supporting the method.

The technical problems of the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

To solve the above technical problems, an annotation job management method according to some embodiments of the present disclosure is performed by a computing device and may include: obtaining information on a new pathology slide image; determining a dataset type and a panel of the pathology slide image; and assigning, to an annotator account, an annotation job defined by the pathology slide image, the determined dataset type, an annotation task, and a patch that is a partial region of the pathology slide image. The annotation task is defined to include the determined panel; the panel is designated as one of a cell panel, a tissue panel, and a structure panel; and the dataset type indicates the intended use of the pathology slide image and may be designated as either training of a machine learning model or validation of the machine learning model.

In some embodiments, the annotation task is defined to further include a task class, and the task class may indicate an annotation target defined from the perspective of the panel.

In some embodiments, the dataset type may be designated as one of training of the machine learning model, validation of the machine learning model, or an Observer Performance Test (OPT).
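The job definition above can be pictured as a small set of record types. The following Python sketch is illustrative only: the class and field names (`AnnotationJob`, `task_class`, `annotator_account`, and so on), the rectangular patch representation, and the example task class string are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Panel(Enum):
    CELL = "cell"
    TISSUE = "tissue"
    STRUCTURE = "structure"


class DatasetType(Enum):
    TRAINING = "training"
    VALIDATION = "validation"
    OPT = "opt"  # Observer Performance Test (some embodiments only)


@dataclass(frozen=True)
class AnnotationTask:
    panel: Panel
    task_class: Optional[str] = None  # annotation target seen from the panel (hypothetical values)


@dataclass(frozen=True)
class Patch:
    # Assumed representation: a rectangle in slide pixel coordinates.
    x: int
    y: int
    width: int
    height: int


@dataclass
class AnnotationJob:
    slide_id: str
    dataset_type: DatasetType
    task: AnnotationTask
    patch: Patch
    annotator_account: Optional[str] = None  # None until the job is assigned


def assign(job: AnnotationJob, account: str) -> AnnotationJob:
    """Record which annotator account the job is assigned to."""
    job.annotator_account = account
    return job
```

A job built this way carries everything the method names (slide, dataset type, task with its panel, and patch), and assignment only fills in the account field.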

In some embodiments, determining the dataset type and panel may include inputting the pathology slide image into a machine learning model and determining the dataset type and panel of the pathology slide image based on the resulting output value.

In some embodiments, obtaining information on the new pathology slide image may include: detecting, by a worker agent that monitors a storage, that a pathology slide image file has been added to the storage at a designated location; inserting, by the worker agent, information on the new pathology slide image into a database; and obtaining the information on the pathology slide image from the database.
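One way to picture the worker agent is as a periodic scan of the monitored location that records unseen files in a database. The sketch below is a minimal illustration under several assumptions that are not in the disclosure: the function name `scan_once`, the `slide` table layout, the use of SQLite, and the file suffixes treated as slide images are all hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple


def scan_once(
    storage_dir: Path,
    conn: sqlite3.Connection,
    suffixes: Tuple[str, ...] = (".svs", ".tiff"),  # assumed slide file suffixes
) -> List[str]:
    """Insert a row for each slide file not yet recorded; return the new file names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS slide (path TEXT PRIMARY KEY, size_bytes INTEGER)"
    )
    added = []
    for p in sorted(storage_dir.iterdir()):
        if p.is_file() and p.suffix.lower() in suffixes:
            cur = conn.execute(
                "INSERT OR IGNORE INTO slide (path, size_bytes) VALUES (?, ?)",
                (str(p), p.stat().st_size),
            )
            if cur.rowcount == 1:  # 0 means the path was already recorded
                added.append(p.name)
    conn.commit()
    return added
```

A long-running agent could call `scan_once` in a loop with a sleep between passes; the job management side would then read new rows from the same table rather than touching the storage directly.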

In some embodiments, the assigning may include automatically assigning the annotation job to an annotator account selected based on an annotation performance history associated with the combination of the dataset type of the annotation job and the panel of the annotation task.

In some embodiments, the annotation task is defined to further include a task class, the task class indicates an annotation target defined from the perspective of the panel, and the assigning may include automatically assigning the annotation job to an annotator account selected based on an annotation performance history associated with the combination of the panel and the task class of the annotation task of the annotation job.
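The history-based selection described above can be sketched as a lookup keyed by the relevant combination. The passage does not state which criterion is applied to the history, so this example assumes the simplest one: pick the account with the most past jobs for the same combination. The function name `pick_annotator` and the tuple layout of the history are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One history record: (annotator account, first key, second key).
# The keys can be (dataset type, panel) or (panel, task class), matching the
# two combinations mentioned in the text.
HistoryRecord = Tuple[str, str, str]


def pick_annotator(
    history: Iterable[HistoryRecord], key_a: str, key_b: str
) -> Optional[str]:
    """Return the account with the most past jobs for (key_a, key_b), or None."""
    counts = Counter(
        account for account, a, b in history if (a, b) == (key_a, key_b)
    )
    if not counts:
        return None
    # Assumed tie-break: alphabetical order of the account name.
    return min(counts, key=lambda account: (-counts[account], account))
```

Other criteria, such as past approval rate for the combination, would fit the same shape by changing how `counts` is computed.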

In some embodiments, the assigning may include: obtaining candidate patches of the pathology slide image; and inputting each candidate patch into the machine learning model and automatically selecting the patch of the annotation job from among the candidate patches based on the resulting per-class output values.

In some embodiments, automatically selecting the patch of the annotation job from among the candidate patches may include: computing an entropy value from the per-class output values for each candidate patch; and selecting a candidate patch whose entropy value is greater than or equal to a reference value as the patch of the annotation job.
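As an illustration of the entropy criterion, the sketch below assumes that the per-class output values are probabilities that sum to 1 and uses the Shannon entropy H = -Σ p_c ln p_c; the passage itself does not fix the entropy formula or the reference value. The names `entropy` and `select_by_entropy` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a per-class probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_by_entropy(
    candidates: Dict[str, Sequence[float]], threshold: float
) -> List[str]:
    """Return the ids of candidate patches whose entropy is at least `threshold`."""
    return [
        patch_id
        for patch_id, probs in candidates.items()
        if entropy(probs) >= threshold
    ]
```

High entropy means the model spreads its output across classes, so such patches are the ones the model is least certain about and annotating them is likely to be most informative.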

In some embodiments, the assigning may include: obtaining candidate patches of the pathology slide image; calculating, for each candidate patch, a miss-prediction probability of the machine learning model; and selecting a candidate patch whose calculated miss-prediction probability is greater than or equal to a reference value as the patch of the annotation job.
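This passage does not say how the miss-prediction probability is calculated, so the sketch below takes the estimator as a parameter. The stand-in estimator `one_minus_max_confidence` is purely an assumption made so the example runs; it is not the estimator of the disclosure. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def select_by_misprediction(
    candidates: Iterable[Tuple[str, P]],
    estimate: Callable[[P], float],
    threshold: float,
) -> List[str]:
    """Return ids of candidates whose estimated miss-prediction probability >= threshold."""
    return [
        patch_id for patch_id, patch in candidates if estimate(patch) >= threshold
    ]


def one_minus_max_confidence(probs: Sequence[float]) -> float:
    """Stand-in estimator (assumption): 1 minus the top class probability."""
    return 1.0 - max(probs)
```

In this example each candidate is represented by the model's class probabilities for the patch; a real estimator could instead take the patch image itself.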

In some embodiments, the method may further include: obtaining first annotation result data of the annotator account to which the annotation job was assigned; comparing the first annotation result data with the result of inputting the patch of the annotation job into the machine learning model; and, if the comparison shows that the difference between the two results exceeds a reference value, reassigning the annotation job to another annotator account.
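The model-based check above can be sketched as follows. The difference measure is not specified in this passage; the example assumes both results are label sequences over the same items and measures the fraction of items on which they disagree. The names `disagreement` and `next_account` and the choice of the next account are hypothetical.

```python
from typing import Sequence


def disagreement(annotator_labels: Sequence[str], model_labels: Sequence[str]) -> float:
    """Fraction of items on which the annotator and the model disagree."""
    if len(annotator_labels) != len(model_labels):
        raise ValueError("both results must cover the same items")
    if not annotator_labels:
        return 0.0
    differing = sum(a != m for a, m in zip(annotator_labels, model_labels))
    return differing / len(annotator_labels)


def next_account(
    current: str,
    accounts: Sequence[str],
    annotator_labels: Sequence[str],
    model_labels: Sequence[str],
    threshold: float,
) -> str:
    """Keep the current account unless the difference exceeds `threshold`."""
    if disagreement(annotator_labels, model_labels) <= threshold:
        return current
    for account in accounts:
        if account != current:
            return account  # assumed policy: first other account in the list
    return current  # no alternative account is available
```

Note that reassignment happens only when the difference strictly exceeds the reference value, matching the wording of the text.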

In some embodiments, the method may further include: obtaining first annotation result data of the annotator account to which the annotation job was assigned; obtaining second annotation result data of another annotator account; and marking the first annotation result data as disapproved if the similarity between the first annotation result data and the second annotation result data is below a reference value.
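The cross-annotator check can be illustrated with a set-overlap similarity. The similarity measure is not specified in this passage; the sketch assumes each annotation result is a set of marked positions and uses the Jaccard index (intersection over union). The names `jaccard` and `is_disapproved` are hypothetical.

```python
from typing import Hashable, Iterable


def jaccard(first: Iterable[Hashable], second: Iterable[Hashable]) -> float:
    """Jaccard index of two annotation results treated as sets."""
    a, b = set(first), set(second)
    if not a and not b:
        return 1.0  # two empty results are treated as identical
    return len(a & b) / len(a | b)


def is_disapproved(
    first: Iterable[Hashable], second: Iterable[Hashable], threshold: float
) -> bool:
    """True if the first result should be marked disapproved (similarity < threshold)."""
    return jaccard(first, second) < threshold
```

For region annotations the same idea applies with pixel sets or masks; for cell annotations a tolerance on position would usually be needed before comparing.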

To solve the above technical problems, an annotation job management apparatus according to some embodiments of the present disclosure includes a memory containing one or more instructions and a processor that, by executing the one or more instructions, obtains information on a new pathology slide image, determines a dataset type and a panel of the pathology slide image, and assigns, to an annotator account, an annotation job defined by the pathology slide image, the determined dataset type, an annotation task, and a patch that is a partial region of the pathology slide image. The annotation task is defined to include the determined panel; the panel is designated as one of a cell panel, a tissue panel, and a structure panel; and the dataset type indicates the intended use of the pathology slide image and may be designated as either training of a machine learning model or validation of the machine learning model.

To solve the above technical problems, a non-transitory computer-readable recording medium containing a computer program according to some embodiments of the present disclosure may, when the instructions of the computer program are executed by a processor, cause the processor to perform: obtaining information on a new pathology slide image; determining a dataset type and a panel of the pathology slide image; and assigning, to an annotator account, an annotation job defined by the pathology slide image, the determined dataset type, an annotation task, and a patch that is a partial region of the pathology slide image. Here, the annotation task is defined to include the determined panel; the panel is designated as one of a cell panel, a tissue panel, and a structure panel; and the dataset type indicates the intended use of the pathology slide image and may be designated as either training of a machine learning model or validation of the machine learning model.

According to the various embodiments of the present disclosure described above, annotation work is automated across the board, which increases convenience for the administrator and can greatly improve overall work efficiency. Accordingly, the time and labor costs of annotation work can be greatly reduced. In addition, as the burden of annotation work decreases, machine learning techniques can be put to wider use.

In addition, the various data associated with annotation jobs can be managed systematically based on the data modeling artifact. Accordingly, data management costs are reduced and the overall annotation workflow can proceed smoothly.

In addition, by automatically assigning annotation jobs to suitable annotators, the administrator's workload can be reduced and the accuracy of annotation results can be improved.

In addition, the accuracy of annotation results can be ensured by verifying each annotation result against the result of a machine learning model or of another annotator. Accordingly, the performance of a machine learning model trained on the annotation results can also be improved.

In addition, the patch indicating the region to be annotated can be extracted automatically. Therefore, the administrator's workload can be minimized.

In addition, based on the miss-prediction probability of the machine learning model, the entropy value, and the like, only patches that are effective for training are selected as annotation targets from among a plurality of candidate patches. Accordingly, the amount of annotation work can be reduced and a high-quality training dataset can be created.

The effects according to the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is an exemplary diagram for explaining the relationship between supervised learning and annotation work.
FIGS. 2 and 3 are exemplary configuration diagrams illustrating an annotation job management system according to various embodiments of the present disclosure.
FIG. 4 is a design diagram of an exemplary data model for annotation job management according to some embodiments of the present disclosure.
FIG. 5 is an exemplary flowchart illustrating an annotation job management method according to some embodiments of the present disclosure.
FIG. 6 is an exemplary diagram for explaining an annotator selection method according to some embodiments of the present disclosure.
FIG. 7 is an exemplary diagram illustrating an annotation tool that may be referred to in some embodiments of the present disclosure.
FIG. 8 is an exemplary flowchart illustrating an annotation job creation method according to some embodiments of the present disclosure.
FIG. 9 is an exemplary flowchart illustrating a method of determining a dataset type for a pathology slide image according to some embodiments of the present disclosure.
FIGS. 10 to 13 are diagrams for explaining a panel type determination method according to some embodiments of the present disclosure.
FIG. 14 is an exemplary flowchart illustrating an automatic patch extraction method according to first embodiments of the present disclosure.
FIGS. 15 to 19 are exemplary diagrams for further explaining the automatic patch extraction method according to the first embodiments of the present disclosure.
FIG. 20 is an exemplary flowchart illustrating an automatic patch extraction method according to second embodiments of the present disclosure.
FIGS. 21 to 23 are exemplary diagrams for further explaining the automatic patch extraction method according to the second embodiments of the present disclosure.
FIG. 24 is an exemplary hardware configuration diagram illustrating an exemplary computing device capable of implementing an apparatus or system according to various embodiments of the present disclosure.

Hereinafter, preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings. The advantages and features of the present disclosure, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The embodiments are provided only to make the present disclosure complete and to fully convey the scope of the present disclosure to those of ordinary skill in the art to which it belongs, and the technical idea of the present disclosure is defined only by the scope of the claims.

In adding reference numerals to the components of each drawing, note that the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In addition, in describing the present disclosure, a detailed description of a related known configuration or function is omitted when it is judged that it could obscure the gist of the present disclosure.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) are used with the meaning commonly understood by those of ordinary skill in the art to which the present disclosure belongs. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined. The terms used in this specification are for describing the embodiments and are not intended to limit the present disclosure. In this specification, the singular form also includes the plural form unless the phrase specifically states otherwise.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present disclosure. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, the component may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

As used in this specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more other components, steps, operations, and/or elements in addition to the stated component, step, operation, and/or element.

Before describing the present specification further, some terms used in this specification are clarified.

In this specification, label information is the ground-truth information of a data sample, obtained as the result of annotation work. In the art, the term label may be used interchangeably with terms such as annotation and tag.

In this specification, annotation means either the work of tagging a data sample with label information or the tagged information (i.e., the annotation) itself. In the art, the term annotation may be used interchangeably with terms such as tagging and labeling.

In this specification, the miss-prediction probability means the probability or likelihood that, when a particular model performs a prediction for a given data sample, the prediction result contains an error (i.e., the probability that the prediction is wrong).

In this specification, a panel means the type of a patch to be extracted from a pathology slide image, or the type of the pathology slide image. Panels may be classified into a cell panel, a tissue panel, and a structure panel, but the technical scope of the present disclosure is not limited thereto. For examples of patches corresponding to each panel type, see FIGS. 10 to 12.

In this specification, an instruction refers to a series of commands grouped by function, which is a component of a computer program and is executed by a processor.

Hereinafter, some embodiments of the present disclosure are described in detail with reference to the accompanying drawings.

FIG. 2 is an exemplary configuration diagram illustrating an annotation job management system according to some embodiments of the present disclosure.

As shown in FIG. 2, the annotation job management system may include a storage server 10, at least one annotator terminal 20-1 to 20-n, and an annotation job management apparatus 100. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, and some components may of course be added or removed as needed. For example, in some other embodiments, as shown in FIG. 3, the annotation job management system may further include a reviewer terminal 30 belonging to a reviewer who is responsible for reviewing (i.e., evaluating) annotation work.

Each component of the system shown in FIG. 2 or FIG. 3 represents a functionally distinct element, and a plurality of components may be implemented in integrated form in an actual physical environment. Alternatively, each component may be implemented as a plurality of separate, more detailed functional elements in an actual physical environment. For example, a first function of the annotation job management apparatus 100 may be implemented on a first computing device and a second function on a second computing device. Each component is described below.

In the annotation job management system, the storage server 10 is a server that stores and manages the various data associated with annotation jobs. For efficient data management, the storage server 10 may store and manage this data using a database.

The various data may include pathological slide image files, metadata of the pathological slide images (e.g., image format, associated disease name, associated tissue, associated patient information), data on annotation jobs, data on annotators, annotation job results, and the like, but the technical scope of the present disclosure is not limited thereto.

In some embodiments, the storage server 10 may operate as a web server that provides a job management web page. In this case, an administrator may assign and manage annotation jobs through the job management web page, and an annotator may check and perform assigned jobs through the job management web page.

In some embodiments, a data model (e.g., a DB schema) for annotation job management may be designed as shown in FIG. 4. In FIG. 4, a box-shaped object denotes an entity, a line connecting box-shaped objects denotes a relationship, and the text on a line denotes the relationship type. As shown in FIG. 4, the annotation job entity 44 may be associated with various entities 43, 45, 46, 47, 49. For ease of understanding, the data model shown in FIG. 4 is briefly described below with a focus on the job entity 44.

The slide entity 45 is an entity relating to a pathological slide image. The slide entity 45 may have various information associated with the pathological slide image as attributes. Because a plurality of annotation jobs may be generated from one pathological slide image, the relationship between the slide entity 45 and the job entity 44 is 1:n.

The dataset entity 49 is an entity representing the intended use of the annotated data. For example, the intended use may be classified as training (i.e., used as a training dataset), validation (i.e., used as a validation dataset), test (i.e., used as a test dataset), or Observer Performance Test (OPT) (i.e., used in an OPT), but the technical scope of the present disclosure is not limited thereto.

The annotator entity 47 is an entity representing an annotator. The annotator entity 47 may have, as attributes, the annotator's current job status, history of previously performed jobs, evaluation results of previously performed jobs, personal information of the annotator (e.g., education, major), and the like. Because one annotator may perform a plurality of jobs, the relationship between the annotator entity 47 and the job entity 44 is 1:n.

The patch entity 46 is an entity relating to a patch derived from a pathological slide image. Because a patch may contain a plurality of annotations, the relationship between the patch entity 46 and the annotation entity 48 is 1:n. In addition, because one annotation job may be performed on a plurality of patches, the relationship between the patch entity 46 and the job entity 44 is n:1.

The annotation task entity 43 is an entity representing an annotation task, which is a detailed type of annotation job. For example, annotation tasks may be defined and subdivided in various ways, such as a task of tagging whether a cell is a mitotic cell, a task of tagging the number of mitotic cells, a task of tagging the type of a lesion, a task of tagging the location of a lesion, and a task of tagging a disease name. Because the detailed type of an annotation job may vary by panel (i.e., the annotations tagged on a cell panel and on a tissue panel may differ), and because different tasks may be performed even on the same panel, the task entity 43 may have the panel entity 41 and the task class entity 42 as attributes. Here, the task class entity 42 is an entity representing an annotation target defined from the perspective of the panel (e.g., mitotic cells, the location of a lesion) or a task type defined from the perspective of the panel. Because a plurality of annotation jobs may be generated from one annotation task (i.e., a plurality of jobs performing the same task may exist), the relationship between the annotation task entity 43 and the annotation job entity 44 is 1:n. From a programming point of view, the annotation task entity 43 may be understood as corresponding to a class or a program, and the annotation job entity 44 as corresponding to an instance of the class or a process created by executing the program.

In some embodiments, the storage server 10 may build a database based on the data model described above and systematically manage the various data associated with annotation jobs. Accordingly, data management costs may be reduced, and the overall annotation job process may proceed smoothly.
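As an illustration only, the entities and cardinalities described for FIG. 4 could be sketched as plain Python data classes. All class names, field names, and sample values below are assumptions made for this sketch; the disclosure does not specify a schema at this level of detail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DatasetType(Enum):          # dataset entity 49
    TRAINING = "training"
    VALIDATION = "validation"
    TEST = "test"
    OPT = "opt"


class Panel(Enum):                # panel entity 41
    CELL = "cell"
    TISSUE = "tissue"
    STRUCTURE = "structure"


@dataclass(frozen=True)
class AnnotationTask:             # task entity 43: panel 41 plus task class 42
    panel: Panel
    task_class: str


@dataclass
class Patch:                      # patch entity 46 (1:n with annotation entity 48)
    patch_id: str
    annotations: List[str] = field(default_factory=list)


@dataclass
class AnnotationJob:              # job entity 44
    slide_id: str                 # slide entity 45 (one slide, many jobs)
    dataset_type: DatasetType
    task: AnnotationTask          # one task, many jobs
    annotator_id: str             # annotator entity 47 (one annotator, many jobs)
    patches: List[Patch] = field(default_factory=list)  # many patches, one job


# One task "instantiated" as two jobs, mirroring the class/instance analogy.
mitosis_task = AnnotationTask(Panel.CELL, "mitotic cell")
jobs = [
    AnnotationJob("slide-001", DatasetType.TRAINING, mitosis_task,
                  "annotator-a", [Patch("p1"), Patch("p2")]),
    AnnotationJob("slide-001", DatasetType.TRAINING, mitosis_task,
                  "annotator-b", [Patch("p3")]),
]
```

Both jobs share a single `AnnotationTask` object and a single slide identifier, which is how the 1:n relationships appear in this sketch.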

The data model for annotation job management has been described so far. The description of the components of the annotation job management system continues below, referring again to FIGS. 2 and 3.

In the annotation job management system, the annotation job management apparatus 100 is a computing device that performs overall management functions such as assigning annotation jobs to the annotator terminals 20-1 to 20-n. Here, the computing device may be a notebook, a desktop, a laptop, or the like, but is not limited thereto and may include any kind of device equipped with a computing function. See FIG. 24 for an example of the computing device. Hereinafter, for convenience of description, the annotation job management apparatus 100 is abbreviated as the management apparatus 100. In addition, in the following description, reference numeral 20 is used when referring to the annotator terminals collectively or to an arbitrary annotator terminal without distinction.

The management apparatus 100 may be an apparatus used by an administrator. For example, the administrator may access the job management web page through the management apparatus 100, log in with an administrator account, and then manage jobs overall. For example, the administrator may perform management actions such as assigning an annotation job to the account of a specific annotator, or sending an annotation result to the account of a reviewer to request a review. Of course, the above management processes may also be performed automatically by the management apparatus 100, which is described later with reference to FIG. 5 and the subsequent drawings.

In the annotation job management system, the annotator terminal 20 is a terminal on which an annotator performs annotation jobs. An annotation tool may be installed on the terminal 20. Of course, various functions for annotation may also be provided through the job management web page. In this case, the annotator may access the job management web page through the terminal 20 and then perform the annotation job on the web. See FIG. 7 for an example of the annotation tool.

In the annotation job management system, the reviewer terminal 30 is a terminal on the side of a reviewer who reviews annotation results. The reviewer may review an annotation result using the reviewer terminal 30 and provide the review result to the management apparatus 100.

In some embodiments, at least some components of the annotation job management system may communicate over a network. Here, the network may be implemented as any type of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, or WiBro (Wireless Broadband Internet).

The annotation job management system according to some embodiments of the present disclosure has been described so far with reference to FIGS. 2 to 4. Hereinafter, an annotation job management method according to some embodiments of the present disclosure is described with reference to FIGS. 5 to 23.

Each step of the annotation job management method may be performed by a computing device. In other words, each step of the annotation job management method may be implemented as one or more instructions executed by a processor of a computing device. For ease of understanding, the description continues on the assumption that the annotation job management method is performed in the environment shown in FIG. 2 or FIG. 3.

FIG. 5 is an exemplary flowchart illustrating an annotation job management method according to some embodiments of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 5, the annotation job management method starts at step S100, in which information on a new pathological slide image is obtained. The information on the pathological slide image may include only metadata of the pathological slide image, or may further include the pathological slide image file.

In some embodiments, the information on the new pathological slide image may be obtained in real time through a worker agent. Specifically, the worker agent may detect that a pathological slide image file has been added to storage at a designated location (e.g., the storage server 10 or storage of a medical institution that provides pathological slide images). The worker agent may then insert information on the new pathological slide image into a database of the management apparatus 100 or the storage server 10. The information on the new pathological slide image may then be obtained from the database.
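One way such a worker agent might detect newly added files is by periodically scanning the designated location and remembering which files it has already reported. The following is a minimal sketch under that assumption; the function name `scan_new_slides`, the file extensions, and the record fields are hypothetical, and an actual agent could equally rely on file system notifications or a different storage interface.

```python
import os
import tempfile
from typing import Dict, List, Set

# Assumed slide file extensions; the source does not list formats.
SLIDE_EXTENSIONS = (".svs", ".ndpi", ".tiff")


def scan_new_slides(watch_dir: str, seen: Set[str]) -> List[Dict[str, object]]:
    """Return metadata records for slide files not reported by earlier scans."""
    records: List[Dict[str, object]] = []
    with os.scandir(watch_dir) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file() or entry.name in seen:
                continue
            if not entry.name.lower().endswith(SLIDE_EXTENSIONS):
                continue
            seen.add(entry.name)
            # In the described flow, a record like this would be inserted
            # into the database; here it is simply returned.
            records.append({
                "file_name": entry.name,
                "path": entry.path,
                "size_bytes": entry.stat().st_size,
            })
    return records


# Demo: a temporary directory stands in for the watched storage location.
with tempfile.TemporaryDirectory() as watch_dir:
    seen_files: Set[str] = set()
    open(os.path.join(watch_dir, "case_001.svs"), "wb").close()
    first_scan = scan_new_slides(watch_dir, seen_files)   # reports case_001.svs
    second_scan = scan_new_slides(watch_dir, seen_files)  # nothing new
```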

In step S200, the management apparatus 100 generates an annotation job for the pathological slide image. Here, the annotation job may be defined based on information such as the pathological slide image, a dataset type, an annotation task, and a patch that is a partial region (i.e., the region to be annotated) of the pathological slide image (see FIG. 4). Step S200 is described in detail later with reference to FIGS. 8 to 23.

In step S300, the management apparatus 100 selects an annotator to perform the generated annotation job.

In some embodiments, as shown in FIG. 6, the management apparatus 100 may automatically select an annotator based on management information 54 to 56 of annotators 51 to 53, such as job performance history (e.g., frequently performed annotation jobs), evaluation results (or verification results) of previously performed jobs, and current job status (e.g., progress of currently assigned jobs). For example, the management apparatus 100 may select, as the annotator for the generated annotation job, a first annotator who has frequently performed jobs related to the generated annotation job, a second annotator whose annotation results on related jobs were excellent, or a third annotator who does not currently have many jobs in progress.

Here, whether the job performance history includes a job related to the generated annotation job may be determined based on whether the combinations of dataset type and annotation task panel of the respective jobs are similar to each other. Alternatively, it may be determined based on whether the combinations of annotation task panel and task class are similar to each other. Of course, it may also be determined based on whether both combinations are similar.
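The selection criteria above (related experience, past evaluation results, and current workload) could be combined into a single score, for instance as follows. This is only a sketch: the equal weighting, the `max_open` cap, and the use of exact equality of (dataset type, panel) pairs in place of the similarity judgment described above are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (dataset type, panel): one of the combinations named in the description.
JobKey = Tuple[str, str]


@dataclass
class AnnotatorInfo:
    name: str
    history: List[JobKey]      # combinations of previously performed jobs
    mean_evaluation: float     # 0..1, evaluation results of past jobs
    open_jobs: int             # jobs currently in progress


def score(a: AnnotatorInfo, new_job: JobKey, max_open: int = 10) -> float:
    """Average of experience, evaluation, and availability (assumed weights)."""
    related = sum(1 for key in a.history if key == new_job)
    experience = related / len(a.history) if a.history else 0.0
    availability = max(0.0, 1.0 - a.open_jobs / max_open)
    return (experience + a.mean_evaluation + availability) / 3.0


def select_annotator(candidates: List[AnnotatorInfo], new_job: JobKey) -> AnnotatorInfo:
    return max(candidates, key=lambda a: score(a, new_job))


candidates = [
    AnnotatorInfo("A", [("training", "cell")] * 3 + [("training", "tissue")], 0.7, 8),
    AnnotatorInfo("B", [("training", "cell"), ("validation", "tissue")], 0.9, 1),
    AnnotatorInfo("C", [], 0.5, 0),
]
chosen = select_annotator(candidates, ("training", "cell"))
```

With these sample values, annotator B is chosen: A has more related experience but is heavily loaded, while C is idle but has no history.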

In some embodiments, when the new pathological slide image is important data (e.g., a slide image associated with a rare disease, a high-quality slide image), a plurality of annotators may be selected. The number of annotators may also be increased in proportion to the importance. In this case, the annotation results may be verified by comparing the job results of the plurality of annotators with one another. According to this embodiment, more rigorous verification is performed on important data, so the accuracy of the annotation results may be improved.

In step S400, the management apparatus 100 assigns the annotation job to the terminal 20 of the selected annotator. For example, the management apparatus 100 may assign the annotation job to the account of the selected annotator.

In step S500, annotation is performed on the annotator terminal 20. The annotator may perform the annotation using an annotation tool installed on the terminal 20 or an annotation service provided through the web (e.g., the job management web page), but the technical scope of the present disclosure is not limited thereto.

Some examples of the annotation tool are shown in FIG. 7. As shown in FIG. 7, the annotation tool 60 may include a first area 63 and a second area 61. The second area 61 may include a patch area 68, on which the actual annotation is to be performed, and a zoom indicator 65. As shown in FIG. 7, the patch area 68 may be highlighted, for example with a box line. Job information 67 may be displayed in the first area 63, which may further include a tool area 69. The tool area 69 may include selectable tools corresponding to the respective annotations. Accordingly, instead of writing an annotation directly in the patch area 68, the annotator can simply tag the patch area 68 with an annotation using a selected tool (e.g., selecting a first tool by clicking and then clicking the patch area 68 to tag it). Because the types of annotations shown in the tool area 69 vary with the annotation job, the annotation tool 60 may set up appropriate annotation tools based on the annotation job information.

It should be noted that the annotation tool 60 shown in FIG. 7 is merely one example of a tool designed for the convenience of the annotator. That is, the annotation tool may be implemented in any manner. The description continues below, referring again to FIG. 5.

In step S600, the annotator terminal 20 provides the result of the annotation job. The result of the annotation job may be label information tagged on the corresponding patch.

In step S700, the management apparatus 100 verifies (evaluates) the job result. The verification result may be recorded as an evaluation result of the corresponding annotator. The specific manner of performing the verification may vary by embodiment.

In some embodiments, verification may be performed automatically based on an output of a machine learning model. Specifically, when first annotation result data is obtained from the annotator to whom the job was assigned, the first annotation result data may be compared with the result obtained by inputting the patch of the annotation job into the machine learning model. If the comparison shows that the difference between the two results exceeds a reference value, approval of the first annotation result data may be withheld or the data may be disapproved.

Here, the reference value may be a preset fixed value or a variable value that changes with the situation. For example, the reference value may be a value that becomes smaller as the accuracy of the machine learning model becomes higher.
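A minimal sketch of this automatic verification follows, assuming that both results are per-item label sequences, that the "difference" is the fraction of items on which they disagree, and that the reference value shrinks linearly with model accuracy. None of these specifics come from the disclosure; `reference_value`, `verify`, and the `base` parameter are hypothetical.

```python
from typing import Sequence


def reference_value(model_accuracy: float, base: float = 0.5) -> float:
    """Allowed disagreement; smaller when the model is more accurate (assumed rule)."""
    if not 0.0 <= model_accuracy <= 1.0:
        raise ValueError("model_accuracy must be in [0, 1]")
    return base * (1.0 - model_accuracy)


def verify(annotator_labels: Sequence[str],
           model_labels: Sequence[str],
           model_accuracy: float) -> str:
    """Compare annotator labels with model output and decide on approval."""
    if not annotator_labels or len(annotator_labels) != len(model_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    disagreements = sum(a != m for a, m in zip(annotator_labels, model_labels))
    difference = disagreements / len(annotator_labels)
    # Exceeding the reference value withholds approval, as in the description.
    return "withheld" if difference > reference_value(model_accuracy) else "approved"


annotator = ["mitosis", "normal", "normal", "mitosis", "normal",
             "normal", "normal", "mitosis", "normal", "normal"]
model = ["mitosis", "normal", "mitosis", "mitosis", "normal",
         "normal", "normal", "normal", "normal", "normal"]

strict = verify(annotator, model, model_accuracy=0.9)   # tolerance about 0.05
lenient = verify(annotator, model, model_accuracy=0.5)  # tolerance 0.25
```

The two label lists disagree on 2 of 10 items, so the same annotation result is withheld when the model is trusted at 0.9 accuracy and approved when it is trusted at 0.5.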

In step S800, the management apparatus 100 determines whether the annotation job needs to be performed again. For example, if the verification in step S700 was not successful, the management apparatus 100 may determine that rework is required.

In step S900, in response to the determination that rework is required, the management apparatus 100 selects another annotator and reassigns the annotation job to that annotator. The other annotator may be selected in a manner similar to that described for step S300. Alternatively, the other annotator may be a reviewer or the best-performing machine learning model.

Although not shown in FIG. 5, after step S900 the first annotation result data may be verified again based on second annotation result data from the other annotator. Specifically, when the second annotation result data is obtained, a similarity between the first annotation result data and the second annotation result data may be calculated. If the similarity is less than a reference value, the first annotation result data may be finally disapproved. This outcome may be recorded in the job performance history of the corresponding annotator.
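The disclosure does not fix a similarity measure. As one possible illustration, the sketch below uses the Jaccard index over sets of (item, label) pairs; the measure, the data representation, and the default reference value are all assumptions of this example.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str]  # (annotated item id, label); a hypothetical representation


def jaccard(first: Set[Annotation], second: Set[Annotation]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    if not first and not second:
        return 1.0
    return len(first & second) / len(first | second)


def final_decision(first: Set[Annotation], second: Set[Annotation],
                   reference: float = 0.6) -> str:
    # Similarity below the reference value leads to final disapproval.
    return "disapproved" if jaccard(first, second) < reference else "approved"


first_result = {("c1", "mitosis"), ("c2", "normal"), ("c3", "mitosis")}
second_result = {("c1", "mitosis"), ("c2", "normal"), ("c3", "normal")}

similarity = jaccard(first_result, second_result)  # 2 shared pairs out of 4 distinct
decision = final_decision(first_result, second_result)
```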

The annotation job management method according to some embodiments of the present disclosure has been described so far with reference to FIGS. 4 to 7. According to the method described above, because annotation jobs are automated overall, convenience for the administrator increases and overall work efficiency may improve greatly. Accordingly, the time and human costs required for annotation jobs may be greatly reduced. In addition, as the burden of annotation jobs decreases, the usability of machine learning techniques may increase further.

Furthermore, by verifying annotation job results against the results of a machine learning model or of another annotator, the accuracy of the annotation results can be ensured. Accordingly, the performance of a machine learning model trained on the annotation results may also improve.

Hereinafter, the detailed process of the annotation job generation step S200 is described with reference to FIGS. 8 to 22.

FIG. 8 is an exemplary flowchart illustrating an annotation job generation method according to some embodiments of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 8, the annotation job generation method starts at step S210, in which the dataset type of the new pathological slide image is determined. As described above, the dataset type indicates the intended use of the pathological slide image, and the use may be classified as training, validation, test, Observer Performance Test (OPT), or the like.

In some embodiments, the dataset type may be determined by the administrator's selection.

In some other embodiments, the dataset type may be determined based on a confidence score of a machine learning model for the pathological slide image. Here, the machine learning model refers to a model (i.e., the model to be trained) that performs a specific task (e.g., lesion classification, lesion localization) based on pathological slide images. Details of this embodiment are shown in FIG. 9. As shown in FIG. 9, the management apparatus 100 inputs the pathological slide image into the machine learning model, obtains a confidence score as a result (S211), and determines whether the confidence score is greater than or equal to a reference value (S213). In response to a determination that the score is below the reference value, the management apparatus 100 sets the dataset type of the pathological slide image to training (S217). This is because a confidence score below the reference value means that the machine learning model cannot clearly interpret the pathological slide image (i.e., the model needs to be trained on that image). Otherwise, the dataset type of the pathological slide image is set to validation (or test) (S215).
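The decision in steps S213 to S217 can be illustrated with a short sketch. It assumes that the confidence score is the largest class probability output by the model and that the reference value is 0.8; both are assumptions of this example rather than details from the disclosure.

```python
from typing import Sequence


def dataset_type_from_confidence(class_scores: Sequence[float],
                                 reference: float = 0.8) -> str:
    """Low confidence -> training (S217); otherwise validation (S215)."""
    confidence = max(class_scores)  # assumed definition of the confidence score
    return "validation" if confidence >= reference else "training"


uncertain_slide = dataset_type_from_confidence([0.55, 0.45])
confident_slide = dataset_type_from_confidence([0.95, 0.05])
```

Here the first slide, on which the model is unsure, is routed to the training dataset, and the second to the validation dataset.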

In still other embodiments, the dataset type may be determined based on an entropy value of the machine learning model for the pathological slide image. The entropy value is an indicator of uncertainty and becomes larger as the confidence scores are distributed more evenly across classes. In this embodiment, in response to a determination that the entropy value is greater than or equal to a reference value, the dataset type may be set to training. Otherwise, it may be set to validation.
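As a sketch of the entropy-based variant, the example below computes the Shannon entropy $H = -\sum_i p_i \ln p_i$ of the per-class confidence scores, assuming they form a probability distribution. The choice of Shannon entropy with the natural logarithm and the reference value of 0.5 are assumptions of this example.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dataset_type_from_entropy(probs: Sequence[float], reference: float = 0.5) -> str:
    """High entropy (high uncertainty) -> training; otherwise validation."""
    return "training" if entropy(probs) >= reference else "validation"


even = dataset_type_from_entropy([0.5, 0.5])      # entropy = ln 2, about 0.693
peaked = dataset_type_from_entropy([0.95, 0.05])  # entropy about 0.199
```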

Referring again to FIG. 8, in step S230 the management apparatus 100 determines the panel type of the pathological slide image. As described above, the panel type may be classified as a cell panel, a tissue panel, a structure panel, or the like. An example of a cell panel image is shown in FIG. 10, an example of a tissue panel image is shown in FIG. 11, and an example of a structure panel image is shown in FIG. 12. As shown in FIGS. 10 to 12, the cell panel is a patch type on which cell-level annotation is performed, the tissue panel is a patch type on which tissue-level annotation is performed, and the structure panel is a patch type on which annotation related to the structure of cells, tissues, or the like is performed.

In some embodiments, the panel type may be determined by the administrator's selection.

In some embodiments, the panel type may be determined based on an output value of a machine learning model. To elaborate with reference to FIG. 13, the machine learning models may include a first machine learning model 75-1 corresponding to the cell panel (i.e., a model trained on cell-level annotations), a second machine learning model 75-2 corresponding to the tissue panel, and a third machine learning model 75-3 corresponding to the structure panel. In this case, the management apparatus 100 may extract (or sample) first to third images 73-1 to 73-3 corresponding to the respective panels from a given pathological slide image 71, input each image into the corresponding model 75-1 to 75-3, and obtain output values 77-1 to 77-3 as a result. The management apparatus 100 may then determine the panel type of the pathological slide image 71 according to the result of comparing the output values 77-1 to 77-3 with a reference value. For example, when the first output value 77-1 is less than the reference value, the panel type of the pathological slide image 71 may be determined to be the cell panel. This is because cell patches extracted from the pathological slide image 71 are likely to be effective in improving the performance of the first machine learning model 75-1 through training.
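The comparison against the reference value could look like the following sketch. It assumes that each output value 77-1 to 77-3 is a confidence-like score in [0, 1], that a single shared reference value of 0.7 is used, and that every panel whose score falls below it is selected (the description notes separately that a slide may have multiple panel types). The dictionary keys and function name are hypothetical.

```python
from typing import Dict, List

PANELS = ("cell", "tissue", "structure")


def panel_types(outputs: Dict[str, float], reference: float = 0.7) -> List[str]:
    """Select each panel whose model output is below the reference value."""
    return [panel for panel in PANELS if outputs[panel] < reference]


# Hypothetical outputs of models 75-1 to 75-3 for images 73-1 to 73-3.
outputs = {"cell": 0.55, "tissue": 0.90, "structure": 0.85}
selected = panel_types(outputs)
```

With these sample outputs only the cell-panel model scores below the reference value, so the slide is treated as a cell-panel slide.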

In some embodiments, a pathological slide image may have a plurality of panel types. In this case, patches corresponding to each panel may be extracted from the pathological slide image.

Referring back to FIG. 8, in step S250, the management device 100 determines an annotation task. As described above, the annotation task refers to an entity defined by a detailed job type.

In some embodiments, the annotation task may be determined by an administrator's selection.

In some embodiments, the annotation task may be determined automatically based on the combination of the determined dataset type and panel type. For example, when an annotation task matching a combination of a dataset type and a panel type is predefined, the matching annotation task may be determined automatically based on that combination.
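As a minimal sketch of such a predefined mapping, a lookup table keyed by the combination could be used. The task identifiers below are hypothetical placeholders; the disclosure does not enumerate concrete task names at this point.

```python
from typing import Dict, Optional, Tuple

# Hypothetical task identifiers, keyed by (dataset type, panel type).
PREDEFINED_TASKS: Dict[Tuple[str, str], str] = {
    ("training", "cell"): "cell_annotation_for_training",
    ("training", "tissue"): "tissue_annotation_for_training",
    ("validation", "cell"): "cell_annotation_for_validation",
}


def resolve_annotation_task(dataset_type: str, panel_type: str) -> Optional[str]:
    """Return the predefined task for the combination, or None if absent."""
    return PREDEFINED_TASKS.get((dataset_type, panel_type))
```

When no task is predefined for a combination, the function returns `None`, in which case the task could instead be chosen by the administrator as in the preceding embodiment.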

In step S270, the management device 100 automatically extracts, from the pathological slide image, a patch on which annotation is actually to be performed. Of course, a region designated by an administrator may also be extracted as a patch. The specific method of automatically extracting the patch may vary according to embodiments; various embodiments related to patch extraction are described later with reference to FIGS. 14 to 23.

Although not shown in FIG. 8, after step S270, the management device 100 may generate an annotation job based on the dataset type, panel type, annotation task, and patch determined in steps S210 to S270. As described above, the generated annotation job may be assigned to the account of an appropriate annotator.
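One possible way to represent an annotation job built from these four elements is sketched below. The class names, field names, and identifier formats are assumptions made for illustration, and the patch is assumed to be a rectangle in pixel coordinates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patch:
    """Rectangular region of the slide, in pixel coordinates (assumed)."""
    left: int
    top: int
    width: int
    height: int


@dataclass
class AnnotationJob:
    slide_id: str
    dataset_type: str      # "training" or "validation"
    panel_type: str        # "cell", "tissue", or "structure"
    annotation_task: str
    patch: Patch
    annotator_account: Optional[str] = None  # unset until the job is assigned


def assign_job(job: AnnotationJob, annotator_account: str) -> AnnotationJob:
    """Record which annotator account the job is assigned to."""
    job.annotator_account = annotator_account
    return job


# Hypothetical identifiers, for illustration only.
job = AnnotationJob(
    slide_id="slide-001",
    dataset_type="training",
    panel_type="cell",
    annotation_task="cell_annotation_for_training",
    patch=Patch(left=0, top=0, width=256, height=256),
)
assign_job(job, "annotator-42")
```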

So far, an annotation job generation method according to some embodiments of the present disclosure has been described with reference to FIGS. 8 to 13. Hereinafter, various embodiments of the present disclosure related to automatic patch extraction will be described with reference to FIGS. 14 to 23.

FIG. 14 is an exemplary flowchart illustrating an automatic patch extraction method according to a first embodiment of the present disclosure. This is merely a preferred embodiment for achieving the object of the present disclosure, and some steps may of course be added or deleted as needed.

As shown in FIG. 14, the automatic patch extraction method starts with step S271 of sampling a plurality of candidate patches from a new pathological slide image. The specific method of sampling the plurality of candidate patches may vary according to embodiments.

In some embodiments, when at least cell regions constituting a specific tissue are sampled as candidate patches (that is, patches of the cell panel type), a tissue region 83 may be extracted from a pathological slide image 81 through image analysis, and a plurality of candidate patches 85 may be sampled within the extracted region 83, as shown in FIG. 15. Some examples of sampling results are shown in FIGS. 16 and 17. In the pathological slide images 87 and 89 shown in FIGS. 16 and 17, each point denotes a sampling point, and each rectangle denotes a sampling region (that is, a candidate patch region). As shown in FIGS. 16 and 17, the candidate patches may be sampled such that at least some of them overlap.

In some embodiments, candidate patches may be generated by uniformly dividing the entire area of the pathological slide image and sampling each of the divided regions. That is, sampling may be performed by equal division. In this case, the size of each candidate patch may be a preset fixed value or a variable value determined based on the size, resolution, panel type, and the like of the pathological slide image.
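A minimal sketch of equal-division sampling with a fixed patch size is given below. The function name and the tuple layout are assumptions of this sketch, and partial tiles at the right and bottom edges are simply dropped, which is one choice among several that the disclosure leaves open.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def sample_grid_patches(image_width: int, image_height: int, patch_size: int) -> List[Region]:
    """Divide the image into non-overlapping square patches of a fixed size."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    patches: List[Region] = []
    # Step through the image row by row; tiles that would cross the image
    # border are skipped.
    for top in range(0, image_height - patch_size + 1, patch_size):
        for left in range(0, image_width - patch_size + 1, patch_size):
            patches.append((left, top, patch_size, patch_size))
    return patches
```

For a 1000 x 600 pixel image and a patch size of 256, this yields three columns and two rows of candidate patches.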

In some embodiments, candidate patches may be generated by randomly dividing the entire area of the pathological slide image and sampling each of the divided regions.

In some embodiments, candidate patches may be formed such that the number of objects in each patch exceeds a reference value. For example, object recognition may be performed on the entire area of the pathological slide image, and a region in which the number of recognized objects exceeds the reference value may be sampled as a candidate patch. In this case, the candidate patches may differ in size.
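The object-count criterion might be sketched as follows, assuming that object recognition has already produced a list of object center coordinates and that a set of regions to test is given. Both assumptions, and all names used here, are introduced only for this illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]              # (x, y) center of a recognized object
Region = Tuple[int, int, int, int]   # (left, top, width, height) in pixels


def count_objects(region: Region, objects: List[Point]) -> int:
    """Count the objects whose center lies inside the region."""
    left, top, width, height = region
    return sum(
        1
        for x, y in objects
        if left <= x < left + width and top <= y < top + height
    )


def sample_dense_regions(
    regions: List[Region], objects: List[Point], reference_count: int
) -> List[Region]:
    """Keep the regions containing more objects than the reference count."""
    return [r for r in regions if count_objects(r, objects) > reference_count]
```

Because the regions are supplied by the caller, they need not share a common size, consistent with the statement that candidate patches may differ in size.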

In some embodiments, candidate patches may be sampled according to a policy determined based on metadata of the pathological slide image. Here, the metadata may include a disease name, a tissue, patient demographic information, the location of a medical institution, the quality (e.g., resolution) of the pathological slide image, its format, and the like associated with the pathological slide image. As a specific example, when the pathological slide image is an image of tissue from a tumor patient, candidate patches may be sampled at the cell level so that they can be used as training data for a machine learning model that detects mitotic cells. As another example, when the location of a lesion within the tissue is important for diagnosing the prognosis of the disease associated with the pathological slide image, candidate patches may be sampled at the tissue level.

In some embodiments, when candidate patches of the structure panel type are sampled from the pathological slide image, outlines may be extracted from the pathological slide image through image analysis, and sampling may be performed such that outlines connected to one another among the extracted outlines form a single candidate patch.

As described above, the specific method of sampling the plurality of candidate patches in step S271 may vary according to embodiments. The description continues with reference to FIG. 14.

In step S273, an annotation target patch is selected based on an output value of a machine learning model. The output value may be, for example, a confidence score (or a per-class confidence score), and the specific method of selecting a patch based on the confidence score may vary according to embodiments.

In some embodiments, the annotation target patch may be selected based on an entropy value calculated from the per-class confidence scores. Details of this embodiment are shown in FIGS. 18 and 19.

As shown in FIG. 18, an annotation target patch 93 may be selected from candidate patches 92 sampled from a pathological slide image 91 through entropy-based uncertainty sampling. More specifically, as shown in FIG. 19, entropy values 97-1 to 97-n may be calculated based on the per-class confidence scores 96-1 to 96-n that a machine learning model 95 outputs for the respective candidate patches 94-1 to 94-n. As described above, the entropy value becomes larger as the confidence scores are distributed more evenly across the classes. For example, in the case shown in FIG. 19, entropy A 97-1 is calculated as the largest value and entropy C 97-n as the smallest. Candidate patches whose entropy value is equal to or greater than a reference value may then be automatically selected as annotation targets, because a high entropy value means that the prediction of the machine learning model is uncertain, which in turn means that the data is more effective for training.
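The entropy-based selection can be sketched as below, assuming Shannon entropy with the natural logarithm, H = -Σ p_c ln p_c, computed over per-class confidence scores normalized to sum to one. The disclosure does not fix the exact entropy formula, so this choice, the function names, and the sample scores are assumptions of the sketch.

```python
import math
from typing import Dict, List, Sequence


def entropy(confidence_scores: Sequence[float]) -> float:
    """Shannon entropy (natural log) of normalized per-class scores."""
    total = sum(confidence_scores)
    if total <= 0:
        raise ValueError("confidence scores must have a positive sum")
    probabilities = [score / total for score in confidence_scores]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_by_entropy(
    scores_by_patch: Dict[str, Sequence[float]], reference_value: float
) -> List[str]:
    """Return ids of patches whose entropy is at or above the reference value."""
    return [
        patch_id
        for patch_id, scores in scores_by_patch.items()
        if entropy(scores) >= reference_value
    ]


# Illustrative scores: patch "A" is nearly uniform, patch "C" is confident.
example_scores = {
    "A": [0.34, 0.33, 0.33],
    "B": [0.60, 0.30, 0.10],
    "C": [0.98, 0.01, 0.01],
}
print(select_by_entropy(example_scores, reference_value=1.0))
```

With three classes the entropy is bounded by ln 3 (about 1.0986), so a reference value of 1.0 keeps only patches whose scores are close to uniform.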

In some embodiments, the annotation target patch may be selected based on the confidence score itself. For example, among the plurality of candidate patches, a candidate patch whose confidence score is less than a reference value may be selected as the annotation target patch.

FIG. 20 is an exemplary flowchart illustrating an automatic patch extraction method according to a second embodiment of the present disclosure. This is merely a preferred embodiment for achieving the object of the present disclosure, and some steps may of course be added or deleted as needed. For clarity, descriptions that overlap with the foregoing embodiment are omitted.

As shown in FIG. 20, the second embodiment also starts with step S271 of sampling a plurality of candidate patches. The second embodiment differs from the foregoing embodiment, however, in that the annotation target patch is selected based on the misprediction probability of the machine learning model (see S275).

The misprediction probability of the machine learning model may be calculated by a misprediction probability calculation model (hereinafter, the "calculation model") built through machine learning. For ease of understanding, a method of building the calculation model is described first with reference to FIGS. 21 and 22.

As shown in FIG. 21, the calculation model may be built by learning evaluation results (e.g., validation results or test results) of the machine learning model (S291 to S295). Specifically, the machine learning model is evaluated with evaluation data (S291), the evaluation result is tagged to the evaluation data as label information (S293), and the calculation model is built by training on the evaluation data with that label information (S295).

Some examples of tagging label information to the evaluation data are shown in FIG. 22. FIG. 22 shows a confusion matrix; when the machine learning model performs a classification task, an evaluation result corresponds to a specific cell of the confusion matrix. As shown in FIG. 22, an image 101 whose evaluation result is FP (false positive) or FN (false negative) may be tagged with a first value (e.g., 0) as a label value 102, and an image 103 whose evaluation result is TP (true positive) or TN (true negative) may be tagged with a second value (e.g., 1) as a label value 104. That is, "1" may be tagged when the prediction of the machine learning model matches the correct answer, and "0" may be tagged when it does not.

When the images 101 and 103 and their label information are learned, the calculation model outputs a high confidence score when an image similar to one that the machine learning model predicted correctly is input, and a low confidence score in the opposite case. The calculation model can therefore calculate the misprediction probability of the machine learning model for an input image.

It should be noted that FIG. 22 shows only some examples of tagging label information. According to some other embodiments of the present disclosure, a prediction error may be tagged as the label information. Here, the prediction error means the difference between a predicted value (that is, a confidence score) and an actual value (that is, correct answer information).

In addition, according to still other embodiments of the present disclosure, a first value (e.g., 0) may be tagged as the label information when the prediction error of an evaluation image is equal to or greater than a threshold value, and a second value (e.g., 1) may be tagged when the prediction error is less than the threshold value.
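The two binary labeling schemes described above can be sketched as follows, using the convention in which 1 marks a correct prediction and 0 marks a misprediction. The function names are hypothetical, and the prediction error is assumed here to be the absolute difference between the confidence score and the ground-truth value.

```python
def label_from_outcome(predicted_class: int, true_class: int) -> int:
    """Return 1 for a match (TP or TN) and 0 for a mismatch (FP or FN)."""
    return 1 if predicted_class == true_class else 0


def label_from_error(confidence_score: float, true_value: float, threshold: float) -> int:
    """Return 0 when the prediction error reaches the threshold, else 1."""
    prediction_error = abs(confidence_score - true_value)
    return 0 if prediction_error >= threshold else 1


# Evaluation records: (predicted class, true class), illustrative values only.
evaluation_results = [(1, 1), (1, 0), (0, 0), (0, 1)]
labels = [label_from_outcome(pred, true) for pred, true in evaluation_results]
```

Each evaluation image paired with its label would then serve as one training example for the calculation model.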

The description continues with reference to FIG. 20.

Once the calculation model has been built according to the method described above, in step S275, the management device 100 may calculate a misprediction probability for each of the plurality of candidate patches. For example, as shown in FIG. 23, the management device 100 may input each data sample 111-1 to 111-n to the calculation model 113 to obtain confidence scores 115-1 to 115-n of the calculation model 113, and may calculate the misprediction probability based on the obtained confidence scores 115-1 to 115-n.

However, as shown in FIG. 23, if the calculation model 113 has been trained to output confidence scores 115-1 to 115-n for a correct-answer class and an incorrect-answer class when the candidate patches 111-1 to 111-n are input (e.g., trained with label 1 when the prediction matched the correct answer and with label 0 when it did not), the confidence score of the incorrect-answer class (shown underlined) may be used as the misprediction probability.

Once the misprediction probability of each candidate patch has been calculated, the management device 100 may select, from among the plurality of candidate patches, the candidate patches whose calculated misprediction probability is equal to or greater than a reference value as annotation targets. A high misprediction probability means that the prediction of the machine learning model is likely to be wrong, which in turn means that the patch is important data for improving the performance of the machine learning model. When patches are selected based on the misprediction probability in this way, patches that are effective for training are selected as annotation targets, so that a high-quality training dataset can be generated.
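A minimal sketch of this selection step is shown below. It assumes that the calculation model returns a pair of confidence scores per patch, one for the correct-answer class and one for the incorrect-answer class, and that the latter is used directly as the misprediction probability. The names and sample values are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical output format of the calculation model for one patch:
# (score for the correct-answer class, score for the incorrect-answer class).
ClassScores = Tuple[float, float]


def misprediction_probability(scores: ClassScores) -> float:
    """Use the incorrect-answer class score as the misprediction probability."""
    _, incorrect_score = scores
    return incorrect_score


def select_annotation_targets(
    scores_by_patch: Dict[str, ClassScores], reference_value: float
) -> List[str]:
    """Return ids of patches whose misprediction probability meets the reference."""
    return [
        patch_id
        for patch_id, scores in scores_by_patch.items()
        if misprediction_probability(scores) >= reference_value
    ]


scored = {"p1": (0.9, 0.1), "p2": (0.3, 0.7), "p3": (0.5, 0.5)}
print(select_annotation_targets(scored, reference_value=0.5))
```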

So far, automatic patch extraction methods according to various embodiments of the present disclosure have been described with reference to FIGS. 14 to 23. According to the methods described above, a patch indicating a region in which annotation is to be performed can be extracted automatically, so that the administrator's workload can be minimized. In addition, only the patches that are effective for training are selected as annotation targets from among the plurality of candidate patches, based on the misprediction probability, the entropy value, and the like of the machine learning model. Accordingly, the amount of annotation work can be reduced and a high-quality training dataset can be generated.

Hereinafter, an exemplary computing device 200 capable of implementing a device (e.g., the management device 100) or system according to various embodiments of the present disclosure will be described with reference to FIG. 24.

FIG. 24 is an exemplary hardware configuration diagram illustrating an exemplary computing device 200 capable of implementing a device according to various embodiments of the present disclosure.

As shown in FIG. 24, the computing device 200 may include one or more processors 210, a bus 250, a communication interface 270, a memory 230 into which a computer program executed by the processor 210 is loaded, and a storage 290 that stores the computer program 291. Only the components related to the embodiments of the present disclosure are shown in FIG. 24. Accordingly, those of ordinary skill in the art to which the present disclosure pertains will understand that other general-purpose components may be included in addition to the components shown in FIG. 24.

The processor 210 controls the overall operation of each component of the computing device 200. The processor 210 may include a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any type of processor well known in the technical field of the present disclosure. The processor 210 may also perform operations for at least one application or program for executing the methods according to the embodiments of the present disclosure. The computing device 200 may include one or more processors.

The memory 230 stores various data, commands, and/or information. The memory 230 may load one or more programs 291 from the storage 290 in order to execute the methods or operations according to various embodiments of the present disclosure. The memory 230 may be implemented as a volatile memory such as RAM, but the technical scope of the present disclosure is not limited thereto.

The bus 250 provides a communication function between the components of the computing device 200. The bus 250 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

The communication interface 270 supports wired and wireless Internet communication of the computing device 200. The communication interface 270 may also support various communication methods other than Internet communication. To this end, the communication interface 270 may include a communication module well known in the technical field of the present disclosure.

The storage 290 may store the one or more programs 291 non-transitorily. The storage 290 may include a non-volatile memory such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), or a flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the technical field to which the present disclosure pertains.

The computer program 291 may include one or more instructions that, when loaded into the memory 230, cause the processor 210 to perform the operations or methods according to various embodiments of the present disclosure. That is, the processor 210 may perform the operations or methods according to various embodiments of the present disclosure by executing the one or more instructions.

For example, the computer program 291 may include one or more instructions for performing an operation of obtaining information on a new pathological slide image, an operation of determining a dataset type and a panel of the pathological slide image, and an operation of assigning, to an annotator account, an annotation job defined by the pathological slide image, the determined dataset type, an annotation task, and a patch that is a partial region of the pathological slide image. In this case, the management device 100 according to some embodiments of the present disclosure may be implemented through the computing device 200.

So far, an exemplary computing device capable of implementing a device according to various embodiments of the present disclosure has been described with reference to FIG. 24.

The technical idea of the present disclosure described so far with reference to FIGS. 1 to 24 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (a CD, a DVD, a Blu-ray disc, a USB storage device, or a removable hard disk) or a fixed recording medium (a ROM, a RAM, or a hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, and may thereby be used on that computing device.

Although all the components constituting the embodiments of the present disclosure have been described above as being combined into one or operating in combination, the technical idea of the present disclosure is not necessarily limited to such embodiments. That is, within the scope of the object of the present disclosure, one or more of the components may be selectively combined to operate.

Although operations are shown in a specific order in the drawings, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all the illustrated operations must be performed, in order to obtain a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as being necessarily required; the described program components and systems may generally be integrated into a single software product or packaged into multiple software products.

Although embodiments of the present disclosure have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present disclosure pertains will understand that the present disclosure may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present disclosure should be interpreted according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the technical idea defined by the present disclosure.

Claims

A terminal comprising:
a communication interface;
a memory storing one or more instructions; and
a processor that executes the instructions,
wherein the processor, by executing the instructions:
receives an annotation job for a pathological slide image from a management device through the communication interface;
displays an annotation tool including a first area in which information related to the annotation job is displayed and a second area in which information related to a partial region of the pathological slide image is displayed;
receives an input of an annotator through the annotation tool;
performs annotation based on the input of the annotator; and
provides a result of the annotation to the management device through the communication interface.

The terminal of claim 1,
wherein the annotation job is determined based on a patch corresponding to the partial region and at least one of a dataset type indicating a usage purpose of the pathological slide image and a panel type of the pathological slide image.

The terminal of claim 2,
wherein the usage purpose is selected from a plurality of purposes including training of a machine learning model and validation of the machine learning model, and
the panel type is selected from a plurality of panels including a cell panel, a tissue panel, and a structure panel.

The terminal of claim 1,
wherein the first area includes an area in which information related to the annotation job is displayed and an area in which a plurality of tools for annotation are displayed, and
the processor selects at least one tool among the plurality of tools based on the input of the annotator.

The terminal of claim 1,
wherein the second area includes a patch region in which the partial region of the pathological slide image is displayed for annotation, and an indicator for controlling enlargement or reduction of the patch region.

The terminal of claim 1,
wherein the result of the annotation includes label information tagged to the partial region.

A management device comprising:
a communication interface;
a memory storing one or more instructions; and
a processor that executes the instructions,
wherein the processor, by executing the instructions:
transmits an annotation job for a pathological slide image to a terminal of an annotator through the communication interface;
receives, through the communication interface, a result of the annotation job performed through an annotation tool on the terminal;
verifies the result of the annotation job using a machine learning model; and
records a result of the verification as an evaluation result of the annotator.

The management device of claim 7,
wherein the annotation tool includes a first area in which information related to the annotation job is displayed and a second area in which information related to a partial area of the pathological slide image is displayed.

The management device of claim 7,
wherein the processor is configured to:
input the pathological slide image to the machine learning model; and
verify the result of the annotation job by comparing it with a result output from the machine learning model.
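
The comparison recited above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the annotator's result and the model's output are available as equal-length sequences of class labels (for example, one label per cell or per pixel of the patch); the claims do not specify the representation or the comparison metric, and the function name `disagreement_rate` is hypothetical.

```python
from typing import Sequence


def disagreement_rate(annotator_labels: Sequence[str],
                      model_labels: Sequence[str]) -> float:
    """Return the fraction of positions where the two label sequences differ.

    Assumption: both results label the same items in the same order.
    """
    if len(annotator_labels) != len(model_labels):
        raise ValueError("label sequences must have the same length")
    if not annotator_labels:
        return 0.0  # nothing to compare, so no disagreement
    mismatches = sum(a != m for a, m in zip(annotator_labels, model_labels))
    return mismatches / len(annotator_labels)


if __name__ == "__main__":
    annotator = ["tumor", "normal", "tumor", "normal"]
    model = ["tumor", "tumor", "tumor", "normal"]
    print(disagreement_rate(annotator, model))
```

Other measures, such as an overlap score between segmented regions, could serve the same purpose; the mismatch fraction is used here only because it is simple to check by hand.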

The management device of claim 9,
wherein the processor is configured to:
select another annotator when the verification indicates that the annotation job needs to be performed again; and
deliver the annotation job to a terminal of the other annotator.

The management device of claim 9,
wherein the processor determines that the annotation job should be performed again when a difference between the result of the annotation job and the result output from the machine learning model exceeds a reference value.
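
The re-annotation decision and the reassignment to another annotator, as recited in the two claims above, might be sketched as follows. The function names, the use of a scalar difference, and the first-available selection policy are all assumptions made for illustration; the claims state only that the job is redone when the difference exceeds a reference value and that another annotator is selected.

```python
from typing import Optional, Sequence


def needs_reannotation(difference: float, reference_value: float) -> bool:
    """True only when the difference strictly exceeds the reference value."""
    return difference > reference_value


def pick_other_annotator(current: str, pool: Sequence[str]) -> Optional[str]:
    """Return the first annotator in the pool other than the current one.

    Hypothetical policy: a real system might rank annotators by their
    recorded evaluation results instead of taking the first candidate.
    """
    for candidate in pool:
        if candidate != current:
            return candidate
    return None  # no alternative annotator is available


if __name__ == "__main__":
    if needs_reannotation(difference=0.3, reference_value=0.2):
        print(pick_other_annotator("annotator_a", ["annotator_a", "annotator_b"]))
```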

The management device of claim 7,
wherein the processor is configured to:
determine at least one of a dataset type indicating a usage purpose of the pathological slide image and a panel type of the pathological slide image;
determine a patch corresponding to a partial region of the pathological slide image; and
generate the annotation job based on the patch and at least one of the dataset type and the panel type.

The management device of claim 12,
wherein the processor is configured to:
determine the usage purpose from among a plurality of purposes including a training purpose of the machine learning model and a validation purpose of the machine learning model; and
determine the panel type from among a plurality of panels including a cell panel, a tissue panel, and a structure panel.
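
One way to picture the job definition recited in the two claims above is as a small record type. The sketch below is an assumption-laden illustration: the class and field names (`AnnotationJob`, `Patch`, `slide_id`, and so on) and the rectangular patch coordinates are hypothetical and do not come from the claims, which require only a patch plus at least one of a dataset type and a panel type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DatasetType(Enum):
    TRAINING = "training"      # slide used to train the machine learning model
    VALIDATION = "validation"  # slide used to validate the machine learning model


class PanelType(Enum):
    CELL = "cell"
    TISSUE = "tissue"
    STRUCTURE = "structure"


@dataclass(frozen=True)
class Patch:
    # Hypothetical rectangular region of the slide, in pixel coordinates.
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class AnnotationJob:
    slide_id: str
    patch: Patch
    dataset_type: Optional[DatasetType] = None
    panel_type: Optional[PanelType] = None

    def __post_init__(self) -> None:
        # The claim requires at least one of the two attributes.
        if self.dataset_type is None and self.panel_type is None:
            raise ValueError("a job needs a dataset type or a panel type")


if __name__ == "__main__":
    job = AnnotationJob("slide-001", Patch(0, 0, 512, 512),
                        dataset_type=DatasetType.TRAINING,
                        panel_type=PanelType.CELL)
    print(job)
```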

The management device of claim 12,
wherein the processor is configured to:
input the pathological slide image to a machine learning model; and
determine at least one of the dataset type and the panel type based on an output value of the machine learning model.

The management device of claim 12,
wherein the processor is configured to:
select a plurality of candidate patches from the pathological slide image;
calculate a misprediction probability of the machine learning model for each of the plurality of candidate patches; and
select the patch from among the plurality of candidate patches based on the misprediction probability.
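
The patch selection recited above could look roughly like the following sketch. Two points are assumptions rather than claim language: the misprediction probability is approximated as one minus the model's highest class probability, and the candidate with the highest such value is selected. The claim says only that selection is based on the misprediction probability, so other estimators and selection rules are equally consistent with it.

```python
from typing import Dict, Sequence


def misprediction_probability(class_probs: Sequence[float]) -> float:
    """Approximate the chance that the model's top prediction is wrong.

    Assumption: class_probs is the model's probability distribution over
    classes for one patch, so 1 - max(class_probs) is the residual mass.
    """
    return 1.0 - max(class_probs)


def select_patch(candidates: Dict[str, Sequence[float]]) -> str:
    """Return the id of the candidate patch the model is least sure about."""
    if not candidates:
        raise ValueError("at least one candidate patch is required")
    return max(candidates,
               key=lambda pid: misprediction_probability(candidates[pid]))


if __name__ == "__main__":
    # Hypothetical patch ids mapped to hypothetical model outputs.
    candidates = {
        "patch_1": [0.90, 0.10],
        "patch_2": [0.55, 0.45],
        "patch_3": [0.70, 0.30],
    }
    print(select_patch(candidates))
```

Preferring patches the model is likely to get wrong directs annotation effort toward the examples expected to be most informative, which is the usual motivation for this kind of selection.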

The management device of claim 7,
wherein the result of the annotation job includes label information tagged in a partial area of the pathological slide image.

A method of performing annotation in a terminal, the method comprising:
receiving, through a communication interface, an annotation job for a pathological slide image from a management device;
displaying an annotation tool including a first area in which information related to the annotation job is displayed and a second area in which information related to a partial area of the pathological slide image is displayed;
receiving an input of an annotator through the annotation tool;
performing an annotation based on the input of the annotator; and
providing a result of the annotation to the management device through the communication interface.

A method of managing an annotation job in a management device, the method comprising:
transmitting an annotation job for a pathological slide image to a terminal of an annotator through a communication interface;
receiving, through the communication interface, a result of the annotation job performed through an annotation tool in the terminal;
verifying the result of the annotation job using a machine learning model; and
recording a result of the verification as an evaluation result of the annotator.