KR102081037B1

KR102081037B1 - Method for managing annotation job, apparatus and system supporting the same

Info

Publication number: KR102081037B1
Application number: KR1020200009995A
Authority: KR
Inventors: 이경원; 팽경현
Original assignee: 주식회사 루닛
Priority date: 2020-01-28
Filing date: 2020-01-28
Publication date: 2020-02-24

Abstract

Provided is a method for managing an annotation job, capable of efficiently performing an annotation job. The method for managing an annotation job comprises the following steps of: obtaining information on a new pathology slide image; determining a dataset type and a panel of the pathology slide image; and assigning an annotation job defined by the pathology slide image, the determined dataset type, an annotation task, and a patch which is a partial area of the pathology slide image, to an annotator account, wherein the annotation task may be defined to include the determined panel, the panel may be designated as any one of a cell panel, a tissue panel, and a structure panel, and the dataset type may indicate use of the pathology slide image and may be designated as any one of a training use of a machine learning model and a validation use of the machine learning model.

Description

How to manage annotation work, devices and systems that support it {METHOD FOR MANAGING ANNOTATION JOB, APPARATUS AND SYSTEM SUPPORTING THE SAME}

본 개시는 어노테이션 작업 관리 방법, 이를 지원하는 장치 및 시스템에 관한 것이다. 보다 자세하게는, 어노테이션(annotation) 작업을 보다 효율적으로 관리함과 동시에 어노테이션 결과의 정확성을 담보할 수 있는 방법, 그 방법을 지원하는 장치 및 시스템을 제공하는 것이다.The present disclosure relates to an annotation task management method, an apparatus and a system supporting the same. More specifically, the present invention provides a method, an apparatus and a system supporting the method, which can manage annotation work more efficiently and ensure the accuracy of annotation results.

지도 학습(supervised learning)이란 도 1에 도시된 바와 같이 레이블 정보(즉, 정답 정보)가 주어진 데이터셋(2)을 학습하여 목적 태스크를 수행하는 타깃 모델(3)을 구축하는 기계 학습 방법이다. 따라서, 레이블 정보(태그 아이콘으로 표시됨)가 주어지지 않은 데이터셋(1)에 대해 지도 학습을 수행하기 위해서는, 어노테이션(annotation) 작업이 필수적으로 선행되어야 한다.Supervised learning is a machine learning method for constructing a target model 3 for performing a target task by learning a data set 2 given label information (ie, correct answer information) as shown in FIG. 1. Therefore, in order to perform supervised learning on the data set 1 which is not given label information (indicated by a tag icon), annotation work must be preceded essentially.

어노테이션 작업은 학습 데이터셋을 생성하기 위해 데이터 별로 레이블 정보를 태깅하는 작업을 의미한다. 어노테이션 작업은 일반적으로 사람에 의해 수행되기 때문에, 대량의 학습 데이터셋을 생성하기 위해서는 상당한 인적 비용과 시간 비용이 소모된다. 특히, 병리 이미지에서 병변의 종류 또는 위치 등을 진단하는 기계 학습 모델을 구축하는 경우라면, 숙련된 전문의에 의해 어노테이션 작업이 수행되어야 하기 때문에, 다른 도메인에 비해 훨씬 더 많은 비용이 소모된다.Annotation work refers to a task of tagging label information for each data to generate a training data set. Annotation work is typically done by humans, which requires considerable human and time costs to generate large learning datasets. In particular, in the case of building a machine learning model for diagnosing the type or location of a lesion in a pathological image, an annotation operation must be performed by a skilled practitioner, which is much more expensive than other domains.

종래에는, 체계적인 작업 프로세스가 정립되지 않은 체로 어노테이션 작업이 수행되었다. 가령, 종래의 방식은 관리자가 각 병리 이미지의 특성을 육안으로 확인하여 어노테이션 수행 여부를 결정하고, 손수 병리 이미지를 분류한 다음 적절한 어노테이터(annotator)에게 병리 이미지를 할당하는 방식이었다. 뿐만 아니라, 종래에는 관리자가 일일이 병리 이미지 상의 어노테이션 영역을 지정한 다음, 어노테이터에게 작업을 할당하였다. 즉, 종래에는 병리 이미지 분류, 작업 할당, 어노테이션 영역 지정 등의 제반 과정이 관리자에 의해 수동으로 이루어졌고, 이로 인해 어노테이션 작업에 상당한 시간 비용과 인적 비용이 소모되는 문제가 있었다.In the past, annotation work was carried out with no systematic work process established. For example, in the conventional method, the administrator visually checks the characteristics of each pathological image to determine whether to perform annotations, classifies the pathological image by hand, and then assigns the pathological image to an appropriate annotator. In addition, in the related art, an administrator assigns an annotation region on a pathological image, and then assigns a task to an annotator. That is, in the related art, various processes such as pathological image classification, task assignment, and annotation region designation are manually performed by an administrator, which causes a significant time cost and human cost to be consumed by the annotation work.

나아가, 기계 학습 기법 자체는 충분히 고도화되었음에도 불구하고, 어노테이션 작업의 시간적, 비용적 문제로 인해 다양한 분야에 기계 학습 기법을 적용하는데 많은 어려움이 있었다.Furthermore, although the machine learning technique itself has been sufficiently advanced, there have been many difficulties in applying the machine learning technique to various fields due to the time and cost problems of the annotation work.

따라서, 기계 학습 기법의 활용성을 더욱 증대시키기 위해, 보다 효율적이고 체계적으로 어노테이션 작업을 수행할 수 있는 방법이 요구된다.Therefore, in order to further increase the utility of machine learning techniques, a method for annotating more efficiently and systematically is required.

한국공개특허 제10-2014-0093974호 (2014.07.29 공개)Korean Patent Publication No. 10-2014-0093974 (published Jul. 29, 2014)

본 개시의 몇몇 실시예들을 통해 해결하고자 하는 기술적 과제는, 어노테이션 작업의 자동화를 통해 어노테이션 작업을 보다 효율적이고 체계적으로 수행하고 관리할 수 있는 방법, 그 방법을 지원하는 장치 및 시스템을 제공하는 것이다.The technical problem to be solved by some embodiments of the present disclosure is to provide a method, an apparatus and a system supporting the method, which can perform and manage the annotation task more efficiently and systematically through the automation of the annotation task.

본 개시의 몇몇 실시예들을 통해 해결하고자 하는 다른 기술적 과제는, 어노테이션 작업을 체계적으로 관리할 수 있는 데이터 설계 산출물 또는 데이터 모델링 산출물을 제공하는 것이다.Another technical problem to be solved through some embodiments of the present disclosure is to provide a data design output or a data modeling output that can systematically manage annotation work.

본 개시가 해결하고자 하는 또 다른 기술적 과제는, 어노테이션 작업을 적절한 어노테이터에게 자동으로 할당하는 방법, 그 방법을 지원하는 장치 및 시스템을 제공하는 것이다.Another technical problem to be solved by the present disclosure is to provide a method for automatically allocating an annotation task to an appropriate annotator, an apparatus and a system supporting the method.

본 개시가 해결하고자 하는 또 다른 기술적 과제는, 병리 슬라이드 이미지에서 어노테이션 작업이 수행될 패치 이미지를 자동으로 추출하는 방법, 그 방법을 지원하는 장치 및 시스템을 제공하는 것이다.Another technical problem to be solved by the present disclosure is to provide a method for automatically extracting a patch image to be annotated from a pathological slide image, an apparatus and a system supporting the method.

본 개시가 해결하고자 하는 또 다른 기술적 과제는, 어노테이션 결과의 정확성을 담보할 수 있는 방법, 그 방법을 지원하는 장치 및 시스템을 제공하는 것이다.Another technical problem to be solved by the present disclosure is to provide a method capable of ensuring the accuracy of annotation results, an apparatus and a system supporting the method.

본 개시의 기술적 과제들은 이상에서 언급한 기술적 과제들로 제한되지 않으며, 언급되지 않은 또 다른 기술적 과제들은 아래의 기재로부터 본 개시의 기술분야에서의 통상의 기술자에게 명확하게 이해될 수 있을 것이다.The technical problems of the present disclosure are not limited to the technical problems mentioned above, and other technical problems that are not mentioned will be clearly understood by those skilled in the art from the following description.

상기 기술적 과제를 해결하기 위한, 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 방법은, 컴퓨팅 장치에 의하여 수행되는 방법에 있어서, 신규의 병리 슬라이드 이미지에 대한 정보를 얻는 단계, 상기 병리 슬라이드 이미지의 데이터셋 타입 및 패널을 결정하는 단계 및 상기 병리 슬라이드 이미지, 상기 결정된 데이터셋 타입, 어노테이션 태스크(annotation task) 및 상기 병리 슬라이드 이미지의 일부 영역인 패치로 정의되는 어노테이션 작업(job)을 어노테이터(annotator) 계정에 할당하는 단계를 포함하되, 상기 어노테이션 태스크는, 상기 결정된 패널을 포함하여 정의되는 것이고, 상기 패널은, 세포(cell) 패널, 조직(tissue) 패널 및 스트럭처(structure) 패널 중 어느 하나로 지정되는 것이며, 상기 데이터셋 타입은, 상기 병리 슬라이드 이미지의 용도를 가리키는 것으로서, 기계학습 모델의 학습(training) 용도 또는 상기 기계학습 모델의 검증(validation) 용도 중 어느 하나로 지정되는 것일 수 있다.An annotation task management method according to some embodiments of the present disclosure, to solve the technical problem, in a method performed by a computing device, obtaining information about a new pathological slide image, Determining a dataset type and panel and annotating an annotated job defined by the pathology slide image, the determined dataset type, an annotation task and a patch that is a portion of the pathology slide image. ), Wherein the annotation task is defined including the determined panel, and the panel is assigned to any one of a cell panel, a tissue panel, and a structure panel. The dataset type is used to determine the purpose of the pathological slide image. As indicated, it may be designated as either a training use of the machine learning model or a validation use of the machine learning model.

몇몇 실시예에서, 상기 어노테이션 태스크는, 태스크 클래스를 더 포함하여 정의되는 것이고, 상기 태스크 클래스는, 상기 패널의 관점에서 정의되는 어노테이션 대상을 가리키는 것일 수 있다.In some embodiments, the annotation task may be further defined to include a task class, and the task class may indicate an annotation target defined from the perspective of the panel.

몇몇 실시예에서, 상기 데이터셋 타입은, 상기 기계학습 모델의 학습(training) 용도, 상기 기계학습 모델의 검증(validation) 용도 또는 OPT(Observer Performance Test) 용도 중 어느 하나로 지정되는 것일 수 있다.In some embodiments, the dataset type may be designated as any one of a training use of the machine learning model, a validation use of the machine learning model, or an Observer Performance Test (OPT) use.

몇몇 실시예에서, 상기 데이터셋 타입 및 패널을 결정하는 단계는, 상기 병리 슬라이드 이미지를 기계학습 모델에 입력하고, 그 결과로 출력된 출력 값에 기반하여, 상기 병리 슬라이드 이미지의 데이터셋 타입 및 패널을 결정하는 단계를 포함할 수 있다.In some embodiments, determining the dataset type and panel comprises inputting the pathological slide image into a machine learning model and based on the resulting output value, the dataset type and panel of the pathological slide image. Determining may include.

몇몇 실시예에서, 상기 신규의 병리 슬라이드 이미지에 대한 정보를 얻는 단계는, 지정된 위치의 스토리지에 병리 슬라이드 이미지 파일이 추가되는 것을, 상기 스토리지를 모니터링 하는 워커 에이전트가 감지하는 단계, 상기 워커 에이전트에 의하여 상기 신규의 병리 슬라이드 이미지에 대한 정보가 데이터베이스에 삽입되는 단계 및 상기 데이터베이스로부터 상기 병리 슬라이드 이미지에 대한 정보를 얻는 단계를 포함할 수 있다.In some embodiments, obtaining information about the new pathological slide image comprises: detecting, by a worker agent monitoring the storage, that a pathological slide image file is added to storage at a specified location, by the worker agent; Inserting information about the new pathological slide image into a database and obtaining information about the pathological slide image from the database.

몇몇 실시예에서, 상기 할당하는 단계는, 상기 어노테이션 작업의 데이터셋 타입 및 어노테이션 태스크의 패널의 조합과 연관된 어노테이션 수행 이력을 기준으로 선정된 어노테이터 계정에 상기 어노테이션 작업을 자동 할당하는 단계를 포함할 수 있다.In some embodiments, the allocating may include automatically assigning the annotation task to a selected annotator account based on the annotation performance history associated with the combination of the dataset type of the annotation task and the panel of annotation tasks. Can be.

몇몇 실시예에서, 상기 어노테이션 태스크는, 태스크 클래스를 더 포함하여 정의되는 것이고, 상기 태스크 클래스는, 상기 패널의 관점에서 정의되는 어노테이션 대상을 가리키는 것이며, 상기 할당하는 단계는, 상기 어노테이션 작업의 어노테이션 태스크의 패널 및 태스크 클래스의 조합과 연관된 어노테이션 수행 이력을 기준으로 선정된 어노테이터 계정에 상기 어노테이션 작업을 자동 할당하는 단계를 포함할 수 있다. In some embodiments, the annotation task is further defined to include a task class, the task class points to an annotation target defined from the perspective of the panel, and the assigning step is an annotation task of the annotation task. And automatically assigning the annotation task to the selected annotator account based on the annotation performance history associated with the combination of the panel and the task class.

몇몇 실시예에서, 상기 할당하는 단계는, 상기 병리 슬라이드 이미지의 후보 패치들을 얻는 단계 및 각각의 후보 패치를 상기 기계학습 모델에 입력하고, 그 결과로 출력된 각 클래스 별 출력 값에 기반하여, 상기 후보 패치들 중에서 상기 어노테이션 작업의 패치를 자동으로 선정하는 단계를 포함할 수 있다.In some embodiments, the assigning step comprises obtaining candidate patches of the pathological slide image and inputting each candidate patch into the machine learning model and based on the resulting output value for each class, And automatically selecting a patch of the annotation task among candidate patches.

몇몇 실시예에서, 상기 어노테이션 작업의 패치를 상기 후보 패치들 중에서 자동으로 선정하는 단계는, 상기 각각의 후보 패치에 대한 각 클래스 별 출력 값을 이용하여 엔트로피 값을 연산하는 단계 및 상기 엔트로피 값이 기준치 이상인 후보 패치를, 상기 어노테이션 작업의 패치로 선정하는 단계를 포함할 수 있다.In some embodiments, automatically selecting a patch of the annotation task from among the candidate patches may include calculating an entropy value using an output value for each class for each candidate patch, and the entropy value is a reference value. The candidate patch may be selected as a patch of the annotation operation.

몇몇 실시예에서, 상기 할당하는 단계는, 상기 병리 슬라이드 이미지의 후보 패치들을 얻는 단계, 각각의 후보 패치에 대한 상기 기계학습 모델의 오예측(miss-prediction) 확률을 산출하는 단계 및 상기 산출된 오예측 확률이 기준치 이상인 후보 패치를, 상기 어노테이션 작업의 패치로 선정하는 단계를 포함할 수 있다.In some embodiments, said assigning step comprises obtaining candidate patches of said pathological slide image, calculating a miss-prediction probability of said machine learning model for each candidate patch and said calculated error. The method may include selecting a candidate patch having a predicted probability greater than or equal to a reference value as a patch of the annotation operation.

몇몇 실시예에서, 상기 어노테이션 작업을 할당받은 어노테이터 계정의 제1 어노테이션 결과 데이터를 얻는 단계, 상기 제1 어노테이션 결과 데이터와 상기 어노테이션 작업의 패치를 상기 기계학습 모델에 입력한 결과를 비교하는 단계 및 상기 비교 결과, 두 결과의 차이가 기준치를 초과하면 상기 어노테이션 작업을 다른 어노테이터 계정에 재할당하는 단계를 더 포함할 수 있다.In some embodiments, obtaining first annotation result data of an annotator account assigned an annotation task, comparing the first annotation result data with a result of inputting a patch of the annotation task into the machine learning model; As a result of the comparison, if the difference between the two results exceeds the reference value, the method may further include reallocating the annotation work to another annotator account.

몇몇 실시예에서, 상기 어노테이션 작업을 할당받은 어노테이터 계정의 제1 어노테이션 결과 데이터를 얻는 단계, 다른 어노테이터 계정의 제2 어노테이션 결과 데이터를 얻는 단계 및 상기 제1 어노테이션 결과 데이터와 상기 제2 어노테이션 결과 데이터의 유사도가 기준치 미만이면 상기 제1 어노테이션 결과 데이터를 미승인 처리하는 단계를 더 포함할 수 있다.In some embodiments, obtaining first annotation result data of an annotator account assigned to the annotation task, obtaining second annotation result data of another annotation account, and the first annotation result data and the second annotation result. If the degree of similarity of the data is less than the reference value, the method may further include disapproving the first annotation result data.

상술한 기술적 과제를 해결하기 위한 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 장치는, 하나 이상의 인스트럭션들(instructions)을 포함하는 메모리 및 상기 하나 이상의 인스트럭션들을 실행함으로써, 신규의 병리 슬라이드 이미지에 대한 정보를 얻어오고, 상기 병리 슬라이드 이미지의 데이터셋 타입 및 패널을 결정하며, 상기 병리 슬라이드 이미지, 상기 결정된 데이터셋 타입, 어노테이션 태스크(annotation task) 및 상기 병리 슬라이드 이미지의 일부 영역인 패치로 정의되는 어노테이션 작업(job)을 어노테이터(annotator) 계정에 할당하는 프로세서를 포함하되, 상기 어노테이션 태스크는, 상기 결정된 패널을 포함하여 정의되는 것이고, 상기 패널은, 세포(cell) 패널, 조직(tissue) 패널 및 스트럭처(structure) 패널 중 어느 하나로 지정되는 것이며, 상기 데이터셋 타입은, 상기 병리 슬라이드 이미지의 용도를 가리키는 것으로서, 기계학습 모델의 학습(training) 용도 또는 상기 기계학습 모델의 검증(validation) 용도 중 어느 하나로 지정되는 것일 수 있다.An annotation task management apparatus according to some embodiments of the present disclosure for solving the above technical problem, by executing the memory and one or more instructions (instructions) and the one or more instructions, for a new pathological slide image Obtain information, determine a dataset type and panel of the pathological slide image, and define an annotation defined by the pathological slide image, the determined dataset type, an annotation task, and a patch that is part of the pathological slide image A processor for assigning a job to an annotator account, wherein the annotation task is defined including the determined panel, wherein the panel comprises a cell panel, a tissue panel, and Is assigned to any of the structure panels, Group data set type, as pointing to the use of such pathology slide image may be assigned any of the verification (validation) use of the learning (training) of the machine learning models use or the machine learning model as one.

상술한 기술적 과제를 해결하기 위한 본 개시의 몇몇 실시예들에 따른 컴퓨터 프로그램을 포함하는 비일시적(non-transitory) 컴퓨터 판독가능 기록매체는, 상기 컴퓨터 프로그램의 명령어들이 프로세서에 의해 실행될 때, 상기 프로세서로 하여금, 신규의 병리 슬라이드 이미지에 대한 정보를 얻는 단계, 상기 병리 슬라이드 이미지의 데이터셋 타입 및 패널을 결정하는 단계 및 상기 병리 슬라이드 이미지, 상기 결정된 데이터셋 타입, 어노테이션 태스크(annotation task) 및 상기 병리 슬라이드 이미지의 일부 영역인 패치로 정의되는 어노테이션 작업(job)을 어노테이터(annotator) 계정에 할당하는 단계를 수행하도록 할 수 있다. 이때, 상기 어노테이션 태스크는, 상기 결정된 패널을 포함하여 정의되는 것이고, 상기 패널은, 세포(cell) 패널, 조직(tissue) 패널 및 스트럭처(structure) 패널 중 어느 하나로 지정되는 것이며, 상기 데이터셋 타입은, 상기 병리 슬라이드 이미지의 용도를 가리키는 것으로서, 기계학습 모델의 학습(training) 용도 또는 상기 기계학습 모델의 검증(validation) 용도 중 어느 하나로 지정되는 것일 수 있다.A non-transitory computer-readable recording medium including a computer program according to some embodiments of the present disclosure for solving the above technical problem is executed when the instructions of the computer program are executed by a processor. Obtaining information about a new pathological slide image, determining a dataset type and panel of the pathological slide image and the pathological slide image, the determined dataset type, annotation task and the pathology. An annotation job defined as a patch, which is part of a slide image, may be assigned to an annotator account. In this case, the annotation task is defined including the determined panel, and the panel is assigned to any one of a cell panel, a tissue panel, and a structure panel, and the dataset type is , Indicating the use of the pathological slide image, may be designated as either a training use of a machine learning model or a validation use of the machine learning model.

상술한 본 개시의 다양한 실시예들에 따르면, 어노테이션 작업이 전반적으로 자동화됨에 따라 관리자의 편의성이 증대되고, 전반적인 작업 효율성이 크게 향상될 수 있다. 이에 따라, 어노테이션 작업에 소요되는 시간 비용 및 인적 비용이 크게 절감될 수 있다. 또한, 어노테이션 작업의 부담이 감소됨에 따라, 기계 학습 기법의 활용성은 더욱 증대될 수 있다.According to various embodiments of the present disclosure described above, as the annotation task is generally automated, the convenience of the administrator may be increased and the overall work efficiency may be greatly improved. Accordingly, the time cost and human cost required for the annotation work can be greatly reduced. In addition, as the burden of annotation work is reduced, the utility of machine learning techniques can be further increased.

또한, 데이터 모델링 산출물에 기반하여 어노테이션 작업과 연관된 각종 데이터가 체계적으로 관리될 수 있다. 이에 따라, 데이터 관리 비용은 감소되고, 전반적인 어노테이션 작업 프로세스가 원활하게 진행될 수 있다.In addition, various data associated with the annotation work may be systematically managed based on the data modeling output. As a result, data management costs are reduced, and the overall annotation work process can be smoothly performed.

또한, 어노테이션 작업을 적절한 어노테이터에게 자동으로 할당함으로써, 관리자의 업무 부담이 감소될 수 있고, 어노테이션 결과의 정확성은 향상될 수 있다.In addition, by automatically assigning annotation work to appropriate annotators, the burden on the administrator can be reduced, and the accuracy of annotation results can be improved.

또한, 어노테이션 작업 결과를 기계학습 모델 또는 다른 어노테이터의 결과와 비교 검증함으로써, 어노테이션 결과의 정확성이 담보될 수 있다. 이에 따라, 어노테이션 결과를 학습한 기계학습 모델의 성능도 향상될 수 있다.In addition, the accuracy of the annotation results can be ensured by comparing and verifying the results of annotation work with those of machine learning models or other annotators. Accordingly, the performance of the machine learning model learning the annotation result may also be improved.

또한, 어노테이션이 수행될 영역을 가리키는 패치가 자동으로 추출될 수 있다. 따라서, 관리자의 업무 부담이 최소화될 수 있다.In addition, a patch indicating an area to be annotated may be automatically extracted. Therefore, the burden on the manager can be minimized.

또한, 기계 학습 모델의 오예측 확률, 엔트로피 값 등에 기반하여 복수의 후보 패치 중에서 학습에 효과적인 패치만이 어노테이션 대상으로 선정된다. 이에 따라, 어노테이션 작업량이 감소되고, 양질의 학습 데이터셋이 생성될 수 있다.In addition, only patches effective for learning among a plurality of candidate patches are selected as annotation targets based on misprediction probabilities and entropy values of the machine learning model. As a result, the amount of annotation work can be reduced, and a good learning data set can be generated.

본 개시의 기술적 사상에 따른 효과들은 이상에서 언급한 효과들로 제한되지 않으며, 언급되지 않은 또 다른 효과들은 아래의 기재로부터 통상의 기술자에게 명확하게 이해될 수 있을 것이다.Effects according to the technical spirit of the present disclosure are not limited to the above-mentioned effects, other effects that are not mentioned will be clearly understood by those skilled in the art from the following description.

도 1은 지도 학습과 어노테이션 작업 간의 관계를 설명하기 위한 예시도이다.
도 2 및 도 3은 본 개시의 다양한 실시예들에 따른 어노테이션 작업 관리 시스템을 나타내는 예시적인 구성도이다.
도 4는 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리를 위한 예시적인 데이터 모델의 설계도이다.
도 5는 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 방법을 나타내는 예시적인 흐름도이다.
도 6은 본 개시의 몇몇 실시예들에 따른 어노테이터 선정 방법을 설명하기 위한 예시도이다.
도 7은 본 개시의 몇몇 실시예들에서 참조될 수 있는 어노테이션 툴을 나타내는 예시도이다.
도 8은 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 생성 방법을 나타내는 예시적인 흐름도이다.
도 9는 본 개시의 몇몇 실시예들에 따른 병리 슬라이드 이미지에 대한 데이터셋 타입 결정 방법을 나타내는 예시적인 흐름도이다.
도 10 내지 도 13은 본 개시의 몇몇 실시예들에 따른 패널 유형 결정 방법을 설명하기 위한 도면이다.
도 14는 본 개시의 제1 실시예들에 따른 패치 자동 추출 방법을 나타내는 예시적인 흐름도이다.
도 15 내지 도 19는 본 개시의 제1 실시예들에 따른 패치 자동 추출 방법을 부연 설명하기 위한 예시도이다.
도 20은 본 개시의 제2 실시예들에 따른 패치 자동 추출 방법을 나타내는 예시적인 흐름도이다.
도 21 내지 도 23은 본 개시의 제2 실시예들에 따른 패치 자동 추출 방법을 부연 설명하기 위한 예시도이다.
도 24는 본 개시의 다양한 실시예들에 따른 장치/시스템을 구현할 수 있는 예시적인 컴퓨팅 장치를 나타내는 예시적인 하드웨어 구성도이다.1 is an exemplary diagram for explaining a relationship between supervised learning and annotation work.
2 and 3 are exemplary diagrams illustrating an annotation task management system according to various embodiments of the present disclosure.
4 is a schematic diagram of an example data model for annotation task management in accordance with some embodiments of the present disclosure.
5 is an exemplary flowchart illustrating an annotation task management method according to some embodiments of the present disclosure.
6 is an exemplary view for explaining an annotator selection method according to some embodiments of the present disclosure.
7 is an exemplary diagram illustrating an annotation tool that may be referenced in some embodiments of the present disclosure.
8 is an exemplary flowchart illustrating a method of generating an annotation task according to some embodiments of the present disclosure.
9 is an exemplary flowchart illustrating a method of determining a dataset type for a pathological slide image according to some embodiments of the present disclosure.
10 to 13 illustrate a panel type determination method according to some embodiments of the present disclosure.
14 is an exemplary flowchart illustrating a patch automatic extraction method according to first embodiments of the present disclosure.
15 to 19 are exemplary views for explaining the patch automatic extraction method according to the first embodiment of the present disclosure.
20 is an exemplary flowchart illustrating a patch automatic extraction method according to second embodiments of the present disclosure.
21 to 23 are exemplary diagrams for further explaining a method for automatically extracting patches according to second embodiments of the present disclosure.
24 is an example hardware configuration diagram illustrating an example computing device that can implement an apparatus / system in accordance with various embodiments of the present disclosure.

이하, 첨부된 도면을 참조하여 본 개시의 바람직한 실시예들을 상세히 설명한다. 본 개시의 이점 및 특징, 그리고 그것들을 달성하는 방법은 첨부되는 도면과 함께 상세하게 후술되어 있는 실시예들을 참조하면 명확해질 것이다. 그러나 본 개시의 기술적 사상은 이하의 실시예들에 한정되는 것이 아니라 서로 다른 다양한 형태로 구현될 수 있으며, 단지 본 실시예들은 본 개시가 완전하도록 하고, 본 개시가 속하는 기술분야에서 통상의 지식을 가진 자에게 본 개시의 범주를 완전하게 알려주기 위해 제공되는 것이며, 본 개시의 기술적 사상은 청구항의 범주에 의해 정의될 뿐이다.Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Advantages and features of the present disclosure, and methods of accomplishing the same will become apparent with reference to the embodiments described below in detail in conjunction with the accompanying drawings. However, the technical spirit of the present disclosure is not limited to the following embodiments, and may be implemented in various forms, and the present exemplary embodiments may be provided only to complete the present disclosure, and to provide general knowledge in the technical field to which the present disclosure belongs. It is provided to fully inform those having the scope of the present disclosure, and the technical spirit of the present disclosure is only defined by the scope of the claims.

각 도면의 구성요소들에 참조부호를 부가함에 있어서, 동일한 구성요소들에 대해서는 비록 다른 도면상에 표시되더라도 가능한 한 동일한 부호를 가지도록 하고 있음에 유의해야 한다. 또한, 본 개시를 설명함에 있어, 관련된 공지 구성 또는 기능에 대한 구체적인 설명이 본 개시의 요지를 흐릴 수 있다고 판단되는 경우에는 그 상세한 설명은 생략한다.In adding reference numerals to the components of each drawing, it should be noted that the same reference numerals are used to refer to the same components as much as possible, even if displayed on different drawings. In addition, in describing the present disclosure, when it is determined that a detailed description of a related known configuration or function may obscure the subject matter of the present disclosure, the detailed description thereof will be omitted.

다른 정의가 없다면, 본 명세서에서 사용되는 모든 용어(기술 및 과학적 용어를 포함)는 본 개시가 속하는 기술분야에서 통상의 지식을 가진 자에게 공통적으로 이해될 수 있는 의미로 사용될 수 있다. 또 일반적으로 사용되는 사전에 정의되어 있는 용어들은 명백하게 특별히 정의되어 있지 않는 한 이상적으로 또는 과도하게 해석되지 않는다. 본 명세서에서 사용된 용어는 실시예들을 설명하기 위한 것이며 본 개시를 제한하고자 하는 것은 아니다. 본 명세서에서, 단수형은 문구에서 특별히 언급하지 않는 한 복수형도 포함한다.Unless otherwise defined, all terms used in this specification (including technical and scientific terms) may be used as meanings that can be commonly understood by those skilled in the art to which the present disclosure belongs. In addition, terms that are defined in a commonly used dictionary are not ideally or excessively interpreted unless they are specifically defined clearly. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. In this specification, the singular also includes the plural unless specifically stated otherwise in the phrase.

또한, 본 개시의 구성 요소를 설명하는 데 있어서, 제1, 제2, A, B, (a), (b) 등의 용어를 사용할 수 있다. 이러한 용어는 그 구성 요소를 다른 구성 요소와 구별하기 위한 것일 뿐, 그 용어에 의해 해당 구성 요소의 본질이나 차례 또는 순서 등이 한정되지 않는다. 어떤 구성 요소가 다른 구성요소에 "연결", "결합" 또는 "접속"된다고 기재된 경우, 그 구성 요소는 그 다른 구성요소에 직접적으로 연결되거나 또는 접속될 수 있지만, 각 구성 요소 사이에 또 다른 구성 요소가 "연결", "결합" 또는 "접속"될 수도 있다고 이해되어야 할 것이다.In addition, in describing the components of the present disclosure, terms such as first, second, A, B, (a), and (b) may be used. These terms are only for distinguishing the components from other components, and the nature, order or order of the components are not limited by the terms. If a component is described as being "connected", "coupled" or "connected" to another component, that component may be directly connected to or connected to that other component, but there may be another configuration between each component. It is to be understood that the elements may be "connected", "coupled" or "connected".

명세서에서 사용되는 "포함한다 (comprises)" 및/또는 "포함하는 (comprising)"은 언급된 구성 요소, 단계, 동작 및/또는 소자는 하나 이상의 다른 구성 요소, 단계, 동작 및/또는 소자의 존재 또는 추가를 배제하지 않는다.As used herein, “comprises” and / or “comprising” refers to a component, step, operation and / or element that is mentioned in the presence of one or more other components, steps, operations and / or elements. Or does not exclude additions.

본 명세서에 대한 설명에 앞서, 본 명세서에서 사용되는 몇몇 용어들에 대하여 명확하게 하기로 한다.Prior to the description herein, some terms used herein will be clarified.

본 명세서에서, 레이블 정보(label information)란, 데이터 샘플의 정답 정보로써 어노테이션 작업의 결과로 획득된 정보이다. 상기 레이블은 당해 기술 분야에서 어노테이션(annotation), 태그 등의 용어와 혼용되어 사용될 수 있다.In the present specification, label information is information obtained as a result of an annotation operation as correct answer information of a data sample. The label may be used interchangeably with terms such as annotation, tag, and the like in the art.

본 명세서에서, 어노테이션(annotation)이란, 데이터 샘플에 레이블 정보를 태깅하는 작업 또는 태깅된 정보(즉, 주석) 그 자체를 의미한다. 상기 어노테이션은 당해 기술 분야에서 태깅(tagging), 레이블링(labeling) 등의 용어와 혼용되어 사용될 수 있다.In the present specification, annotation means a task of tagging label information on a data sample or tagged information (ie, annotation) itself. The annotation may be used interchangeably with terms such as tagging and labeling in the art.

본 명세서에서, 오예측(miss-prediction) 확률이란, 주어진 데이터 샘플에 대한 특정 모델이 예측을 수행할 때, 상기 예측 결과에 오류가 포함될 확률(즉, 예측이 틀릴 확률) 또는 가능성을 의미한다.In this specification, a miss-prediction probability means a probability (ie, a probability that a prediction is wrong) or a probability that an error is included in the prediction result when a specific model for a given data sample performs prediction.

본 명세서에서, 패널(panel)이란, 병리 슬라이드 이미지에서 추출될 패치(patch) 또는 병리 슬라이드 이미지의 타입을 의미한다. 상기 패널은 세포(cell) 패널, 조직(tissue) 패널 및 스트럭처(structure) 패널로 구분될 수 있으나, 본 개시의 기술적 범위가 이에 한정되는 것은 아니다. 각 패널 유형에 대응되는 패치의 예는 도 10 내지 도 12를 참조하도록 한다.In the present specification, a panel refers to a type of patch or pathological slide image to be extracted from a pathological slide image. The panel may be classified into a cell panel, a tissue panel, and a structure panel, but the technical scope of the present disclosure is not limited thereto. For examples of patches corresponding to each panel type, refer to FIGS. 10 to 12.

본 명세서에서 인스트럭션(instruction)이란, 기능을 기준으로 묶인 일련의 명령어들로서 컴퓨터 프로그램의 구성 요소이자 프로세서에 의해 실행되는 것을 가리킨다.In the present specification, an instruction is a series of instructions grouped by function and refers to a component of a computer program and executed by a processor.

이하, 본 개시의 몇몇 실시예들에 대하여 첨부된 도면에 따라 상세하게 설명한다.Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

도 2는 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 시스템을 나타내는 예시적인 구성도이다.2 is an exemplary configuration diagram illustrating an annotation task management system in accordance with some embodiments of the present disclosure.

도 2에 도시된 바와 같이, 상기 어노테이션 작업 관리 시스템은 스토리지 서버(10), 적어도 하나의 어노테이터 단말(20-1 내지 20-n) 및 어노테이션 작업 관리 장치(100)를 포함할 수 있다. 단, 이는 본 개시의 목적을 달성하기 위한 바람직한 실시예일뿐이며, 필요에 따라 일부 구성 요소가 추가되거나 삭제될 수 있음은 물론이다. 가령, 다른 몇몇 실시예에서는, 도 3에 도시된 바와 같이, 상기 어노테이션 작업 관리 시스템은 어노테이션 작업에 대한 리뷰(즉, 평가)를 담당하는 리뷰자의 단말(30)을 더 포함할 수 있다.As shown in FIG. 2, the annotation task management system may include a storage server 10, at least one annotator terminal 20-1 to 20-n, and an annotation task management apparatus 100. However, this is only a preferred embodiment for achieving the object of the present disclosure, of course, some components may be added or deleted as necessary. For example, in some other embodiments, as shown in FIG. 3, the annotation task management system may further include a reviewer's terminal 30 in charge of reviewing (ie, evaluating) the annotation task.

도 2 또는 도3에 도시된 시스템의 각각의 구성 요소들은 기능적으로 구분되는 기능 요소들을 나타낸 것으로서, 복수의 구성 요소가 실제 물리적 환경에서는 서로 통합되는 형태로 구현될 수도 있다. 또는, 상기 각각의 구성 요소들은 실제 물리적 환경에서는 복수의 세부 기능 요소로 분리되는 형태로 구현될 수도 있다. 예컨대, 어노테이션 작업 관리 장치(100)의 제1 기능은 제1 컴퓨팅 장치에서 구현되고, 제2 기능은 제2 컴퓨팅 장치에서 구현될 수도 있다. 이하, 상기 각각의 구성 요소에 대하여 설명한다.Each component of the system illustrated in FIG. 2 or 3 represents functionally divided functional elements, and a plurality of components may be implemented in a form in which they are integrated with each other in an actual physical environment. Alternatively, each of the components may be implemented in a form that is separated into a plurality of detailed functional elements in the actual physical environment. For example, the first function of the annotation task management device 100 may be implemented in the first computing device, and the second function may be implemented in the second computing device. Hereinafter, each said component is demonstrated.

상기 어노테이션 작업 관리 시스템에서, 스토리지 서버(10)는 어노테이션 작업과 연관된 각종 데이터를 저장하고 관리하는 서버이다. 데이터의 효율적인 관리를 위해, 스토리지 서버(10)는 데이터베이스를 이용하여 상기 각종 데이터를 저장되고 관리할 수 있다.In the annotation job management system, the storage server 10 is a server that stores and manages various data associated with the annotation job. In order to efficiently manage data, the storage server 10 may store and manage the various data using a database.

상기 각종 데이터는 병리 슬라이드 이미지의 파일, 병리 슬라이드 이미지의 메타 데이터(e.g. 이미지 형식, 연관된 병명, 연관된 조직, 연관된 환자 정보 등), 어노테이션 작업에 관한 데이터, 어노테이터에 관한 데이터, 어노테이션 작업 결과물 등을 포함할 수 있을 것이나, 본 개시의 기술적 범위가 이에 한정되는 것은 아니다.The various data may include a file of a pathological slide image, meta data of a pathological slide image (eg image format, associated disease name, associated tissue, associated patient information, etc.), data on an annotation task, data on annotator, annotation result, and the like. Although it may include, the technical scope of the present disclosure is not limited thereto.

몇몇 실시예에서, 스토리지 서버(10)는 작업 관리 웹 페이지를 제공하는 웹 서버로 동작할 수도 있다. 이와 같은 경우, 관리자는 상기 작업 관리 웹 페이지를 통해 어노테이션 작업에 대한 할당, 관리 등을 수행하고, 어노테이터는 상기 작업 관리 웹 페이지를 통해 할당된 작업을 확인하고 수행할 수 있다.In some embodiments, the storage server 10 may operate as a web server that presents work management web pages. In this case, the administrator may perform an assignment, management, and the like for the annotation task through the task management web page, and the annotator may check and perform the task assigned through the task management web page.

몇몇 실시예에서, 어노테이션 작업 관리를 위한 데이터 모델(e.g. DB 스키마)은 도 4에 도시된 바와 같이 설계될 수 있다. 도 4에서 박스형 객체는 엔티티(entity)를 가리키고, 박스형 객체를 연결하는 선은 관계(relationship)를 가리키며, 선 위의 글자는 관계 유형을 가리킨다. 도 4에 도시된 바와 같이, 어노테이션 작업 엔티티(44)는 다양한 엔티티(43, 45, 46, 47, 49)와 연관될 수 있다. 보다 이해의 편의를 제공하기 위해, 작업 엔티티(44)를 중심으로 도 4에 도시된 데이터 모델에 대하여 간략하게 설명하도록 한다.In some embodiments, a data model (e.g. DB schema) for annotation task management may be designed as shown in FIG. In FIG. 4, the box-shaped object indicates an entity, the line connecting the box-shaped object indicates a relationship, and the letters on the line indicate a relationship type. As shown in FIG. 4, the annotation work entity 44 may be associated with various entities 43, 45, 46, 47, 49. For convenience of understanding, the data model shown in FIG. 4 will be briefly described based on the work entity 44.

슬라이드 엔티티(45)는 병리 슬라이드 이미지에 관한 엔티티이다. 슬라이드 엔티티(45)는 병리 슬라이드 이미지와 연관된 각종 정보를 속성(attribute)으로 가질 수 있다. 하나의 병리 슬라이드 이미지로부터 다수의 어노테이션 작업이 생성될 수 있기 때문에, 슬라이드 엔티티(45)와 작업 엔티티(44) 간의 관계는 1:n이 된다.Slide entity 45 is an entity relating to a pathological slide image. The slide entity 45 may have various information associated with the pathological slide image as an attribute. Since multiple annotation tasks can be generated from one pathological slide image, the relationship between the slide entity 45 and the task entity 44 is 1: n.

데이터셋 엔티티(49)는 어노테이션이 수행된 데이터의 활용 용도를 나타내는 엔티티이다. 가령, 상기 활용 용도는 학습(training) 용도(즉, 학습 데이터셋으로 활용됨), 검증(validation) 용도(즉, 검증 데이터셋으로 활용됨), 테스트 용도(즉, 테스트 데이터셋으로 활용됨) 또는 OPT(Observer Performance Test) 용도(즉, OPT 테스트에 활용됨)로 구분될 수 있으나, 본 개시의 기술적 범위가 이에 한정되는 것은 아니다.The dataset entity 49 is an entity that indicates the usage of the annotated data. For example, the use purpose may be a training use (ie, utilized as a training dataset), a validation use (ie, utilized as a validation dataset), a test use (ie, used as a test dataset), or an OPT ( Observer Performance Test) purposes (that is, utilized in the OPT test), but the technical scope of the present disclosure is not limited thereto.

어노테이터 엔티티(47)는 어노테이터를 나타내는 엔티티이다. 어노테이터 엔티티(47)는 상기 어노테이터의 현재 작업 현황, 과거 작업 수행 이력, 기 수행된 작업에 대한 평가 결과, 어노테이터의 인적 정보(e.g. 학력, 전공 등) 등을 속성으로 가질 수 있다. 한 명의 어노테이터는 다수의 작업을 수행할 수 있으므로, 어노테이터 엔티티(47)와 작업 엔티티(44) 간의 관계는 1:n이 된다.Annotator entity 47 is an entity representing annotator. The annotator entity 47 may have attributes such as current work status of the annotator, past work execution history, evaluation results of previously performed work, personal information of the annotator (e.g. education, major, etc.), and the like. Since one annotator can perform multiple tasks, the relationship between annotator entity 47 and work entity 44 is 1: n.

패치 엔티티(46)는 병리 슬라이드 이미지로부터 파생된 패치에 관한 엔티티이다. 상기 패치에는 복수개의 어노테이션이 포함될 수 있기 때문에, 패치 엔티티(46)와 어노테이션 엔티티(48) 간의 관계는 1:n이 된다. 또한, 하나의 어노테이션 작업이 복수개의 패치에 대해 수행될 수 있기 때문에, 패치 엔티티(46)와 작업 엔티티(44) 간의 관계는 n:1이 된다.Patch entity 46 is an entity for a patch derived from a pathological slide image. Since the patch may include a plurality of annotations, the relationship between the patch entity 46 and the annotation entity 48 is 1: n. Also, since one annotation operation can be performed for a plurality of patches, the relationship between patch entity 46 and task entity 44 is n: 1.

어노테이션 태스크 엔티티(43)는 세부적인 어노테이션 작업 유형인 어노테이션 태스크(annotation task)를 나타내는 엔티티이다. 예를 들어, 상기 어노테이션 태스크 유사 분열 세포(mitosis)인지 여부를 태깅하는 태스크, 유사 분열 세포의 개수를 태깅하는 태스크, 병변의 종류를 태깅하는 태스크, 병변의 위치를 태깅하는 태스크, 병명을 태깅하는 태스크 등과 같이 다양하게 정의되고 세분화될 수 있다. 상기 어노테이션 작업의 세부 유형은 패널에 따라 달라질 수 있고(즉, 세포 패널과 조직 패널에 태깅되는 어노테이션은 달라질 수 있음), 동일한 패널이더라도 서로 다른 태스크가 수행될 수 있기 때문에, 태스크 엔티티((43)는 패널 엔티티(41)와 태스크 클래스(42) 엔티티를 속성으로 가질 수 있다. 여기서, 태스크 클래스 엔티티(42)는 패널의 관점에서 정의되는 어노테이션 대상(e.g. 유사 분열 세포, 병변의 위치) 또는 패널의 관점에서 정의되는 태스크 유형을 나타내는 엔티티이다. 하나의 어노테이션 태스크에서 복수의 어노테이션 작업이 생성될 수 있기 때문에(즉, 동일한 태스크를 수행하는 복수의 작업이 존재할 수 있음), 어노테이션 태스크 엔티티(43)와 어노테이션 작업 엔티티(44) 간의 관계는 1:n이 된다. 프로그래밍적 관점에서, 어노테이션 태스크 엔티티(43)는 클래스(class) 또는 프로그램(program)에 대응되고, 어노테이션 작업 엔티티(44)는 상기 클래스의 인스턴스(instance) 또는 프로그램 실행에 의해 생성된 프로세스(process)에 대응되는 것으로 이해될 수 있다.Annotation task entity 43 is an entity representing an annotation task, which is a detailed annotation task type. For example, the annotation task is a task of tagging whether or not the mitosis, the task of tagging the number of mitotic cells, the task of tagging the type of lesion, the task of tagging the location of the lesion, tagging the name of the disease It can be defined and broken down in various ways such as tasks. The detailed type of annotation work can vary from panel to panel (ie, annotations tagged to cell panels and tissue panels can vary), and because different tasks can be performed on the same panel, the task entity (43) May have a panel entity 41 and a task class 42 entity as attributes, where the task class entity 42 is the annotation object (eg mitotic cell, location of the lesion) or panel defined from the panel's point of view. An entity representing a task type that is defined from a perspective, since multiple annotation tasks can be generated from one annotation task (that is, there can be multiple tasks performing the same task), and the annotation task entity 43 The relationship between the annotation work entities 44 is 1: n. From a programmatic point of view, the annotation task The entity 43 may correspond to a class or program, and the annotation task entity 44 may be understood to correspond to an instance of the class or a process generated by program execution. have.

몇몇 실시예에서, 스토리지 서버(10)는 전술한 데이터 모델에 기반하여 데이터베이스를 구축하고, 어노테이션 작업과 연관된 각종 데이터를 체계적으로 관리할 수 있다. 이에 따라, 데이터 관리 비용은 감소되고, 전반적인 어노테이션 작업 프로세스가 원활하게 진행될 수 있다.In some embodiments, the storage server 10 may build a database based on the data model described above and systematically manage various data associated with the annotation work. As a result, data management costs are reduced, and the overall annotation work process can be smoothly performed.

지금까지 어노테이션 작업 관리를 위한 데이터 모델에 대하여 설명하였다. 다시 도 2 및 도 3을 참조하여 어노테이션 작업 관리 시스템의 구성 요소에 대한 설명을 이어가도록 한다.So far, the data model for annotation task management has been described. 2 and 3, the description of the components of the annotation task management system will be continued.

상기 어노테이션 작업 관리 시스템에서, 어노테이션 작업 관리 장치(100)는 어노테이터 단말(20-1 내지 20-n)에게 어노테이션 작업을 할당하는 등의 제반 관리 기능을 수행하는 컴퓨팅 장치이다. 여기서, 상기 컴퓨팅 장치는, 노트북, 데스크톱(desktop), 랩탑(laptop) 등이 될 수 있으나, 이에 국한되는 것은 아니며 컴퓨팅 기능이 구비된 모든 종류의 장치를 포함할 수 있다. 상기 컴퓨팅 장치의 일 예는 도 24를 참조하도록 한다. 이하에서는, 설명의 편의상 어노테이션 작업 관리 장치(100)를 관리 장치(100)로 약칭하도록 한다. 또한, 이하의 서술에서, 어노테이터 단말을 총칭하거나 구분 없이 임의의 어노테이터 단말을 지칭할 때는 참조 번호 20을 이용하도록 한다.In the annotation task management system, the annotation task management apparatus 100 is a computing device that performs various management functions such as allocating annotation tasks to the annotator terminals 20-1 to 20-n. Here, the computing device may be a laptop, a desktop, a laptop, and the like, but is not limited thereto and may include any kind of device having a computing function. An example of the computing device is described with reference to FIG. 24. Hereinafter, for convenience of explanation, the annotation job management apparatus 100 will be abbreviated as a management apparatus 100. In addition, in the following description, reference numeral 20 is used when referring to an annotator terminal generically or without any distinction.

작업 관리 장치(100)는 관리자에 의해 이용되는 장치일 수 있다. 가령, 관리자는 작업 관리 장치(100)를 통해 작업 관리 웹 페이지 접속하고, 관리자 계정으로 로그인 한 다음, 전반적인 작업에 대한 관리를 수행할 수 있다. 가령, 관리자는 어노테이션 작업을 특정 어노테이터의 계정에 할당하거나, 어노테이션 결과를 리뷰자의 계정으로 전송하여 리뷰를 요청하는 등의 관리 행위를 수행할 수 있다. 물론, 위와 같은 제반 관리 과정은 작업 관리 장치(100)에 의해 자동으로 수행될 수도 잇는데, 이에 대한 설명은 도 5 이하의 도면을 참조하여 후술하도록 한다.The job management apparatus 100 may be a device used by an administrator. For example, the administrator may access the task management web page through the task management apparatus 100, log in with an administrator account, and then manage overall tasks. For example, the manager may assign an annotation task to an account of a specific annotator, or perform an administrative action such as requesting a review by transmitting the annotation result to the reviewer's account. Of course, the above general management process may be automatically performed by the job management apparatus 100, which will be described later with reference to the drawings of FIG.

상기 어노테이션 작업 관리 시스템에서, 어노테이터 단말(20)은 어노테이터에 의해 어노테이션 작업이 수행되는 단말이다. 단말(20)에는 어노테이션 툴(annotation tool)이 설치되어 있을 수 있다. 물론, 작업 관리 웹 페이지를 통해 어노테이션을 위한 각종 기능이 제공될 수도 있다. 이와 같은 경우, 어노테이터는 단말(20)을 통해 상기 작업 관리 웹 페이지에 접속한 다음, 웹 상에서 어노테이션 작업을 수행할 수 있다. 상기 어노테이션 툴의 일 예시는 도 7을 참조하도록 한다.In the annotation job management system, the annotator terminal 20 is a terminal on which the annotation job is performed by the annotator. An annotation tool may be installed in the terminal 20. Of course, various functions for annotation may be provided through the job management web page. In this case, the annotator may access the work management web page through the terminal 20 and then perform annotation work on the web. An example of the annotation tool is described with reference to FIG. 7.

상기 어노테이션 작업 관리 시스템에서, 리뷰자 단말(30)은 어노테이션 결과에 대한 리뷰를 수행하는 리뷰자 측의 단말이다. 리뷰자는 리뷰자 단말(30)을 이용하여 어노테이션 결과에 대한 리뷰를 수행하고, 리뷰 결과를 관리 장치(100)에게 제공할 수 있다.In the annotation task management system, the reviewer terminal 30 is a terminal of the reviewer side to perform a review of the annotation results. The reviewer may review the annotation result using the reviewer terminal 30, and provide the review result to the management device 100.

몇몇 실시예에서, 어노테이션 작업 관리 시스템의 적어도 일부의 구성 요소들은 네트워크를 통해 통신할 수 있다. 여기서, 상기 네트워크는 근거리 통신망(Local Area Network; LAN), 광역 통신망(Wide Area Network; WAN), 이동 통신망(mobile radio communication network), Wibro(Wireless Broadband Internet) 등과 같은 모든 종류의 유/무선 네트워크로 구현될 수 있다.In some embodiments, at least some components of the annotation task management system may communicate over a network. The network may be any type of wired / wireless network such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, a wireless broadband Internet (Wibro), or the like. Can be implemented.

지금까지 도 2 내지 도 4를 참조하여 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 시스템에 대하여 설명하였다. 이하에서는, 도 5 내지 도 23의 도면을 참조하여 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 방법에 대하여 설명하도록 한다.So far, the annotation task management system according to some embodiments of the present disclosure has been described with reference to FIGS. 2 to 4. Hereinafter, an annotation task management method according to some embodiments of the present disclosure will be described with reference to the drawings of FIGS. 5 to 23.

상기 어노테이션 작업 관리 방법의 각 단계는 컴퓨팅 장치에 의해 수행될 수 있다. 다시 말하면, 상기 어노테이션 작업 관리 방법의 각 단계는 컴퓨팅 장치의 프로세서에 의해 실행되는 하나 이상의 인스트럭션들로 구현될 수 있다. 이해의 편의를 제공하기 위해, 상기 어노테이션 작업 관리 방법이 도 3 또는 도 4에 도시된 환경에서 수행되는 것을 가정하여 설명을 이어가도록 한다.Each step of the annotation task management method may be performed by a computing device. In other words, each step of the annotation task management method may be implemented by one or more instructions executed by a processor of the computing device. For convenience of understanding, it is assumed that the annotation task management method is performed in the environment shown in FIG. 3 or 4 to continue the description.

도 5는 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 방법을 나타내는 예시적인 흐름도이다. 단, 이는 본 개시의 목적을 달성하기 위한 바람직한 실시예일뿐이며, 필요에 따라 일부 단계가 추가되거나 삭제될 수 있음은 물론이다.5 is an exemplary flowchart illustrating an annotation task management method according to some embodiments of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, it is a matter of course that some steps can be added or deleted as necessary.

도 5에 도시된 바와 같이, 상기 어노테이션 작업 관리 방법은 신규의 병리 슬라이드 이미지에 대한 정보를 얻는 단계 S100에서 시작된다. 상기 병리 슬라이드 이미지에 대한 정보는 상기 병리 슬라이드 이미지의 메타 데이터만을 포함할 수 있고, 병리 슬라이드 이미지 파일을 더 포함할 수도 있다.As shown in FIG. 5, the annotation task management method starts at step S100 of obtaining information about a new pathological slide image. The information on the pathological slide image may include only meta data of the pathological slide image, and may further include a pathological slide image file.

몇몇 실시예에서, 워커 에이전트(worker agent)를 통해 상기 신규의 병리 슬라이드 이미지에 대한 정보가 실시간으로 획득될 수 있다. 구체적으로, 상기 워커 에이전트에 의해 지정된 위치의 스토리지(e.g. 스토리지 서버 10 or 병리 슬라이드 이미지를 제공하는 의료 기관의 스토리지)에 병리 슬라이드 이미지 파일이 추가되는 것이 감지될 수 있다. 또한, 상기 워커 에이전트에 의하여 상기 신규의 병리 슬라이드 이미지에 대한 정보가 작업 관리 장치(100) 또는 스토리지 서버(10)의 데이터베이스에 삽입될 수 있다. 그러면, 상기 데이터베이스로부터 상기 신규의 병리 슬라이드 이미지에 대한 정보가 획득될 수 있다.In some embodiments, information about the new pathological slide image may be obtained in real time through a worker agent. Specifically, it may be detected that the pathological slide image file is added to the storage (e.g. storage server 10 or the storage of the medical institution providing the pathological slide image) of the location designated by the worker agent. In addition, information about the new pathological slide image may be inserted into the database of the job management apparatus 100 or the storage server 10 by the worker agent. Then, information about the new pathological slide image may be obtained from the database.

단계 S200에서, 관리 장치(100)는 상기 병리 슬라이드 이미지에 대한 어노테이션 작업을 생성한다. 여기서, 상기 어노테이션 작업은 상기 병리 슬라이드 이미지, 데이터셋 타입, 어노테이션 태스크 및 상기 병리 슬라이드 이미지의 일부 영역(즉, 어노테이션 대상 영역)인 패치 등의 정보에 기초하여 정의될 수 있다(도 4 참조). 본 단계 S200에 대한 자세한 설명은 도 8 내지 도 23을 참조하여 후술하도록 한다.In operation S200, the management apparatus 100 generates an annotation job for the pathological slide image. Here, the annotation task may be defined based on information such as the pathological slide image, the dataset type, the annotation task, and a patch which is a partial region (ie, an annotation target region) of the pathological slide image (see FIG. 4). A detailed description of this step S200 will be described later with reference to FIGS. 8 to 23.

단계 S300에서, 관리 장치(100)는 상기 생성된 어노테이션 작업을 수행할 어노테이터를 선정한다.In operation S300, the management apparatus 100 selects an annotationr to perform the generated annotation work.

몇몇 실시예에서, 도 6에 도시된 바와 같이, 관리 장치(100)는 어노테이터(51 내지 53)의 작업 수행 이력(e.g. 자주 진행했던 어노테이션 작업 등), 기 수행한 작업의 평가 결과(또는 검증 결과), 현재 작업 현황(e.g. 현재 할당된 작업의 진행 상태) 등의 관리 정보(54 내지 56) 에 기초하여 어노테이터를 자동으로 선정할 수 있다. 예를 들어, 관리 장치(100)는 상기 생성된 어노테이션 작업과 연관된 작업을 자주 수행했던 제1 어노테이터, 연관 작업에 대한 어노테이션 결과가 우수했던 제2 어노테이터, 현재 진행 중인 작업이 많지 않은 제3 어노테이터 등을 상기 생성된 어노테이션 작업의 어노테이터로 선정할 수 있다.In some embodiments, as shown in FIG. 6, the management apparatus 100 may perform a task execution history of the annotators 51 to 53 (eg, frequently performed annotation tasks, etc.), evaluation results (or verification of previously performed tasks). As a result, the annotator can be automatically selected based on the management information 54 to 56 such as the current work status (eg, the progress status of the currently assigned work). For example, the management device 100 may include a first annotator that frequently performs a task associated with the generated annotation task, a second annotator having excellent annotation results for the associated task, and a third that does not have many ongoing tasks. Annotators and the like may be selected as annotators of the generated annotation work.

여기서, 작업 수행 이력에 상기 생성된 어노테이션 작업과 연관된 작업이 포함되어 있는지 여부는 각 작업의 데이터셋 타입과 어노테이션 태스크의 패널의 조합이 서로 유사한지 여부에 기초하여 판정될 수 있다. 또는, 어노테이션 태스크의 패널 및 태스크 클래스의 조합이 서로 유사한지 여부에 기초하여 판정될 수도 있다. 물론, 상기 두가지 조합이 모두 유사한지 여부에 기초하여 판정될 수도 있다.Here, whether the task execution history includes a task associated with the generated annotation task may be determined based on whether the combination of the dataset type of each task and the panel of the annotation task are similar to each other. Or, it may be determined based on whether the combination of the panel and task class of the annotation task are similar to each other. Of course, it may be determined based on whether the above two combinations are similar.

몇몇 실시예에서, 상기 신규의 병리 슬라이드 이미지가 중요 데이터(e.g. 희귀병과 연관된 슬라이드 이미지, 고품질의 슬라이드 이미지 등)인 경우, 복수의 어노테이터가 선정될 수 있다. 또한, 어노테이터의 사람 수는 상기 중요도에 비례하여 증가될 수도 있다. 이와 같은 경우, 상기 복수의 어노테이터의 작업 결과를 상호 비교함으로써, 어노테이션 결과에 대한 검증이 수행될 수 있다. 본 실시예에 따르면, 중요 데이터에 대하여 보다 엄격한 검증이 수행됨으로써, 어노테이션 결과의 정확성이 향상될 수 있다.In some embodiments, when the new pathological slide image is critical data (e.g. slide image associated with rare disease, high quality slide image, etc.), a plurality of annotators may be selected. In addition, the number of persons in the annotator may be increased in proportion to the importance. In this case, by comparing the work results of the plurality of annotators with each other, verification of the annotation results may be performed. According to this embodiment, more stringent verification is performed on important data, so that the accuracy of annotation results can be improved.

단계 S400에서, 관리 장치(100)는 선정된 어노테이터의 단말(20)에게 어노테이션 작업을 할당한다. 가령, 관리 장치(100)는 상기 선정된 어노테이터의 계정에 어노테이션 작업을 할당할 수 있다.In operation S400, the management apparatus 100 allocates an annotation task to the terminal 20 of the selected annotator. For example, the management device 100 may assign an annotation task to an account of the selected annotator.

단계 S500에서, 어노테이터 단말(20)에서 어노테이션이 수행된다. 어노테이터는 단말(20)에 설치된 어노테이터 툴 또는 웹(e.g. 작업 관리 웹 페이지)을 통해 제공되는 어노테이션 서비스를 이용하여 어노테이션을 수행할 수 있을 것이나, 본 개시의 기술적 범위가 이에 한정되는 것은 아니다.In step S500, annotation is performed in the annotator terminal 20. The annotator may perform annotation using an annotation tool provided on the terminal 20 or an annotation service provided through a web (e.g., a job management web page), but the technical scope of the present disclosure is not limited thereto.

상기 어노테이터 툴의 몇몇 예시는 도 6에 도시되어 있다. 도 6에 도시된 바와 같이, 어노테이션 툴(60)은 제1 영역(63)과 제2 영역(61)을 포함할 수 있다. 제2 영역(61)에는 실제 어노테이션이 수행될 패치 영역(68)과 확대/축소 인디케이터(65)가 포함될 수 있다. 도 6에 도시된 바와 같이, 패치 영역(68)에는 박스 라인 등의 하이라이트 처리가 이루어질 수 있다. 제1 영역(63)에는 작업 정보(67)가 표시되고, 도구 영역(69)이 더 포함될 수 있다. 도구 영역(69)에는 각 어노테이션에 대응되는 선택 가능한 도구들이 포함될 수 있다. 따라서, 어노테이터는 패치 영역(68)에 직접 어노테이션을 기입하지 않고, 간편하게 선택된 도구를 이용하여 패치 영역(68)에 어노테이션을 태깅할 수 있다(e.g. 클릭을 통해 제1 도구를 선택하고, 패치 영역 68을 다시 클릭하여 태깅 수행). 도구 영역(63)에 표시되는 어노테이션의 종류는 어노테이션 작업에 따라 달라질 것이므로, 어노테이션 툴(60)은 어노테이션 작업 정보에 기초하여 적절한 어노테이션 도구를 세팅할 수 있다.Some examples of such annotator tools are shown in FIG. 6. As shown in FIG. 6, the annotation tool 60 may include a first region 63 and a second region 61. The second area 61 may include a patch area 68 and an enlargement / reduction indicator 65 on which the actual annotation is to be performed. As illustrated in FIG. 6, the patch area 68 may be subjected to highlight processing such as a box line. The job information 67 is displayed in the first area 63, and the tool area 69 may be further included. The tool area 69 may include selectable tools corresponding to each annotation. Therefore, the annotator can tag an annotation in the patch area 68 using a simply selected tool without directly annotating the patch area 68 (eg, select the first tool by clicking on the patch area 68). Click 68 again to perform tagging). Since the type of annotation displayed in the tool area 63 will vary depending on the annotation task, the annotation tool 60 may set an appropriate annotation tool based on the annotation task information.

도 6에 도시된 어노테이션 툴(60)은 어노테이터의 편의성을 위해 고안된 툴의 일 예시를 도시하고 있을 뿐임에 유의하여야 한다. 즉, 어노테이션 툴은 어떠한 방식으로 구현되더라도 무방하다. 다시 도 5를 참조하여 설명을 이어가도록 한다.It should be noted that the annotation tool 60 shown in FIG. 6 only shows one example of a tool designed for convenience of annotators. In other words, the annotation tool can be implemented in any way. The description will be continued with reference to FIG. 5 again.

단계 S600에서, 어노테이터 단말(20)은 어노테이션 작업의 결과를 제공한다. 어노테이션 작업의 결과는 해당 패치에 태깅된 레이블 정보가 될 수 있다.In step S600, the annotator terminal 20 provides a result of the annotation work. The result of the annotation operation may be label information tagged to the patch.

단계 S700에서, 관리 장치(100)는 작업 결과에 대한 검증(평가)을 수행한다. 상기 검증 결과는 해당 어노테이터의 평가 결과로 기록될 수 있다. 상기 검증을 수행하는 구체적인 방식은 실시예에 따라 달라질 수 있다.In operation S700, the management device 100 performs verification (evaluation) on the work result. The verification result may be recorded as the evaluation result of the annotator. The specific way of performing the verification may vary depending on the embodiment.

몇몇 실시예에서, 기계학습 모델의 출력 결과에 기초하여 자동으로 검증이 수행될 수 있다. 구체적으로, 작업을 할당받은 어노테이터로부터 제1 어노테이션 결과 데이터가 획득되면, 상기 제1 어노테이션 결과 데이터와 상기 어노테이션 작업의 패치를 상기 기계학습 모델에 입력한 결과가 비교될 수 있다. 상기 비교 결과, 두 결과의 차이가 기준치를 초과하면 상기 제1 어노테이션 결과 데이터의 승인은 보류되거나 미승인 처리될 수 있다.In some embodiments, verification may be performed automatically based on the output of the machine learning model. Specifically, when the first annotation result data is obtained from the assignee assigned to the task, the first annotation result data may be compared with a result of inputting a patch of the annotation task to the machine learning model. As a result of the comparison, if the difference between the two results exceeds the reference value, the approval of the first annotation result data may be suspended or disapproved.

여기서, 상기 기준치는 기 설정된 고정 값 또는 상황에 따라 변동되는 변동 값일 수 있다. 가령, 상기 기준치는 상기 기계학습 모델의 정확도가 높을수록 더 작은 값으로 변동되는 값일 수 있다.Here, the reference value may be a predetermined fixed value or a change value that varies depending on a situation. For example, the reference value may be a value that is changed to a smaller value as the accuracy of the machine learning model is higher.

단계 S800에서, 관리 장치(100)는 어노테이션 작업이 다시 수행되어야 하는지 여부를 판정한다. 가령, 단계 S700에서 검증이 성공적으로 수행되지 않은 경우, 관리 장치(100)는 재작업이 필요하다는 결정을 내릴 수 있다.In step S800, the management device 100 determines whether the annotation work should be performed again. For example, if verification is not successfully performed in step S700, the management device 100 may determine that rework is required.

단계 S900에서, 재작업 필요 결정에 응답하여, 관리 장치(100)는 다른 어노테이터를 선정하고, 상기 다른 어노테이터에게 어노테이션 작업을 재할당한다. 이때, 상기 다른 어노테이터는 단계 S300에서 설명한 바와 유사한 방식으로 선정될 수 있다. 또는, 상기 다른 어노테이터는 리뷰자 또는 성능이 가장 우수한 기계학습 모델이 될 수도 있다.In step S900, in response to the rework necessity determination, the management apparatus 100 selects another annotator and reassigns the annotation work to the other annotator. In this case, the other annotators may be selected in a similar manner as described in step S300. Alternatively, the other annotator may be a reviewer or a machine learning model having the best performance.

도 5에는 도시되어 있지 않으나, 단계 S900 이후에, 상기 다른 어노테이터의 제2 어노테이션 결과 데이터에 기초하여 상기 제1 어노테이션 결과 데이터에 대한 검증이 다시 수행될 수 있다. 구체적으로, 상기 제2 어노테이션 결과 데이터가 획득되면, 상기 제1 어노테이션 결과 데이터와 상기 제2 어노테이션 결과 데이터의 유사도가 산출될 수 있다. 또한, 상기 유사도가 기준치 미만이면 상기 제1 어노테이션 결과 데이터는 최종적으로 미승인 처리될 수 있다. 이와 같은 처리 결과는 해당 어노테이터의 작업 수행 이력에 기록될 수 있다.Although not shown in FIG. 5, after step S900, the verification of the first annotation result data may be performed again based on the second annotation result data of the other annotator. Specifically, when the second annotation result data is obtained, a similarity degree between the first annotation result data and the second annotation result data may be calculated. In addition, when the similarity is less than the reference value, the first annotation result data may be finally unapproved. The result of such processing may be recorded in the job execution history of the annotator.

지금까지 도 4 내지 도 7을 참조하여 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 관리 방법에 대하여 설명하였다. 상술한 방법에 따르면, 어노테이션 작업이 전반적으로 자동화됨에 따라 관리자의 편의성이 증대되고, 전반적인 작업 효율성이 크게 향상될 수 있다. 이에 따라, 어노테이션 작업에 소요되는 시간 비용 및 인적 비용이 크게 절감될 수 있다. 또한, 어노테이션 작업의 부담이 감소됨에 따라, 기계 학습 기법의 활용성은 더욱 증대될 수 있다.So far, the annotation task management method according to some embodiments of the present disclosure has been described with reference to FIGS. 4 to 7. According to the above-described method, as the annotation work is generally automated, the convenience of the manager may be increased and the overall work efficiency may be greatly improved. Accordingly, the time cost and human cost required for the annotation work can be greatly reduced. In addition, as the burden of annotation work is reduced, the utility of machine learning techniques can be further increased.

나아가, 어노테이션 작업 결과를 기계학습 모델 또는 다른 어노테이터의 결과와 비교 검증함으로써, 어노테이션 결과의 정확성이 담보될 수 있다. 이에 따라, 어노테이션 결과를 학습한 기계학습 모델의 성능도 향상될 수 있다.Furthermore, the accuracy of the annotation results can be ensured by comparing and verifying the results of annotation work with those of machine learning models or other annotators. Accordingly, the performance of the machine learning model learning the annotation result may also be improved.

이하에서는, 도 8 내지 도 22를 참조하여 어노테이션 작업 생성 단계 S200의 세부 과정에 대하여 상세하게 설명하도록 한다.Hereinafter, a detailed process of the annotation job generation step S200 will be described in detail with reference to FIGS. 8 to 22.

도 8은 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 생성 방법을 나타내는 예시적인 흐름도이다. 단, 이는 본 개시의 목적을 달성하기 위한 바람직한 실시예일뿐이며, 필요에 따라 일부 단계가 추가되거나 삭제될 수 있음은 물론이다.8 is an exemplary flowchart illustrating a method of generating an annotation task according to some embodiments of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, it is a matter of course that some steps can be added or deleted as necessary.

도 8에 도시된 바와 같이, 상기 어노테이션 작업 생성 방법은 신규의 병리 슬라이드 이미지의 데이터셋 타입을 결정하는 단계 S210에서 시작된다. 전술한 바와 같이, 상기 데이터셋 타입은 상기 병리 슬라이드 이미지의 활용 용도를 가리키는 것으로, 용도는 학습 용도, 검증 용도, 테스트 용도 또는 OPT(Observer Performance Test) 용도 등으로 구분될 수 있다.As shown in FIG. 8, the method for generating an annotation task begins at step S210 of determining a dataset type of a new pathological slide image. As described above, the dataset type indicates the use of the pathological slide image, and the use may be classified into learning use, verification use, test use, or OPT (Observer Performance Test) use.

몇몇 실시예에서, 상기 데이터셋 타입은 관리자의 선택에 의해 결정될 수 있다. In some embodiments, the dataset type may be determined by the administrator's choice.

다른 몇몇 실시예에서, 상기 데이터셋 타입은 병리 슬라이드 이미지에 대한 기계학습 모델의 컨피던스 스코어에 기초하여 결정될 수 있다. 여기서, 상기 기계학습 모델은 병리 슬라이드 이미지에 기초하여 특정 태스크(e.g. 병변 분류, 병변 위치 인식 등)를 수행하는 모델(즉, 학습 대상 모델)을 의미한다. 본 실시예에 대한 자세한 내용은 도 9에 도시되어 있다. 도 9에 도시된 바와 같이, 관리 장치(100)는 병리 슬라이드 이미지를 기계학습 모델에 입력하고, 그 결과로 컨피던스 스코어를 획득하며(S211), 상기 컨피던스 스코어가 기준치 이상인지 여부를 판정한다(S213). 또한, 기준치 미만이라는 판정에 응답하여, 관리 장치(100)는 상기 병리 슬라이드 이미지의 데이터셋 타입을 학습 용도로 결정한다(S217). 컨피던스 스코어가 기준치 미만이라는 것은 기계학습 모델이 상기 병리 슬라이드 이미지를 명확하게 판단하지 못한다는 것을 의미하기 때문이다(즉, 해당 병리 슬라이드 이미지에 대한 학습이 필요하다는 것을 의미하기 때문이다). 반대의 경우, 상기 병리 슬라이드 이미지의 데이터셋 타입은 검증 용도(또는 테스트 용도)로 결정된다(S215).In some other embodiments, the dataset type may be determined based on the confidence score of the machine learning model for the pathological slide image. Here, the machine learning model refers to a model (ie, a learning target model) that performs a specific task (e.g. lesion classification, lesion location recognition, etc.) based on the pathological slide image. Details of the present embodiment are shown in FIG. 9. As shown in FIG. 9, the management apparatus 100 inputs a pathological slide image into the machine learning model, obtains a confidence score as a result (S211), and determines whether the confidence score is greater than or equal to the reference value (S213). ). In addition, in response to the determination that the reference value is less than the reference value, the management apparatus 100 determines the data set type of the pathological slide image for learning purposes (S217). This is because the confidence score below the baseline means that the machine learning model does not clearly determine the pathological slide image (ie, it means that learning on that pathological slide image is necessary). On the contrary, the dataset type of the pathological slide image is determined for verification (or test) (S215).

또 다른 몇몇 실시예에서, 상기 데이터셋 타입은 병리 슬라이드 이미지에 대한 기계학습 모델의 엔트로피(entropy) 값에 기초하여 결정될 수 있다. 상기 엔트로피 값은 불확실성(uncertainty)를 나타내는 지표로, 컨피던스 스코어가 클래스 별로 고르게 분포할수록 큰 값을 가질 수 있다. 본 실시예에서, 상기 엔트로피 값이 기준치 이상이라는 판정에 응답하여, 상기 데이터셋 타입은 학습 용도로 결정될 수 있다. 반대의 경우라면, 검증 용도로 결정될 수 있다.In some other embodiments, the dataset type may be determined based on the entropy value of the machine learning model for the pathological slide image. The entropy value is an index indicating uncertainty and may have a larger value as the confidence score is distributed evenly for each class. In this embodiment, in response to determining that the entropy value is greater than or equal to the reference value, the dataset type may be determined for learning purposes. In the opposite case, it may be determined for verification purposes.

다시 도 8을 참조하면, 단계 S230에서, 관리 장치(100)는 병리 슬라이드 이미지의 패널 유형을 결정한다. 전술한 바와 같이, 상기 패널 유형은 세포 패널, 조직 패널 및 스트럭처 패널 등으로 구분될 수 있다. 상기 세포 패널 유형의 이미지의 예는 도 10에 도시되어 있고, 상기 조직 패널의 이미지의 예는 도 11에 도시되어 있으며, 상기 스트럭처 패널의 이미지의 예는 도 12에 도시되어 있다. 도 10 내지 도 12에 도시된 바와 같이, 세포 패널은 세포 레벨의 어노테이션이 수행되는 패치 유형이고, 조직 패널은 조직 레벨의 어노테이션이 수행되는 패치 유형이며, 조직 패널은 세포 또는 조직 등의 구조와 연관된 어노테이션이 수행되는 패치 유형이다.Referring back to FIG. 8, in step S230, the management device 100 determines the panel type of the pathological slide image. As described above, the panel types may be divided into cell panels, tissue panels, structure panels, and the like. An example of the image of the cell panel type is shown in FIG. 10, an example of the image of the tissue panel is shown in FIG. 11, and an example of the image of the structure panel is shown in FIG. 12. As shown in FIGS. 10-12, the cell panel is a patch type on which cell level annotation is performed, the tissue panel is a patch type on which tissue level annotation is performed, and the tissue panel is associated with a structure such as a cell or tissue. The type of patch on which the annotation is performed.

몇몇 실시예에서, 상기 패널 유형은 관리자의 선택에 의해 결정될 수 있다.In some embodiments, the panel type may be determined by the manager's choice.

몇몇 실시예에서, 상기 패널 유형은 기계학습 모델의 출력 값에 기초하여 결정될 수 있다. 도 13을 참조하여 부연 설명하면, 기계학습 모델에는 세포 패널에 대응되는 제1 기계학습 모델(75-1, 즉 세포 레벨의 어노테이션을 학습한 모델), 조직 패널에 대응되는 제2 기계학습 모델(75-2) 및 스트럭처 패널에 대응되는 제3 기계학습 모델(75-3)이 포함될 수 있다. 이와 같은 경우, 관리 장치(100)는 주어진 병리 슬라이드 이미지(71)에서 각각의 패널에 대응되는 제1 내지 제3 이미지(73-1 내지 73-3)를 추출(또는 샘플링)하고, 각 이미지를 대응되는 모델(75-1 내지 75-3)에 입력하며, 그 결과로 출력 값(77-1 내지 77-3)을 획득할 수 있다. 또한, 관리 장치(100)는 출력 값(77-1 내지 77-3)과 기준치와의 비교 결과에 따라 병리 슬라이드 이미지(71)의 패널 유형을 결정할 수 있다. 가령, 제1 출력 값(77-1)이 상기 기준치 미만인 경우, 병리 슬라이드 이미지(71)의 패널 유형은 세포 패널로 결정될 수 있다. 병리 슬라이드 이미지(71)에서 추출되는 세포 패치들이 제1 기계학습 모델(75-1)의 학습 성능을 올리는데 효과적일 것이기 때문이다.In some embodiments, the panel type may be determined based on the output value of the machine learning model. Referring to FIG. 13, the machine learning model includes a first machine learning model 75-1 corresponding to a cell panel (that is, a model learning an annotation of cell level) and a second machine learning model corresponding to a tissue panel ( 75-2) and a third machine learning model 75-3 corresponding to the structure panel. In such a case, the management apparatus 100 extracts (or samples) the first to third images 73-1 to 73-3 corresponding to each panel from a given pathological slide image 71, and extracts each image. It inputs to the corresponding models 75-1 to 75-3, and as a result, output values 77-1 to 77-3 can be obtained. In addition, the management apparatus 100 may determine the panel type of the pathological slide image 71 according to a result of comparing the output values 77-1 to 77-3 with a reference value. For example, when the first output value 77-1 is less than the reference value, the panel type of the pathological slide image 71 may be determined as a cell panel. This is because the cell patches extracted from the pathological slide image 71 may be effective in increasing the learning performance of the first machine learning model 75-1.

몇몇 실시예에서, 병리 슬라이드 이미지가 복수의 패널 유형을 가질 수도 있다. 이와 같은 경우, 상기 병리 슬라이드 이미지로부터 각 패널에 대응되는 패치들이 추출될 수 있다.In some embodiments, pathological slide images may have multiple panel types. In this case, patches corresponding to each panel may be extracted from the pathological slide image.

다시 도 8을 참조하면, 단계 S250에서, 관리 장치(100)는 어노테이션 태스크를 결정한다. 전술한 바와 같이, 어노테이션 태스크는 세부 작업 유형이 정의해 놓은 엔티티를 의미한다.Referring back to FIG. 8, in step S250, the management device 100 determines an annotation task. As described above, the annotation task refers to an entity defined by the detailed job type.

몇몇 실시예에서, 상기 어노테이션 태스크는 관리자의 선택에 의해 결정될 수 있다.In some embodiments, the annotation task may be determined by the administrator's choice.

몇몇 실시예에서, 상기 어노테이션 태스크는 상기 결정된 데이터셋 타입과 패널 유형의 조합에 기초하여 자동으로 결정될 수도 있다. 가령, 데이터셋 타입과 패널 유형의 조합에 매칭되는 어노테이션 태스크가 미리 정의되어 있는 경우, 상기 조합에 에 기초하여 상기 매칭되는 어노테이션 태스크가 자동으로 결정될 수 있다.In some embodiments, the annotation task may be automatically determined based on the determined combination of dataset type and panel type. For example, when an annotation task matching a combination of a dataset type and a panel type is predefined, the matching annotation task may be automatically determined based on the combination.

단계 S270에서, 관리 장치(100)는 병리 슬라이드 이미지에서 실제 어노테이션이 수행될 패치를 자동으로 추출한다. 물론, 관리자에 의해 지정된 영역이 패치로 추출될 수도 있다. 상기 패치를 자동으로 추출하는 구체적인 방법은 실시예에 따라 달라질 수 있는데, 패치 추출과 관련된 다양한 실시예들은 도 14 내지 도 23을 참조하여 후술하도록 한다.In operation S270, the management apparatus 100 automatically extracts a patch on which the actual annotation is to be performed from the pathological slide image. Of course, the area designated by the administrator may be extracted as a patch. The specific method of automatically extracting the patch may vary according to embodiments, and various embodiments related to patch extraction will be described later with reference to FIGS. 14 to 23.

도 8에는 도시되어 있지 않으나, 단계 S270 이후에, 관리 장치(100)는 단계 S210 내지 S270에서 결정된 데이터셋 타입, 패널 유형, 어노테이션 태스크 및 패치에 기반하여 어노테이션 작업을 생성할 수 있다. 전술한 바와 같이, 생성된 어노테이션 작업은 적절한 어노테이터의 계정에 할당될 수 있다.Although not shown in FIG. 8, after step S270, the management device 100 may generate an annotation job based on the data set type, panel type, annotation task, and patch determined in steps S210 to S270. As mentioned above, the generated annotation task may be assigned to the account of the appropriate annotator.

지금까지 도 8 내지 도 13을 참조하여 본 개시의 몇몇 실시예들에 따른 어노테이션 작업 생성 방법에 대하여 설명하였다. 이하에서는, 패치 자동 추출과 관련된 본 개시의 다양한 실시예들에 대하여 도 14 내지 도 23을 참조하여 설명하도록 한다.So far, the method for generating an annotation task according to some embodiments of the present disclosure has been described with reference to FIGS. 8 to 13. Hereinafter, various embodiments of the present disclosure related to automatic patch extraction will be described with reference to FIGS. 14 to 23.

도 14는 본 개시의 제1 실시예에 따른 패치 자동 추출 방법을 나타내는 예시적인 흐름도이다. 단, 이는 본 개시의 목적을 달성하기 위한 바람직한 실시예일뿐이며, 필요에 따라 일부 단계가 추가되거나 삭제될 수 있음은 물론이다.14 is an exemplary flowchart illustrating a patch automatic extraction method according to a first embodiment of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, it is a matter of course that some steps can be added or deleted as necessary.

도 14에 도시된 바와 같이, 상기 패치 자동 추출 방법은 신규의 병리 슬라이드 이미지에서 복수의 후보 패치를 샘플링하는 단계 S271에서 시작된다. 상기 복수의 후보 패치를 샘플링하는 구체적인 방식은 실시예에 따라 달라질 수 있다.As shown in FIG. 14, the patch automatic extraction method starts at step S271 of sampling a plurality of candidate patches in a new pathological slide image. A specific manner of sampling the plurality of candidate patches may vary depending on the embodiment.

몇몇 실시예에서, 특정 조직을 구성하는 적어도 세포 영역들을 후보 패치(즉, 세포 패널 유형의 패치들)로 샘플링하는 경우, 도 15에 도시된 바와 같이, 병리 슬라이드 이미지(81)에서 영상 분석을 통해 조직 영역(83)을 추출하고, 추출된 영역(83) 내에서 복수의 후보 패치들(85)이 샘플링될 수 있다. 샘플링 결과의 몇몇 예시는 도 16 및 도 17에 도시되어 있다. 도 16 및 도 17에 도시된 병리 슬라이드 이미지(87, 89)에서, 각 포인트는 샘플링 포인트를 의미하고, 사각형의 도형은 샘플링 영역(즉, 후보 패치 영역)을 의미한다. 도 16 및 도 17에 도시된 바와 같이, 복수의 후보 패치들은 적어도 일부가 중첩되는 형태로 샘플링될 수도 있다.In some embodiments, when sampling at least the cell regions constituting a particular tissue with candidate patches (ie, cell panel type patches), through image analysis in pathological slide image 81, as shown in FIG. 15. The tissue region 83 may be extracted, and a plurality of candidate patches 85 may be sampled in the extracted region 83. Some examples of sampling results are shown in FIGS. 16 and 17. In the pathological slide images 87 and 89 shown in Figs. 16 and 17, each point means a sampling point, and a rectangular figure means a sampling area (ie, a candidate patch area). As illustrated in FIGS. 16 and 17, the plurality of candidate patches may be sampled in such a manner that at least some overlap.

몇몇 실시예에서, 병리 슬라이드 이미지의 전체 영역을 균일하게 분할하고, 분할된 각각의 영역들을 샘플링하여 후보 패치들이 생성될 수 있다. 즉, 균등 분할 방식으로 샘플링이 수행될 수 있다. 이때, 각 후보 패치들의 크기는 기 설정된 고정 값 또는 병리 슬라이드 이미지의 크기, 해상도, 패널 유형 등에 기초하여 결정되는 변동 값일 수 있다.In some embodiments, candidate patches may be generated by uniformly dividing an entire area of a pathological slide image and sampling each of the divided areas. That is, sampling may be performed in an equally divided manner. In this case, the size of each candidate patch may be a variation value determined based on a predetermined fixed value or the size, resolution, panel type, etc. of the pathological slide image.

몇몇 실시예에서, 병리 슬라이드 이미지의 전체 영역을 랜덤하게 분할하고, 분할된 각각의 영역들을 샘플링하여 후보 패치들이 생성될 수 있다.In some embodiments, candidate patches may be generated by randomly dividing an entire area of a pathological slide image and sampling each of the divided areas.

몇몇 실시예에서, 객체의 개수가 기준치를 초과하도록 후보 패치들이 형성될 수 있다. 가령, 상기 병리 슬라이드 이미지의 전체 영역에 대하여 객체 인식을 수행하고, 상기 객체 인식의 결과 산출된 객체의 개수가 기준치를 초과하는 영역이 후보 패치로 샘플링될 수 있다. 이와 같은 경우, 후보 패치의 크기는 서로 다를 수도 있다.In some embodiments, candidate patches may be formed such that the number of objects exceeds a threshold. For example, object recognition may be performed on the entire area of the pathological slide image, and an area in which the number of objects calculated as a result of the object recognition exceeds a reference value may be sampled as a candidate patch. In such a case, the size of the candidate patch may be different.

몇몇 실시예에서, 병리 슬라이드 이미지의 메타 데이터에 기반하여 결정된 정책에 따라 분할된 후보 패치가 샘플링될 수 있다. 여기서, 상기 메타 데이터는 상기 병리 슬라이드 이미지와 연관된 병명, 조직(tissue), 환자의 인구통계학적 정보, 의료기관의 위치, 상기 병리 슬라이드 이미지의 품질(e.g. 해상도), 포맷 형식 등이 될 수 있다. 구체적인 예를 들어, 병리 슬라이드 이미지가 종양 환자의 조직에 관한 이미지인 경우, 유사 분열 세포 검출을 위한 기계학습 모델의 학습 데이터로 이용하기 위해, 세포 레벨로 후보 패치가 샘플링될 수 있다. 다른 예를 들어, 병리 슬라이드 이미지와 연관된 병명의 예후를 진단할 때 조직 내 병변의 위치가 중요한 경우, 조직 레벨로 후보 패치가 샘플링될 수도 있다.In some embodiments, candidate patches split may be sampled according to a policy determined based on metadata of the pathological slide image. Here, the metadata may be a pathology associated with the pathological slide image, a tissue, demographic information of a patient, a location of a medical institution, a quality (e.g. resolution) of the pathological slide image, a format, and the like. As a specific example, if the pathological slide image is an image of a tissue of a tumor patient, candidate patches may be sampled at the cell level to use as learning data of a machine learning model for mitotic cell detection. In another example, candidate patches may be sampled at the tissue level if the location of the lesion in the tissue is important when diagnosing the prognosis of the pathology associated with the pathological slide image.

몇몇 실시예에서, 병리 슬라이드 이미지에서 스트럭처 패널 유형의 후보 패치들을 샘플링하는 경우, 영상 분석을 통해 상기 병리 슬라이드 이미지에서 외곽선이 추출되고, 상기 추출된 외곽선 중에서 서로 연결된 외곽선들이 하나의 후보 패치를 형성하도록 샘플링이 수행될 수도 있다.In some embodiments, when sampling candidate patches of the structure panel type in a pathological slide image, an outline is extracted from the pathological slide image through image analysis, and the outlines connected to each other among the extracted outlines form one candidate patch. Sampling may be performed.

이와 같이, 단계 S271에서 복수의 후보 패치를 샘플링하는 구체적인 방식은 실시예에 따라 달라질 수 있다. 다시 도 14를 참조하여 설명을 이어가도록 한다.As such, the detailed method of sampling the plurality of candidate patches in step S271 may vary according to an exemplary embodiment. The description will be continued with reference to FIG. 14 again.

단계 S273에서, 기계학습 모델의 출력 값에 기반하여 어노테이션 대상 패치가 선정된다. 상기 출력 값은 예를 들어 컨피던스 스코어(또는 클래스 별 컨피던스 스코어)가 될 수 있는데, 상기 컨피던스 스코어에 기초하여 패치를 선정하는 구체적인 방식은 실시예에 따라 달라질 수 있다.In step S273, the annotation target patch is selected based on the output value of the machine learning model. The output value may be, for example, a confidence score (or a confidence score for each class). A specific manner of selecting a patch based on the confidence score may vary according to embodiments.

몇몇 실시예에서, 클래스 별 컨피던스 스코어에 의해 산출된 엔트로피 값에 기초하여 어노테이션 대상 패치가 선정될 수 있다. 본 실시예에 대한 자세한 내용은 도 18 및 도 19에 도시되어 있다.In some embodiments, the annotation target patch may be selected based on the entropy value calculated by the class-specific confidence scores. Details of this embodiment are shown in FIGS. 18 and 19.

도 18에 도시된 바와 같이, 병리 슬라이드 이미지(91)에서 샘플링된 후보 패치들(92)로부터 엔트로피 값 기반의 불확실성 샘플링을 통해 어노테이션 대상 패치(93)가 선정될 수 있다. 보다 구체적으로, 도 19에 도시된 바와 같이, 기계학습 모델(95)로부터 출력된 각 후보 패치(94-1 내지 94-n)의 클래스 별 컨피던스 스코어(96-1 내지 96-n)를 기초로 엔트로피 값(97-1 내지 97-n)이 산출될 수 있다. 전술한 바와 같이, 엔트로피 값은 컨피던스 스코어가 클래스 별로 고르게 분포할수록 큰 값을 갖게 된다. 가령, 도 19에 도시된 경우라면, 엔트로피 A(97-1)가 가장 큰 값으로 연산되고, 엔트로피 C(97-n)가 가장 작은 값으로 연산된다. 또한, 엔트로피 값이 기준치 이상인 후보 패치들이 어노테이션 대상으로 자동 선정될 수 있다. 엔트로피 값이 높다는 것은 기계학습 모델의 예측 결과가 불확실하다는 것을 의미하고, 이는 곧 학습에 보다 효과적인 데이터라는 것을 의미하기 때문이다.As shown in FIG. 18, the annotation target patch 93 may be selected from the uncertainty sampling based on entropy values from candidate patches 92 sampled in the pathological slide image 91. More specifically, as shown in FIG. 19, based on the class-specific confidence scores 96-1 to 96-n of each candidate patch 94-1 to 94-n output from the machine learning model 95. Entropy values 97-1 to 97-n can be calculated. As described above, the entropy value has a larger value as the confidence score is distributed evenly by class. For example, in the case shown in Fig. 19, entropy A 97-1 is calculated with the largest value and entropy C 97-n is calculated with the smallest value. In addition, candidate patches having an entropy value of more than a reference value may be automatically selected as an annotation target. Higher entropy values mean that the predictive results of machine learning models are uncertain, which means more effective data for learning.

몇몇 실시예에서, 상기 컨피던스 스코어 자체에 기초하여 어노테이션 대상 패치가 선정될 수도 있다. 가령, 복수의 후보 패치 중에서, 컨피던스 스코어가 기준치 미만인 후보 패치가 상기 어노테이션 대상 패치로 선정될 수 있다.In some embodiments, an annotation target patch may be selected based on the confidence score itself. For example, among a plurality of candidate patches, candidate patches having a confidence score below a reference value may be selected as the annotation target patch.

도 20은 본 개시의 제2 실시예들에 따른 패치 자동 추출 방법을 나타내는 예시적인 흐름도이다. 단, 이는 본 개시의 목적을 달성하기 위한 바람직한 실시예일뿐이며, 필요에 따라 일부 단계가 추가되거나 삭제될 수 있음은 물론이다. 명세서의 명료함을 위해, 전술한 실시예와 중복되는 내용에 대한 설명은 생략하도록 한다.20 is an exemplary flowchart illustrating a patch automatic extraction method according to second embodiments of the present disclosure. However, this is only a preferred embodiment for achieving the purpose of the present disclosure, it is a matter of course that some steps can be added or deleted as necessary. For clarity of description, descriptions of overlapping contents with the above-described embodiments will be omitted.

도 20에 도시된 바와 같이, 상기 제2 실시예 또한 복수의 후보 패치를 샘플링하는 단계 S271에서 시작된다. 다만, 상기 제2 실시예에서는 기계학습 모델의 오예측 확률에 기반하여 어노테이션 대상 패치가 선정된다는 점에서(S275 참조), 전술한 실시예와 차이가 있다. As shown in Fig. 20, the second embodiment also begins in step S271 of sampling a plurality of candidate patches. However, in the second embodiment, the annotation target patch is selected based on a misprediction probability of the machine learning model (see S275).

상기 기계학습 모델의 오예측 확률은 기계 학습을 통해 구축된 오예측 확률 산출 모델(이하, "산출 모델"로 약칭함)에 의해 산출될 수 있는데, 이해의 편의를 제공하기 위해, 먼저 상기 산출 모델을 구축하는 방법에 대하여 도 21 및 도 22를 참조하여 설명하도록 한다.The misprediction probability of the machine learning model may be calculated by a misprediction probability calculation model (hereinafter, abbreviated as "calculation model") constructed through machine learning. For convenience of understanding, first, the calculation model A method of constructing the same will be described with reference to FIGS. 21 and 22.

도 21에 도시된 바와 같이, 상기 산출 모델은 상기 기계학습 모델의 평가 결과(e.g. 검증 결과, 테스트 결과)를 학습함으로써 구축될 수 있다(S291 내지 S295). 구체적으로, 평가용 데이터로 상기 기계학습 모델을 평가하고(S291), 평가 결과가 상기 평가용 데이터에 레이블 정보로 태깅되면(S293), 상기 평가용 데이터를 상기 레이블 정보로 학습함으로써 상기 산출 모델이 구축될 수 있다(S295).As shown in FIG. 21, the calculation model may be constructed by learning evaluation results (e.g. verification results and test results) of the machine learning model (S291 to S295). Specifically, when the machine learning model is evaluated with evaluation data (S291), and the evaluation result is tagged with the label information on the evaluation data (S293), the calculation model is learned by learning the evaluation data with the label information. It may be established (S295).

평가용 데이터에 레이블 정보를 태깅하는 몇몇 예시는 도 22에 도시되어 있다. 도 22는 혼동 행렬(confusion matrix)을 도시하고 있는데, 상기 기계학습 모델이 분류 태스크를 수행하는 모델인 경우, 평가 결과는 혼동 행렬 내의 특정 셀에 대응될 수 있다. 도 22에 도시된 바와 같이, 평가 결과가 FP(false positive) 또는 FN(false negative)인 이미지(101)에는 제1 값(e.g. 1)이 레이블 값(102)으로 태깅되고, 평가 결과가 TP(true positive) 또는 TN(true negative)인 이미지(103)에는 제2 값(e.g. 0)이 레이블 값(104)으로 태깅될 수 있다. 즉, 기계학습 모델의 예측이 정답과 일치한 경우에는 "1"이 태깅되고, 불일치한 경우에는 "0"이 태깅될 수 있다.Some examples of tagging label information to evaluation data are shown in FIG. 22. FIG. 22 illustrates a confusion matrix. When the machine learning model is a model for performing a classification task, the evaluation result may correspond to a specific cell in the confusion matrix. As shown in FIG. 22, the first value (eg 1) is tagged with the label value 102 in the image 101 having an evaluation result of FP (false positive) or FN (false negative), and the evaluation result is TP ( The second value (eg 0) may be tagged with the label value 104 in the image 103 that is true positive or true negative (TN). That is, when the prediction of the machine learning model matches the correct answer, "1" may be tagged, and if it is not, "0" may be tagged.

위와 같은 이미지(101, 102)와 레이블 정보를 학습하게 되면, 산출 모델은 기계학습 모델이 정확하게 예측했던 이미지와 유사한 이미지가 입력될 때 높은 컨피던스 스코어를 출력하게 된다. 또한, 반대의 경우 산출 모델은 낮은 컨피던스 스코어를 출력하게 된다. 따라서, 산출 모델은 입력된 이미지에 대한 기계학습 모델의 오예측 확률을 산출할 수 있게 된다.When the above images 101 and 102 and the label information are learned, the calculation model outputs a high confidence score when an image similar to the image that the machine learning model accurately predicts is input. Also, in the opposite case, the calculation model outputs a low confidence score. Thus, the calculation model can calculate the probability of misprediction of the machine learning model for the input image.

한편, 도 22는 레이블 정보를 태깅하는 몇몇 예시를 도시하고 있을 뿐임에 유의하여야 한다. 본 개시의 다른 몇몇 실시예들에 따르면, 예측 오차가 레이블 정보로 태깅될 수도 있다. 여기서, 상기 예측 오차는 예측 값(즉, 컨피던스 스코어)과 실제 값(즉, 정답 정보)의 차이를 의미한다.Meanwhile, it should be noted that FIG. 22 only shows some examples of tagging information. According to some embodiments of the present disclosure, the prediction error may be tagged with label information. Here, the prediction error means a difference between a prediction value (ie, confidence score) and an actual value (ie, correct answer information).

또한, 본 개시의 또 다른 몇몇 실시예들에 따르면, 평가용 이미지의 예측 오차가 임계 값 이상인 경우 제1 값(e.g. 0)이 태깅되고, 상기 예측 오차가 상기 임계 값 미만인 경우 제2 값(e.g. 1)이 레이블 정보로 태깅될 수도 있다.Further, according to some embodiments of the present disclosure, a first value (eg 0) is tagged when the prediction error of the evaluation image is greater than or equal to a threshold value, and a second value (eg, when the prediction error is less than the threshold value). 1) may be tagged with label information.

다시 도 20을 참조하여 설명을 이어가도록 한다.The description will be continued with reference to FIG. 20 again.

전술한 방법에 따라 산출 모델이 구축되면, 단계 S275에서, 관리 장치(100)는 복수의 후보 패치들 각각에 대한 오예측 확률을 산출할 수 있다. 가령, 도 23에 도시된 바와 같이, 관리 장치(100)는 각 데이터 샘플(111-1 내지 111-n)을 산출 모델(113)에 입력하여 산출 모델(113)의 컨피던스 스코어(115-1 내지 115-n)를 획득하고, 획득된 컨피던스 스코어(115-1 내지 115-n)에 기초하여 상기 오예측 확률을 산출할 수 있다.When the calculation model is constructed according to the above-described method, in operation S275, the management apparatus 100 may calculate a misprediction probability for each of the plurality of candidate patches. For example, as shown in FIG. 23, the management apparatus 100 inputs each of the data samples 111-1 to 111-n to the calculation model 113, and the confidence score 115-1 to the calculation model 113. 115-n), and the false prediction probability may be calculated based on the obtained confidence scores 115-1 to 115-n.

다만, 도 23에 도시된 바와 같이, 후보 패치(11-1 내지 111-n)가 입력될 때, 정답 및 오답 클래스에 대한 컨피던스 스코어(115-1 내지 115-n)를 출력하도록 산출 모델(113)이 학습된 경우(e.g. 정답과 일치 시 레이블 1로 학습하고, 불일치 시 레이블 0으로 학습한 경우)라면, 오답 클래스의 컨피던스 스코어(밑줄로 도시됨)가 오예측 확률로 이용될 수도 있다.23, when the candidate patches 11-1 to 111-n are input, the calculation model 113 outputs the confidence scores 115-1 to 115-n for the correct and incorrect class. ) Is learned (eg when learning with label 1 if it matches the correct answer, and with label 0 when it does not match), the confidence score (shown underlined) of the incorrect answer class may be used as the misprediction probability.

각 후보 패치의 오예측 확률이 산출되면, 관리 장치(100)는 복수의 후보 패치 중에서 상기 산출된 오예측 확률이 기준치 이상인 후보 패치를 어노테이션 대상으로 선정할 수 있다. 오예측 확률이 높다는 것은 상기 기계학습 모델의 예측 결과가 틀릴 가능성이 높다는 것을 의미하고, 이는 곧 해당 패치가 상기 기계학습 모델의 성능을 개선하는데 중요한 데이터라는 것을 의미하기 때문이다. 이와 같이, 오예측 확률에 기반하여 패치를 선정하면, 학습에 효과적인 패치들이 어노테이션 대상으로 선정됨으로써 양질의 학습 데이터셋이 생성될 수 있다.When the misprediction probability of each candidate patch is calculated, the management apparatus 100 may select a candidate patch having the calculated misprediction probability equal to or greater than a reference value from among a plurality of candidate patches as the annotation target. The high probability of misprediction means that the prediction results of the machine learning model are likely to be wrong, because that means that the patch is important data to improve the performance of the machine learning model. As such, when a patch is selected based on a misprediction probability, high-quality learning datasets may be generated by selecting patches effective for learning as annotation targets.

지금까지 도 14 내지 도 23을 참조하여 본 개시의 다양한 실시예들에 따른 패치 자동 추출 방법에 대하여 설명하였다. 상술한 방법에 따르면, 어노테이션이 수행될 영역을 가리키는 패치가 자동으로 추출될 수 있다. 따라서, 관리자의 업무 부담이 최소화될 수 있다. 또한, 기계 학습 모델의 오예측 확률, 엔트로피 값 등에 기반하여 복수의 후보 패치 중에서 학습에 효과적인 패치만이 어노테이션 대상으로 선정된다. 이에 따라, 어노테이션 작업량이 감소되고, 양질의 학습 데이터셋이 생성될 수 있다So far, the automatic patch extraction method according to various embodiments of the present disclosure has been described with reference to FIGS. 14 to 23. According to the above-described method, a patch indicating an area where annotations are to be performed may be automatically extracted. Therefore, the burden on the manager can be minimized. In addition, only patches effective for learning among a plurality of candidate patches are selected as annotation targets based on misprediction probabilities and entropy values of the machine learning model. Accordingly, the amount of annotation work can be reduced, and a good learning dataset can be generated.

이하에서는, 도 24를 참조하여 본 개시의 다양한 실시예들에 따른 장치(e.g. 관리 장치 100)/시스템을 구현할 수 있는 예시적인 컴퓨팅 장치(200)에 대하여 설명하도록 한다.Hereinafter, an exemplary computing device 200 that may implement an apparatus (e.g. management device 100) / system according to various embodiments of the present disclosure will be described with reference to FIG. 24.

도 24는 본 개시의 다양한 실시예들에 따른 장치를 구현할 수 있는 예시적인 컴퓨팅 장치(200)를 나타내는 예시적인 하드웨어 구성도이다.24 is an example hardware configuration diagram illustrating an example computing device 200 that can implement an apparatus in accordance with various embodiments of the present disclosure.

도 24에 도시된 바와 같이, 컴퓨팅 장치(200)는 하나 이상의 프로세서(210), 버스(250), 통신 인터페이스(270), 프로세서(210)에 의하여 수행되는 컴퓨터 프로그램을 로드(load)하는 메모리(230)와 컴퓨터 프로그램(291)를 저장하는 스토리지(290)를 포함할 수 있다. 다만, 도 24에는 본 개시의 실시예와 관련 있는 구성요소들만이 도시되어 있다. 따라서, 본 개시가 속한 기술분야의 통상의 기술자라면 도 24에 도시된 구성요소들 외에 다른 범용적인 구성 요소들이 더 포함될 수 있음을 알 수 있다.As shown in FIG. 24, the computing device 200 may include a memory that loads one or more processors 210, a bus 250, a communication interface 270, and a computer program executed by the processor 210. Storage 290 for storing 230 and computer program 291. However, FIG. 24 illustrates only the components related to the embodiment of the present disclosure. Accordingly, those skilled in the art may recognize that other general purpose components may be further included in addition to the components illustrated in FIG. 24.

프로세서(210)는 컴퓨팅 장치(200)의 각 구성의 전반적인 동작을 제어한다. 프로세서(210)는 CPU(Central Processing Unit), MPU(Micro Processor Unit), MCU(Micro Controller Unit), GPU(Graphic Processing Unit) 또는 본 개시의 기술 분야에 잘 알려진 임의의 형태의 프로세서를 포함하여 구성될 수 있다. 또한, 프로세서(210)는 본 개시의 실시예들에 따른 방법을 실행하기 위한 적어도 하나의 애플리케이션 또는 프로그램에 대한 연산을 수행할 수 있다. 컴퓨팅 장치(200)는 하나 이상의 프로세서를 구비할 수 있다.The processor 210 controls the overall operation of each component of the computing device 200. The processor 210 includes a central processing unit (CPU), a micro processor unit (MPU), a micro controller unit (MCU), a graphics processing unit (GPU), or any type of processor well known in the art. Can be. In addition, the processor 210 may perform an operation on at least one application or program for executing a method according to embodiments of the present disclosure. Computing device 200 may have one or more processors.

메모리(230)는 각종 데이터, 명령 및/또는 정보를 저장한다. 메모리(230)는 본 개시의 다양한 실시예들에 따른 방법/동작을 실행하기 위하여 스토리지(290)로부터 하나 이상의 프로그램(291)을 로드할 수 있다. 메모리(230)는 RAM과 같은 휘발성 메모리로 구현될 수 있을 것이나, 본 개시의 기술적 범위는 이에 한정되지 아니한다.The memory 230 stores various data, commands, and / or information. Memory 230 may load one or more programs 291 from storage 290 to execute methods / operations in accordance with various embodiments of the present disclosure. The memory 230 may be implemented as a volatile memory such as a RAM, but the technical scope of the present disclosure is not limited thereto.

버스(250)는 컴퓨팅 장치(200)의 구성 요소 간 통신 기능을 제공한다. 버스(250)는 주소 버스(Address Bus), 데이터 버스(Data Bus) 및 제어 버스(Control Bus) 등 다양한 형태의 버스로 구현될 수 있다.The bus 250 provides a communication function between components of the computing device 200. The bus 250 may be implemented as various types of buses such as an address bus, a data bus, and a control bus.

통신 인터페이스(270)는 컴퓨팅 장치(200)의 유무선 인터넷 통신을 지원한다. 또한, 통신 인터페이스(270)는 인터넷 통신 외의 다양한 통신 방식을 지원할 수도 있다. 이를 위해, 통신 인터페이스(270)는 본 개시의 기술 분야에 잘 알려진 통신 모듈을 포함하여 구성될 수 있다.The communication interface 270 supports wired and wireless Internet communication of the computing device 200. In addition, the communication interface 270 may support various communication methods other than Internet communication. To this end, the communication interface 270 may comprise a communication module well known in the art of the present disclosure.

스토리지(290)는 상기 하나 이상의 프로그램(291)을 비임시적으로 저장할 수 있다. 스토리지(290)는 ROM(Read Only Memory), EPROM(Erasable Programmable ROM), EEPROM(Electrically Erasable Programmable ROM), 플래시 메모리 등과 같은 비휘발성 메모리, 하드 디스크, 착탈형 디스크, 또는 본 개시가 속하는 기술 분야에서 잘 알려진 임의의 형태의 컴퓨터로 읽을 수 있는 기록 매체를 포함하여 구성될 수 있다.The storage 290 may non-temporarily store the one or more programs 291. Storage 290 is well known in the art for non-volatile memory, hard disks, removable disks, or the like to which the present disclosure pertains, such as Read Only Memory (ROM), Eraseable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory, and the like. And any known type of computer readable recording medium.

컴퓨터 프로그램(291)은 메모리(230)에 로드될 때 프로세서(210)로 하여금 본 개시의 다양한 실시예들에 따른 동작/방법을 수행하도록 하는 하나 이상의 인스트럭션들(instructions)을 포함할 수 있다. 즉, 프로세서(210)는 상기 하나 이상의 인스트럭션들을 실행함으로써, 본 개시의 다양한 실시예들에 따른 동작/방법들을 수행할 수 있다.Computer program 291 may include one or more instructions that, when loaded into memory 230, cause processor 210 to perform operations / methods in accordance with various embodiments of the present disclosure. That is, the processor 210 may perform operations / methods according to various embodiments of the present disclosure by executing the one or more instructions.

예를 들어, 컴퓨터 프로그램(291)은 신규의 병리 슬라이드 이미지에 대한 정보를 얻는 동작, 상기 병리 슬라이드 이미지의 데이터셋 타입 및 패널을 결정하는 동작 및 상기 병리 슬라이드 이미지, 상기 결정된 데이터셋 타입, 어노테이션 태스크(annotation task) 및 상기 병리 슬라이드 이미지의 일부 영역인 패치로 정의되는 어노테이션 작업(job)을 어노테이터(annotator) 계정에 할당하는 동작을 수행하도록 하는 하나 이상의 인스트럭션들을 포함할 수 있다. 이와 같은 경우, 컴퓨팅 장치(200)를 통해 본 개시의 몇몇 실시예들에 따른 관리 장치(100)가 구현될 수 있다.For example, the computer program 291 may be configured to obtain information about a new pathological slide image, determine a dataset type and panel of the pathological slide image, and the pathological slide image, the determined dataset type, an annotation task. (annotation task) and one or more instructions to perform an operation of assigning an annotation job (an job) defined as a patch, which is a part of the pathological slide image, to the annotator account. In this case, the management device 100 according to some embodiments of the present disclosure may be implemented through the computing device 200.

지금까지 도 24를 참조하여 본 개시의 다양한 실시예들에 따른 장치를 구현할 수 있는 예시적인 컴퓨팅 장치에 대하여 설명하였다.An example computing device that may implement an apparatus according to various embodiments of the present disclosure has been described above with reference to FIG. 24.

지금까지 도 1 내지 도 24를 참조하여 설명된 본 개시의 기술적 사상은 컴퓨터가 읽을 수 있는 매체 상에 컴퓨터가 읽을 수 있는 코드로 구현될 수 있다. 상기 컴퓨터로 읽을 수 있는 기록 매체는, 예를 들어 이동형 기록 매체(CD, DVD, 블루레이 디스크, USB 저장 장치, 이동식 하드 디스크)이거나, 고정식 기록 매체(ROM, RAM, 컴퓨터 구비 형 하드 디스크)일 수 있다. 상기 컴퓨터로 읽을 수 있는 기록 매체에 기록된 상기 컴퓨터 프로그램은 인터넷 등의 네트워크를 통하여 다른 컴퓨팅 장치에 전송되어 상기 다른 컴퓨팅 장치에 설치될 수 있고, 이로써 상기 다른 컴퓨팅 장치에서 사용될 수 있다.The technical idea of the present disclosure described with reference to FIGS. 1 to 24 may be implemented as computer readable codes on a computer readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, computer equipped hard disk). Can be. The computer program recorded on the computer-readable recording medium may be transmitted to another computing device and installed in the other computing device via a network such as the Internet, thereby being used in the other computing device.

이상에서, 본 개시의 실시예를 구성하는 모든 구성 요소들이 하나로 결합되거나 결합되어 동작하는 것으로 설명되었다고 해서, 본 개시의 기술적 사상이 반드시 이러한 실시예에 한정되는 것은 아니다. 즉, 본 개시의 목적 범위 안에서라면, 그 모든 구성요소들이 하나 이상으로 선택적으로 결합하여 동작할 수도 있다.In the above description, it is described that all the components constituting the embodiments of the present disclosure are combined or operated as one, but the technical spirit of the present disclosure is not necessarily limited to these embodiments. That is, within the scope of the present disclosure, all of the components may be selectively operated in one or more combinations.

도면에서 동작들이 특정한 순서로 도시되어 있지만, 반드시 동작들이 도시된 특정한 순서로 또는 순차적 순서로 실행되어야만 하거나 또는 모든 도시 된 동작들이 실행되어야만 원하는 결과를 얻을 수 있는 것으로 이해되어서는 안 된다. 특정 상황에서는, 멀티태스킹 및 병렬 처리가 유리할 수도 있다. 더욱이, 위에 설명한 실시예들에서 다양한 구성들의 분리는 그러한 분리가 반드시 필요한 것으로 이해되어서는 안 되고, 설명된 프로그램 컴포넌트들 및 시스템들은 일반적으로 단일 소프트웨어 제품으로 함께 통합되거나 다수의 소프트웨어 제품으로 패키지 될 수 있음을 이해하여야 한다.Although the operations are shown in a specific order in the drawings, it should not be understood that the operations must be performed in the specific order or sequential order shown or that all the illustrated operations must be executed to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various configurations in the embodiments described above should not be understood as such separation being necessary, and the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products. Should be understood.

이상 첨부된 도면을 참조하여 본 개시의 실시예들을 설명하였지만, 본 개시가 속하는 기술분야에서 통상의 지식을 가진 자는 그 기술적 사상이나 필수적인 특징을 변경하지 않고서 본 개시가 다른 구체적인 형태로도 실시될 수 있다는 것을 이해할 수 있다. 그러므로 이상에서 기술한 실시예들은 모든 면에서 예시적인 것이며 한정적인 것이 아닌 것으로 이해해야만 한다. 본 개시의 보호 범위는 아래의 청구범위에 의하여 해석되어야 하며, 그와 동등한 범위 내에 있는 모든 기술 사상은 본 개시에 의해 정의되는 기술적 사상의 권리범위에 포함되는 것으로 해석되어야 할 것이다.While the embodiments of the present disclosure have been described with reference to the accompanying drawings, a person of ordinary skill in the art may implement the present disclosure in other specific forms without changing the technical spirit or essential features. I can understand that. Therefore, it is to be understood that the embodiments described above are exemplary in all respects and not restrictive. The protection scope of the present disclosure should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto shall be interpreted as being included in the scope of the technical spirit defined by the present disclosure.

Claims

An annotation task management method performed by at least one computing device,
Obtaining a pathological slide image to be annotated;
Determining at least one of a dataset type and a panel type of the obtained pathological slide image; And
Selecting at least one annotation target object patch from among a plurality of candidate patches included in the pathological slide image based on at least one of the determined dataset type and panel type.
How to manage annotation work.

The method of claim 1,
Selecting the at least one annotation target patch is
Selecting a plurality of candidate patches sampled from the pathological slide image,
Calculating at least one of a confidence score and an entropy value of each of the selected plurality of candidate patches,
Selecting at least one annotation target object patch of the plurality of candidate patches based on at least one confidence score and entropy value calculated for each of the plurality of candidate patches;
How to manage annotation work.

The method of claim 2,
Selecting the plurality of candidate patches
Segmenting at least a portion of the pathological slide image based on information associated with the pathological slide image,
Selecting the plurality of candidate patches from the divided at least part;
How to manage annotation work.

The method of claim 1,
Selecting the at least one annotation target patch is
Selecting a plurality of candidate patches sampled from the pathological slide image,
Calculating a miss-prediction of a machine learning model for each of the selected plurality of candidate patches,
Selecting the at least one annotation target object patch among the plurality of candidate patches based on the calculated misprediction probability;
How to manage annotation work.

The method of claim 1,
Allocating the selected at least one annotation target patch to at least one annotator account;
How to manage annotation work.

The method of claim 5,
Assigning to the at least one annotator account
Allocating the at least one annotation target patch to at least one annotator account based on at least one of the determined dataset type and panel type and annotation performance history of the annotator;
How to manage annotation work.

The method of claim 5,
Obtaining an annotation result for the at least one annotation target object patch from the assigned annotation account;
Comparing a result of the machine learning model for the at least one annotation target object patch with the obtained annotation result; And
The method may further include determining whether to reassign the at least one annotation target patch based on the comparison result.
How to manage annotation work.

The method of claim 5,
Obtaining annotation results for the at least one annotation target patch from the plurality of annotation accounts assigned for the at least one annotation target patch;
Comparing annotation results of each of the plurality of annotation accounts; And
The method may further include determining whether to reassign the at least one annotation target patch based on the comparison result.
How to manage annotation work.

The method of claim 1,
The panel type is
At least one of a cell panel, a tissue panel, and a structure panel,
The data set type is
Indicating a use of the pathological slide image, wherein the use of the pathological slide image includes one or more of a training use of a machine learning model and a validation use of the machine learning model
How to manage annotation work.

The method of claim 1,
The determining step
Inputting the pathological slide image to a machine learning model, and determining at least one of a dataset type and a panel type of the pathological slide image based on an output value outputted to the machine learning model;
How to manage annotation work.

A memory for storing one or more instructions; And
By executing the stored one or more instructions,
Acquire an annotated pathological slide image, determine at least one of a dataset type and a panel type of the obtained pathological slide image, and include in the pathological slide image based on at least one of the determined dataset type and panel type. And a processor configured to select at least one annotation target patch from among a plurality of candidate patches.
Annotation work management device.

The method of claim 11,
The processor is
Selecting a plurality of candidate patches sampled from the pathological slide image,
Calculating at least one of a confidence score and an entropy value of each of the selected plurality of candidate patches,
Selecting at least one annotation target object patch among the plurality of candidate patches based on at least one confidence score and entropy value calculated for each of the plurality of candidate patches;
Annotation work management device.

The method of claim 12,
The processor is
Segmenting at least a portion of the pathological slide image based on information associated with the pathological slide image,
Selecting the plurality of candidate patches from the divided at least part
Annotation work management device.

The method of claim 11,
The processor is
Selecting a plurality of candidate patches sampled from the pathological slide image,
Calculating a miss-prediction of a machine learning model for each of the selected plurality of candidate patches,
Selecting at least one annotation target object patch among the plurality of candidate patches based on the calculated misprediction probability
Annotation work management device.

The method of claim 11,
The processor is
Assign the selected at least one annotation target patch to at least one annotator account
Annotation work management device.

The method of claim 15,
The processor is
Assigning the at least one annotation target patch to at least one annotator account based on at least one of the determined dataset type and panel type, and annotation history of the annotator
Annotation work management device.

The method of claim 15,
The processor is
Obtaining an annotation result for the at least one annotation target patch from the assigned annotation account,
Comparing the results of the machine learning model for the at least one annotation target object patch with the obtained annotation results,
Based on the comparison result, it is determined whether to reassign the at least one annotation target patch.
Annotation work management device.

The method of claim 15,
The processor is
Obtaining annotation results for the at least one annotation target patch from the plurality of annotation accounts assigned for the at least one annotation target patch,
Compare annotation results of each of the plurality of annotation accounts,
Based on the comparison result, it is determined whether to reassign the at least one annotation target patch.
Annotation work management device.

The method of claim 11,
The panel type is
At least one of a cell panel, a tissue panel, and a structure panel,
The data set type is
Indicating a use of the pathological slide image, wherein the use of the pathological slide image includes one or more of a training use of a machine learning model and a validation use of the machine learning model
Annotation work management device.

The method of claim 11,
The processor is
Inputting the pathological slide image to a machine learning model to determine at least one of a dataset type and a panel type of the pathological slide image based on the output value
Annotation work management device.