KR20230120992A

KR20230120992A - Method and apparatus for selecting medical data for annotation

Info

Publication number: KR20230120992A
Application number: KR1020230010383A
Authority: KR
Inventors: 유동근
Original assignee: 주식회사 루닛
Priority date: 2022-02-09
Filing date: 2023-01-26
Publication date: 2023-08-17

Abstract

A method of operating a medical data selection apparatus operated by at least one processor comprises the steps of: generating training data from partial medical data sampled from mass medical data and annotation data of the partial medical data; extracting annotation candidate data that is at least part of the mass medical data; and obtaining an inference result for the annotation candidate data from an artificial intelligence model trained using the training data, and selecting, from the annotation candidate data and based on the inference result, an annotation target for the next training of the artificial intelligence model. Accordingly, the present invention may increase artificial intelligence model training performance relative to annotation cost.

Description

Method and apparatus for selecting medical data for annotation

The present disclosure relates to annotation.

An artificial intelligence model can be trained using a large number of medical images and their annotation data (annotated data). Annotation refers either to the task of tagging required information in order to generate training data for an artificial intelligence model, or to the tagged annotation data itself. For medical data in particular, annotation is performed by skilled experts and therefore requires considerable cost and time.

Training performance can be improved as the amount of annotation data increases, but because the annotation budget is limited, annotation targets must be selected from the entire body of medical data. Usually, a portion of the data randomly sampled from the entire medical data is chosen as the annotation target, but such randomly sampled targets can hardly be regarded as the optimal data for training an artificial intelligence model.

The present disclosure provides a method and apparatus for selecting medical data for annotation.

The present disclosure provides a method and apparatus for selecting medical data for annotation from mass medical data by determining a medical data selection policy based on the training performance of a current artificial intelligence model and selecting an annotation target for the next training according to the selection policy.

According to an embodiment, a method of operating a medical data selection apparatus operated by at least one processor includes: generating training data from partial medical data sampled from mass medical data and annotation data of the partial medical data; extracting annotation candidate data that is at least part of the mass medical data; and obtaining an inference result for the annotation candidate data from an artificial intelligence model trained using the training data, and selecting, from the annotation candidate data and based on the inference result, an annotation target for the next training of the artificial intelligence model.

The selecting of the annotation target may include determining a selection policy of medical data for the next training based on the training performance of the artificial intelligence model, and selecting, from the annotation candidate data and based on the inference result, an annotation target that satisfies the selection policy.

The selection policy may be a policy of extracting medical data of a specific class whose classification accuracy is at or below a threshold at a higher rate than other classes.

The operating method may further include extracting validation data that is at least part of the mass medical data, and evaluating the training performance of the artificial intelligence model based on the validation data.

The selecting of the annotation target may include adding medical data randomly sampled from the mass medical data to the annotation target.

The selecting of the annotation target may include obtaining an inference result for the annotation candidate data from the artificial intelligence model trained using the training data, and selecting, from the annotation candidate data and based on the inference result, medical data related to a specific class.

The generating of the training data may include analyzing reading reports associated with the mass medical data, sampling the partial medical data whose reading reports contain information related to training of the artificial intelligence model, and obtaining annotation data of the partial medical data to generate the training data.

The mass medical data may include images acquired through at least one medical imaging device, pathology images, or patch images extracted from each medical image.

According to an embodiment, a method of operating a medical data selection apparatus operated by at least one processor includes: repeating a process of determining a selection policy of medical data based on the training performance of a current artificial intelligence model, selecting for the next training, from annotation candidate data and based on an inference result of the current artificial intelligence model for the annotation candidate data, an annotation target that satisfies the selection policy, and then training the current artificial intelligence model based on annotation data obtained for the annotation target; and terminating the process when the next training of the current artificial intelligence model is not to be performed. The annotation candidate data may be at least part of mass medical data.

The repeating may include extracting validation data that is at least part of the mass medical data, and evaluating the training performance of the current artificial intelligence model based on the validation data.

The repeating may include adding medical data randomly sampled from the mass medical data to the annotation target.

The repeating may include determining the selection policy to be the same as or different from the previous one according to the training performance of the current artificial intelligence model.

The current artificial intelligence model may be an initial model trained based on partial medical data sampled from the mass medical data and annotation data of the partial medical data, or a model obtained by retraining the initial model through the above process.
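The repeated process described above can be illustrated with a minimal sketch. It is not the implementation of the disclosure: the function name `run_selection_rounds`, the callables `train`, `evaluate`, and `annotate`, the integer data IDs, and the stopping conditions (`accuracy_threshold`, `max_rounds`) are all assumptions made for illustration, and the policy shown (focus on classes with low accuracy) is only one of the policies the text allows.

```python
from typing import Callable, Dict, List, Set

Model = Callable[[int], str]  # hypothetical: maps a data ID to a predicted class


def run_selection_rounds(
    pool: List[int],
    labeled: Dict[int, str],
    train: Callable[[Dict[int, str]], Model],
    evaluate: Callable[[Model], Dict[str, float]],
    annotate: Callable[[int], str],
    budget_per_round: int,
    accuracy_threshold: float,
    max_rounds: int,
) -> Model:
    """Repeat: evaluate, decide a policy, select targets, annotate, retrain."""
    model = train(labeled)  # initial model trained on the sampled, annotated data
    for _ in range(max_rounds):
        per_class = evaluate(model)
        # Assumed policy: focus on classes whose accuracy is below the threshold.
        weak: Set[str] = {c for c, a in per_class.items() if a < accuracy_threshold}
        if not weak:
            break  # no next training is needed, so the process ends
        candidates = [i for i in pool if i not in labeled]
        # Use the current model's inference results to pick matching candidates.
        targets = [i for i in candidates if model(i) in weak][:budget_per_round]
        if not targets:
            break  # nothing left that satisfies the policy
        for i in targets:
            labeled[i] = annotate(i)  # obtain annotation data for each target
        model = train(labeled)  # retrain with the enlarged training data
    return model
```

Because the policy is recomputed from `evaluate` in every round, it can come out the same as or different from the previous round, which mirrors the behavior described for the repeating step.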

According to an embodiment, a medical data selection apparatus operated by at least one processor includes: an artificial intelligence model trained, based on training data, to output a result inferred from an input; and a selector that extracts annotation candidate data that is at least part of mass medical data, obtains an inference result for the annotation candidate data from the artificial intelligence model, and selects, from the annotation candidate data and based on the inference result, an annotation target for the next training of the artificial intelligence model. Annotation data obtained for the annotation target for the next training may be used for the next training of the artificial intelligence model.

When the next training of the artificial intelligence model is required, the selector may repeat the process of selecting the annotation target for the next training based on the inference result of the artificial intelligence model.

The selector may determine a selection policy of medical data for the next training based on the training performance of the artificial intelligence model, and select, from the annotation candidate data and based on the inference result, an annotation target that satisfies the selection policy.

The selector may determine the selection policy to be the same as or different from the previous one according to the training performance of the artificial intelligence model.

The selector may add medical data randomly sampled from the mass medical data to the annotation target.

According to an embodiment, artificial intelligence model training performance relative to annotation cost can be increased by intelligently selecting annotation targets from mass medical data.

According to an embodiment, the training performance of the artificial intelligence model can be improved compared with training on data obtained by annotating a randomly sampled portion of the entire medical data.

According to an embodiment, the patches that require annotation can be selected from a large image, such as a pathology image obtained through Whole Slide Imaging (WSI).

FIG. 1 is a diagram illustrating an apparatus for selecting medical data for annotation according to an embodiment.
FIG. 2 is a diagram illustrating training through step-by-step annotation target selection according to an embodiment.
FIGS. 3 to 6 are each a flowchart of a method for selecting medical data for annotation according to an embodiment.
FIGS. 7 and 8 are each a flowchart of a method for selecting patch images for annotation according to an embodiment.
FIG. 9 is a hardware configuration diagram of an annotation selection apparatus according to an embodiment.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can easily carry them out. However, the present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present disclosure clearly, and similar reference numerals are attached to similar parts throughout the specification.

In the description, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "…unit", "…er", and "module" used in the specification mean a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

In addition, terms including ordinal numbers such as "first" or "second" used in the description may be used to describe various components, but the components should not be limited by these terms. The terms may be used for the purpose of distinguishing one component from another.

The apparatus of the present disclosure is a computing device configured and connected so that at least one processor can perform the operations of the present disclosure by executing instructions. A computer program includes instructions written to cause a processor to execute the operations of the present disclosure, and may be stored in a non-transitory computer readable storage medium. The computer program may be downloaded through a network or sold in the form of a product.

Annotation may refer to a task such as marking information on, or adding information to, an annotation target. Alternatively, the annotation data (annotated data) marked on or added to the annotation target may itself be called annotation.

An expert who performs the annotation task is called an annotator. The annotator need not be a person and may be an artificial intelligence model that has learned the annotation task.

An annotation target may be various kinds of medical data; a medical image is used as an example in this description. A medical image may be an image captured with various medical imaging equipment such as X-ray, MRI (magnetic resonance imaging), ultrasound, CT (computed tomography), MMG (mammography), or DBT (digital breast tomosynthesis), or may be a pathology image. A pathology image may be acquired through Whole Slide Imaging (WSI). In addition, at least some patches of a medical image may be the annotation target.

FIG. 1 is a diagram illustrating an apparatus for selecting medical data for annotation according to an embodiment, and FIG. 2 is a diagram illustrating training through step-by-step annotation target selection according to an embodiment.

Referring to FIG. 1, an apparatus 10 for selecting medical data for annotation (hereinafter simply the selection apparatus 10) is a computing device operated by at least one processor. The selection apparatus 10 selects, from a database 20 in which mass medical data is stored, annotation targets on which an annotation task must be performed in order to train an artificial intelligence model (AI model) 100. To increase AI model training performance relative to annotation cost, the selection apparatus 10 determines a selection policy based on the training performance of the current artificial intelligence model 100 and selects annotation targets for the next training according to the determined policy, thereby intelligently selecting annotation targets from the mass medical data. The selection policy includes a selection condition and may further include the number of annotation targets to be selected. The selection policy may be determined to be the same as or different from the previous one according to the training performance of the current artificial intelligence model.
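A selection policy made up of a selection condition and an optional target count can be represented as a small data structure. The sketch below is only an illustration under assumptions: the class name `SelectionPolicy`, its fields, the helper `apply_policy`, and the choice of expressing the condition as a set of predicted classes are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SelectionPolicy:
    # Selection condition, assumed here to be "predicted class is one of these".
    target_classes: FrozenSet[str]
    # Optional number of annotation targets to select; None means no limit.
    num_targets: Optional[int] = None


def apply_policy(policy: SelectionPolicy, predictions: Dict[str, str]) -> List[str]:
    """Return IDs of candidates whose predicted class satisfies the policy."""
    matched = [
        data_id
        for data_id, predicted_class in predictions.items()
        if predicted_class in policy.target_classes
    ]
    if policy.num_targets is None:
        return matched
    return matched[: policy.num_targets]
```

Keeping the policy immutable makes it straightforward to compare the policy of one round with that of the previous round when deciding whether it has changed.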

The selection apparatus 10 may include a trainer 200 that trains the artificial intelligence model 100, and a selector 300 that selects annotation targets for the next training from annotation candidate data based on the inference result of the current artificial intelligence model 100. The selection apparatus 10 may further include an annotation task processor 400 that requests an annotator to perform an annotation task on the annotation targets and obtains annotation data for them. The annotation data for the annotation targets is stored in the database 20 and may be used as training data for the next training.

The database 20 may include various types of medical data, and the training data or annotation candidate data may be determined from the database 20 according to the task of the artificial intelligence model 100. The database 20 may include images acquired through at least one medical imaging device, pathology images, or patch images extracted from each medical image.

The artificial intelligence model 100 is a machine learning model that learns at least one task, and may be implemented as a computer program executed by a processor. The task learned by the artificial intelligence model may refer to a problem to be solved, or a job to be performed, through machine learning. The artificial intelligence model may be implemented as a computer program running on a computing device, and may be downloaded through a network or sold in the form of a product. Alternatively, the artificial intelligence model may interoperate with various devices through a network.

The artificial intelligence model 100 is a model that learns the relationship between inputs and outputs during training and thereby outputs a result inferred from a new input. For example, it may be a model trained to receive a medical image and perform medical inference on the medical image. Medical inference may be defined in various ways according to the input and the task.

The task of the artificial intelligence model 100 may be, for example, a task of detecting or segmenting biological objects such as cells and tissues in a medical image, or a task of inferring specific information from a medical image (for example, immune phenotype (IP), CPS (Combined Positive Score), TPS (Tumor Proportion Score), the expression level for a specific stain, or whether there is a response to a specific treatment).

The information that the artificial intelligence model 100 can infer from a pathology image may vary. For example, the artificial intelligence model 100 may detect biological objects included in a pathology image. The artificial intelligence model 100 may mark the detected biological objects by drawing a contour or a bounding box around cells of any type, or by drawing a dot at the center of each cell. The artificial intelligence model 100 may segment biological objects included in the pathology image. The artificial intelligence model 100 may recognize, pixel by pixel, the region of a pathology image that corresponds to a specific biological object. Here, the biological object may be a cell such as a cancer cell, normal cell, mitotic cell, immune cell, or fibroblast, and may include any biological object such as a blood vessel, organ, nerve, cancer lesion, or tumor stroma. In addition, the artificial intelligence model 100 may infer image quality information (for example, whether the image is blurred, whether contaminants are present, or whether the tissue is folded). The artificial intelligence model 100 may infer whether the image includes a region of interest (for example, a cell region, background region, or cancer region). The artificial intelligence model 100 may infer immune-related information from the image (for example, an immune phenotype classified as immune inflamed tumors, immune desert, immune excluded, and so on). The artificial intelligence model 100 may infer a score for the image such as TPS or CPS. The artificial intelligence model 100 may infer the expression level for a specific stain (for example, H&E or IHC) in the image. For the artificial intelligence model 100 to perform these various inferences, medical data annotated with the correct answers to be inferred must be used as training data. To this end, the selector 300 selects annotation targets by judging which medical data would most benefit training if annotated.

Likewise, the information that the artificial intelligence model 100 can infer from images acquired through medical imaging equipment such as X-ray, CT (Computed Tomography), MMG (Mammography), and DBT (Digital Breast Tomosynthesis) may vary. For example, the artificial intelligence model 100 may classify an object in an image, draw a contour on a detected object, draw a bounding box around an object, draw a dot at the center of an object, or classify each pixel into a class. The artificial intelligence model 100 may detect a line such as an E-tube or L-tube in an image, or may calculate the per-pixel probability of belonging to a class and display it as a heat map. Here, the class may be defined in various ways depending on the information to be inferred from the image. For example, it may be a lesion such as pneumothorax, pulmonary nodule, lung cancer, tuberculosis, fracture, pneumonia, atelectasis, or pulmonary consolidation; an organ such as the lung, stomach, esophagus, carina, or diaphragm; or a medical device such as an E-tube or L-tube. As another example, the artificial intelligence model 100 may receive an image acquired through breast imaging such as MMG or DBT and classify a lesion in the breast as a target object, or calculate breast density or the likelihood of breast cancer developing in the future. For the artificial intelligence model 100 to perform these various inferences, medical images annotated with the class corresponding to the correct answer must be used as training data.

The trainer 200 trains the artificial intelligence model 100 based on training data. The training data may include medical data to which annotation data is mapped as ground truth. The current training data may include the annotation targets selected for retraining the artificial intelligence model 100 that was trained with the previous training data, together with their annotation data. Based on the current training data, the artificial intelligence model 100 may learn to infer from an input a result close to the annotation data, which is the correct answer. Here, part of the training data to which annotation data is mapped may be used as validation data for evaluating training performance. Depending on the learning method, the artificial intelligence model 100 may perform supervised learning with medical data to which annotation data is mapped, or may also learn using medical data to which no annotation data is mapped as training data.
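Setting aside part of the annotated training data as validation data can be sketched as follows. This is a minimal illustration, not the procedure of the disclosure: the function name `split_train_validation`, the `validation_fraction` and `seed` parameters, and the use of a simple random split are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    annotated_ids: Sequence[str],
    validation_fraction: float,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Randomly hold out a fraction of annotated data IDs for validation."""
    shuffled = list(annotated_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = int(len(shuffled) * validation_fraction)
    validation_ids = shuffled[:n_validation]
    train_ids = shuffled[n_validation:]
    return train_ids, validation_ids
```

In practice a split stratified by class or grouped by patient may be preferable, so that the held-out data reflects every class being evaluated; the plain random split here is only the simplest option.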

Meanwhile, the initial training data may include partial data sampled from the mass medical data and its annotation data. For example, for the initial training data, the selector 300 may sample 1,000 medical images from among 100,000 medical images and determine the 1,000 medical images as annotation targets. The annotation task processor 400 then obtains annotation data for the 1,000 medical images and stores it in the database 20, so that the trainer 200 can use the 1,000 medical images with mapped annotation data as the initial training data.

There may be various methods of sampling annotation targets for the initial training data. For example, the selector 300 may randomly sample 1,000 medical images from among 100,000 medical images. Alternatively, the selector 300 may analyze the reading reports associated with the 100,000 medical images and sample the medical images whose reading reports contain information related to training of the artificial intelligence model 100 (for example, the name of a class to be classified). Other sampling methods are also possible, and if the medical images to be used as the initial training data are predetermined, annotation data for those medical images may be obtained without sampling.
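Report-based sampling for the initial training data can be sketched as a keyword match over reading reports. The sketch rests on assumptions: the function name `sample_by_report`, the representation of reports as a mapping from image ID to free text, and case-insensitive substring matching on class names are all hypothetical, and real reading reports would likely need more careful text analysis (for example, handling negations such as "no pneumothorax").

```python
import random
from typing import Dict, Iterable, List


def sample_by_report(
    reports: Dict[str, str],
    class_names: Iterable[str],
    num_samples: int,
    seed: int = 0,
) -> List[str]:
    """Sample image IDs whose reading report mentions any target class name."""
    names = [name.lower() for name in class_names]
    matched = [
        image_id
        for image_id, text in reports.items()
        if any(name in text.lower() for name in names)
    ]
    if len(matched) <= num_samples:
        return matched  # fewer matches than requested: keep them all
    return random.Random(seed).sample(matched, num_samples)
```

Plain random sampling, the other option named in the text, corresponds to calling `random.Random(seed).sample` on the full list of image IDs instead of on the matched subset.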

The artificial intelligence model 100 trained with the training data may output various inference results according to the input and the task. For example, the inference result may be a class representing the type of cell included in the image, a class representing the type of tissue included in the image, a class representing the image quality or an artifact affecting the quality, a class representing whether a region in the image is a region of interest, a class representing immune-related information, a class representing a score such as TPS or CPS, or a class representing the expression level for a specific stain. The inference result may also be a class representing the type of lesion (for example, pneumothorax, pulmonary nodule, lung cancer, tuberculosis, fracture, pneumonia, atelectasis, pulmonary consolidation, benign breast tumor, malignant breast tumor, or breast cancer lesion), a class representing the type of organ (for example, lung, stomach, esophagus, carina, or diaphragm), a class representing the type of medical device, a class representing breast density, or a class representing the likelihood of breast cancer developing in the future.

The selector 300 selects annotation targets for the next training from the annotation candidate data based on the training performance of the current artificial intelligence model 100. The selector 300 may obtain the training performance of the current artificial intelligence model 100 and, based on it, determine an annotation target selection policy for the next training data. Depending on the training performance of the current artificial intelligence model 100, the selector 300 may determine the current selection policy to be the same as or different from the previous selection policy. That is, because the selector 300 selects annotation targets for the next training data according to a selection policy matched to the training performance of the current artificial intelligence model 100, the model can relearn in the next training what it has learned insufficiently so far.

The training performance of the current artificial intelligence model 100 may be evaluated using validation data held out from the current training data used to train the artificial intelligence model 100, or using data in the bulk medical data that is annotated with ground-truth values as validation data. The selector 300 may evaluate the training performance of the current artificial intelligence model 100 by comparing the model's inference results on the validation data with the annotation data serving as ground truth. For example, if the current artificial intelligence model 100 is a classification model, the selector 300 may evaluate training performance based on per-class classification accuracy. When the classification accuracy of a specific class is at or below a reference value, the selector 300 may determine a selection policy that extracts medical data related to that class, so that in the next training the model becomes better at classifying the class on which its accuracy was low.
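As a rough illustration of the per-class evaluation described above, the following sketch computes classification accuracy for each class on validation data and lists the classes at or below a reference value. The names `per_class_accuracy` and `weak_classes` and the list-based inputs are hypothetical; the disclosure does not fix a particular metric implementation.

```python
from collections import defaultdict

def per_class_accuracy(labels, predictions):
    """Accuracy per ground-truth class on the validation data."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(labels, predictions):
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    return {cls: correct[cls] / total[cls] for cls in total}

def weak_classes(accuracy, reference):
    """Classes whose accuracy is at or below the reference value."""
    return sorted(cls for cls, acc in accuracy.items() if acc <= reference)
```

The classes returned by `weak_classes` are the ones a selection policy would then favor when choosing the next annotation targets.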

The output values inferred by the artificial intelligence model 100 may be used to extract medical data related to a specific class for later annotation. For example, to extract medical data related to class A according to the determined selection policy, the selector 300 may sort the medical data by the output value of the artificial intelligence model 100 corresponding to class A. The selector 300 may then select, from the sorted medical data, the medical data whose output value is at or above a specific cut-off. By doing so, the selector 300 can select medical data in which an object corresponding to class A is relatively likely to actually be present, or medical data that is highly likely to actually correspond to class A.
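The sort-and-cut-off step can be sketched as follows. The function name `select_by_class_score` and the nested-dictionary score layout are assumptions for illustration; the cut-off value itself would be chosen by the selection policy.

```python
def select_by_class_score(scores, target_class, cutoff):
    """Sort items by the model output for target_class, keep those >= cutoff.

    scores: dict mapping item id -> dict of class name -> model output
            (hypothetical layout for the inference results).
    """
    ranked = sorted(scores, key=lambda item: scores[item][target_class],
                    reverse=True)
    return [item for item in ranked if scores[item][target_class] >= cutoff]
```

Because the list is sorted in descending order before filtering, the result can also be truncated to the first K items when the number of annotation targets is fixed.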

The selector 300 may use selection policies determined in various ways. For example, the selector 300 may use selection policy 1, which extracts medical data of a specific class, for which the classification accuracy of the current artificial intelligence model 100 is at or below a reference value, at a higher rate than other classes. That is, selection policy 1 may include an optimal ratio of classes to be extracted in order to improve the classification accuracy of the current artificial intelligence model 100. The selector 300 may use selection policy 2, which extracts, from the annotation candidate data, medical data having at least one class score at or above a reference value. In selection policy 2, for example, a class may represent at least one of a cell, a tissue, or a structure to be annotated, and the class score may be the probability of belonging to that class. The selector 300 may use selection policy 3, which extracts, from the annotation candidate data, medical data that does not contain one or more classes. The selector 300 may use selection policy 4, which extracts, from the annotation candidate data, medical data in which all defined classes are evenly distributed. The selector 300 may use selection policy 5, which extracts, from the annotation candidate data, medical data for which the prediction entropy of the artificial intelligence model 100 is at or above a reference value. When the artificial intelligence model 100 performs object detection or object segmentation, the selector 300 may use selection policy 6, which extracts medical data whose ratio between classes satisfies a reference condition. The ratio between classes may be set differently depending on the training performance of the current artificial intelligence model 100 and on the task. For object detection, the ratio between classes may be set based on the number of objects (for example, cells) per class; for object segmentation, it may be set based on the area of the regions (for example, cancer regions) per class. The selector 300 may also use a new selection policy that combines these selection policies.
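Of the policies listed above, the entropy-based one (selection policy 5) lends itself to a short worked example. The sketch below assumes the model outputs a probability vector per item and uses the Shannon entropy with the natural logarithm; the names `predictive_entropy` and `select_high_entropy` and the choice of logarithm base are illustrative assumptions, not details given in the disclosure.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_high_entropy(predictions, reference):
    """Keep items whose prediction entropy is at or above the reference value.

    predictions: dict mapping item id -> list of class probabilities
                 (hypothetical layout).
    """
    return [item for item, probs in predictions.items()
            if predictive_entropy(probs) >= reference]
```

High entropy means the model spreads probability across several classes, so such items are ones the current model is unsure about.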

Based on the selection policy, the selector 300 may select annotation targets for the next training from the annotation candidate data. The selection policy may be chosen from among the various selection policies to suit the training performance.

The selector 300 may obtain inference results for the annotation candidate data through the artificial intelligence model 100 and, based on those inference results, select the medical data that matches the selection policy as annotation targets.

Annotation targets may be selected in units of images, or in units of the patches that make up an image. For example, when the annotation candidate data consists of pathology images, the selector 300 may obtain inference results for the pathology images, or for the plurality of patches that make up each pathology image, and determine the pathology images or patches that match the selection policy as annotation targets based on the inference results. For example, because a pathology image generated by whole slide imaging (WSI) is very large, the selector 300 may extract patches corresponding to regions of interest from at least one pathology image and use them as annotation candidate data. The selector 300 may extract the patches so that the regions of interest overlap, or so that there are no overlapping regions. The selector 300 may obtain inference results for the patches through the artificial intelligence model 100 and, based on those results, select the patches that need annotation as annotation targets.
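Overlapping versus non-overlapping patch extraction can be illustrated with a simple grid. In this sketch, `patch_origins`, the square patch shape, and the stride parameter are assumptions for illustration; the disclosure does not specify patch geometry or how the region of interest is located.

```python
def patch_origins(width, height, patch, stride):
    """Top-left corners of square patches tiled over a width x height region.

    stride == patch gives non-overlapping patches;
    stride < patch gives overlapping patches.
    Partial patches at the right and bottom edges are skipped.
    """
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]
```

For a 4 x 4 region and 2 x 2 patches, a stride of 2 yields four disjoint patches, while a stride of 1 yields nine overlapping ones.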

The annotation candidate data may be all of the data in the database 20 to which no annotation data is mapped, or a certain amount of that data. For example, the annotation candidate data for the secondary training data may be the 99,000 medical images remaining after 1,000 of the 100,000 medical images were used as the primary training data, or a certain amount (for example, 10,000) of medical images sampled from those 99,000. When the number of annotation targets to be selected is fixed, the selector 300 may randomly extract K items from the medical data that matches the selection policy, or extract the top K items that satisfy a specific condition from the medical data that matches the selection policy.
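The two ways of reducing policy-matching data to K annotation targets might look like the sketch below. The function `pick_k` and its optional `score` callable are hypothetical; the "specific condition" used for ranking is left open in the text, so any scoring function can be passed in.

```python
import random

def pick_k(candidates, k, score=None, seed=0):
    """Choose K annotation targets from policy-matching candidates.

    With score=None, K items are drawn at random.
    Otherwise the K highest-scoring items are returned.
    """
    if score is None:
        rng = random.Random(seed)  # fixed seed only for reproducibility
        return rng.sample(list(candidates), min(k, len(candidates)))
    return sorted(candidates, key=score, reverse=True)[:k]
```

For example, passing the class-A output value as `score` returns the K candidates the model rates most likely to belong to class A.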

For example, when the selector 300 decides to use selection policy 1, which extracts medical data of a specific class at a higher rate than other classes, it may select, based on the inference results for the annotation candidate data, more medical data inferred as the specific class than medical data inferred as other classes.
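One way to realize "more of the specific class than of other classes" is to give each predicted class an explicit quota. The sketch below is only an assumed realization: `select_with_quotas` and the per-class count dictionary are hypothetical, and the disclosure does not state how the class ratio is turned into counts.

```python
def select_with_quotas(predicted, quotas):
    """Select candidates per predicted class according to a quota.

    predicted: dict mapping item id -> class inferred by the model.
    quotas:    dict mapping class -> number of items to select,
               e.g. a larger count for a poorly classified class.
    """
    by_class = {}
    for item, cls in predicted.items():
        by_class.setdefault(cls, []).append(item)
    chosen = []
    for cls, quota in quotas.items():
        chosen.extend(by_class.get(cls, [])[:quota])
    return chosen
```

With quotas of 7 for a weak class A and 3 for class B, the selected set contains 70% class-A predictions regardless of how the classes are distributed among the candidates, provided enough candidates of each class exist.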

The selector 300 may request the annotation job processor 400 to perform annotation work on the annotation targets. While selecting annotation targets based on the selection policy, the selector 300 may also include a certain number of randomly extracted medical data items among the annotation targets. Randomly extracted annotation targets have a varied distribution, which can benefit model training.
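Mixing a fixed number of random items into the policy-based selection can be sketched as below. The name `add_random_samples` and the rule of drawing only from candidates not already selected are illustrative assumptions.

```python
import random

def add_random_samples(policy_selected, candidates, n_random, seed=0):
    """Append n_random randomly drawn candidates to the policy-based selection.

    Items already chosen by the policy are excluded from the random draw,
    so the result contains no duplicates.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    already = set(policy_selected)
    remaining = [c for c in candidates if c not in already]
    extra = rng.sample(remaining, min(n_random, len(remaining)))
    return list(policy_selected) + extra
```

The random portion keeps some of the original data distribution in the training set, which a purely policy-driven selection would otherwise skew.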

The annotation job processor 400 may request an annotator to perform annotation work on the annotation targets, obtain the annotation data for the annotation targets, and then store that annotation data in the database 20.

When annotation data has been obtained for the annotation targets selected for the next training, the trainer 200 retrains the artificial intelligence model 100 based on the new training data. The trainer 200 may retrain the artificial intelligence model 100 using the new training data together with the existing training data.

Referring to FIG. 2, the selection device 10 may repeat the process of determining a selection policy for medical data based on the training performance of the current artificial intelligence model 100 and selecting annotation targets for the next training according to that policy, until the training performance of the artificial intelligence model 100 reaches or exceeds a reference level. The selection device 10 evaluates the training performance of the current artificial intelligence model 100 and, if further training is needed, selects annotation targets based on the training performance and then retrains the current artificial intelligence model 100 using the annotation targets and their annotation data. The total number of training rounds N is adjusted according to the training performance, and the number of annotation targets for each round may likewise be adjusted according to the training performance. If the evaluation shows that no further training is needed, the selection device 10 may complete the training of the artificial intelligence model 100.

In this way, rather than selecting all annotation targets from the bulk medical data at once (for example, 10,000 out of 100,000 images), the selection device 10 may determine, in a tailored manner, a certain amount of annotation targets to be used for the secondary training (for example, 10,000 out of 100,000 images) based on the training performance of the artificial intelligence model 100 trained with the primary training data (for example, 1,000 out of 100,000 images). The selection device 10 then determines whether tertiary training is needed based on the training performance of the artificial intelligence model 100 retrained with the annotations chosen for the secondary training, and selects a certain amount of annotation targets to be used for the tertiary training (for example, 20,000 out of 100,000 images), proceeding step by step. Before training with a large amount of training data, the selection device 10 may determine a selection policy for selecting that large amount of training data based on the result of first training the artificial intelligence model 100 with a small amount of data. The selection device 10 can therefore efficiently select the annotation targets needed to improve the training performance of the current artificial intelligence model 100, and can raise the training performance of the artificial intelligence model relative to annotation cost.

Each of FIGS. 3 to 6 is a flowchart of a method for selecting medical data for annotation according to an embodiment.

Referring to FIG. 3, the selection device 10 generates initial training data from some medical data sampled from the bulk medical data and its annotation data (S110). Annotation targets for the initial training data may be sampled in various ways. For example, the selector 300 may randomly sample 1,000 medical images from among 100,000 medical images. Alternatively, the selector 300 may analyze the reading reports associated with the 100,000 medical images and sample those medical images whose reading reports contain information related to the training of the artificial intelligence model 100 (for example, the name of a class to be classified).

The selection device 10 trains the artificial intelligence model 100 with the initial training data so that it outputs a result inferred from an input (S120). Annotation data may be added so that the artificial intelligence model 100 outputs the desired inference result from medical data. For example, the inference result may be a class indicating a cell type contained in an image, a class indicating a tissue type contained in an image, a class indicating image quality or an artifact that affects quality, a class indicating whether a region in an image is a region of interest, a class indicating immune-related information, a class indicating a score such as TPS or CPS, or a class indicating the expression level for a specific stain. The inference result may also be a class indicating a lesion type (for example, pneumothorax, pulmonary nodule, lung cancer, tuberculosis, fracture, pneumonia, atelectasis, pulmonary consolidation, benign breast tumor, malignant breast tumor, or breast cancer lesion), a class indicating an organ type (for example, lung, stomach, esophagus, carina, or diaphragm), a class indicating a type of medical device, a class indicating breast density, or a class indicating the likelihood of developing breast cancer in the future.

The selection device 10 extracts annotation candidate data, which is at least part of the bulk medical data (S130).

Based on the inference results of the artificial intelligence model 100 for the annotation candidate data, the selection device 10 selects, from the annotation candidate data, annotation targets for the next training of the artificial intelligence model (S140). The selection device 10 may determine a selection policy for medical data for the next training based on the training performance of the current artificial intelligence model 100, and select, based on the inference results, the annotation targets that match the selection policy from the annotation candidate data. While selecting annotation targets based on the selection policy, the selection device 10 may also include a certain number of randomly extracted medical data items among the annotation targets. Depending on the training performance of the artificial intelligence model 100, the selection device 10 may determine the current selection policy to be the same as or different from the previous selection policy.

The selection device 10 obtains annotation data for the annotation targets for the next training (S150).

The selection device 10 trains the artificial intelligence model 100 using the annotation targets to which annotation data has been mapped as training data (S160). If annotation data is needed for a further training of the artificial intelligence model 100, the selection device 10 may return to the annotation target selection step (S130) for the next training and repeat the annotation target selection.

Referring to FIG. 4, the selection device 10 determines a selection policy for medical data based on the training performance of the current artificial intelligence model 100 (S210). The selection policy may be determined in various ways depending on the training performance of the current artificial intelligence model 100 and on the task. For example, when the classification accuracy of a specific class is at or below a reference value, the selection device 10 may determine a selection policy that extracts medical data related to that class, so that in the next training the current artificial intelligence model 100 becomes better at classifying the class on which its accuracy was low. The selection policy may also include the number of annotation targets to be selected. Depending on the training performance of the artificial intelligence model 100, the selection device 10 may determine the current selection policy to be the same as or different from the previous selection policy.

Based on the inference results of the artificial intelligence model 100 for the annotation candidate data, the selection device 10 selects, from the annotation candidate data, the medical data that matches the selection policy as annotation targets for the next training (S220). The selection device 10 may obtain inference results for the annotation candidate data through the artificial intelligence model 100 and, based on those inference results, select the medical data that matches the selection policy as annotation targets. The annotation candidate data may be all of the data in the bulk medical data to which no annotation data is mapped, or a certain amount of that data. While selecting annotation targets based on the selection policy, the selection device 10 may also include a certain number of randomly extracted medical data items among the annotation targets.

The selection device 10 obtains annotation data for the annotation targets and trains the artificial intelligence model 100 based on new training data that includes the annotation targets to which the annotation data has been mapped (S230). The selection device 10 may retrain the artificial intelligence model 100 using the new training data together with the existing training data. Depending on the learning method, the artificial intelligence model 100 may perform supervised learning with medical data to which annotation data is mapped, or may perform unsupervised learning that also uses medical data without mapped annotation data as training data.

The selection device 10 evaluates the training performance of the trained artificial intelligence model 100 based on validation data (S240). The selection device 10 may obtain the inference results of the artificial intelligence model 100 for the validation data and evaluate the training performance by comparing those results with the annotation data (ground truth) mapped to the validation data.

If, according to the evaluation result, no further training is to be performed, the selection device 10 ends the annotation target selection procedure (S250). If further training is to be performed, the selection device 10 may return to the annotation target selection step (S210) for the next training and repeat the annotation target selection and training. The selection device 10 may determine that further training is needed when the training performance of the artificial intelligence model 100 is below the reference level.
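The repeat-until-good-enough flow of steps S210 to S250 can be summarized as a loop. The sketch below is an assumed outline only: the function `annotation_loop`, the injected callables (`train`, `evaluate`, `choose_policy`, `select`, `annotate`), the scalar performance value, and the `max_rounds` safety limit are all hypothetical and stand in for components the disclosure describes at a functional level.

```python
def annotation_loop(train, evaluate, choose_policy, select, annotate,
                    labeled, candidates, target, max_rounds):
    """Iteratively select, annotate, and retrain until performance suffices."""
    model = train(labeled)                  # training on the initial data
    rounds = 1
    while rounds < max_rounds:
        performance = evaluate(model)       # S240: evaluate on validation data
        if performance >= target:           # S250: no further training needed
            break
        policy = choose_policy(performance)          # S210: decide the policy
        picked = select(model, candidates, policy)   # S220: pick targets
        if not picked:                      # nothing left to annotate
            break
        labeled = labeled + annotate(picked)         # S230: obtain annotations
        candidates = [c for c in candidates if c not in picked]
        model = train(labeled)              # S230: retrain on old + new data
        rounds += 1
    return model, labeled, rounds
```

Passing the components in as callables keeps the loop independent of any particular model, metric, or annotation workflow.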

Referring to FIG. 5, the selection device 10 determines a selection policy for medical data based on the training performance of the current artificial intelligence model 100 (S310).

Based on the inference results of the artificial intelligence model 100 for the annotation candidate data, the selection device 10 selects annotation targets for the next training from the annotation candidate data according to the selection policy (S320). While selecting annotation targets based on the selection policy, the selection device 10 may also include a certain number of randomly extracted medical data items among the annotation targets.

The selection device 10 requests annotation work for the new annotation targets (S330).

The selection device 10 obtains inference results for validation data from the artificial intelligence model 100 trained on the results of the annotation work for the new annotation targets (S340).

The selection device 10 evaluates the training performance by comparing the inference results with the annotation data (ground truth) mapped to the validation data (S350).

Depending on the evaluation result, the selection device 10 either ends the annotation target selection procedure for the artificial intelligence model 100 or repeats the annotation target selection procedure for the next training of the artificial intelligence model 100 (S360).

Referring to FIG. 6, the selection device 10 generates initial training data from some data sampled from the bulk medical data and its annotation data (S410). Annotation targets for the initial training data may be sampled in various ways. For example, the selector 300 may randomly sample 1,000 medical images from among 100,000 medical images. Alternatively, the selector 300 may analyze the reading reports associated with the 100,000 medical images and sample those medical images whose reading reports contain information related to the training of the artificial intelligence model 100 (for example, the name of a class to be classified).

The selection device 10 trains the artificial intelligence model 100 with the initial training data so that it outputs an inference result from an input (S420). Annotation data may be added so that the artificial intelligence model 100 outputs the desired inference result from medical data. For example, the inference result may be a class indicating a cell type contained in an image, a class indicating a tissue type contained in an image, a class indicating image quality or an artifact that affects quality, a class indicating whether a region in an image is a region of interest, a class indicating immune-related information, a class indicating a score such as TPS or CPS, or a class indicating the expression level for a specific stain. The inference result may also be a class indicating a lesion type (for example, pneumothorax, pulmonary nodule, lung cancer, tuberculosis, fracture, pneumonia, atelectasis, pulmonary consolidation, benign breast tumor, malignant breast tumor, or breast cancer lesion), a class indicating an organ type (for example, lung, stomach, esophagus, carina, or diaphragm), a class indicating a type of medical device, a class indicating breast density, or a class indicating the likelihood of developing breast cancer in the future.

The selection device 10 evaluates the training performance of the trained artificial intelligence model 100 based on verification data (S430). The verification data may be data generated to evaluate the training performance of the artificial intelligence model 100, and may be extracted from the bulk medical data. The verification data may be included in the subset of data sampled in step S410. The selection device 10 may obtain the inference results of the artificial intelligence model 100 for the verification data and evaluate training performance by comparing those results with the annotation data (ground truth) mapped to the verification data.
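One way to compare inference results with the ground truth is per-class accuracy. The sketch below assumes single-label classification and uses per-class accuracy as the metric; the disclosure does not fix a particular metric, and the function name and toy data are illustrative.

```python
from collections import defaultdict


def per_class_accuracy(predictions, labels):
    """Fraction of correct predictions for each ground-truth class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, label in zip(predictions, labels):
        total[label] += 1
        if predicted == label:
            correct[label] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


# Toy verification set: model outputs versus annotated ground truth.
predictions = ["nodule", "normal", "nodule", "normal"]
labels = ["nodule", "nodule", "nodule", "normal"]
accuracy = per_class_accuracy(predictions, labels)
```

Here the "nodule" class is recognized correctly in two of three cases, which makes it a candidate for heavier sampling in the next round.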

The selection device 10 determines a selection policy for the medical data for the next training based on the current training performance of the artificial intelligence model 100 (S440).

Based on the inference results of the artificial intelligence model 100 for the annotation candidate data, the selection device 10 selects, according to the selection policy, annotation targets for the next training from among the annotation candidate data (S450). While selecting annotation targets according to the selection policy, the selection device 10 may also include a predetermined number of randomly extracted medical data items in the annotation targets.
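The combination of policy-driven and random selection in step S450 can be sketched as follows. The policy used here (prefer candidates that the model predicts as belonging to a weak class) is only one assumed example of a selection policy, and `select_targets`, `weak_classes`, `budget`, and `n_random` are hypothetical names.

```python
import random


def select_targets(candidates, predicted, weak_classes, budget, n_random, seed=0):
    """Pick annotation targets: policy-driven picks plus a fixed random share.

    candidates:   list of candidate IDs
    predicted:    dict of candidate ID -> class predicted by the model
    weak_classes: classes the policy wants more of (assumed policy)
    budget:       total number of targets requested
    n_random:     number of targets drawn at random regardless of policy
    """
    rng = random.Random(seed)
    n_policy = budget - n_random
    weak = [c for c in candidates if predicted[c] in weak_classes]
    policy_pick = rng.sample(weak, min(n_policy, len(weak)))
    chosen = set(policy_pick)
    rest = [c for c in candidates if c not in chosen]
    random_pick = rng.sample(rest, min(n_random, len(rest)))
    return policy_pick + random_pick


candidates = [f"c{i}" for i in range(20)]
predicted = {c: ("nodule" if i % 4 == 0 else "normal") for i, c in enumerate(candidates)}
targets = select_targets(candidates, predicted, {"nodule"}, budget=6, n_random=2)
```

Keeping a random share in every round guards against the selection drifting toward only what the current model already considers interesting.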

The selection device 10 obtains annotation data for the annotation targets for the next training, and trains the artificial intelligence model 100 on new training data that includes the annotation targets with their annotation data mapped to them (S460).

The selection device 10 evaluates the training performance of the trained artificial intelligence model 100 (S470). The selection device 10 may obtain the inference results of the artificial intelligence model 100 for the verification data and evaluate training performance by comparing those results with the annotation data (ground truth) mapped to the verification data.

If, according to the evaluation result, the next training is not to be performed, the selection device 10 ends the annotation target selection procedure (S480). If the next training is to be performed, the selection device 10 may return to the annotation target selection step for the next training (S440) and repeat annotation target selection and training.
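The overall flow of steps S410 to S480 can be sketched as a loop. All callables passed in (`train`, `evaluate`, `decide_policy`, `select`, `annotate`) are hypothetical placeholders, and the stopping rule (a target score or a maximum number of rounds) is an assumption, since the text only says that the decision depends on the evaluation result.

```python
def run_selection_loop(model, initial_data, candidates, train, evaluate,
                       decide_policy, select, annotate,
                       target_score=0.9, max_rounds=5):
    """Repeat select -> annotate -> train -> evaluate until a stop condition."""
    train_data = list(initial_data)                    # S410: initial training data
    model = train(model, train_data)                   # S420: initial training
    score = evaluate(model)                            # S430: first evaluation
    rounds = 0
    while score < target_score and rounds < max_rounds and candidates:
        policy = decide_policy(score)                  # S440: choose selection policy
        targets = select(model, candidates, policy)    # S450: pick annotation targets
        if not targets:
            break
        train_data += annotate(targets)                # S460: annotate and retrain
        candidates = [c for c in candidates if c not in targets]
        model = train(model, train_data)
        score = evaluate(model)                        # S470: re-evaluate
        rounds += 1
    return model, score, rounds                        # S480: end of procedure


# Toy stand-ins: the "model" just remembers how much data it was trained on.
model, score, rounds = run_selection_loop(
    model=None,
    initial_data=[("a", "label"), ("b", "label")],
    candidates=list(range(20)),
    train=lambda m, data: {"n": len(data)},
    evaluate=lambda m: min(1.0, m["n"] / 10),
    decide_policy=lambda s: "weak-class",
    select=lambda m, cands, policy: cands[:2],
    annotate=lambda targets: [(t, "label") for t in targets],
)
```

With these stand-ins the score rises by 0.2 per round, so the loop stops after four rounds once the assumed target of 0.9 is exceeded.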

FIGS. 7 and 8 are each a flowchart of a method for selecting patch images for annotation according to an embodiment.

Referring to FIG. 7, the selection device 10 extracts patches from at least one medical image (S510). The selection device 10 may extract the patches so that regions of interest overlap, or so that no regions overlap. The medical image may be, for example, a pathology image generated through WSI, but is not limited thereto.
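Overlapping and non-overlapping patch extraction differ only in the stride used to slide the patch window. The sketch below computes patch origins over a whole image on a regular grid; restricting the grid to regions of interest is omitted, and the function name and sizes are assumptions for illustration.

```python
def patch_origins(width, height, patch, stride):
    """Top-left (x, y) of every full patch inside a width x height image.

    stride == patch gives non-overlapping tiles;
    stride < patch gives overlapping patches.
    """
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]


non_overlapping = patch_origins(1024, 1024, patch=256, stride=256)
overlapping = patch_origins(1024, 1024, patch=256, stride=128)
```

Halving the stride roughly quadruples the number of patches in two dimensions, which directly increases the size of the candidate pool the selector has to score.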

Based on the inference results of the artificial intelligence model 100 for the patches, the selection device 10 selects as annotation targets those patches that satisfy the selection policy (S520). While selecting annotation targets according to the selection policy, the selection device 10 may also include a predetermined number of randomly extracted patches in the annotation targets.

The selection device 10 obtains annotation data for the selected patches and trains the artificial intelligence model 100 on the patches to which the annotation data is mapped (S530).

The selection device 10 determines whether the trained artificial intelligence model 100 needs to be retrained (S540).

If retraining is not needed, the selection device 10 ends the annotation target selection procedure for the artificial intelligence model; if retraining is needed, it repeats the annotation target selection procedure for the next training of the artificial intelligence model (S550).

Referring to FIG. 8, the selection device 10 obtains an inference result for at least one medical image using an artificial intelligence model trained to output an inference result from an input image (S610). The medical image may be, for example, a pathology image generated through WSI, but is not limited thereto.

The selection device 10 extracts patches from within each medical image (S620). The selection device 10 may extract the patches so that regions of interest overlap, or so that no regions overlap.

The selection device 10 maps, to each patch, the corresponding inference result based on the inference result for each medical image and, based on the inference result of each patch, selects as annotation targets those patches that satisfy the selection policy (S630). While selecting annotation targets according to the selection policy, the selection device 10 may also include a predetermined number of randomly extracted patches in the annotation targets.
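Mapping an image-level inference result onto patches can be sketched as follows, assuming the image-level result is a per-pixel probability map and that a patch's result is the mean probability inside it. Both the map format and the "closest to 0.5" uncertainty policy are assumptions chosen for illustration; the disclosure does not prescribe them.

```python
def patch_scores(prob_map, patch):
    """Average a per-pixel probability map over non-overlapping patches.

    prob_map: 2D list (rows of floats); returns {(x, y): mean probability}.
    """
    height, width = len(prob_map), len(prob_map[0])
    scores = {}
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            values = [prob_map[yy][xx]
                      for yy in range(y, y + patch)
                      for xx in range(x, x + patch)]
            scores[(x, y)] = sum(values) / len(values)
    return scores


def most_uncertain(scores, k):
    """Return the k patch origins whose score is closest to 0.5."""
    return sorted(scores, key=lambda origin: abs(scores[origin] - 0.5))[:k]


# Toy 4x4 probability map split into four 2x2 patches.
prob_map = [
    [0.75, 0.75, 0.25, 0.25],
    [0.75, 0.75, 0.25, 0.25],
    [0.50, 0.50, 0.00, 0.00],
    [0.50, 0.50, 0.00, 0.00],
]
scores = patch_scores(prob_map, patch=2)
picked = most_uncertain(scores, k=1)
```

In this toy map the lower-left patch has a mean of exactly 0.5, so it is the one selected for annotation under the assumed policy.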

In this case, the selection device 10 may use the inference result mapped to each patch to train an artificial intelligence model that outputs an inference result from a patch image. The selection device 10 may also obtain annotation data for the selected patches and retrain the artificial intelligence model on the patches to which the annotation data is mapped. As described above, when the artificial intelligence model needs to be retrained, the selection device 10 may repeat the procedure of selecting annotation targets for the next training based on the model's inference results for the patches and then training on them.

FIG. 9 is a hardware configuration diagram of an annotation selection device according to an embodiment.

Referring to FIG. 9, the selection device 10 is a computing device operated by one or more processors 11, and may include the processor 11, a memory 13 into which a computer program executed by the processor 11 is loaded, a storage 15 that stores the computer program and various data, a communication interface 17, and a bus 19 connecting them. The selection device 10 may further include various other components.

The computer program may include instructions that, when loaded into the memory 13, cause the processor 11 to perform methods or operations according to various embodiments of the present disclosure. That is, the processor 11 may perform methods or operations according to various embodiments of the present disclosure by executing the instructions. A computer program refers to a series of computer-readable instructions grouped by function and executed by a processor. The computer program may include instructions for performing the operations described in the present disclosure.

The processor 11 controls the overall operation of each component of the annotation selection device 10. The processor 11 may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any type of processor well known in the technical field of the present disclosure. The processor 11 may also perform computations for at least one application or computer program for executing methods or operations according to various embodiments of the present disclosure.

The memory 13 stores various data, instructions, and/or information. The memory 13 may load one or more computer programs from the storage 15 in order to execute methods or operations according to various embodiments of the present disclosure. The memory 13 may be implemented as a volatile memory such as RAM, but the technical scope of the present disclosure is not limited thereto.

The storage 15 may store the computer program in a non-transitory manner. The storage 15 may include a nonvolatile memory such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash memory, a hard disk, a removable disk, or any form of computer-readable recording medium well known in the technical field to which the present disclosure belongs.

The communication interface 17 supports wired and wireless Internet communication of the selection device 10. The communication interface 17 may also support various communication methods other than Internet communication. To this end, the communication interface 17 may include a communication module well known in the technical field of the present disclosure.

The bus 19 provides a communication function between the components of the selection device 10. The bus 19 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

The embodiments of the present disclosure described above are not implemented only through a device and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded.

Although embodiments of the present disclosure have been described in detail above, the scope of rights of the present disclosure is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present disclosure as defined in the following claims also fall within the scope of rights of the present disclosure.

Claims

A method of operating a medical data selection device operated by at least one processor, the method comprising:
generating, as training data, partial medical data sampled from bulk medical data and annotation data of the partial medical data;
extracting annotation candidate data that is at least a part of the bulk medical data; and
obtaining an inference result for the annotation candidate data from an artificial intelligence model trained with the training data, and selecting, from among the annotation candidate data and based on the inference result, an annotation target for the next training of the artificial intelligence model.

The method of claim 1,
wherein selecting the annotation target comprises
determining a selection policy of medical data for the next training based on the training performance of the artificial intelligence model, and selecting, based on the inference result, an annotation target corresponding to the selection policy from among the annotation candidate data.

The method of claim 2,
wherein the selection policy is
a policy of extracting medical data of a specific class, whose classification accuracy is below a reference level, at a higher rate than medical data of other classes.

The method of claim 2, further comprising:
extracting verification data that is at least a part of the bulk medical data; and
evaluating the training performance of the artificial intelligence model based on the verification data.

The method of claim 1,
wherein selecting the annotation target comprises
adding medical data randomly extracted from the bulk medical data to the annotation target.

The method of claim 1,
wherein selecting the annotation target comprises
obtaining an inference result for the annotation candidate data from the artificial intelligence model trained with the training data, and selecting, based on the inference result, medical data related to a specific class from the annotation candidate data.

The method of claim 1,
wherein generating the training data comprises
analyzing reading reports associated with the bulk medical data, sampling the partial medical data whose reading reports include information related to training of the artificial intelligence model, obtaining annotation data of the partial medical data, and generating the training data.

The method of claim 1,
wherein the bulk medical data
includes at least one of images acquired through a medical imaging device, pathology images, or patch images extracted from each medical image.

A method of operating a medical data selection device operated by at least one processor, the method comprising:
repeating a process of determining a selection policy of medical data based on the training performance of a current artificial intelligence model, selecting, for the next training and based on an inference result of the current artificial intelligence model for annotation candidate data, an annotation target corresponding to the selection policy from among the annotation candidate data, and then training the current artificial intelligence model based on annotation data obtained for the annotation target; and
terminating the process when the next training of the current artificial intelligence model is not to be performed,
wherein the annotation candidate data is at least a part of bulk medical data.

The method of claim 9,
wherein the repeating comprises
extracting verification data that is at least a part of the bulk medical data, and evaluating the training performance of the current artificial intelligence model based on the verification data.

The method of claim 9,
wherein the repeating comprises
adding medical data randomly extracted from the bulk medical data to the annotation target.

The method of claim 9,
wherein the repeating comprises
determining the selection policy to be the same as or different from the previous selection policy according to the training performance of the current artificial intelligence model.

The method of claim 9,
wherein the current artificial intelligence model is
an initial model trained on partial medical data sampled from the bulk medical data and annotation data of the partial medical data, or a model obtained by retraining the initial model through the process.

The method of claim 9,
wherein the bulk medical data
includes at least one of images acquired through a medical imaging device, pathology images, or patch images extracted from each medical image.

A medical data selection device operated by at least one processor, the device comprising:
an artificial intelligence model trained, based on training data, to output an inference result from an input; and
a selector that extracts annotation candidate data that is at least a part of bulk medical data, obtains an inference result for the annotation candidate data from the artificial intelligence model, and selects, from among the annotation candidate data and based on the inference result, an annotation target for the next training of the artificial intelligence model,
wherein annotation data obtained for the annotation target for the next training is used for the next training of the artificial intelligence model.

The medical data selection device of claim 15,
wherein the selector,
when the next training of the artificial intelligence model is required, repeats the process of selecting an annotation target for the next training based on the inference result of the artificial intelligence model.

The medical data selection device of claim 15,
wherein the selector
determines a selection policy of medical data for the next training based on the training performance of the artificial intelligence model, and selects, based on the inference result, an annotation target corresponding to the selection policy from among the annotation candidate data.

The medical data selection device of claim 17,
wherein the selector
determines the selection policy to be the same as or different from the previous selection policy according to the training performance of the artificial intelligence model.

The medical data selection device of claim 15,
wherein the selector
adds medical data randomly extracted from the bulk medical data to the annotation target.

The medical data selection device of claim 15,
wherein the bulk medical data
includes at least one of images acquired through a medical imaging device, pathology images, or patch images extracted from each medical image.