KR102495367B1

KR102495367B1 - Multiple instance learning for histopathology classification

Info

Publication number: KR102495367B1
Application number: KR1020200047888A
Authority: KR
Inventors: 박상현; 치콘테 필립
Original assignee: 재단법인대구경북과학기술원
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2023-02-01
Also published as: KR20210129850A

Abstract

The present invention relates to a multiple instance learning method for histopathology classification, performed by at least one processor in a computing device or a computing network. The method may include: an instance selection step of executing a feature extraction model (F_θ(·)) to convert the instances (p_ij) derived from the i-th slide into low-dimensional embeddings (g_ij), determining the positivity of each instance (p_ij) with a binary classifier, and ranking the instance-level probabilities of all bags to sample the top instances per slide for training; a learning step of training on the instances obtained in the instance selection step, in which instance-level learning and bag-level learning are performed sequentially to obtain a final loss; and a soft-assignment-based inference step of assigning the bag-level embedding (z_i) to the learned centroids using a kernel that measures the similarity between two points.

Description

Multi-instance learning method for histopathology classification {MULTIPLE INSTANCE LEARNING FOR HISTOPATHOLOGY CLASSIFICATION}

The present invention relates to a multiple instance learning method for histopathology classification, and more particularly to a multiple instance learning method for histopathology classification that can accurately predict instance labels.

Digitizing glass slides into histopathology images with whole-slide image (WSI) scanners has recently come to play an important role as a standard for cancer diagnosis in clinical settings.

A single WSI is very large, on the order of 100k pixels, so selective analysis must contend with both the difficulty of the analysis and the time it consumes.

In particular, given the high computational cost and the bias introduced by observers' subjective judgments, automated and accurate analysis of WSIs is well suited to improving diagnosis and devising better treatment strategies.

Deep learning has become a widely used solution and yields improved results when sufficient training data are available.

However, pixel-level annotation is difficult and costly.

To address this problem, training neural networks with multiple instance learning (MIL) offers a solution that eases the task of final WSI diagnosis without requiring precise annotations.

For example, methods that perform histopathology classification using MIL-based neural network training have been proposed in "Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301-1309 (2019)" and elsewhere.

However, because instance labels in MIL are ambiguous, learning robust instance embeddings is very difficult.

To address this, works such as "Hashimoto, N., Fukushima, D., Koga, R., Takagi, Y., Ko, K., Kohno, K., Nakaguro, M., Nakamura, S., Hontani, H., Takeuchi, I.: Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with non-annotated histopathological images. arXiv preprint arXiv:2001.01599 (2020)" adopt a two-stage approach: (1) training an instance encoder on regions sampled from the WSI, and (2) training an aggregation model that uses the trained instance encoder to integrate instance-level information for slide-level prediction.

Although this two-stage approach succeeds in some problem settings, it often fails when many ambiguous instances are used during training, and the problem worsens in the second stage, where the aggregation model is trained, because the features do not represent the true labels.

As a result, accurate classification is difficult and the reliability of the classification may be reduced.

In view of the above problems, an object of the present invention is to provide a multiple instance learning method for histopathology classification that enables more accurate classification by changing the aggregation model.

A further object is to provide a multiple instance learning method for histopathology classification that follows the standard histopathology workflow.

More specifically, an object of the present invention is to provide an end-to-end model for histopathology classification that can assign accurate instance and bag labels.

To solve the above technical problem, the multiple instance learning method for histopathology classification of the present invention is performed by at least one processor in a computing device or a computing network, and may include: an instance selection step of executing a feature extraction model (F_θ(·)) to convert the instances (p_ij) derived from the i-th slide into low-dimensional embeddings (g_ij), determining the positivity of each instance (p_ij) with a binary classifier, and ranking the instance-level probabilities of all bags to sample the top instances per slide for training; a learning step of training on the instances obtained in the instance selection step, in which instance-level learning and bag-level learning are performed sequentially to obtain a final loss; and a soft-assignment-based inference step of assigning the bag-level embedding (z_i) to the learned centroids using a kernel that measures the similarity between two points.

The present invention improves the learning of bag features through a center loss and reduces the uncertainty of instance labels, and can thereby perform more accurate classification.

In addition, by considering both instance-based MIL and embedding-based MIL, the invention improves classification performance and can reduce false positives.

FIG. 1 shows the framework of a multiple instance learning method for histopathology classification according to a preferred embodiment of the present invention.
FIG. 2 shows qualitative results of the present invention in two respects: the k patches the model samples per slide class, and the effect of the trained model on interpretability through segmentation.

To provide a full understanding of the configuration and effects of the present invention, preferred embodiments are described with reference to the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below; it may be implemented in various forms and modified in various ways. The description of the embodiments is provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains. In the accompanying drawings, components are shown enlarged relative to their actual size for convenience of description, and the proportions of the components may be exaggerated or reduced.

Terms such as 'first' and 'second' may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a 'first component' may be termed a 'second component', and similarly a 'second component' may be termed a 'first component'. Singular expressions include plural expressions unless the context clearly indicates otherwise. Unless otherwise defined, terms used in the embodiments of the present invention are to be interpreted with the meanings commonly known to those of ordinary skill in the art.

The present invention also relates to a method of performing histopathology classification by digitizing images acquired with a whole-slide image (WSI) scanner and learning from the result.

The present invention may be carried out on a computing device that includes a processor, such as a microprocessor, that executes the proposed framework; storage means in which the digitized WSIs are stored; memory means from which data can be temporarily loaded during classification and training; input means through which a user can enter control commands; and display means for displaying classification results.

The computing device may further include a WSI scanner.

The computing device may be a single device such as a personal computer, server, smart pad, or smartphone, or a networked system that includes terminals that collect WSI data and a server that trains on and classifies the collected WSI data.

That is, the present invention is a method that uses a standalone or networked computing system, and can in particular be understood as a process executed by a processor.

Accordingly, even where not explicitly stated, the entity performing each step of the present invention may be described in different terms but may be an ordinary processor.

FIG. 1 shows the framework of a multiple instance learning method for histopathology classification according to a preferred embodiment of the present invention.

Referring to FIG. 1, the present invention includes an instance selection step (S10) of sampling k instances based on the highest predicted probabilities per slide, a learning step (S20) of training on the instances obtained in the instance selection step (S10), and a soft-assignment-based inference step (S30) with respect to the learned centers.

The configuration and operation of the present invention are described in more detail below.

First, some definitions are given before the detailed description of the present invention.

The WSI dataset (D) is a set of data and can be expressed as {S_1, ..., S_n}.

Here each S_i (i = 1, ..., n, where n is a positive integer) is a WSI with a label y_i, which takes the value 0 or 1.

Each S_i in the WSI dataset may have m instances.

That is, S_i can be expressed as {s_i1, s_i2, ..., s_im}.

Here m is a positive integer less than or equal to M, the total number of instances obtained from the non-background regions of the slide.

In the present invention, in order to assign the correct label (y_i) to each slide, only the given slide-level labels are used during training, where 1 denotes positive and 0 denotes negative.

Multiple instance learning (MIL) must satisfy the following conditions:

- If a slide (S_i) is negative, all instances of S_i are negative. That is, if the slide label y_i is 0, then y_ij = 0 for all j.

- If a slide (S_i) is positive, at least one instance of S_i is positive. That is, if the slide label y_i is 1, then Σ_j y_ij ≥ 1.
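
The two conditions above can be written as a small check. This is a minimal sketch for illustration only; the function names `bag_label` and `is_consistent` are hypothetical and do not come from the patent, and in practice the instance labels are not observed during training.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return the slide (bag) label implied by the MIL conditions."""
    if any(label not in (0, 1) for label in instance_labels):
        raise ValueError("instance labels must be 0 or 1")
    # Positive bag: at least one positive instance. Negative bag: all negative.
    return 1 if sum(instance_labels) >= 1 else 0


def is_consistent(slide_label: int, instance_labels: Sequence[int]) -> bool:
    """Check that a slide label agrees with its (normally hidden) instance labels."""
    return slide_label == bag_label(instance_labels)
```

For example, a slide labeled 0 that contains any instance labeled 1 violates the first condition, so `is_consistent(0, [0, 1])` is false.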

To satisfy these conditions, the present invention proposes a learning model (a CNN model) that can perform both instance-level discrimination and slide-level classification within the end-to-end framework shown in FIG. 1.

First, in the instance selection step (S10), the feature extraction model (F_θ(·)), a neural network, is executed to convert each instance derived from the i-th slide into a low-dimensional embedding (g_ij).

The low-dimensional embedding can be expressed as g_ij = F_θ(s_ij).

This transformation is carried out through the instance branch (L_I), which includes the shared embedding module (E_θ).

Next, the classification module outputs the positivity of the instance (p_ij).

The positivity is output by a binary classifier (H_I); as noted above, 1 denotes positive and 0 denotes negative.

The binary classifier (H_I) that classifies the instance (p_ij) can be expressed as H_I(p_ij).

Next, the instance-level probabilities of all bags are ranked to obtain the top instances per slide for training.

That is, the present invention does not use all of the data (D) but only the k selected data items (D_k). This is referred to as top-k instance selection.

The selected data (D_k) is a subset of the data (D).

During this process, the parameters θ of F, E, and the bag module (B) described below are, like those of the other modules, neither updated nor stored.

In other words, in the instance selection step, k instances are sampled through the instance module at the start of each training round, based on the highest predicted probabilities per slide.
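
The top-k selection described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes the instance-level probabilities have already been computed by the instance branch, and the function name `select_top_k` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def select_top_k(probs_per_slide: Dict[str, List[float]], k: int) -> Dict[str, List[int]]:
    """For each slide, return the indices of the k instances with the highest probability."""
    selected: Dict[str, List[int]] = {}
    for slide_id, probs in probs_per_slide.items():
        # Rank instance indices by descending probability; ties keep their original order.
        order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        # A slide with fewer than k instances simply contributes all of them.
        selected[slide_id] = order[:k]
    return selected
```

Only the selected indices are carried into the learning step, so the ranking pass itself does not need gradients.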

The learning step (S20) is then performed using the k instances obtained.

The learning step (S20) includes instance-level learning and bag-level learning.

That is, training takes both the instance level and the bag level into account.

First, in instance-level learning, for the k instance inputs given at each training step, the embeddings (g_ij) are obtained through the neural network (F_θ) after global average pooling.

The embeddings (g_ij) are fed to the shared embedding module (E_θ), and predictions are obtained from the instance classifier (H_I), which is trained with cross entropy.

Here the k instance inputs (s_ij) belong to the set of k selected data items (D_k).

Each instance is assigned the bag-level label (y); with the above conditions satisfied, these labels are used to compute the instance loss (L_I).

The instance loss can be expressed as Equation 1 below.
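
Equation 1 is not reproduced in this text. As a hedged illustration of the description above, the sketch below computes a standard binary cross entropy over the k selected instances, with every instance assigned its slide's label. The averaging over k and the name `instance_loss` are assumptions, not details confirmed by the source.

```python
import math
from typing import Sequence


def instance_loss(probs: Sequence[float], slide_label: int, eps: float = 1e-12) -> float:
    """Mean binary cross entropy of instance probabilities against the slide label."""
    total = 0.0
    for p in probs:
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(slide_label * math.log(p) + (1 - slide_label) * math.log(1.0 - p))
    return total / len(probs)
```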

In bag-level learning (pyramidal bag-level learning, B_θ), three feature maps obtained in the instance-level learning are used in addition to the embeddings (g_ij) obtained above.

The three feature maps are the low-dimensional embeddings, the instances, and the instance-level probabilities of the bags.

The three feature maps are fed to a series of convolutional blocks whose input sizes, [512, 256, 128], correspond to the feature-map sizes, and each yields a feature reduced to a single channel.

Each map is then upsampled by linear interpolation to match the size of the previous block and concatenated with the previous feature map, yielding a single spatial map with k channels that is used to obtain the final flattened feature (z_i).

The final flattened feature (z_i) is then passed through the shared embedding module (E_θ) to the bag classifier (H_B(z_i)).

The bag prediction (ŷ) is used to compute the bag loss (L_B(ŷ, y)) using cross entropy.

The bag loss is given by Equation 2 below.

In addition, the present invention introduces the concept of a center loss to improve the discriminative power of the deep features.

The center loss characterizes intra-class variation by learning embeddings that minimize the distance between instances in the same bag.

The center loss can be expressed as Equation 3 below.

In Equation 3, c_yi is the center of the deep features of the y_i-th class, with the same dimension as the embedding (g_ij), and u is the mini-batch size.
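
Equation 3 is not reproduced in this text. The sketch below uses the commonly cited form of the center loss, half the sum of squared distances between each embedding in the mini-batch and the center of its class. That this is the exact form of Equation 3 is an assumption, and the name `center_loss` and the plain-list representation are illustrative only.

```python
from typing import Dict, Sequence


def center_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Dict[int, Sequence[float]],
) -> float:
    """Half the summed squared distance between each embedding and its class center."""
    total = 0.0
    # The loop runs over the u embeddings of one mini-batch.
    for g, y in zip(embeddings, labels):
        c = centers[y]  # center with the same dimension as the embedding
        total += sum((gd - cd) ** 2 for gd, cd in zip(g, c))
    return 0.5 * total
```

The loss is zero only when every embedding coincides with the center of its class, which is the clustering behavior the text describes.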

In the present invention, the class centers are initialized from a standard normal distribution and are parameterized by a center module (C_θ(·)) that is trained together with the center loss (L_C) and the bag loss (L_B).

Intuitively, the instance embeddings of the same bag should share similar features so that they cluster around nearby points in the embedding space.

The class centers are updated based on the instance embeddings in a mini-batch rather than on the entire dataset.

The final loss function can be expressed as Equation 4.

In Equation 4, α, β, and λ are the weights that balance the individual losses.
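
Equation 4 is not reproduced in this text. A plausible form is a weighted sum of the three losses named in the description. The pairing of each weight with a particular loss below is an assumption; the text states only that α, β, and λ balance the losses.

```latex
% Assumed pairing: \alpha with the bag loss, \beta with the instance loss,
% \lambda with the center loss. The source does not state this pairing.
L = \alpha \, L_B + \beta \, L_I + \lambda \, L_C
```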

Next, in the soft-assignment-based inference step (S30), given the bag embeddings obtained through the bag-level learning, the correct label must be assigned as the final diagnosis for accurate classification.

To this end, the instance embeddings of the same bag should match a single centroid that represents the bag label.

The bag-level embedding for slide S_i is z_i = B({g_i1, ..., g_ik}), and z_i can be assigned to the learned centroids through Equation 5, a kernel that measures the similarity between two points.

In Equation 5, q_i is the probability of assigning the bag-level embedding (z_i) to a class center, and ψ is the number of degrees of freedom of the Student's t-distribution.
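
Equation 5 is not reproduced in this text. The sketch below uses the Student's t kernel familiar from clustering methods, in which the weight of center c_j is (1 + ||z_i - c_j||² / ψ)^(-(ψ+1)/2), normalized over all centers. That this matches Equation 5 exactly is an assumption based on the description of q_i and ψ; the name `soft_assign` is hypothetical.

```python
from typing import List, Sequence


def soft_assign(z: Sequence[float], centers: Sequence[Sequence[float]], psi: float = 1.0) -> List[float]:
    """Return the probability of assigning embedding z to each class center."""
    weights = []
    for c in centers:
        sq_dist = sum((zd - cd) ** 2 for zd, cd in zip(z, c))
        # Student's t kernel: closer centers receive larger weights.
        weights.append((1.0 + sq_dist / psi) ** (-(psi + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]
```

The predicted slide label would then be the index of the largest probability, for example `q.index(max(q))`.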

In the above example, the slide-level label is determined from the distance between the prediction obtained in the center-loss part and the learned centers; as another example, it can also be determined from the prediction obtained from the bag-level loss.

In addition, although no instance-level labels are available during training, predictions can still be obtained at the instance level.

Once instance-level predictions are available, malignant regions (tumors) can be detected automatically.

An example experiment in which histopathology classification is performed using the present invention, and its results, are described below.

- Datasets and settings

As an embodiment of the present invention, two colorectal cancer (CRC) datasets were prepared, collected from an anonymized medical center and scanned under different scanning conditions.

The datasets contain normal and malignant tissue slides stained with hematoxylin and eosin (H&E) and scanned at ×40 magnification on different scanners.

CRC is the third most common cancer in humans and a common cause of death in both men and women.

In these datasets, the malignant slides include microsatellite instability (MSI) CRC, a molecular phenotype caused by defective DNA mismatch repair.

Expert pathologists use immunohistochemical analysis (IHC) and PCR-based amplification to detect MSI and guide treatment. Determining MSI status in CRC has prognostic and therapeutic implications.

Of the two datasets, the first consists of 173 WSIs in total (59 normal and 114 malignant), and the second consists of 193 WSIs in total (85 normal and 108 malignant).

In the present invention, each dataset is divided into non-overlapping sets for training, validation, and testing, comprising 40%, 10%, and 50% of the whole, respectively.
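
A non-overlapping 40/10/50 split of slide identifiers can be sketched as follows. The text does not say how slides were allocated, so the random, non-stratified shuffle, the fixed seed, and the name `split_slides` are all assumptions made for illustration.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def split_slides(
    slide_ids: Iterable[Hashable],
    fractions: Tuple[float, float, float] = (0.4, 0.1, 0.5),
    seed: int = 0,
) -> Tuple[List[Hashable], List[Hashable], List[Hashable]]:
    """Shuffle slide IDs and cut them into disjoint train, validation, and test lists."""
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (assumption)
    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder, roughly the last fraction
    return train, val, test
```

With the 173 slides of the first dataset, this rounding gives 69 training, 17 validation, and 87 test slides.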

Next, each WSI is converted to the HSV color space, and non-tissue regions are removed according to the method described in "Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62-66 (1979)".
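
Otsu's method chooses the threshold that maximizes the between-class variance of a gray-level histogram. The sketch below applies it to a flat list of 8-bit values from a single channel. Which HSV channel is thresholded, and that tissue lies above the threshold, are assumptions not stated in the text; the name `otsu_threshold` is illustrative.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t maximizing between-class variance (class 0 is values <= t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # number of values at or below the candidate threshold
    sum0 = 0      # sum of those values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

A tissue mask under these assumptions would be `[v > t for v in values]` with `t = otsu_threshold(values)`.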

Candidate patch locations are selected at random for extraction from each slide during each training and validation run.

During training and inference, the number of instances k is set to 50, and instances of size 256×256 are used.

ResNet-34 from "He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770-778 (2016)" was fine-tuned and used as the feature extraction model (F_θ).

Fully connected layers were used for the instance and bag classifiers (H_I, H_B).

The number of features in the embedding and center modules is set to 512, and the entire framework is trained end to end for 40 epochs with a learning rate of 1e-4.

The loss-balancing weights α, β, and λ are set to 1.0, 0.01, and 0.01, respectively. More generally, the balancing weights can be set to any positive decimal values.

- Comparison methods

To demonstrate the effect of the present invention, its classification results are compared with those of a state-of-the-art method in the field, "Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301-1309 (2019)". This method is referred to below as Comparative Technique 1.

Comparative Technique 2 ("Ilse, M., Tomczak, J.M., Welling, M.: Attention-based deep multiple instance learning. arXiv preprint arXiv:1802.04712 (2018)") and Comparative Technique 3 ("Nazeri, K., Aminpour, A., Ebrahimi, M.: Two-stage convolutional neural network for breast cancer histology image classification. In: International Conference Image Analysis and Recognition. pp. 717-726. Springer (2018)") are also evaluated.

For a fair comparison, the same backbone F_θ is used in all cases.

Comparative Techniques 1 and 3 both use a two-stage training procedure of instance-level learning followed by slide-level aggregation.

Comparative Technique 2 uses an end-to-end approach with permutation-invariant pooling based on an attention mechanism.

- 양적 결과- Quantitative results

아래의 표 1은 본 발명의 제1데이터 세트 분류 결과와 비교기술들의 성능 비교표이다.Table 1 below is a performance comparison table of the first data set classification results and comparison techniques of the present invention.

위의 표 1을 통해 확인할 수 있는 바와 같이 소프트 할당 기반 추론 단계(S30)를 포함하는 본 발명이 가장 우수한 결과를 달성하였다.As can be seen from Table 1 above, the present invention including the soft allocation-based reasoning step (S30) achieved the best results.

비교기술1은 최상위 인스턴스의 확률을 최종 슬라이드로 간주하기 때문에 성능이 저하된다.Comparison Technique 1 suffers from poor performance because it considers the probability of the top instance as the final slide.

In addition, when evaluated using bag classification alone, the present invention achieved a 9.97% improvement over Comparative Technique 1, and the assignment-based approach achieved a 15.56% improvement.

Notably, Comparative Technique 2 was the best of the compared techniques, which demonstrates the benefit of attention-based aggregation. Even so, applying the bag module of the present invention can improve performance beyond that of the method described in the Comparative Technique 2 reference.

In most cases, soft assignment is an excellent alternative to a bag classifier. The present invention has discussed that the learned centers carry the maximum information among similar instance embeddings, and the soft-assignment-based inference step (S30) delivers stronger performance than other methods that use a bag classifier.
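The soft assignment of a bag-level embedding to learned class centers can be sketched as follows. This is a minimal illustration that assumes the normalized Student's t kernel commonly used for soft cluster assignment, with ψ as the degrees of freedom named in the claims; Equation 5 itself is not reproduced here, so the exact form is an assumption, and the function name `soft_assign` is hypothetical.

```python
def soft_assign(z, centroids, psi=1.0):
    """Probability of assigning bag embedding z to each learned centroid.

    Assumed kernel: (1 + ||z - c||^2 / psi) ** (-(psi + 1) / 2),
    normalized over all centroids so the outputs sum to 1.
    """
    kernel = []
    for c in centroids:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, c))
        kernel.append((1.0 + sq_dist / psi) ** (-(psi + 1.0) / 2.0))
    total = sum(kernel)
    return [k / total for k in kernel]


# Hypothetical two-class example: the embedding lies closer to the second center.
q = soft_assign([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0]])
predicted_class = q.index(max(q))
```

Under this sketch the predicted slide class is simply the center with the highest assignment probability, so no separate bag classifier is needed at inference time.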

Table 2 compares the classification performance of the present invention with that of the comparative techniques on the second data set.

As shown in Table 2, the present invention achieves markedly higher performance than the comparative techniques.

Although the RNN-based aggregation of Comparative Technique 1 is the best of the comparative techniques on this data set, the present invention improves on it by 1.28%.

In addition, most methods scored higher on the second data set than on the first. This is because differences in scanning protocols can lead to differences in performance.

No color normalization was applied in preprocessing, because the standard protocol introduced in Comparative Technique 1, which uses a recent normalization technique, was followed.

- Qualitative results

Figure 2 shows qualitative results of the present invention from two perspectives: the k patches that the model samples per slide class, and the effect of the trained model on interpretability as demonstrated through segmentation.

When a model's decisions are meant to assist experts in clinical research, it is helpful to see which regions the model focuses on.

Accordingly, the effect of the present invention is verified visually by collecting, with k set to 5, the patches that the instance classifier predicts with the highest and the lowest positive probabilities.

In particular, the patches predicted with the lowest values all correspond to genuinely normal tissue, whereas the high-probability patches are malignant tumor regions that cluster together.

This indicates that the model can accurately classify the ambiguous labels within each slide.
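The patch collection described above, with k set to 5, can be sketched in Python as follows. This is only an illustration: the function name `top_and_bottom_k` and the list-of-probabilities input are hypothetical, standing in for the per-patch outputs of the instance classifier on one slide.

```python
def top_and_bottom_k(probs, k=5):
    """Return indices of the k highest and k lowest predicted probabilities.

    probs: positive-class probability for each patch of one slide
    The first list is ordered from highest to lowest probability,
    the second from lowest to highest.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Patch indices sorted by descending predicted probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:k]
    bottom = order[::-1][:k]
    return top, bottom
```

The patches at the returned indices can then be displayed side by side to check whether the high-probability ones fall in tumor regions and the low-probability ones in normal tissue.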

In addition, by using the trained model to classify an entire slide patch by patch and applying a threshold, a heat map showing the regions of high tumor probability can be obtained.

Notably, these predictions match the expert annotations accurately.

This is because each stage of the present invention uses sampling that accurately selects suspicious instances and avoids false negatives.
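The thresholded heat map described above can be sketched as follows. This is a minimal illustration that assumes the patches lie on a regular grid and that their tumor probabilities have already been predicted; the function name `tumor_heat_map`, the dictionary input, and the default threshold of 0.5 are all hypothetical choices.

```python
def tumor_heat_map(patch_probs, grid_shape, threshold=0.5):
    """Build a 2-D heat map of tumor probability from patch predictions.

    patch_probs: dict mapping (row, col) grid positions to probabilities
    grid_shape: (rows, cols) of the patch grid covering the slide
    Cells whose probability is below the threshold are left at 0.0.
    """
    rows, cols = grid_shape
    heat = [[0.0] * cols for _ in range(rows)]
    for (r, c), p in patch_probs.items():
        if p >= threshold:
            heat[r][c] = p
    return heat
```

Raising the threshold keeps only the most confident tumor regions, which makes the map easier to compare against an expert annotation.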

These results verify the effectiveness of the present invention and show that it strikes a good balance between instance-level and bag-level feature learning.

Embodiments of the present invention have been described above, but they are merely illustrative; a person of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be defined by the following claims.

Claims

A multi-instance learning method for histopathology classification performed by at least one processor in a computing device or computing network, comprising:
an instance selection step of executing a feature extraction model (Fθ(·)) to convert an instance (pij) derived from an i-th slide into a low-dimensional embedding (gij), determining the positivity of the instance (pij) using a binary classifier, and then sorting the instance-level probabilities of all bags to sample the top-ranked instances per slide for learning;
a learning step of learning using the instances obtained in the instance selection step, in which instance-level learning and bag-level learning are performed sequentially to obtain a final loss; and
a soft-assignment-based inference step of distributing bag-level embeddings (zi) to learned centroids using a kernel that measures the similarity between two points,
wherein, in the learning step,
an instance loss is obtained by performing the instance-level learning, a bag loss is obtained by performing the bag-level learning based on the instance-level learning, a center loss is obtained based on the instance-level learning and the bag-level learning, and the final loss is obtained from the instance loss, the bag loss, and the center loss.

delete

The multi-instance learning method for histopathology classification of claim 1,
wherein the center loss is used so that the features of instances extracted from the same bag become similar.

The multi-instance learning method for histopathology classification of claim 1 or 3,
wherein the instance loss is represented by Equation 1 below.
Equation 1

where L_I is the instance loss, y_ij is the instance label, and p_ij is the instance.

The multi-instance learning method for histopathology classification of claim 1 or 3,
wherein the bag loss is represented by Equation 2 below.
Equation 2

where L_B is the bag loss, y_i is the instance label, and ŷ_i is the bag prediction.

The multi-instance learning method for histopathology classification of claim 1 or 3,
wherein the center loss is represented by Equation 3 below.
Equation 3

where L_C is the center loss, g_ij is the embedding, c_yi is the deep-feature center of the y_i-th class, having the same dimension as the embedding (g_ij), and u is the mini-batch size.
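For orientation, a center loss of this kind can be sketched in Python as follows. Equation 3 is not reproduced in this text, so the sketch assumes the widely used form, half the sum of squared Euclidean distances between each embedding and the center of its class; the scaling factor, the function name `center_loss`, and the dictionary of centers are assumptions made for illustration.

```python
def center_loss(embeddings, labels, centers):
    """Assumed center loss: 0.5 * sum of ||g - c_y||^2 over a mini-batch.

    embeddings: list of instance embedding vectors g
    labels: class label y for each embedding
    centers: dict mapping each class label to its center vector c_y
    """
    total = 0.0
    for g, y in zip(embeddings, labels):
        # Squared distance between the embedding and its class center.
        total += sum((a - b) ** 2 for a, b in zip(g, centers[y]))
    return 0.5 * total
```

Minimizing a term of this shape pulls embeddings that share a label toward a common center, which is how features of instances from the same bag are made similar.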

The multi-instance learning method for histopathology classification of claim 1 or 3,
wherein the final loss is represented by Equation 4 below.
Equation 4

where L is the final loss, L_I is the instance loss, L_B is the bag loss, L_C is the center loss, and α, β, and λ are weights that balance the respective losses and are arbitrary positive decimal values.
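A weighted combination of the three losses can be sketched as follows. Equation 4 is not reproduced in this text, so the sketch assumes a plain weighted sum, and the pairing of each weight with a term (α with the instance loss, β with the bag loss, λ with the center loss) simply follows the order in which they are listed and is an assumption. The function name `final_loss` and the default weights of 1.0 are hypothetical.

```python
def final_loss(l_instance, l_bag, l_center, alpha=1.0, beta=1.0, lam=1.0):
    """Assumed final loss: alpha * L_I + beta * L_B + lam * L_C.

    All three weights must be positive; their pairing with the individual
    loss terms is an assumption of this sketch.
    """
    if min(alpha, beta, lam) <= 0:
        raise ValueError("alpha, beta and lam must be positive")
    return alpha * l_instance + beta * l_bag + lam * l_center
```

Lowering one weight relative to the others reduces the influence of that term on training, which is the balancing role the claim assigns to α, β, and λ.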

The multi-instance learning method for histopathology classification of claim 1 or 3,
wherein the kernel that measures the similarity between two points in the soft-assignment-based inference step is represented by Equation 5 below.
Equation 5

where q_i is the probability of assigning the bag-level embedding (z_i) to the class centers, and ψ is the degrees of freedom of the Student's t-distribution.