KR102522734B1

KR102522734B1 - Transfer Learning method and apparatus based on self-supervised learning for 3D MRI analysis

Info

Publication number: KR102522734B1
Application number: KR1020210097644A
Authority: KR
Inventors: 석흥일; 전은지
Original assignee: 고려대학교 산학협력단
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-04-17
Also published as: KR20230016324A

Abstract

A self-supervised-learning-based transfer learning method and apparatus for 3D medical image analysis are disclosed. The transfer learning framework apparatus includes: a backbone module that is pre-trained by representation learning to divide a 3D medical image into three planes, represent each plane as sequential 2D data, and integrate the result into 3D spatial feature information; and a prediction module, composed of fully connected layers, that performs prediction for a target task based on the 3D spatial feature information. The pre-trained backbone module and the prediction module may be fine-tuned according to the target task.

Description

Transfer learning method and apparatus based on self-supervised learning for 3D medical image analysis

The present invention relates to a self-supervised-learning-based transfer learning method and apparatus for 3D medical image analysis.

Transfer learning is a technique that reuses a model trained in a data-rich domain to build a model for a domain where training data is scarce. It covers the idea of transferring a model trained on one task to a different task and then training it further on that task.

Transfer learning shortens the training time of deep learning models, which is otherwise relatively long, and achieves high prediction performance with only a small number of training samples, so it is being actively studied in many areas of deep learning.

Transfer learning has recently drawn attention in medical image analysis as well, where the data is high-dimensional and labeled samples are few, and it is actively used for tasks such as image classification and segmentation.

Most transfer learning frameworks for medical image analysis pre-train a 3D convolutional neural network (CNN) by supervised learning or self-supervised learning.

However, supervised pre-training requires a large labeled training dataset. In addition, methods based on 3D convolutional neural networks have a large number of parameters to train, which makes it difficult in practice to train them on high-dimensional medical image datasets.

An object of the present invention is to provide a self-supervised-learning-based transfer learning method and apparatus capable of achieving high performance with only a small number of training samples in 3D medical image analysis, where labeled training datasets are scarce.

Another object of the present invention is to provide a self-supervised-learning-based transfer learning method and apparatus that divides a 3D medical image into three planes, represents each plane as sequential 2D data, and integrates the result into 3D spatial features through representation learning, thereby learning 3D spatial features effectively while also reducing the number of trainable parameters.

According to one aspect of the present invention, an apparatus for self-supervised-learning-based transfer learning for 3D medical image analysis is provided.

According to an embodiment of the present invention, a self-supervised-learning-based transfer learning framework apparatus may be provided that includes: a backbone module that is pre-trained by representation learning to divide a 3D medical image into three planes, represent each plane as sequential 2D data, and integrate the result into 3D spatial feature information; and a prediction module, composed of fully connected layers, that performs prediction for a target task based on the 3D spatial feature information, wherein the pre-trained backbone module and the prediction module are fine-tuned according to the target task.

The backbone module may include: a 2D convolutional neural network module trained in a self-supervised manner to extract encoding vectors for the 2D slice images of each plane; and a transformer trained by representation learning to combine the encoding vectors and integrate them into the 3D spatial feature information, wherein the 2D convolutional neural network module may be pre-trained based on a triplet loss function.

The transformer may be trained by randomly masking some of the 2D slice images and predicting the encoding vectors of the masked slice images from the encoding vectors of the unmasked slices.

While the transformer is being trained, the 2D convolutional neural network module may be frozen so that it is not updated.

The pre-trained backbone module and the prediction module are fine-tuned end-to-end for each target task, and the target task may be at least one of brain disease diagnosis, brain age prediction, and brain tumor segmentation.

Depending on the target task, the prediction module may apply either a single-scale technique, which uses the high-level encoding vector of the 2D convolutional neural network module, or a multi-scale technique, which integrates the low-level encoding vectors of the 2D convolutional neural network module.

According to another aspect of the present invention, a self-supervised-learning-based transfer learning method for 3D medical image analysis is provided.

According to an embodiment of the present invention, a self-supervised-learning-based transfer learning method may be provided that includes: pre-training a model by representation learning so that it divides a 3D medical image into three planes, represents each plane as sequential 2D data, and integrates the result into 3D spatial feature information; and fine-tuning the model according to a target task so that it performs prediction for the target task based on the 3D spatial feature information.

By providing the self-supervised-learning-based transfer learning method and apparatus for 3D medical image analysis according to an embodiment of the present invention, high performance can be achieved with only a small number of training samples in 3D medical image analysis, where labeled training datasets are scarce.

In addition, the present invention divides a 3D medical image into three planes, represents each plane as sequential 2D data, and integrates the result into 3D spatial features through representation learning, so it not only learns 3D spatial features effectively but also reduces the number of trainable parameters.

FIG. 1 is a block diagram schematically showing the internal configuration of a self-supervised-learning-based transfer learning framework apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the internal configuration of the backbone module according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining triplet-loss-based training of the 2D convolutional neural network module according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining fine-tuning according to an embodiment of the present invention.
FIG. 5 is a flowchart of a self-supervised-learning-based transfer learning method according to an embodiment of the present invention.
FIG. 6 is a diagram showing the overall flow of the self-supervised-learning-based transfer learning framework according to an embodiment of the present invention.
FIG. 7 is a diagram comparing brain tumor segmentation results of the prior art and the present invention.

Singular expressions used in this specification include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, and additional components or steps may be included. In addition, terms such as "...unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically showing the internal configuration of a self-supervised-learning-based transfer learning framework apparatus according to an embodiment of the present invention, FIG. 2 is a diagram showing the internal configuration of the backbone module according to an embodiment of the present invention, FIG. 3 is a diagram for explaining triplet-loss-based training of the 2D convolutional neural network module according to an embodiment of the present invention, and FIG. 4 is a diagram for explaining fine-tuning according to an embodiment of the present invention.

Referring to FIG. 1, the self-supervised-learning-based transfer learning framework apparatus (100) according to an embodiment of the present invention includes a backbone module (110), a prediction module (120), a memory (130), and a processor (140).

According to an embodiment of the present invention, the self-supervised-learning-based transfer learning framework apparatus (100) may include a backbone module (110) that is pre-trained on self-supervised learning (SSL) proxy tasks and a fully-connected-layer-based prediction module (120) for the final prediction.

The backbone module (110) may divide the 3D medical image into its planes, extract high-level spatial features from the 2D slices of each plane, and model inter-slice dependencies to extract relational features.

Because a 3D medical image consists of three planes (axial, coronal, and sagittal), each offering a different view, the backbone module (110) may use an independent convolutional encoder for each plane, with no parameters shared between them.

Before the spatial features extracted by the convolutional encoder (212) are fed to the transformer (220), they may undergo position encoding, which specifies the slice order, and segment encoding, which identifies the plane.

After passing through the transformer (220), the slice-wise features of all planes are combined into a 3D spatial feature map (information), and the prediction module (120) may use this 3D spatial feature map to perform prediction for the target task.

This is described in more detail below.

According to an embodiment of the present invention, the backbone module (110) is pre-trained by representation learning to divide a 3D medical image into three planes, represent each plane as sequential 2D data, and integrate the result into 3D spatial feature information.

Once pre-training of the backbone module (110) is complete, the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned to suit the target task.

The prediction module (120) is composed of fully connected layers, and the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned end-to-end to suit the target task.

FIG. 2 shows the internal structure of the backbone module (110). As shown in FIG. 2, the backbone module (110) includes a 2D convolutional neural network module (210) and a transformer (220).

The 2D convolutional neural network module (210) divides the 3D medical image into three planes and then extracts slice encoding vectors as the spatial features of the 2D slice images generated for each plane.

The transformer (220) combines the slice encoding vectors of each plane and integrates them into 3D spatial feature information.

For example, consider a 3D medical image whose width, depth, and height are w, d, and h, respectively. The 2D slice images of each plane can then be written as a set whose size is the number of slices in the corresponding plane. (The expressions for the image, the slice sets, the plane index, and the slice count were equation images and are not reproduced in this text.)
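The plane-wise slicing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the volume is a nested list indexed as `volume[x][y][z]`, the mapping of array axes to the sagittal, coronal, and axial planes is an assumed orientation, and `split_into_planes` is a hypothetical helper name.

```python
from typing import Dict, List

Volume = List[List[List[float]]]  # indexed as volume[x][y][z] (assumed layout)
Slice = List[List[float]]


def split_into_planes(volume: Volume) -> Dict[str, List[Slice]]:
    """Return the ordered 2D slice sequence of each of the three planes."""
    w, d, h = len(volume), len(volume[0]), len(volume[0][0])
    # Assumed orientation: x -> sagittal, y -> coronal, z -> axial.
    sagittal = [[[volume[x][y][z] for z in range(h)] for y in range(d)]
                for x in range(w)]
    coronal = [[[volume[x][y][z] for z in range(h)] for x in range(w)]
               for y in range(d)]
    axial = [[[volume[x][y][z] for y in range(d)] for x in range(w)]
             for z in range(h)]
    return {"sagittal": sagittal, "coronal": coronal, "axial": axial}


# Toy volume with w=2, d=3, h=4; each voxel value encodes its own coordinates.
volume = [[[float(x * 100 + y * 10 + z) for z in range(4)]
           for y in range(3)] for x in range(2)]
planes = split_into_planes(volume)
slice_counts = {name: len(slices) for name, slices in planes.items()}
```

Each plane yields a sequence whose length equals the volume extent along the axis perpendicular to that plane, which is what lets the later stages treat a plane as sequential 2D data.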

The 2D slice images are passed to the 2D convolutional neural network module (210), which may extract an embedding vector of fixed dimension for the slice images of each plane. (The symbols for the embedding vectors and their dimension were equation images and are not reproduced in this text.)

The 2D convolutional neural network module (210) may then apply position encoding and segment encoding to the embedding vectors of the slice images of each plane to generate encoding vectors, which may be passed to the transformer (220).
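To make the position and segment encoding step concrete, the sketch below adds a position term and a plane-specific segment term to each slice embedding. The source does not state how the encodings are computed or combined, so everything specific here is an assumption: additive combination, sinusoidal position encoding, a lookup table of per-plane segment vectors, and the names `sinusoidal_position`, `encode_slices`, and `SEGMENT_IDS`.

```python
import math
from typing import List

SEGMENT_IDS = {"axial": 0, "coronal": 1, "sagittal": 2}  # assumed ordering


def sinusoidal_position(index: int, dim: int) -> List[float]:
    """Sinusoidal encoding of a slice index (assumed; could also be learned)."""
    return [
        math.sin(index / 10000 ** (k / dim)) if k % 2 == 0
        else math.cos(index / 10000 ** ((k - 1) / dim))
        for k in range(dim)
    ]


def encode_slices(embeddings: List[List[float]], plane: str,
                  segment_table: List[List[float]]) -> List[List[float]]:
    """Add position (slice order) and segment (plane identity) encodings."""
    segment = segment_table[SEGMENT_IDS[plane]]
    encoded = []
    for index, emb in enumerate(embeddings):
        pos = sinusoidal_position(index, len(emb))
        encoded.append([e + p + s for e, p, s in zip(emb, pos, segment)])
    return encoded


dim = 4
segment_table = [[0.1] * dim, [0.2] * dim, [0.3] * dim]  # toy segment vectors
embeddings = [[0.0] * dim for _ in range(3)]  # three identical slice embeddings
tokens = encode_slices(embeddings, "coronal", segment_table)
```

Even with identical input embeddings, the resulting tokens differ by slice index, so the transformer can tell slices apart by order and by plane.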

The backbone module (110) according to an embodiment of the present invention may be pre-trained. The pre-training process of the backbone module (110) is described first.

The backbone module (110) may be pre-trained through two stages of self-supervised learning.

This is described in more detail with reference to FIG. 3.

The 2D convolutional neural network module (210) may be trained in a self-supervised manner so that it extracts spatially meaningful embedding vectors.

The 2D convolutional neural network module (210) includes a convolutional encoder (212), which may be trained based on a triplet loss function. For each plane, the features of all slices are average-pooled, and the pooled feature is compared with a positive pair (the feature of a transformed version) and a negative pair (the feature of a sample from a different batch) to compute the triplet loss as in Equation 1.

[Equation 1: equation image not reproduced in this text]

The terms of Equation 1 are: the embedding vector of a randomly drawn training sample, averaged over its slices; the embedding vector of its transformed version (the positive example); the embedding vector of another sample from the dataset (the negative example); the number of training samples; and the margin between positive and negative samples. D is a distance function, which may be, for example, the L2 distance. The final loss may be defined as the average of the losses over the three planes.
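Since Equation 1 itself did not survive, the sketch below uses the standard hinge form of the triplet loss, max(0, D(anchor, positive) - D(anchor, negative) + margin), averaged over samples, with the L2 distance the text mentions. Whether the patent's equation matches this form term for term is an assumption, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pool(slice_features: List[List[float]]) -> List[float]:
    """Average the slice-wise features of one plane into a single vector."""
    n = len(slice_features)
    return [sum(column) / n for column in zip(*slice_features)]


def triplet_loss(anchors: List[List[float]], positives: List[List[float]],
                 negatives: List[List[float]], margin: float) -> float:
    """Standard hinge-style triplet loss, averaged over the samples (assumed form)."""
    losses = [
        max(0.0, l2_distance(a, p) - l2_distance(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Sample 1: positive is close, negative is far, so it contributes no loss.
anchor_1 = mean_pool([[0.0, 0.0], [2.0, 2.0]])  # pooled to [1.0, 1.0]
# Sample 2: positive is farther than the negative, so it is penalized.
anchor_2 = [0.0, 0.0]
loss = triplet_loss(
    anchors=[anchor_1, anchor_2],
    positives=[[1.0, 1.5], [3.0, 4.0]],
    negatives=[[4.0, 5.0], [0.0, 1.0]],
    margin=1.0,
)
```

Following the text, this per-plane value would be computed once for each of the three planes and the three results averaged.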

After the convolutional encoder (212) extracts the 2D slice spatial features (embedding vectors) of each plane, position encoding, which specifies the slice order, and segment encoding, which identifies the plane, are applied, and the resulting encoding vectors may be passed to the transformer (220). Each slice-level encoding vector is treated as a token, and the set of encoding vectors, which is the image-level feature set, is treated as a sentence.

After the 2D convolutional neural network module (210) has been trained based on the triplet loss function, the trained module is frozen so that it is no longer updated, and the transformer (220) may then be trained on a masked encoding vector prediction task, which predicts randomly masked encoding vectors.

In this way, the transformer (220) may be trained after the 2D convolutional neural network module (210) has been pre-trained.

During pre-training of the transformer (220) according to an embodiment of the present invention, some slice images of each plane are randomly masked, and the transformer (220) may be trained to predict the encoding vectors of the masked slice images from the encoding vectors of the remaining unmasked slice images.

At this point the 2D convolutional neural network module (210) pre-trained in the previous stage is fixed and only the transformer (220) is trained to predict the masked encoding vectors, so the transformer (220) learns to predict them by capturing inter-slice dependencies.
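The random masking step can be illustrated as follows. The masking ratio and the choice to overwrite masked tokens with zeros are assumptions made for this sketch (the source specifies neither), and `mask_tokens` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def mask_tokens(tokens: List[List[float]], mask_ratio: float,
                rng: random.Random,
                mask_value: float = 0.0) -> Tuple[List[List[float]], List[int]]:
    """Randomly mask slice tokens; return the corrupted sequence and masked indices."""
    n = len(tokens)
    num_masked = max(1, round(n * mask_ratio))
    masked_indices = sorted(rng.sample(range(n), num_masked))
    masked_set = set(masked_indices)
    corrupted = [
        [mask_value] * len(tok) if i in masked_set else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return corrupted, masked_indices


# Ten slice-level encoding vectors of dimension 2 (toy values).
tokens = [[float(i), float(i) + 0.5] for i in range(10)]
corrupted, masked_indices = mask_tokens(tokens, mask_ratio=0.3,
                                        rng=random.Random(0))
```

The transformer would receive `corrupted` as input and be asked to recover the original vectors at `masked_indices`, while the frozen CNN supplies the targets.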

For example, consider the sequence of slice-level encoding vectors. (The expression was an equation image and is not reproduced in this text.)

The transformer (220) may model the relationships (dependencies) between slices through an attention mechanism.

This can be expressed as Equation 2.

[Equation 2: equation image not reproduced in this text]

In Equation 2, the bracketed set denotes the learnable weight matrices, and two further terms denote two position-wise (point-wise) feed-forward convolutions.
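Equation 2 is not recoverable from this text, so the sketch below shows only the generic building block it most likely relies on: single-head scaled dot-product self-attention over the slice tokens. The query, key, and value projection matrices (`w_q`, `w_k`, `w_v`) are assumed to correspond to the "learnable weight matrices" in the text; multi-head attention, the feed-forward convolutions, residual connections, and normalization are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix, w_q: Matrix, w_k: Matrix,
                   w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over slice tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
              for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projections; learned in practice
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = self_attention(tokens, identity, identity, identity)
```

Row i of `weights` says how much slice i draws on every other slice, which is how both adjacent and distant slices can contribute to a slice's representation.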

The loss function for masked encoding vector prediction can be expressed as Equation 3.

[Equation 3: equation image not reproduced in this text]

Equation 3 is defined over the index set of the masked slices.
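A loss restricted to the masked index set might look like the sketch below. The squared L2 error and the averaging over masked slices are assumptions, since Equation 3 is not available; only the restriction to masked indices comes from the text.

```python
from typing import List


def masked_prediction_loss(predicted: List[List[float]],
                           target: List[List[float]],
                           masked_indices: List[int]) -> float:
    """Mean squared L2 error over masked slices only (assumed form)."""
    total = 0.0
    for i in masked_indices:
        total += sum((p - t) ** 2 for p, t in zip(predicted[i], target[i]))
    return total / len(masked_indices)


target = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]     # encoding vectors from the frozen CNN
predicted = [[9.0, 9.0], [2.0, 3.0], [3.0, 5.0]]  # transformer outputs (toy values)
loss = masked_prediction_loss(predicted, target, masked_indices=[1, 2])
```

Slice 0 is badly predicted in this toy example but contributes nothing, because it was not masked.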

After pre-training of the backbone module (110) is complete, the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned.

As described above, the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned end-to-end according to the target task.

This is described in more detail with reference to FIG. 4.

The target task may be any one of brain disease diagnosis, brain age prediction, and brain tumor segmentation.

That is, the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned differently depending on which of brain disease diagnosis, brain age prediction, and brain tumor segmentation is the target task.

Depending on the target task, either a multi-scale approach, which captures both low-level and high-level features, or a single-scale approach, which captures high-level features for classification and regression tasks, may be taken.
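The choice between the two approaches can be pictured as a small dispatch on the task type. The text ties the single-scale approach to classification and regression; pairing the multi-scale approach with segmentation, and integrating levels by simple concatenation, are assumptions of this sketch, as is the name `select_features`.

```python
from typing import List


def select_features(level_features: List[List[float]], task: str) -> List[float]:
    """Pick encoder features for the prediction module.

    level_features is ordered from the lowest to the highest encoder level.
    """
    if task in ("classification", "regression"):
        # Single-scale: use only the highest-level features.
        return list(level_features[-1])
    if task == "segmentation":
        # Multi-scale: integrate all levels (concatenation is an assumption).
        return [value for level in level_features for value in level]
    raise ValueError(f"unknown task: {task}")


levels = [[1.0], [2.0, 3.0]]  # toy low-level and high-level features
single_scale = select_features(levels, "regression")
multi_scale = select_features(levels, "segmentation")
```

Under this reading, brain disease diagnosis (classification) and brain age prediction (regression) would use the single-scale path, and brain tumor segmentation the multi-scale path.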

The overall flow of pre-training and fine-tuning is as shown in FIG. 4.

The memory (130) stores the instructions for the self-supervised-learning-based transfer learning method according to an embodiment of the present invention.

The processor (140) controls the internal components of the self-supervised-learning-based transfer learning framework apparatus (100) according to an embodiment of the present invention (for example, the backbone module (110), the prediction module (120), and the memory (130)).

In addition, the processor (140) may control the backbone module (110) so that it is pre-trained, and may control the pre-trained backbone module (110) and the prediction module (120) so that they are fine-tuned according to the target task.

FIG. 5 is a flowchart of a self-supervised-learning-based transfer learning method according to an embodiment of the present invention, FIG. 6 is a diagram showing the overall flow of the self-supervised-learning-based transfer learning framework according to an embodiment of the present invention, and FIG. 7 is a diagram comparing brain tumor segmentation results of the prior art and the present invention.

In step 510, the self-supervised-learning-based transfer learning framework apparatus (100) pre-trains the backbone network.

As shown in FIG. 6, the backbone network is the backbone module (110), which includes the 2D convolutional neural network module (210) and the transformer (220). The 2D convolutional neural network module (210) may be pre-trained to extract high-level spatial features from the 2D slices of each plane of the 3D medical image, and the transformer (220) to model inter-slice dependencies and extract relational features.

Because a 3D medical image consists of three planes (axial, coronal, and sagittal), each offering a different view, an independent 2D convolutional neural network module (210) is used for each plane, with no parameters shared between them.

Before being passed to the transformer (220), the spatial features may undergo position encoding, which specifies the slice order, and segment encoding, which identifies the plane.

The transformer (220) may be trained to capture dependencies between adjacent and distant slices through a self-attention mechanism. That is, the transformer (220) may be trained to predict the encoding vectors of the masked slices from the encoding vectors of the unmasked slices.

As described above, the 2D convolutional neural network module (210) may be frozen during pre-training of the transformer (220) so that it is not updated.

In step 515, the self-supervised-learning-based transfer learning framework apparatus (100) fine-tunes the pre-trained backbone module (110) and the prediction module (120) according to the target task.

For example, the target task may be brain disease diagnosis, brain age prediction, or brain tumor segmentation.

Depending on the target task, the pre-trained backbone module (110) and the prediction module (120) may use either a multi-scale approach, which uses both low-level and high-level features, or a single-scale approach, which uses high-level features.

Accordingly, the pre-trained backbone module (110) and the prediction module (120) may be fine-tuned end-to-end according to the target task.
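The three training phases described in this document (triplet-loss pre-training of the CNN, masked-vector pre-training of the transformer with the CNN frozen, then end-to-end fine-tuning) can be summarized as a schedule of which modules receive updates. The stage and module names below are illustrative labels, not identifiers from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

ALL_MODULES: FrozenSet[str] = frozenset({"cnn", "transformer", "prediction"})


@dataclass(frozen=True)
class Stage:
    name: str
    objective: str
    trainable: FrozenSet[str]


STAGES: Tuple[Stage, ...] = (
    Stage("pretrain_cnn", "triplet loss", frozenset({"cnn"})),
    Stage("pretrain_transformer", "masked encoding vector prediction",
          frozenset({"transformer"})),
    Stage("fine_tune", "target task loss", ALL_MODULES),
)


def not_updated(stage: Stage) -> FrozenSet[str]:
    """Modules that receive no parameter updates during the given stage."""
    return ALL_MODULES - stage.trainable


schedule = {stage.name: sorted(not_updated(stage)) for stage in STAGES}
```

In a deep learning framework, this would translate into disabling gradient updates for the listed modules at each stage, with nothing held fixed during end-to-end fine-tuning.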

Once fine-tuning is complete, the backbone module (110) may divide a 3D medical image into its planes, extract high-level spatial features from the 2D slices of each plane, and combine them to form 3D spatial feature information.

The prediction module (120) may then use the 3D spatial feature information to perform prediction for the target task.

FIG. 7 compares brain tumor segmentation results on 3D medical images for the prior art and for an embodiment of the present invention. It can be seen that the segmentation result of the present invention is similar to that of 3D-SSL and that, compared with the prior art, it distinguishes tumor types best.

The apparatus and method according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in a descriptive rather than a limiting sense. The scope of the present invention is set out in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: Self-supervised-learning-based transfer learning framework apparatus
110: Backbone module
120: Prediction module
130: Memory
140: Processor

Claims

1. A self-supervised-learning-based transfer learning framework apparatus comprising:
a backbone module that is pre-trained by representation learning to divide a 3D medical image into three planes, represent each plane as sequential 2D data, and integrate the result into 3D spatial feature information; and
a prediction module, composed of fully connected layers, that performs prediction for a target task based on the 3D spatial feature information,
wherein the backbone module comprises a 2D convolutional neural network module trained in a self-supervised manner to extract encoding vectors for the 2D slice images of each plane, and
a transformer pre-trained by representation learning to combine the encoding vectors and integrate them into the 3D spatial feature information, the 2D convolutional neural network module being pre-trained based on a triplet loss function,
wherein the pre-trained backbone module and the prediction module are fine-tuned end-to-end to suit each target task, and
wherein each target task is at least one of brain disease diagnosis, brain age prediction, and brain tumor segmentation.

2. (Deleted)

3. The apparatus of claim 1,
wherein the transformer is trained by randomly masking some of the 2D slice images and predicting the encoding vectors of the masked slice images from the encoding vectors of the unmasked slices.

4. The apparatus of claim 3,
wherein the 2D convolutional neural network module is frozen so that it is not updated while the transformer is being trained.

5. (Deleted)

6. The apparatus of claim 1,
wherein the prediction module applies, depending on the target task,
either a single-scale technique that uses the high-level encoding vector of the 2D convolutional neural network module or a multi-scale technique that integrates the low-level encoding vectors of the 2D convolutional neural network module.

7. (Deleted)