KR102561214B1

KR102561214B1 - A method and apparatus for image segmentation using global attention

Info

Publication number: KR102561214B1
Application number: KR1020210045618A
Authority: KR
Inventors: 이범식; 챠이트라 다야난다
Original assignee: 조선대학교산학협력단
Priority date: 2021-04-08
Filing date: 2021-04-08
Publication date: 2023-07-27
Also published as: KR20220139541A

Abstract

The present invention relates to a method and apparatus for image segmentation using global attention. An image segmentation method using global attention according to an embodiment of the present invention may include: (a) obtaining input information; (b) inputting the input information to a first fire module including a squeeze layer composed of a first convolution layer and an expand layer composed of a second convolution layer and a third convolution layer; (c) inputting output information of the first fire module and multi-scale input information to a first global attention module including global average pooling; and (d) performing encoding by max pooling output information of the first global attention module.

Description

A method and apparatus for image segmentation using global attention

The present invention relates to an image segmentation method and apparatus, and more particularly, to an image segmentation method and apparatus using global attention.

Magnetic resonance imaging (MRI) offers high contrast and relatively high resolution. The technique is therefore widely used to examine the human brain in clinical applications and scientific research.

In this context, automatic segmentation of brain tissue into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is very important for the aforementioned tasks.

Accurate tissue segmentation is difficult because of the complex structure of the brain and the tissue heterogeneity caused by noise, bias fields, and partial volume effects in MRI.

Strategies that use deep learning networks to address these problems are being applied to segmentation tasks because of their advantages, but research on them remains insufficient.

[Patent Document 1] Korean Registered Patent No. 10-2089014

The present invention has been made to solve the above problems, and an object of the present invention is to provide a method and apparatus for image segmentation using global attention.

Another object of the present invention is to provide an image segmentation method and apparatus that use patch-wise input integrated with multi-scale global attention and fire modules.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood from the description below.

In order to achieve the above objects, an image segmentation method using global attention according to an embodiment of the present invention may include: (a) obtaining input information; (b) inputting the input information to a first fire module including a squeeze layer composed of a first convolution layer and an expand layer composed of a second convolution layer and a third convolution layer; (c) inputting output information of the first fire module and multi-scale input information to a first global attention module including global average pooling; and (d) performing encoding by max pooling output information of the first global attention module.

In an embodiment, step (a) may include: obtaining an input image; and generating the input information in the form of patches by dividing the input image.

In an embodiment, the first convolution layer and the second convolution layer may have a first kernel size, and the third convolution layer may have a second kernel size.

In an embodiment, step (b) may include: generating an output value of the first convolution layer of the squeeze layer; inputting the generated output value to each of the second convolution layer and the third convolution layer, which are arranged in parallel in the expand layer; and generating the output information of the first fire module by concatenating the output values of the second convolution layer and the third convolution layer.

In an embodiment, the image segmentation method using global attention may further include, before step (c): generating input information for the second convolution layer by performing max pooling on the input information for the first convolution layer; and generating multi-scale input information for the second layer by inputting the input information for the second convolution layer to each of a fourth convolution layer and a fifth convolution layer arranged in parallel.

In an embodiment, step (c) may include: performing the global average pooling on the output information of the first fire module; inputting a result value of the global average pooling to a sixth convolution layer; generating an attention coefficient by performing upsampling based on a result value generated by inputting the result value of the sixth convolution layer and the multi-scale input information to a seventh convolution layer; and generating the output information of the first global attention module using the attention coefficient and the output information of the first fire module.
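The first operation of step (c), global average pooling, can be illustrated in isolation. The following is a minimal sketch using only the Python standard library; it covers the pooling step alone and does not model the sixth and seventh convolution layers or the upsampling. The function name `global_average_pool` and the sample values are hypothetical and do not come from the specification.

```python
def global_average_pool(feature_map):
    """Reduce each channel of a feature map to its mean value.

    feature_map is a list of channels, each a 2-D list (H x W).
    The layout and the name of this helper are illustrative assumptions.
    """
    pooled = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        pooled.append(total / count)
    return pooled


# Hypothetical two-channel, 2 x 2 feature map.
fire_output = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
channel_descriptor = global_average_pool(fire_output)
```

The result is one value per channel, which is the kind of global descriptor that would then be passed on to the sixth convolution layer.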

In an embodiment, the image segmentation method using global attention may further include, after step (d), inputting output information of the first fire module for the first convolution layer and output information of a second fire module for the second convolution layer to a second global attention module for the first convolution layer, wherein the second fire module may include a squeeze layer composed of a first transposed convolution layer and an expand layer composed of a second transposed convolution layer and a third transposed convolution layer.

In an embodiment, the image segmentation method using global attention may further include, after step (d): performing decoding by inputting, to the second fire module for the first convolution layer, a result value generated by upsampling the output information of the second global attention module and the output information of the second fire module for the second layer; and calculating a final output value by inputting a decoded value obtained by the decoding to a classification layer (classify layer).

In an embodiment, the inputting to the second global attention module may include: generating an output value of the first transposed convolution layer of the squeeze layer; inputting the generated output value to each of the second transposed convolution layer and the third transposed convolution layer, which are arranged in parallel in the expand layer; and generating the output information of the second fire module by concatenating the output values of the second transposed convolution layer and the third transposed convolution layer.

In an embodiment, an image segmentation apparatus using global attention may include: an acquisition unit that obtains input information; and a control unit that inputs the input information to a first fire module including a squeeze layer composed of a first convolution layer and an expand layer composed of a second convolution layer and a third convolution layer, inputs output information of the first fire module and multi-scale input information to a first global attention module including global average pooling, and performs encoding by max pooling output information of the first global attention module.

In an embodiment, the acquisition unit may obtain an input image, and the control unit may generate the input information in the form of patches by dividing the input image.

In an embodiment, the control unit may generate an output value of the first convolution layer of the squeeze layer, input the generated output value to each of the second convolution layer and the third convolution layer arranged in parallel in the expand layer, and generate the output information of the first fire module by concatenating the output values of the second convolution layer and the third convolution layer.

In an embodiment, the control unit may generate input information for the second convolution layer by performing max pooling on the input information for the first convolution layer, and may generate multi-scale input information for the second convolution layer by inputting the input information for the second convolution layer to each of a fourth convolution layer and a fifth convolution layer arranged in parallel.

In an embodiment, the control unit may perform the global average pooling on the output information of the first fire module, input a result value of the global average pooling to a sixth convolution layer, and generate an attention coefficient by performing upsampling based on a result value generated by inputting the result value of the sixth convolution layer and the multi-scale input information to a seventh convolution layer.

In an embodiment, the control unit may input output information of the first fire module for the first convolution layer and output information of a second fire module for the second convolution layer to a second global attention module for the first convolution layer, and the second fire module may include a squeeze layer composed of a first transposed convolution layer and an expand layer composed of a second transposed convolution layer and a third transposed convolution layer.

In an embodiment, the control unit may perform decoding by inputting, to the second fire module for the first convolution layer, a result value generated by upsampling the output information of the second global attention module and the output information of the second fire module for the second convolution layer, and may calculate a final output value by inputting a decoded value obtained by the decoding to a classification layer (classify layer).

In an embodiment, the control unit may generate an output value of the first transposed convolution layer of the squeeze layer, and input the generated output value to each of the second transposed convolution layer and the third transposed convolution layer arranged in parallel in the expand layer.

Specific details for achieving the above objects will become clear with reference to the embodiments described in detail below in conjunction with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be configured in a variety of different forms. The embodiments are provided so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to a person having ordinary skill in the art to which the present invention pertains (hereinafter, "a person skilled in the art").

According to an embodiment of the present invention, a network operating on patch-wise input integrated with multi-scale global attention and fire modules can perform efficient brain MRI segmentation.

The effects of the present invention are not limited to those described above, and potential effects expected from the technical features of the present invention will be clearly understood from the description below.

FIG. 1 is a diagram illustrating an image segmentation process using a global attention-based deep learning network according to an embodiment of the present invention.
FIG. 2A is a diagram illustrating an encoder fire module according to a conventional embodiment.
FIG. 2B is a diagram illustrating a decoder fire module according to a conventional embodiment.
FIG. 2C is a diagram illustrating an encoder fire module according to an embodiment of the present invention.
FIG. 2D is a diagram illustrating a decoder fire module according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an image segmentation procedure using a global attention-based deep learning network according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a global attention module according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating classification results based on a first data set according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating classification results based on a second data set according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a comparison of classification results based on the first data set according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a comparison of classification results based on the second data set according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating a comparison of the number of learned parameters and the computation time according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating an image segmentation method using global attention according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a functional configuration of an image segmentation apparatus using global attention according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

The various features of the invention disclosed in the claims may be better understood in view of the drawings and the detailed description. The apparatuses, methods, manufacturing processes, and various embodiments disclosed in the specification are provided for illustrative purposes. The disclosed structural and functional features are intended to enable a person skilled in the art to carry out the various embodiments, and are not intended to limit the scope of the invention. The disclosed terms and sentences are intended to explain the various features of the disclosed invention in an easily understandable way, and are not intended to limit the scope of the invention.

In describing the present invention, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the subject matter of the present invention.

Hereinafter, an image segmentation method and apparatus using global attention according to an embodiment of the present invention will be described.

FIG. 1 is a diagram illustrating an image segmentation process using a global attention-based deep learning network according to an embodiment of the present invention.

Referring to FIG. 1, in step S110, input images (e.g., MRI scan data) may be collected together with the corresponding ground truth. For example, each scan may have dimensions of height × width × slices (H × W × S). The H × W plane of each slice may be zero-padded and resized to 256 × 256.

In step S120, at least one (e.g., 48) slices may be extracted, starting from a specific (e.g., the 10th) slice, at an interval of a fixed number (e.g., 3) of slices.
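The slice selection of step S120 can be sketched as simple index arithmetic. The following is a minimal sketch, assuming that "the 10th slice" is counted from 1 and that the scan contains enough slices; the function name `select_slice_indices` and the scan depth of 160 are hypothetical values chosen for illustration.

```python
def select_slice_indices(num_slices, first=10, step=3, count=48):
    """Return 0-based indices of `count` slices taken every `step` slices.

    `first` is the 1-based position of the first extracted slice
    (an assumption about how "the 10th slice" is counted).
    """
    if count <= 0:
        return []
    start = first - 1  # convert the 1-based position to a 0-based index
    indices = list(range(start, start + step * count, step))
    if indices[-1] >= num_slices:
        raise ValueError("scan has too few slices for this selection")
    return indices


# Hypothetical scan with 160 slices along the S axis.
indices = select_slice_indices(num_slices=160)
```

With these example values the selection spans 0-based indices 9 through 150, so a scan needs at least 151 slices for the defaults to apply.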

In an embodiment, each slice is divided into four uniform patches, and these patches may be provided as input to the global attention-based deep learning network model according to the present invention for training.

In step S130, after the global attention-based deep learning network model has been trained, test input may be supplied to the model and a predicted segmentation output may be obtained.

The architecture according to the present invention is discussed in detail in terms of (i) uniform patch-wise input, (ii) the encoder module, (iii) the decoder module, and (iv) the global attention module (GAM).

Regarding the uniform patch-wise input, local information can be considerably more important than global information for identifying WM, GM, and CSF in brain MRI. Patch-based input may be used for model training in order to capture local details better and obtain accurate tissue segmentation.

For example, the brain MRI scan of each subject has dimensions H × W × S. As previous studies have found, some of the slices at the beginning and end of a brain MRI scan do not contain much useful information, and consecutive slices may share almost the same information.

Therefore, to exclude such uninformative slices and to reduce repeated training on consecutive slices, 48 slices were selected at an interval of 3 slices, which ensures that slices with both more and less information are present for model training. Each extracted slice is resized to 256 × 256.

Splitting a slice into individual patches can improve localization, because the trained global attention-based deep learning network can focus more on the local details of a patch. Accordingly, each slice may be divided into four uniform patch inputs. These patch inputs are fed to the global attention-based deep learning network model for training, and predicted segmentation results may be obtained for test data.
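The division of a slice into four uniform patches can be sketched as follows. This is a minimal sketch under the assumption that the four patches form a non-overlapping 2 × 2 grid of 128 × 128 patches; the specification states only that each 256 × 256 slice is divided into four uniform patches. The function name `split_into_patches` and the synthetic slice are hypothetical.

```python
def split_into_patches(slice_2d, grid=2):
    """Split a square 2-D slice (list of rows) into grid * grid equal patches.

    Patches are returned in row-major order. The non-overlapping grid
    layout is an assumption made for illustration.
    """
    size = len(slice_2d)
    if size % grid != 0 or any(len(row) != size for row in slice_2d):
        raise ValueError("slice must be square and divisible by the grid size")
    p = size // grid  # side length of one patch
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = slice_2d[r * p:(r + 1) * p]
            patches.append([row[c * p:(c + 1) * p] for row in rows])
    return patches


# Synthetic 256 x 256 slice whose value encodes its own position.
slice_2d = [[r * 256 + c for c in range(256)] for r in range(256)]
patches = split_into_patches(slice_2d)
```

Each of the resulting patches would be supplied to the network as a separate training input.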

FIG. 2A is a diagram illustrating an encoder fire module according to a conventional embodiment. FIG. 2B is a diagram illustrating a decoder fire module according to a conventional embodiment. FIG. 2C is a diagram illustrating an encoder fire module 210 according to an embodiment of the present invention. FIG. 2D is a diagram illustrating a decoder fire module 220 according to an embodiment of the present invention.

FIGS. 2A and 2B show the convolution layers on the encoder and decoder sides of a conventional U-Net, in which each convolution block contains F_i filters, and a feature map of height × width × channels may be used as input.

FIGS. 2C and 2D show the encoder fire module 210 used on the encoder side and the decoder fire module 220 used on the decoder side according to the present invention.

A fire module may include (i) a squeeze layer and (ii) an expand layer.

As shown in FIG. 2C, the squeeze layer 211 may be composed of a first convolution layer 212 with a kernel size of 1 × 1 and F_i/4 output channels, where F_i denotes the number of convolution filters. The output of the squeeze layer 211 may be supplied to the expand layer 213.

The expand layer 213 is composed of two parallel convolutions with kernel sizes of 1 × 1 and 3 × 3 (the second convolution layer 214 and the third convolution layer 215), and each convolution may have F_i/2 output channels. The outputs of these parallel convolutions may be concatenated (216) to form the output of the encoder fire module 210.
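The channel arithmetic of the encoder fire module can be checked with a short parameter count. The following is a minimal sketch that assumes every convolution has a bias term; the comparison against a single 3 × 3 convolution with F_i filters, as well as the values c_in = 64 and F_i = 64, are illustrative assumptions and not figures from the specification.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution (bias assumed present)."""
    return k * k * c_in * c_out + c_out


def fire_module_params(c_in, f):
    """Parameter count of a fire module that outputs f channels in total."""
    if f % 4 != 0:
        raise ValueError("f must be divisible by 4")
    squeeze = conv_params(c_in, f // 4, 1)        # 1x1 squeeze, F_i/4 channels
    expand_1x1 = conv_params(f // 4, f // 2, 1)   # 1x1 expand branch, F_i/2 channels
    expand_3x3 = conv_params(f // 4, f // 2, 3)   # 3x3 expand branch, F_i/2 channels
    # Concatenating the two expand branches adds no parameters
    # and yields F_i/2 + F_i/2 = F_i output channels.
    return squeeze + expand_1x1 + expand_3x3


# Hypothetical layer sizes, chosen only for illustration.
c_in, f = 64, 64
fire_count = fire_module_params(c_in, f)
plain_count = conv_params(c_in, f, 3)  # one ordinary 3x3 convolution
```

Under these assumptions the fire module needs far fewer parameters than the single 3 × 3 convolution while producing the same number of output channels, because the 3 × 3 kernel only operates on the squeezed F_i/4 channels.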

Referring to FIG. 2D, the decoder path may use the transposed decoder fire module 220 to reduce the number of model parameters. The main component of the decoder path may be upsampling.

Each upsampling stage may be composed of the transposed decoder fire module 220, as shown in FIG. 2D. In an embodiment, the decoder fire module 220 may be referred to as a transposed fire module or by another term having an equivalent technical meaning.

The decoder fire module 220 may include a first transposed convolution layer 222 with a kernel size of 1 × 1 and F_i/4 output channels, contained in a squeeze layer 221. The output of the first transposed convolution layer 222 may be supplied to two parallel transposed convolution layers contained in an expand layer 223, with kernel sizes of 3 × 3 and 1 × 1, respectively, and F_i/2 output channels each (the second transposed convolution layer 224 and the third transposed convolution layer 225). As in the downsampling unit, their outputs are concatenated to form the output of the decoder fire module 220.

That is, the outputs of these parallel transposed convolutions may be concatenated (226) to form the output of the decoder fire module 220.

FIG. 3 is a diagram illustrating an image segmentation procedure using a global attention-based deep learning network 300 according to an embodiment of the present invention.

Referring to FIG. 3, the global attention-based deep learning network 300 may be composed of an encoder path and a decoder path. The encoder path is composed of a series of first fire modules 310; the output of each first fire module 310 is applied to a first global attention module (GAM) 311, which may be followed by a max pooling layer 312.

In an embodiment, the first fire module 310 may include the encoder fire module 210 of FIG. 2C.

Together with the max pooling 312, the first fire module 310 may constitute a downsampling module. In addition to the fire module input, the first global attention module 311 may receive, from the multi-scale layer 301, an input obtained by a multi-scale input feature fusion strategy.

In an embodiment, in the multi-scale layer 301, the input information may be downsampled using max pooling 303 with a stride of 2 × 2 and, separately, convolution layers 301 with 1 × 1 and 3 × 3 filters.

The convolution outputs of the multi-scale layer 301 may form a multi-scale feature map that is provided as input to the first global attention module 311. The output of the first global attention module 311 may be provided to the max pooling layer 312 in order to reduce the dimensions and focus on the details of the feature map.

In an embodiment, the pooling indices are stored while each max pooling 312 is performed and may be used to upsample the feature map in the decoder. The decoder may be integrated with attention gates that can emphasize features.
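Storing pooling indices and reusing them for upsampling can be sketched on a single 2-D feature map. This is a minimal sketch assuming 2 × 2 max pooling with stride 2 and an unpooling step that writes each pooled value back to its recorded position and fills the rest with zeros; the specification does not spell out the unpooling rule, so the function names and this behaviour are assumptions.

```python
def max_pool_with_indices(x):
    """2x2 max pooling with stride 2 that also records where each maximum was."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    pooled, indices = [], []
    for r in range(0, h, 2):
        pooled_row, index_row = [], []
        for c in range(0, w, 2):
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, h, w):
    """Place each pooled value at its stored position; other cells stay zero."""
    out = [[0] * w for _ in range(h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Hypothetical 4 x 4 feature map.
x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 9, 8],
     [7, 6, 3, 2]]
pooled, indices = max_pool_with_indices(x)
restored = max_unpool(pooled, indices, 4, 4)
```

The restored map has the original resolution, with each maximum back at the location it was taken from, which is what lets the decoder recover spatial detail lost in the encoder.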

l^th(하위 레벨) 및 l+1^th(상위 레벨) 인코딩 레이어에서 추출된 특징 맵은 각각 글로벌 어텐션 모듈에 대한 입력 신호 및 게이팅 신호로 사용됩니다.The feature maps extracted from the l ^th (lower level) and l+1 ^th (higher level) encoding layers are used as input and gating signals for the global attention module, respectively.

따라서 컨텍스트 정보를 포함하는 인코딩 레이어에서 얻은 특징 맵은 관련없는 응답을 제거하기 위해 제1 글로벌 어텐션 모듈(311)에 의해 계산되고 그 출력은 해당 업샘플링 레이어의 특징 맵과 연결될 수 있다. Therefore, the feature map obtained from the encoding layer including the context information is calculated by the first global attention module 311 to remove irrelevant responses, and the output thereof may be connected to the feature map of the corresponding upsampling layer.

따라서, 멀티 스케일 정보를 캡처하기 위해 인코더와 디코더 사이에 도입된 어텐션 기반 스킵 연결(skip connection)을 생성할 수 있다. 이러한 스킵 연결은 고해상도 특징 정보를 모두 사용하고 업샘플링 작업을 수행하는 동안 가장 관련성이 높은 정보에 초점을 맞출 수 있다. Accordingly, an attention-based skip connection introduced between an encoder and a decoder can be created to capture multi-scale information. These skip connections can use all of the high-resolution feature information and focus on the most relevant information while performing the upsampling task.

The output information of the first fire module 310 and the output information of the second fire module 320 may be input to the second global attention module 321.

Decoding may be performed by upsampling 322 the output information of the second global attention module 321 and the output information of the second fire module 320, and inputting the result to the second fire module 320.

The decoded value produced by the decoding may be input to a classification layer 330 to calculate the final output value.

Using a global attention module in each encoder and decoder layer allows class localization details to be determined, with the global context serving as guidance for the low-level features.

The classification layer 330 may consist of a 1×1 convolution layer with a softmax activation function that outputs the reconstructed image. For example, the softmax layer may predict four output classes: gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), and background. The global attention-based deep learning network according to the present invention may take an input image and generate a learned representation.

The input image may be classified into one of the four output classes based on this feature representation. In one embodiment, cross-entropy loss may be used to measure the loss of the global attention-based deep learning network. The softmax layer may learn the decoder representation (l) and interpret it as an output class. In one embodiment, a probability score y' may be assigned to each output class.

In one embodiment, when the number of output classes is defined as c, this may be expressed as in Equation 1 below.

In one embodiment, the cross-entropy loss function may be used to calculate the network cost, as in Equation 2 below.

Here, y and y' denote the ground-truth and predicted distribution scores, respectively, for each class i.
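As a rough illustration of how a softmax layer assigns probability scores y' to c = 4 classes and how cross-entropy compares them with the ground truth y, a minimal sketch follows. It assumes the textbook forms of softmax and categorical cross-entropy, which may differ in detail from Equations 1 and 2; the logits and the class order are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy: -sum_i y_i * log(y'_i)."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


# Hypothetical class order and per-pixel logits.
CLASSES = ["background", "CSF", "GM", "WM"]
logits = [0.5, 1.0, 3.0, 0.2]
y_pred = softmax(logits)          # predicted scores y'
y_true = [0, 0, 1, 0]             # one-hot ground truth y (here: GM)
loss = cross_entropy(y_true, y_pred)
predicted_class = CLASSES[y_pred.index(max(y_pred))]
```

With a one-hot ground truth, the loss reduces to the negative log of the probability assigned to the correct class, so a confident correct prediction yields a loss close to zero.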

FIG. 4 is a diagram illustrating a global attention module 400 according to an embodiment of the present invention. In one embodiment, the global attention module 400 may include the first global attention module 311 and the second global attention module 321 of FIG. 3.

Referring to FIG. 4, x_l for the global attention module 400 may include the output feature map of the l-th encoding layer (low-level features). In addition, x_(l+1) is collected at a coarser scale, serves as the gating signal vector, and may be applied to every pixel to select the focus region.

α_l may include attention coefficients that suppress irrelevant feature responses and preserve the activations relevant to the target task. The output of the global attention module 400 may be expressed as the element-wise addition of the feature map from the l-th encoding layer and the attention coefficients, as defined in Equation 3 below.

In one embodiment, multi-dimensional attention coefficients may be learned when there are multiple semantic classes.

Global average pooling 401 may provide global context information as guidance for the low-level features, so that local features are integrated with the global context.

The result of global average pooling 401 may be provided as an input to the 1×1 first convolution 403.

The global information generated from the higher-level features may be provided as an input to the 1×1 sixth convolution layer 403, which includes ReLU activation.

To extract weighted low-level features, this may be multiplied by the convolved low-level features from the 1×1 seventh convolution layer 405.

Upsampling 409 may be performed on the result of the multiplication 407 to generate the attention coefficients.

In one embodiment, the attention coefficient may be expressed as in Equation 4 below.

Here, W_x and W_g are the weights associated with the input and gating signals, respectively, b is the bias term, and GAP denotes global average pooling.

The attention coefficient resulting from upsampling is added 411 to the low-level features, so that pixel localization specific to the class categories of the high-level feature map can be extracted.
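The data flow described above (global average pooling of the gating signal, a 1×1 convolution with ReLU, multiplication with the low-level features, and a final addition) can be sketched in a heavily simplified form. This is one possible reading, not the patented module: it assumes the low-level 1×1 convolution is the identity, that the gate has one value per low-level channel, and it omits upsampling because the gate is spatially constant after pooling. All names and numbers are hypothetical.

```python
def global_average_pool(fmap):
    """Collapse each channel (a 2D list) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def pointwise(vector, weights, bias):
    """1x1 convolution applied to a 1x1xC tensor, i.e. a dense layer over channels."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def global_attention(x_low, x_high, w_g, b_g):
    """Gate low-level features with pooled high-level context, then add them back."""
    gate = relu(pointwise(global_average_pool(x_high), w_g, b_g))
    out = []
    for channel, g in zip(x_low, gate):
        # attention term (v * g) plus the original low-level feature (v)
        out.append([[v * g + v for v in row] for row in channel])
    return out


# Toy inputs: two channels of 2x2 features each (layout: [channel][row][col]).
x_low = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
x_high = [[[2, 2], [2, 2]], [[0, 4], [0, 4]]]
w_g = [[0.5, 0.0], [0.0, -1.0]]   # hypothetical gating weights
b_g = [0.0, 0.0]
x_out = global_attention(x_low, x_high, w_g, b_g)
```

In this toy case the first channel receives a gate of 1 and is doubled, while the second channel receives a gate of 0 after ReLU and passes through unchanged, which mirrors the intent of emphasizing relevant responses and leaving the rest unamplified.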

In addition, the feature map x_lout obtained from the attention module, which contains context information, may be concatenated with the feature map of the corresponding decoding layer, forming a skip connection. These skip connections can use all of the high-resolution feature information and focus on the most relevant information during upsampling.

FIG. 5 is a diagram illustrating segmentation results on a first data set according to an embodiment of the present invention. FIG. 6 is a diagram illustrating segmentation results on a second data set according to an embodiment of the present invention.

Referring to FIGS. 5 and 6, the method according to the present invention may be tested on two sets of brain MRI data. In one embodiment, in FIGS. 5 and 6, (a) is the original input image, (b) is the ground truth segmentation map, (c) is the predicted segmentation map, (d) is the predicted GM (binary map), (e) is the predicted CSF (binary map), and (f) is the predicted WM (binary map).

For example, the first data set may include T1-weighted brain MRIs of 416 subjects obtained from the Open Access Series of Imaging Studies (OASIS) database.

In one embodiment, as shown in <Table 1>, 50 of the 416 subjects were selected for the experiments.

<Table 1>
| Dataset | Subjects: Male | Subjects: Female | Subjects: Total | Training set | Testing set |
|---|---|---|---|---|---|
| OASIS | 160 | 256 | 416 | 30 | 20 |
| IBSR | 14 | 4 | 18 | 12 | 6 |

Of the selected data, the first 30 subjects were used for model training and the remaining 20 subjects were used as the test data set. The axial, sagittal, and coronal planes of the MRI slices may be used for training and testing in the brain MRI segmentation experiments.

For example, each input axial scan in the first data set has dimensions of 208×176×176, and each scan consists of 176 slices. Distinguishable tissue regions are observed mostly near the middle of the volume.

In addition, consecutive slices share nearly identical information. Therefore, to exclude slices that lack such information and to reduce repetitive training on consecutive slices, 48 slice samples may be selected for the evaluation procedure, starting from the 10th slice and taken at intervals of 3 slices.
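The slice-selection rule above is simple arithmetic, and a short sketch makes it easy to check that the chosen slices stay within the 176-slice volume. The function name `select_slices` and the use of 1-based slice numbers are assumptions for illustration.

```python
def select_slices(start=10, step=3, count=48):
    """Return 1-based slice numbers: every `step`-th slice beginning at `start`."""
    return [start + step * k for k in range(count)]


slice_numbers = select_slices()
first_slice, last_slice = slice_numbers[0], slice_numbers[-1]
```

Under these assumptions the selection runs from slice 10 to slice 151, so it covers the central part of the volume and skips the outermost slices.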

The extracted slices may be resized to 256×256×48 by adding 24 pixels of zeros to the top and bottom of each image and 40 pixels of zeros to its left and right. Likewise, the sagittal and coronal planes of the MRI slices may be resized to 256×256.

Accordingly, each input scan may consist of 48 slices of size 256×256. In the training phase, the slices of each MRI scan and the corresponding ground truth segmentation maps may be divided into uniform patches.

Each input slice is 256×256 and may be divided into four patches, so each patch in the method according to the present invention may be 128×128. These patches are provided as input to the model for training, and predicted segmentation results are obtained for the test data.
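The zero padding and patch splitting described above can be sketched as follows. The sketch assumes each axial slice is 208 rows by 176 columns, which is the orientation that makes the stated padding (24 rows top and bottom, 40 columns left and right) add up to 256×256; the helper names are hypothetical.

```python
def zero_pad(image, top, bottom, left, right):
    """Surround a 2D list with zeros on each side."""
    width = len(image[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    padded += [[0] * left + list(row) + [0] * right for row in image]
    padded += [[0] * width for _ in range(bottom)]
    return padded


def split_into_patches(image, patch):
    """Cut a 2D list into non-overlapping patch x patch tiles, row by row."""
    patches = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches


# Dummy slice filled with ones so the padding is easy to tell apart.
slice_2d = [[1] * 176 for _ in range(208)]
padded = zero_pad(slice_2d, top=24, bottom=24, left=40, right=40)
patches = split_into_patches(padded, 128)
```

The same two helpers would apply to the ground truth segmentation maps, so that each training patch keeps a matching label patch.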

For example, the second data set may include MRIs from the Internet Brain Segmentation Repository (IBSR) data set. The second data set may include 18 T1-weighted MRIs of 14 healthy males and 4 healthy females between 7 and 71 years of age.

The IBSR MRIs are provided after preprocessing such as skull stripping, normalization, and bias field correction. The training data set included 12 subjects with manually annotated and verified ground truth labels, and the remaining 6 subjects were used to test the model.

The original axial scans (256×128×256) are padded with 64 pixels of zeros at the top and bottom of each image, resizing them to 256×256×256 so that the patches of the proposed method can be used effectively. Similarly, the original coronal (256×256×128) and sagittal (128×256×256) scans may be resized to 256×256×256 for the experiments.

In one embodiment, the global attention-based deep learning network model may be optimized with a categorical cross-entropy loss function. In one embodiment, a normalization technique may be adopted to initialize the weights.

Referring to FIGS. 5 and 6, the segmentation results for the axial, coronal, and sagittal planes of the first and second data sets can be seen.

The results show that the method according to the present invention achieves well-segmented results for GM, WM, and CSF on both data sets.

In one embodiment, the performance of the method according to the present invention may be evaluated using the quantitative metrics detailed in <Table 2> below.

<Table 2> Evaluation metrics: Dice similarity coefficient (DSC), Jaccard index (JI), Hausdorff distance (HD), and mean square error (MSE).

Here, DSC and JI are designed to compare volumes based on their overlap, and may be used to compare the results of an automated segmentation method with the ground truth.

DSC may be defined as twice the number of elements common to the two sets divided by the sum of the number of elements in each set, where the two sets are the ground truth set and the predicted segmentation set, each measured by its cardinality (i.e., its number of elements).

JI may be expressed in terms of DSC, as noted in <Table 2>. The DSC and JI metrics determine the agreement between the predicted segmentation map and the corresponding ground truth segmentation map.

Segmentation performance may also be evaluated in terms of the mean square error (MSE), which is the average squared difference between the original values X and the predicted values Y.

HD may be used to determine the dissimilarity between two sets in a metric space. Two sets with a small HD look almost identical. HD and MSE may be calculated as shown in <Table 2>, where D denotes the Euclidean distance between two pixels, and R and C denote the image height and width, respectively.
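The four metrics can be sketched from the verbal definitions above. The sketch assumes the standard forms: DSC as twice the overlap over the summed set sizes, JI derived from DSC as DSC / (2 - DSC), the symmetric Hausdorff distance over Euclidean pixel distances, and MSE averaged over an R×C image. Segmentations are represented as sets of (row, column) pixel coordinates, which is an illustrative choice, and the sets are assumed non-empty.

```python
import math


def dice(gt, pred):
    """DSC = 2 * |gt & pred| / (|gt| + |pred|) for two non-empty pixel sets."""
    gt, pred = set(gt), set(pred)
    return 2 * len(gt & pred) / (len(gt) + len(pred))


def jaccard_from_dice(dsc):
    """JI expressed through DSC: JI = DSC / (2 - DSC)."""
    return dsc / (2 - dsc)


def hausdorff(gt, pred):
    """Symmetric Hausdorff distance using Euclidean pixel distances."""
    def directed(a, b):
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(gt, pred), directed(pred, gt))


def mse(x, y):
    """Mean squared difference over an image with R rows and C columns."""
    rows, cols = len(x), len(x[0])
    total = sum((x[r][c] - y[r][c]) ** 2 for r in range(rows) for c in range(cols))
    return total / (rows * cols)


# Toy example: pixel coordinates labelled as one tissue class.
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
prediction = {(0, 1), (1, 1), (1, 2)}
dsc = dice(ground_truth, prediction)
ji = jaccard_from_dice(dsc)
hd = hausdorff(ground_truth, prediction)
error = mse([[0, 1], [1, 0]], [[0, 1], [0, 0]])
```

In this toy case two of the pixels overlap, so the JI computed through DSC matches the direct intersection-over-union of 2/5, which is a quick consistency check on the DSC-to-JI relation.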

To show the segmentation effect of different network architectures more intuitively, the U-net, SegNet, and U-SegNet models may be trained on the experimental data.

FIG. 7 is a diagram illustrating a comparison of segmentation results on the first data set according to an embodiment of the present invention. FIG. 8 is a diagram illustrating a comparison of segmentation results on the second data set according to an embodiment of the present invention.

Referring to FIGS. 7 and 8, the quality of the segmentation maps generated by the method according to the present invention is clearly superior to the results of the existing methods.

In one embodiment, in FIGS. 7 and 8, (a) is the original input image, (b) is the ground truth segmentation map, (c) shows the segmentation results generated by SegNet, U-net, U-SegNet, and the method according to the present invention (the proposed method), (d) shows the GM maps generated by the same four methods, (e) shows the CSF maps generated by the same four methods, and (f) shows the WM maps generated by the same four methods.

U-net has a shallow network that is insufficient to capture the spatial information of the image. In addition, as image content becomes more complex, the segmentation accuracy of U-net and SegNet drops significantly. This is visible in particular in the feature maps generated by SegNet and U-net, where the highlighted regions show concentrations of specific tissues that caused misclassification.

U-SegNet uses skip connections to combine feature maps from the encoder with the decoder, and uses pooling indices to localize these features during upsampling.

Although U-SegNet yields better segmentation results through the combination of pooling indices and skip connections, it may still fail to capture fine details. In the highlighted red boxes, U-SegNet can be seen to segment the WM and CSF tissues only in a generalized way.

Integrating attention can overcome some of these limitations and improve segmentation performance by focusing attention on the relevant regions. This improved segmentation can be observed in the results obtained with the method according to the present invention.

Referring to FIG. 8, similar results can be seen in the segmentations obtained from the IBSR images. In particular, the network according to the present invention obtains finer details than the other architectures. These visual results confirm that the method according to the present invention can robustly recover finer segmentation details while bypassing the distraction of ambiguous regions.

In one embodiment, a quantitative analysis of the method according to the present invention was performed in comparison with the existing SegNet, U-net, and U-SegNet methods, and the results are shown in <Table 3> to <Table 6> below.

<Table 3> WM
| Dataset | Plane | Metric | SegNet | U-net | U-SegNet | Proposed method |
|---|---|---|---|---|---|---|
| OASIS | Axial | DSC (%) | 87.36 | 92.18 | 93.36 | 95.54 |
| OASIS | Axial | JI (%) | 77.18 | 85.50 | 87.56 | 90.13 |
| OASIS | Axial | HD | 5.09 | 4.40 | 4.2 | 3.6 |
| OASIS | Coronal | DSC (%) | 82.12 | 93.14 | 94.12 | 96.37 |
| OASIS | Coronal | JI (%) | 69.23 | 87.20 | 89.63 | 91.17 |
| OASIS | Coronal | HD | 5.4 | 4.14 | 3.9 | 3.2 |
| OASIS | Sagittal | DSC (%) | 82.42 | 92.44 | 93.25 | 96.72 |
| OASIS | Sagittal | JI (%) | 69.54 | 86.34 | 87.65 | 91.38 |
| OASIS | Sagittal | HD | 7.2 | 4.3 | 4.0 | 3.35 |
| IBSR | Axial | DSC (%) | 72.86 | 89.45 | 90.35 | 91.76 |
| IBSR | Axial | JI (%) | 65.34 | 81.56 | 82.49 | 84.23 |
| IBSR | Axial | HD | 6.51 | 5.14 | 4.6 | 4.2 |
| IBSR | Coronal | DSC (%) | 70.15 | 88.45 | 89.36 | 90.47 |
| IBSR | Coronal | JI (%) | 62.35 | 79.38 | 80.14 | 82.53 |
| IBSR | Coronal | HD | 6.3 | 5.45 | 5.5 | 4.6 |
| IBSR | Sagittal | DSC (%) | 71.53 | 86.84 | 87.32 | 89.52 |
| IBSR | Sagittal | JI (%) | 63.41 | 78.63 | 79.62 | 81.42 |
| IBSR | Sagittal | HD | 6.49 | 5.75 | 5.4 | 4.82 |

<Table 4> GM
| Dataset | Plane | Metric | SegNet | U-net | U-SegNet | Proposed method |
|---|---|---|---|---|---|---|
| OASIS | Axial | DSC (%) | 84.93 | 90.32 | 82.05 | 94.65 |
| OASIS | Axial | JI (%) | 72.20 | 82.35 | 85.27 | 88.53 |
| OASIS | Axial | HD | 5.7 | 4.3 | 4.0 | 3.8 |
| OASIS | Coronal | DSC (%) | 78.21 | 82.25 | 93.53 | 95.94 |
| OASIS | Coronal | JI (%) | 64.16 | 85.45 | 87.85 | 90.46 |
| OASIS | Coronal | HD | 4.6 | 4.2 | 4.12 | 3.42 |
| OASIS | Sagittal | DSC (%) | 80.23 | 91.13 | 92.56 | 95.69 |
| OASIS | Sagittal | JI (%) | 67.57 | 83.26 | 85.69 | 90.53 |
| OASIS | Sagittal | HD | 5.9 | 5.2 | 4.2 | 3.49 |
| IBSR | Axial | DSC (%) | 75.63 | 91.53 | 92.20 | 93.83 |
| IBSR | Axial | JI (%) | 67.32 | 85.41 | 86.15 | 87.52 |
| IBSR | Axial | HD | 6.53 | 4.87 | 4.2 | 4.0 |
| IBSR | Coronal | DSC (%) | 73.65 | 90.23 | 91.45 | 92.85 |
| IBSR | Coronal | JI (%) | 65.42 | 83.56 | 84.16 | 85.61 |
| IBSR | Coronal | HD | 6.21 | 5.17 | 4.8 | 4.53 |
| IBSR | Sagittal | DSC (%) | 74.62 | 89.46 | 90.44 | 91.71 |
| IBSR | Sagittal | JI (%) | 66.85 | 81.53 | 82.15 | 84.45 |
| IBSR | Sagittal | HD | 6.36 | 5.77 | 5.3 | 4.25 |

<Table 5> CSF
| Dataset | Plane | Metric | SegNet | U-net | U-SegNet | Proposed method |
|---|---|---|---|---|---|---|
| OASIS | Axial | DSC (%) | 80.67 | 89.82 | 91.64 | 93.83 |
| OASIS | Axial | JI (%) | 67.85 | 81.53 | 84.57 | 86.53 |
| OASIS | Axial | HD | 4.9 | 4.6 | 4.1 | 3.9 |
| OASIS | Coronal | DSC (%) | 74.03 | 89.53 | 91.46 | 94.18 |
| OASIS | Coronal | JI (%) | 61.26 | 82.36 | 84.25 | 89.27 |
| OASIS | Coronal | HD | 4.6 | 4.1 | 4.15 | 3.81 |
| OASIS | Sagittal | DSC (%) | 77.45 | 88.63 | 92.15 | 94.53 |
| OASIS | Sagittal | JI (%) | 63.51 | 81.26 | 85.36 | 89.44 |
| OASIS | Sagittal | HD | 6.3 | 4.4 | 4.15 | 3.56 |
| IBSR | Axial | DSC (%) | 68.42 | 84.34 | 84.95 | 85.64 |
| IBSR | Axial | JI (%) | 59.32 | 75.85 | 75.98 | 77.16 |
| IBSR | Axial | HD | 6.3 | 4.4 | 4.3 | 4.86 |
| IBSR | Coronal | DSC (%) | 66.54 | 83.65 | 84.15 | 85.83 |
| IBSR | Coronal | JI (%) | 57.32 | 76.86 | 76.94 | 77.86 |
| IBSR | Coronal | HD | 6.84 | 5.54 | 5.2 | 4.9 |
| IBSR | Sagittal | DSC (%) | 65.49 | 80.75 | 81.19 | 83.56 |
| IBSR | Sagittal | JI (%) | 54.86 | 73.96 | 74.10 | 75.28 |
| IBSR | Sagittal | HD | 6.99 | 5.83 | 5.6 | 5.1 |

<Table 6> MSE
| Dataset | SegNet | U-net | U-SegNet | Proposed method |
|---|---|---|---|---|
| OASIS | 0.021 | 0.008 | 0.006 | 0.003 |
| IBSR | 0.013 | 0.009 | 0.007 | 0.005 |

It can be seen that the network according to the present invention achieves average improvements of 10%, 3.9%, and 2.3% (in terms of DSC) over the SegNet, U-net, and U-SegNet methods, respectively, and achieves a lower MSE of 0.003.

This performance difference can be explained by the fact that SegNet stores only the max pooling indices. That is, the position of the maximum feature value in each pooling window is memorized for each encoder map and used for upsampling.

This improves boundary delineation with 3 million parameters and requires comparatively little training time (2.9 hours) among the existing methods. However, SegNet tends to miss many details because it loses neighboring information when upsampling from low-resolution feature maps.

U-net, on the other hand, uses skip connections as the core of its architecture, mixing deep, coarse information with shallow, fine semantic information. U-net uses low-level feature maps for upsampling, and as a result translation invariance is often compromised. U-SegNet, in turn, tends to be insensitive to fine details and has difficulty identifying the boundaries between adjacent tissues such as WM and GM. The segmentation maps generated by these existing models may have relatively low resolution because of the pooling layers in the encoder stage.

Therefore, maintaining high spatial resolution may require removing the pooling layers. However, because convolution is a local operation, the SegNet, U-net, and U-SegNet models cannot learn holistic features from the image without pooling layers.

The method according to the present invention presents a multi-scale feature fusion scheme combined with the global attention module (GAM) as a potential solution to the prior-art problems discussed above, and can produce improved segmentation accuracy. The max-pooled input, convolved with 1×1 and 3×3 kernels and concatenated together, can capture the global context without reducing the resolution of the segmentation map.

In this way, global information can be exchanged between layers without sacrificing resolution, and blurring of the semantic segmentation map can be reduced. In addition, using the GAM in the encoder provides global context information as guidance for the low-level features, in order to recover the original resolution for segmentation.

The GAM in the decoder confirms that the combination of global and local features is important for distinguishing brain tissues, which is consistent with the reference results. Uniform input patches also allow the network to focus more on local details.

As a result of selectively integrating spatial information through uniform patches, the feature maps followed by the multi-scale guided multi-GAM help capture context information and efficiently encode complementary information, enabling accurate segmentation of brain MRI.

In addition, the fire module may be used to identify a network with few learnable parameters while maintaining the same accuracy.

FIG. 9 is a diagram illustrating a comparison of the number of learnable parameters and the computation time according to an embodiment of the present invention.

Referring to FIG. 9, the learnable parameters and computation time consumed by the proposed method can be compared with those of the existing methods.

A smaller model can be built by arranging a series of fire modules, each consisting of a squeeze layer with only 1×1 convolution filters. The squeeze layer feeds an expand layer that combines 1×1 and 3×3 filters.

The number of filters in the squeeze layer is defined to be smaller than the number of 1×1 and 3×3 filters in the expand layer. The 1×1 filters in the squeeze layer reduce the parameters by downsampling the input channels before they are fed to the expand layer. The expand layer may consist of 1×1 and 3×3 filters.

The 1×1 filters in the expand layer combine channels and perform cross-channel pooling, but cannot recognize spatial structure. The 3×3 convolution filters in the expand layer identify spatial representations. Combining filters of these two sizes makes the model more expressive while operating with fewer parameters.

Thus, by reducing the parameter maps, the fire module makes it possible to build a smaller deep learning network that reduces the computational load while maintaining high accuracy.
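The parameter saving of a fire module over a plain 3×3 convolution can be checked with simple counting. The sketch below assumes the usual count for a convolution layer (input channels × output channels × kernel area, plus one bias per output channel); the channel sizes 64, 16, 64, and 64 are hypothetical and are not taken from the patent.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a kernel x kernel convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Squeeze (1x1) layer followed by parallel 1x1 and 3x3 expand layers."""
    squeeze_layer = conv_params(in_ch, squeeze, 1)
    expand_1x1 = conv_params(squeeze, expand1x1, 1)
    expand_3x3 = conv_params(squeeze, expand3x3, 3)
    return squeeze_layer + expand_1x1 + expand_3x3


# Both options map 64 input channels to 128 output channels.
plain_3x3 = conv_params(64, 128, 3)
fire = fire_params(64, squeeze=16, expand1x1=64, expand3x3=64)
reduction = plain_3x3 / fire
```

Because the 3×3 filters only see the 16 squeezed channels instead of all 64 input channels, most of the saving comes from the squeeze layer, which is the effect the description attributes to keeping the squeeze layer narrower than the expand layer.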

The method according to the present invention has about 1 million parameters in total, which is 3.3 times, 4 times, and 5 times fewer than the SegNet, U-SegNet, and U-net networks, respectively.

On the first data set, the training time of the method is 73% of that of U-SegNet and may be 12% faster than the U-SegNet network. The reduced memory requirements can significantly reduce energy and computational requirements compared with the existing methods.

To investigate the effect of each design choice on segmentation performance, tests may be performed on the following combinations of the proposed modules:
(i) Squeeze U-SegNet;
(ii) Squeeze U-SegNet with multi-scale input;
(iii) Squeeze U-SegNet with multi global attention;
(iv) multi-scale Squeeze U-SegNet with multi global attention (the proposed method).

The first network, Squeeze U-SegNet, is obtained by replacing each convolution block of the existing U-SegNet with a fire module.

In the second network, the encoder of Squeeze U-SegNet includes a multi-scale input layer.

This is achieved by max pooling the input, performing parallel convolutions with 1×1 and 3×3 kernels, and concatenating the resulting multi-scale features. These fused multi-scale features are concatenated with the corresponding fire module output and fed as input to the max pooling operation. This process may be repeated for every encoding layer.
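The multi-scale input step (max pooling, then parallel 1×1 and 3×3 convolutions whose outputs are stacked as channels) can be sketched for a single-channel image. The kernels, the zero "same" padding, and the function names are assumptions for illustration; the concatenation with the fire module output is left out.

```python
def max_pool(image, size=2):
    """Max pooling with a size x size window and matching stride."""
    return [[max(image[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(image[0]) - size + 1, size)]
            for r in range(0, len(image) - size + 1, size)]


def conv2d_same(image, kernel):
    """Single-channel convolution (cross-correlation) with zero 'same' padding."""
    k = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    rr, cc = r + i, c + j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i + k][j + k]
            out[r][c] = acc
    return out


def multi_scale_features(image, kernel_1x1, kernel_3x3):
    """Downsample once, then stack the 1x1 and 3x3 responses as two channels."""
    pooled = max_pool(image)
    return [conv2d_same(pooled, kernel_1x1), conv2d_same(pooled, kernel_3x3)]


# Hypothetical 4x4 input and kernels.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
features = multi_scale_features(image, [[2.0]], [[1.0] * 3 for _ in range(3)])
```

The 1×1 branch only rescales each pooled value, while the 3×3 branch mixes neighboring values, so the two stacked channels carry information at two receptive-field sizes for the same downsampled grid.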

The multi-scale feature module can extract the adjacent-scale information of global features more accurately while filtering out irrelevant information. The effect of the attention mechanism may be explored in the third network, in which GAMs are integrated into both the encoder and the decoder to form a multi-attention network.

Finally, the multi-scale Squeeze U-SegNet, referred to as the method according to the present invention, combines all of the proposed modules to incorporate semantic guidance.

In one embodiment, <Table 7> to <Table 10> below show the individual contribution of each component to segmentation performance.

| Models (GM) | DSC | JI | HD |
|---|---|---|---|
| Squeeze U-SegNet | 92.05 | 88.06 | 3.5 |
| Squeeze U-SegNet with multi-scale input | 93.44 | 89.47 | 4.8 |
| Squeeze U-SegNet with multi global attention | 94.32 | 89.25 | 5 |
| Proposed Method | 94.65 | 89.51 | 4.8 |

| Models (WM) | DSC | JI | HD |
|---|---|---|---|
| Squeeze U-SegNet | 93.37 | 90.42 | 2.8 |
| Squeeze U-SegNet with multi-scale input | 94.78 | 91.90 | 4.1 |
| Squeeze U-SegNet with multi global attention | 95.5 | 91.40 | 4.2 |
| Proposed Method | 95.86 | 92.05 | 4.1 |

| Models (CSF) | DSC | JI | HD | MSE |
|---|---|---|---|---|
| Squeeze U-SegNet | 91.65 | 88.06 | 2.0 | 0.006 |
| Squeeze U-SegNet with multi-scale input | 93.32 | 90.25 | 3.0 | 0.005 |
| Squeeze U-SegNet with multi global attention | 94.30 | 89.22 | 3.3 | 0.004 |
| Proposed Method | 94.43 | 89.74 | 3.0 | 0.003 |

| Models | Computation time (5 epochs) | # Learnable parameters |
|---|---|---|
| Squeeze U-SegNet | 1.7 hours | 768,788 |
| Squeeze U-SegNet with multi-scale input | 1.9 hours | 860,180 |
| Squeeze U-SegNet with multi global attention | 2 hours | 942,164 |
| Proposed Method | 2.2 hours | 1,030,420 |

The models show that the number of learnable parameters required is greatly reduced, which shortens the computation time for training while maintaining network accuracy. Compared with the baseline Squeeze U-SegNet, the models integrating the multi-scale feature-fusion input scheme and the multi-attention module improve performance by 2.1% and 3%, respectively.

Multi-scale feature fusion alone yields a smaller gain in DSC, but in combination with the GAM it can make the network more effective.

In addition, combining the multi-scale and multi global attention strategies further improves performance and yields the best values across the metrics: 94.78% (DSC), 90.43% (JI), 3.1 (HD), and the lowest MSE of 0.003.

These results represent a 3.5% improvement in DSC over the baseline U-SegNet and demonstrate the effectiveness of the proposed multi-scale guided multi-GAM relative to the individual components.
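For readers unfamiliar with the two overlap metrics reported in the tables, the sketch below computes the Dice similarity coefficient (DSC) and the Jaccard index (JI) for a pair of binary masks using their standard definitions. The helper name `dice_and_jaccard` and the toy masks are illustrative assumptions; the source does not specify how the metrics were aggregated over images or tissue classes.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two binary masks given as flat lists of 0/1.

    DSC = 2|A ∩ B| / (|A| + |B|),  JI = |A ∩ B| / |A ∪ B|.
    """
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size, truth_size = sum(pred), sum(truth)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks empty: treat as a perfect match (a common convention).
        return 1.0, 1.0
    dsc = 2 * intersection / (pred_size + truth_size)
    ji = intersection / union
    return dsc, ji


# Toy masks (assumed data): 3 overlapping pixels, 4 predicted, 4 true.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
dsc, ji = dice_and_jaccard(pred, truth)
print(f"DSC = {dsc:.2f}, JI = {ji:.2f}")
```

For a single pair of masks the two metrics are related by JI = DSC / (2 − DSC), so they rank segmentations identically; averaged scores over many images need not satisfy that identity.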

In addition, the effect of patch size on training time and segmentation performance may be investigated. Experiments may be performed on the second dataset with three different patch sizes (128×128, 64×64, and 32×32).

In one embodiment, <Table 11> below may show the segmentation performance for various patch sizes.

| Patch size | DSC (WM) | DSC (GM) | DSC (CSF) | JI (WM) | JI (GM) | JI (CSF) | Training time (hours) |
|---|---|---|---|---|---|---|---|
| 128x128 | 96.33 | 95.44 | 94.81 | 92.93 | 91.28 | 90.14 | 2.2 |
| 64x64 | 96.35 | 95.48 | 94.89 | 92.95 | 91.37 | 90.28 | 3.4 |
| 32x32 | 96.35 | 95.51 | 95.12 | 92.98 | 91.47 | 90.34 | 4.6 |

It can be seen that performance improves as the patch size decreases. This may be because smaller patches yield more training data for the network to train on. Localization may also become more accurate.

In addition, with a patch size of 128×128, training the model takes 2.2 hours, whereas with 32×32 patches the training time may roughly double for almost the same accuracy.

Accordingly, based on the results in <Table 11>, the 128×128 patch size provides a reasonable balance between the DSC score and the computation time required for model training.
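The trade-off can be made concrete with a little arithmetic. The sketch below uses the training times reported in Table 11 and counts how many non-overlapping patches one slice yields at each patch size. The slice size of 256×256 pixels is an assumption made purely for illustration (the source does not state it here), as are the names `SLICE_SIZE` and `patches_per_slice`.

```python
SLICE_SIZE = 256  # assumed slice width/height in pixels (not given in the source)

# Training times in hours, copied from Table 11.
training_hours = {128: 2.2, 64: 3.4, 32: 4.6}


def patches_per_slice(slice_size, patch_size):
    """Number of non-overlapping square patches that tile one square slice."""
    per_side = slice_size // patch_size
    return per_side * per_side


summary = {
    size: {
        "patches": patches_per_slice(SLICE_SIZE, size),
        "time_ratio": round(hours / training_hours[128], 2),
    }
    for size, hours in training_hours.items()
}

for size, row in summary.items():
    print(f"{size}x{size}: {row['patches']} patches per slice, "
          f"{row['time_ratio']}x the 128x128 training time")
```

Under these assumptions, moving from 128×128 to 32×32 patches multiplies the number of training samples per slice by 16 and the training time by about 2.1, while the DSC values in Table 11 change by at most a few tenths of a percentage point.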

FIG. 10 is a diagram illustrating an image segmentation method using global attention according to an embodiment of the present invention.

Referring to FIG. 10, step S1001 is a step of obtaining input information.

In one embodiment, an input image may be obtained, and the input image may be divided to generate input information in patch form. Here, the input image may include a slice image of an object. The input information may include patch images obtained by dividing the slice image into a plurality of patches.
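The patch generation of step S1001 can be illustrated as follows. This is a minimal sketch that assumes non-overlapping square patches and image dimensions that are exact multiples of the patch size; the source does not state whether patches overlap. The function name `split_into_patches` is hypothetical.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D list into non-overlapping patch_size x patch_size patches.

    Patches are returned in row-major order. Height and width must be
    multiples of patch_size (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([row[left:left + patch_size]
                            for row in image[top:top + patch_size]])
    return patches


# Toy 4x4 "slice image" whose pixel value encodes its position.
slice_image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(slice_image, 2)
print(len(patches), patches[0], patches[3])
```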

Step S1003 is a step of inputting the input information to a first fire module 310 that includes a squeeze layer 211 composed of a first convolution layer 212 and an expand layer 213 composed of a second convolution layer 214 and a third convolution layer 215.

In one embodiment, the first convolution layer 212 and the second convolution layer 214 may have a first kernel size, and the third convolution layer 215 may have a second kernel size. For example, the first kernel size may be 1×1 and the second kernel size may be 3×3, but the kernel sizes are not limited thereto.

In one embodiment, an output value of the first convolution layer 212 of the squeeze layer 211 may be generated; the generated output value may be input to each of the second convolution layer 214 and the third convolution layer 215, which are arranged in parallel in the expand layer 213; and the output values of the second convolution layer 214 and the third convolution layer 215 may be concatenated 216 to generate the output information of the first fire module 310.
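To show why a fire module needs fewer learnable parameters than a plain convolution block, the sketch below counts the weights and biases of the squeeze layer and the two parallel expand branches. It assumes the 1×1 and 3×3 kernel sizes given as examples above; the channel counts (64, 16, 64, 64) and the names `FireModuleSpec` and `plain_conv3x3_params` are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass
class FireModuleSpec:
    in_channels: int
    squeeze_channels: int    # 1x1 squeeze layer (first convolution layer)
    expand1x1_channels: int  # 1x1 expand branch (second convolution layer)
    expand3x3_channels: int  # 3x3 expand branch (third convolution layer)

    @property
    def out_channels(self):
        # The two expand branches are concatenated along the channel axis.
        return self.expand1x1_channels + self.expand3x3_channels

    def param_count(self):
        """Weights plus biases of the three convolutions."""
        s = self.squeeze_channels
        squeeze = self.in_channels * s + s
        expand1 = s * self.expand1x1_channels + self.expand1x1_channels
        expand3 = 9 * s * self.expand3x3_channels + self.expand3x3_channels
        return squeeze + expand1 + expand3


def plain_conv3x3_params(in_channels, out_channels):
    """Weights plus biases of one ordinary 3x3 convolution."""
    return 9 * in_channels * out_channels + out_channels


fire = FireModuleSpec(in_channels=64, squeeze_channels=16,
                      expand1x1_channels=64, expand3x3_channels=64)
baseline = plain_conv3x3_params(64, fire.out_channels)
print(fire.out_channels, fire.param_count(), baseline)
```

With these assumed channel counts the fire module produces the same 128 output channels as the plain 3×3 convolution with roughly one sixth of the parameters, because the expensive 3×3 kernels only see the narrow squeezed representation.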

In one embodiment, before step S1005, max pooling 312 may be performed on the input information for a first layer to generate input information for a second layer, and the input information for the second layer may be input to each of a fourth convolution layer and a fifth convolution layer arranged in parallel to generate multi-scale input information for the second layer.

In one embodiment, the fourth convolution layer and the fifth convolution layer may be arranged in parallel to form a multi-scale layer 301.

Step S1005 is a step of inputting the output information of the first fire module 310 and the multi-scale input information to a first global attention module 311 that includes global average pooling 401.

In one embodiment, global average pooling 401 may be performed on the output information of the first fire module 310; the result of the global average pooling 401 may be input to a sixth convolution layer 403; upsampling 409 may be performed based on the result of the sixth convolution layer 403 and the result generated by inputting the multi-scale input information to a seventh convolution layer 405, thereby generating an attention coefficient; and the output information of the first global attention module 311 may be generated using the attention coefficient and the output information of the first fire module 310.

In one embodiment, a multiplication operation 407 may be performed on the result of the sixth convolution layer 403 and the result generated by inputting the multi-scale input information to the seventh convolution layer 405, and upsampling 409 may be performed on the result of the multiplication operation 407.

In one embodiment, a sum operation 411 may be performed on the attention coefficient and the output information of the first fire module 310, and the output information of the first global attention module 311 may be generated through the sum operation 411.
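The data flow through the global attention module (401 to 411) can be traced with a small plain-Python sketch. It is one possible reading of the description, under several assumptions that the source does not confirm: both convolutions are 1×1, no activation function is applied, the multi-scale input has half the spatial resolution of the fire module output, upsampling is nearest-neighbour by a factor of 2, and both inputs have the same number of channels. All function names and weights are hypothetical.

```python
def global_average_pool(feat):
    """[C][H][W] -> [C]: mean of every channel (401)."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def conv1x1_vector(vec, weights):
    """1x1 convolution on a pooled vector; weights is [C_out][C_in] (403)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def conv1x1_map(feat, weights):
    """1x1 convolution on a [C_in][H][W] map: mixes channels per pixel (405)."""
    h, w = len(feat[0]), len(feat[0][0])
    return [[[sum(row[c] * feat[c][y][x] for c in range(len(feat)))
              for x in range(w)] for y in range(h)] for row in weights]


def upsample_nearest_2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] map (409)."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def global_attention(low, high, w6, w7):
    """low: fire module output; high: multi-scale input at half resolution."""
    channel_weights = conv1x1_vector(global_average_pool(low), w6)
    guided = conv1x1_map(high, w7)
    # Multiplication (407): scale each channel of the guided map.
    scaled = [[[channel_weights[c] * v for v in row] for row in ch]
              for c, ch in enumerate(guided)]
    attention = upsample_nearest_2x(scaled)
    # Sum (411): add the attention coefficients to the fire module output.
    return [[[a + l for a, l in zip(a_row, l_row)]
             for a_row, l_row in zip(a_ch, l_ch)]
            for a_ch, l_ch in zip(attention, low)]


# Toy tensors: 2 channels, 2x2 low-level map, 1x1 high-level map.
low = [[[1, 2], [3, 4]], [[0, 0], [0, 4]]]
high = [[[2]], [[1]]]
identity = [[1, 0], [0, 1]]  # assumed weights, chosen so the result is easy to follow
output = global_attention(low, high, identity, identity)
print(output)
```

With identity weights the pooled channel means (2.5 and 1.0) scale the high-level map to 5.0 and 1.0, which are then broadcast by upsampling and added to the low-level features.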

In one embodiment, the output information of the first fire module 310 may represent low-level features, and the multi-scale input information may represent high-level features.

Step S1007 is a step of performing encoding by max pooling 312 the output information of the first global attention module 311.

In one embodiment, the encoded value calculated by the encoding may be input to the first fire module 310 of the next layer.

In one embodiment, after step S1007, the output information of the first fire module 310 for the first layer and the output information of a second fire module 320 for the second layer may be input to a second global attention module 321 for the first layer.

For example, the first fire module 310 may refer to the encoder fire module 210, and the second fire module 320 may refer to the decoder fire module 220.

In one embodiment, the second fire module 320 may include a squeeze layer 221 composed of a first transposed convolution layer 222 and an expand layer 223 composed of a second transposed convolution layer 224 and a third transposed convolution layer 225.

In one embodiment, after step S1007, decoding may be performed by inputting, to the second fire module 320 for the first layer, the result generated by upsampling 322 the output information of the second global attention module 321 and the output information of the second fire module 320 for the second layer.

In one embodiment, the decoded value calculated by the decoding may be input to a classification layer 330 to calculate a final output value.
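The source does not describe the internals of the classification layer 330. A common choice for per-pixel classification is a softmax over class scores followed by an arg-max, and the sketch below illustrates that choice only as an assumption. The four class labels and the score values are likewise hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(score_map, labels):
    """score_map: [H][W][num_classes] decoder scores -> [H][W] label names."""
    result = []
    for row in score_map:
        out_row = []
        for scores in row:
            probs = softmax(scores)
            out_row.append(labels[probs.index(max(probs))])
        result.append(out_row)
    return result


# Assumed label set and toy decoder scores for a 2x2 patch.
LABELS = ["background", "CSF", "GM", "WM"]
scores = [[[0.1, 2.0, 0.3, 0.2], [0.0, 0.1, 3.0, 0.5]],
          [[1.5, 0.2, 0.1, 0.0], [0.2, 0.1, 0.4, 2.2]]]
segmentation = classify_pixels(scores, LABELS)
print(segmentation)
```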

FIG. 11 is a diagram illustrating the functional configuration of an image segmentation apparatus 1100 using global attention according to an embodiment of the present invention.

Referring to FIG. 11, the image segmentation apparatus 1100 may include an acquisition unit 1110, a control unit 1120, and a storage unit 1130.

The acquisition unit 1110 may obtain input information. In one embodiment, the acquisition unit 1110 may include a communication unit or a photographing unit. For example, the acquisition unit 1110 may include a magnetic resonance imaging (MRI) camera. The acquisition unit 1110 may also include a communication unit that receives data from an external electronic device.

In one embodiment, the communication unit may include at least one of a wired communication module and a wireless communication module. All or part of the communication unit may be referred to as a 'transmitter', a 'receiver', or a 'transceiver'.

The control unit 1120 may input the input information to the first fire module 310, which includes the squeeze layer 211 composed of the first convolution layer 212 and the expand layer 213 composed of the second convolution layer 214 and the third convolution layer 215.

In one embodiment, the control unit 1120 may input the output information of the first fire module 310 and the multi-scale input information to the first global attention module 311, which includes global average pooling 401.

In one embodiment, the control unit 1120 may perform encoding by max pooling 312 the output information of the first global attention module 311.

In one embodiment, the control unit 1120 may include at least one processor or microprocessor, or may be part of a processor. The control unit 1120 may also be referred to as a communication processor (CP). The control unit 1120 may control the operation of the image segmentation apparatus 1100 according to various embodiments of the present invention.

The storage unit 1130 may store the input information. The storage unit 1130 may also store the global attention-based deep learning network.

In one embodiment, the storage unit 1130 may be composed of volatile memory, non-volatile memory, or a combination of volatile and non-volatile memory. The storage unit 1130 may provide stored data at the request of the control unit 1120.

The components illustrated in FIG. 11 are not essential; in various embodiments of the present invention, the image segmentation apparatus 1100 may be implemented with more or fewer components than those illustrated in FIG. 11.

The above description merely illustrates the technical idea of the present invention, and those skilled in the art may make various changes and modifications without departing from the essential characteristics of the present invention.

The various embodiments disclosed herein may be performed in any order, and may be performed simultaneously or separately.

In one embodiment, at least one step may be omitted from or added to each drawing described herein, and the steps may be performed in reverse order or simultaneously.

The embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of the present invention is not limited by these embodiments.

The scope of protection of the present invention should be interpreted according to the claims, and all technical ideas within a scope equivalent thereto should be understood to fall within the scope of the present invention.

210: encoder fire module
211: squeeze layer
212: first convolution layer
213: expand layer
214: second convolution layer
215: third convolution layer
216: concatenation
220: decoder fire module
221: squeeze layer
222: first transposed convolution layer
223: expand layer
224: second transposed convolution layer
225: third transposed convolution layer
226: concatenation
300: global attention-based deep learning network
301: multi-scale layer
303: max pooling
310: first fire module
311: first global attention module
312: max pooling
320: second fire module
321: second global attention module
322: upsampling
330: classification layer
400: global attention module
401: global average pooling
403: sixth convolution layer
405: seventh convolution layer
407: multiplication operation
409: upsampling
411: sum operation
1100: image segmentation apparatus
1110: acquisition unit
1120: control unit
1130: storage unit

Claims

An image segmentation method using global attention, the method comprising:
(a) obtaining input information;
(b) inputting the input information to a first fire module including a squeeze layer composed of a first convolution layer and an expand layer composed of a second convolution layer and a third convolution layer;
(c) inputting output information of the first fire module and multi-scale input information to a first global attention module including global average pooling; and
(d) performing encoding by max pooling output information of the first global attention module.

The method of claim 1, wherein step (a) comprises:
acquiring an input image; and
generating the input information in patch form by dividing the input image.

The method of claim 1, wherein the first convolution layer and the second convolution layer have a first kernel size, and
the third convolution layer has a second kernel size.

The method of claim 1, wherein step (b) comprises:
generating an output value of the first convolution layer of the squeeze layer;
inputting the generated output value to each of the second convolution layer and the third convolution layer, which are arranged in parallel in the expand layer; and
generating the output information of the first fire module by concatenating output values of the second convolution layer and the third convolution layer.

The method of claim 1, further comprising, before step (c):
generating input information for a second convolution layer by performing max pooling on the input information for the first convolution layer; and
generating the multi-scale input information for the second convolution layer by inputting the input information for the second convolution layer to each of a fourth convolution layer and a fifth convolution layer arranged in parallel.

The method of claim 1, wherein step (c) comprises:
performing global average pooling on the output information of the first fire module;
inputting a result value of the global average pooling to a sixth convolution layer;
generating an attention coefficient by performing upsampling based on a result value of the sixth convolution layer and a result value generated by inputting the multi-scale input information to a seventh convolution layer; and
generating the output information of the first global attention module using the attention coefficient and the output information of the first fire module.

The method of claim 1, further comprising, after step (d):
inputting output information of the first fire module for the first convolution layer and output information of a second fire module for the second convolution layer to a second global attention module for the first convolution layer,
wherein the second fire module includes a squeeze layer composed of a first transposed convolution layer and an expand layer composed of a second transposed convolution layer and a third transposed convolution layer.

The method of claim 7, further comprising, after step (d):
performing decoding by inputting, to the second fire module for the first convolution layer, a result value generated by upsampling output information of the second global attention module and the output information of the second fire module for the second convolution layer; and
calculating a final output value by inputting a decoded value calculated by the decoding to a classification layer.

The method of claim 7, wherein the inputting to the second global attention module comprises:
generating an output value of the first transposed convolution layer of the squeeze layer;
inputting the generated output value to each of the second transposed convolution layer and the third transposed convolution layer, which are arranged in parallel in the expand layer; and
generating the output information of the second fire module by concatenating output values of the second transposed convolution layer and the third transposed convolution layer.

An image segmentation apparatus using global attention, the apparatus comprising:
an acquisition unit configured to obtain input information; and
a control unit configured to:
input the input information to a first fire module including a squeeze layer composed of a first convolution layer and an expand layer composed of a second convolution layer and a third convolution layer;
input output information of the first fire module and multi-scale input information to a first global attention module including global average pooling; and
perform encoding by max pooling output information of the first global attention module.

The apparatus of claim 10, wherein the acquisition unit obtains an input image, and
the control unit generates the input information in patch form by dividing the input image.

The apparatus of claim 10, wherein the first convolution layer and the second convolution layer have a first kernel size, and
the third convolution layer has a second kernel size.

The apparatus of claim 10, wherein the control unit:
generates an output value of the first convolution layer of the squeeze layer;
inputs the generated output value to each of the second convolution layer and the third convolution layer, which are arranged in parallel in the expand layer; and
generates the output information of the first fire module by concatenating output values of the second convolution layer and the third convolution layer.

The apparatus of claim 10, wherein the control unit:
generates input information for a second convolution layer by performing max pooling on the input information for the first convolution layer; and
generates the multi-scale input information for the second convolution layer by inputting the input information for the second convolution layer to each of a fourth convolution layer and a fifth convolution layer arranged in parallel.

The apparatus of claim 10, wherein the control unit:
performs global average pooling on the output information of the first fire module;
inputs a result value of the global average pooling to a sixth convolution layer; and
generates an attention coefficient by performing upsampling based on a result value of the sixth convolution layer and a result value generated by inputting the multi-scale input information to a seventh convolution layer.

The apparatus of claim 10, wherein the control unit inputs output information of the first fire module for the first convolution layer and output information of a second fire module for the second convolution layer to a second global attention module for the first convolution layer, and
the second fire module includes a squeeze layer composed of a first transposed convolution layer and an expand layer composed of a second transposed convolution layer and a third transposed convolution layer.

The apparatus of claim 16, wherein the control unit:
performs decoding by inputting, to the second fire module for the first convolution layer, a result value generated by upsampling output information of the second global attention module and the output information of the second fire module for the second convolution layer; and
calculates a final output value by inputting a decoded value calculated by the decoding to a classification layer.

The apparatus of claim 16, wherein the control unit:
generates an output value of the first transposed convolution layer of the squeeze layer; and
inputs the generated output value to each of the second transposed convolution layer and the third transposed convolution layer, which are arranged in parallel in the expand layer.