KR20220129406A

KR20220129406A - A method and apparatus for image segmentation using residual convolution based deep learning network

Info

Publication number: KR20220129406A
Application number: KR1020210034278A
Authority: KR
Inventors: 이범식; 챠이트라 다야난다
Original assignee: 조선대학교산학협력단
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2022-09-23
Also published as: KR102604217B1

Abstract

The present invention relates to an image segmentation method and apparatus using a residual convolution-based deep learning network. The image segmentation method using a residual convolution-based deep learning network, according to an embodiment of the present invention, comprises the steps of: (a) obtaining input information; (b) inputting the input information into a first fire module comprising a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter; (c) concatenating output information of the first fire module with the input information obtained through a residual connection; and (d) performing encoding by max pooling the result of the concatenation. The present invention allows a deep learning network architecture to be trained smoothly through a residual unit.

Description

A method and apparatus for image segmentation using residual convolution based deep learning network

The present invention relates to an image segmentation method and apparatus, and more particularly, to an image segmentation method and apparatus using a residual convolution-based deep learning network.

Magnetic resonance imaging (MRI) offers high contrast and relatively high resolution. It is therefore widely used to examine the human brain in clinical applications and scientific research.

In this setting, automatically segmenting brain tissue into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is very important.

Accurate tissue segmentation is difficult because of the complex structure of the brain and the tissue heterogeneity caused by MRI noise, bias fields, and partial volume effects.

Deep learning networks are being applied to segmentation tasks to address these problems because of their advantages, but research on them remains insufficient.

[Patent Document 1] Korean Patent No. 10-2089014

The present invention was devised to solve the above problems, and an object thereof is to provide an image segmentation method and apparatus using a residual convolution-based deep learning network.

Another object of the present invention is to provide an image segmentation method and apparatus using a fire module that includes a squeeze layer and an expand layer.

Another object of the present invention is to provide an image segmentation method and apparatus that concatenate the output information of a fire module with the input information obtained through a residual connection.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood from the description below.

To achieve the above objects, an image segmentation method using a residual convolution-based deep learning network according to an embodiment of the present invention may include: (a) obtaining input information; (b) inputting the input information into a first fire module that includes a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter; (c) concatenating the output information of the first fire module with the input information obtained through a residual connection; and (d) performing encoding by max pooling the result of the concatenation.

In an embodiment, step (a) may include: obtaining an input image; and dividing the input image to generate the input information in the form of patches.

In an embodiment, the first convolution filter and the second convolution filter may have a first kernel size, and the third convolution filter may have a second kernel size.

In an embodiment, step (b) may include: generating an output value of the first convolution filter of the squeeze layer; inputting the generated output value to each of the second convolution filter and the third convolution filter, which are arranged in parallel in the expand layer; and concatenating the output values of the second convolution filter and the third convolution filter to generate the output information of the first fire module.

In an embodiment, the image segmentation method using the residual convolution-based deep learning network may include, after step (d): max unpooling the encoded value produced by the encoding; inputting the result of the max unpooling into a second fire module that includes a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter; and performing decoding by concatenating the output information of the second fire module with the result of the max unpooling obtained through a residual connection.

In an embodiment, the image segmentation method using the residual convolution-based deep learning network may include, after the decoding, inputting the decoded value produced by the decoding into a classification layer (classify layer) to calculate a final output value.

In an embodiment, the classification layer may consist of a softmax activation function.

In an embodiment, an image segmentation apparatus using a residual convolution-based deep learning network may include: an acquisition unit configured to obtain input information; and a control unit configured to input the input information into a first fire module that includes a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter, to concatenate the output information of the first fire module with the input information obtained through a residual connection, and to perform encoding by max pooling the result of the concatenation.

In an embodiment, the acquisition unit may obtain an input image, and the control unit may divide the input image to generate the input information in the form of patches.

In an embodiment, the control unit may generate an output value of the first convolution filter of the squeeze layer, input the generated output value to each of the second convolution filter and the third convolution filter arranged in parallel in the expand layer, and concatenate the output values of the second convolution filter and the third convolution filter to generate the output information of the first fire module.

In an embodiment, the control unit may max unpool the encoded value produced by the encoding, input the result of the max unpooling into a second fire module that includes a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter, and perform decoding by concatenating the output information of the second fire module with the result of the max unpooling obtained through a residual connection.

In an embodiment, the control unit may input the decoded value produced by the decoding into a classification layer (classify layer) to calculate a final output value.

Specific details for achieving the above objects will become clear with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be configured in various different forms. The embodiments are provided so that the disclosure of the present invention is complete and so that a person having ordinary skill in the art to which the present invention pertains (hereinafter, "a person skilled in the art") is fully informed of the scope of the invention.

According to an embodiment of the present invention, a deep learning network architecture can be trained smoothly through a residual unit.

According to an embodiment of the present invention, feature collection using a residual convolutional layer can ensure better feature representation in the segmentation network.

According to an embodiment of the present invention, the network can be designed more efficiently, with fewer network parameters and better segmentation accuracy for brain MRI.

The effects of the present invention are not limited to those described above, and potential effects expected from the technical features of the present invention will be clearly understood from the description below.

FIG. 1 is a diagram illustrating a residual connection according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an image segmentation process using a residual convolution-based deep learning network according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an image segmentation process using a residual convolution-based deep learning network according to an embodiment of the present invention.
FIG. 4A is a diagram illustrating a fire module according to a conventional embodiment.
FIG. 4B is a diagram illustrating a fire module according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a classification result image according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a comparison of classification results based on a first data set according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a comparison of classification results based on a second data set according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a comparison of convolution units according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an image segmentation method using a residual convolution-based deep learning network according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating an image segmentation apparatus using a residual convolution-based deep learning network according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

Various features of the invention disclosed in the claims may be better understood in view of the drawings and the detailed description. The apparatus, methods, preparations, and various embodiments disclosed in the specification are provided for purposes of illustration. The disclosed structural and functional features are intended to enable a person skilled in the art to practice the various embodiments, and are not intended to limit the scope of the invention. The terms and sentences used are intended to explain the various features of the disclosed invention in an easily understandable way, and are not intended to limit the scope of the invention.

In describing the present invention, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

Hereinafter, an image segmentation method and apparatus using a residual convolution-based deep learning network according to an embodiment of the present invention will be described.

FIG. 1 is a diagram illustrating a residual connection according to an embodiment of the present invention.

Referring to FIG. 1, a block diagram of a residual connection, also called a shortcut connection, is shown. That is, a residual connection may mean skipping one or more layers.

A residual connection allows information to pass from an earlier layer to a later layer without loss and can prevent the vanishing gradient problem.

FIG. 2 is a diagram illustrating an image segmentation process using a residual convolution-based deep learning network according to an embodiment of the present invention.

Referring to FIG. 2, local detail may be more important than global information for identifying white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) in brain MRI. Therefore, patch-based input may be used to train the residual convolution-based deep learning network in order to capture local detail better and obtain accurate tissue segmentation.

In step 210, an input image composed of a plurality of slices may be obtained. For example, the input image may include a brain MRI image composed of cross-sectional slices.

In step 220, at least one input image may be extracted from among the plurality of input images. For example, at least one input image may be extracted from among the plurality of slice input images.

In step 230, the input image may be divided into input information in the form of a plurality of patches. In an embodiment, the dimensions of each input image may be height × width × slices (H×W×S). For example, the H×W plane of each slice may be zero-padded to a size of 256×256. For example, each slice-form input image may be divided into four uniform patches. For example, the size of each patch may be 128×128.
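As an illustration of step 230, the padding and patch split can be sketched in NumPy as follows. This is a minimal sketch rather than the disclosed implementation: the function names `pad_to_square` and `split_into_patches`, the centered placement of the slice inside the zero padding, and the 208×176 example slice size are all assumptions made for the example.

```python
import numpy as np

def pad_to_square(slice_2d, size=256):
    """Zero-pad a 2-D slice to size x size (centering is an assumption)."""
    h, w = slice_2d.shape
    if h > size or w > size:
        raise ValueError("slice is larger than the target size")
    top = (size - h) // 2
    left = (size - w) // 2
    padded = np.zeros((size, size), dtype=slice_2d.dtype)
    padded[top:top + h, left:left + w] = slice_2d
    return padded

def split_into_patches(padded, patch=128):
    """Split a square slice into non-overlapping patch x patch tiles, row by row."""
    size = padded.shape[0]
    return [padded[r:r + patch, c:c + patch]
            for r in range(0, size, patch)
            for c in range(0, size, patch)]

# Hypothetical slice size; real MRI volumes vary.
slice_2d = np.ones((208, 176), dtype=np.float32)
patches = split_into_patches(pad_to_square(slice_2d))
```

With the default sizes, each 256×256 slice yields four 128×128 patches, matching the example figures given in the description.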

In step 240, these patches may be input to the residual convolution-based deep learning network for training, and predicted segmentation results may be obtained for test data.

That is, according to the present invention, smaller input patches can reflect local detail better, and training the residual convolution-based deep learning network on them can yield higher segmentation accuracy. The input MRI slices may therefore be divided into uniform patches to capture fine detail, and these patches may be provided for training the residual convolution-based deep learning network.

Accordingly, patch-wise division of each input slice may be used to train the residual convolution-based deep learning network and provide better segmentation accuracy.

For example, according to the present invention, an improved U-SegNet model that integrates a fire module and residual convolution into U-SegNet may be used to segment brain tissue in magnetic resonance imaging (MRI).

The residual convolution-based deep learning network according to the present invention may exploit the strengths of U-SegNet, residual connections, and the squeeze and expand convolution layers of the fire module for the segmentation of brain MRI.

FIG. 3 is a diagram illustrating an image segmentation process using the residual convolution-based deep learning network 300 according to an embodiment of the present invention. FIG. 4B is a diagram illustrating a fire module according to an embodiment of the present invention.

Referring to FIG. 3, the residual convolution-based deep learning network 300 may include an encoder path, a decoder path, and a classification path. In addition, according to the present invention, fire modules (that is, the first fire module 310 and the second fire module 322) may be used in the encoder path and the decoder path to reduce the number of learnable parameters and the computational complexity.

In the encoder path, the patch-form input information may be input into the first fire module 310, which includes a squeeze layer 410 composed of a first convolution filter 412 and an expand layer 420 composed of a second convolution filter 422 and a third convolution filter 424.

In an embodiment, the first convolution filter 412 and the second convolution filter 422 may have a first kernel size, and the third convolution filter 424 may have a second kernel size. For example, the first kernel size may be 1×1 and the second kernel size may be 3×3, but the sizes are not limited thereto.

In an embodiment, an output value of the first convolution filter 412 of the squeeze layer 410 may be generated, the generated output value may be input to each of the second convolution filter 422 and the third convolution filter 424 arranged in parallel in the expand layer 420, and the output values of the second convolution filter 422 and the third convolution filter 424 may be concatenated (426) to generate the output information of the first fire module 310.
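The data flow through a fire module (squeeze, two parallel expand branches, concatenation) can be sketched in plain NumPy as below. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the helper names, the channels-last layout, the random initialization, and the filter count of 16 are hypothetical. The ReLU after the squeeze convolution follows the description of Equation 2; leaving the expand branches without an activation is an assumption, since the text does not specify one.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w, b):
    # x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)
    return x @ w + b

def conv3x3_same(x, w, b):
    # x: (H, W, C_in), w: (3, 3, C_in, C_out); zero padding keeps H and W.
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]), dtype=x.dtype)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out + b

def init_fire_params(c_in, n_filters, rng):
    # Squeeze uses n_filters/4 channels, each expand branch n_filters/2.
    sq, ex = n_filters // 4, n_filters // 2
    return {
        "w_sq": rng.standard_normal((c_in, sq)) * 0.1, "b_sq": np.zeros(sq),
        "w_e1": rng.standard_normal((sq, ex)) * 0.1, "b_e1": np.zeros(ex),
        "w_e3": rng.standard_normal((3, 3, sq, ex)) * 0.1, "b_e3": np.zeros(ex),
    }

def fire_module(x, p):
    s = relu(conv1x1(x, p["w_sq"], p["b_sq"]))       # squeeze layer (1x1)
    e1 = conv1x1(s, p["w_e1"], p["b_e1"])            # expand branch, 1x1
    e3 = conv3x3_same(s, p["w_e3"], p["b_e3"])       # expand branch, 3x3
    return np.concatenate([e1, e3], axis=-1)         # channel concatenation

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
y = fire_module(x, init_fire_params(3, 16, rng))
```

The output has as many channels as the assumed filter count, because the two expand branches each contribute half of them.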

The output information of the first fire module 310 may be concatenated (312) with the input information obtained through a residual connection. Through the concatenation 312, the output data of separate layers may be gathered into a single layer. In an embodiment, the concatenation 312 operation may mean merging two independent dimensions into one extended dimension.

In addition, encoding may be performed by max pooling (314) the result of the concatenation 312.
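The concatenation 312 and max pooling 314 can be sketched as follows. The sketch assumes a channels-last layout, a 2×2 pooling window with stride 2, and even spatial dimensions; none of these are stated in the description, and the function names are hypothetical.

```python
import numpy as np

def residual_concat(x, fire_out):
    # Join the fire module output and the block input along the channel axis.
    return np.concatenate([fire_out, x], axis=-1)

def max_pool_2x2(x):
    # x: (H, W, C) with even H and W; takes the max over each 2x2 window.
    h, w, c = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)   # toy block input
fire_out = np.ones((4, 4, 8))                            # stand-in fire output
merged = residual_concat(x, fire_out)
encoded = max_pool_2x2(merged)
```

Concatenation grows the channel count (here 8 + 2 = 10), while pooling halves each spatial dimension.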

In an embodiment, the encoder path may consist of a plurality of first fire modules 310, concatenations 312, and max pooling operations 314.

In the decoder path, the encoded value produced by the encoding may be max unpooled (320), and the result of the max unpooling 320 may be input into the second fire module 322, which includes a squeeze layer 410 composed of a first convolution filter 412 and an expand layer 420 composed of a second convolution filter 422 and a third convolution filter 424.

In an embodiment, decoding may be performed by concatenating (324) the output information of the second fire module 322 with the result of the max unpooling 320 obtained through a residual connection.

In an embodiment, a skip connection that transfers information from an encoder stage (e.g., the first fire module 310) to a decoder stage (e.g., the second fire module 322), or a transfer of pooling indices, may be performed.

In the classification path, after the decoding, the decoded value produced by the decoding may be input into a classification layer (classify layer) 330 to calculate a final output value. In an embodiment, the classification layer 330 may consist of a softmax activation function.
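A per-pixel softmax of the kind used by the classification layer 330 can be sketched as follows. The four class names and the assumption that the decoder output already has one channel per class are illustrative; the description only states that the layer consists of a softmax activation function.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the per-pixel maximum for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

CLASSES = ("background", "CSF", "GM", "WM")   # assumed label set

# Stand-in decoder output for one 128x128 patch, one channel per class.
logits = np.random.default_rng(0).standard_normal((128, 128, len(CLASSES)))
probs = softmax(logits)            # class probabilities per pixel
labels = probs.argmax(axis=-1)     # predicted class index per pixel
```

Each pixel receives a probability distribution over the classes, and the segmentation map is the index of the most probable class.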

According to the present invention, the combination of the fire module and the residual connection is a new extension of the U-SegNet model. The residual connections in the network allow important information to pass from earlier layers to later layers.

According to the present invention, the convolution layers in the encoder and decoder paths of the existing U-SegNet are replaced with fire modules, which reduces the number of parameters and the computational complexity.

According to the present invention, the proposed uniform patch-wise input helps the network focus on the details of individual patches and better preserve local spatial information.

According to the present invention, the proposed patch-wise, residual-based squeeze U-SegNet model readily gains accuracy and shows much better results than existing methods.

FIG. 4A is a diagram illustrating a fire module according to a conventional embodiment. FIG. 4B is a diagram illustrating a fire module according to an embodiment of the present invention.

Referring to FIG. 4A, the convolution layer of a conventional U-Net is shown, in which each convolution block includes a set of convolution filters and may take a feature map of height × width × channels as input.

Referring to FIG. 4B, the fire module 400 according to the present invention may consist of two parts: (i) a squeeze layer and (ii) an expand layer. For example, the fire module 400 may include the first fire module 310 and the second fire module 322.

The squeeze layer 410 may consist of a single first convolution layer 412 with a 1×1 kernel and a number of output channels equal to one quarter of the number of convolution filters used in U-Net.

The output of the squeeze layer 410 is fed to the expand layer 420, which consists of two parallel convolutions with kernel sizes of 1×1 and 3×3, namely the second convolution layer 422 and the third convolution layer 424. Each of these convolutions may use a number of output channels equal to one half of the number of convolution filters used in U-Net.

The outputs of these parallel convolutions may then be concatenated to form the output of the fire module 400.
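To make the parameter saving concrete, the following sketch counts weights and biases for one plain 3×3 convolution and for one fire module with the channel fractions described above. The comparison baseline (a single 3×3 convolution with `n` filters), the inclusion of bias terms, and the example sizes of 64 input channels and 64 filters are assumptions chosen for illustration; the symbol `n` is introduced here and does not come from the source.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def fire_params(c_in, n):
    """Parameters of a fire module whose U-Net counterpart has n filters."""
    squeeze = conv_params(1, c_in, n // 4)        # 1x1, n/4 output channels
    expand_1x1 = conv_params(1, n // 4, n // 2)   # 1x1, n/2 output channels
    expand_3x3 = conv_params(3, n // 4, n // 2)   # 3x3, n/2 output channels
    return squeeze + expand_1x1 + expand_3x3

c_in, n = 64, 64
plain = conv_params(3, c_in, n)   # 36928
fire = fire_params(c_in, n)       # 1040 + 544 + 4640 = 6224
ratio = plain / fire              # roughly 5.9x fewer parameters
```

Under these assumptions the fire module needs about one sixth of the parameters of the plain convolution while producing the same number of output channels.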

In an embodiment, the encoder path contains a series of fire modules, and the output of each fire module may be concatenated with its input to form a residual connection.

For example, let the input sample be denoted with a layer index l, where l indicates a layer of the residual convolutional network (RCNN). The convolution output of the squeeze block is given by Equation 1.

In Equation 1, the terms are the output of the squeeze layer 410 of the fire module 400, the kernel weights, and the bias term, and the superscript [1x1] indicates the size of the convolution kernel associated with each layer.

In an embodiment, the convolution output is passed to the standard ReLU activation function, as expressed by Equation 2.

In an embodiment, the output of the squeeze layer 410 is fed in parallel to the 1×1 and 3×3 kernels of the expand layer 420, and the results are concatenated to produce the output of the fire module 400, as expressed by Equation 3.

Equation 3 gives the output of the l-th fire module 400 of the network. The output of the fire module 400 may be concatenated with its input to form a residual connection.
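The equations themselves are not legible in this text. One reading that is consistent with the verbal descriptions of Equations 1 to 3 is sketched below. All symbols are assumptions introduced here: x_l is the input to the l-th fire module, s_l the squeeze convolution output, a_l its ReLU activation, y_l the fire module output, W and b the kernel weights and biases with the kernel size as a superscript, * a convolution, and square brackets channel-wise concatenation.

```latex
\begin{align}
s_l &= W_{s}^{[1\times 1]} * x_l + b_{s} \\
a_l &= \max\bigl(0,\; s_l\bigr) \\
y_l &= \Bigl[\, W_{e1}^{[1\times 1]} * a_l + b_{e1},\;\;
               W_{e3}^{[3\times 3]} * a_l + b_{e3} \,\Bigr]
\end{align}
```

The three lines correspond, in order, to the squeeze convolution, the ReLU activation, and the concatenated expand output; the exact notation of the original equations may differ.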

일 실시예에서, RCNN(residual convolutional network) 블록의 출력은 하기 <수학식 4>를 사용하여 계산할 수 있다.In an embodiment, the output of a residual convolutional network (RCNN) block may be calculated using Equation 4 below.

여기서,

은 숏컷 연결에 의해 수행되고, 요소 별 추가(element-wise addition)는 학습할 잔차 매핑을 나타낼 수 있다. here,

is performed by shortcut concatenation, and element-wise addition may indicate a residual mapping to be learned.

여기서,

은 RCNN 블록의 입력 샘플을 나타냅니다.

샘플은 각각 인코딩 및 디코딩 컨볼루션 단위에서 즉시 연속되는 맥스 풀링(max-pooling) 또는 언풀링 레이어(un-pooling layer)에 대한 입력으로 사용될 수 있다. here,

represents the input sample of the RCNN block.

Samples can be used as input to a max-pooling or un-pooling layer that is immediately continuous in encoding and decoding convolutional units, respectively.

In an embodiment, the residual block followed by max pooling represents the encoder unit, as shown in <Equation 5> below.

Here, the pooling indices are stored during each max-pooling operation and can be used to unpool the feature maps in the decoder.

In the decoder path, the max pooling of the encoder path is replaced with a max-unpooling operation in order to recover the original input image.
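The pairing of max pooling in the encoder with index-based max unpooling in the decoder can be illustrated with a minimal pure-Python sketch. It assumes a single-channel feature map, a 2x2 window, and a stride of 2; the function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical and the window size is not stated in the text above.

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2(x: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max pooling (stride 2) that also stores the position of each maximum."""
    pooled: Grid = []
    indices: List[List[Index]] = []
    for r in range(0, len(x) - 1, 2):
        row_vals, row_idx = [], []
        for c in range(0, len(x[0]) - 1, 2):
            # Collect the four values of the window together with their positions.
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            row_vals.append(value)
            row_idx.append(position)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(pooled: Grid, indices: List[List[Index]],
                   height: int, width: int) -> Grid:
    """Place each pooled value back at its stored position; all other cells are zero."""
    out = [[0.0] * width for _ in range(height)]
    for vals, idxs in zip(pooled, indices):
        for value, (r, c) in zip(vals, idxs):
            out[r][c] = value
    return out


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 4]]
pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

In this sketch the unpooled map is sparse: only the positions that held a maximum are restored, which is why the decoder still needs convolutions after unpooling to densify the feature map.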

The decoder may use the second fire module 322 to reduce the number of model parameters. As in the encoder, the second fire module 322 may consist of a 1x1 convolution whose number of output channels is one quarter (/4) of the channel count.

The output of the 1x1 convolution is fed into two parallel convolutions with 3x3 and 1x1 kernel sizes, each with one half (/2) of the channel count as output channels, and their outputs are concatenated to form the output of the second fire module 322.

Similar to the encoder, the output of the second fire module 322 may be concatenated with its input to form a residual block. During max unpooling, the residual output is unpooled according to the pooling indices, which are used to localize the feature maps.

In an embodiment, the feature map obtained from an encoding layer, which contains context information, may be concatenated with the feature map of the corresponding decoding layer to form a skip connection, as shown in <Equation 6> below.

These skip connections use both high-resolution and low-resolution feature information and allow the network to focus on the most relevant information during the upsampling operation.

The classification path may consist of a 1x1 convolutional layer with a softmax activation function, which outputs the reconstructed image.

For example, the softmax layer can predict four output classes: GM, WM, CSF, and background. Based on this feature representation, the input image is classified into one of the four output classes. In an embodiment, the cross-entropy loss may be used to measure the loss of the deep learning network.

The softmax layer takes the representation learned by the decoder (l) and interprets it as an output class. A probability score y' is assigned to each output class.

In an embodiment, when the number of output classes is defined as c, the probability score can be expressed as in <Equation 7> below.

In an embodiment, the cross-entropy loss function is used to calculate the network cost, as shown in <Equation 8> below.

Here, y and y' denote the ground-truth and predicted distribution scores, respectively, for each class i.
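The softmax scoring and cross-entropy cost described above can be sketched as follows. This is a minimal illustration using the standard textbook definitions, which are assumed (not confirmed) to match <Equation 7> and <Equation 8>; the function names and the small `eps` guard against log(0) are additions made for the example.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn c raw class scores into probability scores y' that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy between the ground-truth distribution y and the prediction y'."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))


# Four classes, e.g. GM, WM, CSF, and background; the scores are made up.
probabilities = softmax([2.0, 0.5, 0.1, -1.0])
loss = cross_entropy([1.0, 0.0, 0.0, 0.0], probabilities)
```

With a one-hot ground truth, the loss reduces to the negative log of the probability assigned to the correct class, so it approaches zero as that probability approaches 1.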

That is, according to the present invention, using a fire module that includes a squeeze layer composed of a 1x1 convolution filter and an expand layer composed of 1x1 and 3x3 filters reduces the number of learnable parameters and the computational requirements, and ultimately yields a smaller and more efficient deep learning network model.

The proposed method is tested on two sets of brain MRI. For example, the first data set may include T1-weighted brain MRIs of 416 subjects obtained from the OASIS database. Of these 416 subjects, the first 30 were used for model training and 20 others were used as the test data set.

The second data set may include MRIs from the Internet Brain Segmentation Repository (IBSR) data set. The training data set includes 12 subjects with manually annotated and verified ground-truth labels, and the remaining 6 subjects are used to test the model.

FIG. 5 is a diagram illustrating classification result images according to an embodiment of the present invention.

Referring to FIG. 5, the segmentation results for the axial, coronal, and sagittal planes of the first data set can be seen. In FIG. 5, (a) is the original input image, (b) is the ground-truth segmentation map, (c) is the predicted classification map, (d) is the predicted CSF (binary map), (e) is the predicted GM (binary map), and (f) is the predicted WM (binary map).

That is, the results confirm that the method according to the present invention achieves well-segmented results for GM, WM, and CSF.

In addition, to measure the segmentation performance objectively, the performance of the method according to the present invention can be evaluated using the quantitative metrics detailed in <Table 1> below.

<Table 1>
Dice similarity coefficient (DSC)
Jaccard index (JI)
Hausdorff distance (HD)
Mean square error (MSE)

DSC and JI are commonly used to compare volumes based on overlap and can be used to compare the ground truth with the result of an automated segmentation method. DSC is defined as twice the number of elements common to both sets divided by the sum of the number of elements in each set. Here, the two cardinality terms denote the number of elements in the ground-truth set and in the predicted segmentation set, respectively.

JI can be expressed in terms of DSC, as described in <Table 1>. The DSC and JI metrics measure the agreement between the predicted segmentation map and the corresponding ground-truth segmentation map.

In addition, segmentation performance can be measured in terms of the mean square error (MSE), which is the mean squared difference between the original values (x) and the predicted values (y).

HD is used to measure the dissimilarity between two sets in a metric space. Two sets with a small HD look almost identical.

HD and MSE can be calculated as shown in <Table 1>. Here, D denotes the Euclidean distance between two pixels, and R and C denote the image height and width, respectively.
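The four metrics described above can be sketched in pure Python as follows. The sketch uses the standard definitions (including the identity JI = DSC / (2 - DSC)), which are assumed to correspond to the formulas of <Table 1>; segmentations are represented as sets of (row, column) pixel coordinates, and all function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def dice(gt: Set[Pixel], pred: Set[Pixel]) -> float:
    """DSC: twice the number of shared elements over the sum of both set sizes."""
    if not gt and not pred:
        return 1.0  # two empty segmentations agree perfectly
    return 2 * len(gt & pred) / (len(gt) + len(pred))


def jaccard_from_dice(dsc: float) -> float:
    """JI expressed through DSC (standard identity)."""
    return dsc / (2 - dsc)


def hausdorff(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Symmetric Hausdorff distance with Euclidean pixel distance D (non-empty sets)."""
    def directed(p: Set[Pixel], q: Set[Pixel]) -> float:
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def mse(x: List[List[float]], y: List[List[float]]) -> float:
    """Mean squared difference over an image with R rows and C columns."""
    rows, cols = len(x), len(x[0])
    total = sum((x[r][c] - y[r][c]) ** 2 for r in range(rows) for c in range(cols))
    return total / (rows * cols)


# Toy example: the ground truth is a 2x2 square, the prediction is shifted right.
gt = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred = {(0, 1), (1, 1), (1, 2)}
dsc = dice(gt, pred)
ji = jaccard_from_dice(dsc)
hd = hausdorff(gt, pred)
```

Higher DSC and JI indicate better overlap, whereas lower HD and MSE indicate a closer match, which is how the comparison tables in this description should be read.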

To compare the segmentation performance of different network architectures, the conventional U-Net, SegNet, and U-SegNet models can be trained on the same data sets.

FIG. 6 is a diagram illustrating a comparison of classification results based on the first data set according to an embodiment of the present invention. FIG. 7 is a diagram illustrating a comparison of classification results based on the second data set according to an embodiment of the present invention.

Referring to FIGS. 6 and 7, the segmentation results of the conventional U-Net, SegNet, and U-SegNet can be compared with those of the method according to the present invention (Proposed Method).

That is, it can be seen that the quality of the segmentation maps generated by the method according to the present invention is much higher than that of the other existing methods.

U-Net is a shallow network and may not be sufficient to capture the spatial information of an image. When the image texture is complex, the segmentation accuracy of U-Net and SegNet can drop significantly. This can be seen in particular in the feature maps generated by SegNet and U-Net in (c) of FIG. 6 and (c) of FIG. 7, where the highlighted regions are dominated by a particular tissue, causing misclassification. U-SegNet produces better segmentation results through its combination of pooling indices and skip connections, but it still fails to capture fine details.

In the highlighted red boxes, U-SegNet can be seen to under-segment the WM and CSF tissues in general. Incorporating residual connections overcomes this limitation, reduces overfitting, and improves segmentation performance.

In addition, uniform patches improve the performance of the proposed method by focusing on relevant regions and capturing better feature representations.

This improved segmentation can be observed in the results obtained with the method according to the present invention. Similar results can be observed in the segmentations obtained from the IBSR images.

In particular, it can be seen that the residual convolution-based deep learning network according to the present invention captures more detailed information than the other architectures. These visual results confirm that the method according to the present invention can robustly recover both coarse and fine segmentation details while bypassing the distraction of ambiguous regions.

A quantitative analysis of the method according to the present invention was performed in comparison with the existing SegNet, U-Net, and U-SegNet methods, and the results are shown in <Table 2> to <Table 5>.

<Table 2> WM
Dataset  Plane     Metric   SegNet  U-Net  U-SegNet  Proposed Method
OASIS    Axial     DSC (%)  87.36   92.18  93.36     96.09
OASIS    Axial     JI (%)   77.18   85.50  87.56     92.48
OASIS    Axial     HD       5.09    4.40   4.2       3.4
OASIS    Coronal   DSC (%)  82.12   93.14  94.12     96.37
OASIS    Coronal   JI (%)   69.23   87.20  89.63     91.17
OASIS    Coronal   HD       5.4     4.14   3.9       3.2
OASIS    Sagittal  DSC (%)  82.42   92.44  93.25     96.33
OASIS    Sagittal  JI (%)   69.54   86.34  87.65     91.38
OASIS    Sagittal  HD       7.2     4.3    4.0       3.35
IBSR     Axial     DSC (%)  72.86   89.45  90.35     91.76
IBSR     Axial     JI (%)   65.34   81.56  82.49     84.23
IBSR     Axial     HD       6.51    5.14   4.6       4.2
IBSR     Coronal   DSC (%)  70.15   88.45  89.36     90.47
IBSR     Coronal   JI (%)   62.35   79.38  80.14     82.53
IBSR     Coronal   HD       6.3     5.45   5.5       4.6
IBSR     Sagittal  DSC (%)  71.53   86.84  87.32     89.52
IBSR     Sagittal  JI (%)   63.41   78.63  79.62     81.42
IBSR     Sagittal  HD       6.49    5.75   5.4       4.82

<Table 3> GM
Dataset  Plane     Metric   SegNet  U-Net  U-SegNet  Proposed Method
OASIS    Axial     DSC (%)  84.93   90.32  82.05     94.04
OASIS    Axial     JI (%)   72.20   82.35  85.27     90.55
OASIS    Axial     HD       5.7     4.3    4.0       3.7
OASIS    Coronal   DSC (%)  78.21   82.25  93.53     94.94
OASIS    Coronal   JI (%)   64.16   85.45  87.85     90.46
OASIS    Coronal   HD       4.6     4.2    4.12      3.42
OASIS    Sagittal  DSC (%)  80.23   91.13  92.56     95.05
OASIS    Sagittal  JI (%)   67.57   83.26  85.69     90.53
OASIS    Sagittal  HD       5.9     5.2    4.2       3.49
IBSR     Axial     DSC (%)  75.63   91.53  92.20     93.83
IBSR     Axial     JI (%)   67.32   85.41  86.15     87.52
IBSR     Axial     HD       6.53    4.87   4.2       4.0
IBSR     Coronal   DSC (%)  73.65   90.23  91.45     92.85
IBSR     Coronal   JI (%)   65.42   83.56  84.16     85.61
IBSR     Coronal   HD       6.21    5.17   4.8       4.53
IBSR     Sagittal  DSC (%)  74.62   89.46  90.44     91.71
IBSR     Sagittal  JI (%)   66.85   81.53  82.15     84.45
IBSR     Sagittal  HD       6.36    5.77   5.3       4.25

<Table 4> CSF
Dataset  Plane     Metric   SegNet  U-Net  U-SegNet  Proposed Method
OASIS    Axial     DSC (%)  80.67   89.82  91.64     93.88
OASIS    Axial     JI (%)   67.85   81.53  84.57     90.26
OASIS    Axial     HD       4.9     4.6    4.1       3.6
OASIS    Coronal   DSC (%)  74.03   89.53  91.46     93.62
OASIS    Coronal   JI (%)   61.26   82.36  84.25     89.27
OASIS    Coronal   HD       4.6     4.1    4.15      3.81
OASIS    Sagittal  DSC (%)  77.45   88.63  92.15     94.31
OASIS    Sagittal  JI (%)   63.51   81.26  85.36     89.44
OASIS    Sagittal  HD       6.3     4.4    4.15      3.56
IBSR     Axial     DSC (%)  68.42   84.34  84.95     85.64
IBSR     Axial     JI (%)   59.32   75.85  75.98     77.16
IBSR     Axial     HD       6.3     4.4    4.3       4.86
IBSR     Coronal   DSC (%)  66.54   83.65  84.15     85.83
IBSR     Coronal   JI (%)   57.32   76.86  76.94     77.86
IBSR     Coronal   HD       6.84    5.54   5.2       4.9
IBSR     Sagittal  DSC (%)  65.49   80.75  81.19     83.56
IBSR     Sagittal  JI (%)   54.86   73.96  74.10     75.28
IBSR     Sagittal  HD       6.99    5.83   5.6       5.1

<Table 5> MSE
Dataset  SegNet  U-Net  U-SegNet  Proposed Method
OASIS    0.021   0.008  0.006     0.003
IBSR     0.013   0.009  0.007     0.005

The deep learning network according to the present invention improved DSC by an average of 10%, 3.9%, and 2.3% over the conventional SegNet, U-Net, and U-SegNet methods, respectively, and achieved an MSE of 0.003, which is lower than that of SegNet, U-Net, and U-SegNet.

This performance difference can be explained by the fact that SegNet stores only the max-pooling indices. That is, the position of the maximum feature value in each pooling window is memorized for each encoder feature map and used to upsample the lower-level feature maps. As a result, translation invariance is often compromised.

In addition, U-SegNet tends to be insensitive to fine details and has difficulty identifying the boundaries between adjacent tissues such as WM and GM. The segmentation maps generated by these existing models may have relatively low resolution because of the pooling layers in the encoder stage. Maintaining high spatial resolution may therefore require removing the pooling layers.

However, since convolution is a local operation, the SegNet, U-Net, and U-SegNet models may be unable to learn global features from an image without pooling layers.

The method according to the present invention, which uses a residual convolution-based deep learning network combined with fire modules, can be a potential solution to the problems described above and can provide improved segmentation accuracy.

In the fire module, the input is convolved with 1x1 and 3x3 kernels whose outputs are concatenated, which makes it possible to capture global context without reducing the resolution of the segmentation map.

The information obtained from the fire module is further combined with the input through the residual connection, which improves the feature representation.

In this way, global information can be exchanged between layers without sacrificing resolution, and blurring of the semantic segmentation map can be reduced.

In addition, uniform input patches allow the network to focus more on local details. By selectively integrating spatial information through uniform patches, the feature maps and residual connections can help capture context information efficiently.

<Table 6> shows the number of learnable parameters and the computation time required by the method according to the present invention in comparison with the existing methods.

<Table 6>
Models           #Learnable parameters  Computation time (hrs)
SegNet           3,502,028              2.9
U-Net            4,832,324              3.5
U-SegNet         4,279,172              3.1
Proposed method  4,290,717              3.6

In an embodiment, a smaller model can be built by arranging a series of fire modules whose squeeze layers contain only 1x1 convolution filters.

The 1x1 filters of the squeeze layer reduce the number of parameters by reducing the number of input channels before they are fed to the expand layer.

The 1x1 filters of the expand layer combine channels and perform cross-channel pooling, but they cannot capture spatial structure.

The 3x3 convolution filters of the expand layer can identify spatial representations. Combining filters of these two sizes makes the model more expressive while operating with fewer parameters.

Accordingly, the fire module reduces the parameter count, which lowers the computational load and makes it possible to build a smaller CNN that still maintains high accuracy.
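The parameter saving of a fire module relative to a plain 3x3 convolution can be illustrated with a short calculation. The sketch assumes that the squeeze layer has one quarter, and each expand branch one half, of the module's output channel count (one reading of the "/4" and "/2" ratios in this description, whose base symbol is not reproduced), and that every filter has a bias term. The channel count of 64 is an arbitrary example value.

```python
def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def fire_params(in_ch: int, out_ch: int) -> int:
    """Parameters of a fire module under the assumed /4 and /2 channel ratios."""
    squeeze_ch = out_ch // 4   # assumption: squeeze layer has out_ch / 4 channels
    expand_ch = out_ch // 2    # assumption: each expand branch has out_ch / 2 channels
    squeeze = conv_params(1, in_ch, squeeze_ch)
    expand_1x1 = conv_params(1, squeeze_ch, expand_ch)
    expand_3x3 = conv_params(3, squeeze_ch, expand_ch)
    return squeeze + expand_1x1 + expand_3x3


plain = conv_params(3, 64, 64)   # ordinary 3x3 convolution, 64 -> 64 channels
fire = fire_params(64, 64)       # fire module, 64 -> 64 channels
print(plain, fire, round(plain / fire, 2))
```

Most of the saving comes from the 3x3 branch seeing only the squeezed channels instead of the full input, which is the effect the squeeze layer is described as providing.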

The method according to the present invention includes more layers than the existing methods, but it has fewer parameters than U-Net and requires about the same number of parameters as the U-SegNet model. The training time of the method according to the present invention is 3.6 hours, which is almost the same as that of the U-Net model.

FIG. 8 is a diagram illustrating a comparison of convolution units according to an embodiment of the present invention.

Referring to FIG. 8, an ablation study of the method according to the present invention integrated with different convolutional units can be performed to determine how each choice affects segmentation performance.

Here, in FIG. 8, (a) is the forward convolution unit, (b) is the squeeze convolution unit, (c) is the residual convolution unit, and (d) is the squeeze residual convolution unit (Proposed Method).

In an embodiment, all of these convolution units may be implemented with U-SegNet as the base network. The first model may consist of forward convolutions with ReLU activation in both the encoder and the decoder of the U-SegNet network.

The second model may be constructed by replacing the forward convolutions with fire modules that include squeeze and expand layers, as shown in FIG. 4(b).

The third model may be formed by combining the forward convolution output with the input through a residual connection, as shown in (c) of FIG. 8.

In the method according to the present invention, a combination of both the fire module and the residual connection may be used in the U-SegNet network.

In an embodiment, <Table 7> to <Table 10> show the performance of these four convolution units in terms of DSC score, number of model parameters, and time required for model training.

<Table 7> GM
Models                                  DSC    JI     HD
U-SegNet with forward convolution unit  92.05  85.27  93.36
U-SegNet with Fire module               93.72  86.06  93.65
U-SegNet with residual connection       94.09  88.58  95.11
Proposed Method                         95.04  90.55  96.09

<Table 8> WM
Models                                  DSC    JI     HD
U-SegNet with forward convolution unit  93.36  87.56  4.2
U-SegNet with Fire module               93.65  88.42  2.8
U-SegNet with residual connection       95.11  90.68  3.4
Proposed Method                         96.09  92.48  3.4

<Table 9> CSF
Models                                  DSC    JI     HD   MSE
U-SegNet with forward convolution unit  91.64  84.57  4.1  0.006
U-SegNet with Fire module               92.15  85.66  2.0  0.006
U-SegNet with residual connection       93.53  89.99  3.3  0.005
Proposed Method                         94.88  90.26  2.8  0.003

<Table 10>
Models                                  Computation time (5 epochs)  #Learnable parameters
U-SegNet with forward convolution unit  3.1 hours                    4,279,172
U-SegNet with Fire module               1.7 hours                    768,788
U-SegNet with residual connection       14 hours                     21,138,205
Proposed Method                         3.6 hours                    4,290,717

The model with simple forward convolutions achieves an overall DSC score of 92.35% with about 4 million parameters and requires 3.1 hours of model training.

The same model with the convolutions replaced by fire modules increases accuracy by 0.5%, while the number of learnable parameters decreases by a factor of 5.5 and the model training time decreases by about 50%.

Unlike forward convolutions, shortcut connections are always active and gradients can easily propagate backward, which improves accuracy.

However, this residual-based model has 21 million parameters and can take 14 hours to train, roughly four times the cost of the baseline U-SegNet.
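The ratios quoted in this comparison can be recomputed from the ablation tables with a few lines of arithmetic. The sketch below only copies the table values; the variable names are hypothetical, and the printed figures are approximate counterparts of the rounded numbers in the text.

```python
# DSC of the forward-convolution baseline for GM, WM, and CSF (ablation tables).
dsc_forward = [92.05, 93.36, 91.64]

# Learnable parameters and training time per model (ablation tables).
params = {"forward": 4_279_172, "fire": 768_788,
          "residual": 21_138_205, "proposed": 4_290_717}
hours = {"forward": 3.1, "fire": 1.7, "residual": 14.0, "proposed": 3.6}

mean_dsc_forward = sum(dsc_forward) / len(dsc_forward)
fire_param_reduction = params["forward"] / params["fire"]        # how many times fewer
fire_time_saving = 1.0 - hours["fire"] / hours["forward"]        # fraction of time saved
residual_param_growth = params["residual"] / params["forward"]   # how many times more
residual_time_growth = hours["residual"] / hours["forward"]

print(f"mean DSC (forward): {mean_dsc_forward:.2f}")
print(f"fire module: {fire_param_reduction:.2f}x fewer parameters, "
      f"{fire_time_saving:.0%} less training time")
print(f"residual model: {residual_param_growth:.2f}x parameters, "
      f"{residual_time_growth:.2f}x training time")
```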

Residual connections give better accuracy, but because of the large number of parameters and the high time complexity they may not be a suitable choice on their own for image segmentation tasks.

To overcome this problem, the present invention can use fire modules within the residual network. It can be seen that the fire-module-based network significantly reduces the number of learnable parameters and the computation time for model training while maintaining network accuracy.

A fire module with 1x1 convolutions in the squeeze layer helps reduce dimensionality and speeds up model convergence, while the expand layer, which combines 1x1 and 3x3 filters in parallel, helps prevent the gradient loss caused by the squeeze layer.

By combining the fire module with residual connections, the network achieves an overall DSC score of 95%, a 2.5% improvement over the baseline U-SegNet, with a training time of only 3.6 hours.

Accordingly, the shortcut connections carry residual information through each network layer and thereby increase segmentation accuracy, while the fire module reduces the model parameters and the computation time, confirming that the proposed method is superior to the existing methods.

In addition, the effect of patch size on training time and segmentation performance is examined. The experiment may be performed on the OASIS data set for three different patch sizes (128×128, 64×64, and 32×32).

In an embodiment, <Table 11> shows the segmentation performance in terms of DSC for the various patch sizes.

<Table 11>
Patch size  DSC WM  DSC GM  DSC CSF  JI WM  JI GM  JI CSF  Computation time (hours)
128x128     96.33   95.44   94.81    92.93  91.28  90.14   3.6
64x64       96.35   95.48   94.89    92.95  91.37  90.28   5.4
32x32       96.35   95.51   95.12    92.98  91.47  90.34   7.2

It can be seen that performance improves as the patch size decreases. This may be because smaller patches generate more training data for the network to train on.

Moreover, local regions can be reconstructed more accurately. However, with a 128×128 patch size, model training takes 3.6 hours, whereas with 32×32 patches the training time doubles for nearly the same accuracy.

Accordingly, based on the results in <Table 11>, the 128×128 patch size provides an appropriate balance between the DSC score and the computation time required for model training.

FIG. 9 is a diagram illustrating an image segmentation method using a residual convolution-based deep learning network according to an embodiment of the present invention.

Referring to FIG. 9, step S901 is a step of obtaining input information.

In an embodiment, an input image may be obtained and divided to generate input information in the form of patches. Here, the input image may include a slice image of an object. In addition, the input information may include patch images obtained by dividing the slice image into a plurality of patches.
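Dividing a slice image into uniform patches, as in step S901, can be sketched as follows. The sketch assumes non-overlapping square patches and a slice whose height and width are multiples of the patch size; whether the patches overlap and how borders are handled are not specified here, and the function name `split_into_patches` is hypothetical.

```python
from typing import List

Image = List[List[float]]


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Cut a 2D slice into non-overlapping patch x patch tiles, row by row."""
    height, width = len(image), len(image[0])
    patches: List[Image] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(tile)
    return patches


# A 4x4 toy slice split into 2x2 patches; real patches would be e.g. 128x128.
toy_slice = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_patches = split_into_patches(toy_slice, 2)
```

Under these assumptions, halving the patch size quadruples the number of patches per slice, which is consistent with the observation that smaller patches yield more training samples and longer training.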

Step S903 is a step of inputting the input information into the first fire module 310, which includes a squeeze layer 410 composed of a first convolution filter 412 and an expand layer 420 composed of a second convolution filter 422 and a third convolution filter 424.

Step S905 is a step of concatenating 312 the output information of the first fire module 310 with the input information obtained through a residual connection.

Step S907 is a step of performing encoding by max pooling 314 the result value calculated by the concatenation 312.

In an embodiment, after step S907, the encoded value produced by the encoding is max unpooled 320, and the result of the max unpooling 320 is input into the second fire module 322, which includes a squeeze layer 410 composed of the first convolution filter 412 and an expand layer 420 composed of the second convolution filter 422 and the third convolution filter 424. Decoding is then performed by concatenating 324 the output information of the second fire module 322 with the result of the max unpooling 320 obtained through the residual connection.

In an embodiment, after the decoding, the decoded value produced by the decoding may be input into a classification layer 330 to calculate the final output value.

In an embodiment, the classification layer 330 may be configured with a softmax activation function.
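The order of operations in steps S903 to S907 and in the subsequent decoding and classification can be traced at the level of tensor shapes. The sketch below tracks only (channels, height, width) and performs no convolution; it assumes padding that preserves spatial size, 2x2 pooling, a single encoder and decoder unit, and an arbitrary fire-module output of 64 channels, and it omits the encoder-to-decoder skip connections. All function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def fire(shape: Shape, out_channels: int) -> Shape:
    """Fire module: squeeze then parallel 1x1 and 3x3 expand; only channels change."""
    _, height, width = shape
    return (out_channels, height, width)


def concatenate(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation; the spatial sizes must match."""
    if a[1:] != b[1:]:
        raise ValueError("spatial sizes differ")
    return (a[0] + b[0], a[1], a[2])


def max_pool(shape: Shape) -> Shape:
    """2x2 max pooling halves height and width."""
    return (shape[0], shape[1] // 2, shape[2] // 2)


def max_unpool(shape: Shape) -> Shape:
    """2x2 max unpooling doubles height and width."""
    return (shape[0], shape[1] * 2, shape[2] * 2)


def encode(shape: Shape, out_channels: int) -> Shape:
    """S903 to S907: fire module, concatenate with the input, then max pool."""
    return max_pool(concatenate(fire(shape, out_channels), shape))


def decode(shape: Shape, out_channels: int) -> Shape:
    """Max unpool, fire module, then concatenate with the unpooled input."""
    unpooled = max_unpool(shape)
    return concatenate(fire(unpooled, out_channels), unpooled)


def classify(shape: Shape, num_classes: int = 4) -> Shape:
    """1x1 convolution with softmax: one score map per output class."""
    return (num_classes, shape[1], shape[2])


encoded = encode((1, 128, 128), 64)   # one-channel 128x128 patch
decoded = decode(encoded, 64)
output = classify(decoded)
```

The trace shows that the concatenation-based residual connection grows the channel count at each unit, while the pooling and unpooling pair restores the original patch resolution before classification.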

FIG. 10 is a diagram illustrating an image segmentation apparatus 1000 using a residual convolution-based deep learning network according to an embodiment of the present invention.

Referring to FIG. 10, the image segmentation apparatus 1000 may include an acquisition unit 1010, a control unit 1020, and a storage unit 1030.

획득부(1010)는 입력 정보를 획득할 수 있다. The acquisition unit 1010 may acquire input information.

일 실시예에서, 획득부(1010)는 통신부 또는 촬영부를 포함할 수 있다. 예를 들어, 획득부(1010)는 자기 공명 영상(MRI) 카메라를 포함할 수 있다. 또한, 획득부(1010)는 외부 전자 장치로부터 데이터를 수신하는 통신부를 포함할 수 있다. In an embodiment, the acquisition unit 1010 may include a communication unit or a photographing unit. For example, the acquisition unit 1010 may include a magnetic resonance imaging (MRI) camera. Also, the acquisition unit 1010 may include a communication unit that receives data from an external electronic device.

일 실시예에서, 통신부는 유선 통신 모듈 및 무선 통신 모듈 중 적어도 하나를 포함할 수 있다. 통신부의 전부 또는 일부는 '송신부', '수신부' 또는 '송수신부(transceiver)'로 지칭될 수 있다.In an embodiment, the communication unit may include at least one of a wired communication module and a wireless communication module. All or part of the communication unit may be referred to as a 'transmitter', 'receiver' or 'transceiver'.

The control unit 1020 may input the input information into the first fire module 310, which includes a squeeze layer 410 composed of a first convolution filter 412 and an expand layer 420 composed of a second convolution filter 422 and a third convolution filter 424.

In addition, the control unit 1020 may concatenate (312) the output information of the first fire module 310 with the input information obtained through a residual connection, and may perform encoding by max pooling (314) the result value calculated by the concatenation (312).

In one embodiment, the control unit 1020 may include at least one processor or microprocessor, or may be part of a processor. The control unit 1020 may also be referred to as a communication processor (CP). The control unit 1020 may control the operation of the image segmentation apparatus 1000 according to various embodiments of the present invention.

The storage unit 1030 may store the input information. The storage unit 1030 may also store the residual convolution-based deep learning network.

In one embodiment, the storage unit 1030 may be configured as a volatile memory, a non-volatile memory, or a combination of a volatile memory and a non-volatile memory. The storage unit 1030 may provide stored data at the request of the control unit 1020.

The components illustrated in FIG. 10 are not essential; in various embodiments of the present invention, the image segmentation apparatus 1000 may be implemented with more or fewer components than those illustrated in FIG. 10.

The above description merely illustrates the technical idea of the present invention, and those skilled in the art may make various changes and modifications without departing from the essential characteristics of the present invention.

The various embodiments disclosed herein may be performed in any order, and may be performed simultaneously or separately.

In one embodiment, in each figure described herein, at least one step may be omitted or added, and the steps may be performed in reverse order or simultaneously.

The embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of the present invention is not limited by these embodiments.

The scope of protection of the present invention should be interpreted according to the claims, and all technical ideas within the scope equivalent thereto should be understood to be included in the scope of the present invention.

300: residual convolution-based deep learning network
310: first fire module
312: concatenation
314: max pooling
320: max unpooling
322: second fire module
324: concatenation
330: classification layer
400: fire model
410: squeeze layer
412: first convolution layer
420: expand layer
422: second convolution layer
424: third convolution layer
426: concatenation
1000: image segmentation apparatus
1010: acquisition unit
1020: control unit
1030: storage unit

Claims

1. An image segmentation method using a residual convolution-based deep learning network, the method comprising:
(a) obtaining input information;
(b) inputting the input information into a first fire module including a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter;
(c) concatenating output information of the first fire module and the input information obtained through a residual connection; and
(d) performing encoding by max pooling a result value calculated by the concatenation.

2. The method of claim 1,
wherein step (a) comprises:
acquiring an input image; and
generating the input information in the form of patches by dividing the input image.

3. The method of claim 1,
wherein the first convolution filter and the second convolution filter are configured with a first kernel size, and
the third convolution filter is configured with a second kernel size.

4. The method of claim 1,
wherein step (b) comprises:
generating an output value of the first convolution filter of the squeeze layer;
inputting the generated output value to each of the second convolution filter and the third convolution filter, which are arranged in parallel in the expand layer; and
generating the output information of the first fire module by concatenating the output values of the second convolution filter and the third convolution filter.

5. The method of claim 1,
further comprising, after step (d):
max unpooling the encoded value calculated by the encoding;
inputting the result of the max unpooling to a second fire module including a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter; and
performing decoding by concatenating output information of the second fire module and the result of the max unpooling obtained through a residual connection.

6. The method of claim 5,
further comprising, after the decoding is performed:
calculating a final output value by inputting the decoded value calculated by the decoding to a classification layer.

7. The method of claim 6,
wherein the classification layer is composed of a softmax activation function.

8. An image segmentation apparatus using a residual convolution-based deep learning network, the apparatus comprising:
an acquisition unit configured to acquire input information; and
a control unit configured to:
input the input information into a first fire module including a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter;
concatenate output information of the first fire module and the input information obtained through a residual connection; and
perform encoding by max pooling a result value calculated by the concatenation.

9. The apparatus of claim 8,
wherein the acquisition unit acquires an input image, and
the control unit divides the input image to generate the input information in the form of patches.

10. The apparatus of claim 8,
wherein the first convolution filter and the second convolution filter are configured with a first kernel size, and
the third convolution filter is configured with a second kernel size.

11. The apparatus of claim 8,
wherein the control unit:
generates an output value of the first convolution filter of the squeeze layer;
inputs the generated output value to each of the second convolution filter and the third convolution filter, which are arranged in parallel in the expand layer; and
generates the output information of the first fire module by concatenating the output values of the second convolution filter and the third convolution filter.

12. The apparatus of claim 8,
wherein the control unit:
max unpools the encoded value calculated by the encoding;
inputs the result of the max unpooling to a second fire module including a squeeze layer composed of a first convolution filter and an expand layer composed of a second convolution filter and a third convolution filter; and
performs decoding by concatenating output information of the second fire module and the result of the max unpooling obtained through a residual connection.

13. The apparatus of claim 12,
wherein the control unit calculates a final output value by inputting the decoded value calculated by the decoding to a classification layer.

14. The apparatus of claim 13,
wherein the classification layer is composed of a softmax activation function.