KR20140025639A

KR20140025639A - Parallel processing of 3D medical image registration by GP-GPU

Info

Publication number: KR20140025639A
Application number: KR1020120091184A
Authority: KR
Inventors: 김학일; 이성철; 최학남; 곽규성
Original assignee: 인하대학교 산학협력단; 아주대학교산학협력단
Priority date: 2012-08-21
Filing date: 2012-08-21
Publication date: 2014-03-05
Also published as: KR101471646B1

Abstract

The present invention provides a medical image registration technique needed to monitor disease in medical images of the same patient acquired at different times. Specifically, it provides a parallel processing method for three-dimensional (3D) medical image registration using GP-GPU. GP-GPU-based parallel processing is applied so that 3D medical image data can be processed quickly and accurately while combining the strengths of the existing segmentation-based and intensity-based techniques. [Reference numerals] (AA) Input two volumes; (BB) Scaling; (CC) Initialization step; (DD) Extract binary volume; (EE) Calculate the centroid; (FF) Calculate the principal axes of inertia; (GG) Optimization step; (HH) Transform floating volume; (II) Update P; (JJ) Is m_val minimum?; (KK) Optimize; (LL) Registered volume; (MM) P: Set of transformation parameters; (NN) m_val: Voxel similarity measurement value

Description

[0001] Parallel processing method for 3D medical image registration using GP-GPU

The present invention relates to a method for performing 3D medical image registration at high speed and, more particularly, to a parallel processing method for 3D medical image registration using GP-GPU (General-Purpose computation on Graphics Processing Units).

Conventionally, to treat a disease such as osteoarthritis, the diseased joint is generally imaged, as shown in Fig. 7, as two-dimensional images or image slices using a scanner such as CT (Computed Tomography), MRI (Magnetic Resonance Imaging), PET (Positron Emission Tomography), or SPECT (Single Photon Emission Computed Tomography), and these images are then used to monitor or treat the disease.

In addition, monitoring a disease requires at least two sets of medical images of the same patient. Here, the two sets are images obtained at different times, for example an image taken when the patient was first hospitalized and an image taken six months or a year later. These two images are called the source (input) image and the target image, respectively.

Furthermore, when the patient moves during acquisition of the images of the diseased location, or when disease has developed, it is not easy for physicians to judge by inspecting the two-dimensional images one at a time by eye. For this reason a three-dimensional image registration technique, one of the image processing techniques, is needed.

That is, image registration generally refers to a technique that maps images of one object taken at different times or from different viewpoints, or images obtained in different coordinate systems, into a common coordinate system.

In medical imaging, registration plays an important role when combining two images acquired by different scanners, or two images acquired at different times by the same scanner.

Here, for example, a registration technique for head CT images and one for knee CT images may differ from each other because the desired results differ.

Existing registration techniques are broadly classified, according to how the registration is performed, into feature-based, segmentation-based, and intensity-based techniques.

First, a feature-based technique detects corresponding features in the input image and the target image and performs registration using those features or landmarks. In image processing, a feature is anything that is representative of an image.

Common features in an image include points, corners, lines, boundaries, and contours, and the features used differ depending on the part of the body.

These features may be located manually by a specialist, or automatically by image processing algorithms that detect features in an image, such as SIFT and SURF.

Because a feature-based technique matches only features, registration is fast but accuracy is reduced. Conversely, if an expert locates the features manually, accuracy improves but the procedure takes a long time.

A segmentation-based technique is similar to a feature-based one: it finds the portions that are unchanged between the two images (input and target) and performs registration on those portions.

More specifically, when segmenting the bone region in a knee image, for example, the segmentation can be done manually or automatically. Among image segmentation algorithms, ACM (Active Contour Model) and Mean Shift are frequently used in medical image processing. Like the feature-based technique, the segmentation-based technique matches only the segmented regions of the image, so registration time is short.

Apart from the feature-based and segmentation-based techniques above, an intensity-based registration technique uses the pixel values of the images directly, without detecting features or finding segmented regions.

A technique using pixel values may use the whole image or only a sub-image.

Registration with a sub-image is called template matching. In medical imaging, registration with the whole image is used more often than registration with a sub-image, because it gives reliable results for clinical applications.

When the pixel values of the two images (input and target) are compared, the comparison is computed with a similarity metric, and the metric used depends on the source images.

For example, the metric applied when registering an MRI image with a CT image differs from the one applied when registering two CT images. Commonly used metrics include MI (mutual information), NMI (normalized mutual information), NCC (normalized cross-correlation), SSD (sum of squared differences), and SAD (sum of absolute differences).

MI and NMI are used to register two images acquired by different scanners, such as MRI/CT, whereas NCC, SSD, and SAD are applied to two images acquired by the same type of scanner, such as CT/CT.
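
As a minimal illustration of the two simplest same-modality metrics named above, the sketch below computes SSD and SAD over flattened intensity lists. The function names and the flat-list representation are assumptions made for the example, not part of the patent.

```python
def ssd(a, b):
    """Sum of squared differences between two equally long intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sad(a, b):
    """Sum of absolute differences between two equally long intensity lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    source = [1, 2, 3]
    target = [1, 4, 0]
    print(ssd(source, target), sad(source, target))
```

Both values are zero for identical images and grow as the images diverge, so a registration driven by them would minimize the metric.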

Thus, because the intensity-based technique compares the pixel values of the whole image using a metric, it ensures high accuracy but takes a long time to run.

As described above, each conventional registration technique has its own advantages and disadvantages. The feature-based and segmentation-based techniques are fast but cannot guarantee registration accuracy, because features or segmented regions must be obtained before matching. The intensity-based technique can guarantee accuracy because it computes directly on the pixel values, but it is slow because pixel values are compared one by one.

Therefore, solving these problems of the prior art in the field of medical image processing requires a new registration technique that guarantees accuracy while also running quickly, but no medical image registration technique satisfying both requirements has yet been proposed.

In the medical field, the use of 3D images has been very limited because of the high computational cost. Recently, however, with the growing use of multi-core CPUs and graphics processing units (GPUs), the computation-time problem is gradually being resolved through parallel processing.

More specifically, among techniques that accelerate image registration using a GPU, there is a conventional GPU parallel processing approach using CUDA (Compute Unified Device Architecture), a C/C++-like programming environment for programming NVIDIA GPUs, and various image processing acceleration methods based on CUDA have been proposed.

However, algorithms using CUDA run only on NVIDIA hardware. To solve this problem, it is desirable to provide an algorithm that can run on any hardware in general, rather than on a specific product of a specific company.

That is, it is desirable to provide an algorithm or method that can further accelerate 3D medical image registration on a GPU using a more general parallel processing framework not developed for specific hardware, such as OpenCL, but no algorithm or method satisfying all of these requirements has yet been proposed.

The present invention aims to solve the problems of the prior art described above. Accordingly, an object of the present invention is to provide a faster and more accurate medical image registration method, for the registration needed to monitor disease in medical images of the same patient acquired at different times, by combining the advantages of the conventional segmentation-based and intensity-based techniques.

Another object of the present invention is to provide a parallel processing method for 3D medical image registration that is faster and more accurate than conventional medical image registration techniques, by applying GP-GPU (General-Purpose computation on Graphics Processing Units) based parallel processing so that 3D data can be processed quickly.

To achieve the above objects, the present invention provides a parallel processing method of image registration for registering, at high speed, two images of the same patient acquired at different times, characterized in that the processing of registering the two images is performed in parallel by a CPU and a GPU using GP-GPU (General-Purpose computation on Graphics Processing Units).

Here, the method comprises: an input step of receiving two images of the same patient acquired at different times; a scaling step of scaling the images received in the input step; an initialization step of extracting initial transformation parameters from the scaled images; an optimization step of optimizing the initial transformation parameters extracted in the initialization step to obtain final parameters; and a visualization step of registering the input images using the final parameters obtained in the optimization step and displaying them on a display device. The input, scaling, initialization, and visualization steps are performed by the CPU, and the optimization step is performed by the GPU in parallel with the processing of the CPU.

In addition, the CPU is controlled by a program written in C, and the GPU is controlled using OpenCL or CUDA.

In the scaling step, the input images are scaled using a scaling parameter determined by the voxel sizes of the input images.

Furthermore, the voxel size information is obtained from the DICOM (Digital Imaging and Communications in Medicine) image header of the input image: in the x-y plane the voxel size is taken from the pixel spacing, and along the z direction it is obtained from the slice information in the header of the input DICOM image. Assuming that the voxel size of the first input image V₁ is smaller than that of the second input image V₂, the scaling parameter is calculated using the following equation,

(Here, V_1i is the voxel size of volume V₁ and V_2i is the voxel size of volume V₂.)

After the scaling parameter is calculated, V₁ is scaled down, V₂ is defined as the reference volume (V_r), and V₁ is set as the floating volume (V_f).
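
The scaling equation itself is not reproduced in this text. As a minimal sketch, assuming the scaling parameter is the per-axis ratio of the two voxel sizes (s_i = V_1i / V_2i for i in x, y, z), the scale-down of V₁ could be computed as follows. The function names and the example voxel sizes are hypothetical.

```python
def scaling_factors(voxel_size_1, voxel_size_2):
    """Per-axis ratio V_1i / V_2i (assumed form of the scaling parameter)."""
    return tuple(v1 / v2 for v1, v2 in zip(voxel_size_1, voxel_size_2))


def scaled_shape(shape, factors):
    """Grid dimensions of V1 after resampling it to the voxel size of V2."""
    return tuple(max(1, round(n * s)) for n, s in zip(shape, factors))


if __name__ == "__main__":
    # Hypothetical voxel sizes in mm, ordered (x, y, z).
    s = scaling_factors((0.4, 0.4, 1.0), (0.8, 0.8, 2.0))
    print(s, scaled_shape((512, 512, 200), s))
```

With V₁ having the smaller voxels, every factor is below 1, which matches the statement that V₁ is scaled down before it becomes the floating volume.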

In the initialization step, the initial transformation parameters comprise three rotation parameters and three translation parameters, and the relative position between the reference volume V_r and the floating volume V_f is defined by the parameter set P = {t_x, t_y, t_z, α, β, γ}, where t_x, t_y, t_z are the translation amounts and α, β, γ are the rotation angles of the floating volume about the 3D axes relative to the reference volume.

The initialization step comprises: binarizing both the reference volume and the floating volume; forming 3D vectors from the coordinates of the voxels of the binarized reference and floating volumes, and computing a centroid and an inertia matrix from those vectors; computing the rotation angles of each volume from the eigenvectors of its inertia matrix; computing three initial rotation parameters by subtracting the rotation angles (x, y, z) of the floating volume from those of the reference volume; and computing three initial translation parameters by subtracting the centroid (x, y, z) of the floating volume from that of the reference volume.

Here, letting B(x, y, z) be the initial binarized volume of the 3D volume V(x, y, z), the reference volume and the floating volume are binarized using the following equation.

(Here, x, y, z are the coordinates of a voxel in the image, and τ is the threshold value that defines the binarized volume.)
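
The binarization equation is missing from this text. A minimal sketch, assuming the usual threshold rule B(x, y, z) = 1 where V(x, y, z) ≥ τ and 0 otherwise (the direction of the comparison is an assumption), with the volume stored as nested lists indexed `volume[z][y][x]`:

```python
def binarize(volume, tau):
    """Return a 0/1 volume: 1 where the intensity is at least tau (assumed rule)."""
    return [[[1 if v >= tau else 0 for v in row] for row in plane]
            for plane in volume]


if __name__ == "__main__":
    # Tiny hypothetical volume: one slice of 2 x 3 intensities.
    ct = [[[100, 400, 1200],
           [50, 700, 90]]]
    print(binarize(ct, 400))
```

For CT knee data a threshold chosen in the bone intensity range would leave mainly the bone structure set to 1, which is what the later centroid and inertia computations operate on.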

The inertia matrix is defined by the following equation, and the principal axes of inertia are defined by the eigenvectors of the inertia matrix.

(Here, the terms in the equation are the object moments, the function f(x, y, z) represents the image content of the voxel data, and x_c, y_c, z_c denote the centroid of the object. The three eigenvectors computed from the inertia matrix represent the principal axes of inertia of the object.)
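
Since the inertia matrix equation did not survive in this text, the sketch below uses the standard inertia tensor built from second-order central moments, which is an assumption about the intended form. It takes the foreground voxel coordinates of a binarized volume (so f = 1 for every listed voxel); the function name is hypothetical.

```python
def centroid_and_inertia(coords):
    """Centroid and 3x3 inertia matrix of foreground voxels given as (x, y, z)."""
    n = len(coords)
    xc = sum(p[0] for p in coords) / n
    yc = sum(p[1] for p in coords) / n
    zc = sum(p[2] for p in coords) / n

    def mu(p, q, r):
        # Central moment of order (p, q, r) about the centroid.
        return sum((x - xc) ** p * (y - yc) ** q * (z - zc) ** r
                   for x, y, z in coords)

    inertia = [
        [mu(0, 2, 0) + mu(0, 0, 2), -mu(1, 1, 0), -mu(1, 0, 1)],
        [-mu(1, 1, 0), mu(2, 0, 0) + mu(0, 0, 2), -mu(0, 1, 1)],
        [-mu(1, 0, 1), -mu(0, 1, 1), mu(2, 0, 0) + mu(0, 2, 0)],
    ]
    return (xc, yc, zc), inertia


if __name__ == "__main__":
    print(centroid_and_inertia([(0, 0, 0), (2, 0, 0)]))
```

The principal axes would then be the eigenvectors of this symmetric matrix; a symmetric eigen-solver such as `numpy.linalg.eigh` could supply them, which is omitted here to keep the sketch dependency-free.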

The matrix form of the eigenvectors is given by the following equation,

and by solving the matrix E for the rotation matrix R (E = R), the rotation angles α, β, γ are calculated by the following equation,

where the rotation matrix is given by the following equation.

R = R_γ × R_β × R_α

(Here, α, β, γ are the Euler angles about the respective 3D axes, and R_α, R_β, R_γ are the corresponding elementary rotation matrices.)
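
The elementary rotation matrices are not reproduced in this text. The sketch below assumes the common convention in which α, β, γ rotate about the x, y, and z axes respectively, composes R = R_γ × R_β × R_α, and recovers the angles from a given rotation matrix (for example, the eigenvector matrix E). The axis assignment and function names are assumptions.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_euler(alpha, beta, gamma):
    """R = R_gamma (about z) * R_beta (about y) * R_alpha (about x)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    r_alpha = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    r_beta = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    r_gamma = [[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]]
    return matmul3(r_gamma, matmul3(r_beta, r_alpha))


def euler_from_rotation(r):
    """Invert rotation_from_euler; valid while cos(beta) > 0."""
    beta = -math.asin(max(-1.0, min(1.0, r[2][0])))
    alpha = math.atan2(r[2][1], r[2][2])
    gamma = math.atan2(r[1][0], r[0][0])
    return alpha, beta, gamma


if __name__ == "__main__":
    r = rotation_from_euler(0.1, -0.2, 0.3)
    print(euler_from_rotation(r))
```

Under these assumptions, the initial rotation parameters would be the differences between the angles recovered for the reference volume and those recovered for the floating volume.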

Furthermore, the optimization step comprises: transforming the floating volume, for each voxel of the floating volume, using a rigid body transformation matrix and a tri-linear interpolation operation; measuring a similarity score between the transformed floating volume and the reference volume; and repeating the above steps for all voxels until the similarity score is maximized.
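
Tri-linear interpolation is needed because a transformed voxel position generally falls between grid points. A minimal sketch, assuming a volume stored as nested lists indexed `volume[z][y][x]` with at least two samples per axis and with out-of-range positions clamped to the border (the clamping policy is an assumption):

```python
def trilinear(volume, x, y, z):
    """Interpolate volume[z][y][x] at a fractional position (x, y, z)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Clamp the sample position to the valid grid range.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)
    # Lower corner of the enclosing cell and fractional offsets inside it.
    x0, y0, z0 = min(int(x), nx - 2), min(int(y), ny - 2), min(int(z), nz - 2)
    fx, fy, fz = x - x0, y - y0, z - z0
    result = 0.0
    for dz, wz in ((0, 1.0 - fz), (1, fz)):
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                result += wz * wy * wx * volume[z0 + dz][y0 + dy][x0 + dx]
    return result


if __name__ == "__main__":
    # 2 x 2 x 2 volume whose value at (x, y, z) is x + 2y + 4z.
    vol = [[[x + 2 * y + 4 * z for x in range(2)] for y in range(2)]
           for z in range(2)]
    print(trilinear(vol, 0.5, 0.5, 0.5))
```

Each output voxel depends only on eight neighbouring input voxels, which is why this step maps naturally onto one GPU work-item per voxel.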

In the optimization step, if the relationship between the two input images is taken to be a rigid body transformation matrix M defined by three translation parameters and three rotation parameters, the rigid body transformation matrix M is expressed by the following equation.

M = T(t)R

(Here, T(t) and R are the translation vector and the rotation matrix in homogeneous coordinates.)
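
In homogeneous coordinates the product T(t)R is a 4 × 4 matrix whose upper-left 3 × 3 block is R and whose last column holds the translation. The following sketch builds that matrix and applies it to a point; the function names and the example rotation are illustrative assumptions.

```python
def rigid_matrix(rotation, translation):
    """4x4 homogeneous matrix M = T(t) R from a 3x3 rotation and a 3-vector t."""
    m = [list(rotation[r]) + [translation[r]] for r in range(3)]
    m.append([0.0, 0.0, 0.0, 1.0])
    return m


def apply_rigid(m, point):
    """Transform a 3D point by M: rotate first, then translate."""
    h = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[r][c] * h[c] for c in range(4)) for r in range(3))


if __name__ == "__main__":
    # Hypothetical example: 90 degree rotation about z, then shift by (1, 2, 3).
    rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    m = rigid_matrix(rot_z_90, (1.0, 2.0, 3.0))
    print(apply_rigid(m, (1.0, 0.0, 0.0)))
```

Because the matrix is the same for every voxel, it can be computed once on the host for each candidate parameter set and then passed to the device that transforms the volume.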

The similarity score is measured using the following NCC (normalized cross-correlation) function, which quantifies the degree of similarity between the two volumes in the optimization step.

(Here, n is the total number of voxels; i and j are voxel indices; V_r(x_i) and V_f(x_j) are the intensity values at points x_i and x_j, where x denotes the x-, y-, z-coordinates; and V̄_r and V̄_f denote the mean intensity values.)
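
The NCC equation is not reproduced in this text. The sketch below uses the common zero-mean normalized form, in which the mean-subtracted intensities are correlated and divided by the product of their norms; that this matches the patent's exact expression is an assumption, as are the function name and the flat-list representation of the volumes.

```python
import math


def ncc(reference, floating):
    """Zero-mean normalized cross-correlation of two equally long intensity lists."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_f = sum(floating) / n
    num = sum((r - mean_r) * (f - mean_f) for r, f in zip(reference, floating))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in reference)
                    * sum((f - mean_f) ** 2 for f in floating))
    # A constant volume has no variance; report zero correlation in that case.
    return num / den if den else 0.0


if __name__ == "__main__":
    a = [1.0, 2.0, 4.0, 7.0]
    print(ncc(a, a), ncc(a, [2 * v + 3 for v in a]))
```

The score is insensitive to a linear change in brightness or contrast, which suits two CT acquisitions of the same patient. The numerator and both variance sums are reductions over voxels, so they can be accumulated in parallel.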

Furthermore, the method is characterized in that, in the optimization step, the transformation of the floating volume and the measurement of the similarity score are processed in parallel by the GPU.

As described above, according to the present invention, combining only the advantages of the conventional segmentation-based and intensity-based techniques makes it possible to provide a medical image registration method that is faster and more accurate than conventional medical image registration techniques.

In addition, according to the present invention, applying GP-GPU (General-Purpose computation on Graphics Processing Units) based parallel processing makes it possible to provide a parallel processing method for 3D medical image registration using GP-GPU that combines only the advantages of the conventional segmentation-based and intensity-based techniques and processes 3D medical image data faster and more accurately.

Fig. 1 is a flowchart schematically showing the overall processing flow of an image registration method for registering two images.
Fig. 2 is a diagram schematically showing the overall configuration of the parallel processing method for 3D medical image registration using GP-GPU according to an embodiment of the present invention.
Fig. 3 is a diagram schematically showing the configuration of the OpenCL platform model.
Fig. 4 is a diagram schematically showing the OpenCL NDRange configuration.
Fig. 5 is a diagram schematically showing the configuration of the OpenCL memory model, namely the memory hierarchy defined by OpenCL.
Fig. 6 is a diagram schematically showing the concept of the design of the OpenCL implementation for the 3D transformation.
Fig. 7 is a diagram showing a knee image of a patient used for image registration.

Hereinafter, the parallel processing method for 3D medical image registration using GP-GPU according to the present invention is described in detail with reference to the accompanying drawings.

It should be noted that the following description is only an embodiment for carrying out the present invention, and that the present invention is not limited to the contents of the embodiment described below.

That is, as described below, the present invention aims to provide a fast and accurate medical image registration method, in a method of registering images of the same patient acquired at different times in order to monitor disease, by combining only the advantages of the conventional segmentation-based and intensity-based techniques.

To this end, as described below, the present invention first estimates initial transformation parameters using a segmentation-based technique, and then applies an intensity-based technique to optimize those initial parameters and automatically obtain the final parameters. Because intensity-based optimization takes a long time, a GP-GPU-based parallel processing technique is applied so that the 3D data can be processed quickly, thereby increasing the processing speed.

Next, a specific embodiment of the parallel processing method for 3D medical image registration using GP-GPU according to the present invention is described in detail with reference to the accompanying drawings.

In the following embodiment, the present invention is described through an example of registering 3D knee images acquired at different times with a CT scanner, but the present invention is not limited to this case.

That is, the parallel processing method of medical image registration according to the embodiment of the present invention first extracts initial transformation parameters computed from the principal axes of inertia of the binary bone structures of the input knee bone images.

Next, the initial transformation parameters are optimized by minimizing the similarity metric with a downhill simplex optimizer.
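
For readers unfamiliar with the downhill simplex (Nelder-Mead) method, the following is a compact textbook-style sketch, not the patent's implementation. A toy quadratic stands in for the registration cost; in the method described here the cost would instead evaluate the similarity metric for a six-element parameter set P. All names and default settings are assumptions.

```python
def nelder_mead(cost, start, step=1.0, tol=1e-10, max_iter=1000):
    """Minimise cost(p) over a list of parameters with a downhill simplex."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        simplex.append(p)
    values = [cost(p) for p in simplex]

    for _ in range(max_iter):
        # Order the vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = cost(reflected)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = cost(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-0.5)
            f_c = cost(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                for k in range(1, n + 1):
                    simplex[k] = [b + 0.5 * (x - b)
                                  for b, x in zip(best, simplex[k])]
                    values[k] = cost(simplex[k])

    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]


if __name__ == "__main__":
    toy_cost = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(nelder_mead(toy_cost, [0.0, 0.0]))
```

The method needs only cost evaluations and no gradients, so each iteration reduces to a few transform-and-score passes over the volume. A good starting point, such as the principal-axes estimate, reduces the number of such passes.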

Here, because the images to be registered are obtained from the same modality, NCC (normalized cross-correlation) is applied as the similarity metric.

In addition, unlike the conventional EMP-MI, the present invention uses the entire 3D volume of the bone region to provide accurate registration results, and a scaling factor is computed from the voxel sizes of the input volumes so as to reduce the number of parameters to be computed for registration.

Furthermore, the most time-consuming parts of the algorithm, namely the transformation (including interpolation) and the NCC score computation, are parallelized through an OpenCL implementation that can use multiple CPUs and multiple GPUs simultaneously, such as AMD's APU (Accelerated Processing Unit) and Intel's MIC (Many Integrated Core) architecture, in addition to NVIDIA's Tesla.

The transformation parameters are obtained by computing a voxel similarity measure between the reference volume and the transformed volume, followed by an optimization step.

The optimal parameters are searched for until the similarity reaches a minimum or maximum; the role of the similarity measure is to return a value indicating how well the two images match.

The cost function for defining the optimal parameter matrix is as follows.

[Equation 1]

Here, S is the similarity metric, V_r and V_f are the reference volume and the floating volume, respectively, and M is the geometric transformation matrix between them.
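
The equation image for the cost function is missing from this text. One plausible reconstruction, consistent with the symbols defined here and with the flowchart test "Is m_val minimum?", is the following; the hat notation and the argument form M(V_f) are assumptions.

```latex
\hat{M} = \underset{M}{\arg\min}\; S\bigl(V_r,\, M(V_f)\bigr)
```

If S is a score that grows with agreement, as plain NCC does, the same search can be written as minimizing -S, so that a minimizer such as the downhill simplex method can be used unchanged.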

The form of the transformation matrix M depends on the registration domain, and S is chosen according to the number of modalities involved in the registration or the characteristics of the images. To reduce the number of iterations, the initial parameters are estimated by a principal-axes-based approach.

More specifically, Fig. 1 is a flowchart showing the overall algorithm of the processing method for registering two images.

즉, 도 1에 나타낸 바와 같이, 2개의 영상을 정합하기 위한 처리방법은, 초기화 단계 및 최적화 단계의 주요한 두 단계로 구분될 수 있고, 초기화 단계 전에, 이후 단계에서 계산되어야 할 파라미터들의 수를 감소하기 위해 복셀 사이즈에 따른 스케일링이 고려되며, 스케일링 파라미터는 입력 볼륨의 복셀 사이즈에 따라 고려된다.
That is, as shown in Fig. 1, the processing method for registering two images can be divided into two main steps, an initialization step and an optimization step. Before the initialization step, scaling according to the voxel size is applied in order to reduce the number of parameters to be calculated in the subsequent steps; the scaling parameters are determined from the voxel sizes of the input volumes.

또한, 초기화 단계는, 바이너리 볼륨 추출(binary volume extraction), 중심 계산(centroid calculation) 및 관성 주축 계산(principal axes calculation)의 세 개의 단계를 포함하여 구성된다.
In addition, the initialization step is comprised of three steps: binary volume extraction, centroid calculation, and principal axes calculation.

아울러, 각각의 파라미터들은, 후술하는 바와 같이 하여 두 번째 단계인 최적화 단계에서 최적화된다.
In addition, each of the parameters is optimized in the optimization step, which is the second step, as described below.

즉, 두 입력 볼륨 사이의 관계를 세 개의 변환 파라미터 및 세 개의 회전 파라미터의 6개의 파라미터에 의해 정의되는 강체변환(rigid body transformation) 행렬로 가정하면, 강체변환행렬 M은, 동일 좌표계(homogeneous coordinates)에서 변환벡터 T(t)와 회전행렬(R)의 연속(concatenation)이라 할 수 있다.
That is, assuming that the relationship between the two input volumes is a rigid body transformation matrix defined by six parameters, three translation parameters and three rotation parameters, the rigid transformation matrix M can be expressed in homogeneous coordinates as the concatenation of a translation vector T(t) and a rotation matrix R.

따라서 M은 다음과 같은 형태를 가진다.
Therefore, M has the following form.

[수학식 2] [Equation 2]

M = T(t)R

여기서, 회전행렬을 기술하는 일반적인 방법은, 다음의 일련의 행렬들의 분해형(decomposed form) 이다.
Here, a common way to describe the rotation matrix is in decomposed form, as the following sequence of matrices.

[수학식 3] [Equation 3]

여기서, α, β, γ는 각각 3D 축에 대한 오일러 각도(Euler angles) 이다.
Here, α, β, and γ are the Euler angles about the respective 3D axes.

따라서 회전행렬은, 다음과 같이 나타낼 수 있다.
Therefore, the rotation matrix can be expressed as follows.

[수학식 4] [Equation 4]

R = R_γ × R_β × R_α
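As a minimal sketch of Equations 2 and 4, the following Python code composes a 4×4 homogeneous matrix M = T(t)R from the six rigid parameters. The axis assignment (α about x, β about y, γ about z), the use of radians, and the helper names `rot_x`, `rot_y`, `rot_z`, `matmul`, `rigid_matrix`, and `apply_rigid` are assumptions made for illustration; the component matrices of Equation 3 are not reproduced in this text.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the x axis (assumed to correspond to alpha)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(b):
    """Rotation by angle b (radians) about the y axis (assumed to correspond to beta)."""
    c, s = math.cos(b), math.sin(b)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(g):
    """Rotation by angle g (radians) about the z axis (assumed to correspond to gamma)."""
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rigid_matrix(tx, ty, tz, alpha, beta, gamma):
    """4x4 homogeneous matrix M = T(t) R with R = R_gamma * R_beta * R_alpha."""
    r = matmul(rot_z(gamma), matmul(rot_y(beta), rot_x(alpha)))
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz], [0.0, 0.0, 0.0, 1.0]]

def apply_rigid(m, point):
    """Apply M to a 3D point given as (x, y, z)."""
    x, y, z = point
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))
```

With this ordering the rotation is applied first and the translation second, which matches reading M = T(t)R from right to left.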

스케일링 팩터 s_x, s_y 및 s_z는, 회전 및 변환 파라미터를 계산하기 전에 별도로 고려되며, 본 발명에 따르면, 스케일링 파라미터는 입력 볼륨의 복셀 사이즈에 따라 고려된다.
The scaling factors s_x, s_y, and s_z are considered separately, before the rotation and translation parameters are calculated; according to the present invention, the scaling parameters are determined from the voxel sizes of the input volumes.

복셀 사이즈에 대한 정보는, 입력 이미지의 DICOM(Distal Image and COmmunication in Medicine) 이미지 헤더 파일로부터 얻어지며, x-y 평면에서, 복셀 사이즈는 픽셀 스페이싱(pixel spacing)을 통해 추출되고, z-복셀 사이즈는 입력 DICOM 이미지의 헤더파일 정보의 슬라이스(slice)로부터 얻어진다.
Information about the voxel size is obtained from the DICOM (Digital Imaging and Communications in Medicine) header of the input image: in the x-y plane, the voxel size is taken from the pixel spacing, and the z voxel size is obtained from the slice information in the header of the input DICOM image.

V₁의 복셀 사이즈가 V₂의 복셀 사이즈보다 작은 것으로 가정하면, 스케일링 파라미터는 다음과 같이 계산된다.
Assuming that the voxel size of V₁ is smaller than that of V₂, the scaling parameters are calculated as follows.

[수학식 5] [Equation 5]

여기서, V_1i는 볼륨 V₁의 복셀 사이즈이고, V_2i는 볼륨 V₂의 복셀 사이즈이다.
Here, V_1i is the voxel size of volume V₁, and V_2i is the voxel size of volume V₂.

스케일링 파라미터가 계산된 후, V₁은 스케일 다운되고, V₂는 기준 볼륨(reference volume) V_r로 정의되며, 스케일된 볼륨 V₁은 부동 볼륨(float volume) V_f으로 설정된다.
After the scaling parameters are calculated, V₁ is scaled down, V₂ is defined as the reference volume V_r, and the scaled volume V₁ is set as the floating volume V_f.
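The scaling step can be illustrated with a short sketch. It assumes that Equation 5 takes the per-axis form s_i = V_1i / V_2i (a ratio below 1 when V₁ has the smaller voxels, which is consistent with V₁ being scaled down); this form and the function names `scaling_factors` and `scaled_shape` are assumptions, since the equation itself is not reproduced in this text.

```python
def scaling_factors(voxel_size_1, voxel_size_2):
    """Per-axis factors s_i = v1_i / v2_i (assumed form of Equation 5).

    voxel_size_1 and voxel_size_2 are (x, y, z) voxel sizes, e.g. in millimetres.
    """
    return tuple(a / b for a, b in zip(voxel_size_1, voxel_size_2))

def scaled_shape(shape, factors):
    """Grid size of V1 after scaling it onto the voxel grid of V2."""
    return tuple(max(1, round(n * s)) for n, s in zip(shape, factors))

# Hypothetical voxel sizes: V1 is sampled twice as finely as V2 on every axis.
factors = scaling_factors((0.5, 0.5, 1.0), (1.0, 1.0, 2.0))
new_shape = scaled_shape((512, 512, 200), factors)
```

After this step both volumes share one voxel size, so only the six rigid parameters remain to be estimated.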

정합 처리 전에 스케일링 파라미터가 최초에 계산되므로, 회전에 대한 3개와 변환에 대한 3개의 6개의 파라미터가 초기화 및 최적화 단계 모두에서 고려되며, 따라서 기준 볼륨 V_r과 부동 볼륨 V_f 사이의 상대 위치(related position)는 변환 파라미터 P = {t_x, t_y, t_z, α, β, γ}의 집합에 의해 정의된다(여기서, t_x, t_y, t_z는 변환량(translation quanta)이고 α, β, γ는 각각 기준 볼륨에 대하여 3D 축에 따른 부동 볼륨의 회전각이다).
Since the scaling parameters are calculated first, before the registration process, six parameters, three for rotation and three for translation, are considered in both the initialization and optimization steps. The relative position between the reference volume V_r and the floating volume V_f is therefore defined by the set of transformation parameters P = {t_x, t_y, t_z, α, β, γ}, where t_x, t_y, and t_z are the translation amounts and α, β, and γ are the rotation angles of the floating volume about the respective 3D axes with respect to the reference volume.

계속해서, 기준 볼륨에 대하여 부동 볼륨을 최적으로 정렬하는 최적화 파라미터 집합 P를 구하는 방법에 대하여 설명한다.
Next, a method for obtaining the set of optimization parameters P for optimally aligning the floating volume with respect to the reference volume will be described.

6개의 초기 변환 파라미터를 구하기 위해, 정합될 기준 및 부동 CT 볼륨은 모두 2진화되며(binarized), 2진화 볼륨은, 관성 주축(principal axes)을 추출하기 위해 3D의 기하학적 형상(geometric shape)을 나타내기 위한 특징으로서 사용된다.
To obtain the six initial transformation parameters, both the reference and floating CT volumes to be registered are binarized, and the binarized volumes are used as features representing the 3D geometric shape from which the principal axes are extracted.

B(x, y, z)를 3D 볼륨 V(x, y, z)의 초기 2진화 볼륨이라 하면, 다음과 같이 나타낼 수 있다.
Let B(x, y, z) be the initial binarized volume of the 3D volume V(x, y, z); it can then be expressed as follows.

[수학식 6] [Equation 6]

여기서, x, y, z는 이미지의 복셀의 좌표(coordinates)이고, τ는 2진화 볼륨을 정의하는 임계값(threshold value)이다.
Here, x, y, and z are the coordinates of a voxel in the image, and τ is the threshold value that defines the binarized volume.
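A minimal sketch of the thresholding described above, assuming that a voxel is set to 1 when its intensity is at least τ and to 0 otherwise. The direction of the comparison, the `[z][y][x]` nested-list layout, and the function name `binarize` are assumptions; Equation 6 is not reproduced in this text.

```python
def binarize(volume, tau):
    """Return B with B[z][y][x] = 1 where volume[z][y][x] >= tau, else 0."""
    return [[[1 if value >= tau else 0 for value in row]
             for row in plane]
            for plane in volume]

# Hypothetical 1 x 2 x 3 volume; 300 stands in for a bone-like intensity threshold.
volume = [[[100, 400, 250], [900, 50, 300]]]
mask = binarize(volume, 300)
```

For CT data a threshold of this kind separates dense structures such as bone from soft tissue, which is why the binarized volume can serve as the shape feature for the principal axes.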

3D 라벨링 처리는 초기 임계값을 추출한 후에 이어진다. 2진화 이미지의 연결된 성분(connected components)에 대한 라벨링은 2진화 이미지를 심볼릭 이미지로 변환하여 각각의 연결된 성분에 고유한(unique) 라벨을 할당하도록 한다.
A 3D labeling process follows once the initial threshold value has been extracted. Labeling the connected components of the binarized image converts it into a symbolic image in which each connected component is assigned a unique label.

라벨링 후, 이미지 필링(filling) 및 이로전(erosion)과 같은 형태학적 연산(morphological operations)이 이미지의 구멍(hole)을 채우고 이미지의 원하지 않는 작은 영역을 제거하기 위해 적용된다.
After labeling, morphological operations such as image filling and erosion are applied to fill the holes in the image and to remove unwanted small areas of the image.

6개의 파라미터의 계산은, 2진 객체(binary objects)의 관성 주축을 이용하여 달성되며, 3D 객체의 관성 주축은 3×3 관성행렬(inertia matrix)의 고유벡터(eigenvectors)에 의해 정의된다.
The six parameters are calculated using the principal axes of inertia of the binary objects; the principal axes of inertia of a 3D object are defined by the eigenvectors of its 3×3 inertia matrix.

여기서, 상기한 관성행렬은 다음과 같이 정의된다.
Here, the inertia matrix is defined as follows.

[수학식 7] [Equation 7]

여기서,

는 객체 모멘트(object moments) 이고, 함수 f(x, y, z)는 복셀 데이터의 이미지 내용(image content)을 나타내며, x_c, y_c, z_c는 객체의 중심을 나타내며, 관성행렬로부터 계산된 3개의 고유벡터는 직접적으로 객체의 관성 주축을 나타낸다.
Here, the elements of the matrix are the object moments, the function f(x, y, z) represents the image content of the voxel data, and x_c, y_c, and z_c denote the centroid of the object; the three eigenvectors calculated from the inertia matrix directly represent the principal axes of inertia of the object.

고유벡터의 행렬 형태는 다음과 같이 나타낼 수 있다.
The eigenvectors can be written in matrix form as follows.

[수학식 8] [Equation 8]

회전행렬 R에 대하여 행렬 E를 푸는 것으로(즉, E = R), 회전각 α, β, γ가 다음과 같이 계산될 수 있다.
By solving the matrix E for the rotation matrix R (i.e., E = R), the rotation angles α, β, and γ can be calculated as follows.

[수학식 9] [Equation 9]

또한, 초기 파라미터를 추출하는 단계는, 먼저, 2진화된 볼륨의 픽셀의 좌표로부터 3차원 벡터가 형성되면, 이어서, 3D 좌표 벡터로부터 중심 및 관성행렬이 산출되며, 최종적으로, 각각의 관성행렬의 고유벡터로부터 각 볼륨의 회전각이 산출된다.
In the step of extracting the initial parameters, a three-dimensional vector is first formed from the coordinates of the pixels of the binarized volume; the centroid and the inertia matrix are then calculated from the 3D coordinate vectors; and finally, the rotation angles of each volume are calculated from the eigenvectors of the respective inertia matrix.
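The sequence above (coordinates, then centroid and inertia matrix, then eigenvectors) can be sketched in plain Python. This is an illustration under stated assumptions: the matrix is built from second-order central moments of the binary voxels, and only the dominant axis is found, by power iteration, instead of a full eigendecomposition. The function names and the `[z][y][x]` layout are hypothetical, and the volume is assumed to contain at least one foreground voxel.

```python
import math

def centroid(binary):
    """Mean (x, y, z) coordinate of all non-zero voxels in binary[z][y][x]."""
    n = sx = sy = sz = 0
    for z, plane in enumerate(binary):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value:
                    n += 1
                    sx += x
                    sy += y
                    sz += z
    return (sx / n, sy / n, sz / n)

def moment_matrix(binary):
    """3x3 matrix of second-order central moments of the foreground voxels."""
    xc, yc, zc = centroid(binary)
    m = [[0.0] * 3 for _ in range(3)]
    for z, plane in enumerate(binary):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value:
                    d = (x - xc, y - yc, z - zc)
                    for i in range(3):
                        for j in range(3):
                            m[i][j] += d[i] * d[j]
    return m

def dominant_axis(m, iterations=100):
    """Unit eigenvector of the largest eigenvalue, by power iteration.

    Assumes m is non-zero and the start vector is not orthogonal to the axis.
    """
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v
```

In practice a library eigen-solver would return all three axes at once; the power iteration here only keeps the sketch free of external dependencies.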

기준 볼륨의 회전각 x, y, z로부터 부동 볼륨의 회전각 x, y, z를 감산함으로써(subtracting) 3개의 초기 회전각이 산출되며, 마찬가지로, 기준 볼륨의 중심 x, y, z로부터 부동 볼륨의 중심 x, y, z를 감산함으로써 3개의 초기 변환 파라미터가 산출된다.
The three initial rotation angles are obtained by subtracting the x, y, z rotation angles of the floating volume from those of the reference volume; likewise, the three initial translation parameters are obtained by subtracting the x, y, z centroid of the floating volume from that of the reference volume.
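The two subtractions can be written down directly. In this sketch the function name `initial_parameters`, the argument order, and the sample numbers are hypothetical; the output follows the ordering P = {t_x, t_y, t_z, α, β, γ} used in the text.

```python
def initial_parameters(ref_centroid, flt_centroid, ref_angles, flt_angles):
    """Initial P = (t_x, t_y, t_z, alpha, beta, gamma) as reference minus floating."""
    translation = tuple(r - f for r, f in zip(ref_centroid, flt_centroid))
    rotation = tuple(r - f for r, f in zip(ref_angles, flt_angles))
    return translation + rotation

# Hypothetical centroids (in voxels) and principal-axis angles (in radians).
p0 = initial_parameters((10.0, 12.0, 8.0), (7.0, 12.0, 9.0),
                        (0.30, 0.00, 0.10), (0.10, 0.00, 0.10))
```

These six values serve only as the starting point; the optimization step refines them.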

계속해서, 일반적으로 널리 이용되는 다운힐 심플렉스 방법(downhill simplex method)을 적용하여 파라미터를 최적화하는 방법의 상세한 내용에 대하여 설명한다.
Next, details of a method for optimizing parameters by applying a commonly used downhill simplex method will be described.

즉, 상기한 방법은, 최적화 처리를 N+1 포인트로 초기화하고, N차원(N-dimensional) 파라미터 공간의 초기 심플렉스(simplex)를 정의한다.
That is, the method initializes the optimization with N+1 points, which define the initial simplex in the N-dimensional parameter space.

여기서, 상기한 심플렉스는 반복적으로 변형되며(deformed), 즉, 상기한 심플렉스는, 비용함수(cost function) S의 최소값을 향하여 심플렉스의 정점들(vertices)이 천이하도록(shift), 반복되는 단계에서 반사(reflection), 확장(expansion) 또는 단축(contraction)에 의해 형성된다.
Here, the simplex is deformed iteratively: at each iteration it is reshaped by reflection, expansion, or contraction so that its vertices shift toward the minimum of the cost function S.

또한, 다운힐 심플렉스 방법은, 예를 들면, "Convergence properties of the Nelder-Mead simplex method in low dimensions", J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright, SIAM Journal on Optimization, 9(1999) 112-147에 개시된 바와 같은 종래의 알고리즘에 근거하여 구현 가능하며, 일반적으로, 심플렉스는 제로값(zero value)으로 초기화되고 유사성 스코어(similarity score)가 최소값이 될 때까지 최적화된다.
In addition, the downhill simplex method can be implemented on the basis of a conventional algorithm such as that disclosed in, for example, "Convergence properties of the Nelder-Mead simplex method in low dimensions", J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright, SIAM Journal on Optimization, 9 (1999) 112-147; in general, the simplex is initialized with zero values and optimized until the similarity score reaches its minimum.

본 발명의 실시예에서는, 정합 시간을 알고리즘적으로 개선하기 위해 기존의 초기화 단계에 근거하여 얻어진 6개의 변환 파라미터로 상기한 심플렉스가 초기화되는 것으로 하여 본 발명을 설명한다.
In the embodiment of the present invention, the simplex is described as being initialized with the six transformation parameters obtained in the initialization step described above, in order to reduce the registration time algorithmically.
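The following is a simplified sketch of a downhill simplex (Nelder-Mead) loop that starts from a supplied initial estimate rather than from zero, as the embodiment describes. It is not the patented implementation: the coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), the stopping rule, and the names `nelder_mead`, `step`, `max_iter`, and `tol` are assumptions chosen for illustration.

```python
def nelder_mead(cost, start, step=1.0, max_iter=500, tol=1e-8):
    """Minimise cost(p) over a list of N parameters, starting from `start`."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis (N+1 vertices).
    simplex = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        simplex.append(p)
    values = [cost(p) for p in simplex]
    for _ in range(max_iter):
        # Order vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        # Stop once every vertex is within tol of the best vertex.
        if max(abs(a - b) for p in simplex[1:] for a, b in zip(p, simplex[0])) < tol:
            break
        center = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(center, worst)]
        f_reflected = cost(reflected)
        if f_reflected < values[0]:
            # Reflection beat the best vertex: try going further (expansion).
            expanded = [c + 2.0 * (c - w) for c, w in zip(center, worst)]
            f_expanded = cost(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contraction toward the centroid; shrink toward the best vertex if that fails.
            contracted = [c + 0.5 * (w - c) for c, w in zip(center, worst)]
            f_contracted = cost(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [cost(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda j: values[j])
    return simplex[k], values[k]
```

For registration, `start` would hold the six initial parameters and `cost` would resample the floating volume and return a similarity-based cost; a start point close to the optimum typically shortens the search.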

먼저, 변환 과정에 대하여 설명하면, 최적화의 반복시에 강체 변환(rigid transformation)이 존재하게 되면, 처리는 다음과 같이 수행될 수 있다.
First, the transformation process is described. Given a rigid transformation at an iteration of the optimization, the process can be carried out as follows.

(1) 기준 볼륨과 동일한 기하학적 형상(geometry)을 가지는 빈 이미지 볼륨(empty image volume)(임시 변환 볼륨(temporary transform volume))을 생성. (1) Create an empty image volume (temporary transform volume) with the same geometry as the reference volume.

(2) 빈 이미지 볼륨의 각 복셀에 대하여, (2) For each voxel in the empty image volume,

1) 수학식 M = T(t)R에 기재된 바와 같은 변환 행렬을 이용하여, 부동 볼륨의 대응하는 복셀 포인트를 계산. 1) Compute the corresponding voxel point of the floating volume, using the transformation matrix given by the equation M = T(t)R.

2) 정규 그리드(regular grid) 상에 계산된 대응하는 복셀의 3-선형 삽입(tri-linear interpolation)을 연산(operate). 2) Perform tri-linear interpolation of the computed corresponding voxel on the regular grid.

3) 삽입된 복셀을 빈 이미지 볼륨에 설정(set). 3) Set the interpolated voxel in the empty image volume.

(3) 모든 복셀에 대하여 상기한 단계들을 반복.
(3) Repeat the above steps for all voxels.
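The steps above can be sketched as a serial reference loop. The sketch assumes a `[z][y][x]` nested-list volume, a 4×4 matrix `m` that maps a voxel of the empty volume to a position in the floating volume, and a value of 0.0 for positions that fall outside the floating volume; the names `trilinear` and `resample` are hypothetical.

```python
def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] at a fractional position; 0.0 outside the grid."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1 and 0 <= z <= d - 1):
        return 0.0
    x0, y0, z0 = int(x), int(y), int(z)
    x1, y1, z1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1), min(z0 + 1, d - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on the four edges, then along y, then along z.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)

def resample(floating, m, shape):
    """Fill an empty volume of `shape` = (depth, height, width) from `floating`."""
    d, h, w = shape
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]  # (1) empty volume
    for z in range(d):                                        # (3) every voxel
        for y in range(h):
            for x in range(w):
                # (2)-1) corresponding point in the floating volume
                sx, sy, sz = (m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                              for i in range(3))
                # (2)-2), (2)-3) interpolate and store
                out[z][y][x] = trilinear(floating, sx, sy, sz)
    return out
```

Because each output voxel is computed independently of the others, the triple loop is the part that maps naturally onto one GPU work-item per voxel.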

다음으로, 유사성 측정(similarity measure)에 대하여 설명하면, 다음과 같다.
Next, a similarity measure will be described as follows.

즉, 정합의 인증(validation)은, 두 볼륨 사이의 상관 계수(correlation coefficient)에 의해 평가되며, 따라서 최적화 단계에서 볼륨의 유사성의 정도(degree)를 정량화(quantify) 하기 위해 NCC(normalized cross-correlation)가 적용된다.
That is, the validity of the registration is evaluated by the correlation coefficient between the two volumes; normalized cross-correlation (NCC) is therefore applied in the optimization step to quantify the degree of similarity between the volumes.

여기서, NCC 함수는 다음과 같이 정의된다.
Here, the NCC function is defined as follows.

[수학식 10] [Equation 10]

여기서, n은 복셀의 총 수(total number), i와 j는 복셀 인덱스(voxel index), x가 V_r(x_i) 및 V_f(x_j)의 x-, y-, z- 좌표를 나타내고, V_r과 V_f가 평균 화소값(mean intensity value)을 나타낼 때, 포인트 x_i와 x_j에서의 화소값 이다.
Here, n is the total number of voxels, i and j are voxel indices, x denotes the x-, y-, z-coordinates, V_r(x_i) and V_f(x_j) are the intensity values at points x_i and x_j, and the mean intensity values of V_r and V_f are used for normalization.

아울러, 최종 변환 파라미터는 V_r과 V_f 사이에서 NCC 스코어가 최대일 때 최적인 것으로 간주된다.
In addition, the final transformation parameters are considered optimal when the NCC score between V_r and V_f is at its maximum.
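A minimal sketch of an NCC score, assuming the common zero-mean form in which each volume's mean intensity is subtracted before correlating. Equation 10 is not reproduced in this text, so the exact normalization, the flat-list inputs, and the name `ncc` are assumptions.

```python
import math

def ncc(ref, flt):
    """Zero-mean normalized cross-correlation of two equally sized intensity lists."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_f = sum(flt) / n
    numerator = sum((r - mean_r) * (f - mean_f) for r, f in zip(ref, flt))
    denominator = math.sqrt(sum((r - mean_r) ** 2 for r in ref) *
                            sum((f - mean_f) ** 2 for f in flt))
    # A constant volume has zero variance; report no correlation in that case.
    return numerator / denominator if denominator else 0.0
```

The score lies between -1 and 1. One way to use it with a minimizer, offered here only as an assumption, is to pass the negated score as the cost so that the minimum of the cost corresponds to the maximum NCC.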

계속해서, 도 2를 참조하여, GPU 구현에 대하여 설명한다.
Next, the GPU implementation will be described with reference to FIG.

즉, 도 2를 참조하면, 도 2는 본 발명의 실시예에 따른 GP-GPU를 이용한 3D 의료영상 정합의 병렬처리방법의 전체적인 구성을 개략적으로 나타내는 도면이다.
That is, referring to FIG. 2, FIG. 2 is a diagram schematically showing a general configuration of a parallel processing method of 3D medical image matching using GP-GPU according to an embodiment of the present invention.

더 상세하게는, 도 2에 나타낸 바와 같이, 본 발명의 실시예에 따른 GP-GPU를 이용한 3D 의료영상 정합의 병렬처리방법은, 먼저, 두 개의 영상을 입력(source) 영상과 목표(target) 영상으로 각각 입력받고, 각 영상의 스케일링 파라미터를 찾는다.
More specifically, as shown in FIG. 2, the parallel processing method of 3D medical image registration using a GP-GPU according to an embodiment of the present invention first receives two images as a source image and a target image, respectively, and finds the scaling parameters of each image.

이어서, 파라미터의 초기화를 위해, 도 2에 나타낸 바와 같이, 각 영상의 바이너리 볼륨을 추출하고, 중심(centroid) 및 관성 주축(principal axes)을 계산하며, 두 영상과 계산된 초기 파라미터를 "MEX" 인터페이스를 통해 CPU 및 GPU로 각각 보내고, CPU와 GPU는 각각 C 언어 및 OpenCL을 이용하여 최적화 작업을 병렬 처리한다.
Next, to initialize the parameters, as shown in FIG. 2, the binary volume of each image is extracted and the centroid and principal axes of inertia are calculated; the two images and the calculated initial parameters are then sent to the CPU and the GPU through the "MEX" interface, and the CPU and the GPU process the optimization in parallel using the C language and OpenCL, respectively.

상기한 바와 같이, 파라미터의 최적화 과정은 CPU와 GPU에 의해 각각 병렬처리되며, 즉, CPU는 C 언어로 작성된 프로그램에 의해 제어되고, GPU는 OpenCL을 이용하여 제어된다.
As described above, the parameter optimization is processed in parallel by the CPU and the GPU; that is, the CPU is controlled by a program written in the C language, and the GPU is controlled using OpenCL.

또한, 도 2에 있어서, 3차원 강체변환(rigid transformation)은 입력 영상들로부터 얻어지는 좌표계로 정의되는 임의의 점 P(x,y,z)를 절대좌표계에서 정의되는 점 P'(x,y,z)로 변환하는 것을 의미하며, 다음과 같이 나타낼 수 있다.
In addition, in FIG. 2, the three-dimensional rigid transformation means transforming an arbitrary point P(x, y, z), defined in the coordinate system obtained from the input images, into a point P'(x, y, z) defined in the absolute coordinate system, and can be expressed as follows.

[수학식 11] [Equation 11]

P' = RP + t

여기서, R은 3×3 회전행렬(rotation matrix)이고, t는 3×1 이동벡터(translation vector) 이며, R은 x, y, z 좌표계의 세 회전행렬을 곱하여 얻은 행렬이다.
Here, R is a 3×3 rotation matrix, t is a 3×1 translation vector, and R is obtained by multiplying the three rotation matrices for the x, y, and z axes.

또한, 회전행렬은 다음과 같이 나타낼 수 있다.
In addition, the rotation matrix can be expressed as follows.

[수학식 12] [Equation 12]

R = R_γ × R_β × R_α

여기서,

이고, t 벡터는 t = [t_x t_y t_z]^T 로 표시하며, 스케일링 변환 행렬은 아래와 같다.

Here, the vector t is written as t = [t_x t_y t_z]^T, and the scaling transformation matrix is as follows.

[수학식 13] [Equation 13]

도 2에 나타낸 바와 같이, 강체변환 파라미터 P={tx, ty, tz, α, β, γ}를 찾기 전에, 이하의 식을 이용하여, 입력된 두 이미지의 복셀(voxel) 정보를 이용하여 3개의 스케일링 파라미터를 계산한다.
As shown in FIG. 2, before the rigid transformation parameters P = {t_x, t_y, t_z, α, β, γ} are found, three scaling parameters are calculated from the voxel information of the two input images using the following equation.

[수학식 14] [Equation 14]

여기서 v_1i는 v_2i보다 작으며, v_1i는 첫 번째 입력된 이미지의 복셀정보이고, v_2i는 두 번째 입력된 이미지의 복셀정보이다.
Here, v_1i is smaller than v_2i; v_1i is the voxel information of the first input image, and v_2i is the voxel information of the second input image.

상기한 바와 같이 두 이미지의 크기를 맞추고 나서 강체변환의 6개의 파라미터를 고려하기 위해 다음과 같은 다섯 개의 단계을 수행한다.
After the sizes of the two images have been matched in this way, the following five steps are performed to obtain the six parameters of the rigid transformation.

1. 입력된 완본 이미지 두 개에서 뼈 부분을 분할하는 단계 1. Segmenting the bone region in each of the two original input images

2. 분할한 바이너리 이미지의 각 중심을 찾는 단계 2. Finding the centroid of each segmented binary image

3. 바이너리 이미지의 관성 행렬(inertia matrix)을 통해 고유벡터 (eigenvectors)를 고려하고, 각 고유벡터에서 각각 회전 파라미터를 계산하는 단계 3. Obtaining the eigenvectors from the inertia matrix of each binary image and calculating the rotation parameters from the eigenvectors

4. 첫 번째 입력된 이미지의 회전 파라미터에서 두 번째 입력된 이미지의 회전 파라미터를 감산하고, 두 이미지의 회전 파라미터를 고려하는 단계 4. Subtracting the rotation parameters of the second input image from those of the first input image to obtain the rotation parameters between the two images

5. 두 이미지의 중심을 감산하고 변환 파라미터를 고려하는 단계
5. Subtracting the centroids of the two images to obtain the translation parameters

상기한 바와 같이 하여 6개의 초기 파라미터를 관성 주축 기반으로 찾고 나서, 그 파라미터들을 화소값 기반으로 최적화한다.
As described above, the six initial parameters are first found on the basis of the principal axes of inertia, and are then optimized on the basis of the pixel (intensity) values.

여기서, 최적화의 방법으로서는, 상기한 바와 같이 다운힐 심플렉스(Downhill simplex) 방법을 적용하고, 유사성 측정 메트릭은 NCC(normalized cross correlation )를 적용한다.
Here, as a method of optimization, the downhill simplex method is applied as described above, and the similarity measurement metric is NCC (normalized cross correlation).

또한, 수행시간이 오래 걸리는 변환 계산과 메트릭 계산은 그래픽 카드의 GPU를 이용하여 처리함으로써, CPU와 GPU의 병렬처리 기법을 사용하여 처리속도를 높일 수 있다.
In addition, the transformation and metric calculations, which take a long time to run, are processed on the GPU of the graphics card, so that the processing speed can be increased through parallel processing on the CPU and the GPU.

다음으로, 상기한 바와 같은 OpenCL을 이용한 병렬처리의 구현방법에 대하여 상세히 설명한다.
Next, a method for implementing parallel processing using OpenCL as described above will be described in detail.

즉, 초기화 단계에 의해 정합 시간을 알고리즘적으로 감소시킨다 하더라도, 삽입 및 유사성 스코어 계산을 포함하는 대량의 반복적인 최적화 처리로 인해 정합 처리의 속도 문제는 여전히 남아 있게 된다.
That is, even though the initialization step reduces the registration time algorithmically, the speed of the registration process remains a problem because of the large amount of iterative optimization, which includes interpolation and similarity score calculation.

더 상세하게는, 반복을 위한 NCC 스코어를 얻기 위해, 가산(addition), 감산(subtraction), 승산(multiplication), 제산(division), 평균 제곱(mean square), 제곱근 연산(square root operation)과 같은 다수의 수학적 연산이 각 픽셀에 적용되고, 주어진 변환행렬에 의한 하나의 좌표계로부터 다른 좌표계로 볼륨의 변환은 볼륨 내의 모든 복셀에 걸쳐 행해져야(traverse) 하며, 각 복셀의 좌표는 변환행렬과 곱해지고, 화소값(intensity value)은 삽입 후에 새로운 볼륨으로 보내진다.
More specifically, to obtain the NCC score for an iteration, a number of mathematical operations such as addition, subtraction, multiplication, division, mean square, and square root are applied to every pixel. Transforming a volume from one coordinate system to another with a given transformation matrix requires traversing all voxels in the volume: the coordinates of each voxel are multiplied by the transformation matrix, and the intensity value is written to the new volume after interpolation.

이러한 연산은 복잡하지는 않으나, 볼륨 내의 모든 복셀을 거쳐야 하고, 일반적으로 3D 의료영상은 큰 사이즈의 볼륨을 가지므로, 직렬 프로그래밍에서는 매우 시간이 걸린다.
Such an operation is not complicated, but it has to go through all the voxels in the volume, and in general, 3D medical images have a large volume, which is very time consuming in serial programming.

NCC 스코어 계산과 변환의 문제는, SIMD 문제로서 나타내질 수 있고, OpenCL은 이러한 종류의 문제를 다루기 위해 적용가능한 병렬 프로그래밍 모델을 제공한다.
The problem of NCC score calculation and transformation can be represented as a SIMD problem, and OpenCL provides an applicable parallel programming model to address this kind of problem.

또한, OpenCL의 상세한 내용에 대하여는, 예를 들면, "OpenCL Programming Guide", A. Munshi, B. Gaster, T. G. Mattson, J. Fung and D. Ginsburg, Addison-Wesley Professional, 2011.에 게시된 내용을 참조할 수 있으며, 즉, OpenCL은 다양한 형태의 연산장치의 사용에 관한 이종 연산모델(heterogeneous computing model)이며, GPU 및 다른 프로세서들이 CPU와 직렬로(tandem) 동작하도록 설계된 개방적이고 자유로운 API(Application Programming Interface) 이다.
For details of OpenCL, see, for example, "OpenCL Programming Guide", A. Munshi, B. Gaster, T. G. Mattson, J. Fung and D. Ginsburg, Addison-Wesley Professional, 2011. OpenCL is a heterogeneous computing model for the use of various types of computing devices, and an open and free API (Application Programming Interface) designed so that GPUs and other processors operate in tandem with the CPU.

OpenCL 구조는, 플랫폼 모델(platform model), 실행 모델(execution model), 프로그래밍 모델(programming model) 및 메모리 모델(memory model)의 4가지 모델을 이용하여 기술될 수 있다.
The OpenCL structure can be described using four models: a platform model, an execution model, a programming model, and a memory model.

더 상세하게는, 도 3을 참조하면, 도 3은 OpenCL의 플랫폼 모델의 구성을 개략적으로 나타내는 도면이다.
More specifically, referring to FIG. 3, FIG. 3 is a diagram schematically showing the configuration of the platform model of OpenCL.

즉, 도 3에 나타낸 바와 같이, 플랫폼 모델은, 호스트 시스템(CPU)과 다양한 OpenCL 구성요소 사이의 전체적인 관계를 기술하는 것이며, 여기서, 호스트는 하나 이상의 다른 연산장치와 연결될 수 있다.
That is, as shown in Figure 3, the platform model describes the overall relationship between the host system (CPU) and the various OpenCL components, where the host may be coupled to one or more other computing devices.

또한, 각각의 연산장치는 연산 유닛(코어)의 집합(collection)이며, 각각의 연산 유닛은 처리소자(processing elements)(PEs)로 구성되고, 처리소자는 프로세서의 적절한 사용과 SIMD(Single Instruction, Multiple Data) 또는 SPMD(Single Programm, Multiple Data)와 같은 코드의 실행을 수행한다.
In addition, each compute device is a collection of compute units (cores), each compute unit consists of processing elements (PEs), and the processing elements make appropriate use of the processor and execute code in SIMD (Single Instruction, Multiple Data) or SPMD (Single Program, Multiple Data) fashion.

다음으로, OpenCL의 실행 모델은, 장치의 커널 실행(kernel execution) 및 호스트의 순차 프로그램(sequential program) 또는 호스트 프로그램(host program) 실행의 두 부분으로 구성된다.
Next, the execution model of OpenCL consists of two parts: kernel execution on the device, and execution of a sequential program, or host program, on the host.

여기서, 커널은 컴퓨터 장치에서 실행되는 실행가능한 코드의 기본 단위(basic unit)이며 C의 함수와 유사하고, 각각의 커널 실행은 작업 아이템(work-item)이라 불리며 작업 아이템의 그룹은 작업 그룹(work-groups)으로 나타낸다.
Here, a kernel is the basic unit of executable code that runs on a compute device and is similar to a C function; each instance of kernel execution is called a work-item, and groups of work-items are referred to as work-groups.

아울러, 실행을 위해 커널이 선택되면, "NDRange"라 불리는 인덱스 공간(index space)이 정의되며, 이때, NDRange는 N이 1, 2 또는 3일 때 N차원 인덱스 공간을 의미한다.
In addition, when the kernel is selected for execution, an index space called "NDRange" is defined, where NDRange denotes an N-dimensional index space when N is 1, 2 or 3.

즉, 도 4를 참조하면, 도 4는 OpenCL의 NDRange 구성을 개략적으로 나타내는 도면이다.
That is, referring to FIG. 4, FIG. 4 is a diagram schematically showing an NDRange configuration of OpenCL.

도 4에 나타낸 바와 같이, 작업 그룹 크기의 특정은 또한 NDRange 구성(NDRange configuration)이라고도 불리며, 동기화(synchronization)는 오직 동일한 작업 그룹 내의 작업 아이템 사이에서만 허용되고, 다른 작업 그룹의 작업 아이템과는 동기화가 불가능하다.
As shown in FIG. 4, specifying the work-group size is also called the NDRange configuration; synchronization is allowed only between work-items within the same work-group, and synchronization with work-items in other work-groups is not possible.
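How a one-dimensional NDRange is partitioned into work-groups can be emulated in a few lines. This is a host-side illustration, not OpenCL code; the function name `global_ids` is hypothetical, and it assumes, as OpenCL 1.x requires, that the global size is a multiple of the local size.

```python
def global_ids(global_size, local_size):
    """Yield (group_id, local_id, global_id) for every work-item of a 1-D NDRange."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the local size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # Same relation a kernel sees between get_group_id, get_local_id,
            # and get_global_id when no global offset is used.
            yield group_id, local_id, group_id * local_size + local_id

items = list(global_ids(8, 4))   # 2 work-groups of 4 work-items each
```

Work-items that share a `group_id` are the ones that can synchronize with each other and share local memory.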

또한, 실행 모델의 호스트 프로그램은 디바이스 콘텍스트(device context)를 정의하는 호스트 시스템에서 동작하고, 명령 큐(command queue)를 이용하는 커널 실행을 정렬(queue) 한다.
In addition, the execution model host program operates on a host system that defines a device context and queues kernel execution using a command queue.

아울러, 프로그래밍 모델은 작업 기반(task-based) 또는 데이터 기반(data-based)이 될 수 있으며, 작업 기반 병렬 컴퓨팅은 작업 그룹이 다른 모든 작업 그룹에 대하여 독립적으로 실행되는 반면, 데이터 기반 병렬 컴퓨팅은 장치 커널의 다중 인스턴스(multiple instance)에 의해 병렬 처리된다.
In addition, the programming model can be task-based or data-based: in task-based parallel computing, a work-group runs independently of all other work-groups, whereas in data-based parallel computing, the work is processed in parallel by multiple instances of the device kernel.

계속해서, 도 5를 참조하여, OpenCL의 메모리 모델의 구성에 대하여 설명한다.
Next, the structure of the memory model of OpenCL will be described with reference to FIG.

도 5에 나타낸 바와 같이, OpenCL은 프라이빗 메모리(private memory)로부터 글로벌 메모리(global memory)까지 메모리 레인징(memory ranging)을 통하여 메모리 레벨을 정의하며, 즉, OpenCL은 프라이빗(private), 로컬(local), 상수(constant) 및 글로벌(global) 메모리의 4개의 메모리 공간(memory space)을 정의한다.
As shown in FIG. 5, OpenCL defines memory levels ranging from private memory to global memory; that is, OpenCL defines four memory spaces: private, local, constant, and global memory.

더 상세하게는, 도 5를 참조하면, 도 5는 OpenCL의 메모리 모델의 구성을 개략적으로 나타내는 도면으로, OpenCL에 의해 정의되는 메모리 계층 구조(memory hierarchy)의 다이어그램을 나타내고 있다.
More specifically, referring to FIG. 5, FIG. 5 is a diagram schematically illustrating the configuration of the memory model of OpenCL, and shows a diagram of the memory hierarchy defined by OpenCL.

도 5에 있어서, 프라이빗 메모리는 단일 계산유닛(single compute unit)에 의해서만 사용될 수 있는 메모리이고, 싱글 코어 CPU의 레지스터와 유사하다.
In Figure 5, the private memory is a memory that can only be used by a single compute unit and is similar to a register in a single core CPU.

또한, 로컬 메모리는 작업 그룹 내의 작업 아이템에 의해 사용될 수 있는 메모리이고, 상수 메모리는 커널의 실행 동안 장치 내의 모든 계산유닛에 의해 읽기 전용으로(read only access) 상수 데이터(constant data)를 저장하기 위한 메모리이며, 글로벌 메모리는 장치 내의 모든 계산 유닛에 의해 사용될 수 있다.
In addition, local memory can be used by the work-items within a work-group; constant memory stores constant data that all compute units in the device can access read-only during kernel execution; and global memory can be used by all compute units in the device.

여기서, 상기한 메모리들을 충분히 활용하기 위하여는, 각 종류의 메모리의 특성이 알고리즘의 특징 및 데이터 구조를 만족해야 한다.
Here, in order to fully utilize the memories described above, the characteristics of each kind of memory must satisfy the characteristics of the algorithm and the data structure.

계속해서 OpenCL의 구현방법의 상세한 내용에 대하여 설명한다.
Next, the details of the implementation method of OpenCL will be described.

GPU가 데이터를 처리하기 전에, 해야 할 중요한 작업은 CPU로부터 GPU로 데이터의 할당(allocation) 이다.
Before the GPU can process the data, the important task is to allocate the data from the CPU to the GPU.

먼저, 입력 볼륨의 3차원 데이터가 효율적인 계산을 위해 1차원 배열로 형성되고, 볼륨 데이터의 사이즈가 크고 또한 모든 작업 아이템에 의해 엑세스 가능해야 하므로, 그러한 데이터는 호스트로부터 글로벌 메모리에 복사된다(copied).
First, the three-dimensional data of the input volume is arranged into a one-dimensional array for efficient computation; because the volume data is large and must be accessible to all work-items, it is copied from the host to global memory.
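The flattening of a 3D volume into a one-dimensional array, and the inverse mapping from a global index back to voxel coordinates, can be sketched as follows. The x-fastest memory layout and the names `flatten`, `to_index`, and `to_coords` are assumptions; the source does not state the ordering it uses.

```python
def flatten(volume):
    """Flatten volume[z][y][x] into a 1-D list with x varying fastest."""
    return [value for plane in volume for row in plane for value in row]

def to_index(x, y, z, width, height):
    """Linear index of voxel (x, y, z) in the flattened array."""
    return x + width * (y + height * z)

def to_coords(index, width, height):
    """Inverse of to_index: recover (x, y, z) from a linear (global) index."""
    z, remainder = divmod(index, width * height)
    y, x = divmod(remainder, width)
    return x, y, z
```

In a transformation kernel, `to_coords` corresponds to converting the global index of a work-item into the voxel coordinates it is responsible for.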

또한, 변환 행렬은, 국지적으로(locally)만 필요하고 부동형의 4×4 사이즈를 가지므로 로컬 메모리에 복사되고, 볼륨의 폭(width), 높이(height) 및 깊이(depth) 정보와 같은 다른 상수 변수(constant vatiables)는 프라이빗 메모리에 복사된다.
In addition, the transformation matrix is needed only locally and is a 4×4 matrix of floating-point type, so it is copied to local memory; other constant variables, such as the width, height, and depth of the volume, are copied to private memory.

아울러, NCC 스코어의 계산은, 주로 각 복셀의 합(summation)에 대한 루프(loop)를 포함하고, GPU 메모리의 코어의 각 복셀값의 합을 위해, 예를 들면, "http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingOverview.pdf"에 제시된 바와 같은, 병렬 감소 방법(parallel reduction stratege)이 사용된다.
In addition, the calculation of the NCC score mainly involves a loop that sums over every voxel; to sum the voxel values on the cores of the GPU, a parallel reduction strategy is used, for example as presented in "http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingOverview.pdf".

또한, 작업 아이템의 합은 작업 그룹 내에서 완료되며, 결과는 작업그룹 IDs에 저장되고 호스트에 전달되며, 최종 합산은 루프의 수가 작업 그룹의 사이즈와 같아질 때까지 반복되어 호스트 측에서 완료된다.
In addition, the summation over work-items is completed within each work-group, the results are stored by work-group ID and transferred to the host, and the final summation is completed on the host side by repeating until the number of loops equals the work-group size.
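The two-level summation can be emulated serially to make the data flow concrete. This sketch assumes a tree reduction within each work-group and a power-of-two local size; the names `work_group_sums` and `host_sum` are hypothetical, and the inner loop marked below is the part that would run in parallel on a GPU.

```python
def work_group_sums(values, local_size):
    """Emulate a per-work-group tree reduction; returns one partial sum per group.

    local_size is assumed to be a power of two.
    """
    partials = []
    for start in range(0, len(values), local_size):
        scratch = list(values[start:start + local_size])
        scratch += [0.0] * (local_size - len(scratch))   # pad the last group
        stride = local_size // 2
        while stride > 0:
            for i in range(stride):      # these additions are independent of each other
                scratch[i] += scratch[i + stride]
            stride //= 2
        partials.append(scratch[0])      # result stored at the work-group's slot
    return partials

def host_sum(partials):
    """Final summation of the per-group results on the host."""
    total = 0.0
    for p in partials:
        total += p
    return total
```

Each halving of `stride` corresponds to one synchronization point inside a work-group, which is why the reduction can only be completed per group on the device and must be finished on the host.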

상기한 NCC 함수의 식에 나타낸 바와 같이, 해당 식은 각 볼륨의 평균 및 제곱 평균의 계산을 포함하고 있으며, 따라서 합산을 위한 감소 커널(reduction kenel)과 분모(denominator) 및 분자(nominator)의 분할 계산(partial calculation)을 위한 평균 제곱 커널(mean square kernel)의 2개의 커널이 NCC 스코어를 계산하기 위해 분할된다.
As the NCC equation above shows, it involves calculating the mean and the mean square of each volume; the NCC score calculation is therefore split into two kernels, a reduction kernel for summation and a mean square kernel for the partial calculation of the denominator and the numerator.

이어서, 최종 계산은, 결과의 합산을 수행해야 하는 호스트 프로그램 측에서 완료된다.
The final calculation is then completed on the host program side where the sum of the results should be performed.

또한, 상기한 바와 같은 병렬 변환을 위한 OpenCL 구현의 설계는 도 6에 나타낸 바와 같다.
The design of the OpenCL implementation for the parallel transformation described above is shown in FIG. 6.

즉, 도 6을 참조하면, 도 6은 3D 변환을 위한 OpenCL 구형의 설계방법의 개념을 개략적으로 나타내는 도면이다.
That is, referring to FIG. 6, FIG. 6 is a diagram schematically showing the concept of the design of the OpenCL implementation for 3D transformation.

도 6에 나타낸 바와 같이, 병렬화되어야 할 픽셀의 총 수는, 할당되어야 할 글로벌 메모리의 사이즈가 폭×높이×깊이의 총계(amount)와 같도록, 이미지의 폭, 높이 및 깊이의 곱과 같다.
As shown in FIG. 6, the total number of pixels to be parallelized equals the product of the width, height, and depth of the image, so the size of the global memory to be allocated equals width × height × depth.

작업 그룹 사이즈는, 이미지 볼륨의 높이와 동일한 사이즈로 정의되고, 512의 로컬 사이즈가 변환 및 NCC 스코어 계산의 병렬화(parallelization)를 위해 사용된다.
The work-group size is defined to be the same as the height of the image volume, and a local size of 512 is used to parallelize the transformation and NCC score calculations.

여기서, 병렬화 처리의 단계는 다음과 같다.
Here, the steps of the parallelization process are as follows.

1. 호스트로부터 장치로 메모리 복사(볼륨 데이터, 변환 파리미터, 상수값) 1. Copy memory from the host to the device (volume data, transformation parameters, constant values)

2. 최적화기에서 각각 반복 2. For each iteration of the optimizer

(1) 변환 커널 런칭 (1) Launch the transformation kernel

1) 글로벌 인덱스를 복셀 좌표로 변환 1) Convert the global index to voxel coordinates

2) 변환 행렬에 복셀 좌표를 곱함 2) Multiply the voxel coordinates by the transformation matrix

3) 3선형 삽입 후 부동 볼륨의 이전 좌표로부터 임시 볼륨의 새 좌표로 화소값을 복사 3) After tri-linear interpolation, copy the intensity value from the old coordinates in the floating volume to the new coordinates in the temporary volume

(2) 감소 커널 런칭 (2) Launch the reduction kernel

4) 임시 볼륨의 화소값을 합산하고, 각 작업 그룹에 대한 결과를 각각의 작업 그룹 IDs에 저장 4) Sum the intensity values of the temporary volume, and store the result for each work-group under its work-group ID

5) 장치로부터 호스트로 결과를 복사하고 평균값의 계산은 호스트측에서 완료 5) Copy the results from the device to the host; the calculation of the mean is completed on the host side

(3) 평균 제곱 커널 런칭 (3) Launch the mean square kernel

6) 기준 및 변환 볼륨의 평균 제곱을 계산 6) Compute the mean square of the reference and transformed volumes

7) 각 볼륨에 대한 평균 제곱의 합을 계산하기 위해 감소를 계산 7) Run a reduction to compute the sum of the mean squares for each volume

8) 장치로부터 호스트로 관련된 값을 복사 8) Copy the relevant values from the device to the host

(4) 호스트에서 NCC 스코어 계산을 완료, 여기서, 작업 그룹, 제곱, 제곱근 및 제산의 추가적인 합을 계산해야 함. (4) Complete the NCC score calculation on the host; here, the additional summation over work-groups, the squaring, the square root, and the division must be computed.

3. NCC 스코어를 최적화기에 반환
3. Return the NCC score to the optimizer
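Steps (2) to (4) of the list above can be emulated on the host to show which quantities cross from device to host. This is a serial sketch under assumptions: `partial_sums` stands in for a reduction kernel returning one value per work-group, the per-voxel products stand in for the mean square kernel, and the zero-mean NCC form is assumed. The names `partial_sums` and `ncc_iteration` and the `group_size` default are hypothetical.

```python
import math

def partial_sums(values, group_size):
    """One sum per work-group, as a reduction kernel would return to the host."""
    return [sum(values[i:i + group_size]) for i in range(0, len(values), group_size)]

def ncc_iteration(ref, tmp, group_size=4):
    """NCC between the reference volume and the transformed (temporary) volume."""
    n = len(ref)
    # (2) reduction kernel, then the host finishes the means
    mean_r = sum(partial_sums(ref, group_size)) / n
    mean_t = sum(partial_sums(tmp, group_size)) / n
    # (3) mean square kernel: per-voxel terms, reduced per work-group
    dr = [r - mean_r for r in ref]
    dt = [t - mean_t for t in tmp]
    num = sum(partial_sums([a * b for a, b in zip(dr, dt)], group_size))
    rr = sum(partial_sums([a * a for a in dr], group_size))
    tt = sum(partial_sums([b * b for b in dt], group_size))
    # (4) host: final sums, square root, and division
    den = math.sqrt(rr * tt)
    return num / den if den else 0.0
```

Only a handful of per-group partial sums are transferred back in each iteration, while the full volumes stay in device memory.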

여기서, 상기한 (2)-4) 단계에서 임시 볼륨의 화소값을 합산하는 방법 또한, "http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingOverview.pdf"에 게시된 바와 같은 감소 방법을 이용할 수 있다.
Here, the intensity values of the temporary volume in step (2)-4) above can also be summed using a reduction method such as the one published at "http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingOverview.pdf".

따라서 상기한 바와 같이, 본 발명의 실시예에 따르면, 3차원 관성주축 기반 방법과 화소값 기반 방법의 장점만을 결합하여 보다 빠르고 정확하며 효과적인 3차원 영상의 정합방법을 제공할 수 있다.
As described above, according to the embodiment of the present invention, a faster, more accurate, and more effective 3D image registration method can be provided by combining only the advantages of the 3D principal-axes-of-inertia based method and the pixel-value based method.

The details of the parallel processing method of 3D medical image registration using a GP-GPU according to the present invention have been described above through the embodiments of the present invention. However, the present invention is not limited to what is described in the above embodiments, and it goes without saying that those of ordinary skill in the art to which the present invention pertains may make various modifications, changes, combinations, and substitutions according to design needs and various other factors.

Claims

1. A parallel processing method of image registration for rapidly registering two images of the same patient acquired at different times,
characterized in that the two images are registered by parallel processing on a CPU and a GPU using General-Purpose computation on Graphics Processing Units (GP-GPU).

2. The method of claim 1,
wherein the method comprises:
an input step of receiving two images of the same patient acquired at different times;
a scaling step of scaling the images received in the input step;
an initialization step of extracting initial transformation parameters from the scaled images;
an optimization step of obtaining final parameters by optimizing the initial transformation parameters extracted in the initialization step; and
a visualization step of registering the input images using the final parameters obtained in the optimization step and displaying the result on a display device,
wherein the input step, the scaling step, the initialization step, and the visualization step are performed by the CPU,
and the optimization step is processed by the GPU in parallel with the processing of the CPU.

3. The method of claim 2,
wherein the CPU is controlled by a program written in the C language, and the GPU is controlled using OpenCL or CUDA.

4. The method of claim 3,
wherein, in the scaling step, the input images are scaled using a scaling parameter determined by the voxel sizes of the input images.

5. The method of claim 4,
wherein the voxel size information is obtained from the DICOM (Digital Imaging and Communications in Medicine) header of each input image,
the voxel size being taken from the pixel spacing in the x-y plane and from the slice information in the z direction of the header of the input DICOM image,
and, assuming that the voxel size of the first input image V₁ is smaller than the voxel size of the second input image V₂, the scaling parameter is calculated using the following equation,

(where V_1i is the voxel size of the volume V₁, and V_2i is the voxel size of the volume V₂),

and, after the scaling parameter is calculated, V₁ is scaled down, V₂ is defined as the reference volume (V_r), and V₁ is set as the floating volume (V_f).

6. The method of claim 5,
wherein, in the initialization step, the initial transformation parameters include three rotation parameters and three translation parameters,
and the relative position between the reference volume V_r and the floating volume V_f is defined by the set of transformation parameters P = {t_x, t_y, t_z, α, β, γ} (where t_x, t_y, and t_z are the translation amounts, and α, β, and γ are the rotation angles of the floating volume about the 3D axes with respect to the reference volume).

7. The method of claim 6,
wherein the initialization step comprises:
binarizing both the reference volume and the floating volume;
forming 3D vectors from the coordinates of the pixels of the binarized reference volume and floating volume, and calculating a centroid and an inertia matrix from the 3D vectors;
calculating the rotation angles of each volume from the eigenvectors of its inertia matrix;
calculating three initial rotation parameters by subtracting the rotation angles (x, y, z) of the floating volume from the rotation angles (x, y, z) of the reference volume; and
calculating three initial translation parameters by subtracting the centroid (x, y, z) of the floating volume from the centroid (x, y, z) of the reference volume.
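The centroid and translation part of this initialization can be sketched as follows. The sketch is an illustration under assumptions: volumes are nested lists indexed as `volume[z][y][x]`, the names `centroid` and `initial_translation` are hypothetical, and the inertia matrix and eigenvector steps are omitted.

```python
def centroid(binary):
    """Return the centroid (x, y, z) of the nonzero voxels of binary[z][y][x]."""
    sx = sy = sz = count = 0
    for z, plane in enumerate(binary):
        for y, row in enumerate(plane):
            for x, v in enumerate(row):
                if v:
                    sx += x
                    sy += y
                    sz += z
                    count += 1
    if count == 0:
        raise ValueError("binary volume contains no foreground voxels")
    return (sx / count, sy / count, sz / count)

def initial_translation(ref_binary, flt_binary):
    """Initial (t_x, t_y, t_z): reference centroid minus floating centroid."""
    cr = centroid(ref_binary)
    cf = centroid(flt_binary)
    return tuple(r - f for r, f in zip(cr, cf))
```

The subtraction order follows the claim, so the result is the shift that moves the floating centroid onto the reference centroid.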

8. The method of claim 7,
wherein, in the binarizing,
when B(x, y, z) is the initial binarized volume of a 3D volume V(x, y, z), the reference volume and the floating volume are binarized using the following equation,

(where x, y, z are the coordinates of a voxel in the image, and τ is the threshold value defining the binarized volume).
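A thresholding step of this kind can be sketched in a few lines. The strict `>` comparison, the 0/1 output values, and the `volume[z][y][x]` nested-list layout are assumptions; the claim's own equation is not reproduced in this text and may differ.

```python
def binarize(volume, tau):
    """Return a 0/1 volume: 1 where the voxel value exceeds tau, else 0.

    volume is a nested list indexed as volume[z][y][x]. The strict '>'
    comparison is an assumption made for this sketch.
    """
    return [[[1 if v > tau else 0 for v in row] for row in plane]
            for plane in volume]
```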

9. The method of claim 8,
wherein the inertia matrix is defined by the following equation,
and the principal axes of inertia are defined by the eigenvectors of the inertia matrix.

(here,

10. The method of claim 9,
wherein the matrix form of the eigenvectors is expressed by the following equation,

(E = R), and the rotation angles α, β, and γ are calculated from the rotation matrix R by the following equation,

And the rotation matrix is represented by the following equation.

R = R_α × R_β × R_γ

(where α, β, and γ are the Euler angles about the 3D axes, respectively).
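A rotation matrix built as a product of three single-axis rotations can be sketched as follows. The assignment of α, β, and γ to the x, y, and z axes, the multiplication order, and the names `matmul3` and `rotation_matrix` are assumptions made for illustration; the claim's own per-axis matrices are not reproduced in this text.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about x (alpha), y (beta), and z (gamma), in radians.

    The axis assignment and the order R_x * R_y * R_z are assumptions.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    r_x = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    r_y = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    r_z = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(r_x, r_y), r_z)
```

Whatever convention is chosen, the same one has to be used when the angles are recovered from the eigenvector matrix, otherwise the initial rotation parameters will not match the transform applied later.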

11. The method of claim 10,
wherein the optimization step comprises:
transforming the floating volume, voxel by voxel, using a rigid-body transformation matrix and a tri-linear interpolation operation;
measuring a similarity score between the transformed floating volume and the reference volume; and
repeating the above steps for all voxels until the similarity score is maximized.
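Tri-linear interpolation, which the transformation step relies on, can be sketched as below. The `volume[z][y][x]` layout, the function name `trilinear`, and the choice to return 0.0 outside the volume are assumptions for illustration.

```python
import math

def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] at a fractional position.

    Returns 0.0 for positions outside the volume (an assumed boundary policy).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
        return 0.0
    x0, y0, z0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    # Clamp the upper neighbours so positions on the last plane stay in range.
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on four edges, then along y, then along z.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    c0 = lerp(c00, c10, fy)
    c1 = lerp(c01, c11, fy)
    return lerp(c0, c1, fz)
```

Because each output voxel depends only on its eight neighbours in the source volume, the samples are independent of one another, which is what makes this step suitable for one work item per voxel.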

12. The method of claim 11,
wherein, in the optimization step,
when the relationship between the two input images is a rigid-body transformation matrix M defined by three translation parameters and three rotation parameters, the rigid-body transformation matrix M is represented by the following equation,

M = T(t) R

(where T(t) and R are the translation vector and the rotation matrix in homogeneous coordinates, respectively).
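In homogeneous coordinates the product T(t) R places the 3x3 rotation in the upper-left block and the translation in the last column. A minimal sketch, with the hypothetical names `rigid_matrix` and `transform_point` and nested lists standing in for matrices:

```python
def rigid_matrix(t, r):
    """Build the 4x4 homogeneous matrix M = T(t) R.

    t is (t_x, t_y, t_z); r is a 3x3 rotation matrix as nested lists.
    """
    return [
        [r[0][0], r[0][1], r[0][2], t[0]],
        [r[1][0], r[1][1], r[1][2], t[1]],
        [r[2][0], r[2][1], r[2][2], t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(m, p):
    """Apply M to a 3D point p = (x, y, z): rotate first, then translate."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))
```

For example, with a 90-degree rotation about z and t = (10, 0, 0), the point (1, 0, 0) is rotated to (0, 1, 0) and then shifted to (10, 1, 0).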

13. The method of claim 12,
wherein, in the optimization step, the similarity score is measured using the following normalized cross-correlation (NCC) function to quantify the degree of similarity between the two volumes,

(where n is the total number of voxels; i and j are voxel indices; x denotes the x-, y-, z-coordinates; V_r(x_i) and V_f(x_j) are the pixel values at points x_i and x_j; and the mean intensity values of V_r and V_f are denoted by V̄_r and V̄_f).
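For orientation, the widely used zero-mean form of normalized cross-correlation is shown below in the notation of this claim. It is given as an assumption about the general shape of the function, using a single voxel index i for both volumes; the claim's own equation is not reproduced in this text and may differ in indexing or normalization.

```latex
\mathrm{NCC}(V_r, V_f) =
\frac{\sum_{i=1}^{n} \bigl(V_r(x_i) - \bar{V}_r\bigr)\bigl(V_f(x_i) - \bar{V}_f\bigr)}
     {\sqrt{\sum_{i=1}^{n} \bigl(V_r(x_i) - \bar{V}_r\bigr)^{2}\,
            \sum_{i=1}^{n} \bigl(V_f(x_i) - \bar{V}_f\bigr)^{2}}}
```

In this form the score lies between -1 and 1 and equals 1 when one volume is a positive linear function of the other.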

14. The method of claim 13,
wherein, in the optimization step, the step of transforming the floating volume and the step of measuring the similarity score are performed in parallel by the GPU.