KR102619707B1

KR102619707B1 - Method and system for processing bio data based on distributed parallel processing

Info

Publication number: KR102619707B1
Application number: KR1020210020504A
Authority: KR
Inventors: 김남욱; 정성진; 강병수
Original assignee: 재단법인대구경북과학기술원
Priority date: 2021-02-16
Filing date: 2021-02-16
Publication date: 2023-12-28
Also published as: KR20220116976A

Abstract

A system for distributed parallel processing of bio data is disclosed. The system includes a master node and one or more compute nodes clustered with the master node. The master node receives a three-dimensional data set of a predetermined capacity that includes EM data corresponding to each of a first data type and a second data type and, based on data type information included in the three-dimensional data set, classifies the EM data corresponding to the first data type and the EM data corresponding to the second data type. Accordingly, large-capacity image data can be processed.

Description

Method and system for distributed parallel processing of bio data {METHOD AND SYSTEM FOR PROCESSING BIO DATA BASED ON DISTRIBUTED PARALLEL PROCESSING}

The present invention relates to a method for distributed parallel processing of bio data and a system to which the method is applied and, more specifically, to a method for distributed parallel processing of a three-dimensional data set observed with an electron microscope and a system to which the method is applied.

In biological research, advances in next-generation sequencing technology are producing image-based activity data, genome data, transcriptome data, proteome data, and the like. Because such data require high resolution, among other characteristics, they are large in volume.

Techniques for imaging biological specimens also continue to advance. A representative example is the electron microscope, which uses an electron beam as its illumination source and can observe a wide range of samples, from photonic crystals to protein molecules, cells, and cellular tissue. For proteins, the sample can be observed in a cryogenic state to analyze the protein structure at atomic-level resolution.

Three-dimensional image data generated by electron microscope observation is large-capacity data on the order of gigabytes or terabytes, and the prior art has had difficulty processing and manipulating data of this size.

The above information is presented only as background information to aid understanding of the present invention. No determination has been made, and no assertion is made, as to whether any of the above is applicable as prior art with regard to the present invention.

Korean Laid-open Patent Publication No. 10-2016-0099762 (publication date: 2016.08.23)

The present invention was devised to solve the problem described above, and one object of the present invention is to provide a distributed parallel processing method for image processing of a three-dimensional image data set from an electron microscope.

Another object of the present invention is to provide a method of processing a three-dimensional image data set into a plurality of three-dimensional chunks.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A method for distributed parallel processing of bio data according to an embodiment of the present invention for achieving the above objects may include: receiving a three-dimensional data set of a predetermined capacity including EM (Electron Microscope) data corresponding to each of a first data type and a second data type; classifying, based on data type information included in the three-dimensional data set, the EM data corresponding to the first data type and the EM data corresponding to the second data type; sequentially providing computation tasks to a first message queue corresponding to the first data type and a second message queue corresponding to the second data type, based on key and value information corresponding to each of the first data type and the second data type; creating a semaphore for allocating shared resources to the first message queue and the second message queue, and determining a compute node to perform the sequentially provided computation tasks; and performing, by the determined compute node, the computation tasks.

A system for distributed parallel processing of bio data according to an embodiment of the present invention may include: a master node that receives a three-dimensional data set of a predetermined capacity including EM (Electron Microscope) data corresponding to each of a first data type and a second data type and that classifies, based on data type information included in the three-dimensional data set, the EM data corresponding to the first data type and the EM data corresponding to the second data type; and one or more compute nodes clustered with the master node.

Based on key and value information corresponding to each of the first data type and the second data type, the master node may sequentially provide computation tasks to a first message queue corresponding to the first data type and a second message queue corresponding to the second data type, create a semaphore for allocating shared resources to the first message queue and the second message queue, and determine a compute node to perform the sequentially provided computation tasks. The determined compute node may be configured to perform the computation tasks.

The technical solutions of the present invention are not limited to those mentioned above, and other technical solutions not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

According to various embodiments of the present invention, large-capacity three-dimensional image data observed with an electron microscope can be processed quickly through distributed parallel processing and generated as a plurality of three-dimensional data chunks.

The technical effects of the present invention are not limited to those mentioned above, and other technical effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a diagram schematically illustrating a system for distributed parallel processing of bio data according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a clustered master node and a plurality of compute nodes of a distributed parallel processing system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a master node process and a compute node process of a distributed parallel processing system according to an embodiment of the present invention; and
FIG. 4 is a sequence diagram illustrating a method for distributed parallel processing of bio data according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

The shapes, sizes, ratios, angles, numbers, and the like disclosed in the drawings for describing embodiments of the present invention are illustrative, and the present invention is not limited to what is shown. Like reference numerals refer to like elements throughout the specification.

In describing the present invention, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

Where 'includes', 'has', 'consists of', and the like are used in this specification, other parts may be added unless 'only' is used. A component expressed in the singular includes the plural unless explicitly stated otherwise.

In interpreting a component, it is interpreted as including a margin of error even without a separate explicit statement. Where a positional relationship between two parts is described with terms such as 'on', 'above', 'below', or 'next to', one or more other parts may be located between the two parts unless 'immediately' or 'directly' is used.

Where a temporal relationship is described with terms such as 'after', 'following', 'next', or 'before', non-consecutive cases may also be included unless 'immediately' or 'directly' is used.

Although terms such as first and second are used to describe various components, the components are not limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

"X-axis direction", "Y-axis direction", and "Z-axis direction" should not be interpreted only as a geometric relationship in which the directions are perpendicular to one another, and may have a broader directionality within the range in which the configuration of the present invention can function.

The term "at least one" should be understood to include every combination that can be presented from one or more related items. For example, "at least one of a first item, a second item, and a third item" may mean each of the first item, the second item, or the third item, as well as any combination of two or more of the first item, the second item, and the third item.

The features of the various embodiments of the present invention may be partially or wholly combined with one another and may be technically interlinked and operated in various ways, and the embodiments may be implemented independently of one another or together in association.

FIG. 1 is a diagram schematically illustrating a system for distributed parallel processing of bio data (hereinafter referred to as the "distributed parallel processing system") according to an embodiment of the present invention.

Referring to FIG. 1, the electron microscope 10 may image a sample 17. The sample 17 may be extracted from various parts of a living body. For example, the sample 17 may be extracted from the brain of an organism.

The electron microscope 10 may emit an electron beam 15 using an electron gun 11, and the emitted electron beam 15 may pass through a magnetic lens 13 so that an image of the sample 17 is output on a screen 19.

The image processing module 20 may process the captured images on a separate device or in the cloud. Specifically, the image processing module 20 may perform a stitch process, an alignment process, and the like on the numerous two-dimensional images captured by the electron microscope 10 to generate a three-dimensional image from the two-dimensional images.

The image processing module 20 may provide the generated three-dimensional (3D) image to the distributed parallel processing system 100. The generated three-dimensional image may be implemented as a single file and may have a capacity of hundreds of gigabytes to tens of terabytes.

The distributed parallel processing system 100 may be implemented in the cloud or, in an alternative embodiment, as a server or a device.

The distributed parallel processing system 100 may receive the three-dimensional (3D) image from the image processing module 20 and process the received image. In an alternative embodiment, the distributed parallel processing system 100 may receive the three-dimensional image directly through a specific input means. The specific configuration and operation of the distributed parallel processing system 100 are described in detail below.

FIG. 2 is a diagram illustrating the clustered master node 110 and the plurality of compute nodes 1501 to 150N of the distributed parallel processing system 100 according to an embodiment of the present invention.

Referring to FIG. 2, the master node 110 and the plurality of compute nodes 1501 to 150N of the distributed parallel processing system 100 may form a group in units of a cluster (CLU).

FIG. 3 is a diagram illustrating the master node process S31 and the compute node process S33 of the distributed parallel processing system 100 according to an embodiment of the present invention.

The master node process S31 refers to a process performed by the master node 110 or by a module directly or indirectly connected to the master node 110, and is described below as being performed by the master node 110. The compute node process S33 refers to a process performed by a compute node 150 or by a module directly or indirectly connected to the compute node 150, and is described below as being performed by the compute node 150.

The master node process S31 may include the master node 110, a channel & segmentation classification module 120, a plurality of message queues 130A and 130B, an error report module, a thread-safe queue, and the like.

First, the master node process S31 may receive an EM (Electron Microscope) data set. Specifically, the master node 110 may receive the EM data set and provide it to the channel & segmentation classification module 120.

Here, the EM data set may be implemented as a single file containing three-dimensional data. The EM data set may include a plurality of EM data (sets) that can be distinguished by a plurality of data types.

For example, the EM data set may be a three-dimensional image data set collected from a biological sample located in the brain of a human, an animal, or the like, and the three-dimensional image data set may store three-dimensional image data (sets) expressed in different data types together inside a single file. The extension of the single file may be <.H5>. The size of the single file may be on the order of gigabytes to terabytes.

Here, the three-dimensional EM data set may include EM data (sets) corresponding to each of a first data type and a second data type. The first data type may be a channel type, and the second data type may be a segmentation type. A channel is an image in which the three-dimensional image data set is expressed in gray scale pixel by pixel, and a segmentation is an image in which the three-dimensional image data set is expressed in color scale pixel by pixel.

The EM data corresponding to the first data type (channel type) is EM data that expresses the EM data set in gray scale, that is, EM data in which each pixel of the three-dimensional EM data set is expressed in shades of gray. A pixel with high luminance may be expressed as white, and a pixel with low luminance may be expressed as black.

The EM data corresponding to the second data type (segmentation type) is EM data that expresses the EM data set in color scale, and may include a color pixel value for each pixel of the three-dimensional EM data set. For example, when color pixel values are expressed in UINT16 (16-bit unsigned integer) size, a color value in the range of 0 to 65535 may be set for each pixel.
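By way of a non-limiting illustration, the 0 to 65535 range stated above follows from storing each segmentation value in 16 bits. The sketch below uses only the Python standard library; the function names `pack_segment_id` and `fits_uint16` and the little-endian byte order are assumptions made for the example and do not appear in the specification.

```python
import struct

UINT16_MAX = 2**16 - 1  # 65535, the largest value a 16-bit unsigned integer can hold


def pack_segment_id(segment_id: int) -> bytes:
    """Encode one segmentation value as a little-endian 16-bit unsigned integer.

    Raises struct.error if the value does not fit in 16 bits.
    """
    return struct.pack("<H", segment_id)


def fits_uint16(value: int) -> bool:
    """Return True if the value lies in the 0..65535 range described in the text."""
    return 0 <= value <= UINT16_MAX
```

Each packed value occupies exactly two bytes, so a segmentation volume stored this way needs two bytes per pixel before any compression.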

That is, the EM data set according to an embodiment of the present invention combines three-dimensional EM data expressed in several types into a single file, and can be said to overcome the limitation of the prior art, which could not combine three-dimensional EM data expressed in several types into a single file.

Using the channel & segmentation classification module 120, the master node process S31 may classify, based on the data type information included in the three-dimensional EM data set, the EM data corresponding to the first data type (channel type) and the EM data corresponding to the second data type (segmentation type).
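The classification described above can be sketched, purely as an illustration, as a split on a data type tag. The specification does not say how the data type information is encoded, so the `EMData` record, its `data_type` field, and the tag strings "channel" and "segmentation" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EMData:
    name: str       # hypothetical dataset name inside the single file
    data_type: str  # hypothetical data type tag: "channel" or "segmentation"


def classify(datasets):
    """Split EM data into (channel, segmentation) lists by their data type tag."""
    channel, segmentation = [], []
    for item in datasets:
        if item.data_type == "channel":
            channel.append(item)
        elif item.data_type == "segmentation":
            segmentation.append(item)
        else:
            # Unknown tags are rejected rather than silently dropped.
            raise ValueError(f"unknown data type: {item.data_type!r}")
    return channel, segmentation
```

Supporting a third data type in this sketch would mean adding one more branch and one more output list, which mirrors the statement elsewhere in the description that the number of message queues may grow with the number of data types.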

At this time, the master node process S31 may sequentially deliver jobs for computation to a thread-safe queue, and may monitor, through the master node 110 and/or a task manager or the like, whether the jobs are loaded into the queue in order.

Based on key and value information corresponding to each of the first data type and the second data type, the master node process S31 may sequentially provide computation tasks to the first message queue 130A corresponding to the first data type and the second message queue 130B corresponding to the second data type.

Here, the key and value information is information for storing the large-capacity three-dimensional EM data set on a memory basis. The key information may include the file extension (H5) information or information associated with the extension information, and the value information may include information on the first data type (channel type) or information on the second data type (segmentation type).
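As a minimal sketch of the key and value based routing described above, the example below keeps one thread-safe FIFO queue per data type and selects the queue from the value information. The `submit` function, the task dictionaries, and the literal key "h5" are assumptions for illustration; the specification does not define a concrete message format.

```python
import queue

# One FIFO queue per data type; queue.Queue is safe to share between threads.
message_queues = {
    "channel": queue.Queue(),       # plays the role of the first message queue (130A)
    "segmentation": queue.Queue(),  # plays the role of the second message queue (130B)
}


def submit(key: str, value: str, task: dict) -> None:
    """Enqueue a task on the queue selected by the value (data type) information."""
    if key.lower() != "h5":
        raise ValueError(f"unsupported key: {key!r}")
    message_queues[value].put(task)


# Tasks are provided sequentially, so each queue preserves submission order.
submit("h5", "channel", {"chunk": 0})
submit("h5", "segmentation", {"chunk": 0})
submit("h5", "channel", {"chunk": 1})
```

Keeping the queues separate means a backlog of tasks for one data type does not reorder the tasks of the other.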

The computation tasks loaded into the message queues 130A and 130B may be computation tasks for generating three-dimensional EM data chunks of a predetermined size from the three-dimensional EM data set.

The master node process S31 may create a semaphore for allocating shared resources to the first message queue 130A and the second message queue 130B, thereby preventing duplicate processing or deadlock.

The master node process S31 may determine the compute nodes 1501 to 150N that are to perform the sequentially provided computation tasks.

After the master node process S31 has been performed, the determined compute nodes 150 and 1501 to 150N may perform the computation tasks. Specifically, the compute nodes 150 and 1501 to 150N may acquire, through the semaphore, permission to use the shared resources of the first message queue 130A or the second message queue 130B. The compute nodes 150 may perform both computation related to the first data type and computation related to the second data type.
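The semaphore-guarded access described above can be illustrated with threads standing in for compute nodes. This is a sketch under stated assumptions, not the claimed implementation: the slot count of two, the four worker threads, and the names `slots`, `tasks`, and `compute_node` are all hypothetical, and a real cluster would use a distributed coordination mechanism rather than an in-process semaphore.

```python
import queue
import threading

tasks = queue.Queue()
for chunk_id in range(20):
    tasks.put(chunk_id)

slots = threading.Semaphore(2)  # at most two nodes hold the shared resource at once
processed = []                  # chunk ids handled so far
processed_lock = threading.Lock()


def compute_node() -> None:
    """Take tasks until the queue is empty, holding the semaphore around each one."""
    while True:
        with slots:  # acquire permission; released automatically on exit
            try:
                chunk_id = tasks.get_nowait()
            except queue.Empty:
                return
            with processed_lock:
                processed.append(chunk_id)


nodes = [threading.Thread(target=compute_node) for _ in range(4)]
for node in nodes:
    node.start()
for node in nodes:
    node.join()
```

Because each task is removed from the queue exactly once and the semaphore is always released when the `with` block exits, every chunk is processed once and no worker waits forever on a permit.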

In an embodiment, more data types than the first and second data types may be applied, in which case the number of message queues may increase according to the number of data types.

The compute nodes 1501 to 150N may generate three-dimensional chunk data from the three-dimensional data set corresponding to the computation task. A three-dimensional chunk may be three-dimensional image data obtained by dividing the large-capacity three-dimensional data set into pieces of a predetermined size. For example, in terms of three-dimensional X, Y, and Z coordinates, the chunks may be data obtained by dividing an image data set of size 10X, 10Y, 10Z into units of size 1X, 1Y, 1Z (1,000 chunks in total).
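The chunk count in the example above is 10 × 10 × 10 = 1,000. A minimal sketch of the tiling, assuming the volume size is an exact multiple of the chunk size along each axis and using the hypothetical helper name `chunk_origins`, is:

```python
from itertools import product


def chunk_origins(shape, chunk):
    """Return the (x, y, z) origin of every chunk tiling a volume of the given shape."""
    ranges = [range(0, size, step) for size, step in zip(shape, chunk)]
    return list(product(*ranges))


# A 10 x 10 x 10 volume divided into 1 x 1 x 1 units, as in the text.
origins = chunk_origins(shape=(10, 10, 10), chunk=(1, 1, 1))
```

Each origin identifies one independent unit of work, which is what allows the chunks to be distributed across compute nodes.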

The distributed parallel processing system 100 may further include a display and may output the three-dimensional chunks generated by the compute nodes 1501 to 150N on the display.

In addition, when a computation error occurs in a compute node, the master node process S31 may receive computation error log information.

Based on the computation error log information, the master node process S31 may ignore relatively minor computation error logs and may stop the process when an error log of a predetermined level is found.

The master node process S31 may include an artificial intelligence based error handling model and may automatically stop the distributed parallel processing when an error log exceeding a predetermined criterion is found.
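The ignore-or-stop decision described above can be sketched with a plain severity threshold. Note that this is a simple rule standing in for the artificial intelligence based model, whose details the specification does not give; the `Severity` levels, the `STOP_AT` threshold, and the `handle_error_logs` function are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    FATAL = 3


STOP_AT = Severity.MAJOR  # hypothetical threshold for halting the process


def handle_error_logs(logs, stop_at=STOP_AT):
    """Return "stop" if any log reaches the threshold, otherwise "continue".

    Each log is a (severity, message) pair; logs below the threshold are ignored.
    """
    for severity, _message in logs:
        if severity >= stop_at:
            return "stop"
    return "continue"
```

For instance, `handle_error_logs([(Severity.MINOR, "retry on chunk 12")])` leaves processing running, while a single log at or above the threshold halts it.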

FIG. 4 is a sequence diagram illustrating a method for distributed parallel processing of bio data according to an embodiment of the present invention.

First, the processing method of the bio data distributed parallel processing system 100 may include receiving a three-dimensional data set of a predetermined capacity including EM (Electron Microscope) data corresponding to each of a first data type and a second data type (S510), and classifying, based on data type information included in the three-dimensional data set, the EM data corresponding to the first data type and the EM data corresponding to the second data type (S520).

Next, the processing method may include sequentially providing computation tasks to a first message queue corresponding to the first data type and a second message queue corresponding to the second data type, based on key-value information corresponding to each of the first data type and the second data type (S530), and creating a semaphore for allocating shared resources to the first message queue and the second message queue and determining the compute nodes to perform the sequentially provided computation tasks (S540).

Thereafter, the processing method may include generating, by the determined compute nodes, three-dimensional chunk data from the three-dimensional data set (S550).
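The order of steps S510 to S550 can be traced in a compact single-machine sketch. It is an illustration under strong assumptions: a thread pool stands in for the clustered compute nodes, the data set is reduced to a mapping from data type to volume shape, the semaphore of S540 is omitted because the pool itself assigns tasks to workers, and the names `run_pipeline` and `make_chunks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_chunks(data_type, shape, chunk):
    """S550: describe every chunk of one data type as (data_type, origin)."""
    ranges = [range(0, size, step) for size, step in zip(shape, chunk)]
    return [(data_type, origin) for origin in product(*ranges)]


def run_pipeline(data_set, chunk=(2, 2, 2), workers=4):
    # S510: receive the data set, here a dict of {data_type: volume shape}.
    # S520 and S530: group by data type and list one task per type, in order.
    tasks = [(data_type, shape) for data_type, shape in sorted(data_set.items())]
    # S540: the pool decides which worker runs each task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: make_chunks(t[0], t[1], chunk), tasks)
        # S550: collect the chunk descriptions produced by the workers.
        return {task[0]: chunks for task, chunks in zip(tasks, results)}


result = run_pipeline({"channel": (4, 4, 4), "segmentation": (4, 4, 4)})
```

With a 4 × 4 × 4 volume and 2 × 2 × 2 chunks, each data type yields 2 × 2 × 2 = 8 chunk descriptions.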

Meanwhile, according to various embodiments of the present invention, distributed parallel processing of EM data of different data types allows 600 gigabytes of EM data to be processed in a few hours, improving on the prior art, which required several days.

In addition, various embodiments of the present invention may help identify neural circuits related to decision-making in the frontal lobe. Furthermore, through techniques for storing, extracting, analyzing, and visualizing brain neural network image data, structural data, molecular data, and the like, they may contribute to technology in the field of brain neural networks.

Although this specification contains details of many specific implementations, these should not be understood as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Likewise, certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination. Furthermore, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excluded from that combination, and the claimed combination may be changed to a sub-combination or a variation of a sub-combination.

In addition, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results.

Accordingly, this specification is not intended to limit the invention to the specific terms set forth. Although the present invention has been described in detail with reference to the above examples, those skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the invention. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A method for distributed parallel processing of bio data, the method comprising:
receiving a three-dimensional data set of a predetermined capacity including electron microscope (EM) data corresponding to each of a grayscale-based first data type and a color-scale-based second data type;
classifying, based on data type information included in the three-dimensional data set, the EM data included in the three-dimensional data set into EM data corresponding to the first data type and EM data corresponding to the second data type;
sequentially providing computational tasks to a first message queue corresponding to the first data type and a second message queue corresponding to the second data type, based on key and value information corresponding to each of the first data type and the second data type;
creating a semaphore for allocating shared resources to the first message queue and the second message queue, and determining a computation node to perform the sequentially provided computational tasks; and
performing, by the determined computation node, the computational tasks.
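The steps recited above can be pictured with a minimal single-machine sketch. It is an illustration under stated assumptions, not the patented implementation: threads stand in for clustered computation nodes, the record fields `dtype`, `key`, and `value`, the labels `gray` and `color`, and the functions `classify` and `run` are all hypothetical, and the "determined" node is simply whichever idle node pulls the next task.

```python
import queue
import threading

GRAY, COLOR = "gray", "color"  # hypothetical labels for the first/second data type


def classify(dataset):
    """Group records by the data type information each record carries."""
    by_type = {GRAY: [], COLOR: []}
    for record in dataset:
        by_type[record["dtype"]].append(record)
    return by_type


def run(dataset, num_nodes=3, shared_slots=2):
    """Process all records and return (node_id, dtype, key, result) tuples."""
    # One message queue per data type; tasks are enqueued in order as (key, value).
    queues = {dtype: queue.Queue() for dtype in (GRAY, COLOR)}
    for dtype, records in classify(dataset).items():
        for record in records:
            queues[dtype].put((record["key"], record["value"]))

    slots = threading.Semaphore(shared_slots)  # caps concurrent use of a shared resource
    results, results_lock = [], threading.Lock()

    def node(node_id):
        while True:
            task = None
            for dtype, q in queues.items():
                try:
                    key, value = q.get_nowait()
                    task = (dtype, key, value)
                    break
                except queue.Empty:
                    continue
            if task is None:
                return  # both queues are drained
            dtype, key, value = task
            with slots:
                outcome = sum(value)  # placeholder for the real computation
            with results_lock:
                results.append((node_id, dtype, key, outcome))

    threads = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch the semaphore limits how many nodes may use the shared resource at once, regardless of which queue a task came from. A real cluster would replace the in-process queues and threads with a message broker and remote workers.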

The method of claim 1,
wherein the three-dimensional data set consists of a single file and has a capacity ranging from gigabytes to terabytes.

(Deleted)

The method of claim 1,
wherein performing the computational tasks comprises
generating the three-dimensional data set corresponding to the computational task as three-dimensional chunk data.
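One way to picture three-dimensional chunk data is as a tiling of the volume into fixed-size blocks. The sketch below is only an assumption about how such chunks could be laid out; the function name `chunk_bounds` and the (z, y, x) axis order are hypothetical and are not taken from the specification.

```python
from itertools import product


def chunk_bounds(shape, chunk):
    """Yield (z, y, x) slice triples that tile a 3D volume of the given shape."""
    axes = [
        # Slices along one axis; the last slice is clipped to the volume edge.
        [slice(start, min(start + step, size)) for start in range(0, size, step)]
        for size, step in zip(shape, chunk)
    ]
    yield from product(*axes)
```

Each yielded triple can index an array-like volume directly, so a computation node could load, process, and write one chunk at a time instead of holding the whole data set in memory.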

The method of claim 4,
further comprising displaying the generated three-dimensional chunk data.

The method of claim 1,
further comprising receiving computation error log information when a computation error occurs in the computation node.
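Error reporting of this kind could be sketched as follows. This is a hedged illustration only: the helper `run_task`, the log record fields, and the use of an in-process queue as the channel to the master node are assumptions, not details from the specification.

```python
import queue
import traceback


def run_task(node_id, task_id, func, error_log):
    """Run one task; on failure, put an error record on the master's log queue."""
    try:
        return func()
    except Exception as exc:
        error_log.put({
            "node": node_id,      # which computation node failed
            "task": task_id,      # which task it was running
            "error": repr(exc),   # short description of the failure
            "trace": traceback.format_exc(),
        })
        return None
```

The master node would read records from `error_log` (here a `queue.Queue`) and could then decide whether to retry the task on another node.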

A system for distributed parallel processing of bio data, the system comprising:
a master node configured to receive a three-dimensional data set of a predetermined capacity including electron microscope (EM) data corresponding to each of a grayscale-based first data type and a color-scale-based second data type, and to classify, based on data type information included in the three-dimensional data set, the EM data included in the three-dimensional data set into EM data corresponding to the first data type and EM data corresponding to the second data type; and
one or more computation nodes clustered with the master node,
wherein the master node is configured to:
sequentially provide computational tasks to a first message queue corresponding to the first data type and a second message queue corresponding to the second data type, based on key and value information corresponding to each of the first data type and the second data type; and
create a semaphore for allocating shared resources to the first message queue and the second message queue, and determine a computation node to perform the sequentially provided computational tasks, and
wherein the determined computation node is configured to perform the computational tasks.

The system of claim 7,
wherein the three-dimensional data set consists of a single file and has a capacity ranging from gigabytes to terabytes.

(Deleted)

The system of claim 7,
wherein the determined computation node is
configured to generate the three-dimensional data set corresponding to the computational task as three-dimensional chunk data.

The system of claim 10,
further comprising a display configured to display the generated three-dimensional chunk data.

The system of claim 7,
wherein the master node is
configured to receive computation error log information when a computation error occurs in a computation node.