KR102161511B1

KR102161511B1 - Extracting method for biomarker for diagnosis of biliary tract cancer, computing device therefor, biomarker for diagnosis of biliary tract cancer, and biliary tract cancer diagnosis device comprising same

Info

Publication number: KR102161511B1
Application number: KR1020130122635A
Authority: KR
Inventors: 최형석; 허지연; 송시영; 정다운
Original assignee: 엘지전자 주식회사; 연세대학교 산학협력단
Priority date: 2013-10-15
Filing date: 2013-10-15
Publication date: 2020-10-05
Also published as: KR20150043790A

Abstract

The present invention discloses a method for extracting a biomarker for the diagnosis of biliary tract cancer, a computing device therefor, a biomarker for the diagnosis of biliary tract cancer, and a biliary tract cancer diagnosis apparatus including the biomarker. More specifically, the present invention discloses a method for extracting a biomarker for the diagnosis of biliary tract cancer using microRNA obtained from blood or tissue, a computing device therefor, a biomarker for the diagnosis of biliary tract cancer, and a biliary tract cancer diagnosis apparatus including the biomarker.

Description

Method for extracting a biomarker for diagnosis of biliary tract cancer, computing device therefor, biomarker for diagnosis of biliary tract cancer, and biliary tract cancer diagnosis device comprising same

The present invention relates to a method for extracting a biomarker for the diagnosis of biliary tract cancer, a computing device therefor, a biomarker for the diagnosis of biliary tract cancer, and a biliary tract cancer diagnosis apparatus including the biomarker. More specifically, the present invention relates to a method for extracting a biomarker for the diagnosis of biliary tract cancer using microRNA obtained from blood or tissue, a computing device therefor, a biomarker for the diagnosis of biliary tract cancer, and a biliary tract cancer diagnosis apparatus including the biomarker.

The bile ducts are tubes that carry bile produced in the liver to the duodenum. Within the liver they gradually merge and thicken, like the branches of a tree converging on a single limb, and in most cases the left and right ducts join into one as they leave the liver. The bile ducts are divided into the intrahepatic bile ducts, which run through the liver, and the extrahepatic bile ducts, which extend from the liver to the duodenum. The pouch along the extrahepatic bile ducts that temporarily stores and concentrates bile is called the gallbladder, and the intrahepatic and extrahepatic bile ducts together with the gallbladder are collectively called the biliary tract.

Biliary tract cancer, also called bile duct cancer, is a malignant tumor arising in the epithelium of the bile duct. It is divided by site of origin into intrahepatic and extrahepatic biliary tract cancer; in general usage, the term mainly refers to cancer arising in the extrahepatic bile duct. In this specification, unless otherwise indicated, the term refers to both intrahepatic and extrahepatic biliary tract cancer.

Biliary tract cancer often spreads by infiltrating the surrounding tissue and does not form a distinct tumor mass, so it is difficult to identify and diagnose accurately. With recent advances in diagnostic imaging, biliary tract cancer is diagnosed using techniques such as abdominal ultrasonography, computed tomography (CT), magnetic resonance imaging (MRI), percutaneous transhepatic cholangiography (PTC), percutaneous transhepatic biliary drainage (PTBD), endoscopic retrograde cholangiopancreatography (ERCP), and angiography. However, these imaging techniques are expensive and complicated and are of practically no use for early diagnosis, so the development of a biomarker for diagnosing biliary tract cancer, particularly for early diagnosis, is urgently needed.

In this regard, dozens of biomarkers for other carcinomas have emerged over the past 20 years, but no biomarker for biliary tract cancer has yet been commercialized.

Meanwhile, microRNA (miRNA) refers to a short, single-stranded, non-coding RNA molecule of about 17 to 25 nucleotides. MicroRNA is known to regulate the expression of protein-producing genes by interfering with the transcription process of a target mRNA (gene) or by causing the mRNA to be degraded. MicroRNA is known to be present not only in tissue but also in blood.

In addition, for ease of handling and diagnosis, a biomarker that uses a tissue or blood sample needs to be developed. Blood samples are particularly advantageous.

To solve the above problems, the present invention aims to provide a method for extracting a biomarker for the diagnosis of biliary tract cancer using microRNA obtained from blood or tissue, and a computing device therefor. The present invention also aims to provide a biomarker for the diagnosis of biliary tract cancer and a biliary tract cancer diagnosis apparatus including the biomarker.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

A method for extracting a biomarker for the diagnosis of biliary tract cancer according to an embodiment of the present invention includes: calculating an interaction score that quantifies the degree of complementary binding between a microRNA and a gene; determining the n microRNA-gene pairs having the highest interaction scores; and extracting, from the n microRNA-gene pairs, the microRNAs paired with genes that are also among the genes specifically expressed in patients with biliary tract cancer.
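The three steps of this method can be illustrated with a minimal Python sketch. It assumes the interaction scores have already been computed and are held in a dictionary keyed by (miRNA, gene) pairs; the function name, the placeholder identifiers such as `miR-A` and `GENE1`, and the example scores are hypothetical and are not taken from the specification.

```python
def extract_candidate_mirnas(interaction_scores, specifically_expressed_genes, n):
    """Return miRNAs from the top-n scoring pairs whose gene is specifically expressed.

    interaction_scores: dict mapping (mirna, gene) -> interaction score (assumed precomputed).
    specifically_expressed_genes: iterable of gene names found to be over- or
        under-expressed in patients (assumed to come from a separate analysis).
    """
    # Step 2: keep the n pairs with the highest interaction scores.
    top_pairs = sorted(interaction_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

    # Step 3: keep miRNAs whose paired gene is in the specifically expressed list.
    gene_set = set(specifically_expressed_genes)
    selected = []
    for (mirna, gene), _score in top_pairs:
        if gene in gene_set and mirna not in selected:
            selected.append(mirna)
    return selected


# Hypothetical example data (not from the specification).
scores = {
    ("miR-A", "GENE1"): 0.9,
    ("miR-B", "GENE2"): 0.8,
    ("miR-C", "GENE3"): 0.7,
    ("miR-D", "GENE1"): 0.2,
}
candidates = extract_candidate_mirnas(scores, ["GENE1", "GENE3"], n=3)
print(candidates)
```

Here `miR-D` is dropped because its pair falls outside the top three, and `miR-B` is dropped because `GENE2` is not in the specifically expressed list.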

A biomarker for the diagnosis of biliary tract cancer according to an embodiment of the present invention uses tissue as the biological sample and includes hsa-miR-21-5p, hsa-miR-93-5p, hsa-miR-106b-5p, hsa-miR-155-5p, hsa-miR-181a-5p, hsa-miR-200a-3p, hsa-miR-200b-3p, hsa-miR-222-3p, and hsa-miR-331-3p.

A biomarker for the diagnosis of biliary tract cancer according to an embodiment of the present invention uses blood as the biological sample and includes hsa-miR-21-5p, hsa-miR-93-5p, hsa-miR-106b-5p, hsa-miR-155-5p, hsa-miR-181a-5p, and hsa-miR-222-3p.

An apparatus for diagnosing biliary tract cancer according to an embodiment of the present invention includes the above-described biomarker.

The solutions proposed by the present invention are not limited to those mentioned above, and other solutions not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

The present invention can provide a biomarker for the diagnosis of biliary tract cancer with high specificity and sensitivity. The present invention can also provide a method for discovering a biomarker for the diagnosis of biliary tract cancer.

The technical effects of the present invention are not limited to those mentioned above, and other technical effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a block diagram of a computing device according to the present invention.
FIG. 2 is a conceptual diagram illustrating an example of calculating an interaction score between a miRNA and a gene.
FIG. 3 is a flowchart of a method of calculating an interaction score.
FIG. 4 is a conceptual diagram illustrating a method of calculating a correlation value between a similar miRNA and a specific gene using a similarity database.
FIG. 5 is a flowchart of a method of calculating a correlation value between a similar miRNA and a gene using a similarity database.
FIG. 6 is a conceptual diagram illustrating a method of calculating a correlation value between an adjacent miRNA and a specific gene using a miRNA cluster database.
FIG. 7 is a flowchart of a method of calculating a weight value between an adjacent miRNA and a specific gene using a miRNA cluster database.
FIG. 8 is a conceptual diagram illustrating a method of calculating a correlation value between a specific miRNA and a transcription-regulating gene using a transcription factor database.
FIG. 9 is a flowchart of a method of calculating a weight value between a specific miRNA and a transcription-regulating gene using a transcription factor database.
FIG. 10 is a flowchart of a method of discovering a biomarker for biliary tract cancer patients based on the integrated analysis algorithm for biomarker discovery.
FIG. 11 shows the result of a hierarchical cluster analysis of the dataset GSE32957 as a heat map.
FIG. 12 is a conceptual diagram of small RNA sequencing data analysis, one specific application of next-generation sequencing.
FIG. 13 shows the result of a hierarchical cluster analysis, using next-generation sequencing data, of the expression patterns of the nine microRNAs expressed in tissue samples according to the present invention.
FIG. 14 shows the result of a hierarchical cluster analysis, using next-generation sequencing data, of the expression patterns of the six microRNAs expressed in blood samples according to the present invention.

Hereinafter, a computing device related to the present invention will be described in more detail with reference to the drawings.

The suffixes "module" and "unit" attached to components in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not in themselves carry distinct meanings or roles.

The present invention discloses a computing device 100 to which an integrated analysis algorithm for biomarker discovery is applied, and a biomarker discovered through the computing device 100. The computing device 100 described here may include a high-speed arithmetic processing device using electronic circuits, such as a personal computer, a workstation, or a supercomputer. In addition to such stationary devices, mobile devices such as smartphones, PDAs, and laptops that include a central processing unit (CPU) and can perform arithmetic processing may also be included among computing devices.

FIG. 1 is a block diagram of a computing device according to the present invention. Referring to FIG. 1, the computing device 100 according to the present invention may include a storage unit 110, a user input unit 120, a communication unit 130, and a control unit 140.

The storage unit 110 may store a program for the operation of the control unit 140 and may temporarily store input and output data (for example, a database). The storage unit 110 may also store data that the communication unit 130 transmits and receives while communicating.

The storage unit 110 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

The user input unit 120 receives user input from a user. The user input unit 120 may include a keyboard, a mouse, and the like.

The communication unit 130 receives data from, or transmits data to, an external source through communication. The communication unit 130 according to the present invention may receive various databases from a remote server.

The control unit 140 controls the overall operation of the computing device 100 and performs various calculations. The control unit 140 according to the present invention may calculate the interaction scores, correlation values, and other quantities described later, and may perform the calculations for extracting a biomarker for the diagnosis of biliary tract cancer.

The computing device 100 according to the present invention may further include a display unit 150 for outputting information. The display unit 150 may serve as an output device that displays the user's input and outputs the calculation results of the control unit 140. The display unit 150 may be a device that assists the computing device 100, such as a monitor.

The configurations and methods of the embodiments described below are not applied to the computing device 100 in a limited manner; rather, all or some of the embodiments may be selectively combined so that various modifications can be made.

A method of discovering a biomarker for the diagnosis of biliary tract cancer using the above-described computing device 100 will now be described in detail.

The integrated analysis algorithm for biomarker discovery described in the present invention may be a combination of a differentially expressed gene (DEG) analysis algorithm and a microRNA target gene analysis algorithm.

First, the differentially expressed gene analysis algorithm will be described. This algorithm identifies, with statistical significance, genes that are over-expressed or under-expressed in patients with biliary tract cancer compared with healthy subjects. It aims to find genes that can distinguish the healthy group from the patient group using a linear model, an advanced statistical method that can take various factors into account (reference: Statistical Applications in Genetics and Molecular Biology, Vol. 3, No. 1, Article 3).

The differentially expressed gene analysis algorithm can be broadly divided into a data normalization step and a statistical analysis step. The data normalization step integrates and corrects the microarray data for all human genes obtained from the healthy group and the patient group. The Robust Multichip Average (RMA) algorithm may be used for data normalization (reference: Biostatistics, Vol. 4, No. 2, 249-264).

The statistical analysis step selects genes whose expression levels differ with statistical significance between the two groups (that is, the healthy group and the patient group) by applying a linear model to the normalized data. For statistical significance, genes may be selected whose q-value, which is the p-value corrected using the false discovery rate (FDR) method (reference: Journal of the Royal Statistical Society, Series B (Methodological), Vol. 57, No. 1, 289-300), is 0.01 or less.
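The selection rule in this step can be sketched in Python as follows. The sketch assumes that one p-value per gene has already been obtained from the linear model fit (not shown here) and applies the step-up FDR correction described in the cited reference, commonly known as the Benjamini-Hochberg procedure. The function names, gene labels, and p-values are hypothetical illustrations, not data from the specification.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (q-values) in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_genes(p_by_gene, q_cutoff=0.01):
    """Keep genes whose q-value is at or below the cutoff (0.01 in the text)."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q <= q_cutoff]


# Hypothetical per-gene p-values, assumed to come from the linear model fit.
p_by_gene = {"GENE1": 0.0001, "GENE2": 0.004, "GENE3": 0.03, "GENE4": 0.5}
selected = select_genes(p_by_gene)
print(selected)
```

With these four hypothetical genes, only `GENE1` and `GENE2` have q-values at or below 0.01. In practice an established statistics package would normally be used instead of a hand-written correction.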

To discover a biomarker for the diagnosis of biliary tract cancer, the computing device 100 of the present invention may use a list of genes that are specifically expressed (over-expressed or under-expressed) in patients with biliary tract cancer, obtained with the differentially expressed gene analysis algorithm. Because obtaining such a list with a differentially expressed gene analysis algorithm is a known technique, a detailed description is omitted.

Next, the microRNA target gene analysis algorithm will be described. The microRNA target gene analysis algorithm described in the present invention provides a statistical equation for accurately finding the target genes of a microRNA using at least one of the following: a microRNA target gene prediction value obtained from existing microRNA databases, a correlation value between the expression patterns of microRNAs and genes obtained through microarray experiments, and a weight value based on biological mechanisms.

The methods of calculating the microRNA target gene prediction value (or interaction score), the correlation value, and the weight value are described in detail below. For convenience of description, the terms miRNA and gene as used in the present invention are taken to be equivalent to microRNA and gene, respectively.

Calculation of the microRNA target gene prediction value

The computing device 100 according to the present invention may calculate an interaction score that quantifies the degree of complementary binding between a microRNA and its target gene. The interaction score indicates how likely complementary binding between the microRNA and its target gene is to occur. The method of calculating the interaction score is described in detail with reference to the drawings below.

FIG. 2 is a conceptual diagram illustrating an example of calculating an interaction score between a miRNA and a gene, and FIG. 3 is a flowchart of a method of calculating an interaction score.

Referring to FIGS. 2 and 3, the computing device 100 may first obtain, using at least one miRNA target prediction tool, a database that compiles prediction scores between miRNAs and genes (S310).

A miRNA target prediction tool may refer to a software tool that quantifies the degree of binding for a pair consisting of a target gene and a miRNA that can bind complementarily to the target gene and thereby suppress the process by which the target gene is made into protein. miRNA target prediction tools for obtaining the prediction scores of gene-miRNA pairs may include TargetScan, miRDB, DIANA-microT, PITA, miRanda, MicroCosm, RNAhybrid, PicTar, and RNA22. A brief description of each miRNA target prediction tool is given in Table 1.

Table 1
Tool name | Tool description (information used) | Reference site
TargetScan | Sequence similarity information and conservation information | http://www.ncbi.nlm.nih.gov/pubmed/18955434
miRDB | Sequence similarity information, thermodynamic stability information, and conservation information | http://www.ncbi.nlm.nih.gov/pubmed/18426918
DIANA-microT | Sequence similarity information and thermodynamic stability information | http://www.ncbi.nlm.nih.gov/pubmed/15131085
PITA | Sequence similarity information and thermodynamic stability information | http://www.ncbi.nlm.nih.gov/pubmed/17893677
miRanda | Thermodynamic stability information and conservation information | http://www.ncbi.nlm.nih.gov/pubmed/14709173
MicroCosm | Thermodynamic stability information and conservation information | http://www.ebi.ac.uk/enright-srv/microcosm/htdocs/targets/v5/info.html
RNAhybrid | Thermodynamic stability information | http://www.ncbi.nlm.nih.gov/pubmed/15383676
PicTar | Sequence similarity information and conservation information | http://www.ncbi.nlm.nih.gov/pubmed/15806104
RNA22 | Sequence pattern information | http://www.ncbi.nlm.nih.gov/pubmed/16990141

Using a target prediction tool, prediction scores can be assigned to the various genes capable of complementary binding to a miRNA. A lower prediction score may mean a lower likelihood of complementary binding between the miRNA and the gene.

The target prediction tool may be run by the computing device 100 according to the present invention, with the database of compiled miRNA-gene pair prediction scores obtained through the processing of the control unit 140, but the invention is not limited to this. The computing device 100 may instead obtain the database of compiled miRNA-gene pair prediction scores from a remote server that uses the target prediction tool.

To increase the reliability of the prediction scores of miRNA-gene pairs, it is preferable to obtain a plurality of databases using a plurality of target prediction tools rather than a single tool. FIG. 2 illustrates the case in which PITA, DIANA-microT, TargetScan, MicroCosm, miRDB, and miRanda are used as the target prediction tools.

When a plurality of databases of compiled miRNA-gene pair prediction scores have been obtained using a plurality of target prediction tools, the control unit 140 may, in order to normalize the databases, calculate a normalized score based on the rank of each miRNA-gene pair's prediction score (S320).

As the example in Table 1 shows, the miRNA target prediction tools use different information, and each database may apply a different unit when assigning prediction scores, so normalization is essential when a plurality of databases is to be used. To normalize the prediction scores of miRNA-gene pairs, the control unit 140 may rank the miRNA-gene pairs within each database by prediction score, convert each rank into a standard score, and sum a pair's standard scores across the databases to obtain its normalized score. Equation 1 illustrates an equation used to obtain the normalized score.

In Equation 1, i denotes the i-th database; n denotes the number of databases (for example, n may be set to 6 in FIG. 2, where six databases were obtained through six prediction tools); T_i denotes the total number of miRNA-gene pairs in the i-th database; and R_{i,j} denotes the rank of the j-th miRNA-gene pair in the i-th database.
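Equation 1 itself is not reproduced in this text. A form consistent with the symbol definitions above and with the worked example given in the description (a pair ranked 20th out of 100 receives a standard score of (100+1-20)/100 = 0.81) would be the following. This is a reconstruction under that assumption, not the equation as filed, and S_j is a symbol introduced here for the normalized score of the j-th pair.

```latex
S_j = \sum_{i=1}^{n} \frac{T_i + 1 - R_{i,j}}{T_i}
```

Whether the sum is further scaled (for example, divided by n) is not stated in this text.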

For example, in a first database containing 100 miRNA-gene pairs, if the prediction score of the miRNA1-gene1 pair ranks 20th among the 100 pairs, the standard score of the miRNA1-gene1 pair in the first database is (100+1-20)/100 = 0.81. The control unit 140 may calculate the normalized score of the miRNA1-gene1 pair by adding to this value the standard scores of the miRNA1-gene1 pair in the second through n-th databases.
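The rank-based normalization described here can be sketched in Python. The sketch makes several assumptions that the text does not settle: each database is a dictionary of raw prediction scores in which a higher score ranks first, a pair absent from a database contributes nothing to the sum, and ties are broken by insertion order. The function names and the two small example databases are hypothetical.

```python
def standard_score(total_pairs, rank):
    """Convert a 1-based rank within a database into a standard score."""
    if not 1 <= rank <= total_pairs:
        raise ValueError("rank must be between 1 and total_pairs")
    return (total_pairs + 1 - rank) / total_pairs


def normalized_score(pair, databases):
    """Sum the standard scores of `pair` over all databases that contain it."""
    total = 0.0
    for db in databases:
        if pair not in db:
            continue  # assumption: a missing pair adds nothing
        # Rank pairs from the highest raw prediction score to the lowest.
        ranked = sorted(db, key=db.get, reverse=True)
        rank = ranked.index(pair) + 1
        total += standard_score(len(db), rank)
    return total


# The worked example from the text: 20th of 100 pairs.
print(standard_score(100, 20))

# Two hypothetical databases with different score scales.
db1 = {("miRNA1", "gene1"): 0.9, ("miRNA3", "gene1"): 0.5}
db2 = {("miRNA1", "gene1"): 10.0, ("miRNA3", "gene1"): 42.0, ("miRNA1", "gene3"): 7.0}
example = normalized_score(("miRNA1", "gene1"), [db1, db2])
print(example)
```

Because only ranks are used, the different score scales of the two databases do not affect the result.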

Thereafter, the control unit 140 may rank, on the basis of the normalized scores, the miRNAs for a given gene and the genes for a given miRNA (S330).

For example, when the miRNAs capable of complementary binding to gene1 are miRNA1, miRNA3, and miRNA4, the control unit 140 may rank these miRNAs in descending order of complementary binding strength to gene1 (that is, in descending order of normalized score), based on the normalized scores of gene1-miRNA1, gene1-miRNA3, and gene1-miRNA4. In FIG. 2, the normalized score of miRNA1-gene1 is 0.4 and that of miRNA3-gene1 is 0.6, so for gene1, miRNA3 ranks first and miRNA1 ranks second.

The genes for a given miRNA can be ranked in the same way. For example, when the genes capable of complementary binding to miRNA1 are gene1 and gene3, the control unit 140 may rank these genes in descending order of complementary binding strength to miRNA1 (that is, in descending order of normalized score), based on the normalized scores of miRNA1-gene1 and miRNA1-gene3. In FIG. 2, the normalized score of miRNA1-gene1 is 0.4 and that of miRNA1-gene3 is 0.5, so for miRNA1, gene3 ranks first and gene1 ranks second.
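The two rankings of step S330 can be illustrated with the normalized scores quoted from FIG. 2. The sketch below is a minimal illustration rather than the patented implementation; the function name and data layout are assumptions, and ties are not treated specially.

```python
from collections import defaultdict


def rank_pairs(normalized):
    """Rank miRNAs per gene and genes per miRNA by descending normalized score.

    `normalized` maps (mirna, gene) -> normalized score.
    Returns (mirna_rank, gene_rank), both keyed by (mirna, gene); rank 1 is best.
    """
    by_gene = defaultdict(list)
    by_mirna = defaultdict(list)
    for (mirna, gene), score in normalized.items():
        by_gene[gene].append((mirna, score))
        by_mirna[mirna].append((gene, score))

    # Rank of each miRNA among the miRNAs that pair with the same gene.
    mirna_rank = {}
    for gene, partners in by_gene.items():
        ordered = sorted(partners, key=lambda p: p[1], reverse=True)
        for rank, (mirna, _) in enumerate(ordered, start=1):
            mirna_rank[(mirna, gene)] = rank

    # Rank of each gene among the genes that pair with the same miRNA.
    gene_rank = {}
    for mirna, partners in by_mirna.items():
        ordered = sorted(partners, key=lambda p: p[1], reverse=True)
        for rank, (gene, _) in enumerate(ordered, start=1):
            gene_rank[(mirna, gene)] = rank

    return mirna_rank, gene_rank


# Normalized scores quoted from FIG. 2 in the text.
scores = {
    ("miRNA1", "gene1"): 0.4,
    ("miRNA3", "gene1"): 0.6,
    ("miRNA1", "gene3"): 0.5,
}
mirna_rank, gene_rank = rank_pairs(scores)
```

Here `mirna_rank[("miRNA1", "gene1")]` is 2 and `gene_rank[("miRNA1", "gene1")]` is 2, matching the ranks described for FIG. 2.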

Thereafter, the control unit 140 may calculate an interaction score between a gene and a miRNA on the basis of the gene and miRNA ranks (S340). Equation 2 illustrates the formula used to calculate the interaction score.

In Equation 2, t_mi is the number of genes paired with the i-th miRNA (number of miRNA_i-gene pairs), t_gj is the number of miRNAs paired with the j-th gene (number of gene_j-miRNA pairs), r_mi is the normalized-score rank of the i-th miRNA for the j-th gene, and r_gj is the normalized-score rank of the j-th gene for the i-th miRNA.

Correlation calculation

The target miRNA prediction tools described above do not hold data on every miRNA and every gene in the human body. The present invention may therefore also obtain interaction scores for miRNA-gene pairs that the prediction tools cannot predict, by using the similarity between miRNAs, the influence of neighboring miRNAs, the transcription factors of genes, and the like.

Example 1 - Weight calculation based on correlation

The computing device 100 according to the present invention may obtain a correlation value for the expression patterns of a specific miRNA and a specific gene measured in a microarray experiment, and may predict a correlation value between the specific gene and a similar miRNA, that is, a miRNA similar to the specific miRNA. The calculation of the correlation value between the similar miRNA and the specific gene is described in detail with reference to the following drawings.

FIG. 4 is a conceptual diagram illustrating a method of calculating a correlation value between a similar miRNA and a specific gene using a similarity database, and FIG. 5 is a flowchart of a method of calculating a correlation value between a similar miRNA and a gene using the similarity database.

First, when experimental data including gene expression profiles and miRNA expression profiles obtained through a microarray experiment are input (S510), the control unit 140 may calculate the correlation between a specific miRNA and a specific gene on the basis of the input experimental data (S520).

To describe the microarray experiment first: a gene microarray, also called a DNA microarray, is a tool for measuring the expression levels of all or some of an organism's genes. The introduction of gene microarrays extended the ability to observe genes from the level of individual genes to the whole organism, making it possible to study an organism as a single system. Because a gene microarray essentially parallelizes existing gene detection techniques and performs them on a large scale, it has also brought a fundamental change in how data are processed and analyzed. A gene microarray experiment generally proceeds as follows. First, thousands to tens of thousands of gene sequences are immobilized on a slide surface of about 1 cm². RNA is then extracted from cells collected under various experimental conditions, reverse-transcribed into DNA, and labeled with a fluorescent substance. Next, the labeled DNA is hybridized to the microarray, the microarray is scanned to produce an image, and an image analysis program measures the fluorescence intensity at each gene position. Finally, whether each gene is expressed, and to what degree, is analyzed against the quantified gene expression data using informatics such as mathematics, statistics, and computer engineering.

Through such a microarray experiment, the expression levels of a specific miRNA and a specific gene can be quantified. The correlation between a specific miRNA and a specific gene may be a Pearson correlation, which indicates the relative increase or decrease in the expression level of the miRNA as the expression level of the gene increases.
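As an illustration of steps S510 to S520, the following sketch computes a Pearson correlation between one miRNA expression profile and one gene expression profile. The expression values are made-up placeholders, not data from the experiments described in this document, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (norm_x * norm_y)


# Hypothetical profiles over four samples: the gene falls as the miRNA rises.
mirna_profile = [1.0, 2.0, 3.0, 4.0]
gene_profile = [8.0, 6.0, 4.0, 2.0]
r = pearson(mirna_profile, gene_profile)
```

A strongly negative value, as in this toy case, is what one would expect when a miRNA represses its target gene.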

Thereafter, the computing device 100 may obtain, from a miRNA similarity database, the similarity value of a similar miRNA, that is, a miRNA similar to the specific miRNA (S530). The miRNA similarity database may contain similarity values that quantify the functional similarity between miRNAs. The miRNA similarity database may have been obtained with the known BLAST or BLAT tools.

Thereafter, the computing device 100 may calculate the correlation between the similar miRNA and the specific gene using the similarity value (S540). A linear regression model based on the similarity value may be used to calculate the weight between the similar miRNA and the gene.

Example 2 - Correlation calculation considering the influence of neighboring miRNAs

The computing device 100 according to the present invention may also calculate a correlation value for a neighboring miRNA that forms a cluster with a specific miRNA. The calculation of correlation values in consideration of the influence between miRNAs is described with reference to the following drawings.

FIG. 6 is a conceptual diagram illustrating a method of calculating a correlation value between a neighboring miRNA and a specific gene using a miRNA cluster database, and FIG. 7 is a flowchart of a method of calculating a weight value between a neighboring miRNA and a specific gene using the miRNA cluster database.

First, when experimental data including gene expression profiles and miRNA expression profiles obtained through a microarray experiment are input (S710), the control unit 140 may calculate the correlation between a specific miRNA and a specific gene on the basis of the input experimental data (S720).

Thereafter, the computing device 100 may use a miRNA cluster database to extract neighboring miRNAs located within an effective distance of influence from the specific miRNA input as experimental data (S730). The miRNA cluster database contains distance data between miRNAs, and the computing device 100 may determine that a miRNA located within 10 kb (kilobases) of the specific miRNA is within the effective distance. The effective distance need not be set to 10 kb, however, and may be chosen freely.
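Step S730 amounts to a distance filter, which might be sketched as follows. The text says only that the cluster database contains distance data between miRNAs; representing each miRNA by a chromosome and a base-pair position is an assumption made for this illustration, and the names and coordinates are hypothetical.

```python
def neighbors_within(target, positions, max_distance=10_000):
    """Return miRNAs on the same chromosome within `max_distance` bases of `target`.

    `positions` maps a miRNA name to (chromosome, position in bases).
    The default of 10,000 bases mirrors the 10 kb effective distance in the text.
    """
    chrom, pos = positions[target]
    return sorted(
        name
        for name, (other_chrom, other_pos) in positions.items()
        if name != target
        and other_chrom == chrom
        and abs(other_pos - pos) <= max_distance
    )


# Hypothetical coordinates for illustration only.
positions = {
    "miRNA_i": ("chr1", 100_000),
    "miRNA_l": ("chr1", 104_500),   # 4.5 kb away: a neighbor
    "miRNA_x": ("chr1", 250_000),   # 150 kb away: too far
    "miRNA_y": ("chr2", 101_000),   # different chromosome
}
neighbors = neighbors_within("miRNA_i", positions)
```

Because `max_distance` is a parameter, the same function covers the case where an effective distance other than 10 kb is chosen.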

Thereafter, the computing device 100 may calculate a correlation value between a gene and a neighboring miRNA located within the effective distance of the specific miRNA (S740). For example, in FIG. 6, when miRNA_l is a neighboring miRNA of miRNA_i, the computing device 100 may calculate a correlation value for miRNA_l-gene_m.

Example 3 - Correlation calculation considering transcription factors

The computing device 100 according to the present invention may calculate a correlation value in consideration of transcription factors between genes. This calculation is described with reference to the following drawings.

FIG. 8 is a conceptual diagram illustrating a method of calculating a correlation value between a specific miRNA and a transcription-regulating gene using a transcription factor database, and FIG. 9 is a flowchart of a method of calculating a weight value between a specific miRNA and a transcription-regulating gene using the transcription factor database.

First, when experimental data including gene expression profiles and miRNA expression profiles obtained through a microarray experiment are input (S910), the control unit 140 may calculate the correlation between a specific miRNA and a specific gene on the basis of the input experimental data (S920).

Thereafter, the computing device 100 may check a transcription factor database for the presence of a transcription-regulating gene, that is, a gene that binds specifically to the DNA of the transcriptional regulatory region of the specific gene and activates or represses its transcription (S930).

If a transcription-regulating gene exists for the specific gene, the computing device 100 may calculate a correlation value between the transcription-regulating gene and the miRNA (S940). For example, in FIG. 8, when the transcription-regulating gene of gene_m is gene_n, the computing device 100 may calculate the correlation value of miRNA_a-gene_m on the basis of the correlation value of miRNA_a-gene_n.

On the basis of the correlation values calculated in Examples 1 to 3, the computing device 100 may calculate the interaction score of a similar miRNA for a gene, the interaction score of a neighboring miRNA for a gene, and the interaction score of a transcription-regulating gene for a miRNA.

Once the interaction scores between miRNAs and genes have been derived by the microRNA target gene analysis algorithm, the computing device 100 may detect the biomarker using a list of genes specifically expressed in biliary tract cancer patients, obtained with a differentially expressed gene analysis algorithm.

A method of analyzing a biomarker for biliary tract cancer patients on the basis of the integrated analysis algorithm for biomarker discovery described above is now described in detail.

FIG. 10 is a flowchart of a method of discovering a biomarker for biliary tract cancer patients on the basis of the integrated analysis algorithm for biomarker discovery. For convenience of explanation, it is assumed below that the computing device 100 already stores a list of genes that a differentially expressed gene analysis algorithm has identified as specifically expressed (e.g., overexpressed or underexpressed) in biliary tract cancer patients relative to healthy subjects.

Referring to FIG. 10, the computing device 100 may calculate the interaction scores between miRNAs and genes using the microRNA target gene analysis algorithm (S1010). The calculation of the interaction score has been described above with reference to FIGS. 4 to 9, so a detailed description is omitted here.

Thereafter, the computing device 100 may select the miRNA-gene pairs whose interaction scores rank within the top n (S1020), take the intersection of the genes in the selected pairs with the list of genes that the differentially expressed gene analysis algorithm identifies as specifically expressed in biliary tract cancer patients relative to healthy subjects, and determine the set of miRNAs paired with the genes in that intersection as the biomarker for diagnosing biliary tract cancer (S1030). In other words, the biomarker is the set of miRNAs paired with genes that both have a high interaction score and are identified by the differentially expressed gene analysis as specifically expressed in biliary tract cancer patients.

As another example, the computing device 100 may determine m genes in descending order of the interaction score of their miRNA-gene pairs, and determine as the biomarker for diagnosing biliary tract cancer the miRNAs paired with the genes in the intersection of those m genes and the list of genes that the differentially expressed gene analysis algorithm identifies as specifically expressed in biliary tract cancer patients relative to healthy subjects.
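Steps S1020 and S1030 can be sketched as a top-n selection followed by a set intersection. This is a minimal illustration under stated assumptions: the function name, the miRNA and gene names, and the scores are all hypothetical, and the list of differentially expressed genes is taken as given.

```python
def select_biomarker_mirnas(interaction_scores, deg_genes, n):
    """Pick miRNAs from the top-n scoring pairs whose gene is differentially expressed.

    `interaction_scores` maps (mirna, gene) -> interaction score.
    `deg_genes` is the set of genes specifically expressed in patients.
    """
    ranked = sorted(interaction_scores.items(), key=lambda kv: kv[1], reverse=True)
    top_pairs = [pair for pair, _ in ranked[:n]]          # S1020: top-n pairs
    return sorted({mirna for mirna, gene in top_pairs     # S1030: intersect with DEGs
                   if gene in deg_genes})


# Hypothetical scores and gene list for illustration only.
interaction_scores = {
    ("miR-A", "GENE1"): 0.9,
    ("miR-B", "GENE2"): 0.8,
    ("miR-C", "GENE3"): 0.7,
    ("miR-D", "GENE1"): 0.2,
}
deg_genes = {"GENE1", "GENE3"}
biomarkers = select_biomarker_mirnas(interaction_scores, deg_genes, n=3)
```

In this toy case miR-B drops out because GENE2 is not differentially expressed, and miR-D drops out because its pair is not among the top three.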

When six miRNA prediction tools (Targetscan, miRDB, DIANA-microT, PITA, miRanda, and MicroCosm) are used and tissue is used as the biological sample, hsa-miR-21-5p, hsa-miR-93-5p, hsa-miR-106b-5p, hsa-miR-155-5p, hsa-miR-181a-5p, hsa-miR-200a-3p, hsa-miR-200b-3p, hsa-miR-222-3p, and hsa-miR-331-3p may be determined as the biomarker for diagnosing biliary tract cancer. These form the set of miRNAs paired with the top n genes by miRNA-gene interaction score (genes with a q-value of 0.05 or less and, at the same time, a correlation value of -0.5 or less).

In addition, when blood is used as the biological sample, hsa-miR-21-5p, hsa-miR-93-5p, hsa-miR-106b-5p, hsa-miR-155-5p, hsa-miR-181a-5p, and hsa-miR-222-3p are determined as the biomarker for diagnosing biliary tract cancer.

The base sequence of each miRNA belonging to the biomarker described above is shown in Table 2 below.

Table 2

| Mature_id | miRNA_id | Sequence |
| hsa-miR-21-5p | hsa-mir-21 | UAGCUUAUCAGACUGAUGUUGA |
| hsa-miR-93-5p | hsa-mir-93 | CAAAGUGCUGUUCGUGCAGGUAG |
| hsa-miR-106b-5p | hsa-mir-106b | UAAAGUGCUGACAGUGCAGAU |
| hsa-miR-155-5p | hsa-mir-155 | UUAAUGCUAAUCGUGAUAGGGGU |
| hsa-miR-181a-5p | hsa-mir-181a-1, hsa-mir-181a-2 | AACAUUCAACGCUGUCGGUGAGU |
| hsa-miR-200a-3p | hsa-mir-200a | UAACACUGUCUGGUAACGAUGU |
| hsa-miR-200b-3p | hsa-mir-200b | UAAUACUGCCUGGUAAUGAUGA |
| hsa-miR-222-3p | hsa-mir-222 | AGCUACAUCUGGCUACUGGGU |
| hsa-miR-331-3p | hsa-mir-331 | GCCCCUGGGCCUAUCCUAGAA |

The experimental procedure and results for the biomarker for diagnosing biliary tract cancer obtained as described above are detailed below.

Biliary tract cancer patient samples and microarray experiment

Intrahepatic cholangiocellular carcinoma (ICC) and combined hepatocellular cholangiocarcinoma (CHC) tissues were obtained, with informed consent, from Asian patients who underwent curative resection at the Liver Cancer Institute and Zhongshan Hospital (Fudan University, Shanghai, China) between 2002 and 2003 and at Kanazawa University Hospital (Ishikawa, Japan) between 2008 and 2010.

The samples were approved by the Institutional Review Board of each institution and recorded by the Office of Human Subjects Research of the U.S. National Institutes of Health (NIH). A total of 23 ICC and CHC cases were used to build the mRNA and microRNA signatures. An early diagnosis was made on the basis of serum tests and imaging, and pathologists confirmed it histopathologically. The characterization of 68 Caucasian ICC patients from an independent cohort has recently been described (Hepatology, Vol. 56, No. 5, 1792-803).

Verification of the biomarker set of the present invention

The ability of the tissue-sample biomarker set of the present invention to identify biliary tract cancer was verified on a total of 35 subjects: 25 biliary tract cancer patients and 10 healthy subjects. Using blood collected from these subjects, classification was performed by hierarchical clustering (Euclidean distance, complete method) on the GEO (Gene Expression Omnibus) dataset GSE32957. The result was very good, with a sensitivity for biliary tract cancer of 96% (24/25) and a specificity of 100% (10/10). FIG. 11 shows the hierarchical clustering result for dataset GSE32957 as a heat map. In FIG. 11, the red bar at the top of the heat map indicates cancer patients and the blue bar indicates healthy subjects.

The tissue-sample and blood-sample biomarkers of the present invention were also verified separately. The tissue-sample biomarker was tested on a total of 4 subjects (2 biliary tract cancer patients and 2 healthy subjects), and the blood-sample biomarker on 8 biliary tract cancer patients and 2 healthy subjects. Using tissue and blood collected from these subjects, respectively, classification was performed by hierarchical clustering (Euclidean distance, complete method) on small RNA sequencing data, a next-generation sequencing (NGS) method. A general description of this small RNA sequencing data analysis is given in FIG. 12. For tissue samples, the biomarker showed a sensitivity for biliary tract cancer of 100% (2/2) and a specificity of 100% (2/2); the hierarchical clustering result for the small RNA sequencing data in this case is shown in FIG. 13. For blood samples, the biomarker showed a sensitivity of 75% (6/8) and a specificity of 50% (1/2); the hierarchical clustering result in this case is shown in FIG. 14. In FIGS. 13 and 14, the red bar at the top of the heat map indicates cancer patients and the blue bar indicates healthy subjects.
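The sensitivity and specificity figures quoted in this verification section follow directly from the reported counts, as the short sketch below shows. The function names are illustrative; the counts are those stated in the text.

```python
def sensitivity(true_positives, patients):
    """Fraction of cancer patients correctly classified as cancer."""
    return true_positives / patients


def specificity(true_negatives, healthy):
    """Fraction of healthy subjects correctly classified as healthy."""
    return true_negatives / healthy


# (correctly classified patients, patients, correctly classified healthy, healthy)
results = {
    "GSE32957": (24, 25, 10, 10),
    "small RNA-seq, tissue": (2, 2, 2, 2),
    "small RNA-seq, blood": (6, 8, 1, 2),
}

summary = {
    name: (sensitivity(tp, p), specificity(tn, h))
    for name, (tp, p, tn, h) in results.items()
}

for name, (sens, spec) in summary.items():
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Note that the small RNA sequencing cohorts are very small (two healthy subjects each), so a single misclassified subject shifts the specificity by 50 percentage points.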

Meanwhile, the biomarker described above is used in an apparatus for diagnosing biliary tract cancer that comprises it. Such apparatuses include diagnostic chips, diagnostic kits, quantitative PCR (qPCR) instruments, point-of-care testing (POCT) instruments, and sequencers. In these apparatuses, known components may be used for everything other than the biomarker set.

According to an embodiment of the present invention, the methods described above can be implemented as processor-readable code on a medium on which a program is recorded. Examples of processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of a carrier wave (for example, transmission over the Internet).

100: computing device
110: storage unit
120: user input unit
130: communication unit
140: control unit
150: display unit

Claims

A method for extracting a biomarker for diagnosing biliary tract cancer, the method comprising:
calculating an interaction score that quantifies the degree of complementary binding between a microRNA and a gene;
determining n microRNA-gene pairs having high interaction scores; and
extracting, from among the n microRNA-gene pairs, the microRNAs paired with genes that are also among the genes specifically expressed in biliary tract cancer patients,
wherein calculating the interaction score comprises:
obtaining one or more databases in which prediction scores between microRNAs and genes are statistically calculated;
calculating a normalized score from the prediction scores between the microRNAs and the genes;
calculating, based on the normalized score, a binding rank of the microRNAs for each gene and a binding rank of the genes for each microRNA; and
calculating the interaction score based on the binding rank of the microRNAs and the binding rank of the genes,
wherein the normalized score is calculated from the prediction-score ranks of the microRNA-gene pairs in the one or more databases, according to Equation 1 below,
[Equation 1]

(In Equation 1, i denotes the i-th database, n is the number of databases, T_i is the total number of microRNA-gene pairs in the i-th database, and R_i,j is the prediction-score rank of the j-th microRNA-gene pair in the i-th database),
and wherein the interaction score is calculated from the rank of the microRNAs for each gene and the rank of the genes for each microRNA, both based on the normalized score, according to Equation 2 below:
[Equation 2]

(In Equation 2, t_mi is the number of genes paired with the i-th microRNA (number of miRNA_i-gene pairs), t_gj is the number of microRNAs paired with the j-th gene (number of gene_j-miRNA pairs), r_mi is the normalized-score rank of the i-th microRNA for the j-th gene, and r_gj is the normalized-score rank of the j-th gene for the i-th microRNA).

The method of claim 1,
wherein the one or more databases are generated using a microRNA target prediction tool.

The method of claim 2,
wherein the microRNA target prediction tool comprises at least one of Targetscan, miRDB, DIANA-microT, PITA, miRanda, MicroCosm, RNAhybrid, PicTar, and RNA22.

A computing device for executing the method for extracting a biomarker for diagnosing biliary tract cancer according to any one of claims 1 to 3, the computing device comprising:
a storage unit for storing data; and
a control unit for performing calculations,
wherein the control unit is configured to calculate an interaction score that quantifies the degree of complementary binding between a microRNA and a gene, determine n microRNA-gene pairs having high interaction scores, and extract, from among the n microRNA-gene pairs, the microRNAs paired with genes that are also among the genes specifically expressed in biliary tract cancer patients.

(Deleted)