KR20170020189A

KR20170020189A - Method and device for mutation prioritization for personalized therapy of one or more patients

Info

Publication number: KR20170020189A
Application number: KR1020150153809A
Authority: KR
Inventors: 가리마 아가왈; 라마 스리칸스 말라바라푸; 시얌순더 아지트 보파르디카; 안태진
Original assignee: 삼성전자주식회사
Priority date: 2015-08-12
Filing date: 2015-11-03
Publication date: 2017-02-22
Also published as: KR102618536B1

Abstract

Provided are a method and an apparatus for performing mutation prioritization for treatment personalized to a patient, and a method and an apparatus for generating a disease knowledge base. For a given set of disease, gene, and mutation, multiple pieces of information belonging to various categories of knowledge sources can be identified. The identified pieces of information are ranked in the disease knowledge base so that the information most relevant to a treatment tailored to the patient's disease, gene, and mutation can be retrieved. A doctor can thereby personalize the treatment for the patient.

Description

The present disclosure relates to the field of clinical genomics, and more particularly to a mutation prioritization method and apparatus for personalized therapy.

Personalized diagnosis based on next-generation sequencing (NGS) has great potential as a valuable tool for clinical decision-making in health care. Its market value is currently estimated at approximately USD 3.93 million and is growing rapidly every year. Personalized diagnosis has been emphasized in connection with genetic diseases, especially cancer. In the United States alone, about one million cancer cases occur each year, and genetic therapy is administered at a low rate (about 25%). NGS-based diagnosis can be important for prescribing an effective treatment for each individual.

Such personalized diagnosis is based on sets of mutations obtained by analyzing an individual's DNA data through an NGS analysis pipeline. These mutations, which characterize the individual's disease, help clinicians deliver tailored treatment. Although the approach is very promising, several challenges need to be addressed before mutation data become useful for personalized therapy. A key issue is that data that are often unstructured, such as mutation-disease associations or cancer-specific targeted therapy information, must be organized into a structured format suitable for automated analysis. Systematic organization of the relevant information plays an essential role in data-driven approaches that derive actionable knowledge for recommending treatments to clinicians and researchers.

Existing approaches often focus on treatments and on prioritizing those treatments. Such approaches may include evidence extracted and compiled from sources such as clinical trials and publications that support a particular treatment. Biomarker data may also be used. In some other approaches, mutations are classified into different classes using sources such as publications.

Therefore, there is a need for a method that takes into account a knowledge base specified by a user, obtains the mutations of a patient, prioritizes the mutations based on data collected from the knowledge base and the like, and assists in determining treatment options based on the information collected for the mutations.

An object is to provide a mutation prioritization method and apparatus for personalized therapy. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

According to one aspect, a method of performing mutation prioritization for personalized therapy includes: obtaining mutation information of a patient to be treated, the mutation information including information associated with at least one of a disease, a gene, and an alteration in genomic DNA; mapping the obtained mutation information to a disease knowledge base; forming mapped mutation information by identifying at least one of the disease, the gene, and the alteration in the genomic DNA that is mapped to data points present in the disease knowledge base; generating, based on the mapped mutation information, a frequency table for each category of knowledge source and for each class within the category; and prioritizing the mapped mutation information in the frequency table based on a prioritization scheme.

According to another aspect, an apparatus for performing mutation prioritization for personalized therapy includes a memory and one or more processors coupled to the memory. The one or more processors obtain mutation information of a patient to be treated, the mutation information including information associated with at least one of a disease, a gene, and an alteration in genomic DNA; map the obtained mutation information to a disease knowledge base; form mapped mutation information by identifying at least one of the disease, the gene, and the alteration in the genomic DNA that is mapped to data points present in the disease knowledge base; generate, based on the mapped mutation information, a frequency table for each category of knowledge source and for each class within the category; and prioritize the mapped mutation information in the frequency table based on a prioritization scheme.

According to still another aspect, a method of generating a disease knowledge base includes: obtaining information related to at least one of a disease, a gene, an alteration in genomic DNA, and a parameter of clinical relevance from one or more knowledge sources belonging to one or more categories; curating the obtained information to extract, from the knowledge sources, one or more data points representing at least one of the disease, the gene, the alteration in the genomic DNA, and the parameter of clinical relevance; forming data on associations of the data points by identifying associations between the data points representing the disease, the gene, and the alteration in the genomic DNA and the data points representing the parameter of clinical relevance; classifying the associations of the data points into one or more classes for linkage of the disease, the gene, and the alteration in the genomic DNA; and generating the disease knowledge base based on the classified associations of the data points of the categories.

According to still another aspect, an apparatus for generating a disease knowledge base includes one or more processors coupled to a memory. The one or more processors obtain information related to at least one of a disease, a gene, an alteration in genomic DNA, and a parameter of clinical relevance from one or more knowledge sources belonging to one or more categories; curate the obtained information to extract, from the knowledge sources, one or more data points representing at least one of the disease, the gene, the alteration in the genomic DNA, and the parameter of clinical relevance; form data on associations of the data points by identifying associations between the data points representing the disease, the gene, and the alteration in the genomic DNA and the data points representing the parameter of clinical relevance; classify the associations of the data points into one or more classes for linkage of the disease, the gene, and the alteration in the genomic DNA; and generate the disease knowledge base based on the classified associations of the data points of the categories.

FIG. 1 is a flowchart of a method of mutation prioritization for personalized treatment of a patient, according to an embodiment.
FIG. 2 is a flow diagram illustrating obtaining variation data of a patient (for example, a VCF file) and generating a frequency table, according to an embodiment.
FIG. 3 is a diagram for describing a step of generating a frequency table from variation data of a patient (for example, a VCF file), according to an embodiment.
FIGS. 4A and 4B are diagrams for describing two prioritization schemes, according to an embodiment.
FIG. 5 is a diagram for describing sorting of mutations based on a clinical trial evidence value that is higher than a therapy evidence value, according to an embodiment.
FIG. 6 is a diagram for describing sorting of mutations based on a therapy evidence value that is higher than a clinical trial evidence value, according to an embodiment.
FIG. 7 is a block diagram of an apparatus for performing mutation prioritization for personalized treatment of one or more patients, according to an embodiment.
FIG. 8 is a flowchart of a method of generating a disease knowledge base, according to an embodiment.
FIG. 9 is a diagram for describing obtaining and aggregating data from a plurality of categories of knowledge sources, curating the data to obtain data points, and classifying the data points, according to an embodiment.
FIG. 10 is a block diagram of an apparatus for generating a disease knowledge base, according to an embodiment.

The terms used in the present embodiments are general terms that are currently in wide use, selected in consideration of their functions in the embodiments, but they may vary according to the intention of those skilled in the art, precedent, or the emergence of new technology. In certain cases a term has been chosen arbitrarily, and in such a case its meaning is described in detail in the corresponding part of the description. Accordingly, the terms used in the present embodiments should be defined based on their meaning and on the overall content of the embodiments, not simply on the names of the terms.

In the descriptions of the embodiments, when a part is said to be connected to another part, this includes not only a direct connection but also an electrical connection with another component interposed between them. When a part is said to include a component, this means that the part may further include other components rather than excluding them, unless specifically stated otherwise. The terms "... unit" and "... module" used in the embodiments refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Terms such as "consists of" or "includes" used in the present embodiments should not be construed as necessarily including all of the components or steps described in the specification. Some of the components or steps may be omitted, and additional components or steps may be included.

The following description of the embodiments should not be construed as limiting the scope of rights, and anything that a person skilled in the art can readily infer from it should be construed as falling within the scope of the embodiments. Embodiments provided solely for illustration are described in detail below with reference to the accompanying drawings.

In the present embodiments, the term "alteration in genomic DNA" covers all kinds of mutations, such as substitutions, insertions, deletions, and frameshifts. The term "alteration in genomic DNA" is used interchangeably with, and as a synonym for, the term "mutation".

Mutation prioritization for personalized treatment of a patient

The present embodiment provides a method and apparatus for mutation prioritization that help in carrying out personalized treatment of a patient. In other words, the embodiment enables a doctor to personalize the treatment given to the patient. The mutation map of a cancer patient generally shows from tens to tens of millions of alterations in genomic DNA. While a cancer patient is being treated, it is difficult to identify the alterations in genomic DNA that are useful for targeting personalized therapy. The present embodiment addresses the problem of identifying the most relevant alterations in genomic DNA. To that end, it provides a decision support system that sorts the alterations in a patient's genomic DNA based on supporting evidence from data collected from knowledge sources of various categories, such as clinical trials, therapy websites, and publications. A knowledge source for clinical trials may be, for example, ClinicalTrials.gov; a knowledge source for therapies may be, for example, Drugs@FDA® or DrugBank®; and a knowledge source for publications may be, for example, PubMed®. The sorted alterations in the patient's genomic DNA give doctors, caregivers, researchers, and others a fair idea of which mutations are most relevant, which helps clinicians or researchers make accurate decisions.

FIG. 1 is a flowchart of a method of mutation prioritization for personalized treatment of a patient, according to an embodiment.

In step 102, mutation information of the patient to be treated is obtained. As shown in FIG. 2, the mutation information may include information related to a disease, a gene, or an alteration in genomic DNA. The patient's mutation information may be generated using methods known in the art. For example, the patient's genome is sequenced and analyzed to identify the relevant mutations. Patient variation data containing the identified mutations, for example a Variant Call Format (VCF) file, may be generated using a standard NGS pipeline.
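The acquisition in step 102 could be sketched as follows. This is a minimal illustration, not the patent's implementation: it reads only the eight fixed tab-separated VCF columns and splits the INFO field into key/value pairs. The INFO keys `GENE` and `AA`, the position value, and the function names are hypothetical; real pipelines encode gene and protein-change annotations in tool-specific INFO fields.

```python
from typing import Dict, Iterator, List

# The eight fixed columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_lines(lines: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant record, skipping meta and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header line
        fields = line.split("\t")
        yield dict(zip(VCF_COLUMNS, fields[:8]))


def info_to_dict(info: str) -> Dict[str, str]:
    """Split 'K1=V1;K2=V2;FLAG' into a dict; flags map to an empty string."""
    result = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


# Illustrative input; GENE and AA are assumed annotation keys, not VCF standards.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t1000\t.\tC\tT\t50\tPASS\tGENE=ERBB2;AA=S310F",
]

records = list(parse_vcf_lines(example_lines))
annotations = [info_to_dict(r["INFO"]) for r in records]
gene_mutations = [(a["GENE"], a["AA"]) for a in annotations]
```

The resulting `gene_mutations` list is the kind of per-patient input that the later mapping step would consume.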

In step 104, the obtained mutation information is mapped to a disease knowledge base to determine whether information related to it is available there. This mapping helps to find the relevant information that the disease knowledge base holds for the obtained mutation information.

The disease knowledge base may be generated in advance by collecting data from one or more knowledge sources belonging to one or more categories. The disease knowledge base contains data on associations, or linkages, between data points that are derived from a category of knowledge source and represent an alteration in genomic DNA, a gene, and a disease, and data points from that category of knowledge source that represent parameters of clinical relevance. The parameters of clinical relevance used here may include a disease stage, a disease type, a disease sub-type, and the like.
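One way to picture a single entry of such a knowledge base is sketched below. The class and field names are assumptions made for illustration; the text only states that an entry links a disease, a gene, and an alteration to a data point from a categorized knowledge source, optionally with parameters of clinical relevance. The example row reuses values that appear in Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triad:
    """A <Disease, Gene, Mutation> linkage."""
    disease: str
    gene: str
    mutation: str


@dataclass(frozen=True)
class Association:
    """Link between a triad and one data point of a knowledge source."""
    triad: Triad
    category: str            # assumed labels: "clinical_trial", "therapy", "publication"
    source_id: str           # for example a trial registry number
    relevance_class: str     # for example "CT1"
    # Parameters of clinical relevance; optional because not every source has them.
    disease_stage: Optional[str] = None
    disease_type: Optional[str] = None
    disease_subtype: Optional[str] = None


knowledge_base = [
    Association(
        triad=Triad("Breast", "ERBB2", "S310F"),
        category="clinical_trial",
        source_id="NCT01827267",
        relevance_class="CT1",
    ),
]
```

Frozen dataclasses are used so that triads are hashable and can later serve as dictionary keys when the patient's mutations are looked up.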

The categories of knowledge sources are ranked based on user input or on a predefined priority. Furthermore, within each category of knowledge source, the data on associations of data points are classified into classes that are predefined for that category and carry a pre-assigned precedence.

Generating the disease knowledge base involves curating the data collected for specific <Disease, Gene, Mutation> information, denoted as <D, G, M> linkages or <D, G, M> triads; classifying all <D, G, M> triads (that is, linked data) identified during curation; and identifying, in the knowledge base, the points from knowledge sources that are linked to each <D, G, M> triad. In this specification, <D, G, M> stands for <Disease, Gene, Mutation>. This is described in more detail below. As described above, the categories of knowledge sources (clinical trials, therapy websites, publications, and so on) may be ranked based on either user input or a predefined priority. Accordingly, depending on the ranking assigned to the categories, particular data points from a knowledge source may be displayed or presented in the disease knowledge base. For example, doctors or caregivers are often more interested in particular mutations that may be present in a patient, and treatment options are often determined based on those mutations. The user's preference may therefore be considered in the order of therapies, clinical trials, and publications (Therapies > Clinical Trials > Publications).
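The ranking of categories by user input or by a predefined priority could be sketched as below. The category labels, the function name, and the rule that unlisted categories keep their default relative order are assumptions for illustration; only the default order Therapies > Clinical Trials > Publications comes from the text.

```python
from typing import Dict, List, Optional

# Predefined priority taken from the text: Therapies > Clinical Trials > Publications.
DEFAULT_ORDER = ["therapy", "clinical_trial", "publication"]


def rank_categories(user_order: Optional[List[str]] = None) -> Dict[str, int]:
    """Return a category -> rank mapping (0 is the highest priority).

    A user-supplied order takes precedence; categories the user did not
    mention are appended in their default order (an assumed tie-break).
    """
    order = list(user_order) if user_order else []
    for category in DEFAULT_ORDER:
        if category not in order:
            order.append(category)
    return {category: rank for rank, category in enumerate(order)}


default_ranks = rank_categories()
trial_first = rank_categories(["clinical_trial"])
```

With no user input the predefined priority applies; a user who cares mainly about open trials can move that category to the front without restating the rest.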

The data points of knowledge sources belonging to any of the three categories above (clinical trials, therapies, and publications) may be further classified into a plurality of classes defined for each category of knowledge source. These classes have a pre-assigned precedence.

For example, clinical trials may be selected as one of the categories of knowledge sources, and associations may be identified for one <D, G, M> set (data point). A clinical trial is assigned a specific class for a given <D, G, M> based on its relevance to that <D, G, M>. The same clinical trial may also be assigned to a different <D, G, M> set and classified based on its relevance to that other set. Furthermore, one <D, G, M> set may be associated with multiple data points from a given category of knowledge source, and one <D, G, M> set may be associated with multiple clinical trials. Table 1 shows data on associations for clinical trials that form the disease knowledge base or a part of it. The classification is therefore always made with respect to a <D, G, M>, and the same holds for the classifications in the other categories of knowledge sources. ClinicalTrials.gov provides a unique ID / registry number, called the NCTID, for each clinical trial. An NCTID is an 8-digit number preceded by the letters 'NCT'. In FIG. 1, hypothetical unique IDs / registry numbers are used for convenience in describing the data and the classification. Each class indicates the degree to which a clinical trial is relevant to a given gene and mutation for a given disease. The classes may be labeled CT0, CT1, CT2, and CT3, where CT0 is the most relevant class and CT3 is the least relevant class.

Table 1

Tumor (disease)  Gene   Mutation  NCTID        Class
Breast           ERBB2  S310F     NCT01827267  CT1
Breast           ERBB2  S310F     NCT01670877  CT1
Breast           ERBB2  S310F     NCT01953926  CT1
Breast           ERBB2  S310F     NCT00730925  CT1
Breast           ERBB2  S310F     NCT01288261  CT1
Breast           ERBB2  S310F     NCT00580333  CT1
Breast           ERBB2  S310F     NCT01271725  CT1
Breast           ERBB2  S310F     NCT01441596  CT1
Breast           ERBB2  S310F     NCT01531764  CT1
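The rows of Table 1 can be held as plain records, which makes the stated NCTID format ('NCT' followed by eight digits) easy to check and the per-class counts easy to derive. This is only an illustrative sketch; the variable and function names are assumptions.

```python
import re
from collections import Counter

# Rows of Table 1: (disease, gene, mutation, NCTID, class).
TABLE_1 = [
    ("Breast", "ERBB2", "S310F", "NCT01827267", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01670877", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01953926", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT00730925", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01288261", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT00580333", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01271725", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01441596", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01531764", "CT1"),
]

NCTID_PATTERN = re.compile(r"NCT\d{8}")


def is_valid_nctid(value: str) -> bool:
    """True if the value is 'NCT' followed by exactly eight digits."""
    return NCTID_PATTERN.fullmatch(value) is not None


all_ids_valid = all(is_valid_nctid(row[3]) for row in TABLE_1)
class_counts = Counter(row[4] for row in TABLE_1)
triads = {row[:3] for row in TABLE_1}
```

All nine rows share one <D, G, M> triad, which illustrates the statement that a single triad may be associated with multiple clinical trials.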

Therapies may be selected as a category of knowledge source, for example for each association of a <D, G, M> with a drug or drug action, through the preparation of studies to be published. Classification within the therapy category is performed based on the patient's mutation and disease information. The classes may be labeled T0, T1, T2, and T3, where T0 is the most relevant class and T3 is the least relevant class.

Publications may be selected as a category of knowledge source, and the publications related to a given <D, G, M> are identified. The identified {<D, G, M>, publication} sets are classified into relevant classes based on the clinical or pre-clinical status of the studies discussed in the publication. The classes may be labeled P0, P1, P2, and P3, where P0 is the most relevant class and P3 is the least relevant class. A lower class number indicates higher relevance, and a higher class number indicates lower relevance. For example, P0 has higher relevance than P3, and the same applies to the clinical trial (CT) and therapy (T) classes.
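The labeling convention for the three categories (CT0 to CT3, T0 to T3, P0 to P3, lower number meaning higher relevance) could be encoded as in the sketch below. The helper names are hypothetical, and restricting comparisons to labels of the same category is an assumption, since the text defines precedence only within a category.

```python
import re
from typing import Tuple

# Category prefix followed by a class number from 0 (most relevant) to 3.
LABEL_PATTERN = re.compile(r"(CT|T|P)([0-3])")


def parse_label(label: str) -> Tuple[str, int]:
    """Split a class label such as 'CT1' into ('CT', 1)."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a class label: {label!r}")
    return match.group(1), int(match.group(2))


def is_more_relevant(label_a: str, label_b: str) -> bool:
    """True if label_a outranks label_b within the same category."""
    prefix_a, level_a = parse_label(label_a)
    prefix_b, level_b = parse_label(label_b)
    if prefix_a != prefix_b:
        raise ValueError("labels belong to different categories")
    return level_a < level_b


ordered = sorted(["P3", "P0", "P2", "P1"], key=lambda x: parse_label(x)[1])
```

Sorting by the numeric part puts the most relevant class first, which matches the rule that P0 outranks P3.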

In step 106, the alterations in genomic DNA provided in the patient's obtained mutation information that map to the data on associations or linkages of data points are identified. The output of the mapping step is mapped mutation information, which represents the relevant data points present in the knowledge base for the obtained mutation information. This is described with reference to FIGS. 2 and 3.
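The mapping of steps 104 and 106 amounts to a lookup of each patient triad in the knowledge base. The sketch below assumes an exact-match lookup keyed on (disease, gene, mutation); the function names and the placeholder mutation `GENE_X`/`A1B` are hypothetical, and the two knowledge base rows are taken from Table 1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triad = Tuple[str, str, str]   # (disease, gene, mutation)
Hit = Tuple[str, str]          # (source identifier, class label)

KB_ROWS = [
    ("Breast", "ERBB2", "S310F", "NCT01827267", "CT1"),
    ("Breast", "ERBB2", "S310F", "NCT01670877", "CT1"),
]


def build_index(rows) -> Dict[Triad, List[Hit]]:
    """Group knowledge base rows by their <D, G, M> triad."""
    index = defaultdict(list)
    for disease, gene, mutation, source_id, label in rows:
        index[(disease, gene, mutation)].append((source_id, label))
    return dict(index)


def map_patient(disease: str, mutations, index) -> Dict[Tuple[str, str], List[Hit]]:
    """Keep only the patient mutations that have entries in the knowledge base."""
    mapped = {}
    for gene, mutation in mutations:
        hits = index.get((disease, gene, mutation), [])
        if hits:
            mapped[(gene, mutation)] = hits
    return mapped


index = build_index(KB_ROWS)
# GENE_X / A1B is a placeholder for a mutation with no knowledge base entry.
patient_mutations = [("ERBB2", "S310F"), ("GENE_X", "A1B")]
mapped = map_patient("Breast", patient_mutations, index)
```

Here `mapped` plays the role of the mapped mutation information: it retains only the mutations for which the knowledge base holds data points, together with those data points.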

In step 108, a frequency table for the mapped mutation information is generated by category of knowledge source and then by class. The frequency table includes a plurality of columns. Each column holds the occurrence counts of the association or linkage data of data points that belong to a particular class of a category of knowledge source. The frequency table also includes a plurality of rows. Each row holds the occurrence counts of the association or linkage data of data points linked to a particular alteration in genomic DNA. FIG. 2 is a flow diagram illustrating obtaining variation data of a patient (for example, a VCF file) and generating a frequency table, according to an embodiment. The numerical values in the columns for the knowledge sources and their classes indicate how often the mapped mutation information was found in a particular class of a knowledge source. For example, for the gene-mutation ATP6AP2-K205E, the CT1 column of the clinical trial category has the value '1', which indicates that ATP6AP2-K205E was mapped once to class 1 of the clinical trial category. Similarly, the CT0 column of the clinical trial category has the value '0', which indicates that ATP6AP2-K205E was not mapped to class 0 of the clinical trial category.
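The layout of the frequency table (one row per gene-mutation, one column per class of each category) could be built as in the sketch below. The column order and function name are assumptions. For ATP6AP2-K205E the text gives only the CT1 and CT0 values, so the toy input contains a single CT1 hit and the remaining zeros are artifacts of this illustration, not values from FIG. 2.

```python
from collections import Counter
from typing import Dict, List

# One column per class, grouped by category (assumed column order).
COLUMNS = [
    "CT0", "CT1", "CT2", "CT3",   # clinical trials
    "T0", "T1", "T2", "T3",       # therapies
    "P0", "P1", "P2", "P3",       # publications
]


def build_frequency_table(mapped: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Count, per gene-mutation, how many mapped data points fall in each class."""
    table = {}
    for gene_mutation, labels in mapped.items():
        counts = Counter(labels)
        table[gene_mutation] = [counts.get(column, 0) for column in COLUMNS]
    return table


# Toy input: class labels of the data points mapped to each gene-mutation.
mapped_labels = {"ATP6AP2-K205E": ["CT1"]}
frequency_table = build_frequency_table(mapped_labels)
row = frequency_table["ATP6AP2-K205E"]
```

Classes with no mapped data points are filled with zero so that every row has the same width, which keeps later row-wise sorting straightforward.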

FIG. 3 is a diagram for explaining the step of generating the frequency table according to an embodiment. FIG. 3 shows Table 1 to illustrate how the frequency table is generated from the association data for clinical trials that form the disease knowledge base or a part of it. In this case, two knowledge sources, the clinical trial category and the therapy category, may be used to generate the frequency table. When the gene-mutation ERBB2-S310F is analyzed against the clinical trial knowledge base (part of the disease knowledge base), it is found to be mapped four times to class CT1, twice to class CT2, and three times to class CT3. Corresponding entries for ERBB2-S310F are then made in the respective class columns of the frequency table. When ERBB2-S310F is further analyzed against the therapy knowledge base (also part of the disease knowledge base), it is found to be mapped once each to classes T1, T2, and T3, and the corresponding entries are likewise made in the frequency table. Similarly, the other gene-mutations identified from the patient's VCF are mapped one by one to generate the frequency table. According to another embodiment, the gene-mutations identified from the patient's VCF may all be mapped together to generate the frequency table.
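The counting described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_frequency_table`, the column order in `COLUMNS`, and the representation of each mapping as a `(gene_mutation, class_label)` pair are all assumptions made for the example. The counts reproduce the ERBB2-S310F and ATP6AP2-K205E values given in the text.

```python
from collections import Counter

# Assumed column order: clinical trial classes first, then therapy classes.
COLUMNS = ["CT0", "CT1", "CT2", "CT3", "T0", "T1", "T2", "T3"]

def build_frequency_table(mappings):
    """Count how often each gene-mutation was mapped to each class.

    mappings: iterable of (gene_mutation, class_label) pairs, one pair per
    association found in the knowledge base (hypothetical input format).
    Returns {gene_mutation: {class_label: count}} with zeros filled in.
    """
    counts = Counter(mappings)
    mutations = sorted({gm for gm, _ in counts})
    return {gm: {col: counts[(gm, col)] for col in COLUMNS} for gm in mutations}

# Mappings taken from the ERBB2-S310F and ATP6AP2-K205E examples in the text.
mappings = (
    [("ERBB2-S310F", "CT1")] * 4
    + [("ERBB2-S310F", "CT2")] * 2
    + [("ERBB2-S310F", "CT3")] * 3
    + [("ERBB2-S310F", "T1"), ("ERBB2-S310F", "T2"), ("ERBB2-S310F", "T3")]
    + [("ATP6AP2-K205E", "CT1")]
)
table = build_frequency_table(mappings)
```

Each row of `table` corresponds to one gene-mutation and each key to one class column, so classes with no mapping appear explicitly as 0.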

In step 110, the mapped mutation information is prioritized in the frequency table based on a prioritization scheme. Various prioritization schemes may exist, specified according to the user's requirements for sorting the frequency table. In one embodiment, a strict criterion for selecting data based on a preferred category of knowledge source may be chosen as primary filtering. This scheme exploits the linkages that exist between the various data sources. The scheme is described in detail below.

(a) Filter the frequency table based on one category selected from the one or more categories of knowledge sources.

(b) Populate the filtered frequency table with data points of the categories of knowledge sources that were not selected in step (a) and that are linked to the data points associated with the selected category.

(c) Sort the frequency table based on the frequency of occurrence of the data points vis-à-vis the ranking of the categories of knowledge sources, and on the pre-assigned precedence of the classes of data points present in each category of knowledge source.

According to one embodiment of the prioritization scheme, the clinical trial category (a category of knowledge source) may be selected for primary filtering. The generated frequency table (for example, the one generated in step 108) is filtered based on the clinical trial category to list the patient's mutation information that shows corresponding entries in any class of the clinical trial section of the frequency table. In the next step, the data points of the other categories of knowledge sources (that is, the therapy category and the publication category) that are related or linked to the data points identified in the clinical trial category are selected and populated into the frequency table. In the final step, the entries of the frequency table are sorted by giving a higher ranking to the gene-mutations that show higher entries in the corresponding classes. The gene-mutations are ranked taking into account the ranks assigned to the knowledge sources and the precedence assigned to the classes within each knowledge source. See FIG. 4A.
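The filter, populate, and sort steps of this scheme can be sketched roughly as below. The function name, the `linked` dictionary (counts of therapy data points linked to each mutation's clinical trial data points), and the use of a descending tuple comparison as the sort rule are assumptions made for illustration; the text does not prescribe a data layout.

```python
CT = ["CT0", "CT1", "CT2", "CT3"]  # clinical trial classes, highest precedence first
T = ["T0", "T1", "T2", "T3"]       # therapy classes, highest precedence first

def prioritize_by_primary_category(table, linked, primary=CT, secondary=T):
    """Return gene-mutations ordered by the filter/populate/sort scheme.

    table:  {gene_mutation: {class_label: count}} for the primary category.
    linked: {gene_mutation: {class_label: count}} for secondary-category data
            points linked to the primary-category data points (hypothetical).
    """
    # (a) Filter: keep mutations with at least one entry in the primary category.
    filtered = {
        gm: {c: row.get(c, 0) for c in primary}
        for gm, row in table.items()
        if any(row.get(c, 0) for c in primary)
    }
    # (b) Populate: add the linked counts of the category not selected in (a).
    for gm, row in filtered.items():
        for c in secondary:
            row[c] = linked.get(gm, {}).get(c, 0)
    # (c) Sort: larger counts in higher-ranked categories and classes come first.
    order = primary + secondary
    return sorted(filtered, key=lambda gm: tuple(filtered[gm][c] for c in order), reverse=True)

# Placeholder mutation names "A", "B", "C" with made-up counts.
table = {"A": {"CT0": 1}, "B": {}, "C": {"CT0": 1, "CT1": 2}}
linked = {"A": {"T0": 1}}
ranking = prioritize_by_primary_category(table, linked)
```

Here "B" is dropped by the filter because it has no clinical trial entry, and "C" outranks "A" because the two tie on CT0 and "C" has more CT1 entries.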

According to another embodiment, the prioritization scheme may sort the frequency table while considering all of the evidence that exists independently for a given mutation. See FIG. 4B. This scheme is described in more detail below.

(a) Arrange the linked data points in the frequency table per category of knowledge source and then per class.

(b) Sort the frequency table for the mapped mutation information based on the frequency of occurrence of the data points vis-à-vis the ranking of the categories of knowledge sources, and on the pre-assigned precedence of the classes of data points present in each category of knowledge source.

According to one embodiment, the sorting technique may use a multilevel sort. This sorting method is described below with reference to FIG. 5. A score S(m) is assigned to each mutation, and S(m) may be defined as in Equation 1 below.

The score for a mutation is expressed as S(m) and may be calculated using Equation 2 below.

Referring to Equation 2, k is the number of categories of knowledge sources, and c is the total number of classes in each category of knowledge source. For example, if c = 4, classes 0 through 3 may exist. N_ij denotes the number of data points belonging to category i of knowledge source and class j. 10^t denotes the maximum number of data items per class.
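The equations themselves are not reproduced in this text, so the following is only one scoring rule that is consistent with the definitions of k, c, N_ij, and 10^t, offered as an assumption rather than as the patent's formula. It treats the counts N_ij as digits of a base-10^t number, with the highest-ranked category and highest-precedence class in the most significant position, so that comparing scores is equivalent to a multilevel comparison of the counts. The function name `score` and its argument layout are hypothetical.

```python
def score(counts_by_category, t):
    """Positional score S(m) under the assumption stated above.

    counts_by_category: list of k lists, highest-ranked category first; each
    inner list holds c counts N_ij, class 0 (highest precedence) first.
    t: number of decimal digits reserved per class, so each count must be
       smaller than 10**t.
    """
    base = 10 ** t
    s = 0
    for category in counts_by_category:
        for n in category:
            if not 0 <= n < base:
                raise ValueError("each count must lie in [0, 10**t)")
            s = s * base + n  # shift earlier counts up by one 't-digit' slot
    return s

# Made-up counts for k = 2 categories, c = 4 classes, t = 2 digits per class.
s_a = score([[0, 12, 3, 0], [1, 0, 0, 7]], t=2)
s_b = score([[0, 12, 2, 9], [9, 9, 9, 9]], t=2)
```

With t = 2 each class occupies two decimal digits, so `s_a` is larger than `s_b` because the two mutations tie on the first two classes and the first has the larger count in the third.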

According to one embodiment of the prioritization scheme, sorting is performed independently for each of the clinical trial category and the therapy category. Data filtering is likewise performed independently for each of the two categories. The embodiments are not limited to this, however, and apply equally when there are more or fewer categories of knowledge sources. In this embodiment, the clinical trial category is ranked higher than the therapy category, that is, the clinical trial category comes first, followed by the therapy category (Clinical trials > Therapies). The frequency table generated after step 108 is sorted according to the prioritization scheme, as shown in FIG. 5. After sorting, the sixth row (Row #6) and the seventh row (Row #7) appear at the top of the sorted frequency table. Here, the sixth row has evidence scores of 4, 2, 3, 4, 5, 4, 3, 2 and the seventh row has evidence scores of 5, 2, 3, 4, 5, 2, 3, 1. Based on a simple sorting mechanism, the sorted order of these two entries is then Row #7 followed by Row #6.

According to another embodiment of the prioritization scheme, the therapy category may be ranked higher than the clinical trial category, that is, the therapy category comes first, followed by the clinical trial category (Therapies > Clinical trials). This is shown in FIG. 6. After sorting, the sixth row (Row #6) and the seventh row (Row #7) appear at the top of the sorted frequency table. Here, the sixth row has evidence scores of 4, 2, 3, 4 for the clinical trial category and 5, 4, 3, 2 for the therapy category. The seventh row has evidence scores of 5, 2, 3, 4 for the clinical trial category and 5, 2, 3, 1 for the therapy category.

In this scenario, the highest priority is given to the classes in the therapy category. Accordingly, in evidence order, the sixth row has evidence scores of 5, 4, 3, 2, 4, 2, 3, 4 and the seventh row has evidence scores of 5, 2, 3, 1, 5, 2, 3, 4.

In this case, based on a simple sorting mechanism, the sorted order of these two entries is Row #6 followed by Row #7.
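The two orderings of Row #6 and Row #7 can be reproduced with a short sketch. It assumes that the "simple sorting mechanism" is a descending comparison of the evidence scores read left to right in the chosen category ranking; the function name `multilevel_sort` and the dictionary layout are illustrative only.

```python
# Evidence scores for the two rows, as given in the text.
rows = {
    "Row #6": {"CT": (4, 2, 3, 4), "T": (5, 4, 3, 2)},
    "Row #7": {"CT": (5, 2, 3, 4), "T": (5, 2, 3, 1)},
}

def multilevel_sort(rows, category_ranking):
    """Order rows by their evidence scores, highest-ranked category first."""
    def evidence(name):
        # Concatenate the class counts in the order given by the ranking.
        return tuple(v for cat in category_ranking for v in rows[name][cat])
    return sorted(rows, key=evidence, reverse=True)

ct_first = multilevel_sort(rows, ["CT", "T"])  # Clinical trials > Therapies
t_first = multilevel_sort(rows, ["T", "CT"])   # Therapies > Clinical trials
```

With clinical trials ranked first, Row #7 leads because its first score (5) exceeds that of Row #6 (4). With therapies ranked first, the rows tie on the first score (5) and Row #6 leads on the second (4 versus 2).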

Meanwhile, the ranking of the different knowledge sources described in these embodiments may vary according to the user's requirements. Once the sorted frequency table is generated, physicians can more easily choose, based on the available evidence or information, which approach is appropriate for personalizing the treatment for the patient.

An apparatus for performing mutation prioritization for personalized treatment of a patient is described next. FIG. 7 is a block diagram of an apparatus for performing mutation prioritization according to an embodiment. The apparatus is implemented to prioritize the mapped mutation information and generates a list of prioritized mutations for use by physicians or caregivers.

The apparatus 700 includes a processor 706 and a memory 702 coupled to the processor 706.

The processor 706 may be implemented as any type of computational circuit and may include, for example, a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, an explicitly parallel instruction computing (EPIC) microprocessor, a digital signal processor (DSP), any other type of processing circuit, or a combination of these.

The memory 702 includes a plurality of modules stored in the form of an executable program that instructs the processor 706 to perform the steps shown in FIG. 1. The memory 702 may include a mutation information acquisition module 708, a mapping module 710, an identification module 712, a frequency table generation module 714, and a prioritization module 716. The memory 702 may also include the disease knowledge base. Alternatively, the disease knowledge base may be communicatively connected to the apparatus through any type of communication means.

Computer memory elements may include any suitable memory device for storing data and executable programs, such as read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), hard drives, and removable media drives for memory cards. The embodiments may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks or for defining abstract data types (ADTs) or low-level hardware contexts. An executable program stored on any of the storage media described above may be executed by the processor 706.

The mutation information acquisition module 708 instructs the processor 706 to perform step 102 of FIG. 1.

The mapping module 710 instructs the processor 706 to perform step 104 of FIG. 1.

The identification module 712 instructs the processor 706 to perform step 106 of FIG. 1.

The frequency table generation module 714 instructs the processor 706 to perform step 108 of FIG. 1.

The prioritization module 716 instructs the processor 706 to perform step 110 of FIG. 1.

Disease Knowledge Base and Method of Generating the Disease Knowledge Base

According to this embodiment, a method of generating a disease knowledge base is provided. The steps of the method include aggregating raw data from various public data sources into a local repository. They further include cleaning and curating the aggregated data for specific information called data points (<Disease, Gene, Mutation> and data of clinical relevance), and identifying the associations between the data points. The curated information, that is, the data point associations, is classified according to classification rules to create the disease knowledge base. The disease knowledge base therefore includes various knowledge sources linked to three primary categories: clinical trials, therapies, and publications. The knowledge sources for clinical trials, therapies, and publications may be curated and classified independently. Furthermore, the classification rules for each category of knowledge source (clinical trials, therapies, and publications) may be specified according to the user's requirements. Thus, for example, the classification of data points belonging to the clinical trial category may differ from that of the therapy category or the publication category.

FIG. 8 is a detailed flowchart of a method of generating a disease knowledge base according to an embodiment.

In step 802, information relating to variations of genomic DNA, genes, diseases, and parameters of clinical relevance is obtained from various knowledge sources.

In step 804, the obtained information is curated to extract, from the knowledge sources, data points representing variations of genomic DNA, genes, diseases, and parameters of clinical relevance. After curation, broadly two sets of data points are produced. For example, one set may represent the variation of genomic DNA, the gene, and the disease, and the other set may represent the parameter of clinical relevance.

In step 806, data on the associations between the data points representing the variation of genomic DNA, gene, and disease and the data points representing the parameter of clinical relevance is identified. For example, the <DGM> data point Breast tumor:ERBB2:S310F may find a match in the clinical trial category according to inclusion criteria that cover breast cancer related to the gene ERBB2 and the mutation S310F.

In step 808, the data points of the knowledge sources associated with a <DGM> are classified into a plurality of classes. This step includes classifying the association of data points by linking the disease, gene, and variation of genomic DNA within a category of knowledge source to a class. A class is assigned to each data point for its linkage to the disease, gene, and variation of genomic DNA. The classification is therefore specific to each <D, G, M> set: if a given data point from a given knowledge source is associated with multiple <DGM> sets, it may be classified differently for each <D, G, M>. This is similar to what was described above for mutation prioritization for personalized treatment of a patient. The classes are predefined for each category of knowledge source, together with a pre-assigned precedence. Furthermore, the categories of knowledge sources may also be ranked based on user input or a predefined priority. The classification of data points belonging to the three primary categories (clinical trials, therapies, and publications) across the various knowledge sources may be performed similarly to the description given above with reference to FIG. 1.

FIG. 9 is a diagram for explaining acquiring and aggregating data from a plurality of categories of knowledge sources, curating the data to obtain data points, and classifying the data points, according to an embodiment.

In step 810, the disease knowledge base is generated based on the one or more classified data points in the one or more categories of knowledge sources. The generated disease knowledge base includes an arrangement, per category of knowledge source and then per class, of the data on the associations between the data points representing the variation of genomic DNA, gene, and disease derived from the knowledge sources and the data points representing the parameters of clinical relevance from the knowledge sources.
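The arrangement produced in step 810 can be pictured as a nested mapping from category to class to associations. The sketch below assumes that steps 802 to 808 have already yielded classified associations as `(dgm, category, class_label, relevance)` tuples; that tuple layout, the function name `build_knowledge_base`, and the identifiers "trial-A", "trial-B", and "drug-X" are placeholders invented for the example.

```python
from collections import defaultdict

def build_knowledge_base(classified_associations):
    """Arrange classified associations per category, then per class.

    classified_associations: iterable of (dgm, category, class_label, relevance)
    tuples, where dgm is a (disease, gene, mutation) triple and relevance is
    the clinically relevant data point it is associated with.
    """
    kb = defaultdict(lambda: defaultdict(list))
    for dgm, category, class_label, relevance in classified_associations:
        kb[category][class_label].append((dgm, relevance))
    # Convert to plain dicts so lookups of unknown keys do not create entries.
    return {category: dict(classes) for category, classes in kb.items()}

dgm = ("Breast tumor", "ERBB2", "S310F")
kb = build_knowledge_base([
    (dgm, "clinical_trial", "CT0", "trial-A"),  # placeholder identifiers
    (dgm, "clinical_trial", "CT1", "trial-B"),
    (dgm, "therapy", "T1", "drug-X"),
])
```

Looking up `kb["clinical_trial"]["CT0"]` then returns every <D, G, M> association that was classified as CT0, which is the shape the frequency table generation step needs for counting.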

The classification rules for classifying the data points belonging to the clinical trial category, the therapy category, and the publication category are described below for each scenario.

(A) Clinical trial category

A <D, G, M> set (data point) identified from a clinical trial is assigned to a specific class. Each class indicates the degree of relevance of a given gene and mutation to a clinical trial for the disease. The classes may be labeled CT0, CT1, CT2, and CT3, with precedence assigned such that CT0 is the most relevant class and CT3 is the least relevant class. The classification rules for the clinical trial category are listed in Table 2. The class definitions in Table 2 illustrate parameters of clinical relevance by way of example; they are given for convenience of description, and the embodiments are not limited by Table 2. For example, a data point representing information on a <G, M> is included in class CT0.

Table 2
Class ID | Definition
CT0 | Given <Gene, Mutation> is specified in the inclusion criteria
CT1 | Existence of a mutation in the gene is specified in the inclusion criteria
CT2 | Mutation in the gene is not specified; the clinical trial might be for retrospective subgroup analysis
CT3 | Drug mechanism might be related to the gene
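A rule-based classifier following Table 2 might look like the sketch below. The trial record fields (`inclusion_mutations`, `inclusion_mutated_genes`, `genes_mentioned`, `drug_target_genes`) are hypothetical, and reading CT2 as "the gene is named in the trial without any mutation requirement" is an interpretation of the table, not something the text states. The EGFR, KRAS, and TP53 entries are illustrative inputs only.

```python
def classify_trial(trial, gene, mutation):
    """Return the Table 2 class of one <gene, mutation> for one trial, or None."""
    if (gene, mutation) in trial.get("inclusion_mutations", set()):
        return "CT0"  # the exact <Gene, Mutation> is in the inclusion criteria
    if gene in trial.get("inclusion_mutated_genes", set()):
        return "CT1"  # some mutation in the gene is in the inclusion criteria
    if gene in trial.get("genes_mentioned", set()):
        return "CT2"  # gene named, mutation not specified (assumed reading)
    if gene in trial.get("drug_target_genes", set()):
        return "CT3"  # the drug mechanism might relate to the gene
    return None       # no association with this trial

trial = {
    "inclusion_mutations": {("ERBB2", "S310F")},
    "inclusion_mutated_genes": {"ERBB2", "EGFR"},
    "drug_target_genes": {"KRAS"},
}
classes = [
    classify_trial(trial, "ERBB2", "S310F"),
    classify_trial(trial, "EGFR", "L858R"),
    classify_trial(trial, "KRAS", "G12D"),
    classify_trial(trial, "TP53", "R175H"),
]
```

The checks run from the most specific rule to the least specific one, so a data point always receives the highest-precedence class it qualifies for.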

(B) Therapy category

The association of a <D, G, M> with a drug or drug effect may be established through curation of published studies. The approval status of a given drug (on-label / off-label) may be obtained using drug label information from the US FDA. Sorting of the therapy category may be performed based on the acquired patient mutation and disease information. The classification of the therapy category depends on patient-specific information and may be performed while processing the patient's data. The classes may be labeled T0, T1, T2, and T3, with precedence assigned such that T0 is the most relevant class and T3 is the least relevant class. The class definitions in Table 3 illustrate parameters of clinical relevance by way of example; they are given for convenience of description, and the embodiments are not limited by Table 3. The classification rules for the therapy category are listed in Table 3. For example, a data point representing an approved therapy for a <G, M> in the given patient's cancer type is in class T0.

Table 3
Class ID | Definition
T0 | Approved therapy for {gene, mutation} in the patient's cancer type
T1 | Approved therapy for {gene, mutation} in another cancer type
T2 | Experimental therapy for {gene, mutation} in the patient's cancer type
T3 | Experimental therapy for {gene, mutation} in another cancer type
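Because the Table 3 classes depend on the patient's cancer type, they can only be resolved once the patient's data is available. A minimal sketch, with a hypothetical function name and arguments, and assuming the approval status has already been looked up:

```python
def classify_therapy(approved, therapy_cancer_type, patient_cancer_type):
    """Return the Table 3 class of a therapy for a {gene, mutation}.

    approved: True for an approved therapy, False for an experimental one.
    The two cancer types are compared as plain strings in this sketch.
    """
    same_type = therapy_cancer_type == patient_cancer_type
    if approved:
        return "T0" if same_type else "T1"
    return "T2" if same_type else "T3"

therapy_classes = [
    classify_therapy(True, "breast", "breast"),
    classify_therapy(True, "lung", "breast"),
    classify_therapy(False, "breast", "breast"),
    classify_therapy(False, "lung", "breast"),
]
```

The same therapy can therefore land in T0 for one patient and T1 for another, which is why this classification is performed at patient-processing time rather than when the knowledge base is built.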

(C) Publication category

For a given <D, G, M>, the relevant publications are identified. The identified {<D, G, M>, publication} sets are classified into the relevant classes based on the clinical or pre-clinical status of the studies discussed in the publication. The class definitions in Table 4 illustrate parameters of clinical relevance by way of example; they are given for convenience of description, and the embodiments are not limited by Table 4. The classes may be labeled P0, P1, P2, and P3, with precedence assigned such that P0 is the most relevant class and P3 is the least relevant class. The classification rules for the publication category are listed in Table 4.

Table 4
Class ID | Definition
P0 | Pre-clinical and clinical studies are in agreement on the use of a therapy for a given <D, G, M>
P1 | Only clinical studies are available on the use of a therapy for a given <D, G, M>
P2 | Only pre-clinical studies are available on the use of a therapy for a given <D, G, M>
P3 | Neither pre-clinical nor clinical studies are available for a given <D, G, M>
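The Table 4 rules reduce to a small decision function. The function name and its boolean arguments are hypothetical, and the case where both kinds of study exist but disagree is not covered by the table, so the sketch returns None for it instead of guessing a class.

```python
def classify_publication_evidence(has_preclinical, has_clinical, in_agreement=False):
    """Return the Table 4 class for the published evidence on a <D, G, M>."""
    if has_preclinical and has_clinical:
        # Table 4 only defines the case where the two kinds of study agree.
        return "P0" if in_agreement else None
    if has_clinical:
        return "P1"  # only clinical studies are available
    if has_preclinical:
        return "P2"  # only pre-clinical studies are available
    return "P3"      # neither kind of study is available

publication_classes = [
    classify_publication_evidence(True, True, in_agreement=True),
    classify_publication_evidence(False, True),
    classify_publication_evidence(True, False),
    classify_publication_evidence(False, False),
]
```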

According to the rules, a data point for which pre-clinical and clinical studies agree on the use of a therapy for a given <D, G, M> is assigned to class P0.

Meanwhile, additional classification schemes for classifying the categories of knowledge sources more precisely may also be applied to these embodiments.

Beyond the classifications described above, the following criteria may be used to further classify the categories of knowledge sources at a finer granularity.

(a) Location-based classification for the clinical trial category

Relevance may be assigned within the clinical trial category based on the geographic location of the clinical trial. The various geographic locations relevant to the patient to be treated may be prioritized based on user input, for example as a first preferred location, a second preferred location, a third preferred location, and so on.
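One way to apply such a location preference is to order trials by the position of their location in the user's preference list. The function name, the `location` field, and the trial identifiers below are assumptions for illustration; trials at locations the user did not list are placed last.

```python
def rank_trials_by_location(trials, preferred_locations):
    """Order trials by the user's location preference (first preference first).

    trials: list of dicts with a hypothetical "location" field.
    preferred_locations: locations in decreasing order of preference.
    """
    rank = {loc: i for i, loc in enumerate(preferred_locations)}
    # Unlisted locations share the lowest rank; sorted() is stable, so trials
    # with equal rank keep their original relative order.
    return sorted(trials, key=lambda trial: rank.get(trial["location"], len(rank)))

trials = [
    {"id": "trial-A", "location": "Busan"},
    {"id": "trial-B", "location": "Tokyo"},
    {"id": "trial-C", "location": "Seoul"},
]
ordered_ids = [t["id"] for t in rank_trials_by_location(trials, ["Seoul", "Busan"])]
```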

(b) Drug-effect-based classification for the therapy category

For a given gene or mutation, a drug effect such as "Sensitive", "Resistant", or "No Effect" may also be used as an additional filter for classifying the frequency table.
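As a rough sketch of this additional filter, therapy associations can be restricted to the drug effects of interest before they are counted into the frequency table. The function name, the `effect` field, and the drug names are hypothetical.

```python
def filter_by_drug_effect(associations, wanted=("Sensitive",)):
    """Keep only therapy associations whose drug effect is in `wanted`."""
    return [a for a in associations if a["effect"] in wanted]

associations = [
    {"gene_mutation": "ERBB2-S310F", "drug": "drug-X", "effect": "Sensitive"},
    {"gene_mutation": "ERBB2-S310F", "drug": "drug-Y", "effect": "Resistant"},
    {"gene_mutation": "ERBB2-S310F", "drug": "drug-Z", "effect": "No Effect"},
]
sensitive_drugs = [a["drug"] for a in filter_by_drug_effect(associations)]
```

Passing `wanted=("Resistant",)` instead would surface the therapies to avoid for the same mutation.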

An apparatus for generating the disease knowledge base is described below.

FIG. 10 is a block diagram of an apparatus for generating a disease knowledge base according to an embodiment. The apparatus may be implemented to generate the disease knowledge base based on the obtained raw data.

The apparatus 1000 includes a processor 1006 and a memory 1002 coupled to the processor 1006.

The processor 1006 may be implemented as any type of computational circuit and may include, for example, a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, an explicitly parallel instruction computing (EPIC) microprocessor, a digital signal processor (DSP), any other type of processing circuit, or a combination of these.

The memory 1002 includes a plurality of modules stored in the form of an executable program that instructs the processor 1006 to perform the steps shown in FIG. 8. The memory 1002 may include a raw information acquisition module 1008, a curating module 1010, an identification module 1012, a classification module 1014, and a generation module 1016.

Computer memory elements may include any memory device suitable for storing data and executable programs, such as read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), a hard drive, or a removable media drive for memory cards. The present embodiments may be implemented in combination with program modules, may be implemented to include functions, procedures, data structures, and application programs, may be implemented to perform tasks, or may be implemented to define abstract data types (ADT) or low-level hardware contexts. An executable program stored on any of the storage media described above may be executed by the processor 1006.

The raw information acquisition module 1008 instructs the processor 1006 to perform step 802 of FIG. 8.

The curating module 1010 instructs the processor 1006 to perform step 804 of FIG. 8.

The identification module 1012 instructs the processor 1006 to perform step 806 of FIG. 8.

The classification module 1014 instructs the processor 1006 to perform step 808 of FIG. 8.

The generation module 1016 instructs the processor 1006 to perform step 810 of FIG. 8.
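The one-to-one pairing of modules 1008 to 1016 with steps 802 to 810 can be sketched as a simple dispatch table. This is only an illustration under stated assumptions: the step bodies are placeholders that record which step ran, the short step names paraphrase the claimed method, and the helper names (`_step`, `MODULES`, `run_apparatus`) are hypothetical rather than taken from the specification.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Module = Callable[[State], State]


def _step(step_no: int, name: str) -> Module:
    """Build a placeholder module that only records which step it ran."""
    def run(state: State) -> State:
        trace = state.get("trace", []) + [(step_no, name)]
        return {**state, "trace": trace}
    return run


# Module reference numeral -> placeholder for the step of FIG. 8 it triggers.
MODULES: Dict[int, Module] = {
    1008: _step(802, "acquire raw information"),
    1010: _step(804, "curate acquired information"),
    1012: _step(806, "identify associations"),
    1014: _step(808, "classify associations"),
    1016: _step(810, "generate disease knowledge base"),
}


def run_apparatus(modules: Dict[int, Module]) -> State:
    """Run every module once, in ascending reference-numeral order."""
    state: State = {}
    for numeral in sorted(modules):
        state = modules[numeral](state)
    return state


result = run_apparatus(MODULES)
steps_run = [step_no for step_no, _ in result["trace"]]
```

Passing an immutable-style state from module to module keeps each step independently replaceable, which mirrors the way the description treats the modules as separate units stored in the memory 1002.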

An apparatus according to the present embodiments may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Examples of the computer-readable recording medium include magnetic storage media (for example, read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and digital versatile discs (DVDs)). The computer-readable recording medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner. The medium is readable by a computer, can be stored in a memory, and can be executed on a processor.

The present embodiment may be described in terms of functional block components and various processing steps. Such functional blocks may be implemented with any number of hardware and/or software components that perform particular functions. For example, an embodiment may employ integrated circuit components such as memory, processing, logic, and look-up tables, which may carry out various functions under the control of one or more microprocessors or other control devices. Just as the components may be implemented with software programming or software elements, the present embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as a combination of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms that run on one or more processors. The present embodiment may also employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical or physical components. These terms may include the meaning of a series of software routines operating in conjunction with a processor or the like.

The specific implementations described in this embodiment are examples and do not limit the technical scope in any way. For brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual apparatus they may be replaced by, or supplemented with, various other functional, physical, or circuit connections.

In this specification (particularly in the claims), the use of the term "the" and similar referring terms may cover both the singular and the plural. Where a range is stated, it includes each individual value within the range (unless stated otherwise), as if each individual value making up the range were set out in the detailed description. Finally, unless an order is explicitly stated for the steps of a method, or stated to the contrary, the steps may be performed in any suitable order; the embodiments are not necessarily limited to the order in which the steps are described.

The present invention has been described above with a focus on its preferred embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set out in the claims rather than in the foregoing description, and all differences within the scope of their equivalents should be construed as being included in the present invention.

Claims

1. A method for performing mutation prioritization for personalized therapy, the method comprising:
obtaining mutation information of a patient to be treated, the mutation information comprising information associated with at least one of a disease, a gene, and a variation of genomic DNA;
mapping the obtained mutation information to a disease knowledge base;
forming mapped mutation information by identifying at least one of the disease, the gene, and the variation of the genomic DNA mapped to data points present in the disease knowledge base;
generating a frequency table for each category of knowledge source and each class in the category, based on the mapped mutation information; and
prioritizing the mapped mutation information in the frequency table based on a prioritization scheme.

2. The method of claim 1,
wherein the disease knowledge base comprises data on associations between one or more data points representing at least one of one or more diseases, one or more genes, and one or more variations of the genomic DNA, derived from one or more knowledge sources belonging to one or more categories, and one or more data points indicating parameters of clinical relevance, and
wherein the data on the associations is classified, with a pre-assigned priority, into a plurality of predefined classes for each category of the knowledge sources.

3. The method of claim 2,
wherein the categories of the knowledge sources comprise a clinical trial category, a therapy category, and a publication category.

4. The method of claim 2,
wherein the frequency table comprises:
a plurality of columns, the frequency of occurrence of the data on the associations of the data points belonging to a particular class being distributed in each column; and
a plurality of rows, the frequency of occurrence of the data on the associations of the data points linked to a specific variation of the genomic DNA being distributed in each row.

5. The method of claim 2,
wherein the prioritization scheme comprises:
filtering the frequency table based on one category selected from the categories of the knowledge sources;
populating the frequency table with data points of the categories that were not selected during the filtering and that are linked to one or more data points associated with the selected category; and
sorting the frequency table based on the frequency of occurrence of the data points and the priority pre-assigned to the classes, vis-à-vis the ranking of the categories.

6. The method of claim 2,
wherein the prioritization scheme comprises:
arranging, in the frequency table, the linked data points by category and then by each class; and
sorting the frequency table for the mapped mutation information based on the frequency of occurrence of the data points and the priority pre-assigned to the classes.

7. The method of claim 2,
wherein the categories are ranked based on either a user input or a predefined priority.

8. An apparatus for performing mutation prioritization for personalized therapy, the apparatus comprising:
a memory; and
one or more processors coupled to the memory,
wherein the one or more processors are configured to:
obtain mutation information of a patient to be treated, the mutation information comprising information associated with at least one of a disease, a gene, and a variation of genomic DNA;
map the obtained mutation information to a disease knowledge base;
form mapped mutation information by identifying at least one of the disease, the gene, and the variation of the genomic DNA mapped to data points present in the disease knowledge base;
generate a frequency table for each category of knowledge source and each class in the category, based on the mapped mutation information; and
prioritize the mapped mutation information in the frequency table based on a prioritization scheme.

9. The apparatus of claim 8,
wherein the disease knowledge base comprises data on associations between one or more data points representing at least one of one or more diseases, one or more genes, and one or more variations of the genomic DNA, derived from one or more knowledge sources belonging to one or more categories, and one or more data points indicating parameters of clinical relevance, and
wherein the data on the associations is classified, with a pre-assigned priority, into a plurality of predefined classes for each category of the knowledge sources.

10. The apparatus of claim 9,
wherein the categories of the knowledge sources comprise a clinical trial category, a therapy category, and a publication category.

11. A method for generating a disease knowledge base, the method comprising:
obtaining information related to at least one of a disease, a gene, a variation of genomic DNA, and a parameter of clinical relevance from one or more knowledge sources belonging to one or more categories;
curating the obtained information to extract, from the knowledge sources, one or more data points representing at least one of the disease, the gene, the variation of the genomic DNA, and the parameter of clinical relevance;
identifying associations between the data points representing the disease, the gene, and the variation of the genomic DNA and the data points representing the parameter of clinical relevance, to form data on the associations of the data points;
classifying the associations of the data points into one or more classes for the linkage of the disease, the gene, and the variation of the genomic DNA; and
generating the disease knowledge base based on the associations of the classified data points of the categories.

12. The method of claim 11,
The classes
The disease, the gene, the connection of the mutation of the genomic DNA to each of the data points.

13. The method of claim 12,
wherein the disease knowledge base comprises an arrangement, by category of the knowledge source and then by each class, of data on associations between one or more data points representing one or more variations of the genomic DNA, one or more genes, and one or more diseases, derived from the knowledge sources, and one or more data points indicating the parameters of clinical relevance.

14. The method of claim 13,
wherein the classes have a pre-assigned priority.

15. The method of claim 13,
wherein the categories are ranked based on either a user input or a predefined priority.

16. An apparatus for generating a disease knowledge base, the apparatus comprising:
a memory; and
one or more processors coupled to the memory,
wherein the one or more processors are configured to:
acquire information related to at least one of a disease, a gene, a variation of genomic DNA, and a parameter of clinical relevance from one or more knowledge sources belonging to one or more categories;
curate the acquired information to extract, from the knowledge sources, one or more data points representing at least one of the disease, the gene, the variation of the genomic DNA, and the parameter of clinical relevance;
identify associations between the data points representing the disease, the gene, and the variation of the genomic DNA and the data points representing the parameter of clinical relevance, to form data on the associations of the data points;
classify the associations of the data points into one or more classes for the linkage of the disease, the gene, and the variation of the genomic DNA; and
generate the disease knowledge base based on the associations of the classified data points of the categories.

17. The apparatus of claim 16,
wherein the categories of the knowledge sources comprise a clinical trial category, a therapy category, and a publication category.