TWI669618B - A method and an apparatus for predicting disease, and a method for calculating the weighted score of risk of occurrence of each disease


Info

Publication number
TWI669618B
Authority
TW
Taiwan
Prior art keywords
disease
risk
relative risk
database
research
Application number
TW106140934A
Other languages
Chinese (zh)
Other versions
TW201926080A
Inventor
辛東稷
鄭縣暻
裵閏詵
Original Assignee
南韓商韓國美迪基因科技有限公司
辛東稷
Application filed by 南韓商韓國美迪基因科技有限公司 and 辛東稷
Priority to TW106140934A
Publication of TW201926080A
Application granted
Publication of TWI669618B


Abstract

The present invention relates to a method and an apparatus for predicting disease, and to a method for calculating a weighted score for the risk of occurrence of each disease. The system and apparatus for disease-related gene analysis using single nucleotide polymorphisms (SNPs) according to the invention involve the following steps: (1) deriving objective and specific SNP-disease associations from a disease and drug response database, a research database, and a gene database by means of an improved algorithm; and (2) analyzing the complexity of the SNPs associated with a particular disease derived in step (1), thereby calculating the risk of that disease. The system and apparatus are therefore expected to improve the accuracy of disease prediction results.

Description

Method and apparatus for predicting disease, and method for calculating the weighted score of risk of occurrence of each disease

The present invention relates to a system and an apparatus for disease-related gene analysis using single nucleotide polymorphisms.

A single nucleotide polymorphism (SNP) is a common DNA sequence variation at a single base in a given chromosomal region, and the human genome contains about 3 million SNPs. Because SNPs are frequent, stable, and distributed throughout the genome, they account for much of the genetic diversity between individuals, and these differences lead to differences in susceptibility to various diseases. In recent years, therefore, "genetic analysis technologies for predicting diseases" have been developed. These technologies comprise selecting, on the basis of SNP information, appropriate genes involved in human disease (disease candidate genes); identifying variants of those genes; and statistically analyzing the association between the genes and the disease (US2008-0020484, KR1483284, etc.). Predictive genetic analysis services for disease and drug response aim to predict the likelihood of disease occurrence and its prognosis, on the basis of individual characteristics, by analyzing genetic variants associated with disease and drug response before examining, treating, or prescribing for families and relatives at high risk of developing certain genetic diseases.

Accordingly, the accuracy of the "database of relationships between diseases and genetic information" is critical to predicting disease accurately.

Conventional genetic analysis systems for predicting disease and drug response comprise a series of procedures, such as pre-consultation, triage, identification of individual genomic variants (experiments), and result reporting, each tailored to the requester's request and the type of suspected disease, together with the collection and use of the associated information. In such systems, however, reliability problems arise because widely published data are not accurately vetted and because the objective, detailed reports to be delivered to the requester are insufficiently discussed.

The present invention relates to a system and an apparatus for disease-related gene analysis using single nucleotide polymorphisms. The system according to the invention is expected to predict disease with high accuracy by using an improved algorithm.

The present invention has been made to solve the above problems of the prior art and relates to a system and an apparatus for disease-related gene analysis using single nucleotide polymorphisms.

However, the technical objects to be achieved by the present invention are not limited to those mentioned above, and other objects not mentioned here will be clearly understood by those skilled in the art from the following description.

The present invention provides a method for calculating a weighted score for the risk of occurrence of each disease, comprising the steps of: (a) estimating the relative risk of each genotype of each gene; (b) generalizing the relative risk of each genotype of each gene to the population; (c) calculating a relative-risk score for each gene; (d) calculating an average relative-risk score across all genes; (e) calculating a score for the subject's genotypes; and (f) calculating the relative risk of the subject's genotypes.

In an embodiment of the invention, the relative risk in step (a) is calculated as RR = OR / ((1 - P) + P × OR), where OR is the odds ratio and P is the disease prevalence.
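The conversion in step (a) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name is hypothetical and the prevalence P is assumed to be given as a fraction between 0 and 1. As P approaches 0 the result approaches the odds ratio itself, which is the familiar rare-disease approximation.

```python
def odds_ratio_to_relative_risk(odds_ratio: float, prevalence: float) -> float:
    """Step (a): RR = OR / ((1 - P) + P * OR), with P given as a fraction."""
    if not 0.0 <= prevalence < 1.0:
        raise ValueError("prevalence must be a fraction in [0, 1)")
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return odds_ratio / ((1.0 - prevalence) + prevalence * odds_ratio)


# Hypothetical input: OR = 1.5 for a risk genotype, prevalence 10 %
print(round(odds_ratio_to_relative_risk(1.5, 0.10), 4))  # 1.4286
```

For OR = 1.5 and a prevalence of 10%, the denominator is 0.9 + 0.15 = 1.05, so the estimated relative risk is about 1.43, somewhat lower than the odds ratio.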

In an embodiment of the invention, the population relative risk in step (b) is calculated as relative risk × frequency of the corresponding genotype.

In an embodiment of the invention, the relative-risk score of each gene in step (c) is calculated as the sum of the genotype relative-risk scores of that gene.

In an embodiment of the invention, the average relative-risk score in step (d) is calculated as the product of the relative-risk scores of all genes.
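Steps (b) to (d) can be sketched as follows. The SNP labels, genotypes, relative risks, and frequencies are hypothetical illustration data, not values from the patent, and the code assumes each gene's genotype frequencies come from the relevant reference population and sum to 1.

```python
from math import prod

# Hypothetical per-gene table: genotype -> (relative risk from step (a),
# genotype frequency in the reference population). Frequencies per gene sum to 1.
EXAMPLE_TABLE = {
    "SNP_1": {"AA": (1.0, 0.49), "AG": (1.2, 0.42), "GG": (1.5, 0.09)},
    "SNP_2": {"CC": (1.0, 0.64), "CT": (0.8, 0.32), "TT": (0.6, 0.04)},
}


def genotype_population_risks(genotypes):
    """Step (b): relative risk x genotype frequency for each genotype."""
    return {gt: rr * freq for gt, (rr, freq) in genotypes.items()}


def gene_score(genotypes):
    """Step (c): sum of the step-(b) values over one gene's genotypes."""
    return sum(genotype_population_risks(genotypes).values())


def average_score(table):
    """Step (d): product of the per-gene scores over all genes."""
    return prod(gene_score(g) for g in table.values())


print(round(gene_score(EXAMPLE_TABLE["SNP_1"]), 3))  # 1.129
print(round(gene_score(EXAMPLE_TABLE["SNP_2"]), 3))  # 0.92
print(round(average_score(EXAMPLE_TABLE), 5))        # 1.03868
```

Each per-gene score is the frequency-weighted mean relative risk of that gene in the population, so the product in step (d) serves as the baseline against which an individual's genotype combination is later compared.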

In an embodiment of the invention, the score in step (e) is calculated as the product of the relative risks of the subject's genotype at each gene.

In an embodiment of the invention, the relative risk in step (f) is calculated as the score from step (e) divided by the average relative-risk score from step (d).
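Steps (e) and (f) can be sketched on the same kind of hypothetical table (illustrative data, not from the patent). The sketch assumes the subject has been genotyped at every gene in the table; a missing gene raises a KeyError rather than silently shrinking the product.

```python
from math import prod

# Hypothetical data: genotype -> (relative risk, population genotype frequency)
TABLE = {
    "SNP_1": {"AA": (1.0, 0.49), "AG": (1.2, 0.42), "GG": (1.5, 0.09)},
    "SNP_2": {"CC": (1.0, 0.64), "CT": (0.8, 0.32), "TT": (0.6, 0.04)},
}


def average_score(table):
    # Step (d): product over genes of sum(RR x frequency)
    return prod(sum(rr * f for rr, f in g.values()) for g in table.values())


def subject_score(table, subject):
    # Step (e): product of the RRs of the subject's own genotypes, one per gene
    return prod(table[gene][subject[gene]][0] for gene in table)


def subject_relative_risk(table, subject):
    # Step (f): subject score divided by the population average score
    return subject_score(table, subject) / average_score(table)


subject = {"SNP_1": "GG", "SNP_2": "CT"}
print(round(subject_score(TABLE, subject), 3))          # 1.2
print(round(subject_relative_risk(TABLE, subject), 3))  # 1.155
```

In this hypothetical case the subject's combined genotype risk is about 15.5% above the population average.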

The present invention also provides a method for predicting disease, comprising the steps of: (a) extracting DNA from a requested sample; (b) extracting genetic information from the DNA; (c) measuring the risk of a particular disease by comparing the genetic information with SNP-disease association results from first to third databases; (d) when two or more SNPs associated with the particular disease are present in the genetic information of the requested sample, calculating the relative risk of each genotype by applying weighted scores calculated by the above method for calculating a weighted score for the risk of occurrence of each disease; (e) calculating the relative risk (%) and incidence (%) for the requested sample; and (f) determining the risk of disease occurrence for the requested sample.
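Steps (c) and (d) can be sketched as a filter followed by the weighted-score calculation. Everything here is hypothetical: the merged `ASSOCIATIONS` structure stands in for the first to third databases, and restricting the population average to the matched SNPs is an assumption. The description only specifies weighting for two or more associated SNPs, so the sketch returns `None` otherwise instead of guessing how a single SNP would be handled.

```python
from math import prod

# Hypothetical merged view of the three databases:
# disease -> SNP id -> genotype -> (relative risk, population genotype frequency)
ASSOCIATIONS = {
    "disease_X": {
        "SNP_1": {"AA": (1.0, 0.49), "AG": (1.2, 0.42), "GG": (1.5, 0.09)},
        "SNP_2": {"CC": (1.0, 0.64), "CT": (0.8, 0.32), "TT": (0.6, 0.04)},
        "SNP_3": {"GG": (1.0, 0.81), "GT": (1.3, 0.18), "TT": (1.9, 0.01)},
    }
}


def weighted_relative_risk(disease, subject):
    table = ASSOCIATIONS.get(disease, {})
    # Step (c): keep only disease-associated SNPs genotyped in the sample
    matched = {snp: gt for snp, gt in subject.items()
               if snp in table and gt in table[snp]}
    if len(matched) < 2:
        return None  # weighting is specified only for two or more SNPs
    # Step (d): weighted-score method restricted to the matched SNPs
    average = prod(sum(rr * f for rr, f in table[snp].values()) for snp in matched)
    score = prod(table[snp][gt][0] for snp, gt in matched.items())
    return score / average


sample = {"SNP_1": "GG", "SNP_2": "CT", "SNP_9": "AA"}  # SNP_9 is not associated
print(round(weighted_relative_risk("disease_X", sample), 3))  # 1.155
print(weighted_relative_risk("disease_X", {"SNP_1": "GG"}))   # None
```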

In an embodiment of the invention, the first database in step (c) is a disease and drug response database containing information on the symptoms of a particular disease associated with a particular SNP, the types of prescribed drugs, the concentrations of those drugs, the frequency of prescription, the prescription period, and side effects.

In an embodiment of the invention, the second database in step (c) is a research database containing research papers on particular diseases associated with particular SNPs, together with each paper's PubMed identifier, study subjects, study method, study period, results, journal information, and reproducibility information.

In an embodiment of the invention, the third database in step (c) is a gene database containing the chromosome number, locus, and allele information of particular SNPs associated with a particular disease.
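The three databases can be pictured as record types sharing a SNP-disease key. The sketch below is hypothetical: the class and field names are illustrative mappings of the listed contents, the shared `(snp_id, disease)` key is an assumption, and reproducibility information is simplified to a boolean flag.

```python
from dataclasses import dataclass, field

# Hypothetical record layouts for the three databases; the shared
# (snp_id, disease) key is an assumption made for illustration.


@dataclass
class DiseaseDrugRecord:            # first database: disease and drug response
    snp_id: str
    disease: str
    symptoms: list[str] = field(default_factory=list)
    drug: str = ""
    drug_concentration: str = ""
    prescription_frequency: str = ""
    prescription_period: str = ""
    side_effects: list[str] = field(default_factory=list)


@dataclass
class StudyRecord:                  # second database: research papers
    snp_id: str
    disease: str
    pmid: str
    subjects: str = ""
    method: str = ""
    study_period: str = ""
    result: str = ""
    journal: str = ""
    replicated: bool = False        # reproducibility, simplified to a flag


@dataclass
class GeneRecord:                   # third database: SNP genomic information
    snp_id: str
    disease: str
    chromosome: str
    locus: int
    alleles: tuple[str, str]


def collect_evidence(snp_id, disease, drugs, studies, genes):
    """Gather the records from all three databases for one SNP-disease pair."""
    key = (snp_id, disease)
    return {
        "drug": [r for r in drugs if (r.snp_id, r.disease) == key],
        "studies": [r for r in studies if (r.snp_id, r.disease) == key],
        "gene": [r for r in genes if (r.snp_id, r.disease) == key],
    }
```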

In an embodiment of the invention, when the relative risk in step (d) is 1 or greater (≥ 1), the relative risk (%) in step (e) is calculated as (subject's average risk score - 1) × 100; when the relative risk in step (d) is less than 1 (< 1), it is calculated as (1 - subject's average risk score) × 100.

In an embodiment of the invention, the incidence (%) in step (e) is calculated as the relative risk from step (d) × prevalence.

In an embodiment of the invention, the determination in step (f) classifies the sample as "standard" when the relative risk (%) from step (e) is 1 or less (≤ 1), and as "warning" or "attention" when it is greater than 1 (> 1).
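Steps (e) and (f) can be sketched as three small functions. The names and the 8% prevalence are hypothetical. Because the percentage formula reports only the size of the deviation from 1, comparing the percentage itself with 1 would flag a subject whose risk is 20% below average; this sketch therefore applies the threshold to the relative-risk ratio, matching the "average risk score ≤ 1" criterion used later in the description. That is an interpretation, not a confirmed detail of the method.

```python
def relative_risk_percent(avg_risk_score: float) -> float:
    """Step (e): size of the deviation from the population average, in percent."""
    if avg_risk_score >= 1:
        return (avg_risk_score - 1) * 100
    return (1 - avg_risk_score) * 100


def incidence_percent(relative_risk: float, prevalence_percent: float) -> float:
    """Step (e): expected incidence = relative risk x prevalence."""
    return relative_risk * prevalence_percent


def judge(relative_risk: float) -> str:
    """Step (f), applied to the ratio rather than the percentage (assumption)."""
    return "standard" if relative_risk <= 1 else "warning or attention"


rr = 1.155  # hypothetical subject relative risk
print(round(relative_risk_percent(rr), 1))   # 15.5
print(round(incidence_percent(rr, 8.0), 2))  # 9.24
print(judge(rr))                             # warning or attention
print(round(relative_risk_percent(0.8), 1))  # 20.0
print(judge(0.8))                            # standard
```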

The present invention also provides an apparatus for predicting disease, comprising: (a) an extraction unit configured to extract DNA from a requested sample; (b) an input unit configured to extract genetic information from the DNA; (c) a comparison unit configured to measure the risk of a particular disease by comparing the genetic information with SNP-disease association results from first to third databases; (d) an arithmetic unit configured to calculate, when two or more SNPs associated with the particular disease are present in the genetic information of the requested sample, the relative risk of each genotype by applying weighted scores calculated by the above method for calculating a weighted score for the risk of occurrence of each disease; (e) a calculation unit configured to calculate the relative risk (%) and incidence (%) for the requested sample; and (f) a determination unit configured to determine the risk of disease occurrence for the requested sample.

In an embodiment of the invention, the first database of the comparison unit (c) is a disease and drug response database containing information on the symptoms of a particular disease associated with a particular SNP, the types of prescribed drugs, the concentrations of those drugs, the frequency of prescription, the prescription period, and side effects.

In an embodiment of the invention, the second database of the comparison unit (c) is a research database containing research papers on particular diseases associated with particular SNPs, together with each paper's PubMed identifier, study subjects, study method, study period, results, journal information, and reproducibility information.

In an embodiment of the invention, the third database of the comparison unit (c) is a gene database containing the chromosome number, locus, and allele information of particular SNPs associated with a particular disease.

In an embodiment of the invention, when the relative risk in the arithmetic unit (d) is 1 or greater (≥ 1), the relative risk (%) in the calculation unit (e) is calculated as (subject's average risk score - 1) × 100; when the relative risk in the arithmetic unit (d) is less than 1 (< 1), it is calculated as (1 - subject's average risk score) × 100.

In an embodiment of the invention, the incidence (%) in the calculation unit (e) is calculated as the relative risk from the arithmetic unit (d) × prevalence.

In an embodiment of the invention, the determination unit (f) classifies the sample as "standard" when the relative risk (%) in the calculation unit (e) is 1 or less (≤ 1), and as "warning" or "attention" when it is greater than 1 (> 1).

The system and apparatus for disease-related gene analysis using SNPs according to the present invention involve the following steps: (1) deriving objective and specific SNP-disease associations from a disease and drug response database, a research database, and a gene database by means of an improved algorithm; and (2) analyzing the complexity of the SNPs associated with a particular disease derived in step (1), thereby calculating the risk of that disease. The system and apparatus are therefore expected to improve the accuracy of disease prediction results.

FIG. 1 is a schematic diagram of a method for predicting a particular disease using SNPs according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of a method for predicting susceptibility to a particular drug using SNPs according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a method for predicting the side effects of a particular drug using SNPs according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of a method for applying a weighted score for each genotype and predicting the risk of disease occurrence according to an embodiment of the present invention.

FIG. 5 is a schematic diagram detailing the steps of calculating the relative risk (%) and incidence (%) of a disease in a subject according to the present invention.

Hereinafter, various embodiments will be described with reference to the drawings. In the following description, numerous specific details, such as specific configurations, compositions, and processes, are set forth to provide a thorough understanding of the present invention. However, certain embodiments may be practiced without one or more of these specific details, or in combination with other known methods and configurations. In other instances, well-known processes and preparation techniques are not described in detail so as not to obscure the invention unnecessarily. Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, configuration, composition, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the phrases "in one embodiment" or "an embodiment" appearing in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, configurations, compositions, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless otherwise indicated in this specification, all scientific and technical terms used herein have the same meanings as commonly understood by those skilled in the art to which the invention belongs.

In one embodiment of the invention, "genetic information" broadly refers to all information encoded in the nucleotide sequence of DNA. In the present invention, "genetic information" includes an individual's SNP information.

In one embodiment of the invention, a "single nucleotide polymorphism (SNP)" is a common DNA sequence variation at a single base in a given chromosomal region. The human genome contains about 3 million SNPs, roughly one every 500 to 1,000 bases. Of these, about 200,000 are presumed to be coding SNPs (cSNPs) located in protein-coding genes. SNPs are frequent, stable, and distributed throughout the genome, and they give rise to genetic diversity between individuals: for example, one person may have adenine (A) at a particular site in the DNA strand while another has cytosine (C) at the same site. These small differences can alter the function of individual genes, and their interactions produce different human phenotypes and different susceptibilities to disease. For instance, if the genetic differences between patients with hepatitis and people without it can be found, the causes of variation in hepatitis susceptibility can be identified, and an ultimate goal of human genetic research is to use such differences to develop drugs that prevent or treat hepatitis. Large global pharmaceutical companies and genetic research institutions therefore regard SNPs as a source of information for new drug development; on this basis they formed The SNP Consortium (TSC) and concentrated on SNP research in pursuit of healthy longevity, a lasting human ideal. However, even when a large number of SNPs has been identified, an SNP on its own has no meaning: without subjects against which to compare and analyze them, SNPs are of no use. Domestic pharmaceutical companies and research institutions have therefore worked to secure as much comparative data (patients' DNA and clinical data) as possible on diseases such as heart disease, dementia, and AIDS, and to build databases recording which SNPs are associated with which diseases.

In one embodiment of the invention, the "database associated with disease and drug response" is a database used to measure the risk of a particular disease by comparison with SNP-disease association results. It includes information on the symptoms of a particular disease associated with a particular SNP, the types and concentrations of prescribed drugs, the frequency and period of prescription, and side effects. This database may contain disease and drug information provided by domestic and foreign Food and Drug Administrations (FDA), medical institutions, and health check-up centers, and may include the age and sex of a particular individual as well as the family history of his or her spouse, children, parents, cousins, and so on.

In one embodiment of the invention, the "research database" is a database used to measure the risk of a particular disease by comparison with SNP-disease association results; its research data may include, but are not limited to, clinical or academic papers. Where the data are derived from papers, the database should contain each paper's PubMed identifier (PMID), study subjects, study method, study period, results, journal information, and reproducibility information, and may include the age and sex of the individuals studied as well as the family history of their spouses, children, parents, cousins, and so on.

In one embodiment of the invention, the "gene database" is a database used to measure the risk of a particular disease by comparison with SNP-disease association results. It may contain genetic information on particular SNPs associated with a particular disease, including chromosome number, locus, and allele information. Among the data stored in the gene database, the ethnicity of the individual to be analyzed may be an important factor, although the database is not limited thereto.

In one embodiment of the invention, the "odds ratio" is an estimate of the relative risk obtained from a case-control study. The relative risk is a value estimated from a cohort study and is defined as the ratio of the probability that an event occurs when a risk factor is present to the probability that it occurs when the risk factor is absent. In a cohort study, the risk factor is defined in advance and the occurrence of events is observed over time, so the relative risk reliably expresses the association between event and risk factor. In a case-control study, which is the design relevant to the technical field of the invention, subjects are first grouped by whether the event occurred and are then examined for the presence or absence of the risk factor. For this reason the relative risk cannot be estimated directly, and the odds ratio is used as an estimate of it.
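The difference between the two measures can be illustrated with a 2×2 table. The counts below are hypothetical, and the functions are the textbook definitions rather than anything specific to the patent: the relative risk uses event probabilities in exposed and unexposed groups, while the odds ratio uses exposure odds among cases and controls. When the disease is rare, the two values are close.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Cohort-study RR: P(event | risk factor) / P(event | no risk factor)."""
    return (exposed_events * unexposed_total) / (exposed_total * unexposed_events)


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Case-control OR: odds of exposure among cases / odds among controls."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Hypothetical cohort: 1,000 carriers (30 develop the disease) and
# 1,000 non-carriers (20 develop it)
print(relative_risk(30, 1000, 20, 1000))       # 1.5
print(round(odds_ratio(30, 970, 20, 980), 3))  # 1.515
```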

In one embodiment of the invention, an "algorithm" is a procedure executed by a computer program to solve a particular problem. When a desired result is reliably obtained by mechanical processing in a predetermined order, that order is called an algorithm for the purpose. In general, a known algorithmic procedure can be converted into and processed by a computer program. The algorithm in this specification compares the genetic information of a particular individual with the information stored in the disease and drug response database, the research database, and the gene database in order to derive the probability (risk) of a particular disease, to identify candidate drugs to which the individual is highly sensitive for a particular disease, or to identify candidate drugs with a high risk of side effects, but it is not limited thereto.

In one embodiment of the invention, a "target disease" is a disease whose risk and incidence can be predicted by the algorithm of the invention. In the present invention, target diseases can be broadly divided into chronic diseases, cancers, drug-response-sensitive diseases, and other diseases. Specifically, chronic diseases include, but are not limited to, type 1 diabetes, type 2 diabetes, hepatitis C, Kawasaki disease, ankylosing spondylitis, psoriasis, pulmonary tuberculosis, hypertension, osteoarthritis, osteoporosis, coronary artery disease, ulcerative colitis, narcolepsy, glaucoma, cerebral aneurysm, stroke, polycystic ovary syndrome, multiple sclerosis, gallstones, Lou Gehrig's disease, lupus, rheumatoid arthritis, rheumatic heart disease, chronic kidney disease, knee osteoarthritis, pathological myopia (high myopia), Behçet's disease, cataract, vitiligo, obesity, nonalcoholic fatty liver disease, myocardial infarction, atrial fibrillation, aspirin-sensitive chronic urticaria, atopic dermatitis, food allergy, gestational diabetes, addiction during pregnancy, triglyceride levels, asthma, herniated disc, dementia, Crohn's disease, gout, Parkinson's disease, obstructive pulmonary disease, chronic sebaceous gland disease, coronary heart disease, migraine, and macular degeneration. Cancers include, but are not limited to, liver cancer, thyroid cancer, testicular cancer, oral cancer, acute myeloid leukemia, ovarian cancer, biliary tract cancer, colorectal cancer, head and neck cancer, diffuse gastric cancer, bladder cancer, childhood leukemia, esophageal cancer, kidney cancer, gastric cancer, breast cancer, cervical cancer, endometrial cancer, prostate cancer, pancreatic cancer, lung cancer, and skin cancer. Drug-response-sensitive diseases include, but are not limited to, methamphetamine-induced psychosis, angiotensin-converting enzyme (ACE) inhibitor sensitivity, warfarin sensitivity, and propofol anesthesia sensitivity. Other diseases include, but are not limited to, attention deficit hyperactivity disorder (ADHD), panic disorder, nicotine addiction, alcohol dependence, bipolar disorder, depression, autism, and schizophrenia.

Target diseases can also be classified according to whether their prevalence can be calculated. Among the target diseases above, type 1 diabetes, type 2 diabetes, hepatitis C, Kawasaki disease, ankylosing spondylitis, psoriasis, pulmonary tuberculosis, hypertension, osteoarthritis, osteoporosis, coronary artery disease, ulcerative colitis, narcolepsy, glaucoma, cerebral aneurysm, stroke, polycystic ovary syndrome, multiple sclerosis, gallstones, Lou Gehrig's disease, lupus, rheumatoid arthritis, rheumatic heart disease, chronic kidney disease, knee osteoarthritis, pathological myopia (high myopia), Behçet's disease, cataract, vitiligo, obesity, nonalcoholic fatty liver disease, myocardial infarction, atrial fibrillation, atopic dermatitis, food allergy, gestational diabetes, addiction during pregnancy, asthma, herniated disc, dementia, Crohn's disease, gout, Parkinson's disease, chronic sebaceous gland disease, macular degeneration, liver cancer, thyroid cancer, testicular cancer, oral cancer, acute myeloid leukemia, ovarian cancer, biliary tract cancer, colorectal cancer, head and neck cancer, diffuse gastric cancer, bladder cancer, childhood leukemia, esophageal cancer, kidney cancer, gastric cancer, breast cancer, cervical cancer, endometrial cancer, prostate cancer, pancreatic cancer, lung cancer, skin cancer, ADHD, panic disorder, alcohol dependence, bipolar disorder, depression, autism, and schizophrenia can be classified as diseases whose prevalence can readily be calculated from known databases. Aspirin-sensitive chronic urticaria, triglyceride levels, methamphetamine-induced psychosis, ACE inhibitor sensitivity, warfarin sensitivity, propofol anesthesia sensitivity, obstructive pulmonary disease, and nicotine addiction can be classified as diseases whose prevalence cannot readily be calculated from known databases. The scope of the invention is not limited thereto.

In the present invention, for the group of diseases whose prevalence is easy to calculate, the algorithm is set up by determining a weighted average score of all genes in the population based on the prevalence and genotype frequencies of the corresponding ethnic group, and by predicting the relative risk and incidence of subjects in that population. The average risk score and the genotype frequency (%) are used as the criteria for judgment. Specifically, when the subject's average risk score is ≤ 1, the result is judged as the "standard stage", and when it is > 1, the result is judged as either the "warning care stage" or the "attention care stage". In the latter case, when the summed frequency of the genotype combinations whose average risk score is higher than the subject's is 5% or less, the result is judged as the "attention care stage", and when that sum exceeds 5%, it is judged as the "warning care stage". As a refinement of the algorithm, the frequency threshold separating the warning care stage from the attention care stage can be adjusted. For the group of diseases whose prevalence is not easy to calculate, the genotypes identified from the first to third databases of the present invention can be judged on three levels (low/medium/high), thereby generating relative-risk and incidence prediction models for specific diseases and drug responses.

The present invention provides a system and apparatus for disease-related gene analysis using single nucleotide polymorphisms. More specifically, according to the present invention, for the genetic prediction of diseases and drug responses, the following data are stored: information on the causes, diagnosis, prevention, etc. of diseases and drug responses in general; genetic information on genetic factors, such as the associated genes that were found, and the single nucleotide polymorphism names and loci of those genes; and the analyses that statistically demonstrate or verify that the genes found are indeed associated. Based on a review of the stored data, and taking into account the general characteristics of allelic association studies and genetic association studies, genes and gene association studies are prioritized, and the single nucleotide polymorphisms and associated values to be used in the statistical prediction algorithm are selected. A statistical prediction model is thereby generated for each disease and drug response.

Generating a statistical prediction model includes the following steps: searching for general information about the disease or drug response concerned; searching among its causes for associated genes that act as genetic factors; and searching for genetic information.

The step of searching for general information collects information about the disease or drug response concerned: its definition, causes, diagnosis, treatment, prevention, and care are confirmed, and whether the disease may be caused by genetic factors is reviewed.

The step of searching for associated genes retrieves allelic association studies (including all studies that demonstrate, or attempt to demonstrate, an association with a gene) and confirms, based on experimental results, whether the associated genes can be regarded as genetic factors. The step of searching for genetic information confirms the distribution of the associated genes in each ethnic group, their linkage disequilibrium (LD) relationships with other genes, and so on.
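The linkage disequilibrium (LD) relationship mentioned in the search step is commonly quantified with pairwise statistics such as r². The sketch below only illustrates that standard measure for two biallelic loci; it is not a procedure defined in this patent, and the function name and the choice of r² (rather than D') are assumptions.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci (illustrative helper).

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # LD coefficient D: deviation from independent inheritance
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom

print(ld_r_squared(0.5, 0.5, 0.5))   # alleles always inherited together
print(ld_r_squared(0.25, 0.5, 0.5))  # alleles inherited independently
```

An r² near 1 means one variant almost perfectly tags the other, which is why a variant reported in a study may stand in for a nearby causal variant.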

The present invention provides a system and apparatus that examine the genes described above for the diseases and drug responses requested by a requester, and then run the statistical prediction model for the corresponding disease or drug response, thereby calculating a result value that represents the association between the individual's genotype and the requested disease or drug response.

More specifically, according to the present invention, in order to predict diseases and drug responses accurately, relevant information is collected, and information on the associated single nucleotide polymorphisms is stored in databases from which candidate single nucleotide polymorphisms are selected. Using this information, a prediction algorithm for accurately predicting outcomes such as disease prognosis is applied to generate each prediction model.

In addition, the disease and drug response database (the first database in the present invention) stores information on diseases and disease-related genes retrieved through the corresponding queries, and also stores data such as prevalence/incidence by study population, together with sources such as clinical information and health guidance information.

The research database (the second database in the present invention) stores data such as the PubMed identifiers (PMIDs) of papers that demonstrate or attempt to demonstrate an association with the relevant genes, as well as each study's population, method, period, results, and bibliographic information.

After the genes retrieved from the disease database are looked up by query, the gene database (the third database in the present invention) stores data such as trait information for the specific single nucleotide polymorphisms associated with a specific disease, chromosome number, locus, and allele information.

As shown in FIGS. 1 to 4, the disease-single nucleotide polymorphism association results derived from the first to third databases are compared with the genetic information of the requested sample in order to derive the probability of occurrence (risk) of a specific disease in the individual who provided the sample, to identify candidate drugs to which that individual is likely to be highly sensitive for a specific disease, or to identify candidate drugs with a high risk of side effects. In this comparison, the age and sex of the person who provided the requested sample, family history of his/her spouse, children, parents, cousins, etc., environmental and/or habit information related to the disease site, nutritional status, lifestyle, and exercise performance information may additionally be reflected in the result. If the genetic information of the requested sample contains two or more single nucleotide polymorphisms highly associated with the occurrence of a specific disease, a weighted score may be applied to the risk of that disease, and the weighting may differ according to the correlation between the single nucleotide polymorphisms: the higher the correlation between them, the higher the weighted score that may be given.

In deriving the disease-single nucleotide polymorphism association results from the first to third databases, the algorithm described above for deriving the probability of occurrence (risk) of a specific disease in the individual who provided the requested sample may prioritize information on study methods, study populations, study periods, study results, bibliographic information, and so on. For example, for the second database, the priority may depend on whether the study was replicated after a genome-wide association study (GWAS), whether the paper was recently published, whether it appeared in a journal with a high impact factor, or what its results were (risk, confidence interval, p-value, etc.), but is not limited thereto.

In addition, priorities may be determined using a combination of the first to third databases, for example as follows, but are not limited thereto.

1. Investigate the heritability of the target disease, that is, how strongly genetic factors influence it. Because the choice of study method and the likelihood that an association study will succeed both depend on heritability, the heritability of the target disease is investigated using its prevalence/incidence.

2. Confirm how the phenotype of the study population was selected: check whether selection was based on a biochemical phenotype that serves as the patient grouping criterion and is important in the development of the disease. Such a phenotype is called an intermediate phenotype of the disease. Because its association with a particular gene is expected to be stronger and less complex than that of the disease itself, it is worth checking whether this phenotype, rather than the disease, was studied.

3. Check the sample size used in the analysis: sample size affects the power of the test. A statistical test can err in two ways: a genetic variant that is in fact unrelated to the disease may be judged statistically significant (false positive), or a gene that is related to the disease may be judged not statistically significant (false negative). The power of the test is 1 minus the probability of a false negative. Because power is determined by the genetic model, allele frequency, relative disease risk, and sample size, it is worth confirming these factors.
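How power depends on allele frequency, effect size, and sample size can be illustrated with a standard normal-approximation calculation for an allelic case-control test. This is a minimal sketch under stated assumptions (Hardy-Weinberg proportions, so alleles are counted as independent observations; a two-sided test; the negligible opposite tail is ignored); it is not a procedure given in the patent, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def allelic_test_power(p_control, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power (1 - false-negative probability) of an allelic test."""
    # Risk-allele frequency in cases implied by the allelic odds ratio
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    n1, n0 = 2 * n_cases, 2 * n_controls  # each person contributes two alleles
    p_bar = (n1 * p_case + n0 * p_control) / (n1 + n0)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = sqrt(p_case * (1 - p_case) / n1 + p_control * (1 - p_control) / n0)
    return std_normal.cdf((abs(p_case - p_control) - z_alpha * se_null) / se_alt)

for n in (250, 1000, 4000):
    print(n, round(allelic_test_power(0.3, 1.3, n, n), 3))
```

With a modest odds ratio such as 1.3, power rises steeply with the number of cases and controls, which is why small studies reporting such effects deserve lower priority.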

4. Identify the ethnicity of the study population: when several ethnic groups with different genetic backgrounds are involved, differences in genotype frequencies are very likely to reflect ethnic differences rather than an association with the disease. For this reason, it is important to confirm whether the study was conducted in a heterogeneous or a homogeneous ethnic population. If a heterogeneous population was used, check whether the differences between groups were taken into account.

5. Check the collection of genetic samples, clinical information, phenotypes, and environmental information for the study population: given the sample type, the DNA extraction method, and genetic and phenotypic complexity, check whether accurate clinical information was collected and whether the disease was classified in detail; and because most diseases are complex, check whether various environmental factors were collected and included in the analysis.

6. Confirm the method used to select candidate genes and candidate single nucleotide polymorphisms for genotyping: confirm how the disease-associated single nucleotide polymorphisms were selected and whether their allele frequencies were examined to determine the power of the statistical test.

7. Confirm whether replication experiments were performed: if a significant result is found, the experiment should be repeated to determine whether the result is a statistical error. Check whether the statistical significance was reproduced through statistical replication or functional replication. First, confirm how the nucleotide sequence variants to be studied were selected in past or current studies, and whether the significant results were confirmed by repeated experiments.

8. Confirm the statistical analysis results of the association study. In the statistical analysis of the genetic characteristics of a single nucleotide polymorphism, significance tests based on the distributions of allele frequencies and genotype frequencies are performed to estimate the relative risk (odds ratio) of the single nucleotide polymorphism. Association analysis is performed under genetic models that specify how the possible genotypes are compared. There are three genetic models: dominant, recessive, and additive. Using these models, verify whether the single nucleotide polymorphism significantly separates the patient group from the control group under specific conditions, and verify the genetic characteristics of each single nucleotide polymorphism. Likewise, the relative risk is estimated, and its confidence interval and the corresponding p-value are considered. To check for errors in the statistical results, verify whether errors arising from multiple comparisons related to the study design were accounted for.
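The odds ratio, its confidence interval, and the three genetic models in item 8 can be illustrated as follows. This is a minimal sketch under stated assumptions: the odds ratio comes from a 2×2 table of allele (or genotype-group) counts, the interval uses the common Woolf (log-scale) approximation, and genotypes are coded by the number of risk alleles carried. The helper names and the counts in the example are hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(case_risk, case_ref, ctrl_risk, ctrl_ref, z=1.96):
    """Odds ratio and Woolf 95% confidence interval from a 2x2 table of counts."""
    or_ = (case_risk * ctrl_ref) / (case_ref * ctrl_risk)
    se = sqrt(1 / case_risk + 1 / case_ref + 1 / ctrl_risk + 1 / ctrl_ref)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

def encode(risk_allele_count: int, model: str) -> int:
    """Code a genotype (0, 1 or 2 copies of the risk allele) under a genetic model."""
    if model == "dominant":   # one or more risk alleles confers the effect
        return int(risk_allele_count >= 1)
    if model == "recessive":  # both copies are required
        return int(risk_allele_count == 2)
    if model == "additive":   # the effect grows with each copy
        return risk_allele_count
    raise ValueError(f"unknown genetic model: {model}")

# Hypothetical allele counts: cases 300 risk / 700 reference, controls 200 / 800
print(odds_ratio_ci(300, 700, 200, 800))
print([encode(k, "recessive") for k in (0, 1, 2)])
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level; with many SNPs tested, that threshold would still need a multiple-comparison correction.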

9. Finally, confirm the impact factor and publication year of the journal in which the association study was published, and whether the study was replicated on the same basis. The higher the journal's impact factor, the more recent the publication, and the more studies there are, the higher the priority.
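Item 9, together with the replication check in item 7, amounts to a ranking of studies. The patent does not state how the criteria are combined, so the sketch below assumes a simple lexicographic order (replicated studies first, then higher impact factor, then more recent publication); the Study fields and example identifiers are hypothetical placeholders, not real PMIDs.

```python
from dataclasses import dataclass

@dataclass
class Study:
    pmid: str             # placeholder identifier
    replicated: bool      # replicated after the original (e.g., GWAS) finding
    impact_factor: float  # impact factor of the publishing journal
    year: int             # publication year

def prioritize(studies):
    """Assumed ordering: replicated first, then higher impact factor, then newer."""
    return sorted(studies, key=lambda s: (s.replicated, s.impact_factor, s.year),
                  reverse=True)

studies = [
    Study("study-A", False, 30.0, 2017),
    Study("study-B", True, 5.0, 2012),
    Study("study-C", True, 5.0, 2016),
]
print([s.pmid for s in prioritize(studies)])
```

A weighted score over the same fields would be an equally reasonable reading; the lexicographic key simply makes replication dominate.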

All of the above considerations are taken into account in setting priorities, and an algorithm is set up for computing on the information from the first to third databases. These prediction algorithms are applied to generate prediction models for specific diseases and drug responses.

These priority criteria serve as objective criteria, for example whether a disease, drug, or trait has been validated on the basis of accumulated information, so that the judgment follows objective standards rather than arbitrary human judgment or accumulated personal knowledge. The subject's prognosis is predicted by considering information on the candidate genes selected under these criteria, together with prevalence/incidence. Based on the prediction, disease information and health guidance such as prevention methods can be selected from the first to third databases and provided to the subject.

In one embodiment, the present invention provides a method for calculating a weighted score for the risk of occurrence of each disease, the method comprising the steps of: (a) estimating the relative risk of each genotype of each gene; (b) generalizing the relative risk of each genotype of each gene to the population; (c) calculating the relative risk score of each gene; (d) calculating the average relative risk score over all genes; (e) calculating the score of the subject's genotypes; and (f) calculating the relative risk of the subject's genotypes. In the method, the relative risk in step (a) is calculated as odds ratio / ((1 - prevalence) + (prevalence * odds ratio)). The population relative risk in step (b) is calculated as relative risk * corresponding genotype frequency. The relative risk score in step (c) is calculated as the sum of the population relative risk scores of the genotypes of each gene. The average relative risk score in step (d) is calculated as the product of the relative risk scores of all genes. The score in step (e) is calculated as the product, over all genes, of the relative risk of the subject's genotype. The relative risk in step (f) is calculated as the score from step (e) / the average relative risk score from step (d).
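Steps (a) to (f) can be sketched in Python as follows. The sketch assumes prevalence is given as a fraction, that each gene's genotype frequencies sum to 1, and that the data arrive as nested dictionaries; the function names, data layout, and the odds ratios and frequencies in the example are hypothetical, chosen only to exercise the formulas.

```python
from math import prod

def relative_risk(odds_ratio, prevalence):
    """Step (a): odds ratio / ((1 - prevalence) + prevalence * odds ratio)."""
    return odds_ratio / ((1 - prevalence) + prevalence * odds_ratio)

def weighted_risk_score(genes, subject, prevalence):
    """Return the subject's average risk score (steps (a) to (f)).

    genes:   {gene: {genotype: (odds_ratio, genotype_frequency)}}
    subject: {gene: genotype carried by the subject}
    """
    # (a) relative risk of every genotype of every gene
    rr = {g: {gt: relative_risk(o, prevalence) for gt, (o, _) in table.items()}
          for g, table in genes.items()}
    # (b) + (c) per-gene score: sum over genotypes of relative risk * frequency
    gene_scores = [sum(rr[g][gt] * f for gt, (_, f) in table.items())
                   for g, table in genes.items()]
    population_score = prod(gene_scores)                    # (d) product over genes
    subject_score = prod(rr[g][subject[g]] for g in genes)  # (e) product over genes
    return subject_score / population_score                 # (f)

# Hypothetical example: two genes, prevalence 10%
genes = {
    "gene1": {"AA": (1.0, 0.50), "AG": (1.5, 0.40), "GG": (2.0, 0.10)},
    "gene2": {"CC": (1.0, 0.64), "CT": (0.8, 0.32), "TT": (0.6, 0.04)},
}
score = weighted_risk_score(genes, {"gene1": "AG", "gene2": "CC"}, prevalence=0.10)
print(round(score, 3))  # 1.231
```

Dividing by the population score in step (f) is what makes the result relative: a subject whose genotypes carry exactly the population-average risk would score 1.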

In another embodiment, the present invention provides a method of predicting a disease, the method comprising the steps of: (a) extracting DNA from a requested sample; (b) extracting genetic information from the DNA; (c) measuring the risk of a specific disease by comparing the genetic information with the disease-single nucleotide polymorphism association results from the first to third databases; (d) when two or more single nucleotide polymorphisms associated with the specific disease are present in the genetic information of the requested sample, calculating the relative risk of each genotype by applying the weighted score calculated by the method described above; (e) calculating the relative risk (%) and incidence (%) for the requested sample; and (f) judging the risk of disease occurrence for the requested sample. In the method, the first database in step (c) is a disease and drug response database containing information on the symptoms of specific diseases associated with specific single nucleotide polymorphisms, the types of prescribed drugs, prescribed drug concentrations, prescription frequency, prescription periods, and side effects. The second database in step (c) is a research database containing research papers on specific diseases associated with specific single nucleotide polymorphisms, together with each paper's PubMed identifier, study population, study method, study period, study results, journal information, and information on the reproducibility of the study. The third database in step (c) is a gene database containing the chromosome number, locus, and allele information of specific single nucleotide polymorphisms associated with specific diseases. When the relative risk in step (d) is 1 or greater (≥ 1), the relative risk (%) in step (e) is calculated as (subject's average risk score - 1) * 100, and when the relative risk in step (d) is less than 1 (< 1), it is calculated as (1 - subject's average risk score) * 100. The incidence (%) in step (e) is calculated as the relative risk from step (d) * prevalence. The judgment in step (f) comprises judging the result as standard when the relative risk (%) in step (e) is 1 or less (≤ 1), and as warning or attention when it is greater than 1 (> 1).
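Steps (e) and (f) can be sketched as below. Following Example 3, this sketch compares the subject's average risk score (rather than the percentage) with 1 when judging; it assumes prevalence is given as a fraction, so the incidence is multiplied by 100 to yield a percentage. The function name is illustrative.

```python
def summarize(avg_score, prevalence):
    """Return (relative risk %, incidence %, judgment) for an average risk score."""
    if avg_score >= 1:
        relative_risk_pct = (avg_score - 1) * 100  # percentage above the population average
    else:
        relative_risk_pct = (1 - avg_score) * 100  # percentage below the population average
    incidence_pct = avg_score * prevalence * 100   # prevalence supplied as a fraction
    judgment = "standard" if avg_score <= 1 else "warning/attention"
    return relative_risk_pct, incidence_pct, judgment

print(summarize(1.25, 0.10))
```

For example, an average risk score of 1.25 at 10% prevalence gives a relative risk of 25% and an incidence of about 12.5%; splitting "warning/attention" further requires the genotype-frequency rule of Example 3.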

In still another embodiment, the present invention provides an apparatus for predicting a disease, the apparatus comprising: (a) an extraction unit configured to extract DNA from a requested sample; (b) an input unit configured to extract genetic information from the DNA; (c) a comparison unit configured to measure the risk of a specific disease by comparing the genetic information with the disease-single nucleotide polymorphism association results from the first to third databases; (d) an arithmetic unit configured to calculate, when two or more single nucleotide polymorphisms associated with the specific disease are present in the genetic information of the requested sample, the relative risk of each genotype by applying the weighted score calculated by the method described above; (e) a calculation unit configured to calculate the relative risk (%) and incidence (%) for the requested sample; and (f) a judgment unit configured to judge the risk of disease occurrence for the requested sample. In the apparatus, the first database in (c) is a disease and drug response database containing information on the symptoms of specific diseases associated with specific single nucleotide polymorphisms, the types of prescribed drugs, prescribed drug concentrations, prescription frequency, prescription periods, and side effects. The second database in (c) is a research database containing research papers on specific diseases associated with specific single nucleotide polymorphisms, together with each paper's PubMed identifier, study population, study method, study period, study results, journal information, and information on the reproducibility of the study. The third database in (c) is a gene database containing the chromosome number, locus, and allele information of specific single nucleotide polymorphisms associated with specific diseases. When the relative risk in (d) is 1 or greater (≥ 1), the relative risk (%) in (e) is calculated as (subject's average risk score - 1) * 100, and when the relative risk in (d) is less than 1 (< 1), it is calculated as (1 - subject's average risk score) * 100. The incidence (%) in (e) is calculated as the relative risk from (d) * prevalence. The judgment in (f) comprises judging the result as standard when the relative risk (%) in (e) is 1 or less (≤ 1), and as warning or attention when it is greater than 1 (> 1).

Hereinafter, each step of the present invention will be explained in detail.

Hereinafter, the present invention will be described in further detail with reference to examples. It will be apparent to those skilled in the art that these examples are for illustrative purposes only and are not intended to limit the scope of the invention as defined by the appended claims.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

Example 1: Data Collection

First, data on the prevalence, odds ratios, and genotype frequencies for the target diseases are collected.

Specifically, prevalence is collected from public data such as the National Health and Nutrition Survey, the Ministry of Health and Welfare-Mental Illness Epidemiology Survey, the Central Dementia Center Annual Report, the Public Health Assessment and Public Disclosure Agency Public Disclosure 3.0 Public Data, and/or the Resident Nutrition Population data, with the source noted.

For odds ratios, information from papers on the disease concerned, such as case-control studies, meta-analyses, family studies, and cohort studies, is collected. There is no restriction on the papers, provided they can be found through academic search services (e.g., PubMed, Google Scholar).

The collected data are selected in order of priority, for example population-level data first, then strength of statistical significance.

Genotype frequencies are collected from public data such as HapMap 3 and the 1000 Genomes Project.

Example 2: Calculating the Weighted Score for the Risk of Occurrence of Each Disease

The risk of each disease is calculated by Steps 1 to 8 below. Steps 1 to 4 calculate the risk for the population; Step 5 calculates the risk for the subject; Step 6 calculates the subject's average score relative to the population; and Steps 7 and 8 calculate the subject's relative risk (%) and incidence (%).

More specifically, Step 1 estimates the relative risk of each genotype of each gene. In this step, the relative risk is estimated from the odds ratio of each genotype of each gene that is a risk factor for the disease concerned, using the following equation: relative risk = odds ratio / ((1 - prevalence) + (prevalence * odds ratio)).
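The conversion in Step 1 has the same form as the widely used Zhang-Yu correction of an odds ratio to a relative risk, with the disease prevalence taking the place of the baseline risk. The short sketch below (function name illustrative) shows its effect: for an odds ratio of 2.0, the estimated relative risk is about 1.98 at 1% prevalence, 1.82 at 10%, and 1.54 at 30%, so the correction matters most for common diseases. A genotype with an odds ratio of 1 always keeps a relative risk of 1.

```python
def relative_risk(odds_ratio, prevalence):
    """Step 1: relative risk = odds ratio / ((1 - prevalence) + prevalence * odds ratio)."""
    return odds_ratio / ((1 - prevalence) + prevalence * odds_ratio)

for prevalence in (0.01, 0.10, 0.30):
    print(f"prevalence {prevalence:.0%}: RR = {relative_risk(2.0, prevalence):.2f}")
```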

Step 2 generalizes the relative risk of each genotype of each gene, that is, it generalizes the genotype risk of each gene to the corresponding nationality and ethnicity, using the following equation: population relative risk score = relative risk * corresponding genotype frequency.

Step 3 calculates the relative risk score of each gene as the sum of the population relative risk scores of its genotypes.

Step 4 calculates the average relative risk score over all genes, as the product of the per-gene relative risk scores from Step 3.

Step 5 calculates the score of the subject's genotypes, as the product over all genes of the relative risk of the subject's genotype.

Step 6 calculates the relative risk of the subject's genotypes using the following equation: subject's average risk score in the population = (score of the subject's genotypes (Step 5)) / (average relative risk score of all genes (Step 4)).

Step 7 calculates the subject's relative risk (%). If the subject's average risk score is 1 or greater (≥ 1), the relative risk is calculated as "(subject's average risk score - 1) * 100"; if it is less than 1 (< 1), it is calculated as "(1 - subject's average risk score) * 100".

Step 8 calculates the subject's incidence (%) as "subject's average risk score in the population * prevalence".

Steps 1 through 8 are shown in FIG. 5.

Example 3: Result judgment

The subject's risk of developing a disease is judged based on the average risk score in the population and the genotype frequency (%). The judgment is preferably a two-stage (standard/warning) or three-stage (standard/warning/caution) judgment, but is not limited thereto.

In both the two-stage and the three-stage judgment, "standard" applies when the subject's average risk score is 1 or less (≤1), and "warning" or "caution" applies when it is greater than 1 (>1). More specifically, the average risk score and the frequency (%) in the general population are calculated for every combination of genotypes of the genes associated with the disease. The frequencies of all combinations whose average risk score is higher than the subject's average risk score are then summed. If this cumulative frequency is 5% or less, the result is judged as "warning"; if it is greater than 5%, the result is judged as "caution". As an exception, for diseases whose prevalence cannot be calculated (e.g., triglyceride levels), the judgment is made at three levels (low/medium/high) based on the genotypes identified in the reference papers.
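The three-stage judgment can be sketched as below. The sketch makes several assumptions that the text does not state:

- The word translated as "increase" (增加) is read as summing the frequencies of the higher-scoring combinations.
- The genes are treated as independent, so a combination's population frequency is the product of its genotype frequencies.
- All names and numbers are hypothetical.

```python
import itertools
import math


def judge_risk(rr_by_gene, freq_by_gene, subject, threshold_pct=5.0):
    """Three-stage (standard/warning/caution) judgment for one disease.

    rr_by_gene:   {gene: {genotype: relative risk}}
    freq_by_gene: {gene: {genotype: frequency as a fraction}}
    subject:      {gene: subject's genotype}
    Returns (label, subject's average risk score, % of population scoring higher).
    """
    genes = list(rr_by_gene)
    # Population average score: product over genes of sum(relative risk * frequency).
    baseline = math.prod(
        sum(rr * freq_by_gene[g][gt] for gt, rr in rr_by_gene[g].items())
        for g in genes
    )

    def avg_score(combo):
        # Average risk score of one genotype combination (one genotype per gene).
        return math.prod(rr_by_gene[g][gt] for g, gt in zip(genes, combo)) / baseline

    subject_score = avg_score(tuple(subject[g] for g in genes))
    if subject_score <= 1:
        return "standard", subject_score, None

    higher_pct = 0.0
    for combo in itertools.product(*(rr_by_gene[g] for g in genes)):
        if avg_score(combo) > subject_score:
            # Assumption: independent genes, so frequencies multiply.
            higher_pct += 100 * math.prod(
                freq_by_gene[g][gt] for g, gt in zip(genes, combo)
            )
    label = "warning" if higher_pct <= threshold_pct else "caution"
    return label, subject_score, higher_pct


example_rr = {"SNP_A": {"AA": 1.0, "AG": 1.2, "GG": 1.5},
              "SNP_B": {"CC": 1.0, "CT": 0.9, "TT": 0.7}}
example_freq = {"SNP_A": {"AA": 0.5, "AG": 0.4, "GG": 0.1},
                "SNP_B": {"CC": 0.6, "CT": 0.3, "TT": 0.1}}
label, score, pct = judge_risk(example_rr, example_freq, {"SNP_A": "AG", "SNP_B": "CC"})
```

In this hypothetical data, 9% of the population has a higher-scoring combination, so the subject is judged as "caution". A subject carrying the highest-scoring combination would have 0% above it and would be judged as "warning".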

Although the present invention has been described in detail with reference to specific features, it will be apparent to those skilled in the art that this description covers only preferred embodiments and does not limit the scope of the invention. Accordingly, the substantial scope of the invention is defined by the appended claims and their equivalents.

Claims (15)

A method of calculating a weighted score for the risk of occurrence of each disease, comprising the steps of: (a) estimating the relative risk of each genotype of each gene; (b) generalizing the relative risk of each genotype of each gene in a population; (c) calculating a relative risk score for each gene; (d) calculating the average relative risk score of all genes; (e) calculating a score for the subject's genotype; and (f) calculating the relative risk of the subject's genotype, wherein the relative risk in step (a) is calculated as odds ratio / ((1 - prevalence) + (prevalence * odds ratio)); the relative risk in the population in step (b) is calculated as relative risk * corresponding genotype frequency; the relative risk score in step (c) is calculated as the sum of the relative risk scores of the genotypes of each gene; the average relative risk score in step (d) is calculated as the product of the relative risk scores of each gene; the score in step (e) is calculated as the product of the relative risks of the subject's genotype at each gene; and the relative risk in step (f) is calculated as the score of step (e) / the average relative risk score of step (d).
A method of predicting a disease, comprising the steps of: (a) extracting DNA from a requested sample; (b) extracting genetic information from the DNA; (c) measuring the risk of a specific disease by comparing the genetic information with disease-single nucleotide polymorphism association results from a first database to a third database; (d) when two or more single nucleotide polymorphisms associated with the specific disease are present in the genetic information of the requested sample, calculating the relative risk of each genotype by applying a weighted score calculated according to the method of calculating a weighted score for the risk of occurrence of each disease according to claim 1; (e) calculating the relative risk (%) and incidence (%) of the requested sample; and (f) judging the risk of disease occurrence of the requested sample.
The method of predicting a disease according to claim 2, wherein the first database in step (c) is a disease and drug response database containing information on the symptoms of a specific disease associated with a specific single nucleotide polymorphism, the types of prescribed drugs, the concentrations of the prescribed drugs, the frequency of drug prescription, the drug prescription cycle, and side effects.
The method of predicting a disease according to claim 2, wherein the second database in step (c) is a research database containing research papers on a specific disease associated with a specific single nucleotide polymorphism, together with each paper's PubMed identifier, study subjects, research method, study period, results, journal information, and reproducibility information.
The method of predicting a disease according to claim 2, wherein the third database in step (c) is a gene database containing the chromosome number, locus, and allele information of a specific single nucleotide polymorphism associated with a specific disease.
The method of predicting a disease according to claim 2, wherein, when the relative risk in step (d) is 1 or greater (≥1), the relative risk (%) in step (e) is calculated as (average risk score of the subject - 1) * 100, and when the relative risk in step (d) is less than 1 (<1), the relative risk (%) in step (e) is calculated as (1 - average risk score of the subject) * 100.
The method of predicting a disease according to claim 2, wherein the incidence (%) in step (e) is calculated as the relative risk of step (d) * prevalence.
The method of predicting a disease according to claim 2, wherein the judgment in step (f) comprises judging the result as "standard" when the relative risk (%) in step (e) is 1 or less (≤1), and as "warning" or "caution" when the relative risk (%) in step (e) is greater than 1 (>1).
An apparatus for predicting a disease, comprising: (a) an extraction unit configured to extract DNA from a requested sample; (b) an input unit configured to extract genetic information from the DNA; (c) a comparison unit configured to measure the risk of a specific disease by comparing the genetic information with disease-single nucleotide polymorphism association results from a first database to a third database; (d) an arithmetic unit configured to calculate the relative risk of each genotype, when two or more single nucleotide polymorphisms associated with the specific disease are present in the genetic information of the requested sample, by applying a weighted score calculated according to the method of calculating a weighted score for the risk of occurrence of each disease according to claim 1; (e) a calculation unit configured to calculate the relative risk (%) and incidence (%) of the requested sample; and (f) a judgment unit configured to judge the risk of disease occurrence of the requested sample.
The apparatus for predicting a disease according to claim 9, wherein the first database in the comparison unit (c) is a disease and drug response database containing information on the symptoms of a specific disease associated with a specific single nucleotide polymorphism, the types of prescribed drugs, the concentrations of the prescribed drugs, the frequency of drug prescription, the drug prescription cycle, and side effects.
The apparatus for predicting a disease according to claim 9, wherein the second database in the comparison unit (c) is a research database containing research papers on a specific disease associated with a specific single nucleotide polymorphism, together with each paper's PubMed identifier, study subjects, research method, study period, results, journal information, and reproducibility information.
The apparatus for predicting a disease according to claim 9, wherein the third database in the comparison unit (c) is a gene database containing the chromosome number, locus, and allele information of a specific single nucleotide polymorphism associated with a specific disease.
The apparatus for predicting a disease according to claim 9, wherein, when the relative risk in the arithmetic unit (d) is 1 or greater (≥1), the relative risk (%) in the calculation unit (e) is calculated as (average risk score of the subject - 1) * 100, and when the relative risk in the arithmetic unit (d) is less than 1 (<1), the relative risk (%) in the calculation unit (e) is calculated as (1 - average risk score of the subject) * 100.
The apparatus for predicting a disease according to claim 9, wherein the incidence (%) in the calculation unit (e) is calculated as the relative risk in the arithmetic unit (d) * prevalence.
The apparatus for predicting a disease according to claim 9, wherein the judgment in the judgment unit (f) comprises judging the result as "standard" when the relative risk (%) in the calculation unit (e) is 1 or less (≤1), and as "warning" or "caution" when the relative risk (%) in the calculation unit (e) is greater than 1 (>1).
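Claim 1's calculation, steps (a) through (f), can be sketched end to end as below. This is a hypothetical illustration, not the patentee's code. The input layout (dictionaries of odds ratios and genotype frequencies keyed by gene and genotype) and the example values are assumptions. Step (a) has the form of the widely used Zhang-Yu odds-ratio-to-relative-risk correction, with the disease prevalence (as a fraction) used for P.

```python
import math


def odds_ratio_to_relative_risk(odds_ratio, prevalence):
    """Step (a): RR = OR / ((1 - P) + P * OR), with P the prevalence as a fraction."""
    return odds_ratio / ((1 - prevalence) + prevalence * odds_ratio)


def weighted_risk_score(or_by_gene, freq_by_gene, subject, prevalence):
    """Steps (a)-(f): relative risk of the subject's genotype versus the population.

    or_by_gene:   {gene: {genotype: odds ratio}}
    freq_by_gene: {gene: {genotype: genotype frequency as a fraction}}
    subject:      {gene: subject's genotype}
    """
    # (a) relative risk of each genotype of each gene
    rr = {
        g: {gt: odds_ratio_to_relative_risk(o, prevalence) for gt, o in ors.items()}
        for g, ors in or_by_gene.items()
    }
    # (b) + (c) per-gene sum of relative risk * genotype frequency
    gene_scores = [sum(rr[g][gt] * freq_by_gene[g][gt] for gt in rr[g]) for g in rr]
    # (d) product of the per-gene scores
    population_average = math.prod(gene_scores)
    # (e) product of the relative risks of the subject's genotypes
    subject_score = math.prod(rr[g][subject[g]] for g in rr)
    # (f) ratio of (e) to (d)
    return subject_score / population_average


# Hypothetical single-SNP example with prevalence 10%.
score = weighted_risk_score({"SNP_A": {"AA": 1.0, "AG": 2.0}},
                            {"SNP_A": {"AA": 0.5, "AG": 0.5}},
                            {"SNP_A": "AG"}, 0.1)
# score is 40/31, about 1.29
```

With an odds ratio of 1, the conversion returns a relative risk of 1 for any prevalence. As the prevalence grows, the relative risk moves further below the odds ratio.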

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106140934A TWI669618B (en) 2017-11-24 2017-11-24 A method and an apparatus for predicting disease, and a method for calculating the weighted score of risk of occurrence of each disease

Publications (2)

Publication Number Publication Date
TW201926080A (en) 2019-07-01
TWI669618B (en) 2019-08-21

Family

ID=68048916

Country Status (1)

Country Link
TW (1) TWI669618B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI822547B (en) * 2023-01-04 2023-11-11 中國醫藥大學 System of applying machine learning to carotid sonographic features for recurrent stroke

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI532843B (en) * 2010-11-30 2016-05-11 香港中文大學 Detection of genetic or molecular aberrations associated with cancer
US20170183744A1 (en) * 2015-05-22 2017-06-29 The Regents Of The University Of California Determining risk of prostate tumor aggressiveness
TW201725526A (en) * 2015-09-30 2017-07-16 伊佛曼基因體有限公司 Systems and methods for predicting treatment-regimen-related outcomes

Also Published As

Publication number Publication date
TW201926080A (en) 2019-07-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees