WO2021107567A1 - Method and device for identifying genetic variation causative of recessive genetic disease by using NGS

Publication number
WO2021107567A1
Authority
WO
WIPO (PCT)
Prior art keywords
genetic
variation
reads containing
mutation
mutations
Prior art date
Application number
PCT/KR2020/016706
Other languages
French (fr)
Korean (ko)
Inventor
이정설
한주현
박중영
금창원
Original Assignee
주식회사 쓰리빌리언
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 쓰리빌리언
Publication of WO2021107567A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B10/00: ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • the present invention relates to a method and apparatus for determining a genetic mutation that causes a recessive genetic disease using next-generation sequencing (NGS).
  • NGS next-generation sequencing
  • recently, next-generation sequencing (NGS), which produces a large amount of short sequences at low cost and with rapid data production, has been rapidly replacing the traditional Sanger sequencing method.
  • with the development of NGS technology, the cost of generating a read (fragment sequence) has fallen to less than half of what it used to be, and its availability for application in the field of disease diagnosis is also increasing.
  • although NGS has the advantage of obtaining a large amount of sequencing information at a relatively low cost, it is not easy to find the causative gene of a disease from that large amount of information.
  • the present invention relates to a method and apparatus for determining a genetic mutation causing a recessive genetic disease using reads in next-generation sequencing (NGS) technology.
  • Korean Patent Publication No. 10-1614471 discloses a method for diagnosing genetic abnormalities using reads, which differs from a method for determining the genetic mutations that cause recessive genetic diseases.
  • the technical problem to be solved by the present invention is to provide a method and apparatus for determining the causative genetic mutation of a recessive genetic disease using reads in next-generation sequencing analysis.
  • the present invention also relates to a method and apparatus that determine the causative genetic mutation of a recessive genetic disease by distinguishing whether two genetic mutations detected in a read are cis-related mutations or trans-related mutations.
  • the method for determining the genetic variation causing a recessive genetic disease includes: a gene extraction step of comparing, through a gene extraction unit, the reference nucleotide sequence of the reference genome with the nucleotide sequence of the target sample in next-generation sequencing (NGS) and extracting a gene in which two or more genetic mutations have occurred; a read detection step of detecting, through a read detection unit, a read of the target sample matching the extracted gene; and a genetic variation determination step of determining, through a genetic variation determining unit, a genetic variation causing a recessive genetic disease by using the read.
  • in the genetic variation determination step, it is distinguished whether the two genetic mutations detected in the read are cis-related variants or trans-related variants, and two genetic mutations determined to be trans-related are determined to be the genetic mutation causing the recessive genetic disease. The cis-related variants refer to mutations in which the two genetic mutations are found on only one of the homologous chromosomes, and the trans-related variants refer to mutations in which the two genetic mutations are found on both homologous chromosomes.
  • the read detection step may include: identifying two genetic mutations (v1, v2) and the positions (p1, p2) at which they exist in the extracted gene; and detecting the number of reads containing both genetic mutation positions (p1, p2) (N), and, among those reads, the number of reads containing v1 (n1), the number of reads containing v2 (n2), the number of reads containing both v1 and v2 (c1), and the number of reads containing v1 but not v2 (c2).
  • in the genetic variation determination step, when the score (v1, v2) calculated by the following [Equation 1] is greater than or equal to a reference value, the two genetic variations (v1, v2) may be determined as cis-related variants.
  • N is the number of reads containing both genetic mutation positions (p1, p2)
  • n1 is the number of reads containing v1 among the reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among the reads containing both genetic mutation positions (p1, p2)
  • c1 is the number of reads containing both v1 and v2 among the reads containing both genetic mutation positions (p1, p2).
  • in the genetic variation determination step, when the score (v1, v2) calculated by the following [Equation 2] is greater than or equal to a reference value, the two genetic variations (v1, v2) may be determined as trans-related variants.
  • N is the number of reads containing both genetic mutation positions (p1, p2)
  • n1 is the number of reads containing v1 among the reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among the reads containing both genetic mutation positions (p1, p2)
  • c2 is the number of reads containing v1 but not v2 among the reads containing both genetic mutation positions (p1, p2).
  • in the genetic variation determination step, if the genetic variations (v1) and (v2) are cis-related variants and the genetic variations (v2) and (v3) are cis-related variants, the genetic variations (v1) and (v3) may be determined as cis-related variants; if (v1) and (v2) are cis-related variants and (v2) and (v3) are trans-related variants, (v1) and (v3) may be determined as trans-related variants; and if (v1) and (v2) are trans-related variants and (v2) and (v3) are trans-related variants, (v1) and (v3) may be determined as cis-related variants.
  • the extracted gene may be a gene causing a recessive genetic disease.
  • the apparatus for determining genetic mutations causing recessive genetic diseases includes: a gene extraction unit for comparing the reference nucleotide sequence of the reference genome with the nucleotide sequence of the target sample in next-generation sequencing (NGS) and extracting a gene in which two or more genetic mutations have occurred; a read detection unit for detecting a read of the target sample matching the extracted gene; and a genetic variation determining unit for determining a genetic variation causing a recessive genetic disease by using the read.
  • the genetic variation determining unit distinguishes whether the two genetic mutations detected in the read are cis-related variants or trans-related variants, and determines two genetic mutations found to be trans-related to be the genetic mutation causing the recessive genetic disease. The cis-related variants refer to mutations in which the two genetic mutations are found on only one of the homologous chromosomes, and the trans-related variants refer to mutations in which the two genetic mutations are found on both homologous chromosomes.
  • the read detection unit identifies two genetic mutations (v1, v2) and the positions (p1, p2) at which they exist in the extracted gene, and may detect the number of reads containing both genetic mutation positions (p1, p2) (N), and, among those reads, the number of reads containing v1 (n1), the number of reads containing v2 (n2), the number of reads containing both v1 and v2 (c1), and the number of reads containing v1 but not v2 (c2).
  • the genetic variation determining unit may determine the two genetic variations (v1, v2) as cis-related variants when the score (v1, v2) calculated by the following [Equation 1] is equal to or greater than a reference value.
  • N is the number of reads containing both genetic mutation sites (p1, p2)
  • n1 is the number of reads containing v1 among reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among reads containing both genetic mutation sites (p1, p2)
  • c1 is the number of reads containing both v1 and v2 among the reads containing both genetic mutation positions (p1, p2).
  • the genetic variation determining unit may determine the two genetic variations (v1, v2) as trans-related variants when the score (v1, v2) calculated by the following [Equation 2] is equal to or greater than a reference value.
  • N is the number of reads containing both genetic mutation sites (p1, p2)
  • n1 is the number of reads containing v1 among reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among the reads containing both genetic mutation sites (p1, p2)
  • c2 is the number of reads containing v1 but not v2 among the reads containing both genetic mutation positions (p1, p2).
  • if the genetic variations (v1) and (v2) are cis-related variants and the genetic variations (v2) and (v3) are cis-related variants, the genetic variation determining unit may determine the genetic variations (v1) and (v3) as cis-related variants; if (v1) and (v2) are cis-related variants and (v2) and (v3) are trans-related variants, (v1) and (v3) may be determined as trans-related variants; and if (v1) and (v2) are trans-related variants and (v2) and (v3) are trans-related variants, (v1) and (v3) may be determined as cis-related variants.
  • the extracted gene may be a gene causing a recessive genetic disease.
  • the present invention can determine the causative genetic mutation of a recessive genetic disease by using reads in next-generation sequencing (NGS).
  • the present invention can significantly reduce the time and effort for determining the genetic mutation that causes recessive genetic disease by discriminating whether two genetic mutations detected in the read of a target sample are cis-related mutations or trans-related mutations.
  • the present invention can significantly reduce the time and effort for determining the genetic mutation that causes recessive genetic disease by determining whether two genetic mutations separated by more than a read length are trans-related or cis-related mutations without statistical significance.
  • FIG. 1 is a block diagram illustrating a schematic configuration of an apparatus for determining a genetic variation causing a recessive genetic disease according to an embodiment of the present invention.
  • FIG. 2 is a diagram for explaining a case in which two genetic mutations are trans-related mutations, according to an embodiment of the present invention.
  • FIG. 3 is a diagram for explaining a case in which two genetic mutations are cis-related mutations, according to an embodiment of the present invention.
  • FIG. 4 is a diagram for explaining a case in which two genetic mutations separated by a read length or more are trans-related mutations, according to an embodiment of the present invention.
  • FIG. 5 is a diagram for explaining a case in which two genetic mutations separated by a read length or more are cis-related mutations according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart for explaining a method for determining a genetic mutation causing a recessive genetic disease according to an embodiment of the present invention.
  • FIG. 7 is a flowchart for explaining a method for determining a genetic mutation causing a recessive genetic disease according to an embodiment of the present invention.
  • FIG. 8 is a view showing the analysis result of directly confirming a read containing a genetic mutation, according to an embodiment of the present invention.
  • next-generation sequencing is a genome sequencing technique that can analyze a nucleotide sequence at high speed by processing DNA fragments in parallel. Because of these characteristics, next-generation sequencing may also be referred to as high-throughput sequencing, massively parallel sequencing, or second-generation sequencing. In addition, next-generation sequencing can be run on a variety of analysis platforms depending on the purpose.
  • analysis platforms for next-generation sequencing include Roche 454, GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer IIX, Life TECH SOLiD4, Life Technologies Ion Proton, Complete Genomics, Helicos Biosciences Heliscope, Pacific Biosciences SMRT, and the like.
  • next-generation sequencing technology can be used to detect mutations in nucleotide sequences (genetic mutations).
  • Preferred analysis platforms for detecting sequence mutations may be Illumina hybridcapture, Illumina Amplicon, and IonTorrent Amplicon, but are not limited thereto.
  • the term "genetic variation” may refer to a mutation in a nucleotide sequence occurring in a chromosome due to various factors.
  • the genetic mutation may be a somatic mutation, a mutation in a nucleotide sequence due to contamination of a sample, and a mutation in a nucleotide sequence due to a genetic disease.
  • the genetic mutation may further include mutations that may appear because of fetal DNA present in a small amount together with maternal DNA in the mother's blood, mutations present in small amounts in brain cells, and mutations in the nucleotide sequence due to alleles.
  • the genetic variation is not limited to the above.
  • the term "target sample" may refer to a biological sample obtained from a patient to confirm a genetic variation.
  • the term "reference genome", as used herein and as opposed to a target sample, may refer to a normal biological sample that does not show any genetic mutations.
  • a preferred target sample may be a tumor cell associated with a somatic mutation, and a preferred reference genome may be reference data sequenced in advance with respect to normal cells, but is not limited thereto.
  • a reference genome may be variously selected according to a target sample, and its nucleotide sequence may be analyzed together with the nucleotide sequence of the target sample.
  • the term “reads” is short-length nucleotide sequence data output from a genome sequencer.
  • the read length is generally composed of about 35 to 500 bp (base pair) depending on the type of genome sequencer, and in general, DNA bases are expressed by alphabetic letters A, C, G, and T.
  • FIG. 1 is a block diagram illustrating a schematic configuration of an apparatus for determining a genetic variation causing a recessive genetic disease according to an embodiment of the present invention.
  • FIG. 2 is a view for explaining a case in which two genetic mutations are trans-related mutations, according to an embodiment of the present invention.
  • FIG. 3 is a view for explaining a case in which two genetic mutations are cis-related mutations, according to an embodiment of the present invention.
  • the apparatus 1000 for determining a genetic mutation causing a recessive genetic disease includes a gene extraction unit 100 , a read detection unit 300 , and a genetic variation determining unit 500 .
  • the gene extraction unit 100 may extract the gene (G) in which two or more genetic mutations have occurred by comparing the reference nucleotide sequence of the reference genome with the nucleotide sequence of the target sample in next-generation sequencing (NGS).
  • the gene (G) may be a gene that causes a recessive genetic disease to be described later.
  • the read detection unit 300 may prepare a read of a target sample and detect a read (R) of the target sample that matches the gene (G) in which two or more genetic mutations have occurred.
  • the read detection unit 300 confirms the two genetic mutations (v1, v2) and the positions (p1, p2) at which the genetic mutations exist in the extracted gene (G), and can then detect reads that include at least one of the two genetic mutation positions (p1, p2).
  • the read detection unit 300 can detect the number of reads (N) including both of the two genetic mutation positions (p1, p2), and, among those reads, the number of reads containing the genetic mutation v1 (n1), the number of reads containing the genetic mutation v2 (n2), the number of reads containing both genetic mutations v1 and v2 (c1), and the number of reads containing the genetic mutation v1 but not the genetic mutation v2 (c2).
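The five counts described above can be tallied in a single pass over the reads. The following is a minimal sketch rather than the applicant's implementation: the read representation (a mapping from reference position to observed base), the function name `count_pair`, and the toy reads are all assumptions made for illustration. The two positions and the G-to-A changes are the chromosome 1 example given in this document.

```python
from typing import Iterable, Mapping, NamedTuple


class PairCounts(NamedTuple):
    N: int   # reads covering both p1 and p2
    n1: int  # of those, reads carrying v1
    n2: int  # of those, reads carrying v2
    c1: int  # of those, reads carrying both v1 and v2
    c2: int  # of those, reads carrying v1 but not v2


def count_pair(reads: Iterable[Mapping[int, str]],
               p1: int, alt1: str, p2: int, alt2: str) -> PairCounts:
    """Tally N, n1, n2, c1 and c2 for one pair of variants.

    Each read is a hypothetical mapping {reference position: observed base};
    a real pipeline would derive this from aligned reads.
    """
    N = n1 = n2 = c1 = c2 = 0
    for read in reads:
        if p1 not in read or p2 not in read:
            continue  # the read does not span both positions
        N += 1
        has_v1 = read[p1] == alt1
        has_v2 = read[p2] == alt2
        n1 += has_v1
        n2 += has_v2
        c1 += has_v1 and has_v2
        c2 += has_v1 and not has_v2
    return PairCounts(N, n1, n2, c1, c2)


# Made-up reads at the two chromosome 1 positions used as an example
# in this document (G changed to A at each position).
P1, P2 = 979690, 979835
toy_reads = [
    {P1: "A", P2: "G"},  # v1 only
    {P1: "A", P2: "G"},  # v1 only
    {P1: "G", P2: "A"},  # v2 only
    {P1: "G", P2: "A"},  # v2 only
    {P1: "A"},           # does not reach p2, so it is ignored
]
counts = count_pair(toy_reads, P1, "A", P2, "A")
```

In this toy input no read carries both variants, so c1 is 0 while c2 equals n1, which is the pattern one would expect from a trans-related pair.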
  • when a disease occurs because a gene known to cause a hereditary disease is mutated on both homologous chromosomes, that is, on both the chromosome from the father and the chromosome from the mother, the disease is called a recessive genetic disease.
  • when a disease occurs because such a gene is mutated on either one of the homologous chromosomes, the disease is called a dominant genetic disease.
  • when one of the two genetic mutations (v1, v2) occurring at different positions (p1, p2) of the same gene (G) comes from the father (a) and the other comes from the mother (b), the gene (G) is said to be recessive, and the disease is a recessive genetic disease.
  • mutations in which two genetic mutations are found on only one of the homologous chromosomes are called cis-related variants, and mutations in which two genetic mutations are found on both homologous chromosomes are called trans-related variants.
  • such a genetic mutation can be a candidate for the causative genetic variation behind the patient's recessive genetic disease.
  • a read (R) detected in next-generation sequencing (NGS) does not indicate whether it was read from the chromosome (a) sequence received from the father or the chromosome (b) sequence received from the mother, so it is not known from an individual read (R) whether the genetic mutations present in it are cis-related mutations or trans-related mutations.
  • the genetic variation determining unit 500 may determine the genetic variation causing the recessive genetic disease by using the read (R).
  • the genetic variation determining unit 500 distinguishes whether the two genetic mutations detected in the read (R) are cis-related mutations or trans-related mutations, and when they are determined to be trans-related mutations, it can determine the two genetic mutations to be genetic mutations causing a recessive genetic disease.
  • the two genetic variants (v1, v2) detected in the read (R) can be determined as cis-related variants if the score (v1, v2) calculated by the following [Equation 1] is greater than or equal to the reference value.
  • N is the number of reads containing both genetic mutation sites (p1, p2)
  • n1 is the number of reads containing v1 among reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among the reads containing both genetic mutation positions (p1, p2)
  • c1 is the number of reads containing both v1 and v2 among the reads containing both genetic mutation positions (p1, p2).
  • if v1 and v2 are cis-related variants, n1 and n2 should both come from the same reads, so n1 and n2 should be the same.
  • because n1 and n2 may not be exactly the same for experimental or biological reasons, how similar n1 and n2 are can be assessed with statistical significance (Fisher's exact p-value).
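The text names Fisher's exact p-value as the way to judge how similar n1 and n2 are, but the contingency table it is applied to is not spelled out here. The sketch below is therefore only one possible reading, and it is not [Equation 1]: it assumes a 2x2 table whose rows are the two variants and whose columns are the reads that do and do not carry the variant among the N reads covering both positions. The function names are hypothetical, and the test is written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def similarity_p_value(N: int, n1: int, n2: int) -> float:
    """Assumed layout: row 1 = (n1, N - n1), row 2 = (n2, N - n2)."""
    return fisher_exact_two_sided(n1, N - n1, n2, N - n2)
```

Under this assumed layout, a p-value close to 1 means the data give no reason to think n1 and n2 differ, which is the situation the text associates with a cis-related pair. How this p-value would enter the score (v1, v2) is not stated in this section.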
  • the two genetic variants (v1, v2) detected in the read (R) can be determined as trans-related variants if the score (v1, v2) calculated by the following [Equation 2] is greater than or equal to the reference value.
  • N is the number of reads containing both genetic mutation positions (p1, p2)
  • n1 is the number of reads containing v1 among the reads containing both genetic mutation positions (p1, p2)
  • n2 is the number of reads containing v2 among the reads containing both genetic mutation positions (p1, p2)
  • c2 is the number of reads containing v1 but not v2 among the reads containing both genetic mutation positions (p1, p2).
  • the apparatus 1000 for determining the genetic mutation causing recessive genetic disease may determine the causative genetic mutation causing the recessive genetic disease by using the read in next-generation sequencing (NGS).
  • the device for determining genetic mutations causing recessive genetic diseases can significantly reduce the time and effort for determining the causative genetic mutation that causes recessive genetic disease by distinguishing whether the two genetic mutations detected in the read of the target sample are cis-related mutations or trans-related mutations.
  • FIG. 4 is a diagram for explaining a case in which two genetic mutations separated by more than a read length are trans-related mutations, according to an embodiment of the present invention.
  • FIG. 5 is a diagram for explaining a case in which two genetic mutations separated by more than a read length are cis-related mutations, according to an embodiment of the present invention.
  • the apparatus for determining a genetic mutation causing a recessive genetic disease may determine whether two genetic mutations separated by a read length or more are a trans-related mutation or a cis-related mutation.
  • since the length of a DNA sequence that can be read at a time in next-generation sequencing (NGS) is limited, the read length is inevitably limited to about 150 bp.
  • the present invention proposes a method for determining whether two genetic mutations separated by more than a read length are trans-related or cis-related by using a mutation located between them.
  • if v1 and v2 are trans-related variants and v2 and v3 are trans-related variants, then v1 and v3 must be on the same chromosome, so v1 and v3 are cis-related variants.
  • if v1 and v2 are cis-related variants and v2 and v3 are trans-related variants, then v1 and v3 are trans-related variants.
  • if v1 and v2 are cis-related variants and v2 and v3 are cis-related variants, then v1 and v3 are cis-related variants.
  • the apparatus for determining a genetic mutation causing a recessive genetic disease can thus determine whether two genetic mutations separated by more than a read length are trans-related or cis-related without statistical significance, so the time and effort for identifying the genetic mutation causing a recessive genetic disease can be greatly reduced.
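The three rules above amount to one observation: the relationship between v1 and v3 is cis when the two intermediate relationships agree and trans when they differ. A minimal sketch of that rule follows. The names `combine` and `chain` are hypothetical, and `chain`, which folds the same rule over several intermediate variants, is an extension by the same logic rather than something stated in the text.

```python
from functools import reduce
from typing import Sequence

CIS, TRANS = "cis", "trans"


def combine(rel_12: str, rel_23: str) -> str:
    """Relationship of (v1, v3) given those of (v1, v2) and (v2, v3)."""
    for rel in (rel_12, rel_23):
        if rel not in (CIS, TRANS):
            raise ValueError(f"unknown relationship: {rel!r}")
    # Same relationship twice puts v1 and v3 on the same chromosome.
    return CIS if rel_12 == rel_23 else TRANS


def chain(relations: Sequence[str]) -> str:
    """Relationship between the first and last variant of a non-empty
    chain of adjacent pairwise relationships (illustrative extension)."""
    return reduce(combine, relations)
```

For example, `combine("trans", "trans")` gives `"cis"`, matching the first rule, and `combine("cis", "trans")` gives `"trans"`, matching the second.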
  • FIG. 6 is a schematic flowchart for explaining a method for determining a genetic variation causing a recessive genetic disease according to an embodiment of the present invention
  • FIG. 7 is a flowchart for explaining a method for determining a genetic variation causing a recessive genetic disease according to an embodiment of the present invention
  • FIG. 8 is a view showing an analysis result of directly confirming a read containing a genetic mutation, according to an embodiment of the present invention.
  • the method for determining a genetic variation causing a recessive genetic disease includes a gene extraction step (S100), a read detection step (S300), and a genetic variation determination step (S500).
  • in the gene extraction step (S100), the reference nucleotide sequence of the reference genome is compared with the nucleotide sequence of the target sample in next-generation sequencing (NGS) through the gene extraction unit (S110), and a gene in which two or more genetic mutations have occurred is then extracted (S130).
  • in the read detection step (S300), the following are detected: the number of reads (N) including both of the two genetic mutation positions (p1, p2), and, among those reads, the number of reads containing v1 (n1), the number of reads containing v2 (n2), the number of reads containing both v1 and v2 (c1), and the number of reads containing v1 but not v2 (c2).
  • the data show, read by read, the result for a genetic mutation in which the guanine (G) base at position 979690 of chromosome 1 of the target sample is changed to an adenine (A) base and a genetic mutation in which the guanine (G) base at position 979835 of chromosome 1 of the target sample is changed to an adenine (A) base.
  • the number of reads can be counted by marking normal (O) and genetic mutation (X) in each read.
  • in the genetic variation determination step (S500), the genetic variation determining unit distinguishes whether the two genetic variations (v1, v2) occurring at different positions (p1, p2) detected in the read are cis-related variations or trans-related variations (S510), and a trans-related mutation is determined to be a genetic mutation causing a recessive genetic disease (S530).
  • mutations in which two genetic mutations are found on only one of the homologous chromosomes are called cis-related variants, and mutations in which two genetic mutations are found on both homologous chromosomes are called trans-related variants.
  • two genetic mutations (v1, v2) occurring at different positions (p1, p2) detected in the read can be determined as cis-related mutations by [Equation 1] described above, and as trans-related mutations by [Equation 2].
  • the above-described embodiments of the present invention can be written as a program that can be executed on a computer, and can be implemented in a general-purpose digital computer that operates the program using a computer-readable recording medium.
  • the computer-readable recording medium includes a storage medium such as a magnetic storage medium (eg, a ROM, a floppy disk, a hard disk, etc.) and an optically readable medium (eg, a CD-ROM, a DVD, etc.).

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Animal Behavior & Ethology (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present invention provides a device for identifying a genetic variation causative of a recessive genetic disease, the device comprising: a gene extraction part in which a nucleotide sequence of a subject sample is compared with a reference nucleotide sequence of a reference genome by next generation sequencing (NGS) to extract a gene having two or more genetic variations generated thereon; a read detection part in which a read matched with the extracted gene is detected from the subject sample; and a genetic variation identification part in which a genetic variation causative of the recessive genetic disease is identified using the read.

Description

NGS를 이용한 열성유전병 원인 유전변이 판별 방법 및 장치Method and Apparatus for Determining Genetic Variation Caused by Recessive Genetic Disease Using NGS
The present invention relates to a method and apparatus for identifying, using next-generation sequencing (NGS), a genetic variation causative of a recessive genetic disease.
Since the start of the Human Genome Project, many studies have been conducted on disease-causing genes, and sequencing technologies for determining the nucleotide sequences of genes have continued to develop. Recently, next-generation sequencing (NGS), which produces large volumes of short sequences at low cost and high speed, has been rapidly replacing traditional Sanger sequencing. As NGS technology has advanced, the cost of generating reads (sequence fragments) has fallen to less than half of what it used to be, and its applicability to disease diagnosis is increasing.
Recently, NGS has been used successfully to find the causative genes of Mendelian genetic diseases, rare diseases, cancer, and the like.
However, although NGS has the advantage of yielding a large amount of sequence information at relatively low cost, finding the causative gene of a disease in that volume of information is not easy.
The present invention relates to a method and apparatus for identifying a genetic variation causative of a recessive genetic disease using reads in next-generation sequencing (NGS) technology.
Korean Patent Publication No. 10-1614471 discloses a method for determining a diagnosis of genetic abnormality using reads, but that method differs from a method for identifying a genetic variation causative of a recessive genetic disease.
The technical problem to be solved by the present invention is to provide a method and apparatus for identifying, using reads in next-generation sequencing, a genetic variation causative of a recessive genetic disease. A further problem is to provide a method and apparatus that identify such a causative genetic variation by distinguishing whether two genetic variations detected in the reads are cis-related variants or trans-related variants.
To solve this problem, a method for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention includes: a gene extraction step of extracting, through a gene extraction unit, a gene in which two or more genetic variations have occurred by comparing the reference nucleotide sequence of a reference genome with the nucleotide sequence of a target sample in next-generation sequencing (NGS); a read detection step of detecting, through a read detection unit, reads of the target sample that match the extracted gene; and a genetic variation identification step of identifying, through a genetic variation identification unit, a genetic variation causative of the recessive genetic disease using the reads.
In the genetic variation identification step, the two genetic variations detected in the reads are classified as cis-related variants or trans-related variants, and when they are determined to be trans-related variants they are identified as genetic variations causative of the recessive genetic disease. Here, cis-related variants are variations in which the two genetic variations are found on only one of the homologous chromosomes, and trans-related variants are variations in which the two genetic variations are found on both of the homologous chromosomes.
The read detection step may include: identifying two genetic variations (v1, v2) in the extracted gene and the positions (p1, p2) at which they are located; and detecting the number of reads covering both variation positions (p1, p2) (N); among the reads covering both positions, the number of reads containing v1 (n1); among the reads covering both positions, the number of reads containing v2 (n2); among the reads covering both positions, the number of reads containing both v1 and v2 (c1); and, among the reads covering both positions, the number of reads containing v1 but not v2 (c2).
In the genetic variation identification step, the two genetic variations (v1, v2) may be determined to be cis-related variants when score(v1, v2) calculated by [Equation 1] below is equal to or greater than a reference value.
[Equation 1]
Figure PCTKR2020016706-appb-img-000001
(Here, N is the number of reads covering both variation positions (p1, p2); n1 is the number of those reads that contain v1; n2 is the number of those reads that contain v2; and c1 is the number of those reads that contain both v1 and v2.)
In the genetic variation identification step, the two genetic variations (v1, v2) may be determined to be trans-related variants when score(v1, v2) calculated by [Equation 2] below is equal to or greater than a reference value.
[Equation 2]
Figure PCTKR2020016706-appb-img-000002
(Here, N is the number of reads covering both variation positions (p1, p2); n1 is the number of those reads that contain v1; n2 is the number of those reads that contain v2; and c2 is the number of those reads that contain v1 but not v2.)
For a first read having two genetic variations (v1, v2) and a second read having two genetic variations (v2, v3) among the reads: if v1 and v2 are cis-related variants and v2 and v3 are cis-related variants, v1 and v3 may be determined to be cis-related variants; if v1 and v2 are cis-related variants and v2 and v3 are trans-related variants, v1 and v3 may be determined to be trans-related variants; and if v1 and v2 are trans-related variants and v2 and v3 are trans-related variants, v1 and v3 may be determined to be cis-related variants.
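The three propagation rules above behave like a parity rule: two identical relations give cis, and two different relations give trans. The following is a minimal sketch of that rule, not code from the specification. The names `CIS`, `TRANS`, `combine`, and `chain` are hypothetical, the trans-then-cis case is assumed by symmetry (the text lists only cis-then-trans), and extending the rule along a longer chain of variations is likewise an assumption.

```python
from functools import reduce

CIS = "cis"
TRANS = "trans"


def combine(rel_12: str, rel_23: str) -> str:
    """Relation of (v1, v3) given the relations of (v1, v2) and (v2, v3)."""
    for rel in (rel_12, rel_23):
        if rel not in (CIS, TRANS):
            raise ValueError(f"unknown relation: {rel!r}")
    # Identical relations place v1 and v3 on the same chromosome;
    # differing relations place them on opposite chromosomes.
    return CIS if rel_12 == rel_23 else TRANS


def chain(relations):
    """Relation between the first and last variation of a chain of adjacent pairs.

    Assumption: the pairwise rule is simply applied left to right.
    """
    return reduce(combine, relations)
```

For example, `combine(TRANS, TRANS)` returns `"cis"`, matching the third rule above.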
The extracted gene may be a gene that causes a recessive genetic disease.
To solve this problem, an apparatus for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention includes: a gene extraction unit that extracts a gene in which two or more genetic variations have occurred by comparing the reference nucleotide sequence of a reference genome with the nucleotide sequence of a target sample in next-generation sequencing (NGS); a read detection unit that detects reads of the target sample that match the extracted gene; and a genetic variation identification unit that identifies a genetic variation causative of the recessive genetic disease using the reads.
The genetic variation identification unit classifies the two genetic variations detected in the reads as cis-related variants or trans-related variants and, when they are determined to be trans-related variants, identifies them as genetic variations causative of the recessive genetic disease. Here, cis-related variants are variations in which the two genetic variations are found on only one of the homologous chromosomes, and trans-related variants are variations in which the two genetic variations are found on both of the homologous chromosomes.
The read detection unit may identify two genetic variations (v1, v2) in the extracted gene and the positions (p1, p2) at which they are located, and may detect the number of reads covering both variation positions (p1, p2) (N); among the reads covering both positions, the number of reads containing v1 (n1); among the reads covering both positions, the number of reads containing v2 (n2); among the reads covering both positions, the number of reads containing both v1 and v2 (c1); and, among the reads covering both positions, the number of reads containing v1 but not v2 (c2).
The genetic variation identification unit may determine the two genetic variations (v1, v2) to be cis-related variants when score(v1, v2) calculated by [Equation 1] below is equal to or greater than a reference value.
[Equation 1]
Figure PCTKR2020016706-appb-img-000003
(Here, N is the number of reads covering both variation positions (p1, p2); n1 is the number of those reads that contain v1; n2 is the number of those reads that contain v2; and c1 is the number of those reads that contain both v1 and v2.)
The genetic variation identification unit may determine the two genetic variations (v1, v2) to be trans-related variants when score(v1, v2) calculated by [Equation 2] below is equal to or greater than a reference value.
[Equation 2]
Figure PCTKR2020016706-appb-img-000004
(Here, N is the number of reads covering both variation positions (p1, p2); n1 is the number of those reads that contain v1; n2 is the number of those reads that contain v2; and c2 is the number of those reads that contain v1 but not v2.)
For a first read having two genetic variations (v1, v2) and a second read having two genetic variations (v2, v3) among the reads: if v1 and v2 are cis-related variants and v2 and v3 are cis-related variants, v1 and v3 may be determined to be cis-related variants; if v1 and v2 are cis-related variants and v2 and v3 are trans-related variants, v1 and v3 may be determined to be trans-related variants; and if v1 and v2 are trans-related variants and v2 and v3 are trans-related variants, v1 and v3 may be determined to be cis-related variants.
The extracted gene may be a gene that causes a recessive genetic disease.
In addition to the technical problems of the present invention mentioned above, other features and advantages of the present invention are described below or will be clearly understood from that description by those of ordinary skill in the art to which the present invention pertains.
The present invention as described above has the following effects.
The present invention can identify, using reads in next-generation sequencing (NGS), a genetic variation causative of a recessive genetic disease.
By distinguishing whether two genetic variations detected in the reads of a target sample are cis-related variants or trans-related variants, the present invention can greatly reduce the time and effort needed to identify a genetic variation causative of a recessive genetic disease.
By determining, without relying on statistical significance, whether two genetic variations separated by more than a read length are trans-related variants or cis-related variants, the present invention can greatly reduce the time and effort needed to identify a genetic variation causative of a recessive genetic disease.
In addition, further features and advantages of the present invention may be newly recognized through the embodiments of the present invention.
FIG. 1 is a block diagram illustrating a schematic configuration of an apparatus for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a case in which two genetic variations are trans-related variants, according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a case in which two genetic variations are cis-related variants, according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a case in which two genetic variations separated by more than a read length are trans-related variants, according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a case in which two genetic variations separated by more than a read length are cis-related variants, according to an embodiment of the present invention.
FIG. 6 is a schematic flowchart for explaining a method for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention.
FIG. 7 is a flowchart for explaining a method for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention.
FIG. 8 is a diagram showing the result of an analysis in which reads containing genetic variations were inspected directly, according to an embodiment of the present invention.
It should be noted that, in assigning reference numerals to the components of each drawing in this specification, the same components are given the same numerals wherever possible, even when they appear in different drawings.
Meanwhile, the terms used in this specification should be understood as follows.
A singular expression is to be understood as including the plural unless the context clearly indicates otherwise, and terms such as "first" and "second" serve only to distinguish one component from another; the scope of rights should not be limited by these terms.
Terms such as "include" or "have" should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
In addition, for clarity of interpretation, the terms used in this specification are defined below.
As used herein, the term "next-generation sequencing" refers to a genome sequencing technology that analyzes nucleotide sequences at high speed by processing DNA fragments in parallel. Because of this characteristic, next-generation sequencing may also be called high-throughput sequencing, massively parallel sequencing, or second-generation sequencing. Various analysis platforms may be used for next-generation sequencing depending on the purpose. Examples include Roche 454, GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer IIX, Life Technologies SOLiD4, Life Technologies Ion Proton, Complete Genomics, Helicos Biosciences Heliscope, and Pacific Biosciences SMRT. Furthermore, next-generation sequencing can be used to detect variations in nucleotide sequences (genetic variations). Preferred analysis platforms for detecting sequence variations may be Illumina hybridcapture, Illumina Amplicon, and IonTorrent Amplicon, but are not limited thereto.
As used herein, the term "genetic variation" may refer to a variation in a nucleotide sequence that occurs on a chromosome due to any of various factors. For example, a genetic variation may be a somatic mutation, a sequence variation caused by sample contamination, or a sequence variation caused by a genetic disease. Genetic variations may further include allelic sequence variations that can appear because of fetal DNA present in small amounts alongside maternal DNA in a pregnant woman's blood, and mutations present in small amounts in brain cells. However, genetic variations are not limited to the foregoing.
As used herein, the term "target sample" may refer to a biological sample obtained from a patient in whom genetic variations are to be identified, and the term "reference genome" may refer to a normal biological sample in which, in contrast to the target sample, no genetic variation appears. A preferred target sample may be a tumor cell associated with a somatic mutation, and a preferred reference genome may be reference data sequenced in advance from normal cells, but they are not limited thereto. For example, the reference genome may be selected in various ways according to the target sample, and its nucleotide sequence may be analyzed together with the nucleotide sequence of the target sample.
As used herein, the term "reads" refers to the short nucleotide sequence data output by a genome sequencer. Read length varies with the type of genome sequencer, generally from about 35 to 500 bp (base pairs), and DNA bases are generally represented by the letters A, C, G, and T.
FIG. 1 is a block diagram illustrating a schematic configuration of an apparatus for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention, FIG. 2 is a diagram for explaining a case in which two genetic variations are trans-related variants, and FIG. 3 is a diagram for explaining a case in which two genetic variations are cis-related variants.
Referring to FIGS. 1 to 3, an apparatus 1000 for identifying a genetic variation causative of a recessive genetic disease according to an embodiment of the present invention includes a gene extraction unit 100, a read detection unit 300, and a genetic variation identification unit 500.
The gene extraction unit 100 may extract a gene (G) in which two or more genetic variations have occurred by comparing the reference nucleotide sequence of a reference genome with the nucleotide sequence of a target sample in next-generation sequencing (NGS). The gene (G) may be a gene that causes a recessive genetic disease, as described later.
The read detection unit 300 may prepare the reads of the target sample and detect the reads (R) of the target sample that match the gene (G) in which two or more genetic variations have occurred.
After identifying two genetic variations (v1, v2) in the extracted gene (G) and the positions (p1, p2) at which they are located, the read detection unit 300 may detect the reads that cover at least one of the two variation positions (p1, p2).
In addition, the read detection unit 300 may detect the number of reads covering both variation positions (p1, p2) (N); among the reads covering both positions, the number of reads containing the genetic variation v1 (n1); among the reads covering both positions, the number of reads containing the genetic variation v2 (n2); among the reads covering both positions, the number of reads containing both v1 and v2 (c1); and, among the reads covering both positions, the number of reads containing v1 but not v2 (c2).
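The five counts described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the implementation of the read detection unit: each read is modeled as a hypothetical dictionary mapping a reference position to the base observed there, and the names `PairCounts` and `count_reads` are invented for this example. Real reads would first have to be aligned to the reference.

```python
from typing import Dict, List, NamedTuple


class PairCounts(NamedTuple):
    N: int   # reads covering both p1 and p2
    n1: int  # of those, reads containing v1
    n2: int  # of those, reads containing v2
    c1: int  # of those, reads containing both v1 and v2
    c2: int  # of those, reads containing v1 but not v2


def count_reads(reads: List[Dict[int, str]],
                p1: int, v1: str, p2: int, v2: str) -> PairCounts:
    """Count reads per the definitions of N, n1, n2, c1 and c2.

    Assumption: a read is a {position: observed base} mapping, and a read
    "contains" a variation when it shows the variant base at that position.
    """
    # Only reads that cover both variation positions are considered.
    spanning = [r for r in reads if p1 in r and p2 in r]
    has_v1 = [r[p1] == v1 for r in spanning]
    has_v2 = [r[p2] == v2 for r in spanning]
    return PairCounts(
        N=len(spanning),
        n1=sum(has_v1),
        n2=sum(has_v2),
        c1=sum(a and b for a, b in zip(has_v1, has_v2)),
        c2=sum(a and not b for a, b in zip(has_v1, has_v2)),
    )
```

Note that, by these definitions, n1 always equals c1 + c2.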
In humans, when a disease develops because of variations in a gene known to cause a genetic disease on both homologous chromosomes, that is, on both the chromosome inherited from the father and the chromosome inherited from the mother, the disease is called a recessive genetic disease. When a disease develops because of a variation in such a gene on only one of the chromosome inherited from the father and the chromosome inherited from the mother, the disease is called a dominant genetic disease.
Referring to FIG. 2, when one of two genetic variations (v1, v2) occurring at different positions (p1, p2) of the same gene (G) came from the father (a) and the other came from the mother (b), and the gene (G) causes a disease, the gene (G) is said to be recessively inherited and the disease is a recessive genetic disease.
In contrast, referring to FIG. 3, when both genetic variations (v1, v2) occurring at different positions (p1, p2) of the same gene (G) came from only one of the father (a) and the mother (b), no disease develops if the gene (G) is recessively inherited.
Here, variations in which the two genetic variations are found on only one of the homologous chromosomes are called cis-related variants, and variations in which the two genetic variations are found on both of the homologous chromosomes are called trans-related variants.
That is, when the gene carrying the genetic variations found in a patient causes disease in a recessive manner, and the variations found in the patient come from both the father and the mother (trans-related variants), the variations can be candidates for the genetic variation causative of the patient's recessive genetic disease.
In contrast, when the gene carrying the genetic variations found in a patient causes disease in a recessive manner, and the variations found in the patient come from only one of the father and the mother (cis-related variants), the variations are excluded from the candidates for the genetic variation causative of the recessive genetic disease.
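The candidate rule in the two paragraphs above can be made concrete with a small sketch. It assumes, purely for illustration, that the variations carried by the paternal and maternal chromosomes are already known, which is not the case for raw NGS reads. It also assumes each variation is heterozygous, that is, present on exactly one chromosome. The function names are hypothetical.

```python
from typing import Optional, Set


def pair_relation(paternal: Set[str], maternal: Set[str],
                  v1: str, v2: str) -> Optional[str]:
    """Return "cis", "trans", or None for two variations of one gene.

    Assumption: `paternal` and `maternal` list the variations on each
    homologous chromosome, and each variation sits on exactly one of them.
    """
    pair = {v1, v2}
    if pair <= paternal or pair <= maternal:
        return "cis"    # both variations on a single chromosome
    if (v1 in paternal and v2 in maternal) or (v1 in maternal and v2 in paternal):
        return "trans"  # one variation on each chromosome
    return None         # at least one variation is absent


def is_recessive_candidate(paternal: Set[str], maternal: Set[str],
                           v1: str, v2: str) -> bool:
    """Only trans-related variants remain candidates for a recessive gene."""
    return pair_relation(paternal, maternal, v1, v2) == "trans"
```

With `paternal={"v1"}` and `maternal={"v2"}`, neither copy of the gene is free of variation, so the pair stays a candidate; with both variations on the paternal chromosome, the maternal copy is intact and the pair is excluded.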
Thus, establishing whether two or more variations occurring at different positions (p1, p2) of the same gene are cis-related variants or trans-related variants provides important information for judging whether they are genetic variations causative of a recessive genetic disease.
Meanwhile, for each read (R) detected in next-generation sequencing (NGS), it is not known whether the read was sequenced from the chromosome inherited from the father (a) or from the chromosome inherited from the mother (b), so it cannot be known directly whether the genetic variations present in each read (R) are cis-related variants or trans-related variants.
The genetic variation determining unit (500) can determine the causative genetic variants of a recessive genetic disease using the reads (R).
The genetic variation determining unit (500) classifies two genetic variants detected in the reads (R) as cis-related or trans-related and, when they are determined to be trans-related, can identify the two variants as causative genetic variants of a recessive genetic disease.
Here, two genetic variants (v1, v2) detected in the reads (R) can be determined to be cis-related variants when score(v1, v2), calculated by [Equation 1] below, is greater than or equal to a reference value.
[Equation 1]
Figure PCTKR2020016706-appb-img-000005
Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c1 is the number of reads covering both positions that contain both v1 and v2.
In theory, if the variants are cis-related, a read covering both p1 and p2 should show v1 and v2 together, so n1 and n2 should be equal. In practice, n1 and n2 are not exactly equal for experimental or biological reasons, so the determination is made using the statistical significance (Fisher's exact p-value) of how similar n1 and n2 are.
In addition, two genetic variants (v1, v2) detected in the reads (R) can be determined to be trans-related variants when score(v1, v2), calculated by [Equation 2] below, is greater than or equal to a reference value.
[Equation 2]
Figure PCTKR2020016706-appb-img-000006
Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c2 is the number of reads covering both positions that contain v1 but not v2.
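The formulas of [Equation 1] and [Equation 2] are published only as images and are not reproduced here. The following is a minimal sketch of one way the quantities N, n1, n2, c1 and c2 could feed a Fisher's exact test; it is an assumption for illustration, not the patent's scoring formula. The function names and the -log10 scoring are hypothetical. It relies only on the identity c2 = n1 - c1, which follows from the definitions above.

```python
from math import comb, log10


def fisher_one_sided(both, v1_only, v2_only, neither, alternative):
    """One-sided Fisher's exact p-value for a 2x2 table of read counts.

    alternative="greater": v1 and v2 co-occur more than chance (cis-like).
    alternative="less":    v1 and v2 co-occur less than chance (trans-like).
    """
    row1 = both + v1_only              # reads carrying v1
    col1 = both + v2_only              # reads carrying v2
    total = row1 + v2_only + neither   # all reads covering p1 and p2
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    ks = range(both, hi + 1) if alternative == "greater" else range(lo, both + 1)
    # Hypergeometric tail probability, summed exactly with integers.
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k) for k in ks)
    return min(1.0, tail / comb(total, col1))


def phase_scores(N, n1, n2, c1):
    """Return (cis_score, trans_score) as -log10 of the one-sided p-values."""
    c2 = n1 - c1                       # v1 without v2
    v2_only = n2 - c1                  # v2 without v1
    neither = N - n1 - n2 + c1         # reference at both positions
    if min(c1, c2, v2_only, neither) < 0:
        raise ValueError("inconsistent read counts")
    floor = 1e-300                     # guard against log10(0) on underflow
    p_cis = fisher_one_sided(c1, c2, v2_only, neither, "greater")
    p_trans = fisher_one_sided(c1, c2, v2_only, neither, "less")
    return -log10(max(p_cis, floor)), -log10(max(p_trans, floor))
```

With 40 spanning reads of which 20 carry both variants and 20 carry neither, the cis score is large and the trans score is zero; with 20 reads carrying only v1 and 20 carrying only v2, the pattern is reversed.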
As described above, the apparatus (1000) for determining causative genetic variants of a recessive genetic disease according to an embodiment of the present invention can determine the genetic variants that cause a recessive genetic disease using reads from next-generation sequencing (NGS).
In addition, distinguishing whether two genetic variants are cis-related or trans-related ordinarily requires genetic testing of both the patient's father and mother. The apparatus (1000) according to an embodiment of the present invention instead distinguishes whether two variants detected in the reads of the target sample are cis-related or trans-related, which can greatly reduce the time and effort needed to determine the causative genetic variants of a recessive genetic disease.
Fig. 4 is a diagram illustrating a case in which two genetic variants separated by more than the read length are trans-related variants according to an embodiment of the present invention, and Fig. 5 is a diagram illustrating a case in which two genetic variants separated by more than the read length are cis-related variants according to an embodiment of the present invention.
An apparatus for determining causative genetic variants of a recessive genetic disease according to another embodiment of the present invention can determine whether two genetic variants separated by more than the read length are trans-related or cis-related.
Because the length of DNA sequence that can be read at one time in next-generation sequencing (NGS) is limited, the read length is necessarily limited to about 150 bp.
Consequently, no single read covers two variants separated by more than the read length, and it cannot be determined with statistical significance whether such variants are trans-related or cis-related.
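The read-length limit can be expressed as a simple check. The sketch below assumes single-end reads of a fixed length and ignores paired-end fragments; the function name and the default of 150 bp are illustrative assumptions.

```python
def can_share_read(p1: int, p2: int, read_length: int = 150) -> bool:
    """Return True if one read of `read_length` bases could cover both positions.

    A read must span abs(p2 - p1) + 1 bases to include both positions.
    """
    return abs(p2 - p1) + 1 <= read_length
```

For example, positions 979690 and 979835 on the same chromosome span 146 bases and could be covered by one 150 bp read, whereas positions 979690 and 980000 could not.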
The present invention provides a method for determining whether two genetic variants separated by more than the read length are trans-related or cis-related by using a variant located between them.
Referring to Fig. 4, if v1 and v2 are trans-related variants and v2 and v3 are trans-related variants, then v1 and v3 are cis-related variants.
Homologous chromosomes form a pair of two chromosomes, so if v1 and v2 are on different chromosomes and v2 and v3 are on different chromosomes, v1 and v3 must be on the same chromosome and are therefore cis-related.
Referring to Fig. 5, if v1 and v2 are cis-related variants and v2 and v3 are trans-related variants, then v1 and v3 are trans-related variants.
Homologous chromosomes form a pair of two chromosomes, so if v1 and v2 are on the same chromosome and v2 and v3 are on different chromosomes, v1 and v3 must be on different chromosomes and are therefore trans-related.
Although not illustrated, if v1 and v2 are cis-related variants and v2 and v3 are cis-related variants, then v1 and v3 are cis-related variants.
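The three cases above follow one rule: two pairwise relations of the same kind give cis, and two of different kinds give trans. A minimal sketch of that rule follows, assuming each variant is heterozygous and sits on exactly one of the two homologous chromosomes. The names `combine` and `chain` are hypothetical, and extending the rule along a chain of more than one intermediate variant is an illustrative generalization rather than something stated in the text.

```python
from functools import reduce

CIS, TRANS = "cis", "trans"


def combine(rel_a: str, rel_b: str) -> str:
    """Relation of (v1, v3) given the relations of (v1, v2) and (v2, v3)."""
    for rel in (rel_a, rel_b):
        if rel not in (CIS, TRANS):
            raise ValueError(f"unknown relation: {rel!r}")
    # Same kind twice keeps v1 and v3 on one chromosome; mixed kinds split them.
    return CIS if rel_a == rel_b else TRANS


def chain(relations):
    """Relation between the first and last variant of a non-empty chain
    of pairwise relations between adjacent variants."""
    return reduce(combine, relations)
```

For example, `combine(TRANS, TRANS)` gives `"cis"` (the Fig. 4 case) and `combine(CIS, TRANS)` gives `"trans"` (the Fig. 5 case).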
As described above, the apparatus for determining causative genetic variants of a recessive genetic disease according to another embodiment of the present invention determines whether two genetic variants separated by more than the read length are trans-related or cis-related without relying on statistical significance, which can greatly reduce the time and effort needed to determine the causative genetic variants of a recessive genetic disease.
Hereinafter, a method for determining causative genetic variants of a recessive genetic disease according to an embodiment of the present invention is described with reference to Figs. 6 to 8.
Fig. 6 is a schematic flowchart illustrating a method for determining causative genetic variants of a recessive genetic disease according to an embodiment of the present invention, Fig. 7 is a flowchart illustrating the same method in more detail, and Fig. 8 shows an analysis result obtained by directly inspecting the reads that contain the genetic variants, according to an embodiment of the present invention.
The method for determining causative genetic variants of a recessive genetic disease according to an embodiment of the present invention includes a gene extraction step (S100), a read detection step (S300), and a genetic variation determination step (S500).
In the gene extraction step (S100), the gene extraction unit compares the reference sequence of the reference genome with the sequence of the target sample in next-generation sequencing (NGS) (S110) and then extracts genes in which two or more genetic variants have occurred (S130).
Next, in the read detection step (S300), the read detection unit identifies two genetic variants (v1, v2) and the positions (p1, p2) at which they occur in the extracted gene (S310) and then detects reads that cover at least one of the two variant positions (p1, p2) (S330).
In the read count detection step (S330), the following are counted: the number of reads covering both variant positions (p1, p2) (N); among those reads, the number containing v1 (n1); the number containing v2 (n2); the number containing both v1 and v2 (c1); and the number containing v1 but not v2 (c2).
Referring to Fig. 8, the data show the per-read results for a variant in which the guanine (G) at position 979690 on chromosome 1 of the target sample is changed to adenine (A) and a variant in which the guanine (G) at position 979835 on chromosome 1 of the target sample is changed to adenine (A).
In the read detection step (S300), the read counts can be obtained by marking each read as normal (O) or variant (X) at each of the two variants contained in the extracted gene.
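The counting described above can be sketched as follows. The data layout is an assumption made for illustration: each read is a dictionary mapping a covered variant position to "O" (normal) or "X" (variant), and a position missing from the dictionary means the read does not cover it. The name `count_reads` and the toy reads are hypothetical and are not the data of Fig. 8.

```python
from typing import Dict, Iterable, NamedTuple


class ReadCounts(NamedTuple):
    N: int    # reads covering both p1 and p2
    n1: int   # of those, reads containing v1
    n2: int   # of those, reads containing v2
    c1: int   # of those, reads containing both v1 and v2
    c2: int   # of those, reads containing v1 but not v2


def count_reads(reads: Iterable[Dict[int, str]], p1: int, p2: int) -> ReadCounts:
    N = n1 = n2 = c1 = c2 = 0
    for marks in reads:
        if p1 not in marks or p2 not in marks:
            continue                      # read does not cover both positions
        has_v1 = marks[p1] == "X"
        has_v2 = marks[p2] == "X"
        N += 1
        n1 += has_v1
        n2 += has_v2
        c1 += has_v1 and has_v2
        c2 += has_v1 and not has_v2
    return ReadCounts(N, n1, n2, c1, c2)


# Toy reads (made up) at the two positions named in the Fig. 8 description.
example_reads = [
    {979690: "X", 979835: "O"},
    {979690: "O", 979835: "X"},
    {979690: "X", 979835: "O"},
    {979690: "X"},                        # covers only p1, so it is skipped
    {979690: "O", 979835: "X"},
]
example_counts = count_reads(example_reads, 979690, 979835)
```

In these toy reads no read carries both variants, which is the pattern expected of trans-related variants.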
Next, in the genetic variation determination step (S500), the genetic variation determining unit determines whether two genetic variants (v1, v2) occurring at different positions (p1, p2) and detected in the reads are cis-related or trans-related (S510), and identifies trans-related variants as causative genetic variants of a recessive genetic disease (S530).
Here, two genetic variants found on only one of the homologous chromosomes are called cis-related variants, and two genetic variants found across both homologous chromosomes are called trans-related variants.
Two genetic variants (v1, v2) occurring at different positions (p1, p2) and detected in the reads can be determined to be cis-related variants by [Equation 1] described above, and trans-related variants by [Equation 2].
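The decision in S510 and S530 can be sketched as a comparison of two scores against the reference value. The text does not give the reference value or say what happens when both scores, or neither, reach it, so the `threshold` parameter and the "undetermined" outcome below are assumptions, and the function names are hypothetical.

```python
def classify_pair(cis_score: float, trans_score: float, threshold: float) -> str:
    """Classify a variant pair from its two scores (S510)."""
    is_cis = cis_score >= threshold
    is_trans = trans_score >= threshold
    if is_trans and not is_cis:
        return "trans"
    if is_cis and not is_trans:
        return "cis"
    return "undetermined"   # assumption: ambiguous or unsupported pairs are left open


def is_causative_candidate(cis_score: float, trans_score: float, threshold: float) -> bool:
    """Only trans-related pairs are kept as causative candidates (S530)."""
    return classify_pair(cis_score, trans_score, threshold) == "trans"
```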
Meanwhile, the embodiments of the present invention described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program from a computer-readable recording medium. The computer-readable recording medium includes storage media such as magnetic storage media (for example, ROM, floppy disks, hard disks) and optically readable media (for example, CD-ROM, DVD).
The present invention described above is not limited to the foregoing embodiments and the accompanying drawings, and it will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications and changes are possible without departing from the technical spirit of the present invention.

Claims (15)

  1. A method for determining causative genetic variants of a recessive genetic disease, the method comprising:
    a gene extraction step of extracting, through a genetic variation extraction unit, genes in which two or more genetic variants have occurred by comparing the reference sequence of a reference genome with the sequence of a target sample in next-generation sequencing (NGS);
    a read detection step of detecting, through a read detection unit, reads of the target sample that match the extracted gene; and
    a genetic variation determination step of determining, through a genetic variation determining unit, the causative genetic variants of the recessive genetic disease using the reads.
  2. The method of claim 1,
    wherein the genetic variation determination step
    classifies two genetic variants detected in the reads as cis-related variants or trans-related variants and, when they are determined to be trans-related variants, identifies them as causative genetic variants of the recessive genetic disease, and
    wherein the cis-related variants are two genetic variants found on only one of the homologous chromosomes, and the trans-related variants are two genetic variants found across both homologous chromosomes.
  3. The method of claim 2,
    wherein the read detection step comprises:
    identifying two genetic variants (v1, v2) and the positions (p1, p2) at which the variants occur in the extracted gene; and
    detecting the number of reads covering both variant positions (p1, p2) (N), the number of reads covering both positions that contain v1 (n1), the number of reads covering both positions that contain v2 (n2), the number of reads covering both positions that contain both v1 and v2 (c1), and the number of reads covering both positions that contain v1 but not v2 (c2).
  4. The method of claim 3,
    wherein, in the genetic variation determination step,
    the two genetic variants (v1, v2) are determined to be cis-related variants when score(v1, v2) calculated by [Equation 1] below is greater than or equal to a reference value.
    [Equation 1]
    Figure PCTKR2020016706-appb-img-000007
    (Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c1 is the number of reads covering both positions that contain both v1 and v2.)
  5. The method of claim 3,
    wherein, in the genetic variation determination step,
    the two genetic variants (v1, v2) are determined to be trans-related variants when score(v1, v2) calculated by [Equation 2] below is greater than or equal to a reference value.
    [Equation 2]
    Figure PCTKR2020016706-appb-img-000008
    (Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c2 is the number of reads covering both positions that contain v1 but not v2.)
  6. The method of claim 2,
    wherein, for a first read having two genetic variants (v1, v2) and a second read having two genetic variants (v2, v3) among the reads,
    if variant (v1) and variant (v2) are cis-related variants and variant (v2) and variant (v3) are cis-related variants, variant (v1) and variant (v3) are determined to be cis-related variants,
    if variant (v1) and variant (v2) are cis-related variants and variant (v2) and variant (v3) are trans-related variants, variant (v1) and variant (v3) are determined to be trans-related variants, and
    if variant (v1) and variant (v2) are trans-related variants and variant (v2) and variant (v3) are trans-related variants, variant (v1) and variant (v3) are determined to be cis-related variants.
  7. The method of claim 1,
    wherein the extracted gene is a gene that causes a recessive genetic disease.
  8. An apparatus for determining causative genetic variants of a recessive genetic disease, the apparatus comprising:
    a gene extraction unit that extracts genes in which two or more genetic variants have occurred by comparing the reference sequence of a reference genome with the sequence of a target sample in next-generation sequencing (NGS);
    a read detection unit that detects reads of the target sample that match the extracted gene; and
    a genetic variation determining unit that determines the causative genetic variants of the recessive genetic disease using the reads.
  9. The apparatus of claim 8,
    wherein the genetic variation determining unit
    classifies two genetic variants detected in the reads as cis-related variants or trans-related variants and, when they are determined to be trans-related variants, identifies them as causative genetic variants of the recessive genetic disease, and
    wherein the cis-related variants are two genetic variants found on only one of the homologous chromosomes, and the trans-related variants are two genetic variants found across both homologous chromosomes.
  10. The apparatus of claim 9,
    wherein the read detection unit
    identifies two genetic variants (v1, v2) and the positions (p1, p2) at which the variants occur in the extracted gene, and
    detects the number of reads covering both variant positions (p1, p2) (N), the number of reads covering both positions that contain v1 (n1), the number of reads covering both positions that contain v2 (n2), the number of reads covering both positions that contain both v1 and v2 (c1), and the number of reads covering both positions that contain v1 but not v2 (c2).
  11. The apparatus of claim 10,
    wherein the genetic variation determining unit
    determines the two genetic variants (v1, v2) to be cis-related variants when score(v1, v2) calculated by [Equation 1] below is greater than or equal to a reference value.
    [Equation 1]
    Figure PCTKR2020016706-appb-img-000009
    (Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c1 is the number of reads covering both positions that contain both v1 and v2.)
  12. The apparatus of claim 10,
    wherein the genetic variation determining unit
    determines the two genetic variants (v1, v2) to be trans-related variants when score(v1, v2) calculated by [Equation 2] below is greater than or equal to a reference value.
    [Equation 2]
    Figure PCTKR2020016706-appb-img-000010
    (Here, N is the number of reads covering both variant positions (p1, p2); n1 is the number of reads covering both positions that contain v1; n2 is the number of reads covering both positions that contain v2; and c2 is the number of reads covering both positions that contain v1 but not v2.)
  13. The apparatus of claim 9,
    wherein, for a first read having two genetic variants (v1, v2) and a second read having two genetic variants (v2, v3) among the reads,
    if variant (v1) and variant (v2) are cis-related variants and variant (v2) and variant (v3) are cis-related variants, variant (v1) and variant (v3) are determined to be cis-related variants,
    if variant (v1) and variant (v2) are cis-related variants and variant (v2) and variant (v3) are trans-related variants, variant (v1) and variant (v3) are determined to be trans-related variants, and
    if variant (v1) and variant (v2) are trans-related variants and variant (v2) and variant (v3) are trans-related variants, variant (v1) and variant (v3) are determined to be cis-related variants.
  14. The apparatus of claim 8,
    wherein the extracted gene is a gene that causes a recessive genetic disease.
  15. A computer-readable recording medium on which is recorded a program for executing, on a computer, the method of any one of claims 1 to 7.
PCT/KR2020/016706 2019-11-28 2020-11-24 Method and device for identifying genetic variation causative of recessive genetic disease by using ngs WO2021107567A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2019-0155304 2019-11-28
KR1020190155304A KR102319447B1 (en) 2019-11-28 2019-11-28 Method and Apparatus for discriminating the mutations of genes related to recessive inherited disease using next generation sequencing(NGS)

Publications (1)

Publication Number Publication Date
WO2021107567A1 true WO2021107567A1 (en) 2021-06-03

Family

ID=76130655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/016706 WO2021107567A1 (en) 2019-11-28 2020-11-24 Method and device for identifying genetic variation causative of recessive genetic disease by using ngs

Country Status (2)

Country Link
KR (1) KR102319447B1 (en)
WO (1) WO2021107567A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014133369A1 (en) 2013-02-28 2014-09-04 주식회사 테라젠이텍스 Method and apparatus for diagnosing fetal aneuploidy using genomic sequencing

Also Published As

Publication number Publication date
KR102319447B1 (en) 2021-10-29
KR20210066276A (en) 2021-06-07

Similar Documents

Publication Publication Date Title
WO2017023148A1 (en) Novel method capable of differentiating fetal sex and fetal sex chromosome abnormality on various platforms
WO2016195382A1 (en) Next-generation nucleotide sequencing using adaptor comprising bar code sequence
US20160092630A1 (en) Accurate and fast mapping of reads to genome
CN108197434B (en) Method for removing human gene sequence in metagenome sequencing data
WO2014183270A1 (en) Method for detecting chromosomal structural abnormalities and device therefor
WO2019139363A1 (en) Method for detecting circulating tumor dna in sample including acellular dna and use thereof
CN111755072B (en) Method and device for simultaneously detecting methylation level, genome variation and insertion fragment
WO2013065944A1 (en) Method for sequence recombination and apparatus for ngs
WO2017135768A1 (en) Method and system for predicting risk of developing genetic disorder in putative offspring
WO2019031785A9 (en) Method for detecting variation in nucleotide sequence on basis of gene panel and device for detecting variation in nucleotide sequence using same
WO2016124600A1 (en) Method of typing nucleic acid or amino acid sequences based on sequence analysis
WO2017126943A1 (en) Method for determining chromosome abnormalities
Normand et al. An introduction to high-throughput sequencing experiments: design and bioinformatics analysis
WO2021107567A1 (en) Method and device for identifying genetic variation causative of recessive genetic disease by using ngs
WO2021132920A1 (en) Tailored gene chip for genetic test and fabrication method therefor
WO2017213470A1 (en) Multiple z-score-based non-invasive prenatal testing method and apparatus
WO2017191871A1 (en) Method and device for determining reliability of variation detection marker
WO2023096224A1 (en) Method for detecting chromosome aneuploidy of fetus on basis of virtual data
KR101857735B1 (en) Methods for identifying and filtering of false somatic variants caused by laboratory vector contamination
WO2023090709A1 (en) Apparatus and method for analyzing cells by using state information of chromosome structure
WO2013097328A1 (en) Method and device for tagging genomic indel site
Probst et al. A new Trypanosoma cruzi genotyping method enables high resolution evolutionary analyses
WO2017204414A1 (en) Method and apparatus for analyzing degree of cross-contamination of sample
WO2019031867A1 (en) Method for increasing accuracy of analysis by removing primer sequence in amplicon-based next-generation sequencing
WO2014119914A1 (en) Method for providing information about gene sequence-based personal marker and apparatus using same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20892491

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20892491

Country of ref document: EP

Kind code of ref document: A1