WO2020231081A1 - Top3b gene mutation-based dementia diagnosis method - Google Patents

Top3b gene mutation-based dementia diagnosis method

Info

Publication number
WO2020231081A1
Authority
WO
WIPO (PCT)
Prior art keywords
dementia
gene
snv
top3b
dna
Prior art date
Application number
PCT/KR2020/006028
Other languages
French (fr)
Korean (ko)
Inventor
이왕준
정영희
Original Assignee
주식회사 엠제이브레인바이오
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 엠제이브레인바이오
Publication of WO2020231081A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6869 Methods for sequencing
    • C12Q2535/00 Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/101 Sanger sequencing method, i.e. oligonucleotide sequencing using primer elongation and dideoxynucleotides as chain terminators
    • C12Q2535/122 Massive parallel sequencing
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C12Q2600/158 Expression markers

Definitions

  • The present invention relates to a method for diagnosing dementia based on TOP3B gene mutation, and more particularly, to a method for diagnosing dementia based on the number and location of SNVs (Single Nucleotide Variants) present in the TOP3B (DNA topoisomerase III beta) gene.
  • “Dementia” refers to a set of symptoms caused by brain disease. As dementia progresses, it affects the ability to think, act and perform daily activities. The hallmark of dementia is an inability to carry out daily activities because of a decline in cognitive ability. Doctors diagnose dementia when two or more cognitive functions are significantly impaired. Such cognitive functions include memory, language function, information comprehension, spatial function, judgment and attention. People with dementia may have difficulty solving problems and controlling their emotions, and may experience personality changes. The exact symptoms experienced by dementia patients depend on which part of the brain is damaged by the disease that caused the dementia. In many types of dementia, some of the brain's nerve cells stop functioning, lose their connections to other cells, and die. Dementia usually progresses steadily. In other words, the disease gradually spreads through the brain, and the patient's symptoms worsen over time.
  • Dementia is a representative age-related neurodegenerative brain disease, with a prevalence of about 5-10% among people over 65 years of age worldwide, and most patients with dementia show progressive cognitive dysfunction, hallucinations, delusions, and loss of the ability to manage daily life.
  • In Alzheimer's disease, the most representative cause of dementia, neurofibrillary tangles are observed within neurons of the cerebral cortex and plaques of a protein called amyloid β are observed around brain cells (Hardy, J. et al., (1998) Nat Neurosci, 1, 355-8), and these deposits are presumed to cause necrosis of nerve cells.
  • Hyperphosphorylation of tau (τ) protein, inflammation, and oxidative damage also appear to be associated with the onset.
  • The diagnosis of dementia involves checking for impairment in daily life and social activities due to cognitive decline through a detailed medical history and evaluation of symptoms, and using brain imaging to investigate cerebrovascular disease and brain atrophy, after which dementia is confirmed.
  • a neuropsychological test that comprehensively evaluates not only memory, but also language ability, computational ability, spatiotemporal perception ability, and judgment is performed.
  • An additional detailed examination is required based on the information obtained through interviews with patients and guardians and through screening tests. Additional tests include neuropsychological tests (SNSB), blood tests, and various types of brain imaging tests (CT, MRI, PET).
  • Magnetic resonance imaging (MRI) can play an important role in distinguishing types of dementia. This test can determine whether it is close to Alzheimer's disease dementia or vascular dementia, and may also help in determining whether it is due to another disease.
  • One of the methods for early diagnosis of Alzheimer's disease dementia is amyloid positron emission tomography (PET), a test that images the amyloid plaques that appear in the brains of Alzheimer's disease patients. Amyloid deposition can be confirmed even when there are no symptoms of dementia (Korean Dementia Association).
  • MRI is difficult to use for early diagnosis of dementia because changes can only be confirmed once brain atrophy has progressed significantly.
  • For early diagnosis, there is a need for a new diagnostic marker that can stand in for clinical symptoms (a surrogate marker) or measure the condition before symptoms appear.
  • GWAS genome-wide association studies
  • SNP single nucleotide polymorphism
  • Alzheimer's dementia, the most representative form of dementia, is divided into early-onset AD, which shows symptoms before age 65, and late-onset AD, which develops symptoms after age 65; late-onset dementia accounts for the majority of dementia patients (> 95%).
  • The genetic risk factor known to date is the ApoE genotype.
  • ApoE is a lipid-binding protein with three isoforms, ApoE ε2, ApoE ε3, and ApoE ε4. People carrying ApoE ε4 are known to have an incidence of dementia 2 to 3 times higher in heterozygotes and 5 times higher in homozygotes than others (Christiane Reitz, Richard Mayeux, (2014) Biochem Pharmacol, 88 (4), 640-51).
  • The present inventors extracted genomic DNA from the blood of cognitively normal people over 52 years of age and of dementia patients, and then investigated SNVs in the two groups using NGS (Next Generation Sequencing) and big data analysis. The present invention was completed by confirming a distinct difference between the groups in the number and location of SNVs distributed in the TOP3B (DNA topoisomerase III beta) gene.
  • An object of the present invention is to provide a method for diagnosing dementia based on the number and location of SNV (Single Nucleotide Variant) present in TOP3B (DNA topoisomerase III beta) gene.
  • Another object of the present invention is to provide a composition and kit for diagnosing or predicting dementia, including an agent capable of detecting SNV of the TOP3B gene.
  • Another object of the present invention is to provide a composition for diagnosing or predicting dementia comprising an agent capable of detecting SNV of the TOP3B gene for use in a method for diagnosing or predicting dementia.
  • Another object of the present invention is to provide a use of an agent capable of measuring the SNV of the TOP3B gene for diagnosis or prediction of dementia.
  • Another object of the present invention is to provide the use of an agent capable of detecting SNV of the TOP3B gene in the manufacture of a kit for diagnosis or prediction of dementia.
  • The present invention provides a method for providing information for diagnosis or prediction of dementia comprising the step of detecting a single nucleotide variant (SNV) of a DNA topoisomerase III beta (TOP3B) gene in an isolated biological sample.
  • the present invention also provides a method for diagnosing dementia comprising the step of detecting SNV of the TOP3B gene in an isolated biological sample.
  • the present invention also provides a composition and kit for diagnosing dementia comprising an agent capable of detecting SNV of the TOP3B gene.
  • the present invention also provides a composition for diagnosis or prediction of dementia comprising an agent capable of detecting SNV of the TOP3B gene for use in a method for diagnosing or predicting dementia.
  • the present invention also provides the use of an agent capable of measuring the SNV of the TOP3B gene for diagnosis or prediction of dementia.
  • the present invention also provides the use of an agent capable of detecting SNV of the TOP3B gene in the manufacture of a kit for diagnosis or prediction of dementia.
  • Fig. 1 is a graph showing the incidence of SNVs in the normal group and the dementia patient group for the 132 selected candidate genes.
  • Fig. 2 is a graph showing the incidence of SNVs present in the TOP3B gene in normal people and dementia patients.
  • Fig. 3 shows the result of ROC analysis of the number of SNVs in TOP3B, obtained with an analysis program developed using the pROC package of R.
  • The e4 allele of APOE has been found to be a strong risk factor for the onset of sporadic Alzheimer's disease (Guojun Bu, (2009) Nat Rev Neurosci, 10 (5), 333-44; EH Corder, et al., (1993) Science, 261 (5123), 921-3; Yadong Huang, Lennart Mucke, (2012) Cell, 148 (6), 1204-22).
  • the APOE gene has three polymorphisms, e2, e3 and e4, with frequencies of 8.4%, 77.9%, and 13.7% respectively worldwide.
  • APOE e4 is usually found in more than 50% of Alzheimer's disease patients, but in less than 15% of controls with normal cognitive ability (Alex Ward, et al., (2012) Neuroepidemiology, 38 (1), 1-17). However, APOE e4 explains no more than about 20% of the genetic contribution to dementia. As such, the e4-mediated risk and possible triggers for Alzheimer's disease remain unclear.
  • NGS is a technology that can easily identify genetic sequences in vitro and in vivo.
  • genetics research has made significant progress due to the advent of sequencing technology.
  • NGS technology has enabled the investigation of a growing number of genes, and the development of this technology has made it easier to detect genetic mutations for disease diagnosis and treatment.
  • Most of the genes associated with dementia that have been recently identified are known to affect Aβ42 production and elimination or an important pathway in the pathogenesis of Alzheimer's disease (Celeste M Karch, Alison M Goate, (2015) Biol Psychiatry, 77 (1), 43-51; Bin Zhang, et al., (2013) Cell, 153 (3), 707-20).
  • they have complex interrelationships (Lars Bertram, Rudolph E Tanzi, (2005) J Clin Invest, 115 (6), 1449-57). There was a problem below.
  • the present invention relates to a method of providing information for diagnosis or prediction of dementia comprising the step of detecting a single nucleotide variant (SNV) of a DNA topoisomerase III beta (TOP3B) gene in an isolated biological sample.
  • the diagnosis or prediction of dementia may include not only diagnosing as a dementia patient, but also selecting as a high-risk group for dementia, but is not limited thereto.
  • Dementia typically occurs in people in their 60s or older, and the risk of developing it increases with age.
  • Dementia that occurs before the age of 65, which can appear in the 30s, 40s, and 50s, is rare and is called 'early onset dementia.'
  • In this case, the genetic predisposition is stronger than when dementia occurs after age 65. Since the likelihood of developing dementia gradually increases with age, it has been regarded as a disease of aging.
  • However, dementia is not a normal part of aging and is known to be caused by pathological degenerative changes in the nerves of the brain.
  • There are very few inherited forms of dementia, in which certain genetic mutations are known to be the cause of the onset. However, in most cases, even if these genes are not involved, people with a family history of dementia have a higher risk of developing dementia. In addition, certain health and lifestyle choices can be risk factors for dementia. People with untreated vascular risk factors, such as high blood pressure, are at high risk, as are those with low physical and mental activity. There are many diseases that cause dementia. In most cases, it is not known why such disorders develop.
  • the dementia is Alzheimer's disease, senile dementia, vascular dementia, frontotemporal dementia, dementia with Lewy Bodies, or Parkinson's disease.
  • it may be characterized as Alzheimer's disease, but is not limited thereto.
  • Alzheimer's disease is the most common form of dementia and accounts for about two-thirds of dementia patients. In the present invention, the terms Alzheimer's disease and Alzheimer's dementia are used interchangeably. The disease gradually impairs cognitive abilities and often begins with memory loss.
  • Alzheimer's disease is characterized by two abnormalities in the brain, called amyloid plaques and neurofibrillary tangles. Amyloid plaques are abnormal clumps of a protein called beta amyloid. Neurofibrillary tangles are bundles of twisted filaments made up of a protein called tau. Plaques and tangles block communication between nerve cells and cause them to die.
  • Alzheimer's disease, the most common form of dementia, is a progressive brain disease, first described in 1906 by the German physician Alois Alzheimer.
  • Senile plaques and neurofibrillary tangles observed in the brains of patients who died of Alzheimer's disease appear as pathological characteristics of Alzheimer's disease.
  • Senile plaques are formed by the accumulation of proteins and dead cells outside cells, and their main constituent is a peptide called amyloid β (Aβ).
  • The precursor, amyloid precursor protein (APP), is cleaved by β-secretase (BACE) and γ-secretase to produce Aβ (DH Small, et al., (2001) Nat Rev Neurosci, 2 (8), 595-8; BA Yankner, (1996) Neuron, 16 (5), 921-32; DJ Selkoe, (1999) Nature, 399 (6738 Suppl), A23-31).
  • 'Senile dementia' refers to a condition in which a person who had been living normally suffers significant disruption in daily life because brain function has been impaired by various causes after 65 years of age, resulting in a persistent and general decline in cognitive function compared with before.
  • senile dementia is a generic term for dementia that develops in old age after 65 years of age.
  • senile dementia was thought to be an aging phenomenon that the elderly would naturally experience, but it has been recognized as a clear brain disease through recent studies.
  • A wide variety of conditions can cause dementia in old age. The most common are 'Alzheimer's disease' and 'vascular dementia'; although relatively low in frequency, other degenerative brain diseases such as Lewy body dementia, frontotemporal lobar degeneration, and Parkinson's disease, as well as normal pressure hydrocephalus, head trauma, brain tumors, metabolic diseases, deficiency diseases, addictive diseases, and infectious diseases, can also be the cause.
  • 'Vascular dementia' is a cognitive impairment caused by damage to blood vessels in the brain. This can be caused by a single stroke or multiple strokes occurring over time.
  • Vascular dementia is diagnosed when there is evidence of vascular disease in the brain and cognitive impairment that interferes with daily life. Signs of vascular dementia may begin suddenly after a stroke or may begin gradually as vascular disease worsens. Symptoms depend on the location and size of the brain injury. This may affect one or several specific cognitive functions.
  • Vascular dementia may appear similar to Alzheimer's disease, and it is very common that Alzheimer's disease and vascular dementia occur together.
  • Lewy body disease is characterized by the formation of Lewy bodies in the brain.
  • Lewy bodies are abnormal clumps of the protein alpha-synuclein that develop inside nerve cells. They occur in specific areas of the brain, leading to changes in movement, thinking and behavior. People with Lewy body disease may experience large fluctuations in attention and thinking. They can go from near-normal performance to severe confusion within a short period of time, and visual hallucinations are also a common symptom.
  • Lewy body disease can include overlapping disorders of Lewy body dementia, Parkinson's disease or Parkinson's disease dementia. It is often diagnosed as Parkinson's disease when movement symptoms first appear, and dementia develops in most people when Parkinson's disease develops. When cognitive symptoms first appear, it is diagnosed as Lewy body dementia. Lewy body disease sometimes appears with Alzheimer's disease and/or vascular dementia.
  • Frontotemporal dementia occurs when there is gradual damage to the frontal and/or temporal lobes of the brain. Symptoms often begin in a person's 50s or 60s, or earlier. There are two main types of frontotemporal dementia: the frontal variant (associated with behavioral symptoms and personality changes) and the temporal variant (associated with language impairment). However, the two often overlap. Because the frontal lobe of the brain controls judgment and social behavior, patients with frontotemporal dementia often have problems maintaining socially appropriate behavior. They may behave rudely, neglect normal responsibilities, and be compulsive, repetitive, aggressive, disinhibited, or impulsive. There are two main types of the temporal lobe, or language, variant of frontotemporal dementia.
  • Semantic dementia is associated with a gradual loss of word meaning, difficulty finding words, and difficulty understanding language. Progressive non-fluent aphasia is less common but affects the ability to speak fluently. Frontotemporal dementia is also called frontotemporal lobar degeneration (FTLD) or Pick's disease.
  • 'Parkinson's disease' is a representative degenerative brain disease. It is a disease in which dopamine-secreting nerve cells in a specific part of the midbrain called the substantia nigra are gradually lost for unknown reasons. Dopamine is a neurotransmitter in the brain that is necessary for movement, and symptoms such as bradykinesia (slow movement), tremors at rest, muscle stiffness, and postural instability occur in Parkinson's patients.
  • diagnostic tests for dementia include mental state tests such as MMSE (Mini Mental State Examination) and neuropsychological tests such as clinical dementia rating (CDR).
  • The MMSE, a simplified mental state test, checks whether cognitive abilities have deteriorated due to diseases such as dementia or alcoholism. It is a tool that measures orientation, recall, short-term memory, concentration, constructional ability, and language ability. Out of a maximum of 30 points, a score of less than 18 is usually judged as definite dementia (clear cognitive dysfunction), and a score of 19 to 23 as suspected dementia (mild cognitive dysfunction). The test is convenient because it is very simple and does not take much time, but other tests must be performed in parallel to obtain accurate information about which function is degraded, so it is not possible to confirm dementia or distinguish the type of dementia with the MMSE alone.
  • The CDR, a clinical rating scale for dementia, stages dementia using six domains (memory, orientation, judgment and problem solving, social activities, home life and hobbies, and hygiene and grooming). For each score, 0 is no dementia, 0.5 is mild cognitive impairment, 1 is mild dementia, 2 is moderate dementia, 3 is severe dementia, 4 is profound dementia, and 5 is terminal dementia.
  • Alzheimer's dementia patients can be classified into mild cognitive impairment, mild dementia (mild AD), moderate dementia, and severe dementia (severe AD).
  • Normal people also experience some degree of memory impairment as they age, but symptoms such as personality changes that are specific to Alzheimer's patients do not appear, which is called mild cognitive impairment (MCI).
  • Mild cognitive impairment is considered a precursor to Alzheimer's disease and is characterized by short-term memory loss, spatial memory loss, and emotional imbalance, which can be classified into several stages.
  • MCI related to memory loss is called amnestic MCI. The probability that a cognitively normal 65-year-old converts to Alzheimer's within a certain period is 1 to 3%, whereas an estimated 8 out of 10 people with amnestic MCI are believed to convert to Alzheimer's; people with amnestic mild cognitive impairment are therefore highly likely to develop Alzheimer's dementia.
  • It is essential that the patient undergoes a medical examination in the early stages, when the symptoms of dementia first appear, so that the patient receives the correct diagnosis and treatment.
  • early signs of dementia may not be obvious, and some common symptoms include progressive and frequent memory loss, confusion, personality changes, aphasia and withdrawal symptoms, and loss of ability to perform routine tasks.
  • some medications can be used to relieve some symptoms of dementia, but no effective treatment methods exist.
  • the ultimate goal of dementia treatment is to reverse and cure the disease itself, and to reduce and eliminate cognitive disorders, mental disorders, and abnormal behavioral symptoms caused by dementia.
  • Currently, many drugs are reported to be usable for treating Alzheimer's disease, but the efficacy of most of them is still being evaluated. Moreover, most existing drugs only slightly slow the progression of Alzheimer's disease or treat the symptoms it causes; no drug has been designed and made to fundamentally treat Alzheimer's disease itself.
  • Early diagnosis is therefore particularly important in the field of dementia. If a simple diagnostic technology that can identify dementia patients early is provided, symptoms can be alleviated and the severity of the disease reduced through rapid and appropriate drug treatment in the early stages. If dementia can be diagnosed early, medical staff, patients, and caregivers can prepare for future problems in advance, and early treatment, with or without drugs, slows the progression of the disease and helps improve the quality of life.
  • The information providing method or diagnostic method may further include the step of obtaining a brain image and identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when reduced cortical thickness and brain atrophy are observed.
  • the brain image may be an MRI brain image, but is not limited thereto.
  • the information providing method or diagnosis method may be characterized in that one or more of tests consisting of a neuropsychological test, a cerebrospinal fluid (CSF) test, and an amyloid-PET test are additionally performed.
  • analysis of SNVs of TOP3B was performed to evaluate the risk of dementia in parallel with detailed medical history and brain imaging evaluation of the subject's symptoms.
  • Alzheimer's dementia diagnosis techniques include neuropsychological tests, MRI brain imaging tests, clinical diagnosis based on expert findings, and pathological tests using cerebrospinal fluid and florbetaben-based amyloid-PET.
  • dementia onset or mild cognitive impairment can be diagnosed, but there are cases where it is difficult to distinguish it from other brain diseases, and it is not suitable for the purpose of early diagnosis because diagnosis is possible only after symptoms begin.
  • The cerebrospinal fluid test is a reliable measure for diagnosing dementia, performed through quantitative analysis of amyloid beta protein and tau protein, but subjects are often very reluctant to undergo it because collecting cerebrospinal fluid is invasive.
  • the reliability is high, but the cost is high.
  • Mutations in amyloid precursor protein, presenilin-1, and presenilin-2 are known to be the primary factors in familial early-onset Alzheimer's disease (Kaj Blennow, et al., (2006) Lancet, 368 (9533), 387-403; John Hardy, Dennis J Selkoe, (2002) Science, 297 (5580), 353-6).
  • The e4 allele of APOE has been shown to be the strongest risk factor for the onset of sporadic Alzheimer's disease.
  • the APOE gene has three polymorphisms, e2, e3 and e4, with frequencies of 8.4%, 77.9%, and 13.7% respectively worldwide.
  • In Alzheimer's disease, the APOE e4 frequency increases dramatically, to about 40% (L A Farrer, et al., (1997) JAMA, 278 (16), 1349-56).
  • APOE is located on the long arm of chromosome 19 (19q13.2), and the three alleles e2, e3, and e4 are distinguished by the amino acids at positions 112 (Cys/Arg) and 158 (Arg/Cys).
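As a minimal illustration of how the residues at positions 112 and 158 determine the allele, the sketch below maps a residue pair to an allele label. The specific combinations (e2: Cys/Cys, e3: Cys/Arg, e4: Arg/Arg) are the commonly cited assignments and are not spelled out in this document; the function name `apoe_allele` is hypothetical.

```python
# Residues at positions (112, 158) for each APOE allele.
# Assumption: commonly cited assignments, not taken from this document.
APOE_ALLELES = {
    ("Cys", "Cys"): "e2",
    ("Cys", "Arg"): "e3",
    ("Arg", "Arg"): "e4",
}


def apoe_allele(residue_112: str, residue_158: str) -> str:
    """Return the APOE allele label for a residue pair, or 'unknown'."""
    return APOE_ALLELES.get((residue_112, residue_158), "unknown")


if __name__ == "__main__":
    for pair in [("Cys", "Cys"), ("Cys", "Arg"), ("Arg", "Arg"), ("Arg", "Cys")]:
        print(pair, "->", apoe_allele(*pair))
```

The lookup returns 'unknown' for any pair outside the three common alleles rather than guessing.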
  • APOE e4 is usually found in more than 50% of Alzheimer's disease patients, but less than 15% in the control group with normal cognitive ability.
  • Korean Patent Registration No. 10-1335021 describes the relationship between T/G heterozygosity of APOE rs405509 with patients with Alzheimer's disease or mild cognitive impairment.
  • Korean Patent Registration No. 10-1250464 confirmed that a single nucleotide variation located in the APOE promoter can explain the differences in risk between races among APOE E4/E4 homozygotes, and that cortical thickness differs according to the allele.
  • The TOP3B (DNA topoisomerase III beta) gene refers to a gene encoding a DNA topoisomerase, an enzyme that regulates and changes the topological state of DNA during transcription.
  • The DNA topoisomerase temporarily cuts and rejoins single strands of DNA so that the strands can pass through each other, relaxing supercoils and changing the topology of the DNA.
  • This enzyme interacts with the DNA helicase SGS1 and plays an important role in DNA recombination, cellular senescence, and maintenance of genomic stability. Alternative splicing of the C-terminus of this gene results in three transcript variants with distinct tissue specificity.
  • Eukaryotic topoisomerase III was first identified in budding yeast through a mutation that causes hyper-recombination between repeated sequences, and the mammalian DNA topoisomerase III gene was subsequently cloned. Unlike in yeast, DNA topoisomerase III was found to have diverged into alpha and beta isozymes in higher organisms.
  • the TOP3B gene has a total length of 25,823 nt and is present in human chromosome 22.
  • the location of the TOP3B gene in the chromosome is 22,311,397-22,337,219 based on GRCh37.p13 (Genome Reference Consortium Human Build 37 patch release 13).
  • The TOP3B gene, as used herein, includes all of the exons, introns, and 5' and 3' untranslated regions (UTRs) present on the genome.
  • the TOP3B gene may be characterized by including a nucleotide sequence represented by SEQ ID NO: 47.
  • The information providing method or diagnostic method for diagnosing or predicting dementia detects SNVs (Single Nucleotide Variants) present in the TOP3B gene, and selects a group at high risk of developing dementia based on their number and location.
  • The information providing method or diagnosis method may further include the step of identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when the number of SNVs in the TOP3B gene is 3 or more.
  • It may also be characterized by further comprising a step of identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when the number of SNVs in the TOP3B gene is 4 or more, but is not limited thereto.
  • an analysis program applying the R package pROC was developed to perform ROC (Receiver Operating Characteristics) analysis, and a specificity of 0.6061 and a sensitivity of 0.9245 were confirmed based on the number of SNVs 2.5 in the TOP3B gene (Fig. 3 ). Accordingly, two or less cases can be selected as a normal person, and if three or more, a dementia patient or a high-risk group for dementia can be selected.
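  • The cutoff rule described above can be sketched as follows. This is a minimal illustration in Python, not the R/pROC program used in the source; the function name and the returned labels are hypothetical.

```python
def classify_by_snv_count(snv_count: int, cutoff: float = 2.5) -> str:
    """Apply the SNV-count cutoff: more than 2.5 SNVs (i.e. 3 or more) is high risk.

    The cutoff of 2.5 comes from the text; the label strings are illustrative only.
    """
    if snv_count < 0:
        raise ValueError("snv_count must be non-negative")
    if snv_count > cutoff:
        return "dementia patient or high-risk group"
    return "normal"


# Risk group for subjects carrying 0 to 4 TOP3B SNVs.
groups = {n: classify_by_snv_count(n) for n in range(5)}
for n, group in groups.items():
    print(n, group)
```

  • Because SNV counts are integers, a cutoff of 2.5 is equivalent to "three or more"; the alternative threshold of four or more corresponds to `cutoff=3.5`.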
  • ROC Receiver Operating Characteristics
  • sensitivity = true positive / (true positive + false negative)
  • specificity = true negative / (true negative + false positive)
  • the y-axis is a true positive rate.
  • sensitivity plays a more important role than specificity, because low sensitivity means false negatives, that is, disease risk groups are not predicted as risk groups. Therefore, it is suggested that the information providing method or diagnostic method of the present invention exhibiting a sensitivity of 0.9245, which is very close to 1, exhibits very high accuracy in diagnosing a disease.
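  • As a worked example of these two measures, the sketch below computes sensitivity and specificity from confusion-matrix counts using the standard definitions. The counts are illustrative assumptions chosen only because their ratios reproduce the reported values of 0.9245 and 0.6061; the actual cohort counts are not stated in this passage.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts (assumption, not taken from the source).
tp, fn = 49, 4    # patients with >= 3 SNVs / patients with <= 2 SNVs
tn, fp = 20, 13   # controls with <= 2 SNVs / controls with >= 3 SNVs

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.4f} specificity={spec:.4f}")
```

  • A low sensitivity corresponds to a large false negative count, which is why the text treats sensitivity as the more important of the two measures for screening.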
  • the information providing method or diagnosis method may include, when an SNV is detected in the TOP3B gene at a position selected from the group consisting of 22,311,659; 22,311,776; 22,312,061; 22,312,502; 22,312,378; 22,312,589; 22,312,970; 22,313,743; 22,318,365; 22,312,555; 22,312,531; 22,316,792; 22,311,882; 22,313,733; 22,311,516; 22,312,292; 22,313,669; 22,312,383; 22,330,107; 22,312,568; 22,312,476; 22,318,671; 22,312,668; 22,312,790; 22,318,538; 22,312,484; 22,312,351; 22,312,350; 22,312,315; 22,313,829; and 22,330,082, based on GRCh37.p13 (Genome Reference Consortium Human Build 37 patch release 13), determining as a dementia patient or a high-
  • the positions refer to nucleotide positions 263, 380, 665, 1106, 982, 1193, 1574, 2347, 6969, 1159, 1135, 5396, 486, 2337, 120, 896, 2273, 987, 18711, 1172, 1080, 7275, 1272, 1394, 7142, 1088, 955, 954, 919, 2433, and 18686, respectively, in the nucleotide sequence represented by SEQ ID NO: 47.
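  • The relationship between the GRCh37.p13 chromosomal coordinates and the positions in SEQ ID NO: 47 is a fixed offset, since the gene spans 22,311,397-22,337,219. The following sketch shows the conversion; the function and constant names are hypothetical, and it assumes SEQ ID NO: 47 is numbered from 1 starting at the first base of the gene.

```python
GENE_START_GRCH37 = 22_311_397  # first base of TOP3B on chromosome 22 (from the text)
GENE_END_GRCH37 = 22_337_219    # last base of TOP3B on chromosome 22 (from the text)
GENE_LENGTH = GENE_END_GRCH37 - GENE_START_GRCH37 + 1


def to_seq_position(chrom_pos: int) -> int:
    """Convert a GRCh37 chromosome 22 coordinate to a 1-based position in the gene sequence."""
    if not GENE_START_GRCH37 <= chrom_pos <= GENE_END_GRCH37:
        raise ValueError(f"{chrom_pos} lies outside the TOP3B gene interval")
    return chrom_pos - GENE_START_GRCH37 + 1


def to_chrom_position(seq_pos: int) -> int:
    """Inverse conversion: 1-based gene position back to the GRCh37 coordinate."""
    if not 1 <= seq_pos <= GENE_LENGTH:
        raise ValueError(f"{seq_pos} lies outside the gene sequence")
    return seq_pos + GENE_START_GRCH37 - 1


print(GENE_LENGTH)                  # total gene length in nt
print(to_seq_position(22_311_659))  # first listed SNV position
```

  • With this offset the interval length evaluates to 25,823 nt, matching the stated gene length, and the first listed coordinate 22,311,659 maps to position 263.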
  • the SNV of the TOP3B gene is selected from the SNV of Table 1, but is not limited thereto.
  • SNVs of other related genes may be additionally added or excluded in order to select a dementia patient group with higher accuracy.
  • SNV detection of other genes such as APOE, SORL1, CLU, PICALM, CR1, and BIN1 may be additionally performed in addition to the TOP3B gene according to the present invention.
  • Single Nucleotide Variant refers to a mutation in the genome in which a single nucleotide differs, and includes single nucleotide polymorphisms (SNPs) and point mutations. The frequency is not limited, and SNVs can occur in somatic cells.
  • a single nucleotide alteration in a somatic cell eg due to cancer
  • Single nucleotide polymorphism means that a specific nucleotide at the same location in the genome differs among individuals and is expressed as a different trait; it is the most common type of genetic variation in the human genome.
  • Single nucleotide polymorphisms generally occur at a frequency of more than 1% in the population, and variants with a frequency of less than 1% are classified as mutations. Point mutations arise when one nucleotide is substituted, inserted, or deleted, and can prevent or modify the production of a specific protein.
  • Single base mutations are classified according to their location and function in the genome. In addition, they are classified into synonymous SNV (sSNV), which does not cause a change in the amino acid sequence, and nonsynonymous SNV (nsSNV), which does cause a change in the amino acid sequence.
  • sSNV synonymous SNV
  • nsSNV nonsynonymous SNV
  • SNV which is caused by substitution or insertion-deletion (indel) of one nucleotide sequence
  • indel insertion-deletion
  • a case where the amino acid sequence does not change after a single base mutation occurs due to the substitution of one base sequence is called synonymous SNV or silent SNV.
  • for example, when the DNA sequence GAC is changed to GAG (C is replaced by G),
  • the codon of the mRNA changes from CUG to CUC, but the codons before and after the substitution both encode leucine.
  • a case where the amino acid sequence is changed after a single base mutation occurs due to the substitution of one nucleotide is called nonsynonymous SNV.
  • for example, when the GAU codon is changed to GUU (A is replaced with U), aspartic acid is changed to valine. Since the two amino acids have very different chemical properties, the structure and function of the resulting protein can be greatly affected. An SNV that causes a different amino acid to be encoded in this way is further classified as a missense SNV. An SNV that produces a stop codon rather than a different amino acid, so that a protein shorter than normal is produced, is called a nonsense SNV.
  • a representative example of a missense SNV is sickle cell anemia.
  • the sixth codon of β-hemoglobin is changed from GAG to GTG, so that the acidic amino acid glutamic acid is replaced by the non-polar amino acid valine.
  • as a result, the oxygen-carrying capacity of hemoglobin is weakened, causing anemia, and pain and tissue damage can result.
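  • The distinction between synonymous, missense, and nonsense substitutions can be illustrated with the standard genetic code. The sketch below is a minimal example; the function name is hypothetical, and codons are written as coding-strand DNA (T in place of U), so the mRNA change CUG to CUC appears as CTG to CTC.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution within one codon."""
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 nt long and differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("CTG", "CTC"))  # leucine codon example from the text
print(classify_substitution("GAG", "GTG"))  # sickle cell substitution
print(classify_substitution("TAC", "TAA"))  # tyrosine codon changed to a stop codon
```

  • This sketch only handles substitutions inside a coding codon; non-coding SNVs and indels need separate handling.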
  • indels may cause more severe mutations than substitutions.
  • with an indel, a frame shift of the amino acid sequence is induced, and the amino acids translated after the SNV are changed.
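  • To illustrate why an indel can be more disruptive than a substitution, the sketch below translates a short coding sequence before and after a one-base insertion and a one-base substitution. The sequence and names are hypothetical examples, not taken from TOP3B.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


reference = "ATGGAGCTGACCTAA"                      # hypothetical coding sequence
substituted = reference[:4] + "T" + reference[5:]  # GAG -> GTG in the second codon
inserted = reference[:3] + "A" + reference[3:]     # one base inserted after the first codon

print(translate(reference))
print(translate(substituted))
print(translate(inserted))
```

  • The substitution changes a single residue, whereas the insertion shifts the reading frame so that every codon after the insertion point is read differently and the original stop codon is no longer in frame.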
  • coding SNV coding SNV
  • UTR the ratio of intron, 5'and 3'untranslated regions
  • ncSNV non-coding SNV
  • nsSNV can affect protein folding, binding affinity, expression, post-translational modification, and other protein features, and accounts for more than 85% of the sequence variations in known genetic diseases, so it receives the most attention among single base variants. However, it has been reported that synonymous SNV is also associated with specific diseases (Siyuan Zheng, et al., (2014) Cell, 156(6), 1129-1131), and untranslated RNA or promoter generated by ncSNV is transcribed.
  • the SNP of the APOE gene and promoter is a genetic mutation that appears in Alzheimer's disease.
  • APOE is a lipid-binding protein having three isoforms, APOE ε2, APOE ε3, and APOE ε4, and the APOE gene has three polymorphisms, e2, e3 and e4, with worldwide frequencies of 8.4%, 77.9%, and 13.7%, respectively.
  • Of these, the e4 allele is the strongest risk factor for the onset of sporadic Alzheimer's disease.
  • APOE is located on the long arm of chromosome 19 (19q13.2), and its genotype is formed by a combination of the three alleles e2, e3 and e4, which differ in the amino acids at positions 112 (Cys/Arg) and 158 (Arg/Cys).
  • APOE e4 was found to be involved in the conversion of healthy elderly people and people with mild cognitive impairment to Alzheimer's disease. However, predictions based on APOE e4 can explain only up to 20% of the genetic impact on dementia. As such, the e4-mediated risk and possible triggers for Alzheimer's disease remain unclear.
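  • The way the three alleles are defined by the residues at positions 112 and 158 can be written as a small lookup. The sketch below uses the commonly cited assignments (e2: Cys/Cys, e3: Cys/Arg, e4: Arg/Arg); the function names are hypothetical.

```python
# (residue at position 112, residue at position 158) -> APOE allele
APOE_ALLELES = {
    ("Cys", "Cys"): "e2",
    ("Cys", "Arg"): "e3",
    ("Arg", "Arg"): "e4",
}


def apoe_allele(residue_112: str, residue_158: str) -> str:
    """Return the APOE allele for one haplotype, or 'unknown' for other combinations."""
    return APOE_ALLELES.get((residue_112, residue_158), "unknown")


def apoe_genotype(haplotype_1: tuple, haplotype_2: tuple) -> str:
    """Combine the two haplotypes of an individual into a genotype such as 'e3/e4'."""
    alleles = sorted([apoe_allele(*haplotype_1), apoe_allele(*haplotype_2)])
    return "/".join(alleles)


print(apoe_genotype(("Cys", "Arg"), ("Arg", "Arg")))
```

  • Three alleles give six possible genotypes (e2/e2, e2/e3, e2/e4, e3/e3, e3/e4, e4/e4).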
  • Korean Patent Registration No. 10-1933847 describes not only the APOE E4 gene mutation, but also the genetic mutations that may affect the risk of dementia of APOE E4 in the rs405509 T allele of the APOE promoter surrounding the APOE gene.
  • the SNP is located on chromosome 19 44905579 based on the human genetic map GRCh38.p7 version.
  • the -491AA genotype and -219TT genotype have been reported to increase AD risk independently of the APOE epsilon genotype (Anna Limon-Sztencel, et al., (2016) Alzheimers Res Ther, 8 (1), 19).
  • rs405509 has a synergistic effect with APOE e4 in the influence on cognitive ability (C Ma, et al., (2016) Eur J Neurol, 23 (9), 1415-25).
  • APOE apolipoprotein E
  • the APP gene codes for 770 amino acids and is located on human chromosome 21q21.1. According to the amyloid cascade theory, an abnormality in APP metabolism related to the production of β-amyloid causes amyloid deposition in brain tissue and cerebral blood vessels, which is thought to play an important role in the onset of Alzheimer's disease (DJ Selkoe, (1991) Neuron, 6 (4), 487-98). APP is degraded by three types of proteolytic enzymes (α-, β-, and γ-secretase), and Aβ40, a nonamyloidogenic product, dissolves easily, whereas Aβ42, an amyloidogenic product, tends to cause β-amyloid deposition.
  • Presenilin 1 (PS1) and Presenilin 2 (PS2) genes have also been reported to be associated with Alzheimer's.
  • the gene of PS1 is located on chromosome 14q24.3, and PS2 is on chromosome 1, 1q31-q42.
  • Presenilin (PS) is present in the nuclear membrane, endoplasmic reticulum, and Golgi, and has eight transmembrane domains.
  • PS1 and PS2 share 67% of their amino acid sequence overall, and 84% in the transmembrane domains.
  • 142 mutations in PS1 and 10 mutations in PS2 have been discovered, and these mutations are known to affect APP metabolism, thereby increasing the production of Aβ42.
  • the exact role of PS in the metabolic process of APP is not yet known, but PS is a constituent of the protein complex related to the enzyme activity of γ-secretase, and it is presumed that genetic mutation causes structural modification of the complex, resulting in abnormal interactions between proteins.
  • APP, PS1, and PS2 gene mutations, which cause early-onset dementia, account for less than 5% of the onset of Alzheimer's disease.
  • polymorphism of the APOE allele rather than genetic mutation has recently attracted attention as a genetic factor.
  • APOE is a protein involved in cholesterol transport, and its gene is located on chromosome 19. There are three types of alleles, ε2, ε3, and ε4.
  • Carriers of APOE ε4, whether homozygous or heterozygous, have a 90% probability of developing AD by 85 years of age, and there are reports that they can develop AD 10 years earlier than those with the ε2 or ε3 alleles. The ε4 allele, which is present in about 15% of the general population, has been found to be related to about 50% of the genetic risk factors for AD. In addition, ε4 was found to be more strongly correlated than ε2 or ε3 in immunoreactivity studies between APOE and the senile plaques or neurofibrillary tangles present in the brain tissue and cerebral blood vessels of AD patients. In addition, there are studies showing that APOE reduces the ⁇ -secretase action on APP (J Poirier, (1994) Trends Neurosci, 17 (12), 525-30). APOE genes should be understood as susceptibility genes rather than determinants.
  • growth factor receptor-bound protein-associated binding protein 2 (GAB2) modifies the risk of LOAD in the APOE epsilon 4 carrier and influences the neuropathology of Alzheimer's disease.
  • a meta-analysis was performed to more accurately assess whether the phosphatidyl inositol binding clathrin assembly protein (PICALM) and sortilin-related receptor (SORL1) mutants were related to AD. According to this, in PICALM, the allele T of rs3851179 was associated with a 13% increased risk of AD. In addition, 7 SNPs of SORL1 were significantly related to AD.
  • Single Nucleotide Polymorphism refers to when a single base (A, T, C or G) in the genome differs between members of a species or between individual paired chromosomes. It refers to the diversity of DNA sequences. For example, DNA fragments from different individuals (eg, TGTG[G/T]AAAG, where G/T is a complementary base), including differences in a single base, are called two alleles (G or T). In general, almost all SNPs have two alleles. Within a population, SNPs can be assigned a minor allele frequency (MAF; the lowest allele frequency at a locus found in a particular population).
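  • The minor allele frequency defined above can be computed directly from genotype calls. The sketch below is a minimal example for a single locus; the genotype list is hypothetical, and each genotype is written as a two-letter string of alleles.

```python
from collections import Counter


def minor_allele_frequency(genotypes: list) -> float:
    """Return the lowest allele frequency at one locus.

    Each genotype is a string of two allele letters, for example 'GT'.
    A locus with a single observed allele has a minor allele frequency of 0.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


# Hypothetical genotypes of four individuals at a G/T SNP.
sample = ["GG", "GT", "TT", "GG"]
print(minor_allele_frequency(sample))
```

  • In this sample there are five G alleles and three T alleles out of eight, so T is the minor allele.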
  • MAF minor allele frequency
  • Single bases can be changed (replaced), removed (deleted) or added (inserted) in the polynucleotide sequence. An SNP may cause a frame shift.
  • SNPs can be classified into several types in terms of their location and function in the genome. When classified according to location in the genome, regulatory SNP (rSNP) refers to an SNP that regulates gene expression by being located in the promoter region of a gene. In addition, an SNP may belong to a coding sequence of a gene, a non-coding region of a gene, or an intergenic region (a region between genes). Coding SNP (cSNP) refers to an SNP present in an exon region encoding the gene, intron SNP (iSNP) refers to an SNP located in an intron, and genomic SNP (gSNP) refers to an SNP that exists in the intergenic region between genes.
  • rSNP regulatory SNP
  • cSNP Coding SNP
  • iSNP intron SNP
  • gSNP genomic SNP
  • rSNP, which is located in front of the exon and can control expression, and cSNP, which is directly involved in changes of gene function, are very likely to be functional SNPs that can cause phenotypic changes. This is because changes in rSNP and cSNP are highly likely to cause a change in the functional amino acid sequence. However, due to codon degeneracy, an SNP in the coding sequence of a gene does not necessarily cause a change in the amino acid sequence of the target protein.
  • SNPs in non-coding regions may be associated with a higher cancer risk and may affect mRNA structure and disease susceptibility.
  • Non-coding SNPs can also alter the level of expression of a gene, such as expression quantitative trait locus (eQTL).
  • eQTL expression quantitative trait locus
  • synonymous substitutions do not change the amino acids in a protein, but can still affect its function in other ways. For example, a silent mutation in multiple drug resistant gene 1 (MDR1), which encodes a cell membrane pump that expels drugs from the cell, may slow the rate of translation and allow the peptide chain to fold into an abnormal shape.
  • MDR1 multiple drug resistant gene 1
  • the C1236T polymorphism in the MDR1 gene changes the GGC codon to GGT at amino acid position 412 of the polypeptide (both encode glycine) (G Gumus-Akay, et al., (2008) Genet Mol Res, 7 (4), 1193-9), and the C3435T polymorphism changes ATC to ATT at position 1145 (both encode isoleucine) (Ji Woong Sohn, et al., Tuberc Respir Dis 2005; 58:135-141).
  • at the protein level, arginine is replaced by leucine at position 527, and at the phenotypic level this appears as overlapping mandibuloacral dysplasia and progeria syndromes.
  • SNP genotyping measures the genetic variation of single nucleotide polymorphism (SNP) between members of a species. This is a form of genotyping that measures more common genetic variations. SNP has been found to be involved in the etiology of many human diseases, and is of particular interest in pharmacological genetics. Since SNPs are conserved during evolution, they have been proposed as markers for use in research instead of quantitative trait loci (QTL) analysis and microsatellites. The use of SNPs is expanding in the HapMap project, which aims to provide the minimal set of SNPs required for genotyping of the human genome. SNPs can also provide genetic fingerprints for use in identification (Harbron S; Rapley R (2004). Molecular analysis and genome discovery. London: John Wiley & Sons Ltd.). The increasing interest in SNP was reflected by the development of various SNP genotyping methods.
  • Various methods are used for SNP analysis. Most of the methods developed and used so far are based on PCR and mainly analyze a limited number of SNPs across several samples, but methods that analyze a large number of SNPs simultaneously using a DNA array or high-precision analysis equipment such as MALDI-TOF are also widely used. There are four principles of SNP genotyping, namely Allele-Specific Hybridization, Primer Extension, Allele-Specific Oligonucleotide Ligation, and Cleavage, depending on the difference between sample preparation and retrieval methods (Genetic Variation and Disease).
  • the major SNP analysis methods based on PCR are SSCP (Single Strand Conformation Polymorphism), AFLP (Amplified Fragment Length Polymorphism), RFLP (Restriction Fragment Length Polymorphism), RAPD (Random Amplified Polymorphic DNA), AS-PCR (Allele-Specific PCR), and the like.
  • SSCP single-strand conformation polymorphism or single-strand chain polymorphism
  • SSCP is a method commonly used for SNP genotyping, and is defined as the conformational difference between single-stranded nucleotide sequences of the same length induced by sequence differences under specific experimental conditions. This property makes it possible to distinguish sequences by gel electrophoresis, which separates fragments according to their conformation (M Orita, et al., (1989) Proc Natl Acad Sci USA, 86 (8), 2766-70).
  • the double-stranded DNA is denatured under high temperature conditions (94°C) to form single strands and then quickly cooled so that each strand forms a unique three-dimensional structure.
  • each single strand with a difference in sequence has a different mobility. Even if the lengths are the same, strands with different base sequences migrate differently, so the variation can be confirmed by comparing the migration speed between samples.
  • AFLP amplified fragment length polymorphism
  • AFLP has many advantages compared to other marker technologies such as randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP) and microsatellites. AFLP not only has higher reproducibility, resolution and sensitivity at the whole genome level compared to other technologies (UG Mueller, LL Wolfenbarger, (1999) Trends Ecol Evol, 14 (10), 389-394), but also has the ability to amplify 50 to 100 fragments at a time.
  • RAPD randomly amplified polymorphic DNA
  • RFLP restriction fragment length polymorphism
  • Restriction fragment length polymorphism is a method of typing SNPs by checking the difference in length of DNA fragments after treatment with a restriction endonuclease. It is used when the SNP site present on the DNA fragment amplified through PCR can be distinguished by a specific restriction enzyme. The sequence of the restriction site for a specific restriction enzyme is changed by the SNP of the amplified fragment, resulting in a difference in fragment length between the two SNP alleles, which can be easily identified on an agarose gel. Many types of restriction enzymes are commercially available, and software that finds the recognition sites acting on a desired sequence is provided free of charge on the web, so the method can be easily used. However, 30-40% of SNPs do not have a restriction site, and to solve this, a restriction site that does not otherwise exist is sometimes created by changing 1 to 2 bp in the primer and used for typing (primer mutagenesis).
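  • The principle of RFLP typing can be sketched with a simple in-silico digest. The example below uses the EcoRI recognition site GAATTC (cut after the first G) purely as an illustration; the two amplicon sequences are hypothetical, and real digests act on double-stranded DNA, which this single-strand sketch ignores.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return the fragment lengths produced by cutting at every occurrence of the site.

    cut_offset is the number of bases into the site after which the cut is made.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical PCR amplicons differing by one base inside the recognition site.
allele_with_site = "ACGTACGTACGT" + "GAATTC" + "TTGCATTGCA"
allele_without_site = "ACGTACGTACGT" + "GACTTC" + "TTGCATTGCA"

print(digest(allele_with_site))     # two fragments
print(digest(allele_without_site))  # one uncut fragment
```

  • A heterozygous sample would show both patterns on the gel: the uncut band plus the two shorter bands.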
  • RAPD Random Amplified Polymorphic DNA
  • An arbitrary short primer (8-12bp) is used to amplify only the matched region by the complementary nucleotide sequence. This method is very simple because you only need to investigate the pattern of DNA fragments appearing on the agarose gel. However, very small primer fragments can be amplified as long as they have approximately 70% homology to DNA, and thus require extremely careful experimental conditions. In order to overcome this shortcoming, if the terminal sequence of the amplified site is analyzed and then resynthesized with a specific primer, there is no problem in reproducibility, so it is a method that can be sufficiently used for association analysis.
  • AS-PCR allele-specific polymerase chain reaction
  • AS-PCR is an application of PCR that can directly detect any point mutation in DNA by analyzing the PCR product on an agarose or polyacrylamide gel stained with ethidium bromide (Luis Ugozzoli, R. Bruce Wallace, Allele-specific polymerase chain reaction, Methods, Volume 2, Issue 1, February 1991, Pages 42-48). It is based on the fact that the 3' end of the primer must be complementary to the DNA template in PCR amplification. If there are SNP alleles of A (adenine) and C (cytosine), and a primer whose 3' end is A and a primer whose 3' end is C are prepared and used for amplification, only the DNA that is complementary to each primer is amplified, so SNP typing becomes possible.
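  • The 3'-end rule behind AS-PCR can be sketched as a string comparison. In this simplified model the allele sequence is written on the same strand as the primer (the primer anneals to its complement), a primer is assumed to extend only when it matches the allele exactly up to and including the SNP base at its 3' end, and all sequences and names are hypothetical.

```python
def would_amplify(primer: str, allele: str, snp_index: int) -> bool:
    """Return True if the primer, with its 3' end on the SNP, matches the allele."""
    start = snp_index - len(primer) + 1
    if start < 0:
        raise ValueError("primer is longer than the sequence upstream of the SNP")
    target = allele[start:snp_index + 1]
    # The 3' terminal base must match for the polymerase to extend the primer.
    return primer[-1] == target[-1] and primer[:-1] == target[:-1]


def genotype(sample_alleles: list, primers: dict, snp_index: int) -> list:
    """List the SNP bases whose allele-specific primer yields a product."""
    return sorted(
        base
        for base, primer in primers.items()
        if any(would_amplify(primer, allele, snp_index) for allele in sample_alleles)
    )


flank = "ACGTTGCAAGCTTGACC"            # hypothetical sequence upstream of the SNP
allele_a = flank + "A" + "GGTCA"
allele_c = flank + "C" + "GGTCA"
snp_index = len(flank)                 # 0-based index of the SNP base
primers = {"A": flank[-11:] + "A", "C": flank[-11:] + "C"}

print(genotype([allele_a, allele_c], primers, snp_index))  # heterozygous sample
print(genotype([allele_a, allele_a], primers, snp_index))  # homozygous sample
```

  • In practice, 3'-mismatched primers can still extend at low efficiency, so real assays rely on stringent reaction conditions; this sketch treats the rule as absolute.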
  • 132 candidate genes related to neurological genetic diseases including TOP3B were selected and target sequencing was performed.
  • TruSeq Nano DNA Library Prep Kits of Illumina (San Diego, CA, USA) were used, and xGen Lockdown Probes of IDT (Coralville, IA, USA) were used for targeted enrichment of the 132 genes.
  • the NGS analysis of the post-enriched library was performed using NextSeq 550 of Illumina.
  • the step of detecting the SNV of the TOP3B gene may be characterized by amplifying the gene and analyzing the gene mutation using sequencing data of the amplified product, but is not limited thereto.
  • the sequencing may be Sanger sequencing or next generation sequencing (NGS).
  • NGS next generation sequencing
  • in the step of detecting the SNV of the TOP3B gene, a primer for the TOP3B gene may be used, or one or more pairs of primer sets may be used.
  • the primer may be used without limitation as long as it is a sequence capable of amplifying the TOP3B gene.
  • it may be a primer set capable of amplifying any one or more of the SNVs of the TOP3B gene described in Table 1 above, but is not limited thereto.
  • in the step of detecting the SNV of the TOP3B gene, a probe for the TOP3B gene may be used, and the probe may be a probe that complementarily binds to a region containing an SNV position of the TOP3B gene described in Table 1 above, but is not limited thereto.
  • a reporter may be attached to the 5'end of the probe, and another fluorescent material indicating fluorescence may be attached, but the present invention is not limited thereto.
  • the reporter may be one or more selected from the group consisting of FAM, JOE, BHQ1, VIC, TAMRA, ROX, NED, HEX, TET, fluorescein, fluorescein chlorotriazinyl, rhodamine green, rhodamine red, tetramethylrhodamine, FITC, Oregon green, Alexa Fluor, Texas Red, cyanine-based dyes, and thiadicarbocyanine dyes.
  • a black hole quencher-1 BHQ-1
  • the quencher may be one or more selected from the group consisting of Dabcyl, TAMRA, Eclipse, DDQ, QSY, Blackberry Quencher, Qxl, Iowa black FQ, Iowa black RQ and IRDye QC-1.
  • a primer set capable of specifically amplifying the SNV position described in Table 1 and a probe that complementarily binds to the region containing the SNV position of the TOP3B gene described in Table 1 above is in the technical field to which the present invention pertains.
  • the primer set and probe can be used for polymerase chain reaction (PCR), and more preferably for real-time PCR.
  • the step of detecting the SNV of the TOP3B gene may be performed by polymerase chain reaction, nucleic acid digestion, hybridization, Southern blotting, restriction enzyme fragment polymorphism, primer extension, single stranded conformation polymorphism, or a combination of these methods, but is not limited thereto, and the analysis can be performed using a combination of known molecular biology methods.
  • the "multiplex PCR” means that two or more sets of primers used in PCR are used in one amplification reaction.
  • the biological sample may be a nucleic acid sample isolated from a sample selected from the group consisting of blood, hair, saliva, urine, semen, vaginal cells, oral cells, amniotic fluid containing placental cells or fetal cells, and mixtures thereof.
  • the nucleic acid may be genomic DNA, cell free DNA (cfDNA), RNA, or micro RNA, but is not limited thereto.
  • the nucleic acid can be obtained through a conventional method known in the art.
  • the tissue is treated with a DNA lysis buffer (e.g., tris-HCl, EDTA, EGTA, SDS, deoxycholate, and Triton X and/or NP-40) to separate DNA.
  • the genomic DNA fragment comprises the steps of: (i) isolating cellular DNA from the test sample; And (ii) fragmenting the cellular DNA to obtain the genomic DNA fragment.
  • amplification refers to a reaction to amplify a nucleic acid molecule.
  • Various amplification reactions have been reported in the art, which are polymerase chain reaction (PCR) (US 4,683,195, 4,683,202, and 4,800,159), reverse transcription-polymerase chain reaction (RT-PCR) (Sambrook et al., Molecular Cloning.A Laboratory Manual, 3rd ed.Cold Spring Harbor Press (2001); WO 89/06700; and EP 329,822, ligase chain reaction (LCR) Gap-LCR (WO 90/01069), repair chain reaction ( repair chain reaction; EP 439,182), transcription-mediated amplification (TMA, WO 88/10315), self sustained sequence replication (WO 90/06995), selective amplification of target polynucleotide sequences ( selective amplification of target polynucleotide sequences, US Patent 6,410,276), consensus sequence primed polymerase chain reaction (CP-
  • PCR is the most well-known nucleic acid amplification method, and its many modifications and applications have been developed. For example, touchdown PCR, hot start PCR, nested PCR and booster PCR have been developed by modifying traditional PCR procedures to enhance the specificity or sensitivity of PCR.
  • real-time PCR, differential display PCR (DD-PCR), rapid amplification of cDNA ends (RACE), multiplex PCR, inverse polymerase chain reaction (IPCR), vectorette PCR and TAIL-PCR (thermal asymmetric interlaced PCR) have been developed for specific applications.
  • DD-PCR differential display PCR
  • RACE rapid amplification of cDNA ends
  • IPCR inverse polymerase chain reaction
  • TAIL-PCR thermal asymmetric interlaced PCR
  • DNA sequencing refers to analyzing the sequence of nucleobases of nucleotides constituting DNA.
  • DNA consists of a double helix structure, and each single strand has a 5′-end and a 3′-end.
  • DNA is synthesized from the 5′-end to the 3′-end by DNA polymerase. Attempts to determine DNA sequences using these characteristics have long been made, and DNA sequencing was made possible by two methods developed almost simultaneously in 1977.
  • the first is the Sanger method, which analyzes the base sequence through DNA chain termination using dideoxynucleotide triphosphates (ddNTPs), and the other is the Maxam-Gilbert method, which analyzes the sequence by cutting specific base sites in the DNA using chemical agents.
  • ddNTP dideoxynucleotide triphosphate
  • In the dideoxynucleotide (dd-nucleotide), the -OH group at the 3′ position of the deoxyribose of the normal nucleotide is replaced with -H. During normal DNA synthesis, ddNTPs can also be incorporated into the DNA chain. However, after incorporation into the DNA chain, since ddNTPs have no -OH at the 3′ position, the next nucleotide can no longer bind and the elongation reaction is terminated.
  • Each test tube commonly contains dNTP (dATP, dTTP, dGTP, dCTP), which is a component of DNA.
  • dNTP dATP, dTTP, dGTP, dCTP
  • Each test tube contains a different ddNTP chain terminator, so one test tube contains ddATP, the next test tube ddTTP, the next test tube ddGTP, and the next test tube ddCTP.
  • dNTPs or primers should be labeled with radioactivity (32P). For example, since ddGTP randomly enters the G position, ddGTP can theoretically fit into any G position.
  • since each DNA chain synthesized in this reaction ends at a G position, the locations of G can be determined from the lengths of the synthesized chains.
  • in test tube A, the polymerization of the chain can end at all A positions, in test tube T at all T positions, and in test tube C at all C positions, so a series of DNAs of different lengths is produced in each test tube.
  • the DNA is denatured in each test tube so that various newly synthesized strands come off the template.
  • After electrophoresis in a separate lane for each base-specific reaction tube (A, T, G, C), the DNA fragments separated according to length are observed by autoradiography.
  • the DNA sequence can be determined by reading the band, which is a fragment of DNA that has moved according to its position in each of the adjacent lanes A, C, G, and T.
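  • The four-tube chain-termination procedure described above can be simulated in a few lines. The sketch below is idealized: it assumes termination occurs at every position of the matching base, measures fragment length from the first newly synthesized base (ignoring the primer), and uses hypothetical names and an arbitrary example sequence.

```python
def chain_termination_lengths(new_strand: str) -> dict:
    """For each ddNTP tube, list the lengths of fragments terminated at that base."""
    tubes = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes


def read_gel(tubes: dict) -> str:
    """Read the sequence by visiting bands from shortest to longest across the four lanes."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


synthesized = "GATTACAG"  # hypothetical newly synthesized strand, written 5' to 3'
tubes = chain_termination_lengths(synthesized)
print(tubes["G"])       # fragment lengths seen in the ddGTP lane
print(read_gel(tubes))  # sequence reconstructed from the four lanes
```

  • Shorter fragments migrate farther in the gel, so reading the bands from the bottom of the gel upward corresponds to reading the synthesized strand from 5' to 3'.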
  • Illumina's next-generation sequencing method which is currently the most widely used, extracts DNA from a sample and then mechanically fragments it, then creates a library having a specific size and uses it for sequencing. Using large-capacity sequencing equipment, it repeats the binding and separation of four types of complementary nucleotides in one base unit to produce initial sequencing data, and afterwards, processing, mapping, and genome mutations of the initial data. It is accomplished by performing the analysis steps using bioinformatics, such as identification of variance and analysis of mutation information.
  • next-generation sequencing method is contributing to the creation of new added value through the development of innovative therapeutic agents and industrialization by discovering genomic mutations that have a high possibility or affect diseases and various biological phenotypes.
  • the next-generation sequencing method can be applied not only to DNA but also to RNA and methylation analysis, and whole-exome sequencing (WES), which captures and sequences only the exome region encoding proteins, is also possible.
  • WES whole-exome sequencing
  • library preparation is a process of preparing a library required for sequence analysis by conjugating an adapter in the direction of 5'to 3'from random DNA or cDNA fragments of a sample.
  • Initial NGS library construction required complex procedures such as random cleavage of DNA or RNA samples, 3'and 5'end repair, adapter ligation, PCR amplification and purification, and a long time of one to two days.
  • Illumina improved this and developed a tagmentation method such as “Nextera XT DNA library Preparation”.
  • a tag conventional adapter
  • NGS Next Generation Sequencing
  • NGS is a name used to distinguish these devices from the earlier first-generation automated devices and from the Next NGS devices (also referred to as the next generation or third generation NGS) that were created afterwards.
  • NGS Next Generation Sequencing
  • the distinction between the sequencing technologies of each generation has become ambiguous, and the term NGS is used in a broad sense encompassing all sequencing technologies after automated Sanger sequencing.
  • The technologies introduced in NGS can be largely divided into three types: clonal amplification, massively parallel processing, and a new base-reading sequencing method (non-Sanger method) (base/color calling).
  • Clonal amplification removes the cloning process by removing the library construction process, and the massively parallel method handles hundreds of thousands of clones at the same time, improving efficiency.
  • the new straightforward sequencing method shows the effect of eliminating capillary electrophoresis.
  • the process of obtaining a template clone was simplified by clonal amplification.
  • a template DNA with a length of about 500 base pairs is required.
  • short fragments must be cloned through subcloning and then amplified in bacteria.
  • the new method eliminates both the cumbersome library construction and cloning process, cuts the DNA into short fragments as appropriate, and then amplifies it by PCR using primers to obtain a template clone.
  • Strategies such as bead-based, solid-state, and DNA nanoball generation are used for clonal amplification.
  • adapter oligonucleotides are ligated to both ends of the fragmented DNA, which is then flowed over the surface of a glass flow cell, where it binds randomly to surface-immobilized primers complementary to the adapters.
  • when PCR is performed in this state, the free end of the immobilized DNA binds to a neighboring free primer to form a bridge, and amplification proceeds.
  • as amplification proceeds in this way, a cluster that plays the same role as a bead is formed.
  • the nucleotide sequence determination methods that replace the Sanger method are broadly divided into sequencing by ligation (SBL), which relies on DNA ligation, and sequencing by synthesis (SBS).
  • the SBL method uses repetitive ligation of DNA fragments.
  • An anchor of n bases is hybridized to the template DNA, and fluorescently labeled probes, each consisting of two encoded bases followed by degenerate or universal bases, are added to the DNA library slide on which the beads or clusters are deposited.
  • a probe having two encoded sequences complementary to the template DNA fragment immediately following the anchor is ligated to the anchor, and the two encoded nucleotide sequences are analyzed through fluorescent label imaging of the slide. When the two sequences are analyzed, the degenerate base sequence and the fluorescent particles are removed and the above process of adding a probe is repeated.
  • an anchor having bases of n+2 and n+4 is used and repeatedly analyzed to analyze the sequence of the entire template DNA fragment.
  • SBS is further divided into cyclic reversible termination (CRT) and single nucleotide addition (SNA).
  • CRT cyclic reversible termination
  • SNA single nucleotide addition
  • the CRT method uses a process similar to the automated Sanger method, in which a mixture of primers, DNA polymerase, and modified nucleotides is added to a slide having a DNA cluster amplified using the solid state method.
  • the modified nucleotide is blocked with 3'-O-azidomethyl so that no additional polymerization process can occur, and is labeled with a fluorescent label specific to each base and which can be removed later.
  • the unpolymerized base is washed off and the base is identified through imaging using a total internal reflection fluorescence (TIRF) microscope.
  • TIRF total internal reflection fluorescence
  • the SNA method determines the base sequence by converting into light the ions and other by-products released when DNA polymerase incorporates a single nucleotide.
  • the SNA method is represented by the pyrosequencing method used in Roche's 454 device, which detects as light the pyrophosphate released when a nucleotide is incorporated. When the four dNTPs (A, G, T, C) are added sequentially, with reaction and washing repeated, light is emitted each time polymerization occurs, and the base sequence is determined from this.
  • a library of DNA fragments with short-reads of 1 Gb or less was created and based on the result of sequencing of such short reads.
  • This is a method of obtaining the sequence of the entire DNA to be analyzed through an algorithm that maps and arranges the overlapping sequence parts in each read. Since a short read is used, the base sequence of a DNA fragment can be obtained in a short time, but a high-performance computer is required, and reliability is very low when the size of the entire gene is large. In addition, it was difficult to assemble and solve repetitive and complex areas using the shotgun method.
  • although NGS devices have greatly improved in functionality and speed, they still fall short of the $1,000-per-genome cost target for genome sequencing that could open an era of personalized medicine.
  • NHGRI National Human Genome Research Institute
  • NGS devices with new principles and concepts that go beyond NGS are being developed (3rd generation or higher NGS).
  • the technologies introduced in Next NGS devices include the use of a single DNA molecule as the template, which eliminates clonal amplification, and base detection reactions with increased sensitivity that use the various signals (current, light, hydrogen ions, etc.) generated during synthesis or degradation.
  • the new NGS technology overcomes the limitations of the above-described NGS technologies to perform base sequence analysis directly from a single DNA molecule without amplification.
  • clonal amplification was first performed from a single DNA fragment because the number of templates had to be increased enough to generate an optical signal that a high-speed imaging camera could capture during the sequencing reaction.
  • the new NGS technology reads a sequence from a single DNA molecule. In other words, DNA is reacted in a single molecule state to read the sequence in real time.
  • SMRT Pacific Biosciences' single molecule real-time
  • SMRT single molecule real-time
  • the Oxford Nanopore sequencer reads the base by the change in potential that occurs when a single base cut by exonuclease passes through the pore.
  • a molecule of DNA polymerase is bound to the bottom of the analysis chip, where it initiates a polymerization reaction with the template DNA, detects the reaction in real time, and reads the base sequence.
  • a fluorescent label is attached to the terminal phosphate group of each nucleotide; when a base is incorporated, the label is released and its fluorescence ceases. This is detected in real time and the sequence is analyzed.
  • the base determination method of the Oxford Nanopore sequencer is an exonuclease sequencing method that identifies the free nucleotides cut from the template, instead of detecting a signal from DNA synthesis on the template.
  • Nanopores are a path through which current flows, and when free nucleotides pass through the nanopores, different currents are generated for each base of A, T, G, and C, and this is a method of detecting changes in potential.
  • the Oxford Nanopore Sequencing Analyzer is an innovative ultra-compact instrument that eliminates both the PCR amplification process and the fluorescence imaging process. These nanopores are made from membrane-spanning proteins such as alpha-hemolysin.
  • Alpha-hemolysin is a heptameric protein pore whose inner diameter matches that of a single DNA molecule.
  • solid-state nanopores are also being developed by processing materials such as graphene to make more sophisticated and specific nanopores.
  • RAD-seq Restriction site Associated DNA sequencing
  • a typical analysis program is Stacks published by Julian M. Catchen et al., and SNPs were identified in individuals and populations using this.
  • the RAD-seq method not only involves a complicated experimental procedure but also has relatively low efficiency, because a large amount of genome sequence must be determined to obtain a good result.
  • GBS (genotyping-by-sequencing) is one of the latest methods of NGS technology designed to detect SNP genotypes in various crop species and individuals. Unlike other genotyping techniques, GBS can map high-level SNP markers to reference genomes at low cost.
  • the first step in GBS analysis is to select the most effective restriction enzymes through genomic analysis in order to avoid repetitive genomic sequences and at the same time ensure that major regions of the genome can be selected.
  • the genome is digested with the restriction enzyme, and the fragments flanked by restriction sites at both ends are then sequenced.
  • This method reduces cost and time by allowing a certain portion to be analyzed at a high rate over a wide genome range without analyzing the entire genome.
  • restriction enzymes such as Reduced Representation Library (RRL), RAD-seq, etc.
  • RRL Reduced Representation Library
  • the most widely used GBS analysis pipeline is TASSEL (Trait Analysis by aSSociation, Evolution and Linkage), developed by the Buckler Lab at Cornell University, which currently gives the most stable and reliable results.
  • TASSEL is a Java-based program, developed by the Buckler Lab at Cornell University, for genotyping analysis using genome and restriction enzyme information such as GBS data. As a quantitative genetics tool, it evaluates associations between genotypes and traits in populations.
  • TASSEL consists of two large pipelines, Discovery and Production.
  • the discovery pipeline takes sequence data in FASTQ format that has been processed with barcodes and restriction enzymes, extracts tags (genome fragments of a fixed length), maps them to the reference genome, and detects SNPs from the mapped data.
  • the production pipeline finally generates genotyping information in HapMap data format for a number of samples from genome files in FASTQ format and the data mapped in the discovery step (Jeong-Ho Baek, et al., (2015) Journal of the Society (J. Korea Inst. Inf. Commun. Eng.) Vol. 19, No. 10: 2491-2499).
  • the current genome (polymorphism chip or next-generation sequencing technology) data-based biomarker search and discovery uses a single nucleotide polymorphism (SNP) method.
  • SNP single nucleotide polymorphism
  • the method of calculating this single nucleotide polymorphism is called single nucleotide polymorphism definition (SNP calling).
  • biomarker discovery and detection techniques are used in disease association studies and disease linkage studies using nucleotide polymorphism information of normal and patient groups.
  • data classified by genotype and the like are categorical variables, that is, discrete variables. Because categorical variables carry much less information than continuous variables, biomarker detection and discovery power tends to decrease in allele-based disease association and linkage studies of cancer, rare diseases, and chronic diseases.
  • nucleotide polymorphism in the case of next-generation sequencing (NGS) or nucleotide polymorphism-chip data, a large amount of nucleotide polymorphism (SnV calling) is deposited on a chip by chemical method and sequencing.
  • NGS next-generation sequencing
  • SNV calling nucleotide polymorphism
  • the quantified signal intensity value is expressed as a number of hundreds to thousands per base.
  • SNV calling from NGS data relates to a method for confirming the presence of a single base variant (SNV) from the results of next-generation sequencing (NGS) experiments.
  • SNV single base variant
  • a typical process is to: filter NGS reads to eliminate sources of error and bias; align the reads to the reference genome; predict the likelihood of variation at each locus from the allele counts and the quality scores of the reads aligned there, using a statistical model or a heuristic algorithm; filter the predicted results based on application-specific metrics; and annotate the SNPs to predict the functional effect of each variant (Rasmus Nielsen, et al., (2011) Nat Rev Genet, 12 (6), 443-51). The typical output of this procedure is a VCF file.
  • D denotes observed data, that is, the aligned reads
  • G denotes a genotype from which a probability is calculated
  • Gi denotes the i-th possible genotype among n possibilities.
  • the error model used to build a probabilistic variant-calling method is the basis for computing the P(D | G) term, the probability of the observed data D given genotype G.
  • a heuristic method exists as an alternative to the probabilistic method. Rather than modeling the distribution of the observed data and calculating genotype probabilities using Bayesian statistics, variant calling proceeds on the basis of empirical criteria such as minimum allele counts, read quality cut-offs, and read depth bounds. In practice it is used less often than the probabilistic method, but because it relies on bounds and cut-offs it is less affected by outlying data that violate the assumptions of a probabilistic model (Daniel C Koboldt, et al., (2012) Genome Res, 22 (3), 568-76).
  • biases may exist within the set of sequenced reads. For example, strand bias may occur, in which the reads aligned at a site are very unevenly distributed between the forward and reverse strands. Bias in PCR can also cause some reads to be duplicated abnormally often. Such biases can lead to ambiguous variant calls. For example, if a fragment containing a PCR error at a locus is over-amplified because of PCR bias, the locus will show a large number of false alleles and may be called as an SNV. The analysis pipeline therefore filters calls based on these biases (Rasmus Nielsen, et al., (2011) Nat Rev Genet, 12 (6), 443-51).
  • the present invention relates to a composition for diagnosing dementia comprising an agent capable of detecting Single Nucleotide Variant (SNV) of TOP3B (DNA topoisomerase III beta) gene.
  • SNV Single Nucleotide Variant
  • the present invention relates to a kit for diagnosing dementia comprising an agent capable of detecting Single Nucleotide Variant (SNV) of TOP3B (DNA topoisomerase III beta) gene.
  • the present invention relates to the use of an agent capable of measuring Single Nucleotide Variant (SNV) of TOP3B (DNA topoisomerase III beta) gene for diagnosis or prediction of dementia.
  • the formulation may be characterized in that it comprises a primer capable of specifically amplifying the SNV position of the TOP3B gene or a probe complementarily binding to a region containing the SNV position of the TOP3B gene.
  • the present invention relates to a composition for diagnosing dementia or a kit for diagnosing dementia comprising a primer or a set of primers capable of specifically amplifying the SNV position of the TOP3B gene.
  • the present invention also relates to a composition for diagnosing dementia or a kit for diagnosing dementia comprising a probe complementarily binding to a region containing the SNV position of the TOP3B gene.
  • the present invention also relates to a composition for diagnosis or prediction of dementia comprising a primer or a set of primers capable of specifically amplifying the SNV position of the TOP3B gene for use in a method for diagnosing or predicting dementia.
  • the present invention also relates to a composition for diagnosis or prediction of dementia comprising a probe complementarily binding to a region containing the SNV position of the TOP3B gene for use in a method for diagnosing or predicting dementia.
  • the present invention also relates to the use of a primer or primer set capable of specifically amplifying the SNV position of the TOP3B gene for diagnosis or prediction of dementia.
  • the present invention also relates to the use of a probe that complementarily binds to a region containing the SNV position of the TOP3B gene for diagnosis or prediction of dementia.
  • the present invention further relates to the use of a primer or primer set capable of specifically amplifying the SNV position of the TOP3B gene in the manufacture of a kit for diagnosis or prediction of dementia.
  • the present invention also relates to the use of a probe that complementarily binds to a region containing the SNV position of the TOP3B gene in the manufacture of a kit for diagnosis or prediction of dementia.
  • the primer may be a set of primers capable of amplifying any one or more of the SNVs of the TOP3B gene described in Table 1, but is not limited thereto.
  • the probe may be characterized in that it is a probe that complementarily binds to a region including the SNV position of the TOP3B gene described in Table 1, but is not limited thereto.
  • in the step of detecting SNV of the TOP3B gene, a single primer for the TOP3B gene may be used, or one or more pairs of primers (primer sets) may be used.
  • the primer may be used without limitation as long as it is a sequence capable of amplifying the TOP3B gene.
  • it may be characterized in that it is a primer set capable of amplifying any one or more of the SNVs of the TOP3B gene described in Table 1 above, but is not limited thereto.
  • a primer set capable of specifically amplifying the SNV position described in Table 1 and a probe that complementarily binds to the region containing the SNV position of the TOP3B gene described in Table 1 above is in the technical field to which the present invention pertains.
  • the primer set and probe can be used for PCR, and more preferably for real-time PCR.
  • because the composition for diagnosing dementia and the kit for diagnosing dementia of the present invention make use of the above-described “method for providing information for diagnosis or prediction of dementia” or “method for diagnosing dementia”, the description of content overlapping with the method for providing information described above is omitted.
  • SEQ ID NO: 45 5'-AATGATACGGCGACCACCGAGATCTACAC-3'
  • Reads from each sample were aligned to the reference sequence hg19 (human genome version 19; GRCh37.p13), and SNPs/InDels were called using the modified HaplotypeCaller in the Genome Analysis Toolkit (GATK) (Mark A DePristo, et al., (2011) Nat Genet, 43 (5), 491-8). The enrichment efficiency was determined based on the ratio of reads mapped to the targeted region with a padding of 150 bp.
  • GATK Genome Analysis Toolkit
  • Example 3 Analysis of the correlation between SNV and dementia using big data analysis
  • the SNV location of the TOP3B gene was confirmed in the normal group and the dementia group (Table 3).
  • the SNVs of the TOP3B gene of the dementia patient group are shown in Table 4 below.
  • the locations of SNVs that are specifically found in the TOP3B gene in the dementia patient group were identified, and in terms of the frequency of SNV occurrence, there is a higher frequency of SNV in the dementia patient group than in the normal group.
  • the etiology and pathology of dementia can be understood, the risk of dementia can be more accurately diagnosed using the gene and SNV, and thus, it can be usefully used in the development of a dementia treatment.
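The items above describe SNV calling that ends in a VCF file and a comparison of how many SNVs fall in the TOP3B gene per group. As a minimal sketch of that counting step only, assuming plain VCF text lines as input, the Python below counts SNV records inside one genomic interval. The function name, the example records, and the interval coordinates are hypothetical placeholders for illustration; they are not the TOP3B coordinates and not the inventors' pipeline.

```python
def count_snvs_in_region(vcf_lines, chrom, start, end):
    """Count SNV records on `chrom` with POS in [start, end] (1-based, inclusive).

    A record counts as an SNV when REF is one base and every ALT allele
    is a single A/C/G/T base, so indels and symbolic alleles are skipped.
    """
    count = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if rec_chrom != chrom or not (start <= pos <= end):
            continue
        alts = alt.split(",")
        if len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts):
            count += 1
    return count


# Invented records for illustration only (columns: CHROM POS ID REF ALT QUAL FILTER INFO).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr22\t1050\t.\tA\tG\t60\tPASS\t.",    # SNV inside the interval
    "chr22\t1100\t.\tCT\tC\t55\tPASS\t.",   # deletion, not an SNV
    "chr22\t1200\t.\tG\tA,T\t70\tPASS\t.",  # multi-allelic SNV inside the interval
    "chr22\t5000\t.\tT\tC\t80\tPASS\t.",    # SNV outside the interval
    "chr1\t1050\t.\tA\tG\t60\tPASS\t.",     # SNV on another chromosome
]

# Hypothetical interval standing in for a gene region.
REGION = ("chr22", 1000, 2000)

print(count_snvs_in_region(EXAMPLE_VCF, *REGION))  # → 2
```

Running the same function over each sample's VCF would give the per-sample SNV counts that a group comparison needs; filtering on the QUAL or FILTER columns is left out of this sketch.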

Abstract

The present invention relates to a TOP3B gene mutation-based dementia diagnosis method and, more particularly, to a method for diagnosis of dementia on the basis of the numbers and loci of single nucleotide variants (SNVs) located in the DNA topoisomerase III beta (TOP3B) gene. In the present invention, NGS data and big data analysis identified loci of SNVs specifically found in the TOP3B gene in a patient group with dementia, and with respect to SNV frequency, it was revealed that SNVs exist at a higher frequency in a patient group with dementia than in a normal group. Through the present invention, the etiology and pathology of dementia can be understood and the risk of dementia can be diagnosed more accurately by using the gene and SNVs, thereby finding useful applications in the development of therapeutic agents for dementia.

Description

Dementia diagnosis method based on TOP3B gene mutation
The present invention relates to a method for diagnosing dementia based on TOP3B gene mutation and, more particularly, to a method for diagnosing dementia based on the number and location of single nucleotide variants (SNVs) present in the TOP3B (DNA topoisomerase III beta) gene.
“Dementia” refers to a set of symptoms caused by brain disease. As dementia progresses, it affects thinking, behavior, and the ability to carry out daily activities. Dementia is characterized by a decline in cognitive ability that leaves the patient unable to perform everyday activities. Physicians diagnose dementia when two or more cognitive functions are markedly impaired. These cognitive functions include memory, language, comprehension of information, spatial function, judgment, and attention. Patients with dementia may have difficulty solving problems and controlling their emotions, and may undergo personality changes. The exact symptoms a patient experiences depend on which part of the brain has been damaged by the disease causing the dementia. In many types of dementia, some of the brain's nerve cells stop functioning, lose their connections with other cells, and die. Dementia is usually progressive: it gradually spreads through the brain, and the patient's symptoms worsen over time.
Dementia is a representative age-related neurodegenerative brain disease, with a prevalence of about 5-10% among people aged 65 and over worldwide; most patients show progressive cognitive impairment, hallucinations, delusions, and loss of the ability to live independently. In Alzheimer's disease, the most representative cause of dementia, neurofibrillary tangles are observed within the neurons of the cerebral cortex and plaques of aggregated amyloid beta (Amyloid β) protein are observed around brain cells (Hardy, J. et al., (1998) Nat Neurosci, 1, 355-8), and these deposits are presumed to cause necrosis of nerve cells. In addition, hyperphosphorylation of tau (τ) protein, inflammation, and oxidative damage also appear to be associated with onset. Neuritic plaques (or senile plaques) are associated with the deposition of amyloid beta protein, and neurofibrillary tangles are known to be associated with tau hyperphosphorylation. In particular, in patients with frontotemporal dementia, an inherited tau-only dementia, it has been confirmed that tau hyperphosphorylation and aggregation inside neurons alone induce dementia, without accumulation of amyloid beta outside neurons (John van Swieten, Maria Grazia Spillantini, (2007) Brain Pathol, 17 (1), 63-73).
Dementia is diagnosed by identifying impairment of daily life and social activity caused by cognitive dysfunction through detailed history-taking and evaluation of symptoms, and by using brain imaging to examine for cerebrovascular disease and brain atrophy. Because early-stage dementia is difficult to distinguish from age-related forgetfulness, neuropsychological tests are performed that comprehensively evaluate not only memory but also language ability, calculation, visuospatial perception, and judgment. Diagnosing dementia requires additional detailed examination based on the information obtained from interviews with the patient and caregiver and from screening tests. Additional examinations include neuropsychological testing (SNSB), blood tests, and various types of brain imaging (CT, MRI, PET).
Magnetic resonance imaging (MRI) can play an important role in distinguishing between types of dementia. It can indicate whether a case is closer to Alzheimer's disease dementia or to vascular dementia, and can also help to some extent in determining whether the dementia is caused by another disease. One method for early diagnosis of Alzheimer's disease dementia is amyloid positron emission tomography (PET), which images the amyloid plaques that appear in the brains of Alzheimer's disease patients; it can detect them even when symptoms are mild or when there are no symptoms of dementia (Korean Dementia Association).
However, because MRI can confirm dementia only once brain atrophy is considerably advanced, it is difficult to use for early diagnosis. Early diagnosis therefore requires new diagnostic markers that can stand in for clinical symptoms (surrogate markers) or measure the state before symptoms appear.
Meanwhile, 81-93% of the genes in the human genome contain at least one single nucleotide variant (SNV). Identifying, among this large number of SNVs, those associated with a specific disease is highly meaningful work (Benjamin Lehne, et al., (2011) PLoS One, 6 (6), e20133).
As recent advances in gene sequencing technology have made high-throughput genome-wide association studies (GWAS) possible, several genetic variations (single nucleotide polymorphisms, SNPs) have been reported to be associated with the onset of dementia. Genes associated with dementia include SOL1, CLU, PICALM, CR1, and BIN1 in addition to the ApoE gene, and studies to clarify the functions of these genes and their association with dementia are currently under way (J C Lambert, et al., (2013) Nat Genet, 45 (12), 1452-8).
Alzheimer's dementia, the most representative form of dementia, is divided into early-onset AD, in which symptoms appear before age 65, and late-onset AD, which develops after that age; late-onset AD accounts for the great majority (>95%) of dementia patients. The genetic risk factor known to date is the ApoE genotype. ApoE is a lipid-binding protein with three isoforms, ApoEε2, ApoEε3, and ApoEε4; compared with people without ApoEε4, the incidence of dementia is known to be 2 to 3 times higher in heterozygotes and more than 5 times higher in homozygotes (Christiane Reitz, Richard Mayeux, (2014) Biochem Pharmacol, 88 (4), 640-51).
Accordingly, the present inventors extracted genomic DNA from the blood of normal individuals and dementia patients aged 52 or older, examined the SNVs of the two groups using next-generation sequencing (NGS) and big data analysis, found a clear difference between the groups in the number and location of SNVs distributed in the TOP3B (DNA topoisomerase III beta) gene, and thereby completed the present invention.
The information described in this Background section is provided only to improve understanding of the background of the present invention, and may therefore not include information that forms prior art already known to a person of ordinary skill in the art to which the present invention pertains.
Summary of the invention
An object of the present invention is to provide a method for diagnosing dementia based on the number and location of single nucleotide variants (SNVs) present in the TOP3B (DNA topoisomerase III beta) gene.
Another object of the present invention is to provide a composition and a kit for diagnosing or predicting dementia, comprising an agent capable of detecting an SNV of the TOP3B gene.
Still another object of the present invention is to provide a composition for diagnosing or predicting dementia, comprising an agent capable of detecting an SNV of the TOP3B gene, for use in a method for diagnosing or predicting dementia.
Still another object of the present invention is to provide the use of an agent capable of measuring an SNV of the TOP3B gene for diagnosing or predicting dementia.
Still another object of the present invention is to provide the use of an agent capable of detecting an SNV of the TOP3B gene in the manufacture of a kit for diagnosing or predicting dementia.
To achieve the above objects, the present invention provides a method for providing information for diagnosing or predicting dementia, comprising the step of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene in an isolated biological sample.
The present invention also provides a method for diagnosing dementia, comprising the step of detecting an SNV of the TOP3B gene in an isolated biological sample.
The present invention also provides a composition and a kit for diagnosing dementia, comprising an agent capable of detecting an SNV of the TOP3B gene.
The present invention also provides a composition for diagnosing or predicting dementia, comprising an agent capable of detecting an SNV of the TOP3B gene, for use in a method for diagnosing or predicting dementia.
The present invention also provides the use of an agent capable of measuring an SNV of the TOP3B gene for diagnosing or predicting dementia.
The present invention also provides the use of an agent capable of detecting an SNV of the TOP3B gene in the manufacture of a kit for diagnosing or predicting dementia.
FIG. 1 is a graph showing the frequency of SNVs in the 132 selected candidate genes in normal subjects and in the dementia patient group.
FIG. 2 is a graph showing the frequency of SNVs present in the TOP3B gene in normal subjects and in the dementia patient group.
FIG. 3 shows the result of ROC analysis of the number of SNVs in TOP3B, obtained with an analysis program developed using the pROC package of R.
Detailed description and preferred embodiments of the invention
Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by a person skilled in the art to which the present invention belongs. In general, the nomenclature used in this specification is well known and commonly used in the art.
Among the genes found to be associated with Alzheimer's disease, the e4 allele of APOE has been shown to be a strong risk factor for late-onset sporadic Alzheimer's disease (Guojun Bu, (2009) Nat Rev Neurosci, 10 (5), 333-44; E H Corder, et al., (1993) Science, 261 (5123), 921-3; Yadong Huang, Lennart Mucke, (2012) Cell, 148 (6), 1204-22). The APOE gene has three polymorphic alleles, e2, e3 and e4, with worldwide frequencies of 8.4%, 77.9%, and 13.7%, respectively. APOE e4 is typically found in more than 50% of Alzheimer's disease patients but in less than 15% of cognitively normal controls (Alex Ward, et al., (2012) Neuroepidemiology, 38 (1), 1-17). However, even prediction based on APOE e4 accounts for no more than 20% of the genetic contribution to dementia. Thus, the e4-mediated risk of Alzheimer's disease and its possible triggers remain unclear.
NGS is a technology that makes it easy to determine genetic sequences in vitro and in vivo, and genetics research has advanced considerably in recent years with the advent of sequencing technology. NGS has made it possible to examine an ever larger number of genes, and this progress has made it easier to discover genetic mutations for disease diagnosis and treatment. Most of the recently identified dementia-associated genes are known to affect Aβ42 production and clearance or to affect pathways important in the pathogenesis of Alzheimer's disease (Celeste M Karch, Alison M Goate, (2015) Biol Psychiatry, 77 (1), 43-51; Bin Zhang, et al., (2013) Cell, 153 (3), 707-20). However, for many mutant genes it has not yet been established which disease they are associated with. In particular, because the various degenerative brain diseases are interrelated in complex ways (Lars Bertram, Rudolph E Tanzi, (2005) J Clin Invest, 115 (6), 1449-57), diagnosing such a disease or guiding its treatment on the basis of a single gene has been problematic.
Because there is currently no effective treatment for dementia, early diagnosis is very important. The present inventors therefore sought to identify specific dementia-related genes and their SNVs for the early diagnosis of dementia. They sought to detect the SNVs of such a gene using NGS technology and to confirm that early diagnosis of dementia is possible from differences in SNV frequency or from SNV location.
In one embodiment of the present invention, genomic DNA was extracted from blood samples of patients with Alzheimer's dementia, the most representative form of dementia. To search for dementia-related biomarkers by NGS analysis, 132 candidate genes related to hereditary diseases of the nervous system, including TOP3B, were selected and subjected to targeted sequencing. TOP3B (DNA topoisomerase III beta) was selected as a dementia-related gene through big data analysis using R. ROC (Receiver Operating Characteristic) analysis was performed with an analysis program built on the R package pROC (Xavier Robin, et al., (2011) BMC Bioinformatics, 12, 77). As a result, the distribution of specific SNVs in the TOP3B gene of dementia patients was identified, and it was confirmed that dementia can be diagnosed early by comparing the number or location of these SNVs.
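The ROC analysis described above can be illustrated with a minimal sketch. The embodiment used the R package pROC; the Python code below is only a stand-in that computes the area under the ROC curve as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, the toy SNV counts, the cutoff, and the assumption that a higher SNV count indicates a case are all hypothetical and are not taken from the specification.

```python
from itertools import product


def roc_auc(case_counts, control_counts):
    """Area under the ROC curve, computed as the probability that a random
    case has a higher SNV count than a random control (ties count 0.5)."""
    wins = 0.0
    for case, control in product(case_counts, control_counts):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_counts) * len(control_counts))


def sensitivity_specificity(case_counts, control_counts, cutoff):
    """Call a sample positive when its SNV count is >= cutoff
    (direction of the test is an assumption made for illustration)."""
    true_pos = sum(1 for c in case_counts if c >= cutoff)
    true_neg = sum(1 for c in control_counts if c < cutoff)
    return true_pos / len(case_counts), true_neg / len(control_counts)


# Hypothetical toy SNV counts per subject; NOT data from the specification.
toy_cases = [3, 4, 5, 6]
toy_controls = [0, 1, 2, 3]

auc = roc_auc(toy_cases, toy_controls)
sens, spec = sensitivity_specificity(toy_cases, toy_controls, cutoff=3)
print(f"AUC={auc:.3f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the cutoff over all observed counts and plotting sensitivity against 1 - specificity would trace the full ROC curve; pROC additionally reports confidence intervals, which this sketch omits.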
Accordingly, in one aspect, the present invention relates to a method of providing information for diagnosing or predicting dementia, comprising the step of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene in an isolated biological sample.
In another aspect, the present invention relates to a method for diagnosing dementia, comprising the step of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene in an isolated biological sample.
In the present invention, the diagnosis or prediction of dementia may include not only diagnosing a subject as a dementia patient but also classifying a subject into a high-risk group for dementia, but is not limited thereto.
Most dementia develops at age 60 or older, and the risk of onset increases with age. Dementia can also develop in a person's 30s, 40s, or 50s; the rare cases that occur before age 65 are called "early-onset dementia" and have a stronger genetic component than cases that occur after age 65. Because the likelihood of developing the condition rises gradually with age, dementia was once regarded as a disease that simply comes with old age. However, dementia is not a normal part of aging and is known to result from pathological degenerative changes in the nerves of the brain.
Only a very small fraction of dementia is inherited, and specific gene mutations are known to cause these forms. In most cases these genes are not involved, but even so, people with a family history of dementia are at higher risk of developing it. Certain health conditions and lifestyle factors can also be risk factors for dementia. People with untreated vascular risk factors such as high blood pressure are at higher risk, as are people with little physical and mental activity. Many different diseases cause dementia, and in most cases it is not known why they develop.
In the present invention, the dementia may be Alzheimer's disease, senile dementia, vascular dementia, frontotemporal dementia, dementia with Lewy bodies, or Parkinson's disease dementia. Preferably, the dementia is Alzheimer's disease, but is not limited thereto.
"Alzheimer's disease (AD)" is the most common form of dementia and accounts for about two-thirds of dementia cases. In the present invention, the term is used interchangeably with Alzheimer disease and Alzheimer's dementia. It causes a gradual decline in cognitive ability and often begins with memory loss. Alzheimer's disease is characterized by two abnormalities in the brain, amyloid plaques and neurofibrillary tangles. Amyloid plaques are abnormal clumps of a protein called beta-amyloid. Neurofibrillary tangles are bundles of twisted filaments made of a protein called tau. Amyloid plaques and neurofibrillary tangles block communication between nerve cells and cause these cells to die.
Alzheimer's disease, the most common form of dementia, is a progressive brain disease first described in 1906 by the German physician Alois Alzheimer. Senile plaques and neurofibrillary tangles observed in the brains of patients who died of Alzheimer's disease are its pathological hallmarks. Of these, senile plaques form through the accumulation of proteins, dead cells, and the like outside the cell, and their main component is a peptide called amyloid β (Aβ). The gradual loss of cognitive function, a major feature of AD patients, appears to be caused by abnormally accumulated Aβ, which is produced from amyloid precursor protein (APP) by proteolysis. The precursor APP is cleaved by β-secretase (BACE) and γ-secretase to produce Aβ (D H Small, et al., (2001) Nat Rev Neurosci, 2 (8), 595-8; B A Yankner, (1996) Neuron, 16 (5), 921-32; D J Selkoe, (1999) Nature, 399 (6738 Suppl), A23-31).
"Senile dementia" refers to a condition in which a person who had been living normally suffers, after age 65, impaired brain function from any of various causes, so that cognitive function declines persistently and globally compared with before and daily life is substantially disrupted. In other words, senile dementia is a collective term for dementia that develops in old age, after 65. In the past, senile dementia was thought to be an aging phenomenon that every elderly person naturally experiences, but many recent studies have established it as a distinct brain disease. The causes of dementia in old age are very diverse. The most common are Alzheimer's disease and vascular dementia. Less frequent causes include other degenerative brain diseases such as dementia with Lewy bodies, frontotemporal degeneration, and Parkinson's disease, as well as normal pressure hydrocephalus, head trauma, brain tumors, metabolic diseases, deficiency diseases, toxic disorders, and infectious diseases.
"Vascular dementia" is cognitive impairment caused by damage to the blood vessels of the brain. It can result from a single stroke or from several strokes occurring over time. Vascular dementia is diagnosed when there is evidence of vascular disease in the brain together with cognitive impairment that interferes with daily life. Its signs may begin suddenly after a stroke or gradually as the vascular disease worsens. Symptoms vary with the location and size of the brain damage, and one or several specific cognitive functions may be affected. Vascular dementia can resemble Alzheimer's disease, and the two very commonly occur together.
"Dementia with Lewy bodies" or "Lewy body disease" is characterized by the formation of Lewy bodies in the brain. Lewy bodies are abnormal clumps of the protein alpha-synuclein that develop inside nerve cells. They arise in specific regions of the brain and cause changes in movement, thinking, and behavior. People with Lewy body disease may experience large fluctuations in attention and thinking. They can go from almost normal performance to severe confusion within a short time, and visual hallucinations are also a common symptom. Lewy body disease can include the overlapping disorders of dementia with Lewy bodies, Parkinson's disease, and Parkinson's disease dementia. When movement symptoms appear first, Parkinson's disease is usually diagnosed, and most people develop dementia as Parkinson's disease progresses. When cognitive symptoms appear first, dementia with Lewy bodies is diagnosed. Lewy body disease sometimes occurs together with Alzheimer's disease and/or vascular dementia.
"Frontotemporal dementia (FTD)" occurs when there is progressive damage to the frontal and/or temporal lobes of the brain. Symptoms begin in a person's 50s or 60s, or sometimes earlier. There are two main types of frontotemporal dementia: frontal (associated with behavioral symptoms and personality changes) and temporal (language impairment). The two often overlap. Because the frontal lobes of the brain control judgment and social behavior, patients with frontotemporal dementia often have difficulty maintaining socially appropriate behavior. They may behave rudely, neglect normal responsibilities, be difficult to control, or act repetitively, aggressively, without inhibition, or impulsively. The temporal, or language, variant of frontotemporal dementia takes two main forms. Semantic dementia involves a gradual loss of the meaning of words, difficulty finding words, and difficulty understanding language. Progressive non-fluent aphasia is less common and affects the ability to speak fluently. Frontotemporal dementia is also called frontotemporal lobar degeneration (FTLD) or Pick's disease.
"Parkinson's disease" is a representative degenerative brain disease in which dopamine-secreting nerve cells in the substantia nigra, a specific region of the midbrain, are gradually lost for unknown reasons. Dopamine is one of the neurotransmitters in the brain and is required for movement. Patients with Parkinson's disease develop symptoms such as bradykinesia (slowness of movement), resting tremor, muscle rigidity, and postural instability.
Much research has been directed at diagnosing dementia at an early stage, but no marker effective for diagnosis exists yet. Current diagnostic testing for dementia relies on mental status examinations such as the MMSE (Mini-Mental State Examination) and neuropsychological assessments such as the CDR (Clinical Dementia Rating).
The MMSE, a brief mental status examination, is a test for checking whether cognitive ability has declined because of a condition such as dementia or alcoholism. It measures orientation, recall, short-term memory, concentration, constructional ability, language ability, and the like. The maximum score is 30. A score of 18 or below is usually judged as definite dementia (clear cognitive impairment), a score of 19 to 23 as suspected dementia (mild cognitive impairment), and a score of 24 or above as normal (no cognitive impairment). The test is very simple and quick, which makes it convenient, but other tests must be run alongside it to learn exactly which functions have declined. The MMSE alone therefore cannot confirm dementia or distinguish between types of dementia. The CDR, a clinical rating scale for dementia, stages dementia using six domains (memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care including hygiene and grooming). On this scale, 0 indicates no dementia, 0.5 questionable cognitive impairment, 1 mild dementia, 2 moderate dementia, 3 severe dementia, 4 profound dementia, and 5 terminal dementia.
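The MMSE cutoffs and CDR stages described above can be written out as a small lookup, shown here as a minimal sketch. The function and variable names are hypothetical and chosen only for illustration; the thresholds and stage labels follow the text above and are not a clinical implementation.

```python
def interpret_mmse(score):
    """Map an MMSE total score (0-30) to the category given in the text:
    <= 18 definite dementia, 19-23 suspected dementia, >= 24 normal."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE score must be between 0 and 30")
    if score >= 24:
        return "normal"
    if score >= 19:
        return "suspected dementia"
    return "definite dementia"


# CDR global scores and the stage labels listed in the text.
CDR_STAGES = {
    0: "no dementia",
    0.5: "questionable cognitive impairment",
    1: "mild dementia",
    2: "moderate dementia",
    3: "severe dementia",
    4: "profound dementia",
    5: "terminal dementia",
}


def interpret_cdr(score):
    """Return the stage label for a CDR score, or raise if it is not on the scale."""
    if score not in CDR_STAGES:
        raise ValueError("CDR score must be one of 0, 0.5, 1, 2, 3, 4, 5")
    return CDR_STAGES[score]


print(interpret_mmse(21), "/", interpret_cdr(0.5))
```

As the text notes, neither score alone confirms dementia or identifies its type, so such a lookup would only be one input among several.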
Using the above diagnostic criteria, patients with Alzheimer's dementia can be classified as having mild cognitive impairment (MCI), mild dementia (mild AD), moderate dementia, or severe dementia (severe AD). Cognitively normal people also experience some memory impairment with age but do not show symptoms, such as personality changes, that are specific to Alzheimer's patients; this is called mild cognitive impairment (MCI). MCI is regarded as a prodromal stage of Alzheimer's disease and is characterized by short-term memory loss, spatial memory loss, and emotional imbalance, and it is further divided into several subtypes. Of these, MCI associated with memory loss is called amnestic MCI. Whereas a normal 65-year-old has a 1 to 3% probability of converting to Alzheimer's disease within a given period, 8 out of 10 people in the amnestic MCI group were found to convert. People with amnestic MCI are therefore considered highly likely to progress to Alzheimer's dementia.
It is essential to obtain a medical diagnosis at the early stage, when symptoms of dementia first appear, so that the patient receives the correct diagnosis and treatment. However, the early signs of dementia may not be obvious. Common symptoms include progressive and frequent memory loss, confusion, personality changes, aphasia and withdrawal, and loss of the ability to perform everyday tasks. At present, some drugs can be used to relieve the symptoms of certain dementias, but no effective treatment exists. The ultimate goal of dementia treatment is to reverse and cure the disease itself and to reduce and eliminate the cognitive impairment, mental disorders, and abnormal behaviors it causes.
Many drugs have been reported as potentially useful for treating Alzheimer's disease, but most are still under review with respect to efficacy. Moreover, most existing drugs can only slow the progression of Alzheimer's disease slightly or were developed to treat its symptoms; no drug has been designed and made that can fundamentally cure Alzheimer's disease itself.
Early diagnosis is therefore even more important in dementia than in other disease areas. If a simple diagnostic technique that can identify dementia patients early were available, prompt and appropriate treatment, such as drug administration in the early stage of the disease, could relieve symptoms and lessen the severity of the disease. If dementia can be diagnosed early, medical staff, patients, and caregivers can prepare in advance for problems that will arise, and early drug or non-drug treatment can slow the progression of the disease and contribute far more to quality of life.
In the present invention, the information providing method or the diagnostic method may further comprise the step of obtaining a brain image and identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when cerebral cortical thickness is reduced and brain atrophy is present. The brain image may be an MRI brain image, but is not limited thereto.
In the present invention, the information providing method or the diagnostic method may further comprise performing one or more tests selected from the group consisting of a neuropsychological test, a cerebrospinal fluid (CSF) test, and an amyloid-PET test.
In one embodiment of the present invention, the TOP3B SNV analysis was combined with detailed history taking of the subject's symptoms and with brain imaging assessment to evaluate the risk of dementia.
Techniques for diagnosing Alzheimer's dementia include neuropsychological testing, MRI brain imaging, clinical diagnosis based on a specialist's findings, and pathological examination using cerebrospinal fluid or florbetaben-based amyloid-PET.
Clinical diagnosis can identify the onset of dementia or mild cognitive impairment, but it is sometimes difficult to distinguish dementia from other brain diseases, and diagnosis is possible only after symptoms have begun, so it is unsuitable for early diagnosis. Cerebrospinal fluid testing relies on quantitative assays such as amyloid beta and tau protein analysis and is a highly reliable diagnostic measure, but subjects are very reluctant to undergo the invasive collection of cerebrospinal fluid. Pathological examination by amyloid-PET is highly reliable but expensive. For MRI brain imaging, techniques are under development to identify brain damage that accompanies dementia, such as cortical and hippocampal atrophy, and to enable early diagnosis, but the point at which diagnosis is currently possible is known not to be early. In addition, blood-based dementia diagnostics are under active development, but limitations in the size of the study populations and in accuracy mean that reliability must be verified before clinical use.
No marker effective for the early diagnosis of Alzheimer's disease exists yet. One study reported that plasma levels of Aβ-40 or Aβ-42 and various signaling proteins can be used to distinguish Alzheimer's dementia from controls (Dietmar R Thal, et al., (2002) Neurology, 58 (12), 1791-800). Several proteome-based studies found higher levels of α2-macroglobulin (α2M), complement factor H (CFH), and α1-antitrypsin (A1AT) in the plasma of Alzheimer's disease patients than in controls, raising expectations that these could serve as markers for diagnosing Alzheimer's disease (Liu Shi, et al., (2018) J Alzheimers Dis, 62 (3), 1181-1198). However, the sensitivity and specificity of individual proteins or their combinations are far from sufficient for the early diagnosis of MCI and Alzheimer's disease, so new, clinically useful biomarkers for the early diagnosis of MCI and Alzheimer's disease need to be developed.
As for genetic factors, the amyloid precursor protein, presenilin-1, and presenilin-2 are known to be the primary factors in early-onset familial Alzheimer's disease (Kaj Blennow, et al., (2006) Lancet, 368 (9533), 387-403; John Hardy, Dennis J Selkoe, (2002) Science, 297 (5580), 353-6), and the e4 allele of APOE has been identified as the strongest risk factor for late-onset sporadic Alzheimer's disease. The APOE gene has three polymorphic alleles, e2, e3, and e4, with worldwide frequencies of 8.4%, 77.9%, and 13.7%, respectively. In Alzheimer's disease, the e4 frequency rises dramatically to about 40% (L A Farrer, et al., (1997) JAMA, 278 (16), 1349-56). APOE is located on the long arm of chromosome 19 (19q13.2). The three alleles e2, e3, and e4 differ at amino acids 112 (Cys/Arg) and 158 (Arg/Cys), and their combinations give six genotypes (E2/E2, E2/E3, E3/E3, E2/E4, E3/E4, E4/E4). APOE e4 is typically found in more than 50% of Alzheimer's disease patients but in fewer than 15% of cognitively normal controls. Earlier studies reported that APOE e4 can bring the onset of AD forward by 5 to 15 years (E H Corder, et al., (1993) Science, 261 (5123), 921-3; Estrella Gomez-Tortosa, et al., (2007) Arch Neurol, 64 (12), 1743-8). The effect of APOE e4 on cognitive decline has also been studied, with inconsistent results: some studies reported cognitive decline associated with APOE e4, others reported no effect, and still others suggested that cognition declines slowly with APOE e4 (K Anstey, H Christensen, (2000) Gerontology, 46 (3), 163-77; Sherry A Beaudreau, et al., (2013) J Anxiety Disord, 27 (6), 559-66; Richard J Caselli, et al., (2009) N Engl J Med, 361 (3), 255-63).
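The count of six genotypes follows from pairing three alleles without regard to order. A minimal Python sketch of that enumeration is given below; the names `APOE_ALLELES` and `apoe_genotypes` are illustrative assumptions and are not part of the specification.

```python
from itertools import combinations_with_replacement

# The three APOE alleles named in the text.
APOE_ALLELES = ("E2", "E3", "E4")


def apoe_genotypes(alleles=APOE_ALLELES):
    """Return every unordered pair of alleles as an 'A/B' string."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


if __name__ == "__main__":
    # Three alleles give 3 homozygous + 3 heterozygous = 6 genotypes.
    print(apoe_genotypes())
```

With n alleles the same enumeration yields n(n+1)/2 genotypes, which is 6 for n = 3.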
Despite these contested findings, healthy individuals homozygous for APOE e4 showed reduced hippocampal volume, whereas e4 heterozygotes did not differ from non-carriers in a healthy elderly group (Fabrice Crivello, et al., (2010) Neuroimage, 53 (3), 1064-9; Herve Lemaitre, et al., (2005) Neuroimage, 24 (4), 1205-13). APOE e4 has also been shown to be involved in the conversion of healthy elderly people and people with mild cognitive impairment to Alzheimer's disease (Wang et al., 2011). The risk conferred by the e4 allele has been reported to differ between ethnic groups (C J Brainerd, et al., (2013) Neuropsychology, 27 (1), 86-94; R Heun, et al., (2010) Eur Psychiatry, 25 (1), 15-8). Even so, prediction based on APOE e4 accounts for no more than 20% of the genetic contribution to dementia. The e4-mediated risk of Alzheimer's disease and its possible triggers therefore remain unclear.
Meanwhile, Korean Patent No. 10-1335021 describes the association between T/G heterozygosity at APOE rs405509 and patients with Alzheimer's disease or mild cognitive impairment. Korean Patent No. 10-1250464 confirmed that a single nucleotide variant located in the APOE promoter can explain the ethnic differences in the risk conferred by APOE E4/E4 homozygosity, and that cortical thickness differs according to the allele.
TOP3B (DNA topoisomerase III beta) is a gene encoding a DNA topoisomerase, an enzyme that controls and alters the topological state of DNA during transcription. DNA topoisomerase transiently cleaves and rejoins a single strand of DNA so that strands can pass through each other, which relaxes supercoils and changes the topology of the DNA. In this way it relaxes or restores the superhelicity of chromosomes during DNA replication or transcription in the cell and maintains appropriate supercoiling. The enzyme interacts with the DNA helicase SGS1 and plays an important role in DNA recombination, cellular aging, and the maintenance of genome stability. Alternative splicing at the C-terminus of this gene produces three transcript variants with distinct tissue specificity.
Eukaryotic topoisomerase III was first identified in budding yeast through a mutation that causes hyper-recombination between repeated sequences. The mammalian DNA topoisomerase III gene was cloned later and, unlike yeast DNA topoisomerase III, was found to have diverged into alpha and beta isozymes during evolution to higher organisms.
In the present invention, the TOP3B gene has a total length of 25,823 nt and is located on human chromosome 22. Its position on the chromosome is 22,311,397-22,337,219 based on GRCh37.p13 (Genome Reference Consortium Human Build 37 patch release 13).
In the present invention, the TOP3B gene includes all exons, introns, and 5' and 3' untranslated regions (UTRs) present on the genome containing it. Preferably, the TOP3B gene comprises the nucleotide sequence represented by SEQ ID NO: 47.
No prior literature has reported an association between the TOP3B gene and dementia; it was identified for the first time in the present invention.
The information providing method or diagnostic method for diagnosing or predicting dementia according to the present invention detects single nucleotide variants (SNVs) present in the TOP3B gene and uses their number and positions to screen for a group at high risk of developing dementia.
Accordingly, in the present invention, the information providing method or diagnostic method may further include a step of identifying a subject as a dementia patient or as belonging to a high-risk group for dementia when the TOP3B gene carries three or more SNVs. Preferably, the step identifies the subject as such when the TOP3B gene carries four or more SNVs, but the method is not limited thereto.
In one embodiment of the present invention, an analysis program based on the R package pROC was developed and used to perform ROC (receiver operating characteristic) analysis. At a cutoff of 2.5 SNVs in the TOP3B gene, the specificity was 0.6061 and the sensitivity was 0.9245 (FIG. 3). Accordingly, a subject with two or fewer SNVs can be classified as normal, and a subject with three or more as a dementia patient or as belonging to a high-risk group for developing dementia.
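The cutoff rule described here can be written as a short Python sketch. It assumes that the per-subject SNV count in TOP3B has already been obtained; the function name `classify_by_snv_count` and the label strings are hypothetical and are not taken from the analysis program of the embodiment.

```python
def classify_by_snv_count(n_snvs, cutoff=2.5):
    """Classify a subject by the number of SNVs detected in TOP3B.

    Counts above the cutoff (3 or more for the default of 2.5) are
    labelled high-risk; counts of 2 or fewer are labelled normal.
    """
    if n_snvs < 0:
        raise ValueError("SNV count cannot be negative")
    return "high-risk" if n_snvs > cutoff else "normal"


if __name__ == "__main__":
    for count in (0, 2, 3, 4):
        print(count, classify_by_snv_count(count))
```

Because SNV counts are integers, a cutoff of 2.5 is equivalent to the "three or more" criterion; passing `cutoff=3.5` would give the preferred "four or more" criterion.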
ROC analysis is used mainly for diagnostic tests in clinical chemistry, pharmacology, and physiology. It is a graph that shows sensitivity (= true positive/(true positive + false negative)) and specificity (= true negative/(true negative + false positive)) at the same time. The x-axis is the false positive rate (= 1 - true negative rate) and the y-axis is the true positive rate. The closer the ROC curve runs to the upper left corner, the better the classification performance; equivalently, the closer the area under the ROC curve is to 1, the better the performance. In diagnostic tests, sensitivity matters more than specificity, because low sensitivity means false negatives, that is, subjects at risk of disease are not predicted to be at risk. The sensitivity of 0.9245 shown by the information providing method or diagnostic method of the present invention is very close to 1, which suggests very high accuracy in diagnosing the disease.
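As a worked illustration of these two quantities, the Python sketch below computes sensitivity as TP/(TP + FN) and specificity as TN/(TN + FP) at a given SNV-count cutoff. The eight subjects are invented toy data for illustration only, not data from the embodiment, and the function names are assumptions.

```python
def confusion_counts(snv_counts, has_dementia, cutoff):
    """Count TP, FP, TN, FN when 'count > cutoff' is the positive call."""
    tp = fp = tn = fn = 0
    for n, diseased in zip(snv_counts, has_dementia):
        predicted_positive = n > cutoff
        if predicted_positive and diseased:
            tp += 1
        elif predicted_positive and not diseased:
            fp += 1
        elif not predicted_positive and diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: fraction of diseased subjects called positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of healthy subjects called negative."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Toy data (hypothetical): SNV counts and true disease status.
    counts = [0, 1, 2, 3, 4, 5, 3, 2]
    status = [False, False, False, True, True, True, False, True]
    tp, fp, tn, fn = confusion_counts(counts, status, cutoff=2.5)
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
```

Repeating the calculation over a range of cutoffs and plotting (1 - specificity, sensitivity) for each gives the points of an ROC curve.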
In addition, in the present invention, the information providing method or diagnostic method may further include a step of identifying a subject as a dementia patient or as belonging to a high-risk group for dementia when an SNV is detected in the TOP3B gene at a position selected from the group consisting of 22,311,659; 22,311,776; 22,312,061; 22,312,502; 22,312,378; 22,312,589; 22,312,970; 22,313,743; 22,318,365; 22,312,555; 22,312,531; 22,316,792; 22,311,882; 22,313,733; 22,311,516; 22,312,292; 22,313,669; 22,312,383; 22,330,107; 22,312,568; 22,312,476; 22,318,671; 22,312,668; 22,312,790; 22,318,538; 22,312,484; 22,312,351; 22,312,350; 22,312,315; 22,313,829; and 22,330,082, based on GRCh37.p13 (Genome Reference Consortium Human Build 37 patch release 13).
In the present invention, these positions correspond, respectively, to nucleotides 263, 380, 665, 1106, 982, 1193, 1574, 2347, 6969, 1159, 1135, 5396, 486, 2337, 120, 896, 2273, 987, 18711, 1172, 1080, 7275, 1272, 1394, 7142, 1088, 955, 954, 919, 2433, and 18686 of the nucleotide sequence represented by SEQ ID NO: 47.
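The listed sequence positions are consistent with a simple 1-based offset from the GRCh37.p13 start coordinate of the gene (22,311,397). The Python sketch below shows that conversion; it assumes that nucleotide 1 of SEQ ID NO: 47 corresponds to chromosome position 22,311,397, which matches the listed pairs but is not stated explicitly in the text, and the constant and function names are illustrative.

```python
# GRCh37.p13 coordinates of TOP3B on chromosome 22, as given in the text.
TOP3B_START_GRCH37 = 22_311_397
TOP3B_END_GRCH37 = 22_337_219


def gene_length():
    """Length of the gene in nucleotides (both end coordinates inclusive)."""
    return TOP3B_END_GRCH37 - TOP3B_START_GRCH37 + 1


def to_seq_position(chrom_pos):
    """Convert a chromosome 22 coordinate to a 1-based position in the gene.

    Assumes position 1 of the gene sequence is the gene start coordinate.
    """
    if not TOP3B_START_GRCH37 <= chrom_pos <= TOP3B_END_GRCH37:
        raise ValueError("coordinate lies outside the TOP3B gene")
    return chrom_pos - TOP3B_START_GRCH37 + 1


if __name__ == "__main__":
    print(gene_length())
    for pos in (22_311_659, 22_318_365, 22_330_107):
        print(pos, "->", to_seq_position(pos))
```

The same offset also reproduces the stated gene length of 25,823 nt from the two end coordinates.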
In the present invention, the SNV of the TOP3B gene may be selected from the SNVs listed in Table 1 below, but is not limited thereto.
Figure PCTKR2020006028-appb-T000001
In the present invention, SNVs of other associated genes may additionally be included or excluded in order to select the dementia patient group more accurately.
For the diagnosis or prediction of dementia, SNV detection may additionally be performed on genes other than the TOP3B gene of the present invention, such as APOE, SORL1, CLU, PICALM, CR1, and BIN1.
Advances in next-generation sequencing have made it possible to determine an individual's genome quickly and at low cost. Studies are under way that sequence the whole genomes of specific disease groups to discover sequence variants and to clarify how those variants relate to the disease phenotype.
A “single nucleotide variant (SNV)” is a genomic variant in which a single nucleotide differs; it encompasses both single nucleotide polymorphisms (SNPs) and point mutations. There is no restriction on its frequency, and it can arise in somatic cells. A somatic single nucleotide variant (for example, one caused by cancer) is also called a single-nucleotide alteration. A single nucleotide polymorphism is a change of one specific nucleotide to another at the same position in the genomes of different people, which is expressed as a different trait; it is the most abundant type of genetic variation in the human genome. SNPs generally occur at a frequency of 1% or more in a population, and variants below 1% are classified as mutations. A point mutation is a substitution, insertion, or deletion of a single nucleotide and can prevent or alter the production of a specific protein.
SNVs are classified by their location in the genome and by their function. Depending on whether they change the amino acid sequence, they are further divided into synonymous SNVs (sSNVs), which do not change the amino acid sequence, and nonsynonymous SNVs (nsSNVs), which do.
Because each amino acid is specified by three nucleotides, an SNV arising from the substitution or insertion-deletion (indel) of a single nucleotide can, in some cases, change the amino acid sequence and affect the function of the resulting protein. When a single-nucleotide substitution leaves the amino acid sequence unchanged, it is called a synonymous SNV or silent SNV. For example, when the template-strand DNA sequence GAC becomes GAG through a C-to-G substitution, the mRNA codon changes from CUG to CUC, but both codons encode leucine. When a single-nucleotide substitution does change the amino acid sequence, it is called a nonsynonymous SNV. For example, when the codon GAU becomes GUU through an A-to-U substitution, aspartic acid is replaced by valine; because the two amino acids have very different chemical properties, the structure and function of the resulting protein can be strongly affected. An SNV that encodes a different amino acid in this way is called a missense SNV, whereas an SNV that creates a stop codon instead of another amino acid, so that a shorter-than-normal protein is produced, is called a nonsense SNV. Sickle cell anemia is a representative example of a missense SNV: the sixth codon of β-hemoglobin changes from GAG to GTG, so that acidic glutamic acid is replaced by the nonpolar amino acid valine. The oxygen-carrying capacity of hemoglobin is reduced, causing anemia, and the red blood cells take on an elongated sickle shape and stick together, which can cause pain and tissue damage through vascular occlusion. Among SNVs, indels can cause more severe changes than substitutions: an indel shifts the reading frame (frameshift), changing the amino acids translated downstream of the SNV.
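The synonymous, missense, and nonsense categories can be illustrated with a short Python sketch that translates the reference and variant mRNA codons using the standard genetic code and compares the results. The function name `classify_substitution` and the extra "stop-lost" label are assumptions made for illustration; the UAC to UAA example is a generic textbook case, not one taken from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered with U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-nucleotide substitution within one mRNA codon."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three nucleotides long")
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a stop codon is created
    if ref_aa == "*":
        return "stop-lost"  # a stop codon is removed (not discussed in the text)
    return "missense"


if __name__ == "__main__":
    print(classify_substitution("CUG", "CUC"))  # Leu -> Leu
    print(classify_substitution("GAG", "GUG"))  # Glu -> Val (sickle cell)
    print(classify_substitution("UAC", "UAA"))  # Tyr -> stop
```

Indels are deliberately left out of this sketch, since a frameshift changes every downstream codon and cannot be classified from a single codon pair.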
By location in the genome, an SNV in a protein-coding exon region is called a coding SNV (cSNV), and an SNV in a non-coding region such as an intron or a 5' or 3' untranslated region (UTR) is called a non-coding SNV (ncSNV).
Comparing the whole genomes of two different people reveals more than four million sequence variants (M W Nachman, S L Crowell, (2000) Genetics, 156 (1), 297-304), about 80% of which are single nucleotide variants (Pauline C Ng, et al., (2008) PLoS Genet, 4 (8), e1000160). Across the human genome, 81-93% of genes contain at least one single nucleotide variant (Benjamin Lehne, et al., (2011) PLoS One, 6 (6), e20133). Finding, among this enormous number of SNVs, the ones meaningfully associated with a specific disease is a major challenge. A process for selecting SNVs that are highly associated with the disease must therefore come first.
nsSNVs can affect protein folding, binding affinity, expression, post-translational modification, and other protein features, and they account for more than 85% of the sequence variants in known genetic diseases, so they attract the most attention among SNVs. However, synonymous SNVs have also been reported to be associated with specific diseases (Siyuan Zheng, et al., (2014) Cell, 156(6), 1129-1131), and untranslated RNAs or promoters altered by ncSNVs can affect transcription factor binding, gene splicing, and mRNA degradation, so their association with specific diseases cannot be overlooked either (Yanyun Ma, et al., (2014) Genet Test Mol Biomarkers, 18 (7), 516-24; Isabel De Castro-Oros, et al., (2014) BMC Med Genomics, 7, 17).
According to Korean Patent No. 10-1933847, SNPs in the APOE gene and its promoter are genetic variations found in Alzheimer's disease. APOE is a lipid-binding protein with three isoforms, APOEε2, APOEε3, and APOEε4; the APOE gene has three polymorphic alleles, e2, e3, and e4, with worldwide frequencies of 8.4%, 77.9%, and 13.7%, respectively. Of these, the e4 allele is the strongest risk factor for late-onset sporadic Alzheimer's disease. APOE e4 has been reported to promote cortical thinning in the entorhinal cortex, parahippocampal cortex, and precuneus (Markus Donix, et al., (2010) Neuroimage, 53 (1), 37-43). Another study of cognitively normal subjects suggested that e4 carriers also have reduced cortical thickness (Baptiste Fauvel, et al., (2014) Neuroimage, 90, 179-88). In addition, individuals carrying the e4 allele showed memory impairment and severe atrophy of the hippocampus, which plays an important role in memory function (Panagiotis Alexopoulos, et al., (2011) J Alzheimers Dis, 26 (2), 207-10). In Alzheimer's disease, the e4 frequency rises dramatically to about 40%. APOE is located on the long arm of chromosome 19 (19q13.2). The three alleles e2, e3, and e4 differ at amino acids 112 (Cys/Arg) and 158 (Arg/Cys), and their combinations give six genotypes (E2/E2, E2/E3, E3/E3, E2/E4, E3/E4, E4/E4). Healthy individuals homozygous for APOE e4 showed reduced hippocampal volume, whereas e4 heterozygotes did not differ from non-carriers in a healthy elderly group (M R Farlow, et al., (2004) Neurology, 63 (10), 1898-901). APOE e4 has also been shown to be involved in the conversion of healthy elderly people and people with mild cognitive impairment to Alzheimer's disease. Even so, prediction based on APOE e4 accounts for no more than 20% of the genetic contribution to dementia. The e4-mediated risk of Alzheimer's disease and its possible triggers therefore remain unclear.
Meanwhile, Korean Patent No. 10-1933847 describes not only the APOE E4 variant but also genetic variants, at the rs405509 T allele of the APOE promoter adjacent to the APOE gene, that can influence the dementia risk conferred by APOE E4. This SNP is located at position 44905579 on chromosome 19 in the human genome assembly GRCh38.p7. Polymorphisms in the APOE promoter and intron regions have been reported to modulate the effect of APOE e4 on the onset of AD (Lars Bertram, et al., (2007) Neurobiol Aging, 28 (1), 18.e1-4; J-C Lambert, et al., (2002) Neurology, 59 (1), 59-66; Francesco Lescai, et al., (2011) J Alzheimers Dis, 24 (2), 235-45). Two SNPs in the promoter (rs449647 and rs405509) and one SNP in an intron (rs440446) have been evaluated as modulating the effect of the APOE epsilon variants on Alzheimer's disease. The -491AA and -219TT genotypes have been reported to increase AD risk independently of the APOE epsilon genotype (Anna Limon-Sztencel, et al., (2016) Alzheimers Res Ther, 8 (1), 19). rs405509 has also been reported to act synergistically with APOE e4 in its effect on cognition (C Ma, et al., (2016) Eur J Neurol, 23 (9), 1415-25). Individuals carrying rs405509-TT together with APOE e4 were found to show age-dependent cortical thinning compared with carriers of the rs405509 G allele (Ni Shu, et al., (2015) Hum Brain Mapp, 36 (12), 4847-58). Individuals with the rs405509-TT genotype were found to be highly associated with Alzheimer's disease and, among e4 homozygotes, to show stronger atrophy in cortical thickness and hippocampal volume than carriers of the G allele. This pattern of cortical thinning was observed in particular in the medial temporal cortex (entorhinal and parahippocampal regions) and the precuneus, and the atrophy in the parahippocampal region is consistent with a previous study (Ni Shu, et al., (2015) Hum Brain Mapp, 36 (12), 4847-58). In addition, rs405509-TT leads to reduced APOE expression in human brain and serum.
Meanwhile, as for the genetic factors of dementia, only age, family history, and the ε4 allele of apolipoprotein E (APOE) have been established; no clear association has yet been demonstrated for other factors. Genetic studies of familial AD inherited in an autosomal dominant manner have identified mutations in amyloid precursor protein (APP), presenilin 1 (PS1), and presenilin 2 (PS2) as causative for early-onset AD (EOAD), which develops before age 65, and the APOE polymorphism as a susceptibility gene associated with late-onset AD (LOAD), which develops after age 65 (Seung Hwan Lee, Kun Woo Park, (2008) J Korean Geriatr Soc, 12(1):5-10).
The APP gene encodes a protein of 770 amino acids and is located at 21q21.1 on human chromosome 21. According to the amyloid cascade hypothesis, abnormal APP metabolism linked to β-amyloid production leads to amyloid deposition in brain tissue and cerebral blood vessels, which is thought to play an important role in the onset of Alzheimer's disease (D J Selkoe, (1991) Neuron, 6 (4), 487-98). APP is cleaved by three kinds of protein-metabolizing enzymes (α-, β-, and γ-secretase). The nonamyloidogenic product Aβ40 is readily soluble, whereas the amyloidogenic product Aβ42 has a strong tendency to deposit as β-amyloid and readily forms fibrils. Eighteen mutations have been found to date; they are known to affect enzyme activity, increase Aβ42 production, and give rise to senile plaques and neurofibrillary tangles in tissue.
The presenilin 1 (PS1) and presenilin 2 (PS2) genes have also been reported to be associated with Alzheimer's disease. The PS1 gene is located at 14q24.3 on chromosome 14, and PS2 at 1q31-q42 on chromosome 1. Presenilin (PS) is found in the nuclear membrane, endoplasmic reticulum, and Golgi, and has eight transmembrane domains. PS1 and PS2 share 67% amino acid sequence identity overall and 84% in the transmembrane domains. To date, 142 PS1 mutations and 10 PS2 mutations have been found, and these mutations are known to affect APP metabolism and increase Aβ42 production. The exact role of PS in APP metabolism is not yet known, but PS is a component of the protein complex responsible for γ-secretase activity, and it is presumed that mutations alter the structure of the complex and thereby disturb protein-protein interactions.
Mutations in the APP, PS1, and PS2 genes, which cause early-onset dementia, account for less than about 5% of Alzheimer's disease cases. For late-onset dementia, which in most cases develops after age 65, or sporadic Alzheimer's disease, polymorphism of the APOE alleles rather than gene mutation has recently drawn attention as a genetic factor. The APOE gene is located on chromosome 19 and encodes a protein involved in cholesterol transport; three alleles exist, ε2, ε3, and ε4. It has been reported that carriers of APOE ε4, whether homozygous or heterozygous, have a probability of 90% or more of developing AD by age 85 and can develop AD 10 years earlier than carriers of ε2 or ε3; the ε4 allele is found in about 15% of the general population and has been shown to be associated with about 50% of the genetic risk for AD. Studies of APOE immunoreactivity with senile plaques or neurofibrillary tangles in the brain tissue or cerebral vessels of AD patients have also found ε4 to be more strongly associated than ε2 or ε3. There is also a report that APOE reduces γ-secretase activity on APP (J Poirier, (1994) Trends Neurosci, 17 (12), 525-30). The APOE gene should be understood as a susceptibility factor rather than a determinant.
To date, studies of genes on chromosomes 9, 10, and 12 have been reported in connection with late-onset dementia. More than 100 genes are under study, for example ubiquilin 1, which is thought to affect β-amyloid deposition (Mikko Hiltunen, et al., (2006) J Biol Chem, 281 (43), 32240-53), and insulin-degrading enzyme, which is thought to be involved in β-amyloid degradation (W Q Qiu, et al., (1998) J Biol Chem, 273 (49), 32730-8). However, because studies have reported conflicting results on the relationship between these genes and late-onset dementia, no association as firm as that of APOE has been established (Alessandro Serretti, et al., (2007) J Alzheimers Dis, 12 (1), 73-92).
Meanwhile, growth factor receptor-bound protein-associated binding protein 2 (GAB2) modifies the risk of late-onset Alzheimer's disease (LOAD) in APOE ε4 carriers and influences the neuropathology of Alzheimer's disease. A meta-analysis was performed to assess more precisely whether variants of phosphatidylinositol binding clathrin assembly protein (PICALM) and sortilin-related receptor (SORL1) are associated with AD. According to this analysis, allele T of rs3851179 in PICALM was associated with a 13% increase in the risk of AD. In addition, seven SNPs in SORL1 were significantly associated with AD: four SNPs (rs1010159*T, rs641120*A, rs668387*T, and rs689021*A) were associated with a reduced risk of AD, whereas three SNPs (rs12285364*T, rs2070045*G, and rs2282649*T) were associated with an increased risk of AD. These results suggest that multiple gene variants are associated with AD (Ziran Wang, et al., (2016) Mol Neurobiol, 53 (9), 6501-6510). In summary, the SNPs rs3851179 (PICALM), rs12285364 (SORL1), rs2070045 (SORL1), and rs2282649 (SORL1) are associated with an increased risk of AD, whereas SORL1 rs1010159, rs641120, rs668387, and rs689021 are associated with a reduced risk of AD.
A “single nucleotide polymorphism (SNP)” is a variation in DNA sequence that occurs when a single nucleotide (A, T, C, or G) in the genome differs between members of a species or between the paired chromosomes of an individual. For example, when DNA fragments from different individuals differ at a single base (e.g., TGTG[G/T]AAAG, where G/T denotes the variable base), the two variants (G or T) are called alleles; in general, almost all SNPs have two alleles. Within a population, a SNP can be assigned a minor allele frequency (MAF), that is, the lowest allele frequency at a locus observed in a particular population. Variation exists between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. A single base may be changed (substitution), removed (deletion), or added (insertion) in a polynucleotide sequence, and a single-base insertion or deletion can shift the translational reading frame (frameshift).
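By way of illustration only, the minor allele frequency described above could be computed from diploid genotypes as in the following Python sketch. The function names and the five genotypes are hypothetical and are not data from the present invention; the sketch assumes a biallelic site at which both alleles are observed.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for diploid genotypes such as "GT"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the least common allele observed at the site."""
    return min(allele_frequencies(genotypes).values())


# Hypothetical genotypes of five individuals at a G/T site (10 chromosomes)
genotypes = ["GG", "GT", "GG", "TT", "GG"]
print(allele_frequencies(genotypes))
print(minor_allele_frequency(genotypes))
```

In this toy population, G is carried on 7 of 10 chromosomes and T on 3, so T is the minor allele.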
SNPs can be classified into several types according to their location in the genome and their function. Classified by location, a regulatory SNP (rSNP) is a SNP located in the promoter region of a gene that regulates expression of the gene. A SNP may also fall within the coding sequence of a gene, a non-coding region of a gene, or an intergenic region (the region between genes). A coding SNP (cSNP) is a SNP present in an exon that encodes the gene product, an intron SNP (iSNP) is a SNP located in an intron, and a genomic SNP (gSNP) is a SNP present in the intergenic region between genes.
Of these, rSNPs, which lie upstream of the exons and can regulate expression, and cSNPs, which are directly involved in changes of gene function, are highly likely to be functional SNPs that cause phenotypic changes, because changes at rSNPs and cSNPs are likely to alter the functional amino acid sequence. However, because of the redundancy of the genetic code (codon degeneracy), a SNP within the coding sequence of a gene does not necessarily change the amino acid sequence of the target protein.
Variation in human DNA sequences can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are regarded as a key enabler for realizing the concept of personalized medicine. Above all, SNPs, which have recently been actively developed as markers, are very important in biomedical research that diagnoses disease by comparing genomic regions between groups with and without the disease.
All types of SNPs can have an observable phenotype or can cause disease. SNPs in non-coding regions may be associated with a higher risk of cancer and may affect mRNA structure and disease susceptibility. Non-coding SNPs may also alter the expression level of a gene, as in an expression quantitative trait locus (eQTL). Among SNPs in coding regions, synonymous substitutions do not change the amino acid in the protein but can still affect its function in other ways. For example, a silent mutation in multidrug resistance gene 1 (MDR1), which encodes a cell membrane pump that expels drugs from the cell, can slow translation and allow the peptide chain to fold into an unusual conformation. The C1236T polymorphism in the MDR1 gene changes the GGC codon to GGT at amino acid position 412 of the polypeptide (both encode glycine) (G Gumus-Akay, et al., (2008) Genet Mol Res, 7 (4), 1193-9), and the C3435T polymorphism changes ATC to ATT at position 1145 (both encode isoleucine) (Ji Woong Sohn, et al., Tuberc Respir Dis 2005; 58:135-141). An example of a non-synonymous substitution is the c.1580G>T SNP in the LMNA gene: at position 1580 (nt) of the DNA sequence, the guanine of a CGT codon is replaced by thymine, producing a CTT codon. At the protein level, arginine is replaced by leucine at position 527, causing the protein to malfunction, and at the phenotypic level this manifests as overlapping mandibuloacral dysplasia and progeria syndrome.
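As a minimal sketch of the distinction between synonymous and non-synonymous substitutions, the following Python example translates the reference and variant codons cited above with the standard genetic code and compares the resulting amino acids. The helper name `classify_substitution` is illustrative only; it considers amino acid identity alone and does not model effects on translation rate or folding.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (one-letter amino acids, * = stop)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (reference amino acid, variant amino acid, substitution type)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


# Codon changes taken from the examples in the text
for label, ref, alt in [
    ("MDR1 C1236T", "GGC", "GGT"),
    ("MDR1 C3435T", "ATC", "ATT"),
    ("LMNA c.1580G>T", "CGT", "CTT"),
]:
    print(label, classify_substitution(ref, alt))
```

The two MDR1 changes leave glycine and isoleucine unchanged, whereas the LMNA change replaces arginine (R) with leucine (L).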
Association studies of SNPs and disease-related genes are actively under way. A SNP discovery project identified a total of 1.8 million SNPs. According to NCBI's dbSNP, which was created to systematically collect genetic variation data and make it available to researchers, more than 23.65 million SNPs had been registered in dbSNP (dbSNP build 131; as of May 2010), and the number is expected to grow with continued discovery.
SNP genotyping measures genetic variation at single nucleotide polymorphisms (SNPs) among members of a species. It is one form of genotyping, which measures genetic variation more generally. SNPs have been found to be involved in the etiology of many human diseases and are of growing interest in pharmacogenetics in particular. Because SNPs are conserved during evolution, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and, in place of microsatellites, in other studies. The use of SNPs is being extended in the HapMap project, which aims to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing (Harbron S; Rapley R (2004). Molecular analysis and genome discovery. London: John Wiley & Sons Ltd.). The growing interest in SNPs is reflected in the development of a variety of SNP genotyping methods.
Various methods are used for SNP analysis. Most methods developed and in use to date are PCR-based and analyze a limited number of SNPs across many samples, but methods that analyze a large number of SNPs simultaneously using DNA arrays, or that use high-precision instruments such as MALDI-TOF, are also widely used. SNP genotyping is based on four principles, which differ in sample preparation and detection method: allele-specific hybridization, primer extension, allele-specific oligonucleotide ligation, and cleavage (Lee Jong-geuk [이종극], Disease Genome Analysis Methods 3 (Genetic Variation and Disease)).
The main PCR-based SNP analysis methods include SSCP (single-strand conformation polymorphism), AFLP (amplified fragment length polymorphism), RFLP (restriction fragment length polymorphism), RAPD (random amplified polymorphic DNA), and AS-PCR (allele-specific PCR).
SSCP (single-strand conformation polymorphism, or single-strand chain polymorphism) is a method widely used for SNP genotyping. It is defined as the conformational difference between single-stranded nucleotide sequences of identical length that is induced by differences in sequence under specific experimental conditions. This property allows sequences to be distinguished by gel electrophoresis, which separates fragments according to their conformation (M Orita, et al., (1989) Proc Natl Acad Sci U S A, 86 (8), 2766-70). After the region of interest is amplified by PCR, the double-stranded DNA is denatured at high temperature (94℃) into single strands and then rapidly cooled so that each single strand forms its own characteristic three-dimensional structure. When the strands are then electrophoresed on a non-denaturing polyacrylamide gel, single strands that differ in sequence migrate with different mobilities. Because strands of the same length but different base composition are resolved by mobility, variants can be identified by comparing migration rates between samples.
AFLP (amplified fragment length polymorphism) was developed by Keygene in the early 1990s (“Keygene.com”. Retrieved 10 February 2013). The resulting data are scored not as length polymorphisms but as presence-absence polymorphisms (P Vos, et al, (1995) Nucleic Acids Res, 23 (21), 4407-14). Restriction enzymes are used to digest genomic DNA, and adaptors are ligated to the sticky ends of the restriction fragments. A subset of the restriction fragments is then selected for amplification. In other words, adaptors are attached to DNA fragments cut with a particular restriction enzyme that has few recognition sites, each fragment is amplified using primers designed from the adaptor sequence, and the differences in the resulting band patterns are compared. AFLP has many advantages over other marker technologies such as randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), and microsatellites. AFLP not only has higher reproducibility, resolution, and sensitivity at the whole-genome level than other techniques (UG Mueller, LL Wolfenbarger, (1999) Trends Ecol Evol, 14 (10), 389-394), but can also amplify 50 to 100 fragments at a time. In addition, no prior sequence information is needed for amplification (Heidi M Meudt, Andrew C Clarke, (2007) Trends Plant Sci, 12 (3), 106-17). It can be applied to lineages in which polymorphism is rare, and it can amplify restriction fragments whose terminal DNA sequences are unknown.
RFLP (restriction fragment length polymorphism) is a method of typing SNPs by detecting differences in DNA fragment length after treatment with a restriction endonuclease. It is used when the SNP site within a PCR-amplified DNA fragment can be distinguished by a specific restriction enzyme. The SNP in the amplified fragment changes the sequence of the restriction site for that enzyme, so the two SNP alleles yield fragments of different lengths that can be readily identified on an agarose gel. Many restriction enzymes are commercially available, and software that finds recognition sites in a sequence of interest is freely available on the web, so the method is easy to use. However, 30 to 40% of SNPs do not lie in a restriction site; to address this, a 1 to 2 bp change is sometimes introduced into the primer to create an artificial restriction site for typing (primer mutagenesis).
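The principle of RFLP typing can be sketched as follows. The two amplicon sequences are hypothetical and differ only at a single base that lies inside an EcoRI recognition site (GAATTC, cut after the first G); the helper `digest` is illustrative and simply reports the fragment lengths that would be seen on a gel.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return fragment lengths.

    `cut_offset` is the position of the cut within the recognition site.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical 26-bp amplicons; the SNP (A>G) destroys the EcoRI site in allele 2
allele_1 = "ACGTACGTAC" + "GAATTC" + "TTGACCATGG"
allele_2 = "ACGTACGTAC" + "GAGTTC" + "TTGACCATGG"

print(digest(allele_1, "GAATTC", 1))  # site present: two fragments
print(digest(allele_2, "GAATTC", 1))  # site absent: one uncut fragment
```

A heterozygous sample would show all three bands (the uncut amplicon together with the two digestion products).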
RAPD (random amplified polymorphic DNA) is a type of PCR in which the amplified DNA segments are random. Arbitrary short primers (8 to 12 bp) are used to amplify only those regions that match by sequence complementarity. The method is very simple, since it only requires examining the pattern of DNA fragments on an agarose gel. However, because very short primers can drive amplification with only about 70% homology to the DNA, extremely careful control of experimental conditions is required. To overcome this drawback, the terminal sequences of the amplified region can be determined and specific primers synthesized from them; reproducibility is then no longer a problem, so the method is fully usable for linkage analysis studies.
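To illustrate why short RAPD primers demand careful conditions, the following sketch scans a hypothetical template for every window whose identity to a 10-base primer meets a threshold. For simplicity the primer is compared directly with the displayed strand rather than with its complement, and the 70% threshold is taken from the text; the sequences and the function name are assumptions.

```python
def binding_sites(template, primer, min_identity=0.7):
    """Return (position, identity) for each window matching the primer well enough."""
    k = len(primer)
    sites = []
    for i in range(len(template) - k + 1):
        window = template[i:i + k]
        identity = sum(a == b for a, b in zip(window, primer)) / k
        if identity >= min_identity:
            sites.append((i, identity))
    return sites


primer = "ACGTTGCAAT"
# Hypothetical template: one perfect site and one site with two mismatches
template = "GG" + "ACGTTGCAAT" + "CCCC" + "ACGATGCTAT" + "GG"

print(binding_sites(template, primer))        # tolerant conditions
print(binding_sites(template, primer, 1.0))   # stringent conditions
```

Under the tolerant threshold the imperfect site also qualifies, which is the source of the poor reproducibility noted above.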
AS-PCR (allele-specific polymerase chain reaction) is an application of PCR that can directly detect any point mutation in DNA by analyzing the PCR products on an ethidium bromide-stained agarose or polyacrylamide gel (Luis Ugozzoli, R. Bruce Wallace, Allele-specific polymerase chain reaction, Methods, Volume 2, Issue 1, February 1991, Pages 42-48). It is based on the requirement that the 3' end of a primer be complementary to the DNA template for PCR amplification to occur. For an A (adenine)/C (cytosine) SNP, for example, one primer whose 3' end is A and another whose 3' end is C are prepared; because only DNA complementary to each primer is amplified, SNP typing becomes possible.
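The logic of allele-specific priming can be sketched in idealized form as follows. The flanking sequences are hypothetical, primers are written in the same orientation as the displayed strand, and amplification is assumed to occur only when the whole primer, including its 3' terminal base, matches the target perfectly; real reactions tolerate some mismatches and need optimized conditions.

```python
def amplifies(target, primer):
    """Idealized rule: a product forms only on a perfect match including the 3' base."""
    return primer in target


def as_pcr_genotype(chromosomes, primers):
    """Return the alleles whose allele-specific primer yields a product."""
    return sorted(
        allele
        for allele, primer in primers.items()
        if any(amplifies(chromosome, primer) for chromosome in chromosomes)
    )


# Hypothetical locus with an A/C SNP between two flanking sequences
upstream, downstream = "GGCTTACGATTACA", "TTGCCGTA"
allele_a = upstream + "A" + downstream
allele_c = upstream + "C" + downstream

# Two primers that share their 5' part and differ only at the 3' terminal base
primers = {"A": upstream[-10:] + "A", "C": upstream[-10:] + "C"}

print(as_pcr_genotype([allele_a, allele_a], primers))  # homozygous A
print(as_pcr_genotype([allele_a, allele_c], primers))  # heterozygous
print(as_pcr_genotype([allele_c, allele_c], primers))  # homozygous C
```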
In addition, analysis by real-time PCR using fluorescent dyes is also used.
In one embodiment of the present invention, 132 candidate genes associated with hereditary neurological diseases, including TOP3B, were selected and subjected to targeted sequencing. Sequencing libraries were prepared with TruSeq Nano DNA Library Prep Kits (Illumina, San Diego, CA, USA), and xGen Lockdown Probes (IDT, Coralville, IA, USA) were used for targeted enrichment of the 132 genes. After purification according to the Agencourt AMPure protocol (Beckman Coulter), the libraries were amplified with Illumina P5 and P7 primers, then purified and quantified using qPCR and the KAPA Library Quantification Kit (KAPA Biosystems, Boston, MA, USA). Finally, NGS analysis of the post-enrichment libraries was performed on an Illumina NextSeq 550.
In the present invention, the step of detecting the SNV of the TOP3B gene may be characterized by amplifying the gene and analyzing gene mutations using sequencing data of the amplified product, but is not limited thereto.
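As a highly simplified sketch of analyzing sequencing data for single nucleotide variants, the following example compares an already aligned sample sequence with a reference of the same length and reports each mismatch. The sequences are hypothetical and are not TOP3B sequences, and the function name is illustrative; an actual analysis would involve read alignment, quality filtering, and a variant caller, none of which is modeled here.

```python
def find_snvs(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, ref_base, alt_base)
        for index, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Hypothetical aligned sequences differing at one position
reference = "ATGGCCTTAGC"
sample = "ATGGTCTTAGC"

print(find_snvs(reference, sample))
```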
In the present invention, the sequencing may be Sanger sequencing or next generation sequencing (NGS).
In the present invention, the step of detecting the SNV of the TOP3B gene may use primers for the TOP3B gene, and may use one or more pairs of primers as a primer set. Any sequence capable of amplifying the TOP3B gene may be used as a primer without limitation; preferably, the primers are a primer set capable of amplifying any one or more of the SNVs of the TOP3B gene listed in Table 1 above, but are not limited thereto.
In the present invention, the step of detecting the SNV of the TOP3B gene may use a probe for the TOP3B gene, and the probe may be a probe that binds complementarily to a region containing an SNV position of the TOP3B gene listed in Table 1 above, but is not limited thereto. In the present invention, a reporter may be attached to the 5' end of the probe, and other fluorescent substances may be attached instead, without limitation. For example, the reporter may be one or more selected from the group consisting of FAM, JOE, BHQ1, VIC, TAMRA, ROX, NED, HEX, TET, fluorescein, fluorescein chlorotriazinyl, rhodamine green, rhodamine red, tetramethylrhodamine, FITC, Oregon green, Alexa Fluor, Texas Red, cyanine dyes, and thiadicarbocyanine dyes. In addition, Black Hole Quencher-1 (BHQ-1) may be attached to the 3' end of the probe as a quencher, and other substances usable as a quencher may be attached instead, without limitation. For example, the quencher may be one or more selected from the group consisting of Dabcyl, TAMRA, Eclipse, DDQ, QSY, Blackberry Quencher, Qxl, Iowa Black FQ, Iowa Black RQ, and IRDye QC-1.
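As a non-limiting illustration of a probe that binds complementarily to a region containing an SNV position, the following sketch takes a hypothetical target sequence, substitutes the variant base, and returns the reverse complement of a short window around it. The sequence, the window size, and the FAM/BHQ-1 labeling in the printed string are assumptions chosen for illustration; the actual SNV positions are those of Table 1, and practical probes are longer and are designed with melting temperature in mind.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_for_snv(sequence, snv_index, alt_base, flank=5):
    """Probe (5'->3') complementary to the variant allele, `flank` bases either side.

    `snv_index` is the 0-based position of the SNV in `sequence`.
    """
    start, end = snv_index - flank, snv_index + flank + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNV")
    target = sequence[start:snv_index] + alt_base + sequence[snv_index + 1:end]
    return reverse_complement(target)


# Hypothetical target strand with an A>G variant at index 8
target_strand = "ACGTACGGATCCTTAGCA"
probe = probe_for_snv(target_strand, 8, "G")
print(f"5'-FAM-{probe}-BHQ1-3'")
```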
A person of ordinary skill in the art to which the present invention pertains can readily design a primer set capable of specifically amplifying the SNV positions listed in Table 1 above and a probe that binds complementarily to a region containing an SNV position of the TOP3B gene listed in Table 1 above. The primer set and probe can be used in real-time PCR (real-time polymerase chain reaction), and more preferably in multiplex real-time PCR.
In the present invention, the step of detecting the SNV of the TOP3B gene may be characterized by analysis using polymerase chain reaction, nuclease digestion, hybridization, Southern blotting, restriction enzyme fragment polymorphism, primer extension, single-stranded conformation polymorphism, or a combination of these methods, but is not limited thereto. Known molecular biology methods may also be used in combination for the analysis.
In the present invention, “multiplex PCR” means that two or more primer sets are used in a single amplification reaction.
In the present invention, the biological sample may be a nucleic acid sample isolated from a sample selected from the group consisting of blood, hair, saliva, urine, semen, vaginal cells, oral cells, amniotic fluid containing placental cells or fetal cells, and mixtures thereof. The nucleic acid may be genomic DNA, cfDNA (cell-free DNA), RNA, or microRNA, but is not limited thereto.
The nucleic acid can be obtained by conventional methods known in the art. For example, DNA may be isolated by treating the tissue with a DNA lysis buffer (e.g., one containing Tris-HCl, EDTA, EGTA, SDS, deoxycholate, and Triton X and/or NP-40), but the method is not limited thereto.
In one embodiment of the present invention, the genomic DNA fragments may be obtained by steps comprising: (i) isolating cellular DNA from the test sample; and (ii) fragmenting the cellular DNA to obtain the genomic DNA fragments.
The term “amplification” as used herein refers to a reaction that amplifies a nucleic acid molecule. Various amplification reactions have been reported in the art, including, but not limited to, polymerase chain reaction (PCR) (US 4,683,195, 4,683,202, and 4,800,159), reverse transcription-polymerase chain reaction (RT-PCR) (Sambrook et al., Molecular Cloning. A Laboratory Manual, 3rd ed. Cold Spring Harbor Press (2001)), the methods of WO 89/06700 and EP 329,822, ligase chain reaction (LCR), Gap-LCR (WO 90/01069), repair chain reaction (EP 439,182), transcription-mediated amplification (TMA, WO 88/10315), self-sustained sequence replication (WO 90/06995), selective amplification of target polynucleotide sequences (US 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR, US 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR, US 5,413,909 and 5,861,245), nucleic acid sequence based amplification (NASBA, US 5,130,238, 5,409,818, 5,554,517, and 6,063,603), strand displacement amplification (21, 22), and loop-mediated isothermal amplification (LAMP) (23). Other amplification methods that can be used are described in US 5,242,794, 5,494,810, and 4,988,617 and in US 09/854,317.
PCR is the best-known nucleic acid amplification method, and many modifications and applications of it have been developed. For example, touchdown PCR, hot start PCR, nested PCR, and booster PCR were developed by modifying the traditional PCR procedure to improve its specificity or sensitivity. In addition, real-time PCR, differential display PCR (DD-PCR), rapid amplification of cDNA ends (RACE), multiplex PCR, inverse polymerase chain reaction (IPCR), vectorette PCR, and thermal asymmetric interlaced PCR (TAIL-PCR) were developed for specific applications.
Details of PCR are described in McPherson, M.J., and Moller, S.G., PCR, BIOS Scientific Publishers, Springer-Verlag New York Berlin Heidelberg, N.Y. (2000), the teachings of which are incorporated herein by reference.
SNV or SNP analysis can be performed by DNA sequencing. DNA sequencing means determining the order of the nucleobases in the nucleotides that make up DNA. DNA has a double-helix structure, and each single strand has a 5′ end and a 3′ end. In general, DNA is synthesized by DNA polymerase in the 5′ to 3′ direction. Attempts to determine DNA sequences by exploiting this property go back many years, and DNA sequencing became practical through two methods developed almost simultaneously in 1977. The first is the Sanger method, which determines the sequence through DNA chain termination using dideoxynucleotide triphosphates (ddNTPs). The other is the Maxam-Gilbert method, which cleaves DNA at specific bases with chemical reagents and analyzes the resulting fragments.
Sanger sequencing is simple and uses less toxic reagents, so it spread more quickly than the Maxam-Gilbert method (Maxam and Gilbert, 1977) developed in the same period, and later methods were derived from it. The technique is based on DNA polymerization: a single-stranded region of the DNA to be sequenced serves as the template, and a short oligonucleotide complementary to the template serves as the primer that initiates synthesis. When a dideoxynucleotide triphosphate (ddNTP) is incorporated during polymerization, extension of the DNA chain stops. In a dideoxynucleotide, the -OH group at the 3′ position of the sugar of a normal nucleotide is replaced by -H. ddNTPs can be incorporated into the growing DNA chain during normal synthesis. Once incorporated, however, a ddNTP has no 3′ -OH group, so the next nucleotide cannot be attached and the extension reaction terminates.
The reaction uses four separate test tubes. Every tube contains the four dNTPs (dATP, dTTP, dGTP, dCTP) that are the building blocks of DNA. Each tube also contains a small amount of one ddNTP chain terminator: ddATP in the first tube, ddTTP in the second, ddGTP in the third, and ddCTP in the fourth. To allow later detection, either one of the dNTPs or the primer must be radioactively labeled (32P). For example, ddGTP is incorporated at random at G positions, so in principle it can be incorporated at any G position. Each DNA chain synthesized in this tube ends at a G position, so the lengths of the synthesized chains reveal where G occurs. Likewise, chains terminate at A positions in the A tube, at T positions in the T tube, and at C positions in the C tube, so each tube yields a set of DNA fragments of different lengths. After the reaction, the DNA in each tube is denatured so that the newly synthesized strands separate from the template. The A, T, G, and C reactions are run in separate lanes of an electrophoresis gel, and the DNA fragments separated by length are visualized by autoradiography. Reading the bands in the adjacent A, C, G, and T lanes in order of migration position gives the DNA sequence.
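The four-tube procedure described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the disclosed method: the template sequence and all function names are hypothetical, and the model ignores reaction kinetics, assuming only that each tube produces one fragment length for every position of its base in the newly synthesized strand.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5: str) -> str:
    """Strand built 5'->3' by the polymerase while reading the template 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def run_four_tubes(new_strand: str) -> Dict[str, List[int]]:
    """For each ddNTP tube, list the fragment lengths at which chains can terminate."""
    lanes: Dict[str, List[int]] = {base: [] for base in "ATGC"}
    for length, base in enumerate(new_strand, start=1):
        # A chain that incorporates dd<base> at this position stops at this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read bands across the four lanes from the shortest to the longest fragment."""
    band_to_base = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(band_to_base[length] for length in sorted(band_to_base))


template = "TACGGATC"  # hypothetical template, written 3'->5'
lanes = run_four_tubes(synthesized_strand(template))
print(lanes)
print(read_gel(lanes))
```

Each lane holds the fragment lengths for one terminator, and sorting all bands by length reproduces the order in which they would be read from the gel.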
In the early Sanger method, the generated DNA fragments had to be separated by electrophoresis on a polyacrylamide slab gel and then read by radioactivity in a separate step, so the procedure was long and complicated and required considerable time and labor (Sun-Il Kwon, (2012) Korean J Clin Lab Sci, 44(4): 167-177; F Sanger, et al., (1977) Proc Natl Acad Sci U S A, 74(12): 5463-7).
To address these problems of the early Sanger method, fluorescent labels were introduced and combined with capillary electrophoresis, partially automating the reaction and detection steps (automated sequencing technology, i.e., first-generation sequencing). Using fluorescent labels as markers that distinguish the individual ddNTPs allows sequencing to be carried out in a single tube, and capillary electrophoresis raised throughput by greatly increasing the number of capillaries that can be run in parallel. In addition, not only the sequencer but also the peripheral equipment became automated, so that much of the cloning and sequencing work previously done by hand was automated.
Despite this automation, the approach still relied on Sanger chain-termination sequencing, so determining the vast human genome still required enormous time and cost. To sequence individuals and connect the results to medicine and other industries, a breakthrough technology capable of sequencing quickly and cheaply was needed. To solve this problem, approaches were tried that either eliminate the complex steps that had become bottlenecks or process the time-consuming steps in bulk at the same time.
In one embodiment of the present invention, 132 candidate genes associated with neurological genetic diseases, including TOP3B, were selected and subjected to targeted sequencing. The sequencing library was prepared with TruSeq Nano DNA Library Prep Kits from Illumina (San Diego, CA, USA), and xGen Lockdown Probes from IDT (Coralville, IA, USA) were used for targeted enrichment of the 132 genes. After purification according to the Agencourt AMPure protocol of Beckman Coulter, the library was amplified with Illumina P5 and P7 primers and then purified and quantified by qPCR with the KAPA Library Quantification Kit (KAPA Biosystems, Boston, MA, USA). Finally, NGS analysis of the post-enrichment library was performed on an Illumina NextSeq 550.
In Illumina's next-generation sequencing method, currently the most widely used, DNA is extracted from a specimen and mechanically fragmented, and a library of a specific size is prepared for sequencing. A high-throughput sequencer generates raw sequencing data by repeating, one base at a time, cycles of binding and removal of the four complementary nucleotides. The raw data are then analyzed by bioinformatics steps: trimming of the raw data, mapping, identification of genomic variants, and annotation of the variant information.
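The analysis steps named above (trimming, mapping, and variant identification) can be sketched on toy data. This is a deliberately simplified illustration and not the pipeline used in the embodiment: the reference, the reads, the quality values, the thresholds, and all function names are hypothetical, and production workflows use dedicated aligners and variant callers rather than the brute-force mapping shown here.

```python
from collections import Counter, defaultdict


def trim_read(seq, quals, min_q=20):
    """Trim the 3' end back to the last base whose quality is at least min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def map_read(read, reference, max_mismatches=1):
    """Return the leftmost offset with the fewest mismatches, or None if none qualifies."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return None if best is None else best[0]


def call_snvs(reads, reference, min_depth=3, min_fraction=0.3):
    """Pile up mapped bases and report (pos, ref, alt, alt_count, depth) tuples."""
    pileup = defaultdict(Counter)
    for seq, quals in reads:
        trimmed = trim_read(seq, quals)
        if not trimmed:
            continue
        offset = map_read(trimmed, reference)
        if offset is None:
            continue
        for i, base in enumerate(trimmed):
            pileup[offset + i][base] += 1
    calls = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        for base, count in pileup[pos].items():
            if base != reference[pos] and depth >= min_depth and count / depth >= min_fraction:
                calls.append((pos, reference[pos], base, count, depth))
    return calls


reference = "ACGTTGCAAGCTTAGC"  # hypothetical reference
reads = [
    ("TTGCGAGC", [30] * 8),
    ("GCGAGCTT", [30] * 8),
    ("TGCAAGCT", [30] * 8),
    ("CGAGCTTATT", [30] * 8 + [5, 5]),  # low-quality tail is trimmed before mapping
]
print(call_snvs(reads, reference))
```

Positions are 0-based offsets into the toy reference. Annotation, the last step mentioned in the text, would attach gene and consequence information to each reported tuple and is omitted here.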
Next-generation sequencing identifies genomic variants that affect, or are likely to affect, diseases and various biological phenotypes, and thereby contributes to creating new value through the development and commercialization of innovative therapeutics. It can be applied not only to DNA but also to RNA and to methylation analysis, and whole-exome sequencing (WES), in which only the protein-coding exome regions are captured and sequenced, is also possible.
In NGS, library preparation is the process of preparing the library needed for sequencing by ligating adapters to the 5′ and 3′ ends of random DNA or cDNA fragments from a sample. Early NGS library preparation required a complex series of steps, including random fragmentation of the DNA or RNA sample, repair of the 3′ and 5′ ends, adapter ligation, PCR amplification, and purification, and took one to two days. Illumina improved on this by developing tagmentation methods such as “Nextera XT DNA Library Preparation”. In this method, the sample DNA is treated with a complex in which a tag (the conventional adapter) is bound to a transposome, so that fragmentation and adapter attachment occur simultaneously, followed by PCR amplification. This reduced the time needed to prepare libraries from 8 samples to 3 hours.
The technology originally called next-generation sequencing (NGS) corresponds, in terms of automation, to second-generation technology. The name NGS distinguishes it from the first automated instruments that preceded it and from the later “Next NGS” instruments (also called next-next-generation or third-generation NGS). However, as competition to develop efficient sequencing technology has accelerated, and as sequencing technologies based on new techniques and on the intended use of each platform have continued to appear, the boundaries between generations have blurred. NGS is now used in a broad sense that covers all sequencing technologies developed after automated Sanger sequencing.
The techniques introduced in NGS fall into three broad groups: clonal amplification, massively parallel processing, and new directly readable (non-Sanger) sequencing chemistries (base/color calling). Clonal amplification removes the library construction step and thereby eliminates cloning. Massively parallel processing handles hundreds of thousands of clones simultaneously, which improves efficiency. The new directly readable sequencing chemistries eliminate capillary electrophoresis.
Clonal amplification simplified the process of obtaining template clones. Sanger sequencing requires template DNA about 500 base pairs long. A BAC library must be constructed, short fragments must be cloned by subcloning, and the clones must then be amplified in bacteria. The new approach eliminates both the laborious library construction and the cloning steps: the DNA is cut directly into suitably short fragments, which are amplified by PCR with primers to obtain template clones. Strategies used for clonal amplification include bead-based, solid-state, and DNA nanoball generation methods.
Bead-based clonal amplification uses emulsion PCR. In emulsion PCR, a DNA library, which is the collection of fragments obtained by fragmenting genomic DNA, is spatially separated into small aqueous droplets in oil and then amplified within the emulsion together with microbeads whose surfaces carry one of the PCR primers. As a result, each bead carries more than one million clonal DNA copies derived from a single DNA fragment. A representative solid-state method is bridge amplification. In bridge amplification, adapter oligonucleotides are ligated to both ends of the fragmented DNA, and when the DNA is flowed over the surface of a glass flow cell it binds at random to surface-immobilized primers complementary to the adapters. When PCR is performed in this state, the free end of each immobilized DNA molecule binds to a nearby free primer, forming a bridge, and amplification proceeds. As amplification continues, clusters form that play the same role as the beads described above.
NGS adopts a massively parallel approach in which the clones are arrayed on a planar surface and sequenced together. Because the number of template clones is very large, preparing them individually is time-consuming. Reading the sequence signal from each template is another serious factor that limits efficiency. Processing hundreds of thousands of different clones in a massively parallel manner reduces the time dramatically.
To eliminate the cumbersome electrophoresis step, new methods departing from the Sanger method were developed, in which a reaction is carried out on the template and the sequence of each template is read directly from the signal produced by the reaction. The sequencing chemistries that replace the Sanger method are broadly divided into sequencing by ligation (SBL), which relies on DNA ligation, and sequencing by synthesis (SBS), which relies on polymerization.
The SBL method relies on repeated ligation of DNA fragments. An anchor of n bases is hybridized to the template DNA, and probes are added to the DNA library slide on which the beads or clusters described above are deposited. Each probe carries two encoded bases, identified by a fluorescent label, followed by degenerate or universal bases. The probe whose two encoded bases are complementary to the template DNA immediately adjacent to the anchor is ligated to the anchor, and the two encoded bases are identified by fluorescence imaging of the slide. Once the two bases have been read, the degenerate bases and the fluorescent label are removed and the probe-addition step is repeated. In addition to the anchor of n bases, anchors of n+2 and n+4 bases are used, and the analysis is repeated to determine the sequence of the entire template DNA fragment.
SBS is further divided into cyclic reversible termination (CRT) and single nucleotide addition (SNA).
The CRT method uses a process similar to automated Sanger sequencing. A mixture of primers, DNA polymerase, and modified nucleotides is added to a slide carrying DNA clusters amplified by a solid-state method. The modified nucleotides are blocked with a 3′-O-azidomethyl group so that no further polymerization can occur, and each carries a base-specific fluorescent label that can later be removed. After incorporation, unincorporated nucleotides are washed away and the base is identified by imaging with a total internal reflection fluorescence (TIRF) microscope. Once the base has been identified, the fluorescent label is cleaved and the 3′-OH is regenerated with the reducing agent tris(2-carboxyethyl)phosphine (TCEP). Repeating this cycle determines the sequence of the template DNA without electrophoresis.
The SNA method determines the sequence by converting by-products, such as ions, released when DNA polymerase adds a single nucleotide into a detectable signal such as light. It is typified by the pyrosequencing method used in the Roche 454 instruments, in which the pyrophosphate released when a nucleotide is incorporated is read out as light. The four dNTPs (A, G, T, C) are added one at a time, allowed to react, and washed away in repeated cycles. Light is emitted whenever incorporation occurs, and the sequence is determined from this signal.
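The cyclic addition of one dNTP at a time can be modeled as a flowgram: a list of (nucleotide, signal) pairs in which the signal is proportional to the number of bases incorporated during that flow. The sketch below is illustrative only. The flow order `TACG`, the example strand, and the function names are assumptions, and the model is idealized (noise-free signal, exactly one unit of light per incorporated base).

```python
from itertools import cycle


def simulate_flowgram(new_strand, flow_order="TACG"):
    """Return (nucleotide, incorporated_count) for each flow until the strand is complete."""
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not present in flow order: {sorted(unknown)}")
    flows = []
    position = 0
    for nucleotide in cycle(flow_order):
        if position >= len(new_strand):
            break
        run = 0
        # A homopolymer run is incorporated within a single flow,
        # so the light signal scales with the run length.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            run += 1
            position += 1
        flows.append((nucleotide, run))
    return flows


def decode_flowgram(flows):
    """Rebuild the synthesized sequence from the per-flow signal intensities."""
    return "".join(nucleotide * int(round(signal)) for nucleotide, signal in flows)


flows = simulate_flowgram("TTAGC")  # hypothetical synthesized strand
print(flows)
print(decode_flowgram(flows))
```

A flow with signal 0 means the dispensed nucleotide was not complementary to the next template base, and a signal of 2 or more indicates a homopolymer run.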
Representative instruments using SBL include the SOLiD series of the former Life Technologies, and representative instruments using SBS include the Illumina HiSeq series (CRT method) and the Roche 454 series (SNA method).
The NGS technologies mentioned above share three features: they discard complex library construction and cloning in favor of clonal amplification, they adopt massively parallel sequencing to process large numbers of templates at once, and they determine the sequence by non-Sanger chemistries that eliminate the cumbersome electrophoresis step. The sequence information obtained from the DNA fragments consists of short reads, as in shotgun sequencing. An algorithm aligns these reads by computer, finds the overlapping regions, and assembles the complete sequence.
Shotgun sequencing is a method for efficiently determining the sequence of a large DNA target. A library of DNA fragments yielding short reads (1 kb or less) is prepared, and, based on the sequences of these short reads, an algorithm maps and arranges the overlapping portions of the reads to obtain the sequence of the entire DNA under analysis. Because short reads are used, the sequences of the DNA fragments can be obtained quickly, but a high-performance computer is required, and reliability drops sharply when the overall target is large. In addition, repetitive and complex regions have been difficult to assemble and resolve with the shotgun approach.
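The overlap-and-merge idea behind shotgun assembly can be shown with a greedy toy assembler. This is a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats longer than the minimum overlap). The reads and function names are hypothetical, and real assemblers use overlap or de Bruijn graphs rather than this quadratic search.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest suffix-prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps, leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]  # hypothetical short reads
print(greedy_assemble(reads))
```

The difficulty with repetitive regions noted in the text shows up directly in this model: when a repeat is longer than the reads, several merges have equally good overlaps and the greedy choice can join the wrong pair.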
Although the capability and speed of NGS instruments have improved greatly, they still fall well short of the practical genome sequencing cost target of $1,000 per person that could open the era of personalized medicine. With research support from the National Human Genome Research Institute (NHGRI) and the competitive efforts of many organizations, NGS instruments based on new principles and concepts that go beyond existing NGS are being developed (third-generation and later NGS).
To overcome the shortcomings of the NGS technologies described above, technologies that read long reads of 1 kb or more are emerging, and, to offset the resulting increase in sequencing time, approaches that use a single DNA molecule as the template without clonal amplification have been studied. The techniques introduced in Next NGS instruments include the use of single-DNA-molecule templates, which eliminates clonal amplification, and base-detection reactions with increased sensitivity that exploit the various signals generated during synthesis or degradation (current, light, hydrogen ions, and the like).
First, the new NGS technologies overcome the limitations of the NGS technologies described above by sequencing directly from a single DNA molecule without amplification. In short-read NGS, a single DNA fragment first had to be clonally amplified, because the number of template copies had to be large enough for the sequencing reaction to produce an optical signal that a high-speed camera could capture. The new NGS technologies instead read the sequence from a single DNA molecule: the DNA is reacted as a single molecule and the sequence is read in real time. This avoids the errors and biased amplification that can arise during clonal amplification by PCR, which improves accuracy, and it removes one step from the overall process, which further increases sequencing speed. In addition, because only one DNA molecule takes part in the reaction, the quantities of reagents needed for sequencing, such as DNA polymerase and dNTPs, are greatly reduced, which can also lower cost substantially.
Second, the types of reactions used for base detection have diversified. For example, the SMRT (single molecule, real-time) technology of Pacific Biosciences uses a single DNA molecule as the template, synthesizes the complementary strand with a DNA polymerase, and determines the sequence in real time by detecting the reaction at each base as a change in fluorescence wavelength. The Oxford Nanopore sequencer reads each base from the change in electrical signal that occurs when a single base cleaved by an exonuclease passes through the pore.
Pacific Biosciences developed a single-DNA-molecule sequencing method called SMRT (single molecule, real-time) technology. A single molecule of DNA polymerase is bound to the bottom of the analysis chip, where it carries out polymerization on the template DNA, and the reaction is detected in real time to read the sequence. A fluorescent label is attached to the terminal phosphate of each nucleotide. When a base is incorporated, the label is released and its fluorescence signal ceases, and this event is detected in real time to determine the sequence.
The base-calling approach of the Oxford Nanopore sequencer is exonuclease sequencing: instead of detecting a signal from DNA synthesis on the template, it cleaves nucleotides from the template and identifies each released nucleotide. The nanopore is a channel through which current flows. When a released nucleotide passes through the nanopore, a different current is produced for each of the bases A, T, G, and C, and the method detects this change in electrical signal. The Oxford Nanopore sequencer is an innovative ultra-compact instrument that eliminates both PCR amplification and fluorescence image processing. Such nanopores are made from proteins that span a membrane, for example α-hemolysin. α-Hemolysin is a heptameric protein pore whose inner diameter matches that of a single DNA molecule. In addition to protein nanopores, solid-state nanopores fabricated from materials such as graphene are being developed to produce more precise and specific nanopores.
Whole genome sequencing (WGS) and resequencing, which once required a great deal of time and effort, can now be carried out effectively with fewer resources using NGS. Because of this efficiency, NGS is also used in many areas, including studies of genome structure, genetic variation, differential gene expression, and transcriptional regulation.
To date, various methods for SNP-based genotyping have been reported. Among the NGS-based SNP analysis methods, restriction enzyme-based RAD-seq (Restriction site Associated DNA sequencing) was developed first. A representative analysis program is Stacks, published by Julian M. Catchen et al., which has been used to identify SNPs in individuals and populations. However, the RAD-seq protocol is experimentally complex, and it is relatively inefficient because a large amount of genome sequencing is needed to obtain good-quality results.
GBS was introduced to overcome these drawbacks; it can achieve results comparable to RAD-seq with a relatively small amount of sequencing. GBS is one of the more recent NGS methods, designed to detect SNP genotypes across various crop species and individuals. Unlike other genotyping techniques, GBS can map a large number of SNP markers to a reference genome at low cost. The first step in a GBS analysis is to analyze the genome and select the most effective restriction enzyme, so that repetitive regions are avoided while the main regions of the genome are captured. The genome is then digested with the restriction enzyme, and all fragments cut by the enzyme at both ends are sequenced. This approach reduces cost and time because a consistent subset of sites spread widely across the genome can be analyzed at high coverage without sequencing the entire genome. GBS shares the basic principle of using restriction enzymes with methods such as Reduced Representation Library (RRL) and RAD-seq, but library preparation is simpler because no fragment size selection is performed after digestion.
Among the available analysis pipelines, the most widely used for GBS is TASSEL (Trait Analysis by aSSociation, Evolution and Linkage), developed by the Buckler lab at Cornell University, which currently gives the most stable and best results. TASSEL is a Java-based program for genotype analysis using genome and restriction enzyme information, such as GBS data, and serves as a population and quantitative genetics tool for evaluating associations between genotypes and traits. TASSEL consists of two main pipelines, Discovery and Production. The Discovery pipeline takes FASTQ-format sequence data that have been barcoded and digested with a restriction enzyme, extracts tags (genome fragments of fixed length), maps them to the reference genome, and detects SNPs from the mapped data. The Production pipeline takes the FASTQ-format files together with the data mapped by Discovery and generates the final genotype information for multiple samples in HapMap data format (Jeong-Ho Baek, et al., (2015) J. Korea Inst. Inf. Commun. Eng., Vol. 19, No. 10: 2491-2499).
At present, biomarker search and discovery based on genomic data (SNP chips or next-generation sequencing) relies on single nucleotide polymorphisms (SNPs). The procedure for computing these single nucleotide polymorphisms is called SNP calling.
In SNP calling, statistics are applied to the allele data at each position to determine the SNPs.
Accordingly, biomarker discovery and detection techniques are used in disease association studies and disease linkage studies that draw on the nucleotide polymorphism information of normal and patient groups.
Meanwhile, processing the image information from NGS and SNP chip data yields quantities such as allele differences, signal intensity, allelic imbalance, and quality scores. Variant calling is performed on these continuous-variable data, and the resulting classified information (SNV, CNV, allele orientation, and INDEL) is used to select markers that differ between normal and disease states.
Here, data classified into genotypes and similar classes are categorical, that is, discrete, variables. Because categorical variables lose much of the information carried by continuous variables, the power to detect and discover biomarkers tends to fall in disease association and linkage studies of conditions attributable to rare alleles, such as cancer, rare diseases, and chronic diseases.
In general, for SNV calling from next-generation sequencing (NGS) or SNP chip data, oligonucleotides (short sequence fragments) are deposited on a chip in large numbers by chemical methods. During sequencing or genotyping, the fragmented sample DNA is hybridized to the DNA fragments attached to the chip, and the signal intensity indicating how well the two are bound is quantified. For SNP chip data, the quantified signal intensity is expressed as several hundred to several thousand values per base.
The SNP chips from Illumina and Affymetrix in common use today integrate one million SNPs on a single chip. If about 1,000 signal values are produced at each of the 1 million allele positions, this gives 1M × 1,000 values, that is, 1 billion values per genome, and processing 10,000 individuals in this way gives 10 trillion values. The resulting data size is about 5-10 TB.
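The multiplication in this estimate can be checked directly. The following is a minimal sketch: the counts are taken from the paragraph above, while the encoded size per value (0.5 to 1 byte) is an assumption introduced here only to show that 10^13 values are consistent with the stated 5-10 TB.

```python
# Back-of-the-envelope check of the raw signal volume described above.
snp_positions = 1_000_000      # SNPs integrated on one chip
signals_per_position = 1_000   # approximate signal values per position
individuals = 10_000           # genomes processed

values_per_genome = snp_positions * signals_per_position
total_values = values_per_genome * individuals


def terabytes(n_values: int, bytes_per_value: float) -> float:
    """Storage in TB (10**12 bytes); bytes_per_value is an assumed encoding size."""
    return n_values * bytes_per_value / 1e12


print(values_per_genome)             # values per genome
print(total_values)                  # values for the whole cohort
print(terabytes(total_values, 0.5))  # assumed 0.5 byte per value
print(terabytes(total_values, 1.0))  # assumed 1 byte per value
```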
Scientists engaged in biomarker discovery process the data with an SNV calling method and then use only the resulting single nucleotide variants (SNVs), which amount to about 10 GB.
Meanwhile, Korean Patent Registration No. 10-0996443 discloses a method of processing a highly integrated gene database.
SNV calling from NGS data refers to methods for identifying the presence of single nucleotide variants (SNVs) from the results of next-generation sequencing (NGS) experiments. These are computational techniques, in contrast to specific experimental methods based on known population-wide nucleotide polymorphisms. As NGS data have become more abundant, these techniques are increasingly used to perform SNP genotyping, with a variety of algorithms designed for specific experimental designs and applications (Rasmus Nielsen, et al., (2011) Nat Rev Genet, 12 (6), 443-51). Beyond the usual application of SNP genotyping, they have been applied successfully to identify rare SNPs within a population and to detect somatic SNVs within an individual using multiple tissue samples (Vikas Bansal, (2010) Bioinformatics, 26 (12), i318-24; Andrew Roth, et al., (2012) Bioinformatics, 28 (7), 907-13).
SNV calling from NGS data can be used to detect germline variants. Most NGS-based methods for SNV detection are designed to detect germline variation in an individual's genome. These are the mutations an individual inherits biologically from their parents, and they are the usual type of variant searched for in an analysis, except in specific applications where somatic mutations are sought. Most of the variants searched for occur at some, often low, frequency across the population, in which case they may be referred to as single nucleotide polymorphisms (SNPs). Technically, the term SNP refers only to this kind of variation, but in practice it is used synonymously with SNV in the variant calling literature. In addition, because detecting germline SNVs requires determining the individual's genotype at each locus, the phrase "SNP genotyping" may also be used for this process. The same phrase may, however, also refer to wet-lab experimental procedures for classifying genotypes at a set of known SNP positions.
The usual process consists of: filtering the NGS reads to remove sources of error and bias; aligning the reads to a reference genome; using an algorithm based on a statistical model or on heuristics to predict the likelihood of variation at each locus from the quality scores and allele counts of the reads aligned there; filtering the predicted results based on metrics relevant to the application; and SNP annotation to predict the functional effect of each variant (Rasmus Nielsen, et al., (2011) Nat Rev Genet, 12 (6), 443-51). The usual output of this procedure is a VCF file.
Probabilistic methods exist for SNV calling from NGS data. In an ideal, error-free setting with high read coverage, calling variants from NGS alignment results would be simple. At each locus (position on the genome), the occurrences of each distinct nucleotide among the reads aligned at that position can be counted, and the true genotype would be obvious: AA if all nucleotides match allele A, BB if they all match allele B, and AB if there is a mixture. With real NGS data, however, this kind of naive approach is not used, because it cannot account for noise in the input data (E R Martin, et al., (2010) Bioinformatics, 26 (22), 2803-10). The nucleotide counts used for base calling contain errors and biases arising both from the sequenced reads themselves and from the alignment process. The problem can be mitigated to some extent by sequencing to greater read depth, but this is expensive, and practical studies often require inference from low-coverage data.
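The naive counting approach described above can be sketched in a few lines. This is an illustration only; the function name `naive_genotype` and the pileup-string representation are assumptions made for this example, not part of any tool cited in the text. The last call shows why the approach fails on noisy data: a single erroneous base changes the call from AA to AB.

```python
from collections import Counter


def naive_genotype(pileup: str, allele_a: str, allele_b: str) -> str:
    """Call a genotype by simple presence/absence of two alleles.

    pileup is the string of bases observed in the reads aligned at one locus
    (a hypothetical representation chosen for this sketch).
    """
    counts = Counter(pileup.upper())
    has_a = counts[allele_a] > 0
    has_b = counts[allele_b] > 0
    if has_a and has_b:
        return "AB"   # mixture of both alleles
    if has_a:
        return "AA"   # every informative base matches allele A
    if has_b:
        return "BB"   # every informative base matches allele B
    return "NA"       # neither allele observed


print(naive_genotype("AAAAAA", "A", "G"))  # all reads support allele A
print(naive_genotype("AAAGGG", "A", "G"))  # true mixture
print(naive_genotype("AAAAAG", "A", "G"))  # one stray base is enough to flip the call
```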
Probabilistic methods aim to overcome this problem by producing robust estimates of the probability of each possible genotype, taking noise into account together with any other available prior information that can improve the estimates. A genotype can then be predicted from these probabilities, often according to the maximum a posteriori (MAP) estimate. Probabilistic methods for variant calling are based on Bayes' theorem. In the context of variant calling, Bayes' theorem defines the probability that each genotype is the true genotype given the observed data, in terms of the prior probability of each possible genotype and the probability distribution of the data given each possible genotype. The formula is:
$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)} = \frac{P(D \mid G)\,P(G)}{\sum_{i=1}^{n} P(D \mid G_i)\,P(G_i)}$$
(Figure PCTKR2020006028-appb-I000001)
In the above equation, D denotes the observed data, that is, the aligned reads; G is the genotype whose probability is being calculated; and G_i is the i-th possible genotype out of n possibilities.
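As a minimal sketch of how Bayes' theorem is applied here, the posterior of each genotype is its likelihood times its prior, divided by the sum of that product over all n candidate genotypes. The function names and the numeric priors and likelihoods below are hypothetical values chosen for illustration, not values from the source.

```python
def genotype_posteriors(priors: dict, likelihoods: dict) -> dict:
    """Return P(G | D) for every genotype G.

    priors[g]      = P(G = g)
    likelihoods[g] = P(D | G = g)
    The denominator is sum_i P(D | G_i) * P(G_i).
    """
    evidence = sum(likelihoods[g] * priors[g] for g in priors)
    if evidence == 0:
        raise ValueError("data have zero probability under every genotype")
    return {g: likelihoods[g] * priors[g] / evidence for g in priors}


def map_genotype(posteriors: dict) -> str:
    """Maximum a posteriori (MAP) genotype."""
    return max(posteriors, key=posteriors.get)


# Hypothetical numbers for one locus.
priors = {"AA": 0.5, "AB": 0.25, "BB": 0.25}
likelihoods = {"AA": 0.1, "AB": 0.4, "BB": 0.0}

post = genotype_posteriors(priors, likelihoods)
print(post)
print(map_genotype(post))
```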
Given this framework, the various software solutions for detecting SNVs differ in how they calculate the prior probabilities P(G), in the error model used to model the probabilities P(D|G), and in how they partition the overall genotype into sub-genotypes (Na You, et al., (2012) Bioinformatics, 28 (5), 643-50).
The calculation of the prior probabilities depends on the data available for the genome under study and on the type of analysis being performed. In studies where good reference data containing the frequencies of known mutations are available (for example, studies of human genomic data), the known genotype frequencies in the population can be used to estimate the priors. Given population-wide allele frequencies, prior genotype probabilities can be calculated at each locus according to Hardy-Weinberg equilibrium (Ruiqiang Li, et al., (2009) Genome Res, 19 (6), 1124-32). In the absence of such data, constant priors that are independent of the locus can be used instead. These can be set to heuristically chosen values.
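Under Hardy-Weinberg equilibrium, the prior genotype probabilities at a biallelic locus follow from the allele frequencies as p², 2pq, and q². The sketch below illustrates this; the function name `hwe_priors` and the example frequency of 0.1 are assumptions made for this example.

```python
def hwe_priors(freq_b: float) -> dict:
    """Prior genotype probabilities at a biallelic locus under Hardy-Weinberg equilibrium.

    freq_b is the population frequency q of allele B; allele A has frequency p = 1 - q.
    """
    if not 0.0 <= freq_b <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - freq_b
    q = freq_b
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


# Hypothetical locus where allele B has a population frequency of 10%.
example_priors = hwe_priors(0.1)
print(example_priors)
```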
The error model used in a probabilistic variant calling method is the basis for calculating the P(D|G) term in Bayes' theorem. If the data were assumed to be error-free, the distribution of observed nucleotide counts at each locus would follow a binomial distribution, with 100% of nucleotides matching the A or B allele in the AA and BB cases respectively, and each nucleotide matching A or B with 50% probability in the AB case. With noise in the read data, however, this assumption does not hold, and the P(D|G) values must account for the possibility that erroneous nucleotides are present in the reads aligned at each locus.
A simple error model introduces a small error into the data probability term for the homozygous cases, allowing a small constant probability that nucleotides not matching the A allele are observed in the AA case, and likewise a small constant probability that nucleotides not matching the B allele are observed in the BB case. Considerably more sophisticated procedures are available, however, that try to reproduce more realistically the error patterns observed in real data when calculating the conditional data probabilities. For example, estimates of read quality (measured as Phred quality scores) have been incorporated into these calculations, taking into account the expected error rate of each individual read at the locus (Heng Li, et al., (2008) Genome Res, 18 (11), 1851-8). Another technique that has been incorporated successfully into error models is base quality recalibration, in which a separate error rate is calculated for each possible nucleotide substitution from previously known information about error patterns. Because the possible nucleotide substitutions are not equally likely to appear as errors in sequencing data, base quality recalibration has been applied to improve the error probability estimates.
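A per-read error model based on Phred scores can be sketched as follows. This is a simplified illustration, not the model of any tool cited above: it assumes that reads are independent, that a Phred score Q corresponds to an error probability of 10^(-Q/10), and that an erroneous base is equally likely to be any of the three other nucleotides. The function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def log_likelihoods(bases: str, quals: list, allele_a: str, allele_b: str) -> dict:
    """Return log P(D | G) for G in {AA, AB, BB} at one locus."""
    ll = {"AA": 0.0, "AB": 0.0, "BB": 0.0}
    for base, q in zip(bases, quals):
        # Cap the error at 0.75, the point at which a base call is uninformative.
        e = min(phred_to_error(q), 0.75)
        # P(observed base | true allele): 1 - e on a match, e / 3 on a mismatch.
        p_a = 1 - e if base == allele_a else e / 3
        p_b = 1 - e if base == allele_b else e / 3
        ll["AA"] += math.log(p_a)
        ll["BB"] += math.log(p_b)
        # A heterozygote contributes either allele with probability 1/2.
        ll["AB"] += math.log(0.5 * p_a + 0.5 * p_b)
    return ll


def best(ll: dict) -> str:
    """Genotype with the highest likelihood."""
    return max(ll, key=ll.get)


# Five confident A calls plus one low-quality G: the G is explained as an error.
print(best(log_likelihoods("AAAAAG", [30, 30, 30, 30, 30, 3], "A", "G")))
# Three confident calls for each allele: heterozygous.
print(best(log_likelihoods("AAAGGG", [30] * 6, "A", "G")))
```

Multiplying these likelihoods by genotype priors and normalizing gives the posterior probabilities of the Bayesian formulation.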
In the discussion above, the genotype probabilities at each locus were assumed to be calculated independently; that is, the overall genotype is partitioned into independent genotypes at each locus, whose probabilities are calculated separately. Because of linkage disequilibrium, however, the genotypes of nearby loci are generally not independent. Partitioning the overall genotype into a sequence of overlapping haplotypes instead allows these correlations to be modeled, giving more accurate probability estimates by incorporating population-wide haplotype frequencies into the prior. The use of haplotypes to improve variant detection accuracy has been applied successfully, for example in the 1000 Genomes Project (Goncalo R Abecasis, et al., (2010) Nature, 467 (7319), 1061-73).
Heuristic methods exist as an alternative to probabilistic methods for variant calling on NGS data. Instead of modeling the distribution of the observed data and calculating genotype probabilities with Bayesian statistics, variant calls are made on the basis of various heuristic factors such as minimum allele counts, read quality cut-offs, and bounds on read depth. Although they are used less often in practice than probabilistic methods, their reliance on bounds and cut-offs makes them relatively robust to outlying data that violate the assumptions of probabilistic models (Daniel C Koboldt, et al., (2012) Genome Res, 22 (3), 568-76).
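A heuristic caller of this kind can be sketched as a chain of cut-offs. The thresholds below (minimum base quality, depth bounds, minimum supporting reads, minimum variant fraction) are hypothetical values chosen for illustration and are not the defaults of any cited tool.

```python
from collections import Counter


def heuristic_call(bases, quals, ref, min_qual=20, min_depth=8, max_depth=500,
                   min_alt_reads=2, min_alt_frac=0.2):
    """Return the called variant allele at one locus, or None if no call is made."""
    # Read quality cut-off: ignore low-quality base calls.
    kept = [b for b, q in zip(bases, quals) if q >= min_qual]
    depth = len(kept)
    # Bounds on read depth.
    if not (min_depth <= depth <= max_depth):
        return None
    alt_counts = Counter(b for b in kept if b != ref)
    if not alt_counts:
        return None
    alt, n_alt = alt_counts.most_common(1)[0]
    # Minimum allele count and minimum variant allele fraction.
    if n_alt >= min_alt_reads and n_alt / depth >= min_alt_frac:
        return alt
    return None


print(heuristic_call("AAAAAAGGGG", [30] * 10, "A"))  # enough support for G
print(heuristic_call("AAAAAAAAAG", [30] * 10, "A"))  # a single supporting read
print(heuristic_call("AAGG", [30] * 4, "A"))         # depth below the lower bound
```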
An important part of designing a variant calling method for NGS data is the DNA sequence used as the reference to which the NGS reads are aligned. In human genetics studies, high-quality references are available from sources such as the HapMap project (International HapMap Consortium (2003) Nature, 426 (6968), 789-96), which can substantially improve the accuracy of the variant calls made by variant calling algorithms. Such references can also serve as the source of prior genotype probabilities for Bayesian analysis. Where no such high-quality reference exists, the experimentally obtained reads can first be assembled to create a reference sequence for alignment.
Various methods exist for filtering data in variant calling experiments to remove sources of error and bias. These can include removing suspicious reads before alignment and/or filtering the list of variants returned by the variant calling algorithm.
Depending on the sequencing platform used, various biases may be present in the set of sequenced reads. For example, strand bias can occur, in which the reads aligned in some neighborhood show a highly unequal distribution between the forward and reverse directions. In addition, an unusually high duplication of some reads can occur, for example because of bias in PCR. Such biases can lead to dubious variant calls. For example, if a fragment containing a PCR error at some locus is over-amplified because of PCR bias, that locus will show a high count of the false allele and may be called as an SNV. Analysis pipelines therefore frequently filter calls on the basis of these biases (Rasmus Nielsen, et al., (2011) Nat Rev Genet, 12 (6), 443-51).
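A simple strand-bias filter can be sketched as a check on how the reads supporting a variant are split between the two strands. The 0.9 cut-off and the function names are assumptions made for this example; production pipelines commonly use a statistical test rather than a fixed fraction.

```python
def strand_fraction(alt_forward: int, alt_reverse: int) -> float:
    """Fraction of variant-supporting reads that lie on the dominant strand."""
    total = alt_forward + alt_reverse
    if total == 0:
        raise ValueError("no reads support the variant")
    return max(alt_forward, alt_reverse) / total


def passes_strand_filter(alt_forward: int, alt_reverse: int,
                         max_fraction: float = 0.9) -> bool:
    """Keep a call only if its supporting reads are not confined to one strand."""
    if alt_forward + alt_reverse == 0:
        return False
    return strand_fraction(alt_forward, alt_reverse) <= max_fraction


print(passes_strand_filter(6, 5))   # balanced support on both strands
print(passes_strand_filter(12, 0))  # support from the forward strand only
```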
SNV calling from NGS data can also be used to detect somatic variants. In addition to aligning reads from individual samples to a reference genome to detect germline variants, reads from multiple tissue samples within a single individual can be aligned and compared to detect somatic variants. These variants correspond to mutations that have arisen de novo within a group of somatic cells in the individual (that is, they are not present in the individual's germline cells). This form of analysis has been applied widely in cancer research, where many studies are designed to investigate the profile of somatic mutations within cancerous tissue. Such investigations have led to diagnostic tools that have found clinical application, and they are also used to improve scientific understanding of the disease, for example through the discovery of new cancer-related genes, the identification of the gene regulatory networks and metabolic pathways involved, and the confirmation of models of how tumors grow and evolve (Derek Shyr, Qi Liu, (2013) Biol Proced Online, 15 (1), 4).
Until recently, few software tools had been developed specifically for detecting somatic variants by SNV calling from NGS data, and those available were based on the same algorithms used for germline variant detection. Such procedures are not optimized for somatic variant detection because they do not adequately model the statistical correlation between the genotypes present in multiple tissue samples from the same individual (Andrew Roth, et al., (2012) Bioinformatics, 28 (7), 907-13). Only recently have software tools been developed that are optimized specifically for detecting somatic mutations across multiple tissue samples. Probabilistic techniques have been developed that pool the allele counts from all tissue samples at each locus and use statistical models for the probabilities of the joint genotypes of all tissues and for the distribution of allele counts given the genotype; using all available data, they can calculate relatively robust probabilities of somatic mutation at each locus (David E Larson, et al., (2012) Bioinformatics, 28 (3), 311-7). Machine learning based techniques for performing this analysis have also been investigated recently (Jiarui Ding, et al., (2012) Bioinformatics, 28 (2), 167-75).
Software available for SNV calling from NGS data includes Freebayes, SOAPsnp, realSFS, SAMtools, GATK, Beagle, IMPUTE2, MaCH, SNVmix, VarScan, DeepVariant, SomaticSniper, JointSNVMix, Big Data Genomics: Avocado, NGSEP, VarDict, and Reveel.
In another aspect, the present invention relates to a composition for diagnosing dementia comprising an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene.
In another aspect, the present invention relates to a kit for diagnosing dementia comprising an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene.
In another aspect, the present invention relates to a composition for the diagnosis or prediction of dementia, comprising an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene, for use in a method of diagnosing or predicting dementia.
In another aspect, the present invention relates to the use of an agent capable of measuring a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene for the diagnosis or prediction of dementia.
In another aspect, the present invention relates to the use of an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene in the manufacture of a kit for the diagnosis or prediction of dementia.
In the present invention, the agent may comprise a primer capable of specifically amplifying an SNV position of the TOP3B gene, or a probe that binds complementarily to a region containing an SNV position of the TOP3B gene.
Accordingly, the present invention relates to a composition or kit for diagnosing dementia comprising a primer or primer set capable of specifically amplifying an SNV position of the TOP3B gene.
The present invention also relates to a composition or kit for diagnosing dementia comprising a probe that binds complementarily to a region containing an SNV position of the TOP3B gene.
The present invention also relates to a composition for the diagnosis or prediction of dementia, comprising a primer or primer set capable of specifically amplifying an SNV position of the TOP3B gene, for use in a method for diagnosing or predicting dementia.
The present invention also relates to a composition for the diagnosis or prediction of dementia, comprising a probe that binds complementarily to a region containing an SNV position of the TOP3B gene, for use in a method for diagnosing or predicting dementia.
The present invention also relates to the use of a primer or primer set capable of specifically amplifying an SNV position of the TOP3B gene for the diagnosis or prediction of dementia.
The present invention also relates to the use of a probe that binds complementarily to a region containing an SNV position of the TOP3B gene for the diagnosis or prediction of dementia.
The present invention also relates to the use of a primer or primer set capable of specifically amplifying an SNV position of the TOP3B gene in the manufacture of a kit for the diagnosis or prediction of dementia.
The present invention also relates to the use of a probe that binds complementarily to a region containing an SNV position of the TOP3B gene in the manufacture of a kit for the diagnosis or prediction of dementia.
In the present invention, the primer may be, but is not limited to, a primer set capable of amplifying any one or more of the SNVs of the TOP3B gene listed in Table 1. Likewise, the probe may be, but is not limited to, a probe that binds complementarily to a region containing an SNV position of the TOP3B gene listed in Table 1.
In one embodiment of the present invention, target sequencing of the TOP3B gene was performed. Sequencing libraries were prepared with TruSeq Nano DNA Library Prep Kits, and target enrichment of the TOP3B gene was carried out using the probes of SEQ ID NOs: 1 to 44. After purification, the library was amplified with the Illumina p5 and p7 primers (SEQ ID NO: 45 and SEQ ID NO: 46), then purified and quantified using qPCR and the KAPA Library Quantification Kit. NGS analysis of the post-enriched library was performed on an Illumina NextSeq 550 to detect SNVs of the TOP3B gene.
In the present invention, the step of detecting an SNV of the TOP3B gene may use a primer for the TOP3B gene, and may use one or more pairs of primers. Any sequence capable of amplifying the TOP3B gene may be used as the primer without limitation; preferably, but without limitation, the primer is a primer set capable of amplifying any one or more of the SNVs of the TOP3B gene listed in Table 1.
In the present invention, the step of detecting an SNV of the TOP3B gene may use a probe for the TOP3B gene. The probe may be, but is not limited to, a probe that binds complementarily to a region containing an SNV position of the TOP3B gene listed in Table 1.
A person of ordinary skill in the art to which the present invention pertains can readily design a primer set capable of specifically amplifying an SNV position listed in Table 1, and a probe that binds complementarily to a region containing an SNV position of the TOP3B gene listed in Table 1. The primer set and probe can be used for real-time PCR, and more preferably for multiplex real-time PCR.
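As an illustration of what "binds complementarily to a region containing an SNV position" means at the sequence level, the following is a minimal sketch and not the probe design used in the application. The function names, the 0-based `site_index` convention, and the `flank` length are assumptions introduced here. A real design would also account for melting temperature, specificity, and secondary structure.

```python
# Hypothetical helper names; the reference string below is a toy sequence, not TOP3B.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for_site(reference: str, site_index: int, flank: int = 10) -> str:
    """Return a sequence complementary to the window around a site.

    `site_index` is a 0-based offset into `reference` (an assumption of this
    sketch); the window covers up to `flank` bases on each side of the site
    and is clipped at the ends of the reference.
    """
    if not 0 <= site_index < len(reference):
        raise ValueError("site_index outside reference")
    start = max(0, site_index - flank)
    end = min(len(reference), site_index + flank + 1)
    return reverse_complement(reference[start:end])


if __name__ == "__main__":
    print(probe_for_site("ACGTACGTAC", site_index=4, flank=2))
```

The window is centered on the site so that a mismatch at the variant base falls in the middle of the probe, which is a common choice for hybridization-based discrimination.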
Because the composition for diagnosing dementia, the kit for diagnosing dementia, and the uses of the present invention rely on the above-described "method for providing information for the diagnosis or prediction of dementia" or "method for diagnosing dementia", descriptions that overlap with the information-providing method described above are omitted.
Hereinafter, the present invention is described in more detail through examples. These examples serve only to illustrate the present invention, and it will be apparent to those of ordinary skill in the art that the scope of the present invention is not to be construed as limited by them.
Example 1: Selection of the experimental group and the normal group
DNA was extracted from the peripheral blood of 99 normal individuals aged 52 or older and 106 individuals diagnosed with dementia by brain imaging, collected at Myongji Hospital (Goyang, Gyeonggi-do), and NGS and big data analysis were performed to identify the SNVs of specific genes and their frequency of occurrence.
Example 2: Identification of SNVs in dementia-related genes using next generation sequencing (NGS)
To search for dementia-related biomarkers by NGS analysis, 132 candidate genes related to neurological genetic diseases, including TOP3B, were selected and subjected to target sequencing. Sequencing libraries were prepared with TruSeq Nano DNA Library Prep Kits (Illumina, San Diego, CA, USA), and xGen Lockdown Probes (IDT, Coralville, IA, USA) were used for targeted enrichment of the TOP3B gene (Table 2).
[Table 2: table images PCTKR2020006028-appb-T000002, PCTKR2020006028-appb-I000002, and PCTKR2020006028-appb-I000003 not reproduced]
After purification according to the Agencourt AMPure protocol (Beckman Coulter), the library was amplified using the Illumina p5 and p7 primers. The p5 primer is represented by SEQ ID NO: 45, and the p7 primer by SEQ ID NO: 46.
SEQ ID NO: 45: 5'-AATGATACGGCGACCACCGAGATCTACAC-3'
SEQ ID NO: 46: 5'-CAAGCAGAAGACGGCATACGAGAT-3'
The amplified library was purified and quantified using qPCR and the KAPA Library Quantification Kit (KAPA Biosystems, Boston, MA, USA). Finally, NGS analysis of the post-enriched library was performed on an Illumina NextSeq 550.
Reads from each sample were mapped to the reference sequence hg19 (human genome version 19; GRCh37.p13) using the Burrows-Wheeler Aligner with modified parameters (Heng Li, Richard Durbin, (2009) Bioinformatics, 25 (14), 1754-60), and SNPs/InDels were identified using a modified HaplotypeCaller in the Genome Analysis Toolkit (GATK) (Mark A DePristo, et al., (2011) Nat Genet, 43 (5), 491-8). Enrichment efficiency was determined from the proportion of reads that mapped to the targeted regions with 150 bp of padding.
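The enrichment-efficiency calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the application: reads and targets are modeled as half-open `(start, end)` intervals on a single chromosome, a read counts as on-target if it overlaps a padded target by at least one base, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def pad_and_merge(targets: List[Interval], padding: int = 150) -> List[Interval]:
    """Extend each target by `padding` bp on both sides and merge overlaps."""
    padded = sorted((max(0, s - padding), e + padding) for s, e in targets)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def enrichment_efficiency(reads: List[Interval], targets: List[Interval],
                          padding: int = 150) -> float:
    """Fraction of mapped reads that overlap any padded target region."""
    if not reads:
        return 0.0
    regions = pad_and_merge(targets, padding)
    on_target = sum(
        1 for r_start, r_end in reads
        if any(r_start < t_end and t_start < r_end for t_start, t_end in regions)
    )
    return on_target / len(reads)


if __name__ == "__main__":
    toy_targets = [(1000, 1200)]  # toy coordinates, not real TOP3B targets
    toy_reads = [(900, 1050), (700, 860), (100, 250), (1350, 1500)]
    print(enrichment_efficiency(toy_reads, toy_targets))
```

In practice the reads would come from the aligned BAM file and the targets from the capture design file. The linear scan over regions is kept for clarity and would be replaced by an interval index for real data volumes.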
Example 3: Analysis of the correlation between SNVs and dementia using big data analysis
Big data analysis of the 132 candidate genes was performed using R version 3.5.1 (2018-07-02). After preprocessing, parsing, and filtering, a comparison of overall SNV statistics between the normal group and the dementia patient group revealed a correlation (FIG. 1). After a chromosome-wise clustering statistical analysis, a random forest was run in R to search for genes that can discriminate normal individuals from dementia patients according to the location and number of SNVs, and TOP3B, which showed the strongest association with dementia, was selected. The frequency of SNVs in the TOP3B gene is shown in FIG. 2.
Example 4: Analysis of SNV positions in the TOP3B gene
The SNV positions of the TOP3B gene were identified in the normal group and the dementia group (Table 3).
[Table 3: table images PCTKR2020006028-appb-T000003 and PCTKR2020006028-appb-I000004 not reproduced]
Among the SNV sites of the TOP3B gene, 22,312,292; 22,313,669; 22,312,315; 22,312,350; and 22,312,351 are all intron variants, and the number of SNVs at these sites was about 7-fold or more higher in the dementia group than in the normal group.
At 22,312,531; 22,313,733; 22,312,383; 22,330,107; and 22,312,484, the number of SNVs was about 2-fold higher in the dementia group than in the normal group. Of these, 22,330,107 is a 5 prime UTR variant and the rest are all intron variants.
The SNVs of the TOP3B gene in the dementia patient group are shown in Table 4 below.
[Table 4: table image PCTKR2020006028-appb-T000004 not reproduced]
Example 5: Diagnosis of dementia by analyzing the number of SNVs present in the TOP3B gene
To determine a cut-off for distinguishing normal individuals from dementia patients or individuals at high risk of dementia based on TOP3B SNVs, ROC (receiver operating characteristic) analysis was performed using an analysis program based on the R package pROC (Xavier Robin, et al., (2011) BMC Bioinformatics, 12, 77). The cut-off was chosen at the point that maximizes specificity and sensitivity. At a cut-off of 2.5 SNVs in the TOP3B gene, the specificity was 0.6061 and the sensitivity was 0.9245 (FIG. 3). Accordingly, subjects with 2 or fewer SNVs can be classified as normal, and subjects with 3 or more SNVs as dementia patients or as being at high risk of developing dementia.
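The cut-off selection and the resulting classification rule can be sketched as below. The text does not state which optimality criterion was used; this sketch assumes Youden's J (sensitivity + specificity - 1), a common choice, and the per-subject SNV counts are invented toy data. As a consistency check on the reported figures, with 99 normal and 106 dementia subjects, 60/99 ≈ 0.6061 and 98/106 ≈ 0.9245, so the reported specificity and sensitivity are consistent with 60 normal and 98 dementia subjects being classified correctly. The source does not give these counts, so they are an inference.

```python
from typing import List, Tuple


def sens_spec(normal: List[int], dementia: List[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when counts above `cutoff` are called positive."""
    true_pos = sum(1 for c in dementia if c > cutoff)
    true_neg = sum(1 for c in normal if c <= cutoff)
    return true_pos / len(dementia), true_neg / len(normal)


def best_cutoff(normal: List[int], dementia: List[int]) -> float:
    """Midpoint threshold maximizing Youden's J (assumed criterion)."""
    values = sorted(set(normal) | set(dementia))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: sum(sens_spec(normal, dementia, t)))


def classify(snv_count: int, cutoff: float = 2.5) -> str:
    """Apply the rule from the text: 3 or more SNVs means dementia or high risk."""
    return "dementia or high risk" if snv_count > cutoff else "normal"


if __name__ == "__main__":
    toy_normal = [0, 1, 1, 2, 2, 3, 4]    # invented counts, not study data
    toy_dementia = [2, 3, 3, 4, 5, 6]     # invented counts, not study data
    cutoff = best_cutoff(toy_normal, toy_dementia)
    print(cutoff, sens_spec(toy_normal, toy_dementia, cutoff), classify(3))
```

Because SNV counts are integers, any threshold strictly between 2 and 3 gives the same classification; reporting the midpoint 2.5 is a convention for integer-valued predictors.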
In the present invention, NGS data and big data analysis were used to identify the positions of SNVs found specifically in the TOP3B gene in the dementia patient group, and to confirm that SNVs occur at a higher frequency in the dementia patient group than in the normal group. This makes it possible to better understand the etiology and pathology of dementia and to diagnose dementia risk more accurately using the gene and its SNVs, and the findings can therefore be useful in the development of dementia therapeutics.
Specific parts of the present invention have been described in detail above. It will be apparent to those of ordinary skill in the art that this specific description represents only preferred embodiments and does not limit the scope of the present invention. Accordingly, the substantial scope of the present invention is defined by the appended claims and their equivalents.
Electronic file attached.

Claims (15)

  1. A method of providing information for the diagnosis or prediction of dementia, comprising the step of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene in an isolated biological sample.
  2. The method of claim 1, wherein the step of detecting the SNV of the TOP3B gene comprises amplifying the gene and analyzing gene mutations using sequencing data of the amplified product.
  3. The method of claim 2, wherein the sequencing is Sanger sequencing or next generation sequencing (NGS).
  4. The method of claim 1, wherein the step of detecting the SNV of the TOP3B gene is performed by polymerase chain reaction, nuclease digestion, hybridization, Southern blotting, restriction enzyme fragment polymorphism, primer extension, single stranded conformation polymorphism, or a combination of these methods.
  5. The method of claim 1, wherein the biological sample is a nucleic acid sample isolated from a sample selected from the group consisting of blood, hair, saliva, urine, semen, vaginal cells, oral cells, amniotic fluid containing placental cells or fetal cells, and mixtures thereof.
  6. The method of claim 5, wherein the nucleic acid is genomic DNA, cfDNA (cell free DNA), RNA, or micro RNA.
  7. The method of claim 1, further comprising the step of identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when 3 or more SNVs of the TOP3B gene are present.
  8. The method of claim 1, further comprising the step of identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when an SNV is detected in the TOP3B gene at a position selected from the group consisting of 22,311,659; 22,311,776; 22,312,061; 22,312,502; 22,312,378; 22,312,589; 22,312,970; 22,313,743; 22,318,365; 22,312,555; 22,312,531; 22,316,792; 22,311,882; 22,313,733; 22,311,516; 22,312,292; 22,313,669; 22,312,383; 22,330,107; 22,312,568; 22,312,476; 22,318,671; 22,312,668; 22,312,790; 22,318,538; 22,312,484; 22,312,351; 22,312,350; 22,312,315; 22,313,829; and 22,330,082, with reference to GRCh37.p13 (Genome Reference Consortium Human Build 37 patch release 13).
  9. The method of claim 8, further comprising the step of identifying the subject as a dementia patient or as belonging to a high-risk group for dementia when SNVs are detected at 3 or more of said positions.
  10. The method of claim 8, wherein the SNV of the TOP3B gene is selected from the SNVs in the following table:
    [Table image PCTKR2020006028-appb-I000005 not reproduced]
  11. The method of claim 1, wherein the dementia is Alzheimer's disease, senile dementia, vascular dementia, frontotemporal dementia, dementia with Lewy bodies, or Parkinson's disease dementia.
  12. A composition for diagnosing dementia comprising an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene.
  13. The composition of claim 12, wherein the agent comprises a primer capable of specifically amplifying an SNV position of the TOP3B gene, or a probe that binds complementarily to a region containing an SNV position of the TOP3B gene.
  14. A kit for diagnosing dementia comprising an agent capable of detecting a single nucleotide variant (SNV) of the TOP3B (DNA topoisomerase III beta) gene.
  15. The kit of claim 14, wherein the agent comprises a primer capable of specifically amplifying an SNV position of the TOP3B gene, or a probe that binds complementarily to a region containing an SNV position of the TOP3B gene.
PCT/KR2020/006028 2019-05-10 2020-05-07 Top3b gene mutation-based dementia diagnosis method WO2020231081A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20190054962 2019-05-10
KR10-2019-0054962 2019-05-10

Publications (1)

Publication Number Publication Date
WO2020231081A1 true WO2020231081A1 (en) 2020-11-19

Family

ID=73289461

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/006028 WO2020231081A1 (en) 2019-05-10 2020-05-07 Top3b gene mutation-based dementia diagnosis method

Country Status (2)

Country Link
KR (1) KR20200130165A (en)
WO (1) WO2020231081A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230075786A (en) * 2021-11-23 2023-05-31 주식회사 피디젠 Method and platform for predicting high-risk groups of non-face-to-face dementia
KR20230134755A (en) * 2022-03-15 2023-09-22 사회복지법인 삼성생명공익재단 A method of providing information for predicting a group at risk for Alzheimer's disease dementia or early onset of symptoms, a risk group for amnesia-type mild cognitive impairment and/or a PET-positive risk group for amyloid β deposition based on European population data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8257922B2 (en) * 1999-01-06 2012-09-04 Genenews Corporation Method of profiling gene expression in a subject having alzheimer's disease
KR101410986B1 (en) * 2011-04-22 2014-06-27 (주)바이오니아 SNP genotyping assay set for ApoE genes and method of detecting ApoE using the same
KR20140089384A (en) * 2011-11-10 2014-07-14 제넨테크, 인크. Methods for treating, diagnosing and monitoring alzheimer's disease
KR101933847B1 (en) * 2017-05-15 2018-12-28 조선대학교산학협력단 APOE promoter SNP associated with the risk of Alzheimer's disease and the use thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GEORG STOLL, PIETILÄINEN OLLI P H, LINDER BASTIAN, SUVISAARI JAANA, BROSI CORNELIA, HENNAH WILLIAM, LEPPÄ VIRPI, TORNIAINEN MINNA,: "Deletion of TOP3 beta , a component of FMRP-containing mRNPs, contributes to neurodevelopmental disorders", NATURE NEUROSCIENCE, vol. 16, no. 9, 1 September 2013 (2013-09-01), pages 1228 - 1237, XP055761081, ISSN: 1097-6256, DOI: 10.1038/nn.3484 *

Also Published As

Publication number Publication date
KR20200130165A (en) 2020-11-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20806149

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20806149

Country of ref document: EP

Kind code of ref document: A1