CN110129422B - Method for analyzing mutation structure of repeated mutation disease of polynucleotide based on long-fragment PCR and single-molecule sequencing - Google Patents
- Publication number
- CN110129422B (application CN201910458674.9A)
- Authority
- CN
- China
- Prior art keywords
- mutation
- pcr
- repeat
- sequence
- molecule sequencing
- Legal status
- Active
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6858—Allele-specific amplification
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
- C12Q2600/166—Oligonucleotides used as internal standards, controls or normalisation probes
Abstract
The invention provides a method for analyzing the mutation structure of a polynucleotide repeat mutation disease based on long-fragment PCR and single-molecule sequencing. The method comprises the following steps: (a) providing a sample to be tested, the sample being a nucleic acid sample containing genomic DNA; (b) performing long-fragment PCR on the sample to obtain a first amplification product; (c) adding a barcode sequence to an end of the amplification product to form a barcoded first amplification product; and (d) performing single-molecule sequencing on the barcoded amplification product to obtain a data set of sub-reads corresponding to the target region. Based on this data set, the mutation structure of a polynucleotide repeat mutation disease can be accurately resolved. The method is efficient, precise and low-cost.
Description
Technical Field
The invention relates to the technical field of biology, in particular to a method for analyzing a mutation structure of a polynucleotide repeat mutation disease based on long-fragment PCR and single-molecule sequencing.
Background
Repeat expansion disorders (REDs) are a large group of genetic diseases caused by abnormal expansion of nucleotide repeats with repeat units of 3-12 nucleotides. Besides the more common trinucleotide repeat expansions such as (CAG)n and (CGG)n, an increasing number of repeat expansions with longer units are being reported. In some of these disorders, such as the (GGGGCC)n hexanucleotide repeat expansion in the C9ORF72 gene and the (CCCCGCCCCGCG)n repeat expansion in the CSTB gene, the abnormally expanded sequence can reach several kb or even tens of kb. Clarifying the detailed structure of such mutations, and exploring how different structures relate to distinct pathogenic mechanisms and to clinical phenotypes, has long been difficult.
Taking Familial Cortical Myoclonic Tremor Epilepsy (FCMTE) as an example, the disease is a group of autosomal dominant hereditary epilepsy syndromes with marked clinical and genetic heterogeneity. The main clinical manifestations of FCMTE are adult onset, cortical tremor and myoclonus of the extremities, with or without epileptic seizures, and it may be accompanied by cognitive impairment, dementia, night blindness, migraine, ataxia and other symptoms. Electrophysiological examination may show abnormal electromyographic and electroencephalographic findings, such as giant somatosensory evoked potentials (G-SEP) of cortical origin and the long-latency cortical reflex (LLCR, or C-reflex). Antiepileptic drugs are effective.
The clinical phenotype of FCMTE is complex. Under past diagnostic criteria, the diagnosis could be confirmed only in combination with complex electrophysiological examinations such as G-SEP and C-reflex testing, so many cases were missed or misdiagnosed and patients could not receive targeted, accurate diagnosis and treatment.
Single-molecule (long-read) sequencing provides a new means of detecting polynucleotide repeat mutations and has been applied by other groups to study the pentanucleotide repeat insertion mutation in the SAMD12 gene in FCMTE. However, it has several limitations: the missed-detection rate is high, and detection is only qualitative, showing that an expanded repeat is present without providing reliable information on its sequence content. In addition, single-molecule sequencing at the whole-genome level is very expensive and cannot be widely adopted in clinical testing.
Therefore, there is an urgent need in the art for a novel method that efficiently and accurately analyzes the mutation structure of polynucleotide repeat mutation diseases.
Disclosure of Invention
The invention aims to provide a novel method for efficiently and accurately analyzing the mutant structure of a polynucleotide repeat mutation disease.
In a first aspect of the present invention, there is provided a method of resolving a mutant structure of a polynucleotide repeat mutation disease or a method of resolving a structure of a polynucleotide repeat region, the method comprising the steps of:
(a) providing a sample to be detected, wherein the sample to be detected is a nucleic acid sample containing genome DNA;
(b) carrying out long-fragment PCR on the sample to be detected so as to obtain a first amplification product;
(c) adding a barcode sequence to an end of the amplification product to form a first amplification product with a barcode sequence;
(d) performing single-molecule sequencing on the barcoded amplification product to obtain a data set of sub-reads corresponding to the target region (i.e., the sub-reads corresponding to the polynucleotide repeat region).
In another preferred example, the method further comprises:
(e) the data set is analyzed to obtain the mutant structure of the target region (i.e., the polynucleotide repeat region).
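As a minimal sketch of step (e), assuming each sub-read has already been trimmed to the repeat region and that the repeat units of interest are the SAMD12 pentamers TTTTA, TTTGA and TTTCA, a sub-read can be collapsed into the run-length notation used in the figure descriptions, where (TTTTA)3 denotes three tandem copies of TTTTA. The function name `collapse_repeats` and the greedy left-to-right scan are illustrative assumptions, not the analysis pipeline of the invention; real data would also require tolerance for sequencing errors that interrupt a run.

```python
from typing import Sequence


def collapse_repeats(seq: str, motifs: Sequence[str] = ("TTTTA", "TTTGA", "TTTCA")) -> str:
    """Collapse tandem copies of known motifs into '(UNIT)copies' tokens.

    Hypothetical helper for illustration. Bases that do not start a known
    motif are kept verbatim as interrupting sequence (e.g. a 'TTA' spacer).
    """
    seq = seq.upper()
    k = len(motifs[0])
    if any(len(m) != k for m in motifs):
        raise ValueError("this sketch assumes all motifs have the same length")
    tokens = []
    literal = ""  # bases accumulated between motif runs
    i = 0
    while i < len(seq):
        unit = seq[i:i + k]
        if unit in motifs:
            if literal:
                tokens.append(literal)
                literal = ""
            copies = 0
            # Greedy: consume as many back-to-back copies of this unit as possible.
            while seq[i:i + k] == unit:
                copies += 1
                i += k
            tokens.append(f"({unit}){copies}")
        else:
            literal += seq[i]
            i += 1
    if literal:
        tokens.append(literal)
    return " ".join(tokens)


# Toy read mimicking the (TTTTA)n TTA (TTTTA)n (TTTGA)n layout.
toy = "TTTTA" * 3 + "TTA" + "TTTTA" * 2 + "TTTGA" * 4
print(collapse_repeats(toy))  # (TTTTA)3 TTA (TTTTA)2 (TTTGA)4
```

A greedy scan is the simplest choice for error-free toy input; with noisy long reads, an alignment-based or tolerance-aware repeat caller would be needed.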
In another preferred embodiment, between steps (c) and (d), further comprising:
(d0) pooling the barcoded first amplification product with m-1 other barcoded amplification products to obtain a mixed library of amplification products;
wherein the m-1 barcoded amplification products are the 2nd, 3rd, …, m-th amplification products, each prepared by steps (a), (b) and (c) and each carrying a different barcode sequence, so that the mixed library contains all m barcoded amplification products;
wherein m is a positive integer not less than 2.
In another preferred embodiment, m ≥ 5, preferably ≥ 10, more preferably ≥ 20, most preferably ≥ 30.
In another preferred embodiment, m is from 5 to 5000, preferably from 10 to 2000, more preferably from 20 to 500.
In another preferred embodiment, m is 5 to 60, preferably 20 to 50, more preferably 35 to 45.
In another preferred embodiment, in step (d), single-molecule sequencing is performed on the mixed library of amplification products, thereby obtaining the data set of the sub-reads corresponding to the target region (i.e., the sub-reads corresponding to the polynucleotide repeat region).
In another preferred example, in step (e), the data set is split by barcode sequence, and the reads sharing the same barcode are then classified and analyzed, so as to obtain the mutation structure of the target region (i.e., the polynucleotide repeat region) for each of the m samples to be tested.
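A minimal sketch of the barcode split, assuming each barcode sits at one end of the amplicon so that it appears either at the start of a read or, as its reverse complement, at the end. Exact prefix matching, the function names `revcomp` and `demultiplex`, and the barcode sequences below are hypothetical; practical demultiplexers tolerate mismatches and score both read ends.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def demultiplex(reads, barcodes):
    """Bin reads by sample using exact barcode matches (hypothetical sketch).

    barcodes maps sample name -> barcode sequence (5'->3'). A read is assigned
    if it starts with the barcode or if its reverse complement does.
    """
    bins = defaultdict(list)
    for read in reads:
        read = read.upper()
        rc = revcomp(read)
        sample = "unassigned"
        for name, bc in barcodes.items():
            if read.startswith(bc) or rc.startswith(bc):
                sample = name
                break
        bins[sample].append(read)
    return dict(bins)


# Hypothetical barcodes and toy reads.
barcodes = {"S1": "ACGATCGG", "S2": "TTGACCAG"}
reads = [
    "ACGATCGG" + "TTTTA" * 3,           # barcode read on the forward strand
    revcomp("TTGACCAG" + "TTTGA" * 2),  # amplicon read from the other strand
    "GGGGCCCC",                         # no barcode
]
bins = demultiplex(reads, barcodes)
print({k: len(v) for k, v in bins.items()})  # {'S1': 1, 'S2': 1, 'unassigned': 1}
```

Each resulting bin can then be analyzed separately to give one mutation structure per sample.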
In another preferred embodiment, the length of the polynucleotide repeat region is 200-10000 bp, preferably 1500-5000 bp.
In another preferred embodiment, the polynucleotide repeats are repeats of nucleotide units 3-12 nt in length.
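For a structure written in run-length notation, the repeat-region length is the sum of unit length × copy number plus any interrupting bases. For the II:6 structure reported for FIG. 4, (TTTTA)5TTA(TTTTA)114(TTTGA)111, that gives 5×5 + 3 + 114×5 + 111×5 = 1153 bp, before the unique flanking sequence of the amplicon is added. The helper name `structure_length` is hypothetical, and the sketch only accepts explicit copy numbers (a placeholder such as "exp" is rejected).

```python
import re

# Either "(UNIT)copies" or a run of plain bases.
_TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def structure_length(structure: str) -> int:
    """Total length in bp of a structure such as '(TTTTA)5 TTA (TTTGA)111'."""
    text = structure.upper().replace(" ", "")
    total, pos = 0, 0
    for m in _TOKEN.finditer(text):
        if m.start() != pos:  # something unparseable (e.g. 'EXP') was skipped
            raise ValueError(f"cannot parse {text[pos:m.start()]!r}")
        unit, copies, literal = m.groups()
        total += len(unit) * int(copies) if unit else len(literal)
        pos = m.end()
    if pos != len(text):
        raise ValueError(f"cannot parse {text[pos:]!r}")
    return total


print(structure_length("(TTTTA)5 TTA (TTTTA)114 (TTTGA)111"))  # 1153
print(structure_length("(TTTTA)3TTA(TTTTA)32(TTTCA)481"))      # 2583
```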
In another preferred embodiment, the polynucleotide repeat region comprises one or more polynucleotide repeats.
In another preferred embodiment, the polynucleotide repeat mutation disease is selected from the group consisting of: familial cortical myoclonic tremor epilepsy (e.g., types 1, 6 and 7); C9ORF72-associated amyotrophic lateral sclerosis/frontotemporal dementia; spinocerebellar ataxia (e.g., types 8, 10, 31, 36 and 37); and myotonic dystrophy (e.g., types 1 and 2).
In another preferred example, between steps (b) and (c), further comprising:
(c0) separating the first amplification product to obtain a separated and purified first amplification product.
In another preferred embodiment, when the method is used for m amplification products, the m amplification products are separated, thereby obtaining m separated amplification products.
In another preferred embodiment, a BluePippin instrument is used to size-select and recover the amplification product.
In another preferred embodiment, the method is non-diagnostic and non-therapeutic.
In a second aspect of the invention, there is provided a kit for diagnosing familial cortical myoclonic tremor epilepsy (FCMTE), the kit comprising a first standard, which is a nucleic acid sequence carrying a (TTTGA)n1 pentanucleotide repeat insertion mutation, wherein n1 is 50-800.
In another preferred embodiment, n1 is 100-500.
In another preferred embodiment, the kit further comprises a second standard, wherein the second standard is a nucleic acid sequence having a (TTTCA) n2 five-nucleotide repeat insertion mutation, wherein n2 is 100-700.
In another preferred embodiment, n2 is 200-500.
In another preferred embodiment, the kit further comprises a primer pair for long-fragment PCR.
In another preferred embodiment, the sequences of the primer pair for long-fragment PCR are shown in SEQ ID Nos. 1 and 2.
In a third aspect of the invention, there is provided the use of the kit according to the second aspect of the invention in the preparation of a test kit for diagnosing familial cortical myoclonic tremor epilepsy (FCMTE).
In a fourth aspect of the invention, there is provided the use of a detection reagent for detecting the (TTTGA)n1 pentanucleotide repeat in the SAMD12 gene in the preparation of a detection kit for diagnosing familial cortical myoclonic tremor epilepsy (FCMTE).
In a fifth aspect of the invention, there is provided a method of diagnosing FCMTE, comprising the steps of: detecting the presence or absence of a TTTGA-type pentanucleotide repeat in the SAMD12 gene of the subject;
wherein the presence of a TTTGA-type pentanucleotide repeat indicates that the subject has FCMTE, or is more susceptible to FCMTE than the general population.
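As an illustration only, and not a validated diagnostic rule, the presence check can be sketched as measuring the longest uninterrupted run of TTTGA and of TTTCA in each target-region sub-read. The `min_copies=10` threshold and the function names `longest_run` and `classify_read` are hypothetical assumptions; the source does not specify a cut-off.

```python
import re


def longest_run(seq: str, motif: str) -> int:
    """Copies in the longest uninterrupted tandem run of `motif` in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def classify_read(seq: str, min_copies: int = 10) -> str:
    """Label a target-region sub-read by its expanded pentanucleotide type.

    min_copies is a hypothetical threshold chosen for illustration only.
    """
    tttga = longest_run(seq, "TTTGA")
    tttca = longest_run(seq, "TTTCA")
    if tttga >= min_copies and tttga >= tttca:
        return "TTTGA-type"
    if tttca >= min_copies:
        return "TTTCA-type"
    return "no expansion detected"


print(classify_read("TTTTA" * 20 + "TTTGA" * 15))  # TTTGA-type
```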
In a sixth aspect of the present invention, there is provided a system (or apparatus) for resolving a mutant structure of a polynucleotide repeat mutation disease, the system comprising:
(i) an LR-PCR amplification module configured to perform long-fragment PCR on a sample to be tested to obtain a first amplification product, the sample being a nucleic acid sample containing genomic DNA;
(ii) an amplification product post-processing module configured to add a barcode sequence to an end of the amplification product to form a barcoded first amplification product; and
(iii) a single-molecule sequencing module configured to perform single-molecule sequencing on the barcoded amplification product to obtain a data set of sub-reads corresponding to the target region (i.e., the sub-reads corresponding to the polynucleotide repeat region).
In another preferred example, the system further includes:
(iv) a data analysis module configured to analyze the data set to obtain the mutation structure of the target region (i.e., the polynucleotide repeat region).
It is to be understood that, within the scope of the present invention, the technical features described above and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For brevity, these combinations are not described individually here.
Drawings
FIG. 1 is a schematic of the pentanucleotide repeat insertion mutations within intron 4 of the SAMD12 gene. The normal sequence is generally (TTTTTTA)7 TTA (TTTTA)13; the two mutant sequences are (TTTTA)exp (TTTGA)exp and (TTTTA)exp (TTTCA)exp, respectively (exp: repeat expansion; "exp" indicates the presence of an expanded repeat and does not denote a repeat count).
FIG. 2 shows repeat-primed PCR (RP-PCR) results for the pentanucleotide repeat insertion in intron 4 of the SAMD12 gene. In the two samples tested, RP-PCR indicated (TTTTA)n and (TTTGA)n repeat expansions but no (TTTCA)n repeat expansion.
FIG. 3 shows LR-PCR gel electrophoresis results for the pentanucleotide repeat insertion mutation in the SAMD12 gene. Samples III:4, II:6 and IV:2 show abnormal amplification bands at about 2000 bp; sample P-I-III2 shows an abnormal amplification band at about 3000 bp.
FIG. 4 shows representative target-region sub-reads from single-molecule sequencing of two FCMTE samples. The detailed sequence of the abnormal amplification band is (TTTTA)5 TTA (TTTTA)114 (TTTGA)111 for sample II:6 and (TTTTA)3 TTA (TTTTA)32 (TTTCA)481 for sample P-I-III2.
FIG. 5 shows the sub-read length and content distributions of the target region for 4 FCMTE samples: panels A-D show the length distribution of each sample's abnormal amplification band; panels E-H show the length distributions of (TTTTA)n and of (TTTGA)n or (TTTCA)n within each sample's abnormal amplification band. Dotted lines mark the medians (see Table 2 for specific values).
FIG. 6 shows a pedigree carrying the pathogenic (TTTGA)n pentanucleotide repeat insertion mutation, the mutant sequence structure, and the LR-PCR gel image.
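The per-sample medians plotted in FIG. 5 can be computed along these lines. This sketch assumes a list of target-region sub-reads for one sample and reports the median sub-read length and the median total number of copies of each motif per read; the name `summarize_sample` is hypothetical, and counting total (not necessarily contiguous) copies with `str.count` is a simplification.

```python
from statistics import median


def summarize_sample(reads, motifs=("TTTTA", "TTTGA", "TTTCA")):
    """Median sub-read length and median total motif copies per sub-read."""
    if not reads:
        raise ValueError("no sub-reads for this sample")
    summary = {"median_length_bp": median(len(r) for r in reads)}
    for motif in motifs:
        # str.count counts non-overlapping occurrences anywhere in the read,
        # so interrupted copies are included (a simplification).
        summary[f"median_{motif}_copies"] = median(r.upper().count(motif) for r in reads)
    return summary


toy_reads = ["TTTTA" * n + "TTTGA" * (n - 2) for n in (4, 6, 8)]
print(summarize_sample(toy_reads))
# {'median_length_bp': 50, 'median_TTTTA_copies': 6, 'median_TTTGA_copies': 4, 'median_TTTCA_copies': 0}
```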
FIG. 7 shows Sanger sequencing of the LR-PCR product from a patient sample carrying the (TTTGA)n pentanucleotide repeat insertion mutation, alongside a normal control. Sanger sequencing of the normal control indicated the repeat structure (TTTTTTA)7 TTA (TTTTA)13; Sanger sequencing of the patient's long-fragment PCR product indicated (TTTTA)exp at the 5' end and (TTTGA)exp at the 3' end.
FIG. 8 shows representative target-region sub-reads from single-molecule sequencing of two additional FCMTE samples carrying the (TTTGA)n pentanucleotide repeat insertion mutation. The detailed sequences of the abnormal amplification bands of samples III:4 and IV:2 are (TTTTA)5 TTA (TTTTA)119 (TTTGA)111 and (TTTTA)5 TTA (TTTTA)108 (TTTGA)113, respectively.
Detailed Description
The present inventors have conducted extensive and intensive studies and, for the first time, have developed a method for efficiently and accurately analyzing the mutation structure of polynucleotide repeat mutation diseases. Specifically, the method is based on LR-PCR and single-molecule sequencing: LR-PCR products are used for target-region single-molecule sequencing, so that more informative reads are obtained from a smaller total sequencing output, increasing usable data while reducing cost. The method is therefore efficient, precise and low-cost. Using this method, the inventors also identified, for the first time, a new mutation structure in FCMTE, a polynucleotide repeat mutation disease: a (TTTGA)n-type pentanucleotide repeat insertion mutation in the SAMD12 gene. The present invention has been completed on the basis of this finding.
Terms
As used herein, the term "on-target read" of a target region refers to a read associated with a disease mutation structure.
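One simple way to operationalize an on-target read, assuming the unique sequences flanking the repeat are known, is to require both flanks in the expected order and return the sequence between them. The flank sequences `LEFT` and `RIGHT` and the function name `extract_target` are hypothetical placeholders (the actual primers, SEQ ID NOs: 1 and 2, are not reproduced here); both strands are tried because an amplicon can be sequenced in either orientation.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_target(read: str, left_flank: str, right_flank: str):
    """Return the sequence between the two flanks, or None if the read is off-target."""
    read = read.upper()
    for strand in (read, revcomp(read)):
        start = strand.find(left_flank)
        if start < 0:
            continue
        end = strand.find(right_flank, start + len(left_flank))
        if end >= 0:
            return strand[start + len(left_flank):end]
    return None


LEFT, RIGHT = "GATTACAGG", "CCTGAGTCA"  # hypothetical flank sequences
on_target = "AC" + LEFT + "TTTTA" * 3 + RIGHT + "GG"
print(extract_target(on_target, LEFT, RIGHT))           # TTTTATTTTATTTTA
print(extract_target(revcomp(on_target), LEFT, RIGHT))  # TTTTATTTTATTTTA
print(extract_target("ACGTACGT", LEFT, RIGHT))          # None
```

Exact string matching keeps the sketch short; with error-prone long reads, approximate matching of the flanks would be needed.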
As used herein, the term "TTTGA-type pentanucleotide repeat" refers to a (TTTGA)n1 pentanucleotide repeat present within a target region, wherein n1 is a positive integer as defined above. The present invention is the first to confirm that the TTTGA-type pentanucleotide repeat in the SAMD12 gene is associated with familial cortical myoclonic tremor epilepsy.
As used herein, the term "TTTCA-type pentanucleotide repeat" refers to a (TTTCA)n2 pentanucleotide repeat present within a target region, wherein n2 is a positive integer as defined above. The TTTCA-type pentanucleotide repeat in the SAMD12 gene has been shown to be associated with familial cortical myoclonic tremor epilepsy.
SAMD12 gene
SAMD12 (Sterile Alpha Motif Domain containing 12, NCBI Gene ID: 401474) is a protein-coding gene whose transcript ENST00000409003.5 contains 5 coding exons. It has previously been reported as a causative gene for FCMTE, the causative mutation being a TTTCA-type pentanucleotide repeat insertion located within intron 4.
The specific functions of the protein encoded by the SAMD12 gene are not yet fully known.
LR-PCR
The polymerase chain reaction (PCR) is a molecular biology technique for the in vitro enzymatic synthesis of specific DNA fragments, starting from oligonucleotide primers complementary to the sequence to be amplified. Its basic principle resembles the natural replication of DNA, and its specificity depends on oligonucleotide primers complementary to the two ends of the target sequence. PCR consists of three basic steps: denaturation, annealing and extension. In each cycle, a DNA polymerase (such as Taq DNA polymerase) uses dNTPs as substrates and the target sequence as template to synthesize a new strand complementary to the template strand, following base pairing and semi-conservative replication. Repeating the denaturation-annealing-extension cycle increases the number of specific DNA fragments exponentially, so that a large quantity of a specific gene fragment can be obtained in a short time.
In the present invention, long-range PCR (LR-PCR) refers to a PCR reaction whose amplification product is 4 kb or longer (preferably 5 kb or longer). Conventional PCR can generally amplify fragments of about 3-4 kb; in the invention, targets longer than 5 kb that conventional PCR cannot amplify are obtained by adjusting the PCR reaction conditions and the type of DNA polymerase used (e.g., a Taq polymerase).
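The exponential increase described above can be made concrete with the idealized amplification formula below, where N_0 is the starting number of template copies, E (0 ≤ E ≤ 1) the per-cycle amplification efficiency and c the number of cycles. It assumes a constant efficiency in every cycle, which no longer holds once the reaction approaches its plateau; with perfect doubling (E = 1), 30 cycles yield 2^30 ≈ 1.07 × 10^9 copies per starting template.

```latex
N_c = N_0 \, (1 + E)^{c},
\qquad
N_{30} \big|_{E = 1} = N_0 \cdot 2^{30} \approx 1.07 \times 10^{9} \, N_0
```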
In the present invention, the LR-PCR amplification product is usually 4.5-15 kb, preferably 5-10 kb, and more preferably 5-8 kb.
In the present invention, amplification of long DNA fragments can be further improved by optimizing PCR conditions, for example by using a specific polymerase and optimizing the amount of template DNA and the Mg2+ concentration.
A preferred LR-PCR polymerase is one of TAKARA's specialized DNA polymerases (Takara LA Taq DNA Polymerase or PrimeSTAR GXL DNA Polymerase), which can amplify long fragments of up to several tens of kb, including sequences with AT-rich repeats or high GC content.
For LR-PCR procedures, see also: Waggott W. Long Range PCR. In: Lo Y.M.D. (eds) Clinical Applications of PCR. Methods in Molecular Medicine™, vol 16. 1998. Humana Press; Saiki RK, Gelfand DH, Stoffel S, et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science. 1988. 239(4839):487–91.
Single molecule sequencing
In the present invention, read data for the long amplification products obtained by LR-PCR are obtained by single-molecule sequencing. Representative single-molecule sequencing methods include (but are not limited to) tSMS sequencing, nanopore sequencing, and the like.
Typically, single-molecule sequencing labels deoxynucleotides with fluorescent dyes and records changes in fluorescence intensity in real time with a microscope, so that very long sequences (up to 20 kb) can be read in a single reaction, breaking through the technical bottleneck of read length (100-.
For nucleic acid sequencing, the term "template" refers to a nucleic acid molecule that undergoes a sequencing reaction. For example, in a sequencing-by-synthesis reaction, the template is the molecule a polymerase uses to direct nascent-strand synthesis, i.e., it is complementary to the nascent strand produced. In nanopore-based sequencing methods, the template is a nucleic acid that passes through the nanopore, either intact or after nucleolytic degradation. The template may comprise, for example, DNA, RNA, or the like, or a combination thereof. In addition, the template may be single-stranded, double-stranded, or may comprise both single-stranded and double-stranded regions.
In the present invention, single-molecule sequencing systems are preferably used to detect nucleic acid templates by analyzing the reaction data (e.g., sequence and/or kinetic data) they produce. In particular, a modification in a template nucleic acid strand can cause a unique, identifiable change in the analytical reaction that allows the modification to be identified. In other embodiments, a modification in the template alters the way in which the current through the nanopore is perturbed as the template passes. In a preferred embodiment, such modifications are detected using single-molecule nucleic acid sequencing, in which each sequence read corresponds to a single molecule of the nucleic acid template. In preferred embodiments, the single-molecule sequencing technology detects individual nucleotides in real time, for example during nucleotide incorporation or passage through a nanopore. Such sequencing techniques are known in the art and include, for example, nanopore sequencing. For more information on nanopore sequencing, see, e.g., U.S. Patent No. 5,795,782; Kasianowicz, et al. (1996) Proc Natl Acad Sci USA 93(24): 13770-3; Ashkenasy, et al. (2005) Angew Chem Int Ed Engl 44(9): 1401-4; Howorka, et al. (2001) Nat Biotechnol 19(7): 636-9; Astier, et al. (2006) J Am Chem Soc 128(5): 1705-10; U.S.S.N. 13/083,320, filed on 8/4/2011; and Zhao, et al. (2007) Nano Letters 7(6): 1680-.
Furthermore, for single-molecule sequencing techniques, see also: Ameur A, Kloosterman WP, Hestand MS. Single-molecule sequencing: towards clinical applications. Trends Biotechnol. 2018. 37(1): 72-85; Mitsuhashi, et al. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biology. 2019. 20:58: 1-17.
Diseases caused by repeated mutation of polynucleotide
The polynucleotide repeat mutation diseases (REDs) to which the present invention applies are not particularly limited; the invention can be used for any disease, particularly a genetic disease, caused by abnormal expansion of repeat units of 3-12 nucleotides. Representative examples include (but are not limited to) spinocerebellar ataxia, myotonic dystrophy, C9ORF72-associated amyotrophic lateral sclerosis/frontotemporal dementia, fragile X syndrome, and the like.
Detection method
The invention provides a method for efficiently and accurately analyzing the mutation structure of polynucleotide repeat mutation diseases. By combining the strengths of long-range PCR and single-molecule sequencing, the method can efficiently and accurately analyze not only FCMTE but also the mutation structures of other polynucleotide repeat mutation diseases.
The invention is particularly suitable when the polynucleotide repeat region exceeds 500 bp. In the prior art, when the repeat region exceeds 500 bp, accurate results cannot be obtained even with technologies such as second-generation sequencing, because of interference from various factors, including the repetitive nature of the region itself.
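One reason a long repeat defeats short-read methods is geometric: a read can only report the full repeat structure if it covers the entire repeat plus some unique flanking sequence on both sides. The sketch below expresses that condition; the 50 bp anchor and the 150 bp read length in the example are illustrative assumptions, not values from this document, and the helper names are hypothetical.

```python
def min_spanning_read_length(repeat_len_bp: int, min_anchor_bp: int = 50) -> int:
    """Shortest read that covers the whole repeat plus min_anchor_bp of
    unique flanking sequence on each side (assumed needed to place it)."""
    return repeat_len_bp + 2 * min_anchor_bp


def can_span(read_len_bp: int, repeat_len_bp: int, min_anchor_bp: int = 50) -> bool:
    """True if a read of read_len_bp can cover the repeat and both anchors."""
    return read_len_bp >= min_spanning_read_length(repeat_len_bp, min_anchor_bp)


print(can_span(150, 600))    # False: a 150 bp read cannot cover a 600 bp repeat
print(can_span(5000, 2400))  # True: a 5 kb read covers a 2.4 kb repeat plus anchors
```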
Reagent kit
The invention also provides a kit for detecting FCMTE. The kit of the invention contains a first standard which is a nucleic acid sequence having a (TTTGA) n1 pentanucleotide repeat insertion mutation, wherein n1 is 50-800.
In another preferred embodiment, n1 is 100-500.
In another preferred embodiment, the kit further comprises a second standard, which is a nucleic acid sequence carrying a (TTTCA)n2 pentanucleotide repeat insertion mutation, wherein n2 is 100-700.
In another preferred embodiment, n2 is 200-500.
In another preferred embodiment, the kit further comprises a primer pair for long-fragment PCR.
In another preferred embodiment, the sequences of the primer pair for long-fragment PCR are shown in SEQ ID Nos. 1 and 2.
In another preferred embodiment, the kit further comprises a barcode nucleic acid for adding a barcode sequence to the amplification product.
In another preferred embodiment, the kit also contains m barcode nucleic acids, wherein m is a positive integer greater than or equal to 2.
In another preferred embodiment, m ≥ 5, preferably ≥ 10, more preferably ≥ 20, and most preferably ≥ 30.
In another preferred embodiment, m is from 5 to 5000, preferably from 10 to 2000, more preferably from 20 to 500.
In another preferred embodiment, m is 5 to 60, preferably 20 to 50, more preferably 35 to 45.
The main advantages of the invention include:
(a) In the invention, the target region is captured before single-molecule sequencing, so that many more (e.g., 50-300) high-accuracy (>90%) sub-reads of the target region are obtained. Compared with whole-genome single-molecule sequencing, which yields only a single-digit number of target-region sub-reads, this allows a more accurate analysis of the specific sequence content of the repeat mutation.
(b) The invention can significantly reduce false negatives. Even rare mutation structures that RP-PCR and LR-PCR may miss when testing for the SAMD12 pentanucleotide repeat insertion of FCMTE, such as (TTTTA)100(TTTCA)210(TTTTA)100, or the novel (TTTTA)n(TTTGA)n pentanucleotide repeat insertion, can still be detected accurately by the method of the invention.
(c) Whereas whole-genome single-molecule sequencing is expensive (about 30,000-50,000 yuan per case), the whole process of the invention costs only about 2,500 yuan, 1/12 or less of that, which gives the method greater clinical value.
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures in the following examples for which specific conditions are not given generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the manufacturer's recommendations. Unless otherwise indicated, percentages and parts are by weight.
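The cost ratio in (c) can be checked directly from the stated figures (about 2,500 yuan versus about 30,000-50,000 yuan per case). The sketch below only restates that arithmetic with exact fractions; `cost_fraction` is a hypothetical helper.

```python
from fractions import Fraction


def cost_fraction(targeted_yuan: int, whole_genome_yuan: int) -> Fraction:
    """Targeted-assay cost as an exact fraction of the whole-genome cost."""
    return Fraction(targeted_yuan, whole_genome_yuan)


for wgs in (30_000, 50_000):  # stated whole-genome range, yuan per case
    print(wgs, cost_fraction(2_500, wgs))
# 30000 1/12
# 50000 1/20
```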
General procedure
1. Long fragment PCR (Long-range PCR, LR-PCR)
1.1 Peripheral blood samples were collected from the individuals to be tested, and genomic DNA was extracted using the phenol-chloroform method;
1.2 LR-PCR with sample DNA. Reaction setup (50 μl per LR-PCR): 100-500 ng sample DNA, 0.2 μM each of primers SAMD12LF and SAMD12LR (see Table 1), 200 μM dNTP, 1× PrimeSTAR GXL buffer, and 1.25 U PrimeSTAR GXL DNA polymerase (TAKARA). Cycling conditions: denaturation at 98 ℃ for 1 min; 30 cycles of 98 ℃ for 10 s and 68 ℃ for 15 min; final extension at 72 ℃ for 10 min; hold at 4 ℃. The amplified long-fragment products carrying the mutation were clearly resolved on a 1% agarose gel (FIG. 3).
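Writing the cycling program out as data makes its run time explicit: the 15-minute annealing/extension step, repeated 30 times, dominates the total. The sketch below is a hypothetical representation for illustration (not a thermocycler file format) and counts hold times only.

```python
# Cycling program from step 1.2 as (step, temperature in degC, hold in s, repeats).
# Ramp times and the final 4 degC hold are not counted.
PROGRAM = [
    ("initial denaturation", 98, 60, 1),
    ("denaturation", 98, 10, 30),
    ("annealing/extension", 68, 15 * 60, 30),
    ("final extension", 72, 10 * 60, 1),
]


def total_hold_seconds(program) -> int:
    """Sum of hold times over all steps and cycles."""
    return sum(hold_s * repeats for _step, _temp_c, hold_s, repeats in program)


total = total_hold_seconds(PROGRAM)
print(f"{total} s = {total / 3600:.2f} h")  # 27960 s = 7.77 h
```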
TABLE 1 LR-PCR primer sequences
Primer | Sequence | SEQ ID NO |
---|---|---|
SAMD12LF | 5'-TGTGCAGCCATTGGTCCAGTCTT-3' | 1 |
SAMD12LR2 | 5'-GCTGGCAAAGTTCAGAGGTCACTT-3' | 2 |
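As a quick consistency check of the primers against the sequence listing, the sketch below computes the length and GC content of each primer; the lengths should equal the listed values (23 and 24 nt). This is an illustrative check, not a primer-design step from the original protocol.

```python
# Primer sequences as given in the sequence listing (written 5'->3').
PRIMERS = {
    "SAMD12LF": "TGTGCAGCCATTGGTCCAGTCTT",    # SEQ ID NO: 1
    "SAMD12LR2": "GCTGGCAAAGTTCAGAGGTCACTT",  # SEQ ID NO: 2
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")
# SAMD12LF: 23 nt, GC = 52.2%
# SAMD12LR2: 24 nt, GC = 50.0%
```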
2. Single-molecule sequencing on the PacBio platform
2.1 Sample purification and size selection before single-molecule sequencing: based on the length of the target large fragment observed when the LR-PCR samples were run on the gel, the target LR-PCR products were size-selected and recovered with the BluePippin automated DNA size-selection and recovery system;
2.2 Barcoding of the recovered target fragments from different samples: different barcode sequence tags were added to the target fragments of different samples using the SMRTbell Barcoded Adapter Complete Prep-96 (PN: 100-. All labeled target fragments were adjusted to the same concentration based on Qubit (Invitrogen) measurements, pooled, and purified with Agencourt AMPure XP beads according to the PacBio recommended library preparation protocol;
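Pooling barcoded samples means each read must later be assigned back to its sample. The sketch below shows the general idea with a simple Hamming-distance match at the read's 5' end; it is a simplified stand-in, not PacBio's barcoding software, and it assumes the barcode sits at the start of the read in a fixed orientation. The barcode sequences, sample labels, and the `assign_sample` helper are all hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Assign a read to the sample whose barcode best matches the read's 5' end.

    Returns None if no barcode is within max_mismatches or if two barcodes tie.
    """
    scores = sorted(
        (hamming(read[: len(bc)], bc), sample)
        for sample, bc in barcodes.items()
        if len(read) >= len(bc)
    )
    if not scores or scores[0][0] > max_mismatches:
        return None
    if len(scores) > 1 and scores[1][0] == scores[0][0]:
        return None  # ambiguous: equally close to two barcodes
    return scores[0][1]


# Hypothetical 10-nt barcodes; real PacBio barcodes and adapters differ.
BARCODES = {"S1": "ACGTACGTAC", "S2": "TGCATGCATG", "S3": "GGTTCCAAGG"}
print(assign_sample("ACGTACGTACTTTTATTTTA", BARCODES))  # S1
print(assign_sample("ACGAACGTACTTTTA", BARCODES))       # S1 (one mismatch)
print(assign_sample("CCCCCCCCCCTTTTA", BARCODES))       # None
```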
2.3 Single-molecule sequencing library construction: the PacBio library was constructed according to PacBio's SMRTbell Template Prep Kit 1.0 (100-259-100) protocol and the "Procedure & Checklist - 10 kb Template Preparation and Sequencing" instructions. DNA damage repair, end repair, and the other steps were all performed with 200 ng of labeled DNA. AMPure PB beads (Pacific Biosciences) were used for purification in all steps. Qualitative and quantitative analyses used the Agilent 2100 Fragment Analyzer and the Qubit fluorometer with Quant-iT dsDNA BR Assay Kits (Invitrogen);
2.4 Single-molecule sequencing: according to PacBio's protocol, SMRTbell templates were annealed to the v2 sequencing primer and bound to DNA Polymerase P6 using the DNA/Polymerase Binding Kit P6 (part #: 100-356-300), following Binding Calculator version 2.3.1.1. The polymerase-template complex was purified with the Pacific Biosciences MagBead Binding Kit (part #: 100-133-600), and the sample reaction was set up as directed by the Binding Calculator. The samples were added to a single SMRT cell v3 (Part #: 100-;
2.5 Post-processing of single-molecule sequencing data: sequencing data were processed with the Pacific Biosciences SMRT Portal and SMRT Analysis (v2.3.0) bioinformatics software.
3. Data analysis: target sequences in the single-molecule sequencing data were screened and statistically analyzed with bioinformatics software. CCS reads matching the length of the target sequence were selected, and of these, reads with a predicted accuracy ≥ 90% were retained for further analysis; sub-reads covering the entire SAMD12 target sequence were taken as target-region sub-reads (FIG. 4). The number of target-region sub-reads was counted for each sample, and R was used to calculate, for every target-region sub-read of each sample, the total length of the repeat region ((TTTTA) + (TTTCA) or (TTTTA) + (TTTGA)), the (TTTTA) length, and the (TTTCA) or (TTTGA) length. The median of each length was taken as the representative result for the specific mutation structure of that sample's repeat sequence.
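The per-sample summary in step 3 can be sketched as follows. The original analysis used R; this is a Python re-expression under simplifying assumptions: each sub-read is a plain string with a predicted accuracy, each repeat length is taken as the longest uninterrupted run of the motif (so a sequencing error inside a run would shorten it), and the length window, toy reads, and helper names are hypothetical.

```python
import re
from statistics import median


def run_length_bp(seq: str, motif: str) -> int:
    """Length in bp of the longest uninterrupted (motif)n run in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) for run in runs), default=0)


def summarize(reads, min_len, max_len, min_accuracy=0.90, second_motif="TTTCA"):
    """Filter reads by length and predicted accuracy, then report medians.

    reads: iterable of (sequence, predicted_accuracy) pairs.
    second_motif: "TTTCA" or "TTTGA", depending on the allele analyzed.
    """
    kept = [seq for seq, acc in reads
            if min_len <= len(seq) <= max_len and acc >= min_accuracy]
    if not kept:
        return {"N": 0}
    tttta = [run_length_bp(s, "TTTTA") for s in kept]
    other = [run_length_bp(s, second_motif) for s in kept]
    total = [a + b for a, b in zip(tttta, other)]
    return {"N": len(kept), "total_bp": median(total),
            "TTTTA_bp": median(tttta), f"{second_motif}_bp": median(other)}


def toy(x, y):
    """Hypothetical sub-read: flank + (TTTTA)x(TTTCA)y + flank."""
    return "AC" + "TTTTA" * x + "TTTCA" * y + "GT"


READS = [(toy(2, 3), 0.95), (toy(2, 4), 0.93), (toy(3, 3), 0.97),
         (toy(5, 0), 0.80)]  # the last read fails the 90% accuracy filter
print(summarize(READS, min_len=20, max_len=40))
# {'N': 3, 'total_bp': 30, 'TTTTA_bp': 10, 'TTTCA_bp': 15}
```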
Example 1
Identification of a novel pentanucleotide repeat insertion mutation, (TTTGA)n, by combining LR-PCR and single-molecule sequencing
In one FCMTE pedigree, the cause of disease was first investigated by RP-PCR and LR-PCR.
The inventors found a long amplified fragment in the target region in this pedigree, but RP-PCR indicated only a (TTTTA)n repeat expansion and no (TTTCA)n repeat expansion (FIG. 2).
By Sanger sequencing of the LR-PCR long-fragment products, the inventors found a new, previously unreported (TTTGA)n pentanucleotide repeat insertion at the 3' end (FIG. 7), but still could not confirm whether a disease-causing (TTTCA)n pentanucleotide repeat insertion was also present inside the long fragment.
The inventors then selected 3 samples from the pedigree (III:4, II:6 and IV:2) in which RP-PCR had suggested a (TTTTA)n repeat expansion, obtained long-fragment products by LR-PCR, and performed single-molecule sequencing. This confirmed that the long fragment cosegregating with the disease in the pedigree contained only the (TTTGA)n pentanucleotide repeat insertion and no (TTTCA)n pentanucleotide repeat.
With the invention, the inventors established for the first time that this FCMTE pedigree, in which RP-PCR detected no (TTTCA)n pentanucleotide repeat insertion (FIG. 6), carries a new, previously unreported form of FCMTE caused by a (TTTGA)n pentanucleotide repeat insertion mutation (FIG. 1, FIG. 4, FIG. 5 and FIG. 8).
Example 2
Analysis of the specific sequence of the SAMD12 (TTTCA)n pentanucleotide repeat insertion in FCMTE by combining LR-PCR and single-molecule sequencing
In another case studied by the inventors, an FCMTE sample (P-I-III:2) in which a (TTTCA)n pentanucleotide repeat insertion had been confirmed by RP-PCR and by Sanger sequencing of the LR-PCR product was re-analyzed for its repeat mutation structure by combining LR-PCR and single-molecule sequencing.
The results show that LR-PCR amplified the corresponding long-fragment product (FIG. 3), and single-molecule sequencing further confirmed that the specific sequence of the long-fragment product is (TTTTA)35(TTTCA)481 (FIG. 4).
Thus, for this FCMTE sample (P-I-III:2), the mutation structure of the polynucleotide repeat mutation was accurately defined for the first time, namely the specific sequence of the (TTTCA)n pentanucleotide repeat insertion in the SAMD12 gene.
The disease mutation structures of 4 FCMTE samples analyzed in examples 1 and 2 are summarized in table 2.
Table 2. Statistics of the length and content of target-region sub-reads for the 4 FCMTE samples
Note:
1. The table summarizes the median repeat lengths and the median repeat numbers.
2. N represents the number of usable sub-reads obtained from the target long reads.
3. N.D. indicates not detected. n = (length - 3 bp)/5 bp.
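The conversion in note 3 can be written as a pair of helpers. The 3 bp offset is taken from the note as stated; the function names and the example value are illustrative only.

```python
def repeat_number(length_bp: float) -> float:
    """n = (length - 3 bp) / 5 bp, following note 3 of Table 2."""
    return (length_bp - 3) / 5


def region_length(n: int) -> int:
    """Inverse of repeat_number: the length in bp that gives n repeats."""
    return 5 * n + 3


print(repeat_number(1003))  # 200.0
print(region_length(200))   # 1003
```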
Discussion
In 2018, the first FCMTE causative gene (SAMD12) and its pathogenic mutation, the (TTTCA)n pentanucleotide repeat insertion (FIG. 1), were identified, making molecular genetic diagnosis of FCMTE possible for the first time.
However, the currently reported methods for detecting the (TTTCA)n pentanucleotide repeat insertion are mainly RP-PCR, LR-PCR, and Southern blot. Combining RP-PCR with LR-PCR or Southern blot can qualitatively determine whether a (TTTCA)n pentanucleotide repeat insertion is present in intron 4 of the SAMD12 gene, but the following problem remains and may cause false-negative results. Typically, the (TTTCA)n pentanucleotide repeat insertion lies downstream (i.e., 3') of a stretch of (TTTTA)n pentanucleotide repeats, for example (TTTTA)200(TTTCA)210 (FIG. 1); RP-PCR with primers designed to specifically detect the (TTTCA)n insertion then produces a saw-like pattern in capillary electrophoresis (FIG. 2). However, it has also been reported that the (TTTCA)n insertion can lie within the (TTTTA)n repeat, e.g., (TTTTA)100(TTTCA)210(TTTTA)100. In this case, although LR-PCR (primers shown in Table 1) or Southern blot (FIG. 3) can detect the expanded long allele, normal individuals may also carry a long allele with only a (TTTTA)n expansion, so these methods cannot determine whether the tested individual carries a pathogenic (TTTCA)n pentanucleotide repeat insertion, which may lead to missed diagnosis.
During mutation detection in FCMTE, a novel pentanucleotide repeat insertion, (TTTGA)n, was unexpectedly discovered; RP-PCR and LR-PCR showed that it cosegregates with the disease in a pedigree (see Example 1), and Sanger sequencing from both ends predicted its structure to be (TTTTA)n(TTTGA)n (FIG. 1).
Because Sanger sequencing cannot cover the entire LR-PCR long fragment, the inventors still could not determine whether a (TTTCA)n pentanucleotide repeat insertion was also present inside it. Similar problems arise in spinocerebellar ataxias (SCAs) such as SCA10, SCA31, and SCA37, which carry pentanucleotide repeat insertions not present in the reference sequence. Most current detection methods cannot accurately determine the detailed sequence of such mutations, so existing methods still have clear shortcomings in detecting these mutations and determining their sequence content.
Although single-molecule sequencing has some application value, its practical use here is greatly limited. First, existing detection of the SAMD12 pentanucleotide repeat insertion by single-molecule sequencing relies on whole-genome single-molecule sequencing, whose average effective coverage depth is only about 8X; only 1-2 reads, or even none, span the SAMD12 pentanucleotide repeat insertion, which can lead to missed detection.
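The risk of missed detection can be illustrated with a simple model in which the number of reads spanning the repeat follows a Poisson distribution. Assuming, for illustration only, an expected 1-2 spanning reads (in line with the figure quoted above), the chance of obtaining no spanning read at all is substantial. The Poisson assumption and the helper name are hypothetical, not part of the original analysis.

```python
import math


def prob_at_most(k: int, mean_reads: float) -> float:
    """P(X <= k) for X ~ Poisson(mean_reads)."""
    return sum(math.exp(-mean_reads) * mean_reads ** i / math.factorial(i)
               for i in range(k + 1))


for mean in (1.0, 1.5, 2.0):
    print(f"expected spanning reads {mean}: P(none) = {prob_at_most(0, mean):.3f}")
# expected spanning reads 1.0: P(none) = 0.368
# expected spanning reads 1.5: P(none) = 0.223
# expected spanning reads 2.0: P(none) = 0.135
```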
Second, even when 1-2 reads spanning the SAMD12 pentanucleotide repeat insertion are obtained, accurately analyzing the specific sequence content within those reads remains very difficult. Single-base misreads are an inherent technical weakness of single-molecule sequencing, and so few reads cannot be corrected algorithmically. Detection of the SAMD12 pentanucleotide repeat insertion by whole-genome single-molecule sequencing therefore remains qualitative: it shows that a repeat expansion exists but cannot provide reliable sequence content such as the exact repeat numbers and arrangement of (TTTTA)n, (TTTCA)n, and (TTTGA)n.
Third, whole-genome single-molecule sequencing is still very expensive and currently remains at the research stage, so it cannot yet be widely adopted in clinical testing.
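Why many reads help can be illustrated with a deliberately simplified majority-vote model: if each read independently miscalls a base with probability p, the consensus fails only when at least half the reads are wrong. With p = 0.10 (matching the >90% accuracy threshold used in this document), a single read is wrong 10% of the time, while a consensus of dozens of reads is almost never wrong. Real single-molecule errors are mostly insertions and deletions and are not fully independent, so this is an illustration under stated assumptions, not the authors' error model; `majority_error` is a hypothetical helper.

```python
from math import comb


def majority_error(n_reads: int, per_read_error: float) -> float:
    """Probability that at least half of n_reads are wrong at a given base.

    Simplified model: errors are independent, all wrong calls are treated as
    agreeing with each other (worst case), and a tie counts as a failure.
    """
    threshold = (n_reads + 1) // 2  # ceil(n_reads / 2)
    p = per_read_error
    return sum(comb(n_reads, w) * p ** w * (1 - p) ** (n_reads - w)
               for w in range(threshold, n_reads + 1))


for n in (1, 3, 51):
    print(n, f"{majority_error(n, 0.10):.3g}")
# 1 0.1
# 3 0.028
# (the 51-read value is many orders of magnitude smaller)
```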
To meet the need, in SAMD12 mutation testing for FCMTE, to analyze the specific sequence content of the mutated long fragment in detail, and taking full account of the technical and practical limitations of existing approaches (RP-PCR, LR-PCR or Southern blot, and whole-genome single-molecule sequencing), the present method combines LR-PCR with target-region single-molecule sequencing. It resolved in detail, for the first time, different pentanucleotide repeat insertions in intron 4 of the SAMD12 gene, confirmed that the method can analyze the detailed sequence content of long polynucleotide repeat mutations, and identified (TTTGA)n for the first time as a new pathogenic pentanucleotide repeat insertion in FCMTE.
Technically, the invention obtains more usable reads, which greatly improves the accuracy of sequence analysis. In terms of cost, the total sequencing volume is much lower than for whole-genome sequencing, which greatly reduces cost and keeps the total at the level of a few thousand yuan.
It will be appreciated that although the examples analyze the detailed sequence content of the SAMD12 pentanucleotide repeat insertion in FCMTE, the method of the invention can clearly also be used to analyze other mutation structures of FCMTE, as well as the mutation structures of other polynucleotide repeat mutation diseases.
The invention can also serve as a reference for more detailed sequence analysis of similar polynucleotide repeat mutations, and provides a more clinically accurate method for molecular genetic testing and diagnosis.
All documents referred to herein are incorporated by reference into this application as if each were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.
Sequence listing
<110> Zhejiang university
<120> method for analyzing mutant structure of repeated mutation disease of polynucleotide based on long fragment PCR and single molecule sequencing
<130> P2019-0707
<160> 2
<170> SIPOSequenceListing 1.0
<210> 1
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
tgtgcagcca ttggtccagt ctt 23
<210> 2
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
gctggcaaag ttcagaggtc actt 24
Claims (1)
1. Use of a detection reagent, characterized in that the detection reagent is used to prepare a detection kit for diagnosing familial cortical myoclonic tremor with epilepsy, wherein the detection reagent is a reagent for detecting (TTTGA)n1 pentanucleotide repeats in the SAMD12 gene, and n1 is 50-800.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458674.9A CN110129422B (en) | 2019-05-29 | 2019-05-29 | Method for analyzing mutation structure of repeated mutation disease of polynucleotide based on long-fragment PCR and single-molecule sequencing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110129422A CN110129422A (en) | 2019-08-16 |
CN110129422B true CN110129422B (en) | 2021-06-29 |
Family
ID=67582728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910458674.9A Active CN110129422B (en) | 2019-05-29 | 2019-05-29 | Method for analyzing mutation structure of repeated mutation disease of polynucleotide based on long-fragment PCR and single-molecule sequencing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110129422B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114317728B (en) * | 2022-03-11 | 2022-06-07 | 北京贝瑞和康生物技术有限公司 | Primer group, kit, method and system for detecting multiple mutations in SMA |
CN115410649B (en) * | 2022-04-01 | 2023-03-28 | 北京吉因加医学检验实验室有限公司 | Method and device for simultaneously detecting methylation and mutation information |
CN114807360B (en) * | 2022-06-27 | 2022-09-02 | 北京贝瑞和康生物技术有限公司 | Method and kit for detecting fragile X syndrome mutation |
Non-Patent Citations (3)
Title |
---|
Detecting expansions of tandem repeats in cohorts sequenced with short-read sequencing data; Tankard RM, et al.; Am J Hum Genet.; 20181231; vol. 103; 858-873 * |
Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy; Ishiura H, et al.; Nat Genet.; 20181231; vol. 50; 581-590 * |
Intronic pentanucleotide TTTCA repeat insertion in the SAMD12 gene causes familial cortical myoclonic tremor with epilepsy type 1; Zhidong Cen, et al.; Brain; 20180623; vol. 141, no. 8; 2280-2288 * |
Also Published As
Publication number | Publication date |
---|---|
CN110129422A (en) | 2019-08-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant ||