CN110129422A

CN110129422A - Method for resolving the mutation structure of polynucleotide repeat expansion diseases based on long-range PCR and single-molecule sequencing

Info

Publication number: CN110129422A
Application number: CN201910458674.9A
Authority: CN
Inventors: 罗巍; 岑志栋; 姜正文; 杨德壕; 付爱思; 胡奔
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2019-08-16
Anticipated expiration: 2039-05-29
Also published as: CN110129422B

Abstract

The present invention provides a method for resolving the mutation structure of polynucleotide repeat expansion diseases based on long-range PCR and single-molecule sequencing. The method comprises the steps of: (a) providing a test sample, the test sample being a nucleic acid sample containing genomic DNA; (b) performing long-range PCR on the test sample to obtain a first amplification product; (c) adding a barcode sequence to the end of the amplification product to form a barcoded first amplification product; and (d) performing single-molecule sequencing on the barcoded amplification product to obtain a data set of subreads corresponding to the target region. Based on this data set, the mutation structure of a polynucleotide repeat expansion disease can be resolved accurately. The invention is characterized by high efficiency, high accuracy and low cost.

Description

Method for resolving the mutation structure of polynucleotide repeat expansion diseases based on long-range PCR and single-molecule sequencing

Technical field

The present invention relates to the field of biotechnology, and more particularly to a method for resolving the mutation structure of polynucleotide repeat expansion diseases based on long-range PCR and single-molecule sequencing.

Background art

Polynucleotide repeat expansion diseases (repeat expansion disease, RED) are a large class of genetic diseases caused by abnormal expansion of repeat sequences of 3-12 nucleotides. In addition to the relatively common trinucleotide repeat expansion mutations such as (CAG)n and (CGG)n, more and more polynucleotide repeat expansion mutations are being reported. In this class of diseases, for example the (GGGGCC)n hexanucleotide repeat mutation of the C9ORF72 gene and the (CCCCGCCCCGCG)n repeat of the CSTB gene, the abnormally expanded sequence can reach several kb or even tens of kb in length. Determining the detailed structure of the mutation, and exploring the pathogenic mechanisms of different structures and their relationship to clinical phenotype, has long been a major challenge.

Take familial cortical myoclonic tremor with epilepsy (FCMTE) as an example. It is a group of autosomal dominant hereditary epilepsy syndromes with marked clinical and genetic heterogeneity. The main clinical manifestations of FCMTE are adult onset, cortical tremor and myoclonus of the distal limbs, with or without epileptic seizures; it may be accompanied by cognitive impairment, dementia, night blindness, migraine, ataxia and other symptoms. Electrophysiological examination may show abnormal electromyogram and EEG findings, such as giant somatosensory evoked potentials (G-SEP) of cortical origin and the long-latency cortical reflex (LLCR or C-reflex). Antiepileptic drugs are effective.

The clinical phenotype of FCMTE is complex, and under earlier diagnostic criteria a definite diagnosis required complicated electrophysiological examinations such as G-SEP and C-reflex. As a result, many cases are missed or misdiagnosed and do not receive targeted, accurate diagnosis and treatment.

Single-molecule sequencing (also called long-read sequencing) provides a new means of detecting this class of polynucleotide repeat mutations, and other groups have applied it to the study of the pentanucleotide repeat insertion mutation of the SAMD12 gene in FCMTE. However, it still has various limitations that lead to a high rate of missed detection: it remains a qualitative test that can show whether an expanded repeat sequence is present but cannot provide reliable information on sequence content. In addition, single-molecule sequencing at the whole-genome level is very expensive and cannot be rolled out in clinical testing.

Therefore, there is an urgent need in the art to develop a new method for efficiently and accurately resolving the mutation structure of polynucleotide repeat expansion diseases.

Summary of the invention

It is an object of the present invention to provide a new method for efficiently and accurately resolving the mutation structure of polynucleotide repeat expansion diseases.

In a first aspect of the present invention, there is provided a method for resolving the mutation structure of a polynucleotide repeat expansion disease, or a method for resolving the structure of a polynucleotide repeat region, the method comprising the steps of:

(a) providing a test sample, the test sample being a nucleic acid sample containing genomic DNA;

(b) performing long-range PCR on the test sample to obtain a first amplification product;

(c) adding a barcode sequence to the end of the amplification product to form a barcoded first amplification product;

(d) performing single-molecule sequencing on the barcoded amplification product to obtain a data set of subreads corresponding to the target region (i.e. subreads corresponding to the polynucleotide repeat region).

In another preferred example, the method also includes:

(e) analyzing the data set to obtain the mutation structure of the target region (i.e. the polynucleotide repeat region).

In another preferred example, between steps (c) and (d), the method further comprises:

(d0) mixing the barcoded first amplification product with m-1 other barcoded amplification products to obtain a mixed amplification product library;

wherein the m-1 barcoded amplification products are the 2nd, 3rd, ... and m-th amplification products, each prepared by steps (a), (b) and (c) and each carrying a different barcode sequence, so that the mixed amplification product library contains m kinds of barcoded amplification products;

wherein m is a positive integer ≥ 2.

In another preferred example, m ≥ 5, preferably ≥ 10, more preferably ≥ 20, most preferably ≥ 30.

In another preferred example, m is 5-5000, preferably 10-2000, more preferably 20-500.

In another preferred example, m is 5-60, preferably 20-50, more preferably 35-45.

In another preferred example, in step (d), single-molecule sequencing is performed on the mixed amplification product library to obtain a data set of subreads corresponding to the target region (i.e. subreads corresponding to the polynucleotide repeat region).

In another preferred example, in step (e), the data set is split according to the different barcode sequences, and reads carrying the same barcode sequence are then grouped and analyzed, thereby obtaining the mutation structure of the target region (i.e. the polynucleotide repeat region) for each of the m test samples.
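The barcode-based splitting in step (e) can be illustrated with a minimal Python sketch. It is not the software used in the invention: the barcode sequences, the sample names, the `demultiplex` function and the one-mismatch tolerance are all hypothetical, and the sketch assumes the barcode sits at the very start of each read.

```python
from collections import defaultdict

# Hypothetical 8-nt barcodes; a real barcoding kit defines its own sequences.
BARCODES = {
    "sample_01": "ACGTACGT",
    "sample_02": "TGCATGCA",
}

def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by the barcode found at their 5' end (assumed position)."""
    bins = defaultdict(list)
    for read in reads:
        best_sample, best_dist = None, max_mismatches + 1
        for sample, bc in barcodes.items():
            prefix = read[:len(bc)]
            if len(prefix) < len(bc):
                continue  # read shorter than the barcode
            # Hamming distance between the read prefix and the barcode
            dist = sum(a != b for a, b in zip(prefix, bc))
            if dist < best_dist:  # on a tie, the first barcode listed wins
                best_sample, best_dist = sample, dist
        if best_sample is None:
            bins["unassigned"].append(read)
        else:
            # store the read with its barcode trimmed off
            bins[best_sample].append(read[len(barcodes[best_sample]):])
    return dict(bins)

reads = [
    "ACGTACGT" + "TTTTATTTGA",  # exact match to sample_01
    "TGCATGCA" + "TTTTA",       # exact match to sample_02
    "ACGTACGA" + "TTTCA",       # one mismatch to sample_01
    "GGGGGGGG" + "TTTTA",       # matches no barcode
]
bins = demultiplex(reads, BARCODES)
```

Each bin can then be analyzed on its own, which is what allows m samples to share one sequencing run.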

In another preferred example, the length of the polynucleotide repeat region is 200-10000 bp, preferably 1500-5000 bp.

In another preferred example, the polynucleotide repeat is a repeat of a nucleotide unit of 3-12 nt.

In another preferred example, the polynucleotide repeat region contains one or more polynucleotide repeats.

In another preferred example, the polynucleotide repeat expansion disease is selected from the group consisting of: familial cortical myoclonic tremor with epilepsy (e.g. types 1, 6 and 7); C9ORF72-related amyotrophic lateral sclerosis/frontotemporal dementia; spinocerebellar ataxia (e.g. types 8, 10, 31, 36 and 37); and myotonic dystrophy (e.g. types 1 and 2).

In another preferred example, between steps (b) and (c), the method further comprises:

(c0) separating the first amplification product to obtain an isolated and purified first amplification product.

In another preferred example, when the method is applied to m amplification products, the m amplification products are separated to obtain m separated amplification products.

In another preferred example, the amplification product is size-selected and recovered using BluePippin.

In another preferred example, the method is non-diagnostic and non-therapeutic.

In a second aspect of the present invention, there is provided a kit for diagnosing familial cortical myoclonic tremor with epilepsy (FCMTE), the kit containing a first standard, the first standard being a nucleic acid sequence carrying a (TTTGA)n1 pentanucleotide repeat insertion mutation, wherein n1 is 50-800.

In another preferred example, n1 is 100-500.

In another preferred example, the kit further contains a second standard, the second standard being a nucleic acid sequence carrying a (TTTCA)n2 pentanucleotide repeat insertion mutation, wherein n2 is 100-700.

In another preferred example, n2 is 200-500.

In another preferred example, the kit further contains a primer pair for long-range PCR.

In another preferred example, the sequences of the primer pair for long-range PCR are as shown in SEQ ID No: 1 and 2.

In a third aspect of the present invention, there is provided a use of the kit of the second aspect of the invention for preparing a detection kit for diagnosing familial cortical myoclonic tremor with epilepsy (FCMTE).

In a fourth aspect of the present invention, there is provided a use of a detection reagent, the detection reagent being for detecting a (TTTGA)n1 pentanucleotide repeat in the SAMD12 gene, wherein the detection reagent is used for preparing a detection kit for diagnosing familial cortical myoclonic tremor with epilepsy (FCMTE).

In a fifth aspect of the present invention, there is provided a method for diagnosing FCMTE, comprising the step of: detecting whether a TTTGA-type pentanucleotide repeat is present in the SAMD12 gene of a subject;

wherein, if a TTTGA-type pentanucleotide repeat is present, this indicates that the subject has FCMTE or that the subject's likelihood of developing FCMTE (i.e. susceptibility) is higher than that of the normal population.

In a sixth aspect of the present invention, there is provided a system (or device) for resolving the mutation structure of a polynucleotide repeat expansion disease, the system comprising:

(i) an LR-PCR amplification module configured to perform long-range PCR on a test sample to obtain a first amplification product, wherein the test sample is a nucleic acid sample containing genomic DNA;

(ii) an amplification product post-processing module configured to add a barcode sequence to the end of the amplification product to form a barcoded first amplification product; and

(iii) a single-molecule sequencing module configured to perform single-molecule sequencing on the barcoded amplification product to obtain a data set of subreads corresponding to the target region (i.e. subreads corresponding to the polynucleotide repeat region).

In another preferred example, the system further comprises:

(iv) a data analysis module configured to analyze the data set to obtain the mutation structure of the target region (i.e. the polynucleotide repeat region).

It should be understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (e.g. in the examples) can be combined with one another to form new or preferred technical solutions. For reasons of space, they are not enumerated one by one here.

Brief description of the drawings

Fig. 1 shows a model of the pentanucleotide repeat insertion mutations in intron 4 of the SAMD12 gene. The normal sequence is generally (TTTTA)₇TTA(TTTTA)₁₃; the two mutant sequences are (TTTTA)exp(TTTGA)exp and (TTTTA)exp(TTTCA)exp, respectively (exp: repeat expansion; it indicates that an expanded repeat sequence is present and does not denote a copy number).

Fig. 2 shows the RP-PCR results for the pentanucleotide repeat insertion mutation in intron 4 of the SAMD12 gene. For the two samples tested, RP-PCR indicates expansion of the (TTTTA)n and (TTTGA)n repeat sequences but no expansion of the (TTTCA)n repeat sequence.

Fig. 3 shows the gel electrophoresis results of LR-PCR for the SAMD12 pentanucleotide repeat insertion mutation. Samples III:4, II:6 and IV:2 show an abnormal amplification band at about 2000 bp; sample P-I-III2 shows an abnormal amplification band at about 3000 bp.

Fig. 4 shows representative target-region subreads from single-molecule sequencing of two FCMTE samples. The detailed sequence of the abnormal amplification band of sample II:6 is (TTTTA)₅TTA(TTTTA)₁₁₄(TTTGA)₁₁₁; the detailed sequence of the abnormal amplification band of sample P-I-III2 is (TTTTA)₃TTA(TTTTA)₃₂(TTTCA)₄₈₁.

Fig. 5 shows the length and content distributions of the target-region subreads of 4 FCMTE samples: A-D, length distribution of the abnormal amplification band of each sample; E-H, length distributions of (TTTTA)n and of (TTTGA)n or (TTTCA)n within the abnormal amplification band of each sample. Dashed lines indicate medians (see Table 2 for the values).

Fig. 6 shows the pedigree, mutant sequence structure and LR-PCR gel image for the pathogenic (TTTGA)n pentanucleotide repeat insertion mutation.

Fig. 7 shows Sanger sequencing of the LR-PCR product from a sample of the family carrying the pathogenic (TTTGA)n pentanucleotide repeat insertion mutation, together with a normal control: Sanger sequencing of the normal control indicates the repeat structure (TTTTA)₇TTA(TTTTA)₁₃; Sanger sequencing of the long-range PCR product indicates (TTTTA)exp at the 5' end and (TTTGA)exp at the 3' end.

Fig. 8 shows representative target-region subreads from single-molecule sequencing of two further FCMTE samples carrying the pathogenic (TTTGA)n pentanucleotide repeat insertion mutation: the detailed sequences of the abnormal amplification bands of samples III:4 and IV:2 are (TTTTA)₅TTA(TTTTA)₁₁₉(TTTGA)₁₁₁ and (TTTTA)₅TTA(TTTTA)₁₀₈(TTTGA)₁₁₃, respectively.
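The compact structure notation used in the figure descriptions, such as (TTTTA)₅TTA(TTTTA)₁₁₄(TTTGA)₁₁₁, can be converted into a repeat-region length by simple arithmetic. The following Python sketch is only an illustration of that notation; the function names `parse_structure` and `repeat_region_length` are hypothetical, and flanking sequence outside the repeat region is not counted.

```python
import re

# Map Unicode subscript digits to ASCII digits so the notation can be parsed.
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# Either "(UNIT)count" or a bare interrupting sequence such as "TTA".
_TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")

def parse_structure(notation):
    """Return [(unit, copies), ...] for e.g. '(TTTTA)5TTA(TTTGA)111'."""
    text = notation.translate(_SUBSCRIPTS).replace(" ", "")
    blocks, pos = [], 0
    for m in _TOKEN.finditer(text):
        if m.start() != pos:
            raise ValueError(f"cannot parse {text[pos:m.start()]!r}")
        pos = m.end()
        if m.group(1):
            blocks.append((m.group(1), int(m.group(2))))
        else:
            blocks.append((m.group(3), 1))  # bare sequence occurs once
    if pos != len(text):
        raise ValueError(f"cannot parse {text[pos:]!r}")
    return blocks

def repeat_region_length(notation):
    """Total length in bp of the repeat region described by the notation."""
    return sum(len(unit) * copies for unit, copies in parse_structure(notation))

normal = repeat_region_length("(TTTTA)₇TTA(TTTTA)₁₃")              # 103 bp
ii6 = repeat_region_length("(TTTTA)₅TTA(TTTTA)₁₁₄(TTTGA)₁₁₁")      # 1153 bp
p_iii2 = repeat_region_length("(TTTTA)₃TTA(TTTTA)₃₂(TTTCA)₄₈₁")    # 2583 bp
```

Structures written with "exp" are rejected by the parser, since "exp" marks an expansion of unspecified copy number.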

Detailed description of embodiments

After extensive and in-depth research, the present inventors have for the first time developed a method for efficiently and accurately resolving the mutation structure of polynucleotide repeat expansion diseases. Specifically, the invention is based on LR-PCR and single-molecule sequencing: single-molecule sequencing of the target region is performed on the LR-PCR product, so that more effective reads are obtained (more usable data) while the total amount of sequencing is reduced (lower cost). The method is therefore characterized by high efficiency, high accuracy and low cost. Using the method, the inventors have also identified for the first time a new mutation structure in the polynucleotide repeat expansion disease FCMTE, namely a (TTTGA)n-type pentanucleotide repeat insertion mutation in the SAMD12 gene. The present invention was completed on this basis.

Terms

As used herein, the term "target-region subread (on-target subread)" refers to a read related to the disease mutation structure.

As used herein, the term "TTTGA-type pentanucleotide repeat" refers to a (TTTGA)n1 pentanucleotide repeat present in a target region, wherein n1 is a positive integer as defined above. In the present invention, the TTTGA-type pentanucleotide repeat in the SAMD12 gene is confirmed for the first time to be associated with familial cortical myoclonic tremor with epilepsy.

As used herein, the term "TTTCA-type pentanucleotide repeat" refers to a (TTTCA)n2 pentanucleotide repeat present in a target region, wherein n2 is a positive integer as defined above. The TTTCA-type pentanucleotide repeat in the SAMD12 gene has been confirmed to be associated with familial cortical myoclonic tremor with epilepsy.

SAMD12 gene

SAMD12 (Sterile Alpha Motif Domain Containing 12, NCBI ID: 401474) is a protein-coding gene whose transcript "ENST00000409003.5" contains 5 coding exons. It has previously been reported as a causative gene of FCMTE, the pathogenic mutation being a TTTCA-type pentanucleotide repeat insertion in intron 4.

The specific functions of the protein encoded by the SAMD12 gene are not yet clear.

LR-PCR

The polymerase chain reaction (PCR) is a molecular biology technique for the in vitro enzymatic synthesis of specific DNA fragments. Synthesis of the DNA fragment to be amplified starts from oligonucleotide primers complementary to its sequence. The basic principle of PCR resembles the natural replication of DNA, and its specificity depends on oligonucleotide primers complementary to both ends of the target sequence. PCR consists of three basic reaction steps: denaturation, annealing and extension. During amplification, under the action of a DNA polymerase (e.g. Taq DNA polymerase), with dNTPs as substrates and the target sequence as template, a new strand complementary to the template DNA strand is synthesized according to the principles of complementary base pairing and semi-conservative replication. Repeated cycles of denaturation, annealing and extension increase the amount of the specific DNA fragment exponentially, so that a large quantity of a specific gene fragment can be obtained in a short time.

In the present invention, long-range PCR (LR-PCR) refers to a PCR reaction whose amplification product is 4 kb or longer (preferably 5 kb or longer). By adjusting the PCR reaction conditions and the type of DNA polymerase used (e.g. Taq polymerase), target products of 5 kb or more are obtained that conventional PCR (which can generally amplify fragments of 3-4 kb) cannot amplify.

In the present invention, the amplification product of long-range PCR (LR-PCR) is usually 4.5-15 kb, preferably 5-10 kb, more preferably 5-8 kb.

In the present invention, when amplifying long DNA fragments, the result can usually be further improved by adjusting the PCR conditions, for example by using a specific polymerase or by optimizing the amount of template DNA and the Mg2+ concentration.

Preferred polymerases for LR-PCR include specific DNA polymerases from TAKARA (Takara LA Taq DNA polymerase and PrimeSTAR GXL DNA polymerase), which can amplify long fragments of up to tens of kb, including sequences with AT repeats or high GC content.

For the operation of LR-PCR, see also the following documents: Waggott W. Long Range PCR. In: Lo Y.M.D. (eds) Clinical Applications of PCR. Methods in Molecular Medicine, vol 16. 1998. Humana Press; Saiki RK, Gelfand DH, Stoffel S, et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science. 1988. 239(4839):487-91.

Single-molecule sequencing

In the present invention, the long-fragment amplification product obtained by LR-PCR is further subjected to single-molecule sequencing to obtain the corresponding read data. Representative single-molecule sequencing methods in the present invention include (but are not limited to) tSMS sequencing and nanopore sequencing.

Typically, the technology is based on the use of fluorescently labeled deoxynucleotides, with changes in fluorescence intensity recorded in real time through a microscope. This gives a new generation of sequencing in which a single reaction can read very long sequences (up to 20 kb), which effectively overcomes the read-length bottleneck (100-300 bp) of earlier second-generation sequencing technologies.

In nucleic acid sequencing, the term "template" denotes the nucleic acid molecule on which the sequencing reaction is performed. For example, in a sequencing-by-synthesis reaction, the template is the molecule used by the polymerase to direct synthesis of the nascent strand; it is, for example, complementary to the nascent strand being generated. In nanopore-based sequencing methods, the template is the nucleic acid that passes through the nanopore, either intact or after exonuclease degradation. The template may comprise, for example, DNA, RNA, analogs thereof, or combinations thereof. In addition, the template may be single-stranded or double-stranded, or may contain both single-stranded and double-stranded regions.

In the present invention, a single-molecule sequencing system is preferably used, and the nucleic acid template is analyzed through the reaction data obtained from such a system (e.g. sequence and/or kinetic data). Specifically, a modification in the template nucleic acid strand can cause a unique and recognizable change in the analytical reaction, and this change allows the modification to be identified. In other embodiments, a modification in the template can alter the way in which the current through a nanopore is disturbed as the template passes through it. In preferred embodiments, such modifications are detected using single-molecule nucleic acid sequencing technology, in which each sequence read generated corresponds to a single nucleic acid template molecule. In preferred embodiments, the single-molecule nucleic acid sequencing technology can detect each nucleotide in real time, for example as the nucleotide is incorporated or passes through a nanopore. Such sequencing technologies are known in the art and include, for example, nanopore sequencing. For more information on nanopore sequencing see, for example, U.S. Patent No. 5,795,782; Kasianowicz, et al. (1996) Proc Natl Acad Sci USA 93(24): 13770-3; Ashkenas, et al. (2005) Angew Chem Int Ed Engl 44(9): 1401-4; Howorka, et al. (2001) Nat Biotechnology 19(7): 636-9; Astier, et al. (2006) J Am Chem Soc 128(5): 1705-10; U.S.S.N. 13/083,320, filed April 8, 2011; and Zhao, et al. (2007) Nano Letters 7(6): 1680-1685, all of which are incorporated herein by reference in their entirety.

In addition, about single-molecule sequencing technology, referring also to following documents: Ameur A, Kloosterman WP, Hestand MS.Single-molecule sequencing:towards clinical applications.Trends Biotechnol.2018.37(1):72-85；Mitsuhashi,et al.Tandem-genotypes:robust detection of tandem repeat expansions from long DNA reads.Genome Biology.2019.20:58:1-17。

Polynucleotide repeat expansion diseases

The polynucleotide repeat expansion diseases (repeat expansion disease, RED) to which the present invention applies are not particularly limited and may be any disease, especially a genetic disease, caused by abnormal expansion of a repeat sequence of 3-12 nucleotides. Representative examples include (but are not limited to): spinocerebellar ataxia, myotonic dystrophy, C9ORF72-related amyotrophic lateral sclerosis/frontotemporal dementia, fragile X syndrome, etc.

Detection method

The present invention provides a method that can efficiently and accurately resolve the mutation structure of polynucleotide repeat expansion diseases. The method combines the advantages of long-range PCR and single-molecule sequencing, so that it can resolve not only FCMTE efficiently and accurately but also the mutation structures of other polynucleotide repeat expansion diseases.

The present invention is particularly suitable for cases in which the polynucleotide repeat region exceeds 500 bp. In the prior art, for such cases, accurate results cannot be obtained even with technologies such as second-generation sequencing, because of interference from many factors such as the polynucleotide repeats themselves.

Kit

The present invention also provides a kit that can be used to detect FCMTE. The kit of the invention contains a first standard, the first standard being a nucleic acid sequence carrying a (TTTGA)n1 pentanucleotide repeat insertion mutation, wherein n1 is 50-800.

In another preferred example, n1 is 100-500.

In another preferred example, n2 is 200-500.

In another preferred example, the kit further contains a barcode nucleic acid for adding a barcode sequence to the amplification product.

In another preferred example, the kit further contains m kinds of barcode nucleic acids, wherein m is a positive integer ≥ 2.

In another preferred example, m is 5-60, preferably 20-50, more preferably 35-45.

Main advantages of the present invention include:

(a) In the present invention, the target-region sequence is first captured and then subjected to single-molecule sequencing, which yields more target-region subreads (e.g. 50-300 or more) of higher accuracy (> 90%). By contrast, single-molecule sequencing at the whole-genome level yields only a single-digit number of target-region subreads. The specific sequence content of repeat mutations can therefore be analyzed more accurately.

(b) The present invention can substantially reduce false negatives. Even for false negatives that repeat-primed PCR (RP-PCR) and long-range PCR (LR-PCR) may miss when detecting the SAMD12 pentanucleotide repeat insertion mutation of FCMTE, such as the rare mutation structure (TTTTA)₁₀₀(TTTCA)₂₁₀(TTTTA)₁₀₀ or the new pentanucleotide repeat insertion sequence (TTTTA)n(TTTGA)n, the method of the invention can still detect them accurately.

(c) The present invention is low in cost. Compared with the high price of whole-genome single-molecule sequencing (currently about 30,000-50,000 yuan per sample), the cost of the complete workflow of the invention is only 1/12 of that or less (about 2,500 yuan), so it has higher clinical value.

The present invention is further described below with reference to specific examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope. Experimental methods in the following examples for which specific conditions are not given are usually carried out under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.

General methods

1. Long-range PCR (LR-PCR)

1.1 Collect a peripheral blood sample from the subject to be tested and extract genomic DNA from the sample by the phenol-chloroform method;

1.2 Perform LR-PCR on the sample DNA. LR-PCR reaction mix (50 μL per reaction): 100-500 ng sample DNA, 0.2 μM primers SAMD12LF and SAMD12LR (see Table 1), 200mM dNTP, 1x PrimeSTAR GXL buffer, 1.25 U PrimeSTAR GXL DNA polymerase (TAKARA). LR-PCR cycling parameters: denaturation at 98 °C for 1 minute; 30 cycles alternating between 98 °C for 10 seconds and 68 °C for 15 minutes; 72 °C for 10 minutes; hold at 4 °C to complete the PCR program. Electrophoresis on a 1% agarose gel confirms that the long fragment product carrying the mutation has been amplified (Fig. 3).
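As a rough planning aid, the hold time of the cycling program in 1.2 can be added up. The Python sketch below is only arithmetic on the stated step durations; the helper name `program_minutes` is hypothetical, and instrument ramp times between temperatures are not included.

```python
def program_minutes(initial_denat_s, cycles, cycle_steps_s, final_ext_s):
    """Sum of programmed hold times in minutes (ramp times excluded)."""
    total_s = initial_denat_s + cycles * sum(cycle_steps_s) + final_ext_s
    return total_s / 60

# 98 °C 1 min; 30 x (98 °C 10 s + 68 °C 15 min); 72 °C 10 min
minutes = program_minutes(
    initial_denat_s=60,
    cycles=30,
    cycle_steps_s=[10, 15 * 60],
    final_ext_s=10 * 60,
)
hours, rest = divmod(minutes, 60)  # 7 h 46 min of programmed holds
```

The 15-minute step at 68 °C dominates the total, so the run occupies a thermocycler for most of a working day.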

Table 1. LR-PCR primer sequences

Primer	Sequence	SEQ ID No:
SAMD12LF	5'-TGTGCAGCCATTGGTCCAGTCTT-3'	1
SAMD12LR2	3'-GCTGGCAAAGTTCAGAGGTCACTT-5'	2

2. Single-molecule sequencing on the PacBio sequencing platform

2.1 Sample purification and fragment size selection before single-molecule sequencing: using the BluePippin automated nucleic acid electrophoresis and fragment recovery system, the target LR-PCR product is size-selected and recovered, guided by the target fragment length observed when the LR-PCR sample was run on a gel;

2.2 Barcode labeling of the recovered target fragments of the different samples: different barcode sequences are added to the target fragments of different samples using SMRTbell Barcoded Adapter Complete Prep-96 (PN: 100-514-900). All labeled target fragments are adjusted to the same concentration based on Qubit (Invitrogen) measurement, pooled, and purified with Agencourt AMPure XP beads according to the recommended PacBio library preparation protocol;

2.3 Construction of the single-molecule sequencing library: the PacBio library is constructed according to the PacBio SMRTbell Template Prep Kit 1.0 (100-259-100) protocol and the "Procedure & Checklist - 10 kb Template Preparation and Sequencing" instructions. DNA damage repair, end repair, etc. are performed on 200 ng of labeled DNA. AMPure PB magnetic beads (Pacific Biosciences) are used for purification in all steps. Qualitative and quantitative analysis is performed with an Agilent 2100 Fragment Analyzer and a Qubit fluorometer with Quant-iT dsDNA BR Assay Kits (Invitrogen);

2.4 Single-molecule sequencing: the SMRTbell templates are annealed to the v2 sequencing primer and, following the PacBio protocol and under the guidance of Binding Calculator version 2.3.1.1, bound to DNA polymerase P6 using DNA/Polymerase Binding Kit P6 (part#: 100-356-300). The polymerase-template complexes are purified using the Pacific Biosciences MagBead Binding Kit (part#: 100-133-600), and the sample reactions are set up as directed by the Binding Calculator. The sample is loaded onto a single SMRT cell v3 (Part#: 100-171-800) with a run time of 360 minutes and, under the improved MagBead Station conditions, sequenced on a PacBio RS II (SMRT) sequencer using C4 DNA Sequencing Kit 2.0 (Part#: 100-216-400) reagents;

2.5 Post-run processing of single-molecule sequencing data: the sequencing data are processed with the Pacific Biosciences SMRT Portal and SMRT Analysis System (v2.3.0) bioinformatics software.

3. Target sequence screening and statistical analysis of the single-molecule sequencing data. Using bioinformatics software, CCS reads of the target sequence length are selected, and those with a predicted accuracy of at least 90% are retained for further analysis. Subreads covering the entire SAMD12 target sequence are used as target-region subreads (Fig. 4). The number of target-region subreads is counted for each sample. Using R, the total length of the repeat region within each target-region subread of each sample (TTTTA+TTTCA or TTTTA+TTTGA), the (TTTTA) length, and the (TTTCA) or (TTTGA) length are calculated, and the median of each length is taken as the representative result for the specific structure of the sample's repeat mutation.
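The statistics in step 3 can be sketched as follows. The original analysis was done in R; this Python version is only an illustration under simplifying assumptions: reads are given as hypothetical (sequence, predicted accuracy) pairs, both motifs have the same length, only the longest uninterrupted tract built from the two motifs is measured, and sequencing errors or interruptions such as TTA simply end a tract. The function names are not from the source.

```python
import re
from statistics import median

def repeat_lengths(read, first="TTTTA", second="TTTGA"):
    """Lengths (bp) of each motif within the longest pure tract of both motifs."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes motifs of equal length")
    tracts = [m.group(0) for m in re.finditer(f"(?:{first}|{second})+", read)]
    if not tracts:
        return None
    tract = max(tracts, key=len)
    k = len(first)
    # The tract is a concatenation of whole units, so fixed-width chunks are units.
    units = [tract[i:i + k] for i in range(0, len(tract), k)]
    return {
        "total_bp": len(tract),
        "first_bp": units.count(first) * k,
        "second_bp": units.count(second) * k,
    }

def summarize(reads, min_accuracy=0.90, first="TTTTA", second="TTTGA"):
    """Median lengths over reads with predicted accuracy >= min_accuracy."""
    kept = [seq for seq, acc in reads if acc >= min_accuracy]
    rows = [r for r in (repeat_lengths(s, first, second) for s in kept) if r]
    if not rows:
        return None
    return {
        "n_reads": len(rows),
        "total_bp": median(r["total_bp"] for r in rows),
        "first_bp": median(r["first_bp"] for r in rows),
        "second_bp": median(r["second_bp"] for r in rows),
    }

# Toy reads (far shorter than real expansions), as (sequence, accuracy) pairs
toy_reads = [
    ("GG" + "TTTTA" * 3 + "TTTGA" * 2 + "CC", 0.95),
    ("TTTTA" * 5 + "TTTGA" * 4, 0.99),
    ("TTTTA" * 4 + "TTTGA" * 3, 0.92),
    ("TTTTA" * 50, 0.50),  # dropped: predicted accuracy below 90%
]
summary = summarize(toy_reads)
```

Taking the median rather than the mean keeps the representative structure robust to occasional reads with truncated or misread repeat tracts.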

Embodiment 1

Identification of a new pentanucleotide repeat insertion mutation, (TTTGA)n, by combining LR-PCR and single-molecule sequencing

For one FCMTE family, the pathogenesis of FCMTE was first studied by RP-PCR and LR-PCR.

The inventors found that a long amplified fragment was present in the target region in this family, but RP-PCR indicated only a (TTTTA)n repeat expansion and no (TTTCA)n repeat expansion (Fig. 2).

By Sanger sequencing of the LR-PCR long-fragment product, the inventors found a new, previously unreported (TTTGA)n pentanucleotide repeat insertion at its 3' end (Fig. 7), but could not yet confirm whether a pathogenic (TTTCA)n pentanucleotide repeat insertion was still present inside the long fragment.

Further, the inventors selected 3 samples from the family in which RP-PCR indicated a (TTTTA)n repeat expansion (III:4, II:6 and IV:2), obtained their long-fragment products by LR-PCR amplification, and performed single-molecule sequencing. This established that the long fragment co-segregating with the disease in this family contains only the (TTTGA)n pentanucleotide repeat insertion mutation and no (TTTCA)n pentanucleotide repeat.

With the present invention, the inventors established for the first time that an FCMTE family in which no (TTTCA)n pentanucleotide repeat insertion mutation was detected by RP-PCR (Fig. 6) in fact has FCMTE caused by a new, previously unreported (TTTGA)n pentanucleotide repeat insertion mutation (Fig. 1, Fig. 4, Fig. 5, Fig. 8).

Embodiment 2

Resolving the specific sequence of the (TTTCA)n pentanucleotide repeat insertion in the SAMD12 gene in FCMTE by combining LR-PCR and single-molecule sequencing

For an FCMTE sample (P-I-III:2) that the inventors had confirmed, by RP-PCR and Sanger sequencing of the LR-PCR product, to carry the (TTTCA)n pentanucleotide repeat insertion mutation, the combined LR-PCR and single-molecule sequencing method was used to resolve the mutation structure of the polynucleotide repeat mutation disease.

The results showed that LR-PCR amplified the corresponding long-fragment product (Fig. 3). Further, single-molecule sequencing established that the specific sequence of the long-fragment product is (TTTTA)₃₅(TTTCA)₄₈₁ (Fig. 4).

Therefore, for this FCMTE sample (P-I-III:2), the corresponding mutation structure was accurately determined for the first time, namely the specific sequence of the (TTTCA)n pentanucleotide repeat insertion mutation in the SAMD12 gene in FCMTE.

The disease mutation structures of the 4 FCMTE samples analyzed in Embodiments 1 and 2 are summarized in Table 2.

Table 2. Length and content statistics of the target-region subreads of the 4 FCMTE samples

Note:

1. The table summarizes the medians of the repeat lengths and repeat numbers.

2. N indicates the number of valid subreads obtained from the target long reads.

3. N.D. indicates not detected. n = (length - 3 bp)/5 bp.
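The conversion in note 3 from a measured length to a repeat number can be written as a small helper. This is only a sketch of the stated formula; the function name is hypothetical, the input value is an arbitrary illustration rather than a value from Table 2, and the source does not explain the 3 bp offset, so it is simply taken as given.

```python
def repeat_number(length_bp: float, offset_bp: int = 3, unit_bp: int = 5) -> float:
    """Repeat number n from note 3: n = (length - 3 bp) / 5 bp."""
    return (length_bp - offset_bp) / unit_bp


# Arbitrary illustrative length, not a measurement from the patent.
print(repeat_number(503))
```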

Discussion

In 2018, the first FCMTE causative gene (the SAMD12 gene) and its pathogenic mutation, a (TTTCA)n pentanucleotide repeat insertion (Fig. 1), were identified, making molecular genetic diagnosis of FCMTE possible for the first time.

However, the methods reported so far for detecting the (TTTCA)n pentanucleotide repeat insertion mutation are mainly RP-PCR, LR-PCR, and Southern blot. Combining RP-PCR with LR-PCR or Southern blot can qualitatively determine whether a (TTTCA)n pentanucleotide repeat insertion mutation is present in the intron 4 region of the SAMD12 gene, but the following problems can still lead to false-negative results. Typically, the (TTTCA)n pentanucleotide repeat insertion is located downstream (i.e., at the 5' end) of a (TTTTA)n pentanucleotide repeat, for example (TTTTA)₂₀₀(TTTCA)₂₁₀ (Fig. 1). RP-PCR therefore uses a primer designed specifically for the (TTTCA)n pentanucleotide repeat insertion, and a positive sample gives a saw-like pattern in the capillary electrophoresis result (Fig. 2). However, it has also been reported that the (TTTCA)n pentanucleotide repeat insertion can lie inside the (TTTTA)n pentanucleotide repeat sequence, for example (TTTTA)₁₀₀(TTTCA)₂₁₀(TTTTA)₁₀₀. In that case, although LR-PCR (primers shown in Table 1) or Southern blot can detect the expanded long-fragment allele (Fig. 3), long-fragment alleles carrying only a (TTTTA)n repeat expansion also occur in normal individuals. These methods therefore cannot tell whether the tested person actually carries the pathogenic (TTTCA)n pentanucleotide repeat insertion mutation, which can lead to missed detection and missed diagnosis.
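The ambiguity described above, where fragment length alone cannot separate a benign (TTTTA)n expansion from an allele with an internal (TTTCA)n insertion, can be illustrated with a short run-length encoding of pentamer units. This is a sketch under simplifying assumptions: the sequence is error-free and starts in frame with the repeat unit, the function names are hypothetical, and the pure (TTTTA)₄₁₀ allele is an invented comparison case rather than a reported one.

```python
from itertools import groupby


def repeat_structure(seq: str, unit_len: int = 5):
    """Split `seq` into consecutive units and run-length encode them."""
    units = [seq[i:i + unit_len] for i in range(0, len(seq) - unit_len + 1, unit_len)]
    return [(unit, len(list(group))) for unit, group in groupby(units)]


def format_structure(structure) -> str:
    """Render e.g. [("TTTTA", 2), ("TTTCA", 3)] as "(TTTTA)2(TTTCA)3"."""
    return "".join(f"({unit}){count}" for unit, count in structure)


# Hypothetical benign allele: only (TTTTA)n expanded.
benign = "TTTTA" * 410
# Structure quoted in the text: (TTTCA)n lying inside the (TTTTA)n repeat.
pathogenic = "TTTTA" * 100 + "TTTCA" * 210 + "TTTTA" * 100

print(len(benign) == len(pathogenic))                 # same fragment length
print(format_structure(repeat_structure(benign)))
print(format_structure(repeat_structure(pathogenic)))
```

Both alleles would give a long fragment of the same size on LR-PCR or Southern blot; only reading through the full sequence separates them.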

In the course of FCMTE mutation detection, the present invention unexpectedly found a new pentanucleotide repeat insertion mutation, (TTTGA)n. RP-PCR and LR-PCR established that it co-segregates with the disease in the family (see "Embodiments"), and Sanger sequencing of both ends predicted its structure to be (TTTTA)n(TTTGA)n (Fig. 1).

Because Sanger sequencing cannot cover the sequence of the entire LR-PCR long fragment, the inventors still could not determine whether a (TTTCA)n pentanucleotide repeat insertion mutation remained inside the LR-PCR long fragment. Similar problems also arise in several types of spinocerebellar ataxia (SCA), such as SCA10, SCA31, and SCA37, in which pentanucleotide repeat insertions outside the reference sequence are present. Most current detection methods cannot accurately determine the detailed sequence of such mutations. Current detection methods therefore still have obvious shortcomings in both detecting the mutation and determining its content.

Although single-molecule sequencing technology has some application value, its practical use is severely limited. First, the detection of SAMD12 pentanucleotide repeat insertion mutations by single-molecule sequencing to date has been based on whole-genome single-molecule sequencing. The average effective coverage depth is only about 8X, and there are only 1-2 reads, or even none, that span the SAMD12 pentanucleotide repeat insertion mutation, so mutations may be missed.

Second, even if 1-2 reads spanning the SAMD12 pentanucleotide repeat insertion mutation are obtained, accurately analyzing the specific sequence content of those reads remains very difficult. Single-base miscalls are an inherent weakness of single-molecule sequencing, and with only a limited number of reads this weakness cannot be corrected algorithmically. Detection of SAMD12 pentanucleotide repeat insertion mutations by whole-genome single-molecule sequencing therefore remains qualitative: it can show whether a repeat expansion sequence is present, but it cannot provide reliable sequence content information, such as the specific repeat numbers and arrangement of (TTTTA)n, (TTTCA)n, and (TTTGA)n.

Third, whole-genome single-molecule sequencing is still very expensive. It currently remains at the research level and cannot be adopted widely in clinical testing.

In SAMD12 mutation detection for FCMTE, the specific sequence content of the mutated long fragment needs to be resolved in detail. Taking full account of the technical and practical limitations of the current methods (RP-PCR, LR-PCR or Southern blot, and whole-genome single-molecule sequencing), the present invention combines LR-PCR with targeted single-molecule sequencing and, for the first time, successfully resolves in detail the different pentanucleotide repeat insertion mutations in the intron 4 region of the SAMD12 gene. This demonstrates that the method of the present invention can resolve the detailed sequence content of long-fragment polynucleotide repeat mutations, and it identifies (TTTGA)n for the first time as a new pathogenic pentanucleotide repeat insertion mutation in FCMTE.

On the technical level, the present invention obtains more valid reads, which greatly improves the accuracy of sequence analysis. In terms of sequencing cost, because only the target sequence is sequenced rather than the whole genome, the total amount of sequencing is markedly reduced and the cost falls substantially, with the overall expense kept on the order of a thousand yuan.

It should be understood that, although the embodiments provide an example of resolving the detailed sequence content of SAMD12 pentanucleotide repeat insertion mutations in FCMTE, the method of the present invention can evidently also be used to resolve other mutation structures in FCMTE and the mutation structures of other polynucleotide repeat mutation diseases.

The present invention can also serve as a reference for resolving the detailed sequences of other, similar polynucleotide repeat mutations, and provides a method for more accurate clinical molecular genetic testing and diagnosis.

All documents mentioned in the present invention are incorporated herein by reference, as if each document were individually incorporated by reference. In addition, it should be understood that, after reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Sequence table

<110>Zhejiang University

<120>Method for resolving the mutation structure of polynucleotide repeat mutation diseases based on long-fragment PCR and single-molecule sequencing

<130> P2019-0707

<160> 2

<170> SIPOSequenceListing 1.0

<210> 1

<211> 23

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 1

tgtgcagcca ttggtccagt ctt 23

<210> 2

<211> 24

<212> DNA

<213>artificial sequence (Artificial Sequence)

<400> 2

gctggcaaag ttcagaggtc actt 24

Claims

1. A method for resolving the mutation structure of a polynucleotide repeat mutation disease, characterized by comprising the steps of:

(d) performing single-molecule sequencing on the amplification product carrying the barcode sequence, to obtain a data set of subreads corresponding to the target region.

2. The method according to claim 1, characterized in that the method further comprises:

(e) analyzing the data set to obtain the mutation structure of the target region, i.e., the polynucleotide repeat region.

3. The method according to claim 1, characterized in that, between steps (c) and (d), the method further comprises:

(d0) mixing the first amplification product carrying a barcode sequence with m-1 further amplification products carrying barcode sequences, to obtain a mixed library of amplification products;

wherein the m-1 amplification products carrying barcode sequences are the 2nd, 3rd, ... and m-th amplification products, each carrying a different barcode sequence and prepared by steps (a), (b) and (c), and the mixed library of amplification products contains m amplification products carrying barcode sequences;

wherein m is a positive integer ≥ 2.

4. The method according to claim 1, characterized in that, in step (d), single-molecule sequencing is performed on the mixed library of amplification products, to obtain a data set of subreads corresponding to the target region (i.e., subreads corresponding to the polynucleotide repeat region).

5. The method according to claim 2, characterized in that, in step (e), the data set is split on the basis of the different barcode sequences, and reads carrying the same barcode sequence are then grouped and analyzed, to obtain the mutation structure of the target region (i.e., the polynucleotide repeat region) of each of the m samples to be tested.

6. The method according to claim 1, characterized in that the length of the polynucleotide repeat region is 200-10000 bp, preferably 1500-5000 bp; and/or

the polynucleotide repeat is a repeat of a 3-12 nt nucleotide unit.

7. The method according to claim 1, characterized in that the polynucleotide repeat mutation disease is selected from the group consisting of: familial cortical myoclonic tremor with epilepsy, C9ORF72-related amyotrophic lateral sclerosis/frontotemporal dementia, spinocerebellar ataxia, and myotonic dystrophy.

8. A kit for diagnosing familial cortical myoclonic tremor with epilepsy, characterized in that the kit contains a first standard, the first standard being a nucleic acid sequence carrying a (TTTGA)n1 pentanucleotide repeat insertion mutation, wherein n1 is 50-800.

9. Use of the kit according to claim 8, characterized in that it is used for preparing a detection kit for diagnosing familial cortical myoclonic tremor with epilepsy.

10. Use of a detection reagent, the detection reagent being for detecting the (TTTGA)n1 pentanucleotide repeat in the SAMD12 gene, characterized in that the detection reagent is used for preparing a detection kit for diagnosing familial cortical myoclonic tremor with epilepsy.