WO2017181670A1 - A method for enriching target-sequence nucleic acids from a nucleic acid sample - Google Patents

A method for enriching target-sequence nucleic acids from a nucleic acid sample

Info

Publication number
WO2017181670A1
WO2017181670A1 PCT/CN2016/106595 CN2016106595W WO2017181670A1 WO 2017181670 A1 WO2017181670 A1 WO 2017181670A1 CN 2016106595 W CN2016106595 W CN 2016106595W WO 2017181670 A1 WO2017181670 A1 WO 2017181670A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequence
target
bait
average
Prior art date
Application number
PCT/CN2016/106595
Other languages
English (en)
French (fr)
Inventor
蔡万世
王瑞超
屈武斌
杭兴宜
Original Assignee
艾吉泰康生物科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 艾吉泰康生物科技(北京)有限公司 filed Critical 艾吉泰康生物科技(北京)有限公司
Priority to AU2016403554A priority Critical patent/AU2016403554A1/en
Publication of WO2017181670A1 publication Critical patent/WO2017181670A1/zh

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • The invention relates to the capture, enrichment and analysis of nucleic acid sequences. More specifically, the present invention relates to a target sequence enrichment method based on liquid phase capture.
  • Target region capture technology refers to capturing the nucleic acid sequences of a target region by specific technical means and then constructing and sequencing a library, so that the target region is sequenced deeply at a greatly reduced sequencing cost.
  • PCR is a common technique for enriching target regions, and multiplex PCR is commonly used to capture several target regions at once. Multiplex PCR is best suited to capturing hotspots or small target regions; for larger target regions, such as those longer than 100 kb, multiplex PCR is no longer suitable in terms of cost and technical complexity.
  • the present invention provides a target sequence enrichment method based on liquid phase capture.
  • the invention provides a method of enriching a nucleic acid of a target sequence from a nucleic acid sample, the method comprising:
  • nucleic acid sample comprising a target nucleic acid sequence and a bait sequence that is identical to or characteristic of the target nucleic acid sequence
  • linker sequences are ligated to both ends of the nucleic acid sample fragments in the library prepared in step c), and step e) is followed by a further step f) of amplifying the nucleic acid analog/DNA hybrid complex according to the linker sequence, thereby enriching the nucleic acid of the target sequence.
  • the bait sequence has a property selected from the group consisting of: i) it does not form a hairpin structure by itself and no dimers form between bait sequences; ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has a very high or very low GC content, or is a low complexity region, the regions on both sides of the target region are used as substitute regions for designing the bait, with the same design method as for the target region; and iv) it does not bind specifically to sequences other than the target nucleic acid sequence in the nucleic acid sample.
  • the copy number of the bait sequence is also compensated according to the degree of interest in the target nucleic acid sequence.
  • nucleic acid sample is genomic DNA, RNA, cDNA, mRNA
  • the nucleic acid sample is RNA or mRNA
  • the bait sequence is on a solid support, such as on a microarray slide.
  • the solid support is also a plurality of beads or a microarray.
  • nucleic acid analogs carry a binding moiety.
  • the nucleic acid analog is prepared by in vitro transcription in step b) using a GNA, LNA, PNA, TNA or morpholino nucleic acid analog; preferably the nucleic acid analog carries a binding moiety.
  • the binding moiety is a biotin binding moiety.
  • the bait sequence copy number is compensated according to the GC content of the target sequence: the lower or higher the GC content, the more the copy number of the bait sequence corresponding to that target sequence is increased.
  • compensating the copy number according to the GC content of the target nucleic acid sequence means that the bait sequence copy number coefficient at a GC content of 50% is taken as the baseline of 1 and, for GC contents between 10% and 90%, the coefficient is increased by 0.08-0.12 for every 1% of deviation from 50%.
  • the bait sequence copy number compensation method is as follows: target sequences are divided by GC content into six tiers, where tier 1 is 10%-30%; tier 2 is 30%-40%; tier 3 is 40%-60%; tier 4 is 60%-70%; tier 5 is 70%-90%; and tier 6 is less than 10% or greater than 90%. The copy number of the tier 3 bait sequences is the reference copy number. Bait sequences of tiers 2 and 4 have more copies than tier 3, for example 2.2-2.8 times the tier 3 copy number, and bait sequences of tiers 1 and 5 have more copies still, for example 3-4 times the tier 3 copy number.
  • for target regions on which no bait can be designed directly, the bait sequence design method is: design the probe using the regions on both sides of the target region as substitute regions, generally selecting a region within 300 bp on either side of the target region, preferably within 150 bp.
  • the bait sequence is 60-150 bp in length, preferably 80-120 bp in length.
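  • As a minimal sketch of the two basic quantities used throughout this description, the GC content of a sequence and the 60-150 bp (preferably 80-120 bp) bait length window can be checked as follows. The function names and the example sequence are hypothetical and are not part of the patent text.

```python
def gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage (0-100)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def bait_length_class(sequence, allowed=(60, 150), preferred=(80, 120)):
    """Classify a bait by length: 'preferred', 'allowed' or 'rejected'."""
    n = len(sequence)
    if preferred[0] <= n <= preferred[1]:
        return "preferred"
    if allowed[0] <= n <= allowed[1]:
        return "allowed"
    return "rejected"


bait = "ATGC" * 25  # hypothetical 100 bp bait, illustration only
print(gc_content(bait), bait_length_class(bait))  # 50.0 preferred
```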
  • "no dimer production" means that any dimer formed between any two bait sequences has a Tm ≤ 47 °C, preferably ≤ 37 °C; the Tm value is preferably calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table.
  • likewise, any hairpin structure formed by a bait sequence itself has a Tm ≤ 47 °C, preferably ≤ 37 °C; the Tm value is preferably calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table.
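  • The hairpin and dimer criteria can be sketched as a simple filter. This sketch assumes that the Tm values have already been computed elsewhere (for example by a nearest-neighbor calculation) and that the thresholds are upper limits of 47 °C, or 37 °C in the stricter case; the function name and the example values are hypothetical.

```python
def passes_structure_filter(hairpin_tm, dimer_tms, limit=47.0):
    """Return True if the bait's hairpin Tm and every pairwise dimer Tm
    are at or below the limit (degrees Celsius)."""
    return hairpin_tm <= limit and all(tm <= limit for tm in dimer_tms)


# Hypothetical precomputed Tm values for one candidate bait.
print(passes_structure_filter(35.0, [30.2, 41.5]))              # True
print(passes_structure_filter(35.0, [30.2, 41.5], limit=37.0))  # False
```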
  • the average Tm value, the Tm target is the decoy sequence and the target region T m ;
  • the invention also provides a specific bait sequence for carrying out the method of the invention, the specific bait sequence being the bait sequence referred to in the first aspect of the invention.
  • the specific bait sequence is identical to or characteristic of the target nucleic acid sequence, and i) does not itself form a hairpin structure, and no dimers form between bait sequences; ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has a very high or very low GC content, or is a low complexity region, the regions on both sides of the target region are used as substitute regions for the design, with the same design method as for the target region; and iv) it does not bind specifically to sequences other than the target nucleic acid sequence in the nucleic acid sample.
  • the copy number of the bait sequence is also compensated according to the degree of interest in the target nucleic acid sequence.
  • the present invention also provides a kit comprising the bait sequence of the second aspect of the invention, the kit further comprising, but not limited to, a double-linker molecule and a plurality of different oligonucleotide probes.
  • the kit comprises a composition and reagents for carrying out the method of the first aspect of the invention.
  • the kit includes, but is not limited to, a double-linker molecule, a plurality of different oligonucleotide probes, and a bait sequence that is identical to or characteristic of the target nucleic acid sequence, wherein the bait sequence: i) does not itself form a hairpin structure, and no dimers form between bait sequences; ii) has a copy number compensated according to the GC content, spatial structure and/or degree of interest of the target nucleic acid sequence; iii) when the target region has a very high or very low GC content, or is a low complexity region, is designed using the regions on both sides of the target region as substitute regions, with the same design method as for the target region; and iv) does not bind specifically to sequences other than the target nucleic acid sequence in the nucleic acid sample.
  • the kit comprises two different double-linker molecules.
  • the kit may further comprise at least one or more additional components selected from the group consisting of DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, hybridization, wash and/or eluent.
  • the kit comprises a magnet.
  • the kit comprises one or more enzymes, together with the corresponding reagents and buffers, for example a restriction enzyme such as MlyI and the buffers/reagents for restriction digestion with MlyI.
  • the invention provides a target sequence enrichment method based on liquid phase capture, which comprises: designing the bait sequences; synthesizing the bait sequence nucleic acids (by conventional primer synthesis or solid phase synthesis); preparing nucleic acid analogs by in vitro transcription, the nucleic acid analogs comprising a binding moiety; pretreating the nucleic acid sample (by a library preparation method), where the sample may be genomic DNA, RNA, cDNA, mRNA, etc.; forming nucleic acid analog/DNA hybrid complexes between the nucleic acid analogs and the target sequence nucleic acids by complementary base pairing; washing to remove poorly paired nucleic acid analog/DNA hybrids and non-target sequence nucleic acids; and specifically amplifying the complementarily paired nucleic acid analog/DNA hybrids using the linker sequences added during sample pretreatment, thereby enriching the nucleic acid of the target sequence.
  • sample is used in its broadest sense and is intended to include a sample or culture obtained from any source, preferably from a biological source.
  • Biological samples are available from animals, including humans, and include liquids, solids, tissues, and gases.
  • Biological samples include blood products such as plasma, serum, and the like.
  • a "nucleic acid sample" comprises nucleic acids of any origin (e.g., DNA, RNA, cDNA, mRNA, tRNA, miRNA, etc.). Where the nucleic acid sample is RNA or mRNA, the RNA or mRNA is reverse transcribed into DNA prior to step c).
  • the nucleic acid sample is preferably derived from a biological source, such as a human or non-human cell, tissue, and the like.
  • non-human refers to all non-human animals and entities including, but not limited to, vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, rabbits, pigs, goats, horses, dogs, cats, birds, etc.
  • Non-humans also include invertebrates and prokaryotes, such as bacteria, plants, yeast, viruses, and the like.
  • nucleic acid samples for use in the methods and systems of the invention are nucleic acid samples derived from any organism, whether eukaryotic or prokaryotic.
  • the inventors found that the GC content of the target sequence has a large influence on capture efficiency in liquid phase capture. To capture multiple target sequences effectively, it is preferred to compensate the copy number of the bait sequence according to the GC content of the target sequence: the lower or higher the GC content, the larger the copy number of the corresponding bait sequence.
  • the inventors have found that for target sequences with a GC content of about 50%, for example 50% ± 10%, good capture efficiency can be obtained; for target sequences with other GC contents, bait sequence copy number compensation is required to obtain good capture efficiency.
  • the bait sequence copy number coefficient at a GC content of 50% is taken as the baseline of 1 and, for GC contents between 10% and 90%, the coefficient is increased by 0.08-0.12 for every 1% of deviation from 50%. For example, when the GC content is 68%, the deviation is 18% and the bait sequence copy number coefficient is 2.44-3.16.
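  • The linear rule above can be written out as a short calculation. The sketch below assumes the coefficient is 1 plus the absolute deviation from 50% GC multiplied by a per-percent step in the 0.08-0.12 range; the function name and the default step of 0.10 are hypothetical choices for illustration.

```python
def copy_number_coefficient(gc_percent, step=0.10):
    """Bait copy number coefficient relative to a 50% GC target (baseline 1).

    step is the increase per 1% deviation from 50% GC (0.08-0.12 in the text).
    """
    if not 10 <= gc_percent <= 90:
        raise ValueError("the linear rule is stated only for 10%-90% GC")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step is expected to lie between 0.08 and 0.12")
    return 1.0 + abs(gc_percent - 50) * step


# Worked example from the text: 68% GC is an 18% deviation.
low = copy_number_coefficient(68, step=0.08)
high = copy_number_coefficient(68, step=0.12)
print(round(low, 2), round(high, 2))  # 2.44 3.16
```

  • The rule is symmetric, so a 32% GC target receives the same coefficient as a 68% GC target.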
  • when the target region has a very high or very low GC content, or when the target region is a low complexity region, the corresponding bait sequence design method is to design the probe using the regions on both sides of the target region as substitute regions; a region within 300 bp on either side of the target region is generally selected as the substitute region, preferably within 150 bp.
  • a low complexity region refers to a region composed of only a few kinds of elements such as oligonucleotides, for example a simple microsatellite repeat sequence.
  • the bait sequence copy number compensation method may be simply expressed as follows: target sequences are divided by GC content into six tiers, where tier 1 is 10%-30%; tier 2 is 30%-40%; tier 3 is 40%-60%; tier 4 is 60%-70%; tier 5 is 70%-90%; and tier 6 is less than 10% or more than 90%. The copy number of the tier 3 bait sequences is the reference copy number. The copy number of the bait sequences for tiers 2 and 4 needs to be increased, for example to 2.2-2.8 times that of tier 3, and the copy number of the bait sequences for tiers 1 and 5 needs to be increased further, for example to 3-4 times that of tier 3.
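  • The tiered scheme can be sketched as a lookup table. The text does not say which tier a boundary value such as 30% belongs to, so this sketch assumes that each tier includes its lower bound and excludes its upper bound, with exactly 90% assigned to tier 5. The multiplier ranges are the example values given above; the names are hypothetical.

```python
# (tier, upper GC bound in %, (min multiplier, max multiplier) relative to tier 3)
TIERS = [
    (1, 30, (3.0, 4.0)),
    (2, 40, (2.2, 2.8)),
    (3, 60, (1.0, 1.0)),
    (4, 70, (2.2, 2.8)),
    (5, 90, (3.0, 4.0)),
]


def gc_tier(gc_percent):
    """Return (tier, multiplier_range) for a target GC content in percent.

    Tier 6 (below 10% or above 90%) has no multiplier: such targets are
    handled by designing baits on the flanking regions instead.
    """
    if gc_percent < 10 or gc_percent > 90:
        return 6, None
    for tier, upper, multiplier in TIERS:
        if gc_percent < upper:
            return tier, multiplier
    return 5, TIERS[-1][2]  # exactly 90% GC


print(gc_tier(50))  # (3, (1.0, 1.0))
print(gc_tier(65))  # (4, (2.2, 2.8))
print(gc_tier(95))  # (6, None)
```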
  • in that case the bait sequence design method is: design the probe using the regions on both sides of the target region as substitute regions, generally selecting a region within 300 bp on either side of the target region, preferably within 150 bp.
  • the bait sequence is the one or more bait sequences that score best in terms of specificity, dimers, hairpin structure, and position relative to the target region.
  • S- specific scores are all values between 0 and 1, and the specific scoring method is as follows:
  • the Tm value, the Tm target is the decoy sequence and the target region T m ;
  • the calculation of the Tm of a sequence is not restricted to a particular method. Various methods of calculating the Tm value may be used in the present invention; the Tm values obtained by different methods do not substantially reverse the effects of the present invention, although the degree of the effect will vary. The Tm can be calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table, and Tm values calculated by other methods can be mapped onto it; those skilled in the art can compare the Tm values calculated by different methods through simple experiments and select the calculated Tm value accordingly.
  • for the coding regions of the human genome, a bait sequence suitable for the present invention can be designed for more than 99% of the target regions, indicating that the GC binning and Tm filtering described above are reasonable.
  • the hybridization between the nucleic acid analog and the target nucleic acid is preferably carried out under stringent conditions sufficient to support hybridization between the nucleic acid analog and the DNA, wherein the nucleic acid analog comprises a linking compound and a region complementary to the target nucleic acid sample, so as to provide the nucleic acid analog/DNA hybrid complex.
  • the complex is then captured via the linking compound and washed under conditions sufficient to remove non-specifically bound nucleic acid, and the hybridized target nucleic acid sequence is then eluted from the captured nucleic acid analog/DNA complex.
  • the nucleic acid analog comprises a chemical group or a linking compound, for example a binding moiety such as biotin or digoxigenin, which is capable of binding to a solid support.
  • the solid support may comprise a corresponding capture compound, such as streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin.
  • the invention is not limited to the linking compounds used, and alternative linking compounds are equally suitable for use in the methods, bait sequences and kits of the invention.
  • the chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, may be linked to any base in a nucleic acid analog (glycerol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid).
  • the nucleic acid analog chain may comprise ribose and/or deoxyribose, and the chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, may be attached to a base on the ribose and/or deoxyribose.
  • the synthesis of the nucleic acid analog includes the use of labeled ATP, CTP, GTP and/or UTP.
  • methods for labeling nucleotides with CyDye, DIG, biotin, rhodamine, fluorescein, etc. are known in the art.
  • biotin can be used as a nucleic acid probe label; it is attached to the carbon atom at position 5 of UTP or dUTP in the nucleic acid molecule and can be detected by binding to avidin.
  • the present invention is not limited to known labels and labeling methods, and markers and labeling methods found in the future are also within the scope of the present invention.
  • the plurality of target nucleic acid molecules preferably comprise a whole genome of an organism or at least one chromosome or a nucleic acid molecule of any size.
  • the nucleic acid molecule is at least about 200 kb in size, at least about 500 kb, at least about 1 Mb, at least about 2 Mb, or at least about 5 Mb, more preferably from about 100 kb to about 5 Mb, from about 200 kb to about 5 Mb, from about 500 kb to about 5 Mb, from about 1 Mb to about 2 Mb, or from about 2 Mb to about 5 Mb.
  • the target nucleic acid is from an animal, plant or microorganism, and in a preferred embodiment the target nucleic acid molecule is from a human. If the amount of nucleic acid sample is relatively small (e.g., a human nucleic acid sample obtained in some cases, such as the genome of a developing fetus), the nucleic acid can be amplified prior to performing the methods of the invention, for example by whole genome amplification. Pre-amplification may be necessary for performing the methods of the invention, for example in forensic applications (e.g., genetic analysis for forensic purposes).
  • the plurality of target nucleic acid molecules are a set of genomic DNA molecules.
  • the bait sequences may be selected, for example, from: a plurality of bait sequences defining a plurality of exons, introns or regulatory sequences from a plurality of genetic loci; a plurality of bait sequences defining the full sequence of at least one individual genetic locus, said locus being of any size, preferably at least 1 Mb or at least one of the sizes specified above; a plurality of bait sequences defining single nucleotide polymorphisms (SNPs); or a plurality of bait sequences defining an array, for example a tiling array designed to capture the full sequence of at least one complete chromosome.
  • hybridization refers to the pairing of complementary nucleic acids. Hybridization and hybridization strength (e.g., the strength of binding between nucleic acids) are affected by a number of factors, such as the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions used, the melting temperature (Tm) of the hybrid formed, and the GC content of the nucleic acids.
  • stringent hybridization conditions depend on the sequence and vary with the hybridization parameters (e.g., salt concentration, presence of organic solvents, etc.).
  • stringent conditions are selected to be from about 5 °C to about 20 °C below the Tm of the particular nucleic acid sequence at the specified ionic strength and pH.
  • preferably, stringent conditions are from about 5 °C to 10 °C below the thermal melting point (Tm) of the particular nucleic acid to which the complementary nucleic acid is bound.
  • the Tm is the temperature (under defined ionic strength and pH) at which 50% of the nucleic acid (eg, the target nucleic acid) hybridizes to the fully matched probe.
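  • The temperature window implied by these definitions can be computed directly from a known Tm. The sketch below simply subtracts the stated offsets (5-20 °C, or 5-10 °C in the narrower case); the function name and the example Tm of 65 °C are hypothetical.

```python
def stringent_temperature_window(tm, min_offset=5.0, max_offset=20.0):
    """Return (lowest, highest) hybridization temperature in degrees Celsius
    that counts as stringent for a hybrid with the given Tm."""
    return (tm - max_offset, tm - min_offset)


print(stringent_temperature_window(65.0))                   # (45.0, 60.0)
print(stringent_temperature_window(65.0, max_offset=10.0))  # (55.0, 60.0)
```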
  • stringent conditions may, for example, be hybridization at 42 °C in 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated sperm DNA (50 mg/ml), 0.1% SDS and 10% dextran sulfate, followed by washing at 42 °C in 0.2× SSC (sodium chloride/sodium citrate), at 50 °C with 50% formamide, and then at 55 °C with 0.1× SSC containing EDTA.
  • buffers containing 35% formamide, 5 x SSC, and 0.1% (w/v) sodium dodecyl sulfate (SDS) are expected to be suitable for hybridization at 45 ° C for 16-72 hours under moderately non-stringent conditions.
  • the term "primer" refers to an oligonucleotide, whether purified, cleaved or produced synthetically, that can serve as a starting point for synthesis under conditions which induce the synthesis of a primer extension product complementary to a nucleic acid strand (for example in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH).
  • the primer is preferably single stranded for maximum amplification efficiency.
  • the primer is an oligodeoxynucleotide.
  • the primer must be sufficiently long to initiate synthesis of the extension product in the presence of the inducing agent. The exact length of the primer depends on many factors including temperature, source of the primer and the method used.
  • the term "bait" or "bait sequence" refers to an oligonucleotide (e.g., a nucleotide sequence), whether occurring naturally, purified, cleaved, or produced synthetically, recombinantly or by PCR amplification, that is capable of hybridizing to another oligonucleotide of interest, such as at least a portion of a target nucleic acid sequence.
  • the probe can be single stranded or double stranded. Probes can be used for the detection, identification and isolation of specific gene sequences.
  • target nucleic acid molecule refers to a molecule or sequence from a target genomic region.
  • the preselected probe determines the extent of the target nucleic acid molecule.
  • the "target" is thus sought to be distinguished from other nucleic acid sequences.
  • a "fragment" is defined as a nucleic acid region within the sequence of interest, such as a "fragment" or a "portion" of a nucleic acid sequence.
  • "isolated", when used in reference to a nucleic acid, as in "isolated nucleic acid", refers to a nucleic acid sequence that has been identified and separated from at least one other component or contaminant with which it is normally associated.
  • an isolated nucleic acid exists in a form different from that in which it is found in nature.
  • in contrast, non-isolated nucleic acids, such as DNA and RNA, are found in their naturally occurring state.
  • the isolated nucleic acid, oligonucleotide or polynucleotide may exist in a single stranded form or in a double stranded form.
  • a bait sequence consistent with a target nucleic acid sequence refers to a sequence whose complementary sequence can hybridize to the target nucleic acid sequence.
  • the hybridization is carried out under stringent conditions.
  • when the target region has a very high or very low GC content, or when the target region is a low complexity region, no bait sequence can be designed on the region itself, that is, the bait sequence coverage is zero. In that case, a suitable region is sought on the left and right sides of the target region for designing the bait sequence; the bait sequence is generally designed within 300 bp on either side, preferably within 150 bp.
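  • The flanking-region fallback can be sketched as a coordinate calculation. This sketch assumes 0-based, half-open coordinates on a single chromosome, clips only at position 0 (the chromosome length is not modeled), and uses hypothetical function and key names.

```python
def flanking_windows(start, end, flank):
    """Return the (left, right) substitute windows of width `flank`
    adjacent to the target region [start, end)."""
    left = (max(0, start - flank), start)
    right = (end, end + flank)
    return left, right


def candidate_windows(start, end):
    """Preferred 150 bp flanks first, then the wider 300 bp flanks."""
    return {
        "preferred_150bp": flanking_windows(start, end, 150),
        "fallback_300bp": flanking_windows(start, end, 300),
    }


print(candidate_windows(1000, 1200)["preferred_150bp"])  # ((850, 1000), (1200, 1350))
```

  • A design pipeline would try to place a bait inside the preferred windows first and fall back to the 300 bp windows only if that fails.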
  • a transcription primer for a bait sequence for use in the capture methods and kits described herein comprises a linking compound, such as a binding moiety.
  • the binding moiety comprises any moiety that is attached to or incorporated at the 5' end of the amplification primer for subsequent capture of the nucleic acid analog/target nucleic acid hybridization complex.
  • the binding moiety may be any sequence introduced at the 5' end of the primer sequence, such as a capturable six-histidine (6HIS) sequence.
  • a primer comprising a 6HIS sequence can be captured by nickel, for example in a tube or microwell that is nickel coated or contains nickel-coated beads or particles, or in a purification column in which nickel-coated beads are packed and through which the sample is passed, in order to capture the reduced-complexity complexes (e.g., for subsequent target elution).
  • another example of a binding moiety for use in embodiments of the invention is a hapten, such as digoxigenin, which is attached, for example, to the 5' end of the amplification primer.
  • digoxigenin can be captured using an anti-digoxigenin antibody, for example with a substrate coated with or containing the antibody.
  • when the binding moiety is biotin, a capture matrix coated with streptavidin (SA), such as SA-coated beads (e.g., paramagnetic beads/particles), is used to separate the target nucleic acid/transcription product complex from non-specifically hybridized nucleic acid.
  • the bait sequence corresponding to at least one region of the genome in the sequence can be provided in parallel on a solid support using a maskless array synthesis technique.
  • the probes can be obtained serially using a standard DNA synthesizer and then applied to the solid support, or can be obtained from an organism and then immobilized on the solid support.
  • nucleic acid that has not hybridized, or has hybridized non-specifically, to the nucleic acid analog is separated by washing from the nucleic acid analog bound to the support.
  • the remaining nucleic acid, which is specifically bound to the nucleic acid analog, is eluted from the solid support, for example in hot water or in a nucleic acid elution buffer containing, for example, TRIS buffer and/or EDTA, to produce an eluate enriched in the target nucleic acid molecules.
  • the bait sequences for the target molecules can be synthesized on a solid support as described above, then released from the solid support as a collection of bait sequences and amplified.
  • the collection of released, transcribed nucleic acid analogs can be covalently or non-covalently immobilized on a carrier, such as glass, metal, ceramic or polymeric beads, or another solid carrier.
  • the nucleic acid analog can be designed to be conveniently released from the solid support, for example by providing an acid-labile or base-labile nucleic acid sequence at or near the end of the nucleic acid analog closest to the carrier, which releases the analog under low or high pH conditions, respectively.
  • a variety of cleavable linking compounds for nucleic acid analogs are known in the art.
  • the carrier can be provided, for example, in a cylinder having a liquid inlet and an outlet.
  • methods of immobilizing nucleic acids on carriers are well known in the art, for example by incorporating biotinylated nucleotides into the nucleic acid analog and coating the carrier with streptavidin, whereby the coated carrier non-covalently attracts and immobilizes the nucleic acid analogs in the collection.
  • the sample is passed over the carrier bearing the nucleic acid analogs under hybridization conditions, whereby the target nucleic acid molecules that hybridize to the immobilized nucleic acid analogs can be eluted for later analysis or other use.
  • nucleic acid may include, for example and without limitation, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and artificial nucleic acids such as peptide nucleic acid (PNA), morpholino nucleic acid, locked nucleic acid (LNA), glycerol nucleic acid (glycol nucleic acid, GNA) and threose nucleic acid (TNA).
  • the term includes molecules consisting of natural nucleobases, sugars and covalent internucleoside (backbone) linkages, as well as molecules with non-natural nucleobases, sugars and covalent internucleoside (backbone) linkages that function similarly, or combinations thereof. Such modified or substituted nucleic acids may be preferred over the native form because of desirable properties, such as enhanced affinity for nucleic acid target molecules and increased stability in the presence of nucleases and other enzymes, and are described herein by the term "nucleic acid analog" or "nucleic acid mimic".
  • nucleic acid mimics include peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-LNA, phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino and phosphoramidate-containing molecules, or functionally similar nucleic acid derivatives.
  • Example 1 Design of a bait sequence
  • Table 1 Chromosome distribution of randomly selected 1000 loci
  • the bait sequence design includes the following steps:
  • the target sequence characteristic analysis includes the following steps:
  • target sequences are divided by GC content into 5 tiers, where tier 1 is 10%-30%; tier 2 is 30%-40%; tier 3 is 40%-60%; tier 4 is 60%-70%; and tier 5 is 70%-90%;
  • the target sequence length is in the range of 60-150 bp;
  • thermodynamic stability of the binding of the bait sequence on the non-target area is significantly weaker than the thermodynamic stability of the binding on the target area;
  • the general analysis criterion is Tm (target region) - Tm (non-specific region) ≥ 5 °C; for part of the data, Tm (target region) - Tm (non-specific region) ≥ 10 °C was used for comparison (a stricter specificity restriction);
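  • The specificity criterion can be sketched as a margin check. This sketch assumes the Tm of the bait on its target and on each candidate non-specific site has been computed beforehand, and that the margin is a lower bound of 5 °C (or 10 °C for the stricter setting); the function name and the example values are hypothetical.

```python
def is_specific(tm_target, tm_off_targets, margin=5.0):
    """Return True if the target Tm exceeds the highest off-target Tm
    by at least `margin` degrees Celsius."""
    if not tm_off_targets:
        return True  # no off-target binding sites were found
    return tm_target - max(tm_off_targets) >= margin


# Hypothetical values: the closest off-target site is 8 degrees weaker.
print(is_specific(72.0, [60.5, 64.0]))               # True
print(is_specific(72.0, [60.5, 64.0], margin=10.0))  # False
```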
  • different thermodynamic calculation methods The calculation results have a large impact, which is calculated based on the nearest neighbor method of the SantaLucia 2007 thermodynamic parameter table;
  • S- dimer scoring rule Perform dimer alignment analysis on each of the newly designed bait sequences with each designed bait sequence, using BLAT software, using default parameters, and comparing each of them.
  • bait sequence copy number compensation is performed according to the specific target area:
  • the number of copies of the decoy sequence of the third gear is used as the reference copy number (ie, the reference 1); the decoy sequence corresponding to the first and fifth files needs to increase the copy number, which is the third block. 2.5 times; followed by 2 and 4, the corresponding bait sequence also needs a little more copy number is 3.5 times of the third gear;
  • the target area may be the focus area, for example, the area where the fusion event occurs, and the number of copies of the bait sequence doubles;
  • the target sequence cannot design the probe, for example, when the target area is a very high or very low GC content area, or when the target area is a low complexity area (low complexity area refers to a few types) Element such as an area composed of oligonucleotides, such as a simple repeat of microsatellites), due to the inability of the region to design
  • the bait sequence that is, the coverage of the bait sequence is zero, then the bait sequence is designed to find the appropriate area on the left and right sides of the target area; the bait sequence is generally designed within the range of 300 bp on the left and right sides; if the area within 150 bp can be designed properly
  • the bait sequence is recorded as a control.
  • 138 of the randomly selected target sequences belong to this situation, 68 of them have successfully designed the bait sequence in the area of 150 bp or so, and the other 22 successfully designed the bait sequence within 150-300 bp. There are 48 probes that cannot be designed in these areas.
  • the design principles for the specific sequences are: 1) they produce no non-specific amplification products on the target (to-be-captured) genome; 2) their GC content is between 30% and 70%, preferably between 40% and 60%; 3) the two do not form a dimer, or the dimer formed has a stability (Tm) ≤ 47 °C, preferably ≤ 37 °C.
  • the sequences to be synthesized are thus formed; all bait sequences share the same pair of specific sequences, for example:
  • 5'-end specific sequence - bait sequence (60-150 bp) - 3'-end specific sequence (SEQ ID NO. 1):
  • the sequences to be synthesized are produced as oligonucleotides on a large scale by chip-based methods well known in the art; the oligonucleotides are then eluted from the chip with aqueous ammonia, purified, and dissolved in double-distilled water to form an oligonucleotide pool.
  • using 5'-end and 3'-end primers complementary to the 5'-end and 3'-end specific sequences, polymerase chain reaction amplification is performed with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to obtain a large double-stranded DNA pool; the specific steps are as follows:
  • Reagent name / volume: Water 37 μl; 10× PCR Buffer 5 μl; 10 mM dATP 1 μl; 10 mM dCTP 1 μl; 10 mM dGTP 1 μl; 10 mM TTP 1 μl; 5' primer (10 μM) 1 μl; 3' primer (10 μM) 1 μl; JumpStart Taq DNA Polymerase 1 μl
  • Reagent name / volume: Water 37 μl; 10× PCR Buffer 5 μl; 10 mM dATP 1 μl; 10 mM dCTP 1 μl; 10 mM dGTP 1 μl; 10 mM TTP 1 μl; BAITS_5_PRIMER_N-T7 (10 μM) 1 μl; BAITS_3_PRIMER_N (10 μM) 1 μl; JumpStart Taq DNA Polymerase 1 μl; Oligonucleotide pool 1 μl
  • the product of the previous PCR step was separated by gel electrophoresis, non-specific bands were removed, and the 120-210 bp fragments were recovered and purified with the Qiagen Gel Extraction Kit (Cat No./ID 28704).
  • using the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02), with NTPs of a nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA, or morpholino nucleic acid) and biotin-labeled UTP as substrates, the purified product of the previous step is transcribed in vitro to prepare a pool of biotin-labeled nucleic acid analogs:
  • Reagent name / volume (μl): ATP analog (GNA, LNA, PNA, TNA, or morpholino nucleic acid, 10 mM) 2
  • Block 2: cot-1 DNA and salmon sperm DNA are diluted to 100 ng/μl and mixed in equal volumes, labeled Block 2;
  • step 11: repeat step 11 twice, for a total of 3 magnetic-bead washes, and finally resuspend the beads in 200 μl of binding buffer;
  • Reagent name / volume: 5× Phusion HF 10 μl; 10 mM dNTPs 1 μl; Post Primer Mix (10 μM each) 1 μl; resuspended magnetic beads (step 20) 20 μl; Phusion DNA polymerase 0.5 μl; H2O 17.5 μl
  • the BWA MEM software was used to align the sequencing data to the human reference genome HG19 with the following parameters: bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample", thereby obtaining the single nucleotide polymorphisms, insertions, or deletions that differ from the reference genome, i.e., the detected gene mutations.
  • the numbers of bases with sequencing depth ≥1, ≥4, ≥10, and ≥20 are counted separately, and each count is divided by the total number of bases in the target region, giving the 1× coverage, 4× coverage, 10× coverage, and 20× coverage parameters.
  • the average depth is 451.53×; the 4× coverage is 94.35% and the 20× coverage is 93.64%, indicating good coverage and uniformity, while the total data volume is only 8.52 Mb reads.
  • the beneficial effects of such results are: 1) a small amount of sequencing, which effectively reduces cost; 2) a high average sequencing depth, that is, each target site is sequenced many times, so data accuracy is high; 3) high coverage, with few missed sites; 4) good uniformity, that is, the vast majority of sites have similar coverage depths.
  • compared with LNA, coverage and uniformity decreased by 4.5 and 5.1 percentage points, respectively, without bait copy number compensation; coverage and uniformity increased by 6.3 and 7.8 percentage points, respectively, under the strong specificity restriction, strict dimer restriction, strict hairpin restriction, and strict scoring-function restriction; coverage and uniformity for regions within 150 bp were 2.3 and 3.8 percentage points higher, respectively, than for regions within 150-300 bp; in the parallel test with standard nucleotides ATP, CTP, GTP, UTP, and Biotin-UTP in the same proportions, coverage and uniformity decreased by 5.3 and 4.8 percentage points, respectively.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Chemical Kinetics & Catalysis (AREA)

Abstract

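The present invention provides a method for enriching target sequence nucleic acid from a nucleic acid sample, the method comprising: providing a nucleic acid sample containing a target nucleic acid sequence and a bait sequence that is identical to the target nucleic acid sequence or characteristic of the target sequence; performing in vitro transcription with the bait sequence as template to prepare a nucleic acid analog, the nucleic acid analog carrying a binding moiety; fragmenting the nucleic acid sample; hybridizing the nucleic acid analog with the nucleic acid sample so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybridization complex; and, by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complex from non-specifically hybridized nucleic acids, thereby removing non-target sequence nucleic acid. In a preferred embodiment, the method further comprises amplifying the nucleic acid analog/DNA hybridization complex to enrich the target sequence nucleic acid.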

Description

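Method for enriching target nucleic acid sequence from nucleic acid sample
Technical Field
The present invention relates to the capture, enrichment, and analysis of nucleic acid sequences. More specifically, the present invention relates to a target sequence enrichment method based on solution-phase capture.
Background Art
Whole-genome sequencing can reveal mutations, insertions, deletions, and structural variations at the whole-genome level. However, because the genome is large, sequencing at 30× produces nearly 100 G of data. Sequencing of low-frequency mutations, such as those associated with tumors, requires at least 1000× coverage; with whole-genome sequencing this would produce as much as 3000 G of data. Data on this scale not only makes analysis extremely difficult but also makes sequencing very costly. Target region capture technology emerged in response.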
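The data volumes quoted above follow from multiplying genome size by sequencing depth. The short Python sketch below reproduces that arithmetic; the haploid human genome size of about 3.1 Gb is an assumption introduced here for illustration and is not stated in the text.

```python
# Rough estimate of raw sequencing output for whole-genome sequencing.
# ASSUMPTION: human genome size of ~3.1 Gb (not given in the source text).
HUMAN_GENOME_GB = 3.1


def raw_data_gb(depth, genome_gb=HUMAN_GENOME_GB):
    """Return the approximate number of sequenced gigabases at a given depth."""
    return depth * genome_gb


if __name__ == "__main__":
    for depth in (30, 1000):
        print(f"{depth}x -> about {raw_data_gb(depth):.0f} Gb")
```

Under this assumption the 30× and 1000× cases land close to the "nearly 100 G" and "3000 G" figures given in the background.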
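Target region capture refers to directionally capturing the nucleic acid sequences of a target region by specific technical means and then building a library and sequencing it, so that the target region is sequenced deeply while the sequencing cost is greatly reduced. PCR is a common technique for enriching target regions; more commonly, multiplex PCR is used to capture multiple target regions at once. Multiplex PCR is better suited to capturing hotspot regions or short target regions; for long target regions, for example those exceeding 100 kb, multiplex PCR is no longer suitable in terms of either cost or technical complexity.
Therefore, there is a need in the art for new methods suited to capturing long target regions.
Summary of the Invention
To solve the above problems, the present invention provides a target sequence enrichment method based on solution-phase capture.
In a first aspect, the present invention provides a method for enriching target sequence nucleic acid from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample containing a target nucleic acid sequence and a bait sequence that is identical to the target nucleic acid sequence or characteristic of the target sequence;
b) performing in vitro transcription with the bait sequence as template to prepare a nucleic acid analog, the nucleic acid analog carrying a binding moiety;
c) fragmenting the nucleic acid sample, preferably preparing a library;
d) hybridizing the nucleic acid analog with the nucleic acid sample, so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybridization complex;
e) by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complex from non-specifically hybridized nucleic acids, thereby removing non-target sequence nucleic acid.
In one embodiment, adapter sequences are ligated to both ends of the nucleic acid sample fragments during library preparation in step c), and step e) is followed by step f): amplifying the nucleic acid analog/DNA hybridization complex on the basis of the adapter sequences to enrich the target sequence nucleic acid.
In one embodiment, the bait sequence has properties selected from the following: i) it does not form a hairpin structure by itself, and baits do not form dimers with one another; ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as substitute regions for bait design, with the same design method as for the target region; and iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequence is additionally compensated according to the level of interest in the target nucleic acid sequence.
In one embodiment, the nucleic acid sample is genomic DNA, RNA, cDNA, or mRNA; where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse-transcribing the RNA or mRNA into DNA.
In one embodiment, the bait sequence is on a solid support, for example on a microarray slide.
In one embodiment, the solid support may also be a plurality of beads or a microarray.
In one embodiment, some or all of the nucleic acid analogs carry a binding moiety.
In one embodiment, in step b) in vitro transcription is performed using the nucleic acid analogs GNA, LNA, PNA, TNA, or morpholino nucleic acid to prepare the nucleic acid analog; preferably the nucleic acid analog carries a binding moiety.
In one embodiment, the binding moiety is a biotin binding moiety.
In one embodiment, the bait copy number is compensated according to the GC content of the target sequence: the lower or the higher the GC content, the more the copy number of the bait sequence corresponding to that target sequence is increased.
In one embodiment, compensating the copy number according to the GC content of the target nucleic acid sequence means: taking the copy number coefficient of a bait sequence with 50% GC content as the baseline of 1, for GC contents between 10% and 90% the bait copy number coefficient increases by 0.08-0.12 for every 1% of deviation from 50%.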
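In a specific embodiment, the bait copy number compensation method is as follows: target sequences are sorted by GC content into 6 tiers: tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: below 10% or above 90%. The copy number of the tier 3 bait sequences is the baseline copy number; the bait sequences of tiers 2 and 4 have more copies than tier 3, for example 2.2-2.8 times that of tier 3; the bait sequences of tiers 1 and 5 have more copies still, for example 3-4 times that of tier 3. For tier 6, where GC content is below 10% or above 90%, and where the target region is a low-complexity sequence, the bait design method is: use the regions flanking the target region as substitute regions for probe design, generally regions within 300 bp on either side of the target region, preferably within 150 bp.
In one embodiment, the bait sequence is 60-150 bp long, preferably 80-120 bp.
In one embodiment, being identical to the target nucleic acid sequence or specific for the target sequence means that the thermodynamic stability of bait binding to non-target regions is significantly weaker than that of binding to the target region; preferably Tm(target region) - Tm(non-specific region) ≥ 5 °C, more preferably Tm(target region) - Tm(non-specific region) ≥ 10 °C; preferably the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, no dimer formation means that the dimer formed between any two bait sequences has Tm ≤ 47 °C, preferably ≤ 37 °C; preferably the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, no hairpin formation means that the hairpin structure formed by any bait sequence with itself has Tm ≤ 47 °C, preferably ≤ 37 °C; preferably the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, for each target region, the bait sequence is the one or more bait sequences with the best composite score with respect to specificity, dimers, hairpin structure, and position relative to the target region, the composite score being given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. The individual scores are calculated as follows:
S_specificity: each newly designed bait sequence is aligned against the genome, and for each aligned sequence the Tm between the bait sequence and that sequence is calculated. The difference between the bait's Tm with the target region and its Tm with any aligned sequence is ≥ 5 °C, preferably ≥ 10 °C. The average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific alignment results and Tm_target is the Tm of the bait sequence with the target region.
S_dimer: each newly designed bait sequence is aligned for dimer analysis against every previously designed bait sequence, and for each aligned bait sequence the Tm between the two is calculated. This Tm is < 47 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and S_dimer = (47 - Tm_avg)/47. Preferably this Tm is < 37 °C, the average Tm is calculated in the same way, and S_dimer = (37 - Tm_avg)/37.
S_hairpin: for any bait sequence, its best self-alignment structure is computed and the Tm of that structure is calculated. This Tm is < 47 °C, and S_hairpin = (47 - Tm)/47; preferably this Tm is < 37 °C, and S_hairpin = (37 - Tm)/37.
S_distance: given the coordinates of the target region, for each newly designed bait sequence the coordinate difference δDistance from the target region is calculated; δDistance is less than 150, and S_distance = (150 - δDistance)/150.
In a second aspect, the present invention also provides specific bait sequences for carrying out the method of the invention, the specific bait sequences being the bait sequences referred to in the first aspect of the invention.
In one embodiment, the specific bait sequence is identical to the target nucleic acid sequence or characteristic of the target sequence, and i) it does not form a hairpin structure by itself, and baits do not form dimers with one another; ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as substitute regions for probe design, with the same design method as for the target region; iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequence is additionally compensated according to the level of interest in the target nucleic acid sequence.
In a third aspect, the present invention also provides a kit comprising the bait sequences of the second aspect of the invention; the kit further includes, but is not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.
In one embodiment, the kit contains compositions and reagents for carrying out the method of the first aspect of the invention. The kit includes, but is not limited to, double-stranded adapter molecules, a plurality of different oligonucleotide probes, and bait sequences that are identical to the target nucleic acid sequence or characteristic of the target sequence, where the bait sequences: i) do not form hairpin structures by themselves and do not form dimers with one another; ii) have copy numbers compensated according to the GC content, spatial structure, and/or level of interest of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, are designed using the regions flanking the target region as substitute regions, with the same design method as for the target region; iv) do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence. In certain embodiments, the kit contains two different double-stranded adapter molecules. The kit may further contain at least one or more other components selected from DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, hybridization solution, wash solution, and/or elution solution. In certain embodiments, the kit contains a magnet. In certain embodiments, the kit contains one or more enzymes together with the corresponding reagents, buffers, and so on, for example a restriction endonuclease such as MlyI and the buffers/reagents for restriction digestion with MlyI.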
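Detailed Description of the Embodiments
The present invention provides a target sequence enrichment method based on solution-phase capture, the method comprising: bait sequence design; nucleic acid synthesis of the bait sequences (by conventional primer synthesis or solid-phase synthesis); preparation of nucleic acid analogs by in vitro transcription, the nucleic acid analogs containing a binding moiety; pretreatment of the nucleic acid sample (following a library preparation method), where the sample may be genomic DNA, RNA, cDNA, mRNA, and so on; formation of nucleic acid analog/DNA hybridization complexes between the nucleic acid analogs and the target sequence nucleic acid by complementary base pairing; washing to remove nucleic acid analog/DNA hybrids with low complementarity, thereby removing non-target sequence nucleic acid; and specific amplification of the complementary nucleic acid analog/DNA on the basis of the adapter sequences added during sample pretreatment, to enrich the target sequence nucleic acid.
In the present invention, the term "sample" is used in its broadest sense and is intended to include specimens or cultures obtained from any source, preferably from a biological source. Biological samples may be obtained from animals (including humans) and include liquids, solids, tissues, and gases. Biological samples include blood products such as plasma, serum, and the like. Thus, a "nucleic acid sample" contains nucleic acid from any source (for example DNA, RNA, cDNA, mRNA, tRNA, miRNA, and so on). Where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse-transcribing the RNA or mRNA into DNA. In the present application, the nucleic acid sample is preferably derived from a biological source, for example human or non-human cells, tissues, and the like. The term "non-human" refers to all non-human animals and entities, including but not limited to vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, lagomorphs, pigs, goats, horses, dogs, cats, birds, and so on. Non-human also includes invertebrates and prokaryotes, for example bacteria, plants, yeast, viruses, and so on. Thus, the nucleic acid sample used in the methods and systems of the invention is a nucleic acid sample derived from any organism, whether eukaryotic or prokaryotic.
In the present invention, the inventors found that the GC content of the target sequence has a considerable influence on the capture efficiency of solution-phase target capture. To capture multiple target sequences effectively, the bait copy number is preferably compensated according to the GC content of the target sequence: the lower or the higher the GC content, the more the copy number of the bait sequence corresponding to that target sequence is increased.
The inventors found that good capture efficiency is obtained for target sequences with a GC content of about 50%, for example ±10%; for target sequences with other GC contents, bait copy number compensation is required to obtain good capture efficiency. After comprehensive testing with human genome sequences, the inventors found that, to achieve better capture efficiency, taking the copy number coefficient of a bait sequence with 50% GC content as the baseline of 1, for GC contents between 10% and 90% the bait copy number coefficient increases by 0.08-0.12 for every 1% of deviation from 50%. For example, a GC content of 68% deviates by 18%, giving a bait copy number coefficient of 2.44-3.16.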
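The linear compensation rule above can be written as a one-line formula: coefficient = 1 + |GC% - 50| × step, with step between 0.08 and 0.12. The Python sketch below is a minimal illustration of that rule; the function name and the default step of 0.10 are choices made here, not values from the text.

```python
def copy_number_coefficient(gc_percent, step=0.10):
    """Bait copy-number coefficient relative to a 50% GC baseline of 1.

    `step` is the increase per 1% deviation from 50% GC; the text gives a
    range of 0.08-0.12 (the default of 0.10 is an illustrative midpoint).
    """
    if not 10 <= gc_percent <= 90:
        # Outside 10%-90% the text designs baits in flanking regions instead.
        raise ValueError("GC content outside the 10%-90% range of the rule")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step outside the 0.08-0.12 range given in the text")
    return 1 + abs(gc_percent - 50) * step


if __name__ == "__main__":
    low = copy_number_coefficient(68, step=0.08)
    high = copy_number_coefficient(68, step=0.12)
    print(f"GC 68%: coefficient {low:.2f}-{high:.2f}")  # GC 68%: coefficient 2.44-3.16
```

The 68% case reproduces the 2.44-3.16 range of the worked example in the text.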
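Where the GC content is below 10% or above 90%, or the sequence is a low-complexity sequence, the corresponding bait design method is as follows: when the target region has extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as substitute regions for probe design, generally regions within 300 bp on either side of the target region, preferably within 150 bp.
In the present invention, a low-complexity region is a region composed of very few kinds of elements (such as oligonucleotides), for example a simple repeat sequence such as a microsatellite.
In the present invention, a library is preferably built from the fragmented sample DNA.
In one embodiment, the bait copy number compensation method can be expressed simply as follows: target sequences are sorted by GC content into 6 tiers: tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: below 10% or above 90%. The copy number of the tier 3 bait sequences is the baseline copy number; the copy number of the bait sequences for tiers 2 and 4 needs to be increased, for example to 2.2-2.8 times that of tier 3; the copy number of the bait sequences for tiers 1 and 5 needs to be increased further, for example to 3-4 times that of tier 3. In one embodiment, for tier 6, where GC content is below 10% or above 90%, or where the sequence is a low-complexity sequence, the bait design method is: use the regions flanking the target region as substitute regions for probe design, generally regions within 300 bp on either side of the target region, preferably within 150 bp.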
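The tiered scheme above can be sketched as a lookup from GC content to a tier and a copy-number multiplier range. In the Python sketch below, the function names are hypothetical, and the handling of the shared boundaries (30%, 40%, 60%, 70%) is an assumption: the text lists them in two tiers each, so the sketch assigns a boundary value to the tier closer to 50%.

```python
def gc_content(seq):
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_tier(gc_percent):
    """Map a GC percentage to tiers 1-6 as described in the text.

    ASSUMPTION: boundary values go to the tier nearer 50% GC.
    """
    if gc_percent < 10 or gc_percent > 90:
        return 6
    if gc_percent < 30:
        return 1
    if gc_percent < 40:
        return 2
    if gc_percent <= 60:
        return 3
    if gc_percent <= 70:
        return 4
    return 5


# Copy-number multiplier ranges relative to tier 3, taken from the text.
# Tier 6 has no multiplier: baits are designed in flanking regions instead.
COPY_MULTIPLIER_RANGE = {
    1: (3.0, 4.0),
    2: (2.2, 2.8),
    3: (1.0, 1.0),
    4: (2.2, 2.8),
    5: (3.0, 4.0),
    6: None,
}


if __name__ == "__main__":
    target = "ATGCGCGGCCGCTTAGGCGC"
    tier = gc_tier(gc_content(target))
    print(tier, COPY_MULTIPLIER_RANGE[tier])
```

A design pipeline would pick one value inside the range for each tier; the text itself only gives the ranges as examples.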
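In one embodiment, for each target region, the bait sequence is the one or more bait sequences with the best composite score with respect to specificity, dimers, hairpin structure, and position relative to the target region, the composite score being given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. S_specificity and the other component scores are all values between 0 and 1, calculated as follows:
S_specificity scoring rule: each newly designed bait sequence is aligned against the genome using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If for any result the difference between the Tm with the target region and the Tm with the non-specific region is < 5 °C, preferably < 10 °C, the bait sequence is discarded and redesigned; otherwise the average Tm over all non-specific alignment results is calculated, and finally S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific alignment results and Tm_target is the Tm of the bait sequence with the target region.
S_dimer scoring rule: each newly designed bait sequence is aligned for dimer analysis against every previously designed bait sequence using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the average Tm over all alignment results is calculated, and finally S_dimer = (47 - Tm_avg)/47. Preferably, if any Tm ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise the average Tm over all alignment results is calculated, and S_dimer = (37 - Tm_avg)/37.
S_hairpin scoring rule: for any bait sequence, the Smith-Waterman algorithm is used to compute its best self-alignment structure, and the thermodynamic Tm is calculated from that structure. If Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (47 - Tm)/47. Preferably, if Tm ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (37 - Tm)/37.
S_distance scoring rule: given the coordinates of the target region to be designed, for any bait sequence the coordinate difference δDistance from the target region is calculated. The acceptable difference is set to 150, an empirical value; if the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise S_distance = (150 - δDistance)/150. If no suitable bait sequence can be designed within a coordinate difference of 150 from the target region, the limit may also be set to 300, in which case S_distance = (300 - δDistance)/300.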
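The four scoring rules and their rejection thresholds can be combined into a single function, sketched below in Python. This is an illustration under stated assumptions, not the proprietary software mentioned in the examples: the function name is hypothetical, the weights are the midpoints of the stated ranges (a = 0.30, b = 0.10, c = 0.20, d = 0.40), the Tm values are taken as precomputed inputs, and an empty alignment list is treated as an average Tm of 0.

```python
# ASSUMPTION: midpoints of the weight ranges given in the text.
WEIGHTS = {"a": 0.30, "b": 0.10, "c": 0.20, "d": 0.40}


def _mean(values):
    """Average of a list; an empty list (no alignments) counts as 0."""
    return sum(values) / len(values) if values else 0.0


def score_bait(tm_target, off_target_tms, dimer_tms, hairpin_tm, delta_distance,
               tm_gap=5.0, tm_max=47.0, max_distance=150.0, weights=WEIGHTS):
    """Return the composite score S, or None if the bait must be redesigned.

    tm_gap: required Tm(target) - Tm(off-target) margin (5, or 10 when strict).
    tm_max: dimer/hairpin Tm limit (47, or 37 when strict).
    max_distance: accepted coordinate difference (150, or 300 as fallback).
    """
    # Rejection rules: any violation discards the candidate bait.
    if any(tm_target - tm < tm_gap for tm in off_target_tms):
        return None
    if any(tm >= tm_max for tm in dimer_tms):
        return None
    if hairpin_tm >= tm_max:
        return None
    if delta_distance > max_distance:
        return None

    s_specificity = 1 - _mean(off_target_tms) / (tm_target - tm_gap)
    s_dimer = (tm_max - _mean(dimer_tms)) / tm_max
    s_hairpin = (tm_max - hairpin_tm) / tm_max
    s_distance = (max_distance - delta_distance) / max_distance

    return (weights["a"] * s_specificity + weights["b"] * s_dimer
            + weights["c"] * s_hairpin + weights["d"] * s_distance)


if __name__ == "__main__":
    s = score_bait(tm_target=75.0, off_target_tms=[40.0, 50.0],
                   dimer_tms=[20.0, 30.0], hairpin_tm=25.0, delta_distance=30)
    print(round(s, 4))
```

Among the surviving candidates for a target region, the bait (or baits) with the largest S would be selected, as described above.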
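In the present invention, calculation of a sequence's Tm is not restricted to any particular method; Tm values calculated by various methods can all be used in the invention. Tm values obtained by different methods essentially do not reverse the effect of the invention; only the degree of the effect differs. Although Tm can be calculated by the nearest-neighbor method with the SantaLucia 2007 thermodynamic parameter table, Tm values calculated by other methods can be mapped to it, and a person skilled in the art can compare the Tm values obtained by various methods through simple experiments and thereby choose appropriately among them.
In the inventors' experience, for the coding regions of the human genome, bait sequences suitable for the invention can be designed for more than 99% of target regions, indicating that the GC tiering and Tm filtering described above are reasonable.
In certain embodiments, hybridization between the nucleic acid analog and the target nucleic acid is carried out under preferably stringent conditions that are sufficient to support hybridization between the nucleic acid analog and DNA, wherein the nucleic acid analog comprises a linking compound and a region complementary to the target nucleic acid sample, to provide the nucleic acid analog/DNA hybridization complex. The complex is then captured via the linking compound and washed under conditions sufficient to remove non-specifically bound nucleic acids, after which the hybridized target nucleic acid sequence is eluted from the captured nucleic acid analog/DNA complex.
In certain embodiments, the nucleic acid analog comprises a chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, that can bind to a solid support. The solid support may comprise a corresponding capture compound, for example streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin. The invention is not limited to the linking compound used, and alternative linking compounds are equally applicable to the methods, bait sequences, and kits of the invention.
In the present invention, the chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, may be attached to any base in the nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA, or morpholino nucleic acid). Preferably, the nucleic acid analog strand may include ribose and/or deoxyribose, and the chemical group or linking compound may be attached to a base on the ribose and/or deoxyribose. For example, synthesis of the nucleic acid analog includes the use of labeled ATP, CTP, GTP, and/or UTP. Methods for labeling nucleotides with CyDye, DIG, biotin, rhodamine, fluorescein, and the like are known in the art. For example, biotin can be used as a nucleic acid probe label; it can be attached to the C atom at position 5 of UTP or dUTP in a nucleic acid molecule and can be detected through its binding to avidin. However, the invention is not limited to known labels and labeling methods; labels and labeling methods discovered in the future are also contemplated by the invention.
In embodiments of the invention, the plurality of target nucleic acid molecules preferably comprises the whole genome of an organism, or at least one chromosome, or a nucleic acid molecule of any size. Preferably, the nucleic acid molecule is at least about 200 kb, at least about 500 kb, at least about 1 Mb, at least about 2 Mb, or at least about 5 Mb in size, more preferably about 100 kb to about 5 Mb, about 200 kb to about 5 Mb, about 500 kb to about 5 Mb, about 1 Mb to about 2 Mb, or about 2 Mb to about 5 Mb.
In certain embodiments, the target nucleic acid is from an animal, plant, or microorganism; in preferred embodiments, the target nucleic acid molecule is from a human. If the amount of nucleic acid sample is small (for example human nucleic acid samples obtained in certain situations, such as the genome of a developing fetus), the nucleic acid may be amplified before carrying out the method of the invention, for example by whole-genome amplification. Pre-amplification may be necessary for carrying out the method of the invention, for example in forensic applications (e.g., for genetic profiling in forensic science).
In certain embodiments, the plurality of target nucleic acid molecules is a set of genomic DNA molecules. The bait sequences may be selected from, for example: a plurality of bait sequences defining multiple exons, introns, or regulatory sequences from multiple genetic loci; a plurality of bait sequences defining the complete sequence of at least one single genetic locus, the locus being of any size, preferably at least 1 Mb, or at least one of the specific sizes given above; a plurality of bait sequences defining single nucleotide polymorphisms (SNPs); or a plurality of bait sequences defining an array, for example a tiling array designed to capture the complete sequence of at least one whole chromosome.
As used herein, the term "hybridization" refers to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are affected by several factors, such as the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions used, the melting temperature (Tm) of the hybrid formed, and the GC content of the nucleic acids. Although the invention is not limited to particular hybridization conditions, stringent hybridization conditions are preferably used. Stringent hybridization conditions are sequence-dependent and vary with the hybridization parameters (e.g., salt concentration, presence of organic compounds, and so on). In general, "stringent" conditions are selected to be about 5 °C to about 20 °C lower than the Tm of the specific nucleic acid sequence at a defined ionic strength and pH. Preferably, stringent conditions are about 5 °C to 10 °C lower than the thermal melting point of the specific nucleic acid binding its complement. The Tm is the temperature (at a defined ionic strength and pH) at which 50% of the nucleic acid (e.g., the target nucleic acid) is hybridized to a perfectly matched probe.
As used herein, "stringent conditions" may be, for example, hybridization at 42 °C in 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 mg/ml), 0.1% SDS, and 10% dextran sulfate, with washes at 42 °C in 0.2× SSC (sodium chloride/sodium citrate) and at 55 °C in 50% formamide, followed by a wash at 55 °C in 0.1× SSC containing EDTA. As another example, a buffer containing 35% formamide, 5× SSC, and 0.1% (w/v) sodium dodecyl sulfate (SDS) is expected to be suitable for hybridization at 45 °C for 16-72 hours under moderately non-stringent conditions.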
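As used herein, the term "primer" refers to an oligonucleotide, whether naturally occurring and obtained by purification or restriction digestion, or produced synthetically, that is capable of acting as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (for example, in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). The primer is preferably single-stranded for maximum amplification efficiency. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be long enough to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer depends on many factors, including temperature, primer source, and the method used.
As used herein, the term "bait" or "bait sequence" refers to an oligonucleotide (e.g., a nucleotide sequence), whether naturally occurring and obtained by purification or restriction digestion, or produced synthetically, recombinantly, or by PCR amplification, that is capable of hybridizing to at least a portion of another oligonucleotide of interest, for example a target nucleic acid sequence. A probe may be single-stranded or double-stranded. Probes can be used for the detection, identification, and isolation of specific gene sequences.
As used herein, the term "target nucleic acid molecule" refers to a molecule or sequence from the genomic region of interest. The preselected probes determine the range of the target nucleic acid molecules. Thus, the "target" is what is sought to be distinguished from other nucleic acid sequences. A "fragment" is defined as a nucleic acid region within the target sequence, that is, a "fragment" or "portion" of a nucleic acid sequence.
As used herein, the term "isolate", when used in relation to a nucleic acid, as in "isolating a nucleic acid", refers to a nucleic acid sequence that is identified and separated from at least one other component or contaminant with which it is ordinarily associated in its natural source. An isolated nucleic acid is present in a form that differs from the form in which it occurs in nature. In contrast, non-isolated nucleic acids, such as DNA and RNA, are found in the state in which they exist in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form.
As used herein, the term "bait sequence identical to the target nucleic acid sequence" refers to a sequence whose complementary sequence can hybridize to the target nucleic acid sequence. Preferably, hybridization is performed under stringent conditions. When the target region has extremely high or extremely low GC content, or is a low-complexity region, no bait sequence can be designed for the region, that is, bait coverage is zero, so a suitable region for bait design is sought on either side of the target region; baits are generally designed within 300 bp on either side, preferably within 150 bp.
In embodiments of the invention, the transcription primers for the bait sequences used in the capture methods and kits described herein comprise a linking compound, for example a binding moiety. A binding moiety includes any moiety attached to or incorporated at the 5' end of an amplification primer for subsequent capture of the nucleic acid analog/target nucleic acid hybridization complex. A binding moiety is any sequence introduced at the 5' end of a primer sequence, for example a capturable 6-histidine (6HIS) sequence. For example, a primer containing a 6HIS sequence can be captured by nickel, for example in tubes, microwells, or purification columns that are nickel-coated or contain nickel-coated beads, particles, and the like, where the beads are packed into a column and the sample is loaded and passed through the column to capture the complexity-reduced complexes (followed, for example, by elution of the target). Another example of a binding moiety for embodiments of the invention is a hapten, for example digoxigenin, attached for example to the 5' end of an amplification primer. Digoxigenin can be captured using an anti-digoxigenin antibody, for example a matrix coated with or containing anti-digoxigenin antibody.
In certain embodiments, the binding moiety is biotin, and the capture matrix, for example beads such as paramagnetic particles, is coated with streptavidin for separating the target nucleic acid/transcription product complex from non-specifically hybridized target nucleic acids. For example, when biotin is the binding moiety, a streptavidin (SA)-coated matrix, for example SA-coated beads (e.g., magnetic beads/particles), is used to capture the biotin-labeled nucleic acid analog/target complex. The SA-bound complex is washed, and the hybridized target nucleic acid is eluted from the complex for sequencing.
Bait sequences whose sequences correspond to at least one region of the genome may be provided in parallel on a solid support using maskless array synthesis technology. Alternatively, probes may be obtained serially using a standard DNA synthesizer and applied to the solid support, or may be obtained from an organism and immobilized on the solid support. After hybridization, nucleic acids that did not hybridize, or that hybridized non-specifically to the nucleic acid analogs, are separated from the support-bound nucleic acid analogs by washing. The remaining nucleic acids, specifically bound to the nucleic acid analogs, are eluted from the solid support, for example in hot water or in a nucleic acid elution buffer containing, for example, TRIS buffer and/or EDTA, to produce an eluate enriched in the target nucleic acid molecules.
Alternatively, bait sequences for the target molecules may be synthesized on a solid support as described above, released from the solid support as a pool of bait sequences, and amplified. The pool of transcribed, released nucleic acid analogs may be immobilized covalently or non-covalently on a support, for example glass, metal, ceramic, or polymeric beads or other solid supports. The nucleic acid analogs may be designed for convenient release from the solid support, for example by providing, at or near the end of the nucleic acid analog closest to the support, an acid- or base-labile nucleic acid sequence that releases the nucleic acid analog under low or high pH conditions, respectively. A variety of cleavable linking compounds are known in the art. The support may be provided, for example, as a column with a liquid inlet and outlet. Methods for immobilizing nucleic acids on supports are familiar in the art, for example incorporating biotin-labeled nucleotides into the nucleic acid analogs and coating the support with streptavidin, whereby the coated support non-covalently attracts and immobilizes the nucleic acid analogs in the pool. The sample is passed over the support containing the nucleic acid analogs under hybridization conditions, whereby target nucleic acid molecules hybridized to the immobilized support can be eluted for subsequent analysis or other uses.
The term "nucleic acid" may include, for example, but is not limited to: deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and artificial nucleic acids such as peptide nucleic acid (PNA), morpholino nucleic acid (morpholino), locked nucleic acid (LNA), glycol nucleic acid (GNA), and threose nucleic acid (TNA). As used herein, the terms "nucleic acid", "nucleic acid sequence", and "nucleic acid molecule" should be interpreted broadly and may be, for example, oligomers or polymers of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimics thereof. The term includes molecules consisting of natural nucleobases, sugars, and covalent internucleoside (backbone) linkages, as well as functionally similar molecules with non-natural nucleobases, sugars, and covalent internucleoside (backbone) linkages, or combinations thereof. Such modified or substituted nucleic acids may be preferred over the native form because of desirable properties, such as enhanced affinity for nucleic acid target molecules and increased stability in the presence of nucleases and other enzymes, and are described herein by the term "nucleic acid analog" or "nucleic acid mimic". Preferred examples of nucleic acid mimics are molecules comprising peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-locked nucleic acid (xylo-LNA), phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino nucleic acid, and phosphoramidate, or functionally similar nucleic acid derivatives.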
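Examples
Example 1: Design of bait sequences
1000 loci in exons and introns of the human genome were selected at random (their distribution is shown in Table 1) to test the method of the invention. Bait sequences were designed for these 1000 random target sequences for subsequent testing.
Table 1: Chromosome distribution of the 1000 randomly selected loci
Chromosome Count Chromosome Count
chr1 92 chr12 73
chr2 67 chr13 23
chr3 53 chr14 15
chr4 43 chr15 29
chr5 45 chr16 41
chr6 124 chr17 36
chr7 42 chr18 14
chr8 46 chr19 31
chr9 34 chr20 21
chr10 61 chr21 9
chr11 80 chr22 21
Bait sequence design includes the following steps: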
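1. First, target sequence characterization includes the following steps:
a) target sequences are sorted by GC content into 5 tiers: tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%;
b) the spatial structure of the target sequences is analyzed, and target sequences that can form stable spatial structures are flagged;
2. Second, the criteria and scoring for bait sequences:
a) the target sequence length is in the range of 60-150 bp;
b) specificity is maintained; the principle of specificity is that the thermodynamic stability of bait binding to non-target regions is significantly weaker than that of binding to the target region; the general criterion is Tm(target region) - Tm(non-specific region) ≥ 5 °C; for part of the data, Tm(target region) - Tm(non-specific region) ≥ 10 °C is used for comparison (strong specificity restriction); different thermodynamic calculation methods have a large impact on the results, and here Tm is calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table;
c) no secondary structure is formed; secondary structure includes dimers and hairpins, that is, the designed bait sequences are not allowed to form dimers or hairpins; the dimer formed between any two bait sequences has Tm ≤ 47 °C, with ≤ 37 °C used for part of the data for comparison (strict dimer restriction); the hairpin formed by any bait sequence with itself has Tm ≤ 47 °C, with ≤ 37 °C used for part of the data for comparison (strict hairpin restriction); different thermodynamic calculation methods have a large impact on the results, and here Tm is calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table;
d) for each target region, the candidate bait sequences are analyzed and a composite score is computed from each candidate's specificity, dimers, hairpin structure, and position relative to the target region; the best one or more bait sequences (those with the largest scoring-function value) are then selected according to the score: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. The scores are computed by proprietary software according to the following rules:
S_specificity scoring rule: each newly designed bait sequence is aligned against the genome using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If for any result the difference between the Tm with the target region and the Tm with the non-specific region is < 5 °C, the bait sequence is discarded and redesigned (< 10 °C for part of the data, for comparison); otherwise the average Tm over all alignment results is calculated, and finally S_specificity = 1 - Tm_avg/(Tm_target - 5), with S_specificity = 1 - Tm_avg/(Tm_target - 10) for part of the data for comparison, where Tm_avg is the average Tm of the bait sequence over all non-specific alignment results and Tm_target is the Tm of the bait sequence with the target region.
S_dimer scoring rule: each newly designed bait sequence is aligned for dimer analysis against every previously designed bait sequence using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the average Tm over all alignment results is calculated, and finally S_dimer = (47 - Tm_avg)/47. For part of the data, for comparison, the bait sequence is discarded and redesigned if any Tm ≥ 37 °C; otherwise the average Tm over all alignment results is calculated, and S_dimer = (37 - Tm_avg)/37.
S_hairpin scoring rule: for any bait sequence, the Smith-Waterman algorithm is used to compute its best self-alignment structure, and the thermodynamic Tm is calculated from that structure. If Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (47 - Tm)/47. For part of the data, for comparison, the bait sequence is discarded and redesigned if Tm ≥ 37 °C; otherwise S_hairpin = (37 - Tm)/37.
S_distance scoring rule: given the coordinates of the target region to be designed, for any bait sequence the coordinate difference δDistance from the target region is calculated. The acceptable difference is set to 150, an empirical value; if the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise S_distance = (150 - δDistance)/150. Where no suitable bait sequence could be designed within a coordinate difference of 150 from the target region, the limit was set to 300 for part of the data for comparison, with S_distance = (300 - δDistance)/300.
3. Third, bait copy number compensation is performed according to the specific target region:
a) according to the stability classification of the target sequences, the copy number of the tier 3 bait sequences is used as the reference copy number (reference 1); the bait sequences for tiers 1 and 5 need more copies, 2.5 times that of tier 3; next are tiers 2 and 4, whose bait sequences also need somewhat more copies, 3.5 times that of tier 3;
b) for target sequences that form stable spatial structures, the bait copy number is doubled;
c) when the target region may be a region of particular interest, for example a region where a fusion event may occur, the bait copy number is doubled;
d) in addition, a parallel test without bait copy number compensation was carried out under the same conditions as a control.
4. Finally, when no probe can be designed for the target sequence, for example when the target region has extremely high or extremely low GC content, or is a low-complexity region (a region composed of very few kinds of elements such as oligonucleotides, for example a simple microsatellite repeat), the bait coverage of that region is zero, so a suitable region for bait design is sought on either side of the target region; baits are generally designed within 300 bp on either side; if a suitable bait sequence can be designed within 150 bp, this is recorded as a control. In this example, 138 of the randomly selected target sequences fell into this category: for 68 of them bait sequences were successfully designed within 150 bp on either side, for another 22 within 150-300 bp on either side, and for the remaining 48 no probe could be designed in these regions.
5. The final bait sequence design results are shown in Table 2.
Table 2: Bait sequence design results
Figure PCTCN2016106595-appb-000001
The strict scoring-function restriction uses the following conditions: Tm(target region) - Tm(non-specific region) ≥ 10 °C, S_specificity = Tm_avg/37; Tm < 37 °C, S_dimer = (37 - Tm_avg)/37; Tm < 37 °C, S_hairpin = (37 - Tm_avg)/37.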
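The flanking-region fallback in step 4 sorts each difficult target by how far from the target a usable bait was found. The Python sketch below classifies an offset into the three outcomes described there and tallies the counts reported in this example; the function and key names are hypothetical.

```python
def fallback_class(offset_bp):
    """Classify a fallback bait by its distance (bp) from the target region.

    offset_bp is None when no bait could be designed within 300 bp.
    """
    if offset_bp is None or offset_bp > 300:
        return "no_design"
    if offset_bp <= 150:
        return "within_150"
    return "within_150_300"


# Counts reported in Example 1 for the 138 targets needing the fallback.
REPORTED = {"within_150": 68, "within_150_300": 22, "no_design": 48}

if __name__ == "__main__":
    total = sum(REPORTED.values())
    rescued = REPORTED["within_150"] + REPORTED["within_150_300"]
    print(f"{rescued} of {total} fallback targets received a bait "
          f"({100 * rescued / total:.1f}%)")
```

The three reported counts add up to the 138 targets stated in the text, so 90 of them received a flanking bait.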
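Example 2: Preparation of bait sequences
The bait sequences designed in Example 1 were prepared as follows:
1. A specific sequence 20 bases long is added to each of the 5' and 3' ends of the bait sequence. The design principles for the specific sequences are: 1) they produce no non-specific amplification products on the target (to-be-captured) genome; 2) their GC content is between 30% and 70%, preferably between 40% and 60%; 3) the two do not form a dimer, or the dimer formed has a stability (Tm) ≤ 47 °C, preferably ≤ 37 °C. The sequences to be synthesized are thus formed; all bait sequences share the same pair of specific sequences, for example:
5'-end specific sequence - bait sequence (60-150 bp) - 3'-end specific sequence (SEQ ID NO. 1):
ATATAGATGCCGTCCTAGCG-NNNNNNNNNN……NNNNNNNNNN-TGGGCACAGGAAAGATACTT, where "NNNNNNNNNN……NNNNNNNNNN" denotes the bait sequence.
2. The specific sequences are generated by solution-phase hybridization capture sequencing probe design software developed by the inventors.
3. The sequences to be synthesized are produced as oligonucleotides on a large scale by chip-based methods well known in the art; the oligonucleotides are then eluted from the chip with aqueous ammonia, purified, and dissolved in double-distilled water to form an oligonucleotide pool.
4. Using the oligonucleotide pool as template and the 5'-end and 3'-end primers complementary to the 5'-end and 3'-end specific sequences as primers, polymerase chain reaction amplification is performed with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to obtain a large double-stranded DNA pool. The specific steps are as follows:
1) The reaction mixture is as follows:
Reagent Volume
Water 37 μl
10× PCR Buffer 5 μl
10 mM dATP 1 μl
10 mM dCTP 1 μl
10 mM dGTP 1 μl
10 mM TTP 1 μl
5'-end primer (10 μM) 1 μl
3'-end primer (10 μM) 1 μl
JumpStart Taq DNA Polymerase 1 μl
Oligonucleotide pool 1 μl
2) The reaction conditions are as follows:
Figure PCTCN2016106595-appb-000002
3) The PCR product is purified with the QIAGEN PCR purification kit (QIAGEN, Cat No./ID 28104) according to its instructions;
4) Using the 5'-end primer carrying a T7 sequence (TAATACGACTCACTATAGGG) at its 5' end as the forward primer and the 3'-end primer as the reverse primer, polymerase chain reaction amplification is performed with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to form a double-stranded DNA pool carrying the T7 sequence at the 5' end. The procedure is as follows:
5) Reaction mixture:
Reagent Volume
Water 37 μl
10× PCR Buffer 5 μl
10 mM dATP 1 μl
10 mM dCTP 1 μl
10 mM dGTP 1 μl
10 mM TTP 1 μl
BAITS_5_PRIMER_N-T7 (10 μM) 1 μl
BAITS_3_PRIMER_N (10 μM) 1 μl
JumpStart Taq DNA Polymerase 1 μl
Oligonucleotide pool 1 μl
6) The reaction conditions are as follows:
Figure PCTCN2016106595-appb-000003
The product of the previous PCR step is separated by gel electrophoresis, non-specific bands are removed, and the 120-210 bp fragments are recovered and purified with the Qiagen gel extraction kit (QIAquick Gel Extraction Kit, Cat No./ID 28704);
7) Using the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02), with NTPs of a nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA, or morpholino nucleic acid) and biotin-labeled UTP as substrates, the gel-purified product of the previous step is transcribed in vitro to prepare a pool of biotin-labeled nucleic acid analogs:
Reagent Volume (μl)
ATP analog (GNA, LNA, PNA, TNA, or morpholino nucleic acid, 10 mM) 2
CTP analog (GNA, LNA, PNA, TNA, or morpholino nucleic acid, 10 mM) 2
GTP analog (GNA, LNA, PNA, TNA, or morpholino nucleic acid, 10 mM) 2
UTP analog (GNA, LNA, PNA, TNA, or morpholino nucleic acid, 10 mM) 1.6
Biotin-UTP (1 mM) 3
10× buffer 2
Reaction buffer (10×) 2
Gel-purified product containing the T7 sequence from the previous step 5.4
Incubate at 37 °C for 8-12 hours to obtain the maximum yield of the nucleic acid analog pool; after purification, dilute to 500 ng/μl and store in a -80 °C freezer.
In addition, a parallel test with the standard nucleotides ATP, CTP, GTP, UTP, and Biotin-UTP under the same conditions was run as a control.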
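The fragment sizes in this example can be checked by assembling the transcription template in code. The Python sketch below concatenates the T7 sequence, the two 20-base specific sequences of SEQ ID NO. 1, and a bait of 60-150 bases; the function name is hypothetical, and the placement of the T7 sequence directly upstream of the 5'-end specific sequence is an assumption based on the forward primer described in step 4).

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"       # T7 sequence from step 4)
FIVE_PRIME_TAG = "ATATAGATGCCGTCCTAGCG"    # 5'-end specific sequence
THREE_PRIME_TAG = "TGGGCACAGGAAAGATACTT"   # 3'-end specific sequence


def transcription_template(bait):
    """Return T7 + 5' tag + bait + 3' tag for a bait of 60-150 bases.

    ASSUMPTION: the T7 sequence sits immediately 5' of the 5'-end tag.
    """
    if not 60 <= len(bait) <= 150:
        raise ValueError("bait length must be 60-150 bases")
    return T7_PROMOTER + FIVE_PRIME_TAG + bait + THREE_PRIME_TAG


if __name__ == "__main__":
    shortest = len(transcription_template("N" * 60))
    longest = len(transcription_template("N" * 150))
    print(shortest, longest)  # 120 210
```

Under that assumption the template lengths span 120-210 bases, which matches the 120-210 bp window recovered from the gel.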
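Example 3: Target region library capture
1. Preparation of the DNA library for high-throughput capture sequencing:
1) Take 1 μg of genomic DNA of the species under test and shear it randomly into 150-250 bp fragments with a Bioruptor pico sonicator;
2) Prepare the pre-capture short-fragment library with the Illumina TruSeq DNA library preparation kit.
2. Hybridization capture of the target region library using the prepared nucleic acid analog pool and the short-fragment library of the target species:
1) Preparation of blocking primers:
Figure PCTCN2016106595-appb-000004
Synthesize the primers according to the sequences above, 100 OD of each; dilute each primer to 1000 μM and mix in equal volumes; name the mixture Block 1;
2) Dilute cot-1 DNA and salmon sperm DNA to 100 ng/μl and mix in equal volumes; label the mixture Block 2;
3) Mix 6 μl of Block 1 with 5 μl of Block 2; label the mixture Block Mix;
4) Mix 1 μg of the short-fragment genomic library with 11 μl of Block Mix and concentrate to 9 μl in a refrigerated vacuum centrifugal concentrator; label as reagent S1 and keep on ice until use;
6) Preheat 20 μl of hybridization solution (20× SSPE, 2× Denhardt's, 1 mM EDTA, 1% SDS) on a 65 °C metal block; label as S2;
7) Take 5 μl of pure water, mix, then add 2 μl of the 500 ng/μl nucleic acid analog pool and mix by pipetting slowly several times; label as S3 and keep on ice until use;
8) Set the PCR instrument to 95 °C for 5 min; 65 °C for 16 h; 65 °C hold; heated lid at 105 °C;
9) Place S1 on the PCR block and start the program; after the program has run at 65 °C for 5 min, place S2 in the PCR block; after incubating for a further 5 min, place S3 in the PCR block and incubate for a further 2 min;
10) Set the pipette to 13 μl, transfer 13 μl of S2 to S3, then transfer 9 μl of S1 to S3; mix the mixture thoroughly by pipetting slowly several times, seal the tube, close the heated lid, and incubate for 16 hours to hybridize the probes with the library;
11) Place 50 μl of Dynabeads MyOne Streptavidin T1 (Invitrogen, Cat. No. 65601) in a 1.5 ml low-binding centrifuge tube and add 200 μl of binding buffer [0.5 M NaCl (Ambion, Cat. No. AM9760G), 2 mM Tris-HCl, pH 8.0 (Ambion, Cat. No. AM9855G), 0.2 mM EDTA (Ambion, Cat. No. AM9260G)]; mix by pipetting, place on a magnetic rack for 1 min, and remove the supernatant;
12) Remove the tube from the magnetic rack, add another 200 μl of binding buffer, mix by pipetting, place on the magnetic rack for 1 min, and remove the supernatant;
13) Repeat step 11 twice, for a total of 3 bead washes, and finally resuspend the beads in 200 μl of binding buffer;
14) Transfer the probe-library hybridization mixture (product of step 9) into the bead suspension, seal the tube, and mix on a rotating mixer for 30 min to allow binding;
15) Place the tube on the magnetic rack for 2 min and remove the supernatant;
16) Remove the tube from the magnetic rack, add 200 μl of wash buffer 1 [10× SSC (Ambion, Cat. No. AM9763), 1% SDS (Invitrogen, Cat. No. 24730020)] to resuspend the beads, seal the tube, and wash on a rotating mixer for 10 min;
17) Place the tube on the magnetic rack for 2 min and remove the supernatant;
18) Remove the tube from the magnetic rack, add 200 μl of wash buffer 2 [1× SSC (Ambion, Cat. No. AM9763), 5% SDS (Invitrogen, Cat. No. 24730020)] preheated to 65 °C to resuspend the beads, and incubate on the PCR block at 65 °C for 10 min;
19) Place the tube on the magnetic rack for 2 min and remove the supernatant;
20) Repeat steps 17-18 twice, for a total of 3 washes;
21) Add 200 μl of 80% ethanol to the tube, let stand for 30 s, remove all of the ethanol, air-dry at room temperature for 2 min, then add 20 μl of pure water and resuspend the beads by pipetting slowly several times;
3. PCR enrichment of the target region capture product, using the NEB high-fidelity PCR kit (
Figure PCTCN2016106595-appb-000005
High-Fidelity PCR Kit, New England Biolabs, Catalog #E0553S):
1) Reaction mixture:
Reagent Volume
5× Phusion HF 10 μl
10 mM dNTPs 1 μl
Post Primer Mix (10 μM each) 1 μl
Resuspended beads (step 20) 20 μl
Phusion DNA polymerase 0.5 μl
H2O 17.5 μl
2) The reaction conditions are as follows:
Figure PCTCN2016106595-appb-000006
3) Purify the PCR product with the Beckman Agencourt AMPure XP Kit [Beckman (p/n A63880)];
4) Sequence the target region capture library on an Illumina sequencing platform; PE150 read mode is recommended.
4. Results
1) The sequencing library was sequenced on an Illumina HiSeq 4000 high-throughput sequencer to obtain sequencing data for the 1000 loci;
2) The BWA MEM software was used to align the sequencing data to the human reference genome HG19 with the following parameters: bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample", thereby obtaining the single nucleotide polymorphisms, insertions, or deletions that differ from the reference genome, i.e., the detected gene mutations.
3) The samtools stats tool in samtools-1.2 was used to compute the data size, mapping rate, duplication rate, and quality values; the samtools depth tool in the same software was then used to calculate the sequencing depth at every position of the target region;
4) From the sequencing depth at every position of the target region, the numbers of bases with sequencing depth ≥1, ≥4, ≥10, and ≥20 were counted separately, and each count was divided by the total number of bases in the target region, giving the 1× coverage, 4× coverage, 10× coverage, and 20× coverage parameters.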
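The coverage calculation in step 4) can be sketched in Python as follows. The sketch assumes the three-column, tab-separated text that `samtools depth` writes for a single input file (reference name, position, depth); the function name is hypothetical. Because positions with zero depth may be absent from that output unless they are explicitly requested, the total size of the target region is passed in separately rather than inferred from the number of lines.

```python
def coverage_from_depth_lines(lines, target_size, thresholds=(1, 4, 10, 20)):
    """Compute Nx coverage fractions and mean depth over a target region.

    lines: iterable of 'chrom<TAB>pos<TAB>depth' strings.
    target_size: total number of bases in the target region.
    """
    depths = [int(line.rstrip("\n").split("\t")[2])
              for line in lines if line.strip()]
    stats = {
        f"{t}x": sum(d >= t for d in depths) / target_size
        for t in thresholds
    }
    stats["mean_depth"] = sum(depths) / target_size
    return stats


if __name__ == "__main__":
    demo = [
        "chr1\t100\t25",
        "chr1\t101\t12",
        "chr1\t102\t5",
        "chr1\t103\t1",
    ]
    # The fifth target base has no line, i.e. depth 0.
    print(coverage_from_depth_lines(demo, target_size=5))
```

For the toy input, four of five bases reach 1×, three reach 4×, two reach 10×, and one reaches 20×.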
Table 3: Capture sequencing results for the 1000 loci
Figure PCTCN2016106595-appb-000007
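As Table 3 shows, taking LNA as an example, the average depth is 451.53×; the 4× coverage is 94.35% and the 20× coverage is 93.64%, indicating good coverage and uniformity, while the total data volume is only 8.52 Mb reads. The beneficial effects of such results are: 1) a small amount of sequencing, which effectively reduces cost; 2) a high average sequencing depth, that is, each target site is sequenced many times, so data accuracy is high; 3) high coverage, with few missed sites; 4) good uniformity, that is, the vast majority of sites have similar coverage depths.
Analysis of the comparison subsets and the control data shows that, relative to LNA: without bait copy number compensation, coverage and uniformity decreased by 4.5 and 5.1 percentage points, respectively; under the strong specificity restriction, strict dimer restriction, strict hairpin restriction, and strict scoring-function restriction, coverage and uniformity increased by 6.3 and 7.8 percentage points, respectively; coverage and uniformity for regions within 150 bp were 2.3 and 3.8 percentage points higher, respectively, than for regions within 150-300 bp; and in the parallel test with the standard nucleotides ATP, CTP, GTP, UTP, and Biotin-UTP in the same proportions, coverage and uniformity decreased by 5.3 and 4.8 percentage points, respectively.
Although the invention has been described with reference to preferred embodiments, it should be understood that the scope of protection of the invention is not limited to the embodiments described here. Other embodiments of the invention will be readily apparent to and understood by those skilled in the art in light of the description and practice of the invention disclosed here. The description and embodiments are to be regarded as exemplary only; the true scope and spirit of the invention are defined by the claims.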

Claims (10)

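  1. A method for enriching target sequence nucleic acid from a nucleic acid sample, the method comprising:
    a) providing a nucleic acid sample containing a target nucleic acid sequence and a bait sequence that is identical to the target nucleic acid sequence or characteristic of the target sequence;
    b) performing in vitro transcription with the bait sequence as template to prepare a nucleic acid analog, the nucleic acid analog carrying a binding moiety, for example a biotin binding moiety;
    c) fragmenting the nucleic acid sample, preferably preparing a library;
    d) hybridizing the nucleic acid analog with the nucleic acid sample, so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybridization complex;
    e) by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complex from non-specifically hybridized nucleic acids, thereby removing non-target sequence nucleic acid.
  2. The method according to claim 1, further comprising step f): amplifying the nucleic acid analog/DNA hybridization complex to enrich the target sequence nucleic acid.
  3. The method according to claim 1, wherein in step b) in vitro transcription is performed using the nucleic acid analogs GNA, LNA, PNA, TNA, or morpholino nucleic acid to prepare the nucleic acid analog.
  4. The method according to claim 1, wherein the nucleic acid sample is genomic DNA, RNA, cDNA, or mRNA, and where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse-transcribing the RNA or mRNA into DNA.
  5. The method according to claim 1, wherein the bait sequence has properties selected from the following: i) it does not form a hairpin structure by itself, and baits do not form dimers with one another; ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as substitute regions for bait design, with the same design method as for the target region; iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
  6. The method according to claim 4, wherein compensating the copy number in ii) according to the GC content of the target nucleic acid sequence means: taking the copy number coefficient of a bait sequence with 50% GC content as the baseline of 1, for GC contents between 10% and 90% the bait copy number coefficient increases by 0.08-0.12 for every 1% of deviation.
  7. The bait sequence is on a solid support, for example on a microarray slide.
  8. The method according to claim 1, wherein for each target region, the bait sequence is the one or more bait sequences with the best composite score with respect to specificity, dimers, hairpin structure, and position relative to the target region, the composite score being given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45, the individual scores being calculated as follows:
    S_specificity: each newly designed bait sequence is aligned against the genome, and for each aligned sequence the Tm between the bait sequence and that sequence is calculated; the difference between the bait's Tm with the target region and its Tm with any aligned sequence is ≥ 5 °C, preferably ≥ 10 °C; the average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific alignment results and Tm_target is the Tm of the bait sequence with the target region;
    S_dimer: each newly designed bait sequence is aligned for dimer analysis against every previously designed bait sequence, and for each aligned bait sequence the Tm between the two is calculated; this Tm is < 47 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and S_dimer = (47 - Tm_avg)/47; preferably Tm < 37 °C, the average Tm is calculated in the same way, and S_dimer = (37 - Tm_avg)/37;
    S_hairpin: for any bait sequence, its best self-alignment structure is computed and the Tm of that structure is calculated; this Tm is < 47 °C, and S_hairpin = (47 - Tm)/47; this Tm is < 47 °C, and S_hairpin = (37 - Tm_avg)/37;
    S_distance: given the coordinates of the target region, for each newly designed bait sequence the coordinate difference δDistance from the target region is calculated; δDistance is less than 150, and S_distance = (150 - δDistance)/150.
  9. The bait sequence referred to in any one of claims 1-8.
  10. A kit comprising the bait sequence of claim 9, the kit including, but not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.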
PCT/CN2016/106595 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample WO2017181670A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2016403554A AU2016403554A1 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610250133.3A CN105925671B (zh) 2016-04-22 2016-04-22 Method for enriching target nucleic acid sequence from nucleic acid sample
CN201610250133.3 2016-04-22

Publications (1)

Publication Number Publication Date
WO2017181670A1 true WO2017181670A1 (zh) 2017-10-26

Family

ID=56839769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/106595 WO2017181670A1 (zh) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample

Country Status (3)

Country Link
CN (1) CN105925671B (zh)
AU (2) AU2016102398A4 (zh)
WO (1) WO2017181670A1 (zh)

Also Published As

Publication number Publication date
AU2016403554A1 (en) 2018-12-13
CN105925671B (zh) 2019-07-23
CN105925671A (zh) 2016-09-07
AU2016102398A4 (en) 2019-05-02

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16899252

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016403554

Country of ref document: AU

Date of ref document: 20161121

Kind code of ref document: A

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/02/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16899252

Country of ref document: EP

Kind code of ref document: A1