WO2017181670A1 - A method for enriching nucleic acids of a target sequence from a nucleic acid sample - Google Patents
- Publication number
- WO2017181670A1 (PCT/CN2016/106595)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- sequence
- target
- bait
- average
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- The invention relates to the capture, enrichment and analysis of nucleic acid sequences. More specifically, it relates to a target sequence enrichment method based on liquid-phase capture.
- Target region capture technology captures the nucleic acid sequences of a target region by a specific technique and then sequences the resulting library. This enables deep sequencing of the target region at a greatly reduced sequencing cost.
- PCR is a common technique for enriching target regions, and multiplex PCR is commonly used to capture multiple target regions at once. Multiplex PCR is better suited to hotspots or small target regions. For larger target regions, such as those longer than 100 kb, multiplex PCR is no longer suitable in terms of cost and technical complexity.
- The present invention provides a target sequence enrichment method based on liquid-phase capture.
- The invention provides a method of enriching nucleic acids of a target sequence from a nucleic acid sample, the method comprising:
- a nucleic acid sample comprising a target nucleic acid sequence, and a bait sequence that is identical to or characteristic of the target nucleic acid sequence
- In the library prepared in step c), a linker sequence is ligated to both ends of each nucleic acid sample fragment. Step e) is further followed by a step f), in which the nucleic acid analog/DNA hybrid complex is amplified using the linker sequence, thereby enriching the nucleic acid of the target sequence.
- The bait sequence has properties selected from the following:
  i) it forms no hairpin structure by itself and no dimers with other bait sequences;
  ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence;
  iii) when the target region has very high or very low GC content, or is a low-complexity region, baits are designed in the regions on both sides of the target region, which serve as substitute regions, using the same design method as for the target region;
  iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
- The copy number of the bait sequence is also compensated according to whether the target nucleic acid sequence is a region of particular interest.
- The nucleic acid sample is genomic DNA, RNA, cDNA or mRNA.
- The nucleic acid sample is RNA or mRNA.
- The bait sequence is on a solid support, such as a microarray slide.
- The solid support may also be a plurality of beads or a microarray.
- The nucleic acid analogs carry a binding moiety.
- In step b), the nucleic acid analog is prepared by in vitro transcription and is GNA, LNA, PNA, TNA or a morpholino nucleic acid. Preferably, the nucleic acid analog carries a binding moiety.
- The binding moiety is a biotin binding moiety.
- The bait copy number is compensated according to the GC content of the target sequence. The lower or higher the GC content, the more the copy number of the corresponding bait sequence is increased.
- Compensating the copy number according to the GC content of the target nucleic acid sequence works as follows. The bait copy-number coefficient at 50% GC content is taken as the baseline. For GC contents between 10% and 90%, the coefficient is increased by 0.08-0.12 for every 1% deviation from 50%.
- The bait copy-number compensation method divides targets into 6 tiers according to the GC content of the target sequence:
  tier 1: 10%-30%;
  tier 2: 30%-40%;
  tier 3: 40%-60%;
  tier 4: 60%-70%;
  tier 5: 70%-90%;
  tier 6: below 10% or above 90%.
  The copy number of tier-3 bait sequences is the reference copy number. Tier-2 and tier-4 bait sequences have more copies than tier 3, for example 2.2-2.8 times tier 3. Tier-1 and tier-5 bait sequences have still more copies, for example 3-4 times tier 3.
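The tiering above can be sketched as a lookup from GC content to a tier and a relative copy-number range. This is a minimal illustration under stated assumptions, and the function names are hypothetical. The listed tier boundaries overlap (30% appears in both tiers 1 and 2), so the sketch assumes half-open intervals, which map each GC value to exactly one tier. Tier 6 has no multiplier in the text, so the sketch returns None for it.

```python
from typing import Optional, Tuple

# Copy-number multipliers relative to the tier-3 reference, as ranges from the text.
# Tier 6 (<10% or >90% GC) is deliberately absent: such targets get flanking baits.
TIER_MULTIPLIERS = {
    1: (3.0, 4.0),
    2: (2.2, 2.8),
    3: (1.0, 1.0),
    4: (2.2, 2.8),
    5: (3.0, 4.0),
}


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_tier(gc_percent: float) -> int:
    """Map a GC percentage to tiers 1-6.

    Assumption: half-open intervals [10,30), [30,40), [40,60), [60,70), [70,90],
    because the text's tier endpoints overlap.
    """
    if gc_percent < 10 or gc_percent > 90:
        return 6
    if gc_percent < 30:
        return 1
    if gc_percent < 40:
        return 2
    if gc_percent < 60:
        return 3
    if gc_percent < 70:
        return 4
    return 5


def copy_multiplier_range(gc_percent: float) -> Optional[Tuple[float, float]]:
    """(low, high) multiplier versus tier 3, or None for tier-6 targets."""
    return TIER_MULTIPLIERS.get(gc_tier(gc_percent))
```

For example, a target with 65% GC falls in tier 4 and would receive 2.2-2.8 times the tier-3 copy number.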
- The bait design method for such regions uses the regions on both sides of the target region as substitute regions for probe design. Generally, a region within 300 bp on either side of the target region is selected, preferably within 150 bp.
- The bait sequence is 60-150 bp long, preferably 80-120 bp.
- "Free of dimers" means that any dimer formed between any two bait sequences has a Tm < 47 °C, preferably < 37 °C. The Tm is preferably calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table.
- Any hairpin structure formed by a bait sequence itself has a Tm < 47 °C, preferably < 37 °C. The Tm is preferably calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table.
- Tm(avg) denotes the average Tm value, and Tm(target) denotes the Tm of the bait sequence with the target region;
- The invention also provides specific bait sequences for carrying out the method of the invention, namely the bait sequences referred to in the first aspect of the invention.
- The specific bait sequence is identical to or characteristic of the target nucleic acid sequence, and has the following properties:
  i) it forms no hairpin structure by itself and no dimers with other bait sequences;
  ii) its copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence;
  iii) when the target region has very high or very low GC content, or is a low-complexity region, it is designed in the regions on both sides of the target region, which serve as substitute regions, using the same design method as for the target region;
  iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
- The present invention also provides a kit comprising the bait sequences of the second aspect of the invention. The kit further comprises, but is not limited to, a double-linker molecule and a plurality of different oligonucleotide probes.
- The kit comprises compositions and reagents for carrying out the method of the first aspect of the invention.
- The kit includes, but is not limited to, a double-linker molecule, a plurality of different oligonucleotide probes, and bait sequences that are identical to or characteristic of the target nucleic acid sequence. The bait sequences have the following properties:
  i) they form no hairpin structures by themselves and no dimers with each other;
  ii) their copy number is compensated according to the GC content, spatial structure and/or degree of interest of the target nucleic acid sequence;
  iii) when the target region has very high or very low GC content, or is a low-complexity region, probes are designed in the regions on both sides of the target region, which serve as substitute regions, using the same design method as for the target region;
  iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
- The kit comprises two different double-linker molecules.
- The kit may further comprise one or more additional components selected from DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, and hybridization, wash and/or elution solutions.
- The kit comprises a magnet.
- The kit comprises one or more enzymes together with the corresponding reagents, buffers and the like. Examples are restriction enzymes such as MlyI, together with the buffers/reagents for MlyI digestion.
- The invention provides a target sequence enrichment method based on liquid-phase capture, comprising the following steps:
  1) bait sequence design;
  2) nucleic acid synthesis of the bait sequences (using conventional primers or solid-phase synthesis);
  3) preparation of a nucleic acid analog by in vitro transcription, where the nucleic acid analog comprises a binding moiety;
  4) pretreatment of the nucleic acid sample by a library preparation method, where the sample may be genomic DNA, RNA, cDNA, mRNA, etc.;
  5) formation of nucleic acid analog/DNA hybrid complexes between the nucleic acid analog and the target-sequence nucleic acid by complementary base pairing;
  6) elution to remove weakly complementary nucleic acid analog/DNA hybrids and non-target nucleic acids;
  7) specific amplification of the paired nucleic acid analog/DNA using the linker sequences added during sample pretreatment, thereby enriching the nucleic acid of the target sequence.
- The term "sample" is used in its broadest sense and includes samples or cultures obtained from any source, preferably a biological source.
- Biological samples may be obtained from animals, including humans, and include fluids, solids, tissues and gases.
- Biological samples include blood products such as plasma and serum.
- A "nucleic acid sample" comprises nucleic acids of any origin (e.g., DNA, RNA, cDNA, mRNA, tRNA, miRNA). When the nucleic acid sample is RNA or mRNA, the RNA or mRNA is reverse-transcribed into DNA before step c).
- the nucleic acid sample is preferably derived from a biological source, such as a human or non-human cell, tissue, and the like.
- The term "non-human" refers to all non-human animals and entities. These include, but are not limited to, vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, rabbits, pigs, goats, horses, dogs, cats and birds.
- Non-human entities also include invertebrates, prokaryotes and other organisms, such as bacteria, plants, yeast and viruses.
- Nucleic acid samples for use in the methods and systems of the invention may be derived from any organism, whether eukaryotic or prokaryotic.
- The inventors found that the GC content of a target sequence strongly influences its capture efficiency in liquid-phase capture. To capture multiple target sequences effectively, the bait copy number is preferably compensated according to the GC content of each target sequence. The lower or higher the GC content, the more copies of the corresponding bait sequence are used.
- The inventors found that target sequences with a GC content of about 50% (e.g., 50% ± 10%) are captured with good efficiency. Target sequences with other GC contents require bait copy-number compensation to achieve good capture efficiency.
- The bait copy-number coefficient at 50% GC content is taken as the baseline of 1. For GC contents between 10% and 90%, the coefficient is increased by 0.08-0.12 for every 1% deviation from 50%. For example, a GC content of 68% deviates by 18%, which gives a bait copy-number coefficient of 2.44-3.16.
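The per-percentage rule can be written as coefficient = 1 + k·|GC − 50|, with k between 0.08 and 0.12. The sketch below is a minimal illustration. The function name is hypothetical, and the default k of 0.10 is an assumption (the midpoint of the stated range). It reproduces the 68% worked example: 1 + 0.08 × 18 = 2.44 and 1 + 0.12 × 18 = 3.16.

```python
def copy_coefficient(gc_percent: float, step: float = 0.10) -> float:
    """Bait copy-number coefficient relative to a 50%-GC baseline of 1.

    coefficient = 1 + step * |GC% - 50|, with step in [0.08, 0.12] per the text.
    The default step of 0.10 is an assumed midpoint, not a value from the text.
    """
    if not 0.08 <= step <= 0.12:
        raise ValueError("step outside the 0.08-0.12 range given in the text")
    if not 10 <= gc_percent <= 90:
        # Outside 10%-90% the text switches to flanking substitute regions.
        raise ValueError("GC content outside 10%-90%")
    return 1 + step * abs(gc_percent - 50)
```

Usage: `copy_coefficient(68, step=0.08)` gives the lower bound of the 68% example, and `copy_coefficient(32, step=0.08)` gives the same value, because the rule is symmetric around 50%.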
- The corresponding bait design method applies when the target region has very high or very low GC content, or is a low-complexity region. In that case, probes are designed using the regions on both sides of the target region as substitute regions. A region within 300 bp on either side of the target region is generally selected as the substitute region, preferably within 150 bp.
- A low-complexity region is a region composed of only a few kinds of elements, such as oligonucleotides. An example is a simple repeat sequence such as a microsatellite.
- The bait copy-number compensation method can be expressed simply as follows. Targets are divided into 6 tiers according to the GC content of the target sequence:
  tier 1: 10%-30%;
  tier 2: 30%-40%;
  tier 3: 40%-60%;
  tier 4: 60%-70%;
  tier 5: 70%-90%;
  tier 6: below 10% or above 90%.
  The copy number of tier-3 bait sequences is the reference copy number. The copy number of bait sequences for tiers 2 and 4 is increased, for example to 2.2-2.8 times tier 3. The copy number of bait sequences for tiers 1 and 5 is increased further, for example to 3-4 times tier 3.
- The bait design method uses the regions on both sides of the target region as alternative regions for probe design. A region within 300 bp on either side of the target region is generally selected as the substitute region, preferably within 150 bp.
- The bait sequences are the one or more candidates with the best scores for specificity, dimer formation, hairpin structure and position relative to the target region.
- All scores S (e.g., S_specificity) take values between 0 and 1. The specific scoring method is as follows:
- Tm denotes the Tm value, and Tm(target) denotes the Tm of the bait sequence with the target region;
- The Tm calculation is not restricted to a particular method, and various methods of calculating Tm may be used in the present invention. Tm values obtained by different methods do not substantially reverse the effect of the invention, although the degree of the effect will vary.
- Tm can be calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table.
- Tm values calculated by other methods can be mapped onto these values. Those skilled in the art can compare Tm values calculated by various methods through simple experiments and select an appropriate calculated Tm value.
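To make the nearest-neighbor idea concrete, the sketch below sums stacking enthalpies and entropies along a perfectly matched duplex and converts the totals to a melting temperature. It rests on several assumptions:

- It uses the unified SantaLucia 1998 nearest-neighbor values at 1 M NaCl, not necessarily the "SantaLucia 2007" table the text cites.
- It applies no salt, mismatch or dangling-end corrections.
- It treats the duplex as non-self-complementary, with total strand concentration C_T (hence the C_T/4 term).

The function name and the default concentration are hypothetical. Dimer and hairpin Tm values need additional mismatch and loop parameters that this sketch does not cover.

```python
import math

# Unified nearest-neighbor parameters (SantaLucia 1998), Watson-Crick stacks, 1 M NaCl.
# Values are (dH in kcal/mol, dS in cal/(K*mol)); keys are 5'->3' top-strand dinucleotides.
# A stack and its reverse complement share values (e.g. CA/GT is the same as TG/AC).
NN_PARAMS = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4),
    "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2),
    "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per duplex end according to the terminal pair.
INIT_GC = (0.1, -2.8)
INIT_AT = (2.3, 4.1)
R = 1.987  # gas constant, cal/(K*mol)


def tm_nearest_neighbor(seq: str, total_strand_conc: float = 0.5e-6) -> float:
    """Tm in deg C of `seq` paired with its perfect complement (no salt correction)."""
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected an A/C/G/T sequence of length >= 2")
    dh = ds = 0.0
    for terminal_base in (seq[0], seq[-1]):
        h, s = INIT_GC if terminal_base in "GC" else INIT_AT
        dh += h
        ds += s
    for i in range(len(seq) - 1):
        h, s = NN_PARAMS[seq[i:i + 2]]
        dh += h
        ds += s
    # Non-self-complementary duplex with equal strand amounts: use C_T / 4.
    tm_kelvin = (dh * 1000.0) / (ds + R * math.log(total_strand_conc / 4))
    return tm_kelvin - 273.15
```

A GC-rich 20-mer scores a higher Tm than an AT-rich one. Thresholds such as the 47 °C / 37 °C dimer and hairpin limits would be applied to values from a full implementation (e.g., one that also handles mismatches and loops), not from this simplified duplex-only version.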
- For the human genome coding region, suitable bait sequences can be designed for more than 99% of the target regions. This indicates that the GC-content binning and Tm filtering described above are reasonable.
- The hybridization between the nucleic acid analog and the target nucleic acid is preferably carried out under stringent conditions sufficient to support nucleic acid analog/DNA hybridization. The nucleic acid analog comprises the linking compound and a region complementary to the target nucleic acid sample, which provides the nucleic acid analog/DNA hybrid complex.
- The complex is then captured via the linking compound and washed under conditions sufficient to remove non-specifically bound nucleic acids. The hybridized target nucleic acid sequence is then eluted from the captured nucleic acid analog/DNA complex.
- The nucleic acid analog comprises a chemical group or linking compound capable of binding to a solid support, such as a binding moiety like biotin or digoxigenin.
- The solid support may comprise a corresponding capture compound, such as streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin.
- The invention is not limited to the linking compounds used; alternative linking compounds are equally suitable for use in the methods, bait sequences and kits of the invention.
- The chemical group or linking compound, such as a binding moiety like biotin or digoxigenin, may be linked to any base of the nucleic acid analog. The analog may be glycol nucleic acid (GNA), locked nucleic acid (LNA), peptide nucleic acid (PNA), threose nucleic acid (TNA) or morpholino nucleic acid.
- The nucleic acid analog chain may comprise ribose and/or deoxyribose.
- The chemical group or linking compound, such as a binding moiety like biotin or digoxigenin, may be attached to a base of the ribose and/or deoxyribose nucleotides.
- The synthesis of the nucleic acid analog includes the use of labeled ATP, CTP, GTP and/or UTP.
- Methods of labeling nucleotides with CyDye, DIG, biotin, rhodamine, fluorescein, etc. are known in the art.
- Biotin can be used as a nucleic acid probe label. It is attached at the C-5 position of the uracil base of UTP or dUTP and can be detected by binding to avidin.
- The present invention is not limited to known labels and labeling methods; labels and labeling methods discovered in the future are also within the scope of the present invention.
- The plurality of target nucleic acid molecules preferably comprises the whole genome of an organism, at least one chromosome, or nucleic acid molecules of any size.
- The nucleic acid molecules are at least about 200 kb, at least about 500 kb, at least about 1 Mb, at least about 2 Mb or at least about 5 Mb in size. More preferred size ranges are from about 100 kb to about 5 Mb, from about 200 kb to about 5 Mb, from about 500 kb to about 5 Mb, from about 1 Mb to about 2 Mb, or from about 2 Mb to about 5 Mb.
- The target nucleic acid is from an animal, plant or microorganism; in a preferred embodiment, the target nucleic acid molecules are human. The amount of nucleic acid sample may be small, as with some human samples such as the genome of a developing fetus. In that case, the nucleic acid can be amplified before performing the methods of the invention, for example by whole-genome amplification. Such pre-amplification may be necessary in some applications of the methods, such as forensic genetics.
- The plurality of target nucleic acid molecules is a set of genomic DNA molecules.
- The bait sequences may be selected, for example, from:
  a plurality of bait sequences defining multiple exons, introns or regulatory sequences from multiple genetic loci;
  a plurality of bait sequences defining the full sequence of at least one individual genetic locus of any size, preferably at least 1 Mb or at least one of the sizes specified above;
  a plurality of bait sequences defining single nucleotide polymorphisms (SNPs);
  a plurality of bait sequences defining an array, for example a chimeric array designed to capture the full sequence of at least one complete chromosome.
- SNP: single nucleotide polymorphism
- Hybridization refers to the pairing of complementary nucleic acids. Hybridization and hybridization strength (e.g., the strength of binding between nucleic acids) are affected by many factors. These include the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions, the melting temperature (Tm) of the hybrid formed, and the GC content of the nucleic acids.
- Tm: melting temperature
- GC: GC content of the nucleic acid
- Stringent hybridization conditions are sequence-dependent and vary with hybridization parameters (e.g., salt concentration, presence of organic solvents).
- Stringent conditions are generally selected to be about 5 °C to about 20 °C below the Tm of the specific nucleic acid sequence at a defined ionic strength and pH.
- Stringent conditions are about 5 °C to 10 °C below the melting temperature of the specific nucleic acid bound by the complementary nucleic acid.
- The Tm is the temperature (under defined ionic strength and pH) at which 50% of a nucleic acid (e.g., the target nucleic acid) is hybridized to a perfectly matched probe.
- Stringent conditions may be, for example, hybridization at 42 °C in a solution containing:
  50% formamide;
  5 × SSC (0.75 M NaCl, 0.075 M sodium citrate);
  50 mM sodium phosphate (pH 6.8);
  0.1% sodium pyrophosphate;
  5 × Denhardt's solution;
  sonicated sperm DNA (50 mg/ml);
  0.1% SDS;
  10% dextran sulfate.
  Hybridization is followed by washing at 42 °C in 0.2 × SSC (sodium chloride/sodium citrate), then at 50 °C in 50% formamide, and then at 55 °C in 0.1 × SSC containing EDTA.
- Buffers containing 35% formamide, 5 × SSC and 0.1% (w/v) sodium dodecyl sulfate (SDS) are expected to be suitable for hybridization at 45 °C for 16-72 hours under moderately non-stringent conditions.
- The term "primer" refers to an oligonucleotide, whether purified, cleaved or produced synthetically, that can act as a starting point for synthesis. It does so when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand, for example in the presence of nucleotides and an inducing agent such as DNA polymerase, at a suitable temperature and pH.
- The primer is preferably single-stranded for maximum amplification efficiency.
- The primer is an oligodeoxynucleotide.
- The primer must be long enough to initiate synthesis of extension products in the presence of the inducing agent. The exact length of the primer depends on many factors, including temperature, the source of the primer and the method used.
- The term "bait" or "bait sequence" refers to an oligonucleotide (e.g., a nucleotide sequence) capable of hybridizing to another oligonucleotide of interest, such as at least a portion of a target nucleic acid sequence. It may occur naturally or be produced by purification, cleavage, synthesis, recombination or PCR amplification.
- The probe can be single-stranded or double-stranded. Probes can be used for the detection, identification and isolation of specific gene sequences.
- A target nucleic acid molecule refers to a molecule or sequence from a target genomic region.
- The preselected probes determine the extent of the target nucleic acid molecules.
- The term "target" serves to distinguish the sequence from other nucleic acid sequences.
- A "fragment" is defined as a region of nucleic acid within the sequence of interest, i.e., a "fragment" or "portion" of a nucleic acid sequence.
- The term "isolated", when used in reference to a nucleic acid, as in "isolated nucleic acid", refers to a nucleic acid sequence that has been identified and separated from at least one component or contaminant with which it is ordinarily associated.
- An isolated nucleic acid exists in a form different from that in which it occurs in nature.
- Non-isolated nucleic acids, such as DNA and RNA, exist in their naturally occurring state.
- The isolated nucleic acid, oligonucleotide or polynucleotide may exist in single-stranded or double-stranded form.
- A bait sequence consistent with a target nucleic acid sequence refers to a sequence whose complementary sequence can hybridize to the target nucleic acid sequence.
- The hybridization is carried out under stringent conditions.
- No bait sequence can be designed in a target region that has very high or very low GC content, or is a low-complexity region, so bait coverage of that region is zero. Suitable regions are then sought on the left and right sides of the target region for bait design, generally within 300 bp on either side and preferably within 150 bp.
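The flanking search described above can be sketched as a two-pass scan. The first pass tries bait windows within the preferred distance (150 bp) of the target, and the second pass extends to the maximum distance (300 bp). This is a minimal illustration, not the authors' implementation. The function name, the half-open coordinate convention, and the `can_design` predicate are hypothetical. `can_design` stands in for whatever per-window checks (GC tier, complexity, specificity) a real pipeline would apply. The sketch also assumes that the whole bait window must lie within the flank distance.

```python
from typing import Callable, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def find_substitute_region(
    target: Interval,
    bait_len: int,
    can_design: Callable[[int, int], bool],
    preferred_flank: int = 150,
    max_flank: int = 300,
) -> Optional[Tuple[Interval, str]]:
    """Return the nearest designable flanking bait window and which pass found it.

    Windows are scanned outward from the target edges; at each offset the left
    flank is tried before the right flank.
    """
    start, end = target
    for limit, label in ((preferred_flank, "preferred"), (max_flank, "extended")):
        # offset = gap between the target edge and the nearest edge of the bait window
        for offset in range(0, limit - bait_len + 1):
            left = (start - offset - bait_len, start - offset)
            right = (end + offset, end + offset + bait_len)
            for window in (left, right):
                if window[0] >= 0 and can_design(*window):
                    return window, label
    return None  # no bait could be designed within max_flank bp
```

Returning the pass label lets a caller count targets rescued within 150 bp, within 150-300 bp, or not at all, which matches the kind of breakdown reported in Example 1.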
- A transcription primer for the bait sequences used in the capture methods and kits described herein comprises a linking compound, such as a binding moiety.
- The binding moiety comprises any moiety attached to or introduced at the 5' end of the amplification primer for subsequent capture of the nucleic acid analog/target nucleic acid hybridization complex.
- The binding moiety may be any sequence introduced at the 5' end of the primer sequence, such as a capturable six-histidine (6HIS) sequence.
- A primer comprising a 6HIS sequence can be captured by nickel, for example:
  in a nickel-coated tube, or a tube containing nickel-coated beads or particles;
  in a microwell;
  in a purification column packed with the beads, where the sample is loaded and passed through the column to capture the reduced-complexity complexes (followed, e.g., by target elution).
- Another example of a binding moiety for use in embodiments of the invention is a hapten, such as digoxigenin, ligated for example to the 5' end of the amplification primer.
- Digoxigenin can be captured using an anti-digoxigenin antibody, for example on a substrate coated with or containing an anti-digoxigenin antibody.
- In a preferred embodiment, the binding moiety is biotin and the capture matrix is a streptavidin (SA)-coated matrix, such as SA-coated beads (e.g., magnetic or paramagnetic beads/particles). The matrix is used to separate the target nucleic acid/transcription product complexes from non-specifically hybridized nucleic acids.
- Using maskless array synthesis, bait sequences corresponding to at least one region of the genome can be provided in parallel on a solid support.
- The probes can be synthesized sequentially with a standard DNA synthesizer and applied to the solid support, or obtained from an organism and immobilized on the solid support.
- Nucleic acids that have not hybridized, or have hybridized non-specifically, to the nucleic acid analogs are separated by washing from the support-bound nucleic acid analogs.
- The remaining nucleic acids, which are specifically bound to the nucleic acid analogs, are eluted from the solid support to produce an eluate enriched in target nucleic acid molecules. Elution can use, for example, hot water or a nucleic acid elution buffer containing Tris buffer and/or EDTA.
- The bait sequences for the target molecules can be synthesized on a solid support as described above, then released from the solid support and amplified as a pool of bait sequences.
- The transcribed pool of released nucleic acid analogs can be covalently or non-covalently immobilized on a carrier, such as glass, metal, ceramic or polymer beads or another solid carrier.
- The nucleic acid analogs can be designed to be readily released from the carrier. One way is to provide an acid- or base-labile nucleic acid sequence at or near the end of the nucleic acid analog closest to the carrier, which is cleaved under low or high pH conditions, respectively.
- A variety of cleavable linking compounds for nucleic acid analogs are known in the art.
- The carrier can be provided, for example, in a column having a liquid inlet and outlet.
- Methods of immobilizing nucleic acids on carriers are well known in the art. One example is to incorporate biotinylated nucleotides into the nucleic acid analogs and coat the carrier with streptavidin; the coated carrier then non-covalently attracts and immobilizes the nucleic acid analogs in the pool.
- The sample is passed over the carrier bearing the nucleic acid analogs under hybridization conditions. Target nucleic acid molecules that hybridize to the immobilized analogs can then be eluted for later analysis or other uses.
- The term "nucleic acid" may include, for example but not limited to, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and artificial nucleic acids. Examples of artificial nucleic acids are peptide nucleic acid (PNA), morpholino, locked nucleic acid (LNA), glycol nucleic acid (GNA) and threose nucleic acid (TNA).
- nucleic acid sequence or nucleic acid molecule
- RNA: oligomers of ribonucleic acid
- The term includes molecules composed of naturally occurring nucleobases, sugars and covalent internucleoside (backbone) linkages. It also includes molecules that function similarly but have non-naturally occurring nucleobases, sugars and covalent internucleoside (backbone) linkages, as well as combinations thereof. Such modified or substituted nucleic acids may be preferred over native forms because of desirable properties, such as enhanced affinity for nucleic acid target molecules and increased stability in the presence of nucleases and other enzymes. They are described herein as "nucleic acid analogs" or "nucleic acid mimics".
- Nucleic acid mimics include peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-LNA, phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino and phosphoramidate molecules, as well as functionally similar nucleic acid derivatives.
- PNA: peptide nucleic acid
- LNA: locked nucleic acid
- xylo-LNA: xylo-locked nucleic acid
- phosphorothioate; 2'-methoxy
- 2'-methoxyethoxy
- Example 1 Design of a bait sequence
- Table 1: Chromosome distribution of 1000 randomly selected loci
- the bait sequence design includes the following steps:
- the target sequence characteristic analysis includes the following steps:
- GC content is divided into 5 tiers:
  tier 1: 10%-30%;
  tier 2: 30%-40%;
  tier 3: 40%-60%;
  tier 4: 60%-70%;
  tier 5: 70%-90%;
- the target sequence length is in the range of 60-150 bp;
- The thermodynamic stability of bait binding at non-target regions is significantly weaker than at the target region;
- The general analysis criterion is Tm(target region) − Tm(non-specific region) ≥ 5 °C. For comparison, part of the data uses Tm(target region) − Tm(non-specific region) ≥ 10 °C (strong specificity restriction);
- Different thermodynamic calculation methods strongly affect the results; here Tm is calculated by the nearest-neighbor method using the SantaLucia 2007 thermodynamic parameter table;
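The specificity criterion above, a Tm margin of at least 5 °C between the target region and every non-specific region (10 °C under the strong restriction), amounts to a simple filter over candidate baits. The sketch below is a minimal illustration. The function names and the input layout (a bait ID mapped to its target Tm and a list of off-target Tms) are hypothetical, and the Tm values are assumed to come from whatever calculation method the pipeline uses.

```python
from typing import Dict, Iterable, List, Tuple


def passes_specificity(
    tm_target: float, tm_off_targets: Iterable[float], strict: bool = False
) -> bool:
    """True if Tm(target) - Tm(off-target) >= 5 deg C (>= 10 deg C if strict) for every site."""
    margin = 10.0 if strict else 5.0
    return all(tm_target - tm_off >= margin for tm_off in tm_off_targets)


def filter_candidates(
    candidates: Dict[str, Tuple[float, List[float]]], strict: bool = False
) -> List[str]:
    """Keep bait IDs whose (target Tm, off-target Tms) satisfy the specificity margin."""
    return [
        bait_id
        for bait_id, (tm_target, tm_offs) in candidates.items()
        if passes_specificity(tm_target, tm_offs, strict)
    ]
```

A bait with no predicted off-target sites passes trivially, because `all()` over an empty sequence is true.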
- S_dimer scoring rule: each newly designed bait sequence is aligned against every previously designed bait sequence for dimer analysis, using BLAT with default parameters, and each alignment is compared.
- bait sequence copy number compensation is performed according to the specific target area:
- The copy number of tier-3 bait sequences is used as the reference copy number (i.e., reference value 1). The bait sequences for tiers 1 and 5 need an increased copy number of 2.5 times tier 3. The bait sequences for tiers 2 and 4 also need additional copies, 3.5 times tier 3;
- the target area may be the focus area, for example, the area where the fusion event occurs, and the number of copies of the bait sequence doubles;
- Sometimes no probe can be designed for a target sequence. Examples include a target region with very high or very low GC content, and a low-complexity region (a region composed of only a few types of elements, such as oligonucleotide repeats like simple microsatellite repeats). In such cases no bait sequence can be designed within the region, so bait coverage of the region is zero. Suitable regions on the left and right flanks of the target region are then sought for bait design, generally within 300 bp on each side. Baits that can be designed within 150 bp are recorded as a control.
- Of the randomly selected target sequences, 138 fell into this category. For 68 of them, bait sequences were successfully designed within about 150 bp, and for another 22 within 150-300 bp. For the remaining 48, no probe could be designed in these flanking regions.
- The design principles for the specific sequences are: 1) they produce no non-specific amplification products on the (target) genome to be captured; 2) their GC content is between 30% and 70%, preferably between 40% and 60%; 3) the two specific sequences do not form a dimer with each other, or any dimer formed has a Tm below 47 °C, preferably below 37 °C.
- All sequences to be synthesized carry the same specific sequences at both ends and are assembled as follows:
- 5'-end specific sequence - bait sequence (variable length, 60-150 bp) - 3'-end specific sequence (SEQ ID NO. 1):
- The oligonucleotides are synthesized at large scale on a chip by methods well known in the art. The oligonucleotides on the chip are then eluted with aqueous ammonia, purified, and dissolved in double-distilled water to form an oligonucleotide pool.
- The pool is amplified by polymerase chain reaction to obtain a large double-stranded DNA pool. The primers are the 5'-end and 3'-end primers complementary to the 5'-end and 3'-end specific sequences. The polymerase is Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558). The procedure is as follows:
- Reaction mix: water 37 μl; 10× PCR buffer 5 μl; 10 mM dATP 1 μl; 10 mM dCTP 1 μl; 10 mM dGTP 1 μl; 10 mM TTP 1 μl; 5' primer (10 μM) 1 μl; 3' primer (10 μM) 1 μl; JumpStart Taq DNA Polymerase 1 μl; oligonucleotide pool 1 μl
- Reaction mix: water 37 μl; 10× PCR buffer 5 μl; 10 mM dATP 1 μl; 10 mM dCTP 1 μl; 10 mM dGTP 1 μl; 10 mM TTP 1 μl; BAITS_5_PRIMER_N-T7 (10 μM) 1 μl; BAITS_3_PRIMER_N (10 μM) 1 μl; JumpStart Taq DNA Polymerase 1 μl; oligonucleotide pool 1 μl
- The products of the previous PCR were separated by gel electrophoresis and non-specific bands were removed. Fragments in the 120-210 bp range were recovered and purified with the Qiagen Gel Extraction Kit (Cat. No./ID 28704).
- The purified product of the previous step is transcribed in vitro with the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02) to prepare a pool of biotin-labeled nucleic acid analogs. The substrates are nucleotide triphosphate analogs (glycerol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid) together with biotin-UTP:
- Transcription reaction (volumes in μl): ATP analog (GNA, LNA, PNA, TNA or morpholino nucleic acid, 10 mM) 2
- Block 2: dilute Cot-1 DNA and salmon sperm DNA to 100 ng/μl each, mix them in equal volumes, and label the mixture Block 2;
- Repeat step 11 twice, for three bead washes in total, and finally resuspend the magnetic beads in 200 μl of binding solution;
- Post-capture PCR mix: 5× Phusion HF buffer 10 μl; 10 mM dNTPs 1 μl; Post Primer Mix (each primer 10 μM) 1 μl; resuspended magnetic beads (step 20) 20 μl; Phusion DNA polymerase 0.5 μl; H2O 17.5 μl
- The sequencing data were aligned to the human reference genome hg19 with BWA-MEM, using the command bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample". Single-nucleotide polymorphisms, insertions and deletions relative to the reference genome, i.e., the detected gene mutations, were then identified from the alignments.
- The numbers of bases with sequencing depth ≥1, ≥4, ≥10 and ≥20 are counted separately. Each count is divided by the total number of bases in the target region, giving the 1×, 4×, 10× and 20× coverage.
- The average depth is 451.53×, 4× coverage is 94.35% and 20× coverage is 93.64%. This shows good coverage and uniformity from a total data volume of only 8.52 Mb of reads.
- The benefits of these results are: 1) a small sequencing volume, which effectively reduces cost; 2) a high average sequencing depth, i.e., each target site is sequenced many times, so data accuracy is high; 3) high coverage, so few sites are missed; 4) good uniformity, i.e., most sites have similar coverage depths.
- Relative to LNA, coverage and uniformity decreased by 4.5 and 5.1 percentage points, respectively. Under the strong specificity restriction, strict dimer restriction, strict hairpin-structure restriction and strict scoring-function restriction, coverage and uniformity increased by 6.3 and 7.8 percentage points, respectively. Baits designed within 150 bp of the target region and those designed within 150-300 bp differed in coverage and uniformity by 2.3 and 3.8 percentage points, respectively. In a parallel experiment using the standard nucleotides ATP, CTP, GTP and UTP with biotin-UTP in the same ratio, coverage and uniformity decreased by 5.3 and 4.8 percentage points, respectively.
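The sketch below shows the target-characteristic analysis and copy-number compensation from Example 1 in Python. It is an illustration, not the authors' code, and it rests on these assumptions:

- The SantaLucia (1998) unified nearest-neighbor parameters stand in for the SantaLucia 2007 table named in the text.
- Conditions default to 50 mM Na+ and 250 nM strand concentration.
- GC tiers are half-open intervals.
- The tier multipliers are taken exactly as printed in the text.
- The claim 6 slope is 0.10 per percent, the midpoint of the stated 0.08-0.12 range.

All function names are hypothetical.

```python
import math

# Nearest-neighbour parameters: dinucleotide read 5'->3' on the bait strand
# -> (dH in kcal/mol, dS in cal/(K*mol)). SantaLucia (1998) unified values,
# used here as a stand-in for the SantaLucia 2007 table cited in the text.
NN_PARAMS = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
R_CAL = 1.987  # gas constant, cal/(K*mol)

# GC tiers from the text as half-open intervals [lo, hi); tier 5 also includes 90%.
GC_TIERS = [(10, 30, 1), (30, 40, 2), (40, 60, 3), (60, 70, 4), (70, 90, 5)]
# Copy-number multipliers relative to tier 3, as printed in the text.
TIER_COPY = {1: 2.5, 2: 3.5, 3: 1.0, 4: 3.5, 5: 2.5}


def gc_percent(seq):
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_tier(gc):
    for lo, hi, tier in GC_TIERS:
        if lo <= gc < hi or (tier == 5 and gc == hi):
            return tier
    return None  # outside 10-90% GC: not covered by the tiers in the text


def nn_tm(seq, na_molar=0.05, strand_molar=2.5e-7):
    """Two-state Tm (deg C) of a non-self-complementary duplex with its perfect complement."""
    seq = seq.upper()
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        h, s = NN_PARAMS[seq[i:i + 2]]
        dh, ds = dh + h, ds + s
    for end in (seq[0], seq[-1]):  # initiation term for each terminal base pair
        h, s = (0.1, -2.8) if end in "GC" else (2.3, 4.1)
        dh, ds = dh + h, ds + s
    ds += 0.368 * (len(seq) - 1) * math.log(na_molar)  # salt correction applied to dS
    return dh * 1000.0 / (ds + R_CAL * math.log(strand_molar / 4)) - 273.15


def passes_specificity(tm_target, offtarget_tms, margin=5.0):
    """Tm(target) - Tm(non-specific) >= margin for every off-target hit (10.0 = strong)."""
    return all(tm_target - tm >= margin for tm in offtarget_tms)


def tier_copy_number(seq, focus_region=False):
    """Relative copy number from the GC tier, doubled for regions of particular interest."""
    tier = gc_tier(gc_percent(seq))
    if tier is None:
        return None
    return TIER_COPY[tier] * (2 if focus_region else 1)


def linear_copy_coefficient(gc, per_percent=0.10):
    """Claim 6 rule: coefficient 1 at 50% GC, + per_percent for each 1% deviation."""
    if not 10 <= gc <= 90:
        raise ValueError("the rule is stated for 10-90% GC only")
    return 1.0 + per_percent * abs(gc - 50)
```

The tier multipliers and the linear rule are parameterized separately, so either scheme can be swapped in. They do not give identical values for the same GC content.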
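The average depth and the 1×/4×/10×/20× coverage described above can be computed from per-base depths with the sketch below. It assumes depths for every base in the target region are already available, for example from samtools depth -a restricted to the target regions. The function name is hypothetical, and the authors' exact pipeline is not given.

```python
def coverage_metrics(depths, thresholds=(1, 4, 10, 20)):
    """Return (mean depth, {k: fraction of target bases with depth >= k}).

    depths: per-base sequencing depth for every base of the target region,
    including zero-depth positions.
    """
    total = len(depths)
    if total == 0:
        raise ValueError("no target bases")
    mean_depth = sum(depths) / total
    coverage = {k: sum(1 for d in depths if d >= k) / total for k in thresholds}
    return mean_depth, coverage


mean_depth, coverage = coverage_metrics([0, 3, 5, 12, 25, 40])
for k, fraction in coverage.items():
    print(f"{k}x coverage: {fraction:.1%}")
```

Zero-depth positions must be included, which is why samtools depth needs -a. Without them the denominator shrinks and coverage is overstated.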
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Chemical Kinetics & Catalysis (AREA)
Abstract
Description
Chromosome | Count | Chromosome | Count |
---|---|---|---|
chr1 | 92 | chr12 | 73 |
chr2 | 67 | chr13 | 23 |
chr3 | 53 | chr14 | 15 |
chr4 | 43 | chr15 | 29 |
chr5 | 45 | chr16 | 41 |
chr6 | 124 | chr17 | 36 |
chr7 | 42 | chr18 | 14 |
chr8 | 46 | chr19 | 31 |
chr9 | 34 | chr20 | 21 |
chr10 | 61 | chr21 | 9 |
chr11 | 80 | chr22 | 21 |
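As a consistency check, the per-chromosome counts in Table 1 can be tallied in a few lines of Python. They sum to the 1000 loci stated in the caption, and all of them lie on the 22 autosomes (chrX and chrY do not appear in the table).

```python
# Counts transcribed from Table 1 (chromosome -> number of randomly selected loci).
TABLE_1 = {
    "chr1": 92, "chr2": 67, "chr3": 53, "chr4": 43, "chr5": 45, "chr6": 124,
    "chr7": 42, "chr8": 46, "chr9": 34, "chr10": 61, "chr11": 80, "chr12": 73,
    "chr13": 23, "chr14": 15, "chr15": 29, "chr16": 41, "chr17": 36, "chr18": 14,
    "chr19": 31, "chr20": 21, "chr21": 9, "chr22": 21,
}

total = sum(TABLE_1.values())
share = {chrom: n / total for chrom, n in TABLE_1.items()}  # fraction of all loci
largest = max(TABLE_1, key=TABLE_1.get)  # chromosome contributing the most loci
print(total, largest, round(share[largest] * 100, 1))
```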
Reagent | Volume |
---|---|
Water | 37μl |
10×PCR Buffer | 5μl |
10mM dATP | 1μl |
10mM dCTP | 1μl |
10mM dGTP | 1μl |
10mM TTP | 1μl |
5' primer (10 μM) | 1μl |
3' primer (10 μM) | 1μl |
JumpStart Taq DNAPolymerase | 1μl |
Oligonucleotide pool | 1μl |
Reagent | Volume |
---|---|
Water | 37μl |
10×PCR Buffer | 5μl |
10mM dATP | 1μl |
10mM dCTP | 1μl |
10mM dGTP | 1μl |
10mM TTP | 1μl |
BAITS_5_PRIMER_N-T7(10μM) | 1μl |
BAITS_3_PRIMER_N(10μM) | 1μl |
JumpStart Taq DNAPolymerase | 1μl |
Oligonucleotide pool | 1μl |
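The final concentrations in the two 50 μl PCR reactions above follow from C1·V1 = C2·V2. The sketch below uses the first table; the second differs only in primer names. Reagents without a stated stock concentration are carried as None, and "uM" stands for μM.

```python
# (reagent, stock concentration, unit, volume in ul); None = no concentration given.
PCR_MIX = [
    ("Water", None, None, 37.0),
    ("10x PCR Buffer", 10.0, "x", 5.0),
    ("dATP", 10.0, "mM", 1.0),
    ("dCTP", 10.0, "mM", 1.0),
    ("dGTP", 10.0, "mM", 1.0),
    ("TTP", 10.0, "mM", 1.0),
    ("5' primer", 10.0, "uM", 1.0),
    ("3' primer", 10.0, "uM", 1.0),
    ("JumpStart Taq DNA Polymerase", None, None, 1.0),
    ("Oligonucleotide pool", None, None, 1.0),
]


def final_concentrations(mix):
    """Apply C1*V1 = C2*V2 to every component with a known stock concentration."""
    total_ul = sum(volume for *_, volume in mix)
    finals = {
        name: (stock * volume / total_ul, unit)
        for name, stock, unit, volume in mix
        if stock is not None
    }
    return total_ul, finals


total_ul, finals = final_concentrations(PCR_MIX)
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

With these volumes, each dNTP ends at 0.2 mM, each primer at 0.2 μM, and the buffer at 1×.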
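Reagent | Volume (μl) |
---|---|
ATP analog (GNA, LNA, PNA, TNA or morpholino nucleic acid, 10 mM) | 2 |
CTP analog (GNA, LNA, PNA, TNA or morpholino nucleic acid, 10 mM) | 2 |
GTP analog (GNA, LNA, PNA, TNA or morpholino nucleic acid, 10 mM) | 2 |
UTP analog (GNA, LNA, PNA, TNA or morpholino nucleic acid, 10 mM) | 1.6 |
Biotin-UTP (1 mM) | 3 |
10× buffer | 2 |
Reaction buffer (10×) | 2 |
Gel-purified T7-containing product from the previous step | 5.4 |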
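The in vitro transcription mix above totals 20 μl, and the ratio of biotin-UTP to UTP analog sets how densely the transcripts are labeled. The sketch computes both. It relies on the unit identity mM × μl = nmol and assumes that the UTP analog and biotin-UTP are incorporated equally well; actual incorporation efficiencies may differ.

```python
# (component, stock concentration in mM or None, volume in ul) from the table above.
IVT_MIX = [
    ("ATP analog", 10.0, 2.0),
    ("CTP analog", 10.0, 2.0),
    ("GTP analog", 10.0, 2.0),
    ("UTP analog", 10.0, 1.6),
    ("Biotin-UTP", 1.0, 3.0),
    ("10x buffer", None, 2.0),
    ("Reaction buffer (10x)", None, 2.0),
    ("Gel-purified T7 template", None, 5.4),
]

total_ul = sum(volume for _, _, volume in IVT_MIX)
# mM x ul = nmol, so these are the nanomoles of each nucleotide in the reaction.
nmol = {name: stock * volume for name, stock, volume in IVT_MIX if stock is not None}
final_mM = {name: amount / total_ul for name, amount in nmol.items()}
# Share of all uridine substrates that carry biotin (equal incorporation assumed).
biotin_fraction = nmol["Biotin-UTP"] / (nmol["Biotin-UTP"] + nmol["UTP analog"])

print(f"total volume: {total_ul:.1f} ul")
print(f"biotin-UTP share of all U substrates: {biotin_fraction:.1%}")
```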
Reagent | Volume |
---|---|
5×Phusion HF | 10μl |
10mM dNTPs | 1μl |
Post Primer Mix (each 10 μM) | 1μl |
Resuspended magnetic beads (step 20) | 20μl |
Phusion DNA polymerase | 0.5μl |
H2O | 17.5μl |
Claims (10)
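- A method for enriching target-sequence nucleic acids from a nucleic acid sample, the method comprising: a) providing a nucleic acid sample containing a target nucleic acid sequence, and bait sequences that are identical to the target nucleic acid sequence or characteristic of the target sequence; b) performing in vitro transcription with the bait sequences as templates to prepare nucleic acid analogs carrying a binding moiety, for example a biotin binding moiety; c) fragmenting the nucleic acid sample, preferably to prepare a library; d) hybridizing the nucleic acid analogs with the nucleic acid sample so that the nucleic acid analogs and the target-sequence nucleic acids form nucleic acid analog/DNA hybrid complexes; e) separating the nucleic acid analog/DNA hybrid complexes from non-specifically hybridized nucleic acids by means of the binding moiety, thereby removing non-target-sequence nucleic acids.
- The method according to claim 1, further comprising step f): amplifying the nucleic acid analog/DNA hybrid complexes so as to enrich the target-sequence nucleic acids.
- The method according to claim 1, wherein in step b) the nucleic acid analogs are prepared by in vitro transcription using the nucleic acid analogs GNA, LNA, PNA, TNA or morpholino nucleic acid.
- The method according to claim 1, wherein the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA, and, where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse-transcribing the RNA or mRNA into DNA.
- The method according to claim 1, wherein the bait sequences have properties selected from: i) they form no hairpin structures themselves and no dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as substitute regions for bait design, using the same design method as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
- The method according to claim 4, wherein compensating the copy number in ii) according to the GC content of the target nucleic acid sequence means: taking the copy-number coefficient of a bait sequence with 50% GC content as the baseline of 1, the copy-number coefficient increases by 0.08-0.12 for every 1% by which the GC content deviates from 50%, within the range of 10%-90% GC.
- The bait sequences are on a solid support, for example on a microarray slide.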
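- The method according to claim 1, wherein, for each target region, the bait sequences are the one or more bait sequences with the best overall score with respect to specificity, dimers, hairpin structures and position relative to the target region. The overall score is calculated with the scoring function S = a×S_specificity + b×S_dimer + c×S_hairpin + d×S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45, and the individual scores are calculated as follows. S_specificity: each newly designed bait sequence is aligned against the genome, and the Tm between the bait sequence and each aligned sequence is calculated. The Tm of the bait sequence with the target region minus its Tm with any aligned sequence is ≥ 5 °C, preferably ≥ 10 °C. The average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10). Here Tm_avg is the average Tm of the bait sequence over all its alignments to non-specific regions, and Tm_target is the Tm of the bait sequence with the target region. S_dimer: each newly designed bait sequence is subjected to dimer alignment analysis against every bait sequence already designed, and the Tm between the bait sequence and each aligned bait sequence is calculated, this Tm being < 47 °C. The average Tm over all aligned bait sequences is calculated, and S_dimer = (47 - Tm_avg)/47; preferably Tm < 37 °C and S_dimer = (37 - Tm_avg)/37. S_hairpin: for each bait sequence, its optimal self-alignment structure is determined and the Tm of that structure is calculated, this Tm being < 47 °C, and S_hairpin = (47 - Tm)/47; preferably Tm < 37 °C and S_hairpin = (37 - Tm)/37. S_distance: for the target region coordinates, the coordinate difference δDistance between each newly designed bait sequence and the target region is calculated; δDistance is less than 150, and S_distance = (150 - δDistance)/150.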
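- A bait sequence as referred to in any one of claims 1-8.
- A kit comprising the bait sequence of claim 9, the kit including, but not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.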
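The composite scoring function in claim 8 can be sketched as follows. This sketch makes several assumptions and design choices of its own:

- The Tm inputs (off-target alignments, bait-bait dimers, best self-fold) are computed elsewhere, for example from BLAT alignments and a nearest-neighbor model as described in Example 1.
- The weights are the midpoints of the claimed ranges.
- An empty alignment list scores as perfect.
- Returning None for constraint violations is a choice of this sketch, not the claim.

The candidate names are hypothetical.

```python
from statistics import mean

# Weights a, b, c, d: midpoints of the ranges stated in claim 8 (illustrative choice).
WEIGHTS = (0.30, 0.10, 0.20, 0.40)


def s_specificity(tm_target, offtarget_tms, margin=5.0):
    """1 - mean(off-target Tm) / (Tm_target - margin); None if any hit is too stable."""
    if any(tm_target - tm < margin for tm in offtarget_tms):
        return None
    if not offtarget_tms:
        return 1.0  # assumption: no off-target alignments counts as fully specific
    return 1.0 - mean(offtarget_tms) / (tm_target - margin)


def s_dimer(dimer_tms, limit=47.0):
    """(limit - mean(dimer Tm)) / limit; None if any bait-bait dimer reaches the limit."""
    if any(tm >= limit for tm in dimer_tms):
        return None
    if not dimer_tms:
        return 1.0  # assumption: no dimer alignments counts as dimer-free
    return (limit - mean(dimer_tms)) / limit


def s_hairpin(hairpin_tm, limit=47.0):
    """(limit - Tm of the best self-folded structure) / limit; None at or above the limit."""
    if hairpin_tm >= limit:
        return None
    return (limit - hairpin_tm) / limit


def s_distance(delta_bp, window=150.0):
    """(window - |delta|) / window; None if the bait lies window bp or more away."""
    delta_bp = abs(delta_bp)
    if delta_bp >= window:
        return None
    return (window - delta_bp) / window


def composite_score(tm_target, offtarget_tms, dimer_tms, hairpin_tm, delta_bp,
                    weights=WEIGHTS, strict=False):
    """S = a*S_spec + b*S_dimer + c*S_hairpin + d*S_dist, or None if a limit is violated.

    strict=True switches to the preferred 10 deg C specificity margin and 37 deg C limits.
    """
    margin, limit = (10.0, 37.0) if strict else (5.0, 47.0)
    parts = (
        s_specificity(tm_target, offtarget_tms, margin),
        s_dimer(dimer_tms, limit),
        s_hairpin(hairpin_tm, limit),
        s_distance(delta_bp),
    )
    if any(p is None for p in parts):
        return None
    return sum(w * p for w, p in zip(weights, parts))


# Hypothetical candidates: (Tm_target, off-target Tms, dimer Tms, hairpin Tm, offset in bp).
candidates = {
    "bait_A": (70.0, [50.0, 40.0], [20.0, 30.0], 30.0, 30),
    "bait_B": (70.0, [66.0], [], 10.0, 0),  # off-target only 4 deg C below target
}
scores = {name: composite_score(*args) for name, args in candidates.items()}
best = max((name for name in scores if scores[name] is not None), key=scores.get)
print(best, scores)
```

Candidates that violate any hard limit are excluded before ranking, and the highest-scoring remaining bait is kept. Keeping the top several instead would match the claim's "one or more" wording.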
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2016403554A AU2016403554A1 (en) | 2016-04-22 | 2016-11-21 | Method for enriching target nucleic acid sequence from nucleic acid sample |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610250133.3A CN105925671B (zh) | 2016-04-22 | 2016-04-22 | 一种从核酸样品富集目标序列核酸的方法 |
CN201610250133.3 | 2016-04-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017181670A1 true WO2017181670A1 (zh) | 2017-10-26 |
Family
ID=56839769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/106595 WO2017181670A1 (zh) | 2016-04-22 | 2016-11-21 | 一种从核酸样品富集目标序列核酸的方法 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN105925671B (zh) |
AU (2) | AU2016102398A4 (zh) |
WO (1) | WO2017181670A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110343756A (zh) * | 2019-06-25 | 2019-10-18 | 广西识远医学检验实验室有限公司 | 一组用于检测地中海贫血的探针及相关试剂盒和应用 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105925671B (zh) * | 2016-04-22 | 2019-07-23 | 艾吉泰康(嘉兴)生物科技有限公司 | 一种从核酸样品富集目标序列核酸的方法 |
CN106676169B (zh) * | 2016-11-15 | 2021-01-12 | 上海派森诺医学检验所有限公司 | 一种用于乳腺癌易感基因brca1和brca2突变检测的杂交捕获试剂盒及其方法 |
CN108546739A (zh) * | 2018-04-20 | 2018-09-18 | 曹顺 | 一种用于ngs测序的核酸目标序列富集的方法 |
CN111723261B (zh) * | 2019-03-22 | 2021-08-13 | 昆明逆火科技股份有限公司 | 基于搜索引擎的dna比对算法 |
AU2021241674A1 (en) * | 2020-03-26 | 2022-09-01 | Rachael CUNNINGHAM | Hybridization capture methods and compositions |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103602658A (zh) * | 2013-10-15 | 2014-02-26 | 东南大学 | 一种新型靶向核酸分子的捕获与富集技术 |
CN105925671A (zh) * | 2016-04-22 | 2016-09-07 | 艾吉泰康生物科技(北京)有限公司 | 一种从核酸样品富集目标序列核酸的方法 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002307654A1 (en) * | 2002-05-01 | 2003-11-17 | Seegene, Inc. | Methods and compositions for improving specificity of pcr amplication |
US8192937B2 (en) * | 2004-04-07 | 2012-06-05 | Exiqon A/S | Methods for quantification of microRNAs and small interfering RNAs |
2016
- 2016-04-22 CN CN201610250133.3A patent/CN105925671B/zh active Active
- 2016-11-21 WO PCT/CN2016/106595 patent/WO2017181670A1/zh active Application Filing
- 2016-11-21 AU AU2016102398A patent/AU2016102398A4/en active Active
- 2016-11-21 AU AU2016403554A patent/AU2016403554A1/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110343756A (zh) * | 2019-06-25 | 2019-10-18 | 广西识远医学检验实验室有限公司 | 一组用于检测地中海贫血的探针及相关试剂盒和应用 |
CN110343756B (zh) * | 2019-06-25 | 2023-02-24 | 广西识远医学检验实验室有限公司 | 一组用于检测地中海贫血的探针及相关试剂盒和应用 |
Also Published As
Publication number | Publication date |
---|---|
AU2016403554A1 (en) | 2018-12-13 |
CN105925671B (zh) | 2019-07-23 |
CN105925671A (zh) | 2016-09-07 |
AU2016102398A4 (en) | 2019-05-02 |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16899252; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016403554; Country of ref document: AU; Date of ref document: 20161121; Kind code of ref document: A |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/02/2019) |
122 | EP: PCT application non-entry in European phase | Ref document number: 16899252; Country of ref document: EP; Kind code of ref document: A1 |