WO2016124069A1 - Method for constructing a long-fragment sequencing library - Google Patents
- Publication number
- WO2016124069A1 (PCT/CN2016/070789)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- well plate
- well
- stranded dna
- sequencing
- library
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J19/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J19/0046—Sequential or parallel reactions, e.g. for the synthesis of polypeptides or polynucleotides; Apparatus and devices for combinatorial chemistry or for making molecular arrays
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1068—Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B60/00—Apparatus specially adapted for use in combinatorial chemistry or with libraries
- C40B60/14—Apparatus specially adapted for use in combinatorial chemistry or with libraries for creating libraries
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/00277—Apparatus
- B01J2219/00279—Features relating to reactor vessels
- B01J2219/00306—Reactor vessels in a multiple arrangement
- B01J2219/00313—Reactor vessels in a multiple arrangement the reactor vessels being formed by arrays of wells in blocks
- B01J2219/00315—Microtiter plates
- B01J2219/00317—Microwell devices, i.e. having large numbers of wells
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/00583—Features relative to the processes being carried out
- B01J2219/00585—Parallel processes
- B01J2219/00587—High throughput processes
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/00583—Features relative to the processes being carried out
- B01J2219/00599—Solution-phase processes
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/00709—Type of synthesis
- B01J2219/00716—Heat activated synthesis
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/00718—Type of compounds synthesised
- B01J2219/0072—Organic compounds
- B01J2219/00722—Nucleotides
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/08—Liquid phase synthesis, i.e. wherein all library building blocks are in liquid phase or in solution during library creation; Particular methods of cleavage from the liquid support
Definitions
- The invention relates to the field of biotechnology, and in particular to a method for constructing a long-fragment sequencing library.
- Existing high-throughput parallel sequencers have read lengths ranging from a few hundred to several thousand bases, whereas the genome or genome component under study may be hundreds of millions of bases long. Bioinformatics methods are therefore needed to reassemble the short fragments produced by sequencing into the long chromosomal sequences of the organism itself.
- The quality of a genome assembly depends on many factors.
- The quality of the sequencing data, human factors in how the assembly software is used, and the characteristics of the genome itself all affect the assembly.
- Two of the more important factors are the repeat regions of the genome and its degree of heterozygosity. Because the reads of existing sequencing methods are short, they cannot span repeat regions, which causes assembly to fail. Excessive heterozygosity leads assembly software to assemble homologous chromosomes separately, so that the assembled genome deviates from the true genome.
- These factors cannot be completely eliminated by improved assembly algorithms without changing the read length of existing sequencers.
- Potential sequencing errors, as well as bias and errors introduced by amplification during library construction, can also degrade assembly performance.
- For genomic repeat regions, traditional methods generally aid assembly by increasing the span of the library, for example by constructing mate-pair jumping libraries or Fosmid libraries of different lengths to span repeat regions of different sizes.
- Alternatively, a hybrid sequencing approach can be used: long reads from a PacBio sequencer generate long scaffold sequences, and short reads from an Illumina sequencer are then used for error correction, giving better assembly results.
- For heterozygosity, haplotype assembly (phasing) is used:
- haplotype information is separated from the polyploid information by experimental means, so that each haplotype can be assembled completely.
- Haplotype information of samples has been obtained using different research methods, including: 1. Whole-genome sequencing of the sample and of the sample's parents. 2. Haplotype sequencing using the Fosmid sequencing method. 3. Separating the chromosomes at metaphase of cell division using micromanipulation techniques and sequencing them. 4. Haplotype sequencing by random proximity ligation.
- These haplotype phasing methods have certain limitations: 1. Sequencing the parents together with the progeny and phasing by genotype requires samples from both the father and the mother, which greatly limits the range of use, and the method cannot detect de novo mutations.
- 2. The Fosmid sequencing method requires at least one week of library preparation, including a large number of library construction experiments, so it needs microgram quantities of sample as starting material and cannot be applied to small clinical samples.
- 3. The chromosome separation method requires complex, specialized micromanipulation equipment and a very highly skilled operator.
- 4. The variants detectable by the proximity ligation method are limited: only about 80% of SNVs can be detected, which does not meet the needs of clinical analysis. Therefore, to meet the needs of personalized medicine, a haplotype sequencing technology with high accuracy, high coverage, low cost, low starting amount and relatively simple experimental conditions is urgently needed.
- Researchers have developed methods that assemble short reads into long read fragments and use these for haplotype assembly.
- Representative technologies include Complete Genomics' Long Fragment Read technology (LFR technology, the same below) and Illumina's TruSeq Synthetic Long-Read product.
- Experimentally, both resemble the Fosmid haplotype phasing method: DNA is randomly distributed into different physical partitions so that homologous chromosomes of different parental origin are separated.
- Compared with Fosmid sequencing, the library construction time of both is significantly shorter, a large number of manipulations is not needed, and the haplotype assembly can reach a haplotype N50 of about 500 kb.
- LFR technology needs only about 100 pg of DNA, that is, the genomes of 10-20 cells, as starting material for library construction. It can cover more than 92% of SNV sites with an accuracy of up to 99.99999%, which is 10 times more accurate than whole-genome sequencing based on ligation sequencing.
- the method provided by the invention comprises the following steps:
- the method for preparing a 5184-well plate containing single-stranded DNA is as follows A) or B):
- the sample to be tested is dispensed into each well of a 5184-well plate, and then lysed and denatured to obtain a 5184-well plate containing single-stranded DNA;
- the fragment length of the single-stranded DNA is not less than 100 Kb;
- the fragmented product has a length of 200-1500 bp;
- the total amount of single-stranded DNA in the 5184-well plate containing single-stranded DNA is such that the probability of homologous chromosome fragments from the same genomic position of both parents (two overlapping fragments) occurring in the same well of the 5184-well plate is less than 1%;
- the total amount of single-stranded DNA in the 5184-well plate containing single-stranded DNA is specifically the amount of DNA of 10 to 500 cells.
- the volume of each well of the 5184-well plate is 190 to 350 nanoliters, specifically 200 to 350 nanoliters.
- the reagent used for the denaturation or for the lysis and denaturation is an alkali denaturation reagent;
- the denaturation reaction conditions are incubation at 25 °C for 2 minutes;
- the lysis and denaturation reaction conditions are incubation at 85 °C for 2 minutes;
- the method for extracting genomic DNA of the sample to be tested uses a dialysis method or an alkali lysis method or an agarose embedding method.
- the whole genome amplification reaction system contains a random primer consisting of 8 bases.
- in step 1), the method further comprises the step of dispensing random primers consisting of 8 bases into the 5184-well plate containing single-stranded DNA.
- the whole genome amplification reaction system is a multiple strand displacement polymerase amplification reaction system
- the whole genome amplification reaction is incubated at 30 °C for 1 hour and then at 65 °C for 5 minutes;
- alternatively, the whole genome amplification reaction is incubated at 37 °C for 45 minutes and then at 65 °C for 5 minutes.
- the fragmentation reaction system is a transposase reaction system
- the transposase reaction system comprises a transposase reaction buffer and a transposase carrying the embedded linker;
- the fragmentation reaction conditions are incubation at 55 °C for 10 minutes;
- the linker is linker 1 and/or linker 2.
- linker 1 and linker 2 are linkers of different sequences.
- the method of adding different tag sequences comprises the steps of: dispensing 72 kinds of 3'-end tag primers, a polymerase chain reaction system and 72 kinds of 5'-end tag primers into the wells of the 5184-well plate containing the fragmented product, and carrying out the polymerase chain reaction to obtain an original sequencing library;
- the 3'-end tag primer consists, in the 5' to 3' direction, of sequencing linker A, an 8-base tag sequence, and a single-stranded DNA molecule complementary to linker 1;
- the 5'-end tag primer consists, in the 5' to 3' direction, of sequencing linker B, an 8-base tag sequence, and a single-stranded DNA molecule complementary to linker 2;
- the 8-base tag sequences of the 72 3'-end tag primers and the 72 5'-end tag primers are all different;
- the polymerase chain reaction system contains primer 1 and primer 2;
- the primer 1 is identical or complementary to the sequencing linker A;
- the primer 2 is complementary or identical to the sequencing linker B.
- the sequencing linker A and the sequencing linker B are linkers of different sequences.
- the 72 kinds of 3'-end tag primers are dispensed as follows: one 3'-end tag primer is dispensed into each of the 72 columns of the 5184-well plate containing the fragmentation product;
- the 72 kinds of 5'-end tag primers are dispensed as follows: one 5'-end tag primer is dispensed into each of the 72 rows of the 5184-well plate containing the fragmentation product;
- the polymerase chain reaction system is dispensed into each well of the 5184-well plate containing the fragmentation product.
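The row and column dispensing described above gives every well its own pair of tags, since 72 rows times 72 columns equals 5184 wells. The following is a minimal sketch of that addressing scheme. The tag labels (`5p_tag_01`, `3p_tag_72`, and so on) and the function name are hypothetical placeholders, since the text does not list the actual 8-base tag sequences.

```python
from itertools import product

N_TAGS = 72  # 72 3'-end tag primers (columns) and 72 5'-end tag primers (rows)


def well_tag_pairs(n_tags: int = N_TAGS) -> dict:
    """Map each (row, column) well to a (5'-tag, 3'-tag) label pair.

    The labels are placeholders, not sequences from the patent.
    """
    pairs = {}
    for row, col in product(range(n_tags), repeat=2):
        # The row index selects the 5'-end tag primer,
        # the column index selects the 3'-end tag primer.
        pairs[(row, col)] = (f"5p_tag_{row + 1:02d}", f"3p_tag_{col + 1:02d}")
    return pairs


pairs = well_tag_pairs()
print(len(pairs))                # number of wells addressed
print(len(set(pairs.values())))  # number of distinct tag combinations
print(pairs[(0, 71)])            # tags of the well in the first row, last column
```

Because each combination occurs exactly once, a read carrying both tags can be traced back to a single well after pooling.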
- the annealing step of the polymerase chain reaction is at 60 °C for 30 seconds.
- the method of dispensing is: each substance to be dispensed is first dispensed into the dispensing-pattern wells of a 384-well plate, and the substance in the dispensing-pattern wells is then dispensed into the wells of the 5184-well plate using a nanoliter dispenser;
- the same amount is dispensed into each well.
- the dispensing-pattern wells of the 384-well plate follow a 4×2 arrangement, with 2 wells along the long side and 4 along the short side; the number of dispensing-pattern wells ranges from 8 to 384, and is specifically 24 or 72.
- the method further comprises the step of size-selecting the original sequencing library to obtain a fragment selection library with lengths in the range of 200-1100 bp, that is, the long-fragment sequencing library.
- the 200-1100 bp fragment selection library is a 250-550 bp fragment selection library and a 550-1000 bp fragment selection library;
- the size-selection method is magnetic bead purification or agarose electrophoresis purification;
- the sample to be tested is a eukaryotic cell or a mixture of microbial cells, and the eukaryotic cell is derived from human blood cells;
- the mixture of microbial cells is derived from feces or soil containing mixed microorganisms.
- the long fragment DNA molecule is not less than 100 Kb;
- the mixed microorganism is specifically derived from feces or soil.
- Another object of the present invention is to provide a kit for preparing a long-fragment sequencing library.
- the product provided by the invention comprises a 384-well plate, the above-mentioned 5184-well plate and a nanoliter dispenser.
- a third object of the present invention is to provide the custom-made 5184-well plate described above.
- the custom-made 5184-well plate provided by the present invention is characterized in that each well has a volume of 200-350 nanoliters.
- the denaturation reagent is an alkali denaturation reagent, and the formulation is as shown in Table 1 in the examples;
- the sample dispensing pattern of the 384-well plate is 4×2, with 2 wells along the long side and 4 along the short side; the number of dispensing wells ranges from 8 to 384, preferably 24 or 72.
- the embedded linker is linker 1 and/or linker 2; it can be designed as needed.
- linker 1 and linker 2 of this embodiment are designed for use with the Illumina sequencer:
- the designed linker 1 and linker 2 are each composed of two strands, one long strand and one short strand.
- the 19 bp sequence at the 3' end of the long strand is fixed as AGATGTGTATAAGAGACAG (5' to 3' direction).
- the 5' end of the long chain can be replaced with a different sequence depending on the sequencing platform.
- the long chain of the linker 1 was changed to CCTCTCTATGGGCAGTCGGTGATAGATGTGTATAAGAGACAG (5' to 3' direction); the long chain of the linker 2 was changed to: CCATCTCATCCCTGCGTGTCTCCGACTCAGAGATGTGTATAAGAGACAG (5' to 3' direction).
- the short strand is a 19 bp oligonucleotide with the sequence CTGTCTCTTATACACATCT (5' to 3' direction), which is the reverse complement of the 19 bp sequence at the 3' end of the long strand.
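The relationship between the short strand and the long strands can be checked directly from the sequences quoted above. This is a small illustrative sketch: the helper `reverse_complement` and the constant names are chosen here for the example, and only the four sequences themselves come from the text.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences quoted in the text, all written 5' to 3'.
FIXED_3P_END = "AGATGTGTATAAGAGACAG"
SHORT_STRAND = "CTGTCTCTTATACACATCT"
LINKER1_LONG = "CCTCTCTATGGGCAGTCGGTGATAGATGTGTATAAGAGACAG"
LINKER2_LONG = "CCATCTCATCCCTGCGTGTCTCCGACTCAGAGATGTGTATAAGAGACAG"

# The short strand should pair with the fixed 3' end of each long strand.
short_matches = reverse_complement(FIXED_3P_END) == SHORT_STRAND
long_strands_share_end = all(
    s.endswith(FIXED_3P_END) for s in (LINKER1_LONG, LINKER2_LONG)
)
print(short_matches, long_strands_share_end)
```

Both long strands end in the same fixed 19-base sequence, so a single short strand can anneal to either linker.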
- Figure 1 is a schematic diagram of the principle of the long-fragment sequencing technology.
- Figure 2 is a flow chart of the method for constructing a long-fragment sequencing library.
- Figure 3 compares the time required for library construction by this method with the time required for Illumina library construction.
- Figure 4 compares the time required for library construction by this method with the time required by mainstream haplotype library construction methods.
- Figure 5 shows the electrophoresis results of the high-molecular-weight DNA extraction.
- Figure 6 plots, for a given number of partitions, the probability of overlap between paternal and maternal fragments against the number of cells.
- Figure 7 is a schematic view of the filling position of the 384-well liquid storage source plate.
- Figure 8 is a schematic view of the filling position of the 384-well liquid storage source plate.
- Figure 9 is a graph showing the Agilent 2100 Bioanalyzer results for the library without fragment selection.
- Figure 10 is a graph showing the results of Agilent 2100 bioanalyzer detection of a fragment selection library (250-550 bp).
- Figure 11 is a graph showing the results of Agilent 2100 bioanalyzer detection of a fragment selection library (550-1000 bp).
- Figure 12 is a schematic diagram of the 100 kb long-fragment assembly result.
- the single-stranded DNA to be sequenced is equally divided into each well of a 5184-well plate.
- The Agilent RecoverEase Dialysis DNA Extraction Kit was used to extract genomic DNA from white blood cells in human blood (cultured cells can also be used, as can tissue samples after homogenization). Pulsed-field gel electrophoresis was used to check the fragment length distribution under the following conditions: 6 V/cm, switch time 50-90 seconds, run time 20 hours. The results are shown in the third lane of Fig. 5: most of the extracted fragments are longer than 100 kb, and the fragments span 50 kb to 800 kb.
- The human genome is used as an example.
- The human genome contains 6 Gb of haplotype information, and 150 pg, that is, the DNA mass of about 20 human cells, is taken as the final starting amount of the reaction. This amount ensures that, across all physical partitions, the probability of homologous chromosome fragments from the same genomic location of both parents (two overlapping fragments) falling into the same partition is less than 1%.
- The amount of DNA differs with the source (species) of the sample to be tested, ranging from the DNA of 10 to 500 cells.
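The link between cell number and DNA mass can be sketched as a simple unit conversion. The conversion factor used here, about 978 Mbp per picogram of double-stranded DNA, is a commonly used approximation and is an assumption of this sketch, not a value given in the text. The function names are likewise illustrative.

```python
# Assumed conversion (not from the patent): 1 pg of double-stranded DNA
# corresponds to roughly 0.978e9 base pairs.
BP_PER_PG = 0.978e9


def dna_mass_pg(n_cells: float, genome_bp: float = 6e9) -> float:
    """Approximate DNA mass in pg for n_cells cells with genome_bp base pairs each."""
    return n_cells * genome_bp / BP_PER_PG


def cells_from_mass(mass_pg: float, genome_bp: float = 6e9) -> float:
    """Approximate number of cells whose DNA amounts to mass_pg picograms."""
    return mass_pg * BP_PER_PG / genome_bp


for n in (10, 20, 500):
    print(n, round(dna_mass_pg(n), 1))
print(round(cells_from_mass(150.0), 1))
```

Under this assumed factor a 6 Gb human cell carries a little over 6 pg of DNA, so 150 pg corresponds to roughly 20 to 25 cells, the same order of magnitude as the figure used in the text.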
- a probability formula for homologous chromosome fragments from the same genomic location of the male and female parents in the same well can be derived:
- X is the sequencing depth of a single base site
- the probability that long fragments in a well are located at the same position in the genome and come from the male parent and the female parent respectively is P × 50%.
- the probability that homologous chromosome fragments from the same genomic location of the male and female parents appear in the same well is only related to the number of cells input and the number of physical partitions, regardless of the length of the input fragment and the size of the genome.
- a distribution map of cell numbers and probabilities can be generated (Figure 6). From this map, the number of cells to be input for a given probability can be derived.
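The dependence on cell number and partition count alone can be illustrated with a deliberately simplified model. This is not the formula of the patent, which is not reproduced in this text. It is an assumption-laden sketch in which each of the `n_cells` copies of the other parent's fragment lands in a uniformly random well, independently of the others.

```python
def colocation_probability(n_cells: int, n_partitions: int = 5184) -> float:
    """Simplified model (an assumption of this sketch, not the patent's formula).

    For a fragment sitting in a given well, return the probability that at
    least one of the n_cells homologous copies from the other parent lands in
    the same well, with every copy placed uniformly at random.
    """
    return 1.0 - (1.0 - 1.0 / n_partitions) ** n_cells


# Tabulate the modelled probability for a few input cell numbers.
for n in (10, 20, 50):
    print(n, f"{colocation_probability(n):.4%}")
```

As in the text, the result of this model depends only on the number of input cells and the number of partitions, not on fragment length or genome size. For 20 cells in 5184 wells it stays below 1%.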
- the amount of DNA added to the 5184-well plate is the DNA of 10-500 cells.
- the DNA concentration was adjusted to 664.56 pg/μL, and 9.57 μL of the DNA sample was drawn up with a cut pipette tip and placed in a 1.5 mL centrifuge tube. Then 20 μL of the alkaline denaturing reagent stock solution (formulation in Table 1) was diluted 10-fold to 200 μL, and 172.3 μL of the diluted solution was added to the 1.5 mL tube and mixed by gently flicking the tube wall. After standing at room temperature for two minutes (denaturation), the reaction was transferred to ice. The treatment time must not exceed 5 minutes, otherwise the DNA sample will be damaged. The purpose of this step is to break the hydrogen bonds between the two strands of the DNA molecule so that it becomes single-stranded DNA.
- Table 1 gives the alkaline denaturing reagent formulation
- step 2: using a wide-bore pipette tip, dispense the sample processed in step 1 into 24 wells of a 384-well plate (the positions are marked in blue in Figure 7).
- the 24 wells are the first 12 wells of columns 1 and 2; the dispensing pattern follows a 4*2 layout with a long side of 2 and a short side of 4, and the number of dispensing wells can range from 8 to 384, preferably, but not limited to, 24.
- the dispensing pattern here is 3 precisely arranged groups of 4*2 wells, with 22 microliters per well (18.41 picograms of DNA per well).
- the customized 5184-well plate is a 5184-well plate that fits the dispenser, with a volume of 200 nl per well.
- the per-well volume of the customized 5184-well plate ranges from 200 nl to 350 nl; an ordinary standard 5184-well plate can also be used, depending on the starting amount of DNA from different sources.
- the 5184-well plate containing single-stranded DNA obtained above was sealed with sealing film, placed in an Eppendorf 5810 plate centrifuge, and centrifuged at 3220 × g for 5 minutes. It was then kept at room temperature until use (centrifugation brings the solution to the bottom of the wells).
- Table 2 shows the multiple strand displacement polymerase amplification buffer solution
- the container was sealed with sealing film, placed in an Eppendorf 5810 plate centrifuge, and centrifuged at 3220 × g for 5 minutes. It was then placed in a matching incubator and incubated at 30 °C for 1 hour (alternatively at 37 °C for 45 minutes), then at 65 °C for 5 minutes. Finally it was cooled to room temperature, giving a 5184-well plate containing the whole genome amplification product.
- the whole genome amplification product obtained in section II is fragmented to a size of 200-1500 bp using a transposase loaded with linker 1 and linker 2;
- the transposase reaction system loaded with linker 1 and linker 2 was prepared according to Table 3, mixed by inverting twenty times, and briefly centrifuged; the reaction system was then dispensed into 24 wells of a 384-well plate at the positions shown in Fig. 7, 22 μl per well.
- the bases in the gray background are bonded to each other by hydrogen bonding, ensuring that the two sequences form a specific linker.
- Table 3 shows the transposase reaction solution for fragmentation
- using the WaferGen Multisample Nanodispenser, the sample in the 24 wells of the 384-well plate prepared in step 1 was dispensed in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate containing the whole genome amplification product obtained in section II.
- the container was sealed with sealing film, placed in an Eppendorf 5810 plate centrifuge, and centrifuged at 3220 × g for 5 minutes. It was then placed in a matching incubator, incubated at 55 °C for 10 minutes, and cooled to room temperature for later use.
- the transposase neutralization solution was prepared according to Table 4, mixed by shaking, briefly centrifuged, and loaded into 24 wells of a 384-well plate at the positions shown in Figure 7, 14 μl per well. The sample in the 24 wells of the 384-well plate was then dispensed in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate obtained in step 3, giving a fragmented product of 200-1500 bp, that is, a 5184-well plate containing the fragmented product.
- the fragmented product was subjected to capillary electrophoresis by Agilent Bioanalyzer 2100 to determine the fragment size distribution. As shown in Fig. 10 and Fig. 11, it can be seen that the size was 200-1500 bp.
- the tag sequence is used to distinguish between different samples
- the 3'-end tag primer consists, in the 5' to 3' direction, of sequencing linker A, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 1; the 8-base random fragments of the 72 3'-end tag primers all differ from one another;
- the underlined part is sequencing linker A, the italic part is the 8-base random fragment, and the bold part is the single-stranded DNA molecule complementary to linker 1.
- the sample in the 384-well plate obtained in step 1 was dispensed in equal amounts, with the "LFR 35nl 72Sample dispensing.seq" program, into every well of the 72 columns of the 5184-well plate containing the fragmented product obtained in section III.
- the container was sealed with a sealing film, and the container was placed in an Eppendorf 5810-plate centrifuge, 3220 x g, and centrifuged for 5 minutes. Allow to stand at room temperature for 10 minutes, and set aside.
- the polymerase chain reaction buffer was configured according to Table 6.
- the sequences of Primer 1 and 2 in Table 6 are shown in Table 7.
- Primer 1 is identical or complementary to sequencing linker A;
- Primer 2 is complementary or identical to sequencing linker B.
- the 5'-end tag primer consists, in the 5' to 3' direction, of sequencing linker B, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 2;
- the 8-base random fragments of the 72 3'-end tag primers and the 72 5'-end tag primers all differ from one another;
- the underlined part is sequencing linker B, the italic part is the 8-base random fragment, and the bold part is the single-stranded DNA molecule complementary to linker 2.
- the 5184-well plate obtained in step 6 was subjected to polymerase chain amplification according to the PCR program shown in Table 9.
- the sequencing library (original) was run on a 2% agarose gel and the products were recovered to size-select the original library, giving fragment selection library 1 (250-550 bp) and fragment selection library 2 (550-1000 bp).
- the sequencing library (original), the fragment selection library 1, and the fragment selection library 2 were detected using an Agilent Bioanalyzer, and the distribution of the library fragments was determined.
- Figure 9 is the original library results, it can be seen that the fragment is 200-1500 bp;
- Figure 10 is the result of fragment selection library 1, it can be seen that the fragment is 250-550 bp;
- Figure 11 shows the results of the fragment selection library 2, and it can be seen that the fragment is 550-1000 bp.
- real-time quantitative PCR was used to measure the effective molecule concentration in the library (qPCR can be performed with the Kapa Biosystems KK4824 KAPA Library Quantification Kit for the Illumina platform; the kit contains all reagents and primers required for the assay, and the qPCR program is shown in Table 10). The results are shown in Table 11.
- after the library passed the check (the acceptance criterion is a qPCR concentration greater than 5 nmol/L), it was sequenced on an Illumina HiSeq 2000 or HiSeq 2500.
- the long segment length distribution histogram is shown in Fig. 12.
- the fixed microbial cells were briefly vortexed to mix, the cell concentration was counted with a hemocytometer, and the suspension was diluted with sterile water to 30 cells/μl for later use, giving the diluted microbial cells.
- the container is sealed with a sealing film.
- the container is placed in an Eppendorf 5810 plate centrifuge and centrifuged at 4000 rpm for 5 minutes. The sealing film is then carefully removed and the container is heated in a metal bath to 85 °C and held for 15 minutes so that all the water in the container evaporates; the container is then taken out and left to cool naturally to room temperature;
- step 5: using a WaferGen Multisample Nanodispenser, the sample in the 384-well plate prepared in step 4) was added to the container described in step 3) with the "LFR 35nl Single Sample dispensing.seq" program.
- the container was sealed with sealing film and centrifuged in an Eppendorf 5810 plate centrifuge at 4000 rpm for 5 minutes. It was then placed in a matching incubator and incubated at 85 °C for 2 minutes, centrifuged in the Eppendorf 5810 plate centrifuge at 4000 rpm for 2 minutes, and incubated again at 85 °C for 2 minutes.
- the sequencing library (original) was obtained in the same manner as the tagging section (section IV) of Example 1.
- size selection gave a 500-600 bp fragment selection library.
- the effective molecule concentration in the library was measured by real-time quantitative PCR. After the library passed the check, it was sequenced on an Illumina HiSeq 2000 or HiSeq 2500.
- the method uses a novel high-throughput micro-pipetting platform for the upstream library construction experiments.
- the platform can pipette 35 nanoliters, which keeps the reaction volume of the multiple strand displacement reaction below 100 nanoliters. There is evidence that reducing the reaction volume significantly reduces the amplification bias of the multiple strand displacement reaction.
- the method increases the number of physical partitions to 5184 wells. With the starting amount of DNA unchanged, each physical partition of the container contains only about one tenth of the DNA of the original 384-partition product, about 1% of the genome, which reduces the difficulty of de novo assembly within each partition.
- the method optimizes and integrates multiple strand displacement amplification, transposase fragmentation, and a nanoliter-scale micro-pipetting platform, establishing a haplotype library construction workflow that can be used with low sample input.
- Figure 2 is a flow chart of the long segment sequencing library construction method.
- the method is relatively simple to operate, and the reaction steps are few.
- the library construction time is only 10 hours, and the manual operation part is only 3.5 hours.
- the traditional illumina sequencing library construction takes 2 days, and the manual operation takes about 5 hours.
- the Fosmid-based haplotype library technique takes 8 days to complete library construction, while Illumina's commercial TruSeq synthetic long reads kit takes three days, including 6-8 hours of manual operation.
- compared with these library construction approaches, this method has an advantage in turnaround time.
- Figure 3 compares the library construction time of this method with that of a conventional Illumina library.
- Figure 4 compares the library construction time of this method with that of mainstream haplotype libraries.
- this method requires only one nano-scale micro-pipetting platform, and all the experimental procedures can be completed without the assistance of other instruments or automated pipetting devices.
- by comparison, haplotype construction by metaphase chromosome separation requires a micromanipulation device,
- and the Complete Genomics LFR technology requires three large automated pipetting devices; this method is therefore more practical.
- this method requires little DNA, as low as 150 picograms (equivalent to the amount of DNA contained in 25 human cells).
- the traditional Fosmid-based haplotype library technique requires 8 micrograms of DNA for library construction, and the Illumina TruSeq synthetic long reads kit requires 500 ng of DNA as input. The lower input greatly broadens the application scenarios of the method, which can be used for haplotype sequencing of circulating tumor cells, circulating fetal cells, or embryonic cells.
- the method adopts a mild extraction method, and can separate longer DNA fragments, not less than 100 kb.
- the advantage of long fragments is that they better resolve the assembly of repetitive regions in the genome and also improve assembly in heterozygous regions.
- the method can therefore be applied to the construction of genomic de novo assembly libraries of animals, plants and microorganisms.
- when a cell suspension of a suitable concentration is used as the starting material, the method can achieve single-cell separation. Combined with suitable cell observation and lysis methods, it enables high-throughput construction of single-cell sequencing libraries; in theory about 1000 single-cell sequencing libraries can be constructed in parallel per run. The method can therefore be applied to the study of microbial communities in complex environments and give a deeper understanding of their composition. More importantly, it can recover genomic information from bacteria that cannot be isolated and cultured, which is of great significance for metagenomic research. As an extension, the method can also be applied to single-cell omics research as an upstream, high-throughput, automated method for constructing single-cell amplification libraries.
- the Fosmid method, the TruSeq synthetic long reads kit, and the Complete Genomics LFR technology all perform haplotype assembly on the principle of dilution into partitions: physical partitioning lowers the probability that homologous chromosome fragments occur in the same partition, so that heterozygous SNPs can be phased. The number of partitions generally does not exceed 384. It can be increased by repeating the experiment on the same sample, but the cost of library construction and sequencing then multiplies.
- by using a nanoliter-scale micro-pipetting device and a customized 5184-well container, the method of the present invention effectively increases the number of physical partitions and further reduces the probability that homologous chromosome fragments occur in the same partition, thereby achieving a better haplotype assembly.
- the customized container has a larger per-well volume than the original container and can hold 200 nl to 350 nl of reagent (the original volume is 100 nl).
- the customized container can accommodate 5 reagent additions, so the entire library construction reaction can be completed in the customized 5184-well container.
- the operator can also subdivide the 5184-well container so that several samples are processed in one container at the same time, for example in a 2 × 2592 or 3 × 1728 layout, which increases throughput and reduces cost.
- this method can also introduce virtual partitions by adding a further dimension of labels, which increases the number of partitions again. For example, two combinatorial labels introduced during transposase fragmentation mark the position in the first, physical level of partitioning, and two more combinatorial labels introduced during polymerase chain amplification mark the position in the second, virtual level, so that the effect of 5184 × 5184 partitions is ultimately achieved (this is not a limit; 12 × 5184, 24 × 5184, or more are also possible).
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Microbiology (AREA)
- Biomedical Technology (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Immunology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Plant Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
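The present invention provides a method for constructing a long-fragment sequencing library. The method comprises the following steps: 1) preparing a 5184-well plate containing single-stranded DNA; 2) dispensing the single-stranded DNA in equal amounts into each well of the 5184-well plate; 3) dispensing a whole genome amplification reaction system into each well of the 5184-well plate treated in step 2) and carrying out the reaction; 4) dispensing a fragmentation reaction solution into each well of the 5184-well plate treated in step 3) and carrying out the reaction; 5) adding different tag sequences to the nucleic acid molecules in each well of the 5184-well plate treated in step 4) to obtain the sequencing library.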
Description
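The present invention relates to the field of biotechnology, and in particular to a method for constructing a long-fragment sequencing library.
New-generation high-throughput parallel sequencers have revolutionized genomics. The cost and time of sequencing have dropped sharply, and new applications such as metagenomic sequencing, structural variation analysis, and gene expression analysis have been developed. Nevertheless, the assembly of genomes of other species is still very far from routinely reaching the standard set by the Human Genome Project. Limited by sequencing principles and technical bottlenecks, the read lengths of existing high-throughput parallel sequencers range from a few hundred to a few thousand bases, whereas the genomes or genomic elements under study may contain up to hundreds of millions of bases. Bioinformaticians must therefore restore this fragmented information to the long chromosomal sequence of the organism, that is, assemble the short reads produced by sequencing.
In practice, the quality of genome assembly is constrained by many factors. Besides human factors such as the sequencing strategy, the quality of the sequencing data, and the assembly software used, characteristics of the genome itself also affect assembly. The two most important are repetitive regions and the degree of heterozygosity. Because current read lengths are short, reads cannot span repetitive regions, and assembly fails there. When heterozygosity is too high, the assembler assembles the homologous chromosomes separately, so the assembled genome deviates from the true genome. Without changing the read length of existing sequencers, these factors cannot be fully eliminated by improving assembly algorithms. In addition, potential sequencing errors, together with amplification bias and errors introduced during library construction, all degrade assembly.
More and more researchers therefore try to improve genome assembly through experimental design. For repetitive regions, the traditional approach is to increase the library jump length, for example by constructing mate-pair jump libraries or Fosmid libraries of different lengths to span repeats of different sizes. Hybrid sequencing is another option: long PacBio reads are used to build long scaffolds, and short Illumina reads are then used for error correction, which gives a good assembly.
Assembly problems caused by high heterozygosity can be addressed by haplotype phasing, in which haploid information is separated experimentally from polyploid information so that each haplotype can be assembled completely. Several studies have obtained haplotype information by different means, including: 1. whole-genome sequencing of the sample and of its parents; 2. haplotype sequencing by Fosmid sequencing; 3. separating chromosomes by micromanipulation at metaphase of cell division and sequencing them; 4. haplotype sequencing by proximity random ligation.
However, each of these haplotype phasing methods has limitations: 1. Sequencing parent and offspring samples together and phasing by genotype requires samples from both the male and the female parent, which greatly restricts its use, and the method cannot detect de novo mutations. 2. Fosmid sequencing requires at least one week of library preparation and a large number of library construction experiments, so it needs microgram quantities of starting material and cannot be applied to small clinical samples. 3. Chromosome separation requires complex, specialized micromanipulation equipment and a very high level of operator skill. 4. Limited by its principle, proximity ligation detects a limited set of variants, only about 80% of SNVs, which does not meet the needs of clinical analysis. To meet the needs of personalized medicine, a haplotype sequencing technology with high accuracy, high coverage, low cost, low input, and relatively simple experimental conditions is therefore urgently needed.
In response, researchers developed methods that assemble short reads into long read fragments for haplotype assembly. Representative technologies are the Long Fragment Read technology of Complete Genomics (LFR technology for short, hereinafter the same) and the TruSeq synthetic long read product of Illumina. In principle both resemble Fosmid haplotyping: DNA is randomly distributed into separate physical partitions so that homologous chromosomes of different origin are separated. Unlike Fosmid sequencing, both shorten library construction considerably, require far less hands-on work, and reach a haplotype N50 of about 500 kb. Notably, LFR technology needs only about 100 pg of DNA, the genomes of 10-20 cells, to build a library, covers more than 92% of SNV sites, and reaches an accuracy of 99.99999%, a 10-fold improvement over whole-genome sequencing based on the same sequencing-by-ligation chemistry.
Current short-read assembly techniques nevertheless have shortcomings. First, most of them use a 384-well plate as the physical partition, so in theory each partition contains 10%-20% of the genome, which is still a high genomic load per partition and makes assembly difficult. Second, Illumina's TruSeq synthetic long read product shears DNA into 8-10 kb fragments at the start, so it performs poorly on genomes with large repetitive regions. When the LFR technology of Complete Genomics is applied to low-input samples, it requires extensive amplification by multiple displacement amplification (MDA), which introduces amplification bias.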
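Disclosure of the Invention
One object of the present invention is to provide a method for constructing a long-fragment sequencing library.
The method provided by the present invention comprises the following steps:
1) preparing a 5184-well plate containing single-stranded DNA;
the 5184-well plate containing single-stranded DNA is prepared by A) or B) below:
A) extracting genomic DNA from the sample to be tested, denaturing it to obtain single-stranded DNA, and dispensing the single-stranded DNA molecules into each well of a 5184-well plate to obtain a 5184-well plate containing single-stranded DNA;
B) dispensing the sample to be tested into each well of a 5184-well plate, then lysing and denaturing it to obtain a 5184-well plate containing single-stranded DNA;
the fragment length of the single-stranded DNA is not less than 100 kb;
2) dispensing a whole genome amplification reaction system into each well of the 5184-well plate containing single-stranded DNA and carrying out whole genome amplification to obtain a 5184-well plate containing whole genome amplification product;
3) dispensing a fragmentation reaction system into each well of the 5184-well plate containing whole genome amplification product and carrying out fragmentation to obtain a fragmented product; the plate containing the fragmented product is the 5184-well plate containing fragmented product;
the length of the fragmented product is 200-1500 bp;
4) adding different tag sequences to the fragmented product in each well of the 5184-well plate containing fragmented product to obtain the sequencing library.
In the above method, the total amount of single-stranded DNA in the 5184-well plate containing single-stranded DNA is such that the probability that homologous chromosome fragments from the same genomic position of both parents (two fragments that overlap each other) occur in any one well of the plate is less than 1%;
the total amount of single-stranded DNA in the 5184-well plate containing single-stranded DNA is specifically the DNA of 10-500 cells.
In the above method, the volume of each well of the 5184-well plate is 190 nl to 350 nl, specifically 200 nl to 350 nl.
In the above method, in step 1), the reagent used for the denaturation or for the lysis and denaturation is an alkaline denaturing reagent;
the denaturation is carried out by incubating at 25 °C for 2 minutes;
the lysis and denaturation is carried out by incubating at 85 °C for 2 minutes;
the genomic DNA of the sample to be tested is extracted by dialysis, alkaline lysis, or agarose embedding.
In the above method, in step 2), the whole genome amplification reaction system contains random primers of 8 bases.
In the above method, between step 1) and step 2) there is a further step: dispensing random primers of 8 bases into the 5184-well plate containing single-stranded DNA.
In the above method, in step 2), the whole genome amplification reaction system is a multiple strand displacement polymerase amplification reaction system;
the whole genome amplification reaction is incubation at 30 °C for 1 hour followed by 65 °C for 5 minutes;
or the whole genome amplification reaction is incubation at 37 °C for 45 minutes followed by 65 °C for 5 minutes.
In the above method, in step 3), the fragmentation reaction system is a transposase reaction system;
the transposase reaction system comprises a transposase reaction buffer and a transposase loaded with linkers;
the fragmentation reaction is carried out by incubating at 55 °C for 10 minutes;
the linker is linker 1 and/or linker 2.
Linker 1 and linker 2 are linkers with different sequences.
In the above method, in step 4), adding different tag sequences comprises the following step: dispensing 72 3'-end tag primers, a polymerase chain reaction system, and 72 5'-end tag primers into the wells of the 5184-well plate containing fragmented product and carrying out a polymerase chain reaction to obtain the original sequencing library;
the 3'-end tag primer consists, in the 5' to 3' direction, of sequencing linker A, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 1;
the 5'-end tag primer consists, in the 5' to 3' direction, of sequencing linker B, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 2;
the 8-base random fragments of the 72 3'-end tag primers and the 72 5'-end tag primers all differ from one another;
the polymerase chain reaction system contains primer 1 and primer 2;
primer 1 is identical or complementary to sequencing linker A;
primer 2 is complementary or identical to sequencing linker B.
Sequencing linker A and sequencing linker B are linkers with different sequences.
In the above method, the 72 3'-end tag primers are dispensed as follows: the 72 3'-end tag primers are dispensed into every well of the 72 columns of the 5184-well plate containing fragmented product;
the 72 5'-end tag primers are dispensed as follows: the 72 5'-end tag primers are dispensed into every well of the 72 rows of the 5184-well plate containing fragmented product;
the polymerase chain reaction system is dispensed as follows: the polymerase chain reaction system is dispensed into every well of the 5184-well plate containing fragmented product.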
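Dispensing by column and by row gives every well its own pair of tags, since 72 columns × 72 rows = 5184 wells. The bookkeeping can be sketched in Python as follows. The reading "one 3'-end tag per column, one 5'-end tag per row", the function name, the zero-based indices, and the row-major well numbering are all illustrative assumptions rather than details stated in the text.

```python
from itertools import product

N_TAGS = 72  # 72 3'-end tag primers (columns) and 72 5'-end tag primers (rows)


def well_to_tag_pair(well_index: int, n_tags: int = N_TAGS) -> tuple[int, int]:
    """Map a well index (0..5183, row-major; an assumed numbering) to
    (3'-end tag index = column, 5'-end tag index = row)."""
    if not 0 <= well_index < n_tags * n_tags:
        raise ValueError("well index out of range")
    row, column = divmod(well_index, n_tags)
    return column, row


# Collect the tag pair of every well and compare with all possible pairs.
tag_pairs = {well_to_tag_pair(i) for i in range(N_TAGS * N_TAGS)}
all_pairs = set(product(range(N_TAGS), repeat=2))
```

Because the two sets coincide, a read carrying both tags can be traced back to exactly one well under these assumptions.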
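In the above method, the annealing step of the polymerase chain reaction is 60 °C for 30 seconds.
In the above method, all dispensing is carried out as follows: each material to be dispensed is first dispensed into the dispensing-pattern wells of a 384-well plate, and the material in those wells is then dispensed into the wells of the 5184-well plate with a nanoliter-scale dispenser;
in every case, equal amounts are dispensed into each well.
In the above method, the dispensing-pattern wells of the 384-well plate follow a 4×2 arrangement with a long side of 2 and a short side of 4; the number of dispensing-pattern wells ranges from 8 to 384 and is specifically 24 or 72.
In the above method, after the original sequencing library is obtained in step 4), there is a further step: size-selecting the original sequencing library to obtain a fragment selection library in the range of 200-1100 bp, which is the long-fragment sequencing library.
In the above method, the 200-1100 bp fragment selection library is a 250-550 bp fragment selection library and a 550-1000 bp fragment selection library;
size selection is carried out by magnetic bead purification or agarose gel electrophoresis purification;
In the above method, the sample to be tested is eukaryotic cells or mixed microbial cells; the eukaryotic cells are derived from human blood cells;
the mixed microbial cells are derived from feces containing mixed microorganisms or soil containing mixed microorganisms.
A long-fragment sequencing library prepared by the above method also falls within the scope of protection of the present invention.
Use of the above method in sequencing long DNA molecules also falls within the scope of protection of the present invention;
the long DNA molecules are not less than 100 kb;
Use of the above method in purifying single colonies from mixed microorganisms also falls within the scope of protection of the present invention;
the mixed microorganisms are specifically derived from feces or soil.
Another object of the present invention is to provide a set of products for preparing a long-fragment sequencing library.
The product provided by the present invention comprises a 384-well plate, the above 5184-well plate, and a nanoliter-scale dispenser.
A third object of the present invention is to provide the above customized 5184-well plate.
The customized 5184-well plate provided by the present invention is characterized in that each well of the plate has a volume of 200-350 nl.
The denaturing reagent is an alkaline denaturing reagent, formulated as in Table 1 of the examples;
the sample dispensing pattern of the 384-well plate follows 4*2 with a long side of 2 and a short side of 4; the number of dispensing wells ranges from 8 to 384, preferably 24 or 72.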
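The loaded linker is linker 1 and/or linker 2 and can be designed as required. Linker 1 and linker 2 in this example are designed for use with Illumina sequencers:
Sequences of linker 1 and linker 2
Sequence and structure of linker 1:
5′-GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG-3′
3′-TCTACACATATTCTCTGTC-5′
Sequence and structure of linker 2:
5′-GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG-3′
3′-TCTACACATATTCTCTGTC-5′
For other sequencing platforms, the sequences can be replaced according to the design principles below. The design principles for linker 1 and linker 2 are:
1. Linker 1 and linker 2 each consist of two strands, one long and one short.
2. The last 19 bp at the 3' end of the long strand are fixed as AGATGTGTATAAGAGACAG (5' to 3').
3. The 5' end of the long strand can be replaced with a different sequence for each sequencing platform. For example, for the Ion Proton sequencer from Life Technology, the long strand of linker 1 becomes CCTCTCTATGGGCAGTCGGTGATAGATGTGTATAAGAGACAG (5' to 3'), and the long strand of linker 2 becomes CCATCTCATCCCTGCGTGTCTCCGACTCAGAGATGTGTATAAGAGACAG (5' to 3').
4. The short strand is a 19-nt oligonucleotide with the sequence CTGTCTCTTATACACATCT (5' to 3'); it is the reverse complement of the 19 bp at the 3' end of the long strand described in 2.
5. Note that the sequences of linker 1 and linker 2 may be the same or different.
6. For the linker 1 and linker 2 of each sequencer, primers 1, 2, 3, and 4 must be designed specifically, following PCR primer design principles.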
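Design principles 2 and 4 can be checked mechanically. The Python sketch below tests whether a long strand ends in the fixed 19 bases and whether the short strand is the reverse complement of that end. The sequences are copied from the text; the function names and dictionary keys are hypothetical and chosen only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FIXED_3P_END = "AGATGTGTATAAGAGACAG"  # fixed 19 bases at the 3' end of the long strand
SHORT_STRAND = "CTGTCTCTTATACACATCT"  # 19-nt short strand, written 5' to 3'

# Long strands quoted in the text (keys are illustrative labels).
LONG_STRANDS = {
    "illumina_example": "GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG",
    "ion_proton_linker1": "CCTCTCTATGGGCAGTCGGTGATAGATGTGTATAAGAGACAG",
    "ion_proton_linker2": "CCATCTCATCCCTGCGTGTCTCCGACTCAGAGATGTGTATAAGAGACAG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def follows_design_rules(long_strand: str, short_strand: str = SHORT_STRAND) -> bool:
    """Check principle 2 (fixed 3' end) and principle 4 (short strand pairs with it)."""
    return (
        long_strand.endswith(FIXED_3P_END)
        and reverse_complement(long_strand[-len(FIXED_3P_END):]) == short_strand
    )


results = {name: follows_design_rules(seq) for name, seq in LONG_STRANDS.items()}
```

A check like this only covers the sequence rules; primer design for a new platform (principle 6) still has to be done separately.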
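Figure 1 is a schematic diagram of the principle of the long-fragment sequencing technique.
Figure 2 is a flow chart of the long-fragment sequencing library construction method.
Figure 3 compares the library construction time of this method with that of a conventional Illumina library.
Figure 4 compares the library construction time of this method with that of mainstream haplotype libraries.
Figure 5 shows the electrophoresis result of the high-molecular-weight DNA extraction.
Figure 6 shows, for a given number of partitions, the relationship between the number of cells and the probability that paternal and maternal fragments overlap.
Figure 7 is schematic 1 of the loading positions on the 384-well source plate.
Figure 8 is schematic 2 of the loading positions on the 384-well source plate.
Figure 9 shows the Agilent 2100 Bioanalyzer result for the library without fragment selection.
Figure 10 shows the Agilent 2100 Bioanalyzer result for the fragment selection library (250-550 bp).
Figure 11 shows the Agilent 2100 Bioanalyzer result for the fragment selection library (550-1000 bp).
Figure 12 is a schematic of the assembly result for 100 kb long fragments.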
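Best Mode for Carrying Out the Invention
Unless otherwise specified, the experimental methods used in the following examples are conventional methods.
Unless otherwise specified, the materials, reagents, and so on used in the following examples are commercially available.
Example 1. Construction of a long-fragment sequencing library
I. Dispensing the single-stranded DNA to be sequenced in equal amounts into each well of a 5184-well plate
1. Extraction of long-fragment genomic DNA
Genomic DNA is extracted from the sample to be tested such that the extracted fragments are not less than 100 kb;
The extraction procedure is as follows:
The Agilent RecoverEase dialysis DNA extraction kit was used to extract genomic DNA from white blood cells in human blood (cultured cells can also be used; tissue samples must first be ground and homogenized). After extraction, pulsed-field gel electrophoresis was used to check the fragment length distribution under the following conditions: 6 V/cm, switch time range of 50-90 seconds, 20-hour run. The result is shown in the third lane of Figure 5: the main band of the extracted fragments is larger than 100 kb, and the fragments span 50 kb-800 kb.
2. Converting long-fragment double-stranded genomic DNA into single-stranded DNA
1) Starting amount for the reaction
This example uses the human genome. The human genome contains 6 Gb of haplotype information, and by calculation 150 pg of DNA, that is, the DNA mass contained in 20 human cells, was used as the final starting amount for the reaction. This amount ensures that, across all physical partitions, the probability that homologous chromosome fragments from the same genomic position of both parents (two fragments that overlap each other) occur in the same partition is less than 1%. The amount of DNA depends on the source (species) of the sample to be tested and ranges from the DNA of 10 to 500 cells.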
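Let the number of cells be Cells = n and the number of physical partitions (wells) be Wells = w, and let the cells have a diploid genome;
define the number of genome copies present in each well as λ = 4n/w;
the probability that homologous chromosome fragments from the same genomic position of the male parent and the female parent occur in the same well can then be derived:
where X is the sequencing depth at a single base position;
because a long fragment in a well comes from the male parent or the female parent with a probability of 50% each, the final probability that long fragments in a well lie at the same genomic position and come from the male parent and the female parent respectively is P*50%.
It follows from this formula that the probability that homologous chromosome fragments from the same genomic position of the male and female parents occur in the same well depends only on the number of cells used and the number of physical partitions, and not on the length of the input fragments or the size of the genome.
For a given number of partitions, a plot of probability against the number of cells can be generated (Figure 6). From this plot, the number of cells required for a given probability can be estimated.
For the present invention, in order for the probability that homologous chromosome fragments from the same genomic position of both parents occur in the same partition to be less than 1%, the amount of DNA added to the 5184-well plate is the DNA of 10-500 cells.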
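The closed-form expression for P is not reproduced in this text, so the following Python sketch is only one plausible reading of the definitions, not the patent's formula. It assumes that the number of single-stranded copies covering a given position in one well is Poisson-distributed with mean λ = 4n/w, takes P as the probability of two or more copies, and applies the 50% factor described above. The function names are hypothetical.

```python
import math


def genome_copies_per_well(n_cells: float, n_wells: int) -> float:
    """lambda = 4n/w: a diploid cell contributes 4 single-stranded genome copies."""
    return 4.0 * n_cells / n_wells


def overlap_probability(n_cells: float, n_wells: int) -> float:
    """Assumed model: copies covering one locus in a well ~ Poisson(lambda).

    P(two or more copies) = 1 - exp(-lambda) * (1 + lambda); the factor 0.5
    is the P*50% step from the text.
    """
    lam = genome_copies_per_well(n_cells, n_wells)
    p_two_or_more = 1.0 - math.exp(-lam) * (1.0 + lam)
    return 0.5 * p_two_or_more


# Compare the 5184-well layout with a 384-well layout for the same input.
p_5184 = overlap_probability(23, 5184)
p_384 = overlap_probability(23, 384)
```

Under this model the result depends only on n and w, which matches the statement above; the absolute values should be treated as illustrative.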
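2) Converting double strands to single strands
Adjust the DNA concentration to 664.56 pg/μL, draw up 9.57 μL of the DNA sample with a cut pipette tip, and place it in a 1.5 mL centrifuge tube. Then dilute 20 μL of the alkaline denaturing reagent stock solution (formulation in Table 1) 10-fold to 200 μL, add 172.3 μL of the diluted solution to the 1.5 mL tube, and mix by gently flicking the tube wall. After standing at room temperature for two minutes (denaturation), transfer the reaction to ice. The treatment time must not exceed 5 minutes, otherwise the DNA sample will be damaged. The purpose of this step is to break the hydrogen bonds between the two strands of the DNA molecule so that it becomes single-stranded DNA.
Table 1. Alkaline denaturing reagent formulation
3. Dispensing single-stranded DNA into the 5184-well plate
1) Addition of random primers
Take 13.40 μL of the sample obtained in 2) of step 2 and add it to 49.78 μL of 1 mM 8-base random primer without mixing. Let stand at room temperature for two minutes. Then add 496.78 μL of nuclease-free water to bring the total volume to 560 μL.
2) Using a wide-bore pipette tip, dispense the sample processed in step 1 into 24 wells of a 384-well plate (the positions are marked in blue in Figure 7; the 24 wells are the first 12 wells of columns 1 and 2). The dispensing pattern follows a 4*2 layout with a long side of 2 and a short side of 4, and the number of dispensing wells can range from 8 to 384, preferably, but not limited to, 24. Here the dispensing pattern is 3 precisely arranged groups of 4*2 wells, with 22 μL per well (18.41 pg of DNA per well).
3) Using the WaferGen Multisample Nanodispenser (dispenser), dispense the sample in the 384-well plate prepared in step 2 in equal amounts into the wells of the customized 5184-well plate with the "LFR 35nl Single Sample dispensing.seq" program, 35 nl per well. Each well then contains 0.0293 pg of DNA (the nucleus of one human white blood cell contains 6.5 pg of DNA, so the 5184 wells together hold the DNA of 23 cells). This gives the 5184-well plate containing single-stranded DNA.
The customized 5184-well plate is a 5184-well plate that fits the dispenser, with a volume of 200 nl per well. The per-well volume of the customized 5184-well plate ranges from 200 nl to 350 nl; an ordinary standard 5184-well plate can also be used, depending on the starting amount of DNA from different sources.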
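The per-well DNA amounts quoted in this section follow from the stated volumes and concentrations. The Python sketch below retraces that dilution arithmetic as a consistency check. All input values are taken from the text; the variable names are illustrative, and the calculation assumes ideal mixing with no loss of DNA.

```python
STOCK_PG_PER_UL = 664.56   # adjusted DNA concentration
DNA_UL = 9.57              # DNA sample volume
DENATURANT_UL = 172.3      # diluted alkaline denaturing reagent added
ALIQUOT_UL = 13.40         # denatured sample carried into the primer mix
FINAL_UL = 560.0           # total volume after adding primer and water
SOURCE_WELL_UL = 22.0      # volume per well of the 384-well source plate
DISPENSE_NL = 35.0         # volume dispensed per well of the 5184-well plate
N_WELLS = 5184
PG_PER_CELL = 6.5          # DNA per human white blood cell nucleus

# Concentration after denaturation, then after dilution to 560 uL (pg/uL).
denatured_conc = STOCK_PG_PER_UL * DNA_UL / (DNA_UL + DENATURANT_UL)
final_conc = denatured_conc * ALIQUOT_UL / FINAL_UL

pg_per_source_well = final_conc * SOURCE_WELL_UL   # per 384-plate well
pg_per_well = final_conc * DISPENSE_NL / 1000.0    # per 5184-plate well
total_pg = pg_per_well * N_WELLS                   # whole 5184-well plate
cell_equivalents = total_pg / PG_PER_CELL
```

Rounded, these values reproduce the 18.41 pg per source well, 0.0293 pg per well, and roughly 23 cell equivalents given above.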
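II. Whole genome amplification reaction
1. Seal the 5184-well plate containing single-stranded DNA obtained in section I with sealing film, place it in an Eppendorf 5810 plate centrifuge, and centrifuge at 3220 × g for 5 minutes. Keep at room temperature until use (centrifugation brings the solution to the bottom of the wells).
2. In a 1.5 mL centrifuge tube, prepare the multiple strand displacement polymerase amplification buffer (the whole genome amplification reaction system) according to Table 2, mix by vortexing, centrifuge briefly, and dispense into 24 wells of a 384-well plate at the positions shown in Figure 7, 22 μL per well.
Table 2. Multiple strand displacement polymerase amplification buffer
Reagent | Volume (μL) |
ddH2O | 421.92 |
10x Phi Buffer(Enzymatics#B7020) | 110.21 |
10%Pluronic F68 | 1.18 |
25mM dNTPs | 11.85 |
Phi29(Enzymatics#P7020-LC-L) | 14.81 |
3. Carefully remove the sealing film from the container and, using the WaferGen Multisample Nanodispenser, dispense the sample in the 24 wells of the 384-well plate from step 2 in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate containing single-stranded DNA treated in step 1.
4. After dispensing, seal the container with sealing film, place it in an Eppendorf 5810 plate centrifuge, and centrifuge at 3220 × g for 5 minutes. Then place it in a matching incubator and incubate at 30 °C for 1 hour (alternatively at 37 °C for 45 minutes), then at 65 °C for 5 minutes. Finally cool to room temperature, giving the whole genome amplification product and thus the 5184-well plate containing whole genome amplification product.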
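A quick volume budget shows how the 384-well source plate relates to the 5184-well plate in this dispensing step. The Python sketch below assumes that the 24 source wells are drawn down equally, which the text does not state, and ignores dead volume in the dispenser. The Table 2 volumes are taken from the text; the variable names are illustrative.

```python
N_DEST_WELLS = 5184
N_SOURCE_WELLS = 24
SOURCE_WELL_UL = 22.0   # volume loaded per source well
DISPENSE_NL = 35.0      # volume dispensed per destination well

# Table 2 reagent volumes in microliters.
TABLE2_UL = {
    "ddH2O": 421.92,
    "10x Phi Buffer": 110.21,
    "10% Pluronic F68": 1.18,
    "25 mM dNTPs": 11.85,
    "Phi29": 14.81,
}

# Destination wells served by each source well (assumed equal split).
dest_per_source = N_DEST_WELLS // N_SOURCE_WELLS
# Volume actually drawn from each source well, in microliters.
drawn_per_source_ul = dest_per_source * DISPENSE_NL / 1000.0
# Total volume delivered to the 5184-well plate, in microliters.
total_dispensed_ul = N_DEST_WELLS * DISPENSE_NL / 1000.0
# Master mix prepared versus master mix loaded on the source plate.
prepared_ul = sum(TABLE2_UL.values())
loaded_ul = N_SOURCE_WELLS * SOURCE_WELL_UL
```

Under these assumptions each source well gives up only part of its 22 μL, so the remainder acts as working excess for the dispenser.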
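III. Fragmentation
The whole genome amplification product obtained in section II is fragmented to a size of 200-1500 bp using a transposase loaded with linker 1 and linker 2;
1. In a 1.5 mL centrifuge tube, prepare the transposase reaction system loaded with linker 1 and linker 2 according to Table 3. Mix by inverting twenty times, centrifuge briefly, and dispense into 24 wells of a 384-well plate at the positions shown in Figure 7, 22 μL per well.
Sequence and structure of linker 1:
5′-GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG-3′
3′-TCTACACATATTCTCTGTC-5′
Sequence and structure of linker 2:
5′-GCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG-3′
3′-TCTACACATATTCTCTGTC-5′
The bases on the gray background pair by complementary base pairing and are held together by hydrogen bonds, so that the two strands form a specific linker.
Table 3. Transposase reaction solution for fragmentation
Reagent | Volume (μL) |
5x Tagment Buffer(Vazyme#TD108-02) | 336.00 |
Tagment Enzyme Mix*(Vazyme#TD108-02) | 224.00 |
2. Carefully remove the sealing film from the container, then use the WaferGen Multisample Nanodispenser to dispense the sample in the 24 wells of the 384-well plate prepared in step 1 in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate containing whole genome amplification product obtained in section II.
3. After dispensing, seal the container with sealing film, place it in an Eppendorf 5810 plate centrifuge, and centrifuge at 3220 × g for 5 minutes. Then place it in a matching incubator, incubate at 55 °C for 10 minutes, and cool to room temperature for later use.
4. Prepare the transposase neutralization solution according to Table 4, mix by vortexing, centrifuge briefly, and load it into 24 wells of a 384-well plate at the positions shown in Figure 7, 14 μL per well. Then dispense the sample in the 24 wells of the 384-well plate in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate obtained in step 3, giving a fragmented product of 200-1500 bp, that is, the 5184-well plate containing fragmented product.
The fragment size distribution of the fragmented product was determined by capillary electrophoresis on an Agilent Bioanalyzer 2100; as shown in Figures 10 and 11, the size is 200-1500 bp.
Table 4. Transposase neutralization solution
Reagent | Volume (μL) |
5X NT buffer(Vazyme#TD108-02) | 823.2 |
ddH2O | 164.64 |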
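IV. Adding tag sequences to obtain the sequencing library
1. Tag sequences are added to distinguish different samples
1) Take the 72 tagged 3'-end primers already aliquoted in a 96-well plate (2.5 μmol/L; sequences shown in Table 5) and, using an eight-channel pipette, transfer 0.56 μL of each into 72 wells of a 384-well plate, as shown by the green area in Figure 8. Shake the 384-well plate on a mixer, then place it in an Eppendorf 5810 plate centrifuge and centrifuge at 3220 × g for 5 minutes. After checking that there are no bubbles, let it stand at room temperature until use.
The 3'-end tag primer consists, in the 5' to 3' direction, of sequencing linker A, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 1; the 8-base random fragments of the 72 3'-end tag primers all differ from one another;
Table 5. 3'-end tag primers
In the table above, the underlined part is sequencing linker A, the italic part is the 8-base random fragment, and the bold part is the single-stranded DNA molecule complementary to linker 1.
2. Carefully remove the sealing film from the container, then use the WaferGen Multisample Nanodispenser to dispense the sample in the 384-well plate obtained in step 1 in equal amounts, with the "LFR 35nl 72Sample dispensing.seq" program, into every well of the 72 columns of the 5184-well plate containing fragmented product obtained in section III.
3. After dispensing, seal the container with sealing film, place it in an Eppendorf 5810 plate centrifuge, and centrifuge at 3220 × g for 5 minutes. Let stand at room temperature for 10 minutes until use.
4. Prepare the polymerase chain reaction buffer according to Table 6; the sequences of Primer1 and Primer2 in Table 6 are given in Table 7. Mix by vortexing, centrifuge briefly, and dispense into a 384-well plate at the positions shown by the green area in Figure 8, 15.6 μL per well; then dispense in equal amounts, with the "LFR 35nl Single Sample dispensing.seq" program, into each well of the 5184-well plate obtained in step 3.
Table 6. Polymerase chain reaction buffer
Table 7. Primer1 and Primer2 sequences
Name | Sequence 5'-3' |
Primer1 | AATGATACGGCGACCACCGA |
Primer2 | CAAGCAGAAGACGGCATACGA |
Primer 1 is identical or complementary to sequencing linker A;
primer 2 is complementary or identical to sequencing linker B.
5. Take the 72 tagged 5'-end primers already aliquoted in a 96-well plate (2.5 μmol/L; sequences shown in Table 8) and, using an eight-channel pipette, transfer 0.56 μL of each into the prepared 384-well plate. Shake the 384-well plate on a mixer, then place it in an Eppendorf 5810 plate centrifuge and centrifuge at 3220 × g for 5 minutes. After checking that there are no bubbles, let it stand at room temperature until use.
The 5'-end tag primer consists, in the 5' to 3' direction, of sequencing linker B, a random fragment of 8 bases, and a single-stranded DNA molecule complementary to linker 2;
the 8-base random fragments of the 72 3'-end tag primers and the 72 5'-end tag primers all differ from one another;
Table 8. 5'-end tag primers
In the table above, the underlined part is sequencing linker B, the italic part is the 8-base random fragment, and the bold part is the single-stranded DNA molecule complementary to linker 2.
6. Carefully remove the sealing film from the container, then use the WaferGen Multisample Nanodispenser to dispense the sample in the 384-well plate prepared in step 5 in equal amounts, with the "LFR 50nl 72 Sample dispensing.seq" program, into every well of the 72 rows of the 5184-well plate obtained in step 4.
7. Subject the 5184-well plate obtained in step 6 to polymerase chain amplification according to the PCR program shown in Table 9.
Table 9. Polymerase chain amplification program
8. Pool the amplification products from all wells of the 5184-well plate treated in step 7 to obtain the sequencing library (original).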
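The sequencing library (original) was run on a 2% agarose gel and the products were recovered to size-select the original library, giving fragment selection library 1 (250-550 bp) and fragment selection library 2 (550-1000 bp).
The sequencing library (original), fragment selection library 1, and fragment selection library 2 were analyzed on an Agilent Bioanalyzer to determine the distribution of library fragments.
Figure 9 shows the original library; the fragments are 200-1500 bp;
Figure 10 shows fragment selection library 1; the fragments are 250-550 bp;
Figure 11 shows fragment selection library 2; the fragments are 550-1000 bp.
Real-time quantitative PCR was used to measure the effective molecule concentration in the library (qPCR can be performed with the Kapa Biosystems KK4824 KAPA Library Quantification Kit for the Illumina platform; the kit contains all reagents and primers required for the assay, and the qPCR program is shown in Table 10). The results are shown in Table 11.
Table 10. qPCR program
Table 11. Effective molecule concentration in the library
After the library passed the check (the acceptance criterion is a qPCR concentration greater than 5 nmol/L), it was sequenced on an Illumina HiSeq 2000 or HiSeq 2500.
The sequencing results were assembled with the SOAP2 software, and fragments of 100 kb were obtained, which shows that the method of the present invention works and can sequence long fragments of 100 kb.
The assembly analysis is as follows:
Method: (1) align the reads to the reference hg19 with soap2, keeping only unique alignments (-r 0).
(2) compute the per-position coverage of the reads on the reference with soap.coverage (physical coverage, -phy).
(3) for each well, compute the lengths of the covered blocks and the lengths of the gaps between blocks. Note: the gap tolerance is 500K, that is, anything shorter than 500K is not counted as a gap. The libraries of the three insert sizes are merged for this calculation.
Results:
The histogram of long-fragment lengths is shown in Figure 12.
Median: 45,510 bp
Mean: 195,457 bp
Standard deviation: 551,263 bp
Maximum: 147,859,018 bp.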
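Step (3) of the analysis, merging covered regions into blocks with a 500K gap tolerance, can be sketched in Python as follows. This is an illustrative reimplementation rather than the script actually used: the half-open interval convention, the function names, and the toy intervals are all assumptions.

```python
from statistics import mean, median

GAP_TOLERANCE = 500_000  # distances shorter than this are not counted as gaps


def merge_into_blocks(intervals, gap_tolerance=GAP_TOLERANCE):
    """Merge covered (start, end) intervals from one well and one chromosome.

    Neighbouring intervals separated by less than gap_tolerance are joined
    into a single block. Coordinates are assumed to be half-open.
    """
    blocks = []
    for start, end in sorted(intervals):
        if blocks and start - blocks[-1][1] < gap_tolerance:
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [(s, e) for s, e in blocks]


def block_lengths(blocks):
    """Length of each block."""
    return [end - start for start, end in blocks]


def gap_lengths(blocks):
    """Distance between each pair of consecutive blocks."""
    return [nxt[0] - cur[1] for cur, nxt in zip(blocks, blocks[1:])]


# Made-up covered regions for a single well, for illustration only.
toy_intervals = [(200_000, 260_000), (0, 1_000), (900_000, 950_000)]
toy_blocks = merge_into_blocks(toy_intervals)
toy_summary = {
    "median": median(block_lengths(toy_blocks)),
    "mean": mean(block_lengths(toy_blocks)),
    "max": max(block_lengths(toy_blocks)),
}
```

Applied per well and pooled over all wells, the block lengths would give summary statistics of the kind listed above.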
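Example 2. Using the long-fragment sequencing library to isolate and purify single strains
I. Extraction of long-fragment genomic DNA
Genomic DNA is extracted from mixed microorganisms such that the extracted fragments are not less than 100 kb (100-800 kb), and the genomic DNA fragments are denatured at the same time;
The extraction procedure is as follows:
1. Separating the microorganisms
1) Place 0.2 g of chicken feces containing a complex microbial community in a 1.5 mL centrifuge tube, suspend it in 1000 μL of phosphate buffer, and vortex for 5 minutes until thoroughly mixed.
2) Centrifuge the sample in a high-speed centrifuge at 10000 rpm for 5 minutes, discard the supernatant, add 800 μL of phosphate buffer, and vortex for 3 minutes until thoroughly mixed.
3) Centrifuge the sample at 2000 rpm for 3 minutes, then carefully transfer the supernatant to a new 1.5 mL centrifuge tube; repeat this step twice;
4) Centrifuge the sample in a high-speed centrifuge at 10000 rpm for 5 minutes, discard the supernatant, add 800 μL of phosphate buffer, and vortex for 3 minutes until thoroughly mixed; repeat this step twice.
5) Filter the sample through a cellulose acetate membrane with a pore size of 20 μm, collect the filtrate in a new 1.5 mL centrifuge tube, centrifuge in a high-speed centrifuge at 10000 rpm for 5 minutes, and discard the supernatant, giving the microbial mixture;
2. Cell fixation
Add 800 μL of 4% paraformaldehyde solution prepared in sterile water to the microbial mixture obtained in 1, vortex for 3 minutes until thoroughly mixed, and leave at 4 °C overnight, giving the fixed microbial cells.
Briefly vortex the fixed microbial cells to mix, count the cell concentration with a hemocytometer, and dilute with sterile water to 30 cells/μL for later use, giving the diluted microbial cells.
3. Lysing the microorganisms to obtain long nucleic acid molecules
1) Dilute 20 μL of the alkaline denaturing reagent stock solution (see Table 1) 30-fold to 600 μL and dispense it into a 384-well plate at the positions marked in blue in Figure 7, 22 μL per well.
2) Using the WaferGen Multisample Nanodispenser, add the sample in the 384-well plate prepared in step 1) to the customized 5184-well plate with the "LFR 35nl Single Sample dispensing.seq" program. Each well has a volume of 350 nl.
3) After dispensing, seal the container with sealing film and centrifuge it in an Eppendorf 5810 plate centrifuge at 4000 rpm for 5 minutes. Then carefully remove the sealing film, heat the container in a metal bath to 85 °C, and hold for 15 minutes so that all the water in the container evaporates; then take out the container and let it cool naturally to room temperature;
4) Dispense the diluted microbial cells obtained in 2 into 24 wells of a 384-well plate at the positions marked in blue in Figure 7, 22 μL per well.
5) Using the WaferGen Multisample Nanodispenser, add the sample in the 384-well plate prepared in step 4) to the container described in step 3) with the "LFR 35nl Single Sample dispensing.seq" program.
6) After dispensing, seal the container with sealing film and centrifuge it in an Eppendorf 5810 plate centrifuge at 4000 rpm for 5 minutes. Then place it in a matching incubator and incubate at 85 °C for 2 minutes, centrifuge it in the Eppendorf 5810 plate centrifuge at 4000 rpm for 2 minutes, and incubate again at 85 °C for 2 minutes.
7) Centrifuge the container in an Eppendorf 5810 plate centrifuge at 4000 rpm for 5 minutes and keep it at room temperature until use, giving a sample containing single-stranded DNA molecules; the plate containing it is the 5184-well plate containing the single-stranded DNA sample.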
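The dilution to 30 cells/μL and the 35 nl dispense volume together set the average number of cells per well. The Python sketch below estimates well occupancy under the assumption, not stated in the text, that cells are loaded independently so that the count per well follows a Poisson distribution. Cell clumping, loss, and incomplete lysis are ignored, and the names are illustrative.

```python
import math

CELLS_PER_UL = 30.0   # diluted cell suspension
DISPENSE_NL = 35.0    # volume dispensed per well
N_WELLS = 5184

# Average number of cells delivered to one well.
mean_cells_per_well = CELLS_PER_UL * DISPENSE_NL / 1000.0


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a well under a Poisson model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Expected numbers of empty, single-cell, and multi-cell wells.
expected_empty = N_WELLS * poisson_pmf(0, mean_cells_per_well)
expected_single = N_WELLS * poisson_pmf(1, mean_cells_per_well)
expected_multiple = N_WELLS - expected_empty - expected_single
```

A lower cell concentration would reduce the share of multi-cell wells at the cost of more empty wells, which is the usual trade-off in limiting-dilution designs.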
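II. Whole genome amplification reaction
Same as section II of Example 1.
III. Fragmentation
Same as section III of Example 1.
IV. Adding tag sequences to distinguish different samples
Same as the tagging section (section IV) of Example 1, giving the sequencing library (original).
Size selection gave a 500-600 bp fragment selection library.
The effective molecule concentration in the library was measured by real-time quantitative PCR. After the library passed the check, it was sequenced on an Illumina HiSeq 2000 or HiSeq 2500.
Result: an assembly of the genome of one bacterium known to be present in the feces was obtained.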
工业应用
本方法通过采用一种新型的高通量微量移液平台进行前期建库实验。该平台可以进行35纳升的移液操作,从而将多重链置换反应的反应体积控制在100纳升以下,有证据显示降低反应体积可以显著改善多重链置换反应的扩增偏好性。同时本方法将物理分隔数目提升到了5184孔,在建库DNA起始量保持不变的情况下,该容器每个物理分隔中所包含的DNA仅为原有384分隔的产品的十分之一左右,即1%左右的基因组,这无疑降低了每个分隔中denovo组装的难度。同时,物理分隔数目的增加可以降低来自父母双方相同基因组位置的同源染色体片段在相同孔内出现的概率至0.5%-1%,这对于定相长度的提升有非常大的帮助,从而最终可以得到更好的单倍体型组装效果。(图1,长片段测序技术原理示意图。)
本方法通过将多重链置换扩增技术,转座酶片段化技术与纳升级微量移液平台进行优化整合,建立了一套可用于低样品起始量的单倍体型文库构建流程。(图2为长片段测序文库构建方法流程图。)
本方法操作相对简单,反应步骤较少,文库构建时间只需10小时,其中手工操作部分仅为3.5小时。作为对比,传统的illumina测序文库构建需要2天时间,手工操作部分需要5小时左右。而Fosmid构建单倍体型文库技术需要8天时间才能完成文库构建,而illumina商业化的Truseq synthetic long reads试剂盒需要三天时间完成建库,其中手工操作时间需6-8小时。对比以上两种建库方式,本方法在报告周期方面具有一定优势。(图3为本方法建库时间与illumina常规文库建库所需时间对比。图4为本方法建库时间与主流单倍型文库建库所需时间对比。)
除以上特点之外,本方法仅需一台纳升级微量移液平台,无需其他仪器或者自动化移液装置的辅助即可完成全部实验流程。相比之下,染色体中期分离进行单倍体型构建技术需要的显微操作装置,Complete Genomics公司的LFR技术需要三台大型自动化移液装置,本方法无疑具有更好的可行性。
应用方面,本方法对DNA样品的需求量较少,最低可以低至150皮克(等效于25个人类细胞中所含有的DNA量)。传统的Fosmid构建单倍体型文库技术需要8微克DNA进行建库,illuminaTruseq synthetic long reads试剂盒则需要500ng DNA作为起始。起始量的降低使得本方法的应用场景得到了极大拓展,可以用于游离肿瘤细胞,游离胎儿细胞或胚胎细胞的单倍型测序。
本方法采用了较温和的提取方法,可以分离出较长的DNA片段,不小于100kb。而长片段的优势在于可以更好的解决基因组中重复区域的组装,同时也可以提升在杂合区域的组装表现。因此可以将本方法应用于动物,植物以及微生物的基因组de novo组装文库的构建。
除以上两种应用场景之外,当以一定浓度的细胞悬液作为起始材料时,本方法可以实现单细胞分离的效果。再辅以一定的细胞观测以及裂解手段,本方法可以实现高通量单细胞测序文库构建,每次运行理论上可以平行构建约1000个单细胞测序文库。因此可以将该方法用于复杂环境中的菌群研究,进而可以更深入的了解复杂环境中菌群组成。更重要的是,使用本方法可以获得一些无法分离培养的细菌的基因组信息,这无疑对于宏基因组学研究具有非常重大的意义。作为延伸,还可以将本方法应用于单细胞组学研究当中,可作为前期高通量,自动化单细胞扩增文库的构建方法。
无论是Fosmid方法,Truseq synthetic long reads试剂盒,还是Complete Genomics公司的LFR技术都是基于分隔稀释法的原理进行单倍型组装,即通过物理分隔的方法降低同源染色体片段出现在同一分隔中的概率,从而对杂合SNP进行定相(Phasing),其分隔数一般不超过384。可以通过同一样品进行重复实验的方法增加分隔数,但这意味着建库和测序成本成倍增加。
总之,本发明的方法是通过使用纳升级微量移液装置以及定制的5184孔容器,可以有效的提升物理分隔数,进一步降低同源染色体片段出现在同一分隔中的概率,从而实现更好的单倍型组装效果。该定制容器相较原有容器的单位孔容积更大,可以容纳200nl-350nl体积的试剂(原有容积为100nl)。该定制化容器可以容纳5次加样,因此可以在该定制化5184孔容器中完成全部建库反应。
也应当注意到,操作人员可以将该5184孔容器进行再次分隔处理,即可以在一个容器中同时处理多个样品,如2×2597,或3×1728等模式,从而达到增加通量,降低成本的目的。
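It should further be noted that the method can introduce virtual compartments by adding a further dimension of tags, which raises the number of compartments further. For example, two combinatorial tags are introduced during transposase fragmentation of the DNA to mark the position of the first-level physical compartment, and two more combinatorial tags are then introduced during polymerase chain reaction amplification to mark the position of the second-level virtual compartment, so that the effect of 5184×5184 physical compartments is ultimately achieved (including but not limited to this; it may be 12×5184, 24×5184, or more).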
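The combinatorial tagging idea can be illustrated with a small lookup sketch. It assumes, as one reading of the claims below, that each of the 72 rows receives one 5'-end tag and each of the 72 columns receives one 3'-end tag, so that a tag pair identifies one well. The tag identifiers are hypothetical placeholders and not real 8-base sequences, and the function names are illustrative.

```python
N_ROWS = 72
N_COLS = 72


def tag_pair_for_well(row: int, col: int) -> tuple[str, str]:
    """Return (5'-end tag id, 3'-end tag id) for a zero-based well position."""
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError("well position outside the 72 x 72 plate")
    return (f"5p_{row:02d}", f"3p_{col:02d}")


# Reverse lookup used when assigning reads back to wells.
TAGS_TO_WELL = {
    tag_pair_for_well(r, c): (r, c)
    for r in range(N_ROWS)
    for c in range(N_COLS)
}


def total_partitions(physical: int = N_ROWS * N_COLS, virtual: int = 1) -> int:
    """Effective partition count when each physical well is further split
    into `virtual` tag-defined compartments."""
    return physical * virtual


if __name__ == "__main__":
    print(len(TAGS_TO_WELL), "distinct tag pairs")
    print(TAGS_TO_WELL[("5p_03", "3p_70")])
    for v in (12, 24, 5184):
        print(v, "virtual compartments per well ->", total_partitions(virtual=v))
```

Only 144 tag primers (72 + 72) are needed to label 5184 wells this way, which is the practical appeal of a row-by-column scheme over one unique tag per well.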
Claims (20)
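- A method for constructing a long-fragment sequencing library, comprising the following steps: 1) preparing a 5184-well plate containing single-stranded DNA by A) or B) as follows: A) first extracting genomic DNA from the sample to be tested and denaturing it to obtain single-stranded DNA, then dispensing the single-stranded DNA molecules into each well of a 5184-well plate to obtain the 5184-well plate containing single-stranded DNA; B) first dispensing the sample to be tested into each well of a 5184-well plate, then lysing and denaturing it to obtain the 5184-well plate containing single-stranded DNA; the fragment length of the single-stranded DNA being no less than 100 kb; 2) dispensing a whole-genome amplification reaction system into each well of the 5184-well plate containing single-stranded DNA and performing a whole-genome amplification reaction to obtain a 5184-well plate containing whole-genome amplification products; 3) dispensing a fragmentation reaction system into each well of the 5184-well plate containing whole-genome amplification products and performing a fragmentation reaction to obtain fragmentation products, the plate containing the fragmentation products being the 5184-well plate containing fragmentation products; the fragmentation products being 200-1500 bp in length; 4) adding a different tag sequence to the fragmentation products in each well of the 5184-well plate containing fragmentation products, thereby obtaining the sequencing library.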
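- The method according to claim 1, characterized in that: the total amount of single-stranded DNA in the 5184-well plate containing single-stranded DNA is such that the probability that homologous chromosome fragments from both parents covering the same genomic position occur in any one well of the 5184-well plate is less than 1%; specifically, the total amount of single-stranded DNA in the plate is the DNA content of 10-500 cells.
- The method according to claim 1 or 2, characterized in that: each well of the 5184-well plate has a volume of 190 nanoliters to 350 nanoliters, specifically 200 nanoliters to 350 nanoliters.
- The method according to any one of claims 1-3, characterized in that: in step 1), the reagent used for the denaturation or for the lysis and denaturation is an alkaline denaturation reagent; the denaturation is performed by incubating at 25 °C for 2 minutes; the lysis and denaturation is performed by incubating at 85 °C for 2 minutes; the genomic DNA of the sample to be tested is extracted by dialysis, alkaline lysis, or agarose embedding.
- The method according to any one of claims 1-4, characterized in that: in step 2), the whole-genome amplification reaction system contains random primers consisting of 8 bases.
- The method according to any one of claims 1-5, characterized in that: between step 1) and step 2), the method further comprises the following step: dispensing random primers consisting of 8 bases into the 5184-well plate containing single-stranded DNA.
- The method according to any one of claims 1-6, characterized in that: in step 2), the whole-genome amplification reaction system is a multiple displacement amplification polymerase reaction system; the whole-genome amplification reaction is an incubation at 30 °C for 1 hour followed by 65 °C for 5 minutes, or an incubation at 37 °C for 45 minutes followed by 65 °C for 5 minutes.
- The method according to any one of claims 1-7, characterized in that: in step 3), the fragmentation reaction system is a transposase reaction system; the transposase reaction system comprises a transposase reaction buffer and adapter-loaded transposase; the fragmentation reaction is performed by incubating at 55 °C for 10 minutes; the adapters are adapter 1 and/or adapter 2.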
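- The method according to any one of claims 1-8, characterized in that: in step 4), adding different tag sequences comprises the following steps: dispensing 72 kinds of 3'-end tag primers, a polymerase chain reaction system, and 72 kinds of 5'-end tag primers into the wells of the 5184-well plate containing fragmentation products and performing a polymerase chain reaction to obtain a raw sequencing library; each 3'-end tag primer consists, in the 5' to 3' direction, of sequencing adapter A, a random segment of 8 bases, and a single-stranded DNA molecule complementary to adapter 1; each 5'-end tag primer consists, in the 5' to 3' direction, of sequencing adapter B, a random segment of 8 bases, and a single-stranded DNA molecule complementary to adapter 2; the 8-base random segments of the 72 3'-end tag primers and the 72 5'-end tag primers all differ from one another; the polymerase chain reaction system contains primer 1 and primer 2; primer 1 is identical or complementary to sequencing adapter A; primer 2 is complementary or identical to sequencing adapter B.
- The method according to claim 9, characterized in that: the 72 kinds of 3'-end tag primers are dispensed into every well of the 72 columns of the 5184-well plate containing fragmentation products; the 72 kinds of 5'-end tag primers are dispensed into every well of the 72 rows of the 5184-well plate containing fragmentation products; the polymerase chain reaction system is dispensed into every well of the 5184-well plate containing fragmentation products.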
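- The method according to claim 9 or 10, characterized in that: the annealing step of the polymerase chain reaction is 60 °C for 30 seconds.
- The method according to any one of claims 1-11, characterized in that: all dispensing is performed as follows: each substance to be dispensed is first dispensed into dispensing-pattern wells of a 384-well plate, and the substance in the dispensing-pattern wells is then dispensed into the wells of the 5184-well plate with a nanoliter-scale dispenser; every well receives an equal amount.
- The method according to claim 12, characterized in that: the dispensing-pattern wells of the 384-well plate conform to a 4×2 arrangement, with 2 on the long side and 4 on the short side; the number of dispensing-pattern wells ranges from 8 to 384 and is specifically 24 or 72.
- The method according to any one of claims 9-13, characterized in that: after the raw sequencing library is obtained in step 4), the method further comprises the following step: performing fragment size selection on the raw sequencing library to obtain a fragment-selected library with lengths in the range 200-1100 bp, which is the long-fragment sequencing library.
- The method according to claim 14, characterized in that: the 200-1100 bp fragment-selected library is a 250-550 bp fragment-selected library and a 550-1000 bp fragment-selected library; the size selection is performed by magnetic bead purification or agarose gel electrophoresis purification.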
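- The method according to any one of claims 1-15, characterized in that: the sample to be tested is eukaryotic cells or mixed microbial cells; the eukaryotic cells are derived from human blood cells; the mixed microbial cells are derived from feces containing mixed microorganisms or soil containing mixed microorganisms.
- A long-fragment sequencing library prepared by the method according to any one of claims 1-16.
- Use of the method according to any one of claims 1-16 in sequencing long-fragment DNA molecules, the long-fragment DNA molecules being no less than 100 kb; or use of the method according to any one of claims 1-16 in purifying single colonies from mixed microorganisms, the mixed microorganisms specifically being derived from feces or soil.
- A kit of products for preparing a long-fragment sequencing library, comprising a 384-well plate, the 5184-well plate of the method according to any one of claims 1-16, and a nanoliter-scale dispenser.
- A custom 5184-well plate of the method according to any one of claims 1-16, characterized in that: each well of the plate has a volume of 200-350 nanoliters.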
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680003838.3A CN107002153B (zh) | 2015-02-04 | 2016-01-13 | Method for constructing a long-fragment sequencing library |
US15/667,841 US10456769B2 (en) | 2015-02-04 | 2017-08-03 | Method of constructing sequencing library |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510058616.9 | 2015-02-04 | ||
CN201510058616 | 2015-02-04 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/667,841 Continuation US10456769B2 (en) | 2015-02-04 | 2017-08-03 | Method of constructing sequencing library |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016124069A1 (zh) | 2016-08-11 |
Family
ID=56563430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/070789 WO2016124069A1 (zh) | 2015-02-04 | 2016-01-13 | Method for constructing a long-fragment sequencing library |
Country Status (3)
Country | Link |
---|---|
US (1) | US10456769B2 (zh) |
CN (1) | CN107002153B (zh) |
WO (1) | WO2016124069A1 (zh) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109295163B (zh) * | 2018-10-09 | 2022-11-15 | 中国农业科学院深圳农业基因组研究所 | Universal method for long-fragment chromosome walking |
WO2020135347A1 (zh) * | 2018-12-29 | 2020-07-02 | 深圳华大生命科学研究院 | Method, kit, device, and application for DNA methylation detection |
CN112538475A (zh) * | 2019-09-23 | 2021-03-23 | 武汉华大医学检验所有限公司 | Extraction method, sequencing method, and kit for plant genomic DNA |
KR102177386B1 (ko) * | 2019-11-05 | 2020-11-11 | 주식회사 마크로젠 | Microwave-based DNA extraction method for next-generation sequencing and use thereof |
CN113444769B (zh) * | 2020-03-28 | 2023-06-23 | 深圳人体密码基因科技有限公司 | Method for constructing a DNA tag sequence and application thereof |
CN114250279B (zh) * | 2020-09-22 | 2024-04-30 | 上海韦翰斯生物医药科技有限公司 | Method for constructing a haplotype |
CN112251491B (zh) * | 2020-10-23 | 2024-03-12 | 江苏吉诺思美精准医学科技有限公司 | cDNA library construction method using a capillary 96-well plate |
CN118043478B (zh) * | 2023-10-20 | 2025-03-14 | 北京昌平实验室 | Method for reliable non-invasive preimplantation genetic testing of embryos |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101932729A (zh) * | 2007-12-05 | 2010-12-29 | 考利达基因组股份有限公司 | Efficient base determination in sequencing reactions |
CN102864498A (zh) * | 2012-09-24 | 2013-01-09 | 天津工业生物技术研究所 | Method for constructing a long-fragment end library |
WO2014145820A2 (en) * | 2013-03-15 | 2014-09-18 | Complete Genomics, Inc. | Multiple tagging of long dna fragments |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6401267B1 (en) * | 1993-09-27 | 2002-06-11 | Radoje Drmanac | Methods and compositions for efficient nucleic acid sequencing |
US9524369B2 (en) * | 2009-06-15 | 2016-12-20 | Complete Genomics, Inc. | Processing and analysis of complex nucleic acid sequence data |
US9957564B2 (en) * | 2010-06-30 | 2018-05-01 | Bgi Genomics Co., Ltd. | Application of a PCR sequencing method, based on DNA barcoding technique and DNA incomplete shearing strategy, in HLA genotyping |
US9469874B2 (en) * | 2011-10-18 | 2016-10-18 | The Regents Of The University Of California | Long-range barcode labeling-sequencing |
US9968901B2 (en) * | 2012-05-21 | 2018-05-15 | The Scripps Research Institute | Methods of sample preparation |
US20170009288A1 (en) * | 2014-02-03 | 2017-01-12 | Thermo Fisher Scientific Baltics Uab | Method for controlled dna fragmentation |
- 2016-01-13: WO application PCT/CN2016/070789, publication WO2016124069A1 (zh), status: Application Filing
- 2016-01-13: CN application CN201680003838.3A, publication CN107002153B (zh), status: Active
- 2017-08-03: US application US15/667,841, publication US10456769B2 (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
US20170341051A1 (en) | 2017-11-30 |
US10456769B2 (en) | 2019-10-29 |
CN107002153A (zh) | 2017-08-01 |
CN107002153B (zh) | 2020-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16746064 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16746064 Country of ref document: EP Kind code of ref document: A1 |