WO2024104129A1 - Method for enriching a part of a genome and related applications - Google Patents

Method for enriching a part of a genome and related applications

Info

Publication number
WO2024104129A1
Authority
WO
WIPO (PCT)
Prior art keywords
annealing temperature
genome
sequence
1min
sequencing
Prior art date
Application number
PCT/CN2023/128105
Other languages
English (en)
French (fr)
Inventor
常玉晓
赵胜
陈鹏
王雅美
向勇
刘毓文
Original Assignee
中国农业科学院深圳农业基因组研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国农业科学院深圳农业基因组研究所
Publication of WO2024104129A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • the invention relates to a method for enriching a part of a genome and related applications, belonging to the technical field of genetic engineering.
  • (1) SSR (Simple Sequence Repeats) molecular marker detection technology is based on short tandem repeat sequences that are widely distributed throughout the genome and are polymorphic between samples because the number of repeat units differs. For whole-genome SSR marker detection, a pair of specific primers must be designed in advance in the flanking sequence of each SSR site, and the genotypes of all SSR sites are then detected one by one.
  • (2) DNA chip detection technology compares the genome sequences of the core resources of a species, selects differential sites that are commonly present on the genome for nucleic acid hybridization detection, and then determines the genome polymorphism of the test material.
  • (3) Multiplex PCR detection technology is a single-tube PCR amplification reaction for multiple target sites (several to thousands) distributed on the genome. Because each target site requires a pair of specific primers, the PCR reaction system contains amplification primers for all sites.
  • (4) Simplified (reduced-representation) genome sequencing technology performs high-throughput sequencing mainly on the regions near restriction endonuclease cleavage sites on the genome (about 10% of the genome).
  • The library construction process of this type of technology mainly involves selection of restriction endonucleases, enzyme cleavage, adapter ligation, PCR amplification, and multi-step purification.
  • Sequence variation information for tens of thousands to millions of regions near the restriction endonuclease sites can be obtained (depending on the amount of sequencing data and the type of endonuclease) and used for whole-genome polymorphism analysis.
  • (5) Whole genome resequencing technology sequences and analyzes the entire genome. Generally, the genomic DNA is first randomly fragmented by ultrasound, transposase or fragmentase, followed by adapter ligation and PCR amplification to construct the library. Usually, variation information for hundreds of thousands to millions of sites covering the whole genome can be obtained.
  • One object of the present invention is to provide a method for enriching a portion of a genome.
  • Another object of the present invention is to provide related applications of the method for enriching a portion of a genome, specifically including applications in species genetics and breeding detection, such as detecting foreground genes and genetic background in breeding.
  • the present invention provides a method for enriching a portion of a genome, the method comprising:
  • the genomic DNA of the sample to be tested is amplified by thermal asymmetric interlaced PCR (TAIL-PCR) using a primer pair consisting of a sequence-specific primer (Sequence Specific Primer) and a random degenerate primer (Arbitrary Degenerate Primer);
  • the PCR amplification products are purified and a sequencing library is constructed.
  • sequence-specific primer refers to a primer that does not contain degenerate bases, that is, its degeneracy is 1.
  • the method of enriching a portion of the genome of the present invention can be used to detect the background genotype of the whole genome (i.e., not focusing on the foreground gene), and has an effect similar to simplified genome sequencing; it can also be used for the simultaneous detection of the foreground gene and the genetic background.
  • In the method for enriching a part of a genome of the present invention, only two primers are needed in the corresponding sequencing library construction process: a sequence-specific primer and another primer containing random degenerate bases (i.e., a random degenerate primer).
  • the sequence-specific primer anneals specifically at the target site on the genome and non-specifically at tens of thousands of other sites, and pairs with the random degenerate primer; through one thermal asymmetric staggered PCR amplification, a part of the genome (1% to 10%) can be enriched.
  • the detection of the genetic background of the whole genome mainly relies on non-specific amplification between the primers and the DNA template: as long as the two primers bind, specifically or non-specifically, at an appropriate distance from each other on the genome, the fragment between them can be amplified and sequenced, and these fragments can be evenly distributed across the whole genome.
  • all amplified products can be screened for fragments within a certain range. For example, when the PE150 sequencing strategy is selected, the amplified products within the target length range (such as 250-600bp) can be selected for library construction and sequencing; alternatively, sequencing adapters can be ligated directly to both ends of all amplified products (including the 250-600bp fragments), after which the region suitable for PE150 sequencing (350-700bp, including sequencing adapters) is sorted with a fragment sorting instrument such as SageELF or Pippin HT, and high-throughput sequencing is then performed.
  • the detection of the foreground gene can mainly rely on the specific amplification of the sequence-specific primer and the target gene.
  • the present invention can utilize the specific annealing of the sequence-specific primer at the target site on the genome and the non-specific annealing of tens of thousands of sites, and the paired amplification with the random degenerate primer, and can detect the foreground markers in the sample and the background markers of the whole genome through one PCR amplification, which has important application value for genetic and breeding detection.
  • in the method of enriching a portion of a genome of the present invention, the sequence-specific primer can be designed according to the sequence of any region on the genome.
  • the sequence-specific primer can be designed according to the foreground gene sequence (for example, designed according to the vicinity of the polymorphic site within the foreground gene sequence). More specifically, the sequence-specific primer binding position is within 150bp of the polymorphic site within the foreground gene, for example, 10-150bp. This ensures that the sequencing read covers the polymorphic site, and also ensures complete linkage of the sequence-specific primer to the foreground gene.
  • the length of the sequence-specific primer is 18-30 nt, preferably 18-25 nt.
  • a 6-10 nt barcode sequence may be additionally added to the 5' end of the sequence-specific primer.
  • the barcode sequence may be used to distinguish sample types.
  • the four bases A, T, C and G are evenly distributed in the barcode sequence, avoiding homopolymer runs such as AAAA.
  • the present invention has no other special requirements for the barcode sequence, as long as it can achieve the function of distinguishing different samples.
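  • As an illustration of the barcode guidance above, the following is a minimal sketch of a barcode sanity check. Only the 6-10 nt length and the AAAA example come from the text; the function name `check_barcode`, the maximum run length of 3 and the 50% cap on any single base are assumptions made for this example.

```python
from collections import Counter
from itertools import groupby


def check_barcode(barcode, min_len=6, max_len=10, max_run=3, max_share=0.5):
    """Return a list of problems found in a barcode (empty list = passes).

    min_len/max_len follow the 6-10 nt range in the text; max_run and
    max_share are illustrative assumptions, not values from the patent.
    """
    problems = []
    seq = barcode.upper()
    if set(seq) - set("ACGT"):
        problems.append("contains non-ACGT characters")
    if not (min_len <= len(seq) <= max_len):
        problems.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    # Longest stretch of one repeated base, e.g. 4 for "AAAA".
    longest_run = max((len(list(group)) for _, group in groupby(seq)), default=0)
    if longest_run > max_run:
        problems.append(f"homopolymer run of {longest_run}")
    counts = Counter(seq)
    if seq and max(counts.values()) / len(seq) > max_share:
        problems.append("one base dominates the barcode")
    return problems
```

  • Under these assumed thresholds, the two 10 nt barcodes used in Example 1 (TAACAGCCAA and TCAGTGAGTC) pass, while a barcode containing AAAA is flagged.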
  • the length of the random degenerate primer is 8-20 nt, preferably 8-15 nt, more preferably 10-15 nt.
  • the degeneracy of the random degenerate primer is 64-4096, preferably 120-3072.
  • a 6-10 nt barcode sequence may be additionally added to the 5' end of the random degenerate primer.
  • the barcode sequence may be used to distinguish sample types.
  • during PCR amplification, the amount of random degenerate primer exceeds the amount of sequence-specific primer.
  • a high-fidelity DNA polymerase (a DNA polymerase for high-throughput sequencing library construction) is used in the reaction system.
  • the annealing temperature and cycle conditions of the primers can be adjusted as needed.
  • the annealing temperature of the primers can be adjusted according to the primer Tm value.
  • the number of amplification cycles can be 20-37 in total (one "denaturation, annealing, extension" is counted as one cycle unit, and the number is counted as 1 cycle number).
  • the number of PCR cycles can also be increased or decreased according to the amount of PCR product. Generally, about 500ng is amplified, which is sufficient for subsequent library construction.
  • the thermal asymmetric staggered PCR amplification includes a process of staggered cycling reactions at a higher annealing temperature and a lower annealing temperature.
  • the thermal asymmetric staggered PCR amplification in the method for enriching a portion of a genome of the present invention comprises staggering 6-10 cycles at a higher annealing temperature and a lower annealing temperature, each cycle comprising 2-3 cycle units and at least one cycle unit annealed at a higher annealing temperature and at least one cycle unit annealed at a lower annealing temperature. More specifically, each cycle adopts one of the following methods (1) to (8):
  • (when each cycle includes 3 cycle units, it is counted as 3 cycles, so 6-10 such cycles are counted as 18-30 cycles);
  • the thermal asymmetry in the method for enriching a portion of a genome of the present invention is
  • the higher annealing temperature is 50-60° C.
  • the lower annealing temperature is 5-15° C. lower than the higher annealing temperature.
  • the lower annealing temperature is 40-50° C.
  • the higher temperatures are preferably substantially the same, and the lower temperatures are preferably substantially the same.
  • the thermal asymmetric staggered PCR amplification includes at least two stages of cyclic reactions: the first stage of the cyclic reaction mainly allows more (compared to random degenerate primers) specific primers to bind to the template, thereby obtaining more amplified products; the second stage of the cyclic reaction mainly allows the degenerate primers to bind to the template more easily (compared to sequence-specific primers) to initiate an amplification reaction.
  • the thermal asymmetric staggered PCR amplification includes at least two stages of cyclic reactions:
  • the first stage cycle reaction includes 3-7 cycles at a higher annealing temperature, each cycle includes denaturation, annealing at a higher annealing temperature, and extension; preferably, each cycle includes: denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min;
  • the second stage cyclic reaction includes 6-10 cycles at a higher annealing temperature and a lower annealing temperature, each cycle includes 2-3 cycle units and at least one cycle unit annealed at a higher annealing temperature and at least one cycle unit annealed at a lower annealing temperature.
  • each cycle adopts one of the methods described in (1) to (8) above. More preferably, each cycle includes: denaturation, annealing at a higher annealing temperature, extension, denaturation, annealing at a higher annealing temperature, extension, denaturation, annealing at a lower annealing temperature, extension.
  • each cycle includes: denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; denaturation at 90-98°C for 20s-2min, annealing at a lower annealing temperature for 30s-1min, and extension at 72°C for 1min-3min.
  • the thermal asymmetric staggered PCR amplification of the present invention is carried out according to the following conditions:
  • each higher annealing temperature is independently 50-60°C
  • each lower annealing temperature is independently 40-50°C.
  • the process of purifying the PCR amplification product and constructing a sequencing library can be performed according to conventional operations in the relevant field.
  • an A is added to the 3' end and the Y-shaped Illumina sequencing adapter is ligated to complete the preparation of the sequencing library.
  • the method of enriching a part of the genome of the present invention also includes: sequencing the sequencing library, or the sequencing library is used for high-throughput sequencing after fragment sorting.
  • all amplified products can be screened for fragments within a certain range: the amplified products within the target length range (such as 250-600bp) can be selected, or sequencing adapters can be ligated directly to both ends of all amplified products (including the 250-600bp fragments), after which the region suitable for PE150 sequencing (350-700bp) is sorted with a fragment sorting instrument such as SageELF or Pippin HT and high-throughput sequencing is performed.
  • the method of enriching a part of a genome of the present invention further comprises:
  • the high-throughput sequencing data of the sequencing library are split by sample and aligned to the reference genome to detect the read coverage within the foreground gene and at the sequence-specific primer, as well as the genome-wide distribution of high-depth tag sites that are enriched in reads and have a coverage depth of ≥3×. These high-depth tag sites can be used for subsequent whole-genome genotype detection.
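  • The data-splitting step can be pictured with a minimal sketch that assigns reads to samples by the barcode at the 5' end of each read. The text does not describe the splitting software, so the function name `split_reads_by_barcode`, exact-match prefix comparison, and the sample names are assumptions; TAACAGCCAA is the barcode of SEQ ID NO: 1 in Example 1, and the second barcode is hypothetical.

```python
def split_reads_by_barcode(reads, barcode_to_sample):
    """Group reads by the inline barcode found at their 5' end.

    Exact prefix matching only (an assumption); reads with no matching
    barcode go to the 'unassigned' bin. The barcode is trimmed from
    every assigned read.
    """
    bins = {sample: [] for sample in barcode_to_sample.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins


# Hypothetical sample sheet: the second barcode is invented for illustration.
barcodes = {"TAACAGCCAA": "sample_A", "GGTTCACTGA": "sample_B"}
reads = [
    "TAACAGCCAACCTCCGAACAACGCCAACTG",
    "GGTTCACTGAACGTACGT",
    "CCCCCCCCCCACGT",
]
bins = split_reads_by_barcode(reads, barcodes)
```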
  • the present invention also provides the use of the method for enriching a part of a genome in marker-assisted selection breeding or genomic breeding.
  • the method can be used to perform whole genome genetic background detection on a species, or to perform foreground gene and genetic background detection on a species.
  • the method of the present invention can be used to sequence species such as animals and plants.
  • the species include but are not limited to rice, dogs and pigs.
  • the method of enriching a portion of the genome of the present invention can be used to detect background genotypes of the entire genome (i.e., not focusing on foreground genes), which has an effect similar to simplified genome sequencing.
  • the method of enriching a part of the genome of the present invention can also realize the simultaneous detection of the foreground gene and the genetic background, so the detection process is faster and more efficient.
  • the sequence-specific primers used are located near the polymorphic sites inside the foreground gene, thereby ensuring that the specific amplification products of the sequence-specific primers are completely linked to the foreground gene.
  • because the sequence-specific primer anneals specifically at the target site on the genome and non-specifically at tens of thousands of other sites, and pairs with the random degenerate primer, the amplified tag sites appear stably and cover the entire genome evenly.
  • the method of the present invention does not need to fragment the genome in advance; it performs a single PCR amplification on the genome using only a sequence-specific primer and a random degenerate primer, and stably obtains tens of thousands of high-depth tag sites that evenly cover the entire genome for detecting the genetic background of breeding materials.
  • FIG. 1 shows the PCR amplification results of primers Pi2_F01 and AD1_R03 in Example 1.
  • FIG. 2 shows the library detection results during library construction in Example 1.
  • FIG. 3 shows the fragment sorting results after mixing the three libraries in Example 1.
  • Figure 4 shows the distribution of the sequencing data of the three technical replicates on the rice genome in Example 1.
  • Panel A: reads are enriched in the foreground gene (the blue horizontal line in the figure indicates the position of the Pi2 gene interval);
  • Panel B: reads are enriched at the Pi2_F01 primer binding site (the red arrow in the figure indicates the sequence-specific primer binding site);
  • Panel C: sequencing reads in the three technical replicates are significantly enriched on the genome, stably forming high-depth tags;
  • Panel D: the density distribution on the genome of the tags common to the three technical replicates.
  • FIG. 5 shows the PCR amplification results of primers Rf4_F01 and AD2_R03 in Example 2.
  • FIG. 6 shows the fragment distribution range detection after the three libraries are mixed in Example 2.
  • FIG. 7 shows the fragment sorting results after mixing the three libraries in Example 2.
  • Figure 8 shows the distribution of the sequencing data of the three technical replicates on the rice genome in Example 2.
  • Panel A: reads are enriched in the foreground gene (the blue horizontal line in the figure indicates the position of the Rf4 gene interval);
  • Panel B: reads are enriched at the Rf4_F01 primer binding site (the red arrow in the figure indicates the sequence-specific primer binding site);
  • Panel C: sequencing reads in the three technical replicates are significantly enriched on the genome, stably forming high-depth tags;
  • Panel D: the density distribution on the genome of the total tags in the three technical replicates.
  • FIG. 9 shows the detection results represented by dog_139-Rep1 and pig_421-Rep1 samples during the PCR product purification and fragment distribution detection process in Example 3.
  • FIG. 10 shows the library detection results represented by dog_139-Rep1 and pig_421-Rep1 samples during the high-throughput sequencing library construction process in Example 3.
  • FIG. 11 shows the fragment sorting results in Example 3.
  • FIG. 12 shows that during the sequencing data splitting and tags feature analysis in Example 3, the reads generated by random amplification of primers Hd1_F01 and AD1_R3B are significantly enriched on the genomes of pigs and dogs, stably forming high-depth tags sites.
  • FIG. 13 shows the distribution positions of the common tags in three replicates of dog samples and pig samples in Example 3 on the genome.
  • each raw reagent material can be obtained commercially, and the experimental method without specifying specific conditions is a conventional method and conventional conditions well known in the art, or according to the conditions recommended by the instrument manufacturer.
  • the present invention designs a pair of primers to perform a PCR amplification on the genomic DNA of the material to be tested, and then performs high-throughput sequencing and data analysis to complete the simultaneous detection of the foreground gene and the genetic background.
  • The primer pair consists of a forward sequence-specific primer F, designed on the foreground gene, and a reverse random degenerate primer R. The foreground gene is screened based on the PCR amplification that follows specific binding between primer F and the foreground gene sequence; the genetic background is detected based on the non-specific amplification produced by primers F and R in the PCR reaction.
  • the method of the present invention mainly includes:
  • a conventional forward sequence-specific primer F with a length of 18-25 nt is designed near the polymorphic site within the gene (a 6-10 nt barcode sequence can be added to the 5' end of the F primer to distinguish different samples); according to the genome sequence characteristics of the species to be tested, a reverse random degenerate primer R with a length of 8-20 nt is designed (a 6-10 nt barcode sequence can also be added to the 5' end of the R primer to distinguish different samples).
  • the degenerate base is located in the middle of the primer, and the degeneracy is between 64-4096.
  • genomic DNA of the material to be tested is subjected to a PCR amplification using the primers.
  • the extraction of genomic DNA can refer to the prior art.
  • the genomic DNA of the sample to be tested can be extracted using the CTAB method or a commercial kit, etc.; on agarose gel the DNA should show clear, intact bands, without obvious degradation or RNA contamination.
  • a high-fidelity DNA polymerase is used to perform the PCR reaction, and the annealing temperature and cycle conditions of the primers are adjusted to simultaneously achieve the purpose of detecting the foreground gene and the genetic background.
  • the annealing temperature and the number of cycles can be adjusted according to the Tm value of the forward sequence-specific primer and the total amount of the final PCR amplification product (especially the total amount of the 200-600bp region); the total amount of amplification product is expected to be around 500ng, which is convenient for subsequent library construction and PE150 sequencing.
  • the PCR amplification is a thermal asymmetric staggered PCR amplification, including two-stage cyclic reactions: the first stage cyclic reaction mainly allows more specific primers (compared to random degenerate primers) to bind to the template, thereby obtaining more amplified products; the second stage cyclic reaction mainly allows the degenerate primers to bind to the template more easily (compared to sequence-specific primers) to trigger an amplification reaction.
  • the first stage cyclic reaction includes 3-7 rounds of cycles at a higher annealing temperature, and each cycle includes denaturation, annealing at a higher annealing temperature, and extension.
  • the second stage cyclic reaction includes 6-10 rounds of cycles staggered at a higher annealing temperature and a lower annealing temperature, and each cycle includes 3 cycle units and at least one cycle unit annealed at a higher annealing temperature and at least one cycle unit annealed at a lower annealing temperature.
  • each cycle includes: denaturation, annealing at a higher annealing temperature, extension, denaturation, annealing at a higher annealing temperature, extension, denaturation, annealing at a lower annealing temperature, and extension.
  • the thermal asymmetric staggered PCR amplification of the present invention is performed according to the following conditions:
  • the first stage cyclic reaction: denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; 3-7 cycles;
  • the second stage cyclic reaction: denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; denaturation at 90-98°C for 20s-2min, annealing at a higher annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; denaturation at 90-98°C for 20s-2min, annealing at a lower annealing temperature for 30s-1min, and extension at 72°C for 1min-3min; 6-10 cycles;
  • each higher annealing temperature is independently 50-60°C
  • each lower annealing temperature is independently 40-50°C.
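  • The two-stage program above can be written out as a list of cycle units, which makes the total cycle count easy to check against the 20-37 range given earlier. This is a minimal sketch: the function name `build_tail_pcr_program` and the default values (5 first-stage cycles, 8 second-stage rounds, 95°C denaturation) are illustrative picks inside the stated ranges, and step durations are left out for brevity.

```python
def build_tail_pcr_program(high_anneal_c, low_anneal_c,
                           stage1_cycles=5, stage2_rounds=8,
                           denature_c=95, extend_c=72):
    """Return the cycle units of a two-stage thermal asymmetric program.

    Each cycle unit is a (denature, anneal, extend) temperature tuple in
    degrees C. Range checks follow the text; the defaults are assumptions.
    """
    if not 50 <= high_anneal_c <= 60:
        raise ValueError("higher annealing temperature should be 50-60 C")
    if not 40 <= low_anneal_c <= 50:
        raise ValueError("lower annealing temperature should be 40-50 C")
    if not 5 <= high_anneal_c - low_anneal_c <= 15:
        raise ValueError("lower temperature should be 5-15 C below the higher one")
    if not 3 <= stage1_cycles <= 7:
        raise ValueError("first stage should have 3-7 cycles")
    if not 6 <= stage2_rounds <= 10:
        raise ValueError("second stage should have 6-10 rounds")

    high = (denature_c, high_anneal_c, extend_c)
    low = (denature_c, low_anneal_c, extend_c)
    # Stage 1: every cycle unit anneals at the higher temperature.
    program = [high] * stage1_cycles
    # Stage 2: each round is high, high, low (3 cycle units per round).
    for _ in range(stage2_rounds):
        program += [high, high, low]
    return program


program = build_tail_pcr_program(high_anneal_c=58, low_anneal_c=45)
total_cycle_units = len(program)  # 5 + 8 * 3 = 29
```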
  • the amplified product is purified using magnetic beads, followed by steps such as adding an A to the 3' end and ligating the Y-shaped Illumina sequencing adapter to complete the preparation of the sequencing library. After fragment sorting, the library can be used for PE150 high-throughput sequencing.
  • the sequencing data corresponding to each sample can be obtained and then aligned to the reference genome to detect the read coverage at the forward sequence-specific primer F within the foreground gene (foreground gene screening), as well as the genome-wide distribution of high-depth tag sites that are enriched in reads and have a coverage depth of ≥3×.
  • These polymorphic tag sites can be used for subsequent whole-genome genotype detection (genetic background detection).
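  • One way to picture the identification of high-depth tag sites is to merge consecutive positions whose read depth reaches the ≥3× threshold. The text does not specify how tag boundaries are defined, so treating a tag as a contiguous run of positions at or above the threshold is an assumption, as is the function name `call_high_depth_tags`.

```python
def call_high_depth_tags(depths, min_depth=3):
    """Merge consecutive positions with depth >= min_depth into intervals.

    `depths` is a list of per-base read depths along one chromosome
    (index 0 = position 1). Returns 1-based inclusive (start, end) tuples.
    """
    tags = []
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth >= min_depth:
            if start is None:
                start = pos  # a new tag begins here
        elif start is not None:
            tags.append((start, pos - 1))  # the tag ended at the previous base
            start = None
    if start is not None:
        tags.append((start, len(depths)))  # tag runs to the end of the input
    return tags


# Toy depth profile, invented for illustration.
example_tags = call_high_depth_tags([0, 1, 3, 4, 5, 2, 0, 3, 3])
```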
  • Example 1 Simultaneous detection of the foreground gene Pi2 (rice blast resistance gene) and genetic background in rice using the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03
  • a method for simultaneously detecting the foreground gene Pi2 (rice blast resistance gene) and the genetic background of rice using the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03 is provided.
  • the main process of the method includes:
  • the genomic DNA of young leaves of rice variety Nipponbare was extracted by CTAB method, and after passing the agarose gel test, it was diluted with sterile ultrapure water.
  • the minimum dilution concentration for a 50 μL reaction system should be 3.85 ng/μL; in this embodiment, the DNA was diluted to 10 ng/μL (Qubit concentration).
  • a 200 μL thin-walled PCR tube was used to prepare a reaction system with a total volume of 50 μL on ice, and the mixture was mixed evenly by pipetting up and down 10 times.
  • 3 technical replicates (Pi2-Rep1, Pi2-Rep2 and Pi2-Rep3) were set.
  • the sequence of the forward sequence-specific primer Pi2_F01 used was: 5'-TAACAGCCAA CCTCCGAACAACGCCAACTG-3' (SEQ ID NO: 1; the first 10 nt, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer); the sequence of the reverse random degenerate primer AD1_R03 was: 5'-TCAGTGAGTC GCCVNVNNNCGG-3' (SEQ ID NO: 2; the first 10 nt, underlined in the original, are the barcode sequence attached to the reverse primer; the V and N in the middle are degenerate bases; the degeneracy of this primer is 2304).
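  • The stated degeneracy can be checked by multiplying the number of bases each IUPAC code stands for (V = A/C/G = 3, N = any base = 4). The sketch below uses the standard IUPAC nucleotide codes; the function name `degeneracy` is chosen for this example.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a (possibly degenerate) primer."""
    return prod(IUPAC_FOLD[base] for base in primer.upper())


ad1_r03 = degeneracy("GCCVNVNNNCGG")          # 3 * 4 * 3 * 4 * 4 * 4 = 2304
pi2_f01 = degeneracy("CCTCCGAACAACGCCAACTG")  # no degenerate bases, so 1
```

  • The value for AD1_R03 falls inside the 64-4096 range given for random degenerate primers, and the sequence-specific primer has a degeneracy of 1, consistent with its definition.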
  • the PCR product was purified using 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03), and then eluted with 30 μL sterile ultrapure water to obtain the purified product.
  • concentration and total amount of the purified product were determined using the Qubit 3.0 instrument, and the results are shown in Table 3.
  • the fragment size distribution after purification was detected using Qsep100 nucleic acid fragment analyzer.
  • the fragment size distributions of the three samples were basically consistent, and the detection results of Pi2-Rep1 are shown in Figure 1.
  • the purified products of the three technical replicates were sorted using the Sage Pippin HT nucleic acid fragment sorter, and the fragments in the range of 200-600bp were recovered and diluted to 20ng/μL with sterile ultrapure water. Then 200ng was taken and an end-repair reaction system with a total volume of 60 μL was prepared according to Table 4. The reaction program was: heated lid at 105°C; 30°C for 20min; 72°C for 20min; hold at 4°C. Then, according to Table 5, the 3'-end A-addition and short adapter ligation reaction (100 μL in total) was prepared, and the reaction program was: 20°C for 30min.
  • the reaction product was purified using 0.8× magnetic beads (80 μL; purchased from Vazyme, item number: N411-03), and 36 μL sterile ultrapure water was used for elution to obtain the purified product.
  • the Illumina long adapter ligation and library amplification reaction system was prepared (total volume 50 μL; a different P5xx and P7xx combination was added to each of the three technical replicates, where xx represents different numbers; each replicate has its own P5xx+P7xx combination, which identifies that sample/replicate so that the raw sequencing data can be split according to this combination and assigned to each sample/replicate).
  • the reaction program was set according to Table 7, and this step was completed using the Jena Biometra Tone 96 thermal cycler (Germany; other similar instruments can be used). After the PCR amplification was completed, the product was purified using 0.9× magnetic beads (45 μL; purchased from Vazyme, item number: N411-03) and eluted with 30 μL of sterile ultrapure water to obtain the purified product. The concentration of the purified product was determined using the Qubit 3.0 instrument, and the results are shown in Table 8.
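  • The bead volumes quoted in the three purification steps follow from the bead-to-sample ratio multiplied by the reaction volume. The short sketch below reproduces that arithmetic; the helper name `bead_volume_ul` is an assumption, and the reaction volumes (50 μL, 100 μL, 50 μL) are the totals stated for the corresponding reactions.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Volume of magnetic bead suspension for a given bead-to-sample ratio."""
    return sample_volume_ul * ratio


# (reaction volume in uL, bead ratio) for the three clean-ups in Example 1.
cleanups = {
    "TAIL-PCR product": (50, 1.8),
    "A-addition / short adapter ligation": (100, 0.8),
    "library amplification": (50, 0.9),
}
bead_volumes = {
    step: round(bead_volume_ul(volume, ratio), 1)
    for step, (volume, ratio) in cleanups.items()
}
```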
  • the size distribution of the purified fragments was detected using the Qsep100 nucleic acid fragment analyzer.
  • the library peaks of the three replicate samples were consistent, and the detection results represented by Pi2-Rep1 are shown in Figure 2.
  • reagents suitable for the BGI sequencing platform can also be used to convert the product after PCR amplification in the present invention into the library type required for BGI sequencing.
  • because a specific index (i.e., a different combination of P5xx and P7xx) was added to each sample during library amplification, all three libraries can be mixed together in equal amounts (the three libraries can also be sorted and sequenced separately), and the fragments suitable for subsequent PE150 sequencing (such as 350-700bp) can be sorted out using the Sage Pippin HT nucleic acid fragment sorter (mainly to remove unconsumed primer sequences and short fragments of less than 350bp), as shown in Figure 3 (12.8ng/μL; 384ng in total); high-throughput sequencing was then performed on the Illumina sequencing platform.
  • The numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 19,814, 20,103 and 19,266, respectively; 12,209 tags were common to all three technical replicates, and these common tags accounted for between 60.73% and 63.37% of the total tags detected in each replicate (Table 9).
  • This indicates that, even though the amplification in the present invention is randomly primed, the amplification products remain highly stable and reproducible.
  • The average spacing between adjacent common tags is 30.55 Kb, with a standard deviation of 42.97 Kb, so the common tags are evenly distributed over all chromosomes of the genome (panel D of Figure 4).
  • In summary, a single PCR amplification with the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03 not only completes the screening of the foreground rice blast resistance gene Pi2, but also stably yields a large number of high-depth tags evenly distributed across the genome for genetic background detection.
  • Example 2: Simultaneous detection of the foreground gene Rf4 (rice fertility restorer gene) and the genetic background in rice using the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03
  • This example provides a method for simultaneously detecting the foreground gene Rf4 (rice fertility restorer gene) and the genetic background of rice (the same variety, Nipponbare, as in Example 1) using the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03.
  • The main steps of the method are as follows:
  • A reaction system with a total volume of 50 μL was prepared on ice in a 200 μL thin-walled PCR tube and mixed thoroughly by pipetting up and down 10 times.
  • Three technical replicates (Rf4-Rep1, Rf4-Rep2, and Rf4-Rep3) were set up in parallel.
  • The sequence of the forward sequence-specific primer Rf4_F01 is 5'- TAACAGCCAA CTGCTTACA AAGTGAGGTGGTGT-3' (SEQ ID NO: 3; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer). The sequence of the reverse random degenerate primer AD2_R03 is 5'- TCAGTGAGTC GCCBNBNNNCGG-3' (SEQ ID NO: 4; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the reverse primer; the B and N in the middle are degenerate bases; the degeneracy of this primer is 2304).
  • The PCR product was purified with 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03) and then eluted with 30 μL of sterile ultrapure water to obtain the purified product.
  • The concentration and total amount of the purified product were measured with a Qubit 3.0 instrument, and the results are shown in Table 10.
  • The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer.
  • The fragment size distributions of the three samples were essentially consistent; the result for Rf4-Rep1, as a representative, is shown in Figure 5.
  • Reagents suitable for the BGI sequencing platform can also be used to convert the PCR amplification products of the present invention into the library type required for BGI sequencing.
  • The numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 18,578, 18,504 and 20,939, respectively; 12,350 tags were common to all three technical replicates, and these common tags accounted for between 58.98% and 66.74% of the total tags detected in each replicate (Table 14).
  • The average spacing between adjacent common tags is 30.02 Kb, with a standard deviation of 38.51 Kb, so the common tags are evenly distributed over all chromosomes of the genome (panel D of Figure 8).
  • In summary, a single PCR amplification with the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03 not only completes the screening of the foreground rice fertility restorer gene Rf4, but also stably yields a large number of high-depth tags evenly distributed across the genome for genetic background detection.
  • Example 3: Whole-genome genetic background detection of animal samples using the forward sequence-specific primer Hd1_F01 and the reverse random degenerate primer AD1_R3B
  • In this example, the forward sequence-specific primer Hd1_F01 and the reverse random degenerate primer AD1_R3B were used to perform whole-genome genetic background detection on two animal DNA samples (a golden retriever sample, dog_139, and a Duroc pig sample, pig_421).
  • The main steps of the method are as follows:
  • Blood DNA of the dog_139 and pig_421 samples was extracted with the FastPure Blood DNA Isolation Mini Kit V2 (purchased from Vazyme, catalog number: DC111-01) according to the manufacturer's instructions. After passing agarose gel inspection, the samples were diluted with sterile ultrapure water. For the 50 μL reaction system the diluted concentration should be at least 3.85 ng/μL; in this example it was diluted to 10 ng/μL (Qubit concentration).
  • A reaction system with a total volume of 50 μL was prepared on ice in a 200 μL thin-walled PCR tube and mixed thoroughly by pipetting up and down 10 times.
  • Three technical replicates were set up for each sample (dog_139-Rep1, dog_139-Rep2, and dog_139-Rep3; pig_421-Rep1, pig_421-Rep2, and pig_421-Rep3).
  • The sequence of the forward sequence-specific primer Hd1_F01 is 5'- TAACAGCCAA AGGACGGAGGTGGCCGGGATGGT-3' (SEQ ID NO: 5; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer). The sequence of the reverse random degenerate primer AD1_R3B is 5'- TCAGTGAGTC GCCVAVNGNCGG-3' (SEQ ID NO: 6; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the reverse primer; the V and N in the middle are degenerate bases; the degeneracy of this primer is 144).
  • The PCR product was purified with 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03) and then eluted with 50 μL of sterile ultrapure water to obtain the purified product.
  • The concentration and total amount of the purified product were determined with a Qubit 3.0 instrument, and the results are shown in Table 15.
  • The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer.
  • The fragment size distributions of the three samples were essentially consistent.
  • The results for the dog_139-Rep1 and pig_421-Rep1 samples, as representatives, are shown in Figure 9.
  • The library was constructed and purified according to step (5) of Example 2 and finally eluted with 21 μL of sterile ultrapure water to obtain the purified product.
  • The concentration of the purified product was determined with a Qubit 3.0 instrument, and the results are shown in Table 16.
  • The size distribution of the purified library fragments was examined with a Qsep100 nucleic acid fragment analyzer.
  • The library peak shapes of the three replicate samples for the dog and for the pig were essentially the same; the library results for dog_139-Rep1 and pig_421-Rep1, as representatives, are shown in Figure 10.
  • Library mixing and fragment sorting were performed according to step (5) of Example 2, and the fragments suitable for subsequent PE150 sequencing (e.g., 400-800 bp) were selected from the pooled library, as shown in Figure 11 (2.05 ng/μL; 61.5 ng in total); high-throughput sequencing was performed on the Illumina sequencing platform.
  • The data volumes of the three technical replicates of dog_139 were 1.02 Gb, 1.13 Gb, and 1.29 Gb, respectively; those of the three technical replicates of pig_421 were 1.14 Gb, 1.08 Gb, and 1.48 Gb, respectively (Table 17).
  • For the dog_139 sample, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 110,930, 115,395 and 122,240, respectively; 71,885 tags were common to all three technical replicates, and these common tags accounted for between 58.81% and 64.80% of the total tags detected in each replicate (Table 17).
  • For the pig_421 sample, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 118,255, 111,705 and 133,166, respectively; 74,937 tags were common to all three technical replicates, and these common tags accounted for between 56.27% and 67.08% of the total tags detected in each replicate (Table 17).
  • In summary, the method of the present invention for enriching a part of a genome can be used not only for the simultaneous detection of foreground genes and the genetic background, but also as a new genotyping technique for whole-genome genotype detection in animals and plants.
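The reproducibility figures quoted for the three examples can be re-derived from the reported tag counts. The sketch below is only an arithmetic check, assuming that each percentage is the number of common tags divided by the number of tags in one replicate; the function and dictionary names are illustrative and do not come from the patent.

```python
# Tag counts per technical replicate and common-tag counts, as reported in the text.
REPORTED = {
    "Pi2 (rice)": ([19814, 20103, 19266], 12209),
    "Rf4 (rice)": ([18578, 18504, 20939], 12350),
    "dog_139": ([110930, 115395, 122240], 71885),
    "pig_421": ([118255, 111705, 133166], 74937),
}


def shared_fraction_range(per_replicate, shared):
    """Return (min %, max %) of common tags relative to each replicate's tag count."""
    percents = [100.0 * shared / n for n in per_replicate]
    return round(min(percents), 2), round(max(percents), 2)


for name, (counts, shared) in REPORTED.items():
    low, high = shared_fraction_range(counts, shared)
    print(f"{name}: {low:.2f}%-{high:.2f}%")
```

Under this assumption the computed ranges match the ranges stated for Tables 9, 14 and 17.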


Abstract

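A method for enriching a part of a genome and related applications. The method comprises: pairing a sequence-specific primer with a random degenerate primer to perform thermal asymmetric interlaced PCR amplification on the genomic DNA of a sample to be tested; and purifying the PCR amplification products and constructing a sequencing library. By exploiting the specific annealing of the sequence-specific primer at its target site and its non-specific annealing at tens of thousands of other sites in the genome, together with pairing with the random degenerate primer, (thermal asymmetric interlaced) PCR amplification enriches a part of the genome, so that the foreground gene and the genome-wide genetic background of a sample can be detected simultaneously, which is of significant value for genetic and breeding tests.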

Description

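A Method for Enriching a Part of a Genome and Related Applications
Technical Field
The present invention relates to a method for enriching a part of a genome and related applications, and belongs to the technical field of genetic engineering.
Background Art
In genetic-engineering-based testing, for example the whole-genome testing needed in animal and plant genetics or breeding, methods that enrich and analyze a part of the genome are frequently used. In marker-assisted selection breeding, for instance, the foreground gene is first tracked and screened with DNA molecular markers or biochemical markers, and the genetic background is then examined with molecular markers evenly distributed across the genome. The aim is to obtain new material that carries as small a foreground gene fragment as possible while keeping the genetic background as close as possible to the recipient variety, for use in subsequent breeding. Common techniques for detecting the genetic background mainly include simple sequence repeat (SSR) molecular marker detection, DNA chip detection, multiplex PCR detection, reduced-representation genome sequencing and whole genome resequencing. Among these: ① SSR marker detection is based on short tandem repeats widely distributed across the genome, which are polymorphic between samples because of differences in the number of repeat units; genome-wide SSR detection requires a pair of specific primers to be designed in advance in the flanking sequence of every SSR locus, after which the genotype of each SSR locus is determined one by one. ② DNA chip detection mainly compares the genome sequences of the core germplasm of the species, selects variant sites that are common in the genome for nucleic acid hybridization, and thereby determines the genomic polymorphism of the test material. ③ Multiplex PCR detection amplifies multiple target sites (from a few to several thousand) distributed across the genome in a single-tube PCR; because each target site requires its own pair of specific primers, the reaction system contains the amplification primers for all sites. ④ Reduced-representation genome sequencing mainly performs high-throughput sequencing of the regions near specific restriction enzyme sites in the genome (about 10%); library construction for this class of techniques mainly involves restriction enzyme selection, digestion, adapter ligation, PCR amplification and multiple purification steps, and generally yields sequence variation information for tens of thousands to more than a million regions near restriction sites (depending on the amount of sequencing data and the enzymes used) for genome-wide polymorphism analysis. ⑤ Whole genome resequencing sequences the entire genome; genomic DNA generally has to be randomly fragmented by sonication, transposase or fragmentase before adapter ligation and subsequent PCR amplification for library construction, and it typically yields variation information for several hundred thousand to several million sites covering the whole genome.
However, the above methods for detecting the genetic background have the following drawbacks: ① SSR marker detection requires each locus to be amplified independently and each amplification product to be examined independently by agarose or polyacrylamide gel electrophoresis, so the technique is outdated and inefficient when the number of samples is large. ② DNA chip detection requires a high up-front cost for chip design and manufacture; the sites detected are fixed, so when other sites need to be detected or the chip is to be used for another species, it must be redesigned and remanufactured, which makes it poorly generalizable. ③ Multiplex PCR detection requires primers to be designed for every target site; once the primer sequences are fixed, synthesizing them in bulk is expensive; and because the PCR system contains all primer sequences, the primers can interfere with one another. ④ Reduced-representation genome sequencing involves multiple digestion and ligation steps in library construction, so the procedure is cumbersome, and the efficiency of the multi-step enzymatic reactions is difficult to keep consistent between samples, which leads to a high rate of missing variant sites across samples in the sequencing data. ⑤ Whole genome resequencing has a relatively high library construction cost when sonication or transposase methods are used; it generally requires a large amount of sequencing data per sample (coverage of 5× or more), and the number of variant sites finally obtained is excessive (in general breeding, a few hundred to a few thousand markers evenly distributed across the genome are sufficient for background detection), resulting in severe information redundancy.
In addition, most of the above methods cannot detect the foreground gene and the genetic background simultaneously, as is needed in marker-assisted breeding. Foreground gene screening and genetic background detection are two mutually independent steps, so the whole screening and identification process is time-consuming and operationally cumbersome.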
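Summary of the Invention
One object of the present invention is to provide a method for enriching a part of a genome.
Another object of the present invention is to provide applications of said method, specifically its use in genetic and breeding tests of species, for example detecting the foreground gene and the genetic background in breeding.
In one aspect, the present invention provides a method for enriching a part of a genome, the method comprising:
pairing a sequence-specific primer (Sequence Specific Primer) with a random degenerate primer (Arbitrary Degenerate Primer) to perform thermal asymmetric interlaced PCR (Thermal Asymmetric Interlaced PCR, TAIL-PCR) amplification on the genomic DNA of a sample to be tested;
purifying the PCR amplification products and constructing a sequencing library.
According to specific embodiments of the present invention, the "sequence-specific primer" refers to a primer that contains no degenerate bases, i.e., its degeneracy is 1.
The method of the present invention for enriching a part of a genome can be used to detect genome-wide background genotypes (i.e., without regard to a foreground gene), with an effect similar to reduced-representation genome sequencing; it can also be used to detect the foreground gene and the genetic background simultaneously.
In the method of the present invention for enriching a part of a genome, construction of the corresponding sequencing library requires only two primers: one sequence-specific primer and one primer containing random degenerate bases (i.e., the random degenerate primer). By exploiting the specific annealing of the sequence-specific primer at its target site and its non-specific annealing at tens of thousands of other sites in the genome, together with pairing with the random degenerate primer, a single thermal asymmetric interlaced PCR amplification can enrich a part of the genome (1%-10%). Specifically, detection of the genome-wide genetic background relies mainly on non-specific amplification between the primers and the DNA template: as long as the two primers bind the genome, specifically or non-specifically, at a suitable distance from each other, the fragment between them can be amplified and sequenced, and these fragments can be evenly distributed across the genome. Furthermore, in the method of the present invention, fragments within a certain size range can be selected from all amplification products. For example, when a PE150 sequencing strategy is subsequently chosen, amplification products within a target length range (e.g., 250-600 bp) can be selected for library construction and sequencing; alternatively, sequencing adapters can be ligated directly to both ends of all amplification products (including those of 250-600 bp), and after library construction a fragment-sorting instrument such as SageELF or Pippin HT can be used to select the region suitable for PE150 sequencing (350-700 bp, including the sequencing adapters), followed by high-throughput sequencing. When the method is required to detect a foreground gene, detection of the foreground gene relies mainly on specific amplification of the target gene by the sequence-specific primer. By using the specific annealing of the sequence-specific primer at its target site and its non-specific annealing at tens of thousands of other sites, together with paired amplification with the random degenerate primer, the present invention can detect the foreground marker and the genome-wide background markers of a sample in a single PCR amplification, which is of significant value for genetic and breeding tests.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the sequence-specific primer can be designed from the sequence of any region of the genome. Alternatively, when the method is required to detect a foreground gene, the sequence-specific primer can be designed from the foreground gene sequence (for example, near a polymorphic site inside the foreground gene sequence). More specifically, the binding position of the sequence-specific primer is within 150 bp of the polymorphic site in the foreground gene, for example 10-150 bp. This ensures that the sequencing read covers the polymorphic site and also ensures complete linkage between the sequence-specific primer and the foreground gene.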
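According to specific embodiments of the present invention, in the method for enriching a part of a genome, the sequence-specific primer is 18-30 nt long, preferably 18-25 nt.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, an additional barcode sequence of 6-10 nt can be added to the 5' end of the sequence-specific primer. The barcode sequence can be used to distinguish samples.
According to specific embodiments of the present invention, in the method of the present invention, it is sufficient for the four bases A, T, C and G to be evenly distributed in the barcode sequence, avoiding homopolymer runs such as consecutive AAAA. The present invention places no other special requirements on the barcode sequence, provided that it can distinguish different samples.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the random degenerate primer is 8-20 nt long, preferably 8-15 nt, more preferably 10-15 nt. Preferably, the degeneracy of the random degenerate primer is 64-4096, preferably 120-3072.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, an additional barcode sequence of 6-10 nt can be added to the 5' end of the random degenerate primer. The barcode sequence can be used to distinguish samples.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the amount of random degenerate primer exceeds the amount of sequence-specific primer during PCR amplification.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the PCR reaction system uses a high-fidelity DNA polymerase (a DNA polymerase for high-throughput sequencing libraries). The specific operation can follow the prior art in the field.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the annealing temperature of the primers and the cycling conditions can be adjusted as needed during PCR amplification. The annealing temperature can be adjusted according to the Tm values of the primers. In general, the total number of amplification cycles can be 20-37 (one pass of "denaturation, annealing, extension" is one cycle unit and counts as 1 cycle). The number of PCR cycles can also be increased or decreased according to the amount of PCR product; amplifying about 500 ng is generally sufficient for subsequent library construction.
According to specific embodiments of the present invention, in the method for enriching a part of a genome, the thermal asymmetric interlaced PCR amplification includes cycling reactions performed alternately at a higher annealing temperature and a lower annealing temperature.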
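In some specific embodiments of the present invention, the thermal asymmetric interlaced PCR amplification in the method for enriching a part of a genome includes 6-10 rounds of cycling that alternate between a higher annealing temperature and a lower annealing temperature. Each round includes 2-3 cycle units, of which at least 1 anneals at the higher annealing temperature and at least 1 anneals at the lower annealing temperature. More specifically, each round uses one of the following patterns (1) to (8):
(1) denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension (i.e., each round includes 3 cycle units and counts as 3 cycles, so 6-10 rounds count as 18-30 cycles);
(2) denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension;
(3) denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension;
(4) denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; or
(5) denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; or
(6) denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; or
(7) denaturation, annealing at the lower annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; or
(8) denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension.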
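According to specific embodiments of the present invention, in the thermal asymmetric interlaced PCR amplification of the method for enriching a part of a genome, the higher annealing temperature is 50-60°C and the lower annealing temperature is 5-15°C below the higher annealing temperature. Preferably, the lower annealing temperature is 40-50°C. In the thermal asymmetric interlaced PCR amplification of the present invention, the higher temperatures are preferably substantially the same as one another, and the lower temperatures are preferably substantially the same as one another.
In some specific embodiments of the present invention, in the method for enriching a part of a genome, the thermal asymmetric interlaced PCR amplification includes at least two stages of cycling. The first stage mainly allows more of the specific primer (relative to the random degenerate primer) to bind to the template, thereby yielding more amplification product; the second stage mainly allows the degenerate primer to bind to the template more easily (relative to the sequence-specific primer) and prime amplification. Specifically, the thermal asymmetric interlaced PCR amplification includes at least two stages of cycling:
The first stage includes 3-7 rounds of cycling at the higher annealing temperature, each round including denaturation, annealing at the higher annealing temperature, and extension. Preferably, each round includes: denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, and extension at 72°C for 1 min-3 min;
The second stage includes 6-10 rounds of cycling that alternate between the higher annealing temperature and the lower annealing temperature, each round including 2-3 cycle units, of which at least 1 anneals at the higher annealing temperature and at least 1 anneals at the lower annealing temperature. Preferably, each round uses one of the patterns (1) to (8) described above. More preferably, each round includes: denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension. Still more preferably, each round includes: denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the lower annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min.
In some specific embodiments of the present invention, the thermal asymmetric interlaced PCR amplification is performed under the following conditions:
initial denaturation at 90-98°C for 1 min-5 min;
denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 3-7 rounds;
denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the lower annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 6-10 rounds;
final extension at 72°C for 5 min-10 min;
wherein each higher annealing temperature is independently 50-60°C, and each lower annealing temperature is independently 40-50°C.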
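According to specific embodiments of the present invention, in the method for enriching a part of a genome, purification of the PCR amplification products and construction of the sequencing library can follow routine practice in the field. Preferably in the present invention, after the PCR amplification products are purified, an A is added to the 3' end and Y-shaped Illumina sequencing adapters are ligated to complete preparation of the sequencing library.
According to specific embodiments of the present invention, the method for enriching a part of a genome further comprises: sequencing the sequencing library, or using the sequencing library for high-throughput sequencing after fragment sorting. As mentioned above, in the method of the present invention, fragments within a certain size range can be selected from all amplification products. For example, when a PE150 sequencing strategy is subsequently chosen, amplification products within a target length range (e.g., 250-600 bp) can be selected for library construction and sequencing; alternatively, sequencing adapters can be ligated directly to both ends of all amplification products (including those of 250-600 bp), and after library construction a fragment-sorting instrument such as SageELF or Pippin HT can be used to select the region suitable for PE150 sequencing (350-700 bp), followed by high-throughput sequencing.
According to specific embodiments of the present invention, the method for enriching a part of a genome further comprises:
demultiplexing the high-throughput sequencing data of the sequencing library, aligning them to the reference genome, and examining the read coverage inside the foreground gene at the sequence-specific primer site, as well as the genome-wide distribution of high-depth tag sites formed by read enrichment with coverage depth ≥ 3×; these high-depth tag sites can then be used for subsequent genome-wide genotype detection.
In another aspect, the present invention also provides use of the method for enriching a part of a genome in marker-assisted selection breeding or genomic breeding. Specifically, the method can be used for genome-wide genetic background detection of a species, or for detection of both the foreground gene and the genetic background of a species. The method of the present invention can be used to sequence species such as animals and plants. In some specific embodiments, the species include, but are not limited to, rice, dog and pig.
In some specific embodiments, the method of the present invention for enriching a part of a genome can be used to detect genome-wide background genotypes (i.e., without regard to a foreground gene), with an effect similar to reduced-representation genome sequencing.
In some specific embodiments, the method of the present invention for enriching a part of a genome can also detect the foreground gene and the genetic background simultaneously, making the detection process faster and more efficient. Moreover, the sequence-specific primer used is located near a polymorphic site inside the foreground gene, which ensures that its specific amplification product is completely linked to the foreground gene. When the genetic background is detected, the specific annealing of the sequence-specific primer at its target site and its non-specific annealing at tens of thousands of other sites in the genome, together with pairing with the random degenerate primer, yield tag sites that appear reproducibly and cover the whole genome evenly. Compared with existing genetic background detection techniques based on reduced-representation genome sequencing or DNA chips, the method of the present invention does not require the genome to be fragmented in advance; a single PCR amplification of the genome with only the sequence-specific primer and the random degenerate primer stably yields tens of thousands of high-depth tag sites evenly covering the whole genome, for detecting the genetic background of breeding material.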
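Brief Description of the Drawings
Figure 1 shows the PCR amplification result for primers Pi2_F01 and AD1_R03 in Example 1.
Figure 2 shows the library construction result in Example 1.
Figure 3 shows the fragment sorting result after pooling the three libraries in Example 1.
Figure 4 shows the distribution on the rice genome of the sequencing data from the three technical replicates in Example 1. Panel A: reads are enriched inside the foreground gene (the blue horizontal line marks the Pi2 gene interval). Panel B: reads are enriched at the Pi2_F01 primer binding site (the red arrow marks the binding position of the sequence-specific primer). Panel C: sequencing reads from the three technical replicates are significantly enriched on the genome and stably form high-depth tags. Panel D: density distribution on the genome of the tags common to the three technical replicates.
Figure 5 shows the PCR amplification result for primers Rf4_F01 and AD2_R03 in Example 2.
Figure 6 shows the fragment size range after pooling the three libraries in Example 2.
Figure 7 shows the fragment sorting result after pooling the three libraries in Example 2.
Figure 8 shows the distribution on the rice genome of the sequencing data from the three technical replicates in Example 2. Panel A: reads are enriched inside the foreground gene (the blue horizontal line marks the Rf4 gene interval). Panel B: reads are enriched at the Rf4_F01 primer binding site (the red arrow marks the binding position of the sequence-specific primer). Panel C: sequencing reads from the three technical replicates are significantly enriched on the genome and stably form high-depth tags. Panel D: density distribution on the genome of the tags common to the three technical replicates.
Figure 9 shows the results for the dog_139-Rep1 and pig_421-Rep1 samples, as representatives, from PCR product purification and fragment distribution analysis in Example 3.
Figure 10 shows the library results for the dog_139-Rep1 and pig_421-Rep1 samples, as representatives, from high-throughput sequencing library construction in Example 3.
Figure 11 shows the fragment sorting result in Example 3.
Figure 12 shows, from sequencing data demultiplexing and tag analysis in Example 3, that reads produced by random amplification with primers Hd1_F01 and AD1_R3B are significantly enriched on the pig and dog genomes and stably form high-depth tag sites.
Figure 13 shows the positions on the genome of the tags common to the three replicates of the dog sample and of the pig sample in Example 3.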
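Detailed Description of Embodiments
To give a clearer understanding of the technical features, objects and beneficial effects of the present invention, the technical solutions of the present invention are described in detail below with reference to specific examples. It should be understood that these examples serve only to illustrate the present invention and not to limit its scope. In the examples, all starting reagents and materials are commercially available, and experimental methods for which no specific conditions are given are conventional methods and conditions well known in the field, or the conditions recommended by the instrument manufacturer.
Unless specifically defined otherwise, all technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the relevant field.
The present invention designs one pair of primers, performs a single PCR amplification on the genomic DNA of the material to be tested, and then, through high-throughput sequencing and data analysis, detects the foreground gene and the genetic background simultaneously. The primer pair consists of a forward sequence-specific primer F that binds the foreground gene and a reverse random degenerate primer R. Foreground gene screening is accomplished by PCR amplification following specific binding of the forward sequence-specific primer F to the foreground gene sequence; genetic background detection is accomplished by the non-specific amplification primed in the PCR by the forward sequence-specific primer F and the reverse random degenerate primer R. The method of the present invention mainly includes: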
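(1) Primer design
Based on the foreground gene sequence, a conventional forward sequence-specific primer F of 18-25 nt is designed near a polymorphic site inside the gene (to distinguish different samples, an additional barcode sequence of 6-10 nt can be added to the 5' end of primer F). Based on the genome sequence characteristics of the species to be tested, a reverse random degenerate primer R of 8-20 nt is designed (to distinguish different samples, an additional barcode sequence of 6-10 nt can likewise be added to the 5' end of primer R). In the random degenerate primer R, the degenerate bases are located in the middle of the primer, and the degeneracy is between 64 and 4096.
(2) PCR amplification
A single PCR amplification is performed on the genomic DNA of the material to be tested with these primers. Genomic DNA can be extracted according to the prior art. In the present invention, depending on the species or tissue, the genomic DNA of the sample can be extracted by the CTAB method or with a commercial kit; the DNA must show a clear, intact band on an agarose gel, with no obvious degradation or RNA contamination.
In the PCR reaction system, a high-fidelity DNA polymerase is used, and the annealing temperature of the primers and the cycling conditions are adjusted so that the foreground gene and the genetic background are both detected. For example, the conditions can be adjusted according to the Tm value of the forward sequence-specific primer and the total amount of final PCR product (especially the amount in the 200-600 bp range); a total of about 500 ng of amplification product is desirable, which is convenient for subsequent library construction followed by PE150 sequencing.
The PCR amplification is thermal asymmetric interlaced PCR amplification and includes two stages of cycling. The first stage mainly allows more of the specific primer (relative to the random degenerate primer) to bind to the template, thereby yielding more amplification product; the second stage mainly allows the degenerate primer to bind to the template more easily (relative to the sequence-specific primer) and prime amplification. Specifically, the first stage includes 3-7 rounds of cycling at the higher annealing temperature, each round including denaturation, annealing at the higher annealing temperature, and extension. The second stage includes 6-10 rounds of cycling that alternate between the higher annealing temperature and the lower annealing temperature, each round including 3 cycle units, of which at least 1 anneals at the higher annealing temperature and at least 1 anneals at the lower annealing temperature. Preferably, each round includes: denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension.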
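In some more specific embodiments, the thermal asymmetric interlaced PCR amplification of the present invention is performed under the following conditions:
initial denaturation at 90-98°C for 1 min-5 min;
first-stage cycling: denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 3-7 rounds;
second-stage cycling: denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the lower annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 6-10 rounds;
final extension at 72°C for 5 min-10 min;
wherein each higher annealing temperature is independently 50-60°C, and each lower annealing temperature is independently 40-50°C.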
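The cycle counting implied by the two-stage program can be sketched as follows. This is only an illustration under the counting rule stated in the description (one denaturation/annealing/extension pass is one cycle); the function name and the enumeration are hypothetical, and the actual program of Example 1 is in Table 2, which is not reproduced in this text.

```python
def total_cycles(stage1_rounds, stage2_rounds, units_per_round=3):
    """Count denature/anneal/extend units for the two-stage program.

    Stage 1 contributes one unit per round; stage 2 contributes
    `units_per_round` units per round (2 or 3 in the description).
    """
    if not 3 <= stage1_rounds <= 7:
        raise ValueError("stage 1 is described as 3-7 rounds")
    if not 6 <= stage2_rounds <= 10:
        raise ValueError("stage 2 is described as 6-10 rounds")
    if units_per_round not in (2, 3):
        raise ValueError("each stage-2 round is described as 2-3 cycle units")
    return stage1_rounds + stage2_rounds * units_per_round


# Smallest and largest totals when every stage-2 round has 3 cycle units.
print(total_cycles(3, 6), total_cycles(7, 10))

# (stage 1, stage 2) round counts that would give the 26 cycles reported in Example 1.
combos_26 = [
    (s1, s2)
    for s1 in range(3, 8)
    for s2 in range(6, 11)
    if total_cycles(s1, s2) == 26
]
print(combos_26)
```

With three units per stage-2 round, the only combination inside the stated ranges that totals 26 is 5 first-stage rounds plus 7 second-stage rounds; this is consistent with, but not confirmed as, the program used in Example 1.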
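(3) Amplification product purification and library construction
The amplification products are purified with magnetic beads, followed by 3'-end A-tailing and ligation of Y-shaped Illumina sequencing adapters to complete preparation of the sequencing library; after fragment sorting, the library can be used for PE150 high-throughput sequencing.
(4) Sequencing data analysis and genotype identification
The raw sequencing data are demultiplexed by the sample-specific barcode sequences to obtain the sequencing data for each sample, which are then aligned to the reference genome. The read coverage inside the foreground gene at the site of the forward sequence-specific primer F is examined (foreground gene screening), together with the genome-wide distribution of high-depth tag sites formed by read enrichment with coverage depth ≥ 3×; these polymorphic tag sites can then be used for subsequent genome-wide genotype detection (genetic background detection).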
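The depth ≥ 3× criterion can be illustrated with a small sketch that scans a per-base depth track and reports contiguous high-depth intervals. The patent does not spell out how a tag site is delimited, so treating each maximal run of positions at or above the threshold as one tag is an assumption made here for illustration, and `call_tags` is a hypothetical helper, not part of any named tool.

```python
def call_tags(depths, min_depth=3):
    """Return half-open (start, end) intervals where every position has depth >= min_depth.

    `depths` is a list of per-base coverage values for one chromosome,
    indexed from 0. Each maximal run at or above the threshold is one interval.
    """
    tags = []
    start = None
    for pos, depth in enumerate(depths):
        if depth >= min_depth and start is None:
            start = pos                     # a new high-depth run begins
        elif depth < min_depth and start is not None:
            tags.append((start, pos))       # the run ended just before pos
            start = None
    if start is not None:                   # run extends to the end of the track
        tags.append((start, len(depths)))
    return tags


example_depths = [0, 1, 3, 5, 4, 2, 0, 7, 9, 3]
print(call_tags(example_depths))
```

In practice the depth track would come from the aligned reads of one replicate, and intervals from several replicates could then be intersected to find common tags.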
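Example 1: Simultaneous detection of the foreground gene Pi2 (rice blast resistance gene) and the genetic background in rice using the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03
This example provides a method for simultaneously detecting the foreground gene Pi2 (rice blast resistance gene) and the genetic background in rice using the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03. The main steps of the method are as follows:
(1) Genomic DNA extraction and dilution
Genomic DNA was extracted from young leaves of the rice variety Nipponbare by the CTAB method and, after passing agarose gel inspection, diluted with sterile ultrapure water. For the 50 μL reaction system the diluted concentration should be at least 3.85 ng/μL; in this example it was diluted to 10 ng/μL (Qubit concentration).
(2) PCR reaction system preparation
Following the components and amounts of the PCR reaction system in Table 1, a reaction system with a total volume of 50 μL was prepared on ice in a 200 μL thin-walled PCR tube and mixed thoroughly by pipetting up and down 10 times. Three technical replicates (Pi2-Rep1, Pi2-Rep2 and Pi2-Rep3) were set up in parallel. The sequence of the forward sequence-specific primer Pi2_F01 is 5'-TAACAGCCAACCTCCGAACAACGCCAACTG-3' (SEQ ID NO: 1; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer). The sequence of the reverse random degenerate primer AD1_R03 is 5'-TCAGTGAGTCGCCVNVNNNCGG-3' (SEQ ID NO: 2; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the reverse primer; the V and N in the middle are degenerate bases; the degeneracy of this primer is 2304).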
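Table 1. PCR reaction system (50 μL)
(3) PCR amplification program setup
After brief centrifugation in a mini centrifuge to collect all 50 μL of liquid at the bottom of the PCR tube, the PCR amplification program was set according to Table 2 (the lid must be preheated to 105°C), with 26 cycles in total. The thermal cycler used was a Jena Biometra Tone 96 (Germany; other similar instruments can be used).
Table 2. PCR reaction program
(4) PCR product purification and fragment distribution analysis
The PCR product was purified with 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03) and then eluted with 30 μL of sterile ultrapure water to obtain the purified product. The concentration and total amount of the purified product were measured with a Qubit 3.0 instrument, and the results are shown in Table 3.
Table 3. Concentration and total amount of purified PCR product
The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer. The fragment size distributions of the three samples were essentially consistent; the result for Pi2-Rep1 is shown in Figure 1.
(5) High-throughput sequencing library construction
The purified products of the three technical replicates were size-selected with a Sage Pippin HT nucleic acid fragment sorter, and fragments in the 200-600 bp range were recovered and diluted to 20 ng/μL with sterile ultrapure water. Next, 200 ng was used to prepare an end-repair reaction system with a total volume of 60 μL according to Table 4; the reaction program was: heated lid at 105°C; 30°C for 20 min; 72°C for 20 min; end at 4°C. The 3'-end "A"-tailing and short-adapter ligation reaction (100 μL in total) was then prepared according to Table 5, with the reaction program: 20°C for 30 min. After the reaction, the product was purified with 0.8× magnetic beads (80 μL; purchased from Vazyme, catalog number: N411-03) and eluted with 36 μL of sterile ultrapure water to obtain the purified product. The Illumina long-adapter ligation and library amplification reaction system was then prepared according to Table 6 (total volume 50 μL; a different P5xx/P7xx combination was added to each of the three technical replicates, where xx denotes a numeric index; each replicate carries its own P5xx+P7xx combination, which identifies that sample/replicate so that the raw sequencing data can be demultiplexed by this combination and assigned to each sample/replicate). The reaction program was set according to Table 7, and this step was run on a Jena Biometra Tone 96 thermal cycler (Germany; other similar instruments can be used). After PCR amplification, the product was purified with 0.9× magnetic beads (45 μL; purchased from Vazyme, catalog number: N411-03) and eluted with 30 μL of sterile ultrapure water to obtain the purified product. The concentration of the purified product was determined with a Qubit 3.0 instrument, and the results are shown in Table 8.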
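The bead volumes quoted in steps (4) and (5) follow from the bead-to-sample volume ratio. The sketch below only reproduces that arithmetic, assuming that "1.8×" means a bead volume equal to 1.8 times the reaction volume being purified; `bead_volume_ul` is an illustrative helper name.

```python
def bead_volume_ul(ratio, sample_volume_ul):
    """Bead volume (μL) for a bead-to-sample volume ratio such as 1.8 for '1.8x'."""
    if ratio <= 0 or sample_volume_ul <= 0:
        raise ValueError("ratio and sample volume must be positive")
    return round(ratio * sample_volume_ul, 2)


# (ratio, reaction volume in μL) pairs taken from the purification steps of Example 1.
for ratio, volume in [(1.8, 50), (0.8, 100), (0.9, 50)]:
    print(f"{ratio}x beads on {volume} μL -> {bead_volume_ul(ratio, volume)} μL")
```

The three results correspond to the 90 μL, 80 μL and 45 μL bead volumes stated in the text.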
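Table 4. End-repair reaction system (60 μL)
Table 5. End "A"-tailing and short-adapter ligation reaction system (100 μL)
Table 6. Long-adapter ligation and library amplification reaction (50 μL)
Table 7. Library amplification PCR program
Table 8. Concentration and total amount of purified library
The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer. The library peak shapes of the three replicate samples were consistent; the result for Pi2-Rep1, as a representative, is shown in Figure 2.
Other similar library construction reagents (such as Vazyme's VAHTS Universal Pro DNA Library Prep Kit for Illumina, catalog number: ND608) can also be used for library construction after PCR amplification in the present invention. Besides the Illumina sequencing platform, reagents suitable for the BGI sequencing platform (such as Vazyme's VAHTS Universal Pro DNA Library Prep Kit for MGI, catalog number: NDM608) can also be used to convert the PCR amplification products of the present invention into the library type required for BGI sequencing.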
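(6) Fragment sorting and high-throughput sequencing
Because each sample received a specific index (i.e., a different P5xx/P7xx combination) during library amplification, all three libraries can be mixed together in equal amounts (they can also be sorted and sequenced separately). The Sage Pippin HT nucleic acid fragment sorter was used again to select the fragments suitable for subsequent PE150 sequencing (e.g., 350-700 bp), mainly to remove unconsumed primer sequences and short fragments below 350 bp, as shown in Figure 3 (12.8 ng/μL; 384 ng in total), and high-throughput sequencing was performed on the Illumina sequencing platform.
(7) Sequencing data demultiplexing and tag analysis
After demultiplexing, the data volumes of the three technical replicates were 1.28 Gb, 1.31 Gb and 1.16 Gb, respectively (Table 9). Low-quality reads (base quality below 15) were then removed from each sample's sequencing data, and the remaining reads were aligned to the rice Nipponbare MSU v7.0 reference genome sequence (~0.4 Gb, http://rice.uga.edu/). With the visualization software IGV (https://igv.org/), the sequencing reads can be seen to be significantly enriched at the primer binding site inside the foreground gene Pi2 (panels A and B of Figure 4); in addition, in all three technical replicates the reads produced by random amplification with primers Pi2_F01 and AD1_R03 are significantly enriched on the genome and stably form high-depth tag sites (panel C of Figure 4). From the per-base coverage depth on the reference genome, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 19,814, 20,103 and 19,266, respectively. Of these, 12,209 tags were common to all three technical replicates, accounting for between 60.73% and 63.37% of the total tags detected in each replicate (Table 9). This indicates that, even though the amplification in the present invention is randomly primed, the amplification products remain highly stable and reproducible. When the positions of these common tags on the genome are examined, the average spacing between adjacent tags is 30.55 Kb with a standard deviation of 42.97 Kb, so the common tags are evenly distributed over all chromosomes of the genome (panel D of Figure 4).
Table 9. Sample sequencing data volume and tag counts
In summary, a single PCR amplification with the forward sequence-specific primer Pi2_F01 and the reverse random degenerate primer AD1_R03 not only completes the screening of the foreground rice blast resistance gene Pi2, but also stably yields a large number of high-depth tags evenly distributed across the genome for genetic background detection.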
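Example 2: Simultaneous detection of the foreground gene Rf4 (rice fertility restorer gene) and the genetic background in rice using the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03
This example provides a method for simultaneously detecting the foreground gene Rf4 (rice fertility restorer gene) and the genetic background in rice (the same variety, Nipponbare, as in Example 1) using the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03. The main steps of the method are as follows:
(1) Genomic DNA extraction and dilution
Same as step (1) of Example 1.
(2) PCR reaction system preparation
Following the components in Table 1 (with the primers replaced by those of this example) and their amounts, a reaction system with a total volume of 50 μL was prepared on ice in a 200 μL thin-walled PCR tube and mixed thoroughly by pipetting up and down 10 times. Three technical replicates (Rf4-Rep1, Rf4-Rep2 and Rf4-Rep3) were set up in parallel. The sequence of the forward sequence-specific primer Rf4_F01 is 5'-TAACAGCCAACTGCTTACA AAGTGAGGTGGTGT-3' (SEQ ID NO: 3; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer). The sequence of the reverse random degenerate primer AD2_R03 is 5'-TCAGTGAGTCGCCBNBNNNCGG-3' (SEQ ID NO: 4; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the reverse primer; the B and N in the middle are degenerate bases; the degeneracy of this primer is 2304).
(3) PCR amplification program setup
See step (3) of Example 1.
(4) PCR product purification and fragment distribution analysis
The PCR product was purified with 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03) and then eluted with 30 μL of sterile ultrapure water to obtain the purified product. The concentration and total amount of the purified product were measured with a Qubit 3.0 instrument, and the results are shown in Table 10.
Table 10. Concentration and total amount of purified PCR product
The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer. The fragment size distributions of the three samples were essentially consistent; the result for the Rf4-Rep1 sample, as a representative, is shown in Figure 5.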
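(5) High-throughput sequencing library construction
From each of the above purified products, 5 μL was used to prepare an end-repair reaction system with a total volume of 60 μL according to Table 11; the reaction program was: heated lid at 105°C; 30°C for 20 min; 72°C for 20 min; end at 4°C. The 3'-end "A"-tailing and short-adapter ligation reaction (100 μL in total) was then prepared according to Table 5, with the reaction program: 20°C for 30 min. After the reaction, the product was purified with 0.8× magnetic beads (80 μL; purchased from Vazyme, catalog number: N411-03) and eluted with 36 μL of sterile ultrapure water to obtain the purified product. The Illumina long-adapter ligation and library amplification reaction system was then prepared according to Table 6 (total volume 50 μL; a different P5xx/P7xx combination was added to each of the three technical replicates). The reaction program was set according to Table 12, and this step was run on a Jena Biometra Tone 96 thermal cycler (Germany; other similar instruments can be used). After PCR amplification, the product was purified with 0.9× magnetic beads (45 μL; purchased from Vazyme, catalog number: N411-03) and eluted with 26 μL of sterile ultrapure water to obtain the purified product. The concentration of the purified product was determined with a Qubit 3.0 instrument, and the results are shown in Table 13.
Table 11. End-repair reaction system (60 μL)
Table 12. Library amplification PCR program
Table 13. Concentration and total amount of purified library
Other similar library construction reagents (such as Vazyme's VAHTS Universal Pro DNA Library Prep Kit for Illumina, catalog number: ND608) can also be used for library construction after PCR amplification in the present invention. Besides the Illumina sequencing platform, reagents suitable for the BGI sequencing platform (such as Vazyme's VAHTS Universal Pro DNA Library Prep Kit for MGI, catalog number: NDM608) can also be used to convert the PCR amplification products of the present invention into the library type required for BGI sequencing.
(6) Fragment sorting and high-throughput sequencing
Because each sample received a specific index (i.e., a different P5xx/P7xx combination) during library amplification, and the total amounts after purification were approximately equal (Table 13; mean ± standard deviation: 289.47 ± 3.24), all three libraries could be pooled directly in full. The fragment size range of the pool was examined with a Qsep100 nucleic acid fragment analyzer, as shown in Figure 6. The fragments suitable for subsequent PE150 sequencing (e.g., 350-700 bp) were selected from the pooled library with a Sage Pippin HT nucleic acid fragment sorter, as shown in Figure 7 (1.75 ng/μL; 52.5 ng in total), and high-throughput sequencing was performed on the Illumina sequencing platform.
(7) Sequencing data demultiplexing and tag analysis
After demultiplexing, the data volumes of the three technical replicates were 0.28 Gb, 0.29 Gb and 0.33 Gb, respectively (Table 14). Low-quality reads were then removed from each sample's sequencing data, and the remaining reads were aligned to the rice Nipponbare MSU v7.0 reference genome sequence (~0.4 Gb, http://rice.uga.edu/). With the visualization software IGV (https://igv.org/), the sequencing reads can be seen to be significantly enriched at the primer binding site inside the foreground gene Rf4 (panels A and B of Figure 8); in addition, in all three technical replicates the reads produced by random amplification with primers Rf4_F01 and AD2_R03 are significantly enriched on the genome and stably form high-depth tag sites (panel C of Figure 8). From the per-base coverage depth on the reference genome, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 18,578, 18,504 and 20,939, respectively. Of these, 12,350 tags were common to all three technical replicates, accounting for between 58.98% and 66.74% of the total tags detected in each replicate (Table 14). When the positions of these common tags on the genome are examined, the average spacing between adjacent tags is 30.02 Kb with a standard deviation of 38.51 Kb, so the common tags are evenly distributed over all chromosomes of the genome (panel D of Figure 8).
Table 14. Sample sequencing data volume and tag counts
In summary, a single PCR amplification with the forward sequence-specific primer Rf4_F01 and the reverse random degenerate primer AD2_R03 not only completes the screening of the foreground rice fertility restorer gene Rf4, but also stably yields a large number of high-depth tags evenly distributed across the genome for genetic background detection.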
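Example 3: Whole-genome genetic background detection of animal samples using the forward sequence-specific primer Hd1_F01 and the reverse random degenerate primer AD1_R3B
In this example, the forward sequence-specific primer Hd1_F01 and the reverse random degenerate primer AD1_R3B were used to perform whole-genome genetic background detection on two animal DNA samples (a golden retriever sample, dog_139, and a Duroc pig sample, pig_421). The main steps of the method are as follows:
(1) Genomic DNA extraction and dilution
Blood DNA of the dog_139 and pig_421 samples was extracted with the FastPure Blood DNA Isolation Mini Kit V2 (purchased from Vazyme, catalog number: DC111-01) according to the manufacturer's instructions and, after passing agarose gel inspection, diluted with sterile ultrapure water. For the 50 μL reaction system the diluted concentration should be at least 3.85 ng/μL; in this example it was diluted to 10 ng/μL (Qubit concentration).
(2) PCR reaction system preparation
Following the components and amounts of the PCR reaction system in Table 1, a reaction system with a total volume of 50 μL was prepared on ice in a 200 μL thin-walled PCR tube and mixed thoroughly by pipetting up and down 10 times. Three technical replicates were set up for each sample (dog_139-Rep1, dog_139-Rep2 and dog_139-Rep3; pig_421-Rep1, pig_421-Rep2 and pig_421-Rep3). The sequence of the forward sequence-specific primer Hd1_F01 is 5'-TAACAGCCAAAGGACGGAGGTGGCCGGGATGGT-3' (SEQ ID NO: 5; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the forward sequence-specific primer). The sequence of the reverse random degenerate primer AD1_R3B is 5'-TCAGTGAGTCGCCVAVNGNCGG-3' (SEQ ID NO: 6; the 10 nt at the 5' end, underlined in the original, are the barcode sequence attached to the reverse primer; the V and N in the middle are degenerate bases; the degeneracy of this primer is 144).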
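The degeneracy values stated for the three random degenerate primers can be checked by multiplying, position by position, the number of bases each IUPAC code stands for. The sketch below assumes the standard IUPAC nucleotide code table; the primer sequences are those given in Examples 1-3, while the function and dictionary names are illustrative.

```python
# Number of concrete bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a primer written in IUPAC codes."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_COUNTS[base]
    return total


# Full primer sequences (10 nt barcode followed by the degenerate core) from the examples.
PRIMERS = {
    "AD1_R03": "TCAGTGAGTCGCCVNVNNNCGG",
    "AD2_R03": "TCAGTGAGTCGCCBNBNNNCGG",
    "AD1_R3B": "TCAGTGAGTCGCCVAVNGNCGG",
    "Pi2_F01": "TAACAGCCAACCTCCGAACAACGCCAACTG",
}

for name, sequence in PRIMERS.items():
    print(name, degeneracy(sequence))
```

The barcode contains no degenerate bases, so it does not change the result; a sequence-specific primer such as Pi2_F01 has degeneracy 1, matching the definition given in the description.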
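(3) PCR amplification program setup
See step (3) of Example 1.
(4) PCR product purification and fragment distribution analysis
The PCR product was purified with 1.8× magnetic beads (90 μL; purchased from Vazyme, catalog number: N411-03) and then eluted with 50 μL of sterile ultrapure water to obtain the purified product. The concentration and total amount of the purified product were measured with a Qubit 3.0 instrument, and the results are shown in Table 15.
Table 15. Concentration and total amount of purified PCR product
The fragment size distribution after purification was examined with a Qsep100 nucleic acid fragment analyzer. The fragment size distributions of the three samples were essentially consistent; the results for the dog_139-Rep1 and pig_421-Rep1 samples, as representatives, are shown in Figure 9.
(5) High-throughput sequencing library construction
The library was constructed and purified according to step (5) of Example 2 and finally eluted with 21 μL of sterile ultrapure water to obtain the purified product. The concentration of the purified product was determined with a Qubit 3.0 instrument, and the results are shown in Table 16. The size distribution of the purified library fragments was examined with a Qsep100 nucleic acid fragment analyzer. The library peak shapes of the three replicate samples for the dog and for the pig were essentially the same; the library results for the dog_139-Rep1 and pig_421-Rep1 samples, as representatives, are shown in Figure 10.
Table 16. Concentration and total amount of purified library
(6) Fragment sorting and high-throughput sequencing
Library mixing and fragment sorting were performed according to step (5) of Example 2, and the fragments suitable for subsequent PE150 sequencing (e.g., 400-800 bp) were selected from the pooled library, as shown in Figure 11 (2.05 ng/μL; 61.5 ng in total); high-throughput sequencing was performed on the Illumina sequencing platform.
(7) Sequencing data demultiplexing and tag analysis
After demultiplexing, the data volumes of the three technical replicates of dog_139 were 1.02 Gb, 1.13 Gb and 1.29 Gb, respectively, and those of the three technical replicates of pig_421 were 1.14 Gb, 1.08 Gb and 1.48 Gb, respectively (Table 17). Low-quality reads were then removed from each sample's sequencing data, and the remaining reads were aligned to a commonly used dog reference genome (~2.4 Gb, https://www.ncbi.nlm.nih.gov/assembly/GCA_008641055.3) and pig reference genome (~2.5 Gb, https://www.ncbi.nlm.nih.gov/assembly/GCF_000003025.6/), respectively. With the visualization software IGV (https://igv.org/), the reads produced in the three technical replicates by random amplification with primers Hd1_F01 and AD1_R3B can be seen to be significantly enriched on the pig and dog genomes and to stably form high-depth tag sites (Figure 12).
Table 17. Sample sequencing data volume and tag counts
From the per-base coverage depth on the reference genome, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates of the dog sample dog_139 were 110,930, 115,395 and 122,240, respectively; 71,885 tags were common to all three technical replicates, accounting for between 58.81% and 64.80% of the total tags detected in each replicate (Table 17). For the pig sample pig_421, the numbers of high-depth tag sites with depth ≥ 3× detected across the whole genome in the three technical replicates were 118,255, 111,705 and 133,166, respectively; 74,937 tags were common to all three technical replicates, accounting for between 56.27% and 67.08% of the total tags detected in each replicate (Table 17). When the positions on the genome of the tags common to the three replicates of the dog sample are examined (Figure 13), the average spacing between adjacent tags is 33.43 Kb with a standard deviation of 210.64 Kb; likewise, for the tags common to the three replicates of the pig sample (Figure 13), the average spacing between adjacent tags is 32.39 Kb with a standard deviation of 227.52 Kb. These common tags are therefore distributed fairly evenly over all chromosomes of the dog and the pig.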
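The spacing statistics quoted for the common tags can be computed from tag positions along the following lines. This is a sketch only: the patent does not state how gaps were measured or which standard deviation formula was used, so the choice of start-to-start gaps within each chromosome and of the population standard deviation are assumptions, and the toy coordinates are invented for illustration.

```python
from statistics import mean, pstdev


def spacing_stats(positions_by_chrom):
    """Mean and population SD of gaps between adjacent tags, pooled over chromosomes.

    `positions_by_chrom` maps a chromosome name to a list of tag start
    coordinates (bp). Gaps are only taken between neighbours on the same chromosome.
    """
    gaps = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return mean(gaps), pstdev(gaps)


# Toy coordinates, not data from the patent.
toy = {"chr1": [1000, 31000, 91000], "chr2": [5000, 20000]}
avg_gap, sd_gap = spacing_stats(toy)
print(f"mean spacing = {avg_gap / 1000:.2f} Kb, SD = {sd_gap / 1000:.2f} Kb")
```

Comparing the standard deviation with the mean gives a quick sense of how regular the spacing is; a standard deviation several times larger than the mean, as reported for the dog and pig samples, points to a small number of very long gaps among otherwise closely spaced tags.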
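In summary, the method of the present invention for enriching a part of a genome can be used not only for the simultaneous detection of foreground genes and the genetic background, but also as a new genotyping technique for whole-genome genotype detection in animals and plants.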

Claims (10)

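  1. A method for enriching a part of a genome, the method comprising:
    pairing a sequence-specific primer with a random degenerate primer to perform thermal asymmetric interlaced PCR amplification on the genomic DNA of a sample to be tested;
    purifying the PCR amplification products and constructing a sequencing library.
  2. The method according to claim 1, wherein the sequence-specific primer is designed from the sequence of any region of the genome, or from a foreground gene sequence;
    preferably, the sequence-specific primer is 18-30 nt long;
    preferably, an additional barcode sequence of 6-10 nt is added to the 5' end of the sequence-specific primer.
  3. The method according to claim 1, wherein the random degenerate primer is 8-20 nt long, preferably 8-15 nt;
    preferably, the degeneracy of the random degenerate primer is 64-4096, preferably 120-3072;
    preferably, an additional barcode sequence of 6-10 nt is added to the 5' end of the random degenerate primer.
  4. The method according to claim 1, wherein, during PCR amplification, the amount of random degenerate primer exceeds the amount of sequence-specific primer.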
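  5. The method according to claim 1, wherein the thermal asymmetric interlaced PCR amplification includes cycling reactions performed alternately at a higher annealing temperature and a lower annealing temperature;
    preferably, the thermal asymmetric interlaced PCR amplification includes at least two stages of cycling:
    the first stage includes 3-7 rounds of cycling at the higher annealing temperature, each round including denaturation, annealing at the higher annealing temperature, and extension;
    the second stage includes 6-10 rounds of cycling that alternate between the higher annealing temperature and the lower annealing temperature, each round including denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the higher annealing temperature, extension; denaturation, annealing at the lower annealing temperature, extension;
    wherein the higher annealing temperature is 50-60°C, and the lower annealing temperature is 5-15°C below the higher annealing temperature.
  6. The method according to any one of claims 1-5, wherein the thermal asymmetric interlaced PCR amplification is performed under the following conditions:
    initial denaturation at 90-98°C for 1 min-5 min;
    denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 3-7 rounds;
    denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the higher annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; denaturation at 90-98°C for 20 s-2 min, annealing at the lower annealing temperature for 30 s-1 min, extension at 72°C for 1 min-3 min; 6-10 rounds;
    final extension at 72°C for 5 min-10 min;
    wherein each higher annealing temperature is independently 50-60°C, and the lower annealing temperature is 40-50°C.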
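  7. The method according to claim 1, wherein, after the PCR amplification products are purified, an A is added to the 3' end and Y-shaped Illumina sequencing adapters are ligated to complete preparation of the sequencing library.
  8. The method according to claim 1, further comprising:
    sequencing the sequencing library, or using the sequencing library for high-throughput sequencing after fragment sorting.
  9. The method according to claim 1 or 8, further comprising:
    demultiplexing the high-throughput sequencing data of the sequencing library, aligning them to the reference genome, and examining the read coverage inside the foreground gene at the sequence-specific primer site, as well as the genome-wide distribution of high-depth tag sites formed by read enrichment with coverage depth ≥ 3×; these high-depth tag sites can then be used for subsequent genome-wide genotype detection.
  10. Use of the method according to any one of claims 1-9 in marker-assisted selection breeding or genomic breeding;
    preferably, the method is used for genome-wide genetic background detection of a species, or for detection of both the foreground gene and the genetic background of a species.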
PCT/CN2023/128105 2022-11-14 2023-10-31 Method for enriching a part of a genome and related applications WO2024104129A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211418807.8 2022-11-14
CN202211418807.8A CN115725694A (zh) 2022-11-14 2022-11-14 Method for enriching a part of a genome and related applications

Publications (1)

Publication Number Publication Date
WO2024104129A1 true WO2024104129A1 (zh) 2024-05-23

Family

ID=85295441

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/128105 WO2024104129A1 (zh) 2022-11-14 2023-10-31 Method for enriching a part of a genome and related applications

Country Status (2)

Country Link
CN (1) CN115725694A (zh)
WO (1) WO2024104129A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115725694A (zh) * 2022-11-14 2023-03-03 中国农业科学院深圳农业基因组研究所 一种对基因组的一部分进行富集的方法与相关应用

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017165925A1 (en) * 2016-03-31 2017-10-05 Reproductive Health Science Limited Amplification of target sequences
CN115725694A (zh) * 2022-11-14 2023-03-03 中国农业科学院深圳农业基因组研究所 一种对基因组的一部分进行富集的方法与相关应用


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KALENDAR RUSLAN, SHUSTOV ALEXANDR V., SEPPÄNEN MERVI M., SCHULMAN ALAN H., STODDARD FREDERICK L.: "Palindromic sequence-targeted (PST) PCR: a rapid and efficient method for high-throughput gene characterization and genome walking", SCIENTIFIC REPORTS, vol. 9, no. 1, 1 December 2019 (2019-12-01), XP055885585, DOI: 10.1038/s41598-019-54168-0 *
KALENDAR RUSLAN, SHUSTOV ALEXANDR V.; SCHULMAN ALAN H.: "Palindromic Sequence-Targeted (PST) PCR, Version 2: An Advanced Method for High-Throughput Targeted Gene Characterization and Transposon Display", FRONTIERS IN PLANT SCIENCE, FRONTIERS RESEARCH FOUNDATION, CH, vol. 12, 22 June 2021 (2021-06-22), CH , XP093170445, ISSN: 1664-462X, DOI: 10.3389/fpls.2021.691940 *
LIU, CAN ET AL.: "Introduction to Five Common PCR Technologies", BULLETIN OF BIOLOGY, CN, vol. 57, no. 6, 20 June 2022 (2022-06-20), CN, pages 46 - 50, XP009555403, ISSN: 0006-3193 *
YAO-GUANG LIU, YUANLING CHEN: "High-efficiency thermal asymmetric interlaced PCR for amplification of unknown flanking sequences", BIOTECHNIQUES, INFORMA HEALTHCARE, US, vol. 43, no. 5, 1 November 2007 (2007-11-01), US , pages 649 - 656, XP055428052, ISSN: 0736-6205, DOI: 10.2144/000112601 *
ZHANG H.C. ET AL.: "A low degenerate primer pool improved the efficiency of high‑efficiency thermal asymmetric interlaced PCR to amplify T‑DNA flanking sequences in Arabidopsis thaliana", 3 BIOTECH, 11 December 2017 (2017-12-11), pages 1 - 5 *

Also Published As

Publication number Publication date
CN115725694A (zh) 2023-03-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890565

Country of ref document: EP

Kind code of ref document: A1