WO2023221307A1 - A probe for targeted enrichment of nucleic acids - Google Patents

Info

Publication number
WO2023221307A1
Authority
WO
WIPO (PCT)
Prior art keywords
probe
sequence
target
biotin
capture
Prior art date
Application number
PCT/CN2022/111610
Other languages
English (en)
French (fr)
Inventor
汪彪
余丽萍
吴强
Original Assignee
纳昂达(南京)生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 纳昂达(南京)生物科技有限公司 filed Critical 纳昂达(南京)生物科技有限公司
Priority to EP22896845.9A priority Critical patent/EP4299758A1/en
Publication of WO2023221307A1 publication Critical patent/WO2023221307A1/zh

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6811Selection methods for production or design of target specific oligonucleotides or binding molecules
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/156Polymorphic or mutational markers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Definitions

  • the present invention relates to a probe, in particular to a probe used for targeted enrichment of nucleic acids and its application and design methods.
  • Nucleic acid sequences are carriers of life information, and high-throughput sequencing technology has become one of the core technologies in the biological and medical fields.
  • High-throughput sequencing generates large amounts of data, not all of which are target sequences to be studied or detected.
  • Although the cost of sequencing has been significantly reduced, it remains high because of the large volume of whole-genome sequencing data.
  • The solution to this problem is to replace whole-genome sequencing with targeted enrichment, an NGS approach that enriches the target regions. It ignores the regions of the genome that are not of interest and amplifies the signal from the target regions, which saves sequencing cost and sequencing time.
  • Target enrichment is mainly divided into multiplex PCR amplification and target capture based on different enrichment principles.
  • the latter is a liquid-phase hybridization capture technology based on probes, which is currently the mainstream and has the advantages of low difficulty in probe design and high probe fault tolerance.
  • Liquid-phase hybridization capture technology specifically binds biotin-labeled probes to the target region in solution, and enriches the target fragments captured by the probes through streptavidin magnetic beads.
  • Biotin-labeled probes and the liquid-phase reaction conditions of hybridization capture have a great impact on the capture efficiency of this system. For large target regions, such as whole-exome detection, hybridization capture efficiency is higher and the target rate is above 80%.
  • For small target regions (panels), however, the target rate is lower; for small target regions below 10 kb it can be less than 10%.
  • First, the probe length should ensure that, in a specific hybridization system and across different sequence base compositions, the hybridization annealing temperature is appropriate and the binding ability and specificity for the target sequence are optimal; secondly, the hybridization annealing temperature should not drop significantly when there is a certain degree of mismatch between the probe and the target sequence; finally, the longer the probe, the more difficult it is to synthesize and the harder it is to ensure synthesis quality.
  • the probe sequence length is usually 40-120nt.
  • The mainstream probe length is 120 nt, and the probe carries a modification (such as biotin). The modifying group can bind to the corresponding affinity medium to complete the "capture" of the target sequence.
  • the forms of probes include single-stranded DNA, double-stranded DNA, single-stranded RNA, double-stranded RNA, etc.
  • second-generation sequencing technology is the most widely used high-throughput sequencing technology, and bidirectional 150bp is the more mainstream sequencing reading mode.
  • the average insert length of sequencing libraries is mostly between 100-400bp.
  • the middle part of an insert that is too long cannot be read, and it also poses challenges to multiple PCR amplification steps in the sequencing process.
  • For samples with relatively short original lengths, such as FFPE samples and extracellular free nucleic acids, it is impossible to prepare libraries with longer inserts.
  • a library molecule can usually only bind 1-2 probes, which also means that the probability of probe shedding increases and the recovery rate of the target sequence decreases. For example, a target sequence with a length of 120 bp can only completely bind to one probe at most.
  • the two probes can only partially bind.
  • The probe can be shortened and the number of probes increased, or a shingled (tiled) design strategy can be adopted, in which probes are placed overlapping each other so that different target sequence fragments have a higher probability of binding a probe more completely.
  • overlapping probes cannot completely bind to the same target fragment at the same time ( Figure 4A).
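The packing limit described in the preceding paragraphs can be made concrete with a small calculation. The following is a minimal sketch, assuming that probes bind one strand of a fragment side by side without overlapping; the function name `max_fully_bound_probes` is hypothetical and does not come from the patent.

```python
def max_fully_bound_probes(fragment_len: int, probe_len: int) -> int:
    """Upper bound on the probes that can bind a fragment along their full length.

    Assumption (not from the source): probes sit side by side on a single
    strand and do not overlap each other.
    """
    if fragment_len < 0 or probe_len <= 0:
        raise ValueError("fragment_len must be >= 0 and probe_len must be > 0")
    return fragment_len // probe_len


if __name__ == "__main__":
    # Fragment and probe lengths taken from the figures quoted in the text.
    for fragment_len, probe_len in [(120, 120), (160, 120), (160, 40)]:
        n = max_fully_bound_probes(fragment_len, probe_len)
        print(f"{fragment_len} bp fragment, {probe_len} nt probes: at most {n}")
```

Under this assumption a 120 bp fragment has room for only one complete 120 nt probe, while shorter probes leave room for several complete binding events on the same fragment.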
  • Hybridization capture technology usually targets regions of more than 5 kb, where the inherent non-specific capture can be suppressed by a variety of means so that the target rate (the proportion of the target sequence among all captured sequences) is guaranteed.
  • liquid-phase hybridization capture process is very time-consuming. It takes 2-4 days to obtain the nucleic acid sample from the capture library.
  • hybridization capture involves many types of reagents and the operation process is extremely cumbersome, which is difficult for the operator.
  • the technical requirements are high, and problems in any part of the process will affect the performance of the capture library.
  • Liquid-phase hybridization capture technology is widely used in cancer tumor mutation gene detection, copy number variation, and methylation status analysis.
  • MRD liquid phase hybridization capture technology.
  • In solid tumor MRD detection technology, the primary tumor tissue is first sequenced to identify patient-specific genomic variation patterns, and the target region is then designed for personalized ctDNA detection and analysis. This places higher demands on the hybridization capture system in terms of compatibility with small target areas, ease of operation, degeneracy of experimental procedures, and degree of automation.
  • the invention provides a probe for nucleic acid capture and enrichment, and a design method for a probe pool composed of the probe.
  • The invention provides a probe for nucleic acid capture and enrichment, characterized in that the probe includes: (1) a probe binding sequence complementary to another probe, and (2) a target-specific sequence complementary to a nucleic acid target sequence.
  • the above probe binding sequence includes a first probe binding sequence and a second probe binding sequence.
  • Preferably, the 5' end of the probe has a first probe binding sequence that is complementary to the 3' end of another probe, and the 3' end of the probe has a second probe binding sequence that is complementary to the 5' end of another probe.
  • the length of the above-mentioned probe binding sequence is 8-30nt.
  • the length of the above target-specific sequence is 20-80nt.
  • The length of the first probe binding sequence at the 5' end of the probe, complementary to another probe, is 8-30 nt, and the length of the second probe binding sequence at the 3' end of the probe, complementary to another probe, is 8-30 nt.
  • the 3' end or 5' end of the above-mentioned probe carries a biomarker.
  • the above-mentioned biomarker is biotin.
  • the annealing temperature between the probe and the nucleic acid target sequence is greater than the annealing temperature between the probe and the probe.
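The length ranges and the annealing temperature ordering stated above can be checked programmatically. The sketch below is illustrative only: the source does not say how annealing temperature is computed, so it assumes two textbook approximations (the Wallace rule for oligos shorter than 14 nt and a GC-content formula for longer ones). The names `approx_tm` and `probe_satisfies_constraints` are hypothetical.

```python
LINKER_LEN_RANGE = (8, 30)   # probe binding sequence length, nt (from the text)
TARGET_LEN_RANGE = (20, 80)  # target-specific sequence length, nt (from the text)


def approx_tm(seq: str) -> float:
    """Rough melting temperature in degrees C for an A/C/G/T-only sequence.

    Assumed approximations, not the patent's method:
      * shorter than 14 nt: Wallace rule, 2*(A+T) + 4*(G+C)
      * 14 nt or longer:    64.9 + 41*(G+C - 16.4)/N
    """
    seq = seq.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    if n < 14:
        return 2.0 * (n - gc) + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


def probe_satisfies_constraints(linker_5p: str, target_specific: str,
                                linker_3p: str) -> bool:
    """Check length ranges and that probe-target Tm exceeds probe-probe Tm."""
    lo, hi = LINKER_LEN_RANGE
    linkers_ok = all(lo <= len(s) <= hi for s in (linker_5p, linker_3p))
    t_lo, t_hi = TARGET_LEN_RANGE
    target_ok = t_lo <= len(target_specific) <= t_hi
    tm_ok = approx_tm(target_specific) > max(approx_tm(linker_5p),
                                             approx_tm(linker_3p))
    return linkers_ok and target_ok and tm_ok


if __name__ == "__main__":
    # 8 nt binding sequences as reported in Example 2; the 40 nt
    # target-specific sequence is a made-up placeholder.
    print(probe_satisfies_constraints("CGTCGGTC", "ACGT" * 10, "GACCGACG"))
```

A nearest-neighbor thermodynamic model that accounts for salt and probe concentration would be more realistic; the simple formulas are used here only to keep the example dependency-free.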
  • the invention provides a probe pool design method for nucleic acid capture and enrichment, which is characterized in that it includes the following steps:
  • The above-mentioned initial sequence information includes (1) the total sequence information, that is, the sequences that may be contained in the library before capture; (2) the target sequence information, that is, the sequences to be captured; and (3) the sequence information that needs to be avoided, that is, low-specificity sequences such as repetitive sequences in the total sequence;
  • the above design parameters include the annealing temperature range and sequence length range of the probe binding to the target sequence, as well as the length range of the binding sequence between the probe and the probe;
  • d) Select the target-specific sequence that the probe binds to the nucleic acid target sequence, wherein the i-th target sequence is selected, and the initial value of i is equal to 1; then the probe is selected starting from the n-th base of the selected target sequence.
  • the initial value of n is equal to 1;
  • If the target-specific sequence with which the probe binds the nucleic acid target sequence does not fall into a sequence interval that needs to be avoided, it is put into the probe pool, and the next target-specific sequence is attempted after an interval of m1 bases; if it falls into a sequence interval that needs to be avoided, it is not put into the probe pool, and another attempt to obtain a target-specific sequence is made after an interval of m2 bases;
  • The value of m1 is greater than or equal to the length of the target-specific sequence of the probe; the value of m2 is less than or equal to the minimum of the length range of the target-specific sequence of the probe.
  • The steps of selecting a target-specific sequence with which the probe binds the nucleic acid target sequence include: when n is less than the length of the i-th target sequence, the next target-specific sequence is selected; when n is greater than or equal to the length of the i-th target sequence, target-specific sequence selection for the i-th target sequence is complete.
  • the above-mentioned target-specific sequence selection is performed on the i+1-th target sequence until all target sequences have completed target-specific sequence selection.
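The selection loop described above (walk along each target, accept a window and advance by m1, or reject it and advance by m2) can be sketched as follows. This is a simplified illustration, not the patent's tool: it assumes a fixed target-specific length `cap_len` instead of adjusting the length to hit an annealing temperature range, uses 0-based half-open coordinates, and stops when a full window no longer fits. All function and parameter names other than m1 and m2 are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def overlaps_avoided(start: int, end: int, avoid: Sequence[Interval]) -> bool:
    """True if window [start, end) touches any interval that must be avoided."""
    return any(start < a_end and a_start < end for a_start, a_end in avoid)


def tile_target(target: str, avoid: Sequence[Interval],
                cap_len: int, m1: int, m2: int) -> List[Tuple[int, str]]:
    """Select target-specific sequences along one target sequence."""
    if m1 < cap_len:
        raise ValueError("m1 must be >= the target-specific sequence length")
    if not 0 < m2 <= cap_len:
        raise ValueError("m2 must be in 1..cap_len")
    pool: List[Tuple[int, str]] = []
    n = 0  # current start position (the text counts from 1; this is 0-based)
    while n + cap_len <= len(target):
        end = n + cap_len
        if overlaps_avoided(n, end, avoid):
            n += m2          # rejected: retry a short distance downstream
        else:
            pool.append((n, target[n:end]))
            n += m1          # accepted: jump past the selected window
    return pool


def design_pool(targets: Sequence[Tuple[str, Sequence[Interval]]],
                cap_len: int, m1: int, m2: int) -> List[Tuple[int, int, str]]:
    """Run the selection for target i = 0, 1, ... until all targets are done."""
    pool: List[Tuple[int, int, str]] = []
    for i, (target, avoid) in enumerate(targets):
        for start, seq in tile_target(target, avoid, cap_len, m1, m2):
            pool.append((i, start, seq))
    return pool


if __name__ == "__main__":
    toy_target = "ACGT" * 25  # 100 nt placeholder, not a real target region
    for entry in design_pool([(toy_target, [(40, 50)])], cap_len=20, m1=20, m2=5):
        print(entry)
```

Choosing m1 at least as large as the window keeps accepted windows from overlapping, while a small m2 lets the search resume close to the edge of an avoided interval.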
  • The present invention also provides the application of the above-mentioned probe in low-frequency mutation detection, chromosome copy number variation analysis, and the detection of insertions/deletions, microsatellite instability or fusion gene variation in DNA fragments.
  • the present invention also provides the application of the above-mentioned probe in targeted mNGS sequencing or detecting pathogen epidemiology.
  • The beneficial effect of the probe of the present invention is that, compared with conventional probes, it binds to the target fragment more firmly and, through a shorter target-specific binding sequence, increases the number of probes that can bind to a target fragment.
  • the probe of the present invention is more suitable for capturing short fragment libraries; more suitable for capturing small target regions; more suitable for capturing PCR-free libraries; and more conducive to shortening the hybridization capture process.
  • Figure 1 is a schematic structural diagram of the probe of the present invention.
  • the probe mainly consists of four parts: the P-Cap region complementary to the target gene, the P-L region at the 3' end and the P-R region at the 5' end.
  • The 5' end of the probe carries a biotin label, and P-L and P-R have complementary sequences.
  • Figure 2 is a comparison diagram of the processes of the conventional hybridization capture system and the hybridization capture system of the present invention.
  • Figure 3 shows the experimental scheme for different types of samples.
  • FIG 4 is a schematic structural diagram of the conventional 120nt probe (A), the short probe used in the prior art (B), and the probe (C) of the present invention combined with the target fragment, where T represents the target fragment of the sample nucleic acid, and P represents probe.
  • Figure 5 shows the experimental results of conventional 120nt probes, short probes used in the prior art, and probe hybridization capture library NGS of the present invention.
  • Figure 6 shows the capture effects of conventional 120nt probes and probes of the present invention for PCR-free libraries.
  • Figure 7 shows the probe concentration test results of the present invention.
  • Figure 8 shows the hybridization temperature test results of the probe of the present invention.
  • Figure 9 shows the test results of hybridization time of the probe of the present invention.
  • the invention provides a set of probes for nucleic acid capture.
  • The probes are designed separately for the sense strand and the antisense strand of the target region.
  • The sense strand probes and the antisense strand probes are arranged in a non-overlapping manner.
  • The 3' end or 5' end of the probe is modified with biotin, which can bind to streptavidin magnetic beads.
  • the probe mainly consists of three parts.
  • the middle section is the target sequence binding section, and the 5' and 3' sections are stability-enhancing sections.
  • The 5' end segment of one probe is complementary to the 3' end segment of another probe, and its 3' end segment is complementary to the 5' end segment of another probe.
  • the complementary pairs between the probes are the P-L fragment and the P-R fragment.
  • the P-L fragment is 8-30nt in length
  • the P-R fragment is 8-30nt in length.
  • the 3' end of L or the 5' end of R is modified with biotin, which can bind to the streptavidin on the magnetic beads;
  • The fragment of the probe that is complementary to the target region is the P-Cap fragment, and the P-Cap length is between 20 and 80 nt (Figure 1).
  • the probe design method is as follows:
  • Design probes based on the location of the gene to be detected. If the probe is designed for mutation, insertion or deletion mutations, select the region covering the corresponding fragment; if the probe is designed for fusion genes, select the genes on both sides of the breakpoint of the fusion gene to design probes;
  • the probe will design a capture probe for the sense strand
  • the probe will design a capture probe for the antisense strand
  • the present invention also provides a method for designing a probe pool.
  • the method is as follows:
  • the design tool inputs the initial sequence information and design parameters to generate probe sequence information.
  • Initial sequence information including total sequence information, that is, the sequence information that may be contained in the library before enrichment, and target sequence information, that is, the sequence information that needs to be enriched from the total sequence information.
  • Design parameters including the annealing temperature range for the combination of the probe and the target sequence. This temperature is related to the composition of the hybridization reaction solution and the set temperature of the reaction. It also includes the sequence length range of the probe and target binding region.
  • Design tool workflow including:
  • The preprocessing of the total sequence information includes evaluating the specificity of different segments of the total sequence by counting the occurrences of all sequence combinations of length k in the positive strand and the complementary strand of the total sequence, where k is smaller than the minimum of the sequence length range of the probe-target binding region.
  • the characteristics of the probe and probe binding region sequence include:
  • the annealing temperature is lower than the annealing temperature of the probe binding to the target sequence
  • the selection process includes:
  • The value of m1 is greater than or equal to the sequence length of the probe-target binding region put into the probe pool; the value of m2 is less than or equal to the minimum of the sequence length range of the probe-target binding region.
  • n is greater than or equal to the length of the i-th target sequence, complete the selection of the probe and target binding region sequences of the i-th target sequence.
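The preprocessing step in this workflow counts how often every sequence of length k occurs on both strands of the total sequence. A minimal sketch of such a count is shown below; it is an assumption-laden illustration (in-memory string input, A/C/G/T only, hypothetical function names) rather than the design tool itself, and a real genome-scale run would need a streaming or disk-backed k-mer counter.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers_both_strands(total_sequence: str, k: int) -> Counter:
    """Count every k-mer on the given strand and on its reverse complement.

    Windows containing characters other than A/C/G/T (for example N) are
    skipped; this handling is an assumption, not something the source states.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = total_sequence.upper()
    counts: Counter = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            if set(kmer) <= _VALID:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    toy = "ACGTAC"  # placeholder sequence, not real data
    for kmer, n in sorted(count_kmers_both_strands(toy, 2).items()):
        print(kmer, n)
```

Counts of this kind can serve as a specificity score: a window built from frequently occurring k-mers is more likely to be repetitive, and a rare k-mer is a more plausible candidate for a probe-to-probe binding sequence.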
  • the present invention also provides a system from nucleic acid samples to targeted library construction (see Figure 2).
  • the specific process is as follows:
  • the nucleic acid samples include DNA samples or RNA samples.
  • DNA samples include plasma cell-free DNA (cfDNA), genomic DNA (gDNA), FFPE samples, virus or bacterial genome samples, etc.;
  • RNA samples include fresh tissue samples, FFPE samples, virus or bacterial genome samples, etc.
  • the library can be constructed directly without interruption
  • RNA samples reverse transcription, first-strand and second-strand synthesis are required;
  • the sample is subjected to end repair, adapter ligation and ligation product purification.
  • the purified product is directly subjected to hybridization capture.
  • the hybridization capture scheme is related to the adapter used.
  • multi-library mixed hybridization can be performed, and Primer Mix is used to PCR-amplify the library captured by mixed hybridization; if a truncated molecular tag adapter module is used, only a single sample can be hybridized.
  • The molecular tag adapter module can detect low-frequency mutations in the sample and, through consensus sequence analysis, filter out hybridization blur and background noise introduced by PCR amplification. It is compatible with the adapter modules of the Illumina and MGI sequencing platforms, so DNA libraries suitable for different sequencing platforms can be constructed.
  • the adapter ligation product does not need to be concentrated in a vacuum, and the hybridization capture reaction system can be directly configured, or the purified magnetic beads from the previous step can be used directly for hybridization capture;
  • the hybridization system uses the specific probe designed in this project, which can perform rapid hybridization.
  • the hybridization time is 1-2 hours and the capture time is 20 minutes, which shortens the hybridization capture time.
  • PCR amplification enriches the hybridization capture library. This step
  • The PCR amplification scheme is related to the adapter module used. When using the molecular tag adapter module, use primers containing Barcode sequences for PCR amplification. When using the full-length adapter module, use Primer Mix to perform PCR amplification on the target-enriched DNA library (see Figure 3).
  • the hybridization capture time selected for this system is from 1 hour to 16 hours, and the optimal capture time is 1 hour.
  • The hybridization capture temperature selected for this system is 59-61°C, and the optimal capture temperature is about 60°C.
  • the temperature selection is related to the probe length, GC content of the target area, and hybridization capture time.
  • hybridization capture library construction of this system requires a total of 6 hours from sample to capture library acquisition. Compared with the traditional 2 to 4 days, it simplifies the operation steps and greatly shortens the entire process operation time.
  • the invention also provides hybridization capture reagent components and methods of use thereof.
  • the specific contents are as follows:
  • the adapter ligation product was purified using 2x Beads, and the purified product used the Beads Wash Buffer configured in the kit.
  • The Beads Wash Buffer was 4 mL acetonitrile plus 1 mL H2O.
  • the hybridization system involves a total of three elution buffers, namely elution buffer I, elution buffer II and elution buffer III.
  • the recipes of the three elution buffers are shown in Table 2.
  • the structural schematic diagram of the probe of the present invention, the conventional 120nt probe and the short probe is shown in Figure 4.
  • the following Examples 1-3 compare the effects of the probe of the present invention with commonly used probes in the prior art, and the probe of the present invention is referred to as NC probe.
  • Example 1 Comparison of hybridization capture effects of conventional 120-base probes and short probes
  • the pre-capture library is a human plasma cell-free DNA library, which is derived from the fragmentation and release of human genomic DNA into the blood circulation system, that is, the total sequence is the entire human genome sequence.
  • the set target sequence is located in the interval shown in Table 3 and contains a series of high-frequency somatic mutation sites related to tumors.
  • The total length of the target sequence is only 1.2 kb. Covering it with conventional 120 nt probes requires 44 probes, which are shown in Table 4. In this experiment, hybridization capture is performed with Hybrid Capture Reagents, and the resulting capture library is sequenced on an Illumina NovaSeq 6000. In the sequencing data, an average of 99.9% of the sequences can be mapped to the human reference genome, of which an average of 11.7% are located in the target region. The target rate of the 120 nt probes is too low to meet the requirements (Figure 5).
  • the most concentrated range of plasma free DNA fragment length distribution is around 160 bp, so there may not be a probe that can completely bind to it, and the overall binding of the probe to the target sequence is not stable enough.
  • the target region accounts for a very small proportion of the entire genome, only about 1/2500000, so the low target rate is also expected.
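The proportion quoted above, and what an 11.7% target rate means relative to it, can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes a human genome size of 3.0 Gb, which is the value implied by the 1/2,500,000 figure but is not stated in the source; the function names are hypothetical.

```python
HUMAN_GENOME_BP = 3.0e9  # assumption: approximate haploid human genome size


def background_fraction(target_bp: float,
                        genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Fraction of reads expected on target with no enrichment at all."""
    return target_bp / genome_bp


def fold_enrichment(target_rate: float, target_bp: float,
                    genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Observed target rate divided by the unenriched expectation."""
    return target_rate / background_fraction(target_bp, genome_bp)


if __name__ == "__main__":
    target_bp = 1_200  # total target length from the text (1.2 kb)
    frac = background_fraction(target_bp)
    print(f"target is about 1/{1 / frac:,.0f} of the genome")
    print(f"fold enrichment at 11.7% target rate: "
          f"{fold_enrichment(0.117, target_bp):,.0f}")
```

Seen this way, even a target rate that fails the acceptance threshold still corresponds to an enrichment of several hundred thousand fold over random sampling, which shows how demanding very small panels are.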
  • To increase the probability of each fragment binding to a probe, short probes are used for capture. This example uses shorter probes (no longer than 40 nt each) so that four probes can bind each 160 bp fragment to be enriched.
  • The target annealing temperature of the probe was set to 65°C. When the probe is short, the annealing temperature is strongly affected by the base composition of the sequence. The design of the probe pool therefore differs from that for conventional 120 nt probes: the probe length must be adjusted within a certain range to bring the annealing temperature close to the target value. The probe pool was designed according to some of the steps of the probe pool design method provided by the present invention (skipping steps (1) (2) (4) in point 4).
  • The total sequence is the human reference genome hg19, and the target sequence is the target region sequence shown in Table 3. The input parameters are a probe length range of 35-40 nt, a probe annealing temperature of 65°C, m1 set to 40, and m2 set to 5.
  • the obtained short probes are shown in Table 5, with a length of about 40nt and a total of 97 probes.
  • Analysis of capture library NGS data showed that an average of 99.9% of the sequences could be aligned to the human reference genome, of which an average of 23.4% were located in the target region (Figure 5).
  • Although the target rate has improved significantly, it is still below the 50% target rate required of conventional hybridization capture. Clearly, simply shortening the probes and increasing probe density with overlapping probes still cannot meet the basic requirement of 50%.
  • NC probes add sequences for the probes to bind to each other.
  • The total sequence is the human reference genome hg19
  • the target sequence is the target region sequence shown in Table 3
  • the probe length range is set to 35-40nt
  • the probe annealing temperature is set to 65°C.
  • the selected probe binding sequence is CGTCGGTC, its complementary sequence is GACCGACG, and the number of occurrences is 2078 times. This sequence was added to both sides of the probe in Table 5 as a probe mutual binding sequence.
  • The NC probe sequences are shown in Table 6. Compared with Table 5, probe binding sequences are added at both ends of the target-specific sequence of each probe. When one fragment binds more than one probe, the probes can pair with each other through their probe binding sequences, which increases the robustness of probe binding.
  • NGS data from the NC probe capture library show that 99.9% of the sequences can be aligned to the human reference genome, of which an average of 56.0% are located in the target region, meeting the target rate requirement of conventional hybridization capture.
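The construction described in this example, adding the probe binding sequence CGTCGGTC and its complementary sequence GACCGACG to the two sides of each short probe, can be sketched as below. The source does not state which sequence goes on which end, so placing CGTCGGTC at the 5' end and GACCGACG at the 3' end is an assumption, as are the function names; the target-specific sequence in the usage example is a placeholder, not a probe from Table 5.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_nc_probe(target_specific: str, binding_seq: str = "CGTCGGTC") -> str:
    """Return 5'-[binding_seq][target_specific][revcomp(binding_seq)]-3'.

    The end assignment is an assumption made for illustration.
    """
    return binding_seq + target_specific + reverse_complement(binding_seq)


def ends_can_pair(probe_a: str, probe_b: str, link_len: int = 8) -> bool:
    """True if the 5' end of probe_a can base-pair with the 3' end of probe_b."""
    return probe_a[:link_len] == reverse_complement(probe_b[-link_len:])


if __name__ == "__main__":
    # The complementary sequence given in the text is the reverse complement.
    print(reverse_complement("CGTCGGTC"))
    probe_1 = build_nc_probe("ACGT" * 9)  # placeholder 36 nt target-specific part
    probe_2 = build_nc_probe("TTGCA" * 8)  # placeholder 40 nt target-specific part
    print(len(probe_1), len(probe_2), ends_can_pair(probe_1, probe_2))
```

Because every probe carries the same pair of end sequences in this sketch, any two probes bound next to each other on a fragment can pair end to end, which is the stabilizing effect the example attributes to NC probes.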
  • Example 3 NC probes for targeted capture of PCR-free libraries
  • PCR-free libraries refer to libraries that are connected to NGS adapters but have not been PCR amplified. They retain the original sequence information and have not introduced PCR preferences. Directly using PCR-free libraries for hybridization capture faces the difficulty of low hybridization input and unguaranteed capture rate.
  • each original fragment has multiple copies, so there are multiple opportunities to be bound and captured by the probe. If any fragment in the PCR-free library is not captured by the probe, it will not be able to enter subsequent steps, resulting in information loss.
  • each single strand of the PCR library fragment generates a corresponding complementary strand, so the probe only needs to be designed in one direction to capture information from both strands of the original fragment.
  • Examples 4-8 further tested the hybridization capture system and related parameters based on the NC probe of the present invention.
  • DS211 or SS information is directly proportional to the NC probe concentration.
  • When the NC probe concentration is low, less effective library information is captured.
  • The higher the NC probe concentration, the more abundant the effective library information captured; however, too high a concentration leads to an excess of redundant NC probes in the system, which reduces the target rate.
  • the optimal NC probe concentration used in this system is between 6-10 fmol, and a better choice is 6 fmol NC probe.
  • DS211 or SS content is affected by the hybridization capture temperature.
  • A hybridization capture temperature of 60°C performs better than the other two temperature conditions: both the capture efficiency and the target rate at 60°C are higher than at the other hybridization capture temperatures.
  • Hybridization capture temperature: Lib 1, 59°C; Lib 2, 60°C; Lib 3, 61°C
  • hybridization temperatures from 59°C to 61°C show better capture efficiency. This system uses 60°C as the final hybridization capture condition.
  • the hybridization time used in the traditional hybridization capture system is 16 hours.
  • the hybridization time used in the present invention can be shortened from 16 hours to 1 hour, and shortening the hybridization time will not affect the capture efficiency of the probe for DNA samples.
  • The gDNA is fragmented to about 200 bp (Covaris ultrasonicator), end repair and adapter ligation are performed, and the nucleic acid is then purified with an equal volume of beads; the specific purification procedure is as follows:
  • The hybridization system contains 6 fmol probe, 1× Hyb Buffer, 1× Enhance, 1 µg Human Cot-1 and 100 pmol Blocker. The prepared hybridization reaction is placed in a thermal cycler.
  • The hybridization reaction conditions are: denaturation at 95°C for 2 minutes, then hybridization at 60°C for 1 hour or 16 hours.
  • The PCR reaction system mainly includes 2× HiFi PCR Master Mix, 5 µL Index Primer Mix and 20 µL TE; the PCR amplification program is run on a thermal cycler. After the reaction, the product is purified with 1 volume of magnetic beads, and the purified product is then sequenced.
  • The DS211 or SS information is proportional to hybridization time. More than 90% of the effective library is already captured after 1 hour of hybridization, so a 1-hour hybridization time was selected, which allows the whole experimental workflow to be completed in 1 day.
  • Example 7 Comparison of the PCR-free mode capture of small target areas with NC probes and the conventional capture process with conventional probes
  • Group 1 used the traditional method to construct the targeted capture library: the traditional hybridization capture system paired with 120 nt probes.
  • Group 2 used the NC probe system of the present invention to construct a PCR-free targeted capture library. Capture probes were designed for the same region; the probes cover genomic exon regions, and the target region is about 4 kb.
  • The procedure for Group 1 followed the product instructions of the simple hybridization capture kit; the procedure for Group 2 is described in Example 6, with the hybridization time fixed at 1 hour.
  • The results of this example are shown in Table 14.
  • The average coverage of Group 1 and Group 2 is close to 100%.
  • The on-target rate of Group 2 is 59%, higher than the 11.73% of Group 1, which shows that the NC probe system of the present invention effectively improves the on-target rate.
  • Example 8 The detection efficiency of fusion genes is higher than that of traditional hybridization capture
  • Fusion genes are caused by genome rearrangements that cause partial fragments of two genes to join. Fusion genes can be detected and analyzed by capture sequencing of the regions flanking the rearrangement breakpoint. Since only part of the rearranged fragment across the breakpoint is the original sequence, for conventional probes, there will be a problem that only part of the fragment can be combined. NC probes can also improve the detection ability of fusion genes through more probe binding possibilities.
  • Group 1 used the traditional method to construct a targeted capture library: the traditional hybridization capture system was paired with 120 nt probes designed to cover ROS1 intron 33 in order to detect the CD74-ROS1 fusion.
  • Group 2 used the present invention to construct a targeted capture library, with capture probes designed for the same region; the target region was about 1 kb.
  • The procedure for Group 1 followed the product instructions of the Simple Hybridization Capture Kit.
  • the sample is a pan-tumor 800 gDNA standard (GW-OGTM800), which contains multiple mutation sites verified by digital PCR. CD74-ROS1 Fusion is one of them. The theoretical mutation frequency of this site is 6%.
  • Fusion sites are often located in repetitive regions, and probe design in repetitive regions is a difficult problem to capture.
  • This system uses NC probes, which has certain advantages in detecting fusion genes.
  • The GW-OGTM800 standard used in this experiment contains a CD74-ROS1 fusion gene whose mutation frequency, verified by digital PCR, is 5%. Groups 1 and 2 used probes covering the same region for hybridization capture: the traditional method detected the fusion gene at a frequency of about 1.1%, while the optimized system of the present invention detected it at 5.8%.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Pathology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

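A novel hybridization capture probe for the enrichment of target nucleic acid sequences. Its principle is that the probe sequence is divided into three segments: the middle segment is the target-binding segment; the 5' end segment of one probe can pair complementarily with the 3' end segment of another probe, and its 3' end segment can pair complementarily with the 5' end segment of another probe. This novel probe binds the target sequence more stably and, compared with traditional hybridization capture probes, performs better in hybridization capture enrichment with low input or small target regions (panels).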

Description

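Probe for Targeted Enrichment of Nucleic Acids
Technical Field
The present invention relates to a probe, in particular a probe for targeted enrichment of nucleic acids, and to its applications and design method.
Background
Nucleic acid sequences are the carriers of life's information, and high-throughput sequencing has become one of the core technologies in biology and medicine. High-throughput sequencing produces large amounts of data, not all of which are target sequences of the study or assay. Although sequencing costs have fallen sharply, whole-genome sequencing still generates so much data that its cost remains high. The solution is to replace whole-genome sequencing with targeted enrichment: NGS with target region enrichment ignores the regions of the genome that are not of interest and amplifies the signal from the target regions, saving sequencing cost and time.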
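According to the enrichment principle, targeted enrichment is mainly divided into multiplex PCR amplification and targeted capture. The latter, probe-based liquid-phase hybridization capture, is currently the mainstream approach and offers easier probe design and high probe tolerance to mismatches. In liquid-phase hybridization capture, biotin-labeled probes bind specifically to the target regions in solution, and the fragments captured by the probes are enriched with streptavidin magnetic beads. Both the biotin-labeled probes and the liquid-phase reaction conditions strongly affect the capture efficiency of this system. For large target regions (panels, also called capture regions), hybridization capture is efficient: for whole-exome panels, for example, the on-target rate is above 80%. For small panels, however, the on-target rate is low: for panels below 10 kb it can even fall below 10%.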
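Several considerations govern the choice of probe length. First, the length should ensure that, in a given hybridization system and across different base compositions, the annealing temperature is suitable and the binding capacity and specificity for the target sequence are optimal. Second, the annealing temperature should not drop markedly when a certain degree of mismatch exists between probe and target. Finally, longer probes are harder to synthesize, and their synthesis quality is harder to guarantee. Based on these considerations, probes are currently 40-120 nt long, with 120 nt being the mainstream length, and carry a modification (such as biotin) whose group binds the corresponding affinity medium to "capture" the target sequence. Probe forms include single-stranded DNA, double-stranded DNA, single-stranded RNA and double-stranded RNA.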
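Second-generation sequencing is currently the most widely used high-throughput sequencing technology, and paired-end 150 bp is the mainstream read mode. The average insert length of sequencing libraries is mostly 100-400 bp. The middle of an overly long insert cannot be read, and overly long fragments also challenge the multiple PCR amplification steps in the sequencing workflow. In addition, for samples whose original fragments are already short, such as FFPE samples and cell-free nucleic acids, libraries with longer inserts cannot be prepared. As a result, a library molecule can usually bind only 1-2 probes during hybridization capture, which increases the probability that probes dissociate and lowers the recovery of target sequences. For example, a 120 bp target sequence can fully bind at most one probe; even if it can bind two probes, both can bind only partially. To increase the binding capacity and binding probability of probes, the probes can be shortened and their number increased, or a tiling design can be adopted, in which probes overlap one another so that different target fragments have a higher chance of binding a probe more completely. However, even overlapping probes cannot all bind completely to the same target fragment at the same time (Figure 4A).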
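For PCR-amplified sequencing libraries, every target fragment exists in multiple copies, so even at a low recovery rate the vast majority of original target fragments have at least one captured copy. In addition, hybridization capture usually targets regions larger than 5 kb, and the inherent non-specific capture can be suppressed by various means, so the on-target rate (the proportion of target sequences among all captured sequences) is reasonably assured. However, for libraries with short inserts, libraries that have not been PCR-amplified, and applications in which the target region is a very small fraction of the total sequence, current mainstream probes and hybridization capture systems cannot deliver satisfactory recovery efficiency and on-target rates.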
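The on-target rate defined above (the share of captured sequences that fall in the target regions) can be illustrated with a minimal sketch. It assumes reads and targets are given as half-open `(chromosome, start, end)` intervals and counts a read as on-target if it overlaps any target by at least one base; the document does not say whether the rate is counted per read or per base, so that choice, along with the names `overlaps` and `on_target_rate`, is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end) with a half-open interval [start, end)
Interval = Tuple[str, int, int]


def overlaps(read: Interval, target: Interval) -> bool:
    """Return True if the read shares at least one base with the target."""
    return read[0] == target[0] and read[1] < target[2] and target[1] < read[2]


def on_target_rate(reads: Iterable[Interval], targets: List[Interval]) -> float:
    """Fraction of reads that overlap any target region (read-level count)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(any(overlaps(r, t) for t in targets) for r in reads)
    return hits / len(reads)


# Toy data: one target interval and four hypothetical aligned reads.
targets = [("chr12", 25398255, 25398296)]
reads = [
    ("chr12", 25398200, 25398350),  # spans the target
    ("chr12", 25398290, 25398440),  # overlaps the target's last bases
    ("chr1", 100, 250),             # different chromosome
    ("chr12", 25399000, 25399150),  # same chromosome, outside the target
]
rate = on_target_rate(reads, targets)
```

In this toy data set two of the four reads overlap the target, so `rate` is 0.5.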
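In addition, the liquid-phase hybridization capture workflow is very time-consuming: going from nucleic acid sample to captured library takes 2-4 days. Hybridization capture also involves many reagents and an extremely cumbersome procedure, places high technical demands on operators, and a problem at any step affects the performance of the captured library. These factors have become key technical bottlenecks limiting the development of liquid-phase hybridization capture.
Liquid-phase hybridization capture is widely used for detecting tumor gene mutations, copy number variation and methylation status, and several products on the market are used in genetic testing and clinical research. With growing interest in early cancer screening and MRD, however, higher demands are being placed on the technology. In MRD detection for solid tumors, for example, the primary tumor tissue is first sequenced to identify the patient-specific profile of genomic variants, and target regions are then designed for personalized ctDNA analysis. This places higher demands on the hybridization capture system in terms of compatibility with small target regions, ease of operation, simplicity of the experimental workflow and degree of automation.
Therefore, developing a probe with high recovery efficiency and on-target rate, together with a liquid-phase hybridization capture system that is efficient, uniform, stable, simple to operate, uses few reagents and takes little time, is the way to address the pain points of the current market.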
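Summary of the Invention
The present invention provides a probe for nucleic acid capture and enrichment, and a method for designing a probe pool composed of such probes.
The present invention provides a probe for nucleic acid capture and enrichment, characterized in that the probe comprises: (1) a probe-binding sequence that pairs complementarily with another probe, and (2) a target-specific sequence that pairs complementarily with a nucleic acid target sequence.
Preferably, the probe-binding sequence comprises a first probe-binding sequence and a second probe-binding sequence.
More preferably, the 5' end of the probe has a first probe-binding sequence that pairs complementarily with the 3' end of another probe, and the 3' end of the probe has a second probe-binding sequence that pairs complementarily with the 5' end of another probe.
Preferably, the probe-binding sequence is 8-30 nt long.
Preferably, the target-specific sequence is 20-80 nt long.
More preferably, the first probe-binding sequence at the 5' end of the probe, which pairs complementarily with another probe, is 8-30 nt long, and the second probe-binding sequence at the 3' end of the probe, which pairs complementarily with another probe, is 8-30 nt long.
Preferably, the 3' end or 5' end of the probe carries a biological label.
More preferably, the biological label is biotin.
Preferably, the annealing temperature between the probe and the nucleic acid target sequence is higher than the annealing temperature between probes.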
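The present invention provides a method for designing a probe pool for nucleic acid capture and enrichment, characterized by comprising the following steps:
a) input the initial sequence information and design parameters, and output the probe sequence information, wherein the initial sequence information includes (1) total sequence information, i.e. the sequences that the pre-capture library may contain; (2) target sequence information, i.e. the sequences to be captured, and information on the sequences to be avoided, i.e. low-specificity sequences such as repetitive sequences in the total sequence;
the design parameters include the annealing temperature range and the sequence length range for probe-target binding, and the length range of the probe-probe binding sequence;
b) extract all subsequences of length k from the forward and complementary strands of the total sequence, and count the occurrences of each subsequence;
c) select the probe-binding sequence through which probes pair with one another, wherein the probe-binding sequence has length k, its annealing temperature is lower than that of probe-target binding, and it occurs rarely in the total sequence; preferably, its occurrence count is less than 5% of the mean;
d) select the target-specific sequences through which probes bind the nucleic acid target sequences, wherein the i-th target sequence is selected, with i initially equal to 1; then, starting from the n-th base of the selected target sequence, with n initially equal to 1, a target-specific sequence is picked;
f) add the probe-binding sequence to the 5' end of the target-specific sequence and the reverse complement of the probe-binding sequence to its 3' end;
g) output all probe sequences.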
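Preferably, if a target-specific sequence does not fall within a sequence interval to be avoided, it is placed in the probe pool and the next target-specific sequence is attempted after an interval of m1 bases; if it falls within a sequence interval to be avoided, it is not placed in the probe pool and a target-specific sequence is attempted again after an interval of m2 bases;
wherein m1 is greater than or equal to the length of the probe's target-specific sequence, and m2 is less than or equal to the minimum of the length range of the probe's target-specific sequence.
Preferably, selecting the target-specific sequences comprises: when n is less than the length of the i-th target sequence, selecting the next target-specific sequence; when n is greater than or equal to the length of the i-th target sequence, the selection of target-specific sequences for the i-th target sequence ends. After the selection for the i-th target sequence has ended, the same selection is carried out for the (i+1)-th target sequence, until target-specific sequences have been selected for all target sequences.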
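Steps b) and c) of the design method can be sketched as follows. This is a minimal illustration, not the applicant's tool: it counts every k-mer on both strands of a toy "total sequence" and lists the k-mers whose count is below 5% of the mean over all 4^k possible k-mers. The annealing temperature condition of step c) is not checked here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only are swapped)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(total_sequences, k: int) -> Counter:
    """Step b): count all length-k subsequences on both strands."""
    counts = Counter()
    for seq in total_sequences:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                    counts[kmer] += 1
    return counts


def rare_linker_candidates(counts: Counter, k: int, fraction: float = 0.05):
    """Step c), occurrence criterion only: k-mers rarer than fraction * mean."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    mean = sum(counts.values()) / len(all_kmers)
    threshold = fraction * mean
    rare = [km for km in all_kmers if counts.get(km, 0) < threshold]
    return sorted(rare, key=lambda km: (counts.get(km, 0), km))


# Toy total sequence; a real run would stream a reference genome instead.
toy_total = ["AAAAAAAAAAAAAAAAAAAACG"]
counts = count_kmers(toy_total, k=2)
candidates = rare_linker_candidates(counts, k=2)
```

For a genome-sized input, holding all sequences in memory as Python strings is impractical; the sketch only shows the counting and thresholding logic.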
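The present invention also provides the use of the above probe in detecting low-frequency mutations, chromosomal copy number variation, insertions/deletions, microsatellite instability or fusion gene variants in DNA fragments.
The present invention also provides the use of the above probe in targeted mNGS sequencing or in pathogen epidemiology detection.
Compared with the prior art, the probe of the present invention has the following beneficial effects: compared with conventional probes, it binds the target fragment more firmly, and its shorter target-specific binding sequence increases the number of probes that can bind a target fragment. The probe of the present invention is better suited to capturing short-fragment libraries, small target regions and PCR-free libraries, and helps shorten the hybridization capture workflow.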
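Brief Description of the Drawings
The accompanying drawings, which form part of this application, provide a further understanding of the present invention; the illustrative embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of the structure of the probe of the present invention. The probe mainly consists of 4 parts: the P-Cap region complementary to the target gene, the P-L region at the 3' end and the P-R region at the 5' end, with a biotin label at the 5' end of the probe; P-L and P-R share a complementary stretch.
Figure 2 compares the workflow of a conventional hybridization capture system with that of the hybridization capture system of the present invention.
Figure 3 shows the experimental schemes for different sample types.
Figure 4 is a schematic diagram of the binding of conventional 120 nt probes (A), the short probes used in the prior art (B) and the probes of the present invention (C) to a target fragment, where T denotes a target fragment of the sample nucleic acid and P denotes a probe.
Figure 5 shows the NGS results of libraries captured by hybridization with conventional 120 nt probes, prior-art short probes and the probes of the present invention.
Figure 6 shows the capture performance of conventional 120 nt probes and the probes of the present invention on PCR-free libraries.
Figure 7 shows the probe concentration test results for the present invention.
Figure 8 shows the probe hybridization temperature test results for the present invention.
Figure 9 shows the probe hybridization time test results for the present invention.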
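Detailed Description
The present invention is further described below with reference to the drawings and specific examples; its scope of protection is not limited to the following examples. It should also be understood that the terms used in the examples are for describing particular embodiments and are not intended to limit the scope of protection of the invention, nor are they exclusive limitations. Variations and advantages that a person skilled in the art can conceive without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims and any equivalents thereof.
All technical and scientific terms used herein have the meanings commonly understood by a person skilled in the art to which the invention belongs. In other cases, the meaning of certain terms used herein is set out in the description. Experimental methods in the following examples for which no specific conditions are given are part of the general and common knowledge of a person skilled in the art. The examples in this application and the features in the examples may be combined with one another.
The features and advantages of the invention can be further understood from the following detailed description in conjunction with the drawings. The examples provided merely illustrate the method of the invention and do not in any way limit the rest of the disclosure.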
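The present invention provides a set of probes for nucleic acid capture. Probes are designed separately for the sense strand and the antisense strand of the target region; both the sense-strand probes and the antisense-strand probes are arranged without overlap. The 3' or 5' end of the probe carries a biotin modification that can bind streptavidin magnetic beads.
The probe mainly consists of three parts. The middle segment is the target-binding segment, and the 5' and 3' segments are stability-enhancing segments: the 5' end segment of one probe can pair complementarily with the 3' end segment of another probe, and its 3' end segment can pair complementarily with the 5' end segment of another probe. The segments through which probes pair with one another are the P-L segment and the P-R segment. P-L is 8-30 nt long, P-R is 8-30 nt long, and there is a complementary region of 8-30 nt between P-L and P-R. The 3' end of L or the 5' end of R carries a biotin modification that can bind the streptavidin on the magnetic beads. The segment through which the probe pairs with the target region is the P-Cap segment, which is 20-80 nt long (Figure 1).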
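The three-segment structure described above can be modeled with a small data class. This is a sketch under stated assumptions: the text is not consistent about which end carries P-L and which carries P-R, so the class uses the neutral, hypothetical field names `five_prime_arm` and `three_prime_arm`; the length limits (8-30 nt for the arms, 20-80 nt for P-Cap) come from the text, and the example sequences are the first two short probes of Table 5 with the binding sequence CGTCGGTC and its reverse complement from Example 2.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class NCProbe:
    five_prime_arm: str   # probe-binding segment at the 5' end
    p_cap: str            # target-binding segment in the middle
    three_prime_arm: str  # probe-binding segment at the 3' end

    def __post_init__(self):
        # Length limits taken from the description.
        for arm in (self.five_prime_arm, self.three_prime_arm):
            if not 8 <= len(arm) <= 30:
                raise ValueError("probe-binding arms must be 8-30 nt long")
        if not 20 <= len(self.p_cap) <= 80:
            raise ValueError("P-Cap must be 20-80 nt long")

    @property
    def sequence(self) -> str:
        """Full probe sequence written 5' to 3'."""
        return self.five_prime_arm + self.p_cap + self.three_prime_arm


def arms_can_pair(probe_a: NCProbe, probe_b: NCProbe) -> bool:
    """True if the 5' arm of probe_a is the reverse complement of the 3' arm of probe_b."""
    return probe_a.five_prime_arm == reverse_complement(probe_b.three_prime_arm)


probe_1 = NCProbe("CGTCGGTC", "CAAATGCTGAAAGCTGTACCATACCTGTCTGGTCT", "GACCGACG")
probe_2 = NCProbe("CGTCGGTC", "GCTGAGGTTTCAATGAATGGAATCCCGTAACTCTT", "GACCGACG")
pairable = arms_can_pair(probe_1, probe_2) and arms_can_pair(probe_2, probe_1)
```

Because every probe carries the same arm at its 5' end and the reverse complement of that arm at its 3' end, any two probes of the pool can pair end to end, which is what the check above expresses.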
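The probe design method is as follows:
probes are designed according to the position of the gene to be tested: for mutations, insertions or deletions, a region covering the corresponding fragment is selected for probe design; for fusion genes, the genes on both sides of the fusion breakpoint are selected for probe design;
if the sense strand is to be captured, capture probes are designed against the sense strand;
if the antisense strand is to be captured, capture probes are designed against the antisense strand;
risky probes are removed by software analysis; such probes cause severe off-target capture across the whole hybridization capture system, lowering the on-target rate, reducing the capture efficiency of the target region and worsening coverage uniformity.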
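The present invention also provides a method for designing a probe pool, as follows:
1. The design tool takes the initial sequence information and design parameters as input and generates the probe sequence information.
2. Initial sequence information: includes the total sequence information, i.e. the sequences that the pre-enrichment library may contain, and the target sequence information, i.e. the sequences to be enriched from the total sequences.
3. Design parameters: include the annealing temperature range for probe-target binding, which depends on the composition of the hybridization solution and the set reaction temperature, and the sequence length range of the probe's target-binding region.
4. The workflow of the design tool includes: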
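(1) Preprocessing of the total sequence information. This includes assessing the specificity of different segments of the total sequence, and counting the occurrences of every sequence of length k in the forward and complementary strands of the total sequence, where k is less than the minimum of the length range of the probe's target-binding region.
(2) Selection of the probe-probe binding sequence. The probe-probe binding sequence has the following features:
(2.1) its length is k;
(2.2) its annealing temperature is lower than the annealing temperature of probe-target binding;
(2.3) according to the occurrence counts from (1) above, it occurs rarely in the total sequence, or its occurrence count is less than 5% of the mean.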
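(3) Selection of the probe's target-binding regions. The selection process includes:
(3.1) Select the i-th target sequence, with i initially equal to 1; starting from the n-th base of that target sequence, with n initially equal to 1, pick a target-binding sequence whose annealing temperature and length fall within the ranges given in point 3 above. The candidate is then checked against the specificity assessment of the total sequence: if it is assessed as highly specific (i.e. it does not fall within a sequence interval to be avoided), it is placed in the probe pool and n is increased by a number m1, so that the next target-specific sequence is attempted m1 bases further on; if it is assessed as having low specificity (i.e. it falls within a sequence interval to be avoided), it is not placed in the probe pool and n is increased by a number m2, so that a target-specific sequence is attempted again m2 bases further on.
Preferably, m1 is greater than or equal to the length of the target-binding sequence just placed in the probe pool, and m2 is less than or equal to the minimum of the length range of the probe's target-binding region.
(3.2) When n is less than the length of the i-th target sequence, select the next target-binding sequence.
(3.3) When n is greater than or equal to the length of the i-th target sequence, target-binding sequence selection for the i-th target sequence is complete.
(3.4) After target-binding sequence selection for the i-th target sequence has ended, the same selection is carried out for the (i+1)-th target sequence, until selection is complete for all target sequences.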
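(4) For each target-binding sequence in the probe pool, add the probe-probe binding sequence to its 5' end and the reverse complement of the probe-probe binding sequence to its 3' end.
(5) Output all probe sequences.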
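Steps (3) and (4) of the workflow above can be sketched as a walk along each target sequence followed by probe assembly. The sketch makes several simplifying assumptions that are not in the source: positions are 0-based, candidates have one fixed length instead of being adjusted to an annealing temperature range, the walk stops when a full-length candidate no longer fits, the sequences to be avoided are supplied as half-open intervals on the target, and strand orientation is ignored. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_target(target: str, length: int, m1: int, m2: int, avoid=()):
    """Step (3): pick target-binding sequences along one target.

    Returns a list of (start, subsequence). After an accepted candidate the
    position advances by m1; after a rejected one (overlapping an interval
    to be avoided) it advances by m2.
    """
    picked = []
    n = 0
    while n + length <= len(target):
        end = n + length
        in_avoided = any(n < a_end and a_start < end for a_start, a_end in avoid)
        if in_avoided:
            n += m2
        else:
            picked.append((n, target[n:end]))
            n += m1
    return picked


def assemble_probe(target_binding: str, linker: str) -> str:
    """Step (4): linker at the 5' end, its reverse complement at the 3' end."""
    return linker + target_binding + reverse_complement(linker)


def design_pool(targets, length, m1, m2, linker, avoid_by_target=None):
    """Steps (3.4) and (5): loop over all targets and return all probes."""
    avoid_by_target = avoid_by_target or {}
    pool = []
    for i, target in enumerate(targets):
        for _, sub in tile_target(target, length, m1, m2, avoid_by_target.get(i, ())):
            pool.append(assemble_probe(sub, linker))
    return pool


# Toy run: a 120 nt target with positions 40-49 marked as "to be avoided".
toy_target = "ACGT" * 30
picked = tile_target(toy_target, length=35, m1=40, m2=5, avoid=[(40, 50)])
pool = design_pool([toy_target], 35, 40, 5, "CGTCGGTC", {0: [(40, 50)]})
```

With m1 = 40 and m2 = 5 (the values used in Example 1), the walk accepts the window starting at position 0, skips forward in steps of 5 past the avoided interval, and accepts the window starting at position 50.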
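The present invention also provides a system that goes from nucleic acid sample to targeted library construction (see Figure 2). The workflow is as follows:
The nucleic acid sample may be a DNA sample or an RNA sample. DNA samples include plasma cell-free DNA (cfDNA), genomic DNA (gDNA), FFPE samples, viral or bacterial genome samples, etc.; RNA samples include fresh tissue samples, FFPE samples, viral or bacterial genome samples, etc.
cfDNA samples need no fragmentation and can go directly into library construction;
intact genomic samples need physical fragmentation, shearing the genomic DNA to about 200-250 bp;
RNA samples need reverse transcription with first-strand and second-strand synthesis;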
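after fragmentation, the sample undergoes end repair, adapter ligation and purification of the ligation product, and the purified product goes directly into hybridization capture. The hybridization capture scheme depends on the adapters used. With the full-length UDI adapter module, multiple libraries can be pooled for hybridization, and the pooled captured libraries are PCR-amplified with Primer Mix. With the truncated molecular-tag adapter module, only single-sample hybridization is possible; the molecular-tag adapter module allows low-frequency mutation detection, with consensus sequence analysis filtering out the background noise introduced by hybridization ambiguity and PCR amplification. Adapter modules compatible with both the Illumina and MGI sequencing platforms are supported, so DNA libraries suitable for different sequencing platforms can be built.
The adapter ligation product needs no vacuum concentration: the hybridization capture reaction is set up directly, and hybridization capture can even be carried out with the purification beads from the previous step still present;
the hybridization system uses the specific probes designed in this project and supports rapid hybridization: hybridization takes 1-2 hours and capture takes 20 minutes, which shortens the hybridization capture time. The captured library is then enriched by PCR amplification, and the amplification scheme depends on the adapter module used: with the molecular-tag adapter module, primers containing barcode sequences are used for PCR amplification; with the full-length adapter module, Primer Mix is used to amplify the target-enriched DNA library (see Figure 3).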
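The hybridization capture time selected for this system is 1 hour to 16 hours, with 1 hour being optimal.
The hybridization capture temperature selected for this system is 59-61°C, with an optimum of about 60°C; the choice of temperature depends on probe length, the GC content of the target region and the hybridization capture time.
Building a hybridization capture library with this system, i.e. going from sample to captured library, takes 6 hours in total; compared with the traditional 2 to 4 days, this simplifies the procedure and greatly shortens the overall workflow.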
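The present invention also provides hybridization capture reagent components and methods for their use, as follows:
The adapter ligation product is purified with 2 volumes of beads, and the purified product is washed with the Beads Wash Buffer supplied in the kit, which consists of 4 mL acetonitrile plus 1 mL H2O.
The reagents used in the hybridization capture reaction are listed in Table 1.
Table 1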
Figure PCTCN2022111610-appb-000001
Figure PCTCN2022111610-appb-000002
The hybridization system involves three wash buffers: Wash Buffer I, Wash Buffer II and Wash Buffer III. Their recipes are given in Table 2.
Table 2
Figure PCTCN2022111610-appb-000003
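Schematic diagrams of the probe of the present invention, the conventional 120 nt probe and the short probe are shown in Figure 4. Examples 1-3 below compare the probe of the present invention with probes commonly used in the prior art; the probe of the present invention is referred to as the NC probe.
Example 1: Comparison of hybridization capture with conventional 120-base probes and short probes
In this example, the pre-capture library is a human plasma cell-free DNA library, which originates from the fragmentation of human genomic DNA and its release into the circulation; the total sequence is therefore the entire human genome. The target sequences lie in the intervals shown in Table 3 and contain a series of high-frequency tumor-associated somatic mutation sites.
Table 3. Positions of the target sequences in the hg19 human genome
Chromosome  Target region start coordinate  Target region end coordinate  Gene name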
chr1 115252204 115252205 NRAS
chr1 115256518 115256533 NRAS
chr1 115258730 115258752 NRAS
chr2 209113106 209113193 IDH1
chr12 25378561 25378563 KRAS
chr12 25378647 25378648 KRAS
chr12 25380275 25380286 KRAS
chr12 25398255 25398296 KRAS
chr12 112888139 112888212 PTPN11
chr12 112926852 112926909 PTPN11
chr13 28592620 28592654 FLT3
chr13 28602329 28602330 FLT3
chr13 28608244 28608342 FLT3
chr13 28610138 28610139 FLT3
chr15 90631837 90631939 IDH2
chr17 7573931 7574027 TP53
chr17 7577022 7577146 TP53
chr17 7577515 7577606 TP53
chr17 7578187 7578293 TP53
chr17 7578362 7578559 TP53
chr17 7579358 7579474 TP53
chr17 7579882 7579883 TP53
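The total length of the target sequences is only 1.2 kb. Covering them with conventional 120 nt probes requires 44 probes, which are shown in Table 4. In this experiment, hybridization capture was performed with
Figure PCTCN2022111610-appb-000004
Hybrid Capture Reagents, and the captured library was sequenced on an Illumina NovaSeq 6000. In the sequencing data, on average 99.9% of the sequences aligned to the human reference genome, of which on average 11.7% lay in the target regions; the on-target rate of the 120 nt probes is too low to meet requirements (Figure 5).
Table 4. Conventional 120 nt probes covering the target regions in Table 3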
Figure PCTCN2022111610-appb-000005
Figure PCTCN2022111610-appb-000006
Figure PCTCN2022111610-appb-000007
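The length distribution of plasma cell-free DNA fragments peaks at about 160 bp, so there may be no probe that can bind a given fragment completely, and probe-target binding as a whole is not stable enough. In addition, the target region is a very small fraction of the whole genome, only about 1/2,500,000, so a low on-target rate is to be expected.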
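The ratio quoted above can be checked with a short calculation, and it also puts the 11.7% on-target rate into perspective. The sketch assumes a round human genome size of 3.0 Gb (the source only states the resulting ratio), and "fold enrichment" is used here in the usual sense of on-target rate divided by the fraction of the genome that is targeted; the source does not report this figure.

```python
# Worked arithmetic for the target fraction in Example 1.
target_bp = 1_200              # total target length stated in the text (1.2 kb)
genome_bp = 3_000_000_000      # assumption: round figure for the human genome

target_fraction = target_bp / genome_bp   # 4e-7, i.e. about 1 / 2,500,000


def fold_enrichment(on_target_rate: float, target_fraction: float) -> float:
    """How many times more often a read is on target than expected by chance."""
    return on_target_rate / target_fraction


# On-target rate of the conventional 120 nt probes reported in the text.
fold_120nt = fold_enrichment(0.117, target_fraction)
```

Under these assumptions an 11.7% on-target rate corresponds to roughly a 290,000-fold enrichment, which shows how demanding a 50% on-target requirement is for a 1.2 kb panel.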
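To increase the probability that each fragment binds probes, short probes were used for capture. This example aims to bind 4 shorter probes (no longer than 40 nt each) to every 160 bp fragment to be enriched. The target annealing temperature of the probes was set to 65°C. For shorter probes the annealing temperature depends more strongly on base composition, so the probe pool design differs from that for conventional 120 nt probes: the probe length must be adjusted within a certain range to bring the annealing temperature close to the target value. The probe pool was designed following part of the probe pool design method provided by the present invention (skipping steps (1), (2) and (4) of point 4): the total sequence is the human reference genome hg19, the target sequences are the target regions shown in Table 3, the probe length range is 35-40 nt, the probe annealing temperature is 65°C, m1 is set to 40 and m2 is set to 5. The resulting short probes, shown in Table 5, are about 40 nt long, 97 in total. Analysis of the NGS data from the captured library showed that on average 99.9% of the sequences aligned to the human reference genome, of which on average 23.4% lay in the target regions (Figure 5). Although the on-target rate improved markedly, it is still below the 50% on-target rate required of routine hybridization capture. Clearly, even when the probes are directly shortened and the probe density is increased with overlapping probes, the on-target rate still cannot reach the basic 50% requirement.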
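The length adjustment described above (varying probe length within 35-40 nt so that the annealing temperature approaches 65°C) can be sketched as follows. The source does not say how annealing temperatures are computed; the sketch substitutes a common rule-of-thumb formula for oligonucleotides, Tm = 64.9 + 41 × (G + C − 16.4) / N, which ignores salt and buffer composition, so the absolute values are illustrative only. The function names are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in °C (assumption, not the patent's model)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def best_length(target: str, start: int, min_len: int = 35, max_len: int = 40,
                target_tm: float = 65.0):
    """Return the candidate starting at `start` whose Tm is closest to target_tm.

    Lengths from min_len to max_len are tried; None is returned if even the
    shortest candidate does not fit inside the target.
    """
    best = None
    for length in range(min_len, max_len + 1):
        if start + length > len(target):
            break
        candidate = target[start:start + length]
        diff = abs(basic_tm(candidate) - target_tm)
        if best is None or diff < best[0]:
            best = (diff, candidate)
    return None if best is None else best[1]


at_rich = best_length("AT" * 30, 0)   # low Tm, so the longest length is chosen
gc_rich = best_length("GC" * 30, 0)   # high Tm, so the shortest length is chosen
```

The two toy calls show the intended behavior: an AT-rich region is extended to 40 nt to raise its estimated Tm, while a GC-rich region is kept at 35 nt to limit it.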
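Table 5. Short probes covering the target regions in Table 3
SEQ ID NO.  Sequence name  Sequence (5'-3')  Modification (5'生物素 = 5' biotin)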
SEQ ID NO.45 NRAS-S-1 /biotin/CAAATGCTGAAAGCTGTACCATACCTGTCTGGTCT 5'生物素
SEQ ID NO.46 NRAS-S-2 /biotin/GCTGAGGTTTCAATGAATGGAATCCCGTAACTCTT 5'生物素
SEQ ID NO.47 NRAS-S-3 /biotin/CCAGTTCGTGGGCTTGTTTTGTATCAACTGTCCTT 5'生物素
SEQ ID NO.48 NRAS-S-4 /biotin/TGGCAAATCACACTTGTTTCCCACTAGCACCATAG 5'生物素
SEQ ID NO.49 NRAS-S-5 /biotin/ACATCATCCGAGTCTTTTACTCGCTTAATCTGCTC 5'生物素
SEQ ID NO.50 NRAS-S-6 /biotin/ACTTGCTATTATTGATGGCAAATACACAGAGGAAGCC 5'生物素
SEQ ID NO.51 NRAS-S-7 /biotin/CGCCTGTCCTCATGTATTGGTCTCTCATGGCACTG 5'生物素
SEQ ID NO.52 NRAS-S-8 /biotin/CTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACA 5'生物素
SEQ ID NO.53 NRAS-S-9 /biotin/CAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAAT 5'生物素
SEQ ID NO.54 NRAS-S-10 /biotin/CCTGGGGGTGTGGAGGGTAAGGGGGCAGGGAGGGA 5'生物素
SEQ ID NO.55 NRAS-S-11 /biotin/GGGCTACCACTGGGCCTCACCTCTATGGTGGGATC 5'生物素
SEQ ID NO.56 NRAS-S-12 /biotin/ATTCATCTACAAAGTGGTTCTGGATTAGCTGGATTGTC 5'生物素
SEQ ID NO.57 NRAS-S-13 /biotin/TGCGCTTTTCCCAACACCACCTGCTCCAACCACCA 5'生物素
SEQ ID NO.58 NRAS-S-14 /biotin/AGTTTGTACTCAGTCATTTCACACCAGCAAGAACC 5'生物素
SEQ ID NO.59 KRAS-S-1 /biotin/TATTTATTTCAGTGTTACTTACCTGTCTTGTCTTTGCTGA 5'生物素
SEQ ID NO.60 KRAS-S-2 /biotin/TGTTTCAATAAAAGGAATTCCATAACTTCTTGCTAAGTCC 5'生物素
SEQ ID NO.61 KRAS-S-3 /biotin/TGAGCCTGTTTTGTGTCTACTGTTCTAGAAGGCAA 5'生物素
SEQ ID NO.62 KRAS-S-4 /biotin/CACATTTATTTCCTACTAGGACCATAGGTACATCTTCAG 5'生物素
SEQ ID NO.63 KRAS-S-5 /biotin/GTCCTTAACTCTTTTAATTTGTTCTCTGGGAAAGAAAAAA 5'生物素
SEQ ID NO.64 KRAS-S-6 /biotin/AAGTTATAGCACAGTCATTAGTAACACAAATATCTTTCAA 5'生物素
SEQ ID NO.65 KRAS-S-7 /biotin/TAGTATTATTTATGGCAAATACACAAAGAAAGCCCTCCCC 5'生物素
SEQ ID NO.66 KRAS-S-8 /biotin/AGTCCTCATGTACTGGTCCCTCATTGCACTGTACT 5'生物素
SEQ ID NO.67 KRAS-S-9 /biotin/TCTTGACCTGCTGTGTCGAGAATATCCAAGAGACA 5'生物素
SEQ ID NO.68 KRAS-S-10 /biotin/TTTCTCCATCAATTACTACTTGCTTCCTGTAGGAATCC 5'生物素
SEQ ID NO.69 KRAS-S-11 /biotin/AGAAGGGAGAAACACAGTCTGGATTATTACAGTGC 5'生物素
SEQ ID NO.70 KRAS-S-12 /biotin/GATTTACCTCTATTGTTGGATCATATTCGTCCACAAAATG 5'生物素
SEQ ID NO.71 KRAS-S-13 /biotin/ATTCTGAATTAGCTGTATCGTCAAGGCACTCTTGC 5'生物素
SEQ ID NO.72 KRAS-S-14 /biotin/ACGCCACCAGCTCCAACTACCACAAGTTTATATTC 5'生物素
SEQ ID NO.73 KRAS-S-15 /biotin/TCATTTTCAGCAGGCCTTATAATAAAAATAATGAAAATGT 5'生物素
SEQ ID NO.74 PTPN11-S-1 /biotin/CTTTCCAATGGACTATTTTAGAAGAAATGGAGCTGTCAC 5'生物素
SEQ ID NO.75 PTPN11-S-2 /biotin/CACATCAAGATTCAGAACACTGGTGATTACTATGACC 5'生物素
SEQ ID NO.76 PTPN11-S-3 /biotin/TATGGAGGGGAGAAATTTGCCACTTTGGCTGAGTT 5'生物素
SEQ ID NO.77 PTPN11-S-4 /biotin/TCCAGTATTACATGGAACATCACGGGCAATTAAAAGAG 5'生物素
SEQ ID NO.78 PTPN11-S-5 /biotin/GAATGGAGATGTCATTGAGCTTAAATATCCTCTGAACTG 5'生物素
SEQ ID NO.79 PTPN11-S-6 /biotin/CTTCATGATGTTTCCTTCGTAGGTGTTGACTGCGA 5'生物素
SEQ ID NO.80 PTPN11-S-7 /biotin/TTGACGTTCCCAAAACCATCCAGATGGTGCGGTCT 5'生物素
SEQ ID NO.81 PTPN11-S-8 /biotin/GTACCGATTTATCTATATGGCGGTCCAGCATTATATTG 5'生物素
SEQ ID NO.82 PTPN11-S-9 /biotin/ACACTACAGCGCAGGATTGAAGAAGAGCAGGTACC 5'生物素
SEQ ID NO.83 PTPN11-S-10 /biotin/CCTGAGGGCTGGCATGCGGATTCTCATTCTCTTGC 5'生物素
SEQ ID NO.84 FLT3-S-1 /biotin/TAAGTAGGAAATAGCAGCCTCACATTGCCCCTGAC 5'生物素
SEQ ID NO.85 FLT3-S-2 /biotin/CATAGTTGGAATCACTCATGATATCTCGAGCCAATC 5'生物素
SEQ ID NO.86 FLT3-S-3 /biotin/AAGTCACATATCTTCACCACTTTCCCGTGGGTGAC 5'生物素
SEQ ID NO.87 FLT3-S-4 /biotin/GCACGTTCCTGGCGGCCAGGTCTCTGTGAACACAC 5'生物素
SEQ ID NO.88 FLT3-S-5 /biotin/GTGGGTTACCTGACAGTGTGCACGCCCCCAGCAGG 5'生物素
SEQ ID NO.89 FLT3-S-6 /biotin/CACAATATTCTCGTGGCTTCCCAGCTGGGTCATCA 5'生物素
SEQ ID NO.90 FLT3-S-7 /biotin/TTGAGTTCTGACATGAGTGCCTCTCTTTCAGAGCT 5'生物素
SEQ ID NO.91 FLT3-S-8 /biotin/CTGCTTTTTCTGTCAAAGAAAGGAGCATTAAAAATGTAAA 5'生物素
SEQ ID NO.92 FLT3-S-9 /biotin/GGCACATTCCATTCTTACCAAACTCTAAATTTTCTCTTGG 5'生物素
SEQ ID NO.93 FLT3-S-10 /biotin/AAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCA 5'生物素
SEQ ID NO.94 FLT3-S-11 /biotin/ACGTAGAAGTACTCATTATCTGAGGAGCCGGTCAC 5'生物素
SEQ ID NO.95 FLT3-S-12 /biotin/GTACCATCTGTAGCTGGCTTTCATACCTAAATTGC 5'生物素
SEQ ID NO.96 FLT3-S-13 /biotin/TATTACTTGGGAGACTTGTCTGAACACTTCTTCCAG 5'生物素
SEQ ID NO.97 FLT3-S-14 /biotin/CCAAGATGGTAATGGGTATCCATCCGAGAAACAGG 5'生物素
SEQ ID NO.98 FLT3-S-15 /biotin/GCCTGACTTGCCGATGCTTCTGCGAGCACTTGAGG 5'生物素
SEQ ID NO.99 FLT3-S-16 /biotin/TCCCTATAGAAAAGAACGTGTGAAATAAGCTCACTGG 5'生物素
SEQ ID NO.100 IDH2-S-1 /biotin/ATCCCCTCTCCACCCTGGCCTACCTGGTCGCCATG 5'生物素
SEQ ID NO.101 IDH2-S-2 /biotin/CGTGCCTGCCAATGGTGATGGGCTTGGTCCAGCCA 5'生物素
SEQ ID NO.102 IDH2-S-3 /biotin/GACTAGGCGTGGGATGTTTTTGCAGATGATGGGCT 5'生物素
SEQ ID NO.103 IDH2-S-4 /biotin/CGGAAGACAGTCCCCCCCAGGATGTTCCGGATAGT 5'生物素
SEQ ID NO.104 IDH2-S-5 /biotin/CATTGGGACTTTTCCACATCTTCTTCAGCTTGAAC 5'生物素
SEQ ID NO.105 TP53-S-1 /biotin/AGGTCACTCACCTGGAGTGAGCCCTGCTCCCCCCT 5'生物素
SEQ ID NO.106 TP53-S-2 /biotin/CTCCTTCCCAGCCTGGGCATCCTTGAGTTCCAAGG 5'生物素
SEQ ID NO.107 TP53-S-3 /biotin/TCATTCAGCTCTCGGAACATCTCGAAGCGCTCACG 5'生物素
SEQ ID NO.108 TP53-S-4 /biotin/CACGGATCTGCAGCAACAGAGGAGGGGGAGAAGTA 5'生物素
SEQ ID NO.109 TP53-S-5 /biotin/AGTGCTCCCTGGGGGCAGCTCGTGGTGAGGCTCCC 5'生物素
SEQ ID NO.110 TP53-S-6 /biotin/TTCTTGCGGAGATTCTCTTCCTCTGTGCGCCGGTC 5'生物素
SEQ ID NO.111 TP53-S-7 /biotin/TCCCAGGACAGGCACAAACACGCACCTCAAAGCTG 5'生物素
SEQ ID NO.112 TP53-S-8 /biotin/CCGTCCCAGTAGATTACCACTACTCAGGATAGGAA 5'生物素
SEQ ID NO.113 TP53-S-9 /biotin/CTCCTGACCTGGAGTCTTCCAGTGTGATGATGGTG 5'生物素
SEQ ID NO.114 TP53-S-10 /biotin/GATGGGCCTCCGGTTCATGCCGCCCATGCAGGAAC 5'生物素
SEQ ID NO.115 TP53-S-11 /biotin/TTACACATGTAGTTGTAGTGGATGGTGGTACAGTC 5'生物素
SEQ ID NO.116 TP53-S-12 /biotin/AGCCAACCTAGGAGATAACACAGGCCCAAGATGAG 5'生物素
SEQ ID NO.117 TP53-S-13 /biotin/CCAGACCTCAGGCGGCTCATAGGGCACCACCACAC 5'生物素
SEQ ID NO.118 TP53-S-14 /biotin/TGTCGAAAAGTGTTTCTGTCATCCAAATACTCCACAC 5'生物素
SEQ ID NO.119 TP53-S-15 /biotin/AAATTTCCTTCCACTCGGATAAGATGCTGAGGAGG 5'生物素
SEQ ID NO.120 TP53-S-16 /biotin/CCAGACCTAAGAGCAATCAGTGAGGAATCAGAGGC 5'生物素
SEQ ID NO.121 TP53-S-17 /biotin/CTCCAGCCCCAGCTGCTCACCATCGCTATCTGAGC 5'生物素
SEQ ID NO.122 TP53-S-18 /biotin/CGCTCATGGTGGGGGCAGCGCCTCACAACCTCCGT 5'生物素
SEQ ID NO.123 TP53-S-19 /biotin/TGTGCTGTGACTGCTTGTAGATGGCCATGGCGCGG 5'生物素
SEQ ID NO.124 TP53-S-20 /biotin/GCGGGTGCCGGGCGGGGGTGTGGAATCAACCCACA 5'生物素
SEQ ID NO.125 TP53-S-21 /biotin/TGCACAGGGCAGGTCTTGGCCAGTTGGCAAAACAT 5'生物素
SEQ ID NO.126 TP53-S-22 /biotin/TGTTGAGGGCAGGGGAGTACTGTAGGAAGAGGAAG 5'生物素
SEQ ID NO.127 TP53-S-23 /biotin/GACAGAGTTGAAAGTCAGGGCACAAGTGAACAGAT 5'生物素
SEQ ID NO.128 TP53-S-24 /biotin/AATGCAAGAAGCCCAGACGGAAACCGTAGCTGCCC 5'生物素
SEQ ID NO.129 TP53-S-25 /biotin/GTAGGTTTTCTGGGAAGGGACAGAAGATGACAGGG 5'生物素
SEQ ID NO.130 TP53-S-26 /biotin/CAGGAGGGGGCTGGTGCAGGGGCCGCCGGTGTAGG 5'生物素
SEQ ID NO.131 TP53-S-27 /biotin/CTGCTGGTGCAGGGGCCACGGGGGGAGCAGCCTCT 5'生物素
SEQ ID NO.132 TP53-S-28 /biotin/CATTCTGGGAGCTTCATCTGGACCTGGGTCTTCAG 5'生物素
SEQ ID NO.133 TP53-S-29 /biotin/GCCCTTCCAATGGATCCACTCACAGTTTCCATAGG 5'生物素
SEQ ID NO.134 TP53-S-30 /biotin/TGAAAATGTTTCCTGACTCAGAGGGGGCTCGACGC 5'生物素
SEQ ID NO.135 TP53-S-31 /biotin/GGATCTGACTGCGGCTCCTCCATGGCAGTGACCCG 5'生物素
SEQ ID NO.136 TP53-S-32 /biotin/AGGCAGTCTGGCTGCTGCAAGAGGAAAAGTGGGGA 5'生物素
SEQ ID NO.137 IDH1-S-1 /biotin/CATTATTGCCAACATGACTTACTTGATCCCCATAAGC 5'生物素
SEQ ID NO.138 IDH1-S-2 /biotin/GACGACCTATGATGATAGGTTTTACCCATCCACTC 5'生物素
SEQ ID NO.139 IDH1-S-3 /biotin/AAGCCGGGGGATATTTTTGCAGATAATGGCTTCTC 5'生物素
SEQ ID NO.140 IDH1-S-4 /biotin/AAGACCGTGCCACCCAGAATATTTCGTATGGTGCC 5'生物素
SEQ ID NO.141 IDH1-S-5 /biotin/TTGGTGATTTCCACATTTGTTTCAACTTGAACTCCTCAAC 5'生物素
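Example 2: Comparison of hybridization capture with conventional 120 nt probes and NC probes
This example compares the capture results of NC probes and conventional 120 nt probes on a human plasma cell-free DNA library for the same target regions as in Example 1.
The NC probes add probe-probe binding sequences to the short probe sequences shown in Table 5. Following the probe pool design method provided by the present invention, the total sequence is the human reference genome hg19, the target sequences are the target regions shown in Table 3, the probe length range is set to 35-40 nt and the probe annealing temperature is set to 65°C. The length of the probe-probe binding sequence is set to 8, i.e. k = 8. There are 65,536 possible 8-base sequences, all of which occur in the human reference genome hg19, with a mean occurrence count of 88,419. From the sequences with low occurrence counts, CGTCGGTC was chosen as the probe-probe binding sequence; its reverse complement is GACCGACG, and it occurs 2,078 times. This sequence was added to both sides of the probes in Table 5 as the probe-probe binding sequence.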
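The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below verifies the number of possible 8-mers, the reverse complement of the chosen binding sequence, and that its occurrence count is below 5% of the mean; all input numbers are taken from the text, while the variable names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


k = 8
n_kmers = 4 ** k                      # number of possible 8-base sequences

linker = "CGTCGGTC"
linker_rc = reverse_complement(linker)

mean_count = 88_419                   # mean occurrences per 8-mer in hg19 (from the text)
linker_count = 2_078                  # occurrences of the chosen sequence (from the text)
threshold = 0.05 * mean_count         # "less than 5% of the mean"
is_rare = linker_count < threshold

# Total number of 8-mer positions implied by the mean count; counting both
# strands, this is about twice the number of called bases in the reference.
implied_total = n_kmers * mean_count
```

The chosen sequence occurs at about 2.4% of the mean frequency, comfortably under the 5% criterion, and the implied total of roughly 5.8 billion 8-mer positions is consistent with counting both strands of a genome of about 2.9 Gb.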
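The NC probe sequences are shown in Table 6. Compared with Table 5, probe-binding sequences have been added at both ends of each probe's target-specific sequence; when a fragment binds more than one probe, the probes can pair with one another through these probe-binding sequences, which makes probe binding more stable.
Table 6. NC probes covering the target regions in Table 3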
Figure PCTCN2022111610-appb-000008
Figure PCTCN2022111610-appb-000009
Figure PCTCN2022111610-appb-000010
Figure PCTCN2022111610-appb-000011
Figure PCTCN2022111610-appb-000012
Figure PCTCN2022111610-appb-000013
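As shown in Figure 5, the NGS data of the library captured with NC probes show that 99.9% of the sequences aligned to the human reference genome, of which on average 56.0% lay in the target regions, meeting the on-target rate requirement of routine hybridization capture.
Example 3: NC probes for targeted capture of PCR-free libraries
PCR-free libraries are libraries that have been ligated to NGS adapters but not amplified by PCR; they retain the original sequence information and carry no PCR bias. Using a PCR-free library directly for hybridization capture is difficult because the hybridization input is low and the capture yield cannot be guaranteed. In a PCR-amplified library, each original fragment has multiple copies and therefore multiple chances to be bound and captured by a probe. In a PCR-free library, any fragment that is not captured by a probe cannot enter the subsequent steps, and its information is lost. Moreover, after PCR every single strand of a library fragment has generated its complementary strand, so probes designed in one direction are enough to capture information from both strands of the original fragment. In a PCR-free library, by contrast, the forward and reverse strands of a fragment each exist as a single copy; if capture uses probes in only one direction, the complementary strand is lost. Probes for the other strand were therefore added in this example. The conventional 120 nt probes for the other strand are shown in Table 7, and the NC probes for the other strand are shown in Table 8.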
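As shown in Figure 6, after a PCR-free library prepared from 30 ng plasma cell-free DNA was captured with the conventional 120 nt probes in Tables 4 and 7, the NGS results showed a mean on-target rate of only 5.6%, a mean deduplicated coverage depth of 356.1x on the forward strand and a mean deduplicated depth of 329.9x on the reverse strand. After capture with the NC probes shown in Tables 6 and 8, the NGS results showed a mean on-target rate of 48.7%, a mean deduplicated depth of 980.2x on the forward strand and a mean deduplicated depth of 1020.5x on the reverse strand. For PCR-free libraries, the NC probes thus greatly improve both the recovery rate and the on-target rate.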
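To make the size of the improvement explicit, the reported values can be turned into fold changes. The inputs below are the numbers stated in the paragraph above; the ratios are simple arithmetic derived from them, not values reported in the source, and the dictionary keys are illustrative names.

```python
# Values reported for the PCR-free library (Figure 6 in the source).
conventional = {"on_target_pct": 5.6, "fwd_depth": 356.1, "rev_depth": 329.9}
nc_probe = {"on_target_pct": 48.7, "fwd_depth": 980.2, "rev_depth": 1020.5}

# Fold change of NC probes over conventional 120 nt probes, per metric.
fold_change = {key: nc_probe[key] / conventional[key] for key in nc_probe}

# Ratio of forward-strand to reverse-strand depth as a rough strand-balance check.
strand_balance = {
    "conventional": conventional["fwd_depth"] / conventional["rev_depth"],
    "nc_probe": nc_probe["fwd_depth"] / nc_probe["rev_depth"],
}
```

The on-target rate rises about 8.7-fold and the deduplicated depth about 2.8-fold (forward strand) and 3.1-fold (reverse strand), while the depths of the two strands stay within roughly 10% of each other for both probe types.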
Table 7. Complementary-strand probes for the conventional 120 nt probes covering the target regions in Table 3
Figure PCTCN2022111610-appb-000014
Figure PCTCN2022111610-appb-000015
Figure PCTCN2022111610-appb-000016
Figure PCTCN2022111610-appb-000017
Table 8. Complementary-strand probes for the NC probes covering the target regions in Table 3
Figure PCTCN2022111610-appb-000018
Figure PCTCN2022111610-appb-000019
Figure PCTCN2022111610-appb-000020
Figure PCTCN2022111610-appb-000021
Figure PCTCN2022111610-appb-000022
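Having tested the basic performance of the NC probes of the present invention, Examples 4-8 further test the hybridization capture system based on the NC probes and its parameters.
Example 4: Testing the optimal NC probe concentration
It is not known in advance how the capture efficiency for target genes differs between NC probe concentrations, so the optimal probe concentration was sought by testing a concentration gradient. The experimental design is shown in Table 9 below. Following the probe design approach of the present invention, a 4.5 kb target region was designed; the Promega male standard (G1471 Promega-male) was used, and the sample was sheared to about 200-250 bp.
Apart from the probe concentration, all variables were identical between experimental groups; the results are shown in Figure 7.
Table 9
Experimental group  Probe concentration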
Lib 1 2fmol
Lib 2 2fmol
Lib 3 4fmol
Lib 4 4fmol
Lib 5 6fmol
Lib 6 6fmol
Lib 7 10fmol
Lib 8 10fmol
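Judging from the consensus depth results, the DS211 or SS information is directly proportional to the NC probe concentration: at low NC probe concentrations less effective library information is captured, and the higher the concentration, the richer the effective library information captured. However, too high an NC probe concentration leaves excess redundant NC probes in the system and lowers the on-target rate. The optimal NC probe concentration for this system is 6-10 fmol, preferably 6 fmol.
Example 5: Testing the optimal hybridization capture temperature
This system uses NC probes, and the hybridization capture temperature must be chosen according to the probe structure. To determine the optimal temperature, a series of tests was carried out; the experimental design is shown in Table 10 below. Following the NC probe design approach of the present invention, a 4.5 kb target region was designed; the Promega male standard (G1471 Promega-male) was used, and the sample was sheared to about 200-250 bp.
Apart from the hybridization capture temperature, all variables were identical between experimental groups; the results are shown in Figure 8.
Table 10
Experimental group  Hybridization capture temperature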
Lib 1 57℃
Lib 2 60℃
Lib 3 63℃
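Judging from the library construction efficiency and the consensus depth results, the DS211 or SS content is affected by the hybridization capture temperature. A hybridization capture temperature of 60°C outperforms the other two temperature conditions, with both higher capture efficiency and a higher on-target rate.
To confirm that 60°C is the optimal hybridization condition and that the system is not overly sensitive to the hybridization temperature, more closely spaced conditions were then tested, comparing the library capture efficiency at 59°C, 60°C and 61°C (see Table 11). Apart from the hybridization capture temperature, all variables were identical between experimental groups; the results are shown in Figure 8.
Table 11
Experimental group  Hybridization capture temperature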
Lib 1 59℃
Lib 2 60℃
Lib 3 61℃
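The data above show that hybridization temperatures from 59°C to 61°C all give good capture efficiency; this system uses 60°C as the final hybridization capture condition.
Example 6: Shortening the hybridization capture time
The hybridization time used in traditional hybridization capture systems is 16 hours. The hybridization time used in the present invention can be shortened from 16 hours to 1 hour without affecting the capture efficiency of the probes for DNA samples.
Experiments were performed under the hybridization capture conditions of this system; the experimental design is shown in Table 12 below. Following the NC probe design approach of the present invention, a 50 kb target region was first designed; the GW-OGTM800 standard was used, and the sample was sheared to about 200-250 bp.
The experimental procedure is as follows:
The gDNA is sheared to about 200 bp (Covaris ultrasonicator), end repair and adapter ligation are performed, and the nucleic acid is then purified with an equal volume of beads. The purification procedure is as follows: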
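1. In advance, take the
Figure PCTCN2022111610-appb-000023
SP Beads out, vortex to mix, and equilibrate at room temperature for 30 minutes before use;
2. Add 80 µL
Figure PCTCN2022111610-appb-000024
SP Beads to the adapter ligation product, mix well, and incubate at 25°C for 5-10 minutes;
3. Briefly centrifuge the PCR tube, place it on a magnetic rack for 5-10 minutes until the liquid is completely clear, and remove and discard the supernatant with a pipette;
4. Add 200 µL BW Buffer to wash once, let stand for 2 minutes, and discard the supernatant;
5. Add the hybridization reaction mix to the reaction.
The hybridization system contains 6 fmol probe, 1× Hyb Buffer, 1× Enhance, 1 µg Human Cot-1 and 100 pmol Blocker. The prepared hybridization reaction is placed in a thermal cycler, with the following conditions: denaturation at 95°C for 2 minutes, then hybridization at 60°C for 1 hour or 16 hours.
After the hybridization reaction, transfer the supernatant to a new PCR tube and add 10 µL M270 Beads to the tube for capture at 60°C for 20 minutes.
After the 20-minute capture, wash once each with Wash Buffer I, Wash Buffer II and Wash Buffer III.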
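After washing, add the PCR reaction mix to the M270 Beads. The PCR reaction mainly comprises 2× HiFi PCR Master Mix, 5 µL Index Primer Mix and 20 µL TE. Run the PCR amplification program on a thermal cycler; after the reaction, purify with 1 volume of magnetic beads, and sequence the purified product on the
Figure PCTCN2022111610-appb-000025
platform.
The test results are shown in Figure 9.
Table 12
Experimental group  Library construction and hybridization capture kit  Hybridization time
Lib 1 EASY Hybrid Capture System 16 hours
Lib 2 EASY Hybrid Capture System 16 hours
Lib 3 EASY Hybrid Capture System 1 hour
Lib 4 EASY Hybrid Capture System 1 hour
Judging from the consensus depth results, the DS211 or SS information is proportional to the hybridization time. More than 90% of the effective library is already captured after 1 hour of hybridization, so a 1-hour hybridization time was selected, which allows the whole experimental workflow to be completed in 1 day.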
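Example 7: Comparison of PCR-free capture of a small target region with NC probes and the conventional capture workflow with conventional probes
To compare, for a small target region, the capture performance of the optimized NC probes in PCR-free mode with that of traditional probes in non-PCR-free mode, experiments were grouped as shown in Table 13 below. Group 1 used the traditional method to construct the targeted capture library: the traditional hybridization capture system paired with 120 nt probes. Group 2 used the NC probe system of the present invention to construct a PCR-free targeted capture library. Capture probes were designed for the same region; the probes cover genomic exon regions, and the target region is about 4 kb.
Table 13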
Figure PCTCN2022111610-appb-000026
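The procedure for Group 1 followed the product instructions of the
Figure PCTCN2022111610-appb-000027
Simple Hybridization Capture Kit; the procedure for Group 2 is described in Example 6, with the hybridization time fixed at 1 hour.
The results of this example are shown in Table 14. The mean coverage of Group 1 and Group 2 is close to 100%, but the on-target rate of Group 2 is 59%, higher than the 11.73% of Group 1, which shows that the NC probe system of the present invention effectively improves the on-target rate.
Table 14. Capture efficiency for a small target region is higher than with traditional hybridization capture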
Figure PCTCN2022111610-appb-000028
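Example 8: Fusion gene detection efficiency is higher than with traditional hybridization capture
Fusion genes arise when genomic rearrangement joins parts of two genes. They can be detected and analyzed by capture sequencing of the regions flanking the rearrangement breakpoint. Because only part of a rearranged fragment spanning the breakpoint is the original sequence, a conventional probe can bind only part of the fragment. NC probes can likewise improve fusion gene detection by offering more possibilities for probe binding.
Experiments were grouped as shown in Table 15 below. Group 1 used the traditional method to construct a targeted capture library: the traditional hybridization capture system was paired with 120 nt probes designed to cover ROS1 intron 33 in order to detect the CD74-ROS1 fusion. Group 2 used the present invention to construct a targeted capture library, with capture probes designed for the same region; the target region is about 1 kb. The procedure for Group 1 followed the product instructions of the
Figure PCTCN2022111610-appb-000029
Simple Hybridization Capture Kit.
Table 15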
Figure PCTCN2022111610-appb-000030
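The sample is the pan-tumor 800 gDNA standard (GW-OGTM800), which contains multiple mutation sites verified by digital PCR; the CD74-ROS1 fusion is one of them, with a theoretical mutation frequency of 6%.
The procedure for Group 2 is described in Example 6, and the results are shown in Table 16 below.
Table 16. Fusion gene detection performs better than with traditional hybridization capture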
Figure PCTCN2022111610-appb-000031
Figure PCTCN2022111610-appb-000032
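Fusion sites often lie in repetitive regions, where probe design is a challenge for capture; the NC probes used in this system show a certain advantage in fusion gene detection. The GW-OGTM800 standard in this experiment contains a CD74-ROS1 fusion gene whose mutation frequency, verified by digital PCR, is 5%. Groups 1 and 2 used probes covering the same region for hybridization capture: the traditional method detected the fusion gene at a frequency of about 1.1%, while the optimized system of the present invention detected it at 5.8%.
The above are merely preferred embodiments of the present invention and are not intended to limit it. All documents mentioned in the present invention are incorporated by reference in their entirety in this application. It should further be understood that, after reading the above teachings, a person skilled in the art may make various changes or modifications within the spirit and principles of the invention, and such equivalent modifications likewise fall within the scope defined by the claims of this application.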

Claims (14)

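  1. A probe for nucleic acid capture and enrichment, characterized in that the probe comprises: (1) a probe-binding sequence that pairs complementarily with another probe, and (2) a target-specific sequence that pairs complementarily with a nucleic acid target sequence.
  2. The probe according to claim 1, characterized in that the probe-binding sequence comprises a first probe-binding sequence and a second probe-binding sequence.
  3. The probe according to claim 2, characterized in that the 5' end of the probe has a first probe-binding sequence that pairs complementarily with the 3' end of another probe, and the 3' end of the probe has a second probe-binding sequence that pairs complementarily with the 5' end of another probe.
  4. The probe according to claim 1, characterized in that the probe-binding sequence is 8-30 nt long.
  5. The probe according to claim 1, characterized in that the target-specific sequence is 20-80 nt long.
  6. The probe according to claim 3, characterized in that the first probe-binding sequence at the 5' end of the probe, which pairs complementarily with another probe, is 8-30 nt long, and the second probe-binding sequence at the 3' end of the probe, which pairs complementarily with another probe, is 8-30 nt long.
  7. The probe according to claim 1, characterized in that the 3' end or 5' end of the probe carries a biological label.
  8. The probe according to claim 7, characterized in that the biological label is biotin.
  9. The probe according to claim 1, characterized in that the annealing temperature between the probe and the nucleic acid target sequence is higher than the annealing temperature between probes.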
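  10. A method for designing a probe pool for nucleic acid capture and enrichment, characterized by comprising the following steps:
    a) inputting the initial sequence information and design parameters, and outputting the probe sequence information, wherein the initial sequence information includes (1) total sequence information, i.e. the sequences that the pre-capture library may contain; (2) target sequence information, i.e. the sequences to be captured, and information on the sequences to be avoided, i.e. low-specificity sequences such as repetitive sequences in the total sequence;
    the design parameters include the annealing temperature range and the sequence length range for probe-target binding, and the length range of the probe-probe binding sequence;
    b) extracting all subsequences of length k from the forward and complementary strands of the total sequence, and counting the occurrences of each subsequence;
    c) selecting the probe-binding sequence through which probes pair with one another, wherein the probe-binding sequence has length k, its annealing temperature is lower than that of probe-target binding, and its occurrence count in the total sequence is less than 5% of the mean;
    d) selecting the target-specific sequences through which probes bind the nucleic acid target sequences, wherein the i-th target sequence is selected, with i initially equal to 1; then, starting from the n-th base of the selected target sequence, with n initially equal to 1, a target-specific sequence is picked;
    f) adding the probe-binding sequence to the 5' end of the target-specific sequence and the reverse complement of the probe-binding sequence to its 3' end;
    g) outputting all probe sequences.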
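  11. The design method according to claim 10, characterized in that, if a target-specific sequence does not fall within a sequence interval to be avoided, it is placed in the probe pool and the next target-specific sequence is attempted after an interval of m1 bases;
    if it falls within a sequence interval to be avoided, it is not placed in the probe pool and a target-specific sequence is attempted again after an interval of m2 bases;
    wherein m1 is greater than or equal to the length of the probe's target-specific sequence;
    and m2 is less than or equal to the minimum of the length range of the probe's target-specific sequence.
  12. The design method according to claim 10, characterized in that selecting the target-specific sequences comprises: when n is less than the length of the i-th target sequence, selecting the next target-specific sequence; when n is greater than or equal to the length of the i-th target sequence, the selection of target-specific sequences for the i-th target sequence ends. After the selection for the i-th target sequence has ended, the same selection is carried out for the (i+1)-th target sequence, until target-specific sequences have been selected for all target sequences.
  13. Use of the probe according to any one of claims 1-9 in detecting low-frequency mutations, chromosomal copy number variation, insertions/deletions, microsatellite instability or fusion gene variants in DNA fragments.
  14. Use of the probe according to any one of claims 1-9 in targeted mNGS sequencing or in pathogen epidemiology detection.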
PCT/CN2022/111610 2022-05-16 2022-08-11 Probe for targeted enrichment of nucleic acids WO2023221307A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22896845.9A EP4299758A1 (en) 2022-05-16 2022-08-11 Probe for target enrichment of nucleic acid

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210530059.6 2022-05-16
CN202210530059 2022-05-16

Publications (1)

Publication Number Publication Date
WO2023221307A1 true WO2023221307A1 (zh) 2023-11-23

Family

ID=83077344

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111610 WO2023221307A1 (zh) 2022-05-16 2022-08-11 Probe for targeted enrichment of nucleic acids

Country Status (3)

Country Link
EP (1) EP4299758A1 (zh)
CN (2) CN115011594B (zh)
WO (1) WO2023221307A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116042920A (zh) * 2022-12-20 2023-05-02 南京世和基因生物技术股份有限公司 一种基于靶向hpv的宫颈癌患者治疗后的微小残留病灶的ngs检测方法及试剂盒

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101225432A (zh) * 2007-01-17 2008-07-23 富阳市金信投资有限公司 核酸三联探针和核酸四联探针
WO2014020137A1 (en) * 2012-08-02 2014-02-06 Qiagen Gmbh Recombinase mediated targeted dna enrichment for next generation sequencing
US20140323316A1 (en) * 2013-03-15 2014-10-30 Complete Genomics, Inc. Multiple tagging of individual long dna fragments
CN104977280A (zh) * 2015-05-28 2015-10-14 广东省生态环境与土壤研究所 基于核酸探针首尾互补策略的汞离子的检测方法及检测试剂盒
CN114891859A (zh) * 2022-05-16 2022-08-12 纳昂达(南京)生物科技有限公司 一种液相杂交捕获方法及其试剂盒

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005084277A2 (en) * 2004-02-27 2005-09-15 University Of Denver Method of isolating nucleic acid targets
CN106086013B (zh) * 2016-06-30 2018-10-19 厦门艾德生物医药科技股份有限公司 一种用于核酸富集捕获的探针及设计方法
WO2018144159A1 (en) * 2017-01-31 2018-08-09 Counsyl, Inc. Capture probes using positive and negative strands for duplex sequencing
WO2019200580A1 (zh) * 2018-04-19 2019-10-24 上海迪赢生物科技有限公司 一种同时捕获基因组目标区域正反义双链的平行液相杂交捕获方法
CN110387400B (zh) * 2018-04-19 2023-03-21 上海迪赢生物科技有限公司 一种同时捕获基因组目标区域正反义双链的平行液相杂交捕获方法
CN109777858A (zh) * 2018-12-20 2019-05-21 天津诺禾医学检验所有限公司 对基因重复区域进行杂交捕获的探针及方法
CN109628558B (zh) * 2018-12-21 2020-01-14 北京优迅医学检验实验室有限公司 一种用于高通量测序检测基因突变的捕获探针及其应用
CN109609635B (zh) * 2018-12-24 2020-07-07 深圳市海普洛斯生物科技有限公司 多基因富集的探针库及与多种肿瘤治疗相关的多个基因的检测方法
CN110257537B (zh) * 2019-07-04 2023-03-24 大连晶泰医学检验实验室有限公司 用于梅毒病原体检测的液相杂交捕获文库及其构建方法
CN112342627A (zh) * 2019-08-09 2021-02-09 深圳市真迈生物科技有限公司 一种核酸文库的制备方法及测序方法
CN113913493B (zh) * 2020-07-07 2024-04-09 天昊基因科技(苏州)有限公司 一种靶基因区域快速富集方法
CN112195287B (zh) * 2020-11-12 2022-03-01 武汉凯德维斯生物技术有限公司 一种用于人乳头瘤病毒hpv分型和整合检测的探针组及其试剂盒

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101225432A (zh) * 2007-01-17 2008-07-23 富阳市金信投资有限公司 核酸三联探针和核酸四联探针
WO2014020137A1 (en) * 2012-08-02 2014-02-06 Qiagen Gmbh Recombinase mediated targeted dna enrichment for next generation sequencing
US20140323316A1 (en) * 2013-03-15 2014-10-30 Complete Genomics, Inc. Multiple tagging of individual long dna fragments
CN104977280A (zh) * 2015-05-28 2015-10-14 广东省生态环境与土壤研究所 基于核酸探针首尾互补策略的汞离子的检测方法及检测试剂盒
CN114891859A (zh) * 2022-05-16 2022-08-12 纳昂达(南京)生物科技有限公司 一种液相杂交捕获方法及其试剂盒

Non-Patent Citations (2)

Title
GYDUSH, G. ET AL.: "Massively Parallel Enrichment of Low-Frequency Alleles Enables Duplex Sequencing at Low Depth", NATURE BIOMEDICAL ENGINEERING, vol. 6, no. 3, 17 September 2022 (2022-09-17), pages 257 - 266, XP037766525, DOI: 10.1038/s41551-022-00855-9 *
LUO ZHI-MEI, ZHANG YONG-BIAO, CHUN YAN, YUAN-YUAN HAN, RUI HU, LIU JI-QIANG: "Research Advance and Application of Molecular Inversion Probe Technology", BIOTECHNOLOGY BULLETIN, vol. 34, no. 10, 31 December 2018 (2018-12-31), pages 49 - 57, XP093081219, DOI: 10.13560/j.cnki.biotech.bull.1985.2018-0367 *

Also Published As

Publication number Publication date
CN116083423B (zh) 2024-04-30
CN116083423A (zh) 2023-05-09
CN115011594A (zh) 2022-09-06
CN115011594B (zh) 2023-10-20
EP4299758A1 (en) 2024-01-03

Similar Documents

Publication Publication Date Title
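WO2023221308A1 (zh) Liquid-phase hybridization capture method and kit therefor
CN107475375B (zh) DNA probe library for hybridizing with microsatellite loci associated with microsatellite instability, detection method, and kit
CN105506125B (zh) DNA sequencing method and next-generation sequencing library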
AU2021224760A1 (en) Capturing genetic targets using a hybridization approach
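CN106367485B (zh) Multi-positioning dual-tag adapter set for detecting gene mutations, and preparation method and use thereof
CN107002292B (zh) Method and reagent for constructing a double-adapter single-stranded circular nucleic acid library
CN108004301A (zh) Method for enriching gene target regions, and library construction kit
JP7232643B2 (ja) Deep sequencing profiling of tumors
CN111748551B (zh) Blocking sequence, capture kit, library hybridization capture method, and library construction method
CN109536579A (zh) Method for constructing a single-stranded sequencing library, and use thereof
CN109971827A (zh) Library construction method and library construction kit for plasma DNA
BRPI0714563B1 (pt) Methods for detecting the predisposition to or stage of a cancer in a mammalian subject, for screening, identifying, or optimizing an anti-cancer drug, for modifying a mammalian gene in vitro, and for evaluating the efficacy of an anti-cancer drug or candidate drug
CN109576346A (zh) Method for constructing a high-throughput sequencing library, and use thereof
CN109706219A (zh) Method for constructing a sequencing library, kit, sequencer loading method, and method for demultiplexing sequencing data
WO2023221307A1 (zh) Probe for targeted enrichment of nucleic acids
CN105734679A (zh) Method for preparing a nucleic acid target sequence capture sequencing library
CN111979307A (zh) Targeted sequencing method for detecting gene fusions
CN108359723B (zh) Method for reducing errors in deep sequencing
US7906326B2 (en) Bioinformatically detectable group of novel regulatory oligonucleotides associated with Alzheimer's disease and uses thereof
CN110205365B (zh) High-throughput sequencing method for efficiently studying the RNA interactome, and use thereof
CN108624686B (zh) Probe library, detection method, and kit for detecting BRCA1/2 mutations
CN112639127A (zh) Method for detecting and quantifying genetic alterations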
US20190218606A1 (en) Methods of reducing errors in deep sequencing
WO2022007863A1 (zh) Method for rapid enrichment of target gene regions
CN106636069B (zh) Method for constructing a Zizania latifolia cDNA library

Legal Events

Date Code Title Description
ENP Entry into the national phase Ref document number: 2022896845; Country of ref document: EP; Effective date: 20230601