WO2021232793A1 - Universal closed sequence and use thereof - Google Patents

Universal closed sequence and use thereof


Publication number
WO2021232793A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
universal
blocking
library
closed
Application number
PCT/CN2020/139918
Inventor
胡玉刚
汪彪
郑文莉
吴强
Original Assignee
纳昂达(南京)生物科技有限公司
Application filed by 纳昂达(南京)生物科技有限公司
Publication of WO2021232793A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/10: Type of nucleic acid

Definitions

  • the present invention relates to the field of high-throughput sequencing library construction, in particular to a universal closed sequence and its application.
  • MGI has successively launched the MGI-200, MGI-2000 and T7 sequencers.
  • T7 sequencer is currently the sequencer with the highest sequencing throughput and the lowest sequencing cost on the market.
  • targeted capture sequencing is also an effective way to achieve large-scale reduction of detection costs while detecting target sequences.
  • hybridization capture can effectively reduce the sequencing cost of the detection target; in addition, if the proportion of the captured target region can be increased during the hybridization blocking step, sequencing cost is saved further.
  • the main purpose of the present invention is to provide a universal closed sequence and its application to solve the problem of low hybridization and capture efficiency of double-ended index libraries in the prior art.
  • a universal blocking sequence which includes, connected in sequence in the 5' to 3' direction, a left non-tag region blocking sequence, a middle tag region blocking sequence and a right non-tag region blocking sequence, where the left non-tag region blocking sequence includes 5-7 LNA or BNA modified bases, the middle tag region blocking sequence is a universal blocking base sequence, the right non-tag region blocking sequence includes 7-10 LNA or BNA modified bases, and the 3' end of the right non-tag region blocking sequence carries a blocking modification.
  • the 3'end blocking modification is MGB modification, C3 spacer modification, phosphorylation modification, digoxigenin modification or biotin modification, or the 3'end base is a dideoxy base.
  • the universal blocking base is hypoxanthine or C3 spacer.
  • the universal blocking sequence is the blocking sequence of the P1 linker with the first tag sequence or the blocking sequence of the P2 linker with the second tag sequence of the MGI sequencing platform, where the blocking sequence of the P1 linker is SEQ ID NO: 3:
  • the blocking sequence of the P2 linker is SEQ ID NO: 4:
  • the universal blocking sequence is the blocking sequence of the P1 linker with the first tag sequence or the blocking sequence of the P2 linker with the second tag sequence of the MGI sequencing platform;
  • the blocking sequence of the P1 linker is SEQ ID NO: 5:
  • in order to achieve the above objective, according to the second aspect of the present invention, a capture kit is provided which includes a universal blocking sequence, the blocking sequence being any of the above universal blocking sequences.
  • the working concentration in the kit, calculated per library, is 0.4-0.8 μg of universal blocking sequence per 1 μg of library to be captured.
  • a library hybridization capture method includes using a capture kit to capture the library to be captured, the capture kit being any of the foregoing capture kits.
  • the step of using the capture kit to capture the library to be captured includes performing blocking with the blocking sequence and the library to be captured at a molar ratio of 10:1 to 20:1.
  • the method for constructing a library includes: constructing a fragmented library; hybridizing and capturing the fragmented library to obtain a captured library; performing PCR amplification on the captured library to obtain a sequencing library; where the hybridization capture uses any of the above capture kits or any of the above methods.
  • modifying 5-7 bases of the left non-tag region blocking sequence and 7-10 bases of the right non-tag region blocking sequence with LNA or BNA enhances the binding of the blocking sequence to the sequence to be blocked, and thus the blocking effect. The blocking modification at the 3' end of the right non-tag region blocking sequence allows the blocking sequence of the present application to reduce or avoid capture of non-target libraries during library capture and to improve the capture rate of the target library (also called the target rate of the target library).
  • Figure 1 shows a schematic diagram of the current MGI platform double-ended Index library construction process
  • Figure 2 shows a schematic diagram of how using a universal blocking sequence during capture improves the target sequence capture rate in the prior art
  • Figure 3 shows the statistics of crosstalk between libraries constructed with 12 double-ended Indexes in the double-ended Index library of the MGI platform in the prior art
  • Figures 4A and 4B show schematic diagrams of normal and abnormal blocking of the hybridization library by the improved universal blocking sequence
  • Figure 5 shows the effect of universal blocking sequences with different numbers of modifications and at different concentrations on the target sequence capture rate
  • Figure 6 shows the effect on blocking of universal blocking sequences with too few or too many modified bases
  • Figure 7 shows the effect on blocking of universal blocking sequences with the same number of modified bases at different positions
  • Figure 8 shows the effect of different library input amounts on blocking.
  • Double-ended Index adapter: for high-throughput sequencing, a universal sequencing adapter must be ligated to the end of each fragment. Each non-complementary region of the adapter contains a variable sequence region; this sequence is the Index sequence, which is used to split the data after sequencing.
  • Adapter blocking sequence: during library capture, all libraries carry identical or similar adapter sequences. During hybridization, the adapter portions of target and non-target fragments bind to each other, reducing the target rate. The adapter blocking sequence binds specifically to the adapter portion and thereby improves the target rate.
  • the universal blocking sequence is a sequence that can block the adapters carrying any of the different Indexes in the library.
  • C3 Spacer is mainly used to mimic the three-carbon spacing between the 3' and 5' hydroxyl groups of ribose, or to "replace" an unknown base in a sequence. In the middle of a nucleic acid sequence it serves mainly as a linker and cannot form complementary base pairs: it does not contribute the stabilizing effect of base pairing and only connects the bases before and after it.
  • the index length used for the MGI sequencing platform in this application is described taking 10 bp as an example.
  • the length of the tag blocking region for the index in this application can be adapted to block index adapters of different lengths by adjusting the number of hypoxanthine (I) bases or C3 spacers.
  • MGI first introduced a single-ended Index adapter library construction scheme. A significant problem with this scheme is crosstalk between samples. To solve it, MGI also launched a double-ended Index library construction scheme, which addresses the low-frequency crosstalk arising from adapter synthesis, experimental operation and the sequencing process: crosstalk data are filtered out by splitting the data on both Indexes.
  • Another effective way to reduce the cost of sequencing is to target capture sequencing.
  • the human genome is about 3 Gb, and gene-coding regions account for less than 2% of it (taking IDT's whole-exome V2 panel as an example).
  • for the detection regions of tumor-targeted drugs the cost difference is even greater, which better reflects the importance of targeted sequencing. Moreover, tumor mutations are low-frequency mutations, so two main issues need to be considered.
  • when crosstalk cannot be avoided, the double-ended Index is a necessary means of eliminating it; this is also the main significance of the double-ended Index.
  • Another condition for detecting low-frequency mutations is that the sequencing depth must be guaranteed, which generally requires a sequencing depth of several thousand to tens of thousands of times.
  • two key factors determine the target rate of targeted capture.
  • one is the specificity of the designed region: the designed probes must not fall in highly repetitive regions. The other is the adapter blocking effect during capture.
  • the probe sequence is determined by the sequence of the region to be captured, and highly repetitive regions are generally avoided when designing probes, so what can be improved is the blocking effect during targeted capture. If no blocking sequence is added during capture, both theory and practical tests show that the target rate (of the target fragment library) will not exceed 50%. As shown in Figure 2, when no adapter blocking sequence is added, the adapter portion binds to non-target libraries and reduces the target rate.
  • this application takes the double-ended Index linker library of the MGI platform as an example to explain the improved design of the universal blocking sequence and the achieved blocking effect, which can greatly increase the target rate and reduce the sequencing cost.
  • the index sequence carried by each numbered Index linker is a 10 bp variable region, which is used to distinguish different samples during pooled hybridization capture and pooled sequencing.
  • the purpose of the double-ended Index mentioned above is to remove crosstalk. As shown in Figure 3, 3-7 sequences in every 10,000 can be filtered out by the double-ended Index. Without the double-ended Index, mutation detection at frequencies of one in a thousand and below yields unreliable data, so the double-ended Index improves detection accuracy. To improve the target rate during hybridization capture, the present invention develops a universal blocking sequence for the double-ended Index of the MGI platform.
  • this universal sequence uses universal bases (hypoxanthine or C3 spacer) to play a blocking/occupying role.
  • the fixed sequence regions at both ends are used to increase the hybridization temperature.
  • Base modification and substitution: some bases in the fixed sequence are modified with LNA or BNA.
  • the tag sequence region consists of universal blocking bases, such as hypoxanthine (I), a run of C3 spacers of equal length, or a combination thereof;
  • to improve the binding efficiency of the universal blocking sequence, some bases in the non-tag sequence regions upstream and downstream of the tag are modified with LNA or BNA; the numbers of modified bases are 5-7 and 7-10, respectively, or between 20% and 40% of the bases.
  • the optimal working concentration of a universal blocking sequence is inversely related to its number of modified bases. With many modified bases the optimal concentration is relatively low, and too high a concentration has a negative effect; with fewer modified bases, a higher blocking sequence concentration is required to achieve the blocking effect.
  • the present invention designs a universal closed modified sequence based on the linker sequence characteristics of the MGI double-ended Index.
  • the two universal closed original sequences of MGI are as follows:
  • the part of N is the closed sequence of the Index sequence.
  • an index of 10 bp has the advantage of offering more choices when designing different indexes.
  • the disadvantage is that it makes the universal blocking sequence harder to design and less stable.
  • the bases in the Index region are designed as degenerate bases N, C3 spacers or hypoxanthine.
  • the resulting universal blocking sequence is less stable, which places additional demands on the universal blocking sequence at both ends of the Index.
  • the P1 (5+7 modified) end blocking sequence is SEQ ID NO: 3:
  • the P2 end blocking sequence is SEQ ID NO: 4:
  • the P1 end blocking sequence is SEQ ID NO: 5:
  • the P2 end blocking sequence is SEQ ID NO: 6:
  • the amount of library input also has an upper limit: calculated for a library with 250 bp inserts, it cannot exceed 6.5 μg (25 pmol/L, referring here to the single-ended sequence) in a single hybridization.
  • whole-exome testing found that, with 500 ng of each library in the hybridization, hybridizing 12 libraries gave better results than hybridizing 14-16 libraries.
  • excessive library input shortens the spatial distance between libraries.
  • two libraries and their universal blocking sequences then form the cross-star structure of two libraries and two blocking sequences shown in FIG. 4B, resulting in a decrease in capture efficiency.
  • a universal blocking sequence includes, in the 5' to 3' direction: a left non-tag region blocking sequence, a middle tag region blocking sequence and a right non-tag region blocking sequence. The left non-tag region blocking sequence includes 5-7 bases modified with LNA (locked nucleic acid) or BNA (bridged nucleic acid; 2',4'-BNA NC, i.e. 2'-O,4'-aminoethylene bridged nucleic acid, is a compound containing a six-membered bridged structure with an N-O bond), with mainly C bases being modified. The middle tag region blocking sequence is a universal blocking base sequence. The right non-tag region blocking sequence includes 7-10 LNA or BNA modified bases, and its 3' end carries a blocking modification.
  • modifying 5-7 bases of the left non-tag region blocking sequence and 7-10 bases of the right non-tag region blocking sequence with LNA or BNA significantly enhances binding to the sequence to be blocked and thereby increases the blocking effect. The blocking modification at the 3' end of the right non-tag region blocking sequence ensures that the excess adapters in the library cannot be used as primers to amplify the adapters of other libraries, which reduces or avoids non-specific capture of libraries and improves the target rate of target library capture.
  • the blocking modification at the 3' end of the above-mentioned right non-tag region blocking sequence can be an MGB modification, a C3 spacer modification, a 3' phosphorylation modification, a 3' digoxigenin modification or a 3' biotin modification, or the 3'-end base can be a dideoxy base.
  • C3 spacer modification it is preferred to use C3 spacer modification.
  • the above-mentioned tag region blocking sequence adopts a universal blocking base, and a base sequence that has weak binding ability to all four bases of A, T, C, and G can be used.
  • the universal blocking base is hypoxanthine (I) and/or C3 spacer.
  • the number of bases in the blocking sequence of the specific tag region is not limited to 10 bp, and can be set reasonably according to the number of bases in the sample tag in the library to be blocked. For example, it can also be 6bp, 7bp, 8bp, 9bp, 11bp or 12bp.
  • taking the blocking sequences of the P1 and P2 linkers of the aforementioned MGI platform as an example, the tag region blocking sequence is 10 C3 spacers or 10 hypoxanthines (I).
  • the advantage of hypoxanthine is that it has weak pairing ability with all bases, while the C3 spacer only occupies one base, which has no binding ability with the paired base and cannot play a stable role.
  • the number of bases modified with LNA or BNA in the left and right non-tag region blocking sequences is generally considered to be negatively correlated with the length of the respective sequence: longer sequences require fewer modified bases, while shorter sequences require more.
  • the inventors found that, for non-tag region blocking sequences of a specific length, the blocking sequence binds the target linker most strongly when the number of bases modified with LNA or BNA in the left or right non-tag region blocking sequence is 5 to 10. With fewer than 5 modified bases, binding to the target linker is unstable, which lowers the capture efficiency.
  • the total amount of library to be captured and the amount of universal blocking sequence added should also match; otherwise a cross-star structure can form, which leads to non-specific capture and reduces capture efficiency.
  • when 2.4 μg of universal blocking sequence is added and 12 libraries at 500 ng each are hybridized (6 μg in total), the hybridization capture effect is better than when 14-16 libraries are hybridized.
  • when each library is less than 500 ng, for example 400 ng, 2.4 μg of universal blocking sequence with 15 libraries hybridized at the same time gives the highest capture efficiency.
  • this application also provides a universal closed sequence that can inhibit the capture of the double-ended Index sequencing library of the MGI sequencing platform.
  • the P1 (5+7 modified) end blocking sequence is SEQ ID NO: 3:
  • the P2 end blocking sequence is SEQ ID NO: 4:
  • the P1 end blocking sequence is SEQ ID NO: 5:
  • the P2 end blocking sequence is SEQ ID NO: 6:
  • the universal blocking sequences provided by the above two preferred embodiments not only increase the number of modified bases to improve binding to the target linker; the specific positions of the modified bases in these sequences also give stronger binding to the target linker than modifications at other positions. That is, the above preferred universal blocking sequences have the best blocking effect on the target linker, and the capture efficiency of the target library during hybridization capture is the highest.
  • a capture kit which includes any of the aforementioned universal blocking sequences.
  • the universal blocking sequence in the capture kit has strong binding ability to the target linker, and when used for the capture library construction, it can achieve efficient capture of the target library.
  • the working concentration in the above kit is 0.4-0.8 μg of universal blocking sequence per 1 μg of library to be captured.
  • capturing at the above dosage further avoids the cross-star blocking structure that forms when the amount of library is excessive and that would reduce the target rate.
  • the above working concentration can differ with the specific blocking scheme. For example, with the blocking scheme of SEQ ID NO: 5 and SEQ ID NO: 6 of this application, capturing at a working concentration of 0.4 μg of universal blocking sequence per 1 μg of library to be captured gives a higher target rate for the target library. With the blocking scheme of SEQ ID NO: 3 and SEQ ID NO: 4 of this application, capturing at 0.8 μg of universal blocking sequence per 1 μg of library to be captured gives a relatively high target rate.
  • a library hybridization capture method is also provided.
  • the method includes using a capture kit to capture the library to be captured, and the capture kit uses the above-mentioned capture kit.
  • the blocking sequence in the capture kit can achieve efficient capture of the target library when the capture library is constructed.
  • the inventors also found that when the molar ratio of the universal blocking sequence to the library to be captured is 10:1-20:1, the blocking effect is better. Therefore, in a preferred embodiment of the present application, in the step of using the capture kit to capture the library to be captured, the blocking sequence and the library to be captured are blocked in a molar ratio of 10:1-20:1.
  • a method for constructing a library includes: constructing a fragmented library; hybridizing and capturing the fragmented library to obtain a capture library; and performing PCR amplification on the capture library to obtain a sequencing library; where the hybridization capture uses the above-mentioned capture kit or any of the above-mentioned methods.
  • the target library accounts for a relatively high proportion
  • the effective data produced by the library accounts for a high proportion.
  • DNA sample fragmentation --- end repair and A addition --- adapter ligation --- fragment screening --- PCR amplification --- library purification, quantification and quality inspection --- sequencing on the MGI platform, either directly or after targeted capture.
  • Steps: library construction was carried out with reference to the instructions of the NadPrep™ DNA Library Construction Kit (for MGI) (201909 Version 2.0). Hybridization and capture were carried out as follows. When multiple libraries are pooled, vacuum concentrated and then hybridized, the specific library mixing steps are as follows:
  • X represents C3 Spacer/hypoxanthine
  • +N represents LNA or BNA modified base (the two modification effects are equivalent, taking LNA modification as an example)
  • /3SpC3/ represents the 3'C3 spacer arm closure.
  • Dynabeads™ M-270 Streptavidin Beads are vortexed to mix uniformly; the streptavidin magnetic beads can be washed and used for capture only after equilibrating for 30 minutes at room temperature.
  • the subsequent PCR amplification and library purification and quantification steps can be performed in accordance with the instructions of the NadPrep™ DNA Library Construction Kit (for MGI) (201909 Version 2.0).
  • Example 2: the steps of Example 2 are the same as those of Example 1; the only difference is the number of modified bases in the universal blocking sequence used.
  • the number of modified bases of the universal blocking sequence is shown in the following table:
  • the P1 blocking sequence is SEQ ID NO: 8
  • the P2 blocking sequence is SEQ ID NO: 9
  • Scheme 3 leaves the modification of P1 unchanged relative to Scheme 1 and reduces the number of blocking modified bases at each end of the P2 tag by one, giving 4+6 modifications;
  • Scheme 4, on the basis of Scheme 3, adds one blocking modified base to the universal blocking sequence at each end of the tag.
  • the blocking effect is clearly inferior to the 5+7 combination of Scheme 1.
  • the 8+11 modification of Scheme 4 is also less effective than the 7+10 modification of Scheme 2; the results are shown in Figure 6. Therefore, a left non-tag region blocking sequence with 5-7 modified bases and a right non-tag region blocking sequence with 7-10 modified bases is the better solution.
  • Example 3: the steps of Example 3 are the same as those of Example 1.
  • the number of modified bases in the universal blocking sequence is the same as in Scheme 2 of Example 1; the only difference is the positions of the modified bases.
  • the specific sequence is as follows:
  • Blocking scheme 5 P1 blocking sequence is SEQ ID NO: 10
  • the P2 blocking sequence is SEQ ID NO: 11
  • Blocking scheme 6 P1 blocking sequence is SEQ ID NO: 12
  • blocking scheme 5 keeps the same number of modifications as Scheme 2 but changes their positions.
  • Scheme 5 removes one modification at each outer end and adds a modification at a position closer to the middle tag; blocking scheme 6 is just the opposite.
  • in Scheme 6, the modified bases at the two outer ends are increased and the modifications closer to the middle tag are reduced.
  • the universal blocking sequences of Scheme 5, Scheme 6 and Scheme 2 were used at a concentration of 100 μmol/L and hybridized with the same library input. The results show that Scheme 5 and Scheme 2 perform similarly while Scheme 6 performs worse (Figure 7), which shows that both the number and the positions of the modified bases affect blocking.
  • the present invention finds that, for a balanced modification, adding modifications in the region closer to the middle tag sequence is significantly better than increasing the number of modifications at the two ends.
  • the experimental procedures for library construction and hybridization are the same as in Example 1.
  • the universal blocking sequence selected has 7+10 base modifications and is used at a concentration of 100 μmol/L.
  • the difference is that multiple libraries are mixed and hybridized.
  • the specific library input quantity and total library input are as follows:
  • when the input for hybridization capture is 500 ng per library, the sequencing metrics are better. If a single hybridization can accept more input, that is, more samples per hybridization, the hybridization capture cost per sample falls.
  • testing of the universal blocking sequence in this application found that the hybridization metrics are good when no more than 12 samples are hybridized together, and that the target rate declines to some extent when 14 or 16 libraries are hybridized. As shown in Figure 8, the total inputs for 14 and 16 libraries are 7 μg and 8 μg, respectively. In a fixed capture system, as the number of libraries increases, some libraries can form the cross-star structure of two libraries and two universal blocking sequences, which lowers the target rate.
  • the present invention designs a universal blocking sequence for hybridization of double-ended Index libraries of the MGI platform: the variable Index region is replaced with universal bases, and base modifications that raise the annealing temperature are added to the fixed sequences on both sides of the Index.
  • these base modifications improve the capture efficiency of the double-ended Index library.
  • this application also finds that the blocking effect is better when the number of modified bases that raise the annealing temperature is kept at 5-7 on the left and 7-10 on the right.
  • when the specific positions of the modified bases are further optimized to the preferred positions in the examples of this application, the blocking effect is the best.
  • the concentration of the universal blocking sequence also affects the target rate of the target library.
  • when the amount of universal blocking sequence is 0.4-0.8 μg per 1 μg of library to be captured, the application can support hybridizing 12 samples together, which greatly reduces the hybridization capture cost per sample.
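
The dosage figures above (0.4-0.8 μg of universal blocking sequence per 1 μg of library, 500 ng per library, and a stated ceiling of 6.5 μg total library for 250 bp inserts) can be turned into a small calculation. The following Python snippet is only an illustrative sketch: the function and constant names are hypothetical and not part of the patent, and it works purely on mass, ignoring the 10:1 to 20:1 molar ratio, which would require oligo and library molecular weights that this text does not give.

```python
# Illustrative sketch only; names are hypothetical, values are taken from the text above.
MAX_TOTAL_LIBRARY_UG = 6.5      # stated upper limit per hybridization (250 bp inserts)
LOW_RATIO_UG_PER_UG = 0.4       # lower bound: ug blocker per ug library
HIGH_RATIO_UG_PER_UG = 0.8      # upper bound: ug blocker per ug library


def blocker_mass_range_ug(n_libraries, ng_per_library):
    """Return (total_library_ug, min_blocker_ug, max_blocker_ug) for a pooled hybridization."""
    if n_libraries <= 0 or ng_per_library <= 0:
        raise ValueError("library count and input mass must be positive")
    total_ug = n_libraries * ng_per_library / 1000.0
    low = round(total_ug * LOW_RATIO_UG_PER_UG, 3)
    high = round(total_ug * HIGH_RATIO_UG_PER_UG, 3)
    return total_ug, low, high


def within_stated_limit(total_library_ug):
    """True if the pooled library mass does not exceed the stated 6.5 ug ceiling."""
    return total_library_ug <= MAX_TOTAL_LIBRARY_UG


if __name__ == "__main__":
    for n in (12, 14, 16):
        total, low, high = blocker_mass_range_ug(n, 500)
        status = "within" if within_stated_limit(total) else "above"
        print(f"{n} libraries: {total} ug library, {low}-{high} ug blocker, {status} the stated limit")
```

Under these assumptions, 12 libraries at 500 ng stay below the 6.5 μg ceiling while 14 and 16 libraries exceed it, which matches the trend the text reports for Figure 8.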

Abstract

Provided in the present invention are a universal closed sequence and a use thereof. The universal closed sequence comprises a left non-tag region closed sequence, a middle tag region closed sequence and a right non-tag region closed sequence which are sequentially connected from 5' to 3'. The left non-tag region closed sequence comprises 5-7 LNA or BNA modified bases, the middle tag region closed sequence is a universal closed base sequence, the right non-tag region closed sequence comprises 7-10 LNA or BNA modified bases, and the 3' end of the right non-tag region closed sequence is provided with a closed modification. The structure can significantly enhance the binding capacity with a sequence to be closed, thereby improving the closing effect, and meanwhile can reduce or avoid non-specific capture of a library, thereby improving the target rate of target library capture.
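
The three-part structure described in the abstract can be illustrated with a short Python sketch that assembles a blocker string and checks the modification counts. It uses the notation this document gives elsewhere (`+N` for an LNA or BNA modified base, `X` for a universal base such as hypoxanthine or C3 spacer, `/3SpC3/` for the 3' C3 spacer block). The flanking sequences below are invented placeholders, not SEQ ID NO: 3-6, and the function names are hypothetical.

```python
# Illustrative sketch; flanking sequences are invented placeholders, not the patent's sequences.
THREE_PRIME_BLOCK = "/3SpC3/"   # 3' C3 spacer blocking modification
UNIVERSAL_BASE = "X"            # stands for hypoxanthine (I) or C3 spacer


def count_modified(region):
    """Count modified bases, each written as '+' followed by the base letter."""
    return region.count("+")


def build_blocker(left, tag_length, right):
    """Assemble 5'-left region, universal tag region, right region, 3' block."""
    if tag_length <= 0:
        raise ValueError("tag length must be positive")
    return f"{left}{UNIVERSAL_BASE * tag_length}{right}{THREE_PRIME_BLOCK}"


def modification_counts_ok(left, right):
    """Check the 5-7 (left) and 7-10 (right) modified-base ranges from the abstract."""
    return 5 <= count_modified(left) <= 7 and 7 <= count_modified(right) <= 10


if __name__ == "__main__":
    left = "A+CG+CT+CA+CG+C"            # placeholder, 5 modified bases
    right = "T+CA+CG+CT+CA+CG+CT+C"     # placeholder, 7 modified bases
    print(build_blocker(left, 10, right))
    print("counts in range:", modification_counts_ok(left, right))
```

The tag length is a parameter because the text notes that the tag region need not be 10 bp and can be set to match the index length of the library being blocked.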

Description

Universal blocking sequence and use thereof
Technical field
The present invention relates to the field of high-throughput sequencing library construction, and in particular to a universal blocking sequence and use thereof.
Background
As high-throughput sequencing becomes more important for assisted diagnosis in clinical applications, reducing sequencing cost has become a key problem. Cost reduction shows up at several levels. MGI has kept releasing sequencers with higher throughput and steadily lower sequencing cost, successively launching the MGI-200, MGI-2000 and T7 sequencers; of these, the T7 currently has the highest throughput and the lowest sequencing cost on the market. In addition, targeted capture sequencing is an effective way to detect target sequences while greatly reducing detection cost.
During sequencing, different samples are distinguished by different Index sequences so that multiple samples can be pooled and sequenced together, which is another way high-throughput sequencing lowers per-sample cost. With a single-ended Index, however, contamination and/or crosstalk inevitably arises at every stage: synthesis of the Index adapters or primers, library construction, and sequencing. A way to resolve low-frequency crosstalk between samples is therefore needed. The current solution is to distinguish samples with a double-ended Index, which effectively removes crosstalk between samples.
Hybridization capture effectively reduces the cost of sequencing the detection target, and raising the proportion of captured target regions during hybridization blocking saves further sequencing cost. However, there is currently no effective solution for efficiently capturing sample libraries that carry an index at both ends.
Summary of the invention
The main purpose of the present invention is to provide a universal blocking sequence and use thereof, to solve the prior-art problem of low hybridization capture efficiency for double-ended index libraries.
To achieve the above purpose, according to one aspect of the present invention, a universal blocking sequence is provided. In the 5' to 3' direction, the universal blocking sequence comprises, connected in sequence, a left non-tag region blocking sequence, a middle tag region blocking sequence and a right non-tag region blocking sequence, wherein the left non-tag region blocking sequence comprises 5-7 LNA- or BNA-modified bases, the middle tag region blocking sequence is a sequence of universal blocking bases, the right non-tag region blocking sequence comprises 7-10 LNA- or BNA-modified bases, and the 3' end of the right non-tag region blocking sequence carries a blocking modification.
Further, the blocking modification at the 3' end is an MGB modification, a C3 spacer modification, a phosphorylation modification, a digoxigenin modification or a biotin modification, or the 3' terminal base is a dideoxy base.
Further, the universal blocking base is hypoxanthine or a C3 spacer.
Further, the universal blocking sequence is a blocking sequence for the P1 adapter carrying the first tag sequence, or a blocking sequence for the P2 adapter carrying the second tag sequence, of the MGI sequencing platform, wherein the blocking sequence for the P1 adapter is SEQ ID NO: 3:
CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the C3 spacer modification at the 3' end;
The blocking sequence for the P2 adapter is SEQ ID NO: 4:
GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the C3 spacer modification at the 3' end.
Further, the universal blocking sequence is a blocking sequence for the P1 adapter carrying the first tag sequence, or a blocking sequence for the P2 adapter carrying the second tag sequence, of the MGI sequencing platform;
The blocking sequence for the P1 adapter is SEQ ID NO: 5:
CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the C3 spacer modification at the 3' end;
The blocking sequence for the P2 adapter is SEQ ID NO: 6:
GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the C3 spacer modification at the 3' end.
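The "+" and X notation used for SEQ ID NO: 3 to 6 can be checked mechanically. The following is a minimal Python sketch and not part of the application: the function names are hypothetical, and it assumes only that each "+" marks one LNA- or BNA-modified base and that the run of X separates the left and right non-tag regions. It counts the modifications on each side and compares them with the stated ranges of 5-7 and 7-10.

```python
import re
from typing import Tuple

# Sequences copied from the text; "+" = LNA/BNA mark, X = universal base,
# /3SpC3/ = 3' C3 spacer.
BLOCKERS = {
    "SEQ ID NO: 3": "CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/",
    "SEQ ID NO: 4": "GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/",
    "SEQ ID NO: 5": "CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/",
    "SEQ ID NO: 6": "GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/",
}


def count_modifications(oligo: str) -> Tuple[int, int]:
    """Return the number of "+" marks left and right of the X run."""
    body = oligo.split("/", 1)[0]                 # drop the trailing /3SpC3/ tag
    left, right = re.split(r"X+", body, maxsplit=1)
    return left.count("+"), right.count("+")


def within_stated_ranges(oligo: str) -> bool:
    """True if the counts fall in the stated 5-7 (left) and 7-10 (right) ranges."""
    left, right = count_modifications(oligo)
    return 5 <= left <= 7 and 7 <= right <= 10


if __name__ == "__main__":
    for name, oligo in BLOCKERS.items():
        print(name, count_modifications(oligo), within_stated_ranges(oligo))
```

Counting only the "+" marks keeps the sketch independent of which neighbouring base the mark belongs to, which the text does not state.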
To achieve the above purpose, according to a second aspect of the present invention, a capture kit is provided. The capture kit comprises a universal blocking sequence, which is any one of the universal blocking sequences described above.
Further, the working concentration of the universal blocking sequence in the kit, calculated per library, is 0.4-0.8 μg of universal blocking sequence per 1 μg of library to be captured.
According to a third aspect of the present invention, a library hybridization capture method is provided. The method comprises capturing the library to be captured with a capture kit, the capture kit being any one of the capture kits described above.
Further, the step of capturing the library to be captured with the capture kit comprises blocking with the blocking sequence and the library to be captured at a molar ratio of 10:1 to 20:1.
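The two dosing rules above (0.4-0.8 μg of blocking sequence per 1 μg of library, and a molar ratio of 10:1 to 20:1) can be written as simple range calculations. This is an illustrative Python sketch with hypothetical function names. The text states the two rules separately, so the sketch treats them independently and does not convert between mass and moles.

```python
from typing import Tuple


def blocker_mass_range_ng(library_ng: int) -> Tuple[int, int]:
    """Mass rule: 0.4-0.8 ug of blocking sequence per 1 ug of library.

    Integer nanograms are used to avoid floating-point rounding.
    """
    if library_ng < 0:
        raise ValueError("library mass must be non-negative")
    return library_ng * 4 // 10, library_ng * 8 // 10


def blocker_pmol_range(library_pmol: float) -> Tuple[float, float]:
    """Molar rule: blocking sequence to library at 10:1 to 20:1."""
    if library_pmol < 0:
        raise ValueError("library amount must be non-negative")
    return 10 * library_pmol, 20 * library_pmol


if __name__ == "__main__":
    print(blocker_mass_range_ng(1000))   # one library of 1 ug
    print(blocker_mass_range_ng(6000))   # a 6 ug pool
    print(blocker_pmol_range(2.0))       # hypothetical 2 pmol of library
```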
According to a fourth aspect of the present invention, a library construction method is provided, comprising: constructing a fragmented library; performing hybridization capture on the fragmented library to obtain a captured library; and performing PCR amplification on the captured library to obtain a sequencing library; wherein the hybridization capture is performed with any one of the capture kits described above or by any one of the methods described above.
By applying the technical solution of the present invention, LNA or BNA modification of 5-7 bases in the left non-tag region blocking sequence and 7-10 bases in the right non-tag region blocking sequence strengthens binding to the sequence to be blocked and thereby improves the blocking effect. The blocking modification at the 3' end of the right non-tag region blocking sequence allows the blocking sequence of the present application to reduce or avoid capture of non-target libraries during library capture, raising the capture rate of the target library (also called the on-target rate of the target library).
Description of the drawings
The accompanying drawings, which form part of the present application, provide a further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Figure 1 shows a schematic diagram of the current double-ended Index library construction workflow on the MGI platform;
Figure 2 shows a schematic diagram of the principle by which capture with a universal blocking sequence in the prior art improves the on-target rate of target sequence capture;
Figure 3 shows statistics of the crosstalk between libraries constructed with 12 double-ended Indexes on the MGI platform in the prior art;
Figures 4A and 4B show schematic diagrams of normal blocking and abnormal blocking of a hybridization library by the improved universal blocking sequence;
Figure 5 shows the effect of universal blocking sequences with different numbers of modifications and at different concentrations on the on-target rate of target sequence capture;
Figure 6 shows the effect on blocking of universal blocking sequences with too few or too many modified bases;
Figure 7 shows the effect on blocking of universal blocking sequences with the same number of modified bases at different positions;
Figure 8 shows the effect of different library input amounts on blocking.
Detailed description
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present invention is described in detail below with reference to embodiments.
Explanation of terms:
Double-ended Index adapter: high-throughput sequencing requires a universal sequencing adapter ligated to the end of each fragment. Each non-complementary region of the adapter contains a variable sequence region, the Index sequence, which is used to split (demultiplex) the data after sequencing.
Adapter blocking sequence: during library capture, every library carries identical or similar adapter sequences. During hybridization, the adapter portions of target and non-target fragments bind to each other, lowering the on-target rate. An adapter blocking sequence binds specifically to the adapter portion and thereby raises the on-target rate. A universal blocking sequence is one that can block adapters carrying any of the different Indexes in the library.
C3 spacer: the C3 spacer is mainly used to mimic the three-carbon spacing between the 3' and 5' hydroxyl groups of ribose, or to "replace" an unknown base in a sequence. Within a nucleic acid sequence it acts mainly as a linker: it cannot pair with the complementary base and so does not stabilize the duplex; it only connects the bases on either side.
It should be noted that the index length of the MGI sequencing platform is described in the present application taking 10 bp as an example. The length of the tag-region blocking segment for the index can be adapted to index adapters of different lengths by adjusting the length of the hypoxanthine (I) or C3 spacer stretch.
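The length adjustment described above amounts to changing the number of universal positions between two fixed flanks. A minimal Python sketch follows, assuming the X notation used for SEQ ID NO: 3 and using its flanking regions as an example; the function name is hypothetical, and the sketch says nothing about whether a given flank design remains suitable for a different adapter.

```python
# Flanks taken from SEQ ID NO: 3 in the text ("+" = LNA/BNA mark).
LEFT_FLANK = "CTCTCA+GTACG+TCA+GCA+GT+T"
RIGHT_FLANK = "CA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT"


def build_blocker(left: str, right: str, index_len: int,
                  three_prime_block: str = "/3SpC3/") -> str:
    """Assemble left flank + index_len universal positions (X) + right flank.

    X stands for hypoxanthine or a C3 spacer, as in the text.
    """
    if index_len < 1:
        raise ValueError("index length must be at least 1")
    return f"{left}{'X' * index_len}{right}{three_prime_block}"


if __name__ == "__main__":
    for n in (6, 8, 10):
        print(n, build_blocker(LEFT_FLANK, RIGHT_FLANK, n))
```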
As mentioned in the Background, current high-throughput sequencers must lower sequencing cost, and per-run throughput keeps rising. To save sequencing cost, MGI has launched the T7 sequencer, which currently has the highest throughput and the lowest sequencing cost on the market. Higher throughput means many samples must be pooled in a single run; high-throughput sequencing achieves this by ligating a different Index adapter to each sample so that multiple samples can be sequenced together. MGI first introduced a single-ended Index library construction scheme, which has one significant problem: crosstalk between samples. To address this, MGI also introduced a double-ended Index library construction scheme. It resolves the low-frequency crosstalk caused by adapter synthesis, experimental handling and the sequencing process, because splitting the data on both Indexes filters out the crosstalk reads.
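The filtering idea can be illustrated with a small Python sketch: a read is kept only when its pair of Index sequences matches a pair assigned to a sample, so a read carrying one Index from sample A and the other from sample B is discarded. The index sequences, sample names and function name below are hypothetical, and the sketch uses exact matching only, whereas real demultiplexing software usually tolerates mismatches.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

IndexPair = Tuple[str, str]


def demultiplex(reads: Iterable[IndexPair],
                sample_sheet: Dict[IndexPair, str]) -> Tuple[Counter, int]:
    """Assign reads to samples by exact (index1, index2) match.

    Returns per-sample read counts and the number of discarded reads,
    i.e. reads whose index combination is not in the sample sheet.
    """
    assigned: Counter = Counter()
    discarded = 0
    for pair in reads:
        sample = sample_sheet.get(pair)
        if sample is None:
            discarded += 1          # unexpected combination: treated as crosstalk
        else:
            assigned[sample] += 1
    return assigned, discarded


if __name__ == "__main__":
    # Hypothetical 10-nt indexes for two samples.
    sheet = {
        ("ACGTACGTAC", "TTGCAAGGCC"): "sample_A",
        ("GGATCCTTAG", "CAGTCAGTCA"): "sample_B",
    }
    reads = [
        ("ACGTACGTAC", "TTGCAAGGCC"),   # sample_A
        ("GGATCCTTAG", "CAGTCAGTCA"),   # sample_B
        ("ACGTACGTAC", "CAGTCAGTCA"),   # index1 of A with index2 of B
    ]
    print(demultiplex(reads, sheet))
```

With a single Index, the third read would have been assigned to sample A; requiring both Indexes to agree is what removes it.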
However, capture sequencing of such double-ended index libraries still suffers from low capture efficiency for libraries containing the target fragments. To further improve the capture efficiency of double-ended Index libraries, the present application builds on the double-ended Index library construction scheme launched by Nanodigmbio (纳昂达) for MGI sequencers (the workflow is shown in Figure 1) and proposes a corresponding improvement to hybridization capture.
Another effective way to reduce sequencing cost is targeted capture sequencing. The human genome is about 3 Gb, and gene-coding regions account for less than 2% of it; IDT's whole-exome V2 panel, for example, covers roughly 20,000 human genes in about 34 Mb. The cost of one whole-genome sequencing run can therefore pay for about 10 whole-exome sequencing runs. For the regions tested to guide targeted tumor therapy, the cost gap relative to whole-genome sequencing is even larger, which underscores the importance of targeted sequencing. Moreover, tumor mutations are low-frequency mutations, which raises two main requirements. First, low-frequency mutation detection cannot tolerate crosstalk between samples; if crosstalk cannot be avoided, there must be a way to remove the crosstalk data. The double-ended Index is the necessary means of removing crosstalk when it cannot be avoided, and this is its main value. Second, detecting low-frequency mutations requires guaranteed sequencing depth, generally several thousand-fold to tens of thousands-fold.
In targeted sequencing, two key factors determine the on-target rate of capture. One is the specificity of the designed region: probes must not fall in highly repetitive regions. The other is the adapter blocking effect during capture. Probe sequences are dictated by the region to be captured, and highly repetitive regions are generally avoided during probe design, so what can be improved is the blocking effect during targeted capture. If no blocking sequence is added during capture, both theory and actual tests show that the on-target rate (of the target fragment library) does not exceed 50%. As shown in Figure 2, without an adapter blocking sequence the adapter portions bind non-target libraries and lower the on-target rate. Provided the capture probe region has good specificity, blocking sequences of different grades give on-target rates ranging from 45% to 90%. The present application therefore takes the double-ended Index adapter library of the MGI platform as an example to explain the improved universal blocking sequence and the blocking effect it achieves, which substantially raises the on-target rate and thereby lowers sequencing cost.
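The link between on-target rate and sequencing cost can be made concrete with a back-of-the-envelope estimate: to a first approximation, the raw data needed equals target size times mean depth divided by on-target rate. The Python sketch below uses the 34 Mb target and the 45% and 90% on-target rates mentioned in the text; the 100-fold depth is an illustrative assumption, and the estimate ignores duplicates and uneven coverage.

```python
def raw_bases_needed(target_bp: float, mean_depth: float,
                     on_target_rate: float) -> float:
    """Approximate raw bases to sequence for a given mean on-target depth."""
    if not 0 < on_target_rate <= 1:
        raise ValueError("on_target_rate must be in (0, 1]")
    return target_bp * mean_depth / on_target_rate


if __name__ == "__main__":
    target_bp = 34e6      # about 34 Mb exome target, from the text
    depth = 100           # illustrative depth, not taken from the text
    low = raw_bases_needed(target_bp, depth, 0.45)
    high = raw_bases_needed(target_bp, depth, 0.90)
    print(f"45% on-target: {low / 1e9:.2f} Gb raw data")
    print(f"90% on-target: {high / 1e9:.2f} Gb raw data")
    print(f"ratio: {low / high:.1f}x")
```

Under these assumptions, moving from 45% to 90% on-target halves the raw data required for the same depth.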
Each end of an MGI double-ended Index library carries an adapter sequence. The index sequence carried by each numbered Index adapter is a 10 bp variable region used to distinguish samples in pooled capture and pooled sequencing. As noted above, the purpose of the double-ended Index is to remove crosstalk: as shown in Figure 3, the double-ended Index filters out 3-7 reads per 10,000. Without a double-ended Index, mutation calls at frequencies of one in a thousand and below are unreliable; the double-ended Index improves detection accuracy. To raise the on-target rate during hybridization capture, the present invention develops a universal blocking sequence for the MGI double-ended Index. One feature of this universal sequence is that the 10 bp Index region uses universal bases (hypoxanthine or C3 spacer) for blocking and place-holding. To improve blocking, the fixed-sequence regions on both sides carry modified bases that raise the hybridization temperature: some bases of the fixed sequence are LNA- or BNA-modified. The specific features and requirements of the double-ended Index universal blocking sequence are as follows:
(1) The tag sequence region consists of universal blocking bases, such as hypoxanthine (I), spacers such as the C3 spacer, or a combination thereof.
(2) To strengthen the binding efficiency of the universal blocking sequence, some bases in the non-tag sequence regions upstream and downstream of the tag region are LNA- or BNA-modified, and the numbers of modified bases are 5-7 and 7-10 respectively, or 20%-40%.
(3) The optimal working concentration of the universal blocking sequence is inversely related to the number of modified bases: with more modified bases the optimal concentration is relatively low, and too high a concentration is counterproductive; with fewer modified bases, a higher blocking sequence concentration is needed to achieve blocking.
Based on the adapter sequence characteristics of the MGI double-ended Index, the present invention designs universal blocking sequences with modifications. The two original MGI universal blocking sequences are as follows:
Figure PCTCN2020139918-appb-000001
The N portion is the blocking sequence for the Index sequence. Compared with 6 bp and 8 bp Indexes, a 10 bp Index offers more choices when designing distinct Indexes, but it makes universal blocking sequence design harder and less stable. When the universal blocking sequence is designed, the Index-region bases are designed as degenerate base N, C3 spacer or hypoxanthine, and the resulting universal blocking sequence is less stable. This requires more modified bases in the blocking sequence on both sides of the Index, so the present application raises the number of blocking-enhancing modified bases from 3-6 to 5-10.
In a preferred embodiment of the present application, the P1-end blocking sequence (5+7 modifications) is SEQ ID NO: 3:
CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
In a preferred embodiment, the P2-end blocking sequence is SEQ ID NO: 4:
GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
In another preferred embodiment of the present application, the P1-end blocking sequence is SEQ ID NO: 5:
CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
In a preferred embodiment, the P2-end blocking sequence is SEQ ID NO: 6:
GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
Further study found that when the number of modified bases is large and the concentration of added universal blocking sequence is also high, blocking is actually worse. As shown in Figures 4A and 4B, when the blocking sequence carries many modified bases, strong binding forms in local regions. In most cases the blocking sequence blocks the library sequence correctly, as in Figure 4A; but cross-shaped blocking structures also form, pulling down a non-target sequence and lowering the on-target rate of the library, as in Figure 4B.
It was further found that the amount of library added to a hybridization also has an upper limit. For a library with a 250 bp insert, a single hybridization should not exceed 6.5 μg (25 pmol/L; this refers to single-ended sequences). In whole-exome tests with 500 ng of each library per hybridization, hybridizing 12 libraries gave better results than hybridizing 14-16 libraries. When too much library is put into a hybridization, the spatial distance between library molecules shrinks, and some pairs of libraries and universal blocking sequences form the cross-shaped structure of two libraries and two blocking sequences shown in Figure 4B, lowering capture efficiency.
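The pool sizes quoted above can be checked against the 6.5 μg limit with simple arithmetic. The following Python sketch uses hypothetical names and integer nanograms; it applies only the mass limit and does not model the capture results themselves.

```python
MAX_POOL_NG = 6500   # per-hybridization limit quoted in the text (250 bp insert)


def pool_total_ng(n_libraries: int, per_library_ng: int) -> int:
    """Total library mass in one hybridization, in nanograms."""
    return n_libraries * per_library_ng


def fits_single_hybridization(n_libraries: int, per_library_ng: int,
                              cap_ng: int = MAX_POOL_NG) -> bool:
    """True if the pooled library mass does not exceed the limit."""
    return pool_total_ng(n_libraries, per_library_ng) <= cap_ng


if __name__ == "__main__":
    for n in (12, 14, 16):
        total = pool_total_ng(n, 500)
        print(n, "libraries:", total, "ng, within limit:",
              fits_single_hybridization(n, 500))
```

At 500 ng per library, 12 libraries total 6.0 μg and stay under the limit, while 14 and 16 libraries total 7.0 μg and 8.0 μg and exceed it, which agrees with the reported observation.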
Based on the above research and findings, the applicant proposes the technical solution of the present application. In a typical embodiment, a universal blocking sequence is provided which, in the 5' to 3' direction, comprises, connected in sequence, a left non-tag region blocking sequence, a middle tag region blocking sequence and a right non-tag region blocking sequence. The left non-tag region blocking sequence comprises 5-7 bases modified with LNA (locked nucleic acid) or BNA (bridged nucleic acid; 2',4'-BNA NC, i.e. 2'-O,4'-aminoethylene bridged nucleic acid, a compound containing a six-membered bridged structure with an N-O bond); the LNA or BNA modification is applied mainly to C bases. The middle tag region blocking sequence is a sequence of universal blocking bases. The right non-tag region blocking sequence comprises 7-10 LNA- or BNA-modified bases, and its 3' end carries a blocking modification.
LNA or BNA modification of 5-7 bases in the left non-tag region blocking sequence and 7-10 bases in the right non-tag region blocking sequence markedly strengthens binding to the sequence to be blocked and thus improves blocking. The blocking modification at the 3' end of the right non-tag region blocking sequence ensures that excess adapters in the library cannot serve as primers to amplify the adapters of other libraries, which reduces or avoids non-specific capture of libraries and raises the on-target rate of target library capture.
The blocking modification at the 3' end of the right non-tag region blocking sequence may be an MGB modification, a C3 spacer modification, a 3' phosphorylation, a 3' digoxigenin modification or a 3' biotin modification, or the 3' terminal base may be a dideoxy base. In the present application, the C3 spacer modification is preferred.
The universal blocking bases used in the tag region blocking sequence may be any bases that bind weakly to all four bases A, T, C and G. In a preferred embodiment of the present application, the universal blocking base is hypoxanthine (I) and/or a C3 spacer. The number of bases in the tag region blocking sequence is not limited to 10 bp and can be set according to the number of bases in the sample tag of the library to be blocked; for example, it may be 6 bp, 7 bp, 8 bp, 9 bp, 11 bp or 12 bp.
Taking the blocking sequences for the P1 and P2 adapters of the MGI platform described above as an example, the tag region blocking sequence is 10 C3 spacers or 10 hypoxanthines (I). The advantage of hypoxanthine is that it pairs weakly with all bases, whereas a C3 spacer merely occupies one base position, has no binding ability toward the opposing base, and provides no stabilization.
Regarding the number of LNA- or BNA-modified bases in the left and right non-tag region blocking sequences, it is conventionally thought that the number of modified bases correlates negatively with the length of the region: longer sequences need fewer modified bases and shorter sequences need more. In the present application, however, the inventors found that, for a non-tag region blocking sequence of a given length, binding of the blocking sequence to the target adapter is strongest when the left or right non-tag region blocking sequence carries 5-10 LNA- or BNA-modified bases. With fewer than 5 modified bases, binding to the target adapter is unstable and capture efficiency is low.
In addition, when the above universal blocking sequence is used for library capture, the total amount of library to be captured should match the amount of universal blocking sequence added. Too much library readily leads to hybridization into star-shaped structures, causing non-specific capture and lowering capture efficiency. For example, with 2.4 μg of universal blocking sequence and 500 ng per library, hybridizing 12 libraries (6 μg in total) gives better capture than hybridizing 14-16 libraries. With less than 500 ng per library, for example 400 ng, hybridizing 15 libraries with 2.4 μg of universal blocking sequence gives the highest capture efficiency.
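Both pool sizes in this example correspond to 6 μg of library for 2.4 μg of blocking sequence, that is, the lower bound of 0.4 μg of blocking sequence per 1 μg of library. Reading the example this way is an interpretation rather than something the text states. Under that assumption, the largest pool for a given amount of blocking sequence can be sketched in Python as follows; the names are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Assumed lower bound: 0.4 ug blocking sequence per 1 ug library.
MIN_RATIO = Fraction(2, 5)


def max_libraries(blocker_ng: int, per_library_ng: int) -> int:
    """Largest number of libraries that keeps blocker/library >= MIN_RATIO."""
    if blocker_ng < 0 or per_library_ng <= 0:
        raise ValueError("masses must be positive")
    max_library_ng = Fraction(blocker_ng) / MIN_RATIO
    return int(max_library_ng // per_library_ng)


if __name__ == "__main__":
    print(max_libraries(2400, 500))   # 500 ng per library
    print(max_libraries(2400, 400))   # 400 ng per library
```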
For the dual-index adapters of the MGI platform, this application also provides universal blocking sequences capable of blocking dual-index sequencing libraries of the MGI sequencing platform during capture. In a preferred embodiment of the present application, the P1-end blocking sequence (5+7 modifications) is SEQ ID NO: 3:
CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end. In a preferred embodiment, the P2-end blocking sequence is SEQ ID NO: 4:
GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
In another preferred embodiment of the present application, the P1-end blocking sequence is SEQ ID NO: 5:
CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end. In a preferred embodiment, the P2-end blocking sequence is SEQ ID NO: 6:
GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/; where + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is the blocking modification at the 3' end.
The universal blocking sequences of the two preferred embodiments above not only carry more modified bases, which strengthens binding to the target adapter, but also place the modified bases at positions that bind the target adapter more strongly than modifications at other positions would. In other words, these preferred universal blocking sequences block the target adapter most effectively and give the highest capture efficiency for the target library during hybrid capture.
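The "5+7" and "7+10" labels can be verified directly from the sequence notation. The sketch below is a minimal illustration, assuming that each `+` marks one modified base, that the run of `X` characters marks the index (tag) region, and that slash-delimited tokens such as `/3SpC3/` are terminal modifications; the function name `count_modified_bases` is hypothetical and not part of the application.

```python
import re
from typing import Tuple


def count_modified_bases(oligo: str) -> Tuple[int, int]:
    """Count '+'-marked bases to the left and right of the X (index) run."""
    # Remove slash-delimited terminal modifications such as /3SpC3/.
    core = re.sub(r"/[^/]+/", "", oligo)
    # Split once at the run of universal positions covering the index.
    parts = re.split(r"X+", core, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no X run found in oligo")
    left, right = parts
    return left.count("+"), right.count("+")


# P1-end blocking sequences as written in the description.
SEQ_ID_NO_3 = ("CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA"
               "+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/")
SEQ_ID_NO_5 = ("CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC"
               "+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/")

print(count_modified_bases(SEQ_ID_NO_3))  # (5, 7)
print(count_modified_bases(SEQ_ID_NO_5))  # (7, 10)
```

The P2-end sequences can be checked the same way, since the count does not depend on the length of the X run.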
Building on the improved universal blocking sequences above, a second exemplary embodiment of the present application provides a capture kit that includes any of the universal blocking sequences described above. The universal blocking sequence in this capture kit binds the target adapter strongly, so the target library can be captured efficiently when the kit is used to build a captured library.
To further improve the capture efficiency of the target library (that is, the on-target rate), in a preferred embodiment the working concentration of the universal capture probe in the kit is 0.4~0.8 μg of universal blocking sequence per 1 μg of library to be captured. Capturing at this dosage further avoids the cross-shaped (star-like) blocking structures that form when too much library is present and that would lower the on-target rate. The working concentration may vary with the blocking scheme. For example, with the blocking scheme of SEQ ID NO: 5 and SEQ ID NO: 6, the on-target rate of the target library is higher at 0.4 μg of universal blocking sequence per 1 μg of library to be captured; with the blocking scheme of SEQ ID NO: 3 and SEQ ID NO: 4, the on-target rate is higher at 0.8 μg of universal blocking sequence per 1 μg of library to be captured.
A third exemplary embodiment of the present application provides a library hybrid capture method, which includes capturing the library to be captured with a capture kit, the capture kit being the one described above. The blocking sequence in this kit allows the target library to be captured efficiently when the captured library is built.
The inventors also found that blocking works better when the molar ratio of universal blocking sequence to library to be captured is 10:1~20:1. Accordingly, in a preferred embodiment of the present application, in the step of capturing the library with the capture kit, the blocking sequence and the library to be captured are combined at a molar ratio of 10:1~20:1.
A fourth exemplary embodiment of the present application provides a library construction method, which includes: constructing a fragmented library; performing hybrid capture on the fragmented library to obtain a captured library; and PCR-amplifying the captured library to obtain a sequencing library; where the hybrid capture is performed with the capture kit described above or by any of the methods described above. Libraries built with this method contain a high proportion of target library, and a high proportion of the resulting data is usable.
The beneficial effects of the present application are further illustrated below with reference to specific examples.
It should be noted that the following examples follow the library construction workflow of the NadPrep™ DNA Library Construction Kit (for MGI) (201909 Version 2.0) (Nanodigmbio (Nanjing) Biotechnology Co., Ltd. [纳昂达(南京)生物科技有限公司]). It should also be noted that the examples are illustrative only and do not limit the method of the present application to the procedures described. The workflow is, in brief:
DNA sample fragmentation --- end repair and A-tailing --- adapter ligation --- size selection --- PCR amplification --- library purification, quantification and quality control --- sequencing on the MGI platform, or sequencing after targeted capture.
Example 1: Universal blocking schemes differing in number of modifications and in added concentration
Steps: Library construction followed the instructions of the NadPrep™ DNA Library Construction Kit (for MGI) (201909 Version 2.0). The hybrid capture step was performed as described below. For pooled hybrid capture of multiple libraries after vacuum concentration, the libraries were pooled as shown in the following table:
Table 1:
Component | Amount | Number of libraries
Total library | 6 μg | 1~12
Human Cot DNA | 5 μl | /
Universal blocking sequence (sequences listed below) | 2 μl | /
1) Improved dual-index universal blocking sequences of this application for MGI adapters
1.1 The P1-end blocking sequence shown in SEQ ID NO: 3 and the P2-end blocking sequence shown in SEQ ID NO: 4:
CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/;
GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/
1.2 The P1-end blocking sequence shown in SEQ ID NO: 5 and the P2-end blocking sequence shown in SEQ ID NO: 6:
CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/;
GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/
In the four sequences above, X denotes a C3 spacer or hypoxanthine; +N denotes an LNA- or BNA-modified base (the two modifications perform comparably, and LNA is used here as the example); and /3SpC3/ denotes a C3 spacer block at the 3' end.
The universal blocking sequences added during hybridization and their concentrations are listed in the following table:
Table 2:
Figure PCTCN2020139918-appb-000002
2) The hybrid capture steps are as follows:
1. Combine the components listed in the table above in a 0.2/1.5 ml low-binding centrifuge tube, vortex to mix, and centrifuge briefly.
2. Place the tube in a vacuum concentrator preheated to 60°C and dry.
3. Once all liquid has evaporated and the contents are completely dry, seal the tube and set it aside.
4. Take out the [Figure PCTCN2020139918-appb-000003] Exome Research Panel v2.0 and let it thaw on ice; after use, store it in small aliquots as needed.
5. Prepare the hybridization reaction solution according to Table 3 below, mix well with a pipette, and add it to the bottom of the tube that was vacuum-concentrated to dryness. Gently pipette up and down 15~20 times to mix, centrifuge briefly, and incubate at 25°C for 5~10 min.
Table 3:
Figure PCTCN2020139918-appb-000004
6. Vortex the hybridization reaction mixture and centrifuge briefly. Transfer the entire 17 μl of hybridization reaction mixture to a new 0.2 ml PCR tube, centrifuge briefly, place the tube in the thermal cycler, and start the following hybridization program:
Table 4:
Figure PCTCN2020139918-appb-000005
7. Washing of the hybridized library
(1) Preparation
1. Take out the other reagents of the [Figure PCTCN2020139918-appb-000006] Hybridization and Wash Kit, let them thaw at room temperature, and vortex to mix (note: if Wash Buffer I does not thaw completely, incubate it in a 65°C water bath until it is fully dissolved).
2. Vortex the Dynabeads™ M-270 Streptavidin Beads to mix, and equilibrate them at room temperature for 30 min before starting the streptavidin bead wash and capture steps.
(2) Reagent preparation
1. Preparation of the wash buffers
Prepare 1X working solutions of the wash buffers according to the following table:
Table 5:
Component name | RNase-free water | Buffer | Total
2X Bead Wash Buffer | 160 μl | 160 μl | 320 μl
10X Wash Buffer I | 252 μl | 28 μl | 280 μl
10X Wash Buffer II | 144 μl | 16 μl | 160 μl
10X Wash Buffer III | 144 μl | 16 μl | 160 μl
10X Stringent Wash Buffer | 288 μl | 32 μl | 320 μl
2. Preparation of the bead resuspension solution, as shown in Table 6.
Table 6:
Figure PCTCN2020139918-appb-000007
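The water and buffer volumes in Table 5 follow from a plain dilution of each stock to 1X. As a rough check, the sketch below recomputes them from the stock strength and the total volume; the function name `working_solution` is hypothetical, and the labels are shorthand for the five rows of Table 5.

```python
def working_solution(stock_x, total_ul):
    """Return (water_ul, buffer_ul) needed to dilute a stock_x-fold stock to 1X."""
    buffer_ul = total_ul / stock_x
    water_ul = total_ul - buffer_ul
    return water_ul, buffer_ul


# (row label, stock strength, total volume in ul) as listed in Table 5.
rows = [
    ("2X bead buffer", 2, 320),
    ("10X buffer I", 10, 280),
    ("10X buffer II", 10, 160),
    ("10X buffer III", 10, 160),
    ("10X stringent buffer", 10, 320),
]
for label, stock_x, total_ul in rows:
    water_ul, buffer_ul = working_solution(stock_x, total_ul)
    print(f"{label}: {water_ul:.0f} ul water + {buffer_ul:.0f} ul buffer")
```

The computed pairs (160/160, 252/28, 144/16, 144/16, 288/32 μl) agree with the table.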
(3) Streptavidin magnetic bead wash
1. Vortex the Dynabeads™ M-270 Streptavidin Beads for 15 s to make sure they are fully mixed. Pipette 50 μl of M-270 beads into a 1.5 ml low-binding centrifuge tube.
2. Add 100 μl of 1X Bead Wash Buffer to the tube, gently pipette up and down 10 times to mix, centrifuge briefly, and place the tube on a magnetic rack for a few minutes. Once the liquid is completely clear, remove and discard the supernatant with a pipette. Take the tube off the magnetic rack.
3. Repeat step 2 twice.
4. Add 17 μl of bead resuspension solution to the tube, gently pipette to mix, and transfer the entire bead suspension to a new 0.2 ml low-binding PCR tube.
(4) Streptavidin magnetic bead capture
1. After the 16 h hybridization reaction, switch the thermal cycler to the wash program.
2. Add the resuspended streptavidin beads to the hybridization reaction and mix by gentle pipetting or by vortexing.
3. Incubate at 65°C for 45 min, vortexing gently every 10~12 min to keep the beads fully resuspended.
(5) Hot washes (note: work quickly during the hot washes, and avoid introducing bubbles when pipetting to mix)
1. After the incubation, remove the PCR tube from the thermal cycler, add 100 μl of 65°C 1X Wash Buffer I, and pipette to mix the bead-containing hybridization reaction.
2. Place the PCR tube on the magnetic rack for 1 min. Once the liquid is completely clear, remove and discard the supernatant with a pipette.
3. Take the PCR tube off the magnetic rack, add 150 μl of 65°C 1X Stringent Wash Buffer, gently pipette up and down 10 times to mix, and incubate in the thermal cycler at 65°C for 5 min.
4. Repeat steps 2 and 3 once.
(6) Room-temperature washes
1. Centrifuge the PCR tube briefly and place it on the magnetic rack for 1 min. Once the liquid is completely clear, remove and discard the supernatant. Add 150 μl of room-temperature 1X Wash Buffer I, vortex to mix, and incubate at room temperature for 2 min, alternating 30 s of vortexing with 30 s of rest to ensure thorough mixing.
2. Centrifuge the PCR tube briefly and place it on the magnetic rack for 1 min. Once the liquid is completely clear, remove and discard the supernatant. Add 150 μl of room-temperature 1X Wash Buffer II, vortex to mix, and incubate at room temperature for 2 min, alternating 30 s of vortexing with 30 s of rest to ensure thorough mixing.
3. Centrifuge the PCR tube briefly and place it on the magnetic rack for 1 min. Once the liquid is completely clear, remove and discard the supernatant. Add 150 μl of room-temperature 1X Wash Buffer III, vortex to mix, and incubate at room temperature for 2 min, alternating 30 s of vortexing with 30 s of rest to ensure thorough mixing.
4. Centrifuge the PCR tube briefly and place it on the magnetic rack for 1 min. Once the liquid is completely clear, remove and discard the supernatant, then use a 10 μl tip to remove the small amount of residual buffer.
5. Take the PCR tube off the magnetic rack, add 22.5 μl of Nuclease Free Water, gently pipette up and down 10 times to mix thoroughly, and transfer all of the liquid to a new 0.2 ml PCR tube.
The subsequent PCR amplification, library purification and quantification steps follow the instructions of the NadPrep™ DNA Library Construction Kit (for MGI) (201909 Version 2.0).
The tests in this example show that, when universal bases are used in the index region, both the number of modified bases in the fixed regions on either side and the working concentration of the universal blocking sequence strongly affect the final blocking result. With fewer modified bases (5+7), 200 μmol/L is needed for the best blocking effect (a blocking sequence to library ratio of 20:1). With more modified bases (7+10), 100 μmol/L already gives the best effect, and 200 μmol/L becomes inhibitory, as shown in Figure 5. A possible explanation is that strong binding arises in local regions: when too much universal blocking sequence is added, collisions in the reaction become more frequent and abnormal cross-shaped structures form (as shown in Figure 4B).
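The link between stock concentration and molar ratio can be made explicit with a small calculation. This is a sketch under stated assumptions, not a description of how the inventors computed the ratio: it assumes the 2 μl blocker volume listed in Table 1, assumes the library amount is the same in both conditions, and back-calculates that amount from the stated 20:1 ratio at 200 μmol/L. The function name `blocker_pmol` is hypothetical.

```python
def blocker_pmol(conc_umol_per_l, volume_ul):
    """Amount of blocker in pmol; 1 umol/L equals 1 pmol/ul."""
    return conc_umol_per_l * volume_ul


VOLUME_UL = 2  # blocker volume per hybridization, taken from Table 1

# 20:1 is stated for 200 umol/L, which implies the library amount below.
high = blocker_pmol(200, VOLUME_UL)
implied_library_pmol = high / 20

# Assuming the same library amount, the ratio at 100 umol/L follows.
low = blocker_pmol(100, VOLUME_UL)
ratio_low = low / implied_library_pmol

print(f"200 umol/L: {high} pmol blocker, implied library {implied_library_pmol:.0f} pmol")
print(f"100 umol/L: {low} pmol blocker, ratio {ratio_low:.0f}:1")
```

Under these assumptions the two tested concentrations correspond to the ends of the 10:1~20:1 molar range given in the description.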
Example 2
Example 2 follows the same steps as Example 1; the only difference is the number of modified bases in the universal blocking sequences used. The numbers of modified bases in this example are shown in the following table:
Table 7:
Blocking combination | 4+6 modified blocker (blocking scheme 3) | 8+11 modified combination (blocking scheme 4)
Concentration (μmol/L) | 200 | 100
In blocking scheme 3, the P1 sequence is unchanged (the P1-end blocking sequence shown in SEQ ID NO: 3,
CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/), and the P2 sequence is SEQ ID NO: 7:
GCA+TGGC+GA+CCTT+ATCAGXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCCTCCGA+CTT/3SpC3/.
In blocking scheme 4, the P1 blocking sequence is SEQ ID NO: 8:
CTC+TCA+GT+ACG+TCA+G+CA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGA+CTT/3SpC3/; and the P2 blocking sequence is SEQ ID NO: 9:
GCA+TG+GC+GA+CC+TT+AT+CA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TC+C+GA+CTT/3SpC3/.
Scheme 3 keeps the P1 modifications of scheme 1 unchanged and removes one modified base on each side of the tag in P2, giving 4+6 modifications. Scheme 4 builds on scheme 2 and adds one modified base to the universal blocker on each side of the tag, giving 8+11 modifications. With 4 modified bases to the left of the tag sequence and 6 to the right, blocking is clearly worse than with the 5+7 combination of scheme 1. Likewise, the 8+11 modification pattern of scheme 4 blocks less well than the 7+10 pattern of scheme 2; the results are shown in Figure 6. A left non-tag region blocking sequence with 5~7 modified bases combined with a right non-tag region blocking sequence with 7~10 modified bases is therefore the better design.
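The conclusion of Example 2 amounts to a simple range check on the left and right modification counts. The sketch below is only an illustration of that rule: the counts for schemes 1 to 4 are taken from the text, while the function name `in_preferred_range` and the dictionary layout are hypothetical.

```python
def in_preferred_range(left, right):
    """True if the counts fall within 5~7 (left) and 7~10 (right)."""
    return 5 <= left <= 7 and 7 <= right <= 10


# (left, right) modified-base counts for blocking schemes 1 to 4.
schemes = {
    "scheme 1": (5, 7),
    "scheme 2": (7, 10),
    "scheme 3": (4, 6),
    "scheme 4": (8, 11),
}
for name, (left, right) in schemes.items():
    print(name, f"{left}+{right}", in_preferred_range(left, right))
```

Schemes 1 and 2 pass the check and schemes 3 and 4 fail it, which matches the ranking reported for Figure 6.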
Example 3
Example 3 follows the same steps as Example 1, and the number of modified bases in the universal blocking sequences is the same as in scheme 2 of Example 1; the only difference is the position of the modified bases. The sequences are as follows:
Blocking scheme 5: the P1 blocking sequence is SEQ ID NO: 10:
CTC+T+CA+GT+ACG+TCA+GCA+GTTXXXXXXXXXXCAACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGA+CTT/3SpC3/, and the P2 blocking sequence is SEQ ID NO: 11:
G+CA+TG+GC+GA+CC+TT+ATCAGXXXXXXXXXXTTGTCTT+CCTA+AGA+CC+GC+TTG+GCC+TC+C+GA+CTT/3SpC3/.
Blocking scheme 6: the P1 blocking sequence is SEQ ID NO: 12:
CTCTCA+GT+ACG+TCA+G+CA+GT+TXXXXXXXXXXCA+ACT+CCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATCCGACTT/3SpC3/; and the P2 blocking sequence is SEQ ID NO: 13:
GCATG+GC+GA+CC+TT+AT+CA+GXXXXXXXXXXTTG+TCTT+C+CTA+AGA+CC+GC+TTG+GCC+TCC+GACTT/3SpC3/.
Blocking scheme 5 keeps the same number of modifications as scheme 2 but changes their positions: it removes one modification at each of the two ends and adds modifications at positions closer to the middle tag. Blocking scheme 6 does the opposite: it adds modified bases at the two ends and removes modifications close to the middle tag. The blocking sequences of schemes 5, 6 and 2 were all used at 100 μmol/L and hybridized with the same library input. Schemes 5 and 2 performed similarly, whereas scheme 6 performed worse (see Figure 7). This shows that blocking depends not only on the number of modified bases but also on their positions. The present invention finds that evenly distributed modifications, and additional modifications in regions close to the middle tag sequence, work markedly better than adding modifications at the two termini.
Example 4: Hybridization test with multiple libraries
Library construction and hybridization followed the same experimental steps as Example 1. The universal blocker with 7+10 base modifications was used at 100 μmol/L. The difference is that multiple libraries were pooled for hybridization; the number of input libraries and the total library input are as follows:
Table 8:
Number of input libraries (500 ng/library) | 10 | 12 | 14 | 16
Total library input | 5 μg | 6 μg | 7 μg | 8 μg
Sequencing metrics are good when each library is input at 500 ng for hybrid capture. If a single hybridization can accept a larger total input, that is, more samples per hybridization, the hybrid capture cost per sample falls. Tests with the universal blocking sequence of this application show that the hybridization metrics are good when no more than 12 samples are hybridized together, whereas the on-target rate drops to some extent when 14 or 16 libraries are hybridized together. As shown in Figure 8, the total inputs for 14 and 16 libraries are 7 μg and 8 μg, respectively. In a fixed capture reaction, as the number of libraries increases, some libraries may pair up with the universal blocker to form cross-shaped blocked structures, which lowers the on-target rate.
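For planning purposes, the 12-library limit reported above translates into a simple batching rule. The following sketch is one hypothetical way to split a sample set into hybridization reactions; the function name `plan_batches` and its default values (12 libraries per reaction, 500 ng per library) are assumptions drawn from this example, not a procedure given in the application.

```python
def plan_batches(n_samples, max_per_hyb=12, ng_per_library=500):
    """Split samples into hybridizations of at most max_per_hyb libraries.

    Returns a list of (libraries in reaction, total library input in ug).
    """
    batches = []
    remaining = n_samples
    while remaining > 0:
        size = min(max_per_hyb, remaining)
        batches.append((size, size * ng_per_library / 1000))
        remaining -= size
    return batches


print(plan_batches(30))  # [(12, 6.0), (12, 6.0), (6, 3.0)]
```

A full reaction of 12 libraries carries 6 μg of library, the largest total input at which the on-target rate was reported to hold.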
In summary, the present invention provides a universal blocking sequence designed for hybridization of dual-index libraries on the MGI platform. By substituting universal bases in the variable index region and adding base modifications that raise the annealing temperature in the fixed sequences on both sides of the index, it improves the capture efficiency of dual-index libraries. The application further finds that the blocking sequence performs better when the number of annealing-temperature-raising modified bases is 5~7 on the left and 7~10 on the right, and that blocking is best when the modified bases are additionally placed at the preferred positions of the examples of this application. The working concentration of the universal blocking sequence likewise affects the on-target rate of the captured target library. When 0.4~0.8 μg of universal blocking sequence is used per 1 μg of library to be captured, the application supports hybridizing 12 samples together, which greatly reduces the hybrid capture cost per sample.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

  1. 一种通用封闭序列,其特征在于,所述通用封闭序列按照从5’到3’的方向包括依次连接的左侧非标签区封闭序列、中间标签区封闭序列及右侧非标签区封闭序列,A universal closed sequence, characterized in that the universal closed sequence includes a left non-tag region closed sequence, a middle label region closed sequence, and a right non-tag region closed sequence sequentially connected in a direction from 5'to 3',
    其中,所述左侧非标签区封闭序列包括5~7个LAN或BNA修饰的碱基,所述中间标签区封闭序列为通用封闭碱基序列,所述右侧非标签区封闭序列包括7~10个LAN或BNA修饰的碱基,且所述右侧非标签区封闭序列的3’端带有封闭修饰。Wherein, the closed sequence of the left non-tag region includes 5-7 bases modified by LAN or BNA, the closed sequence of the middle label region is a universal closed base sequence, and the closed sequence of the right non-tag region includes 7~ 10 LAN or BNA modified bases, and the 3'end of the right non-tag region closed sequence has a closed modification.
  2. 根据权利要求1所述的通用封闭序列,其特征在于,所述3’端的所述封闭修饰为MGB修饰、C3间隔臂修饰、磷酸化修饰、地高辛修饰或生物素修饰,或者所述3’端碱基为双脱氧碱基。The universal blocking sequence according to claim 1, wherein the blocking modification at the 3'end is MGB modification, C3 spacer modification, phosphorylation modification, digoxigenin modification or biotin modification, or the 3' The'terminal base is a dideoxy base.
  3. 根据权利要求1所述的通用封闭序列,其特征在于,所述通用封闭碱基为次黄嘌呤或C3间隔臂。The universal blocking sequence of claim 1, wherein the universal blocking base is hypoxanthine or a C3 spacer.
  4. 根据权利要求1至3中任一项所述的通用封闭序列,其特征在于,所述通用封闭序列为MGI测序平台的带有第一标签序列的P1接头的封闭序列或带有第二标签序列的P2接头的封闭序列,其中,The universal closed sequence according to any one of claims 1 to 3, wherein the universal closed sequence is a closed sequence of a P1 linker with a first tag sequence of an MGI sequencing platform or a sequence with a second tag The closed sequence of the P2 linker, where,
    所述P1接头的封闭序列为SEQ ID NO:3:The blocking sequence of the P1 linker is SEQ ID NO: 3:
    CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/;其中+表示LAN或BNA修饰,X表示次黄嘌呤或C3 spacer,/3SpC3/为3’端的C3间隔臂修饰;CTCTCA+GTACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCACAGA+ACGA+CATGG+CTACGATC+CGACTT/3SpC3/; where + means LAN or BNA modification, X means hypoxanthine or C3 spacer, /3SpC3/is Modification of the C3 spacer at the 3'end;
    所述P2接头的封闭序列为SEQ ID NO:4:The blocking sequence of the P2 linker is SEQ ID NO: 4:
    GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/;其中+表示LAN或BNA修饰,X表示次黄嘌呤或C3 spacer,/3SpC3/为3’端的C3间隔臂修饰。GCA+TGGC+GA+CCTT+ATCA+GXXXXXXXXXXXTTGTCTT+CCTA+AGA+CCGC+TTG+GCC+TCCGA+CTT/3SpC3/; where + means LAN or BNA modification, X means hypoxanthine or C3 spacer, /3SpC3/ is The C3 spacer at the 3'end is modified.
  5. The universal blocking sequence according to any one of claims 1 to 3, characterized in that the universal blocking sequence is a blocking sequence for a P1 adapter carrying a first tag sequence, or a blocking sequence for a P2 adapter carrying a second tag sequence, of the MGI sequencing platform;
    the blocking sequence for the P1 adapter is SEQ ID NO: 5:
    CTC+TCA+GT+ACG+TCA+GCA+GT+TXXXXXXXXXXCA+ACTCCT+TGGC+TCAC+AGA+ACGA+CAT+GG+CTAC+GATC+CGACTT/3SpC3/; wherein + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is a C3 spacer modification at the 3' end;
    the blocking sequence for the P2 adapter is SEQ ID NO: 6:
    GCA+TG+GC+GA+CC+TT+ATCA+GXXXXXXXXXXTTG+TCTT+CCTA+AGA+CC+GC+TTG+GCC+TCC+GA+CTT/3SpC3/; wherein + denotes an LNA or BNA modification, X denotes hypoxanthine or a C3 spacer, and /3SpC3/ is a C3 spacer modification at the 3' end.
  6. A capture kit comprising a universal blocking sequence, characterized in that the universal blocking sequence is the universal blocking sequence according to any one of claims 1 to 5.
  7. The kit according to claim 6, characterized in that the working concentration of the universal capture probe in the kit is 0.4 to 0.8 μg of the universal blocking sequence per 1 μg of the library to be captured.
  8. A library hybridization capture method, comprising capturing a library to be captured with a capture kit, characterized in that the capture kit is the capture kit according to claim 6 or 7.
  9. The method according to claim 8, characterized in that blocking is performed with the universal blocking sequence and the library to be captured at a molar ratio of 10:1 to 20:1.
  10. A library construction method, comprising:
    constructing a fragmented library;
    performing hybridization capture on the fragmented library to obtain a captured library;
    performing PCR amplification on the captured library to obtain a sequencing library;
    characterized in that the hybridization capture is performed using the capture kit according to claim 6 or 7, or using the method according to claim 9 or 10.
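
The oligo notation used in claims 4 and 5 can be read mechanically. The following is a minimal sketch only, not part of the claims: it assumes that each + marks the base immediately after it as LNA/BNA-modified (the claims do not say which neighbouring base the marker applies to), and the function name parse_blocker and the returned field names are hypothetical.

```python
import re

# Notation assumed from the claims: '+' marks an LNA/BNA-modified base,
# 'X' is a hypoxanthine or C3-spacer position, and a trailing /.../ token
# names the 3' end modification (for example /3SpC3/).
_OLIGO = re.compile(r"([ACGTX+]+)(?:/(\w+)/)?")


def parse_blocker(oligo: str) -> dict:
    """Split a blocker string into bare sequence and modification positions."""
    match = _OLIGO.fullmatch(oligo.strip())
    if match is None:
        raise ValueError(f"unrecognized oligo notation: {oligo!r}")
    body, three_prime_mod = match.groups()

    bases = []
    modified_positions = []  # 0-based indices into the bare sequence
    pending = False          # True right after a '+' marker
    for ch in body:
        if ch == "+":
            if pending:
                raise ValueError("two consecutive '+' markers")
            pending = True
            continue
        if pending:
            # Assumption: the marker applies to the base that follows it.
            modified_positions.append(len(bases))
            pending = False
        bases.append(ch)
    if pending:
        raise ValueError("'+' marker with no following base")

    sequence = "".join(bases)
    return {
        "sequence": sequence,
        "modified_positions": modified_positions,
        "placeholder_positions": [i for i, b in enumerate(sequence) if b == "X"],
        "three_prime_mod": three_prime_mod,
    }


# Toy input in the same notation (not one of the claimed sequences).
toy = parse_blocker("AC+GTXX+CA/3SpC3/")
print(toy["sequence"], toy["modified_positions"], toy["three_prime_mod"])
```

Passing SEQ ID NO: 3 to 6 through the same function gives the bare sequence, the positions of the modified bases, and the length of the X run that sits opposite the tag sequence.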

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010421923.XA CN111534518B (en) 2020-05-18 2020-05-18 Universal blocking sequence and application thereof
CN202010421923.X 2020-05-18

Publications (1)

Publication Number Publication Date
WO2021232793A1 (en) 2021-11-25

Family

ID=71979455

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139918 WO2021232793A1 (en) 2020-05-18 2020-12-28 Universal closed sequence and use thereof

Country Status (2)

Country Link
CN (1) CN111534518B (en)
WO (1) WO2021232793A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114774515A (en) * 2022-03-24 2022-07-22 北京安智因生物技术有限公司 Capture probe, kit and detection method for detecting polycystic kidney disease gene mutation
CN114807125A (en) * 2022-05-20 2022-07-29 纳昂达(南京)生物科技有限公司 Sequencing library joint, sequencing library, construction method of sequencing library and method for improving NGS library construction connection efficiency
WO2024092562A1 (en) * 2022-11-02 2024-05-10 京东方科技集团股份有限公司 Blocking sequence, kit thereof, and method for using same

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
CN111534518B (en) * 2020-05-18 2021-07-23 纳昂达(南京)生物科技有限公司 Universal blocking sequence and application thereof
CN111910258B (en) * 2020-08-19 2021-06-15 纳昂达(南京)生物科技有限公司 Paired-end library tag composition and application thereof in MGI sequencing platform
CN111961711A (en) * 2020-08-31 2020-11-20 伯科生物科技有限公司 Universal hybridization enhancers and methods for targeted sequencing
CN112708619B (en) * 2020-12-30 2022-05-17 纳昂达(南京)生物科技有限公司 Joint for building library of MGI platform, kit and library building method
CN114657232B (en) * 2022-03-11 2024-04-30 上海英基生物科技有限公司 Universal blocking reagent for improving targeted capture efficiency and application thereof
CN116536308A (en) * 2022-12-04 2023-08-04 深圳吉因加医学检验实验室 Sequencing sealant and application thereof
CN115948388A (en) * 2022-12-30 2023-04-11 纳昂达(南京)生物科技有限公司 Specific capture primer, targeted capture probe composition, targeted capture library construction method and application
CN115948621A (en) * 2023-01-18 2023-04-11 珠海舒桐医疗科技有限公司 HPV screening method based on menstrual blood DNA

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108456713A (en) * 2017-11-27 2018-08-28 天津诺禾致源生物信息科技有限公司 Adapter blocking sequence, library construction kit and sequencing library construction method
WO2018183808A1 (en) * 2017-03-31 2018-10-04 Agenovir Corporation Antiviral therapeutic
CN108676846A (en) * 2018-05-25 2018-10-19 艾吉泰康生物科技(北京)有限公司 Application of bridge-type oligonucleotides in library target region capture
CN108949941A (en) * 2018-06-25 2018-12-07 北京莲和医学检验所有限公司 Low-frequency mutation detection method, kit and device
CN111534518A (en) * 2020-05-18 2020-08-14 纳昂达(南京)生物科技有限公司 Universal blocking sequence and application thereof

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR102468174B1 (en) * 2016-09-15 2022-11-17 에프. 호프만-라 로슈 아게 How to perform multiplex PCR
WO2018089944A1 (en) * 2016-11-11 2018-05-17 uBiome, Inc. Method and system for fragment assembly and sequence identification
CN110564831A (en) * 2019-08-30 2019-12-13 北京优迅医学检验实验室有限公司 Blocking reagent for sequencing library and method for improving targeted capture efficiency

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114774515A (en) * 2022-03-24 2022-07-22 北京安智因生物技术有限公司 Capture probe, kit and detection method for detecting polycystic kidney disease gene mutation
CN114807125A (en) * 2022-05-20 2022-07-29 纳昂达(南京)生物科技有限公司 Sequencing library joint, sequencing library, construction method of sequencing library and method for improving NGS library construction connection efficiency
CN114807125B (en) * 2022-05-20 2023-09-12 纳昂达(南京)生物科技有限公司 Sequencing library joint, sequencing library, construction method of sequencing library and method for improving NGS library construction connection efficiency
WO2024092562A1 (en) * 2022-11-02 2024-05-10 京东方科技集团股份有限公司 Blocking sequence, kit thereof, and method for using same

Also Published As

Publication number Publication date
CN111534518A (en) 2020-08-14
CN111534518B (en) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2021232793A1 (en) Universal closed sequence and use thereof
WO2021013244A1 (en) Method for constructing capture library and kit
JP6483249B2 (en) Isolated oligonucleotides and their use in sequencing nucleic acids
CN111748551B (en) Blocking sequence, capture kit, library hybridization capture method and library construction method
CN105400776B (en) Oligonucleotide linker and application thereof in constructing nucleic acid sequencing single-stranded circular library
CN109486811B (en) Double-end molecular tag joint, application thereof and sequencing library with joint
WO2020140693A1 (en) Gene target region enrichment method and kit
WO2016058134A1 (en) Linker element and method of using same to construct sequencing library
CN106460065A (en) Systems and methods for clonal replication and amplification of nucleic acid molecules for genomic and therapeutic applications
WO2021052310A1 (en) Dna library construction method
CN108517567B (en) Adaptor, primer group, kit and library construction method for cfDNA library construction
WO2017193833A1 (en) Method and kit comprising 4,000 human pathogenic target genes
WO2023221308A1 (en) Liquid-phase hybrid capture method and test kit thereof
EP3819385A1 (en) Construction and sequencing data analysis method for ctdna library for simultaneously detecting various common mutations in liver cancer
CN112626173B (en) RNA library construction method
WO2024146481A1 (en) Dna methylation library construction method and library obtained with same, dna hybridization capture method and kit
WO2023202030A1 (en) Method for constructing high-throughput sequencing library of small rna
CN107354207B (en) liquid phase hybridization capture kit based on double-stranded probe, washing kit and application thereof
WO2021253372A1 (en) High-compatibility pcr-free library building and sequencing method
TW201321520A (en) Method and system for virus detection
CN107077538B (en) Sequencing data processing device and method
CN114807317A (en) Optimized DNA linear amplification method and kit
WO2021219114A1 (en) Sequencing method, analysis method therefor and analysis system thereof, computer-readable storage medium, and electronic device
CN116536308A (en) Sequencing sealant and application thereof
WO2019052322A1 (en) Method for analyzing oligonucleotide sequence impurity based on high throughput sequencing and use

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20936578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20936578

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC
