CN114317528A - Specific molecular label UMI group, mixed specific molecular label joint and application - Google Patents

Info

Publication number
CN114317528A
Authority
CN
China
Prior art keywords
specific molecular
sequence
molecular tag
umi
linker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011061421.7A
Other languages
Chinese (zh)
Inventor
楚玉星
李奇
杨玲
陈蓉蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiyinjia Medical Laboratory Co ltd
Geneplus-Beijing
Original Assignee
Beijing Jiyinjia Medical Laboratory Co ltd
Geneplus-Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiyinjia Medical Laboratory Co ltd, Geneplus-Beijing filed Critical Beijing Jiyinjia Medical Laboratory Co ltd
Priority to CN202011061421.7A priority Critical patent/CN114317528A/en
Publication of CN114317528A publication Critical patent/CN114317528A/en
Pending legal-status Critical Current

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of nucleic acid sequencing, and in particular to a specific molecular tag UMI group, a mixed specific molecular tag linker, and applications thereof. The specific molecular tag UMI group comprises: a first specific molecular tag UMI with a sequence length of 5 bp, a second specific molecular tag UMI with a sequence length of 6 bp, a third specific molecular tag UMI with a sequence length of 7 bp, and a fourth specific molecular tag UMI with a sequence length of 8 bp. A double-index library constructed with the mixed specific molecular tags keeps the bases balanced at the fixed base pair downstream of the molecular tag, so that a library carrying molecular tags can be sequenced directly without mixing in a balancing library. In addition, the mixed specific molecular tag linker and the matching double-index adapter primers provided by the invention are markedly superior to existing double-index library systems in the ligation uniformity of the different molecular tag linkers and the amplification uniformity among different indexes.

Description

Specific molecular label UMI group, mixed specific molecular label joint and application
Technical Field
The invention relates to the technical field of nucleic acid sequencing, in particular to a specific molecular label UMI group, a mixed specific molecular label joint, a double-index library construction system and application.
Background
A unique molecular identifier (UMI), referred to here as a specific molecular tag, is a short nucleotide sequence that acts as a barcode. During library construction the molecular tag is attached to the DNA molecules through the ligated linker, so that each molecule in the original sample is labeled. Amplification products derived from the same original molecule can then be tracked, extracted and grouped. This removes the depth bias caused by PCR amplification preference and sequencing preference, removes the low-frequency false-mutation noise caused by PCR errors and sequencing errors, and avoids the loss of depth caused by naturally duplicated cfDNA sequences. Because the molecular tag is located at the end of the linker, a fixed T-A base pair is usually introduced 1 bp downstream of the molecular tag to ensure normal T-A ligation. On the Gene+Seq2000, Gene+Seq200 and DNBSEQ-T7 platforms, this base pair can cause signal overexposure across the whole chip, leading to sequencing errors in the molecular tag and in the downstream sequence. To avoid this problem, a library carrying molecular tags is usually sequenced as a mixture with a library without molecular tags (a balancing library) that accounts for no less than 50% of the data volume. This limits the sequencing throughput of molecular tag libraries and wastes data unnecessarily. For this reason, the applicant optimized the sequences of the linker, the primers and the molecular tags on the basis of the existing specific double-end index library structure. Test results show that the optimized Gionex plus specific double-end index library system avoids the chip signal overexposure caused by the fixed A-T base pair downstream of the molecular tag, so that a library carrying molecular tags can be loaded and sequenced on its own without a balancing library. Compared with the conventional specific double-end index library system, it also has significant advantages in the ligation uniformity of the different molecular tag linkers and in the amplification uniformity among different indexes.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is the overexposure of the sequencing signal at the fixed A-T base downstream of the molecular tag in existing specific double-end index library systems. To this end, the invention provides a specific molecular tag UMI group, a mixed specific molecular tag linker, a double-index library construction system and applications thereof, so that a library carrying molecular tags can be sequenced on its own, without mixing in a balancing library and without overexposure of the sequencing signal at the fixed A-T base. At the same time, through the optimized design of the double-end indexes and of mixed specific molecular tag linkers carrying different molecular tags, the different linkers ligate to DNA uniformly and without preference, and the amplification efficiencies of the indexes are nearly uniform.
Therefore, the invention provides the following technical scheme:
In a first aspect, the present invention provides a specific molecular tag UMI group comprising: a first specific molecular tag UMI with a sequence length of 5 bp, a second specific molecular tag UMI with a sequence length of 6 bp, a third specific molecular tag UMI with a sequence length of 7 bp, and a fourth specific molecular tag UMI with a sequence length of 8 bp.
Optionally, the sequences of the sense molecular tags and of the antisense molecular tags of the specific molecular tag UMI group are shown in Table 2.
Optionally, the number of moles of each molecular tag is the same.
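The effect of mixing tag lengths can be illustrated with a short sketch. It assumes that a read starts at the first base of the molecular tag and that each tag is followed by one balancing base and then the fixed base T, and it computes the fraction of an equimolar pool whose fixed T is read in each sequencing cycle. The function name `fixed_t_fraction_by_cycle` is hypothetical and is used only for this illustration.

```python
from collections import Counter
from fractions import Fraction


def fixed_t_fraction_by_cycle(umi_lengths, n_balance_bases=1):
    """Fraction of an equimolar pool whose fixed T is read at each cycle (1-based).

    Assumption: the read starts at the first UMI base, and every UMI is
    followed by `n_balance_bases` balancing bases and then the fixed T.
    """
    # The fixed T is read one cycle after the UMI and the balancing base(s).
    counts = Counter(length + n_balance_bases + 1 for length in umi_lengths)
    total = len(umi_lengths)
    return {cycle: Fraction(n, total) for cycle, n in sorted(counts.items())}


# A single 5 bp tag puts the fixed T of every read into the same cycle.
single = fixed_t_fraction_by_cycle([5])
# Equimolar 5/6/7/8 bp tags spread the fixed T over four consecutive cycles.
mixed = fixed_t_fraction_by_cycle([5, 6, 7, 8])

print(single)  # cycle 7 carries the fixed T in all reads
print(mixed)   # cycles 7 to 10 each carry the fixed T in one quarter of the reads
```

Under these assumptions, no single cycle is dominated by the fixed base when the four tag lengths are mixed in equal molar amounts.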
In a second aspect, the invention provides a hybrid specific molecular tag linker, wherein the sense molecular tag and the antisense molecular tag of the hybrid specific molecular tag linker both use the specific molecular tag UMI group.
Optionally, the number of moles of the first specific molecular tag UMI, the second specific molecular tag UMI, the third specific molecular tag UMI and the fourth specific molecular tag UMI in the mixed specific molecular tag linker is the same.
Optionally, the hybrid specific molecular tag linker comprises two partially complementary linker oligonucleotide strands:
the adaptor oligonucleotide chain 1 comprises an index2 primer binding region, a sense adaptor complementary region, a sense molecule label, at least one base S with base balance function and 1 protruding fixed base T in sequence from the 5 'end to the 3' end;
the adaptor oligonucleotide 2 comprises, in order from the 5 'terminus to the 3' terminus, a base complementary to the base S in the adaptor oligonucleotide 1, an antisense tag reverse-complementary to the sense molecule tag in the adaptor oligonucleotide 1, an antisense adaptor complementary region reverse-complementary to the sense adaptor complementary region of the adaptor oligonucleotide 1, and an index1 primer binding region.
Optionally, "S" represents either of the two bases G and C.
Optionally, in the adaptor oligonucleotide chain 1, the length of the index2 primer binding region is 15-42bp, the length of the sense adaptor complementary region is 8-10bp, and the length of the sense molecule label is 5-8 bp; the complementary region of the sense joint is totally or partially overlapped with the 3' terminal sequence of the index2 primer binding region;
optionally, in the adaptor oligonucleotide strand 2, the length of the antisense molecular tag is 5-8bp, the length of the complementary region of the antisense adaptor is 8-10bp, and the length of the primer binding region of index1 is 15-30 bp; the complementary region of the antisense joint is totally or partially overlapped with the 5' terminal sequence of the primer binding region of index 1.
Optionally, in the linker oligonucleotide strand 1, the number of each of the four bases A/T/G/C in the sequence consisting of the sense molecular tag, the balancing base S and the fixed base T accounts for 6.25% to 43.75% of the total number of bases of the sense molecular tag;
optionally, in the linker oligonucleotide strand 2, the number of each of the four bases A/T/G/C in the sequence consisting of the antisense molecular tag and the base complementary to the balancing base S accounts for 6.25% to 43.75% of the total number of bases of the antisense molecular tag.
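A base-composition window of this kind can be checked with a small helper. The sketch below is illustrative only: the text does not specify how bases are pooled (per linker or across the mixed linker pool), so the input is simply whatever set of bases is being assessed. The names `base_fractions` and `within_window` and the two example sequences are hypothetical and are not taken from Table 2.

```python
from collections import Counter
from fractions import Fraction

LOWER = Fraction(625, 10000)    # 6.25 %
UPPER = Fraction(4375, 10000)   # 43.75 %


def base_fractions(seq):
    """Share of each of the four bases A/T/G/C in `seq`, as exact fractions."""
    counts = Counter(seq.upper())
    return {base: Fraction(counts.get(base, 0), len(seq)) for base in "ATGC"}


def within_window(fractions, lower=LOWER, upper=UPPER):
    """True if every base share lies inside the closed window [lower, upper]."""
    return all(lower <= share <= upper for share in fractions.values())


# Hypothetical 8 bp tag followed by a balancing base G and the fixed base T.
balanced = base_fractions("ACGTCAGA" + "G" + "T")
# Hypothetical poorly balanced 5 bp tag followed by G and T.
skewed = base_fractions("AAAAA" + "G" + "T")

print(balanced, within_window(balanced))
print(skewed, within_window(skewed))
```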
Optionally, the 5' end of the linker oligonucleotide strand 2 is phosphorylated.
Optionally, the hybrid specific molecular tag linker is a Y-type hybrid specific molecular tag linker.
Optionally, the index1 sequence and the index2 sequence are selected from Table 1, which ensures the amplification efficiency of the constructed library.
TABLE 1 index1 and index2 sequence information
[Table 1 is reproduced as images in the original publication: Figures BDA0002712504480000031 to BDA0002712504480000071]
In a third aspect, the invention provides a double index library structure, which comprises a hybrid specific molecular tag and/or a hybrid specific molecular tag adaptor.
In a fourth aspect, the invention provides a double-index library construction system, which comprises the hybrid specific molecular tag and/or the hybrid specific molecular tag adaptor.
In a fifth aspect, the invention provides a library construction method, comprising the use of the mixed specific molecular tag and/or the mixed specific molecular tag linker.
In a sixth aspect, the invention provides a sequencing method comprising the use of the library structure.
Optionally, the sequencing platform is Gene+Seq2000, Gene+Seq200 or DNBSEQ-T7.
The technical scheme of the invention has the following advantages:
1. The UMI group with tags of different lengths provided by the invention effectively avoids the poor sequencing quality caused by base overexposure, and allows a library carrying molecular tags to be sequenced on its own without mixing in a balancing library. Ligation shows no preference among the different molecular tags and is uniform.
2. The invention provides a mixed specific molecular tag linker in which the specific molecular tag UMI group is used for both the sense molecular tags and the antisense molecular tags. Using the specific molecular tag UMI group solves the problem of base imbalance when the mixed specific molecular tag linker is used for library construction, and can improve both the ligation efficiency between the linker and the target DNA sequence and the effective utilization of the target DNA sequence.
3. In the mixed specific molecular tag linker provided by the invention, the first, second, third and fourth specific molecular tag UMIs are present in the same molar amounts. Because the molecular tags show no obvious ligation preference, equal molar amounts ensure uniform ligation.
4. The invention provides a mixed specific molecular label joint, which comprises two joint oligonucleotide chains with partially complementary parts: the adaptor oligonucleotide chain 1 comprises an index2 primer binding region, a sense adaptor complementary region, a sense molecule label, at least one base S with base balance function and 1 protruding base T in sequence from the 5 'end to the 3' end; a linker oligonucleotide strand 2 comprising, in order from the 5 'terminus to the 3' terminus, a base complementary to the base S in the linker oligonucleotide strand 1, an antisense molecule tag reverse-complementary to the sense molecule tag in the linker oligonucleotide strand 1, an antisense linker complementary region reverse-complementary to the sense linker complementary region of the linker oligonucleotide strand 1, and an index1 primer binding region; the double-index library system constructed by using the mixed specific molecular tag joint can ensure the base balance of a fixed base pair (A-T) at the downstream of the molecular tag, and realizes the direct machine sequencing of the library carrying the molecular tag under the condition of not mixing a balanced library. Meanwhile, the mixed specific molecular tag joint and the matched double-index joint primer provided by the invention are also obviously superior to the existing double-index library system in the connection uniformity of different molecular tag joints and the amplification uniformity among different indexes.
5. According to the mixed specific molecular label joint provided by the invention, the length of the index2 primer binding region is 15-42bp, the length of the sense joint complementary region is 8-10bp, and the length of the sense molecular label is 5-8 bp; the complementary region of the sense joint is totally or partially overlapped with the 3' terminal sequence of the index2 primer binding region; the length of the antisense molecular label is 5-8bp, the length of the complementary region of the antisense joint is 8-10bp, and the length of the index1 primer binding region is 15-30 bp; the complementary region of the antisense joint is totally or partially overlapped with the 5' terminal sequence of the index1 primer binding region; the primer binding efficiency is improved by increasing the length of the primer binding region.
6. In the mixed specific molecular tag linker provided by the invention, in the linker oligonucleotide strand 1, the number of each of the four bases A/T/G/C in the sequence consisting of the sense molecular tag, the balancing base S and the fixed base T accounts for 6.25% to 43.75% of the total number of bases of the sense molecular tag; preferably, in the linker oligonucleotide strand 2, the number of each of the four bases A/T/G/C in the sequence consisting of the antisense molecular tag and the base complementary to the balancing base S accounts for 6.25% to 43.75% of the total number of bases of the antisense molecular tag; within this range, base overexposure during sequencing is further avoided.
7. The invention provides a double-index library construction system which comprises a specific molecular label UMI group and/or a mixed specific molecular label joint. The double-index library system constructed by the double-index library construction system can ensure the base balance of the downstream fixed base pairs of the molecular tags, and the direct machine sequencing of the library carrying the molecular tags is realized under the condition of not mixing balanced libraries. Meanwhile, the mixed specific molecular tag joint and the matched double-index joint primer thereof provided by the invention are also obviously superior to the existing double-index library system in the connection uniformity of different molecular tag joints and the amplification uniformity among different indexes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of the structure of a hybrid specific molecular tag adapter and a double index adapter primer according to the present invention;
FIG. 2 shows the library yield data for protocol 1 of the invention in Experimental Example 3 of the invention; the abscissa is the library number;
FIG. 3 shows the library yield data for protocol 2 of the invention in Experimental Example 3 of the invention; the abscissa is the library number;
FIG. 4 shows the mean Q30 distribution per base for forward sequencing and reverse sequencing of the test library of the present invention in Experimental example 4 of the present invention; the abscissa is each cycle in sequencing; the ordinate is the Q30 value.
Detailed Description
The following examples are provided for a better understanding of the present invention. They are not limited to the best mode described and do not limit the content or the scope of protection of the present invention; any product identical or similar to the present invention that is obtained by combining the present invention with features of other prior art falls within the scope of protection of the present invention.
Where specific experimental steps or conditions are not indicated in the examples, they can be carried out according to the conventional experimental steps described in the literature of the field. Reagents, cells or instruments for which no manufacturer is indicated are conventional, commercially available products.
Example 1 specific molecular signatures UMI group
This example provides a specific molecular tag UMI group comprising: a first specific molecular tag UMI with a sequence length of 5 bp, a second specific molecular tag UMI with a sequence length of 6 bp, a third specific molecular tag UMI with a sequence length of 7 bp, and a fourth specific molecular tag UMI with a sequence length of 8 bp.
In a preferred embodiment, this example provides an optimal specific molecular tag UMI group, in which the sequences of the sense molecular tags and of the antisense molecular tags are shown in Table 2.
In a more preferred embodiment, the molecular tags in the above optimal specific molecular tag UMI group are present in the same molar amounts.
Example 2 hybrid specific molecular tag linkers
This example provides a mixed specific molecular tag linker which, as shown in FIG. 1, comprises two strands. Linker oligonucleotide strand 1 comprises, in order from the 5' end to the 3' end, an index2 primer binding region, a sense linker complementary region, a sense molecular tag, at least one base S with a base-balancing function, and 1 overhanging base T; the sense linker complementary region wholly or partially overlaps the 3' terminal sequence of the index2 primer binding region, and the sense molecular tag is taken from the optimal specific molecular tag UMI group of Example 1. The nucleotide sequence of linker oligonucleotide strand 1 and the optimal specific molecular tag UMI group (sense molecular tags) are shown in Table 2 below;
Linker oligonucleotide strand 2 comprises, in order from the 5' end to the 3' end, a base complementary to the base S in linker oligonucleotide strand 1, an antisense molecular tag reverse-complementary to the sense molecular tag in linker oligonucleotide strand 1, an antisense linker complementary region reverse-complementary to the sense linker complementary region of linker oligonucleotide strand 1, and an index1 primer binding region; the antisense linker complementary region wholly or partially overlaps the 5' terminal sequence of the index1 primer binding region, and the antisense molecular tag is taken from the optimal specific molecular tag UMI group of Example 1. The nucleotide sequence of linker oligonucleotide strand 2 and the optimal specific molecular tag UMI group (antisense molecular tags) are shown in Table 2 below. Examples are as follows:
linker oligonucleotide strand 1 (forward linker sequence) 5 '-3':
GGCTCACAGAACGACATGGCTACGATCCGACTTNNNNNST;
linker oligonucleotide strand 2 (reverse linker sequence) 5 '-3':
/5Phos/SNNNNNAAGTCGGAGGCCAAGCGGTCTTAGGAA;
In the above sequences, "/5Phos/" represents a 5' phosphorylation modification, "S" represents either of the two bases G/C, and "N" represents any one of the four bases A/T/G/C; "NNNNN" denotes the sense molecular tag and the antisense molecular tag complementary to it.
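As an illustration of how the two strands relate, the following sketch assembles both strands for a given sense tag and balancing base, using the fixed sequences quoted above. The function name `build_linker` and the example tag `ACGTA` are hypothetical; the actual tag sequences are those of Table 2.

```python
# Fixed parts of the two example strands quoted in the text (5' to 3').
STRAND1_PREFIX = "GGCTCACAGAACGACATGGCTACGATCCGACTT"  # upstream of the sense tag
STRAND2_SUFFIX = "AAGTCGGAGGCCAAGCGGTCTTAGGAA"        # downstream of the antisense tag

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_linker(sense_tag, s_base):
    """Return (strand 1, strand 2) for one sense tag and one balancing base."""
    if not 5 <= len(sense_tag) <= 8 or set(sense_tag) - set("ACGT"):
        raise ValueError("sense tag must be 5 to 8 bases of A/C/G/T")
    if s_base not in ("G", "C"):
        raise ValueError("balancing base S must be G or C")
    # Strand 1: fixed prefix, sense tag, balancing base S, overhanging T.
    strand1 = STRAND1_PREFIX + sense_tag + s_base + "T"
    # Strand 2: 5' phosphate, complement of S, antisense tag, fixed suffix.
    strand2 = "/5Phos/" + revcomp(sense_tag + s_base) + STRAND2_SUFFIX
    return strand1, strand2


strand1, strand2 = build_linker("ACGTA", "G")  # hypothetical 5 bp tag
print(strand1)
print(strand2)
```

After annealing, the 3' T of strand 1 is left unpaired, which gives the single-base overhang used for T-A ligation.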
The designed mixed specific molecular tag linkers Adapter 1 to Adapter 48 are shown in Table 2 below:
TABLE 2 hybrid specific molecular tag linker sequences
[Table 2 is reproduced as images in the original publication: Figures BDA0002712504480000101 to BDA0002712504480000131]
The linker sequences in Table 2 above were prepared according to the following procedure:
(1) According to the total synthesized amount of linker oligonucleotide strand 1 and linker oligonucleotide strand 2, add the corresponding volume of linker annealing buffer to each, mix thoroughly, centrifuge briefly, and leave at room temperature for 10 min to dissolve, giving linker oligonucleotide strand annealing working solutions with a molar concentration of 100 μM;
(2) Add 25 μL each of the annealing working solutions of linker oligonucleotide strand 1 and linker oligonucleotide strand 2 carrying the corresponding molecular tag to a new PCR tube, vortex to mix, and place in a PCR instrument (Applied Biosystems Veriti™ 96-Well Thermal Cycler) to anneal according to the program in Table 3, giving the Adapter 1 to Adapter 48 linker stock solutions (50 μM), each containing one molecular tag. Quality control of each linker stock solution is performed with an Agilent DNA 1000 chip.
TABLE 3 linker annealing PCR procedure
Figure BDA0002712504480000132
(3) Mix the linker stock solutions that pass quality control in equal proportions to obtain the mixed linker stock solution. Before use, dilute with TE buffer as needed, aliquot, and store at -20 ± 5 °C.
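The concentrations in steps (1) and (2) follow from simple arithmetic, sketched below. The 20 nmol synthesis amount is a hypothetical example, the function names are illustrative, and the duplex concentration assumes complete annealing of the two strands.

```python
def resuspension_volume_ul(oligo_nmol, target_um=100.0):
    """Buffer volume (uL) that brings `oligo_nmol` of dry oligo to `target_um` uM.

    1 uM equals 1 pmol/uL, so nmol * 1000 / uM gives microlitres.
    """
    return oligo_nmol * 1000.0 / target_um


def duplex_concentration_um(strand_um=100.0, vol1_ul=25.0, vol2_ul=25.0):
    """Duplex concentration after mixing two strand solutions of equal
    concentration, assuming complete annealing (the limiting strand sets
    the amount of duplex)."""
    limiting_pmol = strand_um * min(vol1_ul, vol2_ul)
    return limiting_pmol / (vol1_ul + vol2_ul)


buffer_ul = resuspension_volume_ul(20.0)  # hypothetical 20 nmol synthesis
stock_um = duplex_concentration_um()      # 25 uL + 25 uL of 100 uM strands

print(buffer_ul)  # 200.0 uL of annealing buffer for a 100 uM working solution
print(stock_um)   # 50.0 uM linker stock, matching the value stated in step (2)
```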
Example 3 double index adaptor primer
The embodiment provides a matched double index adaptor primer of a mixed specific molecular label adaptor, which comprises a forward adaptor primer and a reverse adaptor primer;
the forward adaptor primer sequentially comprises a GF primer sequence, an index2 sequence and an index2 primer binding region sequence from a 5 'end to a 3' end;
the reverse adaptor primer sequentially comprises a GR primer sequence, an index1 sequence and an index1 primer binding region reverse complementary sequence from a 5 'end to a 3' end;
as shown in fig. 1: examples of the nucleotide sequence of the forward adapter primer and the nucleotide sequence of the reverse adapter primer are as follows:
forward adaptor primer 5 '-3':
Figure BDA0002712504480000141
reverse linker primer 5 '-3':
Figure BDA0002712504480000142
in the above sequences, "nnnnnnnnnnnn" in the forward adaptor primer indicates the sample tag index2 sequence, the sequence of the single underlined part is the GF primer sequence, and the sequence of the double underlined part is the index2 primer binding region sequence; "nnnnnnnnnnnn" in the reverse adaptor primer indicates the sample tag index1 sequence, the sequence of the single underlined part is the GR primer sequence, and the sequence of the double underlined part is the reverse complement of the index1 primer binding region; the sequence of sample tag index1 and the sequence of sample tag index2 may be the same or different. Further, the index1 sequence and the index2 sequence are selected from table 1.
The double-index Adapter primer of the hybrid specific molecular tag Adapter sequence in Table 2 in example 2 was designed according to the design principle of the double-index Adapter primer matched with the hybrid specific molecular tag Adapter.
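The design principle above amounts to concatenating three segments per primer. The sketch below shows this with placeholder sequences only: the GF and GR primer sequences and the primer binding regions appear in figures of the original publication and are not reproduced here, so every sequence in the example is hypothetical, as is the function name `build_adapter_primers`.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_adapter_primers(gf, index2, index2_pbr, gr, index1, index1_pbr):
    """Assemble the forward and reverse adapter primers (5' to 3').

    forward: GF primer sequence + index2 + index2 primer binding region
    reverse: GR primer sequence + index1 + reverse complement of the
             index1 primer binding region
    """
    forward = gf + index2 + index2_pbr
    reverse = gr + index1 + revcomp(index1_pbr)
    return forward, reverse


# Placeholder segments for illustration only; not the sequences of the invention.
forward, reverse = build_adapter_primers(
    gf="AAAACCCC", index2="ACGTACGTAC", index2_pbr="GGTTGGTT",
    gr="TTTTGGGG", index1="TGCATGCATG", index1_pbr="AAGG",
)
print(forward)
print(reverse)
```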
Example 4 Double index adaptor blocking sequences
This example provides a matched double index adaptor blocking sequence for a hybrid specific molecular tag adaptor: comprises a forward linker blocking sequence and a reverse linker blocking sequence;
the forward adaptor blocking sequence sequentially comprises a reverse complementary sequence of a sense adaptor complementary region, a reverse complementary sequence of an index2 primer binding region, a reverse complementary sequence of an index2 sequence and a reverse complementary sequence of a GF primer sequence from a 5 'end to a 3' end; the reverse complement sequence of the sense adapter complementary region is completely or partially overlapped with the 5' terminal sequence of the reverse complement sequence of the index2 primer binding region;
the reverse adaptor blocking sequence sequentially comprises a GR primer sequence, an index1 sequence, an index1 primer binding region reverse complementary sequence and an antisense adaptor complementary region reverse complementary sequence from a 5 'end to a 3' end; the reverse complementary sequence of the antisense joint complementary region is totally or partially overlapped with the 3' terminal sequence of the reverse complementary sequence of the index1 primer binding region;
examples of the nucleotide sequence of the forward linker blocking sequence and the nucleotide sequence of the reverse linker blocking sequence are as follows:
forward linker blocking sequence 5 '-3':
Figure BDA0002712504480000151
reverse linker blocking sequence 5 '-3':
Figure BDA0002712504480000152
In the above sequences, "/3Phos/" indicates a 3' phosphorylation modification. In the forward linker blocking sequence, "NNNNNNNNNN" represents the reverse complement of the sample tag index2 sequence. The forward linker blocking sequence comprises, in order from the 5' end to the 3' end, the reverse complement of the sense linker complementary region, the reverse complement of the index2 primer binding region, the reverse complement of the index2 sequence and the reverse complement of the GF primer sequence; the sequence marked with a wavy line is the reverse complement of the sense linker complementary region, the single-underlined sequence is the reverse complement of the index2 primer binding region, the reverse complement of the sense linker complementary region partially overlaps the 5' terminal sequence of the reverse complement of the index2 primer binding region, and the double-underlined sequence is the reverse complement of the GF primer sequence. In the reverse linker blocking sequence, "NNNNNNNNNN" represents the sample tag index1 sequence. The reverse linker blocking sequence comprises, in order from the 5' end to the 3' end, the GR primer sequence, the index1 sequence, the reverse complement of the index1 primer binding region and the reverse complement of the antisense linker complementary region; the reverse complement of the antisense linker complementary region wholly or partially overlaps the 3' terminal sequence of the reverse complement of the index1 primer binding region, the double-underlined sequence is the GR primer sequence, the single-underlined sequence is the reverse complement of the index1 primer binding region, and the sequence marked with a wavy line is the reverse complement of the antisense linker complementary region. The sample tag indexes of the forward linker blocking sequence and the reverse linker blocking sequence may be the same or different. Further, the index1 sequence and the index2 sequence are selected from Table 1.
The double index adaptor blocking sequence of the hybrid specific molecular tag adaptor sequence in Table 2 in example 2 was designed according to the design principle of the double index adaptor blocking sequence associated with the hybrid specific molecular tag adaptor.
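The segment order described in this example can be expressed compactly: the forward blocking sequence is the reverse complement of GF + index2 + index2 primer binding region, while the reverse blocking sequence runs in the same orientation as the reverse adapter primer. The sketch below assumes, for simplicity, that each linker complementary region lies wholly within the adjacent primer binding region (the text also allows partial overlap). All sequences and function names are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_blocker(gf, index2, index2_pbr):
    """Forward blocking sequence (5' to 3') with a 3' phosphate.

    Assumption: the sense linker complementary region is the 3' end of
    `index2_pbr`, so its reverse complement forms the 5' end of the blocker.
    """
    return revcomp(gf + index2 + index2_pbr) + "/3Phos/"


def reverse_blocker(gr, index1, index1_pbr):
    """Reverse blocking sequence (5' to 3') with a 3' phosphate.

    Assumption: the antisense linker complementary region is the 5' end of
    `index1_pbr`, so its reverse complement forms the 3' end of the blocker.
    """
    return gr + index1 + revcomp(index1_pbr) + "/3Phos/"


# Placeholder segments for illustration only.
fwd_block = forward_blocker(gf="AAAA", index2="CC", index2_pbr="GGT")
rev_block = reverse_blocker(gr="TTTT", index1="AC", index1_pbr="GGA")
print(fwd_block)
print(rev_block)
```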
Example 5 Double-index target region capture library construction procedure
The embodiment provides a method for constructing a library by using a double index library construction system, which comprises the following steps:
(1) end repair and addition of "A"
Place the NEBNext Ultra II End Prep Reaction Buffer and the NEBNext Ultra II End Prep Enzyme Mix on ice in advance; once the reagents have thawed, vortex to mix and centrifuge. Prepare the end-repair and "A"-addition reaction mix (Mix1) according to Table 4, vortex to mix, and centrifuge.
TABLE 4 Mix1 preparation Table
Components                                    Single reaction volume (μL)
NEBNext Ultra II End Prep Reaction Buffer     7
NEBNext Ultra II End Prep Enzyme Mix          3
Total volume                                  10
Note: Mix1 is prepared on ice.
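When several samples are processed together, the per-reaction volumes of Table 4 are scaled into a master mix. The sketch below shows one way to do this; the 10% overage for pipetting loss and the function name `scale_master_mix` are assumptions made for illustration and are not specified in the text.

```python
# Per-reaction volumes (uL) from Table 4.
MIX1_UL = {
    "NEBNext Ultra II End Prep Reaction Buffer": 7.0,
    "NEBNext Ultra II End Prep Enzyme Mix": 3.0,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes to a master mix for `n_reactions`.

    `overage` is an assumed extra fraction to cover pipetting loss.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


batch = scale_master_mix(MIX1_UL, n_reactions=8)
for name, volume in batch.items():
    print(f"{name}: {volume} uL")
print(f"Total: {round(sum(batch.values()), 2)} uL")
```

The same helper applies unchanged to the other preparation tables of this example by passing their per-reaction volumes.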
Dispense 10 μL of the prepared Mix1 into each 50 μL DNA sample, vortex to mix, and centrifuge. Incubate on a thermomixer or PCR instrument under the reaction conditions in Table 5. When the incubation is finished and the temperature has dropped to room temperature, centrifuge briefly in a high-speed centrifuge to collect the droplets formed by evaporation at the bottom of the tube.
TABLE 5 end repair and addition "A" reaction procedure
Figure BDA0002712504480000161
(2) Joint connection
Vortex the thawed NEBNext Ultra II Ligation Master Mix, the NEBNext Ligation Enhancer and the linker of Example 2 to mix, and centrifuge. Add 2 μL of linker working solution to each sample and pipette up and down to mix. Prepare the linker ligation reaction mix (Mix2) according to Table 6, vortex thoroughly, and centrifuge; on ice, dispense 31 μL into each reaction tube, vortex to mix, centrifuge, and incubate the reaction tubes in a thermomixer at 20 °C for 15 min.
Table 6 Mix2 formulation table
Components                              Single reaction volume (μL)
NEBNext Ultra II Ligation Master Mix    30
NEBNext Ligation Enhancer               1
Total volume                            31
Note: Mix2 is prepared on ice.
(3) Purification after ligation
Add 87 μL of AMPure XP magnetic beads to each reaction tube, vortex to mix, and incubate at room temperature for 10 min. After incubation, centrifuge the tubes briefly and place them on a magnetic rack until the liquid is clear, then discard the supernatant. Keeping the tubes on the magnetic rack, add 400 μL of 80% (v/v) ethanol to each tube in turn, close the caps, rinse 3 times, discard the supernatant, and repeat the rinse. After discarding the supernatant from the second rinse, centrifuge the tubes briefly and remove the residual liquid with a 20 μL pipette. Open the caps, leave the tubes on the magnetic rack, and air-dry until the bead surface is matte. Remove the tubes from the magnetic rack, add 22 μL of TE buffer (pH 8.0), pipette to mix the beads and TE thoroughly, and incubate at room temperature for 5 min. After incubation, centrifuge the tubes briefly and place them on the magnetic rack until the liquid is completely clear. Transfer the supernatant (the purified product) to a new 1.5 mL centrifuge tube for later use.
(4) Pre-Capture PCR (Non-C-PCR)
The double index adaptor primer working solution of example 3 and the 2× KAPA HiFi HotStart ReadyMix were thawed at room temperature in advance; after thawing, the reagents were vortexed, mixed and centrifuged. The reaction components were added to the PCR tubes in sequence according to Table 7, then mixed and centrifuged. The samples were placed in a PCR instrument and amplified according to the program in Table 8.
Table 7 Non-C-PCR Mix formulation
Components Single reaction volume (μL)
Forward adaptor primer working solution (10 μM) 2.5
Reverse adaptor primer working solution (10 μM) 2.5
2×KAPA HiFi HotStart ReadyMix 25
Adapter-Ligated library 20
Total volume 50
TABLE 8 Non-C-PCR reaction procedure
Figure BDA0002712504480000171
Figure BDA0002712504480000181
(5) Purification of Non-C-PCR products
The PCR product was purified using 45 μL of AMPure XP magnetic beads and finally eluted in 31 μL of TE (pH 8.0) (same procedure as in step (3)). The purified product was transferred to a new 1.5 mL centrifuge tube for library quality control, hybridization, or storage at -20 °C.
(6) Preparation of the evaporation Mix
The double index adaptor blocking sequence working solution of example 4, the Cot-1 DNA and the library to be hybridized were thawed at 4 °C. After thawing, each was vortexed, mixed and centrifuged, then added to a 1.5 mL centrifuge tube according to Table 9, and the mixture was vortexed and centrifuged.
Table 9 Evaporation Mix formulation
Figure BDA0002712504480000182
A hole was punched in the cap of the evaporation Mix tube, the tube was placed in a vacuum concentrator and evaporated to dryness at 60 °C, and the hole in the cap was sealed with sealing film once the contents were dry. During evaporation, the probe to be hybridized was thawed at 4 °C. The xGen 2X Hybridization Buffer and the xGen Hybridization Buffer Enhancer were thawed at room temperature, vortexed and centrifuged;
(7) the denaturation Mix was prepared according to Table 10, vortexed and mixed, dispensed into the dried Mix tubes, vortexed, mixed and centrifuged, and denatured at 95 °C for 10 min;
Table 10 Denaturation Mix formulation
Components Single reaction volume (μ L)
xGen 2X Hybridization Buffer 8.5
xGen Hybridization Buffer Enhancer 2.7
Nuclease-Free Water 1.8
Total volume 13
2 to 3 minutes before denaturation of the sample library was complete, the thawed probe was dispensed into 0.2 mL PCR tubes at 4 μL per reaction; after denaturation, the sample library was centrifuged at full speed for 1 min in a high-speed centrifuge, quickly transferred to the PCR tube, vortexed and centrifuged;
(8) the PCR tubes were placed in a PCR instrument and hybridized overnight at 65 °C (heated lid at 75 °C);
(9) elution was performed after the overnight incubation. At least 30 minutes before elution, the Wash Buffers II and III, Stringent Wash Buffer and Bead Wash Buffer stock solutions were taken out of the -20 °C freezer, thawed at room temperature, prepared as 1× elution working solutions according to the single-reaction amounts in Table 11, and preheated at the corresponding temperatures. The M-270 magnetic beads and the AMPure XP magnetic beads were equilibrated at room temperature;
Table 11 Preparation of 1× elution working solutions
Components Single reaction amount (μL)
xGen 10×Stringent Wash Buffer 400(65℃)
xGen 10×Wash BufferⅠ 100(65℃)+200(RT)
xGen 10×Wash BufferⅡ 200
xGen 10×Wash BufferⅢ 200
xGen Bead Wash Buffer 500
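The 10× stocks in Table 11 are diluted to 1× before use. The sketch below works through that arithmetic with C1·V1 = C2·V2, under two assumptions that are not stated in the text: that the volumes in Table 11 are final 1× working volumes per reaction, and that the diluent is nuclease-free water. The Bead Wash Buffer is left out because its stock concentration is not given in the table. The function name `dilution` is hypothetical.

```python
STOCK_FOLD = 10  # the wash buffers in Table 11 are labelled as 10× stocks

# Assumed final 1× working volumes per reaction (μL), read from Table 11.
WORKING_VOLUMES = {
    "Stringent Wash Buffer": 400,
    "Wash Buffer I": 100 + 200,  # 65 °C portion + room-temperature portion
    "Wash Buffer II": 200,
    "Wash Buffer III": 200,
}


def dilution(final_volume, fold=STOCK_FOLD):
    """Split a final 1× volume into (stock volume, diluent volume)."""
    stock = final_volume / fold
    return stock, final_volume - stock


for name, volume in WORKING_VOLUMES.items():
    stock, water = dilution(volume)
    print(f"{name}: {stock:.0f} μL of 10× stock + {water:.0f} μL of water")
```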
(10) The M-270 magnetic beads equilibrated to room temperature were vortexed thoroughly, 20 μL was transferred to a new 1.5 mL centrifuge tube, the tube was placed on a magnetic rack, and the supernatant was discarded. The tube was removed from the magnetic rack and the beads were rinsed 3 times with 200 μL of 1× Bead Wash Buffer; after the final supernatant was discarded, the beads were resuspended in 100 μL of 1× Bead Wash Buffer and transferred to a new 0.2 mL PCR tube for later use;
(11) the PCR tube containing the beads was placed on a magnetic rack and the supernatant was discarded; the overnight hybridization reaction was transferred to the bead tube, mixed by vortexing, and incubated in a PCR instrument at 65 °C (heated lid at 75 °C) for 45 minutes. Every 15 minutes during incubation, the reaction tube was taken out and quickly vortexed once;
(12) the magnetic beads were rinsed with the reagents in the order, amounts and number of rinses given in Table 12;
Table 12 Magnetic bead rinse sequence and method
Figure BDA0002712504480000191
Figure BDA0002712504480000201
(13) The 2× KAPA HiFi HotStart ReadyMix and the post-hybridization PCR primers (GF Primer: /5Phos/TCTCAGTACGTCAGCAGTT; GR Primer: GGCATGGCGACCTTATCAG) were thawed at 4 °C in advance, vortexed thoroughly and centrifuged. The post-hybridization PCR Mix was prepared according to Table 13 for later use;
Table 13 Post-hybridization PCR Mix formulation
Components Single reaction volume (μL)
2×KAPA HiFi HotStart ReadyMix 25
GF Primer(10μM) 2.5
GR Primer(10μM) 2.5
Total volume 30
(14) 20 μL of the rinsed, resuspended magnetic beads was transferred into the post-hybridization PCR Mix, mixed by pipetting, and placed in a PCR instrument, and the program in Table 14 was run to perform post-hybridization PCR;
Table 14 Post-hybridization PCR reaction program
Figure BDA0002712504480000202
(15) The PCR product was purified using 60 μL of AMPure XP magnetic beads and finally eluted in 31 μL of TE (pH 8.0) (same procedure as in step (3)). The purified library was used for library quality control or sequencing, or stored at -20 °C.
Comparative example 1
This comparative example provides a method for constructing a library using an existing double index adaptor primer system, as follows:
1) In the existing double index adaptor primer system, the mixed specific molecular tag linker comprises the following components:
linker oligonucleotide strand 1 (forward linker sequence) 5 '-3':
Figure BDA0002712504480000211
linker oligonucleotide strand 2 (reverse linker sequence) 5 '-3':
Figure BDA0002712504480000212
forward adaptor primer 5 '-3':
TCTCAGTACGTCAGCAGTTNNNNNNNNNNCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACT;
reverse linker primer 5 '-3':
GGCATGGCGACCTTATCAGNNNNNNNNNNTTGTCTTCCTAAGACCGCTTGGCC;
In the above sequences, the double-underlined sequence is the molecular tag (selected from Table 15), and "/5Phos/" indicates a 5' phosphorylation modification. "NNNNNNNNNN" in the forward adaptor primer represents sample tag index2, and "NNNNNNNNNN" in the reverse adaptor primer represents sample tag index1; the sample tag indexes of the forward and reverse adaptor primers may be the same or different. The index sequences are selected from Table 1.
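As an illustration of how the 10-nt placeholder is filled, the sketch below substitutes an index into each primer template given above. The two index sequences are hypothetical stand-ins, not entries from Table 1, and the helper name `insert_index` is an assumption made for this example.

```python
# Primer templates copied from the comparative example; the run of 10 N's is the index slot.
FORWARD_TEMPLATE = "TCTCAGTACGTCAGCAGTTNNNNNNNNNNCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACT"
REVERSE_TEMPLATE = "GGCATGGCGACCTTATCAGNNNNNNNNNNTTGTCTTCCTAAGACCGCTTGGCC"
PLACEHOLDER = "N" * 10


def insert_index(template, index):
    """Replace the 10-nt N placeholder in a primer template with a sample index."""
    if len(index) != len(PLACEHOLDER) or set(index) - set("ACGT"):
        raise ValueError("index must be exactly 10 nt of A/C/G/T")
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one 10-nt N placeholder")
    return template.replace(PLACEHOLDER, index)


# Hypothetical indexes for illustration only (not taken from Table 1).
index2 = "ACGTACGTAC"  # goes into the forward adaptor primer
index1 = "TGCATGCATG"  # goes into the reverse adaptor primer

forward_primer = insert_index(FORWARD_TEMPLATE, index2)
reverse_primer = insert_index(REVERSE_TEMPLATE, index1)
print(forward_primer)
print(reverse_primer)
```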
Mixed specific molecular tag linkers Adapter 1 to Adapter 16, each containing a different molecular tag, were designed according to the design principle of the mixed specific molecular tag linker; the molecular tags are selected from Table 15.
Table 15 Existing mixed specific molecular tags
Linker strand Linker No. Molecular tag sequence Linker strand Linker No. Molecular tag sequence
Forward linker sequence AD-1-1 CAATA Reverse linker sequence AD-2-1 TATTG
Forward linker sequence AD-1-2 CCACT Reverse linker sequence AD-2-2 AGTGG
Forward linker sequence AD-1-3 TTAGG Reverse linker sequence AD-2-3 CCTAA
Forward linker sequence AD-1-4 CAGAC Reverse linker sequence AD-2-4 GTCTG
Forward linker sequence AD-1-5 ATCGA Reverse linker sequence AD-2-5 TCGAT
Forward linker sequence AD-1-6 TAGAT Reverse linker sequence AD-2-6 ATCTA
Forward linker sequence AD-1-7 CTAAG Reverse linker sequence AD-2-7 CTTAG
Forward linker sequence AD-1-8 GTCTC Reverse linker sequence AD-2-8 GAGAC
Forward linker sequence AD-1-9 CCGAA Reverse linker sequence AD-2-9 TTCGG
Forward linker sequence AD-1-10 ACTAT Reverse linker sequence AD-2-10 ATAGT
Forward linker sequence AD-1-11 TCCAG Reverse linker sequence AD-2-11 CTGGA
Forward linker sequence AD-1-12 AGCTC Reverse linker sequence AD-2-12 GAGCT
Forward linker sequence AD-1-13 AACTA Reverse linker sequence AD-2-13 TAGTT
Forward linker sequence AD-1-14 CCCAT Reverse linker sequence AD-2-14 ATGGG
Forward linker sequence AD-1-15 CTTAG Reverse linker sequence AD-2-15 CTAAG
Forward linker sequence AD-1-16 ATCTC Reverse linker sequence AD-2-16 GAGAT
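Each reverse-strand tag in Table 15 should be the reverse complement of the forward-strand tag in the same row, since the two strands anneal to form the linker. A short Python sketch that checks this for all 16 rows follows; the variable and function names are illustrative.

```python
# (forward tag, reverse tag) pairs copied row by row from Table 15.
PAIRS = [
    ("CAATA", "TATTG"), ("CCACT", "AGTGG"), ("TTAGG", "CCTAA"), ("CAGAC", "GTCTG"),
    ("ATCGA", "TCGAT"), ("TAGAT", "ATCTA"), ("CTAAG", "CTTAG"), ("GTCTC", "GAGAC"),
    ("CCGAA", "TTCGG"), ("ACTAT", "ATAGT"), ("TCCAG", "CTGGA"), ("AGCTC", "GAGCT"),
    ("AACTA", "TAGTT"), ("CCCAT", "ATGGG"), ("CTTAG", "CTAAG"), ("ATCTC", "GAGAT"),
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Collect any row whose reverse tag is not the reverse complement of its forward tag.
mismatches = [(fwd, rev) for fwd, rev in PAIRS if reverse_complement(fwd) != rev]
print(f"{len(PAIRS) - len(mismatches)} of {len(PAIRS)} rows are reverse-complement pairs")
```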
The double index adaptor primers of this comparative example are those designed in example 3, and the double index adaptor blocking sequences of this comparative example are those designed in example 4.
2) Library construction method
The library construction method was the same as in example 5.
Experimental example 1 Comparison of library construction and capture performance
This experiment compares the mixed specific molecular tag linker of the invention with the existing linker system and evaluates fragment conversion efficiency and target region capture performance.
I. Library construction systems
Scheme 1: comprises the mixed specific molecular tag linkers of example 2, the double index adaptor primers of example 3 and the double index adaptor blocking sequences of example 4. The index1 and index2 sequences are selected from Table 1. Specifically, index combinations No. 1 to 6 were used in this experimental example;
Scheme 2: comprises the double index library construction system of comparative example 1. The index1 and index2 sequences are selected from Table 1. Specifically, index combinations No. 1 to 6 were used in this experimental example;
II. Linker performance comparison test
(1) Experimental method: 50 ng of gDNA was fragmented, and libraries were constructed with the double index library construction systems of schemes 1 and 2 above, following steps (1) to (5) of example 5. A portion of the intermediate library purified in step (5) was subjected to WGS sequencing. The remaining library was captured for the target region with a commercially available liquid-phase capture probe (produced by IDT) (i.e., the library obtained through steps (6) to (15) of example 5) and then sequenced (sequencing platform: Gene+Seq 2000). During the sequencing run, scheme 1 required no dark reaction setting, whereas scheme 2 did.
(2) Evaluation indexes: the wet experiment index is the total library yield; the dry experiment indexes are average depth, capture efficiency, mapping rate and coverage uniformity (1× average depth coverage).
Library concentration was measured on a Life Technologies Qubit 3.0 fluorometer with the Qubit™ dsDNA HS Assay Kit;
Average depth, capture efficiency, mapping rate and coverage uniformity (1× average depth coverage) are basic evaluation indexes well known in the sequencing field and are all obtained by computation on the sequencing data (performed by Geneplus in this experiment). The same amount of data was extracted for each scheme before calculation, and the corresponding indexes were then compared between the schemes;
(3) test results
1) Wet experiment index: the library construction quality control statistics for the 2 library construction systems are shown in Table 16. The average library yield of the library construction system of the invention is about 9% higher than that of the comparative example (1299 ng versus 1191 ng), which indicates that the fragment conversion of the invention is superior to that of the comparative system.
Table 16 Library yields of the two library construction systems
Index No. Comparative example The invention
1 1152 1296
2 1212 1320
3 1194 1260
4 1152 1302
5 1242 1350
6 1194 1266
Average yield (ng) 1191 1299
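The averages and the roughly 9% improvement can be recomputed directly from the six per-index yields in Table 16. A minimal Python sketch follows; the variable names are illustrative.

```python
from statistics import mean

# Library yields (ng) for index No. 1 to 6, copied from Table 16.
comparative = [1152, 1212, 1194, 1152, 1242, 1194]
invention = [1296, 1320, 1260, 1302, 1350, 1266]

mean_comparative = mean(comparative)
mean_invention = mean(invention)

# Relative improvement of the invention over the comparative example, in percent.
improvement = (mean_invention - mean_comparative) / mean_comparative * 100

print(f"Comparative example: {mean_comparative} ng")
print(f"The invention: {mean_invention} ng")
print(f"Relative improvement: {improvement:.1f}%")
```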
2) Dry experiment indexes: the post-capture dry experiment quality control statistics for the 2 library construction systems are shown in Table 17. The mapping rates of the two systems do not differ significantly. Taken together, the library yield data and the average depth at the same data volume (2385 versus 2123) indicate that the fragment conversion efficiency of the library construction system of the invention is superior to that of the comparative system. At the same data volume, the capture efficiency (57.42% versus 47.69%) and the 1× mean coverage (51.36% versus 50.92%) show that the invention is applicable to target region capture sequencing and performs better than the comparative library construction system.
Table 17 Summary of dry experiment sequencing indexes
Library construction system Mapping rate Average depth Capture efficiency 1.0× mean coverage
Comparative example 99.85% 2123 47.69% 50.92%
The invention 99.88% 2385 57.42% 51.36%
(4) Test conclusion
In conclusion, the library construction system of the invention is superior to the comparative library construction system in both DNA fragment conversion efficiency and capture performance.
Experimental example 2 Comparison of ligation uniformity among molecular tags
I. Library construction systems:
Scheme 1: comprises the mixed specific molecular tag linkers of example 2 and the double index adaptor primers of example 3; the sample tag index sequence combinations of the forward and reverse adaptor primers are selected from Table 1. Specifically, index combinations No. 1 to 3 were used in this experimental example;
Scheme 2: comprises the mixed specific molecular tag linkers and the double index adaptor primers of comparative example 1; the index1 and index2 sequences are selected from Table 1. Specifically, index combinations No. 1 to 3 were used in this experimental example;
II. Comparison of molecular tag ligation uniformity
(1) Experimental method: 1000 ng of gDNA was fragmented, libraries were constructed with the library construction systems of scheme 1 and scheme 2, respectively, following steps (1) to (5) of example 5, and the libraries were sequenced (sequencing platform: Gene+Seq 2000).
(2) Evaluation index: coefficient of variation (CV) of the number of reads supported by different molecular tags.
The number of reads supported by each molecular tag was counted, and the CV was calculated as:
CV of reads supported by different molecular tags = standard deviation of the number of reads supported by different molecular tags / mean number of supported reads
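The CV defined above can be computed as in the following Python sketch. The read counts are hypothetical values made up for illustration (the measured counts are in Table 18), and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or the population form is meant.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(counts) / mean(counts)


# Hypothetical reads supported by each molecular tag (not data from Table 18).
reads_per_tag = {
    "tag_A": 10500,
    "tag_B": 9800,
    "tag_C": 10200,
    "tag_D": 9500,
}

cv = coefficient_of_variation(list(reads_per_tag.values()))
print(f"CV of reads supported by different molecular tags: {cv:.2%}")
```

A lower CV means the tags are represented more evenly, which is how ligation uniformity is compared between the two schemes. Replacing `stdev` with `pstdev` gives the population form.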
(3) Test results:
The number of reads supported by each molecular tag in the two schemes is shown in Table 18.
Table 18 Number of reads supported by each molecular tag in the two schemes
Figure BDA0002712504480000241
The CV of the number of reads supported by each molecular tag in the library construction system of the invention is clearly better than that of the comparative example, indicating that ligation across the different molecular tag linkers is more uniform than in the comparative example.
Experimental example 3 Comparison of amplification uniformity among indexes
I. Library construction systems:
Scheme 1: comprises the mixed specific molecular tag linkers of example 2 and the double index adaptor primers of example 3; the sample tag index sequence combinations of the forward and reverse adaptor primers are selected from Table 1;
Scheme 2: comprises the mixed specific molecular tag linkers and the double index adaptor primers of comparative example 1; the sample tag index sequence combinations of the forward and reverse adaptor primers are selected from Table 19;
Table 19 index1 and index2 sequence information
Figure BDA0002712504480000251
Figure BDA0002712504480000261
Figure BDA0002712504480000271
II. Index amplification uniformity comparison test
(1) Experimental method: 1000 ng of gDNA was fragmented, and linker ligation products were prepared with the library construction systems of scheme 1 and scheme 2, respectively, following steps (1) to (3) of example 5. The purified linker ligation products were divided equally and amplified with the double index adaptor primers of scheme 1 and scheme 2, respectively. The amplification products were purified and quantified with the Qubit dsDNA HS Assay Kit.
(2) Evaluation index: coefficient of variation (CV) of library yield among indexes.
The CV of library yield among different indexes was calculated as:
Inter-index library yield CV = standard deviation of library concentrations across indexes / mean library concentration
(3) Test results:
For scheme 1, the library yield CV across the 483 index combinations was 7.9%; the library yield data are shown in FIG. 2. For the comparative example, the library yield CV across the 96 index combinations was 8.08%; the library yield data are shown in FIG. 3. The index combinations of scheme 1 are therefore more uniform than those of the comparative example.
Experimental example 4 Evaluation of the improvement in molecular tag sequencing quality
I. Test scheme: comprises the mixed specific molecular tag linkers of example 2 and the double index adaptor primers of example 3; the sample tag index sequence combinations of the forward and reverse adaptor primers are selected from Table 1;
II. Molecular tag sequencing quality improvement evaluation test
(1) Experimental method: 1000 ng of gDNA was fragmented, libraries were constructed with the above test scheme following steps (1) to (5) of example 5, and the libraries were sequenced (sequencing platform: Gene+Seq 2000).
(2) Evaluation index: the Q30 of the first 10 bp of both forward and reverse sequencing is greater than 85%.
(3) Test results:
The average per-base Q30 distribution of forward and reverse sequencing of the test library is shown in FIG. 4. The Q30 of the first 10 bp (molecular tag and downstream fixed base region) of both forward and reverse sequencing is greater than 85%, which meets the expected requirement.
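Q30 here refers to the fraction of base calls with a Phred quality score of at least 30, i.e. an estimated error probability of at most 10^-3. The sketch below shows one way a per-position Q30 over the first 10 bases could be computed from FASTQ quality strings. It assumes Phred+33 encoding and reads of at least 10 bases; the quality strings are hypothetical and the function name `q30_by_position` is illustrative, not part of any sequencing platform's software.

```python
def q30_by_position(quality_strings, n_positions=10, offset=33, threshold=30):
    """Return, for each of the first n positions, the fraction of reads with Q >= threshold.

    Assumes Phred+33 encoded quality strings, each at least n_positions long.
    """
    passing = [0] * n_positions
    for quality in quality_strings:
        for i in range(n_positions):
            if ord(quality[i]) - offset >= threshold:
                passing[i] += 1
    return [count / len(quality_strings) for count in passing]


# Hypothetical quality strings: 'I' = Q40, '?' = Q30, '5' = Q20.
qualities = [
    "IIIIIIIIIIII",
    "IIII5IIIIIII",
    "?IIIIIIIIIII",
    "IIIIIIIIII55",
]

fractions = q30_by_position(qualities)
for position, fraction in enumerate(fractions, start=1):
    print(f"base {position}: Q30 = {fraction:.0%}")
print("all positions above 85%:", all(f > 0.85 for f in fractions))
```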
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom remain within the scope of protection of the invention.
Sequence listing
<110> Beijing Jiyin technologies Ltd
BEIJING JIYINJIA MEDICAL LABORATORY Co.,Ltd.
<120> specific molecular tag UMI group, mixed specific molecular tag joint and application
<160> 96
<170> SIPOSequenceListing 1.0
<210> 1
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 1
ggctcacaga acgacatggc tacgatccga cttagactgt 40
<210> 2
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 2
ggctcacaga acgacatggc tacgatccga cttcgcatgt 40
<210> 3
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 3
ggctcacaga acgacatggc tacgatccga cttgcgatct 40
<210> 4
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 4
ggctcacaga acgacatggc tacgatccga ctttctagct 40
<210> 5
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 5
ggctcacaga acgacatggc tacgatccga ctttagacgt 40
<210> 6
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 6
ggctcacaga acgacatggc tacgatccga ctttcgcagt 40
<210> 7
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 7
ggctcacaga acgacatggc tacgatccga ctttgcgact 40
<210> 8
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 8
ggctcacaga acgacatggc tacgatccga cttgtctact 40
<210> 9
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 9
ggctcacaga acgacatggc tacgatccga ctttagacgt 40
<210> 10
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 10
ggctcacaga acgacatggc tacgatccga ctttcgcagt 40
<210> 11
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 11
ggctcacaga acgacatggc tacgatccga ctttgcgact 40
<210> 12
<211> 40
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 12
ggctcacaga acgacatggc tacgatccga cttgtctact 40
<210> 13
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 13
ggctcacaga acgacatggc tacgatccga cttacatggg t 41
<210> 14
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 14
ggctcacaga acgacatggc tacgatccga cttcactgag t 41
<210> 15
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 15
ggctcacaga acgacatggc tacgatccga cttgagtctc t 41
<210> 16
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 16
ggctcacaga acgacatggc tacgatccga ctttatgccc t 41
<210> 17
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 17
ggctcacaga acgacatggc tacgatccga cttgacatgg t 41
<210> 18
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 18
ggctcacaga acgacatggc tacgatccga cttgcactag t 41
<210> 19
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 19
ggctcacaga acgacatggc tacgatccga cttcgagttc t 41
<210> 20
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 20
ggctcacaga acgacatggc tacgatccga cttctatgcc t 41
<210> 21
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 21
ggctcacaga acgacatggc tacgatccga cttgacatgg t 41
<210> 22
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 22
ggctcacaga acgacatggc tacgatccga cttgcactag t 41
<210> 23
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 23
ggctcacaga acgacatggc tacgatccga cttcgagttc t 41
<210> 24
<211> 41
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 24
ggctcacaga acgacatggc tacgatccga cttctatgcc t 41
<210> 25
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 25
ggctcacaga acgacatggc tacgatccga cttatagcag gt 42
<210> 26
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 26
ggctcacaga acgacatggc tacgatccga cttctcgaag gt 42
<210> 27
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 27
ggctcacaga acgacatggc tacgatccga cttgtgcatc ct 42
<210> 28
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 28
ggctcacaga acgacatggc tacgatccga ctttgtcatc ct 42
<210> 29
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 29
ggctcacaga acgacatggc tacgatccga cttcatagag gt 42
<210> 30
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 30
ggctcacaga acgacatggc tacgatccga cttactcgag gt 42
<210> 31
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 31
ggctcacaga acgacatggc tacgatccga cttagtgctc ct 42
<210> 32
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 32
ggctcacaga acgacatggc tacgatccga cttatgtctc ct 42
<210> 33
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 33
ggctcacaga acgacatggc tacgatccga cttcatagag gt 42
<210> 34
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 34
ggctcacaga acgacatggc tacgatccga cttactcgag gt 42
<210> 35
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 35
ggctcacaga acgacatggc tacgatccga cttagtgctc ct 42
<210> 36
<211> 42
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 36
ggctcacaga acgacatggc tacgatccga cttatgtctc ct 42
<210> 37
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 37
ggctcacaga acgacatggc tacgatccga cttatcagca ggt 43
<210> 38
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 38
ggctcacaga acgacatggc tacgatccga cttcagctaa cgt 43
<210> 39
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 39
ggctcacaga acgacatggc tacgatccga cttgctgata cct 43
<210> 40
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 40
ggctcacaga acgacatggc tacgatccga ctttgatcga gct 43
<210> 41
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 41
ggctcacaga acgacatggc tacgatccga cttgatcaca ggt 43
<210> 42
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 42
ggctcacaga acgacatggc tacgatccga ctttcagcaa cgt 43
<210> 43
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 43
ggctcacaga acgacatggc tacgatccga cttagctgta cct 43
<210> 44
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 44
ggctcacaga acgacatggc tacgatccga cttctgatga gct 43
<210> 45
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 45
ggctcacaga acgacatggc tacgatccga cttgctaaca ggt 43
<210> 46
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 46
ggctcacaga acgacatggc tacgatccga ctttgaccaa cgt 43
<210> 47
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 47
ggctcacaga acgacatggc tacgatccga cttatcggta cct 43
<210> 48
<211> 43
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 48
ggctcacaga acgacatggc tacgatccga cttcagttga gct 43
<210> 49
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 49
cagtctaagt cggaggccaa gcggtcttag gaa 33
<210> 50
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 50
catgcgaagt cggaggccaa gcggtcttag gaa 33
<210> 51
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 51
gatcgcaagt cggaggccaa gcggtcttag gaa 33
<210> 52
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 52
gctagaaagt cggaggccaa gcggtcttag gaa 33
<210> 53
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 53
cgtctaaagt cggaggccaa gcggtcttag gaa 33
<210> 54
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 54
ctgcgaaagt cggaggccaa gcggtcttag gaa 33
<210> 55
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 55
gtcgcaaagt cggaggccaa gcggtcttag gaa 33
<210> 56
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 56
gtagacaagt cggaggccaa gcggtcttag gaa 33
<210> 57
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 57
cgtctaaagt cggaggccaa gcggtcttag gaa 33
<210> 58
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 58
ctgcgaaagt cggaggccaa gcggtcttag gaa 33
<210> 59
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 59
gtcgcaaagt cggaggccaa gcggtcttag gaa 33
<210> 60
<211> 33
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 60
gtagacaagt cggaggccaa gcggtcttag gaa 33
<210> 61
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 61
cccatgtaag tcggaggcca agcggtctta ggaa 34
<210> 62
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 62
ctcagtgaag tcggaggcca agcggtctta ggaa 34
<210> 63
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 63
gagactcaag tcggaggcca agcggtctta ggaa 34
<210> 64
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 64
gggcataaag tcggaggcca agcggtctta ggaa 34
<210> 65
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 65
ccatgtcaag tcggaggcca agcggtctta ggaa 34
<210> 66
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 66
ctagtgcaag tcggaggcca agcggtctta ggaa 34
<210> 67
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 67
gaactcgaag tcggaggcca agcggtctta ggaa 34
<210> 68
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 68
ggcatagaag tcggaggcca agcggtctta ggaa 34
<210> 69
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 69
ccatgtcaag tcggaggcca agcggtctta ggaa 34
<210> 70
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 70
ctagtgcaag tcggaggcca agcggtctta ggaa 34
<210> 71
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 71
gaactcgaag tcggaggcca agcggtctta ggaa 34
<210> 72
<211> 34
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 72
ggcatagaag tcggaggcca agcggtctta ggaa 34
<210> 73
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 73
cctgctataa gtcggaggcc aagcggtctt aggaa 35
<210> 74
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 74
ccttcgagaa gtcggaggcc aagcggtctt aggaa 35
<210> 75
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 75
ggatgcacaa gtcggaggcc aagcggtctt aggaa 35
<210> 76
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 76
ggatgacaaa gtcggaggcc aagcggtctt aggaa 35
<210> 77
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 77
cctctatgaa gtcggaggcc aagcggtctt aggaa 35
<210> 78
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 78
cctcgagtaa gtcggaggcc aagcggtctt aggaa 35
<210> 79
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 79
ggagcactaa gtcggaggcc aagcggtctt aggaa 35
<210> 80
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 80
ggagacataa gtcggaggcc aagcggtctt aggaa 35
<210> 81
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 81
cctctatgaa gtcggaggcc aagcggtctt aggaa 35
<210> 82
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 82
cctcgagtaa gtcggaggcc aagcggtctt aggaa 35
<210> 83
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 83
ggagcactaa gtcggaggcc aagcggtctt aggaa 35
<210> 84
<211> 35
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 84
ggagacataa gtcggaggcc aagcggtctt aggaa 35
<210> 85
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 85
cctgctgata agtcggaggc caagcggtct taggaa 36
<210> 86
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 86
cgttagctga agtcggaggc caagcggtct taggaa 36
<210> 87
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 87
ggtatcagca agtcggaggc caagcggtct taggaa 36
<210> 88
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 88
gctcgatcaa agtcggaggc caagcggtct taggaa 36
<210> 89
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 89
cctgtgatca agtcggaggc caagcggtct taggaa 36
<210> 90
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 90
cgttgctgaa agtcggaggc caagcggtct taggaa 36
<210> 91
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 91
ggtacagcta agtcggaggc caagcggtct taggaa 36
<210> 92
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 92
gctcatcaga agtcggaggc caagcggtct taggaa 36
<210> 93
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 93
cctgttagca agtcggaggc caagcggtct taggaa 36
<210> 94
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 94
cgttggtcaa agtcggaggc caagcggtct taggaa 36
<210> 95
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 95
ggtaccgata agtcggaggc caagcggtct taggaa 36
<210> 96
<211> 36
<212> DNA
<213> Artificial Synthesis (artificial Synthesis)
<400> 96
gctcaactga agtcggaggc caagcggtct taggaa 36
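
The entries in the listing above share a constant 3' portion and differ only in a short 5' portion. The sketch below separates the two for four entries copied from the listing (SEQ ID NOs 81, 82, 85 and 86). The helper names are illustrative, not from the patent. Reading the variable portion as the complement of base S followed by the antisense molecular tag is an assumption; the listing itself does not label these regions.

```python
import os

# <400> lines copied verbatim from the listing above (SEQ ID NOs 81, 82, 85, 86).
RAW_LINES = {
    81: "cctctatgaa gtcggaggcc aagcggtctt aggaa 35",
    82: "cctcgagtaa gtcggaggcc aagcggtctt aggaa 35",
    85: "cctgctgata agtcggaggc caagcggtct taggaa 36",
    86: "cgttagctga agtcggaggc caagcggtct taggaa 36",
}


def parse_listing_line(line):
    """Join the 10-base blocks and check the result against the trailing length."""
    *blocks, declared_length = line.split()
    seq = "".join(blocks)
    if len(seq) != int(declared_length):
        raise ValueError(f"expected {declared_length} bases, got {len(seq)}")
    return seq


def shared_suffix(seqs):
    """Longest suffix common to all sequences (common prefix of the reversed strings)."""
    return os.path.commonprefix([s[::-1] for s in seqs])[::-1]


sequences = {seq_id: parse_listing_line(line) for seq_id, line in RAW_LINES.items()}
constant_region = shared_suffix(list(sequences.values()))
variable_regions = {
    seq_id: seq[: len(seq) - len(constant_region)] for seq_id, seq in sequences.items()
}

print("shared 3' region:", constant_region)
for seq_id, prefix in variable_regions.items():
    print(f"SEQ ID NO {seq_id}: variable 5' region {prefix} ({len(prefix)} nt)")
```

The split is purely computational and depends on which entries are compared: a subset whose variable portions happen to end in the same base would yield a longer shared suffix.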

Claims (16)

1. A specific molecular tag UMI set, comprising: a first specific molecular tag UMI with a sequence length of 5 bp, a second specific molecular tag UMI with a sequence length of 6 bp, a third specific molecular tag UMI with a sequence length of 7 bp, and a fourth specific molecular tag UMI with a sequence length of 8 bp.
2. The set of specific molecular tags UMI according to claim 1, wherein the sense molecular tag sequences and the antisense molecular tag sequences of the set are as shown in Table 2.
3. The set of specific molecular tags UMI according to claim 1 or 2, wherein each molecular tag is present in the same molar amount.
4. Use of the set of specific molecular tags UMI according to any one of claims 1 to 3 in sequencing and pooling.
5. A hybrid specific molecular tag linker, wherein the sense molecular tag and the antisense molecular tag of the hybrid specific molecular tag linker are both the specific molecular tag UMI set according to any one of claims 1 to 3.
6. The hybrid specific molecular tag linker as claimed in claim 5, wherein the first specific molecular tag UMI, the second specific molecular tag UMI, the third specific molecular tag UMI and the fourth specific molecular tag UMI are present in the same number of moles in the hybrid specific molecular tag linker.
7. The hybrid specific molecular tag linker according to claim 5 or 6, comprising two partially complementary linker oligonucleotide strands:
linker oligonucleotide strand 1 comprises, in order from the 5' end to the 3' end, an index2 primer binding region, a sense linker complementary region, a sense molecular tag, at least one base S serving a base-balancing function, and one protruding fixed base T;
linker oligonucleotide strand 2 comprises, in order from the 5' end to the 3' end, a base complementary to the base S of linker oligonucleotide strand 1, an antisense molecular tag reverse-complementary to the sense molecular tag of linker oligonucleotide strand 1, an antisense linker complementary region reverse-complementary to the sense linker complementary region of linker oligonucleotide strand 1, and an index1 primer binding region.
8. The hybrid specific molecular tag linker as claimed in claim 7, wherein
in linker oligonucleotide strand 1, the index2 primer binding region is 15-42 bp long, the sense linker complementary region is 8-10 bp long, and the sense molecular tag is 5-8 bp long; the sense linker complementary region overlaps, wholly or partially, with the 3'-terminal sequence of the index2 primer binding region;
preferably, in linker oligonucleotide strand 2, the antisense molecular tag is 5-8 bp long, the antisense linker complementary region is 8-10 bp long, and the index1 primer binding region is 15-30 bp long; the antisense linker complementary region overlaps, wholly or partially, with the 5'-terminal sequence of the index1 primer binding region.
9. The mixed specific molecular tag linker according to any one of claims 5 to 8, wherein in linker oligonucleotide strand 1, the number of each of the four bases A/T/G/C in the sequence consisting of the sense molecular tag, the balancing base S and the fixed base T is between 6.25% and 43.75% of the total number of bases in the sense molecular tag;
preferably, in linker oligonucleotide strand 2, the number of each of the four bases A/T/G/C in the sequence consisting of the antisense molecular tag and the complement of the balancing base S is between 6.25% and 43.75% of the total number of bases in the antisense molecular tag.
10. The mixed specific molecular tag linker according to any one of claims 5 to 9, wherein the 5' end of the linker oligonucleotide strand 2 is modified by phosphorylation; preferably, the hybrid specific molecular tag linker is a Y-type hybrid specific molecular tag linker.
11. The hybrid specific molecular tag linker according to any one of claims 5-10, wherein the index1 sequence and the index2 sequence are selected from Table 1.
12. A double-index library construct prepared using the specific molecular tag UMI set of any one of claims 1-3 and/or the hybrid specific molecular tag linker of any one of claims 5-11.
13. A double-index library construction system that uses the specific molecular tag UMI set of any one of claims 1-3 and/or the hybrid specific molecular tag linker of any one of claims 5-11.
14. A library construction method comprising using the specific molecular tag UMI set of any one of claims 1-3 and/or the mixed specific molecular tag linker of any one of claims 5-11.
15. A sequencing method comprising using the library construct of claim 12.
16. The sequencing method of claim 15, wherein the sequencing platform is Gene+Seq 2000, Gene+Seq200 or DNBSEQ-T7.
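
The base-balance condition in claim 9 can be illustrated with a short check. The sketch below follows one possible reading of the claim, which is an assumption because the claim wording is ambiguous: for an equimolar pool of tag sequences, the share of each base at every read cycle should fall between 6.25% and 43.75%. The function names and the toy pools are hypothetical and are not the sequences of Table 2.

```python
from collections import Counter

# Bounds quoted in claim 9 (6.25% and 43.75%), expressed as fractions.
LOW, HIGH = 0.0625, 0.4375


def per_cycle_fractions(pool):
    """For each read cycle, return the fraction of pool members showing A, C, G and T.

    Assumes an equimolar pool (each sequence counts once) and only evaluates
    cycles covered by every sequence, i.e. up to the shortest sequence length.
    """
    if not pool:
        raise ValueError("pool must contain at least one sequence")
    n_cycles = min(len(seq) for seq in pool)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(seq[cycle].upper() for seq in pool)
        fractions.append({base: counts.get(base, 0) / len(pool) for base in "ACGT"})
    return fractions


def is_balanced(pool, low=LOW, high=HIGH):
    """True if every base stays within [low, high] at every evaluated cycle."""
    return all(
        low <= fraction <= high
        for cycle in per_cycle_fractions(pool)
        for fraction in cycle.values()
    )


# Hypothetical toy pools, for illustration only.
balanced_pool = ["ACGT", "CGTA", "GTAC", "TACG"]  # each base is 25% at every cycle
skewed_pool = ["AAAA", "AAAC", "AAAG", "AAAT"]    # cycles 1-3 are 100% A

print(is_balanced(balanced_pool))
print(is_balanced(skewed_pool))
```

Keeping the bounds as parameters makes it easy to test a different reading of the claim, for example counting bases within a single tag-plus-S-plus-T sequence rather than across the pool.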
CN202011061421.7A 2020-09-30 2020-09-30 Specific molecular label UMI group, mixed specific molecular label joint and application Pending CN114317528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011061421.7A CN114317528A (en) 2020-09-30 2020-09-30 Specific molecular label UMI group, mixed specific molecular label joint and application

Publications (1)

Publication Number Publication Date
CN114317528A true CN114317528A (en) 2022-04-12

Family

ID=81011321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011061421.7A Pending CN114317528A (en) 2020-09-30 2020-09-30 Specific molecular label UMI group, mixed specific molecular label joint and application

Country Status (1)

Country Link
CN (1) CN114317528A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107475352A (en) * 2016-06-08 2017-12-15 深圳华大基因股份有限公司 A kind of universal PCR amplification fusion primers of the sequenator for being used to be sequenced in synthesis
CN107829146A (en) * 2017-11-29 2018-03-23 广州赛哲生物科技股份有限公司 Primer group for constructing 16SrRNA gene amplicon sequencing library and construction method
US20180245072A1 (en) * 2015-11-11 2018-08-30 Resolution Bioscience, Inc. High efficiency construction of dna libraries
CN109439729A (en) * 2018-12-27 2019-03-08 上海鲸舟基因科技有限公司 Detect connector, connector mixture and the correlation method of low frequency variation
CN110257480A (en) * 2019-07-04 2019-09-20 北京京诺玛特科技有限公司 Nucleic acid sequence sequence measuring joints and its method for constructing sequencing library
CN111349699A (en) * 2018-12-24 2020-06-30 深圳华大智造科技有限公司 Kit and method for detecting BRCA gene mutation from cervical secretions
WO2020136440A2 (en) * 2018-12-28 2020-07-02 National University Of Singapore Methods for targeted complementary dna enrichment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination