CN112466405B - Method for preparing molecular tag library for sequencing

Info

Publication number: CN112466405B
Application number: CN202011540460.5A
Authority: CN
Inventors: 罗俊峰; 陈曦; 张稀; 徐雪; 汪进平
Original assignee: Carrier Gene Technology Suzhou Co ltd
Current assignee: Carrier Gene Technology Suzhou Co ltd
Priority date: 2020-12-23
Filing date: 2020-12-23
Publication date: 2021-06-22
Anticipated expiration: 2040-12-23
Also published as: CN112466405A

Abstract

The invention discloses a method for preparing a molecular tag library for sequencing. Each molecular tag is built by joining 7-base sequence units (B7 sequences) in series in a defined pattern. Within a B7 sequence, the rightmost 3 bases are check codes computed from the leftmost 4 bases by an encoding formula, so that an error in any single base of the B7 sequence can be corrected back to the valid coded sequence by a decoding and correction formula. The library construction method provides a sufficient number of molecular tag species, keeps the tag sequences known and controllable, and makes them checkable and correctable, which helps improve the accuracy of sequencing results and the accurate, specific identification of target molecules in a sample.

Description

Method for preparing molecular tag library for sequencing

Technical Field

The invention belongs to the technical field of biotechnology detection, and particularly relates to a method for preparing a molecular tag library for sequencing.

Background

When detecting DNA fragment molecules, it is sometimes necessary to know not only the sequence of each fragment but also the number of original fragment molecules. Amplification, however, produces large numbers of identical fragments: PCR multiplies the hundreds to tens of thousands of original target molecules by 2 raised to a power of several tens. This obliterates the information on the number of original molecules and also introduces amplification and sequencing errors that cannot be identified or corrected. To recover the original sequence and molecule count more accurately, researchers attach molecular tags to the original DNA fragment molecules and use the uniqueness of the tags to resolve the sequence and number of the original molecules.

In most cases a molecular tag consists of several random positions N (N = A/C/T/G) or H (H = A/T/C) incorporated during synthesis. A library of 12 H positions, for example, yields 3^12 = 531,441 sequences. Such a library is simple to obtain and offers enough possible tags, but the tags are not under the designer's control. The sequences AAA AAA AAA AAA and AAA AAA AAA AAT can both occur, and because amplification and sequencing errors are unavoidable, it is impossible to tell whether the difference between the two tags arose from an error or was present in the library from the start. A randomly generated library also gives no control over CG content, and long runs of identical bases (such as AAA AAA AAA AAA and AAA AAA AAA AAT) can cause problems on some sequencing platforms; on the Thermo PGM sequencing platform, for example, these two sequences are difficult to resolve, so information that is actually present is lost.

An important prerequisite for using molecular tags is a sufficient number of tag species, typically tens of thousands to hundreds of thousands. Random synthesis with degenerate bases is inexpensive, but the sequences and their proportions are uncontrollable; synthesizing enough tag sequences one by one makes the sequences and proportions controllable but is very uneconomical.

Disclosure of Invention

To solve the above technical problems, the invention discloses a method for preparing a molecular tag library composed of non-random sequences. Each molecular tag is built by joining 7-base sequence units (B7 sequences) in series in a defined pattern. The rightmost 3 bases of a B7 sequence are check codes computed from the leftmost 4 bases by an encoding formula, so that an error in any single base of the B7 sequence can be corrected back to the valid coded sequence by the decoding and correction formula.

The first object of the present invention is to provide a method for preparing a molecular tag library for sequencing, comprising the steps of:

S1, designing a molecular tag B7 sequence, wherein the B7 sequence is designed as follows:

the 7-base sequence of the B7 sequence is defined as (a b c d x y z), wherein

a, b, c and d are information bits and represent the digit sequence converted from a randomly generated 4-base sequence composed of the bases A, T, G and C, the bases being converted into digits as follows: A is 1, T is 2, G is 3, C is 4;

x, y and z are check bits and are obtained by converting a, b, c and d according to the following formula:

wherein floor is a down-rounding function;
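
The check-digit formula itself does not survive in this text. As a hedged reconstruction only, inferred from the decoding conditions stated further on (each of a+b+d+x, b+c+d+y and a+c+d+z must leave remainder 2 on division by 4, and corrected digits are computed with expressions of the form 14 - ... - 4*floor(...)), check digits consistent with that decoder would be:

```latex
\begin{aligned}
x &= 14 - a - b - d - 4\left\lfloor \frac{14 - a - b - d - 1}{4} \right\rfloor,\\
y &= 14 - b - c - d - 4\left\lfloor \frac{14 - b - c - d - 1}{4} \right\rfloor,\\
z &= 14 - a - c - d - 4\left\lfloor \frac{14 - a - c - d - 1}{4} \right\rfloor.
\end{aligned}
```

Under this assumption, the information digits 2231 (TTGA) give x = 1, y = 4 and z = 4, so that a+b+d+x = 6, b+c+d+y = 10 and a+c+d+z = 10 each leave remainder 2 modulo 4. The original formula may be written differently.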

S2, combining a plurality of different molecular tag sequences with the specific sequence to obtain specific sequences carrying a molecular tag library; each molecular tag sequence consists of n (E2+B7+F2) units, wherein E2 is 0-5 bases, F2 is 0-5 bases, and n is any integer from 1 to 20.

Further, the CG% of the n (E2+B7+F2) units is 35%-75%.

Further, the ratio of the number of molecular tag sequences in the molecular tag library for sequencing to the number of target molecules is greater than 10:1. A ratio greater than 10:1 satisfies the Poisson distribution requirement, ensuring that each target molecule has a greater than 95% probability of carrying a unique molecular tag sequence.
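
One possible reading of the 95% figure can be sketched numerically. The sketch below assumes that tags attach to target molecules independently and uniformly, so that the number of molecules per tag is approximately Poisson with mean (targets / tags); the function name and the example counts are illustrative and do not come from the patent.

```python
import math


def fraction_of_used_tags_with_one_molecule(tags: int, targets: int) -> float:
    """P(exactly one molecule | at least one molecule) for a single tag.

    ASSUMPTION: molecules per tag follow a Poisson distribution with
    mean lam = targets / tags (independent, uniform tag attachment).
    """
    lam = targets / tags
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return p_exactly_one / p_at_least_one


# Tag-to-target ratio of exactly 10:1 (illustrative counts).
ratio_10_to_1 = fraction_of_used_tags_with_one_molecule(tags=10_000, targets=1_000)
print(f"{ratio_10_to_1:.3f}")
```

Under these assumptions the value at a 10:1 ratio is just above 0.95, and it rises as the ratio increases.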

Further, the step of combining the molecular tag sequence with the specific sequence specifically comprises the following steps:

S01, dividing the synthesized specific sequence into separate portions, and synthesizing each sequence of the n = 1 unit, one sequence per portion, onto the specific sequence;

S02, mixing the sequences synthesized in S01, dividing the mixture into separate portions again, synthesizing each sequence of the n = 2 unit one by one onto the portions, and repeating in the same way until specific sequences carrying the molecular tag library have been synthesized in the number of molecular tag sequences required.

Further, the specific sequence is a PCR amplification primer, a hybridization probe, an isothermal extension primer or a ligation primer.

Further, the molecular tag B7 sequence is any one of the following sequences:

The second object of the invention is to provide an error correction method for a molecular tag library, comprising the following steps:

S001, setting temporary values temp1, temp2 and temp3: temp1 = a + b + d + x; temp2 = b + c + d + y; temp3 = a + c + d + z;

S002, evaluating from the values of temp1, temp2 and temp3 whether any of the information bits a, b, c and d is in error;

S003, if an error has occurred, completing the self-check, replacing the erroneous information bit with the correct one, converting the digits back into a base sequence, and outputting it.

Further, the specific steps in S002 for evaluating from the values of temp1, temp2 and temp3 whether the information bits a, b, c and d are in error are as follows:

if temp1 - 4*floor(temp1/4) is not equal to 2 and temp2 - 4*floor(temp2/4) is not equal to 2, an error has occurred at position b; the correct value of b is then calculated:

b1 = 14-a-d-x-4*floor((14-a-d-x-1)/4), b = b1

b is replaced by b1;

and b2 = 14-c-d-y-4*floor((14-c-d-y-1)/4) is set;

if b1 ≠ b2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current B7 sequence exits;

if temp2 - 4*floor(temp2/4) is not equal to 2 and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position c; the correct value of c is then calculated:

c1 = 14-a-d-z-4*floor((14-a-d-z-1)/4), c = c1

c is replaced by c1;

and c2 = 14-b-d-y-4*floor((14-b-d-y-1)/4) is set;

if c1 ≠ c2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current B7 sequence exits;

if temp1 - 4*floor(temp1/4) is not equal to 2 and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position a; the correct value of a is then calculated:

a1 = 14-b-d-x-4*floor((14-b-d-x-1)/4), a = a1

a is replaced by a1;

and a2 = 14-c-d-z-4*floor((14-c-d-z-1)/4) is set;

if a1 ≠ a2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current B7 sequence exits;

if temp1 - 4*floor(temp1/4) is not equal to 2, temp2 - 4*floor(temp2/4) is not equal to 2, and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position d; the correct value of d is then calculated:

d1 = 14-a-b-x-4*floor((14-a-b-x-1)/4), d = d1

d is replaced by d1;

and d2 = 14-b-c-y-4*floor((14-b-c-y-1)/4) is set;

d3 = 14-a-c-z-4*floor((14-a-c-z-1)/4);

if d1, d2 and d3 are not all equal, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current B7 sequence exits;

where floor is the round-down (floor) function.
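
As a worked illustration of these rules, consider a hypothetical read whose digits are (a, b, c, d, x, y, z) = (2, 3, 3, 1, 1, 4, 4). The values are chosen for this sketch and are not taken from the patent's tables.

```latex
\begin{aligned}
\mathrm{temp1} &= 2+3+1+1 = 7,  & 7 - 4\lfloor 7/4 \rfloor &= 3 \neq 2\\
\mathrm{temp2} &= 3+3+1+4 = 11, & 11 - 4\lfloor 11/4 \rfloor &= 3 \neq 2\\
\mathrm{temp3} &= 2+3+1+4 = 10, & 10 - 4\lfloor 10/4 \rfloor &= 2\\
b_1 &= 14-2-1-1-4\lfloor (14-2-1-1-1)/4 \rfloor = 10 - 8 = 2\\
b_2 &= 14-3-1-4-4\lfloor (14-3-1-4-1)/4 \rfloor = 6 - 4 = 2
\end{aligned}
```

Here temp1 and temp2 fail the test while temp3 passes, which points to position b. Since b1 = b2 = 2, b is corrected from 3 (G) to 2 (T), and the information digits become 2231 (TTGA).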

With the above scheme, the invention has at least the following advantages:

The method for constructing the molecular tag library provides a sufficient number of molecular tag species, keeps the tag sequences known and controllable, and makes them checkable and correctable, which helps improve the accuracy of sequencing results and the accurate, specific identification of target molecules in a sample.

The foregoing is only a summary of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, preferred embodiments of the invention are described in detail below.

Detailed Description

Example 1: Encoding scheme for the B7 sequence

Three check bits are appended to the right end of the 4 information bases to form a B7 sequence. The check bits are obtained by the algorithm below, so that when any one position in the B7 sequence is in error, the correct sequence can be recovered.

1) First, randomly generate a 4-base sequence composed of A, T, G and C, or specify a particular 4-base sequence, for example TTGA;

2) convert the 4-base sequence from letters to digits: a base A or a becomes 1; T or t becomes 2; G or g becomes 3; C or c becomes 4. For example, the 4-base sequence TTGA converts to the digit sequence 2231;

3) define the resulting 4-digit sequence as abcd (for the digit sequence 2231, a is 2, b is 2, c is 3 and d is 1), and obtain the 3 check digits in turn using the following conversion formula;

wherein floor is the round-down function in Matlab;

4) append the check digits to the end of abcd to obtain the digitized B7 sequence a b c d x y z, and then convert it back into a B7 letter sequence;

5) after the B7 letter sequence is obtained, examine its GC content and its degree of internal repetition. A sequence is output only if its GC content is greater than 0.2 and less than 0.8, and a sequence whose internal repetition is too high is not output.
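
Steps 1) to 5) can be sketched in Python as follows. This is a minimal illustration, not the Matlab implementation referred to in this example. Two points are assumptions: the check-digit rule in `check_digit` is inferred from the decoding conditions in this document (each check sum leaves remainder 2 modulo 4), and the repetition filter is interpreted as a cap on the longest run of identical bases, with a hypothetical threshold `max_run`.

```python
from itertools import product

BASE_TO_DIGIT = {"A": 1, "T": 2, "G": 3, "C": 4}
DIGIT_TO_BASE = {digit: base for base, digit in BASE_TO_DIGIT.items()}


def check_digit(p: int, q: int, r: int) -> int:
    """Digit in 1..4 such that p + q + r + digit leaves remainder 2 mod 4.

    ASSUMPTION: inferred from the decoding rules, not the original formula.
    """
    s = 14 - p - q - r
    return s - 4 * ((s - 1) // 4)


def encode_b7(info: str) -> str:
    """Append three check bases to a 4-base information sequence."""
    a, b, c, d = (BASE_TO_DIGIT[ch] for ch in info.upper())
    x = check_digit(a, b, d)
    y = check_digit(b, c, d)
    z = check_digit(a, c, d)
    return info.upper() + "".join(DIGIT_TO_BASE[v] for v in (x, y, z))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return sum(ch in "GC" for ch in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def passes_filters(seq: str, max_run: int = 3) -> bool:
    """GC content strictly between 0.2 and 0.8; max_run is hypothetical."""
    return 0.2 < gc_fraction(seq) < 0.8 and longest_run(seq) <= max_run


# Encode every possible 4-base information word, then apply the filters.
candidates = [encode_b7("".join(p)) for p in product("ATGC", repeat=4)]
library = [seq for seq in candidates if passes_filters(seq)]
```

With this rule, `encode_b7("TTGA")` returns `"TTGAACC"`. The number of sequences that survive filtering depends on the repetition threshold chosen, so it need not match the count in Table 1.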

The following is an implementation of the B7 sequence in Matlab:

In practice this yields a set of encoded B7 sequences, for example the 240 sequences in Table 1.

Table 1: 240 self-error-correcting sequences

Example 2: decoding and error correction scheme for B7 sequences

1) This example decodes B7 sequences. The input DNA sequence length must be an integer multiple of 7, and each 7-base sequence is defined as (a b c d x y z), where a, b, c and d are information bits and x, y and z are check bits;

2) convert the base sequence into a digit sequence according to A → 1, T → 2, G → 3, C → 4;

3) calculate the temporary values temp1, temp2 and temp3: temp1 = a + b + d + x; temp2 = b + c + d + y; temp3 = a + c + d + z;

4) evaluate from the values of temp1, temp2 and temp3 whether each of the information bits a, b, c and d is in error:

if temp1 - 4*floor(temp1/4) is not equal to 2 and temp2 - 4*floor(temp2/4) is not equal to 2, an error has occurred at position b; the correct value of b is then calculated:

b1 = 14-a-d-x-4*floor((14-a-d-x-1)/4), b = b1

b is replaced by b1. A second formula is also evaluated:

b2 = 14-c-d-y-4*floor((14-c-d-y-1)/4)

if b1 is not equal to b2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current 7-base sequence exits;

if temp2 - 4*floor(temp2/4) is not equal to 2 and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position c; the correct value of c is then calculated:

c1 = 14-a-d-z-4*floor((14-a-d-z-1)/4), c = c1

c is replaced by c1. A second formula is also evaluated:

c2 = 14-b-d-y-4*floor((14-b-d-y-1)/4)

if c1 is not equal to c2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current 7-base sequence exits;

if temp1 - 4*floor(temp1/4) is not equal to 2 and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position a; the correct value of a is then calculated:

a1 = 14-b-d-x-4*floor((14-b-d-x-1)/4), a = a1

a is replaced by a1. A second formula is also evaluated:

a2 = 14-c-d-z-4*floor((14-c-d-z-1)/4)

if a1 is not equal to a2, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current 7-base sequence exits;

if temp1 - 4*floor(temp1/4) is not equal to 2, temp2 - 4*floor(temp2/4) is not equal to 2, and temp3 - 4*floor(temp3/4) is not equal to 2, an error has occurred at position d; the correct value of d is then calculated:

d1 = 14-a-b-x-4*floor((14-a-b-x-1)/4), d = d1

d is replaced by d1. Two further formulas are also evaluated:

d2 = 14-b-c-y-4*floor((14-b-c-y-1)/4);

d3 = 14-a-c-z-4*floor((14-a-c-z-1)/4);

if d1, d2 and d3 are not all equal, two or more information bits are in error, the decoded output cannot be completed, and the correction process for the current 7-base sequence exits;

wherein floor is the round-down function in Matlab;

5) if an error has occurred, the self-check is completed, the erroneous information bit is replaced with the correct one, and the digits are converted back into a base sequence for output.
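
The decoding steps above can be sketched in Python as follows. This is an illustrative sketch, not the Matlab implementation referred to in this example. Two choices are assumptions of the sketch: the three-way test for position d is evaluated before the pairwise tests (as worded, the pairwise conditions would also hold when all three sums fail), and a block in which exactly one sum fails is treated as having an error in a check base only, so its information bases are returned unchanged. The names `solve`, `decode_b7` and `decode_tag` are hypothetical.

```python
BASE_TO_DIGIT = {"A": 1, "T": 2, "G": 3, "C": 4}
DIGIT_TO_BASE = {digit: base for base, digit in BASE_TO_DIGIT.items()}


def solve(p: int, q: int, r: int) -> int:
    """Digit in 1..4 making p + q + r + digit leave remainder 2 mod 4.

    Same expression as 14 - p - q - r - 4*floor((14 - p - q - r - 1)/4).
    """
    s = 14 - p - q - r
    return s - 4 * ((s - 1) // 4)


def decode_b7(block: str):
    """Return the 4 information bases of one 7-base block.

    Returns None when the recomputed values disagree, which the text
    interprets as two or more errors.
    """
    a, b, c, d, x, y, z = (BASE_TO_DIGIT[ch] for ch in block.upper())
    # temp % 4 equals temp - 4*floor(temp/4) for non-negative temp.
    bad1 = (a + b + d + x) % 4 != 2
    bad2 = (b + c + d + y) % 4 != 2
    bad3 = (a + c + d + z) % 4 != 2

    if bad1 and bad2 and bad3:      # error at d (tested first)
        values = {solve(a, b, x), solve(b, c, y), solve(a, c, z)}
        if len(values) != 1:
            return None
        d = values.pop()
    elif bad1 and bad2:             # error at b
        values = {solve(a, d, x), solve(c, d, y)}
        if len(values) != 1:
            return None
        b = values.pop()
    elif bad2 and bad3:             # error at c
        values = {solve(a, d, z), solve(b, d, y)}
        if len(values) != 1:
            return None
        c = values.pop()
    elif bad1 and bad3:             # error at a
        values = {solve(b, d, x), solve(c, d, z)}
        if len(values) != 1:
            return None
        a = values.pop()
    # Zero or one failing sum: information bases are kept as read.
    return "".join(DIGIT_TO_BASE[v] for v in (a, b, c, d))


def decode_tag(seq: str):
    """Decode a tag whose length is an integer multiple of 7, block by block."""
    if len(seq) % 7 != 0:
        raise ValueError("sequence length must be a multiple of 7")
    return [decode_b7(seq[i:i + 7]) for i in range(0, len(seq), 7)]
```

For example, `decode_b7("TGGAACC")` recovers `"TTGA"` after correcting position b. Not every double error is detected by these checks; some combinations can still produce a plausible but wrong output.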

The following is an implementation of the decoding and correction process of the B7 sequence in Matlab:

Example 3: Preparing (E2+B7+F2)n molecular tags by synthesis

The preparation procedure, taking n = 4 as an example, is as follows:

1. Preparation of specific primers carrying 331,776 molecular tags

a) Synthesize a sufficient quantity of the required specific sequence FP, e.g. 5'-GGACCCCCACACAGCAAA-3', and divide it into 24 portions of equal molecule number;

b) determine the E2+B7+F2 sequences for round n = 1, e.g. the following 24 sequences (5'-3'). These 24 sequences are synthesized one by one, each onto one portion of the specific sequence; for example, the 1st sequence ACaagggaaAC in the table below is synthesized onto the 1st portion of the specific sequence FP, and so on. After synthesis, the products are mixed and divided again into 24 portions of equal molecule number in preparation for round n = 2;

c) determine the sequences for round n = 2, e.g. the following 24 sequences (5'-3'), and synthesize them one by one, each onto one portion of the round n = 1 mixture; for example, the 1st sequence ACataattcAC in the table below is synthesized onto the first portion of the round n = 1 mixture. Synthesis yields 24 round n = 2 products, which are mixed in equal molecule numbers and divided again into 24 portions in preparation for round n = 3;

No.	Sequence	No.	Sequence	No.	Sequence	No.	Sequence
1	ACataattcAC	7	ACcaagtgtAC	13	ACgaatctcAC	19	ACtaatataAC
2	ACaatttaaAC	8	ACcagcctgAC	14	ACgacctcgAC	20	ACtaccgctAC
3	ACactaagtAC	9	ACccgggccAC	15	ACgcctataAC	21	ACtcactcgAC
4	ACacggtcaAC	10	ACcgcattcAC	16	ACggaccttAC	22	ACtccgagaAC
5	ACagcagtaAC	11	ACccattggAC	17	ACgtaggcaAC	23	ACtgtaccaAC
6	ACatctacgAC	12	ACcttgataAC	18	ACgcaatagAC	24	ACttagtccAC

d) determine the sequences for round n = 3, e.g. the following 24 sequences (5'-3'), and synthesize them one by one onto the portions of the round n = 2 mixture in the same way as in the previous round. Synthesis yields 24 round n = 3 products, which are mixed in equal molecule numbers and divided again into 24 portions in preparation for round n = 4;

No.	Sequence	No.	Sequence	No.	Sequence	No.	Sequence
1	ACatataatAC	7	ACcataatgAC	13	ACgaactatAC	19	ACtaacgagAC
2	ACaatcaggAC	8	ACcaggtacAC	14	ACgacgatcAC	20	ACtacgctaAC
3	ACacttgcaAC	9	ACcgaatgaAC	15	ACgcccgagAC	21	ACtcagatcAC
4	ACagaaggcAC	10	ACcgctaatAC	16	ACggagtaaAC	22	ACtcgagtcAC
5	ACagctcagAC	11	ACccaggttAC	17	ACgttacacAC	23	ACtgtcaacAC
6	ACatccgtcAC	12	ACctcaggcAC	18	ACgcatagcAC	24	ACttcaaggAC

e) determine the sequences for round n = 4, e.g. the following 24 sequences (5'-3'), and synthesize them one by one onto the portions of the round n = 3 mixture in the same way as in the previous round. Synthesis yields 24 round n = 4 products, which are mixed in equal molecule numbers in preparation for synthesis of the universal sequence;

No.	Sequence	No.	Sequence	No.	Sequence	No.	Sequence
1	ACatacggaAC	7	ACcattgacAC	13	ACgaagagaAC	19	ACtaagcgcAC
2	ACaatggccAC	8	ACctaagtaAC	14	ACgagaggaAC	20	ACtagatgcAC
3	ACactcctgAC	9	ACcgatacgAC	15	ACgccgcgcAC	21	ACtctaggaAC
4	ACagatcctAC	10	ACcgccggaAC	16	ACggtaaccAC	22	ACtcgtcatAC
5	ACagcctgcAC	11	ACcctacggAC	17	ACgttcacaAC	23	ACtgcactgAC
6	ACatcgcatAC	12	ACctctcctAC	18	ACgcacgctAC	24	ACttctgccAC

f) onto the round n = 4 mixture, the universal sequence tgt aaa acg acg gcc agt aca is further synthesized, giving a mixture of (E2+B7+F2)4 molecular tags carrying the specific primer and the universal sequence. The mixture contains 24 × 24 × 24 × 24 = 331,776 molecular tags whose sequences are known, which are present in a 1:1 ratio, and which are self-correcting. The final tagged FP sequence is: 5'-tgtaaaacgacggccagtaca (N44) GGACCCCCACACAGCAAA-3';

g) increasing n gives longer molecular tag sequences and more molecular tags; for example, with n = 5 the number of molecular tags is 24 × 24 × 24 × 24 × 24 = 7,962,624. Alternatively, n can be kept at 4 and the number of (E2+B7+F2) species per round increased, for example to 36, giving 36 × 36 × 36 × 36 = 1,679,616.
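
The tag counts in f) and g) follow from the split-and-pool scheme: each round multiplies the number of distinct tags by the number of sequences used in that round. A small Python sketch of the arithmetic is given below. The helper names are hypothetical, and the concatenation order in `combine` is only illustrative; it makes no claim about the 5'-3' orientation of the rounds in the final oligonucleotide.

```python
from itertools import product


def tag_count(per_round: int, rounds: int) -> int:
    """Number of distinct tags after `rounds` split-and-pool rounds."""
    return per_round ** rounds


def combine(round_sets):
    """Yield every concatenation of one unit per round (illustrative order)."""
    for units in product(*round_sets):
        yield "".join(units)


# Toy example using the first two units of the round n = 2 and n = 3 tables.
toy_rounds = [
    ["ACataattcAC", "ACaatttaaAC"],
    ["ACatataatAC", "ACaatcaggAC"],
]
toy_tags = list(combine(toy_rounds))

print(tag_count(24, 4), tag_count(24, 5), tag_count(36, 4))
```

The toy example produces 2 × 2 = 4 tags of 22 bases each; with 24 units in each of 4 rounds the same enumeration would give the 331,776 tags stated in f).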

2. Synthesize the specific sequence RP, 5'-AAG TTA AAA TTC CCG TCG CTA TCA A-3', and the UNITag sequence, 5'-tgt aaa acg acg gcc agt aca-3'. Mix the FP sequence carrying the 331,776 molecular tags synthesized above (UMI-FP) with the RP sequence according to the following system and perform PCR amplification;

a) Preparation of the 5× Oligo Mix

Component	Primer concentration (μM)	Volume (μL)
UMI-FP	100	20
RP	100	20
0.1× TE		Make up to 1000
Total		1000

b) Preparation of the PCR system

Reagent	Amount
5× Oligo Mix	6 μL
2× Taq Master Mix	15 μL
Ultrasonically fragmented genomic DNA template	10 ng / 30 ng / 100 ng (3 DNA inputs, each in triplicate)
Nuclease-free water	Make up to 30 μL

c) UMI-PCR amplification procedure

After PCR, 1 unit of exonuclease I was added to each reaction, followed by incubation at 37 ℃ for 30 minutes and inactivation at 80 ℃ for 30 minutes. A further 2 μL of 10 μM RP and 2 μL of 10 μM UNITag were then added for the subsequent PCR amplification procedure.

d) PCR amplification procedure

e) The nine amplification products (10 ng, 30 ng and 100 ng inputs, three replicates each) were made into libraries with a commercial Illumina library construction kit and sequenced, and the diversity of the molecular tags was then analyzed. A molecular tag was counted only if it was supported by more than 6 reads. The statistics are shown in the following table. According to the analysis, the molecular tag library prepared in this example rescues or corrects about 10% of the effective data on average, which is a marked effect.

Example 4: Constructing an (E2+B7+F2)n molecular tag library by ligation

1. Thirty (E2+B7+F2)2 sequences with a CG% of 50% were selected, with E2 and F2 each 0 bases, as shown in the following table:

ID	Seq	ID	Seq
HMB401	aacggttaagacgg	HMB416	agtctagatccgtc
HMB402	aacttggaatggcc	HMB417	atacggaatggcga
HMB403	aagggaaacaccca	HMB418	atcgcatattcgcg
HMB404	aagttccaccgtgt	HMB419	atctacgcaaccac
HMB405	aatcaggacctgtg	HMB420	atgatcgcacgttg
HMB406	acagttgacggtca	HMB421	atgcgatcactggt
HMB407	acatggtacgtgac	HMB422	attgctccaggtac
HMB408	acgaatgactcctg	HMB423	caagtgtcagtgca
HMB409	actgtacagaaggc	HMB424	caatgtgcatccgt
HMB410	acttgcaagcgact	HMB425	cacaaacccaggtt
HMB411	agagaagagctcag	HMB426	cagaagtccattgg
HMB412	agatcctaggagag	HMB427	catgtcaccgactt
HMB413	agcagtaaggctct	HMB428	cattgaccctggaa
HMB414	agggataagtgagc	HMB429	ccaacaacctttcc
HMB415	agtagctatagccg	HMB430	ccgttaacgagcat

The sequences PO₃-aaccaccaccaaca + HMB# + accaacaaaccacc were synthesized, 30 in total, mixed in equal molecule numbers for later use, and designated UMIseq30.

2. 500 (E2+B7+F2)2 sequences were selected, with E2 and F2 each 0 bases, as shown in the following table.

The sequences AGACGTGTGCTCTTCCGATCTATCA + HMB# + aaccaccaccaaca were synthesized, 500 in total, mixed in equal molecule numbers for later use, and designated UMIseq500.

3. The following sequences were synthesized:

Sequence name	Sequence (5'-3')
Primer-1st stem complement	ggtggtttgttggtggtggtttgttggt
1st-2nd stem complement	tgttggtggtggtttgttggtggtggtt

The final base T at the 3' end of each sequence is a dideoxy-modified ddT.

4. The specific primer sequences in the following table were synthesized

All sequences in the table carry a PO3 phosphate group modification at the 5' end.

5. Ligation step

Each specific primer, the primer-1st stem complement, the 1st-2nd stem complement, UMIseq30 and UMIseq500 were each adjusted to a total concentration of 2 μM, mixed in a volume ratio of 1:2:2:2:2, and ligated with a commercial ligase kit following the manufacturer's recommended procedure. This gave 37 primer sequences carrying molecular tags, with 30 × 500 = 15,000 tag species for each primer. Note that, by the Poisson distribution, 15,000 molecular tags are not enough to label the roughly 3,000 molecule copies in a 10 ng input so that each of the 3,000 molecules carries a unique tag. For mutant molecules present at a low proportion, say 1%, there are only about 30 molecules, and 15,000 sequences are enough to give each of those 30 mutant molecules a unique tag. This example is therefore better suited to tumor detection or to detecting target molecules present at low proportions. Expanding the number of tag species in this example is also simple: the number of species in the 1st stem sequences and the 2nd stem sequences can be increased, for example to 40 and 1,000 respectively, giving 40 × 1,000 = 40,000 molecular tag species.
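
The contrast between 3,000 and 30 molecules can be illustrated with a short Python calculation. It assumes that each molecule receives one of the 15,000 tags independently and uniformly at random, which is a simplification of the ligation step; the function name is hypothetical.

```python
def prob_unique_tag(n_tags: int, n_molecules: int) -> float:
    """Probability that a given molecule's tag is carried by no other molecule.

    ASSUMPTION: every molecule draws its tag independently and uniformly
    from n_tags possibilities.
    """
    return (1.0 - 1.0 / n_tags) ** (n_molecules - 1)


p_3000 = prob_unique_tag(15_000, 3_000)  # all copies in a 10 ng input
p_30 = prob_unique_tag(15_000, 30)       # about 1% mutant molecules

print(f"{p_3000:.2f} {p_30:.3f}")
```

Under these assumptions roughly 82% of 3,000 molecules would carry a tag shared with no other molecule, against about 99.8% when only 30 molecules are labeled, which is consistent with the suitability for low-proportion targets described above.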

The above are only preferred embodiments of the invention and are not intended to limit it. It should be noted that those skilled in the art can make various improvements and modifications without departing from the technical principle of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A method for preparing a molecular tag library for sequencing is characterized by comprising the following steps:

a, b, c and d are information bits and represent the digit sequence converted from a randomly generated 4-base sequence composed of the bases A, T, G and C, the bases being converted into digits as follows: A is 1, T is 2, G is 3, C is 4;

wherein floor is a down-rounding function;

2. The method of claim 1, wherein the CG% of the n (E2+ B7+ F2) is between 35% and 75%.

3. The method of claim 1, wherein the ratio of the number of molecular tag sequences in the molecular tag library for sequencing to the number of target molecules is greater than 10: 1.

4. The method of claim 1, wherein the step of binding the molecular tag sequence to the specific sequence comprises the steps of:

5. The method of claim 1, wherein the specific sequence is a PCR amplification primer, a hybridization probe, an isothermal extension primer, or a ligation primer.

6. The method according to claim 1, wherein the molecular tag B7 sequence is any one of the following sequences:

7. An error correction method for a molecular tag library, characterized by comprising the following steps:

wherein a, b, c, d, x, y and z represent the 7 bases of the molecular tag B7 sequence,

wherein floor is a down-rounding function;

8. The method for correcting errors in a molecular tag library according to claim 7, wherein step S002 comprises the following steps of evaluating whether the information bits a, b, c and d are in error according to the values of temp1, temp2 and temp3:

b1 = 14-a-d-x-4*floor((14-a-d-x-1)/4), b = b1

b is replaced by b1;

and b2 = 14-c-d-y-4*floor((14-c-d-y-1)/4) is set;

c1 = 14-a-d-z-4*floor((14-a-d-z-1)/4), c = c1

c is replaced by c1;

and c2 = 14-b-d-y-4*floor((14-b-d-y-1)/4) is set;

a1 = 14-b-d-x-4*floor((14-b-d-x-1)/4), a = a1

a is replaced by a1;

and a2 = 14-c-d-z-4*floor((14-c-d-z-1)/4) is set;

d1 = 14-a-b-x-4*floor((14-a-b-x-1)/4), d = d1

d is replaced by d1;

and d2 = 14-b-c-y-4*floor((14-b-c-y-1)/4) is set;

d3 = 14-a-c-z-4*floor((14-a-c-z-1)/4);

where floor is the round-down (floor) function.