CN109517882B

CN109517882B - Quality control method for detecting unique double-end library label combination and application

Info

Publication number: CN109517882B
Application number: CN201811337895.2A
Authority: CN
Inventors: 张之宏; 罗健; 汉雨生
Original assignee: Guangzhou Burning Rock Dx Laboratory Co ltd
Current assignee: Guangzhou Burning Rock Dx Laboratory Co ltd
Priority date: 2018-11-09
Filing date: 2018-11-09
Publication date: 2021-08-17
Anticipated expiration: 2038-11-09
Also published as: CN109517882A; CN113957123A

Abstract

The invention discloses a quality control method for detecting a unique double-end library tag combination and its application, in the technical field of biological detection. The quality control method comprises the following steps: S1) constructing a gDNA library with a unique double-end library tag combination, sequencing the constructed library on the instrument, and reading the library tag sequences; S2) performing a first quality control analysis on the library tag sequences; S3) where the results of S2 require it, replacing the problematic tag raw materials, reconstructing the gDNA library with the unique double-end library tag combination by the method of step S1), sequencing the constructed library on the instrument, and reading the library tag sequences; S4) performing a second quality control analysis on the library tag sequences and, according to the result, deciding whether to continue replacing problematic library tags by the method of S3), until all library tags meet the quality control indexes. The quality control method improves the efficiency of library tag testing and better meets the requirements of accurate library sequencing.

Description

Quality control method for detecting unique double-end library label combination and application

Technical Field

The invention belongs to the technical field of biological detection, and particularly relates to a quality control method for detecting a unique double-end library label combination and application thereof.

Background

With the rapid development of high-throughput technology and the increasing throughput of sequencers, earlier approaches that distinguished sequencing libraries by physical partitioning, such as separate lanes of a flow cell, are no longer practical. Multiplex sequencing is now widely used across next-generation sequencing (NGS) applications, and the key to multiplex sequencing is the library tag (Index). A library tag is a specific sequence attached to each sample during NGS library preparation to distinguish DNA from different sources; it is generally 4 to 12 bases long. During high-throughput sequencing, libraries labeled with different known tag sequences are pooled and sequenced together, and the insert and the tag of each library fragment are read in turn and converted into bases. In the subsequent analysis, software uses the expected tag sequences to classify the reads and split them into the different samples.

In multiplex sequencing, if library sequences are misassigned, reads that do not belong to a library are classified into it. For some applications such misassignment leads to wrong analysis results. For example, when a library from a cancer patient's tissue sample is sequenced together with a library from a benign-tumor patient's tissue sample, some reads from the cancer sample may be misassigned to the benign-tumor sample; the benign-tumor patient's report may then show a malignant tumor, resulting in a misdiagnosis.

There are many reasons why library sequences may be misassigned. Common include the following: 1) cross-contamination during library preparation, 2) cross-contamination during production of tag primers, 3) cross-reaction that occurs when multiple libraries are subjected to clustering in a flow cell, and 4) optical bias due to excessive cluster density, etc.

Tag primers for second-generation sequencing libraries are usually 50 to 70 bases long and generally need to be purified to ensure the purity of the full-length primer. However, purification itself often introduces additional cross-contamination because it requires gel extraction or column chromatography. In HPLC (high performance liquid chromatography) in particular, adsorption of tag primers onto the purification column and repeated reuse of the column inevitably cause cross-contamination. Running a blank or an unrelated sample through the column between the purifications of two different tag primers reduces the carryover, but it still cannot eliminate cross-contamination completely. Empirically, 0.5% to 5% of the preceding tag primer can remain in the following tag primer even after two rounds of purification.

Because the high throughput of NGS makes it extremely sensitive, quality testing of tag primers requires methods able to detect contamination down to one part per thousand or even one part per ten thousand. Conventional methods such as qPCR lack the required sensitivity and specificity, because the tag primer sequences are very similar to one another. The conventional approach therefore still uses the NGS platform for quality inspection, but it can test at most one target tag primer per lane, which makes quality inspection costly and inefficient.

Therefore, it is necessary to design a new quality control method for unique paired-end library tag combinations to improve the detection efficiency.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, and provides a quality control method for detecting a unique double-end Index combination and application thereof, which can improve the detection efficiency of library labels and meet the requirement of accurate sequencing of libraries.

In order to achieve the purpose, the invention adopts the technical scheme that: a quality control method for detecting unique paired-end library-tag combinations, comprising the steps of:

S1) using library tag standards and a gDNA standard as raw materials, constructing a gDNA library with a unique double-end library tag combination, sequencing the constructed library on the instrument, and reading the library tag sequences;

S2) performing a first quality control analysis on the library tag sequences, where the quality control indexes are:
1) maximum single-side tag contamination ratio ≤ 2.5%;
2) maximum tag combination contamination ratio ≤ 0.01%;
3) number of sequences per tag group ≥ 5000;
4) coefficient of variation of the mixing proportions of all tag combinations ≤ 0.5;
5) overall sequence pass rate ≥ 97%;
6) proportion of sequences per tag group ≥ 0.2 / number of library tag combination pairs;
7) proportion of tags with more than 1% single-side contamination ≤ 10%;

S3) if the quality control analysis in step S2) shows that any index does not meet the requirements, re-synthesizing the library tags that failed quality control; using the newly synthesized library tags, the library tags that passed the first quality control analysis, and gDNA as raw materials, constructing a gDNA library with a unique double-end library tag combination by the method of step S1), sequencing the constructed library on the instrument again, and reading the library tag sequences;

S4) performing a second quality control analysis on the library tag sequences, repeating until all library tags meet the quality control indexes;
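The S1 to S4 loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented workflow itself: `run_qc_loop`, `build_sequence_analyze`, `resynthesize` and `max_rounds` are hypothetical names, and the wet-lab steps are represented by callables supplied by the caller.

```python
from typing import Callable, Dict, Set


def run_qc_loop(
    tags: Set[str],
    build_sequence_analyze: Callable[[Set[str]], Dict[str, bool]],
    resynthesize: Callable[[str], None],
    max_rounds: int = 5,
) -> int:
    """Repeat library construction, sequencing and QC until every tag passes.

    Returns the number of sequencing rounds used (hypothetical helper).
    """
    for round_no in range(1, max_rounds + 1):
        # S1 (first round) or S3 (later rounds): build, sequence, read tags;
        # S2/S4: score every tag against the QC indexes (True = passes)
        verdicts = build_sequence_analyze(tags)
        failing = sorted(tag for tag, ok in verdicts.items() if not ok)
        if not failing:
            return round_no
        # S3: re-synthesize only the failing tags; passing tags are reused
        for tag in failing:
            resynthesize(tag)
    raise RuntimeError(f"tags still failing after {max_rounds} rounds")
```

Only failing tags are re-synthesized, mirroring S3's use of "the library tags meeting the requirements of the first quality control analysis" alongside the new ones.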

Among the quality control parameters, a unique paired-end library tag combination consists of an upstream library tag and a downstream library tag. The upstream library tags are collectively referred to as IG5, and IG5 comprises A and B; the downstream library tags are collectively referred to as IG7, and IG7 comprises a and b. The matching (correct) unique paired-end library tag combinations are A-a and B-b; the mismatched combinations are A-b and B-a. The number of sequences for each of these combinations can be obtained by analysis after each sequencing reaction;

The single-side tag contamination ratio is the proportion of cross-contamination between tags within one group; contamination can only occur within a group, i.e., within the IG5 group and/or within the IG7 group;

when a of IG7 has no cross-contamination from production, the contamination ratio of B in A of IG5 = number of sequences containing B-a / number of all sequences containing a;

when A of IG5 has no cross-contamination from production, the contamination ratio of b in a of IG7 = number of sequences containing A-b / number of all sequences containing A;

when B contaminates A and b contaminates a, the B-b tag combination contamination ratio = (number of sequences containing B-a / number of all sequences containing a) × (number of sequences containing A-b / number of all sequences containing A);

when b of IG7 has no cross-contamination from production, the contamination ratio of A in B of IG5 = number of sequences containing A-b / number of all sequences containing b;

when B of IG5 has no cross-contamination from production, the contamination ratio of a in b of IG7 = number of sequences containing B-a / number of all sequences containing B;

when A contaminates B and a contaminates b, the A-a tag combination contamination ratio = (number of sequences containing A-b / number of all sequences containing b) × (number of sequences containing B-a / number of all sequences containing B);
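As a worked illustration of the single-side and combination formulas for the A-a/B-b example, the sketch below computes all six ratios from read counts. The function name and the `(IG5 tag, IG7 tag)` dictionary layout are assumptions made for illustration; the read counts are invented example values, not data from the patent.

```python
def two_pair_ratios(counts):
    """counts maps (IG5 tag, IG7 tag) -> filtered reads for A-a, A-b, B-a, B-b."""

    def total(tag, side):
        # side 0 = IG5 (upstream) tag, side 1 = IG7 (downstream) tag
        return sum(n for pair, n in counts.items() if pair[side] == tag)

    B_in_A = counts[("B", "a")] / total("a", 1)  # a of IG7 assumed clean
    b_in_a = counts[("A", "b")] / total("A", 0)  # A of IG5 assumed clean
    A_in_B = counts[("A", "b")] / total("b", 1)  # b of IG7 assumed clean
    a_in_b = counts[("B", "a")] / total("B", 0)  # B of IG5 assumed clean
    return {
        "B_in_A": B_in_A,
        "b_in_a": b_in_a,
        "A_in_B": A_in_B,
        "a_in_b": a_in_b,
        # A whole B-b pair appears in sample A-a only if both ends leaked
        "Bb_combination": B_in_A * b_in_a,
        "Aa_combination": A_in_B * a_in_b,
    }


counts = {("A", "a"): 9800, ("A", "b"): 100, ("B", "a"): 200, ("B", "b"): 9900}
ratios = two_pair_ratios(counts)
print(ratios["B_in_A"])  # 200 / 10000 → 0.02
```

Because the combination ratio is a product of two small single-side ratios, it is far smaller than either one, which is the tolerance argument behind unique double-end Indexes.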

The number of sequences per tag group is the number of correctly paired sequences in each group after filtering, i.e., the number of sequences containing A-a or the number of sequences containing B-b;

the coefficient of variation of the mixing proportions of all tag combinations is the coefficient of variation of the proportions obtained by dividing each group's number of filtered, correctly paired sequences by the total number of filtered, correctly paired sequences;

the overall sequence pass rate is the total number of correctly paired, valid sequences remaining after the sequencing reaction is filtered, divided by the total number of sequences remaining after filtering;

the proportion of sequences per tag group is the number of correctly paired sequences in each group after filtering divided by the total number of sequences after filtering;

the proportion of tags with more than 1% single-side contamination is: within the upstream library tags, the number of library tags with a contamination ratio greater than 1% divided by the total number of upstream library tags; and, within the downstream library tags, the number of library tags with a contamination ratio greater than 1% divided by the total number of downstream library tags.
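Once the per-group counts are known, the seven S2 indexes can be checked mechanically. The sketch below rests on several assumptions: the argument names are hypothetical, the coefficient of variation uses the population standard deviation (the text does not say which), and the 0.2 / N threshold is applied before the rounding described in the Detailed Description, so it is slightly stricter.

```python
import statistics


def evaluate_s2(correct_counts, total_filtered, max_single_side,
                max_combination, frac_tags_over_1pct):
    """Return a pass/fail flag for each S2 index (hypothetical helper).

    correct_counts: filtered, correctly paired read count for each tag pair.
    total_filtered: all reads remaining after filtering.
    Contamination inputs are fractions, e.g. 0.025 means 2.5%.
    """
    n_pairs = len(correct_counts)
    total_correct = sum(correct_counts)
    # Mixing proportion of each pair among all correctly paired reads
    mix = [c / total_correct for c in correct_counts]
    cv = statistics.pstdev(mix) / statistics.mean(mix)
    min_share = min(correct_counts) / total_filtered
    return {
        "max_single_side <= 2.5%": max_single_side <= 0.025,
        "max_combination <= 0.01%": max_combination <= 0.0001,
        "reads per pair >= 5000": min(correct_counts) >= 5000,
        "mixing CV <= 0.5": cv <= 0.5,
        "pass rate >= 97%": total_correct / total_filtered >= 0.97,
        "share per pair >= 0.2/N": min_share >= 0.2 / n_pairs,
        "tags over 1% <= 10%": frac_tags_over_1pct <= 0.10,
    }
```

A library passes the first analysis only when every value in the returned dictionary is `True`; any `False` entry points to the tags that S3 would re-synthesize.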

As an improvement of the above technical solution, step S1) comprises, in sequence: preparing the gDNA standard, fragmenting the gDNA, end repair, adapter ligation, purification of the ligation product, library amplification, purification of the amplified library, quality testing of the purified library, fragment size testing of the purified library, and sequencing the library on the instrument.

As an improvement of the technical solution, the unique double-end library tag combination consists of an IG5 group and an IG7 group; the Hamming distance between library tags within the IG5 group and within the IG7 group is at least 3, and the Hamming distance between library tag sequences of the IG5 group and the IG7 group is at least 2.

As a further improvement of the technical solution, the library tags are purified by high performance liquid chromatography and their molecular weight is confirmed by mass spectrometry; the required purity is at least 85%.
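These design constraints can be verified by comparing every pair of tags. In this sketch the between-group rule is read as applying to every IG5/IG7 tag pair, and sequences are compared exactly as written (no reverse-complementing of the Index 2 read); both are interpretations rather than statements from the text, and the function names and example sequences are hypothetical.

```python
from itertools import combinations, product


def hamming(x, y):
    """Number of positions at which two equal-length tags differ."""
    if len(x) != len(y):
        raise ValueError("tags must have equal length")
    return sum(a != b for a, b in zip(x, y))


def check_tag_design(ig5, ig7, within_min=3, between_min=2):
    """True if all within-group and between-group distances meet the minima."""
    within_ok = all(
        hamming(x, y) >= within_min
        for group in (ig5, ig7)
        for x, y in combinations(group, 2)
    )
    between_ok = all(hamming(x, y) >= between_min for x, y in product(ig5, ig7))
    return within_ok and between_ok


print(check_tag_design(["ACGTACGT", "TGCATGCA"], ["GGGGCCCC", "CCCCGGGG"]))
```

For a 96-pair set this is 2 × 4560 within-group comparisons plus 9216 between-group comparisons, which is trivial to run exhaustively.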

As an improvement of the technical solution, when the unique paired-end library tag combination consists of 96 pairs of library tags, i.e., 96 upstream library tags in the IG5 group and 96 downstream library tags in the IG7 group in one-to-one correspondence, the proportion of sequences per tag group is correspondingly set to at least 0.2%.

As an improvement of the technical solution, when the unique paired-end library tag combination consists of 48 pairs of library tags, i.e., 48 upstream library tags in the IG5 group and 48 downstream library tags in the IG7 group in one-to-one correspondence, the proportion of sequences per tag group is correspondingly set to at least 0.4%.

As an improvement of the technical solution, when the unique double-end library tag combination consists of 192 pairs of library tags, i.e., 192 upstream library tags in the IG5 group and 192 downstream library tags in the IG7 group in one-to-one correspondence, the proportion of sequences per tag group is correspondingly set to at least 0.1%.

As an improvement of the technical solution, when the unique double-end library tag combination consists of 288 pairs of library tags, i.e., 288 upstream library tags in the IG5 group and 288 downstream library tags in the IG7 group in one-to-one correspondence, the proportion of sequences per tag group is correspondingly set to at least 0.07%.

As an improvement of the technical solution, when the unique double-end library tag combination consists of 384 pairs of library tags, i.e., 384 upstream library tags in the IG5 group and 384 downstream library tags in the IG7 group in one-to-one correspondence, the proportion of sequences per tag group is correspondingly set to at least 0.05%.
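The thresholds listed above follow the 0.2 / N rule, with N the number of tag pairs, expressed as a percentage and rounded to its first non-zero digit (the rounding rule stated in the Detailed Description). The helper below reproduces the listed values; the name `min_share_percent` is hypothetical.

```python
from math import floor, log10


def min_share_percent(n_pairs):
    """0.2 / N as a percentage, rounded to its first non-zero digit."""
    value = 0.2 / n_pairs * 100
    # Decimal places needed so that exactly one non-zero digit is kept
    digits = -floor(log10(value))
    return round(value, digits)


print([min_share_percent(n) for n in (48, 96, 192, 288, 384)])
# → [0.4, 0.2, 0.1, 0.07, 0.05]
```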

In addition, the invention also provides application of the quality control method in sample sequence determination.

The invention has the beneficial effects that: the invention provides a quality control method for detecting a unique double-end library label combination and application thereof, wherein the quality control method can efficiently detect the cross contamination of library labels, has relatively low cost and is more suitable for high-throughput determination of sample sequences.

Drawings

FIG. 1 shows the results of quality control of the first simulation in example 1;

FIG. 2 shows the results of quality control of the second simulation in example 1;

FIG. 3 shows the result of the first quality control analysis at the IG5 end of example 2;

FIG. 4 is a heat map of contamination ratios from the first quality control analysis at the IG7 end of example 2. FIG. 4 covers 96 pairs of tag primers: the horizontal axis lists, from left to right, IG5A01 to IG5A12, IG5B01 to IG5B12, IG5C01 to IG5C12, through IG5H01 to IG5H12, and the vertical axis lists, from top to bottom, IG7A01 to IG7A12, IG7B01 to IG7B12, IG7C01 to IG7C12, through IG7H01 to IG7H12; dots circled by ovals represent tag primers that do not meet the requirements; the same applies to the following figures;

FIG. 5 is a heat map of contamination ratios from the first quality control analysis at the IG5 end of example 2;

FIG. 6 is a distribution plot of contamination ratios from the first quality control analysis at the IG7 and IG5 ends of example 2;

FIG. 7 shows the stability comparison results of two quality control analyses at the IG7 end in example 2;

FIG. 8 shows the stability comparison results of two quality control analyses at the IG5 end in example 2;

FIG. 9 shows the results of the first quality control analysis at the IG5 end of example 3;

FIG. 10 is a heat map of contamination ratios from the first quality control analysis at the IG7 end of example 3;

FIG. 11 is a heat map of contamination ratios from the first quality control analysis at the IG5 end of example 3;

FIG. 12 is a distribution plot of contamination ratios from the first quality control analysis at the IG7 and IG5 ends of example 3.

Detailed Description

To better illustrate the objects, aspects and advantages of the present invention, the present invention will be further described with reference to the following detailed description and accompanying drawings.

In this specification, Index, library tag and tag primer have the same meaning. When calculating the proportion of sequences per tag group, the result of 0.2 / number of library tag combination pairs is rounded to its first non-zero digit.

Principle by which unique double-end library tags prevent sample contamination caused by cross-contamination

In the NGS field, to distinguish different samples in the same sequencing reaction, a specific tag (Index) is attached to each sample during library construction so that the data of the different samples can be separated in the subsequent analysis. As sequencer throughput increases, more samples are pooled into the same lane, which places higher demands on the number of Indexes and on how well they can be distinguished. In addition, Illumina HiSeq X/4000 and NovaSeq use a clustering method different from that of other Illumina sequencers, and the literature reports that this method carries a higher risk of Index cross-contamination. A traditional single-end Index primer splits data according to one end only, so data are easily misassigned when contamination occurs. Unique double-end Index primers minimize the risk of sample contamination caused by Index cross-contamination and ensure the stability and reliability of the product. Because the data are split according to a unique pair of Indexes read from both ends, each read carries a "double insurance", and most contaminated reads can be discarded. Table 1 compares the tolerance of the single-end, combinatorial double-end, and unique double-end Index strategies to Index cross-contamination.

TABLE 1

Principle of high-throughput NGS-based contamination quality inspection of unique double-end Index primers

Because unique double-end Indexes are used, each sample is labeled twice, once at each end, which greatly increases the tolerance to cross-contamination between single-end tag primers. For example, if the single-side contamination ratio at each of the two Index ends is 1%, the actual risk of sample misassignment is 1% × 1% = 0.01%. This tolerance also greatly eases the pressure on the synthesis and purification of Index primers, allowing manufacturing costs to be controlled further.

By utilizing the advantages of the unique double-end Index, the invention provides a simple and feasible quality control method for detecting the cross contamination of the labeled primers by utilizing NGS. The rationale is based on observing the proportion of the undesired double-ended Index combinations in the overall sequencing result to estimate the maximum cross-contamination probability that can occur and the Index involved, thus avoiding misallocation between samples due to cross-contamination between Index primers.

For example, four libraries are labeled A + a, B + b, C + c and D + d, respectively, so only these four combinations are considered legal in sequence analysis. Taking the combination A + b as an example: since in theory A pairs only with a, observing the A + b combination has two possible explanations, where S denotes the number of sequences containing the given Index or combination: 1) tag primer b entered primer a, with an estimated contamination ratio of S_(A+b)/S_A; 2) primer A entered primer B, with an estimated contamination ratio of S_(A+b)/S_b. This calculation assumes that the Indexes of one category, such as A/B/C/D, contain none of the Indexes of the other category, such as a/b/c/d. The estimation model also considers only simple one-to-one contamination and ignores complex cases such as multiple contamination. Moreover, it estimates only the maximum possible contamination and cannot determine the direction of contamination: any single unidirectional event, for example "A entering B", is detected as the two possibilities "A entering B" or "b entering a". From this model, the maximum combinatorial contamination risk of the unique paired-end Index library A + a from the other primers in multiplex sequencing can be estimated as:

However, since only the four combinations A + a, B + b, C + c and D + d are expected, the effective maximum contamination risk can be calculated as:

In practice, PCR was performed with each of 48 or 96 Index primers to incorporate the Index into its library, and the libraries were then pooled for routine MiSeq sequencing. After sequencing, an analysis script was invoked directly to analyze all 96 × 96 = 9216 sequence combinations, find the proportion of abnormal combinations, and calculate the respective contamination ratios.
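The analysis of all N × N combinations can be sketched on a count matrix. This is an illustrative reimplementation, not the authors' script: `counts[i][j]` is assumed to hold the filtered reads carrying IG5 tag i and IG7 tag j, with the expected pairs on the diagonal, and each off-diagonal cell is scored with the larger of its two one-to-one explanations, since the model cannot tell the direction of contamination.

```python
def contamination_table(counts):
    """Rank mismatched (IG5 i, IG7 j) cells by their maximum contamination ratio."""
    n = len(counts)
    ig5_totals = [sum(counts[i]) for i in range(n)]  # all reads with IG5 tag i
    ig7_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    results = []
    for i in range(n):
        for j in range(n):
            if i == j or counts[i][j] == 0:
                continue  # expected pair, or nothing observed
            # IG5 tag i leaked into sample j (IG7 tag j assumed clean)
            ig5_leak = counts[i][j] / ig7_totals[j]
            # IG7 tag j leaked into sample i (IG5 tag i assumed clean)
            ig7_leak = counts[i][j] / ig5_totals[i]
            results.append((i, j, max(ig5_leak, ig7_leak)))
    return sorted(results, key=lambda r: r[2], reverse=True)


table = contamination_table([[9800, 100], [200, 9900]])
print(table[0][:2])  # (1, 0): IG5 tag 1 paired with IG7 tag 0
```

For a 96-pair plate the matrix has 96 × 96 = 9216 cells, so the whole scan is a few thousand divisions.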

Index on-machine sequencing

1. Preparation of gDNA Standard

1) The 48plex Index requires 500ng of gDNA standard substance for quality inspection, and the 96plex Index requires 1000ng of gDNA standard substance for quality inspection;

2) Add 50 μl of 1× IDTE Buffer to a new 1.5 ml Eppendorf LoBind tube, then add the corresponding volume of gDNA standard: 2 μl for testing a 48plex Index plate, or 4 μl for a 96plex Index plate; vortex for 10 to 15 s and centrifuge briefly to bring the solution to the bottom of the tube;

3) transfer the diluted standard to a Covaris microTUBE and make it up to 50 μl with 1× IDTE Buffer before the subsequent DNA fragmentation.

2. Fragmentation of gDNA

Shear the DNA into 170 to 200 bp fragments with a Covaris M220 instrument; after shearing, remove the Covaris microTUBE and centrifuge to bring the liquid to the bottom of the tube.

3. End repair and 3'-end A-tailing

1) Reagent preparation: open the KAPA Hyper Prep Kit (96 reactions), take out the following 2 tubes and thaw them on ice;

2) on ice, prepare the end repair and A-tailing reaction mix in a new 1.5 ml Eppendorf LoBind tube, flick the tube 3 to 5 times, invert 2 to 3 times to mix, and centrifuge for 1 to 3 s; the reaction setup is shown in Table 2;

3) pipette 60 μl of the mixed solution into each of 4 (48plex Index plate) or 8 (96plex Index plate) 0.2 ml flat-cap PCR tubes and centrifuge for 1 to 3 s;

4) place the tubes in a PCR instrument and run: heated lid at 85 °C; 20 °C for 30 min; 65 °C for 30 min; hold at 4 °C. Proceed to the next step within 2 h.

TABLE 2

4. Ligation of preformed adapters (with T overhangs) to both ends of the A-tailed double-stranded DNA fragments

1) On ice, prepare the adapter ligation reaction mix in a new 1.5 ml Eppendorf LoBind tube, flick the tube 3 to 5 times, invert 2 to 3 times to mix, and centrifuge for 1 to 3 s; the reaction setup is shown in Table 3;

2) add 50 μl of the mixed solution to each 0.2 ml tube from the previous step (4 tubes in total for the 48plex Index plate, 8 tubes for the 96plex Index plate), pipette up and down 5 times to mix, and centrifuge for 1 to 3 s;

3) run the following program on the PCR instrument: 20 °C for 15 min, 70 °C for 10 min, hold at 4 °C (heated lid at 85 °C).

TABLE 3

5. Purification of the ligation product to remove adapter dimers and unligated adapters

1) Invert the SPB magnetic beads, equilibrated to room temperature, 2 to 3 times and vortex for 5 to 10 s to resuspend them evenly. In a 1.5 ml centrifuge tube, add the resuspended beads and the ligation product at a ligation reaction : bead volume ratio of 1 : 0.8. Specifically: for the 48plex Index plate, combine the 4 tubes into 1 tube, using 352 μl of beads and 440 μl of ligation product, 1 tube in total; for the 96plex Index plate, combine every 4 tubes into 1 tube, using 2 × 352 μl of beads and 2 × 440 μl of ligation product, 2 tubes in total. After adding, mix, incubate with rotation for 5 min, and centrifuge briefly;
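The bead volumes in the cleanup steps follow from the 1 : 0.8 sample-to-bead ratio. A small helper, whose name is hypothetical, makes the arithmetic explicit; the 110 μl per tube is inferred from 440 μl across 4 pooled tubes, which appears to match the 60 μl and 50 μl volumes added per tube in the end-repair and ligation steps.

```python
def bead_volume_ul(sample_ul, bead_ratio=0.8):
    """Bead volume for a sample : bead ratio of 1 : bead_ratio (hypothetical helper)."""
    return sample_ul * bead_ratio


# Ligation cleanup: 4 tubes of about 110 ul each are pooled before adding beads
pooled_ligation_ul = 4 * 110
print(bead_volume_ul(pooled_ligation_ul))  # about 352 ul
# Post-PCR cleanup: one 25 ul amplification reaction per well
print(bead_volume_ul(25))  # about 20 ul
```

The same ratio gives the 20 μl per well used after library amplification; the trough volumes given there (1440 μl for 48 samples) correspond to 30 μl per sample, more than the 20 μl actually pipetted into each well.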

2) Place the centrifuge tube on the magnetic rack and wait for the solution to clear; without moving the tube, open the cap and carefully remove the clear supernatant without touching the beads;

3) with the tube still on the magnetic rack, add 500 μl of freshly prepared 75% ethanol to each tube and let the beads settle for 1 min, slowly rotating the tube one full turn horizontally during this time; then remove the ethanol. Repeat this step once;

4) centrifuge for 1 to 3 s, return the tube to the magnetic rack, let it stand for 30 s, remove residual ethanol with a pipette, and leave the cap open; air-dry the beads at room temperature for 3 min, add 500 μl of EB to each tube, pipette to resuspend thoroughly, and incubate at room temperature for 2 min; place the tube on the magnetic rack for 2 min until the solution clears, transfer 490 μl of supernatant with a pipette to a new 1.5 ml Eppendorf LoBind tube (for the 96plex Index plate, pool the two eluates into 1 tube), and keep on ice until use.

6. Library amplification: amplify the adapter-ligated libraries

1) On ice, prepare the required volume of reaction mix in a 5 ml Eppendorf LoBind tube (or a 15 ml centrifuge tube), flick 3 to 5 times, invert 2 to 3 times to mix, and stand upright for 0.5 to 1 min; the reaction setup is shown in Table 4;

2) dispense the prepared reaction mix equally into 8 tubes, 138 μl per tube (for the 96 Index Pair Plate (refer part2#), dispense twice per tube: 142 μl + 132 μl);

3) dispense the reaction mix into a new 48-well plate (48plex Index) or 96-well PCR plate (96plex Index) at 22.5 μl per well;

4) take 2.5 μl of Index from the IDP plate and add it to the 48-well plate or 96-well PCR plate containing the dispensed reaction mix, pipette up and down 2 to 3 times to mix, and seal the plate; centrifuge at 1000 rpm for 1 min in a plate centrifuge (reaction volume 25 μl); place the plate in the PCR instrument and run the program shown in Table 5.

TABLE 4

TABLE 5

7. Purification of the amplified library to remove primer dimers and reaction components

1) Invert the SPB magnetic beads 2 to 3 times and vortex at maximum speed for 5 to 10 s until homogeneous;

2) pipette the required SPB beads into the reagent trough; each sample receives 20 μl of SPB beads (sample : beads = 1 : 0.8). Add 1440 μl of beads to the trough for 48 samples, or 2880 μl for 96 samples;

3) take the 48-well plate out of the PCR instrument, centrifuge at 1000 rpm for 3 s, and carefully peel off the sealing film; take 20 μl of SPB beads from the trough, add them to each well of the 48-well plate/96-well PCR plate, and pipette up and down 10 times;

4) seal the 48-well plate/96-well PCR plate, centrifuge briefly at 1000 rpm for 3 s, and leave at room temperature for 5 min; place the plate on a 96-well magnetic rack until the solution clears; remove the film, then aspirate and discard 45 μl of supernatant;

5) with the 48-well plate/96-well PCR plate still on the magnetic rack, add 200 μl of freshly prepared 75% ethanol to each sample well; leave the plate on the magnetic rack so the beads are fully washed, and discard the ethanol after 1 min; repeat this step once;

6) leave the 48-well plate/96-well PCR plate on the magnetic rack for 30 s and remove residual ethanol; take the plate off the magnetic rack and place it on a PCR plate rack at room temperature for 2 min to dry the beads; add 14 μl of EB to each well, cover with 8-strip caps, vortex for about 5 s, and centrifuge briefly at 1000 rpm for 3 s;

7) incubate the plate at room temperature for 2 min, remove the caps, and place it on the magnetic rack for 2 min until the solution clears; transfer 8 μl of supernatant, free of beads, to a new 48-well plate/96-well PCR plate;

8) transfer each column of libraries into the same new 0.2 ml 8-tube strip, then transfer the libraries in the strip into the same new 1.5 ml Eppendorf LoBind tube to combine them into a pooled library; vortex to mix and centrifuge. Take 20 μl of the mixed, purified library into a new 1.5 ml Eppendorf LoBind tube, add 180 μl of EB, and pipette up and down 5 to 6 times, diluting the library 10-fold in advance for subsequent testing.

8. Quality control of purified libraries

Use the dsDNA HS (High Sensitivity) Assay Kit (Thermo Fisher) to measure the concentration of the diluted library and convert it back to the concentration of the undiluted library. If the library concentration is between 9 and 60 ng/μl and the LabChip result is normal, this part of library construction is qualified and the library can proceed to MiSeq sequencing; otherwise, the library must be prepared again.

9. Fragment size detection of the purified library (Library QC)

The diluted library is tested with the LabChip DNA High Sensitivity Reagent Kit (PerkinElmer). A qualified library has its main fragment peak at 350-500 bp and no obvious small fragments in the 10-150 bp range.
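The fragment-size criterion can be sketched as a check on an exported size profile. This assumes the profile is a list of (size in bp, signal) points with any internal marker peaks already removed; because the text only says "no obvious" small fragments, the 5% cut-off for signal in the 10-150 bp range is a hypothetical parameter, not a value from this protocol.

```python
from typing import Sequence, Tuple

# (fragment size in bp, signal) points exported from an electropherogram.
Profile = Sequence[Tuple[float, float]]


def fragment_profile_ok(profile: Profile,
                        peak_range: Tuple[float, float] = (350.0, 500.0),
                        small_range: Tuple[float, float] = (10.0, 150.0),
                        max_small_fraction: float = 0.05) -> bool:
    """Main peak must fall in peak_range, and signal inside small_range must be at most
    max_small_fraction of the total (hypothetical cut-off for 'no obvious' small fragments)."""
    total = sum(signal for _, signal in profile)
    if total <= 0:
        return False  # empty or flat profile cannot be judged
    peak_size = max(profile, key=lambda point: point[1])[0]
    small = sum(signal for size, signal in profile
                if small_range[0] <= size <= small_range[1])
    return peak_range[0] <= peak_size <= peak_range[1] and small / total <= max_small_fraction


# Hypothetical profile: main peak at 420 bp, a trace of signal at 100 bp.
print(fragment_profile_ok([(100.0, 1.0), (420.0, 50.0), (600.0, 10.0)]))
```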

10. Library sequencing strategy (MiSeq Run)

1) Dilute the purified library to 4 nM based on the concentration measured in QC, and dilute 1 N NaOH to 0.2 N with nuclease-free water;

2) Library denaturation: add 5 μl of the 4 nM library to a new 1.5 ml Eppendorf LoBind tube, add 5 μl of 0.2 N NaOH, mix by pipetting 15-20 times, and incubate at room temperature for 5 min;

3) Dilute the denatured library to 13 pM;

4) For the subsequent operations, refer to the Illumina MiSeq instructions and sequence the library with the following settings: Read 1, 12 cycles; Index 1, 8 cycles; Index 2, 8 cycles.
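The dilution arithmetic in steps 1) to 3) can be sketched as follows. Converting the QC concentration (ng/μl) to molarity requires a mean fragment length; using the LabChip main-peak size for it, the common 660 g/mol per base pair approximation, and the 100 μl and 1000 μl working volumes are all assumptions, not values stated in this protocol.

```python
BP_MASS_G_PER_MOL = 660.0  # common approximation for the mass of one double-stranded base pair


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """dsDNA mass concentration to molarity: nM = ng/ul * 1e6 / (660 * bp)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_fragment_bp)


def stock_volume(c_stock: float, c_target: float, v_final: float) -> float:
    """C1 * V1 = C2 * V2: volume of stock needed to make v_final at c_target."""
    if c_target > c_stock:
        raise ValueError("target concentration exceeds the stock concentration")
    return c_target * v_final / c_stock


def loading_plan(conc_ng_per_ul: float, mean_fragment_bp: float,
                 v_4nM_ul: float = 100.0, v_load_ul: float = 1000.0) -> dict:
    stock_nM = ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp)
    lib_ul = stock_volume(stock_nM, 4.0, v_4nM_ul)        # step 1: dilute to 4 nM
    denatured_pM = 4.0 * 1000.0 * 5.0 / (5.0 + 5.0)       # step 2: 5 ul + 5 ul NaOH -> 2000 pM
    den_ul = stock_volume(denatured_pM, 13.0, v_load_ul)  # step 3: dilute to 13 pM
    return {
        "stock_nM": stock_nM,
        "library_ul_for_4nM": lib_ul,
        "buffer_ul_for_4nM": v_4nM_ul - lib_ul,
        "denatured_ul_for_13pM": den_ul,
        "diluent_ul_for_13pM": v_load_ul - den_ul,
    }


plan = loading_plan(25.0, 425.0)  # hypothetical QC result: 25 ng/ul, main peak near 425 bp
print(plan)
```

Because denaturation mixes equal volumes of 4 nM library and 0.2 N NaOH, the denatured library is 2 nM in 0.1 N NaOH, so reaching 13 pM is a fixed 2000/13 dilution regardless of the starting concentration.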

11. Sequencing data analysis (QC Analysis)

The Index 1 and Index 2 sequences of all reads are output in FASTQ format using Illumina bcl2fastq software with the corresponding parameters, and the sequences are statistically analyzed with the corresponding scripts to obtain each metric.
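A minimal sketch of the index-pair tally, assuming the FASTQ header comment ends in `INDEX1+INDEX2` (the form bcl2fastq writes for dual-indexed reads), that Index 1 corresponds to the IG7 tag and Index 2 to the IG5 tag, and that tag sequences are already in the orientation the instrument reports. The tag-to-name tables and the sequences below are hypothetical placeholders, not the actual IG7/IG5 sequences.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator


def read_headers(fastq_path: str) -> Iterator[str]:
    """Yield the header line of every FASTQ record (every 4th line), plain or gzipped."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:
                yield line


def count_index_pairs(headers: Iterable[str]) -> Counter:
    """Tally (Index 1, Index 2) sequences from headers ending in 'INDEX1+INDEX2'."""
    counts: Counter = Counter()
    for header in headers:
        field = header.rstrip().split(":")[-1]
        if "+" in field:
            index1, index2 = field.split("+", 1)
            counts[(index1, index2)] += 1
    return counts


def name_pairs(counts: Counter, ig7_names: Dict[str, str],
               ig5_names: Dict[str, str]) -> Counter:
    """Translate index sequences into tag names; unmatched sequences become 'unknown'."""
    named: Counter = Counter()
    for (index1, index2), n in counts.items():
        named[(ig7_names.get(index1, "unknown"), ig5_names.get(index2, "unknown"))] += n
    return named


# Hypothetical 8-nt tags and synthetic headers, for illustration only.
IG7_NAMES = {"AACCGGTT": "IG7A01"}
IG5_NAMES = {"TTGGCCAA": "IG5A01", "ACGTACGT": "IG5B01"}
headers = [
    "@M00001:1:000000000-AAAAA:1:1101:1000:2000 1:N:0:AACCGGTT+TTGGCCAA",
    "@M00001:1:000000000-AAAAA:1:1101:1001:2001 1:N:0:AACCGGTT+TTGGCCAA",
    "@M00001:1:000000000-AAAAA:1:1101:1002:2002 1:N:0:AACCGGTT+ACGTACGT",
]
named = name_pairs(count_index_pairs(headers), IG7_NAMES, IG5_NAMES)
print(named)
```

For a real run, `read_headers("...fastq.gz")` can be passed to `count_index_pairs` in place of the synthetic list; the resulting counts per (IG7, IG5) combination are the raw input for the contamination ratios in the tables below.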

12. Library sequencing result criteria

MiSeq run output metrics: sequencing data quality 01: Q30 > 90%; sequencing data quality 02: PF > 97%; sequencing data quality 03: both phasing and prephasing < 0.30.
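The three run criteria can be checked mechanically. This sketch assumes the metrics have already been exported from the run summary in the same units as the thresholds above; the class name and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunMetrics:
    q30_percent: float   # % of bases with quality >= Q30
    pf_percent: float    # % of clusters passing filter
    phasing: float
    prephasing: float


def run_checks(m: RunMetrics) -> Dict[str, bool]:
    """Apply the three MiSeq run criteria listed above."""
    return {
        "quality_01_q30": m.q30_percent > 90.0,
        "quality_02_pf": m.pf_percent > 97.0,
        "quality_03_phasing": m.phasing < 0.30 and m.prephasing < 0.30,
    }


checks = run_checks(RunMetrics(q30_percent=93.5, pf_percent=98.2, phasing=0.12, prephasing=0.08))
run_ok = all(checks.values())
print(checks, run_ok)
```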

Example 1 Simulation of the quality control method

1) First simulation, unidirectional contamination: one cross-contamination is detected, two possible contamination directions are given, and the maximum contamination ratio (i.e., the maximum single-side tag contamination ratio) is 4%. The simulated data contain 96 pairs of standard (correctly paired) sequences: for the contaminated pair, the correct pairing IG7F01 + IG5F01 has 48000 reads and the contaminating pairing IG7F01 + IG5E01 has 2000 reads, while each of the remaining pairs has 50000 reads. The simulated library data were analyzed; the results are shown in Table 6, and quality control analysis using the parameters in Table 6 gives the results shown in Fig. 1. The maximum paired contamination ratio product (i.e., the maximum tag-combination contamination ratio) is 4% × 0 = 0; the number of correctly paired reads is 48000; the number of correctly paired, valid reads is 48000; the sequence throughput rate is 100%; the number of tags with more than 1% single-side contamination is 1; and the proportion of such tags is 1/96 = 1.04%.

TABLE 6

2) Second simulation, bidirectional contamination that can cause sample misclassification: two cross-contaminations are detected, both of which can cause sample misclassification; the maximum contamination ratio is 2% and the maximum paired contamination product is 0.04%. In the simulated data each standard pair has 50000 reads, except that the contaminated set contains the correct pairing IG7F01 + IG5F01 with 48000 reads and the mispairings IG7F01 + IG5E01 and IG7E01 + IG5F01 with 1000 reads each. The simulated library samples were analyzed; the results are shown in Table 7, and quality control analysis using the parameters in Table 7 gives the results shown in Fig. 2.

TABLE 7

Therefore, the results of the simulation tests are consistent with expectations.
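The two simulations can be reproduced with a short scan over a 96-pair plate. This sketch assumes wells named A01 to H12 (as in Tables 8 and 9), computes single-side ratios from both ends (undesired reads divided by all reads containing the IG7 tag, and by all reads containing the IG5 tag), counts an undesired combination as flagged when either ratio exceeds 1%, and applies the claim-1 product (B-a / all a) × (A-b / all A) to every ordered pair of wells. The helper names are hypothetical.

```python
from collections import Counter

WELLS = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]  # A01..H12
EXPECTED = {f"IG7{w}": f"IG5{w}" for w in WELLS}  # correct IG7 -> IG5 partner


def tag_totals(counts):
    """Number of reads containing each IG7 tag and each IG5 tag."""
    t7, t5 = Counter(), Counter()
    for (i7, i5), n in counts.items():
        t7[i7] += n
        t5[i5] += n
    return t7, t5


def summarize(counts):
    t7, t5 = tag_totals(counts)
    single = {}  # undesired combination -> larger of its two single-side ratios
    for (i7, i5), n in counts.items():
        if n > 0 and EXPECTED.get(i7) != i5:
            single[(i7, i5)] = max(n / t7[i7], n / t5[i5])
    best_product = 0.0
    for a7, a5 in EXPECTED.items():          # well A-a
        for b7, b5 in EXPECTED.items():      # well B-b
            if a7 == b7 or not t7[a7] or not t5[a5]:
                continue
            left = counts.get((a7, b5), 0) / t7[a7]    # B-a / all a
            right = counts.get((b7, a5), 0) / t5[a5]   # A-b / all A
            best_product = max(best_product, left * right)
    flagged = [combo for combo, r in single.items() if r > 0.01]
    return max(single.values(), default=0.0), best_product, len(flagged) / len(WELLS)


def simulate(overrides):
    counts = Counter({pair: 50000 for pair in EXPECTED.items()})
    for pair, n in overrides.items():
        counts[pair] = n
    return counts


sim1 = summarize(simulate({("IG7F01", "IG5F01"): 48000, ("IG7F01", "IG5E01"): 2000}))
sim2 = summarize(simulate({("IG7F01", "IG5F01"): 48000, ("IG7F01", "IG5E01"): 1000,
                           ("IG7E01", "IG5F01"): 1000}))
print(sim1, sim2)
```

For the first simulation this gives a maximum single-side ratio of 4% (2000/50000), a paired product of 0 because no IG7E01 + IG5F01 reads exist, and one flagged combination out of 96 (1.04%). For the second, the largest single-side ratio is 1000/49000 (about 2.04%) and the largest product is about 0.042%, matching the reported 2% and 0.04% after rounding.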

Example 2

In this example, 96 pairs of library tags were subjected to quality inspection. The results of the first quality control analysis are shown in Tables 8 and 9, which list only the contaminated combinations.

Table 8 Sequencing results for the IG7-end index

Query | Desired partner | Desired combination | Undesired combination | Undesired partner | Total sequences | Undesired combination sequences | Contamination source | Contaminated tag | Contamination ratio
IG7A01 | IG5A01 | IG7A01-IG5A01 | IG7A01-IG5B01 | IG5B01 | 96 | 45 | IG5B01 | IG5A01 | 46.88%
IG7A01 | IG5A01 | IG7A01-IG5A01 | IG7A01-IG5A02 | IG5A02 | 96 | 51 | IG5A02 | IG5A01 | 53.13%
IG7A08 | IG5A08 | IG7A08-IG5A08 | IG7A08-IG5H07 | IG5H07 | 53249 | 88 | IG5H07 | IG5A08 | 0.17%
IG7B02 | IG5B02 | IG7B02-IG5B02 | IG7B02-IG5A03 | IG5A03 | 40825 | 43 | IG5A03 | IG5B02 | 0.11%
IG7B10 | IG5B10 | IG7B10-IG5B10 | IG7B10-IG5D08 | IG5D08 | 46021 | 70 | IG5D08 | IG5B10 | 0.15%
IG7B11 | IG5B11 | IG7B11-IG5B11 | IG7B11-IG5C11 | IG5C11 | 47969 | 68 | IG5C11 | IG5B11 | 0.14%
IG7C01 | IG5C01 | IG7C01-IG5C01 | IG7C01-IG5G12 | IG5G12 | 39518 | 64 | IG5G12 | IG5C01 | 0.16%
IG7C06 | IG5C06 | IG7C06-IG5C06 | IG7C06-IG5C07 | IG5C07 | 60810 | 637 | IG5C07 | IG5C06 | 1.05%
IG7C08 | IG5C08 | IG7C08-IG5C08 | IG7C08-IG5B08 | IG5B08 | 67961 | 119 | IG5B08 | IG5C08 | 0.18%
IG7D03 | IG5D03 | IG7D03-IG5D03 | IG7D03-IG5E03 | IG5E03 | 44222 | 48 | IG5E03 | IG5D03 | 0.11%
IG7D03 | IG5D03 | IG7D03-IG5D03 | IG7D03-IG5C03 | IG5C03 | 44222 | 56 | IG5C03 | IG5D03 | 0.13%
IG7D04 | IG5D04 | IG7D04-IG5D04 | IG7D04-IG5D03 | IG5D03 | 40521 | 41 | IG5D03 | IG5D04 | 0.10%
IG7D07 | IG5D07 | IG7D07-IG5D07 | IG7D07-IG5E08 | IG5E08 | 39029 | 281 | IG5E08 | IG5D07 | 0.72%
IG7D08 | IG5D08 | IG7D08-IG5D08 | IG7D08-IG5C08 | IG5C08 | 53581 | 85 | IG5C08 | IG5D08 | 0.16%
IG7D09 | IG5D09 | IG7D09-IG5D09 | IG7D09-IG5E09 | IG5E09 | 54786 | 70 | IG5E09 | IG5D09 | 0.13%
IG7E03 | IG5E03 | IG7E03-IG5E03 | IG7E03-IG5F03 | IG5F03 | 60714 | 78 | IG5F03 | IG5E03 | 0.13%
IG7E07 | IG5E07 | IG7E07-IG5E07 | IG7E07-IG5D07 | IG5D07 | 57285 | 88 | IG5D07 | IG5E07 | 0.15%
IG7F04 | IG5F04 | IG7F04-IG5F04 | IG7F04-IE5D04* | IE5D04* | 49814 | 54 | IE5D04* | IG5F04 | 0.11%
IG7F07 | IG5F07 | IG7F07-IG5F07 | IG7F07-IG5E07 | IG5E07 | 55273 | 63 | IG5E07 | IG5F07 | 0.11%
IG7G08 | IG5G08 | IG7G08-IG5G08 | IG7G08-IG5F08 | IG5F08 | 43769 | 167 | IG5F08 | IG5G08 | 0.38%
IG7G10 | IG5G10 | IG7G10-IG5G10 | IG7G10-IG5F06 | IG5F06 | 57227 | 60 | IG5F06 | IG5G10 | 0.10%
IG7H02 | IG5H02 | IG7H02-IG5H02 | IG7H02-IG5H03 | IG5H03 | 38360 | 58 | IG5H03 | IG5H02 | 0.15%
IG7H07 | IG5H07 | IG7H07-IG5H07 | IG7H07-IG5G07 | IG5G07 | 36388 | 42 | IG5G07 | IG5H07 | 0.12%

Table 9 Sequencing results for the IG5-end index

Query | Desired partner | Desired combination | Undesired combination | Undesired partner | Total sequences | Undesired combination sequences | Contamination source | Contaminated tag | Contamination ratio
IG5A01 | IG7A01 | IG5A01-IG7A01 | IG5A01-IG7B01 | IG7B01 | 26 | 26 | IG7B01 | IG7A01 | 100.00%
IG5A02 | IG7A02 | IG5A02-IG7A02 | IG5A02-IG7A01 | IG7A01 | 49928 | 51 | IG7A01 | IG7A02 | 0.10%
IG5A03 | IG7A03 | IG5A03-IG7A03 | IG5A03-IG7B02 | IG7B02 | 33067 | 43 | IG7B02 | IG7A03 | 0.13%
IG5A08 | IG7A08 | IG5A08-IG7A08 | IG5A08-IG7B08 | IG7B08 | 53201 | 60 | IG7B08 | IG7A08 | 0.11%
IG5B08 | IG7B08 | IG5B08-IG7B08 | IG5B08-IG7C08 | IG7C08 | 61974 | 119 | IG7C08 | IG7B08 | 0.19%
IG5B11 | IG7B11 | IG5B11-IG7B11 | IG5B11-IG7A11 | IG7A11 | 47967 | 51 | IG7A11 | IG7B11 | 0.11%
IG5C03 | IG7C03 | IG5C03-IG7C03 | IG5C03-IG7D03 | IG7D03 | 49273 | 56 | IG7D03 | IG7C03 | 0.11%
IG5C07 | IG7C07 | IG5C07-IG7C07 | IG5C07-IG7C06 | IG7C06 | 45027 | 637 | IG7C06 | IG7C07 | 1.41%
IG5C08 | IG7C08 | IG5C08-IG7C08 | IG5C08-IG7D08 | IG7D08 | 67868 | 85 | IG7D08 | IG7C08 | 0.13%
IG5C11 | IG7C11 | IG5C11-IG7C11 | IG5C11-IG7B11 | IG7B11 | 57807 | 68 | IG7B11 | IG7C11 | 0.12%
IG5D07 | IG7D07 | IG5D07-IG7D07 | IG5D07-IG7E07 | IG7E07 | 38866 | 88 | IG7E07 | IG7D07 | 0.23%
IG5D07 | IG7D07 | IG5D07-IG7D07 | IG5D07-IG7C08 | IG7C08 | 38866 | 49 | IG7C08 | IG7D07 | 0.13%
IG5D08 | IG7D08 | IG5D08-IG7D08 | IG5D08-IG7B10 | IG7B10 | 53619 | 70 | IG7B10 | IG7D08 | 0.13%
IG5D08 | IG7D08 | IG5D08-IG7D08 | IG5D08-IG7E08 | IG7E08 | 53619 | 65 | IG7E08 | IG7D08 | 0.12%
IG5E07 | IG7E07 | IG5E07-IG7E07 | IG5E07-IG7F07 | IG7F07 | 57203 | 63 | IG7F07 | IG7E07 | 0.11%
IG5E08 | IG7E08 | IG5E08-IG7E08 | IG5E08-IG7D07 | IG7D07 | 72767 | 281 | IG7D07 | IG7E08 | 0.39%
IG5E09 | IG7E09 | IG5E09-IG7E09 | IG5E09-IG7D09 | IG7D09 | 58757 | 70 | IG7D09 | IG7E09 | 0.12%
IG5F03 | IG7F03 | IG5F03-IG7F03 | IG5F03-IG7E03 | IG7E03 | 54811 | 78 | IG7E03 | IG7F03 | 0.14%
IG5F06 | IG7F06 | IG5F06-IG7F06 | IG5F06-IG7G10 | IG7G10 | 50348 | 60 | IG7G10 | IG7F06 | 0.12%
IG5F08 | IG7F08 | IG5F08-IG7F08 | IG5F08-IG7G08 | IG7G08 | 67091 | 167 | IG7G08 | IG7F08 | 0.25%
IG5G12 | IG7G12 | IG5G12-IG7G12 | IG5G12-IG7C01 | IG7C01 | 40234 | 64 | IG7C01 | IG7G12 | 0.16%
IG5H03 | IG7H03 | IG5H03-IG7H03 | IG5H03-IG7H02 | IG7H02 | 48832 | 58 | IG7H02 | IG7H03 | 0.12%
IG5H06 | IG7H06 | IG5H06-IG7H06 | IG5H06-IG7A11 | IG7A11 | 42784 | 62 | IG7A11 | IG7H06 | 0.14%
IG5H07 | IG7H07 | IG5H07-IG7H07 | IG5H07-IG7A08 | IG7A08 | 36410 | 88 | IG7A08 | IG7H07 | 0.24%
IG5H11 | IG7H11 | IG5H11-IG7H11 | IG5H11-IG7E11 | IG7E11 | 32519 | 50 | IG7E11 | IG7H11 | 0.15%
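Tables 8 and 9 view the same reads from opposite ends, which is why one contamination event yields two presumed directions. Judging from the tabulated values, the contamination ratio is the number of undesired-combination reads divided by all reads containing the query tag; this sketch treats that inferred definition as an assumption and checks it against a few rows.

```python
def contamination_percent(undesired_reads: int, reads_with_query_tag: int) -> float:
    """Undesired-combination reads / all reads containing the query tag, in percent."""
    return 100.0 * undesired_reads / reads_with_query_tag


# The same 637 IG7C06 + IG5C07 reads, viewed from each end.
ig7_view = contamination_percent(637, 60810)  # Table 8: IG5C07 contaminating IG5C06
ig5_view = contamination_percent(637, 45027)  # Table 9: IG7C06 contaminating IG7C07

# (undesired reads, total reads, tabulated %) for a few Table 8 rows.
TABLE8_ROWS = [(88, 53249, 0.17), (281, 39029, 0.72), (167, 43769, 0.38)]
rows_match = all(round(contamination_percent(u, t), 2) == p for u, t, p in TABLE8_ROWS)
print(round(ig7_view, 2), round(ig5_view, 2), rows_match)
```

The IG5-end view gives the higher ratio here because fewer reads contain IG5C07 than contain IG7C06, so the same 637 reads form a larger share of that denominator.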

Statistical analysis of the first quality inspection results in Tables 8 and 9 gives the information on IG7A01-IG5A01 shown in Table 10 and Fig. 3. In addition, statistical analysis of the 96 pairs of tag primers gives dot plots of the contamination ratios of IG7 and IG5 (Figs. 4 and 5) and a distribution plot of the contamination ratios of IG7 and IG5 (Fig. 6). In summary: 1) very few reads were obtained for the IG7A01-IG5A01 combination: only 96 reads contain IG7A01 and only 26 contain IG5A01, far below the quality inspection requirements of at least 5000 reads and more than 0.2% of total reads; 2) because this combination yielded so few reads, and the only combinations detected were illegal (mismatched) ones, its contamination ratios are very high; 3) taken together, the well corresponding to IG7A01-IG5A01 is problematic and must be replaced, both because of its number of valid reads and because of its likelihood of contamination.
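The read-count part of this judgment can be sketched as a per-tag check against the two stated requirements (at least 5000 reads and more than 0.2% of all reads). The total run size is not reported in the text, so the 4,800,000-read figure below is a hypothetical value; the per-tag read totals are taken from Tables 8 and 9.

```python
def tag_passes_read_qc(reads_with_tag: int, total_run_reads: int,
                       min_reads: int = 5000, min_fraction: float = 0.002) -> bool:
    """At least min_reads reads and more than min_fraction of all reads in the run."""
    if total_run_reads <= 0:
        raise ValueError("total_run_reads must be positive")
    return reads_with_tag >= min_reads and reads_with_tag / total_run_reads > min_fraction


TOTAL_RUN_READS = 4_800_000  # hypothetical run size, not reported in the text
first_qc = {"IG7A01": 96, "IG5A01": 26, "IG7A08": 53249}  # read totals from Tables 8 and 9
failing = [tag for tag, reads in first_qc.items()
           if not tag_passes_read_qc(reads, TOTAL_RUN_READS)]
print(failing)
```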

Table 10

Because well IG7A01-IG5A01 is problematic, the two tag primers IG7A01 and IG5A01 were independently resynthesized, dissolved to the specified concentration, and added in proportion to the corresponding well of a new deep-well plate. The well corresponding to the original IG7A01-IG5A01 was discarded, all remaining solution in the original mother plate that failed quality inspection was transferred to the corresponding positions in the new deep-well plate, and daughter plates were dispensed again for contamination quality control testing. The results of the second quality control analysis are shown in Tables 11 and 12, which list only the contaminated combinations.

Table 11 Sequencing results for the IG7-end index

Query | Desired partner | Desired combination | Undesired combination | Undesired partner | Total sequences | Undesired combination sequences | Contamination source | Contaminated tag | Contamination ratio
IG7A01 | IG5A01 | IG7A01-IG5A01 | IG7A01-IG5B01 | IG5B01 | 490179 | 579 | IG5B01 | IG5A01 | 0.12%
IG7A08 | IG5A08 | IG7A08-IG5A08 | IG7A08-IG5H07 | IG5H07 | 244997 | 300 | IG5H07 | IG5A08 | 0.12%
IG7A11 | IG5A11 | IG7A11-IG5A11 | IG7A11-IG5B11 | IG5B11 | 398075 | 580 | IG5B11 | IG5A11 | 0.15%
IG7B02 | IG5B02 | IG7B02-IG5B02 | IG7B02-IG5A03 | IG5A03 | 285357 | 358 | IG5A03 | IG5B02 | 0.13%
IG7B10 | IG5B10 | IG7B10-IG5B10 | IG7B10-IG5D08 | IG5D08 | 262786 | 435 | IG5D08 | IG5B10 | 0.17%
IG7B11 | IG5B11 | IG7B11-IG5B11 | IG7B11-IG5C11 | IG5C11 | 336262 | 664 | IG5C11 | IG5B11 | 0.20%
IG7C06 | IG5C06 | IG7C06-IG5C06 | IG7C06-IG5C07 | IG5C07 | 345406 | 4253 | IG5C07 | IG5C06 | 1.23%
IG7C08 | IG5C08 | IG7C08-IG5C08 | IG7C08-IG5B08 | IG5B08 | 306713 | 460 | IG5B08 | IG5C08 | 0.15%
IG7C09 | IG5C09 | IG7C09-IG5C09 | IG7C09-IG5D09 | IG5D09 | 233352 | 284 | IG5D09 | IG5C09 | 0.12%
IG7C11 | IG5C11 | IG7C11-IG5C11 | IG7C11-IG5D11 | IG5D11 | 343633 | 744 | IG5D11 | IG5C11 | 0.22%
IG7D01 | IG5D01 | IG7D01-IG5D01 | IG7D01-IG5C01 | IG5C01 | 260626 | 313 | IG5C01 | IG5D01 | 0.12%
IG7D03 | IG5D03 | IG7D03-IG5D03 | IG7D03-IG5C03 | IG5C03 | 230602 | 238 | IG5C03 | IG5D03 | 0.10%
IG7D03 | IG5D03 | IG7D03-IG5D03 | IG7D03-IG5E03 | IG5E03 | 230602 | 264 | IG5E03 | IG5D03 | 0.11%
IG7D07 | IG5D07 | IG7D07-IG5D07 | IG7D07-IG5E08 | IG5E08 | 243561 | 1585 | IG5E08 | IG5D07 | 0.65%
IG7D07 | IG5D07 | IG7D07-IG5D07 | IG7D07-IG5C07 | IG5C07 | 243561 | 255 | IG5C07 | IG5D07 | 0.10%
IG7D08 | IG5D08 | IG7D08-IG5D08 | IG7D08-IG5C08 | IG5C08 | 316348 | 448 | IG5C08 | IG5D08 | 0.14%
IG7D09 | IG5D09 | IG7D09-IG5D09 | IG7D09-IG5E09 | IG5E09 | 351153 | 672 | IG5E09 | IG5D09 | 0.19%
IG7E07 | IG5E07 | IG7E07-IG5E07 | IG7E07-IG5D07 | IG5D07 | 314916 | 468 | IG5D07 | IG5E07 | 0.15%
IG7E08 | IG5E08 | IG7E08-IG5E08 | IG7E08-IG5D08 | IG5D08 | 313695 | 328 | IG5D08 | IG5E08 | 0.10%
IG7E08 | IG5E08 | IG7E08-IG5E08 | IG7E08-IG5A11 | IG5A11 | 313695 | 318 | IG5A11 | IG5E08 | 0.10%
IG7E12 | IG5E12 | IG7E12-IG5E12 | IG7E12-IG5D12 | IG5D12 | 189902 | 195 | IG5D12 | IG5E12 | 0.10%
IG7G01 | IG5G01 | IG7G01-IG5G01 | IG7G01-IG5F01 | IG5F01 | 271347 | 362 | IG5F01 | IG5G01 | 0.13%
IG7G08 | IG5G08 | IG7G08-IG5G08 | IG7G08-IG5F08 | IG5F08 | 202711 | 733 | IG5F08 | IG5G08 | 0.36%
IG7G09 | IG5G09 | IG7G09-IG5G09 | IG7G09-IG5H09 | IG5H09 | 268926 | 317 | IG5H09 | IG5G09 | 0.12%
IG7G10 | IG5G10 | IG7G10-IG5G10 | IG7G10-IG5F06 | IG5F06 | 363743 | 533 | IG5F06 | IG5G10 | 0.15%
IG7G10 | IG5G10 | IG7G10-IG5G10 | IG7G10-IG5F10 | IG5F10 | 363743 | 510 | IG5F10 | IG5G10 | 0.14%
IG7H02 | IG5H02 | IG7H02-IG5H02 | IG7H02-IG5H03 | IG5H03 | 237964 | 294 | IG5H03 | IG5H02 | 0.12%
IG7H07 | IG5H07 | IG7H07-IG5H07 | IG7H07-IG5G07 | IG5G07 | 240529 | 355 | IG5G07 | IG5H07 | 0.15%
IG7H08 | IG5H08 | IG7H08-IG5H08 | IG7H08-IG5A09 | IG5A09 | 121660 | 253 | IG5A09 | IG5H08 | 0.21%

Table 12 Sequencing results for the IG5-end index

After the primer replacement, the abnormal cross-contamination involving IG7A01 and IG5A01 is no longer present (IG7A01-IG5B01 falls from 46.88% in Table 8 to 0.12% in Table 11), and every metric of the 96 pairs of tag primers meets the quality inspection standard.

In addition, the first and second quality control analyses of this example were compared; the results are shown in Table 13, Fig. 7 (comparative analysis of the IG7 tag primers), and Fig. 8 (comparative analysis of the IG5 tag primers). The two analyses show good reproducibility, indicating that the quality control method of the present invention is stable.
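One simple way to put a number on the reproducibility noted above is a Pearson correlation of the contamination ratios that appear in both runs. This sketch uses the 16 undesired IG7-end combinations listed in both Table 8 and Table 11, leaving out IG7A01-IG5B01 because IG7A01 and IG5A01 were resynthesized between the runs; it is an illustrative check, not the comparison behind Table 13 and Figs. 7 and 8.

```python
from math import sqrt

# Contamination ratios (%) as (first QC, second QC) from Tables 8 and 11.
RATIOS = {
    "IG7A08-IG5H07": (0.17, 0.12),
    "IG7B02-IG5A03": (0.11, 0.13),
    "IG7B10-IG5D08": (0.15, 0.17),
    "IG7B11-IG5C11": (0.14, 0.20),
    "IG7C06-IG5C07": (1.05, 1.23),
    "IG7C08-IG5B08": (0.18, 0.15),
    "IG7D03-IG5E03": (0.11, 0.11),
    "IG7D03-IG5C03": (0.13, 0.10),
    "IG7D07-IG5E08": (0.72, 0.65),
    "IG7D08-IG5C08": (0.16, 0.14),
    "IG7D09-IG5E09": (0.13, 0.19),
    "IG7E07-IG5D07": (0.15, 0.15),
    "IG7G08-IG5F08": (0.38, 0.36),
    "IG7G10-IG5F06": (0.10, 0.15),
    "IG7H02-IG5H03": (0.15, 0.12),
    "IG7H07-IG5G07": (0.12, 0.15),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


first_qc, second_qc = zip(*RATIOS.values())
r = pearson(first_qc, second_qc)
print(round(r, 3))
```

With these 16 pairs r comes out at roughly 0.98, driven mainly by the two combinations that stay high in both runs (IG7C06-IG5C07 and IG7D07-IG5E08); with so few points, r is a coarse indicator rather than a formal test.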

Table 13

Example 3

In this example, 96 pairs of library tags were subjected to quality inspection, and statistical analysis of the first quality inspection results gave the information for one pair of tag primers shown in Fig. 9. In addition, statistical analysis of the 96 pairs of tag primers gave heat maps of the contamination ratios of IG7 and IG5 (Figs. 10 and 11) and a distribution plot of the contamination ratios of IG7 and IG5 (Fig. 12). Conclusion: after the first quality control analysis, all 96 pairs of tag primers met the criteria.

Finally, it should be noted that the above embodiments are intended to illustrate the technical solutions of the present invention and not to limit the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A quality control method for detecting unique paired-end library-tag combinations, comprising the steps of:

in the parameters of the quality control analysis, each unique paired-end library tag combination consists of an upstream library tag and a downstream library tag; the upstream library tags are collectively referred to as IG5, and IG5 comprises A and B; the downstream library tags are collectively referred to as IG7, and IG7 comprises a and b; the matched, correct unique paired-end library tag combinations are A-a and B-b; the unmatched unique paired-end library tag combinations are A-b and B-a; the number of sequences for each of these combinations can be obtained by analysis after each sequencing reaction;

when a of IG7 has not undergone any cross-contamination during production, for A of IG5, the contamination ratio of B = number of sequences containing B-a / number of all sequences containing a;

when A of IG5 has not undergone any cross-contamination during production, for a of IG7, the contamination ratio of b = number of sequences containing A-b / number of all sequences containing A;

when B contaminates A and b contaminates a, the B-b tag combination contamination ratio = (number of sequences containing B-a / number of all sequences containing a) × (number of sequences containing A-b / number of all sequences containing A);

when b of IG7 has not undergone any cross-contamination during production, for B of IG5, the contamination ratio of A = number of sequences containing A-b / number of all sequences containing b;

when B of IG5 has not undergone any cross-contamination during production, for b of IG7, the contamination ratio of a = number of sequences containing B-a / number of all sequences containing B;

when A contaminates B and a contaminates b, the A-a tag combination contamination ratio = (number of sequences containing A-b / number of all sequences containing b) × (number of sequences containing B-a / number of all sequences containing B).

2. The quality control method according to claim 1, wherein step S1) comprises, in sequence: preparing a gDNA standard, fragmenting the gDNA, end repair, adapter ligation, purifying the adapter-ligated product, amplifying the library, purifying the amplified library, quality testing of the purified library, fragment size detection of the purified library, and sequencing the library.

3. The quality control method according to claim 1, wherein the unique paired-end library tag combinations consist of IG5 and IG7, the Hamming distance between library tags within each of IG5 and IG7 is 3 or more, and the Hamming distance between library tag sequences of the IG5 group and the IG7 group is 2 or more.

4. The quality control method according to claim 3, wherein the library tags are purified by high-performance liquid chromatography, their molecular weight is confirmed by mass spectrometry, and their purity is required to be 85% or more.

5. The quality control method according to claim 1, wherein the unique paired-end library tag combinations consist of 96 pairs of library tags, i.e., 96 upstream library tags in group IG5 and 96 downstream library tags in group IG7 in one-to-one correspondence, and the required percentage of sample sequences for each tag pair is correspondingly set to 0.2% or more.

6. The quality control method according to claim 1, wherein the unique paired-end library tag combinations consist of 48 pairs of library tags, i.e., 48 upstream library tags in group IG5 and 48 downstream library tags in group IG7 in one-to-one correspondence, and the required percentage of sample sequences for each tag pair is correspondingly set to 0.4% or more.

7. The quality control method according to claim 1, wherein, when the unique paired-end library tag combinations consist of 192 pairs of library tags, i.e., 192 upstream library tags in group IG5 and 192 downstream library tags in group IG7 in one-to-one correspondence, the required percentage of sample sequences for each tag pair is correspondingly set to 0.1% or more.

8. The quality control method according to claim 1, wherein, when the unique paired-end library tag combinations consist of 288 pairs of library tags, i.e., 288 upstream library tags in group IG5 and 288 downstream library tags in group IG7 in one-to-one correspondence, the required percentage of sample sequences for each tag pair is set to 0.07% or more.

9. The quality control method according to claim 1, wherein, when the unique paired-end library tag combinations consist of 384 pairs of library tags, i.e., 384 upstream library tags in group IG5 and 384 downstream library tags in group IG7 in one-to-one correspondence, the required percentage of sample sequences for each tag pair is set to 0.05% or more.

10. The quality control method according to any one of claims 1 to 9, which is used for sequencing a sample.
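Two numeric requirements in the claims can be checked mechanically: the Hamming-distance rules of claim 3 and the per-pair read-share thresholds of claims 5 to 9. The sketch below uses hypothetical 8-nt tag sequences (8 nt matches the 8 index cycles of the MiSeq run settings, but the real IG5/IG7 sequences are not given here), and all helper names are illustrative.

```python
from itertools import combinations, product
from typing import Sequence, Tuple


def hamming(x: str, y: str) -> int:
    """Number of mismatched positions between two equal-length tag sequences."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length tags")
    return sum(a != b for a, b in zip(x, y))


def check_tag_design(ig5: Sequence[str], ig7: Sequence[str],
                     within_min: int = 3, between_min: int = 2) -> Tuple[bool, bool]:
    """(within-group check, between-group check) for the distances in claim 3."""
    within_ok = all(hamming(x, y) >= within_min
                    for group in (ig5, ig7)
                    for x, y in combinations(group, 2))
    between_ok = all(hamming(x, y) >= between_min for x, y in product(ig5, ig7))
    return within_ok, between_ok


# Claims 5 to 9: minimum share of reads per tag pair, in percent.
MIN_PAIR_SHARE_PERCENT = {48: 0.4, 96: 0.2, 192: 0.1, 288: 0.07, 384: 0.05}


def share_ok(pair_reads: int, total_reads: int, n_pairs: int) -> bool:
    if n_pairs not in MIN_PAIR_SHARE_PERCENT:
        raise ValueError(f"no threshold is listed for {n_pairs} pairs")
    return 100.0 * pair_reads / total_reads >= MIN_PAIR_SHARE_PERCENT[n_pairs]


def fraction_of_even_share(n_pairs: int) -> float:
    """Threshold divided by the share each pair would get in a perfectly even pool."""
    return MIN_PAIR_SHARE_PERCENT[n_pairs] / (100.0 / n_pairs)


print(check_tag_design(["AAAAAAAA", "CCCAAAAA"], ["GGGGGGGG", "TTTGGGGG"]))
print({n: round(fraction_of_even_share(n), 4) for n in MIN_PAIR_SHARE_PERCENT})
```

`fraction_of_even_share` gives about 0.192 for 48, 96, 192 and 384 pairs and about 0.2016 for 288 pairs, so each listed threshold is roughly one fifth of the share an evenly balanced pair would receive.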