CN109517882B - Quality control method for detecting unique double-end library label combination and application - Google Patents

Publication number
CN109517882B
Authority
CN
China
Prior art keywords
library
quality control
sequences
group
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811337895.2A
Other languages
Chinese (zh)
Other versions
CN109517882A (en)
Inventor
张之宏
罗健
汉雨生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Burning Rock Dx Laboratory Co ltd
Original Assignee
Guangzhou Burning Rock Dx Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Burning Rock Dx Laboratory Co ltd filed Critical Guangzhou Burning Rock Dx Laboratory Co ltd
Priority to CN202111090137.7A priority Critical patent/CN113957123A/en
Priority to CN201811337895.2A priority patent/CN109517882B/en
Publication of CN109517882A publication Critical patent/CN109517882A/en
Application granted granted Critical
Publication of CN109517882B publication Critical patent/CN109517882B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biochemistry (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a quality control method for detecting unique double-end library tag combinations and an application thereof, belonging to the technical field of biological detection. The quality control method comprises the following steps: S1) constructing a gDNA library with unique double-end library tag combinations, sequencing the constructed library on a sequencer, and reading the library tag sequences; S2) performing a first quality control analysis on the library tag sequences; S3) where the analysis result of S2) requires it, replacing the problematic tag raw materials, reconstructing the gDNA library with unique double-end library tag combinations according to the method of step S1), sequencing the reconstructed library, and reading the library tag sequences; S4) performing a second quality control analysis on the library tag sequences and, according to the result, deciding whether to continue replacing problematic library tags according to the method of S3), until all library tags meet the indexes of the quality control analysis. The quality control method improves the efficiency of library tag detection and better meets the requirement for accurate library sequencing.

Description

Quality control method for detecting unique double-end library label combination and application
Technical Field
The invention belongs to the technical field of biological detection, and particularly relates to a quality control method for detecting a unique double-end library label combination and application thereof.
Background
With the rapid development of high-throughput technology and the increasing throughput of sequencers, the earlier approach of separating sequencing libraries physically, for example by the lanes (Lane) of a flow cell (Flow Cell), is no longer adequate. Multiplex library sequencing (Multiplex Sequencing) is now widely used across next generation sequencing (NGS). The key to multiplex library sequencing is the library tag (Index): a specific sequence, generally 4-12 bases long, that is added to each sample during NGS library preparation so that DNA from different sources can be distinguished. In high-throughput sequencing, libraries carrying different known tag sequences are pooled and sequenced together, and the insert and the tag of each library molecule are read in turn and converted into bases. In the subsequent analysis, software classifies the sequencing results by the expected tag sequences and splits them into the different samples.
In multiplex sequencing, if library sequences are misassigned, sequences are classified into a library to which they do not belong. For some applications such misassignment leads to erroneous analysis results. For example, when a library from a cancer patient's tissue sample is sequenced together with a library from a benign tumor patient's tissue sample, and some sequences of the cancer sample are misassigned to the benign tumor sample, the report for the benign tumor patient will indicate a malignant tumor, resulting in a misdiagnosis.
Library sequences can be misassigned for many reasons. Common ones include: 1) cross-contamination during library preparation; 2) cross-contamination during production of the tag primers; 3) cross-reaction when multiple libraries undergo cluster generation in a flow cell; and 4) optical errors caused by excessive cluster density.
Tag primers for second-generation sequencing libraries are usually 50-70 bases long and generally need to be purified to ensure the purity of the full-length primer. However, purification itself often introduces further cross-contamination, because it requires gel-excision recovery or column chromatography. With HPLC (high performance liquid chromatography), for example, tag primers adsorb to the purification column, and reusing the column inevitably causes cross-contamination. Running a blank or an unrelated sample through the column between the purifications of two different tag primers reduces the residue but still cannot eliminate cross-contamination completely. In the applicant's experience, between two purification runs 0.5% to 5% of the preceding tag primer can be carried over into the following tag primer.
Because the high throughput of NGS brings high sensitivity, quality testing of tag primers requires a very sensitive method that can detect possible contamination at the level of one part per thousand or even one part per ten thousand. Conventional methods such as qPCR are unsuitable in both sensitivity and specificity, because the sequences of the tag primers are very similar to one another. Existing quality inspection therefore still relies on the NGS platform, but it can test at most one target tag primer per lane, which makes quality inspection expensive and cumbersome.
Therefore, a new quality control method for unique double-end library tag combinations is needed to improve detection efficiency.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a quality control method for detecting unique double-end Index combinations and an application thereof, which improves the efficiency of library tag detection and meets the requirement for accurate library sequencing.
To achieve this purpose, the invention adopts the following technical solution: a quality control method for detecting unique double-end library tag combinations, comprising the following steps:
S1) constructing a gDNA library with unique double-end library tag combinations, using library tag standards and a gDNA standard as raw materials, sequencing the constructed library on a sequencer, and reading the library tag sequences;
S2) performing a first quality control analysis on the library tag sequences, wherein the indexes of the quality control analysis comprise the following: the maximum single-sided tag contamination ratio is ≤ 2.5%; the maximum tag combination contamination ratio is ≤ 0.01%; the number of sequences of each group of tag samples is ≥ 5000; the coefficient of variation of the mixing proportions of all tag combinations is ≤ 0.5; the comprehensive sequence pass rate is ≥ 97%; the proportion of sequences of each group of tag samples is ≥ 0.2 divided by the number of pairs of library tag combinations; and the proportion of tags with more than 1% single-sided contamination is ≤ 10%;
S3) if the quality control analysis of step S2) shows that the indexes are not met, re-synthesizing the library tags that fail quality control; constructing a gDNA library with unique double-end library tag combinations according to the method of step S1), using the newly synthesized library tags, the library tags that met the first quality control analysis, and gDNA as raw materials; sequencing the constructed library again and reading the library tag sequences;
S4) performing a second quality control analysis on the library tag sequences, and repeating until all library tags meet the indexes of the quality control analysis;
in the parameters of the quality control analysis, the unique double-end library tag combination consists of an upstream library tag and a downstream library tag; the upstream library tags are collectively referred to as IG5, and IG5 comprises A and B; the downstream library tags are collectively referred to as IG7, and IG7 comprises a and b; the matched, correct unique double-end library tag combinations are A-a and B-b; the mismatched unique double-end library tag combinations are A-b and B-a; the number of sequences of each of the above combinations can be obtained by analysis after each sequencing reaction;
the single-sided tag contamination ratio is the proportion of cross-contamination occurring between tags within a group, contamination being possible only within a group, i.e. within the IG5 group and/or the IG7 group;
when a of IG7 has not undergone any cross-contamination during production, then for A of IG5, the ratio of contamination by B = number of sequences containing B-a / number of all sequences containing a;
when A of IG5 has not undergone any cross-contamination during production, then for a of IG7, the ratio of contamination by b = number of sequences containing A-b / number of all sequences containing A;
when B contaminates A and b contaminates a, then the B-b tag combination contamination ratio = (number of sequences containing B-a / number of all sequences containing a) × (number of sequences containing A-b / number of all sequences containing A);
when b of IG7 has not undergone any cross-contamination during production, then for B of IG5, the ratio of contamination by A = number of sequences containing A-b / number of all sequences containing b;
when B of IG5 has not undergone any cross-contamination during production, then for b of IG7, the ratio of contamination by a = number of sequences containing B-a / number of all sequences containing B;
when A contaminates B and a contaminates b, then the A-a tag combination contamination ratio = (number of sequences containing A-b / number of all sequences containing b) × (number of sequences containing B-a / number of all sequences containing B);
the number of sequences of each group of tag samples is the number of correctly paired sequences of that group after system filtering, i.e. the number of sequences containing A-a or the number of sequences containing B-b;
the coefficient of variation of the mixing proportions of all tag combinations is the coefficient of variation of the ratio of each group's number of correctly paired sequences after system filtering to the total number of correctly paired sequences after system filtering;
the comprehensive sequence pass rate is the ratio of the total number of correctly paired, valid sequences after system filtering of the sequencing reaction to the total number of all sequences after system filtering of the sequencing reaction;
the proportion of sequences of each group of tag samples is the ratio of the number of correctly paired sequences of that group after system filtering to the total number of sequences after system filtering;
the proportion of tags with more than 1% single-sided contamination is: among the upstream library tags, the ratio of the number of library tags whose contamination ratio exceeds 1% to the total number of library tags; and, among the downstream library tags, the ratio of the number of library tags whose contamination ratio exceeds 1% to the total number of library tags.
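The read-count indexes defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's analysis script: the function names `qc_summary` and `passes_read_count_indexes`, the layout of the `counts` dictionary, and the use of the population standard deviation for the coefficient of variation are assumptions. The contamination indexes are not covered here.

```python
from statistics import mean, pstdev

def qc_summary(counts, expected_pairs):
    """Summarize read-count QC indexes.

    counts: {(upstream_tag, downstream_tag): reads after system filtering}
    expected_pairs: list of correctly matched (upstream, downstream) pairs
    """
    total = sum(counts.values())
    correct = [counts.get(pair, 0) for pair in expected_pairs]
    total_correct = sum(correct)
    # Share of each correct pair among all correctly paired reads.
    mix = [n / total_correct for n in correct]
    return {
        "min_reads_per_pair": min(correct),
        # Assumption: population standard deviation divided by the mean.
        "cv_of_mix": pstdev(mix) / mean(mix),
        "pass_rate": total_correct / total,
        "min_share_of_total": min(correct) / total,
    }

def passes_read_count_indexes(summary, n_pairs):
    """Apply the thresholds stated in step S2) to the summary."""
    return (
        summary["min_reads_per_pair"] >= 5000
        and summary["cv_of_mix"] <= 0.5
        and summary["pass_rate"] >= 0.97
        and summary["min_share_of_total"] >= 0.2 / n_pairs
    )

# Hypothetical counts for the two-pair example (A-a, B-b correct).
counts = {("A", "a"): 9900, ("A", "b"): 100, ("B", "a"): 50, ("B", "b"): 9950}
summary = qc_summary(counts, [("A", "a"), ("B", "b")])
print(summary, passes_read_count_indexes(summary, n_pairs=2))
```

The counts in the usage example are made up for illustration; real input would come from the per-combination sequence numbers obtained after each sequencing reaction.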
As an improvement of the above technical solution, step S1) comprises the following steps in sequence: preparing the gDNA standard, fragmenting the gDNA, end repair, adapter ligation, purifying the adapter ligation product, amplifying the library, purifying the amplified library, checking the quality of the purified library, checking the fragment size of the purified library, and sequencing the library on the sequencer.
As an improvement of the technical solution, the unique double-end library tag combination consists of an IG5 group and an IG7 group, the Hamming distance between library tags within the IG5 group and within the IG7 group is ≥ 3, and the Hamming distance between library tag sequences of the IG5 group and the IG7 group is ≥ 2.
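A Hamming-distance requirement of this kind can be checked programmatically. The sketch below is a minimal illustration under two assumptions: all tags have the same length, and "within the groups" means every pair of tags inside IG5 and every pair inside IG7. The function names and the example sequences are hypothetical and are not taken from the patent.

```python
from itertools import combinations, product

def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))

def distances_ok(ig5, ig7, within_min=3, between_min=2):
    """Check within-group and between-group minimum Hamming distances."""
    within = all(
        hamming(x, y) >= within_min
        for group in (ig5, ig7)
        for x, y in combinations(group, 2)
    )
    between = all(hamming(x, y) >= between_min for x, y in product(ig5, ig7))
    return within and between

# Hypothetical 8-base tags, for illustration only.
ig5 = ["ACGTACGT", "TGCATGCA"]
ig7 = ["GATTACAG", "CTAATGTC"]
print(distances_ok(ig5, ig7))
```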
As a further improvement of the technical solution, the library tags are purified by high performance liquid chromatography, their molecular weight is confirmed by mass spectrometry, and their purity is required to be ≥ 85%.
As an improvement of the technical solution, when the unique double-end library tag combination consists of 96 pairs of library tags, i.e. the IG5 group contains 96 upstream library tags and the IG7 group contains 96 downstream library tags, in one-to-one correspondence, the required proportion of sequences of each group of tag samples is correspondingly adjusted to ≥ 0.2%.
As an improvement of the technical solution, when the unique double-end library tag combination consists of 48 pairs of library tags, i.e. the IG5 group contains 48 upstream library tags and the IG7 group contains 48 downstream library tags, in one-to-one correspondence, the required proportion of sequences of each group of tag samples is correspondingly adjusted to ≥ 0.4%.
As an improvement of the technical solution, when the unique double-end library tag combination consists of 192 pairs of library tags, i.e. the IG5 group contains 192 upstream library tags and the IG7 group contains 192 downstream library tags, in one-to-one correspondence, the required proportion of sequences of each group of tag samples is correspondingly adjusted to ≥ 0.1%.
As an improvement of the technical solution, when the unique double-end library tag combination consists of 288 pairs of library tags, i.e. the IG5 group contains 288 upstream library tags and the IG7 group contains 288 downstream library tags, the required proportion of sequences of each group of tag samples is correspondingly adjusted to ≥ 0.07%.
As an improvement of the technical solution, when the unique double-end library tag combination consists of 384 pairs of library tags, i.e. the IG5 group contains 384 upstream library tags and the IG7 group contains 384 downstream library tags, the required proportion of sequences of each group of tag samples is correspondingly adjusted to ≥ 0.05%.
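The thresholds listed for 48, 96, 192, 288 and 384 pairs are consistent with dividing 0.2 by the number of tag pairs and rounding to the first non-zero digit. The following sketch reproduces them; the function name `min_share_percent` is hypothetical, and rounding to one significant digit is an interpretation of the description rather than a stated formula.

```python
from math import floor, log10

def min_share_percent(n_pairs):
    """Minimum share of reads per tag pair, in percent.

    Computes 0.2 / n_pairs, expresses it as a percentage, and rounds
    to one significant digit (assumed reading of the rounding rule).
    """
    share = 0.2 / n_pairs * 100
    digits = -floor(log10(share))
    return round(share, digits)

for n in (48, 96, 192, 288, 384):
    print(n, min_share_percent(n))
# 48 -> 0.4, 96 -> 0.2, 192 -> 0.1, 288 -> 0.07, 384 -> 0.05
```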
In addition, the invention also provides application of the quality control method in sample sequence determination.
The invention has the beneficial effects that: the invention provides a quality control method for detecting a unique double-end library label combination and application thereof, wherein the quality control method can efficiently detect the cross contamination of library labels, has relatively low cost and is more suitable for high-throughput determination of sample sequences.
Drawings
FIG. 1 shows the results of quality control of the first simulation in example 1;
FIG. 2 shows the results of quality control of the second simulation in example 1;
FIG. 3 shows the result of the first quality control analysis at the IG5 end of example 2;
FIG. 4 is a heat map of the contamination ratios from the first quality control analysis at the IG7 end in example 2; FIG. 4 covers 96 pairs of tag primers, with the abscissa running from left to right through IG5A01-IG5A12, IG5B01-IG5B12, IG5C01-IG5C12 to IG5H01-IG5H12, and the ordinate running from top to bottom through IG7A01-IG7A12, IG7B01-IG7B12, IG7C01-IG7C12 to IG7H01-IG7H12; dots circled with ovals in the figure mark tag primers that do not meet the requirements; the same applies to the following figures;
FIG. 5 is a heat map of the contamination ratios from the first quality control analysis at the IG5 end in example 2;
FIG. 6 is a distribution diagram of contamination ratios of first quality control analysis at IG7 and IG5 terminals of example 2;
FIG. 7 shows the stability comparison results of two quality control analyses at the IG7 end in example 2;
FIG. 8 shows the stability comparison results of two quality control analyses at the IG5 end in example 2;
FIG. 9 shows the results of the first quality control analysis at the IG5 end of example 3;
FIG. 10 is a heat map of the contamination ratios from the first quality control analysis at the IG7 end in example 3;
FIG. 11 is a heat map of the contamination ratios from the first quality control analysis at the IG5 end in example 3;
FIG. 12 is a distribution diagram of contamination ratios of the first quality control analysis at IG7 and IG5 terminals of example 3.
Detailed Description
To better illustrate the objects, aspects and advantages of the present invention, the present invention will be further described with reference to the following detailed description and accompanying drawings.
In this specification, Index, library tag and tag primer have the same meaning. When calculating the required proportion of sequences of each group of tag samples, the result of 0.2 divided by the number of pairs of library tag combinations is rounded to its first non-zero digit.
Principle by which unique double-end library tags prevent sample contamination caused by cross-contamination
In the NGS field, to distinguish different samples in the same sequencing reaction, a specific "tag" (Index) is added to each sample during library construction so that the data of different samples can be separated in subsequent analysis. As sequencer throughput increases, more samples are pooled into the same flow channel (Lane) for sequencing, which places higher demands on the number of Indexes and on how well they can be told apart. In addition, Illumina HiSeq X/4000 and NovaSeq use a cluster generation method different from that of other Illumina sequencers, and the literature reports that this method carries a higher risk of Index cross-contamination. A traditional single-end Index primer splits data by one end only, so data are easily mis-split when contamination occurs. Unique double-end Index primers minimize the risk of sample contamination caused by Index cross-contamination and ensure the stability and reliability of the product. Because data are split only on a uniquely paired Index at both ends, each sequencing read carries a "double insurance", and most contaminated sequences can be discarded. Table 1 compares the tolerance of the single-end, combinatorial double-end, and unique double-end Index strategies to Index cross-contamination.
TABLE 1
Figure BDA0001860358380000061
Figure BDA0001860358380000071
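The "double insurance" of unique double-end Indexes can be illustrated with a small demultiplexing sketch: a read is assigned to a sample only when both of its Indexes form an expected pair, and every other read is discarded. The function name `demultiplex`, the tuple layout of the reads, and the sample names are hypothetical; real demultiplexing software also handles sequencing errors in the Index reads, which this sketch ignores.

```python
def demultiplex(reads, expected_pairs):
    """Split reads by unique double-end Index pairs.

    reads: iterable of (upstream_index, downstream_index, sequence)
    expected_pairs: {(upstream_index, downstream_index): sample_name}
    Returns (assigned, discarded).
    """
    assigned = {sample: [] for sample in expected_pairs.values()}
    discarded = []
    for i5, i7, seq in reads:
        sample = expected_pairs.get((i5, i7))
        if sample is None:
            # Unexpected combination: likely Index cross-contamination.
            discarded.append((i5, i7, seq))
        else:
            assigned[sample].append(seq)
    return assigned, discarded

expected = {("A", "a"): "sample1", ("B", "b"): "sample2"}
reads = [("A", "a", "ACGT"), ("A", "b", "TTTT"), ("B", "b", "GGCC")]
assigned, discarded = demultiplex(reads, expected)
print(assigned)
print(discarded)
```

With a single-end strategy the read carrying A and b would be assigned to one of the two samples; here it is set aside instead.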
Principle of high-throughput contamination quality inspection of unique double-end Index primers by NGS
Because unique double-end Indexes are used, each sample is labeled with 2 Indexes, which greatly increases the tolerance to cross-contamination between single-end tag primers. For example, if the single-sided contamination ratio of each of the 2 Indexes is 1%, the actual risk of sample misassignment is 1% × 1% = 0.01%. This tolerance also greatly relieves the pressure on synthesis and purification of the Index primers, allowing production costs to be controlled further.
Exploiting these advantages of unique double-end Indexes, the invention provides a simple and feasible quality control method that uses NGS to detect cross-contamination of tag primers. The principle is to observe the proportion of unexpected double-end Index combinations in the overall sequencing result, and from it to estimate the maximum cross-contamination that may have occurred and the Indexes involved, thereby avoiding misassignment between samples caused by cross-contamination between Index primers.
For example, four libraries are labeled A + a, B + b, C + c and D + d, respectively, so only these 4 combinations are regarded as legal combinations during sequence analysis. Take the combination A + b as an example. Since in theory A pairs only with a, observing the A + b combination admits two possibilities: 1) tag primer b has entered primer a, in which case, with S defined as the number of sequences containing the Index in question, the estimated contamination ratio is S(A+b)/S(A); 2) primer A has entered primer B, in which case the estimated contamination ratio is S(A+b)/S(b). Note that this calculation presupposes that Indexes of one category, such as A/B/C/D, contain no Indexes of the other category, such as a/b/c/d. In addition, the estimation model considers only simple one-to-one contamination and not complex situations such as multiple contamination. Furthermore, the calculation only estimates the maximum possible contamination and cannot determine the direction of contamination: after any single one-way contamination event, for example "A entering B", the observation is compatible with both "A entering B" and "b entering a". From this model, the maximum combined contamination risk of the unique double-end Index library A + a from the other primers in the multiplex sequencing run can be estimated as:
Figure BDA0001860358380000081
however, since only four combinations are expected, namely A + a, B + b, C + c and D + d, the effective maximum contamination risk can be calculated as:
Figure BDA0001860358380000082
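The two candidate contamination ratios described in the text for an unexpected combination such as A + b can be computed directly from a table of read counts. The sketch below does not reproduce the two formulas shown as images above; it only illustrates the ratios S(A+b)/S(A) and S(A+b)/S(b) defined in the text. The function name `candidate_ratios` and the example counts are hypothetical.

```python
def candidate_ratios(counts, upstream, downstream):
    """Candidate contamination ratios for one unexpected combination.

    counts: {(upstream_index, downstream_index): number of sequences}
    Returns (S(pair) / S(upstream), S(pair) / S(downstream)), where
    S(x) is the number of all sequences containing Index x.
    """
    s_up = sum(n for (u, _), n in counts.items() if u == upstream)
    s_down = sum(n for (_, d), n in counts.items() if d == downstream)
    s_pair = counts.get((upstream, downstream), 0)
    return s_pair / s_up, s_pair / s_down

# Hypothetical counts for two libraries, A + a and B + b.
counts = {("A", "a"): 9900, ("A", "b"): 100, ("B", "a"): 50, ("B", "b"): 9950}
b_into_a, a_into_b = candidate_ratios(counts, "A", "b")
print(f"b entering a: {b_into_a:.4%}, A entering B: {a_into_b:.4%}")
```

Because the model cannot tell the direction of contamination, both values are reported and either may be the true one.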
In the practical examples, each of the 48 or 96 Index primers was used in a separate PCR to add the Index to the library, and the products were then pooled for routine MiSeq sequencing. After sequencing, the analysis script was invoked directly to analyze the 96 × 96 = 9216 sequence combinations, find the proportion of abnormal combinations, and calculate the respective contamination ratios.
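As a rough illustration of the size of this analysis, the sketch below enumerates every combination of 96 upstream and 96 downstream Indexes and separates the expected pairs from the abnormal ones. The names follow the IG5A01 to IG5H12 and IG7A01 to IG7H12 pattern used in the figure descriptions; pairing Indexes by identical plate position is an assumption made for this example only.

```python
from itertools import product

def plate_names(prefix, rows="ABCDEFGH", cols=12):
    """Index names for a 96-well plate, e.g. IG5A01 ... IG5H12."""
    return [f"{prefix}{row}{col:02d}" for row in rows for col in range(1, cols + 1)]

ig5 = plate_names("IG5")
ig7 = plate_names("IG7")

all_pairs = list(product(ig5, ig7))
# Assumption: the expected pair shares the same plate position.
expected = set(zip(ig5, ig7))
unexpected = [pair for pair in all_pairs if pair not in expected]

print(len(all_pairs), len(expected), len(unexpected))
```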
Index on-machine sequencing
1. Preparation of gDNA Standard
1) Quality inspection of a 48plex Index plate requires 500 ng of gDNA standard, and quality inspection of a 96plex Index plate requires 1000 ng of gDNA standard;
2) add 50 μl of 1× IDTE Buffer to a new 1.5 ml Eppendorf LoBind tube, then add the corresponding volume of gDNA standard: 2 μl for a 48plex Index plate, or 4 μl for a 96plex Index plate; vortex for 10-15 s, then centrifuge briefly to return the solution to the bottom of the tube;
3) transfer the diluted standard to a Covaris microTUBE, topping up with 1× IDTE Buffer to 50 μl, before the subsequent DNA fragmentation.
2. Fragmentation of gDNA
Shear the DNA into fragments of 170-200 bp using a Covaris M220 instrument; after shearing, remove the Covaris microTUBE and centrifuge to return the liquid to the bottom of the tube.
3. End repair and 3' A-tailing
1) Preparation of reagents: open the KAPA Hyper Prep 96 reaction Kit, take out the 2 tubes listed below, and thaw them on ice;
2) in a new 1.5 ml Eppendorf LoBind tube, prepare the end repair and A-tailing reaction mix on ice; flick the tube 3-5 times with a finger, invert 2-3 times to mix, and centrifuge for 1-3 s; the composition of the reaction system is shown in Table 2;
3) dispense 60 μl of the mixed solution into each of 4 (48plex Index plate) or 8 (96plex Index plate) 0.2 ml flat-cap PCR tubes, and centrifuge for 1-3 s;
4) place the tubes in a PCR instrument and run the following program: heated lid at 85 °C; 20 °C for 30 min; 65 °C for 30 min; hold at 4 °C; proceed to the next step within 2 hours.
TABLE 2
Figure BDA0001860358380000091
4. Ligating preformed adapters (containing T overhangs) to both ends of the A-tailed double-stranded DNA fragments
1) In a new 1.5 ml Eppendorf LoBind tube, prepare the adapter ligation reaction mix on ice; flick the tube 3-5 times with a finger, invert 2-3 times to mix, and centrifuge for 1-3 s; the composition of the reaction system is shown in Table 3;
2) pipette 50 μl of the mixed solution into each of the above 0.2 ml tubes (4 tubes in total for a 48plex Index plate, 8 tubes in total for a 96plex Index plate), pipette up and down 5 times to mix, and centrifuge for 1-3 s;
3) run the following program on the PCR instrument: 20 °C for 15 min; 70 °C for 10 min; hold at 4 °C (heated lid at 85 °C).
TABLE 3
Figure BDA0001860358380000101
5. Purification of the ligation product to remove adapter dimers and unligated adapters
1) Invert the SPB magnetic beads, equilibrated to room temperature, 2-3 times and vortex for 5-10 s to homogenize them; take a 1.5 ml centrifuge tube and add the homogenized magnetic beads and the adapter ligation product in sequence, at a volume ratio of ligation reaction to magnetic beads of 1:0.8; specifically: for 48plex Index, 352 μl of magnetic beads and 440 μl of adapter ligation product, with the 4 tubes combined into 1 tube for purification, giving 1 tube in total; for 96plex Index, 2 × 352 μl of magnetic beads and 2 × 440 μl of adapter ligation product, with every 4 tubes combined into 1 tube for purification, giving 2 tubes in total; after adding, mix, incubate with rotation for 5 min, and centrifuge briefly;
2) place the centrifuge tube on a magnetic rack and wait for the solution to clear; keeping the tube on the rack, open the cap and carefully remove the clear supernatant without touching the magnetic beads;
3) with the tube still on the magnetic rack, add 500 μl of freshly prepared 75% ethanol to each tube and let stand for 1 min so that the magnetic beads settle fully, slowly rotating the tube one full turn horizontally during this time; then remove the ethanol; repeat this step once;
4) centrifuge for 1-3 s, return the tube to the magnetic rack, let stand for 30 s, remove the residual ethanol with a pipette, and leave the cap open; dry the magnetic beads at room temperature for 3 min, add 500 μl of EB solution to each tube, pipette thoroughly to mix, and incubate at room temperature for 2 min; place the tube on the magnetic rack for 2 min until the solution clears, transfer 490 μl of the supernatant with a pipette to a new 1.5 ml Eppendorf LoBind centrifuge tube (for a 96plex Index plate, pool the two tubes into 1 tube after elution), and keep on ice until needed.
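The bead volumes in this purification follow from the stated 1:0.8 ratio of reaction volume to magnetic beads. A minimal sketch of that arithmetic is shown below; the function name `bead_volume_ul` is hypothetical and the calculation is only a convenience check, not part of the protocol.

```python
def bead_volume_ul(sample_volume_ul, ratio=0.8):
    """Volume of magnetic beads for a given sample volume (sample : beads = 1 : ratio)."""
    return sample_volume_ul * ratio

# 440 μl of pooled adapter ligation product, as stated in step 1).
print(bead_volume_ul(440))
# A 25 μl amplification reaction at the same ratio.
print(bead_volume_ul(25))
```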
6. Library amplification: amplification of the adaptor-ligated library
1) Prepare the reaction system mixture of the corresponding volume in a 5 ml Eppendorf LoBind tube (or a 15 ml centrifuge tube) on ice, flick the tube with a finger 3-5 times, invert it 2-3 times to mix, and let it stand upright for 0.5-1 min; the composition of the reaction system is shown in Table 4;
2) divide the prepared reaction mixture equally into 8 tubes, 138 µl per tube (for the 96 Index pair Plate (refer part2#), two rounds of distribution are required for each test: 142 µl + 132 µl);
3) dispense the reaction mixture into a new 48-well plate (48plex Index) or 96-well PCR plate (96plex Index) at 22.5 µl per well;
4) take 2.5 µl of Index from the IDP plate and add it to the 48-well plate or 96-well PCR plate containing the dispensed reaction mixture; pipette up and down 2-3 times to mix, and seal with film; centrifuge at 1000 rpm for 1 min in a plate centrifuge (reaction volume 25 µl); place the plate in a PCR machine and run the program shown in Table 5.
Table 4 Composition of the reaction system (table image not reproduced)
Table 5 PCR program (table image not reproduced)
7. Purification of the amplified library: removal of primer dimers and reaction components
1) Invert the SPB magnetic beads 2-3 times and vortex at maximum speed for 5-10 s until homogeneous;
2) pipette the required volume of SPB magnetic beads into a reagent reservoir; 20 µl of SPB magnetic beads is added to each sample (sample : beads = 1 : 0.8): add 1440 µl of magnetic beads to the reservoir for 48 samples, or 2880 µl for 96 samples;
3) remove the 48-well plate from the PCR instrument, centrifuge briefly at 1000 rpm for 3 s, and carefully peel off the sealing film; aspirate 20 µl of SPB magnetic beads from the reservoir, add them to the 48-well plate/96-well PCR plate, and pipette up and down 10 times;
4) seal the 48-well plate/96-well PCR plate with film, centrifuge briefly at 1000 rpm for 3 s, and leave at room temperature for 5 min; place the 48-well plate/96-well PCR plate on a 96-well magnetic rack until the solution clears; remove the film, then aspirate and discard 45 µl of supernatant;
5) with the 48-well plate/96-well PCR plate still on the magnetic rack, add 200 µl of freshly prepared 75% ethanol to each sample well; leave the plate on the magnetic rack so that the magnetic beads are fully washed, and discard the ethanol after 1 min; repeat this step once;
6) leave the 48-well plate/96-well PCR plate on the magnetic rack for 30 s and remove the residual ethanol; take the plate off the magnetic rack, place it on a PCR plate rack at room temperature for 2 min, and dry the magnetic beads; add 14 µl of EB to each well of the 48-well plate/96-well PCR plate, cap with 8-strip caps, vortex for about 5 s, and centrifuge briefly at 1000 rpm for 3 s;
7) incubate the 48-well plate at room temperature for 2 min, remove the film, and place the plate on the magnetic rack for 2 min until the solution clears; transfer 8 µl of the supernatant, free of magnetic beads, to a new 48-well plate/96-well PCR plate;
8) transfer the libraries of each column into the same new 0.2 ml 8-strip tube, then transfer the libraries in the 0.2 ml 8-strip tube into the same new 1.5 ml Eppendorf LoBind tube to combine them into one pooled library; vortex to mix and centrifuge; transfer 20 µl of the mixed purified library to a new 1.5 ml Eppendorf LoBind tube, add 180 µl of EB, and pipette up and down 5-6 times, so that the library is pre-diluted 10-fold for subsequent detection.
8. Quality control of purified libraries
The concentration of the diluted library is measured with the dsDNA HS (High Sensitivity) Assay Kit (Thermo Fisher) and converted back to the concentration of the library before dilution; if the library concentration is between 9 and 60 ng/µl and the LabChip result is normal, library construction is qualified and the subsequent MiSeq run can proceed; if these requirements are not met, the library must be prepared again.
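The back-calculation from the 10-fold diluted measurement to the undiluted library concentration, and the 9 to 60 ng/µl acceptance window, can be expressed as follows. This is a minimal sketch; the function names are hypothetical, and treating both limits as inclusive is an assumption.

```python
def undiluted_concentration(measured_ng_per_ul: float, dilution_factor: float = 10.0) -> float:
    """Convert the concentration measured on the diluted library back to the stock."""
    return measured_ng_per_ul * dilution_factor


def concentration_qualified(stock_ng_per_ul: float, low: float = 9.0, high: float = 60.0) -> bool:
    """True if the stock concentration lies within the window (limits assumed inclusive)."""
    return low <= stock_ng_per_ul <= high


stock = undiluted_concentration(2.5)  # hypothetical reading of 2.5 ng/µl
print(stock, concentration_qualified(stock))
```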
9. Purified library fragment size detection (Library QC)
The diluted library is tested with the LabChip DNA High Sensitivity Reagent Kit (Perkin Elmer); for a qualified library, the main peak of the fragments lies at 350-500 bp, and there are no obvious small fragments in the range of 10-150 bp.
10. Library sequencing strategy (MiSeq Run)
1) Dilute the purified library to 4 nM according to the concentration measured in QC, and dilute 1 N NaOH to 0.2 N with nuclease-free water;
2) library denaturation: add 5 µl of the library diluted to 4 nM to a new 1.5 ml Eppendorf LoBind tube, then add 5 µl of 0.2 N NaOH, pipette up and down 15-20 times to mix, and incubate at room temperature for 5 min;
3) dilute the library to 13 pM;
4) for the subsequent operations, refer to the Illumina MiSeq instructions; the library is sequenced with the settings Read 1: 12 cycles, Index 1: 8 cycles, and Index 2: 8 cycles.
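The dilution steps above follow from standard concentration arithmetic. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation, not stated in the text) and uses hypothetical QC values (26.4 ng/µl, 400 bp main peak); the function names are illustrative only.

```python
DSDNA_G_PER_MOL_PER_BP = 660.0  # assumed average mass of one base pair


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to nM for a given mean fragment length."""
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def stock_volume(target_conc: float, final_volume: float, stock_conc: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (same concentration unit, same volume unit)."""
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    return target_conc * final_volume / stock_conc


library_nm = ng_per_ul_to_nm(26.4, 400)          # hypothetical QC values
v_library = stock_volume(4.0, 50.0, library_nm)  # µl of library for 50 µl at 4 nM
v_naoh = stock_volume(0.2, 10.0, 1.0)            # µl of 1 N NaOH for 10 µl at 0.2 N
denatured_pm = 4.0 * 1000 * 5 / (5 + 5)          # 5 µl library + 5 µl NaOH, in pM
fold_to_13pm = denatured_pm / 13.0               # further dilution needed for 13 pM

print(library_nm, v_library, v_naoh, denatured_pm, fold_to_13pm)
```

Mixing equal volumes of library and NaOH halves the library concentration, so the dilution to 13 pM starts from 2 nM rather than 4 nM.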
11. Sequencing data Analysis (QC Analysis)
The sequences of all Index 1 and Index 2 reads are output in FASTQ format using Illumina bcl2fastq software with the corresponding parameters, and the sequences are statistically analyzed with the corresponding scripts to obtain each index.
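The scripts used for this statistical analysis are not disclosed. As a rough sketch of the first step, the observed Index 1/Index 2 combinations could be tallied from two parallel FASTQ streams as shown below; the function names and the tiny in-memory demo data are hypothetical, and mapping each index sequence to a tag name such as IG7A01 would need a lookup table that is not given here.

```python
import io
from collections import Counter
from typing import Iterator, TextIO


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line (second of every four lines) of a FASTQ stream."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip()


def count_index_pairs(index1: TextIO, index2: TextIO) -> Counter:
    """Count (Index 1, Index 2) combinations; both streams must list reads in the same order."""
    return Counter(zip(fastq_sequences(index1), fastq_sequences(index2)))


# Hypothetical three-read demo: two reads share Index 1 but differ in Index 2.
demo_i1 = io.StringIO(
    "@r1\nAAAAAAAA\n+\nIIIIIIII\n@r2\nAAAAAAAA\n+\nIIIIIIII\n@r3\nCCCCCCCC\n+\nIIIIIIII\n"
)
demo_i2 = io.StringIO(
    "@r1\nGGGGGGGG\n+\nIIIIIIII\n@r2\nTTTTTTTT\n+\nIIIIIIII\n@r3\nTTTTTTTT\n+\nIIIIIIII\n"
)
pair_counts = count_index_pairs(demo_i1, demo_i2)
print(pair_counts)
```

Note that `zip` stops at the shorter stream, so a truncated file would silently drop reads; a production script would need to check that both files contain the same number of records.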
12. Library sequencing result determination criteria
MiSeq run metrics: sequencing data quality 01: Q30 > 90%; sequencing data quality 02: PF > 97%; sequencing data quality 03: both Phasing and Prephasing are less than 0.30.
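The three run-level criteria can be combined into a single check. This is a minimal sketch with a hypothetical function name; the example values are invented for illustration.

```python
def miseq_run_qualified(q30_pct: float, pf_pct: float, phasing: float, prephasing: float) -> bool:
    """Apply the run criteria: Q30 > 90%, PF > 97%, Phasing and Prephasing < 0.30."""
    return q30_pct > 90.0 and pf_pct > 97.0 and phasing < 0.30 and prephasing < 0.30


print(miseq_run_qualified(93.5, 98.1, 0.12, 0.08))
```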
Example 1 Simulation of the quality control method
1) First simulation, unidirectional contamination: 1 cross-contamination event is detected, 2 presumed contamination directions are given, and the maximum contamination ratio (i.e. the maximum single-sided tag contamination ratio) is 4%; the simulated data comprise 96 pairs of standard paired sequences, in which the contaminated pair yields the normal pairing IG7F01+IG5F01 with 48000 sequences and the mispairing IG7F01+IG5E01 with 2000 sequences, while each of the remaining pairs has 50000 sequences. The simulated library data were analyzed; the actual test results are shown in Table 6, and quality control analysis according to the parameters in Table 6 gave the result shown in Fig. 1: the maximum pairing contamination product (i.e. the maximum tag combination contamination ratio) is 4% × 0 = 0, the number of correctly paired sequences is 48000, the number of correctly paired and valid sequences is 48000, the sequence pass rate is 100%, the number of tags with more than 1% single-sided contamination is 1, and the proportion of contaminated tags above 1% (i.e. the proportion of tags with more than 1% single-sided contamination) is 1/96 = 1.04%.
Table 6 Test results of the first simulation (table image not reproduced)
2) Second simulation, bidirectional contamination that can cause sample misassignment: 2 cross-contamination events are detected, which can cause sample misassignment; the maximum contamination ratio is 2%, and the maximum pairing contamination product is 0.04%; the standard pairs in the simulated data each have 50000 sequences, and the contamination yields the normal pairing IG7F01+IG5F01 with 48000 sequences and the mispairings IG7F01+IG5E01 and IG7E01+IG5F01 with 1000 sequences each. The simulated library data were analyzed; the actual test results are shown in Table 7, and quality control analysis according to the parameters in Table 7 gave the result shown in Fig. 2.
Table 7 Test results of the second simulation (table image not reproduced)
Therefore, the test result of the simulation test is consistent with the expectation.
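The figures quoted for the two simulations can be reproduced from the pair counts alone. The sketch below is an illustration under stated assumptions, not the script used in this example: the helper names are hypothetical, tags are assumed to be named by plate well (IG7F01 pairs with IG5F01), and the combination product follows the definition given in the claims (mispaired reads over all reads carrying the IG7 tag, multiplied by the reciprocal mispairing over all reads carrying the matching IG5 tag).

```python
from collections import Counter

WELLS = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]  # 96 wells


def simulate(overrides: dict) -> dict:
    """96 correct pairs of 50000 reads each, with selected counts replaced or added."""
    counts = {(f"IG7{w}", f"IG5{w}"): 50000 for w in WELLS}
    counts.update(overrides)
    return counts


def contamination_summary(counts: dict) -> tuple:
    """Return (maximum single-sided ratio, maximum tag combination product)."""
    total7, total5 = Counter(), Counter()
    for (ig7, ig5), n in counts.items():
        total7[ig7] += n
        total5[ig5] += n
    max_single, max_product = 0.0, 0.0
    for (ig7, ig5), n in counts.items():
        a, b = ig7[3:], ig5[3:]  # well of the IG7 tag, well of the IG5 tag
        if a == b:
            continue  # correct pairing, not contamination
        ratio7 = n / total7[ig7]  # share among reads carrying this IG7 tag
        ratio5 = n / total5[ig5]  # share among reads carrying this IG5 tag
        max_single = max(max_single, ratio7, ratio5)
        reciprocal = counts.get((f"IG7{b}", f"IG5{a}"), 0)
        partner_total = total5[f"IG5{a}"]
        if partner_total:
            max_product = max(max_product, ratio7 * reciprocal / partner_total)
    return max_single, max_product


one_way = simulate({("IG7F01", "IG5F01"): 48000, ("IG7F01", "IG5E01"): 2000})
two_way = simulate({("IG7F01", "IG5F01"): 48000,
                    ("IG7F01", "IG5E01"): 1000,
                    ("IG7E01", "IG5F01"): 1000})
single_1, product_1 = contamination_summary(one_way)
single_2, product_2 = contamination_summary(two_way)
print(f"{single_1:.2%} {product_1:.4%}")
print(f"{single_2:.2%} {product_2:.4%}")
```

In the second simulation the exact ratio is 1000/49000, slightly above 2%, which rounds to the 2% and 0.04% stated above.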
Example 2
In this example, 96 pairs of library tags were subjected to quality inspection; the results of the first quality control analysis report are shown in Tables 8 and 9, which list only the contaminated combinations.
Table 8 Sequencing results for the IG7-end Index
Columns: query tag; expected partner; expected combination; unexpected combination; unexpected partner; total number of sequences; number of unexpected-combination sequences; contamination source; contaminated tag; contamination ratio
IG7A01 IG5A01 IG7A01-IG5A01 IG7A01-IG5B01 IG5B01 96 45 IG5B01 IG5A01 46.88%
IG7A01 IG5A01 IG7A01-IG5A01 IG7A01-IG5A02 IG5A02 96 51 IG5A02 IG5A01 53.13%
IG7A08 IG5A08 IG7A08-IG5A08 IG7A08-IG5H07 IG5H07 53249 88 IG5H07 IG5A08 0.17%
IG7B02 IG5B02 IG7B02-IG5B02 IG7B02-IG5A03 IG5A03 40825 43 IG5A03 IG5B02 0.11%
IG7B10 IG5B10 IG7B10-IG5B10 IG7B10-IG5D08 IG5D08 46021 70 IG5D08 IG5B10 0.15%
IG7B11 IG5B11 IG7B11-IG5B11 IG7B11-IG5C11 IG5C11 47969 68 IG5C11 IG5B11 0.14%
IG7C01 IG5C01 IG7C01-IG5C01 IG7C01-IG5G12 IG5G12 39518 64 IG5G12 IG5C01 0.16%
IG7C06 IG5C06 IG7C06-IG5C06 IG7C06-IG5C07 IG5C07 60810 637 IG5C07 IG5C06 1.05%
IG7C08 IG5C08 IG7C08-IG5C08 IG7C08-IG5B08 IG5B08 67961 119 IG5B08 IG5C08 0.18%
IG7D03 IG5D03 IG7D03-IG5D03 IG7D03-IG5E03 IG5E03 44222 48 IG5E03 IG5D03 0.11%
IG7D03 IG5D03 IG7D03-IG5D03 IG7D03-IG5C03 IG5C03 44222 56 IG5C03 IG5D03 0.13%
IG7D04 IG5D04 IG7D04-IG5D04 IG7D04-IG5D03 IG5D03 40521 41 IG5D03 IG5D04 0.10%
IG7D07 IG5D07 IG7D07-IG5D07 IG7D07-IG5E08 IG5E08 39029 281 IG5E08 IG5D07 0.72%
IG7D08 IG5D08 IG7D08-IG5D08 IG7D08-IG5C08 IG5C08 53581 85 IG5C08 IG5D08 0.16%
IG7D09 IG5D09 IG7D09-IG5D09 IG7D09-IG5E09 IG5E09 54786 70 IG5E09 IG5D09 0.13%
IG7E03 IG5E03 IG7E03-IG5E03 IG7E03-IG5F03 IG5F03 60714 78 IG5F03 IG5E03 0.13%
IG7E07 IG5E07 IG7E07-IG5E07 IG7E07-IG5D07 IG5D07 57285 88 IG5D07 IG5E07 0.15%
IG7F04 IG5F04 IG7F04-IG5F04 IG7F04-IE5D04* IE5D04* 49814 54 IE5D04* IG5F04 0.11%
IG7F07 IG5F07 IG7F07-IG5F07 IG7F07-IG5E07 IG5E07 55273 63 IG5E07 IG5F07 0.11%
IG7G08 IG5G08 IG7G08-IG5G08 IG7G08-IG5F08 IG5F08 43769 167 IG5F08 IG5G08 0.38%
IG7G10 IG5G10 IG7G10-IG5G10 IG7G10-IG5F06 IG5F06 57227 60 IG5F06 IG5G10 0.10%
IG7H02 IG5H02 IG7H02-IG5H02 IG7H02-IG5H03 IG5H03 38360 58 IG5H03 IG5H02 0.15%
IG7H07 IG5H07 IG7H07-IG5H07 IG7H07-IG5G07 IG5G07 36388 42 IG5G07 IG5H07 0.12%
Table 9 Sequencing results for the IG5-end Index
Columns: query tag; expected partner; expected combination; unexpected combination; unexpected partner; total number of sequences; number of unexpected-combination sequences; contamination source; contaminated tag; contamination ratio
IG5A01 IG7A01 IG5A01-IG7A01 IG5A01-IG7B01 IG7B01 26 26 IG7B01 IG7A01 100.00%
IG5A02 IG7A02 IG5A02-IG7A02 IG5A02-IG7A01 IG7A01 49928 51 IG7A01 IG7A02 0.10%
IG5A03 IG7A03 IG5A03-IG7A03 IG5A03-IG7B02 IG7B02 33067 43 IG7B02 IG7A03 0.13%
IG5A08 IG7A08 IG5A08-IG7A08 IG5A08-IG7B08 IG7B08 53201 60 IG7B08 IG7A08 0.11%
IG5B08 IG7B08 IG5B08-IG7B08 IG5B08-IG7C08 IG7C08 61974 119 IG7C08 IG7B08 0.19%
IG5B11 IG7B11 IG5B11-IG7B11 IG5B11-IG7A11 IG7A11 47967 51 IG7A11 IG7B11 0.11%
IG5C03 IG7C03 IG5C03-IG7C03 IG5C03-IG7D03 IG7D03 49273 56 IG7D03 IG7C03 0.11%
IG5C07 IG7C07 IG5C07-IG7C07 IG5C07-IG7C06 IG7C06 45027 637 IG7C06 IG7C07 1.41%
IG5C08 IG7C08 IG5C08-IG7C08 IG5C08-IG7D08 IG7D08 67868 85 IG7D08 IG7C08 0.13%
IG5C11 IG7C11 IG5C11-IG7C11 IG5C11-IG7B11 IG7B11 57807 68 IG7B11 IG7C11 0.12%
IG5D07 IG7D07 IG5D07-IG7D07 IG5D07-IG7E07 IG7E07 38866 88 IG7E07 IG7D07 0.23%
IG5D07 IG7D07 IG5D07-IG7D07 IG5D07-IG7C08 IG7C08 38866 49 IG7C08 IG7D07 0.13%
IG5D08 IG7D08 IG5D08-IG7D08 IG5D08-IG7B10 IG7B10 53619 70 IG7B10 IG7D08 0.13%
IG5D08 IG7D08 IG5D08-IG7D08 IG5D08-IG7E08 IG7E08 53619 65 IG7E08 IG7D08 0.12%
IG5E07 IG7E07 IG5E07-IG7E07 IG5E07-IG7F07 IG7F07 57203 63 IG7F07 IG7E07 0.11%
IG5E08 IG7E08 IG5E08-IG7E08 IG5E08-IG7D07 IG7D07 72767 281 IG7D07 IG7E08 0.39%
IG5E09 IG7E09 IG5E09-IG7E09 IG5E09-IG7D09 IG7D09 58757 70 IG7D09 IG7E09 0.12%
IG5F03 IG7F03 IG5F03-IG7F03 IG5F03-IG7E03 IG7E03 54811 78 IG7E03 IG7F03 0.14%
IG5F06 IG7F06 IG5F06-IG7F06 IG5F06-IG7G10 IG7G10 50348 60 IG7G10 IG7F06 0.12%
IG5F08 IG7F08 IG5F08-IG7F08 IG5F08-IG7G08 IG7G08 67091 167 IG7G08 IG7F08 0.25%
IG5G12 IG7G12 IG5G12-IG7G12 IG5G12-IG7C01 IG7C01 40234 64 IG7C01 IG7G12 0.16%
IG5H03 IG7H03 IG5H03-IG7H03 IG5H03-IG7H02 IG7H02 48832 58 IG7H02 IG7H03 0.12%
IG5H06 IG7H06 IG5H06-IG7H06 IG5H06-IG7A11 IG7A11 42784 62 IG7A11 IG7H06 0.14%
IG5H07 IG7H07 IG5H07-IG7H07 IG5H07-IG7A08 IG7A08 36410 88 IG7A08 IG7H07 0.24%
IG5H11 IG7H11 IG5H11-IG7H11 IG5H11-IG7E11 IG7E11 32519 50 IG7E11 IG7H11 0.15%
The first quality inspection results in Tables 8 and 9 were statistically analyzed to obtain the information on IG7A01-IG5A01 shown in Table 10 and Fig. 3; in addition, statistical analysis of the 96 pairs of tag primers gave dot plots of the contamination ratios of IG7 and IG5 (Figs. 4 and 5) and a distribution plot of the contamination ratios of IG7 and IG5 (Fig. 6). The following conclusions are drawn: 1) very few sequences were obtained for the combination IG7A01-IG5A01: only 96 sequences contain IG7A01 and only 26 contain IG5A01, far below the quality inspection requirements of at least 5000 sequences and a share of more than 0.2% of the total sequences; 2) because this combination has so few sequences and the only combinations detected are illegitimate ones, the contamination ratio is very high; 3) taken together, the well corresponding to IG7A01-IG5A01 is problematic and needs to be replaced, in terms of both the number of valid sequences and the likelihood of contamination.
Table 10 Information on IG7A01-IG5A01 (table image not reproduced)
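The contamination ratios in Tables 8 and 9 are the number of unexpected-combination sequences divided by the total number of sequences for the query tag. The sketch below recomputes a few rows and applies the 5000-sequence minimum; the function names are hypothetical and the rows are copied from the tables.

```python
def contamination_pct(unexpected: int, total: int) -> float:
    """Share of reads of the query tag found in an unexpected combination, in percent."""
    return 100.0 * unexpected / total


def too_few_sequences(total: int, minimum: int = 5000) -> bool:
    """Flag a tag whose read count is below the quality inspection minimum."""
    return total < minimum


# (unexpected combination, total sequences, unexpected sequences, reported ratio in %)
rows = [
    ("IG7A01-IG5B01", 96, 45, 46.88),
    ("IG7A01-IG5A02", 96, 51, 53.13),
    ("IG7C06-IG5C07", 60810, 637, 1.05),
    ("IG5C07-IG7C06", 45027, 637, 1.41),
]
for name, total, unexpected, reported in rows:
    flag = "LOW READ COUNT" if too_few_sequences(total) else "ok"
    print(f"{name}: {contamination_pct(unexpected, total):.2f}% (reported {reported}%) {flag}")
```

The two IG7A01 rows are flagged because IG7A01 has only 96 sequences in total, which matches conclusion 1) above.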
Because the well IG7A01-IG5A01 is problematic, the 2 tag primers IG7A01 and IG5A01 were resynthesized separately, dissolved to the specified concentration, and added in proportion to the corresponding well of a new deep-well plate; apart from the well corresponding to the original IG7A01-IG5A01, all remaining liquid in the original master plate that failed quality inspection was transferred to the corresponding positions of the new deep-well plate, and daughter plates were dispensed again for contamination quality control testing; the results of the second quality control analysis are shown in Tables 11 and 12, which list only the contaminated combinations.
Table 11 Sequencing results for the IG7-end Index
Columns: query tag; expected partner; expected combination; unexpected combination; unexpected partner; total number of sequences; number of unexpected-combination sequences; contamination source; contaminated tag; contamination ratio
IG7A01 IG5A01 IG7A01-IG5A01 IG7A01-IG5B01 IG5B01 490179 579 IG5B01 IG5A01 0.12%
IG7A08 IG5A08 IG7A08-IG5A08 IG7A08-IG5H07 IG5H07 244997 300 IG5H07 IG5A08 0.12%
IG7A11 IG5A11 IG7A11-IG5A11 IG7A11-IG5B11 IG5B11 398075 580 IG5B11 IG5A11 0.15%
IG7B02 IG5B02 IG7B02-IG5B02 IG7B02-IG5A03 IG5A03 285357 358 IG5A03 IG5B02 0.13%
IG7B10 IG5B10 IG7B10-IG5B10 IG7B10-IG5D08 IG5D08 262786 435 IG5D08 IG5B10 0.17%
IG7B11 IG5B11 IG7B11-IG5B11 IG7B11-IG5C11 IG5C11 336262 664 IG5C11 IG5B11 0.20%
IG7C06 IG5C06 IG7C06-IG5C06 IG7C06-IG5C07 IG5C07 345406 4253 IG5C07 IG5C06 1.23%
IG7C08 IG5C08 IG7C08-IG5C08 IG7C08-IG5B08 IG5B08 306713 460 IG5B08 IG5C08 0.15%
IG7C09 IG5C09 IG7C09-IG5C09 IG7C09-IG5D09 IG5D09 233352 284 IG5D09 IG5C09 0.12%
IG7C11 IG5C11 IG7C11-IG5C11 IG7C11-IG5D11 IG5D11 343633 744 IG5D11 IG5C11 0.22%
IG7D01 IG5D01 IG7D01-IG5D01 IG7D01-IG5C01 IG5C01 260626 313 IG5C01 IG5D01 0.12%
IG7D03 IG5D03 IG7D03-IG5D03 IG7D03-IG5C03 IG5C03 230602 238 IG5C03 IG5D03 0.10%
IG7D03 IG5D03 IG7D03-IG5D03 IG7D03-IG5E03 IG5E03 230602 264 IG5E03 IG5D03 0.11%
IG7D07 IG5D07 IG7D07-IG5D07 IG7D07-IG5E08 IG5E08 243561 1585 IG5E08 IG5D07 0.65%
IG7D07 IG5D07 IG7D07-IG5D07 IG7D07-IG5C07 IG5C07 243561 255 IG5C07 IG5D07 0.10%
IG7D08 IG5D08 IG7D08-IG5D08 IG7D08-IG5C08 IG5C08 316348 448 IG5C08 IG5D08 0.14%
IG7D09 IG5D09 IG7D09-IG5D09 IG7D09-IG5E09 IG5E09 351153 672 IG5E09 IG5D09 0.19%
IG7E07 IG5E07 IG7E07-IG5E07 IG7E07-IG5D07 IG5D07 314916 468 IG5D07 IG5E07 0.15%
IG7E08 IG5E08 IG7E08-IG5E08 IG7E08-IG5D08 IG5D08 313695 328 IG5D08 IG5E08 0.10%
IG7E08 IG5E08 IG7E08-IG5E08 IG7E08-IG5A11 IG5A11 313695 318 IG5A11 IG5E08 0.10%
IG7E12 IG5E12 IG7E12-IG5E12 IG7E12-IG5D12 IG5D12 189902 195 IG5D12 IG5E12 0.10%
IG7G01 IG5G01 IG7G01-IG5G01 IG7G01-IG5F01 IG5F01 271347 362 IG5F01 IG5G01 0.13%
IG7G08 IG5G08 IG7G08-IG5G08 IG7G08-IG5F08 IG5F08 202711 733 IG5F08 IG5G08 0.36%
IG7G09 IG5G09 IG7G09-IG5G09 IG7G09-IG5H09 IG5H09 268926 317 IG5H09 IG5G09 0.12%
IG7G10 IG5G10 IG7G10-IG5G10 IG7G10-IG5F06 IG5F06 363743 533 IG5F06 IG5G10 0.15%
IG7G10 IG5G10 IG7G10-IG5G10 IG7G10-IG5F10 IG5F10 363743 510 IG5F10 IG5G10 0.14%
IG7H02 IG5H02 IG7H02-IG5H02 IG7H02-IG5H03 IG5H03 237964 294 IG5H03 IG5H02 0.12%
IG7H07 IG5H07 IG7H07-IG5H07 IG7H07-IG5G07 IG5G07 240529 355 IG5G07 IG5H07 0.15%
IG7H08 IG5H08 IG7H08-IG5H08 IG7H08-IG5A09 IG5A09 121660 253 IG5A09 IG5H08 0.21%
Table 12 Sequencing results for the IG5-end Index (table image not reproduced)
After the primer replacement operation, no cross contamination exists between IG7A01 and IG5A01, and each index of 96 pairs of label primers meets the quality inspection standard.
In addition, in this example, the first quality control analysis and the second quality control analysis are compared, and the results are shown in table 13, fig. 7 (comparative analysis of IG7 labeled primer) and fig. 8 (comparative analysis of IG5 labeled primer), which shows that the reproducibility of the two quality control analyses is good, and thus the stability of the quality control method of the present invention is good.
Table 13 Comparison of the first and second quality control analyses (table image not reproduced)
Example 3
In this example, 96 pairs of library tags were subjected to quality inspection, and the first quality inspection result was statistically analyzed to obtain the information on one pair of tag primers shown in Fig. 9; in addition, statistical analysis of the 96 pairs of tag primers gave heat maps of the contamination ratios of IG7 and IG5 (Figs. 10 and 11) and a distribution plot of the contamination ratios of IG7 and IG5 (Fig. 12). The conclusion is that, after the first quality control analysis, all 96 pairs of tag primers meet the indexes.
Finally, it should be noted that the above embodiments are intended to illustrate the technical solutions of the present invention and not to limit the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A quality control method for detecting unique paired-end library-tag combinations, comprising the steps of:
S1) constructing a gDNA library with unique paired-end library tag combinations using the library tag standard and the gDNA standard as raw materials, sequencing the constructed library on the sequencer, and reading the library tag sequences;
S2) performing a first quality control analysis on the library tag sequences, wherein the indexes of the quality control analysis comprise the following items: the maximum single-sided tag contamination ratio is less than or equal to 2.5%, the maximum tag combination contamination ratio is less than or equal to 0.01%, the number of tag sample sequences in each group is greater than or equal to 5000, the coefficient of variation of the pooling proportions of all tag combinations is less than or equal to 0.5, the overall sequence pass rate is greater than or equal to 97%, the proportion of tag sample sequences in each group is greater than or equal to 0.2 divided by the number of pairs of library tag combinations, and the proportion of tags with more than 1% single-sided contamination is less than or equal to 10%;
S3) if the quality control analysis in step S2) shows that the indexes are not met, resynthesizing the library tags that do not meet the quality control requirements; constructing a gDNA library with unique paired-end library tag combinations according to the method of step S1), using the resynthesized library tags, the library tags that met the requirements of the first quality control analysis, and gDNA as raw materials, sequencing the constructed library again, and reading the library tag sequences;
S4) performing a second quality control analysis on the library tag sequences, until all library tags meet the indexes of the quality control analysis;
in the parameters of the quality control analysis, each unique paired-end library tag combination consists of an upstream library tag and a downstream library tag; the upstream library tags are collectively referred to as IG5, and IG5 comprises A and B; the downstream library tags are collectively referred to as IG7, and IG7 comprises a and b; the correctly matched unique paired-end library tag combinations are A-a and B-b; the mismatched unique paired-end library tag combinations are A-b and B-a; the number of sequences of each of the above combinations can be obtained by analysis after each sequencing reaction;
the single-sided tag contamination ratio is the proportion of cross-contamination occurring between tags within a group, contamination being possible only within a group, i.e. within the IG5 group and/or within the IG7 group;
when a of IG7 is not subjected to any cross-contamination in the production process, for A of IG5, the contamination ratio of B = number of sequences containing B-a / number of all sequences containing a,
when A of IG5 is not subjected to any cross-contamination in the production process, for a of IG7, the contamination ratio of b = number of sequences containing A-b / number of all sequences containing A;
when B contaminates A and b contaminates a, the B-b tag combination contamination ratio = (number of sequences containing B-a / number of all sequences containing a) × (number of sequences containing A-b / number of all sequences containing A);
when b of IG7 is not subjected to any cross-contamination in the production process, for B of IG5, the contamination ratio of A = number of sequences containing A-b / number of all sequences containing b,
when B of IG5 is not subjected to any cross-contamination in the production process, for b of IG7, the contamination ratio of a = number of sequences containing B-a / number of all sequences containing B;
when A contaminates B and a contaminates b, the A-a tag combination contamination ratio = (number of sequences containing A-b / number of all sequences containing b) × (number of sequences containing B-a / number of all sequences containing B);
the number of tag sample sequences in each group is the number of correctly paired sequences of each group after system filtering, i.e. the number of sequences containing A-a or the number of sequences containing B-b;
the coefficient of variation of the pooling proportions of all tag combinations is the coefficient of variation of the proportion of the number of correctly paired sequences of each group after system filtering to the total number of correctly paired sequences after system filtering;
the overall sequence pass rate is the proportion of the total number of correctly paired and valid sequences after system filtering of the sequencing reaction to the total number of all sequences after system filtering of the sequencing reaction;
the proportion of tag sample sequences in each group is the proportion of the number of correctly paired sequences of each group after system filtering to the total number of sequences after system filtering;
the proportion of tags with more than 1% single-sided contamination comprises: within the upstream library tags, the ratio of the number of library tags with a contamination ratio greater than 1% to the total number of library tags; and, within the downstream library tags, the ratio of the number of library tags with a contamination ratio greater than 1% to the total number of library tags.
2. The quality control method according to claim 1, wherein the step S1) comprises the following steps in sequence: preparing a gDNA standard, fragmenting gDNA, repairing tail ends, connecting joints, purifying joint connection products, amplifying libraries, purifying amplification libraries, detecting the quality of the purified libraries, detecting the sizes of the fragments of the purified libraries and sequencing on the libraries.
3. The quality control method of claim 1, wherein the unique paired-end library tag combination consists of IG5 and IG7, the Hamming distance of the library tags within each of IG5 and IG7 is 3 or more, and the sequence Hamming distance of the library tags between IG5 and IG7 groups is 2 or more.
4. The quality control method according to claim 3, wherein the library tag is purified by high performance liquid chromatography and the molecular weight is confirmed by mass spectrometry, and the purity is required to be 85% or more.
5. The quality control method of claim 1, wherein the unique paired-end library signature combination consists of 96 pairs of library signatures, i.e., 96 upstream library signatures in group IG5 and 96 downstream library signatures in group IG7, one-to-one; the percentage of the sample sequence of each group of labels is correspondingly adjusted to be more than or equal to 0.2 percent.
6. The quality control method of claim 1, wherein the unique paired-end library signature combination consists of 48 pairs of library signatures, i.e., 48 upstream library signatures in group IG5 and 48 downstream library signatures in group IG7, in one-to-one correspondence; the percentage of the sample sequence of each group of labels is correspondingly adjusted to be more than or equal to 0.4 percent.
7. The quality control method of claim 1, wherein when the unique paired-end library tag combination consists of 192 pairs of library tags, i.e., 192 upstream library tags in IG5 group and 192 downstream library tags in IG7 group, in one-to-one correspondence, the percentage of sample sequences of each group of tags is adjusted to 0.1% or more accordingly.
8. The quality control method of claim 1, wherein when the unique paired-end library tag combination consists of 288 pairs of library tags, i.e., 288 upstream library tags in IG5 group and 288 downstream library tags in IG7 group, one-to-one correspondence is established, the percentage of tag sample sequences in each group is adjusted to 0.07% or more.
9. The quality control method of claim 1, wherein when the unique paired-end library signature combination consists of 384 pairs of library signatures, i.e., 384 upstream library signatures in IG5 group and 384 downstream library signatures in IG7 group, in one-to-one correspondence, the percentage of sample sequences of each set of signatures is adjusted to 0.05% or more.
10. The quality control method according to any one of claims 1 to 9, which is used for sequencing a sample.
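As an illustration of how the indexes of step S2) and the per-group proportion thresholds of claims 5 to 9 fit together, the following sketch evaluates a set of metrics against the claimed limits. The function names and dictionary keys are hypothetical, the use of the population standard deviation for the coefficient of variation is an assumption, and the example metrics are taken from the first simulation of Example 1.

```python
from statistics import mean, pstdev


def group_fraction_threshold(n_pairs: int) -> float:
    """Minimum share of reads per tag pair: 0.2 divided by the number of pairs."""
    return 0.2 / n_pairs


def coefficient_of_variation(values) -> float:
    """Standard deviation divided by mean (population form, an assumption)."""
    return pstdev(values) / mean(values)


def evaluate_indexes(m: dict, n_pairs: int) -> dict:
    """Return pass/fail for each quality control index of step S2)."""
    return {
        "max_single_side": m["max_single_side"] <= 0.025,
        "max_combination": m["max_combination"] <= 0.0001,
        "min_group_reads": m["min_group_reads"] >= 5000,
        "pooling_cv": m["pooling_cv"] <= 0.5,
        "pass_rate": m["pass_rate"] >= 0.97,
        "min_group_fraction": m["min_group_fraction"] >= group_fraction_threshold(n_pairs),
        "tags_over_1pct": m["tags_over_1pct"] <= 0.10,
    }


# First simulation of Example 1: 95 pairs of 50000 reads, one pair of 48000 reads,
# plus 2000 mispaired reads.
correct = [50000] * 95 + [48000]
total_reads = sum(correct) + 2000
metrics = {
    "max_single_side": 0.04,
    "max_combination": 0.0,
    "min_group_reads": min(correct),
    "pooling_cv": coefficient_of_variation([c / sum(correct) for c in correct]),
    "pass_rate": 1.0,
    "min_group_fraction": min(correct) / total_reads,
    "tags_over_1pct": 1 / 96,
}
report = evaluate_indexes(metrics, n_pairs=96)
print(report)
```

For 96, 48, 192, 288 and 384 pairs, `group_fraction_threshold` gives approximately 0.2%, 0.4%, 0.1%, 0.07% and 0.05%, in line with claims 5 to 9. Under these limits, the simulated 4% single-sided contamination would fail only the maximum single-sided tag contamination index.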
CN201811337895.2A 2018-11-09 2018-11-09 Quality control method for detecting unique double-end library label combination and application Active CN109517882B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111090137.7A CN113957123A (en) 2018-11-09 2018-11-09 Method for constructing and detecting gDNA library containing unique double-end library tag combination
CN201811337895.2A CN109517882B (en) 2018-11-09 2018-11-09 Quality control method for detecting unique double-end library label combination and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811337895.2A CN109517882B (en) 2018-11-09 2018-11-09 Quality control method for detecting unique double-end library label combination and application

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111090137.7A Division CN113957123A (en) 2018-11-09 2018-11-09 Method for constructing and detecting gDNA library containing unique double-end library tag combination

Publications (2)

Publication Number Publication Date
CN109517882A CN109517882A (en) 2019-03-26
CN109517882B true CN109517882B (en) 2021-08-17

Family

ID=65773575

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111090137.7A Pending CN113957123A (en) 2018-11-09 2018-11-09 Method for constructing and detecting gDNA library containing unique double-end library tag combination
CN201811337895.2A Active CN109517882B (en) 2018-11-09 2018-11-09 Quality control method for detecting unique double-end library label combination and application

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111090137.7A Pending CN113957123A (en) 2018-11-09 2018-11-09 Method for constructing and detecting gDNA library containing unique double-end library tag combination

Country Status (1)

Country Link
CN (2) CN113957123A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110970091B (en) * 2019-12-20 2023-05-23 北京优迅医学检验实验室有限公司 Label quality control method and device
CN111910258B (en) * 2020-08-19 2021-06-15 纳昂达(南京)生物科技有限公司 Paired-end library tag composition and application thereof in MGI sequencing platform
CN115197999B (en) * 2022-07-15 2024-01-23 纳昂达(南京)生物科技有限公司 Method and device for synthesizing crosstalk by quality control double-end unique tag connector

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104293783A (en) * 2014-09-30 2015-01-21 天津诺禾致源生物信息科技有限公司 Primer applicable to amplicon sequencing library construction, construction method, amplicon library and kit comprising amplicon library
CN104561294A (en) * 2014-12-26 2015-04-29 北京诺禾致源生物信息科技有限公司 Construction method and sequencing method of genetic typing sequencing library
CN105671644A (en) * 2016-02-26 2016-06-15 武汉冰港生物科技有限公司 Preparation method of genome mixing sequencing library
WO2016109981A1 (en) * 2015-01-09 2016-07-14 深圳华大基因研究院 High-throughput detection method for dna synthesis product
WO2018197950A1 (en) * 2017-04-23 2018-11-01 Illumina Cambridge Limited Compositions and methods for improving sample identification in indexed nucleic acid libraries

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104099666A (en) * 2013-04-15 2014-10-15 江苏基谱生物科技发展有限公司 Construction method for next-generation sequencing library
CN105734048A (en) * 2016-02-26 2016-07-06 武汉冰港生物科技有限公司 PCR-free sequencing library preparation method for genome DNA

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Effects of index misassignment on multiplexing and downstream"; Illumina white paper; Illumina; 2018-04-04; pp. 1-4 *
"Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing"; Laura E. MacConaill et al.; BMC Genomics; 2018-01-08; Vol. 19, No. 30; pp. 1-10 *

Also Published As

Publication number Publication date
CN109517882A (en) 2019-03-26
CN113957123A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN108893466B (en) Sequencing joint, sequencing joint group and detection method of ultralow frequency mutation
CN109517882B (en) Quality control method for detecting unique double-end library label combination and application
CN112967753B (en) Pathogenic microorganism detection system and method based on nanopore sequencing
CN108220479B (en) Multiplex connection probe amplification identification kit capable of detecting multiple sudden death disease pathogens of pigs
CN108517567B (en) Adaptor, primer group, kit and library construction method for cfDNA library construction
CN111052249B (en) Methods of determining predetermined chromosome conservation regions, methods of determining whether copy number variation exists in a sample genome, systems, and computer readable media
CN105567681B (en) Method and tag adapter for noninvasive biopsy of viruses based on high-throughput gene sequencing
US20200294628A1 (en) Creation or use of anchor-based data structures for sample-derived characteristic determination
WO2023284768A1 (en) Fusion primer direct amplification method-based human mitochondrial whole genome high-throughput sequencing kit
CN111304288A (en) Specific molecular tag UMI group and application thereof
CN108611408A (en) Method and apparatus for detecting fetal chromosomal aneuploidy
CN116287357A (en) Respiratory tract pathogenic bacteria detection kit based on targeted amplicon sequencing
CN103210093A (en) Method for detecting digestive tract pathogens
Goraichuk et al. Complete genome sequences of four avian paramyxoviruses of serotype 10 isolated from rockhopper penguins on the Falkland Islands
CN116064818A (en) Primer group, method and system for detecting IGH gene rearrangement and hypermutation
CN115725784A (en) Kit and method for detecting pathogens related to respiratory tract infection
CN112885407B (en) Second-generation sequencing-based micro-haplotype detection and typing system and method
CN105002566A (en) Visual chip and preparation method thereof and method for chip visualization
WO2022020259A1 (en) Methods and devices for detecting and sequencing sars-cov-2
CN112662747A (en) HRM genotyping method and primers for detecting RHD1227A allele of red blood cell Rh blood group system
CN113373207A (en) Methods for determining cytosine modifications
WO2023108865A1 (en) Primer pair, kit and detection method for detecting mitochondrial loop gene mutation
CN117867180B (en) Primer combination, kit and application for detecting respiratory tract pathogens
CN115044703B (en) MNP (MNP) marker locus of human coronavirus HCoV-OC43, primer composition, kit and application of MNP marker locus
CN117821562A (en) Method for enriching short fragment DNA library and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant