CN118086468A

CN118086468A - Method for improving library data uniformity and application

Info

Publication number: CN118086468A
Application number: CN202410128510.0A
Authority: CN
Inventors: 曾伟奇; 张俊杰; 许腾; 吴婉婷; 关瑞麟; 杨映洁; 李永军; 王小锐
Original assignee: Beijing Weiyuan Medical Laboratory Co ltd; Guangzhou Weiyuan Intelligent Manufacturing Technology Co ltd; Guangzhou Weiyuan Medical Equipment Co ltd; Guangzhou Weiyuan Medical Laboratory Co ltd; Guangzhou Weiyuan Medical Technology Co ltd; Shenzhen Weiyuan Medical Technology Co ltd; Guangzhou Vision Gene Technology Co ltd
Current assignee: Beijing Weiyuan Medical Laboratory Co ltd; Guangzhou Weiyuan Intelligent Manufacturing Technology Co ltd; Guangzhou Weiyuan Medical Equipment Co ltd; Guangzhou Weiyuan Medical Laboratory Co ltd; Guangzhou Weiyuan Medical Technology Co ltd; Shenzhen Weiyuan Medical Technology Co ltd; Guangzhou Vision Gene Technology Co ltd
Priority date: 2024-01-30
Filing date: 2024-01-30
Publication date: 2024-05-28

Abstract

The invention discloses a method for improving library data uniformity and an application thereof, belonging to the technical field of NGS sequencing. The method comprises the following steps: fragmentation: sample DNA is fragmented, and sequencing primers are ligated at the break sites; amplification and enrichment: a 1st specific primer is added for PCR amplification, wherein the 1st specific primer comprises a linker sequence, an Index sequence and a complementary sequence, the complementary sequence is complementary to the sequencing primer, and the Index sequence is a sample-specific tag; Pooling: the samples carrying Index sequences are mixed; homogenization amplification: an equal amount of 2nd specific primers is added to the mixed sample for limiting PCR amplification; the 2nd specific primer is complementary to the linker sequence and the Index sequence in the 1st specific primer and is added in an amount less than that required for full amplification of the target region fragments of all samples. Constructing and sequencing libraries with this method improves data uniformity and narrows the differences in sequencing data quality.

Description

Method for improving library data uniformity and application

Technical Field

The invention relates to the technical field of NGS sequencing, in particular to a method for improving library data uniformity and application thereof.

Background

Etiological diagnosis is a vital part of diagnosing infectious diseases, and the value of diagnosing them accurately is beyond doubt. However, traditional pathogen detection techniques suffer from long detection cycles and low positive rates, while molecular diagnostic techniques suffer from false positives and false negatives, a narrow detection range and similar problems.

With the spread of metagenomic sequencing, metagenomic next-generation sequencing (mNGS) offers high throughput, wide coverage, freedom from bias, speed and accuracy. It is increasingly used for pathogen detection in clinical infectious diseases and has important clinical significance for complex and suspected infectious samples such as respiratory tract samples, blood and sterile body fluids.

In 2014, mNGS was first used to detect leptospira infection in cerebrospinal fluid. Since then, research and literature reports on mNGS have increased steadily at home and abroad, and the technique has gradually been adopted in clinical diagnostic practice. In 2016, mNGS detection entered clinical use in China, and it is rapidly becoming a focus of attention in the field of accurate diagnosis of clinical pathogenic microorganisms.

However, because the pathogen load of each sample in a mixed library differs, the library concentrations of the captured samples differ, and the amount of data obtained for each sample after sequencing and demultiplexing therefore varies widely. A technique for homogenizing the captured samples is needed so that the sequencing output is uniform across samples, ensuring the detection sensitivity and the accuracy of pathogen interpretation for each sample.

Disclosure of Invention

Because the content of each sample or target sequence differs, the mixed library obtained after library Pooling yields unequal amounts of sequencing output data per sample, which reduces detection sensitivity. To address this problem, the invention provides a method for improving library data uniformity.

The invention provides a method for improving library data uniformity, comprising the following steps:

Fragmentation: taking sample DNA, fragmenting it, and ligating sequencing primers at the break sites;

Amplification and enrichment: adding a 1st specific primer for PCR amplification, wherein the 1st specific primer comprises a linker sequence, an Index sequence and a complementary sequence, the complementary sequence is complementary to the sequencing primer, and the Index sequence is a tag sequence specific to the sample;

Pooling: mixing the samples carrying Index sequences to obtain a mixed sample;

Homogenization amplification: adding an equal amount of 2nd specific primers to the mixed sample and performing limiting PCR amplification to obtain a sequencing-ready library; the 2nd specific primer is complementary to the linker sequence and the Index sequence in the 1st specific primer, and the 2nd specific primer is added in an amount less than that required for full amplification of all samples.

In this method, a 1st specific primer is first designed for each sample, with an Index sequence specific to that sample serving as its tag, and a first library is prepared. In the homogenization amplification step, the concentration of the 2nd specific primers and the number of amplification cycles are then controlled: a limited and equal amount of 2nd specific primer is used for each sample, and amplification starts from the Index of each first-library sequence. Because resources in the amplification system are limited, samples with different starting template amounts amplify with different efficiencies. In the later stage of amplification, the efficiency of samples with high starting template drops while that of samples with low starting template remains comparatively high, so the product amounts from different starting templates reach the same plateau phase. PCR products of the same concentration level are thus generated, the product concentrations become uniform, and the differences in sequencing data quality are narrowed.
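The plateau effect described above can be illustrated with a toy simulation. The sketch below is not the model used in this application; it assumes ideal doubling in every cycle until a sample's own pool of 2nd specific primer is used up, with one primer molecule consumed per newly made strand. The function names, the template copy numbers and the primer pool size are hypothetical and were chosen only to mirror a 1:30:50:100 input ratio.

```python
def limited_pcr(template_copies, primer_molecules, cycles):
    """Toy primer-limited PCR for one sample (illustrative assumption only).

    Each cycle makes one new copy per existing copy, capped by the
    number of primer molecules still available to that sample.
    """
    product = float(template_copies)
    primers = float(primer_molecules)
    for _ in range(cycles):
        new_copies = min(product, primers)  # doubling until primers run out
        product += new_copies
        primers -= new_copies
    return product


def max_min_ratio(values):
    """Ratio of the largest to the smallest value."""
    values = list(values)
    return max(values) / min(values)


# Hypothetical starting copies at a 1:30:50:100 ratio, equal primer pool per sample.
inputs = {"1x": 1e6, "30x": 3e7, "50x": 5e7, "100x": 1e8}
primer_pool = 1e12

early = {name: limited_pcr(copies, primer_pool, cycles=10) for name, copies in inputs.items()}
late = {name: limited_pcr(copies, primer_pool, cycles=25) for name, copies in inputs.items()}

print("max/min after 10 cycles:", max_min_ratio(early.values()))
print("max/min after 25 cycles:", max_min_ratio(late.values()))
```

In this simplified model the input ratio is preserved while primers are in excess and collapses toward 1 once every sample has exhausted its equal primer pool, which is the qualitative behavior the text attributes to limiting amplification. Real reactions also depend on enzyme, dNTPs and amplification efficiency, which are ignored here.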

In some embodiments, in the amplification and enrichment step, the final concentration of each 1st specific primer in the PCR amplification system is 1-2 μM.

In some embodiments, in the homogenization amplification step, the final concentration of each 2nd specific primer in the limiting PCR amplification system is 0.05-2 μM, preferably 0.05-1 μM, more preferably 0.1-0.5 μM.

The final concentration is the reaction concentration of the specific primer in the reaction system, also called the working concentration. The final concentration of the 2nd specific primer must be kept below the amount required to fully amplify the sample with the highest content of target region fragments, which is what makes the amplification limiting; the specific amount can be adjusted according to actual conditions.

Within the above concentration range, each primer pair amplifies well, the unequal library achieves the best homogenization effect, enzyme activity is not readily affected, and secondary structures do not readily form. Taking overall amplification performance and cost into account, this range is therefore the preferred concentration range.
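As a rough aid for reasoning about whether a given primer concentration is limiting, the following sketch converts a final concentration into a number of primer molecules and compares it with the primer demand of unconstrained doubling. The reaction volume (50 μL), the template copy number and the function names are assumptions made for illustration; they are not values stated in this application.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def primer_pmol(final_conc_um, reaction_volume_ul):
    """Primer amount in pmol: concentration (uM) x volume (uL) = pmol."""
    return final_conc_um * reaction_volume_ul


def primer_molecules(final_conc_um, reaction_volume_ul):
    """Number of primer molecules in the reaction."""
    return primer_pmol(final_conc_um, reaction_volume_ul) * 1e-12 * AVOGADRO


def is_primer_limiting(template_copies, cycles, final_conc_um, reaction_volume_ul):
    """True if ideal doubling for `cycles` cycles would need more primer than is present."""
    demand = template_copies * (2 ** cycles - 1)  # one primer per new strand
    return demand > primer_molecules(final_conc_um, reaction_volume_ul)


# Hypothetical example: 0.2 uM primer in an assumed 50 uL reaction, 1e8 template copies.
print(primer_pmol(0.2, 50))                   # about 10 pmol
print(is_primer_limiting(1e8, 25, 0.2, 50))   # many cycles: primer becomes limiting
print(is_primer_limiting(1e8, 10, 0.2, 50))   # few cycles: primer still in excess
```

The check only captures the primer budget; it says nothing about enzyme activity or secondary structure, which the text also names as reasons for the preferred range.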

In some embodiments, a capture step is further included between the Pooling step and the homogenization amplification step, the capture step being: capturing the target region fragments of the mixed sample genome with RNA probes or DNA probes. It will be appreciated that the RNA probes or DNA probes are of conventional design, as described in the reference (BacCapSeq: a Platform for Diagnosis and Characterization of Bacterial Infections. DOI: 10.1128/mBio.02007-18).

In some embodiments, in the capture step, the target region fragments of the mixed sample genome are first bound by biotin-labeled RNA probes or DNA probes, and the sample is then captured with streptavidin-coated magnetic beads that bind the biotin on the probes. It will be appreciated that capture can also be carried out by other conventional means, such as antigen-antibody binding or solid-phase capture. However, the affinity between biotin and streptavidin is far higher than that between an ordinary antigen and antibody, and the pair offers good binding stability, strong specificity and high sensitivity.

In some embodiments, in the homogenization amplification step, the mixed sample is divided into as many portions as there are samples, a 2nd specific primer is added to each portion, limiting PCR amplification is performed, and the amplified products are pooled to obtain the sequencing-ready library.

This form of homogenization amplification is split-tube amplification: the mixed sample is divided into several portions, each portion is amplified with the 2nd specific primers of one sample (distinguished by Index sequence recognition), and the 2nd specific primers are added in equal amounts to achieve limiting, homogenizing amplification.

In some embodiments, in the homogenization amplification step, a mixture containing equal amounts of the 2nd specific primers is added to the mixed sample, and limiting PCR amplification is performed on all samples simultaneously to obtain the sequencing-ready library.

This form of homogenization amplification is mixed-tube amplification: the mixed sample is amplified directly with a mixture of the 2nd specific primers corresponding to each sample, so that limiting, homogenizing amplification takes place in a single reaction system.

In some embodiments, in the amplification and enrichment step, the PCR amplification is followed by a purification step, which may be: magnetic bead size selection to obtain Index-tagged libraries within a given fragment size range.

In some embodiments, in the homogenization amplification step, the limiting PCR amplification is followed by a purification step, which may be: magnetic bead size selection to obtain Index-tagged libraries within a given fragment size range.

It will be appreciated that the above purification steps are carried out in a manner conventional in the art.

In some embodiments, the fragment size range corresponds to amplification products 100-600 bp in length.

In some embodiments, in the fragmentation step, the sample DNA is cleaved with a transposase and universal sequencing primers are ligated at the break sites. It will be appreciated that fragmentation and sequencing primer ligation may be performed by methods common in the art, such as adaptor-ligation library construction or PCR amplicon library construction, but using a transposase has the advantages of a low input requirement and a short library construction time.

In some embodiments, in the amplification and enrichment step, the PCR amplification is performed according to the following program: 98 ℃ for 30 s; 13 ± 3 cycles of 98 ℃ for 10 s, 60 ℃ for 30 s and 72 ℃ for 30 s; 72 ℃ for 5 min; then cooling to 4 ℃ and holding.

In some embodiments, in the homogenization amplification step, the limiting PCR amplification is performed according to the following program: 98 ℃ for 45 s; 25 ± 5 cycles of 98 ℃ for 10 s, 65 ℃ for 15 s and 72 ℃ for 15 s; 72 ℃ for 1 min; then cooling to 4 ℃ and holding.

Under the above limiting amplification conditions, by designing the sequences of the 2nd specific primers and controlling both the primer input and the number of amplification cycles, the different 2nd specific primers amplify well under limiting conditions, and products of highly uniform concentration are obtained.
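The two cycling programs above can be written down as data, which makes it easy to compare them or to vary the cycle number within the stated ranges. The sketch below uses the nominal cycle numbers (13 and 25); the class and field names are hypothetical, and the computed time covers programmed hold times only, ignoring ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    name: str
    initial_denaturation_s: int
    cycle_steps_s: tuple  # hold times (s) of the steps inside one cycle
    cycles: int
    final_extension_s: int

    def hold_time_s(self):
        """Sum of programmed hold times in seconds (ramp times are not included)."""
        return (
            self.initial_denaturation_s
            + self.cycles * sum(self.cycle_steps_s)
            + self.final_extension_s
        )


# Values taken from the two programs in the text, with nominal cycle counts.
ENRICHMENT = CyclingProgram("amplification and enrichment", 30, (10, 30, 30), 13, 300)
LIMITING = CyclingProgram("limiting PCR", 45, (10, 15, 15), 25, 60)

for program in (ENRICHMENT, LIMITING):
    print(program.name, program.hold_time_s(), "s")
```

Changing `cycles` to any value in 13 ± 3 or 25 ± 5 gives the corresponding hold time for the other permitted settings.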

The above method for improving library data uniformity is used for non-diagnostic and non-therapeutic purposes.

In another aspect, the invention also provides the use of the above method for preparing a reagent for library construction.

In some embodiments, the library is used for mNGS assays and the sample DNA is prepared by the following method:

(1) Extracting DNA and RNA from a sample, and digesting and removing the DNA with DNase I;

(2) Hybridizing host-specific probes with the host RNA to remove it, while retaining the pathogen RNA;

(3) Reverse transcribing the pathogen RNA sequence to obtain a cDNA strand;

(4) Adding back the initial DNA of the sample to obtain the sample DNA.

It will be appreciated that the extraction of RNA and DNA in the sample may be carried out by conventional methods for RNA or DNA extraction.

In another aspect, the invention also provides an NGS detection reagent for improving library data uniformity, comprising the 1st specific primer and the 2nd specific primer of the above method. Preferably, the final concentration of each 1st specific primer is 1-2 μM, and the final concentration of the 2nd specific primer is 0.05-2 μM, preferably 0.05-1 μM, more preferably 0.1-0.5 μM.

On the basis of conforming to the common knowledge in the field, the above preferred conditions can be arbitrarily combined to obtain the preferred examples of the invention.

The reagents and materials used in the present invention are commercially available.

The beneficial effects of the invention are as follows:

In the method for improving library data uniformity, a 1st specific primer is designed for each sample, with an Index sequence specific to that sample serving as its tag, and a first library is prepared. In the homogenization amplification step, the concentration of the 2nd specific primers and the number of amplification cycles are controlled: a limited and equal amount of 2nd specific primer is used, and amplification starts from the Index of each first-library sequence. Because resources in the amplification system are limited, samples with different starting template amounts amplify with different efficiencies: the efficiency of samples with high starting template drops, while that of samples with low starting template stays high. In the later stage of amplification, the product amounts from different starting templates reach the same plateau phase and PCR products of the same concentration level are generated, so that the product concentrations become uniform and the differences in sequencing data quality are narrowed.

Drawings

FIG. 1 is a schematic diagram of the PCR amplification and purification process in example 1.

FIG. 2 is a schematic diagram of the principle of homogenization amplification in example 1.

FIG. 3 shows the sequencing data yield after the simulated Pooling of unequal amounts of standard quality control material in example 1.

FIG. 4 shows the sequencing data yield ratios after the simulated Pooling of unequal amounts of standard quality control material in example 1.

FIG. 5 shows the sequencing cross-contamination rate after the Pooling of unequal amounts of standard quality control material in example 1.

FIG. 6 is a schematic diagram of the sample extraction and purification process in example 2.

FIG. 7 shows the sequencing data yield of the unequal mixed library in example 2.

FIG. 8 shows the ratio of pathogen detection sequences in the unequal mixed library in example 2.

FIG. 9 is a schematic diagram of the principle of mixed-tube limiting amplification in example 3.

FIG. 10 shows the sequencing data yield of the simulated unequal mixed library in example 3.

FIG. 11 shows the sequencing data yield ratios of the simulated unequal mixed library in example 3.

FIG. 12 shows the sequencing cross-contamination rate of the simulated unequal mixed library in example 3.

FIG. 13 shows the sequencing data yield of the simulated unequal mixed library in example 4.

FIG. 14 shows the ratio of pathogen detection sequences in the simulated unequal mixed library in example 4.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit its scope. Experimental methods for which specific conditions are not given in the following examples were carried out according to conventional methods and conditions, or according to the product instructions.

Definition:

Index: in second-generation sequencing, an artificially designed synthetic nucleic acid sequence that cannot be aligned to any genome present in the sample under test; it is used to distinguish samples during sequencing and is also called a molecular tag.

Amplification enzyme for the PCR amplification system: Entrans 2X qPCR Probe Master Mix, ABclonal Biotechnology (Wuhan) Inc.

Amplification enzyme for the homogenization PCR amplification system: KAPA HiFi HotStart ReadyMix, Roche (China) Holding Ltd.

Example 1

In this example, to reduce differences in experimental results caused by sample complexity, experimental verification was performed with a standard quality control library.

1. Method

1) Fragmentation

A standard quality control material was used. It is a gene fragment obtained by amplifying a selected region of Arabidopsis thaliana and serves as a standard sample of known content and base sequence. The fragment size is 100-600 bp. The standard quality control material was prepared according to the method of patent application CN 202210543900.5 (a quality control method for library tag primers and application thereof): after amplification with specific primers, sequencing primer sequences were ligated to both ends of the sequence.

2) Enrichment by amplification

The standard quality control material and the 1st specific primers were placed in an 8-tube strip. Before use, they were fully dissolved, vortexed to mix and briefly centrifuged. PCR amplification was performed according to the following amplification system (Table 1) and program (Table 2).

The 1st specific primer contains a linker sequence, an Index sequence and a complementary sequence. In this example the Index sequence is the sample-specific tag sequence, and the complementary sequence is complementary to the sequencing primer. The Index sequence was designed to be 14 nt long so that the 2nd primer has sufficient specificity to achieve the homogenization effect; the length actually used can be chosen according to specific experimental requirements. For the complementary sequence, 14 nt of the sequencing primer sequence was likewise selected in this example to increase the success rate of primer pairing.

In this example, 4 different Indexes were used to model 4 different sample libraries.
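The primer layout described above (linker sequence, 14 nt Index, 14 nt complementary sequence, and a 2nd primer that pairs with the linker plus Index region) can be sketched as simple string composition. Every sequence below is an arbitrary placeholder, not a sequence from this application, and the function names are hypothetical. The text does not specify strand orientation, so taking the reverse complement is only a literal reading of "complementary".

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_first_primer(linker, index, complementary):
    """1st specific primer: linker + sample Index + sequencing-primer-complementary part."""
    return linker + index + complementary


def build_second_primer(linker, index):
    """2nd specific primer, modeled here as the reverse complement of linker + Index."""
    return reverse_complement(linker + index)


LINKER = "GCTAGGTCAACT"            # placeholder linker sequence
SEQ_COMPLEMENT = "CAGTTGACCTAGGA"  # placeholder 14 nt complementary sequence
INDEXES = {                        # placeholder 14 nt Index per simulated sample
    "S1": "ACGTACGTACGTAC",
    "S2": "TTGCAAGGCTAACG",
    "S3": "GGATCCTTAAGCAT",
    "S4": "CATGGTCAATCGGA",
}

first_primers = {s: build_first_primer(LINKER, idx, SEQ_COMPLEMENT) for s, idx in INDEXES.items()}
second_primers = {s: build_second_primer(LINKER, idx) for s, idx in INDEXES.items()}


def matching_samples(second_primer):
    """Samples whose 1st primer starts with the region this 2nd primer pairs with."""
    target = reverse_complement(second_primer)
    return [s for s, primer in first_primers.items() if primer.startswith(target)]


for sample, primer in second_primers.items():
    print(sample, primer, matching_samples(primer))
```

With distinct Indexes, each 2nd primer in this sketch pairs with exactly one sample's library, which is the property the 14 nt Index length is meant to secure.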

TABLE 1 PCR amplification System

TABLE 2 PCR amplification cycling program

3) Purification

A: After amplification, the 8-tube strip was left at low temperature for a period of time and then briefly centrifuged in a palm-sized centrifuge to reduce aerosol contamination.

B: The amplified product was brought to 50 μL, and 1.0× (50 μL) DNA AMPure XP magnetic beads were added. The beads were equilibrated at room temperature for 30 min and mixed thoroughly before use. After the beads were added, the mixture was shaken to mix and left at room temperature for 5 min so that the amplified product came into full contact with the beads.

C: After brief centrifugation, the 8-tube strip was placed on a magnetic plate and left to stand for 3 min, and the supernatant was discarded with a pipette.

D: 80% ethanol was prepared and 200 μL was added to each tube of the 8-tube strip; the strip was rotated two full turns and the supernatant was discarded. This step was repeated once, after which the strip was briefly centrifuged and the residual ethanol was removed with a small-volume pipette.

E: With the lids open, the beads were dried at 37 ℃ on a dry bath or air-dried at room temperature until their surface turned matte (residual ethanol inhibits subsequent enzymatic reactions, while over-drying until the beads crack reduces the nucleic acid elution yield).

F: 20 μL of NF water was added, and the mixture was shaken to mix and incubated at room temperature for 2 min for elution.

G: After brief centrifugation, the strip was placed on a magnetic plate and left to stand for 2 min, and 17 μL of the supernatant was transferred to a new 8-tube strip to obtain the library carrying the Index identification tag.

The PCR amplification and purification process described above is shown in FIG. 1.
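The bead additions in this document are expressed as a ratio to the sample volume (for example 1.0× for 50 μL of product). A small helper, shown here as an illustrative sketch with a hypothetical function name, reproduces the three bead volumes stated in the examples.

```python
def bead_volume_ul(sample_volume_ul, bead_ratio):
    """Volume of magnetic beads (uL) for a given sample volume and bead-to-sample ratio."""
    if sample_volume_ul <= 0 or bead_ratio <= 0:
        raise ValueError("sample volume and bead ratio must be positive")
    return round(sample_volume_ul * bead_ratio, 2)


# Ratios and sample volumes as stated in the purification steps of the examples.
print(bead_volume_ul(50, 1.0))   # purification after amplification and enrichment, example 1
print(bead_volume_ul(50, 1.6))   # purification after homogenization amplification, example 1
print(bead_volume_ul(100, 0.9))  # purification after amplification and enrichment, example 2
```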

4) Pooling

The amplified and enriched libraries were pooled at a mass ratio of 1:30:50:100 to form an unequal mixed library.
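The volumes needed for such an unequal pool follow from each library's concentration and its share of the total mass. The sketch below assumes hypothetical concentrations (10 ng/μL each) and a hypothetical total pool mass of 181 ng, chosen so that the four shares come out as 1, 30, 50 and 100 ng; neither value is given in the text.

```python
def pool_by_mass_ratio(concs_ng_per_ul, mass_ratio, total_mass_ng):
    """Return a list of (mass_ng, volume_ul) per library for a pool with the given mass ratio."""
    if len(concs_ng_per_ul) != len(mass_ratio):
        raise ValueError("one concentration is needed per ratio entry")
    total_parts = sum(mass_ratio)
    plan = []
    for conc, parts in zip(concs_ng_per_ul, mass_ratio):
        mass_ng = total_mass_ng * parts / total_parts  # this library's share of the pool
        plan.append((mass_ng, mass_ng / conc))         # volume = mass / concentration
    return plan


MASS_RATIO = (1, 30, 50, 100)      # ratio stated in the text
concs = [10.0, 10.0, 10.0, 10.0]   # hypothetical library concentrations, ng/uL
pool_plan = pool_by_mass_ratio(concs, MASS_RATIO, total_mass_ng=181.0)

for parts, (mass_ng, volume_ul) in zip(MASS_RATIO, pool_plan):
    print(f"{parts:>3}x: {mass_ng:.1f} ng in {volume_ul:.2f} uL")
```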

5) Homogenization amplification

Because the samples in this example are simulated standard samples, target region fragments do not need to be captured, and homogenization amplification is performed directly, as follows:

Four equal portions were taken from the above mixed sample, and an equal amount of 2nd specific primer was added to each portion. Before use, the reagents were fully dissolved, vortexed to mix and briefly centrifuged. Limiting PCR amplification was performed according to the following amplification system (Table 3) and program (Table 4).

TABLE 3 Homogenization PCR amplification system (per sample)

TABLE 4 PCR amplification cycling program

5) Purification

B: To 50 μL of amplified product, 1.6× (80 μL) DNA AMPure XP magnetic beads were added. The beads were equilibrated at room temperature for 30 min and mixed thoroughly before use. After the beads were added, the mixture was shaken to mix and left at room temperature for 5 min so that the amplified product came into full contact with the beads.

G: After brief centrifugation, the strip was placed on a magnetic plate and left to stand for 2 min, and 17 μL of the supernatant was transferred to a new 8-tube strip to obtain the first quality control sequencing library carrying the Index identification tag, i.e., the first standard library.

6) Sequencing on machine

A: The concentration of the first standard library was entered into the sequencing run information sheet, the libraries were diluted to a uniform concentration for library Pooling, and all diluted libraries were pooled into one tube. The procedure from limiting amplification through purification and Pooling is shown in FIG. 2, where UD1, UD2 ... UDn represent different samples.

B: 1 N NaOH solution was diluted to 0.2 N with NF water, and the pooled library was diluted to 4 nM with HT1 high-throughput sequencing buffer.

C: Library denaturation: a defined volume of the 4 nM library was placed in a new 1.5 mL EP tube, a defined volume of 0.2 N NaOH solution was added, and the mixture was shaken to mix and denatured at room temperature for 5 min, with vortexing twice during this period.

D: The denatured library was diluted to 20 pM with HT1 high-throughput sequencing buffer and then further diluted to 1.1 pM.

E: Each library was sequenced to a data volume of 2M, following the sequencer operating procedure.
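The dilution steps above all follow the relation C1 × V1 = C2 × V2. The sketch below shows that arithmetic, together with the commonly used conversion from a dsDNA mass concentration to molarity (about 660 g/mol per base pair). The library concentration, mean fragment length and final volumes are hypothetical values picked for illustration; the text itself only states the target concentrations (4 nM, 20 pM, 1.1 pM).

```python
def dsdna_ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration from ng/uL to nM, assuming 660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) from C1*V1 = C2*V2; units must be consistent."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical pooled library: 2.31 ng/uL, mean fragment length 350 bp.
library_nm = dsdna_ng_per_ul_to_nm(2.31, 350)
print(f"library: {library_nm:.2f} nM")

# Dilute to 4 nM in an assumed final volume of 20 uL.
print(dilution_volumes(library_nm, 4.0, 20.0))

# Dilute the 20 pM denatured library to 1.1 pM in an assumed final volume of 1300 uL.
print(dilution_volumes(20.0, 1.1, 1300.0))
```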

2. Results

The sequencing data were analyzed, and the results are shown in FIGS. 3-5. FIG. 3 shows the sequencing data yield after the simulated Pooling of unequal amounts of standard quality control material; 5 samples were tested in each group, the abscissa is the library input at the different ratios (1:30:50:100), and the ordinate is the logarithm of the data volume. FIG. 4 shows the sequencing data yield ratios after the same Pooling: the smallest input is taken as 1×, and the ratio of each other data point to 1× is calculated; the abscissa is the library input at the different ratios, the ordinate is the ratio to 1×, and the different marker shapes represent the 5 different samples selected for Pooling. FIG. 5 shows the sequencing cross-contamination rate after the same Pooling; the abscissa is the standard quality control sample number (5 samples, each at 4 mass ratios, 20 samples in total), and the ordinate is the number of cross-contaminating sequences divided by total_reads (the contamination rate).

As can be seen from the figures, the data yield ratio is within 1:5 and the cross-contamination rate is below 1/10⁵, which shows that the method improves library data uniformity.
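The two metrics used in these results, the yield ratio relative to the 1× input and the cross-contamination rate, can be computed as in the sketch below. The read counts are invented for illustration and are not the measured data behind FIGS. 3-5; the function names are likewise hypothetical.

```python
def yield_ratios(reads_by_input):
    """Scale each library's read count to the library with the smallest input (the 1x point)."""
    base_input = min(reads_by_input)          # smallest input amount among the keys
    base_reads = reads_by_input[base_input]
    return {inp: reads / base_reads for inp, reads in reads_by_input.items()}


def cross_contamination_rate(cross_reads, total_reads):
    """Number of cross-contaminating sequences divided by total reads."""
    return cross_reads / total_reads


# Hypothetical read counts keyed by relative input (1:30:50:100).
reads_by_input = {1: 1_000_000, 30: 2_100_000, 50: 2_600_000, 100: 3_400_000}
ratios = yield_ratios(reads_by_input)
rate = cross_contamination_rate(cross_reads=12, total_reads=2_000_000)

print(ratios)
print(f"largest ratio to 1x: {max(ratios.values()):.1f}")
print(f"cross-contamination rate: {rate:.1e}")
```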

Example 2

Building on the experimental verification in example 1, this example uses clinical samples, in which pathogen differences lead to non-uniform library quality after probe capture, to verify the above method for improving library data uniformity systematically.

1. Method

1. Sample DNA extraction

DNA and RNA were extracted simultaneously from the same clinical sample according to conventional sample pretreatment and extraction methods. The DNA in the RNA nucleic acid fraction was digested with DNase I (source: Nanjinouzan Biotechnology Co., Ltd.; UltraClean ds-cDNA Synthesis Module (+gDNA wiper)) to obtain a sample containing only RNA. Human-specific probes (designed according to conventional methods, e.g., Li, N., Cai, Q., Miao, Q., et al. (2020). High-Throughput Metagenomics for Identification of Pathogens in the Clinical Settings. Small Methods, 5(1), 2000792) were hybridized with the human RNA to remove it while retaining the pathogen RNA; the pathogen RNA was then reverse transcribed to synthesize cDNA, and the DNA was added back.

2. Fragmentation

The sample DNA was fragmented by a transposase-based library construction method (source: Nanjinouzan Biotechnology Co., Ltd.; UltraClean Universal Plus DNA Library Prep Kit for Illumina V3), and sequencing primers were ligated at the break sites.

3. Pooling

The fragmented products were pooled at a mass ratio of 1:30:50:100.

3. Enrichment by amplification

The library with sequencing primers attached and the 1st specific primers were placed in an 8-tube strip. Before use, they were fully dissolved, vortexed to mix and briefly centrifuged. PCR amplification was performed according to the amplification system and program of example 1.

In this example, 4 different Indexes were likewise used to model sample libraries of different content levels in clinical samples.

4. Purification

Purification was performed according to the method of example 1, except that 100 μL of the amplified product was used and 0.9× DNA AMPure XP magnetic beads (90 μL) were added.

The flow from sample DNA extraction to purification steps described above is shown in FIG. 6.

5. Pooling

The purified sample libraries carrying 4 different Index tags were pooled at equal mass, 400 ng per library.

6. Capturing

The target region fragments of the mixed sample genome were first bound by biotin-labeled RNA probes or DNA probes (designed according to the protocol of CN202011636780.0), and the sample was then captured through the binding of streptavidin to the biotin on the probes.

7. Homogenization amplification

Four equal portions were taken from the captured sample and placed in separate tubes of an 8-tube strip, and an equal amount of 2nd specific primer was added to each portion. Before use, the reagents were fully dissolved, vortexed to mix and briefly centrifuged. Limiting PCR amplification was performed according to the limiting PCR amplification system and program of example 1.

8. Purification

Purification was performed according to the method of example 1.

9. Sequencing on machine

Library Pooling and sequencing preparation were performed with reference to the method of example 1; this was the experimental group.

A control group was set up at the same time, using conventional amplification with universal sequencing primers rather than homogenization amplification: after capture was complete, the captured library was amplified by ordinary PCR with a conventional universal sequencing primer pair instead of homogenization amplification, giving a captured, amplified library.

2. Results

The sequencing data were analyzed, and the results are shown in FIGS. 7-8. FIG. 7 shows the sequencing data yield of the unequal mixed library; the abscissa is the group and the ordinate is the data yield. The ratio of the maximum to the minimum data yield was 3439 in the control group and 5.8 in the experimental group.

FIG. 8 shows the ratio of pathogen detection sequences in the unequal mixed library; the ordinate is the ratio of pathogen detection sequences (experimental group/control group), with a median of 1.7.

As can be seen from the figures, the data yield ratio is within 1:5 and the cross-contamination rate is below 1/10⁵; the method can reduce the library data yield ratio from 1:1000 (0.1M:100M) to within 1:20, with a contamination rate below 5/10⁶.
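The group-level comparison reported here rests on two summary statistics: the ratio of the maximum to the minimum data yield within a group, and the median of the experimental/control ratios of pathogen detection sequences. The following sketch shows how such values could be computed; all numbers are hypothetical and do not reproduce the measured data, and the function names are illustrative.

```python
from statistics import median


def max_min_fold(yields):
    """Ratio of the largest to the smallest data yield within one group."""
    yields = list(yields)
    return max(yields) / min(yields)


def median_ratio(experimental, control):
    """Median of the per-target experimental/control ratios."""
    return median(e / c for e, c in zip(experimental, control))


# Hypothetical data yields (in millions of reads) for four libraries per group.
control_yields = [0.02, 1.5, 9.0, 60.0]
experimental_yields = [4.0, 6.5, 9.0, 20.0]

# Hypothetical pathogen read counts for three targets in each group.
experimental_pathogen_reads = [170, 34, 90]
control_pathogen_reads = [100, 20, 30]

print("control max/min:", max_min_fold(control_yields))
print("experimental max/min:", max_min_fold(experimental_yields))
print("median pathogen ratio:", median_ratio(experimental_pathogen_reads, control_pathogen_reads))
```

A smaller max/min value indicates a more uniform data yield across libraries, and a median ratio above 1 indicates that pathogen sequences were detected at least as well in the experimental group as in the control.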

Example 3

In examples 1-2, the homogenization amplification step was performed on the Index-tagged mixed sample obtained after 1st specific primer amplification, using split-tube amplification; in practice, however, this requires a large number of reactions and a large amount of reagents.

1. Method

This example follows example 1, with experimental verification performed on the standard quality control library.

The difference is that, in the homogenization amplification step, a mixture containing equal amounts of the 2nd specific primers designed for each sample Index was added directly to the unequal mixed library, and limiting amplification was performed with reference to the amplification system and program of example 1; the added 2nd specific primers were mixed primers, each at a final concentration of 0.2 μM in the system. The principle of the above limiting amplification and purification steps is shown in FIG. 9.
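Preparing such a primer mixture comes down to adding the same volume of every primer stock so that each ends up at 0.2 μM. The sketch below works this out for an assumed stock concentration (10 μM), an assumed reaction volume (50 μL) and an assumed count of 4 primers; only the 0.2 μM final concentration comes from the text, and the function name is hypothetical.

```python
def primer_mix_plan(n_primers, final_conc_um, reaction_volume_ul, stock_conc_um):
    """Volumes needed so that each of n primers reaches the same final concentration."""
    if final_conc_um > stock_conc_um:
        raise ValueError("final concentration exceeds stock concentration")
    per_primer_ul = final_conc_um * reaction_volume_ul / stock_conc_um  # C1*V1 = C2*V2
    return {
        "per_primer_ul": per_primer_ul,
        "total_primer_ul": per_primer_ul * n_primers,
        "total_primer_conc_um": final_conc_um * n_primers,
    }


# 0.2 uM per primer is stated in the text; the other values are assumptions.
plan = primer_mix_plan(n_primers=4, final_conc_um=0.2, reaction_volume_ul=50.0, stock_conc_um=10.0)
print(plan)
```

Because every primer is added at the same final concentration, each Index-specific primer is equally limiting within the shared reaction, which is the condition the mixed-tube approach relies on.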

2. Results

The data after sequencing are analyzed, the analysis results are shown in fig. 10-12, fig. 10 shows the output condition of sequencing data of simulated non-equal mixed libraries, each group is subjected to 5 sample detection, the abscissa shows the input amount of the libraries with different proportion relations, and the ordinate shows the data amount taking the logarithm.

FIG. 11 is a schematic diagram of the sequencing data yield ratios of the simulated unequally mixed libraries. The data yield at the minimum input is taken as the reference (1X), and the ratio of each other data yield to 1X is calculated. The abscissa is the library input amount at the different proportions, the ordinate is the ratio of each data yield to 1X, and the differently shaped data points represent the 6 different samples selected for Pooling.
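The normalization to the minimum-input library described for FIG. 11 can be sketched as follows. The function name `normalize_to_min_input` and the numbers are hypothetical and serve only to illustrate the calculation.

```python
def normalize_to_min_input(inputs, outputs):
    """Scale data yields so the library with the smallest input equals 1.0 (1X)."""
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    ref_index = min(range(len(inputs)), key=inputs.__getitem__)
    reference = outputs[ref_index]
    if reference <= 0:
        raise ValueError("reference yield must be positive")
    return [value / reference for value in outputs]


# Hypothetical library inputs (relative amounts) and data yields (reads).
library_inputs = [1, 10, 100, 1000]
data_yields = [2_000_000, 3_000_000, 5_000_000, 8_000_000]

print(normalize_to_min_input(library_inputs, data_yields))
```

Under this convention a perfectly homogenized pool would give values close to 1 for every input level.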

FIG. 12 shows the sequencing cross-contamination rate of the simulated unequally mixed libraries. The abscissa is the standard quality control number (6 samples, each comprising 4 mass fractions, 24 in total), and the ordinate is the contamination rate, that is, the number of cross-contamination sequences divided by total_reads.
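The contamination rate defined for FIG. 12 is a simple quotient, sketched below. The function name, the read counts and the use of 1/10⁴ as a check threshold are illustrative assumptions.

```python
def contamination_rate(cross_reads, total_reads):
    """Cross-contamination sequences divided by total reads for one library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if cross_reads < 0 or cross_reads > total_reads:
        raise ValueError("cross_reads must lie between 0 and total_reads")
    return cross_reads / total_reads


THRESHOLD = 1e-4  # assumed acceptance limit, used here only for illustration

# Hypothetical counts: (cross-contamination reads, total reads) per library.
libraries = {"QC01": (12, 5_000_000), "QC02": (40, 2_000_000)}

for name, (cross, total) in libraries.items():
    rate = contamination_rate(cross, total)
    print(name, rate, "below threshold" if rate < THRESHOLD else "above threshold")
```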

As can be seen from the figures, the data yield ratio is less than 1:5 and the cross-contamination rate is less than 1/10⁴. This shows that the method can improve library data uniformity while reducing experimental difficulty, simplifying the operation and lowering reagent cost, although the cross-contamination is slightly higher than in Example 1.

Example 4

In this example, the homogenization amplification step was performed by mixed-tube amplification.

1. Method

This example follows Example 2, with experimental verification performed on clinical specimens. The difference is that, in the homogenization amplification step, an equal amount of a mixture of the 2nd specific primers designed for each sample Index was added directly to the unequally mixed library, and limiting amplification was performed with reference to the amplification system and procedure of Example 2. The 2nd specific primers were added as a primer mixture in which each primer had a final concentration of 0.2 μM in the system.

2. Results

The sequencing data were analyzed, and the results are shown in FIGS. 13-14. FIG. 13 shows the sequencing data yield of the simulated unequally mixed library; the abscissa shows the different groups and the ordinate shows the data yield. The ratio of the maximum data yield to the minimum data yield was 31 in the control group and 9 in the experimental group.

FIG. 14 shows the ratio of pathogen detection sequences in the simulated unequally mixed library; the ordinate is the ratio of pathogen detection sequences (experimental group/control group, median 3.5).

As can be seen from the figures, the data yield ratio is less than 1:5 and the cross-contamination rate is less than 5/10⁴, which shows that the method can reduce the data yield ratio between libraries from 1:1000 (0.1M:100M) to less than 1:20, with a contamination rate below 5/10⁴.
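The median experimental/control ratio of pathogen detection sequences reported for FIG. 14 can be obtained as in the sketch below. It assumes the ratio is computed per sample and then summarized by the median; this pairing, the function name and the counts are assumptions made for illustration.

```python
from statistics import median


def median_ratio(experimental, control):
    """Median of per-sample experimental/control pathogen read ratios."""
    if len(experimental) != len(control) or not control:
        raise ValueError("need paired, non-empty lists")
    if any(value <= 0 for value in control):
        raise ValueError("control counts must be positive")
    ratios = [e / c for e, c in zip(experimental, control)]
    return median(ratios)


# Hypothetical pathogen read counts for the same samples in both groups.
experimental_counts = [700, 350, 1200, 90]
control_counts = [200, 100, 300, 30]

print("median experimental/control ratio:", median_ratio(experimental_counts, control_counts))
```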

Claims

1. A method for improving library data uniformity comprising the steps of:

Pooling: mixing the samples with the Index sequences to obtain mixed samples;

2. The method for improving library data uniformity according to claim 1, wherein in said amplification and enrichment step, the final concentration of said 1st specific primer in the PCR amplification system is 1-2 μM per primer;

and/or, in the homogenizing amplification step, the final concentration of each 2nd specific primer in the limiting PCR amplification system is 0.05 to 2 μM, preferably 0.05 to 1 μM, more preferably 0.1 to 0.5 μM.

3. The method for improving library data uniformity according to claim 1, further comprising a capturing step between said Pooling step and said homogenizing and amplifying step, said capturing step being: capturing the target region fragment of the mixed sample genome with an RNA probe or a DNA probe.

4. The method for improving library data uniformity according to claim 3, wherein in the capturing step, the target region fragments of the mixed sample genome are first bound by a biotin-labeled RNA probe or DNA probe, and are then captured with streptavidin-bearing magnetic beads, which bind the biotin on the RNA probe or DNA probe.

5. The method for improving library data uniformity according to claim 1, wherein in said homogenizing amplification step, a number of aliquots equal to the number of samples are taken from the mixed sample, an equal amount of a 2nd specific primer is added to each aliquot respectively, limiting PCR amplification is performed, and the amplified products are pooled to obtain a library for on-machine sequencing;

or an equal amount of the 2nd specific primer mixture is added to the mixed sample, and limiting PCR amplification is performed on the mixed sample as a whole to obtain the library for on-machine sequencing.

6. The method for improving library data uniformity according to claim 1, wherein at least one of the following conditions is met:

1) In the amplification and enrichment step, a purification step is further included after the PCR amplification, wherein the purification step may be: magnetic bead separation to obtain a library with Index sequences that falls within the fragment size range;

2) In the homogenizing amplification step, a purification step is further included after the limiting PCR amplification, wherein the purification step may be: magnetic bead separation to obtain a library with Index sequences that falls within the fragment size range;

3) In the fragmentation step, the sample DNA is broken with a transposase and universal sequencing primers are ligated at the break.

7. The method for improving library data uniformity according to claim 1, wherein in said amplification and enrichment step, the PCR amplification is performed according to the following procedure: 98 ℃ for 30 s; 13±3 cycles of 98 ℃ for 10 s, 60 ℃ for 30 s and 72 ℃ for 30 s; 72 ℃ for 5 min; then cooling to 4 ℃ for storage;

and/or, in the homogenizing amplification step, the limiting PCR amplification is performed according to the following procedure: 98 ℃ for 45 s; 25±5 cycles of 98 ℃ for 10 s, 65 ℃ for 15 s and 72 ℃ for 15 s; 72 ℃ for 1 min; then cooling to 4 ℃ for storage.

8. Use of the method of any one of claims 1-7 in the preparation of a reagent for library construction.

9. The use according to claim 8, wherein the library is used for mNGS detection and the DNA sample is prepared by the following method:

(4) And (5) adding the initial DNA of the sample back to obtain the DNA.

10. An NGS detection reagent for improving library data uniformity, comprising the 1st specific primer and the 2nd specific primer of the method according to any one of claims 1 to 7, preferably wherein the 1st specific primer has a final concentration of 1 to 2 μM per primer and the 2nd specific primer has a final concentration of 0.05 to 2 μM, preferably 0.05 to 1 μM, more preferably 0.1 to 0.5 μM.