CN118086468A - Method for improving library data uniformity and application - Google Patents
- Publication number
- CN118086468A (application CN202410128510.0A)
- Authority
- CN
- China
- Prior art keywords
- amplification
- library
- sample
- specific primer
- pcr amplification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Abstract
The invention discloses a method for improving library data uniformity and an application thereof, belonging to the technical field of NGS sequencing. The method comprises the following steps. Fragmentation: sample DNA is fragmented, and sequencing primers are ligated at the break sites. Amplification and enrichment: a 1st specific primer is added for PCR amplification, wherein the 1st specific primer comprises a linker sequence, an Index sequence and a complementary sequence, the complementary sequence is complementary to the sequencing primer, and the Index sequence is a sample-specific tag. Pooling: the samples carrying Index sequences are mixed. Homogenization amplification: equal amounts of 2nd specific primers are added to the mixed sample for limiting PCR amplification; the 2nd specific primer is complementary to the linker sequence and the Index sequence in the 1st specific primer, and is added in an amount less than that required for full amplification of the target region fragments of all samples. Constructing and sequencing libraries by this method improves data uniformity and reduces differences in sequencing data quality.
Description
Technical Field
The invention relates to the technical field of NGS sequencing, in particular to a method for improving library data uniformity and application thereof.
Background
Etiological diagnosis is a vital part of diagnosing infectious diseases, and the value of accurate diagnosis is beyond doubt. However, traditional pathogen detection techniques suffer from long detection cycles and low positive rates, while molecular diagnostic techniques suffer from false positives and false negatives, a narrow detection range, and similar problems.
With the spread of metagenomic sequencing, metagenomic next-generation sequencing (mNGS) offers high throughput, broad coverage, freedom from bias, speed and accuracy. It is increasingly used for pathogen detection in clinical infectious diseases and is of considerable clinical significance for complex and suspected infection samples such as respiratory tract specimens, blood and sterile body fluids.
Since mNGS was first used in 2014 to detect Leptospira infection in cerebrospinal fluid, research and literature reports on mNGS have increased steadily both in China and abroad, and the technique has gradually spread into clinical diagnostic practice. By 2016, mNGS detection was already in clinical use in China, and it has rapidly become a hot topic in the accurate diagnosis of clinical pathogenic microorganisms.
However, because the pathogen load differs between the samples in a mixed library, library concentrations differ between captured samples, so the amount of data obtained for each sample after demultiplexing varies widely. A technique for homogenizing captured samples is therefore needed so that the sequencing output is uniform across samples, ensuring detection sensitivity and accurate pathogen interpretation for each sample.
Disclosure of Invention
Because the content of each sample or target sequence differs, the sequencing output differs between the samples in the mixed library obtained after library pooling, which reduces detection sensitivity. To address this problem, the invention provides a method for improving library data uniformity.
The invention provides a method for improving library data uniformity, comprising the following steps:
Fragmentation: taking sample DNA, fragmenting it, and ligating a sequencing primer at the break sites;
Amplification and enrichment: adding a 1st specific primer for PCR amplification, wherein the 1st specific primer comprises a linker sequence, an Index sequence and a complementary sequence, the complementary sequence is complementary to the sequencing primer, and the Index sequence is a specific tag sequence of the sample;
Pooling: mixing the samples carrying Index sequences to obtain a mixed sample;
Homogenization amplification: adding equal amounts of 2nd specific primers to the mixed sample and performing limiting PCR amplification to obtain an on-machine (sequencing-ready) library; the 2nd specific primer is complementary to the linker sequence and the Index sequence in the 1st specific primer, and is added in an amount less than that required for full amplification of all samples.
In this method, a 1st specific primer is first designed for each sample, with a sample-specific Index sequence serving as the tag, and a first library is prepared. In the homogenization amplification step, the concentration of the 2nd specific primers and the number of amplification cycles are then controlled: limited, equal amounts of the 2nd specific primers are used, and amplification starts from the Index of each first-library sequence. Because resources in the amplification system are limited, samples with different starting template amounts amplify with different efficiencies. In the later cycles the amplification efficiency of high-template samples falls while that of low-template samples remains comparatively high, so the product amounts from different starting templates reach the same plateau phase and PCR products of the same concentration level are generated. Product concentrations are thereby made uniform, and differences in sequencing data quality are reduced.
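The plateau behaviour described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the patent's data or model: each sample is given its own equal, limited pool of Index-specific primer, each new product strand consumes one unit of primer, and the per-cycle efficiency (0.9) and primer amount (1000 arbitrary units) are invented for illustration. The 1:30:50:100 inputs mirror the pooling ratio used in Example 1.

```python
def simulate_limited_pcr(templates, primer_per_sample, cycles, efficiency=0.9):
    """Toy model of primer-limited PCR (illustrative assumption, not from the patent).

    templates: starting template amount per sample (arbitrary units).
    primer_per_sample: equal, limited primer amount available to every sample.
    Each new product unit consumes one unit of that sample's primer.
    """
    products = list(templates)
    primers = [primer_per_sample] * len(templates)
    for _ in range(cycles):
        for i in range(len(products)):
            wanted = products[i] * efficiency  # growth if primer were unlimited
            made = min(wanted, primers[i])     # capped by the remaining primer
            products[i] += made
            primers[i] -= made
    return products


inputs = [1.0, 30.0, 50.0, 100.0]  # 1:30:50:100 starting ratio
out = simulate_limited_pcr(inputs, primer_per_sample=1000.0, cycles=25)
fold_before = max(inputs) / min(inputs)
fold_after = max(out) / min(out)
print(f"fold difference before: {fold_before:.0f}, after: {fold_after:.2f}")
```

In this model every sample stops amplifying once its primer pool is exhausted, so the final amounts are set mainly by the (equal) primer input rather than by the starting template, which is the intuition behind using limited, equal primer amounts.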
In some embodiments, in the amplification enrichment step, the final concentration of each 1st specific primer in the PCR amplification system is 1-2 μM.
In some embodiments, in the homogenization amplification step, the final concentration of each 2nd specific primer in the limiting PCR amplification system is 0.05-2 μM, preferably 0.05-1 μM, more preferably 0.1-0.5 μM.
The final concentration is the reaction concentration of the specific primer in the reaction system, and may also be called the working concentration. The final concentration of the 2nd specific primer must be kept below the amount required for full amplification of the sample with the highest content of target region fragments, so that limiting amplification is achieved; the specific amount can be adjusted to actual conditions.
Within the above concentration range, each primer pair amplifies optimally and non-equivalent libraries are homogenized most effectively, while enzyme activity is not readily affected and secondary structures are not readily formed. Taking both overall amplification performance and cost into account, this is the preferred concentration range.
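As a small worked example of the final (working) concentration defined above, the dilution relation C1·V1 = C2·V2 gives the volume of primer stock to add. The 50 μL reaction volume and 10 μM stock concentration below are assumptions chosen for illustration; only the 0.1-0.5 μM range comes from the text.

```python
def primer_stock_volume_ul(final_um, reaction_ul, stock_um):
    """Volume of primer stock (uL) that yields the desired final concentration.

    Uses C1 * V1 = C2 * V2; all concentrations in uM, volumes in uL.
    """
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_um * reaction_ul / stock_um


def in_range(value, low, high):
    """True if value lies within the closed interval [low, high]."""
    return low <= value <= high


# Assumed values for illustration: 50 uL reaction, 10 uM primer stock.
target_um = 0.2
vol = primer_stock_volume_ul(final_um=target_um, reaction_ul=50.0, stock_um=10.0)
print(f"add {vol:.2f} uL of stock per primer")
print("within the more preferred 0.1-0.5 uM range:", in_range(target_um, 0.1, 0.5))
```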
In some embodiments, a capture step is further included between the pooling step and the homogenization amplification step: the target region fragments of the mixed sample genome are captured with an RNA probe or a DNA probe. It will be appreciated that the RNA or DNA probes are of conventional design, as described in the reference (BacCapSeq: a Platform for Diagnosis and Characterization of Bacterial Infections. DOI: 10.1128/mBio.02007-18).
In some embodiments, in the capture step, the target region fragments of the mixed sample genome are bound by a biotin-labeled RNA or DNA probe, and streptavidin-coated magnetic beads then bind the biotin on the probe to capture the sample. It will be appreciated that capture may also be carried out by other conventional means, such as antigen-antibody binding or solid-phase capture. However, the affinity between biotin and streptavidin is far higher than that between a typical antigen and antibody, and the pair offers good binding stability, strong specificity and high sensitivity.
In some embodiments, in the homogenization amplification step, a number of aliquots equal to the number of samples is taken from the mixed sample, a 2nd specific primer is added to each aliquot, limiting PCR amplification is performed, and the amplified products are pooled to obtain the on-machine library.
This form of homogenization amplification is separate-tube amplification: the mixed sample is divided into several aliquots, each aliquot is amplified with the 2nd specific primer of one sample (distinguished by recognition of the Index sequence), and the 2nd specific primers are added in equal amounts to achieve limiting homogenization amplification.
In some embodiments, in the homogenization amplification step, a mixture containing equal amounts of the 2nd specific primers is added to the mixed sample, and limiting PCR amplification is performed on all samples simultaneously to obtain the on-machine library.
This form of homogenization amplification is mixed-tube amplification: the mixed sample is amplified directly with a mixture of the 2nd specific primers corresponding to each sample, so that limiting homogenization amplification takes place in a single reaction system.
In some embodiments, the amplification enrichment step further comprises a purification step after the PCR amplification, which may be: using magnetic bead size selection to obtain libraries that carry Index sequences and fall within the required fragment size range.
In some embodiments, the homogenization amplification step further comprises a purification step after the limiting PCR amplification, which may be: using magnetic bead size selection to obtain libraries that carry Index sequences and fall within the required fragment size range.
It will be appreciated that the above purification steps are carried out in a manner conventional in the art.
In some embodiments, the required fragment size range for the amplification products is 100-600 bp.
In some embodiments, in the fragmentation step, the sample DNA is fragmented with a transposase and universal sequencing primers are ligated at the break sites. It will be appreciated that library fragmentation and sequencing primer ligation may be performed by methods common in the art, such as adaptor-ligation library construction or PCR amplicon library construction, but using a transposase has the advantages of a low input requirement and a short library construction time.
In some embodiments, in the amplification enrichment step, the PCR amplification is performed according to the following program: 98 °C for 30 s; 13 ± 3 cycles of 98 °C for 10 s, 60 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min, followed by cooling to 4 °C for storage.
In some embodiments, in the homogenization amplification step, the limiting PCR amplification is performed according to the following program: 98 °C for 45 s; 25 ± 5 cycles of 98 °C for 10 s, 65 °C for 15 s and 72 °C for 15 s; then 72 °C for 1 min, followed by cooling to 4 °C for storage.
Under these limiting amplification conditions, designing the sequences of the 2nd specific primers, controlling the primer input and controlling the number of amplification cycles allow the different 2nd specific primers to amplify well under limiting conditions and yield products with highly uniform concentrations.
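The two cycling programs above can be written down as plain data, which makes it easy to compare them or to compute the programmed hold time. This is a minimal sketch: the class and field names are hypothetical, the nominal cycle counts (13 and 25) are taken from the midpoints of the stated ranges, and ramp times and the 4 °C storage hold are ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: int
    seconds: int


@dataclass
class Program:
    initial: Step   # initial denaturation
    cycle: list     # steps repeated in every cycle
    cycles: int     # nominal cycle count
    final: Step     # final extension

    def hold_seconds(self):
        """Total programmed hold time, excluding ramping and the 4 C storage hold."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# Amplification enrichment: 98 C 30 s; 13 x (98 C 10 s, 60 C 30 s, 72 C 30 s); 72 C 5 min
enrichment = Program(Step(98, 30), [Step(98, 10), Step(60, 30), Step(72, 30)], 13, Step(72, 300))
# Limiting (homogenization) PCR: 98 C 45 s; 25 x (98 C 10 s, 65 C 15 s, 72 C 15 s); 72 C 1 min
homogenization = Program(Step(98, 45), [Step(98, 10), Step(65, 15), Step(72, 15)], 25, Step(72, 60))

for name, prog in [("enrichment", enrichment), ("homogenization", homogenization)]:
    print(f"{name}: {prog.cycles} cycles, {prog.hold_seconds() / 60:.1f} min of programmed holds")
```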
The above method for improving library data uniformity is used for non-diagnostic and non-therapeutic purposes.
In another aspect, the invention also provides the use of the above method for preparing a reagent for library construction.
In some embodiments, the library is used for mNGS assays and the sample DNA is prepared by the following method:
(1) Extracting DNA and RNA from a sample, and removing the DNA by DNase I digestion;
(2) Hybridizing host-specific probes to the host RNA, and retaining the pathogen RNA;
(3) Reverse transcribing the pathogen RNA to obtain cDNA strands;
(4) Adding the original DNA of the sample back to obtain the sample DNA.
It will be appreciated that RNA and DNA may be extracted from the sample by conventional RNA or DNA extraction methods.
In another aspect, the invention also provides an NGS detection reagent for improving library data uniformity, comprising the 1st specific primer and the 2nd specific primer of the above method. Preferably, the final concentration of each 1st specific primer is 1-2 μM, and the final concentration of the 2nd specific primer is 0.05-2 μM, preferably 0.05-1 μM, more preferably 0.1-0.5 μM.
On the basis of conforming to the common knowledge in the field, the above preferred conditions can be arbitrarily combined to obtain the preferred examples of the invention.
The reagents and materials used in the present invention are commercially available.
The beneficial effects of the invention are as follows:
In the method for improving library data uniformity, a 1st specific primer is designed for each sample, with a sample-specific Index sequence serving as the tag, and a first library is prepared. In the homogenization amplification step, the concentration of the 2nd specific primers and the number of amplification cycles are controlled: limited, equal amounts of the 2nd specific primers are used, and amplification starts from the Index of each first-library sequence. Because resources in the amplification system are limited, samples with different starting template amounts amplify with different efficiencies. The amplification efficiency of high-template samples falls while that of low-template samples stays high, so in the later cycles the product amounts from different starting templates reach the same plateau phase and PCR products of the same concentration level are generated. Product concentrations are thereby made uniform, and differences in sequencing data quality are reduced.
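The text does not define a numerical measure of "data uniformity". One common way to quantify it, shown here purely as an assumption, is the coefficient of variation (CV) of the per-sample data fractions together with the max/min fold difference. The read counts below are hypothetical and are not results from the patent.

```python
from statistics import mean, pstdev


def uniformity(read_counts):
    """Return (CV of per-sample data fractions, max/min fold difference)."""
    total = sum(read_counts)
    fractions = [count / total for count in read_counts]
    cv = pstdev(fractions) / mean(fractions)
    fold = max(read_counts) / min(read_counts)
    return cv, fold


# Hypothetical per-sample read counts, for illustration only.
uneven = [1_000, 30_000, 50_000, 100_000]
even = [100_100, 103_000, 105_000, 110_000]

for label, counts in [("uneven", uneven), ("even", even)]:
    cv, fold = uniformity(counts)
    print(f"{label}: CV = {cv:.2f}, max/min = {fold:.1f}")
```

A lower CV and a fold difference closer to 1 indicate that sequencing output is spread more evenly across the pooled samples.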
Drawings
FIG. 1 is a schematic diagram of the PCR amplification and purification process in Example 1.
FIG. 2 is a schematic diagram of the principle of homogenization amplification in Example 1.
FIG. 3 shows the sequencing data yield after pooling the simulated non-equivalent standard quality control libraries in Example 1.
FIG. 4 shows the sequencing data yield ratios after pooling the simulated non-equivalent standard quality control libraries in Example 1.
FIG. 5 shows the sequencing cross-contamination rate after pooling the non-equivalent standard quality control libraries in Example 1.
FIG. 6 is a schematic diagram of the sample extraction and purification process in Example 2.
FIG. 7 shows the sequencing data yield of the non-equivalent mixed library in Example 2.
FIG. 8 shows the proportion of pathogen detection sequences in the non-equivalent mixed library in Example 2.
FIG. 9 is a schematic diagram of the principle of mixed-tube limiting amplification in Example 3.
FIG. 10 shows the sequencing data yield of the simulated non-equivalent mixed library in Example 3.
FIG. 11 shows the sequencing data yield ratios of the simulated non-equivalent mixed library in Example 3.
FIG. 12 shows the sequencing cross-contamination rate of the simulated non-equivalent mixed library in Example 4.
FIG. 13 shows the sequencing data yield of the simulated non-equivalent mixed library in Example 4.
FIG. 14 shows the proportion of pathogen detection sequences in the simulated non-equivalent mixed library in Example 4.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit its scope. Experimental methods for which specific conditions are not given in the following examples were carried out according to conventional methods and conditions, or according to the product instructions.
Definitions:
Index: in next-generation sequencing, an artificially designed synthetic nucleic acid sequence that cannot be aligned to any genome present in the sample under test. It is used to distinguish samples during sequencing and is also called a molecular tag.
Amplification enzyme for the PCR amplification system: Entrans 2X qPCR Probe Master Mix, ABclonal Biotechnology (Wuhan) Inc.
Amplification enzyme for the homogenization PCR amplification system: KAPA HiFi HotStart ReadyMix, Roche (China) Holding Ltd.
Example 1
In this example, to reduce differences in experimental results caused by sample complexity, experimental verification was performed using a standard quality control library.
1. Methods
1) Fragmentation
A standard quality control product was used. It is a gene fragment obtained by amplifying a selected region of Arabidopsis thaliana and serves as a standard sample of known content and base sequence, with a fragment size of 100-600 bp. The standard quality control product was prepared according to the method of patent application CN 202210543900.5 (a quality control method for library tag primers and application thereof): after amplification with specific primers, sequencing primer sequences were ligated to both ends of the sequence.
2) Amplification and enrichment
The standard quality control product and the 1st specific primers were placed in an eight-tube strip; the reagents were fully thawed, vortexed to mix and briefly centrifuged before use. PCR amplification was performed according to the following amplification system (Table 1) and program (Table 2).
The 1st specific primer contains a linker sequence, an Index sequence and a complementary sequence. In this example the Index sequence is the sample-specific tag sequence and the complementary sequence is complementary to the sequencing primer. The Index sequence was designed to be 14 nt long so that it provides sufficient specificity for the 2nd specific primer to achieve the homogenization effect; in practice the length can be chosen according to specific experimental requirements. The complementary sequence was likewise chosen to pair with 14 nt of the sequencing primer sequence to increase the primer pairing success rate.
In this example, 4 different Index sequences were used to simulate 4 different sample libraries.
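To illustrate why well-separated Index sequences matter for telling samples apart, the sketch below checks the pairwise Hamming distance of four 14-nt tags and assigns an observed tag to a sample while tolerating one mismatch. The sequences are hypothetical placeholders invented for this example (the text does not list the actual Index sequences), and the one-mismatch tolerance is likewise an assumption.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(sequences):
    """Smallest Hamming distance between any two sequences."""
    return min(hamming(a, b) for a, b in combinations(sequences, 2))


def assign(read_index, indexes, max_mismatch=1):
    """Return the unique sample whose Index is within max_mismatch, else None."""
    hits = [name for name, seq in indexes.items()
            if hamming(read_index, seq) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 14-nt Index sequences, invented for illustration only.
INDEXES = {
    "sample_1": "ACGTACGTACGTAC",
    "sample_2": "TGCATGCATGCATG",
    "sample_3": "GATCGATCGATCGA",
    "sample_4": "CTAGCTAGCTAGCT",
}

print("minimum pairwise distance:", min_pairwise_distance(list(INDEXES.values())))
print("assigned to:", assign("ACGTACGTACGTAA", INDEXES))
```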
TABLE 1 PCR amplification System
TABLE 2 PCR amplification cycling program
3) Purification
A: After amplification, the eight-tube strip was left to stand at low temperature for a period of time and then briefly centrifuged in a palm-sized centrifuge to reduce aerosol contamination.
B: The amplified product was made up to 50 μL, and 1.0× (50 μL) DNA AMPure XP magnetic beads were added. The beads were equilibrated at room temperature for 30 min and mixed until homogeneous before use. After the beads were added, the tube was shaken to mix and left to stand at room temperature for 5 min so that the amplified product came into full contact with the beads.
C: After brief centrifugation, the eight-tube strip was placed on a magnetic plate and left to stand for 3 min, and the supernatant was discarded using a pipette.
D: 80% ethanol was prepared and 200 μL was added to each tube of the eight-tube strip; the tubes were rotated two full turns and the supernatant was discarded. This step was repeated once, followed by brief centrifugation, and the residual ethanol was removed with a small-volume pipette.
E: With the caps open, the beads were dried at 37 °C on a dry-bath incubator or air-dried at room temperature until the bead surface was matte (residual ethanol can inhibit subsequent enzymatic reactions, while drying the beads until they crack reduces the nucleic acid elution efficiency).
F: 20 μL of nuclease-free (NF) water was added, the tube was shaken to mix, and the beads were incubated at room temperature for 2 min for elution.
G: After brief centrifugation, the strip was placed on a magnetic plate and left to stand for 2 min, and 17 μL of the supernatant was transferred to a new eight-tube strip to obtain a library carrying Index identification tags.
The PCR amplification and purification process described above is shown in FIG. 1.
4)Pooling
The amplified and enriched libraries were pooled at a mass ratio of 1:30:50:100 to form a non-equivalent mixed library.
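Pooling by mass ratio comes down to dividing each target mass by the library concentration. The sketch below computes the pipetting volumes for the 1:30:50:100 ratio; the library concentrations and the 1 ng unit mass are hypothetical values chosen only to make the arithmetic concrete.

```python
def pooling_volumes_ul(conc_ng_per_ul, ratio, unit_mass_ng):
    """Volume of each library (uL) so that pooled masses follow the given ratio.

    The mass for library i is ratio[i] * unit_mass_ng; volume = mass / concentration.
    """
    if len(conc_ng_per_ul) != len(ratio):
        raise ValueError("one concentration is required per ratio entry")
    return [r * unit_mass_ng / c for c, r in zip(conc_ng_per_ul, ratio)]


ratio = [1, 30, 50, 100]           # mass ratio used for pooling in this example
conc = [10.0, 12.0, 8.0, 20.0]     # hypothetical library concentrations, ng/uL
vols = pooling_volumes_ul(conc, ratio, unit_mass_ng=1.0)

for r, c, v in zip(ratio, conc, vols):
    print(f"ratio {r:>3}: {v:.2f} uL at {c:.1f} ng/uL -> {v * c:.1f} ng")
```

Very small volumes (such as the first library here) are hard to pipette accurately, so in practice the low-ratio library would typically be pre-diluted; that step is outside this sketch.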
5) Homogenization amplification
Because the samples in this example are simulated standard samples, capture of target region fragments is not required, and homogenization amplification was performed directly, as follows:
Four equal aliquots were taken from the mixed sample above, and an equal amount of 2nd specific primer was added to each aliquot. The primers were fully thawed, vortexed to mix and briefly centrifuged before use, and limiting PCR amplification was performed according to the following amplification system (Table 3) and program (Table 4).
TABLE 3 Homogenization PCR amplification system (per sample)
TABLE 4 PCR amplification cycling program
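Why a limiting amount of Index-specific primer evens out unequal inputs can be illustrated with a toy model. The sketch below is not the kinetics of this method: it assumes perfect doubling per cycle and that each new strand consumes one primer molecule, and the copy numbers and primer limit are hypothetical.

```python
def limiting_pcr(template_copies, primer_copies, cycles):
    """Toy model of primer-limited PCR: product doubles each cycle until
    the Index-specific primer pool is used up, then amplification stops."""
    product = float(template_copies)
    primers = float(primer_copies)
    for _ in range(cycles):
        new_strands = min(product, primers)  # cannot exceed remaining primer
        product += new_strands
        primers -= new_strands
        if primers <= 0:
            break
    return product


# Hypothetical starting copies for libraries pooled at 1:30:50:100.
inputs = {"1x": 1e6, "30x": 3e7, "50x": 5e7, "100x": 1e8}
primer_limit = 1e10  # same hypothetical primer amount for every Index
yields = {k: limiting_pcr(v, primer_limit, cycles=25) for k, v in inputs.items()}
spread = max(yields.values()) / min(yields.values())
```

In this model the 100-fold difference in input shrinks to a spread close to 1, because every library plateaus near the same primer-determined ceiling.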
5) Purification
A: After amplification, let the 8-strip tube stand at low temperature for a period of time, then centrifuge it briefly in a palm-sized centrifuge to reduce aerosol contamination.
B: To 50 μL of amplified product, add 1.6× (80 μL) DNA AMPure XP magnetic beads, which are equilibrated at room temperature for 30 min before use. After adding the beads, mix by shaking and let stand at room temperature for 5 min so that the amplified product fully contacts the beads.
C: After brief centrifugation, place the 8-strip tube on a magnetic plate, let it stand for 3 min, and discard the supernatant with a pipette.
D: Prepare 80% ethanol, add 200 μL to each tube of the 8-strip tube, rotate the tube through 2 full turns, and discard the supernatant; repeat this step once, then centrifuge briefly and remove the residual ethanol with a small-volume pipette.
E: Keep the lids open and dry at 37 °C on a dry-bath incubator, or air-dry at room temperature, until the bead surface is matte (residual ethanol can inhibit subsequent enzymatic reactions, while drying until the beads crack reduces the nucleic acid elution rate).
F: Add 20 μL of NF water, mix by shaking, and incubate at room temperature for 2 min to elute.
G: After brief centrifugation, place the tube on a magnetic plate and let it stand for 2 min, then transfer 17 μL of the supernatant to a new 8-strip tube to obtain the quality-control first sequencing library carrying the Index identification tag, i.e. the first standard library.
6) Sequencing on machine
A: Enter the concentration of the first standard library into the sequencing run information sheet, perform library pooling so that the library concentrations are equalized, and combine all diluted libraries in one tube. The procedure from limiting amplification through purification and pooling is shown in FIG. 2, where UD1, UD2, ..., UDn represent different samples.
B: Dilute an aliquot of 1 N NaOH solution to 0.2 N with NF water, and dilute the pooled library to 4 nM with HT1 high-throughput sequencing buffer.
C: Library denaturation: transfer a defined volume of the 4 nM library to a new 1.5 mL EP tube, add a defined volume of 0.2 N NaOH solution, mix by shaking, and denature at room temperature for 5 min, vortexing twice during this time.
D: Dilute the denatured library to 20 pM with HT1 high-throughput sequencing buffer, then dilute further to 1.1 pM.
E: Allocate a data volume of 2M to each library, and sequence the libraries following the instrument's operating procedure.
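The dilutions in steps B and D follow the usual relation C1·V1 = C2·V2. The sketch below shows that arithmetic only; the final volumes are hypothetical, and for simplicity it ignores the volume of NaOH added during denaturation, which the text does not specify.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) from C1 * V1 = C2 * V2.
    Both concentrations use one unit; both volumes use one unit."""
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target must be positive and not exceed the stock")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# 1 N NaOH -> 0.2 N, for a hypothetical final volume of 100 uL.
naoh = dilution_volumes(1.0, 0.2, 100.0)
# 4 nM (4000 pM) library -> 20 pM, for a hypothetical final volume of 1000 uL.
to_20_pm = dilution_volumes(4000.0, 20.0, 1000.0)
# 20 pM -> 1.1 pM, for a hypothetical final volume of 1300 uL.
to_1_1_pm = dilution_volumes(20.0, 1.1, 1300.0)
```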
2. Results
The sequencing data were analyzed, and the results are shown in FIGS. 3-5. FIG. 3 shows the sequencing data output after unequal pooling of the simulated standard quality-control materials; 5 samples were tested in each group. The abscissa is the library input amount at the different ratios (1:30:50:100), and the ordinate is the log-transformed data volume.
FIG. 4 shows the ratio of sequencing data output after unequal pooling of the simulated standard quality-control materials. The library with the smallest input is taken as 1×, and the ratio of each other library's data to the 1× library is calculated. The abscissa is the library input amount at the different ratios, the ordinate is the ratio of the other data to 1×, and the differently shaped data points represent the 5 different samples selected for pooling.
FIG. 5 shows the sequencing cross-contamination rate after unequal pooling of the standard quality-control materials. The abscissa is the standard quality-control material number (5 samples, each at 4 mass ratios, 20 samples in total), and the ordinate is the number of cross-contaminating sequences divided by total_reads (the contamination rate).
As can be seen from the figures, the data yield ratio is less than 1:5 and the cross_contamination is less than 1/10^5, which shows that the method can improve the uniformity of library data.
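The two metrics plotted in FIGS. 4 and 5 can be computed as sketched below. The definitions follow the figure descriptions (output relative to the 1× library, and cross-contaminating sequences divided by total_reads); the function names and all read counts are hypothetical placeholders, not measured data.

```python
def yield_ratios(reads_by_input):
    """Normalize each library's read count to the library with the
    smallest input amount (the 1x library)."""
    baseline = reads_by_input[min(reads_by_input)]
    return {amount: reads / baseline for amount, reads in reads_by_input.items()}


def contamination_rate(cross_contaminating_reads, total_reads):
    """Cross-contaminating sequences divided by total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return cross_contaminating_reads / total_reads


# Hypothetical read counts keyed by relative input amount (1:30:50:100).
reads = {1: 1_500_000, 30: 2_400_000, 50: 3_000_000, 100: 4_500_000}
ratios = yield_ratios(reads)
rate = contamination_rate(12, 2_000_000)  # hypothetical counts
```

With these placeholder numbers the largest ratio stays below 5 and the rate stays below 1/10^5, which is the form of the acceptance check described above.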
Example 2
Building on the experimental verification in Example 1, this example uses clinical samples. Because pathogen differences lead to non-uniform library mass after probe capture, the above method for improving library data uniformity was systematically verified on clinical samples.
1. Method
1. Sample DNA extraction
DNA and RNA were extracted simultaneously from the same clinical sample by conventional sample pretreatment and extraction methods. DNA in the RNA fraction was digested with DNase I (source: Nanjinouzan Biotechnology Co., Ltd.; UltraClean ds-cDNA Synthesis Module (+gDNA wiper)) to obtain a sample containing only RNA. Human-specific probes (designed by conventional methods, e.g. Li, N., Cai, Q., Miao, Q., et al. (2020). High-Throughput Metagenomics for Identification of Pathogens in the Clinical Settings. Small Methods, 5(1), 2000792) were hybridized with human RNA to remove the human RNA and retain the pathogen RNA. The pathogen RNA was then reverse transcribed to synthesize cDNA, and the sample DNA was added back.
2. Fragmentation
The sample DNA was fragmented by a transposase-based library construction method (source: Nanjinouzan Biotechnology Co., Ltd.; UltraClean Universal Plus DNA Library Prep Kit for Illumina V3), and sequencing primers were ligated at the breaks.
3. Pooling
The fragmented products were pooled at a mass ratio of 1:30:50:100.
3. Enrichment by amplification
The library with the sequencing primers attached and the 1st specific primer were placed in an 8-strip tube. The primers were fully dissolved before use, vortexed, and briefly centrifuged, and PCR amplification was performed according to the amplification system and program of Example 1.
In this example, 4 different Indexes were again used to simulate sample libraries present at different levels in clinical samples.
4. Purification
Purification was performed according to the method of Example 1, except that 100 μL of the amplified product was taken and 0.9× (90 μL) DNA AMPure XP magnetic beads were added.
The flow from sample DNA extraction to purification steps described above is shown in FIG. 6.
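The bead volumes quoted in the purification steps are simply the bead ratio multiplied by the sample volume. A minimal sketch of that relation, using the three ratio and volume pairs stated in Examples 1 and 2 (the function and variable names are hypothetical):

```python
def bead_volume_ul(sample_volume_ul, bead_ratio):
    """Volume of magnetic beads for a given bead-to-sample ratio."""
    if sample_volume_ul <= 0 or bead_ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    return sample_volume_ul * bead_ratio


# (sample volume in uL, bead ratio) pairs taken from the purification steps.
purifications = {
    "Example 1, after enrichment": (50.0, 1.0),
    "Example 1, after homogenization": (50.0, 1.6),
    "Example 2, after enrichment": (100.0, 0.9),
}
bead_volumes = {name: bead_volume_ul(v, r) for name, (v, r) in purifications.items()}
```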
5. Pooling
The purified sample libraries carrying the 4 different Index tags were pooled in equal mass, at 400 ng per library.
6. Capturing
The target region fragments of the above mixed sample genome were first bound with biotin-labeled RNA probes or DNA probes (designed according to the scheme of CN202011636780.0), and the sample was then captured through the binding of streptavidin to the biotin on the RNA or DNA probes.
7. Homogenization amplification
Four equal aliquots of the above sample were dispensed into separate tubes of an 8-strip tube, and an equal amount of a 2nd specific primer was added to each aliquot. The primers were fully dissolved before use, vortexed, and briefly centrifuged, and limiting PCR amplification was performed according to the limiting PCR amplification system and program of Example 1.
8. Purification
Purification was performed according to the method of example 1.
9. Sequencing on machine
Library pooling and sequencing preparation were performed with reference to the method of Example 1; this constituted the experimental group.
A control group was set up at the same time, using conventional universal sequencing primer amplification without homogenization: after capture, instead of homogenization amplification, ordinary PCR amplification was performed on the captured library with a conventional universal sequencing primer pair to obtain the captured, amplified library.
2. Results
The sequencing data were analyzed, and the results are shown in FIGS. 7-8. FIG. 7 shows the sequencing data output of the unequal-amount mixed library; the abscissa is the group and the ordinate is the data output. The ratio of the maximum to the minimum data output is 3439 in the control group and 5.8 in the experimental group.
FIG. 8 shows the ratio of pathogen detection sequences in the unequal-amount mixed library, with the ratio of pathogen detection sequences on the ordinate (experimental group/control group; median 1.7).
As can be seen from the figures, the data output ratio is less than 1:5 and the cross_contamination is less than 1/10^5, which shows that the method can reduce the library data output ratio from 1:1000 (0.1M:100M) to less than 1:20, with a contamination rate below 5/10^6.
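The max-to-min comparison reported for FIG. 7 can be expressed as a small calculation. The sketch below uses only the figures quoted above (3439, 5.8, and the 0.1M:100M range); the function names are hypothetical.

```python
def max_min_ratio(data_outputs):
    """Ratio of the largest to the smallest data output in a group."""
    smallest = min(data_outputs)
    if smallest <= 0:
        raise ValueError("data outputs must be positive")
    return max(data_outputs) / smallest


def fold_improvement(control_ratio, experimental_ratio):
    """How many times narrower the experimental spread is than the control."""
    return control_ratio / experimental_ratio


# 0.1M versus 100M reads corresponds to the 1:1000 relationship.
unbalanced = max_min_ratio([0.1e6, 100e6])
# Max/min ratios reported for the control and experimental groups.
improvement = fold_improvement(3439, 5.8)
```

On the reported ratios, the experimental group's spread is several hundred times narrower than the control group's.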
Example 3
In Examples 1-2, the homogenization amplification step was carried out on the Index-tagged mixed sample obtained after 1st specific primer amplification by amplifying it in separate tubes, which in practice requires a large number of reactions and a large amount of reagents.
1. Method
This example follows Example 1, and experimental verification was performed with standard quality-control libraries.
The difference is that, in the homogenization amplification step, an equal amount of a mixture of the 2nd specific primers designed for each sample Index was added directly to the unequal-amount mixed library, and limiting amplification was performed with reference to the amplification system and program of Example 1. The 2nd specific primers added were mixed primers, each at a final concentration of 0.2 μM in the system. The principle of the above limiting amplification and purification steps is shown in FIG. 9.
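The volume of each 2nd specific primer needed to reach 0.2 μM in a mixed-primer reaction follows from C1·V1 = C2·V2. The sketch below is illustrative only: the 50 μL reaction volume, the 10 μM stock concentration, and the Index names are assumptions, since the amplification system table is not reproduced here.

```python
def primer_volume_ul(final_conc_um, reaction_volume_ul, stock_conc_um):
    """Stock volume giving final_conc_um in the full reaction volume."""
    if stock_conc_um <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc_um * reaction_volume_ul / stock_conc_um


def mixed_primer_plan(index_names, final_conc_um, reaction_volume_ul, stock_conc_um):
    """Per-primer volumes and the total primer volume for one mixed tube."""
    per_primer = primer_volume_ul(final_conc_um, reaction_volume_ul, stock_conc_um)
    total = per_primer * len(index_names)
    if total > reaction_volume_ul:
        raise ValueError("primer mix does not fit in the reaction volume")
    return {name: per_primer for name in index_names}, total


# Hypothetical Index names, reaction volume (uL), and stock concentration (uM).
plan, total_primer_ul = mixed_primer_plan(
    ["UD1", "UD2", "UD3", "UD4"],
    final_conc_um=0.2,
    reaction_volume_ul=50.0,
    stock_conc_um=10.0,
)
```

Because every primer is added at the same final concentration, each Index is capped at the same ceiling even though all primers share one tube.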
2. Results
The sequencing data were analyzed, and the results are shown in FIGS. 10-12. FIG. 10 shows the sequencing data output of the simulated unequal-amount mixed library; 5 samples were tested in each group. The abscissa is the library input amount at the different ratios, and the ordinate is the log-transformed data volume.
FIG. 11 shows the ratio of sequencing data output of the simulated unequal-amount mixed library. The library with the smallest input is taken as 1×, and the ratio of each other library's data to the 1× library is calculated. The abscissa is the library input amount at the different ratios, the ordinate is the ratio of the other data to 1×, and the differently shaped data points represent the 6 different samples selected for pooling.
FIG. 12 shows the sequencing cross-contamination rate of the simulated unequal-amount mixed library. The abscissa is the standard quality-control material number (6 samples, each at 4 mass ratios, 24 samples in total), and the ordinate is the number of cross-contaminating sequences divided by total_reads (the contamination rate).
As can be seen from the figures, the data output ratio is less than 1:5 and the cross_contamination is less than 1/10^4, which shows that the method can improve library data uniformity while reducing experimental difficulty, simplifying operation, and lowering reagent cost, although the cross-contamination is slightly higher than in Example 1.
Example 4
In this example, the homogenization step was performed by mixed-tube amplification.
1. Method
This example follows Example 2, and experimental verification was performed on clinical samples. The difference is that, in the homogenization amplification step, an equal amount of a mixture of the 2nd specific primers designed for each sample Index was added directly to the unequal-amount mixed library, and limiting amplification was performed with reference to the amplification system and program of Example 2. The 2nd specific primers added were mixed primers, each at a final concentration of 0.2 μM in the system.
2. Results
The sequencing data were analyzed, and the results are shown in FIGS. 13-14. FIG. 13 shows the sequencing data output of the simulated unequal-amount mixed library; the abscissa is the group and the ordinate is the data output. The ratio of the maximum to the minimum data output is 31 in the control group and 9 in the experimental group.
FIG. 14 shows the ratio of pathogen detection sequences in the simulated unequal-amount mixed library, with the ratio of pathogen detection sequences on the ordinate (experimental group/control group; median 3.5).
As can be seen from the figures, the data output ratio is less than 1:5 and the cross_contamination is less than 5/10^4, which shows that the method can reduce the library data output ratio from 1:1000 (0.1M:100M) to less than 1:20, with a contamination rate below 5/10^4.
Claims (10)
1. A method for improving library data uniformity comprising the steps of:
Fragmentation: taking sample DNA, fragmenting it, and ligating a sequencing primer at the breaks;
Amplification and enrichment: adding a 1st specific primer for PCR amplification, wherein the 1st specific primer comprises a linker sequence, an Index sequence and a complementary sequence, the complementary sequence is complementary to the sequencing primer, and the Index sequence is a specific tag sequence of the sample;
Pooling: mixing the samples carrying the Index sequences to obtain a mixed sample;
Homogenization amplification: adding an equal amount of 2nd specific primers to the mixed sample for limiting PCR amplification to obtain an on-line library; the 2nd specific primer is complementary to the linker sequence and the Index sequence in the 1st specific primer, and the 2nd specific primer is added in an amount less than that required for sufficient amplification of all samples.
2. The method for improving the homogeneity of library data according to claim 1, wherein in said amplification enrichment step, the final concentration of said 1st specific primer in the PCR amplification system is 1-2 μM per primer;
And/or, in the homogenizing amplification step, the final concentration of each of the 2nd specific primers in the limiting PCR amplification system is 0.05 to 2 μM, preferably 0.05 to 1 μM, more preferably 0.1 to 0.5 μM.
3. The method for improving library data uniformity according to claim 1, further comprising a capturing step between said Pooling step and said homogenizing and amplifying step, said capturing step being: capturing the target region fragment of the mixed sample genome with an RNA probe or a DNA probe.
4. The method for improving the homogeneity of library data according to claim 3, wherein in the capturing step, a target region fragment of the genome of the mixed sample is first bound with a biotin-labeled RNA probe or DNA probe, and then a sample is captured by binding with biotin in the RNA probe or DNA probe using a magnetic bead with streptavidin.
5. The method for improving the uniformity of library data according to claim 1, wherein in said homogenization amplification step, a number of aliquots equal to the number of samples is taken from the mixed sample, an equal amount of a 2nd specific primer is added to each aliquot, limiting PCR amplification is performed, and the amplified products are pooled to obtain the on-line library;
Or an equal amount of the 2nd specific primer mixture is added to the mixed sample, and limiting PCR amplification is performed on the mixed sample in a single reaction to obtain the on-line library.
6. The method for improving library data uniformity according to claim 1, wherein at least one of the following conditions is met:
1) In the amplification and enrichment step, a purification step is further included after the PCR amplification, wherein the purification step may be: separation with magnetic beads to obtain a library carrying Index sequences within the required fragment size range;
2) In the homogenizing amplification step, a purification step is further included after the limiting PCR amplification, wherein the purification step may be: separation with magnetic beads to obtain a library carrying Index sequences within the required fragment size range;
3) In the fragmentation step, the sample DNA is fragmented with a transposase and universal sequencing primers are ligated at the breaks.
7. The method for improving the homogeneity of library data according to claim 1, wherein in said amplification and enrichment step, the PCR amplification is performed according to the following program: 98 °C for 30 s; 13±3 cycles of 98 °C for 10 s, 60 °C for 30 s and 72 °C for 30 s; 72 °C for 5 min; then cooling to 4 °C for storage;
And/or, in the homogenizing amplification step, the limiting PCR amplification is performed according to the following program: 98 °C for 45 s; 25±5 cycles of 98 °C for 10 s, 65 °C for 15 s and 72 °C for 15 s; 72 °C for 1 min; then cooling to 4 °C for storage.
8. Use of the method of any one of claims 1-7 in the preparation of a reagent for library construction.
9. The use according to claim 8, wherein the library is used for mNGS detection and the DNA sample is prepared by the following method:
(1) Extracting DNA and RNA from a sample, and digesting and removing the DNA by DNase I;
(2) Hybridizing host-specific probes with host RNA to remove it, and retaining pathogen RNA;
(3) Reverse transcribing the pathogen RNA to obtain cDNA;
(4) Adding back the initial DNA of the sample to obtain the DNA sample.
10. An NGS detection reagent for improving the homogeneity of library data, comprising the 1st specific primer and the 2nd specific primer of the method according to any one of claims 1 to 7, preferably wherein the 1st specific primer has a final concentration of 1 to 2 μM per primer and the 2nd specific primer has a final concentration of 0.05 to 2 μM, preferably 0.05 to 1 μM, more preferably 0.1 to 0.5 μM.
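For reference, the two cycling programs recited in claim 7 can be written down as data and their programmed hold times summed. This is a sketch: the class and field names are hypothetical, the cycle counts use the central values (13 and 25), and ramp times and the final 4 °C hold are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class PcrProgram:
    initial: Step
    cycle: tuple  # denaturation, annealing, extension steps
    cycles: int
    final_extension: Step

    def hold_seconds(self):
        """Sum of programmed hold times, excluding ramping and the 4 C hold."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final_extension.seconds


# Amplification and enrichment program (13 +/- 3 cycles; 13 used here).
enrichment = PcrProgram(
    Step(98, 30), (Step(98, 10), Step(60, 30), Step(72, 30)), 13, Step(72, 300)
)
# Limiting PCR program (25 +/- 5 cycles; 25 used here).
limiting = PcrProgram(
    Step(98, 45), (Step(98, 10), Step(65, 15), Step(72, 15)), 25, Step(72, 60)
)
```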
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410128510.0A CN118086468A (en) | 2024-01-30 | 2024-01-30 | Method for improving library data uniformity and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118086468A (en) | 2024-05-28 |
Family
ID=91152110
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410128510.0A Pending CN118086468A (en) | 2024-01-30 | 2024-01-30 | Method for improving library data uniformity and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118086468A (en) |
- 2024-01-30: CN CN202410128510.0A, patent CN118086468A (en), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||