CN112342270A - Human respiratory virus targeted enrichment capture probe set and application thereof - Google Patents



Publication number
CN112342270A
Authority
CN
China
Prior art keywords
nucleic acid
genome
virus
sequences
sequence
Prior art date
Legal status
Withdrawn
Application number
CN201910728084.3A
Other languages
Chinese (zh)
Inventor
任丽丽
王健伟
曹德盼
肖艳
朱俊琳
Current Assignee
Institute of Pathogen Biology of CAMS
Original Assignee
Institute of Pathogen Biology of CAMS
Priority date
Filing date
Publication date
Application filed by Institute of Pathogen Biology of CAMS
Priority to CN201910728084.3A
Publication of CN112342270A
Withdrawn

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6811 Selection methods for production or design of target specific oligonucleotides or binding molecules

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of human respiratory virus diagnosis and provides a method for generating a targeted enrichment capture probe set for human respiratory viruses. The method comprises: obtaining gene sequences of viruses that infect the human respiratory tract; designing probes by a sliding-window method that exhaustively enumerates all possible candidate nucleic acid sequences; filtering the resulting nucleic acid sequences; removing sequences in the nucleic acid sequence library that are highly homologous to the human genome; and clustering the probe sequences. The invention also relates to the use of the probe set obtained by this method in preparing next-generation sequencing (NGS) libraries of human respiratory viruses, so that whole-genome sequence information can be obtained, genomic variation identified and the potential epidemic transmission characteristics of the virus evaluated, thereby guiding early warning, prevention and control of respiratory diseases.

Description

Human respiratory virus targeted enrichment capture probe set and application thereof
Technical Field
The invention relates to a human respiratory virus targeted enrichment capture probe set and application thereof in detecting, identifying and evaluating respiratory viruses.
Background
Human respiratory viruses are important human pathogens and the main causative agents of emerging and sudden-onset infectious diseases. Viruses mutate readily, and such variation can alter virulence, drug resistance or transmissibility, leading to rapid spread among people and serious public health problems. Determining whether mutants or novel viruses are present, and then evaluating their potential for epidemic transmission, is therefore key to the early warning, prevention and control of infectious diseases.
Common respiratory viruses that cause respiratory infections in humans include the following. (1) Influenza virus (IFV), which is classified into types A, B and C according to the antigenicity of its nucleoprotein (NP) and matrix protein (MP). Influenza A virus (IFVA) has the widest impact and is further divided into subtypes according to antigenic differences in its surface hemagglutinin (HA) and neuraminidase (NA) proteins. Eighteen HA subtypes (H1-H18) and eleven NA subtypes (N1-N11) have been identified so far; they cause infections in humans and animals and give rise to regional and outbreak epidemics. (2) Human parainfluenza virus (HPIV), an enveloped, single-stranded, negative-sense RNA virus, which is classified into four serotypes (1, 2, 3 and 4); type 4 is further divided into subtypes 4a and 4b on the basis of hemagglutination-inhibition activity and reactivity with monoclonal antibodies. The genome is 15-19 kb long, with no cap structure at the 5' end and no poly(A) tail at the 3' end. From 3' to 5', the genome comprises a leader region of about 55 nt (70 nt for HPIV-2), six protein-coding genes (N, P, M, F, HN and L) and a 5' trailer sequence of 21-291 nt. (3) Respiratory syncytial virus (RSV), whose genome is a single-stranded, negative-sense RNA of about 15 kb with a cap structure at the 5' end and a poly(A) tail at the 3' end. The genome encodes 11 proteins, which from 3' to 5' are non-structural protein NS1, non-structural protein NS2, nucleoprotein NP, phosphoprotein P, matrix protein M, small hydrophobic protein SH, attachment protein G, fusion protein F, transcription processing factor M2-1, transcription regulatory factor M2-2 and the large polymerase subunit L.
RSV is divided into two subtypes, A and B, according to antigenicity, and each subtype is further divided into genotypes according to differences in the G gene sequence; 37 genotypes have been reported to date. (4) Human metapneumovirus (hMPV), whose genome is a single, non-segmented, linear negative-sense RNA of about 13,200 nucleotides encoding 9 viral protein genes; among these, the G gene sequence differs most between viral subtypes. hMPV has only one serotype and is divided into two genotypes, A and B, according to the evolutionary relationships of the G gene. (5) Enterovirus, whose genome is a single positive-sense, linear RNA molecule of about 7 kb in total length. The genus includes 12 species, 7 of which infect humans: Enterovirus (EV) species A (EV-A), EV-B, EV-C and EV-D, and Rhinovirus (RV) species A (RV-A), RV-B and RV-C. As of 2018, EV-A was known to contain 25 serotypes/genotypes, EV-B 61, EV-C 23, EV-D 5, RV-A 80, RV-B 32 and RV-C 56. (6) Human coronaviruses (HCoV), which are enveloped, single-stranded, positive-sense RNA viruses. Coronaviruses are divided into four genera, alpha, beta, gamma and delta, according to serological and genomic characteristics. The coronaviruses that infect humans belong to the genera α (HCoV-229E, HCoV-NL63) and β (HCoV-OC43, HCoV-HKU1, SARS-CoV and MERS-CoV). (7) Adenovirus (Adv), whose genome is a linear double-stranded DNA molecule of about 36 kb. Adv is divided into 7 species, Adv-A, -B, -C, -D, -E, -F and -G, which together comprise the 57 serotypes identified so far.
(8) Bocavirus (HBoV), whose genome is a linear single-stranded DNA of about 5 kb with terminal sequences that are not yet fully defined. Like other members of the family Parvoviridae, it contains two major open reading frames (ORFs): the 5' ORF encodes non-structural protein 1 (NS1), and the 3' ORF encodes the two overlapping capsid proteins VP1 and VP2.
However, for the control of respiratory infectious diseases, merely detecting and identifying the virus is not sufficient; whole-genome sequence information must be obtained as early as possible, because it is essential for characterizing genetic variation and analyzing potential transmissibility. Next-generation sequencing (NGS) allows genetic information to be obtained directly from clinical or environmental samples without culture. Applied to viral metagenomics, it can recover all pathogen sequences in a sample in a single test. Although viral nucleic acid copies during infection may number in the billions, viral genomes are tiny compared with the human genome, so viral nucleic acid still makes up only a small proportion of the sample. Total nucleic acid from a sample contains large amounts of host ribosomal RNA and mRNA transcripts, which makes it difficult to obtain a complete viral genome sequence from material dominated by host genetic material.
Targeted probe enrichment is an NGS-based enrichment method that positively selects and captures target sequences in a sample so that enough target material is available for NGS. Unlike targeted enrichment of human exons and similar targets, pathogen genomes are small and present at low abundance, so a sequencing library must be constructed after targeted enrichment of pathogen sequences to meet NGS requirements. In practice, the nucleic acid in the sample is fragmented, and many short probes (generally 80-120 bp) that together tile the full length of the viral reference genome hybridize by complementary base pairing to the sample fragments. The captured viral fragments are then amplified and sequenced, which increases detection sensitivity and specificity and improves the efficiency of detecting viral nucleic acid in the sample.
At present, only broad-spectrum enrichment probe sets targeting mammal-infecting viruses have been reported. Such broad-spectrum sets comprise about 200 Mb of probe sequence, so they take a long time to prepare and are costly. Their probes are designed from a limited set of reference sequences, lack capture sequences targeting respiratory virus variants and cover polymorphic regions poorly, which is a particular drawback for recovering sequences of viruses that mutate readily. The consensus in the field is instead to establish pathogen-specific, syndrome-oriented targeted enrichment probes that improve the pathogen detection rate and the ability to obtain genome sequences. However, no enrichment probe set dedicated to NGS diagnosis of respiratory viruses has yet been reported.
Therefore, an enrichment probe set specific to common respiratory viruses is urgently needed, and the enrichment probe set can be used for NGS of the respiratory viruses to obtain the whole genome sequence information of the respiratory viruses and identify genome variation, so that the potential epidemic propagation characteristics of the viruses are evaluated, and early warning and prevention and control of respiratory diseases are guided.
Disclosure of Invention
In the present study, we screened all known gene sequences of viruses from 7 families, 15 genera and 30 species that cause respiratory tract infection (Table 1) to obtain whole genomes and representative polymorphic sequences of human-infecting respiratory viruses, and designed a targeted enrichment capture probe set based on specific RNA probe hybridization analysis. The probes were synthesized, coupled to biotin, hybridized with sample NGS libraries and eluted to obtain enriched libraries. The detection limit of the probe set for each target virus in human respiratory samples was determined by methods such as sequence similarity analysis and short-read alignment, and a probe set-based method for detecting human respiratory virus sequences in infected samples was established.
In a first aspect, the present invention provides a method for generating a targeted enriched capture probe set against a human respiratory virus, the method comprising the steps of:
(1) obtaining a gene sequence of a virus infecting a human respiratory tract;
(2) designing probes from the gene sequences by a sliding-window method, exhaustively enumerating all possible candidate nucleic acid sequences;
(3) filtering the resulting nucleic acid sequences and retaining those that meet the conditions;
(4) aligning the resulting nucleic acid sequences to a reference sequence and filtering out sequences in the nucleic acid sequence library that are highly homologous to the human genome;
(5) clustering the probe sequences and retaining the core sequence of each cluster.
In a second aspect, provided herein is a method of targeted enrichment, the method comprising the step of hybridizing a human respiratory sample NGS library to a probe set obtained by the method of the first aspect.
In a third aspect, provided herein is the use of the probeset obtained by the method of the first aspect for preparing an NGS library of human respiratory viruses.
Drawings
FIG. 1 shows the coverage of the genomes of the various human respiratory virus species by the universal targeted enrichment capture probe set for human respiratory viruses of the present invention.
FIG. 2 shows a comparison of IFVA genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 3 shows a comparison of IFVA genome sequencing coverage depth before and after capture enrichment.
FIG. 4 shows a comparison of IFVB genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 5 shows a comparison of IFVB genome sequencing coverage depth before and after capture enrichment.
FIG. 6 shows a comparison of HCoV genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 7 shows a comparison of HCoV genome sequencing coverage depth before and after capture enrichment.
FIG. 8 shows a comparison of hMPV genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 9 shows a comparison of hMPV genome sequencing coverage depth before and after capture enrichment.
FIG. 10 shows a comparison of PIV2 genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 11 shows a comparison of PIV2 genome sequencing coverage depth before and after capture enrichment.
FIG. 12 shows a comparison of RSV genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 13 shows a comparison of RSV genome sequencing coverage depth before and after capture enrichment.
FIG. 14 shows a comparison of Adv B genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 15 shows a comparison of Adv B genome sequencing coverage depth before and after capture enrichment.
FIG. 16 shows a comparison of EV D and RV B genome coverage length and depth before and after capture enrichment. The abscissa represents the gene length of each viral segment; the ordinate represents genome coverage depth.
FIG. 17 shows a comparison of EV D and RV B genome sequencing coverage depth before and after capture enrichment.
Detailed Description
The present invention provides a method for generating a targeted enriched capture probe set for human respiratory viruses, the method comprising the steps of:
(1) obtaining a gene sequence of a virus infecting a human respiratory tract;
(2) designing probes from the gene sequences by a sliding-window method, exhaustively enumerating all possible candidate nucleic acid sequences;
(3) filtering the resulting nucleic acid sequences and retaining those that meet the conditions;
(4) aligning the resulting nucleic acid sequences to a reference sequence and filtering out sequences in the nucleic acid sequence library that are highly homologous to the human genome;
(5) clustering the probe sequences and retaining the core sequence of each cluster.
In a specific embodiment, the virus infecting the human respiratory tract belongs to the families, genera and species/types listed in Table 1.
In one embodiment, the sliding window method sets the window width to 120nt and the step size to 1 nt.
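A minimal Python sketch of the sliding-window enumeration, assuming the viral sequence is a plain A/C/G/T string. The function name `sliding_window_probes` is hypothetical and is not part of any tool named in this document; it only illustrates that a window width of 120 nt with a step of 1 nt yields one candidate per start position, i.e. L - 120 + 1 candidates for a sequence of length L.

```python
def sliding_window_probes(sequence: str, width: int = 120, step: int = 1) -> list[str]:
    """Enumerate every window of `width` nt, advancing `step` nt at a time."""
    seq = sequence.upper()
    if len(seq) < width:
        return []  # sequence shorter than one probe: nothing to tile
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


# Toy example with a short window so the output is easy to read.
print(sliding_window_probes("ACGTACGT", width=5))
# ['ACGTA', 'CGTAC', 'GTACG', 'TACGT']
```

For example, a 15,000-nt genome gives 15,000 - 120 + 1 = 14,881 candidate windows before any filtering, which is why the filtering and clustering steps that follow are needed.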
In a specific embodiment, the conditions for filtering the obtained probe sequences in step (3) are as follows: the nucleic acid sequence is 120 nt long; its Tm is 75 ± 5 ℃, calculated as Tm = 64.9 + 41 × (yG + zC - 16.4)/(wA + xT + yG + zC), where w, x, y and z are the numbers of A, T, G and C bases in the sequence, assuming 50 mM Na+ and pH 7.0; no segment of the sequence capable of forming a hairpin structure exceeds 13 bases; and the GC content of the sequence is not higher than 70%.
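The filtering conditions can be sketched as follows. This is an illustrative Python sketch under stated assumptions: `tm_basic` implements the Tm formula given above, taking w, x, y and z as the counts of A, T, G and C; the hairpin check is a simplified, hypothetical heuristic (perfect Watson-Crick stems only, a loop of at least 3 nt, no thermodynamics) standing in for whatever hairpin analysis was actually used. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tm_basic(seq: str) -> float:
    """Tm = 64.9 + 41 * (yG + zC - 16.4) / (wA + xT + yG + zC)."""
    s = seq.upper()
    w, x, y, z = (s.count(b) for b in "ATGC")  # counts of A, T, G, C
    return 64.9 + 41 * (y + z - 16.4) / (w + x + y + z)


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def has_long_stem(seq: str, max_stem: int = 13, min_loop: int = 3) -> bool:
    """Heuristic: True if some (max_stem + 1)-mer pairs perfectly with a
    downstream stretch at least min_loop nt away, i.e. a hairpin stem longer
    than max_stem could form. Checking max_stem + 1 suffices, because any
    longer stem also contains a perfectly paired stem of that length."""
    k = max_stem + 1
    s = seq.upper()
    positions: dict[str, list[int]] = {}
    for j in range(len(s) - k + 1):
        positions.setdefault(s[j:j + k], []).append(j)
    for i in range(len(s) - k + 1):
        partner = reverse_complement(s[i:i + k])
        if any(j >= i + k + min_loop for j in positions.get(partner, [])):
            return True
    return False


def passes_filters(seq: str) -> bool:
    return (
        len(seq) == 120
        and 70.0 <= tm_basic(seq) <= 80.0   # 75 +/- 5 degrees C
        and gc_fraction(seq) <= 0.70
        and not has_long_stem(seq)
    )


probe = "AAC" * 40  # toy 120-nt probe, 33% GC
print(round(tm_basic(probe), 2), passes_filters(probe))  # 72.96 True
```

With this formula and a fixed length of 120 nt, the 75 ± 5 ℃ window corresponds roughly to a GC content of 26% to 50%, so in practice the Tm condition is stricter than the 70% GC ceiling.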
In a specific embodiment, the reference sequence is the human reference genome GRCh38.
In one embodiment, the resulting nucleic acid sequence is aligned to a reference sequence in step (4) using bowtie 2.
In one embodiment, a sequence in the nucleic acid sequence library is considered highly homologous to the human genome, and is filtered out, if it aligns contiguously to the human genome over 80 nt; and/or if its similarity to the human genome exceeds 80%.
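One way to apply these two criteria to bowtie2 output is sketched below in Python. It assumes the alignments are read from SAM records (the CIGAR string and the NM:i edit-distance tag that bowtie2 reports for aligned reads) and assumes one possible reading of the criteria; in particular, "similarity" is not defined in the text and is taken here as (aligned bases - NM)/probe length. The function `is_human_like` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_OPS = {"M", "=", "X"}  # operations where probe bases sit on genome bases


def is_human_like(cigar: str, nm: int, probe_len: int = 120,
                  min_run: int = 80, max_similarity: float = 0.80) -> bool:
    """Return True if a probe's alignment to GRCh38 means it should be dropped.

    Assumed criteria: (a) an uninterrupted aligned block of >= min_run nt, or
    (b) similarity > max_similarity, with similarity defined (as an
    assumption) as (aligned bases - NM) / probe length.
    """
    if cigar == "*":  # unmapped: no homology detected
        return False
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    longest_run = max((n for n, op in ops if op in ALIGNED_OPS), default=0)
    aligned = sum(n for n, op in ops if op in ALIGNED_OPS)
    similarity = max(aligned - nm, 0) / probe_len
    return longest_run >= min_run or similarity > max_similarity


print(is_human_like("120M", nm=2))    # True: one 120-nt contiguous block
print(is_human_like("30M90S", nm=0))  # False: short, soft-clipped hit
```

Soft-clipped records such as `30M90S` only arise when bowtie2 is run in `--local` mode; in the default end-to-end mode the whole probe is aligned, so the similarity criterion does most of the work.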
In one embodiment, probe sequences are clustered using the UCLUST (Edgar 2010) software package (version 1.1.579) under the following conditions: the sequence similarity is more than or equal to 90 percent; and/or the sequence similarity is more than or equal to 90 percent after reverse complementation.
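UCLUST is a separate program, and the sketch below does not reproduce it. It is a simplified greedy centroid clustering in Python that follows the same greedy idea (each sequence joins the first existing centroid it matches, otherwise it becomes a new centroid) to illustrate the two conditions: at least 90% identity on the forward strand or after reverse complementation. It assumes all probes have the same length (120 nt), so identity is computed position by position without gaps, unlike UCLUST's alignment-based identity. Function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions for equal-length sequences (no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def greedy_cluster(probes: list[str], min_id: float = 0.90) -> list[str]:
    """Keep a probe as a new centroid unless it matches an existing centroid
    at >= min_id on either strand; return the centroids (cluster cores)."""
    centroids: list[str] = []
    for p in probes:
        rc = revcomp(p)
        if not any(identity(p, c) >= min_id or identity(rc, c) >= min_id
                   for c in centroids):
            centroids.append(p)
    return centroids


base = "AAC" * 40
variant = "GGGGG" + base[5:]  # 5 mismatches: about 95.8% identity to base
distant = "ACC" * 40          # about 66.7% identity to base
print(len(greedy_cluster([base, variant, revcomp(base), distant])))  # 2
```

Because the result depends on input order, real tools usually sort sequences first (for example by length or abundance); with equal-length probes the order simply decides which member becomes each cluster's core.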
Also provided herein is a targeted enrichment method comprising the step of hybridizing a human respiratory sample NGS library to a probe set generated by the above method.
In a particular embodiment, the human respiratory sample may be: a bronchoalveolar lavage fluid sample, or a deep sputum sample, or a tracheal aspirate sample, or a pharyngeal swab sample, or a nasopharyngeal swab sample.
The invention also provides application of the human respiratory virus targeted enrichment capture probe set generated by the method of the invention in preparing an NGS library of the human respiratory virus.
In a specific embodiment, the human respiratory virus belongs to the families, genera and species/types listed in Table 1.
The invention is further illustrated below with reference to preferred embodiments. It should be understood that these examples are for illustration only and are not intended to limit the scope of the invention. Experimental procedures for which specific conditions are not given in the following examples were generally performed under conventional conditions, such as those described in Molecular Cloning: A Laboratory Manual (Sambrook et al., New York: Cold Spring Harbor Laboratory Press, 1989), or according to the reagent manufacturers' instructions.
Example 1 Design of a universal targeted enrichment capture probe set for human respiratory viruses
1.1 Acquisition of human respiratory virus gene sequences
The human respiratory virus gene sequences used herein were obtained from publicly available online sequence databases. Standard strains and polymorphic sequences of enteroviruses and rhinoviruses were obtained from the dedicated picornavirus database (http://www.picornaviridae.com); coronavirus gene data from CoVDB (https://hsls.pitt.edu/obrc/index.php?page=URL1208462413); influenza virus gene data from GISAID (https://www.gisaid.org/) and ViPR (https://www.viprbrc.org/brc/home.spg?decorator=vipr); and gene sequences of the other human-infecting respiratory viruses from the three major nucleotide sequence databases, GenBank, EMBL and DDBJ. The human respiratory virus species covered are listed in Table 1.
Table 1 Target species of the universal targeted enrichment capture probe set for human respiratory viruses
[Table 1: target families, genera and species/types (image not reproduced)]
1.2 Design and synthesis of targeted enrichment capture probe sequences for human respiratory viruses
For all target sequences, probes were designed by a sliding-window method with a window width of 120 nucleotides (nt) and a step size of 1 nt, exhaustively enumerating all possible probe sequences. The nucleic acid sequences were then filtered for conditions suitable for specific probe hybridization, retaining sequences that met all of the following: 1) the sequence is 120 nt long; 2) its Tm is 75 ± 5 ℃; 3) Tm is calculated as Tm = 64.9 + 41 × (yG + zC - 16.4)/(wA + xT + yG + zC), assuming 50 mM Na+ and pH 7.0; 4) no segment of the sequence capable of forming a hairpin structure exceeds 13 bases; 5) the GC content is not higher than 70%.
Using the human reference genome GRCh38 as the reference sequence, the nucleic acid sequences were aligned to it with bowtie2, and sequences in the library that were highly homologous to the human genome were filtered out according to the following criteria: 1) a contiguous alignment of 80 nt to the human genome; 2) similarity to the human genome above 80%.
Probe sequences were clustered using the UCLUST (Edgar 2010) software package (version 1.1.579), preserving the core sequence of the clustered clusters. The clustering conditions were as follows: 1) the sequence similarity is more than or equal to 90 percent; 2) the sequence similarity after reverse complementation is more than or equal to 90 percent.
The nucleic acid probes were obtained after clustering. Probe synthesis and biotin coupling were performed by Aijiekang Biotechnology (Beijing) Ltd.
Example 2 Preparation of NGS libraries from human respiratory samples and probe set hybridization
2.1 Preparation of nucleic acids from human respiratory samples
A 200 μl bronchoalveolar lavage fluid sample was added to 2 ml of sample lysis buffer for the easyMAG automated nucleic acid extraction system, and total nucleic acid was extracted on the easyMAG instrument according to the manufacturer's instructions. The elution volume was 60 μl; the extracted nucleic acids were aliquoted and stored frozen at -80 ℃.
2.2 Reverse transcription of nucleic acids from human respiratory samples
(1) Reverse transcription of nucleic acids
Reverse transcription was performed with SuperScript™ III reverse transcriptase (Thermo, USA). 6 μl of the total nucleic acid prepared in 2.1 was reverse transcribed using the following system and reaction conditions:
DEPC water 5μl
dNTP(10mM) 1μl
pN(6) 0.5μl
Oligo(dT) 0.5μl
Total nucleic acids 6μl
After mixing, the mixture was incubated at 65 ℃ for 5 min and chilled on ice for 2 min, and the following components were then added:
5× First-Strand Synthesis buffer 4μl
0.1M DTT 1μl
RNase inhibitors 1μl
Reverse transcriptase (200 U/μl) 1μl
After mixing, cDNA was generated by incubation at 25 ℃ for 5 min, 50 ℃ for 45 min and 70 ℃ for 15 min.
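The two-stage reaction above totals 20 μl per sample (13 μl for the denaturation mix including template, plus 7 μl of enzyme mix). When several samples are processed, the per-reaction volumes can be scaled into master mixes. The Python sketch below assumes a 10% pipetting overage, which is common practice but is not specified in the original protocol; the dictionary keys and the function name are hypothetical labels.

```python
# Per-reaction volumes (ul) taken from the two tables above.
MIX1 = {"DEPC water": 5.0, "dNTP (10 mM)": 1.0, "pN(6)": 0.5, "Oligo(dT)": 0.5}
MIX2 = {"5x First-Strand buffer": 4.0, "0.1 M DTT": 1.0,
        "RNase inhibitor": 1.0, "Reverse transcriptase (200 U/ul)": 1.0}
TEMPLATE_UL = 6.0  # total nucleic acid added per reaction


def master_mix(per_reaction: dict[str, float], n_samples: int,
               overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes to n_samples plus an assumed overage."""
    factor = n_samples * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


reaction_volume = sum(MIX1.values()) + TEMPLATE_UL + sum(MIX2.values())
print(reaction_volume)  # 20.0
print(master_mix(MIX2, 10)["5x First-Strand buffer"])  # 44.0
```

The template is kept out of the master mix because it differs per sample and is pipetted individually.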
(2) Second-strand synthesis
The reverse-transcribed cDNA was treated with 1 μl of RNase H at 37 ℃ for 10 min, and the second strand was synthesized with the Klenow fragment (3'→5' exo-) (NEB, UK) using the following system:
[Table: second-strand synthesis reaction system (image not reproduced)]
After incubation at 37 ℃ for 1 h, the reaction was terminated by heating at 75 ℃ for 10 min.
(3) Double stranded cDNA purification
The purification is carried out by using a MinElute Reaction Cleanup Kit, and the specific steps are as follows:
① Add 300 μl of Buffer ERC to the cDNA solution and mix well;
② Load the mixture onto a MinElute spin column and centrifuge at 17,900 g for 1 min;
③ Add 750 μl of Buffer PE to the MinElute spin column and centrifuge at 10,000 g for 1 min;
④ Centrifuge again at 10,000 g for 1 min to dry the column membrane;
⑤ Place the MinElute spin column in a clean 1.5 ml EP tube, add 20 μl of Buffer EB to the column and centrifuge at 10,000 g for 1 min. Store the purified DNA at -20 ℃ until use.
(4) Determination of double-stranded DNA concentration
The concentration of the resulting double-stranded DNA was measured using the Qubit dsDNA HS Assay Kit, a Qubit 2.0 fluorometer.
① Remove the dsDNA HS buffer from the kit and equilibrate it at room temperature for 30 min;
② Mix 199 μl of dsDNA HS buffer with 1 μl of dsDNA HS reagent (200×) to prepare the working solution;
③ Prepare each standard by mixing 10 μl of standard with 190 μl of working solution;
④ Prepare each sample by mixing 1 μl of the sample to be tested with 199 μl of working solution;
⑤ Briefly centrifuge the mixtures and measure the concentration on the Qubit 2.0 fluorometer.
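Because the sample is diluted 1 μl into a 200 μl assay volume, the value read in the assay tube has to be multiplied by the dilution factor (200/x, where x is the sample volume in μl) to obtain the concentration of the original DNA solution. A minimal Python sketch; the function name is hypothetical, and the result is in the same units as the reading.

```python
def stock_concentration(tube_reading: float, sample_ul: float = 1.0,
                        assay_ul: float = 200.0) -> float:
    """Original-sample concentration = tube reading x (assay volume / sample volume)."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return tube_reading * assay_ul / sample_ul


# Example: 0.5 ng/ul measured in the 200 ul assay tube with 1 ul of sample.
print(stock_concentration(0.5))  # 100.0
```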
2.3 human respiratory sample NGS library construction
Libraries were constructed with a paired-end sequencing library preparation kit for the Illumina platform (SureSelectXT Target Enrichment System).
(1) Sample preparation
① Sample quantification
The sample was quantified on a Qubit 2.0 fluorometer (ThermoFisher, USA), and 200 ng of the double-stranded DNA prepared in 2.2 was diluted to a total volume of 50 μl with 1× Low TE Buffer (Life Technologies, cat. # 12090-015) and carefully transferred to a 1.5 ml EP tube.
② Fragmentation
DNA was fragmented on a Covaris M220 high-performance sample processing system (Covaris, USA) (parameters: target fragment length 200 bp, treatment time 175 s).
③ Purification
The fragmented DNA was purified with Agencourt AMPure XP magnetic beads.
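The dilution in the sample quantification step (200 ng brought to 50 μl with Low TE Buffer) follows directly from the measured stock concentration. A Python sketch; the function name, and the choice to raise an error when a stock is too dilute to supply 200 ng within 50 μl, are assumptions rather than part of the described protocol.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng: float = 200.0,
                     final_ul: float = 50.0) -> tuple[float, float]:
    """Return (DNA volume, Low TE volume) in ul to get target_ng in final_ul."""
    dna_ul = target_ng / stock_ng_per_ul
    if dna_ul > final_ul:
        raise ValueError("stock too dilute to give target_ng in final_ul")
    return round(dna_ul, 2), round(final_ul - dna_ul, 2)


print(dilution_volumes(20.0))  # (10.0, 40.0)
```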
(2) Double stranded DNA fragment end repair
The purified fragmented DNA was added to the end-repair system and incubated at 20 ℃ for 30 min. The reaction volume is 100 μl, and the end-repair system is prepared as follows:
[Table: end-repair reaction system (image not reproduced)]
(3) Purification of end-repaired DNA
① Equilibrate Agencourt AMPure XP magnetic beads at room temperature for at least 30 min;
② Add 180 μl of thoroughly mixed AMPure XP beads to the 100 μl of end-repaired DNA, vortex, and incubate at room temperature for 5 min;
③ Place the tube on a magnetic rack for 3-5 min and discard the supernatant;
④ Wash the beads twice with 200 μl of 70% ethanol, removing the ethanol after 1 min each time;
⑤ Leave at room temperature for about 5 min until the beads are completely dry, add 32 μl of nuclease-free water, vortex to mix, and incubate at room temperature for 2 min;
⑥ Place the tube on the magnetic rack for 2-3 min and transfer 32 μl of the eluate to a new 1.5 ml LoBind tube.
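Step ② uses 180 μl of beads for 100 μl of DNA, a 1.8× bead-to-sample ratio. AMPure XP binding depends on this ratio rather than on the absolute volume, so when the cleanup is repeated "as described in 2.3(3)" on a reaction of a different volume, keeping the ratio fixed is one reasonable reading; that interpretation is an assumption, since the original gives only absolute volumes. A minimal Python sketch with hypothetical names:

```python
def bead_ratio(bead_ul: float, sample_ul: float) -> float:
    """Bead-to-sample volume ratio (e.g. 180 ul beads / 100 ul sample = 1.8)."""
    return bead_ul / sample_ul


def beads_for(sample_ul: float, ratio: float) -> float:
    """Bead volume that keeps a given ratio for another sample volume."""
    return round(sample_ul * ratio, 1)


ratio = bead_ratio(180, 100)  # 1.8x, as in step 2 above
print(beads_for(50, ratio))   # 90.0
```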
(4) 3'-end adenylation (A-tailing) and purification
The end-repaired DNA was added to the A-tailing reaction system and incubated at 37 ℃ for 30 min. The reaction system is as follows:
[Table: A-tailing reaction system (image not reproduced)]
The product was purified again as described in 2.3(3).
(5) Adaptor ligation
The A-tailed DNA was added to the adaptor ligation system, mixed well and incubated at 20 ℃ for 15 min. The adaptor ligation system is prepared as follows:
[Table: adaptor ligation reaction system (image not reproduced)]
The product was purified again as described in 2.3(3).
(6) Amplification of the adaptor-ligated DNA library
The purified adaptor-ligated library was added to the amplification system and amplified under the following conditions: 98 ℃ for 2 min; 10 cycles of 98 ℃ for 30 s, 65 ℃ for 30 s and 72 ℃ for 1 min; and 72 ℃ for 10 min. The reaction system is as follows:
[Table: library amplification reaction system (image not reproduced)]
The product was purified again as described in 2.3(3), with an elution volume of 30 μl.
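The amplification program above can be written down as data, which makes it easy to check and to estimate its duration. A Python sketch; the `Stage` class and the `hold_time_seconds` function are hypothetical, and the estimate covers hold times only (instrument ramping between temperatures is ignored).

```python
from dataclasses import dataclass


@dataclass
class Stage:
    steps: list[tuple[float, int]]  # (temperature in degrees C, hold time in s)
    cycles: int = 1


# Program from the text: initial denaturation, 10 cycles, final extension.
PROGRAM = [
    Stage([(98, 120)]),
    Stage([(98, 30), (65, 30), (72, 60)], cycles=10),
    Stage([(72, 600)]),
]


def hold_time_seconds(program: list[Stage]) -> int:
    """Sum of all hold times, counting each cycle of a repeated stage."""
    return sum(stage.cycles * sum(t for _, t in stage.steps) for stage in program)


print(hold_time_seconds(PROGRAM) / 60)  # 32.0 minutes of hold time
```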
(7) Library quality determination
Library quality was assessed on an Agilent 2100 Bioanalyzer; a peak distribution between 225 bp and 275 bp indicates that the library quality is acceptable.
2.4 Probe set hybridization
(1) Construction of hybrid Capture libraries
① Adjusting the library concentration
The library concentration for hybrid capture should be 221 ng/μl. Libraries below this concentration were concentrated in a vacuum concentrator at 45 ℃ and resuspended in water to 221 ng/μl.
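At 221 ng/μl, the 3.4 μl of library loaded per well in the hybrid capture step corresponds to about 751 ng of library. For a library dried in the vacuum concentrator, the resuspension volume follows from its total mass. A Python sketch with hypothetical names, assuming the library mass is known from a prior concentration measurement:

```python
TARGET_NG_PER_UL = 221.0  # required library concentration for hybrid capture
LOAD_UL = 3.4             # library volume per well in the hybrid capture step


def resuspension_volume(library_ng: float, target: float = TARGET_NG_PER_UL) -> float:
    """Water volume (ul) that brings a dried library of library_ng to `target` ng/ul."""
    return library_ng / target


print(round(TARGET_NG_PER_UL * LOAD_UL, 1))  # 751.4 ng per capture reaction
print(resuspension_volume(1105.0))           # 5.0
```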
② Preparation of hybridization reagents
Prepare the hybridization buffer at room temperature:
[Table: hybridization buffer (image not reproduced)]
Prepare the SureSelect Block mix:
[Table: SureSelect Block mix (image not reproduced)]
Prepare the RNase Block dilution on ice:
[Table: RNase Block dilution (image not reproduced)]
Prepare the probe set hybridization mix at room temperature:
[Table: probe set hybridization mix (image not reproduced)]
(2) Hybridization capture reaction
① Prepare the libraries in a PCR plate. Add 3.4 μl of library (221 ng/μl) to row "B", placing each sample library in a separate well. Add 5.6 μl of SureSelect Block mix, pipette up and down to mix thoroughly, and cap the wells. Incubate in the PCR instrument at 95 ℃ for 5 min, then hold at 65 ℃ with the heated lid on.
② With the instrument held at 65 ℃, add 20 μl of the pre-prepared probe set hybridization mix to row "A", cap, and leave at 65 ℃ for at least 5 min.
③ Open the PCR instrument and, at 65 ℃, use a multichannel pipette to transfer 7 μl of liquid from row "B" to row "C". Close the caps and heat at 65 ℃ for 2 min.
④ Open the PCR instrument and, at 65 ℃, transfer 13 μl of hybridization buffer from row "A" into the capture library mix in row "C". Then use the multichannel pipette to transfer all of the liquid in row "B" to row "C", mix gently by pipetting 8-10 times, and incubate at 65 ℃ for 16-24 h.
(3) Post-capture purification
Transfer the hybridization mixture from the PCR instrument to a 1.5 ml EP tube, add pre-washed magnetic bead suspension, and mix by inverting 3-5 times. Incubate at room temperature for 30 min; briefly centrifuge, place the tube on a magnetic rack, and remove the liquid.
Add 200 μl of SureSelect Wash 1 to the tube and vortex; incubate at room temperature for 15 min, vortexing 5 times for 5 sec each; briefly centrifuge, place on the magnetic rack, and remove the liquid.
Add 200 μl of pre-warmed SureSelect Wash 2 to the tube, vortex for 5 sec, incubate in a 65 ℃ water bath for 10 min, briefly centrifuge, place on the magnetic rack, and remove the supernatant; repeat this wash step twice.
Add 30 μl of nuclease-free water, vortex for 5 sec, and resuspend the magnetic beads.
(4) Post-capture amplification and index (tag) addition
[Table image not reproduced: post-capture amplification reaction system]
① Prepare the reaction system on ice and add 31 μl of it to each sample's reaction tube.
② Add 5 μl of index tag to each library.
③ Vortex the bead-bound captured DNA library, take 14 μl, mix it with the reaction system, and amplify as follows: 98 ℃ for 2 min; 12 cycles of 98 ℃ for 30 sec, 57 ℃ for 30 sec and 72 ℃ for 1 min; then 72 ℃ for 10 min. Purify the amplification product as described in 2.3 (3); a peak between 250 bp and 350 bp on the Agilent 2100 Bioanalyzer indicates acceptable quality. Store the library at -20 ℃ until use.
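The two amplification programs above (2.3 (6) and 2.4 (4)) differ only in annealing temperature (65 ℃ vs 57 ℃) and cycle number (10 vs 12). As a hedged illustration, they can be written as data in Python, which makes that difference explicit and lets the nominal run time be checked. The class and field names are hypothetical, ramp times are ignored, and the 31 + 5 + 14 μl sum assumes the three additions in steps ①-③ make up the whole post-capture reaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class PcrProgram:
    initial: Step
    cycle: tuple[Step, ...]
    cycles: int
    final: Step

    def hold_time_s(self) -> int:
        """Total time at the programmed temperatures (ramping excluded)."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


PRE_CAPTURE = PcrProgram(  # 2.3 (6): adaptor-ligated library amplification
    initial=Step(98, 120),
    cycle=(Step(98, 30), Step(65, 30), Step(72, 60)),
    cycles=10,
    final=Step(72, 600),
)
POST_CAPTURE = PcrProgram(  # 2.4 (4): post-capture amplification
    initial=Step(98, 120),
    cycle=(Step(98, 30), Step(57, 30), Step(72, 60)),
    cycles=12,
    final=Step(72, 600),
)

# Post-capture reaction volume: reaction mix + index tag + bead-bound library
POST_CAPTURE_VOLUME_UL = 31 + 5 + 14

print(PRE_CAPTURE.hold_time_s() // 60, POST_CAPTURE.hold_time_s() // 60)  # 32 36
print(POST_CAPTURE_VOLUME_UL)  # 50
```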
2.5 Library sequencing
Libraries that passed quality control were sequenced on the Illumina HiSeq X platform with a paired-end 150 bp (PE150) strategy, generating 8 G of raw data per library. After demultiplexing by library, the raw data were preprocessed: reads containing adapters and reads with 20 consecutive bases of sequencing quality below 5 were filtered out, and reads were trimmed where 5 consecutive bases had sequencing quality below 5.
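The two read-level filters described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: it assumes Phred+33 quality encoding and uses the common Illumina adapter prefix AGATCGGAAGAGC as a stand-in for the adapter sequence, which the source does not specify. The 5-base trimming rule is not modelled, and all function names are hypothetical.

```python
# Assumption: standard Illumina adapter prefix; the source does not name the adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def has_low_quality_run(qual: str, threshold: int = 5, run_length: int = 20) -> bool:
    """True if `run_length` consecutive bases all have quality below `threshold`."""
    run = 0
    for q in phred33(qual):
        run = run + 1 if q < threshold else 0
        if run >= run_length:
            return True
    return False


def keep_read(seq: str, qual: str) -> bool:
    """Drop reads that contain the adapter or a 20-base run of quality < 5."""
    if ADAPTER_PREFIX in seq:
        return False
    return not has_low_quality_run(qual)
```

In practice this filtering is done with a dedicated trimming tool over FASTQ files; the sketch only makes the two thresholds concrete.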
2.6 Virus detection sensitivity and genomic coverage analysis
The reads obtained in 2.5 were aligned with bowtie2 to human respiratory virus reference genomes, using the full-length genome sequence of the corresponding strain of each virus genus/species as the reference. Read coverage and depth over each reference genome were calculated with samtools.
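The coverage and depth figures reported below can be derived from per-position depth, such as the tab-separated output of `samtools depth -a` (reference name, position, depth, with zero-depth positions included). The source does not state whether "average depth" is averaged over the whole reference or only over covered positions, so this minimal Python sketch reports both; the function name is hypothetical. For segmented genomes, pooling all positions weights each segment by its length.

```python
def coverage_stats(depth_lines: list[str]) -> dict[str, float]:
    """Summarize per-position depth lines of the form "ref<TAB>pos<TAB>depth".

    Returns genome coverage (% of positions with depth >= 1), mean depth over
    all positions, and mean depth over covered positions only.
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        raise ValueError("no depth records")
    covered = [d for d in depths if d > 0]
    return {
        "coverage_pct": 100.0 * len(covered) / len(depths),
        "mean_depth_all": sum(depths) / len(depths),
        "mean_depth_covered": sum(covered) / len(covered) if covered else 0.0,
    }
```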
As shown in FIG. 1, the universal targeted enrichment capture probe set for human respiratory viruses covers more than 70% of the genome of every human respiratory virus genus and species. As shown in Table 2, the number of reads detected for each virus genus and species after probe set enrichment is greatly increased compared with the method without enrichment capture.
The efficiency of probe enrichment varies among virus types and among samples with different nucleic acid copy numbers. Targeted enrichment capture improves the detection sensitivity for some viruses; the lowest detected concentration of each virus is summarized in Table 3. In samples without capture enrichment, only the NA segment of influenza B virus (IFVB) could be detected, and only at 10⁷ copies/ml; at IFVB copy numbers of 10⁵ and 10⁴ copies/ml, no viral nucleic acid was detected. After targeted enrichment, all eight IFVB segments were detected at 10⁷ copies/ml, and when the copy number was reduced to 10⁵ and 10⁴ copies/ml, the PA, NA, MP and NP segments of IFVB were still detected. For HCoV-HKU1, EV D and RSV A, the lowest detected concentration was 10⁷ copies/ml without target capture but 10⁴ copies/ml after targeted enrichment. For HCoV-NL63, HCoV-OC43 and HPIV 2, the lowest detected concentration decreased by one order of magnitude after capture.
In addition, after probe set enrichment, the genome coverage of each virus genus and species is markedly improved compared with that obtained without enrichment capture. The results for each virus are analyzed below.
(1) Influenza A virus (IFVA)
As shown in FIG. 2, IFVA sequences were detected in libraries at both 10⁴ and 10⁵ copies/ml, with or without enrichment, but genome coverage and depth increased markedly after enrichment.
At 10⁴ copies/ml, the average coverage before enrichment was only 23.22%, and the average depth of reads aligned to the genome was 1.61; after enrichment, seven segments were 100% covered and the PA segment 99.95% covered, and the average depth increased to 354.22. Coverage was 4.31 times that before enrichment, and the average depth over the covered genome increased 262.05-fold. At 10⁵ copies/ml, the average genome coverage before enrichment was 87.89% with an average depth of 11.31; after capture enrichment, all eight genome segments were 100% covered with an average depth of 2277.54, a 214.81-fold increase in average genome depth. The coverage depth of each gene segment before and after capture enrichment is shown in FIG. 3.
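The fold changes quoted in this section behave as simple ratios of the post-enrichment value to the pre-enrichment value: several of them can be reproduced this way, for example IFVA coverage at 10⁴ copies/ml (taking post-enrichment coverage as about 100%, 100/23.22) or Adv B depth at 10⁴ copies/ml (19.98/0.05). A minimal Python check; the function name is hypothetical. The ratio is undefined when nothing was detected before enrichment, as for HCoV-229E at 10⁵ copies/ml.

```python
def fold_change(after: float, before: float) -> float:
    """Ratio of a post-enrichment value to its pre-enrichment value."""
    if before <= 0:
        raise ValueError("fold change is undefined when the pre-enrichment value is 0")
    return after / before


print(round(fold_change(100, 23.22), 2))    # 4.31   (IFVA coverage, 10^4 copies/ml)
print(round(fold_change(19.98, 0.05), 1))   # 399.6  (Adv B depth, 10^4 copies/ml)
print(round(fold_change(3771.6, 25.5), 2))  # 147.91 (EV depth, 10^7 copies/ml)
```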
(2) Influenza B virus (IFVB)
Targeted enrichment of IFVB nucleic acid in a sample improves detection sensitivity and increases genome coverage and depth. As shown in FIG. 4, before targeted enrichment, sequences were detected only from the NA segment and only in the sample with 10⁷ copies/ml, covering just 19.03% of the viral NA segment with an average depth of 1.22. After enrichment, at 10⁴ copies/ml, NP segment sequences were detected with 9.27% coverage and an average depth of 7.92. At 10⁵ copies/ml, sequences of the MP, NA and PA segments were detected, with an average coverage of 15.19% and an average depth of 4.73 over the corresponding segments. At 10⁷ copies/ml, all eight segments of the IFVB genome were detected. Apart from the HA and NS segments, whose sequence coverage was 22.92% and 26.74% respectively, reads from the other six segments covered more than 70% of their corresponding segments, with an average coverage depth of 3.64. The differences in average coverage depth before and after capture for each gene segment are shown in FIG. 5.
(3) Coronavirus (HCoV)
The effect of targeted enrichment on genome coverage and depth is shown in figure 6.
①HCoV-229E
At 10⁴ copies/ml, the average genome coverage before targeted enrichment capture was only 0.56%, with an average depth of 1.97; after enrichment, coverage increased to 18.98% with an average depth of 1.96, 33.89 times the pre-enrichment coverage. At 10⁵ copies/ml, coverage before enrichment was 0% and the average depth 0; after enrichment, genome coverage was 77.28% and the average depth increased to 4.93.
②HCoV-HKU1
Before enrichment, no viral nucleic acid sequences were detected at HCoV-HKU1 copy numbers of 10⁴ and 10⁵ copies/ml. After enrichment, genome coverage was 5.23% with an average depth of 2.02 at 10⁴ copies/ml, and 14.88% with an average depth of 3.37 at 10⁵ copies/ml. At 10⁷ copies/ml, genome coverage before enrichment was 80% with an average depth of 0.16; after enrichment, genome coverage was 50.01% and the average depth increased to 20.52.
③HCoV-OC43
At 10⁴ copies/ml, no HCoV-OC43 nucleic acid sequences were detected before enrichment; after enrichment, HCoV-OC43 coverage was 20.02% with an average depth of 1.88. At 10⁵ copies/ml, genome coverage before enrichment was 0.61% with an average depth of 1.60; after capture enrichment, coverage of the viral genome increased to 71.84% with an average depth of 4.28, so that coverage was 117.77 times and average depth 2.76 times that before enrichment. At 10⁷ copies/ml, genome coverage before enrichment was 98.71% with an average depth of 0.43; after capture enrichment, coverage of the viral genome was 91.69% and the average depth increased to 102.39, a 238.12-fold increase in average genome depth.
④HCoV-NL63
At 10⁴ copies/ml, no HCoV-NL63 nucleic acid sequences were detected before enrichment; after enrichment, HCoV-NL63 coverage was 4.23% with an average depth of 1.54. At 10⁵ copies/ml, genome coverage before enrichment was 94.6% with an average depth of 0.07; after capture enrichment, coverage of the viral genome was 17.16% with an average depth of 4.98, and the average depth increased 3.23-fold. At 10⁷ copies/ml, genome coverage before enrichment was 98.22% with an average depth of 4.67; after capture enrichment, coverage of the viral genome was 39.17% and the average depth increased to 31.40. The differences in coverage depth before and after capture enrichment for the different coronaviruses are summarized in FIG. 7.
(4) Metapneumovirus (hMPV)
Because the hMPV nucleic acid content of clinical samples is low, only the sample with 10⁴ copies/ml contained hMPV. Before capture, hMPV sequences covered 0% of the viral genome (average depth 0); after enrichment capture, viral genome coverage was 75.88% with an average depth of 171.75. See FIGS. 8 and 9.
(5) Parainfluenza virus (HPIV)
At 10⁴ copies/ml, no viral nucleic acid sequences were detected before enrichment; after enrichment, the reads covered up to 96.92% of the viral genome, with an average coverage depth of 9.00. At 10⁵ copies/ml, the average genome coverage before enrichment was 0.14% with an average depth of 9; after enrichment, the average genome coverage was 99.93% and the average depth increased to 71.66, a 7.96-fold increase in average genome depth. Coverage and depth of HPIV 2 before and after enrichment are shown in FIGS. 10 and 11.
(6) Respiratory Syncytial Virus (RSV)
Before enrichment, no RSV nucleic acid sequences were detected at 10⁴ or 10⁵ copies/ml; after enrichment, viral genome coverage was 2.03% with an average depth of 1.5 at 10⁴ copies/ml, and 14.75% with an average depth of 11.2 at 10⁵ copies/ml. At 10⁷ copies/ml, genome coverage before enrichment was 12.49% with an average coverage depth of 0.3; after enrichment, genome coverage was 27.24% and the average depth increased to 17.4. Thus, at 10⁷ copies/ml, capture enrichment increased genome coverage 2.18-fold and average depth 58-fold. RSV coverage and depth before and after enrichment are shown in FIGS. 12 and 13.
(7) Adenovirus (Adv)
At 10⁴ copies/ml, Adv B genome coverage before enrichment was 72.31% with an average coverage depth of 0.05; after enrichment, genome coverage was 42.95% and the average depth increased to 19.98, 399.6 times that before enrichment. At 10⁵ copies/ml, the average genome coverage before enrichment was 98.71% with an average depth of 1.13; after capture enrichment, genome coverage was 59.01% and the average depth increased to 283.31, 250.72 times that before enrichment. At 10⁷ copies/ml, the average genome coverage before enrichment was 99.20% with an average depth of 10.04; after capture enrichment, genome coverage was 63.90% and the average depth increased to 1280.22, 128 times that before enrichment. Coverage and depth of Adv B before and after enrichment are shown in FIGS. 14 and 15.
(8) Enterovirus (EV)
Before enrichment, no EV D or RV B nucleic acid sequences were detected at 10⁴ or 10⁵ copies/ml. After enrichment, at 10⁴ copies/ml, enterovirus reads covered 5.62% of the genome with an average depth of 91.1, and rhinovirus reads covered 3.30% of the genome with an average depth of 3.2. When the virus copy number increased to 10⁵ copies/ml, genome coverage for enterovirus and rhinovirus was 6.56% and 3.70%, with average depths of 1265.1 and 123.6, respectively. At 10⁷ copies/ml, enterovirus genome coverage before capture was 3.03% with an average depth of 25.5; after capture enrichment, enterovirus genome coverage was 7.77% with an average depth of 3771.6, that is, 2.56 times the pre-enrichment coverage and 147.91 times the pre-enrichment depth. Coverage and depth of EV D and RV B before and after enrichment are shown in FIGS. 16 and 17.
[Table image not reproduced]
Table 3 Lowest detected concentration of each respiratory virus using the universal targeted enrichment capture probe set for human respiratory viruses
[Table 3 image not reproduced]

Claims (10)

1. A method for generating a targeted enrichment capture probe set for human respiratory viruses, the method comprising the steps of:
(1) obtaining gene sequences of viruses that infect the human respiratory tract;
(2) designing probes against the gene sequences by a sliding-window method, exhaustively enumerating all possible nucleic acid sequences;
(3) filtering the resulting nucleic acid sequences and retaining those that meet the conditions;
(4) aligning the retained nucleic acid sequences to a reference sequence and removing, from the nucleic acid sequence pool, sequences that are highly homologous to the human genome; and
(5) clustering the probe sequences and retaining the core sequence of each cluster.
2. The method of claim 1, wherein the viruses that infect the human respiratory tract are viruses of the families, genera and species/types shown in Table 1.
3. The method of claim 1, wherein the sliding-window method uses a window width of 120 nt and a step size of 1 nt.
4. The method of claim 1, wherein the conditions for filtering the resulting probe sequences in step (3) are: the nucleic acid sequence is 120 nt long; the Tm of the nucleic acid sequence is 75 ± 5 ℃, calculated as Tm = 64.9 + 41 × (yG + zC - 16.4)/(wA + xT + yG + zC), where w, x, y and z are the numbers of A, T, G and C bases, assuming 50 mM Na+ and pH 7.0; no segment of a nucleic acid sequence capable of forming a hairpin structure exceeds 13 bases; and the GC content of the nucleic acid sequence is not higher than 70%.
5. The method of claim 1, wherein the reference sequence is the human reference genome GRCh38.
6. The method of claim 1, wherein in step (4) the resulting nucleic acid sequences are aligned to the reference sequence using bowtie2.
7. The method of claim 1, wherein the criteria for filtering out sequences in the nucleic acid sequence pool that are highly homologous to the human genome are: 80 consecutive nt aligning to the human genome; and/or more than 80% similarity to the human genome.
8. The method of claim 1, wherein the probe sequences are clustered using the UCLUST (Edgar 2010) software package (version 1.1.579) under the conditions: sequence similarity ≥ 90%; and/or sequence similarity ≥ 90% after reverse complementation.
9. A targeted enrichment method, comprising the step of hybridizing an NGS library of a human respiratory sample with a probe set obtained by the method of any one of claims 1 to 8; preferably, the human respiratory sample is a bronchoalveolar lavage fluid sample, a deep sputum sample, a tracheal aspirate sample, a pharyngeal swab sample, or a nasopharyngeal swab sample.
10. Use of a probe set obtained by the method of any one of claims 1 to 8 in the preparation of an NGS library of human respiratory viruses; preferably, the human respiratory viruses are viruses of the families, genera and species/types shown in Table 1.
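Claims 3, 4 and 8 specify concrete design parameters. The following Python sketch illustrates them under stated assumptions: it enumerates 120-nt windows at a 1-nt step, applies the length, Tm and GC criteria of claim 4, and then performs a simplified greedy clustering at 90% identity, also checking the reverse complement. It is not the inventors' implementation: the hairpin check (claim 4) and the bowtie2 alignment to GRCh38 (claims 5-7) are omitted, and ungapped identity between equal-length probes stands in for UCLUST's alignment-based identity. All function names are hypothetical.

```python
def sliding_windows(genome: str, width: int = 120, step: int = 1) -> list[str]:
    """Enumerate every window of `width` nt, advancing `step` nt at a time."""
    return [genome[i:i + width] for i in range(0, len(genome) - width + 1, step)]


def tm_basic(seq: str) -> float:
    """Tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C), the formula in claim 4."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    return 64.9 + 41.0 * (g + c - 16.4) / (a + t + g + c)


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def passes_filters(seq: str) -> bool:
    """Length, Tm and GC criteria of claim 4 (hairpin check not implemented)."""
    return (len(seq) == 120
            and abs(tm_basic(seq) - 75.0) <= 5.0
            and gc_fraction(seq) <= 0.70)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def identity(a: str, b: str) -> float:
    """Ungapped identity between equal-length probes (stand-in for UCLUST identity)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(probes: list[str], min_id: float = 0.90) -> list[str]:
    """Keep a probe as a new centroid unless it, or its reverse complement,
    reaches `min_id` identity to an existing centroid."""
    centroids: list[str] = []
    for p in probes:
        rc = revcomp(p)
        if not any(identity(p, c) >= min_id or identity(rc, c) >= min_id
                   for c in centroids):
            centroids.append(p)
    return centroids


if __name__ == "__main__":
    toy = "GA" * 60 + "TTAC" * 20  # hypothetical 200-nt sequence, not a real viral genome
    probes = [w for w in sliding_windows(toy) if passes_filters(w)]
    print(len(probes), len(greedy_cluster(probes)))
```

Because the claim 4 Tm formula depends only on base composition, a 120-nt probe falls in the 70-80 ℃ window only at roughly 26-50% GC, so in this sketch the 70% GC ceiling never rejects a probe that already passed the Tm test.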
CN201910728084.3A 2019-08-07 2019-08-07 Human respiratory virus targeted enrichment capture probe set and application thereof Withdrawn CN112342270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910728084.3A CN112342270A (en) 2019-08-07 2019-08-07 Human respiratory virus targeted enrichment capture probe set and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910728084.3A CN112342270A (en) 2019-08-07 2019-08-07 Human respiratory virus targeted enrichment capture probe set and application thereof

Publications (1)

Publication Number Publication Date
CN112342270A true CN112342270A (en) 2021-02-09

Family

ID=74367496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910728084.3A Withdrawn CN112342270A (en) 2019-08-07 2019-08-07 Human respiratory virus targeted enrichment capture probe set and application thereof

Country Status (1)

Country Link
CN (1) CN112342270A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113755555A (en) * 2021-09-03 2021-12-07 浙江工商大学 Capture probe set for detecting food allergen, preparation method and application thereof
CN114242174A (en) * 2022-01-10 2022-03-25 湖南大学 Identification and annotation method for endogenous retroviruses
CN116153411A (en) * 2023-04-18 2023-05-23 北京携云启源科技有限公司 Design method and application of multi-pathogen probe library combination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
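Zhu Junlin (朱俊琳): "Research on nucleic acid capture and enrichment technology for respiratory viruses", China Master's Theses Full-text Database (Electronic Journal), Medicine and Health Sciences *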

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113755555A (en) * 2021-09-03 2021-12-07 浙江工商大学 Capture probe set for detecting food allergen, preparation method and application thereof
CN114242174A (en) * 2022-01-10 2022-03-25 湖南大学 Identification and annotation method for endogenous retroviruses
CN114242174B (en) * 2022-01-10 2022-08-16 湖南大学 Identification and annotation method for endogenous retroviruses
CN116153411A (en) * 2023-04-18 2023-05-23 北京携云启源科技有限公司 Design method and application of multi-pathogen probe library combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210209
