CN115976178A

CN115976178A - SFTSV (severe fever with thrombocytopenia syndrome virus) detection method based on nanopore metagenome sequencing

Info

Publication number: CN115976178A
Application number: CN202211476258.XA
Authority: CN
Inventors: 许一菲; 韦雪敏; 王玉昊; 张宇涵
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2023-04-18

Abstract

The invention relates to an SFTSV detection method based on nanopore metagenome sequencing. The invention provides a detection method for SFTSV (severe fever with thrombocytopenia syndrome virus) that combines sequence-independent single-primer amplification with nanopore metagenome sequencing, and develops an automated, real-time bioinformatic data analysis workflow. The method can be applied to species identification from metagenome sequencing data, SFTSV whole-genome assembly, and analysis of SFTSV genotype and rearrangement characteristics.

Description

SFTSV (severe fever with thrombocytopenia syndrome virus) detection method based on nanopore metagenome sequencing

Technical Field

The invention belongs to the technical field of microbial detection methods, and particularly relates to an SFTSV detection method based on nanopore metagenome sequencing.

Background

The information in this background section is only for enhancement of understanding of the general background of the invention and is not necessarily to be construed as an admission or any form of suggestion that this information forms the prior art that is already known to a person of ordinary skill in the art.

Severe fever with thrombocytopenia syndrome (SFTS) is a tick-borne zoonosis caused by infection with a novel bunyavirus. The main symptoms of SFTS are fever, thrombocytopenia, leukopenia and gastrointestinal abnormalities, and severe cases die of multiple organ failure. The number of SFTS cases reported in China is increasing year by year and the fatality rate remains high; the incidence continues to rise in endemic areas, and case reports keep appearing in non-endemic areas, seriously threatening people's lives and health. In view of the public health challenge posed by SFTS, the WHO listed it in 2017 as one of the ten infectious diseases requiring priority attention. There is no specific treatment for SFTS and management relies mainly on symptomatic and supportive care, so early identification of patients is the key to treatment. However, the clinical symptoms of SFTS patients have poor specificity, and SFTS must be differentiated from diseases such as hemorrhagic fever with renal syndrome, rickettsial diseases such as human granulocytic anaplasmosis, dengue fever, sepsis, typhoid fever and thrombocytopenic purpura.

At present, severe fever with thrombocytopenia syndrome virus (SFTSV) is detected mainly by nucleic acid and antibody assays. Although PCR-based nucleic acid detection is sensitive and simple, it depends heavily on prior hypotheses: it can detect only a single pathogen or a limited set of pathogens, and cannot detect pathogens other than its targets. Antibody detection is quick and simple, but IgM and IgG antibodies are produced only after the virus has entered the body and a certain incubation period has passed, so the method is not suitable for rapid detection. In addition, conventional nucleic acid and antibody assays cannot obtain the complete genome of the pathogen, and therefore cannot support whole-genome characterization and source tracing of SFTSV. It is therefore important to establish a new method for rapid and comprehensive detection of SFTSV.

Disclosure of Invention

To solve the above problems, the invention provides an SFTSV detection method that combines sequence-independent single-primer amplification with nanopore metagenome sequencing. The detection method is rapid and yields comprehensive information.

In order to solve the above technical problems, the present invention provides the following technical solutions.

In a first aspect, the invention provides a method for real-time detection of SFTSV based on nanopore metagenome sequencing, comprising the following steps:

(1) Constructing a serum sample;

(2) Quantifying the viral load of a Hazara virus stock by quantitative reverse transcription-polymerase chain reaction, and adding the Hazara virus to the serum sample as a sequencing positive internal control;

(3) Extracting RNA from the serum sample, removing DNA and purifying the RNA;

amplifying the cDNA using the sequence-independent single-primer amplification (SISPA) method;

(4) Constructing a multiplexed sequencing library from the serum samples; loading the sequencing library onto a nanopore R9.4.1 sequencing chip and sequencing on a nanopore MinION sequencer;

(5) Performing bioinformatic data analysis.

Further, in step (1), the serum samples fall into three groups. The first group comprises serum samples with different serum backgrounds and different SFTSV concentrations, constructed from the sera of 10 SFTSV RNA-positive patients and 10 SFTSV RNA-negative patients; the second group comprises sera from 40 SFTSV RNA-positive patients; the third group comprises sera from 40 SFTSV RNA-negative patients and is divided into two parts, the first containing 10 samples positive for Hantaan virus or dengue virus RNA and the second containing 30 samples negative for SFTSV, Hantaan virus and dengue virus RNA.

Further, in step (2), Hazara virus is added to the serum sample at a concentration of 10⁴ copies/ml.

Further, in step (3), RNA is extracted using the QIAamp viral RNA kit (Qiagen), and DNA is removed and the RNA purified using Turbo DNase (Thermo Fisher Scientific) and the RNA Clean & Concentrator-5 kit (Zymo Research).

Further, in step (3), for reverse transcription, 4 μl of RNA and 1 μl of primer A at a concentration of 40 pmol/μl are mixed, incubated at 65 ℃ for 5 min, and cooled to room temperature; for first-strand synthesis, 5 μl of SuperScript IV (Thermo Fisher Scientific) reaction mix is added and incubated at 42 ℃ for 10 min; for second-strand synthesis, 5 μl of Sequenase (Affymetrix) reaction mix is added and incubated at 37 ℃ for 8 min, followed by addition of 0.45 μl of Sequenase dilution buffer and 0.15 μl of Sequenase and a further incubation at 37 ℃ for 8 min; for cDNA amplification, 5 μl of the reverse transcription product is mixed with 50 μl of AccuTaq LA (Sigma) reaction mix and 1 μl of primer B; the PCR conditions are 30 cycles of 98 ℃ for 30 s, 94 ℃ for 15 s, 50 ℃ for 20 s and 68 ℃ for 2 min, followed by 68 ℃ for 10 min; finally, the amplified cDNA product is purified using AMPure XP Beads (Beckman Coulter) of 1.

Primer A is 5'-GTTTCCCACTGGAGGATA-N9-3', as shown in SEQ ID NO. 1.

Primer B is 5'-GTTTCCCACTGGAGGATA-3', as shown in SEQ ID NO. 2.
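
The relationship between the two primers can be illustrated with a short Python sketch. It only restates what the sequences above imply: primer A is a fixed 18-nt tag followed by nine random bases (N9), and primer B is the tag alone, so every cDNA primed by primer A can later be amplified by primer B. The function and variable names are hypothetical and are not part of the described method.

```python
import random

# Known 18-nt tag shared by primer A and primer B (taken from the text above).
TAG = "GTTTCCCACTGGAGGATA"
# Length of the random tail (the "N9" part of primer A).
RANDOM_LEN = 9

# Primer B is the tag alone.
PRIMER_B = TAG


def make_primer_a(rng: random.Random) -> str:
    """Return one concrete instance of primer A: tag + random nonamer."""
    tail = "".join(rng.choice("ACGT") for _ in range(RANDOM_LEN))
    return TAG + tail


def n_possible_primer_a() -> int:
    """Number of distinct primer A molecules in a fully degenerate N9 pool."""
    return 4 ** RANDOM_LEN


# A seeded generator keeps the illustration reproducible.
example = make_primer_a(random.Random(0))
print(example, len(example), n_possible_primer_a())
```

Because the random tail can anneal anywhere, amplification does not depend on prior knowledge of the target sequence, which is the property the method relies on.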

Further, in step (4), 6 samples are sequenced per run, 200 fmol of cDNA is taken from each sample, and a multiplexed sequencing library is constructed using the EXP-NBD104 expansion kit and the SQK-LSK110 ligation kit.

Further, in the step (5), the biological information analysis includes the following processes:

(a) Converting nanopore sequencing original data into a sequence file in a Fastq format by using Guppy software;

(b) Removing the barcodes at both ends of the sequencing reads using Porechop, and monitoring data quality using NanoPlot;

(c) First, species annotation of the sequencing reads is performed by aligning them with Centrifuge against a reference database (the p_compressed+h+v database, containing genome sequences of >2 million species of bacteria, viruses, etc.) to obtain preliminary classification results; second, a temporary reference sequence is selected for each species from the reference database according to the Centrifuge classification results; third, the sequencing reads are mapped to the temporary reference sequence using Minimap2, and a temporary consensus sequence of the species is assembled by majority voting (that is, at each position of the genome the base supported by the most mapped reads is selected); fourth, the temporary consensus sequence is aligned with a self-established reference database using Blast to select the best reference sequence; finally, the sequencing reads are mapped to the best reference sequence using Minimap2, reads with a mapping quality lower than 50 or a mapping proportion lower than 50% are filtered out, and Samtools is used to compute statistics such as the number of mapped reads and the mapping depth at each base position of the reference sequence.

A virus is judged to be detected (positive) when at least 2 sequencing reads cover different positions of the pathogen genome.

(d) Analyzing mutation sites and assembling the consensus sequence of the virus using Medaka.

(e) Integrating the above bioinformatic analysis software into a fully automated, parallel data analysis workflow built with Nextflow. The workflow scans the sequencing data storage folder at a preset time interval, extracts and analyzes sequencing data in real time, can process the sequencing data of multiple samples in parallel, and can run on a variety of platforms such as personal computers, high-performance servers and cloud computing.
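
The folder-scanning behaviour described in (e) can be sketched in plain Python. This is a minimal illustration of polling a directory at a fixed interval and handing each newly seen FASTQ file to a handler; it is not the Nextflow workflow of the invention, and the names `scan_once`, `watch`, `interval_s` and `max_scans` are assumptions made for the example.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def scan_once(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return FASTQ files in `folder` that have not been reported before."""
    new_files = sorted(p for p in folder.glob("*.fastq") if p not in seen)
    seen.update(new_files)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None],
          interval_s: float = 60.0, max_scans: Optional[int] = None) -> None:
    """Poll `folder` every `interval_s` seconds and pass each new file to `handle`.

    `max_scans` bounds the number of polls; None means poll until interrupted.
    """
    seen: Set[Path] = set()
    scans = 0
    while max_scans is None or scans < max_scans:
        for path in scan_once(folder, seen):
            handle(path)
        scans += 1
        if max_scans is None or scans < max_scans:
            time.sleep(interval_s)
```

For example, `watch(Path("fastq_pass"), print, interval_s=30)` would print each new file as it appears; the folder name here is a placeholder. In a real deployment the handler would launch the per-sample analysis, and parallelism would come from the workflow engine rather than from this loop.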

Further, the self-established reference database comprises all SFTSV genome sequences in the NCBI and ViPR virus databases.
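
The majority voting used in (c) to build the temporary consensus sequence can be sketched as follows. The sketch assumes a simplified input, namely ungapped alignments given as (0-based start, read sequence) pairs, and marks uncovered positions with "N"; real alignments from Minimap2 contain insertions and deletions that this illustration ignores, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def majority_consensus(aligned_reads: Iterable[Tuple[int, str]],
                       ref_len: int, uncovered: str = "N") -> str:
    """Pick, at each reference position, the base seen in the most reads."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    # Positions with no coverage get the placeholder character.
    return "".join(col.most_common(1)[0][0] if col else uncovered
                   for col in columns)


reads = [(0, "ACGT"), (0, "ACCT"), (1, "CGTA")]
print(majority_consensus(reads, ref_len=6))  # prints ACGTAN
```

At position 2 two reads support G and one supports C, so G is chosen; position 5 has no coverage and is reported as N.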

The beneficial effects of the invention are:

First, an internal control is added in SFTSV metagenome sequencing, which improves the reliability of the detection result. Second, the parameters of the bioinformatic analysis of nanopore metagenome sequencing data are optimized and selected so that false-positive SFTSV reads can be filtered out, improving detection sensitivity and specificity. Third, SFTSV can be detected through nanopore metagenome sequencing and bioinformatic data analysis while its genome sequence is obtained at the same time, and this sequence information can be further used for genome characterization and source tracing of SFTSV.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.

FIG. 1 is a flow chart of SFTSV nanopore metagenomic sequencing data analysis.

Detailed Description

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The serum samples were obtained from the School of Public Health, Shandong University, and were positive or negative for SFTSV RNA detection.

The sources of the reagents of the invention are as follows:

TABLE 1 Reagent information

Reagent	Manufacturer
QIAamp viral RNA kit	Qiagen
Turbo DNase	Thermo Fisher Scientific
RNA Clean & Concentrator-5 kit	Zymo Research
SuperScript IV	Thermo Fisher Scientific
Sequenase	Affymetrix
AccuTaq LA	Sigma

To further illustrate the application of the detection method of the present invention, the following examples are given. The embodiment is only an example of the method of the present invention, and does not limit the subject matter and the scope of the invention, and other equivalent techniques within the scope of the inventive idea also belong to the scope of the invention.

Example 1. An SFTSV detection technique based on nanopore metagenome sequencing was established as follows:

(1) Samples were constructed and grouped as follows:

The first group comprised serum samples constructed with different serum backgrounds and different SFTSV concentrations.

Specifically, 10 serum samples positive for SFTSV RNA were mixed to form 1 positive sample pool. Serum samples from 10 SFTSV RNA-negative patients were mixed in 3 separate rounds to form 3 negative sample pools with different serum backgrounds. Each of the 3 negative pools was aliquoted into 6 EP tubes (1 ml serum per tube), different amounts of the positive pool were added to the tubes, and the viral load was quantified by quantitative reverse transcription-polymerase chain reaction so that the SFTSV concentrations of the tubes reached 0, 10², 10³, 10⁴, 10⁵ and 10⁶ copies/ml, respectively.

The second group comprised 40 serum samples positive for SFTSV RNA.

The third group comprised 40 serum samples negative for SFTSV RNA and was divided into two parts. The first part comprised 10 samples positive for Hantaan virus or dengue virus RNA; the second part comprised 30 samples negative for SFTSV, Hantaan virus and dengue virus RNA.
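
The spiking of the first group can be sketched as a simple dilution calculation. The text gives the target concentrations and the 1 ml aliquot volume but not the concentration of the positive pool, so the value `HYPOTHETICAL_STOCK` below is an invented placeholder, and the function name is likewise an assumption. In the method itself the final concentration is confirmed by quantitative reverse transcription-polymerase chain reaction rather than by calculation alone.

```python
def spike_volume_ml(target_copies_per_ml: float,
                    stock_copies_per_ml: float,
                    negative_volume_ml: float = 1.0) -> float:
    """Volume of positive pool to add to a negative aliquot to reach the target.

    Solves stock * v / (negative_volume + v) = target for v.
    """
    if target_copies_per_ml == 0:
        return 0.0
    if stock_copies_per_ml <= target_copies_per_ml:
        raise ValueError("stock must be more concentrated than the target")
    return (target_copies_per_ml * negative_volume_ml
            / (stock_copies_per_ml - target_copies_per_ml))


# Target concentrations from the text (copies/ml).
TARGETS = [0, 1e2, 1e3, 1e4, 1e5, 1e6]
# Placeholder stock concentration: NOT a value from the source.
HYPOTHETICAL_STOCK = 1e8

volumes = {t: spike_volume_ml(t, HYPOTHETICAL_STOCK) for t in TARGETS}
for target, vol in volumes.items():
    print(f"{target:>9.0f} copies/ml -> add {vol * 1000:.4f} ul")
```

With 3 negative pools and 6 concentrations each, the first group contains 18 constructed samples.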

(2) Nanopore metagenome sequencing was established as follows:

The viral load of the Hazara virus stock was quantified by quantitative reverse transcription-polymerase chain reaction, and Hazara virus was added to each sample at a concentration of 10⁴ copies/ml as the sequencing positive internal control.

The quantitative reverse transcription-polymerase chain reaction is a process of reverse transcribing RNA into DNA and amplifying a specific DNA target using the polymerase chain reaction.

RNA was extracted using the QIAamp viral RNA kit (Qiagen), and DNA was removed and the RNA purified using Turbo DNase (Thermo Fisher Scientific) and the RNA Clean & Concentrator-5 kit (Zymo Research).

The cDNA was amplified using the sequence-independent single-primer amplification (SISPA) method. SISPA is an amplification method that does not depend on virus isolation and culture and shortens the time needed to detect unknown viruses. The basic technical route is as follows: after the nucleic acid released from the sample is treated with DNase, reverse transcription is performed using random primers that carry a known fragment, synthesizing the first-strand and then the second-strand cDNA; PCR amplification is then performed on the double-stranded cDNA using the known fragment as the primer, so that the unknown viral nucleic acid fragments are amplified. In recent years the method has been improved by first filtering the sample through a 0.22 μm filter and then treating it with DNase to remove free nucleic acids and reduce interference from background nucleic acids. The advantages of SISPA are that, because it is independent of tissue culture and of the nucleic acid sequence, the detection time is greatly shortened, it is suitable for different types of clinical specimens, and double-stranded RNA, single-stranded RNA and DNA viruses can all be detected.

The method comprises the following specific steps:

For reverse transcription, 4 μl of RNA and 1 μl of primer A (5'-GTTTCCCACTGGAGGATA-N9-3', 40 pmol/μl) were mixed, incubated at 65 ℃ for 5 min and cooled to room temperature. For first-strand synthesis, 5 μl of SuperScript IV (Thermo Fisher Scientific) reaction mix was added and incubated at 42 ℃ for 10 min. For second-strand synthesis, 5 μl of Sequenase (Affymetrix) reaction mix was added and incubated at 37 ℃ for 8 min, followed by addition of 0.45 μl of Sequenase dilution buffer and 0.15 μl of Sequenase and a further incubation at 37 ℃ for 8 min. For cDNA amplification, 5 μl of the above reverse transcription product was mixed with 50 μl of AccuTaq LA (Sigma) reaction mix and 1 μl of primer B (5'-GTTTCCCACTGGAGGATA-3'). The PCR conditions were 30 cycles of 98 ℃ for 30 s, 94 ℃ for 15 s, 50 ℃ for 20 s and 68 ℃ for 2 min, followed by 68 ℃ for 10 min. Finally, the amplified cDNA product was purified using AMPure XP Beads (Beckman Coulter) of 1.

Six samples were sequenced per run; 200 fmol of cDNA was taken from each sample, and a multiplexed sequencing library was constructed using the EXP-NBD104 expansion kit and the SQK-LSK110 ligation kit.

The sequencing library was loaded onto a nanopore R9.4.1 sequencing chip and sequenced on a nanopore MinION sequencer.

The nanopore MinION sequencer is currently the most portable sequencer. Its core is a flow cell carrying 2,048 nanopores, divided into 512 groups and controlled by an application-specific integrated circuit.
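
The 200 fmol input per sample mentioned above is a molar amount, whereas cDNA is usually measured by mass. A minimal conversion sketch follows, assuming double-stranded cDNA and the common approximation of about 660 g/mol per base pair. The mean fragment length is not stated in the text, so the 1,000 bp used here is a placeholder, and the function name is hypothetical.

```python
def fmol_to_ng(fmol: float, mean_length_bp: float,
               g_per_mol_per_bp: float = 660.0) -> float:
    """Convert an amount of double-stranded DNA from fmol to ng.

    mass (ng) = fmol * length (bp) * ~660 g/mol/bp * 1e-6
    """
    return fmol * mean_length_bp * g_per_mol_per_bp * 1e-6


SAMPLES_PER_RUN = 6          # from the text
FMOL_PER_SAMPLE = 200.0      # from the text
ASSUMED_MEAN_LENGTH_BP = 1000.0  # placeholder, NOT a value from the source

per_sample_ng = fmol_to_ng(FMOL_PER_SAMPLE, ASSUMED_MEAN_LENGTH_BP)
print(f"{per_sample_ng:.1f} ng per sample, "
      f"{SAMPLES_PER_RUN * FMOL_PER_SAMPLE:.0f} fmol pooled per run")
```

Under these assumptions each sample contributes about 132 ng; the figure scales linearly with the actual mean fragment length measured for the SISPA product.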

Example 2. A bioinformatic analysis workflow for nanopore metagenomic sequencing data was established as follows:

(1) Nanopore sequencing raw data were converted to sequence files in Fastq format using Guppy software.

(2) Barcodes at both ends of the sequencing reads were removed using Porechop, and data quality was monitored using NanoPlot.

(3) To detect pathogens accurately, the invention adopts an analysis method that combines species annotation, mapping and filtering of the sequencing reads. The method comprises the following steps:

First, species annotation of the sequencing reads was performed by aligning them with Centrifuge against a reference database (the p_compressed+h+v database, containing genome sequences of >2 million species of bacteria, viruses, etc.) to obtain preliminary classification results. Second, a temporary reference sequence was selected for each species from the reference database according to the Centrifuge classification. Third, the sequencing reads were mapped to the temporary reference sequence using Minimap2, and a temporary consensus sequence for that species was assembled by majority voting (that is, at each position of the genome the base supported by the most mapped reads was selected). Fourth, the temporary consensus sequence was aligned with a self-established reference database (containing all the SFTSV genome sequences in the public databases) using Blast to select the best reference sequence. Finally, the sequencing reads were mapped to the best reference sequence using Minimap2, reads with a mapping quality lower than 50 or a mapping proportion lower than 50% were filtered out, and Samtools was used to compute statistics such as the number of mapped reads and the mapping depth at each base position of the reference sequence.

A virus was judged to be detected (positive) when at least 2 sequencing reads covered different positions of the pathogen genome.

(4) Medaka was used to analyze mutation sites and assemble the consensus sequence of the virus.

(5) The above bioinformatic analysis software was integrated into a fully automated, parallel data analysis workflow built with Nextflow. The workflow scans the sequencing data storage folder at a preset time interval, extracts and analyzes sequencing data in real time, can process the sequencing data of multiple samples in parallel, and can run on a variety of platforms such as personal computers, high-performance servers and cloud computing.
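
The read filtering and positivity rule of step (3) can be sketched in Python. Two interpretations here are assumptions rather than statements from the text: "mapping proportion" is taken as the aligned fraction of the read length, and "different positions" is taken to mean reads with non-identical reference coordinates. The `Hit` record and function names are hypothetical; in practice the fields would be parsed from the Minimap2 output.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    read_id: str
    ref_start: int    # 0-based start on the reference
    ref_end: int      # end on the reference (exclusive)
    mapq: int         # mapping quality reported by the aligner
    read_len: int     # full read length (assumed > 0)
    aligned_len: int  # number of read bases inside the alignment


def passes_filter(hit: Hit, min_mapq: int = 50,
                  min_fraction: float = 0.5) -> bool:
    """Keep reads with mapping quality >= 50 and aligned fraction >= 50%."""
    return (hit.mapq >= min_mapq
            and hit.aligned_len / hit.read_len >= min_fraction)


def is_positive(hits: Iterable[Hit], min_reads: int = 2) -> bool:
    """Call a virus positive if >= 2 retained reads map to distinct positions."""
    kept = [h for h in hits if passes_filter(h)]
    distinct_positions = {(h.ref_start, h.ref_end) for h in kept}
    return len(distinct_positions) >= min_reads
```

Requiring distinct coordinates means that two identical copies of the same amplified fragment do not by themselves trigger a positive call, which is one plausible reading of the criterion.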

Example 3. The detection performance of nanopore metagenomic sequencing on SFTSV samples with different viral loads and serum backgrounds was evaluated as follows:

(1) According to examples 1 and 2 of the present invention, nanopore metagenomic sequencing and bioinformatic data analysis were performed on the first set of samples.

(2) Analyzing the number of internal control Hazara reads detected from different background serum and different SFTSV load samples and the proportion of the internal control Hazara reads in sequencing data, analyzing the consistency of the proportion of the Hazara reads detected from different background serum samples, and evaluating the influence of the serum with different backgrounds on the pathogen detection of the sequencing method.

(3) Analyzing the number of SFTSV reads detected from different SFTSV load samples and the proportion of the SFTSV reads in sequencing data, analyzing the relation between the SFTSV reads proportion and the SFTSV load, and analyzing the detection limit of the SFTSV detected by the sequencing method.

(4) Reconstructing the whole-genome sequence of the SFTSV in each sample, analyzing the relation between genome coverage and viral load, and analyzing the detection limit of the sequencing method for reconstructing the SFTSV whole genome.
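
The two quantities analyzed in this example, the proportion of target reads in the sequencing data and the genome coverage, can be computed as in the following sketch. It assumes that per-position depths are available as a list of integers, for instance parsed from the output of `samtools depth -a`; the minimum depth of 1 used to count a position as covered is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def read_proportion(target_reads: int, total_reads: int) -> float:
    """Fraction of all sequencing reads assigned to the target virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads / total_reads


def genome_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy numbers for illustration only; they are not results from the source.
print(read_proportion(25, 1000))      # 0.025
print(genome_coverage([0, 3, 5, 0]))  # 0.5
```

Plotting these two values against the spiked viral load for each serum background would give the relations described in steps (2) to (4).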

Example 4. The sensitivity and specificity of nanopore metagenomic sequencing for detecting SFTSV were studied as follows:

(1) Nanopore metagenomic sequencing and bioinformatic data analysis were performed on the second and third groups of samples according to Examples 1 and 2 of the present invention.

(2) Analyzing the number of internal control Hazara reads detected in each sample and their proportion in the sequencing data, and evaluating the stability with which the sequencing method detects the internal control in SFTSV-positive samples with different Ct values and in SFTSV-negative samples.

(3) Analyzing the number of SFTSV reads detected in each sample and their proportion in the sequencing data; comparing the sequencing results with those of the SFTSV nucleic acid detection kit, analyzing the relation between the SFTSV read proportion and the Ct value, and analyzing the sensitivity and specificity of the sequencing method for detecting SFTSV.

(4) Selecting the cDNA products of 10 positive samples according to their nucleic acid detection Ct values for Illumina sequencing, and comparing the SFTSV read proportion, genome coverage and genome accuracy obtained on the two sequencing platforms.

The Illumina sequencing and data analysis were carried out as follows: a library was prepared from 1.5 ng of SISPA-amplified cDNA product and sequenced on an Illumina MiSeq sequencer. The sequencing reads were aligned to the reference SFTSV sequences using BWA, and consensus sequences were generated.
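
The sensitivity and specificity analysis in step (3) compares the sequencing calls with the nucleic acid detection kit as the reference. A minimal sketch of the calculation follows. The group sizes (40 kit-positive and 40 kit-negative samples) come from the text, but the counts used in the example call are invented for illustration and are not results of the invention; the function name is hypothetical.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from a 2x2 comparison table.

    tp/fn: kit-positive samples called positive/negative by sequencing.
    tn/fp: kit-negative samples called negative/positive by sequencing.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each reference group must contain at least one sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for a 40 + 40 sample design (NOT data from the source).
sens, spec = sensitivity_specificity(tp=38, fn=2, tn=40, fp=0)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

The 10 Hantaan virus or dengue virus positive samples in the third group fall under the kit-negative counts, so any SFTSV call on them would appear as a false positive.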

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. An SFTSV detection method based on nanopore metagenome sequencing, characterized by comprising the following steps:

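(1) Constructing a serum sample;

(2) Quantifying the viral load of a Hazara virus stock by quantitative reverse transcription-polymerase chain reaction, and adding the Hazara virus to the serum sample as a sequencing positive internal control;

(3) Extracting RNA from the serum sample, removing DNA and purifying the RNA;

(4) Constructing a multiplexed sequencing library from the serum samples; loading the sequencing library onto a nanopore R9.4.1 sequencing chip and sequencing on a nanopore MinION sequencer;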

(5) Performing bioinformatic data analysis.

2. The SFTSV detection method based on nanopore metagenomic sequencing of claim 1, wherein in step (1) the first group of serum samples comprises serum samples with different serum backgrounds and different SFTSV concentrations, constructed from the sera of 10 SFTSV RNA-positive patients and 10 SFTSV RNA-negative patients; the second group comprises sera from 40 SFTSV RNA-positive patients; the third group comprises sera from 40 SFTSV RNA-negative patients and is divided into two parts, the first part comprising 10 samples positive for Hantaan virus or dengue virus RNA and the second part comprising 30 samples negative for SFTSV, Hantaan virus and dengue virus RNA.

3. The SFTSV detection method based on nanopore metagenomic sequencing of claim 1, wherein in step (2) Hazara virus is added to the serum sample at a concentration of 10⁴ copies/ml.

4. The SFTSV detection method based on nanopore metagenomic sequencing of claim 1, wherein in step (3) RNA is extracted using the QIAamp viral RNA kit, and DNA is removed and the RNA purified using Turbo DNase and the RNA Clean & Concentrator-5 kit.

5. The SFTSV detection method based on nanopore metagenomic sequencing of claim 1, wherein in step (3), for reverse transcription, 4 μl of RNA and 1 μl of primer A at a concentration of 40 pmol/μl are mixed, incubated at 65 ℃ for 5 min, and cooled to room temperature; for first-strand synthesis, 5 μl of SuperScript IV reaction mix is added and incubated at 42 ℃ for 10 min; for second-strand synthesis, 5 μl of Sequenase reaction mix is added and incubated at 37 ℃ for 8 min, followed by addition of 0.45 μl of Sequenase dilution buffer and 0.15 μl of Sequenase and a further incubation at 37 ℃ for 8 min; for cDNA amplification, 5 μl of the reverse transcription product is mixed with 50 μl of AccuTaq LA (Sigma) reaction mix and 1 μl of primer B; the PCR conditions are 30 cycles of 98 ℃ for 30 s, 94 ℃ for 15 s, 50 ℃ for 20 s and 68 ℃ for 2 min, followed by 68 ℃ for 10 min; finally, the amplified cDNA product is purified using AMPure XP Beads (Beckman Coulter) of 1.

6. The SFTSV detection method based on nanopore metagenomic sequencing of claim 5, wherein primer A is 5'-GTTTCCCACTGGAGGATA-N9-3';

and primer B is 5'-GTTTCCCACTGGAGGATA-3'.

7. The method for SFTSV detection based on nanopore metagenomic sequencing of claim 1, wherein in step (4) a multiplex sequencing library is constructed using an EXP-NBD104 expansion kit and a SQK-LSK110 ligation kit.

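8. The SFTSV detection method based on nanopore metagenomic sequencing of claim 1, wherein in step (5) the bioinformatic analysis comprises the following procedures:

(a) Converting the nanopore sequencing raw data into sequence files in Fastq format using Guppy software;

(b) Removing the barcodes at both ends of the sequencing reads using Porechop, and monitoring data quality using NanoPlot;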

(c) First, performing species annotation of the sequencing reads by aligning them with Centrifuge against a reference database (the p_compressed+h+v database, containing genome sequences of >2 million species of bacteria, viruses, etc.) to obtain preliminary classification results; second, selecting a temporary reference sequence for each species from the reference database according to the Centrifuge classification results; third, mapping the sequencing reads to the temporary reference sequence using Minimap2, and assembling a temporary consensus sequence of the species by majority voting; fourth, aligning the temporary consensus sequence with a self-established reference database using Blast to select the best reference sequence; finally, mapping the sequencing reads to the best reference sequence using Minimap2, filtering out reads with a mapping quality lower than 50 or a mapping proportion lower than 50%, and using Samtools to compute statistics such as the number of mapped reads and the mapping depth at each base position of the reference sequence;

(d) Analyzing mutation sites by Medaka and assembling a consensus sequence of the virus;

(e) Fusing the biological information analysis software, and constructing a full-automatic and parallel biological information data analysis flow by adopting a Nextflow flow;

the self-established reference database is built from all SFTSV sequences in the NCBI or ViPR database and comprises all SFTSV genome sequences in the public databases.

9. The SFTSV detection method based on nanopore metagenome sequencing of claim 8, wherein a virus is judged to be detected (positive) when ≥2 sequencing reads cover different positions of the pathogen genome.

10. The SFTSV detection method based on nanopore metagenome sequencing of claim 8, wherein the majority voting method selects, at each position of the genome, the base supported by the most mapped reads.