CN111334868B - Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction - Google Patents


Info

Publication number
CN111334868B
Authority
CN
China
Prior art keywords
artificial sequence
dna
primer
sequence
illumina
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010225821.0A
Other languages
Chinese (zh)
Other versions
CN111334868A (en
Inventor
王洋
李�杰
王辰
高汉林
郭超
王健伟
任丽丽
杨明
刘静
赵晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Furui Medical Laboratory Co ltd
Chinese Academy of Medical Sciences CAMS
Original Assignee
Fuzhou Furui Medical Laboratory Co ltd
Chinese Academy of Medical Sciences CAMS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Furui Medical Laboratory Co ltd, Chinese Academy of Medical Sciences CAMS
Priority to CN202010225821.0A
Publication of CN111334868A
Application granted
Publication of CN111334868B
Legal status: Active
Anticipated expiration

Classifications

    • C - CHEMISTRY; METALLURGY
    • C40 - COMBINATORIAL TECHNOLOGY
    • C40B - COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 - Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/08 - Liquid phase synthesis, i.e. wherein all library building blocks are in liquid phase or in solution during library creation; Particular methods of cleavage from the liquid support
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 - Methods for sequencing
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701 - Specific hybridization probes
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 - Detection of binding sites or motifs
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 - Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Structural Engineering (AREA)
  • Virology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for constructing a novel coronavirus whole-genome high-throughput sequencing library and a kit for constructing the library. The method comprises the following steps: 1) reverse transcription of viral RNA; 2) a first round of PCR using multiplex amplification primers anchored with partial Illumina adaptor sequences; 3) a second round of PCR using tagged Illumina library amplification primers, followed by purification of the amplification product to obtain a high-throughput sequencing library. The anchored multiplex amplification primer combination provided by the invention enables efficient targeted enrichment of the genome of the novel coronavirus COVID-19. It overcomes the low targeting specificity, long turnaround time and susceptibility to host background contamination of existing methods, and allows whole-genome sequencing of the COVID-19 virus to be completed in a short time with a small amount of sequencing data and at low cost, thereby enabling differential diagnosis of the COVID-19 virus and identification of viral mutations.

Description

Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction
Technical Field
The invention relates to the technical field of biology, in particular to a method for constructing a novel coronavirus whole genome high-throughput sequencing library and a kit for constructing the library.
Background
The novel coronavirus COVID-19 (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) belongs to the genus Betacoronavirus (β-coronavirus). Like other known coronavirus genomes, the COVID-19 genome comprises 6 major open reading frames (ORF, Open Reading Frame), namely ORF1ab, ORF3a, ORF6, ORF7a, ORF8 and ORF10, together with other genes described here as accessory genes, namely the S, E, M and N genes. Whole-genome deep sequencing of the virus is currently the most sensitive and specific way to obtain all nucleic acid mutations of the viral genome and to study virus typing and evolutionary relationships. However, because of background interference from host nucleic acid and other factors, current mainstream high-throughput whole-genome sequencing schemes for viruses require large amounts of sequencing data, are costly, and have long turnaround times.
At present, most COVID-19 detection kits approved by the National Medical Products Administration (NMPA) are based on TaqMan-probe fluorescent real-time quantitative PCR (qRT-PCR), IgG/IgM colloidal gold antibody detection, or IgM antibody detection by magnetic particle chemiluminescence. The qRT-PCR method is highly specific and can complete relative quantification of virus-positive samples within 2 hours. However, RNA viruses often mutate much faster than other types of viruses. Once the probe or primer binding sites on the viral genome mutate, and given the additional influence of factors such as the quality of the extracted viral RNA, the experimental procedure and operator handling, sensitivity drops: the Ct value becomes unstable or exceeds an unreliable threshold such as 40, false negatives increase, and the rates of misdiagnosis and missed diagnosis rise accordingly. IgG/IgM colloidal gold antibody detection, which is based on the immunological antigen-antibody reaction, is extremely fast but has a high false-positive rate and still requires support from subsequent clinical diagnosis. CT examination, used as a gold standard for clinical diagnosis, depends on large instruments and is difficult to apply to general screening. Digital PCR (ddPCR) based on water-in-oil droplet technology is highly specific and sensitive, but it has low throughput and high cost, and its detection rate is likewise affected once a mutation occurs at a primer binding site.
In research on microorganisms and viruses, especially RNA viruses, RNA-seq is the main technique for sequencing the whole viral genome: host ribosomal RNA (hrRNA, human ribosomal RNA) is removed (rRNA depletion) from total RNA isolated from the host, and a next-generation sequencing library is then constructed and sequenced. The drawback of this approach is that the host genome and transcriptome material carried over when the virus is isolated leaves reads from the viral genome as only a very small fraction (0.01-0.1%) of the sequencing output, so the requirement for input RNA is high. In addition, sequence alignment and assembly in the subsequent bioinformatics analysis often leave a certain proportion of gaps, with insufficient coverage and sequencing depth in some regions of the viral genome and an insufficient overall coverage ratio. The required sequencing output is large (usually more than 10 G of data, depending on the abundance of the virus in the host and the size of the virus), the experimental cost is high, and the turnaround time is long. Research results recently published in the journal Nature show that, when RNA-seq and metagenomic (metagenome) analysis were applied to a novel coronavirus isolated from a human, a total of 10038758 sequencing reads were obtained; after the reads derived from the human host were filtered out, only 1582 sequencing reads remained for subsequent COVID-19 analysis.
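As a quick arithmetic check of the read counts quoted above, the following minimal sketch computes the fraction of reads retained for viral analysis. Only the two read counts come from the text; the variable names are illustrative.

```python
# Read counts quoted in the preceding paragraph.
total_reads = 10_038_758   # all sequencing reads obtained
viral_reads = 1_582        # reads left after removing human host reads

viral_fraction = viral_reads / total_reads
print(f"Viral read fraction: {viral_fraction:.4%}")
```

The result is about 0.016% of all reads, which lies at the low end of the 0.01-0.1% range mentioned above.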
Whole-genome sequencing of viruses with a targeted liquid-phase hybridization capture system is more specific and requires less data. Its drawbacks are a high requirement for input target cDNA, a risk that parts of the viral genome are not captured, high probe design cost, long turnaround time (hybridization capture alone takes more than 12 hours) and poor suitability for clinical translation. A literature search shows that few studies have reported high-throughput sequencing of whole RNA virus genomes using targeted multiplex PCR sequencing (TMS, Targeted Multiplexing-PCR Sequencing) technology.
To fill this technical gap, we propose a technique for rapid differential diagnosis of whole-genome mutations of the novel coronavirus (COVID-19), and a corresponding kit, based on targeted multiplex polymerase chain reaction amplicon sequencing (Targeted Multiplexed Amplicon-seq). The method is not affected by the host genome, targets COVID-19 strongly, and gives high and uniform coverage with a low sample input requirement. Compared with existing high-throughput virus sequencing methods, it greatly reduces experimental and sequencing costs and greatly shortens turnaround time, and it enables highly sensitive, accurate and comprehensive differential diagnosis of the COVID-19 virus in biological samples such as throat swabs and alveolar lavage fluid and in virus culture samples.
Disclosure of Invention
The invention aims to provide a method for constructing a novel coronavirus whole genome high-throughput sequencing library based on a targeted multiplex polymerase chain reaction amplicon sequencing technology (Targeted Multiplexed Amplicon-seq) and a kit for constructing the library.
It is another object of the present invention to provide the use of the above method in the detection of novel coronavirus variants.
To achieve the object of the present invention, in a first aspect, the present invention provides a method for constructing a novel coronavirus whole genome high throughput sequencing library, comprising the steps of:
A. Extracting RNA of a virus sample, and carrying out reverse transcription to obtain single-stranded cDNA or double-stranded cDNA;
B. Designing tiled, full-coverage primers according to the published genome sequence of the novel coronavirus COVID-19, namely multiplex amplification primer set 1 anchored with partial Illumina linker sequences and multiplex amplification primer set 2 anchored with partial Illumina linker sequences (anchored multiplex amplification primer set 1 and anchored multiplex amplification primer set 2); performing a first round of PCR with primer set 1 and with primer set 2 separately, using the single-stranded cDNA or double-stranded cDNA as template; and mixing the amplification products in equimolar amounts so that they cover the whole genome of the virus;
C. Performing a second round of PCR with tagged Illumina library amplification primers, using the mixed amplification products from step B as template, and purifying the amplification products to obtain a high-throughput sequencing library;
the anchored multiplex amplification primer set 1 and the anchored multiplex amplification primer set 2 in step B are designed as follows:
B1. Designing non-anchored multiplex amplification primer sets according to the genome sequence of the novel coronavirus COVID-19, namely multiplex specific amplification primer set I and multiplex specific amplification primer set II, wherein primer set I comprises a forward primer pool and a reverse primer pool, primer set II comprises a forward primer pool and a reverse primer pool, and each pair of forward and reverse primers corresponds to one amplicon; the forward primer and the reverse primer of primer set II are designed within the sequences of two adjacent amplicons of primer set I, respectively, the forward primer and the reverse primer of primer set I are designed within the sequences of two adjacent amplicons of primer set II, respectively, and this is repeated until the amplicons of primer set I and the amplicons of primer set II together cover the whole genome of the virus in a tiled manner;
B2. Adding the Illumina partial linker sequence (1), in the 5'-3' direction, to the 5' end of each forward primer, and adding the Illumina partial linker sequence (2), in the 5'-3' direction, to the 5' end of each reverse primer; the forward primer F pool carrying the Illumina partial linker sequence (1) and the reverse primer R pool carrying the Illumina partial linker sequence (2) constitute the anchored multiplex amplification primer set 1; the forward primer F' pool carrying the Illumina partial linker sequence (1) and the reverse primer R' pool carrying the Illumina partial linker sequence (2) constitute the anchored multiplex amplification primer set 2;
wherein the sequence of Illumina partial linker sequence (1) is as follows: 5' -I7 tagged primer 3' terminal sequence-AGATGTGTATAAGAGACAG-3 ';
the sequence of Illumina partial linker sequence (2) is as follows: 5' -I5 tagged primer 3' -terminal sequence-AGATGTGTATAAGAGACAG-3 ';
and the size of the 3' -end sequence of the I7 tagged primer is 9-15 bp, and the size of the 3' -end sequence of the I5 tagged primer is 8-14 bp, so that the I7 tagged primer and the I5 tagged primer can be specifically annealed to the 3' -end binding position on the amplicon.
In the method, the Tm values of the primers in step B differ by no more than ±2 °C; and/or
the amplicon size is 200-300 bp; and/or
primer pairs that may form dimers (Primer Dimer) between primers or stem-loop structures (Stem-Loop) within a primer are removed during primer design; and/or
within the same multiplex specific amplification primer set, the 5' end of the reverse primer of a genome-upstream amplicon lies upstream of the 5' end of the forward primer of the downstream amplicon, to prevent short-fragment by-products from forming and competing in the PCR.
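The tiled two-pool layout and the constraints listed above can be sketched as follows. This is a minimal illustration only, not the primer design procedure of the invention: the genome length of 29,900 bp, the fixed amplicon length of 250 bp, the 50 bp overlap and all function names are assumptions chosen for the example. Coordinates are half-open (start, end) intervals, and real design must also apply the Tm, dimer and stem-loop filters.

```python
def tile_amplicons(genome_len, amplicon_len=250, overlap=50):
    """Lay overlapping amplicons along the genome, alternating between pool 1 and pool 2."""
    step = amplicon_len - overlap
    pools = {1: [], 2: []}
    start, index = 0, 0
    while True:
        pool = index % 2 + 1
        if start + amplicon_len >= genome_len:
            # Shift the last amplicon left so that it keeps the full length.
            pools[pool].append((max(0, genome_len - amplicon_len), genome_len))
            break
        pools[pool].append((start, start + amplicon_len))
        start += step
        index += 1
    return pools


def same_pool_amplicons_disjoint(amplicons):
    """True if each amplicon ends before the next amplicon of the same pool starts."""
    return all(prev_end <= next_start
               for (_, prev_end), (next_start, _) in zip(amplicons, amplicons[1:]))


def covers_genome(pools, genome_len):
    """True if the amplicons of both pools together leave no gap."""
    reach = 0
    for start, end in sorted(pools[1] + pools[2]):
        if start > reach:
            return False
        reach = max(reach, end)
    return reach >= genome_len


pools = tile_amplicons(29_900)  # assumed genome length, for illustration only
print(len(pools[1]), len(pools[2]))
print(covers_genome(pools, 29_900), same_pool_amplicons_disjoint(pools[1]))
```

Within one pool the amplicons do not overlap, which corresponds to the requirement that the reverse primer of the upstream amplicon ends before the forward primer of the downstream amplicon begins; the overlaps occur only between amplicons of different pools, which are amplified in separate reactions.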
The method of reverse transcription of RNA into single-stranded cDNA in step A is selected from the following a or b:
a. guiding single-stranded cDNA synthesis by using a 6-10bp random primer;
b. a plurality of primers from a reverse primer R pool and a reverse primer R ' pool are mixed to form a specific reverse transcription primer group to guide single-stranded cDNA synthesis, the reverse primers are uniformly distributed along the 3' -5' direction of a viral genome, and the primers are 800-1000bp apart.
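Option b can be illustrated with a short sketch that selects reverse primers spaced 800-1000 bp apart. The greedy selection, the function name and the candidate positions (one reverse primer every 200 bp) are assumptions for the example; the text only specifies the spacing and the 3'-5' direction.

```python
def pick_rt_primers(reverse_primer_positions, min_gap=800, max_gap=1000):
    """Greedily pick primer positions spaced min_gap..max_gap bp apart,
    walking from the 3' end of the genome towards the 5' end."""
    ordered = sorted(reverse_primer_positions, reverse=True)
    if not ordered:
        return []
    chosen = [ordered[0]]
    for pos in ordered[1:]:
        if min_gap <= chosen[-1] - pos <= max_gap:
            chosen.append(pos)
    return chosen


# Hypothetical reverse-primer binding positions, one every 200 bp.
candidates = range(250, 29_901, 200)
selected = pick_rt_primers(candidates)
gaps = [a - b for a, b in zip(selected, selected[1:])]
print(len(selected), min(gaps), max(gaps))
```

If no candidate falls inside the spacing window at some point, this simple greedy walk stops extending the selection, so a real primer pool may need a more tolerant strategy.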
The method for reverse transcription of RNA into double-stranded cDNA in step A comprises:
i. priming single-stranded cDNA synthesis with a 6-10bp random primer;
ii. nicking the RNA-cDNA hybrid duplex with RNase H (RNaseH) in the presence of dNTPs;
iii. synthesizing double-stranded cDNA with RNA-dependent DNA polymerase, using the small RNA fragments generated at the nicks as primers.
The tagged Illumina library amplification primers in step C are as follows (SEQ ID NOS: 503-504):
I7 tagged primer: 5'-CAAGCAGAAGACGGCATACGAGAT (i7) GTCTCGTGGGCTCGG-3'; I5 tagged primer: 5'-AATGATACGGCGACCACCGAGATCTACAC (i5) TCGTCGGCAGCGTC-3'.
Preferably, the sequence of Illumina partial linker sequence (1) in step B is as follows: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 1);
the sequence of Illumina partial linker sequence (2) is as follows: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3' (SEQ ID NO: 2).
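The relationship between the partial linker sequences and the tagged primers can be checked with a short sketch. The linker and tagged-primer sequences are those given above; the specific primer sequence, the index sequence and the function names are hypothetical placeholders introduced for illustration.

```python
MOSAIC_END = "AGATGTGTATAAGAGACAG"          # shared 3' part of both partial linkers
LINKER_1 = "GTCTCGTGGGCTCGG" + MOSAIC_END   # SEQ ID NO: 1, added to forward primers
LINKER_2 = "TCGTCGGCAGCGTC" + MOSAIC_END    # SEQ ID NO: 2, added to reverse primers

# Tagged primers as (sequence before the index, 3' terminal sequence after the index).
I7_PRIMER = ("CAAGCAGAAGACGGCATACGAGAT", "GTCTCGTGGGCTCGG")
I5_PRIMER = ("AATGATACGGCGACCACCGAGATCTACAC", "TCGTCGGCAGCGTC")


def anchor_primer(linker, specific_primer):
    """Add the partial linker to the 5' end of a virus-specific primer."""
    return linker + specific_primer


def tagged_primer(parts, index):
    """Assemble a tagged library amplification primer around an index sequence."""
    head, tail = parts
    return head + index + tail


def tail_matches_linker(parts, linker):
    """True if the 3' terminal sequence of the tagged primer equals the 5' part of the linker."""
    return linker.startswith(parts[1])


specific = "ACGTACGTACGTACGTACGT"   # hypothetical virus-specific primer
index = "ACGTACGT"                  # hypothetical 8-nt index

print(anchor_primer(LINKER_1, specific))
print(tagged_primer(I7_PRIMER, index))
print(tail_matches_linker(I7_PRIMER, LINKER_1), tail_matches_linker(I5_PRIMER, LINKER_2))
```

Because the 3' terminal sequence of each tagged primer is identical to the 5' part of the corresponding linker, the tagged primers can prime on the first-round products in the second round of PCR.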
In the method, the anchored multiplex amplification primer set 1 and the anchored multiplex amplification primer set 2 in step B comprise 250 primer pairs, wherein the forward primers are COV-1-F to COV-250-F, whose nucleotide sequences are shown in SEQ ID NO: 3-252, respectively, and the reverse primers are COV-1-R to COV-250-R, whose nucleotide sequences are shown in SEQ ID NO: 253-502, respectively; COV-1-F and COV-1-R form one primer pair, COV-2-F and COV-2-R form one primer pair, and so on.
Preferably, the primer information of the multiplex amplification primer set 1 of the anchor part Illumina adaptor sequence and the multiplex amplification primer set 2 of the anchor part Illumina adaptor sequence in step B are shown in table 1 and table 2, respectively.
TABLE 1 Primer information for anchored multiplex amplification primer set 1
(Table 1 is provided as images in the original publication: Figure BDA0002427599160000041, Figure BDA0002427599160000051, Figure BDA0002427599160000061.)
TABLE 2 Primer information for anchored multiplex amplification primer set 2
(Table 2 is provided as images in the original publication: Figure BDA0002427599160000071, Figure BDA0002427599160000081, Figure BDA0002427599160000091.)
Wherein the primer number COV-1 corresponds to the primers COV-1-F and COV-1-R, the primer number COV-2 corresponds to the primers COV-2-F and COV-2-R, and so on.
In the present invention, the virus sample may be derived from a biological sample such as a throat swab, an alveolar lavage, or a supernatant isolated culture after virus infection of cells.
In a second aspect, the present invention provides a kit for constructing a novel coronavirus whole genome high throughput sequencing library, the kit comprising multiplex amplification primer set 1 of anchor moiety Illumina adaptor sequences and multiplex amplification primer set 2 of anchor moiety Illumina adaptor sequences and tagged Illumina library amplification primers used in the library construction method described above, optionally comprising various reagents (e.g. amplification enzyme reagents, corresponding buffers, etc.) for library construction.
In a third aspect, the present invention provides the use of the above library construction method in the detection of novel coronavirus variants, said use comprising:
(1) Constructing and obtaining a novel coronavirus whole genome high-throughput sequencing library to be tested according to the method;
(2) Sequencing the high-throughput sequencing library on a machine after the quality inspection of the high-throughput sequencing library is qualified;
(3) Bioinformatics analysis and detection of mutation sites.
Preferably, step (3) comprises the sub-steps of:
1) Constructing a novel coronavirus COVID-19 reference genome MT019531.1 index data set by using BWA software, and generating fai files by using samtools faidx;
2) Read quality control: the paired-end reads are filtered and quality-controlled with SOAPnuke to obtain clean reads (the reads remaining after filtering); reads meeting any of the following conditions are removed: condition 1: reads contaminated with linker (adapter) sequences; condition 2: reads in which N bases exceed 10%; condition 3: reads in which low-quality (Q < 38) bases exceed 50% of the read;
3) Data alignment and sorting: BWA combined with samtools is used to align the clean reads to the reference genome MT019531.1 and generate a BAM file, with the alignment parameters "-t 32 -M"; the BAM file is sorted with SortSam.jar of the picard software; an index is built for the sorted BAM file with the index tool of samtools; quality control of the generated BAM file is performed with the Qualimap tool;
4) Mutation detection: SNPs and InDels of the virus are detected with samtools pileup and VarScan; the SNP detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
5) Finally, the detected SNPs and indels were annotated using annovar software based on the GFF file of MT019531.1 reference genome.
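The filtering logic in sub-steps 2) and 4) can be sketched in plain Python. This is an illustration of the stated thresholds only, not the implementation of SOAPnuke or VarScan: the function names and the adapter sequence are assumptions, and the variant thresholds (coverage 8, supporting reads 4, variant frequency 0.1) are our reading of the detection parameters in sub-step 4).

```python
def passes_read_qc(seq, quals, adapter, max_n_frac=0.10, min_q=38, max_low_frac=0.50):
    """Return True if a read survives the three removal conditions of sub-step 2)."""
    if not seq or len(seq) != len(quals):
        return False
    if adapter in seq:                              # condition 1: adapter contamination
        return False
    if seq.count("N") / len(seq) > max_n_frac:      # condition 2: more than 10% N bases
        return False
    low_quality = sum(q < min_q for q in quals)
    if low_quality / len(seq) > max_low_frac:       # condition 3: over 50% low-quality bases
        return False
    return True


def passes_variant_thresholds(depth, alt_reads, min_coverage=8, min_reads2=4, min_var_freq=0.1):
    """Apply the coverage, supporting-read and frequency cut-offs to one site."""
    if depth < min_coverage or alt_reads < min_reads2:
        return False
    return alt_reads / depth >= min_var_freq


ADAPTER = "CTGTCTCTTATACACATCT"  # illustrative adapter sequence (assumption)
print(passes_read_qc("ACGT" * 25, [40] * 100, ADAPTER))
print(passes_variant_thresholds(depth=50, alt_reads=6))
```

In practice these checks are performed by the tools named above; the sketch only makes the decision rules explicit.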
By means of the technical scheme, the invention has at least the following advantages and beneficial effects:
The anchored multiplex amplification primer combination provided by the invention enables efficient targeted enrichment of the genome of the novel coronavirus COVID-19. It overcomes the low targeting specificity, long turnaround time and susceptibility to host background contamination of existing methods, and allows whole-genome sequencing of the COVID-19 virus to be completed in a short time with a small amount of sequencing data and at low cost.
The multiplex polymerase chain reaction amplification primers provided by the invention specifically target the genome sequence of the novel coronavirus COVID-19, are not affected by host human RNA, require little sample input, and enable specific detection, differential diagnosis and mutation identification of the whole genome of the COVID-19 RNA virus extracted from samples such as throat swabs and alveolar lavage fluid. The primer sequences for multiplex PCR amplification went through multiple rounds of testing and optimization, and the coverage uniformity of the resulting sequencing data exceeds 90%. The method also overcomes the drawbacks of RNA-seq sequencing of RNA virus genomes, in which large amounts of residual host RNA lead to too little sequencing data from the viral genome itself, an overly long experimental cycle and a large input of nucleic acid material; and it can be used for secondary detection and diagnosis of patients with suspected symptoms who tested false-negative by the rapid conventional qRT-PCR method.
In actual use, the invention optimizes the PCR reaction system and program for targeted enrichment and further amplification in the library construction process, and effectively alleviates the low amplification efficiency and poor uniformity of conventional general multiplex PCR reactions.
Drawings
FIG. 1 shows the principle of primer design according to the present invention.
FIG. 2 shows the Agilent 2200 micro-electrophoresis peak charts from quality control of the sequencing libraries in Example 1 of the present invention, where a is the quality control result of library 46d1-1 and b is the quality control result of library 50d1-1; Size (bp) on the abscissa represents library fragment size, and Sample Intensity on the ordinate represents signal intensity.
FIG. 3 shows the Agilent 2200 micro-electrophoresis peak charts from quality control of the sequencing libraries in Example 2 of the present invention, where a is the quality control result of library 46d1-2 and b is the quality control result of library 50d1-2; Size (bp) on the abscissa represents library fragment size, and Sample Intensity on the ordinate represents signal intensity.
FIG. 4 shows the Agilent 2200 micro-electrophoresis peak charts from quality control of the sequencing libraries in Example 3 of the present invention, where a is the quality control result of library 48d5-1 and b is the quality control result of library 47d1-1; Size (bp) on the abscissa represents library fragment size, and Sample Intensity on the ordinate represents signal intensity.
FIG. 5 shows the Agilent 2200 micro-electrophoresis peak charts from quality control of the sequencing libraries in Example 4 of the present invention, where a is the quality control result of library 48d5-2 and b is the quality control result of library 47d1-2; Size (bp) on the abscissa represents library fragment size, and Sample Intensity on the ordinate represents signal intensity.
FIG. 6 shows the Agilent 2200 micro-electrophoresis peak charts from quality control of the sequencing libraries in Example 5 of the present invention, where a is the quality control result of library XH1P2_R, b is the quality control result of library WHP6_R and c is the quality control result of library XH1P6_R; Size (bp) on the abscissa represents library fragment size, and Sample Intensity on the ordinate represents signal intensity.
Detailed Description
The invention provides a technique for rapid differential diagnosis of whole-genome mutations of the novel coronavirus (COVID-19) based on targeted multiplex polymerase chain reaction amplicon sequencing, together with a kit application. The primer combination and kit designed according to the method, and the COVID-19 single-stranded RNA library construction method that uses the primer combination, enable accurate and comprehensive differential diagnosis of the novel coronavirus COVID-19 in biological samples such as pharyngeal swabs and alveolar lavage fluid and in virus culture samples.
The invention also provides an approach and a reference model for rapidly identifying mutations of all types of RNA viruses with known genome sequences. Furthermore, the anchored multiplex PCR primer sequences can be replaced, according to actual requirements, with ones targeting the genome sequences of different RNA viruses, so that kits suitable for different applications can be developed.
The technical scheme of the invention is as follows:
the invention provides a library construction method for human novel coronavirus (COVID-19) whole genome high throughput sequencing, which comprises the following steps:
1. Reverse transcription of viral single-stranded RNA. Single-stranded cDNA is synthesized by either of two routes. In the first route: step a, first-strand cDNA (1st cDNA) synthesis is primed by 6- to 10-base random primers (Random 6mer-10mer Primer); step b, the 1st cDNA enters the subsequent PCR amplification reaction with or without purification. In the second route: step a, several primers from the non-anchored reverse primer pool R (see claim 3a) are mixed into a specific reverse transcription primer set that primes first-strand cDNA synthesis, the binding positions of the selected specific reverse transcription primers being evenly distributed along the 3'-5' direction of the viral genome at intervals of 800-1000 bp; step b, the first-strand cDNA is purified and then used in the subsequent PCR amplification reaction.
Double-stranded cDNA is synthesized from viral single-stranded RNA as follows: step a, first-strand cDNA synthesis is primed by 6- to 10-base random primers; step b, RNaseH mediates nicking of the RNA-1st cDNA hybrid duplex (RNA-1st cDNA hybrid) in the presence of deoxyribonucleoside triphosphates (dNTPs); step c, second-strand cDNA (2nd cDNA) is synthesized with RNA-dependent DNA polymerase, using the small RNA fragments generated at the nicks as primers; step d, the double-stranded cDNA is recovered and purified.
These steps can be carried out with commercial reverse transcription and second-strand synthesis kits.
2. Designing a multiple specific primer 250 pair of the COVID-19 virus genome of the anchor part Illumina sequencing linker sequence:
a. two sets of multiplex amplification primer sets 1 and 2 (each primer pool comprising a forward primer F pool and a reverse primer R pool, respectively) were designed based on the full length of the sequence MT019531.1 of the genome of COVID-19 published on the National Center for Biotechnology Information (NCBI) website (Accession No: MT019531GWHABKH 00000000); the forward primer and the reverse primer of the amplification primer group 2 are respectively designed in two adjacent amplicon sequences of the amplification primer group 1, and the forward primer and the reverse primer of the amplification primer group 1 are respectively designed in two adjacent amplicon sequences of the amplification primer group 2 continuously and repeatedly until the amplification products of the amplification primer group 1 and the amplification primer group 2 can cover the whole virus genome in a shingled manner (figure 1). Designing primers, setting Tm threshold difference between different primers to be +/-2 ℃, setting the size of an amplicon product to be 200-300bp, and simultaneously removing Primer pairs which can cause Dimer (Primer Dimer) and Stem-Loop structure (Stem-Loop) to be formed between the primers and/or inside the primers in the design; in the same amplification primer group, the 5 'end of the reverse primer sequence of the upstream amplicon of the genome is ensured to be positioned at the upstream of the 5' end of the forward primer sequence of the downstream amplicon as much as possible, so that the formation of short-fragment byproducts is prevented and PCR competition is carried out; meanwhile, the amplification efficiency in the system is ensured to be close to high consistency.
b. Design of the anchored multiplex amplification primer F pool carrying a partial Illumina adapter sequence: the partial Illumina Nextera adapter sequence (5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3') is added, in the 5'-3' direction, to the 5' end of every primer in the forward primer F pool. Here 5'-AGATGTGTATAAGAGACAG-3' is the Tn5 transposase binding site sequence of the Illumina Nextera adapter, and 5'-GTCTCGTGGGCTCGG-3' is identical to the 3' terminal sequence of the indexed primer (I7-Indexed Primer) used for Illumina complete-library amplification. The 5'-GTCTCGTGGGCTCGG-3' portion may be shortened or lengthened as appropriate, provided that the 3' terminal sequence (downstream of I7) of the I7-Indexed Primer can still anneal to it normally.
c. Design of the anchored multiplex amplification primer R pool carrying a partial Illumina adapter sequence: the partial Illumina Nextera adapter sequence (5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3') is added, in the 5'-3' direction, to the 5' end of every primer in the reverse primer R pool. Here 5'-AGATGTGTATAAGAGACAG-3' is the Tn5 transposase binding site sequence of the Illumina Nextera adapter, and 5'-TCGTCGGCAGCGTC-3' is identical to the 3' terminal sequence of the indexed primer (I5-Indexed Primer) used for Illumina complete-library amplification. The 5'-TCGTCGGCAGCGTC-3' portion may be shortened or lengthened as appropriate, provided that the 3' terminal sequence (downstream of I5) of the I5-Indexed Primer can still anneal to it normally.
d. The synthesized anchored multiplex amplification primer F pools and R pools are mixed to form anchored multiplex amplification primer set 1 (Anchored Primer Pool 1) and anchored multiplex amplification primer set 2 (Anchored Primer Pool 2); the primer mixing schemes are shown in Tables 1 and 2. In practical applications, the amplification products of anchored multiplex amplification primer set 1 and anchored multiplex amplification primer set 2 must be mixed in equimolar amounts to cover the entire viral genome.
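The anchoring described in items b and c amounts to prepending a fixed partial adapter sequence to the 5' end of each virus-specific primer. The following is a minimal Python sketch of that step, not the inventors' tooling: the two adapter sequences are the ones quoted above, while the function name `anchor_primer` and the 20-nt example primer are hypothetical and are not taken from the primer tables.

```python
# Partial Illumina Nextera adapter sequences quoted in the text (5'->3').
TN5_SITE = "AGATGTGTATAAGAGACAG"          # Tn5 transposase binding site
F_ANCHOR = "GTCTCGTGGGCTCGG" + TN5_SITE   # added to forward (F pool) primers
R_ANCHOR = "TCGTCGGCAGCGTC" + TN5_SITE    # added to reverse (R pool) primers


def anchor_primer(specific_seq: str, pool: str) -> str:
    """Prepend the partial adapter to the 5' end of a virus-specific primer.

    `pool` is "F" for the forward pool or "R" for the reverse pool.
    (Hypothetical helper; the name is an assumption for illustration.)
    """
    seq = specific_seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    if pool == "F":
        return F_ANCHOR + seq
    if pool == "R":
        return R_ANCHOR + seq
    raise ValueError("pool must be 'F' or 'R'")


# Hypothetical 20-nt primer, not one of the patent's primers.
anchored = anchor_primer("ACGTACGTACGTACGTACGT", "F")
print(anchored, len(anchored))
```

Keeping the Tn5 site as a separate constant makes it easy to shorten or lengthen only the index-primer-matching portion, which the text allows.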
The library construction method provided by the invention comprises the following steps: step 1) reverse transcribing viral single-stranded RNA into first-strand cDNA, or reverse transcribing viral single-stranded RNA and synthesizing double-stranded cDNA, wherein the double-stranded cDNA synthesis reagent is preferably the EpiNext Hi-Fi cDNA kit (Epigentek) and the first-strand cDNA synthesis reagent is preferably the TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the reverse transcription primer is either a 6- to 10-base random primer or a specific reverse transcription primer set formed by mixing several primers from the non-anchored reverse primer pool R; step 2) performing PCR reactions separately with anchored multiplex amplification primer set 1 and anchored multiplex amplification primer set 2 to enrich novel coronavirus cDNA in a targeted manner; step 3) purifying the PCR products of step 2) and mixing them in equimolar amounts; step 4) performing PCR library amplification on the target-enriched cDNA to obtain a DNA library that can be sequenced on an Illumina sequencing platform; step 5) purifying the library.
The PCR reaction system of step 2) comprises: 5-10 µL of cDNA template, 5 µL of anchored multiplex amplification primer set 1 or anchored multiplex amplification primer set 2, 15 µL of DNA polymerase with 2× buffer system (calculated as half of the total reaction volume), and 0-5 µL of double-distilled water; the DNA polymerase with 2× buffer system is preferably KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program of step 2) comprises: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, with the number of cycles set to 10-25 according to the viral copy number; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The purification of the first-round anchored PCR amplification products in step 3) comprises: step a, adding DNA purification beads at a volume equal to that of the amplification product (30 µL), preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 µL of EB buffer (Qiagen, Cat No. 19086). The equimolar mixing of the two groups of PCR products in step 3) comprises: step a, measuring the mass concentration of each of the two purified anchored multiplex amplification products; step b, detecting each of the two purified anchored multiplex amplification products; step c, calculating the molar concentration of each of the two products; step d, mixing them in equimolar amounts.
The reaction system for the PCR library amplification of step 4) comprises: 20 µL of the mixed anchored multiplex amplification PCR products, 5 µL of Illumina library amplification (Indexed PCR) primer pair, and 25 µL of DNA polymerase with 2× buffer system, preferably KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program of step 4) comprises: step a, initial denaturation at 98 °C for 45 s; step b, cyclic amplification: denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 10 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The library purification of step 5) comprises: step a, adding DNA purification beads at 0.6-1 volume (30-50 µL) of the amplification product, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol.
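The equimolar mixing in step 3) can be illustrated with a short calculation. The sketch below assumes that the mean fragment length of each product is known (for example from a fragment analyzer trace) and uses the common approximation of 660 g/mol per base pair of double-stranded DNA; neither the function names nor the example concentrations come from the text.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def molar_conc_nM(mass_ng_per_ul: float, mean_len_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to molar concentration (nM)."""
    if mass_ng_per_ul < 0 or mean_len_bp <= 0:
        raise ValueError("concentration must be >= 0 and length > 0")
    return mass_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_len_bp)


def equimolar_volumes(c1_nM: float, c2_nM: float, total_ul: float):
    """Volumes (v1, v2) with v1*c1 == v2*c2 and v1 + v2 == total_ul."""
    if c1_nM <= 0 or c2_nM <= 0:
        raise ValueError("both concentrations must be positive")
    v1 = total_ul * c2_nM / (c1_nM + c2_nM)
    return v1, total_ul - v1


# Hypothetical measurements for the pool 1 and pool 2 products.
c1 = molar_conc_nM(12.0, 300)   # ng/µL, bp
c2 = molar_conc_nM(8.0, 300)
v1, v2 = equimolar_volumes(c1, c2, total_ul=20.0)
print(round(c1, 2), round(c2, 2), round(v1, 2), round(v2, 2))
```

The more concentrated product contributes the smaller volume, so both pools supply the same number of molecules to the 20 µL input of the library amplification.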
In some specific embodiments, the viral RNA sample is extracted from virus isolated from alveolar lavage fluid.
In some specific embodiments, the viral RNA sample is extracted from virus collected with a pharyngeal swab.
In some specific embodiments, the viral RNA sample is a high-copy-number viral extract isolated from the supernatant of an in vitro virus-infected cell culture.
In some embodiments, the reverse transcription primer is a 6-base random primer.
In some embodiments, the reverse transcription primer is a specific reverse transcription primer set formed by mixing several primers from the non-anchored reverse primer pool R.
The method for analyzing the off-machine sequencing data provided by the invention comprises the following steps. Step 1: an index dataset for the reference genome (MT019531.1) is built with BWA, and the fai file is generated with samtools faidx. Basic statistics of the MT019531.1 genome: total length 29899 bp, GC content 37.98%. Step 2: read quality control. Paired-end reads are filtered and quality-checked with SOAPnuke to obtain clean reads (the reads retained after filtering). Reads meeting any of the following conditions are removed: 1) reads contaminated with adapter sequence; 2) reads in which N bases exceed 10%; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases in the read. Step 3: alignment and sorting. Clean reads are aligned to the COVID-19 reference genome (MT019531.1) using BWA together with samtools to generate BAM files, with alignment parameters "-t 32 -M". Sorting is performed with SortSam. The sorted BAM files are indexed with the samtools index tool. The resulting BAM files are quality-checked with tools such as Qualimap. Step 4: variant detection. SNP and InDel variants of the virus are detected using samtools pileup and VarScan. The SNP detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1". Step 5: the detected SNPs are annotated with ANNOVAR based on the GFF file of the MT019531.1 reference genome.
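Steps 1, 3 and 4 can be laid out as a sequence of command lines. The sketch below only assembles the command strings and does not run anything. It rests on several assumptions: "-t 32 -M" is taken to belong to `bwa mem`, "SortSam" is taken to be the Picard tool, `samtools mpileup` stands in for the older `samtools pileup` named in the text, and all file names and jar paths are hypothetical.

```python
import shlex

REF = "MT019531.1.fasta"   # hypothetical reference FASTA file name
PICARD = "picard.jar"      # hypothetical jar locations
VARSCAN = "VarScan.jar"

# Variant-calling options as given in the text for both SNPs and InDels.
VARSCAN_OPTS = (
    "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 "
    "--p-value 1.0 --strand-filter 0 --variants --output-vcf 1"
)


def build_commands(sample: str, r1: str, r2: str) -> list:
    """Return shell command strings for indexing, alignment and variant calling."""
    q = shlex.quote
    sam, bam, pile = f"{sample}.sam", f"{sample}.sorted.bam", f"{sample}.mpileup"
    return [
        f"bwa index {q(REF)}",
        f"samtools faidx {q(REF)}",
        f"bwa mem -t 32 -M {q(REF)} {q(r1)} {q(r2)} > {q(sam)}",
        f"java -jar {q(PICARD)} SortSam I={q(sam)} O={q(bam)} SORT_ORDER=coordinate",
        f"samtools index {q(bam)}",
        f"samtools mpileup -f {q(REF)} {q(bam)} > {q(pile)}",
        f"java -jar {q(VARSCAN)} mpileup2snp {q(pile)} {VARSCAN_OPTS} > {q(sample)}.snp.vcf",
        f"java -jar {q(VARSCAN)} mpileup2indel {q(pile)} {VARSCAN_OPTS} > {q(sample)}.indel.vcf",
    ]


for cmd in build_commands("46d1-1", "46d1-1_R1.fq.gz", "46d1-1_R2.fq.gz"):
    print(cmd)
```

Read filtering with SOAPnuke, Qualimap quality control and ANNOVAR annotation are left out because the text does not give their parameters.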
The invention also provides a kit for constructing a human novel coronavirus (COVID-19) whole genome high-throughput sequencing library, the kit comprising the following components: a specific reverse transcription primer set formed by mixing several primers from the non-anchored reverse primer pool, anchored multiplex amplification primer set 1, and anchored multiplex amplification primer set 2; sequencing library amplification primers; and the various reagents used in library construction. The kit further contains instructions describing the method and the precautions for safe use.
The following examples illustrate the invention and are not intended to limit its scope. Unless otherwise indicated, the examples follow conventional experimental conditions, such as those in Molecular Cloning: A Laboratory Manual (Sambrook J & Russell DW, Molecular Cloning: A Laboratory Manual, 2001), or the manufacturer's instructions.
Example 1
The viral RNA used in this example was extracted by the magnetic bead method from the alveolar lavage fluid of patients with novel coronavirus pneumonia, two cases in total; RNA extraction and quality inspection were performed by the Biosafety Level 3 (P3) laboratory of the Institute of Pathogen Biology, Chinese Academy of Medical Sciences / Peking Union Medical College.
The method provided in this example can be used to identify the virus type in alveolar lavage fluid, or to detect viral genome mutations, from patients diagnosed with novel coronavirus pneumonia; the viral concentration (copies/µL) was determined by absolute quantitative qRT-PCR based on the N gene copy number, using the novel coronavirus nucleic acid standard (high concentration) GBW(E)091089 (National Institute of Metrology, China) (Table 3).
Table 3. Viral copy number and clinical information for the alveolar lavage fluid RNA samples
[Table 3 image: BDA0002427599160000141]
The specific experimental method is as follows:
The viral single-stranded RNA extracted from alveolar lavage fluid was reverse transcribed into first-strand cDNA (1st cDNA) using a 6-base random primer; the 1st cDNA synthesis kit selected was: TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the 1st cDNA was purified for subsequent amplification.
A PCR reaction was performed using anchored multiplex amplification primer set 1; the PCR reaction system comprised: 5 µL of the purified cDNA template, 5 µL of anchored multiplex amplification primer set 1, 15 µL of DNA polymerase with 2× buffer system, and 5 µL of double-distilled water; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 15 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
A PCR reaction was performed using anchored multiplex amplification primer set 2; the PCR reaction system comprised: 5 µL of the purified cDNA template from the same sample, 5 µL of anchored multiplex amplification primer set 2, 15 µL of DNA polymerase with 2× buffer system, and 5 µL of double-distilled water; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 15 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
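The first-round reaction volumes follow a simple balance: the 2× master mix is half of the reaction, the primer pool is fixed at 5 µL, and water makes up whatever the template does not fill. The sketch below assumes a 30 µL total reaction, inferred from the 15 µL of 2× mix; the function name and defaults are illustrative only.

```python
def water_volume_ul(template_ul: float, primer_ul: float = 5.0,
                    total_ul: float = 30.0) -> float:
    """Volume of double-distilled water needed to complete the reaction."""
    mix_ul = total_ul / 2  # a 2x master mix occupies half the reaction volume
    water = total_ul - mix_ul - primer_ul - template_ul
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


print(water_volume_ul(5.0))    # 5 µL template, as in this example
print(water_volume_ul(10.0))   # 10 µL template leaves no room for water
```

This reproduces the 0-5 µL water range given for the 5-10 µL template range in the general method.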
The first-round anchored PCR amplification products were purified separately; the procedure comprised: step a, adding 1 volume (30 µL) of DNA purification beads relative to the amplification product, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 µL of EB buffer (Qiagen, Cat No. 19086). The two groups of PCR products were then mixed in equimolar amounts.
The mixed first-round PCR products were amplified with Illumina indexed library amplification primers; the reaction system comprised: 20 µL of the mixed first-round PCR products, 5 µL of Illumina library amplification primer pair, and 25 µL of DNA polymerase with 2× buffer system, preferably KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program of step 4) comprised: step a, initial denaturation at 98 °C for 45 s; step b, cyclic amplification: denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 10 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The second-round Illumina library amplification products were purified as follows: step a, adding 1 volume (50 µL) of DNA purification beads relative to the amplification product, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 µL of EB buffer (Qiagen, Cat No. 19086).
High-throughput sequencing: the library purified in the previous step was sequenced following the Illumina NovaSeq operating procedure; the sequencing data volume was set to 1 G.
Off-machine data analysis:
Step 1: an index dataset for the reference genome (MT019531.1) was built with BWA, and the fai file was generated with samtools faidx. Basic statistics of the MT019531.1 genome: total length 29899 bp, GC content 37.98%;
Step 2: read quality control. Paired-end reads were filtered and quality-checked with SOAPnuke to obtain clean reads (the reads retained after filtering). Reads meeting any of the following conditions were removed: 1) reads contaminated with adapter sequence; 2) reads in which N bases exceed 10%; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases in the read;
Step 3: alignment and sorting. Clean reads were aligned to the COVID-19 reference genome (MT019531.1) using BWA together with samtools to generate BAM files, with alignment parameters "-t 32 -M". Sorting was performed with SortSam. The sorted BAM files were indexed with the samtools index tool. The resulting BAM files were quality-checked with tools such as Qualimap;
Step 4: variant detection. SNP and InDel variants of the virus were detected using samtools pileup and VarScan. The SNP detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
Step 5: the detected SNPs were annotated with ANNOVAR based on the GFF file of the MT019531.1 reference genome.
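The three read-removal conditions of Step 2 can be expressed as a single predicate. This is a simplified stand-in for what SOAPnuke does, not its implementation. Quality strings are assumed to be Phred+33 encoded, the adapter check is an exact substring match, and the low-quality cutoff is left as a required parameter because the text does not say how its "Q < 38" value is encoded; the value 20 used below is arbitrary.

```python
def keep_read(seq: str, qual: str, adapter: str, low_q: int,
              phred_offset: int = 33) -> bool:
    """Return True if a read passes the three filters described in Step 2."""
    if not seq or len(seq) != len(qual):
        return False
    # 1) adapter contamination (exact match only, for illustration)
    if adapter and adapter in seq:
        return False
    # 2) more than 10% N bases
    if seq.count("N") > 0.10 * len(seq):
        return False
    # 3) more than 50% of bases below the quality cutoff
    low = sum(1 for ch in qual if ord(ch) - phred_offset < low_q)
    if low > 0.50 * len(qual):
        return False
    return True


ADAPTER = "AGATGTGTATAAGAGACAG"  # Tn5 site from the primer design section
print(keep_read("ACGTACGTAC", "IIIIIIIIII", ADAPTER, low_q=20))
print(keep_read("ACGTNNACGT", "IIIIIIIIII", ADAPTER, low_q=20))
```

For paired-end data a pair would normally be discarded when either mate fails, which can be done by calling the predicate on both mates.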
Analysis of results:
FIG. 2 shows the library construction results of this example. For the libraries built from viral RNA isolated from alveolar lavage fluid, the 46d1-1 and 50d1-1 sequencing libraries are both bimodal: the main peak lies at about 380-400 bp (80%), consistent with the designed average amplicon size and the expected size of the complete library, and a minor peak lies at 800-1000 bp, presumably because: 1) a small amount of genomic by-product is generated under the influence of the random primers; 2) potential primer dimers are over-amplified by the library amplification primers; or 3) residual anchored multiplex amplification primer set 1 and anchored multiplex amplification primer set 2 remaining after the first round of purification are amplified; the minor peak accounts for about 20%. No library was obtained from the blank control (NC), as expected (results not shown). The sequencing output was 0.75 G (46d1-1) and 1.1 G (50d1-1), and the raw-data Q30 values were 90.38% and 80.72%, respectively (Table 4).
Table 4. Quality control of the sequencing output data for the alveolar lavage fluid sample libraries
[Table 4 image: BDA0002427599160000161]
After filtering, the sequencing data were aligned to the viral reference genome MT019531.1 (Accession No: MT019531 GWHABKH 00000000); the number of aligned bases, alignment rate, mismatch rate, average depth, coverage ratio, etc. are shown in Table 5. For the alveolar lavage fluid samples, the alignment rate was above 92% and the mismatch rate below 0.2%. In both libraries, the proportion of the novel coronavirus N and S accessory genes covered at a sequencing depth of 100× was 100%, so the virus can be identified as the novel coronavirus COVID-19; the proportion of the viral genome covered at 100× depth reached 97.38% and 98.24%, respectively (Table 5).
Table 5. Analysis and statistics of the sequencing output data for the alveolar lavage fluid sample libraries
[Table 5 image: BDA0002427599160000171]
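The "100× coverage ratio" reported above is the fraction of reference positions whose sequencing depth is at least 100. The sketch below shows that calculation under the assumption that per-base depths are available as a list with one value per reference position (for instance parsed from the output of `samtools depth -a`); the depth values shown are invented for illustration and are not data from this example.

```python
def coverage_ratio(depths, min_depth: int = 100) -> float:
    """Fraction of positions with depth >= min_depth."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Invented per-base depths for an 8-position toy region.
toy_depths = [0, 35, 99, 100, 250, 1800, 4000, 120]
print(f"{coverage_ratio(toy_depths):.2%} of positions at >= 100x")
```

The same function applied to the positions of a single gene gives the per-gene ratio quoted for the N and S genes.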
Of the two libraries, 46d1-1 had 9 single nucleotide polymorphism (SNP) sites and no indel mutations; the SNP sites were: MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 3706 (orf1ab: A3441G), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 5812 (orf1ab: C5547T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 7010 (orf1ab: G6755A), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 18640 (orf1ab: A18375G).
50d1-1 had 8 single nucleotide polymorphism (SNP) sites and no indel mutations; the SNP sites were: MT019531.1 genomic position 1880 (orf1ab: G1615A), MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 7010 (orf1ab: G6755A), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 28620 (N: G346A).
Example 2
The viral RNA samples 46d1 and 50d1 used in this example are the same as those used in example 1.
The specific experimental method is as follows:
The viral single-stranded RNA extracted from alveolar lavage fluid was reverse transcribed into first-strand cDNA (1st cDNA) using a mixture of 34 gene-specific reverse primers (COV-1-R, COV-8-R, COV-12-R, COV-20-R, COV-30-R, COV-38-R, COV-47-R, COV-54-R, COV-62-R, COV-71-R, COV-80-R, COV-86-R, COV-94-R, COV-102-R, COV-111-R, COV-119-R, COV-125-R, COV-132-R, COV-141-R, COV-146-R, COV-155-R, COV-162-R, COV-172-R, COV-179-R, COV-187-R, COV-195-R, COV-202-R, COV-210-R, COV-220-R, COV-228-R, COV-233-R, COV-239-R, COV-247-R, COV-252-R; listed in the 3'-5' direction of the genome); the 1st cDNA synthesis kit selected was: TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the 1st cDNA was purified for subsequent amplification.
A PCR reaction was performed using anchored multiplex amplification primer set 1; the PCR reaction system comprised: 5 µL of the purified cDNA template, 5 µL of anchored multiplex amplification primer set 1, 15 µL of DNA polymerase with 2× buffer system, and 5 µL of double-distilled water; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 15 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
A PCR reaction was performed using anchored multiplex amplification primer set 2; the PCR reaction system comprised: 5 µL of the purified cDNA template from the same sample, 5 µL of anchored multiplex amplification primer set 2, 15 µL of DNA polymerase with 2× buffer system, and 5 µL of double-distilled water; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 15 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The first-round anchored PCR amplification products were purified separately; the procedure comprised: step a, adding 1 volume (30 µL) of DNA purification beads relative to the amplification product, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 µL of EB buffer (Qiagen, Cat No. 19086). The two groups of PCR products were then mixed in equimolar amounts.
The mixed first-round PCR products were amplified with Illumina indexed library amplification primers; the reaction system comprised: 20 µL of the mixed first-round PCR products, 5 µL of Illumina library amplification primer pair, and 25 µL of DNA polymerase with 2× buffer system, preferably KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program of step 4) comprised: step a, initial denaturation at 98 °C for 45 s; step b, cyclic amplification: denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 10 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The second-round Illumina library amplification products were purified as follows: step a, adding 1 volume (50 µL) of DNA purification beads relative to the amplification product, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400); step b, incubating at room temperature for 5 min; step c, placing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 µL of EB buffer (Qiagen, Cat No. 19086).
High-throughput sequencing: the library purified in the previous step was sequenced following the Illumina NovaSeq operating procedure; the sequencing data volume was set to 1 G.
Off-machine data analysis:
Step 1: an index dataset for the reference genome (MT019531.1) was built with BWA, and the fai file was generated with samtools faidx. Basic statistics of the MT019531.1 genome: total length 29899 bp, GC content 37.98%;
Step 2: read quality control. Paired-end reads were filtered and quality-checked with SOAPnuke to obtain clean reads (the reads retained after filtering). Reads meeting any of the following conditions were removed: 1) reads contaminated with adapter sequence; 2) reads in which N bases exceed 10%; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases in the read;
Step 3: alignment and sorting. Clean reads were aligned to the COVID-19 reference genome (MT019531.1) using BWA together with samtools to generate BAM files, with alignment parameters "-t 32 -M". Sorting was performed with SortSam. The sorted BAM files were indexed with the samtools index tool. The resulting BAM files were quality-checked with tools such as Qualimap;
Step 4: variant detection. SNP and InDel variants of the virus were detected using samtools pileup and VarScan. The SNP detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
Step 5: the detected SNPs were annotated with ANNOVAR based on the GFF file of the MT019531.1 reference genome;
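The numeric thresholds in Step 4 translate into a simple per-site decision: at least 8 reads covering the position, at least 4 reads supporting the variant, and a variant allele frequency of at least 0.1. The sketch below applies only these three cutoffs. It is a simplification rather than VarScan's actual algorithm, which also applies base-quality filtering and a p-value test; those are set to permissive values in the parameters above. The function name is hypothetical.

```python
def passes_thresholds(depth: int, var_reads: int,
                      min_coverage: int = 8, min_reads2: int = 4,
                      min_var_freq: float = 0.1) -> bool:
    """Apply the coverage, supporting-read and frequency cutoffs to one site."""
    if depth < min_coverage or var_reads < min_reads2:
        return False
    return var_reads / depth >= min_var_freq


# (depth, variant-supporting reads) for a few hypothetical sites.
for depth, var_reads in [(100, 10), (100, 9), (7, 7), (2000, 3)]:
    print(depth, var_reads, passes_thresholds(depth, var_reads))
```

With a 10% frequency floor, low-frequency intra-host variants are reported alongside fixed differences, so downstream interpretation should look at the reported frequency of each call.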
analysis of results:
FIG. 3 shows the library construction results of this example. With reverse transcription using the mixed specific primers followed by library construction, the 46d1-2 and 50d1-2 sequencing libraries are unimodal, with an average library fragment size of about 380-420 bp, consistent with the expected average amplicon size and complete library size; no library was obtained from the blank control (NC), as expected (results not shown). The sequencing output was 0.9 G (46d1-2) and 1.0 G (50d1-2), and the raw-data Q30 values were 94.15% and 92.46%, respectively (Table 6).
Table 6. Quality control of the sequencing output data for the alveolar lavage fluid sample libraries
[Table 6 image: BDA0002427599160000191]
After filtering, the sequencing data were aligned to the viral reference genome MT019531.1 (Accession No: MT019531 GWHABKH 00000000); the number of aligned bases, alignment rate, mismatch rate, average depth, coverage ratio, etc. are shown in Table 7. For the alveolar lavage fluid samples, the alignment rate was above 97% and the mismatch rate below 0.1%. In both libraries, the proportion of the novel coronavirus N and S accessory genes covered at a sequencing depth of 100× was 100%, so the virus can be identified as the novel coronavirus COVID-19; the proportion of the viral genome covered at 100× depth reached 99.08% and 99.24%, respectively (Table 7).
Table 7. Analysis and statistics of the sequencing output data for the alveolar lavage fluid sample libraries
[Table 7 images: BDA0002427599160000192, BDA0002427599160000201]
Of the two samples, 46d1-2 had 9 single nucleotide polymorphism (SNP) sites and no indel mutations; the SNP sites were: MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 3706 (orf1ab: A3441G), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 5812 (orf1ab: C5547T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 7010 (orf1ab: G6755A), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 18640 (orf1ab: A18375G).
50d1-2 had 8 single nucleotide polymorphism (SNP) sites and no indel mutations; the SNP sites were: MT019531.1 genomic position 1880 (orf1ab: G1615A), MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 7010 (orf1ab: G6755A), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 28620 (N: G346A). Taken together, the results of Example 1 and Example 2 show that both reverse transcription methods identify the same mutation sites in the same sample, and that the library obtained by reverse transcription with specific primers has a clear advantage in data utilization, as reflected in the alignment rate, average sequencing depth, and 100× sequencing depth coverage ratio.
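The concordance claim can be checked directly from the genomic positions listed in Examples 1 and 2. The sketch below is a small bookkeeping aid, not part of the described method: it stores the reported positions per library and compares them as sets. The labels with the suffix -1 denote the random-primer libraries of Example 1 and -2 the specific-primer libraries of Example 2, following the library names used in the results.

```python
# Genomic positions (MT019531.1) of the SNPs reported for each library.
SNPS_46D1 = {3127, 3706, 5369, 5812, 6996, 7010, 18395, 18557, 18640}
SNPS_50D1 = {1880, 3127, 5369, 6996, 7010, 18395, 18557, 28620}

calls = {
    "46d1-1": set(SNPS_46D1), "46d1-2": set(SNPS_46D1),
    "50d1-1": set(SNPS_50D1), "50d1-2": set(SNPS_50D1),
}


def concordant(lib_a: str, lib_b: str) -> bool:
    """True if two libraries report exactly the same SNP positions."""
    return calls[lib_a] == calls[lib_b]


shared = calls["46d1-2"] & calls["50d1-2"]          # present in both samples
only_46 = calls["46d1-2"] - calls["50d1-2"]
only_50 = calls["50d1-2"] - calls["46d1-2"]

print(concordant("46d1-1", "46d1-2"), concordant("50d1-1", "50d1-2"))
print(sorted(shared), sorted(only_46), sorted(only_50))
```

Six positions are shared by the two patient samples, while three are specific to 46d1 and two to 50d1.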
Example 3
The viral RNA used in this example was extracted by the magnetic bead method from a throat swab sample of a patient with novel coronavirus pneumonia; RNA extraction and quality inspection were performed by the Biosafety Level 3 (P3) laboratory of the Institute of Pathogen Biology, Chinese Academy of Medical Sciences / Peking Union Medical College.
The method provided in this example can be used to identify the virus type in throat swab samples, or to detect viral genome mutations, from patients with confirmed or suspected novel coronavirus pneumonia; the viral concentration (copies/µL) was determined by absolute quantitative qRT-PCR based on the N gene and E gene copy numbers, using the novel coronavirus nucleic acid standard (low concentration) GBW(E)091090 (National Institute of Metrology, China) (Table 8).
Table 8. Viral copy number and clinical information for the pharyngeal swab RNA sample
[Table 8 image: BDA0002427599160000202]
The specific experimental method is as follows:
Oral epithelial cells of the patient with novel coronavirus pneumonia were collected with a throat swab, and the viral single-stranded RNA extracted by the magnetic bead method was reverse transcribed into first-strand cDNA (1st cDNA) using a 6-base random primer; the 1st cDNA synthesis kit selected was: TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the 1st cDNA was purified for subsequent amplification;
A PCR reaction was performed using anchored multiplex amplification primer set 1; the PCR reaction system comprised: 10 µL of the purified cDNA template, 5 µL of anchored multiplex amplification primer set 1, and 15 µL of DNA polymerase with 2× buffer system; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 25 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
A PCR reaction was performed using anchored multiplex amplification primer set 2; the PCR reaction system comprised: 10 µL of the purified cDNA template from the same sample, 5 µL of anchored multiplex amplification primer set 2, and 15 µL of DNA polymerase with 2× buffer system; the DNA polymerase with 2× buffer system selected was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification program comprised: step a, initial denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s, for 25 cycles; step c, final extension at 72 °C for 60 s; hold at 4 °C.
The first round anchored PCR amplification products were purified separately, and the procedure included: step a, adding 1 volume (30. Mu.L) of DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman Cat No. 14403400) of the amplified product; step b, incubating for 5min at room temperature; step c, placing the steel plate in a magnetic rack for 10min; step d, preparing 80% of fresh ethanol to wash the magnetic beads twice; step e, use 30. Mu.L EB buffer (Qiagen Cat No. 19086) to dissolve back; mixing the two sets of PCR products in equimolar amounts;
The first round PCR product of the Illumina library tagged amplification primer amplification mix is 20. Mu.L, the Illumina library amplification primer pair is 5. Mu.L, the DNA polymerase and 2 Xbuffer system is 25. Mu.L total, preferably, the DNA polymerase and 2 Xbuffer system is KAPA HiFi HotStart ReadyMix (Roche, cat No. KK 2602); the PCR amplification procedure of step 4) includes: step a, pre-denaturation: denaturation at 98℃for 45s; step b, cyclic amplification: 15s of denaturation at 98 ℃, 30s of annealing at 60 ℃ and 30s of extension at 72 ℃ with 10 cycles; step c, total extension: extending at 72 ℃ for 60s; preserving at 4 ℃.
The second round of Illumina library amplification product purification procedure included: step a, adding 0.8 volumes (40. Mu.L) of DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman Cat No. 14403400) of amplified product; step b, incubating for 5min at room temperature; step c, placing the steel plate in a magnetic rack for 10min; step d, preparing 80% of fresh ethanol to wash the magnetic beads twice; step e, use 30. Mu.L EB buffer (Qiagen Cat No. 19086) for the solubilization.
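The bead volumes quoted for the two purification steps follow from the reaction volumes and the stated bead ratios. The short Python sketch below reproduces that arithmetic; the function name `bead_volume` is hypothetical, and the component volumes are simply those listed in the reaction systems of this example.

```python
def bead_volume(component_volumes_ul, bead_ratio):
    """Return (reaction volume, bead volume) in microlitres.

    component_volumes_ul: volumes of the reaction components, in microlitres
    bead_ratio: bead-to-product volume ratio (1.0 means "1 volume")
    """
    reaction_ul = sum(component_volumes_ul)
    return reaction_ul, reaction_ul * bead_ratio


# First-round anchored PCR: cDNA template, primer set, polymerase/buffer mix
first_round = bead_volume([10, 5, 15], 1.0)

# Second-round library PCR: pooled first-round product, primer pair,
# polymerase/buffer mix
second_round = bead_volume([20, 5, 25], 0.8)

print("first round (reaction, beads):", first_round)
print("second round (reaction, beads):", second_round)
```

Read this way, a 30 μL first-round reaction takes 30 μL of beads and a 50 μL second-round reaction takes 40 μL, which matches the volumes given in the purification steps.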
High-throughput sequencing: the library purified in the previous step was sequenced according to the operating procedure of the Illumina NovaSeq; the sequencing data volume was set to 1 G, and in this example the actual molar amount of library loaded was adjusted because of the library peak profile.
Off-machine data analysis:
Step 1: the index dataset of the reference genome (MT019531.1) was built using BWA software, and the .fai file was generated using samtools faidx. Basic information of the MT019531.1 genome: total length 29899 bp, GC content 37.98%;
Step 2: quality control of reads. Paired-end reads were filtered and quality-controlled using SOAPnuke to obtain clean reads (reads remaining after filtering). Reads meeting any of the following conditions were removed: 1) reads contaminated with adapter sequence; 2) reads with more than 10% N bases; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases of the read;
Step 3: data alignment and sorting. BWA combined with samtools was used to align the clean reads to the COVID-2019 reference genome (MT019531.1) and generate BAM files, with alignment parameters "-t 32 -M". Sorting was performed using SortSam. The sorted BAM files were indexed using the index tool of samtools. The generated BAM files were quality-controlled using tools such as Qualimap;
Step 4: mutation detection. SNP and InDel variants of the virus were detected using samtools pileup and VarScan. The SNP detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
Step 5: the detected SNPs were annotated using ANNOVAR software based on the GFF file of the MT019531.1 reference genome.
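As an illustration of steps 1, 3 and 4, the Python sketch below only assembles command lines; it does not run them. Several details are assumptions rather than statements of the patent: the file names are hypothetical, the `bwa mem` subcommand is assumed because the text gives only the parameters "-t 32 -M", and the VarScan option spellings are the standard ones that the parameter string in step 4 appears to correspond to. The VarScan subcommand is left out because the text does not name it.

```python
import shlex

# Hypothetical file name for the MT019531.1 reference FASTA
REFERENCE = "MT019531.1.fasta"


def index_commands(reference=REFERENCE):
    """Step 1: build the BWA index and the .fai index of the reference."""
    return [
        ["bwa", "index", reference],
        ["samtools", "faidx", reference],
    ]


def align_command(read1, read2, reference=REFERENCE, threads=32):
    """Step 3: align paired-end clean reads.

    Assumption: the 'mem' subcommand; the text states only "-t 32 -M".
    """
    return ["bwa", "mem", "-t", str(threads), "-M", reference, read1, read2]


def varscan_options():
    """Step 4: variant-calling thresholds, identical for SNPs and InDels."""
    return [
        "--min-coverage", "8",
        "--min-reads2", "4",
        "--min-var-freq", "0.1",
        "--min-avg-qual", "0",
        "--p-value", "1.0",
        "--strand-filter", "0",
        "--variants",
        "--output-vcf", "1",
    ]


if __name__ == "__main__":
    # Hypothetical clean-read file names
    commands = index_commands() + [align_command("clean_1.fq.gz", "clean_2.fq.gz")]
    for cmd in commands:
        print(shlex.join(cmd))
    print("VarScan options:", shlex.join(varscan_options()))
```

Keeping the commands as argument lists makes it straightforward to pass them to a process runner later without shell quoting problems.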
Analysis of results:
FIG. 4 shows the library construction results obtained with throat swab viral RNA of low relative viral copy number. Both libraries were multimodal, with a main peak at about 180 bp (about 80% of the library). This peak is suspected to result from the low viral copy number of the samples and the low reverse transcription efficiency, which allowed the anchor primers to form dimers that were then over-amplified by the library amplification primers. The expected main peak appeared as two minor peaks at 380-440 bp, accounting for about 20%; the actual molar amount of library loaded was therefore increased 4-fold for sequencing. The blank NC did not yield a library, as expected (results not shown). The off-machine data volumes were 1.2 G (48d5-1) and 1.3 G (47d1-1), and the raw data Q30 values were 85.28% and 79.77%, respectively (Table 9).
Table 9 Quality control of off-machine data for the throat swab sample libraries
Figure BDA0002427599160000221
The aligned reads, aligned bases, alignment rate, mismatch rate and related statistics obtained by aligning the filtered off-machine data to the virus reference genome MT019531.1 (Accession No: MT019531 GWHABKH00000000) are shown in Table 10. The alignment rate of the off-machine data of the throat swab samples was above 93%, and the mismatch rate was below 0.03%; the 100× sequencing-depth coverage ratio of the novel coronavirus accessory genes N and S was 100% in both libraries, so the virus can be determined to be the novel coronavirus COVID-19; the 100× sequencing-depth coverage ratio of the viral genome reached 98.12% and 96.73%, respectively (Table 10).
Table 10 Analysis statistics of off-machine data for the throat swab sample libraries
Figure BDA0002427599160000222
Figure BDA0002427599160000231
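The "100× sequencing-depth coverage ratio" reported for these libraries is the fraction of reference positions whose depth is at least 100. A minimal Python sketch of that calculation follows; the function name `coverage_ratio` and the toy depth values are hypothetical, and in practice the per-position depths would come from the sorted BAM file (for example from a per-base depth report).

```python
def coverage_ratio(depths, min_depth=100):
    """Fraction of positions whose sequencing depth is >= min_depth.

    depths: per-position depths over the region of interest
            (whole genome, or a single gene such as N or S)
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy region of four positions, two of which reach 100x
toy_depths = [0, 50, 100, 250]
print(f"100x coverage ratio: {coverage_ratio(toy_depths):.2%}")
```

Applying the same function to the positions of one gene gives the per-gene figure, and applying it to all 29899 reference positions gives the genome-wide figure.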
48d5-1 had 6 single nucleotide polymorphism (SNP) sites and 4 deletion mutation sites. SNPs: MT019531.1 genomic position 2132 (orf1ab: A1867G), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 11354 (orf1ab: G11089A), MT019531.1 genomic position 17194 (orf1ab: A16929G), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T). Deletion mutations: MT019531.1 genomic position 9264 (orf1ab: 9000_9005del), MT019531.1 genomic position 9851 (orf1ab: 9587_9596del), MT019531.1 genomic position 20296 (orf1ab: 20032_20035del), MT019531.1 genomic position 29067 (N: 795_808del).
47d1-1 had 3 single nucleotide polymorphism (SNP) sites, and no insertion or deletion mutations were found. SNPs: MT019531.1 genomic position 1578 (orf1ab: T1313A), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 18123 (orf1ab: T17858C).
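Each orf1ab SNP above is given both as a genomic position on MT019531.1 and as a position within orf1ab. For every SNP pair listed, the two numbers differ by 265, which the Python sketch below checks. The constant is inferred here from the listed pairs only; it is not stated in the text and should not be applied to other genes.

```python
# Offset inferred from the SNP pairs listed above (genomic minus orf1ab position)
ORF1AB_OFFSET = 265

# (genomic position on MT019531.1, position within orf1ab) for the listed SNPs
LISTED_PAIRS = [
    (2132, 1867),
    (6996, 6731),
    (11354, 11089),
    (17194, 16929),
    (18395, 18130),
    (18557, 18292),
    (1578, 1313),
    (18123, 17858),
]


def genomic_to_orf1ab(genomic_position):
    """Convert an MT019531.1 genomic SNP position to its orf1ab position."""
    return genomic_position - ORF1AB_OFFSET


all_consistent = all(genomic_to_orf1ab(g) == c for g, c in LISTED_PAIRS)
print("all listed SNP pairs consistent with offset 265:", all_consistent)
```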
Example 4
The viral RNA samples 48d5 and 47d1 used in this example are the same as in example 3.
The specific experimental method is as follows:
Viral single-stranded RNA, extracted by the magnetic bead method from oral epithelial cells collected by throat swab from patients with novel coronavirus pneumonia, was reverse transcribed into first-strand cDNA (1st cDNA) using a mixture of 34 gene-specific reverse primers (genome direction 3'-5'; the same specific reverse transcription primer combination as in Example 2) and a 1st cDNA synthesis kit, the kit being: TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the 1st cDNA was purified for subsequent amplification.
Using anchored multiplex amplification primer set 1, a PCR reaction was performed. The PCR reaction system comprised: 10 μL of purified cDNA template, 5 μL of anchored multiplex primer set 1, and 15 μL of DNA polymerase and 2× buffer system; the DNA polymerase and 2× buffer system was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure included: step a, pre-denaturation: denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 25 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
Using anchored multiplex amplification primer set 2, a PCR reaction was performed. The PCR reaction system comprised: 10 μL of purified cDNA template from the same sample, 5 μL of anchored multiplex primer set 2, and 15 μL of DNA polymerase and 2× buffer system; the DNA polymerase and 2× buffer system was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure included: step a, pre-denaturation: denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 25 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
The first-round anchored PCR amplification products were purified separately. The procedure included: step a, adding DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400), at 1 volume of the amplification product (30 μL); step b, incubating at room temperature for 5 min; step c, standing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 μL of EB buffer (Qiagen, Cat No. 19086); the two sets of PCR products were then mixed in equimolar amounts;
Amplification with the Illumina library tagged amplification primers: the reaction contained 20 μL of the mixed first-round PCR product, 5 μL of the Illumina library amplification primer pair, and 25 μL in total of the DNA polymerase and 2× buffer system; preferably, the DNA polymerase and 2× buffer system is KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure of step 4) included: step a, pre-denaturation: denaturation at 98 °C for 45 s; step b, cyclic amplification: denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 10 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
The purification procedure for the second-round Illumina library amplification product included: step a, adding DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400), at 0.8 volumes of the amplification product (40 μL); step b, incubating at room temperature for 5 min; step c, standing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 μL of EB buffer (Qiagen, Cat No. 19086).
High-throughput sequencing: the library purified in the previous step was sequenced according to the operating procedure of the Illumina NovaSeq; the sequencing data volume was set to 1 G.
Off-machine data analysis:
Step 1: the index dataset of the reference genome (MT019531.1) was built using BWA software, and the .fai file was generated using samtools faidx. Basic information of the MT019531.1 genome: total length 29899 bp, GC content 37.98%;
Step 2: quality control of reads. Paired-end reads were filtered and quality-controlled using SOAPnuke to obtain clean reads (reads remaining after filtering). Reads meeting any of the following conditions were removed: 1) reads contaminated with adapter sequence; 2) reads with more than 10% N bases; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases of the read;
Step 3: data alignment and sorting. BWA combined with samtools was used to align the clean reads to the COVID-2019 reference genome (MT019531.1) and generate BAM files, with alignment parameters "-t 32 -M". Sorting was performed using SortSam. The sorted BAM files were indexed using the index tool of samtools. The generated BAM files were quality-controlled using tools such as Qualimap;
Step 4: mutation detection. SNP and InDel variants of the virus were detected using samtools pileup and VarScan. The SNP detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
Step 5: the detected SNPs were annotated using ANNOVAR software based on the GFF file of the MT019531.1 reference genome.
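The three read-removal conditions of step 2 can be expressed as a single predicate. The Python sketch below is an illustration of the stated criteria only and is not SOAPnuke's implementation; the function name `should_discard`, the adapter string and the example reads are hypothetical, and the quality threshold is taken as printed in step 2.

```python
def should_discard(seq, quals, adapter,
                   max_n_fraction=0.10, min_quality=38,
                   max_low_quality_fraction=0.50):
    """Return True if a read meets any of the three removal conditions.

    seq: read bases
    quals: per-base Phred quality scores, same length as seq
    adapter: adapter sequence to screen for (illustrative exact match)
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return True
    # 1) adapter contamination
    if adapter and adapter in seq:
        return True
    # 2) more than 10% N bases
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return True
    # 3) low-quality bases make up more than 50% of the read
    low_quality = sum(1 for q in quals if q < min_quality)
    return low_quality / len(seq) > max_low_quality_fraction


ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter prefix, an assumption
print(should_discard("ACGTACGTAC", [40] * 10, ADAPTER))
print(should_discard("ACGTNNACGT", [40] * 10, ADAPTER))
```

For paired-end data a filter of this kind is usually applied so that a pair is dropped when either mate fails, but the text does not say how pairs are handled.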
Analysis of results:
FIG. 5 shows the library construction results of this example. Throat swab viral RNA of low relative viral copy number was reverse transcribed using the specific primer combination and then used for library construction; both libraries were unimodal, with the main peak at the expected position of 380-420 bp; the blank NC did not yield a library, as expected. The off-machine data volumes were 1.3 G (48d5-1) and 1.3 G (47d1-1), and the raw data Q30 values were 94.18% and 94.37%, respectively (Table 11).
Table 11 Quality control of off-machine data for the throat swab sample libraries
Figure BDA0002427599160000251
The aligned reads, aligned bases, alignment rate, mismatch rate and related statistics obtained by aligning the filtered off-machine data to the virus reference genome MT019531.1 (Accession No: MT019531 GWHABKH00000000) are shown in Table 12. The alignment rate of the off-machine data of the throat swab samples was above 96%, and the mismatch rate was below 0.03%; the 100× sequencing-depth coverage ratio of the novel coronavirus accessory genes N and S was 100% in both libraries, so the virus can be determined to be the novel coronavirus COVID-19; the 100× sequencing-depth coverage ratio of the viral genome reached 99.08% and 98.85%, respectively (Table 12).
Table 12 Analysis statistics of off-machine data for the throat swab sample libraries
Figure BDA0002427599160000252
48d5-2 had 6 single nucleotide polymorphism (SNP) sites and 4 deletion mutation sites. SNPs: MT019531.1 genomic position 2132 (orf1ab: A1867G), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 11354 (orf1ab: G11089A), MT019531.1 genomic position 17194 (orf1ab: A16929G), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T). Deletion mutations: MT019531.1 genomic position 9264 (orf1ab: 9000_9005del), MT019531.1 genomic position 9851 (orf1ab: 9587_9596del), MT019531.1 genomic position 20296 (orf1ab: 20032_20035del), MT019531.1 genomic position 29067 (N: 795_808del).
47d1-2 had 3 single nucleotide polymorphism (SNP) sites, and no insertion or deletion mutations were found. SNPs: MT019531.1 genomic position 1578 (orf1ab: T1313A), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 18123 (orf1ab: T17858C). Taking the results of Example 3 and Example 4 together, the same mutation sites were identified with both reverse transcription methods, while the libraries obtained by reverse transcription with the specific primers were superior in data utilization, as reflected in the alignment rate, the average sequencing depth and the 100× sequencing-depth coverage ratio.
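The agreement between the two examples can be checked by comparing the reported genomic positions as sets. The Python sketch below does this for the positions listed in Example 3 and Example 4; the dictionary layout and the sample keys `48d5` and `47d1` are illustrative choices, not part of the original analysis.

```python
# Genomic positions (MT019531.1) of the reported SNP and deletion sites
example3 = {
    "48d5": {2132, 6996, 11354, 17194, 18395, 18557, 9264, 9851, 20296, 29067},
    "47d1": {1578, 6996, 18123},
}
example4 = {
    "48d5": {2132, 6996, 11354, 17194, 18395, 18557, 9264, 9851, 20296, 29067},
    "47d1": {1578, 6996, 18123},
}


def discordant_sites(calls_a, calls_b):
    """Per sample, the sites reported by only one of the two methods."""
    return {sample: calls_a[sample] ^ calls_b[sample] for sample in calls_a}


differences = discordant_sites(example3, example4)
concordant = all(not sites for sites in differences.values())
print("same mutation sites in both examples:", concordant)
```

An empty symmetric difference for both samples is what the statement "the same mutation sites were identified" amounts to at the level of positions; it does not compare allele frequencies or depths.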
Example 5
The viruses used in this example were obtained by infecting laboratory cells in vitro with isolated novel coronavirus strains and culturing them; viral supernatants were extracted, 3 cases in total. Virus culture and RNA extraction were carried out with the assistance of the biosafety level 3 (P3) laboratory of the Institute of Pathogen Biology, Chinese Academy of Medical Sciences / Peking Union Medical College.
The method provided in this example can be used to detect viral genome mutations of the novel coronavirus at high copy number, for identifying variation and analyzing the evolution of the virus at high copy number. The viral concentration (copies/μL) was determined by absolute quantitative qRT-PCR of the N gene copy number, using the novel coronavirus nucleic acid standard (high concentration) GBW(E)091089 (National Institute of Metrology, China) (Table 13).
Table 13 RNA viral copy number and clinical information of the cultured viruses
Figure BDA0002427599160000261
The specific experimental method is as follows:
The viral single-stranded RNA extracted after virus culture was reverse transcribed into first-strand cDNA (1st cDNA) using a mixture of 34 gene-specific reverse primers (genome direction 3'-5'; the same specific reverse transcription primer combination as in Example 2) and a 1st cDNA synthesis kit, the kit being: TAKARA PrimeScript 1st strand cDNA Synthesis Kit (TAKARA, Cat No. 6110A); the 1st cDNA was purified and then used in the subsequent amplification reaction.
Using anchored multiplex amplification primer set 1, a PCR reaction was performed. The PCR reaction system comprised: 5 μL of cDNA template, 5 μL of anchored multiplex primer set 1, 15 μL of DNA polymerase and 2× buffer system, and 5 μL of double-distilled water; the DNA polymerase and 2× buffer system was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure included: step a, pre-denaturation: denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 10 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
Using anchored multiplex amplification primer set 2, a PCR reaction was performed. The PCR reaction system comprised: 5 μL of cDNA template from the same sample, 5 μL of anchored multiplex primer set 2, 15 μL of DNA polymerase and 2× buffer system, and 5 μL of double-distilled water; the DNA polymerase and 2× buffer system was: KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure included: step a, pre-denaturation: denaturation at 98 °C for 1 min; step b, cyclic amplification: denaturation at 98 °C for 20 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 10 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
The first-round anchored PCR amplification products were purified separately. The procedure included: step a, adding DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400), at 1 volume of the amplification product (30 μL); step b, incubating at room temperature for 5 min; step c, standing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 μL of EB buffer (Qiagen, Cat No. 19086); the two sets of PCR products were then mixed in equimolar amounts.
Amplification with the Illumina library tagged amplification primers: the reaction contained 20 μL of the mixed first-round PCR product, 5 μL of the Illumina library amplification primer pair, and 25 μL in total of the DNA polymerase and 2× buffer system; preferably, the DNA polymerase and 2× buffer system is KAPA HiFi HotStart ReadyMix (Roche, Cat No. KK2602). The PCR amplification procedure of step 4) included: step a, pre-denaturation: denaturation at 98 °C for 45 s; step b, cyclic amplification: denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, 10 cycles; step c, final extension: extension at 72 °C for 60 s; storage at 4 °C.
The purification procedure for the second-round Illumina library amplification product included: step a, adding DNA purification beads, preferably Agencourt AMPure XP Beads (Beckman, Cat No. 14403400), at 1 volume of the amplification product (50 μL); step b, incubating at room temperature for 5 min; step c, standing on a magnetic rack for 10 min; step d, washing the magnetic beads twice with freshly prepared 80% ethanol; step e, eluting with 30 μL of EB buffer (Qiagen, Cat No. 19086).
High-throughput sequencing: the library purified in the previous step was sequenced according to the operating procedure of the Illumina NovaSeq; because the viral copy number was high, the sequencing data volume was set to 500 M (0.5 G) in order to reduce the influence of the Illumina sequencing platform on the data results.
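To give a sense of scale for the chosen data volume, the Python sketch below estimates the mean sequencing depth over the 29899 bp reference. It assumes that "G" and "M" denote 10^9 and 10^6 sequenced bases, and the `usable_fraction` parameter (the share of bases that align to the virus) is a hypothetical knob; neither assumption is stated in the text.

```python
GENOME_LENGTH_BP = 29899  # MT019531.1 length given in the data analysis


def expected_mean_depth(total_bases, genome_length=GENOME_LENGTH_BP,
                        usable_fraction=1.0):
    """Mean depth if usable_fraction of the sequenced bases align uniformly."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0 and 1")
    return total_bases * usable_fraction / genome_length


# Assumption: 500 M is read as 5e8 bases and 1 G as 1e9 bases
depth_500m = expected_mean_depth(5e8)
depth_1g = expected_mean_depth(1e9)
print(f"0.5 G: about {depth_500m:,.0f}x; 1 G: about {depth_1g:,.0f}x")
```

Under these assumptions even the reduced data volume corresponds to a mean depth far above the 100× threshold used in the coverage statistics, although real coverage is uneven across amplicons.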
Off-machine data analysis:
Step 1: the index dataset of the reference genome (MT019531.1) was built using BWA software, and the .fai file was generated using samtools faidx. Basic information of the MT019531.1 genome: total length 29899 bp, GC content 37.98%;
Step 2: quality control of reads. Paired-end reads were filtered and quality-controlled using SOAPnuke to obtain clean reads (reads remaining after filtering). Reads meeting any of the following conditions were removed: 1) reads contaminated with adapter sequence; 2) reads with more than 10% N bases; 3) reads in which low-quality (Q < 38) bases exceed 50% of the bases of the read;
Step 3: data alignment and sorting. BWA combined with samtools was used to align the clean reads to the COVID-2019 reference genome (MT019531.1) and generate BAM files, with alignment parameters "-t 32 -M". Sorting was performed using SortSam. The sorted BAM files were indexed using the index tool of samtools. The generated BAM files were quality-controlled using tools such as Qualimap;
Step 4: mutation detection. SNP and InDel variants of the virus were detected using samtools pileup and VarScan. The SNP detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters were: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
Step 5: the detected SNPs were annotated using ANNOVAR software based on the GFF file of the MT019531.1 reference genome.
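The genome statistics quoted in step 1 (total length and GC content) can be recomputed from the reference FASTA. The Python sketch below shows one way to do so on FASTA text; the function name `genome_stats` and the toy record are hypothetical, and the real MT019531.1 sequence is not reproduced here.

```python
def genome_stats(fasta_text):
    """Return (length in bp, GC content in percent) for a single-record FASTA."""
    sequence = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()
    if not sequence:
        raise ValueError("no sequence found")
    gc_count = sequence.count("G") + sequence.count("C")
    return len(sequence), 100.0 * gc_count / len(sequence)


# Toy record standing in for the reference genome
toy_fasta = ">toy\nGGCC\nAATT\n"
length_bp, gc_percent = genome_stats(toy_fasta)
print(f"length: {length_bp} bp, GC content: {gc_percent:.2f}%")
```

Run on the actual reference file, the result would be expected to match the reported 29899 bp and 37.98%, provided ambiguous bases are counted in the length as they are here.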
Analysis of results:
FIG. 6 shows the library construction results of this example. Libraries were constructed from virus culture samples with viral copy numbers of about 10^8; all libraries were unimodal with a narrow, sharp size distribution, and the average library fragment size was about 380-420 bp, consistent with the expected average amplicon size and complete library size; Q30 was above 93.79% (Table 14).
Table 14 Quality control of off-machine data for the cultured virus sample libraries
Figure BDA0002427599160000281
The aligned reads, aligned bases, alignment rate, mismatch rate and related statistics obtained by aligning the filtered (clean) off-machine data to the virus reference genome MT019531.1 (Accession No: MT019531 GWHABKH00000000) are shown in Table 15. The alignment rate of the off-machine data of the samples was above 99.61%, and the mismatch rate was below 0.2%; the 100× sequencing-depth coverage ratio of the novel coronavirus accessory genes N and S was 100% in the libraries, so the virus can be determined to be the novel coronavirus COVID-19; the 100× sequencing-depth coverage ratio of the viral genome in the 3 samples was above 98.65% (Table 15).
Table 15 Analysis statistics of off-machine data for the cultured virus sample libraries
Figure BDA0002427599160000282
Figure BDA0002427599160000291
XH1P2-R had 7 single nucleotide polymorphism (SNP) sites, and no insertion or deletion mutations were found. SNPs: MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 3706 (orf1ab: A3441G), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 5812 (orf1ab: C5547T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 18395 (orf1ab: C18130T), MT019531.1 genomic position 18557 (orf1ab: C18292T).
XH1P6-R had 6 single nucleotide polymorphism (SNP) sites, and no insertion or deletion mutations were found. SNPs: MT019531.1 genomic position 3127 (orf1ab: T2862C), MT019531.1 genomic position 5369 (orf1ab: G5104T), MT019531.1 genomic position 5812 (orf1ab: C5547T), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 26308 (E: G64T).
WHP6-R had 9 single nucleotide polymorphism (SNP) sites and 1 deletion mutation site. SNPs: MT019531.1 genomic position 565 (orf1ab: T300C), MT019531.1 genomic position 6996 (orf1ab: C6731T), MT019531.1 genomic position 7010 (orf1ab: G6755A), MT019531.1 genomic position 17825 (orf1ab: C17560T), MT019531.1 genomic position 18557 (orf1ab: C18292T), MT019531.1 genomic position 21784 (S: T333A), MT019531.1 genomic position 23525 (S: C1965T), MT019531.1 genomic position 23598 (S: A2036G), MT019531.1 genomic position 29573 (ORF10: G16A). Deletion mutation: MT019531.1 genomic position 23594 (S: 2033_2062del).
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
References:
[1] Ge, X.-Y. et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538 (2013)
[2] Yang, L. et al. Novel SARS-like betacoronaviruses in bats, China, 2011. Emerg. Infect. Dis. 19, 989–991 (2013)
[3] Menachery, V. D. et al. SARS-like WIV1-CoV poised for human emergence. Proc. Natl Acad. Sci. USA 113, 3048–3053 (2016)
[4] Cui, J., Li, F. & Shi, Z. L. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 17, 181–192 (2019)
[5] Fan, Y., Zhao, K., Shi, Z.-L. & Zhou, P. Bat coronaviruses in China. Viruses 11, 210 (2019)
[6] Wuhan Municipal Health Commission. Press statement related to novel coronavirus infection (in Chinese) http://wjw.wuhan.gov.cn/front/web/showDetail/2020012709194 (2020)
[7] Zhou, P., Yang, X., Wang, X. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). https://doi.org/10.1038/s41586-020-2012-7
sequence listing
<110> Fuzhou Furui medical laboratory Co., ltd
Chinese Academy of Medical Sciences
<120> method for constructing novel coronavirus whole genome high throughput sequencing library and kit for library construction
<130> KHP201111315.5
<160> 504
<170> SIPOSequenceListing 1.0
<210> 1
<211> 34
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 1
gtctcgtggg ctcggagatg tgtataagag acag 34
<210> 2
<211> 33
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 2
tcgtcggcag cgtcagatgt gtataagaga cag 33
<210> 3
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 3
gtctcgtggg ctcggagatg tgtataagag acagaccaac caactttcga tctct 55
<210> 4
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 4
gtctcgtggg ctcggagatg tgtataagag acagtcccag gtaacaaacc aacc 54
<210> 5
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 5
gtctcgtggg ctcggagatg tgtataagag acagggtgtg accgaaaggt aagat 55
<210> 6
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 6
gtctcgtggg ctcggagatg tgtataagag acaggtccct ggtttcaacg agaa 54
<210> 7
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 7
gtctcgtggg ctcggagatg tgtataagag acagggcgaa ataccagtgg ctta 54
<210> 8
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 8
gtctcgtggg ctcggagatg tgtataagag acagttgagc tggtagcaga actc 54
<210> 9
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 9
gtctcgtggg ctcggagatg tgtataagag acagggtgtt acccgtgaac tcat 54
<210> 10
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 10
gtctcgtggg ctcggagatg tgtataagag acagtgtccg aacaactgga ctttat 56
<210> 11
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 11
gtctcgtggg ctcggagatg tgtataagag acaggcttga tggctttatg ggtaga 56
<210> 12
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 12
gtctcgtggg ctcggagatg tgtataagag acagattgtc cagcatgtca caattc 56
<210> 13
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 13
gtctcgtggg ctcggagatg tgtataagag acaggtggaa actgtgaaag gtttgg 56
<210> 14
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 14
gtctcgtggg ctcggagatg tgtataagag acagttctcc cgcactcttg aaac 54
<210> 15
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 15
gtctcgtggg ctcggagatg tgtataagag acaggctcgt gttgtacgat caattt 56
<210> 16
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 16
gtctcgtggg ctcggagatg tgtataagag acagtcgcag tggctaacta acatc 55
<210> 17
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 17
gtctcgtggg ctcggagatg tgtataagag acagagagaa gtttaaggaa ggtgtagag 59
<210> 18
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 18
gtctcgtggg ctcggagatg tgtataagag acagcagaga agaaactggc ctactc 56
<210> 19
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 19
gtctcgtggg ctcggagatg tgtataagag acagcatttg tcacgcactc aaagg 55
<210> 20
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 20
gtctcgtggg ctcggagatg tgtataagag acagctgtgc ccttgcacct aata 54
<210> 21
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 21
gtctcgtggg ctcggagatg tgtataagag acagcattgg ttggtacacc agtttg 56
<210> 22
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 22
gtctcgtggg ctcggagatg tgtataagag acagaatgag aagtgctctg cctatac 57
<210> 23
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 23
gtctcgtggg ctcggagatg tgtataagag acaggcaagg ttacaagagt gtgaatatc 59
<210> 24
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 24
gtctcgtggg ctcggagatg tgtataagag acagcttaca ccactgggca ttga 54
<210> 25
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 25
gtctcgtggg ctcggagatg tgtataagag acagcctcca gatgaggatg aagaag 56
<210> 26
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 26
gtctcgtggg ctcggagatg tgtataagag acagcctgaa gaagagcaag aagaaga 57
<210> 27
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 27
gtctcgtggg ctcggagatg tgtataagag acaggcagtg aggacaatca gacaa 55
<210> 28
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 28
gtctcgtggg ctcggagatg tgtataagag acagaaatgc agacattgtg gaagaag 57
<210> 29
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 29
gtctcgtggg ctcggagatg tgtataagag acagccttaa acatggagga ggtgtt 56
<210> 30
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 30
gtctcgtggg ctcggagatg tgtataagag acaggcggac acaatcttgc taaac 55
<210> 31
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 31
gtctcgtggg ctcggagatg tgtataagag acagggtgct gaccctatac attctt 56
<210> 32
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 32
gtctcgtggg ctcggagatg tgtataagag acaggatcgc tgagattcct aaagagg 57
<210> 33
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 33
gtctcgtggg ctcggagatg tgtataagag acagcttcat ccagattctg ccactc 56
<210> 34
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 34
gtctcgtggg ctcggagatg tgtataagag acagtgatgt tgttcaagag ggtgt 55
<210> 35
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 35
gtctcgtggg ctcggagatg tgtataagag acagaactgc tgtggttata cctactaaa 59
<210> 36
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 36
gtctcgtggg ctcggagatg tgtataagag acaggcttgc acatgcagaa gaaa 54
<210> 37
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 37
gtctcgtggg ctcggagatg tgtataagag acagcatgca gaagaaacac gcaaat 56
<210> 38
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 38
gtctcgtggg ctcggagatg tgtataagag acaggcgtca cttatcaaca cacttaac 58
<210> 39
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 39
gtctcgtggg ctcggagatg tgtataagag acagcatctc acttgctggt tcctat 56
<210> 40
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 40
gtctcgtggg ctcggagatg tgtataagag acagggtcct attctggaca atctacac 58
<210> 41
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 41
gtctcgtggg ctcggagatg tgtataagag acagttgaga gaagtgagga ctattaagg 59
<210> 42
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 42
gtctcgtggg ctcggagatg tgtataagag acaggggtag gtacatgtca gcatt 55
<210> 43
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 43
gtctcgtggg ctcggagatg tgtataagag acagaccaca caactgatcc tagttt 56
<210> 44
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 44
gtctcgtggg ctcggagatg tgtataagag acaggacagt aggtgagtta ggtgatg 57
<210> 45
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 45
gtctcgtggg ctcggagatg tgtataagag acagggtgag ttaggtgatg ttagagaaa 59
<210> 46
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 46
gtctcgtggg ctcggagatg tgtataagag acagcagata ccttgtacgt gtggtaa 57
<210> 47
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 47
gtctcgtggg ctcggagatg tgtataagag acagtacgtg tggtaaacaa gctaca 56
<210> 48
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 48
gtctcgtggg ctcggagatg tgtataagag acagttgcat agacggtgct ttact 55
<210> 49
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 49
gtctcgtggg ctcggagatg tgtataagag acagacaatt cttatttcac agagcaacc 59
<210> 50
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 50
gtctcgtggg ctcggagatg tgtataagag acagccaaac caaccatatc caaacg 56
<210> 51
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 51
gtctcgtggg ctcggagatg tgtataagag acagtggtga tgtggtggct attg 54
<210> 52
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 52
gtctcgtggg ctcggagatg tgtataagag acagtccctg acttaaatgg tgatgt 56
<210> 53
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 53
gtctcgtggg ctcggagatg tgtataagag acagcagtct ctgaagaagt agtggaaa 58
<210> 54
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 54
gtctcgtggg ctcggagatg tgtataagag acagtcttgc ctgcgaagat ctaaa 55
<210> 55
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 55
gtctcgtggg ctcggagatg tgtataagag acagaccctt gctactcatg gtttag 56
<210> 56
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 56
gtctcgtggg ctcggagatg tgtataagag acagtactca tggtttagct gctgtt 56
<210> 57
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 57
gtctcgtggg ctcggagatg tgtataagag acagatgccg actactatag caaagaata 59
<210> 58
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 58
gtctcgtggg ctcggagatg tgtataagag acagaaagca tctatgccga ctactat 57
<210> 59
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 59
gtctcgtggg ctcggagatg tgtataagag acaggtactg gttacagaga aggctatt 58
<210> 60
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 60
gtctcgtggg ctcggagatg tgtataagag acagacagag aaggctattt gaactcta 58
<210> 61
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 61
gtctcgtggg ctcggagatg tgtataagag acagtacttg gattggctgc aatcat 56
<210> 62
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 62
gtctcgtggg ctcggagatg tgtataagag acagggctgc aatcatgcaa ttgtt 55
<210> 63
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 63
gtctcgtggg ctcggagatg tgtataagag acagagagca acaagagtcg aatgta 56
<210> 64
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 64
gtctcgtggg ctcggagatg tgtataagag acaggtgtta caaacgtaat agagcaaca 59
<210> 65
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 65
gtctcgtggg ctcggagatg tgtataagag acagagaatg gttccatcca tctttact 58
<210> 66
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 66
gtctcgtggg ctcggagatg tgtataagag acaggaccag tcttcttaca tcgttga 57
<210> 67
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 67
gtctcgtggg ctcggagatg tgtataagag acagcgtctg tttactacag tcagcttat 59
<210> 68
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 68
gtctcgtggg ctcggagatg tgtataagag acagtgttgg tgatagtgcg gaag 54
<210> 69
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 69
gtctcgtggg ctcggagatg tgtataagag acagcttgca aagaatgtgt ccttagac 58
<210> 70
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 70
gtctcgtggg ctcggagatg tgtataagag acaggtgacc ttggtgcttg tattg 55
<210> 71
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 71
gtctcgtggg ctcggagatg tgtataagag acaggactgt agtgcgcgtc atatt 55
<210> 72
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 72
gtctcgtggg ctcggagatg tgtataagag acagtgacat gtgcaactac tagacaa 57
<210> 73
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 73
gtctcgtggg ctcggagatg tgtataagag acagaagata gcacttaagg gtggtaaa 58
<210> 74
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 74
gtctcgtggg ctcggagatg tgtataagag acagccagcg tggtggtagt tatac 55
<210> 75
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 75
gtctcgtggg ctcggagatg tgtataagag acaggacaaa gcttgcccat tgatt 55
<210> 76
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 76
gtctcgtggg ctcggagatg tgtataagag acagcttctg gtaagccagt accatatt 58
<210> 77
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 77
gtctcgtggg ctcggagatg tgtataagag acaggatgct tctggtaagc cagt 54
<210> 78
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 78
gtctcgtggg ctcggagatg tgtataagag acagcagaag ctggtgtttg tgtatc 56
<210> 79
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 79
gtctcgtggg ctcggagatg tgtataagag acaggtggta gatgggtact taacaatga 59
<210> 80
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 80
gtctcgtggg ctcggagatg tgtataagag acagacacca gtttactcat tcttacct 58
<210> 81
<211> 61
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 81
gtctcgtggg ctcggagatg tgtataagag acagctattc cttatgtcat tcactgtact 60
c 61
<210> 82
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 82
gtctcgtggg ctcggagatg tgtataagag acagcgtgta gtctttaatg gtgtttcc 58
<210> 83
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 83
gtctcgtggg ctcggagatg tgtataagag acagttgaag aagctgcgct gt 52
<210> 84
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 84
gtctcgtggg ctcggagatg tgtataagag acagtctcgc aaaggctctc aatg 54
<210> 85
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 85
gtctcgtggg ctcggagatg tgtataagag acagcatgtg atctgcacct ctgaa 55
<210> 86
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 86
gtctcgtggg ctcggagatg tgtataagag acaggatgac gtagtttact gtccaaga 58
<210> 87
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 87
gtctcgtggg ctcggagatg tgtataagag acagcagcca atcctaagac acctaag 57
<210> 88
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 88
gtctcgtggg ctcggagatg tgtataagag acagaggttg atacagccaa tcctaag 57
<210> 89
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 89
gtctcgtggg ctcggagatg tgtataagag acagcatgct ggcacagact taga 54
<210> 90
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 90
gtctcgtggg ctcggagatg tgtataagag acagtggcac agacttagaa ggtaac 56
<210> 91
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 91
gtctcgtggg ctcggagatg tgtataagag acagtgactt taaccttgtg gctatga 57
<210> 92
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 92
gtctcgtggg ctcggagatg tgtataagag acagggacct ctttctgctc aaact 55
<210> 93
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 93
gtctcgtggg ctcggagatg tgtataagag acagaatcaa gggtacacac cactg 55
<210> 94
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 94
gtctcgtggg ctcggagatg tgtataagag acagcacacc actggttgtt actca 55
<210> 95
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 95
gtctcgtggg ctcggagatg tgtataagag acaggctagt tgggtgatgc gtatt 55
<210> 96
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 96
gtctcgtggg ctcggagatg tgtataagag acaggtctat atgcctgcta gttggg 56
<210> 97
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 97
gtctcgtggg ctcggagatg tgtataagag acagccattt ccatgtgggc tctta 55
<210> 98
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 98
gtctcgtggg ctcggagatg tgtataagag acagaggtgt agttacaact gtcatgt 57
<210> 99
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 99
gtctcgtggg ctcggagatg tgtataagag acaggttggt ggcaaacctt gtatc 55
<210> 100
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 100
gtctcgtggg ctcggagatg tgtataagag acagctccca cccaagaata gcatag 56
<210> 101
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 101
gtctcgtggg ctcggagatg tgtataagag acaggctaaa gatactactg aagcctttg 59
<210> 102
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 102
gtctcgtggg ctcggagatg tgtataagag acaggcaggg tgctgtagac ataaa 55
<210> 103
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 103
gtctcgtggg ctcggagatg tgtataagag acaggctgtt gctaatggtg attctg 56
<210> 104
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 104
gtctcgtggg ctcggagatg tgtataagag acaggctaat ggtgattctg aagttgtt 58
<210> 105
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 105
gtctcgtggg ctcggagatg tgtataagag acagacaaca gcagccaaac taatg 55
<210> 106
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 106
gtctcgtggg ctcggagatg tgtataagag acaggagatg gttgtgttcc cttga 55
<210> 107
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 107
gtctcgtggg ctcggagatg tgtataagag acagggccaa ttctgctgtc aaatta 56
<210> 108
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 108
gtctcgtggg ctcggagatg tgtataagag acagcagctt taagggccaa ttctg 55
<210> 109
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 109
gtctcgtggg ctcggagatg tgtataagag acagaacaca acaaagggag gtagg 55
<210> 110
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 110
gtctcgtggg ctcggagatg tgtataagag acaggaaatg ggctagattc cctaaga 57
<210> 111
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 111
gtctcgtggg ctcggagatg tgtataagag acagtagctg ccacagtacg tcta 54
<210> 112
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 112
gtctcgtggg ctcggagatg tgtataagag acagcaagct ggtaatgcaa cagaag 56
<210> 113
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 113
gtctcgtggg ctcggagatg tgtataagag acagggtact ggtcaggcaa taaca 55
<210> 114
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 114
gtctcgtggg ctcggagatg tgtataagag acaggttgcc acatagatca tccaaatc 58
<210> 115
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 115
gtctcgtggg ctcggagatg tgtataagag acagctgatg tcgtatacag ggcttt 56
<210> 116
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 116
gtctcgtggg ctcggagatg tgtataagag acagaacggg tttgcggtgt aa 52
<210> 117
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 117
gtctcgtggg ctcggagatg tgtataagag acaggattgt ccagctgttg ctaaac 56
<210> 118
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 118
gtctcgtggg ctcggagatg tgtataagag acagggtgac atggtaccac atatatca 58
<210> 119
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 119
gtctcgtggg ctcggagatg tgtataagag acagccatgc gaaatgctgg tattg 55
<210> 120
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 120
gtctcgtggg ctcggagatg tgtataagag acaggtgtac gccaagcttt gtt 53
<210> 121
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 121
gtctcgtggg ctcggagatg tgtataagag acagccttga ccagggcttt aact 54
<210> 122
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 122
gtctcgtggg ctcggagatg tgtataagag acagagggct ttaactgcag agtc 54
<210> 123
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 123
gtctcgtggg ctcggagatg tgtataagag acagtctaca gtgttcccac ctaca 55
<210> 124
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 124
gtctcgtggg ctcggagatg tgtataagag acagcacgct gcttctggta atct 54
<210> 125
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 125
gtctcgtggg ctcggagatg tgtataagag acagtacttg tgtatgctgc tgacc 55
<210> 126
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 126
gtctcgtggg ctcggagatg tgtataagag acagtcagga tggtaatgct gctatc 56
<210> 127
<211> 62
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 127
gtctcgtggg ctcggagatg tgtataagag acaggattca atgagttatg aggatcaaga 60
tg 62
<210> 128
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 128
gtctcgtggg ctcggagatg tgtataagag acaggtggtt ggcacaacat gttaaa 56
<210> 129
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 129
gtctcgtggg ctcggagatg tgtataagag acagaagcaa attctatggt ggttgg 56
<210> 130
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 130
gtctcgtggg ctcggagatg tgtataagag acagaaacca ggtggaacct catc 54
<210> 131
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 131
gtctcgtggg ctcggagatg tgtataagag acagcatgtg tggcggttca ctat 54
<210> 132
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 132
gtctcgtggg ctcggagatg tgtataagag acaggacgat gctgttgtgt gtttc 55
<210> 133
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 133
gtctcgtggg ctcggagatg tgtataagag acagatactc tctgacgatg ctgttg 56
<210> 134
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 134
gtctcgtggg ctcggagatg tgtataagag acagtacctt ccttacccag atcca 55
<210> 135
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 135
gtctcgtggg ctcggagatg tgtataagag acagccttac ccagatccat caagaa 56
<210> 136
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 136
gtctcgtggg ctcggagatg tgtataagag acagacatga tgagttaaca ggacaca 57
<210> 137
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 137
gtctcgtggg ctcggagatg tgtataagag acagcaaggt attgggaacc tgag 54
<210> 138
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 138
gtctcgtggg ctcggagatg tgtataagag acagacacac cgcatacagt cttac 55
<210> 139
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 139
gtctcgtggg ctcggagatg tgtataagag acaggctatg tacacaccgc ataca 55
<210> 140
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 140
gtctcgtggg ctcggagatg tgtataagag acagccattg tgtgctaatg gacaag 56
<210> 141
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 141
gtctcgtggg ctcggagatg tgtataagag acagaaacct agaccaccac ttaacc 56
<210> 142
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 142
gtctcgtggg ctcggagatg tgtataagag acaggggaag ttggtaaacc tagacc 56
<210> 143
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 143
gtctcgtggg ctcggagatg tgtataagag acagctggct tatacccaac actcaa 56
<210> 144
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 144
gtctcgtggg ctcggagatg tgtataagag acagaccacc tggtactggt aaga 54
<210> 145
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 145
gtctcgtggg ctcggagatg tgtataagag acagcgtgct cgtgtagagt gttt 54
<210> 146
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 146
gtctcgtggg ctcggagatg tgtataagag acagcctgag acgacagcag atatag 56
<210> 147
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 147
gtctcgtggg ctcggagatg tgtataagag acagcactgt gagtgctttg gtttatg 57
<210> 148
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 148
gtctcgtggg ctcggagatg tgtataagag acaggttgac actgtgagtg ctttg 55
<210> 149
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 149
gtctcgtggg ctcggagatg tgtataagag acaggattca tcacagggct cagaata 57
<210> 150
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 150
gtctcgtggg ctcggagatg tgtataagag acagcaaacc actgaaacag ctcac 55
<210> 151
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 151
gtctcgtggg ctcggagatg tgtataagag acagatccta cacaggcacc taca 54
<210> 152
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 152
gtctcgtggg ctcggagatg tgtataagag acagaatcac tgggttacat cctacac 57
<210> 153
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 153
gtctcgtggg ctcggagatg tgtataagag acagcacccg cgaagaagct ataa 54
<210> 154
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 154
gtctcgtggg ctcggagatg tgtataagag acagcatgtt tatcacccgc gaaga 55
<210> 155
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 155
gtctcgtggg ctcggagatg tgtataagag acagcaccgc ctggagatca attt 54
<210> 156
<211> 61
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 156
gtctcgtggg ctcggagatg tgtataagag acagctggag atcaatttaa acacctcata 60
c 61
<210> 157
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 157
gtctcgtggg ctcggagatg tgtataagag acagtccact gcttcagaca cttatg 56
<210> 158
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 158
gtctcgtggg ctcggagatg tgtataagag acaggcctgt tggcatcatt ctattg 56
<210> 159
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 159
gtctcgtggg ctcggagatg tgtataagag acagagcgtg ttgactggac tattg 55
<210> 160
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 160
gtctcgtggg ctcggagatg tgtataagag acagagtgct ttgttaagcg tgttg 55
<210> 161
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 161
gtctcgtggg ctcggagatg tgtataagag acagtgccac acattctgac aaattc 56
<210> 162
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 162
gtctcgtggg ctcggagatg tgtataagag acagttcaca gatggtgtat gcctatt 57
<210> 163
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 163
gtctcgtggg ctcggagatg tgtataagag acagccacta aagtctgcta cgtgtataa 59
<210> 164
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 164
gtctcgtggg ctcggagatg tgtataagag acagatgtac cactaaagtc tgctacg 57
<210> 165
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 165
gtctcgtggg ctcggagatg tgtataagag acaggatgga caacagggtg aagt 54
<210> 166
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 166
gtctcgtggg ctcggagatg tgtataagag acagcagggt gaagtaccag tttctatc 58
<210> 167
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 167
gtctcgtggg ctcggagatg tgtataagag acaggtgtgg acattgctgc taatac 56
<210> 168
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 168
gtctcgtggg ctcggagatg tgtataagag acaggctaat actgtgatct gggactac 58
<210> 169
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 169
gtctcgtggg ctcggagatg tgtataagag acagtgcccg taatggtgtt cttat 55
<210> 170
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 170
gtctcgtggg ctcggagatg tgtataagag acaggtgcac cactcactgt cttt 54
<210> 171
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 171
gtctcgtggg ctcggagatg tgtataagag acaggttgat ggtgttgtcc aacaatta 58
<210> 172
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 172
gtctcgtggg ctcggagatg tgtataagag acaggttgtc caacaattac ctgaaact 58
<210> 173
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 173
gtctcgtggg ctcggagatg tgtataagag acagagtcat agtcagttag gtggtttac 59
<210> 174
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 174
gtctcgtggg ctcggagatg tgtataagag acagtaggtg gtttacatct actgattgg 59
<210> 175
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 175
gtctcgtggg ctcggagatg tgtataagag acagttatgc tttggtgtaa agatggc 57
<210> 176
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 176
gtctcgtggg ctcggagatg tgtataagag acaggtaaag atggccatgt agaaaca 57
<210> 177
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 177
gtctcgtggg ctcggagatg tgtataagag acagaaacac attaacatta gctgtaccc 59
<210> 178
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 178
gtctcgtggg ctcggagatg tgtataagag acaggtgata tgtacgaccc taagactaaa 60
<210> 179
<211> 62
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 179
gtctcgtggg ctcggagatg tgtataagag acagctcatt attagtgata tgtacgaccc 60
ta 62
<210> 180
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 180
gtctcgtggg ctcggagatg tgtataagag acagtaagct catgggacac ttcg 54
<210> 181
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 181
gtctcgtggg ctcggagatg tgtataagag acagcaaatc caattcagtt gtcttcct 58
<210> 182
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 182
gtctcgtggg ctcggagatg tgtataagag acaggggtac tgctgttatg tcttta 56
<210> 183
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 183
gtctcgtggg ctcggagatg tgtataagag acagcactag tctctagtca gtgtgttaat 60
<210> 184
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 184
gtctcgtggg ctcggagatg tgtataagag acagttattg ccactagtct ctagtcag 58
<210> 185
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 185
gtctcgtggg ctcggagatg tgtataagag acagggtttg ataaccctgt cctacc 56
<210> 186
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 186
gtctcgtggg ctcggagatg tgtataagag acagatacat gtctctggga ccaatg 56
<210> 187
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 187
gtctcgtggg ctcggagatg tgtataagag acaggaccca gtccctactt attgtt 56
<210> 188
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 188
gtctcgtggg ctcggagatg tgtataagag acagttgaat atgtctctca gccttt 56
<210> 189
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 189
gtctcgtggg ctcggagatg tgtataagag acagcggctt tagaaccatt ggtaga 56
<210> 190
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 190
gtctcgtggg ctcggagatg tgtataagag acagtgaccc tctctcagaa acaaag 56
<210> 191
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 191
gtctcgtggg ctcggagatg tgtataagag acagctgtgc acttgaccct ctc 53
<210> 192
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 192
gtctcgtggg ctcggagatg tgtataagag acagctgtgt tgctgattat tctgtcc 57
<210> 193
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 193
gtctcgtggg ctcggagatg tgtataagag acagtcttga ttctaaggtt ggtggtaa 58
<210> 194
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 194
gtctcgtggg ctcggagatg tgtataagag acagggctgc gttatagctt gga 53
<210> 195
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 195
gtctcgtggg ctcggagatg tgtataagag acagggttac caaccataca gagtagtag 59
<210> 196
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 196
gtctcgtggg ctcggagatg tgtataagag acagacccac taatggtgtt ggttac 56
<210> 197
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 197
gtctcgtggg ctcggagatg tgtataagag acagcagaga cattgctgac actact 56
<210> 198
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 198
gtctcgtggg ctcggagatg tgtataagag acaggtgatc cacagacact tgagat 56
<210> 199
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 199
gtctcgtggg ctcggagatg tgtataagag acagactcct acttggcgtg tttatt 56
<210> 200
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 200
gtctcgtggg ctcggagatg tgtataagag acagacacgt gcaggctgtt ta 52
<210> 201
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 201
gtctcgtggg ctcggagatg tgtataagag acagagtcaa tccatcattg cctaca 56
<210> 202
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 202
gtctcgtggg ctcggagatg tgtataagag acagggaata gctgttgaac aagacaaa 58
<210> 203
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 203
gtctcgtggg ctcggagatg tgtataagag acagccaagc aagaggtcat ttattgaag 59
<210> 204
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 204
gtctcgtggg ctcggagatg tgtataagag acagcacctt tgctcacaga tgaaatg 57
<210> 205
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 205
gtctcgtggg ctcggagatg tgtataagag acaggttagc gggtacaatc acttct 56
<210> 206
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 206
gtctcgtggg ctcggagatg tgtataagag acagtcaaga ctcactttct tccacag 57
<210> 207
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 207
gtctcgtggg ctcggagatg tgtataagag acagggctga agtgcaaatt gatagg 56
<210> 208
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 208
gtctcgtggg ctcggagatg tgtataagag acagtgatca caggcagact tcaaa 55
<210> 209
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 209
gtctcgtggg ctcggagatg tgtataagag acaggggcta tcatcttatg tccttcc 57
<210> 210
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 210
gtctcgtggg ctcggagatg tgtataagag acagcacctc atggtgtagt cttctt 56
<210> 211
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 211
gtctcgtggg ctcggagatg tgtataagag acagtgtgtc tggtaactgt gatgtt 56
<210> 212
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 212
gtctcgtggg ctcggagatg tgtataagag acaggactca ttcaaggagg agttagat 58
<210> 213
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 213
gtctcgtggg ctcggagatg tgtataagag acagccatgg tacatttggc taggt 55
<210> 214
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 214
gtctcgtggg ctcggagatg tgtataagag acagtagctg gcttgattgc catag 55
<210> 215
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 215
gtctcgtggg ctcggagatg tgtataagag acagccagtg ctcaaaggag tcaa 54
<210> 216
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 216
gtctcgtggg ctcggagatg tgtataagag acagttgctg tagttgtctc aaggg 55
<210> 217
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 217
gtctcgtggg ctcggagatg tgtataagag acaggataca agcctcactc cctttc 56
<210> 218
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 218
gtctcgtggg ctcggagatg tgtataagag acagctccct ttcggatggc ttatt 55
<210> 219
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 219
gtctcgtggg ctcggagatg tgtataagag acaggctttg gctttgctgg aaat 54
<210> 220
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 220
gtctcgtggg ctcggagatg tgtataagag acagataatg aggctttggc tttgc 55
<210> 221
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 221
gtctcgtggg ctcggagatg tgtataagag acaggatggc acaacaagtc ctatttc 57
<210> 222
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 222
gtctcgtggg ctcggagatg tgtataagag acagattacc agctgtactc aactcaa 57
<210> 223
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 223
gtctcgtggg ctcggagatg tgtataagag acagcacaca atcgacggtt catc 54
<210> 224
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 224
gtctcgtggg ctcggagatg tgtataagag acagcggttc atccggagtt gttaat 56
<210> 225
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 225
gtctcgtggg ctcggagatg tgtataagag acagcttgct ttcgtggtat tcttgc 56
<210> 226
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 226
gtctcgtggg ctcggagatg tgtataagag acagagttac actagccatc cttactg 57
<210> 227
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 227
gtctcgtggg ctcggagatg tgtataagag acagctcctt gaacaatgga acctagta 58
<210> 228
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 228
gtctcgtggg ctcggagatg tgtataagag acagcctatt ccttacatgg atttgtcttc 60
<210> 229
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 229
gtctcgtggg ctcggagatg tgtataagag acagttcttc tcaacgtgcc actc 54
<210> 230
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 230
gtctcgtggg ctcggagatg tgtataagag acaggttcca tgtggtcatt caatcc 56
<210> 231
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 231
gtctcgtggg ctcggagatg tgtataagag acagcgctac aggattggca actataa 57
<210> 232
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 232
gtctcgtggg ctcggagatg tgtataagag acagaacaca gaccattcca gtagc 55
<210> 233
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 233
gtctcgtggg ctcggagatg tgtataagag acaggcactg ataacactcg ctactt 56
<210> 234
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 234
gtctcgtggg ctcggagatg tgtataagag acaggagcaa ccaatggaga ttgattaaa 59
<210> 235
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 235
gtctcgtggg ctcggagatg tgtataagag acagagttac gtgccagatc agttt 55
<210> 236
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 236
gtctcgtggg ctcggagatg tgtataagag acagctgttc atcagacaag aggaagt 57
<210> 237
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 237
gtctcgtggg ctcggagatg tgtataagag acaggctgca tttcaccaag aatgt 55
<210> 238
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 238
gtctcgtggg ctcggagatg tgtataagag acagaatgaa acttgtcacg cctaaac 57
<210> 239
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 239
gtctcgtggg ctcggagatg tgtataagag acaggctggt tctaaatcac ccattc 56
<210> 240
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 240
gtctcgtggg ctcggagatg tgtataagag acagattgaa ttgtgcgtgg atgag 55
<210> 241
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 241
gtctcgtggg ctcggagatg tgtataagag acaggtttgg tggaccctca gatt 54
<210> 242
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 242
gtctcgtggg ctcggagatg tgtataagag acagtcaact ggcagtaacc agaat 55
<210> 243
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 243
gtctcgtggg ctcggagatg tgtataagag acagcaccaa tagcagtcca gatgac 56
<210> 244
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 244
gtctcgtggg ctcggagatg tgtataagag acagctacta ccgaagagct accaga 56
<210> 245
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 245
gtctcgtggg ctcggagatg tgtataagag acagcctgct aacaatgctg caatc 55
<210> 246
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 246
gtctcgtggg ctcggagatg tgtataagag acagccgcaa tcctgctaac aatg 54
<210> 247
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 247
gtctcgtggg ctcggagatg tgtataagag acaggtgatg ctgctcttgc tttg 54
<210> 248
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 248
gtctcgtggg ctcggagatg tgtataagag acagtgctgc tgcttgacag att 53
<210> 249
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 249
gtctcgtggg ctcggagatg tgtataagag acaggaccag gaactaatca gacaagg 57
<210> 250
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 250
gtctcgtggg ctcggagatg tgtataagag acagcccacc aacagagcct aaa 53
<210> 251
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 251
gtctcgtggg ctcggagatg tgtataagag acagctgact caactcaggc ctaaac 56
<210> 252
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 252
gtctcgtggg ctcggagatg tgtataagag acagagacca cacaaggcag atg 53
<210> 253
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 253
tcgtcggcag cgtcagatgt gtataagaga cagggacaag gctctccatc ttac 54
<210> 254
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 254
tcgtcggcag cgtcagatgt gtataagaga cagctccatc ttacctttcg gtcac 55
<210> 255
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 255
tcgtcggcag cgtcagatgt gtataagaga cagccgaacg tttgatgaac acatag 56
<210> 256
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 256
tcgtcggcag cgtcagatgt gtataagaga cagtgctacc agctcaacca taac 54
<210> 257
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 257
tcgtcggcag cgtcagatgt gtataagaga cagagggcca cagaagttgt tatc 54
<210> 258
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 258
tcgtcggcag cgtcagatgt gtataagaga caggggtaac accactgcta tgt 53
<210> 259
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 259
tcgtcggcag cgtcagatgt gtataagaga caggtgtctg caattcatag ctcttt 56
<210> 260
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 260
tcgtcggcag cgtcagatgt gtataagaga cagttggtga cgcaactgga tag 53
<210> 261
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 261
tcgtcggcag cgtcagatgt gtataagaga cagagactat gctcaggtcc tactt 55
<210> 262
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 262
tcgtcggcag cgtcagatgt gtataagaga cagcttcgga accttctcca aca 53
<210> 263
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 263
tcgtcggcag cgtcagatgt gtataagaga cagtagtatt gttatagcgg ccttctg 57
<210> 264
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 264
tcgtcggcag cgtcagatgt gtataagaga caggttagcc actgcgaagt caa 53
<210> 265
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 265
tcgtcggcag cgtcagatgt gtataagaga cagctgaaca acaccacctg taatg 55
<210> 266
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 266
tcgtcggcag cgtcagatgt gtataagaga cagtagagtc agcacacaaa gcc 53
<210> 267
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 267
tcgtcggcag cgtcagatgt gtataagaga cagggcatga gtaggccagt tt 52
<210> 268
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 268
tcgtcggcag cgtcagatgt gtataagaga cagcagagaa gaaactggcc tactc 55
<210> 269
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 269
tcgtcggcag cgtcagatgt gtataagaga cagtattagg tgcaagggca cag 53
<210> 270
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 270
tcgtcggcag cgtcagatgt gtataagaga cagcaacaca ggcgaactca tttac 55
<210> 271
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 271
tcgtcggcag cgtcagatgt gtataagaga cagtaggcag agcacttctc attaag 56
<210> 272
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 272
tcgtcggcag cgtcagatgt gtataagaga cagtcctcat ctggagggta gaaa 54
<210> 273
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 273
tcgtcggcag cgtcagatgt gtataagaga cagtctggag ggtagaaaga acaatac 57
<210> 274
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 274
tcgtcggcag cgtcagatgt gtataagaga cagaggttga agagcagcag aag 53
<210> 275
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 275
tcgtcggcag cgtcagatgt gtataagaga cagagtctga acaactggtg taagt 55
<210> 276
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 276
tcgtcggcag cgtcagatgt gtataagaga cagaggtaaa cattggctgc attaac 56
<210> 277
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 277
tcgtcggcag cgtcagatgt gtataagaga caggcaacac ctcctccatg ttta 54
<210> 278
<211> 51
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 278
tcgtcggcag cgtcagatgt gtataagaga cagttgggcc gacaacatga a 51
<210> 279
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 279
tcgtcggcag cgtcagatgt gtataagaga caggaatgta tagggtcagc accaa 55
<210> 280
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 280
tcgtcggcag cgtcagatgt gtataagaga cagcatttgt gcgaacagta tctacac 57
<210> 281
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 281
tcgtcggcag cgtcagatgt gtataagaga cagcttccag agttgttgta acttcttc 58
<210> 282
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 282
tcgtcggcag cgtcagatgt gtataagaga caggagtggc agaatctgga tgaa 54
<210> 283
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 283
tcgtcggcag cgtcagatgt gtataagaga cagacccggg taagtggtta tataattg 58
<210> 284
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 284
tcgtcggcag cgtcagatgt gtataagaga cagtctgcat gtgcaagcat ttc 53
<210> 285
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 285
tcgtcggcag cgtcagatgt gtataagaga caggcgtgtt tcttctgcat gtg 53
<210> 286
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 286
tcgtcggcag cgtcagatgt gtataagaga cagcatagcc aagtggcatt gtaac 55
<210> 287
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 287
tcgtcggcag cgtcagatgt gtataagaga cagcgagcag cttcttccaa atttaag 57
<210> 288
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 288
tcgtcggcag cgtcagatgt gtataagaga cagtgtccag aataggacca atctttat 58
<210> 289
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 289
tcgtcggcag cgtcagatgt gtataagaga cagacttgcg tgtggaggtt aat 53
<210> 290
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 290
tcgtcggcag cgtcagatgt gtataagaga cagccaaact gttgtccata tgtcatt 57
<210> 291
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 291
tcgtcggcag cgtcagatgt gtataagaga cagctgacat gtacctaccc agaaa 55
<210> 292
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 292
tcgtcggcag cgtcagatgt gtataagaga cagcacctaa ctcacctact gtcttatt 58
<210> 293
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 293
tcgtcggcag cgtcagatgt gtataagaga cagaacatca cctaactcac ctactg 56
<210> 294
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 294
tcgtcggcag cgtcagatgt gtataagaga cagggtgact cctgttgtac tagatatt 58
<210> 295
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 295
tcgtcggcag cgtcagatgt gtataagaga cagcaggtgg tgctgacatc ataa 54
<210> 296
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 296
tcgtcggcag cgtcagatgt gtataagaga cagggacctt tgtattctga ggactt 56
<210> 297
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 297
tcgtcggcag cgtcagatgt gtataagaga cagaacatcc gtaataggac ctttgt 56
<210> 298
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 298
tcgtcggcag cgtcagatgt gtataagaga cagcgaagct tgcgtttgga tatg 54
<210> 299
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 299
tcgtcggcag cgtcagatgt gtataagaga cagatagcca ccacatcacc attta 55
<210> 300
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 300
tcgtcggcag cgtcagatgt gtataagaga cagcgtggct ttattagttg cattgt 56
<210> 301
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 301
tcgtcggcag cgtcagatgt gtataagaga cagtcttcgc aggcaagatt atcc 54
<210> 302
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 302
tcgtcggcag cgtcagatgt gtataagaga cagcactact tcttcagaga ctggtt 56
<210> 303
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 303
tcgtcggcag cgtcagatgt gtataagaga cagacagcag ctaaaccatg agtag 55
<210> 304
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 304
tcgtcggcag cgtcagatgt gtataagaga cagggacact attaacagca gctaaac 57
<210> 305
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 305
tcgtcggcag cgtcagatgt gtataagaga cagctttgct atagtagtcg gcatagat 58
<210> 306
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 306
tcgtcggcag cgtcagatgt gtataagaga cagagtagtc ggcatagatg ctttaat 57
<210> 307
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 307
tcgtcggcag cgtcagatgt gtataagaga cagaccagta cagtaggttg caatag 56
<210> 308
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 308
tcgtcggcag cgtcagatgt gtataagaga cagagagttc aaatagcctt ctctgt 56
<210> 309
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 309
tcgtcggcag cgtcagatgt gtataagaga cagtgcagcc aatccaagta cata 54
<210> 310
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 310
tcgtcggcag cgtcagatgt gtataagaga cagcatgatt gcagccaatc caa 53
<210> 311
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 311
tcgtcggcag cgtcagatgt gtataagaga cagacattcg actcttgttg ctctatt 57
<210> 312
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 312
tcgtcggcag cgtcagatgt gtataagaga caggactctt gttgctctat tacgtttg 58
<210> 313
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 313
tcgtcggcag cgtcagatgt gtataagaga caggaaccat tcttcactgt aacactatc 59
<210> 314
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 314
tcgtcggcag cgtcagatgt gtataagaga cagcgatgta agaagactgg tcagtag 57
<210> 315
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 315
tcgtcggcag cgtcagatgt gtataagaga cagactgcaa cttccgcact atc 53
<210> 316
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 316
tcgtcggcag cgtcagatgt gtataagaga cagcactatc accaacatca gacacta 57
<210> 317
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 317
tcgtcggcag cgtcagatgt gtataagaga cagctgaatc aacaaaccct tgcc 54
<210> 318
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 318
tcgtcggcag cgtcagatgt gtataagaga cagcgccagt aacttctatg tcagat 56
<210> 319
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 319
tcgtcggcag cgtcagatgt gtataagaga caggcattaa tatgacgcgc actac 55
<210> 320
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 320
tcgtcggcag cgtcagatgt gtataagaga cagtttacca cccttaagtg ctatct 56
<210> 321
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 321
tcgtcggcag cgtcagatgt gtataagaga cagcccttaa gtgctatctt tgttgttac 59
<210> 322
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 322
tcgtcggcag cgtcagatgt gtataagaga cagcgagtga caccaccatc aata 54
<210> 323
<211> 50
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 323
tcgtcggcag cgtcagatgt gtataagaga cagccaccac gctggctaaa 50
<210> 324
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 324
tcgtcggcag cgtcagatgt gtataagaga cagtggctta ccagaagcat cttt 54
<210> 325
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 325
tcgtcggcag cgtcagatgt gtataagaga cagaatatgg tactggctta ccagaag 57
<210> 326
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 326
tcgtcggcag cgtcagatgt gtataagaga cagagataca caaacaccag cttct 55
<210> 327
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 327
tcgtcggcag cgtcagatgt gtataagaga cagatcattg ttaagtaccc atctacca 58
<210> 328
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 328
tcgtcggcag cgtcagatgt gtataagaga cagaaaggca actacatgac tgtattc 57
<210> 329
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 329
tcgtcggcag cgtcagatgt gtataagaga cagacagagt acagtgaatg acataagg 58
<210> 330
<211> 51
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 330
tcgtcggcag cgtcagatgt gtataagaga cagacagcgc agcttcttca a 51
<210> 331
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 331
tcgtcggcag cgtcagatgt gtataagaga cagaaagact acacgtctct ttaggt 56
<210> 332
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 332
tcgtcggcag cgtcagatgt gtataagaga cagggtttgt ggtggttggt aaag 54
<210> 333
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 333
tcgtcggcag cgtcagatgt gtataagaga cagcagctga ggtgatagag gtttg 55
<210> 334
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 334
tcgtcggcag cgtcagatgt gtataagaga caggggttaa gcatgtcttc agagg 55
<210> 335
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 335
tcgtcggcag cgtcagatgt gtataagaga cagagtctgt cctggttgaa tgc 53
<210> 336
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 336
tcgtcggcag cgtcagatgt gtataagaga cagaccagat ggtgaaccat tgtaa 55
<210> 337
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 337
tcgtcggcag cgtcagatgt gtataagaga cagctaagtc tgtgccagca tgaa 54
<210> 338
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 338
tcgtcggcag cgtcagatgt gtataagaga cagagcatga actccagttg gtaat 55
<210> 339
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 339
tcgtcggcag cgtcagatgt gtataagaga caggtttgag cagaaagagg tccta 55
<210> 340
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 340
tcgtcggcag cgtcagatgt gtataagaga cagcaattcc agtttgagca gaaaga 56
<210> 341
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 341
tcgtcggcag cgtcagatgt gtataagaga cagcagtggt gtgtaccctt gatt 54
<210> 342
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 342
tcgtcggcag cgtcagatgt gtataagaga cagcaaagac cattgagtac tctgga 56
<210> 343
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 343
tcgtcggcag cgtcagatgt gtataagaga cagcacccaa ctagcaggca tatag 55
<210> 344
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 344
tcgtcggcag cgtcagatgt gtataagaga cagtaatacg catcacccaa ctagc 55
<210> 345
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 345
tcgtcggcag cgtcagatgt gtataagaga cagtaagagc ccacatggaa atgg 54
<210> 346
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 346
tcgtcggcag cgtcagatgt gtataagaga cagcacatgg aaatggcttg atctaaa 57
<210> 347
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 347
tcgtcggcag cgtcagatgt gtataagaga caggtcagtc taaagtagcg gttgag 56
<210> 348
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 348
tcgtcggcag cgtcagatgt gtataagaga caggctattc ttgggtggga gtag 54
<210> 349
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 349
tcgtcggcag cgtcagatgt gtataagaga cagcctgttg tccagcattt cttc 54
<210> 350
<211> 51
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 350
tcgtcggcag cgtcagatgt gtataagaga cagaccctgc atggaaagca a 51
<210> 351
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 351
tcgtcggcag cgtcagatgt gtataagaga cagagcaaca gcctgctcat aa 52
<210> 352
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 352
tcgtcggcag cgtcagatgt gtataagaga caggctgcat cacggtcaaa ttc 53
<210> 353
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 353
tcgtcggcag cgtcagatgt gtataagaga cagtcaaggg aacacaacca tctc 54
<210> 354
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 354
tcgtcggcag cgtcagatgt gtataagaga cagggctgct gttgtaagag gtat 54
<210> 355
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 355
tcgtcggcag cgtcagatgt gtataagaga caggtcgtag tgcaacagga ctaa 54
<210> 356
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 356
tcgtcggcag cgtcagatgt gtataagaga caggacagca gaattggccc tta 53
<210> 357
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 357
tcgtcggcag cgtcagatgt gtataagaga caggtaccag ttccatcact cttagg 56
<210> 358
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 358
tcgtcggcag cgtcagatgt gtataagaga cagggacctt taggtgtgtc tgtaa 55
<210> 359
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 359
tcgtcggcag cgtcagatgt gtataagaga caggacgtac tgtggcagct aaa 53
<210> 360
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 360
tcgtcggcag cgtcagatgt gtataagaga cagcaggcac ttctgttgca ttac 54
<210> 361
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 361
tcgtcggcag cgtcagatgt gtataagaga cagccatatt ggcttccggt gtaa 54
<210> 362
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 362
tcgtcggcag cgtcagatgt gtataagaga cagggcagta cagacaacac gat 53
<210> 363
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 363
tcgtcggcag cgtcagatgt gtataagaga caggttgatc acaactacag ccataac 57
<210> 364
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 364
tcgtcggcag cgtcagatgt gtataagaga cagaaagccc tgtatacgac atcag 55
<210> 365
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 365
tcgtcggcag cgtcagatgt gtataagaga cagggtacca tgtcaccgtc tattc 55
<210> 366
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 366
tcgtcggcag cgtcagatgt gtataagaga caggtttagc aacagctgga caatc 55
<210> 367
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 367
tcgtcggcag cgtcagatgt gtataagaga cagggcgtac acgttcacct aa 52
<210> 368
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 368
tcgtcggcag cgtcagatgt gtataagaga cagacaccaa caataccagc atttc 55
<210> 369
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 369
tcgtcggcag cgtcagatgt gtataagaga cagcctctct tccgtgaagt catattt 57
<210> 370
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 370
tcgtcggcag cgtcagatgt gtataagaga cagaagccct ggtcaaggtt aata 54
<210> 371
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 371
tcgtcggcag cgtcagatgt gtataagaga cagcttgtag gtgggaacac tgtag 55
<210> 372
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 372
tcgtcggcag cgtcagatgt gtataagaga cagggtggga acactgtaga gaataa 56
<210> 373
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 373
tcgtcggcag cgtcagatgt gtataagaga cagaagcacg tagtgcgttt atct 54
<210> 374
<211> 61
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 374
tcgtcggcag cgtcagatgt gtataagaga cagtaacgat agtagtcata atcgctgata 60
g 61
<210> 375
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 375
tcgtcggcag cgtcagatgt gtataagaga cagagcatta ccatcctgag caaa 54
<210> 376
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 376
tcgtcggcag cgtcagatgt gtataagaga cagagtgcat cttgatcctc ataact 56
<210> 377
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 377
tcgtcggcag cgtcagatgt gtataagaga caggttgtgc caaccaccat aga 53
<210> 378
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 378
tcgtcggcag cgtcagatgt gtataagaga cagcatatag tgaaccgcca caca 54
<210> 379
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 379
tcgtcggcag cgtcagatgt gtataagaga caggatgagg ttccacctgg ttta 54
<210> 380
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 380
tcgtcggcag cgtcagatgt gtataagaga caggaaacac acaacagcat cgtc 54
<210> 381
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 381
tcgtcggcag cgtcagatgt gtataagaga caggcatcgt cagagagtat catcatt 57
<210> 382
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 382
tcgtcggcag cgtcagatgt gtataagaga cagattcttg atggatctgg gtaagg 56
<210> 383
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 383
tcgtcggcag cgtcagatgt gtataagaga cagccctagg attcttgatg gatctg 56
<210> 384
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 384
tcgtcggcag cgtcagatgt gtataagaga cagctcaggt tcccaatacc ttgaa 55
<210> 385
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 385
tcgtcggcag cgtcagatgt gtataagaga caggttccca ataccttgaa gtgttatc 58
<210> 386
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 386
tcgtcggcag cgtcagatgt gtataagaga cagggtcgta acagcattta caacataa 58
<210> 387
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 387
tcgtcggcag cgtcagatgt gtataagaga cagacggatt aacagacaag actaa 55
<210> 388
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 388
tcgtcggcag cgtcagatgt gtataagaga cagaacttgt ccattagcac acaatg 56
<210> 389
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 389
tcgtcggcag cgtcagatgt gtataagaga cagagctcat acctcctaag taaagttg 58
<210> 390
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 390
tcgtcggcag cgtcagatgt gtataagaga cagggttaag tggtggtcta ggttta 56
<210> 391
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 391
tcgtcggcag cgtcagatgt gtataagaga caggagtgtt gggtataagc cagtaa 56
<210> 392
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 392
tcgtcggcag cgtcagatgt gtataagaga cagtgggtat aagccagtaa ttctaaca 58
<210> 393
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 393
tcgtcggcag cgtcagatgt gtataagaga cagcgagcac gtgcaggtat aat 53
<210> 394
<211> 51
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 394
tcgtcggcag cgtcagatgt gtataagaga caggctgtcg tctcaggcaa t 51
<210> 395
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 395
tcgtcggcag cgtcagatgt gtataagaga cagggttcta gtgtgccctt agttag 56
<210> 396
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 396
tcgtcggcag cgtcagatgt gtataagaga caggtcaaca atttcagcag gacaa 55
<210> 397
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 397
tcgtcggcag cgtcagatgt gtataagaga cagcatattc tgagccctgt gatgaa 56
<210> 398
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 398
tcgtcggcag cgtcagatgt gtataagaga caggaatcaa cagtttgagt tggtagtc 58
<210> 399
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 399
tcgtcggcag cgtcagatgt gtataagaga cagtaggtgc ctgtgtagga tgta 54
<210> 400
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 400
tcgtcggcag cgtcagatgt gtataagaga caggccaggt atgtcaacac ataaac 56
<210> 401
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 401
tcgtcggcag cgtcagatgt gtataagaga cagcctcgac atcgaagcca atc 53
<210> 402
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 402
tcgtcggcag cgtcagatgt gtataagaga cagaacagct tctctagtag catgac 56
<210> 403
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 403
tcgtcggcag cgtcagatgt gtataagaga cagagtcctt tgtacataag tggtatga 58
<210> 404
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 404
tcgtcggcag cgtcagatgt gtataagaga caggcggtgg tttagcacta act 53
<210> 405
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 405
tcgtcggcag cgtcagatgt gtataagaga cagaagcatg tggcacgtct atc 53
<210> 406
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 406
tcgtcggcag cgtcagatgt gtataagaga cagcataagt gtctgaagca gtgga 55
<210> 407
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 407
tcgtcggcag cgtcagatgt gtataagaga cagtagtcca gtcaacacgc ttaac 55
<210> 408
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 408
tcgtcggcag cgtcagatgt gtataagaga caggccgcat taatcttcag ttcatc 56
<210> 409
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 409
tcgtcggcag cgtcagatgt gtataagaga cagtgtcaga atgtgtggca taaga 55
<210> 410
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 410
tcgtcggcag cgtcagatgt gtataagaga cagtgtgaat ttgtcagaat gtgtgg 56
<210> 411
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 411
tcgtcggcag cgtcagatgt gtataagaga cagctcacat ggactgtcag agtaatag 58
<210> 412
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 412
tcgtcggcag cgtcagatgt gtataagaga cagcgtagca gactttagtg gtacat 56
<210> 413
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 413
tcgtcggcag cgtcagatgt gtataagaga cagctggtac ttcaccctgt tgtc 54
<210> 414
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 414
tcgtcggcag cgtcagatgt gtataagaga cagtcaccct gttgtccatc aaa 53
<210> 415
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 415
tcgtcggcag cgtcagatgt gtataagaga cagatgtgct ggagcatctc ttt 53
<210> 416
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 416
tcgtcggcag cgtcagatgt gtataagaga cagtcagttg gtttcttggc tatgt 55
<210> 417
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 417
tcgtcggcag cgtcagatgt gtataagaga caggggacct acagatggtt gtaaa 55
<210> 418
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 418
tcgtcggcag cgtcagatgt gtataagaga caggactagc ttgtttggga cctac 55
<210> 419
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 419
tcgtcggcag cgtcagatgt gtataagaga cagttccatt tgactcctgg gttta 55
<210> 420
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 420
tcgtcggcag cgtcagatgt gtataagaga cagtgactcc tgggtttaaa ttcttgta 58
<210> 421
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 421
tcgtcggcag cgtcagatgt gtataagaga cagacgttta gctagtccaa tcagtag 57
<210> 422
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 422
tcgtcggcag cgtcagatgt gtataagaga cagagtagat gtaaaccacc taactgac 58
<210> 423
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 423
tcgtcggcag cgtcagatgt gtataagaga caggccatct ttacaccaaa gcataa 56
<210> 424
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 424
tcgtcggcag cgtcagatgt gtataagaga cagtctacat ggccatcttt acacc 55
<210> 425
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 425
tcgtcggcag cgtcagatgt gtataagaga caggggtaca gctaatgtta atgtgttt 58
<210> 426
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 426
tcgtcggcag cgtcagatgt gtataagaga cagctttatc agaaccagca ccaaa 55
<210> 427
<211> 59
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 427
tcgtcggcag cgtcagatgt gtataagaga caggtcttag ggtcgtacat atcactaat 59
<210> 428
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 428
tcgtcggcag cgtcagatgt gtataagaga cagatctatt tgttcgcgtg gtttg 55
<210> 429
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 429
tcgtcggcag cgtcagatgt gtataagaga cagcgtggtt tgccaagata attaca 56
<210> 430
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 430
tcgtcggcag cgtcagatgt gtataagaga cagtttaaag acataacagc agtaccc 57
<210> 431
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 431
tcgtcggcag cgtcagatgt gtataagaga cagctgacta gagactagtg gcaataaa 58
<210> 432
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 432
tcgtcggcag cgtcagatgt gtataagaga cagacaccac gtgtgaaaga attag 55
<210> 433
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 433
tcgtcggcag cgtcagatgt gtataagaga cagggacagg gttatcaaac ctctta 56
<210> 434
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 434
tcgtcggcag cgtcagatgt gtataagaga cagaaatggt aggacagggt tatca 55
<210> 435
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 435
tcgtcggcag cgtcagatgt gtataagaga cagctctgaa ctcactttcc atcca 55
<210> 436
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 436
tcgtcggcag cgtcagatgt gtataagaga cagtcgcact agaataaact ctgaact 57
<210> 437
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 437
tcgtcggcag cgtcagatgt gtataagaga cagctaaatt aataggcgtg tgcttaga 58
<210> 438
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 438
tcgtcggcag cgtcagatgt gtataagaga caggctgtcc aacctgaaga ag 52
<210> 439
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 439
tcgtcggcag cgtcagatgt gtataagaga caggtttctg agagagggtc aagtg 55
<210> 440
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 440
tcgtcggcag cgtcagatgt gtataagaga cagaggagac actccataac acttaaa 57
<210> 441
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 441
tcgtcggcag cgtcagatgt gtataagaga cagtgatgcg gaattatata ggacagaa 58
<210> 442
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 442
tcgtcggcag cgtcagatgt gtataagaga cagaccacca accttagaat caaga 55
<210> 443
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 443
tcgtcggcag cgtcagatgt gtataagaga cagagttgct ggtgcatgta gaa 53
<210> 444
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 444
tcgtcggcag cgtcagatgt gtataagaga caggtactac tactctgtat ggttggtaac 60
<210> 445
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 445
tcgtcggcag cgtcagatgt gtataagaga caggatcacg gacagcatca gtag 54
<210> 446
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 446
tcgtcggcag cgtcagatgt gtataagaga caggaatctc aagtgtctgt ggatca 56
<210> 447
<211> 50
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 447
tcgtcggcag cgtcagatgt gtataagaga cagagcctgc acgtgtttga 50
<210> 448
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 448
tcgtcggcag cgtcagatgt gtataagaga cagtatacct gcaccaatgg gtatg 55
<210> 449
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 449
tcgtcggcag cgtcagatgt gtataagaga cagttgtggg tatggcaata gagtt 55
<210> 450
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 450
tcgtcggcag cgtcagatgt gtataagaga cagagacact ggtagaattt ctgtgg 56
<210> 451
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 451
tcgtcggcag cgtcagatgt gtataagaga cagtttgtct tgttcaacag ctattcc 57
<210> 452
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 452
tcgtcggcag cgtcagatgt gtataagaga caggaggtct ctagcagcaa tatcac 56
<210> 453
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 453
tcgtcggcag cgtcagatgt gtataagaga cagaccaaag gtccaaccag aag 53
<210> 454
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 454
tcgtcggcag cgtcagatgt gtataagaga cagtgcactt gctgtggaag aa 52
<210> 455
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 455
tcgtcggcag cgtcagatgt gtataagaga cagagcgtgt ttaaagcttg tgc 53
<210> 456
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 456
tcgtcggcag cgtcagatgt gtataagaga caggtctgcc tgtgatcaac ctatc 55
<210> 457
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 457
tcgtcggcag cgtcagatgt gtataagaga cagctgactg agggaaggac ataag 55
<210> 458
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 458
tcgtcggcag cgtcagatgt gtataagaga cagaagacac cttcacgagg aaag 54
<210> 459
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 459
tcgtcggcag cgtcagatgt gtataagaga cagacaacat cacagttacc agaca 55
<210> 460
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 460
tcgtcggcag cgtcagatgt gtataagaga caggagtcta attcaggttg caaagg 56
<210> 461
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 461
tcgtcggcag cgtcagatgt gtataagaga cagcattgag gcggtcaatt tcttt 55
<210> 462
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 462
tcgtcggcag cgtcagatgt gtataagaga caggcaactg gtcatacagc aaag 54
<210> 463
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 463
tcgtcggcag cgtcagatgt gtataagaga cagctgaagg agtagcatcc ttgatt 56
<210> 464
<211> 51
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 464
tcgtcggcag cgtcagatgt gtataagaga cagtgcagta gcgcgaacaa a 51
<210> 465
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 465
tcgtcggcag cgtcagatgt gtataagaga caggaagtgc aacgccaaca ataa 54
<210> 466
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 466
tcgtcggcag cgtcagatgt gtataagaga cagccaacaa taagccatcc gaaag 55
<210> 467
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 467
tcgtcggcag cgtcagatgt gtataagaga caggcaaagc caaagcctca ttatt 55
<210> 468
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 468
tcgtcggcag cgtcagatgt gtataagaga caggaacggc atttccagca aag 53
<210> 469
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 469
tcgtcggcag cgtcagatgt gtataagaga cagcaacacc agtgtctgta ctcaa 55
<210> 470
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 470
tcgtcggcag cgtcagatgt gtataagaga cagacagctg gtaatagtct gaagtg 56
<210> 471
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 471
tcgtcggcag cgtcagatgt gtataagaga caggtcgtcg tcggttcatc ataa 54
<210> 472
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 472
tcgtcggcag cgtcagatgt gtataagaga cagcgtacct gtctcttccg aaac 54
<210> 473
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 473
tcgtcggcag cgtcagatgt gtataagaga cagcagcagt acgcacacaa tc 52
<210> 474
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 474
tcgtcggcag cgtcagatgt gtataagaga cagcgttaac aatattgcag cagtacg 57
<210> 475
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 475
tcgtcggcag cgtcagatgt gtataagaga caggtaccgt tggaatctgc cat 53
<210> 476
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 476
tcgtcggcag cgtcagatgt gtataagaga caggttccat tgttcaagga gcttt 55
<210> 477
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 477
tcgtcggcag cgtcagatgt gtataagaga caggcgcaaa cagtctgaaa gaag 54
<210> 478
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 478
tcgtcggcag cgtcagatgt gtataagaga cagggattga atgaccacat ggaac 55
<210> 479
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 479
tcgtcggcag cgtcagatgt gtataagaga cagaatcctg tagcgactgt atgc 54
<210> 480
<211> 53
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 480
tcgtcggcag cgtcagatgt gtataagaga cagaaacctg agtcacctgc tac 53
<210> 481
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 481
tcgtcggcag cgtcagatgt gtataagaga cagtctccat tggttgctct tcatc 55
<210> 482
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 482
tcgtcggcag cgtcagatgt gtataagaga cagcgagtgt tatcagtgcc aaga 54
<210> 483
<211> 57
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 483
tcgtcggcag cgtcagatgt gtataagaga cagtcttgaa cttcctcttg tctgatg 57
<210> 484
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 484
tcgtcggcag cgtcagatgt gtataagaga caggatctgg cacgtaactg atagac 56
<210> 485
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 485
tcgtcggcag cgtcagatgt gtataagaga cagcgtttag gcgtgacaag tttc 54
<210> 486
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 486
tcgtcggcag cgtcagatgt gtataagaga caggtgaaat gcagctacag ttgtg 55
<210> 487
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 487
tcgtcggcag cgtcagatgt gtataagaga cagacaacgc actacaagac tacc 54
<210> 488
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 488
tcgtcggcag cgtcagatgt gtataagaga cagtcgatgt actgaatggg tgattt 56
<210> 489
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 489
tcgtcggcag cgtcagatgt gtataagaga caggcgttct ccattctggt tact 54
<210> 490
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 490
tcgtcggcag cgtcagatgt gtataagaga caggggtgca tttcgctgat tt 52
<210> 491
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 491
tcgtcggcag cgtcagatgt gtataagaga cagtctggta gctcttcggt agtag 55
<210> 492
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 492
tcgtcggcag cgtcagatgt gtataagaga cagaccatct tggactgaga tctttc 56
<210> 493
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 493
tcgtcggcag cgtcagatgt gtataagaga caggcacgat tgcagcattg ttag 54
<210> 494
<211> 55
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 494
tcgtcggcag cgtcagatgt gtataagaga cagtttggca atgttgttcc ttgag 55
<210> 495
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 495
tcgtcggcag cgtcagatgt gtataagaga caggctctca agctggttca atct 54
<210> 496
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 496
tcgtcggcag cgtcagatgt gtataagaga cagctgtcaa gcagcagcaa ag 52
<210> 497
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 497
tcgtcggcag cgtcagatgt gtataagaga cagttgcggc caatgtttgt aatc 54
<210> 498
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 498
tcgtcggcag cgtcagatgt gtataagaga cagccttgtc tgattagttc ctggtc 56
<210> 499
<211> 52
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 499
tcgtcggcag cgtcagatgt gtataagaga caggctctgt tggtgggaat gt 52
<210> 500
<211> 58
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 500
tcgtcggcag cgtcagatgt gtataagaga caggaattca ttctgcacaa gagtagac 58
<210> 501
<211> 54
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 501
tcgtcggcag cgtcagatgt gtataagaga cagcagctct ccctagcatt gttc 54
<210> 502
<211> 56
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 502
tcgtcggcag cgtcagatgt gtataagaga cagcattagg gctcttccat ataggc 56
<210> 503
<211> 39
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 503
caagcagaag acggcatacg agatgtctcg tgggctcgg 39
<210> 504
<211> 43
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 504
aatgatacgg cgaccaccga gatctacact cgtcggcagc gtc 43
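
The reverse-primer entries above all open with the same 33-nt prefix. A quick way to sanity-check a listing entry is to rejoin its 10-base blocks, compare the result with the declared length, and split off that prefix. The sketch below does this for SEQ ID NO. 456. It is only an illustration: the helper names (`parse_entry`, `split_primer`) and constants are hypothetical and not part of the patent.

```python
# Illustrative check only; all names here are hypothetical helpers.
I5_TAIL = "TCGTCGGCAGCGTC"            # 14-nt start shared by the reverse-primer entries
SHARED_19MER = "AGATGTGTATAAGAGACAG"  # 19-nt segment that follows it in every entry
PREFIX = I5_TAIL + SHARED_19MER       # 33-nt common prefix


def parse_entry(line: str) -> tuple[str, int]:
    """Split a <400> sequence line into (sequence, declared length)."""
    *blocks, declared = line.split()
    return "".join(blocks).upper(), int(declared)


def split_primer(seq: str, prefix: str = PREFIX) -> tuple[str, str]:
    """Return (prefix, target-specific remainder); fail if the prefix is absent."""
    if not seq.startswith(prefix):
        raise ValueError("sequence does not start with the expected prefix")
    return prefix, seq[len(prefix):]


# SEQ ID NO. 456 exactly as printed in the listing
entry = "tcgtcggcag cgtcagatgt gtataagaga caggtctgcc tgtgatcaac ctatc 55"
seq, declared = parse_entry(entry)
prefix, target = split_primer(seq)

print(len(seq) == declared)       # declared length matches the joined blocks
print(len(prefix), len(target))   # prefix length and target-specific length
print(target)
```

The same two helpers can be looped over every `<400>` line to flag entries whose declared length or prefix does not match.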

Claims (10)

1. A method for constructing a novel coronavirus whole-genome high-throughput sequencing library, characterized by comprising the following steps:
A. extracting RNA from a virus sample and reverse-transcribing it to obtain single-stranded cDNA or double-stranded cDNA;
B. designing tiled (shingled) full-coverage primers from the published novel coronavirus COVID-19 genome sequence, namely a multiplex amplification primer set 1 and a multiplex amplification primer set 2, each anchoring a partial Illumina linker sequence; performing a first round of PCR with primer set 1 and primer set 2 separately, using the single-stranded cDNA or double-stranded cDNA as template; and mixing the amplification products in equimolar amounts so that they cover the whole genome of the virus;
C. performing a second round of PCR using the mixed amplification products of step B as template and tagged Illumina library amplification primers, and purifying the amplification products to obtain the high-throughput sequencing library;
wherein multiplex amplification primer set 1 and multiplex amplification primer set 2 of step B, each anchoring a partial Illumina linker sequence, are designed as follows:
B1. designing a multiplex specific amplification primer set I and a multiplex specific amplification primer set II from the novel coronavirus COVID-19 genome sequence, wherein primer set I comprises a forward primer pool F and a reverse primer pool R, primer set II comprises a forward primer pool F' and a reverse primer pool R', and each pair of forward and reverse primers corresponds to one amplicon; the forward primer and the reverse primer of a primer set II amplicon are designed within two adjacent amplicon sequences of primer set I, and the forward primer and the reverse primer of a primer set I amplicon are designed within two adjacent amplicon sequences of primer set II; these steps are repeated until the amplicons of primer set I and the amplicons of primer set II cover the whole genome of the virus in a tiled (shingled) manner;
B2. adding Illumina partial linker sequence (1) to the 5' end of each forward primer in the 5'-3' direction, and adding Illumina partial linker sequence (2) to the 5' end of each reverse primer in the 5'-3' direction; the forward primer pool F carrying Illumina partial linker sequence (1) and the reverse primer pool R carrying Illumina partial linker sequence (2) constitute multiplex amplification primer set 1; the forward primer pool F' carrying Illumina partial linker sequence (1) and the reverse primer pool R' carrying Illumina partial linker sequence (2) constitute multiplex amplification primer set 2;
wherein Illumina partial linker sequence (1) is: 5'-[3'-terminal sequence of the i7 tagged primer]-AGATGTGTATAAGAGACAG-3';
Illumina partial linker sequence (2) is: 5'-[3'-terminal sequence of the i5 tagged primer]-AGATGTGTATAAGAGACAG-3';
the 3'-terminal sequence of the i7 tagged primer is 9-15 bp long and the 3'-terminal sequence of the i5 tagged primer is 8-14 bp long, so that the i7 tagged primer and the i5 tagged primer can anneal specifically to their 3'-end binding sites on the amplicon;
multiplex amplification primer set 1 and multiplex amplification primer set 2 of step B comprise 250 primer pairs, wherein the forward primers are COV-1-F to COV-250-F, whose nucleotide sequences are shown in SEQ ID NO. 3-252 respectively, and the reverse primers are COV-1-R to COV-250-R, whose nucleotide sequences are shown in SEQ ID NO. 253-502 respectively; COV-1-F and COV-1-R form one primer pair, COV-2-F and COV-2-R form one primer pair, and so on.
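
The interleaving described in step B1 can be sketched as follows. This is a minimal illustration under assumed values, not the patent's design procedure: the `Amplicon` class, the functions `tile`, `covers_genome` and `pool_is_disjoint`, the fixed 250 bp amplicon size, the 150 bp step and the 1,000-base toy genome are all hypothetical. Amplicons are laid down at a fixed step and assigned alternately to pool 1 and pool 2, so that each pool-2 amplicon starts inside one pool-1 amplicon and ends inside the next one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    start: int  # 0-based, inclusive
    end: int    # exclusive
    pool: int   # 1 or 2


def tile(genome_len: int, size: int = 250, step: int = 150) -> list[Amplicon]:
    """Lay amplicons every `step` bases, alternating between pool 1 and pool 2.

    step < size makes neighbouring amplicons overlap across pools;
    2 * step >= size keeps amplicons of the same pool from overlapping.
    The last amplicon is simply truncated at the genome end.
    """
    if genome_len <= 0 or not (step < size <= 2 * step):
        raise ValueError("need genome_len > 0 and size/2 <= step < size")
    amplicons, start, index = [], 0, 0
    while True:
        end = min(start + size, genome_len)
        amplicons.append(Amplicon(start, end, pool=1 if index % 2 == 0 else 2))
        if end == genome_len:
            return amplicons
        start += step
        index += 1


def covers_genome(amplicons: list[Amplicon], genome_len: int) -> bool:
    """True if the union of all amplicons leaves no gap in [0, genome_len)."""
    reach = 0
    for amp in sorted(amplicons, key=lambda a: a.start):
        if amp.start > reach:
            return False
        reach = max(reach, amp.end)
    return reach >= genome_len


def pool_is_disjoint(amplicons: list[Amplicon], pool: int) -> bool:
    """True if no two amplicons of the given pool overlap each other."""
    members = sorted((a for a in amplicons if a.pool == pool), key=lambda a: a.start)
    return all(left.end <= right.start for left, right in zip(members, members[1:]))


layout = tile(1000)  # toy genome length, purely for illustration
for amp in layout:
    print(amp)
print(covers_genome(layout, 1000), pool_is_disjoint(layout, 1), pool_is_disjoint(layout, 2))
```

Real primer placement depends on sequence constraints, so amplicon sizes and steps vary. The sketch only shows why two interleaved pools can cover the genome while each pool alone has no overlapping amplicons.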
2. The method of claim 1, wherein in step B the Tm difference for each primer pair is within ±2 °C; and/or
the amplicon size is 200-300 bp; and/or
primer pairs that can form dimers or stem-loop structures, between or within primers, are removed during primer design; and/or
within the same multiplex specific amplification primer set, the 5' end of the reverse primer of the upstream amplicon on the genome is located upstream of the 5' end of the forward primer of the downstream amplicon.
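
Two of these criteria lend themselves to a quick screening check. The sketch below is a hedged illustration: the claim does not say how Tm is calculated, so `gc_tm` uses a common GC-content approximation as a stand-in, and the ±2 °C limit is read here as applying to the two primers of one pair. The function names and the forward primer are hypothetical; the reverse primer is the target-specific part of SEQ ID NO. 456. Dimer and stem-loop screening is not covered.

```python
def gc_tm(primer: str) -> float:
    """Rough Tm estimate in deg C from GC content (assumed approximation)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def pair_passes(forward: str, reverse: str, amplicon_len: int,
                max_tm_diff: float = 2.0,
                size_range: tuple[int, int] = (200, 300)) -> bool:
    """Check the Tm-difference and amplicon-size criteria for one primer pair."""
    tm_ok = abs(gc_tm(forward) - gc_tm(reverse)) <= max_tm_diff
    size_ok = size_range[0] <= amplicon_len <= size_range[1]
    return tm_ok and size_ok


reverse = "GTCTGCCTGTGATCAACCTATC"   # target-specific part of SEQ ID NO. 456
forward = "ACGTACGTACGTACGTACGTAC"   # hypothetical forward primer, same GC count
gc_rich = "GCGCGCGCGCGCGCGCGCGCGC"   # hypothetical primer with a much higher Tm

print(pair_passes(forward, reverse, amplicon_len=250))
print(pair_passes(gc_rich, reverse, amplicon_len=250))
print(pair_passes(forward, reverse, amplicon_len=350))
```

Pass the target-specific portion of each primer, without the linker prefix, if the goal is to compare annealing to the viral template.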
3. The method according to claim 1, wherein the method for reverse transcription of RNA into single-stranded cDNA in step A is selected from a or b:
a. priming single-stranded cDNA synthesis with 6-10 bp random primers;
b. mixing a plurality of primers from the reverse primer pool R and the reverse primer pool R' to form a specific reverse transcription primer set that primes single-stranded cDNA synthesis, wherein the reverse primers are evenly distributed along the 3'-5' direction of the viral genome at intervals of 800-1000 bp;
and the method for reverse transcription of RNA into double-stranded cDNA in step A comprises:
i. priming single-stranded cDNA synthesis with 6-10 bp random primers;
ii. nicking the RNA-cDNA hybrid duplex with RNase H in the presence of dNTPs;
iii. synthesizing double-stranded cDNA with an RNA-dependent DNA polymerase, using the small RNA fragments generated at the nicks as primers.
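
Option b, picking reverse primers spaced 800-1000 bp apart, can be illustrated with a simple greedy walk from the 3' end of the genome toward the 5' end. This is only a sketch under assumptions: the function name `pick_rt_primers`, the greedy strategy and the evenly spaced toy coordinates are hypothetical, and the claim does not specify how the subset is chosen.

```python
def pick_rt_primers(positions: list[int], min_gap: int = 800,
                    max_gap: int = 1000) -> list[int]:
    """Greedily choose reverse-primer positions spaced min_gap..max_gap apart.

    Starts at the 3'-most position (highest coordinate) and moves toward the
    5' end, taking the first candidate that falls inside the allowed window.
    If no candidate falls in a window, the walk stops at the last chosen primer.
    """
    ordered = sorted(positions, reverse=True)
    chosen = [ordered[0]]
    for pos in ordered[1:]:
        gap = chosen[-1] - pos
        if min_gap <= gap <= max_gap:
            chosen.append(pos)
    return chosen


# Toy reverse-primer coordinates every 150 bases (hypothetical values)
candidates = list(range(250, 5000, 150))
selected = pick_rt_primers(candidates)
print(selected)
```

With real primer coordinates the gaps will not be uniform, so it is worth checking afterwards that every consecutive gap in the result still lies within 800-1000 bp.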
4. The method of claim 1, wherein the tagged Illumina library amplification primers in step C are as follows:
i7 tagged primer: 5'-CAAGCAGAAGACGGCATACGAGAT(i7)GTCTCGTGGGCTCGG-3'; i5 tagged primer: 5'-AATGATACGGCGACCACCGAGATCTACAC(i5)TCGTCGGCAGCGTC-3'.
5. The method of claim 4, wherein Illumina partial linker sequence (1) in step B is: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3';
and Illumina partial linker sequence (2) is: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3'.
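
The relationship between the tagged primers of claim 4 and the partial linker sequences of claim 5 can be checked mechanically: each partial linker is the part of the tagged primer that follows the index, joined to the shared 19-nt segment. The sketch below assembles both. The function names and the 8-nt example index are hypothetical; the claims do not state an index length.

```python
# (part before the index, part after the index), as given in claim 4
I7_PARTS = ("CAAGCAGAAGACGGCATACGAGAT", "GTCTCGTGGGCTCGG")
I5_PARTS = ("AATGATACGGCGACCACCGAGATCTACAC", "TCGTCGGCAGCGTC")
SHARED_19MER = "AGATGTGTATAAGAGACAG"


def tagged_primer(parts: tuple[str, str], index: str) -> str:
    """Insert an index (barcode) between the two fixed parts of a tagged primer."""
    return parts[0] + index.upper() + parts[1]


def partial_linker(parts: tuple[str, str]) -> str:
    """3'-terminal part of the tagged primer followed by the shared 19-mer."""
    return parts[1] + SHARED_19MER


linker_1 = partial_linker(I7_PARTS)   # goes on the forward primers
linker_2 = partial_linker(I5_PARTS)   # goes on the reverse primers
example_i7 = tagged_primer(I7_PARTS, "ACGTACGT")  # hypothetical 8-nt index

print(linker_1)
print(linker_2)
print(example_i7)
print(len(I7_PARTS[1]), len(I5_PARTS[1]))  # lengths of the 3'-terminal parts
```

Because the second-round primers end in the same bases that begin the first-round linkers, they can anneal to every first-round amplicon regardless of its viral insert.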
6. The method according to claim 1, wherein the primers of multiplex amplification primer set 1, anchoring a partial Illumina linker sequence, in step B are as follows:
Figure FDA0004163264270000021
Figure FDA0004163264270000031
Figure FDA0004163264270000041
Figure FDA0004163264270000051
and the primers of multiplex amplification primer set 2, anchoring a partial Illumina linker sequence, are as follows:
Figure FDA0004163264270000052
Figure FDA0004163264270000061
Figure FDA0004163264270000071
Figure FDA0004163264270000081
wherein primer number COV-1 corresponds to primers COV-1-F and COV-1-R, primer number COV-2 corresponds to primers COV-2-F and COV-2-R, and so on.
7. The method of any one of claims 1-6, wherein the virus sample is from a pharyngeal swab, a bronchoalveolar lavage fluid, or a culture supernatant isolated after viral infection of cells.
8. A kit for constructing a novel coronavirus whole-genome high-throughput sequencing library, characterized in that it comprises multiplex amplification primer set 1 and multiplex amplification primer set 2, each anchoring a partial Illumina linker sequence, and the tagged Illumina library amplification primers used in the method according to any one of claims 1 to 7, and further comprises reagents for library construction.
9. Use of the method of any one of claims 1-7 in the preparation of a kit for novel coronavirus variant detection, said use comprising:
(1) constructing a high-throughput sequencing library of the novel coronavirus whole genome to be tested according to the method of any one of claims 1-7;
(2) sequencing the high-throughput sequencing library on a sequencer after it has passed quality control;
(3) performing bioinformatics analysis and detecting mutation sites.
10. The use of claim 9, wherein step (3) comprises the following sub-steps:
1) building an index of the novel coronavirus COVID-19 reference genome MT019531.1 with BWA, and generating the .fai file with samtools faidx;
2) read quality control: filtering the paired-end reads with SOAPnuke to obtain clean reads; reads meeting any of the following conditions are removed: condition 1: reads contaminated with linker sequence; condition 2: reads with more than 10% N bases; condition 3: reads in which low-quality bases exceed 50% of the read, low quality being Q < 38;
3) alignment and sorting: aligning the clean reads to the reference genome MT019531.1 with BWA combined with samtools to generate a BAM file, the alignment parameters being "-t 32 -M"; sorting with SortSam.jar of the picard software; indexing the sorted BAM file with the samtools index tool; and performing quality control on the resulting BAM file with the Qualimap tool;
4) variant detection: detecting the SNPs and InDels of the virus with samtools pileup and VarScan; the SNP detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1"; the InDel detection parameters are: "--min-coverage 8 --min-reads2 4 --min-var-freq 0.1 --min-avg-qual 0 --p-value 1.0 --strand-filter 0 --variants --output-vcf 1";
5) finally, annotating the detected SNPs and InDels with the ANNOVAR software based on the GFF file of the MT019531.1 reference genome.
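
The three read-removal conditions of sub-step 2) and the depth and frequency thresholds of sub-step 4) can be expressed as small predicates. The sketch below is an illustration only and does not reproduce SOAPnuke or VarScan. The function names, the Phred+33 quality encoding and the adapter list are assumptions. Condition 3 is read as "more than 50% of the bases in a read have Q < 38", and the garbled parameter is read as a minimum of 4 variant-supporting reads.

```python
ADAPTERS = ("AGATGTGTATAAGAGACAG",)  # assumed adapter fragment, for illustration


def keep_read(seq: str, qual: str, adapters: tuple[str, ...] = ADAPTERS,
              max_n_frac: float = 0.10, min_q: int = 38,
              max_low_q_frac: float = 0.50, phred_offset: int = 33) -> bool:
    """Return False if the read meets any of the three removal conditions."""
    seq = seq.upper()
    if not seq:
        return False
    if any(adapter in seq for adapter in adapters):   # condition 1
        return False
    if seq.count("N") > max_n_frac * len(seq):        # condition 2
        return False
    low = sum(1 for ch in qual if ord(ch) - phred_offset < min_q)
    return low <= max_low_q_frac * len(seq)           # condition 3


def keep_pair(read1: tuple[str, str], read2: tuple[str, str]) -> bool:
    """Assumed pairing rule: keep a pair only if both mates pass."""
    return keep_read(*read1) and keep_read(*read2)


def passes_thresholds(depth: int, alt_reads: int, min_coverage: int = 8,
                      min_reads2: int = 4, min_var_freq: float = 0.1) -> bool:
    """Apply the coverage, supporting-read and frequency cut-offs to one site."""
    if depth < min_coverage or alt_reads < min_reads2:
        return False
    return alt_reads / depth >= min_var_freq


good = ("ACGT" * 5, "I" * 20)                 # all bases at Q40
low_quality = ("ACGT" * 5, "F" * 11 + "I" * 9)  # 11 of 20 bases at Q37
n_rich = ("NNN" + "ACGT" * 4 + "A", "I" * 20)   # 15% N bases

print(keep_read(*good), keep_read(*low_quality), keep_read(*n_rich))
print(keep_pair(good, low_quality))
print(passes_thresholds(depth=100, alt_reads=12), passes_thresholds(depth=6, alt_reads=5))
```

In practice the filtering and calling are done by the named tools; predicates like these are mainly useful for spot-checking their output against the stated cut-offs.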
CN202010225821.0A 2020-03-26 2020-03-26 Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction Active CN111334868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010225821.0A CN111334868B (en) 2020-03-26 2020-03-26 Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010225821.0A CN111334868B (en) 2020-03-26 2020-03-26 Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction

Publications (2)

Publication Number Publication Date
CN111334868A CN111334868A (en) 2020-06-26
CN111334868B true CN111334868B (en) 2023-05-23

Family

ID=71180448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010225821.0A Active CN111334868B (en) 2020-03-26 2020-03-26 Construction method of novel coronavirus whole genome high-throughput sequencing library and kit for library construction

Country Status (1)

Country Link
CN (1) CN111334868B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111676325A (en) * 2020-07-07 2020-09-18 云南科耀生物科技有限公司 Primer combination for detecting SARS-CoV-2 whole genome and application method
CN114067907B (en) * 2020-07-31 2022-07-08 普瑞基准生物医药(苏州)有限公司 Method for accurately identifying RNA virus genome variation
CN112063752A (en) * 2020-08-20 2020-12-11 广东省科学院动物研究所 Universal coronavirus PCR primer and application thereof
CN111996290A (en) * 2020-08-21 2020-11-27 上海交通大学医学院附属第九人民医院 SARS-CoV-2 whole genome nucleic acid amplification specific primer based on multiple PCR
CN111979353A (en) * 2020-08-25 2020-11-24 上海融享生物科技有限公司 Library construction method for sequencing novel coronavirus SARS-CoV-2 full-length genome
CN112063764A (en) * 2020-10-28 2020-12-11 江苏科德生物医药科技有限公司 Multiplex real-time fluorescent RT-PCR primer probe composition and kit for novel coronavirus nucleic acid detection
CN112102945B (en) * 2020-11-09 2021-02-05 电子科技大学 Device for predicting severe condition of COVID-19 patient
CN112359101B (en) * 2020-11-13 2023-10-03 苏州金唯智生物科技有限公司 Method for cross contamination of quality inspection oligonucleotides
CN112322788B (en) * 2020-11-24 2021-07-06 杭州杰毅生物技术有限公司 mNGS primer group and kit for detecting SARS-CoV-2
CN113337639B (en) * 2021-05-28 2022-01-25 天津金匙医学科技有限公司 Method for detecting COVID-19 based on mNGS and application thereof
WO2023003608A1 (en) * 2021-07-22 2023-01-26 Ohio State Innovation Foundation Methods of collecting and analyzing dust samples for surveillance of viral diseases
CN114038501B (en) * 2021-12-21 2022-05-27 广州金匙医学检验有限公司 Background bacterium judgment method based on machine learning
CN114672591B (en) * 2022-01-11 2022-11-01 湖北省疾病预防控制中心(湖北省预防医学科学院) Primer group and kit for identifying novel coronavirus and application of primer group and kit
CN115838836B (en) * 2022-11-14 2024-01-30 圣湘生物科技股份有限公司 Composition, kit, method and application of different types of virus joint inspection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104975081A (en) * 2015-06-01 2015-10-14 南京市妇幼保健院 Amplimers, kit and method for detecting PKD1 gene mutation
CN109371139A (en) * 2018-12-29 2019-02-22 杭州迪安医学检验中心有限公司 A kind of primer and its application being used to detect the variation of thyroid cancer pathogenic related gene based on high throughput sequencing technologies
CN110273028A (en) * 2019-06-27 2019-09-24 深圳市海普洛斯生物科技有限公司 Enrichment method, sequencing data analysis method and the device of viral integrase type DNA
CN110343783A (en) * 2019-07-08 2019-10-18 广东省公共卫生研究院 Norovirus sequencing primer, kit and detection method based on high-flux sequence
CN110387438A (en) * 2019-07-08 2019-10-29 广东省公共卫生研究院 Multi-primers, kit and method for enterovirus high-flux sequence
CN110484655A (en) * 2019-08-30 2019-11-22 中国医学科学院病原生物学研究所 The detection method of two generation of parainfluenza virus full-length genome sequencing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103484564B (en) * 2013-07-23 2014-11-19 中国医学科学院病原生物学研究所 High-sensitivity method used for detecting and identifying human coronavirus
US9909176B2 (en) * 2014-09-08 2018-03-06 The Johns Hopkins University Efficient deep sequencing and rapid genomic speciation of RNA viruses (vRNAseq)
CN108456723A (en) * 2018-03-21 2018-08-28 福州福瑞医学检验实验室有限公司 A kind of the genetic test primer and kit of endometriosis risk profile
CN110734908B (en) * 2019-11-15 2021-06-08 福州福瑞医学检验实验室有限公司 Construction method of high-throughput sequencing library and kit for library construction

Also Published As

Publication number Publication date
CN111334868A (en) 2020-06-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant