CN110364225B - Method for mining ASFV nucleic acid detection sequences using bioinformatics technology

Publication number
CN110364225B
Authority
CN
China
Prior art keywords
asfv
gene
nucleic acid
genome
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910763772.3A
Other languages
Chinese (zh)
Other versions
CN110364225A (en
Inventor
危宏平
熊东彦
张晓旭
余军平
熊进
蒋梦薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Virology of CAS
Original Assignee
Wuhan Institute of Virology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Virology of CAS
Priority to CN201910763772.3A
Publication of CN110364225A
Application granted
Publication of CN110364225B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for mining ASFV nucleic acid detection sequences using bioinformatics technology, and relates to the technical fields of bioinformatics and virus detection. The method provides three R scripts and three Perl scripts which, together with the chewBBACA software, analyze and mine all ASFV whole-genome sequences available in public databases to find conserved and specific sequences and a matrix file that retains the sequence information. According to the matrix file, the sequences are reassigned and restored to their corresponding ASFV strains and ordered from 5' to 3' along the genome. The functional gene names corresponding to the ORFs of the sequences are then obtained from the ASFV annotation information, and finally all genes and sequence information usable for ASFV nucleic acid detection are obtained.

Description

Method for mining ASFV nucleic acid detection sequences using bioinformatics technology
Technical Field
The invention relates to the technical fields of bioinformatics and virus detection, in particular to a method for mining ASFV nucleic acid detection sequences using bioinformatics technology.
Background
African swine fever virus (hereinafter ASFV) is a highly infectious virus of pigs with a high mortality rate. Since it was first reported in Kenya in 1921 it has spread widely around the world, and in 2018 it caused significant losses to China's agricultural economy. Accurate and rapid detection of ASFV therefore plays an important role in preventing infection and controlling transmission of the virus.
Nucleic acid detection offers high sensitivity and good specificity and is the mainstream method for diagnosing early ASFV infection. At present, primers and probes are designed against the p72 gene of ASFV. The ASFV genome is 171-193 kb in size, whereas the p72 gene is only 1941 bp and thus covers only a small fraction (about 1%) of the whole genome. Current research shows that the ASFV genome contains a large number of insertion and deletion mutations as well as many recombination events. Although the virus has not yet accumulated many mutations in the p72 gene, there is no guarantee that large insertions or deletions will not arise in this gene in the future. If they do, the many detection kits currently designed around p72 would no longer detect ASFV effectively. Developing a method that uses other genes of the virus as nucleic acid detection sequences, and mining a large number of such sequences as a reserve, is therefore of great significance for comprehensive detection of ASFV and control of its spread.
Nucleic acid detection of a virus requires the amplified target sequence to be both conserved and specific. Although ASFV is a DNA virus, its genome undergoes insertion and deletion mutations relatively easily, so mining different gene fragments is an effective way to keep nucleic acid detection methods diverse and durable. This requires bioinformatics methods that mine as many conserved and specific nucleic acid sequences as possible. The present invention therefore provides a bioinformatics method for mining all conserved and specific nucleic acid sequences in ASFV for use in ASFV nucleic acid detection.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a method for mining ASFV nucleic acid detection sequences using bioinformatics technology. It addresses the problem that nucleic acid detection of viruses requires conserved and specific amplification targets, while ASFV, although a DNA virus, is relatively prone to genomic insertion and deletion mutations.
(II) Technical solution
To achieve the above purpose, the invention is realized by the following technical solution: a method for mining ASFV nucleic acid detection sequences using bioinformatics technology, which specifically comprises the following steps:
S1. First, obtaining the existing ASFV genomes from the NCBI nucleic acid database, downloading each genome as a separate FASTA file, naming the files, storing all the files in one folder, and naming the folder; for example, the FASTA file of the genome with ID AM712239 is named AM712239.fa, and if an ID contains underscores, they are uniformly removed;
S2. Creating a folder named ref-genome (the name may be customized), randomly selecting two or more genome files to place in it, and taking one of them as the reference genome, wherein the selected reference genome is the ASFV genome with GenBank accession U18466.2; and analyzing these genomes with the chewBBACA software to mine their whole-genome genes;
S3. Using the chewBBACA software, calling prodigal 2.6.0 to predict the genes of 40 ASFV whole-genome sequences and calling blastp to compare them against the whole-genome genes; calculating the BLAST score ratio (BSR) and taking genes with a BSR value greater than 0.6 as alleles; using the software to perform allele calling and screen out the core genes, and outputting a matrix file containing the core gene types of all genomes and the reference genome; likewise using the software to call clustalw 2.1 and mafft v7.4.07 to perform multiple sequence alignment of the core gene sequences of all ASFV types with those of the reference genome, and outputting a folder of alignment results for each ASFV core gene type, thereby obtaining the information of all conserved core genes of ASFV;
S4. Using a written R script to read in the core gene type matrix file and the core gene representative sequence alignment files output by the chewBBACA software, and reassigning each core gene to its ASFV by traversing the matrix data and pattern matching the core genes, so as to output a total fasta file, named total.fasta, containing all core gene sequences of all ASFV;
S5. Using a written Perl script to read the total fasta file output by the R script in step S4, and distributing all the core genes of each ASFV into its own fasta file, so that each individual fasta file contains all the core genes of only one ASFV;
S6. Using a written Perl script to read in a loop the individual fasta files output by the Perl script in step S5, and sorting all the core genes of each ASFV in the 5' to 3' direction on the genome to generate sorted gene files;
S7. Using a written R script to extract all gene sequences and gene names, according to the gff file, from an ASFV reference genome with complete annotation information, and constructing a blast database from all the extracted gene nucleic acid sequences, wherein the selected reference genome is the ASFV genome with GenBank accession U18466.2;
S8. Taking the sorted gene files from step S6, running a local blast against the blast database constructed in step S7, screening out the best hits with similarity greater than 90% and length greater than 450 bp, and looking up the gene name of each best hit in the file output by step S7; using a written R script to extract the names and annotation information of these genes from the gbk file of the reference genome, so that the output file contains the gene names of all screened sequences usable for nucleic acid detection; and merging the sorted gene sequences of all ASFV into one fasta file;
S9. Separately extracting the target detection gene sequences from the reference genome, constructing a local blast database, comparing the fasta file obtained in step S8 against the constructed local blast database, screening the results with similarity greater than 90% and length greater than 450 bp, and naming the results as result.
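The BSR screening in step S3 can be illustrated with a minimal Python sketch. It assumes the usual definition of the BLAST score ratio (the alignment score of a candidate against a reference locus divided by the reference's self-alignment score) and applies the 0.6 cut-off named in the text. The function names, the tuple layout and the example scores are hypothetical and are not taken from chewBBACA or from the patent's scripts.

```python
def blast_score_ratio(alignment_score: float, self_score: float) -> float:
    """BSR = score of candidate vs. reference / score of reference vs. itself."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return alignment_score / self_score


def call_alleles(hits, self_scores, threshold=0.6):
    """Keep candidates whose BSR against a locus exceeds the threshold.

    hits: iterable of (locus_id, candidate_id, alignment_score) tuples
          (a hypothetical layout chosen for this sketch).
    self_scores: dict mapping locus_id to its self-alignment score.
    """
    alleles = {}
    for locus, candidate, score in hits:
        bsr = blast_score_ratio(score, self_scores[locus])
        if bsr > threshold:
            alleles.setdefault(locus, []).append((candidate, round(bsr, 3)))
    return alleles


# Illustrative scores only, not real ASFV data.
hits = [
    ("locus1", "cdsA", 950.0),
    ("locus1", "cdsB", 400.0),  # BSR 0.4, rejected
    ("locus2", "cdsC", 610.0),
]
self_scores = {"locus1": 1000.0, "locus2": 1000.0}
alleles = call_alleles(hits, self_scores)
print(alleles)
```

A ratio is used rather than a raw score so that one threshold can be applied to loci of very different lengths.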
Preferably, in the step S1, the file is named as genome ID number added with fa suffix, and the folder is named as genome.
Preferably, the two genomes randomly selected in step S2 are genomes with complete annotation information.
Preferably, the name of the R language script written in the step S4 is merge_all_all_alle2total_fasta.
Preferably, the Perl language script written in step S5 is named as assignment_each_sample_core_alle2each_file.
Preferably, the Perl language script written in the step S6 is named sort_each sample_sample.
Preferably, the name of the R language script written in the step S7 is extract_genesbyff.
Preferably, the name of the R language script written in the step S8 is extract_gene_name_info bygbk.
Preferably, the Perl language script written in the step S9 is named as extract_seqbyid.
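The file-naming convention of step S1 (genome ID plus the fa suffix, with underscores removed) can be sketched in Python as follows. The helper name `genome_file_name` and the ID `AB_123456` are hypothetical; only `AM712239` comes from the text.

```python
def genome_file_name(genome_id: str, suffix: str = ".fa") -> str:
    """Build the FASTA file name for a genome ID, dropping any underscores."""
    return genome_id.replace("_", "") + suffix


# AM712239 is the example given in the text; AB_123456 is a made-up ID
# used only to show the underscore removal.
print(genome_file_name("AM712239"))
print(genome_file_name("AB_123456"))
```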
(III) Beneficial effects
The invention provides a method for mining ASFV nucleic acid detection sequences using bioinformatics technology. Compared with the prior art, it has the following beneficial effects. By mining all nucleic acid detection sequences in ASFV, the method addresses the reliance of existing nucleic acid detection methods on a single target gene, p72, and provides a bioinformatics approach to mining genes for viral nucleic acid detection. A total of 52 genes usable for ASFV detection, including the p72 gene, were mined, and 4 randomly selected genes were verified to be effective for ASFV nucleic acid detection. The existing ASFV genome information is obtained from the NCBI nucleic acid database; a gene-by-gene allele calling method is used to obtain the allele matrix of the different ASFV types; the matrix information is used to restore and assign the allele nucleic acid sequences to their corresponding ASFV strains, which are then sorted in the 5' to 3' direction of the genome to build an allele library; the gbk file of a randomly selected ASFV strain genome with complete gene annotation is used to extract the names of all mined nucleic acid sequences; and finally, written scripts extract the conserved sequences from the allele library for the design of ASFV nucleic acid detection.
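The restore-and-sort idea described here (use the allele matrix to put each allele sequence back with its strain, then order the genes 5' to 3') can be sketched in Python. This is only an illustration under stated assumptions: the patent implements these steps with R and Perl scripts whose internals are not given, and the dictionary layouts, the `locus_start` coordinate table, the strain and locus names and the sequences below are all hypothetical.

```python
def restore_and_sort(matrix, allele_seqs, locus_start):
    """Assign allele sequences to strains and order them along the genome.

    matrix: {strain: {locus: allele_id}}, a simplified allele matrix.
    allele_seqs: {(locus, allele_id): sequence}.
    locus_start: {locus: start coordinate}, assumed here to give the
                 5' to 3' order of loci on the genome.
    """
    per_strain = {}
    for strain, calls in matrix.items():
        records = [(locus, allele_seqs[(locus, allele)])
                   for locus, allele in calls.items()]
        records.sort(key=lambda rec: locus_start[rec[0]])
        per_strain[strain] = records
    return per_strain


def to_fasta(strain, records):
    """Render one strain's sorted core genes as FASTA text."""
    return "".join(f">{strain}|{locus}\n{seq}\n" for locus, seq in records)


# Toy data, not real ASFV sequences.
matrix = {
    "strainA": {"locusB": 1, "locusA": 2},
    "strainB": {"locusB": 2, "locusA": 1},
}
allele_seqs = {
    ("locusA", 1): "ATGAAA", ("locusA", 2): "ATGAAG",
    ("locusB", 1): "ATGCCC", ("locusB", 2): "ATGCCT",
}
locus_start = {"locusA": 100, "locusB": 5000}

sorted_genes = restore_and_sort(matrix, allele_seqs, locus_start)
print(to_fasta("strainA", sorted_genes["strainA"]))
```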
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the result of the operation of the program of the present invention;
FIG. 3 is a table showing the names and encoded proteins of all genes that can be used for ASFV nucleic acid detection according to the present invention;
FIG. 4 is a diagram of a primer probe sequence information table according to the present invention;
FIG. 5 is a diagram showing the result of ordinary PCR amplification of the detection sequences of MGF360-15R and CP312R genes of the present invention;
FIG. 6 is a diagram showing the result of ordinary PCR amplification of the detection sequence of the E184L gene of the present invention;
FIG. 7 is a graph showing the amplification of ASFV MGF360-15R gene of the present invention;
FIG. 8 is a graph showing the amplification of ASFV CP312R gene of the present invention;
FIG. 9 is a graph showing the amplification of ASFV E184L gene of the present invention;
FIG. 10 is a graph showing the amplification of ASFV MGF505 gene of the present invention;
FIG. 11 is a graph showing qPCR results of analyzing the detection sensitivity of MGF360-15R gene of the present invention;
FIG. 12 is a graph showing qPCR results of analyzing the detection sensitivity of the CP312R gene according to the present invention;
FIG. 13 is a graph showing qPCR results of analysis of detection sensitivity of E184L gene according to the present invention;
FIG. 14 is a graph of qPCR results of the present invention for analysis of detection sensitivity of multiple copy fragments of the MGF505 gene family.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-14, an embodiment of the present invention provides a technical solution: a method for mining ASFV nucleic acid detection sequences using bioinformatics technology, which specifically comprises the following steps:
S1. First, obtaining the existing ASFV genomes from the NCBI nucleic acid database, downloading each genome as a separate FASTA file, naming the files, storing all the files in one folder, and naming the folder; for example, the FASTA file of the genome with ID AM712239 is named AM712239.fa, and if a genome ID contains underscores, they are uniformly removed; each file is named with the genome ID plus the fa suffix, and the folder is named genome.fa or may be custom named;
S2. Creating a folder named ref-genome (the name may be customized), randomly selecting two genome files to place in it, and taking one of them as the reference genome, wherein the selected reference genome is the ASFV genome with GenBank accession U18466.2 and the two randomly selected genomes have complete annotation information; and analyzing these genomes to mine their whole-genome genes;
S3. Using the chewBBACA software, calling prodigal 2.6.0 to predict genes and blastp to compare them against the whole-genome genes; calculating the BLAST score ratio (BSR) and taking genes with a BSR value greater than 0.6 as alleles; using the software to screen out the core genes and outputting a matrix file containing the core gene types of all genomes and the reference genome; using the software to call clustalw 2.1 and mafft v7.4.07 to perform multiple sequence alignment of the core genes of all ASFV types with the corresponding core sequences of the reference genome, and outputting a folder of alignment results for each ASFV core gene type, thereby obtaining the information of the conserved core genes of ASFV;
S4. Using a written R script to read in the core gene type matrix file and the core gene representative sequence alignment files output by the chewBBACA software, and reassigning each core gene to its ASFV by traversing the matrix data and pattern matching the core genes, so as to output a total fasta file containing all core gene sequences of all ASFV, wherein the written R script is named merge_all_alle2total_fasta.R;
S5. Using a written Perl script to read the total fasta file output by the R script in step S4, and distributing all the core genes of each ASFV into its own fasta file, so that each individual fasta file contains all the core genes of only one ASFV, wherein the written Perl script is named assignment_each_sample_core_alle2each_file.pl;
S6. Using a written Perl script to read in a loop the individual fasta files output in step S5, and sorting all the core genes of each ASFV in the 5' to 3' direction on the genome to obtain sorted gene files, wherein the written Perl script is named sort_each_sample_sample_sample.pl;
S7. Using a written R script to extract all gene sequences, according to the gff file, from an ASFV reference genome with complete annotation information, and constructing a blast database from the extracted gene nucleic acid sequences, wherein the written R script is named extract_genesbygff.R and the reference genome selected by the invention is the ASFV genome with GenBank accession U18466.2;
S8. Taking the sorted gene files from step S6, running a local blast against the blast database constructed in step S7, screening out the best hits with similarity greater than 90% and length greater than 450 bp, and looking up the gene name of each best hit in the file output by step S7; using a written R script to extract the names and annotation information of these genes from the gbk file of the reference genome, so that the output file contains the gene names of all screened sequences usable for nucleic acid detection, wherein the written R script is named extract_gene_name_info Bygbk.; and merging the sorted gene sequences of all ASFV into one fasta file;
S9. Extracting the target detection gene sequence from the reference genome, constructing a local blast database, comparing the fasta file obtained in step S8 against the local blast database, screening the results with similarity greater than 90% and length greater than 450 bp, and naming the results result.txt; using a written Perl script to extract all sequences listed in these results; performing multiple sequence alignment on the extracted conserved sequences; and then designing the primers and probes for nucleic acid detection, wherein the written Perl script is named extract_seqByid.pl.
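The hit screening in steps S8 and S9 (similarity greater than 90%, length greater than 450 bp, then the best result per query) can be sketched in Python. The sketch assumes BLAST's standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assumes that "best" means the highest bit score; the patent does not state either choice, and the example lines are invented.

```python
def best_hits(tabular_lines, min_identity=90.0, min_length=450):
    """Return the best passing hit per query from BLAST tabular output.

    A hit passes when percent identity > min_identity and alignment
    length > min_length; among passing hits the highest bit score wins
    (an assumption made for this sketch).
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        bitscore = float(fields[11])
        if pident > min_identity and length > min_length:
            if qseqid not in best or bitscore > best[qseqid][3]:
                best[qseqid] = (sseqid, pident, length, bitscore)
    return best


# Invented example rows in outfmt 6 column order.
example = [
    "q1\tgeneA\t98.5\t900\t13\t0\t1\t900\t1\t900\t0.0\t1500",
    "q1\tgeneB\t95.0\t600\t30\t0\t1\t600\t1\t600\t0.0\t900",
    "q2\tgeneC\t89.0\t800\t88\t0\t1\t800\t1\t800\t0.0\t700",   # identity too low
    "q3\tgeneD\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540",  # too short
]
filtered = best_hits(example)
print(filtered)
```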
ASFV detection primers and probes were designed with the method provided by the invention:
Three genes, MGF360-15R, CP312R and E184L, and a multi-copy, relatively conserved fragment of the MGF505 gene family, all present on the ASFV genome, were randomly selected as ASFV detection target genes, and primers and probes were designed with the software Beacon Designer 8.
For the MGF360-15R gene, the detection interval spans bases 112-263: the upstream primer starts at base 112 (5'-ATGGACATGATATGTCTAGAC-3'), the downstream primer starts at base 245 (5'-GCACATCATCTACTACAAG-3'), and the probe starts at base 148 (5'-(6-FAM)-CCTGCTCCTCTGGCGATGAT-(BHQ-1)-3').
For the CP312R gene, the detection interval spans bases 310-461: the upstream primer starts at base 310 (5'-GATCCCTGTTTGCAGTTC-3'), the downstream primer starts at base 441 (5'-GCTTCTTCTAACAGTTCAATA-3'), and the probe starts at base 339 (5'-(6-FAM)-AATCTCGCCGCCATTGGAAG-(BHQ-1)-3').
For the E184L gene, the detection interval spans bases 92-235: the upstream primer starts at base 92 (5'-CACCATTCTAAACCATATCTG-3'), the downstream primer starts at base 218 (5'-CACCTGAGGAGAAGAATC-3'), and the probe starts at base 191 (5'-(6-FAM)-CCTCCTTCGAGAGCCCATCTTTGA-(BHQ-1)-3').
For the MGF505 gene family, the detection interval spans bases 1-101, and degenerate primers and a probe were designed so as to detect as many copies of the MGF505 gene family as possible; the primer and probe sequence information is shown in FIG. 4.
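The stated primer positions can be cross-checked against the stated detection intervals with a short Python sketch. It assumes 1-based coordinates on the gene sequence and assumes that each "start" is the leftmost base covered by the oligonucleotide, including for the downstream primer; neither convention is stated in the text. The MGF505 set is omitted because its sequences are not legible in this description.

```python
# Intervals, start positions and sequences as given in the description.
ASSAYS = {
    "MGF360-15R": {
        "interval": (112, 263),
        "forward": (112, "ATGGACATGATATGTCTAGAC"),
        "reverse": (245, "GCACATCATCTACTACAAG"),
        "probe": (148, "CCTGCTCCTCTGGCGATGAT"),
    },
    "CP312R": {
        "interval": (310, 461),
        "forward": (310, "GATCCCTGTTTGCAGTTC"),
        "reverse": (441, "GCTTCTTCTAACAGTTCAATA"),
        "probe": (339, "AATCTCGCCGCCATTGGAAG"),
    },
    "E184L": {
        "interval": (92, 235),
        "forward": (92, "CACCATTCTAAACCATATCTG"),
        "reverse": (218, "CACCTGAGGAGAAGAATC"),
        "probe": (191, "CCTCCTTCGAGAGCCCATCTTTGA"),
    },
}


def last_base(start: int, seq: str) -> int:
    """Last base covered by an oligo, assuming 1-based leftmost start."""
    return start + len(seq) - 1


def amplicon_length(assay) -> int:
    start, end = assay["interval"]
    return end - start + 1


def is_consistent(assay) -> bool:
    """Forward primer opens the interval, reverse primer closes it,
    and the probe lies strictly between the two primers."""
    start, end = assay["interval"]
    f_start, f_seq = assay["forward"]
    r_start, r_seq = assay["reverse"]
    p_start, p_seq = assay["probe"]
    return (f_start == start
            and last_base(r_start, r_seq) == end
            and last_base(f_start, f_seq) < p_start
            and last_base(p_start, p_seq) < r_start)


lengths = {gene: amplicon_length(a) for gene, a in ASSAYS.items()}
checks = {gene: is_consistent(a) for gene, a in ASSAYS.items()}
print(lengths)
print(checks)
```

Under these assumptions the three intervals correspond to amplicons of 152, 152 and 144 bp.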
The sensitivity and specificity of the ASFV detection primers and probes were evaluated by conventional PCR and fluorescent quantitative PCR, with the following specific steps:
T1. Simulated test samples: plasmids containing the target fragments to be detected were added in different amounts to different pig whole-blood samples to simulate 10 ASFV test samples. The fragments carried on the plasmids have the sequences of the corresponding four genes of the reference genome, GenBank accession U18466.2. Samples 2, 3, 9, 10 and 11 are simulated positive samples, and samples 1, 5, 6, 7 and 8 are simulated negative samples.
T2. The QIAamp DNA Blood Mini Kit (Cat. 51104) from Qiagen was used to extract nucleic acid from the simulated ASFV test samples described above, following the instruction manual.
T3. The commercially synthesized primer and probe dry powders were each prepared at a concentration of 20 μM.
T4. Conventional PCR amplification: the total reaction volume was 25 μL, comprising 2.5 μL of 10× buffer, 2 μL of dNTPs, 0.4 μL of 20 μM upstream primer, 0.4 μL of 20 μM downstream primer, 0.5 μL of DNA polymerase and 17.2 μL of sterile water, with 2 μL of the DNA template extracted in step T2 added last. The reaction mixture was mixed well and briefly centrifuged. The reaction conditions were pre-denaturation at 95 °C for 3 min; 35 cycles of 95 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s; 72 °C for 5 min; and 12 °C for 5 min. Amplification was performed on a BioRad T100 PCR instrument. After amplification, the products were examined by agarose gel electrophoresis to verify the reaction system and the primer specificity. The specificity of the three primer pairs designed for the MGF360-15R, CP312R and E184L genes was evaluated with this conventional PCR reaction, giving the agarose gel images of the conventional PCR products in FIG. 5 and FIG. 6. In FIG. 5, which shows the conventional PCR results for the MGF360-15R and CP312R detection sequences, lanes 1-10 are the MGF360-15R results, of which 2, 3, 8, 9 and 10 are simulated positive samples and 1, 4, 5, 6 and 7 are simulated negative samples; lanes 11-20 are the CP312R results, of which 12, 13, 18, 19 and 20 are simulated positive samples and 11, 14, 15, 16 and 17 are simulated negative samples. In FIG. 6, which shows the conventional PCR results for the E184L detection sequence, lanes 21-30 are the E184L results, of which 22, 23, 28, 29 and 30 are simulated positive samples and 21, 24, 25, 26 and 27 are simulated negative samples.
T5. Fluorescent quantitative PCR amplification: a TaqMan real-time fluorescent quantitative PCR system was used with a total volume of 25 μL, comprising 12.5 μL of Takara MIX (Code No. RR390A), 0.4 μL of 20 μM upstream primer, 0.4 μL of 20 μM downstream primer, 0.4 μL of 20 μM probe and 9.3 μL of DEPC water, with 2 μL of the DNA template extracted in step T2 added last. The reaction mixture was mixed well and briefly centrifuged. The reaction conditions were pre-denaturation at 95 °C for 1 min, then reaction for 30 s at 95 °C for 40 s, and amplification was performed on a BioRad CFX96 fluorescent quantitative PCR instrument. This experiment evaluated the detection performance of the primers and probes designed for the MGF360-15R, CP312R, E184L and MGF505 genes, giving the fluorescent quantitative PCR amplification results shown in FIG. 7, FIG. 8, FIG. 9 and FIG. 10.
T6. Further evaluation of the specificity and sensitivity of the 4 primer-probe sets: (1) nucleic acids were extracted from Streptococcus suis, hemolytic streptococcus, Listeria monocytogenes, Escherichia coli, Salmonella, liver tissue and muscle tissue of ASFV-negative pigs, classical swine fever virus (CSFV), porcine reproductive and respiratory syndrome virus (PRRSV) and H7N8 avian influenza virus, and used to test the specificity of the primer-probe sets; (2) simulated positive sample No. 2 was serially diluted to determine the detection sensitivity of the four primer-probe sets. The resulting fluorescent quantitative PCR amplification results are shown in FIG. 11, FIG. 12, FIG. 13 and FIG. 14.
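As a quick arithmetic check on the reaction set-ups in steps T4 and T5, the following Python sketch sums the listed component volumes and computes the final primer concentration from the 20 μM stocks. The volumes are those given in the text; the dictionary keys and helper names are only labels chosen for this sketch.

```python
# Component volumes in microlitres, as listed in steps T4 and T5.
CONVENTIONAL_PCR_UL = {
    "10x buffer": 2.5,
    "dNTPs": 2.0,
    "upstream primer (20 uM)": 0.4,
    "downstream primer (20 uM)": 0.4,
    "DNA polymerase": 0.5,
    "sterile water": 17.2,
    "DNA template": 2.0,
}
QPCR_UL = {
    "Takara MIX (RR390A)": 12.5,
    "upstream primer (20 uM)": 0.4,
    "downstream primer (20 uM)": 0.4,
    "probe (20 uM)": 0.4,
    "DEPC water": 9.3,
    "DNA template": 2.0,
}


def total_volume_ul(mix) -> float:
    """Sum of all component volumes, rounded to avoid float noise."""
    return round(sum(mix.values()), 2)


def final_concentration_um(stock_um: float, added_ul: float, total_ul: float) -> float:
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return round(stock_um * added_ul / total_ul, 4)


pcr_total = total_volume_ul(CONVENTIONAL_PCR_UL)
qpcr_total = total_volume_ul(QPCR_UL)
primer_final = final_concentration_um(20.0, 0.4, 25.0)
print(pcr_total, qpcr_total, primer_final)
```

Both mixes add up to the stated 25 μL, and 0.4 μL of a 20 μM stock in 25 μL corresponds to a final concentration of 0.32 μM.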
Test results:
The conventional PCR results show that the three primer pairs designed for the MGF360-15R, CP312R and E184L genes accurately detect the simulated ASFV-positive samples, while the negative samples are not amplified, indicating that pig genomic DNA does not interfere with the results. The qPCR curves show that the four primer-probe sets designed for the MGF360-15R, CP312R and E184L genes and the multi-copy fragment of the MGF505 gene family detect simulated positive samples 2, 3, 9, 10 and 11, while the simulated negative samples and the negative control group are not amplified. The lowest detection limits of the four primer-probe sets are 22 copies/μL, 22 copies/μL and 2256 copies/μL respectively, indicating that the primer-probe sets are highly sensitive for their target fragments. In addition, the four primer-probe sets were tested against the nucleic acids of the bacteria, viruses and ASFV-negative pig tissues listed in step T6, and the results indicate that they are highly specific for ASFV, so the method has practical value for ASFV nucleic acid detection.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A method for mining ASFV nucleic acid detection sequences using bioinformatics technology, characterized in that the method specifically comprises the following steps:
s1, firstly, acquiring the existing ASFV genome from a NCBI nucleic acid database, downloading each genome as a single FASTA format file, naming the files, storing all the files in a folder, and naming the folder;
s2, creating a ref-genome folder, randomly selecting two or more genome files to be placed in the ref-genome folder, and analyzing and excavating all genomes in the ref-genome folder by using the chewBBACA software;
s3, using the chewBBACA software, calling a prodigal2.6.0 predicted gene for all genomes by using a Genebngeilelectalling algorithm, performing blastp comparison on all genomes, screening a BSR value based on BSR calculation, then using a gene with the BSR value being more than 0.6 as an allele, performing allelectalling by using the software, screening out a core gene, outputting a matrix file containing core gene types of all genomes, and also using the software to call clustalw2.1 and mafft v7.4.07 to perform multi-sequence comparison on core gene sequences corresponding to all types of ASFV and a reference genome, and outputting a comparison result folder containing each core gene type of ASFV, thereby obtaining information of all conservative core genes of the ASFV;
s4, reading a core gene type matrix file and a core gene representative sequence comparison file which are output by the chewbba software by using a written R language script, and reassigning each core gene of the ASFV by traversing the matrix data and pattern matching the core genes to output a total fasta file which contains all core gene sequences of all the ASFV and is named total.
S5, reading the total fasta file output by the R language script in the step S4 by using the written Perl language script, and distributing all core genes of each ASFV into one single fasta file, namely, each single fasta file only comprises all core genes of one ASFV;
s6, circularly reading independent fasta files of all the core genes of each ASFV, which are output in the step S5, by using the written Perl script, and sequencing all the core genes of each ASFV according to the direction from 5 'to 3' on a genome, thereby generating a dissolved gene file;
S7, using a written R language script to extract, according to the gff file, all gene sequences and gene names from the fully annotated ASFV reference genome that was selected and placed in the ref-genome folder, and constructing a BLAST database from all the extracted gene nucleic acid sequences;
S8, taking the sorted gene file from step S6 as the query and the BLAST database constructed in step S7 as the reference, performing a local BLAST search and screening out the best results with similarity greater than 90% and length greater than 450 bp; taking the gene names in the step S7 output file that correspond to the best results, and using a written R language script to extract the names and annotation information of those genes from the gbk file of the reference genome, the output file containing the names of all screened genes whose sequences can be used for nucleic acid detection; and merging the sorted gene sequences of all ASFV into one fasta file; in summary, a library of all ASFV nucleic acid sequences usable as nucleic acid detection sequences, together with the corresponding gene names, is thereby constructed;
S9, selecting the target gene to be detected for primer design: extracting the conserved target gene sequence separately from the reference genome and constructing a local BLAST database from it, comparing the fasta file obtained in step S8 against the local BLAST database constructed in this step, screening the results with similarity greater than 90% and length greater than 450 bp, and naming the result as result.
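The BSR screening in step S3 can be illustrated with a minimal Python sketch. It assumes the commonly used definition of the BLAST score ratio (the score of a query against a subject divided by the score of the query against itself); the function names and the example scores are hypothetical, and the sketch does not reproduce how chewBBACA computes or applies the value internally.

```python
BSR_THRESHOLD = 0.6  # threshold stated in step S3


def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Return the BLAST score ratio: hit score divided by the query's self-hit score.

    Assumption: this is the usual BSR definition, not code taken from chewBBACA.
    """
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def is_allele(hit_score: float, self_score: float,
              threshold: float = BSR_THRESHOLD) -> bool:
    """Treat a gene as an allele of the locus when its BSR is strictly above the threshold."""
    return blast_score_ratio(hit_score, self_score) > threshold


if __name__ == "__main__":
    # Hypothetical scores for two candidate genes compared with one locus.
    for name, hit, self_hit in [("candidate_1", 450.0, 500.0),
                                ("candidate_2", 250.0, 500.0)]:
        print(name, blast_score_ratio(hit, self_hit), is_allele(hit, self_hit))
```

The comparison is strict because the claim speaks of a BSR value "more than 0.6"; a value of exactly 0.6 is therefore rejected in this sketch.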
2. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: in step S1, each file is named with the genome ID number plus the fa suffix, and the folder is named genome. Fa or given a custom name.
3. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the genomes randomly selected in step S2 are genomes with complete annotation information.
4. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the R language script written in step S4 is named merge_all_allel2total_fasta.
5. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the Perl language script written in step S5 is named assignment_each_sample_core_alle2each_file.
6. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the Perl language script written in step S6 is named sort_each sample_sample.
7. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the R language script written in step S7 is named extract_genesbyfff.
8. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the R language script written in step S8 is named extract_gene_name_info bygbk.
9. The method for mining ASFV nucleic acid detection sequences according to claim 1, characterized in that: the Perl language script written in step S9 is named extract_seqbyid.pl.
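The screening criterion shared by steps S8 and S9 (similarity greater than 90% and length greater than 450 bp) can be sketched in Python as follows. This is an illustration only, not the patent's R or Perl scripts. It assumes the BLAST results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), and it assumes that the "best" result for a query is the passing hit with the highest bit score; the function name `best_hits` and the sample rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str, min_identity: float = 90.0,
              min_length: int = 450) -> dict:
    """Return, per query, the highest-bitscore hit passing both thresholds.

    Assumption: "best result" means highest bit score among hits with
    identity > min_identity and alignment length > min_length.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        if float(hit["pident"]) <= min_identity:
            continue
        if int(hit["length"]) <= min_length:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Hypothetical rows: query ID, reference gene name, identity, length, ...
    sample = (
        "core_gene_1\tB646L\t98.5\t1200\t18\t0\t1\t1200\t1\t1200\t0.0\t2100\n"
        "core_gene_2\tCP204L\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540\n"
    )
    for query, hit in best_hits(sample).items():
        print(query, hit["sseqid"], hit["pident"], hit["length"])
```

Both comparisons are strict, matching the wording "more than 90%" and "more than 450bp"; the query IDs kept by this filter are the ones a script such as extract_seqbyid.pl would then use to pull sequences.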
CN201910763772.3A 2019-08-19 2019-08-19 Method for excavating ASFV nucleic acid detection sequence by using letter generation technology Active CN110364225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910763772.3A CN110364225B (en) 2019-08-19 2019-08-19 Method for excavating ASFV nucleic acid detection sequence by using letter generation technology

Publications (2)

Publication Number Publication Date
CN110364225A (en) 2019-10-22
CN110364225B (en) 2023-08-08

Family

ID=68225216

Country Status (1)

Country Link
CN (1) CN110364225B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012079016A1 (en) * 2010-12-10 2012-06-14 Brandeis University Compositions and methods for the detection and analysis of african swine fever virus
US9474797B1 (en) * 2014-06-19 2016-10-25 The United States Of America, As Represented By The Secretary Of Agriculture African swine fever virus georgia strain adapted to efficiently grow in the vero cell line
CN107784199A (en) * 2017-10-18 2018-03-09 Kunming Institute of Botany, Chinese Academy of Sciences Organelle genome screening method based on total DNA sequencing results
CN109295255A (en) * 2018-09-18 2019-02-01 Zhang Wei Rapid nucleic acid detection method for African swine fever virus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
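Qiao Caixia; Liu Yang; Liu Yanhua; Gao Zhiqiang; Lin Zhixiong; Liu Wei; Tian Chunjian; Wang Chuanbin; Wang Qiang; Ni Jianqiang. Preparation and preliminary application of proficiency testing samples for African swine fever virus nucleic acid. Chinese Journal of Virology. 2018, (06), full text. *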

Also Published As

Publication number Publication date
CN110364225A (en) 2019-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant