CN110364225A

CN110364225A - A method for mining ASFV nucleic acid detection sequences using bioinformatics techniques

Info

Publication number: CN110364225A
Application number: CN201910763772.3A
Authority: CN
Inventors: 危宏平; 熊东彦; 张晓旭; 余军平; 熊进; 蒋梦薇
Original assignee: Wuhan Institute of Virology of CAS
Current assignee: Wuhan Institute of Virology of CAS
Priority date: 2019-08-19
Filing date: 2019-08-19
Publication date: 2019-10-22
Anticipated expiration: 2039-08-19
Also published as: CN110364225B

Abstract

The invention discloses a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques, and relates to the technical fields of bioinformatics and virus detection. The method provides three R scripts and three Perl scripts which, together with the chewBBACA software, are used to analyze and mine all ASFV whole-genome sequences available in public databases. Conserved and specific sequences are identified, and a matrix file recording the sequence information is retained. Using the matrix file, the sequences are reassigned to their corresponding ASFV genomes and sorted in the 5' to 3' direction of the genome. The names of the functional genes corresponding to the ORFs in which the sequences lie are then obtained from the ASFV annotation, finally yielding all genes and sequences that can be used for ASFV nucleic acid detection. This bioinformatics approach offers useful guidance and has high application value for mining nucleic acid detection sequences of other viruses.

Description

A method for mining ASFV nucleic acid detection sequences using bioinformatics techniques

Technical field

The present invention relates to the technical fields of bioinformatics and virus detection, and specifically to a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques.

Background art

African swine fever virus (hereinafter ASFV) is a highly contagious, acute and highly lethal virus that infects pigs. Since it first appeared in Kenya in 1921 it has spread widely around the world, and it broke out in China in 2018, causing heavy losses to the agricultural economy. Accurate and rapid detection of ASFV plays an important role in preventing infection and controlling viral transmission.

Nucleic acid detection offers high sensitivity and good specificity and is the mainstream method for diagnosing early ASFV infection. At present, almost all ASFV nucleic acid assays use primers and probes designed against the p72 gene. The ASFV genome is 171-193 kb in size, whereas the p72 gene is only 1941 bp and covers only about 1% of the whole genome. Current research shows that the ASFV genome contains a large number of insertion and deletion mutations as well as numerous recombination events. Although no large-scale mutation has yet occurred in the p72 gene, there is no guarantee that a sizeable insertion or deletion will not occur in this gene in the future. If such mutations occur, the many detection kits currently designed against p72 will no longer detect ASFV effectively. It is therefore of great significance, for comprehensive detection of ASFV and control of its spread, to develop a method that uses other viral genes as nucleic acid detection sequences and to mine as many detection sequences as possible as a reserve.

Nucleic acid detection of a virus requires the amplified target sequence to be both conserved and specific. Although ASFV is a DNA virus, its genome is relatively prone to insertion and deletion mutations, so mining fragments of different genes as detection targets is an effective way to ensure the diversity and long-term validity of nucleic acid detection methods. Mining as many conserved and specific nucleic acid sequences as possible requires bioinformatics tools. The present invention therefore uses bioinformatics techniques to develop a method that can mine all conserved and specific nucleic acid sequences in ASFV for use in ASFV nucleic acid detection.

Summary of the invention

(1) Technical problem to be solved

In view of the deficiencies of the prior art, the present invention provides a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques. It addresses the problem that nucleic acid detection of a virus requires a conserved and specific amplification target, while ASFV, although a DNA virus, is relatively prone to insertion and deletion mutations in its genome.

(2) Technical solution

To achieve the above object, the present invention adopts the following technical solution: a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques, comprising the following steps:

S1. First obtain the existing ASFV genomes from the NCBI nucleotide database, downloading each genome as a separate FASTA file. Name each file after its genome ID; for example, the FASTA file of the genome with ID AM712239 is named AM712239.fa, and any underscores in an ID are removed. Then store all the files in a single folder and name the folder;

S2. Create a separate folder, ref-genome (the name can be customized), and place two or more randomly selected genome files in it, using one of them as the reference genome. The reference genome selected in the present invention is the ASFV genome with GenBank accession U18466.2. First analyze the genomes with the chewBBACA software to mine all genes;

S3. Using the gene-by-gene allele calling algorithm of the chewBBACA software, call Prodigal 2.6.0 to predict the genes of 40 ASFV whole-genome sequences and call blastp to compare all genes, then compute the BSR (BLAST score ratio) and screen by BSR value. Genes with a BSR value greater than 0.6 are treated as alleles. The software is then used to perform allele calling and screen out the core genes, outputting a matrix file of the core gene types of all genomes, including the reference genome. The software is also used to call ClustalW 2.1 and MAFFT v7.4.07 to perform multiple sequence alignment of all types of ASFV core genes against the corresponding core gene sequences of the reference genome, outputting an alignment result file covering every ASFV core gene type, thereby obtaining information on all conserved core genes of ASFV;

S4. Use a custom R script to read in the core gene type matrix file and the representative sequence alignment files of the core genes output by chewBBACA. By traversing the matrix data and pattern-matching the core genes, reassign each core gene to its ASFV genome and output a single FASTA file, named total.fasta, containing all core gene sequences of all ASFV genomes;

S5. Use a custom Perl script to read the total FASTA file output by the R script in step S4 and write all core genes of each ASFV genome to a separate FASTA file, so that each FASTA file contains only the core genes of a single ASFV genome;

S6. Use another custom Perl script to loop over the per-genome core gene FASTA files output by the Perl script in step S5, sort all core genes of each ASFV genome in the 5' to 3' direction of the genome, and generate the sorted gene files;

S7. Use a custom R script to extract, according to the gff3 file, all gene sequences and gene names of an ASFV reference genome with complete annotation information, and build a BLAST database from all the extracted gene nucleotide sequences. The reference genome selected in the present invention is the ASFV genome with GenBank accession U18466.2;

S8. Run a local BLAST of the sorted gene files from step S6 against the BLAST database built in step S7, and keep the best hits with similarity greater than 90% and length greater than 450 bp. Using the gene names in the step S7 output file that correspond to the best hits, extract the names and annotation information of the genes concerned from the gbk file of the reference genome with a custom R script; this output file contains the gene names of all screened sequences that can be used for nucleic acid detection. Merge the sorted gene sequences of all ASFV genomes into a single FASTA file;

S9. Extract the target detection gene sequences individually from the reference genome and build a local BLAST database. Compare the FASTA file obtained in step S8 against this local BLAST database, keep the results with similarity greater than 90% and length greater than 450 bp, and name the output result.txt. Extract all the sequences with a custom Perl script; the output of this step is the mined conserved sequences. Then perform multiple sequence alignment and design primers and probes for nucleic acid detection.
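The BSR screening in step S3 can be illustrated with a short sketch. It assumes the usual definition of the BLAST score ratio (the BLAST score of a query against a subject divided by the score of the query against itself). The function names and the example scores are hypothetical and are not taken from chewBBACA or from the patent's scripts.

```python
def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Return the BLAST score ratio: hit score divided by the query's self-hit score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def passes_bsr(hit_score: float, self_score: float, threshold: float = 0.6) -> bool:
    """True when the BSR is strictly greater than the threshold (0.6 in step S3)."""
    return blast_score_ratio(hit_score, self_score) > threshold


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for hit, self_hit in [(700.0, 1000.0), (450.0, 900.0)]:
        print(hit, self_hit, blast_score_ratio(hit, self_hit), passes_bsr(hit, self_hit))
```

The strict comparison mirrors the wording "greater than 0.6"; a value of exactly 0.6 would not pass in this sketch.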

Preferably, in step S1, each file is named as the genome ID number plus the .fa suffix, and the folder is named genomes.fa or given a custom name.

Preferably, the two randomly selected genomes in step S2 are genomes with complete annotation information.

Preferably, the R script written in step S4 is named merge_all_allele2total_fasta.R.

Preferably, the Perl script written in step S5 is named assign_each_sample_core_allele2each_file.pl.

Preferably, the Perl script written in step S6 is named sort_each_sample_allele.pl.

Preferably, the R script written in step S7 is named extract_genesBygff.R.

Preferably, the R script written in step S8 is named extract_gene_name_infoBygbk.R.

Preferably, the Perl script written in step S9 is named extract_seqByid.pl.
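The file-naming rule of step S1 (genome ID with underscores removed, plus the .fa suffix) is simple enough to sketch. The helper names below are hypothetical, and the second ID in the demo is only an illustrative identifier containing an underscore.

```python
from typing import Dict, Iterable


def genome_filename(genome_id: str) -> str:
    """Build the FASTA file name for a genome ID: drop underscores, append '.fa'."""
    return genome_id.replace("_", "") + ".fa"


def plan_filenames(genome_ids: Iterable[str]) -> Dict[str, str]:
    """Map each genome ID to its file name, refusing IDs that would collide."""
    plan: Dict[str, str] = {}
    used: Dict[str, str] = {}
    for gid in genome_ids:
        name = genome_filename(gid)
        if name in used:
            raise ValueError(f"{gid} and {used[name]} both map to {name}")
        used[name] = gid
        plan[gid] = name
    return plan


if __name__ == "__main__":
    print(plan_filenames(["AM712239", "NC_001659"]))
```

The collision check is an addition of this sketch: removing underscores could in principle make two different IDs share one file name, which the text does not discuss.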

(3) Beneficial effects

The present invention provides a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques. Compared with the prior art, it has the following beneficial effects. By mining all sequences in ASFV that can serve as nucleic acid detection sequences, the method overcomes the lack of diversity of existing nucleic acid detection methods that target only the p72 gene, and provides a bioinformatics method for mining viral nucleic acid detection genes. A total of 52 genes usable for ASFV detection, including the p72 gene, were mined, and nucleic acid detection based on 4 randomly selected genes was verified to be effective for ASFV. The method first obtains the existing ASFV genome information from the NCBI nucleotide database and uses the gene-by-gene allele calling method to obtain the matrix of the different allele types of ASFV. Custom scripts then use the matrix information to reassign the allele nucleotide sequences to their corresponding ASFV strains and sort them in the 5' to 3' direction of the genome, building an allele library. Next, the gbk file of a randomly selected ASFV strain genome with complete gene annotation information is used to extract the names of all mined nucleotide sequences. Finally, a custom script extracts conserved and specific sequences from the allele library for ASFV nucleic acid detection.
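The idea of using the allele matrix to reassign sequences to each strain and then ordering them 5' to 3' can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's R and Perl scripts: the in-memory structures (a genome-by-locus matrix of allele numbers, a lookup of allele sequences, and per-genome start coordinates) and all names and toy sequences are hypothetical, and the real chewBBACA output files are not parsed here.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the matrix file and allele sequences.
Matrix = Dict[str, Dict[str, int]]        # genome -> locus -> allele number
AlleleSeqs = Dict[Tuple[str, int], str]   # (locus, allele number) -> sequence
Positions = Dict[str, Dict[str, int]]     # genome -> locus -> start coordinate


def restore_sorted_genes(matrix: Matrix, allele_seqs: AlleleSeqs,
                         start_pos: Positions) -> Dict[str, List[Tuple[str, str]]]:
    """Assign each called allele back to its genome and order loci by start coordinate."""
    restored: Dict[str, List[Tuple[str, str]]] = {}
    for genome, calls in matrix.items():
        records = []
        for locus, allele_id in calls.items():
            seq = allele_seqs[(locus, allele_id)]
            records.append((start_pos[genome][locus], locus, seq))
        records.sort()  # ascending start coordinate, i.e. 5' to 3'
        restored[genome] = [(locus, seq) for _pos, locus, seq in records]
    return restored


def to_fasta(genome: str, genes: List[Tuple[str, str]]) -> str:
    """Render one genome's ordered genes as FASTA text."""
    return "".join(f">{genome}|{locus}\n{seq}\n" for locus, seq in genes)


if __name__ == "__main__":
    matrix = {"G1": {"locusA": 1, "locusB": 2}}
    seqs = {("locusA", 1): "ATGAAA", ("locusB", 2): "ATGCCC"}
    pos = {"G1": {"locusA": 500, "locusB": 100}}
    print(to_fasta("G1", restore_sorted_genes(matrix, seqs, pos)["G1"]))
```

Keeping one ordered list per genome corresponds to the per-strain files of steps S5 and S6; concatenating the FASTA text of all genomes would give the merged file described in step S8.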

Brief description of the drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 shows the results of running the programs of the invention;

Fig. 3 is a table of the names of all genes usable for ASFV nucleic acid detection and the proteins they encode;

Fig. 4 is a table of the primer and probe sequences of the invention;

Fig. 5 shows the conventional PCR amplification results for the detection sequences of the MGF 360-15R and CP312R genes;

Fig. 6 shows the conventional PCR amplification results for the detection sequence of the E184L gene;

Fig. 7 is the amplification curve of the ASFV MGF 360-15R gene;

Fig. 8 is the amplification curve of the ASFV CP312R gene;

Fig. 9 is the amplification curve of the ASFV E184L gene;

Fig. 10 is the amplification curve of the ASFV MGF 505 gene;

Fig. 11 shows the qPCR results of the detection sensitivity analysis for the MGF 360-15R gene;

Fig. 12 shows the qPCR results of the detection sensitivity analysis for the CP312R gene;

Fig. 13 shows the qPCR results of the detection sensitivity analysis for the E184L gene;

Fig. 14 shows the qPCR results of the detection sensitivity analysis for the multicopy fragment of the MGF 505 gene family.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Referring to Figs. 1-14, an embodiment of the present invention provides the following technical solution: a method for mining ASFV nucleic acid detection sequences using bioinformatics techniques, comprising the following steps:

S1. First obtain the existing ASFV genomes from the NCBI nucleotide database, downloading each genome as a separate FASTA file. Name each file as the genome ID number plus the .fa suffix; for example, the FASTA file of the genome with ID AM712239 is named AM712239.fa, and any underscores in an ID are removed. Then store all the files in a single folder, which is named genomes.fa or given a custom name;

S2. Create a separate folder, ref-genome (the name can be customized), and place two randomly selected genome files in it, using one of them as the reference genome. The reference genome selected in the present invention is the ASFV genome with GenBank accession U18466.2, and the two randomly selected genomes are genomes with complete annotation information. Analyze the genomes to mine all genes;

S3. Using the gene-by-gene allele calling algorithm of the chewBBACA software, call Prodigal 2.6.0 to predict the genes of all genomes and call blastp to compare all genes, then compute the BSR (BLAST score ratio) and screen by BSR value. Genes with a BSR value greater than 0.6 are treated as alleles. The software is then used to perform allele calling and screen out the core genes, outputting a matrix file of the core gene types of all genomes, including the reference genome. The software is also used to call ClustalW 2.1 and MAFFT v7.4.07 to perform multiple sequence alignment of all types of ASFV core genes against the corresponding core gene sequences of the reference genome, outputting an alignment result file covering every ASFV core gene type, thereby obtaining information on the conserved core genes of ASFV;

S4. Use a custom R script, named merge_all_allele2total_fasta.R, to read in the core gene type matrix file and the representative sequence alignment files of the core genes output by chewBBACA. By traversing the matrix data and pattern-matching the core genes, reassign each core gene to its ASFV genome and output a single FASTA file, named total.fasta, containing all core gene sequences of all ASFV genomes;

S5. Use a custom Perl script, named assign_each_sample_core_allele2each_file.pl, to read the total FASTA file output by the R script in step S4 and write all core genes of each ASFV genome to a separate FASTA file, so that each FASTA file contains only the core genes of a single ASFV genome;

S6. Use another custom Perl script, named sort_each_sample_allele.pl, to loop over the per-genome core gene FASTA files output in step S5 and sort all core genes of each ASFV genome in the 5' to 3' direction of the genome, obtaining the sorted gene files;

S7. Use a custom R script, named extract_genesBygff.R, to extract, according to the gff3 file, all gene sequences of an ASFV reference genome with complete annotation information, and build a BLAST database from the extracted gene nucleotide sequences. The reference genome selected in the present invention is the ASFV genome with GenBank accession U18466.2;

S8. Run a local BLAST of the sorted gene files from step S6 against the BLAST database built in step S7, and keep the best hits with similarity greater than 90% and length greater than 450 bp. Using the gene names in the step S7 output file that correspond to the best hits, extract the names and annotation information of the genes concerned from the gbk file of the reference genome with a custom R script named extract_gene_name_infoBygbk.R; this output file contains the gene names of all screened sequences that can be used for nucleic acid detection. Merge the sorted gene sequences of all ASFV genomes into a single FASTA file;

S9. Extract the target detection gene sequences from the reference genome and build a local BLAST database. Compare the FASTA file obtained in step S8 against this local BLAST database, keep the results with similarity greater than 90% and length greater than 450 bp, and name the output result.txt. Extract all the sequences with a custom Perl script named extract_seqByid.pl; the output of this step is the mined conserved sequences. Then perform multiple sequence alignment and design primers and probes for nucleic acid detection. All conserved ASFV genes mined by the present invention that can be used for ASFV nucleic acid detection are listed in the table in Fig. 3.
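The screening criterion used in steps S8 and S9 (similarity greater than 90%, length greater than 450 bp, best hit retained) can be sketched as below. The sketch assumes the BLAST results were saved in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, ..., bit score); the patent does not state the output format, and the function name and toy rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, float, int, float]  # (subject id, % identity, alignment length, bit score)


def best_hits(lines: Iterable[str], min_identity: float = 90.0,
              min_length: int = 450) -> Dict[str, Hit]:
    """Keep, per query, the highest-bit-score hit with identity > 90% and length > 450."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length, bitscore = float(fields[2]), int(fields[3]), float(fields[11])
        if identity <= min_identity or length <= min_length:
            continue  # both thresholds are strict, matching "greater than"
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, identity, length, bitscore)
    return best


if __name__ == "__main__":
    # Toy rows for illustration only; the values are not results from the patent.
    rows = [
        "geneA\tp72\t98.5\t1200\t18\t0\t1\t1200\t1\t1200\t0.0\t2100",
        "geneB\tCP312R\t89.9\t900\t91\t0\t1\t900\t1\t900\t0.0\t1200",
    ]
    print(best_hits(rows))
```

Choosing the best hit by bit score is one reasonable reading of "optimal result"; ranking by e-value or identity would be an equally plausible interpretation.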

ASFV detection primers and probes were designed using the method proposed by the present invention:

The genes MGF 360-15R, CP312R and E184L, together with a relatively conserved fragment of the MGF 505 gene family that is present in 3 copies on the ASFV genome, were randomly selected as ASFV detection targets. Primers and probes were designed with the software Beacon Designer 8.

For the MGF 360-15R gene, the detection interval lies between bases 112 and 263 of the sequence. The forward primer starts at position 112, 5'-ATGGACATGATATGTCTAGAC-3'; the reverse primer starts at position 245, 5'-GCACATCATCTACTACAAG-3'; and the probe starts at position 148, 5'(6-FAM)-CCTGCTCCTCTGGCGATGAT-3'(BHQ-1).

For the CP312R gene, the detection interval lies between bases 310 and 461. The forward primer starts at position 310, 5'-GATCCCTGTTTGCAGTTC-3'; the reverse primer starts at position 441, 5'-GCTTCTTCTAACAGTTCAATA-3'; and the probe starts at position 339, 5'(6-FAM)-AATCTCGCCGCCATTGGAAG-3'(BHQ-1).

For the E184L gene, the detection interval lies between bases 92 and 235. The forward primer starts at position 92, 5'-CACCATTCTAAACCATATCTG-3'; the reverse primer starts at position 218, 5'-CACCTGAGGAGAAGAATC-3'; and the probe starts at position 191, 5'(6-FAM)-CCTCCTTCGAGAGCCCATCTTTGA-3'(BHQ-1).

For the MGF 505 gene, the detection interval lies between bases 1 and 101. Because the sequences within the MGF 505 gene family are not highly conserved, degenerate primers and a degenerate probe were designed so as to detect as many copies of the family as possible. The forward primer starts at position 1, 5'-TTACTRTGGRAGGGRA-3'; the reverse primer starts at position 85, 5'-TGRCAGTCYYCRATTTG-3'; and the probe starts at position 28, 5'(6-FAM)-TCYAARGCTCCTATGATGGC-3'(BHQ-1).

The primer and probe sequences used in the experiments are listed in the table in Fig. 4.
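The coordinates and the degenerate bases above can be cross-checked with a little arithmetic. The sketch below makes one assumption that the text does not spell out: the stated start position of each reverse primer is taken to be the leftmost base of its binding region on the gene sequence, so that the interval end equals start + length - 1. The degenerate codes follow the IUPAC convention (R = A or G, Y = C or T). Function and variable names are hypothetical.

```python
from itertools import product
from typing import List

# IUPAC codes needed for the primers quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# gene -> (stated interval end, reverse primer start, reverse primer sequence)
REVERSE_PRIMERS = {
    "MGF 360-15R": (263, 245, "GCACATCATCTACTACAAG"),
    "CP312R": (461, 441, "GCTTCTTCTAACAGTTCAATA"),
    "E184L": (235, 218, "CACCTGAGGAGAAGAATC"),
    "MGF 505": (101, 85, "TGRCAGTCYYCRATTTG"),
}


def interval_end(primer_start: int, primer: str) -> int:
    """Last base covered by a primer that starts at primer_start (1-based, inclusive)."""
    return primer_start + len(primer) - 1


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligo."""
    count = 1
    for base in oligo:
        count *= len(IUPAC[base])
    return count


def expand(oligo: str) -> List[str]:
    """List every non-degenerate sequence encoded by the oligo."""
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in oligo))]


if __name__ == "__main__":
    for gene, (stated_end, start, primer) in REVERSE_PRIMERS.items():
        print(gene, interval_end(start, primer), stated_end, degeneracy(primer))
```

Under this reading, the computed ends (263, 461, 235 and 101) agree with the detection intervals stated in the text. The MGF 505 forward primer, reverse primer and probe encode 8, 16 and 4 sequence variants respectively.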

The sensitivity and specificity of the ASFV detection primers and probes were evaluated by conventional PCR and fluorescent quantitative PCR. The detection procedure comprises the following steps:

T1. Simulated test samples: plasmids containing the above target fragments were added in different amounts to whole blood samples from different pigs to prepare 10 simulated ASFV test samples. The fragment sequences on the plasmids are the sequences of the above four genes in the reference genome with GenBank accession No. U18466.2. Samples 2, 3, 9, 10 and 11 are simulated positive samples, and samples 1, 5, 6, 7 and 8 are simulated negative samples.

T2. Nucleic acids were extracted from the simulated ASFV test samples with the Qiagen QIAamp DNA Blood Mini Kit (Cat. 51104) according to the kit handbook.

T3. The primer and probe dry powders supplied by the synthesis company were each dissolved to a concentration of 20 μM.

T4. Conventional PCR amplification: the total reaction volume is 25 μL, comprising 2.5 μL 10× buffer, 2 μL dNTPs, 0.4 μL of 20 μM forward primer, 0.4 μL of 20 μM reverse primer, 0.5 μL DNA polymerase and 17.2 μL sterile water, with 2 μL of the DNA template extracted in step T2 added last. The reaction mixture is mixed and briefly centrifuged. The reaction conditions are initial denaturation at 95 °C for 3 min; 35 cycles of 95 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min and 12 °C for 5 min. Amplification is performed on a BioRad T100 PCR instrument. After amplification, the products are examined by agarose gel electrophoresis to verify the reaction system and the primer specificity. In this experiment, conventional PCR was used to evaluate the specificity of the three primer pairs designed for the MGF 360-15R, CP312R and E184L genes, giving the agarose gel electrophoresis results shown in Fig. 5 and Fig. 6. In Fig. 5 (detection sequences of the MGF 360-15R and CP312R genes), 1-10 are the amplification results for MGF 360-15R, of which 2, 3, 8, 9 and 10 are simulated positive samples and 1, 4, 5, 6 and 7 are simulated negative samples; 11-20 are the amplification results for CP312R, of which 12, 13, 18, 19 and 20 are simulated positive samples and 11, 14, 15, 16 and 17 are simulated negative samples. In Fig. 6 (detection sequence of the E184L gene), 21-30 are the amplification results for E184L, of which 22, 23, 28, 29 and 30 are simulated positive samples and 21, 24, 25, 26 and 27 are simulated negative samples.

T5. Fluorescent quantitative PCR: a TaqMan real-time fluorescent quantitative PCR system is used with a total volume of 25 μL, comprising 12.5 μL Takara MIX (Code No. RR390A), 0.4 μL of 20 μM forward primer, 0.4 μL of 20 μM reverse primer, 0.4 μL of 20 μM probe and 9.3 μL DEPC water, with 2 μL of the DNA template extracted in step T2 added last. The reaction mixture is mixed and briefly centrifuged. The reaction conditions are initial denaturation at 95 °C for 1 min, followed by 40 cycles of 95 °C for 10 s and 58 °C for 30 s, on a BioRad CFX96 fluorescent quantitative PCR instrument. This experiment evaluated the detection performance of the primers and probes designed for the MGF 360-15R, CP312R, E184L and MGF 505 genes, giving the fluorescent quantitative PCR amplification results shown in Fig. 7, Fig. 8, Fig. 9 and Fig. 10.

T6. Further evaluation of the specificity and sensitivity of the 4 primer-probe sets: (1) nucleic acids were extracted from Streptococcus suis, hemolytic streptococcus, Listeria monocytogenes, Escherichia coli, Salmonella, liver and muscle tissue of ASFV-negative pigs, classical swine fever virus (CSFV), porcine reproductive and respiratory syndrome virus (PRRSV) and H7N8 avian influenza virus to test the specificity of the primers and probes; (2) simulated positive sample No. 2 was serially diluted to determine the detection sensitivity of the four primer-probe sets. The resulting fluorescent quantitative PCR results are shown in Fig. 11, Fig. 12, Fig. 13 and Fig. 14.
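The reaction mixes in steps T4 and T5 can be checked with simple arithmetic: the component volumes should add up to the stated 25 μL, and the final primer concentration follows from the dilution C_final = C_stock × V_added / V_total. The sketch below uses only the volumes and the 20 μM stock concentration given in the text; the dictionary and function names are hypothetical.

```python
import math
from typing import Dict

# Component volumes in microlitres, copied from steps T4 and T5.
CONVENTIONAL_PCR_UL: Dict[str, float] = {
    "10x buffer": 2.5, "dNTPs": 2.0, "forward primer": 0.4, "reverse primer": 0.4,
    "DNA polymerase": 0.5, "sterile water": 17.2, "DNA template": 2.0,
}
QPCR_UL: Dict[str, float] = {
    "Takara MIX": 12.5, "forward primer": 0.4, "reverse primer": 0.4,
    "probe": 0.4, "DEPC water": 9.3, "DNA template": 2.0,
}


def total_volume(components: Dict[str, float]) -> float:
    """Sum of all component volumes in microlitres."""
    return sum(components.values())


def final_concentration(stock_um: float, added_ul: float, total_ul: float) -> float:
    """Final concentration (uM) after diluting added_ul of stock into total_ul."""
    return stock_um * added_ul / total_ul


if __name__ == "__main__":
    for name, mix in (("conventional PCR", CONVENTIONAL_PCR_UL), ("qPCR", QPCR_UL)):
        total = total_volume(mix)
        primer = final_concentration(20.0, mix["forward primer"], total)
        print(f"{name}: total {total:.1f} uL, primer {primer:.2f} uM")
```

Both mixes sum to 25 μL, and 0.4 μL of a 20 μM stock in 25 μL corresponds to a final concentration of 0.32 μM for each primer and for the probe.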

Test results:

The conventional PCR results show that the three primer pairs designed for the MGF 360-15R, CP312R and E184L genes all accurately detected the simulated ASFV-positive samples, while the negative samples showed no amplification, indicating that pig genes do not interfere with the results. The qPCR curves for the MGF 360-15R, CP312R and E184L genes and for the multicopy fragment of the MGF 505 gene family show that the designed primers and probes detected simulated positive samples 2, 3, 9, 10 and 11, with no amplification in the simulated negative samples or the negative control group. The minimum detection limits of the four primer-probe sets designed for the MGF 360-15R, CP312R and E184L genes and the multicopy fragment of the MGF 505 gene family, measured on simulated positive sample No. 2, were 22 copies/μL, 22 copies/μL, 22 copies/μL and 2256 copies/μL respectively, indicating that these four primer-probe sets are highly sensitive for their target fragments. In addition, qPCR with the four primer-probe sets on Streptococcus suis, hemolytic streptococcus, Listeria monocytogenes, Escherichia coli, Salmonella, liver and muscle tissue of African swine fever-negative pigs, classical swine fever virus (CSFV), porcine reproductive and respiratory syndrome virus (PRRSV) and H7N8 avian influenza virus gave no positive signal, showing high specificity. The method proposed by the present invention therefore has high practical value for ASFV detection. It is also not limited to ASFV: the same method can be used to mine all candidate genes usable for nucleic acid detection in other viruses.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device.

Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A method for mining ASFV nucleic acid detection sequences using bioinformatics techniques, characterized by comprising the following steps:

S1. First obtain the existing ASFV genomes from the NCBI nucleotide database, downloading each genome as a separate FASTA file; name each file, then store all the files in a single folder and name the folder;

S2. Create a separate ref-genome folder and place two or more randomly selected genome files in it; first analyze the genomes in the ref-genome folder with the chewBBACA software to mine all genes;

S3. Using the gene-by-gene allele calling algorithm of the chewBBACA software, call Prodigal 2.6.0 to predict the genes of all genomes and call blastp to compare all genes, then compute the BSR and screen by BSR value; genes with a BSR value greater than 0.6 are treated as alleles; the software is then used to perform allele calling and screen out the core genes, outputting a matrix file of the core gene types of all genomes; the software is also used to call ClustalW 2.1 and MAFFT v7.4.07 to perform multiple sequence alignment of all types of ASFV core genes against the corresponding core gene sequences of the reference genome, outputting an alignment result file covering every ASFV core gene type, thereby obtaining information on all conserved core genes of ASFV;

S4. Use a custom R script to read in the core gene type matrix file and the representative sequence alignment files of the core genes output by chewBBACA; by traversing the matrix data and pattern-matching the core genes, reassign each core gene to its ASFV genome and output a single FASTA file, named total.fasta, containing all core gene sequences of all ASFV genomes;

S5. Use a custom Perl script to read the total FASTA file output by the R script in step S4 and write all core genes of each ASFV genome to a separate FASTA file, so that each FASTA file contains only the core genes of a single ASFV genome;

S6. Use another custom Perl script to loop over the per-genome core gene FASTA files output in step S5 and sort all core genes of each ASFV genome in the 5' to 3' direction of the genome; this step generates the sorted gene files;

S7. Use a custom R script to extract, according to the gff3 file, all gene sequences and gene names of the one ASFV reference genome with complete annotation information that was selected and placed in the ref-genome folder, and build a BLAST database from all the extracted gene nucleotide sequences;

S8. Run a local BLAST of the sorted gene files from step S6 against the BLAST database built in step S7, and keep the best hits with similarity greater than 90% and length greater than 450 bp; using the gene names in the step S7 output file that correspond to the best hits, extract the names and annotation information of the genes concerned from the gbk file of the reference genome with a custom R script; this output file contains the gene names of all screened sequences that can be used for nucleic acid detection. Merge the sorted gene sequences of all ASFV genomes into a single FASTA file. At this point, an ASFV nucleic acid sequence library of all sequences usable as nucleic acid detection sequences (the FASTA file of merged, sorted gene sequences generated in S8) and the corresponding gene names (the S8 output containing all screened gene names usable as nucleic acid detection sequences) have been built.

S9. To extract the nucleic acid sequences to be detected and design primers, extract the conserved target detection gene sequences individually from the reference genome and build a local BLAST database; compare the FASTA file obtained in step S8 against the local BLAST database built in this step, keep the results with similarity greater than 90% and length greater than 450 bp, and name the output result.txt; extract all the sequences with a custom Perl script, the output of this step being the mined conserved sequences to be detected; then perform multiple sequence alignment and design primers and probes for nucleic acid detection.

2. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: in step S1, each file is named as the genome ID number plus the .fa suffix, and the folder is named genomes.fa or given a custom name.

3. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the randomly selected genomes in step S2 are genomes with complete annotation information.

4. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the R script written in step S4 is named merge_all_allele2total_fasta.R.

5. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the Perl script written in step S5 is named assign_each_sample_core_allele2each_file.pl.

6. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the Perl script written in step S6 is named sort_each_sample_allele.pl.

7. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the R script written in step S7 is named extract_genesBygff.R.

8. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the R script written in step S8 is named extract_gene_name_infoBygbk.R.

9. The method for mining ASFV nucleic acid detection sequences using bioinformatics techniques according to claim 1, characterized in that: the Perl script written in step S9 is named extract_seqByid.pl.