CN110310699A - Analysis tool for mining target gene sequences based on whole-genome sequences, and application thereof
- Publication number
- CN110310699A (application CN201910586422.4A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- target gene
- file
- sample
- gene sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to an analysis tool, written in Perl and run under Linux, for mining target gene sequences based on whole-genome sequences, together with a detection and analysis method that uses it and its application. Working at the whole-genome level, it uses the complete genome sequences of multiple parental materials to analyze the variant sites and variation types of a target gene and to obtain the homologous sequences of the target gene in the parental materials. The analysis tool and method automatically perform the target-interval search, the sequence alignment, and the analysis of functional variation types, require no genome annotation from any other species as a reference, and are therefore highly versatile. They support the analysis of 2,000 parental genomes and can be widely applied to target gene sequence analysis in crop genomes, providing a simple and efficient sequence polymorphism analysis tool and strategy for molecular breeding.
Description
Technical field
The present invention relates to the creation of an analysis tool for mining target gene sequences based on whole-genome sequences, and to a method of using it to mine and analyze target gene sequences in whole-genome sequences. The method, and the analysis tool EXGE1.0 created for it, are mainly applied to target gene sequence analysis in crop genomes.
Background art
In recent years, as sequencing technology has steadily advanced, sequencing throughput has risen while sequencing cost has fallen. Obtaining the genome sequence of a material by genome sequencing and then finding the variation types of a target gene in that sequence has become a basic strategy in the molecular breeding and improvement of animals and plants. However, with the sharp increase in sample numbers and the accumulation of large amounts of genome sequencing data, quickly finding the functional genotypes and variant-site information of a target gene within massive data sets has become a key factor limiting genome-based breeding improvement. When handling large numbers of genome sequences, traditional analysis tools suffer from complicated procedures, high work intensity, and a heavy workload. Providing an automated analysis tool for target genes based on whole-genome sequences is therefore an effective approach.
Summary of the invention
The technical problem solved by the invention is to provide an analysis tool for mining target gene sequences based on whole-genome sequences that automatically analyzes the variation types of a target gene sequence at the whole-genome level, requires no genome annotation from any other species as a reference, and has good versatility.
The technical solution for achieving the object of the invention is as follows:
An analysis tool for mining target gene sequences based on whole-genome sequences, comprising:
Parameters: -i: target gene sequence file name; -g: file name of the path text file of the target genome set; -e: filtering threshold; -d: genomic interval to be examined; -o: output file name. Command line 1: the -g file lists one genome path per line. Command line 2: -d specifies the chromosome number and physical position. Command line 3: -i is a fasta-format file, which must be stored in the same folder as the genomes to be examined. The EXEG.pl script is executed with the perl command, carrying the parameters -i, -g, -e, -d, and -o.
A detection and analysis method using the above analysis tool for mining target gene sequences based on whole-genome sequences, comprising the following steps:
Step 1: install the BioPerl software package and the sequence alignment program BLAST+ on a computer running the Linux operating system;
Step 2: extract sample genomic DNA, perform library construction and sequencing to obtain the sample genome sequence, and convert it to a fasta-format file to obtain the sample genome sequence file;
Step 3: write the names of the sample genome sequences, in order, into the path text file g of the target genome set, whose format is one sample genome path per line; set the genomic interval to be examined, d, expressed as chromosome number:physical distance; set the filtering threshold e;
Step 4: place the sample genome sequence files and the target gene sequence file i to be examined in the same destination folder, where the target gene sequence file i is a fasta-format file; also place the script package of the analysis tool of claim 1 and the path text file g of the target genome set in the same destination folder;
Step 5: run the analysis tool for mining target gene sequences based on whole-genome sequences, which outputs the insertion or deletion mutation site information and the SNP mutation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set. The insertion or deletion mutation site information includes the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence. The SNP mutation information includes the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base type in the target gene sequence, the nucleotide variation type in the sample genome sequence, and whether the mutation is synonymous or nonsynonymous. The BLAST alignment results of the target gene sequence against the genome set include the homologous sequences of the target gene sequence in the sample genomes.
Use of the above detection and analysis method in sequence detection and analysis after genome sequencing of rice and other crops.
Compared with the prior art, the invention, by adopting the above technical solution, has the following technical effects:
1. The invention automatically analyzes the variation types of a target gene sequence at the whole-genome level, requires no genome annotation from any other species as a reference, has good versatility, and is suitable for use on an ordinary personal computer.
2. The invention automatically completes the target-interval search, the sequence alignment, and the analysis of functional variation types with no manual intervention at any stage; the summary tables of variation types and aligned sequences that it finally generates are convenient for the user's subsequent analysis.
3. The invention supports the analysis of up to 2,000 complete genome sequences (each genome 430 Mb) and provides a standardized output data format, making it easy for users to call third-party tools to reprocess the analysis data.
Brief description of the drawings
Fig. 1 is the fasta-format file of a sample genome sequence;
Fig. 2 is the fasta-format file of the target gene sequence;
Fig. 3 is the insertion or deletion mutation output;
Fig. 4 is the explanation of the insertion or deletion mutation output;
Fig. 5 is the SNP mutation output;
Fig. 6 is the explanation of the SNP mutation output;
Fig. 7 is the BLAST alignment result of the target gene sequence against the genome set;
Fig. 8 is the CDS sequence of the rice blast resistance gene Piz-t;
Fig. 9 shows the variation types of the Piz-t resistance gene in the sequenced parental materials, as analyzed with the analysis tool for mining target gene sequences based on whole-genome sequences, and the corresponding disease-resistance phenotypes; A shows the Piz-t haplotypes and variant sites, and B shows the relationship between haplotype and resistant or susceptible phenotype.
Specific embodiments
Embodiments of the invention are described in detail below, with examples shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
An analysis tool for mining target gene sequences based on whole-genome sequences, comprising:
Parameters: -i: target gene sequence file name; -g: file name of the path text file of the target genome set; -e: filtering threshold; -d: genomic interval to be examined; -o: output file name;
Command line 1: the -g file lists one genome path per line;
Command line 2: -d specifies the chromosome number and physical position;
Command line 3: -i is a fasta-format file, which must be stored in the same folder as the genomes to be examined;
The EXEG.pl script is executed with the perl command, carrying the parameters -i, -g, -e, -d, and -o.
A detection and analysis method using the above analysis tool for mining target gene sequences based on whole-genome sequences, comprising the following steps:
Step 1: install the BioPerl software package and the sequence alignment program BLAST+ on a computer running the Linux operating system;
Step 2: extract sample genomic DNA and perform library construction and sequencing to obtain the sample genome sequence; align the reads measured for each sample to the reference genome with the analysis tool BWA to generate a BAM-format file, then convert the BAM-format file to a fasta-format file with the samtools software to obtain the sample genome sequence file;
Step 3: write the names of the sample genome sequences, in order, into the path text file g of the target genome set, whose format is one sample genome path per line, for example ~/Msuv7.fa; 2,000 genomes therefore occupy 2,000 lines;
Set the genomic interval to be examined, d, expressed as chromosome number:physical distance; for example, chromosome number Chr01 together with the physical distance is written Chr01:1-1000. Note that the chromosome number on the command line must match the numbering used in the sample genome sequences to be examined;
Set the filtering threshold e; e defaults to 10^-10, and the threshold can be adjusted as needed;
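The two inputs prepared in Step 3 can be illustrated with a minimal Python sketch. It is not part of the disclosed Perl tool: the function names `parse_region` and `write_path_list` are hypothetical, and the sketch assumes that the interval string has the form chromosome:start-end (thousands separators tolerated) and that the path file is plain text with one path per line.

```python
import re
from pathlib import Path

# Assumed shape of the -d value: <chromosome>:<start>-<end>, e.g. Chr01:1-1000.
REGION_RE = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def parse_region(region):
    """Split a string such as 'Chr01:1-1000' into (chromosome, start, end)."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"expected 'chromosome:start-end', got {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return chrom, start, end


def write_path_list(genome_paths, list_file):
    """Write one sample genome path per line (the layout described for -g)."""
    lines = [str(path) for path in genome_paths]
    Path(list_file).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return len(lines)
```

For example, `parse_region("Chr01:1-1000")` yields `("Chr01", 1, 1000)`; the returned chromosome name can then be compared with the sequence names in the sample genome files, since the two must match.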
Step 4: place the sample genome sequence files and the target gene sequence file i to be examined in the same destination folder, where the target gene sequence file i is a fasta-format file;
Also place the script package of the analysis tool of claim 1 for mining target gene sequences based on whole-genome sequences, and the path text file g of the target genome set, in the same destination folder;
Step 5: run the analysis tool for mining target gene sequences based on whole-genome sequences, which outputs the insertion or deletion mutation site information and the SNP mutation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set;
The insertion or deletion mutation site information includes the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence;
The SNP mutation information includes the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base type in the target gene sequence, the nucleotide variation type in the sample genome sequence, and whether the mutation is synonymous or nonsynonymous;
The BLAST alignment results of the target gene sequence against the genome set include the homologous sequences of the target gene sequence in the sample genomes.
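The synonymous or nonsynonymous label in the SNP output can be illustrated as follows. This is a minimal Python sketch, not the tool's own Perl code: it assumes the target gene sequence is an in-frame CDS read with the standard genetic code, and the function name `classify_snp` is hypothetical.

```python
# Standard genetic code, enumerated in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, position, alt_base):
    """Classify a SNP at 1-based `position` of an in-frame CDS.

    Returns (kind, reference_amino_acid, variant_amino_acid), where kind is
    'synonymous' or 'nonsynonymous'.
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if position < 1 or position > len(cds):
        raise ValueError("position lies outside the CDS")
    codon_start = (position - 1) // 3 * 3   # 0-based start of the affected codon
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = (position - 1) % 3             # index of the SNP within the codon
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa
```

With the toy CDS `ATGGCTTAA`, a T to C change at position 6 turns GCT into GCC, both alanine, so it is reported as synonymous; a G to A change at position 4 turns GCT into ACT (threonine) and is reported as nonsynonymous.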
The above detection and analysis method can be used for sequence detection and analysis after genome sequencing of rice and other crops.
Embodiment 1
(1) Operating environment requirements
Hardware requirements: a CPU with 4 or more cores, 16 GB or more of memory, and a hard disk of 1000 GB or more. Software requirements: the Linux operating system, with Perl 5.10 or later installed.
(2) Mining of a disease-resistance gene in parental materials
1. Test materials
199 advanced-generation, stable rice samples.
2. DNA extraction followed the method of Temnykh et al. (2000), and genomic DNA was extracted from each individual plant. After extraction, libraries were constructed and the genomes sequenced to a depth of 20×. Reads in the raw data were filtered out if more than 50% of their bases had a quality value below 5 or if they showed adapter contamination. Based on the genomic DNA sequencing data, the reads obtained for each sample were aligned to the reference genome (IRGSP-1.0) with the free analysis tool BWA to generate BAM-format files, which were then converted to fasta-format files with the samtools software. To improve the reliability of sequence extraction, the quality-control parameters were set as follows: the mapping quality at each site is greater than 20, the variant quality is greater than 50, and each base is supported by at least 3 reads.
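The three quality-control thresholds above can be expressed as a single predicate. The Python sketch below is illustrative only: the record type `SiteCall` and its field names are hypothetical, and it assumes that all three values are available for every site being extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    """One extracted site; the field names are illustrative assumptions."""
    chrom: str
    pos: int
    mapping_quality: float
    variant_quality: float
    supporting_reads: int


def passes_quality_control(site, min_mapq=20, min_variant_qual=50, min_reads=3):
    """Apply the stated thresholds: mapping quality > 20, variant quality > 50,
    and support from at least 3 reads."""
    return (
        site.mapping_quality > min_mapq
        and site.variant_quality > min_variant_qual
        and site.supporting_reads >= min_reads
    )
```

The two quality comparisons are strict ("greater than") while the read-support comparison is inclusive ("at least 3"), following the wording of the parameters.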
3. The parental genome sequences obtained above (sequence content as in Fig. 1) and the target gene sequence to be examined (as in Fig. 2) are stored in the same folder. The data files handled by this script are all fasta-format files: the sequence description line begins with ">" and occupies exactly one line, and the first field after ">" must not be repeated within the file. The sequence content follows the description line and may be stored across several consecutive lines.
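The fasta conventions just described (one ">" description line per record, a unique first field, sequence content on one or more following lines) can be checked with a short reader. This Python sketch is an illustration, not the script's own parser; the function name `read_fasta` is hypothetical.

```python
def read_fasta(lines):
    """Parse fasta text lines into {first_field: sequence}.

    Raises ValueError if the first field of a description line is repeated,
    or if sequence data appears before any description line.
    """
    records = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("description line has no identifier")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate sequence identifier: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            records[current_id].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}
```

Because the function takes any iterable of lines, an open file handle can be passed directly, for example `read_fasta(open(path, encoding="utf-8"))` for a given `path`.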
In this embodiment the gene sequence of the rice blast resistance gene Piz-t is used as the target gene sequence; its content is shown in Fig. 8.
4. Create a text file direct.txt in the above folder and enter the names of the sample genomes in it, for example ~/199_1.fa through ~/199_199.fa.
5. The target gene sequence file is Piz-t.fasta; its sequence content is shown in Fig. 8.
6. Run the script EXGE.pl with the command line perl EXGE.pl -i Piz-t.fa -g direct.txt -e -10 -d Chr06_consensus:10,000,00-12,000,000 -o Piz-t_result; for the script package, see the source code.
7. After the script finishes, three result files are output: the SNP variants of the target gene sequence in the corresponding genomes (as in Fig. 5), the indel (insertion or deletion) variant-site information (as in Fig. 3), and the BLAST alignment results (as in Fig. 7). The SNP results are explained in Fig. 6 and the indel results in Fig. 4. In the BLAST alignment result of Fig. 7, >199_17_Chr11_consensus_27982787-27983057 indicates that the sequence in the interval 27982787 to 27983057 of chromosome 11 of sample '199_17' is highly homologous to the target gene; POS indicates that positions 94 to 364 of the target gene sequence share up to 100% sequence homology with the sample genome.
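A hit name of the kind shown above can be split back into its parts for downstream processing. The Python sketch below assumes the layout sample_chromosome_start-end and that chromosome names begin with "Chr", as in the single example given; both the pattern and the names `HitLocus` and `parse_hit_header` are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class HitLocus(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int


# Assumed layout: >{sample}_{chromosome}_{start}-{end}, chromosome starting with "Chr".
HIT_RE = re.compile(r"^>?(?P<sample>.+?)_(?P<chrom>Chr\S*)_(?P<start>\d+)-(?P<end>\d+)$")


def parse_hit_header(header):
    """Split e.g. '>199_17_Chr11_consensus_27982787-27983057' into its parts."""
    match = HIT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised hit header: {header!r}")
    return HitLocus(
        match["sample"], match["chrom"], int(match["start"]), int(match["end"])
    )
```

Applied to the example header, this returns sample `199_17`, chromosome `Chr11_consensus`, and the interval 27982787 to 27983057, which spans 271 bases, the same length as positions 94 to 364 of the target gene sequence.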
8. Based on the insertions, deletions, and nonsynonymous substitutions of the resistance gene Piz-t found in the parental materials, the variation types were divided into 13 haplotypes (named Hap1 to Hap13), of which the Hap1 sequence is 100% identical to the resistant Piz-t sequence. As shown in Fig. 9A, NO. gives the number of parental materials carrying the haplotype, '-' indicates a 1 bp deletion at the site, and '--' indicates a 2 bp deletion at the site. The rice blast resistance of the parental materials of the 13 Hap types was assessed with the rice blast pathogen strain '83-14'. As shown in Fig. 9B (R: resistant phenotype; S: susceptible phenotype), materials of type Hap1 were resistant to leaf blast, whereas the other types, Hap2 to Hap13, were susceptible. The resistance genotypes identified in the parental materials with the EXGE1.0 script are therefore consistent with the phenotypes observed after inoculation.
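The grouping of materials into haplotypes can be sketched as follows. This is an illustrative Python reconstruction, not the procedure actually used to obtain Hap1 to Hap13: it assumes each sample has already been reduced to its set of insertion, deletion, and nonsynonymous variants, written as (position, reference, variant) tuples, and the naming order chosen here (reference-identical profile first, then larger groups) is an assumption.

```python
def assign_haplotypes(sample_variants):
    """Group samples that share an identical variant profile.

    sample_variants maps sample name -> iterable of (position, ref, alt) tuples.
    Returns {"Hap1": {"variants": [...], "samples": [...]}, ...}.
    """
    profiles = {}
    for sample, variants in sample_variants.items():
        profile = tuple(sorted(set(variants)))  # order-independent profile
        profiles.setdefault(profile, []).append(sample)
    # The variant-free (reference-identical) profile, if present, becomes Hap1;
    # the remaining profiles are ordered by group size, then by content.
    ordered = sorted(
        profiles.items(),
        key=lambda item: (item[0] != (), -len(item[1]), item[0]),
    )
    return {
        f"Hap{index}": {"variants": list(profile), "samples": sorted(samples)}
        for index, (profile, samples) in enumerate(ordered, start=1)
    }
```

The number of samples in each group corresponds to the NO. column described for Fig. 9A, and the resulting groups can then be compared with the resistant or susceptible phenotype of each material.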
The source code of the analysis tool for mining target gene sequences based on whole-genome sequences is as follows:
The above are only some embodiments of the invention. It should be noted that those of ordinary skill in the art may make several improvements without departing from the principle of the invention, and such improvements shall also be regarded as falling within the scope of protection of the invention.
Claims (5)
1. An analysis tool for mining target gene sequences based on whole-genome sequences, characterized by comprising:
Parameters: -i: target gene sequence file name; -g: file name of the path text file of the target genome set; -e: filtering threshold; -d: genomic interval to be examined; -o: output file name;
Command line 1: the -g file lists one genome path per line;
Command line 2: -d specifies the chromosome number and physical position;
Command line 3: -i is a fasta-format file, which must be stored in the same folder as the genomes to be examined;
The EXEG.pl script is executed with the perl command, carrying the parameters -i, -g, -e, -d, and -o.
2. A detection and analysis method using the analysis tool of claim 1 for mining target gene sequences based on whole-genome sequences, characterized by comprising the following steps:
Step 1: install the BioPerl software package and the sequence alignment program BLAST+ on a computer running the Linux operating system;
Step 2: extract sample genomic DNA, perform library construction and sequencing to obtain the sample genome sequence, and convert it to a fasta-format file to obtain the sample genome sequence file;
Step 3: write the names of the sample genome sequences, in order, into the path text file g of the target genome set, whose format is one sample genome path per line;
Set the genomic interval to be examined, d, expressed as chromosome number:physical distance;
Set the filtering threshold e;
Step 4: place the sample genome sequence files and the target gene sequence file i to be examined in the same destination folder, where the target gene sequence file i is a fasta-format file;
Also place the script package of the analysis tool of claim 1 for mining target gene sequences based on whole-genome sequences, and the path text file g of the target genome set, in the same destination folder;
Step 5: run the analysis tool for mining target gene sequences based on whole-genome sequences, which outputs the insertion or deletion mutation site information and the SNP mutation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set;
The insertion or deletion mutation site information includes the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence;
The SNP mutation information includes the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base type in the target gene sequence, the nucleotide variation type in the sample genome sequence, and whether the mutation is synonymous or nonsynonymous;
The BLAST alignment results of the target gene sequence against the genome set include the homologous sequences of the target gene sequence in the sample genomes.
3. The detection and analysis method according to claim 2 using the analysis tool for mining target gene sequences based on whole-genome sequences, characterized in that the conversion of the sample genome sequence to a fasta-format file in step 2 is specifically: the reads measured for each sample are aligned to the reference genome with the analysis tool BWA to generate a BAM-format file, and the BAM-format file is then converted to a fasta-format file with the samtools software.
4. The detection and analysis method according to claim 2 using the analysis tool for mining target gene sequences based on whole-genome sequences, characterized in that the filtering threshold e is 10^-10.
5. Use of the detection and analysis method using the analysis tool for mining target gene sequences based on whole-genome sequences in sequence detection and analysis after genome sequencing of rice and other crops.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910586422.4A CN110310699A (en) | 2019-07-01 | 2019-07-01 | Analysis tool for mining target gene sequences based on whole-genome sequences, and application thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910586422.4A CN110310699A (en) | 2019-07-01 | 2019-07-01 | Analysis tool for mining target gene sequences based on whole-genome sequences, and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110310699A true CN110310699A (en) | 2019-10-08 |
Family
ID=68078708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910586422.4A Pending CN110310699A (en) | Analysis tool for mining target gene sequences based on whole-genome sequences, and application thereof | 2019-07-01 | 2019-07-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310699A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112037847A (en) * | 2020-09-15 | 2020-12-04 | 中国科学院微生物研究所 | Microbial strain genome analysis method and device and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1733915A (en) * | 2005-07-19 | 2006-02-15 | 浙江大学 | False gene data bank construction method of rice genome |
CN104462211A (en) * | 2014-11-04 | 2015-03-25 | 北京诺禾致源生物信息科技有限公司 | Re-sequencing data processing method and processing device |
CN105426700A (en) * | 2015-12-18 | 2016-03-23 | 江苏省农业科学院 | Method for batch computing of evolutionary rate of orthologous genes of genome |
CN106282320A (en) * | 2015-05-20 | 2017-01-04 | 广州华大基因医学检验所有限公司 | The method and apparatus of detection bodies cell mutation |
CN106529171A (en) * | 2016-11-09 | 2017-03-22 | 上海派森诺医学检验所有限公司 | Detection analysis method for breast cancer susceptibility gene heritable variation point |
CN106544407A (en) * | 2015-09-18 | 2017-03-29 | 广州华大基因医学检验所有限公司 | The method for determining donor source cfDNA ratios in receptor cfDNA samples |
CN107391965A (en) * | 2017-08-15 | 2017-11-24 | 上海派森诺生物科技股份有限公司 | A kind of lung cancer somatic mutation determination method based on high throughput sequencing technologies |
CN109402241A (en) * | 2017-08-07 | 2019-03-01 | 深圳华大基因研究院 | Identification and the method for analyzing ancient DNA sample |
Non-Patent Citations (1)
Title |
---|
TAO Huan (陶欢): "Assembly and SNPs of genome resequencing sequences of rice variety CO39 and its NILs", China Master's Theses Full-text Database, Basic Sciences * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lobaton et al. | Resequencing of common bean identifies regions of inter–gene pool introgression and provides comprehensive resources for molecular breeding | |
JP6314091B2 (en) | DNA sequence data analysis | |
Duran et al. | Future tools for association mapping in crop plants | |
WO2012168815A2 (en) | Method for assembly of nucleic acid sequence data | |
Eves‐van den Akker et al. | A metagenetic approach to determine the diversity and distribution of cyst nematodes at the level of the country, the field and the individual | |
Wildschutte et al. | Discovery and characterization of Alu repeat sequences via precise local read assembly | |
CN108197434A (en) | The method for removing human source gene sequence in macro gene order-checking data | |
KR20140006846A (en) | Data analysis of dna sequences | |
Aflitos et al. | Introgression browser: high‐throughput whole‐genome SNP visualization | |
WO2013103759A2 (en) | Haplotype based pipeline for snp discovery and/or classification | |
CN113571131B (en) | Pangenome construction method and corresponding structural variation mining method | |
Gutierrez-Gonzalez et al. | De novo transcriptome assembly in polyploid species | |
CN110310699A (en) | The analysis tool and application of target gene sequence are excavated based on whole genome sequence | |
CN108256291A (en) | It is a kind of to generate the method with higher confidence level detection in Gene Mutation result | |
KR101539737B1 (en) | Methodology for improving efficiency of marker-assisted backcrossing using genome sequence and molecular marker | |
Lammers et al. | Phylogenetic conflict in bears identified by automated discovery of transposable element insertions in low-coverage genomes | |
JP4468773B2 (en) | Gene information display method and display device | |
Douglas et al. | Processing a 16S rRNA sequencing dataset with the microbiome helper workflow | |
WO2022160700A1 (en) | Genotype identification of multi-parent crop on basis of high-throughput whole genome sequencing | |
Henke et al. | Identification of Mutations in Zebrafish Using Next‐Generation Sequencing | |
CN107354151A (en) | STR molecular labelings and its application based on the exploitation of sika deer full-length genome | |
JP2014530629A5 (en) | ||
KR101911307B1 (en) | Method for selecting and utilizing tag-SNP for discriminating haplotype in gene unit | |
Donaire et al. | Computational pipeline for the detection of plant RNA viruses using high-throughput sequencing | |
Hesse | K-Mer-Based Genome Size Estimation in Theory and Practice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |