CN110310699A - Analysis tool for mining target gene sequences from whole-genome sequences, and application thereof - Google Patents

Analysis tool for mining target gene sequences from whole-genome sequences, and application thereof

Info

Publication number
CN110310699A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910586422.4A
Other languages
Chinese (zh)
Inventor
肖宁
李爱宏
戴正元
周长海
刘广青
潘存红
李育红
吴云雨
余玲
王志平
蔡跃
黄年生
季红娟
张小祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Lixiahe Prefecture Institute Of Agricultural Science
Original Assignee
Jiangsu Lixiahe Prefecture Institute Of Agricultural Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Lixiahe Prefecture Institute Of Agricultural Science filed Critical Jiangsu Lixiahe Prefecture Institute Of Agricultural Science
Priority to CN201910586422.4A priority Critical patent/CN110310699A/en
Publication of CN110310699A publication Critical patent/CN110310699A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The present invention relates to an analysis tool, written in Perl and run under Linux, for mining target gene sequences from whole-genome sequences, and to a detection and analysis method that uses the tool and its application. Working at the whole-genome level, the tool uses the whole-genome sequences of multiple parental materials to analyze the variant sites and variation types of a target gene and to obtain the homologous sequences of the target gene in the parental materials. The analysis tool and method automatically complete target interval search, sequence alignment, and functional variation type analysis. They require no genome annotation from any other species as a reference, are therefore highly general, and support the analysis of 2,000 parental genomes. They can be widely applied to target gene sequence analysis in crop genomes and provide a simple, efficient sequence polymorphism analysis tool and strategy for molecular breeding.

Description

Analysis tool for mining target gene sequences from whole-genome sequences, and application thereof
Technical field
The present invention relates to the creation of an analysis tool for mining target gene sequences from whole-genome sequences, and to a method that uses the tool to mine and analyze target gene sequences in whole-genome sequences. The method, and the analysis tool EXGE1.0 created for it, are mainly applied to target gene sequence analysis in crop genomes.
Background
In recent years, with continuing advances in sequencing technology, sequencing throughput has risen while sequencing cost has fallen. Obtaining the genome sequence of a material by genome sequencing and searching that sequence for variation types of a target gene has become a basic strategy in the molecular breeding of animals and plants. However, as sample numbers increase sharply and large amounts of genome sequencing data accumulate, quickly finding the functional genotypes and variant-site information of a target gene in massive data has become a key factor limiting genome-based breeding improvement. When handling large numbers of genome sequences, traditional analysis tools suffer from complicated operating steps, high work intensity, and a heavy workload. Providing an automated target gene analysis tool based on whole-genome sequences is therefore an effective approach.
Summary of the invention
The technical problem solved by the present invention is to provide an analysis tool for mining target gene sequences from whole-genome sequences that automatically analyzes the variation types of a target gene sequence at the whole-genome level, requires no genome annotation from any other species as a reference, and has good generality.
The technical solution that achieves the object of the invention is as follows:
An analysis tool for mining target gene sequences from whole-genome sequences, comprising:
Parameters: -i: the target gene sequence file name; -g: the file name of the path text file of the target genome set; -e: the filtering threshold; -d: the genomic interval to be examined; -o: the output file name. Command line 1: the -g file lists one genome path per line. Command line 2: -d specifies the chromosome number and physical position. Command line 3: the -i file is in FASTA format and must be stored in the same folder as the genomes to be examined. The EXGE.pl script is executed with the perl command and takes the parameters -i, -g, -e, -d, and -o.
A detection and analysis method using the above analysis tool for mining target gene sequences from whole-genome sequences, comprising the following steps:
Step 1: Install the BioPerl package and the sequence alignment program BLAST+ under the Linux operating system of a computer.
Step 2: Extract sample genomic DNA, construct libraries and sequence them to obtain the sample genome sequences, and convert these into FASTA files to obtain the sample genome sequence files.
Step 3: Write the names of the sample genome sequences, in order, into the path text file g of the target genome set. The format of this path text file is one sample genome path per line. Set the genomic interval d to be examined, written as chromosome number:physical distance. Set the filtering threshold e.
Step 4: Place the sample genome sequence files and the target gene sequence file i to be examined in the same target folder, where the target gene sequence file i is in FASTA format. Also place the script package of the analysis tool described in claim 1 and the path text file g of the target genome set in the same target folder.
Step 5: Run the analysis tool for mining target gene sequences from whole-genome sequences. It outputs the insertion or deletion variant-site information and the SNP variation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set. The insertion or deletion variant-site information comprises the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence. The SNP variation information comprises the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base in the target gene sequence, the nucleotide variant in the sample genome sequence, and whether the change is synonymous or nonsynonymous. The BLAST alignment results of the target gene sequence against the genome set comprise the homologous sequences of the target gene sequence in the sample genomes.
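The synonymous or nonsynonymous classification reported for each SNP in step 5 can be illustrated with a short sketch. It is not the tool's own code (which is Perl and is not reproduced here). It assumes that the target sequence is an in-frame coding sequence starting at position 1 and that the standard genetic code applies; the function name classify_snp and the demonstration sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at a 1-based CDS position.

    Assumption: `cds` is in frame, so codon boundaries fall at 1, 4, 7, ...
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if len(alt_base) != 1 or alt_base not in BASES:
        raise ValueError("alt_base must be one of A, C, G, T")
    if not 1 <= position <= len(cds):
        raise ValueError("position outside the coding sequence")

    codon_start = (position - 1) // 3 * 3      # 0-based start of the codon
    offset = (position - 1) % 3                # index of the SNP inside the codon
    ref_codon = cds[codon_start:codon_start + 3]
    if ref_codon not in CODON_TABLE:
        raise ValueError("incomplete or ambiguous codon: %r" % ref_codon)

    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "nonsynonymous"
```

For the hypothetical sequence ATGCTTGAA, changing position 6 to C turns CTT into CTC (both leucine), while changing position 4 to A turns CTT into ATT (leucine to isoleucine).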
Use of the above detection and analysis method in sequence detection and analysis after genome sequencing of rice and other crops.
Compared with the prior art, the above technical solution of the invention has the following technical effects:
1. The invention automatically analyzes the variation types of a target gene sequence at the whole-genome level, requires no genome annotation from any other species as a reference, has good generality, and is suitable for use on an ordinary personal computer.
2. The invention automatically completes target interval search, sequence alignment, and functional variation type analysis without any manual intervention. The summary tables of variation types and aligned sequences that it finally generates are convenient for the user's subsequent analysis.
3. The invention supports the analysis of up to 2,000 whole-genome sequences (each genome 430 Mb) and provides a standardized output data format, so that users can call third-party tools to reprocess the analysis data.
Brief description of the drawings
Fig. 1 is the FASTA file of a sample genome sequence;
Fig. 2 is the FASTA file of the target gene sequence;
Fig. 3 is the insertion or deletion variation output;
Fig. 4 is the explanation of the insertion or deletion variation output;
Fig. 5 is the SNP variation output;
Fig. 6 is the explanation of the SNP variation output;
Fig. 7 is the BLAST alignment result of the target gene sequence against the genome set;
Fig. 8 is the CDS sequence of the rice blast resistance gene Piz-t;
Fig. 9 shows the variation types and disease-resistance phenotypes of the Piz-t resistance gene in the sequenced parental materials, as analyzed with the analysis tool for mining target gene sequences from whole-genome sequences; A shows the Piz-t haplotypes and variant sites, and B shows the relationship between the haplotypes and the resistant or susceptible phenotypes.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
An analysis tool for mining target gene sequences from whole-genome sequences, comprising:
Parameters: -i: the target gene sequence file name; -g: the file name of the path text file of the target genome set; -e: the filtering threshold; -d: the genomic interval to be examined; -o: the output file name;
Command line 1: the -g file lists one genome path per line;
Command line 2: -d specifies the chromosome number and physical position;
Command line 3: the -i file is in FASTA format and must be stored in the same folder as the genomes to be examined;
The EXGE.pl script is executed with the perl command and takes the parameters -i, -g, -e, -d, and -o.
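As an illustration of how the five parameters fit together, the following sketch assembles the argument list for one run. It only builds the list and does not execute anything. The script name EXGE.pl follows the command line given in Embodiment 1; the helper name build_exge_command and the threshold spelling "1e-10" are assumptions, since the text does not state how the script expects the threshold to be written.

```python
from typing import List


def build_exge_command(target_fasta: str,
                       genome_list: str,
                       interval: str,
                       output_name: str,
                       threshold: str = "1e-10",
                       script: str = "EXGE.pl") -> List[str]:
    """Return the argument vector for one run of the analysis script.

    The flag meanings come from the parameter list above:
      -i target gene FASTA file      -g text file with one genome path per line
      -e filtering threshold         -d genomic interval to examine
      -o output file name
    """
    return [
        "perl", script,
        "-i", target_fasta,
        "-g", genome_list,
        "-e", threshold,   # assumed spelling; adjust to what the script accepts
        "-d", interval,
        "-o", output_name,
    ]


command = build_exge_command("Piz-t.fa", "direct.txt", "Chr01:1-1000", "Piz-t_result")
print(" ".join(command))
```

Keeping the arguments as a list rather than one string avoids shell quoting problems if the list is later passed to a process launcher such as subprocess.run.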
A detection and analysis method using the above analysis tool for mining target gene sequences from whole-genome sequences, comprising the following steps:
Step 1: Install the BioPerl package and the sequence alignment program BLAST+ under the Linux operating system of a computer;
Step 2: Extract sample genomic DNA, construct libraries and sequence them to obtain the sample genome sequences. Align the reads of each sample to the reference genome with the analysis tool BWA to generate BAM files, then convert the BAM files to FASTA files with samtools to obtain the sample genome sequence files;
Step 3: Write the names of the sample genome sequences, in order, into the path text file g of the target genome set. The format of this path text file is one sample genome path per line, for example ~/Msuv7.fa; 2000 genomes give 2000 lines;
Set the genomic interval d to be examined, written as chromosome number:physical distance. For example, chromosome number Chr01 plus the physical distance is written Chr01:1-1000. Note that the chromosome number on the command line must match the numbering used in the sample genome sequences to be examined;
Set the filtering threshold e. The default value of e is 10^-10, and the threshold can be adjusted as needed;
Step 4: Place the sample genome sequence files and the target gene sequence file i to be examined in the same target folder, where the target gene sequence file i is in FASTA format,
and also place the script package of the analysis tool described in claim 1 and the path text file g of the target genome set in the same target folder;
Step 5: Run the analysis tool for mining target gene sequences from whole-genome sequences. It outputs the insertion or deletion variant-site information and the SNP variation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set;
The insertion or deletion variant-site information comprises the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence;
The SNP variation information comprises the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base in the target gene sequence, the nucleotide variant in the sample genome sequence, and whether the change is synonymous or nonsynonymous;
The BLAST alignment results of the target gene sequence against the genome set comprise the homologous sequences of the target gene sequence in the sample genomes.
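The interval notation of step 3 (chromosome number, colon, start-end) and the requirement that the chromosome name match the sample genome can be checked before a run. The sketch below is one way to do so and is not taken from the tool. The names Interval, parse_interval, and chromosome_known are hypothetical, and accepting thousands separators in the coordinates is an assumption.

```python
import re
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


_INTERVAL_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_interval(text: str) -> Interval:
    """Parse 'Chr01:1-1000' into (chromosome, start, end), 1-based inclusive."""
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError("expected CHROM:START-END, got %r" % text)
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError("invalid coordinates in %r" % text)
    return Interval(match["chrom"], start, end)


def chromosome_known(interval: Interval, fasta_headers: Iterable[str]) -> bool:
    """Check that the chromosome name appears as the first field of a '>' line."""
    names = set()
    for header in fasta_headers:
        fields = header.lstrip(">").split()
        if fields:
            names.add(fields[0])
    return interval.chrom in names
```

For example, parse_interval("Chr01:1-1000") yields chromosome Chr01 with start 1 and end 1000, and chromosome_known then reports whether a sample FASTA file actually contains a record named Chr01.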
The above detection and analysis method can be used for sequence detection and analysis after genome sequencing of rice and other crops.
Embodiment 1
(1) Running environment requirements
Hardware requirements: a CPU with 4 or more cores, 16 GB or more of memory, and a hard disk of 1000 GB or more. Software requirements: Linux operating system (with Perl version 5.10 or later installed).
(2) Mining a disease-resistance gene in parental materials
1. Test materials
199 advanced-generation, genetically stable rice samples.
2. DNA extraction followed the method of Temnykh et al. (2000); genomic DNA was extracted from each individual plant. After extraction, sequencing libraries were constructed and sequenced to a depth of 20×. Reads in the raw data in which more than 50% of the bases had a quality value below 5, or which showed adapter contamination, were filtered out. Based on the genomic DNA sequencing data, the reads of each sample were aligned to the reference genome (IRGSP-1.0) with the free analysis tool BWA to generate BAM files, and the BAM files were converted to FASTA files with samtools. To improve the reliability of sequence extraction, the quality-control parameters were set as follows: mapping quality at each site greater than 20, variant quality greater than 50, and each base supported by at least 3 reads.
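The numeric quality-control criteria above can be expressed as two small predicates. This is a sketch of the stated thresholds only. It does not reproduce the BWA or samtools steps, adapter contamination is not modelled, and the names SiteEvidence, passes_site_filter, and is_low_quality_read are hypothetical. Treating an empty read as low quality is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteEvidence:
    mapping_quality: float   # mapping quality at the site
    variant_quality: float   # variant (call) quality at the site
    supporting_reads: int    # number of reads supporting the base


def passes_site_filter(site: SiteEvidence,
                       min_mapping_quality: float = 20,
                       min_variant_quality: float = 50,
                       min_reads: int = 3) -> bool:
    """Mapping quality > 20, variant quality > 50, at least 3 supporting reads."""
    return (site.mapping_quality > min_mapping_quality
            and site.variant_quality > min_variant_quality
            and site.supporting_reads >= min_reads)


def is_low_quality_read(base_qualities: Sequence[int],
                        min_base_quality: int = 5,
                        max_fraction: float = 0.5) -> bool:
    """True when more than 50% of the bases have a quality value below 5."""
    if not base_qualities:
        return True  # assumption: a read with no bases is discarded
    low = sum(1 for quality in base_qualities if quality < min_base_quality)
    return low / len(base_qualities) > max_fraction
```

The two strict comparisons and the one inclusive comparison mirror the wording "greater than 20", "greater than 50", and "at least 3".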
3. The parental genome sequences obtained above (sequence content as in Fig. 1) and the target gene sequence to be examined (as in Fig. 2) are stored in the same folder. The data files used by this script are in FASTA format: the sequence description begins with ">" and occupies a single line, and the first field after ">" must not be repeated within the file. The sequence content follows the description line and may be stored across multiple consecutive lines.
In this embodiment the gene sequence of the rice blast resistance gene Piz-t is used as the target gene sequence; its sequence content is shown in Fig. 8.
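The FASTA constraints just described (a single ">" description line, a first field that is unique within the file, sequence allowed to span several lines) can be verified with a small reader such as the sketch below. It is an illustration rather than the tool's own parser, and the function name read_fasta is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records keyed by the first field of each '>' line.

    Raises ValueError when a name is repeated or when sequence data
    appears before the first description line.
    """
    parts: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("description line has no name")
            current = fields[0]
            if current in parts:
                raise ValueError("duplicate sequence name: %s" % current)
            parts[current] = []
        elif current is None:
            raise ValueError("sequence data before the first '>' line")
        else:
            parts[current].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

Because the function accepts any iterable of lines, it works on an open file handle as well as on a list of strings in a test.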
4. Create the text file direct.txt in the above folder and enter the names of the sample genomes into it, for example ~/199_1.fa through ~/199_199.fa.
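Writing 199 lines by hand is error-prone, so the list file can be generated. The sketch below assumes the naming pattern shown in the example (~/199_1.fa through ~/199_199.fa) and writes one path per line. The function names sample_genome_paths and write_genome_list are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def sample_genome_paths(prefix: str = "199",
                        count: int = 199,
                        directory: str = "~") -> List[str]:
    """Return paths such as ~/199_1.fa ... ~/199_199.fa, in numeric order."""
    return ["%s/%s_%d.fa" % (directory, prefix, index)
            for index in range(1, count + 1)]


def write_genome_list(paths: Iterable[str], list_file: str) -> None:
    """Write one genome path per line, the format required for the -g file."""
    text = "".join(path + "\n" for path in paths)
    Path(list_file).write_text(text, encoding="utf-8")
```

Calling write_genome_list(sample_genome_paths(), "direct.txt") produces a 199-line file whose line order matches the sample numbering.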
5. The target gene sequence file is Piz-t.fasta; its sequence content is shown in Fig. 8.
6. Run the script EXGE.pl with the following command line (for the script package, see the source code):
perl EXGE.pl -i Piz-t.fa -g direct.txt -e -10 -d Chr06_consensus:10,000,00-12,000,000 -o Piz-t_result
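The role of the -e threshold can be illustrated on BLAST+ tabular output. The sketch assumes that the filtering threshold acts as a BLAST E-value cutoff and that hits are available in the default 12-column tabular layout (-outfmt 6), in which the E-value is the eleventh column. The text does not state either point, and the function name filter_hits and the sample rows are hypothetical.

```python
from typing import Iterable, List

# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
EVALUE_COLUMN = 10  # zero-based index of "evalue"


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> List[List[str]]:
    """Keep tabular BLAST hits whose E-value is at or below the threshold."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError("expected 12 tab-separated columns: %r" % line)
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            kept.append(fields)
    return kept
```

Lowering max_evalue keeps only stronger matches; raising it admits weaker, possibly spurious hits.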
7. When the script has finished, it outputs three result files: the SNP variation of the target gene sequence in the corresponding genomes (as in Fig. 5), the indel (insertion and deletion) variant-site information (as in Fig. 3), and the BLAST alignment results (as in Fig. 7). The explanation of the SNP variation results is shown in Fig. 6, and the explanation of the insertion and deletion variant-site results is shown in Fig. 4. In the BLAST alignment results of Fig. 7, >199_17_Chr11_consensus_27982787-27983057 indicates that the sequence in the interval 27982787 to 27983057 on chromosome 11 of sample '199_17' is highly homologous to the target gene; POS indicates that positions 94 to 364 of the target gene sequence are 100% identical to the sample genome sequence.
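A hit label of the form shown above can be split into sample, chromosome, and coordinates for downstream processing. The pattern below is inferred from the single example in the text and is therefore an assumption: chromosome names are taken to look like Chr11 or Chr11_consensus. The names HitLocation and parse_hit_header are hypothetical.

```python
import re
from typing import NamedTuple


class HitLocation(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Number of bases covered, counting both end positions."""
        return self.end - self.start + 1


_HIT_RE = re.compile(
    r"^>?\s*(?P<sample>.+?)_(?P<chrom>Chr\d+(?:_consensus)?)"
    r"_(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_hit_header(header: str) -> HitLocation:
    """Split '>199_17_Chr11_consensus_27982787-27983057' into its parts."""
    match = _HIT_RE.match(header.strip())
    if match is None:
        raise ValueError("unrecognised hit label: %r" % header)
    return HitLocation(match["sample"], match["chrom"],
                       int(match["start"]), int(match["end"]))
```

For the example label, the parsed interval spans 271 bases, which equals the length of positions 94 to 364 of the target gene sequence.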
8. Based on the insertions, deletions, and nonsynonymous substitutions of the resistance gene Piz-t found in the parental materials, the variation types were divided into 13 haplotypes (named Hap1 to Hap13), of which Hap1 is 100% identical to the Piz-t sequence of the resistant type. In Fig. 9A, NO. is the number of parental materials carrying the haplotype, '-' indicates a 1 bp deletion at the site, and '--' indicates a 2 bp deletion at the site. The rice blast resistance of the parental materials of the 13 Hap types was evaluated with the rice blast isolate '83-14'. As shown in Fig. 9B (R: resistant phenotype; S: susceptible phenotype), materials of type Hap1 were resistant to rice blast, whereas those of types Hap2 to Hap13 were susceptible. The resistance genotypes identified in the parental materials with the EXGE1.0 script are therefore consistent with the phenotypes observed after inoculation.
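Grouping samples into haplotypes amounts to collecting the samples that share exactly the same set of variants. The sketch below shows that idea. The ordering rule (reference-identical group first, then larger groups first) is an assumption made so that Hap1 corresponds to the unchanged sequence; the text does not say how Hap2 to Hap13 were numbered. The function name group_haplotypes and the demonstration data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# (position in target gene, allele in target gene, allele in sample)
Variant = Tuple[int, str, str]


def group_haplotypes(sample_variants: Dict[str, Iterable[Variant]]) -> Dict[str, List[str]]:
    """Group samples whose variant sets are identical and label them Hap1, Hap2, ..."""
    groups: Dict[FrozenSet[Variant], List[str]] = defaultdict(list)
    for sample in sorted(sample_variants):
        groups[frozenset(sample_variants[sample])].append(sample)

    # Assumed ordering: no-variant group first, then by descending group size,
    # then by the variants themselves so the result is deterministic.
    ordered = sorted(
        groups.items(),
        key=lambda item: (len(item[0]) != 0, -len(item[1]), sorted(item[0])),
    )
    return {"Hap%d" % index: samples
            for index, (_, samples) in enumerate(ordered, start=1)}
```

With four hypothetical samples, one identical to the target gene, two sharing an A to G change at position 120, and one with a 1 bp deletion at position 305, the function returns three haplotypes with the unchanged sample as Hap1.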
The source code of the analysis tool for mining target gene sequences from whole-genome sequences is as follows:
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements without departing from the principle of the invention, and such improvements shall also be regarded as falling within the scope of protection of the invention.

Claims (5)

1. An analysis tool for mining target gene sequences from whole-genome sequences, characterized by comprising:
Parameters: -i: the target gene sequence file name; -g: the file name of the path text file of the target genome set; -e: the filtering threshold; -d: the genomic interval to be examined; -o: the output file name;
Command line 1: the -g file lists one genome path per line;
Command line 2: -d specifies the chromosome number and physical position;
Command line 3: the -i file is in FASTA format and must be stored in the same folder as the genomes to be examined;
The EXGE.pl script is executed with the perl command and takes the parameters -i, -g, -e, -d, and -o.
2. A detection and analysis method using the analysis tool for mining target gene sequences from whole-genome sequences according to claim 1, characterized by comprising the following steps:
Step 1: Install the BioPerl package and the sequence alignment program BLAST+ under the Linux operating system of a computer;
Step 2: Extract sample genomic DNA, construct libraries and sequence them to obtain the sample genome sequences, and convert these into FASTA files to obtain the sample genome sequence files;
Step 3: Write the names of the sample genome sequences, in order, into the path text file g of the target genome set, the format of this path text file being one sample genome path per line;
Set the genomic interval d to be examined, written as chromosome number:physical distance;
Set the filtering threshold e;
Step 4: Place the sample genome sequence files and the target gene sequence file i to be examined in the same target folder, where the target gene sequence file i is in FASTA format,
and also place the script package of the analysis tool for mining target gene sequences from whole-genome sequences according to claim 1 and the path text file g of the target genome set in the same target folder;
Step 5: Run the analysis tool for mining target gene sequences from whole-genome sequences, which outputs the insertion or deletion variant-site information and the SNP variation information of the target gene sequence in the sample genomes, together with the BLAST alignment results of the target gene sequence against the sample genome set;
wherein the insertion or deletion variant-site information comprises the physical position of the insertion or deletion in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the insertion or deletion in the sample genome sequence, the variation type in the target gene sequence, and the variation type in the sample genome sequence;
the SNP variation information comprises the physical position of the SNP in the target gene sequence, the name of the sample genome sequence concerned, the physical position of the SNP in the sample genome sequence, the SNP base in the target gene sequence, the nucleotide variant in the sample genome sequence, and whether the change is synonymous or nonsynonymous;
the BLAST alignment results of the target gene sequence against the genome set comprise the homologous sequences of the target gene sequence in the sample genomes.
3. The detection and analysis method using the analysis tool for mining target gene sequences from whole-genome sequences according to claim 2, characterized in that converting the sample genome sequences into FASTA files in step 2 specifically comprises: aligning the reads of each sample to the reference genome with the analysis tool BWA to generate BAM files, and then converting the BAM files to FASTA files with samtools.
4. The detection and analysis method using the analysis tool for mining target gene sequences from whole-genome sequences according to claim 2, characterized in that the filtering threshold e is 10^-10.
5. Use of the detection and analysis method using the analysis tool for mining target gene sequences from whole-genome sequences in sequence detection and analysis after genome sequencing of rice and other crops.
CN201910586422.4A 2019-07-01 2019-07-01 The analysis tool and application of target gene sequence are excavated based on whole genome sequence Pending CN110310699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910586422.4A CN110310699A (en) 2019-07-01 2019-07-01 The analysis tool and application of target gene sequence are excavated based on whole genome sequence

Publications (1)

Publication Number Publication Date
CN110310699A true CN110310699A (en) 2019-10-08

Family

ID=68078708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910586422.4A Pending CN110310699A (en) 2019-07-01 2019-07-01 The analysis tool and application of target gene sequence are excavated based on whole genome sequence

Country Status (1)

Country Link
CN (1) CN110310699A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037847A (en) * 2020-09-15 2020-12-04 中国科学院微生物研究所 Microbial strain genome analysis method and device and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1733915A (en) * 2005-07-19 2006-02-15 浙江大学 False gene data bank construction method of rice genome
CN104462211A (en) * 2014-11-04 2015-03-25 北京诺禾致源生物信息科技有限公司 Re-sequencing data processing method and processing device
CN105426700A (en) * 2015-12-18 2016-03-23 江苏省农业科学院 Method for batch computing of evolutionary rate of orthologous genes of genome
CN106282320A (en) * 2015-05-20 2017-01-04 广州华大基因医学检验所有限公司 The method and apparatus of detection bodies cell mutation
CN106529171A (en) * 2016-11-09 2017-03-22 上海派森诺医学检验所有限公司 Detection analysis method for breast cancer susceptibility gene heritable variation point
CN106544407A (en) * 2015-09-18 2017-03-29 广州华大基因医学检验所有限公司 The method for determining donor source cfDNA ratios in receptor cfDNA samples
CN107391965A (en) * 2017-08-15 2017-11-24 上海派森诺生物科技股份有限公司 A kind of lung cancer somatic mutation determination method based on high throughput sequencing technologies
CN109402241A (en) * 2017-08-07 2019-03-01 深圳华大基因研究院 Identification and the method for analyzing ancient DNA sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tao Huan (陶欢): "Assembly of genome resequencing sequences of rice variety CO39 and its NILs, and SNPs", China Master's Theses Full-text Database, Basic Sciences *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191008
