CN106951729A - Method for phylogenetic analysis using synteny of organelle genomes - Google Patents

Method for phylogenetic analysis using synteny of organelle genomes

Info

Publication number
CN106951729A
Authority
CN
China
Prior art keywords
genome
sequence
synteny
organelle
organelle genome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710163233.7A
Other languages
Chinese (zh)
Inventor
茅云翔
毕桂萁
王冬梅
曹敏
徐奎鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN201710163233.7A
Publication of CN106951729A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for phylogenetic analysis using synteny of organelle genomes, and belongs to the field of phylogenetic analysis. To overcome the cumbersome processing and long processing time of phylogenetic analysis in the prior art, the invention provides a method whose main workflow comprises: (1) extracting the collinear (syntenic) blocks of the organelle genomes of different species using Mauve; (2) trimming and concatenating the collinear block sequences; (3) testing for the optimal model of the collinear block data set; and (4) constructing the phylogenetic tree. The collinear blocks obtained by this method contain complete protein-coding genes and can also contain non-coding regions. The method saves a large amount of time while offering high reliability and accuracy, and is suitable for wide application in the phylogenetic analysis of species.

Description

Method for phylogenetic analysis using synteny of organelle genomes
Technical field
The present invention relates to a method for phylogenetic analysis using synteny of organelle genomes, and belongs to the field of phylogenetic analysis.
Background art
Organelles are mainly responsible for photosynthesis and energy metabolism in organisms. Because organelle genomes are multicopy, maternally inherited, have a high mutation rate, contain many genes, and are easy to sequence, they have gradually come into use in phylogenetic analysis. In studies of the origin and evolution of species, organelle genome information not only helps us recognize phylogenetic relationships but can also clarify questions of species origin. When phylogenetic relationships are constructed from organelle genes, accurate multiple sequence alignment is the foundation on which those relationships can be correctly inferred. With the rapid development of bioinformatics, many alignment tools that are accurate, fast, and consistent with biological theory have been developed, and these tools have allowed biological research to proceed at a deeper and higher level. However, with the development of high-throughput sequencing, sequencing data have grown explosively and place higher demands on multiple sequence alignment. How to align large data sets quickly and obtain better, more accurate results is an important goal of multiple sequence alignment software development.
In organelle-based phylogenetics, the current common practice is to pick the genes shared by multiple species and build the tree from them. Each group of orthologous genes must be aligned and the alignments then combined. As the number of species studied grows, however, the number of genes that must be aligned manually increases geometrically, which greatly slows the analysis and makes the whole process time-consuming and laborious.
In view of these drawbacks, we explored a rapid alignment workflow based on organelle genomes that makes organelle genome phylogenetic analysis fast and simple. Phylogenetic analysis of single-gene data is routine, but the information contained in a single gene sequence is limited and is not sufficient to resolve the phylogenetic relationships among all the taxa a researcher is interested in. For example, many studies have shown that different genes evolve at different rates: the evolutionary rate of the chloroplast gene rbcL is about 1.4 times that of the nuclear gene 18S rDNA. In this situation, combining data sets from different genes increases the amount of phylogenetic signal and thereby strengthens the power to resolve taxa. In other words, concatenating many genes can offset differences in the evolutionary rates of individual genes, so a phylogenetic tree built from many genes, or even the whole genome, is more accurate and reliable than one built from a single gene or a few genes. Therefore, in the study of evolutionary relationships, as species genome information (nuclear and organelle genomes) becomes more complete, the phylogenetic relationships of species are understood in greater depth.
At present, with the rapid development of sequencing technology, organelle genomes covering every major group of species are gradually being completed, and phylogenies built from organelle genomes have resolved many problems in systematics. The corresponding consequence is that, as the data volume (both species and genes) increases, the workload grows as well. For example, building a tree from multiple concatenated genes requires aligning each group of orthologous genes from multiple species manually and individually, and then concatenating the genes (a chloroplast genome often contains more than 100 genes). This greatly slows the analysis and makes the whole process time-consuming and laborious.
Summary of the invention
To overcome the cumbersome processing and long processing time of phylogenetic analysis in the prior art, the present invention provides a method for phylogenetic analysis using synteny of organelle genomes. It mainly comprises the following technical workflow: (1) extracting the collinear blocks of the organelle genomes of different species using Mauve; (2) trimming and concatenating the collinear block sequences; (3) testing for the optimal model of the collinear block data set; (4) constructing the phylogenetic tree.
The method of the present invention for phylogenetic analysis using synteny of organelle genomes specifically comprises the following steps:
1) Extracting the collinear blocks of the organelle genomes of different species using Mauve: download from GenBank the organelle genomes of the species whose phylogenetic relationships are to be studied, and build the downloaded organelle genomes into a local database; import the organelle genomes of all species in the local database into the Mauve aligner, use progressiveMauve to detect structural variation among the organelle genomes of the different species, and delimit collinear blocks according to the alignment result; count the delimited collinear blocks and extract all of them from the alignment result with a script;
2) Trimming and concatenating the collinear block sequences: trim the collinear blocks with Gblocks using a conservative trimming strategy, and discard blocks from which no phylogenetically informative sites are extracted; merge the trimmed collinear blocks into a concatenated alignment, and report the position of each block in the final alignment;
3) Testing for the optimal model of the collinear block data set: for the alignment built from collinear blocks, perform sequence partitioning and model selection according to the position of each collinear block in the final alignment, and determine the optimal nucleotide substitution model and partitioning scheme;
4) Constructing the phylogenetic tree: from the concatenated sequence of the rapidly aligned collinear blocks, build the phylogenetic relationships of the multiple species using MrBayes.
The invention provides a rapid alignment workflow based on organelle genomes. The resulting alignment contains complete protein-coding genes as well as non-coding regions, and the workflow saves a large amount of time while offering high reliability and accuracy. Specifically, the phylogenetic analysis method of the present invention effectively addresses:
1. the speed of aligning whole organelle genomes;
2. comprehensive coverage of organelle genome information (both coding and non-coding regions);
3. rapid alignment of conserved regions across different taxa;
4. rapid identification of the conserved genes in different taxa.
In short, this rapid alignment workflow based on organelle genomes quickly aligns the collinear blocks of different taxa, covers organelle genome information more comprehensively, and infers and reconstructs the phylogenetic relationships of species more accurately.
Brief description of the drawings
Fig. 1 is a flow chart of the synteny-based workflow of the present invention.
Fig. 2 is a visualization of the synteny alignment result.
Fig. 3 shows phylogenetic trees of algal mitochondria: the left panel is the tree built from sequences selected by the rapid alignment workflow, and the right panel is the tree built from the sequences of shared genes.
Fig. 4 shows phylogenetic trees of plant chloroplasts: the left panel is the tree built from sequences selected by the rapid alignment workflow, and the right panel is the tree built from the sequences of shared genes.
Fig. 5 shows phylogenetic trees of rodent mitochondria: the left panel is the tree built from sequences selected by the rapid alignment workflow, and the right panel is the tree built from the sequences of shared genes.
Detailed description of the embodiments
The present invention is further described below through a specific embodiment, but those skilled in the art will understand that the embodiment does not limit the scope of protection of the present invention in any way.
Embodiment: method of the present invention for phylogenetic analysis using synteny of organelle genomes
The rapid construction of collinear blocks for organelle genome phylogenetic analysis in the present invention mainly comprises the following technical workflow: (1) extracting the collinear blocks of the organelle genomes of different species using Mauve; (2) trimming and concatenating the collinear block sequences; (3) testing for the optimal model of the collinear block data set; (4) constructing the phylogenetic tree. Fig. 1 is the flow chart of the collinear (homologous) block workflow of the invention. The method specifically comprises the following steps:
1. Extracting the collinear blocks of the organelle genomes of different species using Mauve
A) Download from GenBank the organelle genomes of the species whose phylogenetic relationships are to be studied. The alignment input supports mainstream nucleotide sequence formats such as fasta, gb, gbk, and fas;
B) Build the downloaded organelle genomes into a local database;
C) Import the organelle genomes of the species into the Mauve aligner, use progressiveMauve to detect structural variation among the different organelle genomes, and delimit collinear blocks according to the alignment result. This step solves the problem that sequences with structural variation cannot be aligned directly, and it also avoids the repetitive, redundant process of aligning genes one at a time;
D) Count the delimited collinear blocks and extract all of them from the alignment result with a script.
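The patent does not disclose the extraction script used in step D. As a minimal sketch, assuming the progressiveMauve result is saved in XMFA format (segment headers such as `>1:1-8 + a.fasta`, blocks terminated by a line starting with `=`) and assuming a genome that is absent from a block simply has no entry in it, the collinear blocks shared by all genomes could be pulled out as follows. The function names `parse_xmfa` and `shared_blocks` and the toy input are hypothetical.

```python
import re

# Header of one aligned segment in an XMFA file, e.g. ">1:1-8 + a.fasta"
HEADER = re.compile(r"^>\s*(\d+):(\d+)-(\d+)\s+([+-])")


def parse_xmfa(text):
    """Split XMFA text into blocks; each block maps genome index -> aligned sequence."""
    blocks, current, seq_id = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        if line.startswith("="):
            # "=" closes the current block
            if current:
                blocks.append(current)
            current, seq_id = {}, None
        elif line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unrecognised XMFA header: {line}")
            seq_id = int(match.group(1))
            current[seq_id] = ""
        elif seq_id is not None:
            current[seq_id] += line  # sequence may wrap over several lines
    if current:
        blocks.append(current)
    return blocks


def shared_blocks(blocks, n_genomes):
    """Keep only the blocks in which every one of the n_genomes genomes is present."""
    wanted = set(range(1, n_genomes + 1))
    return [block for block in blocks if set(block) == wanted]


# Toy input (made up for illustration): one block shared by both genomes,
# one block found in genome 1 only.
EXAMPLE = """#FormatVersion Mauve1
>1:1-8 + a.fasta
ACGT-ACG
T
>2:5-13 - b.fasta
ACGTTACG
T
=
>1:20-23 + a.fasta
GGCC
=
"""

blocks = parse_xmfa(EXAMPLE)
core = shared_blocks(blocks, n_genomes=2)
print(len(blocks), len(core))  # prints: 2 1
```

Counting `len(blocks)` and `len(core)` corresponds to the "count the delimited collinear blocks" part of the step; only the shared blocks would be passed on to trimming.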
2. Trimming and concatenating the collinear block sequences
A) Next, the collinear block sequences are trimmed, that is, the phylogenetically informative sites are extracted. Trimming is done with Gblocks, using its most conservative trimming strategy. After trimming, blocks from which no phylogenetically informative sites were extracted are discarded. The trimmed blocks are merged, and the position of each block in the final alignment is reported.
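The merge-and-report part of this step can be sketched as below. This is an illustration under stated assumptions, not the patent's script: the blocks are assumed to be already trimmed (for example by Gblocks) and to contain the same taxa, and the name `concatenate_blocks` and the toy data are hypothetical. Blocks that are empty after trimming are dropped, and each surviving block is reported with 1-based inclusive coordinates in the concatenated alignment.

```python
def concatenate_blocks(blocks):
    """Join aligned blocks end to end and record where each block lands.

    blocks: list of dicts mapping taxon name -> aligned sequence
            (equal lengths within a block, same taxa in every block).
    Returns (supermatrix, partitions); partitions holds 1-based inclusive
    (name, start, end) coordinates in the concatenated alignment.
    """
    if not blocks:
        raise ValueError("no blocks to concatenate")
    taxa = sorted(blocks[0])
    pieces = {taxon: [] for taxon in taxa}
    partitions, offset = [], 0
    for index, block in enumerate(blocks, start=1):
        if sorted(block) != taxa:
            raise ValueError(f"block {index} does not contain the same taxa")
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"block {index} is not aligned (unequal lengths)")
        length = lengths.pop()
        if length == 0:
            continue  # nothing survived trimming in this block, so drop it
        for taxon in taxa:
            pieces[taxon].append(block[taxon])
        partitions.append((f"block{index}", offset + 1, offset + length))
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy trimmed blocks (made up): the second block was emptied by trimming.
trimmed = [
    {"A": "ACGT", "B": "AC-T"},
    {"A": "", "B": ""},
    {"A": "GG", "B": "GA"},
]
matrix, parts = concatenate_blocks(trimmed)
print(matrix)  # {'A': 'ACGTGG', 'B': 'AC-TGA'}
print(parts)   # [('block1', 1, 4), ('block3', 5, 6)]
```

Keeping the original block index in the partition name makes it easy to trace a region of the final alignment back to the Mauve block it came from.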
3. Testing for the optimal model of the collinear block data set
In the alignment built from collinear blocks, the smallest sequence unit is a single collinear block. For each collinear block, according to its position in the final alignment, we can perform sequence partitioning and model selection in order to choose the optimal nucleotide substitution model and partitioning scheme.
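The patent names neither the software nor the criterion used for model selection. One common way to compare candidate substitution models or partitioning schemes is an information criterion; the sketch below assumes the Bayesian information criterion, BIC = -2 ln L + k ln n, where ln L is the maximised log-likelihood, k the number of free parameters, and n the number of alignment sites. The log-likelihood values and parameter counts are invented purely to show the arithmetic, and `bic` and `best_by_bic` are hypothetical helper names.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_by_bic(candidates, n_sites):
    """candidates: dict mapping a label -> (log_likelihood, n_params)."""
    scores = {
        label: bic(lnl, k, n_sites) for label, (lnl, k) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical fits of three substitution models to one 1000-site block.
candidates = {
    "JC": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "GTR": (-5095.0, 8),
}
best, scores = best_by_bic(candidates, n_sites=1000)
print(best)  # HKY: GTR's small likelihood gain does not pay for 4 extra parameters
```

The same comparison applies to partitioning schemes: each scheme is scored by its total log-likelihood and total parameter count, and the scheme with the lowest score is kept.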
4. Constructing the phylogenetic tree
From the concatenated sequence of the rapidly aligned collinear blocks obtained above, the phylogenetic relationships of the multiple species are built using MrBayes.
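MrBayes reads a NEXUS file that holds the alignment and, optionally, a `mrbayes` block with the analysis commands. As a hedged sketch, the helper below (hypothetical name `to_mrbayes_nexus`) writes such a file from a concatenated alignment and a list of block coordinates. The analysis settings shown (GTR with gamma-distributed rates for every partition, unlinked parameters, the chain length and sampling frequency) are placeholder assumptions and are not taken from the patent; in practice they would come from the model-selection step. Taxon names are assumed to contain no spaces.

```python
def to_mrbayes_nexus(supermatrix, partitions, ngen=1000000):
    """Return NEXUS text with a data block and a mrbayes command block.

    supermatrix: dict mapping taxon name -> concatenated aligned sequence
    partitions:  list of (name, start, end), 1-based inclusive coordinates
    """
    taxa = sorted(supermatrix)
    nchar = len(supermatrix[taxa[0]])
    if any(len(supermatrix[t]) != nchar for t in taxa):
        raise ValueError("all sequences must have the same aligned length")
    width = max(len(t) for t in taxa) + 2
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(taxa)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    lines += [f"  {t.ljust(width)}{supermatrix[t]}" for t in taxa]
    lines += ["  ;", "end;", "", "begin mrbayes;"]
    for name, start, end in partitions:
        lines.append(f"  charset {name} = {start}-{end};")
    names = ", ".join(name for name, _, _ in partitions)
    lines += [
        f"  partition by_block = {len(partitions)}: {names};",
        "  set partition=by_block;",
        # Placeholder model: GTR + gamma for every partition (an assumption).
        "  lset applyto=(all) nst=6 rates=gamma;",
        "  unlink statefreq=(all) revmat=(all) shape=(all);",
        f"  mcmc ngen={ngen} samplefreq=1000;",
        "  sumt;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


nexus = to_mrbayes_nexus(
    {"A": "ACGTGG", "B": "AC-TGA"},
    [("block1", 1, 4), ("block3", 5, 6)],
    ngen=10000,
)
print(nexus)
```

Writing the `charset` lines from the reported block coordinates keeps the partition definitions in step with the concatenated alignment, so the two cannot drift apart when blocks are added or dropped.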
Verification of the phylogenetic accuracy of the rapid alignment workflow of the invention
At present there is no software for building organelle genome alignments. This workflow shortens the time needed for a complete organelle genome phylogenetic analysis to between ten-odd minutes and a few hours. To verify the accuracy and broad applicability of the workflow, we compared phylogenetic trees from published articles, which were built from protein-coding sequences (the current common practice), with phylogenetic trees built from the sequences produced by the rapid alignment of the present invention. The verification for algal organelle genome phylogeny is shown in Fig. 3, for plant organelle genome phylogeny in Fig. 4, and for animal organelle genome phylogeny in Fig. 5.
These verification experiments show that the sequences extracted by the rapid alignment workflow can be used to build phylogenetic trees that accurately infer the phylogenetic relationships of species, and that the workflow is widely applicable and greatly reduces the time required.

Claims (2)

1. A method for phylogenetic analysis using synteny of organelle genomes, comprising the following steps:
1) Extracting the collinear blocks of the organelle genomes of different species using Mauve: download from GenBank the organelle genomes of the species whose phylogenetic relationships are to be studied, and build the downloaded organelle genomes into a local database; import the organelle genomes of all species in the local database into the Mauve aligner, use progressiveMauve to detect structural variation among the organelle genomes of the different species, and delimit collinear blocks according to the alignment result; count the delimited collinear blocks and extract all of them from the alignment result with a script;
2) Trimming and concatenating the collinear block sequences: trim the collinear blocks with Gblocks using a conservative trimming strategy, and discard blocks from which no phylogenetically informative sites are extracted; merge the trimmed collinear blocks into a concatenated alignment, and report the position of each block in the final alignment;
3) Testing for the optimal model of the collinear block data set: for the alignment built from collinear blocks, perform sequence partitioning and model selection according to the position of each collinear block in the final alignment, and determine the optimal nucleotide substitution model and partitioning scheme;
4) Constructing the phylogenetic tree: from the concatenated sequence of the rapidly aligned collinear blocks, build the phylogenetic relationships of the multiple species using MrBayes.
2. The method for phylogenetic analysis using synteny of organelle genomes according to claim 1, characterized in that the nucleotide sequences of the organelle genomes in the local database of step 1) are stored in one of the formats fasta, gb, gbk, and fas.
CN201710163233.7A 2017-03-19 2017-03-19 Method for phylogenetic analysis using synteny of organelle genomes Pending CN106951729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710163233.7A CN106951729A (en) 2017-03-19 2017-03-19 Method for phylogenetic analysis using synteny of organelle genomes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710163233.7A CN106951729A (en) 2017-03-19 2017-03-19 Method for phylogenetic analysis using synteny of organelle genomes

Publications (1)

Publication Number Publication Date
CN106951729A true CN106951729A (en) 2017-07-14

Family

ID=59472617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710163233.7A Pending CN106951729A (en) 2017-03-19 2017-03-19 Method for phylogenetic analysis using synteny of organelle genomes

Country Status (1)

Country Link
CN (1) CN106951729A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101243191A (en) * 2004-11-29 2008-08-13 雷根斯堡大学临床中心 Means and methods for detecting methylated DNA
CN101957892A (en) * 2010-09-17 2011-01-26 深圳华大基因科技有限公司 Whole-genome replication event detection method and system
CN103667328A (en) * 2013-12-03 2014-03-26 中国海洋大学 Construction method of porphyra yezoensis plastid genetic transformation vector

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
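张毓婷 et al.: "Evolutionary analysis of the HSP70 gene family in Gossypium raimondii and expression analysis of its homologous genes in Gossypium hirsutum", 遗传 (Hereditas) *
杨俊卿 et al.: "Cloning and functional analysis of the aquaporin gene PyAQP1 from Porphyra yezoensis", 中国海洋大学学报 (Periodical of Ocean University of China) *
毕桂萁: "Organelle genome sequencing and phylogenetic analysis of Bangia fuscopurpurea OUCPT-01 and Bangia atropurpurea", 中国优秀硕士学位论文全文数据库基础科学辑 (China Master's Theses Full-text Database, Basic Sciences) *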

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170714)