CN106980776B - Method for calculating gene family relatedness between species - Google Patents

Method for calculating gene family relatedness between species

Info

Publication number
CN106980776B
Authority
CN
China
Prior art keywords
species
gene
gene family
cotton
nonredundancy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710229007.4A
Other languages
Chinese (zh)
Other versions
CN106980776A (en
Inventor
向浏欣
吴朝锋
邓聿杉
蔡应繁
汪露
廖华东
何琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201710229007.4A priority Critical patent/CN106980776B/en
Publication of CN106980776A publication Critical patent/CN106980776A/en
Application granted granted Critical
Publication of CN106980776B publication Critical patent/CN106980776B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Abstract

The invention discloses a method for calculating gene family relatedness between species. A gene family is designated as the specified gene family, and the method determines which of two investigated species is more closely related to a reference species with respect to that family. The genes of the specified gene family in the reference species are aligned pairwise with the genes of the same family in each of the two investigated species, and for each investigated species the number of non-redundant reference-species genes is obtained under multiple threshold conditions. The threshold condition under which the absolute difference between the two investigated species' non-redundant reference gene counts is largest is identified, and the investigated species with the larger count under that condition is taken to be more closely related to the reference species for the specified gene family. That species is considered to share more similar corresponding morphological and developmental traits with the reference species. The method can be applied to screening for superior or target species, and it can also reflect the evolutionary relationship of a gene family between different species.

Description

Method for calculating gene family relatedness between species
Technical field
The present invention relates to the field of bioinformatics, and in particular to a method for calculating gene family relatedness between species.
Background art
Research on phylogenetic relationships or relatedness is an important topic in bioinformatics. A key method for studying species evolution is to align the amino acid or nucleic acid sequences of important orthologous or conserved genes, infer the evolutionary relationship or relatedness between the genes from the sequence changes, and from that infer the evolutionary relationship or relatedness between the species.
A gene family is a set of genes that share the same domain sequence (a conserved stretch of amino acid sequence). Some gene families in a species contain hundreds or even thousands of genes and form a large family. Because genes of the same family share a domain, they often have similar biological functions, and the higher the sequence similarity between genes, the more similar their functions tend to be. In current evolutionary studies, however, whether the analysis covers one or a few orthologous or conserved genes or the genes within one gene family, the usual approach is to build a phylogenetic tree after sequence alignment, which shows the evolutionary relationship or relatedness between individual genes. Each species is an independent whole, and different species often have a given gene family in common. Treating a gene family within a species as a whole and studying the evolution or relatedness of the same gene family across different species has not been reported. A morphological or developmental trait of a species is rarely the effect of a single gene; it is usually the combined result of the genes in one gene family or of several gene families. Evolutionary or relatedness analysis of one or a few orthologous or conserved genes therefore reflects the evolutionary relationships between species, and the similarities and differences in their morphological and developmental traits, less faithfully than analysis at the level of the gene family. Because genes of the same family often have similar functions, the higher the similarity of a gene family between two species, the closer the relatedness of that family between them, and the more similar the corresponding level of gene function or the corresponding morphological and developmental traits. This can be used to infer the level of gene function in a species and to select species by the corresponding morphological and developmental traits.
For example, the NBS gene family is a well-known family of plant disease-resistance genes. Suppose species A is known to have strong disease resistance and species B, C and D are uncharacterized, and one wants to know which of B, C and D has the stronger disease resistance. With the method of the present invention, the species whose NBS gene family is most closely related to that of A, and which is therefore expected to have relatively strong disease resistance, can be screened out from B, C and D. This avoids tedious, lengthy, large-scale and costly laboratory screening, saves labor and materials, and can greatly improve the efficiency of breeding selection so that superior or target varieties are obtained sooner. As genome sequencing technology advances, the whole genomes of more and more species are being sequenced, so complete gene families can be obtained for more species, and the method of the present invention can then screen more quickly for species with particular morphological and developmental traits and particular application value. In addition, for species formed by heterologous hybridization the method can reflect the genetic and evolutionary relationship of a gene family between the hybrid and its two parent species, and for closely related species it can reflect the evolutionary relationship of a gene family among them.
Summary of the invention
To reveal the evolutionary relationship or relatedness of the same gene family between different species, to understand the level of function of a gene family across species, and to improve the selection of species with particular morphological and developmental traits, the present invention proposes a method for calculating gene family relatedness between species.
A method for calculating gene family relatedness between species, comprising:
Determining a gene family to be analyzed as the specified gene family. The genes of a gene family usually all contain the same domain, and a domain is a relatively conserved stretch of amino acid sequence;
Selecting a reference species and obtaining the genes of the specified gene family in that species as the reference genes. The genes of the specified gene family can be obtained by submitting the gene sequences of the species to an online website or to software that analyzes the domains they contain; a gene that contains the domain of the specified gene family belongs to that family. Examples of online websites are the NCBI domain analysis tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the Pfam search tool of EMBL-EBI (http://pfam.xfam.org/search); an example of software is the locally installed HMMER program provided by EMBL-EBI;
Determining a first investigated species and obtaining the genes of the specified gene family in that species as the first investigated genes, in the same way as described above for the reference species;
Determining a second investigated species and obtaining the genes of the specified gene family in that species as the second investigated genes, in the same way as described above for the reference species;
Performing pairwise sequence alignment of the reference genes against the first investigated genes, and obtaining from the alignment result the number of non-redundant reference-species genes that exceed a threshold condition, as the non-redundant reference gene count of the first investigated species. A threshold condition is a combination of a matching sequence length and a matching sequence similarity value; exceeding the threshold condition means exceeding both the matching sequence length and the matching sequence similarity value. The number of threshold conditions is greater than or equal to 2. The alignment can be carried out with ClustalX, ClustalW or BLAST. The alignment result generally contains, for each pair of sequences, the matching sequence length and either the matching sequence similarity value or the matching sequence distance value; the similarity value and the distance value sum to 1. When the alignment result is large, the non-redundant reference gene count can be obtained with a Perl program. Non-redundant genes are genes counted without repetition;
Performing pairwise sequence alignment of the reference genes against the second investigated genes, and obtaining from the alignment result the number of non-redundant reference-species genes that exceed the threshold condition, as the non-redundant reference gene count of the second investigated species;
Calculating, for each threshold condition, the difference between the non-redundant reference gene count of the first investigated species and that of the second investigated species, and identifying the threshold condition at which the absolute value of the difference is largest; the investigated species with the larger non-redundant reference gene count at that condition is the one whose specified gene family is more closely related to that of the reference species.
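The decision rule in the last step can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the original text: T is the set of threshold conditions, and N_1(t) and N_2(t) are the non-redundant reference gene counts of the first and second investigated species under threshold condition t. The case of a zero difference at t* is not addressed in the text and is left out here.

```latex
\begin{align}
\Delta(t) &= N_1(t) - N_2(t), \qquad t \in T,\ |T| \ge 2 \\
t^{*} &= \arg\max_{t \in T} \lvert \Delta(t) \rvert \\
\text{closer species} &=
  \begin{cases}
    \text{first investigated species},  & \Delta(t^{*}) > 0 \\
    \text{second investigated species}, & \Delta(t^{*}) < 0
  \end{cases}
\end{align}
```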
Preferably, the reference species and the investigated species have had their whole genomes sequenced. The more complete the gene family information obtained from whole-genome sequencing, the more reliable the subsequent analysis.
Preferably, the sequences are amino acid sequences.
Preferably, the matching sequence length in the threshold condition is two thirds of the length of the domain sequence of the specified gene family. Setting the matching sequence length threshold too low reduces the reliability of the subsequent analysis, and a match covering two thirds of the domain sequence can usually be regarded as a reasonably complete domain.
Preferably, the matching sequence similarity value in the threshold condition is at least 30%; two sequences with at least 30% similarity can usually be considered likely to have similar functions.
Preferably, the matching sequence similarity values of at least 30% include 30%, 40%, 50%, 60%, 70%, 80% and 90%.
Preferably, the number of threshold conditions is 7. The matching sequence similarity values of the 7 threshold conditions are 30%, 40%, 50%, 60%, 70%, 80% and 90% respectively, and the matching sequence length is two thirds of the length of the domain sequence of the specified gene family.
Preferably, the non-redundant gene count is either the number of non-redundant genes or the ratio of that number to the number of genes of the specified gene family in the reference species.
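A minimal sketch of the counting step, written in Python rather than the Perl the text mentions. The `Hit` record layout, the function names and the toy data are assumptions made for illustration; real ClustalX or BLAST output would first need to be parsed into this form.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment record (hypothetical layout)."""
    reference_gene: str      # gene ID from the reference species
    investigated_gene: str   # gene ID from the investigated species
    match_length: int        # matching sequence length, in residues
    similarity: float        # matching sequence similarity, 0.0 to 1.0


def nonredundant_reference_genes(hits: Iterable[Hit], min_length: int,
                                 min_similarity: float) -> set:
    """Reference genes with at least one hit exceeding both thresholds.

    A set is used so that a reference gene matched by several
    investigated genes is counted only once (non-redundant).
    """
    return {
        h.reference_gene
        for h in hits
        if h.match_length > min_length and h.similarity > min_similarity
    }


def nonredundant_count(hits, min_length, min_similarity,
                       reference_family_size=None):
    """Return the count, or the ratio if the reference family size is given."""
    n = len(nonredundant_reference_genes(hits, min_length, min_similarity))
    if reference_family_size is None:
        return n
    return n / reference_family_size


# Toy data, invented for illustration only.
hits = [
    Hit("ref1", "inv1", 250, 0.92),
    Hit("ref1", "inv2", 240, 0.85),   # same reference gene: counted once
    Hit("ref2", "inv1", 150, 0.95),   # too short: fails the length threshold
    Hit("ref3", "inv3", 210, 0.45),
]
print(nonredundant_count(hits, 200, 0.30))      # count at similarity > 30%
print(nonredundant_count(hits, 200, 0.30, 4))   # ratio for a family of 4 genes
```

Calling `nonredundant_count` once per threshold condition gives the series of counts that the difference step compares.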
The number of investigated species can of course be greater than 2. The method of the present invention can find, among all investigated species, the one whose specified gene family is most closely related to that of the reference species: first determine, for any two investigated species, which is more closely related to the reference species for the specified gene family; then compare that species with another investigated species in the same way; and so on, until the investigated species most closely related to the reference species for the specified gene family is obtained.
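The successive pairwise comparison for more than two investigated species can be sketched as follows. This is an illustrative Python sketch, not the authors' program: the function names, the species labels and the counts are invented, and resolving a tie in favor of the current best species is an assumption, since the text does not cover ties.

```python
def closer_of_two(counts_a, counts_b):
    """Return 'a' or 'b': the side with the larger count at the threshold
    condition where the absolute difference is largest.

    Both lists hold one non-redundant reference gene count per threshold
    condition, in the same order. A zero difference returns 'a' (assumption).
    """
    diffs = [a - b for a, b in zip(counts_a, counts_b)]
    largest = max(diffs, key=abs)
    return "a" if largest >= 0 else "b"


def closest_species(counts_by_species):
    """Keep the winner of each pairwise comparison and compare it with
    the next investigated species until all have been considered."""
    names = list(counts_by_species)
    best = names[0]
    for challenger in names[1:]:
        result = closer_of_two(counts_by_species[best],
                               counts_by_species[challenger])
        if result == "b":
            best = challenger
    return best


# Invented counts for three investigated species under three thresholds.
counts_by_species = {
    "B": [10, 20, 30],
    "C": [12, 35, 31],
    "D": [11, 30, 33],
}
print(closest_species(counts_by_species))
```

Because each comparison uses its own threshold of largest difference, the outcome could in principle depend on the order in which species are compared; the text starts from "any two" species and does not discuss this.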
Compared with the relatedness or evolutionary relationship between individual genes, whether across species or within one species, the relatedness of the same gene family between different species has greater practical value. The former only shows the relatedness or evolutionary relationship between genes, whereas the latter can be used to compare and understand the level of function of the same gene family in different species, and thereby to understand or predict the degree of the corresponding morphological and developmental trait in each species, so that superior or target species can be screened faster, more accurately and more efficiently. In addition, for species formed by heterologous hybridization the method can reflect the genetic and evolutionary relationship of a gene family between the hybrid and its two parent species, and for closely related species it can reflect the evolutionary relationship of a gene family among them.
Brief description of the drawings
Fig. 1 is a flow diagram of a preferred embodiment of the method of the present invention for calculating gene family relatedness between species.
Detailed description of the embodiments
The present invention is described in detail below with reference to embodiments. These embodiments are illustrative only and do not limit the scope of application of the invention. The invention is not limited to the following embodiments, and all modifications and variations made without departing from the spirit of the invention fall within its scope.
Experimental example 1: calculating the relatedness of the NBS gene family among sea island cotton, upland cotton and Raimondii cotton (Gossypium raimondii)
1. Material sources: the genomic data of upland cotton and Raimondii cotton come from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 76,943 and 40,976 genes respectively; the genomic data of sea island cotton come from Huazhong Agricultural University (http://cotton.cropdb.org/) and contain 109,918 genes.
2. Methods and steps
First, the protein sequences (that is, the amino acid sequences) of all genes in the genomic data of sea island cotton, upland cotton and Raimondii cotton are submitted to the Pfam search tool of EMBL-EBI (http://pfam.xfam.org/search) to predict the domain information of each gene.
Second, because the NBS gene family consists of genes containing the NB-ARC domain (domain number or Pfam number PF00931), a Perl program is used to extract, from the first-step results for sea island cotton, upland cotton and Raimondii cotton, the genes whose domain information includes PF00931 or NB-ARC; these are the NBS gene family members. The NBS gene families obtained for sea island cotton, upland cotton and Raimondii cotton contain 682, 588 and 365 NBS genes respectively.
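The family-membership filter of this step might look like the sketch below. It is written in Python instead of Perl, and it assumes a simplified three-column, tab-separated table (gene ID, Pfam accession, domain name); that layout, the gene IDs and the accession version suffixes are made up for illustration and are not the actual output format of the Pfam search tool.

```python
import csv
import io


def genes_with_domain(rows, accession="PF00931", name="NB-ARC"):
    """Collect gene IDs whose domain annotation matches the accession or name.

    rows yields (gene_id, pfam_accession, domain_name) records. Any version
    suffix on the accession (for example ".22") is ignored.
    """
    members = set()
    for gene_id, pfam_acc, domain_name in rows:
        if pfam_acc.split(".")[0] == accession or domain_name == name:
            members.add(gene_id)
    return members


# Hypothetical domain table; gene IDs and version suffixes are invented.
table = io.StringIO(
    "gene1\tPF00931.22\tNB-ARC\n"
    "gene1\tPF13855.6\tLRR_8\n"
    "gene2\tPF00069.25\tPkinase\n"
    "gene3\tPF00931.22\tNB-ARC\n"
)
rows = csv.reader(table, delimiter="\t")
nbs_genes = genes_with_domain(rows)
print(sorted(nbs_genes))
```

Using a set means a gene with several annotated domains is still counted once as a family member.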
Third, 7 threshold conditions are set. Their matching sequence similarity values are 90%, 80%, 70%, 60%, 50%, 40% and 30% respectively, and their matching sequence length is 200 (the NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching sequence length threshold for pairwise alignment is set to two thirds of the domain, i.e. 200).
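As a small illustration, the 7 threshold conditions can be built from the domain length and the similarity steps given above. The variable names are assumptions, and the domain length of 300 is the approximate figure stated in the text.

```python
# Approximate NB-ARC domain length in amino acids, as stated in the text.
domain_length = 300

# Matching sequence length threshold: two thirds of the domain length.
min_match_length = domain_length * 2 // 3

# Similarity thresholds in percent: 90, 80, ..., 30.
similarity_percents = list(range(90, 20, -10))

# Each threshold condition pairs the length threshold with one similarity.
threshold_conditions = [(min_match_length, s) for s in similarity_percents]

for length, similarity in threshold_conditions:
    print(f"match length > {length}, similarity > {similarity}%")
```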
Fourth, the NBS genes of Raimondii cotton (the reference species) and the NBS genes of sea island cotton (the first investigated species) are aligned pairwise with ClustalX. The generated nj file contains, for each pair of sequences, the matching sequence length and the matching sequence distance value; the matching sequence similarity value is 1 minus the distance value. A Perl program is then used to obtain, for each of the 7 threshold conditions of the third step, the number of non-redundant Raimondii cotton genes exceeding that condition, referred to as the Raimondii cotton non-redundant gene count of sea island cotton. With 7 threshold conditions, 7 such counts are obtained. The results are shown in Table 1: from 90% to 30% similarity, the Raimondii cotton non-redundant gene counts of sea island cotton are 216, 310, 339, 345, 348, 350 and 352, which account for 59%, 85%, 93%, 95%, 95%, 96% and 96% of the Raimondii cotton NBS genes respectively.
Fifth, the NBS genes of Raimondii cotton (the reference species) and the NBS genes of upland cotton (the second investigated species) are aligned pairwise, and the Raimondii cotton non-redundant gene counts of upland cotton are obtained; the alignment and counting methods are the same as in the fourth step. The 7 resulting counts are shown in Table 1: from 90% to 30% similarity, the Raimondii cotton non-redundant gene counts of upland cotton are 189, 259, 280, 284, 293, 296 and 315, which account for 52%, 71%, 77%, 78%, 80%, 81% and 86% of the Raimondii cotton NBS genes respectively.
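The percentages reported in the fourth and fifth steps follow from dividing each count by the 365 NBS genes of the reference species. The short Python check below reproduces them from the counts quoted in the text; the variable and function names are illustrative.

```python
# NBS genes in the reference species (Raimondii cotton), from the second step.
reference_family_size = 365

# Non-redundant reference gene counts at similarity > 90%, 80%, ..., 30%.
counts = {
    "sea island cotton": [216, 310, 339, 345, 348, 350, 352],
    "upland cotton": [189, 259, 280, 284, 293, 296, 315],
}


def to_percent(count, total):
    """Share of the reference gene family, rounded to a whole percent."""
    return round(100 * count / total)


percents = {
    species: [to_percent(c, reference_family_size) for c in values]
    for species, values in counts.items()
}
for species, values in percents.items():
    print(species, values)
```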
Sixth, the difference between the Raimondii cotton non-redundant gene counts of sea island cotton and upland cotton is calculated for each threshold condition. As shown in Table 1, the difference is largest when similarity is greater than 60%, where sea island cotton has 61 more Raimondii cotton non-redundant genes than upland cotton (17 percentage points higher in proportion). Moreover, under every threshold condition sea island cotton has more Raimondii cotton non-redundant genes than upland cotton, and the differences in proportion are relatively large, indicating that the NBS gene family of sea island cotton is more closely related to the NBS gene family of Raimondii cotton.
Table 1. Number of Raimondii cotton non-redundant genes for sea island cotton and upland cotton under different threshold conditions
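Applied to the counts quoted in the fourth and fifth steps, the comparison of the sixth step can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, with illustrative variable names; it is not the program used by the authors.

```python
similarity_percent = [90, 80, 70, 60, 50, 40, 30]
sea_island = [216, 310, 339, 345, 348, 350, 352]
upland = [189, 259, 280, 284, 293, 296, 315]

# Signed difference per threshold condition: sea island minus upland.
diffs = [a - b for a, b in zip(sea_island, upland)]

# Index of the threshold condition with the largest absolute difference.
idx = max(range(len(diffs)), key=lambda i: abs(diffs[i]))

# The species with the larger count at that threshold is the closer one.
winner = "sea island cotton" if diffs[idx] > 0 else "upland cotton"

print(diffs)
print(f"largest difference: {abs(diffs[idx])} at similarity > "
      f"{similarity_percent[idx]}%, closer species: {winner}")
```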
NBS genes are disease-resistance genes, and verticillium wilt is the most serious disease affecting cotton yield, so much so that it can be called the "cancer" of cotton. Studies have in fact shown that Raimondii cotton and sea island cotton both have relatively strong resistance to cotton verticillium wilt, whereas upland cotton is susceptible to infection by the verticillium wilt pathogen. The result of the method of the present invention, that the NBS gene family of sea island cotton is more closely related to that of Raimondii cotton, indicates that sea island cotton and Raimondii cotton have more similar morphological and developmental traits associated with the NBS gene family, namely their disease resistance level, which agrees with the observed disease resistance of cotton. In addition, because sea island cotton and upland cotton are new species formed by heterologous hybridization between Raimondii cotton and Asiatic cotton, the results of the present invention suggest that Raimondii cotton passed more of its NBS gene family genes to sea island cotton, so the disease resistance of sea island cotton is inferred to be closer to that of Raimondii cotton, which again agrees with the observed disease resistance of cotton. Thus the evolutionary relationship or relatedness of the same gene family between different species is closely linked to the level of function of the gene family and to the level of the corresponding morphological and developmental traits of the species, which indicates that the method of the present invention can be applied with fairly high confidence.
Experimental example 2: calculating the relatedness of the NBS gene family among sea island cotton, upland cotton and Asiatic cotton
1. Material sources: the genomic data of upland cotton and Asiatic cotton come from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 76,943 and 40,134 genes respectively; the genomic data of sea island cotton come from Huazhong Agricultural University (http://cotton.cropdb.org/) and contain 109,918 genes.
2. Methods and steps
First, the protein sequences (that is, the amino acid sequences) of all genes in the genomic data of sea island cotton, upland cotton and Asiatic cotton are submitted to the Pfam search tool of EMBL-EBI (http://pfam.xfam.org/search) to predict the domain information of each gene.
Second, because the NBS gene family consists of genes containing the NB-ARC domain (domain number or Pfam number PF00931), a Perl program is used to extract, from the first-step results for sea island cotton, upland cotton and Asiatic cotton, the genes whose domain information includes PF00931 or NB-ARC; these are the NBS gene family members. The NBS gene families obtained for sea island cotton, upland cotton and Asiatic cotton contain 682, 588 and 246 NBS genes respectively.
Third, 7 threshold conditions are set. Their matching sequence similarity values are 90%, 80%, 70%, 60%, 50%, 40% and 30% respectively, and their matching sequence length is 200 (the NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching sequence length threshold for pairwise alignment is set to two thirds of the domain, i.e. 200).
Fourth, the NBS genes of Asiatic cotton (the reference species) and the NBS genes of sea island cotton (the first investigated species) are aligned pairwise with ClustalX. The generated nj file contains, for each pair of sequences, the matching sequence length and the matching sequence distance value; the matching sequence similarity value is 1 minus the distance value. A Perl program is then used to obtain, for each of the 7 threshold conditions of the third step, the number of non-redundant Asiatic cotton genes exceeding that condition, referred to as the Asiatic cotton non-redundant gene count of sea island cotton. With 7 threshold conditions, 7 such counts are obtained. The results are shown in Table 2: from 90% to 30% similarity, the Asiatic cotton non-redundant gene counts of sea island cotton are 115, 185, 212, 219, 222, 225 and 226, which account for 47%, 75%, 86%, 89%, 90%, 91% and 92% of the Asiatic cotton NBS genes respectively.
Fifth, the NBS genes of Asiatic cotton (the reference species) and the NBS genes of upland cotton (the second investigated species) are aligned pairwise, and the Asiatic cotton non-redundant gene counts of upland cotton are obtained; the alignment and counting methods are the same as in the fourth step. The 7 resulting counts are shown in Table 2: from 90% to 30% similarity, the Asiatic cotton non-redundant gene counts of upland cotton are 140, 202, 211, 214, 219, 221 and 225, which account for 57%, 82%, 86%, 87%, 89%, 90% and 91% of the Asiatic cotton NBS genes respectively.
Sixth, the difference between the Asiatic cotton non-redundant gene counts of sea island cotton and upland cotton is calculated for each threshold condition. As shown in Table 2, the difference is largest when similarity is greater than 90%, where upland cotton has 25 more Asiatic cotton non-redundant genes than sea island cotton (10 percentage points higher in proportion), followed by similarity greater than 80%, where upland cotton has 17 more (7 percentage points higher in proportion). Although under the remaining threshold conditions sea island cotton has more Asiatic cotton non-redundant genes than upland cotton, the differences are quite small and not significant. It is therefore determined that upland cotton, which has the larger Asiatic cotton non-redundant gene count at the threshold condition with the largest difference, is more closely related to Asiatic cotton in its NBS gene family.
Table 2. Number of Asiatic cotton non-redundant genes for sea island cotton and upland cotton under different threshold conditions
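The sign change across thresholds in this example is easier to see when the differences are written out. In the notation below, which is introduced here for illustration only, Δ(t) is the sea island cotton count minus the upland cotton count at similarity threshold t, using the counts quoted in the fourth and fifth steps.

```latex
\begin{align}
\Delta(90\%) &= 115 - 140 = -25 \\
\Delta(80\%) &= 185 - 202 = -17 \\
\Delta(70\%) &= 212 - 211 = 1 \\
\Delta(60\%) &= 219 - 214 = 5 \\
\Delta(50\%) &= 222 - 219 = 3 \\
\Delta(40\%) &= 225 - 221 = 4 \\
\Delta(30\%) &= 226 - 225 = 1 \\
\max_{t} \lvert \Delta(t) \rvert &= 25 \quad \text{at } t = 90\%, \qquad \Delta(90\%) < 0
\end{align}
```

Since the largest absolute difference occurs at 90% and is negative, upland cotton has the larger count there, which matches the conclusion of the sixth step.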
Since NBS gene is disease-resistant related gene, in fact, studies have shown that Asiatic cotton and upland cotton are vulnerable to verticillium wilt Bacterium is infected, and sea island cotton can be immunized, and the method for the present invention analyzes the NBS gene family of upland cotton and Asia as the result is shown The affiliation of the NBS gene family of cotton is closer, shows that upland cotton has more similar NBS gene family corresponding with Asiatic cotton Morphological development feature, i.e., disease-resistant level are consistent with cotton disease resistance reality.In addition, since sea island cotton and upland cotton are thunder Cover the new species that Heterologous Hybridization is formed between De Shi cotton and Asiatic cotton, it is according to the present invention the result shows that, Asiatic cotton is by more NBS Gene family gene genetic has given upland cotton, thus it is speculated that the disease resistance of upland cotton is more close with Asiatic cotton, with showing for cotton disease resistance Truth condition is consistent.It again shows that, the evolutionary relationship of same gene family or affiliation are with gene family function between different plant species Performance level and the corresponding morphological development characteristic level of species have close relationship, show the method for the present invention application with higher Confidence level.
Experimental example 3: calculating the relatedness of the NBS gene family among cocoa, Raimondii cotton and Asiatic cotton
1. Material sources: the genomic data of Raimondii cotton and Asiatic cotton come from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 40,976 and 40,134 genes respectively; the genomic data of cocoa come from the cocoa genome center (http://cocoagendb.cirad.fr/) and contain 46,143 genes.
2. Methods and steps
First, the protein sequences (that is, the amino acid sequences) of all genes in the genomic data of cocoa, Raimondii cotton and Asiatic cotton are submitted to the Pfam search tool of EMBL-EBI (http://pfam.xfam.org/search) to predict the domain information of each gene.
Second, because the NBS gene family consists of genes containing the NB-ARC domain (domain number or Pfam number PF00931), a Perl program is used to extract, from the first-step results for cocoa, Raimondii cotton and Asiatic cotton, the genes whose domain information includes PF00931 or NB-ARC; these are the NBS gene family members. The NBS gene families obtained for cocoa, Raimondii cotton and Asiatic cotton contain 298, 365 and 246 NBS genes respectively.
Third, 7 threshold conditions are set. Their matching sequence similarity values are 90%, 80%, 70%, 60%, 50%, 40% and 30% respectively, and their matching sequence length is 200 (the NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching sequence length threshold for pairwise alignment is set to two thirds of the domain, i.e. 200).
Fourth, the NBS genes of cocoa (the reference species) and the NBS genes of Raimondii cotton (the first investigated species) are aligned pairwise with ClustalX. The generated nj file contains, for each pair of sequences, the matching sequence length and the matching sequence distance value; the matching sequence similarity value is 1 minus the distance value. A Perl program is then used to obtain, for each of the 7 threshold conditions of the third step, the number of non-redundant cocoa genes exceeding that condition, referred to as the cocoa non-redundant gene count of Raimondii cotton. With 7 threshold conditions, 7 such counts are obtained. The results are shown in Table 3: from 90% to 30% similarity, the cocoa non-redundant gene counts of Raimondii cotton are 0, 13, 38, 51, 224, 259 and 279, which account for 0%, 4%, 13%, 17%, 75%, 87% and 94% of the cocoa NBS genes respectively.
Fifth, the NBS genes of cocoa (the reference species) and the NBS genes of Asiatic cotton (the second investigated species) are aligned pairwise, and the cocoa non-redundant gene counts of Asiatic cotton are obtained; the alignment and counting methods are the same as in the fourth step. The 7 resulting counts are shown in Table 3: from 90% to 30% similarity, the cocoa non-redundant gene counts of Asiatic cotton are 0, 10, 30, 110, 221, 257 and 281, which account for 0%, 3%, 10%, 37%, 75%, 86% and 94% of the cocoa NBS genes respectively.
Sixth, the difference between the cocoa non-redundant gene counts of Raimondii cotton and Asiatic cotton is calculated for each threshold condition. As shown in Table 3, the difference is largest when similarity is greater than 60%, where Asiatic cotton has 59 more cocoa non-redundant genes than Raimondii cotton (20 percentage points higher in proportion). Although under most of the remaining threshold conditions Raimondii cotton has more cocoa non-redundant genes than Asiatic cotton, the differences are quite small and not significant. It is therefore determined that Asiatic cotton, which has the larger cocoa non-redundant gene count at the threshold condition with the largest difference, is more closely related to cocoa in its NBS gene family.
Table 3. Numbers of cocoa non-redundant genes of Gossypium raimondii (Lei Mengdeshi cotton) and Asiatic cotton under different threshold conditions
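The comparison in the sixth step can be reproduced from the numbers quoted in the fourth and fifth steps. The sketch below is an illustration in Python, not the authors' program; the function name closer_species and its return format are assumptions made for this example.

```python
# Similarity thresholds (%) and numbers of cocoa non-redundant genes,
# as quoted in the fourth and fifth steps.
THRESHOLDS = [90, 80, 70, 60, 50, 40, 30]
RAIMONDII = [0, 13, 38, 51, 224, 259, 279]
ASIATIC = [0, 10, 30, 110, 221, 257, 281]


def closer_species(thresholds, counts_a, counts_b, name_a, name_b):
    """Return (threshold, absolute difference, species name).

    The threshold chosen is the one where the absolute difference between
    the two counts is largest; the species returned is the one with the
    larger count at that threshold (None if the counts are equal there).
    """
    diffs = [a - b for a, b in zip(counts_a, counts_b)]
    idx = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    if diffs[idx] == 0:
        return thresholds[idx], 0, None
    winner = name_a if diffs[idx] > 0 else name_b
    return thresholds[idx], abs(diffs[idx]), winner


result = closer_species(
    THRESHOLDS, RAIMONDII, ASIATIC, "Gossypium raimondii", "Asiatic cotton"
)
print(result)
```

With these inputs the per-threshold differences (G. raimondii minus Asiatic cotton) are 0, 3, 8, -59, 3, 2 and -2, so the largest absolute difference is at the 60% threshold and the selected species is Asiatic cotton.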
NBS genes are disease-resistance-related genes. Studies have shown that Asiatic cotton and cocoa are both susceptible to infection by the Verticillium wilt pathogen, whereas Gossypium raimondii (Lei Mengdeshi cotton) is immune. The analysis by the method of the present invention shows that the NBS gene family of cocoa is more closely related to that of Asiatic cotton, indicating that cocoa and Asiatic cotton have more similar morphological development characteristics associated with the NBS gene family, namely disease resistance level, which is consistent with the actual disease resistance of cotton. In addition, since G. raimondii and Asiatic cotton are closely related species, and the method of the present invention finds Asiatic cotton and cocoa to be more closely related in their NBS gene families, it can be inferred that the common ancestor of G. raimondii and Asiatic cotton was susceptible to the Verticillium wilt pathogen, and that G. raimondii acquired resistance to Verticillium wilt through rapid evolution of its NBS gene family after diverging from Asiatic cotton. G. raimondii has many more NBS genes than cocoa and Asiatic cotton, possibly because duplication of NBS genes after the divergence from Asiatic cotton expanded the family and changed disease resistance. However, upland cotton, which has a larger number of NBS genes, does not show enhanced disease resistance as a result of that increase. This suggests that the evolution and duplication of the NBS gene family in G. raimondii were not random: to adapt to its environment and survive, the NBS genes conferring stronger disease resistance were duplicated, thereby enhancing disease resistance. The method of the present invention can therefore be applied to the study of gene family evolution and to the screening of genes with related functions. This experimental example again shows that the evolutionary relationship, or kinship, of the same gene family between different species is closely related to the level of gene family function and to the corresponding morphological development characteristics of the species, indicating that the method of the present invention can be applied with high confidence.

Claims (8)

1. A method for calculating gene family kinship between species, characterized by:
determining a gene family to be analyzed as the specified gene family;
selecting a reference species, and obtaining the genes of its specified gene family as reference genes;
determining a first investigated species, and obtaining the genes of its specified gene family as first investigated genes;
determining a second investigated species, and obtaining the genes of its specified gene family as second investigated genes;
performing pairwise sequence alignment between the reference genes and the first investigated genes, and obtaining from the alignment result the quantity of non-redundant genes of the reference species that exceed a threshold condition, as the reference-species non-redundant gene quantity of the first investigated species, wherein the threshold condition is a combination of a matched sequence length and a matched sequence similarity value, exceeding the threshold condition means exceeding both the matched sequence length and the matched sequence similarity value, and the number of threshold conditions is greater than or equal to 2;
performing pairwise sequence alignment between the reference genes and the second investigated genes, and obtaining from the alignment result the quantity of non-redundant genes of the reference species that exceed the threshold condition, as the reference-species non-redundant gene quantity of the second investigated species;
calculating, under each same threshold condition, the difference between the reference-species non-redundant gene quantity of the first investigated species and that of the second investigated species, and determining that the investigated species with the larger reference-species non-redundant gene quantity at the threshold condition where the absolute value of the difference is largest is more closely related to the reference species in terms of the specified gene family.
2. The method for calculating gene family kinship between species according to claim 1, characterized in that the reference species and the investigated species have undergone whole-genome sequencing.
3. The method for calculating gene family kinship between species according to claim 1, characterized in that the sequences are amino acid sequences.
4. The method for calculating gene family kinship between species according to claim 1, characterized in that the matched sequence length in the threshold condition is two-thirds of the domain sequence length of the specified gene family.
5. The method for calculating gene family kinship between species according to claim 1, characterized in that the matched sequence similarity value in the threshold condition is at least 30%.
6. The method for calculating gene family kinship between species according to claim 5, characterized in that the matched sequence similarity values of at least 30% include 30%, 40%, 50%, 60%, 70%, 80% and 90%.
7. The method for calculating gene family kinship between species according to claim 1, characterized in that the number of threshold conditions, being greater than or equal to 2, is 7; the matched sequence similarity values of the 7 threshold conditions are 30%, 40%, 50%, 60%, 70%, 80% and 90%, respectively, and the matched sequence length is two-thirds of the domain sequence length of the specified gene family.
8. The method for calculating gene family kinship between species according to claim 1, characterized in that the non-redundant gene quantity is either the number of non-redundant genes or the ratio of the number of non-redundant genes to the number of genes of the specified gene family in the reference species.
CN201710229007.4A 2017-04-10 2017-04-10 Gene family affiliation calculation method between a kind of species Active CN106980776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710229007.4A CN106980776B (en) 2017-04-10 2017-04-10 Gene family affiliation calculation method between a kind of species

Publications (2)

Publication Number Publication Date
CN106980776A CN106980776A (en) 2017-07-25
CN106980776B true CN106980776B (en) 2019-05-24

Family

ID=59343719

Country Status (1)

Country Link
CN (1) CN106980776B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant