CN106980776B - Method for calculating gene family relatedness between species - Google Patents
- Publication number
- CN106980776B (application CN201710229007.4A)
- Authority
- CN
- China
- Prior art keywords
- species
- gene
- gene family
- cotton
- nonredundancy
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method for calculating gene family relatedness between species. First, a gene family is chosen as the specified gene family. The method then determines which of two investigated species is more closely related to a reference species with respect to that family. The genes of the specified gene family in the reference species are aligned pairwise against the genes of the same family in each investigated species. For each investigated species, the number of non-redundant reference-species genes matched is recorded under multiple threshold conditions. The method identifies the threshold condition at which the difference between the two investigated species' counts has the largest absolute value. The investigated species with the larger count at that threshold is taken to be more closely related to the reference species in the specified gene family. That species is considered to share more similar corresponding morphological development features with the reference species. The method can be applied to screening dominant or target species, and it can also reflect the evolutionary relationships of gene families between different species.
Description
Technical field
The present invention relates to the field of bioinformatics, and in particular to a method for calculating gene family relatedness between species.
Background art
Research on phylogenetic relationships and relatedness is an important area of bioinformatics. An important way to study species evolution is to align the amino acid or nucleic acid sequences of orthologous or conserved genes from important species. The evolutionary relationships or relatedness between the genes are inferred from how the sequences have changed, and these in turn are used to infer the evolutionary relationships or relatedness between the species.
A gene family is a set of genes that share the same domain sequence (a conserved stretch of amino acid sequence). Some gene families contain hundreds or even thousands of genes. Because genes of the same family share a domain, they often have similar biological functions, and the higher the sequence similarity between two genes, the closer their functions tend to be. Current evolutionary studies cover single or a few orthologous or conserved genes, or the genes within one gene family. In either case, a phylogenetic tree is usually built after sequence alignment, and this shows only the evolutionary relationships or relatedness between individual genes. Each species is an independent whole, and species commonly share the same gene families. However, no evolutionary or relatedness study has been reported that treats a gene family within a species as a single unit and compares that unit across species. A morphological development feature of a species is seldom the effect of a single gene. It is usually the combined result of one gene family or of several gene families. Evolutionary or relatedness analysis of whole gene families can therefore reflect two things more faithfully than analysis of a single or a few orthologous or conserved genes: the evolutionary relationships between species, and the similarities and differences in their morphological development features.

Genes of the same family often have similar functions. The more similar a gene family is between two species, the closer the relatedness of that family between them, and the more similar their gene function levels and corresponding morphological development features are likely to be. This can guide the selection of species by inferred gene function level and morphological development features. For example, the NBS gene family is a well-known plant gene family associated with disease resistance. Suppose species A is known to be highly disease resistant, while the disease resistance of species B, C and D is unknown. With the method of the invention, the species whose NBS gene family is most closely related to that of A can be selected from B, C and D as the one likely to have stronger disease resistance. This avoids tedious, lengthy, large-scale and costly screening experiments, saves labor and materials, and greatly improves the efficiency of breeding and screening for superior or target varieties. As genome sequencing advances, the whole genomes of more and more species are being sequenced. This makes it possible to obtain complete gene families for several species and so to screen more quickly for species with specific morphological development features and application value. In addition, for a species formed by interspecific hybridization, the method can reveal the genetic and evolutionary relationship of a gene family between that species and its two parent species. For closely related species, it can reveal the evolutionary relationship of a gene family among them.
Summary of the invention
The present invention provides a method for calculating gene family relatedness between species. Its purposes are to determine the evolutionary relationships or relatedness of the same gene family between different species, to understand the functional level of a gene family in each species, and to improve the selection of species with specific morphological development features.
A method for calculating gene family relatedness between species comprises:
determining a gene family to be analyzed as the specified gene family (the genes of a gene family usually all contain the same domain, which is a relatively conserved stretch of amino acid sequence);
selecting a reference species and obtaining the genes of its specified gene family as reference genes. Family members are identified by submitting the species' gene sequences to an online service or to local software that analyzes the domains they contain. A gene that contains the domain defining the specified gene family belongs to that family. Suitable online services include the NCBI Conserved Domain search tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the EMBL-EBI Pfam search tool (http://pfam.xfam.org/search). Suitable local software includes a local HMMER installation as provided by EMBL-EBI;
determining a first investigated species and obtaining the genes of its specified gene family as first investigated genes, using the same procedure as for the reference species;
determining a second investigated species and obtaining the genes of its specified gene family as second investigated genes, using the same procedure as for the reference species;
aligning the reference genes pairwise with the first investigated genes. From the alignment results, count the non-redundant reference-species genes whose matches exceed a threshold condition; this count is the first investigated species' reference-species non-redundant gene count. A threshold condition combines a matching sequence length and a matching sequence similarity value. A match exceeds the condition when its matching sequence length is greater than the length threshold and its matching sequence similarity is greater than the similarity threshold. At least two threshold conditions are used. Pairwise alignment can be performed with ClustalX, ClustalW or BLAST. For each pair of sequences, the output generally gives the matching sequence length together with either the matching sequence similarity or the matching sequence distance; similarity and distance sum to 1. When the alignment output is large, the non-redundant gene counts can be obtained with a Perl script. A non-redundant gene is a gene that is not counted repeatedly, so each reference gene is counted at most once;
aligning the reference genes pairwise with the second investigated genes and, in the same way, counting the non-redundant reference-species genes whose matches exceed each threshold condition; this count is the second investigated species' reference-species non-redundant gene count;
calculating, for each threshold condition, the difference between the reference-species non-redundant gene counts of the first and second investigated species; identifying the threshold condition at which this difference has the largest absolute value; and determining that the investigated species with the larger count at that threshold is more closely related to the reference species in the specified gene family.
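The decision rule above can be sketched in Python. This sketch assumes each investigated species' counts are stored in a dictionary keyed by similarity threshold, with the same length threshold for every condition; the function name is hypothetical. If several thresholds tie for the largest absolute difference, the sketch takes the first one in dictionary order, a choice the description does not specify.

```python
from typing import Dict, Optional, Tuple

Counts = Dict[float, int]  # similarity threshold -> non-redundant reference gene count


def closer_species(
    name_1: str, counts_1: Counts, name_2: str, counts_2: Counts
) -> Tuple[Optional[str], float, int]:
    """Return (closer species or None if counts are identical, deciding threshold, |difference|)."""
    if counts_1.keys() != counts_2.keys():
        raise ValueError("both species must be evaluated under the same thresholds")
    # Threshold at which the two counts differ most in absolute value.
    deciding = max(counts_1, key=lambda t: abs(counts_1[t] - counts_2[t]))
    diff = counts_1[deciding] - counts_2[deciding]
    if diff == 0:
        return None, deciding, 0  # counts equal under every threshold: no decision
    return (name_1 if diff > 0 else name_2), deciding, abs(diff)


# Counts reported in Experimental example 1 (reference species: Raimondii cotton).
thresholds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
sea_island = dict(zip(thresholds, [216, 310, 339, 345, 348, 350, 352]))
upland = dict(zip(thresholds, [189, 259, 280, 284, 293, 296, 315]))
print(closer_species("sea island cotton", sea_island, "upland cotton", upland))
```

With the Experimental example 1 counts, the deciding threshold is 60% similarity, where the difference is 61 genes in favor of sea island cotton.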
Preferably, the whole genomes of the reference species and the investigated species have been sequenced. Gene family information obtained from a whole genome is more complete, so the subsequent analysis is more reliable.
Preferably, the sequences are amino acid sequences.
Preferably, the matching sequence length in the threshold condition is two-thirds of the domain sequence length of the specified gene family. Setting the length threshold too low reduces the confidence of the subsequent results, and a match covering two-thirds of the domain sequence can usually be regarded as a relatively complete domain.
Preferably, the matching sequence similarity value in the threshold condition is at least 30%. Two sequences with at least 30% similarity are generally considered likely to have similar functions.
Preferably, the matching sequence similarity values of at least 30% include 30%, 40%, 50%, 60%, 70%, 80% and 90%.
Preferably, seven threshold conditions are used. Their matching sequence similarity values are 30%, 40%, 50%, 60%, 70%, 80% and 90%, and each uses a matching sequence length of two-thirds of the domain sequence length of the specified gene family.
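The preferred settings can be generated from the domain length. The Python helper below is a hypothetical illustration. It rounds the two-thirds length down with integer division, which is an assumption, so a domain of about 300 residues gives the length threshold of 200 used in the experimental examples.

```python
from typing import List, Sequence, Tuple

# Preferred similarity thresholds, listed from strictest to loosest.
PREFERRED_SIMILARITIES = (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3)


def threshold_conditions(
    domain_length: int, similarities: Sequence[float] = PREFERRED_SIMILARITIES
) -> List[Tuple[int, float]]:
    """Pair a length threshold of 2/3 of the domain length with each similarity value."""
    min_length = domain_length * 2 // 3  # two-thirds of the domain, rounded down
    return [(min_length, s) for s in similarities]


print(threshold_conditions(300))
```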
Preferably, the non-redundant gene count is either the number of non-redundant genes or the ratio of that number to the number of genes in the reference species' specified gene family.
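The ratio form is a simple normalization. Here is a Python sketch with a hypothetical function name, checked against one figure from Experimental example 1, where 216 of the 365 Raimondii cotton NBS genes is reported as 59%:

```python
def fraction_of_reference_family(nonredundant_count: int, family_size: int) -> float:
    """Share of the reference species' specified-family genes that were matched."""
    if family_size <= 0:
        raise ValueError("family_size must be positive")
    if not 0 <= nonredundant_count <= family_size:
        raise ValueError("count must lie between 0 and family_size")
    return nonredundant_count / family_size


print(f"{fraction_of_reference_family(216, 365):.0%}")  # 59%
```

Both investigated species are measured against the same reference family size. Using the ratio instead of the raw count therefore changes neither the deciding threshold nor the species selected.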
The number of investigated species can of course be greater than two. The method can then identify, among all investigated species, the one whose specified gene family is most closely related to that of the reference species. First, treat any two investigated species as the first and second investigated species, and determine which is more closely related to the reference species in the specified gene family. Then compare that species with another investigated species in the same way, and so on, until the investigated species most closely related to the reference species is obtained.
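The sequential procedure for more than two investigated species can be sketched in Python as a running comparison that keeps the current winner, using the same largest-absolute-difference rule at each step. The data layout and function names are assumptions. This sketch visits species in the order given and keeps the current species on a tie. Because each step compares only two species, the outcome could in principle depend on that order.

```python
from typing import Dict

Counts = Dict[float, int]  # similarity threshold -> non-redundant reference gene count


def prefer(current: Counts, challenger: Counts) -> bool:
    """True if the challenger has the larger count at the threshold of largest |difference|."""
    deciding = max(current, key=lambda t: abs(current[t] - challenger[t]))
    return challenger[deciding] > current[deciding]


def closest_species(counts_by_species: Dict[str, Counts]) -> str:
    """Compare investigated species pairwise in the given order, keeping the winner each time."""
    names = list(counts_by_species)
    if not names:
        raise ValueError("at least one investigated species is required")
    best = names[0]
    for challenger in names[1:]:
        if prefer(counts_by_species[best], counts_by_species[challenger]):
            best = challenger  # ties keep the current species
    return best


# Hypothetical counts for three investigated species under two thresholds.
demo = {
    "A": {0.9: 5, 0.5: 9},
    "B": {0.9: 7, 0.5: 9},
    "C": {0.9: 2, 0.5: 10},
}
print(closest_species(demo))  # B
```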
Compared with the relatedness or evolutionary relationship of individual genes, whether between species or within one species, the relatedness of the same gene family between different species has greater application value. The former shows only the relatedness or evolutionary relationship between genes. The latter can be used to compare and understand the functional level of the same gene family in different species. From this, the degree to which each species expresses the morphological development features associated with that family can be understood or predicted, so that dominant or target species can be screened faster, more accurately and more efficiently. In addition, for a species formed by interspecific hybridization, the method can reveal the genetic and evolutionary relationship of a gene family between that species and its two parent species. For closely related species, it can reveal the evolutionary relationship of a gene family among them.
Description of the drawings
Fig. 1 is a flow diagram of a preferred embodiment of the method for calculating gene family relatedness between species according to the present invention.
Specific embodiments
The present invention is described in detail below with reference to examples. These examples are illustrative only and do not limit the scope of the invention. The invention is not limited to the following examples, and all modifications and variations made without departing from its spirit fall within its scope.
Experimental example 1: calculating NBS gene family relatedness among sea island cotton (Gossypium barbadense), upland cotton (G. hirsutum) and Raimondii cotton (G. raimondii)
1. Material source: the genome data of upland cotton and Raimondii cotton came from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 76,943 and 40,976 genes, respectively. The genome data of sea island cotton came from Huazhong Agricultural University (http://cotton.cropdb.org/) and contain 109,918 genes.
2. Method and steps
First, the protein (amino acid) sequences of all genes in the sea island cotton, upland cotton and Raimondii cotton genome data were submitted to the EMBL-EBI Pfam search tool (http://pfam.xfam.org/search) to predict the domains contained in each gene.
Second, the NBS gene family consists of genes containing the NB-ARC domain (Pfam accession PF00931). A Perl script was therefore used to extract, from the domain information obtained in the first step, the sea island cotton, upland cotton and Raimondii cotton genes containing PF00931 (NB-ARC) as NBS gene family members. The NBS gene families obtained for sea island cotton, upland cotton and Raimondii cotton contain 682, 588 and 365 NBS genes, respectively.
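The membership rule in this step can be sketched in Python in place of the Perl script. The sketch assumes the domain predictions have already been exported as (gene ID, Pfam accession) pairs. That input layout, the function name and the gene IDs are illustrative assumptions, not part of the described method.

```python
from typing import Iterable, Set, Tuple


def select_family_members(
    annotations: Iterable[Tuple[str, str]], pfam_accession: str
) -> Set[str]:
    """Return IDs of genes carrying at least one domain with the given Pfam accession.

    A gene may appear several times if it has several domains. Version
    suffixes such as ".25" in "PF00931.25" are ignored when matching.
    """
    wanted = pfam_accession.split(".")[0]
    return {gene for gene, accession in annotations
            if accession.split(".")[0] == wanted}


# Hypothetical annotations; PF00931 is the NB-ARC domain that defines the NBS family here.
demo = [("g1", "PF00931.25"), ("g1", "PF00560.36"),
        ("g2", "PF00069.28"), ("g3", "PF00931")]
print(sorted(select_family_members(demo, "PF00931")))  # ['g1', 'g3']
```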
Third, seven threshold conditions were set, with matching sequence similarity values of 90%, 80%, 70%, 60%, 50%, 40% and 30%, each with a matching sequence length of 200. The NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching length threshold for pairwise alignment was set to two-thirds of the domain, i.e. 200.
Fourth, the NBS genes of Raimondii cotton (reference species) and sea island cotton (first investigated species) were aligned pairwise with ClustalX. The resulting nj file contains the matching sequence length and matching sequence distance for each pair of sequences, and the matching sequence similarity is 1 minus the matching sequence distance. A Perl script was then used to count the non-redundant Raimondii cotton genes exceeding each of the seven threshold conditions set in the third step. This count is called sea island cotton's Raimondii cotton non-redundant gene count, and one count is obtained per threshold condition, seven in all. The results are shown in Table 1. From similarity 90% to 30%, sea island cotton's Raimondii cotton non-redundant gene counts are 216, 310, 339, 345, 348, 350 and 352. These are 59%, 85%, 93%, 95%, 95%, 96% and 96% of the Raimondii cotton NBS genes, respectively.
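The counting in this step can be sketched in Python in place of the Perl script. The sketch assumes the ClustalX output has already been parsed into per-pair records; parsing the nj file is not shown, and the `Hit` type and function names are hypothetical. Both bounds are strict, following the "greater than" wording of the method. A reference gene matched by several investigated genes is counted once.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One pairwise alignment between a reference gene and an investigated gene."""
    reference_gene: str
    investigated_gene: str
    match_length: int   # matching sequence length, in residues
    similarity: float   # matching sequence similarity as a fraction in [0, 1]


def similarity_from_distance(distance: float) -> float:
    """Similarity is 1 minus the matching sequence distance."""
    return 1.0 - distance


def count_nonredundant_reference_genes(
    hits: Iterable[Hit], min_length: int, min_similarity: float
) -> int:
    """Count distinct reference genes with at least one hit above both thresholds."""
    passing: Set[str] = {
        hit.reference_gene
        for hit in hits
        if hit.match_length > min_length and hit.similarity > min_similarity
    }
    return len(passing)


# Hypothetical records: r1 matches twice but is counted once,
# r2 is too short, and r3 passes only the lower similarity threshold.
hits = [
    Hit("r1", "b1", 250, similarity_from_distance(0.08)),
    Hit("r1", "b2", 260, similarity_from_distance(0.05)),
    Hit("r2", "b3", 150, similarity_from_distance(0.01)),
    Hit("r3", "b4", 210, similarity_from_distance(0.45)),
]
print(count_nonredundant_reference_genes(hits, 200, 0.9))  # 1
print(count_nonredundant_reference_genes(hits, 200, 0.5))  # 2
```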
Fifth, the NBS genes of Raimondii cotton (reference species) and upland cotton (second investigated species) were aligned pairwise, and upland cotton's Raimondii cotton non-redundant gene counts were obtained. The alignment and counting methods were the same as in the fourth step, giving seven counts for upland cotton. The results are shown in Table 1. From similarity 90% to 30%, upland cotton's Raimondii cotton non-redundant gene counts are 189, 259, 280, 284, 293, 296 and 315. These are 52%, 71%, 77%, 78%, 80%, 81% and 86% of the Raimondii cotton NBS genes, respectively.
Sixth, the difference between the Raimondii cotton non-redundant gene counts of sea island cotton and upland cotton was calculated under each threshold condition. As shown in Table 1, the largest difference occurs at similarity greater than 60%. There, sea island cotton has 61 more Raimondii cotton non-redundant genes than upland cotton (17 percentage points more). Moreover, sea island cotton's count exceeds upland cotton's under every threshold condition, and the differences in proportion are large. This indicates that the NBS gene family of sea island cotton is more closely related to the NBS gene family of Raimondii cotton.
Table 1: Numbers of Raimondii cotton non-redundant genes for sea island cotton and upland cotton under different threshold conditions
NBS genes are disease-resistance-related genes, and Verticillium wilt is the most serious disease affecting cotton yield; it is sometimes called the "cancer" of cotton. Studies have shown that Raimondii cotton and sea island cotton both have strong resistance to Verticillium wilt, whereas upland cotton is susceptible to infection by the Verticillium wilt pathogen. The analysis by the method of the invention shows that the NBS gene family of sea island cotton is more closely related to that of Raimondii cotton. This indicates that sea island cotton and Raimondii cotton share more similar morphological development features associated with the NBS gene family, i.e. disease resistance levels, which is consistent with the actual disease resistance of these cottons. In addition, sea island cotton and upland cotton are new species formed by interspecific hybridization between Raimondii cotton and Asiatic cotton. The results of the invention indicate that Raimondii cotton passed more NBS gene family genes to sea island cotton. This suggests that the disease resistance of sea island cotton is closer to that of Raimondii cotton, which is again consistent with the actual disease resistance of cotton. The evolutionary relationship or relatedness of the same gene family between different species is therefore closely linked to the functional level of the gene family and to the corresponding morphological development features of the species. This indicates that the method of the invention can be applied with high confidence.
Experimental example 2: calculating NBS gene family relatedness among sea island cotton, upland cotton and Asiatic cotton (Gossypium arboreum)
1. Material source: the genome data of upland cotton and Asiatic cotton came from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 76,943 and 40,134 genes, respectively. The genome data of sea island cotton came from Huazhong Agricultural University (http://cotton.cropdb.org/) and contain 109,918 genes.
2. Method and steps
First, the protein (amino acid) sequences of all genes in the sea island cotton, upland cotton and Asiatic cotton genome data were submitted to the EMBL-EBI Pfam search tool (http://pfam.xfam.org/search) to predict the domains contained in each gene.
Second, the NBS gene family consists of genes containing the NB-ARC domain (Pfam accession PF00931). A Perl script was therefore used to extract, from the domain information obtained in the first step, the sea island cotton, upland cotton and Asiatic cotton genes containing PF00931 (NB-ARC) as NBS gene family members. The NBS gene families obtained for sea island cotton, upland cotton and Asiatic cotton contain 682, 588 and 246 NBS genes, respectively.
Third, seven threshold conditions were set, with matching sequence similarity values of 90%, 80%, 70%, 60%, 50%, 40% and 30%, each with a matching sequence length of 200. The NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching length threshold for pairwise alignment was set to two-thirds of the domain, i.e. 200.
Fourth, the NBS genes of Asiatic cotton (reference species) and sea island cotton (first investigated species) were aligned pairwise with ClustalX. The resulting nj file contains the matching sequence length and matching sequence distance for each pair of sequences, and the matching sequence similarity is 1 minus the matching sequence distance. A Perl script was then used to count the non-redundant Asiatic cotton genes exceeding each of the seven threshold conditions set in the third step. This count is called sea island cotton's Asiatic cotton non-redundant gene count, and seven counts were obtained in all. The results are shown in Table 2. From similarity 90% to 30%, sea island cotton's Asiatic cotton non-redundant gene counts are 115, 185, 212, 219, 222, 225 and 226. These are 47%, 75%, 86%, 89%, 90%, 91% and 92% of the Asiatic cotton NBS genes, respectively.
Fifth, the NBS genes of Asiatic cotton (reference species) and upland cotton (second investigated species) were aligned pairwise, and upland cotton's Asiatic cotton non-redundant gene counts were obtained. The alignment and counting methods were the same as in the fourth step, giving seven counts. The results are shown in Table 2. From similarity 90% to 30%, upland cotton's Asiatic cotton non-redundant gene counts are 140, 202, 211, 214, 219, 221 and 225. These are 57%, 82%, 86%, 87%, 89%, 90% and 91% of the Asiatic cotton NBS genes, respectively.
Sixth, the difference between the Asiatic cotton non-redundant gene counts of sea island cotton and upland cotton was calculated under each threshold condition. As shown in Table 2, the largest difference occurs at similarity greater than 90%. There, upland cotton has 25 more Asiatic cotton non-redundant genes than sea island cotton (10 percentage points more). The next largest occurs at similarity greater than 80%, where upland cotton has 17 more (7 percentage points more). Under the remaining threshold conditions, sea island cotton's count exceeds upland cotton's, but the differences are small and not significant. Upland cotton has the larger Asiatic cotton non-redundant gene count at the threshold of largest difference. It is therefore determined to be more closely related to Asiatic cotton in the NBS gene family.
Table 2: Numbers of Asiatic cotton non-redundant genes for sea island cotton and upland cotton under different threshold conditions
NBS genes are disease-resistance-related genes. Studies have shown that Asiatic cotton and upland cotton are susceptible to infection by the Verticillium wilt pathogen, whereas sea island cotton shows immunity. The analysis by the method of the invention shows that the NBS gene family of upland cotton is more closely related to that of Asiatic cotton. This indicates that upland cotton and Asiatic cotton share more similar morphological development features associated with the NBS gene family, i.e. disease resistance levels, which is consistent with the actual disease resistance of cotton. In addition, sea island cotton and upland cotton are new species formed by interspecific hybridization between Raimondii cotton and Asiatic cotton. The results of the invention indicate that Asiatic cotton passed more NBS gene family genes to upland cotton. This suggests that the disease resistance of upland cotton is closer to that of Asiatic cotton, which is consistent with the actual disease resistance of cotton. This again shows that the evolutionary relationship or relatedness of the same gene family between different species is closely linked to the functional level of the gene family and to the corresponding morphological development features of the species. The method of the invention can therefore be applied with high confidence.
Experimental example 3: calculating NBS gene family relatedness among cocoa (Theobroma cacao), Raimondii cotton and Asiatic cotton
1. Material source: the genome data of Raimondii cotton and Asiatic cotton came from the Cotton Research Institute of the Chinese Academy of Agricultural Sciences (http://cgp.genomics.org.cn/) and contain 40,976 and 40,134 genes, respectively. The genome data of cocoa came from the cocoa genome center (http://cocoagendb.cirad.fr/) and contain 46,143 genes.
2. Method and steps
First, the protein (amino acid) sequences of all genes in the cocoa, Raimondii cotton and Asiatic cotton genome data were submitted to the EMBL-EBI Pfam search tool (http://pfam.xfam.org/search) to predict the domains contained in each gene.
Second, the NBS gene family consists of genes containing the NB-ARC domain (Pfam accession PF00931). A Perl script was therefore used to extract, from the domain information obtained in the first step, the cocoa, Raimondii cotton and Asiatic cotton genes containing PF00931 (NB-ARC) as NBS gene family members. The NBS gene families obtained for cocoa, Raimondii cotton and Asiatic cotton contain 298, 365 and 246 NBS genes, respectively.
Third, seven threshold conditions were set, with matching sequence similarity values of 90%, 80%, 70%, 60%, 50%, 40% and 30%, each with a matching sequence length of 200. The NB-ARC domain of the NBS gene family is about 300 amino acids long, so the matching length threshold for pairwise alignment was set to two-thirds of the domain, i.e. 200.
Fourth, the NBS genes of cocoa (reference species) and Raimondii cotton (first investigated species) were aligned pairwise with ClustalX. The resulting nj file contains the matching sequence length and matching sequence distance for each pair of sequences, and the matching sequence similarity is 1 minus the matching sequence distance. A Perl script was then used to count the non-redundant cocoa genes exceeding each of the seven threshold conditions set in the third step. This count is called Raimondii cotton's cocoa non-redundant gene count, and seven counts were obtained in all. The results are shown in Table 3. From similarity 90% to 30%, Raimondii cotton's cocoa non-redundant gene counts are 0, 13, 38, 51, 224, 259 and 279. These are 0%, 4%, 13%, 17%, 75%, 87% and 94% of the cocoa NBS genes, respectively.
Fifth, the NBS genes of cocoa (reference species) and Asiatic cotton (second investigated species) were aligned pairwise, and Asiatic cotton's cocoa non-redundant gene counts were obtained. The alignment and counting methods were the same as in the fourth step, giving seven counts. The results are shown in Table 3. From similarity 90% to 30%, Asiatic cotton's cocoa non-redundant gene counts are 0, 10, 30, 110, 221, 257 and 281. These are 0%, 3%, 10%, 37%, 75%, 86% and 94% of the cocoa NBS genes, respectively.
Sixth, the difference between the cocoa non-redundant gene counts of Raimondii cotton and Asiatic cotton was calculated under each threshold condition. As shown in Table 3, the largest difference occurs at similarity greater than 60%. There, Asiatic cotton has 59 more cocoa non-redundant genes than Raimondii cotton (20 percentage points more). Under the remaining threshold conditions the two counts differ only slightly (by at most 8 genes, mostly in favor of Raimondii cotton), which is not significant. Asiatic cotton has the larger cocoa non-redundant gene count at the threshold of largest difference. It is therefore determined to be more closely related to cocoa in the NBS gene family.
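The step-six arithmetic can be checked from the counts in the fourth and fifth steps. Here $N_{\mathrm{R}}(t)$ and $N_{\mathrm{A}}(t)$ denote the cocoa non-redundant gene counts of Raimondii cotton and Asiatic cotton at similarity threshold $t$:

$$
\begin{aligned}
\Delta(t) &= N_{\mathrm{A}}(t) - N_{\mathrm{R}}(t),\\
\bigl(\Delta(90\%), \Delta(80\%), \dots, \Delta(30\%)\bigr) &= (0,\ -3,\ -8,\ 59,\ -3,\ -2,\ 2),\\
\max_t \lvert \Delta(t) \rvert &= \Delta(60\%) = 110 - 51 = 59, \qquad 59/298 \approx 19.8\%.
\end{aligned}
$$

Apart from the 60% threshold, no difference exceeds 8 genes in absolute value.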
Table 3: Numbers of cocoa non-redundant genes for Raimondii cotton and Asiatic cotton under different threshold conditions
NBS genes are disease-resistance-related genes. Studies have shown that Asiatic cotton and cocoa are susceptible to infection by the Verticillium wilt pathogen, whereas Raimondii cotton shows immunity. The analysis by the method of the invention shows that the NBS gene family of cocoa is more closely related to that of Asiatic cotton. This indicates that cocoa and Asiatic cotton share more similar morphological development features associated with the NBS gene family, i.e. disease resistance levels, which is consistent with their actual disease resistance. In addition, Raimondii cotton and Asiatic cotton may be regarded as closely related species, and the method shows that the NBS gene family of Asiatic cotton is more closely related to that of cocoa. From this it is inferred that the common ancestor of Raimondii cotton and Asiatic cotton was susceptible to the Verticillium wilt pathogen. After diverging from Asiatic cotton, Raimondii cotton would then have acquired Verticillium wilt resistance through rapid evolution of its NBS gene family. Raimondii cotton has considerably more NBS genes than cocoa and Asiatic cotton. This may be because, after divergence from Asiatic cotton, duplication of NBS genes expanded the family and changed its disease resistance. However, upland cotton has even more NBS genes, yet its disease resistance has not increased with that larger number. This suggests that the evolution and duplication of the Raimondii cotton NBS gene family was not random: to adapt to its environment, those NBS genes conferring stronger disease resistance were duplicated, enhancing disease resistance. The method of the invention can therefore be applied to studies of gene family evolution and to screening functionally relevant genes. This experimental example again shows that the evolutionary relationship or relatedness of the same gene family between different species is closely linked to the functional level of the gene family and to the corresponding morphological development features of the species. The method of the invention can therefore be applied with high confidence.
Claims (8)
1. A method for calculating gene family affiliation between species, characterized by comprising:
determining a gene family to be analyzed as the specified gene family;
selecting a reference species, and obtaining the genes contained in its specified gene family as reference genes;
determining a first investigated species, and obtaining the genes contained in its specified gene family as first investigated genes;
determining a second investigated species, and obtaining the genes contained in its specified gene family as second investigated genes;
performing pairwise alignment between the reference genes and the first investigated genes, and obtaining from the sequence alignment results the number of non-redundant reference-species genes that exceed the threshold condition, as the reference-species non-redundant gene count of the first investigated species; wherein the threshold condition is a combination of a matching sequence length and a matching sequence similarity value, exceeding the threshold condition means being greater than the matching sequence length and greater than the matching sequence similarity value, and the number of threshold conditions is greater than or equal to 2;
performing pairwise alignment between the reference genes and the second investigated genes, and obtaining from the sequence alignment results the number of non-redundant reference-species genes that exceed the threshold condition, as the reference-species non-redundant gene count of the second investigated species;
calculating, under each same threshold condition, the difference between the reference-species non-redundant gene count of the first investigated species and that of the second investigated species; identifying the threshold condition at which the absolute value of the difference is largest; and determining that the investigated species with the larger reference-species non-redundant gene count under that condition is more closely related to the reference species in the specified gene family.
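The counting step of claim 1 can be illustrated with a short Python sketch. The claim does not name an alignment tool, so the sketch assumes BLAST-style tabular output in the 12-column `-outfmt 6` layout. It also assumes the reference genes were used as queries. Under those assumptions, column 1 is the reference gene ID, column 3 the percent identity and column 4 the alignment length. Treating BLAST percent identity as the claim's "similarity" value is a further assumption; a positives-based value (`ppos`) could be substituted. The function name `count_nonredundant` and the gene IDs in the test are hypothetical.

```python
from typing import Iterable, Set


def count_nonredundant(
    hit_lines: Iterable[str], min_length: float, min_identity: float
) -> int:
    """Count distinct reference genes with a hit exceeding both limits.

    Assumes tab-separated BLAST -outfmt 6 lines with the reference gene as
    the query (qseqid, sseqid, pident, length, ...).
    """
    passed: Set[str] = set()
    for line in hit_lines:
        line = line.strip()
        # Skip blank lines and comment lines (e.g. headers from -outfmt 7).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        ref_id = fields[0]            # qseqid: reference gene (assumed query)
        identity = float(fields[2])   # pident, in percent
        length = int(fields[3])       # alignment length
        # "Greater than" the threshold condition: both values strictly larger.
        if length > min_length and identity > min_identity:
            passed.add(ref_id)
    # A reference gene hit by several investigated genes is counted once,
    # which is what makes the count non-redundant.
    return len(passed)
```

Running the function once per threshold condition and per investigated species yields the two series of counts that the final step of the claim compares.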
2. The method for calculating gene family affiliation between species according to claim 1, wherein the reference species and the investigated species are species whose genomes have been sequenced.
3. The method for calculating gene family affiliation between species according to claim 1, wherein the sequences are amino acid sequences.
4. The method for calculating gene family affiliation between species according to claim 1, wherein the matching sequence length in the threshold condition is two-thirds of the domain sequence length of the specified gene family.
5. The method for calculating gene family affiliation between species according to claim 1, wherein the matching sequence similarity value in the threshold condition is at least 30%.
6. The method for calculating gene family affiliation between species according to claim 5, wherein the matching sequence similarity values of at least 30% include 30%, 40%, 50%, 60%, 70%, 80% and 90%.
7. The method for calculating gene family affiliation between species according to claim 1, wherein there are 7 threshold conditions (satisfying the requirement of at least 2). The matching sequence similarity values of the 7 threshold conditions are 30%, 40%, 50%, 60%, 70%, 80% and 90%, respectively. The matching sequence length is two-thirds of the domain sequence length of the specified gene family.
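The seven threshold conditions of claim 7 can be generated programmatically. Below is a minimal Python sketch. It assumes the domain length is an integer number of residues (amino acids, per claim 3) and that the length limit is kept exact rather than rounded; the patent does not say how fractional lengths are handled. The function name `threshold_conditions` is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def threshold_conditions(domain_length: int) -> List[Tuple[Fraction, int]]:
    """Build the 7 (matching length, matching similarity %) pairs of claim 7."""
    if domain_length <= 0:
        raise ValueError("domain length must be positive")
    # Matching sequence length: two-thirds of the domain sequence length,
    # kept as an exact fraction so that "greater than" comparisons are exact.
    min_length = Fraction(2, 3) * domain_length
    # Matching similarity values 30%, 40%, ..., 90%.
    return [(min_length, similarity) for similarity in range(30, 100, 10)]


for length, similarity in threshold_conditions(300):
    print(f"alignment length > {length}, similarity > {similarity}%")
```

Using `Fraction` avoids floating-point rounding at the boundary. For a 250-residue domain, for example, the limit is exactly 500/3. An alignment of 166 residues therefore fails the condition and one of 167 passes it.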
8. The method for calculating gene family affiliation between species according to claim 1, wherein the non-redundant gene count is either the number of non-redundant genes or the ratio of the number of non-redundant genes to the number of genes in the specified gene family of the reference species.
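Claim 8 allows the count to be reported either as a raw number or as a proportion of the reference species' specified gene family. The percentages quoted for Asiatic cotton use the proportion form. A minimal Python sketch of the conversion follows; the function name `as_ratio` is hypothetical.

```python
def as_ratio(nonredundant: int, family_size: int) -> float:
    """Express a non-redundant gene count as a fraction of the reference family."""
    if family_size <= 0:
        raise ValueError("the reference gene family must contain at least one gene")
    if not 0 <= nonredundant <= family_size:
        raise ValueError("the count must lie between 0 and the family size")
    return nonredundant / family_size


print(f"{as_ratio(3, 12):.0%}")  # prints 25%
```

Both investigated species are compared against the same reference family. Dividing by the family size therefore rescales every difference by the same positive factor. As a result, the threshold with the largest absolute difference, and the species judged closer, are the same whichever form of the count is used.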
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710229007.4A CN106980776B (en) | 2017-04-10 | 2017-04-10 | Gene family affiliation calculation method between a kind of species |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106980776A (en) | 2017-07-25 |
CN106980776B (en) | 2019-05-24 |
Family ID: 59343719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710229007.4A Active CN106980776B (en) | 2017-04-10 | 2017-04-10 | Gene family affiliation calculation method between a kind of species |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106980776B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111445954B (en) * | 2020-04-01 | 2023-09-01 | 广州基迪奥生物科技有限公司 | Method for identifying multiple gene families and carrying out evolutionary analysis |
CN113628684A (en) * | 2021-08-06 | 2021-11-09 | 苏州鸿晓生物科技有限公司 | Sample bacterial species detection methods and systems |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104450898A (en) * | 2014-11-26 | 2015-03-25 | 江苏出入境检验检疫局动植物与食品检测中心 | Species identification method of euproctis insects |
CN104546938A (en) * | 2014-09-30 | 2015-04-29 | 深圳华大基因科技有限公司 | Application of extremely giant megamonas in treatment or prevention of rheumatoid arthritis or related diseases thereof |
CN104603283A (en) * | 2012-08-01 | 2015-05-06 | 深圳华大基因研究院 | Method and system to determine biomarkers related to abnormal condition |
CN105063761A (en) * | 2015-09-02 | 2015-11-18 | 云南大学 | Method for identifying predator nematophagous hyphomycete arthrobotrys through DNA bar codes |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2681515A1 (en) * | 2007-03-23 | 2008-10-02 | Basf Plant Science Gmbh | Transgenic plants with increased stress tolerance and yield expressing a lrp-2 protein |
Non-Patent Citations (1)
Title |
---|
Mincheol Kim et al., "Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes", International Journal of Systematic and Evolutionary Microbiology, 2014, pp. 346-351 |
Also Published As
Publication number | Publication date |
---|---|
CN106980776A (en) | 2017-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Marzano et al. | Novel mycoviruses discovered from metatranscriptomics survey of soybean phyllosphere phytobiomes | |
Martin et al. | Quarantine regulations and the impact of modern detection methods | |
Claverie et al. | From spatial metagenomics to molecular characterization of plant viruses: a geminivirus case study | |
Jo et al. | Peach RNA viromes in six different peach cultivars | |
Tanweer et al. | Current advance methods for the identification of blast resistance genes in rice | |
Hospital | Challenges for effective marker-assisted selection in plants | |
Sardos et al. | DArT whole genome profiling provides insights on the evolution and taxonomy of edible Banana (Musa spp.) | |
Liu et al. | Evaluating genetic diversity and constructing core collections of Chinese Lentinula edodes cultivars using ISSR and SRAP markers | |
Wolfe et al. | Marker-based estimates reveal significant nonadditive effects in clonally propagated cassava (Manihot esculenta): implications for the prediction of total genetic value and the selection of varieties | |
Chang et al. | Genome-wide association and genomic prediction identifies associated loci and predicts the sensitivity of Tobacco ringspot virus in soybean plant introductions | |
Sidharthan et al. | Robust virome profiling and whole genome reconstruction of viruses and viroids enabled by use of available mRNA and sRNA-Seq datasets in grapevine (Vitis vinifera L.) | |
Bamba et al. | Plant adaptation and speciation studied by population genomic approaches | |
Thomas et al. | Resurgence of cucurbit downy mildew in the United States: Insights from comparative genomic analysis of Pseudoperonospora cubensis | |
CN106980776B (en) | Gene family affiliation calculation method between a kind of species | |
Bulman et al. | Opportunities and limitations for DNA metabarcoding in Australasian plant-pathogen biosecurity | |
Ahmed et al. | Technological advancements and their importance for nematode identification | |
Shujaat et al. | Cr-prom: A convolutional neural network-based model for the prediction of rice promoters | |
Monnot et al. | Deciphering the genetic architecture of plant virus resistance by GWAS, state of the art and potential advances | |
AlMomin et al. | Draft genome sequence of the silver pomfret fish, Pampus argenteus | |
Rodriguez-Rodriguez et al. | The recombinant potato virus Y (PVY) strain, PVYNTN, identified in potato fields in Victoria, southeastern Australia | |
Mavrič Pleško et al. | Raspberry bushy dwarf virus in Slovenia-geographic distribution, genetic diversity and population structure | |
Benkeblia | Sustainable agriculture and new biotechnologies | |
Rabadán et al. | Long-term monitoring of aphid-transmitted viruses in melon and zucchini crops: Genetic diversity and population structure of cucurbit aphid-borne yellows virus and watermelon mosaic virus | |
Karavina et al. | High-throughput sequencing of virus-infected Cucurbita pepo samples revealed the presence of Zucchini shoestring virus in Zimbabwe | |
Lefebvre et al. | Host plant resistance to pests and pathogens, the genetic leverage in integrated pest and disease management |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |