CN113628687A - Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof - Google Patents

Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof

Info

Publication number
CN113628687A
Authority
CN
China
Prior art keywords
nlr
gene
paired
genes
helper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110931309.2A
Other languages
Chinese (zh)
Inventor
田大成
秦超
杨四海
张小辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110931309.2A
Publication of CN113628687A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for constructing a plant paired NLR resistance gene database, and a multi-species paired NLR gene database constructed by that method. The method comprises: collecting genomic data of the target species from JGI or Ensembl; identifying NLR genes with the Hmmscan and NLR-parser tools; screening clustered NLR genes using gene annotation information; analyzing the special domains of the NLR genes with the Hmmscan and Blast tools; searching for paired NLR genes of the target species over multiple rounds, based on the evolutionary characteristics of paired NLR genes, using Blast, Clustalw2, MEGA7.0 and similar tools; and establishing a paired NLR gene database in which the characteristic values of the paired NLR genes serve as data item identifiers. With this method, paired NLR resistance genes in different plant species can be searched and analyzed genome-wide, providing molecular evidence for research on, and improvement of, resistance breeding in crops.

Description

Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof
Technical Field
The invention relates to a gene sequence data processing method, in particular to a construction method of a plant paired NLR gene database and a multi-species paired NLR gene database.
Background
Plants are important members of the Earth's ecosystems and an indispensable part of human life. Human clothing, food, housing and transport all depend on plants, directly or indirectly. Plant growth and development, however, is not always smooth and is often accompanied by various biotic stresses, most of which are caused by microbial infection. These stresses impair plant growth and reproduction and, in severe cases, can cause large yield losses.
At present, pathogen infection is controlled mainly by applying pesticides, but pesticide use can do great harm to the environment. Some pesticides contain persistent organic pollutants that are difficult to degrade, can remain in the soil for decades, and adversely affect soil quality and biodiversity (Jacobsen and [name illegible], 2014). Pesticide residues, and their further concentration through the food chain, can harm human life and health, especially in high-risk groups such as children, pregnant women and the elderly (Kim et al., 2017). Several studies of the effect of pesticides on human health have shown that pesticides may be associated with a variety of diseases, including cancer, leukemia and asthma. Therefore, to control pathogens effectively and sustainably, multiple control measures must be combined. Using the plant's own resistance to control disease is among the most economical, effective and safe approaches known, and studying the molecular mechanisms of plant disease resistance, identifying and cloning disease resistance genes, and transferring them into crops are key topics in plant science.
Plants defend themselves against pathogens through two layers of innate immunity (Jones and Dangl, 2006). When a pathogen attacks, pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) is activated first; this process involves the recognition of conserved features common to microorganisms (PAMPs). If PTI fails to suppress the pathogen, effector-triggered immunity (ETI) is activated. ETI requires plant disease resistance genes (R genes): the interaction between an R gene product and a pathogen avirulence (AVR) effector induces hypersensitive cell death at the infection site, and the resulting tissue necrosis prevents further spread of the pathogen.
The NBS-LRR genes (also called NLR genes) are the largest and most widely distributed group of plant disease resistance genes and are central to research on plant disease resistance. Most NLR proteins are typical modular multidomain proteins whose core elements are a central nucleotide-binding site (NBS) and a C-terminal leucine-rich repeat (LRR). The NBS domain acts as the switch for NLR receptor activity and controls signaling through conformational changes (Moffett et al., n.d.), while the LRR domain is usually involved in specific recognition when pathogen effectors are recognized directly (Jia et al., 2000). NLR proteins also contain a variable N-terminal domain, by which NBS-LRR genes are divided into two main groups: TNL (TIR-NBS-LRR), with a TIR (Toll/interleukin-1 receptor) domain at the N-terminus, and CNL (CC-NBS-LRR), with a coiled-coil (CC) structure at the N-terminus. NLR genes have the following evolutionary features: they are unevenly distributed across the genome, mostly concentrated in clusters within which tandem duplication predominates; the same NBS-LRR gene differs greatly in nucleotide sequence between species and even between lines of the same species; and duplication and loss of NLR genes are very frequent between different plant lines. These features are major obstacles to the cloning, analysis and application of plant disease resistance genes.
Current research on plant disease resistance genes focuses mainly on single resistance genes in important crops; non-angiosperms and multi-gene cooperation have received much less attention. The "gene-for-gene" hypothesis is the earliest model of how NLR genes confer resistance: one gene is responsible for recognizing, and mounting immunity against, one pathogen. A growing number of studies, however, show that this hypothesis is incomplete (Lee et al., 2009; Narusaka et al., 2009; Peart et al., 2005). Many findings indicate that immunity to certain pathogens requires two genetically linked genes acting together, such as RRS1 and RPS4 in Arabidopsis (Narusaka et al., 2009), Pi5-1 and Pi5-2 in rice (Lee et al., 2009), and N and NRG1 in tobacco (Peart et al., 2005). Such genes are collectively called paired NLR genes and tend to be arranged head-to-head in the genome. Some studies have found that, after transgenic breeding, paired NLR genes not only confer broader-spectrum, more durable resistance but can also reduce the fitness cost that accompanies increased resistance (Deng et al., 2017), which makes them an important resource for transgenic breeding. After scanning the NLR genes of thirteen rice subspecies, Stein et al. (2018) found that nearly 50% of adjacent NLR genes are arranged head-to-head, which differs markedly from the head-to-tail arrangement expected from tandem duplication within a gene cluster. These results suggest that paired resistance may be an important and common mode of plant resistance, so in-depth study of paired NLR genes is a crucial step toward breaking through the bottleneck in plant disease resistance research.
Most paired NLRs reported so far share the following characteristics: 1. both genes carry the NBS and LRR domains, and the Sensor often carries an additional special domain such as WRKY or NOI (Nishimura et al., 2015); 2. the Helper and the Sensor lie close together on the chromosome in a head-to-head arrangement; 3. the Helper is more conserved in evolution than the Sensor; 4. phylogenetically, the Helper and the Sensor fall into different branches of the evolutionary tree, with a mirror-image topology. Based on these evolutionary characteristics, the invention uses bioinformatics methods to construct a database of paired NLR genes in plants.
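Characteristic 2 above (head-to-head arrangement) can be checked from annotation coordinates alone. The following is a minimal sketch, not part of the patented method: the `Gene` record and `is_head_to_head` function are hypothetical names, and "head-to-head" is assumed to mean that the upstream gene lies on the minus strand and the downstream gene on the plus strand, so that their 5' ends face each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record for one gene."""
    gene_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int
    strand: str  # "+" or "-"


def is_head_to_head(a: Gene, b: Gene) -> bool:
    """Return True when two genes on one chromosome are divergently oriented."""
    if a.chrom != b.chrom:
        return False
    # Order the two genes by position so the test does not depend on argument order.
    left, right = sorted((a, b), key=lambda g: g.start)
    # Upstream gene on "-" and downstream gene on "+": the 5' ends are adjacent.
    return left.strand == "-" and right.strand == "+" and left.end < right.start
```

The check says nothing about how close the two genes are; a distance or gene-count criterion would have to be applied separately.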
Disclosure of Invention
In view of the current lack, at home and abroad, of an effective way to identify paired NLR genes, the invention aims to provide a method for constructing a plant paired NLR gene database.
In order to solve the problems in the prior art, the invention provides the following technical scheme: the invention relates to a method for constructing a paired NLR resistance gene database of plants, which comprises the following steps:
(1) collecting basic data: collecting genome sequences, protein sequences and gene annotation information of target species from JGI or Ensembl;
(2) identification of NLR genes and gene clusters: first identifying NLR genes with the Hmmscan and NLR-parser tools, then screening out the clustered NLR genes using their annotated positions in the genome;
(3) identification of the special domains: analyzing the special domains of the NLR genes with the Hmmscan and Blast tools;
(4) identification of paired NLR genes: first, based on reported paired NLR genes, searching the clustered NLR genes of the target species for Helper homologs with the Blast tool; screening out the NLR gene clusters that contain Helper homologs; constructing a phylogenetic tree with the Clustalw2 tool; inspecting it manually with the MEGA7.0 tool; and finally screening out the paired NLR genes of the target species based on the evolutionary characteristics of paired NLR genes;
(5) re-identification of paired NLR genes: using the candidate paired NLR genes found in step (4) as new queries, repeating the analysis and retrieval process of step (4) until no new paired NLR genes are retrieved.
Further, in step (3), after the NB-ARC domain and the LRR domain are detected with Hmmscan and NLR-parser, genes having both an NB-ARC domain and an LRR domain are classified as main NLR genes, and genes having only an NB-ARC domain are classified as candidate NLR genes.
Further, in step (4), when searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb of each other, or that are separated by two or fewer genes, are regarded as belonging to the same gene cluster.
Furthermore, in step (4), traces of special domains of the NLR genes are mined in several ways, including an Hmmscan search of the target protein, an Hmmscan search of the proteins encoded by the genes immediately before and after the target gene, and a tblastn search of the nucleotide sequence within 5 kb upstream and downstream of the target gene; the result serves as one criterion for identifying a Sensor.
Further, in step (4), after the NLR gene clusters containing Helper homologs are screened out, the phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches of the tree are determined from the phylogenetic relationships and the special domain traces of the NLR genes. A candidate Helper or Sensor must cluster in the same branch as a Helper or Sensor of identified sequence: a candidate that clusters with an identified Helper is a Helper and its paired gene is the Sensor, and vice versa.
Further, in step (5), the characteristic value data of the paired NLR genes include gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type and the position of the NBS on the chromosome.
The invention also relates to a multi-species paired NLR gene database constructed by the above method for constructing a plant paired NLR resistance gene database, characterized in that the paired NLR gene database is established with the characteristic values of the paired NLR genes as data item identifiers: https://figshare.com/articles/dataset/Database_base_of_scheduled_NLR_genes/15096966.
Further, the multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.
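The multi-round search of step (5) is a fixed-point iteration: the pairs found in one round become the queries of the next, and the search stops when a round returns nothing new. A minimal sketch follows, under loud assumptions: `search_round` is a hypothetical callable standing in for one full pass of step (4) (Blast search, tree building and manual inspection), and the identifiers are placeholders rather than real genes.

```python
from typing import Callable, Iterable, Set


def iterate_until_stable(
    seeds: Iterable[str],
    search_round: Callable[[Set[str]], Iterable[str]],
) -> Set[str]:
    """Repeat the search with newly found pairs as queries until nothing new appears."""
    seed_set = set(seeds)
    known = set(seed_set)      # everything seen so far, seeds included
    frontier = set(seed_set)   # queries for the next round
    while frontier:
        found = set(search_round(frontier))
        frontier = found - known   # keep only pairs not seen in any earlier round
        known |= frontier
    return known - seed_set        # pairs discovered beyond the seed sequences
```

Querying each round only with the newly found pairs avoids repeating earlier searches; because `known` only grows, the loop ends as soon as one round adds nothing.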
Beneficial effects: with the method provided by the invention, paired NLR genes can be retrieved and analyzed efficiently in a plant species. On the one hand, this supplies a large number of reliable candidate gene pairs for research on disease resistance in that species and so accelerates disease-resistance breeding. On the other hand, a multi-species paired NLR gene database allows NLR gene pairs from different species to be studied together, so that broad-spectrum, highly effective NLR gene pairs can be found and applied to different cash crops, reducing the impact of pathogen attack on production and daily life.
Compared with the prior art, the invention has the following advantages. The paired NLR gene database construction method has important scientific and practical value. In recent years, progress in the study of paired NLR genes has been unsatisfactory because: (1) no bioinformatics method exists that can efficiently identify paired NLR genes genome-wide, so subsequent functional characterization cannot proceed smoothly; and (2) verifying the disease resistance function of paired NLR genes is hampered by long experimental cycles, pathogen specificity and functional redundancy. The invention fills this methodological gap and provides an effective route to the rapid identification of paired NLR genes in different species. With this method, the paired NLR genes of any sequenced species can be detected and analyzed genome-wide, which advances research on crop disease resistance and provides molecular evidence for studying the evolution of disease resistance in crops.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of the phylogenetic characteristics of Brachypodium distachyon paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification in Brachypodium distachyon: red squares represent Brachypodium distachyon paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B is the evolutionary tree rebuilt from the Brachypodium distachyon paired NLRs: dark color marks gene pairs identified across species, and light color marks pairs identified in the self-iteration rounds.
FIG. 3 is a schematic diagram of the phylogenetic characteristics of barley paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification in barley: black triangles represent barley paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B is the evolutionary tree rebuilt from the barley paired NLRs.
FIG. 4 is a schematic diagram of the phylogenetic characteristics of sorghum paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification in sorghum: dark blue inverted triangles represent sorghum paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B is the evolutionary tree rebuilt from the sorghum paired NLRs.
FIG. 5 is a schematic diagram of the phylogenetic characteristics of Setaria viridis paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification in Setaria viridis: brown diamonds represent Setaria viridis paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B is the evolutionary tree rebuilt from the Setaria viridis paired NLRs.
FIG. 6 is a phylogenetic tree of Arabidopsis paired NLR genes. Clade I contains the paired NLR genes retrieved in Arabidopsis through the topological relationship with reported genes, Clade II contains the paired NLR genes detected in Arabidopsis by the self-iteration tree, and the Helper and Sensor branches are marked on the right.
Detailed Description
The present invention is further described below by way of examples, but is not intended to limit the scope of the present invention.
The invention relates to a method for constructing a paired NLR resistance gene database of plants, which comprises the following steps:
(1) collecting basic data: collecting genome sequences, protein sequences and gene annotation information of target species from JGI or Ensembl;
(2) identification of NLR genes and gene clusters: first identifying NLR genes with the Hmmscan and NLR-parser tools, then screening out the clustered NLR genes using their annotated positions in the genome;
(3) identification of the special domains: analyzing the special domains of the NLR genes with the Hmmscan and Blast tools. After the NB-ARC domain and the LRR domain are detected with Hmmscan and NLR-parser, genes having both an NB-ARC domain and an LRR domain are classified as main NLR genes, and genes having only an NB-ARC domain are classified as candidate NLR genes.
(4) identification of paired NLR genes: first, based on reported paired NLR genes, searching the clustered NLR genes of the target species for Helper homologs with the Blast tool; screening out the NLR gene clusters that contain Helper homologs; constructing a phylogenetic tree with the Clustalw2 tool; inspecting it manually with the MEGA7.0 tool; and finally screening out the paired NLR genes of the target species based on the evolutionary characteristics of paired NLR genes.
Traces of special domains of the NLR genes are mined in several ways, including an Hmmscan search of the target protein, an Hmmscan search of the proteins encoded by the genes immediately before and after the target gene, and a tblastn search of the nucleotide sequence within 5 kb upstream and downstream of the target gene; the result serves as one criterion for identifying a Sensor.
After the NLR gene clusters containing Helper homologs are screened out, the phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches of the tree are determined from the phylogenetic relationships and the special domain traces of the NLR genes. A candidate Helper or Sensor must cluster in the same branch as a Helper or Sensor of identified sequence: a candidate that clusters with an identified Helper is a Helper and its paired gene is the Sensor, and vice versa.
When searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb of each other, or that are separated by two or fewer genes, are regarded as belonging to the same gene cluster.
(5) re-identification of paired NLR genes: using the candidate paired NLR genes found in step (4) as new queries, repeating the analysis and retrieval process of step (4) until no new paired NLR genes are retrieved.
The characteristic value data of the paired NLR genes include gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type and the position of the NBS on the chromosome.
The invention also relates to a multi-species paired NLR gene database constructed by the above method for constructing a plant paired NLR resistance gene database, characterized in that the paired NLR gene database is established with the characteristic values of the paired NLR genes as data item identifiers: https://figshare.com/articles/dataset/Database_base_of_scheduled_NLR_genes/15096966.
The multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.
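The gene-cluster rule used in the method (same chromosome, and either within 200 kb or separated by two or fewer genes) can be sketched as a single pass over position-sorted NLR genes. This is an illustration only, not the patent's implementation. The `NlrGene` record and its `order` field (the rank of the gene among all annotated genes on its chromosome) are hypothetical, and the sketch assumes that "gene number distance" means the count of intervening genes, that physical distance is measured from the end of one gene to the start of the next, and that a cluster needs at least two NLR genes.

```python
from typing import List, NamedTuple


class NlrGene(NamedTuple):
    """Hypothetical record for one NLR gene taken from the annotation."""
    gene_id: str
    chrom: str
    start: int
    end: int
    order: int  # rank among all annotated genes on the chromosome


def cluster_nlr_genes(
    genes: List[NlrGene], max_gap_bp: int = 200_000, max_between: int = 2
) -> List[List[str]]:
    """Chain neighbouring NLR genes into clusters along each chromosome."""
    clusters: List[List[str]] = []
    prev = None
    for g in sorted(genes, key=lambda x: (x.chrom, x.start)):
        linked = (
            prev is not None
            and g.chrom == prev.chrom
            and (
                g.start - prev.end <= max_gap_bp              # physical distance rule
                or g.order - prev.order - 1 <= max_between    # intervening-gene rule
            )
        )
        if linked:
            clusters[-1].append(g.gene_id)
        else:
            clusters.append([g.gene_id])
        prev = g
    # Assumption: an isolated NLR gene does not count as a cluster.
    return [c for c in clusters if len(c) >= 2]
```

Because each gene is compared only with its nearest upstream NLR neighbour, a cluster can span more than 200 kb in total as long as every consecutive pair satisfies one of the two rules.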
Example 1
Identification of paired NLR genes in Brachypodium distachyon
1. Collecting basic data: collecting the genome sequence, protein sequences and gene annotation information of Brachypodium distachyon from the JGI public database (https://phytozome.jgi.doe.gov);
2. identification of NLR genes and gene clusters: (1) first, based on the protein sequences of the whole genome, obtaining domain information for all protein sequences with the Hmmscan tool and screening out the proteins that contain an NB-ARC domain (PF00931), with the parameter set to E-value ≤ 1E-4; (2) then detecting the LRR domain with the NLR-parser tool and screening for proteins with motif 9, 11 or 19, where genes with both an NB-ARC domain and an LRR domain are main NLR genes and genes with only an NB-ARC domain are candidate NLR genes; (3) then, based on the physical distance and the gene number of the NLR genes on the chromosome in the gene annotation, treating NLR genes on the same chromosome that lie within a physical distance of 200 kb (Holub, 2001) or are separated by two or fewer genes as genes of the same gene cluster, and identifying all NLR gene clusters; (4) finally, 336 NBS genes and 272 NLR genes were identified in Brachypodium distachyon, and 69 gene clusters were obtained by screening;
3. identification of the special domains: (1) based on the Pfam database, running Hmmscan locally to search all NLR protein sequences for domains, and labeling reported special domains such as WRKY, NOI, RATX1 and SRP54; (2) searching the genes immediately before and after each NLR gene for special domains, since if those neighboring genes have a special domain, the ancestral gene of the NLR gene may also have had it; (3) looking up the reported special domains of Sensor genes on the Pfam official website, downloading fasta files of the protein sequences of these domains from the alignment page, building a protein sequence database with the formatdb tool, and then using the tblastn tool to search for special domain traces in the nucleotide sequence within 5 kb upstream and downstream of the clustered NLR genes, with E-value < 1E-10; (4) finally, integrating the special domain information obtained by the three methods as one criterion for identifying a Sensor.
4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), searching the clustered NLR genes for Helper homologs with the Blast (2.3.0+) tool, with E-value < 10; (2) screening out the NLR gene clusters that contain Helper homologs and drawing a phylogenetic tree from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, defaults otherwise); (3) inspecting the tree manually with the MEGA7.0 tool and removing duplicates and NLRs that lack conserved phylogenetic features; (4) determining the Helper and Sensor branches from the phylogenetic relationships and special domains (such as WRKY and NOI), where a candidate Helper or Sensor must cluster in the same branch of the evolutionary tree as a Helper or Sensor of identified sequence: a candidate that clusters with an identified Helper is a Helper and its paired gene is the Sensor, and vice versa; 16 pairs of NLR genes were finally obtained (FIG. 2A);
5. re-identification of paired NLR genes: based on the 16 pairs of NLR genes found in step 4, the analysis and retrieval process of step 4 was repeated until no new paired NLR genes were retrieved, finally giving 22 pairs of NLR genes (FIG. 2B).
6. A Brachypodium distachyon paired NLR gene database was established in MySQL, with the characteristic values of the paired NLR genes as data item identifiers.
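One possible table layout for the database of step 6 is sketched below. It is not the schema of the published database: Python's built-in `sqlite3` module is used here as a stand-in for MySQL so that the example runs without a server, all table and column names are hypothetical, and they merely mirror the characteristic values listed in the method (gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type, NBS position).

```python
import sqlite3

# Hypothetical schema: one row per gene, two rows sharing a pair_id form a pair.
SCHEMA = """
CREATE TABLE paired_nlr (
    gene_id        TEXT PRIMARY KEY,   -- gene number from the annotation
    species        TEXT NOT NULL,
    chrom          TEXT NOT NULL,
    gene_start     INTEGER NOT NULL,
    gene_end       INTEGER NOT NULL,
    strand         TEXT NOT NULL CHECK (strand IN ('+', '-')),
    special_domain TEXT,               -- e.g. WRKY or NOI; NULL when none was found
    gene_type      TEXT NOT NULL CHECK (gene_type IN ('Helper', 'Sensor')),
    pair_id        TEXT NOT NULL,
    tir_type       TEXT,
    nbs_start      INTEGER,
    nbs_end        INTEGER
)
"""


def build_database(rows):
    """Create an in-memory database and load one tuple per gene."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO paired_nlr VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def genes_in_pair(conn, pair_id):
    """Return (gene_id, gene_type) for both members of a pair, Helper first."""
    cur = conn.execute(
        "SELECT gene_id, gene_type FROM paired_nlr WHERE pair_id = ? ORDER BY gene_type",
        (pair_id,),
    )
    return cur.fetchall()
```

Storing one row per gene with a shared `pair_id` keeps the Helper and Sensor attributes symmetric; a one-row-per-pair layout would be an equally valid choice.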
Example 2
Identification of paired NLR genes in barley (Hordeum vulgare)
1. Collecting basic data: collecting the genome sequence, protein sequences and gene annotation information of barley from the Ensembl public database (http://plants.ensembl.org);
2. identification of NLR genes and gene clusters: (1) first, based on the protein sequences of the whole genome, obtaining domain information for all protein sequences with the Hmmscan tool and screening out the proteins that contain an NB-ARC domain (PF00931), with the parameter set to E-value ≤ 1E-4; (2) then detecting the LRR domain with the NLR-parser tool and screening for proteins with motif 9, 11 or 19, where genes with both an NB-ARC domain and an LRR domain are main NLR genes and genes with only an NB-ARC domain are candidate NLR genes; (3) then, based on the physical distance and the gene number of the NLR genes on the chromosome in the gene annotation, treating NLR genes on the same chromosome that lie within a physical distance of 200 kb (Holub, 2001) or are separated by two or fewer genes as genes of the same gene cluster, and identifying all NLR gene clusters; (4) finally, 405 NBS genes and 318 NLR genes were identified in barley, and 74 gene clusters were obtained by screening;
3. identification of the special domains: (1) based on the Pfam database, running Hmmscan locally to search all NLR protein sequences for domains, and labeling reported special domains such as WRKY, NOI, RATX1 and SRP54; (2) searching the genes immediately before and after each NLR gene for special domains, since if those neighboring genes have a special domain, the ancestral gene of the NLR gene may also have had it; (3) looking up the reported special domains of Sensor genes on the Pfam official website, downloading fasta files of the protein sequences of these domains from the alignment page, building a protein sequence database with the formatdb tool, and then using the tblastn tool to search for special domain traces in the nucleotide sequence within 5 kb upstream and downstream of the clustered NLR genes, with E-value < 1E-10; (4) finally, integrating the special domain information obtained by the three methods as one criterion for identifying a Sensor.
4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), searching the clustered NLR genes for Helper homologs with the Blast (2.3.0+) tool, with E-value < 10; (2) screening out the NLR gene clusters that contain Helper homologs and drawing a phylogenetic tree from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, defaults otherwise); (3) inspecting the tree manually with the MEGA7.0 tool and removing duplicates and NLRs that lack conserved phylogenetic features; (4) determining the Helper and Sensor branches from the phylogenetic relationships and special domains (such as WRKY and NOI), where a candidate Helper or Sensor must cluster in the same branch of the evolutionary tree as a Helper or Sensor of identified sequence: a candidate that clusters with an identified Helper is a Helper and its paired gene is the Sensor, and vice versa; 16 pairs of NLR genes were finally obtained (FIG. 3A);
5. re-identification of paired NLR genes: based on the paired NLR genes found in step 4, the analysis and retrieval process of step 4 was repeated until no new paired NLR genes were retrieved, finally giving 16 paired NLR genes (FIG. 3B).
6. A barley paired NLR gene database was established with the characteristic values of the paired NLR genes as data item identifiers.
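The NB-ARC screen in step 2(1) and the main/candidate classification in step 2(2) of the examples can be sketched as a filter over HMMER's per-domain table output (the file written by `hmmscan --domtblout`). This is a sketch under stated assumptions, not the inventors' script: the function names are hypothetical, the full-sequence E-value column is assumed to be the one compared with 1E-4 (the patent does not say which E-value is used), and the set of LRR-positive proteins is assumed to have been produced separately from the NLR-parser motif results.

```python
from typing import Dict, Iterable, Set


def nb_arc_proteins(
    domtbl_lines: Iterable[str], accession: str = "PF00931", max_evalue: float = 1e-4
) -> Set[str]:
    """Collect query proteins with an NB-ARC hit at or below the E-value cutoff."""
    hits: Set[str] = set()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domtblout record
        # With hmmscan the target is the Pfam model and the query is the protein.
        target_acc, query_name = fields[1], fields[3]
        full_evalue = float(fields[6])  # assumption: full-sequence E-value
        if target_acc.split(".")[0] == accession and full_evalue <= max_evalue:
            hits.add(query_name)
    return hits


def classify_nlr(nb_arc: Set[str], lrr: Set[str]) -> Dict[str, str]:
    """NB-ARC plus LRR gives a main NLR gene; NB-ARC alone gives a candidate."""
    return {p: ("main" if p in lrr else "candidate") for p in nb_arc}
```

Stripping the version suffix from the accession (for example `PF00931.25` to `PF00931`) keeps the filter independent of the Pfam release used.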
Example 3
Identification of paired NLR genes in Sorghum (Sorghum bicolor)
1. Collecting basic data: collecting the genome sequence, protein sequences and gene annotation information of sorghum from the JGI public database (https://phytozome.jgi.doe.gov);
2. identification of NLR genes and gene clusters: (1) first, based on the protein sequences of the whole genome, obtaining domain information for all protein sequences with the Hmmscan tool and screening out the proteins that contain an NB-ARC domain (PF00931), with the parameter set to E-value ≤ 1E-4; (2) then detecting the LRR domain with the NLR-parser tool and screening for proteins with motif 9, 11 or 19, where genes with both an NB-ARC domain and an LRR domain are main NLR genes and genes with only an NB-ARC domain are candidate NLR genes; (3) then, based on the physical distance and the gene number of the NLR genes on the chromosome in the gene annotation, treating NLR genes on the same chromosome that lie within a physical distance of 200 kb (Holub, 2001) or are separated by two or fewer genes as genes of the same gene cluster, and identifying all NLR gene clusters; (4) finally, 326 NBS genes and 275 NLR genes were identified in sorghum, and 55 gene clusters were obtained by screening;
3. identification of the specific domains: (1) based on a Pfam database, locally using Hmmscan to perform structural domain search on all NLR protein sequences, and labeling special reported structural domains such as WRKY, NOI, RATX1 and SRP 54; (2) searching special structural domains of the front and back genes of the NLR gene, wherein if the front and back genes have the special structural domains, the ancestral gene of the NLR gene can also have the special structural domains; (3) searching reported special structural domains of the Sensor gene from a Pfam functional net, downloading fasta files of the protein sequences of the domains from alignment interfaces, constructing a protein sequence database by a formatdb tool, and then searching special structural domain traces in nucleotide sequences within 5k before and after the clustered NLR gene by using a tblastn tool, wherein E-value is < 1E-10; (4) and finally, integrating the special domain information obtained by the three methods to be used as one of the judgment bases of the Sensor.
4. Identification of paired NLR genes: (1) based on paired NLR genes reported by rice (Wang et al, 2019), Helper homologous genes are searched in clustered NLR genes by a Blast (2.3.0+) tool, and E-value is less than 10; (2) screening NLR gene cluster containing Helper homologous gene, using NB-ARC structural domain protein sequence to draw phylogenetic tree by Clustalw2 tool (bootstrap value is set to 1000, and default values are used for others); (3) manual inspection is carried out through MEGA7.0 tool, duplicate and NLR without conservative phylogenetic characteristics are removed; (4) determining the branches of the Helper and the Sensor through phylogenetic relationship and special structural domains (such as WRKY, NOI and the like), wherein the candidate Helper and the Sensor need to be gathered in the same branch with the Helper and the Sensor with identified sequences on an evolutionary tree, and the candidate Helper and the Sensor gather in one branch with the identified Helper to be the Helper, and the corresponding gene is the Sensor, or vice versa, and finally obtaining 19 pairs of NLR genes (figure 4A);
5. re-identification of paired NLR genes: based on the paired NLR genes searched in step 4, the analytical retrieval process in step 4 is repeated until no new paired NLR genes are retrieved, and finally 19 paired NLR genes are obtained (fig. 4B).
6. And establishing a paired NLR gene database by taking characteristic values corresponding to the paired NLR genes as data item identifications.
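The clustering rule in step 2 (3) can be sketched in Python as a single pass over the NLR genes of each chromosome. This is an illustration under stated assumptions, not the inventors' implementation: the `Gene` record, the `order` field (rank of a gene among all annotated genes on its chromosome) and the reading of the rule as "neighboring NLR genes are linked if the gap between them is at most 200 kb, or at most two other genes lie between them" are all choices made here; the coordinates below are invented.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int
    order: int  # rank among ALL annotated genes on the chromosome (assumed field)


def cluster_nlr_genes(
    nlr_genes: Iterable[Gene],
    max_bp: int = 200_000,   # 200 kb threshold from the text
    max_between: int = 2,    # "two or fewer genes" apart
) -> List[List[Gene]]:
    """Chain neighboring NLR genes into clusters; singletons are dropped."""
    clusters: List[List[Gene]] = []
    ordered = sorted(nlr_genes, key=lambda g: (g.chrom, g.start))
    for _, group in groupby(ordered, key=lambda g: g.chrom):
        current: List[Gene] = []
        for gene in group:
            if current:
                prev = current[-1]
                close_bp = gene.start - prev.end <= max_bp
                close_rank = gene.order - prev.order - 1 <= max_between
                if not (close_bp or close_rank):
                    if len(current) > 1:
                        clusters.append(current)
                    current = []
            current.append(gene)
        if len(current) > 1:
            clusters.append(current)
    return clusters


genes = [
    Gene("A", "chr1", 1_000, 5_000, 1),
    Gene("B", "chr1", 100_000, 104_000, 10),     # 95 kb after A
    Gene("C", "chr1", 900_000, 905_000, 60),     # far from B in bp and in rank
    Gene("D", "chr1", 1_200_000, 1_204_000, 62), # 295 kb after C, one gene between
    Gene("E", "chr2", 10_000, 12_000, 3),        # alone on its chromosome
]
clusters = cluster_nlr_genes(genes)
print([[g.gene_id for g in c] for c in clusters])
```

Genes A and B are joined by the distance criterion, C and D by the gene-count criterion, and E is left out because a cluster needs at least two members.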
Example 4
Identification of paired NLR genes in green bristlegrass (Setaria viridis)
1. Collecting basic data: collecting the genome sequence, protein sequences and gene annotation information of green bristlegrass from the JGI public database (https://phytozome.jgi.doe.gov);
2. Identification of NLR genes and gene clusters: (1) first, based on the whole-genome protein sequences, obtaining the domain information of all protein sequences with the Hmmscan tool and screening for proteins containing an NB-ARC domain (PF00931), with the parameter E-value ≤ 1E-4; (2) then detecting LRR domains with the NLR-parser tool and screening for proteins carrying motif 9, 11 or 19; genes having both an NB-ARC domain and an LRR domain are the main NLR genes, and genes having only an NB-ARC domain are candidate NLR genes; (3) then, based on the physical distances and gene order of the NLR genes on the chromosomes in the gene annotation, treating NLR genes on the same chromosome that lie within 200 kb of each other (Holub, 2001) or are separated by two or fewer genes as members of the same gene cluster, and identifying all NLR gene clusters; (4) finally, 388 NBS genes and 291 NLR genes are identified in green bristlegrass, and 74 gene clusters are obtained by screening;
3. Identification of special domains: (1) based on the Pfam database, running Hmmscan locally to search all NLR protein sequences for domains, and labeling previously reported special domains such as WRKY, NOI, RATX1 and SRP54; (2) searching for special domains in the genes immediately upstream and downstream of each NLR gene; if a neighboring gene carries a special domain, the ancestral gene of the NLR gene may also have carried it; (3) looking up the reported special domains of Sensor genes on the Pfam website, downloading the FASTA files of the protein sequences of these domains from the alignment pages, constructing a protein sequence database with the formatdb tool, and then using the tblastn tool to search for traces of special domains in the nucleotide sequences within 5 kb upstream and downstream of each clustered NLR gene, with E-value < 1E-10; (4) finally, integrating the special-domain information obtained by the three methods as one of the criteria for identifying a Sensor.
4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), searching the clustered NLR genes for Helper homologous genes with the Blast (2.3.0+) tool, E-value < 10; (2) screening for NLR gene clusters containing Helper homologous genes, and building a phylogenetic tree from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) inspecting the tree manually with the MEGA7.0 tool and removing duplicates and NLRs that lack conserved phylogenetic features; (4) determining the Helper and Sensor branches from the phylogenetic relationships and the special domains (such as WRKY and NOI): candidate Helpers and Sensors must cluster on the same branches of the evolutionary tree as Helpers and Sensors that have already been identified; a candidate that clusters on one branch with an identified Helper is a Helper and its corresponding gene is the Sensor, and vice versa; finally, 20 pairs of NLR genes are obtained (FIG. 5A);
5. Re-identification of paired NLR genes: based on the paired NLR genes found in step 4, the analysis and search process of step 4 is repeated until no new paired NLR genes are retrieved, and finally 20 paired NLR genes are obtained (FIG. 5B).
6. Establishing a paired NLR gene database, using the characteristic values corresponding to the paired NLR genes as data item identifiers.
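The screening in step 2 (1) and (2) can be illustrated with a short Python sketch that reads HMMER's per-domain table (the `--domtblout` output of hmmscan) and then splits the NB-ARC proteins into main and candidate NLR genes. Several points are assumptions rather than statements from the source: that the E-value threshold is applied to the full-sequence E-value column, that NLR-parser results have already been reduced to a mapping from protein ID to motif numbers (`motifs_by_protein` is a hypothetical structure, not NLR-parser's native output), and all protein names and scores in the sample lines are invented.

```python
from typing import Dict, Iterable, Set

NB_ARC = "PF00931"         # Pfam accession named in the text
LRR_MOTIFS = {9, 11, 19}   # NLR-parser motifs named in the text


def nb_arc_proteins(domtbl_lines: Iterable[str], max_evalue: float = 1e-4) -> Set[str]:
    """Return query proteins with an NB-ARC hit at or below the E-value cutoff.

    In hmmscan's domtblout, column 2 is the model accession, column 4 the
    query (protein) name and column 7 the full-sequence E-value.
    """
    hits: Set[str] = set()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        accession, query, evalue = fields[1], fields[3], float(fields[6])
        # Pfam accessions are versioned (e.g. PF00931.25), so drop the suffix.
        if accession.split(".")[0] == NB_ARC and evalue <= max_evalue:
            hits.add(query)
    return hits


def classify(nb_arc: Set[str], motifs_by_protein: Dict[str, Set[int]]) -> Dict[str, str]:
    """Label NB-ARC proteins 'main' (NB-ARC plus LRR motif) or 'candidate'."""
    labels: Dict[str, str] = {}
    for protein in nb_arc:
        has_lrr = bool(LRR_MOTIFS & motifs_by_protein.get(protein, set()))
        labels[protein] = "main" if has_lrr else "candidate"
    return labels


sample = [
    "# target name  accession  tlen  query name ...",
    "NB-ARC PF00931.25 252 prot1 - 900 1.2e-50 170.1 0.0 1 1 1e-52 2e-50 169.0 0.0 1 250 100 350 98 352 0.95 NB-ARC domain",
    "NB-ARC PF00931.25 252 prot2 - 400 0.5 5.1 0.0 1 1 0.1 0.6 4.9 0.0 10 60 20 70 15 80 0.70 NB-ARC domain",
    "WRKY PF03106.18 59 prot3 - 300 3e-20 70.2 0.1 1 1 1e-22 4e-20 69.8 0.1 1 59 40 98 40 99 0.97 WRKY DNA-binding domain",
    "NB-ARC PF00931.25 252 prot4 - 600 3e-10 40.0 0.0 1 1 1e-12 4e-10 39.5 0.0 5 200 50 240 45 250 0.88 NB-ARC domain",
]
nb = nb_arc_proteins(sample)
labels = classify(nb, {"prot1": {1, 9, 11}})
print(labels)
```

Here prot2 fails the E-value cutoff and prot3 has no NB-ARC hit, so only prot1 and prot4 are kept; prot4 has no LRR motif on record and is therefore labeled a candidate.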
Example 5
Identification of paired NLR genes in Arabidopsis (Arabidopsis thaliana)
1. Collecting basic data: collecting the genome sequence, protein sequences and gene annotation information of Arabidopsis thaliana from the JGI public database (https://phytozome.jgi.doe.gov);
2. Identification of NLR genes and gene clusters: (1) first, based on the whole-genome protein sequences, obtaining the domain information of all protein sequences with the Hmmscan tool and screening for proteins containing an NB-ARC domain (PF00931), with the parameter E-value ≤ 1E-4; (2) then detecting LRR domains with the NLR-parser tool and screening for proteins carrying motif 9, 11 or 19; genes having both an NB-ARC domain and an LRR domain are the main NLR genes, and genes having only an NB-ARC domain are candidate NLR genes; (3) then, based on the physical distances and gene order of the NLR genes on the chromosomes in the gene annotation, treating NLR genes on the same chromosome that lie within 200 kb of each other (Holub, 2001) or are separated by two or fewer genes as members of the same gene cluster, and identifying all NLR gene clusters; (4) finally, 165 NBS genes and 142 NLR genes are identified in Arabidopsis thaliana, and 23 gene clusters are obtained by screening;
3. Identification of special domains: (1) based on the Pfam database, running Hmmscan locally to search all NLR protein sequences for domains, and labeling previously reported special domains such as WRKY, NOI, RATX1 and SRP54; (2) searching for special domains in the genes immediately upstream and downstream of each NLR gene; if a neighboring gene carries a special domain, the ancestral gene of the NLR gene may also have carried it; (3) looking up the reported special domains of Sensor genes on the Pfam website, downloading the FASTA files of the protein sequences of these domains from the alignment pages, constructing a protein sequence database with the formatdb tool, and then using the tblastn tool to search for traces of special domains in the nucleotide sequences within 5 kb upstream and downstream of each clustered NLR gene, with E-value < 1E-10; (4) finally, integrating the special-domain information obtained by the three methods as one of the criteria for identifying a Sensor.
4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in Arabidopsis thaliana, RRS1/RPS4 (Narusaka et al., 2009) and RPP2A/RPP2B (Sinapidou et al., 2004), searching the clustered NLR genes for Helper homologous genes with the Blast (2.3.0+) tool, E-value < 10; (2) screening for NLR gene clusters containing Helper homologous genes, and building a phylogenetic tree from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) inspecting the tree manually with the MEGA7.0 tool and removing duplicates and NLRs that lack conserved phylogenetic features; (4) determining the Helper and Sensor branches from the phylogenetic relationships and the special domains (such as WRKY and NOI): candidate Helpers and Sensors must cluster on the same branches of the evolutionary tree as Helpers and Sensors that have already been identified; a candidate that clusters on one branch with an identified Helper is a Helper and its corresponding gene is the Sensor, and vice versa; finally, 12 pairs of NLR genes are obtained (FIG. 6, Clade I);
5. Re-identification of paired NLR genes: based on the paired NLR genes found in step 4, the analysis and search process of step 4 is repeated until no new paired NLR genes are retrieved, and finally 18 paired NLR genes are obtained (FIG. 6, Clade II).
6. Establishing a paired NLR gene database, using the characteristic values corresponding to the paired NLR genes as data item identifiers.
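Step 3 of each example combines three lines of evidence for special domains: hits on the NLR protein itself, hits on neighboring genes, and tblastn hits in the 5 kb flanking sequence. The Python sketch below shows one way the flanking window could be computed and the three result sets merged per gene. It is only an illustration: the function names, the 1-based inclusive coordinates, the source labels and the gene IDs are assumptions introduced here, and the source does not say how the three results are weighted when they are integrated.

```python
from typing import Dict, List, Set, Tuple


def flank_window(start: int, end: int, chrom_len: int, flank: int = 5_000) -> Tuple[int, int]:
    """Return the gene interval extended by `flank` bp on both sides.

    Coordinates are assumed 1-based and inclusive, clipped to the chromosome.
    """
    return max(1, start - flank), min(chrom_len, end + flank)


def merge_domain_evidence(
    own: Dict[str, Set[str]],         # method (1): Hmmscan on the NLR protein
    neighbours: Dict[str, Set[str]],  # method (2): Hmmscan on adjacent genes
    flanking: Dict[str, Set[str]],    # method (3): tblastn on flanking DNA
) -> Dict[str, Dict[str, List[str]]]:
    """For every gene, list which methods support each special domain."""
    merged: Dict[str, Dict[str, List[str]]] = {}
    sources = (("protein", own), ("neighbour", neighbours), ("flank_tblastn", flanking))
    for source, table in sources:
        for gene, domains in table.items():
            for domain in sorted(domains):
                merged.setdefault(gene, {}).setdefault(domain, []).append(source)
    return merged


window = flank_window(3_000, 8_000, chrom_len=10_000)
evidence = merge_domain_evidence(
    own={"g1": {"WRKY"}},
    neighbours={"g2": {"NOI"}},
    flanking={"g1": {"WRKY"}, "g3": {"RATX1"}},
)
print(window)
print(evidence)
```

Keeping the supporting methods next to each domain, rather than collapsing everything to a yes or no flag, leaves room for the manual judgment that the text describes when a Sensor is finally called.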
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.

Claims (8)

1. A method for constructing a plant paired NLR resistance gene database, characterized by comprising the following steps:
(1) collecting basic data: collecting the genome sequences, protein sequences and gene annotation information of the target species from JGI or Ensembl;
(2) identification of NLR genes and gene clusters: first identifying NLR genes with the Hmmscan and NLR-parser tools, and then screening the NLR genes for clustered NLR genes using their annotated positions in the genome;
(3) identification of special domains: analyzing the special domains of the NLR genes with the Hmmscan and Blast tools;
(4) identification of paired NLR genes: based on reported paired NLR genes, first searching the clustered NLR genes of the target species for Helper homologous genes with the Blast tool, screening out the NLR gene clusters containing Helper homologous genes, constructing a phylogenetic tree with the Clustalw2 tool, inspecting it manually with the MEGA7.0 tool, and finally screening out the paired NLR genes of the target species based on the evolutionary characteristics of paired NLR genes;
(5) re-identification of paired NLR genes: based on the candidate paired NLR genes found in step (4), repeating the analysis and search process of step (4) until no new paired NLR genes are retrieved.
2. The method for constructing a plant paired NLR resistance gene database according to claim 1, characterized in that: in step (3), after the NB-ARC domain and the LRR domain are detected by Hmmscan and NLR-parser, a gene having both the NB-ARC domain and the LRR domain is classified as a main NLR gene, and a gene having only the NB-ARC domain is classified as a candidate NLR gene.
3. The method for constructing a plant paired NLR resistance gene database according to claim 1, characterized in that: in step (4), when searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb of each other or are separated by two or fewer genes are regarded as genes in the same gene cluster.
4. The method for constructing a plant paired NLR resistance gene database according to claim 1, characterized in that: in step (4), traces of special domains of the NLR genes are mined in several ways, including an Hmmscan search of the target protein, an Hmmscan search of the proteins encoded by the genes before and after the target gene, and a tblastn search of the nucleotide sequence within 5 kb before and after the target gene, and the result is used as one of the criteria for identifying a Sensor.
5. The method for constructing a plant paired NLR resistance gene database according to claim 1, characterized in that: in step (4), after the NLR gene clusters containing Helper homologous genes are screened out, a phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches in the phylogenetic tree are determined based on the phylogenetic relationships and the special-domain traces of the NLR genes; candidate Helpers and Sensors must cluster on the same branches of the phylogenetic tree as Helpers and Sensors that have already been identified; a candidate that clusters on the same branch as an identified Helper is a Helper, and its corresponding gene is the Sensor.
6. The method for constructing a plant paired NLR resistance gene database according to claim 1, characterized in that: in step (5), the characteristic value data of the paired NLR genes include gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type, and NBS position on the chromosome.
7. A plant paired NLR gene database constructed by the method for constructing a plant paired NLR resistance gene database according to any one of claims 1 to 6, characterized in that: the paired NLR gene database is established using the characteristic values corresponding to the paired NLR genes as data item identifiers: https:// figshare. com/articles/dataset/Database _ base _ of _ scheduled _ NLR _ genes/15096966.
8. The multi-species paired NLR gene database according to claim 7, characterized in that: the multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.
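As an illustration of the characteristic values listed in claim 6, the following Python sketch models one database entry per gene and writes the entries as a tab-separated table. The class name, field names, field types, value formats and the tab-separated layout are all assumptions made for this sketch; the source only names the characteristic values and does not specify how the published database is laid out. The gene identifiers and coordinates are invented placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class PairedNLRRecord:
    """One gene of a Helper/Sensor pair; field names are hypothetical."""
    pair_id: str          # Pair-ID shared by the two genes of a pair
    gene_number: str      # gene identifier from the genome annotation
    gene_type: str        # "Helper" or "Sensor"
    gene_position: str    # e.g. "chromosome:start-end"
    strand: str           # strand orientation, "+" or "-"
    special_domain: str   # e.g. "WRKY", or "" when none was found
    tir_type: str         # TIR type of the NLR
    nbs_position: str     # NBS position on the chromosome


def to_tsv(records: Iterable[PairedNLRRecord]) -> str:
    """Serialize records as a tab-separated table with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(PairedNLRRecord)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


rows = [
    PairedNLRRecord("pair01", "geneA", "Helper", "chr1:1000-5000", "+", "", "non-TIR", "chr1:1200-2100"),
    PairedNLRRecord("pair01", "geneB", "Sensor", "chr1:5600-9800", "-", "WRKY", "non-TIR", "chr1:6000-6900"),
]
table = to_tsv(rows)
print(table)
```

Storing the two genes of a pair as separate rows that share a Pair-ID keeps per-gene attributes such as strand and special domain unambiguous; a one-row-per-pair layout would be an equally plausible reading of the claim.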
CN202110931309.2A 2021-08-13 2021-08-13 Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof Pending CN113628687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110931309.2A CN113628687A (en) 2021-08-13 2021-08-13 Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof

Publications (1)

Publication Number Publication Date
CN113628687A true CN113628687A (en) 2021-11-09

Family

ID=78385428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110931309.2A Pending CN113628687A (en) 2021-08-13 2021-08-13 Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof

Country Status (1)

Country Link
CN (1) CN113628687A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU4720500A (en) * 1996-01-19 2000-09-14 Betagene, Inc. Recombinant expression of proteins from secretory cell lines
CN101962671A (en) * 2009-07-23 2011-02-02 王颖 Method for establishing phylogenetic tree aiming at target gene of target organism
CN102725420A (en) * 2009-08-31 2012-10-10 阿尔塞多生物技术有限公司 MicroRNA-based methods and compositions for the diagnosis, prognosis and treatment of tumor involving chromosomal rearrangements
US20120271558A1 (en) * 2009-12-11 2012-10-25 Korea Research Institute of Bioscience and Biotecn System and method for identifying and classifying resistance genes of plant using hidden marcov model
CN103710358A (en) * 2013-12-27 2014-04-09 河南省农业科学院经济作物研究所 Method for obtaining cDNA overall length of five disease resistance-related genes of Thurber cotton
CN104561035A (en) * 2013-10-28 2015-04-29 常熟市杜桥稻米专业合作社 Rapid identification of sorghum gene resistant to powdery mildew
CN109321673A (en) * 2018-10-31 2019-02-12 中国农业科学院蔬菜花卉研究所 A method of identification QTL relevant to Pigments in Cucumber color and gene
US20190195863A1 (en) * 2016-02-24 2019-06-27 The Rockefeller University Embryonic Cell-Based Therapeutic Candidate Screening Systems, Models for Huntington's Disease and Uses Thereof
CN111902547A (en) * 2018-03-23 2020-11-06 先锋国际良种公司 Method for identifying, selecting and generating disease resistant crops
CN113151285A (en) * 2019-12-30 2021-07-23 白素梅 Human 4IgB7-H3 mutation coding gene and application thereof in regulating immunity

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
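E BAGGS et al.: "NLR diversity, helpers and integrated domains: making sense of the NLR IDentity", 《CURRENT OPINION IN PLANT BIOLOGY》, vol. 38, pages 59 - 67 *
LONG WANG et al.: "Large-scale identification and functional analysis of NLR genes in blast resistance in the Tetep rice genome sequence", 《BIOLOGICAL SCIENCES》, pages 1 - 9 *
张雨: "Comparative genomics study of plant NBS-LRR disease resistance genes and small RNAs", 《China Doctoral Dissertations Full-text Database, Agricultural Science and Technology》, no. 2017, pages 046 - 10 *
蔡慧忍: "Functional analysis of the Arabidopsis atypical NLR protein TN13 in plant immunity", 《China Master's Theses Full-text Database, Agricultural Science and Technology》, no. 2021, pages 046 - 80 *
赵丽娜: "Study on the mechanism of durable rice blast resistance in the rice variety Tetep", 《China Doctoral Dissertations Full-text Database, Agricultural Science and Technology》, no. 2019, pages 046 - 1 *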

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705175A (en) * 2023-06-08 2023-09-05 南京农业大学 Cross-species comparative genomics database and construction and analysis method thereof
CN116705175B (en) * 2023-06-08 2023-12-29 南京农业大学 Cross-species comparative genomics database and construction and analysis method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination