CN113628687A

CN113628687A - Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof

Info

Publication number: CN113628687A
Application number: CN202110931309.2A
Authority: CN
Inventors: 田大成; 秦超; 杨四海; 张小辉
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2021-11-09

Abstract

The invention discloses a method for constructing a plant paired NLR resistance gene database, and a multi-species paired NLR gene database constructed by it. The method comprises: collecting genomic data of a species of interest from JGI or Ensembl; identifying NLR genes with the Hmmscan and NLR-parser tools; screening clustered NLR genes using gene annotation information; analyzing the special domains of the NLR genes with the Hmmscan and Blast tools; searching for paired NLR genes of the target species in multiple rounds, based on the evolutionary characteristics of paired NLR genes, using Blast, Clustalw2, MEGA7.0 and the like; and establishing a paired NLR gene database in which the characteristic values of the paired NLR genes serve as data item identifiers. With this method, paired NLR resistance genes in different plant species can be searched and analyzed genome-wide, providing molecular evidence for research on, and improvement of, crop resistance breeding.

Description

Construction method of plant paired NLR resistance gene database and multi-species paired NLR gene database thereof

Technical Field

The invention relates to a gene sequence data processing method, in particular to a construction method of a plant paired NLR gene database and a multi-species paired NLR gene database.

Background

Plants are important members of the earth's ecosystems and an indispensable part of human life: clothing, food, housing and transport all depend on plants, directly or indirectly. Plant growth and development, however, is not smooth; it is often accompanied by various biotic stresses, most of which are caused by microbial infection. These stresses impair plant growth and reproduction and, in severe cases, can cause major yield losses.

At present, pathogen invasion is controlled mainly by applying pesticides, but pesticide use can seriously harm the environment. Some pesticides contain persistent organic pollutants that are difficult to degrade, can remain in the soil for decades, and adversely affect soil quality and biodiversity (Jacobsen and …, 2014).

Pesticide residues, and their further concentration through the food chain, can harm human life and health, especially in high-risk groups such as children, pregnant women and the elderly (Kim et al., 2017). Several studies of the effect of pesticides on human health have shown that pesticides may be associated with a variety of diseases, including cancer, leukemia and asthma. Effective and sustainable pathogen control therefore requires a combination of control measures, and exploiting the plant's own resistance is one of the most economical, effective and safe approaches known. Research on the molecular mechanisms of plant disease resistance, and the identification, cloning and transgenic use of disease resistance genes, are accordingly major topics in plant science.

Plants defend themselves against pathogens through two layers of innate immunity (Jones and Dangl, 2006). Pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) is initiated first upon pathogen attack; it involves recognition of conserved features common to microorganisms (PAMPs). When PTI fails to restrain the pathogen effectively, effector-triggered immunity (ETI) is activated. ETI requires plant disease resistance genes (R genes): recognition of a pathogen avirulence (AVR) effector by the R gene product can induce hypersensitive cell death at the infection site, causing local tissue necrosis that prevents further spread of the pathogen.

The NBS-LRR genes (also called NLR genes) are the most numerous and most widely distributed group of plant disease resistance genes and are central to research on plant disease resistance. Most NLR proteins are typical modular multidomain proteins whose core elements are a central nucleotide-binding site (NBS) and a C-terminal leucine-rich repeat (LRR). The NBS domain acts as the switch for NLR receptor activity, controlling signaling through conformational changes (Moffett et al., n.d.), while the LRR domain is usually involved in specific recognition when pathogen effectors are recognized directly (Jia et al., 2000). NLR proteins also contain a variable N-terminal domain, by which NBS-LRR genes are divided into two main groups: TNL (TIR-NBS-LRR), with a TIR (Toll/interleukin-1 receptor) domain at the N-terminus, and CNL (CC-NBS-LRR), with a coiled-coil (CC) structure at the N-terminus. NLR genes also share the following evolutionary features: they are distributed unevenly across the genome, mostly in clusters, and tandem duplicates predominate within a cluster; the same NBS-LRR gene shows large nucleotide differences between species and even between strains of one species; and duplication and loss of NLR genes are very frequent between plant lineages. These features greatly hinder the cloning, analysis and application of plant disease resistance genes.

Current research on plant disease resistance genes focuses mainly on single resistance genes in important crops; non-angiosperms and multi-gene cooperation have received far less attention. The "gene-for-gene" hypothesis is the earliest model of the NLR resistance mechanism: one gene is responsible for recognizing, and conferring immunity to, one pathogen. A growing number of studies, however, show that this hypothesis is incomplete (Lee et al., 2009; Narusaka et al., 2009; Peart et al., 2005). Many findings indicate that immunity to certain pathogens requires the cooperation of two genetically linked genes, such as RRS1 and RPS4 in Arabidopsis (Narusaka et al., 2009), Pi5-1 and Pi5-2 in rice (Lee et al., 2009), and N and NRG1 in tobacco (Peart et al., 2005). Such genes are collectively called paired NLR genes and tend to be arranged head-to-head on the genome. Some studies have found that, after transgenic breeding, paired NLR genes not only confer broader-spectrum, more durable resistance but also reduce the added cost of resistance that accompanies improved resistance (Deng et al., 2017), making them an important resource for transgenic breeding. After scanning the NLR genes of thirteen rice subspecies, Stein et al. (2018) found that nearly 50% of adjacent NLR genes are arranged head-to-head, which differs markedly from the head-to-tail arrangement expected from tandem duplication within a gene cluster. These results suggest that paired resistance may be an important and common mode of plant resistance, so in-depth research on paired NLR genes is a crucial step toward breaking through the bottleneck in plant disease resistance research.

Most paired NLRs reported so far share the following characteristics: (1) both genes carry NBS and LRR domains, and the Sensor often also contains a special domain such as WRKY or NOI (Nishimura et al., 2015); (2) the Helper and the Sensor lie close together on the chromosome in a head-to-head arrangement; (3) the Helper is evolutionarily more conserved than the Sensor; (4) phylogenetically, the Helper and the Sensor belong to different branches of the evolutionary tree, and the two branches show a mirror-image topology. Based on these evolutionary characteristics, the invention constructs a paired NLR gene database for plants by bioinformatics methods.
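The head-to-head arrangement named above can be checked directly from annotation coordinates. The following is a minimal sketch, assuming that "head-to-head" means divergent orientation (the gene on the left lies on the minus strand and the gene on the right on the plus strand, so their 5' ends face each other). The `Gene` record and the function name are hypothetical and are not part of the patented method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; start <= end regardless of strand."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_head_to_head(a: Gene, b: Gene) -> bool:
    """Return True when two genes on one chromosome are divergently oriented.

    Assumption: head-to-head means the left gene is on the minus strand and
    the right gene is on the plus strand, so the two 5' ends are adjacent.
    """
    if a.chrom != b.chrom:
        return False
    left, right = sorted((a, b), key=lambda g: g.start)
    return left.strand == "-" and right.strand == "+"
```

Two tandem duplicates on the same strand (head-to-tail) return False under this definition, which is the contrast the background section draws between paired genes and ordinary tandem repeats.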

Disclosure of Invention

In view of the current lack, both in China and abroad, of an effective way to identify paired NLR genes, the invention aims to provide a method for constructing a plant paired NLR gene database.

To solve the problems in the prior art, the invention provides the following technical solution: a method for constructing a plant paired NLR resistance gene database, comprising the following steps:

(1) collecting basic data: collecting genome sequences, protein sequences and gene annotation information of target species from JGI or Ensembl;

(2) identification of NLR genes and gene clusters: NLR genes are first identified with the Hmmscan and NLR-parser tools, and clustered NLR genes are then screened from them using the positions of the NLR genes in the genome annotation;

(3) identification of the special domains: the special domains of the NLR genes are analyzed with the Hmmscan and Blast tools;

(4) identification of paired NLR genes: starting from reported paired NLR genes, Helper homologous genes are first searched for among the clustered NLR genes of the target species with the Blast tool; the NLR gene clusters containing Helper homologous genes are screened out; a phylogenetic tree is constructed with the Clustalw2 tool and inspected manually with the MEGA7.0 tool; and the paired NLR genes of the target species are finally selected according to the evolutionary characteristics of paired NLR genes;

(5) re-identification of paired NLR genes: using the candidate paired NLR genes found in step (4) as the new starting set, the analysis and retrieval process of step (4) is repeated until no new paired NLR genes are retrieved.
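Steps (4) and (5) together amount to iterating a search until the set of known pairs stops growing. The following is a minimal sketch of that control flow only. The `search_round` callback is a hypothetical stand-in for one full pass of step (4) (Blast search, tree building and manual inspection), and `max_rounds` is an added safety limit that the source does not mention.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (helper_id, sensor_id); identifiers are illustrative


def iterate_pair_search(
    seed_pairs: Iterable[Pair],
    search_round: Callable[[FrozenSet[Pair]], Iterable[Pair]],
    max_rounds: int = 50,
) -> Set[Pair]:
    """Repeat the search, seeding each round with all pairs known so far,
    and stop as soon as a round retrieves no new pair."""
    known: Set[Pair] = set(seed_pairs)
    for _ in range(max_rounds):
        found = set(search_round(frozenset(known)))
        new_pairs = found - known
        if not new_pairs:
            break  # fixed point reached: no new paired NLR genes retrieved
        known |= new_pairs
    return known
```

In this sketch the seeds are the candidate pairs of the target species from step (4), so they are kept in the returned set; each later round only adds pairs.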

Further, in step (3), after the NB-ARC domain and the LRR domain have been detected with Hmmscan and NLR-parser, a gene having both the NB-ARC domain and the LRR domain is classified as a main NLR gene, and a gene having only the NB-ARC domain is classified as a candidate NLR gene.
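This classification can be sketched on top of HMMER's per-domain table output (`hmmscan --domtblout`). The accession PF00931 and the threshold E-value ≤ 1E-4 are the values given in the examples of this document. Applying the threshold to the full-sequence E-value column is an assumption, since the source does not say which E-value is meant. The set of LRR-positive proteins is taken as an input because the NLR-parser output format is not described here; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def nb_arc_proteins(
    domtblout_lines: Iterable[str],
    accession: str = "PF00931",
    max_evalue: float = 1e-4,
) -> Set[str]:
    """Collect query proteins with an NB-ARC hit in hmmscan --domtblout output.

    Whitespace-separated columns used: [1] target (HMM) accession,
    [3] query (protein) name, [6] full-sequence E-value (assumed threshold column).
    """
    hits: Set[str] = set()
    for line in domtblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domain row
        acc = fields[1].split(".")[0]  # drop the Pfam version suffix
        if acc == accession and float(fields[6]) <= max_evalue:
            hits.add(fields[3])
    return hits


def classify_nlr(nb_arc_ids: Set[str], lrr_ids: Set[str]) -> Dict[str, str]:
    """NB-ARC plus LRR gives a main NLR gene; NB-ARC alone gives a candidate."""
    return {pid: ("main" if pid in lrr_ids else "candidate") for pid in nb_arc_ids}
```

Keeping the two functions separate lets the LRR evidence come from any motif detector, as long as it yields a set of protein identifiers.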

Further, in step (4), when searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb of each other, or within a gene-number distance of two or fewer, are regarded as genes of the same gene cluster.
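The clustering rule can be illustrated with a single pass over position-sorted NLR genes. This is a minimal sketch under stated assumptions: "gene-number distance" is read as the difference in ordinal rank among all annotated genes on the chromosome (the `order` field, a hypothetical name), neighbouring NLR genes are chained by single linkage, and only groups of two or more genes are reported as clusters.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class NlrGene:
    gene_id: str
    chrom: str
    start: int
    end: int
    order: int  # ordinal rank among all annotated genes on the chromosome


def cluster_nlr_genes(
    genes: Iterable[NlrGene],
    max_bp: int = 200_000,
    max_gene_distance: int = 2,
) -> List[List[str]]:
    """Group NLR genes that are within max_bp of the previous NLR gene, or
    within max_gene_distance in gene rank, on the same chromosome."""
    clusters: List[List[str]] = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for _, group in groupby(ordered, key=lambda g: g.chrom):
        current: List[NlrGene] = []
        for gene in group:
            if current:
                prev = current[-1]
                close_bp = gene.start - prev.end <= max_bp
                close_rank = gene.order - prev.order <= max_gene_distance
                if not (close_bp or close_rank):
                    if len(current) > 1:
                        clusters.append([g.gene_id for g in current])
                    current = []
            current.append(gene)
        if len(current) > 1:
            clusters.append([g.gene_id for g in current])
    return clusters
```

If the intended reading is instead "at most two intervening genes", the rank test would become `gene.order - prev.order - 1 <= max_gene_distance`; the source wording does not settle this.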

Furthermore, in step (4), traces of special domains of the NLR genes are mined in several ways: an Hmmscan search of the target protein, an Hmmscan search of the proteins encoded by the genes immediately before and after the target gene, and a tblastn search of the nucleotide sequences within 5 kb upstream and downstream of the target gene. The result serves as one of the criteria for identifying a Sensor.

Further, in step (4), after the NLR gene clusters containing Helper homologous genes have been screened out, a phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches of the tree are determined from the phylogenetic relationships and the special-domain traces of the NLR genes. A candidate Helper and Sensor must cluster in the same branches of the tree as Helpers and Sensors of identified sequence: a candidate that clusters with an identified Helper is the Helper, and its corresponding gene is the Sensor.

Further, in step (5), the characteristic value data of the paired NLR genes include gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type, and NBS position on the chromosome.

The invention also relates to a multi-species paired NLR gene database constructed by the above method for constructing a plant paired NLR resistance gene database, characterized in that the paired NLR gene database is established with the characteristic values of the paired NLR genes as data item identifiers: https://figshare.com/articles/dataset/Database_base_of_scheduled_NLR_genes/15096966.

Further, the multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.

Beneficial effects: with the method provided by the invention, paired NLR genes can be retrieved and analyzed efficiently in plant species. On the one hand, this supplies a large number of reliable candidate gene pairs for research on disease resistance function in these species and so accelerates disease resistance breeding. On the other hand, the multi-species paired NLR gene database allows NLR gene pairs from different species to be studied together, so that broad-spectrum, efficient NLR gene pairs can be found and applied to different cash crops, reducing the impact of pathogen invasion on human production and daily life.

Compared with the prior art, the invention has the following advantages. The provided method for constructing a paired NLR gene database has important scientific and practical value. In recent years, progress in research on paired NLR genes has been unsatisfactory: (1) no bioinformatics method exists that can efficiently identify paired NLR genes genome-wide, so subsequent functional identification cannot proceed smoothly; (2) verification of the disease resistance function of paired NLR genes suffers from long cycles, pathogen specificity, functional redundancy and similar problems. The invention fills this methodological gap and provides an effective route to the rapid identification of paired NLR genes in different species. With the method, paired NLR genes of sequenced species can be detected and analyzed genome-wide, which promotes research on crop disease resistance and provides molecular evidence for studying its evolution.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a schematic diagram of the phylogenetic characteristics of Brachypodium distachyon paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification for Brachypodium distachyon: red squares represent Brachypodium distachyon paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B shows the evolutionary tree rebuilt from the Brachypodium distachyon paired NLRs: dark color marks gene pairs identified across species, and light color marks those identified in the self-iteration rounds.

FIG. 3 is a schematic diagram of the phylogenetic characteristics of barley paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification for barley: black triangles represent barley paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B shows the evolutionary tree rebuilt from the barley paired NLRs.

FIG. 4 is a schematic diagram of the phylogenetic characteristics of sorghum paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification for sorghum: dark blue inverted triangles represent sorghum paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B shows the evolutionary tree rebuilt from the sorghum paired NLRs.

FIG. 5 is a schematic diagram of the phylogenetic characteristics of Setaria viridis paired NLRs, with the special domains labeled. Panel A shows representative branches from the cross-species identification for Setaria viridis: brown diamonds represent Setaria viridis paired NLRs, blue circles represent rice paired NLR seed sequences, the left column is the Helper, the right column is the Sensor, and the clade labels match those in the large cross-species tree. Panel B shows the evolutionary tree rebuilt from the Setaria viridis paired NLRs.

FIG. 6 is a phylogenetic tree of Arabidopsis paired NLR genes. Clade I contains the paired NLR genes retrieved in Arabidopsis through the topological relationship with reported genes, and Clade II contains the paired NLR genes detected in Arabidopsis by self-iterative tree building; the Helper and Sensor branches are marked on the right.

Detailed Description

The present invention is further described below by way of examples, but is not intended to limit the scope of the present invention.

The invention relates to a method for constructing a paired NLR resistance gene database of plants, which comprises the following steps:

(3) identification of the special domains: the special domains of the NLR genes are analyzed with the Hmmscan and Blast tools. After the NB-ARC domain and the LRR domain have been detected with Hmmscan and NLR-parser, a gene having both the NB-ARC domain and the LRR domain is classified as a main NLR gene, and a gene having only the NB-ARC domain is classified as a candidate NLR gene.

Traces of special domains of the NLR genes are mined in several ways: an Hmmscan search of the target protein, an Hmmscan search of the proteins encoded by the genes immediately before and after the target gene, and a tblastn search of the nucleotide sequences within 5 kb upstream and downstream of the target gene. The result serves as one of the criteria for identifying a Sensor.
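For the tblastn search of flanking sequence, the 5 kb windows on either side of a gene have to be cut from the chromosome without running past its ends. The following is a minimal sketch of that coordinate arithmetic only, assuming 1-based inclusive coordinates; the function name and return layout are hypothetical, and "left" and "right" refer to chromosome coordinates rather than to strand.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def flanking_windows(
    start: int, end: int, chrom_length: int, flank: int = 5_000
) -> Tuple[Optional[Interval], Optional[Interval]]:
    """Return the (left, right) flanking intervals of a gene, clipped to the
    chromosome; a side is None when the gene touches that chromosome end."""
    left = (max(1, start - flank), start - 1) if start > 1 else None
    right = (end + 1, min(chrom_length, end + flank)) if end < chrom_length else None
    return left, right
```

The two intervals can then be extracted from the genome FASTA and used as the nucleotide subject of the tblastn search.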

After the NLR gene clusters containing Helper homologous genes have been screened out, a phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches of the tree are determined from the phylogenetic relationships and the special-domain traces of the NLR genes. A candidate Helper and Sensor must cluster in the same branches of the tree as Helpers and Sensors of identified sequence: a candidate that clusters with an identified Helper is the Helper, and its corresponding gene is the Sensor.
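Once the tree has been inspected and each gene has been placed in a clade, the role assignment itself is a simple lookup. The following is a minimal sketch, assuming the clade membership (`clade_of`) and the sets of clades that contain identified Helpers or Sensors have already been read off the tree by hand. All names are hypothetical, and the sketch applies the stricter reading in which both genes of a pair must fall in the expected clades.

```python
from typing import Dict, Optional, Set, Tuple


def assign_roles(
    pair: Tuple[str, str],
    clade_of: Dict[str, str],
    helper_clades: Set[str],
    sensor_clades: Set[str],
) -> Optional[Dict[str, str]]:
    """Label the two genes of a candidate pair as Helper and Sensor, or return
    None when their clades do not match the Helper/Sensor branch pattern."""
    a, b = pair
    clade_a, clade_b = clade_of.get(a), clade_of.get(b)
    if clade_a in helper_clades and clade_b in sensor_clades:
        return {"helper": a, "sensor": b}
    if clade_b in helper_clades and clade_a in sensor_clades:
        return {"helper": b, "sensor": a}
    return None
```

Special-domain traces, which the text uses as supporting evidence for the Sensor, are not modelled here.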

When searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb of each other, or within a gene-number distance of two or fewer, are regarded as genes of the same gene cluster.

The characteristic value data of the paired NLR genes comprise gene positions, gene numbers, chain orientations, special structural domains, gene types, Pair-ID, TIR types and NBS positions on chromosomes.

The multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.

Example 1

Identification of paired NLR genes in Brachypodium distachyon (Brachypodium distachyon)

1. Collecting basic data: the genome sequence, protein sequences and gene annotation information of Brachypodium distachyon are collected from the JGI public database (https://phytozome.jgi.doe.gov);

2. Identification of NLR genes and gene clusters: (1) based on the protein sequences of the whole genome, domain information for all protein sequences is first obtained with the Hmmscan tool, and proteins containing the NB-ARC domain (PF00931) are selected with the threshold E-value ≤ 1E-4; (2) the LRR domain is then detected with the NLR-parser tool by screening for proteins with motif 9, 11 or 19; genes having both the NB-ARC domain and the LRR domain are the main NLR genes, and genes having only the NB-ARC domain are candidate NLR genes; (3) then, using the physical distance and gene number of the NLR genes on the chromosome in the gene annotation, NLR genes on the same chromosome within a physical distance of 200 kb (Holub, 2001) or within a gene-number distance of two or fewer are assigned to the same gene cluster, and all NLR gene clusters are identified; (4) in total, 336 NBS genes and 272 NLR genes are identified in Brachypodium distachyon, and 69 gene clusters are obtained by screening;

3. Identification of the special domains: (1) based on the Pfam database, Hmmscan is run locally to search all NLR protein sequences for domains, and the reported special domains such as WRKY, NOI, RATX1 and SRP54 are labeled; (2) the genes immediately before and after each NLR gene are searched for special domains; if a neighboring gene carries a special domain, the ancestral NLR gene may also have carried it; (3) the reported special domains of Sensor genes are looked up on the Pfam website, FASTA files of the domain protein sequences are downloaded from the alignment page, a protein sequence database is built with the formatdb tool, and the tblastn tool is then used to search for traces of special domains in the nucleotide sequences within 5 kb upstream and downstream of the clustered NLR genes (E-value < 1E-10); (4) finally, the special-domain information obtained by the three methods is combined and used as one of the criteria for identifying a Sensor.

4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), Helper homologous genes are searched for among the clustered NLR genes with the Blast (2.3.0+) tool, E-value < 10; (2) the NLR gene clusters containing Helper homologous genes are screened out, and a phylogenetic tree is drawn from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) the tree is inspected manually with the MEGA7.0 tool, and duplicates and NLRs lacking conserved phylogenetic characteristics are removed; (4) the Helper and Sensor branches are determined from the phylogenetic relationships and the special domains (such as WRKY and NOI); a candidate Helper and Sensor must cluster in the same branches of the evolutionary tree as Helpers and Sensors of identified sequence, a candidate that clusters with an identified Helper being the Helper and its corresponding gene the Sensor, or vice versa; 16 pairs of NLR genes are finally obtained (FIG. 2A);

5. Re-identification of paired NLR genes: starting from the 16 pairs of NLR genes found in step 4, the analysis and retrieval process of step 4 is repeated until no new paired NLR genes are retrieved, and 22 pairs of paired NLR genes are finally obtained (FIG. 2B).

6. A Brachypodium distachyon paired NLR gene database is established in MySQL, with the characteristic values of the paired NLR genes as data item identifiers.
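The layout of such a database can be sketched as a single table whose columns follow the characteristic values listed in this document (gene position, gene number, strand orientation, special domain, gene type, Pair-ID, TIR type, NBS position). The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name, column names and column types are assumptions, not the schema of the published database.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per gene, two rows sharing a pair_id per pair.
SCHEMA = """
CREATE TABLE paired_nlr (
    gene_id        TEXT PRIMARY KEY,   -- gene number
    pair_id        TEXT NOT NULL,      -- Pair-ID shared by Helper and Sensor
    chrom          TEXT NOT NULL,      -- gene position: chromosome
    start_pos      INTEGER NOT NULL,   -- gene position: start
    end_pos        INTEGER NOT NULL,   -- gene position: end
    strand         TEXT CHECK (strand IN ('+', '-')),
    gene_type      TEXT,               -- e.g. Helper or Sensor (assumed meaning)
    special_domain TEXT,               -- e.g. WRKY, NOI; NULL when absent
    tir_type       TEXT,
    nbs_start      INTEGER,            -- NBS position on the chromosome
    nbs_end        INTEGER
)
"""


def build_db(rows: Iterable[Tuple]) -> sqlite3.Connection:
    """Create an in-memory database and load one tuple per gene."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO paired_nlr VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def genes_in_pair(conn: sqlite3.Connection, pair_id: str) -> List[Tuple[str, str]]:
    """Return (gene_id, gene_type) for both members of one pair."""
    cur = conn.execute(
        "SELECT gene_id, gene_type FROM paired_nlr "
        "WHERE pair_id = ? ORDER BY gene_type",
        (pair_id,),
    )
    return cur.fetchall()
```

Storing one row per gene, keyed by a shared Pair-ID, keeps Helper and Sensor attributes symmetrical and makes it straightforward to merge tables from several species.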

Example 2

Identification of paired NLR genes in barley (Hordeum vulgare)

1. Collecting basic data: the genome sequence, protein sequences and gene annotation information of barley are collected from the Ensembl public database (http://plants.ensembl.org);

2. Identification of NLR genes and gene clusters: (1) based on the protein sequences of the whole genome, domain information for all protein sequences is first obtained with the Hmmscan tool, and proteins containing the NB-ARC domain (PF00931) are selected with the threshold E-value ≤ 1E-4; (2) the LRR domain is then detected with the NLR-parser tool by screening for proteins with motif 9, 11 or 19; genes having both the NB-ARC domain and the LRR domain are the main NLR genes, and genes having only the NB-ARC domain are candidate NLR genes; (3) then, using the physical distance and gene number of the NLR genes on the chromosome in the gene annotation, NLR genes on the same chromosome within a physical distance of 200 kb (Holub, 2001) or within a gene-number distance of two or fewer are assigned to the same gene cluster, and all NLR gene clusters are identified; (4) in total, 405 NBS genes and 318 NLR genes are identified in barley, and 74 gene clusters are obtained by screening;

4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), Helper homologous genes are searched for among the clustered NLR genes with the Blast (2.3.0+) tool, E-value < 10; (2) the NLR gene clusters containing Helper homologous genes are screened out, and a phylogenetic tree is drawn from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) the tree is inspected manually with the MEGA7.0 tool, and duplicates and NLRs lacking conserved phylogenetic characteristics are removed; (4) the Helper and Sensor branches are determined from the phylogenetic relationships and the special domains (such as WRKY and NOI); a candidate Helper and Sensor must cluster in the same branches of the evolutionary tree as Helpers and Sensors of identified sequence, a candidate that clusters with an identified Helper being the Helper and its corresponding gene the Sensor, or vice versa; 16 pairs of NLR genes are finally obtained (FIG. 3A);

5. Re-identification of paired NLR genes: starting from the paired NLR genes found in step 4, the analysis and retrieval process of step 4 is repeated until no new paired NLR genes are retrieved, and 16 pairs of paired NLR genes are finally obtained (FIG. 3B).

6. A barley paired NLR gene database is established with the characteristic values of the paired NLR genes as data item identifiers.

Example 3

Identification of paired NLR genes in Sorghum (Sorghum bicolor)

1. Collecting basic data: the genome sequence, protein sequences and gene annotation information of sorghum are collected from the JGI public database (https://phytozome.jgi.doe.gov);

2. Identification of NLR genes and gene clusters: (1) based on the protein sequences of the whole genome, domain information for all protein sequences is first obtained with the Hmmscan tool, and proteins containing the NB-ARC domain (PF00931) are selected with the threshold E-value ≤ 1E-4; (2) the LRR domain is then detected with the NLR-parser tool by screening for proteins with motif 9, 11 or 19; genes having both the NB-ARC domain and the LRR domain are the main NLR genes, and genes having only the NB-ARC domain are candidate NLR genes; (3) then, using the physical distance and gene number of the NLR genes on the chromosome in the gene annotation, NLR genes on the same chromosome within a physical distance of 200 kb (Holub, 2001) or within a gene-number distance of two or fewer are assigned to the same gene cluster, and all NLR gene clusters are identified; (4) in total, 326 NBS genes and 275 NLR genes are identified in sorghum, and 55 gene clusters are obtained by screening;

4. Identification of paired NLR genes: (1) based on the paired NLR genes reported in rice (Wang et al., 2019), Helper homologous genes are searched for among the clustered NLR genes with the Blast (2.3.0+) tool, E-value < 10; (2) the NLR gene clusters containing Helper homologous genes are screened out, and a phylogenetic tree is drawn from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) the tree is inspected manually with the MEGA7.0 tool, and duplicates and NLRs lacking conserved phylogenetic characteristics are removed; (4) the Helper and Sensor branches are determined from the phylogenetic relationships and the special domains (such as WRKY and NOI); a candidate Helper and Sensor must cluster in the same branches of the evolutionary tree as Helpers and Sensors of identified sequence, a candidate that clusters with an identified Helper being the Helper and its corresponding gene the Sensor, or vice versa; 19 pairs of NLR genes are finally obtained (FIG. 4A);

5. Re-identification of paired NLR genes: starting from the paired NLR genes found in step 4, the analysis and retrieval process of step 4 is repeated until no new paired NLR genes are retrieved, and 19 pairs of paired NLR genes are finally obtained (FIG. 4B).

6. A sorghum paired NLR gene database is established with the characteristic values of the paired NLR genes as data item identifiers.

Example 4

Identification of paired NLR genes in Setaria viridis (Setaria viridis)

1. Collecting basic data: collecting genome sequences, protein sequences and gene annotation information (https:// phytozome.jgi.doe.gov) of green bristlegrass from JGI public databases;

2. identification of NLR genes and gene clusters: (1) firstly, acquiring domain information of all protein sequences through an Hmmscan tool based on protein sequence information of a whole genome, screening proteins containing an NB-ARC domain (PF00931) from the domain information, and setting parameters to be E-value less than or equal to 1E-4; (2) then, through an NLR-parser tool, the LRR structural domain is detected, proteins with motif 9, 11 or 19 are screened, genes with an NB-ARC structural domain and the LRR structural domain are the main NLR genes, and the genes with the NB-ARC structural domain are the candidate NLR genes; (3) then based on the physical distance and the gene number of the NLR gene on the chromosome in the gene annotation, the NLR genes with the physical distance of more than 200kb (Holub,2001) or the distance of two or less gene numbers on the same chromosome are taken as the genes in the same gene cluster, and all the NLR gene clusters are identified; (4) finally, 388 NBS genes and 291 NLR genes are identified in green bristlegrass herb, and 74 gene clusters are finally obtained through screening;

4. Identification of paired NLR genes: (1) based on paired NLR genes reported by rice (Wang et al, 2019), Helper homologous genes are searched in clustered NLR genes by a Blast (2.3.0+) tool, and E-value is less than 10; (2) screening NLR gene cluster containing Helper homologous gene, using NB-ARC structural domain protein sequence to draw phylogenetic tree by Clustalw2 tool (bootstrap value is set to 1000, and default values are used for others); (3) manual inspection is carried out through MEGA7.0 tool, duplicate and NLR without conservative phylogenetic characteristics are removed; (4) determining the branches of the Helper and the Sensor through phylogenetic relationship and special structural domains (such as WRKY, NOI and the like), wherein the candidate Helper and the Sensor need to be gathered in the same branch with the Helper and the Sensor with identified sequences on an evolutionary tree, and the candidate Helper and the identified Helper are gathered in one branch to be the Helper, and the corresponding gene is the Sensor, or vice versa, and finally obtaining 20 pairs of NLR genes (figure 5A);

5. Re-identification of paired NLR genes: starting from the paired NLR genes found in step 4, the search and analysis process of step 4 is repeated until no new paired NLR genes are retrieved, and 20 paired NLR genes are finally obtained (FIG. 5B).
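The clustering rule of step 2(3) can be illustrated with a minimal Python sketch. It is not the implementation used in the invention. It assumes that the 200 kb threshold is an upper bound (as stated in claim 3) and that "a distance of two or fewer gene numbers" means at most two annotated genes lie between neighbouring NLR genes. The names `Gene`, `cluster_nlr_genes`, `MAX_DISTANCE_BP` and `MAX_INTERVENING_GENES` are hypothetical and do not appear in the original method.

```python
from dataclasses import dataclass

# Thresholds taken from the text; their exact interpretation is an assumption.
MAX_DISTANCE_BP = 200_000      # physical distance of 200 kb
MAX_INTERVENING_GENES = 2      # "two or fewer" genes between neighbours


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int   # start coordinate in bp
    end: int     # end coordinate in bp
    index: int   # ordinal position among all annotated genes on the chromosome


def cluster_nlr_genes(nlr_genes):
    """Group NLR genes into clusters; single genes are not reported as clusters."""
    by_chrom = {}
    for gene in nlr_genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    clusters = []
    for chrom in sorted(by_chrom):
        genes = sorted(by_chrom[chrom], key=lambda g: g.start)
        current = [genes[0]]
        for prev, gene in zip(genes, genes[1:]):
            close_in_bp = gene.start - prev.end <= MAX_DISTANCE_BP
            close_in_genes = gene.index - prev.index - 1 <= MAX_INTERVENING_GENES
            if close_in_bp or close_in_genes:
                # Either criterion is enough to extend the current cluster.
                current.append(gene)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [gene]
        if len(current) > 1:
            clusters.append(current)
    return clusters
```

In this sketch either criterion alone extends a cluster, which matches the "or" in the text; requiring both criteria would give smaller clusters.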

Example 5

Identification of paired NLR genes in Arabidopsis thaliana (Arabidopsis thaliana)

1. Collecting basic data: the genome sequences, protein sequences and gene annotation information of Arabidopsis thaliana are collected from the JGI public database (https://phytozome.jgi.doe.gov);

2. Identification of NLR genes and gene clusters: (1) first, based on the protein sequences of the whole genome, domain information for all protein sequences is obtained with the Hmmscan tool, and proteins containing an NB-ARC domain (PF00931) are selected, with the threshold set to E-value ≤ 1E-4; (2) next, the LRR domain is detected with the NLR-parser tool, and proteins carrying motif 9, 11 or 19 are selected; genes with both an NB-ARC domain and an LRR domain are the main NLR genes, and genes with only an NB-ARC domain are the candidate NLR genes; (3) then, based on the physical distance and the gene-number distance between NLR genes on a chromosome in the gene annotation, NLR genes on the same chromosome that lie within a physical distance of 200 kb (Holub, 2001) or are separated by two or fewer genes are assigned to the same gene cluster, and all NLR gene clusters are identified; (4) finally, 165 NBS genes and 142 NLR genes are identified in Arabidopsis thaliana, and 23 gene clusters are obtained after screening;

4. Identification of paired NLR genes: (1) based on the paired NLR genes RRS1/RPS4 (Narusaka et al., 2009) and RPP2A/RPP2B (Sinapidou et al., 2004) reported in Arabidopsis thaliana, Helper homologous genes are searched for among the clustered NLR genes with the Blast (2.3.0+) tool, E-value < 10; (2) NLR gene clusters containing Helper homologous genes are selected, and a phylogenetic tree is built from the NB-ARC domain protein sequences with the Clustalw2 tool (bootstrap value set to 1000, default values otherwise); (3) the tree is inspected manually with the MEGA7.0 tool, and duplicates and NLRs lacking conserved phylogenetic characteristics are removed; (4) the Helper and Sensor branches are determined from the phylogenetic relationships and special domains (such as WRKY and NOI): candidate Helpers and Sensors must cluster in the same branches of the evolutionary tree as the Helpers and Sensors with identified sequences; a candidate that clusters in one branch with an identified Helper is designated a Helper and its corresponding gene the Sensor, and vice versa; 12 pairs of NLR genes are finally obtained (FIG. 6, Clade I);

5. Re-identification of paired NLR genes: starting from the paired NLR genes found in step 4, the search and analysis process of step 4 is repeated until no new paired NLR genes are retrieved, and 18 paired NLR genes are finally obtained (FIG. 6, Clade II).
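The multi-round search of step 5 repeats step 4 until a round yields no new pairs. A minimal Python sketch of that control flow follows. It is an illustration only: `find_pairs` is a hypothetical callable standing in for one complete round of step 4 (Blast search, tree building and manual curation), and `max_rounds` is an assumed safety limit that is not part of the original method.

```python
def search_until_converged(seed_pairs, find_pairs, max_rounds=50):
    """Repeat the step-4 search until a round finds no new paired NLR genes.

    seed_pairs: iterable of (helper_id, sensor_id) tuples used as queries.
    find_pairs: hypothetical callable; takes the set of known pairs and
                returns the pairs retrieved when they are used as queries.
    Returns the set of all known pairs (seeds included) and the number of
    rounds that were run, counting the final round that found nothing new.
    """
    known = set(seed_pairs)
    for round_no in range(1, max_rounds + 1):
        new_pairs = set(find_pairs(known)) - known
        if not new_pairs:
            # Converged: this round retrieved nothing that was not already known.
            return known, round_no
        known |= new_pairs
    raise RuntimeError(f"search did not converge within {max_rounds} rounds")
```

Because each round queries with every pair known so far, the loop stops exactly when the set of pairs no longer grows, which is the stopping condition described in step 5.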

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the foregoing description only for the purpose of illustrating the principles of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims, specification, and equivalents thereof.

Claims

1. A method for constructing a plant paired NLR resistance gene database is characterized by comprising the following steps:

2. The method for constructing a plant paired NLR resistance gene database according to claim 1, wherein the method comprises the following steps: in step (3), after the NB-ARC domain and the LRR domain are detected by Hmmscan and NLR-parser, the gene having both the NB-ARC domain and the LRR domain is classified as the main NLR gene, and the gene having only the NB-ARC domain is classified as the candidate NLR gene.

3. The method for constructing a plant paired NLR resistance gene database according to claim 1, wherein the method comprises the following steps: in the step (4), in the process of searching for NLR gene clusters, NLR genes on the same chromosome that lie within a physical distance of 200 kb, or are separated by two or fewer genes, are regarded as genes in the same gene cluster.

4. The method for constructing a plant paired NLR resistance gene database according to claim 1, wherein the method comprises the following steps: in the step (4), traces of special domains of the NLR genes are mined in several ways, including an Hmmscan search of the target protein, an Hmmscan search of the proteins before and after the target gene, and a tblastn search of the nucleotide sequence within 5 kb before and after the target gene, and the results are used as one of the criteria for identifying Sensors.

5. The method for constructing a plant paired NLR resistance gene database according to claim 1, wherein the method comprises the following steps: in the step (4), after the NLR gene clusters containing Helper homologous genes are screened out, a phylogenetic tree is drawn from the NB-ARC domain protein sequences, and the Helper and Sensor branches in the phylogenetic tree are determined based on the phylogenetic relationships and the traces of special domains of the NLR genes; candidate Helpers and Sensors must cluster in the same branches of the phylogenetic tree as the Helpers and Sensors with identified sequences; a candidate that clusters in the same branch as an identified Helper is a Helper, and its corresponding gene is the Sensor.

6. The method for constructing a plant paired NLR resistance gene database according to claim 1, wherein the method comprises the following steps: in the step (5), the characteristic value data of the paired NLR genes includes gene position, gene number, strand orientation, specific domain, gene type, Pair-ID, TIR type, NBS position on chromosome.

7. The paired plant NLR gene database constructed by the method of constructing a paired plant NLR resistance gene database according to any one of claims 1 to 6, wherein the paired NLR gene database is established by taking the characteristic values corresponding to the paired NLR genes as data item identifiers: https://figshare.com/articles/dataset/Database_base_of_scheduled_NLR_genes/15096966.

8. The multi-species paired NLR gene database according to claim 7, characterized in that: the multiple species are Brachypodium distachyon, barley, sorghum, Setaria viridis or Arabidopsis thaliana.