CN112185459A - Prediction method for interaction of plant and pathogenic bacteria protein - Google Patents
Prediction method for interaction of plant and pathogenic bacteria protein
- Publication number
- CN112185459A
- Authority
- CN
- China
- Prior art keywords
- protein
- interaction
- data
- spatial structure
- plant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Chemical & Material Sciences (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Peptides Or Proteins (AREA)
Abstract
The invention relates to a method for predicting protein interactions between plants and pathogens, comprising the following steps: 1) collecting positive host-pathogen protein interaction data; 2) collecting the spatial structures of protein complex templates and analyzing the interaction interfaces of subunit pairs; 3) performing homology structure modeling on host and pathogen protein sequences to obtain protein homology spatial structure models; 4) comparing the protein homology spatial structures with the spatial structures of the protein complex templates to obtain structural features; 5) extracting non-structural features; and 6) building a machine learning model based on the structural and non-structural features, testing and adjusting it, and predicting genome-scale rice-rice blast fungus protein interactions. Compared with the prior art, the invention makes full use of experimentally determined protein structure data together with information such as homology and domain interactions, and can extract interaction feature information relevant to plant-pathogen proteins effectively, quickly and simply.
Description
Technical Field
The invention relates to the technical field of biological data processing, in particular to a prediction method for the interaction between plants and pathogenic bacteria proteins.
Background
Plant-pathogen interaction is a two-way biological communication process. On the one hand, plants try to recognize molecules secreted by pathogens in order to avoid infection; on the other hand, pathogens manipulate plants as far as possible to make the host environment more favorable to themselves. As a result, many known intra-species protein interaction prediction methods are unsuitable for plant-pathogen systems, and little research has focused on predicting plant-pathogen protein interactions.
Although experimental methods for detecting protein interactions have been developed, they are time-consuming and laborious and have produced only a small amount of data, most of which concerns interactions between humans and pathogens (especially viruses). By contrast, interaction data for other hosts, and plant-pathogen protein interaction data in particular, are very limited.
Protein interactions are readily explained from the perspective of protein spatial structure, but protein spatial structures are complex and the number of proteins with known structures is limited. How to make full use of experimentally determined protein structure data to extract relevant interaction feature information has therefore become a key problem that urgently needs to be solved in the study of plant-pathogen interactions.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for predicting the interaction between the plant and the pathogenic bacteria protein, which can effectively, quickly and simply extract the plant-pathogenic bacteria protein related interaction characteristic information by means of the measured protein spatial structure data and the information of homology, domain interaction and the like.
The purpose of the invention can be realized by the following technical scheme:
A method for predicting protein interactions between a plant and a pathogen, comprising the following steps:
S1, collecting positive host-pathogen protein interaction data and the genome data of rice and the rice blast fungus;
Positive host-pathogen protein interaction data are collected from the HPIDB database; the data must have been obtained by at least one experimental protein interaction detection method, such as yeast two-hybrid.
The rice genome data are downloaded from the MSU database, and transposon genes are deleted. The rice blast fungus genome data are downloaded from the Ensembl Genomes database; transmembrane helix prediction is performed on the TMHMM website, and proteins with more than 0 predicted transmembrane helices are selected; signal peptide prediction is performed on the SignalP website and subcellular localization prediction on the WoLF PSORT website, and proteins that carry a signal peptide and are localized outside the cell are regarded as secreted proteins of the rice blast fungus. After the duplicate proteins obtained across these steps are removed, the rice blast fungus proteins that potentially interact with rice proteins are obtained by screening.
S2, collecting the spatial structures of the protein complex templates, and splitting each protein complex into different subunits to obtain the interaction interfaces of the subunit pairs;
Experimentally determined protein three-dimensional structure data are acquired from the PDB protein structure database; the data must have been determined by at least one of nuclear magnetic resonance, X-ray crystal diffraction or electron microscopy. After the three-dimensional structure data are obtained, the protein complex is split into different subunits, the structural data of the subunit pairs are read with PIBASE software, and interaction interface information is extracted.
S3, taking the spatial structures of the protein complex templates from step S2 as templates, and performing homology structure modeling on the host and pathogen protein sequences with MODPIPE to obtain protein homology spatial structure models;
S4, comparing the protein homology spatial structures with the spatial structures of the protein complex templates to obtain structural features;
Further, the protein homology spatial structure is compared with the spatial structure of the protein complex template using TM-align software to obtain the structural features. The structural features comprise the similarity and the structural deviation between the protein homology spatial structure and the protein complex, and the number and proportion of conserved residues at the interaction interface between the protein homology spatial structure and the spatial structure of the protein complex template.
S5, collecting protein interaction data of model organisms, obtaining a positive interaction dataset of the model organisms, and extracting non-structural features;
The cross-species conservation of plant-pathogen protein interactions is analyzed using homology mapping to obtain the protein homology mapping relationship, and a domain interaction dataset is combined to obtain the interacting protein pairs supported by interacting domains, i.e. the domain interaction relationship.
S6, building a machine learning model based on the structural features and the non-structural features, testing and adjusting it, and predicting genome-scale rice-rice blast fungus protein interactions.
Specifically, in step S6, sequence clustering and random combination are applied to the positive host-pathogen protein interaction dataset obtained in step S1 to generate a certain number of negative samples; the positive and negative datasets are divided into a training set and a test set in a certain proportion; an initial machine learning model is built with the scikit-learn random forest from the structural and non-structural features of the training set; the parameters of the initial model are optimized, tested and adjusted in batches with the grid search function; the optimized model is used to predict the relationship of every possible pairwise rice-rice blast fungus protein pair at the genome scale; and a rice-rice blast fungus protein interaction network is drawn from the prediction results with Cytoscape software.
Compared with the prior art, the method is based on the existing biological data, and can effectively, quickly and simply extract the plant-pathogenic bacteria protein related interaction characteristic information by means of the determined protein space structure data and the information of homology, structural domain interaction and the like, so as to obtain the plant-pathogenic bacteria protein interaction data and provide reference for the research of plant disease-resistant molecular mechanisms.
Drawings
FIG. 1 is a schematic flow chart of a method for predicting plant-pathogen protein interaction in the examples;
FIG. 2 is a rice-blast protein interaction network at the genomic scale in the examples.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
Computer prediction of protein interactions requires extraction of valuable features from large amounts of data using methods such as statistics, machine learning, and the like. With the exponential growth of biological data, the machine learning method can be applied to the analysis of biological data through improvement. The invention provides a prediction method of plant and pathogenic bacteria protein interaction based on a protein space structure, and a high-accuracy plant-pathogenic bacteria protein interaction network is constructed on a genome scale according to the prediction method.
Specifically, as shown in FIG. 1, the invention relates to a method for predicting protein interactions between plants and pathogens, comprising the following steps:
Step one, collecting positive host-pathogen protein interaction data and the genome data of rice and the rice blast fungus
A positive host-pathogen protein interaction dataset is collected from the HPIDB database (Host-Pathogen Interaction Database). The data must have been obtained by at least one experimental protein interaction detection method, such as yeast two-hybrid.
When rice is infected by the rice blast fungus, the membrane proteins and secreted proteins of the fungus are the most likely to interact with rice proteins inside the plant. On the basis of the HPIDB database, the invention obtains the rice blast fungus proteins that potentially interact with rice proteins. Specifically: the rice genome data are downloaded from the MSU database, and transposon genes are deleted. The rice blast fungus genome data are downloaded from the Ensembl Genomes database, and transmembrane helix prediction is performed on the TMHMM website; proteins with more than 0 predicted transmembrane helices are treated as membrane proteins, 2317 in total. Signal peptide prediction is performed on the SignalP website and subcellular localization prediction on the WoLF PSORT website; proteins that are classified as carrying a signal peptide and are localized outside the cell are regarded as secreted proteins of the rice blast fungus, 1402 in total. After duplicates are removed, a total of 3491 rice blast fungus proteins that potentially interact with rice proteins are obtained by screening.
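The screening in step one amounts to taking the union of the predicted membrane proteins and the predicted secreted proteins. The following is a minimal sketch of that set logic only; the function name, the input dictionaries and the `"extr"` localization label are illustrative assumptions, not the output formats of TMHMM, SignalP or WoLF PSORT. If the duplicates removed are exactly the proteins present in both categories, the reported counts imply an overlap of 2317 + 1402 - 3491 = 228 proteins.

```python
def candidate_pathogen_proteins(tm_helices, has_signal_peptide, localization,
                                extracellular_label="extr"):
    """Return (membrane, secreted, candidates) sets of protein IDs.

    tm_helices:         protein ID -> number of predicted transmembrane helices
    has_signal_peptide: protein ID -> True if a signal peptide was predicted
    localization:       protein ID -> predicted subcellular localization label
    extracellular_label is a hypothetical label for "outside the cell".
    """
    # Membrane proteins: more than 0 predicted transmembrane helices.
    membrane = {p for p, n in tm_helices.items() if n > 0}
    # Secreted proteins: signal peptide AND extracellular localization.
    secreted = {
        p for p, sp in has_signal_peptide.items()
        if sp and localization.get(p) == extracellular_label
    }
    # The union removes proteins that appear in both categories.
    return membrane, secreted, membrane | secreted


# Toy data (hypothetical protein IDs).
tm_helices = {"P1": 2, "P2": 0, "P3": 1, "P4": 0}
has_signal_peptide = {"P1": False, "P2": True, "P3": True, "P4": True}
localization = {"P1": "plas", "P2": "extr", "P3": "extr", "P4": "cyto"}

membrane, secreted, candidates = candidate_pathogen_proteins(
    tm_helices, has_signal_peptide, localization
)

# Overlap implied by the counts reported in the text, under the assumption above.
overlap = 2317 + 1402 - 3491
```

In the toy data, P3 is both a membrane protein and a secreted protein, so it is counted once in `candidates`.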
Step two, collecting the spatial structure of the protein complex template and analyzing the subunit interaction interface
Experimentally determined protein three-dimensional structure data are downloaded from the PDB protein structure database; the structure data must have been determined by at least one of nuclear magnetic resonance, X-ray crystal diffraction or electron microscopy. Complex subunit interaction interface analysis means dividing each PDB protein complex into pairwise interacting protein subunit pairs: the protein complex is split into different subunits, the structural data of the subunit pairs are read with PIBASE software, and the interaction interface information is extracted.
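To illustrate what splitting a complex into pairwise subunit pairs and extracting an interface can look like, here is a simplified sketch. It does not reproduce PIBASE; the data layout (one representative coordinate per residue), the function names and the 6.0 distance cutoff are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def interface_residues(chain_a, chain_b, cutoff=6.0):
    """Return the residues of each chain lying within `cutoff` of the other chain.

    Each chain is a dict: residue ID -> (x, y, z) representative coordinate.
    The cutoff value is a hypothetical choice, not taken from the source.
    """
    hits_a, hits_b = set(), set()
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            if dist(xyz_a, xyz_b) <= cutoff:
                hits_a.add(res_a)
                hits_b.add(res_b)
    return hits_a, hits_b


def subunit_pair_interfaces(complex_chains, cutoff=6.0):
    """Enumerate all chain pairs of a complex and keep those that are in contact."""
    interfaces = {}
    for id_a, id_b in combinations(sorted(complex_chains), 2):
        hits_a, hits_b = interface_residues(
            complex_chains[id_a], complex_chains[id_b], cutoff
        )
        if hits_a:  # keep only pairs with at least one contact
            interfaces[(id_a, id_b)] = (hits_a, hits_b)
    return interfaces


# Toy complex with three chains; only A and B are in contact.
toy_complex = {
    "A": {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)},
    "B": {1: (3.0, 0.0, 0.0), 2: (30.0, 0.0, 0.0)},
    "C": {1: (100.0, 0.0, 0.0)},
}
interfaces = subunit_pair_interfaces(toy_complex)
```

A real pipeline would work on all atoms of each residue and read the coordinates from the PDB entries; the sketch only shows the pairwise enumeration and the contact test.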
Step three, protein homologous structure modeling
Taking the experimentally determined protein three-dimensional structure data from step two as templates, homology structure modeling is performed on the host and pathogen protein sequences with MODPIPE software to obtain spatial structure models of the host and pathogen proteins.
Taking the host-pathogen protein interaction dataset from step one as an example, the protein sequences are downloaded from the UniProt database and homology modeling is performed on them; the alignment methods include sequence-sequence alignment, profile-sequence alignment and profile-profile alignment. The quality of each homology model is scored with MPQS, a composite score comprising sequence similarity, template coverage and three independent evaluation scores: e-value, Z-DOPE and GA341. The e-value measures the significance of the alignment between the modeled protein and the template; Z-DOPE is based on DOPE (discrete optimized protein energy), an atomic distance-dependent statistical potential derived from a sample of known structures according to probability theory, which does not depend on any adjustable parameters; GA341 is a statistics-based model reliability score. By examining the probability distribution function of the scores, the threshold is set at MPQS ≥ 0.5, and models meeting it are regarded as stable homology structure models.
The sequence length of each homology structure model is also scored, and models that are too short to judge whether an interaction interface exists are filtered out. The score is MODSEQ-score = L-MOD / L-SEQ, where L-MOD is the length of the homology-modeled sequence and L-SEQ is the length of the corresponding gene sequence. Based on the probability density distribution function of MODSEQ-score, and balancing data quantity against data quality, the threshold is set to 30%, and homology modeling results for a total of 14628 proteins are obtained.
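The two model filters described in step three can be sketched as follows. The record layout and function names are hypothetical, and treating the 30% threshold as inclusive (score at or above 0.30) is an assumption, since the text only gives the threshold value.

```python
def modseq_score(model_length, sequence_length):
    """MODSEQ-score = L-MOD / L-SEQ: the fraction of the sequence covered by the model."""
    return model_length / sequence_length


def keep_model(mpqs, model_length, sequence_length,
               mpqs_min=0.5, modseq_min=0.30):
    """Keep a homology model only if it passes both quality filters."""
    return (mpqs >= mpqs_min
            and modseq_score(model_length, sequence_length) >= modseq_min)


# Toy models (hypothetical values).
models = [
    {"id": "m1", "mpqs": 0.8, "l_mod": 120, "l_seq": 300},  # score 0.40, kept
    {"id": "m2", "mpqs": 0.9, "l_mod": 50, "l_seq": 400},   # score 0.125, too short
    {"id": "m3", "mpqs": 0.4, "l_mod": 200, "l_seq": 250},  # MPQS below 0.5
]
kept = [m["id"] for m in models
        if keep_model(m["mpqs"], m["l_mod"], m["l_seq"])]
```

For example, a 120-residue model of a 300-residue sequence has MODSEQ-score 0.40 and passes, while a 50-residue model of a 400-residue sequence has 0.125 and is discarded.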
Step four, superimposing and comparing the protein homology structure models with the complex template structures to obtain the structural features
The homology structure models of the host and pathogen proteins are compared with the complex templates in spatial structure using TM-align software. Taking the host-pathogen protein interaction dataset from step one as an example, requiring a TM-score greater than 0.4 finally yields structure comparison results for 10148 positive homologous templates against the complex subunits. The RMSD value, the TM-score value, the number of conserved residues at the interaction interface and the proportion of conserved residues between the protein homology model and the complex template are calculated as the structural features. Calculating these four values from the structure comparison results is prior art and is not described in detail here.
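The four structural features can be assembled into one feature vector as sketched below. The source treats the calculation as prior art and does not define "conserved"; this sketch assumes a template interface residue counts as conserved when the aligned model residue has the same amino acid. The function names and data layout are likewise hypothetical.

```python
def interface_conservation(template_interface, aligned_model_residues):
    """Count interface residues conserved between template and model.

    template_interface:     template position -> amino acid at the interface
    aligned_model_residues: template position -> amino acid of the aligned
                            model residue (positions without alignment omitted)
    Returns (number of conserved residues, proportion of the interface).
    """
    conserved = sum(
        1 for pos, aa in template_interface.items()
        if aligned_model_residues.get(pos) == aa
    )
    total = len(template_interface)
    return conserved, (conserved / total if total else 0.0)


def structural_features(tm_score, rmsd, template_interface,
                        aligned_model_residues, tm_min=0.4):
    """Return [TM-score, RMSD, conserved count, conserved proportion],
    or None when the TM-score does not exceed the 0.4 threshold."""
    if tm_score <= tm_min:
        return None
    n_conserved, ratio = interface_conservation(
        template_interface, aligned_model_residues
    )
    return [tm_score, rmsd, n_conserved, ratio]


# Toy example: 4 interface positions, 2 of them conserved in the model.
template_interface = {10: "K", 11: "D", 15: "L", 20: "F"}
aligned_model_residues = {10: "K", 11: "E", 15: "L"}

features = structural_features(0.62, 2.1, template_interface,
                               aligned_model_residues)
rejected = structural_features(0.35, 4.8, template_interface,
                               aligned_model_residues)
```

The TM-score and RMSD inputs would come from the TM-align output; they are passed in as plain numbers here.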
Step five, analyzing and extracting non-structural features
Protein interaction data of 7 model organisms including arabidopsis, mice, nematodes, humans, escherichia coli, yeast and drosophila are collected from five public databases of BioGRID, IntAct, DIP, BIND and MINT, and a model organism positive interaction data set is obtained.
The orthologous relationships between the rice and rice blast fungus proteins obtained in step one and the proteomes of the 7 model organisms are analyzed with InParanoid software and BLAST software respectively, giving the first non-structural feature: the homology mapping relationship. Combining the InParanoid results with the model organism positive dataset gives 5720 pairs of rice-rice blast fungus protein interactions supported by homology mapping. For BLAST, three parameters (e-value, sequence identity and sequence coverage) are adjusted and fixed at an e-value of 1e-5, a sequence identity of 45% and a sequence coverage of 50%, giving 5702 pairs of rice-rice blast fungus protein interactions.
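The homology mapping idea (often called interolog mapping) can be sketched as: keep the homolog hits that pass the BLAST thresholds, then mark a rice-fungus pair as supported when some homolog of the rice protein is known to interact with some homolog of the fungal protein. This is a minimal illustration, not the patent's pipeline; the record fields and function names are assumptions, as is reading the thresholds as "e-value at most 1e-5, identity at least 45%, coverage at least 50%".

```python
def passes_blast_filter(hit, max_evalue=1e-5, min_identity=45.0,
                        min_coverage=50.0):
    """Apply the three BLAST thresholds to one hit (a dict of parsed values)."""
    return (hit["evalue"] <= max_evalue
            and hit["identity"] >= min_identity
            and hit["coverage"] >= min_coverage)


def homology_supported_pairs(rice_homologs, fungus_homologs, model_interactions):
    """Return the (rice, fungus) pairs supported by a known model-organism interaction.

    rice_homologs / fungus_homologs: protein ID -> set of model-organism homolog IDs
    model_interactions: iterable of (protein, protein) positive interactions
    """
    known = {frozenset(pair) for pair in model_interactions}  # order-independent
    supported = set()
    for rice_id, rice_homs in rice_homologs.items():
        for fungus_id, fungus_homs in fungus_homologs.items():
            if any(frozenset((a, b)) in known
                   for a in rice_homs for b in fungus_homs):
                supported.add((rice_id, fungus_id))
    return supported


# Toy BLAST hits (hypothetical values).
hits = [
    {"query": "Os1", "subject": "AT1", "evalue": 1e-20, "identity": 60.0, "coverage": 80.0},
    {"query": "Os2", "subject": "AT9", "evalue": 1e-3, "identity": 70.0, "coverage": 90.0},
    {"query": "Os3", "subject": "AT5", "evalue": 1e-30, "identity": 40.0, "coverage": 95.0},
]
good_hits = [h["query"] for h in hits if passes_blast_filter(h)]

# Toy homolog tables and one known model-organism interaction.
rice_homologs = {"Os1": {"AT1"}, "Os2": {"AT9"}}
fungus_homologs = {"Mo1": {"AT2"}, "Mo2": {"AT3"}}
model_interactions = [("AT2", "AT1")]
supported = homology_supported_pairs(rice_homologs, fungus_homologs,
                                     model_interactions)
```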
Protein domain information is read with PfamScan and combined with the domain interaction dataset collected from the 3did database to obtain the interacting protein pairs supported by interacting domains. This gives the second non-structural feature: the domain interaction relationship.
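The domain interaction feature reduces to a lookup: a protein pair is supported when at least one domain of the first protein and one domain of the second protein form a known interacting domain pair. The sketch below assumes the PfamScan output and the 3did pairs have already been parsed into plain Python collections; the function name and the placeholder domain identifiers are hypothetical.

```python
def domain_supported(domains_a, domains_b, interacting_domain_pairs):
    """True if any domain of protein A is known to interact with any domain of protein B."""
    known = {frozenset(pair) for pair in interacting_domain_pairs}  # order-independent
    return any(frozenset((da, db)) in known
               for da in domains_a for db in domains_b)


# Placeholder domain identifiers, not real Pfam accessions.
interacting_domain_pairs = [("DOM_A", "DOM_B")]
protein_domains = {
    "Os1": {"DOM_A", "DOM_C"},
    "Mo1": {"DOM_B"},
    "Mo2": {"DOM_D"},
}

pair_1 = domain_supported(protein_domains["Os1"], protein_domains["Mo1"],
                          interacting_domain_pairs)
pair_2 = domain_supported(protein_domains["Os1"], protein_domains["Mo2"],
                          interacting_domain_pairs)
```

Here Os1 and Mo1 are supported because DOM_A and DOM_B form a known pair, while Os1 and Mo2 share no known interacting domains.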
Step six, construction and optimization of the machine learning model
Sequence clustering and random combination are applied to the host-pathogen protein interaction dataset from step one to generate a certain number of negative samples, and the positive dataset from step one and the negative dataset from this step are divided into a training set and a test set in a certain proportion. An initial machine learning model is built with the scikit-learn random forest from the 4 structural features and 2 non-structural features of the training set. The parameters of the initial model are optimized and adjusted in batches with the grid search function, and the final parameters are: maximum number of iterations 60, maximum decision tree depth 13, minimum number of samples required to split an internal node 120, minimum number of samples per leaf node 20, maximum number of features 7, random seed 10, with all other parameters left at their defaults. The optimized model is used to predict the relationship of every possible pairwise rice-rice blast fungus protein pair at the genome scale, with a screening threshold of 0.5. The rice-rice blast fungus protein interaction network is drawn from all prediction results with Cytoscape software, and the resulting visualization is shown in FIG. 2.
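The negative-set generation and the final screening can be sketched with the standard library alone. This sketch covers only the "random combination" part: it pairs host and pathogen proteins from the positive set at random and discards known positives. The sequence clustering step, the random forest and the grid search are omitted; the function names, the seed and the inclusive reading of the 0.5 threshold are assumptions.

```python
import random


def sample_negatives(positives, n_negatives, seed=0):
    """Randomly combine hosts and pathogens into pairs that are not known positives."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    hosts = sorted({host for host, _ in positives})
    pathogens = sorted({pathogen for _, pathogen in positives})
    positive_set = set(positives)
    candidates = [(h, p) for h in hosts for p in pathogens
                  if (h, p) not in positive_set]
    if n_negatives > len(candidates):
        raise ValueError("not enough non-positive pairs to sample from")
    return rng.sample(candidates, n_negatives)


def predicted_interactions(scores, threshold=0.5):
    """Keep the pairs whose predicted interaction probability reaches the threshold."""
    return [pair for pair, score in scores.items() if score >= threshold]


# Toy positive set (hypothetical IDs): 3 hosts x 3 pathogens, 3 known positives.
positives = [("H1", "P1"), ("H2", "P2"), ("H3", "P3")]
negatives = sample_negatives(positives, 3)

# Toy model output: pair -> predicted probability of interaction.
scores = {("H1", "P2"): 0.7, ("H2", "P1"): 0.2, ("H3", "P1"): 0.5}
network_edges = predicted_interactions(scores)
```

The pairs returned by `predicted_interactions` correspond to the edges that would be exported for drawing the interaction network.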
Based on the existing biological data, the invention can effectively, quickly and simply extract the plant-pathogenic bacteria protein related interaction characteristic information by means of the determined protein space structure data and the information of homology, structural domain interaction and the like, thereby obtaining the plant-pathogenic bacteria protein interaction data and providing reference for the research of plant disease-resistant molecular mechanisms.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. A method for predicting the protein interaction between a plant and a pathogen, comprising the steps of:
1) collecting host-pathogen protein interaction positive data;
2) collecting the spatial structure of the protein complex template, and splitting the protein complex into different subunits to obtain an interaction interface of a subunit pair;
3) taking the spatial structure of the protein complex template in the step 2) as a template, and carrying out homologous structure modeling on a host-pathogenic bacteria protein sequence to obtain a protein homologous spatial structure model;
4) comparing the protein homologous spatial structure with the protein complex template spatial structure to obtain structural characteristics;
5) collecting protein interaction data of the model organism, acquiring a positive interaction data set of the model organism, and extracting non-structural features;
6) building a machine learning model based on the structural features and the non-structural features, testing and adjusting the machine learning model, and predicting genome-scale rice-rice blast fungus protein interactions.
2. The method according to claim 1, wherein in step 1), positive host-pathogen protein interaction data obtained by at least one experimental protein interaction detection method, such as yeast two-hybrid, are collected from the HPIDB database.
3. The method for predicting plant-pathogen protein interaction according to claim 1, wherein the step 2) comprises:
acquiring experimentally measured protein three-dimensional structure data by using a PDB protein structure database, wherein the protein three-dimensional structure data is measured by at least one experimental method of nuclear magnetic resonance, X-ray crystal diffraction or an electron microscope; after the three-dimensional structure data of the protein is obtained, the protein complex is split into different subunits, the structural data of the subunit pairs is read by PIBASE software, and interaction interface information is extracted.
4. The method for predicting the protein interaction between plants and pathogenic bacteria according to claim 3, wherein in the step 3), the three-dimensional structure data of the protein experimentally measured in the step 2) is used as a template, and MODPIPE is used for carrying out homologous structure modeling on the protein sequence of the host-pathogenic bacteria to obtain a protein homologous spatial structure model.
5. The method according to claim 1, wherein the structural characteristics are obtained by comparing the spatial structure of homology of protein with the spatial structure of the template of protein complex in step 4) using TM-align software.
6. The method of claim 5, wherein the structural features include similarity of protein homology spatial structure to protein complex, structural deviation, and the number of conserved residues and the ratio of conserved residues at the interaction interface between protein homology spatial structure and protein complex template spatial structure.
7. The method of claim 1, wherein in step 5), the cross-species conservation of plant-pathogen protein interactions is analyzed using homology mapping to obtain a protein homology mapping relationship, and a domain interaction dataset is combined to obtain the interacting protein pairs supported by interacting domains, i.e., the domain interaction relationship.
8. The method for predicting plant-pathogen protein interaction according to claim 1, wherein the step 6) comprises:
carrying out sequence clustering and random combination on the positive host-pathogen protein interaction dataset obtained in step 1) to generate a certain number of negative samples, dividing the positive and negative datasets into a training set and a test set in a certain proportion, building a machine learning model with the scikit-learn random forest from the structural features and the non-structural features of the training set, adjusting and optimizing the parameters of the machine learning model with a grid search function, predicting genome-scale rice-rice blast fungus protein pair interactions, and drawing a rice-rice blast fungus protein interaction network with Cytoscape software.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011020892.3A CN112185459A (en) | 2020-09-25 | 2020-09-25 | Prediction method for interaction of plant and pathogenic bacteria protein |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011020892.3A CN112185459A (en) | 2020-09-25 | 2020-09-25 | Prediction method for interaction of plant and pathogenic bacteria protein |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112185459A (en) | 2021-01-05 |
Family
ID=73944510
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011020892.3A Pending CN112185459A (en) | 2020-09-25 | 2020-09-25 | Prediction method for interaction of plant and pathogenic bacteria protein |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112185459A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104911261A (en) * | 2015-05-06 | 2015-09-16 | South China Agricultural University | Method for researching oryza sativa and pathogen interaction mode |
CN105354441A (en) * | 2015-10-23 | 2016-02-24 | Shanghai Jiao Tong University | Vegetable protein interaction network construction method |
CN110136773A (en) * | 2019-04-02 | 2019-08-16 | Shanghai Jiao Tong University | A kind of phytoprotein interaction network construction method based on deep learning |
Non-Patent Citations (1)
Title |
---|
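Cai Haoyang et al.: "Prediction and analysis of the rice protein interaction network", Journal of Sichuan University (Natural Science Edition) * |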
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210105 |