CN105678109A - Method for protein functional annotation based on adjacent proteins - Google Patents
- Publication number
- CN105678109A (application number CN201610012805.7A)
- Authority
- CN
- China
- Prior art keywords
- protein
- annotation
- adjacent
- unknown function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The invention discloses a method for protein functional annotation based on adjacent proteins. The method comprises three steps: determining the proteins of unknown function, tallying the GO annotations of each unknown protein's adjacent nodes, and adding GO annotations to the unknown proteins. The algorithm is implemented in Perl and iterates until the number of added annotations no longer changes. The invention further discloses the application of the method to protein function prediction. The method addresses the problem that proteins lacking functional annotation hinder research into the physiological mechanisms of growth, development, and immunity in Litopenaeus vannamei. It lays a foundation for later, more refined protein function research and helps with dividing a protein network into sub-networks.
Description
Technical field
The invention belongs to the field of bioinformatics and relates to a protein functional annotation method based on adjacent proteins.
Background art
With the steady improvement of gene sequencing technology, the number of newly discovered protein sequences keeps growing. Although protein functional annotation has been studied extensively, a large number of proteins still have no predicted function. Because the analysis of many biological mechanisms must build on protein function analysis, these unknown proteins are an obstacle to such research, and in recent years a growing body of work has addressed protein function prediction. For a newly sequenced protein, the function can be predicted by a BLASTP comparison against a database of known protein sequences (such as UniProt), or its functional annotation can be looked up in the Gene Ontology database by protein identifier or name. Proteins for which database comparison yields no functional annotation must be analyzed by other methods. Research on functional annotation using protein-protein interaction networks arose in this context. At present, unknown proteins in an interaction network are mostly annotated by clustering. For example, one method divides the network into functional modules by combining a classification tree with a modularity index and assigns the functional annotations of all known proteins in a module to the unknown proteins of that module (reference: Lecture Notes in Electrical Engineering, Volume 322, 2015, pp. 831-837). The annotations produced by such methods are comprehensive but insufficiently accurate.
The invention proposes a method that annotates unknown proteins based on their adjacent proteins. During annotation, the method considers the function of each known neighbor of the unknown protein and determines the unknown protein's function from them. This design better fits the principle that interacting proteins in an organism tend to have similar functions, and it can yield high-quality functional annotations.
The method requires a protein-protein interaction network, GO term information, and Perl. Based on the central idea of "consistency with neighbors", it defines how GO annotations are added to proteins of unknown function in a protein network, laying a foundation for further prediction of protein function and for studying the biological processes in which sub-networks participate.
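As an illustration of the inputs the method works on, the following minimal Python sketch builds an undirected interaction network from a list of protein pairs and separates proteins with and without GO annotations. The patent states that the algorithm was implemented in Perl but gives no code, so the function names `build_network` and `split_known_unknown`, the protein identifiers, and the GO labels are all hypothetical.

```python
def build_network(edges):
    """Return a dict mapping each protein to the set of its interaction partners."""
    neighbors = {}
    for u, v in edges:
        # Interactions are treated as undirected: each pair is stored both ways.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def split_known_unknown(neighbors, go):
    """Split network proteins into those with and without any GO annotation."""
    known = {p for p in neighbors if go.get(p)}
    unknown = set(neighbors) - known
    return known, unknown


# Hypothetical toy input: three proteins, two interactions, one annotated protein.
edges = [("P1", "P2"), ("P2", "P3")]
go = {"P1": {"GO:A"}, "P3": set()}

neighbors = build_network(edges)
known, unknown = split_known_unknown(neighbors, go)
print(sorted(known), sorted(unknown))
```

A protein that is missing from the annotation table and a protein listed with an empty annotation set are both treated as unknown here, which is an assumption about how a lookup failure would be represented.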
Summary of the invention
A protein functional annotation method based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, tallying the annotation information of the adjacent nodes of each unknown protein, and adding GO annotations to the unknown proteins. The algorithm is implemented in Perl. The specific steps are as follows:
(1) Determine the proteins of unknown function: in a protein-protein interaction network, the interacting proteins are called nodes and the interaction between two proteins is called an edge. A known protein is one for which a corresponding GO functional annotation can be found in the Gene Ontology database by its protein identifier; conversely, a protein for which no GO functional annotation can be found is a protein of unknown function.
(2) Tally the annotation information of the adjacent nodes of the unknown proteins: once the unknown proteins are determined, the GO annotations of their adjacent proteins are tallied separately for each. For one unknown protein, first find all of its adjacent proteins and determine which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates and express that count as a proportion of all adjacent proteins that have GO annotations. This proportion is denoted p. That is,

$$p = \frac{n_{\mathrm{GO}}}{N}$$

where $n_{\mathrm{GO}}$ is the number of adjacent proteins annotated with the given GO annotation and $N$ is the total number of adjacent proteins that have GO annotations.
(3) Add GO annotations to the unknown proteins, as follows:
1) If A is a protein of unknown function and A has only one adjacent protein B, all functional annotations of B are assigned to A;
2) If A has more than one adjacent node, a suitable threshold must be chosen and the p value of each GO annotation carried by the adjacent proteins calculated; when the p value of a GO annotation is greater than or equal to the threshold, that GO annotation is assigned to A;
(4) Repeat steps (1)-(3) until the number of added annotations no longer changes.
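Steps (1)-(4) can be sketched in Python as follows. This is not the patent's Perl implementation, which is not disclosed; it is one reading of the steps under stated assumptions: updates within a round use the annotations as they stood at the start of the round, annotations only accumulate, and only neighbors that currently have GO annotations count toward p. The function name `annotate_by_neighbors`, the protein identifiers, and the GO labels are hypothetical.

```python
from collections import Counter


def annotate_by_neighbors(edges, go, threshold):
    """Iteratively transfer GO terms from annotated neighbors to unknown proteins."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Step (1): proteins without any GO term are the unknown proteins.
    annotations = {p: set(go.get(p, ())) for p in neighbors}
    unknown = [p for p in neighbors if not annotations[p]]

    changed = True
    while changed:  # Step (4): repeat until no annotation is added.
        changed = False
        # Assumption: every protein in a round sees the start-of-round state.
        snapshot = {p: frozenset(t) for p, t in annotations.items()}
        for protein in unknown:
            # Step (2): tally terms over neighbors that have annotations.
            annotated = [n for n in neighbors[protein] if snapshot[n]]
            counts = Counter(t for n in annotated for t in snapshot[n])
            # Step (3): keep terms whose proportion p reaches the threshold.
            # With a single annotated neighbor every term has p = 1, so the
            # "copy everything" rule is covered whenever threshold <= 1.
            passing = {t for t, c in counts.items()
                       if c / len(annotated) >= threshold}
            if not passing <= annotations[protein]:
                annotations[protein] |= passing
                changed = True
    return {p: annotations[p] for p in unknown}


# Hypothetical toy network: U1 and U2 are unknown, K1 and K2 are known.
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "U2")]
go = {"K1": {"GO:A", "GO:B"}, "K2": {"GO:A"}}
result = annotate_by_neighbors(edges, go, threshold=0.75)
print(result)
```

In this toy run U1 receives `GO:A` in the first round (p = 1, while `GO:B` has p = 0.5), and U2, whose only neighbor is U1, receives `GO:A` in the second round. Because annotation sets only grow and the set of terms is finite, the loop terminates.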
Take Fig. 1 as an example, in which A, B, and C are unknown proteins and D, E, F, G, and H are known proteins. In the first round, unknown protein A has 2 adjacent proteins with functional annotations (D and E). The functional annotations of D are 2, 3, 4 and those of E are 1, 2, 6, 7. If the threshold is set to 0.75, then among all functional annotations of A's adjacent proteins (1, 2, 3, 4, 6, 7) only annotation 2 has p = 1 and satisfies p ≥ 0.75 (each of the others has p = 0.5), so A is assigned functional annotation 2. Likewise, among the functional annotations of the adjacent proteins of unknown protein C, only function 9 has a p value satisfying p ≥ 0.75, so function 9 is assigned to C. Protein B has only one adjacent node with functional annotations, F, so all of F's functional annotations (4, 5, 6) are assigned to B. In summary, after the first round A has functional annotation 2, B has functional annotations 4, 5, 6, and C has functional annotation 9. In the second round, A has three adjacent proteins with functional annotations (B, D, and E); by the same principle, A gains functional annotations 4 and 6 in this round and thus has functional annotations 2, 4, and 6. Likewise, C gains functional annotation 6 and has functional annotations 6 and 9. The rounds continue in this way until no unknown protein gains a new functional annotation.
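The first-round values for proteins A and B can be checked with a short Python sketch. Only the annotation sets stated in the text are used (D: 2, 3, 4; E: 1, 2, 6, 7; F: 4, 5, 6); the helper name `term_fractions` is hypothetical, and exact fractions are used so that the comparison with 0.75 involves no floating-point rounding.

```python
from collections import Counter
from fractions import Fraction


def term_fractions(neighbor_terms):
    """Map each annotation to the share of annotated neighbors that carry it."""
    annotated = [terms for terms in neighbor_terms.values() if terms]
    counts = Counter(t for terms in annotated for t in terms)
    return {t: Fraction(c, len(annotated)) for t, c in counts.items()}


threshold = Fraction(3, 4)  # the 0.75 threshold of the Fig. 1 example

# Protein A, first round: annotated neighbors D and E.
p_a = term_fractions({"D": {2, 3, 4}, "E": {1, 2, 6, 7}})
assigned_a = {t for t, p in p_a.items() if p >= threshold}

# Protein B, first round: the single annotated neighbor F.
p_b = term_fractions({"F": {4, 5, 6}})
assigned_b = {t for t, p in p_b.items() if p >= threshold}

print(sorted(assigned_a), sorted(assigned_b))
```

Annotation 2 is the only one shared by D and E, so it alone reaches p = 1; every annotation of F has p = 1 because F is the only annotated neighbor of B.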
The beneficial effects of the protein functional annotation method based on adjacent proteins disclosed by the invention are as follows:
The method was used to annotate the unknown proteins in the Litopenaeus vannamei protein network. The specific steps are:
(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein-protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first to identify the proteins without any GO annotation; these are the proteins of unknown function. The network contains 881 proteins of unknown function, accounting for 23% of all proteins.
(2) Tally the annotation information of the adjacent nodes of the unknown proteins: once the unknown proteins are determined, the GO annotations of their adjacent proteins are tallied separately for each. For one unknown protein, first find all of its adjacent proteins and determine which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates and express that count as a proportion of all adjacent proteins that have GO annotations. This proportion is denoted p; that is, p is the number of adjacent proteins carrying the annotation divided by the number of adjacent proteins that have GO annotations.
(3) Add function information to the unknown proteins. Two cases arise: 1) if an unknown protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown protein; 2) if an unknown protein has several adjacent nodes of known function, the p value of each GO annotation carried by the adjacent proteins is calculated, and when p ≥ 0.25 that GO annotation is assigned to the unknown protein.
(4) Repeat steps (1)-(3) until the network produces no newly annotated proteins. In the end, 625 unknown proteins received GO functional annotations, accounting for 70.9% of the unknown proteins (see Fig. 2). This result addresses the problem that unannotated proteins hinder research into physiological mechanisms such as the growth, development, and immunity of Litopenaeus vannamei. It lays a foundation for later, more refined protein function research and also helps with dividing the protein network into sub-networks.
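The two percentages quoted above follow directly from the reported counts, as this small Python check shows. The variable names are illustrative.

```python
total_proteins = 3866     # proteins in the interaction network
unknown_proteins = 881    # proteins without any GO annotation
annotated_unknown = 625   # unknown proteins that received annotations

unknown_share = unknown_proteins / total_proteins
annotated_share = annotated_unknown / unknown_proteins
still_unknown = unknown_proteins - annotated_unknown

print(f"unknown share of network: {unknown_share:.1%}")      # 22.8%, reported as 23%
print(f"unknown proteins annotated: {annotated_share:.1%}")  # 70.9%
print(f"unknown proteins left unannotated: {still_unknown}") # 256
```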
Description of the drawings
Fig. 1 illustrates the method of adding functions to unknown proteins. Note: A, B, and C represent proteins of unknown function; D, E, F, G, and H represent proteins of known function; each number from 1 to 10 represents a different functional annotation identifier;
Fig. 2 shows the percentage of unknown proteins in the Litopenaeus vannamei protein-protein interaction network that received annotations;
Fig. 3 is a flow chart of the protein functional annotation method based on adjacent proteins.
Detailed description of the invention
The invention is described below through a specific embodiment. Unless stated otherwise, the technical means used in the invention are methods known to those skilled in the art. The embodiment is to be understood as illustrative and not as limiting the scope of the invention, whose spirit and scope are limited only by the claims. For those skilled in the art, various changes or modifications to the material components and amounts in the embodiment, made without departing from the spirit and scope of the invention, also fall within the scope of protection of the invention.
Embodiment 1
A protein-protein interaction network containing proteins of unknown function is required, and the practitioner needs to be able to program in Perl.
(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein-protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first to identify the proteins without any GO annotation; these are the proteins of unknown function. The network contains 881 proteins of unknown function.
(2) Tally the annotation information of the adjacent nodes of the unknown proteins: once the unknown proteins are determined, the GO annotations of their adjacent proteins are tallied separately for each. For one unknown protein, first find all of its adjacent proteins and determine which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates and express that count as a proportion of all adjacent proteins that have GO annotations. This proportion is denoted p; that is, p is the number of adjacent proteins carrying the annotation divided by the number of adjacent proteins that have GO annotations.
(3) Add function information to the unknown proteins. Two cases arise: 1) if an unknown protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown protein; 2) if an unknown protein has several adjacent nodes of known function, the p value of each GO annotation carried by the adjacent proteins is calculated, and when p ≥ 0.25 that GO annotation is assigned to the unknown protein.
(4) Repeat steps (1)-(3) until the network produces no newly annotated proteins. In the end, 625 unknown proteins received GO functional annotations, accounting for 70.9% of the unknown proteins (see Fig. 2). This result addresses the problem that unannotated proteins hinder research into physiological mechanisms such as the growth, development, and immunity of Litopenaeus vannamei. It lays a foundation for later, more refined protein function research and also helps with dividing the protein network into sub-networks.
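The choice of threshold determines how many neighbor annotations are transferred. The Python sketch below compares the embodiment's threshold of 0.25 with stricter values on a made-up set of four annotated neighbors. The helper `terms_passing` and the labels `GO:A`, `GO:B`, and `GO:C` are placeholders, not real GO terms or data from the patent.

```python
from collections import Counter


def terms_passing(neighbor_terms, threshold):
    """Return the annotations whose share among annotated neighbors >= threshold."""
    annotated = [terms for terms in neighbor_terms if terms]
    counts = Counter(t for terms in annotated for t in terms)
    return {t for t, c in counts.items() if c / len(annotated) >= threshold}


# Hypothetical neighborhood: GO:A on 3 of 4 neighbors, GO:B on 2, GO:C on 1.
neighborhood = [{"GO:A", "GO:B"}, {"GO:A"}, {"GO:A", "GO:C"}, {"GO:B"}]

for threshold in (0.25, 0.5, 0.75):
    print(threshold, sorted(terms_passing(neighborhood, threshold)))
```

At 0.25 an annotation carried by a single one of the four neighbors is already transferred, whereas at 0.75 only the annotation shared by three neighbors survives. A lower threshold therefore annotates more proteins with more terms, at the cost of accepting annotations with weaker support.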
Claims (2)
1. A protein functional annotation method based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, tallying the annotation information of the adjacent nodes of each unknown protein, and adding GO annotations to the unknown proteins, the algorithm being implemented in Perl, with the following specific steps:
(1) Determine the proteins of unknown function: in a protein-protein interaction network, the interacting proteins are called nodes and the interaction between two proteins is called an edge; a known protein is one for which a corresponding GO functional annotation can be found in the Gene Ontology database by its protein identifier, and conversely a protein for which no GO functional annotation can be found is a protein of unknown function;
(2) Tally the annotation information of the adjacent nodes of the unknown proteins: among the adjacent nodes of an unknown protein, find the proteins that have GO annotations and, for every GO annotation, count the number of adjacent nodes it annotates;
(3) Add GO annotations to the unknown proteins, as follows:
1) If A is a protein of unknown function and A has only one adjacent protein B, all functional annotations of B are assigned to A;
2) If A has more than one adjacent node, a suitable threshold must be chosen; when the ratio of the number of adjacent nodes annotated with a given GO annotation to the number of adjacent nodes that have GO annotations is greater than or equal to the threshold, that GO annotation is assigned to A;
3) Repeat steps (1) and (2) until the number of added annotations no longer changes.
2. Use of the protein functional annotation method based on adjacent proteins according to claim 1 for predicting protein function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610012805.7A CN105678109A (en) | 2016-01-11 | 2016-01-11 | Method for protein functional annotation based on adjacent proteins |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610012805.7A CN105678109A (en) | 2016-01-11 | 2016-01-11 | Method for protein functional annotation based on adjacent proteins |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105678109A (en) | 2016-06-15 |
Family
ID=56299722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610012805.7A Pending CN105678109A (en) | 2016-01-11 | 2016-01-11 | Method for protein functional annotation based on adjacent proteins |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105678109A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103025877A (en) * | 2010-07-26 | 2013-04-03 | 基因组股份公司 | Microorganisms and methods for the biosynthesis of aromatics, 2,4-pentadienoate and 1,3-butadiene |
CN103065066A (en) * | 2013-01-22 | 2013-04-24 | 四川大学 | Drug combination network based drug combined action predicting method |
CN104781458A (en) * | 2012-10-01 | 2015-07-15 | 独立行政法人科学技术振兴机构 | Approval prediction device, approval prediction method, and program |
US9173961B2 (en) * | 2010-02-10 | 2015-11-03 | Immunogen, Inc. | CD20 antibodies and uses thereof |
Non-Patent Citations (1)
Title |
---|
TONG HAO et al.: "The protein-protein interaction network of eyestalk, Y-organ and hepatopancreas in Chinese mitten crab Eriocheir sinensis", BMC Systems Biology * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20160615 |