CN105678109A - Method for protein functional annotation based on adjacent proteins - Google Patents

Info

Publication number
CN105678109A
Authority
CN
China
Prior art keywords
albumen
protein
annotation
adjacent
unknown function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610012805.7A
Other languages
Chinese (zh)
Inventor
郝彤
彭玮
孙金生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Tianjin Normal University
Original Assignee
Tianjin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-01-11
Filing date
2016-01-11
Publication date
2016-06-15
Application filed by Tianjin Normal University
Priority to CN201610012805.7A
Publication of CN105678109A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a method for protein functional annotation based on adjacent proteins. The method comprises three steps: determining the proteins of unknown function, collecting the annotation information of the nodes adjacent to each unknown protein, and adding GO annotations to the unknown proteins. The algorithm is implemented in Perl and is repeated until the number of added annotations no longer changes. The invention further discloses the application of the method to protein function prediction. The method addresses the problem that unannotated proteins hinder research into the physiological mechanisms of growth, development, and immunity in Litopenaeus vannamei. It lays a basis for later, more refined protein function research and helps in dividing a protein network into sub-networks.

Description

A method for protein functional annotation based on adjacent proteins
Technical field
The invention belongs to the technical field of bioinformatics and relates to a method for protein functional annotation based on adjacent proteins.
Background art
With the progress of gene sequencing technology, the number of newly discovered protein sequences keeps growing. Although a great deal of research has been done on protein functional annotation, a large number of proteins of unknown function remain. Because the analysis of many biological mechanisms depends on protein function analysis, these unknown proteins obstruct such research, and in recent years more and more studies have addressed protein function prediction. For a newly sequenced protein, the function can be predicted by a BLASTP comparison against a database of known protein sequences (such as UniProt), or its functional annotation can be queried in the Gene Ontology database by protein number or name. Proteins for which database comparison finds no functional annotation must be analyzed by other methods. Functional annotation based on protein-protein interaction networks developed against this background. At present, unknown proteins in an interaction network are mostly annotated by clustering methods. For example, the network is divided into functional modules by combining a classification tree with a modularity index, and the functional annotations of all known proteins in a module are assigned to the unknown proteins in that module (reference: Lecture Notes in Electrical Engineering, Volume 322, 2015, pp. 831-837). The annotations produced by such methods are comprehensive, but their accuracy is insufficient.
This document proposes a method that annotates unknown proteins based on their adjacent proteins. During annotation, the method considers the function of each known neighbor protein around an unknown protein and from these determines the function of the unknown protein. This design better matches the principle that interacting proteins in an organism tend to have similar functions, so it can yield high-quality functional annotations.
Using the method requires a protein-protein interaction network, GO term information, and Perl software. The method applies the central idea of "consistency with neighbors" to define how GO annotations are added to proteins of unknown function in a protein network, laying a foundation for further prediction of protein function and for studying the biological processes in which sub-networks participate.
Summary of the invention
A method for protein functional annotation based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, collecting the annotation information of the nodes adjacent to each unknown protein, and adding GO annotations to the unknown proteins. The algorithm is implemented in Perl. The specific steps are as follows:
(1) Determine the proteins of unknown function: in a protein-protein interaction network, the interacting proteins are called nodes and the interaction between two proteins is called an edge. A known protein is one for which a corresponding GO functional annotation can be found in the Gene Ontology database by its protein number; conversely, a protein for which no GO functional annotation can be found is a protein of unknown function.
(2) Collect the annotation information of the nodes adjacent to each unknown protein: once the proteins of unknown function have been determined, the GO annotation information of their adjacent proteins is collected for each of them. For one unknown protein, first find all of its adjacent proteins and record which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates. This count, taken as a proportion of the total number of adjacent proteins that have GO annotations, is denoted p. That is, p = (number of adjacent proteins annotated with the GO term) / (total number of adjacent proteins that have GO annotations).
(3) Add GO annotations to the proteins of unknown function. The specific steps are as follows:
1) If A is a protein of unknown function and A has only one adjacent protein B, all functional annotations of B are assigned to A;
2) If A has more than one adjacent node, a suitable threshold must be set and p calculated for each GO annotation carried by the adjacent proteins; when p for a GO annotation is greater than or equal to the threshold, that GO annotation is assigned to A;
(4) Repeat steps (1)-(3) until the number of added annotations no longer changes.
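Steps (1)-(4) can be sketched in code. The patent specifies Perl but gives no listing, so the following is a minimal Python illustration under stated assumptions rather than the inventors' implementation. The function name `annotate_unknown_proteins`, the input shapes (an edge list and a dict of GO term sets), the toy protein names, and the choice to apply each round's additions together are all assumptions made for this sketch. Integers stand in for GO terms.

```python
from collections import Counter

def annotate_unknown_proteins(edges, go_annotations, threshold):
    """Iteratively assign GO terms to proteins that start without annotations."""
    # Build an undirected adjacency map from the interaction edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Copy the annotations so the caller's data is left untouched.
    annotations = {p: set(go_annotations.get(p, ())) for p in neighbors}

    # Step (1): proteins with no GO term at the start are the unknown proteins.
    unknown = [p for p in neighbors if not annotations[p]]

    while True:
        added = {}
        for protein in unknown:
            # Step (2): consider only neighbors that currently carry GO terms.
            annotated = [n for n in neighbors[protein] if annotations[n]]
            if not annotated:
                continue
            if len(annotated) == 1:
                # Step (3), case 1: inherit every term of the single neighbor.
                candidates = set(annotations[annotated[0]])
            else:
                # Step (3), case 2: keep terms whose proportion p >= threshold.
                counts = Counter(t for n in annotated for t in annotations[n])
                candidates = {t for t, c in counts.items()
                              if c / len(annotated) >= threshold}
            new_terms = candidates - annotations[protein]
            if new_terms:
                added[protein] = new_terms
        # Step (4): stop once a full round adds nothing.
        if not added:
            return annotations
        # Assumption: a round's additions are applied together, after the round.
        for protein, terms in added.items():
            annotations[protein] |= terms

# Hypothetical toy network: U1 and U2 start unannotated, K1-K3 are known.
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "U2"), ("U2", "K3")]
known = {"K1": {2, 3, 4}, "K2": {1, 2, 6, 7}, "K3": {4, 5, 6}}
result = annotate_unknown_proteins(edges, known, threshold=0.6)
print(sorted(result["U1"]), sorted(result["U2"]))  # [2, 4, 6] [4, 5, 6]
```

Because annotations are only ever added and the set of GO terms is finite, the loop terminates. Updating each protein immediately instead of at the end of a round would be an equally plausible reading of the text and may give different intermediate rounds.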
Take Fig. 1 as an example, in which A, B, and C are unknown proteins and D, E, F, G, and H are known proteins. In the first round, unknown protein A has two adjacent proteins with functional annotations (D and E). The functional annotations of D are 2, 3, and 4, and those of E are 1, 2, 6, and 7. If the threshold is set to 0.75, then among all functional annotations of A's adjacent proteins (1, 2, 3, 4, 6, 7) only annotation 2 has p = 1 and satisfies p >= 0.75, so A is assigned functional annotation 2. Likewise, among the functional annotations of the proteins adjacent to unknown protein C, only function 9 satisfies p >= 0.75, so function 9 is assigned to C. Protein B has only one adjacent node with functional annotations, F, so all of F's functional annotations (4, 5, 6) are assigned to B. In summary, after the first round A has functional annotation 2, B has 4, 5, and 6, and C has 9. In the second round, A has three annotated adjacent proteins, B, D, and E; by the same principle, A gains functional annotations 4 and 6 in this round and thus has functional annotations 2, 4, and 6. Likewise, C gains functional annotation 6 and has functional annotations 6 and 9. The rounds continue in this way until no unknown protein gains a new functional annotation.
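The first-round calculation for protein A can be checked with a few lines of Python. This is an illustrative sketch only: the helper name `neighbor_fractions` is an assumption, and the input is limited to the annotations of D and E stated in the text, since the full topology of Fig. 1 is not given in words.

```python
from collections import Counter

def neighbor_fractions(neighbor_terms):
    """Return {term: proportion of annotated neighbors that carry the term}."""
    # Neighbors without any annotation do not count toward the denominator.
    annotated = [set(terms) for terms in neighbor_terms if terms]
    counts = Counter(t for terms in annotated for t in terms)
    return {t: c / len(annotated) for t, c in counts.items()}

# Round one for protein A: neighbor D carries 2, 3, 4 and neighbor E carries 1, 2, 6, 7.
p_values = neighbor_fractions([{2, 3, 4}, {1, 2, 6, 7}])
kept = {t for t, p in p_values.items() if p >= 0.75}
print(sorted(p_values.items()))
print(kept)  # {2}
```

Annotation 2 is carried by both neighbors, so p = 2/2 = 1; every other annotation is carried by one neighbor, so p = 1/2 = 0.5, which is below the threshold of 0.75.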
The beneficial effects of the disclosed method for protein functional annotation based on adjacent proteins are as follows:
The method was used to annotate the unknown proteins in the Litopenaeus vannamei protein network. The specific steps were:
(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein-protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first, and the proteins without any GO annotation are identified as proteins of unknown function. The network has 881 proteins of unknown function, accounting for 23% of the total number of proteins.
(2) Collect the annotation information of the nodes adjacent to each unknown protein: once the proteins of unknown function have been determined, the GO annotation information of their adjacent proteins is collected for each of them. For one unknown protein, first find all of its adjacent proteins and record which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates. This count, taken as a proportion of the total number of adjacent proteins that have GO annotations, is denoted p. That is, p = (number of adjacent proteins annotated with the GO term) / (total number of adjacent proteins that have GO annotations).
(3) Add function information to the unknown proteins. There are two cases: 1) if an unknown protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown protein; 2) if an unknown protein has multiple adjacent nodes of known function, p is calculated for each GO annotation carried by the adjacent proteins, and a GO annotation is assigned to the unknown protein when p >= 0.25.
(4) Repeat steps (1)-(3) until no newly annotated protein is produced in the network. In the end, 625 unknown proteins received GO functional annotations, accounting for 70.9% of the unknown proteins (see Fig. 2). This result addresses the problem that unannotated proteins hinder research into physiological mechanisms of Litopenaeus vannamei such as growth, development, and immunity. It lays a foundation for finer protein function research afterwards and also helps in dividing the protein network into sub-networks.
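Step (1) above, finding the proteins without any GO annotation, amounts to a set difference. The sketch below shows one way to do it in Python. The function name `find_unknown_proteins`, the protein identifiers, and the in-memory data shapes are hypothetical, since the patent does not describe its input files.

```python
def find_unknown_proteins(edges, go_annotations):
    """Return the network proteins that have no GO annotation."""
    # Every protein that appears in at least one interaction is a node.
    proteins = {p for edge in edges for p in edge}
    # A protein is "unknown" if it is missing from the map or maps to nothing.
    return {p for p in proteins if not go_annotations.get(p)}

# Hypothetical toy data; the identifiers are placeholders, not from the patent.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
go_annotations = {"P1": {"GO:0008150"}, "P3": {"GO:0003674"}, "P4": set()}
unknown = find_unknown_proteins(edges, go_annotations)
share = len(unknown) / len({p for edge in edges for p in edge})
print(sorted(unknown), share)  # ['P2', 'P4'] 0.5
```

Treating an empty annotation set the same as a missing entry is a design choice of this sketch; it keeps the definition aligned with "without any GO annotation" regardless of how the annotation table represents such proteins.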
Description of the drawings
Fig. 1 illustrates the method of adding functions to unknown proteins. Note: A, B, and C represent proteins of unknown function; D, E, F, G, and H represent proteins of known function; each number from 1 to 10 represents a different functional annotation number;
Fig. 2 shows the percentage of unknown-function proteins in the Litopenaeus vannamei protein-protein interaction network that received annotations;
Fig. 3 is a flow chart of the method for protein functional annotation based on adjacent proteins.
Detailed description of the embodiments
The invention is described below through a specific embodiment. Unless otherwise stated, the technical means used in the invention are methods known to those skilled in the art. In addition, the embodiment should be understood as illustrative and does not limit the scope of the invention; the spirit and scope of the invention are limited only by the claims. For those skilled in the art, various changes or modifications to the material components and amounts in the embodiment, made without departing from the spirit and scope of the invention, also fall within the scope of protection of the invention.
Embodiment 1
A protein-protein interaction network containing proteins of unknown function is required, and the implementer needs to be able to program in Perl.
(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein-protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first, and the proteins without any GO annotation are identified as proteins of unknown function. The network has 881 proteins of unknown function.
(2) Collect the annotation information of the nodes adjacent to each unknown protein: once the proteins of unknown function have been determined, the GO annotation information of their adjacent proteins is collected for each of them. For one unknown protein, first find all of its adjacent proteins and record which GO annotations they carry. Then, for each GO annotation carried by the adjacent proteins, count how many adjacent proteins it annotates. This count, taken as a proportion of the total number of adjacent proteins that have GO annotations, is denoted p. That is, p = (number of adjacent proteins annotated with the GO term) / (total number of adjacent proteins that have GO annotations).
(3) Add function information to the unknown proteins. There are two cases: 1) if an unknown protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown protein; 2) if an unknown protein has multiple adjacent nodes of known function, p is calculated for each GO annotation carried by the adjacent proteins, and a GO annotation is assigned to the unknown protein when p >= 0.25.
(4) Repeat steps (1)-(3) until no newly annotated protein is produced in the network. In the end, 625 unknown proteins received GO functional annotations, accounting for 70.9% of the unknown proteins (see Fig. 2). This result addresses the problem that unannotated proteins hinder research into physiological mechanisms of Litopenaeus vannamei such as growth, development, and immunity. It lays a foundation for finer protein function research afterwards and also helps in dividing the protein network into sub-networks.
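The proportions reported for the embodiment follow directly from the stated counts. The short Python check below is only an arithmetic illustration; the variable names are chosen for this sketch.

```python
total_proteins = 3866      # proteins in the interaction network
unknown_proteins = 881     # proteins without any GO annotation
newly_annotated = 625      # unknown proteins that received GO annotations

# Share of the network that starts without annotation.
unknown_share = unknown_proteins / total_proteins
# Share of the unknown proteins that the method annotated.
annotated_share = newly_annotated / unknown_proteins

print(f"{unknown_share:.1%}")    # 22.8%
print(f"{annotated_share:.1%}")  # 70.9%
```

The first value rounds to the 23% quoted for the share of unknown proteins, and the second matches the 70.9% quoted for the annotated share.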

Claims (2)

1. A method for protein functional annotation based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, collecting the annotation information of the nodes adjacent to each unknown protein, and adding GO annotations to the unknown proteins, the algorithm being implemented in Perl, with the following specific steps:
(1) Determine the proteins of unknown function: in a protein-protein interaction network, the interacting proteins are called nodes and the interaction between two proteins is called an edge; a known protein is one for which a corresponding GO functional annotation can be found in the Gene Ontology database by its protein number; conversely, a protein for which no GO functional annotation can be found is a protein of unknown function;
(2) Collect the annotation information of the nodes adjacent to each unknown protein: find the proteins with GO annotations among the nodes adjacent to the unknown protein and, over all of their GO annotations, count the number of adjacent nodes annotated by each GO annotation;
(3) Add GO annotations to the proteins of unknown function, with the following specific steps:
1) If A is a protein of unknown function and A has only one adjacent protein B, all functional annotations of B are assigned to A;
2) If A has more than one adjacent node, a suitable threshold must be set; when the number of adjacent nodes annotated by a given GO annotation, as a proportion of the number of adjacent nodes that have GO annotations, is greater than or equal to this threshold, that GO annotation is assigned to A;
3) Repeat steps (1) and (2) until the number of added annotations no longer changes.
2. Use of the method for protein functional annotation based on adjacent proteins according to claim 1 in predicting protein function.
CN201610012805.7A 2016-01-11 2016-01-11 Method for protein functional annotation based on adjacent proteins Pending CN105678109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610012805.7A CN105678109A (en) 2016-01-11 2016-01-11 Method for protein functional annotation based on adjacent proteins

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610012805.7A CN105678109A (en) 2016-01-11 2016-01-11 Method for protein functional annotation based on adjacent proteins

Publications (1)

Publication Number Publication Date
CN105678109A true CN105678109A (en) 2016-06-15

Family

ID=56299722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610012805.7A Pending CN105678109A (en) 2016-01-11 2016-01-11 Method for protein functional annotation based on adjacent proteins

Country Status (1)

Country Link
CN (1) CN105678109A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9173961B2 (en) * 2010-02-10 2015-11-03 Immunogen, Inc. CD20 antibodies and uses thereof
CN103025877A (en) * 2010-07-26 2013-04-03 基因组股份公司 Microorganisms and methods for the biosynthesis of aromatics, 2,4-pentadienoate and 1,3-butadiene
CN104781458A (en) * 2012-10-01 2015-07-15 独立行政法人科学技术振兴机构 Approval prediction device, approval prediction method, and program
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TONG HAO等: "The protein-protein interaction network of eyestalk,Y-organ and hepatopancreas in Chinese mitten crab Eriocheir sinensis", 《BMC SYSTEM BIOLOGY》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160615