CN105678109A

CN105678109A - Method for protein functional annotation based on adjacent proteins

Info

Publication number: CN105678109A
Application number: CN201610012805.7A
Authority: CN
Inventors: 郝彤; 彭玮; 孙金生
Original assignee: Tianjin Normal University
Current assignee: Tianjin University; Tianjin Normal University
Priority date: 2016-01-11
Filing date: 2016-01-11
Publication date: 2016-06-15

Abstract

The invention discloses a method for protein functional annotation based on adjacent proteins. The method comprises the steps of determining the proteins of unknown function, counting the annotation information of the nodes adjacent to each unknown-function protein, and adding GO annotations to the unknown-function proteins. The algorithm is implemented in Perl and is repeated until the number of added annotations no longer changes. The invention further discloses the application of the method to protein function prediction. The method addresses the problem that unannotated protein function hinders research on the physiological mechanisms of Litopenaeus vannamei growth, development and immunity. It lays a foundation for later, finer-grained protein function research and provides effective help for dividing a protein network into sub-networks.

Description

A protein function annotation method based on adjacent proteins

Technical field

The invention belongs to the technical field of bioinformatics and relates to a protein function annotation method based on adjacent proteins.

Background technology

With the steady improvement of gene sequencing technology, the number of newly discovered protein sequences keeps increasing. Although a great deal of research has been devoted to the functional annotation of proteins, a large number of proteins of unknown function still remain. Because the analysis of many mechanisms of life activity in organisms must be built on the analysis of protein function, the existence of unknown proteins hinders this research. In recent years, therefore, more and more research has addressed protein function prediction. The function of a newly sequenced protein can be predicted by a BLASTP comparison against a database of known protein sequences (such as UniProt), or its functional annotation can be retrieved from the Gene Ontology database using the protein's identifier or name. Proteins for which no functional annotation can be found by database comparison must be analyzed further by other methods. Research on functional annotation using protein interaction networks arose in this context. At present, the functions of unknown proteins in a protein interaction network are mostly annotated by clustering methods. For example, one method divides the network into functional modules by combining a classification tree with a modularity index, and then assigns the functional annotations of all known proteins in a module to the unknown proteins in that module (reference: Lecture Notes in Electrical Engineering, Volume 322, 2015, pp. 831-837). The annotations produced by such methods are comprehensive, but their accuracy is insufficient.

The present invention proposes a method for functionally annotating unknown proteins based on adjacent proteins. During annotation, the method considers the function of each known neighboring protein around an unknown protein and determines the function of the unknown protein from them. This design better matches the principle that interacting proteins in an organism tend to have similar functions, so high-quality protein function annotations can be obtained.

Use of the method requires a protein interaction network, GO term information and Perl software. The method uses the central idea of "consistency with neighbors" to define how GO annotations are added to proteins of unknown function in a protein network, and lays a foundation for further prediction of protein function and for studying the biological processes in which sub-networks participate.

Summary of the invention

A protein function annotation method based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, counting the annotation information of the nodes adjacent to each unknown-function protein, and adding GO annotations to the unknown-function proteins. The algorithm is implemented in Perl. The specific steps are as follows:

(1) Determine the proteins of unknown function: in a protein interaction network, the two proteins that interact are called nodes and the interaction between them is called an edge. A known protein in the protein interaction network is a protein for which corresponding GO functional annotations can be found in the Gene Ontology database by its protein identifier; conversely, a protein for which no GO functional annotation can be found is a protein of unknown function.

(2) Count the annotation information of the adjacent nodes of each unknown-function protein: after the unknown-function proteins are determined, the GO annotation information of their adjacent proteins is counted for each of them. For one unknown-function protein, first find all of its adjacent proteins and determine which GO annotations these adjacent proteins carry. Then, for each GO annotation carried by any adjacent protein, count how many adjacent proteins it annotates. The ratio of that number to the total number of adjacent proteins that have GO annotations is denoted p. That is, p = (number of adjacent proteins annotated with the given GO annotation) / (total number of adjacent proteins that have GO annotations).

(3) Add GO annotations to the unknown-function proteins. The specific steps are as follows:

1) If A is a protein of unknown function and A has only one adjacent protein B, then all functional annotations of B are assigned to protein A;

2) If A has more than one adjacent node, a suitable threshold (critical value) must be chosen and the p value of each GO annotation carried by the adjacent proteins is calculated. When the p value of a GO annotation is greater than or equal to the threshold, that GO annotation is assigned to A;

(4) Repeat steps (1)-(3) until the number of added annotations no longer changes.
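Steps (1)-(4) can be sketched in code as follows. This is a minimal illustration in Python, not the Perl implementation referred to in the patent, which is not reproduced in the text. The function name `annotate_network`, the input data structures and the toy network are hypothetical. Two points are assumptions drawn from the Fig. 1 walk-through rather than from the step definitions: only neighbors that currently carry GO annotations are counted, and every round reads the annotations as they stood at the start of that round.

```python
from collections import Counter


def annotate_network(edges, go, threshold):
    """Propagate GO terms to unannotated proteins until nothing changes.

    edges: iterable of (protein, protein) pairs (hypothetical input format)
    go: dict mapping protein -> set of GO terms; missing or empty = unknown
    threshold: critical value that p must reach in the multi-neighbor case
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    annotations = {p: set(go.get(p, ())) for p in neighbors}
    # Step (1): proteins without any GO annotation are the unknown ones.
    unknown = {p for p, terms in annotations.items() if not terms}

    changed = True
    while changed:  # Step (4): repeat until no annotation is added.
        changed = False
        # Assumption: each round uses the state at the start of the round.
        snapshot = {p: set(t) for p, t in annotations.items()}
        for protein in unknown:
            annotated = [n for n in neighbors[protein] if snapshot[n]]
            if not annotated:
                continue
            if len(annotated) == 1:
                # Step (3) case 1: copy every term of the single neighbor.
                new_terms = set(snapshot[annotated[0]])
            else:
                # Step (2): p = neighbors carrying the term / annotated neighbors.
                counts = Counter(t for n in annotated for t in snapshot[n])
                new_terms = {t for t, c in counts.items()
                             if c / len(annotated) >= threshold}
            if new_terms - annotations[protein]:
                annotations[protein] |= new_terms
                changed = True
    return annotations


# Hypothetical toy network: U1 and U2 are unknown, K1-K3 are known.
toy_edges = [("U1", "K1"), ("U1", "K2"), ("U2", "K3"), ("U1", "U2")]
toy_go = {"K1": {"t1", "t2"}, "K2": {"t1", "t3"}, "K3": {"t4"}}
result = annotate_network(toy_edges, toy_go, threshold=0.75)
print(sorted(result["U1"]), sorted(result["U2"]))
```

Because annotations are only ever added and the set of GO terms is finite, the loop terminates. Known proteins are never modified in this sketch; only the proteins identified as unknown in step (1) receive new terms.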

Take Fig. 1 as an example, in which A, B and C are unknown proteins and D, E, F, G and H are known proteins. In the first round, unknown protein A has 2 adjacent proteins with functional annotations (D and E). The functional annotations of protein D are 2, 3 and 4, and those of protein E are 1, 2, 6 and 7. If the threshold is set to 0.75, then among all functional annotations 1, 2, 3, 4, 6 and 7 of A's adjacent proteins, only annotation 2 has p = 1 and satisfies p >= 0.75 (each of the others is carried by only one of the two annotated neighbors, giving p = 0.5), so unknown protein A is assigned functional annotation 2. Similarly, among the functional annotations of the adjacent proteins of unknown protein C, only function 9 has a p value satisfying p >= 0.75, so function 9 is assigned to protein C. Protein B has only one adjacent node with functional annotations, F, so all of F's functional annotations 4, 5 and 6 are assigned to protein B. In summary, after the first round the functional annotation of protein A is 2, those of protein B are 4, 5 and 6, and that of protein C is 9. In the second round, protein A has three adjacent proteins with functional annotations, B, D and E; by the same principle, protein A newly gains functional annotations 4 and 6 in this round, so that it has functional annotations 2, 4 and 6. Likewise, protein C newly gains functional annotation 6 and has functional annotations 6 and 9. The rounds continue in this way until no unknown protein gains any new functional annotation.
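The first-round result for protein A can be checked numerically. The short Python sketch below uses only the annotation numbers stated for D and E and the threshold 0.75; the helper name `go_fractions` is hypothetical, and Python stands in for the Perl mentioned in the patent.

```python
from collections import Counter


def go_fractions(neighbor_terms):
    """Return {term: p} for a list of term sets, one set per annotated neighbor."""
    total = len(neighbor_terms)
    counts = Counter(t for terms in neighbor_terms for t in terms)
    return {t: c / total for t, c in counts.items()}


d_terms = {2, 3, 4}      # functional annotations of protein D
e_terms = {1, 2, 6, 7}   # functional annotations of protein E

p = go_fractions([d_terms, e_terms])
assigned_to_a = sorted(t for t, value in p.items() if value >= 0.75)

print(p[2], p[3])        # 1.0 0.5
print(assigned_to_a)     # [2]
```

Annotation 2 is the only one shared by both neighbors, so it is the only one whose p reaches the threshold.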

The beneficial effects of the protein function annotation method based on adjacent proteins disclosed by the invention are as follows:

The method was used to annotate the unknown proteins in the Litopenaeus vannamei protein network. The concrete steps were:

(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first, and the proteins without any GO annotation are identified as proteins of unknown function. The network contains 881 proteins of unknown function, accounting for 23% of the total number of proteins.

(2) Count the annotation information of the adjacent nodes of each unknown-function protein: after the unknown-function proteins are determined, the GO annotation information of their adjacent proteins is counted for each of them. For one unknown-function protein, first find all of its adjacent proteins and determine which GO annotations these adjacent proteins carry. Then, for each GO annotation carried by any adjacent protein, count how many adjacent proteins it annotates. The ratio of that number to the total number of adjacent proteins that have GO annotations is denoted p. That is, p = (number of adjacent proteins annotated with the given GO annotation) / (total number of adjacent proteins that have GO annotations).

(3) Add function information to the unknown proteins. There are two cases: 1) if an unknown-function protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown-function protein; 2) if an unknown-function protein has multiple adjacent nodes of known function, the p value of each GO annotation carried by the adjacent proteins is calculated, and when p >= 0.25 that GO annotation is assigned to the unknown-function protein.

(4) Repeat steps (1)-(3) until the network produces no newly annotated proteins. In the end, 625 unknown proteins obtained GO functional annotations, accounting for 70.9% of the unknown proteins (see Fig. 2). This result addresses the problem that unannotated protein function hinders research on the physiological mechanisms of Litopenaeus vannamei growth, development and immunity. It lays a foundation for later, finer-grained protein function research and also provides effective help for dividing the protein network into sub-networks.
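The reported percentages follow directly from the stated counts. The Python snippet below is only an arithmetic check on the figures given above (3866 proteins, 881 of unknown function, 625 newly annotated); the variable names are illustrative.

```python
total_proteins = 3866    # proteins in the network
unknown_before = 881     # proteins without any GO annotation
newly_annotated = 625    # unknown proteins that gained GO annotations

unknown_share = 100 * unknown_before / total_proteins
annotated_share = 100 * newly_annotated / unknown_before
still_unknown = unknown_before - newly_annotated

print(f"{unknown_share:.1f}% of proteins were unknown")          # 22.8%
print(f"{annotated_share:.1f}% of unknown proteins annotated")   # 70.9%
print(f"{still_unknown} proteins remain unannotated")            # 256
```

The first value rounds to the 23% quoted in step (1), and the second matches the 70.9% quoted in step (4).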

Brief description of the drawings

Fig. 1 shows the method of adding functions to unknown proteins. Note: A, B and C represent proteins of unknown function; D, E, F, G and H represent proteins of known function; each number from 1 to 10 represents a different functional annotation number;

Fig. 2 shows the percentage of unknown-function proteins in the Litopenaeus vannamei protein interaction network that gained annotations;

Fig. 3 is a flow chart of the protein function annotation method based on adjacent proteins.

Detailed description of the invention

The present invention is described below through specific embodiments. Unless otherwise stated, the technical means used in the invention are methods known to those skilled in the art. In addition, the embodiments are to be understood as illustrative and do not limit the scope of the invention; the spirit and scope of the invention are limited only by the claims. For those skilled in the art, various changes or modifications made to the material components and amounts in these embodiments without departing from the spirit and scope of the invention also fall within the protection scope of the invention.

Embodiment 1

A protein interaction network containing proteins of unknown function is required, and the implementer must be able to program in Perl.

(1) Determine the proteins of unknown function: the Litopenaeus vannamei protein interaction network contains 3866 proteins and 46475 protein interactions. The GO annotations of each protein are searched first, and the proteins without any GO annotation are identified as proteins of unknown function. The network contains 881 proteins of unknown function.

(2) Count the annotation information of the adjacent nodes of each unknown-function protein: after the unknown-function proteins are determined, the GO annotation information of their adjacent proteins is counted for each of them. For one unknown-function protein, first find all of its adjacent proteins and determine which GO annotations these adjacent proteins carry. Then, for each GO annotation carried by any adjacent protein, count how many adjacent proteins it annotates. The ratio of that number to the total number of adjacent proteins that have GO annotations is denoted p. That is, p = (number of adjacent proteins annotated with the given GO annotation) / (total number of adjacent proteins that have GO annotations).

(3) Add function information to the unknown proteins. There are two cases: 1) if an unknown-function protein has only one adjacent protein of known function, all GO annotations of that adjacent protein are assigned to the unknown-function protein; 2) if an unknown-function protein has multiple adjacent nodes of known function, the p value of each GO annotation carried by the adjacent proteins is calculated, and when p >= 0.25 that GO annotation is assigned to the unknown-function protein.
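Step (1) of the embodiment, identifying the proteins without any GO annotation, might be sketched as below. The patent does not describe its input files, so the whitespace-separated formats (one interaction per line, one protein and GO term per line), the protein and term labels, and all function names are assumptions made purely for illustration. Python is used in place of the Perl named in the embodiment.

```python
import io


def read_edges(handle):
    """Read 'proteinA proteinB' pairs, one interaction per line (assumed format)."""
    edges = []
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            edges.append((fields[0], fields[1]))
    return edges


def read_go(handle):
    """Read 'protein term' pairs, one annotation per line (assumed format)."""
    go = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            go.setdefault(fields[0], set()).add(fields[1])
    return go


def unknown_proteins(edges, go):
    """Return the network proteins that have no GO annotation at all."""
    nodes = {protein for edge in edges for protein in edge}
    return {protein for protein in nodes if not go.get(protein)}


# Placeholder data standing in for the real network and annotation files.
edge_text = "P1\tP2\nP2\tP3\nP3\tP4\n"
go_text = "P1\tGO:a\nP3\tGO:b\nP3\tGO:c\n"

edges = read_edges(io.StringIO(edge_text))
go = read_go(io.StringIO(go_text))
unknown = unknown_proteins(edges, go)
print(sorted(unknown))   # ['P2', 'P4']
```

The resulting set is the starting point for steps (2) and (3), which are then applied with the embodiment's threshold of 0.25.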

Claims

1. A protein function annotation method based on adjacent proteins, characterized in that it consists of determining the proteins of unknown function, counting the annotation information of the nodes adjacent to each unknown-function protein, and adding GO annotations to the unknown-function proteins, the algorithm being implemented in Perl, with the following specific steps:

(1) Determine the proteins of unknown function: in a protein interaction network, the two proteins that interact are called nodes and the interaction between them is called an edge; a known protein in the protein interaction network is a protein for which corresponding GO functional annotations can be found in the Gene Ontology database by its protein identifier, and conversely, a protein for which no GO functional annotation can be found is a protein of unknown function;

(2) Count the annotation information of the adjacent nodes of each unknown-function protein: among the nodes adjacent to an unknown-function protein, find the proteins that have GO annotations and, for every GO annotation, count the number of adjacent nodes it annotates;

Add GO annotations to the unknown-function proteins, with the following specific steps:

2) If A has more than one adjacent node, a suitable threshold (critical value) must be chosen; when the ratio of the number of adjacent nodes annotated with a given GO annotation to the number of adjacent nodes that have GO annotations is greater than or equal to this threshold, that GO annotation is assigned to A;

3) Repeat steps (1) and (2) until the number of added annotations no longer changes.

2. Use of the protein function annotation method based on adjacent proteins according to claim 1 in predicting protein function.