CN108171010A - Protein complex detection method and device based on a semi-supervised network embedding model - Google Patents

Protein complex detection method and device based on a semi-supervised network embedding model

Info

Publication number
CN108171010A
Authority
CN
China
Prior art keywords
internet
protein
matrix
vertex
protein interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711250342.9A
Other languages
Chinese (zh)
Other versions
CN108171010B (en)
Inventor
朱佳
黄昌勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong SUCHUANG Data Technology Co.,Ltd.
Original Assignee
Guangzhou Van Ping Electronic Technology Co Ltd
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Van Ping Electronic Technology Co Ltd, South China Normal University filed Critical Guangzhou Van Ping Electronic Technology Co Ltd
Priority to CN201711250342.9A priority Critical patent/CN108171010B/en
Publication of CN108171010A publication Critical patent/CN108171010A/en
Application granted granted Critical
Publication of CN108171010B publication Critical patent/CN108171010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a protein complex detection method and device based on a semi-supervised network embedding model. The method comprises obtaining the adjacency matrix of a protein interaction network, performing embedding on the adjacency matrix to obtain a reduced-dimension matrix, and processing the reduced-dimension matrix with a clustering algorithm to obtain a protein complex detection result. The device comprises a memory for storing at least one program and a processor for loading the at least one program to execute the protein complex detection method based on the semi-supervised network embedding model. By transforming the dimensions of the adjacency matrix corresponding to the protein interaction network before handing it to the clustering algorithm, the invention improves the clustering result. The method and device are widely applicable in the technical field of protein complex identification.

Description

Protein complex detection method and device based on a semi-supervised network embedding model
Technical field
The present invention relates to the technical field of protein complex identification, and in particular to a protein complex detection method and device based on a semi-supervised network embedding model.
Background
A protein complex is a complex graph structure formed by protein-protein interactions (PPI) and plays a vital role in biochemical processes and pharmaceutical technology. Correctly identifying the protein complexes in a PPI network is therefore extremely useful for the biomedical field. However, given the tremendous growth of PPI data and the bottlenecks of experimental methods, only a small number of protein complexes have been identified experimentally.
To overcome the technical limitations of experimental methods in protein complex detection, computational methods are used. A PPI network can be regarded as an undirected, unweighted graph in which proteins are vertices and their interactions are edges. Each protein complex consists of two or more proteins that appear as a densely connected subgraph, which means that graph clustering methods can be used to find them.
Recently, network embedding has been studied extensively and has been shown to further improve the performance of many graph clustering methods. Network embedding learns low-dimensional representations of the vertices in a network so as to capture and preserve the network structure. However, most existing network embedding methods depend heavily on the features of each vertex in the network, which makes them unsuitable for PPI networks: in a PPI network, no metadata other than the protein name is associated with each vertex. In other words, existing network embedding methods cannot fully capture the structure of a PPI network, because there is not enough data to compute its first-order proximity and second-order proximity.
Summary of the invention
To solve the above technical problem, a first object of the present invention is to provide a protein complex detection method based on a semi-supervised network embedding model, and a second object is to provide a protein complex detection device based on a semi-supervised network embedding model.
The first technical solution adopted by the present invention is:
A protein complex detection method based on a semi-supervised network embedding model, comprising the following steps:
Obtaining the adjacency matrix of a protein interaction network;
Performing embedding on the adjacency matrix to obtain a reduced-dimension matrix;
Processing the reduced-dimension matrix with a clustering algorithm to obtain a protein complex detection result.
Further, the step of performing embedding on the adjacency matrix to obtain a reduced-dimension matrix specifically comprises:
Calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network;
Calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network;
Saving the local structure information and the global structure information into the adjacency matrix, thereby obtaining the reduced-dimension matrix.
Further, the step of calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network, specifically comprises:
Selecting the preferred neighbor set of each vertex in the protein interaction network using a neighbor selection algorithm;
Assigning feature information to each vertex according to its preferred neighbor set, thereby building a feature information matrix;
Calculating, from the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network;
Taking the first-order proximity between every pair of vertices in the protein interaction network as the local structure information of the protein interaction network to be obtained.
Further, the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, specifically comprises:
Inputting the adjacency matrix and the feature information matrix into a graph convolutional network for processing, thereby outputting the second-order proximity between every pair of vertices in the protein interaction network;
Taking the second-order proximity between every pair of vertices in the protein interaction network as the global structure information of the protein interaction network to be obtained.
Further, the step of selecting the preferred neighbor set of each vertex in the protein interaction network using a neighbor selection algorithm specifically comprises:
Processing the protein interaction network with the Deepwalk algorithm to obtain the Deepwalk vector of each vertex;
Selecting one vertex of the protein interaction network as the target vertex;
Calculating, from the Deepwalk vectors of the target vertex and of all its neighbors, the Euclidean distance between the target vertex and each of its neighbors;
Calculating the arithmetic mean of the Euclidean distances between the target vertex and each of its neighbors;
Taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
Returning to the step of selecting one vertex of the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
Further, after the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, an optimization step is provided, the optimization step comprising:
Calculating a graph Laplacian regularization loss function from the first-order proximity and the second-order proximity between every pair of vertices in the protein interaction network;
Dynamically adjusting the order of the feature information matrix until the graph Laplacian regularization loss function is minimized;
Taking the first-order proximity and the second-order proximity that correspond to the minimum of the graph Laplacian regularization loss function as the local structure information and the global structure information, respectively, of the protein interaction network to be obtained.
Further, the graph Laplacian regularization loss function is calculated as follows:
$L = L_{first} + \lambda L_{second}$
where $L$ is the graph Laplacian regularization loss function, $L_{first}$ is the supervised loss of the first-order proximity, $L_{second}$ is the supervised loss of the second-order proximity, and $\lambda$ is the balance factor between $L_{first}$ and $L_{second}$.
Further, the supervised loss of the first-order proximity is calculated as follows:
where $v_i$ and $v_j$ are a pair of vertices connected by an edge in the protein interaction network, $y_i$ is the matrix built from the Deepwalk vector of $v_i$, and $y_j$ is the matrix built from the Deepwalk vector of $v_j$;
The supervised loss of the second-order proximity is calculated as follows:
where $L_0$ is the number of convolutional layers of the graph convolutional network, $H^{(0)}$ is of order N × D,
Further, after the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, an optimization step is provided, the optimization step comprising:
Dynamically adjusting α and β so that Z in the following system of equations equals 0 or is as close to 0 as possible:
where the four deviation variables are, in order, the negative deviation variable of the first goal, the positive deviation variable of the first goal, the negative deviation variable of the second goal, and the positive deviation variable of the second goal; X is the feature information matrix, D is the number of columns of X, P is the highest percentage of the singular values of X, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take.
The second technical solution adopted by the present invention is: a protein complex detection device based on a semi-supervised network embedding model, comprising:
A memory for storing at least one program;
A processor for loading the at least one program to execute the protein complex detection method based on a semi-supervised network embedding model described in the first technical solution.
The beneficial effects of the invention are as follows. The protein complex detection method and device of the present invention embed the protein interaction network and transform its dimensions, which improves the efficiency of existing clustering algorithms when clustering the protein interaction network and optimizes the clustering result, so that the protein complex detection result is more accurate. In addition, the invention assigns features to each vertex of the protein interaction network and can capture both the local structure and the global structure of the network. The invention therefore does not require the vertices of the protein interaction network to carry features of their own, and it overcomes the technical defect that clustering algorithms cannot be applied directly to a protein interaction network whose vertices have no features. The invention is stable, and all of its prediction evaluation indices are superior to those of other protein complex detection methods.
Brief description of the drawings
Fig. 1 is a flow chart of the protein complex detection method of the present invention;
Fig. 2 is a detailed flow chart of step S2;
Fig. 3 is a detailed flow chart of step S21;
Fig. 4 is a detailed flow chart of step S211;
Fig. 5 shows the comparison results on the Krogan data set;
Fig. 6 shows the comparison results on the Dip data set;
Fig. 7 shows the comparison results on the Biogrid data set;
Fig. 8 is a structural diagram of the protein complex detection device of the present invention.
Detailed description of the embodiments
Embodiment 1
The protein complex detection method based on a semi-supervised network embedding model disclosed by the present invention, as shown in Fig. 1, comprises the following steps:
S1. Obtaining the adjacency matrix of a protein interaction network;
S2. Performing embedding on the adjacency matrix to obtain a reduced-dimension matrix;
S3. Processing the reduced-dimension matrix with a clustering algorithm to obtain a protein complex detection result.
In existing protein complex detection methods, the protein interaction network is represented as an undirected graph G=(V, E), where the proteins are the vertices V and the interactions are the edges E, and the edges of the protein interaction network carry no weights. Protein interaction networks can be obtained from existing data sets such as Krogan, Dip and Biogrid. From graph theory, a protein interaction network corresponds to an adjacency matrix; processing the adjacency matrix with a clustering algorithm such as COACH or K-means yields the protein complex detection result, that is, the output shows which proteins belong to the same cluster, namely the same complex. The method of the present invention first performs embedding on the adjacency matrix to obtain a reduced-dimension matrix derived from the adjacency matrix by dimension transformation, and then applies a known clustering algorithm to the reduced-dimension matrix to detect protein complexes, which improves the running efficiency of the clustering algorithm. Because the invention detects protein complexes using the network corresponding to the protein interactions, i.e. a graph in the mathematical sense, the embodiments do not distinguish between the concepts of protein interaction, PPI, protein interaction network, and the graph corresponding to the protein interaction network unless stated otherwise.
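As an illustration of step S1, the following minimal sketch builds the adjacency matrix of an undirected, unweighted graph from a list of interacting protein pairs. The function name, the protein names and the toy interaction list are hypothetical and are not taken from the Krogan, Dip or Biogrid data sets.

```python
import numpy as np

def adjacency_matrix(proteins, interactions):
    """Build the symmetric 0/1 adjacency matrix of an undirected, unweighted graph."""
    index = {name: i for i, name in enumerate(proteins)}
    A = np.zeros((len(proteins), len(proteins)), dtype=int)
    for a, b in interactions:
        i, j = index[a], index[b]
        if i != j:  # self-interactions are ignored in this sketch
            A[i, j] = A[j, i] = 1
    return A

# Hypothetical toy network: P1, P2, P3 form a triangle and P4 is attached to P3.
proteins = ["P1", "P2", "P3", "P4"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
A = adjacency_matrix(proteins, interactions)
```

Each row of `A` lists the neighbors of one protein; the later steps operate on this matrix.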
As a further preferred embodiment, the step of performing embedding on the adjacency matrix to obtain a reduced-dimension matrix, i.e. step S2, as shown in Fig. 2, specifically comprises:
S21. Calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network;
S22. Calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network;
S23. Saving the local structure information and the global structure information into the adjacency matrix, thereby obtaining the reduced-dimension matrix.
First-order proximity describes the pairwise similarity between vertices. For any pair of vertices $v_i$ and $v_j$ in the protein interaction network, if there is an edge between $v_i$ and $v_j$, the first-order proximity between them is positive; otherwise it is 0. First-order proximity reflects the local structure of the protein interaction network.
Second-order proximity describes the pairwise similarity between the neighborhood structures of vertices. Let $N_i$ and $N_j$ denote the neighbor sets of $v_i$ and $v_j$; the second-order proximity is then determined by the similarity of $N_i$ and $N_j$. If two vertices share many common neighbors, the second-order proximity between them is high. Second-order proximity has been shown to be a good measure of the similarity of a pair of vertices even when they are not connected by an edge, so it can greatly enrich the relationships between vertices. Second-order proximity reflects the global structure of the protein interaction network.
The concepts of first-order and second-order proximity were first proposed in the LINE model. If u is a vertex of graph G=(V, E), the first-order proximity between u and all other vertices of G can be expressed as $N_u = \{s_{u,1}, s_{u,2}, \dots, s_{u,|V|}\}$, where $s_{i,j}$ denotes the weight of the edge between vertex i and vertex j. If there is no edge between vertex i and vertex j, then $s_{i,j} = 0$; if they are connected by an edge and G is not a weighted graph, then $s_{i,j} = 1$; if G is a weighted graph, then $s_{i,j} > 0$. Likewise, the first-order proximity between vertex v and all other vertices of G can be expressed as $N_v = \{s_{v,1}, s_{v,2}, \dots, s_{v,|V|}\}$. In this way the first-order proximity between every vertex of G and the other vertices can be computed. The second-order proximity, taking vertices v and u as an example, is obtained by computing the similarity between $N_u$ and $N_v$. Computing first-order and second-order proximity therefore requires the weight of each edge in the graph to be known first. The characteristic of PPI, however, is that the vertices have no distinguishing features other than their protein names, so each vertex lacks the features needed to assign a weight to each edge.
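The definitions above can be made concrete with a small sketch. For an unweighted graph the first-order vector of a vertex is simply its row of the adjacency matrix. The text does not fix how the similarity between two such vectors is measured, so cosine similarity is used here purely as an assumption; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def first_order(A, u):
    """First-order proximity vector N_u: row u of the adjacency matrix."""
    return A[u].astype(float)

def second_order(A, u, v):
    """Similarity of N_u and N_v (cosine similarity, chosen as an assumption)."""
    nu, nv = first_order(A, u), first_order(A, v)
    denom = np.linalg.norm(nu) * np.linalg.norm(nv)
    return float(nu @ nv / denom) if denom > 0 else 0.0

# Toy graph: vertices 0, 1, 2 form a triangle and vertex 3 is attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

s_03 = second_order(A, 0, 3)  # vertices 0 and 3 share neighbor 2 but are not adjacent
```

In this toy graph vertices 0 and 3 have first-order proximity 0 but a positive second-order proximity, which illustrates why second-order proximity can relate vertices that are not directly connected.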
Because the invention detects protein complexes using the network corresponding to the protein interactions, that is, it considers the protein interaction network as a whole, the embodiments, unless stated otherwise, do not distinguish between "the first-order proximity between every pair of vertices in the protein interaction network", "the first-order proximity of the protein interaction network" and "first-order proximity", nor between the corresponding three expressions for second-order proximity.
After the first-order proximity and second-order proximity are obtained, they can be combined with the adjacency matrix; that is, the local structure information corresponding to the first-order proximity and the global structure information corresponding to the second-order proximity are saved into the adjacency matrix to obtain the reduced-dimension matrix. Since combining first-order and second-order proximity with the adjacency matrix belongs to the prior art, it is not described further here.
Because each vertex in the protein interaction network has no features other than the corresponding protein name, a set of features must be assigned to each vertex in order to compute the first-order proximity of the protein interaction network, i.e. the first-order proximity between every pair of vertices in the network. Given the definition of a protein complex, the important neighbors of each vertex can be used as its features, because these neighbors have a higher probability of being combined with it into a protein complex. The important neighbors are the subset of a vertex's neighbors that is selected by a given algorithm.
As a further preferred embodiment, the step of calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network, i.e. step S21, as shown in Fig. 3, specifically comprises:
S211. Selecting the preferred neighbor set of each vertex in the protein interaction network using a neighbor selection algorithm;
S212. Assigning feature information to each vertex according to its preferred neighbor set, thereby building a feature information matrix;
S213. Calculating, from the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network;
Taking the first-order proximity between every pair of vertices in the protein interaction network as the local structure information of the protein interaction network to be obtained.
Every vertex in the protein interaction network has a preferred neighbor set, although the preferred neighbor set of some vertices may be empty. For a vertex of the protein interaction network, the preferred neighbor set is the set of qualifying neighbors selected from all of its neighbors. The preferred neighbor set is used to assign feature information to the corresponding vertex. If the preferred neighbor set of vertex $v_i$ contains the vertices x, y and z, then the three vertices x, y and z are the features assigned to $v_i$. Once every vertex has been assigned features in this way, there is a basis for computing edge weights, which are then used to compute the first-order proximity.
Since every vertex has assigned feature information, the feature information matrix (feature matrix) of the protein interaction network can be obtained. It is a matrix of order N × D, where N is the total number of vertices of the protein interaction network and D is the number of features of each vertex. Because the preferred neighbor set differs from vertex to vertex, the features of each vertex differ, and so does the number of features of each vertex.
For example, in a protein interaction network with N vertices, the maximum number of features a vertex may have is N, so the maximum order of the feature information matrix of this network is N × N. If a vertex has fewer than N features, its row in the feature information matrix has fewer than N columns; the row can be padded to N columns by a padding algorithm, preferably by setting the rightmost elements to zero. When the feature information matrix is used, it is sometimes necessary to reduce its size, that is, to keep its number of rows and reduce its number of columns. D can then be treated as a variable: its maximum value can be set to the feature count of the vertex with the most features in the protein interaction network, or directly to N, and its minimum value can be set to the feature count of the vertex with the fewest features. For example, when the maximum value of D is set to N, the N × D feature information matrix can be reduced to order N × (D-1), N × (D-2), and so on; preferably, the matrix is reduced by deleting its rightmost columns and keeping only the leftmost columns.
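The construction and reduction of the feature information matrix can be sketched as follows. The text does not state how a neighbor is encoded as a numeric feature, so this sketch assumes one possible reading: each row lists the 1-based ids of the vertex's preferred neighbors, left-aligned and padded with zeros on the right. The function names and the toy neighbor sets are hypothetical.

```python
import numpy as np

def feature_matrix(preferred, n):
    """N x N matrix; row v holds the 1-based ids of v's preferred neighbors,
    left-aligned and zero-padded on the right (the encoding is an assumption)."""
    X = np.zeros((n, n), dtype=int)
    for v in range(n):
        ids = sorted(w + 1 for w in preferred.get(v, ()))  # 1-based so 0 means "no feature"
        X[v, :len(ids)] = ids
    return X

def shrink(X, d):
    """Reduce the matrix to N x d by dropping the rightmost columns."""
    return X[:, :d]

# Hypothetical preferred neighbor sets for a 4-vertex network (vertex 2 has none).
preferred = {0: {1, 2}, 1: {0}, 2: set(), 3: {2}}
X = feature_matrix(preferred, 4)
X2 = shrink(X, 2)
```

Because the rows are left-aligned, dropping rightmost columns removes padding first, which is consistent with the preference for keeping the leftmost columns.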
From the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network can be computed. There are many ways to compute first-order proximity from the feature information matrix; cosine similarity is preferred. Since this belongs to the prior art, it is not described further here.
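A minimal sketch of pairwise cosine similarity between the rows of a feature matrix is given below. The matrix values are hypothetical, and treating an all-zero row as having similarity 0 to every other row is an assumption made here to avoid division by zero.

```python
import numpy as np

def cosine_proximity(X):
    """Pairwise cosine similarity between the rows of X; all-zero rows get similarity 0."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T

# Hypothetical 3 x 3 feature matrix; the last vertex has no features.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
S = cosine_proximity(X)
```

Entry `S[i, j]` is the first-order proximity assigned to the vertex pair (i, j) under this choice of similarity.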
As a further preferred embodiment, the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, specifically comprises:
Inputting the adjacency matrix and the feature information matrix into a graph convolutional network for processing, thereby outputting the second-order proximity between every pair of vertices in the protein interaction network;
Taking the second-order proximity between every pair of vertices in the protein interaction network as the global structure information of the protein interaction network to be obtained.
Second-order proximity represents how similar the neighborhood structures of a pair of vertices are. To model second-order proximity, the neighborhood of each vertex must therefore be modeled first. For a graph G=(V, E) with n vertices there is a corresponding adjacency matrix M consisting of n row vectors $m_1, m_2, \dots, m_n$. For row vector $m_i$, $m_{i,j} > 0$ if and only if $v_i$ and $v_j$ are connected by an edge.
$m_i$ describes the neighborhood structure of vertex $v_i$, and M provides the neighborhood structure of every vertex. A GCN can therefore be designed on the basis of an autoencoder to preserve the second-order proximity of G.
An autoencoder-based graph convolutional network (GCN) can make use of latent variables and can learn interpretable latent representations of undirected, unweighted graphs, which makes it well suited to protein interaction networks. The features of each vertex are used as part of the input data of the GCN; after encoding by l convolutional layers, the representation learned from the original graph is obtained. For the decoding part, a simple inner-product decoder can be used. The protein interaction network is an undirected, unweighted graph G=(V, E) with N=|V| vertices. The adjacency matrix A of G and the N × D feature information matrix X are taken as input. Using stochastic latent variables $Z_i$, an N × F output matrix Z is obtained, where F is the number of output features and D is the number of features of each vertex. The second-order proximity of the protein interaction network, i.e. the second-order proximity between every pair of vertices in the network, can then be obtained from the output of the GCN. Since the method of obtaining second-order proximity from the output of the GCN belongs to the prior art, it is not described further here.
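The inner-product decoder mentioned above can be sketched in a few lines. Applying a logistic sigmoid to $Z Z^T$ follows the usual graph autoencoder formulation and is an assumption here, since the text only names the decoder; the embedding values are hypothetical.

```python
import numpy as np

def inner_product_decoder(Z):
    """Reconstruct pairwise scores from vertex embeddings as sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

# Hypothetical N x F embedding: vertices 0 and 1 are close, vertex 2 points the other way.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [-1.0, 0.0]])
A_rec = inner_product_decoder(Z)
```

Vertex pairs with similar embeddings receive scores above 0.5, and pairs with opposed embeddings receive scores below 0.5.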
Because the features of each vertex are generated from its selected neighbors, the number of features differs from vertex to vertex. N is therefore set as the initial value of D, and when the feature information matrix X is built, the entries for features that a vertex does not have are set to 0. Each layer of the graph convolutional network can then be written as the following nonlinear function:
$H^{(l+1)} = f(H^{(l)}, A)$,
where $H^{(0)} = X$ and $H^{(L)} = Z$.
The propagation rule is as follows:
$f(H^{(l)}, A) = \mathrm{relu}(A H^{(l)} W^{(l)})$,
where $W^{(l)}$ is the weight matrix of the l-th network layer and relu is the activation function. Note that multiplication by A sums the features of all neighbors of a vertex but does not include the vertex itself. An identity matrix I therefore needs to be added to A. The propagation rule then becomes:
$f(H^{(l)}, A) = \mathrm{relu}(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$,
where $\hat{A} = A + I$ and $\hat{D}$ is the diagonal vertex degree matrix of $\hat{A}$. If L=3, the graph convolutional network has three convolutional layers that reconstruct the structure of A to obtain Z. Assuming that each layer of the network keeps half of the features of the previous layer, Z retains one eighth of the original D features after three layers.
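The layer-wise propagation can be sketched as a forward pass in NumPy. The symmetric normalization $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$ is taken from the commonly used GCN formulation and is an assumption here; the weights are random and untrained, and the sizes (N = 4, D = 8) are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops and apply symmetric degree normalization (assumed form)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)                      # degrees are >= 1 because of the self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_forward(A, X, weights):
    """Apply H <- relu(A_norm H W) once per weight matrix, starting from H = X."""
    A_norm = normalize_adjacency(A)
    H = X
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)    # relu activation
    return H

rng = np.random.default_rng(0)
N, D = 4, 8
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.random((N, D))
dims = [D, D // 2, D // 4, D // 8]             # each layer keeps half of the features
weights = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(3)]
Z = gcn_forward(A, X, weights)
```

With three layers that each halve the feature count, the output `Z` has D/8 columns. Training the weights against a loss is outside the scope of this sketch.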
As a further preferred embodiment, the neighbor selection algorithm, i.e. step S211, as shown in Fig. 4, is specifically:
S2111. Processing the protein interaction network with the Deepwalk algorithm to obtain the Deepwalk vector of each vertex;
S2112. Selecting one vertex of the protein interaction network as the target vertex;
S2113. Calculating, from the Deepwalk vectors of the target vertex and of all its neighbors, the Euclidean distance between the target vertex and each of its neighbors;
Calculating the arithmetic mean of the Euclidean distances between the target vertex and each of its neighbors;
S2114. Taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
S2115. Returning to the step of selecting one vertex of the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
DeepWalk is a method for learning latent representations of vertices. It encodes the social relations of vertices in a continuous vector space and is an extension of language modeling and unsupervised learning from sequences of words to graphs. The method treats the sequences obtained from truncated random walks as sentences and learns from them. It is scalable and parallelizable and can be used for network classification and anomaly detection. DeepWalk has been validated successfully on social networks and graph analysis. By modeling a stream of short random walks, it learns latent representations that encode vertices in a continuous, low-dimensional vector space.
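The walk-generation half of DeepWalk can be sketched with the standard library alone. This sketch only produces the truncated random walks that play the role of sentences; the subsequent language-model training that turns the walks into vectors is omitted. The function name, parameters and toy adjacency list are hypothetical.

```python
import random

def truncated_walks(adj, walk_length, walks_per_vertex, seed=0):
    """Generate fixed-length random walks starting from every vertex."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        for start in adj:
            walk = [start]
            # Extend the walk by a uniformly chosen neighbor until it reaches walk_length.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Toy adjacency list: vertices 0, 1, 2 form a triangle and vertex 3 is attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = truncated_walks(adj, walk_length=5, walks_per_vertex=2)
```

Each walk is a list of vertex ids in which consecutive entries are always joined by an edge, so the walks sample the local neighborhood structure of the graph.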
The protein interaction Internet is handled by Deepwalk, gained handling result causes protein phase Each vertex corresponds to the vector of one 64 dimension in the interaction Internet, according to any two vertex corresponding 64 Dimensional vector can calculate the Euclidean distance on the two vertex.In the present patent application, each vertex is calculated by Deepwalk 64 dimensional vectors obtained after method processing are referred to as the Deepwalk vectors of this vertex correspondence.Selected protein interaction Interactive Network A vertex in network, referred to as object vertex, the Euclidean distance of object vertex and its all adjoint point is calculated respectively Come, then seek the arithmetic average of all these Euclidean distances, i.e., by the Euclid of object vertex and its all adjoint point away from From the sum of divided by its adjoint point sum.Then, by the Euclidean distance and arithmetic average of object vertex and each of which adjoint point It is compared, the adjoint point of arithmetic average is more than for Euclidean distance, then is included into preferred adjoint point collection, otherwise excludes preferred Except adjoint point collection.By this method, the certain vertex that can be directed to the protein interaction Internet filters out it Qualified adjoint point forms preferred adjoint point collection.
The above method is applied in a loop. After the preferred neighbor set of one target vertex has been established in step S2114, the procedure returns to step S2112, selects another vertex of the protein interaction network whose preferred neighbor set has not yet been established as the new target vertex, and continues from step S2112 until the qualifying neighbors of every vertex in the protein interaction network have been screened out into the corresponding preferred neighbor set. Once the preferred neighbor sets are available, operations such as feature assignment can be carried out by the method disclosed above.
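The selection loop of steps S2112 to S2115 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `preferred_neighbor_sets`, the dictionary-based graph representation and the 2-dimensional toy vectors (standing in for the 64-dimensional DeepWalk vectors) are all assumptions made for the example.

```python
import math

def preferred_neighbor_sets(adjacency, embeddings):
    """Return, for every vertex, the neighbors that are farther away than
    the mean Euclidean distance between that vertex and its neighbors.

    adjacency  -- dict: vertex -> list of neighboring vertices (hypothetical format)
    embeddings -- dict: vertex -> coordinate tuple (stand-in for a DeepWalk vector)
    """
    preferred = {}
    for target, neighbors in adjacency.items():
        neighbors = list(neighbors)
        if not neighbors:
            preferred[target] = set()
            continue
        # Euclidean distance between the target vertex and each neighbor
        distances = {n: math.dist(embeddings[target], embeddings[n]) for n in neighbors}
        # Arithmetic mean: sum of distances divided by the number of neighbors
        mean_distance = sum(distances.values()) / len(distances)
        # Keep only neighbors whose distance is strictly greater than the mean
        preferred[target] = {n for n, d in distances.items() if d > mean_distance}
    return preferred

# Toy graph: vertex "a" has three neighbors at distances 1, 2 and 5
adjacency = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
embeddings = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (3.0, 4.0)}
result = preferred_neighbor_sets(adjacency, embeddings)
print(result["a"])
```

With a strict "greater than" comparison, a vertex that has a single neighbor ends up with an empty preferred neighbor set, because that one distance equals the mean. How such vertices should be treated is not stated in the text, so the sketch leaves the set empty.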
With this neighbor selection algorithm, the meaning of the feature information matrix becomes definite: it has N rows and D columns, where N is the total number of vertices in the protein interaction network and D is the number of features of each vertex. After the DeepWalk algorithm has been applied, each vertex corresponds to a 64-dimensional vector, so each element of the feature information matrix is in essence a 64-dimensional vector.
As a further preferred embodiment, an optimization step is provided after the step of computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network. The optimization step comprises:
Computing the graph Laplacian regularization loss function from the first-order estimates and second-order estimates between all pairs of vertices in the protein interaction network;
Dynamically adjusting the order of the feature information matrix until the graph Laplacian regularization loss function is minimized;
Taking the first-order estimates and second-order estimates that correspond to the minimum of the graph Laplacian regularization loss function as, respectively, the local structure information and the global structure information of the protein interaction network to be obtained.
Because N is used as the initial value of D when the feature information matrix is established, the order of the feature information matrix is not necessarily the most reasonable one, and the first-order and second-order estimates of the protein interaction network derived from it are not necessarily optimal. As a result, the reduced-dimension matrix finally passed to the clustering algorithm is not optimal either. To obtain the optimal reduced-dimension matrix, the order of the feature information matrix is adjusted dynamically, which changes the first-order and second-order estimates of the protein interaction network. When the graph Laplacian regularization loss function computed from the first-order and second-order estimates reaches its minimum, the corresponding combination of first-order and second-order estimates is optimal. This optimal combination is used as the local structure information and global structure information of the protein interaction network, from which the reduced-dimension matrix is then computed.
As a further preferred embodiment, the graph Laplacian regularization loss function is computed as follows: $L = L_{first} + \lambda L_{second}$
In the formula, $L$ is the graph Laplacian regularization loss function, $L_{first}$ is the loss supervised by the first-order estimate, $L_{second}$ is the loss supervised by the second-order estimate, and $\lambda$ is the balance factor between $L_{first}$ and $L_{second}$. $\lambda$ is a parameter whose value can be chosen when the algorithm is actually run.
As a further preferred embodiment, the loss supervised by the first-order estimate is computed as follows:
In the formula, $v_i$ and $v_j$ are a pair of vertices connected by an edge in the protein interaction network, $y_i$ is the matrix built from the DeepWalk vectors of $v_i$, and $y_j$ is the matrix built from the DeepWalk vectors of $v_j$. Preferably, the matrix $y_i$ is constructed with the DeepWalk vectors of $v_i$ and of all preferred neighbors of $v_i$ as its elements. The matrix $y_j$ is constructed in the same way. Because the number of neighbors may differ from vertex to vertex, the orders of $y_i$ and $y_j$ may differ; the smaller matrix is padded with zero elements so that both matrices have the same size and the computation can be carried out. The padding is preferably done as follows: if the order of $y_i$ is smaller than that of $y_j$, $y_i$ is padded with zero elements into a new matrix whose order equals that of $y_j$, with $y_i$ located in the upper-left corner of the new matrix.
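The zero-padding step can be illustrated with a short Python sketch. It assumes matrices stored as lists of rows; the helper names `pad_top_left` and `pad_pair` are hypothetical and only serve to show the smaller matrix being placed in the upper-left corner of a zero matrix of the larger size.

```python
def pad_top_left(matrix, rows, cols):
    """Embed `matrix` in the upper-left corner of a rows x cols zero matrix."""
    if len(matrix) > rows or any(len(row) > cols for row in matrix):
        raise ValueError("target size is smaller than the matrix")
    padded = [[0.0] * cols for _ in range(rows)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            padded[i][j] = value
    return padded

def pad_pair(y_i, y_j):
    """Pad two matrices with zeros so that both have the same size."""
    rows = max(len(y_i), len(y_j))
    cols = max((len(row) for row in y_i + y_j), default=0)
    return pad_top_left(y_i, rows, cols), pad_top_left(y_j, rows, cols)

# y_i is 1 x 2, y_j is 3 x 2: y_i is padded to 3 x 2, y_j is unchanged
y_i = [[1.0, 2.0]]
y_j = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
padded_i, padded_j = pad_pair(y_i, y_j)
print(padded_i)
```

Padding with zeros keeps the original entries in place, so any element-wise difference between the two matrices is unaffected in the region where both were defined.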
The loss supervised by the second-order estimate is computed as follows:
In the formula, $L_0$ is the number of convolutional layers of the graph convolutional neural network and $H^{(0)}$ has size $N \times D$. Zero-element padding is likewise used here so that $H^{(l+1)}$ and $H^{(l)}$ have the same order.
With the above method, the combination of first-order and second-order estimates at which the graph Laplacian regularization loss function $L$ reaches its minimum is optimal.
As a further preferred embodiment, an optimization step is provided after the step of computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network. The optimization step comprises:
Dynamically adjusting α and β so that Z in the following system of equations equals 0 or is as close to 0 as possible:
In the formula, the deviation variables are, in order, the negative deviation variable of the first goal, the positive deviation variable of the first goal, the negative deviation variable of the second goal and the positive deviation variable of the second goal; X is the feature information matrix, D is the number of columns of X, P is the maximum percentage of the singular values of X, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take;
Taking the first-order and second-order estimates computed from the feature information matrix that corresponds to Z being equal to 0 or as close to 0 as possible as, respectively, the local structure information and the global structure information of the protein interaction network to be obtained.
The above method is another way of implementing the optimization step. Mathematically, minimizing the graph Laplacian regularization loss function to achieve the optimization is a matrix dimensionality-reduction problem. As a preferred embodiment, conventional singular value decomposition (SVD) can be used to reduce the dimension of the matrix. By the SVD theorem, a feature information matrix X of order N × D can be rewritten as U × S × V*. Here U is an orthogonal matrix of order N × N, S is a diagonal matrix of order N × D, and V* is the conjugate transpose of a matrix V of order D × D. The diagonal entries of S are the singular values of X. If the smallest singular values, up to some maximum percentage P of them, are set to 0, an approximate matrix X′ of X is obtained. In the end the value of D is reduced; however, because the reconstruction error of X → X′ has to be minimized, the value of 1 − P must be maximized. After the multiplication with the SVD factors, X′ = (1 − P)X, where X is an N × D matrix. The problem of minimizing the graph Laplacian regularization loss function to achieve the optimization can therefore be converted into a goal programming problem, as shown in the following system of equations:
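For reference, the truncation step that the SVD paragraph relies on can be written out explicitly. The block below is the textbook truncated SVD and not a formula taken from the source: the singular values $\sigma_i$, the rank bound $r$ and the number of retained singular values $k$ are notation introduced here for illustration only.

```latex
% Full decomposition of the N x D feature information matrix X
X = U S V^{*}, \qquad
S = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0, \quad r = \min(N, D)

% Approximation X' obtained by setting the smallest singular values to zero
X' = U S' V^{*}, \qquad
S' = \operatorname{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0), \quad k \le r

% Reconstruction error in the Frobenius norm (Eckart-Young theorem)
\lVert X - X' \rVert_F = \sqrt{\sum_{i = k + 1}^{r} \sigma_i^{2}}
```

The last line shows why discarding fewer singular values lowers the reconstruction error, which is the trade-off against a smaller D that the goal programming formulation has to balance.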
Dynamically adjusting α means the following. α is preferably taken initially as an N × N matrix, i.e. the feature information matrix itself. Adjusting α means reducing its order step by step: for example, the rightmost column is deleted to give an N × (N−1) matrix, which is substituted into the system of equations for computation; in the next step the rightmost column is deleted again to give an N × (N−2) matrix, which is again substituted into the system of equations, and so on.
In this system of equations, the positive and negative deviation variables are given equal importance, which means that the weight of each deviation variable is 1. Clearly, when Z equals 0, a Pareto-optimal solution is obtained. In some cases, however, Z cannot be exactly 0, and the required Z is then the value closest to 0 within its range. α and β are therefore updated repeatedly until a combination of α and β is found that makes Z equal or close to 0. The feature information matrix corresponding to this combination of α and β is optimal, and the first-order and second-order estimates computed from the optimal feature information matrix make the reduced-dimension matrix optimal, which improves the clustering result.
Embodiment 2
In this embodiment, the protein complex detection method based on the semi-supervised network embedding model described in Embodiment 1 is tested on three PPI data sets in combination with existing clustering methods. Its results are compared with those of the clustering methods applied conventionally and with state-of-the-art methods, to show the performance of the method of Embodiment 1. The experiments were run on a desktop computer with a dual-core i7 CPU at 4.00 GHz, 16 GB of memory and a GTX 1070 graphics card. The entire computation for the three data sets can be completed within one day. Moreover, since clustering PPI data is usually a one-off process in practice, run-time improvements and time-complexity analysis need not be the focus of the study, because clustering quality matters more.
Three recent PPI data sets of Saccharomyces cerevisiae are used: the Krogan data set, the Dip data set and the Biogrid data set. The Krogan and Dip data sets are used to assess several clustering algorithms. As shown in Table 1, the Krogan and Dip data sets have similar average degree and density, whereas the Biogrid data set has a higher average degree and density than either of them. Because PPI data can be represented by an undirected graph G = (V, E), the average degree can be calculated as $2|E|/|V|$ and the density as $2|E|/(|V|(|V|-1))$. The characteristics of the three PPI data sets are listed in Table 1.
PPI data have a high false-positive rate, estimated at about 50%. This noise interferes with clustering methods that detect protein complexes from PPI data. CYC2008 is therefore used as the reference data set. CYC2008 provides a catalogue of 408 manually curated protein complexes of Saccharomyces cerevisiae, 90% more than another popular data set, MIPS.
Table 1

| Data set | Vertices | Edges | Average degree | Density |
| --- | --- | --- | --- | --- |
| Krogan | 5364 | 61289 | 22.85 | 0.0043 |
| Dip | 4972 | 17836 | 7.17 | 0.0014 |
| Biogrid | 6242 | 255510 | 81.87 | 0.013 |
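The average degree and density columns of Table 1 can be checked with a few lines of Python. The sketch assumes the usual definitions for a simple undirected graph G = (V, E), namely average degree 2|E|/|V| and density 2|E|/(|V|(|V|−1)); the function names are chosen for this example, and the vertex and edge counts are those listed in Table 1.

```python
def average_degree(num_vertices, num_edges):
    """Average degree of an undirected graph: every edge touches two vertices."""
    return 2 * num_edges / num_vertices

def density(num_vertices, num_edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * num_edges / (num_vertices * (num_vertices - 1))

# (vertices, edges) as listed in Table 1
datasets = {
    "Krogan": (5364, 61289),
    "Dip": (4972, 17836),
    "Biogrid": (6242, 255510),
}

for name, (vertices, edges) in datasets.items():
    print(name, round(average_degree(vertices, edges), 2), round(density(vertices, edges), 4))
```

Under these assumed definitions the computed values agree with the tabulated ones to the precision shown in Table 1.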
The neighborhood affinity score is used to decide whether a protein complex detected by an algorithm matches a protein complex in CYC2008. Precision, recall and F-measure are then computed from it to assess the performance of the algorithm. The neighborhood affinity score NA(p, b) is defined as follows:
Here, P = (Vp, Ep) is the predicted protein complex and B = (Vb, Eb) is the reference protein complex. Precision can then be calculated as follows:
where
Recall is calculated as follows:
where
The F-measure is the harmonic mean of precision and recall and is calculated as follows: $F = 2 \times precision \times recall / (precision + recall)$
ω is a threshold that determines whether a protein complex is confirmed as matching a protein complex in the reference data set. Based on experiments, the neighborhood affinity threshold is set to 0.25, a setting at which the performance of the model can be distinguished from that of the other algorithms.
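The evaluation described above can be sketched in Python as follows. The formulas did not survive in the text, so the sketch assumes the definitions commonly used in the protein complex literature: NA(p, b) = |Vp ∩ Vb|² / (|Vp| × |Vb|), a predicted complex counts as matched when its score against some reference complex is at least ω, and likewise for reference complexes. The function names and the toy complexes are illustrative only.

```python
def neighborhood_affinity(predicted, reference):
    """Assumed definition: |Vp & Vb|^2 / (|Vp| * |Vb|) on the vertex sets."""
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    common = len(predicted & reference)
    return common ** 2 / (len(predicted) * len(reference))

def evaluate(predicted_complexes, reference_complexes, omega=0.25):
    """Return (precision, recall, F-measure) for threshold omega."""
    # Predicted complexes that match at least one reference complex
    matched_predicted = sum(
        1 for p in predicted_complexes
        if any(neighborhood_affinity(p, b) >= omega for b in reference_complexes)
    )
    # Reference complexes that are matched by at least one predicted complex
    matched_reference = sum(
        1 for b in reference_complexes
        if any(neighborhood_affinity(p, b) >= omega for p in predicted_complexes)
    )
    precision = matched_predicted / len(predicted_complexes) if predicted_complexes else 0.0
    recall = matched_reference / len(reference_complexes) if reference_complexes else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Toy data: only the first predicted complex matches a reference complex
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"E", "F", "G", "H"}]
precision, recall, f_measure = evaluate(predicted, reference)
print(precision, recall, f_measure)
```

In the toy data the first predicted complex shares three proteins with the first reference complex (score 0.75), while the second shares only one protein with each reference complex (score 0.125), so it falls below the 0.25 threshold.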
In addition, three further indices are used to measure the quality of protein complex clustering: fraction (Frac), maximum matching ratio (MMR) and geometric accuracy (Acc). Frac measures the fraction of pairs of protein complexes whose overlap score θ is greater than 0.25. Frac(θ) is calculated as follows:
Here, A and B are two protein complexes.
Acc is the geometric mean of two other measures, clustering sensitivity (Sn) and clustering positive predictive value (PPV). Sn and PPV are calculated as follows:
Here, n is the number of proteins in the reference protein complex, m is the number of proteins in the clustered protein complex, and the element $t_{ij}$ is the number of proteins found in both complexes. Because Sn can be inflated by putting every protein into the same complex, and PPV can be maximized by putting each protein into its own complex, the geometric mean of the two measures is used: $Acc = \sqrt{S_n \times PPV}$
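A small Python sketch can make the three quantities concrete. Since the Sn and PPV formulas are missing from the text, the sketch assumes the widely used definitions based on a contingency table whose entry t[i][j] counts the proteins shared by reference complex i and predicted complex j: Sn is the sum of row maxima divided by the total size of the reference complexes, and PPV is the sum of column maxima divided by the sum of all entries. The function name and toy data are illustrative.

```python
import math

def clustering_accuracy(reference_complexes, predicted_complexes):
    """Return (Sn, PPV, Acc) under the assumed contingency-table definitions."""
    refs = [set(c) for c in reference_complexes]
    preds = [set(c) for c in predicted_complexes]
    if not refs or not preds:
        return 0.0, 0.0, 0.0
    # t[i][j]: proteins shared by reference complex i and predicted complex j
    t = [[len(r & p) for p in preds] for r in refs]
    reference_size = sum(len(r) for r in refs)
    table_total = sum(sum(row) for row in t)
    row_maxima = sum(max(row) for row in t)
    column_maxima = sum(max(t[i][j] for i in range(len(refs))) for j in range(len(preds)))
    sn = row_maxima / reference_size if reference_size else 0.0
    ppv = column_maxima / table_total if table_total else 0.0
    return sn, ppv, math.sqrt(sn * ppv)

# Toy data: contingency table is [[3, 1], [0, 2]]
reference = [{"A", "B", "C", "D"}, {"E", "F"}]
predicted = [{"A", "B", "C"}, {"D", "E", "F"}]
sn, ppv, acc = clustering_accuracy(reference, predicted)
print(sn, ppv, acc)
```

Taking the geometric mean penalizes the two degenerate clusterings mentioned in the text: one giant cluster drives Sn up but PPV down, and singleton clusters do the opposite.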
MMR represents the two sets of protein complexes as a bipartite graph in which the two groups of nodes are the reference complexes and the predicted complexes, respectively, and each edge joining a reference complex to a predicted complex is weighted by their overlap score. The overlap score between two protein complexes is calculated by the overlap score equation. The value of MMR is the total weight of the selected subset of edges with maximum weight, divided by the number of reference protein complexes.
Experiments show that COACH is so far the most stable and most representative clustering algorithm for PPI networks, so it is used as the clustering method for evaluating the model. Two state-of-the-art network embedding models, DeepWalk and SDNE, are used as baselines for the performance of the model. To assess the robustness of the model, two traditional clustering algorithms of different types, K-means and DBSCAN, are selected for comparison. For COACH, the three key parameters of the algorithm, namely density, affinity and closeness, are set to 0.7, 0.2 and 0.5, respectively; empirical analysis indicates that these settings are sufficient for stable computation with all network embedding algorithms. For K-means and DBSCAN, only the default settings are used.
Because SDNE also requires first-order estimates but was designed for social networks, three versions of SDNE are used: SDNE-NA, in which no vertex has any feature; SDNE-ALL, in which each vertex uses all of its neighbors as features; and SDNE-SN, in which each vertex uses selected neighbors as features. SDNE-SN selects neighbors with the neighbor selection algorithm disclosed in Embodiment 1.
The test results for the Krogan, Dip and Biogrid data sets are shown in Fig. 5, Fig. 6 and Fig. 7, respectively.
The results show that the model outperforms the other models in precision, recall and F-measure on all three data sets. In particular, on the high-density Biogrid data set, the F-measure of the model is at least 90% higher than that of the second-ranked model. On the Dip data set, the model reaches the highest F-measure, 0.528, which is about 20% higher than that of COACH alone, 9.5% higher than that of the second-ranked COACH+SDNE-SN, and 17% higher than that of COACH+DeepWalk. Similar results are found on the Krogan data set. These results show that the model is better suited than the other models to complex networks with high density.
It is also found that SDNE-SN is better than SDNE-NA and SDNE-ALL on all three data sets. Because SDNE-SN computes first-order estimates with the neighbor selection algorithm disclosed in Embodiment 1, this result indirectly confirms the effectiveness of the model.
As for the K-means and DBSCAN clustering algorithms, both perform poorly in the tests. Whichever network embedding algorithm they are combined with, the results are not good, which means that these two algorithms are not suited to PPI networks.
The clustering quality of the models is compared next. Based on the test results above, only three representative models are selected for comparison: COACH, COACH+DeepWalk and COACH+SDNE-SN. Table 2 shows the number of protein complexes detected by the different models. According to the table, the model detects more protein complexes than the other models on all three data sets. With this quantitative basis, improving the quality of the clustering becomes easier.
Table 2

| Data set | COACH + method of the invention | COACH | COACH+DeepWalk | COACH+SDNE-SN |
| --- | --- | --- | --- | --- |
| Krogan | 610 | 570 | 570 | 580 |
| Dip | 808 | 748 | 750 | 840 |
| Biogrid | 3470 | 3158 | 3160 | 3267 |
Tables 3, 4 and 5 show the clustering quality comparison for the Krogan, Dip and Biogrid data sets, respectively. Table 3 shows that the model achieves better clustering quality: for MMR and Frac it is about 38% higher than the second-ranked COACH+SDNE-SN, and for Acc about 25% higher. The situation for the Dip data set is broadly similar.
For the Biogrid data set, the clustering quality of all models decreases because of the high density of the network. Nevertheless, the model is still better than the others. For example, the Acc value of the model reaches 0.69, about 25% higher than that of the second-ranked COACH+SDNE-SN.
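Table 3 (Krogan data set)

| Index | COACH + method of the invention | COACH | COACH+DeepWalk | COACH+SDNE-SN |
| --- | --- | --- | --- | --- |
| Frac | 0.61 | 0.35 | 0.4 | 0.44 |
| Acc | 0.68 | 0.46 | 0.48 | 0.54 |
| MMR | 0.5 | 0.19 | 0.25 | 0.36 |

Table 4 (Dip data set)

| Index | COACH + method of the invention | COACH | COACH+DeepWalk | COACH+SDNE-SN |
| --- | --- | --- | --- | --- |
| Frac | 0.81 | 0.61 | 0.62 | 0.64 |
| Acc | 0.68 | 0.58 | 0.6 | 0.63 |
| MMR | 0.75 | 0.36 | 0.4 | 0.48 |

Table 5 (Biogrid data set)

| Index | COACH + method of the invention | COACH | COACH+DeepWalk | COACH+SDNE-SN |
| --- | --- | --- | --- | --- |
| Frac | 0.35 | 0.14 | 0.2 | 0.24 |
| Acc | 0.69 | 0.39 | 0.4 | 0.45 |
| MMR | 0.28 | 0.05 | 0.14 | 0.22 |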
Compared with other network embedding methods, an algorithm is designed that selects key neighbors as the features of each vertex in order to compute its first-order estimate. In addition, a three-layer GCN is designed to learn the structure of the PPI network in depth so as to preserve its second-order estimate.
Extensive experiments on various PPI networks show that the model is stable and outperforms other state-of-the-art models on every index. In future work, recurrent neural networks are planned to be used to integrate data from the biomedical literature into the PPI network, in order to further improve the quality of protein complex detection.
Embodiment 3
The protein complex detection device of the present invention based on the semi-supervised network embedding model, as shown in Fig. 8, comprises:
A memory for storing at least one program;
A processor for loading the at least one program to perform the protein complex detection method based on the semi-supervised network embedding model described in Embodiments 1 and 2.
The above describes preferred embodiments of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of this application.

Claims (10)

1. A protein complex detection method based on a semi-supervised network embedding model, characterized by comprising the following steps:
Obtaining the adjacency matrix of a protein interaction network;
Performing embedding on the adjacency matrix to obtain a reduced-dimension matrix;
Processing the reduced-dimension matrix with a clustering algorithm to obtain the protein complex detection result.
2. The protein complex detection method based on a semi-supervised network embedding model according to claim 1, characterized in that the step of performing embedding on the adjacency matrix to obtain a reduced-dimension matrix specifically comprises:
Computing the first-order estimates between all pairs of vertices in the protein interaction network to obtain the local structure information of the protein interaction network;
Computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network;
Preserving the local structure information and the global structure information in the adjacency matrix to obtain the reduced-dimension matrix.
3. The protein complex detection method based on a semi-supervised network embedding model according to claim 2, characterized in that the step of computing the first-order estimates between all pairs of vertices in the protein interaction network to obtain the local structure information of the protein interaction network specifically comprises:
Selecting the preferred neighbor set of each vertex in the protein interaction network with a neighbor selection algorithm;
Assigning feature information to each vertex according to its preferred neighbor set, thereby establishing a feature information matrix;
Computing, from the feature information matrix, the first-order estimates between all pairs of vertices in the protein interaction network;
Taking the first-order estimates between all pairs of vertices in the protein interaction network as the local structure information of the protein interaction network to be obtained.
4. The protein complex detection method based on a semi-supervised network embedding model according to claim 3, characterized in that the step of computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network specifically comprises:
Inputting the adjacency matrix and the feature information matrix into a graph convolutional neural network for processing, so as to output the second-order estimates between all pairs of vertices in the protein interaction network;
Taking the second-order estimates between all pairs of vertices in the protein interaction network as the global structure information of the protein interaction network to be obtained.
5. The protein complex detection method based on a semi-supervised network embedding model according to claim 3 or 4, characterized in that the step of selecting the preferred neighbor set of each vertex in the protein interaction network with a neighbor selection algorithm specifically comprises:
Processing the protein interaction network with the DeepWalk algorithm to obtain the DeepWalk vector of each vertex;
Selecting a vertex in the protein interaction network as the target vertex;
From the DeepWalk vectors of the target vertex and of all its neighbors, computing the Euclidean distance between the target vertex and each of its neighbors;
Computing the arithmetic mean of the Euclidean distances between the target vertex and each of its neighbors;
Taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
Returning to the step of selecting a vertex in the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
6. The protein complex detection method based on a semi-supervised network embedding model according to claim 4, characterized in that an optimization step is provided after the step of computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network, the optimization step comprising:
Computing the graph Laplacian regularization loss function from the first-order estimates and second-order estimates between all pairs of vertices in the protein interaction network;
Dynamically adjusting the order of the feature information matrix until the graph Laplacian regularization loss function is minimized;
Taking the first-order estimates and second-order estimates that correspond to the minimum of the graph Laplacian regularization loss function as, respectively, the local structure information and the global structure information of the protein interaction network to be obtained.
7. The protein complex detection method based on a semi-supervised network embedding model according to claim 6, characterized in that the graph Laplacian regularization loss function is computed as follows:
$L = L_{first} + \lambda L_{second}$
In the formula, $L$ is the graph Laplacian regularization loss function, $L_{first}$ is the loss supervised by the first-order estimate, $L_{second}$ is the loss supervised by the second-order estimate, and $\lambda$ is the balance factor between $L_{first}$ and $L_{second}$.
8. The protein complex detection method based on a semi-supervised network embedding model according to claim 7, characterized in that the loss supervised by the first-order estimate is computed as follows:
In the formula, $v_i$ and $v_j$ are a pair of vertices connected by an edge in the protein interaction network, $y_i$ is the matrix built from the DeepWalk vectors of $v_i$, and $y_j$ is the matrix built from the DeepWalk vectors of $v_j$;
The loss supervised by the second-order estimate is computed as follows:
In the formula, $L_0$ is the number of convolutional layers of the graph convolutional neural network and $H^{(0)}$ has size $N \times D$.
9. The protein complex detection method based on a semi-supervised network embedding model according to claim 4, characterized in that an optimization step is provided after the step of computing the second-order estimates between all pairs of vertices in the protein interaction network to obtain the global structure information of the protein interaction network, the optimization step comprising:
Dynamically adjusting α and β so that Z in the following system of equations equals 0 or is as close to 0 as possible:
In the formula, the deviation variables are, in order, the negative deviation variable of the first goal, the positive deviation variable of the first goal, the negative deviation variable of the second goal and the positive deviation variable of the second goal; X is the feature information matrix, D is the number of columns of X, P is the maximum percentage of the singular values of X, Z is the output obtained by inputting the adjacency matrix and the feature information matrix into the graph convolutional neural network for processing, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take;
Taking the first-order and second-order estimates computed from the feature information matrix that corresponds to Z being equal to 0 or as close to 0 as possible as, respectively, the local structure information and the global structure information of the protein interaction network to be obtained.
10. A protein complex detection device based on a semi-supervised network embedding model, characterized by comprising:
A memory for storing at least one program;
A processor for loading the at least one program to perform the protein complex detection method based on a semi-supervised network embedding model according to any one of claims 1 to 9.
CN201711250342.9A 2017-12-01 2017-12-01 Protein complex detection method and device based on semi-supervised network embedded model Active CN108171010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711250342.9A CN108171010B (en) 2017-12-01 2017-12-01 Protein complex detection method and device based on semi-supervised network embedded model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711250342.9A CN108171010B (en) 2017-12-01 2017-12-01 Protein complex detection method and device based on semi-supervised network embedded model

Publications (2)

Publication Number Publication Date
CN108171010A true CN108171010A (en) 2018-06-15
CN108171010B CN108171010B (en) 2021-09-14

Family

ID=62525063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711250342.9A Active CN108171010B (en) 2017-12-01 2017-12-01 Protein complex detection method and device based on semi-supervised network embedded model

Country Status (1)

Country Link
CN (1) CN108171010B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932402A (en) * 2018-06-27 2018-12-04 华中师范大学 A kind of protein complex recognizing method
CN109389151A (en) * 2018-08-30 2019-02-26 华南师范大学 A kind of knowledge mapping treating method and apparatus indicating model based on semi-supervised insertion
CN110796133A (en) * 2018-08-01 2020-02-14 北京京东尚科信息技术有限公司 Method and device for identifying file area
CN110942805A (en) * 2019-12-11 2020-03-31 云南大学 Insulator element prediction system based on semi-supervised deep learning
CN111860768A (en) * 2020-06-16 2020-10-30 中山大学 Method for enhancing point-edge interaction of graph neural network
CN112071362A (en) * 2020-08-03 2020-12-11 西安理工大学 Detection method of protein complex fusing global and local topological structures

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192034A1 (en) * 2001-06-21 2007-08-16 Benight Albert S Methods for representing sequence-dependent contextual information present in polymer sequence and uses thereof
WO2013049398A2 (en) * 2011-09-28 2013-04-04 H. Lee Moffitt Cancer Center & Research Institute, Inc. Protein-protein interaction as biomarkers
CN103235900A (en) * 2013-03-28 2013-08-07 中山大学 Weight assembly clustering method for excavating protein complex
CN105138866A (en) * 2015-08-12 2015-12-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for identifying protein functions based on protein-protein interaction network and network topological structure features
CN106021988A (en) * 2016-05-26 2016-10-12 河南城建学院 Recognition method of protein complexes
CN105930686A (en) * 2016-07-05 2016-09-07 四川大学 Secondary protein structure prediction method based on deep neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
L. HUANG, L. LIAO AND C. H. WU: "Protein-protein interaction network inference from multiple kernels with optimization based on random walk by linear programming", 《2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)》 *
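U013527419: "Network representation learning (DeepWalk, LINE, node2vec, SDNE)", 《HTTPS://WWW.ITDAAN.COM/BLOG/2017/07/24/CE511D9D6C68917C8A1AFABBD66C17AE.HTML》 *
朱佳, et al.: "Self-learning graph clustering for protein complex detection (in English)", 《Control Theory & Applications》 *
梁华东: "Protein function prediction and optimization based on manifold learning", 《China Master's Theses Full-text Database, Basic Sciences》 *
梦游--: "LLE manifold embedding dimensionality reduction algorithm", 《HTTPS://BLOG.CSDN.NET/ZHOUGUANGFEI0717/ARTICLE/DETAILS/78604980》 *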

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932402A (en) * 2018-06-27 2018-12-04 华中师范大学 Protein complex recognition method
CN110796133A (en) * 2018-08-01 2020-02-14 北京京东尚科信息技术有限公司 Method and device for identifying copy area
US11763167B2 (en) 2018-08-01 2023-09-19 Bejing Jingdong Shangke Information Technology Co, Ltd. Copy area identification method and device
CN109389151A (en) * 2018-08-30 2019-02-26 华南师范大学 Knowledge graph processing method and device based on semi-supervised embedded representation model
CN109389151B (en) * 2018-08-30 2022-01-18 华南师范大学 Knowledge graph processing method and device based on semi-supervised embedded representation model
CN110942805A (en) * 2019-12-11 2020-03-31 云南大学 Insulator element prediction system based on semi-supervised deep learning
CN111860768A (en) * 2020-06-16 2020-10-30 中山大学 Method for enhancing point-edge interaction of graph neural network
CN111860768B (en) * 2020-06-16 2023-06-09 中山大学 Method for enhancing point-edge interaction of graph neural network
CN112071362A (en) * 2020-08-03 2020-12-11 西安理工大学 Detection method of protein complex fusing global and local topological structures
CN112071362B (en) * 2020-08-03 2024-04-09 西安理工大学 Method for detecting protein complex fusing global and local topological structures

Also Published As

Publication number Publication date
CN108171010B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN108171010A (en) Protein complex detection method and device based on semi-supervised network embedded model
Gao et al. Deep transfer learning for image‐based structural damage recognition
CN106095893B (en) Cross-media retrieval method
CN109389151B (en) Knowledge graph processing method and device based on semi-supervised embedded representation model
CN107895038B (en) Link prediction relation recommendation method and device
WO2013067461A2 (en) Identifying associations in data
Mall et al. Representative subsets for big data learning using k-NN graphs
CN112087420A (en) Network kill chain detection method, prediction method and system
Wei et al. Self-filtering: A noise-aware sample selection for label noise with confidence penalization
Yao et al. Denoising protein–protein interaction network via variational graph auto-encoder for protein complex detection
CN113254717A (en) Multidimensional graph network node clustering processing method, apparatus and device
Baheti et al. Federated Learning on Distributed Medical Records for Detection of Lung Nodules.
Ning et al. Conditional generative adversarial networks based on the principle of homology continuity for face aging
CN112437053A (en) Intrusion detection method and device
CN116467666A (en) Graph anomaly detection method and system based on integrated learning and active learning
Goyal et al. Benchmarks for graph embedding evaluation
KR101467707B1 (en) Method for instance-matching in knowledge base and device therefor
Hu et al. Unsupervised defect detection algorithm for printed fabrics using content-based image retrieval techniques
Rawal et al. Predicting missing values in a dataset: challenges and approaches
Tripathy et al. Uncertainty-based clustering algorithms for large data sets
CN113205124A (en) Clustering method, system and storage medium under high-dimensional real scene based on density peak value
Kumar et al. Graph Convolutional Neural Networks for Link Prediction in Social Networks
Wasim et al. Forecasting Networks Links with Laplace Characteristic and Geographical Information in Complex Networks
CN114821013B (en) Element detection method and device based on point cloud data and computer equipment
Rani et al. A Hybrid Grey Wolf-Meta Heuristic Optimization and Random Forest Classifier for Handling Imbalanced Credit Card Fraud Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220323

Address after: 510000 5548, floor 5, No. 1023, Gaopu Road, Tianhe District, Guangzhou City, Guangdong Province

Patentee after: Guangdong SUCHUANG Data Technology Co.,Ltd.

Address before: 510631 School of computer science, South China Normal University, 55 Zhongshan Avenue West, Tianhe District, Guangzhou City, Guangdong Province

Patentee before: South China Normal University

Patentee before: Guangzhou Fanping Electronic Technology Co., Ltd