CN108171010A - Protein complex detection method and device based on semi-supervised network embedding model - Google Patents
Protein complex detection method and device based on semi-supervised network embedding model
- Publication number: CN108171010A
- Application number: CN201711250342.9A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The invention discloses a protein complex detection method and device based on a semi-supervised network embedding model. The method comprises: obtaining the adjacency matrix of a protein interaction network; performing embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix; and processing the dimensionality-reduced matrix with a clustering algorithm to obtain a protein complex detection result. The device comprises a memory for storing at least one program and a processor for loading the at least one program to execute the protein complex detection method based on the semi-supervised network embedding model. By applying a dimension transformation to the adjacency matrix of the protein interaction network before handing it to a clustering algorithm, the invention improves the clustering result. The method and device of the present invention are widely applicable in the technical field of protein complex identification.
Description
Technical field
The present invention relates to the technical field of protein complex identification, and in particular to a protein complex detection method and device based on a semi-supervised network embedding model.
Background art
A protein complex is a complex graph structure formed by protein-protein interactions (PPI) and plays a vital role in biochemical processes and pharmaceutical technology. Correctly identifying the protein complexes in a PPI network is therefore extremely useful in the biomedical field. However, given the rapid growth of PPI data and the bottleneck imposed by experimental methods, only a small number of protein complexes have been identified experimentally.
To overcome the technical limitations of experimental methods in protein complex detection, computational methods are used. A PPI network can be regarded as an undirected, unweighted graph in which proteins are the vertices and their interactions are the edges. Each protein complex consists of two or more proteins and appears as a densely connected subgraph, which means that graph clustering methods can be used to find complexes.
Recently, network embedding has been widely studied and has been shown to further improve the performance of many graph clustering methods. Network embedding learns low-dimensional representations of the vertices of a network so as to capture and preserve the network structure. However, most existing network embedding methods rely heavily on the features of each vertex in the network, which makes them unsuitable for PPI networks: in a PPI network, no metadata other than the protein name is associated with each vertex. In other words, existing network embedding methods cannot fully capture the structure of a PPI network, because there is not enough data from which to compute its first-order proximity and second-order proximity.
Summary of the invention
To solve the above technical problem, a first object of the present invention is to provide a protein complex detection method based on a semi-supervised network embedding model, and a second object is to provide a protein complex detection device based on a semi-supervised network embedding model.
The first technical solution adopted by the present invention is:
A protein complex detection method based on a semi-supervised network embedding model, comprising the following steps:
obtaining the adjacency matrix of a protein interaction network;
performing embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix;
processing the dimensionality-reduced matrix with a clustering algorithm to obtain a protein complex detection result.
Further, the step of performing embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix specifically comprises:
calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network;
calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network;
saving the local structure information and the global structure information into the adjacency matrix to obtain the dimensionality-reduced matrix.
Further, the step of calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network, specifically comprises:
selecting, with a neighbor selection algorithm, the preferred neighbor set of each vertex in the protein interaction network;
assigning feature information to each vertex according to its preferred neighbor set, thereby building a feature information matrix;
calculating, from the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network;
taking the first-order proximity between every pair of vertices in the protein interaction network as the required local structure information of the protein interaction network.
Further, the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, specifically comprises:
inputting the adjacency matrix and the feature information matrix into a graph convolutional neural network for processing, thereby outputting the second-order proximity between every pair of vertices in the protein interaction network;
taking the second-order proximity between every pair of vertices in the protein interaction network as the required global structure information of the protein interaction network.
Further, the step of selecting, with a neighbor selection algorithm, the preferred neighbor set of each vertex in the protein interaction network specifically comprises:
processing the protein interaction network with the DeepWalk algorithm to obtain the DeepWalk vector of each vertex;
selecting a vertex in the protein interaction network as the target vertex;
calculating, from the DeepWalk vectors of the target vertex and of all its neighbors, the Euclidean distance between the target vertex and each of its neighbors;
calculating the arithmetic mean of the Euclidean distances between the target vertex and each of its neighbors;
taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
returning to the step of selecting a vertex in the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
Further, after the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, an optimization step is provided, the optimization step comprising:
calculating a graph Laplacian regularization loss function from the first-order proximity and the second-order proximity between every pair of vertices in the protein interaction network;
dynamically adjusting the size of the feature information matrix until the graph Laplacian regularization loss function is minimized;
taking the first-order proximity and the second-order proximity that correspond to the minimum of the graph Laplacian regularization loss function as the required local structure information and global structure information of the protein interaction network, respectively.
Further, the graph Laplacian regularization loss function is calculated as follows:
$L = L_{first} + \lambda L_{second}$
In the formula, $L$ is the graph Laplacian regularization loss function, $L_{first}$ is the supervised loss of the first-order proximity, $L_{second}$ is the supervised loss of the second-order proximity, and $\lambda$ is the balance factor between $L_{first}$ and $L_{second}$.
Further, the supervised loss of the first-order proximity is calculated as follows:
In the formula, $v_i$ and $v_j$ are a pair of vertices connected by an edge in the protein interaction network, $y_i$ is the matrix built from the DeepWalk vector of $v_i$, and $y_j$ is the matrix built from the DeepWalk vector of $v_j$;
the supervised loss of the second-order proximity is calculated as follows:
In the formula, $L_0$ is the number of convolutional layers of the graph convolutional neural network, $H^{(0)} = N \times D$,
Further, after the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, an optimization step is provided, the optimization step comprising:
dynamically adjusting α and β so that Z in the following system of equations is equal to 0 or as close to 0 as possible:
In the equations, the four deviation variables are, respectively, the negative deviation variable of the first goal, the positive deviation variable of the first goal, the negative deviation variable of the second goal, and the positive deviation variable of the second goal; X is the feature information matrix, D is the number of columns of X, P is the highest percentage of the singular values of X, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take.
The second technical solution adopted by the present invention is: a protein complex detection device based on a semi-supervised network embedding model, comprising:
a memory for storing at least one program;
a processor for loading the at least one program to execute the protein complex detection method based on the semi-supervised network embedding model described in the first technical solution.
The beneficial effects of the present invention are as follows. By embedding the protein interaction network and transforming its dimensions, the method and device improve the efficiency with which existing clustering algorithms cluster the protein interaction network and improve the clustering result, so that the protein complex detection result is more accurate. In addition, the present invention assigns features to each vertex of the protein interaction network and captures both the local structure and the global structure of the network. It therefore does not require the vertices of the protein interaction network to have features of their own, which overcomes the technical defect that clustering algorithms cannot be applied directly to a protein interaction network whose vertices have no features. The present invention is stable, and every evaluation index of its prediction results is better than that of other protein complex detection methods.
Brief description of the drawings
Fig. 1 is a flowchart of the protein complex detection method of the present invention;
Fig. 2 is a detailed flowchart of step S2;
Fig. 3 is a detailed flowchart of step S21;
Fig. 4 is a detailed flowchart of step S211;
Fig. 5 shows the comparison results on the Krogan data set;
Fig. 6 shows the comparison results on the Dip data set;
Fig. 7 shows the comparison results on the Biogrid data set;
Fig. 8 is a structural diagram of the protein complex detection device of the present invention.
Detailed description of the embodiments
Embodiment 1
The protein complex detection method based on a semi-supervised network embedding model disclosed by the present invention, as shown in Fig. 1, comprises the following steps:
S1. obtaining the adjacency matrix of a protein interaction network;
S2. performing embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix;
S3. processing the dimensionality-reduced matrix with a clustering algorithm to obtain a protein complex detection result.
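As a rough illustration of step S3 only, the sketch below clusters the rows of a small dimensionality-reduced matrix with a plain K-means loop. The matrix values, the protein names, the choice of k and the initialization from the first k rows are all assumptions made for illustration; they are not taken from the patent.

```python
def kmeans(rows, k, iterations=20):
    """Plain K-means on the rows of a matrix; returns one cluster label per row."""
    # Assumption: the first k rows serve as the initial centroids.
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical dimensionality-reduced matrix: one row per protein.
proteins = ["P1", "P2", "P3", "P4"]
reduced = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
labels = kmeans(reduced, k=2)

# Proteins sharing a label are reported as one candidate complex.
complexes = {}
for name, lab in zip(proteins, labels):
    complexes.setdefault(lab, []).append(name)
```

In this toy input, P1 and P3 fall into one cluster and P2 and P4 into the other. Any other clustering algorithm named in the text, such as COACH, could take the place of `kmeans`.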
In existing protein complex detection methods, the protein interaction network is represented as an undirected graph G=(V, E), in which the proteins are the vertices V and the interactions are the edges E, and the edges of the protein interaction network carry no weights. Protein interaction networks can be obtained from existing data sets such as Krogan, Dip and Biogrid. By graph theory, a protein interaction network corresponds to an adjacency matrix; processing the adjacency matrix with a clustering algorithm such as COACH or K-means yields a protein complex detection result, that is, an output indicating which proteins belong to the same cluster, namely the same complex. The method of the present invention instead performs embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix derived from the adjacency matrix by dimension transformation, and then applies a well-known clustering algorithm to the dimensionality-reduced matrix for protein complex detection, which improves the running efficiency of the clustering algorithm. Since the present invention performs protein complex detection on the network corresponding to the protein interactions, i.e. on a graph in the mathematical sense, the embodiments do not, unless otherwise stated, distinguish between the concepts of protein interaction, PPI, the protein interaction network, and the graph corresponding to the protein interaction network.
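The correspondence between an undirected, unweighted graph G=(V, E) and its adjacency matrix (step S1) can be sketched as follows. The protein names and the interaction list are made-up examples, and `build_adjacency` is a hypothetical helper, not a function defined by the patent or by any data set.

```python
def build_adjacency(proteins, interactions):
    """Return the symmetric 0/1 adjacency matrix of an undirected, unweighted graph."""
    index = {name: i for i, name in enumerate(proteins)}
    n = len(proteins)
    adjacency = [[0] * n for _ in range(n)]
    for a, b in interactions:
        i, j = index[a], index[b]
        if i != j:  # assumption: self-interactions are ignored in this sketch
            adjacency[i][j] = 1
            adjacency[j][i] = 1  # undirected: mirror every edge
    return adjacency


# Hypothetical toy network with four proteins and four interactions.
proteins = ["P1", "P2", "P3", "P4"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
adjacency = build_adjacency(proteins, interactions)
```

Row i of `adjacency` lists which proteins interact with protein i, and this is the matrix that the embedding step takes as input.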
As a further preferred embodiment, the step of performing embedding processing on the adjacency matrix to obtain a dimensionality-reduced matrix, i.e. step S2, as shown in Fig. 2, specifically comprises:
S21. calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network;
S22. calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network;
S23. saving the local structure information and the global structure information into the adjacency matrix to obtain the dimensionality-reduced matrix.
The first-order proximity describes the pairwise similarity between vertices. For any pair of vertices $v_i$ and $v_j$ in the protein interaction network, if there is an edge between $v_i$ and $v_j$, the first-order proximity between them is positive; otherwise, the first-order proximity between $v_i$ and $v_j$ is 0. The first-order proximity reflects the local structure of the protein interaction network.
The second-order proximity describes the pairwise similarity between the neighborhood structures of vertices. Let $N_i$ and $N_j$ denote the neighbor sets of $v_i$ and $v_j$; the second-order proximity is then determined by the similarity of $N_i$ and $N_j$. If two vertices share many common neighbors, the second-order proximity between them is high. The second-order proximity has been shown to be a good measure of the similarity of a pair of vertices even when no edge connects them, so it can greatly enrich the relationships between vertices. The second-order proximity reflects the global structure of the protein interaction network.
The concepts of first-order proximity and second-order proximity were first proposed in the LINE model. Let u be a vertex of the graph G=(V, E). The first-order proximity between u and all other vertices of G can be expressed as $N_u = \{s_{u,1}, s_{u,2}, \dots, s_{u,|V|}\}$, where $s_{i,j}$ denotes the weight of the edge between vertex i and vertex j. If no edge connects vertex i and vertex j, then $s_{i,j} = 0$; if they are connected by an edge and G is not a weighted graph, then $s_{i,j} = 1$; if G is a weighted graph, then $s_{i,j} > 0$. Similarly, the first-order proximity between a vertex v and all other vertices of G can be expressed as $N_v = \{s_{v,1}, s_{v,2}, \dots, s_{v,|V|}\}$. In this way the first-order proximity between every vertex of G and all other vertices can be calculated. The second-order proximity, taking vertices v and u as an example, can be obtained by calculating the similarity between $N_u$ and $N_v$. Calculating the first-order and second-order proximity therefore requires first obtaining the weight of every edge in the graph. The characteristic of PPI, however, is that the vertices have no features other than their protein names by which they can be distinguished; that is, each vertex lacks the features needed to assign weights to the edges.
Since the present invention performs protein complex detection on the network corresponding to the protein interactions, i.e. considers the protein interaction network as a whole, the embodiments do not, unless otherwise stated, distinguish between "the first-order proximity between every pair of vertices in the protein interaction network", "the first-order proximity of the protein interaction network" and "the first-order proximity", nor between "the second-order proximity between every pair of vertices in the protein interaction network", "the second-order proximity of the protein interaction network" and "the second-order proximity".
Once the first-order proximity and the second-order proximity have been obtained, they can be combined with the adjacency matrix; that is, the local structure information corresponding to the first-order proximity and the global structure information corresponding to the second-order proximity are saved into the adjacency matrix to obtain the dimensionality-reduced matrix. Since combining the first-order and second-order proximity with the adjacency matrix belongs to the prior art, it is not described further here.
Because each vertex in the protein interaction network has no features other than the corresponding protein name, a set of features must be assigned to each vertex in order to calculate the first-order proximity of the protein interaction network, i.e. the first-order proximity between every pair of vertices in the network. In view of the definition of a protein complex, the important neighbors of each vertex can be used as its features, because these neighbors have a higher probability of combining with it into a protein complex. The important neighbors are the subset of all neighbors of a vertex that is selected by a particular algorithm.
As a further preferred embodiment, the step of calculating the first-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the local structure information of the protein interaction network, i.e. step S21, as shown in Fig. 3, specifically comprises:
S211. selecting, with a neighbor selection algorithm, the preferred neighbor set of each vertex in the protein interaction network;
S212. assigning feature information to each vertex according to its preferred neighbor set, thereby building a feature information matrix;
S213. calculating, from the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network;
taking the first-order proximity between every pair of vertices in the protein interaction network as the required local structure information of the protein interaction network.
Every vertex in the protein interaction network has a preferred neighbor set, although the preferred neighbor set of some vertices may be empty. For a vertex in the protein interaction network, its preferred neighbor set is the set of qualifying neighbors selected from all of its neighbors. The preferred neighbor set is used to assign feature information to the corresponding vertex. If the preferred neighbor set of vertex $v_i$ contains the vertices x, y and z, then the three vertices x, y and z are the features assigned to $v_i$. Once every vertex has been assigned features in this way, there is a basis for calculating edge weights, which are then used to calculate the first-order proximity.
Since every vertex has been assigned feature information, the feature information matrix (feature matrix) of the protein interaction network can be obtained. It is an N × D matrix, where N is the total number of vertices of the protein interaction network and D is the number of features of each vertex. Because the preferred neighbor sets of different vertices differ, the features of different vertices differ, and so does the number of features of each vertex.
For example, in a protein interaction network with N vertices, the maximum number of features that a vertex can have is N, so the maximum size of the feature information matrix of this network is N × N. If a vertex has fewer than N features, its row in the feature information matrix has fewer than N columns and can be padded to N columns with a filling algorithm; the preferred approach is to pad the row to N columns so that its rightmost elements are zero. When the feature information matrix is used, it is sometimes necessary to reduce its size, i.e. to keep the number of rows unchanged and reduce the number of columns. D can then be treated as a variable: the maximum value of D can be set to the number of features of the vertex with the most features in the protein interaction network, or directly to N, and the minimum value of D can be set to the number of features of the vertex with the fewest features in the protein interaction network. For example, when the maximum value of D is set to N, the N × D feature information matrix can be reduced to N × (D-1), N × (D-2), and so on; preferably, the matrix is reduced by deleting its rightmost columns and keeping its leftmost columns.
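The padding and column-trimming described above can be sketched as follows. The text does not state how a neighbor is encoded as a matrix entry; storing each preferred neighbor as its 1-based vertex number, so that 0 can mean "no feature", is an assumption made here, and both function names are hypothetical.

```python
def build_feature_matrix(preferred_sets, d=None):
    """Build an N x D matrix: row i lists the preferred neighbors of vertex i,
    left-aligned and padded with zeros on the right."""
    n = len(preferred_sets)
    if d is None:
        d = n  # the text allows the maximum of D to be set directly to N
    matrix = []
    for neighbors in preferred_sets:
        # Assumption: 1-based vertex numbers, so that 0 is free to mean padding.
        ids = [j + 1 for j in sorted(neighbors)]
        matrix.append((ids + [0] * d)[:d])
    return matrix


def shrink(matrix, d):
    """Reduce the number of columns to d by dropping the rightmost columns."""
    return [row[:d] for row in matrix]


# Hypothetical preferred neighbor sets (0-based indices); vertex 2 has none.
preferred_sets = [{1, 2}, {0}, set(), {2}]
features = build_feature_matrix(preferred_sets)
smaller = shrink(features, 2)
```

Here `features` is 4 × 4 and `smaller` is 4 × 2; the number of rows stays at N while D varies, as in the text.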
From the feature information matrix, the first-order proximity between every pair of vertices in the protein interaction network can be calculated. There are many ways to calculate the first-order proximity from the feature information matrix; cosine similarity is preferred. Since this belongs to the prior art, it is not described further here.
As a further preferred embodiment, the step of calculating the second-order proximity between every pair of vertices in the protein interaction network, thereby obtaining the global structure information of the protein interaction network, specifically comprises:
inputting the adjacency matrix and the feature information matrix into a graph convolutional neural network for processing, thereby outputting the second-order proximity between every pair of vertices in the protein interaction network;
taking the second-order proximity between every pair of vertices in the protein interaction network as the required global structure information of the protein interaction network.
The second-order proximity represents the degree of similarity between the neighborhood structures of a pair of vertices. To model the second-order proximity, the neighborhood of each vertex must therefore be modeled first. For a graph G=(V, E) with n vertices, the corresponding adjacency matrix M contains n rows, $m_1, m_2, \dots, m_n$. For a row $m_i = (m_{i,1}, \dots, m_{i,n})$, $m_{i,j} > 0$ if and only if $v_i$ and $v_j$ are connected by an edge. $m_i$ describes the neighborhood structure of vertex $v_i$, and M provides the neighborhood structure of every vertex. A GCN can therefore be designed on the basis of an autoencoder so as to preserve the second-order proximity of G.
A graph convolutional network (GCN) based on an autoencoder can make use of latent variables and can learn interpretable latent representations of undirected, unweighted graphs, which makes it well suited to protein interaction networks. The features of each vertex are used as part of the input data of the GCN; after encoding by l convolutional layers, the representation learned from the original graph is obtained. For the decoding part, a simple inner-product decoder can be used.
The protein interaction network is an undirected, unweighted graph G=(V, E) with N=|V| vertices. The adjacency matrix A of G and the N × D feature information matrix X are taken as input. Using random latent variables $Z_i$, an N × F output matrix Z is obtained, where F is the number of output features and D is the number of features of each vertex. The second-order proximity of the protein interaction network, i.e. the second-order proximity between every pair of vertices in the network, can then be obtained from the output of the GCN. Since the method of obtaining the second-order proximity from the output of the GCN belongs to the prior art, it is not described further here.
Since the features of each vertex are generated from its selected neighbors, the number of features differs from vertex to vertex. N is therefore set as the initial value of D, and when the feature information matrix X is built, the entries for features that a vertex does not have are set to 0. Each network layer of the graph convolutional neural network can then be written as the following nonlinear function:
$H^{(l+1)} = f(H^{(l)}, A)$,
where $H^{(0)} = X$ and $H^{(L)} = Z$.
The propagation rule is as follows:
$f(H^{(l)}, A) = \mathrm{relu}(A H^{(l)} W^{(l)})$,
where $W^{(l)}$ is the weight matrix of the l-th network layer and relu is the activation function. Note that multiplication by A sums the features of all neighbors of a vertex but does not include the vertex itself. An identity matrix I therefore needs to be added to A. The propagation rule then becomes:
$f(H^{(l)}, A) = \mathrm{relu}(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$,
where $\hat{A} = A + I$ and $\hat{D}$ is the diagonal vertex degree matrix of $\hat{A}$. If L=3, the graph convolutional neural network has three convolutional layers and reconstructs the structure of A to obtain Z. Assuming that each layer of the network retains half of the features of the previous layer, after three layers one obtains $F = D/8$
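A single graph convolutional layer can be sketched in plain Python as follows. The sketch assumes the commonly used renormalized propagation rule relu(D̂^(-1/2) Â D̂^(-1/2) H W) with Â = A + I; the weight values are made up, and `gcn_layer` and `matmul` are hypothetical helpers rather than part of any library. In a real model the weights would be learned.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, h, w):
    """One layer: relu(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so that each vertex keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    # Degrees of A + I; always positive because of the self-loops.
    degree = [sum(row) for row in a_hat]
    normalized = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    pre_activation = matmul(matmul(normalized, h), w)
    return [[max(0.0, x) for x in row] for row in pre_activation]


# Two connected vertices, D = 2 input features, and a 2 x 1 weight matrix,
# so this layer keeps half of the features of its input.
adjacency = [[0, 1], [1, 0]]
h0 = [[1.0, 0.0], [0.0, 1.0]]
w0 = [[1.0], [1.0]]  # made-up weights for illustration only
h1 = gcn_layer(adjacency, h0, w0)
```

Stacking three such layers, each with a weight matrix that halves the number of columns, maps an N × D input to an N × D/8 output.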
As a further preferred embodiment, the neighbor selection algorithm, i.e. step S211, as shown in Fig. 4, is specifically:
S2111. processing the protein interaction network with the DeepWalk algorithm to obtain the DeepWalk vector of each vertex;
S2112. selecting a vertex in the protein interaction network as the target vertex;
S2113. calculating, from the DeepWalk vectors of the target vertex and of all its neighbors, the Euclidean distance between the target vertex and each of its neighbors;
calculating the arithmetic mean of the Euclidean distances between the target vertex and each of its neighbors;
S2114. taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
S2115. returning to the step of selecting a vertex in the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
DeepWalk is a method for learning latent representations of nodes. It encodes the social relationships of nodes in a continuous vector space and is an extension of language modeling and unsupervised learning from word sequences to graphs. The method treats sequences of truncated random walks as sentences and learns from them. It is scalable and parallelizable, and can be used for network classification and anomaly detection. DeepWalk has been successfully validated on social networks and in graph analysis. By modeling a stream of short random walks, it encodes vertices in a low-dimensional continuous vector space and thereby learns latent representations.
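The walk-generation stage that DeepWalk relies on can be sketched as follows. This covers only the truncated random walks; the subsequent language-model training that turns walks into vectors is omitted. The function name, parameter names and seed are assumptions made for illustration and do not correspond to any particular DeepWalk implementation.

```python
import random


def truncated_walks(adjacency, walk_length, walks_per_vertex, seed=0):
    """Generate truncated random walks; each walk is a list of vertex indices."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    walks = []
    for _ in range(walks_per_vertex):
        for start in range(len(adjacency)):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = [j for j, e in enumerate(adjacency[walk[-1]]) if e]
                if not neighbors:
                    break  # isolated vertex: the walk cannot continue
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy network with four vertices; every vertex has at least one neighbor.
adjacency = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
walks = truncated_walks(adjacency, walk_length=5, walks_per_vertex=2, seed=1)
```

Each walk plays the role of a sentence and each vertex the role of a word when the walks are handed to the language model.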
The protein interaction network is processed with DeepWalk, and as a result each vertex of the protein interaction network corresponds to a 64-dimensional vector; the Euclidean distance between any two vertices can be calculated from their 64-dimensional vectors. In the present application, the 64-dimensional vector obtained for a vertex after processing with the DeepWalk algorithm is called the DeepWalk vector of that vertex. A vertex of the protein interaction network is selected and called the target vertex. The Euclidean distance between the target vertex and each of its neighbors is calculated, and the arithmetic mean of all these Euclidean distances is then taken, i.e. the sum of the Euclidean distances between the target vertex and all of its neighbors is divided by the number of its neighbors. The Euclidean distance between the target vertex and each neighbor is then compared with the arithmetic mean: a neighbor whose Euclidean distance is greater than the arithmetic mean is included in the preferred neighbor set, and otherwise it is excluded from the preferred neighbor set. In this way, the qualifying neighbors of a given vertex of the protein interaction network are selected to form its preferred neighbor set.
The above method is executed in a loop: after the preferred neighbor set of one target vertex has been established in step S2114, the process returns to step S2112, another vertex of the protein interaction network whose preferred neighbor set has not yet been established is selected as the new target vertex, and execution continues from step S2112 until the qualifying neighbors of every vertex in the protein interaction network have been screened out in this way to form the corresponding preferred neighbor sets. Once the preferred neighbor sets are available, operations such as feature assignment can be carried out by the method disclosed above.
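The loop over steps S2112 to S2115 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the DeepWalk vectors have already been computed (step S2111) and are supplied as a plain dictionary, and the names `select_preferred_neighbors`, `adjacency` and `embeddings` are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_preferred_neighbors(adjacency, embeddings):
    """Return {vertex: preferred neighbor set} following steps S2112-S2115.

    adjacency  : dict mapping each vertex to an iterable of its neighbors
    embeddings : dict mapping each vertex to its vector (assumed precomputed,
                 e.g. the 64-dimensional DeepWalk vectors of step S2111)
    """
    preferred = {}
    for target, neighbors in adjacency.items():        # S2112, repeated by S2115
        neighbors = list(neighbors)
        if not neighbors:                               # isolated vertex: nothing to select
            preferred[target] = set()
            continue
        # S2113: distance to every neighbor, then the arithmetic mean
        dist = {n: euclidean(embeddings[target], embeddings[n]) for n in neighbors}
        mean = sum(dist.values()) / len(dist)
        # S2114: keep neighbors whose distance is greater than the mean
        preferred[target] = {n for n, d in dist.items() if d > mean}
    return preferred

# Toy network with 2-dimensional vectors instead of 64-dimensional ones.
adjacency = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
embeddings = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 2.0], "D": [3.0, 4.0]}
result = select_preferred_neighbors(adjacency, embeddings)
print(result["A"])
```

In this toy network the distances from A are 1, 2 and 5, so only D exceeds the mean. Under the strict "greater than" comparison, a vertex with a single neighbor always ends up with an empty preferred set.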
With this neighbor selection algorithm, the meaning of the feature information matrix becomes clearer: it has N rows and D columns, where N is the total number of vertices of the protein interaction network and D is the number of features of each vertex. After the DeepWalk algorithm, each vertex corresponds to a 64-dimensional vector, so each element of the feature information matrix is in essence a 64-dimensional vector.
As a further preferred embodiment, an optimization step is provided after the step of calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network. The optimization step comprises:
calculating the graph Laplacian regularization loss function from the first-order and second-order estimates between every pair of vertices in the protein interaction network;
dynamically adjusting the order of the feature information matrix until the graph Laplacian regularization loss function is minimized;
taking the first-order estimate and the second-order estimate that correspond to the minimum of the graph Laplacian regularization loss function as the local structure information and the global structure information, respectively, of the protein interaction network.
Because N is set as the initial value of D when the feature information matrix is established, the order of the feature information matrix is not necessarily the most reasonable, and the first-order and second-order estimates of the protein interaction network obtained from it are not necessarily optimal either; as a result, the dimension-reduced matrix finally passed to the clustering algorithm is not optimal. To obtain the optimal dimension-reduced matrix, the order of the feature information matrix is adjusted dynamically, which also changes the first-order and second-order estimates of the protein interaction network. When the graph Laplacian regularization loss function calculated from the first-order and second-order estimates reaches its minimum, the corresponding combination of first-order and second-order estimates is optimal; this optimal combination is taken as the local structure information and the global structure information, respectively, of the protein interaction network and is then used to obtain the dimension-reduced matrix.
As a further preferred embodiment, the graph Laplacian regularization loss function is calculated as follows:
L = L_first + λ L_second
where L is the graph Laplacian regularization loss function, L_first is the supervised loss of the first-order estimate, L_second is the supervised loss of the second-order estimate, and λ is the balance factor between L_first and L_second. λ is a parameter whose value can be chosen when the algorithm is actually run.
As a further preferred embodiment, the supervised loss of the first-order estimate is calculated as follows:
where v_i and v_j are a pair of vertices connected by an edge in the protein interaction network, y_i is the matrix built from the DeepWalk vectors of v_i, and y_j is the matrix built from the DeepWalk vectors of v_j. Preferably, the matrix y_i is constructed with the DeepWalk vectors of v_i and of all the preferred neighbors of v_i as its elements; the matrix y_j is constructed in the same way. Because the number of neighbors may differ from vertex to vertex, the orders of y_i and y_j may differ; the smaller matrix is padded with zero elements so that the two matrices have the same size and the calculation can be carried out. The following padding method is preferred: if the order of y_i is smaller than that of y_j, zero elements are added to y_i to form a new matrix whose order equals that of y_j, with y_i located in the upper-left corner of the new matrix.
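The zero-padding rule described above can be illustrated with a short sketch. It assumes matrices are stored as lists of rows; the helper names `pad_upper_left` and `pad_to_common_shape` are hypothetical and not taken from the patent.

```python
def pad_upper_left(matrix, rows, cols):
    """Embed `matrix` in the upper-left corner of a rows x cols zero matrix."""
    if rows < len(matrix) or any(cols < len(r) for r in matrix):
        raise ValueError("target shape is smaller than the input matrix")
    padded = [[0.0] * cols for _ in range(rows)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            padded[i][j] = value
    return padded

def pad_to_common_shape(a, b):
    """Zero-pad two non-empty matrices so that both have the same shape."""
    rows = max(len(a), len(b))
    cols = max(len(a[0]), len(b[0]))
    return pad_upper_left(a, rows, cols), pad_upper_left(b, rows, cols)

# y_i has one row (a vertex with no preferred neighbors), y_j has three rows.
y_i = [[1.0, 2.0]]
y_j = [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
p_i, p_j = pad_to_common_shape(y_i, y_j)
print(p_i)
```

After padding, both matrices are 3 × 2, the original entries of y_i sit in the upper-left corner, and element-wise operations between the two matrices are well defined.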
The supervised loss of the second-order estimate is calculated as follows:
where L0 is the number of convolutional layers of the graph convolutional neural network and H(0) = N × D. Here too, zero-element padding is used so that H(l+1) and H(l) have the same order.
With the above method, the combination of first-order and second-order estimates that corresponds to the minimum of the graph Laplacian regularization loss function L is optimal.
As a further preferred embodiment, an optimization step is provided after the step of calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network. The optimization step comprises:
dynamically adjusting α and β so that Z in the following equation set is equal to 0 or as close to 0 as possible:
The equation set uses a negative and a positive deviation variable for the first goal and a negative and a positive deviation variable for the second goal; X is the feature information matrix, D is the number of columns of X, P is the maximum percentage of the singular values of X, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take;
taking the first-order estimate and the second-order estimate calculated from the feature information matrix that corresponds to Z being equal to 0 or as close to 0 as possible as the local structure information and the global structure information, respectively, of the protein interaction network.
The above method is another implementation of the optimization step. Mathematically, achieving the optimization by minimizing the graph Laplacian regularization loss function is in effect a matrix dimension reduction problem. As a preferred embodiment, conventional singular value decomposition (SVD) can be used to reduce the dimension of the matrix. According to the SVD theorem, the N × D feature information matrix X can be written as U × S × V*, where U is an N × N orthogonal matrix, S is an N × D rectangular diagonal matrix, and V* is the conjugate transpose of a D × D orthogonal matrix V. The diagonal entries of S are the singular values of X. If the smallest singular values, up to a maximum percentage P of them, are set to 0, an approximation X' of X is obtained. The value of D is thereby reduced; however, because the reconstruction error of X → X' must be minimized, the value of 1 − P must be maximized. After the SVD multiplication, X' = (1 − P)X, where X is an N × D matrix, so the problem of minimizing the graph Laplacian regularization loss function to achieve the optimization can be converted into a goal programming problem, as shown in the following equation set:
Dynamically adjusting α means that α is initially, and preferably, taken as an N × N matrix, i.e. the feature information matrix itself. α is then gradually reduced in order: for example, the rightmost column is deleted to give an N × (N−1) matrix, which is substituted back into the equation set; in the next step the rightmost column is deleted again to give an N × (N−2) matrix, which is again substituted into the equation set, and so on.
In this equation set, the positive and negative deviation variables are treated as equally important, which means that each deviation variable has a weight of 1. Clearly, when Z equals 0, a Pareto optimal solution is obtained. In some cases, however, Z cannot be exactly 0, and the Z sought is then the value closest to 0 within its range. α and β are therefore updated continuously until a combination of α and β is found that makes Z equal to or close to 0. The feature information matrix corresponding to this combination is optimal, and the first-order and second-order estimates calculated from the optimal feature information matrix make the dimension-reduced matrix optimal, thereby optimizing the clustering result.
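The SVD-based reduction mentioned in this embodiment can be sketched as follows. This is a minimal illustration that assumes NumPy is available; it covers only the truncation of singular values, not the goal programming equation set, and the function name `truncate_svd` and the parameter `keep` (the number of singular values retained) are hypothetical.

```python
import numpy as np

def truncate_svd(X, keep):
    """Return the approximation X' obtained by zeroing all but the `keep`
    largest singular values of X, together with the singular values of X."""
    # Reduced SVD: U is N x k, s has k entries, Vh is k x D, with k = min(N, D).
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[keep:] = 0.0                 # singular values are sorted in descending order
    X_approx = (U * s_trunc) @ Vh        # same as U @ diag(s_trunc) @ Vh
    return X_approx, s

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))              # stand-in for an N x D feature information matrix
X_approx, s = truncate_svd(X, keep=2)

# The Frobenius reconstruction error equals the norm of the discarded singular values.
error = np.linalg.norm(X - X_approx)
expected = np.sqrt(np.sum(s[2:] ** 2))
print(error, expected)
```

The fewer singular values are discarded, the smaller the reconstruction error of X → X', which is the trade-off the text expresses through P and 1 − P.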
Embodiment 2
In this embodiment, the protein complex detection method based on the semi-supervised network embedding model described in Embodiment 1 is tested on three PPI data sets in combination with existing clustering methods. Its results are compared with those of the conventional application of the existing clustering methods and with state-of-the-art methods, to show the performance of the method of Embodiment 1. The experiments were run on a desktop computer with a dual-core i7 CPU at 4.00 GHz, 16 GB of memory and a GTX 1070 graphics card. The entire computation for the three data sets can be completed within one day. In addition, because clustering PPI data is usually a one-off process in practice, the study does not focus on improving run time or analyzing time complexity; clustering quality is more important.
Three recent PPI data sets of Saccharomyces cerevisiae are used, namely the Krogan, Dip and Biogrid data sets. The Krogan and Dip data sets are used to evaluate the performance of several clustering algorithms. The Krogan and Dip data sets have similar average degree and density, whereas the Biogrid data set has a higher average degree and density than either. Because PPI data can be represented as an undirected graph G = (V, E), the average degree can be calculated as 2|E| / |V| and the density as 2|E| / (|V| (|V| − 1)). The characteristics of the three PPI data sets are shown in Table 1.
PPI data have a high false-positive rate, estimated at about 50%. This noise interferes with clustering methods that detect protein complexes from PPI data. CYC2008 is therefore used as the reference data set. CYC2008 provides a catalogue of 408 manually curated protein complexes of Saccharomyces cerevisiae, 90% more than another popular data set, MIPS.
Table 1
Data set | Vertices | Edges | Average degree | Density
---|---|---|---|---
Krogan | 5364 | 61289 | 22.85 | 0.0043
Dip | 4972 | 17836 | 7.17 | 0.0014
Biogrid | 6242 | 255510 | 81.87 | 0.013
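The average degree and density columns of Table 1 can be checked with a few lines of code. The sketch assumes the usual definitions for an undirected graph, average degree 2|E| / |V| and density 2|E| / (|V| (|V| − 1)); the function names are illustrative.

```python
def average_degree(num_vertices, num_edges):
    """Average degree of an undirected graph: every edge adds 2 to the degree sum."""
    return 2 * num_edges / num_vertices

def density(num_vertices, num_edges):
    """Fraction of all possible undirected edges that are present."""
    return 2 * num_edges / (num_vertices * (num_vertices - 1))

# (vertices, edges) as listed in Table 1
datasets = {
    "Krogan": (5364, 61289),
    "Dip": (4972, 17836),
    "Biogrid": (6242, 255510),
}
for name, (v, e) in datasets.items():
    print(f"{name}: average degree {average_degree(v, e):.2f}, density {density(v, e):.4f}")
```

Under these definitions the computed values agree with the table to the precision shown.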
The neighborhood affinity score is used to determine whether a protein complex detected by an algorithm matches a protein complex in CYC2008. Precision, recall and F-measure are then calculated from it to evaluate the performance of the algorithm. The neighborhood affinity score NA(p, b) is defined as follows:
Here, P = (Vp, Ep) is a predicted protein complex and B = (Vb, Eb) is a reference protein complex. Precision can then be calculated as follows:
where
Recall is calculated as follows:
where
The F-measure is the harmonic mean of precision and recall, calculated as F-measure = 2 × precision × recall / (precision + recall).
ω is a threshold that determines whether a predicted protein complex is confirmed as a protein complex in the reference data set. Based on experiments, the neighborhood affinity threshold is set to 0.25, which distinguishes the performance of the model from that of the other algorithms.
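The evaluation described above can be sketched as follows. Because the formulas themselves are not reproduced in this text, the sketch assumes the definitions commonly used in the protein complex detection literature: NA(p, b) = |Vp ∩ Vb|² / (|Vp| × |Vb|), precision as the fraction of predicted complexes with NA ≥ ω against at least one reference complex, and recall as the fraction of reference complexes matched in the same way. The function names are hypothetical.

```python
def neighborhood_affinity(predicted, reference):
    """Assumed definition: NA(p, b) = |Vp & Vb|^2 / (|Vp| * |Vb|)."""
    p, b = set(predicted), set(reference)
    if not p or not b:
        return 0.0
    return len(p & b) ** 2 / (len(p) * len(b))

def evaluate(predicted, reference, omega=0.25):
    """Return (precision, recall, F-measure) at neighborhood affinity threshold omega."""
    # Predicted complexes that match at least one reference complex.
    matched_pred = sum(
        any(neighborhood_affinity(p, b) >= omega for b in reference) for p in predicted
    )
    # Reference complexes that are matched by at least one predicted complex.
    matched_ref = sum(
        any(neighborhood_affinity(p, b) >= omega for p in predicted) for b in reference
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ref / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

predicted = [{"a", "b", "c"}, {"x", "y"}]
reference = [{"a", "b", "c", "d"}, {"m", "n"}]
print(neighborhood_affinity(predicted[0], reference[0]))
print(evaluate(predicted, reference))
```

In the toy example the first predicted complex shares three proteins with the first reference complex, so it counts as a match at ω = 0.25, while the second predicted complex matches nothing.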
In addition, three further indices are used to measure the quality of protein complex clustering: fraction (Frac), maximum matching ratio (MMR) and geometric accuracy (Acc). Frac measures the fraction of matched pairs between two sets of protein complexes, i.e. pairs whose overlap score θ is greater than 0.25. Frac(θ) is calculated as follows:
Here, A and B are two protein complexes.
Acc is the geometric mean of two other measures, the clustering sensitivity (Sn) and the clustering positive predictive value (PPV). Sn and PPV are calculated as follows:
Here, n is the number of proteins in a reference protein complex, m is the number of proteins in a clustered protein complex, and the element t_ij is the number of proteins found in both complexes. Sn can be inflated by placing every protein in the same complex, and PPV can be maximized by placing each protein in its own complex, so the two measures are balanced by taking their geometric mean, Acc = sqrt(Sn × PPV).
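A sketch of the Sn, PPV and Acc calculation follows. The Sn and PPV formulas are not reproduced in this text, so the sketch assumes the definitions commonly used for these measures: Sn is the sum over reference complexes of their best overlap with any predicted complex, divided by the total size of the reference complexes, and PPV is the sum over predicted complexes of their best overlap with any reference complex, divided by the sum of all overlaps. The function name is hypothetical.

```python
import math

def clustering_accuracy(reference, predicted):
    """Return (Sn, PPV, Acc) for two lists of protein sets (assumed definitions)."""
    ref = [set(c) for c in reference]
    pred = [set(c) for c in predicted]
    # t[i][j]: number of proteins shared by reference complex i and predicted complex j
    t = [[len(r & p) for p in pred] for r in ref]
    total_ref = sum(len(r) for r in ref)
    total_overlap = sum(sum(row) for row in t)
    sn = sum(max(row) for row in t) / total_ref if total_ref and pred else 0.0
    if total_overlap:
        best_per_prediction = sum(
            max(t[i][j] for i in range(len(ref))) for j in range(len(pred))
        )
        ppv = best_per_prediction / total_overlap
    else:
        ppv = 0.0
    return sn, ppv, math.sqrt(sn * ppv)

reference = [{"a", "b", "c", "d"}, {"e", "f"}]
predicted = [{"a", "b", "c"}, {"d", "e", "f"}, {"x"}]
sn, ppv, acc = clustering_accuracy(reference, predicted)
print(sn, ppv, acc)
```

Here the overlap matrix is [[3, 1, 0], [0, 2, 0]], so both Sn and PPV equal 5/6, and so does their geometric mean.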
MMR represents the two sets of protein complexes as a bipartite graph in which the two groups of nodes stand for the reference complexes and the predicted complexes respectively, and each edge joining a reference complex to a predicted complex is weighted by their overlap score. The overlap score between two protein complexes is calculated by the overlap score equation. The value of MMR is the total weight of the subset of edges forming the maximum-weight matching, divided by the number of reference protein complexes.
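The maximum matching ratio can be illustrated on a toy example. The overlap score equation is not reproduced in this text, so the sketch assumes the commonly used form |A ∩ B|² / (|A| × |B|). The matching is found by brute force over all assignments, which is only practical for a handful of complexes; a real evaluation would use a polynomial-time assignment algorithm. All names are hypothetical.

```python
from itertools import permutations

def overlap_score(a, b):
    """Assumed overlap score: |A & B|^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    return len(a & b) ** 2 / (len(a) * len(b)) if a and b else 0.0

def maximum_matching_ratio(reference, predicted):
    """Brute-force MMR: best one-to-one matching weight / number of reference complexes."""
    if not reference:
        return 0.0
    w = [[overlap_score(r, p) for p in predicted] for r in reference]
    n_ref, n_pred = len(reference), len(predicted)
    # Column indices >= n_pred are dummy partners with weight 0, so a
    # reference complex may remain unmatched.
    size = max(n_ref, n_pred)
    best = 0.0
    for cols in permutations(range(size), n_ref):
        total = sum(w[i][j] for i, j in enumerate(cols) if j < n_pred)
        best = max(best, total)
    return best / n_ref

reference = [{"a", "b", "c", "d"}, {"e", "f"}]
predicted = [{"a", "b", "c"}, {"d", "e", "f"}, {"x"}]
mmr = maximum_matching_ratio(reference, predicted)
print(mmr)
```

In the example the best matching pairs the first reference complex with the first prediction (weight 0.75) and the second reference complex with the second prediction (weight 2/3).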
Experiments have found that, so far, COACH is the most stable and most representative clustering algorithm for PPI networks, so it is used as the clustering method for evaluating the model. The performance of the model is compared with two state-of-the-art network embedding models, DeepWalk and SDNE. To assess the robustness of the model, two different types of traditional clustering algorithm, K-means and DBSCAN, are also selected for comparison. For COACH, the three key parameters of the algorithm, i.e. density, affinity and closeness, are set to 0.7, 0.2 and 0.5 respectively; empirical analysis shows that these parameters are sufficient for stable computation with all the network embedding algorithms. For K-means and DBSCAN, only the default settings are used.
Because SDNE also requires the first-order estimate but was designed for social networks, three versions of SDNE are used: SDNE-NA, in which the vertices have no features; SDNE-ALL, in which each vertex uses all of its neighbors as features; and SDNE-SN, in which each vertex uses selected neighbors as features. SDNE-SN selects neighbors with the neighbor selection algorithm disclosed in Embodiment 1.
The test results for the Krogan, Dip and Biogrid data sets are shown in Fig. 5, Fig. 6 and Fig. 7 respectively.
The results show that the model outperforms the other models in precision, recall and F-measure on all three data sets. For the high-density Biogrid data set in particular, the F-measure achieved by the model is at least 90% higher than that of the second-ranked model. For the Dip data set, the F-measure achieved by the model is the highest, 0.528, which is about 20% higher than using COACH alone, 9.5% higher than the second-ranked COACH+SDNE-SN, and 17% higher than COACH+DeepWalk. Similar results are found on the Krogan data set. These results show that the model is better suited than the other models to complex networks with high density.
It is also found that SDNE-SN outperforms SDNE-NA and SDNE-ALL on all three data sets. Because SDNE-SN calculates the first-order estimate with the neighbor selection algorithm disclosed in Embodiment 1, this result indirectly confirms the effectiveness of the model.
The K-means and DBSCAN clustering algorithms both perform poorly in the tests. Whichever network embedding algorithm they are combined with, the experimental results are not good, which means that these two algorithms are not suitable for PPI networks.
The clustering quality of the models is compared below. Based on the test results above, only three representative models are selected for comparison, namely COACH, COACH+DeepWalk and COACH+SDNE-SN. Table 2 shows the number of protein complexes detected by the different models. According to the table, the model detects more protein complexes than the other models on all three data sets. With this quantitative basis, improving the clustering quality becomes easier.
Table 2
Data set | COACH + method of the present invention | COACH | COACH+DeepWalk | COACH+SDNE-SN
---|---|---|---|---
Krogan | 610 | 570 | 570 | 580
Dip | 808 | 748 | 750 | 840
Biogrid | 3470 | 3158 | 3160 | 3267
Tables 3, 4 and 5 compare the clustering quality for the Krogan, Dip and Biogrid data sets respectively. Table 3 shows that the model achieves better clustering quality: for MMR and Frac it is about 38% higher than the second-ranked COACH+SDNE-SN, and for Acc about 25% higher. The situation for the Dip data set is broadly similar.
For the Biogrid data set, the clustering quality of all models decreases because of the high density of the network. Nevertheless, the model is still better than the others. For example, the Acc value of the model reaches 0.69, about 25% higher than the second-ranked COACH+SDNE-SN.
Table 3
Metric | COACH + method of the present invention | COACH | COACH+DeepWalk | COACH+SDNE-SN
---|---|---|---|---
Frac | 0.61 | 0.35 | 0.4 | 0.44
Acc | 0.68 | 0.46 | 0.48 | 0.54
MMR | 0.5 | 0.19 | 0.25 | 0.36
Table 4
Metric | COACH + method of the present invention | COACH | COACH+DeepWalk | COACH+SDNE-SN
---|---|---|---|---
Frac | 0.81 | 0.61 | 0.62 | 0.64
Acc | 0.68 | 0.58 | 0.6 | 0.63
MMR | 0.75 | 0.36 | 0.4 | 0.48
Table 5
Metric | COACH + method of the present invention | COACH | COACH+DeepWalk | COACH+SDNE-SN
---|---|---|---|---
Frac | 0.35 | 0.14 | 0.2 | 0.24
Acc | 0.69 | 0.39 | 0.4 | 0.45
MMR | 0.28 | 0.05 | 0.14 | 0.22
Compared with other network embedding methods, an algorithm is designed that selects key neighbors as the features of each vertex in order to calculate its first-order estimate. In addition, a three-layer GCN is designed to deep-learn the structure of the PPI network and preserve its second-order estimate.
Extensive experiments on various PPI networks show that the model is stable and outperforms other state-of-the-art models on every index. In the future, it is planned to use recurrent neural networks to integrate data from the biomedical literature into PPI networks, in order to further improve the quality of protein complex detection.
Embodiment 3
The protein complex detection device of the present invention based on a semi-supervised network embedding model, as shown in Fig. 8, comprises:
a memory for storing at least one program;
a processor for loading the at least one program to perform the protein complex detection method based on the semi-supervised network embedding model described in Embodiments 1 and 2.
The preferred implementations of the present invention have been described above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.
Claims (10)
1. A protein complex detection method based on a semi-supervised network embedding model, characterized by comprising the following steps:
obtaining the adjacency matrix of a protein interaction network;
performing embedding processing on the adjacency matrix to obtain a dimension-reduced matrix;
processing the dimension-reduced matrix with a clustering algorithm to obtain a protein complex detection result.
2. The protein complex detection method based on a semi-supervised network embedding model according to claim 1, characterized in that the step of performing embedding processing on the adjacency matrix to obtain a dimension-reduced matrix specifically comprises:
calculating the first-order estimate between every pair of vertices in the protein interaction network to obtain the local structure information of the protein interaction network;
calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network;
saving the local structure information and the global structure information into the adjacency matrix to obtain the dimension-reduced matrix.
3. The protein complex detection method based on a semi-supervised network embedding model according to claim 2, characterized in that the step of calculating the first-order estimate between every pair of vertices in the protein interaction network to obtain the local structure information of the protein interaction network specifically comprises:
selecting the preferred neighbor set of each vertex in the protein interaction network with a neighbor selection algorithm;
assigning feature information to each vertex according to its preferred neighbor set, thereby establishing a feature information matrix;
calculating, from the feature information matrix, the first-order estimate between every pair of vertices in the protein interaction network;
taking the first-order estimate between every pair of vertices in the protein interaction network as the local structure information of the protein interaction network.
4. The protein complex detection method based on a semi-supervised network embedding model according to claim 3, characterized in that the step of calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network specifically comprises:
inputting the adjacency matrix and the feature information matrix into a graph convolutional neural network for processing, thereby outputting the second-order estimate between every pair of vertices in the protein interaction network;
taking the second-order estimate between every pair of vertices in the protein interaction network as the global structure information of the protein interaction network.
5. The protein complex detection method based on a semi-supervised network embedding model according to claim 3 or 4, characterized in that the step of selecting the preferred neighbor set of each vertex in the protein interaction network with a neighbor selection algorithm specifically comprises:
processing the protein interaction network with the DeepWalk algorithm to obtain the DeepWalk vector of each vertex;
selecting a vertex in the protein interaction network as the target vertex;
calculating, from the DeepWalk vectors of the target vertex and of all its neighbors, the Euclidean distance between the target vertex and each of its neighbors;
calculating the arithmetic mean of the Euclidean distances between the target vertex and its neighbors;
taking the set of all neighbors whose Euclidean distance to the target vertex is greater than the arithmetic mean as the preferred neighbor set of the target vertex;
returning to the step of selecting a vertex in the protein interaction network as the target vertex, until the preferred neighbor set of every vertex in the protein interaction network has been selected.
6. The protein complex detection method based on a semi-supervised network embedding model according to claim 4, characterized in that an optimization step is provided after the step of calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network, the optimization step comprising:
calculating the graph Laplacian regularization loss function from the first-order and second-order estimates between every pair of vertices in the protein interaction network;
dynamically adjusting the order of the feature information matrix until the graph Laplacian regularization loss function is minimized;
taking the first-order estimate and the second-order estimate that correspond to the minimum of the graph Laplacian regularization loss function as the local structure information and the global structure information, respectively, of the protein interaction network.
7. The protein complex detection method based on a semi-supervised network embedding model according to claim 6, characterized in that the graph Laplacian regularization loss function is calculated as follows:
L = L_first + λ L_second
where L is the graph Laplacian regularization loss function, L_first is the supervised loss of the first-order estimate, L_second is the supervised loss of the second-order estimate, and λ is the balance factor between L_first and L_second.
8. The protein complex detection method based on a semi-supervised network embedding model according to claim 7, characterized in that the supervised loss of the first-order estimate is calculated as follows:
where v_i and v_j are a pair of vertices connected by an edge in the protein interaction network, y_i is the matrix built from the DeepWalk vectors of v_i, and y_j is the matrix built from the DeepWalk vectors of v_j;
the supervised loss of the second-order estimate is calculated as follows:
where L0 is the number of convolutional layers of the graph convolutional neural network and H(0) = N × D.
9. The protein complex detection method based on a semi-supervised network embedding model according to claim 4, characterized in that an optimization step is provided after the step of calculating the second-order estimate between every pair of vertices in the protein interaction network to obtain the global structure information of the protein interaction network, the optimization step comprising:
dynamically adjusting α and β so that Z in the following equation set is equal to 0 or as close to 0 as possible:
the equation set uses a negative and a positive deviation variable for the first goal and a negative and a positive deviation variable for the second goal; X is the feature information matrix, D is the number of columns of X, P is the maximum percentage of the singular values of X, Z is the output obtained by inputting the adjacency matrix and the feature information matrix into the graph convolutional neural network for processing, α is a matrix whose number of columns equals the maximum value that D can take, and β equals the minimum value that D can take;
taking the first-order estimate and the second-order estimate calculated from the feature information matrix that corresponds to Z being equal to 0 or as close to 0 as possible as the local structure information and the global structure information, respectively, of the protein interaction network.
10. A protein complex detection device based on a semi-supervised network embedding model, characterized by comprising:
a memory for storing at least one program;
a processor for loading the at least one program to perform the protein complex detection method based on a semi-supervised network embedding model according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711250342.9A CN108171010B (en) | 2017-12-01 | 2017-12-01 | Protein complex detection method and device based on semi-supervised network embedded model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711250342.9A CN108171010B (en) | 2017-12-01 | 2017-12-01 | Protein complex detection method and device based on semi-supervised network embedded model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108171010A true CN108171010A (en) | 2018-06-15 |
CN108171010B CN108171010B (en) | 2021-09-14 |
Family
ID=62525063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711250342.9A Active CN108171010B (en) | 2017-12-01 | 2017-12-01 | Protein complex detection method and device based on semi-supervised network embedded model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171010B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932402A (en) * | 2018-06-27 | 2018-12-04 | 华中师范大学 | A kind of protein complex recognizing method |
CN109389151A (en) * | 2018-08-30 | 2019-02-26 | 华南师范大学 | A kind of knowledge mapping treating method and apparatus indicating model based on semi-supervised insertion |
CN110796133A (en) * | 2018-08-01 | 2020-02-14 | 北京京东尚科信息技术有限公司 | Method and device for identifying file area |
CN110942805A (en) * | 2019-12-11 | 2020-03-31 | 云南大学 | Insulator element prediction system based on semi-supervised deep learning |
CN111860768A (en) * | 2020-06-16 | 2020-10-30 | 中山大学 | Method for enhancing point-edge interaction of graph neural network |
CN112071362A (en) * | 2020-08-03 | 2020-12-11 | 西安理工大学 | Detection method of protein complex fusing global and local topological structures |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070192034A1 (en) * | 2001-06-21 | 2007-08-16 | Benight Albert S | Methods for representing sequence-dependent contextual information present in polymer sequence and uses thereof |
WO2013049398A2 (en) * | 2011-09-28 | 2013-04-04 | H. Lee Moffitt Cancer Center & Research Institute, Inc. | Protein-protein interaction as biomarkers |
CN103235900A (en) * | 2013-03-28 | 2013-08-07 | 中山大学 | Weight assembly clustering method for excavating protein complex |
CN105138866A (en) * | 2015-08-12 | 2015-12-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Method for identifying protein functions based on protein-protein interaction network and network topological structure features |
CN106021988A (en) * | 2016-05-26 | 2016-10-12 | 河南城建学院 | Recognition method of protein complexes |
CN105930686A (en) * | 2016-07-05 | 2016-09-07 | 四川大学 | Secondary protein structure prediction method based on deep neural network |
Non-Patent Citations (5)
Title |
---|
L. HUANG, L. LIAO AND C. H. WU: "Protein-protein interaction network inference from multiple kernels with optimization based on random walk by linear programming", 《2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)》 * |
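U013527419: "Network representation learning (DeepWalk, LINE, node2vec, SDNE)", 《HTTPS://WWW.ITDAAN.COM/BLOG/2017/07/24/CE511D9D6C68917C8A1AFABBD66C17AE.HTML》 * |
Zhu Jia, et al.: "Self-learning graph clustering for protein complex detection (in English)", 《Control Theory & Applications》 * |
Liang Huadong: "Protein function prediction and optimization based on manifold learning", 《China Master's Theses Full-text Database, Basic Sciences》 * |
梦游--: "LLE manifold embedding dimensionality reduction algorithm", 《HTTPS://BLOG.CSDN.NET/ZHOUGUANGFEI0717/ARTICLE/DETAILS/78604980》 * |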
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932402A (en) * | 2018-06-27 | 2018-12-04 | 华中师范大学 | Protein complex recognition method |
CN110796133A (en) * | 2018-08-01 | 2020-02-14 | 北京京东尚科信息技术有限公司 | Method and device for identifying file area |
US11763167B2 | 2018-08-01 | 2023-09-19 | Beijing Jingdong Shangke Information Technology Co, Ltd. | Copy area identification method and device |
CN110796133B (en) * | 2018-08-01 | 2024-05-24 | 北京京东尚科信息技术有限公司 | Text region identification method and device |
CN109389151A (en) * | 2018-08-30 | 2019-02-26 | 华南师范大学 | Knowledge graph processing method and device based on semi-supervised embedded representation model |
CN109389151B (en) * | 2018-08-30 | 2022-01-18 | 华南师范大学 | Knowledge graph processing method and device based on semi-supervised embedded representation model |
CN110942805A (en) * | 2019-12-11 | 2020-03-31 | 云南大学 | Insulator element prediction system based on semi-supervised deep learning |
CN111860768A (en) * | 2020-06-16 | 2020-10-30 | 中山大学 | Method for enhancing point-edge interaction of graph neural network |
CN111860768B (en) * | 2020-06-16 | 2023-06-09 | 中山大学 | Method for enhancing point-edge interaction of graph neural network |
CN112071362A (en) * | 2020-08-03 | 2020-12-11 | 西安理工大学 | Detection method of protein complex fusing global and local topological structures |
CN112071362B (en) * | 2020-08-03 | 2024-04-09 | 西安理工大学 | Method for detecting protein complex fusing global and local topological structures |
Also Published As
Publication number | Publication date |
---|---|
CN108171010B (en) | 2021-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108171010A (en) | Protein complex detection method and device based on semi-supervised network embedded model | |
Gao et al. | Deep transfer learning for image‐based structural damage recognition | |
CN106095893B (en) | Cross-media retrieval method | |
González et al. | Validation methods for plankton image classification systems | |
CN109389151B (en) | Knowledge graph processing method and device based on semi-supervised embedded representation model | |
CN112087420B (en) | Network kill chain detection method, prediction method and system | |
CN110347932B (en) | Cross-network user alignment method based on deep learning | |
CN107895038B (en) | Link prediction relation recommendation method and device | |
WO2013067461A2 (en) | Identifying associations in data | |
Wei et al. | Self-filtering: A noise-aware sample selection for label noise with confidence penalization | |
Boutemine et al. | Mining community structures in multidimensional networks | |
Mall et al. | Representative subsets for big data learning using k-NN graphs | |
Yao et al. | Denoising protein–protein interaction network via variational graph auto-encoder for protein complex detection | |
Baheti et al. | Federated Learning on Distributed Medical Records for Detection of Lung Nodules. | |
Ning et al. | Conditional generative adversarial networks based on the principle of homology continuity for face aging | |
CN113254717A (en) | Multidimensional graph network node clustering processing method, apparatus and device | |
CN112437053A (en) | Intrusion detection method and device | |
CN116467666A (en) | Graph anomaly detection method and system based on integrated learning and active learning | |
Goyal et al. | Benchmarks for graph embedding evaluation | |
KR101467707B1 (en) | Method for instance-matching in knowledge base and device therefor | |
Amelio et al. | A multilayer network-based approach to represent, explore and handle convolutional neural networks | |
Kumar et al. | Graph Convolutional Neural Networks for Link Prediction in Social Networks | |
CN117349494A (en) | Graph classification method, system, medium and equipment for space graph convolution neural network | |
CN117152528A (en) | Insulator state recognition method, insulator state recognition device, insulator state recognition apparatus, insulator state recognition program, and insulator state recognition program | |
CN116977271A (en) | Defect detection method, model training method, device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220323. Address after: 510000 5548, floor 5, No. 1023, Gaopu Road, Tianhe District, Guangzhou City, Guangdong Province. Patentee after: Guangdong SUCHUANG Data Technology Co.,Ltd. Address before: 510631 School of computer science, South China Normal University, 55 Zhongshan Avenue West, Tianhe District, Guangzhou City, Guangdong Province. Patentee before: SOUTH CHINA NORMAL University. Patentee before: Guangzhou Fanping Electronic Technology Co., Ltd |