CN106650309A - Prediction method and prediction device for membrane protein residue interaction relation - Google Patents
Prediction method and prediction device for membrane protein residue interaction relation
- Publication number
- CN106650309A
- Authority
- CN
- China
- Prior art keywords
- residue
- feature
- amino acid
- protein
- membrane protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
A method for predicting residue interaction relationships in membrane proteins comprises: acquiring membrane proteins with solved protein structures as a training set; extracting, from those membrane proteins, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs; training a prediction model on the extracted imbalanced-classification features with the SMOTE-Boost algorithm; and predicting, with the trained prediction model, the residue interaction relationships of membrane proteins of unknown structure. Because the prediction model is trained on imbalanced-classification features, the trained model avoids losing useful information, which improves prediction accuracy and coverage.
Description
Technical field
The invention belongs to the interdisciplinary field of data mining, machine learning, and computational biology, and in particular relates to a method and a device for predicting the residue interaction relationships of membrane proteins.
Background
Membrane proteins account for 60% of the currently known drug targets. Because membrane protein structures are difficult to determine experimentally, known membrane protein structures make up only 1% of the more than 90,000 known protein structures in the Protein Data Bank (PDB).
The existing experimental methods for determining the three-dimensional structure of a protein are mainly X-ray crystallography and NMR. These experiments are complicated to carry out, time-consuming, and expensive. These shortcomings have made the development of computational methods inevitable. The computational methods currently used for predicting three-dimensional protein structure mainly include BLAST-based search, fold recognition, and ab initio prediction. Such methods usually treat the problem as a balanced classification task and train the model on interacting and non-interacting residue pairs at a 1:1 ratio. Here, a protein is a polymer formed by linking the 20 different amino acids: the amino and carboxyl groups of neighbouring amino acids bond through dehydration, and because part of each amino acid's groups takes part in forming the peptide bond, the remaining structure is called an amino acid residue. A residue interaction relationship refers to a pair of residues that are not adjacent in the primary sequence of the protein but are close to each other in the tertiary structure.
In practice, the ratio of interacting to non-interacting residue pairs is far from 1:1, with non-interacting pairs greatly outnumbering interacting ones. Existing prediction methods therefore lose a large amount of useful information, which reduces prediction accuracy and coverage.
Summary of the invention
The object of the invention is to provide a method for predicting the residue interaction relationships of membrane proteins, so as to solve the problem that prior-art prediction methods lose a large amount of useful information, which reduces prediction accuracy and coverage.
In a first aspect, an embodiment of the invention provides a method for predicting the residue interaction relationships of membrane proteins, the method comprising:
acquiring membrane proteins with solved protein structures as a training set;
extracting, from the membrane proteins with solved protein structures, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs;
training a prediction model on the extracted imbalanced-classification features with the SMOTE-Boost algorithm to obtain a trained prediction model;
predicting, according to the trained prediction model, the residue interaction relationships of membrane proteins of unknown structure.
With reference to the first aspect, in a first possible implementation of the first aspect, in the step of extracting the imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs, the imbalanced-classification features include one or more of: position-specific scoring matrix (PSSM) features, relative distance features of residues within the α-helix, sequence interval features, residue type features, α-helix count features, and sequence length features.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, each residue in the position-specific scoring matrix (PSSM) is represented by a 20-dimensional vector, and the PSSM features include:
a sliding window of size a centered on each of residue i and residue j of the residue pair (i, j), giving 40a PSSM features per residue pair;
a sliding window of size b centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 20*b PSSM features.
With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, a residue interaction pair comprises two amino acids, and the residue type features include the 10 combinations formed by any two of acidic amino acids, basic amino acids, polar amino acids, and nonpolar amino acids.
With reference to the first aspect, in a fourth possible implementation of the first aspect, the interacting residue pairs are residue pairs located on the α-helices of the membrane protein whose CB-CB atomic distance is less than 8 angstroms.
In a second aspect, an embodiment of the invention provides a device for predicting the residue interaction relationships of membrane proteins, the device comprising:
a training set acquiring unit, configured to acquire membrane proteins with solved protein structures as a training set;
a feature extraction unit, configured to extract, from the membrane proteins with solved protein structures, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs;
a training unit, configured to train a prediction model on the extracted imbalanced-classification features with the SMOTE-Boost algorithm to obtain a trained prediction model;
a prediction unit, configured to predict, according to the trained prediction model, the residue interaction relationships of membrane proteins of unknown structure.
With reference to the second aspect, in a first possible implementation of the second aspect, in the feature extraction unit, the imbalanced-classification features include one or more of: position-specific scoring matrix (PSSM) features, relative distance features of residues within the α-helix, sequence interval features, residue type features, α-helix count features, and sequence length features.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, each residue in the position-specific scoring matrix (PSSM) is represented by a 20-dimensional vector, and the PSSM features include:
a sliding window of size a centered on each of residue i and residue j of the residue pair (i, j), giving 40a PSSM features per residue pair;
a sliding window of size b centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 20*b PSSM features.
With reference to the first possible implementation of the second aspect, in a third possible implementation of the second aspect, a residue interaction pair comprises two amino acids, and the residue type features include the 10 combinations formed by any two of acidic amino acids, basic amino acids, polar amino acids, and nonpolar amino acids.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the interacting residue pairs are residue pairs located on the α-helices of the membrane protein whose CB-CB atomic distance is less than 8 angstroms.
In the invention, membrane proteins with solved protein structures are acquired as a training set; imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs are extracted from those membrane proteins; a prediction model is trained on the extracted features with the SMOTE-Boost algorithm to obtain a trained prediction model; and the residue interaction relationships of membrane proteins of unknown structure are predicted according to the trained prediction model. Because the prediction model is trained on imbalanced-classification features, the trained model avoids losing useful information, which helps improve prediction accuracy and coverage.
Description of the drawings
Fig. 1 is a flowchart of the method for predicting the residue interaction relationships of membrane proteins provided by an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the device for predicting the residue interaction relationships of membrane proteins provided by an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The purpose of the embodiments of the invention is to provide a method for predicting the residue interaction relationships of membrane proteins. In the prior art, when predicting the residue interaction relationships of membrane proteins of unknown structure, the problem is usually treated as balanced classification, and the model is trained on interacting and non-interacting residue pairs at a 1:1 ratio. In fact, the ratio of interacting to non-interacting residue pairs is far from 1:1, so training the model at a balanced ratio loses a large amount of useful information, and the predicted residue interaction relationships have low accuracy and coverage. The invention is further described below with reference to the drawings.
Fig. 1 shows the flow of the method for predicting the residue interaction relationships of membrane proteins provided by the first embodiment of the invention, detailed as follows:
In step S101, membrane proteins with solved protein structures are acquired as a training set.
Specifically, a membrane protein with a solved protein structure should be one whose residue interaction relationships have already been determined.
In a preferred embodiment, the membrane proteins in PDBTM (Protein Data Bank of Transmembrane Proteins) that were solved before February 2012 may be used as the training set.
This choice of training set from a transmembrane protein database is only one preferred embodiment. As structure determination and identification technology develops, more and more membrane protein structures are solved and their residue interaction relationships become available, so the sample data in the training set grow richer, which in turn helps improve the accuracy of the trained prediction model.
In step S102, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs are extracted from the membrane proteins with solved protein structures.
Specifically, in the embodiments of the invention, the imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs may include one or more of: position-specific scoring matrix (PSSM) features, relative distance features of residues within the α-helix, sequence interval features, residue type features, α-helix count features, and sequence length features.
The position-specific scoring matrix (PSSM) features can be obtained by running PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool). When running PSI-BLAST, the UniRef90 database may be used, the number of iterations may be 2, and the E-value cutoff may be 1e-10 (that is, 1×10^-10).
In the embodiments of the invention, each residue in the PSSM is represented by a 20-dimensional vector giving the frequencies with which the 20 amino acids occur at the corresponding position. During feature extraction, the PSSM features fall into two classes:
a sliding window of size a centered on each of residue i and residue j of the residue pair (i, j), giving 40a PSSM features per residue pair;
a sliding window of size b centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 20*b PSSM features.
For example, in one specific embodiment:
The first class takes a sliding window of size 7 centered on each of residue i and residue j of the residue pair (i, j), giving 2 × 7 × 20 = 280 PSSM features per residue pair;
The second class takes a sliding window of size 3 centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 3 × 20 = 60 PSSM features.
The two classes together give 280 + 60 = 340 PSSM features.
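The window arithmetic above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the names `window_rows` and `pssm_pair_features` are hypothetical, window sizes are assumed odd, positions that fall outside the sequence are zero-padded, and the midpoint (i+j)/2 is rounded down; the text does not specify any of these choices.

```python
from typing import List

PSSM_WIDTH = 20  # one score per standard amino acid


def window_rows(pssm: List[List[float]], center: int, size: int) -> List[float]:
    """Flatten the `size` PSSM rows centered on `center` (size assumed odd).

    Rows that would fall outside the sequence are zero-padded (assumption).
    """
    half = size // 2
    flat: List[float] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            flat.extend(pssm[pos])
        else:
            flat.extend([0.0] * PSSM_WIDTH)
    return flat


def pssm_pair_features(pssm: List[List[float]], i: int, j: int,
                       a: int = 7, b: int = 3) -> List[float]:
    """Concatenate the windows around i, around j, and around the midpoint."""
    mid = (i + j) // 2  # floor of (i + j) / 2 (assumption)
    return (window_rows(pssm, i, a)
            + window_rows(pssm, j, a)
            + window_rows(pssm, mid, b))


# Toy PSSM for a 30-residue sequence: row r is filled with the value r.
toy_pssm = [[float(r)] * PSSM_WIDTH for r in range(30)]
features = pssm_pair_features(toy_pssm, i=5, j=20)
print(len(features))  # 2*7*20 + 3*20 = 340
```

With a = 7 and b = 3 the vector length matches the 340 features of the specific embodiment; other window sizes give 40a + 20b features.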
The relative distance feature of a residue within the α-helix is defined as follows: let p be the relative position of one residue of the pair on a helix of length l; the feature is then p/l. Because each residue pair contains two residues, the feature is extracted for each of them, giving 2 relative distance features in total.
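A small sketch of the p/l feature is given below. The function name and the convention that p is counted from 1 at the first residue of the helix are assumptions made for illustration; the text only states that the feature equals p/l.

```python
def relative_helix_position(residue_index: int, helix_start: int, helix_end: int) -> float:
    """Return p / l for a residue on a helix spanning [helix_start, helix_end] inclusive."""
    if not helix_start <= residue_index <= helix_end:
        raise ValueError("residue is not on this helix")
    length = helix_end - helix_start + 1        # l, the helix length
    position = residue_index - helix_start + 1  # p, counted from 1 (assumption)
    return position / length


# One value per residue of the pair, so two features per residue pair.
pair_feature = [
    relative_helix_position(12, helix_start=10, helix_end=29),
    relative_helix_position(55, helix_start=50, helix_end=69),
]
print(pair_feature)  # [0.15, 0.3]
```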
The sequence interval feature is determined by the positions of the residue pair in the primary sequence. For example, one specific division uses the following nine intervals:
<25, 25-50, 50-75, 75-100, 100-125, 125-150, 150-175, 175-200, and >200.
The feature can be represented by a nine-bit code, initially 000000000, in which each bit is set to 0 or 1 (1 if the pair falls in that interval, 0 otherwise). Under this division, each residue pair corresponds to exactly one of the 9 sequence interval features.
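The nine-bit code can be sketched as a one-hot encoding. The sketch assumes that the quantity being binned is the sequence separation |i - j| and that each interval includes its lower bound and excludes its upper bound (so 25 falls in 25-50 and 200 falls in the last interval); the text leaves both points open, and the helper names are hypothetical.

```python
from bisect import bisect_right
from typing import List

# 8 edges define the 9 intervals <25, 25-50, ..., 175-200, >200.
SEPARATION_EDGES = [25, 50, 75, 100, 125, 150, 175, 200]


def one_hot_bin(value: int, edges: List[int]) -> List[int]:
    """Return a one-hot list with a 1 in the interval that contains `value`."""
    code = [0] * (len(edges) + 1)
    code[bisect_right(edges, value)] = 1
    return code


def separation_code(i: int, j: int) -> List[int]:
    """One-hot sequence interval code for the residue pair (i, j)."""
    return one_hot_bin(abs(j - i), SEPARATION_EDGES)


print(separation_code(10, 40))  # separation 30 -> [0, 1, 0, 0, 0, 0, 0, 0, 0]
```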
For the residue type feature, note that proteins are built from 20 amino acids. According to the polarity of the R group, they can be divided into acidic amino acids (glutamic acid and aspartic acid), basic amino acids (lysine, arginine, and histidine), and neutral amino acids; the neutral amino acids are further divided into polar amino acids (glycine, serine, cysteine, threonine, tyrosine, asparagine, and glutamine) and nonpolar amino acids (alanine, leucine, isoleucine, phenylalanine, methionine, tryptophan, valine, and proline). With these 4 amino acid classes (acidic, basic, polar, and nonpolar), a residue interaction pair (two amino acids) can form 10 different combinations. A ten-bit binary code, initially 0000000000, with each bit set to 0 or 1, represents the combination type, giving 10 residue type features.
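The count of 10 follows from choosing an unordered pair of classes out of 4 with repetition allowed (4 same-class pairs plus 6 mixed pairs). The sketch below enumerates them and encodes a pair as a one-hot vector; the order of the ten bits and the names `CLASS_PAIRS` and `residue_type_code` are assumptions, and residues are given by their standard one-letter codes.

```python
from itertools import combinations_with_replacement
from typing import List

CLASSES = ["acidic", "basic", "polar", "nonpolar"]
MEMBERS = {
    "acidic": "DE",          # Asp, Glu
    "basic": "KRH",          # Lys, Arg, His
    "polar": "GSCTYNQ",      # Gly, Ser, Cys, Thr, Tyr, Asn, Gln
    "nonpolar": "ALIFMWVP",  # Ala, Leu, Ile, Phe, Met, Trp, Val, Pro
}
CLASS_OF = {aa: cls for cls, letters in MEMBERS.items() for aa in letters}

# Unordered class pairs, same-class pairs included: 10 combinations.
CLASS_PAIRS = list(combinations_with_replacement(CLASSES, 2))


def residue_type_code(aa_i: str, aa_j: str) -> List[int]:
    """One-hot code of the class combination; the bit order is an assumption."""
    key = tuple(sorted((CLASS_OF[aa_i], CLASS_OF[aa_j]), key=CLASSES.index))
    code = [0] * len(CLASS_PAIRS)
    code[CLASS_PAIRS.index(key)] = 1
    return code


print(len(CLASS_PAIRS))             # 10
print(residue_type_code("D", "K"))  # acidic-basic -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Sorting the two classes before the lookup makes the code independent of which residue of the pair is listed first.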
The α-helix count feature is obtained by dividing the number of α-helices in the membrane protein into intervals, for example 2-4, 5-7, 8-10, and more than 10. A four-bit binary vector, initially 0000, with each bit set to 0 or 1, represents the feature (1 if the count falls in that interval, 0 otherwise). This feature is the same for all residue pairs of a given membrane protein. Each residue pair's feature vector contains 4 features of this class.
The sequence length feature is obtained by dividing the length of the membrane protein's primary sequence into 4 intervals: <100, 100-400, 400-800, and >800. A four-bit binary vector, initially 0000, with each bit set to 0 or 1, represents the feature (1 if the length falls in that interval, 0 otherwise). This feature is the same for all residue pairs of the same membrane protein. Each residue pair's feature vector contains 4 features of this class.
In summary, the invention can use 340 PSSM features, 2 relative distance features within the α-helix, 9 sequence interval features, 10 residue type features, 4 α-helix count features, and 4 sequence length features, for a total of 369 features.
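The two protein-level features and the overall tally can be sketched as follows. The interval boundaries are assumptions: the text does not say which interval a length of exactly 100, 400, or 800 belongs to, and here each interval includes its lower bound. Helix counts below 2 are placed in the first interval. The names `one_hot_bin`, `protein_level_code`, and `FEATURE_SIZES` are hypothetical.

```python
from bisect import bisect_right
from typing import List

HELIX_COUNT_EDGES = [5, 8, 11]      # intervals 2-4, 5-7, 8-10, more than 10
SEQ_LENGTH_EDGES = [100, 400, 800]  # intervals <100, 100-400, 400-800, >800


def one_hot_bin(value: int, edges: List[int]) -> List[int]:
    """Return a one-hot list with a 1 in the interval that contains `value`."""
    code = [0] * (len(edges) + 1)
    code[bisect_right(edges, value)] = 1
    return code


def protein_level_code(n_helices: int, seq_length: int) -> List[int]:
    """Features shared by every residue pair of one membrane protein (4 + 4 bits)."""
    return one_hot_bin(n_helices, HELIX_COUNT_EDGES) + one_hot_bin(seq_length, SEQ_LENGTH_EDGES)


# Feature counts as listed in the description.
FEATURE_SIZES = {
    "pssm": 340,
    "helix_relative_distance": 2,
    "sequence_interval": 9,
    "residue_type": 10,
    "helix_count": 4,
    "sequence_length": 4,
}

print(protein_level_code(7, 350))   # [0, 1, 0, 0, 0, 1, 0, 0]
print(sum(FEATURE_SIZES.values()))  # 369
```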
In addition, in the embodiments of the invention, the ratio of interacting residue pairs to non-interacting residue pairs may be between 1:50 and 1:80; in a preferred embodiment it may be set to 1:67.
Specifically, there are several definitions of an interacting residue pair in a protein, for example definitions based on atomic van der Waals distance, on CA-CA atomic distance, and on CB-CB atomic distance. The invention adopts a widely used definition: a residue pair located on the α-helices of the membrane protein whose CB-CB atomic distance is less than 8 Å (angstroms) is defined as an interacting residue pair. CA and CB are atom types (the Cα and Cβ atoms) as named in the GROMACS molecular dynamics software.
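The distance criterion can be sketched as below. The coordinates are made-up toy values, `label_pairs` is a hypothetical helper, and the sketch labels every pair i < j without filtering for helix membership or sequence adjacency, which a real pipeline would need to handle. Glycine has no CB atom, so the caller is assumed to supply a substitute coordinate for it.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

CONTACT_CUTOFF = 8.0  # angstroms, CB-CB distance


def label_pairs(cb_coords: Sequence[Tuple[float, float, float]],
                cutoff: float = CONTACT_CUTOFF) -> Dict[Tuple[int, int], bool]:
    """Label each residue pair (i, j), i < j, as interacting if its CB-CB distance < cutoff."""
    return {
        (i, j): math.dist(cb_coords[i], cb_coords[j]) < cutoff
        for i, j in combinations(range(len(cb_coords)), 2)
    }


# Toy CB coordinates in angstroms (not real structure data).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
labels = label_pairs(toy_coords)
n_interacting = sum(labels.values())
n_non_interacting = len(labels) - n_interacting
print(n_interacting, n_non_interacting)  # 3 3
```

On real membrane proteins the two counts would be strongly unequal, which is the imbalance (about 1:50 to 1:80) that the training step has to cope with.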
In step S103, a prediction model is trained on the extracted imbalanced-classification features with the SMOTE-Boost algorithm to obtain a trained prediction model.
After the imbalanced-classification features have been extracted, they can be fed into the prediction model for training. The prediction model may be, for example, a support vector machine model.
The SMOTE-Boost training algorithm is a training method that combines the SMOTE technique with boosting. In each iteration, boosting increases the weights of the samples that were misclassified and decreases the weights of the samples that were classified correctly, so that more attention is paid to the misclassified samples. Because minority-class samples are more likely to be misclassified, this improves predictive performance on the minority class. SMOTE (Synthetic Minority Over-sampling Technique) is a method for learning from imbalanced data sets: it raises the proportion of minority-class samples by artificially synthesizing new minority samples, which reduces the excessive skew of the data. Combining SMOTE with boosting effectively avoids the overfitting that can result from giving minority samples larger weights.
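The two ingredients described above can be sketched in a few lines. This is a simplified illustration, not the training procedure of the invention: `smote` interpolates between a minority sample and one of its k nearest minority neighbours, and `reweight` applies the standard binary AdaBoost weight update as a stand-in for the boosting step (the published SMOTEBoost algorithm uses a multi-class AdaBoost variant). Function names, parameters, and the toy data are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Synthesize minority samples by interpolating toward a random near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def reweight(weights: Sequence[float], correct: Sequence[bool],
             eps: float = 1e-12) -> Tuple[List[float], float]:
    """One boosting update: raise weights of misclassified samples, lower the rest."""
    error = sum(w for w, ok in zip(weights, correct) if not ok)  # weights sum to 1
    error = min(max(error, eps), 1 - eps)
    alpha = 0.5 * math.log((1 - error) / error)  # weight of this round's learner
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_samples, n_synthetic=10, k=2)
new_weights, alpha = reweight([0.25] * 4, [True, True, True, False])
print(len(new_samples), round(new_weights[3], 3))  # 10 0.5
```

A full training loop would, in each round, call `smote` on the minority class, fit a weak learner on the augmented data, evaluate it on the original samples, and then call `reweight`; the choice of weak learner is left open here.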
In step S104, the residue interaction relationships of membrane proteins of unknown structure are predicted according to the trained prediction model.
The invention acquires membrane proteins with solved protein structures as a training set, extracts from them imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs, trains a prediction model on the extracted features with the SMOTE-Boost algorithm to obtain a trained prediction model, and predicts the residue interaction relationships of membrane proteins of unknown structure according to the trained prediction model. Because the prediction model is trained on imbalanced-classification features, the trained model avoids losing useful information, which helps improve prediction accuracy and coverage.
Fig. 2 shows a schematic structural diagram of a device for predicting the residue interaction relationships of membrane proteins provided by an embodiment of the invention, detailed as follows:
The device for predicting the residue interaction relationships of membrane proteins in the embodiment of the invention includes:
a training set acquiring unit 201, configured to acquire membrane proteins with solved protein structures as a training set;
a feature extraction unit 202, configured to extract, from the membrane proteins with solved protein structures, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs;
a training unit 203, configured to train a prediction model on the extracted imbalanced-classification features with the SMOTE-Boost algorithm to obtain a trained prediction model;
a prediction unit 204, configured to predict, according to the trained prediction model, the residue interaction relationships of membrane proteins of unknown structure.
Preferably, in the feature extraction unit, the imbalanced-classification features include one or more of: position-specific scoring matrix (PSSM) features, relative distance features of residues within the α-helix, sequence interval features, residue type features, α-helix count features, and sequence length features.
Preferably, each residue in the position-specific scoring matrix (PSSM) is represented by a 20-dimensional vector, and the PSSM features include:
a sliding window of size a centered on each of residue i and residue j of the residue pair (i, j), giving 40a PSSM features per residue pair;
a sliding window of size b centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 20*b PSSM features.
Preferably, a residue interaction pair comprises two amino acids, and the residue type features include the 10 combinations formed by any two of acidic amino acids, basic amino acids, polar amino acids, and nonpolar amino acids.
Preferably, the interacting residue pairs are residue pairs located on the α-helices of the membrane protein whose CB-CB atomic distance is less than 8 angstroms.
The prediction device shown in Fig. 2 corresponds to the prediction method of the first embodiment, and the details are not repeated here.
In the several embodiments provided by the invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiment described above is merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the invention and do not limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A method for predicting the residue interaction relationships of membrane proteins, characterized in that the method comprises:
acquiring membrane proteins with solved protein structures as a training set;
extracting, from the membrane proteins with solved protein structures, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs;
training a prediction model on the extracted imbalanced-classification features with the SMOTE-Boost algorithm to obtain a trained prediction model;
predicting, according to the trained prediction model, the residue interaction relationships of membrane proteins of unknown structure.
2. The method according to claim 1, characterized in that, in the step of extracting the imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs, the imbalanced-classification features include one or more of: position-specific scoring matrix (PSSM) features, relative distance features of residues within the α-helix, sequence interval features, residue type features, α-helix count features, and sequence length features.
3. The method according to claim 2, characterized in that each residue in the position-specific scoring matrix (PSSM) is represented by a 20-dimensional vector, and the PSSM features include:
a sliding window of size a centered on each of residue i and residue j of the residue pair (i, j), giving 40a PSSM features per residue pair;
a sliding window of size b centered on the midpoint (i+j)/2 of the residue pair (i, j), giving 20*b PSSM features.
4. The method according to claim 2, characterized in that a residue interaction pair comprises two amino acids, and the residue type features include the 10 combinations formed by any two of acidic amino acids, basic amino acids, polar amino acids, and nonpolar amino acids.
5. The method according to claim 1, characterized in that the interacting residue pairs are residue pairs located on the α-helices of the membrane protein whose CB-CB atomic distance is less than 8 angstroms.
6. A device for predicting the interaction relationships of membrane protein residues, characterized in that the device comprises:
a training set acquisition unit, configured to acquire membrane proteins with resolved protein structures as a training set;
a feature extraction unit, configured to extract, from the membrane proteins with resolved protein structures, imbalanced-classification features for distinguishing interacting residue pairs from non-interacting residue pairs;
a training unit, configured to train a prediction model on the extracted imbalanced-classification features using the smote-boost algorithm, obtaining a trained prediction model;
a prediction unit, configured to predict, according to the trained prediction model, the interaction relationships of membrane protein residues of unknown protein structure.
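The training unit relies on smote-boost, which, as generally described, adds SMOTE oversampling of the minority class to the rounds of a boosting procedure. The sketch below covers only the SMOTE step and omits the boosting loop and the weak learner. It is an illustration under assumptions (function name, parameters, Euclidean nearest neighbours), not the patented implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic minority-class samples.

    minority: list of feature vectors (at least two) of the minority
              class, here the interacting residue pairs.
    k:        number of nearest minority neighbours to choose from.
    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

In a boosting setting, the synthetic samples would be added to the training data of each round before the weak learner is fitted, so that each learner sees more examples of interacting pairs.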
7. The device according to claim 6, characterized in that, in the feature extraction unit, the imbalanced-classification features include one or more of: a position-specific scoring matrix (PSSM) feature, a residue relative distance feature, a sequence interval feature, a feature of the residue within the α helix, a residue type feature, an α-helix number feature, and a sequence length feature.
8. The device according to claim 7, characterized in that each residue in the position-specific scoring matrix (PSSM) is represented by a 20-dimensional vector, and the PSSM features include:
a sliding window of size a taken around each of residue i and residue j of a residue pair (i, j), yielding 40a PSSM features per residue pair;
a sliding window of size b taken around the midpoint (i+j)/2 of the residue pair (i, j), yielding 20*b PSSM features.
9. The device according to claim 7, characterized in that an interacting residue pair comprises two amino acids, and the residue type feature comprises the 10 combinations produced by any two of acidic amino acids, basic amino acids, polar amino acids, and nonpolar amino acids.
10. The device according to claim 6, characterized in that an interacting residue pair is a residue pair located on the α helices of the membrane protein whose CB-CB atomic distance is less than 8 angstroms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611264831.5A CN106650309A (en) | 2016-12-30 | 2016-12-30 | Prediction method and prediction device for membrane protein residue interaction relation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611264831.5A CN106650309A (en) | 2016-12-30 | 2016-12-30 | Prediction method and prediction device for membrane protein residue interaction relation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106650309A (en) | 2017-05-10 |
Family
ID=58837930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611264831.5A Pending CN106650309A (en) | 2016-12-30 | 2016-12-30 | Prediction method and prediction device for membrane protein residue interaction relation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650309A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104252581A (en) * | 2013-06-26 | 2014-12-31 | 中国科学院深圳先进技术研究院 | Method for predicting transmembrane protein residue function relationship based on SVM (support vector machine) |
CN104504299A (en) * | 2014-12-29 | 2015-04-08 | 中国科学院深圳先进技术研究院 | Method for predicting action relation between residues of membrane protein |
CN104615910A (en) * | 2014-12-30 | 2015-05-13 | 中国科学院深圳先进技术研究院 | Method for predicating helix interactive relationship of alpha transmembrane protein based on random forest |
Non-Patent Citations (5)
Title |
---|
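VANI, K. SUVARNA et al.: "SMOTE Based Protein Fold Prediction Classification", 《ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY》 * |
JIANG Bin (姜彬): "Research on feature extraction algorithms for the membrane protein classification problem", 《China Master's Theses Full-text Database, Information Science and Technology (Monthly)》 * |
ZENG Cong (曾聪): "Research on feature extraction algorithms and dataset construction techniques for membrane protein classification", 《China Master's Theses Full-text Database, Basic Sciences (Monthly)》 * |
WANG Lulin (王璐林): "Research on Boosting classification algorithms for imbalanced samples", 《China Master's Theses Full-text Database, Information Science and Technology (Monthly)》 * |
CHEN Guoqing (陈国青) et al. (eds.): "《Information Systems Research in China: Opportunities and Challenges in the Context of Emerging Technologies, 1st edition, November 2011》", 30 November 2011, Tongji University Press (同济大学出版社) * |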
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018120128A1 (en) * | 2016-12-30 | 2018-07-05 | 中国科学院深圳先进技术研究院 | Method and device for predicting interaction relationship between membrane protein residues |
CN110223730A (en) * | 2019-06-06 | 2019-09-10 | 河南师范大学 | Protein and small molecule binding site prediction method and prediction device |
CN110223730B (en) * | 2019-06-06 | 2022-09-27 | 河南师范大学 | Prediction method and prediction device for protein and small molecule binding site |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | Improving protein fold recognition by extracting fold-specific features from predicted residue–residue contacts | |
Zakeri et al. | Prediction of protein submitochondria locations based on data fusion of various features of sequences | |
Li et al. | Protein contact map prediction based on ResNet and DenseNet | |
WO2022271859A1 (en) | Methods, systems, articles of manufacture, and apparatus for decoding purchase data using an image | |
KR101809599B1 (en) | Method and Apparatus for Analyzing Relation between Drug and Protein | |
Ren et al. | Tertiary structure-based prediction of conformational B-cell epitopes through B factors | |
Minhas et al. | Multiple instance learning of Calmodulin binding sites | |
Yuan et al. | Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning | |
Bicego et al. | A bioinformatics approach to 2D shape classification | |
Yilmaz et al. | Sequence-to-sequence translation from mass spectra to peptides with a transformer model | |
White et al. | Generative models for chemical structures | |
Shao et al. | DeepSec: a deep learning framework for secreted protein discovery in human body fluids | |
CN106650309A (en) | Prediction method and prediction device for membrane protein residue interaction relation | |
Ghualm et al. | Identification of pathway-specific protein domain by incorporating hyperparameter optimization based on 2D convolutional neural network | |
Murphy et al. | Self-supervised learning of cell type specificity from immunohistochemical images | |
CN104615910A (en) | Method for predicating helix interactive relationship of alpha transmembrane protein based on random forest | |
Du et al. | Improving protein domain classification for third-generation sequencing reads using deep learning | |
CN114242159A (en) | Method for constructing antigen peptide presentation prediction model, and antigen peptide prediction method and device | |
Sun et al. | iNGNN-DTI: prediction of drug–target interaction with interpretable nested graph neural network and pretrained molecule models | |
Patel et al. | Protein secondary structure prediction using support vector machines (SVMs) | |
Ieremie et al. | Protein language models meet reduced amino acid alphabets | |
Nguyen et al. | Multimodal pretraining for unsupervised protein representation learning | |
Li et al. | Navigating the landscapes of spatial transcriptomics: How computational methods guide the way | |
Yang et al. | Many Local Pattern Texture Features: Which Is Better for Image‐Based Multilabel Human Protein Subcellular Localization Classification? | |
CN110245594A (en) | A kind of commodity recognition method for cash register system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |