CN105260626B - Full-information prediction method for the spatial conformation of protein structures - Google Patents
Full-information prediction method for the spatial conformation of protein structures
- Publication number
- CN105260626B (application CN201510623583.8A)
- Authority
- CN
- China
- Prior art keywords
- protein
- conformation
- frequency
- shape code
- potential
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention relates to a full-information prediction method for the spatial conformation of protein structures and belongs to the field of bioinformatics. For any protein sequence, the protein structure fingerprint technique is used to screen the 5AAPFSC database at high throughput and directly obtain the corresponding protein folding conformations. Each folding conformation is represented by a protein folding shape code (PFSC) letter; these folding structures cover both secondary and tertiary structure. All possible folding shape codes are aligned into an array, producing a PFSC protein spatial conformation spectrum as the prediction result. Tests on a large number of proteins with known three-dimensional structures have verified the reliability and validity of the method.
Description
Technical field
The present invention relates to a full-information prediction method for the spatial conformation of protein structures and belongs to the field of bioinformatics.
Background art
Protein structure is important information for genomics, bioinformatics, drug research and development, and biotechnology research [1,2]. To date, however, the three-dimensional structures of less than about 1% of proteins have been determined by experimental methods such as X-ray crystallography or nuclear magnetic resonance [3]. More than about 5,002,000,000 protein sequences still lack three-dimensional structural information and data [4], and biomedical research urgently needs a way to determine the spatial structures of these proteins. Over a long period, many methods and applications for protein structure prediction have been developed on the basis of computer modeling. Since 1994, the biennial Critical Assessment of protein Structure Prediction (CASP) has served as an exchange platform for protein molecular bioscientists around the world [5,6]. Given the complexity of protein structure and the exponential number of possible folding patterns, the problem of predicting protein structure has been listed among the 100 major challenges of 21st-century science [7].
To date, methods for predicting protein structure fall into three broad classes. The first class is sequence-based modeling [8,9,10]. It uses known protein structures to solve unknown ones. It relies on extracting information from the degree of similarity between sequences, and the reliability of its predictions has always been in question. The second class is modeling by assembling recognized folding patterns [11,12,13,14,15]. It uses statistical methods to screen a specific protein database for correlations between folded fragments and sequences. Statistical methods do cover most folding patterns, but folding patterns of low frequency are often ignored. The third class is ab initio modeling [16,17,18]. It uses a computer to calculate, iteratively, the interactions between the amino acids and atoms in a protein until the whole conformational system settles into a relatively low energy state. It consumes a large amount of computer time and resources, and the prediction yields only one possible spatial structure for the protein. Biologists have long hoped that prediction methods would deliver reliable and undisputed protein structures. With this goal, many studies have tried to improve protein structure prediction, but progress has been far from satisfactory. The root cause is the complexity and variability of protein structure itself.
Summary of the invention
The technical problem to be solved by the invention is to provide a full-information prediction method for protein conformation (Complete Prediction for Protein Conformation, CPPC). The method uses a digital model to simplify the complexity of protein structure, and uses full-information structural data to capture the variability of protein structure. It can predict protein structure quickly and provides all possible protein spatial conformations.
The full-information prediction method of the invention is a new method for predicting protein structure, built on the protein folding shape code (Protein Folding Shape Code, PFSC) disclosed in the inventor's earlier patent ZL200880003164.2 [19]. The rigorously derived PFSC completely describes the folding shape of any fragment of five consecutive amino acids. Any folding shape of a five-amino-acid fragment in a protein can be described by one of 27 PFSC vectors, which are denoted by the 26 letters of the English alphabet plus the $ symbol. More importantly, the 27 PFSC vectors together cover a complete mathematical space. Moreover, the folding shapes of the 27 PFSC vectors are closely related to one another: each PFSC vector can be converted into another by a transition.
From a mathematical point of view, five amino acids in different orders form different arrangements. Drawing five amino acids arbitrarily, with repetition allowed, from the 20 amino acids gives 20^5 = 3,200,000 different ordered arrangements. The possible folding conformations of each arrangement are obtained from the worldwide Protein Data Bank (PDB) and expressed as protein folding shape codes (PFSC). On this basis, a database was created to collect the folding conformations of these 3,200,000 arrangements. This new database is named 5AAPFSC. In it, the folding shapes associated with each arrangement are stored in full as the corresponding PFSC codes.
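As a quick check on the count of arrangements, the following minimal Python sketch enumerates ordered five-residue arrangements lazily. The one-letter alphabet string and the function names are illustrative assumptions; they are not taken from the 5AAPFSC database itself.

```python
from itertools import islice, product

# The 20 standard amino acids in one-letter code (assumed ordering, for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pentapeptide_count(alphabet: str = AMINO_ACIDS, k: int = 5) -> int:
    """Number of ordered arrangements of k residues, repetition allowed."""
    return len(alphabet) ** k

def pentapeptides(alphabet: str = AMINO_ACIDS, k: int = 5):
    """Lazily yield every ordered k-residue arrangement as a string."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)

print(pentapeptide_count())              # 3200000
print(list(islice(pentapeptides(), 3)))  # ['AAAAA', 'AAAAC', 'AAAAD']
```

The generator avoids holding all 3,200,000 strings in memory at once; a real database would attach the observed PFSC codes to each key.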
The full-information prediction method for the spatial conformation of protein structures according to the invention comprises the following steps:
1) Draw five amino acids arbitrarily from the 20 amino acids to form a total of 3,200,000 different arrangements. Obtain the possible folding conformations of each arrangement from the worldwide Protein Data Bank (PDB) and express them as protein folding shape codes (PFSC). Create a database, named 5AAPFSC, that collects these arrangements and their corresponding protein folding shape codes, as shown in Fig. 1;
2) For any protein whose structure is to be predicted, move step by step along the sequence from the N-terminus to the C-terminus, reading each run of five consecutive amino acids in turn. The folding conformations it may adopt are obtained directly from the 5AAPFSC database and represented by PFSC characters. The character of the folding conformation code that occurs most frequently in the Protein Data Bank is placed first, the character of the second most frequent code is placed second, and so on, forming a column from top to bottom until all codes have been collected. Different runs of five consecutive amino acids have different numbers of possible folding conformations;
3) All possible folding shape codes of the protein under test form an array, called the protein folding conformation spectrum, as shown in Fig. 2, which represents all possible folding conformations along the protein sequence. For each protein sequence, all possible conformations can be obtained exactly by substituting its possible local folding conformations for one another. The total number of possible conformations is the product of the numbers of possible folding conformations of all runs of five amino acids.
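Steps 2) and 3) can be sketched in Python as a sliding-window lookup followed by a product. This is a minimal sketch under stated assumptions: `TOY_5AAPFSC` is a hypothetical stand-in for the real 5AAPFSC database, its entries are invented, and each value is assumed to list PFSC letters from most to least frequent.

```python
from math import prod

# Hypothetical stand-in for the 5AAPFSC database. Each five-residue window maps
# to its PFSC letters, ordered from most to least frequent. Entries are invented.
TOY_5AAPFSC = {
    "MKTAY": "AB",
    "KTAYI": "A",
    "TAYIA": "VAJ",
    "AYIAK": "BV",
}

def windows(sequence: str, size: int = 5) -> list:
    """Consecutive overlapping windows, read from the N-terminus to the C-terminus."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

def conformation_spectrum(sequence: str, database: dict) -> list:
    """One column per window; each column holds the ranked PFSC letters."""
    return [database[window] for window in windows(sequence)]

def total_conformations(spectrum: list) -> int:
    """Product of the number of candidate codes in every column."""
    return prod(len(column) for column in spectrum)

spectrum = conformation_spectrum("MKTAYIAK", TOY_5AAPFSC)
print(spectrum)                       # ['AB', 'A', 'VAJ', 'BV']
print(total_conformations(spectrum))  # 12
```

In this sketch a window missing from the dictionary raises `KeyError`; how the real database handles arrangements with no observed structure is not stated in the text.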
For any protein under test, the total number of possible spatial conformations is huge, but the most probable spatial conformations are obtained from the local folding conformations that occur most frequently. For example, the first spatial conformation consists of the folding shape codes with the highest frequency of occurrence. The second spatial conformation consists of the folding shape codes with the second-highest frequency; at positions that have no second-highest-frequency conformation, the highest-frequency folding shape code is used as a supplement. The third spatial conformation consists of the folding shape codes with the third-highest frequency; at positions that have no third-highest-frequency conformation, the highest-frequency folding shape code is used as a supplement. Continuing in this way yields a series of predicted conformations of relatively high probability.
Therefore, a string of protein folding shape codes composed of high-frequency conformations is a protein spatial conformation of relatively high probability. From the protein folding conformation spectrum, further local changes and substitutions can be identified and used to form additional related possible spatial conformations.
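The row-by-row construction described above, including the fallback to the highest-frequency code, might be sketched as follows. The function name and the example spectrum are hypothetical; each column is assumed to list PFSC letters from most to least frequent.

```python
def ranked_conformations(spectrum, how_many=None):
    """Build the 1st, 2nd, 3rd, ... conformation strings from a spectrum.

    Row r takes the r-th most frequent code of every column; a column with
    fewer than r + 1 codes contributes its most frequent code instead.
    """
    depth = max(len(column) for column in spectrum)
    if how_many is not None:
        depth = min(depth, how_many)
    return [
        "".join(column[rank] if rank < len(column) else column[0]
                for column in spectrum)
        for rank in range(depth)
    ]

# Invented example spectrum: four windows with 2, 1, 3 and 2 candidate codes.
example = ["AB", "A", "VAJ", "BV"]
print(ranked_conformations(example))  # ['AAVB', 'BAAV', 'AAJB']
```

Only as many rows are produced as the deepest column allows, which matches the idea that the number of high-probability conformations depends on the protein.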
The protein spatial conformation spectrum obtained by this analysis provides a full-information prediction of the spatial folding conformation of a protein structure and also reveals the small variations possible in any of its local conformations. The significance of the full-information prediction (CPPC) method is that it creates the necessary conditions for building a brand-new gene-protein structure composition database in the future. Full-information prediction of protein conformation is a new method for predicting protein structure, and it will advance protein structural genomics. The full-information prediction technique we have developed not only provides complete folding conformations for protein structure prediction, but is also of great significance for fully understanding protein structures obtained from experimental measurement.
Brief description of the drawings
Fig. 1: Structure of the 5AAPFSC database.
Fig. 2: Construction of the protein folding conformation spectrum.
Fig. 3: Comparison of the known conformations of the 2XCW protein fragment (residues 3-62) of human cytosolic 5'-nucleotidase II with the full-information prediction results. The first row of the table is the amino acid sequence of the fragment (3-62). It is followed by the folding conformations of 8 known structures, expressed as protein folding shape codes (PFSC). The lower part of the table shows 9 predicted possible spatial conformations.
Fig. 4: Predicted spatial conformations of the 32-amino-acid calcitonin of the marine elephant shark (Callorhinchus milii).
Detailed description of the embodiments
For any protein sequence, the protein structure fingerprint technique (PSFT) is used to screen the 5AAPFSC database at high throughput and directly obtain the corresponding protein folding conformations. Each folding conformation is represented by a protein folding shape code (PFSC) letter, and each letter denotes the characteristics of its own folding structure; these folding structures cover both secondary and tertiary structure. All possible folding shape codes are aligned into an array, producing a PFSC protein spatial conformation spectrum as the prediction result. Tests on a large number of proteins with known three-dimensional structures have verified the reliability and validity of the method.
Embodiment 1 takes a protein with a known three-dimensional structure as an example and compares it with the prediction results.
Human cytosolic 5'-nucleotidase II is a protein with a known three-dimensional structure, determined by X-ray crystallography. The upper half of Fig. 3 lists the spatial conformations of 8 structures of human cytosolic 5'-nucleotidase II measured by X-ray crystallography. Each three-dimensional structure can be obtained from the Protein Data Bank. Each conformation is then expressed in folding shape codes, and the conformations are aligned into an array. Each spatial conformation represents one experimentally measured structural state. The lower half of Fig. 3 lists the 9 most probable spatial conformations predicted by the method of the invention. These conformations are obtained by the steps described above. The first spatial conformation consists of the string of folding conformation codes with the highest frequency of occurrence; the second consists of the codes with the second-highest frequency, supplemented by the highest-frequency codes; the third consists of the codes with the third-highest frequency, supplemented by the highest-frequency codes; and so on, giving 9 conformations of relatively high probability. As the table shows, when the known conformation of the 60-amino-acid sequence (3-62) of the 2XCW protein fragment is taken as the reference and compared with the first row of predictions, 45 folding conformations are identical, 5 are similar, and 10 are different. Considering only the first row of predictions, the accuracy reaches about 80%.
On the other hand, molecular biologists recognize that structural data from X-ray crystallography capture only a static structural state of a protein and cannot reflect all of its possible dynamic conformations. The upper part of the table (Fig. 3) lists 8 known spatial conformations of the protein, which show the variability of its structure. Compared against these variations, the spectrum produced by full-information prediction fully covers their folding patterns. The data in the table strongly suggest that the full-information prediction technique we have developed not only provides complete folding conformations for protein structure prediction, but is also of great significance for fully understanding protein structures obtained from experimental measurement.
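The counts quoted for the 60-position comparison can be turned into fractions with a few lines of Python. This is only a worked-arithmetic sketch: the function name is hypothetical, and the text does not state how "similar" codes are weighted in the reported accuracy of about 80%.

```python
def match_fractions(identical: int, similar: int, different: int) -> dict:
    """Fractions of positions that match exactly, or match exactly or approximately."""
    total = identical + similar + different
    return {
        "identical": identical / total,
        "identical_or_similar": (identical + similar) / total,
    }

fractions = match_fractions(identical=45, similar=5, different=10)
print({name: round(value, 3) for name, value in fractions.items()})
# {'identical': 0.75, 'identical_or_similar': 0.833}
```

Strict identity gives 75% and counting similar codes as matches gives about 83%, so the quoted figure of about 80% lies between the two.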
Embodiment 2 takes a protein of unknown three-dimensional structure as an example; its three-dimensional conformations can be obtained by full-information prediction. Fig. 4 shows the predicted spatial conformations of the 32-amino-acid calcitonin polypeptide of the marine elephant shark (Callorhinchus milii). These conformations are obtained by the steps described above. The first spatial conformation consists of the string of folding conformation codes with the highest frequency of occurrence; the second consists of the codes with the second-highest frequency, supplemented by the highest-frequency codes; the third consists of the codes with the third-highest frequency, supplemented by the highest-frequency codes; and so on, giving 13 conformations of relatively high probability. The predicted protein spatial conformation spectrum of the elephant shark calcitonin consists of 13 strings of protein folding shape codes (PFSC). The spectrum is a complete prediction of the spatial conformation of this calcitonin and shows the possible variations in its local conformations.
The full-information prediction (CPPC) method of the invention has the following four features and advances.
1. Full-information prediction of protein conformation (CPPC) is based on rigorous mathematical derivation combined with protein structural features. First, the 27 PFSC protein folding shape codes completely represent a closed space, which ensures that the prediction results contain no gaps or omissions. Working with five-amino-acid fragments, a correlation is established between the 20 amino acids and the 27 PFSC protein folding shape codes. By drawing on the worldwide protein database and tying closely to protein structural features, the 5AAPFSC database was created, enumerating all possible mathematical arrangements of any five of the 20 amino acids. Compared with traditional protein tertiary structure methods, the new CPPC method, built on the correlation between these arrangements and PFSC codes, has a solid mathematical basis.
2. CPPC provides a fast route to predicting protein structure. Suppose that, at the current level of computer technology, one conformation can be computed every 10^-13 seconds. For a protein sequence of 100 amino acids, if each amino acid is allowed 10 spatial positions, a total of 10^100 spatial conformations result. Computing all of these conformations would take about 10^77 years. For a protein sequence of the same size, CPPC takes only about 30 seconds. Even for protein sequences thousands of residues long, CPPC completes the structure prediction in only about 120 seconds.
3. Using PFSC protein folding shape codes, CPPC displays all possible local folding variations along the protein sequence. These local folding variations and their combinations form an exponentially large number of spatial conformations. This wealth of information is fully presented in the full-information predicted conformation spectrum.
4. CPPC can predict the most probable protein conformations. The most probable spatial conformations are picked out of the huge number of spatial conformations according to the frequencies of occurrence of the local folding conformations.
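The brute-force estimate in feature 2 is simple arithmetic and can be reproduced as below. This is a minimal sketch: the function name, parameter names, and the 365.25-day year are assumptions introduced here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year

def enumeration_years(residues: int = 100,
                      positions_per_residue: int = 10,
                      seconds_per_conformation: float = 1e-13) -> float:
    """Years needed to visit every conformation once at a fixed rate."""
    conformations = positions_per_residue ** residues  # exact integer, 10**100 here
    return conformations * seconds_per_conformation / SECONDS_PER_YEAR

years = enumeration_years()
print(f"{years:.1e}")  # 3.2e+79
```

Under these literal assumptions the arithmetic gives roughly 3 x 10^79 years, somewhat larger than the figure quoted in the text; either way, exhaustive enumeration is far out of reach, which is the point of the comparison with a lookup-based prediction.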
References:
[1] PSI Assessment Panel. "Report of the Protein Structure Initiative Assessment Panel". Retrieved December 5, 2008.
[2] Baker, D.; Sali, A. (Oct 2001). "Protein structure prediction and structural genomics". Science 294 (5540): 93–6.
[3] Yonath, Ada. X-ray crystallography at the heart of life science. Current Opinion in Structural Biology, Volume 21, Issue 5, October 2011, Pages 622–626.
[4] Rigden, Daniel J. From Protein Structure to Function with Bioinformatics. Springer Science, 2009. ISBN 978-1-4020-9057-8.
[5] Moult J. et al. A large-scale experiment to assess protein structure prediction methods. Proteins 1995; 23.
[6] http://predictioncenter.org
[7] Editorial: So much more to know. Science 2005, 309: 78-102.
[8] Zhang Y (2008). "Progress and challenges in protein structure prediction". Curr Opin Struct Biol 18 (3): 342–8.
[9] Yi He, S. Rackovsky, Yanping Yin, and Harold A. Scheraga. Alternative approach to protein structure prediction based on sequential similarity of physical properties. PNAS, 2015, 112 (16): 5029-5032.
[10] Ashtawy, H.M.; Mahapatra, N.R. "A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction". IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 2, pp. 335-347, 2015.
[11] Bowie JU, Lüthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science 253 (5016): 164–170.
[12] JT. Huang, T. Wang, SR. Huang and X. Li. Reduced alphabet for protein folding prediction. Proteins, 2015, 83-4, 631–63
[13] Bowie JU, Lüthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science 253 (5016): 164–170.
[14] Jones DT, Taylor WR, Thornton JM (1992). "A new approach to protein fold recognition". Nature 358 (6381): 86–89.
[15] Peng, Jian; Jinbo Xu (2011). "RaptorX: exploiting structure information for protein alignment by statistical inference". Proteins. 79 Suppl 10:
[16] Pierce, Levi C.T.; Salomon-Ferrer, Romelia; Augusto F. de Oliveira, Cesar; McCammon, J. Andrew; Walker, Ross C. (2012). "Routine Access to Millisecond Time Scale Events with Accelerated Molecular Dynamics". Journal of Chemical Theory and Computation 8 (9): 2997–3002.
[17] Nugent, T.; Jones, D.T. (2012). "Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis". Proc Natl Acad Sci U S A 109 (24): E1540–7.
[18] Morcos, F.; Pagnani, A.; Lunt, B.; Bertolino, A.; Marks, DS.; Sander, C.; Zecchina, R.; Onuchic, JN. et al. (Dec 2011). "Direct-coupling analysis of residue coevolution captures native contacts across many protein families". Proc Natl Acad Sci U S A 108 (49): E1293–301.
[19] Yang J. Comprehensive description of protein structures using protein folding shape code. Proteins 2008; 71.3: 1497-1518.
Claims (4)
1. A full-information prediction method for the spatial conformation of protein structures, characterized by comprising the following steps:
1) drawing five amino acids arbitrarily from the 20 amino acids to form a total of 3,200,000 different arrangements, obtaining the potential folding conformations of each arrangement from the worldwide Protein Data Bank, and expressing them as protein folding shape codes; and creating a database, named 5AAPFSC, to collect these arrangements and their corresponding protein folding shape codes;
2) for any protein whose structure is to be predicted, moving step by step along the protein sequence from the N-terminus to the C-terminus and reading each run of five consecutive amino acids in turn, its potential folding conformations being obtained directly from the 5AAPFSC database and represented by protein folding shape code characters; the character of the folding conformation code that occurs most frequently in the Protein Data Bank being placed first, the character of the second most frequent code being placed second, and so on, forming a column from top to bottom until all codes have been collected, different runs of five consecutive amino acids having different numbers of potential folding conformations;
3) all potential folding shape codes of the protein under test forming an array, called the protein folding conformation spectrum, which represents all potential folding conformations along the protein sequence; for each protein sequence, all potential conformations being obtained exactly by substituting its potential local folding conformations for one another; the total number of potential conformations being the product of the numbers of potential folding conformations of all runs of five amino acids.
2. The full-information prediction method according to claim 1, characterized in that one predicted spatial conformation consists of the folding shape codes with the highest frequency of occurrence.
3. The full-information prediction method according to claim 1, characterized in that one predicted spatial conformation consists of the folding shape codes with the second-highest frequency of occurrence, and at positions with no second-highest-frequency conformation the highest-frequency folding shape code is used as a supplement.
4. The full-information prediction method according to claim 1, characterized in that one predicted spatial conformation consists of the folding shape codes with the third-highest frequency of occurrence, and at positions with no third-highest-frequency conformation the highest-frequency folding shape code is used as a supplement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510623583.8A CN105260626B (en) | 2015-09-25 | 2015-09-25 | Full-information prediction method for the spatial conformation of protein structures |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105260626A CN105260626A (en) | 2016-01-20 |
CN105260626B true CN105260626B (en) | 2017-11-14 |
Family ID: 55100315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510623583.8A Active CN105260626B (en) | Full-information prediction method for the spatial conformation of protein structures | 2015-09-25 | 2015-09-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105260626B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959852B (en) * | 2017-05-24 | 2021-12-24 | 北京工业大学 | Prediction method of protein-RNA (ribonucleic acid) binding module based on amino acid-nucleotide pair preference information |
CN107463793A (en) * | 2017-06-21 | 2017-12-12 | 南京迈格罗医药科技有限公司 | Antibody complementarity-determining region conformation fingerprint database |
CN107451421A (en) * | 2017-06-21 | 2017-12-08 | 南京迈格罗医药科技有限公司 | Epitope conformation fingerprint database |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998004734A1 (en) * | 1996-07-25 | 1998-02-05 | Wisconsin Alumni Research Foundation | Synthetic protein folding catalysis |
CN101082944A (en) * | 2007-06-01 | 2007-12-05 | 哈尔滨工程大学 | Computer simulation method for protein folding procedure based on synthesis algorithm |
CN101647022A (en) * | 2007-01-31 | 2010-02-10 | 桑迪亚医药技术(上海)有限责任公司 | Methods, systems, algorithyms and means for describing the possible conformations of actual and theoretical proteins and for evaluating actual and theoretical proteins with respect to folding, overall |
Also Published As
Publication number | Publication date |
---|---|
CN105260626A (en) | 2016-01-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Nanni et al. | Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition | |
Camproux et al. | A hidden Markov model derived structural alphabet for proteins | |
Seffernick et al. | Hybrid methods for combined experimental and computational determination of protein structure | |
Viswanath et al. | Improving ranking of models for protein complexes with side chain modeling and atomic potentials | |
Zhao et al. | Antibody-specified B-cell epitope prediction in line with the principle of context-awareness | |
KR102213670B1 (en) | Method for prediction of drug-target interactions | |
Berjanskii et al. | Unraveling the meaning of chemical shifts in protein NMR | |
CN105260626B (en) | The full information Forecasting Methodology of protein structure space conformation | |
Qiu et al. | Atomically detailed potentials to recognize native and approximate protein structures | |
CN111627494B (en) | Protein property prediction method and device based on multidimensional features and computing equipment | |
Zhang et al. | Predicting linear B-cell epitopes by using sequence-derived structural and physicochemical features | |
Li et al. | Protein inter‐residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14 | |
Li et al. | Study of real-valued distance prediction for protein structure prediction with deep learning | |
Liu et al. | Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation | |
Huang et al. | Using random forest to classify linear B-cell epitopes based on amino acid properties and molecular features | |
CN105046106B (en) | A protein subcellular localization prediction method implemented with nearest-neighbor retrieval | |
Ghualm et al. | Identification of pathway-specific protein domain by incorporating hyperparameter optimization based on 2D convolutional neural network | |
Wang et al. | CLePAPS: fast pair alignment of protein structures based on conformational letters | |
CN113409897A (en) | Method, apparatus, device and storage medium for predicting drug-target interaction | |
Comin et al. | PROuST: a comparison method of three-dimensional structures of proteins using indexing techniques | |
CN111048145B (en) | Method, apparatus, device and storage medium for generating protein prediction model | |
Rains et al. | A Bayesian method for construction of Markov models to describe dynamics on various time-scales | |
Jing et al. | Protein inter-residue contacts prediction: methods, performances and applications | |
Wu et al. | OPUS-Dom: applying the folding-based method VECFOLD to determine protein domain boundaries | |
Sekmen et al. | Subspace modeling for classification of protein secondary structure elements from Cα trace | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2023-11-10; Address after: Room E and F, 26th Floor, No. 828-838 Zhangyang Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201208; Patentee after: McCullough Biotechnology (Shanghai) Co.,Ltd.; Address before: Building B5, No. 666 Gaoxin Avenue, Donghu Development Zone, Wuhan City, Hubei Province, 430075; Patentee before: Mccollow Pharmaceutical Technology (Wuhan) Co.,Ltd. |
TR01 | Transfer of patent right | |