CN104951669B - A distance spectrum construction method for protein structure prediction
- Publication number
- CN104951669B (application CN201510310053.8A)
- Authority
- CN
- China
- Prior art keywords
- residue
- template
- protein
- search sequence
- chain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Investigating Or Analysing Biological Materials (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between their residues at corresponding positions are also similar, so a distance spectrum can guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, central-atom dihedral angles and similar features. The fragments at each pair of positions that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. The invention offers higher prediction accuracy and lower complexity.
Description
Technical field
The present invention relates to bioinformatics and computer applications, and in particular to a distance spectrum construction method for protein structure prediction.
Background
Bioinformatics is an active research area at the intersection of the life sciences and computer science. Its results are now widely used in gene discovery and prediction, storage and management of gene data, data retrieval and mining, gene expression data analysis, protein structure prediction, prediction of gene and protein homology, and sequence analysis and alignment. The genome defines all the proteins that make up an organism, and genes define the amino acid sequences that make up proteins. Although a protein consists of a linear sequence of amino acids, it acquires its activity and biological function only after folding into a specific spatial structure. Knowing the spatial structure of a protein helps in understanding both what the protein does and how it does it, so determining protein structures is very important. At present, protein sequence databases are growing rapidly, whereas proteins of known structure remain comparatively few. Although structure determination techniques have made notable progress, determining a protein structure experimentally is still complex and costly, so far fewer protein structures have been determined than protein sequences are known. Meanwhile, with the development of DNA sequencing technology, the human genome and the genomes of more model organisms have been or will be completely sequenced, and the number of DNA sequences will grow sharply; thanks to advances in DNA sequence analysis and gene identification methods, large numbers of protein sequences can be derived from DNA. As a result, the gap between the number of proteins with known sequences and the number with determined structures (for example, the entries in the Protein Data Bank, PDB) will keep widening. It is therefore desirable for the rate at which protein structures are produced to keep pace with the rate at which protein sequences are produced, or at least for the gap between the two to narrow.
Traditional methods guide the search with a physics-based force-field energy model or a knowledge-based energy model, and they suffer from low sampling efficiency, high complexity and low prediction accuracy. A method for constructing a distance spectrum for protein structure prediction is therefore introduced here to improve sampling efficiency, reduce complexity and improve prediction accuracy.
Summary of the invention
To overcome the low sampling efficiency, high complexity and low prediction accuracy of existing conformational space optimization methods, the invention proposes a method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between their residues at corresponding positions are also similar, so a distance spectrum can guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, central-atom dihedral angles and similar features. The fragments at each pair of positions that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.
The technical solution adopted by the invention to solve the technical problem is as follows:
A distance spectrum construction method for protein structure prediction, the construction process comprising the following steps:
1) Build a non-redundant template library:
1.1) download from the Protein Data Bank website (http://www.rcsb.org) high-precision known protein sequences whose resolution is less than […];
1.2) split the downloaded protein sequences into single chains;
1.3) calculate the accumulated similarity total_identity of each chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the accumulated similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) divide all chains into groups of 1000 chains; within each group, sort the chains by accumulated similarity in descending order and, starting from the chain with the largest accumulated similarity, compare it with all other chains in turn, rejecting chains whose similarity exceeds 30%;
1.5) after all groups have been compared, enlarge the number of chains per group and repeat the similarity rejection, finally merging everything into one group;
1.6) download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the remaining amino acid chains to form the non-redundant template library;
2) Generate the fragment library:
2.1) use the PSI-BLAST software to obtain, for each residue in the query sequence, the frequency profile P_q over the 20 amino acids and, for each residue in the template, the logarithmic profile L_t over the 20 amino acids;
2.2) use the PSSpred software to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) use the EDTSurf software to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) use the ANGLOR software to obtain the dihedral angle ψ_q of each residue in the query sequence and the dihedral angle ψ_t of each residue in the template;
2.5) calculate the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index of the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) select the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) traverse the fragments at positions i and j and select fragments that come from the same template;
3.3) calculate the distance d_ij between the two fragments in the template conformation;
3.4) if d_ij ∈ (d_min, d_max), count it using a fixed distance interval as the bin width; otherwise return to step 3.2);
4) Draw the distance spectrum plot of the residue pair:
4.1) the abscissa is the distance d_ij between fragments that come from the same template, d_ij ∈ (d_min, d_max);
4.2) the ordinate is the number of fragment pairs falling into each interval.
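The redundancy removal in steps 1.3) to 1.5) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_identity` is a hypothetical stand-in for a real alignment-based identity score, the text does not say which chain of a too-similar pair is rejected (here the lower-ranked one is dropped), and doubling the group size on each pass is one possible reading of "enlarge the number of chains per group".

```python
from typing import Callable, List, Sequence

Identity = Callable[[str, str], float]


def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for an alignment-based identity score in [0, 1]."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def cull(chains: Sequence[str], identity: Identity,
         threshold: float = 0.30) -> List[str]:
    """Greedy rejection inside one group (steps 1.3 and 1.4)."""
    n = len(chains)
    # Formula (1): accumulated similarity of chain i against all N chains.
    total = [sum(identity(chains[i], chains[j]) for j in range(n))
             for i in range(n)]
    # Visit chains from the largest accumulated similarity downwards.
    order = sorted(range(n), key=lambda i: total[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Assumption: the lower-ranked chain of a too-similar pair is rejected.
        if all(identity(chains[i], chains[k]) <= threshold for k in kept):
            kept.append(i)
    return [chains[i] for i in kept]


def cull_in_groups(chains: Sequence[str], identity: Identity,
                   group_size: int = 1000,
                   threshold: float = 0.30) -> List[str]:
    """Cull group by group, enlarge the groups, then cull the merged set (step 1.5)."""
    survivors = list(chains)
    size = group_size
    while size < len(survivors):
        groups = [survivors[k:k + size] for k in range(0, len(survivors), size)]
        survivors = [c for g in groups for c in cull(g, identity, threshold)]
        size *= 2  # assumption: the group size doubles on every pass
    return cull(survivors, identity, threshold)
```

Working in groups keeps the number of pairwise comparisons per pass bounded by the group size, which is presumably why the chains are not all compared against each other at once.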
The beneficial effects of the invention are as follows. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between their residues at corresponding positions are also similar, so a distance spectrum can guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, central-atom dihedral angles and similar features. The fragments at each pair of positions that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.
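The pairing idea described above (only fragments at two positions that come from the same template contribute a distance) might look like the sketch below. The `Fragment` record, its field names and the use of a single representative coordinate per fragment are illustrative assumptions; the text does not specify which atom or point of a fragment is used for the distance.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment record: source template and one representative coordinate."""
    template_id: str
    coord: Tuple[float, float, float]


def same_template_distances(frags_i: List[Fragment],
                            frags_j: List[Fragment]) -> List[float]:
    """Distances for every pair of fragments that share a template (steps 3.2 and 3.3)."""
    by_template: Dict[str, List[Fragment]] = {}
    for fj in frags_j:
        by_template.setdefault(fj.template_id, []).append(fj)
    return [dist(fi.coord, fj.coord)
            for fi in frags_i
            for fj in by_template.get(fi.template_id, [])]


def pair_distances(library: Dict[int, List[Fragment]], i: int, j: int) -> List[float]:
    """Apply the sequence-separation rule of step 3.1 before pairing fragments."""
    if j <= i + 5:
        raise ValueError("step 3.1) requires j > i + 5")
    return same_template_distances(library.get(i, []), library.get(j, []))
```

Indexing the fragments at position j by template first avoids comparing every fragment at i with every fragment at j.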
Brief description of the drawings
Fig. 1 shows the distance spectrum result for residue 5 (E) and residue 24 (W) of 1VII.
Fig. 2 shows the distance spectrum result for residue 13 (M) and residue 18 (F) of 1VII.
Embodiment
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a distance spectrum construction method for protein structure prediction proceeds through steps 1) to 4) exactly as set out in the technical solution above: building the non-redundant template library, generating the fragment library, building the distance spectrum, and drawing the distance spectrum plot of the residue pair.
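Step 2.6) keeps the 300 highest-scoring fragments. A minimal sketch of that selection is given below; because formula (2) is not reproduced in this text, the scores are taken as already computed, and the `Candidate` tuple layout is a hypothetical choice made only for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical layout: (score f(i, j), template id, template position j).
Candidate = Tuple[float, str, int]


def top_fragments(candidates: Iterable[Candidate], n: int = 300) -> List[Candidate]:
    """Return the n highest-scoring candidates, best first (step 2.6)."""
    return heapq.nlargest(n, candidates, key=lambda c: c[0])
```

`heapq.nlargest` avoids sorting the full candidate list when only a few hundred entries are needed out of many template positions.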
Fig. 2 shows a relatively ideal residue-pair distance spectrum. This embodiment takes the protein with PDB name 1VII as an example; the distance spectrum construction method for protein structure prediction comprises the following steps:
1) Build a non-redundant template library:
1.1) download from the Protein Data Bank website 38443 high-precision known protein sequences whose resolution is less than […];
1.2) split the downloaded protein sequences into single chains, 77890 in total;
1.3) calculate the accumulated similarity total_identity of each chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the accumulated similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) divide all chains into 78 groups of 1000 chains; within each group, sort the chains by accumulated similarity in descending order and, starting from the chain with the largest accumulated similarity, compare it with all other chains in turn, rejecting chains whose similarity exceeds 30%;
1.5) after all groups have been compared, enlarge the number of chains per group and repeat the similarity rejection, finally merging everything into one group; 6568 amino acid chains are retained;
1.6) download the corresponding protein structures from the PDB website according to the PDB names of the retained amino acid chains to form the non-redundant template library;
2) Generate the fragment library:
2.1) use the PSI-BLAST software to obtain, for each residue in the query sequence, the frequency profile P_q over the 20 amino acids and, for each residue in the template, the logarithmic profile L_t over the 20 amino acids;
2.2) use the PSSpred software to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) use the EDTSurf software to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) use the ANGLOR software to obtain the dihedral angle ψ_q of each residue in the query sequence and the dihedral angle ψ_t of each residue in the template;
2.5) calculate the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index of the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) select the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) traverse the fragments at positions i and j and select fragments that come from the same template;
3.3) calculate the distance d_ij between the two fragments in the template conformation;
3.4) if d_ij ∈ (0, 9), count it using a distance interval of 0.5 as the bin width; otherwise return to step 3.2);
4) Draw the distance spectrum plot of the residue pair:
4.1) the abscissa is the distance d_ij between fragments that come from the same template, d_ij ∈ (0, 9);
4.2) the ordinate is the number of fragments in each distance interval.
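With the values used in this embodiment (distances in the open range (0, 9) and an interval of 0.5), the counting in step 3.4) and the ordinate of step 4.2) amount to an 18-bin histogram. The function below is a sketch under those values; the function name and the choice to silently skip out-of-range distances are assumptions.

```python
from typing import Iterable, List


def distance_spectrum(distances: Iterable[float],
                      d_min: float = 0.0,
                      d_max: float = 9.0,
                      width: float = 0.5) -> List[int]:
    """Count distances per interval; values outside (d_min, d_max) are skipped."""
    n_bins = round((d_max - d_min) / width)
    counts = [0] * n_bins
    for d in distances:
        if d_min < d < d_max:
            # Guard against floating-point edge cases at the upper boundary.
            index = min(int((d - d_min) // width), n_bins - 1)
            counts[index] += 1
    return counts
```

The index of the largest count, multiplied by the bin width, gives the lower edge of the most populated distance interval, which is the peak one would read off a plot such as Fig. 2.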
The above describes the good results shown by one embodiment of the invention. The invention is clearly not limited to this embodiment; many variations may be made in practice without departing from the essential spirit of the invention or going beyond its substantive content.
Claims (1)
1. A distance spectrum construction method for protein structure prediction, characterized in that the construction method comprises the following steps:
1) Build a non-redundant template library:
1.1) download from the Protein Data Bank high-precision known protein sequences whose resolution is less than […];
1.2) split the downloaded protein sequences into single chains;
1.3) calculate the accumulated similarity total_identity of each chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the accumulated similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) divide all chains into groups of 1000 chains; within each group, sort the chains by accumulated similarity in descending order and, starting from the chain with the largest accumulated similarity, compare it with all other chains in turn, rejecting chains whose similarity exceeds 30%;
1.5) after all groups have been compared, enlarge the number of chains per group and repeat the similarity rejection, finally merging everything into one group;
1.6) download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the remaining amino acid chains to form the non-redundant template library;
2) Generate the fragment library:
2.1) use the PSI-BLAST software to obtain, for each residue in the query sequence, the frequency profile P_q over the 20 amino acids and, for each residue in the template, the logarithmic profile L_t over the 20 amino acids;
2.2) use the PSSpred software to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) use the EDTSurf software to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) use the ANGLOR software to obtain the dihedral angle ψ_q of each residue in the query sequence and the dihedral angle ψ_t of each residue in the template;
2.5) calculate the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index of the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) select the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) traverse the fragments at positions i and j and select fragments that come from the same template;
3.3) calculate the distance d_ij between the two fragments in the template conformation;
3.4) if d_ij ∈ (d_min, d_max), count it using a fixed distance interval as the bin width; otherwise return to step 3.2);
4) Draw the distance spectrum plot of the residue pair:
4.1) the abscissa is the distance d_ij between fragments that come from the same template, d_ij ∈ (d_min, d_max);
4.2) the ordinate is the number of fragment pairs falling into each interval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510310053.8A CN104951669B (en) | 2015-06-08 | 2015-06-08 | A kind of distance spectrum construction method for protein structure prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104951669A CN104951669A (en) | 2015-09-30 |
CN104951669B true CN104951669B (en) | 2017-09-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||