CN104951669B

CN104951669B - A distance spectrum construction method for protein structure prediction

Info

Publication number: CN104951669B
Application number: CN201510310053.8A
Authority: CN
Inventors: 张贵军; 郝小虎; 俞旭锋; 周晓根; 陈凯; 徐东伟
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2015-06-08
Filing date: 2015-06-08
Publication date: 2017-09-05
Anticipated expiration: 2035-06-08
Also published as: CN104951669A

Abstract

A method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between residues at corresponding positions are also similar, so a distance spectrum can guide the search during protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for the residue at each position of the query sequence according to the sequence profile, secondary structure type, solvent accessibility, dihedral angles of the central atoms and other features of the residues in the query sequence and in the templates. The fragments at each position that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. The invention achieves higher prediction accuracy with lower complexity.

Description

A distance spectrum construction method for protein structure prediction

Technical field

The present invention relates to bioinformatics and computer applications, and in particular to a distance spectrum construction method for protein structure prediction.

Background art

Bioinformatics is a research hotspot at the intersection of life science and computer science. Its results are now widely used in gene discovery and prediction, storage and management of genetic data, data retrieval and mining, gene expression data analysis, protein structure prediction, prediction of gene and protein homology relationships, and sequence analysis and alignment. The genome defines all the proteins that make up an organism, and genes define the amino acid sequences of those proteins. Although a protein consists of a linear sequence of amino acids, it acquires its activity and biological function only after folding into a specific spatial structure. Knowing the spatial structure of a protein helps in understanding both what its function is and how it performs that function, so determining protein structures is very important. Protein sequence databases are currently growing very quickly, whereas proteins of known structure are comparatively few. Although structure determination techniques have advanced considerably, determining a protein structure experimentally is still complex and costly, so far fewer structures have been determined than sequences are known. Meanwhile, with the development of DNA sequencing technology, the human genome and the genomes of more model organisms have been or will be completely sequenced, and the number of DNA sequences will grow sharply; advances in DNA sequence analysis and gene identification allow large numbers of protein sequences to be derived from DNA. This means that the gap between the number of proteins of known sequence and the number of proteins of determined structure (for example, the entries in the Protein Data Bank, PDB) will keep widening. It is desirable for the rate at which protein structures are produced to keep up with the rate at which protein sequences are produced, or at least for the gap between the two to narrow.

Traditional methods guide the search with energy models based on physical force fields or with knowledge-based energy models, and therefore suffer from low sampling efficiency, high complexity and low prediction accuracy. A distance spectrum construction method for protein structure prediction is therefore introduced here to improve sampling efficiency, reduce complexity and improve prediction accuracy.

Summary of the invention

To overcome the low sampling efficiency, high complexity and low prediction accuracy of existing conformational space optimization methods, the present invention proposes a method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between residues at corresponding positions are also similar, so a distance spectrum can guide the search during protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for the residue at each position of the query sequence according to the sequence profile, secondary structure type, solvent accessibility, dihedral angles of the central atoms and other features of the residues in the query sequence and in the templates. The fragments at each position that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.

The technical solution adopted by the present invention to solve the technical problem is as follows:

A distance spectrum construction method for protein structure prediction, the construction process comprising the following steps:

1) Build a non-redundant template library:

1.1) Download from the Protein Data Bank website (http://www.rcsb.org) known high-precision protein sequences with a resolution of less than [value missing in source];

1.2) Split the downloaded protein sequences into single chains;

1.3) Calculate the cumulative similarity total_identity of each chain relative to the other chains:

$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$

In formula (1), N is the total number of single chains, total_identity_i is the cumulative similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;

1.4) Divide all chains into groups of 1000 chains each; within each group, sort the chains by cumulative similarity from largest to smallest and, starting from the chain with the largest cumulative similarity, compare each chain in turn with all other chains and reject chains whose similarity exceeds 30%;

1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group;

1.6) Download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the retained amino acid chains to form the non-redundant template library;
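Steps 1.3) to 1.5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the function names `total_identity` and `cull_redundant` are hypothetical, the pairwise similarities are assumed to be available as a symmetric matrix of fractions in [0, 1], and the rejection rule is read as "keep the chain with the larger cumulative similarity and drop every remaining chain that is more than 30% similar to it". The grouping into batches of 1000 chains is omitted for brevity.

```python
from typing import List, Sequence


def total_identity(identity: Sequence[Sequence[float]]) -> List[float]:
    """Formula (1): total_identity_i is the sum over j of identity_ij."""
    return [sum(row) for row in identity]


def cull_redundant(identity: Sequence[Sequence[float]],
                   threshold: float = 0.30) -> List[int]:
    """Greedy redundancy removal (assumed reading of steps 1.4 and 1.5).

    Chains are visited in order of decreasing cumulative similarity.
    Each surviving chain is kept, and every not-yet-decided chain whose
    similarity to it exceeds `threshold` is rejected.
    """
    totals = total_identity(identity)
    order = sorted(range(len(identity)), key=lambda i: totals[i], reverse=True)
    kept: List[int] = []
    decided = set()  # indices that are already kept or rejected
    for i in order:
        if i in decided:
            continue
        kept.append(i)
        decided.add(i)
        for j in order:
            if j not in decided and identity[i][j] > threshold:
                decided.add(j)  # j is redundant with respect to i
    return kept


# Toy similarity matrix for four chains (illustrative values only).
identity = [
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.1],
    [0.5, 0.2, 0.1, 1.0],
]
print(cull_redundant(identity))  # [0, 2]
```

In the toy matrix, chain 0 has the largest cumulative similarity, so it is kept and chains 1 and 3 are rejected as more than 30% similar to it; chain 2 survives because it is dissimilar to chain 0.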

2) Generate the fragment library:

2.1) Use the PSI-BLAST software to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the logarithmic profile L_t of each residue in the template over the 20 amino acids;

2.2) Use the PSSpred software to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;

2.3) Use the EDTSurf software to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;

2.4) Use the ANGLOR software to obtain the dihedral angle ψ_q of each residue in the query sequence and the dihedral angle ψ_t of each residue in the template;

2.5) Calculate the similarity score function f(i, j) of a template fragment relative to the query sequence:

In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index of the 20 amino acids; w₁, w₂, w₃, w₄, w₅ are weight parameters;

2.6) Take the 300 highest-scoring fragments to form the fragment library;
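Step 2.6) amounts to a top-N selection over scored template fragments, which can be sketched as below. Because formula (2) is not reproduced in this text, the scoring function is passed in as a caller-supplied callable; the name `top_fragments`, the `(template_id, position)` representation of a candidate fragment and the toy score in the usage lines are assumptions made purely for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# A candidate fragment is identified here by (template id, position in template).
Fragment = Tuple[str, int]


def top_fragments(query_pos: int,
                  candidates: Iterable[Fragment],
                  score: Callable[[int, Fragment], float],
                  n_keep: int = 300) -> List[Fragment]:
    """Return the n_keep candidates with the highest score for query_pos.

    `score(query_pos, fragment)` stands in for f(i, j) of formula (2);
    higher values are treated as better, as in step 2.6.
    """
    return heapq.nlargest(n_keep, candidates,
                          key=lambda frag: score(query_pos, frag))


# Toy usage: a placeholder score that prefers template positions near the query position.
def toy_score(i: int, frag: Fragment) -> float:
    return -abs(frag[1] - i)


candidates = [("t1", 3), ("t2", 10), ("t3", 4)]
print(top_fragments(4, candidates, toy_score, n_keep=2))  # [('t3', 4), ('t1', 3)]
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when every position of every template in the library is a candidate.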

3) Build the distance spectrum:

3.1) Select the residue at position i and the residue at position j of the query sequence, with j > i+5;

3.2) Traverse the fragments at positions i and j and select fragments that come from the same template;

3.3) Calculate the distance d_ij between the two fragments in the template conformation;

3.4) If d_ij falls within the range (d_min, d_max), add it to the count of the corresponding distance interval; otherwise, return to 3.2;
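Steps 3.2) to 3.4) can be sketched as a histogram over fragment pairs that share a template. This is an illustration under assumptions rather than the patented implementation: the name `distance_spectrum`, the `(template_id, position)` fragment representation, the `coords` dictionary holding one representative 3D coordinate per template residue, and the parameters `d_min`, `d_max` and `width` are all hypothetical. The caller is expected to choose positions with j > i+5 as required by step 3.1).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Fragment = Tuple[str, int]           # (template id, residue position in template)
Point = Tuple[float, float, float]   # one representative coordinate per residue


def distance_spectrum(frags_i: Sequence[Fragment],
                      frags_j: Sequence[Fragment],
                      coords: Dict[str, List[Point]],
                      d_min: float, d_max: float, width: float) -> Counter:
    """Count same-template fragment pairs per distance interval.

    Returns a Counter mapping bin index to number of pairs, where bin b
    covers distances from d_min + b*width up to d_min + (b+1)*width.
    """
    counts: Counter = Counter()
    for tmpl_a, pos_a in frags_i:
        for tmpl_b, pos_b in frags_j:
            if tmpl_a != tmpl_b:          # step 3.2: same template only
                continue
            d = math.dist(coords[tmpl_a][pos_a], coords[tmpl_a][pos_b])  # step 3.3
            if d_min < d < d_max:         # step 3.4: keep distances inside the range
                counts[int((d - d_min) // width)] += 1
    return counts


# Toy data: two templates with made-up coordinates.
coords = {
    "t1": [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)],
    "t2": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
}
frags_i = [("t1", 0), ("t2", 0)]
frags_j = [("t1", 1), ("t1", 2), ("t2", 1)]
spectrum = distance_spectrum(frags_i, frags_j, coords, 0.0, 9.0, 0.5)
print(sorted(spectrum.items()))  # [(2, 1), (10, 1)]
```

In the toy data, the pair from template t1 at distance 5 lands in bin 10, the pair from t2 at distance 1 lands in bin 2, and the t1 pair at distance 20 is discarded because it lies outside the range.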

4) Draw the distance spectrum plot for the residue pair:

4.1) The abscissa of the plot is the distance d_ij between fragments that come from the same template, d_ij ∈ (d_min, d_max);

4.2) The ordinate of the plot is the number of fragment pairs falling into each interval.
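The plot described in step 4) is a histogram with distance on the horizontal axis and pair counts on the vertical axis. As a dependency-free sketch, the function below renders such a histogram as text, one interval per line. The name `render_spectrum` and the input format (a mapping from bin index to count, plus `d_min`, `width` and `n_bins`) are assumptions for illustration; a plotting library could be substituted without changing the data.

```python
from typing import Mapping


def render_spectrum(counts: Mapping[int, int],
                    d_min: float, width: float, n_bins: int) -> str:
    """Render a distance spectrum as text.

    Each line shows the distance interval (abscissa), the number of
    fragment pairs in it (ordinate) and a bar of '#' characters.
    """
    lines = []
    for b in range(n_bins):
        lo = d_min + b * width
        hi = lo + width
        c = counts.get(b, 0)
        lines.append(f"{lo:.1f}-{hi:.1f} {c:3d} {'#' * c}".rstrip())
    return "\n".join(lines)


print(render_spectrum({1: 3}, d_min=0.0, width=0.5, n_bins=2))
# 0.0-0.5   0
# 0.5-1.0   3 ###
```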

The beneficial effects of the present invention are as follows. A protein has a specific spatial structure, similar proteins have similar spatial structures, and the distances between residues at corresponding positions are also similar, so a distance spectrum can guide the search during protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for the residue at each position of the query sequence according to the sequence profile, secondary structure type, solvent accessibility, dihedral angles of the central atoms and other features of the residues in the query sequence and in the templates. The fragments at each position that come from the same template are then traversed and the distance between the corresponding residues in the template is calculated; this distance is close to the distance between the residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.

Brief description of the drawings

Fig. 1 shows the distance spectrum experimental result for the 5th residue E and the 24th residue W of 1VII.

Fig. 2 shows the distance spectrum experimental result for the 13th residue M and the 18th residue F of 1VII.

Embodiment

The invention will be further described below in conjunction with the accompanying drawings.

Referring to Fig. 1 and Fig. 2, a distance spectrum construction method for protein structure prediction, the construction process comprising the following steps:

1) Build a non-redundant template library:

1.2) Split the downloaded protein sequences into single chains;

2) Generate the fragment library:

3) Build the distance spectrum:

3.3) Calculate the distance d_ij between the two fragments in the template conformation;

4) Draw the distance spectrum plot for the residue pair:

Fig. 2 shows a relatively good residue-pair distance spectrum. This embodiment takes the protein with PDB name 1VII as an example. A distance spectrum construction method for protein structure prediction comprises the following steps:

1) Build a non-redundant template library:

1.1) Download from the Protein Data Bank website 38443 known high-precision protein sequences with a resolution of less than [value missing in source];

1.2) Split the downloaded protein sequences into single chains, 77890 in total;

1.4) Divide all chains into 78 groups of 1000 chains each; within each group, sort the chains by cumulative similarity from largest to smallest and, starting from the chain with the largest cumulative similarity, compare each chain in turn with all other chains and reject chains whose similarity exceeds 30%;

1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group; 6568 amino acid chains are retained;

1.6) Download the corresponding protein structures from the PDB website according to the PDB names of the retained amino acid chains to form the non-redundant template library;
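The group count in step 1.4) of this embodiment follows from simple arithmetic: 77890 chains in units of 1000 give 77 full groups plus one partial group, 78 in total. The short sketch below checks this. The helper name `split_into_groups` is hypothetical, and treating the last group as a partial group of 890 chains is an assumption, since the text does not say how the remainder is handled.

```python
from typing import List


def split_into_groups(n_chains: int, group_size: int = 1000) -> List[range]:
    """Partition chain indices 0..n_chains-1 into consecutive groups."""
    return [range(start, min(start + group_size, n_chains))
            for start in range(0, n_chains, group_size)]


groups = split_into_groups(77890)
print(len(groups))      # 78
print(len(groups[-1]))  # 890 (assumed partial last group)
```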

2) Generate the fragment library:

2.1) Use the PSI-BLAST software to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the logarithmic profile L_t of each residue in the template over the 20 amino acids;

2.4) Use the ANGLOR software to obtain the dihedral angle ψ_q of each residue in the query sequence and the dihedral angle ψ_t of each residue in the template;

3) Build the distance spectrum:

3.3) Calculate the distance d_ij between the two fragments in the template conformation;

3.4) If d_ij falls within the range (0, 9), count it using a distance interval of 0.5; otherwise, return to 3.2;

4) Draw the distance spectrum plot for the residue pair:

4.1) The abscissa of the plot is the distance d_ij between fragments that come from the same template, d_ij ∈ (0, 9);

4.2) The ordinate of the plot is the number of fragment pairs in each distance interval.
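With the range (0, 9) and the interval 0.5 used in this embodiment, the horizontal axis is divided into 18 intervals. The sketch below shows the bin arithmetic; the names `bin_index`, `D_MIN`, `D_MAX`, `WIDTH` and `N_BINS` are hypothetical, and distances are assumed to be in the same length unit as the template coordinates.

```python
from typing import Optional

D_MIN, D_MAX, WIDTH = 0.0, 9.0, 0.5
N_BINS = round((D_MAX - D_MIN) / WIDTH)  # 18 intervals of width 0.5


def bin_index(d: float) -> Optional[int]:
    """Return the interval index of distance d, or None if d is outside (0, 9)."""
    if not (D_MIN < d < D_MAX):
        return None
    return int((d - D_MIN) // WIDTH)


print(N_BINS)           # 18
print(bin_index(5.0))   # 10
print(bin_index(9.0))   # None
```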

The above describes the good results shown by one embodiment of the present invention. Obviously, the present invention is not limited to the above embodiment; many variations may be made in implementing it without departing from the essential spirit of the invention or going beyond its substantive content.

Claims

1. A distance spectrum construction method for protein structure prediction, characterized in that the construction method comprises the following steps:

1) Build a non-redundant template library:

1.1) Download from the Protein Data Bank known high-precision protein sequences with a resolution of less than [value missing in source];

1.2) Split the downloaded protein sequences into single chains;

$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$

1.4) Divide all chains into groups of 1000 chains each; within each group, sort the chains by cumulative similarity from largest to smallest and, starting from the chain with the largest cumulative similarity, compare each chain in turn with all other chains and reject chains whose similarity exceeds 30%;

1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group;

1.6) Download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the retained amino acid chains to form the non-redundant template library;

2) Generate the fragment library:

2.1) Use the PSI-BLAST software to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the logarithmic profile L_t of each residue in the template over the 20 amino acids;

2.2) Use the PSSpred software to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;

2.3) Use the EDTSurf software to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;

In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index of the 20 amino acids; w₁, w₂, w₃, w₄, w₅ are weight parameters;

3) Build the distance spectrum:

3.3) Calculate the distance d_ij between the two fragments in the template conformation;

4) Draw the distance spectrum plot for the residue pair: