CN104951669B - A distance spectrum construction method for protein structure prediction

Publication number
CN104951669B
CN104951669B CN201510310053.8A CN201510310053A CN104951669B CN 104951669 B CN104951669 B CN 104951669B CN 201510310053 A CN201510310053 A CN 201510310053A CN 104951669 B CN104951669 B CN 104951669B
Authority
CN
China
Prior art keywords
residue
template
protein
search sequence
chain
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510310053.8A
Other languages
Chinese (zh)
Other versions
CN104951669A (en)
Inventor
张贵军
郝小虎
俞旭锋
周晓根
陈凯
徐东伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201510310053.8A
Publication of CN104951669A
Application granted
Publication of CN104951669B
Legal status: Active

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, and similar proteins have similar spatial structures, so the distances between residues at corresponding positions are also similar; a distance spectrum can therefore guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are first selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, dihedral angles, and so on. The fragments at each pair of positions that come from the same template are then traversed, and the distance between the corresponding residues in the template is computed; this distance is close to the distance between the same residues in the spatial conformation of the query sequence. The invention achieves higher prediction accuracy at lower complexity.

Description

A distance spectrum construction method for protein structure prediction
Technical field
The present invention relates to the fields of bioinformatics and computer applications, and more particularly to a distance spectrum construction method for protein structure prediction.
Background
Bioinformatics is a research hotspot at the intersection of the life sciences and computer science. Its results are now widely used in gene discovery and prediction, the storage and management of gene data, data retrieval and mining, gene expression data analysis, protein structure prediction, the prediction of homology relationships among genes and proteins, and sequence analysis and alignment. The genome defines all the proteins that make up an organism, and genes define the amino acid sequences that make up proteins. Although a protein consists of a linear sequence of amino acids, it acquires its activity and biological function only after folding into a specific spatial structure. Knowing the spatial structure of a protein helps not only to identify its function but also to understand how it performs that function, so determining protein structures is very important. Protein sequence databases are currently growing very quickly, yet comparatively few proteins have known structures. Although structure determination techniques have made considerable progress, determining a protein structure experimentally remains complex and costly. Consequently, far fewer protein structures have been determined than protein sequences are known. Meanwhile, as DNA sequencing technology develops, the human genome and more model-organism genomes have been or will be completely sequenced, and the number of DNA sequences will grow sharply; with advances in DNA sequence analysis and gene recognition methods, large numbers of protein sequences can be derived from DNA. The gap between the number of proteins with known sequences and the number with determined structures (such as the entries in the Protein Data Bank, PDB) will therefore keep widening. It is desirable for the rate at which protein structures are produced to keep pace with the rate at which protein sequences are produced, or at least for the gap between the two to narrow.
Traditional methods guide the search with physics-based or knowledge-based energy models, and they suffer from low sampling efficiency, high complexity, and low prediction accuracy. A distance spectrum construction method for protein structure prediction is therefore introduced here to improve sampling efficiency, reduce complexity, and improve prediction accuracy.
Summary of the invention
To overcome the low sampling efficiency, high complexity, and low prediction accuracy of existing conformational space optimization methods, the present invention proposes a method for constructing a distance spectrum for protein structure prediction. A protein has a specific spatial structure, and similar proteins have similar spatial structures, so the distances between residues at corresponding positions are also similar; a distance spectrum can therefore guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, dihedral angles, and so on. The fragments at each pair of positions that come from the same template are then traversed, and the distance between the corresponding residues in the template is computed; this distance is close to the distance between the same residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A distance spectrum construction method for protein structure prediction, the construction process comprising the following steps:
1) Build a non-redundant template library:
1.1) Download from the Protein Data Bank website (http://www.rcsb.org) the higher-precision known protein sequences whose resolution is less than [value missing];
1.2) Split the downloaded protein sequences into single chains;
1.3) Compute the cumulative similarity total_identity of every chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the cumulative similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) Divide all chains into groups of 1000 chains each; within each group, sort the chains by cumulative similarity in descending order and, starting from the chain with the largest cumulative similarity, compare each chain with all the others in turn and reject the chains whose similarity exceeds 30%;
1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group;
1.6) Download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the remaining amino acid chains, forming the non-redundant template library;
2) Generate the fragment library:
2.1) Use PSI-BLAST to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the log-odds profile L_t of each residue in the template over the 20 amino acids;
2.2) Use PSSpred to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) Use EDTSurf to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) Use ANGLOR to obtain the dihedral angles φ_q, ψ_q of each residue in the query sequence and the dihedral angles φ_t, ψ_t of each residue in the template;
2.5) Compute the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index over the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) Take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) Choose the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) Traverse the fragments at positions i and j and select the fragments that come from the same template;
3.3) Compute the distance d_ij between the two fragments in the template conformation;
3.4) If d_ij ∈ (d_min, d_max), count it in the bin given by the chosen distance interval; otherwise, return to 3.2;
4) Draw the distance spectrum plot of the residue pair:
4.1) The horizontal axis is the distance d_ij between fragments that come from the same template, d_ij ∈ (d_min, d_max);
4.2) The vertical axis is the number of fragment pairs falling into each interval.
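Step 1) can be illustrated with a minimal Python sketch. It assumes a precomputed, symmetric pairwise identity matrix with values in [0, 1], assumes that the chain being visited is kept while its neighbours above the 30% cutoff are rejected, and assumes that the group size is doubled between rounds; the text fixes none of these details, and all function names are hypothetical.

```python
from typing import List, Sequence


def cumulative_identity(identity: Sequence[Sequence[float]]) -> List[float]:
    """Formula (1): total_identity_i is the sum over j of identity_ij."""
    return [sum(row) for row in identity]


def cull_group(chains: List[int],
               identity: Sequence[Sequence[float]],
               totals: Sequence[float],
               cutoff: float = 0.30) -> List[int]:
    """Step 1.4 inside one group (assumed reading of the rejection rule)."""
    # Visit chains from the largest cumulative similarity to the smallest.
    order = sorted(chains, key=lambda c: totals[c], reverse=True)
    removed = set()
    kept = []
    for c in order:
        if c in removed:
            continue
        kept.append(c)
        # Reject every remaining chain that is too similar to the kept chain.
        for other in order:
            if other != c and other not in removed and identity[c][other] > cutoff:
                removed.add(other)
    return kept


def cull_in_groups(identity: Sequence[Sequence[float]],
                   group_size: int = 1000,
                   cutoff: float = 0.30) -> List[int]:
    """Steps 1.4 and 1.5: cull per group, enlarge the groups, repeat."""
    totals = cumulative_identity(identity)
    survivors = list(range(len(identity)))
    while True:
        groups = [survivors[k:k + group_size]
                  for k in range(0, len(survivors), group_size)]
        survivors = [c for g in groups
                     for c in cull_group(g, identity, totals, cutoff)]
        if len(groups) <= 1:
            return survivors
        group_size *= 2  # assumption: the text only says the group is enlarged


# Toy example: chains 0 and 1 are 90% identical, so only one of them survives.
toy_identity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
toy_survivors = cull_in_groups(toy_identity, group_size=2)
```

Culling in small groups first keeps each round cheap, because the all-against-all comparison only runs inside a group; the final round over a single group guarantees that no surviving pair exceeds the cutoff.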
The beneficial effects of the present invention are as follows. A protein has a specific spatial structure, and similar proteins have similar spatial structures, so the distances between residues at corresponding positions are also similar; a distance spectrum can therefore guide the conformational search in protein structure prediction. To build the distance spectrum, high-scoring fragments are selected for each residue position of the query sequence by comparing query and template residues in terms of sequence profile, secondary structure type, solvent accessibility, dihedral angles, and so on. The fragments at each pair of positions that come from the same template are then traversed, and the distance between the corresponding residues in the template is computed; this distance is close to the distance between the same residues in the spatial conformation of the query sequence. Applied to protein structure prediction, the invention yields conformations with higher prediction accuracy at lower complexity.
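The counting at the heart of the distance spectrum can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Fragment` record is hypothetical, each fragment is reduced to one representative coordinate (for example the Cα atom of the template residue aligned to the query position), and the distance is taken as the Euclidean distance between those coordinates.

```python
import math
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical record for one library fragment at a query position."""
    template_id: str
    coord: Tuple[float, float, float]  # assumed representative atom position


def distance_spectrum(frags_i: List[Fragment],
                      frags_j: List[Fragment],
                      d_min: float,
                      d_max: float,
                      bin_width: float) -> Dict[int, int]:
    """Count same-template fragment pairs per distance bin."""
    counts: Counter = Counter()
    for fi in frags_i:
        for fj in frags_j:
            # Only fragments taken from the same template share a frame.
            if fi.template_id != fj.template_id:
                continue
            d = math.dist(fi.coord, fj.coord)
            # Distances outside the open interval are skipped.
            if d_min < d < d_max:
                counts[int((d - d_min) // bin_width)] += 1
    return dict(counts)


# Toy data: two pairs from template "A" fall in the same bin, the pair from
# template "B" is too far apart, and template "C" has no partner.
frags_at_i = [Fragment("A", (0.0, 0.0, 0.0)), Fragment("B", (0.0, 0.0, 0.0))]
frags_at_j = [Fragment("A", (3.0, 4.0, 0.0)), Fragment("A", (0.0, 0.0, 5.2)),
              Fragment("C", (1.0, 0.0, 0.0)), Fragment("B", (0.0, 0.0, 12.0))]
spectrum = distance_spectrum(frags_at_i, frags_at_j, 0.0, 9.0, 0.5)
```

The bin with the highest count is the most frequently observed template distance for the residue pair, which is the quantity the spectrum offers as guidance during the conformational search.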
Brief description of the drawings
Fig. 1 shows the experimental distance spectrum for the 5th residue E and the 24th residue W of 1VII.
Fig. 2 shows the experimental distance spectrum for the 13th residue M and the 18th residue F of 1VII.
Embodiment
The invention will be further described below in conjunction with the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a distance spectrum construction method for protein structure prediction is provided, the construction process comprising the following steps:
Fig. 2 shows a relatively ideal residue-pair distance spectrum. This embodiment takes the protein with PDB name 1VII as an example of the distance spectrum construction method for protein structure prediction, which comprises the following steps:
1) Build a non-redundant template library:
1.1) Download from the Protein Data Bank website the 38443 higher-precision known protein sequences whose resolution is less than [value missing];
1.2) Split the downloaded protein sequences into single chains, 77890 in total;
1.3) Compute the cumulative similarity total_identity of every chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the cumulative similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) Divide all chains into 78 groups of 1000 chains each; within each group, sort the chains by cumulative similarity in descending order and, starting from the chain with the largest cumulative similarity, compare each chain with all the others in turn and reject the chains whose similarity exceeds 30%;
1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group; 6568 amino acid chains are retained;
1.6) Download the corresponding protein structures from the PDB website according to the PDB names of the retained amino acid chains, forming the non-redundant template library;
2) Generate the fragment library:
2.1) Use PSI-BLAST to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the log-odds profile L_t of each residue in the template over the 20 amino acids;
2.2) Use PSSpred to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) Use EDTSurf to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) Use ANGLOR to obtain the dihedral angles φ_q, ψ_q of each residue in the query sequence and the dihedral angles φ_t, ψ_t of each residue in the template;
2.5) Compute the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index over the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) Take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) Choose the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) Traverse the fragments at positions i and j and select the fragments that come from the same template;
3.3) Compute the distance d_ij between the two fragments in the template conformation;
3.4) If d_ij ∈ (0, 9), count it using a distance interval of 0.5; otherwise, return to 3.2;
4) Draw the distance spectrum plot of the residue pair:
4.1) The horizontal axis is the distance d_ij between fragments that come from the same template, d_ij ∈ (0, 9);
4.2) The vertical axis is the number of fragments in each distance interval.
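As a quick check on the figures quoted in this embodiment, the short Python sketch below reproduces the group count from step 1.4 and the number of histogram bins implied by step 3.4. The variable names are illustrative only.

```python
import math

# Step 1.4: 77890 single chains split into groups of 1000 chains.
n_chains = 77890
group_size = 1000
n_groups = math.ceil(n_chains / group_size)  # 78, matching the text

# Steps 3.4 and 4.1: distances in (0, 9) counted with an interval of 0.5.
d_min, d_max, bin_width = 0.0, 9.0, 0.5
n_bins = round((d_max - d_min) / bin_width)  # 18 bins on the horizontal axis
bin_edges = [d_min + k * bin_width for k in range(n_bins + 1)]
```

The last group holds only 890 chains, which is why 77890 chains give 78 groups rather than 77.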
The above describes the excellent results shown by one embodiment of the present invention. Clearly, the invention is not limited to this embodiment; many variations may be made in implementing it without departing from its essential spirit or going beyond its substantive content.

Claims (1)

1. A distance spectrum construction method for protein structure prediction, characterized in that the construction method comprises the following steps:
1) Build a non-redundant template library:
1.1) Download from the Protein Data Bank the higher-precision known protein sequences whose resolution is less than [value missing];
1.2) Split the downloaded protein sequences into single chains;
1.3) Compute the cumulative similarity total_identity of every chain relative to the other chains:
$$\mathrm{total\_identity}_i = \sum_{j=1}^{N} \mathrm{identity}_{ij} \qquad (1)$$
In formula (1), N is the total number of single chains, total_identity_i is the cumulative similarity of the i-th chain, and identity_ij is the similarity score between the i-th chain and the j-th chain;
1.4) Divide all chains into groups of 1000 chains each; within each group, sort the chains by cumulative similarity in descending order and, starting from the chain with the largest cumulative similarity, compare each chain with all the others in turn and reject the chains whose similarity exceeds 30%;
1.5) After all groups have been compared, increase the number of chains per group and repeat the similarity rejection, until everything is finally merged into a single group;
1.6) Download the corresponding protein structures from the Protein Data Bank website according to the PDB names of the remaining amino acid chains, forming the non-redundant template library;
2) Generate the fragment library:
2.1) Use PSI-BLAST to obtain the frequency profile P_q of each residue in the query sequence over the 20 amino acids and the log-odds profile L_t of each residue in the template over the 20 amino acids;
2.2) Use PSSpred to obtain the secondary structure type ss_q of each residue in the query sequence and the secondary structure type ss_t of each residue in the template;
2.3) Use EDTSurf to obtain the solvent accessibility sa_q of each residue in the query sequence and the solvent accessibility sa_t of each residue in the template;
2.4) Use ANGLOR to obtain the dihedral angles φ_q, ψ_q of each residue in the query sequence and the dihedral angles φ_t, ψ_t of each residue in the template;
2.5) Compute the similarity score function f(i, j) of a template fragment relative to the query sequence:
In formula (2), i is the residue position in the query sequence, j is the residue position in the template, and k is the index over the 20 amino acids; w1, w2, w3, w4, w5 are weight parameters;
2.6) Take the 300 highest-scoring fragments to form the fragment library;
3) Build the distance spectrum:
3.1) Choose the residue at position i and the residue at position j of the query sequence, with j > i + 5;
3.2) Traverse the fragments at positions i and j and select the fragments that come from the same template;
3.3) Compute the distance d_ij between the two fragments in the template conformation;
3.4) If d_ij ∈ (d_min, d_max), count it in the bin given by the chosen distance interval; otherwise, return to 3.2;
4) Draw the distance spectrum plot of the residue pair:
4.1) The horizontal axis is the distance d_ij between fragments that come from the same template, d_ij ∈ (d_min, d_max);
4.2) The vertical axis is the number of fragment pairs falling into each interval.
CN201510310053.8A 2015-06-08 2015-06-08 A distance spectrum construction method for protein structure prediction Active CN104951669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510310053.8A CN104951669B (en) 2015-06-08 2015-06-08 A distance spectrum construction method for protein structure prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510310053.8A CN104951669B (en) 2015-06-08 2015-06-08 A distance spectrum construction method for protein structure prediction

Publications (2)

Publication Number Publication Date
CN104951669A CN104951669A (en) 2015-09-30
CN104951669B true CN104951669B (en) 2017-09-05

Family

ID=54166322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510310053.8A Active CN104951669B (en) 2015-06-08 2015-06-08 A distance spectrum construction method for protein structure prediction

Country Status (1)

Country Link
CN (1) CN104951669B (en)

Also Published As

Publication number Publication date
CN104951669A (en) 2015-09-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant