WO2018122338A1 - Computational selection of proteases and prediction of cleavage products - Google Patents

Publication number
WO2018122338A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
peptides
peptide
matching
peptidase
Prior art date
Application number
PCT/EP2017/084752
Other languages
French (fr)
Inventor
Andrew Knox
Original Assignee
Dublin Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dublin Institute Of Technology
Publication of WO2018122338A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • the invention relates to a method and apparatus for identifying, screening and/or developing enzymes which will produce pre-determined peptides from enzymatic cleavage by predicting which peptides will arise from enzymatic cleavage, e.g. during digestion.
  • the invention also relates to a method and apparatus for predicting the peptides formed by enzymatic hydrolysis of proteins, and optionally the cleavage points of the peptides.
  • The end products of both bacterial and enzymatic digestion of a food source are often referred to as a hydrolysate.
  • When a protein within a food source is digested, hydrolysates with diverse properties can be released and some of these hydrolysates can be bioactive peptides.
  • Such hydrolysates may have improved nutritive value, enhanced functional properties and potential biological activity. Accordingly, they can be potentially used as, or in, nutraceuticals and functional foods for promoting health.
  • the peptides which arise from enzymatic cleavage of a protein food source are determined using analysis after the digestion has occurred, e.g. using mass spectrometric analysis such as that described in "EnzymePredictor: A tool for predicting and visualising enzymatic cleavages of digested proteins" by Vijayakumar et al published in Journal of Proteome Research 2012, 11, 6056-6065.
  • if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
  • the peptidase may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis.
  • the peptidase may be putative, that is to say an enzyme which is believed or understood to have the ability to break down peptides into amino acids and such peptidases may be synthetically generated (such as by employing standard genetic engineering techniques) or generated (but not yet synthesised in the laboratory) using computational biology.
  • the term "matching" relates to the successful identification of two sequences which have a high level of identity with one another. For example the sequences may have at least 95%, 96%, 97%, 98%, 99% or 100% identity with one another.
  • the peptide may be identified, screened and/or developed. The terms may be used interchangeably.
  • the selected protein source may be a food protein, e.g. whey or pea protein.
  • the initial and alternative peptidases may be selected from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB).
  • the peptidase which is output in the final step is the designed peptidase.
  • the method is relatively quick and can be performed for multiple different peptidases or combinations of enzymes until the designed peptidase is output.
  • the method is an iterative one and the necessary steps of the method can be repeated until a desired outcome is achieved.
  • the output peptidase is manufactured.
  • the one or more peptides formed by enzymatic hydrolysis of a selected protein source is manufactured.
  • the manufacture can be done by various known techniques using the output amino acid sequence.
  • a synthesiser may be used to manufacture the designed peptidase and/or one or more peptides.
  • the method may further comprise determining whether the output resulting sequence matches the sequence of at least one desired output peptide.
  • the selected protein source is a food source
  • the at least one desired output peptide may be a bioactive peptide, a nutraceutical or a peptide which promotes health.
  • the method is iterative and thus if there is no matching sequence, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no matching sequence, another alternative peptidase is selected and the listed steps are repeated again. In other words, all the method steps can be repeated until a matching sequence is output. If there is a matching sequence, the outputting step will output the peptidase which resulted in the at least one desired output peptide.
  • the method may comprise minimising at least one of the resulting sequences before recalculating the surface accessibility of the at least one resulting sequence.
  • the method may comprise modelling the at least one matching sequence within the selected protein source.
  • the method may comprise modelling the at least one composed peptide before calculating the surface accessibility.
  • the model may be a 3-dimensional model, for example a model constructed using a modelling computational tool such as I-TASSER (Iterative Threading Assembly Refinement).
  • the tool may predict a three-dimensional structural model of the protein molecules from the amino acid sequences. Monte Carlo simulations may be used in the prediction method.
  • the modelling may comprise assembling fragments of template structures to form the model and there may be more than one stage in the assembling of the fragments.
  • the method may comprise determining the cleavage site on the at least one matching sequence within the selected protein source by aligning the model of the at least one matching sequence within the selected protein source with the model of its matching composed peptide and identifying the cleavage site on the at least one matching sequence as the site which aligns with the marked cleavage site on the at least one composed peptide.
  • the model of the matching composed peptide may be discarded before cleavage. In this way, only the sequence from the target protein source is taken forwards in the process.
  • Using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible may comprise comparing the calculated surface accessibility with a threshold value; and determining that the at least one matching sequence is surface accessible if the calculated surface accessibility is above the threshold value.
  • Only surface-exposed amino acid stretches on proteins matching those composed peptides are taken forwards. If the sequence is not surface accessible, the method ends and if no surface accessible sequences have been identified, no results are output. As set out above, the method is iterative and thus if there is no output, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no output, another alternative peptidase is selected and the listed steps are repeated again.
  • the binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner.
  • the peptide partner may have a fixed number of amino acids, e.g. 20. After the amino acid sequence of the peptide partner has been identified, there may be a search for combinations of amino acids across the binding site which satisfy various constraints.
  • Composing a plurality of peptides may comprise identifying a set of peptide backbone scaffolds having a matching backbone arrangement to the binding site of the selected peptidase and determining optimal sequences for the identified backbone scaffolds.
  • One method for composing the peptide is to use a computational pipeline such as PEPcomposer which designs peptides binding to a given protein surface. There may be a large number of composed peptides. Accordingly, the method may further comprise ranking each of the plurality of composed peptides before determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source. Ranking each of the plurality of composed peptides may be according to cleavage probability.
  • Ranking may be performed by using any suitable ranking algorithm such as Fold-X, FlexPepDock, GalaxyPepDock, CABS-dock, pepATTRACT or PEPCrawler.
  • the ranking preferably predicts the optimal peptide sequence(s) which should bind to the peptidase at the identified binding site.
  • the method may further comprise selecting the highest ranking peptides (e.g. selecting the top 20 ranking peptides) and taking only these highest ranking peptides forwards in the method.
  • the method may comprise determining whether at least one composed peptide from the highest ranking peptides matches at least one sequence within the selected protein source.
  • a method for predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source comprising:
  • if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
  • a computer-implemented method for predicting the protein source of peptides formed by enzymatic hydrolysis comprising:
  • if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
  • the method(s) may be computer-implemented and may be practised with other computer system configurations, e.g. microprocessor systems, main frame computers and the like.
  • a computer readable medium i.e. any storage device that can store data which can be read by a computer system, for storing a computer program which when implemented on a computer system causes the steps of the method(s) above to be performed.
  • Examples of a computer readable medium include a hard-drive, read-only memory, random-access memory, a compact disc, CD-ROM, a digital versatile disk, a magnetic tape, other non-transitory devices and other non-optical storage devices.
  • the computer readable medium may also be distributed over a network coupled system so that the computer program code is stored and executed in a distributed fashion.
  • the computer readable medium is preferably non-transitory.
  • Figure 1 is a flowchart showing the steps in the method
  • Figures 2a to 2f illustrate the outputs at various stages in the method of Figure 1;
  • Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1.
  • Figure 1 is a flowchart illustrating the various steps in the method.
  • the method may be considered to comprise two distinct stages: a first stage in which peptide sequences within a protein substrate which match peptide sequences within an enzyme substrate are identified and a second stage in which there is a determination of whether the identified sequences are accessible for binding and cleavage in a peptidase binding site.
  • the first step S100 in stage 1 is to select a peptidase (or protease).
  • the terms may be used interchangeably and may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis.
  • the peptidase may be identified from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB).
  • the database also stores data (e.g. sequence, structure and binding sites for each type of amino acid) regarding the peptidase.
  • the repository may be used to derive preferences in the form of position specific structural elements which describe a binding site environment in bound peptides.
  • the next step S102 is to identify at least one binding site (or active site) on the peptidase.
  • the binding site may be identified by extracting the structural elements of the protein-protein interaction interface from the database. These elements capture the atomic composition and solvent accessibility of a central residue and its closest neighbours in the protein structure.
  • the binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner.
  • the peptide partner is typically a sequence from a protein from a food source.
  • the sequence for peptide partner will typically have a fixed number of amino acids, e.g. 20.
  • the constraints may be determined by applying a machine learning or scoring matrix to generate a model which enables differentiation of the preferred interactions for every interface amino acid in a protein-protein complex.
  • An example of a suitable binding site is shown in Figure 2a.
  • the next step in the method is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This can be done using the model mentioned above to place the interacting amino acids from the peptide partner and by linking the interacting residues by peptide bond and by constructing all initial conformations of potential interacting peptide(s).
  • the composed peptides are constructed de novo (i.e. from scratch) using the amino acids which are considered to be placed in their optimal positions in the context of the binding site.
  • One method for composing the peptide is to use a computational pipeline known as PEPcomposer which designs peptides binding to a given protein surface.
  • PEPcomposer is described for example in "PepComposer: computational design of peptides binding to a given protein surface" by Obarska-Kosinska et al published in Nucleic Acids Research Advance Access 2016, April 30.
  • the inputs are the structure of the target protein and an approximate definition of the binding site of the target protein.
  • a search of monomeric proteins is conducted and a set of peptide backbone scaffolds having the same backbone arrangement as the binding site of the selected peptidase are identified. Once the backbone scaffolds are identified, optimal sequences for the identified scaffolds are designed.
  • Figure 2b shows potential binding sites 20 highlighted in dark cyan and the residues 22 within the selected cut-off highlighted in dark violet.
  • Figure 2c shows one of the designed peptides 24 at one binding site. The cleavage points are also shown in Figure 2c.
  • the designed peptides are subjected to conformational refinement and scoring to rank them (step S106).
  • the ranking predicts the optimal peptide sequence which should bind to the peptidase at the identified binding site.
  • One important factor used to rank the designed peptides is the cleavage probability. This may be determined using a scoring function (e.g. dMM-PBSA) which reflects the interaction energy of a given peptide with a modelled "near-attack" conformation of the peptidase binding site.
  • dMM-PBSA is described for example in "dMM-PBSA: A new HADDOCK scoring function for Protein-peptide docking" by Spiliotopoulos et al published in Front Mol BioSci 2016 Aug 31. For each predicted (or designed) peptide, the associated cleavage site is marked as shown in Figure 2c.
  • the next stage (S108) is to determine which of the highest ranked designed peptides (e.g. the top 20 or top 10) exactly match sequence stretches in a hydrolysate protein source (e.g. a food source such as whey, pea protein etc.).
  • This matching step may be done by aligning the designed peptides with the sequences in the target protein source.
  • a single target protein source may be used or a handful of target protein sources may be considered.
  • One suitable algorithm for matching is known as Peptidematch, which is a computational tool designed to quickly retrieve all occurrences of a given query peptide from a database known as the UniProt Knowledgebase (UniProtKB), together with the isoforms.
  • the output from the tool is a summary table showing each match together with other information such as the matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases.
  • the results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree.
  • a 3-d model of each matching sequence is constructed, for example, using a known modelling computational tool such as I-TASSER (Iterative Threading Assembly Refinement) which is described for example in "I-TASSER server: new development for protein structure and function predictions" by Yang et al published in Nucleic Acids Res 2015, Vol 43, Issue W1, W174-181.
  • the tool predicts a three-dimensional structural model of the protein molecules from the amino acid sequences.
  • Monte Carlo simulations may be used in the prediction method as well as COACH which is a meta-server approach to protein-ligand binding site prediction. These techniques are described in "The I-TASSER suite: protein structure and function prediction" by Yang et al published in Nature Methods 12, P7-8 (2015).
  • the amino acid sequences are input and using a technique called fold recognition or threading (e.g. using LOMETS (Local Meta-Threading Server)), template structures are identified.
  • LOMETS is described in "LOMETS: a local meta-threading server for protein structure prediction" by Wu et al published in Nucleic Acids Res 2007, Epub 2007 May 3.
  • These template structures are typically template fragments and a full-length structural model is constructed by assembling the template structures.
  • the assembly may comprise multiple stages, for example in a first stage the template fragments are clustered to form a cluster centroid. This assembly stage may use restraints from LOMETS together with decoy-based optimized potential. Using restraints from the cluster, LOMETS and another algorithm known as TM-align, the cluster centroid is then reassembled.
  • TM-align is described for example in "TM-align: a protein structure alignment algorithm based on the TM-score” by Zhang et al published in Nucleic Acids Res 2005 Apr 22. The inherent reduced potential is also used in the re-assembly.
  • "REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks"
  • This final model is then processed using the TM-align algorithm using the Protein Data Bank library. This processing gives the structural analogy together with the enzyme commission number, the gene ontology vocabulary, the binding site(s) and a prediction of the function.
  • the steps to generate a 3D model may be considered to be a first stage in the overall process.
  • the second stage is to determine accessibility for binding and cleavage.
  • the next step S112 is to calculate the surface accessibility for each amino acid in each of the modelled designed peptides. This can be done for example using Netsurf which is an ensemble of artificial neural networks trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids.
  • the method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process.
  • the techniques underpinning Netsurf are described for example in "A generic method for assignment of reliability scores applied to solvent accessibility predictions" by Petersen et al published in BMC Struct Biol 2009 Jul 31; 9:51.
  • the primary network has inputs of a Position-Specific Scoring Matrix (PSSM) and the raw output from secondary structure predictions.
  • 'B/E Classification (buried or exposed classification)' is the raw output from the neural networks within the primary network.
  • the secondary network is trained to predict the relative surface exposure of an amino acid using the 'B/E Classification' and the PSSM. The results are output from the web server.
  • Figure 2d shows an example of a 3D model of one of the designed peptides aligned to its matching protein.
  • the designed peptide is deleted from the structure so that only the sequence from the target protein source is taken forwards in the process.
  • the next step is to cleave the sequence of the target protein source at the correct position into two peptides (S118). Once the sequence has been cleaved, the next step is to minimise each cleaved peptide (S120).
  • An example of a minimised cleaved peptide is shown in Figure 2e.
  • The process then loops back to step S112 to calculate the surface accessibility for each amino acid in the cleaved, minimised peptide. There is a determination as to whether or not one of the sequences in the cleaved, minimised peptide is surface accessible. If so, the sequence is aligned, cleaved and minimised as before. The calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible and the method ends and outputs the resulting cleaved, minimised sequences which are the peptides resulting from enzymatic cleavage of the target protein source by the identified peptidase. Example outputs of sequences are shown in Figure 2f.
  • the method described above is a structure based predictive method for identifying the peptides which arise from enzymatic cleavage of a protein food source.
  • the final outputs shown in Figure 2f are the peptide fragments found in the hydrolysates.
  • Step S108 determines which food proteins have sequences which align with the designed peptides and this matching (aligning) step means that the protein sources of the peptide fragments are identified.
  • the identified protein source may be output in the final stage together with or instead of the resulting cleaved, minimised sequences.
  • the designed peptides are ones which are predicted to interact with the binding site of the identified peptidase.
  • the method is relatively quick and can be performed for multiple different peptidases (enzymes) or combinations of enzymes.
  • the method can be used to identify and hence design the enzymes or combinations of enzymes which mean that a selected protein source is cleaved by the designed enzyme. This can be done by selecting a new peptidase if no matching protein source that is of interest is identified in step S108 and repeating the steps of the methods.
  • the method shown in Figure 1 predicts the products of enzymatic cleavage for a particular protein, e.g. how a food protein is digested to produce specific peptides (e.g. bioactive peptides). Accordingly, the method can be further used to design the enzymes or combinations of enzymes which mean that a selected protein source is cleaved by the designed enzyme into the desired hydrolysates (i.e. desired products). This can be done by selecting a new peptidase and repeating the steps of the method if no peptides of interest are output in the final stage.
  • the method allows: i. Identification of protein sources of peptide fragments found in hydrolysates;
  • the method may also be combined with bioactive prediction algorithms to identify enzymes or combinations of enzymes that may produce higher titres of certain peptides from a specific starting material.
  • Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1.
  • the system comprises a central computing system which comprises a processor 70, memory 72 and an interface 74. These components are all operably connected to one another.
  • the method described above may be computer-implemented and Figure 3 shows one possible system for implementing the method.
  • the central system comprises a processor 70, e.g. a central processing unit implemented in hardware, which implements the method above.
  • the processor is connected to a memory 72 (e.g. RAM, ROM or other suitable storage) which stores the computer code which implements the method.
  • the processor and memory are operatively connected to an interface 74 which is an input/output device.
  • the first step is for the processor to select a peptidase from the peptidase database 76.
  • the database is shown as a separate database connected to the interface. However, it will be appreciated that the database may also be an integral part of the central system.
  • the processor then implements the method step of identifying a binding site. Once a binding site is identified, the next step is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This may be done in a composing server 78, e.g. a PEPcomposer server.
  • the composing server 78 may receive a request from the processor 70 to compose the peptide(s). The composed peptides are input to the processor 70 from the composing server 78.
  • the composed peptides then need to be ranked.
  • the processor may send the information on the composed peptides to a ranking server 80 which ranks the peptides using one of the algorithms identified above.
  • the ranking of the peptides is returned to the processor 70.
  • the next step is to determine which of the highest ranked designed peptides exactly match sequence stretches in a hydrolysate protein source.
  • the highest ranked peptide sequences are sent to a peptide match server 82 (e.g. Peptidematch) which retrieves all matching sequences from a peptide database 84 (e.g. UniProtKB).
  • the matching sequences are returned to the processor 70 from the peptide match server 82.
  • the final step in the first stage is to model the matching sequences, e.g. using a modelling server 86 (such as I-TASSER).
  • the models are returned to the processor 70 from the modelling server 86.
  • These models may be stored in memory 72 or in another database (not shown) which is connected to the processor.
  • the first step is to determine surface accessibility of the matching sequences and this is done using a surface accessibility server 88 which calculates values for the surface accessibility. These values are returned to the processor 70 from the surface accessibility server 88.
  • the processor determines whether or not the segments are surface accessible, e.g. by determining whether or not the calculated values are above a threshold value. Thereafter, for surface accessible segments only, the processor aligns the matching protein (e.g. whey, pea protein) to the matching designed peptide and then deletes the designed peptide from the structure.
  • the sequence of the target protein source is then cleaved by the processor at the correct position and the resulting sequences are minimised.
  • the calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible and the method ends and outputs the resulting cleaved, minimised sequences and/or the peptidase which resulted in these cleaved, minimised sequences and/or the protein source from which the cleaved, minimised sequences are derived.
  • These output sequences may be stored in memory 72 or output on a display (not shown).
  • each of the servers may have similar hardware to the central system and may thus comprise a processor, e.g. a central processing unit implemented in hardware, a memory (e.g. RAM, ROM or other suitable storage) and an interface which is an input/output device.
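
The ranking and matching steps described in the definitions above (taking only the highest ranking composed peptides forwards, then looking for stretches of the selected protein source with at least 95% identity) can be illustrated with a minimal sketch. It is not the patented implementation and it is not the Peptidematch tool: the names `ComposedPeptide`, `percent_identity`, `top_ranked` and `find_matches`, the ungapped sliding-window comparison, the scores and the example sequences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ComposedPeptide:
    """Hypothetical record for one composed peptide (not from the source)."""
    sequence: str        # one-letter amino acid codes
    cleavage_index: int  # cut lies between sequence[cleavage_index - 1] and sequence[cleavage_index]
    score: float         # assumed ranking score; higher means ranked higher


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def top_ranked(peptides: List[ComposedPeptide], n: int = 20) -> List[ComposedPeptide]:
    """Keep only the n highest-scoring composed peptides."""
    return sorted(peptides, key=lambda p: p.score, reverse=True)[:n]


def find_matches(
    peptide: ComposedPeptide, protein: str, min_identity: float = 95.0
) -> List[Tuple[int, int]]:
    """Slide the peptide along the protein without gaps.

    Returns (window start, cleavage position in the protein) for every
    window whose identity with the peptide meets the threshold.
    """
    k = len(peptide.sequence)
    hits = []
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        if percent_identity(peptide.sequence, window) >= min_identity:
            hits.append((start, start + peptide.cleavage_index))
    return hits
```

With a 95% threshold, a 20-residue peptide tolerates one mismatch, while shorter peptides must match exactly. The cleavage position returned for each hit is simply the marked cleavage site of the composed peptide carried over to the aligned stretch of the protein source.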

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method and apparatus for identifying, screening and/or developing enzymes which will produce pre-determined peptides from enzymatic cleavage by predicting which peptides will arise from enzymatic cleavage, e.g. during digestion. The method comprises: selecting an initial peptidase; identifying a binding site on the initial peptidase; composing a plurality of peptides each of which interact with the identified binding site; marking a cleavage site on each of the plurality of peptides; and determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source. If there is at least one matching sequence, the method further comprises calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source; and using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible. If it is surface accessible, the method further comprises cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences; and recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences.
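
The second half of the method summarised above (calculate per-residue surface accessibility, compare it with a threshold, cleave at the aligned site, then recalculate on the resulting sequences) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patented implementation: the accessibility predictor is a caller-supplied stand-in, the mean-over-window rule and the 0.25 threshold are hypothetical, exact substring matching replaces structural alignment, and the modelling and minimisation steps are omitted. The names `predict_fragments` and `is_accessible` are invented for this example.

```python
from typing import Callable, List, Sequence

# Assumed interface: maps a sequence to one relative-exposure value per residue.
AccessibilityFn = Callable[[str], Sequence[float]]


def is_accessible(values: Sequence[float], start: int, length: int, threshold: float) -> bool:
    """Treat a stretch as surface accessible if its mean exposure exceeds the threshold."""
    window = values[start:start + length]
    return sum(window) / len(window) > threshold


def predict_fragments(
    sequence: str,
    motif: str,
    cleavage_index: int,
    accessibility: AccessibilityFn,
    threshold: float = 0.25,
) -> List[str]:
    """Cleave `sequence` at accessible occurrences of `motif` until none remain.

    `cleavage_index` is the marked cut position inside the motif, so the cut
    always splits the motif and every fragment is strictly shorter.
    """
    if not 0 < cleavage_index < len(motif):
        raise ValueError("cleavage_index must lie strictly inside the motif")

    pending = [sequence]
    products: List[str] = []
    while pending:
        current = pending.pop()
        values = accessibility(current)  # recalculated for every new fragment
        if len(values) != len(current):
            raise ValueError("accessibility must return one value per residue")

        cut = None
        start = current.find(motif)
        while start != -1:
            if is_accessible(values, start, len(motif), threshold):
                cut = start + cleavage_index
                break
            start = current.find(motif, start + 1)

        if cut is None:
            products.append(current)  # no further cleavage possible
        else:
            pending.extend([current[:cut], current[cut:]])
    return sorted(products)
```

Passing the predictor in as a function keeps the loop independent of whichever accessibility tool is used, and recomputing the values on each fragment mirrors the recalculation step in the abstract, since residues buried in the intact protein may become exposed after a cut.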

Description

COMPUTATIONAL SELECTION OF PROTEASES AND PREDICTION
OF CLEAVAGE PRODUCTS
Technical field
The invention relates to a method and apparatus for identifying, screening and/or developing enzymes which will produce pre-determined peptides from enzymatic cleavage by predicting which peptides will arise from enzymatic cleavage, e.g. during digestion. The invention also relates to a method and apparatus for predicting the peptides formed by enzymatic hydrolysis of proteins, and optionally the cleavage points of the peptides.
Background
The end products of both bacterial and enzymatic digestion of a food source are often referred to as a hydrolysate. For example, when a protein within a food source is digested, hydrolysates with diverse properties can be released and some of these hydrolysates can be bioactive peptides. Such hydrolysates may have improved nutritive value, enhanced functional properties and potential biological activity. Accordingly, they can be potentially used as, or in, nutraceuticals and functional foods for promoting health. Various articles have been published illustrating some of these potential uses, for example, "Screening of whey protein isolate hydrolysates for their dual functionality: Influence of heat pre-treatment and enzyme specificity" by Adjonu et al published in Food Chemistry 136 (2013) 1435-1443; "Pharmaceutical applications of bioactive peptides" by Danquah et al published in OA Biotechnology 2012 Dec 29; 1(2):5; "Bioactive peptides and protein hydrolysates: research trends and challenges for application as nutraceuticals and functional food ingredients" by Li-Chan published in Science Direct 2015, 1:28-37; and "Generation of bioactive hydrolysates and peptides from bovine haemoglobin with in vitro renin, angiotensin-I-converting enzyme and dipeptidyl peptidase IV inhibitory activities" by Lafarga et al published in Journal of Food Biochemistry 00 (2016).
Currently, the peptides which arise from enzymatic cleavage of a protein food source are determined using analysis after the digestion has occurred, e.g. using mass spectrometric analysis such as that described in "EnzymePredictor: A tool for predicting and visualising enzymatic cleavages of digested proteins" by Vijayakumar et al published in Journal of Proteome Research 2012, 11, 6056-6065. Some work has been done on computer-implemented prediction tools, e.g. "Carp Proteins as a Source of Bioactive Peptides - an in silico approach" by Darewicz et al published in Czech J Food Sci 34, 2016 (2): 111-117 and "PeptideLocator: prediction of bioactive peptides in protein sequences" by Mooney et al published in Bioinformatics Vol 29, no. 9 2013, P1120-1126. The present applicant has recognised the need for an improved computer-implemented process.
Summary
According to a first aspect of the invention, there is provided a computer-implemented method for designing a peptidase which provides desired enzymatic cleavage of a selected protein source, the method comprising:
selecting an initial peptidase;
identifying a binding site on the initial peptidase;
composing a plurality of peptides each of which interacts with the identified binding site;
marking a cleavage site on each of the plurality of peptides;
determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source;
if there is no matching sequence, selecting an alternative peptidase and repeating the identifying, composing, marking and determining steps;
if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;
using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;
if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences;
repeating the using, cleaving and recalculating steps until at least one resulting sequence which is not surface accessible is formed; and
outputting the peptidase which resulted in the output sequence and optionally the at least one resulting sequence which is not surface accessible.
The peptidase (or protease) may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis. The peptidase may be putative, that is to say an enzyme which is believed or understood to have the ability to break down peptides into amino acids and such peptidases may be synthetically generated (such as by employing standard genetic engineering techniques) or generated (but not yet synthesised in the laboratory) using computational biology. The term "matching" relates to the successful identification of two sequences which have a high level of identity with one another. For example the sequences may have at least 95%, 96%, 97%, 98%, 99% or 100% identity with one another. By designing the peptidase, the peptidase may be identified, screened and/or developed. The terms may be used interchangeably.
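The identity criterion for "matching" can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to the same length without gaps; the function names `percent_identity` and `is_match` are illustrative only and are not taken from the specification.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumption: both sequences are already aligned, gap-free and of
    equal length. A real pipeline would run an alignment step first.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * identical / len(seq_a)


def is_match(seq_a: str, seq_b: str, min_identity: float = 95.0) -> bool:
    """Two sequences 'match' if their identity reaches the chosen threshold."""
    return percent_identity(seq_a, seq_b) >= min_identity
```

With a 20-residue peptide, a single differing residue still gives 95% identity, so the default threshold in this sketch tolerates one mismatch at that length.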
The selected protein source may be a food protein, e.g. whey or pea protein. The initial and alternative peptidases (where required) may be selected from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB). The peptidase which is output in the final step is the designed peptidase.
The method is relatively quick and can be performed for multiple different peptidases or combinations of enzymes until the designed peptidase is output. The method is an iterative one and the necessary steps of the method can be repeated until a desired outcome is achieved.
In certain embodiments, once a desired outcome is achieved, the output peptidase is manufactured. In other embodiments, the one or more peptides formed by enzymatic hydrolysis of a selected protein source are manufactured. A skilled person would recognise that the manufacture can be done by various known techniques using the output amino acid sequence. For example, a synthesiser may be used to manufacture the designed peptidase and/or one or more peptides. The method may further comprise determining whether the output resulting sequence matches the sequence of at least one desired output peptide. For example, if the selected protein source is a food source, the at least one desired output peptide may be a bioactive peptide, a nutraceutical or a peptide which promotes health. As set out above, the method is iterative and thus if there is no matching sequence, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no matching sequence, another alternative peptidase is selected and the listed steps are repeated again. In other words, all the method steps can be repeated until a matching sequence is output. If there is a matching sequence, the outputting step will output the peptidase which resulted in the at least one desired output peptide.
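The outer iteration over candidate peptidases might be sketched as follows. This is only an illustration under stated assumptions: `predict_products` is a hypothetical callable standing in for the whole prediction method (binding-site identification through to the final sequences which are not surface accessible), and desired output peptides are compared by exact sequence equality rather than by a percentage identity.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def select_peptidase(
    candidates: Iterable[str],
    predict_products: Callable[[str], Iterable[str]],
    desired_peptides: Set[str],
) -> Optional[Tuple[str, Set[str]]]:
    """Try each candidate peptidase in turn until one yields a desired peptide.

    `predict_products` is a placeholder (an assumption of this sketch) for
    the full per-peptidase prediction; it returns the predicted sequences.
    """
    for peptidase in candidates:
        products = set(predict_products(peptidase))
        hits = products & desired_peptides
        if hits:
            # This peptidase produced at least one desired output peptide.
            return peptidase, hits
    # No candidate produced a desired peptide; nothing is output.
    return None
```

Returning `None` when every candidate fails mirrors the case in which no results are output and a further alternative peptidase would have to be supplied.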
The method may comprise minimising at least one of the resulting sequences before recalculating the surface accessibility of the at least one resulting sequence. The method may comprise modelling the at least one matching sequence within the selected protein source. The method may comprise modelling the at least one composed peptide before calculating the surface accessibility. The model may be a 3-dimensional model, for example a model constructed using a modelling computational tool such as iTasser (Iterative Threading Assembly Refinement). The tool may predict a three-dimensional structural model of the protein molecules from the amino acid sequences. Monte Carlo simulations may be used in the prediction method. The modelling may comprise assembling fragments of template structures to form the model and there may be more than one stage in the assembling of the fragments.
The method may comprise determining the cleavage site on the at least one matching sequence within the selected protein source by aligning the model of the at least one matching sequence within the selected protein source with the model of its matching composed peptide and identifying the cleavage site on the at least one matching sequence as the site which aligns with the marked cleavage site on the at least one composed peptide. The model of the matching composed peptide may be discarded before cleavage. In this way, only the sequence from the target protein source is taken forwards in the process.
Using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible may comprise comparing the calculated surface accessibility with a threshold value; and determining that the at least one matching sequence is surface accessible if the calculated surface accessibility is above the threshold value. By considering surface accessibility, only surface exposed amino acid stretches on proteins matching those composed peptides are taken forwards. If the sequence is not surface accessible, the method ends and if no surface accessible sequences have been identified, no results are output. As set out above, the method is iterative and thus if there is no output, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no output, another alternative peptidase is selected and the listed steps are repeated again.
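The threshold comparison described above can be sketched as below. The specification does not say how the per-residue values are combined into one figure for a sequence, so averaging them is an assumption of this sketch, as are the name `is_surface_accessible` and the default threshold of 0.25.

```python
from statistics import mean
from typing import Sequence


def is_surface_accessible(
    relative_accessibility: Sequence[float],
    threshold: float = 0.25,  # hypothetical value, not from the specification
) -> bool:
    """Decide whether a matching stretch is surface accessible.

    Assumption: the per-residue accessibilities (0.0 = buried,
    1.0 = fully exposed) are averaged over the stretch and the mean
    is compared with the threshold.
    """
    if not relative_accessibility:
        raise ValueError("at least one residue is required")
    return mean(relative_accessibility) > threshold
```

Other aggregation rules, such as requiring every residue to exceed the threshold, would fit the same description; the choice changes how strict the filter is.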
The binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner. The peptide partner may have a fixed number of amino acids, e.g. 20. After the amino acid sequence of the peptide partner has been identified, there may be a search for combinations of amino acids across the binding site which satisfy various constraints.
Composing a plurality of peptides may comprise identifying a set of peptide backbone scaffolds having a matching backbone arrangement to the binding site of the selected peptidase and determining optimal sequences for the identified backbone scaffolds. One method for composing the peptide is to use a computational pipeline such as PEPcomposer which designs peptides binding to a given protein surface. There may be a large number of composed peptides. Accordingly, the method may further comprise ranking each of the plurality of composed peptides before determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source. Ranking each of the plurality of composed peptides may be according to cleavage probability. Ranking may be performed by using any suitable ranking algorithm such as Fold-X, FlexPepDock, GalaxyPepDock, CABS-dock, pepATTRACT or PEPCrawler. The ranking preferably predicts the optimal peptide sequence(s) which should bind to the peptidase at the identified binding site. The method may further comprise selecting highest ranking peptides (e.g. selecting the top 20 ranking peptides); and taking only these highest ranking peptides forwards in the method. For example, the method may comprise determining whether at least one composed peptide from the highest ranking peptides matches at least one sequence within the selected protein source.
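Selecting the highest ranking peptides can be illustrated with a short sketch. It assumes each composed peptide already carries a numeric cleavage probability produced by some external scoring tool; the `ComposedPeptide` record and the `top_ranked` function are illustrative names, not part of any of the tools listed above.

```python
from typing import List, NamedTuple


class ComposedPeptide(NamedTuple):
    sequence: str
    cleavage_index: int          # marked cleavage site: bond before sequence[cleavage_index]
    cleavage_probability: float  # assumed to come from an external scoring step


def top_ranked(peptides: List[ComposedPeptide], n: int = 20) -> List[ComposedPeptide]:
    """Return the n peptides with the highest cleavage probability."""
    ranked = sorted(peptides, key=lambda p: p.cleavage_probability, reverse=True)
    return ranked[:n]
```

Only the peptides returned by `top_ranked` would then be compared against the selected protein source.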
It will be appreciated that at the heart of the method for designing, identifying, screening and/or developing the peptidase described above, there is a method for predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source. Thus according to a related aspect of the invention, there is provided a computer-implemented method for predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source, the method comprising:
selecting a protein source;
selecting an initial peptidase;
identifying a binding site on the initial peptidase;
composing a plurality of peptides each of which interacts with the identified binding site;
marking a cleavage site on each of the plurality of peptides;
determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source;
if there is no matching sequence, selecting an alternative peptidase and repeating the identifying, composing, marking and determining steps;
if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;
using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;
if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences;
repeating the using, cleaving and recalculating steps until at least one resulting peptide sequence which is not surface accessible is formed; and
outputting the at least one resulting peptide sequence which is not surface accessible and optionally the peptidase which resulted in the output sequence.
It will also be appreciated that the method of predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source can be used to predict the protein source of peptides formed by enzymatic hydrolysis. Thus according to yet another related aspect of the invention, there is provided a computer-implemented method for predicting the protein source of peptides formed by enzymatic hydrolysis, the method comprising:
selecting an initial peptidase;
identifying a binding site on the initial peptidase;
composing a plurality of peptides each of which interacts with the identified binding site;
marking a cleavage site on each of the plurality of peptides;
determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source;
if there is no matching sequence, selecting an alternative peptidase and repeating the identifying, composing, marking and determining steps;
if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;
using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;
if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences;
repeating the using, cleaving and recalculating steps until at least one resulting sequence which is not surface accessible is formed;
comparing the at least one resulting sequence which is not surface accessible with a plurality of protein source sequences so as to identify one or more matching sequences; and
outputting the identified protein source and optionally the peptidase.
The optional features described in relation to the first aspect may also be combined with the related aspects of the invention.
The method(s) may be computer-implemented and may be practised with other computer system configurations, e.g. microprocessor systems, main frame computers and the like.
According to another aspect of the invention, there is also provided a computer readable medium, i.e. any storage device that can store data which can be read by a computer system, for storing a computer program which when implemented on a computer system causes the steps of the method(s) above to be performed. Examples of a computer readable medium include a hard-drive, read-only memory, random-access memory, a compact disc, CD-ROM, a digital versatile disk, a magnetic tape, other non-transitory devices and other non-optical storage devices. The computer readable medium may also be distributed over a network coupled system so that the computer program code is stored and executed in a distributed fashion. The computer readable medium is preferably non-transitory.
Brief description of drawings
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:
Figure 1 is a flowchart showing the steps in the method;
Figures 2a to 2f illustrate the outputs at various stages in the method of Figure 1; and
Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1.
Detailed description of drawings
Figure 1 is a flowchart illustrating the various steps in the method. The method may be considered to comprise two distinct stages: a first stage in which peptide sequences within a protein substrate which match peptide sequences within an enzyme substrate are identified and a second stage in which there is a determination of whether the identified sequences are accessible for binding and cleavage in a peptidase binding site. As shown, the first step S100 in stage 1 is to select a peptidase (or protease). The terms may be used interchangeably and may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis. The peptidase may be identified from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB). The database also stores data (e.g. sequence, structure and binding sites for each type of amino acid) regarding the peptidase. Thus the repository may be used to derive preferences in the form of position specific structural elements which describe a binding site environment in bound peptides. Once the peptidase has been identified, the next step S102 is to identify at least one binding site (or active site) on the peptidase. The binding site may be identified by extracting the structural elements of the protein-protein interaction interface from the database. These elements capture the atomic composition and solvent accessibility of a central residue and its closest neighbours in the protein structure. The binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner. The peptide partner is typically a sequence from a protein from a food source. The sequence for peptide partner will typically have a fixed number of amino acids, e.g. 20. 
After the amino acid sequence of the peptide partner has been identified, there is a search for combinations of amino acids across the binding site which satisfy various constraints. For example, the constraints may be determined by applying a machine learning or scoring matrix to generate a model which enables differentiation of the preferred interactions for every interface amino acid in a protein-protein complex. An example of a suitable binding site is shown in Figure 2a. Once a binding site is selected, the next step in the method is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This can be done using the model mentioned above to place the interacting amino acids from the peptide partner and by linking the interacting residues by peptide bond and by constructing all initial conformations of potential interacting peptide(s). The composed peptides are constructed de novo (i.e. from scratch) using the amino acids which are considered to be placed in their optimal positions in the context of the binding site. One method for composing the peptide is to use a computational pipeline known as PEPcomposer which designs peptides binding to a given protein surface. PEPcomposer is described for example in "PepComposer: computational design of peptides binding to a given protein surface" by Obarska-Kosinska et al published in Nucleic Acids Research Advance Access 2016, April 30. The inputs are the structure of the target protein and an approximate definition of the binding site of the target protein. A search of monomeric proteins is conducted and a set of peptide backbone scaffolds having the same backbone arrangement as the binding site of the selected peptidase are identified. Once the backbone scaffolds are identified, optimal sequences for the identified scaffolds are designed.
Merely as an example, Figure 2b shows potential binding sites 20 highlighted in dark cyan and the residues 22 within the selected cut-off highlighted in dark violet. Figure 2c shows one of the designed peptides 24 at one binding site. The cleavage points are also shown in Figure 2c. The designed proteins are subjected to conformational refinement and scoring to rank them (step S106). Examples of suitable algorithms are FlexPepDock (described in "Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors" by Raveh et al published in PLoS One, 2011 Apr 29) or Fold-X (described in "The FoldX web server: an online force field" by Schymkowitz et al published in Nucleic Acids Res, 2005 Jul 1). Other suitable algorithms are described in "GalaxyPepDock: a protein-peptide docking tool based on interaction similarity and energy optimisation" by Lee et al published in Nucleic Acids Res, 2015 Jul 1; "CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site" by Kurcinski et al published in Nucleic Acids Res, Volume 43, Issue W1, PW419-424; "Fully Blind Peptide-Protein Docking with pepATTRACT" by Schindler et al published in Structure 23, 2015; or "PEPCrawler: a fast RRT-based algorithm for high-resolution refinement and binding affinity estimation of peptide inhibitors" by Donsky et al published in Bioinformatics, 2011 Oct 15.
The ranking predicts the optimal peptide sequence which should bind to the peptidase at the identified binding site. One important factor used to rank the designed proteins is the cleavage probability. This may be determined using a scoring function (e.g. dMM-PBSA) which reflects the interaction energy of a given peptide with a modelled "near-attack" conformation of the peptidase binding site. dMM-PBSA is described for example in "dMM-PBSA: A new HADDOCK scoring function for Protein-peptide docking" by Spiliotopoulos et al published in Front Mol BioSci 2016 Aug 31. For each predicted (or designed) protein, the associated cleavage site is marked as shown in Figure 2c. The next stage (S108) is to determine which of the highest ranked designed peptides (e.g. the top 20 or top 10) exactly match sequence stretches in a hydrolysate protein source (e.g. a food source such as whey, pea protein etc.). This matching step may be done by aligning the designed peptides with the sequences in the target protein source. A single target protein source may be used or a handful of target protein sources may be considered. One suitable algorithm for matching is known as Peptide Match which is a computational tool designed to quickly retrieve all occurrences of a given query peptide from a database known as UniProt Knowledgebase (UniProtKB) together with the isoforms. This is described for example in "A fast Peptide Match service for UniProt Knowledgebase" by Chen et al published in Bioinformatics 2013 Nov 1. The output from the tool is a summary table showing each match together with other information such as the matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. Once the matching sequences have been identified, they are stored and then modelled (S110).
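The exact-matching stage (S108) can be pictured with a minimal local sketch. It is not the Peptide Match service: it simply scans in-memory protein sequences for every occurrence of each designed peptide, and the function names and the `proteins` dictionary layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def find_occurrences(peptide: str, protein: str) -> List[int]:
    """Return the 0-based start index of every (possibly overlapping) exact match."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    starts = []
    start = protein.find(peptide)
    while start != -1:
        starts.append(start)
        start = protein.find(peptide, start + 1)
    return starts


def match_peptides(
    peptides: List[str], proteins: Dict[str, str]
) -> List[Tuple[str, str, int]]:
    """List (peptide, protein name, start index) for each exact match found."""
    matches = []
    for peptide in peptides:
        for name, sequence in proteins.items():
            for start in find_occurrences(peptide, sequence):
                matches.append((peptide, name, start))
    return matches
```

Recording the start index of each match is what later allows the cleavage site marked on the designed peptide to be transferred onto the protein sequence.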
A 3-d model of each matching sequence is constructed, for example, using a known modelling computational tool such as iTasser (Iterative Threading Assembly Refinement) which is described for example in "I-TASSER server: new development for protein structure and function predictions" by Yang et al published in Nucleic Acids Res 2015, Vol 43, Issue W1, W174-181. The tool predicts a three-dimensional structural model of the protein molecules from the amino acid sequences. Monte Carlo simulations may be used in the prediction method as well as COACH which is a meta-server approach to protein-ligand binding site prediction. These techniques are described in "The I-TASSER suite: protein structure and function prediction" by Yang et al published in Nature Methods 12, P7-8 (2015). For example, the amino acid sequences are input and using a technique called fold recognition or threading (e.g. using LOMETS (Local Meta-Threading Server)), template structures are identified. LOMETS is described in "LOMETS: a local meta-threading server for protein structure prediction" by Wu et al published in Nucleic Acids Res 2007, Epub 2007 May 3.
These template structures are typically template fragments and a full-length structural model is constructed by assembling the template structures. The assembly may comprise multiple stages, for example in a first stage the template fragments are clustered to form a cluster centroid. This assembly stage may use restraints from LOMETS together with decoy-based optimized potential. Using restraints from the cluster, LOMETS and another algorithm known as TM-align, the cluster centroid is then reassembled. TM-align is described for example in "TM-align: a protein structure alignment algorithm based on the TM-score" by Zhang et al published in Nucleic Acids Res 2005 Apr 22. The inherent reduced potential is also used in the re-assembly. A structure with the lowest energy is found, e.g. by optimising using REMO H-Bond optimisation. This technique is described in "REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks" by Li et al published in Proteins 2009 Aug 15. This final model is then processed using the TM-align algorithm using the Protein Data Bank library. This processing gives the structural analogy together with the enzyme commission number, the gene ontology vocabulary, the binding site(s) and a prediction of the function.
As shown in Figure 1, the steps to generate a 3D model may be considered to be a first stage in the overall process. The second stage is to determine accessibility for binding and cleavage. The next step S112 is to calculate the surface accessibility for each amino acid in each of the modelled designed peptides. This can be done for example using Netsurf which is an ensemble of artificial neural networks trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. The techniques underpinning Netsurf are described for example in "A generic method for assignment of reliability scores applied to solvent accessibility predictions" by Petersen et al published in BMC Struct Biol 2009 Jul 31; 9:51. There may be two sets of artificial neural networks: a primary and a secondary network. The primary network has inputs of a Position-Specific Scoring Matrix (PSSM) and the raw output from secondary structure predictions. 'B/E Classification (buried or exposed classification)' is the raw output from the neural networks within the primary network. The secondary network is trained to predict the relative surface exposure of an amino acid using the 'B/E Classification' and the PSSM. The results are output from the web server.
At step S114, a determination is made as to whether or not one of the sequences identified in step S108 is surface accessible, for example by comparing the calculated surface accessibility with a threshold value which is indicative of a minimum level of surface accessibility. Only surface exposed amino acid stretches on proteins matching those peptides designed in previous steps are taken forwards. If the sequence is not surface accessible, the method ends and if no surface accessible sequences have been identified, no results are output. If the sequence is surface accessible, the matching protein (e.g. whey, pea protein) is aligned to the matching designed peptide (S116). Figure 2d shows an example of a 3D model of one of the designed peptides aligned to its matching protein.
The designed peptide is deleted from the structure so that only the sequence from the target protein source is taken forwards in the process. The next step is to cleave the sequence of the target protein source at the correct position into two peptides (S118). Once the sequence has been cleaved, the next step is to minimise each cleaved peptide (S120). An example of a minimised cleaved peptide is shown in Figure 2e.
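At the sequence level, transferring the marked cleavage site from the designed peptide onto the aligned protein and then cleaving can be sketched as follows. The sketch assumes an exact, gap-free match, so the site on the protein is the match start plus the offset of the site within the peptide. The function name `cleave` and its parameters are illustrative, and the structural alignment and minimisation steps are not modelled here.

```python
from typing import Tuple


def cleave(protein: str, match_start: int, peptide_cleavage_index: int) -> Tuple[str, str]:
    """Split a protein sequence at the site aligned with the peptide's cleavage site.

    match_start: 0-based index at which the designed peptide matches the protein.
    peptide_cleavage_index: the scissile bond lies immediately before this
    0-based position of the designed peptide (an assumed convention).
    """
    site = match_start + peptide_cleavage_index
    if not 0 < site < len(protein):
        raise ValueError("cleavage site must fall strictly inside the protein")
    # Only the protein-derived sequences are kept; the designed peptide is discarded.
    return protein[:site], protein[site:]
```

For instance, `cleave("MKWVTFISLL", 2, 3)` places the site before index 5 and yields the two sequences `"MKWVT"` and `"FISLL"`.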
The process then loops back to step S112 to calculate the surface accessibility for each amino acid in the cleaved, minimised peptide. There is a determination as to whether or not one of the sequences in the cleaved, minimised peptide is surface accessible. If so, the sequence is aligned, cleaved and minimised as before. The calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible and the method ends and outputs the resulting cleaved, minimised sequences which are the peptides resulting from enzymatic cleavage of the target protein source by the identified peptidase. Example outputs of sequences are shown in Figure 2f.
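The loop from S112 back through cleavage can be sketched at the sequence level as below. This is a simplification under stated assumptions: structural modelling and minimisation are omitted, `is_accessible` is a hypothetical callable standing in for the surface accessibility calculation and threshold test, and each designed peptide is given as a pair of its sequence and the offset of its marked cleavage site.

```python
from typing import Callable, List, Optional, Tuple

Peptide = Tuple[str, int]  # (sequence, offset of the cleavage site within it)


def _next_site(fragment: str, peptides: List[Peptide],
               is_accessible: Callable[[str, int, int], bool]) -> Optional[int]:
    """Find a cleavable site: a matching stretch that is also surface accessible."""
    for sequence, offset in peptides:
        start = fragment.find(sequence)
        while start != -1:
            site = start + offset
            inside = 0 < site < len(fragment)
            if inside and is_accessible(fragment, start, start + len(sequence)):
                return site
            start = fragment.find(sequence, start + 1)
    return None


def digest(protein: str, peptides: List[Peptide],
           is_accessible: Callable[[str, int, int], bool]) -> List[str]:
    """Cleave repeatedly until no fragment has an accessible matching stretch."""
    pending = [protein]
    products = []
    while pending:
        fragment = pending.pop()
        site = _next_site(fragment, peptides, is_accessible)
        if site is None:
            products.append(fragment)  # no further cleavage is possible
        else:
            # Both halves are strictly shorter, so the loop terminates.
            pending.extend([fragment[:site], fragment[site:]])
    return products
```

A call such as `digest("AAKRBBKRCC", [("KR", 1)], lambda frag, start, end: True)` treats every stretch as accessible and so cleaves at every occurrence of the designed peptide; a stricter `is_accessible` would leave buried stretches intact.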
The method described above is a structure based predictive method for identifying the peptides which arise from enzymatic cleavage of a protein food source. The final outputs shown in Figure 2f are the peptide fragments found in the hydrolysates. Step S108 determines which food proteins have sequences which align with the designed peptides and this matching (aligning) step means that the protein sources of the peptide fragments are identified. Thus, the identified protein source may be output in the final stage together with or instead of the resulting cleaved, minimised sequences. Furthermore, the designed peptides are ones which are predicted to interact with the binding site of the identified peptidase. This coupled with the steps in the second stage in which only surface accessible sequences are taken forwards means that only protein sources for which the identified peptidase (enzyme) will cause cleavage are identified. In other words, the method also identifies whether or not the peptidase (enzyme) will yield the desired cleavage. Thus, the selected peptidase which resulted in the cleaved, minimised sequences may be output in the final stage together with or instead of the resulting cleaved, minimised sequences.
The method is relatively quick and can be performed for multiple different peptidases (enzymes) or combinations of enzymes. Thus the method can be used to identify and hence design the enzymes or combinations of enzymes which mean that a selected protein source is cleaved by the designed enzyme. This can be done by selecting a new peptidase if no matching protein source that is of interest is identified in step S108 and repeating the steps of the methods.
Moreover, the method shown in Figure 1 predicts the products of enzymatic cleavage for a particular protein, e.g. how a food protein is digested to produce specific peptides (e.g. bioactive peptides). Accordingly, the method can be further used to design the enzymes or combinations of enzymes by which a selected protein source is cleaved into the desired hydrolysates (i.e. desired products). This can be done by selecting a new peptidase and repeating the steps of the method if no peptides of interest are output in the final stage.
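The select-and-repeat strategy described above can be sketched as a simple search over candidate peptidases. The sketch below is illustrative only: `select_peptidase` and its parameters are hypothetical names, and `predict_products` stands in for the whole of the method of Figure 1 applied to one peptidase.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def select_peptidase(
    candidates: Iterable[str],
    predict_products: Callable[[str], List[str]],
    desired: Set[str],
) -> Optional[Tuple[str, List[str]]]:
    """Return the first peptidase whose predicted products include a desired peptide.

    `predict_products` is a hypothetical stand-in for running the full
    prediction method on a single peptidase and collecting its output peptides.
    """
    for peptidase in candidates:
        products = predict_products(peptidase)
        hits = sorted(set(products) & desired)
        if hits:
            # At least one peptide of interest is output: stop searching.
            return peptidase, hits
        # Otherwise select a new peptidase and repeat the steps.
    return None  # no candidate yielded a peptide of interest
```

Returning `None` when every candidate fails makes the "no suitable enzyme found" outcome explicit rather than looping indefinitely.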
In other words, the method allows:
i. Identification of protein sources of peptide fragments found in hydrolysates;
ii. Identification of the protein source for peptide fragments; and
iii. Identification of enzymes that could yield such cleavages.
The method may also be combined with bioactive prediction algorithms to identify enzymes or combinations of enzymes that may produce higher titres of certain peptides from a specific starting material.
Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1. The method described above may be computer-implemented and Figure 3 shows one possible system for doing so. The system comprises a central computing system which comprises a processor 70, a memory 72 and an interface 74, all operably connected to one another. The processor 70, e.g. a central processing unit implemented in hardware, implements the method above. The processor is connected to the memory 72 (e.g. RAM, ROM or other suitable storage), which stores the computer code which implements the method. The processor and memory are operatively connected to the interface 74, which is an input/output device.
In line with the method shown in Figure 1, the first step is for the processor to select a peptidase from the peptidase database 76. The database is shown as a separate database connected to the interface. However, it will be appreciated that the database may also be an integral part of the central system. The processor then implements the method step of identifying a binding site. Once a binding site is identified, the next step is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This may be done in a composing server 78, e.g. a PepComposer server. The composing server 78 may receive a request from the processor 70 to compose the peptide(s). The composed peptides are input to the processor 70 from the composing server 78.
As shown in Figure 1, the composed peptides then need to be ranked. The processor may send the information on the composed peptides to a ranking server 80 which ranks the peptides using one of the algorithms identified above. The ranking of the peptides is returned to the processor 70. The next step is to determine which of the highest ranked designed peptides exactly match sequence stretches in a hydrolysate protein source. The highest ranked peptide sequences are sent to a peptide match server 82 (e.g. Peptide Match) which retrieves all matching sequences from a peptide database 84 (e.g. UniProtKB). The matching sequences are returned to the processor 70 from the peptide match server 82. The final step in the first stage is to model the matching sequences, e.g. using a modelling server 86 (such as I-TASSER). The models are returned to the processor 70 from the modelling server 86. These models may be stored in memory 72 or in another database (not shown) which is connected to the processor.
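The exact-matching step can be illustrated with a small in-memory search. This is only a sketch under stated assumptions: a plain substring scan over a dictionary of sequences is used here in place of the peptide match server 82 and the peptide database 84, and the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def find_exact_matches(
    peptides: List[str],
    proteins: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """List every exact occurrence of each peptide in each protein sequence.

    Each result is (peptide, protein name, 0-based start position).
    Overlapping occurrences are reported separately.
    """
    matches: List[Tuple[str, str, int]] = []
    for peptide in peptides:
        if not peptide:
            continue  # an empty peptide would trivially match everywhere
        for name, sequence in proteins.items():
            start = sequence.find(peptide)
            while start != -1:
                matches.append((peptide, name, start))
                start = sequence.find(peptide, start + 1)
    return matches
```

Keeping the start position of each hit is useful later, since the matched stretch is the segment whose surface accessibility is assessed in the second stage.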
In the second stage, the first step is to determine surface accessibility of the matching sequences and this is done using a surface accessibility server 88 which calculates values for the surface accessibility. These values are returned to the processor 70 from the surface accessibility server 88. The processor then determines whether or not the segments are surface accessible, e.g. by determining whether or not the calculated values are above a threshold value. Thereafter, for surface accessible segments only, the processor aligns the matching protein (e.g. whey, pea protein) to the matching designed peptide and then deletes the designed peptide from the structure. The sequence of the target protein source is then cleaved by the processor at the correct position and the resulting sequences are minimised. As explained in relation to Figure 1, the calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible and the method ends and outputs the resulting cleaved, minimised sequences and/or the peptidase which resulted in these cleaved, minimised sequences and/or the protein source from which the cleaved, minimised sequences are derived. These output sequences may be stored in memory 72 or output on a display (not shown).
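The threshold test on the calculated accessibility values might be sketched as below. The text only states that the values are compared with a threshold, so two details here are assumptions made for illustration: the per-residue values of a segment are combined by taking their mean, and the default threshold of 0.25 is an arbitrary placeholder, not a value taken from the method.

```python
from statistics import mean
from typing import Sequence


def segment_is_accessible(
    accessibility: Sequence[float],
    start: int,
    end: int,
    threshold: float = 0.25,  # placeholder value, assumed for illustration
) -> bool:
    """Decide whether residues [start, end) form a surface accessible segment.

    `accessibility` holds one calculated value per amino acid of the protein.
    Assumption: the segment is scored by the mean of its per-residue values.
    """
    if not 0 <= start < end <= len(accessibility):
        raise ValueError("segment bounds fall outside the sequence")
    return mean(accessibility[start:end]) > threshold
```

Other aggregation rules, such as requiring every residue or only the residues flanking the cleavage site to exceed the threshold, would fit the same interface.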
In the above description, several steps are undertaken by separate servers, e.g. composing server, ranking server etc. However, it will be appreciated that the functionality of some or all of these servers may be integrated within a single server or within the central system itself. Alternatively, the functionality of some or all of these servers may be performed by the processor. Each of the servers may have similar hardware to the central system and may thus comprise a processor, e.g. a central processing unit implemented in hardware, a memory (e.g. RAM, ROM or other suitable storage) and an interface which is an input/output device.
The contents of all papers and documents mentioned in this document are incorporated herein by reference. The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

Claims

1. A computer-implemented method for designing a peptidase which provides desired enzymatic cleavage of a selected protein source, the method comprising:
selecting an initial peptidase;
identifying a binding site on the initial peptidase;
composing a plurality of peptides each of which interacts with the identified binding site;
marking a cleavage site on each of the plurality of peptides;
determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source;
if there is no matching sequence, selecting an alternative peptidase and repeating the identifying, composing, marking and determining steps;
if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;
using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;
if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;
recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences;
repeating the using, cleaving and recalculating steps until at least one resulting sequence which is not surface accessible is formed; and
outputting the peptidase which resulted in the at least one resulting sequence which is not surface accessible.
2. A method according to claim 1, comprising
determining whether the at least one resulting sequence which is not surface accessible matches the sequence of at least one desired peptide;
if there is no matching sequence, selecting an alternative peptidase and repeating the method steps of claim 1;
if there is a matching sequence, outputting the peptidase which resulted in the at least one desired peptide.
3. A method according to claim 1 or claim 2, comprising
minimising at least one of the resulting sequences before recalculating the surface accessibility of the at least one resulting sequence.
4. A method according to any one of claims 1 to 3, comprising
modelling the at least one matching sequence within the selected protein source and/or the at least one composed peptide before calculating the surface accessibility.
5. A method according to claim 4, comprising
determining the cleavage site on the at least one matching sequence within the selected protein source by
aligning the model of the at least one matching sequence within the selected protein source with the model of its matching composed peptide and
identifying the cleavage site on the at least one matching sequence as the site which aligns with the marked cleavage site on the at least one composed peptide.
6. A method according to claim 5, comprising
discarding the model of the matching composed peptide before cleavage.
7. A method according to any one of claims 1 to 6, wherein using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible comprises
comparing the calculated surface accessibility with a threshold value;
and determining that the at least one matching sequence is surface accessible if the calculated surface accessibility is above the threshold value.
8. A method according to any one of claims 1 to 7, wherein the binding site is identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner.
9. A method according to claim 8, wherein the peptide partner has a fixed number of amino acids.
10. A method according to any one of claims 1 to 9, wherein composing a plurality of peptides comprises
identifying a set of peptide backbone scaffolds having a matching backbone arrangement to the binding site of the selected peptidase and
determining optimal sequences for the identified backbone scaffolds.
11. A method according to any one of claims 1 to 10, comprising ranking each of the plurality of composed peptides before determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source.
12. A method according to claim 11, comprising
ranking each of the plurality of composed peptides according to cleavage probability.
13. A method according to claim 11 or claim 12, comprising
selecting highest ranking peptides; and
determining whether at least one composed peptide from the highest ranking of the plurality of peptides matches at least one sequence within the selected protein source.
14. A method according to any one of the preceding claims, comprising outputting the at least one resulting sequence which is determined to not be surface accessible.
15. A method of manufacturing the peptidase designed by the method of any one of claims 1 to 14.
16. A non-transitory computer readable medium storing processor code which when implemented on a computer causes the computer to carry out the steps of any one of claims 1 to 14.
17. A system comprising a processor and a memory, wherein the processor is configured to implement the method of any one of claims 1 to 14.
PCT/EP2017/084752 2016-12-30 2017-12-28 Computational selection of proteases and prediction of cleavage products WO2018122338A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1622422.2A GB201622422D0 (en) 2016-12-30 2016-12-30 Peptide prediction
GB1622422.2 2016-12-30

Publications (1)

Publication Number Publication Date
WO2018122338A1 (en) 2018-07-05

Family

ID=58412269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/084752 WO2018122338A1 (en) 2016-12-30 2017-12-28 Computational selection of proteases and prediction of cleavage products

Country Status (2)

Country Link
GB (1) GB201622422D0 (en)
WO (1) WO2018122338A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050048166A1 (en) * 2003-07-01 2005-03-03 Novozymes A/S Compositions and methods for tenderizing meat
EP2444810A2 (en) * 2002-10-02 2012-04-25 Catalyst Biosciences, Inc. Proteases mutants with altered specificity and uses thereof
WO2014200912A2 (en) * 2013-06-10 2014-12-18 Iogenetics, Llc Mathematical processes for determination of peptidase cleavage

Non-Patent Citations (23)

* Cited by examiner, † Cited by third party
Title
ADJONU ET AL.: "Screening of whey protein isolate hydrolysates for their dual functionality: Influence of heat pre-treatment and enzyme specificity", FOOD CHEMISTRY, vol. 136, 2013, pages 1435 - 1443
CHEN ET AL.: "A fast Peptide Match service for UniProt Knowledgebase", BIOINFORMATICS, 1 November 2013 (2013-11-01)
DANQUAH ET AL.: "Pharmaceutical applications of bioactive peptides", OA BIOTECHNOLOGY, vol. 1, no. 2, 29 December 2012 (2012-12-29), pages 5
DAREWICZ ET AL.: "Carp Proteins as a Source of Bioactive Peptides - an in silico approach", CZECH J FOOD SCI, vol. 34, no. 2, 2016, pages 111 - 117
DONSKY ET AL.: "PEPCrawler: a fast RRT-based algorithm for high-resolution refinement and binding affinity estimation of peptide inhibitors", BIOINFORMATICS, 15 October 2011 (2011-10-15)
KURCINSKI ET AL.: "CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site", NUCLEIC ACIDS RES, vol. 43, no. W1, pages W419 - 424
LAFARGA ET AL.: "Generation of bioactive hydrolysates and peptides from bovine haemoglobin with in vitro renin, angiotensin-I-converting enzyme and dipeptidyl peptidase IV inhibitory activities", JOURNAL OF FOOD BIOCHEMISTRY, 2016
LEE ET AL.: "GalaxyPepDock: a protein-peptide docking tool based on interaction similarity and energy optimisation", NUCLEIC ACIDS RES, 1 July 2015 (2015-07-01)
LI ET AL.: "REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks", PROTEINS, 15 August 2009 (2009-08-15)
LI-CHAN PUBLISHED: "Bioactive peptides and protein hydrolysates: research trends and challenges for application as nutraceuticals and functional food ingredients", SCIENCE DIRECT, vol. 1, 2015, pages 28 - 37
MOONEY ET AL.: "PeptideLocator: prediction of bioactive peptides in protein sequences", BIOINFORMATICS, vol. 29, no. 9, 2013, pages 1120 - 1126
OBARSKA-KOSINSKA ET AL.: "PepComposer: computational design of peptides binding to a given protein surface", NUCLEIC ACIDS RESEARCH ADVANCE ACCESS, 30 April 2016 (2016-04-30)
PEDRO FERNANDES: "Enzymes in Food Processing: A Condensed Overview on Strategies for Better Biocatalysts", ENZYME RESEARCH, vol. 2010, 1 January 2010 (2010-01-01), pages 1 - 19, XP055467173, DOI: 10.4061/2010/862537 *
PETERSEN ET AL.: "A generic method for assignment of reliability scores applied to solvent accessibility predictions", BMC STRUCT BIOL, vol. 9, 31 July 2009 (2009-07-31), pages 51, XP021058173, DOI: doi:10.1186/1472-6807-9-51
RAVEH ET AL.: "Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors", PLOS ONE, 29 April 2011 (2011-04-29)
SCHINDLER ET AL.: "Fully Blind Peptide-Protein Docking with pepATTRACT", STRUCTURE, vol. 23, 2015
SCHYMKOWITZ ET AL.: "The FoldX web server: an online force field", NUCLEIC ACIDS RES, 1 July 2005 (2005-07-01)
SPILIOTOPOULOS ET AL.: "dMM-PBSA: A new HADDOCK scoring function for Protein-peptide docking", FRONT MOL BIOSCI, 31 August 2016 (2016-08-31)
VIJAYAKUMAR ET AL.: "EnzymePredictor: A tool for predicting and visualising enzymatic cleavages of digested proteins", JOURNAL OF PROTEOME RESEARCH, vol. 11, 2012, pages 6056 - 6065
WU ET AL.: "LOMETS: a local meta-threading server for protein structure prediction", NUCLEIC ACIDS RES, 2007
YANG ET AL.: "I-TASSER server: new development for protein structure and function predictions", NUCLEIC ACIDS RES, vol. 43, no. W1, 2015, pages W174 - 181
YANG ET AL.: "The I-TASSER suite: protein structure and function prediction", NATURE METHODS, vol. 12, 2015, pages 7 - 8
ZHANG ET AL.: "TM-align: a protein structure alignment algorithm based on the TM-score", NUCLEIC ACIDS RES, 22 April 2005 (2005-04-22)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023186863A1 (en) * 2022-03-30 2023-10-05 Ecole Polytechnique Federale De Lausanne (Epfl) Computer-implemented design of peptide:receptor signaling complexes for enhanced chemotaxis
CN116130004A (en) * 2023-01-06 2023-05-16 成都侣康科技有限公司 Identification processing method and system for antibacterial peptide
CN116130004B (en) * 2023-01-06 2024-05-24 成都侣康科技有限公司 Identification processing method and system for antibacterial peptide
CN117095743A (en) * 2023-10-17 2023-11-21 山东鲁润阿胶药业有限公司 Polypeptide spectrum matching data analysis method and system for small molecular peptide donkey-hide gelatin
CN117095743B (en) * 2023-10-17 2024-01-05 山东鲁润阿胶药业有限公司 Polypeptide spectrum matching data analysis method and system for small molecular peptide donkey-hide gelatin

Also Published As

Publication number Publication date
GB201622422D0 (en) 2017-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17832784; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17832784; Country of ref document: EP; Kind code of ref document: A1)