WO2018122338A1 - Computational selection of proteases and prediction of cleavage products

Info

Publication number: WO2018122338A1
Application number: PCT/EP2017/084752
Authority: WO
Inventors: Andrew Knox
Original assignee: Dublin Institute Of Technology
Priority date: 2016-12-30
Filing date: 2017-12-28
Publication date: 2018-07-05
Also published as: GB201622422D0

Abstract

The invention relates to a method and apparatus for identifying, screening and/or developing enzymes which will produce pre-determined peptides from enzymatic cleavage by predicting which peptides will arise from enzymatic cleavage, e.g. during digestion. The method comprises: selecting an initial peptidase; identifying a binding site on the initial peptidase; composing a plurality of peptides each of which interact with the identified binding site; marking a cleavage site on each of the plurality of peptides; and determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source. If there is at least one matching sequence, the method further comprises calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source; and using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible. If it is surface accessible, the method further comprises cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences; and recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences.

Description

COMPUTATIONAL SELECTION OF PROTEASES AND PREDICTION OF CLEAVAGE PRODUCTS

Technical field

The invention relates to a method and apparatus for identifying, screening and/or developing enzymes which will produce pre-determined peptides from enzymatic cleavage by predicting which peptides will arise from enzymatic cleavage, e.g. during digestion. The invention also relates to a method and apparatus for predicting the peptides formed by enzymatic hydrolysis of proteins, and optionally the cleavage points of the peptides.

Background

The end products of both bacterial and enzymatic digestion of a food source are often referred to as a hydrolysate. For example, when a protein within a food source is digested, hydrolysates with diverse properties can be released and some of these hydrolysates can be bioactive peptides. Such hydrolysates may have improved nutritive value, enhanced functional properties and potential biological activity. Accordingly, they can be potentially used as, or in, nutraceuticals and functional foods for promoting health. Various articles have been published illustrating some of these potential uses, for example, "Screening of whey protein isolate hydrolysates for their dual functionality: Influence of heat pre-treatment and enzyme specificity" by Adjonu et al published in Food Chemistry 136 (2013) 1435-1443; "Pharmaceutical applications of bioactive peptides" by Danquah et al published in OA Biotechnology 2012 Dec 29; 1(2):5; "Bioactive peptides and protein hydrolysates: research trends and challenges for application as nutraceuticals and functional food ingredients" by Li-Chan published in Science Direct 2015, 1:28-37; and "Generation of bioactive hydrolysates and peptides from bovine haemoglobin with in vitro renin, angiotensin-I-converting enzyme and dipeptidyl peptidase IV inhibitory activities" by Lafarga et al published in Journal of Food Biochemistry 00 (2016).

Currently, the peptides which arise from enzymatic cleavage of a protein food source are determined using analysis after the digestion has occurred, e.g. using mass spectrometric analysis such as that described in "EnzymePredictor: A tool for predicting and visualising enzymatic cleavages of digested proteins" by Vijayakumar et al published in Journal of Proteome Research 2012, 11, 6056-6065. Some work has been done on computer-implemented prediction tools, e.g. "Carp Proteins as a Source of Bioactive Peptides - an in silico approach" by Darewicz et al published in Czech J Food Sci 34, 2016 (2): 111-117 and "PeptideLocator: prediction of bioactive peptides in protein sequences" by Mooney et al published in Bioinformatics Vol 29, no. 9 2013, P1120-1126. The present applicant has recognised the need for an improved computer-implemented process.

Summary

According to a first aspect of the invention, there is provided a computer-implemented method for designing a peptidase which provides desired enzymatic cleavage of a selected protein source, the method comprising:

selecting an initial peptidase;

identifying a binding site on the initial peptidase;

composing a plurality of peptides each of which interacts with the identified binding site;

marking a cleavage site on each of the plurality of peptides;

determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source;

if there is no matching sequence, selecting an alternative peptidase and repeating the identifying, composing, marking and determining steps;

if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;

using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;

if the at least one matching sequence within the selected protein source is surface accessible, cleaving the at least one matching sequence at a cleavage site on the matching sequence which aligns with the cleavage site on its matching composed peptide to form two resulting sequences;

recalculating the surface accessibility for each amino acid in at least one of the two resulting sequences;

repeating the using, cleaving and recalculating steps until at least one resulting sequence which is not surface accessible is formed; and

outputting the peptidase which resulted in the output sequence and optionally the at least one resulting sequence which is not surface accessible.

The peptidase (or protease) may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis. The peptidase may be putative, that is to say an enzyme which is believed or understood to have the ability to break down peptides into amino acids and such peptidases may be synthetically generated (such as by employing standard genetic engineering techniques) or generated (but not yet synthesised in the laboratory) using computational biology. The term "matching" relates to the successful identification of two sequences which have a high level of identity with one another. For example, the sequences may have at least 95%, 96%, 97%, 98%, 99% or 100% identity with one another. By designing the peptide, the peptide may be identified, screened and/or developed. The terms may be used interchangeably.
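
Merely as an illustrative sketch of the identity criterion above, the following Python fragment computes the percentage identity of two sequences and tests it against a minimum level. It assumes that the two sequences have already been aligned and have equal length. The function names `percent_identity` and `is_match` and the default of 95% are hypothetical choices made for illustration and are not taken from any particular tool.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions at which two sequences agree.

    Assumption: the sequences are already aligned and of equal length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * same / len(seq_a)


def is_match(seq_a: str, seq_b: str, min_identity: float = 95.0) -> bool:
    """Treat two sequences as matching if their identity reaches the minimum."""
    return percent_identity(seq_a, seq_b) >= min_identity
```

With a 20-residue sequence, a single substitution still gives 95% identity, so such a pair would be treated as matching under the lowest level mentioned above.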

The selected protein source may be a food protein, e.g. whey or pea protein. The initial and alternative peptidases (where required) may be selected from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB). The peptidase which is output in the final step is the designed peptidase.

The method is relatively quick and can be performed for multiple different peptidases or combinations of enzymes until the designed peptidase is output. The method is an iterative one and the necessary steps of the method can be repeated until a desired outcome is achieved.

In certain embodiments, once a desired outcome is achieved, the output peptidase is manufactured. In other embodiments, the one or more peptides formed by enzymatic hydrolysis of a selected protein source are manufactured. A skilled person would recognise that the manufacture can be done by various known techniques using the output amino acid sequence. For example, a synthesiser may be used to manufacture the designed peptidase and/or one or more peptides.

The method may further comprise determining whether the output resulting sequence matches the sequence of at least one desired output peptide. For example, if the selected protein source is a food source, the at least one desired output peptide may be a bioactive peptide, a nutraceutical or a peptide which promotes health. As set out above, the method is iterative and thus if there is no matching sequence, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no matching sequence, another alternative peptidase is selected and the listed steps are repeated again. In other words, all the method steps can be repeated until a matching sequence is output. If there is a matching sequence, the outputting step will output the peptidase which resulted in the at least one desired output peptide.

The method may comprise minimising at least one of the resulting sequences before recalculating the surface accessibility of the at least one resulting sequence. The method may comprise modelling the at least one matching sequence within the selected protein source. The method may comprise modelling the at least one composed peptide before calculating the surface accessibility. The model may be a 3-dimensional model, for example a model constructed using a modelling computational tool such as iTasser (Iterative Threading Assembly Refinement). The tool may predict a three-dimensional structural model of the protein molecules from the amino acid sequences. Monte Carlo simulations may be used in the prediction method. The modelling may comprise assembling fragments of template structures to form the model and there may be more than one stage in the assembling of the fragments.

The method may comprise determining the cleavage site on the at least one matching sequence within the selected protein source by aligning the model of the at least one matching sequence within the selected protein source with the model of its matching composed peptide and identifying the cleavage site on the at least one matching sequence as the site which aligns with the marked cleavage site on the at least one composed peptide. The model of the matching composed peptide may be discarded before cleavage. In this way, only the sequence from the target protein source is taken forwards in the process.
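
Merely as an illustrative sketch, the transfer of the marked cleavage site from a composed peptide to its matching protein sequence can be shown on plain sequences. This simplifies the approach described above, which aligns 3-dimensional models: here the alignment is reduced to an exact substring match. The function name `cleave_at_aligned_site` and the convention that `peptide_cut` counts the residues before the cleaved bond are assumptions made for illustration.

```python
from typing import Optional, Tuple


def cleave_at_aligned_site(
    protein: str, peptide: str, peptide_cut: int
) -> Optional[Tuple[str, str]]:
    """Cleave `protein` where the cleavage site marked on `peptide` aligns.

    `peptide_cut` is the number of peptide residues before the cleaved bond.
    Returns the two resulting sequences, or None if the peptide does not
    occur in the protein. Only the first occurrence is used.
    """
    if not 0 < peptide_cut < len(peptide):
        raise ValueError("cleavage site must lie inside the peptide")
    start = protein.find(peptide)
    if start == -1:
        return None
    cut = start + peptide_cut  # the same bond, in protein coordinates
    return protein[:cut], protein[cut:]
```

Only the protein sequence is returned, which mirrors the point above that the composed peptide is discarded before cleavage.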

Using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible may comprise comparing the calculated surface accessibility with a threshold value; and determining that the at least one matching sequence is surface accessible if the calculated surface accessibility is above the threshold value. By considering surface accessibility, only surface exposed amino acid stretches on proteins matching those composed peptides are taken forwards. If the sequence is not surface accessible, the method ends and if no surface accessible sequences have been identified, no results are output. As set out above, the method is iterative and thus if there is no output, an alternative peptidase may be selected and the identifying, composing, marking, determining, calculating, using, cleaving, re-calculating, repeating and outputting steps may be repeated. If there is still no output, another alternative peptidase is selected and the listed steps are repeated again.
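
Merely as an illustrative sketch, the threshold comparison may be expressed as follows. The description above does not specify how the per-residue values are combined into a single value, nor what the threshold is. The use of the mean and the default threshold of 0.25 are therefore assumptions made purely for illustration.

```python
from statistics import mean
from typing import Sequence


def is_surface_accessible(
    relative_accessibility: Sequence[float], threshold: float = 0.25
) -> bool:
    """Decide whether a sequence stretch is surface accessible.

    `relative_accessibility` holds one value per amino acid (0 = buried,
    1 = fully exposed). Assumption: the stretch is accessible when the
    mean per-residue value is above the threshold.
    """
    if not relative_accessibility:
        raise ValueError("at least one accessibility value is required")
    return mean(relative_accessibility) > threshold
```

Other aggregation rules, such as requiring every residue around the cleavage site to exceed the threshold, could be substituted without changing the rest of the method.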

The binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner. The peptide partner may have a fixed number of amino acids, e.g. 20. After the amino acid sequence of the peptide partner has been identified, there may be a search for combinations of amino acids across the binding site which satisfy various constraints.

Composing a plurality of peptides may comprise identifying a set of peptide backbone scaffolds having a matching backbone arrangement to the binding site of the selected peptidase and determining optimal sequences for the identified backbone scaffolds. One method for composing the peptide is to use a computational pipeline such as PEPcomposer which designs peptides binding to a given protein surface. There may be a large number of composed peptides. Accordingly, the method may further comprise ranking each of the plurality of composed peptides before determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source. Ranking each of the plurality of composed peptides may be according to cleavage probability. Ranking may be performed by using any suitable ranking algorithm such as Fold-X, FlexPepDock, GalaxyPepDock, CABS-dock, pepATTRACT or PEPCrawler. The ranking preferably predicts the optimal peptide sequence(s) which should bind to the peptidase at the identified binding site. The method may further comprise selecting highest ranking peptides (e.g. selecting the top 20 ranking peptides) and taking only these highest ranking peptides forwards in the method. For example, the method may comprise determining whether at least one composed peptide from the highest ranking peptides matches at least one sequence within the selected protein source.
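
Merely as an illustrative sketch, selecting the highest ranking peptides may be expressed as a sort followed by a cut-off. The scores are assumed to have been produced already by one of the ranking algorithms listed above; none of those tools is called here. The function name `top_ranked` and the `higher_is_better` flag are hypothetical.

```python
from typing import List, Mapping


def top_ranked(
    scores: Mapping[str, float], n: int = 20, higher_is_better: bool = True
) -> List[str]:
    """Return the `n` best peptide sequences according to their scores.

    `scores` maps each composed peptide sequence to a score, e.g. a cleavage
    probability. For energy-like scores, where lower values are better, set
    `higher_is_better` to False.
    """
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ranked[:n]
```

Only the returned sequences would then be compared with the selected protein source.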

It will be appreciated that at the heart of the method for designing, identifying, screening and/or developing the peptidase described above, there is a method for predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source. Thus according to a related aspect of the invention, there is provided a computer-implemented method for predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source, the method comprising:

selecting a protein source;

selecting an initial peptidase;

identifying a binding site on the initial peptidase;

composing a plurality of peptides each of which interacts with the identified binding site;

marking a cleavage site on each of the plurality of peptides;

if there is at least one matching sequence, calculating the surface accessibility for each amino acid in the at least one matching sequence within the selected protein source;

using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible;

repeating the using, cleaving and recalculating steps until at least one resulting peptide sequence which is not surface accessible is formed; and

outputting the at least one resulting peptide sequence which is not surface accessible and optionally the peptidase which resulted in the output sequence.

It will also be appreciated that the method of predicting the one or more peptides formed by enzymatic hydrolysis of a selected protein source can be used to predict the protein source of peptides formed by enzymatic hydrolysis. Thus according to yet another related aspect of the invention, there is provided a computer-implemented method for predicting the protein source of peptides formed by enzymatic hydrolysis, the method comprising:

selecting an initial peptidase;

identifying a binding site on the initial peptidase;

repeating the using, cleaving and recalculating steps until at least one resulting sequence which is not surface accessible is formed;

comparing the at least one resulting sequence which is not surface accessible with a plurality of protein source sequences so as to identify one or more matching sequences; and

outputting the identified protein source and optionally the peptidase.

The optional features described in relation to the first aspect may also be combined with the related aspects of the invention.
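
Merely as an illustrative sketch, the comparing step of this related aspect may be reduced to a containment check against each candidate protein source. Exact substring matching is an assumption made for brevity. The function name `find_protein_sources` and the mapping from source names to sequences are hypothetical.

```python
from typing import Dict, List


def find_protein_sources(fragment: str, sources: Dict[str, str]) -> List[str]:
    """Return the names of all protein sources whose sequence contains `fragment`.

    `sources` maps a source name to its amino acid sequence.
    """
    if not fragment:
        raise ValueError("fragment must be non-empty")
    return [name for name, sequence in sources.items() if fragment in sequence]
```

An empty result indicates that none of the candidate sources contains the fragment, in which case further sources could be considered.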

The method(s) may be computer-implemented and may be practised with other computer system configurations, e.g. microprocessor systems, mainframe computers and the like.

According to another aspect of the invention, there is also provided a computer readable medium, i.e. any storage device that can store data which can be read by a computer system, for storing a computer program which when implemented on a computer system causes the steps of the method(s) above to be performed. Examples of a computer readable medium include a hard-drive, read-only memory, random-access memory, a compact disc, CD-ROM, a digital versatile disk, a magnetic tape, other non-transitory devices and other non-optical storage devices. The computer readable medium may also be distributed over a network coupled system so that the computer program code is stored and executed in a distributed fashion. The computer readable medium is preferably non-transitory.

Brief description of drawings

For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:

Figure 1 is a flowchart showing the steps in the method;

Figures 2a to 2f illustrate the outputs at various stages in the method of Figure 1; and

Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1.

Detailed description of drawings

Figure 1 is a flowchart illustrating the various steps in the method. The method may be considered to comprise two distinct stages: a first stage in which peptide sequences within a protein substrate which match peptide sequences within an enzyme substrate are identified and a second stage in which there is a determination of whether the identified sequences are accessible for binding and cleavage in a peptidase binding site.

As shown, the first step S100 in stage 1 is to select a peptidase (or protease). The terms may be used interchangeably and may be defined as an enzyme which breaks down peptides into amino acids, i.e. an enzyme which performs proteolysis. The peptidase may be identified from a suitable peptidase database such as a repository of protein-peptide complexes (e.g. protinDB). The database also stores data (e.g. sequence, structure and binding sites for each type of amino acid) regarding the peptidase. Thus the repository may be used to derive preferences in the form of position specific structural elements which describe a binding site environment in bound peptides.

Once the peptidase has been identified, the next step S102 is to identify at least one binding site (or active site) on the peptidase. The binding site may be identified by extracting the structural elements of the protein-protein interaction interface from the database. These elements capture the atomic composition and solvent accessibility of a central residue and its closest neighbours in the protein structure. The binding site may be identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner. The peptide partner is typically a sequence from a protein from a food source. The sequence for the peptide partner will typically have a fixed number of amino acids, e.g. 20. After the amino acid sequence of the peptide partner has been identified, there is a search for combinations of amino acids across the binding site which satisfy various constraints. For example, the constraints may be determined by applying machine learning or a scoring matrix to generate a model which enables differentiation of the preferred interactions for every interface amino acid in a protein-protein complex. An example of a suitable binding site is shown in Figure 2a.

Once a binding site is selected, the next step in the method is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This can be done using the model mentioned above to place the interacting amino acids from the peptide partner, linking the interacting residues by peptide bonds and constructing all initial conformations of potential interacting peptide(s). The composed peptides are constructed de novo (i.e. from scratch) using the amino acids which are considered to be placed in their optimal positions in the context of the binding site. One method for composing the peptide is to use a computational pipeline known as PEPcomposer which designs peptides binding to a given protein surface. PEPcomposer is described for example in "PepComposer: computational design of peptides binding to a given protein surface" by Obarska-Kosinska et al published in Nucleic Acids Research Advance Access 2016, April 30. The inputs are the structure of the target protein and an approximate definition of the binding site of the target protein. A search of monomeric proteins is conducted and a set of peptide backbone scaffolds having the same backbone arrangement as the binding site of the selected peptidase is identified. Once the backbone scaffolds are identified, optimal sequences for the identified scaffolds are designed.

Merely as an example, Figure 2b shows potential binding sites 20 highlighted in dark cyan and the residues 22 within the selected cut-off highlighted in dark violet. Figure 2c shows one of the designed peptides 24 at one binding site. The cleavage points are also shown in Figure 2c. The designed proteins are subjected to conformational refinement and scoring to rank them (step S106). Examples of suitable algorithms are FlexPepDock (described in "Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors" by Raveh et al published in PLoS One, 2011 Apr 29) or Fold-X (described in "The FoldX web server: an online force field" by Schymkowitz et al published in Nucleic Acids Res, 2005 Jul 1). Other suitable algorithms are described in "GalaxyPepDock: a protein-peptide docking tool based on interaction similarity and energy optimisation" by Lee et al published in Nucleic Acids Res, 2015 Jul 1; "CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site" by Kurcinski et al published in Nucleic Acids Res, Volume 43, Issue W1, PW419-424; "Fully Blind Peptide-Protein Docking with pepATTRACT" by Schindler et al published in Structure 23, 2015; or "PEPCrawler: a fast RRT-based algorithm for high-resolution refinement and binding affinity estimation of peptide inhibitors" by Donsky et al published in Bioinformatics, 2011 Oct 15.

The ranking predicts the optimal peptide sequence which should bind to the peptidase at the identified binding site. One important factor used to rank the designed proteins is the cleavage probability. This may be determined using a scoring function (e.g. dMM-PBSA) which reflects the interaction energy of a given peptide with a modelled "near-attack" conformation of the peptidase binding site. dMM-PBSA is described for example in "dMM-PBSA: A new HADDOCK scoring function for Protein-peptide docking" by Spiliotopoulos et al published in Front Mol BioSci 2016 Aug 31. For each predicted (or designed) protein, the associated cleavage site is marked as shown in Figure 2c. The next stage (S108) is to determine which of the highest ranked designed peptides (e.g. the top 20 or top 10) exactly match sequence stretches in a hydrolysate protein source (e.g. a food source such as whey, pea protein etc.). This matching step may be done by aligning the designed peptides with the sequences in the target protein source. A single target protein source may be used or a handful of target protein sources may be considered. One suitable algorithm for matching is known as Peptidematch which is a computational tool designed to quickly retrieve all occurrences of a given query peptide from a database known as UniProt Knowledgebase (UniProtKB) together with the isoforms. This is described for example in "A fast Peptide Match service for UniProt Knowledgebase" by Chen et al published in Bioinformatics 2013 Nov 1. The output from the tool is a summary table showing each match together with other information such as the matched sequence region(s) and links to corresponding proteins in a number of proteomic/peptide spectral databases. The results are grouped by taxonomy and can be browsed by organism, taxonomic group or taxonomy tree. Once the matching sequences have been identified, they are stored and then modelled (S110).
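
Merely as an illustrative sketch of the exact matching in step S108, the following Python fragment retrieves every occurrence of each designed peptide in a single target protein sequence. It does not call Peptidematch or query UniProtKB; it is a local stand-in using plain string search. The function names `find_occurrences` and `match_peptides` are hypothetical.

```python
from typing import Dict, Iterable, List


def find_occurrences(peptide: str, protein: str) -> List[int]:
    """Return the 0-based start positions of all exact occurrences of
    `peptide` in `protein`, including overlapping ones."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    hits = []
    start = protein.find(peptide)
    while start != -1:
        hits.append(start)
        # Resume one residue later so that overlapping hits are found.
        start = protein.find(peptide, start + 1)
    return hits


def match_peptides(peptides: Iterable[str], protein: str) -> Dict[str, List[int]]:
    """Map each peptide that occurs in `protein` to its start positions."""
    matches = {}
    for peptide in peptides:
        hits = find_occurrences(peptide, protein)
        if hits:
            matches[peptide] = hits
    return matches
```

The stored start positions allow the marked cleavage site on each designed peptide to be located later on the protein sequence.
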
A 3D model of each matching sequence is constructed, for example, using a known modelling computational tool such as iTasser (Iterative Threading Assembly Refinement) which is described for example in "I-TASSER server: new development for protein structure and function predictions" by Yang et al published in Nucleic Acids Res 2015, Vol 43, Issue W1, W174-181. The tool predicts a three-dimensional structural model of the protein molecules from the amino acid sequences. Monte Carlo simulations may be used in the prediction method as well as COACH which is a meta-server approach to protein-ligand binding site prediction. These techniques are described in "The I-TASSER suite: protein structure and function prediction" by Yang et al published in Nature Methods 12, P7-8 (2015). For example, the amino acid sequences are input and using a technique called fold recognition or threading (e.g. using LOMETS (Local Meta-Threading Server)), template structures are identified. LOMETS is described in "LOMETS: a local meta-threading server for protein structure prediction" by Wu et al published in Nucleic Acids Res 2007, Epub 2007 May 3.

These template structures are typically template fragments and a full-length structural model is constructed by assembling the template structures. The assembly may comprise multiple stages, for example in a first stage the template fragments are clustered to form a cluster centroid. This assembly stage may use restraints from LOMETS together with decoy-based optimized potential. Using restraints from the cluster, LOMETS and another algorithm known as TM-align, the cluster centroid is then reassembled. TM-align is described for example in "TM-align: a protein structure alignment algorithm based on the TM-score" by Zhang et al published in Nucleic Acids Res 2005 Apr 22. The inherent reduced potential is also used in the re-assembly. A structure with the lowest energy is found, e.g. by optimising using REMO H-Bond optimisation. This technique is described in "REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks" by Li et al published in Proteins 2009 Aug 15. This final model is then processed using the TM-align algorithm using the Protein Data Bank library. This processing gives the structural analogy together with the enzyme commission number, the gene ontology vocabulary, the binding site(s) and a prediction of the function.

As shown in Figure 1, the steps to generate a 3D model may be considered to be a first stage in the overall process. The second stage is to determine accessibility for binding and cleavage. The next step S112 is to calculate the surface accessibility for each amino acid in each of the modelled designed peptides. This can be done for example using Netsurf which is an ensemble of artificial neural networks trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. The techniques underpinning Netsurf are described for example in "A generic method for assignment of reliability scores applied to solvent accessibility predictions" by Petersen et al published in BMC Struct Biol 2009 Jul 31; 9:51. There may be two sets of artificial neural networks: a primary and a secondary network. The primary network has inputs of a Position-Specific Scoring Matrix (PSSM) and the raw output from secondary structure predictions. 'B/E Classification (buried or exposed classification)' is the raw output from the neural networks within the primary network. The secondary network is trained to predict the relative surface exposure of an amino acid using the 'B/E Classification' and the PSSM. The results are output from the web server.

At step S114, a determination is made as to whether or not one of the sequences identified in step S108 is surface accessible, for example by comparing the calculated surface accessibility with a threshold value which is indicative of a minimum level of surface accessibility. Only surface exposed amino acid stretches on proteins matching those peptides designed in previous steps are taken forwards. If the sequence is not surface accessible, the method ends and if no surface accessible sequences have been identified, no results are output. If the sequence is surface accessible, the matching protein (e.g. whey, pea protein) is aligned to the matching designed peptide (S116). Figure 2d shows an example of a 3D model of one of the designed peptides aligned to its matching protein.

The designed peptide is deleted from the structure so that only the sequence from the target protein source is taken forwards in the process. The next step is to cleave the sequence of the target protein source at the correct position into two peptides (S118). Once the sequence has been cleaved, the next step is to minimise each cleaved peptide (S120). An example of a minimised cleaved peptide is shown in Figure 2e.

The process then loops back to step S112 to calculate the surface accessibility for each amino acid in the cleaved, minimised peptide. There is a determination as to whether or not one of the sequences in the cleaved, minimised peptide is surface accessible. If so, the sequence is aligned, cleaved and minimised as before. The calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible and the method ends and outputs the resulting cleaved, minimised sequences which are the peptides resulting from enzymatic cleavage of the target protein source by the identified peptidase. Example outputs of sequences are shown in Figure 2f.
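
Merely as an illustrative sketch, the loop described above can be written as a work list which is processed until no fragment can be cleaved further. The accessibility calculation, alignment and minimisation steps are not implemented here; they are represented by a caller-supplied function `find_cut`, which is a hypothetical placeholder that returns the cleavage position in a fragment or `None` when no accessible cleavage site remains. The function `toy_cut` is a made-up rule used only to exercise the loop and does not model any real peptidase.

```python
from typing import Callable, List, Optional


def digest(sequence: str, find_cut: Callable[[str], Optional[int]]) -> List[str]:
    """Cleave `sequence` repeatedly until no fragment can be cleaved further.

    `find_cut(fragment)` returns the number of residues before the bond to
    cleave, or None if the fragment has no accessible cleavage site.
    """
    pending = [sequence]
    products = []
    while pending:
        fragment = pending.pop()
        cut = find_cut(fragment)
        if cut is None or not 0 < cut < len(fragment):
            products.append(fragment)  # no further cleavage is possible
        else:
            # Push the right-hand part first so the left-hand part is
            # processed next and the products keep their original order.
            pending.append(fragment[cut:])
            pending.append(fragment[:cut])
    return products


def toy_cut(fragment: str) -> Optional[int]:
    """Toy rule: cleave after the first K that is not the final residue."""
    position = fragment[:-1].find("K")
    return None if position == -1 else position + 1
```

For example, `digest("AAKGGKCC", toy_cut)` yields the fragments AAK, GGK and CC, each of which has no remaining cleavage site under the toy rule.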

The method described above is a structure based predictive method for identifying the peptides which arise from enzymatic cleavage of a protein food source. The final outputs shown in Figure 2f are the peptide fragments found in the hydrolysates. Step S108 determines which food proteins have sequences which align with the designed peptides and this matching (aligning) step means that the protein sources of the peptide fragments are identified. Thus, the identified protein source may be output in the final stage together with or instead of the resulting cleaved, minimised sequences. Furthermore, the designed peptides are ones which are predicted to interact with the binding site of the identified peptidase. This, coupled with the steps in the second stage in which only surface accessible sequences are taken forwards, means that only protein sources for which the identified peptidase (enzyme) will cause cleavage are identified. In other words, the method also identifies whether or not the peptidase (enzyme) will yield the desired cleavage. Thus, the selected peptidase which resulted in the cleaved, minimised sequences may be output in the final stage together with or instead of the resulting cleaved, minimised sequences.

The method is relatively quick and can be performed for multiple different peptidases (enzymes) or combinations of enzymes. Thus the method can be used to identify and hence design the enzymes or combinations of enzymes which mean that a selected protein source is cleaved by the designed enzyme. This can be done by selecting a new peptidase if no matching protein source that is of interest is identified in step S108 and repeating the steps of the methods.

Moreover, the method shown in Figure 1 predicts the products of enzymatic cleavage for a particular protein, e.g. how a food protein is digested to produce specific peptides (e.g. bioactive peptides). Accordingly, the method can be further used to design enzymes or combinations of enzymes which cleave a selected protein source into the desired hydrolysates (i.e. desired products). This can be done by selecting a new peptidase and repeating the steps of the method if no peptides of interest are output in the final stage.

In other words, the method allows:

i. Identification of protein sources of peptide fragments found in hydrolysates;

ii. Identification of the protein source for peptide fragments; and

iii. Identification of enzymes that could yield such cleavages.
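The re-selection of a peptidase when no peptide of interest is output can be pictured as a simple screening loop. The following Python sketch is illustrative only: `select_peptidase` and `toy_predict` are hypothetical names, the peptidase labels and peptide strings are invented placeholders, and the lookup table stands in for a full run of the method of Figure 1 for each candidate enzyme.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def select_peptidase(
    candidates: Iterable[str],
    predict_products: Callable[[str], Iterable[str]],
    desired: Set[str],
) -> Optional[Tuple[str, List[str]]]:
    """Return the first candidate whose predicted products include a desired peptide.

    predict_products is assumed to run the cleavage prediction for one
    peptidase and return the resulting peptide sequences.
    """
    for peptidase in candidates:
        hits = set(predict_products(peptidase)) & desired
        if hits:
            return peptidase, sorted(hits)
        # No peptide of interest: move on to the next candidate peptidase.
    return None


# Placeholder predictions standing in for the output of the full method.
toy_predictions = {
    "peptidase_A": ["AAK", "GGGK"],
    "peptidase_B": ["IPP", "VPP", "LK"],
}


def toy_predict(peptidase: str) -> List[str]:
    return toy_predictions.get(peptidase, [])


result = select_peptidase(["peptidase_A", "peptidase_B"], toy_predict, {"IPP", "VPP"})
print(result)  # ('peptidase_B', ['IPP', 'VPP'])
```

Returning the first successful candidate is a design choice of the sketch; a screen could equally collect every candidate that yields a desired peptide and compare them afterwards.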

The method may also be combined with bioactive prediction algorithms to identify enzymes or combinations of enzymes that may produce higher titres of certain peptides from a specific starting material.

Figure 3 is a schematic block diagram of a system for implementing the method of Figure 1. The method described above may be computer-implemented and Figure 3 shows one possible system for doing so. The system comprises a central computing system having a processor 70, a memory 72 and an interface 74, all of which are operably connected to one another. The processor 70, e.g. a central processing unit implemented in hardware, implements the method above. The processor is connected to the memory 72 (e.g. RAM, ROM or other suitable storage), which stores the computer code which implements the method. The processor and memory are operatively connected to the interface 74, which is an input/output device.

In line with the method shown in Figure 1, the first step is for the processor to select a peptidase from the peptidase database 76. The database is shown as a separate database connected to the interface. However, it will be appreciated that the database may also be an integral part of the central system. The processor then implements the method step of identifying a binding site. Once a binding site is identified, the next step is to compose (or construct) the sequence of at least one peptide which is predicted to interact with the identified binding site. This may be done in a composing server 78, e.g. a PEPcomposer server. The composing server 78 may receive a request from the processor 70 to compose the peptide(s). The composed peptides are input to the processor 70 from the composing server 78.

As shown in Figure 1, the composed peptides then need to be ranked. The processor may send the information on the composed peptides to a ranking server 80 which ranks the peptides using one of the algorithms identified above. The ranking of the peptides is returned to the processor 70. The next step is to determine which of the highest ranked designed peptides exactly match sequence stretches in a hydrolysate protein source. The highest ranked peptide sequences are sent to a peptide match server 82 (e.g. Peptidematch) which retrieves all matching sequences from a peptide database 84 (e.g. UniProtKB). The matching sequences are returned to the processor 70 from the peptide match server 82. The final step in the first stage is to model the matching sequences, e.g. using a modelling server 86 (such as I-TASSER). The models are returned to the processor 70 from the modelling server 86. These models may be stored in memory 72 or in another database (not shown) which is connected to the processor.
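The exact-match step can be illustrated with a simple substring search. The sketch below is an assumption-laden stand-in written in Python: it does not call the peptide match server 82 or query UniProtKB, but searches a small in-memory dictionary of invented sequences, and the name `find_exact_matches` is hypothetical.

```python
from typing import Dict, List, Tuple


def find_exact_matches(
    peptides: List[str],
    proteins: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Return (peptide, protein name, start index) for every exact occurrence.

    Overlapping occurrences are reported, since the search resumes one
    residue after each hit.
    """
    matches = []
    for peptide in peptides:
        for name, sequence in proteins.items():
            start = sequence.find(peptide)
            while start != -1:
                matches.append((peptide, name, start))
                start = sequence.find(peptide, start + 1)
    return matches


# Invented sequences standing in for entries retrieved from a protein database.
toy_proteins = {
    "protein_A": "MKTAYIAKQR",
    "protein_B": "GGTAYGGTAY",
}

matches = find_exact_matches(["TAY", "QQQ"], toy_proteins)
print(matches)
# [('TAY', 'protein_A', 2), ('TAY', 'protein_B', 2), ('TAY', 'protein_B', 7)]
```

Recording the start index of each match is useful in later steps, because the cleavage site on the matching sequence is taken from the aligned position of the composed peptide.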

In the second stage, the first step is to determine surface accessibility of the matching sequences and this is done using a surface accessibility server 88 which calculates values for the surface accessibility. These values are returned to the processor 70 from the surface accessibility server 88. The processor then determines whether or not the segments are surface accessible, e.g. by determining whether or not the calculated values are above a threshold value. Thereafter, for surface accessible segments only, the processor aligns the matching protein (e.g. whey, pea protein) to the matching designed peptide and then deletes the designed peptide from the structure. The sequence of the target protein source is then cleaved by the processor at the correct position and the resulting sequences are minimised. As explained in relation to Figure 1, the calculating, determining, aligning, cleaving and minimising steps are repeated until no further cleavage is possible. The method then ends and outputs the resulting cleaved, minimised sequences and/or the peptidase which resulted in these cleaved, minimised sequences and/or the protein source from which the cleaved, minimised sequences are derived. These output sequences may be stored in memory 72 or output on a display (not shown).
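The threshold determination can be sketched as below. This Python example rests on stated assumptions rather than on the values used by the method: the per-residue accessibility values are invented, the threshold of 0.25 is a hypothetical default, and averaging over the segment is only one possible way of combining per-residue values into a single decision. The name `is_surface_accessible` is likewise hypothetical.

```python
from typing import Sequence


def is_surface_accessible(
    values: Sequence[float],
    start: int,
    length: int,
    threshold: float = 0.25,  # hypothetical threshold, not taken from the method
) -> bool:
    """Decide whether a segment is surface accessible.

    values holds one accessibility value per amino acid of the protein.
    The segment is deemed accessible if its mean value is above the
    threshold (an assumed aggregation rule).
    """
    if length <= 0 or start < 0 or start + length > len(values):
        raise ValueError("segment lies outside the sequence")
    segment = values[start:start + length]
    return sum(segment) / len(segment) > threshold


# Invented per-residue relative accessibility values for a six-residue protein.
toy_values = [0.05, 0.10, 0.60, 0.70, 0.40, 0.02]

print(is_surface_accessible(toy_values, start=2, length=3))  # True
print(is_surface_accessible(toy_values, start=0, length=2))  # False
```

A stricter variant could require every residue in the segment, or the residues flanking the cleavage site, to exceed the threshold individually.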

In the above description, several steps are undertaken by separate servers, e.g. composing server, ranking server etc. However, it will be appreciated that the functionality of some or all of these servers may be integrated within a single server or within the central system itself. Alternatively, the functionality of some or all of these servers may be performed by the processor. Each of the servers may have similar hardware to the central system and may thus comprise a processor, e.g. a central processing unit implemented in hardware, a memory (e.g. RAM, ROM or other suitable storage) and an interface which is an input/output device.

The contents of all papers and documents mentioned in this document are incorporated herein by reference. The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

Claims

1. A computer-implemented method for designing a peptidase which provides desired enzymatic cleavage of a selected protein source, the method comprising:

selecting an initial peptidase;

identifying a binding site on the initial peptidase;

outputting the peptidase which resulted in the at least one resulting sequence which is not surface accessible.

2. A method according to claim 1, comprising

determining whether the at least one resulting sequence which is not surface accessible matches the sequence of at least one desired peptide;

if there is no matching sequence, selecting an alternative peptidase and repeating the method steps of claim 1;

if there is a matching sequence, outputting the peptidase which resulted in the at least one desired peptide.

3. A method according to claim 1 or claim 2, comprising

minimising at least one of the resulting sequences before recalculating the surface accessibility of the at least one resulting sequence.

4. A method according to any one of claims 1 to 3, comprising

modelling the at least one matching sequence within the selected protein source and/or the at least one composed peptide before calculating the surface accessibility.

5. A method according to claim 4, comprising

determining the cleavage site on the at least one matching sequence within the selected protein source by

aligning the model of the at least one matching sequence within the selected protein source with the model of its matching composed peptide and

identifying the cleavage site on the at least one matching sequence as the site which aligns with the marked cleavage site on the at least one composed peptide.

6. A method according to claim 5, comprising

discarding the model of the matching composed peptide before cleavage.

7. A method according to any one of claims 1 to 6, wherein using the calculated surface accessibility to determine whether the at least one matching sequence within the selected protein source is surface accessible comprises

comparing the calculated surface accessibility with a threshold value;

and determining that the at least one matching sequence is surface accessible if the calculated surface accessibility is above the threshold value.

8. A method according to any one of claims 1 to 7, wherein the binding site is identified by scanning the surface of the selected peptidase for sites for each of the amino acids present in a peptide partner.

9. A method according to claim 8, wherein the peptide partner has a fixed number of amino acids.

10. A method according to any one of claims 1 to 9, wherein composing a plurality of peptides comprises

identifying a set of peptide backbone scaffolds having a matching backbone arrangement to the binding site of the selected peptidase and

determining optimal sequences for the identified backbone scaffolds.

11. A method according to any one of claims 1 to 10, comprising ranking each of the plurality of composed peptides before determining whether at least one composed peptide from the plurality of peptides matches at least one sequence within the selected protein source.

12. A method according to claim 11, comprising

ranking each of the plurality of composed peptides according to cleavage probability.

13. A method according to claim 11 or claim 12, comprising

selecting highest ranking peptides; and

determining whether at least one composed peptide from the highest ranking of the plurality of peptides matches at least one sequence within the selected protein source.

14. A method according to any one of the preceding claims, comprising outputting the at least one resulting sequence which is determined to not be surface accessible.

15. A method of manufacturing the peptidase designed by the method of any one of claims 1 to 14.

16. A non-transitory computer readable medium storing processor code which when implemented on a computer causes the computer to carry out the steps of any one of claims 1 to 14.

17. A system comprising a processor and a memory, wherein the processor is configured to implement the method of any one of claims 1 to 14.