WO2023016621A1 - Ternary complex determination for plausible targeted protein degradation using deep learning and design of degrader molecules using deep learning - Google Patents
Ternary complex determination for plausible targeted protein degradation using deep learning and design of degrader molecules using deep learning
- Publication number
- WO2023016621A1 (application PCT/EP2021/025372)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- protein
- computer implemented
- deep
- implemented method
- degrader
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- The core of the present invention is a new method for the determination of a degrader molecule and the associated ternary complex by the use of machine learning modules tackling the various requirements of ternary complex determination.
- the method according to the invention also allows the determination of the ternary complex formed by a pre-designed, e.g., human-designed degrader molecule, thus serving as an in-silico tool to validate manually designed degraders.
- the method comprises the following four major steps:
- Step 1 3D structure determination of relevant proteins (E3 ligase and the protein of interest).
- Step 2 Determination of the interactions between each fragment of the degrader and the corresponding proteins as well as identification of the corresponding interaction sites using module "Deep Interaction Prediction" DIP.
- Step 3 Protein-Protein complex prediction using modules "Bayesian Optimization" BO, "Deep Linker Generation", "Deep Molecular Conformation Generation", and "Deep Graph Representation Learning".
- Step 4 Refinement of the ternary complex, with the designed linker.
- Deep Interaction Prediction DIP is used for converting the geometry of the protein molecule and degrader fragments into a graph and applying deep learning techniques to this graph to determine properties such as the protein-fragment and protein-protein interactions (used in Steps 2 and 3 above).
- Deep Molecular Conformation Generation DMCG is used for generating large ensembles of energetically stable (low energy) 3D conformations of the degrader molecule (used in Step 3 above).
- the methodology for determining a ternary complex includes the following steps, which are briefly described in the subsections below. For more details regarding the Bayesian Optimization loop and the three deep learning modules, see the section on our modules.
- the value chain for designing a degrader molecule starts with an amino acid sequence or protein structure that acts as a potential target for a degrader molecule.
- the method according to the invention starts from such information.
- the 3D structure is determined via in-house models that are inspired by open-source frameworks such as AlphaFold and RosettaFold for proteins or RDKit in the case of fragments.
- homology modeling can be used.
- the direct use of experimentally determined 3D structures as an input to the pipeline is possible. This step outputs 3D structures not only of the proteins of interest but also of the E3 ligases.
- the computation of the protein-protein interactions and the resulting complex formed is the deciding factor in solving the problem of ternary complex determination. This is because the protein-protein interaction is the primary interaction stabilizing the ternary complex.
- an iterative optimization process with active learning and Bayesian Optimization is applied, which uses the constraints imposed by the linker design to determine the structure of the protein-protein complex.
- a fitness function for each candidate protein-protein structure is acquired, which is computed with the help of the following modules.
- Module Deep Linker Generation: Generative models are used to predict whether a valid linker can be generated to connect the fragments as bound in this protein-protein complex.
- the model takes into account the relative position and orientations of the degrader fragments as well as pharmacological constraints to design a valid linker. This makes it possible to discard protein-protein complexes for which the bound degrader fragments cannot be linked by a valid linker structure.
- Module Deep Molecular Conformation Generation: this method allows a potentially large dataset of conformations (> 100000) to be generated efficiently.
- This conformation generation is used to score the linkers generated by Deep Linker Generation above. Additionally, when dealing with a pre-designed degrader, by analyzing a large dataset of generated conformations, the probability of a valid degrader conformation within a particular protein-protein complex candidate can be determined. This gives an additional score that allows non-viable protein-protein complex candidates to be filtered out.
- the use of the deep-learning modules for protein-protein interactions, linker generation and molecular conformation generation means that the space of interactions in the ternary complex can be screened while avoiding expensive docking and molecular dynamics simulations.
- a Monte Carlo based method is used to pack the designed linker in the complexes and perform energy minimization.
- Candidates for this include AMBER and MERCK force fields for the degrader molecule and PyRosetta for the proteins and ternary complexes. Then clustering techniques are used to choose the complexes with the best energy and consensus from possible ternary complexes.
- the goal of the pipeline is the determination of ternary complex structures consisting of the proteins of interest, the degrader and the E3 ligase. This in turn involves modeling the interactions between proteins, i.e., the proteins of interest and the E3 ligase, as well as between proteins and the degrader. Typical methods to achieve this apply particularly expensive docking operations.
- a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related".
- the objects correspond to mathematical abstractions called vertices (also called nodes or points) and each of the related pairs of vertices is called an edge (also called link or line).
- a graph is depicted in diagrammatic form as a set of dots or circles for the vertices, joined by lines or curves for the edges.
- molecules are represented as graphs through their point clouds and chemo-geometric features, and this representation is processed using deep graph representation learning DGRL network architectures.
- the final deep learning architecture leverages the fact that all the nodes in a certain neighborhood of a node share common properties with that node (in the real world but also in their graph representation). These properties, which are expressed with edges, can be "summarized" with the help of weight sharing. That is the reason why the main layer components of the neural network used are convolutional layers.
- Cluster-GCN (Chiang, et al., 2019) is used.
- This convolutional layer architecture not only demonstrates superior performance on similar molecular datasets, but also reduces the memory and time complexity by a large margin. This fact is of considerable importance because the network has to be fast during runtime.
- the subsequent layer is GraphConv (Morris, et al., 2019).
- This convolutional layer architecture proved useful not only because of its self-supervised representation-learning capabilities, which allow it to exploit atom-level complexities, their geometries and all of the interactions between the atoms, but also because of its efficiency in computing the graph convolutions, which is again important during runtime.
- the code that encompasses these two main layers is PyTorch-Geometric code (Fey & Lenssen, 2019) that performs the standard batching and pooling and glues these layers together, so that a progressively lower-dimensional representation is reached until the final prediction of the score function is made.
- the first step in the deep graph representation learning DGRL pipeline is to map the initial 3D structures of proteins and fragments to a suitable representation that respects the chemo-geometric properties of the biomolecules involved. Subsequently, deep graph representation learning DGRL methods are applied to model the respective interactions.
- a graph consists of nodes and edges, i.e., atoms and their connections.
- the graph that describes a degrader fragment is constructed using either k-nearest-neighbor or ball queries. In a k-nearest-neighbor graph, each node (e.g., an atom) is connected to its k nearest neighboring constituents. Ball query graphs are constructed by specifying a cutoff distance: if the distance between two constituents lies below this threshold value, the algorithm is allowed to place an edge.
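The two graph construction schemes can be illustrated with a minimal NumPy/SciPy sketch; the toy coordinates, the value of k and the cutoff radius are placeholders rather than values prescribed here:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_edges(coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Connect every constituent (e.g., atom) to its k nearest neighbours."""
    tree = cKDTree(coords)
    # query k+1 neighbours because the nearest neighbour of a point is the point itself
    _, idx = tree.query(coords, k=k + 1)
    edges = [(i, j) for i, row in enumerate(idx) for j in row[1:]]
    return np.array(edges)            # shape (num_edges, 2)

def ball_query_edges(coords: np.ndarray, cutoff: float = 3.0) -> np.ndarray:
    """Place an edge whenever two constituents are closer than the cutoff distance."""
    tree = cKDTree(coords)
    pairs = tree.query_pairs(r=cutoff)
    return np.array(sorted(pairs))    # shape (num_edges, 2), undirected pairs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    atoms = rng.uniform(0.0, 10.0, size=(20, 3))   # toy 3D point cloud of 20 atoms
    print(knn_edges(atoms, k=3).shape, ball_query_edges(atoms, cutoff=3.0).shape)
```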
- the model computes a representation for the surface of the proteins on which the DGRL models operate. Suitable surface representations are given by surface meshes, which are computed by triangulation of the (virtual) protein surface, or surface point clouds. It is these points on this virtual surface that are connected to their neighboring points to form the relevant graph.
- 3D coordinates of the estimated protein surface, 3D atomic coordinates and their respective atom types and, lastly, the normal vectors which are estimated based on the local coordinate features are used as input for the estimation of the chemo-geometric features.
- the pipeline proceeds to generate embeddings of chemical and geometrical properties of the molecules. This assumes that a complete description of chemo-geometric features is needed to model protein-protein and protein-fragment interactions accurately.
- the procedure is straightforward. Due to the graph structure of small molecules, well-known deep graph representation learning DGRL strategies can be employed to learn embeddings of chemical information on the nodes of the graphs. To describe the 3D structures free of any bias from the center of mass and global rotations, the deep graph representation learning DGRL models depend only on inter-atomic distances and angles between constituents.
- the graph representation is of points on the surface mesh or surface point cloud, which do not correspond directly to the constituents of the protein.
- a graph is created where each surface point is connected to the k atoms of the protein that are closest (by Euclidean distance) to it.
- the chemical information associated with the atoms is processed and, by the use of deep graph representation learning DGRL methods, representations of this information are embedded onto the surface points. More concretely, different convolutional and attention layers are leveraged to learn a low dimensional representation of the chemical information. This learning is not only based upon the 3D coordinates of atoms and the atom types; some chemical information is also generated explicitly and fed into the deep graph representation learning DGRL module. More concretely, this information consists of angles between atoms, interatomic distances, hydrophobicity and hydrogen bond potential. It has been observed that providing some explicit information lets the network learn the hidden features that are not provided explicitly.
- Figure 3 shows the main deep graph representation learning DGRL pipeline: In this block, all the pre-processing steps are combined in one final model with various convolutional layers. These layers mainly consist of GraphConv (Morris, et al., 2019) and ClusterGCNConv (Chiang, et al., 2019) and were constructed with a manual hyperparameter search to minimize the loss and achieve the best ROC-AUC score possible for the classification.
- the necessary pre-processing of the 3D structures as well as the necessary chemical and geometrical representations of the protein surfaces is already accomplished.
- it is then possible to proceed to learn, with the help of geometric deep learning, which surface regions are the interaction sites.
- the process of achieving the interaction site classification can be divided into two parts. The first one, as noted above, is done with suitable chemo-geometric features, where the best low-dimensional representation has been learned.
- the subsequent step is applying the main deep graph representation learning DGRL pipeline on these features so that the classification can eventually be performed.
- Figure 4 shows a Deep Interaction Prediction module: Taking in inputs in the form of atomic 3D coordinates and atom types, this information is used to estimate the protein surface. For the calculation of protein surfaces, standard algorithms for the conversion of point cloud representations into meshes, e.g., Points2Surf (Erler, et al., 2020) and Delaunay triangulation, are used. After calculating the protein surface and selecting patches, the patches are forwarded together with the atomic coordinates and the atoms into a pipeline to generate geometric, chemical, and local coordinate features.
- this information is forwarded in form of graph representations into a deep learning pipeline with multiple convolutional layers that ought to learn deep relationships and rotational invariance of the protein surfaces in question.
- the main components are GraphConv (Morris, et al., 2019) and ClusterGCNConv (Chiang, et al., 2019) layers, which are combined to perform the binary classification indicating if the surface in question is a potential interaction site
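A rough PyTorch-Geometric sketch of such a stack of ClusterGCNConv and GraphConv layers with a per-surface-point binary read-out is given below; the layer widths, depth and toy inputs are illustrative assumptions, not the exact architecture described here:

```python
import torch
from torch import nn
from torch_geometric.nn import ClusterGCNConv, GraphConv

class InteractionSiteClassifier(nn.Module):
    """Per-node binary classification: is this surface point a potential interaction site?"""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = ClusterGCNConv(in_dim, hidden)
        self.conv2 = GraphConv(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        return torch.sigmoid(self.head(x)).squeeze(-1)   # interaction probability per node

# toy usage: 100 surface points with 16 chemo-geometric features each
x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 400))              # random surface-graph connectivity
model = InteractionSiteClassifier(in_dim=16)
site_probability = model(x, edge_index)                   # shape (100,)
```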
- Figure 5 shows a Fragment-protein interaction module: Estimation and pre-processing for the protein component in this architecture is the same as for the interaction site prediction presented in Figure 4.
- the other constituent of the input pair i.e., the fragment, needs a different representation.
- the starting point is similar for the fragment, where atom coordinates and atom types are taken as input.
- the 3D structure of the fragment is mapped to a graph representation, which is capable of modelling interatomic relationships. This is achieved by using a combination of DimeNet [(Klicpera, et al., 2020a) and (Klicpera, et al., 2020b)] and explicit features that model interatomic relationships.
- the aim is the prediction of whether the protein and a fragment will interact. Again, the procedure is similar to the previous section, where the necessary pre-processing has been performed for the protein and the fragment in the ternary complex, i.e., representing them as the respective graphs and embedding in them the geometric and chemical features.
- This resultant graph embedding is processed by the main deep graph representation learning DGRL pipeline, where a binary label is predicted indicating whether the protein interacts with the ligand or not.
- a dataset of proteins and ligands interacting is used.
- the ground truth of whether the pair does in fact interact is used in order to train the deep graph representation learning DGRL pipeline to recognize what constitutes an interaction and what does not.
- the elaborated procedure may be considered as "fuzzy" docking, where no Root-Mean-Square-Deviation (RMSD) values are predicted as part of the inference, but rather a simple binary classification indicating if two proteins would interact or not.
- Figure 6 shows a protein-protein interaction module. This interaction is modelled similarly to the case of interaction site identification and fragment-protein interaction. To be precise, the pipeline that was used to determine the interaction site on a single protein (as shown in Figure 4) is applied to both proteins in parallel. To achieve the desired effect, the loss function is adjusted to make sure that the pipeline learns to model the interactions between proteins.
- the learned interaction is not quantified in terms of a continuous value like the RMSD, but rather by a binary classification indicating the interaction between the respective pair of proteins.
- a surrogate model (a Gaussian Process; see the surrogate model explanation below) is calculated to predict the combined-fitness using the scores obtained in the loop from step 2.
- An important fact here is that the surrogate function can report the uncertainty in its prediction.
- a new set of conformations/orientations is selected for which the surrogate model lacks knowledge, i.e., expresses high uncertainty, or predicts a high score. This tradeoff between exploitation and exploration is managed by an acquisition strategy as shown below.
- the surrogate model is a model that takes as input the representation (i.e., RRT + NMA coordinates) of a particular protein-protein complex candidate and predicts the associated combined-fitness. It is trained using the actual combined-fitness as data points.
- a Gaussian Process model is used, that can predict not only an estimate of the combined-fitness, but also give a reliable measure of the uncertainty in its estimate.
- the Kernel function used for the Gaussian Process is the well-known Matern Kernel that is modified to handle the relative translations and rotations. This specific kernel function is not essential to the advantage proposed by this patent and can be substituted with any valid alternative in the representation space.
- the acquisition strategy is a key aspect of a Bayesian Optimization BO loop and determines in what manner and to what extent exploration and exploitation are traded off.
- the fact that the surrogate model reports the uncertainty of its estimate is crucial here and allows to make principled decisions regarding this tradeoff.
- Several standard acquisition strategies may be used, for instance, noisy Expected Improvement, Upper Confidence Bound, and Knowledge Gradient. These strategies are implemented by use of the openly available BoTorch framework (Balandat, et al., 2020).
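A compact sketch of one such Bayesian Optimization step with the openly available BoTorch framework is given below; the seven-dimensional candidate encoding, the synthetic fitness values and the Upper Confidence Bound settings are illustrative assumptions rather than the exact configuration described here (note that BoTorch's SingleTaskGP uses a Matern kernel by default):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll            # recent BoTorch versions
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# toy data: candidate complexes encoded as 7D vectors (3D translation + 4D quaternion),
# each paired with a stand-in combined-fitness value
train_X = torch.rand(20, 7, dtype=torch.double)
train_Y = torch.rand(20, 1, dtype=torch.double)

# surrogate model: Gaussian Process regression on the candidate representation
gp = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

# acquisition strategy: Upper Confidence Bound trades off exploration and exploitation
ucb = UpperConfidenceBound(gp, beta=2.0)
bounds = torch.stack([torch.zeros(7), torch.ones(7)]).double()
candidate, acq_value = optimize_acqf(ucb, bounds=bounds, q=1,
                                     num_restarts=5, raw_samples=64)
print("next complex candidate to evaluate:", candidate)
```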
- each complex candidate is represented by the relative rotation and translation (RRT) between the two constituent proteins, as well as a vector representation of the conformations of these proteins.
- the relative translation is represented by a 3D vector between the center of masses of the two proteins.
- the relative rotation is represented by a 4D normalized quaternion.
- Each candidate complex is sampled by picking a random RRT and conformation using an even distribution over the above representation space.
- a uniformly random direction is picked for translation with the distance exponentially distributed.
- the rotations are selected evenly at random.
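A minimal sketch of this sampling scheme (uniformly random direction, exponentially distributed distance, uniformly random rotation as a normalized quaternion) is given below; the scale of the exponential distribution is an illustrative assumption:

```python
import numpy as np

def sample_rrt(rng: np.random.Generator, distance_scale: float = 10.0):
    """Sample one relative rotation and translation (RRT) candidate."""
    # uniformly random direction: normalize an isotropic Gaussian sample
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    # exponentially distributed distance between the two centres of mass
    distance = rng.exponential(scale=distance_scale)
    translation = distance * direction                 # 3D relative translation
    # uniformly random rotation: normalize a 4D Gaussian sample to a unit quaternion
    quaternion = rng.normal(size=4)
    quaternion /= np.linalg.norm(quaternion)           # 4D normalized quaternion
    return translation, quaternion

rng = np.random.default_rng(42)
candidates = [sample_rrt(rng) for _ in range(5)]
```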
- the conformations of the constituent proteins are represented by Normal Mode Analysis (NMA) coordinates.
- Convolutional neural networks are applied, which operate on graph representations of the protein molecules to predict the score. These representations account for the geometric and chemical properties in order to predict features that are subsequently processed to eventually yield a measure of the interaction strength.
- a Deep Linker Generation model is used, which takes as input the coordinates of the fragments, as bound to the respective proteins in their respective positions and orientations, and thereby the fragments' relative orientation (RRT). The model then generates a linker that joins the two fragments. This linker is then scored on the basis of any number of pharmacological constraints such as toxicity and drug-likeness. Additionally, through the use of the Deep Molecular Conformation Generation module, the geometric viability of the linker is determined. Together, this provides the constraint-fitness.
- a deep learning-based approach (Deep Molecular Conformation Generation) is used to generate a large dataset (> 100000 datapoints) of energetically stable (low energy) degrader conformations, including the two fragments and the linker.
- Each generated degrader conformation is characterized by the relative rotation and translation (RRT) between its two fragments, and the distribution of valid conformations over the RRT space is learned. For instance, one may fit a mixture of Gaussians using expectation maximization. Hence, given the RRT of the two proteins, since the binding pocket for each of the degrader fragments is known, the RRT between the degrader fragments can be computed. The learned distribution function can be used to compute the constraint score.
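The suggested mixture-of-Gaussians fit over the RRT space could be sketched as follows, using scikit-learn's expectation-maximization implementation; the 7-dimensional RRT encoding (3D translation plus 4D quaternion), the random stand-in data and the number of components are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# stand-in for the RRTs of the generated low-energy degrader conformations,
# each encoded as a 7D vector (3D relative translation + 4D quaternion)
rng = np.random.default_rng(0)
conformation_rrts = rng.normal(size=(5000, 7))

# fit the distribution of valid conformations over the RRT space via EM
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(conformation_rrts)

def constraint_score(rrt_between_fragments: np.ndarray) -> float:
    """Log-likelihood of a fragment-fragment RRT under the learned distribution."""
    return float(gmm.score_samples(rrt_between_fragments.reshape(1, -1))[0])

print(constraint_score(rng.normal(size=7)))
```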
- the combined-fitness can be any function of the PPI-fitness and the constraint-fitness that mimics a logical AND operation. This means that if either of the fitness scores indicates a particularly unfit protein-protein complex candidate, the combined fitness must be low. For instance, if the PPI-fitness and the constraint-fitness are normalized to lie between 0 and 1, the product of these fitness scores would be a valid combined-fitness.
- One of the key considerations in ternary complex determination is the stability and validity of the degrader molecule itself. In the Bayesian Optimization BO protocol, this is specified through the constraint-fitness. As previously described, one of the methods to achieve it is to analyze the dataset of stable (low energy) conformations of the degrader molecule. A method that can generate a large number (> 100000) of conformations of a large molecule such as a degrader, which can have more than 60 atoms is needed.
- the problem of molecular conformation generation i.e., predicting an ensemble of low energy 3D conformations based on a molecular graph, is traditionally treated with either stochastic or systematic approaches.
- the former is based on molecular dynamics (MD) simulations or Markov Chain Monte Carlo (MCMC) techniques.
- Stochastic algorithms can be accurate but are difficult to scale to larger molecules (e.g., proteins) as the runtime becomes prohibitive.
- systematic (rule-based) methods are based on careful selection and design of torsion templates (torsion rules), and knowledge of rigid 3D fragments. These methods can be fast and generate conformations in the order of seconds. However, their prediction might become inaccurate for larger molecules, or molecules that are not subject to any of these rules (torsion rules). Therefore, systematic methods are fast, but they may not generalize.
- an end-to-end trainable machine learning model that can handle and generate conformations is preferred.
- it models conformations in a SE(3) invariant manner, which means that the likelihood of a particular conformation is unaffected by rigid translation and rotation operations.
- This is a desirable inductive bias for molecular generation tasks, as molecules do not change if the entire molecule is translated or rotated.
- This model is based on a recently proposed machine learning technique, i.e., score-based generative models. The score is the gradient of the log density of the data distribution with respect to the data.
- the score of the data distribution can be considered as a vector (gradient field) that guides the molecule towards stable (low energy) conformations as shown in Figure 8.
- annealed Langevin dynamics can be leveraged to create an ensemble of stable conformations within a short amount of time. It is also possible to fix some parts of the molecules (two fragments) and apply the gradient (score) on other parts of the molecule (e.g., linker) to generate constrained conformations. Using the ensembles of generated conformations, a function can be learned, that predicts the likelihood of an energetically stable linker for a particular relative position and orientation of the fragments.
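A minimal sketch of such constrained, annealed Langevin sampling is shown below; `score_net` stands for a trained score network (here replaced by a dummy), and the noise levels, step sizes and atom counts are illustrative assumptions:

```python
import torch

def annealed_langevin(score_net, coords, linker_mask, sigmas, steps_per_sigma=20, eps=1e-4):
    """Annealed Langevin dynamics over atom coordinates.

    coords:      (N, 3) initial random coordinates of the degrader
    linker_mask: (N, 1) 1.0 for linker atoms that may move, 0.0 for fixed fragment atoms
    sigmas:      decreasing sequence of noise levels
    """
    x = coords.clone()
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_sigma):
            noise = torch.randn_like(x)
            grad = score_net(x, sigma)                        # estimated score, shape (N, 3)
            update = 0.5 * step * grad + torch.sqrt(torch.tensor(step)) * noise
            x = x + linker_mask * update                      # only linker atoms move
    return x

# toy usage with a dummy score "network" pulling atoms towards the origin
dummy_score = lambda x, sigma: -x
coords = torch.randn(60, 3)
mask = torch.cat([torch.zeros(20, 1), torch.ones(20, 1), torch.zeros(20, 1)])  # fragments fixed
stable_conformation = annealed_langevin(dummy_score, coords, mask, sigmas=[1.0, 0.5, 0.1])
```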
- Figure 8 shows the Deep Molecular Conformation Generation from the 2D graph:
- the input is the graph, and the goal is to generate an ensemble of stable (low energy) 3D conformations. The process is initialized with random 3D coordinates for the molecule in 3D space, and in each iteration these coordinates change a little towards a more stable conformation.
- the coordinate change is guided by a pseudo-force, which comes from the estimation of the score.
- the score is the gradient of the log density of the data distribution, and it is learned based on the training data. After that, this score is used to guide the atoms to a specific conformation through stochastic Langevin dynamics.
- a machine learning model has been leveraged for generating conformations from input molecular graphs, so data has been used for training the model.
- the data that has been used for training is the GEOM-QM9 and GEOM-DRUGS data (Axelrod & Gomez-Bombarelli, 2020), which consists of molecular graphs and corresponding ground truth conformations.
- QM9 contains smaller molecules (up to 9 heavy atoms), whereas DRUGS contains larger, drug-like molecules. More information about the training dataset can be found in Table 1.
- Tab. 1 shows the statistics of the GEOM data, which contains the QM9 and DRUGS datasets.
- the method that is used in the present example is based on score matching generative models that have been used recently in the machine vision domain for generating realistic images (Song & Ermon, 2019).
- the goal of a score-based generative model is to estimate the score (gradient of the data distribution with respect to data) by minimizing the following loss.
- This gradient can be considered as some pseudo force that guides the evolution of molecules towards stable (low energy) conformations.
- a noise conditional score-based generative model (Song & Ermon, 2021) is used.
- the goal is to estimate the noisy version of the data score:
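For reference, the standard noise-conditional (denoising) score matching objective introduced by Song & Ermon has the following form; this is the generic formulation of the named technique, and the exact loss used in the described model may differ in its details:

```latex
\ell(\theta;\sigma) \;=\; \tfrac{1}{2}\,
  \mathbb{E}_{p_{\mathrm{data}}(\mathbf{r})}\,
  \mathbb{E}_{\tilde{\mathbf{r}}\sim\mathcal{N}(\mathbf{r},\,\sigma^{2}I)}
  \Big[\,\big\|\,\mathbf{s}_{\theta}(\tilde{\mathbf{r}},\sigma)
  + \tfrac{\tilde{\mathbf{r}}-\mathbf{r}}{\sigma^{2}}\,\big\|_{2}^{2}\,\Big],
\qquad
\mathcal{L}(\theta) \;=\; \frac{1}{L}\sum_{i=1}^{L}\sigma_{i}^{2}\,\ell(\theta;\sigma_{i})
```

Here r denotes the atom coordinates, σ_1 > ... > σ_L the noise levels, and s_θ the score network, whose output approximates the score of the noise-perturbed data distribution.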
- the score network s(r; θ) can be anything that maps the input molecule to the gradient with respect to the input coordinates (the output is 3N-dimensional, where N is the number of atoms in the molecule).
- a message passing neural network (MPNN) is used as the score network.
- the input to the MPNN is a molecule (graph) with nodes (atoms) and edges (bonds).
- Figure 9 shows a message passing neural network.
- the update functions φ_e, φ_v, φ_u update the edge (e), node (v), and global (u) features, respectively; the aggregation function ρ_e→v reduces edge features to nodes, and ρ_v→u reduces node features to the global features.
- the score network, as shown in Figure 10, is an MPNN that updates the edge and node features at each step.
- the output consists of three coordinates for each node, which represent the pseudo-force (gradient) that changes the position of that node.
- An MPNN layer updates the edge features e_ij and the node features v_i, and computes a global feature u at each step.
- edge features can be updated by using a learned function of the current edge feature as well as the node features of the connected nodes. Then, for each node, the edge features of the connected edges can be aggregated and the node features updated using a learned function of this aggregation.
- global features belong to the whole graph, in this case the molecule.
- ρ_e→v denotes a differentiable, permutation invariant aggregation function, e.g., sum, mean or max, and φ_e, φ_v, φ_u denote differentiable functions the parameters of which can be learned, such as MLPs (Multi-Layer Perceptrons).
- element-wise summation has been used for the aggregation function and MLPs for the differentiable functions.
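A minimal sketch of one such MPNN update step, assuming MLPs as the learned functions φ and element-wise sum as the aggregation ρ, is shown below; the feature dimensions and the final 3D pseudo-force head are illustrative assumptions, not the exact score network described here:

```python
import torch
from torch import nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

class MPNNLayer(nn.Module):
    def __init__(self, node_dim, edge_dim, global_dim):
        super().__init__()
        self.phi_e = mlp(edge_dim + 2 * node_dim, edge_dim)     # edge update
        self.phi_v = mlp(node_dim + edge_dim, node_dim)         # node update
        self.phi_u = mlp(global_dim + node_dim, global_dim)     # global update

    def forward(self, v, e, u, edge_index):
        src, dst = edge_index                                    # (2, num_edges)
        # update each edge from its current feature and the features of its two nodes
        e = self.phi_e(torch.cat([e, v[src], v[dst]], dim=-1))
        # rho_{e->v}: element-wise sum of incoming edge features per node
        agg = torch.zeros(v.size(0), e.size(-1)).index_add_(0, dst, e)
        v = self.phi_v(torch.cat([v, agg], dim=-1))
        # rho_{v->u}: sum of node features into the global feature
        u = self.phi_u(torch.cat([u, v.sum(dim=0, keepdim=True)], dim=-1))
        return v, e, u

# toy molecule: 12 atoms, 22 directed bonds
v, e, u = torch.randn(12, 16), torch.randn(22, 8), torch.zeros(1, 4)
edge_index = torch.randint(0, 12, (2, 22))
layer = MPNNLayer(node_dim=16, edge_dim=8, global_dim=4)
v, e, u = layer(v, e, u, edge_index)
pseudo_force = nn.Linear(16, 3)(v)                               # per-atom 3D gradient (score)
```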
- the generation is initiated from random coordinates in 3D; the coordinates are updated sequentially based on the learned score to arrive at an ensemble of low energy conformations.
- each generated linker graph corresponds to a complete degrader graph when considered with the two fragments.
- based on graph/pharmaceutical metrics such as uniqueness, chemical validity, quantitative estimate of drug-likeness, synthetic accessibility, toxicity, solubility, ring aromaticity, and/or pan-assay interference compounds, a fitness score can be reported to the surrogate model.
- the energy, as determined either by classical methods such as force fields or by dedicated machine learning algorithms and normalized per degree of freedom of the molecule, presents itself as a viable measure of the validity of the degrader, since it reports on the molecule's strain.
- by removing the relative orientations from its architecture, the model can generate linker graphs without any structural information as input. In that case, however, the quality of the generated linkers is expected to be lower.
- the model is inspired by DeLinker (Imrie, et al., 2020), with most fundamental differences listed at the bottom of this section.
- the model is a Variational Autoencoder (VAE), whereby both the encoder as well as the decoder are implemented via standard Gated Graph Neural Networks (GGNN).
- the decoder takes as input a set of latent variables and generates a linker to connect the input fragments.
- the encoder on the other hand, imposes a distribution over the latent variables that is conditioned on the graph and structure of the unlinked input fragments.
- the fragment graph X is processed using the encoder GGNN, yielding the set of latent variables z_v, one for each node (atom) in the graph.
- the decoder is fed a low-dimensional latent vector z derived via a learned mapping from the node embeddings of the label (ground truth) degrader (i.e., the target degrader supposed to be generated). Loosely speaking, this allows the decoder to learn to generate different "types" of linkers conditioned on z (i.e., via a conditioned multi-modal distribution).
- the model can be augmented to learn a prediction of constraints such as toxicity and the like. Then, during runtime, by optimizing over z, z_v, the decoder can improve the quality of the generated linkers with respect to these constraints.
- both z and z_v are regularized to approximate the standard normal distribution.
- a set of candidate atoms are added to the graph and initialized with random node features. Using these features, the atom types are initialized.
- the features z_v, z, the atom types l_v, and the features and types of the candidate atoms are initialized.
- at each step, one bond, which can be of any type, is chosen connecting an unconnected candidate node to an already connected node in the graph.
- the valency of the already connected node also affects the choice of the bond. Bonds continue to be chosen for this node until a bond to a special "STOP" atom is picked, at which point the next connected atom in the queue is chosen. This queue is created and traversed in a breadth-first manner. Note that every bond that is selected changes the graph. This means that the features z_v, l_v are recomputed in each iteration.
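The breadth-first generation loop described above can be sketched schematically as follows; the learned bond selection is replaced by a random stand-in (`pick_bond`), so the snippet only illustrates the control flow (candidate atoms, STOP token, breadth-first queue), not the actual trained decoder:

```python
import random
from collections import deque

STOP = "STOP"

def pick_bond(focus, unconnected, rng):
    """Stand-in for the learned bond selection; the real model scores all candidate bonds."""
    return rng.choice(list(unconnected) + [STOP])

def generate_linker(fragment_exit_atoms, num_candidate_atoms=6, seed=0):
    rng = random.Random(seed)
    unconnected = set(range(num_candidate_atoms))        # candidate linker atoms
    bonds, queue = [], deque(fragment_exit_atoms)        # breadth-first queue of connected atoms
    while queue:
        focus = queue.popleft()
        while True:
            target = pick_bond(focus, unconnected, rng)
            if target == STOP:                           # move on to the next connected atom
                break
            bonds.append((focus, f"L{target}"))          # add a bond (type omitted in this sketch)
            unconnected.discard(target)
            queue.append(f"L{target}")                   # newly connected atom joins the queue
            # in the real model, z_v and the atom-type features are recomputed here
    return bonds

print(generate_linker(["E1", "E2"]))
```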
- During generation, one can draw z from a standard normal distribution and add noise to the encoding of X to calculate z_v. Note that, if during training one learns to predict the properties mentioned below as a function of z, z_v, then during generation one can optimize over z, z_v to condition the model to generate degraders of better quality. Properties such as a quantitative estimate of drug-likeness, synthetic accessibility, toxicity, solubility, ring aromaticity, and/or pan-assay interference compounds are considered in this context.
- Figure 12 illustrates the structural information provided, i.e., the fragments' relative orientation. This allows direct interfacing with the RRT coordinates used in the Bayesian Optimization pipeline. (The relative orientation coordinates fed to the Deep Linker Generation model: the two rings represent the fragments of a degrader. The distance from atom L_1 to atom L_2, the angles between the vectors E_1-L_1 and L_1-L_2 (α_1) as well as between the vectors L_1-L_2 and L_2-E_2 (α_2), and the dihedral angle φ (stemming from all three mentioned vectors) are processed by the model as structural information.)
- E_1-L_1 and E_2-L_2 constitute rotatable bonds by design of the graph generation model.
- the following bond-angle-torsion coordinates completely specify the relative orientation of the fragments: the lengths E_1-L_1, L_1-L_2, L_2-E_2, the bond angles α_1 and α_2 and the dihedral angle φ.
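Computing such bond-angle-torsion coordinates from the 3D positions of the four anchor atoms E_1, L_1, L_2, E_2 could look roughly as follows (a plain NumPy helper with arbitrary example coordinates; the angle convention used here, i.e., the bond angle at the central atom, is an assumption):

```python
import numpy as np

def bond_angle_torsion(e1, l1, l2, e2):
    """Return the lengths E1-L1, L1-L2, L2-E2, the angles alpha1, alpha2 and the dihedral phi."""
    e1, l1, l2, e2 = map(np.asarray, (e1, l1, l2, e2))
    b1, b2, b3 = l1 - e1, l2 - l1, e2 - l2

    def angle(u, v):
        cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

    # dihedral angle phi around the L1-L2 axis (standard atan2 formulation)
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    phi = np.degrees(np.arctan2(np.dot(m1, n2), np.dot(n1, n2)))

    lengths = [np.linalg.norm(b) for b in (b1, b2, b3)]
    # bond angle at L1 (between L1->E1 and L1->L2) and at L2 (between L2->L1 and L2->E2)
    return lengths, angle(-b1, b2), angle(-b2, b3), phi

lengths, alpha1, alpha2, phi = bond_angle_torsion(
    [0, 0, 0], [1.5, 0, 0], [2.5, 1.0, 0], [4.0, 1.0, 0.5])
print(lengths, alpha1, alpha2, phi)
```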
- the physical bond lengths hardly vary.
- the atom types of L_1 and L_2 are not available prior to the graph generation process but are modeled as placeholder atoms. Thus, the model is not fed with the bond lengths L_1-E_1 and L_2-E_2.
Abstract
The invention relates to a computer implemented, machine learning based method for determining ternary complexes in targeted protein degradation by representing biomolecules as graphs and then feeding these graphs as inputs into a machine learning system, comprising the steps of: determination of the 3D structure of the relevant proteins (1); determination of the interactions between each fragment of the degrader and the corresponding proteins as well as identification of the corresponding interaction sites (2); protein-protein complex prediction (3); and refinement of the ternary complex with the designed linker (4).
Description
Title of the invention
Ternary complex determination for plausible targeted protein degradation using Deep Learning and design of degrader molecules using Deep Learning.
Field of the invention
The present invention relates to a computer implemented, machine learning based method for determining ternary complexes in targeted protein degradation.
Background of the invention
Proteins play critical roles in maintaining the life of organisms. Correct protein folding controls cell health and survival. However, most proteins are inherently prone to aggregation in their misfolded or partially misfolded state. In addition, misfolding or misregulation of proteins leads to the development of many diseases, including neurodegenerative diseases, cancers and type 2 diabetes mellitus. Therefore, cells must constantly adjust their protein composition to maintain a homeostasis of their proteomes. Misfolded proteins are refolded or degraded by quality control systems, and elimination of misfolded proteins is critical for maintaining protein homeostasis and cell viability.
Under physiological conditions a complex network that includes folding enzymes, chaperones, lectins and ATP-driven motors controls the elimination of misfolded proteins. The ubiquitin-proteasome system (UPS) and autophagy are the two major intracellular pathways for protein degradation. The UPS and autophagy have long been considered as independent degradation pathways with little or no interaction points. In spite of growing evidence of close coordination and complementarity between the two systems, they are actually different mechanisms: UPS is responsible for the degradation of short-lived proteins and soluble misfolded proteins, whereas autophagy eliminates long-lived proteins, insoluble protein aggregates and even whole organelles (such as mitochondria, peroxisomes), macromolecular compounds, and intracellular parasites (e.g., certain bacteria).
In addition, small interfering RNA (siRNA) and clustered regularly interspaced short palindromic repeats/associated protein nuclease (CRISPR-Cas9) technologies can also down-regulate or eliminate proteins. However, these two technologies also have limitations: for example, CRISPR-Cas9 technology has undesired off-target effects and low efficiency, which limit its application in vivo. Inefficient delivery to target cells in vivo and non-specific immune responses following systemic or local administration are barriers for the clinical application of siRNA. Researchers are still developing various technology platforms to improve in vivo delivery of therapeutic siRNA.
In addition, heat shock proteins (HSPs) also play important roles in protein kinase degradation. For example, the levels of many oncogenic kinases, such as ERBB2, BRAF-V600E, FGFR-G719S and BCR-ABL, are reported to be tightly coupled to heat shock protein 90 (HSP90).
The methods mentioned above for controlling protein degradation are mostly achieved via biomacromolecules. In order to target a broader range of proteins with sufficiently high efficiency for clinical application, in recent years pharmaceutical researchers have developed a series of new strategies for protein degradation using small molecules. One representative strategy is mono- and heterobifunctional degraders that degrade proteins by hijacking the UPS. These degraders are small molecules that bind both E3 ubiquitin (U) ligase and target proteins, thereby leading to the exposed lysine on the target protein being ubiquitinated by the E3 ubiquitin ligase complex, followed by UPS-mediated protein degradation. Theoretically, degraders not only provide binding activity, but also have great potential to eliminate protein targets that are "undruggable" by traditional inhibitors or are non-enzymatic proteins, e.g., transcription factors. In addition, the degrader technique is "event-driven", which does not require direct inhibition of the functional activity of the target protein. These characteristics make degrader technologies an attractive strategy for targeted protein degradation (TPD).
Therefore, targeted protein degradation using the mono- and heterobifunctional degrader technologies is emerging as a novel therapeutic method to address diseases, such as cancer, driven by the aberrant expression of a disease-causing protein.
The binding of a degrader molecule to a target protein (protein of interest) as well as to an E3 ligase at the same time results in the formation of a ternary complex. This ternary complex can induce the targeted degradation of the pathogenic protein, as the E3 ligase triggers protein degradation via proteasomes by ubiquitination. In order to induce TPD, positive cooperativity between the molecules forming the ternary complex is necessary.
Ternary complex formation in degrader function has been known for several years, as degraders that are weaker binders can also induce the degradation of proteins under the condition of ternary complex formation between a protein of interest, a degrader molecule, and a recruited E3 ligase. The significance of such ternary complexes was shown with the first ternary complex crystal structures, which displayed positive cooperativity and newly formed protein-protein interactions.
According to the state of the art, the determination of ternary complexes is performed by traditional computer-based methods such as molecular dynamics simulations and docking. AutoDock, AutoDock Vina, DOCK, FlexX, GLIDE, GOLD, and similar software are used for fragments and, e.g., Zdock as well as RosettaDock for proteins.
Two recent publications, "In Silico Modeling of PROTAC-Mediated Ternary Complexes: Validation and Application" (Michael L. Drummond and Christopher I. Williams; J. Chem. Inf. Model. 2019, 59, 4, 1634-1644) and "PRosettaC: Rosetta Based Modeling of PROTAC Mediated Ternary Complexes" (Daniel Zaidman, Jaime Prilusky, and Nir London; J. Chem. Inf. Model. 2020, 60, 10, 4894-4903), employ these methods. Both articles treat the field of targeted protein degradation. The ternary complex predictions of the presented frameworks are validated via the reconstruction of already known ternary complex crystal structures.
CN109785902A provides a method to predict the degradation of target proteins by means of state-of-the-art techniques in the field of homology modeling, molecular dynamics simulations and docking, or by means of Convolutional Neural Networks. However, the problem of predicting ternary complexes involves resolving a significantly larger set of interactions.
For an accurate determination of ternary complexes, not only are fragment-protein interactions and protein-protein interactions crucial, but importantly, the effects that the linker imposes on these interactions need to be considered as well.
It is an object of the invention to enable the process of designing a degrader molecule that results in a stable ternary complex.
It is also an object of the invention to determine the structure (conformation, orientation in a 3D space) of the ternary complex that results from a particular degrader.
SUMMARY OF THE INVENTION
The object of the invention is achieved by the independent claims. The dependent claims describe advantageous developments and modifications of the invention.
According to the invention, a framework for ternary complex formation is provided, which enables the treatment of this cluster of interactions via the use of machine learning models.
Brief description of the drawings
Figure 1 shows a summary of the protocol for degrader design and ternary complex prediction.
Figure 2 shows the method of estimation of chemo-geometric features.
Figure 3 shows the main DGRL pipeline.
Figure 4 shows the method for estimation and pre-processing for the protein component.
Figure 5 shows a fragment-protein interaction module.
Figure 6 shows a protein-protein interaction module.
Figure 7 shows the Bayesian Optimization Loop.
Figure 8 shows Deep Molecular Conformation Generation from the 2D graph.
Figure 9 shows Message Passing Neural Networks.
Figure 10 shows an example of a score network.
Figure 11 shows the Deep Linker Generation.
Figure 12 shows the relative orientation coordinates fed to the Deep Linker Generation model.
Tab. 1 shows the statistics of the GEOM data, which contains the QM9 and DRUGS datasets.
The illustration in the drawings is in schematic form. It is noted that in different figures, similar or identical elements may be provided with the same reference signs.
Description of the drawings
Figure 1 shows a summary of the method for degrader design and ternary complex prediction. The method consists of four serial steps: the 3D structure determination of proteins 1, the interaction determination between protein and ligand 2, the protein-protein complex generation 3, and the refinement of the ternary complex structure 4.
The core of the present invention is a new method for determining a degrader molecule and the associated ternary complex, using machine learning modules to tackle the various requirements of ternary complex determination.
Additionally, a specialized optimization method using Bayesian Optimization (BO) is presented, which allows the efficient inclusion of the effect of the linker on the ternary complex, while simultaneously informing the linker generation.
The method according to the invention also allows the determination of the ternary complex formed by a pre-designed, e.g., human-designed degrader molecule, thus serving as an in-silico tool to validate manually designed degraders.
As shown in Figure 1, the method comprises the following four major steps:
Step 1. 3D structure determination of relevant proteins (E3 ligase and the protein of interest).
Step 2. Determination of the interactions between each fragment of the degrader and the corresponding proteins as well as identification of the corresponding interaction sites using module "Deep Interaction Prediction" DIP.
Step 3. Protein-Protein complex prediction using the modules "Bayesian Optimization" BO, "Deep Linker Generation", "Deep Molecular Conformation Generation", and "Deep Graph Representation Learning".
Step 4. Refinement of the ternary complex, with the designed linker.
As mentioned above the following functional modules are used in the method:
Module "Deep Interaction Prediction" DIP is used for converting the geometry of the protein molecule and degrader fragments into a graph and applying deep learning techniques to this graph to determine properties such as the protein-fragment and protein-protein interactions (used in Steps 2 and 3 above).
With Module "Deep Linker Generation "DLG a valid linker sub-structure is generated that connects the two molecules on basis of the coordinates of the two fragments of a degrader molecule. The validity of this generated linker is scored on metrics such as drug-likeness, synthetic accessibility, toxicity, and/or solubility. In conjunction with the modul "Deep Molecular Conformation Generation" DMCG e, this model plays a key role in providing a fitness function for the Bayesian Optimization loop in Step 3.
Module "Deep Molecular Conformation Generation" DMCG is used to efficiently generate a large number (> 100 000) of conformations of a designed degrader. When given a pre-designed degrader (either man-
ually or via Deep Linker Generation), this set of conformations enables the determination, which conformations of the linker and consequently the degrader are valid within the ternary complex structure. This information provides a key fitness function for the Bayesian Optimization loop in Step 3.
The methodology for determining a ternary complex includes the following steps, which are briefly described in the subsections below. For more details regarding the Bayesian Optimization loop and the three deep learning modules, see the section on our modules.
The value chain for designing a degrader molecule starts with an amino acid sequence or protein structure that acts as a potential target for a degrader molecule. With significant progress in predicting the 3D structure from an amino acid sequence, referred to as the 'protein folding problem', the method according to the invention starts from such information.
Starting from an amino acid sequence associated with proteins, or SMILES strings associated with fragments, the 3D structure is determined via in-house models that are inspired by open-source frameworks such as AlphaFold and RoseTTAFold for proteins or RDKit in the case of fragments. Homology modeling can also be used here. In addition, the direct use of experimentally determined 3D structures as an input to the pipeline is possible. This step outputs 3D structures not only of the proteins of interest but also of the E3 ligases.
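For illustration, a minimal sketch of generating a 3D fragment structure from a SMILES string with RDKit is given below. The SMILES string, the random seed and the force field choice are placeholders, not parameters prescribed by the invention.

```python
# Minimal sketch: 3D structure generation for a degrader fragment from a SMILES
# string using RDKit (the SMILES below is an arbitrary placeholder).
from rdkit import Chem
from rdkit.Chem import AllChem

smiles = "c1ccc(CC(=O)N)cc1"            # placeholder fragment
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)                   # explicit hydrogens improve embedding

AllChem.EmbedMolecule(mol, randomSeed=42)   # generate an initial 3D conformation
AllChem.MMFFOptimizeMolecule(mol)           # relax it with the MMFF force field

coords = mol.GetConformer().GetPositions()  # (n_atoms, 3) numpy array
print(coords.shape)
```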
Fragment-protein interactions (either fragment-E3 ligase or fragment-pathogenic protein) are then determined using the module Deep Interaction Prediction. The output is essential to understand the extent and mode of the fragment binding to the corresponding protein. This interaction also plays a role in determining the way in which the constituent proteins interact within the ternary complex.
The computation of the protein-protein interactions and the resulting complex formed is the deciding factor in solving the problem of ternary complex determination. This is because the protein-protein interaction is the primary interaction stabilizing the ternary complex.
When calculating the structure of protein-protein complexes, one sees that the proteins can interact with each other in a large number of possible orientations and conformations, each resulting in a candidate structure for the protein-protein complex. However, not all protein-protein complexes are feasible when considering the presence of a degrader molecule. For many complexes, when one considers the positioning of the fragments bound to the respective proteins, it is seen that a valid linker molecule cannot be designed.
Using this fact, an iterative optimization process with active learning and Bayesian Optimization is applied, which uses the constraints imposed by the linker design to determine the structure of the protein-protein complex. In the Bayesian Optimization loop, a fitness function for each candidate protein-protein structure is computed with the help of the following modules.
Module Determination of Protein-Protein Interactions DIP. Using the graph representations of the E3 ligase and the proteins of interest as created by the Deep Interaction Prediction module, the protein-protein interactions between the E3 ligase and the proteins of interest are predicted using graph-based convolutional neural networks.
Module Deep Linker Generation. Generative models are used to predict whether a valid linker can be generated to connect the fragments as bound in this protein-protein complex. The model takes into account the relative positions and orientations of the degrader fragments as well as pharmacological constraints to design a valid linker. This makes it possible to discard protein-protein complexes for which the bound degrader fragments cannot be linked by a valid linker structure.
Once the degrader molecules have been generated, this method allows the efficient generation of a potentially large dataset of conformations (> 100 000). This conformation generation is used to score the linkers generated by Deep Linker Generation above. Additionally, when dealing with a pre-designed degrader, the probability of a valid degrader conformation within a particular protein-protein complex candidate can be determined by analyzing a large dataset of generated conformations. This gives an additional score that allows the candidate protein-protein complexes to be filtered for viability.
The efficiency of this approach is twofold. The use of the Bayesian Optimization loop along with the fitness function ensures that no computation is wasted on protein-protein complex candidates that are invalid in the context of the degrader molecule.
Additionally, the use of the deep-learning modules for protein-protein interactions, linker generation and molecular conformation generation means that the space of interactions in the ternary complex can be screened while avoiding expensive docking and molecular dynamics simulations.
To calculate a final stable ternary complex structure, a Monte Carlo based method is used to pack the designed linker into the complexes and perform energy minimization. Candidates for this include the AMBER and Merck force fields for the degrader molecule and PyRosetta for the proteins and ternary complexes. Clustering techniques are then used to choose the complexes with the best energy and consensus from the possible ternary complexes.
The optimization loop above means that this step, although more computationally expensive per computation, becomes cheap because it is applied only to a small number of candidate structures.
The goal of the pipeline is the determination of ternary complex structures consisting of the proteins of interest, the degrader and the E3 ligase. This in turn involves modeling the interactions between proteins, i.e., the proteins of interest and the E3 ligase, as well as between proteins and the degrader. Typical methods to achieve this apply particularly expensive docking operations.
With the present method, the use of deep learning methods that first process the structural information of the proteins and fragments into graph representations is proposed. These representations are then processed using graph-based convolutional neural networks, which consider the geometric and chemical properties of the biomolecules, to compute features that can be used to predict the interactions between the respective molecules.
In mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called vertices (also called nodes or points) and each of the related pairs of vertices is called an edge (also called link or line). Typically, a graph is depicted in diagrammatic form as a set of dots or circles for the vertices, joined by lines or curves for the edges.
According to the present invention, molecules are represented as graphs through their point clouds and chemo-geometric features, and this representation is processed using deep graph representation learning DGRL network architectures. The final deep learning architecture leverages the fact that all the nodes in a certain neighborhood of a node share common properties with that node (in the real world but also in their graph representation). These properties, which are expressed with edges, can be "summarized" with the help of weight sharing. That is the reason why the main layer components of the used neural network are convolutional layers.
A brief summary of the layers in the used architecture is as follows: At first, Cluster-GCN (Chiang, et al., 2019) is used. This convolutional layer architecture has not only demonstrated superior performance on similar molecular datasets, but it also reduces the memory and time complexity by a large margin. This fact is of considerable importance because the network has to be fast during runtime. The subsequent layer is GraphConv (Morris, et al., 2019). This convolutional layer architecture proved useful not only because of its self-supervised representation-learning capabilities, which allow it to exploit atom-level complexities, their geometries and all of the interactions between the atoms, but also because of its efficiency in computing the graph convolutions, which is again important during runtime. These two main layers are combined using PyTorch-Geometric code (Fey & Lenssen, 2019) that performs the standard batching and pooling and glues the layers together, so that a progressively lower-dimensional representation is reached until the final prediction of the score function is made.
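A minimal sketch of such a layer stack is given below. The feature sizes, layer count and readout are illustrative assumptions and do not reproduce the actual architecture; only the two named PyTorch-Geometric layer types are taken from the description above.

```python
# Minimal sketch: a PyTorch-Geometric stack combining ClusterGCNConv and
# GraphConv layers for per-node binary classification (e.g., interaction-site
# prediction on surface points). Sizes are illustrative only.
import torch
import torch.nn.functional as F
from torch_geometric.nn import ClusterGCNConv, GraphConv

class SiteClassifier(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = ClusterGCNConv(in_dim, hidden)
        self.conv2 = GraphConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, 1)      # one logit per node

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.out(x).squeeze(-1)

# Toy usage: 100 surface points with 16 chemo-geometric features each.
x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 400))       # random graph for illustration
probs = torch.sigmoid(SiteClassifier(16)(x, edge_index))   # interaction-site probabilities
```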
The first step in the deep graph representation learning DGRL pipeline is to map the initial 3D structures of proteins and fragments to a suitable representation that respects the chemo-geometric properties of the biomolecules involved. Subsequently, deep graph representation learning DGRL methods are applied to model the respective interactions.
Once the 3D structures of proteins/fragments have been obtained, e.g., by means of a machine learning model or through access to experimental data, the coordinates of proteins and fragments are mapped to a graph representation. In its most general form, a graph consists of nodes and edges, i.e., atoms and their connections. In this way, the pipeline can model neighborhood relations between the constituents of proteins/fragments, which are used to embed chemical properties in abstract feature vectors that are used for making predictions, as explained below.
First, the procedure for fragments is sketched: The graph that describes a degrader fragment is constructed using either k-nearest-neighbor or ball queries. In the former approach, a node, e.g., an atom, is connected to k of its neighboring nodes. Ball query graphs, on the other hand, are constructed by specifying a cutoff distance. If the distance between two constituents lies below this threshold value, the algorithm is allowed to place an edge.
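A minimal sketch of the two graph constructions, using a k-d tree, is shown below. The coordinates, the value of k and the cutoff distance are placeholders for illustration.

```python
# Minimal sketch: k-nearest-neighbour vs. ball-query graph construction.
import numpy as np
from scipy.spatial import cKDTree

coords = np.random.rand(30, 3) * 10.0     # placeholder 3D atom coordinates
tree = cKDTree(coords)

# k-nearest-neighbour graph: connect each atom to its k closest atoms.
k = 4
_, idx = tree.query(coords, k=k + 1)       # first neighbour is the atom itself
knn_edges = [(i, j) for i, row in enumerate(idx) for j in row[1:]]

# Ball-query graph: place an edge whenever two atoms are closer than a cutoff.
cutoff = 2.5                               # illustrative threshold (Angstrom)
ball_edges = list(tree.query_pairs(r=cutoff))

print(len(knn_edges), "kNN edges,", len(ball_edges), "ball-query edges")
```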
In the case of proteins, it is not the atoms of the atomic point cloud that form the nodes of the graph. Instead, the model computes a representation for the surface of the proteins on which the DGRL models operate. Suitable surface representations are given by surface meshes, which are computed by triangulation of the (virtual) protein surface, or surface point clouds. It is these points on this virtual surface that are connected to their neighboring points to form the relevant graph.
As shown in Figure 2, 3D coordinates of the estimated protein surface, 3D atomic coordinates and their respective atom types and, lastly, the normal vectors which are estimated based on the local coordinate features are used as input for the estimation of the chemo-geometric features.
For the chemical features, structural information of atoms, e.g., interatomic distances and relative angles, and other features such as hydrophobicity are modelled explicitly. The full set of features is aggregated and fed into the deep graph representation learning DGRL pipeline, which is used to learn low dimensional representations of the inputs and the generated features. For the geometric features the following strategy is used: Starting from the 3D coordinates of the surface representation, curvature features, e.g., the curvedness and shape index (Koenderink & Doorn, 1992), are determined. Intermediate features are also learned from these inputs before again applying an aggregation layer and letting the deep graph representation learning DGRL pipeline learn a suitable low dimensional representation.
After the initial data, i.e., the raw atomic coordinates of proteins and fragments, are mapped to a suitable graph representation, the pipeline proceeds to generate embeddings of chemical and geometrical properties of the molecules. This assumes that a complete description of chemo-geometric features is needed to model protein-protein and protein-fragment interactions accurately.
For fragments the procedure is straightforward. Due to the graph structure of small molecules, well-known deep graph representation learning DGRL strategies can be employed to learn embeddings of chemical information on the nodes of the graphs. To describe the 3D structures free of any bias from the center of mass and global rotations, the deep graph representation learning DGRL models depend only on inter-atomic distances and angles between constituents.
The situation is different for proteins, where the graph representation consists of points on the surface mesh or surface point cloud, which do not correspond directly to the constituents of the protein. In order to embed the chemical features onto these points, a graph is created in which each surface point is connected to the k atoms of the protein that are closest (by Euclidean distance) to it. The chemical information associated with the atoms is processed, and representations of it are embedded onto the surface points by the use of deep graph representation learning DGRL methods. More concretely, different convolutional and attention layers are leveraged to learn a low dimensional representation of the chemical information. This learning is based not only on the 3D coordinates of the atoms and the atom types; some chemical information is also generated explicitly and fed into the deep graph representation learning DGRL module. More concretely, this information consists of angles between atoms, interatomic distances, hydrophobicity and hydrogen bond potential. It has been observed that providing some explicit information lets the network learn the remaining, hidden features more easily.
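As a simple stand-in for the learned convolutional/attention embedding, the sketch below connects each surface point to its k nearest atoms and aggregates their chemical features with inverse-distance weights. The shapes, k, and the weighting scheme are illustrative assumptions, not the learned embedding itself.

```python
# Minimal sketch: pushing atom-level chemical features onto surface points via
# a bipartite k-nearest-atom graph and distance-weighted averaging.
import numpy as np
from scipy.spatial import cKDTree

atom_xyz = np.random.rand(200, 3) * 30.0    # placeholder atomic coordinates
atom_feat = np.random.rand(200, 8)          # placeholder chemical features
surf_xyz = np.random.rand(500, 3) * 30.0    # placeholder surface point cloud

k = 6
dists, nbr = cKDTree(atom_xyz).query(surf_xyz, k=k)   # k nearest atoms per point

w = 1.0 / (dists + 1e-6)                    # inverse-distance weights
w = w / w.sum(axis=1, keepdims=True)        # normalised per surface point

surf_chem = (atom_feat[nbr] * w[..., None]).sum(axis=1)   # (500, 8) surface features
print(surf_chem.shape)
```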
An important geometric feature that becomes available when processing surfaces is curvature, which plays an important role in identifying interaction sites for both fragments and proteins on a protein surface. To model geometric information, the procedure is similar to the chemical approach. Along with the initial 3D coordinates of the point clouds and the local coordinate features that are fed into the deep graph representation learning DGRL, some explicit geometric features based on the curvature are generated in addition. Examples of these features are the shape index and the curvedness (Koenderink & Doorn, 1992); shallow MLPs (Multi-Layer Perceptrons) are used to learn intermediate geometric features. These features depend on the principal curvatures, which are defined as the maximum and the minimum normal curvatures at a given point. Existing approaches suggested in the literature, such as (Melzi, et al., 2019) and (Cao, et al., 2021), are used to calculate these curvatures. Again, the motivation for using explicit geometrical features is to take the burden off the main deep graph representation learning DGRL to learn these features. In this way, it can focus on condensing latent features into a low dimensional representation, which helps the algorithm to make correct predictions.
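The two explicit curvature descriptors can be computed from the principal curvatures as sketched below. The estimation of the principal curvatures themselves (e.g., via the cited approaches) is not shown; the curvature values here are random placeholders, and the sign convention of the shape index follows one common reading of Koenderink & van Doorn.

```python
# Minimal sketch: shape index and curvedness from principal curvatures k1 >= k2.
import numpy as np

k1 = np.random.randn(500)                        # placeholder principal curvatures
k2 = np.random.randn(500)
k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)  # enforce k1 >= k2

eps = 1e-9
# Shape index in [-1, 1]: distinguishes cups, saddles, ridges and caps.
shape_index = (2.0 / np.pi) * np.arctan((k1 + k2) / (k2 - k1 - eps))
# Curvedness: overall magnitude of curvature, independent of the shape type.
curvedness = np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)

geometric_features = np.stack([shape_index, curvedness], axis=1)   # (500, 2)
```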
Figure 3 shows the main deep graph representation learning DGRL pipeline: In this block, all the pre-processing steps are combined in one final model with various convolutional layers. These layers mainly consist of GraphConv (Morris, et al., 2019) and ClusterGCNConv (Chiang, et al., 2019) and were constructed
with a manual hyperparameter search to minimize the loss and achieve the best ROC-AUC score possible for the classification.
Given that the protein of interest will have an interaction located on dedicated regions of its surface, it is necessary to predict the areas where this interaction occurs. At this stage, as outlined above, the necessary pre-processing of the 3D structures as well as the necessary chemical and geometrical representations of the protein surfaces has already been accomplished. Thus, the pipeline proceeds to learn, with the help of geometric deep learning, which surface regions are the interaction sites. More concretely, the process of achieving the interaction site classification can be divided into two parts. The first one, as noted above, is performed with suitable chemo-geometric features, for which the best low-dimensional representation has been learned. The subsequent step is applying the main deep graph representation learning DGRL pipeline on these features so that the classification can eventually be performed. The main idea behind this pipeline is weight sharing, i.e., convolutions. Most properties of an atom or surface point depend on its immediate neighborhood. Since it is a reasonable assumption that points in a neighborhood interact similarly across different neighborhoods, the choice of convolutional layers is a natural one to calculate the low dimensional representations of these properties. The choice of convolutional layers (GraphConv and Cluster-GCN) is motivated by their previous performance on molecular data as well as their theoretical advantages, such as their ability to perform self-supervised learning and to exploit atomic level granularity, as well as their superior speed. The latter is important for the runtime systems. This classification will be of importance for further tasks down the pipeline.
Figure 4 shows a Deep Interaction Prediction module: Taking inputs in the form of atomic 3D coordinates and atom types, this information is used to estimate the protein surface. For the calculation of protein surfaces, standard algorithms for converting point cloud representations into meshes, e.g., Points2Surf (Erler, et al., 2020) and Delaunay triangulation, are used. After calculating the protein surface and selecting patches, the patches are forwarded together with the atomic coordinates and the atom types into a pipeline to generate geometric, chemical, and local coordinate features. After the generated chemical and geometric features have been concatenated, and the local coordinate features have been created, this information is forwarded in the form of graph representations into a deep learning pipeline with multiple convolutional layers that ought to learn deep relationships and rotational invariance of the protein surfaces in question.
To summarize: The main components are GraphConv (Morris, et al., 2019) and ClusterGCNConv (Chiang, et al., 2019) layers, which are combined to perform the binary classification indicating whether the surface region in question is a potential interaction site.
Figure 5 shows a fragment-protein interaction module: Estimation and pre-processing for the protein component in this architecture is the same as for the interaction site prediction presented in Figure 4. The other constituent of the input pair, i.e., the fragment, needs a different representation. The start is similar for the fragment, where atom coordinates and atom types are taken. The 3D structure of the fragment is then mapped to a graph representation, which is capable of modelling interatomic relationships. This is achieved by using a combination of DimeNet [(Klicpera, et al., 2020a) and (Klicpera, et al., 2020b)] and explicit features that model interatomic relationships. Then, the chemical and geometric features are embedded using GraphConv (Morris, et al., 2019) and the outputs are passed to a final deep pipeline, which is composed of multiple GraphConv and ClusterGCNConv (Chiang, et al., 2019) layers. The last step exploits the outputs of the deep graph representation learning DGRL for the fragment and the deep graph representation learning DGRL for the protein to learn whether the fragment and the protein interact. This prediction relies again on binary classification.
Once the possible surface regions where the binding will occur are identified, the aim is to predict whether a protein and a fragment will interact. Again, the procedure is similar to the previous section: the necessary pre-processing has been performed for the protein and the fragment in the ternary complex, i.e., they are represented as the respective graphs, and the geometric and chemical features are embedded in them. The resulting graph embedding is processed by the main deep graph representation learning DGRL pipeline, which predicts a binary label indicating whether the protein interacts with the ligand or not.
To train this pipeline, a dataset of interacting proteins and ligands is used. For each protein-ligand pair, the ground truth of whether the pair does in fact interact is used in order to train the deep graph representation learning DGRL pipeline to recognize what constitutes an interaction and what does not. The elaborated procedure may be considered as "fuzzy" docking, where no Root-Mean-Square-Deviation (RMSD) values are predicted as part of the inference, but rather a simple binary classification indicating whether the two molecules would interact or not.
Figure 6 shows a protein-protein interaction module. This interaction is modelled similarly to the cases of interaction site identification and fragment-protein interaction. To be precise, the pipeline that was used to determine the interaction site on a single protein, as shown in Figure 4, is applied to both proteins in parallel. To achieve the desired effect, the loss function is adjusted to make sure that the pipeline learns to model the interactions between proteins.
As previously, the learned interaction is not quantified in terms of a continuous value like the RMSD, but rather by a binary classification indicating the interaction between the respective pair of proteins.
When designing degrader molecules as well as computing ternary complexes to be formed, determining the structure of the protein-protein PP complex that is formed in the presence of degrader molecules is a major step.
For the determination of protein-protein PP complexes, due to the complexity of protein molecules, many possible interactions and potential conformations between them must be considered. This might lead to a computationally intractable number of potential protein-protein PP complexes. The presence of degrader molecules within a complex significantly alters these interactions and conformations.
For instance, assume that, for each protein, the corresponding interaction sites where a degrader fragment binds are known. An energetically favorable (stable) PP complex might feature a relative orientation for which the fragment interaction sites in each protein are too distant to be connected via linked fragments. Such a complex is then infeasible due to spatial constraints despite being energetically favorable.
State-of-the-art protocols for the calculation of ternary complex structures handle this problem by calculating the possible protein-protein PP complexes via blind PP docking and afterwards filtering the obtained complexes based on whether the degrader molecule can be placed within the complex in an energetically favorable manner.
However, the constraint which the degrader molecule imposes can be used to selectively sample the protein-protein PP complexes prior to computing their interaction. This approach thus enables the use of
more complex virtual screening alternatives via DGRL. The following Bayesian Optimization (Active Learning) loop as shown in Figure 7 can be used to sample protein-protein PP complexes in the presence of a degrader molecule in an efficient and automated form:
1. Random sampling of potential protein-protein complexes. Each complex is characterized by the relative rotation and translation (RRT) between the two constituent proteins, as well as the conformations of these proteins.
2. For each:
a. Calculate a fitness measure (score) that correlates to the strength of the protein-protein interaction given their current RRT and conformations. This is called the protein-protein interaction fitness (PPI-fitness). This fitness is calculated from the output of the Deep Interaction Prediction module in the present workflow.
b. Given the RRT of the proteins, the orientations and positions of the interacting degrader fragments are calculated as bound to their respective interaction sites. Given this placement of fragments and thereby their relative orientation (RRT), an additional constraint-fitness is calculated, which reflects the feasibility of linking the fragments with a geometrically and pharmacologically viable linker.
c. Then a combined-fitness is calculated that considers both the above scores (PPI-fitness and constraint-fitness) and gives an acceptable measure of how likely it is that the protein-protein complex will lead to a valid ternary complex structure.
3. Subsequently, a surrogate model (a Gaussian Process; see the explanation below) is fitted to predict the combined-fitness, using the scores obtained in the loop from step 2. An important property is that the surrogate function can report the uncertainty of its prediction.
4. A new set of conformations/orientations is selected for which the surrogate model lacks knowledge, i.e., expresses high uncertainty, or predicts a high score. This tradeoff between exploitation and exploration is managed by an acquisition strategy as shown below.
5. Loop from 2.
Some of the concepts in the loop above are clarified below.
The surrogate model is a model that takes as input the representation (i.e., RRT + NMA coordinates) of a particular protein-protein complex candidate and predicts the associated combined-fitness. It is trained using the actual combined-fitness values as data points. A Gaussian Process model is used, which can predict not only an estimate of the combined-fitness but also a reliable measure of the uncertainty in its estimate. The kernel function used for the Gaussian Process is the well-known Matern kernel, modified to handle the relative translations and rotations. This specific kernel function is not essential to the advantage proposed by this patent and can be substituted with any valid alternative in the representation space.
The acquisition strategy is a key aspect of a Bayesian Optimization BO loop and determines in what manner and to what extent exploration and exploitation are traded off. The fact that the surrogate model reports the uncertainty of its estimate is crucial here and allows principled decisions to be made regarding this tradeoff. Several standard acquisition strategies may be used, for instance, noisy Expected Improvement, Upper Confidence Bound, and Knowledge Gradient. These strategies are implemented by use of the openly available BoTorch framework (Balandat, et al., 2020).
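A minimal BoTorch sketch of the loop is shown below, assuming a recent BoTorch version, a 17-dimensional candidate encoding (translation, quaternion, normal-mode amplitudes), an Upper Confidence Bound acquisition, and a dummy `combined_fitness` that stands in for the scores supplied by the modules described above; the dimensions, iteration counts and bounds are illustrative only. SingleTaskGP uses a Matern kernel by default.

```python
# Minimal sketch of the Bayesian Optimization loop over PP-complex candidates.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import UpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

d = 17                                     # 3 translation + 4 quaternion + 10 NMA
bounds = torch.stack([torch.zeros(d, dtype=torch.double),
                      torch.ones(d, dtype=torch.double)])

def combined_fitness(x: torch.Tensor) -> torch.Tensor:
    # Placeholder for PPI-fitness x constraint-fitness from the DGRL modules.
    return -((x - 0.5) ** 2).sum(dim=-1, keepdim=True)

train_x = torch.rand(8, d, dtype=torch.double)     # 1. random initial candidates
train_y = combined_fitness(train_x)

for _ in range(20):
    gp = SingleTaskGP(train_x, train_y)            # 3. fit the GP surrogate
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

    acq = UpperConfidenceBound(gp, beta=2.0)       # 4. exploration/exploitation tradeoff
    new_x, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=5, raw_samples=64)

    train_x = torch.cat([train_x, new_x])          # 2./5. evaluate and loop
    train_y = torch.cat([train_y, combined_fitness(new_x)])
```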
In order to sample potential protein-protein PP complexes, each complex candidate is represented by the RRT between the two constituent proteins, as well as by a vector representation of the conformations of these proteins. The relative translation is represented by a 3D vector between the centers of mass of the two proteins. The relative rotation is represented by a 4D normalized quaternion.
Each candidate complex is sampled by picking a random RRT and conformation using a uniform distribution over the above representation space. A uniformly random direction is picked for the translation, with the distance exponentially distributed. The rotations are selected uniformly at random.
To represent the conformations of the proteins, the technique of Normal Mode Analysis (NMA) is employed. The top 10 normal modes of vibration of each protein are calculated using the potential field of an elastic network model (e.g., an Anisotropic Network Model). Furthermore, the potential field is adjusted to account for the rigidity of the residues binding to the degrader fragments. In this manner, each conformation of a protein is specified by a 10-D vector which describes the extents by which the protein is distorted along the respective normal modes.
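A minimal sketch of drawing one random candidate in this representation is given below; the scale parameters of the exponential distance and of the normal-mode amplitudes are illustrative assumptions.

```python
# Minimal sketch: one random PP-complex candidate = translation + quaternion + NMA.
import numpy as np

rng = np.random.default_rng(0)

# Relative translation: uniform direction, exponentially distributed distance.
direction = rng.normal(size=3)
direction /= np.linalg.norm(direction)
translation = direction * rng.exponential(scale=30.0)     # illustrative scale (Angstrom)

# Relative rotation: normalising a 4-D Gaussian sample yields a uniform unit quaternion.
quat = rng.normal(size=4)
quat /= np.linalg.norm(quat)

# Conformation: amplitudes along the top 10 normal modes.
nma_amplitudes = rng.normal(scale=1.0, size=10)

candidate = np.concatenate([translation, quat, nma_amplitudes])   # 17-D representation
```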
Convolutional neural networks are applied, which operate on graph representations of the protein molecules to predict the score. These representations account for the geometric and chemical properties in order to predict features that are subsequently processed to eventually yield a measure of the interaction strength.
Two ways of calculating a score that is informative regarding the feasibility of a degrader molecule, given the RRT and conformations of the two proteins, are applied.
In the first method, a Deep Linker Generation model is used, which takes as input the coordinates of the fragments as bound to the respective proteins in their respective positions and orientations, and thereby the fragments' relative orientation (RRT). The model then generates a linker that joins the two fragments. This linker is then scored on the basis of any number of pharmacological constraints such as toxicity and drug-likeness. Additionally, through the use of the Deep Molecular Conformation Generation module, the geometric viability of the linker is determined. Together, this provides the constraint-fitness.
When estimating the ternary complex for a pre-designed degrader, a deep learning-based approach (Deep Molecular Conformation Generation) is used to generate a large dataset (> 100000 datapoints) of energetically stable (low energy) degrader conformations, including the two fragments and the linker.
Each generated degrader conformation is characterized by the relative rotation and translation (RRT) between its two fragments, and the distribution of valid conformations over the RRT space is learned. For instance, one may fit a mixture of Gaussians using expectation maximization. Hence, given the RRT of the two proteins, and since the binding pocket for each of the degrader fragments is known, the RRT between the degrader fragments can be computed. The learned distribution function can then be used to compute the constraint score.
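A minimal sketch of this idea with a Gaussian mixture model is given below. The 7-D RRT feature layout, the number of mixture components and the random data are illustrative assumptions; the resulting log-likelihood can be normalized and combined with the PPI-fitness as described next.

```python
# Minimal sketch: learn the distribution of fragment-fragment RRTs from generated
# degrader conformations and use its likelihood as a constraint score.
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder: RRT features (3 translation + 4 quaternion) from generated conformers.
rrt_dataset = np.random.randn(100_000, 7)

gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0)
gmm.fit(rrt_dataset)

def constraint_fitness(rrt_between_fragments: np.ndarray) -> float:
    """Log-likelihood of a fragment RRT under the learned conformer distribution."""
    return float(gmm.score_samples(rrt_between_fragments.reshape(1, -1))[0])

score = constraint_fitness(np.zeros(7))   # RRT implied by a candidate PP complex
```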
The combined-fitness can be any function of the PPI-fitness and the constraint-fitness that mimics a logical AND operation. This means that if either of the fitness scores indicates a particularly unfit protein-protein complex candidate, the combined-fitness must be low. For instance, if the PPI-fitness and the constraint-fitness are normalized to lie between 0 and 1, the product of these fitness scores would be a valid combined-fitness.
One of the key considerations in ternary complex determination is the stability and validity of the degrader molecule itself. In the Bayesian Optimization BO protocol, this is specified through the constraint-fitness. As previously described, one of the methods to achieve this is to analyze the dataset of stable (low energy) conformations of the degrader molecule. A method is therefore needed that can generate a large number (> 100 000) of conformations of a large molecule such as a degrader, which can have more than 60 atoms.
The problem of molecular conformation generation, i.e., predicting an ensemble of low energy 3D conformations based on a molecular graph, is traditionally treated with either stochastic or systematic approaches. The former are based on molecular dynamics (MD) simulations or Markov Chain Monte Carlo (MCMC) techniques. Stochastic algorithms can be accurate but are difficult to scale to larger molecules (e.g., proteins) as the runtime becomes prohibitive. Systematic (rule-based) methods, on the other hand, are based on the careful selection and design of torsion templates (torsion rules) and knowledge of rigid 3D fragments. These methods can be fast and generate conformations in the order of seconds. However, their predictions might become inaccurate for larger molecules, or for molecules that are not covered by any of these torsion rules. Therefore, systematic methods are fast, but they may not generalize.
The question that then arises is the following: How can the best of these two worlds be combined, i.e., how can ensembles of conformations be generated for larger molecules in an accurate and fast manner? According to the invention, an end-to-end trainable machine learning model that can handle and generate conformations is preferred. In addition, it models conformations in an SE(3) invariant manner, which means that the likelihood of a particular conformation is unaffected by rigid translation and rotation operations. This is a desirable inductive bias for molecular generation tasks, as molecules do not change if the entire molecule is translated or rotated. This model is based on a recently proposed machine learning technique, i.e., score-based generative models. The score is the gradient of the log density of the data distribution with respect to the data. Instead of learning a model that has minimum distance to the data distribution, a model that has minimum distance to the score (gradient) of the data distribution is learned. The score of the data distribution can be considered as a vector (gradient field) that guides the molecule towards stable (low energy) conformations, as shown in Figure 8.
It starts with a random initialization of the positions of the atoms of the molecule in 3D space, and the score (gradient) guides the molecule toward a suitable state. After an accurate score based on the training data has been learned, annealed Langevin dynamics can be leveraged to create an ensemble of stable conformations within a short amount of time. It is also possible to fix some parts of the molecule (the two fragments) and apply the gradient (score) only to other parts of the molecule (e.g., the linker) to generate constrained conformations. Using the ensembles of generated conformations, a function can be learned that predicts the likelihood of an energetically stable linker for a particular relative position and orientation of the fragments.
Figure 8 shows the Deep Molecular Conformation Generation from the 2D graph: The input is the graph, and the goal is to generate an ensemble of stable (low energy) 3D conformations. The process is initiated with random 3D coordinates for the molecule in 3D space, and in each iteration these coordinates change slightly towards a more stable conformation. The coordinate change is guided by a pseudo-force, which comes from the estimation of the score. The score is the gradient of the data distribution, and it is learned from the training data. Afterwards, this score is used to guide the atoms towards specific conformations through stochastic Langevin dynamics.
A machine learning model has been leveraged for generating conformations from input molecular graphs, so data has been used to train the model. The training data are the GEOM-QM9 and GEOM-DRUGS datasets (Axelrod & Gomez-Bombarelli, 2020), which consist of molecular graphs and corresponding ground truth conformations. QM9 contains smaller molecules (up to 9 heavy atoms), whereas DRUGS contains larger and drug-like molecules. Further information about the training dataset is given in Table 1. Tab. 1 shows the statistics of the GEOM data, which contains the QM9 and DRUGS datasets.
The problem is thus how this data can be leveraged to learn the conditional distribution of conformations given a graph, p(r | G; θ).
One traditional method for conformation generation is molecular dynamics (MD) simulation. It begins with a random initialization of the molecule in 3D space; then, based on interatomic potentials and forces, the atoms' positions change until some stable (low energy) conformations are reached. Different methods have been proposed for calculating and determining the interatomic potential. The most accurate method is based on Density Functional Theory (DFT), but the computation is very intensive and becomes prohibitively expensive for large molecules (with a high number of rotatable bonds). To alleviate this issue, empirical potentials (force fields) have been proposed. These empirical force fields are fast, but they are not very accurate. Machine learning methods have the potential to be fast and at the same time accurate.
The method that is used in the present example is based on score matching generative models that have recently been used in the machine vision domain for generating realistic images (Song & Ermon, 2019). The goal of a score-based generative model is to estimate the score (the gradient of the data distribution with respect to the data) by minimizing the score matching loss L(θ) = ½ E_{p_data(r)} [ ‖ s(r; θ) − ∇_r log p_data(r) ‖² ] (in practice, a denoising variant of this objective is minimized).
In the present case (a generative model for conformation generation), the gradient is learned with respect to the positions of the atoms in the molecule: s(r; θ) = ∇_r log p(r | G).
This gradient can be considered as a pseudo-force that guides the evolution of molecules towards stable (low energy) conformations. In most cases, because the support of the data distribution is sparse, a noise conditional score-based generative model is used (Song & Ermon, 2021). In this case, the goal is to estimate the score of the noise-perturbed data distribution q_σ(r̃) = ∫ p_data(r) N(r̃; r, σ²I) dr, i.e., s(r̃, σ; θ) ≈ ∇_r̃ log q_σ(r̃).
This alleviates some of the issues that can appear when generating and creating new samples.
The only missing ingredient is how the score network s(r; θ) is defined. The score network s(r; θ) can be anything that maps the input molecule to the gradient with respect to the input coordinates (the output is 3N-dimensional, where N is the number of atoms in the molecule).
According to the invention, the use of a message passing neural network (MPNN) (Battaglia, et al., 2018) as the score network is proposed, and the goal is to learn the parameters of this score network. The input to the MPNN is a molecule (graph) with nodes (atoms) and edges (bonds). In an MPNN, features are assigned to each node (e.g., atom type, atom position, aromaticity, charge) and edge (e.g., bond type).
Figure 9 shows message passing neural networks. Here φ^e, φ^v and φ^u are update functions for the edge (E), node (V) and global (u) features, respectively. ρ^(e→v) (reducing edges to nodes) and ρ^(v→u) (reducing vertices to global features) are aggregation functions that take a set of inputs and reduce it to a single element in a permutation invariant manner. These updates are described by the equations below:
e'_ij = φ^e(e_ij, v_i, v_j, u)
v'_i = φ^v(ρ^(e→v)({e'_ij}), v_i, u)
u' = φ^u(ρ^(v→u)({v'_i}), u)
The score network, as shown in Figure 10, is an MPNN that updates the edge and node features at each step. The output is three coordinates for each node, which represent the pseudo-force (gradient) that changes the position of each node.
An MPNN layer updates the edge features e_ij and the node features v_i and computes a global feature u at each step. A series of MPNN layers can be used to update the edge and node features hierarchically. At each layer, the edge features can be updated using a learned function of the current edge feature as well as the node features of the connected nodes. Then, for each node, the edge features of the connected edges can be aggregated and the node features updated using a learned function of this aggregation. At the end, the global features (which belong to the whole graph, in this case the molecule) are updated.
Here, ρ^(e→v) denotes a differentiable, permutation invariant aggregation function, e.g., sum, mean or max, and φ^e, φ^v, φ^u denote differentiable functions whose parameters can be learned, such as MLPs (Multi-Layer Perceptrons). In the present method, element-wise summation is used as the aggregation function and MLPs are used as the differentiable functions. At the end, the features of each node are processed via a final readout MLP (weights shared across nodes) to produce a three-valued output, which has the form of a gradient with respect to the Cartesian coordinates of that particular node. The network is trained to reproduce the score function in these outputs. The network is trained with the GEOM-DRUGS data to learn a valid score function for larger molecules.
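A minimal sketch of one such message-passing layer is given below. It uses MLP update functions, summation aggregation and a per-node 3-D readout, but omits the global-feature update for brevity; the feature sizes and depth are illustrative assumptions and not the trained architecture.

```python
# Minimal sketch: a single message-passing layer acting as a score network,
# mapping node/edge features to a 3-D pseudo-force per node.
import torch
import torch.nn as nn

def mlp(inp, out, hidden=64):
    return nn.Sequential(nn.Linear(inp, hidden), nn.SiLU(), nn.Linear(hidden, out))

class ScoreMPNN(nn.Module):
    def __init__(self, node_dim=16, edge_dim=8):
        super().__init__()
        self.phi_e = mlp(edge_dim + 2 * node_dim, edge_dim)   # edge update
        self.phi_v = mlp(node_dim + edge_dim, node_dim)       # node update
        self.readout = mlp(node_dim, 3)                       # per-node 3-D score

    def forward(self, v, e, edge_index):
        src, dst = edge_index                                  # (2, E)
        # Edge update from the current edge feature and both endpoint features.
        e = self.phi_e(torch.cat([e, v[src], v[dst]], dim=-1))
        # Permutation-invariant sum of incoming edge messages per node.
        agg = torch.zeros(v.size(0), e.size(-1)).index_add_(0, dst, e)
        v = self.phi_v(torch.cat([v, agg], dim=-1))
        return self.readout(v)                                 # (N, 3) pseudo-forces

# Toy usage: 20 atoms, 50 directed edges.
v = torch.randn(20, 16); e = torch.randn(50, 8)
edge_index = torch.randint(0, 20, (2, 50))
score = ScoreMPNN()(v, e, edge_index)
```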
For the generation phase, the process starts from random 3D coordinates, which are updated sequentially based on the learned score to arrive at an ensemble of low energy conformations.
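The sketch below shows the standard annealed Langevin dynamics update used in this kind of generation phase. The `score_fn` is a placeholder for the trained noise-conditional score network, and the noise schedule, step size and iteration counts are illustrative assumptions.

```python
# Minimal sketch: annealed Langevin dynamics over a decreasing noise schedule.
import torch

def score_fn(r: torch.Tensor, sigma: float) -> torch.Tensor:
    # Placeholder for the trained noise-conditional score network s(r, sigma).
    return -r / (sigma ** 2)

def annealed_langevin(n_atoms, sigmas=(10.0, 5.0, 2.0, 1.0, 0.5), eps=1e-4, steps=50):
    r = torch.randn(n_atoms, 3) * sigmas[0]          # random 3-D initialisation
    for sigma in sigmas:                             # anneal from coarse to fine noise
        step = eps * (sigma / sigmas[-1]) ** 2       # per-level step size
        for _ in range(steps):
            noise = torch.randn_like(r)
            r = r + 0.5 * step * score_fn(r, sigma) + (step ** 0.5) * noise
    return r                                         # one sampled conformation

conformer = annealed_langevin(n_atoms=60)
```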
In the following the generation of novel linkers for given fragments in order to form complete degraders is described. The input is given in the form of graph representations of the fragments to be linked, e.g., as SMILES strings, augmented with structural input in the form of the relative spatial orientation (RRT) of the fragments. (Figure 11)
Given the above information of the fragments as input, the essence of the procedure discussed in this subsection is to connect them by generating novel linker graphs. Each generated linker graph corresponds
to a complete degrader graph when considered with the two fragments. By assessing the obtained degrader graphs with graph/pharmaceutical metrics such as uniqueness, chemical validity, quantitative estimate of drug-likeness, synthetic accessibility, toxicity, solubility, ring aromaticity, and/or pan-assay interference compounds, a fitness score can be reported to the surrogate model.
Now referring exclusively to the fitness scoring procedure, additional quality metrics can be reported to the surrogate model using the structural information provided as input. To do so, a 3D structure needs to be established from the generated degrader graphs. Traditional methods, such as those mentioned above, could be employed here. Alternatively, and more advantageously, the in-house Deep Molecular Conformation Generation method from the corresponding section is applied. The quality metrics reported to the surrogate model are then based on comparisons (e.g., RMSD) between the structural input, which serves as a target to reach (i.e., a label in machine learning jargon), and the 3D coordinates established from the generated degrader graphs.
Additionally, the energy, as determined either by classical methods such as force fields or by dedicated machine learning algorithms, normalized per degree of freedom of the molecule, presents itself as a viable measure of the validity of the degrader, since it reports on the molecule's strain. Finally, it has to be noted that the model, by removing the relative orientations from its architecture, can generate linker graphs without any structural information as input. In that case, however, the quality of the generated linkers is expected to be lower.
The model is inspired by DeLinker (Imrie, et al., 2020), with most fundamental differences listed at the bottom of this section.
The model is a Variational Autoencoder (VAE), whereby both the encoder as well as the decoder are implemented via standard Gated Graph Neural Networks (GGNN). The decoder takes as input a set of latent variables and generates a linker to connect the input fragments. The encoder, on the other hand, imposes a distribution over the latent variables that is conditioned on the graph and structure of the unlinked input fragments.
For training the model, the fragment graph X is processed using the encoder GGNN, yielding the set of latent variables zv, one for each node (atom) in the graph. Additionally, to allow more control over the generative process, the decoder is fed a low-dimensional latent vector z derived via a learned mapping from the node embeddings of the label (ground truth) degrader (i.e., the target degrader that is supposed to be generated). Loosely speaking, this allows the decoder to learn to generate different "types" of linkers conditioned on z (i.e., via a conditioned multi-modal distribution). By using zv and z, the model can be augmented to learn a prediction of constraints such as toxicity and the like. Then, during runtime, by optimizing over z and zv, the decoder can improve the quality of the generated linkers with respect to these constraints. During training, both z and zv are regularized to approximate the standard normal distribution.
To generate the linker given z and zv, a set of candidate atoms is added to the graph and initialized with random node features. Using these features, the atom types are initialized. At each step of the generation, the features zv, z, the atom types lv, and the features and types of the candidate atoms are used to generate one bond (of any type) connecting an unconnected candidate node to an already connected candidate node in the graph. The valency of the already connected node also affects the choice of the bond. Bonds continue to be chosen for this node until a bond to a special "STOP" atom is picked, at which point the next connected atom in the queue is chosen. This queue is created and traversed in a breadth-first manner. Note that every bond that is selected changes the graph V. This means that the features zv, lv are recomputed in each iteration.
During generation, z can be drawn from a standard normal distribution, and noise is added to the encoding of X to calculate zv. Note that, if the properties mentioned below are learned during training as a function of z and zv, then during generation one can optimize over z and zv to condition the model to generate degraders of better quality. Properties such as a quantitative estimate of drug-likeness, synthetic accessibility, toxicity, solubility, ring aromaticity, and/or pan-assay interference compounds are considered in this context.
Finally, the key points differentiating the model from other generative approaches are listed in the following: Firstly, as discussed above, 3D coordinates are generated from the obtained degrader graphs using the Deep Molecular Conformation Generation module. Secondly, the graph generation process is fed with structural information of higher quality.
Figure 12 illustrates the structural information provided, i.e., the fragments' relative orientation. This allows the model to interface directly with the RRT coordinates used in the Bayesian Optimization pipeline. The relative orientation coordinates fed to the Deep Linker Generation model are illustrated as follows: the two rings represent the fragments of a degrader; the distance from atom L1 to atom L2, the angle α1 between the vectors E1-L1 and L1-L2, the angle α2 between the vectors L1-L2 and L2-E2, and the dihedral angle φ (spanned by all three mentioned vectors) are processed by the model as structural information.
Acknowledging that E1-L1 and E2-L2 constitute rotatable bonds by design of the graph generation model, the following bond-angle-torsion coordinates completely specify the relative orientation of the fragments: the lengths E1-L1, L1-L2 and L2-E2, the bond angles α1 and α2, and the dihedral angle φ. In comparison to the pseudo-bond length L1-L2, the physical bond lengths hardly vary. Furthermore, the atom types of L1 and L2 are not available prior to the graph generation process but are modeled as placeholder atoms. Thus, the model is not fed with the bond lengths L1-E1 and L2-E2. Also, dihedral angles are circular, rendering it difficult for the model to learn from this feature due to the circular discontinuity. Therefore, instead of feeding the model φ directly, the cosine and sine of φ are provided, both of which are continuous. Note that the mapping of φ to the pair of its angular functions sine and cosine is bijective.
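A minimal sketch of computing these relative-orientation features from the four anchor/exit atom positions is given below. The coordinates are placeholders, and the angle and dihedral definitions follow the standard bond-angle-torsion conventions assumed here.

```python
# Minimal sketch: distance L1-L2, angles alpha1/alpha2, and dihedral phi
# encoded as (cos phi, sin phi) from four atom positions.
import numpy as np

def angle(a, b):
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def dihedral(b1, b2, b3):
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    return np.arctan2(np.dot(m1, n2), np.dot(n1, n2))

E1, L1 = np.array([0.0, 0.0, 0.0]), np.array([1.5, 0.0, 0.0])   # placeholder coordinates
L2, E2 = np.array([5.0, 1.0, 0.5]), np.array([6.2, 1.8, 0.3])

d_L1L2 = np.linalg.norm(L2 - L1)             # pseudo-bond length L1-L2
alpha1 = angle(E1 - L1, L2 - L1)             # angle between E1-L1 and L1-L2
alpha2 = angle(L1 - L2, E2 - L2)             # angle between L1-L2 and L2-E2
phi = dihedral(L1 - E1, L2 - L1, E2 - L2)    # dihedral over the three vectors
features = [d_L1L2, alpha1, alpha2, np.cos(phi), np.sin(phi)]
```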
Claims
Patent Claims
1. A computer implemented, machine learning based method for determining ternary complexes in targeted protein degradation, by representing biomolecules as graphs and then feeding these graphs as inputs into a machine learning system, comprising steps of
- determination of the 3D structure of relevant proteins (1)
- determination of the interactions between each fragment of the degrader and the corresponding proteins as well as identification of the corresponding interaction (2)
- Protein-Protein complex prediction (3)
- Refinement of the ternary complex, with the designed linker (4).
2. Computer implemented method, according to claim 1, wherein Bayesian Optimization is used to sample protein-protein complexes for ternary complex determination in targeted protein degradation.
3. Computer implemented method, according to claim 2, wherein relative rotations and translations are used to represent the space of protein-protein complexes to optimize over.
4. Computer implemented method, according to claims 2 or 3, wherein docking is used as the oracle for the determination of the protein-protein interaction.
5. Computer implemented method, according to one of claims 2 to 4, wherein deep graph representation learning is used as the oracle for the determination of the protein-protein interaction.
6. Computer implemented method, according to one of claims 2 to 5, wherein a learned distribution over a dataset of generated degrader conformers is used as a fitness function for a candidate protein-protein complex in the Bayesian Optimization loop.
7. Computer implemented method, according to one of claims 2 to 6, wherein the fitness of a generated linker is used as a fitness function for a candidate protein-protein complex in the Bayesian Optimization loop.
8. Computer implemented method, according to one of claims 1 to 7, wherein a molecular graph representation of a biomolecule is fed into a deep interaction prediction model for ternary complex determination in targeted protein degradation.
9. Computer implemented method, according to one of claims 1 to 8, wherein deep graph representation learning is used to predict interactions for ternary complexes and hence docking scores which represent the fitness of each ternary complex in targeted protein degradation.
10. Computer implemented method, according to claim 9, wherein protein-protein interactions are determined via machine learning.
11. Computer implemented method, according to claim 9 or 10, wherein fragment-protein interactions are determined via machine learning.
12. Computer implemented method, according to claim 6, wherein deep molecular conformation generation is used to generate the dataset of molecular conformers.
13. Computer implemented method, according to claim 14, wherein deep molecular conformation generation is used to validate the linkers.
14. Computer implemented method, according to claim 2, of generating valid linkers via deep learning given a particular relative position and orientation of the degrader fragments.
15. Computer implemented method, according to claim 1, wherein biomolecules are generated initiating the process of targeted protein degradation.
16. Computer implemented system, prepared for the execution of the methods according to claims 1 to 15, characterized by the use of functional modules as "Deep Interaction Prediction" (DIP), "Deep Linker Generation" (DLG), "Deep Molecular Conformation Generation" (DMCG).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ATA138/2021 | 2021-08-12 | ||
AT1382021 | 2021-08-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023016621A1 true WO2023016621A1 (en) | 2023-02-16 |
Family
ID=78078171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2021/025372 WO2023016621A1 (en) | 2021-08-12 | 2021-09-29 | Ternary complex determination for plausible targeted protein degradation using deep learning and design of degrader molecules using deep learning |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023016621A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109785902A (en) | 2019-02-20 | 2019-05-21 | 成都分迪科技有限公司 | A kind of prediction technique of ubiquitination degradation target protein |
US20200190136A1 (en) * | 2017-06-09 | 2020-06-18 | Dana-Farber Cancer Institute, Inc. | Methods for generating small molecule degraders and dimerizers |
2021
- 2021-09-29 WO PCT/EP2021/025372 patent/WO2023016621A1/en unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200190136A1 (en) * | 2017-06-09 | 2020-06-18 | Dana-Farber Cancer Institute, Inc. | Methods for generating small molecule degraders and dimerizers |
CN109785902A (en) | 2019-02-20 | 2019-05-21 | 成都分迪科技有限公司 | A kind of prediction technique of ubiquitination degradation target protein |
Non-Patent Citations (6)
Title |
---|
DANIEL ZAIDMAN, JAIME PRILUSKY, NIR LONDON, J. CHEM. INF. MODEL., vol. 60, no. 10, 2020, pages 4894 - 4903
FANG YANG ET AL: "Graph-based prediction of Protein-protein interactions with attributed signed graph embedding", BMC BIOINFORMATICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 21, no. 1, 21 July 2020 (2020-07-21), pages 1 - 16, XP021279593, DOI: 10.1186/S12859-020-03646-8 * |
IMRIE FERGUS ET AL: "Deep Generative Models for 3D Linker Design", JOURNAL OF CHEMICAL INFORMATION AND MODELING, vol. 60, no. 4, 20 March 2020 (2020-03-20), US, pages 1983 - 1995, XP055916311, ISSN: 1549-9596, Retrieved from the Internet <URL:http://pubs.acs.org/doi/pdf/10.1021/acs.jcim.9b01120> DOI: 10.1021/acs.jcim.9b01120 * |
LIM SANGSOO ET AL: "A review on compound-protein interaction prediction methods: Data, format, representation and model", COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, vol. 19, 1 January 2021 (2021-01-01), Sweden, pages 1541 - 1556, XP055916146, ISSN: 2001-0370, DOI: 10.1016/j.csbj.2021.03.004 * |
MICHAEL L. DRUMMOND, CHRISTOPHER I. WILLIAMS, J. CHEM. INF. MODEL., vol. 59, no. 4, 2019, pages 1634 - 1644
MUNETOMO MASAHARU ET AL: "An automated ligand evolution system using Bayesian optimization algorithm", WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS, 1 May 2009 (2009-05-01), pages 788 - 797, XP055916195, Retrieved from the Internet <URL:https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.501.5347&rep=rep1&type=pdf> [retrieved on 20220428] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kim et al. | Computational and artificial intelligence-based methods for antibody development | |
Dhakal et al. | Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions | |
Zhang et al. | EDock: blind protein–ligand docking by replica-exchange monte carlo simulation | |
Aggarwal et al. | DeepPocket: ligand binding site detection and segmentation using 3D convolutional neural networks | |
Soleymani et al. | Protein–protein interaction prediction with deep learning: A comprehensive review | |
Kozlovskii et al. | Spatiotemporal identification of druggable binding sites using deep learning | |
Ragoza et al. | Protein–ligand scoring with convolutional neural networks | |
Schneidman-Duhovny et al. | Predicting molecular interactions in silico: II. Protein-protein and protein-drug docking | |
WO2017196963A1 (en) | Computational method for classifying and predicting protein side chain conformations | |
Dalkas et al. | SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence | |
Sunny et al. | Protein–protein docking: Past, present, and future | |
Pan et al. | Introduction to protein structure prediction: methods and algorithms | |
US20220406403A1 (en) | System and method for generating a novel molecular structure using a protein structure | |
Pencheva et al. | AMMOS: automated molecular mechanics optimization tool for in silico screening | |
Pérez de Alba Ortíz et al. | The adaptive path collective variable: a versatile biasing approach to compute the average transition path and free energy of molecular transitions | |
Kotelnikov et al. | Sampling and refinement protocols for template-based macrocycle docking: 2018 D3R Grand Challenge 4 | |
Ozdemir et al. | Developments in integrative modeling with dynamical interfaces | |
Song et al. | Protein–ligand docking using differential evolution with an adaptive mechanism | |
Tao et al. | Docking cyclic peptides formed by a disulfide bond through a hierarchical strategy | |
Ugurlu et al. | Cobdock: an accurate and practical machine learning-based consensus blind docking method | |
Maheshwari et al. | Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks | |
WO2023016621A1 (en) | Ternary complex determination for plausible targeted protein degradation using deep learning and design of degrader molecules using deep learning | |
CN116758978A (en) | Controllable attribute totally new active small molecule design method based on protein structure | |
Metcalf et al. | Directional ΔG Neural Network (DrΔG-Net): A Modular Neural Network Approach to Binding Free Energy Prediction | |
Chen et al. | Is fragment-based graph a better graph-based molecular representation for drug design? A comparison study of graph-based models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21786332; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/06/2024) |