WO2020070485A1

WO2020070485A1 - Method and apparatus for identifying candidate signatures and compounds for drug therapies

Info

Publication number: WO2020070485A1
Application number: PCT/GB2019/052768
Authority: WO
Inventors: Jonathan Wray; Benjamin Allen; Brendan JACKSON; Adam SARDAR; Alan Whitmore
Original assignee: E-Therapeutics Plc
Priority date: 2018-10-03
Filing date: 2019-10-02
Publication date: 2020-04-09

Abstract

The present invention provides a method and apparatus for analysing data relating to analysis of complex networks in order to identify target bioactivity signatures, for example proteins, and potential therapeutic actives, for example compounds, that are likely candidates for drug therapies. There is provided a computer-implemented method of identifying a subset of potential therapeutic agents from a plurality of potential therapeutic agents comprising the steps of: receiving a network of biological data; receiving footprint data in relation to the plurality of therapeutic agents; determining from the footprint data of each or a combination of the plurality of potential therapeutic agents a functional effect on the network of the each or a combination of potential therapeutic agents; and outputting potential therapeutic agents ranked based on the determined functional effects on the network.

Description

METHOD AND APPARATUS FOR IDENTIFYING CANDIDATE SIGNATURES AND COMPOUNDS FOR DRUG THERAPIES

Field

The present invention relates to a method and apparatus for analysing data relating to analysis of complex networks in order to identify target bioactivity signatures, for example proteins, and potential therapeutic actives, for example compounds, that are likely candidates for drug therapies.

Background

Drug discovery is the process by which new medications are discovered for use in treating illnesses or as other forms of medication. Conventional drug discovery approaches are based on highly specific targeting of a single protein. Although some forms of treatment are very highly sought, the process of developing a new drug can be extremely expensive and time consuming, and hence an option only for large and established pharmaceutical companies.

Summary of Invention

Aspects and/or embodiments seek to provide a method and apparatus for analysing data relating to analysis of complex networks in order to identify target bioactivity signatures and potential therapeutic actives that are likely candidates for drug therapies.

According to a first aspect, there is provided a computer-implemented method of identifying a subset of potential therapeutic agents from a plurality of potential therapeutic agents comprising the steps of: receiving a network of biological data; receiving footprint data in relation to the plurality of therapeutic agents; determining from the footprint data of each or a combination of the plurality of potential therapeutic agents a functional effect on the network of the each or a combination of potential therapeutic agents; and outputting potential therapeutic agents ranked based on the determined functional effects on the network. The network of biological data may be selected to relate to a disease being studied.

Optionally, the step of determining the functional effect comprises identifying one or more structural or dynamic effects of each potential therapeutic agent on the network.

According to a second aspect, there is provided a computer-implemented method of identifying potential therapeutic agents comprising: receiving a network of biological data; determining one or more target bioactivity signatures by perturbing one or more nodes of the network; determining a network impact score for each target bioactivity signature; identifying one or more effective target bioactivity signatures having a network impact score greater than a predetermined threshold; and outputting said one or more effective target bioactivity signatures. The network of biological data may be selected to relate to a disease being studied. As used herein, the terms “impact score” and “impact value” are used interchangeably.

Optionally, according to this aspect of the invention, there is provided a computer-implemented method of identifying one or more potential therapeutic agent(s) having a footprint overlapping with the one or more effective target bioactivity signature(s).

According to a third aspect, there is provided a method for determining pharmacologically effective therapeutic agents for a disease, including: carrying out a computer-implemented method of any of the preceding aspects, to identify potential therapeutic agents; including selecting the network to be related to a disease being studied. Identified potential therapeutic agents may then be further screened using a method or methods known in the art. In some examples, selecting the network includes identifying cellular mechanisms of the disease being studied, embedded in the network. Some example methods may include pharmacologically screening each therapeutic agent in vitro and/or in vivo. Pharmacological screening may comprise affinity screening, potency screening, or both. In some example methods, potency screening may comprise phenotypic screening or cell-free screening.

Optionally, selecting the network related to the disease being studied may include identifying cellular mechanisms embedded within the network.

Optionally, identifying the cellular mechanisms embedded in the network may comprise selecting a plurality of sets of nodes, each set being involved in a functional process; and for each set of nodes, an example method may include perturbing the network (for example, by removing at least one node of each set, or modifying at least one edge of at least one node of each set); determining a network impact score for each perturbation, and assigning a rank to each set of nodes, each rank indicative of the network impact score of the respective perturbation. Each set of nodes may represent a respective known cellular mechanism (for example, apoptosis, IL10 signalling, replication, or any of many other cellular mechanisms). Selecting a plurality of sets of nodes may allow the relative importance of their respective cellular mechanisms within the network to be determined, based on their respective network impact scores. Associations between sets of nodes (for example, proteins) and cellular mechanisms may be provided in one or more databases.

The higher the network impact score, the higher the assigned rank. Highly ranked sets of nodes may be considered functionally important to the network. A high ranking of a set of nodes may indicate that the corresponding cellular mechanism is embedded within the network and is functionally important to that network.
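The ranking of node sets described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the specification: the network is a small toy adjacency dictionary, each perturbation removes every node of a set, and the impact score is the fraction of the largest connected component that is lost. The names `largest_component`, `impact`, `rank_mechanisms`, `build_network`, `mechanism_1` and `mechanism_2` are hypothetical.

```python
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def impact(adj, node_set):
    """Assumed impact score: fraction of the largest component that is lost."""
    before = largest_component(adj)
    return (before - largest_component(adj, frozenset(node_set))) / before


def rank_mechanisms(adj, mechanisms):
    """Return (mechanism, score) pairs, highest network impact first."""
    scored = [(name, impact(adj, nodes)) for name, nodes in mechanisms.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def build_network(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Toy network and hypothetical node sets, for illustration only.
network = build_network([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "F")])
mechanisms = {"mechanism_1": {"A", "F"}, "mechanism_2": {"C"}}
ranking = rank_mechanisms(network, mechanisms)
```

In this toy case the set containing the central node `C` is ranked first, because removing it splits the network into small fragments, whereas removing the two peripheral nodes leaves most of the network connected.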

Optionally, the sets of nodes known to be related to a functional process may be, but are not limited to, sets of nodes known to be involved in specific cellular processes, cellular functions or signalling pathways.

Optionally, sets of nodes known to be involved in specific cellular processes or signalling pathways may be obtained from one or more database(s) of curated collections of genes, proteins, and/or other molecules involved in those functional constructs.

Optionally, the subset of potential therapeutic agents is identified by perturbation analysis, by potential therapeutic agent impact measurement, or by both.

In some example methods, perturbing the network may include removing all nodes in one or more set(s) of nodes; or removing combinations of nodes, optionally all possible combinations of nodes.

Optionally, the perturbation analysis comprises the steps of: determining a plurality of effective target bioactivity signatures; and determining from the footprint data of each or a combination of the plurality of potential therapeutic agents a similarity of the footprint data of any or any combination of potential therapeutic agents in relation to each of the effective target bioactivity signatures.

Optionally, the step of determining the effective target bioactivity signatures comprises one or more of network impact maximisation measurement and/or analysis; optimal percolation analysis; and/or minimal control set analysis.

Optionally, the step of determining the effective target bioactivity signatures is completed exhaustively and/or via an approximation technique. Optionally, determining the effective target bioactivity signatures comprises exhaustive network impact maximisation measurement and/or analysis.

Optionally, the method of exhaustive network impact maximisation as disclosed herein further comprises the step of: generating an impact score for all possible sets of nodes of a given size in the network of biological data.

Optionally, the method as disclosed herein further comprises the step of: identifying one or more sets of nodes whose perturbation in the network generates a global substantially optimal network impact score (the highest network impact score).

Optionally, the one or more sets of nodes whose perturbation in the network generates a global substantially optimal network impact score are in a substantially similar region of search space as the global optimum.

Optionally, the one or more sets of nodes whose perturbation in the network generates a local substantially optimal network impact score are in a different area of search space from the substantially global optima.

Optionally, impact maximisation is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

Optionally, the step of determining the effective target bioactivity signatures comprises maximisation of impact by approximation.

Optionally, the step of maximisation of impact by approximation includes but is not limited to metaheuristic optimisation methods, including but not limited to stochastic optimisation methods.

Optionally, stochastic optimisation methods and/or metaheuristic optimisation methods comprise one or more genetic algorithms.

Optionally, the step of determining the effective target bioactivity signatures comprises optimal percolation analysis (network dismantling). Optionally, optimal percolation analysis comprises identification of a plurality of sets of nodes of minimal size (that is, sets having fewest nodes), whereby removal of those nodes is operable to destroy a network giant component.

Optionally, one or more of the plurality of sets of nodes of minimal size comprises different nodes but is of the same minimal size.

Optionally, optimal percolation analysis is completed exhaustively and/or via an approximation approach.

Optionally, optimal percolation analysis is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

Optionally, the step of determining the effective target bioactivity signatures comprises minimum control set analysis.

Optionally, said minimum control set analysis comprises identification of one or more dominating sets of smallest size (which may be referred to as “minimum dominating sets”) for any given network.

Optionally, the or each of the one or more dominating sets of smallest size comprise different nodes.

Optionally, an approximation to the minimum control set is identified via any optimisation technique.

Optionally a minimum dominating set is identified as an approximation to a minimal control set.

Optionally, minimum control set analysis is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

Optionally, a target bioactivity signature is considered effective when an associated impact score is greater than a predetermined threshold. Optionally, determining each of the effective target bioactivity signatures comprises searching for overlap between the footprint data of any or any combination of potential therapeutic agents and each of the effective target bioactivity signatures.

The step of searching for overlap between the footprint data of any or any combination of potential therapeutic agents and each of the effective target bioactivity signatures may comprise determining the similarity of the footprint data of any or any combination of potential therapeutic agents in relation to each of the effective target bioactivity signatures.

Optionally, the step of determining a similarity of the footprint data comprises searching for overlap between the footprint data of any or any combination of potential therapeutic agents and each of the effective target bioactivity signatures.

Potentially active compounds, or potential therapeutic agents, can be determined by searching for any level/amount of overlap, optionally using a threshold for the amount of overlap being searched for, in order to find the similarity, or amount of similarity, between the effective target bioactivity signatures and the footprint data for the potentially active compounds or potential therapeutic agents.
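As a hedged illustration of the overlap search described above, the sketch below scores each footprint against one effective target bioactivity signature and keeps only agents that reach a threshold. The Jaccard index is an assumed choice of similarity measure; the specification does not prescribe one. The identifiers `jaccard`, `rank_by_overlap`, `signature`, `footprints` and the compound and protein labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: shared members divided by all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_overlap(signature, footprints, threshold=0.0):
    """Return (agent, similarity) pairs meeting the threshold, best first."""
    scored = [(agent, jaccard(signature, fp)) for agent, fp in footprints.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical signature and footprints, for illustration only.
signature = {"P1", "P2", "P3"}
footprints = {
    "cmpd_a": {"P1", "P2", "P3", "P4"},
    "cmpd_b": {"P3", "P9"},
    "cmpd_c": {"P7"},
}
ranked = rank_by_overlap(signature, footprints, threshold=0.2)
```

With a threshold of 0.2, `cmpd_a` (similarity 0.75) and `cmpd_b` (similarity 0.25) are retained in that order, and `cmpd_c`, which shares no members with the signature, is dropped.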

Optionally, the effective target bioactivity signatures may be output to a data store, optionally referred to as an Effective Target Bioactivity Signature Set Store.

Optionally, one or more effective target bioactivity signatures are output as an ordered set, whereby those with high impact scores are ranked more highly.

Optionally, therapeutic agent impact measurement comprises measurement of impact of each of the potential therapeutic agents on the network.

Optionally, nodes in the network to perturb to measure impact of each potential therapeutic agent are chosen based on the footprint of each potential therapeutic agent. The therapeutic agent impact measurement may thereby comprise selecting nodes in the network to perturb to measure impact of that perturbation based on the footprint of the potential therapeutic agents.
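A minimal sketch of therapeutic agent impact measurement follows, under stated assumptions: each agent's footprint is a set of node labels, the perturbation removes the footprint nodes that are present in the network, and the impact score is the fractional loss of the largest connected component. The function names, agent names and toy network are hypothetical.

```python
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def agent_impact(adj, footprint):
    """Assumed impact: largest-component loss when the footprint nodes are removed."""
    targets = frozenset(footprint) & frozenset(adj)  # ignore nodes outside the network
    before = largest_component(adj)
    return (before - largest_component(adj, targets)) / before


def rank_agents(adj, footprints):
    """Return (agent, impact) pairs, highest impact first."""
    scored = [(agent, agent_impact(adj, fp)) for agent, fp in footprints.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy network: two triangles (A, B, C) and (D, E, F) joined through node X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

footprints = {"agent_ab": {"A", "B"}, "agent_x": {"X"}, "agent_z": {"Z"}}
agent_ranking = rank_agents(network, footprints)
```

Here `agent_x` ranks first because its single footprint node bridges the two triangles, while `agent_z`, whose footprint lies outside the network, scores zero.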

Optionally, a target bioactivity signature is operable to be identified from footprints of compounds with known therapeutic effect. Optionally, the effective target bioactivity signature is a collection of molecules whose simultaneous perturbation will significantly affect the network structure or the behaviour of dynamic processes operating on the network (in other words, in which the effect of the simultaneous perturbation on the network structure exceeds a predetermined threshold value).

Optionally, the one or more sets of nodes comprise one or more of: proteins; DNA; RNA; amino acids; hormones; and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents.

Optionally, the effective target bioactivity signatures are identified independently of the plurality of therapeutic agents.

Optionally, the one or more nodes are connected by edges, and further wherein edges represent one or more interactions between nodes and/or cellular interactions.

Optionally, the step of calculating an impact value for a perturbation signature indicative of the impact on the network represented by the network data, of the removal of the nodes associated with any given perturbation signature may comprise any of: calculating an impact value that is a measure of the fragmentation of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in ability to communicate along all paths of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in topology of the network as a result of the removal of the nodes associated with the perturbation signature; and/or calculating an impact value that is a measure of the change in statistical mechanics of the network as a result of the removal of the nodes associated with the perturbation signature.
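Two of the impact values listed above, fragmentation and change in a distance metric, can be sketched as follows. The specific formulas are assumptions chosen for illustration: fragmentation is measured as the fraction of connected node pairs lost, and the distance-based value as the fractional loss in the sum of inverse shortest-path lengths. The sketch also assumes the unperturbed network has at least one edge. All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, removed):
    """Shortest path lengths from `source`, ignoring `removed` nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in removed and neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def pair_statistics(adj, removed=frozenset()):
    """Return (connected pair count, sum of 1/distance) over surviving nodes."""
    pairs, closeness = 0, 0.0
    for source in adj:
        if source in removed:
            continue
        for target, d in bfs_distances(adj, source, removed).items():
            if source < target:  # count each unordered pair once, skip self
                pairs += 1
                closeness += 1.0 / d
    return pairs, closeness


def fragmentation_impact(adj, removed):
    """Assumed measure: fraction of connected node pairs that are lost."""
    before, _ = pair_statistics(adj)
    after, _ = pair_statistics(adj, frozenset(removed))
    return 1.0 - after / before


def distance_impact(adj, removed):
    """Assumed measure: fractional loss in summed inverse path lengths."""
    _, before = pair_statistics(adj)
    _, after = pair_statistics(adj, frozenset(removed))
    return 1.0 - after / before


# Toy three-node path A - B - C, for illustration only.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
```

On the toy path, removing the middle node `B` disconnects every remaining pair, so both values reach 1.0; removing the end node `A` gives a fragmentation value of 2/3 and a distance-based value of 0.6.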

Optionally, the step of calculating an impact value for a perturbation signature indicative of the impact of removing the associated nodes on the network represented by the network data may comprise approximately calculating an impact value associated with the removal of multiple nodes, which may include summing the impact values associated with the removal of each node individually.

In accordance with an aspect of the present invention there is provided a computer implemented method for identifying effective target bioactivity signatures independently of available compounds and/or for producing idealised signatures.
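The additive approximation mentioned above, in which the impact of removing several nodes is estimated by summing single-node impact values, can be compared with an exact calculation in a short sketch. The impact score used here (fractional loss of the largest connected component) and the names `impact`, `additive_impact` and `chain` are assumptions made for illustration.

```python
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def impact(adj, node_set):
    """Exact (assumed) impact of removing all nodes in `node_set` together."""
    before = largest_component(adj)
    return (before - largest_component(adj, frozenset(node_set))) / before


def additive_impact(adj, node_set):
    """Approximate impact: sum of the impacts of removing each node alone."""
    return sum(impact(adj, {node}) for node in node_set)


# Toy five-node chain A - B - C - D - E, for illustration only.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
exact = impact(chain, {"A", "B"})
approximate = additive_impact(chain, {"A", "B"})
```

For the pair `{A, B}` the exact value is 0.4 while the additive estimate is 0.6, which shows that the sum is only an approximation: it needs one evaluation per node rather than one per set, but it can misstate the joint effect.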

Brief Description of Drawings

Embodiments will now be described, by way of example only and with reference to the accompanying drawings having like-reference numerals, in which:

Figure 1 shows an example of effective target bioactivity signature identification;

Figure 2 shows an example of effective target bioactivity signature to potential therapeutic agent mapping;

Figure 3 shows an example of compound impact ranking; and

Figure 4 shows an example of a network-aware ranking of compounds.

Specific Description

Network-driven drug discovery (NDD) is a distinctive approach to drug discovery. As shown with respect to Figure 1, NDD aims to identify target bioactivity signatures (also referred to as molecular perturbation signatures), for example collections of multiple proteins, that significantly disrupt the structural integrity and/or dynamic behaviour of the cellular networks giving rise to targeted disease mechanisms. Small molecules, or other potential therapeutic agents, can then be sought based on their ability to produce the identified perturbation signature. Potential actives such as compounds are not expected to directly bind to all molecules, for example proteins, within the identified signature, but rather to produce a downstream, functional effect on the target molecules making up the signature.

Conventional drug discovery approaches are based on highly specific targeting of a single protein. NDD addresses the true complexity of disease by seeking to harness the ability of drugs to influence many different proteins. NDD has the potential to provide new treatments for complex diseases where conventional approaches have failed to deliver satisfactory therapies. Network biology represents the cell as a collection of interacting molecules and aims to elucidate how a cellular phenotype emerges from these networks of molecular interactions. The networks can be thought of as forming the mechanistic bridge between the constituent molecules of a cell and the robust phenotypes that those cells demonstrate. Cellular mechanisms of disease can therefore be considered as arising due to networks of pathological interactions that occur only in the disease state. Drug discovery can thus be viewed as the search for agents that significantly disrupt these pathological networks.

Traditional target-driven discovery takes a top-down approach, aiming to identify drug targets whose downstream, knock-on effects will significantly perturb the disease phenotype. The cellular networks responsible for the underlying disease mechanisms are rarely considered in detail during target identification or validation. NDD proposes an alternative, bottom-up approach in which the molecular networks of diseased cells, and their relationship to disease mechanisms, are considered explicitly. The use of NDD’s bottom-up approach leads to a number of advantages and opportunities as disclosed herein.

Cellular phenotypes are robust to molecular perturbations and such robustness has been implicated in the failure of compounds to translate to successful drugs. The robustness of biological phenotypes can be understood as a consequence of the underlying cellular networks and the evolutionary processes that shaped them. NDD searches for significant effects on cellular networks and addresses the mechanisms of robustness to molecular perturbation, and so leads to improved efficacy.

Complex, or multifactorial, diseases are defined as those that do not have a single genetic cause but are rather associated with the effects of multiple genes, as well as environmental factors. Such diseases represent a major challenge to traditional single target approaches. NDD considers the combination of these multiple factors as resulting in the rewiring of cellular networks in the disease context, and then uses those rewired networks as the basis of a discovery process, providing a concrete, hypothesis-based approach to tackling complex, multifactorial disease.

NDD considers potential therapeutic agent action via the indirect, functional consequences of their direct binding pattern: such a discovery approach is target agnostic and can discover drugs whose mechanism of action (MoA) is via a single target or via polypharmacological mechanisms. In addition, information regarding a compound’s target or targets is not required during discovery and so the approach is ideally suited for the identification of first in class compounds with a novel MoA. One limitation of target driven approaches is that identified and validated targets can prove “undruggable” even if the underlying disease mechanisms are well founded. NDD enables these mechanisms to be targeted allowing the discovery of compounds with novel and potentially polypharmacological MoA.

In one embodiment, there is provided a bottom up drug discovery technique comprising a system and/or method and/or apparatus which identifies an effective set of target molecules or effective target bioactivity signatures to perturb, for the treatment, prophylaxis or palliation of a disease.

In one embodiment, there is presented a computer implemented method of identifying effective target bioactivity signatures that comprises: receiving biological data S1, creation of an interaction network from the biological data S2, identification of cellular mechanisms embedded in the network related to disease S2A, identification of effective (optionally comprising optimal and/or substantially optimal) target bioactivity signatures S3 - S5, and outputting effective target bioactivity signatures S6 to a set store S7.

A method of creation of an interaction network S2 may comprise construction of an interaction network comprising nodes and edges through any applicable methods.

Interaction networks may comprise nodes connected by edges, whereby edges represent interactions between nodes. Nodes include but are not limited to Proteins, DNA, RNA, amino acids, hormones, and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents. Edges may include any cellular interaction.

A method of creation of an interaction network S2 may comprise construction of an interaction network, comprising nodes and edges, which is related to the disease being studied.

A method of creation of networks related to the disease being studied may include identification of cellular mechanisms related to the disease being studied embedded within a network.

A method of identification of cellular mechanisms embedded in a network (S2A) may comprise taking sets of nodes, known to be related to a certain functional process, perturbing the network based on each set of nodes, identifying a network impact score for each perturbation and ranking each set by their network impact score, whereby those with high impact score are ranked highly.

Sets of nodes highly ranked are considered functionally important to that network.

Sets of nodes highly ranked indicate that the cellular mechanism associated with that set of nodes is embedded within that network and is functionally important to that network.

The sets of nodes known to be related to a functional process may be, but are not limited to, node sets known to be involved in specific cellular processes, cellular functions or signalling pathways.

Optionally, node sets known to be involved in specific cellular processes or signalling pathways may be obtained from databases that curate the collections of genes, proteins, and other molecules involved in those functional constructs.

Node sets known to be involved in specific cellular processes, signalling pathways, or other such functional groupings, are obtained from databases that curate the collections of genes, proteins, and other molecules involved in those functional constructs. These databases are generally curated manually by experts in the field. Examples of such databases are the Gene Ontology Resource, specifically the GO annotations section, and WikiPathways.

A method of identification of effective target bioactivity signatures may comprise one or more of Impact maximisation analysis S5, Optimal Percolation analysis S4 and Minimal control set analysis S3.

Impact maximisation analysis S5 may comprise exhaustive network impact measurement or impact maximisation by approximation.

Exhaustive network impact maximisation comprises generating an impact score for all possible sets of nodes of a given size in the network of biological data, to identify the set of, or sets of, nodes whose perturbation in the network gives the maximal network impact.
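Exhaustive network impact maximisation as described above can be sketched by enumerating every node set of a given size. The sketch assumes a toy network small enough to enumerate and an impact score equal to the fractional loss of the largest connected component; neither is taken from the specification. The names `exhaustive_impact_maximisation`, `impact` and `network` are hypothetical.

```python
from collections import deque
from itertools import combinations


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def impact(adj, node_set):
    """Assumed impact score: fraction of the largest component that is lost."""
    before = largest_component(adj)
    return (before - largest_component(adj, frozenset(node_set))) / before


def exhaustive_impact_maximisation(adj, size):
    """Score every node set of the given size; return the best score and all ties."""
    best_score, best_sets = -1.0, []
    for candidate in combinations(sorted(adj), size):
        score = impact(adj, candidate)
        if score > best_score:
            best_score, best_sets = score, [set(candidate)]
        elif score == best_score:
            best_sets.append(set(candidate))
    return best_score, best_sets


# Toy network: two triangles (A, B, C) and (D, E, F) joined through node X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

score_1, sets_1 = exhaustive_impact_maximisation(network, 1)
score_2, sets_2 = exhaustive_impact_maximisation(network, 2)
```

For sets of size one the bridging node `X` gives the maximal impact, and for size two the pair `{C, D}` does. The number of candidate sets grows combinatorially with network and set size, which is why an approximation may be preferred for larger networks.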

Exhaustive network impact measurement may comprise identifying the set of nodes whose perturbation in the network gives the global optimal network impact via exhaustive means. Exhaustive network impact measurement may comprise identifying the set of nodes whose perturbation in the network gives the global substantially-optimal network impact via exhaustive means. The nodes whose perturbation in the network gives the global substantially-optimal network impact may be in a similar region of search space as the global optimal solution. A method of exhaustive network impact measurement may comprise identifying sets of nodes whose perturbation in the network gives a local optimal network impact via exhaustive means. One or more sets of nodes may be in a different region of search space from the global optima and global substantially optimal solution. Exhaustive network impact measurement may comprise identifying sets of nodes whose perturbation in the network gives a local substantially-optimal network impact via exhaustive means. The nodes whose perturbation in the network gives a local substantially-optimal network impact may be in a similar region of search space as the local optimal solution.

Impact maximisation by approximation may comprise identifying the set of, or sets of, nodes whose perturbation in the network maximises the network impact by approximation. A method of network impact approximation may include, but not be limited to, stochastic optimisation methods and metaheuristic optimisation methods optionally comprising one or more genetic algorithms. Impact approximation may be conducted when exhaustive approaches are not possible or not practical owing to time and/or cost constraints. Such constraints may mean that not all possible sets of nodes of a given size can have impact scores generated individually, hence an approximation is used.
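One way to approximate impact maximisation with a genetic algorithm is sketched below. It is an illustrative assumption rather than the method of the specification: the population size, mutation rate, crossover scheme, fixed random seed and the impact score (fractional loss of the largest connected component) are all hypothetical choices, as are the names `genetic_search`, `impact` and `network`.

```python
import random
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def impact(adj, node_set):
    """Assumed impact score: fraction of the largest component that is lost."""
    before = largest_component(adj)
    return (before - largest_component(adj, frozenset(node_set))) / before


def genetic_search(adj, size, generations=30, pop_size=12, seed=0):
    """Search for a high-impact node set of the given size (not guaranteed optimal)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    population = [frozenset(rng.sample(nodes, size)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: impact(adj, s), reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = set(rng.sample(sorted(a | b), size))  # crossover of two parents
            if rng.random() < 0.3:  # mutation: swap one member for another node
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([n for n in nodes if n not in child]))
            children.append(frozenset(child))
        population = parents + children
    best = max(population, key=lambda s: impact(adj, s))
    return set(best), impact(adj, best)


# Toy network: two triangles (A, B, C) and (D, E, F) joined through node X.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

best_set, best_score = genetic_search(network, size=2)
```

Because the fitter half of each generation is carried over unchanged, the best score in the population never decreases, but the returned set may be a local rather than a global optimum.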

Impact maximisation by approximation may comprise identifying the approximate set of nodes whose perturbation in the network gives the global optimal network impact. Impact maximisation by approximation may comprise identifying the approximate set of nodes whose perturbation in the network gives the global substantially-optimal network impact. The nodes whose perturbation in the network gives the global substantially-optimal network impact may be in a similar region of search space as the global optimal solution. A method of impact maximisation by approximation may comprise identifying approximate sets of nodes whose perturbation in the network gives a local optimal network impact. One or more sets of nodes may be in a different region of search space from the global optima and global substantially-optimal solution. Impact maximisation by approximation may comprise identifying approximate sets of nodes whose perturbation in the network gives a local substantially-optimal network impact. The nodes whose perturbation in the network gives a local substantially-optimal network impact may be in a similar region of search space as the local optimal solution.

Optimal percolation analysis S4 (network dismantling) may comprise identification of a plurality of sets of nodes of minimal size, whereby removal of those nodes destroys the network giant component. The minimal set of nodes whose removal destroys the network giant component can form a target bioactivity signature. Each set of nodes of minimal size may comprise different nodes but be of the same minimal size.
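Optimal percolation analysis can be illustrated with an exhaustive sketch that searches node sets in order of increasing size. It assumes, for illustration only, that the giant component counts as destroyed once no connected component holds more than half of the original nodes; the specification does not fix this criterion. The names `minimal_dismantling_sets`, `giant_fraction` and `path` are hypothetical.

```python
from collections import deque
from itertools import combinations


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def minimal_dismantling_sets(adj, giant_fraction=0.5):
    """Return every smallest node set whose removal breaks the giant component."""
    limit = giant_fraction * len(adj)  # assumed criterion for "destroyed"
    nodes = sorted(adj)
    for size in range(len(nodes) + 1):
        hits = [set(c) for c in combinations(nodes, size)
                if largest_component(adj, frozenset(c)) <= limit]
        if hits:
            return hits  # all sets of the first (minimal) size that succeed
    return []


# Toy four-node path A - B - C - D, for illustration only.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
dismantling_sets = minimal_dismantling_sets(path)
```

On the toy path two different single-node sets, `{B}` and `{C}`, satisfy the criterion, which illustrates that several sets of the same minimal size but with different nodes may be found.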

Minimal control set analysis S3 may comprise identifying a minimum dominating set as an approximation to a minimal control set. A minimum dominating set identification process may comprise identification of a plurality of dominating sets of smallest size for any given network, whereby each dominating set of smallest size may comprise different nodes.
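The identification of minimum dominating sets can be sketched exhaustively for a small network. A dominating set is a set of nodes such that every node of the network is either in the set or adjacent to a member of it. The brute-force search below is an assumption chosen for clarity and is practical only for small networks; the function names and the toy network are hypothetical.

```python
from itertools import combinations


def is_dominating(adj, candidate):
    """True if every node is in `candidate` or adjacent to one of its members."""
    covered = set(candidate)
    for node in candidate:
        covered |= adj[node]
    return covered == set(adj)


def minimum_dominating_sets(adj):
    """Return every dominating set of the smallest possible size."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        hits = [set(c) for c in combinations(nodes, size) if is_dominating(adj, c)]
        if hits:
            return hits
    return []


# Toy network: hub H linked to A, B and C, with D attached to C.
network = {
    "H": {"A", "B", "C"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
}
dominating_sets = minimum_dominating_sets(network)
```

For this toy network no single node dominates all others, and the search returns two minimum dominating sets of size two, `{C, H}` and `{D, H}`, each comprising different nodes.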

According to one embodiment, the effective target bioactivity signatures may be output to a data store, optionally referred to as an Effective Target Bioactivity Signature Set Store S7.

According to one embodiment, the effective target bioactivity signatures are output as an ordered set, whereby those with high impact scores are ranked more highly. The identified target bioactivity signatures may be considered effective target bioactivity signatures when the impact score is above a user defined threshold. Effective target bioactivity signatures may be considered as collections of molecules whose perturbation will significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network. Such effective target bioactivity signatures may be sets of nodes within the network whose perturbation will significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network. The nodes may comprise any cellular constituent including but not limited to Proteins, DNA, RNA, amino acids, hormones, and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents. Effective target bioactivity signatures can be considered optimal or substantially-optimal collections of nodes, whereby substantially-optimal solutions are close to the optimal. Effective target bioactivity signatures may be identified independently of therapeutic agents. Effective target bioactivity signatures may be used to identify potential therapeutic agents.
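The thresholding and ordering of signatures described above amounts to a filter followed by a sort, as in the following minimal sketch. The scores, the threshold value and the names `effective_signatures` and `scored_signatures` are hypothetical and chosen only for illustration.

```python
def effective_signatures(scored, threshold):
    """Keep signatures scoring above the threshold, highest impact first."""
    kept = [(signature, score) for signature, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical (signature, impact score) pairs, for illustration only.
scored_signatures = [
    (frozenset({"P1", "P2"}), 0.4),
    (frozenset({"P3"}), 0.9),
    (frozenset({"P4", "P5"}), 0.1),
]
ordered = effective_signatures(scored_signatures, threshold=0.3)
```

With a user-defined threshold of 0.3, the signature scoring 0.1 is discarded and the remaining two are output in descending order of impact score.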

According to a further embodiment, there is provided a computer implemented method for identifying effective therapeutic agents for the use in treatment, prophylaxis and/or palliation of disease, optionally using identified bioactivity signatures and maps to known and/or predicted compound bioactivity signatures.

According to one embodiment as shown in Figure 1, there is presented a computer implemented method of identifying effective target bioactivity signatures that comprises: receiving biological data S1, obtaining footprint data of known therapeutic agents S8, expansion of the footprint using biological data S9, identifying effective target bioactivity signatures based on the expanded footprint network S10, and outputting effective target bioactivity signatures S11 to a set store S7.

According to one embodiment as further shown in Figure 1, there is presented a computer implemented method of identifying effective target bioactivity signatures that comprises: receiving biological data S1, obtaining footprint data of known therapeutic agents S8, identifying effective target bioactivity signatures based on the footprint, and outputting effective target bioactivity signatures S11 to a set store S7.

As shown in Figure 2, there is provided a computer implemented method of therapeutic agent discovery that comprises: receiving biological data S1, creation of an interaction network S2, identification of cellular mechanisms involved in the network related to the disease S2A, identification of effective (optionally optimal and/or substantially-optimal) target bioactivity signatures S3-S5, outputting effective bioactivity signatures S6 to a data set store S7, obtaining potential therapeutic agent footprint data C1, creating a potential therapeutic agent signature database C2, searching the potential therapeutic agent signature database and mapping effective bioactivity signatures to the potential therapeutic agent signatures T1, outputting a result of the potential therapeutic agent to bioactivity signature overlap T2, and ranking by overlap to generate an ordered sub-set of potential therapeutic agents T3.

The potential therapeutic agent signature database C2 may be generated through creating a dataset of potential therapeutic actives against their footprints, whereby the dataset of potential therapeutic actives includes activity values in both direct binding assays and indirect functional assays to provide an overall potential therapeutic active functional footprint. The dataset of potential therapeutic actives may comprise empirical and predicted data, whereby predicted includes machine learning.
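By way of illustration only, the mapping of effective bioactivity signatures onto potential therapeutic agent footprints and the ranking by overlap (T1 to T3) can be sketched as follows. The Jaccard index as the overlap measure, the function names and the example identifiers are all assumptions made for this sketch; the embodiments do not prescribe a particular overlap measure.

```python
def jaccard(a, b):
    """Jaccard index between two sets of molecule identifiers (assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_agents_by_overlap(target_signature, footprint_db, min_overlap=0.0):
    """Score every agent footprint against the target signature and rank by overlap.

    footprint_db is a hypothetical mapping: agent name -> set of molecules in its footprint.
    Agents whose overlap does not exceed min_overlap are dropped.
    """
    scored = [(agent, jaccard(target_signature, fp)) for agent, fp in footprint_db.items()]
    kept = [(agent, score) for agent, score in scored if score > min_overlap]
    # Highest overlap first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical example data.
target_signature = {"P1", "P2", "P3", "P4"}
footprint_db = {
    "agent_A": {"P1", "P2", "P3"},
    "agent_B": {"P2", "P9"},
    "agent_C": {"P7", "P8"},
}
ranked = rank_agents_by_overlap(target_signature, footprint_db)
```

In this example agent_A shares three of four molecules with the target signature and is ranked first, while agent_C has no overlap and is excluded from the ordered sub-set.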

A method of searching for similarity between effective target bioactivity signatures and therapeutic agents T1 may comprise taking the effective target bioactivity signatures, identifying therapeutic agents with a downstream footprint similar to those effective target bioactivity signatures, and outputting a list, optionally ranked by similarity and/or closeness between the footprint signature and target bioactivity signature, to a potential therapeutic agent sub-set store. The potential therapeutic agents may be stored within the potential therapeutic agent sub-set store T3 and be considered enriched in agents which have phenotypic activity in vitro. The potential therapeutic agents stored within the potential therapeutic agent sub-set store may be further optimised via wet-chemistry approaches as recognised in the art as hit-lead and lead-optimisation campaigns. The sub-set of potential therapeutic agents stored within the potential therapeutic agent sub-set store which are subsequently optimised via wet-chemistry approaches may be considered enriched in agents which have phenotypic activity in vivo. Effective target bioactivity signatures may be considered collections of molecules whose perturbation will significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network. Such effective target bioactivity signatures are sets of nodes within the network whose perturbation will significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network.

Such effective target bioactivity signatures may be optimal or sub-optimal whereby sub-optimal is close to optimal. Effective target bioactivity signatures are identified independently of therapeutic agents. Perturbation may refer to the full, partial, or probabilistic removal of nodes from a network. Networks may comprise one or more nodes connected by edges, whereby edges represent interactions between nodes. Nodes may include but are not limited to Proteins, DNA, RNA, amino acids, hormones and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents. Edges may include any cellular interaction.

Ranking of therapeutic agent bioactivity signatures may comprise taking identified therapeutic agent bioactivity signatures and ranking by their network impact, whereby those with high impact are ranked highly. Highly ranked potential therapeutic agents may be operable to significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network. Lists of potential therapeutic agents, ranked by impact, can then be output to a ranked potential therapeutic agent ordered sub-set store. The potential therapeutic agents stored within the potential therapeutic agent ordered sub-set store are considered enriched in agents which have phenotypic activity in vivo.

According to a further embodiment, there is provided a computer implemented method for identifying effective therapeutic agents for the use in treatment, prophylaxis, or palliation of disease. This method may use actual or predicted compound bioactivity signatures (also referred to as “footprints”) and rank their impact on a network (also referred to as potential therapeutic agent impact ranking). The actual or predicted compound bioactivity signatures may be potentially therapeutic actives. As further shown in Figure 2, in order to apply the network-based approaches to drug discovery, the identification of therapeutic agents with the ability to perturb the molecules in the target bioactivity signature is required. The approach taken to therapeutic agent mapping is to use a therapeutic agent’s bioactivity signature, also referred to as a footprint, defined as the combination of activity values against specific molecules in both direct binding assays and indirect functional assays performed in a cellular context. Such a mapping approach may consider the target signature as the complete, downstream, molecular effect of an agent and not just the cellular receptor through which that effect is initially triggered.

As shown in Figure 3, potential therapeutic agent impact ranking may comprise receiving biological data S1, creating an interaction network S2, identifying cellular mechanisms involved in the network S2A, obtaining potential therapeutic agent footprint data C1, creating a potential therapeutic agent signature database C2, and searching the potential therapeutic agent signature database to identify footprints with at least one perturbation in the network, or with at least a user defined minimum overlap between footprint and network C3. For each potential therapeutic agent meeting these criteria, the method then comprises calculating an impact value for the footprint on the network C5-C7, and outputting an ordered sub-set of potential therapeutic agents ranked by impact value. The impact value may be an approximation.

Potential therapeutic agents in this sub-set include those with footprints with at least one perturbation in the network, or with at least a user defined minimum overlap between footprint and network. Agents are ranked by impact value based upon one or multiple impact metrics.
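A minimal sketch of potential therapeutic agent impact ranking is given below for illustration. It assumes an undirected network stored as a dictionary of neighbour sets, uses the relative reduction in the size of the largest connected component as the single impact metric, and treats perturbation as full removal of the footprint nodes. The function names, the `min_overlap` parameter and the example data are hypothetical.

```python
from collections import deque


def remove_nodes(graph, nodes):
    """Return a copy of the network with the given nodes (and their edges) removed."""
    nodes = set(nodes)
    return {u: {v for v in nbrs if v not in nodes}
            for u, nbrs in graph.items() if u not in nodes}


def largest_component_size(graph):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def impact(graph, footprint):
    """Relative reduction in largest component size (assumed impact metric)."""
    before = largest_component_size(graph)
    after = largest_component_size(remove_nodes(graph, footprint))
    return (before - after) / before


def rank_by_impact(graph, footprints, min_overlap=1):
    """Keep agents whose footprint hits at least min_overlap network nodes, then rank."""
    scores = {}
    for agent, footprint in footprints.items():
        hits = set(footprint) & set(graph)
        if len(hits) >= min_overlap:
            scores[agent] = impact(graph, hits)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical five-node path network a-b-c-d-e and three agent footprints.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
footprints = {"agent_mid": {"c"}, "agent_end": {"a"}, "agent_off": {"z"}}
ranked = rank_by_impact(network, footprints)
```

Here agent_off is excluded because its footprint has no node in the network, and agent_mid outranks agent_end because removing the central node splits the path in two.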

Ranking of therapeutic agents may comprise taking identified therapeutic agent footprints and ranking by their network impact, whereby those with high impact are ranked highly. Highly ranked potential therapeutic agents may be operable to significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network. Lists of potential therapeutic agents, ranked by impact, can then be output to a ranked potential therapeutic agent ordered sub-set store. The potential therapeutic agents stored within the potential therapeutic agent ordered sub-set store are considered enriched in agents which have phenotypic activity in vivo.

According to a further embodiment, the ranked potential therapeutic agents stored within the ranked potential therapeutic agent store may be further tested to identify hits and optimised via wet-chemistry approaches as recognised in the art as hit-lead and lead-optimisation campaigns. Networks may comprise nodes connected by edges, whereby edges represent interactions between nodes. Nodes include but are not limited to Proteins, DNA, RNA, amino acids, hormones, and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents. Edges may include any cellular interaction.

A method of identifying network impact may comprise identifying the relative change in a network level statistic. Network level statistics can be chosen from one or more of: network fragmentation measures, shortest path measures, all path measures, local topology measures, or statistical mechanics measures. Network fragmentation measures may be selected from one or more of: size of largest connected component; mean isolated connected component size; potential edge count; edge connectivity; resilience factor; percolation threshold; or algebraic connectivity. Shortest path measures may be selected from one or more of: mean distance, diameter, or efficiency. All path measures may be selected from one or more of: natural connectivity; effective graph resistance; bipartivity; spanning tree count or total graph diversity. Local topology measures may be selected from one or more of: average local clustering coefficient, global clustering coefficient, average disconnectedness, or changes in graphlet distribution. Statistical mechanical measures may be selected from one or more of entropy and/or complexity. The impact value may be an approximation.

According to one embodiment a sub-set of potential therapeutic agents is produced, whereby the potential therapeutic agents in the subset are considered enriched in agents which have phenotypic activity in vitro.

It has been conventionally assumed that identified important proteins are to be treated as direct binding targets, whereas the arrangement as disclosed herein utilises the downstream functional effect of a compound. Embodiments seek to provide a method and apparatus for analysing data relating to analysis of complex networks in order to identify target bioactivity signatures and potential therapeutic actives that are likely candidates for drug therapies.

The approach of some embodiments may comprise the following advantages:

• The approach does not limit results to only directly druggable targets, since proteins driving discovery are not just direct drug targets.

• One limitation of target driven approaches is that identified and validated targets can prove undruggable even if the underlying disease mechanisms are well founded. This technique enables these mechanisms to be identified and utilised allowing the discovery of compounds with novel and potentially polypharmacological mechanism of action (MoA).

• Consideration of compound action via the indirect, functional consequences of their direct binding pattern implies that such a discovery approach is target agnostic and can discover drugs whose MoA is via a single target or via polypharmacological mechanisms.

• Information regarding a compound’s target or targets is not required during discovery and so the approach is ideally suited for the identification of first in class compounds with novel MoA.

• Targets do not need to be in the original network thus do not need to be known.

• Targets do not need to be known and so the approach is applicable to mechanisms and indications (for example complex polygenic diseases) that have thwarted target identification and validation.

• By taking into account downstream effects, biological potentiation may be allowed for such that an indirect effect can be more potent than original binding affinity would suggest.

• A discovery approach driven by the search for significant effects on cellular networks will address the mechanisms of robustness to molecular perturbation, and so lead to improved efficacy.

NDD requires a number of components. First, the ability to construct computational, network based, models representing the cellular disease mechanisms to be targeted. Second, analysis techniques that can use those models to identify effective molecular perturbation signatures. Third, the ability to identify compounds whose downstream, functional effects match the identified perturbation signatures.

There are numerous approaches which can be followed to construct computational network-based models representing cellular disease mechanisms. It will be well understood by a person skilled in the art that networks of cellular disease mechanisms consist of nodes, representing cellular molecules such as inter alia proteins, connected via edges representing interactions between those molecules. Nodes in such networks are not limited to proteins and edges are not limited to a single type of interaction. Edges are interactions between nodes, thus in this context may include any cellular interaction.

According to some embodiments, input networks could be derived from any source, including but not limited to genomic, proteomic, and/or metabolomic data. In one embodiment impact measures are calculated, such impact measures account for the system as a whole rather than individual nodes.

The concept of linear superposition states that for all linear systems the net response caused by two or more stimuli is the sum of the responses that would have been caused by each stimulus individually. In fact, this can be viewed as the definition of linearity. Any system where superposition holds can be called a linear system. Linearity implies that the collective behaviour of multiple system elements can be assessed by studying the isolated elements and simply summing the behaviour. Biological networks are not linear in their behaviour. As such, collective behaviour of multiple network elements (nodes or edges) cannot be assessed by measurement of the isolated nodes or edges except as an approximation.

The implication of the non-linearity to the problem of identification of node sets that optimize impact is that optimal sets cannot be identified via the identification of important nodes in isolation (i.e. a node metric approach). Behaviour of the network as a whole, under perturbation conditions, needs to be studied in order to identify optimal node sets. All impact measures are based on measures of network behaviour as a whole.
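The failure of superposition can be seen in a small worked example. The sketch below assumes a six-node ring network and a simple fragmentation statistic (the fraction of surviving nodes lying outside the largest connected component); both the network and the statistic are chosen purely for illustration.

```python
def component_sizes(graph):
    """Sizes of all connected components, found by depth-first search."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def fragmentation(graph, removed):
    """Fraction of surviving nodes outside the largest component after node removal."""
    removed = set(removed)
    kept = {u: {v for v in nbrs if v not in removed}
            for u, nbrs in graph.items() if u not in removed}
    sizes = component_sizes(kept)
    if not sizes:
        return 0.0
    return 1 - max(sizes) / sum(sizes)


# Six-node ring 0-1-2-3-4-5-0.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

# Each node removed in isolation leaves a connected path: no fragmentation.
single_sum = fragmentation(ring, {0}) + fragmentation(ring, {3})
# Removing both together splits the ring into two halves.
joint = fragmentation(ring, {0, 3})
```

The individual impacts sum to 0.0, whereas the joint perturbation gives 0.5, so ranking nodes in isolation would not reveal that the pair is an effective node set.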

In a further embodiment, footprints of potential therapeutic agents, for example compounds, are used. The use of the downstream footprint takes into account the potential amplifications of effects of a compound after its initial binding. Thus, generally, embodiments are not looking to identify targets in a traditional sense. Therefore, the described approach does not limit results (for example protein sets) to only directly druggable targets, since proteins driving discovery are not just direct drug targets.

While numerous techniques exist to create networks, it may be desirable to identify techniques which are able to analyse such networks to identify the effect of perturbations within a network, caused by removal of nodes associated with effective target bioactivity signatures, and match such effective target bioactivity signatures to potential therapeutic actives.

One embodiment aims to identify target bioactivity signatures whose perturbation will have a significant effect on the structural integrity of the network or behaviours of dynamic processes operating on the network. One such way of identification of target bioactivity signatures described by the invention, is via the maximisation of impact. Such techniques require calculation, or approximation of, impact values for any given perturbation or an optimisation thereof. A further embodiment aims to identify a sub-set of potential therapeutic agents, from a potential therapeutic active database, by therapeutic agent impact measurement. Therapeutic agent impact measurement comprises measurement of impact of each of the potential therapeutic agents on the network. According to this embodiment the nodes in the network to perturb to measure impact are chosen based on the footprint of the potential therapeutic agents.

Impact measures which may be calculated for either effective target bioactivity signature identification or potential therapeutic agent impact measurement are described herein in generalised terms for any perturbation.

The impact of a given perturbation on a network can be calculated via various measures as described herein. The term perturbation signature used herein may refer to either a potential target bioactivity signature or a potential therapeutic agent footprint.

The impact value for a perturbation signature can be indicative of the overall change in the structure of the network, as a whole, represented by the network data as a result of the removal of the nodes associated with a perturbation signature.

The step of calculating an impact value for a perturbation signature indicative of the impact on the network represented by the network data, of the removal of the nodes associated with any given perturbation signature may comprise any of: calculating an impact value that is a measure of the fragmentation of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in ability to communicate along all paths of the network as a result of the removal of the nodes associated with the perturbation signature; calculating an impact value that is a measure of the change in topology of the network as a result of the removal of the nodes associated with the perturbation signature; and/or calculating an impact value that is a measure of the change in statistical mechanics of the network as a result of the removal of the nodes associated with the perturbation signature.

The step of calculating an impact value for a perturbation signature indicative of the impact on the network represented by the network data, of the removal of the nodes associated with any given perturbation signature, may be an approximation of an impact value. Optionally, the step of calculating an impact value for a perturbation signature indicative of the impact on the network represented by the network data, of the removal of the nodes associated with any given perturbation signature may comprise, calculating an impact value associated with the removal of multiple nodes approximately, which may include summation of the impact values associated with the removal of each node individually.

The step of calculating an impact value that is a measure of the fragmentation of the network as a result of the removal of the nodes associated with the perturbation signature may comprise any of: calculating an impact value that is an average size for all components of the network except the largest component after the removal of the nodes associated with the perturbation signature via mean isolated connected component size, calculating an impact value that is a size of the largest component of the network after the removal of the nodes associated with the perturbation signature via calculation of the size of largest connected component, calculating an impact value that is a difference between the number of potential connections in the network after the removal of the nodes associated with the perturbation signature and the number of potential connections in a fully connected network with the same number of nodes through a potential edge count, calculating an impact value that is representative of the minimum number of edges that need to be removed to fragment a network through edge connectivity, calculating an impact value that is representative of the number of subgraphs that can be removed before a network fragments using a resilience factor, calculating an impact value that is representative of the number of nodes that need to be removed in order to fully disconnect a network using a percolation threshold, calculating an impact value that is representative of the algebraic connectivity of the network also referred to as algebraic connectivity.

The step of calculating a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the bioactivity signature may comprise any of: calculating an impact value that is indicative of the change in the mean distance/length of the shortest paths of the network as a result of the removal of the nodes associated with the bioactivity signature comprising a mean distance, and calculating an impact value that is indicative of the change in the maximum distance between two nodes of the network as a result of the removal of the nodes associated with the bioactivity signature also referred to as a diameter.

The step of calculating an impact value that is a measure of the change in ability to communicate along all paths of the network as a result of the removal of the nodes associated with the perturbation signature may comprise: calculating an impact value that is indicative of the natural connectivity of a network as a result of the removal of the nodes associated with the bioactivity signature, calculating an impact value that is indicative of the change in the effective graph resistance of the network as a result of the removal of the nodes associated with the bioactivity signature, calculating an impact value that is indicative of the change in the total graph diversity of the network as a result of the removal of the nodes associated with the bioactivity signature, calculating an impact value that is indicative of the change in the bipartivity of the network as a result of the removal of the nodes associated with the bioactivity signature, calculating an impact value that is indicative of the change in the spanning tree count of the network as a result of the removal of the nodes associated with the bioactivity signature.

The step of calculating an impact value that is a measure of the change in local topology through clustering within and/or connectivity of the network as a result of the removal of the nodes associated with the perturbation signature may comprise any of: calculating an impact value that is indicative of the change in the average local clustering coefficient of the network as a result of the removal of the nodes associated with the perturbation signature, calculating an impact value that is indicative of the change in the global clustering coefficient of the network as a result of the removal of the nodes associated with the perturbation signature, calculating an impact value that is indicative of the change in the disconnectedness of the network as a result of the removal of the nodes associated with the perturbation signature and calculating an impact value that is indicative of the change in the graphlet distribution of the network as a result of the removal of the nodes associated with the perturbation signature.
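As one illustration of a local topology measure, the following sketch computes the average local clustering coefficient before and after node removal. It assumes an undirected network without self-loops stored as a dictionary of neighbour sets, and assigns a coefficient of zero to nodes with fewer than two neighbours; the example network is hypothetical.

```python
from itertools import combinations


def local_clustering(graph, u):
    """Fraction of pairs of neighbours of u that are themselves connected."""
    nbrs = graph[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for low-degree nodes
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, u) for u in graph) / len(graph)


def remove_nodes(graph, nodes):
    """Return a copy of the network with the given nodes (and their edges) removed."""
    nodes = set(nodes)
    return {u: {v for v in nbrs if v not in nodes}
            for u, nbrs in graph.items() if u not in nodes}


# Hypothetical network: two triangles a-b-c and a-b-d sharing the edge a-b.
network = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}

before = average_clustering(network)                      # 5/6
after = average_clustering(remove_nodes(network, {"a"}))  # no triangles remain
```

Removing node a destroys both triangles, so the average local clustering coefficient falls from 5/6 to zero.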

The step of calculating an impact value that is a measure of the change in statistical mechanics of the network as a result of the removal of the nodes associated with the perturbation signature may comprise any of: calculating an impact value that is indicative of the change in the network entropy of the network as a result of the removal of the nodes associated with the perturbation signature and calculating an impact value that is indicative of the change in the network complexity of the network as a result of the removal of the nodes associated with the perturbation signature.

A network impact arrangement is operable to quantify the effect, on the network as a whole, of perturbing a set of nodes. Network impact, I, is defined mathematically as the relative change in a network level statistic S(N)

where N denotes the network before perturbation and N* denotes the network after perturbation.
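The notion of impact as a relative change in a network level statistic can be sketched generically as below. The sign convention and normalisation used here, (S(N) - S(N*)) / S(N), are assumptions made for illustration, as are the function names and the use of edge count as the example statistic.

```python
def network_impact(statistic, network, perturbed_network):
    """Relative change in a network level statistic S between N and N*.

    Assumed form: (S(N) - S(N*)) / S(N), so a reduction in S gives positive impact.
    """
    s_before = statistic(network)
    s_after = statistic(perturbed_network)
    if s_before == 0:
        raise ValueError("statistic is zero on the unperturbed network")
    return (s_before - s_after) / s_before


def edge_count(graph):
    """Example statistic: number of undirected edges."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def remove_nodes(graph, nodes):
    """Perturbation modelled as full removal of the given nodes."""
    nodes = set(nodes)
    return {u: {v for v in nbrs if v not in nodes}
            for u, nbrs in graph.items() if u not in nodes}


# Hypothetical network: triangle a-b-c with a pendant node d attached to c.
N = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
N_star = remove_nodes(N, {"c"})

I = network_impact(edge_count, N, N_star)  # (4 - 1) / 4
```

Any of the statistics described below could be passed in place of `edge_count`; measures for which impact is assessed by an increase would need the opposite sign.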

There are multiple different possible functions S(N) that can be used as a network level statistic and they are described below. The specific impact functions are categorized based on the type of network changes they are measuring. Some impact measures deviate from the general form above and in those cases the full equation for I will be given.

Fragmentation based impact functions can comprise multiple different possible functions S(N) that can be used as a network level statistic. The specific impact functions are categorized based on the type of network changes they are measuring. By way of example, the impact value for a perturbation signature that is indicative of the impact on the network represented by the network data of the removal of the nodes associated with the perturbation signature may comprise any of a measure of the fragmentation of the network as a result of the removal of the nodes associated with the perturbation signature, a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the perturbation signature, or a measure of the change in clustering within/connectivity of the network as a result of the removal of the nodes associated with the perturbation signature.

Fragmentation based impact functions measure how much a perturbation fragments the network into multiple sub-networks, or measure the change in how easy it would be to fragment the network after perturbation. Network fragmentation therefore refers to the breaking up of the network represented by the network data into a number of separate subnetworks/components, wherein each subnetwork/component comprises one or more nodes of the network that are not connected/linked to another subnetwork/component.

As a measure of the fragmentation of the network as a result of a perturbation, the average size of all of the connected subnetworks/components of the network except for the largest subnetwork/component can be calculated. For example, such a calculation could take the form:

I = 0 when Nc = 1

where |C| is the size of component C, and Nc is the number of components, after perturbation.
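A minimal sketch of the mean isolated connected component size is shown below. It assumes the impact is the plain arithmetic mean of |C| over all components other than one largest component, with I = 0 when Nc = 1; the exact normalisation, the function names and the example network are assumptions.

```python
def component_sizes(graph):
    """Sizes |C| of all connected components, found by depth-first search."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def mean_isolated_component_size(graph, removed):
    """Mean size of all components except the largest, after removing nodes."""
    removed = set(removed)
    kept = {u: {v for v in nbrs if v not in removed}
            for u, nbrs in graph.items() if u not in removed}
    sizes = sorted(component_sizes(kept))
    if len(sizes) <= 1:
        return 0.0  # I = 0 when Nc = 1 (or when nothing is left)
    isolated = sizes[:-1]  # drop a single largest component
    return sum(isolated) / len(isolated)


# Hypothetical seven-node path 0-1-2-3-4-5-6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}

unperturbed = mean_isolated_component_size(path, set())   # one component
perturbed = mean_isolated_component_size(path, {2, 4})    # components of size 2, 1, 2
```

After removal of nodes 2 and 4 the components have sizes 2, 1 and 2; discarding one largest component leaves sizes 1 and 2, giving a mean of 1.5.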

Alternatively, as a measure of the fragmentation of the network as a result of a perturbation, a measure of the relative size of the largest connected subnetwork or component of the network after the removal of the nodes can be calculated. For example, such a calculation could take the form:

where |N| is the size of the network and max |C| is the size of the largest component after perturbation. This measure has been conventionally described as a measure of the fragmentation of a network due to perturbations.

As a further alternative, as a measure of the fragmentation of the network as a result of the removal of the nodes associated with a perturbation signature, the difference between the number of potential connections in the network after the removal of the nodes associated with the perturbation signature and the number of potential connections in a fully connected network with the same number of nodes can be calculated. As a network becomes fragmented the number of potential edges reduces and impact increases. For example, such a calculation could take the form:

where |C|² is the number of potential edges (i.e. edges that would be present in a fully connected component) in the component C.
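The potential edge count can be sketched as follows. The sketch takes |C|² as the number of potential edges in component C, as stated above, and assumes the impact is normalised as one minus the ratio of the summed potential edges to the potential edges of a single fully connected network on the same surviving nodes. That normalisation, the function names and the example are assumptions.

```python
def component_sizes(graph):
    """Sizes |C| of all connected components, found by depth-first search."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def potential_edge_impact(graph, removed):
    """1 - sum(|C|^2) / n^2, where n is the number of nodes left after removal."""
    removed = set(removed)
    kept = {u: {v for v in nbrs if v not in removed}
            for u, nbrs in graph.items() if u not in removed}
    sizes = component_sizes(kept)
    n = sum(sizes)
    if n == 0:
        return 0.0
    return 1.0 - sum(s * s for s in sizes) / (n * n)


# Hypothetical seven-node path 0-1-2-3-4-5-6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}

intact = potential_edge_impact(path, set())  # single component
split = potential_edge_impact(path, {3})     # two components of three nodes each
```

With the central node removed, the two remaining components contribute 9 + 9 potential edges against 36 for a single six-node component, giving an impact of 0.5.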

As a further alternative, as a measure of the fragmentation of the network as a result of the removal of the nodes associated with a perturbation signature a measure of the edge (vertex) connectivity can be calculated where edge connectivity is defined as the minimum number of edges that need to be removed to fragment a network. A large effect on the network is measured by a reduction in edge connectivity. These measures have been investigated as measures of robustness of communication networks.

As a further alternative, as a measure of the fragmentation of the network as a result of the removal of the nodes associated with a perturbation signature a measure of the resilience factor of a network can be calculated whereby the resilience factor of a network measures the number of subgraphs that can be removed before a network fragments. A reduction in resilience factor can be used to measure the impact of a network perturbation. Resilience factor has been used as a measure of robustness in computer networks.

As a further alternative, as a measure of the fragmentation of the network as a result of the removal of the nodes associated with a perturbation signature the percolation threshold of a network can be calculated where the percolation threshold of a network is defined as the number of nodes that need to be removed in order to fully disconnect the network. This is also described as the destruction of the giant component. Impact can be measured as a reduction in the percolation threshold as a result of perturbing the network.

As a further alternative, as a measure of the fragmentation of the network as a result of the removal of the nodes associated with a perturbation signature the algebraic connectivity can be calculated where the algebraic connectivity of the network is defined as the second smallest eigenvalue of the network Laplacian matrix and measures how far from fully fragmented the network is. Impact can be measured as a reduction in algebraic connectivity and so a move towards fully fragmented. Algebraic connectivity has previously been investigated as a measure of network robustness.

Distance within a network is defined as the number of connections or edges in the shortest path between two nodes, and network distance metrics can therefore be used as an indication of the size and connectivity of a particular network. Consequently, calculating the change in a distance metric of the network as a result of the removal of the nodes associated with the perturbation signature provides a basis for assessing the impact of the removal of the nodes on the size and connectivity of the network.

As a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the identified proteins, the change in the average distance (i.e. the length of the shortest path) of the network as a result of the removal of the nodes associated with the identified proteins can be calculated. Impact can be assessed by an increase in average distance between nodes after a perturbation. Mean distance has been used in multiple papers as a measure of network robustness.

Alternatively, as a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the identified proteins, the change in the diameter (i.e. the longest shortest path) of the network as a result of the removal of the nodes associated with the identified proteins can be calculated. The diameter of a network may be defined as the maximum distance between two nodes in the network and summarizes the communication ability across that network. Impact can be assessed by an increase in network diameter after a perturbation. Network diameter has been used to measure the difference in network response to random versus targeted perturbations.

Alternatively, as a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the identified proteins, the change in the efficiency of the network as a result of the removal of the nodes associated with the identified proteins can be calculated. Efficiency measures the averaged sum of the reciprocal of distances between each pair of nodes and has been studied in the context of robustness. A decrease in efficiency after perturbation implies the perturbation has reduced the ability of nodes to communicate and so can be used as a measure of impact.
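For illustration, efficiency can be computed from breadth-first-search distances as sketched below. The sketch assumes an unweighted, undirected network, averages 1/d over all ordered pairs of distinct nodes, and lets unreachable pairs contribute zero, which keeps the measure defined when a perturbation disconnects the network. Function names and the example network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distances (edge counts) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def efficiency(graph):
    """Average of 1/d(u, v) over ordered pairs u != v; unreachable pairs add 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if v != u:
                total += 1.0 / d
    return total / (n * (n - 1))


def remove_nodes(graph, nodes):
    """Return a copy of the network with the given nodes (and their edges) removed."""
    nodes = set(nodes)
    return {u: {v for v in nbrs if v not in nodes}
            for u, nbrs in graph.items() if u not in nodes}


# Hypothetical four-node path a-b-c-d.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

before = efficiency(network)                      # 13/18
after = efficiency(remove_nodes(network, {"b"}))  # 1/3
relative_decrease = (before - after) / before     # 7/13
```

Removing node b isolates node a, so efficiency falls from 13/18 to 1/3, a relative decrease of 7/13.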

All path communication impact functions measure how much a perturbation changes the ability for nodes to communicate along all paths through a network. Such changes may include one or more of: natural connectivity or communicability; effective graph resistance; total graph diversity; bipartivity; and/or spanning tree count.

Natural connectivity or communicability has been proposed as a measure of network robustness that explicitly considers all redundant paths. It is defined as the natural logarithm of the average weighted sum of the lengths of all closed walks within the network. It is computationally tractable due to its relationship to the eigen spectra of the adjacency matrix. Increases in natural connectivity as a result of a perturbation can be used as an impact measure.

If a network is considered as an electrical circuit with each edge corresponding to a resistor then the effective resistance between all node pairs can be calculated from the network Laplacian. Given that current can flow through any path between two nodes this also reflects a measure of node communication along all paths. The total effective resistance summed over all pairs of nodes is known as the effective graph resistance and has been proposed as a good measure of network robustness. Increases in effective graph resistance as a result of a perturbation can be used as an impact measure.
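
A small worked sketch of the effective graph resistance follows. It assumes unit resistors on every edge and a connected network, and it uses the determinant identity R(i, j) = det(L with rows and columns i and j removed) / det(L with row and column i removed), where L is the network Laplacian. Exact fractions are used so the toy results are easy to check; the approach is only practical for small networks, and every name below is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def laplacian(nodes, edges):
    """Degree matrix minus adjacency matrix."""
    index = {v: i for i, v in enumerate(nodes)}
    L = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = index[u], index[v]
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L

def minor(matrix, drop):
    """Matrix with the rows and columns listed in `drop` removed."""
    keep = [i for i in range(len(matrix)) if i not in drop]
    return [[matrix[i][j] for j in keep] for i in keep]

def effective_graph_resistance(nodes, edges):
    """Sum of pairwise effective resistances for a connected network."""
    L = laplacian(nodes, edges)
    base = determinant(minor(L, {0}))
    if base == 0:
        raise ValueError("network is disconnected")
    return sum(determinant(minor(L, {i, j})) / base
               for i, j in combinations(range(len(nodes)), 2))

# Toy networks (illustrative only).
triangle = effective_graph_resistance("abc", [("a", "b"), ("b", "c"), ("a", "c")])
chain = effective_graph_resistance("abc", [("a", "b"), ("b", "c")])
```

Here the triangle gives a total of 2 and the three-node chain a total of 4: losing the redundant edge doubles the resistance, reflecting the loss of an alternative path.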

Path diversity measures, for a particular path between a pair of nodes, the relative number of nodes on that path that are not on the shortest path between the two nodes; it is explicitly aimed at measuring redundancy in node-to-node communication. Effective path diversity is an aggregation of path diversities for all paths between a node pair, weighted to ensure the measure is bounded. Total graph diversity is the effective path diversity averaged across all node pairs. A decrease in total graph diversity as a result of perturbation indicates a reduction in overall communication ability, and so can be used as an impact measure.

Bipartivity is the ratio of even length closed walks to all closed walks and has been shown to be related to communication efficiency of various networks. Changes in bipartivity as a result of perturbation can thus be used to assess changes in communication efficiency and used as an impact measure.

A spanning tree is defined as a sub-network containing all vertices of the original network, with an edge count equal to n-1, where n is the number of nodes in the original network, and no cycles. A network will generally have multiple spanning trees. The number of trees is related to how well connected a network is and has been investigated as a measure of network robustness. Reduction in the spanning tree count as a result of perturbation can thus be used as an impact measure.
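
One way to obtain the spanning tree count, offered here only as a sketch, is Kirchhoff's matrix tree theorem: the count equals the determinant of the network Laplacian with any one row and the matching column deleted. The description does not state which method is used, so this choice, the function names and the toy networks are assumptions.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def spanning_tree_count(nodes, edges):
    """Number of spanning trees via the matrix tree theorem."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    # Delete the first row and column, then take the determinant.
    reduced = [row[1:] for row in L[1:]]
    return int(determinant(reduced))

def count_after_removal(nodes, edges, removed):
    """Spanning tree count once the removed nodes and their edges are gone."""
    removed = set(removed)
    kept_nodes = [v for v in nodes if v not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return spanning_tree_count(kept_nodes, kept_edges)

# Toy network (illustrative only): four fully connected nodes.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d")]
before = spanning_tree_count(nodes, edges)
after = count_after_removal(nodes, edges, ["d"])
```

For this toy network the count falls from 16 to 3 when one node is removed; a disconnected network yields a count of 0.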

Local topology impact functions measure how perturbations change the distribution of local topological features within the network. Clustering within and/or connectivity of a network is defined as the extent to which nodes of the network are connected to one another. Consequently, calculating the change in a measure of the clustering within and/or connectivity of the network as a result of the removal of the nodes associated with the identified proteins provides a basis for assessing the impact of the removal of the nodes on the connectivity of the network.

As a measure of the change in the clustering within and/or connectivity of the network as a result of the removal of the nodes associated with the identified proteins, the change in the average local clustering coefficient of the network as a result of the removal of the nodes associated with the identified proteins can be calculated.
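
The change in the average local clustering coefficient could be sketched as below. The local coefficient of a node is taken as the fraction of pairs of its neighbours that are themselves connected, with nodes of degree below two scored as zero; that convention, the adjacency-dictionary representation and the function names are assumptions for the purpose of the example.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are linked to each other."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def clustering_impact(adj, removed):
    """Decrease in average clustering caused by removing the given nodes."""
    removed = set(removed)
    pruned = {u: {v for v in nbrs if v not in removed}
              for u, nbrs in adj.items() if u not in removed}
    return average_clustering(adj) - average_clustering(pruned)

# Toy network (illustrative only): a triangle a-b-c with a pendant node d on c.
toy = {"a": {"b", "c"}, "b": {"a", "c"},
       "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_impact(toy, ["a"]))
```

In the toy network the average coefficient is 7/12 before the perturbation. Removing "a" breaks the only triangle, so the remaining chain has an average of 0 and the impact equals 7/12.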

Techniques from statistical mechanics can be applied to the analysis of networks to provide measures of system behaviour.

Network entropy has been proposed as a quantitative measure of robustness in the context of evolution and selection. Decreases in network entropy as a result of a perturbation can be used as an impact measure.

Multiple specific measures have been proposed to capture the concept that a network is complex and exists between the non-complex extremes of a fully connected system or a system that is not connected at all. Reduction in network complexity as a result of perturbation can be used as an impact measure.

A core goal of bioactivity signature identification is to find collections of molecules (for example proteins) whose simultaneous perturbation will significantly affect the network structure or the behaviour of dynamic processes operating on the network. Multiple, complementary, approaches to bioactivity signature identification are possible.

While in theory it is possible to conduct exhaustive analysis, this would be computationally expensive and/or time consuming using conventional techniques.

Using impact optimisation for the identification of target bioactivity signatures formulates the problem as finding the set of nodes (for example molecules), of a given size, whose perturbation in the network maximises a measure of impact as listed above. Exact solutions to these problems may not be available analytically and cannot conventionally be calculated in a feasible time due to combinatorial explosion. Approximate solutions can be obtained using techniques from metaheuristics or stochastic optimization such as genetic algorithms. Note that in addition to optimal solutions to impact maximisation, other, sub-optimal solutions that nonetheless lead to significant impact, are also useful.
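
To make the idea of approximate impact maximisation concrete, a very small genetic algorithm is sketched below. It is not the optimiser of the invention: the impact function (the fraction of nodes no longer in the largest connected component), the selection, crossover and mutation rules, the parameter values and the toy network are all assumptions chosen for brevity.

```python
import random
from itertools import combinations

def largest_component(adj, removed):
    """Size of the largest connected component after node removal."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def impact(adj, removed):
    """Hypothetical impact: fraction of nodes outside the largest component."""
    return 1.0 - largest_component(adj, removed) / len(adj)

def genetic_search(adj, k, generations=40, pop_size=20, seed=0):
    """Approximate search for a k-node set with high impact."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    population = [frozenset(rng.sample(nodes, k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: impact(adj, s), reverse=True)
        parents = population[: pop_size // 2]      # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = set(rng.sample(sorted(a | b), k))   # crossover
            if rng.random() < 0.3:                      # mutation: swap a node
                outside = [v for v in nodes if v not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(frozenset(child))
        population = parents + children
    return max(population, key=lambda s: impact(adj, s))

# Toy network (illustrative only): two triangles joined through node "d".
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e"}, "e": {"d", "f", "g"},
       "f": {"e", "g"}, "g": {"e", "f"}}
best = genetic_search(toy, k=2)
exhaustive = max(impact(toy, c) for c in combinations(sorted(toy), 2))
```

On a network this small the exhaustive optimum is cheap to compute and serves as a check on the approximate result. On realistic networks only the approximate search is feasible, and the lower-ranked members of the final population remain available as sub-optimal but still significant signatures.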

The minimal set of nodes whose removal destroys the network giant component can form an effective target bioactivity signature. Optimal percolation and network dismantling are two computationally tractable approaches to finding such node sets. Control sets are defined as sets of driver nodes underlying network controllability, the ability to guide a network’s dynamic behaviour. Such a set can be considered as an effective target bioactivity signature. A dominating set is a collection of nodes such that every node in the network is either in the dominating set or a neighbour of a node in the dominating set. A minimum dominating set (MDS) is a dominating set of smallest size for a specific network. The MDS can be used to identify sets of driver nodes underlying network controllability and can be viewed as an approximation to a minimal control set.
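
Because finding a true minimum dominating set is computationally hard in general, a greedy heuristic is often used as a stand-in. The following sketch shows such a heuristic; it returns a dominating set that is small but not guaranteed to be minimum, and it is offered as an assumption-laden illustration, not as the procedure of the invention. Node labels are assumed to be sortable so that ties are broken deterministically.

```python
def greedy_dominating_set(adj):
    """Greedy approximation: repeatedly pick the node that newly
    dominates the most nodes (itself plus its neighbours)."""
    undominated = set(adj)
    chosen = set()
    while undominated:
        best = max(sorted(adj),
                   key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen

def is_dominating(adj, candidate):
    """True if every node is in `candidate` or adjacent to a member of it."""
    candidate = set(candidate)
    return all(v in candidate or adj[v] & candidate for v in adj)

# Toy networks (illustrative only): a hub with three leaves, and a chain.
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
star_set = greedy_dominating_set(star)
chain_set = greedy_dominating_set(chain)
```

For the star the heuristic selects the hub alone, and for the five-node chain it selects "b" and "d", which in both toy cases happens to match the minimum size.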

In a further aspect of the invention, a method for determining pharmacologically effective therapeutic agents for a disease may be provided. Using any of the abovementioned computer-implemented methods where potential therapeutic agents are identified and ranked based on their functional effects on a network of biological data, the skilled person could use said ranked list as a starting point for a conventional drug discovery screening process. The abovementioned computer implemented methods are in silico screening methods and reduce the resource burden on the drug discovery process by providing a superior starting point or plurality of starting points from which a drug discovery process can begin.

In an embodiment of the present invention, the computer-implemented method may identify any number from 1 to 1,000,000 potential therapeutic agents, in some embodiments 1 to 100,000, in some embodiments 1 to 10,000, in some embodiments 1 to 5,000, in some embodiments 1 to 3,000, in some embodiments 1 to 1,000 or less. Of the potential therapeutic agents identified, any number of the potential therapeutic compounds may be selected depending on any characteristic or combination of characteristics of said compounds which are known in the art. For example, such characteristics may be based on known ADME characteristics of the starting compound, or the number of Lipinski’s rules the compound breaks. Once selected, the selected potential therapeutic agents would undergo in vitro or in vivo screening to further assess their affinity and/or potency.

Affinity screening is well known in the drug discovery industry, and in the context of the present method would involve any known process for assessing the binding characteristics of a potential therapeutic compound to a specific target biomolecule. This binding characteristic is commonly referred to as the dissociation constant (KD). For example, affinity screening could be radioligand binding, wherein a radio-labelled compound known to bind to a target biomolecule is displaced by any of the potential therapeutic agents which are non-radiolabelled, and using standard techniques the KD of the non-radiolabelled compounds can be calculated. It should be understood that any affinity screening technique could be used in the context of the present computer-implemented method and the selection of the appropriate technique would be made on a target-dependent basis. These affinity screening techniques are typically performed in vitro, but they can be performed in vivo.

Potency screening, sometimes called ‘functional screening’, is well known in the drug discovery industry, and in the context of the present method would involve any known process for assessing the functional characteristics of a potential therapeutic compound to a specific target biomolecule. Therapeutics can have a number of actions at a target biomolecule. They can inhibit the function of the target biomolecule (antagonists), they can activate the biomolecule (agonists), they can enhance the action of an agonist acting at the target biomolecule (positive allosteric modulators), they can lessen the action of an agonist acting at the target biomolecule (negative allosteric modulators), or they can possess any combination of these effects depending on certain factors such as the target or the agonist in question. For example, the potency screening could be a phenotypic screen involving assessing the potency of compounds to alter the phenotype of a cell or organism in a desired manner. An example is given in WO2018/078360. In said example, expression of alkaline phosphatase is increased by activating the Hedgehog pathway by incubating cells in vitro with an agonist, and this increase can be measured using fluorescence-based techniques. For the phenotypic screening, the potential therapeutic compound is incubated with the cells at the same time as the agonist, and the ability of the potential therapeutic compound to inhibit the expression of alkaline phosphatase can be measured. In said example, the potency of a range of compounds is expressed as the IC50, which is the concentration at which the biological process (in this case alkaline phosphatase activity) is inhibited to 50% of its maximum.

When deciding which technique to use to study the potency of one or more unknown compounds, the assay is selected largely based on the target, or disease, under study. Many target- or disease-appropriate assays are known in the art, and routinely used or adapted for use with new targets. Similarly, many phenotypic screens occur in vivo. The assay could measure IC50 as in the abovementioned example. Alternatively, it could measure any other aspect of a potential therapeutic compound’s potency, which is dependent on the nature of the potential therapeutic compound. Other measurable aspects of potency could be the EC50 (for agonists), the effective dose (ED), the median effective dose (ED50), the lethal dose (LD) or median lethal dose (LD50), or any other well-known measures of potency of a compound.

A subset of potential therapeutic compounds may be selected by virtue of having favourable affinity characteristics. For example, a KD less than 100 µM, 10 µM, 1 µM, 100 nM, 10 nM and/or 1 nM. A subset of potential therapeutic compounds may also be selected by virtue of having favourable potency characteristics. For example, an IC50 or EC50 less than 100 µM, 10 µM, 1 µM, 100 nM, 10 nM and/or 1 nM. A subset of therapeutic compounds may be selected by virtue of having both favourable affinity and favourable potency characteristics according to any combination of the values outlined above.

Definitions

It is appreciated that the following terms used herein may be understood as follows:

Target molecules: Proteins, DNA, RNA, amino acids, hormones and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents.

Target bioactivity signature: collections of molecules whose perturbation will affect the structural integrity of the network or behaviours of dynamic processes operating on the network.

Effective target bioactivity signature: collections of molecules whose perturbation will significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network, optionally comprising a user defined cut off. Effective target bioactivity signatures may include both optimal and substantially optimal signatures which are different from random.

Potential therapeutic agent: compounds, small molecules or any other agent capable of having therapeutic effect, including antibodies, proteins, nucleic acids or viruses.

Potential therapeutic agent sub-set: potential therapeutic agents, which possess activity footprints which significantly affect the structural integrity of the network or behaviours of dynamic processes operating on the network.

Potential therapeutic agent bioactivity signature: interactions, both direct and indirect, empirical and predicted, including downstream effects.

Biological data: any of Proteins, DNA, RNA, amino acids, hormones, and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents and their interactions, structures, and/or dynamics.

Interaction network: network data consisting of nodes and edges (vertices) between nodes.

Nodes: Proteins, DNA, RNA, amino acids, hormones and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents.

Edges: interactions between nodes, comprising any cellular interaction.

Optimal network impact: the largest change in the network impact being measured, usually identified through exhaustive techniques.

Substantially optimal network impact (close to optimal target bioactivity signature): derived either through exhaustive techniques, taking a non-maximal result, or from impact optimisation techniques.

Global-optimal network impact score: the largest change in the network impact being measured, across the entire search space.

Global substantially optimal impact score: a sub-optimal network impact score, in the same area of the search space as the global optimal network impact score.

Local optimal network impact score: the largest change in the network impact being measured, lower than the global optimal and in an area of the search space that is different to the global optimal.

Local substantially optimal impact score: a sub-optimal network impact score, in the same area of the search space as the local optimal network impact score.

Node removal / removal of nodes / perturbation: may include complete removal of nodes from network or probabilistic removal or partial removal. In some examples, perturbation may include introducing one or more additional node(s), and/or modifying the relative importance of one or more node(s).
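
As a hedged illustration of probabilistic removal, in which a node is removed in some draws but not in all, each node in a footprint could be removed with a probability that rises with potency. The mapping from IC50 to probability used below is purely hypothetical, as are the function names, the midpoint parameter and the example footprint.

```python
import random

def removal_probability(ic50_um, midpoint_um=1.0):
    """Hypothetical mapping: a lower IC50 (more potent) gives a higher
    chance of removal; an IC50 equal to the midpoint gives 0.5."""
    return midpoint_um / (midpoint_um + ic50_um)

def sample_removed_nodes(footprint, rng):
    """One random draw of the nodes to remove for a given footprint.

    `footprint` maps each target node to an IC50 in micromolar."""
    return {node for node, ic50 in sorted(footprint.items())
            if rng.random() < removal_probability(ic50)}

# Illustrative footprint: target node -> IC50 (micromolar, made-up values).
footprint = {"P1": 1.0 / 9.0, "P2": 1.0, "P3": 50.0}
rng = random.Random(0)
draws = [sample_removed_nodes(footprint, rng) for _ in range(1000)]
```

With this mapping, a node hit at an IC50 of about 0.11 micromolar is removed in roughly nine draws out of ten, whereas a node hit at 50 micromolar is removed only rarely. An impact score could then be averaged over the draws.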

The probabilistic removal of a node means removing the node in certain circumstances but not in all circumstances. As an illustrative example, probabilistic removal of a node may be linked to potency/IC50; for example, removal of nodes in the footprints of relatively more potent potential therapeutic agents more often (for example, 9/10 times for more potent nodes).

Example 1

Some embodiments identify a subset of potential therapeutic agents from a plurality of potential therapeutic agents. In one embodiment these therapeutic agents are ranked. We now demonstrate by way of example that the approach of the embodiment is superior to an alternative ranking that, whilst still based on the same database of compound bioactivities, is not aware of a biological network. There is shown in Figure 4 an illustration of an exemplary platform, demonstrating that a network-aware ranking of compounds, with a network derived from the functional effect of a given molecule, is superior to a ranking method unaware of the biological network.

The connectivity map (cMAP) used is a publicly available resource detailing the results of 164 bioactive small-molecules and corresponding vehicle controls applied to between one and four human cancer cell-lines over a short duration. Each assay captures the functional effect of a compound by means of measurement of gene expression signatures using microarray technology. Conventional differential expression analysis provided input data for a number of literature derived network sampling/inference techniques S3-5 to build biological networks used by the e-Therapeutics platform. A database of 2.3 million compounds integrates compound bioactivity measurements from a number of public and commercial sources, augmented using Naive Bayes machine learning models. Figure 4 demonstrates the procedure as disclosed herein as applied to the 76 cMAP drug-like compounds included in this bioactivity database.

Different ranking methods are contrasted by their cMAP compound recovery rates for several ranges of high ranks; the top one, five, ten, fifteen and twenty thousand compounds. The box-whisker diagrams depict the median (mid-bar), interquartile range (box) and extremal values (whiskers) of a range of network sampling and impact methods of the invention. Where multiple cell-lines and/or concentrations were assayed for a given cMAP compound, multiple different biological networks and database rankings were produced and the best overall rank taken for the purposes of computing recovery. The triangle points and diamond points detail the recovery rates expected when compounds are ranked (network-unaware random ranking of compounds, with repeated draws made where multiple concentrations, cell-lines or internal database mappings were considered in the network-aware ranking) by amount of bioactivity annotation and binding annotation respectively. The circle points demonstrate the recovery rate achieved when ranking compounds by the amount of bioactivity data present in the database randomly, without reference to a biological network. The procedure described herein is an improvement on both network-unaware rankings of compounds.

Example 2

This embodiment reduces the number of compounds which require screening in vitro and thus increases efficiency, reduces cost and saves time, compared to other approaches. We now demonstrate by way of example that, in projects using the process of the embodiment, the hit rate is considerably greater compared to that of the literature. Using the process of the embodiment, the hit rate in phenotypic screens ranges between 2.2 and 11% (as shown in Table 1). In this scenario a “hit” is defined as having an IC50 < 10 µM in multiple cell-based assays, no cytotoxicity, structural QC and freedom to operate in the indication(s) of interest. These projects are in a diverse range of biological areas, namely areas of high unmet need and areas that are historically hard to target conventionally.

This is a large increase in hit rate and thus efficiency compared to the literature which, as described in Table 2, is often less than 1% and has hit definitions which are often less stringent. Further, the number of compounds screened is considerably larger for literature described discovery approaches, thus making the process of the invention approximately two orders of magnitude more productive and/or efficient than other drug discovery techniques.

Table 1: Hit rates (using the process described herein)

Table 2: Literature defined hit rates based on non-NDD techniques

Example 3

An example method for determining pharmacologically effective therapeutic agents for a disease may be provided, using an example disclosed computer-implemented method to identify potential therapeutic agents, ranked based on their functional effects on a network of biological data. The network may be related to a disease being studied and may be selected by identifying cellular mechanisms of that disease, the cellular mechanisms being embedded in the network. Some example methods may include pharmacologically screening each of the ranked therapeutic agents in vitro and/or in vivo. Pharmacological screening may comprise affinity screening, potency screening, or both; and potency screening may comprise phenotypic screening.

In one example, a database containing approximately 3 million compounds, together with their respective footprints, was provided. The disclosed computer-implemented method was used to process these data, identifying 1,146 compounds for subsequent in vitro screening, thus substantially reducing the number of compounds that needed to be further screened in the laboratory and substantially speeding up the process of drug discovery. About 5% of the compounds identified by the computer-implemented method were “hit” compounds, defined as having an IC50 < 10 µM. An example of compounds screened is given in WO2018/078360. This is considerably greater than the number identified using a standard approach known in the art.

The compounds may be tested phenotypically, rather than merely by means of direct binding assays. In some examples, a reduction in alkaline phosphatase expression was used as a marker of reduced differentiation arising from compounds acting on the pathway.

Any system feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure.

Any feature in one aspect/embodiment may be applied to other aspects/embodiment, in any appropriate combination. In particular, method aspects may be applied to system aspects, and vice versa. Furthermore, any, some and/or all features in one aspect/embodiment can be applied to any, some and/or all features in any other aspect/embodiment, in any appropriate combination. It should also be appreciated that particular combinations of the various features described and defined in any aspects/embodiments can be implemented and/or supplied and/or used independently.

Claims

CLAIMS:

1. A computer-implemented method of identifying a subset of potential therapeutic agents from a plurality of potential therapeutic agents comprising the steps of:

receiving a network of biological data;

receiving footprint data in relation to the plurality of therapeutic agents;

determining from the footprint data of each or a combination of the plurality of potential therapeutic agents a functional effect on the network of the each or a combination of potential therapeutic agents; and

outputting potential therapeutic agents ranked based on the determined functional effects on the network.

2. The method of claim 1, including identifying cellular mechanisms embedded in the network, to select a network related to a disease being studied.

3. The method of claim 2, wherein identifying the cellular mechanisms embedded in the network comprises:

selecting a plurality of sets of nodes, each set of nodes being involved in a functional process; and for each set of nodes,

perturbing the network;

determining a network impact score for each perturbation, and assigning a rank to each set of nodes, each rank indicative of the network impact score of the respective perturbation.

4. The method of any of the preceding claims, wherein the step of determining the functional effect comprises identifying one or more structural or dynamic effects of each potential therapeutic agent on the network.

5. The method of any preceding claim, including using perturbation analysis and/or potential therapeutic agent impact measurement to determine the subset of potential therapeutic agents.

6. The method of claim 5, wherein said perturbation analysis comprises the steps of:

determining a plurality of effective target bioactivity signatures; and

determining from the footprint data of each or a combination of the plurality of potential therapeutic agents a similarity of the footprint data of any or any combination of potential therapeutic agents in relation to each of the effective target bioactivity signatures.

7. The method of claim 6, wherein the step of determining the effective target bioactivity signatures comprises one or more of: network impact maximisation measurement and/or analysis; optimal percolation analysis; and/or minimal control set analysis.

8. The method of claim 6 or 7, wherein the step of determining the effective target bioactivity signatures is completed exhaustively and/or via an approximation optimisation technique.

9. The method of any one of claims 6 to 8, wherein determining the effective target bioactivity signatures comprises exhaustive network impact maximisation measurement and/or analysis.

10. The method of claim 9, further comprising the step of:

generating an impact value for all possible sets of nodes of a given size in the network of biological data.

11. The method of any one of claims 6 to 10, further comprising the step of:

identifying one or more sets of nodes whose perturbation in the network generates a global substantially optimal network impact value.

12. The method of claim 11, wherein the one or more sets of nodes whose perturbation in the network generates a global substantially optimal network impact value are in a substantially similar region of search space as the global optimum.

13. The method of any one of claims 6 to 8, further comprising the step of:

identifying one or more sets of nodes whose perturbation in the network generates a local substantially optimal network impact value.

14. The method of claim 13, wherein the one or more sets of nodes whose perturbation in the network generates a local substantially optimal network impact value are in a different area of search space from the substantially global optimum.

15. The method of any one of claims 6 to 8, wherein the step of determining the effective target bioactivity signatures comprises maximisation of impact by approximation.

16. The method of claim 15, wherein the step of maximisation of impact by approximation comprises any or any combination of: stochastic optimisation methods; and/or metaheuristic optimisation methods.

17. The method of claim 16, wherein said stochastic optimisation methods and/or metaheuristic optimisation methods comprise one or more genetic algorithms.

18. The method of claim 6, wherein the step of determining the effective target bioactivity signatures comprises optimal percolation analysis.

19. The method of claim 6, wherein one or more effective target bioactivity signatures are output as an ordered set, whereby those with high impact scores are ranked more highly.

20. The method of claim 18, wherein optimal percolation analysis network dismantling comprises identification of a plurality of sets of nodes of minimal size, whereby removal of those nodes is operable to destroy a network giant component.

21. The method of claim 20, wherein one or more of the plurality of sets of nodes of minimal size comprises different nodes but is of the same minimal size.

22. The method of any one of claims 18 to 21, wherein optimal percolation analysis is completed exhaustively and/or via an approximation approach.

23. The method of any one of claims 18 to 22, wherein optimal percolation analysis is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

24. The method of claim 6, wherein the step of determining the effective target bioactivity signatures comprises minimum control set analysis, to generate a minimum control set.

25. The method of claim 24, wherein said minimum control set analysis comprises identification of one or more dominating sets of smallest size for any given network.

26. The method of claim 25, wherein the or each of the one or more dominating sets of smallest size comprise different nodes.

27. The method of any one of claims 24 to 26, wherein an approximation to the minimum control set is identified via any optimisation technique.

28. The method of any one of claims 25 to 27, wherein a minimum dominating set, being a dominating set of the smallest size, is identified as an approximation to a minimal control set.

29. The method of any one of claims 24 to 28, wherein minimum control set analysis is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

30. The method of claim 6, wherein a target bioactivity signature is considered effective when an associated impact score is greater than a predetermined threshold.

31. The method of any of claims 6 to 30, wherein the determining each of the effective target bioactivity signatures comprises searching for overlap between the footprint data of any or any combination of potential therapeutic agents and each of the effective target bioactivity signatures.

32. The method of any of claims 6 to 31, wherein the step of determining a similarity of the footprint data comprises searching for overlap between the footprint data of any or any combination of potential therapeutic agents and each of the effective target bioactivity signatures.

33. The method of claim 5, wherein therapeutic agent impact measurement comprises measurement of impact of each of the potential therapeutic agents on the network.

34. The method of claim 33, wherein nodes in the network to perturb to measure impact of each potential therapeutic agent are chosen based on the footprint of each potential therapeutic agent.

35. The method of any of claims 6 to 34, wherein the effective target bioactivity signature is a collection of molecules whose simultaneous perturbation will significantly affect the network structure or behaviour of dynamic process operating on the network.

36. The method of any preceding claim, comprising the identification of one or more sets of nodes, wherein the one or more sets of nodes comprise one or more of: proteins; DNA; RNA; hormones; amino acids; and/or any naturally occurring cellular, sub-cellular or extra-cellular constituents.

37. The method of any of claims 6 to 32, or claims 35 or 36, wherein the effective target bioactivity signatures are identified independently of the plurality of therapeutic agents.

38. The method of any preceding claim, comprising one or more nodes wherein the nodes are connected by edges, and further wherein edges represent one or more interactions between nodes and/or cellular interactions.

39. The method of any one of claims 7 to 17, or claim 31 or 34, including a step of calculating an impact value for a perturbation signature indicative of the impact on the network represented by the network data, of the removal of the nodes associated with any given perturbation signature, wherein calculating the impact value comprises one or more of:

calculating an impact value, or an approximate impact value, that is a measure of the fragmentation of the network as a result of the removal of the nodes associated with the perturbation signature;

calculating an impact value, or an approximate impact value, that is a measure of the change in a distance metric of the network as a result of the removal of the nodes associated with the perturbation signature;

calculating an impact value, or an approximate impact value, that is a measure of the change in ability to communicate along all paths of the network as a result of the removal of the nodes associated with the perturbation signature;

calculating an impact value, or an approximate impact value, that is a measure of the change in topology of the network as a result of the removal of the nodes associated with the perturbation signature; and/or

calculating an impact value, or an approximate impact value, that is a measure of the change in statistical mechanics of the network as a result of the removal of the nodes associated with the perturbation signature.

40. A computer-implemented method of identifying potential therapeutic agents comprising:

receiving a network of biological data; determining one or more target bioactivity signatures by perturbing one or more of the nodes;

determining a network impact score for each target bioactivity signature;

identifying one or more effective target bioactivity signatures having a network impact score greater than a predetermined threshold; and

outputting said one or more effective target bioactivity signatures.

41. The method of claim 40, including identifying cellular mechanisms embedded in the network, to select a network related to a disease being studied.

42. The method of claim 41, wherein identifying the cellular mechanisms embedded in the network comprises:

perturbing the network;

43. The method of any of claims 40 to 42, further including identifying one or more potential therapeutic agents having a footprint overlapping with the one or more effective target bioactivity signatures.
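
The footprint overlap of claim 43 might be sketched as follows. This is an assumption-laden illustration: a footprint and a signature are each treated as a plain set of node identifiers, and overlap is scored as the fraction of signature nodes covered by the footprint. The names `footprint_overlap`, `rank_agents` and the agent and protein labels are hypothetical.

```python
def footprint_overlap(footprint, signature):
    """Fraction of signature nodes that also appear in the agent's footprint.

    Assumption: this coverage ratio is an illustrative overlap score only.
    """
    signature = set(signature)
    if not signature:
        return 0.0
    return len(signature & set(footprint)) / len(signature)


def rank_agents(footprints, signature):
    """Return the names of agents with a non-zero overlap, best first."""
    scored = [
        (footprint_overlap(footprint, signature), name)
        for name, footprint in footprints.items()
    ]
    overlapping = [(score, name) for score, name in scored if score > 0]
    # Sort by descending score, then by name so that ties are deterministic.
    overlapping.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in overlapping]


# Hypothetical signature and footprints, for illustration only.
signature = {"P1", "P2", "P3"}
footprints = {
    "agent_x": {"P1", "P2", "P9"},
    "agent_y": {"P3"},
    "agent_z": {"P7", "P8"},
}
ranked = rank_agents(footprints, signature)
```

Agents whose footprint shares no node with the signature are dropped rather than ranked last, which is a design choice of this sketch and not something the claim requires.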

44. The method of any of claims 40 to 43, wherein identifying the one or more effective target bioactivity signatures comprises one or more of: network impact maximisation measurement and/or analysis; optimal percolation analysis; and/or minimal control set analysis.

45. The method of any of claims 40 to 44, further comprising the step of:

generating an impact score for all possible sets of nodes of a given size in the network of biological data.
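
The exhaustive scoring of claim 45, combined with the threshold test of claim 40, might look like the sketch below. It assumes a non-empty undirected network stored as a dictionary of neighbour sets and reuses a largest-component fragmentation measure as the impact score. The function names, the toy path network and the threshold value 0.7 are all hypothetical. The number of candidate sets grows combinatorially, so such enumeration is only practical for small networks or small set sizes.

```python
from itertools import combinations


def largest_component(adjacency, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def score_all_sets(adjacency, set_size):
    """Impact score for every node set of the given size.

    Assumption: the network is non-empty and the score is the fractional
    loss of the largest component, chosen for illustration only.
    """
    intact = largest_component(adjacency, ())
    scores = {}
    for nodes in combinations(sorted(adjacency), set_size):
        scores[nodes] = 1.0 - largest_component(adjacency, nodes) / intact
    return scores


# Hypothetical toy network: a path 1-2-3-4-5.
path_network = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
scores = score_all_sets(path_network, 2)

threshold = 0.7  # hypothetical predetermined threshold
effective = sorted(nodes for nodes, score in scores.items() if score > threshold)
```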

46. The method of any of claims 40 to 45, further comprising the step of:

identifying one or more sets of nodes whose perturbation in the network generates a global substantially optimal network impact score;

optionally, the one or more sets of nodes are in a substantially similar region of search space as the global optimum.

47. The method of any of claims 40 to 46, further comprising the step of:

identifying one or more sets of nodes whose perturbation in the network generates a local substantially optimal network impact score;

optionally, the one or more sets of nodes are in a different area of search space from the substantially global optimum.

48. The method of claim 40, wherein the step of identifying the one or more effective target bioactivity signatures comprises maximisation of impact by approximation;

optionally, the step of maximisation of impact by approximation comprises any or any combination of: stochastic optimisation methods; and/or metaheuristic optimisation methods.
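
One simple stochastic optimisation method of the kind mentioned in claim 48 is stochastic hill climbing, sketched below. This is an illustrative stand-in and not necessarily the method used in practice. It assumes a non-empty undirected network stored as a dictionary of neighbour sets, a set size smaller than the number of nodes, and a largest-component fragmentation measure as the impact score. The names `impact`, `stochastic_search` and the parameters `iterations` and `seed` are hypothetical.

```python
import random


def largest_component(adjacency, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def impact(adjacency, nodes):
    """Fractional loss of the largest component (illustrative measure)."""
    intact = largest_component(adjacency, ())
    return 1.0 - largest_component(adjacency, nodes) / intact


def stochastic_search(adjacency, set_size, iterations=200, seed=0):
    """Hill climbing over node sets: swap one node at a time at random and
    keep the swap whenever the impact score does not decrease."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    current = set(rng.sample(nodes, set_size))
    current_score = impact(adjacency, current)
    for _ in range(iterations):
        outgoing = rng.choice(sorted(current))
        incoming = rng.choice([n for n in nodes if n not in current])
        candidate = (current - {outgoing}) | {incoming}
        candidate_score = impact(adjacency, candidate)
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score
    return current, current_score


# Hypothetical toy network: a path 1-2-3-4-5.
path_network = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
best_set, best_score = stochastic_search(path_network, set_size=2)
```

Accepting equal-scoring swaps lets the search drift across plateaus. On larger networks a search of this kind can still settle on a local optimum, which is consistent with the distinction drawn between global and local substantially optimal solutions.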

49. The method of claim 40, wherein the step of identifying the one or more effective target bioactivity signatures comprises optimal percolation analysis;

optionally, optimal percolation analysis comprises identification of a plurality of sets of nodes of minimal size, whereby removal of those nodes is operable to destroy a network giant component.
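
Finding a truly minimal set of nodes whose removal destroys the giant component is computationally hard in general, so heuristics are commonly used. The sketch below uses a greedy, adaptive highest-degree heuristic as a stand-in. It yields an upper bound on the minimal set size rather than the optimal percolation solution itself. The network representation (dictionary of neighbour sets), the name `greedy_dismantle` and the `giant_fraction` stopping criterion are assumptions made for illustration.

```python
def largest_component(adjacency, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def greedy_dismantle(adjacency, giant_fraction=0.5):
    """Remove the node of highest residual degree, repeatedly, until the
    largest component holds at most `giant_fraction` of the original nodes.

    Assumption: "destroying the giant component" is modelled by this
    hypothetical size threshold.
    """
    removed = []
    target = giant_fraction * len(adjacency)
    while largest_component(adjacency, removed) > target:
        remaining = sorted(n for n in adjacency if n not in removed)

        def residual_degree(node):
            return sum(1 for nb in adjacency[node] if nb not in removed)

        # max() returns the first maximum, so ties go to the smallest label.
        removed.append(max(remaining, key=residual_degree))
    return removed


# Hypothetical toy network: two triangles joined through a bridging node "H".
toy_network = {
    "A": {"B", "C", "H"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "H": {"A", "D"},
    "D": {"H", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}
removed_nodes = greedy_dismantle(toy_network)
```

On this toy network the heuristic removes two nodes, whereas removing the bridging node "H" alone would also satisfy the stopping criterion. This shows why the greedy result should be read as an approximation only.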

50. The method of claim 40, wherein the step of identifying the one or more effective target bioactivity signatures comprises minimum control set analysis.

51. The method of claim 50, wherein the minimum control set analysis comprises identification of one or more dominating sets of smallest size for any given network; optionally, the or each of the one or more dominating sets of smallest size comprise different nodes;

further optionally, a dominating set of the smallest size is identified as an approximation to a minimal control set.
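
A dominating set is a set of nodes such that every node of the network is either in the set or adjacent to a member of it. Finding a smallest dominating set is computationally hard in general, so the sketch below uses the standard greedy approximation, which returns a valid but not necessarily smallest dominating set. The network representation (dictionary of neighbour sets) and the names `greedy_dominating_set` and `is_dominating` are assumptions made for illustration.

```python
def greedy_dominating_set(adjacency):
    """Greedy approximation: repeatedly pick the node that dominates the
    largest number of nodes not yet dominated."""
    undominated = set(adjacency)
    chosen = []
    while undominated:

        def gain(node):
            return len(({node} | set(adjacency[node])) & undominated)

        # max() returns the first maximum, so ties go to the smallest label.
        pick = max(sorted(adjacency), key=gain)
        chosen.append(pick)
        undominated -= {pick} | set(adjacency[pick])
    return chosen


def is_dominating(adjacency, nodes):
    """True if every node is in `nodes` or adjacent to a member of it."""
    covered = set(nodes)
    for node in nodes:
        covered |= set(adjacency[node])
    return covered >= set(adjacency)


# Hypothetical toy network: a path 1-2-3-4-5.
path_network = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
dominating = greedy_dominating_set(path_network)
```

Different tie-breaking rules can produce dominating sets of the same size that comprise different nodes, in line with the optional feature of claim 51.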

52. The method of any one of claims 48 or 49, wherein minimum control set analysis is operable to identify any one of: the global-optimal solution; a local optimal solution; the global substantially optimal solution; or a local substantially optimal solution.

53. The method of any of claims 40 to 52, wherein the effective target bioactivity signature is a collection of molecules whose simultaneous perturbation will significantly affect the network structure or the behaviour of a dynamic process operating on the network.

54. A computer-implemented method of identifying the cellular mechanisms embedded in a network comprising:

receiving a network of biological data;

perturbing the network;

55. A method for determining pharmacologically effective therapeutic agents for a disease, including:

carrying out a computer-implemented method of any of the preceding claims, to identify ranked potential therapeutic agents; including selecting the network to be related to a disease being studied.

56. A method as claimed in claim 55, wherein selecting the network includes identifying cellular mechanisms of the disease being studied, embedded in the network.

57. A method as claimed in any of claims 55 or 56, including pharmacologically screening each therapeutic agent in vitro and/or in vivo.

58. A method as claimed in claim 57, wherein pharmacologically screening comprises affinity screening, potency screening, or both.

59. A method as claimed in claim 58 wherein potency screening comprises phenotypic screening.