WO2014086949A2 - System and method for the personalisation and optimization of anti-cancer treatments - Google Patents

System and method for the personalisation and optimization of anti-cancer treatments

Info

Publication number
WO2014086949A2
Authority
WO
WIPO (PCT)
Prior art keywords
pca space
responsiveness
cancer treatment
pca
sample
Prior art date
Application number
PCT/EP2013/075725
Other languages
French (fr)
Other versions
WO2014086949A3 (en)
Inventor
Markus Rehm
Maximilian Wuerstle
Egle PASSANTE
Original Assignee
Royal College Of Surgeons In Ireland
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Royal College Of Surgeons In Ireland
Publication of WO2014086949A2
Publication of WO2014086949A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The present invention provides a computer implemented systems biological framework and method for the field of cancer research and cancer medicine that (i) predicts, on a case-by-case basis, the responsiveness to anti-cancer treatments, (ii) selects optimal alternative treatments for cases where poor responsiveness is predicted, and (iii) suggests promising targeted co-treatment options for cases of non-responsiveness to treatment.

Description

Title
System and Method for the Personalisation and Optimization of Anti-Cancer Treatments
Field of the Invention
The invention relates to a computer-implemented system and method for personalising and optimising a patient's response to a treatment. In particular, the invention relates to a method of knowledge- and data-driven systems simulations for the personalisation and optimisation of anti-cancer treatments.
Background to the Invention
Decades of basic and clinical research have been largely unsuccessful in identifying reliable biomarkers and tools that can be used to prognosticate or predict the responsiveness of cancer to specific treatments, be it in isolated cancer cell lines or in patients. Biomarkers are usually gene mutations, mRNA expression levels and, more rarely, protein expression levels specifically known to be connected to a certain cancer type. Classical bioinformatics and statistical approaches readily identify possible biomarkers in large datasets. However, single or dual gene/gene product biomarkers mostly fail as reliable indicators that can inform treatment decisions. Of an estimated 150,000 biomarkers reported in the scientific literature, only about 100 are in practical use (Nature. 2011 Jan 13;469(7329):156-7). Biomarker failure in the field of cancer is often due to the immense heterogeneity in this group of diseases.
Reported and widely used alternative approaches to tackle the problems described above are based on classical statistical analyses that aim to identify single or multiple genes or gene products indicative of treatment responses. However, these classical approaches have so far failed to overcome the problems described above. Multi-variate statistical algorithms have been used previously to process large and/or quantitative biological data sets in the context of cancer treatment (Janes KA, Albeck JG, Gaudet S, Sorger PK, Lauffenburger DA, Yaffe MB. Science. 2005 Dec 9;310(5754):1646-53; Janes KA, Gaudet S, Albeck JG, Nielsen UB, Lauffenburger DA, Sorger PK. Cell. 2006 Mar 24;124(6):1225-39; Lee MJ, Ye AS, Gardino AK, Heijink AM, Sorger PK, MacBeath G, Yaffe MB. Cell. 2012 May 11;149(4):780-94). However, these studies employed PCA (principal component analysis) and PLSR (partial least squares regression) algorithms as statistical methods for exploratory data analysis, and for the development of data-driven models on a very limited number of cases to identify novel signalling pathways and components. These applications were therefore not capable of providing predictive capacity for new, testable cases. Lee MJ et al. 2012 use PCA/PLS to gain insights into how different treatment combinations change single protein concentrations over time and how these changes correlate to treatment. The authors investigate how protein concentration changes over time are influenced by different treatment combinations, and whether these changes correlate between proteins of different pathways. However, treatment efficacy is not predicted.
A report by Kevin A. Janes et al. (Nature Reviews Molecular Cell Biology vol. 7(11), pp 820-828 (2006)) describes a PCA-based exploratory approach used to rank the contribution of variables (e.g. proteins) to the overall covariance of a raw data set. As described in Janes et al. 2006, PCA cannot be applied to predict treatment outcome. PLSR, a regression technique described in Janes et al. 2006, can be applied to analyse which protein changes are likely linked to treatment response. However, the response information of every sample is part of the input data set for the analysis. It therefore cannot be used to predict treatment responses of new samples. Indeed, Janes et al. 2006 does not describe any such use.
It is therefore an object of the present invention to provide a method and system which can be used to prognosticate or predict the responsiveness of cancer to specific treatments and overcome at least some of the above-mentioned problems.
Summary of the Invention
In general, the method of the invention involves processing protein expression data obtained from a number of biological samples. As a result of functional grouping of expression data of individual proteins (described below) and a subsequent principal component analysis (PCA), each sample occupies a specific coordinate in the PCA space. In addition, the treatment response that is associated with each of these samples is known. Through steps specified in the description, the coordinates of each sample with known anti-cancer treatment responsiveness are used by the invention to segment the PCA space into spatial regions that reflect different amounts of anti-cancer treatment responsiveness (see also Figure 2C). This "(pre-defined) group of samples with known anti-cancer treatment responsiveness" therefore represents data which serves as a knowledge base by which the method of the invention is initially parameterised. It allows associating every coordinate or position in the PCA space with a specific amount of anti-cancer treatment responsiveness.
A sample with unknown anti-cancer treatment responsiveness is next processed by the method of the invention and placed into the existing, segmented PCA space and its anti-cancer treatment responsiveness is predicted. The placement or positioning of a sample with unknown anti-cancer treatment responsiveness into the existing segmented PCA space is achieved by multiplying its functional group values with the associated coefficients for the functional groups as determined by the prior PCA. Once the sample has been placed into the segmented PCA space, the position yields the anti-cancer treatment responsiveness prediction or co-treatment prediction.
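The positioning step described above can be sketched as a dot product between the functional group values of the new sample and the coefficients of each retained principal component. The following is a minimal illustration, not the implementation of the invention: the function name `project_sample`, the mean-centring of the group values against the reference samples (a common PCA convention that the text does not state), and all numbers are assumptions.

```python
def project_sample(group_values, group_means, coefficients):
    """Return the PCA-space coordinates of one sample.

    group_values: functional group values of the new sample
    group_means:  per-group means of the reference samples (assumed centring step)
    coefficients: one list of per-group coefficients per retained principal component
    """
    centred = [value - mean for value, mean in zip(group_values, group_means)]
    # One coordinate per principal component: sum of coefficient * centred group value.
    return [sum(c * x for c, x in zip(component, centred)) for component in coefficients]


# Hypothetical coefficients for three functional groups and two principal components.
coefficients = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
group_means = [1.0, 2.0, 3.0]
position = project_sample([2.0, 3.0, 5.0], group_means, coefficients)
print(position)
```

Because the coefficients are fixed by the prior PCA on the samples with known responsiveness, a new sample can be placed without recomputing the PCA space.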
According to the present invention there is provided, as set out in the appended claims, a computer implemented method for the personalisation and/or optimization of anti-cancer treatments for a patient comprising the steps of:
analysing quantitative expression levels of multiple proteins obtained from a biological sample obtained from a patient;
grouping the multiple proteins into a plurality of functional groups based on biological pathway knowledge;
inputting the functional group data into a multivariate statistical processing module and performing a principal component analysis (PCA) in which a multi-dimensional PCA space is obtained;
positioning the sample into the PCA space according to its functional group values derived from quantitative protein expression data and the associated coefficients generated from the principal component analysis;
defining groups of samples with similar anti-cancer treatment responsiveness by using a clustering technique;
segmenting the PCA space into a plurality of regions, each region representing a different level of treatment responsiveness based on separate groups of samples with known anti-cancer treatment responsiveness; and
generating predictions of anti-cancer treatment responsiveness of a patient based on the position occupied by the patient sample in the PCA space that has been segmented into regions that represent different levels of anti-cancer treatment responsiveness, whereby the position of the patient sample is determined from the functional group values that are calculated from the quantitative protein expression data obtained from the patient sample and their multiplication with the associated coefficients of these functional groups as generated in the principal component analysis.
Single or combinatorial biomarkers so far are based on correlating the presence or absence of genes, gene products and their mutations with treatment responsiveness by classical statistics. A major problem with the prior art methods of identifying biomarkers is that the vast majority of markers subsequently fail. A key contributing factor to this failure is considered to be the lack of taking into account systems-level interactions and topologies of signalling pathways in which these markers exert their biological function. The present invention overcomes this problem by integrating quantitative measurements of multiple protein amounts into a systems-level context. This context reflects biological knowledge on multi-protein interactions, and decision processes coded by these interactions. The exploitation of these data in the form of systems biomarkers is achieved by coupled, innovative data processing algorithms. This constitutes a novel means to provide case-specific predictions on whether an anti-cancer treatment is likely to be effective.
In one embodiment, at least one functional group comprises pathway knowledge through inter-relating multiple protein expression levels according to their biological functions and interactions in biological signalling networks.
In one embodiment, the principal component analysis may be conducted on systems-knowledge enriched input data comprising a multivariate statistical algorithm adapted to reduce the dimensionality of the data set.
In the specification, the term "functional group" should be understood to mean a group composed of multiple proteins (for example, at least two or three proteins) which are interrelated by a rules-based logic (see Figure 1). This grouping and the related arithmetic operations introduce biological pathway knowledge as well as linear and non-linear biological signalling characteristics into the original protein data set.
In one embodiment, at least one functional group comprises quantitative data on multiple proteins interacting according to linear and non-linear signalling features found in biological signalling networks. Examples of such biological signalling networks in the context of cancer are apoptosis signal transduction; cellular proliferation signalling; cancer cell bioenergetics. The core pathways implicated in determining cancer treatment efficacy, such as de-regulated cell cycle progression and proliferation signalling as well as cancer metabolomic adjustment and apoptotic cell death pathways, are regulated by highly nonlinear signalling networks. Reference is made to reviews detailing such hallmark pathways which are deregulated in cancer (Cell, 2011, vol. 144(5), pp. 646-674; Hesketh R, Introduction to Cancer Biology, Cambridge University Press, 2013 (ISBN 9781107601482); Nat. Med. 2013, 19(11), pp. 1389-1400; Semin Cancer Biol., 2013, 23(5), pp. 352-360). Systems-level interactions of multiple proteins can be classified into functional groups that reflect cooperativity, antagonism and redundancies, as described, for example, for the TRAIL signalling pathway in the Figures below.
In one embodiment, the principal component analysis conducted on systems-knowledge enriched input data comprises a multivariate statistical algorithm adapted to reduce the dimensionality of the data set.
In the specification, the term "systems-defined number of dimensions" should be understood to mean that the method of the invention selects and starts from the first principal component, and, following an implemented algorithm that reflects the Kaiser criterion, selects the number of additional dimensions (principal components) required to represent 75% of the data variance found in the functional groups. The remaining principal components are then discarded.
In one embodiment the reduced set of principal components defines a multidimensional and segmented PCA space. Implementation of an algorithm that applies a Kaiser criterion defines the number of principal components to be maintained for further analysis.
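The selection of the number of dimensions can be illustrated with a short sketch of the 75% variance rule stated above. It assumes the eigenvalues (variances) of the principal components are already available; the function name `components_for_variance` and the example eigenvalues are hypothetical. Note that the Kaiser criterion in its usual textbook form retains components whose eigenvalue exceeds 1, so the cumulative-variance rule shown here follows the wording of the definition above rather than that textbook form.

```python
def components_for_variance(eigenvalues, target=0.75):
    """Return how many leading principal components are needed to
    represent at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # first principal component first
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ordered)


# Hypothetical eigenvalues of four principal components.
n_dimensions = components_for_variance([4.0, 2.0, 1.0, 1.0])
print(n_dimensions)  # 2: the first two components carry 6/8 = 75% of the variance
```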
In one embodiment, the method further comprises the step of applying a k-means clustering algorithm to define groups of samples with similar levels of resistant, low, medium or high anti-cancer treatment responsiveness for subsequent PCA space segmentation.
In the specification, the term "with similar levels of anti-cancer treatment responsiveness" should be understood to mean a grouped number of samples with similar levels of anti-cancer treatment responsiveness. This grouping reflects "resistant", "low", "medium", "high" anti-cancer treatment responsiveness (see Figure 2B) and is achieved by an implementation of an automated k-means clustering algorithm. These groups of samples then are used to segment the PCA space into regions that represent "resistant", "low", "medium", "high" levels of anti-cancer drug responsiveness. The segmentation of the PCA space according to the grouped samples is achieved by the implementation of a discriminant analysis algorithm or a support vector machine algorithm.
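The grouping into "resistant", "low", "medium" and "high" responsiveness can be sketched with a plain one-dimensional k-means. This is an illustrative sketch only: it assumes each sample carries a single numeric response value (for example, a percentage of cell death), uses a deterministic initialisation chosen for reproducibility, and the function name `kmeans_response_groups` is hypothetical.

```python
def kmeans_response_groups(values, labels=("resistant", "low", "medium", "high")):
    """Cluster 1-D response values with k-means (k = number of labels) and
    return one label per input value, ordered from lowest to highest centre."""
    k = len(labels)
    ordered = sorted(values)
    # Deterministic start: spread the initial centres evenly over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(value):
        return min(range(k), key=lambda i: abs(value - centres[i]))

    for _ in range(100):
        groups = [[] for _ in range(k)]
        for value in values:
            groups[nearest(value)].append(value)
        # Move each centre to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated

    # Rank the clusters by centre so that the lowest centre maps to the first label.
    rank = {cluster: pos for pos, cluster in enumerate(sorted(range(k), key=lambda i: centres[i]))}
    return [labels[rank[nearest(value)]] for value in values]


# Hypothetical response values for twelve samples.
responses = [2, 5, 3, 20, 25, 22, 50, 55, 52, 90, 85, 95]
groups = kmeans_response_groups(responses)
print(groups)
```

The resulting labels, together with the PCA-space coordinates of the same samples, would then serve as the training data for the segmentation step.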
In one embodiment, linear/quadratic discriminant analysis or support vector machine based algorithms are applied to the groups of samples (see previous point) such that the PCA space can be segmented to separate different levels of resistant, low, medium or high anti-cancer treatment responsiveness for different anti-cancer treatments.
In the specification, the term "segmented PCA space" should be understood to mean that the multi-dimensional PCA space has been segmented into response regions which have been defined by discriminant analysis or support vector machine based algorithms.
In one embodiment, the method comprises the step of applying a support vector machine calculation instead of a discriminant analysis, to deal with larger amounts of data for PCA space segmentation where the sample number > (PCA dimensions)². One of the advantages of the method of the present invention is that with larger amounts of data, the support vector machine calculations outperform the analysis currently described in the prior art, namely linear/quadratic discriminant analysis.
In the specification, the term "larger amounts of data" should be understood to mean a scenario in which the number of samples positioned in the PCA space exceeds (PCA dimensions)².
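The threshold for switching between the two segmentation algorithms can be written as a simple rule. The sketch below only encodes the comparison of the sample number with the square of the number of PCA dimensions; the function name `choose_segmentation_method` and the returned strings are hypothetical, and the segmentation algorithms themselves are not shown.

```python
def choose_segmentation_method(n_samples, n_pca_dimensions):
    """Pick the segmentation algorithm from the sample count and the
    number of retained PCA dimensions."""
    if n_samples > n_pca_dimensions ** 2:
        return "support vector machine"
    return "discriminant analysis"


print(choose_segmentation_method(10, 3))  # 10 > 9, so "support vector machine"
print(choose_segmentation_method(9, 3))   # 9 is not > 9, so "discriminant analysis"
```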
In one embodiment, the method further comprises the step of outputting a prediction on anti-cancer treatment responsiveness based on the positioning of a patient's sample into the segmented PCA space.
In one embodiment, the method further comprises the step of applying vectorial additions to relative contributions of at least one functional group to calculate the consequence of altering individual or combinations of protein amounts on the PCA space position of a sample; and outputting possibilities to re-position non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction.
In one embodiment, the method can identify the best treatment choice for the patient by comparing the predicted anti-cancer treatment responsiveness of at least two individual treatment choices by using a decision tree-based algorithm.
In one embodiment, the method comprises the step of applying vectorial additions to calculate the consequence of altering individual or combinations of protein amounts on the PCA space position of a sample. Applied to functional groups of non-responsive cases, this allows the identification of possibilities to re-position non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction.
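The vectorial addition can be sketched as follows: a change in the functional group values shifts the sample along each principal component by the coefficient-weighted sum of those changes, and the shift is added to the current position. This is a hedged illustration; how a change in protein amounts translates into a change in functional group values depends on the rules-based grouping and is taken as given here, the `target` point standing in for a region of higher responsiveness is an assumption, and the function names are hypothetical.

```python
def shifted_position(position, coefficients, group_changes):
    """Add the displacement caused by changes in functional group values
    to the current PCA-space position of a sample."""
    shift = [sum(c * d for c, d in zip(component, group_changes)) for component in coefficients]
    return [p + s for p, s in zip(position, shift)]


def moves_closer(position, new_position, target):
    """True if new_position is nearer (Euclidean distance) to target than position."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return distance(new_position, target) < distance(position, target)


# Hypothetical two-component PCA space with two functional groups.
coefficients = [[1.0, 0.0], [0.0, 2.0]]
current = [0.0, 0.0]                  # position of a non-responsive sample
high_response_point = [1.0, 1.0]      # assumed representative point of a responsive region
candidate = shifted_position(current, coefficients, [0.5, 0.25])
print(candidate, moves_closer(current, candidate, high_response_point))
```

Evaluating several candidate changes in this way would indicate which alterations, and hence which co-treatments, move a non-responsive case towards a more responsive region.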
In a further embodiment of the invention there is provided a computer implemented system for the personalisation and/or optimization of anti-cancer treatments for a patient comprising:
means for analysing quantitative expression levels of multiple proteins obtained from a biological sample obtained from a patient;
means for grouping the multiple protein quantities into functional groups based on biological pathway knowledge;
means for inputting the functional group data of multiple samples into a multivariate statistical processing module and performing a principal component analysis (PCA) in which a multi-dimensional PCA space is obtained;
means for positioning each sample into the PCA space according to its functional group values derived from quantitative protein expression data and the associated coefficients generated from the principal component analysis;
means for defining groups of samples with common/comparable drug responsiveness by implemented algorithms that perform k-means clustering;
means for segmenting the PCA space into regions that represent different levels of treatment responsiveness based on pattern recognition algorithms (discriminant analysis and/or support vector machine) applied to separate groups of samples with common anti-cancer treatment responsiveness; and
means for generating predictions of anti-cancer treatment responsiveness of a patient based on the position occupied by the patient sample in the PCA space that has been segmented into regions that represent different levels of anti-cancer treatment responsiveness, whereby the position of the patient sample is determined from the functional group values that are calculated from the quantitative protein expression data obtained from the patient sample and their multiplication with the associated coefficients of these functional groups as generated in the principal component analysis.
In one embodiment of the system, at least one functional group comprises quantitative data on multiple proteins interacting according to linear and nonlinear signalling features found in biological signalling networks.
In one embodiment of the system, the principal component analysis conducted on systems-knowledge enriched input data comprises a multivariate statistical algorithm adapted to reduce the dimensionality of the data set.
In one embodiment of the system, each principal component generated from the PCA is defined by the specific coefficient values that are associated with each of the functional groups.
In one embodiment of the system, at least one functional group comprises quantitative data on multiple proteins interacting according to linear and nonlinear signalling features found in biological signalling networks.
In one embodiment of the system, at least one functional group comprises pathway knowledge through inter-relating multiple protein expression levels according to their biological functions and interactions in biological signalling networks.
In one embodiment of the system, the set of principal components defines a multi-dimensional PCA space, and an implementation of an algorithm that applies a Kaiser criterion defines the number of principal components to be maintained for further analysis.
In one embodiment of the system, the system further comprises means for applying a k-means clustering algorithm to define groups of samples with similar levels of resistant, low, medium or high anti-cancer treatment responsiveness for subsequent PCA space segmentation. These groups of samples with similar anti-cancer treatment responsiveness subsequently serve to segment the PCA space into response regions.
In one embodiment of the system, linear/quadratic discriminant analysis based algorithms are applied to the groups of samples (see previous point) such that the PCA space can be segmented to separate different levels of resistant, low, medium or high anti-cancer treatment responsiveness for different anti-cancer treatments.
In one embodiment of the system, the system further comprises means for applying a support vector machine-based algorithm instead of a discriminant analysis to deal with larger amounts of data for PCA space segmentation where the sample number > (PCA dimensions)².
In one embodiment of the system, the system further comprises means for outputting a prediction on anti-cancer treatment responsiveness based on the positioning of a patient's sample into the segmented PCA space.
In one embodiment of the system, the system further comprises means for applying vectorial additions to relative contributions of at least one functional group to calculate the consequence of altering individual or combinations of protein amounts on the PCA space position of a sample; and means for outputting possibilities to re-position non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction.
In one embodiment of the system, a decision tree-based algorithm selects the best treatment choice for a new sample, if at least two different drug treatment responses were predicted for that newly placed sample in the PCA space. The system can identify the best treatment choice for the patient by comparing the predicted anti-cancer treatment responsiveness of at least two individual treatment choices, implemented by a decision tree-based algorithm.
In one embodiment of the system, the system further comprises means for analysing whether changes in individual or multiple protein amounts for a sample can result in re-positioning non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction. This is achieved by vectorial additions of the relative contributions of these proteins and their functional groups in each of the principal components that define the PCA space.
In one embodiment of the invention, there is provided a computer implemented method for the personalisation and/or optimization of anti-cancer treatments for a patient comprising the steps of:
analysing protein data obtained from a biological sample obtained from a patient;
grouping of different protein data into functional units;
inputting the functional units into a multivariate statistical processing module and performing a principal component analysis (PCA) in which a multi-dimensional PCA space of a systems-defined number of dimensions is obtained;
positioning the biological sample into the PCA space according to functional units based on protein expression profiles and coefficients generated from the principal component analysis; and
generating predictions on anti-cancer treatment responsiveness based on a comparison of the positioning of the biological sample in the PCA space in relation to pre-defined groups of samples with known drug responsiveness.
Prior applications processed unmodified raw data sets on single or multiple measurements rather than generating knowledge-driven, biologically informed functional groups that represent systems features of the signalling networks.
The example implementing the present invention covers both extrinsic and intrinsic apoptosis signalling as well as apoptosis execution, therefore expanding significantly beyond the prior art. Also in contrast to the prior art, the present invention generates reliable predictions on which treatment from multiple treatment options available is the best one on a case-by-case basis. Taken together, the present invention therefore builds on a methodology fundamentally different from that of prior art systems, has a broader scope and functional versatility, and its performance quality builds on a growing knowledge database that contributes to a continued self-improvement or self-learning of the invention.
Protein levels can be obtained from biopsies, resected tumor material, or formalin-fixed, paraffin-embedded histopathology material using reverse phase protein arrays, quantitative Western Blotting, tissue microarray immunostaining or immunohistochemistry, quantitative mass spectrometry or other methods generating quantitative protein data. This protein data will feed into the computational model.
Generally speaking, the raw biological sample can be any cancer cell containing sample, including blood samples, cell or tissue extracts, isolated cell lines, primary cancer cells, or biopsy or tissue biopsy (including tumour tissue samples).
Generally speaking, the individual is a human, although the computer-implemented method of the invention is applicable to other higher mammals. In this specification, the term "cancer" should be understood to mean a cancer that is treated by chemo- or radiotherapeutic regimens. Examples of such cancers include multiple myeloma, prostate cancer, glioblastoma, lymphoma, fibrosarcoma; myxosarcoma; liposarcoma; chondrosarcoma; osteogenic sarcoma; chordoma; angiosarcoma; endotheliosarcoma; lymphangiosarcoma; lymphangioendotheliosarcoma; synovioma; mesothelioma; Ewing's tumor; leiomyosarcoma; rhabdomyosarcoma; colon carcinoma; pancreatic cancer; breast cancer; ovarian cancer; squamous cell carcinoma; basal cell carcinoma; adenocarcinoma; sweat gland carcinoma; sebaceous gland carcinoma; papillary carcinoma; papillary adenocarcinomas; cystadenocarcinoma; medullary carcinoma; bronchogenic carcinoma; renal cell carcinoma; hepatoma; bile duct carcinoma; choriocarcinoma; seminoma; embryonal carcinoma; Wilms' tumor; cervical cancer; uterine cancer; testicular tumor; lung carcinoma; small cell lung carcinoma; bladder carcinoma; epithelial carcinoma; glioma; astrocytoma; medulloblastoma; craniopharyngioma; ependymoma; pinealoma; hemangioblastoma; acoustic neuroma; oligodendroglioma; meningioma; melanoma; retinoblastoma; and leukemias.
In the specification, the term "treatment" or "anti-cancer treatment" should be understood to mean treatments known to the person skilled in the art which are used to treat cancers, such as, for example, radiation treatment, chemotherapy (for example, platinum-based drugs such as Cisplatin, Carboplatin and Oxaliplatin; anti-metabolites such as 5-fluorouracil (5-FU); alkylating agents such as Dacarbazine, Temozolomide; Anthracyclines such as Doxorubicin), second-line therapies (for example, targeted therapeutics such as narrow range or specific kinase inhibitors, proteasome inhibitors such as Bortezomib (Velcade), death receptor ligands, IAP antagonists, Bcl-2 family antagonists such as Obatoclax, epidermal growth factor receptor-based therapies such as Cetuximab, SMAC mimetics such as TL32711), and the like.
The invention also provides a computer program comprising program instructions for causing a computer program to carry out the above method which may be embodied on a record medium, carrier signal or read-only memory.
Embodiments of the invention can be described through functional modules, which are defined by computer executable instructions recorded on computer readable media and which cause a computer to perform method steps when executed. The modules are segregated by function for the sake of clarity. However, it should be understood that the modules/systems need not correspond to discreet blocks of code and the described functions can be carried out by the execution of various code portions stored on various media and executed at various times. Furthermore, it should be appreciated that the modules may perform other functions, thus the modules are not limited to having any particular functions or set of functions.
The computer readable storage media can be any available tangible media that can be accessed by a computer. Computer readable storage media includes volatile and non-volatile, removable and non-removable tangible media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer readable storage media includes, but is not limited to, RAM (random access memory), ROM (read only memory), EPROM (erasable programmable read only memory), EEPROM (electrically erasable programmable read only memory), flash memory or other memory technology, CD-ROM (compact disc read only memory), DVDs (digital versatile disks) or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage media, other types of volatile and non-volatile memory, and any other tangible medium which can be used to store the desired information and which can accessed by a computer including and any suitable combination of the foregoing.
Computer-readable data embodied on one or more computer-readable storage media may define instructions, for example, as part of one or more programs that, as a result of being executed by a computer, instruct the computer to perform one or more of the functions described herein, and/or various embodiments, variations and combinations thereof. Such instructions may be written in any of a plurality of programming languages, for example, Java, J#, Visual Basic, C, C#, C++, Fortran, Pascal, Eiffel, MATLAB, Basic, COBOL assembly language, and the like, or any of a variety of combinations thereof. The computer-readable storage media on which such instructions are embodied may reside on one or more of the components of either of a system, or a computer readable storage medium described herein, may be distributed across one or more of such components.
The computer-readable storage media may be transportable such that the instructions stored thereon can be loaded onto any computer resource to implement the aspects of the present invention discussed herein. In addition, it should be appreciated that the instructions stored on the computer-readable medium, described above, are not limited to instructions embodied as part of an application program running on a host computer. Rather, the instructions may be embodied as any type of computer code (e.g., software or microcode) that can be employed to program a computer to implement aspects of the present invention. The computer executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are known to those of ordinary skill in the art and are described in, for example, Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001).
The functional modules of certain embodiments of the invention include at minimum a determination system, a storage device, optionally a comparison module, and a display module. The functional modules can be executed on one, or multiple, computers, or by using one, or multiple, computer networks. The determination system has computer executable instructions to provide e.g., expression levels of proteins of interest which are involved in cancer progression or regression in computer readable form.
The determination system can comprise any system for assaying, for example, a cancer tumor sample before and/or after anti-cancer drug treatment for expression of proteins of interest. Standard procedures, such as immunohistochemistry or an enzyme-linked immunosorbent assay (ELISA), may be employed.
The information determined in the determination system can be read by the storage device. As used herein the "storage device" is intended to include any suitable computing or processing apparatus or other device configured or adapted for storing data or information. Examples of an electronic apparatus suitable for use with the present invention include a stand-alone computing apparatus, data telecommunications networks, including local area networks (LAN), wide area networks (WAN), Internet, Intranet, and Extranet, and local and distributed computer processing systems. Storage devices also include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, magnetic tape, optical storage media such as CD-ROM, DVD, electronic storage media such as RAM, ROM, EPROM, EEPROM and the like, general hard disks and hybrids of these categories such as magnetic/optical storage media. The storage device is adapted or configured for having recorded thereon nucleic acid sequence information. Such information may be provided in digital form that can be transmitted and read electronically, e.g., via the Internet, on diskette, via USB (universal serial bus) or via any other suitable mode of communication.
As used herein, "stored" refers to a process for encoding information on the storage device. Those skilled in the art can readily adopt any of the presently known methods for recording information on known media to generate manufactures comprising information relating to protein or gene expression in a sample. In one embodiment, the reference data stored in the storage device are read by the comparison module and compared.
The "comparison module" can use a variety of available software programs and formats for the comparison operative to compare expression profile information data determined in the determination system to reference samples and/or stored reference data. In one embodiment, the comparison module is configured to use pattern recognition techniques to compare information from one or more entries to one or more reference data patterns. The comparison module may be configured using existing commercially-available or freely-available software for comparing patterns, staining, and may be optimized for particular data comparisons that are conducted. The comparison module provides computer readable information related to the expression levels of the proteins or genes of interest in the sample.
The comparison module, or any other module of the invention, may include an operating system (e.g., UNIX) on which runs a relational database management system, a World Wide Web application, and a World Wide Web server. The World Wide Web application includes the executable code necessary for generation of database language statements (e.g., Structured Query Language (SQL) statements). Generally, the executables will include embedded SQL statements. In addition, the World Wide Web application may include a configuration file which contains pointers and addresses to the various software entities that comprise the server as well as the various external and internal databases which must be accessed to service user requests. The configuration file also directs requests for server resources to the appropriate hardware, as may be necessary should the server be distributed over two or more separate computers. In one embodiment, the World Wide Web server supports a TCP/IP protocol. Local networks such as this are sometimes referred to as "Intranets." An advantage of such Intranets is that they allow easy communication with public domain databases residing on the World Wide Web (e.g., the GenBank or Swiss-Prot World Wide Web site). Thus, in a particular preferred embodiment of the present invention, users can directly access data (via Hypertext links for example) residing on Internet databases using an HTML interface provided by Web browsers and Web servers.
The comparison module provides a computer readable comparison result that can be processed in computer readable form by predefined criteria, or criteria defined by a user, to provide a content based in part on the comparison result that may be stored and output as requested by a user using a display module.
In one embodiment of the invention, the content based on the comparison result or the determination system is displayed on a computer monitor. In one embodiment of the invention, the content based on the comparison result or determination system is displayed through printable media. The display module can be any suitable device configured to receive from a computer and display computer readable information to a user. Non-limiting examples include, for example, general-purpose computers such as those based on Intel PENTIUM-type processor, Motorola PowerPC, Sun UltraSPARC, Hewlett-Packard PA-RISC processors, any of a variety of processors available from Advanced Micro Devices (AMD) of Sunnyvale, California, or any other type of processor, visual display devices such as flat panel displays, cathode ray tubes and the like, as well as computer printers of various types.
In one embodiment, a World Wide Web browser is used for providing a user interface for display of the content based on the comparison result. It should be understood that other modules of the invention can be adapted to have a web browser interface. Through the Web browser, a user may construct requests for retrieving data from the comparison module. Thus, the user will typically point and click to user interface elements such as buttons, pull down menus, scroll bars and the like conventionally employed in graphical user interfaces.
The methods described herein therefore provide for systems (and computer readable media for causing computer systems) to perform methods as described in the Statements of Invention above, for example methods for the personalisation and/or optimization of anti-cancer treatments for a patient.
Systems and computer readable media described herein are merely illustrative embodiments of the invention for performing methods of diagnosis in an individual, and are not intended to limit the scope of the invention. Variations of the systems and computer readable media described herein are possible and are intended to fall within the scope of the invention.
The modules of the machine, or those used in the computer readable medium, may assume numerous configurations. For example, function may be provided on a single machine or distributed over multiple machines.
In one embodiment the invention provides the capacity to predict whether a sample is likely to respond to a treatment. In one embodiment the invention provides the capacity to select the best treatment option from multiple treatment possibilities. Furthermore, the invention generates predictions for beneficial targeted, re-sensitizing interventions in cases where no responsiveness is achieved by established treatments. The present invention provides a novel, powerful and highly versatile application for predicting treatment effectiveness and case-specific optimisation of anti-cancer treatments.
Due to its capacity to generate predictions on responsiveness to anti-cancer treatments, the present invention has high potential to be applied in various scenarios of basic and translational biomedical research, clinical research and clinical practice. The invention can be used to predict the response of a biological system to cell death-inducing treatments. The biological system that the present invention is applied to can be of a different nature; constituting, for example, isolated cell lines, primary cancer cells, 2-dimensional and 3-dimensional cultures of cell lines and primary cancer cells, isolated tumour spheroids, organotypic slices and cancer tissue sections, xenografted cell lines and tumours, as well as primary tumours in human patients. Examples for applications for the key predictive functionalities are listed in the following, with examples from both ends of the spectrum of biological systems (basic biomedical research - cell lines; clinical practice - patients):
(i) The invention can be used to predict the response of a biological system to cell death-inducing treatments. For example, it can be applied in basic biomedical research to generate predictions on whether treatments will be effective in cell line systems under investigation. Likewise, in clinical practice it can be employed to predict whether the standard treatment regime for a particular cancer will be effective in a patient presenting in the clinic.
(ii) The invention can be used to select the best treatment option from multiple available treatments. For example, it can be applied in basic biomedical research to identify cell lines that would preferentially respond to selected treatments. Particularly for the investigation of novel drugs and interventions, this is a useful function that can guide screens for the selection of suitable experimental systems and drug candidates in pre-clinical drug development processes. Likewise, in the clinical scenario this function can be exploited by patient-specifically selecting the treatment regime that is most likely to be efficient. Thereby, the invention constitutes an important step towards personalised medicine in the field of cancer. In addition, the invention can also be used for patient stratification as part of clinical trial design. Here, it can assist in identifying patients likely to respond to new therapeutics and treatment regimes. This particular application will be of high interest to all private and public investors in drug development and clinical trials.
(iii) The invention can be used to identify suitable targeted co-treatments for cases where no available treatment option is effective in initiating cell death. For example, in basic biomedical research it allows the identification of those experimental systems in which targeted therapeutics will most likely have a beneficial effect, and thereby can accelerate research activities in drug development and validation processes. Likewise, in the clinical scenario this function is helpful to identify patients that may qualify for and benefit from co-treatments with drugs affecting the functionality of the functional groups defined in the invention. In addition, this function can also be used for the stratification of patients in trials that investigate novel targeted therapeutics as potential co-treatment options in established standard therapies.
Through the integration of knowledge on how multiple proteins interact as combinatorial systems biomarkers in their pathway-specific context in linear and non-linear fashion, and by specifically taking inter-individual heterogeneity into account, the present invention addresses multiple current problems: It provides a new means to prognosticate drug- and dose-dependent responsiveness, to select optimal treatments and to suggest additional targeted interventions to maximise responsiveness. This is a significant and beneficial addition to current treatment decision processes, which currently mostly build on type of cancer and additional clinical pathophysiological factors. The present invention therefore contributes to reducing inefficient treatments and to selecting more effective alternative interventions. This also reduces unnecessary costs in the research and health care sectors. Also, it reduces mentally and physically exhausting treatments for patients where the cancer is identified to be resistant to the used chemotherapeutic drugs.
Furthermore, the present invention allows predicting on a case-by-case basis whether new or alternative treatments are likely to be successful. The present invention can therefore also be used to stratify patients for treatments and for recruitment into clinical trials. Many of the chemo- and radiotherapeutic treatments in current clinical use directly or indirectly activate cell death processes. Therefore, the present invention's underlying hypothesis is that it is possible to predict the treatment outcome based on the expression profiles of proteins (relevant for the cell death modality induced by the treatment) from a patient's tumour tissue sample when investigated quantitatively in the context of the signalling pathway.
Brief Description of the Drawings
The invention will be more clearly understood from the following description of an embodiment thereof, given by way of example only, with reference to the accompanying drawings, in which :-
Figure 1 illustrates the functional grouping of protein data. The overview visualises the arithmetic operations that are applied to quantitative protein expression data in order to generate functional groups. These operations are applied according to the relation of the respective proteins within cell death signalling systems or different pathways.
Figure 2 illustrates the application of the knowledge base for subsequent systems-based predictions. (A) Visualisation of the principal component analysis (PCA) space, limited to the first three principal components, derived from a validation data set from 11 melanoma cell lines. Cell lines are placed into the PCA space according to their respective functional group values. (B) Shape coding (star, square, circle, triangle) according to drug responsiveness (TRAIL treatment) for visualising common response areas in the PCA space. Shape coded responsiveness classes were defined by k-means clustering. (C) Schematic 2D visualisation of the PCA space segmentation by linear discriminant analysis (LDA). Individual cases are separated according to their responsiveness. New cases can be placed into the PCA space, and the spatial region in which they are positioned indicates their predicted treatment responsiveness.
Figure 3 illustrates the visualisation of "best treatment choice" output. Applied as proof-of-principle to TRAIL- and DTIC-induced cell death signalling, the invention generated response predictions for five exemplary cases of malignant melanoma. The invention selects and highlights the treatment for which higher responsiveness is predicted. Prediction of identical responsiveness results in offering both treatment options. Experimental validation of the responsiveness predictions is indicated as well in the last row and confirms the accuracy of the predictions.
Figure 4 illustrates an overview of the integration of the invention into workflows applicable to settings in research and clinical environments. The invention processes data inputs (protein data and associated responsiveness information) to generate case-specific predictions on drug responsiveness or treatment suggestions (outputs) for samples for which responsiveness is unknown. These treatment suggestions can be applied in practice to optimise responsiveness case-specifically. Data on responsiveness which accumulate subsequent to the predictions made feed back into the knowledge base of the invention, improving and optimising its performance.
Figure 5 illustrates an overview of the sequential and inter-related data processing algorithms of the invention. From a knowledge base on protein data and associated treatment-specific responsiveness information, the invention conducts, based on systems-knowledge enriched functional groups, a principal component analysis, which is subsequently used to identify and separate responsiveness clusters in the PCA space. Performance validation algorithms, like a leave-one-out cross-validation, then provide a quality-control for the accuracy of case-specific predictions on single or multiple treatments. The approach allows the addition of new data to update the PCA, PCA space and associated PCA space segmentation based on discriminant analysis or support vector machine calculations. The position of the new cases in the PCA space allows responsiveness to be predicted. An additional decision tree algorithm can automatically filter for the best treatment options if multiple treatments are investigated. A vectorial analysis algorithm allows the re-positioning of individual samples in the PCA space closer to response regions in the PCA space to be calculated, thereby identifying suitable strategies for additional sensitizing targeted interventions that can be applied to cases where non-responsiveness is predicted.
Detailed Description of the Drawings
The present invention provides a computer implemented systems biological framework and method for the field of cancer research and cancer medicine that (i) allows the responsiveness to anti-cancer treatments to be predicted specifically, on a case-by-case basis, (ii) allows optimal alternative treatments to be selected for cases where poor responsiveness is predicted, and (iii) allows promising targeted co-treatment options to be suggested for cases of non-responsiveness to treatment.
As a biological data input, the invention requires quantitative measurements on the amounts of proteins that regulate responsiveness to the drugs under investigation. The invention is independent of the method (e.g. luminescence- or fluorescence-based assays, quantitative mass spectrometry etc.) by which the protein amounts are determined and can process relative and absolute protein concentration measurements.
The invention generates predictions from a complex implementation and interaction of algorithms from the field of multivariate statistics and pattern recognition and from the integration of current knowledge of biological signal transduction in the signalling pathways under investigation. The invention processes a data knowledge base, consisting of the protein expression amounts in biological samples and associated measurements on treatment responsiveness to one or multiple treatments given at a single or at multiple concentrations or doses. From this, the invention trains and optimises itself. Since the knowledge grows with each validated prediction and analysed case, the performance of the invention can be considered to be of non-static nature, with continued use contributing to further self-training and performance improvement of the invention.
Framework for the integration of biomedical systems knowledge on signal transduction pathways into the invention:
Protein expression amounts required for the invention to work need to cover core aspects of the signal transduction pathways under investigation. The protein amounts are integrated into the invention with the help of defined functional groups in which multiple proteins interact according to linear and nonlinear signalling features found in biological signalling networks, as shown in Figure 1. Functional groups need to be defined by the user according to the current understanding of the pathway under investigation. The rules-based processing within the functional groups as shown in Figure 1 is a major component of the functionality of the invention and not defined by a user. While arithmetically simple, the functional group calculation enriches the original data with information on linear or non-linear systems-level features of biological signalling pathways. Applying the method of the invention to a signalling pathway of choice however requires the user to add the respective proteins to the functional groups meaningfully (i.e. the pathway needs to be known). For example, if a user knows that two proteins are required to form a functional signalling platform, these two proteins would need to be added to a functional group that represents signalling platforms. Likewise, if a user knows that two proteins have redundant functions, they would accordingly need to be added to a functional group that reflects these characteristics. As shown in Figure 1, three classes of functional groups which a user could use are provided.
A proof-of-principle implementation of the invention for apoptosis signalling, applied here as an example to the TRAIL signalling pathway, is presented. According to the core rules shown in Figure 1, the definition of the functional groups for apoptosis signalling follows from the current knowledge in the literature (See for example Hellwig CT et al. (Curr Mol Med. 2011 Feb;11(1):31-47)). From the entire pool of proteins, only those relevant for TRAIL signalling were entered into the functional groups.
Functional Group 1 (Extrinsic apoptosis initiation platform): The sum of death receptor amounts multiplied by the amount of Fas-associated death domain (FADD).
Functional Group 2 (Initiator caspase activity): The ratio of inactive caspase-8 homologue cFLIP and caspase-8.
Functional Group 3 (Anti-apoptotic Bcl-2 family members): Addition of the amounts of redundant anti-apoptotic Bcl-2 family members Bcl-2, Mcl-1 and Bcl-xL.
Functional Group 4 (Pro-apoptotic proteins Bax and Bak): Addition of the amounts of pro-apoptotic Bcl-2 family members Bax and Bak.
Functional Group 5 (BH3-only proteins from the Bcl-2 family): Here, expression amount of the BH3-only protein Bid. For TRAIL signalling other BH3-only proteins are not relevant.
Functional Group 6 (Smac-like proteins): Here, expression amount of Smac. For TRAIL signalling other Smac-like proteins are not relevant.
Functional Group 7 (Cytochrome-c): Expression amount of cytochrome-c. Cytochrome-c is expressed in excess in cells, so this group may be obsolete.
Functional Group 8 (Apoptosome platform): Multiplication of the amounts of Apaf-1 and caspase-9, the core components of the apoptosome signalling platform.
Functional Group 9 (Effector caspase-dependent apoptosis execution): The amount of X-linked inhibitor of apoptosis protein (X-IAP) divided by the amount of caspase-3. Other IAP proteins and effector caspases are not relevant for TRAIL signalling.
Functional groups can likewise be defined for other apoptotic or otherwise cancer-related pathways. The generation of functional groups introduces pathway knowledge through inter-relating multiple protein amounts according to their biological functions and interactions. Furthermore, it reduces the number of independent protein variables.
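By way of illustration only, the nine functional group definitions listed above could be sketched in Python as follows. The dictionary keys, the function name and the example amounts are hypothetical and are not taken from the specification. The sketch further assumes that the death receptors measured are DR4 and DR5, and that Functional Group 2 is the amount of cFLIP divided by the amount of caspase-8, by analogy with Functional Group 9.

```python
def functional_groups(p):
    """Map a dict of protein amounts to the nine TRAIL functional group values.

    Key names are hypothetical labels chosen for this sketch.
    """
    return {
        # FG1: sum of death receptor amounts multiplied by FADD (platform formation)
        "FG1": (p["DR4"] + p["DR5"]) * p["FADD"],
        # FG2: assumed direction cFLIP / caspase-8 (inhibitor over caspase)
        "FG2": p["cFLIP"] / p["caspase8"],
        # FG3: redundant anti-apoptotic Bcl-2 family members are added
        "FG3": p["Bcl2"] + p["Mcl1"] + p["BclxL"],
        # FG4: redundant pro-apoptotic members Bax and Bak are added
        "FG4": p["Bax"] + p["Bak"],
        # FG5 to FG7: single-protein groups
        "FG5": p["Bid"],
        "FG6": p["Smac"],
        "FG7": p["cytochrome_c"],
        # FG8: apoptosome platform, Apaf-1 multiplied by caspase-9
        "FG8": p["Apaf1"] * p["caspase9"],
        # FG9: X-IAP divided by caspase-3
        "FG9": p["XIAP"] / p["caspase3"],
    }


# Made-up amounts in arbitrary units, for demonstration only.
example_sample = {
    "DR4": 2.0, "DR5": 3.0, "FADD": 4.0, "cFLIP": 1.0, "caspase8": 2.0,
    "Bcl2": 1.0, "Mcl1": 2.0, "BclxL": 3.0, "Bax": 1.5, "Bak": 0.5,
    "Bid": 0.7, "Smac": 1.2, "cytochrome_c": 5.0,
    "Apaf1": 2.0, "caspase9": 3.0, "XIAP": 3.0, "caspase3": 4.0,
}
example_groups = functional_groups(example_sample)
```

In this sketch, seventeen protein amounts are condensed into nine group values, which reflects the reduction in the number of independent variables mentioned above.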
Framework for the mathematical processes implemented to provide data analysis, predictions on treatment responses, best treatment options and targeted co-treatment options:
Case-specific values for the functional groups, as determined from biological samples, are processed by an exploratory multivariate statistical algorithm in the form of a principal component analysis (PCA). The PCA aims to further reduce the dimensionality of the data set. Each principal component is defined by individually weighted contributions of the functional groups, and the values of these contributions (coefficients) can be displayed. Next, a Kaiser criterion is applied to the PCA. This is a mathematical criterion that reduces the number of used principal components, while maintaining a minimum of approximately 75% of the original data variance. The reduced set of principal components then defines a multi-dimensional PCA space. All following mathematical procedures are performed in this multi-dimensional PCA space, and the implementation of the method is capable of generating visualisations limited to two and/or three dimensions in this context.
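The component selection step described above can be illustrated with a short sketch. It assumes that the eigenvalues of the PCA are already available and that the PCA was computed on standardised functional group values. The Kaiser criterion as commonly stated retains components whose eigenvalue exceeds the mean eigenvalue (which equals 1 for standardised data); the sketch reports the fraction of variance retained alongside it, so that the approximately 75% figure can be checked. All function names and the eigenvalues are hypothetical.

```python
def kaiser_component_count(eigenvalues):
    """Number of components whose eigenvalue exceeds the mean eigenvalue."""
    mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean_eigenvalue)


def variance_retained(eigenvalues, k):
    """Fraction of total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_for_variance(eigenvalues, threshold):
    """Smallest number of components whose cumulative variance reaches threshold."""
    for k in range(1, len(eigenvalues) + 1):
        if variance_retained(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)


# Made-up eigenvalues for nine standardised functional groups (they sum to 9).
example_eigenvalues = [3.6, 2.1, 1.3, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1]
n_kaiser = kaiser_component_count(example_eigenvalues)
fraction = variance_retained(example_eigenvalues, n_kaiser)
n_threshold = components_for_variance(example_eigenvalues, 0.75)
```

For these made-up eigenvalues both rules select the same number of components; on real data they may differ, in which case the choice between them would be a design decision for the implementer.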
The invention then positions each biological sample into the PCA space according to its specific profile of functional group values (Figure 2A). As a next step, a k-means clustering algorithm is applied to define groups of samples with similar drug responsiveness. For demonstration purposes, sample positions were shape-coded in Figure 2B according to the k-means-defined clusters. Even at this stage, PCA space regions that contain samples with similar levels of responsiveness are visually apparent. In the form of linear and quadratic discriminant analysis, the invention entails mathematical procedures from the field of pattern recognition that segment the PCA space objectively into regions of different responsiveness (see Figure 2C for a scheme). Once the invention has accumulated a sufficiently large knowledge base, the discriminant analysis is replaced by a support vector machine, which allows superior performance for larger amounts of data. The segmentations by discriminant analysis or support vector machine are treatment-specific, so that one and the same PCA space position may reflect different levels of responsiveness for different treatments. The output of this cluster segmentation procedure entails a measurement for the segmentation accuracy. In test cases, segmentation accuracy ranged between 80-100% and subsequently allowed for high prediction accuracy.
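As a minimal sketch of the clustering step, the following Python code applies a one-dimensional k-means procedure to responsiveness measurements in order to define responsiveness classes. The deterministic initialisation, the function names and the cell death percentages are assumptions made for this example. Four classes are used only because Figure 2B is described with four shapes.

```python
def nearest_centre(value, centres):
    """Index of the centre closest to value."""
    return min(range(len(centres)), key=lambda j: abs(value - centres[j]))


def kmeans_1d(values, k, max_iterations=100):
    """Plain k-means on scalar responsiveness values.

    Initial centres are spread evenly over the sorted values so that the
    result is reproducible (an assumption of this sketch).
    """
    ordered = sorted(values)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest_centre(v, centres)].append(v)
        # Keep the old centre if a cluster happens to be empty.
        updated = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    labels = [nearest_centre(v, centres) for v in values]
    return centres, labels


# Made-up cell death percentages for eleven samples after one treatment.
cell_death = [5, 8, 12, 35, 40, 42, 70, 75, 78, 95, 97]
class_centres, class_labels = kmeans_1d(cell_death, k=4)
```

The resulting class labels would then serve as the responder groups on which the discriminant analysis or support vector machine segments the PCA space.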
The present invention can generate predictions on treatment responsiveness for additional cases and samples for which treatment responsiveness is not yet known. These cases are placed by the invention into the PCA space according to their functional group values. Based on their positioning, the invention reports back a prediction on the likely responsiveness to one or multiple treatments. The prediction is made as follows: Once the PCA space has been segmented and regions of different responsiveness have been defined, samples for which predictions need to be generated can be processed. From the sample-specific protein profile, the method described herein calculates the functional group values. The method described herein then multiplies these with the coefficients that the PCA has assigned to the respective functional groups. This yields the final position of the sample in the PCA space. The responsiveness is predicted according to the responsiveness region the sample is positioned in. The present invention therefore provides the user with a case- and treatment-specific prognosis of responsiveness and, if the initial knowledge base includes this information, can also suggest the optimal concentration or dosage of the specific treatments. In addition, from the pool of possible treatments for which the invention has been trained, it automatically and case-specifically can select optimal treatments that result in maximum responsiveness. Figure 3 provides an example for TRAIL vs. dacarbazine treatment.
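The placement of a new sample into the PCA space, as described above, amounts to a weighted sum per principal component. The sketch below assumes that the functional group values are first standardised with the means and standard deviations of the knowledge base, and that the PCA coefficients are stored as one list per principal component. All names and numbers are hypothetical, and only three functional groups and two components are shown for brevity.

```python
def standardise(values, means, stds):
    """Standardise functional group values against the knowledge base."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]


def project_into_pca_space(values, means, stds, coefficients):
    """Multiply standardised group values with the PCA coefficients.

    coefficients holds one list per retained principal component, each with
    one weight per functional group.
    """
    z = standardise(values, means, stds)
    return [sum(c * zi for c, zi in zip(component, z))
            for component in coefficients]


# Made-up knowledge-base statistics and coefficients.
kb_means = [10.0, 2.0, 5.0]
kb_stds = [2.0, 0.5, 1.0]
pca_coefficients = [
    [0.6, 0.8, 0.0],  # principal component 1
    [0.0, 0.0, 1.0],  # principal component 2
]
new_sample_groups = [14.0, 2.5, 4.0]
new_position = project_into_pca_space(
    new_sample_groups, kb_means, kb_stds, pca_coefficients
)
```

The resulting coordinates would then be passed to the treatment-specific segmentation to read off the responsiveness region in which the sample lies.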
Furthermore, the invention can filter for cases that have been predicted as non-responsive to one or multiple treatments. For these cases the present invention provides predictions on which functional groups need to be targeted to restore drug responsiveness. This is possible through the implementation of a mathematical algorithm that screens for possibilities to re-position non-responding cases closer to the PCA space regions that reflect higher treatment responsiveness. This algorithm combines information of the spatial position of the non-responding case and vectorial additions of relative contributions of the functional groups and their proteins to the principal components in the PCA space. Vector additions can be positive or negative, depending on the application. For example, if one uses SMAC mimetics, one can increase the SMAC direction in the PCA space. If one uses ABT737, it will reduce the amount of its targets Bcl-2 and Bcl-xL, and therefore vectorial additions would be inverse. Vectorial additions that point in the direction of PCA space regions of higher responsiveness identify functional groups that can be targeted in practice to increase responsiveness. This targeting can be achieved through any intervention that increases the relative contribution of pro-death proteins, decreases the contribution of cell survival proteins, or both in the identified functional group(s). The invention therefore allows promising targeted co-treatment combinations and manipulations that restore treatment responsiveness to be suggested. This functionality has been validated for three independent sensitizing perturbations in four experimental model systems each for the TRAIL pathway example described above. An overview of the workflows of the invention is provided in Figure 4.
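One possible reading of the vectorial addition described above is sketched below. It assumes that an intervention changes the standardised value of a single functional group by a signed amount, that the resulting shift in the PCA space is this amount multiplied by the group's coefficient on each principal component, and that a region of higher responsiveness can be represented by a single centroid. These are simplifying assumptions for illustration. The coefficients, positions and step sizes are made up, and the two intervention names are taken from the examples in the text.

```python
import math


def shifted_position(position, coefficients, group_index, delta):
    """Add the vector contributed by changing one functional group by delta."""
    return [coordinate + delta * component[group_index]
            for coordinate, component in zip(position, coefficients)]


def rank_interventions(position, responder_centroid, coefficients, interventions):
    """Rank interventions by how much closer they move a sample to the centroid.

    interventions maps a name to (functional group index, signed change).
    Returns (name, distance reduction) pairs, largest reduction first.
    """
    baseline = math.dist(position, responder_centroid)
    ranked = []
    for name, (group_index, delta) in interventions.items():
        moved = shifted_position(position, coefficients, group_index, delta)
        ranked.append((name, baseline - math.dist(moved, responder_centroid)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical group order: 0 = anti-apoptotic Bcl-2 members, 1 = Smac-like, 2 = apoptosome.
example_coefficients = [
    [0.8, -0.6, 0.0],  # principal component 1
    [0.0, 0.0, 1.0],   # principal component 2
]
non_responder = [1.5, 0.0]
responder_centroid = [0.0, 0.0]
candidate_interventions = {
    "SMAC mimetic": (1, +1.0),  # raises the Smac-like group
    "ABT737": (0, -1.0),        # lowers the anti-apoptotic Bcl-2 group (inverse addition)
}
ranking = rank_interventions(
    non_responder, responder_centroid, example_coefficients, candidate_interventions
)
```

A positive distance reduction indicates that the intervention points towards the region of higher responsiveness; with more than one responder region, or with regions bounded by a discriminant function, a different proximity measure would be needed.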
The sequential workflow of the data processing and the usage of algorithms in this invention are shown in detail in Figure 5.
Figure 4 provides a workflow of the whole framework according to a preferred embodiment of the invention. Patient tissue samples are collected and, for each sample, proteins involved in the pathway(s) affecting disease are quantitatively determined. Those protein concentrations are analysed by the computational systems biology framework, which allows the prognosis of responsiveness to certain drugs, the best treatment choice and the suggestion of co-treatment strategies for sensitizing non-responders. Once the treatment response to a used drug is validated, the new cases are added to the data set of the framework for self-learning and future comparison.
Figure 5 is a more detailed technical workflow (UML) of the computational systems biology framework shown in Figure 4. A knowledge base consisting of quantitative data and specific drug treatment response information is required for the invention. The protein concentrations are standardized and grouped into functional groups, as shown in Figure 1. A principal component analysis is performed on the functional groups to create the principal component space. The Kaiser criterion determines the number of principal components necessary to cover at least 75% of variance in the data. A k-means clustering is performed on the treatment response information to define responder groups. A discriminant analysis / support vector machine classification is performed on the PCA space together with the information about responder groups to divide the PCA space into regions with similar treatment response and to validate the cluster separation. To test the predictive capacity, a cross-validation is performed. Protein concentrations of new (not yet included in the knowledge base) tissue samples are quantified and serve as input for the computational systems biology framework. These new data are standardized to the knowledge base, and grouped into functional groups accordingly.
Coefficients already obtained from the previous principal component analysis are used to place new cases into the PCA space according to their protein profile. A discriminant analysis or support vector machine calculation performed on the PCA space predicts the responder group affiliation of new samples and thus allows the selection of the best treatment choice. Vectorial additions applied to targetable functional groups allow the best sensitizing co-treatment strategies to be identified.
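As a non-limiting sketch of the placement and vectorial addition steps just described, a new sample may be projected into an existing PCA space with the stored coefficients, and the displacement caused by altering one functional group may be added to its position as a vector. The stored means, standard deviations and coefficients below are illustrative numbers only, and the function names `project` and `shift_from_change` are hypothetical. The classification of the resulting position into a responder group is not shown.

```python
import numpy as np

def project(g, mean, std, coef):
    """Place functional-group values g into the PCA space using stored coefficients."""
    return ((np.asarray(g, dtype=float) - mean) / std) @ coef

def shift_from_change(delta, std, coef):
    """PCA-space displacement caused by changing the group values by delta."""
    return (np.asarray(delta, dtype=float) / std) @ coef

# Hypothetical stored model: 3 functional groups, 2 retained principal components.
mean = np.array([0.2, -0.1, 0.5])   # knowledge-base means of the group values
std = np.array([1.0, 2.0, 0.5])     # knowledge-base standard deviations
coef = np.array([[0.6, 0.7],
                 [0.6, -0.7],
                 [0.5, 0.1]])       # illustrative coefficients, not real loadings

g_new = np.array([1.2, 0.3, 0.4])   # functional-group values of a new sample
pos = project(g_new, mean, std, coef)

# Assumed co-treatment effect: the second functional group is lowered by 1.0.
delta = np.array([0.0, -1.0, 0.0])
moved = pos + shift_from_change(delta, std, coef)
```

Because the projection is linear, the position obtained by vector addition equals the position obtained by re-projecting the altered sample, which is why candidate co-treatments can be compared without repeating the principal component analysis.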
The embodiments in the invention described with reference to the drawings comprise a computer apparatus and/or processes performed in a computer apparatus. However, the invention also extends to computer programs, particularly computer programs stored on or in a carrier adapted to bring the invention into practice. The program may be in the form of source code, object code, or a code intermediate source and object code, such as in partially compiled form or in any other form suitable for use in the implementation of the method according to the invention. The carrier may comprise a storage medium such as ROM, e.g. CD ROM, or magnetic recording medium, e.g. a floppy disk or hard disk. The carrier may be an electrical or optical signal which may be transmitted via an electrical or an optical cable or by radio or other means.
In the specification the terms "comprise, comprises, comprised and comprising" or any variation thereof and the terms "include, includes, included and including" or any variation thereof are considered to be totally interchangeable and they should all be afforded the widest possible interpretation and vice versa.
The invention is not limited to the embodiments hereinbefore described but may be varied in both construction and detail.

Claims

1. A computer implemented method for the personalisation and/or optimization of anti-cancer treatments for a patient comprising the steps of:
analysing quantitative expression levels of multiple proteins obtained from a biological sample obtained from a patient;
grouping the multiple proteins into a plurality of functional groups based on the biological pathway knowledge;
inputting the functional group data into a multivariate statistical processing module and performing a principal component analysis (PCA) in which a multi-dimensional PCA space is obtained;
positioning the sample into the PCA space according to its functional group values derived from quantitative protein expression data and the associated coefficients generated from the principal component analysis;
defining groups of samples with similar known anti-cancer treatment responsiveness by using a clustering technique;
segmenting the PCA space into a plurality of regions, each region representing different levels of anti-cancer treatment responsiveness based on separate groups of samples with known anti-cancer treatment responsiveness; and
generating predictions of anti-cancer treatment responsiveness of a patient based on the position occupied by the patient sample in the PCA space that has been segmented into regions that represent different levels of anti-cancer treatment responsiveness, whereby the position of the patient sample is determined from the functional group values that are calculated from the quantitative protein expression data obtained from the patient sample and their multiplication with the associated coefficients of these functional groups as generated in the principal component analysis.
2. The computer implemented method as claimed in claim 1, wherein the principal component analysis conducted on systems-knowledge enriched input data comprises a multivariate statistical algorithm adapted to reduce the dimensionality of the data set.
3. The computer implemented method as claimed in any preceding claim wherein at least one functional group comprises quantitative data on multiple proteins interacting according to linear and non-linear signalling features found in biological signalling networks.
4. The computer implemented method as claimed in any preceding claim wherein at least one functional group comprises pathway knowledge through inter-relating multiple protein expression levels according to their biological functions and interactions in biological signalling networks.
5. The computer implemented method as claimed in any one of the preceding claims, wherein the set of principal components defines a multi-dimensional PCA space, and wherein an implementation of an algorithm that applies a Kaiser criterion defines the number of principal components to be maintained for further analysis.
6. The computer implemented method as claimed in any preceding claim further comprising the step of applying a k-means clustering algorithm to define groups of samples with similar levels of resistant, low, medium or high anti-cancer treatment responsiveness for subsequent PCA space segmentation.
7. The computer implemented method as claimed in any one of the preceding claims, wherein linear/quadratic discriminant analysis based algorithms are applied to the groups of samples such that the PCA space can be segmented to separate different levels of resistant, low, medium or high anti-cancer treatment responsiveness for different anti-cancer treatments.
8. The computer implemented method as claimed in any one of the preceding claims, further comprising the step of applying a support vector machine based algorithm instead of a discriminant analysis to deal with larger amounts of data for PCA space segmentation where the sample number > (PCA dimensions)².
9. The computer-implemented method as claimed in any one of the preceding claims, further comprising the step of outputting a prediction on anti-cancer treatment responsiveness based on the positioning of a patient's sample into the segmented PCA space.
10. The computer-implemented method as claimed in any one of the preceding claims, further comprising the step of applying vectorial additions to relative contributions of at least one functional group to calculate the consequence of altering individual or combinations of protein amounts on the PCA space position of a sample; and outputting possibilities to re-position non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction.
11. The computer-implemented method according to any one of the preceding claims, wherein the method can identify the best treatment choice for the patient by comparing the predicted anti-cancer treatment responsiveness of at least two individual treatment choices, implemented by a decision tree-based algorithm.
12. A computer implemented system for the personalisation and/or optimization of anti-cancer treatments for a patient comprising:
means for analysing quantitative expression levels of multiple proteins obtained from a biological sample obtained from a patient;
means for grouping the multiple protein quantities into a plurality of functional groups based on biological pathway knowledge;
means for inputting the functional group data into a multivariate statistical processing module and performing a principal component analysis (PCA) in which a multi-dimensional PCA space is obtained;
means for positioning the sample into the PCA space according to its functional group values derived from quantitative protein expression data and the associated coefficients generated from the principal component analysis;
means for defining groups of samples with similar known anti-cancer treatment responsiveness by using a clustering technique;
means for segmenting the PCA space into a plurality of regions that represent different levels of anti-cancer treatment responsiveness based on separate groups of samples with known anti-cancer treatment responsiveness; and
means for generating predictions of anti-cancer treatment responsiveness of a patient based on the position occupied by the patient sample in the PCA space that has been segmented into regions that represent different levels of anti-cancer treatment responsiveness, whereby the position of the patient sample is determined from the functional group values that are calculated from the quantitative protein expression data obtained from the patient sample and their multiplication with the associated coefficients of these functional groups as generated in the principal component analysis.
13. The computer implemented system as claimed in claim 12 wherein the principal component analysis conducted on systems-knowledge enriched input data comprises a multivariate statistical algorithm adapted to reduce the dimensionality of the data set.
14. The computer implemented system as claimed in claim 12 or 13 wherein at least one functional group comprises quantitative data on multiple proteins interacting according to linear and non-linear signalling features found in biological signalling networks.
15. The computer implemented system as claimed in any one of claims 12 to 14 wherein at least one functional group comprises pathway knowledge through inter-relating multiple protein expression levels according to their biological functions and interactions in biological signalling networks.
16. The computer implemented system as claimed in any one of claims 12 to 15 wherein the set of principal components defines a multi-dimensional PCA space, and wherein an implementation of an algorithm that applies a Kaiser criterion defines the number of principal components to be maintained for further analysis.
17. The computer implemented system as claimed in any one of claims 12 to 16 further comprising means for applying a k-means clustering algorithm to define groups of samples with similar levels of resistant, low, medium or high anti-cancer treatment responsiveness for subsequent PCA space segmentation.
18. The computer implemented system as claimed in any one of claims 12 to 17 wherein linear/quadratic discriminant analysis based algorithms are applied to the groups of samples such that the PCA space can be segmented to separate different levels of resistant, low, medium or high anti-cancer treatment responsiveness for different anti-cancer treatments.
19. The computer implemented system as claimed in any one of claims 12 to 18, further comprising a means of applying a support vector machine-based algorithm instead of a discriminant analysis to deal with larger amounts of data for PCA space segmentation where the sample number > (PCA dimensions)².
20. The computer implemented system as claimed in any one of claims 12 to 19 further comprising a means for outputting a prediction on anti-cancer treatment responsiveness based on the positioning of a patient's sample into the segmented PCA space.
21. The computer system as claimed in any one of claims 12 to 20, further comprising a means for applying vectorial additions to relative contributions of at least one functional group to calculate the consequence of altering individual or combinations of protein amounts on the PCA space position of a sample; and means for outputting possibilities to re-position non-responsive cases closer to PCA space regions that reflect higher anti-cancer treatment responsiveness or a co-treatment prediction.
22. The computer-implemented system according to any one of claims 12 to 21, wherein the system can identify the best treatment choice for the patient by comparing the predicted anti-cancer treatment responsiveness of at least two individual treatment choices, implemented by a decision tree based algorithm.
PCT/EP2013/075725 2012-12-05 2013-12-05 System and method for the personalisation and optimization of anti-cancer treatments WO2014086949A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201261733554P 2012-12-05 2012-12-05
EP12195636 2012-12-05
US61/733,554 2012-12-05
EP12195636.1 2012-12-05

Publications (2)

Publication Number Publication Date
WO2014086949A2 true WO2014086949A2 (en) 2014-06-12
WO2014086949A3 WO2014086949A3 (en) 2014-09-25

Family

ID=47562998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/075725 WO2014086949A2 (en) 2012-12-05 2013-12-05 System and method for the personalisation and optimization of anti-cancer treatments

Country Status (1)

Country Link
WO (1) WO2014086949A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342657B2 (en) * 2003-03-24 2016-05-17 Nien-Chih Wei Methods for predicting an individual's clinical treatment outcome from sampling a group of patient's biological profiles
EP1969506A1 (en) * 2005-12-13 2008-09-17 Erasmus University Medical Center Rotterdam Genetic brain tumor markers

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
CELL, vol. 144, no. 5, 2011, pages 646 - 674
HELLWIG CT ET AL., CURR MOL MED., vol. 11, no. 1, February 2011 (2011-02-01), pages 31 - 47
HESKETH R: "Introduction to Cancer Biology", 2013, CAMBRIDGE UNIVERSITY PRESS
JANES KA; ALBECK JG; GAUDET S; SORGER PK; LAUFFENBURGER DA; YAFFE MB, SCIENCE, vol. 310, no. 5754, 9 December 2005 (2005-12-09), pages 1646 - 53
JANES KA; GAUDET S; ALBECK JG; NIELSEN UB; LAUFFENBURGER DA; SORGER PK, CELL, vol. 124, no. 6, 24 March 2006 (2006-03-24), pages 1225 - 39
KEVIN A. JANES ET AL., NATURE REVIEWS MOLECULAR CELL BIOLOGY, vol. 7, no. 11, 2006, pages 820 - 828
LEE MJ; YE AS; GARDINO AK; HEIJINK AM; SORGER PK; MACBEATH G; YAFFE MB, CELL, vol. 149, no. 4, 11 May 2012 (2012-05-11), pages 780 - 94
NAT. MED., vol. 19, no. 11, 2013, pages 1389 - 1400
NATURE, vol. 469, no. 7329, 13 January 2011 (2011-01-13), pages 156 - 7
OUELETTE; BZEVANIS: "Bioinformatics: A Practical Guide for Analysis of Gene and Proteins", 2001, WILEY & SONS, INC.
RASHIDI; BUEHLER: "Bioinformatics Basics: Application in Biological Science and Medicine", 2000, CRC PRESS
SALZBERG; SEARLES; KASIF: "Computational Methods in Molecular Biology", 1998, ELSEVIER
SEMIN CANCER BIOL., vol. 23, no. 5, 2013, pages 352 - 360
SETUBAL; MEIDANIS ET AL.: "Introduction to Computational Biology Methods", 1997, PWS PUBLISHING COMPANY

Also Published As

Publication number Publication date
WO2014086949A3 (en) 2014-09-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13821075

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13821075

Country of ref document: EP

Kind code of ref document: A2