US20190252075A1 - Predicting Prostate Cancer Recurrence Using a Prognostic Model that Combines Immunohistochemical Staining and Gene Expression Profiling - Google Patents



Publication number
US20190252075A1
Authority
US
United States
Legal status
Abandoned
Application number
US16/237,392
Inventor
Guenter Schmidt
Current Assignee
Definiens AG
Original Assignee
Definiens AG
Application filed by Definiens AG
Priority to US16/237,392
Publication of US20190252075A1
Abandoned (current legal status)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57434 Specifically defined cancers of prostate
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/63 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2570/00 Omics, e.g. proteomics, glycomics or lipidomics; Methods of analysis focusing on the entire complement of classes of biological molecules or subsets thereof, i.e. focusing on proteomes, glycomes or lipidomes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/34 Genitourinary disorders
    • G01N2800/342 Prostate diseases, e.g. BPH, prostatitis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/54 Determining the risk of relapse
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/70 Mechanisms involved in disease identification
    • G01N2800/7023 (Hyper)proliferation
    • G01N2800/7028 Cancer

Definitions

  • the present invention relates to systems and methods for detecting cancer and predicting the recurrence of cancer, and more particularly relates to systems and methods for predicting the recurrence of prostate cancer (PSA recurrence).
  • a cancer patient can be treated such that the cancer goes into remission. Knowing whether and when the cancer might later come out of remission and recur would, for many reasons, be beneficial. Having such information may facilitate making better clinical and treatment decisions. Having such information may also allow the patient to improve the patient's quality of life, and to make better life decisions. An improved system and method for determining the likelihood of cancer recurrence is desired.
  • An analysis and display system generates and displays a score indicative of whether cancer will recur in a patient.
  • tumor tissue from each one of many patients is obtained and analyzed. For each of these patients, it is known whether the patient suffered a recurrence of cancer, and this information is loaded into the system.
  • a univariate phenomic feature of the tumor tissue is measured, and a corresponding univariate phenomic feature is defined.
  • the univariate phenomic feature may be measured through image analysis of digital images taken of tissue slices stained with IHC-based stains.
  • a univariate genomic feature of the tissue is also measured. This may entail obtaining a probe count indicative of a degree of expression of a particular gene.
  • a bivariate feature is then calculated using both the phenomic and genomic information. In the same way, many univariate features can be measured, and a bivariate feature can be calculated for the relationship between any two of the univariate features.
  • some of the nodes (bubbles) of the MST network represent phenomic features, and others represent genomic features.
  • the edges represent bivariate features. Some of the edges represent bivariate features that are based on both phenomic feature information and genomic feature information. The method of using phenomic information along with genomic information in the prediction of cancer recurrence allows additional feature measurements to be brought to bear in the determination of the score, as compared to methods that only use genomic information.
  • a user of the system can cause a rendering of the MST network to be displayed on the display of the system.
  • the nodes of univariate features that have more prognostic importance are rendered to be larger, whereas the nodes of univariate features that have less prognostic importance are rendered to be smaller.
  • Edges in the network that have more prognostic importance are rendered as thicker lines, whereas edges in the network that have less prognostic importance are rendered as thinner lines.
  • the type of bivariate relationship (one of four “fuzzy logic” combinations) that was found in the learning phase to have the most prognostic importance for a given edge is indicated in the MST network by the type of arrow or line used to draw that edge.
  • a score is to be generated for a new patient.
  • a diagnostic test that involves collecting information on only a relatively small number of the features is developed using the network as displayed on the system.
  • raw measurement information on only three features need be collected from the patient.
  • One of the features is a phenomic feature, and the other two features are genomic features.
  • a tissue sample is taken from the patient, and this raw phenomic and genomic data is obtained from the sample.
  • a score is calculated for each univariate feature used in the diagnostic test.
  • raw measurement data for these three features is obtained. From this raw data, a score for each of the three features is calculated.
  • the raw data is used to calculate a score for each of the two bivariate features (edges in the network) between these three univariate features.
  • the overall score is a function of the underlying feature scores.
  • each of the underlying feature scores is either a “1” (representing a “yes” vote), or a “0” (representing a “no” vote).
  • the function is a majority voting function, so the overall score is a majority vote of the five votes provided by the five underlying features.
  • the resulting overall score, which is indicative of whether cancer will recur in the patient, is then displayed on the display of the system.
  • the test can be developed by inspecting the network after the learning phase and by selecting features that have notably high prognostic importance.
  • the example of a test involving three univariate features and two bivariate features is presented for illustrative purposes.
  • the system has general applicability.
  • the system is usable to generate a score indicative of whether the patient will suffer a recurrence of another type of cancer, such as lung cancer or breast cancer.
  • FIG. 1 is a diagram of a system for predicting the recurrence of cancer.
  • FIG. 2 is a diagram that illustrates how, in a “diagnostic phase” operation of the system of FIG. 1, a tissue sample from a patient is used to generate both raw phenomic feature measurement data and raw genomic feature measurement data.
  • FIG. 3 is a diagram of a Minimum Spanning Tree (MST) of univariate and bivariate features that predict the recurrence of cancer, where the tree includes both phenomic and genomic features.
  • FIG. 4 is a two-dimensional matrix of the prognostic values of bivariate relationships between pairs of univariate features.
  • FIG. 5 is a table that sets forth thirty-two univariate features that are determined to have significant prognostic value in the prediction of the recurrence of cancer (specifically prostate cancer).
  • FIG. 6 is a grayscale version of a high-resolution digital image of a first slice of tissue that was duplex stained in an IHC-based image analysis process.
  • FIG. 7 is a grayscale version of a high-resolution digital image of a second slice of tissue that was duplex stained in the IHC-based image analysis process.
  • FIG. 8 is an expanded view of a portion of the first digital image of FIG. 6.
  • FIG. 9 is an expanded view of a portion of the second digital image of FIG. 7.
  • FIG. 10 is an illustrative diagram that shows how the average distance from an M2 type macrophage to its nearest four non-cytotoxic T-cells is determined.
  • FIG. 11 is a diagram that sets forth LGALS3 raw measurements used in the learning phase of the system.
  • FIG. 12 shows how the information in the rows of FIG. 11 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 13 shows how the raw measurement count values of FIG. 12 are normalized by rank percentage.
  • FIG. 14 is a Kaplan-Meier plot for the data of FIG. 13.
  • FIG. 15 is a diagram that sets forth MAGEC2 raw measurements used in the learning phase of the system.
  • FIG. 16 shows how the information in the rows of FIG. 15 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 17 shows how the raw measurement count values of FIG. 16 are normalized by rank percentage.
  • FIG. 18 is a Kaplan-Meier plot for the data of FIG. 17.
  • FIG. 19 is a diagram that sets forth IHC_DIST_CD163(+)_CD3(+)CD8(−) raw measurements used in the learning phase of the system.
  • FIG. 20 shows how the information in the rows of FIG. 19 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 21 shows how the raw measurement count values of FIG. 20 are normalized by rank percentage.
  • FIG. 22 is a Kaplan-Meier plot for the data of FIG. 21.
  • FIG. 23 is a table that shows the four “fuzzy logic” bivariate scoring methods.
  • FIG. 24 is a diagram that shows how LGALS3-to-MAGEC2 normalized rank values are calculated when the SM1 bivariate scoring method is used.
  • FIG. 25 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM2 bivariate scoring method is used.
  • FIG. 26 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM3 bivariate scoring method is used.
  • FIG. 27 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM4 bivariate scoring method is used.
  • FIG. 28 is a table showing the prognostic values determined for the LGALS3-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4.
  • FIG. 29 is a Kaplan-Meier plot for the LGALS3-to-MAGEC2 bivariate relationship when the SM2 scoring method is used.
  • FIG. 30 is a diagram that shows how IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 normalized rank values are calculated using the first bivariate scoring method SM1.
  • FIG. 31 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM2 bivariate scoring method is used.
  • FIG. 32 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM3 bivariate scoring method is used.
  • FIG. 33 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM4 bivariate scoring method is used.
  • FIG. 34 is a table showing the prognostic values determined for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4.
  • FIG. 35 is a Kaplan-Meier plot for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship when the SM1 scoring method is used.
  • FIG. 36 is a table showing three raw measurement values for the new patient (to be used in the diagnostic phase to determine a score for the new patient).
  • FIG. 37 shows how a score is determined in the diagnostic phase for the LGALS3 univariate feature.
  • FIG. 38 shows how a score is determined in the diagnostic phase for the MAGEC2 univariate feature.
  • FIG. 39 shows how a score is determined in the diagnostic phase for the IHC_DIST_CD163(+)_CD3(+)CD8(−) univariate feature.
  • FIG. 40 shows how a score is determined in the diagnostic phase for the LGALS3-to-MAGEC2 bivariate feature.
  • FIG. 41 shows how a score is determined in the diagnostic phase for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate feature.
  • FIG. 42 sets forth the function that is used to determine the overall score from the underlying five feature scores.
  • FIG. 43 shows how the function of FIG. 42 is applied in the case of the new patient whose overall score is being determined in the diagnostic phase.
  • FIG. 1 is a conceptual diagram of a system 1 for predicting cancer recurrence using a prognostic method that analyzes genomic univariate features, phenomic univariate features, and bivariate features formed from pairs of univariate features (including pairs of a genomic feature and a phenomic feature). Based at least in part on that analysis, system 1 outputs a score 2.
  • the score 2 is indicative of whether a patient will suffer a recurrence of cancer.
  • System 1 includes a data analysis server 3.
  • the server 3 has a processor 4 that executes system software 5.
  • the software 5 is stored on the server 3 in a non-transitory processor-readable medium, such as semiconductor memory and/or magnetic disk storage.
  • the server 3 also maintains, and/or provides access to, a database 6 of patient data.
  • the database 6 may be stored on the server 3, or it may be stored remotely such that it is accessible to the server 3.
  • the system 1 further includes a computer 7.
  • the computer 7 is coupled to the server 3, for example by one or more networks or network connections 8.
  • Computer 7 includes a keyboard (not shown) and a display 9.
  • a user of the system uses the computer 7 to enter information into the system.
  • Information that the user can enter includes genomic feature information 10, phenomic feature information 11, and context information 12.
  • the genomic feature information can be a set of counts, where each count indicates the degree of expression of a corresponding gene in the tissue of a cancer patient.
  • the phenomic feature information 11 can be digital images taken of tissue of a cancer patient.
  • context information 12 for the patient is also loaded into the database 6.
  • the context information 12 for a patient includes information about the patient, including clinical cancer recurrence data.
  • the user uses computer 7 to cause genomic information, phenomic information, and context information to be loaded into the system so that the information is stored in the database 6.
  • for a new patient in the diagnostic phase, the user uses the computer to cause both genomic and phenomic information about the new patient to be loaded into the system and stored in the database 6.
  • the user uses computer 7 to interact with the system, and to view information served to the user by server 3.
  • the server 3 may cause this information to be displayed for viewing on the graphical user interface or display 9 of the computer 7.
  • An example of information that can be viewed is a Minimum Spanning Tree (MST) 13 of univariate features and bivariate features, where nodes (bubbles) of the MST represent univariate features, and where edges (interconnecting lines and arrows) of the MST represent bivariate features.
  • the system 1, by virtue of processor 4 executing the system software 5, analyzes the genomic gene expression information and the phenomic digital image information along with the context information, and generates the score 2 therefrom.
  • the system 1 then causes the score 2 to be displayed on the display 9 of the computer 7.
  • phenomic features are physical structural characteristics of tissue that are obtained by analyzing digital images of the tissue.
  • One or more slices of tissue are stained with one or more protein-specific ImmunoHistoChemical (IHC) stains.
  • Such a stain is typically an antibody stain that has a fluorescent tag, where the antibody can bind to a particular target protein.
  • the selective staining of different proteins is usable to reveal certain structures within the tissue.
  • One or more digital images are taken of the stained tissue.
  • One particular physical structural characteristic may, for example, be a count of certain types of structures within the tissue, or may be a size of those structures within the tissue, or may be a density of those structures within the tissue.
  • a relationship between such detected structures in the tissue may also be considered to be a phenomic feature.
  • An example of a relationship between detected structures is the average distance between different types of structures detected within the tissue.
  • Another example of a relationship between detected structures is a ratio of the number of one type of structure to another type of structure.
  • one example of a phenomic feature is the number of M1 macrophages in parts of tissue referred to as “influence zones”. Another example of a phenomic feature is the number of M1 macrophages in other parts of tissue referred to as “stroma regions”. Another example of a phenomic feature is the density of M2 macrophages in other regions of the tissue. Another example of a phenomic feature is a score, where that score is in turn a function of other such phenomic feature numbers. For detailed information on how tissue sample slices may be prepared, stained, and analyzed using image analysis in order to identify, measure and quantify phenomic features present in tissue of a cancer patient, see U.S. patent application Ser. No. 15/075,180, entitled “System for Predicting the Recurrence of Cancer in a Cancer Patient”, filed Mar. 20, 2016, by Natalie Harder et al. (the entire subject matter of which is incorporated herein by reference).
  • one phenomic feature of particular interest in the prognostic method carried out by the system 1 of FIG. 1 is an average distance. This phenomic feature is referred to as “IHC_Dist_CD163(+)_CD3(+)CD8(−)”. IHC-based staining and image analysis are used to identify M2 macrophages in a tissue sample and to identify non-cytotoxic T-cells in the tissue sample. Each identified M2 macrophage is then considered, and for that M2 macrophage the average distance (in micrometers) between it and the four nearest identified non-cytotoxic T-cells is determined.
  • the average of all these averages for all the identified M2 macrophages in the tissue sample is then determined, and this overall average is the value score for the IHC_Dist_CD163(+)_CD3(+)CD8(−) feature. Additional detail on how this value score is determined is set forth below.
  • a “genomic feature”, as that term is used here, is a characteristic of particular DNA nucleotide sequences that are present in a tissue sample. This characteristic may be given as a count that is indicative of the degree of expression of a particular gene present in the sample.
  • Commercially available gene-specific biomarker probes exist that are designed so that they only attach to particular DNA nucleotide sequences, such as the sequences that are present in parts of mRNA strands.
  • a lysing buffer is used to lyse tissue to be analyzed into its constituent genetic material. The constituent genetic material is put into solution.
  • a pair of these biomarker “probes” is then mixed in.
  • One of the probes is a capture probe that is selective in that it only attaches to a particular sequence of DNA nucleotides of a target molecule (for example, a target mRNA strand that includes the particular sequence of nucleotides).
  • This capture probe can be made specific to a particular DNA nucleotide subsequence found on a gene.
  • the other probe of the probe pair is the reporter probe.
  • This reporter probe has a color-coded “barcode” that can be illuminated and optically examined to identify it.
  • an instrument for optically examining and counting these reporter probes, called the nCounter, is commercially available from NanoString Technologies, Inc., of Seattle, Wash.
  • the nCounter device has a high-resolution CCD camera.
  • the nCounter device is usable to illuminate the bar-code on each reporter probe, and thereby to determine the barcode of the probe and to count the number of times that a probe with that same particular barcode was detected.
  • the pair of probes is therefore said to be “gene-specific” in that the probe pair is usable as a biomarker for a specific gene that includes the particular sequence of DNA nucleotides for which the capture probe is selective.
  • Gene-specific probe pairs are commercially available from multiple sources, including from NanoString Technologies, Inc. After the pair of probes has been mixed into the solution of genetic material and after the probes have attached to their target molecules, excess probes (unattached probes) in the solution are removed.
  • the remaining probe/target complexes are then aligned and immobilized.
  • the nCounter device illuminates the probe/target complexes and uses its high-resolution CCD camera to perform optical examination of the probes. In this way, the probe on each individual target molecule is identified by its barcode to be a probe of a particular type, and the count of this particular probe type is incremented.
  • the nCounter device outputs a digital file.
  • this digital file is an example of the genomic information 10 that is loaded into the system 1 of FIG. 1. This digital file includes a count value.
  • the count value indicates the number of times that a probe of a particular type (bearing a particular color-coded barcode) was detected in the sample.
  • the so-called “expression level” of a gene is a measurement of how large the count value is for the barcode of the probe that is specific to the gene of interest.
  • although the nCounter device may involve a CCD camera and may perform optical inspections in order to identify probes, the nCounter does not perform wide-field phenomic image analysis, in that it does not perform any analysis to identify cells or groups of cells, or structural aspects of non-lysed tissue.
  • the nCounter device is not measuring or outputting raw phenomic feature data.
  • the term “phenomic” as it is used here is intended to exclude the data that results from the optical identification of gene-specific probes.
  • the first probe pair is usable with the nCounter device to measure the gene expression of the LGALS3 gene.
  • the LGALS3 gene is located on chromosome 14, at locus q21-q22.
  • the second probe pair is usable with the nCounter device to measure the expression of the MAGEC2 gene.
  • the MAGEC2 gene, which is located on chromosome Xq27.2, is not expressed in normal tissue but is expressed in tumors.
  • the prognostic method carried out by the system 1 of FIG. 1 has a “learning phase” and a “diagnostic phase”.
  • in the learning phase, both genomic feature information and phenomic feature information are generated for each patient of a plurality of patients and are then analyzed by the system. For each of these patients, information on many different genomic features and many different phenomic features is typically collected and loaded into the system.
  • after the learning phase, system 1 has information that is usable to generate a score 2 for a particular new patient in the later “diagnostic phase”. This score 2 can be generated for the new patient by loading only a small amount of genomic feature information and a small amount of phenomic feature information for the new patient. Based on this relatively small amount of information, the score 2 is generated. The score 2 indicates whether the new patient will likely suffer cancer recurrence.
  • FIG. 2 is a diagram that illustrates the “diagnostic phase” operation of the system 1 .
  • a tissue sample 15 is obtained (for example, by biopsy) from the new patient 16 .
  • the new patient 16 is the patient for whom the score 2 is to be generated.
  • the tissue sample 15 is then sliced into very thin slices. Some of the slices are used to generate phenomic digital image information 11 that is supplied to the system in the diagnostic phase for the new patient. Others of the slices are used to generate gene expression information 10 that is supplied to the system in the diagnostic phase for the new patient.
  • the first tissue slice 17 is stained with a first pair of IHC stains and is put on a first slide 20 .
  • a first high-resolution color digital image of the slice 17 is taken and is supplied as first digital image information to the system.
  • the second tissue slice 18 is stained with another pair of IHC stains and is put on a second slide 21 .
  • a second high-resolution color digital image of the slice 18 is taken and is supplied as second digital image information to the system.
  • the digital image information derived from slices 17 and 18 is the raw measurement data used by the system 1 to generate a phenomic univariate feature value score for the new patient.
  • the third tissue slice 19 is used for gene expression-based genomic analysis. The tissue of the third slice 19 is lysed and a first gene-specific probe pair for a first gene is attached, and a second gene-specific probe pair for a second gene is attached.
  • the first gene is the LGALS3 gene
  • the second gene is the MAGEC2 gene.
  • the resulting material 22 in the sample capsule 23 is processed by the nCounter device mentioned above, thereby generating a first count indicative of the degree of gene expression of the first gene, and a second count indicative of the degree of gene expression of the second gene.
  • a digital computer file that records these counts is output from the nCounter device and is supplied to the system 1 of FIG. 1 as the gene expression information 10.
  • These counts, as they are recorded in the digital file, are the raw genomic measurement data that the system 1 uses to generate genomic univariate feature value scores for the new patient.
  • the system 1 generates a phenomic univariate feature value score (using the digital image information from the first and second digital images), generates a genomic univariate feature value score (using the count data as output by the nCounter device), and further generates a bivariate feature score (based on both the phenomic feature information and the genomic feature information). Based at least in part on these univariate and bivariate feature value scores, the system 1 generates the overall score 2. The overall score 2 is then displayed on the display 9 of the computer 7.
  • the “learning phase” of the prognostic method is explained in further detail below by way of an example.
  • in the learning phase, the clinical cancer recurrence outcome of each patient in a group of twenty-three patients is known. Namely, whether the patient actually suffered a recurrence of cancer is known, and this information is stored as part of the context information for the patient. In addition, if the patient did suffer such recurrence, then the date of that recurrence is known. This information is also stored as part of the context information for the patient.
  • a tissue sample is obtained from each one of these twenty-three patients.
  • the resulting tissue sample block is sliced into numerous tissue slices. Some of these tissue slices are used to make raw measurements of various different phenomic univariate features. Information on many different phenomic univariate features is obtained. Others of the tissue slices are used to make raw measurements of various different genomic univariate features. Many different gene-specific probe pairs are employed to obtain gene expression information for many different genes. For a given feature, all the raw measurements for that feature are normalized. One normalization method that can be used is rank percentage normalization.
  • the normalized, ranked values of each significant univariate feature are combined with those of every other significant univariate feature in order to calculate a bivariate feature value (a “−log-p-value”).
  • the determination of which fuzzy logic combination is considered significant is the same as the selection of significant univariate features described above, except that here the determination of the significant bivariate features has the additional requirement that the log-rank-test p-value of the combination f12 must be at least ten times smaller than the smallest log-rank-test p-value from the univariate analyses of f1 and f2.
  • the −log-p-value of the most significant combination is taken as the prognostic value (−log-p-value) of the bivariate feature.
  • the univariate and bivariate feature values as determined above are fashioned into a network (into a graph).
  • the determined univariate feature values (−log-p-values) are the bubbles (nodes) of the network.
  • the determined bivariate feature values (−log-p-values) are the edges (interconnecting lines or arrows) of the network.
  • a modified version of Prim's algorithm is then used to trim the network and thereby obtain a Minimal Spanning Tree (MST). In this version of Prim's algorithm, all significant bivariate features are first sorted in descending order of their −log-p-values.
  • the most significant bivariate feature (the first in the sorted list) is then selected as the starting point of the MST. The sorted bivariate feature list is then iterated from top to bottom, and any bivariate feature f12 is added to the MST if at least one of f1 and f2 is not yet part of the MST. Additionally, a bivariate feature f12 is added if it lies within the top 75% quantile of all bivariate features.
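The edge-selection step described above can be sketched in Python as follows. This is a minimal illustration of the variant described here, not textbook Prim's algorithm and not the system's actual implementation. The `Edge` type and `select_tree_edges` function are hypothetical, the scores are assumed to be precomputed −log-p-values, and the phrase "top 75% quantile" is read, as an assumption, as "score at or above the 75th percentile of all bivariate scores".

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    f1: str        # first univariate feature (node)
    f2: str        # second univariate feature (node)
    score: float   # -log-p-value of the bivariate feature (edge)


def select_tree_edges(edges):
    """Greedy edge selection following the modified-Prim description.

    Assumption: "top 75% quantile" means score >= 75th percentile (Q3).
    """
    if not edges:
        return []
    # Sort descending; Python's sort is stable, so ties keep input order.
    ranked = sorted(edges, key=lambda e: e.score, reverse=True)
    scores = [e.score for e in ranked]
    if len(scores) >= 2:
        # quantiles(n=4) returns [Q1, Q2, Q3]; index 2 is the 75th percentile.
        threshold = statistics.quantiles(scores, n=4, method="inclusive")[2]
    else:
        threshold = scores[0]

    nodes, selected = set(), []
    for edge in ranked:
        # The first (most significant) edge always qualifies: nodes is empty.
        adds_new_node = edge.f1 not in nodes or edge.f2 not in nodes
        if adds_new_node or edge.score >= threshold:
            selected.append(edge)
            nodes.update((edge.f1, edge.f2))
    return selected
```

On a toy network with edges A-B (5.0), B-C (4.0), A-C (3.0) and C-D (1.0), the sketch keeps A-B, B-C and C-D. A-C is dropped because both A and C are already in the tree and 3.0 is below the 75th percentile of the scores (4.25).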
  • a tree layout method is then used to render a diagram of the MST. In one example, the open source graph visualization software tool called “Graphviz sfdp” is used to generate a visual rendering of the MST. The user of the system 1 of FIG. 1 can use the computer 7 to cause the rendered MST diagram to be displayed on the display 9 as shown in simplified form in FIG. 1 .
  • FIG. 3 is a diagram of the MST 13 as it is rendered on display 9 of the computer 7 .
  • the size of the bubble (node) indicates the prognostic significance of the univariate feature: a larger bubble indicates a larger −log-p-value, whereas a smaller bubble indicates a smaller −log-p-value.
  • the larger the bubble the more significant the univariate feature.
  • a genomic univariate feature is denoted by its corresponding bubble being unshaded, whereas a phenomic univariate feature is denoted by its corresponding bubble being shaded.
  • the thickness of the edge representing a bivariate feature indicates the significance of the bivariate feature: a thicker line represents a larger −log-p-value, whereas a thinner line represents a smaller −log-p-value. The thicker the line, the more prognostic significance the bivariate feature has.
  • the MST indicates which of the four fuzzy logic combinations was considered to be the most significant.
  • the lack of an arrow head where an edge reaches a bubble indicates a “not” of the univariate feature represented by the bubble.
  • FIG. 4 is a portion of a two-dimensional matrix of the prognostic values (−log-p-values) of the bivariate features.
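  • univariate features, including both phenomic and genomic features, label the rows and columns of the matrix.
  • for example, there is an entry for the phenomic feature “IHC_Dist_CD163(+)_CD3(+)CD8(−)”, and there is a column for the genomic feature “MAGEC2”.
  • for the genomic feature “LGALS3” there is also a corresponding row in the matrix.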
  • FIG. 5 is a table that sets forth the thirty-two univariate features that are determined to be most significant. A feature was determined to be significant if its mean Kaplan-Meier log-rank test p-value over a range of five cut-points was significant (p < 0.05). In the table, a positive sign indicates that high feature values are favorable for the patient, i.e., that the risk of PSA recurrence is low.
  • FIGS. 6-10 set forth more detail about how the raw measurement data for the “IHC_Dist_CD163(+)_CD3(+)CD8(−)” phenomic feature is obtained.
  • Two consecutive tissue slices of the same tissue sample are stained with IHC-based stains in different ways.
  • the first tissue slice is duplex-stained with a CD68 antibody stain and a CD163 antibody stain.
  • the CD68 stain may, for example, be a stain referred to as #M087601-2, available from Dako North America, Inc., 6392 Via Real, Carpinteria, Calif. 93013.
  • the CD163 stain may, for example, be a stain referred to as #760-4437, available from Ventana Medical Systems, Inc., 1910 Innovation Park Drive, Arlington, Ariz. 85755. Due to this double staining, individual tumoricidal M1 type macrophages appear red when the slice is viewed under magnification, and individual tumorigenic M2 type macrophages appear brown when the slice is viewed under magnification. After staining, the slice is placed on a slide. A high-resolution color digital image 24 is then taken of the stained slice.
  • FIG. 6 is a grayscale version of the high-resolution digital image 24 .
  • the second tissue slice is also duplex stained, but this slice is stained with a CD3 antibody stain and a CD8 antibody stain. Due to this double staining, individual non-cytotoxic T-cells appear red when the slice is viewed under magnification, and individual cytotoxic T-cells appear brown when the slice is viewed under magnification. After staining, the slice is placed on a slide. A high-resolution color digital image 25 is then taken of the stained slice.
  • FIG. 7 is a grayscale version of the high-resolution digital image 25 .
  • FIG. 8 is an expanded view of a portion 26 of the first digital image 24 of FIG. 6 .
  • the system 1 performs image analysis on the first digital image, thereby identifying M2 macrophage objects.
  • the arrows in FIG. 8 identify the M2 macrophage objects.
  • the location in the X-Y dimension of the center of each detected M2 macrophage object is logged.
  • FIG. 9 is an expanded view of a portion 27 of the second digital image 25 of FIG. 7 .
  • the portion 27 of the second digital image 25 is the same X-Y dimension as is the portion 26 of the first digital image 24 .
  • the tissue represented in the two portions 27 and 26 is, however, slightly offset in the Z dimension.
  • the system 1 performs image analysis on the second digital image, thereby identifying individual non-cytotoxic T-cell objects.
  • the arrows in FIG. 9 identify the non-cytotoxic T-cell objects.
  • the location in the X-Y dimension of the center of each detected non-cytotoxic T-cell object is logged.
  • FIG. 10 is an illustrative diagram that shows how the average distance from an M2 type macrophage object to its four nearest non-cytotoxic T-cells is determined.
  • the X-Y image block portion identified by reference numeral 28 in FIG. 10 represents the same X-Y block of tissue as does block 29 in FIG. 9 and as does block 30 in FIG. 8 .
  • the non-cytotoxic T-cells 31 - 34 are determined to be the four such cells that are the closest (in the X-Y dimension) to the M2 macrophage 35 .
  • the distances D1, D2, D3 and D4 are determined from the logged center locations of the non-cytotoxic T-cells 31-34 and the M2 macrophage 35.
  • the average of the distances D1, D2, D3 and D4 in micrometers is determined and recorded. This process is repeated for all the M2 macrophage objects detected in the first digital image 24. All these averages are in turn averaged to obtain one overall average. This one overall average (in micrometers) is the “IHC_Dist_CD163(+)_CD3(+)CD8(−)” phenomic feature raw measurement for the patient from whom the first and second tissue slices were taken.
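The distance feature described above reduces to a k-nearest-neighbour average. The following Python sketch illustrates it under stated assumptions and is not the system's image-analysis code: the function name `mean_nearest_distance` is hypothetical, cell centers are assumed to be given already in micrometers in a shared X-Y frame (the small Z offset between the two slices is ignored, as in the description), and a brute-force search is used where a slide-scale implementation would more likely use a spatial index.

```python
import math


def mean_nearest_distance(macrophage_centers, t_cell_centers, k=4):
    """Average, over all M2 macrophages, of the mean distance to the k nearest
    non-cytotoxic T cells. Centers are (x, y) tuples in micrometers."""
    if not macrophage_centers:
        raise ValueError("no M2 macrophage centers")
    if len(t_cell_centers) < k:
        raise ValueError(f"need at least {k} T-cell centers")
    per_macrophage = []
    for center in macrophage_centers:
        # Brute force: distance to every T cell, keep the k smallest.
        nearest = sorted(math.dist(center, t) for t in t_cell_centers)[:k]
        per_macrophage.append(sum(nearest) / k)  # (D1 + D2 + D3 + D4) / 4 for k=4
    # Average of the per-macrophage averages: the raw feature value.
    return sum(per_macrophage) / len(per_macrophage)
```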
  • Inspection of the MST 13 of FIG. 3 indicates that a phenomic univariate feature can be advantageously used in the diagnostic phase along with two genomic univariate features, and along with two associated bivariate features, to generate the score 2 .
  • the dashed line 36 in FIG. 3 encircles these five features.
  • the MST 13 is viewable by the user during and after the learning phase so that the user can review the results of the learning phase, and can identify the features that the learning phase identified as being significant. This information is usable to design a diagnostic clinical test (for example, a test to predict cancer recurrence) that is effective, and yet only employs a relatively small number of features.
  • FIGS. 11-37 are a sequence of diagrams that set forth how the raw measurement data for the three univariate features (the LGALS3 feature, the MAGEC2 feature, and the IHC_Dist_CD163(+)_CD3(+)CD8(−) feature) is processed in preparation for the “diagnostic phase”.
  • FIG. 11 shows the LGALS3 raw measurement data.
  • the LGALS3 feature is a genomic feature, so the listed raw measurement values are counts. For each patient of the twenty-three patients, there is a raw measurement count.
  • the right column sets forth the known cancer recurrence information for the associated patient. For example, the patient identified with patient ID of “4” was known not to suffer cancer recurrence.
  • the LGALS3 raw measurement count for this patient is “1159”.
  • FIG. 12 shows how the information in the rows of FIG. 11 is reordered (i.e., is “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 13 shows how the raw measurement count values of FIG. 12 are then normalized.
  • the smallest raw count value is replaced with the value 0/22, the next smallest raw count value is replaced with the value 1/22, and so forth.
  • the denominator of the replacement values is the number of patients supplying the data (in this case, twenty-three) minus one.
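The ranking and normalization steps of FIGS. 12 and 13 can be sketched as below. This is a minimal sketch, not the system's code: `rank_normalize` is a hypothetical name, and because the description does not say how tied raw values are handled, giving tied values the mean of their sorted positions is an assumption.

```python
def rank_normalize(raw):
    """Map {patient_id: raw value} to {patient_id: rank / (n - 1)}.

    The smallest value maps to 0/(n-1) and the largest to (n-1)/(n-1).
    Assumption: tied values share the mean of their positions.
    """
    n = len(raw)
    if n < 2:
        raise ValueError("need at least two patients")
    ordered = sorted(raw.items(), key=lambda item: item[1])
    ranks = {}
    i = 0
    while i < n:
        j = i
        # Extend j over a run of identical raw values.
        while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_position = (i + j) / 2
        for patient_id, _ in ordered[i : j + 1]:
            ranks[patient_id] = shared_position / (n - 1)
        i = j + 1
    return ranks
```

For twenty-three patients without ties, this yields the values 0/22, 1/22, …, 22/22 used in FIG. 13.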
  • a Kaplan-Meier analysis is performed on the data of FIG. 13. This analysis generates a “cut-point value” and indicates that there are eleven patients in the group of patients above the cut-point value, and twelve patients in the group of patients below the cut-point value.
  • FIG. 14 is a Kaplan-Meier plot for the LGALS3 data of FIG. 13 .
  • the horizontal axis represents time.
  • the upper line 37 represents the group of eleven patients above the cut-point in FIG. 13 . This group of patients is estimated by the grouping to be patients who will not suffer cancer recurrence. At the time indicated by arrow 38 , however, one of these patients did suffer recurrence. The upper line 37 therefore dropped downward an amount to reflect the number of patients suffering recurrence at this time. Then later at the time indicated by arrow 39 , another of the patients in this group suffered recurrence. The upper line 37 therefore dropped downward further.
  • the lower line 41 represents the group of twelve patients below the cut-point in FIG. 13 .
  • This group of patients is estimated by the grouping to be patients who will suffer recurrence. None of the patients of this group suffered recurrence until the time indicated by arrow 42 . At this time a patient in this second group suffered recurrence, so the bottom line 41 drops downward an amount to reflect the number of patients suffering recurrence at that time.
  • later, another patient of this second group suffered recurrence, and therefore the lower line 41 drops downward again.
  • if the grouping were perfect, the upper line 37 would extend horizontally from left to right over time, without ever dropping, because none of the patients represented by the upper line 37 would ever suffer cancer recurrence.
  • likewise, in a perfect grouping, the bottom line 41 would reach the very bottom of the plot by the right-hand end of the time axis, because all the patients of the second group represented by the bottom line 41 would suffer cancer recurrence at some time.
  • the Kaplan-Meier analysis actually performed identifies the cut-point value that gives the best grouping of patients with respect to the known cancer recurrence information.
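The step-shaped curves of FIG. 14 are Kaplan-Meier survival estimates. The sketch below computes such a curve for one group of patients. It is a textbook Kaplan-Meier estimator written for illustration under the hypothetical name `kaplan_meier`, and it assumes that patients without recurrence are treated as censored at their last follow-up time, which the description does not state explicitly.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which recurrence occurred.

    times:  follow-up time per patient.
    events: True if recurrence was observed at that time, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = removed = 0
        # Group all patients sharing the same time t.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            removed += 1
            i += 1
        if recurrences:
            # The curve drops in proportion to recurrences among those at risk.
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= removed  # recurred and censored patients leave the risk set
    return curve
```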
  • the Kaplan-Meier analysis indicates the cut-point indicated in FIG. 13 as well as a prognostic value (p-value) of 0.002478.
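The cut-point search and its p-value can be illustrated with a two-group log-rank test, the test named elsewhere in this description. The sketch below is an assumption-laden illustration, not the system's actual procedure. `logrank_p` and `best_cut_point` are hypothetical names. Every distinct score except the largest is tried as a cut-point, and each group must contain at least `min_group` patients (an assumed guard). The p-value uses the chi-square distribution with one degree of freedom, for which P(X > x) = erfc(√(x/2)).

```python
import math


def logrank_p(times, events, group):
    """Two-sided log-rank p-value comparing group True vs group False.
    times: follow-up times; events: True if recurrence was observed."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, group):
            if ti >= t:                # at risk just before time t
                n += 1
                n1 += gi
                if ti == t and ei:     # recurrence at time t
                    d += 1
                    d1 += gi
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom


def best_cut_point(scores, times, events, min_group=3):
    """Return (cut_point, p_value) with the smallest log-rank p-value."""
    best_cut, best_p = None, 1.0
    for cut in sorted(set(scores))[:-1]:
        group = [s > cut for s in scores]     # True = above the cut-point
        above = sum(group)
        if above < min_group or len(scores) - above < min_group:
            continue
        p = logrank_p(times, events, group)
        if best_cut is None or p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p
```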
  • FIG. 15 shows the MAGEC2 raw measurement data for the twenty-three patients being studied in the learning phase.
  • FIG. 16 shows the information in the rows of FIG. 15 in reordered form (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 17 shows how the ranked raw measurement count values of FIG. 16 are then normalized.
  • FIG. 18 is a Kaplan-Meier plot for the MAGEC2 data of FIG. 17 .
  • the Kaplan-Meier analysis indicates a cut-point as well as a prognostic value (p-value) of 0.05880.
  • FIG. 19 shows the IHC_Dist_CD163(+)_CD3(+)CD8(−) raw measurement data for the twenty-three patients being studied in the learning phase.
  • FIG. 20 shows the information in the rows of FIG. 19 in reordered form (i.e., “ranked”).
  • FIG. 21 shows how the ranked raw measurement values of FIG. 20 are then normalized.
  • FIG. 22 is a Kaplan-Meier plot for the data of FIG. 21 .
  • the Kaplan-Meier analysis indicates a cut-point value as indicated in FIG. 21 as well as a prognostic value (p-value) of 0.02780.
  • FIG. 23 is a table that shows the four “fuzzy logic” bivariate scoring methods. These scoring methods are denoted SM1, SM2, SM3 and SM4.
  • FIG. 24 is a diagram that shows how LGALS3-to-MAGEC2 normalized rank values are calculated using the first bivariate scoring method SM1.
  • the leftmost two columns of the diagram set forth the normalized rank values for LGALS3. These are the normalized LGALS3 values of FIG. 13 , except that they have been reordered according to patient ID.
  • the next two leftmost columns of the diagram set forth the normalized rank values for MAGEC2.
  • These are the normalized MAGEC2 values of FIG. 17 , except that they have been reordered according to patient ID.
  • for each patient ID, the normalized value in the LGALS3 column is multiplied by the normalized value in the MAGEC2 column for that same patient ID.
  • the resulting values are indicated in the column denoted “f1 × f2”.
  • the normalized LGALS3 value is simply multiplied by the normalized MAGEC2 value because the scoring method is SM1: according to the table of FIG. 23, under SM1 the f1 value for patient K is multiplied by the f2 value for patient K, where the variable K goes from one (for the patient having the patient ID of “1”) to twenty-three (for the patient having the patient ID of “23”).
  • the rows of the “f1 × f2” column are then reordered so that the top row has the smallest “f1 × f2” value, and so that the bottom row has the largest “f1 × f2” value.
  • a Kaplan-Meier analysis is then performed on this reordered data.
  • the Kaplan-Meier analysis indicates a cut-point value of 0.2121, as well as a prognostic value of 0.0009008.
  • FIG. 25 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the second bivariate scoring method SM2 is used.
  • the same process as set forth above in connection with FIG. 24 is performed, except that the “f1 × f2” column of FIG. 24 is replaced with a “(1 − f1) × f2” column.
  • the values in this column are then reordered so that the smallest SM2 value is at the top of the reordered column, and so that the largest SM2 value is at the bottom of the reordered column. This reordered information is the rightmost column in FIG. 25.
  • the Kaplan-Meier analysis indicates a cut-point value of 0.2273, as well as a prognostic value of 0.9806.
  • FIG. 26 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the third bivariate scoring method SM3 is used.
  • the Kaplan-Meier analysis indicates a cut-point value of 0.1515, as well as a prognostic value of 0.8034.
  • FIG. 27 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the fourth bivariate scoring method SM4 is used.
  • the Kaplan-Meier analysis indicates a cut-point value of 0.2272, as well as a prognostic value of 0.006845.
  • FIG. 28 is a table showing the prognostic values determined for the LGALS3-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4. Scoring method SM2 resulted in the highest prognostic value, so scoring method SM2 is selected to be the scoring method used.
  • FIG. 29 is a Kaplan-Meier plot for the LGALS3-to-MAGEC2 bivariate relationship when the SM2 scoring method is used.
  • the determined cut-point value of 0.2273 divides the patients into a first group of eleven patients and a second group of twelve patients.
  • the first group is represented by the line 44 in the Kaplan-Meier plot.
  • the second group is represented by the line 45 in the Kaplan-Meier plot.
  • FIG. 30 shows how the cut-point value and the prognostic value for the IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship are determined for the SM1 scoring method.
  • FIG. 31 shows how the cut-point value and the prognostic value for the same bivariate relationship are determined for the SM2 scoring method.
  • FIG. 32 shows how the cut-point value and the prognostic value for the same bivariate relationship are determined for the SM3 scoring method.
  • FIG. 33 shows how the cut-point value and the prognostic value for the same bivariate relationship are determined for the SM4 scoring method.
  • FIG. 34 is a chart showing the prognostic values determined for each of the four scoring methods SM1, SM2, SM3 and SM4. The chart reveals that the largest prognostic value is obtained when the SM1 scoring method is used. The SM1 scoring method is therefore determined to be the method used.
  • FIG. 35 is a Kaplan-Meier plot for the IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship when the SM1 scoring method is used.
  • the determined cut-point value of 0.2121 divides the twenty-three patients into a first group of eleven patients and a second group of twelve patients. The first group is represented by the line 46 in the Kaplan-Meier plot. The second group is represented by the line 47 in the Kaplan-Meier plot.
  • the “learning phase” results in the generation of both a cut-point value and a rank normalized value list for each of the three univariate features, and for each of the two bivariate features. This information is stored in the database 6 for later use in the diagnostic phase.
  • the new patient for whom the score 2 is to be determined is seen in a clinical setting, and a tissue sample is obtained from the patient.
  • the tissue sample is sliced and analyzed as explained above in connection with FIG. 2 .
  • the first and second tissue slices are stained, imaged and analyzed by the system so as to generate an IHC_Dist_CD163(+)_CD3(+)CD8(−) raw measurement value.
  • the third tissue slice is lysed and two pairs of probes are applied.
  • the nCounter device is used to count probes, thereby generating two count values.
  • the count values are loaded into the system as the LGALS3 raw measurement value and the MAGEC2 raw measurement value.
  • the three raw measurement values for the new patient are set forth in FIG. 36 .
  • FIG. 37 shows the ordered rank values and the cut-point value determined in the learning phase for the LGALS3 univariate feature.
  • the LGALS3 raw measurement values obtained in the learning phase are listed in FIG. 37.
  • the LGALS3 raw measurement value of 2400 is compared to this list of raw measurement values obtained in the learning phase.
  • the 2400 value is a value between the 2358 value for patient number 2 and the 2409 value for patient number 9. If the raw measurement value for the new patient is below the cut-point value, then an LGALS3 univariate feature score S_LGALS3 is “1”, otherwise the score S_LGALS3 is “0”. In the case of the raw value being 2400, the 2400 raw measurement value falls below the cut-point, so the S_LGALS3 score is “1”.
  • a ranked normalized value for the raw score of the new patient is also determined.
  • the ranked value corresponding to the 2400 raw measurement value is between the ranked value for patient number 2 (who has a raw measurement value of 2358) and patient number 9 (who has a raw measurement value of 2409).
  • the ranked value for the new patient must therefore be between the ranked value of 13/22 for patient number 2 and the ranked value of 14/22 for patient number 9.
  • a ranked value for the new patient is determined in simplified fashion to be a ranked value of 13.5/22 which is midway between the two ranked values of 13/22 and 14/22. This ranked value of 13.5/22 for LGALS3 is stored and is later used in the determination of a bivariate feature score.
  • FIG. 38 shows the ordered rank values and the cut-point value determined in the learning phase for the MAGEC2 univariate feature.
  • the same process described above in connection with LGALS3 and FIG. 37 is performed here for MAGEC2.
  • the new patient's MAGEC2 raw measurement value of 17 points to a location in the list of raw MAGEC2 measurements that is below the cut-point value.
  • the S_MAGEC2 score is therefore determined to be “1”.
  • the new patient's MAGEC2 raw measurement value translates into a ranked value of 10.5/12.
  • FIG. 39 shows the ordered rank values and the cut-point value determined in the learning phase for the IHC_Dist_CD163(+)_CD3(+)CD8(−) univariate feature.
  • the same process described above in connection with LGALS3 and FIG. 37 is performed here for IHC_Dist_CD163(+)_CD3(+)CD8(−).
  • the new patient's raw measurement value of 700 points to a location above the cut-point value.
  • the S_IHC score is therefore determined to be “0”.
  • the new patient's raw measurement value translates into a ranked value of 6.5/22.
  • FIG. 40 shows how an LGALS3-to-MAGEC2 bivariate feature score value for the new patient is determined. Because the SM2 scoring method was determined in the learning phase to be the scoring method for the LGALS3-to-MAGEC2 bivariate relationship that results in the highest prognostic value, the SM2 scoring method is used. The rank normalized list of rank values of FIG. 25 for the SM2 scoring method is replicated in FIG. 40. The rank value for the new patient is determined using the 13.5/22 rank value for LGALS3 and the 10.5/12 rank value for MAGEC2. The SM2 equation, (1 − f1) × f2, is applied to these rank values, resulting in an SM2 rank value for the LGALS3-to-MAGEC2 bivariate relationship of 0.3381. As can be seen from FIG. 40, this 0.3381 rank value is located between the values for patient number 9 and patient number 16. This location is below the cut-point, so the S(LGALS3-MAGEC2) score value is “1”.
  • FIG. 41 shows how an IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate feature score value for the new patient is determined.
  • the same process described above in connection with FIG. 40 is used. Because the SM1 scoring method was selected in the learning phase (see FIG. 34), the SM1 scoring method is used.
  • the rank normalized list of rank values of FIG. 30 for the SM1 scoring method is replicated in FIG. 41.
  • the rank value for the new patient is determined using the 6.5/22 rank value for IHC (see FIG. 39 ) and the 10.5/12 rank value for MAGEC2 (see FIG. 38 ).
  • the SM1 equation is applied to these two rank values, resulting in a SM1 rank value for the bivariate relationship of 0.2585. As can be seen from FIG. 41 , this rank value is located between the values for patient number 19 and patient number 22. This location is below the cut-point, so the S(IHC-MAGEC2) score value is “1”.
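The two bivariate rank values for the new patient can be reproduced with fuzzy-logic operators, reading AND as a product and NOT as 1 − x. The SM1 and SM2 formulas are taken from the description (f1 × f2 and (1 − f1) × f2). SM3 and SM4 are defined in the table of FIG. 23, which is not reproduced here, so they are omitted. The function names are illustrative only.

```python
def sm1(f1, f2):
    """SM1: f1 AND f2 (product)."""
    return f1 * f2


def sm2(f1, f2):
    """SM2: (NOT f1) AND f2, with NOT f1 = 1 - f1."""
    return (1 - f1) * f2


# New patient's rank values from FIGS. 37-39.
lgals3, magec2, ihc = 13.5 / 22, 10.5 / 12, 6.5 / 22
print(round(sm2(lgals3, magec2), 4))  # 0.3381 (LGALS3-to-MAGEC2, FIG. 40)
print(round(sm1(ihc, magec2), 4))     # 0.2585 (IHC-to-MAGEC2, FIG. 41)
```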
  • the score 2 generated by the system 1 of FIG. 1 for the new patient is a function of the three univariate feature score values S_LGALS3, S_MAGEC2 and S_IHC, and of the two bivariate feature score values S(LGALS3-MAGEC2) and S(IHC-MAGEC2).
  • this function is a majority voting function (see FIG. 42 ).
  • Each of the five feature score values is a number that is either 0 or 1.
  • the five feature score values are summed, and if the sum is greater than or equal to 2.5 then the overall score (the score 2 of FIG. 1 ) is determined to be “1”, otherwise the overall score (the score 2 of FIG. 1 ) is determined to be “0”.
  • An overall score of “0” indicates that cancer recurrence is determined to be unlikely.
  • An overall score of “1” indicates that cancer recurrence is determined to be likely.
  • FIG. 43 shows how the function of FIG. 42 is applied in the case of the new patient whose score 2 is being determined.
  • the sum of the five feature value scores is 4, which is greater than 2.5, so the score 2 as determined by the system 1 of FIG. 1 is “1”.
  • This score value of “1” is displayed on the display 9 of the computer 7 of FIG. 1 .
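The majority-voting function of FIG. 42 is simple enough to state directly. The sketch below assumes the five feature scores are supplied as a plain list of 0/1 values; `overall_score` is a hypothetical name.

```python
def overall_score(feature_scores, threshold=2.5):
    """Majority vote over binary feature scores: 1 = recurrence likely."""
    if any(score not in (0, 1) for score in feature_scores):
        raise ValueError("each feature score must be 0 or 1")
    return 1 if sum(feature_scores) >= threshold else 0


# S_LGALS3, S_MAGEC2, S_IHC, S(LGALS3-MAGEC2), S(IHC-MAGEC2) for the new patient
print(overall_score([1, 1, 0, 1, 1]))  # 1
```

With five binary votes, the 2.5 threshold means that at least three features must indicate recurrence.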

Abstract

An analysis and display system generates and displays a score indicative of whether cancer will recur in a patient. In a learning phase, a phenomic feature of tumor tissue is measured. A corresponding phenomic feature is defined. The phenomic feature may be measured through image analysis of digital images taken of tissue slices stained with IHC-based stains. A genomic feature of the tissue is also measured. This may entail obtaining a probe count indicative of a degree of expression of a particular gene. A bivariate feature is calculated using both the phenomic and genomic information. A network including the bivariate feature is displayed. In a diagnostic phase, raw phenomic and genomic data is obtained from a tissue sample taken from the patient. From the data, a score for the bivariate feature, and scores for the other features, are calculated. The score is a function of the underlying feature scores.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit under 35 U.S.C. § 119 of provisional application Ser. No. 62/629,591, entitled “Predicting Prostate Cancer Recurrence Using A Prognostic Model That Combines Immunohistochemical Staining And Gene Expression Profiling”, filed on Feb. 12, 2018, by Guenter Schmidt. The subject matter of provisional application Ser. No. 62/629,591 is incorporated herein by reference.
  • TECHNICAL FIELD
  • The present invention relates to systems and methods for detecting cancer and predicting the recurrence of cancer, and more particularly relates to systems and methods for predicting prostate cancer recurrence (PSA recurrence).
  • BACKGROUND INFORMATION
  • A cancer patient can be treated such that the cancer goes into remission. Knowing whether and when the cancer might later come out of remission and recur would, for many reasons, be beneficial. Having such information may facilitate making better clinical and treatment decisions. Having such information may also allow the patient to improve the patient's quality of life, and to make better life decisions. An improved system and method for determining the likelihood of cancer recurrence is desired.
  • SUMMARY
  • An analysis and display system generates and displays a score indicative of whether cancer will recur in a patient. In a learning phase, tumor tissue from each one of many patients is obtained and analyzed. For each of these patients, it is known whether the patient suffered a recurrence of cancer, and this information is loaded into the system. A univariate phenomic feature of the tumor tissue is measured, and a corresponding univariate phenomic feature is defined. The univariate phenomic feature may be measured through image analysis of digital images taken of tissue slices stained with IHC-based stains. A univariate genomic feature of the tissue is also measured. This may entail obtaining a probe count indicative of a degree of expression of a particular gene. A bivariate feature is then calculated using both the phenomic and genomic information. In this way, many univariate features can be measured. A bivariate feature can be calculated for the relationship between any two of the univariate features. Once all this information has been collected and all the features have been calculated and defined, the system employs a thinning method to eliminate those features that do not have a substantial prognostic value in predicting the recurrence of cancer in a patient. The result of this elimination of unimportant features is a Minimal Spanning Tree (MST). The MST is a network (also called a graph). It includes the features of substantial prognostic importance.
  • In one novel aspect, some of the nodes (bubbles) of the MST network represent phenomic features, and others represent genomic features. The edges represent bivariate features. Some of the edges represent bivariate features that are based on both phenomic feature information and genomic feature information. The method of using phenomic information along with genomic information in the prediction of cancer recurrence allows additional feature measurements to be brought to bear in the determination of the score, as compared to methods that only use genomic information.
  • In another novel aspect, a user of the system can cause a rendering of the MST network to be displayed on the display of the system. The nodes of univariate features that have more prognostic importance are rendered to be larger, whereas the nodes of univariate features that have less prognostic importance are rendered to be smaller. Edges in the network that have more prognostic importance are rendered as thicker lines, whereas edges in the network that have less prognostic importance are rendered as thinner lines. The type of bivariate relationship (one of four “fuzzy logic” combinations) that was determined to have the most prognostic importance when the strength of the bivariate relationship was being determined in the learning phase is indicated in the MST network by the type of arrow or line that is representing the edge.
  • In a diagnostic phase, a score is to be generated for a new patient. A diagnostic test that involves collecting information on only a relatively small number of the features is developed using the network as displayed on the system. In one example, raw measurement information on only three features needs to be collected from the patient. One of the features is a phenomic feature, and the other two features are genomic features. A tissue sample is taken from the patient, and the raw phenomic and genomic data is obtained from the sample. From the raw measurement data, a score is calculated for each univariate feature used in the diagnostic test. In the example in which three univariate features are involved in the diagnostic test, raw measurement data for these three features is obtained, and a score for each of the three features is calculated from it. In addition, the raw data is used to calculate a score for each of the two bivariate features (edges in the network) between these three univariate features. The overall score is a function of the underlying feature scores. In one example, each of the underlying feature scores is either a “1” (representing a “yes” vote) or a “0” (representing a “no” vote). There is one score for each of the three univariate features and one for each of the two bivariate features, for a total of five scores. The function is a majority voting function, so the overall score is a majority vote of the five votes provided by the five underlying features. The resulting overall score, which is indicative of whether cancer will recur in the patient, is then displayed on the display of the system.
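The majority-voting function described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation; the function name `overall_score` and the default threshold of half the number of votes (2.5 for five votes, matching the FIG. 43 example) are assumptions made for the sketch, and the example votes are hypothetical.

```python
from typing import Optional, Sequence


def overall_score(votes: Sequence[int], threshold: Optional[float] = None) -> int:
    """Majority vote over binary feature scores (1 = "yes" vote, 0 = "no" vote).

    By default the threshold is half the number of votes (2.5 for five votes),
    so the overall score is 1 only if more than half of the features vote "yes".
    """
    if any(v not in (0, 1) for v in votes):
        raise ValueError("each feature score must be 0 or 1")
    if threshold is None:
        threshold = len(votes) / 2
    return 1 if sum(votes) > threshold else 0


# Hypothetical votes for three univariate and two bivariate features; they sum
# to 4 as in the FIG. 43 example, but the individual values are illustrative.
votes = [1, 1, 1, 0, 1]
print(overall_score(votes))  # prints 1
```

With an odd number of binary votes a strict "greater than half" test can never tie, which is one reason a five-feature test pairs naturally with a 2.5 threshold.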
  • Multiple different diagnostic tests can be developed by inspecting the network after the learning phase and by selecting features that have notably high prognostic importance. The example of a test involving three univariate features and two bivariate features is presented for illustrative purposes. Although an example of the system is presented where the score is indicative of whether the patient will suffer a recurrence of prostate cancer, the system has general applicability. For example, the system is usable to generate a score indicative of whether the patient will suffer a recurrence of another type of cancer, such as lung cancer or breast cancer.
  • Further details and embodiments and methods are described in the detailed description below. This summary does not purport to define the invention. The invention is defined by the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, where like numerals indicate like components, illustrate embodiments of the invention.
  • FIG. 1 is a diagram of a system for predicting the recurrence of cancer.
  • FIG. 2 is a diagram that illustrates how, in a “diagnostic phase” operation of the system of FIG. 1, a tissue sample from a patient is used to generate both raw phenomic feature measurement data as well as raw genomic feature measurement data.
  • FIG. 3 is a diagram of a Minimal Spanning Tree (MST) of univariate and bivariate features that predict the recurrence of cancer, where the tree includes both phenomic and genomic features.
  • FIG. 4 is a two-dimensional matrix of the prognostic values of bivariate relationships between pairs of univariate features.
  • FIG. 5 is a table that sets forth thirty-two univariate features that are determined to have significant prognostic value in the prediction of the recurrence of cancer (specifically prostate cancer).
  • FIG. 6 is a grayscale version of a high-resolution digital image of a first slice of tissue that was duplex stained in an IHC-based image analysis process.
  • FIG. 7 is a grayscale version of a high-resolution digital image of a second slice of tissue that was duplex stained in the IHC-based image analysis process.
  • FIG. 8 is an expanded view of a portion of the first digital image of FIG. 6.
  • FIG. 9 is an expanded view of a portion of the second digital image of FIG. 7.
  • FIG. 10 is an illustrative diagram that shows how the average distance from an M2 type macrophage to its nearest four non-cytotoxic T-cells is determined.
  • FIG. 11 is a diagram that sets forth LGALS3 raw measurements used in the learning phase of the system.
  • FIG. 12 shows how the information in the rows of FIG. 11 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 13 shows how the raw measurement count values of FIG. 12 are normalized by rank percentage.
  • FIG. 14 is a Kaplan-Meier plot for the data of FIG. 13.
  • FIG. 15 is a diagram that sets forth MAGEC2 raw measurements used in the learning phase of the system.
  • FIG. 16 shows how the information in the rows of FIG. 15 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 17 shows how the raw measurement count values of FIG. 16 are normalized by rank percentage.
  • FIG. 18 is a Kaplan-Meier plot for the data of FIG. 17.
  • FIG. 19 is a diagram that sets forth IHC_DIST_CD163(+)_CD3(+)CD8(−) raw measurements used in the learning phase of the system.
  • FIG. 20 shows how the information in the rows of FIG. 19 is reordered (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 21 shows how the raw measurement count values of FIG. 20 are normalized by rank percentage.
  • FIG. 22 is a Kaplan-Meier plot for the data of FIG. 21.
  • FIG. 23 is a table that shows the four “fuzzy logic” bivariate scoring methods.
  • FIG. 24 is a diagram that shows how LGALS3-to-MAGEC2 normalized rank values are calculated when the SM1 bivariate scoring method is used.
  • FIG. 25 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM2 bivariate scoring method is used.
  • FIG. 26 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM3 bivariate scoring method is used.
  • FIG. 27 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the SM4 bivariate scoring method is used.
  • FIG. 28 is a table showing the prognostic values determined for the LGALS3-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4.
  • FIG. 29 is a Kaplan-Meier plot for the LGALS3-to-MAGEC2 bivariate relationship when the SM2 scoring method is used.
  • FIG. 30 is a diagram that shows how IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 normalized rank values are calculated using the first bivariate scoring method SM1.
  • FIG. 31 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM2 bivariate scoring method is used.
  • FIG. 32 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM3 bivariate scoring method is used.
  • FIG. 33 is a diagram that shows how the cut-point and prognostic value for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship is determined when the SM4 bivariate scoring method is used.
  • FIG. 34 is a table showing the prognostic values determined for IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4.
  • FIG. 35 is a Kaplan-Meier plot for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship when the SM1 scoring method is used.
  • FIG. 36 is a table showing three raw measurement values for the new patient (to be used in the diagnostic phase to determine a score for the new patient).
  • FIG. 37 shows how a score is determined in the diagnostic phase for the LGALS3 univariate feature.
  • FIG. 38 shows how a score is determined in the diagnostic phase for the MAGEC2 univariate feature.
  • FIG. 39 shows how a score is determined in the diagnostic phase for the IHC_DIST_CD163(+)_CD3(+)CD8(−) univariate feature.
  • FIG. 40 shows how a score is determined in the diagnostic phase for the LGALS3-to-MAGEC2 bivariate feature.
  • FIG. 41 shows how a score is determined in the diagnostic phase for the IHC_DIST_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate feature.
  • FIG. 42 sets forth the function that is used to determine the overall score from the underlying five feature scores.
  • FIG. 43 shows how the function of FIG. 42 is applied in the case of the new patient whose overall score is being determined in the diagnostic phase.
  • DETAILED DESCRIPTION
  • Reference will now be made in detail to some embodiments of the invention, examples of which are illustrated in the accompanying drawings.
  • FIG. 1 is a conceptual diagram of a system 1 for predicting cancer recurrence using a prognostic method that analyzes genomic univariate features, phenomic univariate features, and bivariate features formed from pairs of these features (for example, from a genomic feature and a phenomic feature). Based at least in part on that analysis, system 1 outputs a score 2. The score 2 is indicative of whether a patient will suffer a recurrence of cancer. System 1 includes a data analysis server 3. The server 3 has a processor 4 that executes system software 5. The software 5 is stored on the server 3 in a non-transitory processor-readable medium, such as semiconductor memory and/or magnetic disk storage. The server 3 also maintains, and/or provides access to, a database 6 of patient data. The database 6 may be stored on the server 3, or it may be stored remotely such that it is accessible to the server 3. The system 1 also includes a computer 7. The computer 7 is coupled to the server 3, for example by one or more networks or network connections 8. Computer 7 includes a keyboard (not shown) and a display 9. A user of the system uses the computer 7 to enter information into the system. Information that the user can enter includes genomic feature information 10, phenomic feature information 11, and context information 12. The genomic feature information 10 can be a set of counts, where each count indicates the degree of expression of a corresponding gene in the tissue of a cancer patient. The phenomic feature information 11 can be digital images taken of tissue of a cancer patient. In addition to this genomic and phenomic information about a patient, context information 12 for the patient is also loaded into the database 6. The context information 12 for a patient includes information about the patient, including clinical cancer recurrence data.
For each patient of a plurality of patients, the user uses computer 7 to cause genomic information, phenomic information, and context information to be loaded into the system so that the information is stored in the database 6. In addition to the patients mentioned above, there is also a patient who is being seen in the clinical setting. It is for this patient that the score 2 is to be generated. The user uses the computer to cause both genomic and phenomic information about this patient to be loaded into the system and to be stored in the database 6.
  • The user uses computer 7 to interact with the system, and to view information served to the user by server 3. The server 3 may cause this information to be displayed for viewing on the graphical user interface or display 9 of the computer 7. An example of information that can be viewed is a Minimum Spanning Tree (MST) 13 of univariate features and bivariate features, where nodes (bubbles) of the MST represent univariate features, and where edges (interconnecting lines and arrows) of the MST represent bivariate features. The system 1, by virtue of processor 4 executing the system software 5, analyzes the genomic gene expression information and the phenomic digital image information along with the context information, and generates therefrom the score 2. The system 1 then causes the score 2 to be displayed on the display 9 of the computer 7.
  • What are referred to here as “phenomic features” are physical structural characteristics of features of tissue that are obtained by analyzing digital images of tissue. One or more slices of tissue are stained with one or more protein-specific ImmunoHistoChemical (IHC) stains. Such a stain is typically an antibody stain that has a fluorescent tag, where the antibody can bind to a particular target protein. The selective staining of different proteins is usable to reveal certain structures within the tissue. One or more digital images are taken of the stained tissue. One particular physical structural characteristic may, for example, be a count of certain types of structures within the tissue, or may be a size of those structures within the tissue, or may be a density of those structures within the tissue. A relationship between such detected structures in the tissue may also be considered to be a phenomic feature. An example of a relationship between detected structures is the average distance between different types of structures detected within the tissue. Another example of a relationship between detected structures is a ratio of the number of one type of structure to another type of structure.
  • One example of a phenomic feature is the number of M1 macrophages in parts of tissue referred to as “influence zones”. Another example of a phenomic feature is the number of M1 macrophages in other parts of tissue referred to as “stroma regions”. Another example of a phenomic feature is the density of M2 macrophages in other regions of the tissue. Another example of a phenomic feature is a score, where that score is in turn a function of other such phenomic feature numbers. For detailed information on how tissue sample slices may be prepared, stained, and analyzed using image analysis in order to identify, measure and quantify phenomic features present in tissue of a cancer patient, see U.S. patent application Ser. No. 15/075,180, entitled “System for Predicting the Recurrence of Cancer in a Cancer Patient”, filed Mar. 20, 2016, by Natalie Harder et al. (the entire subject matter of which is incorporated herein by reference).
  • One phenomic feature of particular interest in the prognostic method carried out by the system 1 of FIG. 1 is an average distance. This phenomic feature is referred to as “IHC_Dist_CD163(+)_CD3(+)CD8(−)”. IHC-based staining and image analysis are used to identify M2 macrophages and non-cytotoxic T-cells in a tissue sample. For each identified M2 macrophage, the average distance (in micrometers) between it and the four nearest identified non-cytotoxic T-cells is determined. The average of these per-macrophage averages over all identified M2 macrophages in the tissue sample is then determined, and this overall average is the value score for the IHC_Dist_CD163(+)_CD3(+)CD8(−) feature. Additional detail on how this value score is determined is set forth below.
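The average-of-averages computation above can be sketched in a few lines. This assumes the image analysis has already produced (x, y) cell centers in micrometers for each detected M2 macrophage and non-cytotoxic T-cell; the function and type names are hypothetical. The sketch uses a brute-force distance search for clarity; for whole-slide cell counts a spatial index such as `scipy.spatial.cKDTree` (whose `query` method accepts `k=4`) would be the usual faster alternative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) cell center, assumed to be in micrometers


def mean_distance_to_k_nearest(macrophages: List[Point], t_cells: List[Point], k: int = 4) -> float:
    """Average, over all M2 macrophages, of the mean distance to the k nearest T-cells."""
    if not macrophages:
        raise ValueError("no M2 macrophages detected")
    if len(t_cells) < k:
        raise ValueError(f"need at least {k} non-cytotoxic T-cells")
    per_macrophage = []
    for mx, my in macrophages:
        # Brute-force distances from this macrophage to every T-cell, shortest first.
        dists = sorted(math.hypot(mx - tx, my - ty) for tx, ty in t_cells)
        per_macrophage.append(sum(dists[:k]) / k)
    # The feature value is the average of the per-macrophage averages.
    return sum(per_macrophage) / len(per_macrophage)


# Hypothetical coordinates: four T-cells 1 um from the macrophage, one far away.
t_cells = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (50.0, 50.0)]
print(mean_distance_to_k_nearest([(0.0, 0.0)], t_cells))  # prints 1.0
```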
  • In addition to analyzing “phenomic features”, the system 1 of FIG. 1 also analyzes “genomic features”. A “genomic feature”, as that term is used here, is a characteristic of particular DNA nucleotide sequences that are present in a tissue sample. This characteristic may be given as a count that is indicative of the degree of expression of a particular gene present in the sample. Commercially available gene-specific biomarker probes exist that are designed so that they only attach to particular DNA nucleotide sequences, such as the sequences that are present in parts of mRNA strands. In one example, a lysing buffer is used to lyse the tissue to be analyzed into its constituent genetic material. The constituent genetic material is put into solution. A pair of these biomarker “probes” is then mixed in. One of the probes is a capture probe that is selective in that it only attaches to a particular sequence of DNA nucleotides of a target molecule (for example, a target mRNA strand that includes the particular sequence of nucleotides). This capture probe can be made specific to a particular DNA nucleotide subsequence found on a gene. The other probe of the probe pair is the reporter probe. This reporter probe has a color-coded “barcode” that can be illuminated and optically examined to identify it. In one example, a device called nCounter is commercially available from NanoString Technologies, Inc., of Seattle, Wash. The nCounter device has a high-resolution CCD camera. The nCounter device is usable to illuminate the barcode on each reporter probe, thereby to determine the barcode of the probe and to count the number of times that a probe with that same particular barcode was detected. The pair of probes is therefore said to be “gene-specific” in that the probe pair is usable as a biomarker for a specific gene that includes the particular sequence of DNA nucleotides for which the capture probe is selective.
Gene-specific probe pairs are commercially available from multiple sources, including from NanoString Technologies, Inc. After the pair of probes has been mixed into the solution of genetic material and the probes have attached to their target molecules, excess (unattached) probes in the solution are removed. The remaining probe/target complexes are then aligned and immobilized. The nCounter device illuminates the probe/target complexes and uses its high-resolution CCD camera to perform optical examination of the probes. In this way, the probe on each individual target molecule is identified by its barcode as a probe of a particular type, and the count for that probe type is incremented. After all the probes in the sample have been detected and counted, the nCounter device outputs a digital file. Such a digital file is an example of the genomic information 10 that is loaded into the system 1 of FIG. 1. This digital file includes a count value. The count value indicates the number of times that a probe of a particular type (bearing a particular color-coded barcode) was detected in the sample. The so-called “expression level” of a gene is a measurement of how large the count value is for the barcode of the probe that is specific to the gene of interest.
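The counting step described above reduces to tallying detected barcodes and mapping each gene-specific barcode to its gene. The sketch below is illustrative only: the barcode identifiers and the `gene_counts` helper are hypothetical, and the real nCounter output file format is not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable


def gene_counts(detected_barcodes: Iterable[str], barcode_to_gene: Dict[str, str]) -> Dict[str, int]:
    """Count how often each gene-specific reporter barcode was detected."""
    counts = Counter(detected_barcodes)
    # A Counter returns 0 for barcodes that were never detected.
    return {gene: counts[barcode] for barcode, gene in barcode_to_gene.items()}


# Hypothetical barcode identifiers; real reporter-probe codes come from the probe vendor.
panel = {"BC-0001": "LGALS3", "BC-0002": "MAGEC2"}
reads = ["BC-0001", "BC-0002", "BC-0001", "BC-0003"]
print(gene_counts(reads, panel))  # prints {'LGALS3': 2, 'MAGEC2': 1}
```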
  • Although the nCounter device may involve a CCD camera and may perform optical inspections in order to identify probes, the nCounter is not doing wide-field phenomic image analysis in that it is not performing any analysis to identify cells or groups of cells, or structural aspects of non-lysed tissue. The nCounter device is not measuring or outputting raw phenomic feature data. The term “phenomic” as it is used here is intended to exclude the data that results from the optical identification of gene-specific probes.
  • In a specific example of the prognostic method carried out by the system 1 of FIG. 1, two probe pairs are of particular interest. The first probe pair is usable with the nCounter device to measure the expression of the LGALS3 gene. The LGALS3 gene is located on chromosome 14, locus q21-q22. The second probe pair is usable with the nCounter device to measure the expression of the MAGEC2 gene. The MAGEC2 gene, located on chromosome X at Xq27.2, is not expressed in normal tissue but is expressed in tumors.
  • The prognostic method carried out by the system 1 of FIG. 1 has a “learning phase” and a “diagnostic phase”. In the learning phase, both genomic feature information and phenomic feature information are generated for each patient of a plurality of patients and are then analyzed by the system. For each of these patients, information on many different genomic features and on many different phenomic features is typically collected and loaded into the system. As a result of the learning phase, the system 1 has information that is usable to generate a score 2 for a particular new patient in the later “diagnostic phase”. In the diagnostic phase, the score 2 can be generated for the new patient after loading only a small amount of genomic feature information and a small amount of phenomic feature information for that patient. The score 2 indicates whether the new patient is likely to suffer a recurrence of cancer.
  • FIG. 2 is a diagram that illustrates the “diagnostic phase” operation of the system 1. A tissue sample 15 is obtained (for example, by biopsy) from the new patient 16. The new patient 16 is the patient for whom the score 2 is to be generated. The tissue sample 15 is then sliced into very thin slices. Some of the slices are used to generate phenomic digital image information 11 that is supplied to the system in the diagnostic phase for the new patient. Others of the slices are used to generate gene expression information 10 that is supplied to the system in the diagnostic phase for the new patient. In the illustration of FIG. 2, the first tissue slice 17 is stained with a first pair of IHC stains and is put on a first slide 20. A first high-resolution color digital image of the slice 17 is taken and is supplied as first digital image information to the system. The second tissue slice 18 is stained with another pair of IHC stains and is put on a second slide 21. A second high-resolution color digital image of the slice 18 is taken and is supplied as second digital image information to the system. The digital image information derived from slices 17 and 18 is the raw measurement data used by the system 1 to generate a phenomic univariate feature value score for the new patient. The third tissue slice 19 is used for gene expression-based genomic analysis. The tissue of the third slice 19 is lysed and a first gene-specific probe pair for a first gene is attached, and a second gene-specific probe pair for a second gene is attached. In the specific example set forth below, the first gene is the LGALS3 gene, and the second gene is the MAGEC2 gene. The resulting material 22 in the sample capsule 23 is processed by the nCounter device mentioned above, thereby generating a first count indicative of the degree of gene expression of the first gene, and a second count indicative of the degree of gene expression of the second gene. 
A digital computer file that records these counts is output from the nCounter device and is supplied to the system 1 of FIG. 1 as the gene expression information 10. These counts, as recorded in the digital file, are the raw genomic measurement data that is used by the system 1 to generate genomic univariate feature value scores for the new patient. As is explained in further detail below, the system 1 generates a phenomic univariate feature value score (using the digital image information from the first and second digital images), generates a genomic univariate feature value score (using the count data as output by the nCounter device), and further generates a bivariate feature score (based on both the phenomic feature information and the genomic feature information). Based at least in part on these univariate and bivariate feature value scores, the system 1 generates the overall score 2. The overall score 2 is then displayed on the display 9 of the computer 7.
  • The “learning phase” of the prognostic method is explained in further detail below by way of an example. In the example, there are twenty-three patients from whom a substantial amount of genomic information and a substantial amount of phenomic information is collected in the learning phase. For each of these twenty-three patients, the clinical cancer recurrence of the patient is known. Namely, whether the patient actually suffered a recurrence of cancer is known and this information is stored as part of the context information for the patient. In addition, if the patient did suffer such recurrence, then the date of that recurrence is known. This information is also stored as part of the context information for the patient.
  • From each one of these twenty-three patients, a tissue sample is obtained. The resulting tissue sample block is sliced into numerous tissue slices. Some of these tissue slices are used to make raw measurements of various phenomic univariate features, so that information on many different phenomic univariate features is obtained. Others of the tissue slices are used to make raw measurements of various genomic univariate features; many different gene-specific probe pairs are employed to obtain gene expression information for many different genes. For a given feature, all the raw measurements for that feature are normalized. One normalization method that can be used is rank percentage normalization. It involves sorting the raw measurement values of the feature from smallest to largest, and then replacing each raw measurement value by a rank value equal to: (position_in_the_sorted_list−1)/(number_of_samples−1). Then, using these normalized rank values, as well as the known clinical cancer recurrence information for the corresponding patients, a Kaplan-Meier analysis is performed for every possible cut-point within a quantile range of 40% to 60%. This results in a prognostic value for every cut-point, namely the negative logarithm of the log-rank test p-value (the “−log-p-value”). A univariate feature whose mean −log-p-value over these cut-points is greater than −log10(0.05) (approximately 1.3) is considered to be prognostic.
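The normalization and cut-point scan described above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the system's code: ties in the raw values keep their input order, the "quantile range of 40% to 60%" is read as rank values in [0.4, 0.6], each cut-point splits patients into a high group (rank above the cut-point) and a low group, and the two-group log-rank test is written out by hand (chi-square with one degree of freedom, whose survival function is `erfc(sqrt(x / 2))`). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def rank_normalize(values: Sequence[float]) -> List[float]:
    """Replace each value by (position_in_sorted_list - 1) / (number_of_samples - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    order = sorted(range(n), key=lambda i: values[i])  # ties keep input order (assumption)
    ranks = [0.0] * n
    for pos, i in enumerate(order):  # pos is 0-based, i.e. position_in_sorted_list - 1
        ranks[i] = pos / (n - 1)
    return ranks


def logrank_p(times: Sequence[float], events: Sequence[bool], high: Sequence[bool]) -> float:
    """Two-group log-rank test p-value (chi-square with one degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # patients at risk
        n1 = sum(1 for ti, h in zip(times, high) if ti >= t and h)   # at risk, high group
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # recurrences at t
        d1 = sum(1 for ti, e, h in zip(times, events, high) if ti == t and e and h)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square(1)


def mean_neg_log_p(values, times, events, lo: float = 0.4, hi: float = 0.6) -> float:
    """Mean -log10(p) over every cut-point whose rank value lies in [lo, hi]."""
    ranks = rank_normalize(values)
    cuts = sorted({r for r in ranks if lo <= r <= hi})
    if not cuts:
        raise ValueError("no cut-points in the quantile range")
    scores = []
    for c in cuts:
        high = [r > c for r in ranks]
        p = logrank_p(times, events, high)
        scores.append(-math.log10(max(p, 1e-300)))  # guard against p == 0
    return sum(scores) / len(scores)


def is_prognostic(values, times, events) -> bool:
    return mean_neg_log_p(values, times, events) > -math.log10(0.05)
```

Here `times` would be the follow-up time to recurrence or censoring taken from the context information, and `events` marks whether recurrence was observed. A library such as lifelines could replace the hand-written test; it is written out here only to keep the sketch dependency-free.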
  • Next, the normalized and ranked values of each significant univariate feature are combined with the normalized and ranked values of each other significant univariate feature in order to calculate a bivariate feature value (a “−log-p-value”). To combine a first feature (denoted f1) with a second feature (denoted f2) in order to obtain a single (potentially prognostic) bivariate feature (denoted f12), four fuzzy logical combinations of the corresponding normalized and ranked values of the two features are calculated. Those four fuzzy logical combinations are: 1) f12=f1*f2 (denoted “f1 and f2” or “SM1”); 2) f12=(1−f1)*f2 (denoted “not f1 and f2” or “SM2”); 3) f12=f1*(1−f2) (denoted “f1 and not f2” or “SM3”); and 4) f12=(1−f1)*(1−f2) (denoted “not f1 and not f2” or “SM4”). A fuzzy logical combination is judged significant in the same way that significant univariate features are selected as described above, with one additional requirement: the log-rank test p-value of the combination f12 must be at least ten times smaller than the smaller of the log-rank test p-values from the univariate analyses of f1 and f2. For each bivariate feature, the most significant of the four fuzzy logical combinations is determined, and the −log-p-value of that combination is taken to be the prognostic value (−log-p-value) of the bivariate feature.
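The four fuzzy combinations and the acceptance rule can be sketched as below. The sketch assumes the −log10 p-values of f1, f2 and of each combination have already been computed (for example by the cut-point scan described for univariate features); the function names are hypothetical. Because the scores are on a −log10 scale, "at least ten times smaller p-value" is the same as "score at least 1 larger".

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

SIGNIFICANCE = -math.log10(0.05)  # about 1.301


def fuzzy_combinations(f1: Sequence[float], f2: Sequence[float]) -> Dict[str, List[float]]:
    """The four fuzzy-logic combinations of two rank-normalized features."""
    return {
        "SM1": [a * b for a, b in zip(f1, f2)],              # f1 and f2
        "SM2": [(1 - a) * b for a, b in zip(f1, f2)],        # not f1 and f2
        "SM3": [a * (1 - b) for a, b in zip(f1, f2)],        # f1 and not f2
        "SM4": [(1 - a) * (1 - b) for a, b in zip(f1, f2)],  # not f1 and not f2
    }


def select_bivariate(score_f1: float, score_f2: float,
                     score_by_method: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Pick the most significant combination, then apply the acceptance rules.

    Scores are -log10 p-values. Requiring p12 to be at least ten times smaller
    than the smaller univariate p-value is the same as requiring score12 to be
    at least 1 larger than the larger univariate score.
    """
    method = max(score_by_method, key=score_by_method.get)
    score = score_by_method[method]
    if score > SIGNIFICANCE and score >= max(score_f1, score_f2) + 1.0:
        return method, score
    return None


# Hypothetical -log10 p-values: SM2 wins and clears both rules.
print(select_bivariate(1.5, 2.0, {"SM1": 2.5, "SM2": 3.4, "SM3": 1.0, "SM4": 0.2}))  # prints ('SM2', 3.4)
```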
  • To reduce the number of prognostic features further, the univariate and bivariate feature values determined above are assembled into a network (a graph). The univariate features, with their −log-p-values, are the bubbles (nodes) of the network. The bivariate features, with their −log-p-values, are the edges (interconnecting lines or arrows) of the network. A modified version of Prim's algorithm is then used to trim the network and thereby to obtain a Minimal Spanning Tree (MST). In this version of Prim's algorithm, all significant bivariate features are first sorted in descending order of their −log-p-values. The most significant bivariate feature (the first in the sorted list) is then selected as the starting point of the MST. The sorted bivariate feature list is then iterated from top to bottom, and a bivariate feature f12 is added to the MST if at least one of f1 and f2 is not yet part of the MST. Additionally, bivariate features f12 are added if they are part of the top 75% quantile of all bivariate features. A tree layout method is then used to render a diagram of the MST. In one example, the open source graph visualization tool “Graphviz sfdp” is used to generate a visual rendering of the MST. The user of the system 1 of FIG. 1 can use the computer 7 to cause the rendered MST diagram to be displayed on the display 9, as shown in simplified form in FIG. 1.
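The trimming step can be sketched as a greedy pass over the sorted bivariate features. This is a minimal sketch, not the system's implementation. The phrase "top 75% quantile" is ambiguous; the sketch assumes it means a score at or above the 75th percentile of all bivariate scores, using a simple lower-index percentile without interpolation. Note that, as described, the procedure can add edges that close cycles (through the quantile rule) or start separate components (when neither feature is yet present), so the result is not necessarily a tree in the strict graph-theoretic sense. The function name `modified_prim` is hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]  # (f1, f2, -log10 p-value of the bivariate feature f12)


def modified_prim(edges: List[Edge], quantile: float = 0.75) -> List[Edge]:
    """Greedy edge selection following the modified Prim procedure described above."""
    if not edges:
        return []
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    # Assumed reading of "top 75% quantile": score at or above the 75th percentile.
    ascending = sorted(e[2] for e in edges)
    cutoff = ascending[int(quantile * (len(ascending) - 1))]
    kept = [ranked[0]]  # the most significant bivariate feature starts the MST
    nodes = {ranked[0][0], ranked[0][1]}
    for f1, f2, score in ranked[1:]:
        # Keep the edge if it brings in a new feature, or if it is in the top quantile.
        if f1 not in nodes or f2 not in nodes or score >= cutoff:
            kept.append((f1, f2, score))
            nodes.update((f1, f2))
    return kept


# Hypothetical features A-D and scores.
edges = [("A", "B", 5.0), ("B", "C", 4.0), ("A", "C", 3.0), ("C", "D", 2.0), ("B", "D", 1.0)]
print(modified_prim(edges))  # prints [('A', 'B', 5.0), ('B', 'C', 4.0), ('C', 'D', 2.0)]
```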
  • FIG. 3 is a diagram of the MST 13 as rendered on display 9 of the computer 7. In the diagram, there is a bubble (node) for each univariate feature. The size of a bubble indicates the prognostic significance of the univariate feature: the larger the bubble, the larger the −log-p-value and the more significant the feature. Genomic univariate features are drawn as unshaded bubbles, and phenomic univariate features as shaded bubbles. The thickness of an edge indicates the significance of the bivariate feature it represents: the thicker the line, the larger the −log-p-value and the greater the prognostic significance. For each bivariate feature shown, the MST indicates which of the four fuzzy logic combinations was found to be the most significant. As indicated by the key at the lower right of FIG. 3, the absence of an arrowhead where an edge meets a bubble indicates a “not” of the univariate feature represented by that bubble.
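The visual encoding described for FIG. 3 can be sketched as Graphviz DOT text. The scaling constants, colors, and the function name are hypothetical; only the mapping (bubble size and edge thickness from −log-p-value, shading for phenomic features, no arrowhead at a negated feature) follows the text.

```python
def to_dot(univariate, edges):
    """Build Graphviz DOT text for an MST-style feature network.

    univariate: {name: (neg_log_p, is_phenomic)}
    edges: [(f1, f2, neg_log_p, method)] with method in SM1..SM4
    """
    # Which end of the edge carries a "not" for each scoring method.
    negated = {
        "SM1": (False, False),  # f1 and f2
        "SM2": (True, False),   # not f1 and f2
        "SM3": (False, True),   # f1 and not f2
        "SM4": (True, True),    # not f1 and not f2
    }
    lines = ["digraph mst {", "  node [shape=circle, style=filled];"]
    for name, (score, is_phenomic) in univariate.items():
        size = 0.5 + 0.2 * score  # hypothetical scaling: larger bubble = larger -log p
        fill = "gray80" if is_phenomic else "white"  # phenomic shaded, genomic unshaded
        lines.append(f'  "{name}" [width={size:.2f}, height={size:.2f}, fillcolor={fill}];')
    for f1, f2, score, method in edges:
        not_f1, not_f2 = negated[method]
        tail = "none" if not_f1 else "normal"  # no arrowhead at a negated feature
        head = "none" if not_f2 else "normal"
        width = 0.5 + 0.5 * score  # hypothetical scaling: thicker line = larger -log p
        lines.append(
            f'  "{f1}" -> "{f2}" [dir=both, arrowtail={tail}, arrowhead={head}, penwidth={width:.2f}];'
        )
    lines.append("}")
    return "\n".join(lines)
```

Written to a file, the output could be laid out with, for example, `sfdp -Tsvg mst.dot -o mst.svg`.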
  • FIG. 4 is a portion of a two-dimensional matrix of the prognostic values (−log-p-values) of the bivariate features. Univariate features (both phenomic and genomic) that are determined to be significant are listed across the top of the matrix. Note that there is a column for the phenomic feature “IHC_Dist_CD163(+)_CD3(+)CD8(−)”, a column for the genomic feature “MAGEC2”, and a column for the genomic feature “LGALS3”. Each univariate feature also has a corresponding row in the matrix. In this example, 730 univariate features were measured in the learning phase, so the matrix has one row for each of the 730 univariate features. The thin rectangular block at the intersection of the column of one univariate feature and the row of another is shaded with a color; the darker the color, the more significant the corresponding bivariate feature. Using the method described above, the most significant bivariate features are determined.
  • FIG. 5 is a table that sets forth the thirty-two univariate features that are determined to be most significant. A feature was determined to be significant if its mean Kaplan-Meier log-rank test p-value over a range of five cut-points was significant (p<0.05). In the table, a positive sign indicates that high feature values are favorable for the patients, that is, the risk of PSA recurrence is low.
  • FIGS. 6-10 set forth more detail about how the “IHC_Dist_CD163(+)_CD3(+)CD8(−)” phenomic feature raw measurement data is obtained. Two consecutive tissue slices of the same tissue sample are stained with IHC-based stains in different ways. The first tissue slice is duplex-stained with a CD68 antibody stain and a CD163 antibody stain. The CD68 stain may, for example, be a stain referred to as #M087601-2, available from Dako North America, Inc., 6392 Via Real, Carpinteria, Calif. 93013. The CD163 stain may, for example, be a stain referred to as #760-4437, available from Ventana Medical Systems, Inc., 1910 Innovation Park Drive, Tucson, Ariz. 85755. Due to this double staining, individual tumoricidal M1 type macrophages appear red when the slice is viewed under magnification, and individual tumorigenic M2 type macrophages appear brown when the slice is viewed under magnification. After staining, the slice is placed on a slide. A high-resolution color digital image 24 is then taken of the stained slice. FIG. 6 is a grayscale version of the high-resolution digital image 24.
  • The second tissue slice is also duplex stained, but this slice is stained with a CD3 antibody stain and a CD8 antibody stain. Due to this double staining, individual non-cytotoxic T-cells appear red when the slice is viewed under magnification, and individual cytotoxic T-cells appear brown when the slice is viewed under magnification. After staining, the slice is placed on a slide. A high-resolution color digital image 25 is then taken of the stained slice. FIG. 7 is a grayscale version of the high-resolution digital image 25.
  • FIG. 8 is an expanded view of a portion 26 of the first digital image 24 of FIG. 6. The system 1 performs image analysis on the first digital image, thereby identifying M2 macrophage objects. The arrows in FIG. 8 identify the M2 macrophage objects. The location in the X-Y dimension of the center of each detected M2 macrophage object is logged.
  • FIG. 9 is an expanded view of a portion 27 of the second digital image 25 of FIG. 7. The portion 27 of the second digital image 25 covers the same X-Y region as the portion 26 of the first digital image 24. The tissue represented in the two portions 27 and 26 is, however, slightly offset in the Z dimension. The system 1 performs image analysis on the second digital image, thereby identifying individual non-cytotoxic T-cell objects. The arrows in FIG. 9 identify the non-cytotoxic T-cell objects. The location in the X-Y dimension of the center of each detected non-cytotoxic T-cell object is logged.
  • FIG. 10 is an illustrative diagram that shows how the average distance from an M2 type macrophage object to its four nearest non-cytotoxic T-cells is determined. The X-Y image block portion identified by reference numeral 28 in FIG. 10 represents the same X-Y block of tissue as does block 29 in FIG. 9 and as does block 30 in FIG. 8. The non-cytotoxic T-cells 31-34 are determined to be the four such cells that are the closest (in the X-Y dimension) to the M2 macrophage 35. The distances D1, D2, D3 and D4 are determined from the logged center locations of the non-cytotoxic T-cells 31-34 and the M2 macrophage 35. The average of the distances D1, D2, D3 and D4 in micrometers is determined and recorded. This process is repeated for all the M2 macrophage objects detected in the first digital image 24. All these averages are in turn averaged to obtain one overall average. This one overall average (in micrometers) is the “IHC_Dist_CD163(+)_CD3(+)CD8(−)” phenomic feature raw measurement for the patient from whom the first and second tissue slices were taken.
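The distance feature of FIG. 10 reduces to a k-nearest-neighbour average. This sketch assumes the object centers from both images have already been mapped to a common X-Y frame in micrometers, and it uses a brute-force search, which is adequate for a few thousand cells; the function name is hypothetical.

```python
import math


def mean_nearest_distance(macrophages, t_cells, k=4):
    """Mean over all M2 macrophage centers of the average distance to the
    k nearest non-cytotoxic T-cell centers. Coordinates are (x, y) pairs in
    a shared frame, e.g. micrometers; the result is in the same unit."""
    if not macrophages or len(t_cells) < k:
        raise ValueError("need at least one macrophage and at least k T-cells")
    per_macrophage = []
    for center in macrophages:
        # Brute force: distance to every T-cell, keep the k smallest.
        distances = sorted(math.dist(center, t) for t in t_cells)
        per_macrophage.append(sum(distances[:k]) / k)
    return sum(per_macrophage) / len(per_macrophage)
```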
  • Inspection of the MST 13 of FIG. 3 indicates that a phenomic univariate feature can be advantageously used in the diagnostic phase along with two genomic univariate features, and along with two associated bivariate features, to generate the score 2. The dashed line 36 in FIG. 3 encircles these five features. The MST 13 is viewable by the user during and after the learning phase, so that the user can review the results of the learning phase and identify the features that the learning phase found to be significant. This information can be used to design a diagnostic clinical test (for example, a test to predict cancer recurrence) that is effective yet employs only a relatively small number of features.
  • FIGS. 11-37 are a sequence of diagrams that set forth how the raw measurement data for the three univariate features (the LGALS3 feature, the MAGEC2 feature, and the IHC_DIST_CD163(+)_CD3(+)CD8(−) feature) is processed in preparation for the “diagnostic phase”.
  • FIG. 11 shows the LGALS3 raw measurement data. As mentioned above, the LGALS3 feature is a genomic feature, so the listed raw measurement values are counts. There is one raw measurement count for each of the twenty-three patients. The right column sets forth the known cancer recurrence information for the associated patient. For example, the patient with patient ID “4” is known not to have suffered cancer recurrence; the LGALS3 raw measurement count for this patient is “1159”.
  • FIG. 12 shows how the information in the rows of FIG. 11 is reordered (i.e., is “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 13 shows how the raw measurement count values of FIG. 12 are then normalized. The smallest raw count value is replaced with the value 0/22, the next smallest with the value 1/22, and so forth. The denominator of the replacement values is the number of patients supplying the data (in this case, twenty-three) minus one. After the normalized rank values have been determined, a Kaplan-Meier analysis is performed on the data of FIG. 13. This analysis generates a “cut-point value” and indicates that there are eleven patients in the group above the cut-point value and twelve patients in the group below it.
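The rank normalization of FIGS. 12 and 13 maps the i-th smallest of n values (counting from zero) to i/(n−1). A minimal sketch; tie handling is not specified in the text, so ties here are broken by input order (an assumption).

```python
def rank_normalize(values):
    """Replace each raw value by rank / (n - 1), where rank 0 is the smallest.
    Ties are broken by input order (an assumption; the text does not say)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    order = sorted(range(n), key=lambda i: values[i])  # indices, smallest value first
    normalized = [0.0] * n
    for rank, index in enumerate(order):
        normalized[index] = rank / (n - 1)
    return normalized
```

With twenty-three patients the denominator is 22, so the results run from 0/22 to 22/22 as in FIG. 13.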
  • FIG. 14 is a Kaplan-Meier plot for the LGALS3 data of FIG. 13. The horizontal axis represents time. The upper line 37 represents the group of eleven patients above the cut-point in FIG. 13. The grouping predicts that these patients will not suffer cancer recurrence. At the time indicated by arrow 38, however, one of these patients did suffer recurrence, so the upper line 37 drops downward by an amount reflecting the number of patients suffering recurrence at that time. Later, at the time indicated by arrow 39, another patient in this group suffered recurrence, and the upper line 37 drops further. No other patient of this group suffered recurrence until the time indicated by arrow 40, at which point another patient suffered recurrence and the upper line 37 drops again. The lower line 41 represents the group of twelve patients below the cut-point in FIG. 13. The grouping predicts that these patients will suffer recurrence. None of the patients of this group suffered recurrence until the time indicated by arrow 42. At that time a patient in this second group suffered recurrence, so the lower line 41 drops downward by an amount reflecting the number of patients suffering recurrence at that time. Similarly, at the time indicated by arrow 43 another patient of this second group suffered recurrence, and the lower line 41 drops again. If the grouping of patients by the cut-point were perfect, the upper line 37 would extend horizontally from left to right without ever dropping, because none of the patients it represents would ever suffer cancer recurrence.
As to the lower line 41, by the right-hand end of the plot it would reach the very bottom of the plot, because every patient of the second group would have suffered cancer recurrence at some time. The actual Kaplan-Meier plot and analysis, however, indicate the cut-point value that gives the best grouping of patients with regard to the known cancer recurrence information. For the data of FIG. 13, the Kaplan-Meier analysis indicates the cut-point shown in FIG. 13 as well as a prognostic value (p-value) of 0.002478.
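The step-down behavior of the curves in FIG. 14 is that of the standard Kaplan-Meier product-limit estimator, S(t) = ∏ (1 − d_i/n_i) over event times t_i ≤ t, where d_i is the number of recurrences at t_i and n_i the number of patients still at risk. A minimal sketch, assuming each patient has a follow-up time and a flag that is 1 if recurrence was observed and 0 if the patient was censored; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up times; events: 1 if recurrence observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = censored = 0
        while i < len(data) and data[i][0] == t:  # group patients sharing time t
            if data[i][1]:
                recurrences += 1
            else:
                censored += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk  # the curve drops only at events
            curve.append((t, survival))
        at_risk -= recurrences + censored  # both leave the risk set after t
    return curve
```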
  • FIG. 15 shows the MAGEC2 raw measurement data for the twenty-three patients being studied in the learning phase.
  • FIG. 16 shows the information in the rows of FIG. 15 in reordered form (i.e., “ranked”) so that the top row is for the patient having the smallest raw measurement count, and so that the bottom row is for the patient having the largest raw measurement count.
  • FIG. 17 shows how the ranked raw measurement count values of FIG. 16 are then normalized.
  • FIG. 18 is a Kaplan-Meier plot for the MAGEC2 data of FIG. 17. The Kaplan-Meier analysis indicates a cut-point as well as a prognostic value (p-value) of 0.05880.
  • FIG. 19 shows the IHC_DIST_CD163(+)_CD3(+)CD8(−) raw measurement data for the twenty-three patients being studied in the learning phase.
  • FIG. 20 shows the information in the rows of FIG. 19 in reordered form (i.e., “ranked”).
  • FIG. 21 shows how the ranked raw measurement values of FIG. 20 are then normalized.
  • FIG. 22 is a Kaplan-Meier plot for the data of FIG. 21. The Kaplan-Meier analysis indicates a cut-point value as indicated in FIG. 21 as well as a prognostic value (p-value) of 0.02780.
  • FIG. 23 is a table that shows the four “fuzzy logic” bivariate scoring methods. These scoring methods are denoted SM1, SM2, SM3 and SM4.
  • FIG. 24 is a diagram that shows how LGALS3-to-MAGEC2 normalized rank values are calculated using the first bivariate scoring method SM1. The leftmost two columns of the diagram set forth the normalized rank values for LGALS3. These are the normalized LGALS3 values of FIG. 13, reordered by patient ID. The next two columns set forth the normalized rank values for MAGEC2. These are the normalized MAGEC2 values of FIG. 17, likewise reordered by patient ID. For each patient ID, the normalized value in the LGALS3 column is multiplied by the normalized value in the MAGEC2 column for that patient ID, and the resulting values are listed in the column denoted “f1×f2”. The normalized LGALS3 value is simply multiplied by the normalized MAGEC2 value because the scoring method is SM1: according to the table of FIG. 23, the f1 value for patient K is multiplied by the f2 value for patient K, where K runs from one (for the patient having the patient ID of “1”) to twenty-three (for the patient having the patient ID of “23”). The rows of the “f1×f2” column are then reordered so that the top row has the smallest “f1×f2” value and the bottom row has the largest. A Kaplan-Meier analysis is then performed on this reordered data. The Kaplan-Meier analysis indicates a cut-point value of 0.2121, as well as a prognostic value of 0.0009008.
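The text does not specify how the Kaplan-Meier analysis selects its cut-point. One common approach, shown here as an assumption, is to try each observed score as a cut-point and keep the split with the smallest two-group log-rank p-value. The scores can be any ranked column, for example the f1×f2 values of FIG. 24. The minimum group size is a hypothetical parameter, and because this search tests many splits, the resulting minimum p-value is optimistic.

```python
import math


def logrank_p(times, events, group):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = recurrence observed, 0 = censored;
    group: True/False membership for each patient.
    Returns the p-value of the chi-square statistic with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in the True group
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, group) if ti >= t and g)  # at risk in True group
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # events at t
        d1 = sum(1 for ti, e, g in zip(times, events, group) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def best_cut_point(scores, times, events, min_group=2):
    """Return (cut, p) for the split 'score > cut' with the smallest log-rank p."""
    best = None
    for cut in sorted(set(scores)):
        group = [s > cut for s in scores]  # True = above the cut-point
        above = sum(group)
        if above < min_group or len(scores) - above < min_group:
            continue  # skip splits that leave a group too small
        p = logrank_p(times, events, group)
        if best is None or p < best[1]:
            best = (cut, p)
    return best
```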
  • FIG. 25 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the second bivariate scoring method SM2 is used. The same process as set forth above in connection with FIG. 24 is performed, except that the “f1×f2” column of FIG. 24 is replaced with a “(1−f1)×f2” column. The values in this column are then reordered so that the smallest SM2 value is at the top of the reordered column, and so that the largest SM2 value is at the bottom of the reordered column. This reordered information is the rightmost column in FIG. 25 that is labeled “LGALS3-to-MAGEC2 normalized rank values for SM2”. A Kaplan-Meier analysis is then performed on this reordered data. The Kaplan-Meier analysis indicates a cut-point value of 0.2273, as well as a prognostic value of 0.9806.
  • FIG. 26 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the third bivariate scoring method SM3 is used. The Kaplan-Meier analysis indicates a cut-point value of 0.1515, as well as a prognostic value of 0.8034.
  • FIG. 27 is a diagram that shows how the cut-point and prognostic value for the LGALS3-to-MAGEC2 bivariate relationship is determined when the fourth bivariate scoring method SM4 is used. The Kaplan-Meier analysis indicates a cut-point value of 0.2272, as well as a prognostic value of 0.006845.
  • FIG. 28 is a table showing the prognostic values determined for the LGALS3-to-MAGEC2 bivariate relationship for each of the four scoring methods SM1, SM2, SM3 and SM4. Scoring method SM2 resulted in the highest prognostic value, so scoring method SM2 is selected to be the scoring method used.
  • FIG. 29 is a Kaplan-Meier plot for the LGALS3-to-MAGEC2 bivariate relationship when the SM2 scoring method is used. As indicated by the diagram of FIG. 25 for the SM2 scoring method, the determined cut-point value of 0.2273 divides the patients into a first group of eleven patients and a second group of twelve patients. The first group is represented by the line 44 in the Kaplan-Meier plot. The second group is represented by the line 45 in the Kaplan-Meier plot.
  • The same process described above for the LGALS3-to-MAGEC2 bivariate relationship in connection with FIGS. 24-29 is performed for the IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship. FIG. 30 shows how the cut-point value and the prognostic value for the SM1 scoring method are determined. FIG. 31 shows how the cut-point value and the prognostic value for the SM2 scoring method are determined. FIG. 32 shows how the cut-point value and the prognostic value for the SM3 scoring method are determined. FIG. 33 shows how the cut-point value and the prognostic value for the SM4 scoring method are determined. FIG. 34 is a chart showing the prognostic values determined for each of the four scoring methods SM1, SM2, SM3 and SM4. The chart reveals that the largest prognostic value is obtained when the SM1 scoring method is used, so the SM1 scoring method is selected. FIG. 35 is a Kaplan-Meier plot for the IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate relationship when the SM1 scoring method is used. The determined cut-point value of 0.2121 divides the twenty-three patients into a first group of eleven patients and a second group of twelve patients. The first group is represented by the line 46 in the Kaplan-Meier plot. The second group is represented by the line 47 in the Kaplan-Meier plot.
  • Accordingly, the “learning phase” results in the generation of both a cut-point value and a rank normalized value list for each of the three univariate features, and for each of the two bivariate features. This information is stored in the database 6 for later use in the diagnostic phase.
  • Operation of the system 1 of FIG. 1 in the “diagnostic phase” is described below in connection with FIGS. 36-43. The new patient for whom the score 2 is to be determined is seen in a clinical setting, and a tissue sample is obtained from the patient. The tissue sample is sliced and analyzed as explained above in connection with FIG. 2. The first and second tissue slices are stained, imaged, and analyzed by the system so as to generate an IHC_Dist_CD163(+)_CD3(+)CD8(−) raw measurement value. The third tissue slice is lysed and two pairs of probes are applied. The nCounter device is used to count probes, thereby generating two count values. The count values are loaded into the system as the LGALS3 raw measurement value and the MAGEC2 raw measurement value. The three raw measurement values for the new patient are set forth in FIG. 36.
  • FIG. 37 shows the ordered rank values and the cut-point value determined in the learning phase for the LGALS3 univariate feature. The LGALS3 raw measurement value is listed for each learning-phase patient ID. The new patient's LGALS3 raw measurement value of 2400 is compared to this list of raw measurement values obtained in the learning phase; the value 2400 lies between the value 2358 for patient number 2 and the value 2409 for patient number 9. If the raw measurement value for the new patient is below the cut-point value, then the LGALS3 univariate feature score SLGALS3 is “1”; otherwise the score SLGALS3 is “0”. Here the raw measurement value of 2400 falls below the cut-point, so the SLGALS3 score is “1”.
  • In addition to determining the SLGALS3 score, a ranked normalized value is also determined for the new patient's raw measurement value. Because the raw value of 2400 lies between that of patient number 2 (2358) and that of patient number 9 (2409), the new patient's ranked value must lie between the ranked value of 13/22 for patient number 2 and the ranked value of 14/22 for patient number 9. In the present example, the ranked value for the new patient is determined in simplified fashion to be 13.5/22, midway between 13/22 and 14/22. This ranked value of 13.5/22 for LGALS3 is stored and is later used in the determination of a bivariate feature score.
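The diagnostic-phase lookup of FIGS. 37 to 39 can be sketched with a binary search over the stored learning-phase values. Assumptions: the cut-point is stored as a normalized rank, a value falling between two learning values receives the midpoint rank (as in the 13.5/22 example), and values outside the learning range are clamped to 0 or 1; the function names are hypothetical.

```python
from bisect import bisect_left


def diagnostic_rank(learning_values, x):
    """Normalized rank of a new raw value x relative to the stored
    learning-phase raw values (denominator n - 1, as in the learning phase)."""
    ordered = sorted(learning_values)
    n = len(ordered)
    i = bisect_left(ordered, x)  # number of learning values smaller than x
    if i < n and ordered[i] == x:
        rank = float(i)  # exact match: reuse that patient's rank
    else:
        # Between ranks i - 1 and i: take the midpoint, clamped to [0, n - 1].
        rank = min(max(i - 0.5, 0.0), n - 1.0)
    return rank / (n - 1)


def univariate_score(learning_values, x, cut_point):
    """1 if the new value ranks below the stored cut-point, otherwise 0."""
    return 1 if diagnostic_rank(learning_values, x) < cut_point else 0
```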
  • FIG. 38 shows the ordered rank values and the cut-point value determined in the learning phase for the MAGEC2 univariate feature. The same process described above in connection with LGALS3 and FIG. 37 is performed here for MAGEC2. The new patient's MAGEC2 raw measurement value of 17 points to a location in the list of raw MAGEC2 measurements that is below the cut-point value. The SMAGEC2 score is therefore determined to be “1”. The new patient's MAGEC2 raw measurement value translates into a ranked value of 10.5/12.
  • FIG. 39 shows the ordered rank values and the cut-point value determined in the learning phase for the IHC_Dist_CD163(+)_CD3(+)CD8(−) univariate feature. The same process described above in connection with LGALS3 and FIG. 37 is performed here for IHC_Dist_CD163(+)_CD3(+)CD8(−). The new patient's raw measurement value of 700 points to a location above the cut-point value, so the SIHC score is determined to be “0”. The new patient's raw measurement value translates into a ranked value of 6.5/22.
  • FIG. 40 shows how an LGALS3-to-MAGEC2 bivariate feature score value for the new patient is determined. Because the SM2 scoring method was determined in the learning phase (see FIG. 28) to be the scoring method for the LGALS3-to-MAGEC2 bivariate relationship that results in the highest prognostic value, the SM2 scoring method is used. The rank normalized list of rank values of FIG. 25 for the SM2 scoring method is replicated in FIG. 40. The rank value for the new patient is determined using the 13.5/22 rank value for LGALS3 and the 10.5/12 rank value for MAGEC2. Applying the SM2 equation to these rank values yields an SM2 rank value for the LGALS3-to-MAGEC2 bivariate relationship of 0.3381. As can be seen from FIG. 40, this rank value of 0.3381 is located between the values for patient number 9 and patient number 16. This location is below the cut-point, so the S(LGALS3-MAGEC2) score value is “1”.
  • FIG. 41 shows how an IHC_Dist_CD163(+)_CD3(+)CD8(−)-to-MAGEC2 bivariate feature score value for the new patient is determined. The same process described above in connection with FIG. 40 is used. Because the SM1 scoring method was selected in the learning phase (see FIG. 34), the SM1 scoring method is used. The rank normalized list of rank values of FIG. 30 for the SM1 scoring method is replicated in FIG. 41. The rank value for the new patient is determined using the 6.5/22 rank value for IHC (see FIG. 39) and the 10.5/12 rank value for MAGEC2 (see FIG. 38). Applying the SM1 equation to these two rank values yields an SM1 rank value for the bivariate relationship of 0.2585. As can be seen from FIG. 41, this rank value is located between the values for patient number 19 and patient number 22. This location is below the cut-point, so the S(IHC-MAGEC2) score value is “1”.
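The arithmetic of FIGS. 40 and 41 can be checked directly. This sketch uses the rank values exactly as stated in the text (13.5/22 for LGALS3, 10.5/12 for MAGEC2, and 6.5/22 for the IHC feature); the helper names are hypothetical, and the subsequent lookup against the stored rank list and cut-point is not reproduced here.

```python
def sm1(f1, f2):
    """'f1 and f2'."""
    return f1 * f2


def sm2(f1, f2):
    """'not f1 and f2'."""
    return (1 - f1) * f2


lgals3, magec2, ihc = 13.5 / 22, 10.5 / 12, 6.5 / 22  # rank values as stated in the text
print(round(sm2(lgals3, magec2), 4))  # 0.3381 (LGALS3-to-MAGEC2, SM2)
print(round(sm1(ihc, magec2), 4))     # 0.2585 (IHC-to-MAGEC2, SM1)
```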
  • The score 2 generated by the system 1 of FIG. 1 for the new patient is a function of the three univariate feature score values SLGALS3, SMAGEC2 and SIHC, and of the two bivariate feature score values S(LGALS3−MAGEC2) and S(IHC−MAGEC2). In the present example, this function is a majority voting function (see FIG. 42). Each of the five feature score values is a number that is either 0 or 1. The five feature score values are summed, and if the sum is greater than or equal to 2.5 then the overall score (the score 2 of FIG. 1) is determined to be “1”, otherwise the overall score (the score 2 of FIG. 1) is determined to be “0”. An overall score of “0” indicates that cancer recurrence is determined to be unlikely. An overall score of “1” indicates that cancer recurrence is determined to be likely.
  • FIG. 43 shows how the function of FIG. 42 is applied in the case of the new patient whose score 2 is being determined. The sum of the five feature score values is 4, and this sum is greater than 2.5, so the score 2 as determined by the system 1 of FIG. 1 is “1”. This score value of “1” is displayed on the display 9 of the computer 7 of FIG. 1.
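The majority-voting function of FIGS. 42 and 43 fits in a few lines; with five binary votes, the 2.5 threshold means at least three votes of “1”. The function name is hypothetical.

```python
def overall_score(feature_scores, threshold=2.5):
    """Majority vote over binary feature scores:
    1 (recurrence likely) if the sum reaches the threshold, else 0."""
    if any(s not in (0, 1) for s in feature_scores):
        raise ValueError("feature scores must be 0 or 1")
    return 1 if sum(feature_scores) >= threshold else 0


# New patient: S_LGALS3, S_MAGEC2, S_IHC, S(LGALS3-MAGEC2), S(IHC-MAGEC2)
print(overall_score([1, 1, 0, 1, 1]))  # 1
```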
  • Although the present invention has been described in connection with certain specific embodiments for instructional purposes, the present invention is not limited thereto. Accordingly, various modifications, adaptations, and combinations of various features of the described embodiments can be practiced without departing from the scope of the invention as set forth in the claims.

Claims (20)

What is claimed is:
1. A method involving a system for generating a score, wherein the score is indicative of whether a cancer patient will have a recurrence of cancer, the method comprising:
(a) measuring a phenomic feature based on biomarker positive objects detected by the system in a digital image of a first tissue slice, wherein the first tissue slice was stained with at least one protein-specific immunohistochemical (IHC) biomarker;
(b) as a result of the measuring of (a) generating a univariate phenomic feature score value;
(c) measuring a genomic feature based on detecting objects marked with at least one gene-specific probe biomarker detected by the system in tissue of a second tissue slice, wherein the first tissue slice and the second tissue slice are both taken from the same tissue sample taken from the cancer patient;
(d) as a result of the measuring of (c) generating a univariate genomic feature score value;
(e) based at least in part on the measuring of (a) and the measuring of (c) generating a bivariate feature score value, wherein the bivariate feature score value indicates a strength of a relationship between the phenomic feature and the genomic feature; and
(f) determining the score by evaluating a function, wherein the function is a function of at least the bivariate feature score value, wherein (a) through (f) are performed by the system.
2. The method of claim 1, wherein the function of (f) is also a function of the univariate phenomic feature score value generated in (b), and is also a function of the univariate genomic feature score value generated in (d).
3. The method of claim 1, wherein the function of (f) is a majority voting function, wherein the univariate phenomic feature score value generated in (b) is a first vote, wherein the univariate genomic feature score value generated in (d) is a second vote, and wherein the bivariate feature score value generated in (e) is a third vote.
4. The method of claim 1, wherein the gene-specific probe biomarker of (c) is a probe that attaches to a particular sequence of nucleotides of an mRNA strand.
5. The method of claim 1, wherein the score determined in (f) is indicative of whether measurable prostate-specific antigen (PSA) is present in the cancer patient's blood.
6. The method of claim 1, wherein the system stores a cut-point value for a bivariate feature, and wherein the generating of the bivariate feature score value in (e) involves:
(e1) determining a bivariate feature value for the bivariate feature, wherein the bivariate feature value is determined based at least in part on raw phenomic measurement data obtained in (a) and on raw genomic measurement data obtained in (c); and
(e2) comparing the bivariate feature value determined in (e1) to the cut-point value, wherein the cut-point value was stored in the system prior to the measuring of (a) and prior to the measuring of (c).
7. The method of claim 1, wherein the measuring of (c) involves a counter device that outputs a count value, wherein the count value is indicative of a number of gene-specific probes counted, and wherein the counter device is a part of the system.
8. The method of claim 1, wherein the system comprises a display, the method further comprising:
(g) displaying a rendering of a network on the display of the system, wherein the network includes a first node representing a phenomic feature, a second node representing a genomic feature, and an edge that extends between the first node and the second node, wherein the edge represents the bivariate feature.
9. A method comprising:
(a) receiving raw phenomic feature measurement data onto a system;
(b) defining a phenomic feature based at least in part on the raw phenomic feature measurement data received in (a), wherein the phenomic feature includes a list of ranked values;
(c) receiving raw genomic feature measurement data onto the system;
(d) defining a genomic feature based at least in part on the raw genomic feature measurement data received in (c), wherein the genomic feature includes a list of ranked values;
(e) defining a bivariate feature by combining ranked values of the list of ranked values of the phenomic feature with ranked values of the list of ranked values of the genomic feature thereby generating a list of ranked values for the bivariate feature, wherein the defining of the bivariate feature in (e) further involves determining and storing a cut-point value for the bivariate feature;
(f) receiving a phenomic feature measurement data value onto the system, wherein the phenomic feature measurement data value is data obtained by analyzing a digital image of a first portion of tissue of a tissue sample, wherein the tissue sample is from a cancer patient;
(g) receiving a genomic feature measurement data value onto the system, wherein the genomic feature measurement data value is data obtained by analyzing a second portion of the tissue of the tissue sample;
(h) calculating a first score for the bivariate feature based at least in part on the phenomic feature measurement data value received in (f), the genomic feature measurement data value received in (g), and the cut-point value of (e), wherein the receiving of (f) and the receiving of (g) and the calculating in (h) are all performed after the defining of the bivariate feature in (e); and
(i) determining a second score by evaluating a function, wherein the function is a function of the first score calculated in (h), wherein (a) through (i) are performed by the system, and wherein the second score is indicative of whether the cancer patient will have a recurrence of cancer.
10. The method of claim 9, wherein the system also calculates a third score for a phenomic feature, and wherein the system also calculates a fourth score for a genomic feature, and wherein the function in (i) is also a function of the third score and a function of the fourth score.
11. The method of claim 9, wherein the receiving of the raw genomic feature measurement data in (c) is a receiving of a digital file onto the system, wherein the digital file includes a digital count value.
12. The method of claim 9, wherein the receiving of the genomic feature measurement data value in (g) is a count of a number of gene-specific probes.
13. The method of claim 9, wherein the genomic feature measurement data value received in (g) is indicative of a degree of expression of a gene in the second portion of the tissue of the tissue sample.
14. A method of generating a network of prognostic features for cancer recurrence of a cancer patient, comprising:
(a) measuring an immunohistochemical-based (IHC-based) feature of a tissue sample of a tumor of the cancer patient and computing the IHC-based feature;
(b) measuring a gene expression feature of the tissue sample and computing the gene expression feature;
(c) computing a bivariate feature, wherein the bivariate feature provides significant prognostic information on the cancer recurrence; and
(d) displaying the network on a computer display, wherein a first node of the network represents the IHC-based feature computed in (a), wherein a second node of the network represents the gene expression feature computed in (b), and wherein an edge that extends between the first and second nodes represents the bivariate feature computed in (c).
15. The method of claim 14, wherein the size of a node of the displayed network indicates a prognostic value of a feature represented by the node, wherein some nodes of the network as displayed on the computer display are larger than other nodes of the network as displayed on the computer display.
16. The method of claim 14, wherein the width of an edge of the displayed network indicates a prognostic value of a bivariate feature represented by the edge, wherein some edges of the network as displayed on the computer display are wider than other edges of the network as displayed on the computer display.
17. The method of claim 14, wherein the bivariate feature computed in (c) is computed using a fuzzy logic combination of two features including the operators “and” and “not”.
18. The method of claim 14, wherein the bivariate feature that is computed in (c) includes a prognostic value.
19. The method of claim 14, wherein the bivariate feature that is computed in (c) includes a cut-point value.
20. The method of claim 14, wherein the display is a part of a system, and wherein (a) through (d) are performed by the system.
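Claims 9, 10, 18 and 19 describe scoring a patient with a bivariate feature that pairs a phenomic (IHC) measurement with a genomic measurement and a cut-point, then passing that score into a function whose output indicates recurrence. The claims do not say how the two measurements are combined or what the final function is, so the Python sketch below makes explicitly hypothetical choices: the bivariate value is the product of the two measurements, each score is a 0/1 dichotomization at a cut-point, and the second score is a logistic function of a weighted sum with made-up weights. All class, function and parameter names are illustrative only and are not taken from the application.

```python
import math
from dataclasses import dataclass


@dataclass
class BivariateFeature:
    """Hypothetical bivariate feature: one combined value tested against one cut-point."""
    name: str
    cut_point: float

    def combine(self, phenomic: float, genomic: float) -> float:
        # Assumption for illustration only: combine the IHC and gene values by a product.
        return phenomic * genomic

    def first_score(self, phenomic: float, genomic: float) -> int:
        # First score: 1 if the combined value exceeds the cut-point, else 0.
        return 1 if self.combine(phenomic, genomic) > self.cut_point else 0


def univariate_score(value: float, cut_point: float) -> int:
    # Third/fourth scores: dichotomize a single phenomic or genomic feature.
    return 1 if value > cut_point else 0


def second_score(scores: list[int], weights: list[float], bias: float) -> float:
    # Hypothetical logistic model mapping a weighted sum of the scores to (0, 1).
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Example with made-up measurements, cut-points and weights.
feature = BivariateFeature(name="ihc_x_gene", cut_point=10.0)
s1 = feature.first_score(phenomic=2.5, genomic=6.0)  # 2.5 * 6.0 = 15.0 > 10.0
s3 = univariate_score(2.5, cut_point=3.0)            # phenomic feature alone
s4 = univariate_score(6.0, cut_point=4.0)            # genomic feature alone
risk = second_score([s1, s3, s4], weights=[1.2, 0.5, 0.8], bias=-1.5)
recurrence_flag = risk >= 0.5  # hypothetical decision threshold
print(f"first={s1} third={s3} fourth={s4} risk={risk:.3f} recurrence={recurrence_flag}")
```

A logistic link is only one way to turn several binary scores into a bounded risk value; any monotone function of the scores could play the role of the function in step (i) of claim 9.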
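Claim 17 states that the bivariate feature may be a fuzzy logic combination of two features using the operators “and” and “not”, without naming the membership functions or how the operators are defined. A minimal sketch, assuming linear-ramp membership functions and the standard Zadeh operators (minimum for “and”, 1 - x for “not”); the ramp bounds, inputs and function names are hypothetical.

```python
def ramp_membership(value: float, low: float, high: float) -> float:
    """Degree in [0, 1] to which `value` counts as 'high' (linear ramp from low to high)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_and(a: float, b: float) -> float:
    # Zadeh conjunction: the minimum of the two membership degrees.
    return min(a, b)


def fuzzy_not(a: float) -> float:
    # Standard fuzzy complement.
    return 1.0 - a


# Hypothetical inputs: an IHC staining intensity and a normalized gene count.
ihc_high = ramp_membership(0.8, low=0.0, high=1.0)      # 0.8
gene_high = ramp_membership(20.0, low=10.0, high=50.0)  # 0.25

# Two candidate bivariate features built from the same pair of features.
ihc_and_gene = fuzzy_and(ihc_high, gene_high)                 # min(0.8, 0.25) = 0.25
ihc_and_not_gene = fuzzy_and(ihc_high, fuzzy_not(gene_high))  # min(0.8, 0.75) = 0.75
print(ihc_and_gene, ihc_and_not_gene)
```

Unlike a crisp cut-point, a graded membership lets a value just below a threshold still contribute partially to the combined feature.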
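Claims 14 to 16 describe displaying the features as a network: the IHC-based and gene expression features as nodes, the bivariate features as edges, with node size and edge width reflecting prognostic value. One way to sketch this, assuming Graphviz as the renderer (the claims name no tool), is to emit DOT text in which each node's `width` and each edge's `penwidth` are scaled linearly from the feature's prognostic value. The feature names, prognostic values and helper functions are hypothetical.

```python
def scale(value: float, vmin: float, vmax: float, out_min: float, out_max: float) -> float:
    # Linear map from [vmin, vmax] onto [out_min, out_max]; midpoint if all values are equal.
    if vmax == vmin:
        return (out_min + out_max) / 2
    return out_min + (value - vmin) * (out_max - out_min) / (vmax - vmin)


def to_dot(nodes: dict[str, float], edges: dict[tuple[str, str], float]) -> str:
    """Build Graphviz DOT text for a prognostic feature network.

    nodes: feature name -> prognostic value of that single feature
    edges: (feature a, feature b) -> prognostic value of their bivariate feature
    """
    lines = ["graph prognostic_network {", "  node [shape=circle, fixedsize=true];"]
    nmin, nmax = min(nodes.values()), max(nodes.values())
    for name, value in nodes.items():
        width = scale(value, nmin, nmax, 0.5, 1.5)  # node diameter in inches
        lines.append(f'  "{name}" [width={width:.2f}];')
    if edges:
        emin, emax = min(edges.values()), max(edges.values())
        for (a, b), value in edges.items():
            penwidth = scale(value, emin, emax, 1.0, 6.0)  # edge stroke width in points
            lines.append(f'  "{a}" -- "{b}" [penwidth={penwidth:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical features and prognostic values (larger = more prognostic).
nodes = {"IHC_A": 2.0, "GENE_B": 4.0, "GENE_C": 3.0}
edges = {("IHC_A", "GENE_B"): 5.0, ("IHC_A", "GENE_C"): 2.0}
dot = to_dot(nodes, edges)
print(dot)
```

Saving the string to `network.dot` and running `dot -Tpng network.dot -o network.png` renders the network. Linear scaling is one choice among several; a logarithmic scale may read better when prognostic values span orders of magnitude.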

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/237,392 US20190252075A1 (en) 2018-02-12 2018-12-31 Predicting Prostate Cancer Recurrence Using a Prognostic Model that Combines Immunohistochemical Staining and Gene Expression Profiling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862629591P 2018-02-12 2018-02-12
US16/237,392 US20190252075A1 (en) 2018-02-12 2018-12-31 Predicting Prostate Cancer Recurrence Using a Prognostic Model that Combines Immunohistochemical Staining and Gene Expression Profiling

Publications (1)

Publication Number Publication Date
US20190252075A1 (en) 2019-08-15

Family

ID=65433497

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/237,392 Abandoned US20190252075A1 (en) 2018-02-12 2018-12-31 Predicting Prostate Cancer Recurrence Using a Prognostic Model that Combines Immunohistochemical Staining and Gene Expression Profiling
US16/271,827 Active 2042-02-03 US11651863B2 (en) 2018-02-12 2019-02-09 Predicting prostate cancer recurrence using a prognostic model that combines immunohistochemical staining and gene expression profiling

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/271,827 Active 2042-02-03 US11651863B2 (en) 2018-02-12 2019-02-09 Predicting prostate cancer recurrence using a prognostic model that combines immunohistochemical staining and gene expression profiling

Country Status (2)

Country Link
US (2) US20190252075A1 (en)
EP (1) EP3533883A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596298A (en) * 2022-03-16 2022-06-07 East China Normal University (华东师范大学) Hyperspectral imaging-based automatic generation method of fine-labeled digital pathological data set

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
GB201420859D0 (en) 2014-11-24 2015-01-07 Cancer Res Inst Royal Tumour analysis
US10380491B2 (en) 2016-03-20 2019-08-13 Definiens Ag System for predicting the recurrence of cancer in a cancer patient

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114596298A (en) * 2022-03-16 2022-06-07 East China Normal University (华东师范大学) Hyperspectral imaging-based automatic generation method of fine-labeled digital pathological data set
US11763453B1 (en) 2022-03-16 2023-09-19 East China Normal University Automatic generation method of fine-labeled digital pathological data set based on hyperspectral imaging

Also Published As

Publication number Publication date
US20190252044A1 (en) 2019-08-15
US11651863B2 (en) 2023-05-16
EP3533883A1 (en) 2019-09-04

Similar Documents

Publication Publication Date Title
KR20220119447A (en) Method for supporting pathology diagnosis using AI, and supporting device
JP6143743B2 (en) Cluster analysis of biomarker expression in cells
Albert et al. Latent class modeling approaches for assessing diagnostic error without a gold standard: with applications to p53 immunohistochemical assays in bladder tumors
Kluk et al. MYC immunohistochemistry to identify MYC-driven B-cell lymphomas in clinical practice
JP2023116530A (en) Single-cell genomic profiling of circulating tumor cells (ctc) in metastatic disease to characterize disease heterogeneity
Shaver et al. B-ALL minimal residual disease flow cytometry: an application of a novel method for optimization of a single-tube model
CN110998318A (en) Method for determining therapy based on single cell characterization of Circulating Tumor Cells (CTCs) in metastatic disease
Gao et al. A new method for predicting survival in stage I non-small cell lung cancer patients: nomogram based on macrophage immunoscore, TNM stage and lymphocyte-to-monocyte ratio
CN115274118A (en) Method for constructing testis tumor diagnosis and postoperative recurrence risk prediction model
US20230360208A1 (en) Training end-to-end weakly supervised networks at the specimen (supra-image) level
US11651863B2 (en) Predicting prostate cancer recurrence using a prognostic model that combines immunohistochemical staining and gene expression profiling
Padmanaban et al. Between-tumor and within-tumor heterogeneity in invasive potential
KR101990430B1 (en) System and method of biomarker identification for cancer recurrence prediction
Forjaz et al. Three-dimensional assessments are necessary to determine the true, spatially-resolved composition of tissues
CN108603233A (en) The unicellular Genome Atlas of circulating tumor cell (CTC) is analyzed to characterize disease heterogeneity in metastatic disease
Chen et al. Evaluation of dynamic image progression of minimally invasive and preinvasive lung adenocarcinomas
JP6517933B2 (en) Inspection system, inspection apparatus, and inspection method
EP4172852A1 (en) Method and system for generating a visual representation
Fassler Digital Pathology-Based Approaches for Assessing the Tumor Microenvironment: Surveys of the Immune Landscape and Patient Prognosis
Morisi The use of digital pathology and machine learning for the detection and characterisation of canine soft tissue sarcomas
Diep Variable selection for generalized linear mixed model by L1 penalization for predicting clinical parameters of ovarian cancer
Sorigue Diagnosis of erythroid dysplasia by flow cytometry: a review
US20210209756A1 (en) Apparatuses and methods for digital pathology
Weeratunga et al. Temporo-spatial cellular atlas of the regenerating alveolar niche in idiopathic pulmonary fibrosis
Hong et al. Tumor-naïve pre-surgical ctDNA detection is prognostic in clinical stage I lung adenocarcinoma

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION