EP3721372A1 - Method of storing and retrieving digital pathology analysis results - Google Patents
Method of storing and retrieving digital pathology analysis results
- Publication number
- EP3721372A1 (application EP18814573A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- sub
- regions
- pixels
- derived
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/695—Preprocessing, e.g. image segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
Definitions
- Digital pathology involves scanning of whole histopathology or cytopathology glass slides into digital images interpretable on a computer screen. These images are to be processed subsequently by an imaging algorithm or interpreted by a pathologist.
- Because tissue sections are virtually transparent, they are prepared using colored histochemical stains that bind selectively to cellular components.
- Color-enhanced, or stained, cellular structures are used by clinicians or a computer-aided diagnosis (CAD) algorithm to identify morphological markers of a disease, and to proceed with therapy accordingly.
- Immunohistochemical (IHC) slide staining can be utilized to identify proteins in cells of a tissue section and hence is widely used in the study of different types of cells, such as cancerous cells and immune cells in biological tissue.
- IHC staining may be used in research to understand the distribution and localization of the differentially expressed biomarkers of immune cells (such as T-cells or B-cells) in a cancerous tissue for an immune response study.
- tumors often contain infiltrates of immune cells, which may prevent the development of tumors or favor the outgrowth of tumors.
- In-situ hybridization (ISH) can be used to detect a genetic abnormality or condition, such as amplification of cancer-causing genes, specifically in cells that, when viewed under a microscope, morphologically appear to be malignant.
- ISH employs labeled DNA or RNA probe molecules that are anti-sense to a target gene sequence or transcript to detect or localize targeted nucleic acid target genes within a cell or tissue sample.
- ISH is performed by exposing a cell or tissue sample immobilized on a glass slide to a labeled nucleic acid probe which is capable of specifically hybridizing to a given target gene in the cell or tissue sample.
- target genes can be simultaneously analyzed by exposing a cell or tissue sample to a plurality of nucleic acid probes that have been labeled with a plurality of different nucleic acid tags.
- simultaneous multicolored analysis may be performed in a single step on a single target cell or tissue sample.
- the present disclosure relates, among other things, to automated systems and methods for analyzing and storing data associated with biological objects having irregular shapes (e.g. fibroblasts or macrophages).
- the present disclosure also relates to automated systems and methods for analyzing and storing data associated with biological objects using a mid-resolution analysis (or medium-resolution analysis) approach, i.e. an approach that groups pixels having similar properties (e.g. staining intensity, staining presence, and/or texture) into "sub-regions.”
- images are acquired from biological specimens (e.g., tissue specimens) mounted on a glass slide and stained for the identification of biomarkers. It is possible to assess the biological sample under a microscope at high magnification or to automatically analyze it with a digital pathology algorithm that detects and classifies biological objects of interest.
- the objects of interest can be cells, vessels, glands, tissue regions, etc. Any derived information may be stored in a database for later retrieval, and the database may include statistics of a presence, absence, spatial relation, and/or staining properties of biological structures of interest.
- fibroblasts or macrophages have an irregular shape. Groups of these types of cells may extend around each other or other cells (see FIG. 5). Consequently, it is often difficult to precisely identify these irregularly-shaped cells individually by an observer or by an automated algorithm. Instead, these cells are very often identified just by a local presence of their stained cytoplasm or membrane without the identification of individual cells.
- FIGS. 6A and 6B illustrate an example of an IHC image stained for tumor (yellow, 620) and fibroblasts (purple, 610), which are represented by a large polygon outline (red, 630) surrounding a group of relevant cells with exclusion "holes" (cyan, 640) for undesired regions.
- the analysis results are averaged over a large region (red outline, 630) that may contain a large number of individual cells having different features (e.g.
- the outlined fibroblast activation protein (FAP)-positive area is 928.16 µm² with a computed FAP-positive mean intensity of 0.26.
- the mean intensity of 0.26 is quite coarse to represent the whole FAP-positive region in this image.
- this low-resolution analysis approach may lead to a loss of accuracy when stored results are subsequently utilized in downstream processing. As such, it is believed that due to such heterogeneity of the stained cells, this method does not locally present the actual detail of the regions of such biological structures of interest.
- the present disclosure provides systems and methods for deriving data corresponding to irregularly-shaped cells using a mid-resolution analysis approach by segmenting the image into a plurality of sub-regions, the sub-regions having similar image properties (e.g. at least one of texture, intensity, or color).
- a method of storing image analysis data derived from an image of a biological specimen having at least one stain comprising: (a) deriving one or more feature metrics from the image; (b) segmenting the image into a plurality of sub-regions, each sub-region comprising pixels that are substantially uniform in at least one of staining presence, staining intensity, or local texture; (c) generating a plurality of representational objects based on the plurality of segmented sub-regions; (d) associating each of the plurality of representational objects with derived feature metrics; and (e) storing coordinates for each representational object along with the associated derived feature metrics in a database.
- the segmentation of the image into the plurality of sub-regions comprises deriving superpixels.
- the superpixels are derived by (i) grouping pixels with local k-means clustering; and (ii) using a connected components algorithm to merge small isolated regions into nearest large superpixels.
- the superpixels (as sub-regions) are perceptually meaningful such that each superpixel is a perceptually consistent unit, i.e. all pixels in a superpixel are likely uniform in color and texture.
- connected components labeling scans an image and groups its pixels into components based on pixel connectivity, i.e. all pixels in a connected component share similar pixel intensity values and are in some way connected with each other.
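The connected-components step described above can be illustrated with a minimal sketch, assuming NumPy and SciPy are available; `scipy.ndimage.label` groups spatially connected foreground pixels under a common label:

```python
# A minimal sketch of connected-components labeling, assuming NumPy and SciPy:
# spatially connected foreground pixels receive a common component label.
import numpy as np
from scipy import ndimage

binary = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 1],
                   [0, 0, 1, 1]])

# ndimage.label uses 4-connectivity by default; this image has two
# foreground components (top-left L-shape and bottom-right L-shape).
labels, n_components = ndimage.label(binary)
print(n_components)  # 2
```

In a superpixel pipeline, components below a size threshold would then be merged into the nearest large superpixel, as step (ii) describes.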
- the segmentation of the image into the plurality of sub-regions comprises overlaying a sampling grid onto the image, the sampling grid defining non-overlapping areas having a predetermined size and shape.
- the sub-regions have an M×N size, where M ranges from about 50 pixels to about 100 pixels, and where N ranges from about 50 pixels to about 100 pixels.
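As one hedged illustration of the sampling-grid embodiment (an assumption, not the patent's implementation), the sketch below assigns each pixel of an image to a non-overlapping M×N grid cell; the function name `grid_labels` is hypothetical:

```python
# Illustrative sketch: segmenting an image into non-overlapping M x N
# grid sub-regions (here 50 x 50 pixels, within the stated 50-100 range).
import numpy as np

def grid_labels(height, width, m=50, n=50):
    """Assign each pixel a sub-region label from a non-overlapping M x N grid."""
    rows = np.arange(height) // m          # grid row index per pixel row
    cols = np.arange(width) // n           # grid column index per pixel column
    n_cols = -(-width // n)                # number of grid columns (ceiling division)
    return rows[:, None] * n_cols + cols[None, :]

labels = grid_labels(200, 300)
print(labels.max() + 1)  # 24 sub-regions (4 grid rows x 6 grid columns)
```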
- the representational objects comprise outlines of sub-regions that meet a pre-defined staining intensity threshold.
- representational objects comprise seed points.
- the seed points are derived by computing a centroid for each of the plurality of sub-regions.
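The centroid-based seed points can be sketched as follows; this is an illustrative implementation assuming NumPy, and the function name `centroids` is hypothetical:

```python
# Illustrative sketch: seed points computed as the centroid of each
# labeled sub-region, per the embodiment described above.
import numpy as np

def centroids(labels):
    """Return {label: (row, col)} centroids for each labeled sub-region."""
    out = {}
    for lab in np.unique(labels):
        rr, cc = np.nonzero(labels == lab)   # pixel coordinates of this sub-region
        out[lab] = (rr.mean(), cc.mean())    # mean position = centroid seed point
    return out

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
print(centroids(labels))  # {0: (0.5, 0.5), 1: (0.5, 2.5)}
```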
- the derived feature metrics are staining intensities, and where an average staining intensity for all pixels within each generated representational object outline is computed.
- the derived feature metrics are expression scores, and wherein average expression scores corresponding to areas within each generated sub-region are associated with the generated plurality of representational objects.
- the method further comprises retrieving the stored coordinates and associated feature metric data from the database, and projecting the retrieved data onto the image.
- the analysis results (e.g., intensity, area) within a corresponding sub-region can be stored in the form of average pixel measurements that are representative of the pixel data of that sub-region.
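Storing representational-object coordinates together with averaged per-sub-region measurements, as in step (e) of the method above, might be sketched as follows; the SQLite schema, table, and column names are illustrative assumptions only:

```python
# Hedged sketch: storing per-sub-region mean intensities keyed to centroid
# coordinates in a database. Schema and names are illustrative only.
import sqlite3
import numpy as np

intensity = np.array([[0.2, 0.4, 0.8, 0.6],
                      [0.2, 0.4, 0.8, 0.6]])   # toy stain-intensity image
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])              # two sub-regions

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (label INT, row REAL, col REAL, mean_intensity REAL)")
for lab in np.unique(labels):
    mask = labels == lab
    rr, cc = np.nonzero(mask)
    # Store the centroid coordinates with the averaged measurement.
    conn.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
                 (int(lab), rr.mean(), cc.mean(), float(intensity[mask].mean())))

rows = conn.execute("SELECT label, mean_intensity FROM results ORDER BY label").fetchall()
print(rows)  # approximately [(0, 0.3), (1, 0.7)]
```

Later retrieval for projection onto the image then reduces to a coordinate query against this table.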
- the biological sample is stained with a membrane stain. In some embodiments, the biological sample is stained with at least one of a membrane stain and a nuclear stain. In some embodiments, the biological sample is stained with at least FAP, and wherein the derived one or more feature metrics include at least one of a FAP staining intensity or a FAP percent positivity. In some embodiments, an average FAP percent positivity is calculated for all pixels within a sub-region. In some embodiments, an average FAP staining intensity is calculated for all pixels within a sub-region. In some embodiments, the sample is stained with FAP and H&E. In some embodiments, the sample is stained with FAP and another nuclear or membrane stain.
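The per-sub-region FAP percent positivity could be computed as sketched below, assuming (as an illustration only, not the patent's definition) that a pixel counts as FAP-positive when its unmixed FAP-channel intensity exceeds a threshold:

```python
# Hedged sketch: average FAP percent positivity over the pixels of one
# sub-region. The 0.5 positivity threshold is an illustrative assumption.
import numpy as np

fap = np.array([[0.1, 0.9, 0.8, 0.2]])   # toy FAP-channel intensities for one sub-region
threshold = 0.5

# Fraction of pixels exceeding the threshold, expressed as a percentage.
percent_positive = 100.0 * np.mean(fap > threshold)
print(percent_positive)  # 50.0
```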
- the images received as input are first unmixed into image channel images, e.g. an image channel image for a particular stain.
- a region-of-interest is selected prior to image analysis.
- a system for deriving data corresponding to irregularly-shaped cells from an image of a biological sample comprising at least one stain comprising: (i) one or more processors, and (ii) a memory coupled to the one or more processors, the memory to store computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: (a) deriving one or more feature metrics from the image; (b) generating a plurality of sub-regions within the image, each sub-region having pixels with similar characteristics, the characteristics selected from color, brightness, and/or texture; (c) computing a series of representational objects based on the generated plurality of sub-regions; and (d) associating the derived one or more feature metrics from the image with calculated coordinates of each of the series of computed representational objects.
- sub-regions are formed by grouping pixels that are (i) adjacent, (ii) have similar perceptually meaningful properties (e.g. color, brightness, and/or texture), are (iii) sufficiently homogenous with respect to biological properties (e.g. biological structures, staining properties of biological structures, cellular features, groups of cells).
- pixels in a sub-region have similar properties and descriptive statistics for the biological objects of interest, e.g. irregularly shaped cells including, but not limited to, fibroblasts and macrophages.
- the segmentation of the image into the plurality of sub-regions comprises deriving superpixels.
- the superpixels are derived using one of a graph-based approach or a gradient-ascent-based approach.
- the superpixels are derived by (i) grouping pixels with local k-means clustering; and (ii) using a connected components algorithm to merge small isolated regions into nearest large superpixels.
- the representational objects comprise outlines of sub-regions that meet a pre-defined staining intensity threshold. In some embodiments, the representational objects comprise seed points. In some embodiments, the system further comprises instructions for storing the derived one or more feature metrics and associated calculated representational object coordinates in a database. In some embodiments, the one or more derived feature metrics comprise at least one expression score selected from percent positivity, an H-score, or a staining intensity. In some embodiments, data corresponding to irregularly-shaped cells is derived for a region-of-interest within the image. In some embodiments, the region-of-interest is an area of the image annotated by a medical professional.
- a non-transitory computer-readable medium storing instructions for analyzing data associated with biological objects having irregular shapes, the instructions comprising: (a) instructions for deriving one or more feature metrics from an image of a biological sample, the biological sample comprising at least one stain; (b) instructions for partitioning the image into a series of sub-regions by grouping pixels having similar characteristics, the characteristics selected from color, brightness, and/or texture; (c) instructions for computing a plurality of representational objects based on the series of partitioned sub-regions; and (d) instructions for associating the derived one or more feature metrics from the image with calculated coordinates of each of the plurality of computed representational objects.
- the partitioning of the image into the series of sub-regions comprising computing superpixels.
- the superpixels are computed using one of a normalized cuts algorithm, an agglomerative clustering algorithm, a quick shift algorithm, a turbopixel algorithm, or a simple linear iterative clustering (SLIC) algorithm.
- the superpixels are generated using simple linear iterative clustering (SLIC), and wherein a superpixel size parameter is set to between about 40 pixels and about 400 pixels, and wherein a compactness parameter is set to between about 10 and about 100.
- the superpixels are computed by (i) grouping pixels with local k-means clustering; and (ii) using a connected components algorithm to merge small isolated regions into nearest large superpixels.
- the biological sample is stained with at least FAP, and wherein the derived one or more feature metrics include at least one of a FAP staining intensity or a FAP percent positivity.
- a FAP staining intensity is calculated for all pixels within a sub-region.
- an average FAP staining intensity is calculated for all pixels within a sub-region.
- the representational objects comprise at least one of polygon outlines and seed points.
- the memory includes instructions for storing the derived one or more feature metrics and associated calculated representational object coordinates in a database.
- the memory includes instructions for projecting stored information onto the image of the biological sample.
- the systems and methods are computationally efficient since the generated sub-regions reduce the complexity of images from several thousands of pixels to a smaller, more manageable number of sub-regions, allowing for significantly faster retrieval and reporting of analysis results.
- the sub-regions are representationally efficient since they are neither too small nor too large to store and represent analysis results.
- the systems and methods disclosed herein allow for enhanced accuracy, especially as compared with a low-resolution analysis approach, since the sub-regions generated describe properties or statistical information of biologically relevant objects of interest as compared with the storage of information from a larger regional representation (i.e. the sub-regions comprise pixels that are as uniform as possible in staining presence, staining intensity, and texture).
- FIG. 1 illustrates a representative digital pathology system including an image acquisition device and a computer system, in accordance with some embodiments.
- FIG. 2 sets forth various modules that can be utilized in a digital pathology system or within a digital pathology workflow, in accordance with some embodiments.
- FIG. 3 sets forth a flowchart illustrating the various steps of deriving image analysis data and associated such image analysis data with generated sub-regions, in accordance with some embodiments.
- FIG. 4 provides an example of digital pathology image of liver cancer cells at a high-level resolution, in accordance with some embodiments.
- Each annotation point can contain read-out information, e.g., descriptive statistics of the presence, absence, spatial relation, and staining properties of biological structures of interest.
- FIGS. 5A-5D illustrate the appearance of fibroblast cells that are morphologically heterogeneous with diverse appearances (e.g. irregular size, shape, and boundary of cells).
- normal and activated fibroblasts are illustrated in (A) and (C), respectively.
- (C) and (D) set forth haematoxylin and eosin (H&E) stained images of normal and activated fibroblasts, respectively.
- FIG. 6A sets forth an example of immunohistochemistry (IHC) of fibroblasts associated with tumor cells, where fibroblasts (610) are stained in purple and tumor (620) is stained in yellow. As shown, fibroblasts can touch and have a very irregular shape, extending beyond or around other cells.
- FIG. 6B sets forth an example of lower resolution polygon outlines (red, 630) for areas with positive fibroblast expression and exclusion regions (holes, 640) in cyan.
- FIG. 7 illustrates sub-regions (710) having a simple shape (e.g. circle) which may be associated with image data using the mid-resolution approach described herein.
- FIG. 8A sets forth an example of superpixels generated using SLIC in the fibroblast regions on an IHC image.
- FIG. 8B provides an original IHC image at high magnification, where tumor cells (830) are stained in yellow and fibroblasts (840) are stained in purple.
- FIG. 8C illustrates an initial shape of superpixels, which appear similar to squares before a regularization parameter is adjusted, in accordance with some embodiments.
- FIG. 8D illustrates a final representation of superpixels where the regularization parameter in the SLIC algorithm was adjusted, in accordance with some embodiments.
- FIG. 9A illustrates polygon outlines (black, 910) of the sub-regions (here, superpixels) belonging to the regions of interest (fibroblast regions), in accordance with some embodiments.
- FIG. 9B sets forth polygon outlines (black, 920) and center seeds (green dots, 930) for the sub-regions (superpixels) belonging to the biological objects of interest (fibroblasts), in accordance with some embodiments.
- FIG. 10A provides an example of a whole slide IHC image of head-and-neck cancer tissue stained with fibroblast-activation protein (FAP) for fibroblasts (1010) in purple and with pan-cytokeratin (PanCK) for epithelial tumor (1020) in yellow.
- FIG. 10B sets forth an example of polygon outlines of the superpixels (blue, 1030) belonging to the fibroblast regions, attached with their analysis results, which can be stored in a database.
- FIG. 11 sets forth an example of center seeds of the superpixels (red, 1140) belonging to the fibroblast regions, attached with their analysis results, which can be stored in a database.
- FIG. 12 provides an example of a histogram plot of FAP intensity retrieved from whole slide superpixels.
- FIG. 13 provides a flow chart illustrating the steps of region selection, in accordance with some embodiments.
- FIG. 14 sets forth six different annotation shapes and regions within an image of a biological sample.
- FIG. 15 illustrates the agreement of the percentage of the FAP-positive areas between (i) the FAP+ area determined using a high-resolution analysis approach and (ii) the FAP+ area determined using the example mid-resolution (sub-regional) approaches described herein.
- a method involving steps a, b, and c means that the method includes at least steps a, b, and c.
- steps and processes may be outlined herein in a particular order, but the skilled artisan will recognize that the ordering of steps and processes may vary.
- the phrase "at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
- At least one of A and B can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
- biological sample refers to any sample including a biomolecule (such as a protein, a peptide, a nucleic acid, a lipid, a carbohydrate, or a combination thereof) that is obtained from any organism including viruses.
- Other examples of organisms include mammals (such as humans; veterinary animals like cats, dogs, horses, cattle, and swine; and laboratory animals like mice, rats and primates), insects, annelids, arachnids, marsupials, reptiles, amphibians, bacteria, and fungi.
- Biological samples include tissue samples (such as tissue sections and needle biopsies of tissue), cell samples (such as cytological smears such as Pap smears or blood smears or samples of cells obtained by microdissection), or cell fractions, fragments or organelles (such as obtained by lysing cells and separating their components by centrifugation or otherwise).
- biological samples include blood, serum, urine, semen, fecal matter, cerebrospinal fluid, interstitial fluid, mucous, tears, sweat, pus, biopsied tissue (for example, obtained by a surgical biopsy or a needle biopsy), nipple aspirates, cerumen, milk, vaginal fluid, saliva, swabs (such as buccal swabs), or any material containing biomolecules that is derived from a first biological sample.
- the term "biological sample” as used herein refers to a sample (such as a homogenized or liquefied sample) prepared from a tumor or a portion thereof obtained from a subject.
- biomarker refers to a measurable indicator of some biological state or condition.
- a biomarker may be a protein or peptide, e.g. a surface protein, that can be specifically stained and which is indicative of a biological feature of the cell, e.g. the cell type or the physiological state of the cell.
- An immune cell marker is a biomarker that is selectively indicative of a feature that relates to an immune response of a mammal.
- a biomarker may be used to determine how well the body responds to a treatment for a disease or condition or if the subject is predisposed to a disease or condition.
- a biomarker refers to a biological substance that is indicative of the presence of cancer in the body.
- a biomarker may be a molecule secreted by a tumor or a specific response of the body to the presence of cancer.
- Genetic, epigenetic, proteomic, glycomic, and imaging biomarkers can be used for cancer diagnosis, prognosis, and epidemiology. Such biomarkers can be assayed in non-invasively collected biofluids like blood or serum.
- Biomarkers may be useful as diagnostics (to identify early stage cancers) and/or prognostics (to forecast how aggressive a cancer is and/or predict how a subject will respond to a particular treatment and/or how likely a cancer is to recur).
- image data encompasses raw image data acquired from the biological sample, such as by means of an optical sensor or sensor array, or pre-processed image data.
- the image data may comprise a pixel matrix.
- immunohistochemistry refers to a method of determining the presence or distribution of an antigen in a sample by detecting interaction of the antigen with a specific binding agent, such as an antibody. A sample is contacted with an antibody under conditions permitting antibody-antigen binding.
- Antibody-antigen binding can be detected by means of a detectable label conjugated to the antibody (direct detection) or by means of a detectable label conjugated to a secondary antibody, which binds specifically to the primary antibody (indirect detection).
- a "mask” as used herein is a derivative of a digital image wherein each pixel in the mask is represented as a binary value, e.g. "1" or "0" (or “true” or “false”).
- a mask can be generated from an original digital image by assigning all pixels of the original image with an intensity value above a threshold to true and otherwise false, thereby creating a mask that will filter out all pixels overlaid by a "false” masked pixel.
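A minimal example of such a threshold-derived mask, assuming NumPy; pixels above the threshold become true, all others false:

```python
# Minimal example of the mask described above: thresholding an intensity
# image into a binary (True/False) mask.
import numpy as np

img = np.array([[10, 200],
                [180, 30]])
mask = img > 128   # True where intensity exceeds the threshold
print(mask.tolist())  # [[False, True], [True, False]]
```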
- a "multi-channel image” as understood herein encompasses a digital image obtained from a biological tissue sample in which different biological structures, such as nuclei and tissue structures, are simultaneously stained with specific fluorescent dyes, quantum dots, chromogens, etc., each of which fluoresces or are otherwise detectable in a different spectral band thus constituting one of the channels of the multi-channel image.
- Applicants have developed a system and method of storing analysis results of biological objects having irregular shapes, including, for example, fibroblasts or macrophages in a database or other non-transitory memory.
- the analysis results may be subsequently retrieved from the database or memory for further analysis or for use in other downstream processes.
- the analysis results may also be projected onto input images or other derived images; or visualized by other means.
- the present disclosure also allows for the ability to adjust the size of generated sub-regions (e.g. by increasing or decreasing the size of a simple shape; or adjusting a parameter of a superpixels algorithm), facilitating the storage and reporting of analysis results with an adjustable level of detail. This is believed to allow for increased efficiencies and accuracies as compared with the low-resolution analysis approach described herein where an average analysis result from a global region of interest is saved.
- the disclosed systems and methods are based on a mid-resolution analysis approach using locally similar small regions (sub-regions) to store analysis results.
- the sub-regions can be a simple shape (e.g., circle, square) or a complex shape (e.g., superpixels) and are utilized to store local analysis results of each small region across an entire slide.
- the sub-regions defined by the present mid-resolution approach group pixels having similar (or homogeneous) properties, e.g. staining presence (i.e. the presence or absence of a particular stain), staining intensity (i.e. the relative intensity (or amount) of a stain), and local texture.
- a sub-region within the mid-resolution approach has a size ranging from about 50 to about 100 pixels per side, or a pixel area between about 2,500 pixels² and about 10,000 pixels².
- the sub- region may have any size and the size may be based on the type of analysis being conducted and/or the type of cells being studied.
- a mid-level approach falls between the high- and low-resolution analysis approaches described herein, such that data is collected on a sub-regional level, the sub-regions being smaller in proportion than the regions of interest in a low-resolution analysis, and larger than a pixel as in a high-resolution analysis approach.
- by "high resolution analysis" it is meant image data captured at a pixel level or substantially at the pixel level.
- "low resolution analysis" refers to a regional-level analysis, such as a region having a size of at least 500 pixels by 500 pixels or an area having a size of greater than 250,000 pixels².
- the low-resolution analysis approach would encompass many biological objects, e.g. a plurality of irregularly-shaped cells.
- the present disclosure may refer to the analysis and storage of biological objects having irregular shapes and/or sizes, including fibroblasts or macrophages. It is to be understood that the present disclosure is not to be limited to fibroblasts or macrophages, but may be extended to any biological object having a non-well-defined size or shape.
- fibroblasts are cells that make up the structural framework or stroma composed of the extracellular matrix and collagen in animal tissues. These cells are the most common type of connective tissue in animals, and are important for wound healing. Fibroblasts come in various shapes and sizes, as well as in an activated and non-activated form (see, e.g. FIGS. 5A - 5D). Fibroblasts are the activated form (the suffix "blast” refers to a metabolically active cell), while fibrocytes are considered less active. However, sometimes both fibroblasts and fibrocytes are not designated as being different and are simply referred to as fibroblasts.
- fibroblasts can be distinguished from fibrocytes by their abundance of rough endoplasmic reticulum and relatively larger size. Moreover, fibroblasts are believed to make contact with their neighbors, and the contacts are believed to be adhesions which may distort the form of the isolated cell.
- the mid-resolution analysis approach provided herein is able to account for these morphological differences and is believed to be well-suited to store information pertaining to fibroblasts, macrophages, and other irregular biological objects.
- the digital pathology system 200 may comprise an imaging apparatus 12 (e.g. an apparatus having means for scanning a specimen-bearing microscope slide) and a computer 14, whereby the imaging apparatus 12 and computer may be communicatively coupled together (e.g. directly, or indirectly over a network 20).
- the computer system 14 can include a desktop computer, a laptop computer, a tablet, or the like, digital electronic circuitry, firmware, hardware, memory, a computer storage medium, a computer program or set of instructions (e.g. where the program is stored within the memory or storage medium), one or more processors (including a programmed processor), and any other hardware, software, or firmware modules or combinations thereof.
- the computing system 14 illustrated in FIG. 1 may comprise a computer with a display device 16 and an enclosure 18.
- the computer can store digital images in binary form (locally, such as in a memory, on a server, or another network connected device).
- the digital images can also be divided into a matrix of pixels.
- the pixels can include a digital value of one or more bits, defined by the bit depth.
- additional components, e.g. specimen analyzers, microscopes, other imaging systems, automated slide preparation equipment, etc., may also be included in the digital pathology system.
- the imaging apparatus 12 can include, without limitation, one or more image capture devices.
- Image capture devices can include, without limitation, a camera (e.g., an analog camera, a digital camera, etc.), optics (e.g., one or more lenses, sensor focus lens groups, microscope objectives, etc.), imaging sensors (e.g., a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS) image sensor, or the like), photographic film, or the like.
- the image capture device can include a plurality of lenses that cooperate to provide on-the-fly focusing.
- An image sensor, for example a CCD sensor, can capture a digital image of the specimen.
- the imaging apparatus 12 is a brightfield imaging system, a multispectral imaging (MSI) system or a fluorescent microscopy system.
- the digitized tissue data may be generated, for example, by an image scanning system, such as a VENTANA iScan HT scanner by VENTANA MEDICAL SYSTEMS, Inc. (Tucson, Arizona) or other suitable imaging equipment. Additional imaging devices and systems are described further herein.
- the digital color image acquired by the imaging apparatus 12 can be conventionally composed of elementary color pixels. Each colored pixel can be coded over three digital components, each comprising the same number of bits, each component corresponding to a primary color, generally red, green or blue, also denoted by the term "RGB" components.
- FIG. 2 provides an overview of the various modules utilized within the presently disclosed digital pathology system.
- the digital pathology system employs a computer device 200 or computer-implemented method having one or more processors 203 and at least one memory 201, the at least one memory 201 storing non-transitory computer-readable instructions for execution by the one or more processors to cause the one or more processors to execute instructions (or stored data) in one or more modules (e.g. modules 202, and 205 through 209).
- the present disclosure provides a computer-implemented method of analyzing and/or storing analysis results of biological objects having irregular shapes, including, for example, fibroblasts or macrophages in a database or other non-transitory memory.
- the method may include, for example, (a) running an image acquisition module 202 to generate or receive multi-channel image data, e.g.
- an acquired image of a biological sample stained with one or more stains (step 300); (b) running an image analysis module 205 to derive one or more metrics from features within the acquired image (step 310); (c) running a segmentation module 206 to segment the acquired image into a plurality of sub-regions (step 320); (d) running a representational object generation module 207 to generate polygons, center seeds, or other objects identifying the sub-regions (step 330); (e) running a labeling module 208 to associate the derived one or more metrics with the generated representational objects (step 340); and (f) storing the representational objects and associated metrics in a database 209 (step 350).
- additional modules or databases may be incorporated into the workflow.
- an image processing module may be run to apply certain filters to the acquired images or to identify certain histological and/or morphological structures within the tissue samples.
- a region of interest selection module may be utilized to select a particular portion of an image for analysis.
- an unmixing module may be run to provide image channel images corresponding to a particular stain or biomarker.
- the digital pathology system 200 runs an image acquisition module 202 to capture images or image data of a biological sample having one or more stains (step 300).
- the images received or acquired are RGB images or multispectral images (e.g. multiplex brightfield and/or darkfield images).
- the images captured are stored in memory 201.
- the images or image data may be acquired using the imaging apparatus 12, such as in real-time.
- the images are acquired from a microscope or other instrument capable of capturing image data of a specimen-bearing microscope slide, as noted herein.
- the images are acquired using a 2D scanner, such as one capable of scanning image tiles, or a line scanner capable of scanning the image in a line-by-line manner, such as the VENTANA DP 200 scanner.
- the images may be images that have been previously acquired (e.g. scanned) and stored in a memory 201 (or, for that matter, retrieved from a server via network 20).
- the biological sample may be stained through application of one or more stains, and the resulting image or image data comprises signals corresponding to each of the one or more stains.
- while the systems and methods described herein may estimate or normalize to a single stain, e.g. hematoxylin, there exists no limit on the number of stains within the biological sample. Indeed, the biological sample may have been stained in a multiplex assay for two or more stains, in addition to or including any counterstains.
- a biological sample may be stained for different types of nuclei and/or cell membrane biomarkers.
- Methods for staining tissue structures and guidance in the choice of stains appropriate for various purposes are discussed, for example, in “Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989)” and “Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Intersciences (1987),” the disclosures of which are incorporated herein by reference.
- the tissue sample is stained in an IHC assay for the presence of one or more biomarkers including a fibroblast activation protein (FAP).
- Over-expression of FAP in fibroblastic cell lines is believed to promote malignant behavior. It has been shown that stromal fibroblasts, which are an essential component of the tumor microenvironment and which have often been designated as cancer-associated fibroblasts (CAFs), can promote tumorigenesis and progression through multiple mechanisms, including proliferation, angiogenesis, invasion, survival and immune suppression.
- cancer cells activate stromal fibroblasts and induce the expression of FAP, which in turn, affects the proliferation, invasion and migration of the cancer cells.
- FAP is heavily expressed on reactive stromal fibroblasts in 90% of human epithelial carcinomas, including those of the breast, lung, colorectal, ovary, pancreas, and head and neck.
- the quantity of FAP most likely presents an important prognosis for the clinical behavior of tumors (and this is an example of one type of metrics that may be derived and later associated with a generated sub-region or representational object).
- Chromogenic stains may comprise Hematoxylin, Eosin, Fast Red, or 3,3'-Diaminobenzidine (DAB).
- any biological sample may also be stained with one or more fluorophores.
- the tissue sample is stained with a primary stain (e.g. hematoxylin).
- the tissue sample is stained in an IHC assay for a particular biomarker.
- the samples may also be stained with one or more fluorescent dyes.
- a typical biological sample is processed in an automated staining/assay platform that applies a stain to the sample.
- the staining/assay platform may also include a bright field microscope, such as the VENTANA iScan HT or the VENTANA DP 200 scanners of Ventana Medical Systems, Inc., or any microscope having one or more objective lenses and a digital imager. Other techniques for capturing images at different wavelengths may be used.
- Further camera platforms suitable for imaging stained biological specimens are known in the art and commercially available from companies such as Zeiss, Canon, Applied Spectral Imaging, and others, and such platforms are readily adaptable for use in the system, methods and apparatus of this subject disclosure.
- the input images are masked such that only tissue regions are present in the images.
- a tissue region mask is generated to mask non- tissue regions from tissue regions.
- a tissue region mask may be created by identifying the tissue regions and automatically or semi-automatically (i.e., with minimal user input) excluding the background regions (e.g. regions of a whole slide image corresponding to glass with no sample, such as where there exists only white light from the imaging source).
- the tissue masking module may also mask other areas of interest as needed, such as a portion of a tissue identified as belonging to a certain tissue type or belonging to a suspected tumor region.
- a segmentation technique is used to generate the tissue region masked images by masking tissue regions from non-tissue regions in the input images.
- Suitable segmentation techniques are as such known from the prior art, (cf. Digital Image Processing, Third Edition, Rafael C. Gonzalez, Richard E. Woods, chapter 10, page 689 and Handbook of Medical Imaging, Processing and Analysis, Isaac N. Bankman Academic Press, 2000, chapter 2).
- an image segmentation technique is utilized to distinguish between the digitized tissue data and the slide in the image, the tissue corresponding to the foreground and the slide corresponding to the background.
- the component computes the Area of Interest (AOI) in a whole slide image in order to detect all tissue regions in the AOI while limiting the amount of background non-tissue area that is analyzed.
- a wide range of image segmentation techniques may be used, e.g., HSV color-based image segmentation, Lab image segmentation, mean-shift color image segmentation, region growing, level set methods, fast marching methods, etc.
- the component can also generate a tissue foreground mask that can be used to identify those portions of the digitized slide data that correspond to the tissue data.
- the component can generate a background mask used to identify those portions of the digitized slide data that do not correspond to the tissue data.
- the tissue region mask may be used to remove non-tissue background noise, i.e. the non-tissue regions, from the image.
- the generation of the tissue region mask comprises one or more of the following operations (but is not limited to them): computing the luminance of the low-resolution input image to produce a luminance image; applying a standard deviation filter to the luminance image to produce a filtered luminance image; and applying a threshold to the filtered luminance image, such that pixels with a luminance above a given threshold are set to one and pixels below the threshold are set to zero, producing the tissue region mask.
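- the masking steps described above can be sketched as follows; the window size, luminance weights (ITU-R BT.601), and threshold value are illustrative assumptions, not values taken from this disclosure:

```python
import numpy as np
from scipy import ndimage

def tissue_region_mask(rgb, window=5, threshold=2.0):
    """Sketch: luminance image -> local standard-deviation filter ->
    binary threshold. Window size and threshold are illustrative."""
    # Luminance from RGB using BT.601 weights (an assumed choice).
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Local standard deviation via sqrt(E[x^2] - E[x]^2) over a sliding window.
    mean = ndimage.uniform_filter(lum, size=window)
    mean_sq = ndimage.uniform_filter(lum ** 2, size=window)
    std = np.sqrt(np.clip(mean_sq - mean ** 2, 0, None))
    # Textured (high local std) pixels are tissue (1); flat background is 0.
    return (std > threshold).astype(np.uint8)
```

In practice the filtered image separates textured tissue from the flat white glass background, which has near-zero local variation.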
- a region of interest identification module may be used to select a portion of the biological sample for which an image or for which image data should be acquired, e.g. a region of interest having a large concentration of fibroblast cells.
- FIG. 13 provides a flow chart illustrating the steps of region selection, in accordance with some embodiments.
- the region selection module receives an identified region of interest or field of view.
- the region of interest is identified by a user of a system of the present disclosure, or another system communicatively coupled to a system of the present disclosure.
- the region selection module retrieves a location or identification of a region of interest from a storage/memory.
- the region selection module automatically generates a field of view (FOV) or a region of interest (ROI), for example, via methods described in PCT/EP2015/062015, the disclosure of which is hereby incorporated by reference herein in its entirety.
- the region of interest is automatically determined by the system based on some predetermined criteria or characteristics that are in or of the image (e.g. for a biological sample stained with more than two stains, identifying an area of the image that comprises just two stains).
- the region selection module outputs the ROI.
- certain metrics are derived from features within the images received as input (step 300) (see FIG. 3).
- the derived metrics may be correlated with the sub-regions generated herein (steps 320, 330, and 340), and together the metrics (or averages, standard deviations, etc. thereof) and sub-region locations may be stored in a database (step 350) for later retrieval and/or downstream processing.
- the procedures and algorithms described herein may be adapted to derive metrics from and/or classify various types of cells or cell nuclei, including deriving metrics from fibroblasts and/or macrophages.
- the metrics are derived by detecting nuclei within the input image and/or by extracting features from the detected nuclei (such as from image patches surrounding the detected nuclei) and/or from cell membranes (depending, of course, on the biomarker(s) utilized within the input image).
- metrics are derived by analyzing cell membrane staining, cell cytoplasm staining, and/or punctuate staining (e.g. to distinguish between membrane-staining areas and non-membrane staining areas).
- cytoplasmic staining refers to a group of pixels arranged in a pattern bearing the morphological characteristics of a cytoplasmic region of a cell.
- the term “membrane staining” refers to a group of pixels arranged in a pattern bearing the morphological characteristics of a cell membrane.
- the term “punctate staining” refers to a group of pixels with strong localized intensity of staining appearing as spots/dots scattering on the membrane area of the cell.
- the nucleus, cytoplasm and membrane of a cell have different characteristics and that differently stained tissue samples may reveal different biological features. Indeed, the skilled artisan will appreciate that certain cell surface receptors can have staining patterns localized to the membrane, or localized to the cytoplasm.
- a "membrane" staining pattern is analytically distinct from a “cytoplasmic” staining pattern.
- a "cytoplasmic” staining pattern and a “nuclear” staining pattern are analytically distinct.
- stromal cells may be strongly stained by FAP
- tumor epithelial cells may be strongly stained by EpCAM
- cytokeratins may be stained by panCK.
- the '927 Patent describes an automated method for simultaneously identifying a plurality of pixels in an input image of a biological tissue stained with a biomarker, including considering a first color plane of a plurality of pixels in a foreground of the input image for simultaneous identification of cell cytoplasm and cell membrane pixels, wherein the input image has been processed to remove background portions of the input image and to remove counterstained components of the input image; determining a threshold level between cell cytoplasm and cell membrane pixels in the foreground of the digital image; and determining simultaneously with a selected pixel and its eight neighbors from the foreground if the selected pixel is a cell cytoplasm pixel, a cell membrane pixel or a transitional pixel in the digital image using the determined threshold level.
- the '927 Patent further describes that the step of determining simultaneously with a selected pixel and its eight neighbors includes: determining a square root of a product of the selected pixel with its eight neighboring pixels; comparing the product to the determined threshold level; incrementing a first counter for a cell membrane, a second counter for cell cytoplasm or a third counter for transitional pixel based on the comparison; determining whether the first counter, second counter or third counter exceeds a pre- determined maximum value, and if so, classifying the selected pixel based on a counter that exceeds the predetermined maximum value.
- the '927 Patent provides examples on scoring cytoplasm and membranes, such as based on computed cytoplasm pixel volume indexes, cytoplasm pixel median intensity, membrane pixel volume, and membrane pixel median intensity, respectively.
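- the neighborhood test described above can be sketched roughly as follows; the per-neighbor geometric-mean comparison, the "transitional" band, and the counter maximum are illustrative assumptions rather than the '927 Patent's exact procedure:

```python
import numpy as np

def classify_foreground_pixel(img, r, c, threshold, band=5.0, max_count=4):
    """Illustrative sketch: for the selected pixel, compute
    sqrt(selected * neighbour) for each of its eight neighbours, compare to
    the threshold, increment per-class counters, and classify once a counter
    exceeds max_count. `band` and `max_count` are assumed values."""
    counts = {"membrane": 0, "cytoplasm": 0, "transitional": 0}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Square root of the product of the selected pixel and a neighbour.
            g = np.sqrt(float(img[r, c]) * float(img[r + dr, c + dc]))
            if abs(g - threshold) <= band:
                counts["transitional"] += 1
            elif g > threshold:
                counts["membrane"] += 1
            else:
                counts["cytoplasm"] += 1
            for label, n in counts.items():
                if n > max_count:
                    return label
    return max(counts, key=counts.get)
```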
- the ⁇ 80 Publication describes a method of quantifying analyte staining of a biological compartment in a region in which the staining is intermixed with analyte staining of an analytically-distinct different biological compartment (e.g.
- scoring is performed on classified nuclei, resulting in a percent positivity metric or an H-score metric for a particular biomarker.
- after identifying nuclei, corresponding cells may be identified.
- cells are scored by associating respective nuclei with a stained membrane around them. Based on the presence of a stained membrane surrounding the nuclei, a cell may be classified, e.g. as non-stained (no stained membrane found around the nucleus), partially stained (the nucleus of the cell is partially surrounded by the stained membrane), or completely stained (the nucleus is completely surrounded by a stained membrane).
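- as an illustration of such scoring, a percent positivity and an H-score can be computed from per-cell staining-intensity bins; the 0/1+/2+/3+ binning is the conventional IHC scheme and is assumed here for illustration:

```python
def h_score(cell_bins):
    """H-score from per-cell intensity bins: `cell_bins` maps the bin value
    (0, 1, 2, 3 for 0/1+/2+/3+) to the number of cells in that bin. Each
    positive bin's percentage of cells is weighted by the bin value,
    giving a score in [0, 300]."""
    total = sum(cell_bins.values())
    return sum(bin_ * 100.0 * n / total for bin_, n in cell_bins.items())

def percent_positivity(cell_bins):
    """Percentage of cells falling in any positive bin (1+, 2+, 3+)."""
    total = sum(cell_bins.values())
    positive = sum(n for bin_, n in cell_bins.items() if bin_ > 0)
    return 100.0 * positive / total
```

For example, 50 negative cells with 20 cells at 1+, 20 at 2+, and 10 at 3+ yield an H-score of 90 and 50% positivity.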
- tumor nuclei are automatically identified by first identifying candidate nuclei and then automatically distinguishing between tumor nuclei and non-tumor nuclei.
- Numerous methods of identifying candidate nuclei in images of tissue are known in the art.
- automatic candidate nucleus detection can be performed by applying a radial-symmetry-based method, such as the method of Parvin et al. described herein, on the Hematoxylin image channel or on a biomarker image channel obtained using color deconvolution as described by Ruifrok et al., also described herein.
- a radial symmetry based nuclei detection operation is used as described in commonly-assigned and co-pending patent application WO2014140085A1, the entirety of which is incorporated herein by reference.
- Other methods are discussed in US Patent Publication No. 2017/0140246, the disclosure of which is incorporated by reference herein.
- once candidate nuclei are identified, they are further analyzed to distinguish tumor nuclei from other candidate nuclei.
- the other candidate nuclei may be further classified (for example, by identifying lymphocyte nuclei and stroma nuclei).
- a learnt supervised classifier is applied to identify tumor nuclei.
- the learnt supervised classifier is trained on nuclei features to identify tumor nuclei and then applied to classify the nucleus candidate in the test image as either a tumor nucleus or a non-tumor nucleus.
- the learnt supervised classifier may be further trained to distinguish between different classes of non-tumor nuclei, such as lymphocyte nuclei and stromal nuclei.
- the learnt supervised classifier used to identify tumor nuclei is a random forest classifier.
- the random forest classifier may be trained by: (i) creating a training set of tumor and non-tumor nuclei, (ii) extracting features for each nucleus, and (iii) training the random forest classifier to distinguish between tumor nuclei and non-tumor nuclei based on the extracted features.
- the trained random forest classifier may then be applied to classify the nuclei in a test image into tumor nuclei and non-tumor nuclei.
- the random forest classifier may be further trained to distinguish between different classes of non-tumor nuclei, such as lymphocyte nuclei and stromal nuclei.
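- the training and application steps (i)-(iii) above can be sketched with a synthetic example; the two feature columns (nucleus area and mean stain intensity) are assumed stand-ins for the extracted nuclear features, not features specified by this disclosure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# (i) Create a synthetic training set of tumor and non-tumor nuclei.
rng = np.random.default_rng(42)
tumor = np.column_stack([rng.normal(80, 5, 100), rng.normal(0.8, 0.05, 100)])
non_tumor = np.column_stack([rng.normal(40, 5, 100), rng.normal(0.3, 0.05, 100)])

# (ii) Stack the extracted features; label 1 = tumor nucleus, 0 = non-tumor.
X = np.vstack([tumor, non_tumor])
y = np.array([1] * 100 + [0] * 100)

# (iii) Train the random forest to distinguish the two classes.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Apply the trained classifier to unseen candidate nuclei.
pred = forest.predict([[82.0, 0.78], [38.0, 0.32]])
```

Extending this to further classes (e.g. lymphocyte vs. stromal nuclei) only requires additional labels in `y`.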
- the images received as input are processed such as to detect nucleus centers (seeds) and/or to segment the nuclei.
- instructions may be provided to detect nucleus centers based on radial-symmetry voting using techniques commonly known to those of ordinary skill in the art (see Parvin, Bahram, et al. "Iterative voting for inference of structural saliency and characterization of subcellular events." Image Processing, IEEE Transactions on 16.3 (2007): 615-623, the disclosure of which is incorporated by reference in its entirety herein).
- nuclei are detected using radial symmetry to detect centers of nuclei and then the nuclei are classified based on the intensity of stains around the cell centers.
- an image magnitude may be computed within an image and one or more votes at each pixel are accumulated by adding the summation of the magnitude within a selected region.
- Mean shift clustering may be used to find the local centers in the region, with the local centers representing actual nuclear locations.
- Nuclei detection based on radial symmetry voting is executed on color image intensity data and makes explicit use of the a priori domain knowledge that the nuclei are elliptical shaped blobs with varying sizes and eccentricities. To accomplish this, along with color intensities in the input image, image gradient information is also used in radial symmetry voting and combined with an adaptive segmentation process to precisely detect and localize the cell nuclei.
- a “gradient” as used herein is, for example, the intensity gradient of pixels calculated for a particular pixel by taking into consideration an intensity value gradient of a set of pixels surrounding said particular pixel.
- Each gradient may have a particular "orientation" relative to a coordinate system whose x- and y- axis are defined by two orthogonal edges of the digital image.
- nuclei seed detection involves defining a seed as a point which is assumed to lie inside a cell nucleus and which serves as the starting point for localizing the cell nuclei.
- the first step is to detect seed points associated with each cell nucleus using a highly robust approach based on radial symmetry to detect elliptical-shaped blobs, structures resembling cell nuclei.
- the radial symmetry approach operates on the gradient image using a kernel based voting procedure.
- a voting response matrix is created by processing each pixel that accumulates a vote through a voting kernel.
- the kernel is based on the gradient direction computed at that particular pixel, an expected range of minimum and maximum nucleus sizes, and a voting kernel angle (typically in the range [π/4, π/8]).
- Nuclei may be identified using other techniques known to those of ordinary skill in the art. For example, an image magnitude may be computed from a particular image channel of one of the H&E or IHC images, and each pixel around a specified magnitude may be assigned a number of votes that is based on a summation of the magnitude within a region around the pixel. Alternatively, a mean shift clustering operation may be performed to find the local centers within a voting image, which represents the actual location of the nucleus. In other embodiments, nuclear segmentation may be used to segment the entire nucleus based on the now-known centers of the nuclei via morphological operations and local thresholding. In yet other embodiments, model based segmentation may be utilized to detect nuclei (i.e. learning the shape model of the nuclei from a training data set and using that as the prior knowledge to segment the nuclei in the testing image).
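- a much-simplified sketch of gradient-based radial-symmetry voting follows; the cone-shaped voting kernel of the full method is reduced here to a single inward ray, and the radii and gradient threshold are illustrative assumptions:

```python
import numpy as np

def radial_symmetry_votes(gray, r_min=2, r_max=6, grad_thresh=10.0):
    """Minimal sketch: every pixel with a strong gradient casts votes along
    its inward gradient direction at radii in [r_min, r_max]; the centres of
    roughly circular dark blobs (nuclei) accumulate the most votes."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    votes = np.zeros_like(mag)
    h, w = gray.shape
    ys, xs = np.nonzero(mag > grad_thresh)
    for y, x in zip(ys, xs):
        # Unit vector pointing "inward", towards the darker nucleus interior.
        uy, ux = -gy[y, x] / mag[y, x], -gx[y, x] / mag[y, x]
        for r in range(r_min, r_max + 1):
            vy, vx = int(round(y + r * uy)), int(round(x + r * ux))
            if 0 <= vy < h and 0 <= vx < w:
                votes[vy, vx] += mag[y, x]
    return votes
```

Peaks of the returned vote image (e.g. after mean-shift clustering or non-maximum suppression) then serve as nucleus seed candidates.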
- the nuclei are then subsequently segmented using thresholds individually computed for each nucleus.
- Otsu's method may be used for segmentation in a region around an identified nucleus since it is believed that the pixel intensity in the nuclear regions varies.
- Otsu's method is used to determine an optimal threshold by minimizing the intra-class variance and is known to those of skill in the art. More specifically, Otsu's method is used to automatically perform clustering-based image thresholding, or the reduction of a gray level image to a binary image. The algorithm assumes that the image contains two classes of pixels following a bi-modal histogram (foreground pixels and background pixels). It then calculates the optimum threshold separating the two classes such that their combined spread (intra-class variance) is minimal, or equivalently (because the sum of pairwise squared distances is constant) such that their inter-class variance is maximal.
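- the thresholding just described can be sketched directly; this is a generic textbook implementation for illustration, not code from this disclosure:

```python
import numpy as np

def otsu_threshold(pixels, n_bins=256):
    """Choose the grey level that maximises the between-class (inter-class)
    variance of the two resulting classes, which is equivalent to minimising
    their combined intra-class variance."""
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0, n_bins))
    hist = hist.astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(n_bins), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0    # cumulative weight of the lower (background) class
    sum0 = 0.0  # cumulative intensity sum of the lower class
    for t in range(n_bins):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                      # mean of the lower class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the upper class
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Applied per-nucleus to a region around each identified seed, this yields the individually computed thresholds mentioned above.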
- the systems and methods further comprise automatically analyzing spectral and/or shape features of the identified nuclei in an image for identifying nuclei of non-tumor cells.
- blobs may be identified in the first digital image in a first step.
- a "blob" as used herein can be, for example, a region of a digital image in which some properties, e.g. the intensity or grey value, are constant or vary within a prescribed range of values. All pixels in a blob can be considered in some sense to be similar to each other.
- blobs may be identified using differential methods which are based on derivatives of a function of position on the digital image, and methods based on local extrema.
- a nuclear blob is a blob whose pixels and/or whose outline shape indicate that the blob was probably generated by a nucleus stained with the first stain.
- the radial symmetry of a blob could be evaluated to determine if the blob should be identified as a nuclear blob or as any other structure, e.g. a staining artifact.
- said blob may not be identified as a nuclear blob but rather as a staining artifact.
- a blob identified to be a "nuclear blob” may represent a set of pixels which are identified as candidate nuclei and which may be further analyzed for determining if said nuclear blob represents a nucleus.
- any kind of nuclear blob is directly used as an "identified nucleus.”
- filtering operations are applied on the identified nuclei or nuclear blobs for identifying nuclei which do not belong to biomarker-positive tumor cells and for removing said identified non-tumor nuclei from the list of already identified nuclei or not adding said nuclei to the list of identified nuclei from the beginning.
- additional spectral and/or shape features of the identified nuclear blob may be analyzed to determine if the nucleus or nuclear blob is a nucleus of a tumor cell or not.
- the nucleus of a lymphocyte is larger than the nucleus of other tissue cells, e.g. of a lung cell.
- nuclei of lymphocytes are identified by identifying all nuclear blobs of a minimum size or diameter which is significantly larger than the average size or diameter of a normal lung cell nucleus.
- the identified nuclear blobs relating to the nuclei of lymphocytes may be removed (i.e., "filtered out") from the set of already identified nuclei.
- by filtering out the nuclei of non-tumor cells, the accuracy of the method may be increased.
- non-tumor cells may express the biomarker to a certain extent, and may therefore produce an intensity signal in the first digital image which does not stem from a tumor cell.
- the accuracy of identifying biomarker-positive tumor cells may be increased.
- marker based watershed algorithms can also be used to identify the nuclei blobs around the detected nuclei centers.
- the system can use at least one image characteristic metric and at least one morphology metric to determine whether a feature within an image corresponds to a structure of interest (collectively, "feature metrics").
- image characteristic metrics can include, for example, feature size, feature color, feature orientation, feature shape, relation or distance between features (e.g., adjacent features), relation or distance of a feature relative to another anatomical structure, or the like.
- Image characteristic metrics, morphology metrics, and other metrics can be used to train a classifier as described herein. Specific examples of metrics derived from image features are set forth below:
- a "morphology feature" as used herein is, for example, a feature being indicative of the shape or dimensions of a nucleus. Without wishing to be bound by any particular theory, it is believed that morphological features provide some vital information about the size and shape of a cell or its nucleus. For example, a morphology feature may be computed by applying various image analysis algorithms on pixels contained in or surrounding a nuclear blob or seed. In some embodiments, the morphology features include area, minor and major axis lengths, perimeter, radius, solidity, etc.
- an "appearance feature” as used herein is, for example, a feature having been computed for a particular nucleus by comparing pixel intensity values of pixels contained in or surrounding a nuclear blob or seed used for identifying the nucleus, whereby the compared pixel intensities are derived from different image channels (e.g. a background channel, a channel for the staining of a biomarker, etc.).
- the metrics derived from appearance features are computed from percentile values (e.g. the 10th, 50th, and 95th percentile values) of pixel intensities and of gradient magnitudes computed from different image channels.
- Computing appearance feature metrics may be advantageous since the derived metrics may describe the properties of the nuclear regions as well as describe the membrane region around the nuclei.
- a "background feature" is, for example, a feature being indicative of the appearance and/or stain presence in cytoplasm and cell membrane features of the cell comprising the nucleus for which the background feature was extracted from the image.
- a background feature and a corresponding metric can be computed for a nucleus and a corresponding cell depicted in a digital image, e.g. by identifying a nuclear blob or seed representing the nucleus and analyzing a pixel area directly adjacent to it (e.g. a ribbon of 20 pixels, about 9 microns, in thickness around the nuclear blob boundary), thereby capturing appearance and stain presence in the cytoplasm and membrane of the cell having this nucleus together with areas directly adjacent to the cell.
- These metrics are similar to the nuclear appearance features, but are computed in a ribbon of about 20 pixels (about 9 microns) thickness around each nucleus boundary, therefore capturing the appearance and stain presence in the cytoplasm and membrane of the cell having the identified nucleus together with areas directly adjacent to the cell.
- the ribbon size is selected because it is believed that it captures a sufficient amount of background tissue area around the nuclei that can be used to provide useful information for nuclei discrimination.
- metrics derived from color include color ratios, e.g. R/(R+G+B), or color principal components.
- metrics derived from color include local statistics of each of the colors (mean/median/variance/ std dev) and/or color intensity correlations in a local image window.
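- the color-derived metrics above can be sketched as follows; the function name and the particular selection of statistics are illustrative assumptions:

```python
import numpy as np

def color_ratio_metrics(rgb):
    """Per-pixel colour ratios, e.g. r = R/(R+G+B), plus simple local
    statistics (mean / median / variance / standard deviation) of each
    channel over the local image window passed in."""
    rgb = rgb.astype(float)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    ratios = rgb / total             # shape (..., 3); channels sum to 1
    stats = {
        "mean": rgb.mean(axis=(0, 1)),
        "median": np.median(rgb, axis=(0, 1)),
        "variance": rgb.var(axis=(0, 1)),
        "std_dev": rgb.std(axis=(0, 1)),
    }
    return ratios, stats
```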
- as one example of a texture feature, a group of adjacent cells with certain specific property values is identified between the dark and the white shades of grey of cells represented in a histopathological slide image.
- the correlation of the color feature defines an instance of the size class, thus this way the intensity of these colored cells determines the affected cell from its surrounding cluster of dark cells. Examples of texture features are described in PCT Publication No. WO/2017/075095, the disclosure of which is incorporated by reference herein in its entirety.
- spatial features include a local density of cells; average distance between two adjacent detected cells; and/or distance from a cell to a segmented region.
- the feature may be used alone or in conjunction with training data (e.g. during training, example cells are presented together with a ground truth identification provided by an expert observer according to procedures known to those of ordinary skill in the art) to classify nuclei or cells.
- the system can include a classifier that was trained based at least in part on a set of training or reference slides for each biomarker. The skilled artisan will appreciate that different sets of slides can be used to train a classifier for each biomarker. Accordingly, for a single biomarker, a single classifier is obtained after training.
- a different classifier can be trained for each different biomarker so as to ensure better performance on unseen test data, where the biomarker type of the test data will be known.
- the trained classifier can be selected based at least in part on how best to handle training data variability, for example, in tissue type, staining protocol, and other features of interest, for slide interpretation.
- the classification module is a Support Vector Machine ("SVM").
- an SVM is a classification technique based on statistical learning theory, in which a nonlinear input data set is converted via kernels into a high-dimensional feature space where the non-linear case becomes linearly separable.
- support vector machines project a set of training data, E, that represents two different classes into a high-dimensional space by means of a kernel function, K.
- In this transformed data space, the nonlinear data are arranged so that a flat surface (a discriminating hyperplane) can be generated to separate the classes so as to maximize the class separation.
- Testing data are then projected into the high-dimensional space via K, and the test data are classified on the basis of where they fall with respect to the hyperplane.
- the kernel function K defines the method in which data are projected into the high-dimensional space.
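A hedged usage sketch with scikit-learn's `SVC` (assumed available; the data and parameter values are illustrative, not from the patent): two concentric classes cannot be separated by a flat line in the input plane, but an RBF kernel K projects them into a space where a separating hyperplane exists.

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn, assumed available

# Two concentric rings: class 0 at radius 1, class 1 at radius 3.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = (radii > 2).astype(int)

# K = RBF kernel; the hyperplane is found in the induced feature space.
clf = SVC(kernel="rbf", C=10.0)
clf.fit(X, y)
```

Test points are then classified according to which side of that hyperplane their kernel projection falls on.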
- classification is performed using an AdaBoost algorithm.
- the AdaBoost is an adaptive algorithm which combines a number of weak classifiers to generate a strong classifier.
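A minimal from-scratch AdaBoost over 1-D decision stumps (an illustrative sketch, not the patented implementation; the stump search and all names are my own) shows how weak classifiers are adaptively combined into a strong one:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """AdaBoost over 1-D decision stumps. X: (n,) features, y in {-1, +1}.
    Each round fits the stump with lowest weighted error, then re-weights
    the samples so misclassified ones count more in the next round."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for t in np.unique(X):               # candidate thresholds
            for polarity in (1, -1):
                pred = polarity * np.where(X >= t, 1, -1)
                err = w[pred != y].sum()     # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, t, polarity, pred)
        err, t, polarity, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)           # boost misclassified samples
        w /= w.sum()
        stumps.append((t, polarity, alpha))
    return stumps

def predict_adaboost(stumps, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * polarity * np.where(X >= t, 1, -1)
                for t, polarity, alpha in stumps)
    return np.where(score >= 0, 1, -1)
```

No single stump can label an interval correctly, but a few boosted rounds can.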
- Image pixels identified by a pathologist during the training stage e.g. those having a particular stain or belonging to a particular tissue type
- derived stain intensity values, counts of specific nuclei, or other classification results may be used to determine various marker expression scores (used interchangeably with the term “expression score” herein), such as percent positivity or an H-Score (i.e. from the classified features, expression scores may be calculated).
- a score (e.g., a whole-slide score) can be determined.
- average blob intensity, color and geometric features, such as area and shape of the detected nuclear blob may be computed, and the nuclear blobs are classified into tumor nuclei and nuclei of non-tumor cells.
- the number of identified nuclei output corresponds to the total number of biomarker-positive tumor cells detected in the FOV, as evidenced by the number of tumor nuclei counted.
- the feature metrics are derived and a classifier is trained such that a percentage (e.g. a percent positivity expression score) of FAP positive or negative cells may be elucidated, e.g. positively or negatively stained stromal cells.
- a score of 0 may be assigned to a stained area with ≤ 10% of the tumor cells, 1 for an area with ≥ 11% to ≤ 25% of tumor cells, 2 for ≥ 26% to ≤ 50% tumor cells, and 3 for ≥ 51% tumor cells.
- a score of 0 may be assigned for absent/weak staining (negative control), 1 for a weak staining obviously stronger than the negative control level, 2 for moderately intense staining, and 3 for intense staining.
- a final score of > 3 may be recognized to indicate positive expression of FAP.
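The scoring bins above can be sketched as follows. Note that combining the proportion and intensity scores by summation is an assumed convention (a common H-score-style combination); the text does not state explicitly how the final score is formed.

```python
def proportion_score(pct_stained_tumor_cells):
    """Bin percent of stained tumor cells into the 0-3 proportion score
    described above (<=10% -> 0, 11-25% -> 1, 26-50% -> 2, >=51% -> 3)."""
    if pct_stained_tumor_cells <= 10:
        return 0
    if pct_stained_tumor_cells <= 25:
        return 1
    if pct_stained_tumor_cells <= 50:
        return 2
    return 3

def intensity_score(staining):
    """Map a qualitative staining level to the 0-3 intensity score above."""
    levels = {"absent": 0, "weak": 1, "moderate": 2, "intense": 3}
    return levels[staining]

def fap_positive(proportion, intensity):
    """Assumed convention: the two scores are summed, and a final score
    of 3 or more indicates positive FAP expression."""
    return proportion + intensity >= 3
```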
- the mid-resolution analysis approach employs segmentation algorithms to generate the sub-regions within the input images, the sub-regions defined to capture biologically meaningful regions of interest.
- a segmentation generation module 206 is utilized to segment the input image into a plurality of sub-regions (step 320).
- segmentation is performed on a single channel image, e.g. a "purple" channel in an unmixed FAP image.
- Methods of unmixing are known to those of ordinary skill in the art (e.g. linear unmixing is described, for example, in Zimmermann, "Spectral Imaging and Linear Unmixing in Light Microscopy," Adv Biochem Engin/Biotechnol (2005) 95:245-265, and in C. L. Lawson and R. J. Hanson, "Solving Least Squares Problems," Prentice Hall, 1974, Chapter 23, p. 161, the disclosures of which are incorporated herein by reference in their entirety). Other methods of unmixing are disclosed herein.
- the sub-regions generated capture information in an area of the input image having either a pre-determined size or a size within a range as set forth within an image processing algorithm (e.g. a parameter of a SLIC superpixel generation algorithm as described herein).
- the input image is segmented into sub-regions having a predefined shape, size, area, and/or spacing.
- the sub-regions (710) may be ovals, circles, squares, rectangles, etc., such as depicted in FIG. 7.
- the oval, circular, square, or rectangular sub-regions may have a size ranging from about 50 pixels to about 100 pixels, or some other size such that groups of pixels are selected having similar properties or characteristics (e.g. color, brightness, and/or texture).
- the sub-regions are non-overlapping, and may be generated via a sampling grid.
- a sampling grid relates to a network of horizontal and perpendicular lines which are uniformly spaced and superimposed onto an image, ultimately used for locating non-overlapping points within an image.
- any number of adjacent positions established by the horizontal and perpendicular lines may be used to define an image segment.
- the sub-regions are distributed across the image in a manner that captures a representative sample of relevant regions for analysis, e.g. areas where irregularly shaped cells are a predominant feature.
- the input image is segmented by applying a series of algorithms to the image, including global thresholding filters, local adaptive thresholding filters, morphological operations, and watershed transformations.
- the filters may be run sequentially or in any order deemed necessary by those of ordinary skill in the art. Of course, any filter may be applied iteratively until the desired outcome is achieved.
- a first filter is applied to the input image to remove regions that are unlikely to have nuclei, such as removing those image regions that are white (corresponding to regions in the tissue samples that are unstained or nearly unstained). In some embodiments, this is achieved by applying a global thresholding filter.
- the global thresholding is based on a median and/or standard deviation computed on a first principal component channel, e.g. similar to a gray scale channel.
- Filters are then applied to the image to selectively remove artifacts, e.g. small blobs, small discontinuities, other small objects, and/or to fill holes.
- morphological operators are applied to remove artifacts and/or fill holes.
- a distance-based watershed is applied, based on a binary image introduced as input (e.g. a binary image resulting from prior filtering steps).
- the input image is segmented into superpixels. It is believed that a superpixel algorithm partitions an image into a number of segments (groups of pixels) that represent perceptually meaningful entities.
- Each superpixel is obtained by a low-level grouping process and has a perceptually consistent unit, i.e., all pixels in a biological object contained in a superpixel are as uniform as possible in staining presence (e.g. pixels present in the superpixel are of a particular type of stain), staining intensity (e.g. pixels have a certain relative intensity value or range of values), and texture (e.g. pixels have a particular spatial arrangement of color or intensities).
- the local analysis result of each superpixel can be stored and reported to represent the analysis results on digital pathology images.
- a superpixel is a collection of pixels with similar characteristics, such as color, brightness, and texture.
- An image can be composed of a certain number of superpixels that contain multiple combination characteristics of the pixels and can preserve the edge information of the original image. Compared with a single pixel, a superpixel contains rich characteristic information and can greatly reduce image post-processing complexity and significantly increase the speed of image segmentation. Superpixels are also useful for estimating probabilities and making decisions with small neighborhood models.
- Superpixel algorithms are methods that group pixels into meaningful atomic regions of similar size. Without wishing to be bound by any particular theory, it is believed that superpixels are powerful because they often fall on important boundaries within the image, and tend to take on abnormal or unique shapes when they contain salient object features. Consistent with the desire to obtain and store information at a medium resolution analysis, superpixels are located between the pixel and object levels: they carry more information than pixels by representing perceptually meaningful pixel groups, while not comprehensively representing image objects. Superpixels can be understood as a form of image segmentation that over-segments the image in a short computing time. The outlines of superpixels have been shown to adhere well to natural image boundaries, as most structures in the image are conserved. With image features being computed for each superpixel rather than each pixel, subsequent processing tasks are reduced in complexity and computing time. Thus, superpixels are considered useful as a preprocessing step for analyses at the object level, such as image segmentation.
- the Turbopixel method progressively dilates a set of seed locations using level-set based geometric flow (see A. Levinshtein, A. Stere, K. Kutulakos, D. Fleet, S. Dickinson, and K. Siddiqi, "TurboPixels: Fast superpixels using geometric flows," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2009, the disclosure of which is incorporated by reference herein).
- the geometric flow relies on local image gradients, aiming to regularly distribute superpixels on the image plane.
- the Turbopixel superpixels are constrained to have uniform size, compactness, and boundary adherence.
- Yet other methods of generating superpixels are described by Radhakrishna Achanta, "SLIC Superpixels Compared to State-of-the-art," Journal of Latex Class Files, Vol. 6, No. 1, December 2011, the disclosure of which is incorporated by reference herein in its entirety.
- SLIC simple linear iterative clustering
- KMC local k-means clustering
- CCA connected components algorithm
- K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
- in the connected components algorithm, p denotes the pixel to be labeled at any stage in the scanning process, and V denotes the set of intensity values used to define connectivity, e.g. V = {1} for a binary image.
- the equivalent label pairs are sorted into equivalence classes and a unique label is assigned to each class.
- a second scan is made through the image, during which each label is replaced by the label assigned to its equivalence class.
- the labels might be different gray levels or colors.
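The two-scan, equivalence-class procedure above can be sketched as follows (an illustrative from-scratch implementation, assuming a binary image and 4-connectivity; a union-find structure stands in for the equivalence-class bookkeeping):

```python
import numpy as np

def two_pass_label(binary):
    """Two-pass connected-component labeling (4-connectivity).
    First scan: assign provisional labels and record equivalences.
    Second scan: replace each label by its equivalence-class representative."""
    labels = np.zeros(binary.shape, dtype=int)
    parent = {}  # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # record the equivalent label pair

    next_label = 1
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            up = labels[r - 1, c] if r > 0 else 0
            left = labels[r, c - 1] if c > 0 else 0
            neighbors = [l for l in (up, left) if l]
            if not neighbors:
                parent[next_label] = next_label
                labels[r, c] = next_label
                next_label += 1
            else:
                labels[r, c] = min(neighbors)
                if len(neighbors) == 2:
                    union(up, left)
    # Second scan: relabel with a unique, consecutive id per equivalence class.
    remap = {l: find(l) for l in parent}
    ids = {rep: i + 1 for i, rep in enumerate(sorted(set(remap.values())))}
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = ids[remap[labels[r, c]]]
    return labels
```

A U-shaped object receives two provisional labels in the first scan, which the second scan merges into a single class.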
- SLIC is an adaptation of k-means for superpixel generation, with two important distinctions: (i) the number of distance calculations in the optimization is dramatically reduced by limiting the search space to a region proportional to the superpixel size (this is believed to reduce the complexity to be linear in the number of pixels, and independent of the number of superpixels k); and (ii) a weighted distance measure combines color and spatial proximity while simultaneously providing control over the size and compactness of the superpixels. (See Achanta, et al., "SLIC Superpixels Compared to State-of-the-Art Superpixel Methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 11, November 2012, the disclosure of which is hereby incorporated by reference in its entirety herein).
- SLIC considers image pixels in a 5D space, defined by the L*a*b values of the CIELAB color space as well as their x and y coordinates. Pixels in the 5D space are clustered based on an adapted k-means clustering integrating color similarity and proximity in the image plane. The clustering is based on a distance measure D that measures color similarity in L*a*b space (dc) and pixel proximity in x, y space (ds). The latter is normalized by a grid interval (S), defined as the square root of the total number of image pixels divided by the number of superpixels (k). The compactness and regularity of the superpixels are controlled with the constant m.
- This parameter functions as a weighting criterion between the spectral (color) distance (dc) and the spatial distance (ds).
- a larger m increases the weight of spatial proximity, which leads to more compact superpixels with boundaries adhering less to spectral outlines in the image.
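Using the definitions above (dc for color, ds for space, grid interval S, compactness m), the integrated SLIC distance of Achanta et al. (2012) can be sketched as:

```python
import numpy as np

def slic_distance(lab_p, xy_p, lab_c, xy_c, S, m):
    """Integrated SLIC distance D between a pixel and a cluster center:
    D = sqrt(dc**2 + (ds / S)**2 * m**2), where dc is the CIELAB color
    distance and ds the spatial (x, y) distance."""
    dc = np.linalg.norm(np.subtract(lab_p, lab_c))   # color similarity term
    ds = np.linalg.norm(np.subtract(xy_p, xy_c))     # spatial proximity term
    return float(np.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2))
```

Dividing ds by S and scaling by m makes the color and spatial terms commensurable; a larger m inflates the spatial term and so yields more compact superpixels, as stated above.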
- the CCA is used to reassign isolated regions to nearby superpixels if the size of the isolated regions is smaller than a minimum size Smin.
- a local KMC is applied in step (2) of the SLIC method, where each pixel is associated with the closest cluster center whose search area covers its location.
- in standard k-means, the search area of each cluster center is the whole image, and the distances are calculated from each cluster center to every pixel in the image.
- the search space of a cluster center is limited to a local 2S x 2S square region. Therefore, the SLIC only computes distances from each cluster center to pixels within its searching area.
- m is a regularization parameter that weights the relative contribution of df and ds to the integrated distance Di.
- a larger m indicates that ds is more significant than df.
- An equivalent integrated distance Di directly describing the contribution of the two distances can be given by:
- Nf is the mean intensity of the whole image
- w ∈ [0,1] is a regularization parameter.
- w and (1-w) are the ratios of the normalized intensity and spatial distances in Di, respectively.
- the parameter k of the SLIC algorithm specifies the number of approximately equally sized superpixels.
- the compactness parameter m can be set to control the trade-off between superpixels' homogeneity and boundary adherence. Without wishing to be bound by any particular theory, it is believed that by varying the compactness parameter, regular-shaped superpixels may be generated in untextured regions and highly irregular superpixels may be generated in textured regions. Again, without wishing to be bound by any particular theory, it is believed that the parameter m also allows for the weighting of the relative importance between color similarity and spatial proximity. When m is large, spatial proximity is more important and the resulting superpixels are more compact (i.e. they have a lower area to perimeter ratio). When m is small, the resulting superpixels adhere more tightly to image boundaries, but have less regular size and shape.
- both superpixel size and compactness parameters are adjusted.
- a superpixel size ranging from about 40 pixels to about 400 pixels is used.
- a superpixel size ranging from about 60 pixels to about 300 pixels is used.
- a superpixel size ranging from about 70 pixels to about 250 pixels is used.
- a superpixel size ranging from about 80 pixels to about 200 pixels is used.
- the compactness parameter ranges from about 10 to about 100. In other embodiments, the compactness parameter ranges from about 20 to about 90. In other embodiments, the compactness parameter ranges from about 40 to about 80. In other embodiments, the compactness parameter ranges from about 50 to about 80.
- FIG. 8A illustrates an example of superpixels generated using SLIC as noted herein, where the superpixels are segmented to be appropriate for localized characteristics of the regions of interest without overlapping, and have no gaps among them. Moreover, each superpixel sub-region has a specific final shape depending on its local intensity (810) and direction (820) of the presence of biomarker expression. Hence, superpixels are perceptually meaningful for such biological structures of interest.
- FIGS. 8B, 8C, and 8D show an original IHC image at high magnification, the initialization of a superpixel generation process, and the final superpixels with local homogeneity, respectively, and where the regularity of their shape has been adjusted by a technical parameter of the SLIC algorithm, as noted above.
- representational objects or interest points are determined for each sub-region (step 330) using module 207.
- the representational objects are outlines of sub-regions or superpixels pertaining to cells or groups of cells of interest, e.g. fibroblasts or macrophages.
- the representational objects are seed points.
- an objective of the present disclosure is to characterize cells of interest (e.g. irregularly shaped cells) based on sub-regions having similar staining presence, staining intensity, and/or local texture, and to automatically save those homogeneous property sub-regions in a database.
- the representational objects, or coordinates thereof, are one method of storing the generated sub-regions.
- FIGS. 9A and 9B provide examples of polygon outlines and center seeds for those superpixels that contain the biological objects of interest.
- a thresholding algorithm, e.g. Otsu, mean clustering, etc.
- a threshold parameter e.g. a threshold staining parameter provided by an expert pathologist.
- segmentation is achieved by applying a series of filters designed to enhance the image such that (i) sub-regions unlikely to represent objects of interest are separated from (ii) sub-regions representing cells having an object of interest. Additional filters may be selectively applied to remove artifacts, remove small blobs, remove small discontinuities, fill holes, and split up bigger blobs.
- regions that are unlikely to have sub-regions identifying irregularly shaped cells are removed, such as by removing image regions in a binary image of the stain channel that are white (corresponding to regions in the tissue samples that are unstained or nearly unstained).
- this is achieved by applying a global thresholding filter.
- Thresholding is a method used for converting an intensity image (I) into a binary image (G) by assigning to all pixels the value one or zero if their intensity is above or below some threshold value, here a global threshold value.
- a global threshold value is applied to partition pixels depending on their intensity value.
- the global thresholding is based on a median and/or standard deviation computed on a first principal component channel, e.g. similar to a gray scale channel.
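A hedged sketch of global thresholding on a first principal-component channel: the SVD-based PCA follows the description above, but the sign-orientation step and the `k` multiplier in the median + k·std cut-off are my own illustrative choices.

```python
import numpy as np

def threshold_on_pc1(rgb, k=1.0):
    """Binarize an RGB image on its first principal-component channel,
    using median + k * std of that channel as a global threshold."""
    pixels = rgb.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]                       # first principal component
    if np.corrcoef(pc1, pixels.sum(axis=1))[0, 1] < 0:
        pc1 = -pc1                               # orient PC1 with brightness
    threshold = np.median(pc1) + k * pc1.std()
    return (pc1 > threshold).reshape(rgb.shape[:2])
```

On a mostly unstained (dark-in-PC1) slide region, this keeps only the pixels well above the bulk of the distribution.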
- the boundaries may be created by: 1) unmixing the purple channel, 2) thresholding the purple channel to identify FAP-positive regions, 3) applying a superpixel segmentation on the purple channel, and 4) attaching feature metrics to the superpixel objects.
- the presence of FAP-positive regions may be identified using a supervised-generation rule, which was trained based on ground truth obtained from pathologists.
- FAP-positive threshold parameters may be supplied by a pathologist, such as by identifying a threshold on a training set of images. A binary mask may then be generated using the threshold parameters.
- the boundaries of the sub-regions are traced.
- an algorithm may be provided which traces the exterior boundary of the sub-regions, as well as those boundaries of "holes" inside or between sub-regions.
- the boundaries of the sub-regions are generated by creating the boundary traces using a MATLAB function called bwboundaries (https://www.mathworks.com/help/images/ref/bwboundaries.html). Following boundary creation, the boundary traces may be converted into polygon outlines of x,y coordinates. The x,y coordinates of the traced boundaries may be stored in a memory or database, e.g. the row and column coordinates of all of the pixels of the traced border of the sub-region object may be determined and stored.
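A simple stand-in for `bwboundaries` (illustrative only: it returns the coordinates of exterior-boundary pixels of a labeled sub-region, rather than an ordered trace, and assumes 4-connectivity):

```python
import numpy as np

def boundary_pixels(labels, region_id):
    """Return (row, col) coordinates of pixels on the boundary of a labeled
    sub-region: region pixels with at least one 4-neighbor outside it."""
    mask = labels == region_id
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four 4-neighbors are also in the region.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)
```

The resulting row/column coordinates are exactly what the text describes storing per sub-region object.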
- seed points are derived by calculating or computing a centroid or center of mass of each sub-region. Methods of determining centroids of irregular objects are known to those of ordinary skill in the art. Once calculated, the centroid of the sub- region is labeled and/or the x,y coordinates of the seed are stored in a memory or database. In some embodiments, the position of the centroid or center of mass may be superimposed on the input image.
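Computing a seed point as the centroid of a sub-region's pixel coordinates can be sketched as (illustrative; the function name is my own):

```python
import numpy as np

def region_seed(labels, region_id):
    """Seed point of a sub-region: the centroid (center of mass) of its
    pixel coordinates, computed as the mean row and column index."""
    coords = np.argwhere(labels == region_id)
    return coords.mean(axis=0)   # (row, col) of the centroid
```

The returned coordinates can be stored in the database and, as noted above, superimposed on the input image for display.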
- the representational objects are annotated, labeled, or associated with data (step 330), such as the metrics derived from the image analysis module 202 (step 310), using a labeling module 208.
- the labeling module 208 may create a database 209 which is a non-transitory memory that stores data as noted herein.
- the database 209 stores the images received as input, the coordinates of any polygons and/or seed points, and any associated data or labels from image analysis (see FIG. 11).
- a vector of data may be stored for each segmented sub-region of the image.
- a vector of data may be stored for each sub-region, including the coordinates of any representational objects and associated image analysis data.
- the database would store the following vectors of data [a, b, c, x, y, z]1, [a, b, c, x, y, z]2, ..., [a, b, c, x, y, z]N, where N is the number of sub-regions generated with segmentation module 206.
- data from the image analysis module describes individual pixels within an image.
- the data of all pixels within a particular sub-region may be averaged to provide an average value of the pixel data within the sub- region.
- individual pixels may each have a certain intensity.
- the intensity of all of the pixels in a particular sub-region may be averaged to provide an average pixel intensity for that sub-region.
- That average pixel intensity for that sub-region may be associated with a representational object for that sub-region and the data may be stored together in a memory.
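Putting the pieces together, one record per sub-region (seed coordinates plus averaged pixel data) might be assembled like this; the field names and the 0.5 positivity threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def subregion_records(labels, intensity, positive_threshold=0.5):
    """One record per sub-region: seed (centroid) coordinates, mean stain
    intensity, and positively stained area (count of pixels above an
    assumed threshold), mirroring the per-sub-region data vectors above."""
    records = {}
    for region_id in np.unique(labels):
        if region_id == 0:                   # 0 = background
            continue
        mask = labels == region_id
        coords = np.argwhere(mask)
        records[int(region_id)] = {
            "seed": tuple(coords.mean(axis=0)),
            "mean_intensity": float(intensity[mask].mean()),
            "positive_area": int((intensity[mask] > positive_threshold).sum()),
        }
    return records
```

Each record is the kind of vector that would be written to database 209 alongside the representational-object coordinates.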
- the FAP-positive area can be another feature/measurement attached to the superpixel object.
- the FAP-positive area refers to the summation of the pixels that have the FAP intensity above a set threshold.
- the selection of a threshold is described by Auranuch Lorsakul et al., "Automated whole-slide analysis of multiplex-brightfield IHC images for cancer cells and carcinoma-associated fibroblasts," Proc. SPIE 10140, Medical Imaging 2017: Digital Pathology, 1014007 (2017/03/01), the disclosure of which is hereby incorporated by reference herein in its entirety.
- an average intensity of the FAP stain within a sub-region may be derived through image analysis for a particular sub-region and that FAP stain intensity may be stored in a database along with the coordinates of any representational objects for that sub-region.
- a particular expression score such as a FAP expression score, for a sub-region may be derived using image analysis, and that FAP expression score for that sub-region may be stored along with representation objects of that particular sub-region.
- other parameters may be stored including, but not limited to, the distances between seed points, the distance between identified tumor cells and irregularly shaped cells (e.g. the distance between a tumor cell and a fibroblast), and FAP-positive areas.
- analysis results (e.g., average local intensity, positive stained area) computed within a corresponding superpixel can be attached to these representation objects (e.g., polygon outlines and seeds).
- FIG. 10A illustrates an example of a whole slide IHC image of head-and-neck cancer tissue stained with fibroblast-activation protein (FAP) for fibroblasts (1010) in purple and with pan-cytokeratin (PanCK) for epithelial tumor (1020) in yellow.
- FIGS. 10B and 11 show examples of polygon outlines and seeds, respectively, attached with the analysis results of the superpixels belonging to the fibroblast regions, which can be stored in a database.
- the stored analysis results and associated biological features can be later retrieved, and the data may be reported or visualized in various formats, e.g., histogram plot of analysis results. More specifically, the representation object coordinate data and associated image analysis data may be retrieved from the database 209 and used for further analysis. In some embodiments, and by way of example, the representation objects can be retrieved from a database for the visualization or reporting of analysis results within a whole slide image or in user annotated regions. As illustrated in FIG. 12, the associated or attached image analysis results can be reported by plotting in histogram of FAP intensity retrieved from whole slide superpixels. Alternatively, the data can be visualized on a whole-slide image, a field-of-view image, or a portion of an image annotated by a medical professional for further review.
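The retrieved per-superpixel intensities can be binned for a FIG. 12-style histogram report; a minimal sketch using `numpy.histogram` (the bin count and value range are illustrative assumptions):

```python
import numpy as np

def fap_intensity_histogram(mean_intensities, bins=10, value_range=(0.0, 1.0)):
    """Histogram of per-superpixel mean FAP intensities retrieved from the
    database, for whole-slide or annotated-region reporting."""
    counts, edges = np.histogram(mean_intensities, bins=bins, range=value_range)
    return counts, edges
```

The counts and bin edges can then be passed to any plotting front end, or visualized as an overlay on the whole-slide image.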
- the system 200 of the present disclosure may be tied to a specimen processing apparatus that can perform one or more preparation processes on the tissue specimen.
- the preparation process can include, without limitation, deparaffinizing a specimen, conditioning a specimen (e.g., cell conditioning), staining a specimen, performing antigen retrieval, performing immunohistochemistry staining (including labeling) or other reactions, and/or performing in situ hybridization (e.g., SISH, FISH, etc.) staining (including labeling) or other reactions, as well as other processes for preparing specimens for microscopy, microanalyses, mass spectrometric methods, or other analytical methods.
- the processing apparatus can apply fixatives to the specimen.
- Fixatives can include cross-linking agents (such as aldehydes, e.g., formaldehyde, paraformaldehyde, and glutaraldehyde, as well as non-aldehyde cross-linking agents), oxidizing agents (e.g., metallic ions and complexes, such as osmium tetroxide and chromic acid), protein-denaturing agents (e.g., acetic acid, methanol, and ethanol), fixatives of unknown mechanism (e.g., mercuric chloride, acetone, and picric acid), combination reagents (e.g., Carnoy's fixative, methacarn, Bouin's fluid, B5 fixative, Rossman's fluid, and Gendre's fluid), microwaves, and miscellaneous fixatives (e.g., excluded volume fixation and vapor fixation).
- the sample can be deparaffinized using appropriate deparaffinizing fluid(s).
- any number of substances can be successively applied to the specimen.
- the substances can be for pretreatment (e.g., to reverse protein-crosslinking, expose nucleic acids, etc.), denaturation, hybridization, washing (e.g., stringency wash), detection (e.g., link a visual or marker molecule to a probe), amplifying (e.g., amplifying proteins, genes, etc.), counterstaining, coverslipping, or the like.
- the specimen processing apparatus can apply a wide range of substances to the specimen.
- the substances include, without limitation, stains, probes, reagents, rinses, and/or conditioners.
- the substances can be fluids (e.g., gases, liquids, or gas/liquid mixtures), or the like.
- the fluids can be solvents (e.g., polar solvents, non-polar solvents, etc.), solutions (e.g., aqueous solutions or other types of solutions), or the like.
- Reagents can include, without limitation, stains, wetting agents, antibodies (e.g., monoclonal antibodies, polyclonal antibodies, etc.), antigen recovering fluids (e.g., aqueous- or non-aqueous-based antigen retrieval solutions, antigen recovering buffers, etc.), or the like.
- Probes can be an isolated nucleic acid or an isolated synthetic oligonucleotide, attached to a detectable label or reporter molecule. Labels can include radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes.
- the specimen processing apparatus can be an automated apparatus, such as the BENCHMARK XT instrument and SYMPHONY instrument sold by Ventana Medical Systems, Inc. Ventana Medical Systems, Inc. is the assignee of a number of United States patents disclosing systems and methods for performing automated analyses, including U.S. Pat. Nos. 5,650,327, 5,654,200, 6,296,809, 6,352,861, 6,827,901 and 6,943,029, and U.S. Published Patent Application Nos. 20030211630 and 20040052685, each of which is incorporated herein by reference in its entirety.
- specimens can be manually processed.
- the imaging apparatus is a brightfield imager slide scanner.
- One brightfield imager is the iScan HT and DP200 (Griffin) brightfield scanner sold by Ventana Medical Systems, Inc.
- the imaging apparatus is a digital pathology device as disclosed in International Patent Application No.: PCT/US2010/002772 (Patent Publication No.: WO/2011/049608) entitled IMAGING SYSTEM AND TECHNIQUES or disclosed in U.S. Patent Application No. 61/533,114, filed on Sep.
- the imaging system or apparatus may be a multispectral imaging (MSI) system or a fluorescent microscopy system.
- the imaging system used here is an MSI system.
- Generally, MSI equips the analysis of pathology specimens with computerized microscope-based imaging systems by providing access to the spectral distribution of an image at a pixel level. While there exists a variety of multispectral imaging systems, an operational aspect that is common to all of these systems is the capability to form a multispectral image.
- a multispectral image is one that captures image data at specific wavelengths or at specific spectral bandwidths across the electromagnetic spectrum. These wavelengths may be singled out by optical filters or by the use of other instruments capable of selecting a pre-determined spectral component including electromagnetic radiation at wavelengths beyond the visible light range, such as, for example, infrared (IR).
- An MSI system may include an optical imaging system, a portion of which contains a spectrally-selective system that is tunable to define a pre-determined number N of discrete optical bands.
- the optical system may be adapted to image a tissue sample, illuminated in transmission with a broadband light source onto an optical detector.
- the optical imaging system which in one embodiment may include a magnifying system such as, for example, a microscope, has a single optical axis generally spatially aligned with a single optical output of the optical system.
- the system forms a sequence of images of the tissue as the spectrally selective system is being adjusted or tuned (for example with a computer processor) such as to assure that images are acquired in different discrete spectral bands.
- the apparatus may additionally contain a display in which appears at least one visually perceivable image of the tissue from the sequence of acquired images.
- the spectrally-selective system may include an optically-dispersive element such as a diffractive grating, a collection of optical filters such as thin-film interference filters or any other system adapted to select, in response to either a user input or a command of the pre-programmed processor, a particular pass-band from the spectrum of light transmitted from the light source through the sample towards the detector.
- a spectrally selective system defines several optical outputs corresponding to N discrete spectral bands. This type of system intakes the transmitted light output from the optical system and spatially redirects at least a portion of this light output along N spatially different optical paths in such a way as to image the sample in an identified spectral band onto a detector system along an optical path corresponding to this identified spectral band.
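The acquisition described above — tuning the spectrally-selective system through N discrete bands and recording one image per band — yields a multispectral cube with one spectrum per pixel. A minimal sketch, with hypothetical band wavelengths and image dimensions (the random data stands in for detector readouts):

```python
import numpy as np

# Sketch of multispectral image formation: the spectrally-selective system is
# tuned through N discrete bands and one grayscale image is acquired per band.
# Band centers and image size here are illustrative assumptions only.
rng = np.random.default_rng(0)
bands_nm = [460, 520, 570, 620, 660]          # hypothetical pass-band centers
height, width = 64, 64

# Acquire one image per spectral band (simulated here with random data).
band_images = [rng.random((height, width)) for _ in bands_nm]

# Stack the sequence into a multispectral cube: one spectrum per pixel.
cube = np.stack(band_images, axis=-1)         # shape (H, W, N)

# Each pixel now carries an N-sample spectral distribution.
pixel_spectrum = cube[0, 0, :]
print(cube.shape, pixel_spectrum.shape)
```

Per-pixel access to this spectral distribution is what the unmixing procedures described later operate on.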
- Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Any of the modules described herein may include logic that is executed by the processor(s). "Logic,” as used herein, refers to any information having the form of instruction signals and/or data that may be applied to affect the operation of a processor. Software is an example of logic.
- a computer storage medium can be, or can be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them.
- while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal.
- the computer storage medium can also be, or can be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
- the operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
- the term "programmed processor” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable microprocessor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing.
- the apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
- the apparatus also can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them.
- a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read only memory or a random-access memory or both.
- the essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
- Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- to provide for interaction with a user, embodiments can be implemented on a computer having a display device, e.g., an LCD (liquid crystal display), LED (light emitting diode), or OLED (organic light emitting diode) display, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- a touch screen can be used to display information and receive input from a user.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network.
- Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
- the network 20 of FIG. 1 can include one or more local area networks.
- the computing system can include any number of clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device).
- Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
- Unmixing is the procedure by which the measured spectrum of a mixed pixel is decomposed into a collection of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the pixel.
- the unmixing process can extract stain-specific channels to determine local concentrations of individual stains using reference spectra that are well known for standard types of tissue and stain combinations.
- the unmixing may use reference spectra retrieved from a control image or estimated from the image under observation.
- Unmixing the component signals of each input pixel enables retrieval and analysis of stain-specific channels, such as a hematoxylin channel and an eosin channel in H&E images, or a diaminobenzidine (DAB) channel and a counterstain (e.g., hematoxylin) channel in IHC images.
- the terms "unmixing" and "color deconvolution" (or "deconvolution") or the like are used interchangeably in the art.
- the multiplex images are unmixed with an unmixing module using linear unmixing.
- the columns of the M x N matrix R are the optimal color system as derived herein,
- the N x 1 vector A is the unknown vector of the proportions of the individual stains, and
- the M x 1 vector S is the measured multichannel spectral vector at a pixel, so that the linear model reads S = R · A.
- the signal in each pixel (S) is measured during acquisition of the multiplex image, and the reference spectra, i.e. the optimal color system, are derived as described herein.
- the contributions of the various stains (Ai) can be determined by calculating their contribution to each point in the measured spectrum.
- the solution is obtained using an inverse least squares fitting approach that minimizes the square difference between the measured and calculated spectra, i.e. by solving for the A that minimizes ||S − R · A||².
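The inverse least squares fit described above can be sketched with numpy; the reference matrix R and the stain proportions below are illustrative placeholders, not spectra from the disclosure:

```python
import numpy as np

# Sketch of linear unmixing: S (M x 1 measured spectrum at a pixel) is modeled
# as R (M x N reference spectra, one column per stain) times A (N x 1 stain
# proportions). The reference spectra below are illustrative assumptions.
R = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.6, 0.4],
              [0.4, 0.6],
              [0.2, 0.8],
              [0.1, 0.9]])                    # columns: per-stain reference spectra

A_true = np.array([0.7, 0.3])                 # ground-truth proportions
S = R @ A_true                                # simulated multichannel spectrum

# Inverse least-squares fit: minimize ||S - R A||^2 over A.
A_hat, residual, rank, _ = np.linalg.lstsq(R, S, rcond=None)
print(np.round(A_hat, 3))
```

In practice this fit is applied independently at every pixel of the multiplex image, yielding one abundance image per stain.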
- unmixing is accomplished using the methods described in WO2014/195193, entitled “Image Adaptive Physiologically Plausible Color Separation,” filed on May 28, 2014, the disclosure of which is hereby incorporated by reference in its entirety herein.
- WO2014/195193 describes a method of unmixing by separating component signals of the input image using iteratively optimized reference vectors.
- image data from an assay is correlated with expected or ideal results specific to the characteristics of the assay to determine a quality metric.
- one or more reference column vectors in matrix R are adjusted, and the unmixing is repeated iteratively using adjusted reference vectors, until the correlation shows a good quality image that matches physiological and anatomical requirements.
- the anatomical, physiological, and assay information may be used to define rules that are applied to the measured image data to determine the quality metric. This information includes how the tissue was stained, what structures within the tissue were intended or not intended to be stained, and relationships between structures, stains, and markers specific to the assay being processed.
- An iterative process results in stain-specific vectors that can generate images that accurately identify structures of interest and biologically relevant information, are free from any noisy or unwanted spectra, and are therefore fit for analysis.
- the reference vectors are adjusted to within a search space.
- the search space defines a range of values that a reference vector can take to represent a stain.
- the search space may be determined by scanning a variety of representative training assays including known or commonly occurring problems, and determining high-quality sets of reference vectors for the training assays.
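The iterative loop described above — unmix, score the result against a quality rule, adjust the reference vectors within a bounded search space, repeat — can be sketched as follows. This is a simplified illustration, not the patented algorithm of WO2014/195193: the quality metric, perturbation scheme, and all numbers are toy stand-ins.

```python
import numpy as np

# Simplified sketch of iteratively optimized reference vectors: unmix, score
# against a quality rule, perturb the references within a small search space,
# and keep the best-scoring set. All values here are illustrative.
rng = np.random.default_rng(1)

R = np.array([[0.85, 0.15],
              [0.55, 0.45],
              [0.15, 0.85]])                  # initial reference vectors (columns)
S = np.array([0.60, 0.50, 0.40])              # measured pixel spectrum

def unmix(R, S):
    A, *_ = np.linalg.lstsq(R, S, rcond=None)
    return A

def quality(R, A, S):
    # Toy stand-in for an assay-derived quality metric: reconstruction error
    # plus a penalty for negative (physiologically implausible) proportions.
    return -np.linalg.norm(S - R @ A) - 10.0 * np.sum(np.clip(-A, 0, None))

best_R, best_q = R, quality(R, unmix(R, S), S)
for _ in range(200):
    # Perturb within a bounded search space around the current references.
    cand = np.clip(best_R + 0.02 * rng.standard_normal(R.shape), 0.0, 1.0)
    q = quality(cand, unmix(cand, S), S)
    if q > best_q:
        best_R, best_q = cand, q

A_final = unmix(best_R, S)
print(np.round(A_final, 3))
```

In the method as described, the quality rules would encode the anatomical, physiological, and assay information (which structures should or should not be stained), and the search space bounds would come from scanning representative training assays.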
- unmixing is accomplished using the methods described in WO2015/124772, entitled "Group Sparsity Model for Image Unmixing," filed on February 23, 2015, the disclosure of which is hereby incorporated by reference in its entirety herein.
- WO2015/124772 describes unmixing using a group sparsity framework, in which fractions of stain contributions from a plurality of colocation markers are modeled within a "same group" and fractions of stain contributions from a plurality of non-colocation markers are modeled in different groups. Co-localization information for the plurality of colocation markers is provided to the modeled group sparsity framework, and the modeled framework is solved using a group lasso to yield a least squares solution within each group, wherein the least squares solution corresponds to the unmixing of the colocation markers, and to yield a sparse solution among the groups that corresponds to the unmixing of the non-colocation markers.
- WO2015/124772 describes a method of unmixing by: inputting image data obtained from the biological tissue sample; reading reference data from an electronic memory, the reference data being descriptive of the stain color of each one of the multiple stains; reading colocation data from the electronic memory, the colocation data being descriptive of groups of the stains, each group comprising stains that can be colocated in the biological tissue sample, each group forming a group for the group lasso criterion, and at least one of the groups having a size of two or above; and calculating a solution of the group lasso criterion for obtaining the unmixed image using the reference data as a reference matrix.
- the method for unmixing an image may comprise generating a group sparsity model wherein a fraction of a stain contribution from colocalized markers is assigned within a single group and a fraction of a stain contribution from non-colocalized markers is assigned within separate groups, and solving the group sparsity model using an unmixing algorithm to yield a least squares solution within each group.
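The group sparsity idea above can be sketched with a standard proximal-gradient solver for the group lasso: colocalized stains share one group, other stains sit alone, and the group penalty drives whole groups of stain fractions to zero together. This is a generic group-lasso sketch in the spirit of WO2015/124772, not the patented algorithm; the reference spectra, grouping, and penalty weight are illustrative assumptions.

```python
import numpy as np

# Sketch of group-sparse unmixing: minimize 0.5*||S - R A||^2 + lam * sum of
# group norms of A, via proximal gradient descent with group soft-thresholding.
R = np.array([[0.9, 0.2, 0.1],
              [0.5, 0.6, 0.2],
              [0.1, 0.3, 0.9]])               # columns: 3 stain reference spectra
S = np.array([0.65, 0.55, 0.20])              # measured pixel spectrum
groups = [[0, 1], [2]]                        # stains 0 and 1 may colocalize

lam, step = 0.05, 0.1
A = np.zeros(R.shape[1])
for _ in range(500):
    grad = R.T @ (R @ A - S)                  # gradient of the data-fit term
    A = A - step * grad
    for g in groups:                          # group soft-thresholding (prox)
        norm = np.linalg.norm(A[g])
        A[g] = 0.0 if norm <= lam * step else A[g] * (1 - lam * step / norm)

print(np.round(A, 3))
```

The within-group update behaves like a least squares solution (shrunk, but with all members of the group kept or dropped together), while across groups the penalty yields the sparse solution that separates non-colocation markers.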
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Investigating Or Analysing Biological Materials (AREA)
- Image Analysis (AREA)
- Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762595143P | 2017-12-06 | 2017-12-06 | |
PCT/EP2018/083434 WO2019110561A1 (en) | 2017-12-06 | 2018-12-04 | Method of storing and retrieving digital pathology analysis results |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3721372A1 true EP3721372A1 (en) | 2020-10-14 |
Family
ID=64604651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18814573.4A Pending EP3721372A1 (en) | 2017-12-06 | 2018-12-04 | Method of storing and retrieving digital pathology analysis results |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3721372A1 (en) |
JP (2) | JP7197584B2 (en) |
CN (2) | CN117038018A (en) |
WO (1) | WO2019110561A1 (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112347823B (en) * | 2019-08-09 | 2024-05-03 | 中国石油天然气股份有限公司 | Deposition phase boundary identification method and device |
EP4022286A1 (en) | 2019-08-28 | 2022-07-06 | Ventana Medical Systems, Inc. | Label-free assessment of biomarker expression with vibrational spectroscopy |
EP4107657A1 (en) * | 2020-02-17 | 2022-12-28 | 10X Genomics, Inc. | Systems and methods for machine learning features in biological samples |
CN112070041B (en) * | 2020-09-14 | 2023-06-09 | 北京印刷学院 | Living body face detection method and device based on CNN deep learning model |
CN112329765B (en) * | 2020-10-09 | 2024-05-24 | 中保车服科技服务股份有限公司 | Text detection method and device, storage medium and computer equipment |
WO2022107435A1 (en) * | 2020-11-20 | 2022-05-27 | コニカミノルタ株式会社 | Image analysis method, image analysis system, and program |
CN112785713B (en) * | 2021-01-29 | 2024-06-14 | 广联达科技股份有限公司 | Method, device, equipment and readable storage medium for arranging light source |
CN113469939B (en) * | 2021-05-26 | 2022-05-03 | 透彻影像(北京)科技有限公司 | HER-2 immunohistochemical automatic interpretation system based on characteristic curve |
US11830622B2 (en) | 2021-06-11 | 2023-11-28 | International Business Machines Corporation | Processing multimodal images of tissue for medical evaluation |
CN113763370B (en) * | 2021-09-14 | 2024-09-06 | 佰诺全景生物技术(北京)有限公司 | Digital pathology image processing method and device, electronic equipment and storage medium |
CN115201092B (en) * | 2022-09-08 | 2022-11-29 | 珠海圣美生物诊断技术有限公司 | Method and device for acquiring cell scanning image |
KR102579826B1 (en) * | 2022-12-09 | 2023-09-18 | (주) 브이픽스메디칼 | Method, apparatus and system for providing medical diagnosis assistance information based on artificial intelligence |
CN116188423B (en) * | 2023-02-22 | 2023-08-08 | 哈尔滨工业大学 | Super-pixel sparse and unmixed detection method based on pathological section hyperspectral image |
CN117272393B (en) * | 2023-11-21 | 2024-02-02 | 福建智康云医疗科技有限公司 | Method for checking medical images across hospitals by scanning codes in regional intranet |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5595707A (en) | 1990-03-02 | 1997-01-21 | Ventana Medical Systems, Inc. | Automated biological reaction apparatus |
ATE489613T1 (en) | 1998-02-27 | 2010-12-15 | Ventana Med Syst Inc | AUTOMATED MOLECULAR PATHOLOGY APPARATUS WITH INDEPENDENT SLIDE WARMERS |
US6582962B1 (en) | 1998-02-27 | 2003-06-24 | Ventana Medical Systems, Inc. | Automated molecular pathology apparatus having independent slide heaters |
US20030211630A1 (en) | 1998-02-27 | 2003-11-13 | Ventana Medical Systems, Inc. | Automated molecular pathology apparatus having independent slide heaters |
WO2002077903A2 (en) * | 2001-03-26 | 2002-10-03 | Cellomics, Inc. | Methods for determining the organization of a cellular component of interest |
US20050136549A1 (en) * | 2003-10-30 | 2005-06-23 | Bioimagene, Inc. | Method and system for automatically determining diagnostic saliency of digital images |
US7760927B2 (en) | 2003-09-10 | 2010-07-20 | Bioimagene, Inc. | Method and system for digital image based tissue independent simultaneous nucleus cytoplasm and membrane quantitation |
JP4496943B2 (en) * | 2004-11-30 | 2010-07-07 | 日本電気株式会社 | Pathological diagnosis support apparatus, pathological diagnosis support program, operation method of pathological diagnosis support apparatus, and pathological diagnosis support system |
ES2617882T3 (en) * | 2005-05-13 | 2017-06-20 | Tripath Imaging, Inc. | Image analysis method based on chromogen separation |
EP1991852B1 (en) * | 2006-03-06 | 2015-07-01 | Zetiq Technologies Ltd. | Methods for identifying a cell phenotype |
JP4838094B2 (en) * | 2006-10-27 | 2011-12-14 | 三井造船株式会社 | Flow cytometer having cell sorting function and living cell sorting method |
SG187479A1 (en) | 2009-10-19 | 2013-02-28 | Ventana Med Syst Inc | Imaging system and techniques |
US20130156785A1 (en) * | 2010-08-27 | 2013-06-20 | University Of Zurich | Novel diagnostic and therapeutic target in inflammatory and/or cardiovascular diseases |
US9076198B2 (en) | 2010-09-30 | 2015-07-07 | Nec Corporation | Information processing apparatus, information processing system, information processing method, program and recording medium |
JP5645146B2 (en) * | 2011-01-31 | 2014-12-24 | 日本電気株式会社 | Information processing system, information processing method, information processing apparatus, control method thereof, and control program thereof |
JP2014533509A (en) | 2011-11-17 | 2014-12-15 | セルスケープ・コーポレーション | Method, apparatus and kit for obtaining and analyzing cells |
LT2841575T (en) * | 2012-04-27 | 2019-10-10 | Millennium Pharmaceuticals, Inc. | Anti-gcc antibody molecules and use of same to test for susceptibility to gcc-targeted therapy |
WO2013167139A1 (en) | 2012-05-11 | 2013-11-14 | Dako Denmark A/S | Method and apparatus for image scoring and analysis |
AU2013369439B2 (en) * | 2012-12-28 | 2019-01-17 | Cleveland Clinic Foundation | Image analysis for breast cancer prognosis |
EP2973397B1 (en) | 2013-03-15 | 2017-08-02 | Ventana Medical Systems, Inc. | Tissue object-based machine learning system for automated scoring of digital whole slides |
CA2909913C (en) | 2013-06-03 | 2019-04-16 | Ventana Medical Systems, Inc. | Image adaptive physiologically plausible color separation |
CN103426169B (en) * | 2013-07-26 | 2016-12-28 | 西安华海盈泰医疗信息技术有限公司 | A kind of dividing method of medical image |
WO2015113895A1 (en) * | 2014-01-28 | 2015-08-06 | Ventana Medical Systems, Inc. | Adaptive classification for whole slide tissue segmentation |
WO2015124772A1 (en) | 2014-02-21 | 2015-08-27 | Ventana Medical Systems, Inc. | Group sparsity model for image unmixing |
JP6604960B2 (en) | 2014-02-21 | 2019-11-13 | ベンタナ メディカル システムズ, インコーポレイテッド | Medical image analysis to identify biomarker positive tumor cells |
WO2015189264A1 (en) * | 2014-06-10 | 2015-12-17 | Ventana Medical Systems, Inc. | Predicting breast cancer recurrence directly from image features computed from digitized immunohistopathology tissue slides |
EP3175389B1 (en) | 2014-07-28 | 2024-05-15 | Ventana Medical Systems, Inc. | Automatic glandular and tubule detection in histological grading of breast cancer |
CA2965564C (en) | 2014-11-10 | 2024-01-02 | Ventana Medical Systems, Inc. | Classifying nuclei in histology images |
WO2016087589A1 (en) * | 2014-12-03 | 2016-06-09 | Ventana Medical Systems, Inc. | Methods, systems, and apparatuses for quantitative analysis of heterogeneous biomarker distribution |
WO2016120442A1 (en) | 2015-01-30 | 2016-08-04 | Ventana Medical Systems, Inc. | Foreground segmentation and nucleus ranking for scoring dual ish images |
AU2016236323A1 (en) * | 2015-03-20 | 2017-08-10 | Ventana Medical Systems, Inc. | System and method for image segmentation |
EP3345122B1 (en) | 2015-09-02 | 2021-06-30 | Ventana Medical Systems, Inc. | Automated analysis of cellular samples having intermixing of analytically distinct patterns of analyte staining |
-
2018
- 2018-12-04 CN CN202311034131.7A patent/CN117038018A/en active Pending
- 2018-12-04 CN CN201880079402.1A patent/CN111448569B/en active Active
- 2018-12-04 WO PCT/EP2018/083434 patent/WO2019110561A1/en unknown
- 2018-12-04 JP JP2020530584A patent/JP7197584B2/en active Active
- 2018-12-04 EP EP18814573.4A patent/EP3721372A1/en active Pending
-
2022
- 2022-12-15 JP JP2022200094A patent/JP7558242B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
WO2019110561A1 (en) | 2019-06-13 |
JP7197584B2 (en) | 2022-12-27 |
CN111448569A (en) | 2020-07-24 |
CN117038018A (en) | 2023-11-10 |
JP2023030033A (en) | 2023-03-07 |
JP7558242B2 (en) | 2024-09-30 |
CN111448569B (en) | 2023-09-26 |
JP2021506003A (en) | 2021-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11682192B2 (en) | Deep-learning systems and methods for joint cell and region classification in biological images | |
JP7558242B2 (en) | Method for storing and retrieving digital pathology analysis results - Patents.com | |
US11922681B2 (en) | Systems and methods for identifying cell clusters within images of stained biological samples | |
US11657503B2 (en) | Computer scoring based on primary stain and immunohistochemistry images related application data | |
US11842483B2 (en) | Systems for cell shape estimation | |
US11978200B2 (en) | Image enhancement to enable improved nuclei detection and segmentation | |
WO2019110567A1 (en) | Method of computing tumor spatial and inter-marker heterogeneity | |
US11959848B2 (en) | Method of storing and retrieving digital pathology analysis results | |
US11615532B2 (en) | Quantitation of signal in stain aggregates |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20200608 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20221026 |