CN117378015A

CN117378015A - Predicting actionable mutations from digital pathology images

Info

Publication number: CN117378015A
Application number: CN202280035189.0A
Authority: CN
Inventors: P·S·S·奥坎波; B·斯蒂姆皮尔; 聂垚; F·谢赫扎德; 李骁; P·斯佐斯塔克; P·波瓦尔; F·阿加伊
Original assignee: F Hoffmann La Roche AG; Genentech Inc; Ventana Medical Systems Inc
Current assignee: F Hoffmann La Roche AG; Genentech Inc; Ventana Medical Systems Inc
Priority date: 2021-05-14
Filing date: 2022-05-14
Publication date: 2024-01-09

Abstract

The present disclosure provides a method comprising accessing a digital pathology image depicting tumor cells sampled from a subject. A plurality of tiles may be selected from the digital pathology image, wherein each of the tiles depicts tumor cells. A mutation prediction may be generated for each of the tiles, wherein the mutation prediction represents a likelihood that an actionable mutation is present in the tile. Based on the plurality of mutation predictions, a prognostic prediction associated with one or more treatment regimens for the subject may be generated. The prognostic prediction may be based on determining that one or more mutational contexts of the digital pathology image correspond to unknown drivers or tumor suppressors, oncogene driver mutations, or gene fusions.

Description

Predicting actionable mutations from digital pathology images

Technical Field

The present disclosure relates to cancer diagnosis, treatment, and prognosis.

Background Art

An important step in planning cancer treatment is identifying the treatment options most likely to be effective, taking into account both the likelihood of disease progression and patient compliance. Once a malignant tumor has grown beyond a certain size and/or has metastasized, the patient's options may be limited to treatments such as chemotherapy, radiotherapy, targeted therapy, and immunotherapy. Patient compliance with chemotherapy and radiotherapy can be challenging because of severe side effects. Patients also face the risk that cancer cells may mutate and become resistant to systemic therapy. Immunotherapy, the newest of these treatment options, activates the patient's own immune system and represents an important option in managing cancer, although it may be effective for only 20% to 30% of tumors. In general, the following rationale is applied across tumor types in order to maximize immediate clinical benefit and preserve future treatment options for the patient: as a first step, patients with advanced disease should be screened for eligibility for targeted therapy. If no mutational target is identified, the patient is offered immunotherapy alone or in combination with conventional chemotherapy. As a last resort, if that combination fails, a combination of chemotherapies may be considered.

Targeted therapies may be preferred, but they are applicable to only a small fraction of tumors because addressable mutations are rare. Based on currently approved therapies, addressable mutations occur only in oncogenes (genes that positively regulate various cell functions). In lung cancer, the most common actionable mutations are found in the following oncogenes: EGFR, ALK, RET, ROS1, and NTRK. In addition to oncogenes, a second class of genes, called tumor suppressors, can also cause cancer when mutated. As the name suggests, these genes normally act to suppress cell activity; mutations in these genes therefore cause cancer through loss of that suppression. In clinical practice, the most comprehensive molecular characterization of tumors is performed using next-generation sequencing (NGS) panels. Such an assay detects mutations only in a subset of genes, with some panels querying as few as 25 genes and others querying up to 500 genes. This represents a small fraction (roughly 0.1% to 2.5%) of the 20,000 to 25,000 total genes in the human genome, which may explain why no driver mutation is found in a portion of tumor samples tested via NGS (unknown drivers).

Targeted therapies may include drugs that target the epidermal growth factor receptor (EGFR) as well as gene fusions involving anaplastic lymphoma kinase (ALK), RET, ROS1, and neurotrophic tyrosine receptor kinase (NTRK). For EGFR, although immunohistochemical staining can be used to identify the most common variants (e.g., covering up to 97% of patients with EGFR-positive lung adenocarcinoma), molecular testing may be required to identify resistance mutations in patients in whom EGFR-targeted therapy has failed. No such immunohistochemical stains have been developed for RET and ROS1, and the performance of immunohistochemical stains for ALK and NTRK can be highly variable and difficult to interpret.

Gene fusions usually require more complex molecular assays that cover more of the genome than the more commonly used "hotspot" assays, which test a limited number of loci. Targeting gene fusions may require much broader coverage, resulting in much more expensive tests that demand much greater technical capability from the laboratory. Moreover, some gene fusions (e.g., NTRK fusions) may be extremely rare. Although NTRK fusions have been identified in a variety of tumor types, the frequency of this particular fusion may be less than 1% in the most common cancer indications (such as lung adenocarcinoma, colorectal cancer, and non-secretory breast cancer). The relative rarity of gene fusions (e.g., ranging in lung adenocarcinoma from 7% for ALK to less than 0.3% for NTRK) constitutes a significant technical and financial obstacle to widespread testing. At present, molecular testing is the only available method for determining whether a gene fusion is present in a patient. However, molecular testing is expensive, so patients sometimes forgo it because they are unlikely to benefit from targeted therapy. As a result, a considerable proportion of patients may not receive the right test to determine whether their tumors carry gene fusions. There is therefore a need for a rapid, robust, and sensitive screening tool that triages tumor samples by the type of testing required to determine the preferred and optimal systemic treatment option.

Summary of the Invention

Provided herein are systems and methods that use digital pathology techniques to identify actionable mutations, including, for example and without limitation, oncogenic gene fusions (e.g., ALK, ROS1, RET, and NTRK), wherein actionable mutations are mutations for which targeted therapies are available and which are prognostic of treatment response.

In some cases, the disclosed methods and systems can be applied to detect gene fusions/rearrangements, a specific type of rare, druggable oncogenic mutation event that can be identified across many different cancer types and that, if present in a tumor tissue sample, can indicate a strong response to certain targeted therapies. Gene fusions can occur across many different tumor types and are increasingly the targets of novel therapies. Identifying gene fusions can be a technically difficult, expensive, and time-consuming process that may ultimately benefit only the few patients carrying such genetic alterations; for these reasons, widespread testing may be limited to the few hospitals able to absorb and provide the technical and financial resources involved. The embodiments disclosed herein can address this gap by creating, training, and using a machine learning model (e.g., a digital pathology screening model) that can predict the presence of oncogenic gene fusions from digital pathology images, such as scanned, stained (e.g., hematoxylin and eosin (H&E) stained) whole slide images (WSIs) depicting cancer tissue/cells (e.g., lung adenocarcinoma). Furthermore, embodiments disclosed herein may include a rapid, inexpensive, and sufficiently accurate screening tool that can be used to guide molecular testing and decision making regarding the use of targeted therapies for individual patients, including but not limited to patients with lung adenocarcinoma.

In particular embodiments, a digital pathology image processing system may access a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. The digital pathology image processing system may then identify one or more image patches from the digital pathology image, each image patch depicting one or more clusters of tumor cells (e.g., a region consisting entirely of tumor cells or a region including one or more tumor nest structures surrounded by stroma). In some cases, when the digital pathology image has been divided into a plurality of tiles, an image patch may include a portion of a tile, a plurality of adjacent tiles, or a combination of one or more adjacent tiles and one or more adjacent portions of tiles. The digital pathology image processing system may generate, for each of the plurality of image patches, a label indicating a likelihood (e.g., a binary output or a percentage output) that the image patch depicts a cluster of tumor cells. In particular embodiments, the digital pathology image processing system may determine, based on the label generated for each image patch, that the digital pathology image includes a depiction of the occurrence of, for example, a gene fusion in cancer cells present in the biological sample. The digital pathology image processing system may further generate a prognostic prediction for the subject based on the detection of, for example, the gene fusion. In some embodiments, the prognostic prediction can include a prediction of the suitability of one or more treatment regimens (e.g., chemotherapy or targeted therapy) for the subject.

Methods disclosed herein include: accessing a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject, wherein the depicted particular section is stained with one or more stains; segmenting the digital pathology image into a plurality of image tiles; generating, for each of the plurality of image tiles, a label indicating a likelihood that the image tile depicts a cluster of tumor cells; determining, based on the label generated for each image tile, that the digital pathology image includes a depiction of an occurrence of a gene fusion with respect to the cancer cells; and generating a prognostic prediction for the subject based on the occurrence of the gene fusion with respect to the cancer cells, wherein the prognostic prediction includes a prediction of the suitability of one or more treatment regimens for the subject.
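The steps of this method can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the fixed square tile size, the 0.5 thresholds, and the existence of a separate per-tile fusion score are all assumptions made for illustration, and the per-tile scores would in practice come from trained models that are not shown here.

```python
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def split_into_tiles(width: int, height: int, tile_size: int) -> List[Tile]:
    """Cover a width x height image with non-overlapping tiles.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    tiles = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tiles.append((x, y, min(tile_size, width - x), min(tile_size, height - y)))
    return tiles


def slide_level_call(
    tumor_likelihoods: Sequence[float],
    fusion_scores: Sequence[float],
    tumor_threshold: float = 0.5,   # hypothetical cutoff for "depicts a tumor cluster"
    fusion_threshold: float = 0.5,  # hypothetical cutoff for the slide-level call
) -> Tuple[bool, float]:
    """Average the (assumed) per-tile fusion score over tiles labeled as tumor.

    Returns (fusion_called, mean_score). With no tumor tiles, returns (False, 0.0).
    """
    kept = [f for t, f in zip(tumor_likelihoods, fusion_scores) if t >= tumor_threshold]
    if not kept:
        return False, 0.0
    mean_score = sum(kept) / len(kept)
    return mean_score >= fusion_threshold, mean_score
```

A plain average over tumor tiles is only one possible aggregation; it is used here because it keeps the relationship between per-tile labels and the slide-level determination easy to follow.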

In some embodiments, the method further includes detecting one or more features from each of the plurality of image tiles, wherein the one or more features include one or more of clinical features or histological features, and wherein generating the label for each of the plurality of image tiles is based on the one or more features.

In some embodiments, generating the label for each of the plurality of image tiles is based on tumor morphology, wherein the tumor morphology is based on an analysis of one or more of: the presence of signet ring cells, the number of signet ring cells, the presence of hepatoid cells, the number of hepatoid cells, extracellular mucin, or a tumor growth pattern.

In some embodiments, generating the label for each of the plurality of image tiles is based on one or more machine learning models, wherein the method further includes training the one or more machine learning models on a plurality of training data, the training data including one or more labeled depictions of clusters of tumor cells and one or more labeled depictions of other histological or clinical features.

In some embodiments, the prognostic prediction is generated further based on an analysis of one or more additional digital pathology images, each of the one or more additional digital pathology images depicting an additional specific sample in the biological sample from the subject, wherein the analysis includes: determining a likelihood that each of the one or more additional digital pathology images includes a depiction of the occurrence of a gene fusion with respect to the cancer cells; and combining the determinations for each of the one or more additional digital pathology images.

In some embodiments, the method further includes outputting the prognostic prediction via a graphical user interface, wherein the graphical user interface includes a graphical representation of the digital pathology image, and wherein the graphical representation includes: an indication of the label generated for each of the plurality of image tiles, and a predicted confidence level associated with the corresponding label.

In some embodiments, the method further includes generating a recommendation associated with use of the one or more treatment regimens.

In some embodiments, the particular section of the biological sample is stained with one or more stains.

In some embodiments, determining that the digital pathology image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells is further based on a weighted combination of the labels generated for each image tile.
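One simple form of such a weighted combination is a normalized weighted average. The sketch below assumes that form; the disclosure does not specify how the weights are chosen, so the weights here (for example, the fraction of each tile occupied by tumor) and the function name are hypothetical.

```python
from typing import Sequence


def weighted_label_score(labels: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w_i * label_i) / sum(w_i) for per-tile labels in [0, 1].

    The weights are an illustrative assumption, e.g. tumor area per tile.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(l * w for l, w in zip(labels, weights)) / total_weight
```

For instance, labels of 1.0, 0.0, and 0.5 with weights of 2, 1, and 1 combine to (2 + 0 + 0.5) / 4 = 0.625, so the heavily weighted tile dominates the slide-level score.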

In some embodiments, the method further includes: identifying tumor heterogeneity from the digital pathology image; and measuring the identified tumor heterogeneity, wherein determining that the digital pathology image includes a depiction of the occurrence of the gene fusion is further based on the measured tumor heterogeneity.

In some embodiments, identifying tumor heterogeneity includes classifying mutant tumor cells into phenotypes by identifying morphologically similar cells (e.g., by assessing nuclear heterogeneity). In some embodiments, assessing nuclear heterogeneity includes quantifying certain features of the cell nuclei to distinguish mutant cells based on heterogeneity of nuclear morphology.
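As a hedged illustration of what quantifying nuclear features could look like, the sketch below computes two generic descriptors: the circularity of a single nucleus and the coefficient of variation of a feature across nuclei. These particular features and function names are assumptions chosen for illustration; this passage does not say which nuclear features are quantified.

```python
import math
import statistics
from typing import Sequence


def circularity(area: float, perimeter: float) -> float:
    """Shape descriptor 4*pi*A / P^2: 1.0 for a perfect circle, lower for irregular outlines."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / (perimeter ** 2)


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean.

    A low value for, e.g., nuclear area suggests a monotonous (low-heterogeneity)
    cell population; a high value suggests heterogeneous nuclei.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean of values must be non-zero")
    return statistics.pstdev(values) / mean
```

Applied to segmented nuclei, a set of areas such as 5 and 15 (arbitrary units) gives a coefficient of variation of 0.5, while identical areas give 0.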

In some embodiments, identifying tumor heterogeneity includes identifying regions of clonal cells by performing a cell-level spatial analysis to assess spatial distribution. In some embodiments, assessing spatial distribution includes: measuring spectral distances within subgraphs of a minimum spanning tree of the tumor cells, wherein each of the subgraphs represents a cluster of adjacent cells (e.g., a tumor nest); and calculating adjacency spectral distances pairwise across all subgraphs. In some embodiments, each of the subgraphs can be defined by performing outlier detection. In some embodiments, each of the subgraphs can be defined based on segmentation of detected tumor nests.
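The first part of this analysis, building a minimum spanning tree over cell positions and defining subgraphs by outlier detection, can be sketched as follows. The sketch assumes cell centroids as 2D points, uses Prim's algorithm, and treats MST edges longer than the mean plus k standard deviations as outliers to cut; that outlier rule and the function names are assumptions. Computing the adjacency spectral distances between the resulting subgraphs is not shown.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int, float]  # (cell index, cell index, Euclidean length)


def minimum_spanning_tree(points: Sequence[Point]) -> List[Edge]:
    """Prim's algorithm on the complete Euclidean graph of cell centroids."""
    n = len(points)
    edges: List[Edge] = []
    if n < 2:
        return edges
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each cell to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_into_subgraphs(n_points: int, edges: Sequence[Edge], k: float = 1.0) -> List[List[int]]:
    """Cut MST edges longer than mean + k * std (assumed outlier rule); return components."""
    lengths = [length for _, _, length in edges]
    if not lengths:
        return [[i] for i in range(n_points)]
    cutoff = statistics.fmean(lengths) + k * statistics.pstdev(lengths)
    root = list(range(n_points))  # union-find forest

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, length in edges:
        if length <= cutoff:
            root[find(i)] = find(j)
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group of cell indices plays the role of one subgraph (a candidate tumor nest); the alternative described above, defining subgraphs from a tumor nest segmentation, would replace the edge-cutting step.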

In some embodiments, identifying tumor heterogeneity includes identifying regions of closely adjacent clonal cells by performing a cell-level spatial analysis to assess spatial entropy. In some embodiments, assessing spatial entropy includes calculating, for each of a predefined number of distance bins, the frequency with which pairs of cells are identified as morphologically similar.
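The per-bin pair frequency can be sketched as below. The sketch assumes each cell has a 2D centroid and a phenotype label, and treats two cells as morphologically similar when their labels match. The binary Shannon entropy function is one possible way to turn a frequency into an entropy value and is an assumption, not the disclosed formula; all names are hypothetical.

```python
import math
from itertools import combinations
from typing import Hashable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def similar_pair_frequency(
    points: Sequence[Point],
    phenotypes: Sequence[Hashable],
    bin_edges: Sequence[float],
) -> List[Optional[float]]:
    """For each distance bin [lo, hi), return the fraction of cell pairs sharing a phenotype.

    A bin that contains no pairs yields None.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    similar = [0] * n_bins
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        for b in range(n_bins):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                counts[b] += 1
                if phenotypes[i] == phenotypes[j]:
                    similar[b] += 1
                break
    return [s / c if c else None for s, c in zip(similar, counts)]


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Under these assumptions, a frequency near 1.0 in the shortest-distance bins (low entropy) would correspond to closely adjacent cells that look alike, which is the pattern the passage associates with clonal regions.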

Disclosed herein are one or more computer-readable non-transitory storage media embodying software that, when executed, is operable to perform part or all of one or more methods disclosed herein.

Systems disclosed herein include: one or more data processors; and a non-transitory memory containing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform part or all of one or more methods disclosed herein.

Methods disclosed herein include: transmitting a request communication from a client computing system to a remote computing system to process a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject, wherein, in response to receiving the request communication from the client computing system, the remote computing system performs operations including: accessing the digital pathology image; segmenting the digital pathology image into a plurality of image tiles; generating, for each of the plurality of image tiles, a label indicating a likelihood (e.g., a binary output or a percentage output) that the image tile depicts a cluster of tumor cells; determining, based on the label generated for each image tile, that the digital pathology image includes a depiction of an occurrence of a gene fusion with respect to the cancer cells; generating a prognostic prediction for the subject based on the occurrence of the gene fusion with respect to the cancer cells, wherein the prognostic prediction includes a prediction of the suitability of one or more treatment regimens for the subject; and providing the prognostic prediction to the client computing system via a response communication; and outputting, by the client computing system, the prognostic prediction in response to receiving the response communication.

Methods disclosed herein include, by a digital pathology image processing system: accessing a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject; determining that the digital pathology image includes a depiction of one or more mutations that are mutually exclusive with the occurrence of a gene fusion; determining an absence of the gene fusion with respect to the cancer cells; and generating a prognostic prediction for the subject based on the absence of the gene fusion with respect to the cancer cells, wherein the prognostic prediction includes a prediction of the suitability of one or more treatment regimens for the subject.

Brief Description of the Drawings

FIGS. 1A-1D illustrate non-limiting examples of tumor heterogeneity.

FIG. 2 illustrates a non-limiting example of a network of interacting computer systems that may be used for digital pathology image generation and processing as described herein, according to some embodiments of the present disclosure.

FIG. 3 is a flow chart of a non-limiting example method for predicting gene fusions.

FIG. 4 is a flow chart of a non-limiting example pipeline for detecting gene fusions.

FIG. 5A shows a non-limiting example of a pathology slide image for lung adenocarcinoma.

FIG. 5B shows a non-limiting example of results from quality control.

FIG. 5C shows a non-limiting example of tumor cluster detection.

FIG. 5D shows a non-limiting example of a prediction of gene fusion status.

FIG. 6 shows non-limiting example predictions of ROS1 gene fusion status.

FIG. 7 shows a non-limiting example of an ROC curve for image-patch-based fusion prediction.

FIG. 8 is a flow chart of a non-limiting example method for enabling an end user to request a prognostic prediction.

FIG. 9 is a flow chart of a non-limiting example method for predicting alternative treatments based on identifying the lack (or absence) of a gene fusion.

FIG. 10A is a schematic diagram of tumor heterogeneity when the mutational context corresponds to tumor suppressors only (such as pathogenic variants in KEAP1, STK11, or TP53) and/or unknown drivers.

FIG. 10B is a schematic diagram of tumor heterogeneity when the mutational context corresponds to oncogene driver mutations (such as pathogenic variants in EGFR, KRAS, or PIK3CA).

FIG. 10C is a schematic diagram of tumor heterogeneity when the mutational context corresponds to an oncogenic gene fusion (such as ALK, RET, ROS1, or NTRK).

FIG. 11 is a flow chart of a non-limiting example method for identifying tumor heterogeneity.

FIGS. 12A-12C depict non-limiting examples of tumor cell annotation, cell nucleus segmentation, and cell nucleus mask generation.

FIGS. 13A-13E show non-limiting examples of experimental results for identifying gene fusions based on quantifying certain nuclear morphological features.

FIGS. 14A-14C illustrate non-limiting examples of digital pathology images showing four different nuclear morphological features corresponding to tumor suppressors and/or unknown drivers.

FIGS. 15A-15C illustrate non-limiting examples of digital pathology images showing four different nuclear morphological features corresponding to oncogene driver mutations.

FIGS. 16A-16C illustrate non-limiting examples of digital pathology images showing four different nuclear morphological features corresponding to gene fusions.

FIG. 17A shows a non-limiting example of the identification of subgraphs of a minimum spanning tree of tumor cells depicted in a WSI.

FIG. 17B shows a non-limiting example of experimental results for identifying gene fusions based on quantifying adjacency spectral distances between subgraphs.

FIGS. 18A-18C show non-limiting examples of spatial entropy as measured in three different distributions of tumor cell phenotypes.

FIG. 18D is a schematic diagram of the identification of pairs of tumor cells that co-occur within a specified distance and are classified into a specified phenotype.

FIGS. 18E-18F show non-limiting examples of experimental results for identifying gene fusions, oncogene driver mutations, and tumor suppressors and/or unknown drivers based on measuring spatial entropy.

FIG. 19 shows an example of a computing system.

Detailed Description

To identify which patients may benefit from molecular testing for gene fusions, a pathologist can review slide images of tumor tissue samples to assess indicators of tumor heterogeneity. Tumor heterogeneity can be observed in the distinct morphological and phenotypic features of different tumor cells, which correspond to the genetic mutations that cause cells to become cancerous and to grow and spread in the body. Tumor heterogeneity can manifest as intra-tumor heterogeneity (e.g., within a single tumor nest) or as inter-tumor heterogeneity (e.g., between nearby tumor nests). In addition to serving as a distinguishing factor between normal and tumor tissue, tumor heterogeneity can also serve as an indicator of disease severity: highly heterogeneous tumors (also known as bizarre tumors) can indicate a poor prognosis due to treatment failure attributable to drug resistance acquired through genetic mutation.

FIGS. 1A-1D show examples of tumor heterogeneity as visualized in digital pathology images. FIG. 1A shows an example of tumor heterogeneity in uterine leiomyosarcoma. The tumor cells look very different from one another. For example, in terms of size, some tumor cells are two, three, or four times the size of other tumor cells. In terms of curvature, some tumor cells have a smooth, rounded outline, while others are angular (i.e., have at least two re…). In other words, they are very heterogeneous. FIG. 1B shows an example of tumor heterogeneity in dermatofibrosarcoma protuberans. In contrast to FIG. 1A, the tumor cells in FIG. 1B are, as a population, more similar in shape, staining intensity, angularity, and size. In sarcomas (such as those depicted in FIG. 1A and FIG. 1B), high heterogeneity (e.g., as shown in FIG. 1A) is often associated with aneuploidy (an aberration in chromosome number), while monotony (e.g., as shown in FIG. 1B), also known as low tumor heterogeneity, can be associated with gene fusions. FIG. 1C shows an example of high tumor heterogeneity in a lung squamous cell carcinoma sample. FIG. 1D shows an example of high tumor heterogeneity in a lung adenocarcinoma sample.

Patients with gene fusions usually present at a late stage of disease. Gene fusions/rearrangements are a rare type of oncogenic mutation event that can be identified across many different cancer types. These mutations are nonetheless of increasing importance, because the presence of certain gene fusions in a tumor sample can indicate the likelihood of a strong response to certain targeted therapies. With such gene fusions, at the cell level the cells may appear to have low heterogeneity (i.e., appear clonal), but at the group level (e.g., the population level) their phenotype (i.e., a set of morphological features, where tumor cells sharing a common set of morphological features are considered to belong to a single phenotype) may be aggressive, consistent with diagnosis at a late stage. There are at least several hypotheses regarding the correlation between tumor heterogeneity and gene fusions. One hypothesis is that the visual signal indicating a gene fusion may be present mainly in tumor nests/cells. Another hypothesis is that the visual signal indicating a gene fusion may be strong and spread across all parts of the tumor region. Yet another hypothesis is that a low tumor mutational burden may correspond to reduced morphological heterogeneity of the tumor. In any case, a lack of tumor heterogeneity in an aggressive malignancy can therefore be a hallmark of gene fusions across tumor types. In some cases, however, a lack of tumor heterogeneity may be difficult to observe with the human eye.

FIG. 2 illustrates a network 200 of interacting computer systems that may be used for digital pathology image generation and processing as described herein (including prediction of actionable mutations), according to some embodiments of the present disclosure.

The digital pathology image generation system 220 can generate one or more whole slide images (WSIs) or other related digital pathology images corresponding to a particular sample. For example, an image generated by the digital pathology image generation system 220 can include a stained section of a biopsy sample. As another example, an image generated by the digital pathology image generation system 220 can include a slide image of a liquid sample (e.g., a blood smear). As another example, an image generated by the digital pathology image generation system 220 can include a fluorescence micrograph, such as a slide image depicting fluorescence in situ hybridization (FISH) after a fluorescent probe has bound to a target DNA or RNA sequence.

Some types of samples (e.g., biopsies, solid samples, and/or samples including tissue) can be processed by the sample preparation system 221 to fix and/or embed the sample. The sample preparation system 221 can facilitate infiltrating the sample with a fixative (e.g., a liquid fixative such as a formaldehyde solution) and/or an embedding substance (e.g., a histological wax). For example, a sample fixation subsystem can fix the sample by exposing it to the fixative for at least a threshold amount of time (e.g., at least 3 hours, at least 6 hours, or at least 13 hours). A dehydration subsystem can dehydrate the sample (e.g., by exposing the fixed sample and/or a portion of the fixed sample to one or more ethanol solutions) and potentially clear the dehydrated sample using a clearing intermediate agent (e.g., one that includes ethanol and a histological wax). A sample embedding subsystem can infiltrate the sample (e.g., one or more times, for corresponding predefined time periods) with a heated (and thus liquid) histological wax. The histological wax can include paraffin wax and potentially one or more resins (e.g., styrene or polyethylene). The sample and wax can then be cooled, and the wax-infiltrated sample can then be blocked out.

The sample slicer 222 can receive the fixed and embedded sample and can produce a set of sections. The sample slicer 222 can expose the fixed and embedded sample to cool or cold temperatures. The sample slicer 222 can then cut the chilled sample (or a trimmed version thereof) to produce a set of sections. Each section can have a thickness that is, for example, less than 100 μm, less than 50 μm, less than 10 μm, or less than 5 μm. Each section can have a thickness that is, for example, greater than 0.1 μm, greater than 1 μm, greater than 2 μm, or greater than 4 μm. The cutting of the chilled sample can be performed in a warm water bath (e.g., at a temperature of at least 30 °C, at least 35 °C, or at least 40 °C).

The automated staining system 223 can facilitate staining one or more of the sample sections by exposing each section to one or more stains. Each section can be exposed to a predefined volume of stain for a predefined period of time. In some cases, a single section is exposed to multiple stains concurrently or sequentially.

Each of the one or more stained sections can be presented to the image scanner 224, which can capture a digital image of the section. The image scanner 224 can include a microscope camera. The image scanner 224 can capture the digital image at multiple levels of magnification (e.g., using a 10x objective, a 20x objective, a 40x objective, etc.). Manipulation of the image can be used to capture a selected portion of the sample at a desired range of magnification. The image scanner 224 can further capture annotations and/or morphometrics identified by a human operator. In some cases, after one or more images are captured, a section is returned to the automated staining system 223 so that the section can be washed, exposed to one or more other stains, and imaged again. When multiple stains are used, the stains can be selected to have different color profiles, so that a first region of an image, corresponding to a first section portion that absorbed a large amount of a first stain, can be distinguished from a second region of the image (or of a different image), corresponding to a second section portion that absorbed a large amount of a second stain.

It will be appreciated that, in some cases, one or more components of the digital pathology image generation system 220 can operate in connection with human operators. For example, a human operator can move the sample across various subsystems (e.g., of the sample preparation system 221 or of the digital pathology image generation system 220) and/or initiate or terminate the operation of one or more subsystems, systems, or components of the digital pathology image generation system 220. As another example, part or all of one or more components of the digital pathology image generation system 220 (e.g., one or more subsystems of the sample preparation system 221) can be partly or entirely replaced with actions of a human operator.

Further, it will be appreciated that, while the various described and depicted functions and components of the digital pathology image generation system 220 pertain to the processing of solid and/or biopsy samples, other embodiments can relate to liquid samples (e.g., a blood sample). For example, the digital pathology image generation system 220 can receive a liquid-sample (e.g., blood or urine) slide that includes a base slide, the smeared liquid sample, and a cover slip. The image scanner 224 can then capture an image of the sample slide. Other embodiments of the digital pathology image generation system 220 can relate to capturing images of samples using advanced imaging techniques, such as FISH, described herein. For example, once a fluorescent probe has been introduced to a sample and allowed to bind to a target sequence, appropriate imaging can be used to capture images of the sample for further analysis.

A given sample can be associated with one or more users (e.g., one or more physicians, laboratory technicians, and/or medical providers) during processing and imaging. An associated user can include, by way of example and not limitation, a person who ordered the test or biopsy that produced the sample being imaged, a person with permission to receive the results of the test or biopsy, or a person who conducted analysis of the test or biopsy sample, among others. For example, a user can correspond to a physician, a pathologist, a clinician, or a subject. A user can use one or more user devices 230 to submit one or more requests (e.g., that identify a subject) that a sample be processed by the digital pathology image generation system 220 and that the resulting image be processed by the digital pathology image processing system 210.

The digital pathology image generation system 220 can transmit an image produced by the image scanner 224 back to the user device 230. The user device 230 then communicates with the digital pathology image processing system 210 to initiate automated processing of the image. In some cases, the digital pathology image generation system 220 provides an image produced by the image scanner 224 to the digital pathology image processing system 210 directly, e.g., at the direction of the user of the user device 230. Although not illustrated, other intermediary devices (e.g., data stores of a server connected to the digital pathology image generation system 220 or the digital pathology image processing system 210) can also be used. Additionally, for the sake of simplicity, only one digital pathology image processing system 210, one digital pathology image generation system 220, and one user device 230 are illustrated in the network 200. This disclosure anticipates the use of one or more of each type of system and component thereof without necessarily deviating from the teachings of this disclosure.

The network 200 and associated systems shown in FIG. 2 can be used in a variety of contexts where the scanning and evaluation of digital pathology images, such as WSIs, are an essential component of the work. As an example, the network 200 can be associated with a clinical environment, where a user is evaluating a sample for possible diagnostic purposes. The user can review the image using the user device 230 prior to providing the image to the digital pathology image processing system 210. The user can provide additional information to the digital pathology image processing system 210 that can be used to guide or direct its analysis of the image. For example, the user can provide a prospective diagnosis or preliminary assessment of features within the scan. The user can also provide additional context, such as the type of tissue being examined. As another example, the network 200 can be associated with a laboratory environment where tissues are being examined, for example, to determine the efficacy or potential side effects of a drug. In this context, it can be commonplace for multiple types of tissues to be submitted for review to determine the effects of the drug on the whole body. This can present a particular challenge to human scan reviewers, who may need to determine the various contexts of the images, which can be highly dependent on the type of tissue being imaged. These contexts can optionally be provided to the digital pathology image processing system 210.

The digital pathology image processing system 210 can process digital pathology images, including WSIs, to classify the digital pathology images and to generate annotations and related output for them. As an example, the digital pathology image processing system 210 can process a WSI of a tissue sample, or image tiles of the WSI generated by the digital pathology image processing system 210, to identify morphological traits observable in clustered tumor cells and to determine, based on the identified morphological traits, the occurrence of a genetic alteration event (such as a gene fusion). The digital pathology image processing system 210 can use a sliding window to generate a mask over the clustered tumor cells. Beyond its use in identifying clustered tumor cells in a WSI, the mask can also be used to measure thickness, determine lengths to different endpoints, determine the tortuosity of twists, and measure volume in three-dimensional imaging or processing scenarios. The digital pathology image processing system 210 can then crop the query image into a plurality of image tiles. The tile generation module 211 can define a set of image tiles for each digital pathology image. To define a set of image tiles, the tile generation module 211 can segment the digital pathology image into a set of image tiles. As embodied herein, the image tiles can be non-overlapping (e.g., each image tile includes pixels of the image that are not included in any other image tile) or overlapping (e.g., each image tile includes some pixels of the image that are also included in at least one other image tile). In addition to the size of each image tile and the stride of the window (e.g., the image distance, or number of pixels, between an image tile and the subsequent image tile), features such as whether the image tiles overlap can increase or decrease the size of the data set for analysis; more image tiles (e.g., obtained through overlapping or smaller image tiles) increase the potential resolution of the eventual output and visualizations. In some cases, the tile generation module 211 defines a set of image tiles for an image where each image tile is of a predefined size and/or the offset between image tiles is predefined. Continuing with the example of detecting gene fusions or other genetic alterations, each pathology slide image can be cropped into image tiles with a width and a height of a certain number of pixels. Furthermore, in some cases, the tile generation module 211 can create, for each WSI, multiple sets of image tiles of varying size, overlap, step size, and so on. As an example, in some cases the width and height of each image tile (in number of pixels) can be determined dynamically (i.e., not fixed) based on factors such as the evaluation task at hand, the query image itself, or any other suitable factor. In some embodiments, the digital pathology image itself can contain image tile overlap, which can result from the imaging technique. In some cases, even segmentation without image tile overlap can be a preferable solution that balances tile processing requirements and avoids influencing the embedding generation and weight value generation discussed herein. An image tile size or image tile offset can be determined, for example, by calculating one or more performance metrics (e.g., precision, recall, accuracy, and/or error) for each candidate size/offset and by selecting the image tile size and/or offset associated with one or more performance metrics above a predetermined threshold and/or with favorable values of one or more performance metrics (e.g., high precision, high recall, high accuracy, and/or low error).
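The relationship between tile size, stride, and overlap described above can be illustrated with a minimal sketch. The function name `tile_origins`, the 1024 x 1024 image size, and the choice to skip partial tiles at the image border are assumptions made for illustration only; the disclosure does not prescribe them.

```python
from typing import Iterator, Tuple


def tile_origins(width: int, height: int, tile: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every full tile in a width x height image.

    stride == tile gives non-overlapping tiles; stride < tile gives overlapping tiles.
    Partial tiles at the right and bottom borders are skipped (an assumption of this sketch).
    """
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield (x, y)


# Hypothetical 1024 x 1024 image region cut into 256-pixel tiles.
non_overlapping = list(tile_origins(1024, 1024, tile=256, stride=256))
overlapping = list(tile_origins(1024, 1024, tile=256, stride=128))
```

Halving the stride while keeping the tile size fixed roughly triples the number of tiles in this example, which is the trade-off between data set size and output resolution noted above.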

The tile generation module 211 can further define the image tile size depending on the type of abnormality being detected. For example, the tile generation module 211 can be configured with awareness of the type(s) of tissue phenotypic traits or abnormalities that the digital pathology image processing system 210 will be searching for, and can customize the image tile size according to the tissue phenotype or abnormality (and, in some cases, according to the tissue sample type) to improve detection. For example, the tile generation module 211 can determine that, when the tissue phenotype or abnormality involves searching for inflammation or necrosis in lung tissue, the image tile size should be reduced to increase the scanning rate, while when the tissue abnormality involves abnormalities of Kupffer cells in liver tissue, the image tile size should be increased to raise the chance that the digital pathology image processing system 210 analyzes the Kupffer cells holistically. In some cases, the tile generation module 211 defines a set of image tiles where the number of image tiles in the set, the size of the image tiles of the set, the resolution of the image tiles of the set, or other related properties are defined for each WSI and held constant for each of one or more images.

As embodied herein, the tile generation module 211 can further define the set of image tiles for each digital pathology image along one or more color channels or color combinations. As an example, digital pathology images received by the digital pathology image processing system 210 can include large-format multi-color-channel images having pixel color values (e.g., bit values corresponding to intensity) specified, for each pixel of the image, for one of several color channels. Example color specifications or color spaces that can be used include the RGB, CMYK, HSL, HSV, or HSB color specifications. The set of image tiles can be defined based on segmenting the color channels and/or generating a brightness map or grayscale equivalent of each image tile. For example, for each segment of an image, the tile generation module 211 can provide a red image tile, a blue image tile, a green image tile, and/or a brightness image tile, or the equivalent for the color specification used. As explained herein, segmenting the digital pathology images based on segments of the image and/or color values of the segments can improve the accuracy and recognition rates of the models/networks used to generate embeddings (e.g., in a low-dimensional space) for the image tiles and the digital pathology image and to produce classifications of the digital pathology image. Additionally, the digital pathology image processing system 210 (e.g., using the tile generation module 211) can convert between color specifications and/or prepare copies of image tiles using multiple color specifications. Color specification conversions can be selected based on a desired type of image augmentation (e.g., accentuating or boosting particular color channels, saturation levels, brightness levels, etc.). Color specification conversions can also be selected to improve compatibility between the digital pathology image generation system 220 and the digital pathology image processing system 210. For example, a particular image scanning component can provide output in the HSL color specification while the models used in the digital pathology image processing system 210, as described herein, can be trained using RGB images. Converting the image tiles to the compatible color specification can ensure that the image tiles can still be analyzed. Additionally, the digital pathology image processing system 210 can up-sample or down-sample images that are provided in a particular color depth (e.g., 8-bit, 1-bit, etc.) to be usable by the digital pathology image processing system 210. Furthermore, the digital pathology image processing system 210 can cause image tiles to be converted according to the type of image that has been captured (e.g., fluorescence images can include greater detail on color intensity or a wider range of colors).
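As a rough illustration of per-channel tiles and of conversion between color specifications, the sketch below splits a small RGB tile into red, green, blue, and luminance planes and converts an HSL value to 8-bit RGB. The helper names and the Rec. 601 luma weights are assumptions chosen for the example; the disclosure does not specify how a brightness map is computed.

```python
import colorsys
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def split_channels(tile: List[List[Pixel]]) -> Dict[str, List[List[int]]]:
    """Return red, green, blue, and luminance planes for an RGB tile."""
    planes: Dict[str, List[List[int]]] = {"red": [], "green": [], "blue": [], "luminance": []}
    for row in tile:
        planes["red"].append([p[0] for p in row])
        planes["green"].append([p[1] for p in row])
        planes["blue"].append([p[2] for p in row])
        # Rec. 601 luma weights: one common grayscale equivalent (assumed here).
        planes["luminance"].append(
            [round(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]) for p in row]
        )
    return planes


def hsl_to_rgb8(h: float, s: float, l: float) -> Pixel:
    """Convert HSL components in [0, 1] to an 8-bit RGB pixel."""
    # colorsys takes its arguments in H, L, S order.
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return (round(r * 255), round(g * 255), round(b * 255))
```

A scanner that emits HSL output could, under these assumptions, be bridged to an RGB-trained model by applying `hsl_to_rgb8` per pixel before tiling.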

In some cases, the digital pathology image processing system 210 can detect one or more features from each of the plurality of image tiles. The one or more features can include, for example, one or more of clinical features or histological features, such as cell type. Accordingly, generating a label for each of the plurality of image tiles can be based on the one or more features. By way of example and not limitation, the clinical features can include one or more of: patient age at diagnosis, patient sex, patient height, patient weight, patient clinical history, patient sample type, or patient smoking history. As another example and not by way of limitation, the histological features can include, for example, a growth pattern such as solid, cribriform, micropapillary, papillary, acinar, or lepidic.

As described herein, the tile embedding module 212 can generate, for each image tile, an embedding (e.g., a low-dimensional representation) in a corresponding feature embedding space. The embedding can be represented by the digital pathology image processing system 210 as a feature vector for the image tile. In some cases, the tile embedding module 212 can use a neural network (e.g., a convolutional neural network) to generate a feature vector that represents each image tile of the image. In particular embodiments, the tile embedding neural network can be based on, for example, the ResNet image network trained on a dataset of natural (e.g., non-medical) images, such as the ImageNet dataset. By using a non-specialized tile embedding network, the tile embedding module 212 can leverage known advances in efficiently processing images to generate embeddings. Furthermore, using a natural image dataset allows the embedding neural network to learn to discern differences between image tile segments on a holistic level.

In other embodiments, the tile embedding network used by the tile embedding module 212 can be an embedding network customized to handle the large numbers of image tiles of large-format images, such as digital pathology WSIs. Additionally, the tile embedding network used by the tile embedding module 212 can be trained using a custom dataset. For example, the tile embedding network can be trained using a variety of WSI samples, or even using samples relevant to the subject matter for which the embedding network will be generating embeddings (e.g., scans of particular tissue types). Training the tile embedding network using a specialized or customized set of images can allow it to identify finer (e.g., more subtle) differences between image tiles, which can result in more detailed and accurate distances between image tiles in the feature embedding space, at the potential cost of the additional time needed to acquire the images and/or the computational and economic cost of training multiple tile generation networks for use by the tile embedding module 212. In some cases, the tile embedding module 212 can select from a library of tile embedding networks based on the type of images being processed by the digital pathology image processing system 210.

As described herein, image tile embeddings (e.g., in a low-dimensional space) can be generated using a machine learning model (e.g., a deep learning neural network) based on visual features of the image tiles. In some cases, the trained machine learning model can thus serve as, for example, an image feature extraction model. Image tile embeddings can further be generated from contextual information associated with the image tiles or from the content shown in the image tiles. For example, an image tile embedding can include one or more features that indicate and/or correspond to a size of depicted objects (e.g., sizes of depicted cells or aberrations) and/or a density of depicted objects (e.g., a density of depicted cells or aberrations). Size and density can be measured absolutely (e.g., based on dimensions expressed in pixels or converted from pixels to nanometers) or relative to other image tiles from the same digital pathology image, from a class of digital pathology images (e.g., produced using similar techniques or by a single digital pathology image generation system or scanner), or from a related family of digital pathology images. Furthermore, image tiles can be classified before the tile embedding module 212 generates embeddings for them, so that the tile embedding module 212 takes the classification into account when preparing the embeddings.

For consistency, in some cases the tile embedding module 212 can produce embeddings of a predefined size (e.g., feature vectors of 512 elements, feature vectors of 2048 bytes, etc.). In other cases, the tile embedding module 212 can produce embeddings of various and arbitrary sizes. The tile embedding module 212 can adjust the size of the embeddings based on user direction, or the size can be selected based on, for example, computational efficiency, accuracy, or other parameters. In particular embodiments, the embedding size can be based on the limitations or specifications of the deep learning neural network that generates the embeddings. Larger embedding sizes can be used to increase the amount of information captured in the embedding and to improve the quality and accuracy of results, while smaller embedding sizes can be used to improve computational efficiency.

The digital pathology image processing system 210 can derive different inferences by applying one or more machine learning models to the embeddings, i.e., by inputting the embeddings to a machine learning model. As an example, the digital pathology image processing system 210 can identify clustered tumor cells using a machine learning model trained to identify such structures. In some embodiments, it may not be necessary to crop the image into image tiles, generate embeddings for these image tiles, and then derive inferences based on such embeddings. Instead, in some cases, a digital pathology image processing system 210 with sufficient graphics processing unit (GPU) memory can directly apply a machine learning model to an embedding of the WSI to make inferences. In some cases, the output of the machine learning model can be resized into the shape of the input image.

The WSI access module 213 can manage requests to access WSIs from other modules of the digital pathology image processing system 210 and from the user device 230. For example, the WSI access module 213 receives requests to identify a WSI based on a particular image tile, an identifier for the image tile, or an identifier for the WSI. The WSI access module 213 can perform the tasks of confirming that the WSI is available to the requesting user or module, identifying the appropriate database from which to retrieve the requested WSI, and retrieving any additional metadata that may be of interest to the requesting user or module. Additionally, the WSI access module 213 can efficiently handle streaming the appropriate data to the requesting device. As described herein, in some cases WSIs can be provided to user devices in portions, based on the likelihood that the user will wish to see the whole WSI or only a portion of it. In some cases, the WSI access module 213 can determine which regions of the WSI to provide and how to provide them. Furthermore, in some cases the WSI access module 213 can be empowered within the digital pathology image processing system 210 to ensure that no individual component locks up or otherwise misuses a database or WSI to the detriment of other components or users.

The tumor heterogeneity assessment module 214 of the digital pathology image processing system 210 can apply one or more techniques to assess the heterogeneity of tumor cells identified in one or more of the WSIs. In some embodiments, assessing tumor heterogeneity includes classifying mutated tumor cells into phenotypes by identifying morphologically similar cells (e.g., by assessing nuclear heterogeneity). In some embodiments, assessing nuclear heterogeneity includes quantifying certain features of cell nuclei in order to distinguish mutated cells based on heterogeneity of nuclear morphology.
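One simple way to quantify nuclear heterogeneity, offered here only as an illustrative sketch, is the coefficient of variation of each measured nuclear feature, averaged over features. The disclosure does not state which features or statistics are used; the feature names in the usage note (`area`, `eccentricity`) and the functions below are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean (dimensionless spread)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of the feature values must be non-zero")
    return pstdev(values) / m


def nuclear_heterogeneity(features: Dict[str, List[float]]) -> float:
    """Average the coefficient of variation over all measured nuclear features.

    A value near 0 suggests morphologically uniform (clonal-looking) nuclei;
    larger values suggest greater morphological heterogeneity.
    """
    return mean(coefficient_of_variation(v) for v in features.values())
```

For instance, `nuclear_heterogeneity({"area": [48.0, 50.0, 52.0], "eccentricity": [0.60, 0.62, 0.61]})` would yield a small score for a uniform-looking population of nuclei.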

In some embodiments, identifying tumor heterogeneity includes identifying regions of clonal cells by performing a cell-level spatial analysis to assess spatial distribution. In some embodiments, assessing spatial distribution includes measuring spectral distances within subgraphs of a minimum spanning tree of the tumor cells, where each subgraph represents a cluster of neighboring cells (e.g., a tumor nest), and computing adjacency spectral distances pairwise across all of the subgraphs. In some embodiments, each of the subgraphs may be defined by performing outlier detection. In some embodiments, each of the subgraphs may be defined based on a segmentation of detected tumor nests.
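
The subgraph construction described above can be sketched as follows: build a minimum spanning tree over tumor-cell centroids, drop edges whose length is an outlier, and treat the remaining connected components as candidate tumor nests. This is a minimal sketch under stated assumptions: the z-score cutoff, the function names, and the use of Prim's algorithm are illustrative choices, and the spectral-distance computation itself is not shown.

```python
import math
import statistics


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges over point indices.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_into_nests(points, z_cutoff=2.0):
    """Remove outlier-length MST edges and return the connected components."""
    edges = minimum_spanning_tree(points)
    lengths = [length for _, _, length in edges]
    if len(lengths) >= 2:
        mean = statistics.mean(lengths)
        std = statistics.pstdev(lengths)
        kept = [e for e in edges if std == 0 or (e[2] - mean) / std <= z_cutoff]
    else:
        kept = edges

    # Union-find over the kept edges to recover the components.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, _ in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical cell centroids forming two well-separated groups.
cells = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50), (51, 50), (50, 51), (51, 51)]
print(split_into_nests(cells))
```

Each returned component is a list of cell indices and could then serve as one subgraph for a per-nest spectral analysis.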

The gene fusion prediction module 215 of the digital pathology image processing system 210 may apply one or more techniques to predict the likelihood that a gene fusion is present (e.g., as a binary output or a percentage output). In some embodiments, the gene fusion prediction module 215 may evaluate and/or aggregate the results of assessing tumor heterogeneity, the results of end-to-end prediction of gene fusions, the results of assessing tumor morphology, and/or the results of other methods to arrive at a prediction (e.g., a score).
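
One simple way to aggregate several per-method results into a single score is a weighted mean followed by an optional threshold for the binary output. The sketch below assumes each method already produces a score in [0, 1]; the function names, the equal default weights, and the 0.5 threshold are hypothetical and are not taken from the disclosure.

```python
def aggregate_fusion_score(component_scores, weights=None):
    """Weighted mean of per-method scores, each assumed to lie in [0, 1].

    component_scores: dict mapping a method name to its score.
    weights: optional dict with one weight per method (defaults to equal).
    """
    if not component_scores:
        raise ValueError("at least one component score is required")
    if weights is None:
        weights = {name: 1.0 for name in component_scores}
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(component_scores[name] * weights[name] for name in component_scores)
    return weighted / total_weight


def to_binary(score, threshold=0.5):
    """Convert the aggregated score into a binary fusion call."""
    return score >= threshold


# Hypothetical per-method scores for one slide.
scores = {"heterogeneity": 0.8, "end_to_end": 0.6, "morphology": 0.7}
combined = aggregate_fusion_score(scores)
print(combined, to_binary(combined))
```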

The output generation module 216 of the digital pathology image processing system 210 may generate output corresponding to one or more of the image tiles and one or more WSI data sets based on a user request. As described herein, the output may include a variety of visualizations, interactive graphics, and reports, depending on the type of request and the type of data available. In some embodiments, the output is provided to the user device 230 for display, but in certain embodiments the output may be accessed directly from the digital pathology image processing system 210. The output depends on the existence of, and access to, the appropriate data, so the output generation module 216 is authorized to access the necessary metadata and anonymized patient information as needed. As with the other modules of the digital pathology image processing system 210, the output generation module 216 may be updated and improved in a modular fashion, so that new output features can be made available to users without significant downtime.

The general techniques described herein may be integrated into a variety of tools and use cases. For example, as described, a user (e.g., a pathologist or clinician) may access a user device 230 that communicates with the digital pathology image processing system 210 and provide a query image for analysis. The digital pathology image processing system 210, or a connection to it, may be provided as a standalone software tool or package that searches for corresponding matches, identifies similar features, and generates appropriate output for the user upon request. As a standalone tool or plug-in that can be purchased or licensed on a streamlined basis, the tool may be used to augment the capabilities of a research or clinical laboratory. Additionally, the tool may be integrated into the services made available to customers of a digital pathology image generation system. For example, the tool may be provided as a unified workflow in which a user who performs or requests the automated creation of a WSI for a submitted sample receives a report of notable features within the image and/or within similar WSIs that have previously been indexed. Therefore, in addition to improving WSI analysis, these techniques may be integrated into existing systems to provide additional functionality that was not previously considered or possible.

Moreover, the digital pathology image processing system 210 may be trained and customized for use in particular settings. For example, the digital pathology image processing system 210 may be specifically trained to provide insights relating to particular types of tissue (e.g., lung, heart, blood, liver, etc.). As another example, the digital pathology image processing system 210 may be trained to assist with safety assessment, for example in determining the level or degree of toxicity associated with a drug or other potential treatment. Once trained for a particular subject matter or use case, the digital pathology image processing system 210 is not necessarily limited to that use case. Training may be performed in a particular context, e.g., toxicity assessment, because a relatively large collection of at least partially labeled or annotated images is available for that context.

The methods and systems disclosed herein may enable a user to readily request a prognostic prediction based on a digital pathology image provided by the user. In some cases, the digital pathology image processing system 210 may transmit a request communication from a client computing system to a remote computing system to process a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. In response to receiving the request communication from the client computing system, the remote computing system may perform operations including the following steps. The remote computing system may first access the digital pathology image. The remote computing system may then segment the digital pathology image into a plurality of image tiles, each depicting one or more clusters of tumor cells. The remote computing system may then generate, for each of the plurality of image tiles, a label indicating the likelihood that the image tile depicts tumor heterogeneity. The remote computing system may then determine, based on the label generated for each image tile, that the digital pathology image includes a depiction of the occurrence of an actionable mutation with respect to the cancer cells. The remote computing system may then generate a prognostic prediction for the subject based on the occurrence of a gene fusion with respect to the cancer cells. In some cases, the prognostic prediction may include a prediction of the suitability of one or more treatment regimens for the subject. The remote computing system may further provide the prognostic prediction to the client computing system via a response communication. In some cases, the client computing system may output the prognostic prediction in response to receiving the response communication.

FIG. 3 illustrates an exemplary method 300 for detecting a genetic alteration (e.g., a gene fusion). The method may include step 310, in which the digital pathology image processing system 210 depicted in FIG. 3 may access a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. By way of example and not limitation, the digital pathology image may be a scanned, stained (e.g., hematoxylin and eosin stained) WSI that includes tumor cells (e.g., lung adenocarcinoma).

At step 320 of FIG. 3, the digital pathology image processing system 210 may segment the digital pathology image into a plurality of image tiles, each depicting at least one cluster of tumor cells. In particular embodiments, the tile generation module 211 depicted in FIG. 3 may be used to generate the image tiles. The image tiles may be non-overlapping or overlapping. Along with the size of each image tile and the stride of the window used to create the tiles, features such as whether the tiles overlap may increase or decrease the size of the data set available for analysis, with more image tiles increasing the potential resolution of the final output and visualizations. In particular embodiments, each image tile may have a predefined size and/or the offset between image tiles may be predefined. Furthermore, the tile generation module 211 may create, for each image, multiple sets of image tiles with different sizes, overlaps, strides, etc. The tile generation module 211 may generate image tiles for each digital pathology image in one or more color channels or for one or more color combinations. Image tiles may be generated based on separating the color channels and/or generating a luminance map or grayscale equivalent of each image tile. Additionally, the digital pathology image processing system 210 may upsample or downsample an image provided at a particular color depth so that it is usable by the digital pathology image processing system 210. Additionally, the digital pathology image processing system 210 may convert the image tiles according to the type of image that was captured.
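
The relationship between tile size, stride, and the number of tiles can be made concrete with a short sketch. The function name `tile_origins` and the example dimensions are hypothetical; a real WSI reader would also have to handle pyramid levels and partial tiles at the image border, which this sketch ignores by keeping only tiles that fit entirely inside the image.

```python
def tile_origins(image_width, image_height, tile_size, stride):
    """Yield (x, y) top-left corners of square tiles that fit inside the image.

    stride == tile_size gives non-overlapping tiles;
    stride < tile_size gives overlapping tiles.
    """
    if tile_size <= 0 or stride <= 0:
        raise ValueError("tile_size and stride must be positive")
    for y in range(0, image_height - tile_size + 1, stride):
        for x in range(0, image_width - tile_size + 1, stride):
            yield (x, y)


# Non-overlapping versus 50%-overlapping tiling of a 1024 x 1024 region.
non_overlapping = list(tile_origins(1024, 1024, tile_size=256, stride=256))
overlapping = list(tile_origins(1024, 1024, tile_size=256, stride=128))
print(len(non_overlapping), len(overlapping))
```

Halving the stride roughly quadruples the number of tiles in a two-dimensional image, which is the trade-off between data set size and output resolution noted above.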

At step 330 of FIG. 3, the digital pathology image processing system 210 may generate, for each of the plurality of image tiles, a label indicating the likelihood that the image tile (such as an image tile depicting an actionable mutation) depicts clustered tumor cells (e.g., a tumor region or tumor nest structure). By way of example and not limitation, the digital pathology image processing system 210 may detect one or more features from each of the plurality of image tiles. The one or more features may include, for example, one or more of histological features (such as cell types or cell groupings), clinical features, or genomic features. Accordingly, generating the label for each of the plurality of image tiles may be based on the one or more features. In particular embodiments, generating the label for each of the plurality of image tiles may be based on one or more of tile-based classification or multiple-instance learning (MIL) classification. Generating the label for each of the plurality of image tiles may be based on the use of one or more trained machine learning models. In particular embodiments, the digital pathology image processing system 210 may train the one or more machine learning models on a plurality of training data comprising one or more labeled depictions of samples that include, for example, a tumor region or tumor nest structure, and one or more labeled depictions of samples that do not include a tumor region or tumor nest structure. In particular embodiments, generating the label for each of the plurality of image tiles may be based on tissue morphology, e.g., tumor morphology. Tumor morphology may be based on, for example, an analysis of one or more of the following histological features: the presence or number of signet ring cells, the presence or number of hepatoid cells, extracellular mucin, or tumor growth pattern.
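
In a multiple-instance learning setting, the slide is treated as a bag of tiles and only the bag carries a ground-truth label, so per-tile probabilities must be pooled into a slide-level score. The sketch below shows two common pooling rules, max pooling and top-k mean pooling. Both rules, the function names, and the value of k are illustrative assumptions rather than the pooling used in the disclosure.

```python
def slide_score_max(tile_probs):
    """Max pooling: the slide is as positive as its most positive tile."""
    if not tile_probs:
        raise ValueError("no tiles to pool")
    return max(tile_probs)


def slide_score_top_k_mean(tile_probs, k=3):
    """Mean of the k highest tile probabilities (less sensitive to one outlier)."""
    if not tile_probs:
        raise ValueError("no tiles to pool")
    top = sorted(tile_probs, reverse=True)[:k]
    return sum(top) / len(top)


# Hypothetical per-tile probabilities produced by a tile-level classifier.
tile_probs = [0.1, 0.9, 0.8, 0.2, 0.7]
print(slide_score_max(tile_probs))
print(slide_score_top_k_mean(tile_probs, k=3))
```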

At step 340 of FIG. 3, the digital pathology image processing system 210 may determine, based on the label generated for each image tile, that the digital pathology image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells in the image. In particular embodiments, the digital pathology image processing system 210 may use any of a variety of different approaches to determine efficiently which gene fusions are present. For example, one approach may include combining a target gene fusion (e.g., an NTRK fusion) with other gene fusions (such as ROS1, ALK, and RET fusions) into a single actionable gene fusion cluster. The cluster may be treated as a single class of gene fusion to facilitate detection. In this approach, rather than attempting to identify each gene fusion individually, the digital pathology image processing system 210 treats them as a single group, and therefore no longer needs to detect gene fusions that individually occur at a frequency of less than half a percent. By way of example and not limitation, the combined frequency of occurrence of these gene fusions may be about 15%.
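
Treating the fusions as one class amounts to relabeling the training annotations before a binary classifier is fitted. The sketch below shows that relabeling step only. The gene set follows the text, while the function names and the toy annotation list are hypothetical, and the toy prevalence is not a real frequency.

```python
# Fusion partners grouped into the single "actionable gene fusion cluster".
ACTIONABLE_FUSION_GENES = {"NTRK", "ROS1", "ALK", "RET"}


def to_cluster_label(fusion_gene):
    """Map a per-gene fusion annotation (or None for no fusion) to 1 or 0."""
    return 1 if fusion_gene in ACTIONABLE_FUSION_GENES else 0


def cluster_prevalence(annotations):
    """Fraction of samples that fall in the actionable fusion cluster."""
    if not annotations:
        raise ValueError("no annotations given")
    labels = [to_cluster_label(gene) for gene in annotations]
    return sum(labels) / len(labels)


# Toy per-slide annotations; None marks a slide without a known fusion.
annotations = ["ALK", None, None, "RET", None, None, "NTRK", None]
print([to_cluster_label(gene) for gene in annotations])
print(cluster_prevalence(annotations))
```

The pooled positive class is more frequent than any single fusion, which gives a classifier more positive examples to learn from.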

Detection of actionable gene fusions may be based on one or more of the following: (i) automated detection of histological features using one or more end-to-end, data-driven machine learning models, (ii) identification of mutually exclusive gene mutations (thereby identifying the lack, or absence, of a gene fusion), (iii) detection of NTRK gene fusions by grouping NTRK with ALK, ROS1, and RET into a single "actionable gene fusion cluster" and identifying that cluster, (iv) automated detection of histological features associated with ALK, ROS1, and RET (including solid and cribriform growth patterns, extracellular mucin, signet ring cells, goblet cells, and hepatoid cells), (v) identification and exclusion of smoking-associated mutational signatures, (vi) identification of low tumor mutational burden, (vii) assessment of tumor heterogeneity, or (viii) identification of pan-tumor or tumor-agnostic actionable gene fusion clusters.

Another approach may include using the molecular landscape and molecular signatures of these tumors. In particular embodiments, the signal for a fusion may be located primarily in the tumor nests/cells and may be strong and diffuse across the entire tumor region. Therefore, in addition to identifying gene fusions directly from the slide, the digital pathology image processing system 210 may also identify gene fusions based on the mutually exclusive distribution of molecular signatures across tumors.

In particular embodiments, the digital pathology image processing system 210 may indicate the occurrence of a gene fusion to a pathologist, for example by presenting a comparison between a fusion-positive slide image and the same field of view overlaid with a heat map of the gene fusion predictions for that slide. Comparing the two, the pathologist can see how the tumor detection algorithm in some embodiments disclosed herein rejects image tiles that do not contain tumor. Furthermore, the confidence measure of the gene fusion prediction (as depicted, for example, by the intensity of the heat map) may vary across the tumor region. The confidence measure may be highest in regions with signet ring cells.
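
A heat map of this kind can be assembled from per-tile outputs by placing each tile's fusion probability at its grid position and leaving rejected (non-tumor) tiles empty. The sketch below assumes each tile comes with a tumor probability and a fusion probability; the tuple layout, the function name `build_heatmap`, and the 0.5 tumor threshold are hypothetical.

```python
def build_heatmap(grid_rows, grid_cols, tile_predictions, tumor_threshold=0.5):
    """Arrange per-tile fusion probabilities on the tile grid.

    tile_predictions: iterable of (row, col, tumor_prob, fusion_prob).
    Tiles whose tumor_prob is below tumor_threshold are rejected and stay
    None, so they can be rendered as transparent in an overlay.
    """
    heatmap = [[None] * grid_cols for _ in range(grid_rows)]
    for row, col, tumor_prob, fusion_prob in tile_predictions:
        if tumor_prob >= tumor_threshold:
            heatmap[row][col] = fusion_prob
    return heatmap


# Hypothetical predictions for a 2 x 2 tile grid.
predictions = [
    (0, 0, 0.9, 0.8),   # tumor tile, high fusion confidence
    (0, 1, 0.2, 0.9),   # rejected: unlikely to contain tumor
    (1, 0, 0.7, 0.3),   # tumor tile, low fusion confidence
]
for row in build_heatmap(2, 2, predictions):
    print(row)
```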

At step 350, the digital pathology image processing system 210 may generate a prognostic prediction for the subject based on the detected occurrence of a gene fusion with respect to the cancer cells, where the prognostic prediction includes a prediction of the suitability of one or more treatment regimens for the subject. The digital pathology image processing system 210 may output the prognostic prediction, for example, via a graphical user interface. By way of example and not limitation, the digital pathology image processing system 210 may output a treatment regimen assessment. The digital pathology image processing system 210 may generate a recommendation associated with the use of the one or more treatment regimens. For example, the assessment may be that the patient likely has a gene fusion. As a follow-up or further step, the digital pathology image processing system 210 may recommend follow-up molecular testing (such as a next-generation sequencing assay). In some embodiments, one or more steps of the method depicted in FIG. 3 may be repeated where appropriate. Although this disclosure describes and illustrates particular steps of the method of FIG. 3 as occurring in a particular order, this disclosure contemplates any suitable steps of the method of FIG. 3 occurring in any suitable order. Moreover, although this disclosure describes and illustrates an exemplary method for detecting gene fusions (or other genetic alterations) that includes the particular steps of the method depicted in FIG. 3, this disclosure contemplates any suitable method for detecting gene fusions that includes any suitable steps, which may include all, some, or none of the steps of the method of FIG. 3, where appropriate. Furthermore, although this disclosure describes and illustrates particular components, devices, or systems carrying out particular steps of the method of FIG. 3, this disclosure contemplates any suitable combination of any suitable components, devices, or systems carrying out any suitable steps of the method of FIG. 3.
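
The final step, turning a slide-level fusion score into an assessment and a follow-up recommendation, might look like the sketch below. The `Assessment` structure, the 0.5 threshold, and the wording of the recommendation strings are all assumptions made for illustration; they are not clinical guidance and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    fusion_score: float
    likely_fusion: bool
    recommendation: str


def assess(fusion_score, threshold=0.5):
    """Turn a slide-level fusion score into an assessment with a next step."""
    likely = fusion_score >= threshold
    if likely:
        recommendation = (
            "Gene fusion likely: consider confirmatory molecular testing, "
            "e.g. a next-generation sequencing assay."
        )
    else:
        recommendation = "Gene fusion unlikely based on the image alone."
    return Assessment(fusion_score, likely, recommendation)


result = assess(0.8)
print(result.likely_fusion)
print(result.recommendation)
```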

In some cases, the disclosed methods and systems may be applied to the detection of gene fusions/rearrangements, a particular type of rare, druggable oncogenic mutation event that can be identified across many different cancer types and that, if present in a tumor tissue sample, may indicate a strong response to certain targeted therapies. Gene fusions include rare druggable mutational events that can occur across many different tumor types and that are increasingly the targets of novel therapies. Identifying gene fusions can be a technically difficult, expensive, and time-consuming process that may ultimately benefit only the small number of patients who carry such genetic alterations; for these reasons, broad testing may be limited to the few hospitals able to absorb and provide the technical and financial resources the process involves. Embodiments disclosed herein may address this gap by creating, training, and using a machine learning model (e.g., a digital pathology screening model) that can predict the presence of oncogenic gene fusions from digital pathology images, such as scanned, stained (e.g., hematoxylin and eosin stained) WSIs depicting cancer tissue/cells (e.g., lung adenocarcinoma). Furthermore, embodiments disclosed herein may include a fast, inexpensive, and sufficiently accurate screening tool that can be used to guide molecular testing and decision making regarding the use of targeted therapies for individual patients, including but not limited to patients with lung adenocarcinoma.

In some cases, as noted elsewhere herein, the disclosed methods and systems may be used to identify gene fusions that are increasingly the targets of novel therapies. Targeted therapies for patients with tumors may include drugs that target the epidermal growth factor receptor (EGFR) as well as gene fusions involving anaplastic lymphoma kinase (ALK), RET, ROS1, and neurotrophic tyrosine receptor kinase (NTRK). For EGFR, although immunohistochemical staining can be used to identify the most common variants (e.g., covering up to 97% of EGFR-positive lung adenocarcinoma patients), molecular testing may be needed to identify resistance mutations in patients for whom EGFR-targeted therapy has failed. No such immunohistochemical stains have been developed for RET and ROS1, and immunohistochemical stains for ALK and NTRK may be highly variable and difficult to interpret. In addition, gene fusions typically require more complex molecular assays that cover more of the genome than the more commonly used "hotspot" assays, which test a limited number of loci. Targeting gene fusions may require much broader coverage, resulting in much more expensive tests that demand far greater technical capability from the laboratory. As a result, a substantial proportion of patients may be unlikely to receive the right test to determine the likelihood that their tumor carries a gene fusion. Beyond this, some gene fusions (e.g., NTRK fusions) may be extremely rare. Although NTRK fusions have been identified in a wide variety of tumor types, the frequency of this particular fusion may be less than 1% in the most common cancer indications (such as lung adenocarcinoma, colorectal cancer, and non-secretory breast cancer). The relative rarity of gene fusions (e.g., in lung adenocarcinoma, ranging from 7% for ALK to less than 0.3% for NTRK) poses a significant technical and financial barrier to broad testing. Indeed, studies have shown that the patient populations that benefit most from these drugs are those living near academic institutions that have the expertise, infrastructure, and budget to perform complex laboratory testing. At present, molecular testing is the only method available for determining the likelihood that a gene fusion is present in a patient. However, molecular testing is expensive, and patients sometimes avoid scheduling it because of the expense and/or because it imposes an unnecessary cost on patients who may not benefit from it. The present embodiments offer an improvement over current systems in that they can be used to identify the patients who may benefit from molecular testing. In particular, the digital pathology image processing system described herein may use a digital pathology machine learning model to screen for patients who are likely to have a gene fusion, and may then provide a recommendation to test those patients using a molecular assay. The disclosed digital pathology image processing system may therefore increase the likelihood of detecting gene fusions in patients and may reduce the cost of follow-up molecular testing, further benefiting and improving healthcare outcomes for those patients who exhibit gene fusions for which targeted therapies exist. The digital pathology model may be applicable to any suitable tumor type, although the embodiments disclosed herein contemplate applying the digital pathology model to lung adenocarcinoma as an exemplary tumor type.

In particular embodiments, the digital pathology image processing system 210 may use different solutions to detect gene fusions efficiently. One solution may be to combine a target gene fusion (e.g., an NTRK fusion) with other gene fusions (such as ROS1, ALK, and RET) into a single actionable gene fusion cluster. The cluster may then be treated as a single class of gene fusion. Because the digital pathology image processing system 210 does not attempt to identify each of these gene fusions individually but instead handles them as a single group, it no longer has to contend with the frequency of less than half a percent at which each individual gene fusion occurs. By way of example and not limitation, the combined frequency of these gene fusions may be about 15%.

Another approach may include using the molecular landscape and molecular signatures of these tumors. In particular cases, the signal for a fusion may appear primarily in the tumor nests/cells and may be strong and diffuse across the tumor region. Therefore, in addition to identifying fusions directly from the slide, the digital pathology image processing system 210 may also identify gene fusions based on the mutually exclusive distribution of molecular signatures across tumors. By way of example and not limitation, the morphology of lung adenocarcinoma may be mapped onto a molecular landscape that may include, for example and without limitation: 17% EGFR sensitizing, 7% ALK, 4% other EGFR, 3% with more than one mutation, 2% HER2, 2% ROS1, 2% BRAF, 2% RET, 1% NTRK1, 1% PIK3CA, 1% MEK1, 31% unknown oncogenic driver, and 25% KRAS alterations. Among the most common driver mutations in lung adenocarcinoma, only three percent of cases may have more than one mutation, which means that 97% of lung cancer patients carry a single mutation. It is therefore significantly more common for driver mutations to be mutually exclusive, and this property can be used in a variety of contexts to inform clinical decision making in the treatment of cancer patients. In some embodiments, the digital pathology image processing system 210 may access a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. The digital pathology image processing system 210 may then determine that the digital pathology image includes a depiction of one or more mutations that are mutually exclusive with the occurrence of a gene fusion, and therefore determine that no gene fusion is present with respect to the cancer cells. In some cases, the digital pathology image processing system 210 may further generate a prognostic prediction for the subject based on the absence of a gene fusion with respect to the cancer cells. In particular embodiments, the digital pathology image processing system 210 may further generate a prognostic prediction for the subject based on the absence of a gene fusion with respect to the cancer. The prognostic prediction may include, for example, a prediction of the suitability of one or more treatment regimens for the subject. Because of this mutual exclusivity, in addition to positively identifying gene fusions, the digital pathology model may also identify more common mutations (such as KRAS and EGFR) and thereby rule out the presence of a gene fusion.
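
The rule-out logic based on mutual exclusivity can be sketched as a simple check: if a more common driver such as KRAS or EGFR is predicted with high confidence, a gene fusion is treated as unlikely. The driver list follows the examples in the text; the function name, the input dictionary of predicted probabilities, and the 0.9 threshold are hypothetical.

```python
# Common drivers that the text describes as mutually exclusive with fusions.
MUTUALLY_EXCLUSIVE_DRIVERS = ("KRAS", "EGFR")


def fusion_ruled_out(driver_probs, threshold=0.9):
    """Return the driver that rules out a gene fusion, or None if there is none.

    driver_probs: dict mapping a gene name to the predicted probability that
    the tumor carries a driver mutation in that gene.
    """
    for gene in MUTUALLY_EXCLUSIVE_DRIVERS:
        if driver_probs.get(gene, 0.0) >= threshold:
            return gene
    return None


# Hypothetical model outputs for two slides.
print(fusion_ruled_out({"KRAS": 0.95, "EGFR": 0.02}))
print(fusion_ruled_out({"KRAS": 0.40, "EGFR": 0.20}))
```

A return value of None means only that no exclusion applies; it does not by itself indicate that a fusion is present.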

In particular embodiments, the digital pathology image processing system 210 may detect one or more features from each of the plurality of image tiles. The one or more features may include one or more of clinical features or histological features, such as cell types. Accordingly, generating the label for each of the plurality of image tiles may be based on the one or more features. By way of example and not limitation, the clinical features may include one or more of a younger age at diagnosis or an estimate of smoking history. In particular embodiments, predicting an actionable gene fusion may be based on identifying and excluding smoking-associated mutational signatures. As another example and not by way of limitation, the histological features may include growth patterns such as solid, cribriform, micropapillary, papillary, acinar, or lepidic. In particular embodiments, predicting or determining that an actionable gene fusion is present may be based on detecting histological features associated with ALK, ROS1, and RET (including solid and cribriform growth patterns and/or extracellular mucin). As another example and not by way of limitation, predicting or determining an actionable gene fusion may be based on detecting cell types associated with ALK, ROS1, and RET. These cell types may include one or more of signet ring cells, goblet cells, or hepatoid cells. Different features may have different degrees of importance for different tumor types. Automated detection and quantification of each of these visual features may allow prediction of, for example, ALK, ROS1, RET, and NTRK in, for example, lung adenocarcinoma.

In particular embodiments, another feature of fusions and tumors that the digital pathology image processing system 210 (or a digital pathology machine learning model resident therein) can use to determine the presence of a genetic alteration (e.g., a gene fusion) is tumor mutation burden (TMB). In some cases, for example, a kinase or oncogene fusion may be associated with a low tumor mutation burden. The primary oncogenic driver of a tumor may be a single gene fusion. Therefore, one can expect the morphological signal derived from a single oncogenic driver to be present across most tumor cells/regions in the tissue specimen on a slide. End-to-end gene fusion status predictions can also show a strong, uniform signal across the whole slide.

In some cases, a low tumor mutation burden may indicate reduced morphological heterogeneity of the tumor. A patient may be characterized as having driver mutations, mutations in driver genes, and/or driver fusions (e.g., gene fusions involving driver genes). In some cases, the tumor mutation burden of a cancer may be driven by driver mutations. In some cases, the tumor mutation burden of a cancer may also be driven by gene fusions. In some cases, cancers driven by gene fusions may have a significantly lower tumor mutation burden. Therefore, a low tumor mutation burden may be associated with low tumor heterogeneity.

The digital pathology model can generalize across different tumor types. Therefore, the digital pathology image processing system 210 can identify and predict pan-tumor or tumor-agnostic actionable gene fusions based on the digital pathology model. As an example and not by way of limitation, when the digital pathology image processing system 210 trains the digital pathology model separately for ALK fusions and for ROS1 fusions, the performance is the same. As another example, the signal for NTRK can be ranked with ALK, ROS1, and RET. For example, even if the digital pathology model never used NTRK-based training data in training, it can identify NTRK fusions with the same accuracy that it achieved for ROS1 fusions in the experiments on the embodiments disclosed herein. The generalizability of the digital pathology model can indicate that the features are consistent across different gene fusions and across different tumor types.

The embodiments disclosed herein may have the technical advantage of performing the analysis with materials that are more readily available and cheaper than those needed for the corresponding molecular testing. In some embodiments, the sections of the biological sample may be stained with one or more stains. As an example and not by way of limitation, the digital pathology image processing system may be used to scan slides stained with, for example, hematoxylin and eosin (H&E); the original tissue specimen slides remain readily available for any new or subsequent diagnostic analysis. In contrast, molecular testing may require cutting into the tissue block and sacrificing some tissue for sequencing, which consumes diagnostic tissue material. As can be seen, analyzing image data with a digital pathology machine learning model does not destroy tissue. In some cases, the digital pathology image of the initial diagnostic slide can be used for the analysis, with no need for additional slides. In some embodiments, a prognosis prediction may be generated based on further analysis of one or more additional digital pathology images. In some cases, each of the one or more additional digital pathology images may depict an additional section of the biological sample from the subject. In some embodiments, the analysis may include determining, for each of the one or more additional digital pathology images, a likelihood that the image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells, and combining the determinations for each of the one or more additional digital pathology images. In some cases, after a diagnosis is made with an H&E-stained specimen slide, additional unstained specimen slides (e.g., at least 5, 6, 7, 8, 9, 10, or more than 10 unstained specimen slides) may need to be sacrificed for molecular testing.

The embodiments disclosed herein can have the further technical advantage of being easy to use. A user can scan a pathology specimen slide and input the scanned image, or image tile data derived from it, into the digital pathology machine learning model. The digital pathology machine learning model can then be used to predict the likelihood that a genetic alteration (e.g., a gene fusion) is present in the biological sample. In some cases, the process may not require any annotation by a pathologist. In some cases, a pathologist may only need to correctly identify the slide as the target tumor type, e.g., lung adenocarcinoma. The embodiments disclosed herein can have the further technical advantage of efficiency. As an example and not by way of limitation, the prediction of a gene fusion can be completed within about a few minutes, hours, or days (e.g., in less than 60 minutes, less than 50 minutes, less than 40 minutes, less than 30 minutes, less than 25 minutes, less than 20 minutes, less than 15 minutes, or less than 10 minutes).

In some cases, predicting or determining the presence of an actionable gene fusion can be based at least in part on the detection of extracellular mucin. Excess extracellular mucin has been reported to indicate fusion status, and the disclosed methods for predicting gene fusion status can confirm these findings. In some cases, the digital pathology image processing system 210 can predict gene fusion status in detail, identify differences between, for example, resections and biopsies, determine a precise segmentation of regions, perform a coarse detection of image tiles containing extracellular mucin, and transition from tumor region detection to actual gene fusion status prediction. As an example and not by way of limitation, in some cases, transitioning from tumor region detection to actual gene fusion status prediction can include determining the fraction of detected mucin relative to tissue, or determining the fraction of detected mucin relative to tumor.
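The two mucin fractions mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an upstream classifier has already assigned each image tile three boolean flags; the `Tile` class, its field names, and the function `mucin_fractions` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Hypothetical per-tile flags produced by upstream classifiers."""
    is_tissue: bool
    is_tumor: bool
    has_mucin: bool


def mucin_fractions(tiles):
    """Return (mucin / tissue, mucin / tumor) as tile-count ratios.

    A ratio is None when its denominator is zero, for example when
    no tumor tiles were detected on the slide.
    """
    tissue = sum(t.is_tissue for t in tiles)
    tumor = sum(t.is_tumor for t in tiles)
    # Only count mucin that lies on tissue tiles.
    mucin = sum(t.has_mucin and t.is_tissue for t in tiles)
    frac_tissue = mucin / tissue if tissue else None
    frac_tumor = mucin / tumor if tumor else None
    return frac_tissue, frac_tumor
```

Either ratio could then serve as one input feature to the fusion status prediction; which of the two is more informative is not stated in the disclosure.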

In some cases, the digital pathology machine learning model may be generally applicable across different tumor types. Therefore, the digital pathology image processing system 210 can be used to identify and predict pan-tumor or tumor-agnostic actionable gene fusions using the digital pathology machine learning model. For example, the performance of the digital pathology image processing system 210 is the same whether it includes a digital pathology machine learning model trained for ALK fusions or one trained for ROS1 fusions. As another example, the signal for NTRK fusions can be ranked with ALK, ROS1, and RET. For example, even if the digital pathology machine learning model is trained without NTRK-based training data, it can identify NTRK fusions with the same accuracy that it achieved for the detection of ROS1 fusions in experiments testing the methods disclosed herein. The general applicability of the digital pathology machine learning model can indicate that the underlying image tile features used for prediction are consistent across different gene fusions and across different tumor types.

FIG. 4 shows an exemplary workflow diagram for a process 400 for detecting gene fusions in a biological sample (e.g., a tissue specimen). The workflow 400 can start with tissue selection 410. In tissue selection 410, the digital pathology image processing system 210 can perform quality control and/or tumor detection. In particular embodiments, quality control and tumor region detection can include performing supervised classification tasks. For such tasks, image tile-level accuracy can be sufficient, and the digital pathology image processing system 210 can use, for example, one binary classifier per task. The results of tissue selection 410 can then be provided as input to an end-to-end classification step 420. In some embodiments, the end-to-end classification 420 can be based on one or more of the image tile-based classification or multiple-instance learning (MIL) classification techniques described above. As part of the end-to-end classification 420, generating a label for each of the plurality of image tiles can be based on one or more machine learning models. In some embodiments, the digital pathology image processing system 210 can train the one or more machine learning models on a plurality of training data that includes, for example, one or more labeled depictions of tumor regions or tumor nest structures and one or more labeled depictions of other histological or clinical features.

While performing the end-to-end classification 420, the digital pathology image processing system 210 may also perform a tumor morphology analysis 430. In some embodiments, generating a label for each of the plurality of image tiles may be based on tumor morphology. The tumor morphology analysis 430 may include an analysis to identify one or more of signet ring cells, hepatoid cells, extracellular mucin, or tumor growth patterns. In some cases, growth pattern analysis may be helpful for gene fusion detection. As an example and not by way of limitation, lung adenocarcinoma may present multiple growth patterns, each in a different proportion. As another example and not by way of limitation, in some cases, solid and cribriform patterns may be associated with gene fusions. In some cases, the digital pathology image processing system 210 may determine the effect of the sample collection type (e.g., resection versus biopsy) on the growth pattern. Because growth patterns are typically large, homogeneous areas, image tile-level classification may be sufficiently accurate. In some embodiments, both signet ring cell detection and hepatoid cell detection may be associated with the presence of gene fusions. To detect such cells of interest, the digital pathology image processing system 210 may rely on object detection and localization. In some cases involving detection of cells of interest, the digital pathology image processing system 210 can determine the relationship between the detected cells and the fusion status, for example, based on the number or type of detected cells. In some cases, the digital pathology image processing system 210 can further perform fine-grained localization or image tile-level detection of cells.

The digital pathology image processing system 210 may also use other methods 440 for gene fusion detection. As an example and not by way of limitation, in some cases, the digital pathology image processing system 210 may identify tumor heterogeneity (variability in the size, shape, and staining of tumor cells) from a digital pathology image and measure the identified tumor heterogeneity. Accordingly, in some cases, determining that a digital pathology image includes a depiction of the occurrence of a gene fusion may be further based on the measured tumor heterogeneity.
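One simple way to turn "variability in size, shape, and staining" into a number is sketched below. The disclosure does not specify a heterogeneity metric; the coefficient of variation used here is a swapped-in, commonly used stand-in, and the per-cell keys `area`, `eccentricity`, and `stain_intensity` are hypothetical names for measurements assumed to come from an upstream cell detector.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def heterogeneity_score(cells):
    """Average the coefficient of variation over size, shape and staining.

    `cells` is a list of dicts, one per detected tumor cell, with the
    hypothetical keys "area", "eccentricity" and "stain_intensity".
    """
    features = ("area", "eccentricity", "stain_intensity")
    per_feature = [
        coefficient_of_variation([cell[f] for cell in cells]) for f in features
    ]
    return mean(per_feature)
```

A score near zero corresponds to morphologically uniform cells; how such a score would be combined with the other fusion evidence is left open here.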

The digital pathology image processing system 210 can then perform an aggregation step 450 on the results from the tumor morphology analysis 430, the end-to-end classification 420, and the other methods 440. Any suitable method can be used to aggregate the results (e.g., ensemble classification, or generating all intermediate results from all subtasks and then training a further classification model that consumes all intermediate results and outputs a joint prediction). The aggregated results can be used to predict the fusion status 460 of the tissue specimen. In some embodiments, fusion status prediction can be a weakly supervised classification task (e.g., one in which slide-level labels are available). In some cases, the digital pathology image processing system 210 can classify the plurality of image tiles using a multiple-instance learning (MIL) method. In some cases, the digital pathology image processing system 210 can use a simplified strategy that assigns the slide label to all image tiles. In particular embodiments, determining that the digital pathology image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells can be further based on a weighted combination of the labels generated for each image tile. As an example and not by way of limitation, in some cases, the digital pathology image processing system 210 may use a binary classifier to classify image tiles and then determine a slide-level prediction by combining (e.g., averaging) all image tile predictions.
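The tile-to-slide combination described above (plain averaging, or a weighted combination of tile labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the use of per-tile probabilities as "labels", and the default cutoff of 0.5 are all assumptions.

```python
def slide_score(tile_probs, weights=None):
    """Combine per-tile fusion probabilities into one slide-level score.

    With weights=None this is a plain average; otherwise it is a
    weighted average (weights might, for example, reflect how
    confident a tumor detector was for each tile).
    """
    if not tile_probs:
        raise ValueError("at least one tile prediction is required")
    if weights is None:
        weights = [1.0] * len(tile_probs)
    if len(weights) != len(tile_probs):
        raise ValueError("weights and tile_probs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(tile_probs, weights)) / total


def slide_is_fusion_positive(tile_probs, weights=None, cutoff=0.5):
    """Binary slide-level call obtained by thresholding the combined score."""
    return slide_score(tile_probs, weights) >= cutoff
```

For instance, `slide_score([0.2, 0.8])` is the plain mean of the two tile predictions, while passing `weights=[1, 3]` pulls the slide score toward the second tile.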

In particular embodiments, the digital pathology image processing system 210 can output the prognosis prediction via a graphical user interface. In some cases, the graphical user interface can include a graphical representation of the digital pathology image. In some cases, the graphical representation can include an indication of the label generated for each of the plurality of image tiles and a predicted confidence level associated with the corresponding label. In some cases, the output of the digital pathology image processing system 210 can also include other information, as follows. As an example and not by way of limitation, the digital pathology image processing system 210 can output a treatment regimen assessment. The digital pathology image processing system 210 can generate a recommendation associated with the use of one or more treatment regimens for the subject or patient from whom the biological sample was derived. For example, the assessment can be that a sample from a given subject or patient may have a gene fusion, so that confirmation by a subsequent molecular assay is recommended. As another example and not by way of limitation, the digital pathology image processing system 210 can output a negative result, that is, no gene fusion predicted or detected. As yet another example and not by way of limitation, the digital pathology image processing system 210 can output "insufficient for analysis". For example, "insufficient for analysis" may be due to tumor size or to the slide (e.g., the tumor specimen is too small and/or the quality of the pathology slide is affected by the amount of tissue processing artifacts). For example, the microtome blade used to cut tissue sections may create a series of parallel tears across the slide. These types of sample processing artifacts may prevent the digital pathology machine learning models used to analyze pathology slide images from making accurate predictions.
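The three kinds of output described above can be sketched as a small decision function. The thresholds (`min_tumor_tiles`, `max_artifact_fraction`, `cutoff`) and the function name are hypothetical values chosen for illustration; the disclosure does not give numeric criteria for "insufficient for analysis".

```python
def report(slide_score, tumor_tiles, artifact_fraction,
           cutoff=0.5, min_tumor_tiles=10, max_artifact_fraction=0.3):
    """Map slide-level results to one of three illustrative report strings.

    slide_score:       combined fusion probability for the slide (0..1)
    tumor_tiles:       number of tiles classified as tumor
    artifact_fraction: share of tissue tiles flagged by quality control
    """
    # Too little tumor or too many artifacts: do not attempt a call.
    if tumor_tiles < min_tumor_tiles or artifact_fraction > max_artifact_fraction:
        return "insufficient for analysis"
    if slide_score >= cutoff:
        return "gene fusion likely; confirmation by molecular assay recommended"
    return "negative: no gene fusion predicted"
```

Checking sufficiency before the score keeps a low-quality slide from being reported as negative, which is one plausible reading of why the disclosure lists "insufficient for analysis" as a separate outcome.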

The embodiments disclosed herein can enable a user to easily request, from a client device, a prognosis prediction based on a digital pathology image. In particular embodiments, the digital pathology image processing system 210 can transmit a request communication from a client computing system to a remote computing system to process a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. In response to receiving the request communication from the client computing system, the remote computing system can perform operations including the following steps. The remote computing system can first access the digital pathology image. The remote computing system can then segment the digital pathology image into a plurality of image tiles, each of which depicts one or more tumor cell clusters. The remote computing system can then generate, for each of the plurality of image tiles, a label indicating the likelihood that the image tile depicts a tumor region or a tumor nest structure. The remote computing system can then determine, based on the labels generated for each image tile, that the digital pathology image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells. The remote computing system can then generate a prognosis prediction for the subject based on the occurrence of the gene fusion with respect to the cancer cells. In particular embodiments, the prognosis prediction can include a prediction of the suitability of one or more treatment regimens for the subject. The remote computing system may further provide the prognosis prediction to the client computing system via a response communication. Particular embodiments may further output, by the client computing system, the prognosis prediction in response to receiving the response communication.

FIGS. 5A to 5D illustrate exemplary actionable fusion predictions in lung adenocarcinoma. FIG. 5A illustrates a non-limiting example of pathology slide images for lung adenocarcinoma. The left image 505 is a slide of a metastatic lung adenocarcinoma with a ROS1 fusion. The right image 510 is a slide of a lung adenocarcinoma with an EGFR mutation. FIG. 5B illustrates a non-limiting example of results from quality control. As shown in FIG. 5B, the quality control process may identify tissue 515, markers 520, blur 525, and combined image features 550. FIG. 5C illustrates a non-limiting example of tumor region detection. As shown in FIG. 5C, the darker a region, the more likely it is a tumor region. FIG. 5D illustrates a non-limiting example of a fusion status prediction. As shown in FIG. 5D, the darker a region, the more likely it is to include a gene fusion.

FIG. 6 shows a non-limiting example of a prediction of ROS1 gene fusion status. The images shown in FIG. 6 are an example of a final output that can be provided to a pathologist. The left image 610 shows a fusion-positive slide of a metastatic lung adenocarcinoma including a ROS1 fusion. The right image 620 shows the same field of view as image 610 with an overlaid heat map of the gene fusion prediction. Comparing the two images shows how the tumor detection algorithm rejects image tiles that do not contain tumor. In addition, the confidence measure of the prediction (depicted by the intensity of the heat map) may vary across tumor regions. The confidence measure may be highest in regions with signet ring cells. As can be seen, the digital pathology image processing system 210 can provide output in a format that makes it clear to the pathologist that the digital pathology model relies on interpretable morphological features.

Experiments on actionable fusion prediction in lung adenocarcinoma were conducted to validate the digital pathology model and methods disclosed herein. FIG. 7 shows a non-limiting example of a receiver operating characteristic (ROC) curve 710 for image tile-based fusion prediction. The training set for the digital pathology model included 270 resections. Of these, 18.5% (50 slides, originating from 5 patients) were fusion-positive. Of these fusion-positive slides, 5 were ALK fusion-positive and 45 were ROS1 fusion-positive. The test set included 598 resections and biopsies. Of these, 11% (68 slides) were fusion-positive. Of these fusion-positive slides, 8 were NTRK fusion-positive and 60 were ROS1 fusion-positive. With the cutoff set at 0.5, the performance statistics were as follows: the positive predictive value (PPV) was 0.46 and the negative predictive value (NPV) was 0.97, with an overall area under the curve (AUC) of 0.89.
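As a worked check of the reported proportions, and to make the PPV and NPV definitions concrete, the following sketch may help. The slide counts 50/270 and 68/598 are taken from the text above; the confusion-matrix counts used in the usage note are invented for illustration only and are not the experiment's data, which the disclosure does not report.

```python
def prevalence(positives, total):
    """Share of slides that are fusion-positive."""
    return positives / total


def ppv_npv(tp, fp, tn, fn):
    """Positive and negative predictive value from confusion-matrix counts.

    PPV = TP / (TP + FP): share of predicted positives that are true positives.
    NPV = TN / (TN + FN): share of predicted negatives that are true negatives.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


train_prevalence = prevalence(50, 270)  # about 0.185, i.e. the stated 18.5%
test_prevalence = prevalence(68, 598)   # about 0.114, reported as 11%
```

With hypothetical counts, `ppv_npv(tp=40, fp=10, tn=45, fn=5)` gives a PPV of 0.8 and an NPV of 0.9. At the low prevalence seen in the test set, a modest PPV alongside a high NPV is the expected pattern, which fits the use of the model to rule out fusions and to prioritize confirmatory molecular testing.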

In particular embodiments, the extracellular mucin signal in the end-to-end results can be as follows. Excess extracellular mucin has been reported to indicate fusion status, and end-to-end fusion status prediction can confirm this hypothesis. A strong signal can be observable in mucin pools. In particular embodiments, the digital pathology image processing system 210 can additionally predict fusion status in detail, identify differences between resections and biopsies, determine a precise segmentation of regions, perform a coarse detection of image tiles containing extracellular mucin, and transition from regions to actual fusion status prediction. As an example and not by way of limitation, the transition from regions to actual fusion status prediction can include determining the fraction of mucin relative to tissue, or determining the fraction of mucin relative to tumor.

In particular embodiments, the diffuse fusion signal in the end-to-end prediction can be as follows. A kinase or oncogene fusion can be associated with a low tumor mutation burden. The primary oncogenic driver of a tumor may be a single gene fusion. Therefore, one would expect the morphological signal derived from a single oncogenic driver to be present across most tumor cells/regions on a slide. The end-to-end fusion status prediction can also show a strong, uniform signal across the whole slide.

FIG. 8 illustrates an exemplary method 800 for enabling an end user to use a client computing system to request, from the digital pathology image processing system 210, a prognosis prediction based on processing of a digital pathology image (with a remote computing system performing the steps of method 300). The method may begin at step 810, where the digital pathology image processing system 210 depicted in FIG. 2 may transmit a request communication from a client computing system to a remote computing system to process a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject, wherein, in response to receiving the request communication from the client computing system, the remote computing system performs operations including the following sub-steps. At sub-step 810a, the remote computing system may access the digital pathology image. At sub-step 810b, the remote computing system may segment the digital pathology image into a plurality of image tiles. At sub-step 810c, the remote computing system may generate, for each of the plurality of image tiles, a label indicating the likelihood that the image tile depicts, for example, a tumor region or a tumor nest structure. At sub-step 810d, the remote computing system may determine, based on the labels generated for each image tile, that the digital pathology image includes a depiction of the occurrence of a gene fusion with respect to the cancer cells. At sub-step 810e, the remote computing system may generate a prognosis prediction for the subject based on the occurrence of the gene fusion with respect to the cancer cells, wherein the prognosis prediction includes a prediction of the suitability of one or more treatment regimens for the subject. At sub-step 810f, the remote computing system may provide the prognosis prediction to the client computing system via a response communication. At step 820, the client computing system may output the prognosis prediction in response to receiving the response communication. In some cases, one or more steps of the method depicted in FIG. 8 may be repeated where appropriate. Although the present disclosure describes and illustrates particular steps of the method of FIG. 8 as occurring in a particular order, the present disclosure contemplates any suitable steps of the method of FIG. 8 occurring in any suitable order. Furthermore, although the present disclosure describes and illustrates an exemplary method for enabling an end user to request a prognosis prediction (including the particular steps of the method of FIG. 8), the present disclosure contemplates any suitable method for enabling an end user to request a prognosis prediction (including any suitable steps, which may include all, some, or none of the steps of the method of FIG. 8, where appropriate). Furthermore, although the present disclosure describes and illustrates particular components, devices, or systems carrying out particular steps of the method of FIG. 8, the present disclosure contemplates any suitable combination of any suitable components, devices, or systems carrying out any suitable steps of the method of FIG. 8.
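The request/response flow of method 800 can be sketched in-process as below. Everything here is an assumption made for illustration: the image is a small 2D grid of numbers rather than a whole-slide image, the "remote" system is an ordinary function call rather than a network service, `tile_scorer` is a stand-in for the trained model, and all names and message fields are hypothetical.

```python
from statistics import mean


def split_into_tiles(image, tile_size):
    """Sub-step 810b: cut a 2D grid (list of rows) into square tiles."""
    tiles = []
    for r in range(0, len(image), tile_size):
        for c in range(0, len(image[0]), tile_size):
            tiles.append([row[c:c + tile_size] for row in image[r:r + tile_size]])
    return tiles


def handle_request(request, image_store, tile_scorer, cutoff=0.5):
    """Remote side: sub-steps 810a to 810f for one request message."""
    image = image_store[request["image_id"]]                      # 810a: access
    tiles = split_into_tiles(image, request.get("tile_size", 2))  # 810b: tile
    labels = [tile_scorer(tile) for tile in tiles]                # 810c: label
    fusion_detected = mean(labels) >= cutoff                      # 810d: decide
    prognosis = (                                                 # 810e: prognosis
        "fusion-targeted regimen may be suitable"
        if fusion_detected
        else "fusion-targeted regimen unlikely to be suitable"
    )
    return {"image_id": request["image_id"],                      # 810f: respond
            "fusion_detected": fusion_detected,
            "prognosis": prognosis}


def client_request(image_id, remote):
    """Client side: send the request (step 810), output the result (step 820)."""
    response = remote({"image_id": image_id})
    return response["prognosis"]
```

In a deployment, `remote` would wrap a network call and `tile_scorer` would wrap model inference; the sketch only shows the order in which the sub-steps consume each other's results.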

FIG. 9 shows an exemplary method 900 for predicting alternative treatments based on identifying the absence of a gene fusion for a set of detected cancer cells. The method may begin at step 910, where the digital pathology image processing system 210 shown in FIG. 2 may access a digital pathology image depicting cancer cells in a particular section of a biological sample from a subject. At step 920, the digital pathology image processing system 210 may determine that the digital pathology image includes a depiction of one or more mutations that are mutually exclusive with the occurrence of a gene fusion. At step 930, the digital pathology image processing system 210 may determine that no gene fusion is present with respect to the cancer cells. At step 940, the digital pathology image processing system 210 may generate a prognosis prediction for the subject based on the absence of a gene fusion with respect to the cancer cells, wherein the prognosis prediction includes a prediction of the suitability of one or more treatment regimens for the subject. In some cases, one or more steps of the method of FIG. 9 may be repeated where appropriate. Although the present disclosure describes and illustrates particular steps of the method of FIG. 9 as occurring in a particular order, the present disclosure contemplates any suitable steps of the method of FIG. 9 occurring in any suitable order. In addition, although the present disclosure describes and illustrates an exemplary method for excluding gene fusions (including the particular steps of the method of FIG. 9), the present disclosure contemplates any suitable method for identifying the lack (or absence) of gene fusions (including any suitable steps, which may include all, some, or none of the steps of the method of FIG. 9, where appropriate). In addition, although the present disclosure describes and illustrates particular components, devices, or systems carrying out particular steps of the method of FIG. 9, the present disclosure contemplates any suitable combination of any suitable components, devices, or systems carrying out any suitable steps of the method of FIG. 9.

In some embodiments, identifying tumor heterogeneity includes classifying mutated tumor cells into phenotypes based on their morphological features, and assessing the spatial distribution of the mutated tumor cells within any of the phenotypes. The mutational context, or the way in which the tumor cells mutate, can range from a high-heterogeneity mutational context (e.g., tumor suppressors and/or unknown drivers, which can be prognostic of response to immunotherapy), to a moderate-heterogeneity mutational context (e.g., oncogene mutations, which can be prognostic of response to therapies targeting the oncogene mutation), to a low-heterogeneity (also known as homogeneous or clonal) mutational context (e.g., gene fusions, which can be prognostic of response to a particular type of targeted therapy for the particular gene fusion).
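The correspondence described above between heterogeneity level, mutational context, and the class of therapy for which that context can be prognostic may be sketched as a simple lookup. This is only an illustrative sketch: the names `Heterogeneity`, `CONTEXT_BY_HETEROGENEITY`, and `mutational_context` are hypothetical, and how a heterogeneity level is actually assigned to an image is not addressed here.

```python
from enum import Enum


class Heterogeneity(Enum):
    """Hypothetical labels for the three heterogeneity levels described above."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"


# Each level maps to (mutational context, therapy class the context may be
# prognostic for), following the three examples given in the text.
CONTEXT_BY_HETEROGENEITY = {
    Heterogeneity.HIGH: (
        "tumor suppressor and/or unknown driver", "immunotherapy"),
    Heterogeneity.MODERATE: (
        "oncogene mutation", "targeted therapy for the oncogene mutation"),
    Heterogeneity.LOW: (
        "gene fusion", "targeted therapy for the particular gene fusion"),
}


def mutational_context(level):
    """Return the (context, therapy class) pair for a heterogeneity level."""
    return CONTEXT_BY_HETEROGENEITY[level]


context, therapy = mutational_context(Heterogeneity.LOW)
print(context, "->", therapy)
```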

FIGS. 10A to 10C are schematic diagrams of exemplary representations of mutational contexts (upper windows), shown above exemplary representations of the corresponding visual features (lower windows), such as morphological features of clustered tumor cells captured in a WSI. As illustrated, the representations of the mutational contexts show how heterogeneity increases or decreases, both overall and within single-phenotype clusters of tumor cells, as the tumor cells mutate and proliferate. Different tumor cell phenotypes are represented by circles with different fill patterns.

FIG. 10A is a schematic diagram of an exemplary representation of a mutational context 1010 (tumor suppressors and/or unknown drivers) and its corresponding visual features 1020. A high level of heterogeneity among the tumor cells can be observed in the morphological features. Exemplary visual features that distinguish the mutational context of tumor suppressors and/or unknown drivers from other mutational contexts include a high degree of phenotypic variation among the tumor cells and an uneven spatial distribution of tumor cells across the different phenotypes. As shown by cluster 1010a of tumor cells (indicated by dashed lines), the phenotype represented by the black circles can be, for example and without limitation, tumor cells with a particular range of measured nuclear area and a certain level of angularity of the nuclear border. Region 1020a of visual features 1020 depicts an exemplary representation of the visual features corresponding to this exemplary phenotype (i.e., tumor cells with a particular range of measured nuclear area and a certain level of angularity of the nuclear border).

FIG. 10B is a schematic diagram of an exemplary representation of a mutational context 1030 (oncogene driver mutations, e.g., EGFR mutations) and its corresponding visual features 1040. A moderate level of heterogeneity among the tumor cells can be observed in the morphological features. Exemplary visual features that distinguish the mutational context of oncogene driver mutations from other mutational contexts include clusters of various tumor cell phenotypes and differences in spatial distribution between different clusters of tumor cells, together with a tight, equidistant spatial distribution of the tumor cells within a cluster. As shown by clusters 1030a, 1030b, and 1030c of tumor cells (indicated by dashed lines), the three phenotypes represented by these clusters can be, for example and without limitation, tumor cells with three distinct ranges of the ratio of pixels in the nucleus to pixels in the bounding box within a selected region of the field of view. Regions 1040a, 1040b, and 1040c of visual features 1040 (corresponding to clusters 1030a, 1030b, and 1030c, respectively) depict exemplary representations of the visual features corresponding to those three exemplary phenotypes.

FIG. 10C is a schematic diagram of an exemplary representation of a mutational context 1050 (oncogene fusions, e.g., ALK, NTRK, ROS1, RET) and its corresponding visual features 1060. A high degree of homogeneity among the tumor cells (i.e., low or no heterogeneity, also known as a clonal appearance) can be observed in the morphological features. Exemplary visual features that distinguish the mutational context of oncogene fusions from other mutational contexts include a low degree (or absence) of phenotypic variation among the tumor cells and a tight, equidistant spatial distribution of the tumor cells. As shown by cluster 1050a of tumor cells (indicated by dashed lines), the phenotype represented by the cluster can be, for example and without limitation, tumor cells with particular pixel intensity values. Visual features 1060 depict an exemplary representation of the visual features corresponding to this exemplary phenotype (i.e., a homogeneous cluster of cells of a single phenotype).

FIG. 11 is a flowchart of a non-limiting example method 1100 for identifying tumor heterogeneity. At step 1110, a digital pathology image depicting tumor cells taken from a subject can be accessed, either by retrieval (e.g., from a database or an online repository) or by scanning a WSI using the digital pathology image generation system 220. The digital pathology image may have been previously identified as an image depicting tumor cells (e.g., by a human pathologist or by the digital pathology image processing system 210). At step 1120, tiles can be selected from the digital pathology image. In these tiles, the tumor cells may already have been annotated, the nuclei segmented, and nuclear masks generated. At step 1130, the tumor cells can be classified into phenotype clusters that share nuclear morphological features, based on assessing the nuclear morphological features (e.g., as listed below), with each of the phenotypes corresponding to a different mutation. In some embodiments, other morphological features of the tumor cells, their substructures, and/or stroma or other nearby cells may also be assessed. At step 1140, the tumor cells can be grouped, by phenotype and by location, into clusters representing tumor regions or nests. Grouping tumor cells by phenotype can include assessing the tumor cells based on their morphological features. Grouping tumor cells by location can include generating a minimum spanning tree and using outlier detection to split the tree into subgraphs, by segmenting the tumor nests as depicted or by any other suitable technique. At step 1150, the spatial distances between the cells of each of the clusters can be quantified. The spatial distances can be quantified by computing pairwise adjacency spectral distances for all of the graphs in the WSI. The spatial distances can also be quantified by measuring spatial entropy to identify closely neighboring clonal cells. At step 1160, the likelihood that an actionable mutation is present can be predicted based on the classified tumor cells, the identified clusters, or the quantified spatial distances. At step 1170, based on the tile-level predictions, a prognosis prediction can be generated for the subject, wherein the prognosis prediction includes a prediction of response to one or more treatment regimens for the subject.
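The grouping by location in step 1140 may be sketched as follows, under stated assumptions. The sketch builds a minimum spanning tree over nucleus centroids with Prim's algorithm and then removes tree edges whose length is an outlier, leaving one connected subgraph per candidate tumor nest. The outlier rule used here (edge length greater than the mean plus `k` standard deviations of the tree's edge lengths), the default `k = 2.0`, and the function names are illustrative assumptions and are not taken from the disclosure.

```python
import math
from statistics import mean, pstdev


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges over indices into `points`.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_into_nests(points, k=2.0):
    """Cut outlier-length tree edges; return groups of point indices."""
    edges = minimum_spanning_tree(points)
    lengths = [w for _, _, w in edges]
    if not lengths:
        return [list(range(len(points)))] if points else []
    # Assumed outlier rule: an edge is an outlier if it is longer than
    # mean + k * (population standard deviation) of all tree edges.
    cutoff = mean(lengths) + k * pstdev(lengths)

    root = list(range(len(points)))   # union-find over the kept edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, w in edges:
        if w <= cutoff:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two compact groups of centroids separated by a large gap.
centroids = [(0, 0), (1, 0), (0, 1), (1, 1),
             (50, 50), (51, 50), (50, 51), (51, 51)]
print(split_into_nests(centroids))
```

Each returned group could then be treated as one graph when quantifying spatial distances in step 1150. Prim's algorithm on the complete graph is quadratic in the number of nuclei, so a spatial index or a library implementation would likely be preferable for a full WSI.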

FIGS. 12A to 12C depict non-limiting examples of tumor cell annotation, nucleus segmentation, and nuclear mask generation. As shown in FIG. 12A, the location of each tumor cell nucleus can be annotated (e.g., with a simple dot), after which each nucleus is segmented, as shown in FIG. 12B. As indicated by the reference numerals in FIG. 12B, different nuclei can exhibit different nuclear morphological features. For example, the border of nucleus 1210 is relatively round compared with the ovoid border of nucleus 1220 or the irregular, partly curved and partly angular borders of nuclei 1230 and 1240. In addition, the relative area of nucleus 1220 appears to be about twice that of nucleus 1210 and about three times that of nucleus 1230. Next, as shown in FIG. 12C, a mask can be generated for each nucleus in order to define the boundary within which each nuclear morphological feature is measured.
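As a minimal sketch of how a nuclear mask bounds the measurement of a morphological feature, the following computes three simple quantities from a binary mask: the nuclear area in pixels, the ratio of nucleus pixels to bounding-box pixels, and the centroid. The representation of the mask as a nested list of 0/1 values and the function name `basic_mask_features` are assumptions made for illustration only.

```python
def basic_mask_features(mask):
    """Measure simple features of one nucleus from its binary mask.

    mask: 2D list of 0/1 values, where 1 marks a pixel inside the nucleus.
    """
    pixels = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask contains no nucleus pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    area = len(pixels)
    # Axis-aligned bounding box that encloses every nucleus pixel.
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "area": area,
        "extent": area / bbox_area,   # nucleus pixels / bounding-box pixels
        "centroid": (sum(rows) / area, sum(cols) / area),
    }


# A small plus-shaped "nucleus" inside a 3 x 3 bounding box.
plus_mask = [[0, 1, 0],
             [1, 1, 1],
             [0, 1, 0]]
print(basic_mask_features(plus_mask))
```

Comparing the `area` values of two masks gives the kind of relative-area observation made above for nuclei 1210, 1220, and 1230.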

In some embodiments, tumor heterogeneity is identified by assessing nuclear heterogeneity. In some embodiments, assessing nuclear heterogeneity includes quantifying certain features of the nuclei in order to distinguish mutated cells based on heterogeneity of nuclear morphology. In some embodiments, the digital pathology image processing system 210 can analyze each tumor cell nucleus identified in the WSI using any of several different approaches. For example, in one approach, automated tumor nucleus detection and parameterization can be performed, in which a trained machine learning model is used to identify each tumor cell nucleus, measure a set of specified parameters or features for each nucleus (as described below), and then compare the population-level distributions of those specified parameters or features. In another example, the approach can include performing tumor image segmentation, which can be an image-tile-based assessment. In some cases, the determination of tumor heterogeneity can be made on a per-slide prediction basis (which can include, for example, calculating the percentage of image tiles predicted to be heterogeneous, or the mean of the prediction scores for each slide).
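The two per-slide summaries mentioned above (the percentage of tiles predicted to be heterogeneous and the mean prediction score) may be sketched as follows. The assumption that each tile carries a score in [0, 1], the decision threshold of 0.5, and the function name are illustrative and are not specified by the disclosure.

```python
from statistics import mean


def slide_level_heterogeneity(tile_scores, threshold=0.5):
    """Aggregate tile-level heterogeneity scores into per-slide summaries.

    tile_scores: one predicted score per tile (assumed to lie in [0, 1]).
    threshold:   assumed cut-off at or above which a tile counts as
                 predicted heterogeneous.
    """
    if not tile_scores:
        raise ValueError("at least one tile score is required")
    flagged = sum(1 for score in tile_scores if score >= threshold)
    return {
        "percent_heterogeneous": 100.0 * flagged / len(tile_scores),
        "mean_score": mean(tile_scores),
    }


print(slide_level_heterogeneity([0.9, 0.2, 0.7, 0.4]))
```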

In some cases, tumor heterogeneity can be driven by the type of gene involved in a gene fusion. For tumor suppressor genes, tumorigenesis can be mediated by loss of function. Such mutations release the cell from cell cycle control, which in turn can indirectly promote growth. Over time, this process allows each new generation of daughter cells to accumulate cancer-promoting mutations. Conversely, for oncogenes, tumorigenesis can be mediated by gain of function. Overactivation of a growth factor can, for example, directly promote growth, leading to unrestrained growth. This process is expected to confer an immediate growth advantage that does not require additional mutations. On this rationale, low tumor heterogeneity can be expected in tumors harboring fusions that involve oncogenes such as ALK, ROS1, RET, and NTRK.

One approach to assessing tumor heterogeneity can include assessing the morphology of cell-level structures, such as nuclei, in the tumor cells. The morphology of a nucleus can be represented by a number of image features, which can be organized into categories of image features such as, for example and without limitation, chromatin features, geometric coordinates, basic morphological features, two-dimensional shape features, first-order statistics, gray-level co-occurrence matrix features (where "gray level" refers, e.g., to the spatial distribution of pixel intensity levels), gray-level dependence matrix features, gray-level run-length matrix features, gray-level size-zone matrix features, neighboring gray-tone difference matrix features, advanced nuclear morphology features, and boundary and curvature features.
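To illustrate one of these categories, the following sketch computes a few first-order statistics from the gray levels of the pixels inside a single nuclear mask. It is a simplified illustration under stated assumptions: each distinct integer gray level is treated as its own histogram bin, entropy is taken in bits, and the function name `first_order_features` is hypothetical. A production feature extractor would typically discretize intensities before computing histogram-based features.

```python
import math
from collections import Counter


def first_order_features(intensities):
    """Compute simple first-order statistics of gray levels in one mask.

    intensities: iterable of integer gray levels of the masked pixels.
    """
    values = list(intensities)
    if not values:
        raise ValueError("at least one pixel intensity is required")
    n = len(values)
    energy = sum(v * v for v in values)            # sum of squared values
    # Histogram probabilities, one bin per distinct gray level (assumption).
    probs = [count / n for count in Counter(values).values()]
    return {
        "mean": sum(values) / n,
        "energy": energy,
        "root_mean_squared": math.sqrt(energy / n),
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
    }


print(first_order_features([1, 1, 3, 3]))
```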

Each category may include one or more image features. Exemplary image features may include, but are not limited to:

● Chromatin features, such as:

○ heterogeneity of the nucleus (heterogeneity),

○ size distribution of the granules (clumps),

○ fraction of large granules relative to the total nuclear area (condensation),

○ distribution around the nuclear membrane or margin (margination);

● Geometric coordinates, such as:

○ region coordinates (region_coords),

○ x coordinate (x),

○ y coordinate (y);

● Basic morphological features, such as:

○ area of the nucleus (area),

○ convex area of the nucleus (convex_area),

○ eccentricity of the nucleus (eccentricity),

○ diameter of the nucleus (equivalent_diameter),

○ ratio of pixels in the nucleus to pixels in the bounding box, whether in the total field of view or in a selected region of the field of view (extent),

○ perimeter of the nucleus (perimeter),

○ ratio of pixels in the nucleus to pixels of the convex hull (solidity),

○ elongation of the nucleus as measured by the eigenvalues of the inertia tensor (inertia_tensor_eigvals1, inertia_tensor_eigvals2),

○ length of the major axis of the nucleus (major_axis_length),

○ length of the minor axis of the nucleus (minor_axis_length),

○ Hu moments (certain particular weighted averages of the image pixel intensities, also known as "moments", that are invariant to translation, rotation, and scale) (moments_hu0, moments_hu1, moments_hu2, moments_hu3, moments_hu4, moments_hu5, moments_hu6),

○ weighted Hu moments (weighted_moments_hu0, weighted_moments_hu1, weighted_moments_hu2, weighted_moments_hu3, weighted_moments_hu4, weighted_moments_hu5, weighted_moments_hu6);

● Two-dimensional shape features, such as:

○ two-dimensional shape elongation (original_shape2D_Elongation),

○ two-dimensional shape maximum diameter (original_shape2D_MaximumDiameter),

○ two-dimensional shape mesh surface (original_shape2D_MeshSurface),

○ two-dimensional shape perimeter-to-surface ratio (original_shape2D_PerimeterSurfaceRatio),

○ two-dimensional shape pixel surface (original_shape2D_PixelSurface),

○ two-dimensional shape sphericity (original_shape2D_Sphericity),

○ two-dimensional shape spherical disproportion (original_shape2D_SphericalDisproportion);

● First-order statistics, such as:

○ first-order 10th percentile (original_firstorder_10Percentile),

○ first-order 90th percentile (original_firstorder_90Percentile),

○ first-order energy (original_firstorder_Energy),

○ first-order entropy, which specifies the uncertainty or randomness in the image values (original_firstorder_Entropy),

○ first-order interquartile range, which measures variability based on division into quartiles (original_firstorder_InterquartileRange),

○ first-order kurtosis, which measures the "peakedness" of the distribution of values (original_firstorder_Kurtosis),

○ first-order maximum (original_firstorder_Maximum),

○ first-order mean absolute deviation (original_firstorder_MeanAbsoluteDeviation),

○ first-order mean (original_firstorder_Mean),

○ first-order median (original_firstorder_Median),

○ first-order minimum (original_firstorder_Minimum),

○ first-order range (original_firstorder_Range),

○ first-order robust mean absolute deviation, which is the mean distance of all intensity values from the mean value (original_firstorder_RobustMeanAbsoluteDeviation),

○ first-order root mean squared, which is the square root of the mean of all squared intensity values (original_firstorder_RootMeanSquared),

○ first-order skewness, which measures the asymmetry of the distribution of values about the mean (original_firstorder_Skewness),

○ first-order total energy (original_firstorder_TotalEnergy),

○ first-order uniformity (original_firstorder_Uniformity),

○ first-order variance (original_firstorder_Variance);

● Gray-level co-occurrence matrix (GLCM) features (the GLCM describes the second-order joint probability function of an image region constrained by the mask), such as:

○ GLCM autocorrelation (original_glcm_Autocorrelation),

○ GLCM cluster prominence (original_glcm_ClusterProminence),

○ GLCM cluster shade (original_glcm_ClusterShade),

○ GLCM cluster tendency (original_glcm_ClusterTendency),

○ GLCM contrast (original_glcm_Contrast),

○ GLCM correlation (original_glcm_Correlation),

○ GLCM difference average (original_glcm_DifferenceAverage),

○ GLCM difference entropy (original_glcm_DifferenceEntropy),

○ GLCM difference variance (original_glcm_DifferenceVariance),

○ GLCM inverse difference (original_glcm_Id),

○ GLCM inverse difference moment (original_glcm_Idm),

○ normalized GLCM inverse difference moment (original_glcm_Idmn),

○ normalized GLCM inverse difference (original_glcm_Idn),

○ GLCM informational measure of correlation (original_glcm_Imc1, original_glcm_Imc2),

○ GLCM inverse variance (original_glcm_InverseVariance),

○ GLCM joint average (original_glcm_JointAverage),

○ GLCM joint energy (original_glcm_JointEnergy),

○ GLCM joint entropy (original_glcm_JointEntropy),

○ GLCM maximal correlation coefficient (original_glcm_MCC),

○ GLCM maximum probability (original_glcm_MaximumProbability),

○ GLCM sum average (original_glcm_SumAverage),

○ GLCM sum entropy (original_glcm_SumEntropy),

○ GLCM sum of squares (original_glcm_SumSquares);

● Gray-level dependence matrix (GLDM) features (the GLDM quantifies gray-level dependencies in an image, where a gray-level dependency is defined as the number of connected pixels within a specified distance that are dependent on the center pixel), such as:

○ GLDM dependence entropy (original_gldm_DependenceEntropy),

○ GLDM dependence non-uniformity (original_gldm_DependenceNonUniformity),

○ normalized GLDM dependence non-uniformity (original_gldm_DependenceNonUniformityNormalized),

○ GLDM dependence variance (original_gldm_DependenceVariance),

○ GLDM gray-level non-uniformity (original_gldm_GrayLevelNonUniformity),

○ GLDM gray-level variance (original_gldm_GrayLevelVariance),

○ GLDM high gray-level emphasis (original_gldm_HighGrayLevelEmphasis),

○ GLDM large dependence emphasis (original_gldm_LargeDependenceEmphasis),

○ GLDM large dependence high gray-level emphasis (original_gldm_LargeDependenceHighGrayLevelEmphasis),

○ GLDM large dependence low gray-level emphasis (original_gldm_LargeDependenceLowGrayLevelEmphasis),

○ GLDM low gray-level emphasis (original_gldm_LowGrayLevelEmphasis),

○ GLDM small dependence emphasis (original_gldm_SmallDependenceEmphasis),

○ GLDM small dependence high gray-level emphasis (original_gldm_SmallDependenceHighGrayLevelEmphasis),

○ GLDM small dependence low gray-level emphasis (original_gldm_SmallDependenceLowGrayLevelEmphasis);

● Gray-level run-length matrix (GLRLM) features (the GLRLM quantifies gray-level runs, where a run is defined as the length, in number of pixels, of consecutive pixels that have the same gray-level value), such as:

○ GLRLM gray-level non-uniformity (original_glrlm_GrayLevelNonUniformity),

○ normalized GLRLM gray-level non-uniformity (original_glrlm_GrayLevelNonUniformityNormalized),

○ GLRLM gray-level variance (original_glrlm_GrayLevelVariance),

○ GLRLM high gray-level run emphasis (original_glrlm_HighGrayLevelRunEmphasis),

○ GLRLM long run emphasis (LRE) (original_glrlm_LongRunEmphasis),

○ GLRLM long run high gray-level emphasis (original_glrlm_LongRunHighGrayLevelEmphasis),

○ GLRLM long run low gray-level emphasis (original_glrlm_LongRunLowGrayLevelEmphasis),

○ GLRLM low gray-level run emphasis (original_glrlm_LowGrayLevelRunEmphasis),

○ GLRLM run entropy (original_glrlm_RunEntropy),

○ GLRLM run-length non-uniformity (original_glrlm_RunLengthNonuniformity),

○ normalized GLRLM run-length non-uniformity (original_glrlm_RunLengthNonUniformityNormalized),

○ GLRLM run percentage (original_glrlm_RunPercentage),

○ GLRLM run variance (original_glrlm_RunVariance),

○ GLRLM short run emphasis (original_glrlm_ShortRunEmphasis),

○ GLRLM short run high gray-level emphasis (original_glrlm_ShortRunHighGrayLevelEmphasis),

○ GLRLM short run low gray-level emphasis (original_glrlm_ShortRunLowGrayLevelEmphasis);

● Gray-level size-zone matrix (GLSZM) features (the GLSZM describes the gray-level zones in an image region), such as:

○ GLSZM gray-level non-uniformity (original_glszm_GrayLevelNonUniformity),

○ normalized GLSZM gray-level non-uniformity (original_glszm_GrayLevelNonUniformityNormalized),

○ GLSZM gray-level variance (original_glszm_GrayLevelVariance),

○ GLSZM high gray-level zone emphasis (original_glszm_HighGrayLevelZoneEmphasis),

○ GLSZM large area emphasis (original_glszm_LargeAreaEmphasis),

○ GLSZM large area high gray-level emphasis (original_glszm_LargeAreaHighGrayLevelEmphasis),

○ GLSZM large area low gray-level emphasis (original_glszm_LargeAreaLowGrayLevelEmphasis),

○ GLSZM low gray-level zone emphasis (original_glszm_LowGrayLevelZoneEmphasis),

○ GLSZM size-zone non-uniformity (original_glszm_SizeZoneNonUniformity),

○ normalized GLSZM size-zone non-uniformity (original_glszm_SizeZoneNonUniformityNormalized),

○ GLSZM small area emphasis (original_glszm_SmallAreaEmphasis),

○ GLSZM small area high gray-level emphasis (original_glszm_SmallAreaHighGrayLevelEmphasis),

○ GLSZM small area low gray-level emphasis (original_glszm_SmallAreaLowGrayLevelEmphasis),

○ GLSZM zone entropy (original_glszm_ZoneEntropy),

○ GLSZM zone percentage (original_glszm_ZonePercentage),

○ GLSZM zone variance (original_glszm_ZoneVariance);

● Neighboring gray-tone difference matrix (NGTDM) features (the NGTDM describes the difference between a gray value and the average gray value of its neighbors within a certain distance), such as:

○ NGTDM busyness (original_ngtdm_Busyness),

○ NGTDM coarseness (original_ngtdm_Coarseness),

○ NGTDM complexity (original_ngtdm_Complexity),

○ NGTDM contrast (original_ngtdm_Contrast),

○ NGTDM strength (original_ngtdm_Strength);

● Advanced nuclear morphology features, such as:

○ radius of the elliptical nucleus (ellipse_R_index),

○ major axis of the elliptical nucleus (ellipse_MA_index),

○ convexity perimeter of the nucleus, which measures the perimeter of the curvature (convexity_perimeter),

○ roundness of the nucleus, which measures how circular the nucleus is (roundness),

○ normalized number of connected components remaining when the shape is subtracted from its convex hull (Ncce_index);

● Boundary features (where the boundary feature of a nucleus is the distribution of distances R from all boundary coordinates to the centroid of the nucleus), such as:

○ mean (mean(R)),

○ median (median(R)),

○ mode (mode(R)),

○ maximum value (max_v: max(R)),

○ minimum value (min_v: min(R)),

○ 25th percentile of the boundary feature (percentile_25: 25% percentile(R)),

○ 75th percentile of the boundary feature (percentile_75: 75% percentile(R)),

○ mean of the boundary feature below the 25th percentile (mean_below_percentile_25: mean(R(R<percentile_25))),

○ mean of the boundary feature above the 75th percentile (mean_above_percentile_75: mean(R(R>percentile_75))),

○ sum of the boundary feature distances (sum_dist: sum(R)),

○ harmonic mean of the boundary feature (harmonic_mean: harmonic mean(R)),

○ 3% trimmed mean of the boundary feature (trimmed_mean_3_percent: 3% trimmed mean(R)),

○ 5% trimmed mean of the boundary feature (trimmed_mean_5_percent: 5% trimmed mean(R)),

○ 15% trimmed mean of the boundary feature (trimmed_mean_15_percent: 15% trimmed mean(R)),

○ 25% trimmed mean of the boundary feature (trimmed_mean_25_percent: 25% trimmed mean(R)),

○ standard deviation (std_dev: standard deviation(R)),

○ standard deviation by mean (std_dev_by_mean: s_R/|<R>|),

○ standard deviation by median (std_dev_by_median: s_R/|median(R)|),

○ standard deviation by mode (std_dev_by_mode: s_R/|mode(R)|),

○ skewness (skewness(R)),

○ kurtosis (kurtosis(R)),

○ mean absolute deviation of the distance profile from its mean (mean_dist_profile_minus_mean: mean(|R-<R>|)),

○ range (range_v: range(X)),

○ interquartile range (interquartile_range: interquartile range(X)),

○ sum of the squared distance profile (sum_dist_profile_square: sum(R^2)),

○ sum of the cubed distance profile (sum_dist_profile_cube: sum(R^3)),

○ mean of the squared distance profile (mean_dist_profile_square: mean(R^2)),

○ mean of the cubed distance profile (mean_dist_profile_cube: mean(R^3)),

○ mean of the distance profile raised to the fourth power (mean_dist_profile_raise_to_four: mean(R^4)),

○ mean of the distance profile raised to the fifth power (mean_dist_profile_raise_to_five: mean(R^5)),

○ sum of the distance profile minus its mean, raised to the power of 2 (sum_dist_profile_minus_mean_pow2: sum(|R-<R>|^2)),

○ sum of the distance profile minus its mean, raised to the power of 3 (sum_dist_profile_minus_mean_pow3: sum(|R-<R>|^3)),

○ mean of the distance profile minus its mean, raised to the power of 2 (mean_dist_profile_minus_mean_pow2: mean(|R-<R>|^2)),

○ mean of the distance profile minus its mean, raised to the power of 3 (mean_dist_profile_minus_mean_pow3: mean(|R-<R>|^3)),

○ mean of the distance profile minus its mean, raised to the power of 4 (mean_dist_profile_minus_mean_pow4: mean(|R-<R>|^4)),

○ mean of the distance profile minus its mean, raised to the power of 5 (mean_dist_profile_minus_mean_pow5: mean(|R-<R>|^5)),

○ number of peaks (number_of_peaks),

○ Gini coefficient (Gini coefficient);

● Curvature features, such as:

○ mean curvature (c_mean: mean(k)),

○ median curvature (c_median: median(k)),

○ mode curvature (c_mode: mode(k)),

○ maximum curvature (c_max_v: max(k)),

○ minimum curvature (c_min_v: min(k)),

○ 25th percentile curvature (c_percentile_25: 25% percentile(k)),

○ 75th percentile curvature (c_percentile_75: 75% percentile(k)),

○ mean of the curvature below the 25th percentile (c_mean_below_percentile_25: mean(k(k<c_percentile_25))),

○ mean of the curvature above the 75th percentile (c_mean_above_percentile_75: mean(k(k>c_percentile_75))),

○ sum of the curvature (c_sum_dist: sum(k)),

○ harmonic mean (c_harmonic_mean: harmonic mean(k)),

○ 3% trimmed mean curvature (c_trimmed_mean_3_percent: 3% trimmed mean(k)),

○ 5% trimmed mean curvature (c_trimmed_mean_5_percent: 5% trimmed mean(k)),

○ 15% trimmed mean curvature (c_trimmed_mean_15_percent: 15% trimmed mean(k)),

○ 25% trimmed mean curvature (c_trimmed_mean_25_percent: 25% trimmed mean(k)),

○ standard deviation (c_std_dev: standard deviation(k)),

○ standard deviation by mean (c_std_dev_by_mean: s_k/|<k>|),

○ standard deviation by median (c_std_dev_by_median: s_k/|median(k)|),

○ standard deviation by mode (c_std_dev_by_mode: s_k/|mode(k)|),

○ skewness (c_skewness: skewness(k)),

○ kurtosis (c_kurtosis: kurtosis(k)),

○ mean absolute deviation of the curvature from its mean (c_mean: mean(|k-<k>|)),

○ range of the curvature (c_range_v: range(k)),

○ interquartile range (c_interquartile_range: interquartile range(k)),

○ sum of the squared curvature profile (c_sum_curvature_profile_square: sum(k^2)),

○ sum of the cubed curvature profile (c_sum_curvature_profile_cube: sum(k^3)),

○ mean of the squared curvature profile (c_mean_curvature_profile_square: mean(k^2)),

○ mean of the cubed curvature profile (c_mean_curvature_profile_cube: mean(k^3)),

○ mean of the curvature profile raised to the fourth power (c_mean_curvature_profile_raise_to_four: mean(k^4)),

○ mean of the curvature profile raised to the fifth power (c_mean_curvature_profile_raise_to_five: mean(k^5)),

○ sum of the curvature profile minus its mean, raised to the power of 2 (c_sum_curvature_profile_minus_mean_pow2: sum(|k-<k>|^2)),

○ sum of the curvature profile minus its mean, raised to the power of 3 (c_sum_curvature_profile_minus_mean_pow3: sum(|k-<k>|^3)),

○ mean of the curvature profile minus its mean, raised to the power of 2 (c_mean_curvature_profile_minus_mean_pow2: mean(|k-<k>|^2)),

○ mean of the curvature profile minus its mean, raised to the power of 3 (c_mean_curvature_profile_minus_mean_pow3: mean(|k-<k>|^3)),

○ mean of the curvature profile minus its mean, raised to the power of 4 (c_mean_curvature_profile_minus_mean_pow4: mean(|k-<k>|^4)),

○ mean of the curvature profile minus its mean, raised to the power of 5 (c_mean_curvature_profile_minus_mean_pow5: mean(|k-<k>|^5)),

○ number of peaks (c_number_of_peaks: number of peaks),

○ Gini coefficient (c_gini_coefficient: gini coefficient(k)).
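The boundary features listed above are all summary statistics of one distance profile R per nucleus. The following is a minimal sketch of how a few of them might be computed, not the implementation used in the disclosure. The function names, the use of the mean of the boundary points as the centroid, the population standard deviation for s_R, the "inclusive" percentile convention, and the trimming of the stated percentage from each tail are all assumptions made for illustration.

```python
import math
import statistics


def distance_profile(boundary, centroid=None):
    """Return R: the distance from each boundary coordinate to the centroid."""
    if centroid is None:
        # Assumption: approximate the centroid by the mean of the boundary points.
        cx = sum(x for x, _ in boundary) / len(boundary)
        cy = sum(y for _, y in boundary) / len(boundary)
    else:
        cx, cy = centroid
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def profile_statistics(R):
    """Compute a subset of the listed boundary features from the profile R."""
    mean_r = statistics.fmean(R)
    std_r = statistics.pstdev(R)  # population standard deviation (assumption)
    q25, _, q75 = statistics.quantiles(R, n=4, method="inclusive")
    deviations = [abs(r - mean_r) for r in R]
    return {
        "mean": mean_r,
        "median": statistics.median(R),
        "max_v": max(R),
        "min_v": min(R),
        "percentile_25": q25,
        "percentile_75": q75,
        "interquartile_range": q75 - q25,
        "sum_dist": sum(R),
        "harmonic_mean": statistics.harmonic_mean(R),
        "trimmed_mean_25_percent": trimmed_mean(R, 25),
        "std_dev": std_r,
        "std_dev_by_mean": std_r / abs(mean_r),
        "mean_dist_profile_minus_mean": statistics.fmean(deviations),
        "sum_dist_profile_square": sum(r ** 2 for r in R),
        "mean_dist_profile_minus_mean_pow2": statistics.fmean([d ** 2 for d in deviations]),
    }


# Toy boundary: four points of a diamond centred on the origin.
R = distance_profile([(3, 0), (0, 4), (-3, 0), (0, -4)])
stats = profile_statistics(R)
```

The same `profile_statistics` function could be applied to a curvature profile k to obtain the corresponding `c_`-prefixed features, since the two lists differ only in the input profile.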

One or more statistical metrics can be used to evaluate the image features, and one or more feature selection processes can be used to select the image features associated with an oncogene driver. Non-limiting example statistical metrics are the standard deviation; quadratic entropy, which averages the difference between two randomly drawn samples; the Kolmogorov-Smirnov statistic, which is based on the distance between the normal distribution and the empirical distribution function of the sample; and the percentage of outliers (e.g., the percentage of values lying more than two standard deviations from the mean). In some embodiments, the selected image features can be those having the highest correlation with the oncogene driver among the plurality of image features. The oncogene driver can be a fusion, a mutation, or an unknown driver.
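The metrics named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function names are hypothetical, quadratic entropy is taken here to be the mean absolute difference over all distinct pairs of values, and the Kolmogorov-Smirnov distance is measured against a normal distribution fitted with the sample mean and population standard deviation. The disclosure does not specify these conventions.

```python
import statistics
from itertools import combinations


def quadratic_entropy(values):
    """Mean absolute difference between two distinct samples (assumed form)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    normal = statistics.NormalDist(statistics.fmean(values), statistics.pstdev(values))
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, v in enumerate(ordered):
        cdf = normal.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap


def outlier_percentage(values, n_sigma=2.0):
    """Percentage of values lying more than n_sigma standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    outside = sum(1 for v in values if abs(v - mu) > n_sigma * sigma)
    return 100.0 * outside / len(values)


# Hypothetical per-nucleus values of one image feature within a slide.
feature_values = [1, 2, 2, 3, 2, 2, 1, 3, 2, 20]
summary = {
    "std_dev": statistics.pstdev(feature_values),
    "quadratic_entropy": quadratic_entropy(feature_values),
    "ks_distance": ks_distance_to_normal(feature_values),
    "outlier_percent": outlier_percentage(feature_values),
}
```

Each metric condenses the per-nucleus distribution of a feature into one number per slide, which is the form a downstream feature selection step would need.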

In some embodiments, the exemplary selected nuclear morphology image features may fall into four exemplary categories. The category of shape-related features may include features that target the geometry of the nucleus. By way of example and not limitation, the captured attributes may include the size and shape of individual nuclei and their image moments (weighted averages of the image pixel intensities). Exemplary selected features in this category may include:

● the area of the nucleus (area),

● the ratio of pixels in the nucleus to pixels in the bounding box, whether in the total field of view or in a selected region of the field of view (extent),

● the ratio of pixels in the nucleus to pixels in the convex hull (solidity),

● the Hu moment (a particular weighted average, or "moment", of the image pixel intensities that is invariant to translation, rotation, and scale) (moments_hu0),

● weighted Hu moments (weighted_moments_hu0, weighted_moments_hu1, weighted_moments_hu2),

● two-dimensional shape features, such as the two-dimensional perimeter-to-surface ratio (original_shape2D_PerimeterSurfaceRatio),

● the radius of the elliptical nucleus (ellipse_R_index),

● the major axis of the elliptical nucleus (ellipse_MA_index),

● the convexity perimeter of the nucleus, which measures the curvature of the perimeter (convexity_perimeter), and

● the normalized number of connected components remaining when the shape is subtracted from the convex hull (Ncce_index).
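As a rough illustration of the simplest shape-related features above, the sketch below computes the area, the extent, and the first Hu moment from a binary mask of a single nucleus. The function name and the mask representation (a list of rows of 0/1 values) are assumptions. The first Hu moment is taken as the sum of the normalized central moments η20 + η02 of the binary shape. Solidity is omitted because it additionally requires a convex hull.

```python
def shape_features(mask):
    """Area, extent and first Hu moment of a binary nucleus mask (list of 0/1 rows)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]

    # Extent: nucleus pixels divided by bounding-box pixels.
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

    # Central second-order moments about the centroid of the mask.
    r_bar = sum(rows) / area
    c_bar = sum(cols) / area
    mu20 = sum((r - r_bar) ** 2 for r in rows)
    mu02 = sum((c - c_bar) ** 2 for c in cols)

    # Normalized central moments: eta_pq = mu_pq / mu_00^(1 + (p+q)/2), i.e. area^2 here.
    hu0 = (mu20 + mu02) / area ** 2
    return {"area": area, "extent": area / bbox_area, "moments_hu0": hu0}


# Toy L-shaped "nucleus": 7 foreground pixels inside a 3 x 3 bounding box.
features = shape_features([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
])
```

Because the moment is normalized by the squared area and taken about the centroid, scaling or shifting the mask leaves `moments_hu0` approximately unchanged, which is what makes it usable as a shape descriptor.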

The category of intensity-distribution-related features may include features that capture statistical properties of the distribution of image intensities (pixel values) in the images of individual nuclei. Exemplary selected features in this category may include:

● the first-order 90th percentile (original_firstorder_90Percentile),

● the first-order minimum (original_firstorder_Minimum), and

● the first-order entropy, which specifies the uncertainty or randomness in the image values (original_firstorder_Entropy).
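The three first-order features above depend only on the pixel values inside the nucleus, not on their positions. A minimal sketch follows, assuming a hypothetical function name, a fixed histogram bin width for the entropy, a base-2 logarithm, and a nearest-rank percentile; none of these conventions is stated in the disclosure.

```python
import math
from collections import Counter


def first_order_features(pixel_values, bin_width=1):
    """90th percentile, minimum and histogram entropy of the pixel values of a nucleus."""
    n = len(pixel_values)

    # Entropy of the discretized intensity histogram, in bits.
    counts = Counter(int(v // bin_width) for v in pixel_values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Nearest-rank percentile (assumed convention).
    ordered = sorted(pixel_values)
    rank = max(1, math.ceil(0.9 * n))
    return {"90Percentile": ordered[rank - 1], "Minimum": ordered[0], "Entropy": entropy}


# Four equally frequent intensities give an entropy of exactly 2 bits.
first_order = first_order_features([10, 10, 20, 20, 30, 30, 40, 40])
```

A nucleus with uniform staining would yield an entropy near zero, whereas a nucleus with widely varying intensities would yield a higher value.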

The category of texture-related features may include features that aim to quantify texture by analyzing the spatial relationships between the pixels of a nucleus image and their values (in sub-regions). As described below, the gray-level co-occurrence matrix (GLCM) describes the second-order joint probability function of an image region constrained by a mask. The gray-level dependence matrix (GLDM) quantifies the gray-level dependencies in an image, where a gray-level dependency is defined as the number of connected pixels within a specified distance that depend on the center pixel. The gray-level run-length matrix (GLRLM) quantifies gray-level runs, where a run is defined as the length, in number of pixels, of a sequence of consecutive pixels that have the same gray-level value. Exemplary selected features in this category may include:

● GLCM inverse difference (original_glcm_Id),

● GLCM contrast (original_glcm_contrast),

● GLCM joint entropy (original_glcm_JointEntropy),

● GLCM sum entropy (original_glcm_SumEntropy),

● GLDM gray-level dependence entropy (original_gldm_DependenceEntropy),

● normalized GLDM dependence non-uniformity (DNUN), which measures the similarity of dependence throughout the image and is normalized (original_DependenceNonUniformityNormalized),

● small dependence emphasis (SDE), which measures the distribution of small dependencies (original_gldm_SmallDependenceEmphasis),

● GLRLM long run emphasis (LRE) (original_glrlm_LongRunEmphasis),

● GLRLM long run low gray-level emphasis,

● GLRLM run-length non-uniformity (original_glrm_RunLengthNonuniformity),

● GLRLM run entropy (original_glrm_RunEntropy).
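To make the GLCM-based features above concrete, the following sketch builds a symmetric, normalized co-occurrence matrix for one pixel offset and derives contrast, inverse difference, and joint entropy from it. It is a sketch under stated assumptions rather than the disclosed implementation: the function names are hypothetical, gray levels are assumed to be already discretized to integers 0..levels-1, only one offset is used, and no mask is applied.

```python
import math


def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    for r, row in enumerate(image):
        for c, i in enumerate(row):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(image) and 0 <= cc < len(image[rr]):
                j = image[rr][cc]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, inverse difference and joint entropy of a normalized GLCM p(i, j)."""
    contrast = inverse_difference = joint_entropy = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            contrast += (i - j) ** 2 * pij
            inverse_difference += pij / (1 + abs(i - j))
            if pij > 0:
                joint_entropy -= pij * math.log2(pij)
    return {"Contrast": contrast, "Id": inverse_difference, "JointEntropy": joint_entropy}


# Toy two-level image, horizontal neighbor offset.
p = glcm([[0, 0, 1], [0, 1, 1], [1, 1, 1]], levels=2)
texture = glcm_features(p)
```

Contrast grows when neighboring pixels differ strongly in gray level, while the inverse difference grows when neighbors are similar, so the two features respond in opposite directions to local homogeneity.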

The category of boundary-curvature-related features may include features derived by analyzing the boundary curvature of the nucleus with different statistical methods. Exemplary selected features in this category may include:

● mean curvature (c_mean),

● median curvature (c_median),

● 25th percentile curvature (c_percentile_25),

● 75th percentile curvature (c_percentile_75),

● mean of the curvature above the 75th percentile (c_mean_above_percentile_75),

● 3% trimmed mean curvature (c_trimmed_mean_3_percent),

● 5% trimmed mean curvature (c_trimmed_mean_5_percent),

● 15% trimmed mean curvature (c_trimmed_mean_15_percent),

● 25% trimmed mean curvature (c_trimmed_mean_25_percent),

● interquartile range of the curvature (c_interquartile_range), and

● Gini coefficient of the curvature (c_gini_coefficient).
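The disclosure does not state how the boundary curvature k is estimated. As one possible illustration, the sketch below approximates the curvature at each vertex of a closed boundary polygon by the signed turning angle divided by the mean length of the two adjacent edges, and then computes a Gini coefficient over the absolute curvature values. Both the estimator and the function names are assumptions.

```python
import math
import statistics


def turning_angle_curvature(boundary):
    """Signed curvature estimate at each vertex of a closed polygon (assumed estimator)."""
    n = len(boundary)
    curvature = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = boundary[i - 1], boundary[i], boundary[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        angle = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        step = (math.hypot(ax, ay) + math.hypot(bx, by)) / 2
        curvature.append(angle / step)
    return curvature


def gini_coefficient(values):
    """Gini coefficient of the absolute values; 0 means all values are equal."""
    v = sorted(abs(x) for x in values)
    n = len(v)
    total = sum(v)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(v))
    return 2 * weighted / (n * total) - (n + 1) / n


# A counter-clockwise square: every corner turns by pi/2 over an edge length of 2.
k = turning_angle_curvature([(0, 0), (2, 0), (2, 2), (0, 2)])
c_mean = statistics.fmean(k)
c_gini_coefficient = gini_coefficient(k)
```

A smooth, round nucleus would give nearly constant curvature and a Gini coefficient near zero, whereas a nucleus with a few sharp indentations would concentrate curvature in a few vertices and push the coefficient upward.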

Figures 13A to 13E show non-limiting examples of experimental results in which gene fusions are identified by quantifying certain nuclear morphological features. As shown in Figures 13A to 13E, comparing these specific nuclear morphological features can provide a statistically significant distinction between gene fusions, oncogene driver mutations, and tumor suppressors and/or unknown drivers.

Figures 14A to 14C show non-limiting examples of digital pathology images illustrating four different nuclear morphological features corresponding to tumor suppressors and/or unknown drivers. Figures 14A to 14C show exemplary WSIs for which the nuclear morphological features "area", "first-order entropy", and "run-length non-uniformity" are each highly indicative of the degree of tumor heterogeneity that may be present in the mutational context of a tumor suppressor and/or unknown driver. "Area" may be the surface area of the observed nucleus. "First-order entropy" may be a first-order statistic describing the distribution of pixel intensities (i.e., values) within the image region under consideration; the entropy thus specifies the uncertainty or randomness in the observed pixel values. "Run-length non-uniformity" may be a run-length metric that quantifies the gray-level runs in an image. A gray-level run is defined as the length, in number of pixels, of a sequence of consecutive pixels that have the same gray-level value. In the gray-level run-length matrix p(i,j|θ), the (i,j)th element describes the number of runs of gray level i and length j that occur in the direction specified by θ. The run-length non-uniformity measure uses this gray-level run-length matrix to analyze how the runs are distributed over the possible run lengths.
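The run-length matrix and the non-uniformity measure described above can be sketched as follows for the horizontal direction (θ = 0°). The function names are hypothetical, and the formula used, the sum over run lengths j of the squared number of runs of that length divided by the total number of runs, is the commonly used definition of run-length non-uniformity and is assumed here rather than taken from the disclosure.

```python
from collections import Counter


def run_length_matrix(image):
    """Count horizontal runs: key (gray level i, run length j) -> number of such runs."""
    runs = Counter()
    for row in image:
        start = 0
        for c in range(1, len(row) + 1):
            # A run ends at the end of the row or where the gray level changes.
            if c == len(row) or row[c] != row[start]:
                runs[(row[start], c - start)] += 1
                start = c
    return runs


def run_length_non_uniformity(runs):
    """Sum over run lengths of (runs of that length)^2, divided by the number of runs."""
    n_runs = sum(runs.values())
    per_length = Counter()
    for (_, length), count in runs.items():
        per_length[length] += count
    return sum(v ** 2 for v in per_length.values()) / n_runs


# Toy image with three gray levels.
runs = run_length_matrix([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
])
rln = run_length_non_uniformity(runs)
```

Lower values indicate that the runs are spread more evenly across the possible run lengths, whereas higher values indicate that a few run lengths dominate.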

Figures 15A to 15C show non-limiting examples of digital pathology images illustrating four different nuclear morphological features corresponding to oncogene driver mutations. Figures 15A to 15C show exemplary WSIs for which the nuclear morphological features "area", "first-order entropy", and "run-length non-uniformity" are each highly indicative of the degree of tumor heterogeneity that may be present in the mutational context of an oncogene driver mutation.

Figures 16A to 16C show non-limiting examples of digital pathology images illustrating four different nuclear morphological features corresponding to gene fusions. Figures 16A to 16C show exemplary WSIs for which the nuclear morphological features "area", "first-order entropy", and "run-length non-uniformity" are each highly indicative of the degree of tumor heterogeneity that may be present in the mutational context of a gene fusion.

In some embodiments, identifying tumor heterogeneity includes identifying regions of clonal cells by performing a cell-level spatial analysis to assess spatial distribution. In some embodiments, assessing spatial distribution includes measuring spectral distances within subgraphs of a minimum spanning tree of the tumor cells, where each of the subgraphs represents a cluster of neighboring cells (e.g., a tumor nest), and computing the adjacency spectral distance pairwise across all subgraphs. Figure 17A shows a non-limiting example of identifying the subgraphs of the minimum spanning tree of the tumor cells depicted in a WSI. After a minimum spanning tree connecting all cells in the field of view or WSI has been generated, subgraphs (e.g., corresponding to tumor nests) can be identified. Each subgraph can be defined by performing outlier detection and/or based on a segmentation of the detected tumor nests. Once the adjacency spectral distances have been computed pairwise across all subgraphs, an assessment can be made of the spatial distribution of the tumor cells. More evenly spaced tumor cells can correspond to regions of homogeneous tumor cells (i.e., cells carrying a specific gene fusion). Figure 17B shows a non-limiting example of experimental results in which gene fusions are identified by quantifying the adjacency spectral distance between subgraphs. As shown in Figure 17B, comparing the adjacency spectral distances across subgraphs can provide a statistically significant distinction between gene fusions, oncogene driver mutations, and tumor suppressors and/or unknown drivers.
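The first two steps described above, building a minimum spanning tree over the cell positions and splitting it into subgraphs, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, Prim's algorithm on the complete Euclidean graph is one of several ways to obtain the tree, and "outlier detection" is interpreted here as removing edges longer than the mean edge length plus two standard deviations. The adjacency spectral distance between the resulting subgraphs is not computed in this sketch.

```python
import math
import statistics


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to the tree, parent index)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def split_into_subgraphs(n_points, edges, n_sigma=2.0):
    """Drop outlier-length edges (assumed rule) and return the connected components."""
    lengths = [length for length, _, _ in edges]
    cutoff = statistics.fmean(lengths) + n_sigma * statistics.pstdev(lengths)
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for length, u, v in edges:
        if length <= cutoff:
            parent[find(u)] = find(v)
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two hypothetical tumor nests of four cells each, far apart from one another.
cells = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
tree = minimum_spanning_tree(cells)
subgraphs = split_into_subgraphs(len(cells), tree)
```

In this toy input the single long edge bridging the two nests is removed, leaving two subgraphs of four cells each, which could then be compared pairwise.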

In some embodiments, identifying tumor heterogeneity includes identifying regions of closely adjacent clonal cells by performing a cell-level spatial analysis to assess spatial entropy. Figures 18A to 18C show non-limiting examples of spatial entropy as measured for three different distributions of tumor cell phenotypes. As shown in Figures 18A to 18C, a non-spatial entropy measure (i.e., Shannon entropy) cannot reflect the effect of location on the heterogeneity of the distribution of occurrences, whereas spatial entropy (i.e., Altieri entropy) accurately reflects the differences in heterogeneity between occurrences plotted spatially along the x and y axes in Figures 18A to 18C.

In some embodiments, assessing spatial entropy may include specifying a set of distinct distance bins (each bin representing a range of distances between a pair of tumor cells), identifying all pairs of tumor cells that belong to each of the distance bins, computing for each of the distance bins the frequency with which the paired tumor cells are identified as morphologically similar, and then computing a weighted sum of all the bin frequency values. The weight applied to each distance bin may correspond to the number of tumor cell pairs in that bin. The set of distance bins may be restricted to those bins that represent distance ranges whose maximum distance is below a specified threshold. Figure 18D is a schematic illustration of identifying pairs of tumor cells that co-occur within a specified distance and are classified into a specified phenotype.
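The binning procedure described above can be sketched as follows. This is a simplified illustration, not the Altieri entropy itself and not the disclosed implementation: the function name is hypothetical, "morphologically similar" is reduced to two cells sharing the same phenotype label, and each bin weight is taken as that bin's share of all binned pairs. The last bin edge plays the role of the distance threshold, so pairs farther apart than that are ignored.

```python
import math
from itertools import combinations


def binned_similarity_score(cells, bin_edges):
    """cells: list of ((x, y), phenotype); bin_edges: ascending distance cut points."""
    n_bins = len(bin_edges) - 1
    pair_counts = [0] * n_bins
    similar_counts = [0] * n_bins
    for (pos_a, type_a), (pos_b, type_b) in combinations(cells, 2):
        d = math.dist(pos_a, pos_b)
        for b in range(n_bins):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                pair_counts[b] += 1
                similar_counts[b] += type_a == type_b
                break
    # Per-bin frequency with which the paired cells share a phenotype.
    frequencies = [s / n if n else 0.0 for s, n in zip(similar_counts, pair_counts)]
    # Weight each bin by its share of the binned pairs (assumed normalization).
    total_pairs = sum(pair_counts)
    score = sum(n / total_pairs * f for n, f in zip(pair_counts, frequencies)) if total_pairs else 0.0
    return frequencies, pair_counts, score


# Four hypothetical cells with phenotype labels "A" and "B".
cells = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "B"), ((5, 5), "B")]
frequencies, pair_counts, score = binned_similarity_score(cells, bin_edges=[0, 2, 10])
```

A full spatial entropy would replace the per-bin frequency with an entropy term over the co-occurrence categories in each bin; the pair binning and weighting shown here would remain the same.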

Figures 18E to 18F show non-limiting examples of experimental results in which gene fusions, oncogene driver mutations, and tumor suppressors and/or unknown drivers are identified by measuring spatial entropy. As shown in Figures 18E to 18F, spatial entropy can provide a statistically significant distinction between gene fusions, oncogene driver mutations, and tumor suppressors and/or unknown drivers.

Figure 19 illustrates an exemplary computer system 1900. In particular embodiments, one or more computer systems 1900 perform one or more steps of one or more methods described or illustrated herein. In particular embodiments, one or more computer systems 1900 provide the functionality described or illustrated herein. In particular embodiments, software running on one or more computer systems 1900 performs one or more steps of one or more methods described or illustrated herein, or provides the functionality described or illustrated herein. Particular embodiments include one or more portions of one or more computer systems 1900. Herein, a reference to a computer system may encompass a computing device, and vice versa, where appropriate. Moreover, a reference to a computer system may encompass one or more computer systems, where appropriate.

This disclosure contemplates any suitable number of computer systems 1900. This disclosure contemplates computer system 1900 taking any suitable physical form. By way of example and not limitation, computer system 1900 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile phone, a personal digital assistant (PDA), a server, a tablet computer system, or a combination of two or more of these. Where appropriate, computer system 1900 may include one or more computer systems 1900; may be unitary or distributed; may span multiple locations; may span multiple machines; may span multiple data centers; or may reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 1900 may perform one or more steps of one or more methods described or illustrated herein without substantial spatial or temporal limitation. By way of example and not limitation, one or more computer systems 1900 may perform one or more steps of one or more methods described or illustrated herein in real time or in batch mode. Where appropriate, one or more computer systems 1900 may perform one or more steps of one or more methods described or illustrated herein at different times or at different locations.

In particular embodiments, computer system 1900 includes a processor 1902, memory 1904, storage 1906, an input/output (I/O) interface 1908, a communication interface 1910, and a bus 1912. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.

In particular embodiments, processor 1902 includes hardware for executing instructions, such as those making up a computer program. By way of example and not limitation, to execute instructions, processor 1902 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 1904, or storage 1906; decode and execute them; and then write one or more results to an internal register, an internal cache, memory 1904, or storage 1906. In particular embodiments, processor 1902 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates processor 1902 including any suitable number of any suitable internal caches, where appropriate. By way of example and not limitation, processor 1902 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 1904 or storage 1906, and the instruction caches may speed up retrieval of those instructions by processor 1902. Data in the data caches may be copies of data in memory 1904 or storage 1906 for instructions executing at processor 1902 to operate on; the results of previous instructions executed at processor 1902, for access by subsequent instructions executing at processor 1902 or for writing to memory 1904 or storage 1906; or other suitable data. The data caches may speed up read or write operations by processor 1902. The TLBs may speed up virtual-address translation for processor 1902. In particular embodiments, processor 1902 may include one or more internal registers for data, instructions, or addresses. This disclosure contemplates processor 1902 including any suitable number of any suitable internal registers, where appropriate. Where appropriate, processor 1902 may include one or more arithmetic logic units (ALUs); may be a multi-core processor; or may include one or more processors 1902. Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.

In particular embodiments, memory 1904 includes main memory for storing instructions for processor 1902 to execute or data for processor 1902 to operate on. By way of example and not limitation, computer system 1900 may load instructions from storage 1906 or another source (such as, for example, another computer system 1900) into memory 1904. Processor 1902 may then load the instructions from memory 1904 into an internal register or internal cache. To execute the instructions, processor 1902 may retrieve them from the internal register or internal cache and decode them. During or after execution of the instructions, processor 1902 may write one or more results (which may be intermediate or final results) to the internal register or internal cache. Processor 1902 may then write one or more of those results to memory 1904. In particular embodiments, processor 1902 executes only instructions in one or more internal registers or internal caches or in memory 1904 (as opposed to storage 1906 or elsewhere) and operates only on data in one or more internal registers or internal caches or in memory 1904 (as opposed to storage 1906 or elsewhere). One or more memory buses (each of which may include an address bus and a data bus) may couple processor 1902 to memory 1904. Bus 1912 may include one or more memory buses, as described below. In particular embodiments, one or more memory management units (MMUs) reside between processor 1902 and memory 1904 and facilitate the accesses to memory 1904 requested by processor 1902. In particular embodiments, memory 1904 includes random access memory (RAM). This RAM may be volatile memory, where appropriate. Where appropriate, this RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, this RAM may be single-ported or multi-ported RAM. This disclosure contemplates any suitable RAM. Memory 1904 may include one or more memories 1904, where appropriate. Although this disclosure describes and illustrates particular memory, this disclosure contemplates any suitable memory.

在特定实施例中，存储装置1906包括用于数据或指令的海量存储装置。作为示例而非限制，存储装置1906可包括硬盘驱动器(HDD)、软盘驱动器、闪存存储器、光盘、磁光盘、磁带或通用串行总线(USB)驱动器或其中两个或更多个的组合。在适当的情况下，存储装置1906可包括可移动或不可移动(或固定)介质。在适当的情况下，存储装置1906可在计算机系统1900的内部或外部。在特定实施例中，存储装置1906为非易失性固态存储器。在特定实施例中，存储装置1906包括只读存储器(ROM)。在适当的情况下，该ROM可为掩模编程ROM、可编程ROM(PROM)、可擦除PROM(EPROM)、电可擦除PROM(EEPROM)、电可改写ROM(EAROM)或闪存存储器或者其中两个或更多个的组合。本公开设想了采用任何合适的物理形式的海量存储装置1906。在适当的情况下，存储装置1906可包括一个或多个存储器控制单元，其促进处理器1902和存储装置1906之间的通信。在适当的情况下，存储装置1906可包括一个或多个存储装置1906。尽管本公开描述并示出了特定存储装置，但本公开设想了任何合适的存储装置。In a particular embodiment, the storage device 1906 includes a mass storage device for data or instructions. By way of example and not limitation, the storage device 1906 may include a hard disk drive (HDD), a floppy disk drive, a flash memory, an optical disk, a magneto-optical disk, a tape, or a universal serial bus (USB) drive, or a combination of two or more thereof. Where appropriate, the storage device 1906 may include a removable or non-removable (or fixed) medium. Where appropriate, the storage device 1906 may be inside or outside the computer system 1900. In a particular embodiment, the storage device 1906 is a non-volatile solid-state memory. In a particular embodiment, the storage device 1906 includes a read-only memory (ROM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically rewritable ROM (EAROM), or a flash memory, or a combination of two or more thereof. The present disclosure contemplates a mass storage device 1906 in any suitable physical form. Where appropriate, storage 1906 may include one or more memory control units that facilitate communications between processor 1902 and storage 1906. Where appropriate, storage 1906 may include one or more storage devices 1906. Although this disclosure describes and illustrates particular storage devices, this disclosure contemplates any suitable storage devices.

在特定实施例中，I/O接口1908包括硬件、软件或两者，其提供用于在计算机系统1900与一个或多个I/O设备之间进行通信的一个或多个接口。在适当的情况下，计算机系统1900可包括这些I/O设备中的一者或多者。这些I/O设备中的一个或多个可实现人与计算机系统1900之间的通信。作为示例而非限制，I/O设备可包括键盘、小键盘、麦克风、监视器、鼠标、打印机、扫描仪、扬声器、静止相机、触控笔、平板计算机、触摸屏、轨迹球、摄像机、另一合适的I/O设备或其中两个或更多个的组合。I/O设备可包括一个或多个传感器。本公开设想了任何合适的I/O设备以及针对它们的任何合适的I/O接口1908。在适当的情况下，I/O接口1908可包括一个或多个设备或软件驱动器，使得处理器1902能够驱动这些I/O设备中的一者或多者。在适当的情况下，I/O接口1908可包括一个或多个I/O接口1908。尽管本公开描述并示出特定的I/O接口，但本公开涵盖任何合适的I/O接口。In a particular embodiment, the I/O interface 1908 includes hardware, software, or both, providing one or more interfaces for communication between the computer system 1900 and one or more I/O devices. Where appropriate, the computer system 1900 may include one or more of these I/O devices. One or more of these I/O devices may enable communication between a person and the computer system 1900. As an example and not limitation, an I/O device may include a keyboard, a keypad, a microphone, a monitor, a mouse, a printer, a scanner, a speaker, a still camera, a stylus, a tablet computer, a touch screen, a trackball, a video camera, another suitable I/O device, or a combination of two or more thereof. An I/O device may include one or more sensors. The present disclosure contemplates any suitable I/O devices and any suitable I/O interfaces 1908 for them. Where appropriate, the I/O interface 1908 may include one or more device or software drivers enabling the processor 1902 to drive one or more of these I/O devices. Where appropriate, the I/O interface 1908 may include one or more I/O interfaces 1908. Although this disclosure describes and illustrates a particular I/O interface, this disclosure contemplates any suitable I/O interface.

在特定实施例中，通信接口1910包括硬件、软件或两者，其提供用于在计算机系统1900与一个或多个其他计算机系统1900或一个或多个网络之间的通信(诸如例如基于分组的通信)的一个或多个接口。作为示例而非限制，通信接口1910可包括用于与以太网或其他基于导线的网络进行通信的网络接口控制器(NIC)或网络适配器，或者用于与无线网络(诸如WI-FI网络)进行通信的无线NIC(WNIC)或无线适配器。本公开设想了任何合适的网络和针对它的任何合适的通信接口1910。作为示例而非限制，计算机系统1900可与自组织网络、个人局域网(PAN)、局域网(LAN)、广域网(WAN)、城域网(MAN)或者因特网的一个或多个部分或其中两个或更多个的组合进行通信。这些网络中的一个或多个的一个或多个部分可以是有线或无线的。作为示例，计算机系统1900可与无线PAN(WPAN)(诸如例如BLUETOOTHWPAN)、WI-FI网络、WI-MAX网络、蜂窝电话网络(诸如例如全球移动通信系统(GSM)网络)或其他合适的无线网络或其中两个或更多个的组合。在适当的情况下，计算机系统1900可包括用于这些网络中的任一个的任何合适的通信接口1910。在适当的情况下，通信接口1910可包括一个或多个通信接口1910。尽管本公开描述并示出特定的通信接口，但本公开涵盖任何合适的通信接口。In a particular embodiment, the communication interface 1910 includes hardware, software, or both that provides one or more interfaces for communication (such as, for example, packet-based communication) between the computer system 1900 and one or more other computer systems 1900 or one or more networks. By way of example and not limitation, the communication interface 1910 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network, or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network (such as a WI-FI network). The present disclosure contemplates any suitable network and any suitable communication interface 1910 for it. By way of example and not limitation, the computer system 1900 may communicate with one or more parts of an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or the Internet, or a combination of two or more thereof. One or more parts of one or more of these networks may be wired or wireless. 
As an example, the computer system 1900 may communicate with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or another suitable wireless network, or a combination of two or more thereof. Where appropriate, the computer system 1900 may include any suitable communication interface 1910 for any of these networks. Where appropriate, the communication interface 1910 may include one or more communication interfaces 1910. Although the present disclosure describes and illustrates particular communication interfaces, the present disclosure encompasses any suitable communication interface.

在特定实施例中，总线1912包括硬件、软件或计算机系统1900的两个相互耦合的部件。作为示例而非限制，总线1912可包括加速图形端口(AGP)或其他图形总线、增强型工业标准结构(EISA)总线、前端总线(FSB)、超传输(HT)互连、工业标准架构(ISA)总线、INFINIBAND互连、低引脚数(LPC)总线、存储器总线、微通道架构(MCA)总线、外围部件互连(PCI)总线、PCI-Express(PCIe)总线，串行高级技术附件(SATA)总线、视频电子设备标准协会本地(VLB)总线或另一种合适的总线或其中两个或更多个的组合。在适当的情况下，总线1912可包括一个或多个总线1912。尽管本公开描述并示出了特定总线，但本公开设想了任何合适的总线。In particular embodiments, bus 1912 includes hardware, software, or both, coupling components of computer system 1900 to each other. By way of example and not limitation, bus 1912 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus, or a combination of two or more thereof. Where appropriate, bus 1912 may include one or more buses 1912. Although this disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus.

在本文，一种或多种计算机可读的非暂时性存储介质可包括一个或多个基于半导体或其他集成电路(IC)(诸如例如现场可编程门阵列(FPGA)或专用IC(ASIC))、硬盘驱动器(HDD)、混合硬盘驱动器(HHD)、光盘、光盘驱动器(ODD)、磁光盘、磁光盘驱动器、软盘、软盘驱动器(FDD)、磁带、固态驱动器(SSD)、RAM驱动器、SECURE DIGITAL卡或驱动器、任何其他合适的计算机可读的非暂时性存储介质或其中两个或更多个的任何合适组合。在适当的情况下，计算机可读的非暂时性存储介质可为易失性存储介质、非易失性存储介质或易失性存储介质和非易失性存储介质的组合。Herein, one or more computer-readable non-transitory storage media may include one or more semiconductor-based or other integrated circuits (ICs) (such as, for example, field programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard disk drives (HHDs), optical disks, optical disk drives (ODDs), magneto-optical disks, magneto-optical disk drives, floppy disks, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more thereof. Where appropriate, the computer-readable non-transitory storage medium may be a volatile storage medium, a non-volatile storage medium, or a combination of a volatile storage medium and a non-volatile storage medium.

在本文，“或”是包含性的而非排他性的，除非另有明确说明或上下文另有说明。因此，在本文，“A或B”指“A、B或两者”，除非另有明确说明或上下文另有说明。此外，在本文，“和”既是共同的又是各自的，除非另有明确说明或上下文另有说明。因此，在本文，“A和B”是指“A和B，共同地或各自地”，除非另有明确说明或上下文另有说明。As used herein, "or" is inclusive and not exclusive, unless expressly stated otherwise or the context indicates otherwise. Thus, as used herein, "A or B" means "A, B, or both," unless expressly stated otherwise or the context indicates otherwise. Furthermore, as used herein, "and" means both jointly and severally, unless expressly stated otherwise or the context indicates otherwise. Thus, as used herein, "A and B" means "A and B, jointly or severally," unless expressly stated otherwise or the context indicates otherwise.

本公开的范围涵盖本领域普通技术人员将理解的对在本文描述或示出的示例性实施例的所有改变、替换、变化、变更和修改。本公开的范围不限于在本文描述或示出的示例性实施例。此外，尽管本公开将本文中的相应实施例描述和示出为包括特定的部件、元件、特征、功能、操作或步骤，但这些实施例中的任一个可包括本领域普通技术人员将理解的在本文任何地方描述或示出的任何部件、元件、特征、功能、操作或步骤的任何组合或排列。此外，在所附权利要求书中，对装置或系统或装置或系统的部件适配为、布置为、能够、配置为、使能够、可操作为或操作为执行特定功能的引用，涵盖该装置、系统、部件，无论其或该特定功能是否被激活、开启或解锁，只要该装置、系统或部件是如此适应、布置、能够、配置、使能、可操作或操作即可。另外，尽管本公开将特定实施例描述或示出为提供特定优点，但特定实施例可不提供这些优点、某些优点或全部优点。The scope of the present disclosure covers all changes, substitutions, variations, alterations and modifications to the exemplary embodiments described or shown herein that will be understood by those of ordinary skill in the art. The scope of the present disclosure is not limited to the exemplary embodiments described or shown herein. In addition, although the present disclosure describes and illustrates the corresponding embodiments herein as including specific parts, elements, features, functions, operations or steps, any of these embodiments may include any combination or arrangement of any parts, elements, features, functions, operations or steps described or shown anywhere herein that will be understood by those of ordinary skill in the art. In addition, in the appended claims, references to devices or systems or parts of devices or systems adapted to, arranged to, capable of, configured to, enabled to, operable to or operated to perform specific functions, cover the device, system, component, whether or not it or the specific function is activated, turned on or unlocked, as long as the device, system or component is so adapted, arranged, capable of, configured to, enabled to, operable or operated. In addition, although the present disclosure describes or illustrates specific embodiments as providing specific advantages, specific embodiments may not provide these advantages, some advantages or all advantages.

实施方案Embodiments

1.一种方法，其包括由数字病理学图像处理系统：1. A method comprising, by a digital pathology image processing system:

访问描绘从受试者采样的肿瘤细胞的数字病理学图像；accessing a digital pathology image depicting tumor cells sampled from a subject;

从所述数字病理学图像选择多个图块，其中所述图块中的每一者描绘肿瘤细胞；selecting a plurality of tiles from the digital pathology image, wherein each of the tiles depicts a tumor cell;

针对所述图块中的每一者生成突变预测，其中所述突变预测表示对可操作突变出现在所述图块中的可能性的预测；以及generating a mutation prediction for each of the tiles, wherein the mutation prediction represents a prediction of the likelihood that an actionable mutation will occur in the tile; and

基于多个突变预测来生成与针对所述受试者的一个或多个治疗方案相关的预后预测。generating, based on the plurality of mutation predictions, a prognostic prediction associated with one or more treatment regimens for the subject.

2.根据权利要求1所述的方法，其中生成所述突变预测包括：2. The method of claim 1, wherein generating the mutation prediction comprises:

从所述多个图块中的每一者检测一个或多个特征，其中所述一个或多个特征包括临床特征或组织学特征中的一者或多者，并且其中针对所述多个图块中的每一者生成标记是基于所述一个或多个特征的。One or more features are detected from each of the plurality of tiles, wherein the one or more features include one or more of clinical features or histological features, and wherein generating a label for each of the plurality of tiles is based on the one or more features.

3.根据权利要求2或3所述的方法，其中生成所述突变预测是基于肿瘤形态学的，其中所述肿瘤形态学基于对以下中的一者或多者的分析：印戒细胞的存在、印戒细胞的数量、肝样细胞的存在、肝样细胞的数量、细胞外粘蛋白或肿瘤生长模式。3. The method of claim 2 or 3, wherein generating the mutation prediction is based on tumor morphology, wherein the tumor morphology is based on analysis of one or more of: the presence of signet ring cells, the number of signet ring cells, the presence of hepatoid cells, the number of hepatoid cells, extracellular mucin, or tumor growth pattern.

4.根据权利要求1至3中任一项所述的方法，其中生成所述突变预测是基于一个或多个机器学习模型的，其中所述方法进一步包括：基于多个训练数据来训练所述一个或多个机器学习模型，所述多个训练数据包括对肿瘤细胞的一个或多个带标记的描绘以及对其他组织学或临床特征的一个或多个带标记的描绘。4. The method of any one of claims 1 to 3, wherein generating the mutation prediction is based on one or more machine learning models, wherein the method further comprises: training the one or more machine learning models based on a plurality of training data, the plurality of training data comprising one or more labeled depictions of tumor cells and one or more labeled depictions of other histological or clinical features.

5.根据权利要求1至4中任一项所述的方法，其中生成所述预后预测基于针对来自一个或多个额外数字病理学图像的图块生成突变预测，所述一个或多个额外数字病理学图像中的每一者描绘来自所述受试者的生物学样品中的额外特定样品，并且其中分析包括：5. The method of any one of claims 1 to 4, wherein generating the prognostic prediction is based on generating mutation predictions for tiles from one or more additional digital pathology images, each of the one or more additional digital pathology images depicting an additional specific sample from the biological samples of the subject, and wherein analyzing comprises:

针对来自所述一个或多个额外数字病理学图像的所述图块中的每一者生成突变预测；以及generating a mutation prediction for each of the tiles from the one or more additional digital pathology images; and

基于所有所述突变预测来针对所述受试者生成组合的预后预测。A combined prognostic prediction is generated for the subject based on all of the mutation predictions.

6.根据权利要求1至5中任一项所述的方法，其进一步包括：6. The method according to any one of claims 1 to 5, further comprising:

经由图形用户界面来输出所述预后预测，其中所述图形用户界面包括所述数字病理学图像的图形表示，并且其中所述图形表示包括：对针对所述多个图块中的每一者所生成的所述突变预测的指示，以及与所述预后预测相关联的经预测的置信水平。The prognostic prediction is outputted via a graphical user interface, wherein the graphical user interface includes a graphical representation of the digital pathology image, and wherein the graphical representation includes an indication of the mutation prediction generated for each of the plurality of tiles and a predicted confidence level associated with the prognostic prediction.

7.根据权利要求1至6中任一项所述的方法，其进一步包括：7. The method according to any one of claims 1 to 6, further comprising:

生成与所述一个或多个治疗方案的使用相关联的建议。A recommendation associated with use of the one or more treatment regimens is generated.

8.根据权利要求1至7中任一项所述的方法，其中生物学样品的特定切片经一种或多种染色剂染色。8. The method according to any one of claims 1 to 7, wherein specific sections of the biological sample are stained with one or more stains.

9.根据权利要求1至8中任一项所述的方法，其中生成所述预后预测进一步基于针对所述图块生成的所述突变预测的加权组合。9. The method of any one of claims 1 to 8, wherein generating the prognostic prediction is further based on a weighted combination of the mutation predictions generated for the tiles.
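
The weighted combination recited in embodiment 9 can be sketched as a normalized weighted mean of the per-tile mutation likelihoods. This is a minimal sketch under stated assumptions: the function name `combine_tile_predictions`, the use of a normalized mean, and the choice of weights (for example, the tumor-cell count per tile) are hypothetical and are not specified by the disclosure.

```python
from typing import Sequence


def combine_tile_predictions(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted mean of per-tile mutation likelihoods.

    probs   -- one predicted likelihood per tile, each in [0, 1]
    weights -- one non-negative weight per tile (hypothetical choice,
               e.g. the number of tumor cells depicted in the tile)
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the weight total keeps the result in [0, 1].
    return sum(p * w for p, w in zip(probs, weights)) / total


if __name__ == "__main__":
    # Two tiles; the second carries three times the weight of the first.
    print(combine_tile_predictions([0.2, 0.8], [1.0, 3.0]))
```

With equal weights the function reduces to a plain average, so any other weighting scheme can be compared against that baseline.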

10.根据权利要求1至9中任一项所述的方法，其中针对描绘肿瘤细胞的图块来生成突变预测包括：10. The method of any one of claims 1 to 9, wherein generating mutation predictions for tiles depicting tumor cells comprises:

将所述肿瘤细胞分类到表型，所述表型中的每一者对应于不同的突变类别。The tumor cells are classified into phenotypes, each of which corresponds to a different mutation class.

11.根据权利要求10所述的方法，其中将所述肿瘤细胞分类到表型包括：11. The method of claim 10, wherein classifying the tumor cells into phenotypes comprises:

识别所述图块中的核异质性；以及identifying nuclear heterogeneity in the tile; and

量化经识别的核异质性，其中生成所述突变预测进一步基于经量化的核异质性。The identified nuclear heterogeneity is quantified, wherein generating the mutation prediction is further based on the quantified nuclear heterogeneity.

12.根据权利要求10或11所述的方法，其中生成所述突变预测包括：12. The method of claim 10 or 11, wherein generating the mutation prediction comprises:

进行细胞级空间分析以评定空间分布，其中所述空间分布指示克隆细胞的区域以及克隆细胞的所述区域中的每一者内的细胞的空间排列。Cell-level spatial analysis is performed to assess spatial distribution, wherein the spatial distribution indicates regions of clonal cells and the spatial arrangement of cells within each of the regions of clonal cells.

13.根据权利要求12所述的方法，其中评定空间分布包括：13. The method of claim 12, wherein assessing the spatial distribution comprises:

测量所述肿瘤细胞的最小生成树的子图内的光谱距离，其中所述子图中的每一者表示肿瘤巢；measuring spectral distances within subgraphs of minimum spanning trees of the tumor cells, wherein each of the subgraphs represents a tumor nest;

跨所有所述子图成对地计算邻接光谱距离。Adjacency spectral distances are calculated pairwise across all of the subgraphs.

14.根据权利要求13所述的方法，其中通过进行异常值检测来定义所述子图中的每一者。14. The method of claim 13, wherein each of the subgraphs is defined by performing outlier detection.

15.根据权利要求13或14所述的方法，其中基于对检测到的肿瘤巢的分割来定义所述子图中的每一者。15. The method of claim 13 or 14, wherein each of the sub-graphs is defined based on segmentation of detected tumor nests.
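
Embodiments 13 and 14 can be illustrated with a small NumPy sketch: build a minimum spanning tree over tumor-cell centroids, cut outlier-long edges so that each remaining connected component stands in for a tumor nest, and compare the nests pairwise by the distance between their adjacency spectra. Several details here are assumptions rather than the disclosed method: the outlier rule (edge length above mean plus `k` standard deviations), zero-padding spectra of different sizes, and all function names.

```python
import numpy as np


def mst_edges(pts):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length)."""
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()             # cheapest link from each node to the tree
    parent = np.zeros(n, dtype=int)   # tree node realizing that cheapest link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


def split_into_nests(n, edges, k=1.5):
    """Drop outlier-long MST edges; return components and the kept edges."""
    if not edges:
        return [list(range(n))], []
    lengths = np.array([w for _, _, w in edges])
    cutoff = lengths.mean() + k * lengths.std()   # assumed outlier rule
    kept = [(i, j) for i, j, w in edges if w <= cutoff]
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j in kept:
        root[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values()), kept


def adjacency_spectrum(nodes, kept):
    """Eigenvalues of the 0/1 adjacency matrix of one subgraph."""
    idx = {v: a for a, v in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for i, j in kept:
        if i in idx and j in idx:
            adj[idx[i], idx[j]] = adj[idx[j], idx[i]] = 1.0
    return np.linalg.eigvalsh(adj)


def nest_spectral_distances(points, k=1.5):
    """Return (nests, matrix of pairwise adjacency spectral distances)."""
    pts = np.asarray(points, dtype=float)
    nests, kept = split_into_nests(len(pts), mst_edges(pts), k)
    spectra = [adjacency_spectrum(nodes, kept) for nodes in nests]
    width = max(len(s) for s in spectra)
    # Zero-pad shorter spectra (an assumption) and sort so entries align.
    padded = np.array([np.sort(np.pad(s, (0, width - len(s)))) for s in spectra])
    return nests, np.linalg.norm(padded[:, None, :] - padded[None, :, :], axis=-1)
```

Embodiment 15 describes an alternative in which the subgraphs come from a segmentation of detected tumor nests; in that case the component labels from the segmentation would replace `split_into_nests`.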

16.根据权利要求10至15中任一项所述的方法，其中生成所述突变预测包括：16. The method of any one of claims 10 to 15, wherein generating the mutation prediction comprises:

通过进行细胞级空间分析以评定空间熵来识别所述图块中紧密相邻的克隆细胞的区域。identifying regions of closely adjacent clonal cells in the tile by performing cell-level spatial analysis to assess spatial entropy.

17.根据权利要求16所述的方法，其中评定空间熵包括：17. The method of claim 16, wherein assessing spatial entropy comprises:

指明一组有区别的距离箱，其中所述距离箱中的每一者对应于成对的肿瘤细胞之间的距离范围；specifying a set of distinct distance bins, wherein each of the distance bins corresponds to a range of distances between pairs of tumor cells;

对于所述距离箱中的每一者，识别成对的所述肿瘤细胞，其中所述对中的每一者中的所述肿瘤细胞之间的距离落入对应于所述距离箱的所述距离范围内；for each of the distance bins, identifying pairs of the tumor cells, wherein a distance between the tumor cells in each of the pairs falls within the distance range corresponding to the distance bin;

对于所述距离箱中的每一者，计算成对的肿瘤细胞被识别为形态上相似的频率；for each of the distance bins, calculating the frequency with which pairs of tumor cells are identified as morphologically similar;

对于被分类到表型的肿瘤细胞的每个可能对，将所述成对的肿瘤细胞分类到预定义数量的箱中的一者中，所述箱中的每一者表示所述对中的所述细胞中的每一者的空间位置之间的距离；for each possible pair of tumor cells classified into a phenotype, classifying the paired tumor cells into one of a predefined number of bins, each of the bins representing a distance between the spatial locations of each of the cells in the pair;

对于所述箱中的每一者，计算所述箱中的所述成对的肿瘤细胞的分类的频率。For each of the bins, the frequency of classification of the pairs of tumor cells in the bin is calculated.

18.根据权利要求17所述的方法，其中应用于每个距离箱的权重可对应于所述距离箱中的成对的肿瘤细胞的数量。18. The method of claim 17, wherein the weight applied to each distance bin corresponds to the number of pairs of tumor cells in the distance bin.

19.根据权利要求17或18所述的方法，其中所述一组距离箱可仅限于表示以下距离范围的那些箱：所述距离范围带有低于指定阈值的最大距离。19. A method according to claim 17 or 18, wherein the set of distance bins is restricted to only those bins representing distance ranges with a maximum distance below a specified threshold.
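
The distance-bin procedure of embodiments 17 to 19 can be sketched as follows. Each pair of phenotype-labeled tumor cells is assigned to a distance bin, the frequency of each phenotype pairing is counted per bin, and a Shannon entropy is computed per bin and averaged with weights equal to the number of pairs in the bin. The Shannon form of the per-bin entropy, the half-open bins, and the names `spatial_entropy`, `bin_edges`, and `max_distance` are assumptions made for illustration; the embodiments do not state the entropy formula.

```python
import math
from collections import Counter
from itertools import combinations


def spatial_entropy(cells, bin_edges, max_distance=None):
    """Pair-count-weighted co-occurrence entropy over distance bins.

    cells        -- iterable of (x, y, phenotype) tuples, one per tumor cell
    bin_edges    -- ascending distances; bin k covers [edge_k, edge_{k+1})
    max_distance -- if given, keep only bins whose upper edge is below it
    Returns (weighted_entropy, [(pair_count, entropy) per kept bin]).
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    if max_distance is not None:
        bins = [(lo, hi) for lo, hi in bins if hi < max_distance]

    # One counter of unordered phenotype pairings per distance bin.
    counts = [Counter() for _ in bins]
    for (x1, y1, p1), (x2, y2, p2) in combinations(list(cells), 2):
        d = math.hypot(x1 - x2, y1 - y2)
        for k, (lo, hi) in enumerate(bins):
            if lo <= d < hi:
                counts[k][tuple(sorted((p1, p2)))] += 1
                break

    per_bin = []
    for c in counts:
        n = sum(c.values())
        h = -sum((v / n) * math.log(v / n) for v in c.values()) if n else 0.0
        per_bin.append((n, h))

    total = sum(n for n, _ in per_bin)
    weighted = sum(n * h for n, h in per_bin) / total if total else 0.0
    return weighted, per_bin
```

A bin in which every pair shares the same phenotype pairing has zero entropy, which is the behavior one would expect for a region of closely adjacent clonal cells; mixed pairings raise the value.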

20.根据权利要求1至19中任一项所述的方法，其中所述生成所述预后预测包括：20. The method of any one of claims 1 to 19, wherein said generating said prognostic prediction comprises:

确定所述数字病理学图像的突变背景为未知驱动因子或肿瘤抑制因子，其中成簇的肿瘤细胞的异质性水平高，并且其中所述预后预测与包括免疫疗法的治疗方案相关。The mutational background of the digital pathology image is determined to be an unknown driver or tumor suppressor, wherein the heterogeneity level of the clustered tumor cells is high, and wherein the prognostic prediction is associated with a treatment regimen including immunotherapy.

21.根据权利要求1至20中任一项所述的方法，其中所述生成所述预后预测包括：21. The method of any one of claims 1 to 20, wherein said generating said prognostic prediction comprises:

确定所述数字病理学图像的突变背景为致癌基因驱动因子突变，其中成簇的肿瘤细胞的异质性水平为中等，并且其中所述预后预测与包括对应于突变的靶向疗法的治疗方案相关。The mutational context of the digital pathology image is determined to be oncogenic driver mutations, wherein the level of heterogeneity of the clustered tumor cells is intermediate, and wherein the prognostic prediction is associated with a treatment regimen that includes a targeted therapy corresponding to the mutation.

22.根据权利要求1至21中任一项所述的方法，其中：22. The method according to any one of claims 1 to 21, wherein:

如果可操作突变出现在所述图块中的至少一者中，则所述一个或多个治疗方案包括与所述可操作突变相关联的靶向疗法；If an actionable mutation is present in at least one of the tiles, the one or more treatment regimens include a targeted therapy associated with the actionable mutation;

否则所述一个或多个治疗方案包括免疫疗法。Otherwise the one or more treatment regimens comprise immunotherapy.
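
The selection rule of embodiment 22 reduces to a simple conditional, sketched below. The decision threshold used to call a tile positive for an actionable mutation (`threshold=0.5`) and the function name `select_regimen` are hypothetical; the embodiment states only that targeted therapy is included when an actionable mutation occurs in at least one tile and that immunotherapy is included otherwise.

```python
from typing import Iterable


def select_regimen(tile_probs: Iterable[float], threshold: float = 0.5) -> str:
    """Pick a regimen category from per-tile actionable-mutation likelihoods.

    A tile counts as positive when its likelihood reaches `threshold`
    (an assumed cut-off, not a value taken from the disclosure).
    """
    if any(p >= threshold for p in tile_probs):
        return "targeted therapy"
    return "immunotherapy"


if __name__ == "__main__":
    print(select_regimen([0.1, 0.7, 0.2]))
    print(select_regimen([0.1, 0.3, 0.2]))
```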

23.一种或多种计算机可读非暂时性存储介质，其体现软件，所述软件当被执行时可操作以：23. One or more computer-readable non-transitory storage media embodying software that, when executed, is operable to:

24.根据权利要求23所述的计算机可读非暂时性存储介质，其中生成所述突变预测包括：24. The computer-readable non-transitory storage medium of claim 23, wherein generating the mutation prediction comprises:

25.根据权利要求23或24所述的计算机可读非暂时性存储介质，其中生成所述突变预测是基于肿瘤形态学的，其中所述肿瘤形态学基于对以下中的一者或多者的分析：印戒细胞的存在、印戒细胞的数量、肝样细胞的存在、肝样细胞的数量、细胞外粘蛋白或肿瘤生长模式。25. The computer-readable non-transitory storage medium of claim 23 or 24, wherein generating the mutation prediction is based on tumor morphology, wherein the tumor morphology is based on analysis of one or more of: the presence of signet ring cells, the number of signet ring cells, the presence of hepatoid cells, the number of hepatoid cells, extracellular mucin, or tumor growth pattern.

26.根据权利要求23至25中任一项所述的计算机可读非暂时性存储介质，其中生成所述突变预测是基于一个或多个机器学习模型的，其中所述计算机可读非暂时性存储介质进一步包括：基于多个训练数据来训练所述一个或多个机器学习模型，所述多个训练数据包括对肿瘤细胞的一个或多个带标记的描绘以及对其他组织学或临床特征的一个或多个带标记的描绘。26. A computer-readable, non-transitory storage medium according to any one of claims 23 to 25, wherein generating the mutation prediction is based on one or more machine learning models, wherein the computer-readable, non-transitory storage medium further comprises: training the one or more machine learning models based on a plurality of training data, the plurality of training data comprising one or more labeled depictions of tumor cells and one or more labeled depictions of other histological or clinical features.

27.根据权利要求23至26中任一项所述的计算机可读非暂时性存储介质，其中生成所述预后预测基于针对来自一个或多个额外数字病理学图像的图块生成突变预测，所述一个或多个额外数字病理学图像中的每一者描绘来自所述受试者的生物学样品中的额外特定样品，并且其中分析包括：27. The computer-readable non-transitory storage medium of any one of claims 23 to 26, wherein generating the prognostic prediction is based on generating mutation predictions for tiles from one or more additional digital pathology images, each of the one or more additional digital pathology images depicting an additional specific sample from the biological samples of the subject, and wherein analyzing comprises:

28.根据权利要求23至27中任一项所述的计算机可读非暂时性存储介质，其进一步包括：28. The computer-readable non-transitory storage medium according to any one of claims 23 to 27, further comprising:

29.根据权利要求23至28中任一项所述的计算机可读非暂时性存储介质，其进一步包括：29. The computer-readable non-transitory storage medium according to any one of claims 23 to 28, further comprising:

30.根据权利要求23至29中任一项所述的计算机可读非暂时性存储介质，其中生物学样品的特定切片经一种或多种染色剂染色。30. The computer readable non-transitory storage medium of any one of claims 23 to 29, wherein a specific section of the biological sample is stained with one or more stains.

31.根据权利要求23至30中任一项所述的计算机可读非暂时性存储介质，其中生成所述预后预测进一步基于针对所述图块生成的所述突变预测的加权组合。31. The computer-readable non-transitory storage medium of any one of claims 23 to 30, wherein generating the prognostic prediction is further based on a weighted combination of the mutation predictions generated for the tile.

32.根据权利要求23至31中任一项所述的计算机可读非暂时性存储介质，其中针对描绘肿瘤细胞的图块来生成突变预测包括：32. The computer-readable non-transitory storage medium of any one of claims 23 to 31, wherein generating a mutation prediction for a tile depicting a tumor cell comprises:

33.根据权利要求32所述的计算机可读非暂时性存储介质，其中将所述肿瘤细胞分类到表型包括：33. The computer readable non-transitory storage medium of claim 32, wherein classifying the tumor cells into phenotypes comprises:

34.根据权利要求32或33所述的计算机可读非暂时性存储介质，其中生成所述突变预测包括：34. The computer-readable non-transitory storage medium of claim 32 or 33, wherein generating the mutation prediction comprises:

35.根据权利要求34所述的计算机可读非暂时性存储介质，其中评定空间分布包括：35. The computer-readable non-transitory storage medium of claim 34, wherein assessing the spatial distribution comprises:

36.根据权利要求35所述的计算机可读非暂时性存储介质，其中通过进行异常值检测来定义所述子图中的每一者。36. The computer-readable non-transitory storage medium of claim 35, wherein each of the subgraphs is defined by performing outlier detection.

37.根据权利要求35或36所述的计算机可读非暂时性存储介质，其中基于对检测到的肿瘤巢的分割来定义所述子图中的每一者。37. The computer-readable non-transitory storage medium of claim 35 or 36, wherein each of the subgraphs is defined based on segmentation of detected tumor nests.

38.根据权利要求32至37中任一项所述的计算机可读非暂时性存储介质，其中生成所述突变预测包括：38. The computer-readable non-transitory storage medium of any one of claims 32 to 37, wherein generating the mutation prediction comprises:

39.根据权利要求38所述的计算机可读非暂时性存储介质，其中评定空间熵包括：39. The computer-readable non-transitory storage medium of claim 38, wherein assessing spatial entropy comprises:

40.根据权利要求39所述的计算机可读非暂时性存储介质，其中应用于每个距离箱的权重可对应于所述距离箱中的成对的肿瘤细胞的数量。40. The computer-readable non-transitory storage medium of claim 39, wherein a weight applied to each distance bin corresponds to a number of pairs of tumor cells in the distance bin.

41.根据权利要求39或40所述的计算机可读非暂时性存储介质，其中所述一组距离箱可仅限于表示以下距离范围的那些箱：所述距离范围带有低于指定阈值的最大距离。41. The computer-readable non-transitory storage medium of claim 39 or 40, wherein the set of distance bins is limited to only those bins representing distance ranges with a maximum distance below a specified threshold.

42.根据权利要求23至41中任一项所述的计算机可读非暂时性存储介质，其中所述生成所述预后预测包括：42. The computer-readable non-transitory storage medium of any one of claims 23 to 41, wherein said generating said prognostic prediction comprises:

43.根据权利要求23至42中任一项所述的计算机可读非暂时性存储介质，其中所述生成所述预后预测包括：43. The computer-readable non-transitory storage medium of any one of claims 23 to 42, wherein said generating said prognostic prediction comprises:

44.根据权利要求23至43中任一项所述的计算机可读非暂时性存储介质，其中：44. The computer-readable non-transitory storage medium of any one of claims 23 to 43, wherein:

45.一种系统，其包括：一个或多个处理器；以及耦接至所述处理器的非暂时性存储器，所述非暂时性存储器包括通过所述处理器可执行的指令，所述处理器当执行所述指令时可操作以：45. A system comprising: one or more processors; and a non-transitory memory coupled to the processors, the non-transitory memory comprising instructions executable by the processors, the processors when executing the instructions being operable to:

46.根据权利要求45所述的系统，其中生成所述突变预测包括：46. The system of claim 45, wherein generating the mutation prediction comprises:

47.根据权利要求45或46所述的系统，其中生成所述突变预测是基于肿瘤形态学的，其中所述肿瘤形态学基于对以下中的一者或多者的分析：印戒细胞的存在、印戒细胞的数量、肝样细胞的存在、肝样细胞的数量、细胞外粘蛋白或肿瘤生长模式。47. The system of claim 45 or 46, wherein generating the mutation prediction is based on tumor morphology, wherein the tumor morphology is based on analysis of one or more of: the presence of signet ring cells, the number of signet ring cells, the presence of hepatoid cells, the number of hepatoid cells, extracellular mucin, or tumor growth pattern.

48.根据权利要求45至47中任一项所述的系统，其中生成所述突变预测是基于一个或多个机器学习模型的，其中所述系统进一步包括：基于多个训练数据来训练所述一个或多个机器学习模型，所述多个训练数据包括对肿瘤细胞的一个或多个带标记的描绘以及对其他组织学或临床特征的一个或多个带标记的描绘。48. The system of any one of claims 45 to 47, wherein generating the mutation prediction is based on one or more machine learning models, wherein the system further comprises: training the one or more machine learning models based on a plurality of training data, the plurality of training data comprising one or more labeled depictions of tumor cells and one or more labeled depictions of other histological or clinical features.

49.根据权利要求45至48中任一项所述的系统，其中生成所述预后预测基于针对来自一个或多个额外数字病理学图像的图块生成突变预测，所述一个或多个额外数字病理学图像中的每一者描绘来自所述受试者的生物学样品中的额外特定样品，并且其中分析包括：49. The system of any one of claims 45 to 48, wherein generating the prognostic prediction is based on generating mutation predictions for tiles from one or more additional digital pathology images, each of the one or more additional digital pathology images depicting an additional specific sample from the biological samples of the subject, and wherein analyzing comprises:

50.根据权利要求45至49中任一项所述的系统，其进一步包括：50. The system of any one of claims 45 to 49, further comprising:

51.根据权利要求45至50中任一项所述的系统，其进一步包括：51. The system of any one of claims 45 to 50, further comprising:

52.根据权利要求45至51中任一项所述的系统，其中生物学样品的特定切片经一种或多种染色剂染色。52. The system of any one of claims 45 to 51, wherein specific sections of the biological sample are stained with one or more stains.

53.根据权利要求45至52中任一项所述的系统，其中生成所述预后预测进一步基于针对所述图块生成的所述突变预测的加权组合。53. The system of any one of claims 45 to 52, wherein generating the prognostic prediction is further based on a weighted combination of the mutation predictions generated for the tile.

54.根据权利要求45至53中任一项所述的系统，其中针对描绘肿瘤细胞的图块来生成突变预测包括：54. The system of any one of claims 45 to 53, wherein generating a mutation prediction for a tile depicting a tumor cell comprises:

55.根据权利要求54所述的系统，其中将所述肿瘤细胞分类到表型包括：55. The system of claim 54, wherein classifying the tumor cells into phenotypes comprises:

56.根据权利要求54或55所述的系统，其中生成所述突变预测包括：56. The system of claim 54 or 55, wherein generating the mutation prediction comprises:

57.根据权利要求56所述的系统，其中评定空间分布包括：57. The system of claim 56, wherein assessing the spatial distribution comprises:

58.根据权利要求57所述的系统，其中通过进行异常值检测来定义所述子图中的每一者。58. The system of claim 57, wherein each of the subgraphs is defined by performing outlier detection.

59.根据权利要求57或58所述的系统，其中基于对检测到的肿瘤巢的分割来定义所述子图中的每一者。59. The system of claim 57 or 58, wherein each of the sub-graphs is defined based on segmentation of detected tumor nests.

60.根据权利要求54至59中任一项所述的系统，其中生成所述突变预测包括：60. The system of any one of claims 54 to 59, wherein generating the mutation prediction comprises:

61.根据权利要求60所述的系统，其中评定空间熵包括：61. The system of claim 60, wherein assessing spatial entropy comprises:

62.根据权利要求61所述的系统，其中应用于每个距离箱的权重可对应于所述距离箱中的成对的肿瘤细胞的数量。62. The system of claim 61, wherein a weight applied to each distance bin corresponds to the number of pairs of tumor cells in the distance bin.

63.根据权利要求61或62所述的系统，其中所述一组距离箱可仅限于表示以下距离范围的那些箱：所述距离范围带有低于指定阈值的最大距离。63. A system according to claim 61 or 62, wherein the set of distance bins is limited to only those bins representing distance ranges with a maximum distance below a specified threshold.

64.根据权利要求45至63中任一项所述的系统，其中所述生成所述预后预测包括：64. The system of any one of claims 45 to 63, wherein said generating said prognostic prediction comprises:

65.根据权利要求45至64中任一项所述的系统，其中所述生成所述预后预测包括：65. The system of any one of claims 45 to 64, wherein said generating said prognostic prediction comprises:

66.根据权利要求45至65中任一项所述的系统，其中：66. A system according to any one of claims 45 to 65, wherein:

Claims

1. A method comprising, by a digital pathology image processing system:

accessing a digital pathology image depicting tumor cells sampled from a subject;

selecting a plurality of tiles from the digital pathology image, wherein each of the tiles depicts tumor cells;

generating a mutation prediction for each of the tiles, wherein the mutation prediction represents a prediction of the likelihood of an actionable mutation occurring in the tile; and

generating, based on the plurality of mutation predictions, a prognostic prediction related to one or more treatment regimens for the subject.

2. The method of claim 1, wherein generating the mutation prediction includes:

detecting one or more features from each of the plurality of tiles, wherein the one or more features include one or more of clinical features or histological features, and wherein generating a label for each of the plurality of tiles is based on the one or more features.

3. The method of claim 1, wherein generating the mutation prediction is based on tumor morphology, wherein the tumor morphology is based on analysis of one or more of: the presence of signet ring cells, the number of signet ring cells, the presence of hepatoid cells, the number of hepatoid cells, extracellular mucin, or tumor growth pattern.

4. The method of claim 1, wherein generating the mutation prediction is based on one or more machine learning models, wherein the method further comprises: training the one or more machine learning models based on a plurality of training data, the plurality of training data including one or more labeled depictions of tumor cells and one or more labeled depictions of other histological or clinical features.

5. The method of claim 1, wherein generating the prognostic prediction is based on generating mutation predictions for tiles from one or more additional digital pathology images, wherein each of the one or more additional digital pathology images depicts an additional specific sample from the biological sample from the subject, and wherein the analysis includes:

generating mutation predictions for each of the tiles from the one or more additional digital pathology images; and

generating a combined prognostic prediction for the subject based on all of the mutation predictions.

6. The method of claim 1, further comprising:

outputting the prognostic prediction via a graphical user interface, wherein the graphical user interface includes a graphical representation of the digital pathology image, and wherein the graphical representation includes an indication of the generated mutation predictions and a predicted confidence level associated with the prognostic prediction.

7. The method of claim 1, further comprising:

generating a recommendation associated with use of the one or more treatment regimens.

8. The method of claim 1, wherein specific sections of the biological sample are stained with one or more stains.

9. The method of claim 1, wherein generating the prognostic prediction is further based on a weighted combination of the mutation predictions generated for the tiles.
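As an illustration only, the weighted combination recited in claim 9 could be sketched as a weighted mean of per-tile likelihoods. The function name, the use of a weighted mean, and the default of equal weights are assumptions made for this example; the claims do not specify how the weights are chosen or how the predictions are combined.

```python
def combine_tile_predictions(likelihoods, weights=None):
    """Weighted mean of per-tile mutation likelihoods (hypothetical helper).

    likelihoods: per-tile values in [0, 1]
    weights:     optional non-negative per-tile weights (assumed; e.g. tumor
                 cell count per tile); equal weights are used when omitted
    """
    if not likelihoods:
        raise ValueError("at least one tile prediction is required")
    if weights is None:
        weights = [1.0] * len(likelihoods)
    if len(weights) != len(likelihoods):
        raise ValueError("weights and likelihoods must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(likelihoods, weights)) / total
```

For example, two tiles with likelihoods 0.2 and 0.8 and weights 1 and 3 would combine to approximately 0.65 under this sketch.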

10. The method of claim 1, wherein generating mutation predictions for tiles depicting tumor cells comprises:

classifying the tumor cells into phenotypes, each of which corresponds to a different mutation class.

11. The method of claim 10, wherein classifying the tumor cells into phenotypes includes:

identifying nuclear heterogeneity in the tile; and

quantifying the identified nuclear heterogeneity, wherein generating the mutation prediction is further based on the quantified nuclear heterogeneity.

12. The method of claim 10, wherein generating the mutation prediction comprises:

performing cell-level spatial analysis to assess a spatial distribution indicative of regions of clonal cells and the spatial arrangement of cells within each of the regions of clonal cells.

13. The method of claim 12, wherein assessing spatial distribution includes:

measuring spectral distances within subgraphs of a minimum spanning tree of said tumor cells, wherein each of said subgraphs represents a tumor nest; and

calculating adjacency spectral distances pairwise across all subgraphs.
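A minimal sketch of one way the graph analysis of claim 13 might be realized is given below, under several assumptions that are not taken from the claims: cells are represented by 2D centroid coordinates, the minimum spanning tree is built with Prim's algorithm, subgraphs (tumor nests) are obtained by removing tree edges longer than a hypothetical `max_edge` threshold, and the adjacency spectral distance between two subgraphs is taken to be the Euclidean distance between their sorted, zero-padded adjacency eigenvalues. NumPy is assumed to be available.

```python
import itertools
import math

import numpy as np


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph of cell centroids."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the tree to each node
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_into_nests(n, edges, max_edge):
    """Remove tree edges longer than max_edge (assumed threshold) and
    return the connected components plus the pruned adjacency lists."""
    adj = {i: [] for i in range(n)}
    for i, j, length in edges:
        if length <= max_edge:
            adj[i].append(j)
            adj[j].append(i)
    seen, nests = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, nodes = [start], []
        while stack:
            u = stack.pop()
            nodes.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        nests.append(sorted(nodes))
    return nests, adj


def adjacency_spectrum(nodes, adj):
    """Eigenvalues (ascending) of the subgraph's 0/1 adjacency matrix."""
    index = {node: k for k, node in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            a[index[u], index[v]] = 1.0
    return np.linalg.eigvalsh(a)


def spectral_distance(s1, s2):
    """Euclidean distance between zero-padded, sorted spectra (assumed definition)."""
    k = max(len(s1), len(s2))
    p1 = np.sort(np.pad(s1, (0, k - len(s1))))
    p2 = np.sort(np.pad(s2, (0, k - len(s2))))
    return float(np.linalg.norm(p1 - p2))


def pairwise_nest_distances(points, max_edge):
    """Spectral distance for every pair of nests, keyed by nest indices."""
    edges = minimum_spanning_tree(points)
    nests, adj = split_into_nests(len(points), edges, max_edge)
    spectra = [adjacency_spectrum(nodes, adj) for nodes in nests]
    return {
        (a, b): spectral_distance(spectra[a], spectra[b])
        for a, b in itertools.combinations(range(len(spectra)), 2)
    }
```

In this sketch, two nests with identical tree topology yield a distance of zero, while nests of different size or branching yield a positive distance. Cutting long edges is only one possible way to delimit nests; claims 14 and 15 recite outlier detection and segmentation of detected tumor nests as alternatives.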

14. The method of claim 13, wherein each of the subgraphs is defined by performing outlier detection.

15. The method of claim 13, wherein each of the subgraphs is defined based on segmentation of detected tumor nests.

16. The method of claim 10, wherein generating the mutation prediction comprises:

identifying regions of closely adjacent clonal cells in the tile by performing cell-level spatial analysis to assess spatial entropy.

17. The method of claim 16, wherein assessing spatial entropy includes:

specifying a set of distinct distance bins, wherein each of said distance bins corresponds to a range of distances between pairs of tumor cells;

identifying, for each of the distance bins, pairs of the tumor cells, wherein the distance between the tumor cells in each of the pairs falls within the distance range corresponding to the distance bin;

calculating, for each of the distance bins, the frequency with which pairs of tumor cells are identified as morphologically similar;

for each possible pair of tumor cells classified into a phenotype, classifying the pair of tumor cells into one of a predefined number of bins, each of the bins representing a distance between the spatial locations of the cells of the pair; and

calculating, for each of the bins, the frequency of classification of the pair of tumor cells in the bin.

18. The method of claim 17, wherein the weight applied to each distance bin may correspond to the number of paired tumor cells in the distance bin.

19. The method of claim 17, wherein the set of distance bins may be limited to those bins that represent a distance range with a maximum distance below a specified threshold.
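The distance-bin procedure of claims 17 to 19 can be illustrated with the following sketch. It assumes each cell is given as an (x, y, phenotype) tuple, treats two cells as morphologically similar when their phenotype labels match, and computes, per bin, a Shannon entropy over unordered phenotype pair types, weighted by the share of pairs falling in that bin. This particular entropy formulation, the function name, and the input layout are assumptions for illustration and are not specified by the claims. Pairs farther apart than the last bin edge are ignored, which corresponds to limiting the bins to a maximum distance.

```python
import itertools
import math
from collections import Counter


def binned_spatial_entropy(cells, bin_edges):
    """cells:     list of (x, y, phenotype) tuples (assumed layout)
    bin_edges: ascending distance boundaries; bin k covers
               [bin_edges[k], bin_edges[k + 1])
    Returns (weighted_entropy, per_bin), where per_bin holds one
    (pair_count, similar_frequency, entropy) tuple per distance bin."""
    counts = [Counter() for _ in range(len(bin_edges) - 1)]
    for (x1, y1, p1), (x2, y2, p2) in itertools.combinations(cells, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        for k in range(len(bin_edges) - 1):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                # Unordered pair type, e.g. ("A", "B")
                counts[k][tuple(sorted((p1, p2)))] += 1
                break
    total_pairs = sum(sum(c.values()) for c in counts)
    per_bin, weighted = [], 0.0
    for c in counts:
        n = sum(c.values())
        if n == 0:
            per_bin.append((0, 0.0, 0.0))
            continue
        # Frequency of pairs whose two cells share a phenotype
        similar = sum(v for (a, b), v in c.items() if a == b) / n
        entropy = -sum((v / n) * math.log(v / n) for v in c.values())
        # Weight each bin by its share of all binned pairs (cf. claim 18)
        weighted += (n / total_pairs) * entropy
        per_bin.append((n, similar, entropy))
    return weighted, per_bin
```

Under this sketch, a tile in which nearby pairs are almost always of the same phenotype shows a high similar-pair frequency in the short-distance bins, which is one possible indicator of closely adjacent clonal cells.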

20. The method of claim 1, wherein generating the prognostic prediction includes:

determining that the mutational context of the digital pathology image is an unknown driver or tumor suppressor, wherein the heterogeneity level of clustered tumor cells is high, and wherein the prognostic prediction relates to a treatment regimen including immunotherapy.

21. The method of claim 1, wherein generating the prognostic prediction includes:

determining that the mutational context of the digital pathology image is an oncogene driver mutation, wherein the heterogeneity level of the clustered tumor cells is moderate, and wherein the prognostic prediction relates to a treatment regimen including a targeted therapy corresponding to the mutation.

22. The method of claim 1, wherein:

if an actionable mutation is present in at least one of the tiles, the one or more treatment regimens include a targeted therapy associated with the actionable mutation; and

otherwise, the one or more treatment regimens include immunotherapy.
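The selection rule of claim 22 might be sketched as follows. The per-tile input format (a mapping from mutation name to predicted likelihood), the 0.5 decision threshold, and the function name are hypothetical choices for this example; the claims do not state how presence of an actionable mutation is decided from the per-tile likelihoods.

```python
def select_treatment_regimens(tile_predictions, threshold=0.5):
    """tile_predictions: one dict per tile mapping a mutation name to its
    predicted likelihood (assumed format).
    threshold: likelihood at or above which a mutation is treated as
    present (assumed value, not taken from the claims)."""
    detected = sorted(
        {
            mutation
            for tile in tile_predictions
            for mutation, likelihood in tile.items()
            if likelihood >= threshold
        }
    )
    if detected:
        # Targeted therapy for each actionable mutation found in any tile
        return [f"targeted therapy ({mutation})" for mutation in detected]
    # No actionable mutation detected in any tile
    return ["immunotherapy"]
```

With mutation names chosen purely for illustration, calling this with tiles `[{"EGFR": 0.1}, {"EGFR": 0.9, "ALK": 0.2}]` would return a targeted therapy entry for EGFR only, while tiles with all likelihoods below the threshold would return immunotherapy.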

23. One or more computer-readable non-transitory storage media embodying software that when executed is operable to:

access digital pathology images depicting tumor cells sampled from the subject;

24. A system, comprising: one or more processors; and non-transitory memory coupled to the processors, the non-transitory memory including instructions executable by the processors, the processors being operable when executing the instructions to:

access digital pathology images depicting tumor cells sampled from the subject;