WO2023137627A1 - Tumor microenvironment spatial relationship modeling system and method based on digital pathology image - Google Patents

Tumor microenvironment spatial relationship modeling system and method based on digital pathology image

Info

Publication number
WO2023137627A1
WO2023137627A1 PCT/CN2022/072760 CN2022072760W WO2023137627A1 WO 2023137627 A1 WO2023137627 A1 WO 2023137627A1 CN 2022072760 W CN2022072760 W CN 2022072760W WO 2023137627 A1 WO2023137627 A1 WO 2023137627A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
image
distribution
cells
spatial
Prior art date
Application number
PCT/CN2022/072760
Other languages
French (fr)
Chinese (zh)
Inventor
秦文健
刁颂辉
何佳慧
侯嘉馨
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Priority to PCT/CN2022/072760 priority Critical patent/WO2023137627A1/en
Publication of WO2023137627A1 publication Critical patent/WO2023137627A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts

Definitions

  • The present invention relates to the technical field of medical image processing, and more specifically, to a system and method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images.
  • Tumor tissue is a complex structure in which cancer cells and the surrounding non-cancer cells (such as stromal cells and lymphocytes) form a tumor microenvironment with highly complex spatial heterogeneity. Although weakly supervised learning algorithms can identify and locate cancer cells, lymphocytes, stromal cells, and other cell types (such as macrophages, T cells, or non-differentiating cells) in digital pathology images, existing methods based on simple distance measurement, cell density statistics, or clustering cannot fully express this heterogeneity. In addition, because the tumor microenvironment contains many cell types, the cell spatial organization relationships currently constructed with graph neural networks cannot support automatic, comprehensive, and simultaneous quantitative analysis of the spatial organization distribution of multiple cell types. New methods for topological clustering of multi-layer networks over multiple cell types are therefore needed.
  • The tumor microenvironment controls the formation, development, metastasis, and drug resistance of solid tumors. It results from the interaction between tumor cells and non-tumor cells and tissues, such as stromal cells and immune cells, that generates anti-tumor immune responses. There is strong clinical and experimental evidence for the importance of the tumor microenvironment in cancer progression and in mediating drug resistance. However, the relationship of complex anatomy and the local microenvironment to metabolic and immune responses remains to be explored in depth. It is difficult for pathologists to capture the interaction between the tumor and its microenvironment through conventional qualitative or semi-quantitative visual inspection.
  • Using computational analysis of digital pathology images to decipher the characteristics of the tumor microenvironment, especially the spatial heterogeneity within the tumor, not only offers a new approach to tumor microenvironment analysis but, more importantly, can uncover potential biomarkers related to cancer treatment, so that the most appropriate precision treatment plan can be designed for each patient.
  • Computational pathology can not only help pathologists examine patients' histological data in a high-throughput, quantitative, and objective manner, but also use the various types of cells obtained by automatic detection algorithms to construct a spatial relationship map of cells in the tumor, and combine this with spatial analysis methods to accurately assess tumor treatment response and prognosis.
  • Early research on the spatial analysis of the tumor microenvironment based on digital pathology typically used clustering algorithms to perform spatial positioning and morphometric measurement of cell features extracted from digital pathology images, in order to describe the relationship between the spatial distribution pattern of immune cells and disease.
  • A convolutional neural network was first used to identify tumor-infiltrating lymphocytes (TILs) and segment tumor necrosis regions; an affinity propagation clustering algorithm was then used to model the spatial pattern of infiltrating lymphocytes; and the corresponding clustering features were extracted to describe the spatial pattern of TILs, revealing the relationship between TIL patterns and immune subtypes, tumor types, immune cell fractions, and patient survival.
  • Kun Huang et al. performed topological space modeling of deep learning features based on Delaunay triangulation graphs.
  • Stacked autoencoder networks were used to learn high-level semantic features of cells, and K-means clustering was then used to obtain the spatial pattern of cell nuclei.
  • A statistical analysis of edge histograms confirmed that the spatial topological features of the renal tumor microenvironment were significantly correlated with survival. They also verified that topological features outperform clinical features and cell morphological features in survival prediction.
  • Guanghua Xiao et al. used the method of cell statistical density to construct a regional spatial organization map, used a deep convolutional network to automatically identify cell types, and finally calculated two spatially distributed features to predict the survival of lung cancer patients.
  • The tumor microenvironment is very complex and spatially heterogeneous.
  • This complexity cannot be fully expressed by simple distance measurement, cell statistics, or clustering methods.
  • Spatial analysis of graphs still relies on manually extracted features, such as the number of connections between adjacent nodes and edge histograms, and can only analyze simple relationships between cells. Because the tumor microenvironment contains many cell types, current spatial analysis methods struggle to perform fully automatic, comprehensive, and simultaneous quantitative analysis of the spatial organization distribution of its various components (including blood vessels and other structures, and lymphocytes, stromal cells, and other cell types).
  • the purpose of the present invention is to overcome the defects of the above-mentioned prior art, and provide a tumor microenvironment spatial relationship modeling system and method based on digital pathological images, which is a new technical solution for multi-layer network topology clustering of multi-cell types.
  • A tumor microenvironment spatial relationship modeling system based on digital pathological images includes:
  • Image staining standardization module: used to determine the pixel distribution type of the pathological image and, according to the overall distribution of the pixels of the pathological image, perform color standardization on the change of staining distribution to obtain a staining-standardized image;
  • Structural region segmentation module: used to detect the region of interest in the staining-standardized image with a weakly supervised deep learning model, and then segment it to obtain the target structural region;
  • Cell detection module: used to extract various types of cell information from the obtained target structural region;
  • Spatial relationship building module: used to model the multi-layer network with a multi-layer graph to characterize the co-spatial distribution among the various types of cells, and to perform cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution, wherein the quantitative model of spatial distribution is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment.
  • A method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images includes the following steps:
  • Step S1: determine the pixel distribution type of the pathological image and, according to the overall distribution of the pixels in the pathological image, perform color standardization on the change of the staining distribution to obtain a staining-standardized image;
  • Step S2: for the staining-standardized image, use a weakly supervised deep learning model to detect the region of interest, and then segment it to obtain the target structural region;
  • Step S3: extract various types of cell information from the obtained target structural region;
  • Step S4: use a multi-layer graph to model a multi-layer network that characterizes the co-spatial distribution among the various types of cells, and perform cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution, wherein the quantitative model of spatial distribution is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment, the multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.
  • The present invention has the advantage of accounting for the fact that, because tumor cell types are rich and spatially heterogeneous, multiple components of the tumor microenvironment are strongly spatially correlated with cancer cells at the same time.
  • the present invention provides a mathematical model for constructing a topological space of tumor cells and multiple components of the tumor microenvironment, which can reveal the correlation between intra-tumor heterogeneity and the spatial distribution of cells and tissues in the tumor microenvironment, and provide a new idea for quantitative analysis of tumor evolution mechanisms.
  • Fig. 1 is an architecture diagram of a tumor microenvironment spatial relationship modeling system based on digital pathological images according to an embodiment of the present invention.
  • Fig. 2 is a flow chart of a method for modeling the spatial relationship of tumor microenvironment based on digital pathological images according to an embodiment of the present invention.
  • the provided digital pathology image-based tumor microenvironment spatial relationship modeling system includes an image staining standardization module, a structural region segmentation module, a cell detection module and a spatial relationship building module.
  • The staining standardization module is used to solve the problem of inconsistent color distribution across different slices;
  • The structural region segmentation module combines the multi-scale imaging characteristics of pathological images and segments lesion regions and structures at high resolution through regularizable weakly supervised learning;
  • The cell detection module is used to detect and identify the various types of cells, which appear as clustered small targets;
  • The spatial relationship construction module is used to identify immune cell types through image registration algorithms and to build, through a multi-layer graph network, a topological spatial relationship model between multiple types of structures and cells and the tumor cells, so as to achieve quantitative analysis of the tumor microenvironment.
  • The distribution of stained pixels in pathological images usually conforms to a Gaussian distribution or a skew-normal (partial normal) distribution, and the type can be determined by a self-supervised algorithm based on parameter estimation of the distribution model, where the probability density function (PDF) of the multivariate skew-normal distribution is:
  • $a$ denotes the elements of the upper triangular part of the matrix $\Sigma$;
  • $\phi_d(\cdot; u, \Sigma)$ is the PDF of a $d$-variate Gaussian distribution with mean vector $u$ and covariance matrix $\Sigma$;
  • $\Phi(\cdot)$ is the cumulative distribution function of a standard univariate Gaussian distribution.
  • The probability density function of the multivariate Gaussian mixture distribution is:
  • The parameter $\pi_k$ is called the mixing coefficient, with $0 \le \pi_k \le 1$; it is the prior probability of selecting the $k$-th distribution, and the component density is the probability of $x$ given the $k$-th distribution.
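  • As a minimal sketch of the mixture density described above, the following NumPy code evaluates a Gaussian mixture on pixel color vectors, assuming the standard form $p(x) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$ with coefficients that sum to one. The function names and argument layout are illustrative assumptions, not part of the original text.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a d-variate Gaussian evaluated at each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every row to the mean.
    maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def mixture_pdf(x, weights, means, covs):
    """Weighted sum of component densities; `weights` are the mixing coefficients."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing coefficients must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))
```

    Each row of `x` would be one pixel's color vector; the parameters would come from whatever estimation procedure is used upstream.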
  • the Jarque-Bera test can be used to confirm the actual distribution model type of the pixels of the pathological image, such as testing based on the skewness and kurtosis of the pixel data.
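  • The Jarque-Bera check mentioned above can be sketched as follows. The statistic combines sample skewness and kurtosis; a large value suggests the pixel data are not Gaussian. The function names and the 5% decision threshold are assumptions made for illustration, and the text does not specify how the test result selects between distribution models.

```python
import numpy as np

def jarque_bera_statistic(values):
    """Return (JB statistic, skewness, kurtosis) for a 1-D sample."""
    x = np.asarray(values, dtype=float).ravel()
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + 0.25 * (kurtosis - 3.0) ** 2)
    return jb, skewness, kurtosis

def looks_gaussian(values, critical_value=5.991):
    """Compare JB with the 95% quantile of a chi-square with 2 degrees of freedom."""
    jb, _, _ = jarque_bera_statistic(values)
    return jb < critical_value
```

    In practice the test would be run per color channel (or on a projection of the pixel vectors), since the statistic is defined for univariate data.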
  • Deep convolution can be used to estimate and update the parameters of the model so as to obtain the overall distribution of the pixels of the pathological image; color standardization of the image to be analyzed is finally achieved through the staining distribution change model.
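  • To make the idea of color standardization concrete, the sketch below uses a much simpler stand-in than the learned distribution model described above: it shifts and scales each channel so that its mean and standard deviation match reference values. This per-channel statistic matching is a substituted technique chosen for brevity; the function name and the reference statistics are hypothetical.

```python
import numpy as np

def match_channel_statistics(image, target_mean, target_std):
    """Shift and scale each channel so its mean/std match the reference values."""
    img = np.asarray(image, dtype=float)
    mean = img.mean(axis=(0, 1))
    std = img.std(axis=(0, 1))
    std = np.where(std > 0, std, 1.0)  # guard against flat channels
    out = (img - mean) / std * np.asarray(target_std) + np.asarray(target_mean)
    return np.clip(out, 0.0, 255.0)    # keep values in the 8-bit display range
```

    The reference mean and standard deviation would be taken from a slide chosen as the staining template.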
  • The structural region segmentation module sequentially performs regularized encoding, regularized decoding, weakly supervised learning, and region-of-interest detection on the staining-standardized image to obtain the retained segmentation map of the structural region.
  • One of the key issues is to extract, from limited information, enough key feature codes to effectively assist segmentation.
  • The characteristics of the key areas in the data are constructed, irrelevant redundant information and noise are removed, and the original data are abstracted into two types of data matrices.
  • The target structure information forms a low-rank matrix, and the redundant and noise information forms a sparse matrix.
  • The two matrices are solved separately, and the characteristic information of the target structure data is finally obtained.
  • This method is used in the multi-scale pathological image segmentation model, and a regularizer is designed for model training with respect to the above losses.
  • $l(S, Y)$ is the loss between the true value and the predicted value;
  • $R(S)$ is the regularization loss;
  • the parameter $S = f_\theta(I) \in [0,1]$;
  • the two weighting parameters are set for the corresponding terms.
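  • A minimal sketch of a regularized weakly supervised loss of this kind is given below. It assumes a binary segmentation map, a cross-entropy term evaluated only on the weakly labelled pixels as $l(S, Y)$, and a neighbor-smoothness penalty as $R(S)$. Both concrete choices, the single `weight` parameter, and all function names are assumptions for illustration; the original formula is not recoverable from the text.

```python
import numpy as np

def partial_cross_entropy(s, y, labeled):
    """Binary cross-entropy l(S, Y), averaged over the weakly labelled pixels only."""
    eps = 1e-7
    s = np.clip(s, eps, 1.0 - eps)
    ce = -(y * np.log(s) + (1.0 - y) * np.log(1.0 - s))
    return ce[labeled].mean()

def smoothness_regularizer(s):
    """R(S): mean absolute difference between neighbouring pixels (assumed choice)."""
    return np.abs(np.diff(s, axis=0)).mean() + np.abs(np.diff(s, axis=1)).mean()

def regularized_loss(s, y, labeled, weight=0.1):
    """Supervised term on labelled pixels plus a weighted regularization term."""
    return partial_cross_entropy(s, y, labeled) + weight * smoothness_regularizer(s)
```

    Here `s` plays the role of the network output $S$ with values in $[0,1]$, and `labeled` is a boolean mask marking the pixels that carry weak annotations.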
  • the input of the model can be fused with image information of multiple magnifications to achieve different attention to cells under high magnification and tissues at medium and low magnifications, so as to fully consider the specificity and generality of the data samples and learn the key features of the data.
  • the diagnostic process of clinical pathologists is simulated, and different attention weights are given to image features of different magnifications, so as to fully consider the data characteristics of images under various magnifications.
  • The optimization function of the corresponding multi-magnification regularization loss is:
  • $I_d$ represents the image input at magnification $d$;
  • $f_\theta$ represents the feature calculation under the parameters $\theta$, and the attention weight is mainly calculated by Softmax($f_\theta(I)$).
  • the weighted category activation map is obtained by fusion calculation of the features of multiple scales, and the target tissue and structural area can be obtained after post-processing.
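  • The attention-weighted fusion across magnifications can be sketched as follows. The code assumes that one class activation map per magnification has already been resampled to a common grid and that each magnification is summarized by a scalar score derived from $f_\theta(I_d)$; both assumptions, and the function names, are illustrative only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_activation_maps(cams, scores):
    """Fuse per-magnification activation maps with softmax attention weights.

    cams:   (D, H, W) activation maps, one per magnification, on a common grid.
    scores: (D,) scalar summary of the features at each magnification (hypothetical).
    """
    cams = np.asarray(cams, dtype=float)
    weights = softmax(np.asarray(scores, dtype=float))
    fused = np.tensordot(weights, cams, axes=1)  # weighted sum over D -> (H, W)
    return fused, weights
```

    Thresholding or other post-processing of `fused` would then yield the target tissue and structural regions.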
  • The structural region segmentation network can use any of several architectures, such as AlexNet, VGG, GoogLeNet, or ResNet, which is not limited in the present invention.
  • The key discriminant matrix is calculated from the probability distances between the matrices corresponding to the high-dimensional features of different image types, using the tumor region as the reference; that is, the distance between cancer and the other classes is large.
  • the transformation network based on self-attention realizes the encoding and decoding calculation of each cell and structure in the image, and at the same time combines the key discriminant matrix for fusion analysis, as follows:
  • $W_A$ is a learnable weight; $H$, $W$, and $C$ denote the dimensions of the features along each axis; $L$ denotes the number of visual tokens $T$, with $L \ll HW$.
  • A self-attention transformation is used to model the dependencies among the tokens $T$, which are then projected back to the dimensions of the normal feature map and combined with the preceding key discriminant matrix $G$, expressed as:
  • $X_{out} = X_{in} + \mathrm{SOFTMAX}_L\big((X_{in} W_Q)(T W_K)^{\top}\big)\,T + G \quad (6)$
  • $X_{in}$ represents the image features obtained during multi-scale pathological image tumor region detection;
  • $X_{out}$ represents the final output of the cell detection module;
  • $W_Q$ and $W_K$ are learnable weight parameters. After the feature relationships of the image are constructed, training on a large amount of data realizes the recognition and localization of different types of cells and structures.
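  • Equation (6) can be sketched in NumPy as follows, with features flattened to shape (HW, C) and tokens of shape (L, C). The `tokenize` step, which forms the tokens through a spatial softmax with $W_A$, is an assumed form because the corresponding formula is missing from the text; the self-attention among tokens is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tokenize(x_in, w_a):
    """Visual tokens T of shape (L, C), pooled from pixels (assumed form)."""
    attn = softmax(x_in @ w_a, axis=0)   # (HW, L), normalised over the pixels
    return attn.T @ x_in                 # (L, C)

def project_tokens(x_in, tokens, w_q, w_k, g):
    """X_out = X_in + SOFTMAX_L((X_in W_Q)(T W_K)^T) T + G, as in Eq. (6)."""
    scores = (x_in @ w_q) @ (tokens @ w_k).T   # (HW, L)
    attn = softmax(scores, axis=1)             # normalised over the L tokens
    return x_in + attn @ tokens + g
```

    With `w_q` and `w_k` set to zero the attention becomes uniform, so every pixel simply receives the mean token plus `g`, which is a convenient sanity check.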
  • The spatial relationship building module sequentially performs image slicing, image feature encoding, low-magnification rigid registration, high-magnification non-rigid registration, multi-layer graph network construction, graph embedding dimensionality reduction, point cloud distribution data acquisition, persistent homology modeling, and feature cluster analysis, and finally obtains a quantitative model of spatial distribution.
  • the following focuses on the multi-layer graph network construction and feature cluster analysis.
  • a multi-layer graph is used to model a multi-layer network.
  • a multi-layer graph is a collection of single-layer graph adjacency matrices with weights, including intra-layer relationships and inter-layer interactions.
  • the specific implementation includes the multi-layer network constructed based on the multi-layer graph and the clustering calculation for the multi-layer network, so as to finally realize the construction of the spatial distribution expression model between the tumor cells and the multi-components of the tumor microenvironment.
  • A cross-layer adjacency matrix set $C_p = \{A_{l,k},\ k \neq l\}$ can be obtained, which represents the edges between nodes of different layers, where $p$ is the number of connection graphs.
  • In a multi-layer network, the set of inter-layer connections links nodes across layers: for each such edge $(u, v)$, $u \in V(G_k)$ and $v \in V(G_l)$ with $k \neq l$.
  • The supra-adjacency (hyperadjacency) matrix of the multi-layer network so defined has a block matrix structure:
  • nodes in the same layer are defined to represent the same type of cells, and connections between different layers are defined to represent the spatial connection relationship between different types of cells or structures. Taking vascular structures and tumor cells as an example, the relationship between cell-structure layers can be established based on the spatial distance to obtain the value of the off-diagonal element A kl between layers. Tumors or immune cells that are close to blood vessels have a strong connection with the structural layer, and vice versa; the diagonal elements are intra-layer matrices that are also obtained through the Euclidean distance between cells.
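  • The block structure described above, with intra-layer matrices on the diagonal and inter-layer matrices off the diagonal, can be sketched as follows. The weighting function, which decays linearly with Euclidean distance and vanishes beyond a cutoff `radius`, is an assumption; the text only states that closer cells and structures are more strongly connected. Function names are hypothetical.

```python
import numpy as np

def distance_weights(a, b, radius):
    """Edge weights that decay with Euclidean distance and vanish beyond `radius`."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.where(d <= radius, 1.0 - d / radius, 0.0)

def supra_adjacency(layers, radius):
    """Assemble the block matrix from per-layer coordinates.

    layers: list of (n_k, 2) coordinate arrays, one per cell type or structure.
    Returns the full matrix and the row offsets of each layer's block.
    """
    sizes = [len(p) for p in layers]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    A = np.zeros((offsets[-1], offsets[-1]))
    for k, pk in enumerate(layers):
        for l, pl in enumerate(layers):
            block = distance_weights(pk, pl, radius)
            if k == l:
                np.fill_diagonal(block, 0.0)  # no self-loops within a layer
            A[offsets[k]:offsets[k + 1], offsets[l]:offsets[l + 1]] = block
    return A, offsets
```

    For example, passing tumor cell centroids as the first layer and vessel centroids as the second gives a matrix whose off-diagonal block holds the tumor-to-vessel connection strengths.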
  • After the multi-layer graph network is built, extracting meaningful information from such a complex network requires a large amount of computation and memory. To address both problems, the network is transformed into a low-dimensional space through node embedding while its structural information is preserved; for example, graph embedding is used to achieve dimensionality reduction.
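  • One simple way to embed the nodes of a weighted graph in a low-dimensional space is a spectral embedding based on the normalized graph Laplacian, sketched below. The text does not name a specific graph embedding method, so this choice and the function name are assumptions for illustration.

```python
import numpy as np

def spectral_embedding(adjacency, dim=2):
    """Embed nodes using eigenvectors of the symmetric normalised Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    # L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the first eigenvector is
    # trivial for a connected graph, so keep the next `dim` columns.
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, 1:dim + 1]
```

    The rows of the returned array form a point cloud in which strongly connected nodes lie close together, which is the kind of input the subsequent topological analysis operates on.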
  • Topological data analysis (TDA) is then applied: changes in the induced network topology are evaluated to detect features that persist over a wide range of thresholds.
  • The goal is to detect features that persist across different thresholds; such persistent features characterize the internal spatial organization distribution.
  • This persistent graph clustering algorithm can obtain more accurate clustering results.
  • the multi-layer network clustering method adopted in the embodiment of the present invention starts from the perspective of similarity in the shape of multi-resolution recorded data, and performs clustering calculations on multi-layer networks under unsupervised conditions.
  • the multi-lens tool of TDA is introduced in the clustering calculation, whose core idea is that if the local neighborhoods of two points are similar in shape at all resolution scales, then the distance between them is close enough to be clustered into a cluster. Therefore, persistent graph clustering utilizes the distance function and local spatial information around points, and can obtain more accurate clustering results for multi-layer graph networks.
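  • The notion of persistence across thresholds can be illustrated with a deliberately simplified stand-in for the clustering described above: link points whose distance is below a threshold, count the connected components, and keep the cluster count that survives the longest run of consecutive thresholds. This tracks only connected components on an embedded point cloud and does not reproduce the multi-lens comparison of local neighborhood shapes; all names are hypothetical.

```python
import numpy as np

def component_labels(points, threshold):
    """Label connected components of the graph linking points within `threshold`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] <= threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

def persistent_cluster_count(points, thresholds):
    """Return the cluster count that persists over the most consecutive thresholds."""
    counts = [int(component_labels(points, t).max()) + 1 for t in thresholds]
    best, best_run, run = counts[0], 0, 0
    for prev, cur in zip([None] + counts[:-1], counts):
        run = run + 1 if cur == prev else 1
        if run > best_run:
            best, best_run = cur, run
    return best, counts
```

    The threshold range matters: if it extends far beyond the data diameter, the trivial single cluster will eventually dominate, so the range should be limited to scales of interest.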
  • the present invention also provides a method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images, which is used to realize the functions of each module in the above system.
  • The method includes: step S110, determining the pixel distribution type of the pathological image and, according to the overall distribution of the pixels in the pathological image, performing color standardization on the change of the staining distribution to obtain a staining-standardized image; step S120, using a weakly supervised deep learning model to detect the region of interest in the staining-standardized image, and then segmenting it to obtain the target structural region; step S130, extracting various types of cell information from the obtained target structural region; and a final step of modeling a multi-layer network with a multi-layer graph to characterize the co-spatial distribution among the various types of cells, and performing cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution.
  • the spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment
  • the multi-layer graph includes intra-layer relationships and inter-layer interactions
  • nodes in the same layer represent cells of the same type
  • connections between different layers represent spatial connection relationships between different types of cells or structures.
  • the present invention has at least the following technical effects:
  • a fast calculation method for multi-scale pathological images based on learnable regularization constraint encoding and decoding weakly supervised learning is designed. Aiming at the difficulty of calculating a single pathological image with over one billion pixels and the incomplete utilization of information at different magnification scales, combined with the idea of weak supervision and deep learning technology, it does not need to rely on large-scale data labeling and makes full use of cross-scale information to achieve rapid detection of lesion regions of interest and accurate positioning of cell nuclei in digital panoramic pathological images.
  • the self-attention transformation network is used to realize the encoding and decoding of each cell and structure, thereby realizing the rapid detection and accurate identification of clustered multi-type small target cells.
  • A tumor microenvironment topological space modeling method based on persistent graph clustering is provided to further realize the quantitative calculation of pathological diagnostic indicators.
  • Conventional distance-based or statistical methods struggle to capture the spatial expression of complex tumor microenvironments.
  • the present invention introduces the concept of topological data analysis into complex multi-layer network clustering calculations, and proposes a topological space modeling method for persistent graph clustering. It reveals the correlation between intra-tumor heterogeneity and the spatial distribution of cells and tissues in the tumor microenvironment, and provides a new idea for quantitative analysis of tumor evolution mechanisms.
  • the present invention can be a system, method and/or computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised-in-recess structures with instructions stored thereon, and any suitable combination of the foregoing.
  • Computer-readable storage media as used herein is not to be interpreted as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through electrical wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
  • Computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages—such as Smalltalk, C++, Python, etc., and conventional procedural programming languages—such as the “C” language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • Electronic circuits, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions; such circuits can then execute the computer-readable program instructions, thereby implementing various aspects of the present invention.
  • These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing devices, thereby producing a machine, such that these instructions, when executed by a processor of a computer or other programmable data processing devices, produce devices that implement the functions/actions specified in one or more blocks in the flowchart and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific manner, so that the computer-readable medium storing instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowchart and/or block diagrams.
  • Computer-readable program instructions can also be loaded onto a computer, other programmable data processing device, or other equipment, so that a series of operation steps are executed on the computer, other programmable data processing device, or other equipment to generate a computer-implemented process, so that the instructions executed on the computer, other programmable data processing device, or other equipment realize the functions/actions specified in one or more blocks in the flowchart and/or block diagram.
  • each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of instructions comprising one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block in the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by special purpose hardware-based systems that perform the specified functions or actions, or by combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by means of hardware, implementation by means of software, and implementation by a combination of software and hardware are all equivalent.

Abstract

Disclosed in the present invention are a tumor microenvironment spatial relationship modeling system and method based on digital pathology images. The system comprises: an image staining normalization module, configured to determine the pixel distribution type of a pathology image and to perform color normalization on variations in the staining distribution to obtain a stain-normalized image; a structural region segmentation module, configured to detect a region of interest in the stain-normalized image by using a weakly supervised deep learning model and then to segment it to obtain a target structural region; a cell detection module, configured to extract information on multiple types of cells from the target structural region; and a spatial relationship construction module, configured to model a multi-layer network by using a multi-layer graph so as to represent the co-spatial distribution among the multiple types of cells, and to perform clustering analysis on the multi-layer graph to obtain a quantitative spatial distribution model. The present invention can accurately reveal the correlation between intratumoral heterogeneity and the spatial distribution patterns of cells and tissues in the tumor microenvironment, thereby providing a new quantitative approach to analyzing tumor evolution mechanisms.

Description

System and method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images
Technical Field
The present invention relates to the technical field of medical image processing, and more specifically to a system and method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images.
Background
Tumor tissue is a complex structure composed of cancer cells together with the tumor microenvironment formed by surrounding non-cancer cells (such as stromal cells and lymphocytes), and its spatial heterogeneity is highly complex. Weakly supervised learning algorithms can identify cancer cells, lymphocytes, stromal cells and other cell types (such as macrophages, T cells or unidentified cells) in digital pathology images and locate their spatial positions. However, existing methods cannot fully express this heterogeneity because they rely only on simple distance measurements, cell density statistics or clustering. In addition, because the tumor microenvironment contains many cell types, cell spatial organization relationships currently built with graph neural networks are not suitable for fully automatic, comprehensive quantitative analysis of the spatial organization of multiple cell types at the same time. It is therefore necessary to study new multi-layer network topological clustering methods oriented to multiple cell types.
The tumor microenvironment governs the formation, progression and metastasis of solid tumors and the emergence of drug resistance. It results from interactions between tumor cells and non-tumor cells and tissues, such as stromal cells and immune cells, which give rise to anti-tumor immune responses. Strong clinical and experimental evidence supports the importance of the tumor microenvironment in cancer progression and in mediating drug resistance. However, the relationship between complex anatomical structure, the local microenvironment, and metabolic and immune responses remains to be explored in depth. Routine visual inspection by pathologists, using qualitative or semi-quantitative parameters, can hardly capture the interactions between a tumor and its microenvironment. Computational analysis of digital pathology images to decipher the characteristics of the tumor microenvironment, especially intratumoral spatial heterogeneity, therefore not only offers a new way of approaching tumor microenvironment analysis but, more importantly, can uncover potential biomarkers relevant to cancer treatment, so that the most appropriate precision-medicine treatment plan can be designed for each patient.
With the development of whole-slide digital pathology imaging and deep-learning-based pathology image processing algorithms, computational pathology can not only help pathologists examine patients' histological data in a high-throughput, quantitative and objective manner, but can also use the various cell types obtained by automatic detection algorithms to construct a spatial relationship graph of cells within the tumor and, combined with spatial analysis methods, to assess tumor treatment response and prognosis precisely. Early work on spatial analysis of the tumor microenvironment based on digital pathology usually used clustering algorithms to spatially localize and morphometrically measure cell features extracted from digital pathology images, in order to describe the relationship between the spatial distribution patterns of immune cells and disease.
For example, a convolutional neural network was first used to identify tumor-infiltrating lymphocytes (TILs) and to segment tumor necrosis regions; an affine clustering algorithm was then used to model the spatial pattern of the infiltrating lymphocytes, and the corresponding clustering features were extracted to describe the spatial pattern of TILs, revealing the relationship between TIL patterns and immune subtypes, tumor types, immune cell fragments and patient survival. Other studies used Euclidean distance to measure cell distribution density in order to quantify the spatial relationship between the cancer and microenvironment components and to explore its clinical significance. These studies show that image analysis can go beyond cell counts in a sample and support spatial analysis of the tumor microenvironment based on spatial distance. To make better use of high-level spatial distribution information, Kun Huang et al. modeled the topological space of deep learning features on a Delaunay triangulation graph: a stacked autoencoder network first learned high-level semantic features of the cells, K-means clustering then obtained the spatial patterns of the cell nuclei, and edge histogram statistics finally confirmed that the spatial topological features of the renal tumor microenvironment are significantly associated with survival. They also verified that, for survival prediction, topological features outperform clinical features and cell morphological features. Guanghua Xiao et al. constructed a regional spatial organization graph based on cell density statistics, used a deep convolutional network to identify cell types fully automatically, and finally computed two spatial distribution features to predict the survival of lung cancer patients.
In summary, the tumor microenvironment is highly complex and spatially heterogeneous, and existing methods that rely only on simple distance measurements, cell statistics or clustering cannot fully express it. Although recent studies have tried to build spatial organization relationships with graphs, spatial analysis of those graphs still depends on manually extracted features, such as the number of neighboring-node connections and edge histograms, and can only analyze the relationships among a few cell types. Because the tumor microenvironment contains many cell types, current spatial analysis methods struggle to perform fully automatic, comprehensive quantitative analysis of the spatial organization of multiple types of components simultaneously (including structures such as blood vessels and different cell types such as lymphocytes and stromal cells).
Summary of the Invention
The purpose of the present invention is to overcome the above defects of the prior art by providing a system and method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images, which is a new technical solution for multi-layer network topological clustering oriented to multiple cell types.
According to a first aspect of the present invention, a system for modeling the spatial relationship of the tumor microenvironment based on digital pathology images is provided. The system includes:
Image staining normalization module: configured to determine the pixel distribution type of a pathology image and, according to the overall distribution of its pixels, to perform color normalization on variations in the staining distribution to obtain a stain-normalized image;
Structural region segmentation module: configured to detect a region of interest in the stain-normalized image by using a weakly supervised deep learning model, and then to segment it to obtain a target structural region;
Cell detection module: configured to extract information on multiple types of cells from the obtained target structural region;
Spatial relationship construction module: configured to model a multi-layer network with a multi-layer graph to characterize the co-spatial distribution among multiple types of cells, and to perform cluster analysis on the multi-layer graph to obtain a quantitative spatial distribution model, wherein the quantitative spatial distribution model quantitatively characterizes the interaction between tumor cells and the tumor microenvironment, the multi-layer graph contains intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.
According to a second aspect of the present invention, a method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images is provided. The method includes the following steps:
Step S1: determining the pixel distribution type of a pathology image and, according to the overall distribution of its pixels, performing color normalization on variations in the staining distribution to obtain a stain-normalized image;
Step S2: detecting a region of interest in the stain-normalized image by using a weakly supervised deep learning model, and then segmenting it to obtain a target structural region;
Step S3: extracting information on multiple types of cells from the obtained target structural region;
Step S4: modeling a multi-layer network with a multi-layer graph to characterize the co-spatial distribution among multiple types of cells, and performing cluster analysis on the multi-layer graph to obtain a quantitative spatial distribution model, wherein the quantitative spatial distribution model quantitatively characterizes the interaction between tumor cells and the tumor microenvironment, the multi-layer graph contains intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.
Compared with the prior art, the advantage of the present invention is as follows. Because of the diversity and spatial heterogeneity of tumor cell types, multiple components of the tumor microenvironment are simultaneously and strongly spatially correlated with cancer cells. The present invention constructs a topological-space mathematical model of tumor cells and the multiple components of the tumor microenvironment, which can reveal the correlation between intratumoral heterogeneity and the spatial distribution patterns of cells and tissues in the tumor microenvironment, and provides a new quantitative approach to analyzing tumor evolution mechanisms.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is an architecture diagram of a system for modeling the spatial relationship of the tumor microenvironment based on digital pathology images according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application or its uses.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
As shown in Fig. 1, the provided system for modeling the spatial relationship of the tumor microenvironment based on digital pathology images includes an image staining normalization module, a structural region segmentation module, a cell detection module and a spatial relationship construction module. The staining normalization module resolves inconsistent color distributions across slides. The structural region segmentation module exploits the multi-scale imaging characteristics of pathology images and segments lesion regions and structures at high resolution through a regularizable weakly supervised learning method. The cell detection module detects and identifies the various cell types, which appear as clustered small targets. The spatial relationship construction module identifies immune cell types through an image registration algorithm and builds, through a multi-layer graph network, a topological spatial relationship model between the multiple types of structures and cells and the tumor cells, enabling quantitative analysis of the tumor microenvironment.
The function of each module and specific embodiments are described below.
(1) Image staining normalization module
The distribution of stained pixels in a pathology image usually follows a Gaussian distribution or a skew-normal distribution, and the applicable type can be determined with a self-supervised algorithm based on parameter estimation of the distribution model. The probability density function (PDF) of the multivariate skew-normal distribution is:
Figure PCTCN2022072760-appb-000001
where $\theta = (\mu^{T}, a^{T}, \ldots)^{T}$ is the unknown parameter vector, $a$ contains the elements of the upper triangular part of the matrix $\Sigma$,
Figure PCTCN2022072760-appb-000002
is the square root of a symmetric matrix,
Figure PCTCN2022072760-appb-000003
$\phi_d(\cdot;\mu,\Sigma)$ is the PDF of the $d$-variate Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and $\Phi(\cdot)$ is the cumulative distribution function of the standard univariate Gaussian distribution. The probability density function of the multivariate Gaussian mixture distribution is:
Figure PCTCN2022072760-appb-000004
where the parameters $\pi_d$ are called mixing coefficients,
Figure PCTCN2022072760-appb-000005
and $0 \le \pi_d \le 1$.
Figure PCTCN2022072760-appb-000006
is the prior probability of selecting the $k$-th distribution, and the density
Figure PCTCN2022072760-appb-000007
Figure PCTCN2022072760-appb-000008
is the probability of $x$ given the $k$-th distribution.
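For orientation only, the two densities named above are commonly written in the textbook forms below. The formula images are not reproduced in this text, so these forms are an assumption about their content; in particular the skewness symbol $\lambda$, the component index $k$ and the number of components $K$ are illustrative notation, not taken from the source.

```latex
% Multivariate skew-normal density (one common parameterization; assumed)
f(x;\theta) = 2\,\phi_d(x;\mu,\Sigma)\,
              \Phi\!\left(\lambda^{T}\Sigma^{-1/2}(x-\mu)\right)

% Gaussian mixture density (standard form; K components assumed)
p(x) = \sum_{k=1}^{K}\pi_k\,\mathcal{N}(x\mid\mu_k,\Sigma_k),
\qquad \sum_{k=1}^{K}\pi_k = 1,\quad 0\le\pi_k\le 1
```

In this reading, $\pi_k$ plays the role of the prior probability of the $k$-th component and $\mathcal{N}(x\mid\mu_k,\Sigma_k)$ the probability of $x$ given that component.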
In one embodiment, the Jarque-Bera test can be applied to the pixels of the pathology image to confirm the actual type of distribution model, for example by testing the skewness and kurtosis of the pixel data. When the parameters of the distribution model are then solved, a deep convolutional method can be used to estimate and update them, which yields the overall distribution of the pixels of the pathology image. Finally, the image to be analyzed is color-normalized through the staining distribution variation model.
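A minimal sketch of the Jarque-Bera check on pixel intensities, assuming a single stain channel flattened to one dimension. The decision rule `choose_model`, the 0.05 significance level and the synthetic samples are hypothetical choices for illustration; the source does not specify them. SciPy offers the same statistic as `scipy.stats.jarque_bera`; it is written out here with NumPy so that the skewness and kurtosis terms are visible.

```python
import numpy as np

def jarque_bera(pixels):
    """Jarque-Bera statistic and p-value for a 1-D sample of pixel values."""
    x = np.asarray(pixels, dtype=float).ravel()
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skewness = np.mean(d ** 3) / m2 ** 1.5
    kurtosis = np.mean(d ** 4) / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-jb / 2).
    p_value = float(np.exp(-jb / 2.0))
    return float(jb), p_value

def choose_model(pixels, alpha=0.05):
    """Hypothetical rule: keep the Gaussian model unless normality is rejected."""
    _, p_value = jarque_bera(pixels)
    return "gaussian" if p_value >= alpha else "skew-normal"

rng = np.random.default_rng(0)
symmetric = rng.normal(loc=0.6, scale=0.1, size=5000)   # synthetic, symmetric
skewed = rng.gamma(shape=2.0, scale=0.1, size=5000)     # synthetic, right-skewed
print(choose_model(skewed))
```

The test only selects the model family; estimating the parameters of the chosen distribution is a separate step.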
Differences between manufacturers' preservation solutions, staining reagents and slide preparation processes, as well as differences between digital scanners, cause significant color variation in pathology images. Staining normalization resolves the inconsistent color distribution caused by staining operations, staining conditions or the imaging equipment, and thereby improves the results of subsequent analysis and recognition.
(2) Structural region segmentation module
The structural region segmentation module takes multiple stain-normalized images and performs, in sequence, regularized encoding, regularized decoding, weakly supervised learning and region-of-interest detection to obtain a pre-segmentation map of the structural regions.
Specifically, a key issue in building a weakly supervised segmentation model is to extract enough key feature encodings from limited information to assist segmentation effectively. Features of the key regions in the data are constructed from image-level labels, and irrelevant redundant information and noise are removed. The original data are abstracted into two classes of data matrices: the target structure information forms a low-rank matrix, and the redundancy and noise form a sparse matrix. The two matrices are solved separately to obtain the feature information of the target structure data. This method is applied to the multi-scale pathology image segmentation model, and a regularizer is designed for the above loss for model training. For example, let the image be $I$ and its label be $Y$, and let $f_\theta(I)$ be the output of the segmentation network parameterized by $\theta$. The corresponding optimization problem, trained with a convolutional neural network under a joint regularized loss, is expressed as:
Figure PCTCN2022072760-appb-000009
where $l(S, Y)$ is the loss between the ground truth and the prediction, $R(S)$ is the regularization loss, $S = f_\theta(I) \in [0,1]^{|\Omega| \times K}$ is the $K$-channel softmax segmentation output of the network, and $\lambda$ and $\mu$ are preset weights of the corresponding terms.
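The structure of a data term plus a weighted regularizer can be sketched as follows. Because the formula image is not reproduced, both concrete terms are assumptions: the data term $l(S, Y)$ is taken as cross-entropy between an image-level label and the spatially averaged prediction, and $R(S)$ as a smoothness penalty between neighboring pixels. Only one regularization weight, `lam`, is shown, and `joint_loss` is a hypothetical name.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_loss(logits, label, lam=0.1):
    """logits: (H, W, K) raw network output; label: image-level class index."""
    S = softmax(logits, axis=-1)                 # S = f_theta(I), values in [0, 1]
    image_prob = S.mean(axis=(0, 1))             # pool pixel predictions to image level
    data_term = -np.log(image_prob[label] + 1e-12)           # l(S, Y), assumed form
    # R(S), assumed form: mean absolute difference between neighbouring pixels
    reg_term = np.abs(np.diff(S, axis=0)).mean() + np.abs(np.diff(S, axis=1)).mean()
    return data_term + lam * reg_term, data_term, reg_term

rng = np.random.default_rng(0)
total, data_term, reg_term = joint_loss(rng.normal(size=(8, 8, 3)), label=1)
```

In training, the same scalar would be minimized with respect to the network parameters $\theta$ by an automatic-differentiation framework; NumPy is used here only to make the terms explicit.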
To incorporate feature information from images at different magnifications into the learning process, the model input can fuse image information from multiple magnifications, so that cells at high magnification and tissue at medium and low magnification receive different attention. This takes both the specificity and the commonality of the data samples into account when learning the key features of the data. The diagnostic workflow of clinical pathologists is also simulated: image features at different magnifications are given different attention weights so that the data characteristics at each magnification are fully considered. For example, the corresponding optimization function with a multi-magnification regularized loss is:
Figure PCTCN2022072760-appb-000010
where $I_d$ denotes the image input at magnification $d$, and $f_{\theta,\eta}$ denotes the feature computation under parameters $\theta$ and attention weights $\eta$; $\eta$ is computed mainly by $\mathrm{Softmax}(f_\theta(I))$. These parameters are learned and optimized by a deep convolutional network, whose efficient feature extraction allows the model to learn the data prior well and quickly.
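The attention weighting across magnifications can be illustrated with a small sketch. It assumes each magnification has already been reduced to a feature vector of equal length and that the attention score of a magnification is the mean of its features passed through a softmax; the source states only that $\eta$ comes from a softmax of the network output, so this scoring rule and the name `fuse_magnifications` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_magnifications(features):
    """features: dict mapping magnification (e.g. 5, 10, 40) to a 1-D feature vector."""
    mags = sorted(features)
    F = np.stack([np.asarray(features[m], dtype=float) for m in mags])   # (D, C)
    eta = softmax(F.mean(axis=1))     # one attention weight per magnification (assumed scoring)
    fused = (eta[:, None] * F).sum(axis=0)
    return dict(zip(mags, eta)), fused

weights, fused = fuse_magnifications({5: [0.1, 0.2], 10: [0.4, 0.1], 40: [0.9, 0.7]})
```

The weights sum to one, so the fused vector is a convex combination of the per-magnification features.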
After the parameterized model has been trained, features at multiple scales are fused according to the recognized target category to obtain a weighted class activation map; post-processing of this map yields the target tissue and structural regions.
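A minimal sketch of a weighted class activation map fused over scales, assuming the usual construction in which the map is the classifier-weighted sum of the feature channels. The nearest-neighbor resizing, the per-scale weights and the 0.5 threshold used as post-processing are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def class_activation_map(feature_map, class_weights):
    """feature_map: (H, W, C); class_weights: (C,) classifier weights of the target class."""
    cam = np.maximum(feature_map @ class_weights, 0.0)   # (H, W), keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def resize_nearest(cam, out_shape):
    """Nearest-neighbour resize so maps from different scales can be combined."""
    rows = np.arange(out_shape[0]) * cam.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * cam.shape[1] // out_shape[1]
    return cam[np.ix_(rows, cols)]

def fuse_cams(cams, scale_weights, out_shape, threshold=0.5):
    """Weighted average of per-scale maps, followed by a simple threshold."""
    w = np.asarray(scale_weights, dtype=float)
    w = w / w.sum()
    fused = sum(wi * resize_nearest(c, out_shape) for wi, c in zip(w, cams))
    return fused, fused >= threshold      # weighted map and binary region mask
```

A call such as `fuse_cams([cam_5x, cam_20x], [0.4, 0.6], (512, 512))` would return the fused map and a candidate region mask; the variable names and weights in that call are placeholders.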
Weakly supervised learning can use ground-truth annotations that are easier to obtain in place of pixel-wise annotations, which reduces annotation cost and improves the efficiency of image segmentation. The structural region segmentation network can be of various types, such as AlexNet, VGG, GoogLeNet or ResNet, and the present invention places no limitation on this.
(3) Cell detection module
Because pathology images are very large, once the structural region of interest of the cancer has been obtained, together with the key features that convey the characteristics discriminating cancerous from non-cancerous tissue (the key discriminant matrix, which is computed from the probability distances between the matrices corresponding to the high-dimensional features of different image types, with the tumor region as the reference, so that cancer lies at a short distance and other tissue at a long distance), information on different cell types needs to be extracted from this target structural region. However, because the cells are numerous and each occupies only a small part of the image, accurate detection by the algorithm is a major challenge.
In one embodiment, based on the differences and correlations between cells and structures, a self-attention-based transformer network encodes and decodes the cells and structures in the image, and the key discriminant matrix is incorporated for fusion analysis, as follows:
First, the series of feature maps $X$ extracted by the convolutional network is converted into visual tokens $T$:
$$T = \mathrm{SOFTMAX}_{HW}(X W_A)^{T} X \qquad (5)$$
where
Figure PCTCN2022072760-appb-000011
$W_A$ is a learnable weight and
Figure PCTCN2022072760-appb-000012
$H$, $W$ and $C$ denote the respective dimensions of the feature map, $L$ denotes the number of visual tokens $T$, and $L \ll HW$.
After the visual tokens $T$ are obtained, a self-attention transformation models the dependencies among them; the result is projected back to the dimensions of a regular feature map and combined with the key discriminant matrix $G$ from the preceding stage:
$$X_{out} = X_{in} + \mathrm{SOFTMAX}_{L}\big((X_{in} W_Q)(T W_K)^{T}\big)\,T + G \qquad (6)$$
where $X_{in}$ denotes the image features obtained during tumor region detection on the multi-scale pathology image, $X_{out}$ denotes the final output of the cell detection module, and $W_Q$ and $W_K$ are learnable weight parameters. Once the relationships among the image features have been constructed, the model is trained on a large amount of data to recognize and localize different categories of cells and structures.
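Equations (5) and (6) can be traced with a short NumPy sketch. The tensor shapes are assumptions, since the formula images that define them are not reproduced: the feature map is taken as flattened to $X \in \mathbb{R}^{HW \times C}$, with $W_A \in \mathbb{R}^{C \times L}$, square $W_Q$ and $W_K$, and $G$ of the same shape as $X_{in}$. The function names `tokenize` and `project` are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tokenize(X, W_A):
    """Eq. (5). X: (HW, C) flattened feature map; W_A: (C, L). Returns T: (L, C)."""
    attention = softmax(X @ W_A, axis=0)      # normalise over the HW positions
    return attention.T @ X

def project(X_in, T, W_Q, W_K, G):
    """Eq. (6). W_Q, W_K: (C, C); G: (HW, C), same shape as X_in (assumed)."""
    scores = (X_in @ W_Q) @ (T @ W_K).T       # (HW, L)
    return X_in + softmax(scores, axis=1) @ T + G   # normalise over the L tokens

rng = np.random.default_rng(0)
HW, C, L = 16, 8, 4
X = rng.normal(size=(HW, C))
T = tokenize(X, rng.normal(size=(C, L)))
X_out = project(X, T, rng.normal(size=(C, C)), rng.normal(size=(C, C)), np.zeros((HW, C)))
```

With $L \ll HW$, the attention in `project` is computed against a handful of tokens rather than against every pixel, which is what keeps the step affordable on large pathology tiles.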
(4) Spatial relationship construction module
The spatial relationship construction module performs, in sequence, image tiling, image feature encoding, low-magnification rigid registration, high-magnification non-rigid registration, multi-layer graph network construction, graph embedding dimensionality reduction, acquisition of point cloud distribution data, persistent homology modeling and feature cluster analysis, and finally obtains the quantitative spatial distribution model. The following focuses on multi-layer graph network construction and feature cluster analysis.
Specifically, to analyze the co-spatial distribution among multiple cell types and thereby quantitatively characterize the interaction between tumor cells and the tumor microenvironment, one embodiment models a multi-layer network with a multi-layer graph. A multi-layer graph is a collection of weighted single-layer graph adjacency matrices that contains intra-layer relationships and inter-layer interactions. The implementation comprises a multi-layer network built from the multi-layer graph and a clustering computation on that network, which together construct the spatial distribution model between tumor cells and the multiple components of the tumor microenvironment.
1) Modeling the high-order spatial relationships of the multi-layer network
A single-layer graph network is defined as $G = (V, E, \omega)$, where $V$ is the set of nodes and
Figure PCTCN2022072760-appb-000013
is the set of edges. The total number of nodes in graph $G$ is $n = |V|$. $\omega$:
Figure PCTCN2022072760-appb-000014
is the edge weight function, with the weight of edge $e_{uv} \in E$ written $\omega_{uv}$. The adjacency matrix $A$ is symmetric, that is, $A_{ij} = A_{ji}$, and indicates whether each pair of nodes is connected; it thus carries the information of the nodes for the different cell types.
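A minimal sketch of one single-layer graph, assuming the nodes are cell centroids of a single cell type and the weights are derived from Euclidean distance. The connection radius and the exponential weighting kernel are hypothetical choices, as is the function name `intra_layer_adjacency`.

```python
import numpy as np

def intra_layer_adjacency(coords, radius=60.0):
    """coords: (n, 2) cell centroids of one cell type, in pixels."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.where(dist <= radius, np.exp(-dist / radius), 0.0)   # w_uv, assumed kernel
    np.fill_diagonal(A, 0.0)                                    # no self-loops
    return A

A = intra_layer_adjacency([(0, 0), (30, 40), (200, 0)])
```

Because distance is symmetric, the resulting matrix satisfies $A_{ij} = A_{ji}$ by construction; a zero entry means the two cells are not connected.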
Based on the definition of the single-layer graph, a multi-layer network
Figure PCTCN2022072760-appb-000015
can be constructed from $m$ non-overlapping layers, each modeled by a weighted graph $G_i$ with adjacency matrix $A_i$, $i = 1, \ldots, m$. The elements of the set $A = \{A_1, A_2, \ldots, A_m\}$ are called intra-layer matrices and represent connections within a single layer. To model the link between two graphs $G_k$ and $G_l$, with adjacency matrices $A_k$ and $A_l$ ($k, l = 1, 2, \ldots, m$; $k \ne l$), one-to-one symmetric connections are defined between the nodes of the two related graphs. In this way a set of cross-layer adjacency matrices $C_p = \{A_{l,k}, k \ne l\}$ is obtained, representing the edges between nodes of different layers, where $p$ is the number of linked graphs.
In summary, a multi-layer network
Figure PCTCN2022072760-appb-000016
has a set of inter-layer connections
Figure PCTCN2022072760-appb-000017
that link nodes across layers. For an edge
Figure PCTCN2022072760-appb-000018
we have u ∈ V(G_k) and v ∈ V(G_l), with k ≠ l. The supra-adjacency matrix of the multi-layer network
Figure PCTCN2022072760-appb-000019
so defined has a block matrix structure:
Figure PCTCN2022072760-appb-000020
The diagonal blocks are the intra-layer matrices of the set A, and the off-diagonal blocks A_kl (k, l = 1, 2, …, m; k ≠ l) represent the inter-layer connections that link nodes in layer G_k to nodes in layer G_l. In one embodiment, nodes in the same layer represent cells of the same type, and connections between different layers represent the spatial connection relationships between different types of cells or structures. Taking vascular structures and tumor cells as an example, the cell-structure inter-layer relationship can be established from spatial distance, which gives the values of the off-diagonal blocks A_kl: tumor or immune cells close to blood vessels are strongly connected to the structure layer, and distant ones are weakly connected. The diagonal blocks, the intra-layer matrices, are likewise obtained from the Euclidean distance between cells.
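As a rough sketch of the block structure described above, the following code assembles a supra-adjacency matrix from two layers of coordinates. The patent only states that weights are derived from spatial distance; the Gaussian kernel, the bandwidth `sigma`, the dense all-pairs inter-layer blocks, and the names `proximity`, `block`, `supra_adjacency`, `tumor_cells` and `vessel_nodes` are all assumptions made for this example.

```python
import math

def proximity(p, q, sigma):
    """Assumed weighting: Gaussian kernel of the Euclidean distance,
    so nearby pairs get weights near 1 and distant pairs near 0."""
    return math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2))

def block(rows, cols, sigma, same_layer):
    """Weight block between two node sets; within a layer the
    diagonal (self-loops) is set to zero."""
    return [[0.0 if (same_layer and i == j) else proximity(p, q, sigma)
             for j, q in enumerate(cols)]
            for i, p in enumerate(rows)]

def supra_adjacency(layers, sigma):
    """Assemble the block matrix whose diagonal blocks are intra-layer
    matrices A_k and whose off-diagonal blocks are inter-layer matrices A_kl."""
    offsets, total = [], 0
    for pts in layers:
        offsets.append(total)
        total += len(pts)
    S = [[0.0] * total for _ in range(total)]
    for k, rows in enumerate(layers):
        for l, cols in enumerate(layers):
            B = block(rows, cols, sigma, same_layer=(k == l))
            for i, row in enumerate(B):
                for j, w in enumerate(row):
                    S[offsets[k] + i][offsets[l] + j] = w
    return S

# Hypothetical coordinates: one tumor-cell layer and one vessel-structure layer.
tumor_cells = [(0.0, 0.0), (1.0, 0.0)]
vessel_nodes = [(0.0, 1.0), (9.0, 9.0)]
S = supra_adjacency([tumor_cells, vessel_nodes], sigma=2.0)
```

In this toy input the first tumor cell receives a much larger inter-layer weight to the nearby vessel node than to the distant one, which mirrors the strong and weak connections described in the text.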
After the multi-layer graph network is built, extracting meaningful information from such a complex network requires substantial computation and memory. To address both costs, node embedding is used to map the network into a low-dimensional space while preserving its structural information; for example, graph embedding is used for dimensionality reduction.
2) Multi-layer network topology analysis method based on persistence graph clustering
To infer conclusions about tumor evolution from the node embeddings of the multiple cell types of the tumor microenvironment and of cancer cells, the node embeddings must be clustered. Forming clusters based on shape dynamics helps to discover persistent node clusters with similar patterns. In one embodiment, the concept of topological data analysis (TDA) is introduced into the topological analysis of complex multi-layer networks.
Consider a weighted graph G. If a threshold ε_j > 0 is chosen and only the edges whose weights satisfy ω_uv ≤ ε_j are kept, a graph G_j is obtained whose adjacency matrix is
Figure PCTCN2022072760-appb-000021
Varying the threshold over ε_1 < ε_2 < … < ε_n yields a hierarchical nested sequence of graphs
Figure PCTCN2022072760-appb-000022
which is called a network filtration. Taking the widely used Vietoris–Rips (VR) simplicial complex as an example, the VR complex at threshold ε_j is defined as
Figure PCTCN2022072760-appb-000023
for all u, v ∈ σ}. With the network filtration, changes in the network topology are evaluated to detect features that persist over a wide range of thresholds ε_j. Such persistent features characterize the intrinsic spatial organization and distribution. This persistence graph clustering algorithm can obtain more accurate clustering results.
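The thresholding idea above can be sketched in a few lines. The example below covers only the simplest case, connected components (0-dimensional features) tracked with a union-find structure while the threshold grows; it does not build the full Vietoris–Rips complex, for which a dedicated TDA library would normally be used. The function names and the toy edge list are hypothetical.

```python
def graph_at(weighted_edges, eps):
    """Edge set of G_j: keep only edges whose weight is at most eps."""
    return {(u, v) for w, u, v in weighted_edges if w <= eps}

def component_deaths(n, weighted_edges):
    """Sweep the threshold upward over the edge weights and record each
    threshold at which two connected components merge. Large gaps between
    consecutive values indicate components that persist over a wide range."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, u, v in sorted(weighted_edges):  # increasing threshold
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            deaths.append(w)
    return deaths

# Toy weighted graph on 4 nodes, edges given as (weight, u, v).
edges = [(1.0, 0, 1), (1.5, 1, 2), (4.0, 2, 3), (1.2, 0, 2)]
deaths = component_deaths(4, edges)
```

Here nodes 0, 1 and 2 merge at small thresholds, while node 3 stays separate until the threshold reaches 4.0; that long-lived separation is the kind of persistent feature the text refers to.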
In summary, most current clustering methods for multi-layer networks embed the graph into Euclidean space through spectral decomposition of the graph and do not explicitly consider local graph geometry and topology. The multi-layer network clustering method adopted in the embodiment of the present invention instead starts from the similarity of data shape recorded at multiple resolutions and clusters the multi-layer network without supervision. To quantify the shape dynamics of the multi-layer network at evolving similarity scales, the multi-lens tools of TDA are introduced into the clustering calculation. The core idea is that if the local neighborhoods of two points have similar shapes at all resolution scales, the two points are close enough to be grouped into one cluster. Persistence graph clustering therefore uses both the distance function and the local spatial information around each point, and can obtain more accurate clustering results for multi-layer graph networks.
Correspondingly, the present invention also provides a method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images, which implements the functions of the modules of the above system. For example, the method includes: step S110, determining the pixel distribution type of the pathological image and performing color standardization on staining distribution variations according to the overall distribution of the pixels of the pathological image, to obtain a stain-standardized image; step S120, for the stain-standardized image, detecting regions of interest with a weakly supervised deep learning model and then segmenting them to obtain the target structure regions; step S130, extracting information on multiple types of cells from the obtained target structure regions; step S140, modeling a multi-layer network with a multi-layer graph to characterize the co-spatial distribution of the multiple types of cells, and performing cluster analysis on the multi-layer graph to obtain a spatial distribution quantitative model. The spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment. The multi-layer graph includes intra-layer relationships and inter-layer interactions; nodes in the same layer represent cells of the same type, and connections between different layers represent the spatial connection relationships between different types of cells or structures.
In summary, compared with the prior art, the present invention has at least the following technical effects:
1) A fast computation method for multi-scale pathological images is designed, based on weakly supervised encoder-decoder learning with learnable regularization constraints. It addresses the computational difficulty of single pathological images exceeding one billion pixels and the incomplete use of information across magnification scales. By combining weak supervision with deep learning, it does not depend on large-scale data annotation, makes full use of cross-scale information, and achieves fast detection of lesion regions of interest and accurate localization of cell nuclei in digital panoramic pathological images.
2) Taking into account the differences and correlations among cells and structures, a self-attention transformer network is used to encode and decode each cell and structure, enabling fast detection and accurate identification of clustered, multi-type, small target cells.
3) A topological space modeling method for the tumor microenvironment based on persistence graph clustering further enables quantitative calculation of pathological diagnostic indicators. Conventional distance-based or statistical methods struggle to analyze the spatial expression of a complex tumor microenvironment. The present invention introduces the concept of topological data analysis into the clustering calculation of complex multi-layer networks and proposes a topological space modeling method based on persistence graph clustering. The method reveals the association between intra-tumor heterogeneity and the spatial distribution patterns of cells and tissues in the tumor microenvironment, and provides a new approach to the quantitative analysis of tumor evolution mechanisms.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or raised structures in a groove with instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, fiber-optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present invention.
Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or acts, or by a combination of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

  1. A tumor microenvironment spatial relationship modeling system based on digital pathological images, comprising:
    an image staining standardization module, configured to determine the pixel distribution type of a pathological image and to perform color standardization on staining distribution variations according to the overall distribution of the pixels of the pathological image, to obtain a stain-standardized image;
    a structure region segmentation module, configured to detect, for the stain-standardized image, regions of interest with a weakly supervised deep learning model and then segment them to obtain target structure regions;
    a cell detection module, configured to extract information on multiple types of cells from the obtained target structure regions;
    a spatial relationship construction module, configured to model a multi-layer network with a multi-layer graph to characterize the co-spatial distribution of the multiple types of cells, and to perform cluster analysis on the multi-layer graph to obtain a spatial distribution quantitative model, wherein the spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment, the multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent the spatial connection relationships between different types of cells or structures.
  2. A method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images, comprising the following steps:
    step S1: determining the pixel distribution type of a pathological image, and performing color standardization on staining distribution variations according to the overall distribution of the pixels of the pathological image, to obtain a stain-standardized image;
    step S2: for the stain-standardized image, detecting regions of interest with a weakly supervised deep learning model, and then segmenting them to obtain target structure regions;
    step S3: extracting information on multiple types of cells from the obtained target structure regions;
    step S4: modeling a multi-layer network with a multi-layer graph to characterize the co-spatial distribution of the multiple types of cells, and performing cluster analysis on the multi-layer graph to obtain a spatial distribution quantitative model, wherein the spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment, the multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent the spatial connection relationships between different types of cells or structures.
  3. The method according to claim 2, wherein the input of the weakly supervised deep learning model fuses image information from multiple magnifications, and the training process uses a multi-magnification regularization loss as the optimization objective, expressed as:
    Figure PCTCN2022072760-appb-100001
    where I_d denotes the image input at magnification d; f_θ,η denotes the feature computation with parameters θ and attention weight η, η being computed from Softmax(f_θ(I));
    Figure PCTCN2022072760-appb-100002
    is the loss between the ground truth and the predicted value; R(S) is the regularization loss, with S = f_θ(I) ∈ [0,1]^(|Ω|×K); K denotes the number of channels; λ and μ are preset weight parameters; I denotes the image; and Y denotes the label corresponding to the image.
  4. The method according to claim 2, wherein step S3 comprises the following sub-steps:
    converting the extracted feature map X of the target structure region into visual tokens T;
    modeling the dependencies among the tokens T with a self-attention transformation and projecting the result back to the dimensions of the ordinary feature map, expressed as:
    X_out = X_in + SOFTMAX_L((X_in W_Q)(T W_K)^T) T + G
    where G is the key discriminant matrix, W_Q and W_K are weight parameters, X_in denotes the image features obtained during tumor region detection in the multi-scale pathological image, and X_out denotes the output;
    identifying and locating different types of cells through data learning, based on the constructed relationships among the image features.
  5. The method according to claim 2, wherein the multi-layer graph comprises m non-overlapping layers, each modeled by a weighted graph G_i with adjacency matrix A_i, i = 1, …, m; the elements of the set A = {A_1, A_2, …, A_m} are called intra-layer matrices and represent intra-layer connections; to model the connection between two graphs G_k and G_l, with adjacency matrices A_k and A_l respectively, a one-to-one symmetric interconnection between the nodes of the two related graphs is used; the set of cross-layer adjacency matrices C_p = {A_l,k, k ≠ l} represents the edges between nodes of different layers, and p denotes the number of connection graphs, where k, l = 1, 2, …, m and k ≠ l.
  6. The method according to claim 5, wherein a multi-layer network
    Figure PCTCN2022072760-appb-100003
    constructed from the multi-layer graph has a set of inter-layer connections
    Figure PCTCN2022072760-appb-100004
    that link nodes across layers; for an edge
    Figure PCTCN2022072760-appb-100005
    u ∈ V(G_k) and v ∈ V(G_l), with k ≠ l; and the supra-adjacency matrix of the multi-layer network
    Figure PCTCN2022072760-appb-100006
    has a block matrix structure expressed as:
    Figure PCTCN2022072760-appb-100007
    where the diagonal blocks are the intra-layer matrices of the set A, and the off-diagonal blocks A_kl (k, l = 1, 2, …, m; k ≠ l) represent the inter-layer connections that link nodes in layer G_k to nodes in layer G_l.
  7. The method according to claim 6, wherein, for the inter-layer connections, the values of the off-diagonal blocks A_kl are obtained from spatial distance, and for the intra-layer matrices, the values of the diagonal blocks are obtained from the Euclidean distance between cells.
  8. The method according to claim 5, further comprising reducing the dimensionality of the multi-layer network by graph embedding, and clustering the dimension-reduced multi-layer network according to the shape similarity of the local neighborhoods of two points at all resolution scales.
  9. The method according to claim 2, wherein the pixel distribution type of the pathological image is determined using the Jarque-Bera test.
  10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 2 to 9.
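The Jarque-Bera test named in claim 9 checks whether a sample is consistent with a normal distribution by looking at its skewness and kurtosis. The sketch below is a generic, standard-library version of that statistic applied to a flat list of pixel intensities; how the patent's system selects pixels or acts on the result is not specified here, and the function name and sample values are hypothetical.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and
    K the sample kurtosis. Under normality JB is asymptotically chi-square
    with 2 degrees of freedom, whose survival function is exp(-x / 2).
    The sample must not be constant (variance would be zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Hypothetical grayscale intensities from one image channel.
pixels = [118, 121, 125, 119, 130, 127, 122, 124, 120, 126]
jb_stat, p = jarque_bera(pixels)
```

A small p-value suggests the intensities are not normally distributed. The p-value is an asymptotic approximation and is only rough for small samples.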
PCT/CN2022/072760 2022-01-19 2022-01-19 Tumor microenvironment spatial relationship modeling system and method based on digital pathology image WO2023137627A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/072760 WO2023137627A1 (en) 2022-01-19 2022-01-19 Tumor microenvironment spatial relationship modeling system and method based on digital pathology image

Publications (1)

Publication Number Publication Date
WO2023137627A1 true WO2023137627A1 (en) 2023-07-27

Family

ID=87347642

Country Status (1)

Country Link
WO (1) WO2023137627A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115117A (en) * 2023-08-31 2023-11-24 南京诺源医疗器械有限公司 Pathological image recognition method based on small sample, electronic equipment and storage medium
CN117423476A (en) * 2023-12-18 2024-01-19 中国科学院地理科学与资源研究所 Echinococcosis epidemic rate prediction method based on downscaling and Bayesian model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550651A (en) * 2015-12-14 2016-05-04 中国科学院深圳先进技术研究院 Method and system for automatically analyzing panoramic image of digital pathological section
US20180204085A1 (en) * 2015-06-11 2018-07-19 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Systems and methods for finding regions of interest in hematoxylin and eosin (h&e) stained tissue images and quantifying intratumor cellular spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue images
CN111417958A (en) * 2017-12-07 2020-07-14 文塔纳医疗系统公司 Deep learning system and method for joint cell and region classification in biological images
CN113591919A (en) * 2021-06-29 2021-11-02 复旦大学附属中山医院 AI-based analysis method and system for prognosis of postoperative recurrence of early hepatocellular carcinoma
CN113674252A (en) * 2021-08-25 2021-11-19 上海鹏冠生物医药科技有限公司 Histopathology image diagnosis system based on graph neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115117A (en) * 2023-08-31 2023-11-24 南京诺源医疗器械有限公司 Pathological image recognition method based on small sample, electronic equipment and storage medium
CN117115117B (en) * 2023-08-31 2024-02-09 南京诺源医疗器械有限公司 Pathological image recognition method based on small sample, electronic equipment and storage medium
CN117423476A (en) * 2023-12-18 2024-01-19 中国科学院地理科学与资源研究所 Echinococcosis epidemic rate prediction method based on downscaling and Bayesian model
CN117423476B (en) * 2023-12-18 2024-03-08 中国科学院地理科学与资源研究所 Echinococcosis epidemic rate prediction method based on downscaling and Bayesian model

Similar Documents

Publication Title
Wang et al. Uncertainty estimation for stereo matching based on evidential deep learning
Chen et al. Adapting grad-cam for embedding networks
WO2023137627A1 (en) Tumor microenvironment spatial relationship modeling system and method based on digital pathology image
Pan et al. Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
Zheng et al. Application of transfer learning and ensemble learning in image-level classification for breast histopathology
Abdelsamea et al. A survey on artificial intelligence in histopathology image analysis
Li et al. A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification
Habtemariam et al. Cervix type and cervical cancer classification system using deep learning techniques
Mahanta et al. IHC-Net: A fully convolutional neural network for automated nuclear segmentation and ensemble classification for Allred scoring in breast pathology
CN112529005A (en) Target detection method based on semantic feature consistency supervision pyramid network
Wang et al. Learning to find reliable correspondences with local neighborhood consensus
Fang et al. Annotation-efficient COVID-19 pneumonia lesion segmentation using error-aware unified semisupervised and active learning
Pati et al. Weakly supervised joint whole-slide segmentation and classification in prostate cancer
CN116912240B (en) Mutation TP53 immunology detection method based on semi-supervised learning
CN111476802B (en) Medical image segmentation and tumor detection method, equipment and readable storage medium
CN114565919A (en) Tumor microenvironment spatial relationship modeling system and method based on digital pathological image
Zamanitajeddin et al. Cells are actors: Social network analysis with classical ml for sota histology image classification
Pan et al. A review of machine learning approaches, challenges and prospects for computational tumor pathology
Kalyani et al. Deep learning-based detection and classification of adenocarcinoma cell nuclei
Marini et al. Semi-supervised learning with a teacher-student paradigm for histopathology classification: a resource to face data heterogeneity and lack of local annotations
Lu et al. Prediction of breast cancer metastasis by deep learning pathology
Tang et al. Salient object detection via two-stage absorbing Markov chain based on background and foreground
Benedek An embedded marked point process framework for three-level object population analysis
Wu et al. A soft-computing based approach to overlapped cells analysis in histopathology images with genetic algorithm
Romo-Bucheli et al. Nuclei graph local features for basal cell carcinoma classification in whole slide images

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22921074

Country of ref document: EP

Kind code of ref document: A1