WO2023137627A1 - Tumor microenvironment spatial relationship modeling system and method based on digital pathology image

Info

Publication number: WO2023137627A1
Application number: PCT/CN2022/072760
Authority: WO
Inventors: 秦文健; 刁颂辉; 何佳慧; 侯嘉馨
Original assignee: 深圳先进技术研究院
Priority date: 2022-01-19
Filing date: 2022-01-19
Publication date: 2023-07-27

Abstract

Disclosed in the present invention are a tumor microenvironment spatial relationship modeling system and method based on a digital pathology image. The system comprises an image dyeing normalization module, configured to determine a pixel distribution type of a pathology image to perform color normalization on the dyeing distribution change to obtain a dyeing normalization image; a structure region segmentation module, configured to detect a region of interest by using a weakly supervised deep learning model for the dyeing normalization image, and then segmenting to obtain a target structure region; a cell detection module, configured to extract information of various types of cells from the target structure region; and a spatial relationship construction module, configured to model a multi-layer network by using a multi-layer graph, to represent a co-spatial distribution among the various types of cells, and perform clustering analysis on the multi-layer graph to obtain a spatial distribution quantitative model. The present invention can accurately reveal the correlation between the intratumoral heterogeneity and the tumor microenvironment cell and tissue spatial distribution rule, thereby providing a new quantitative analysis idea for a tumor evolution mechanism.

Description

System and method for modeling spatial relationship of tumor microenvironment based on digital pathology images

Technical field

The present invention relates to the technical field of medical image processing, and more specifically, to a system and method for modeling the spatial relationship of tumor microenvironment based on digital pathological images.

Background

Tumor tissue is a complex structure in which cancer cells and the surrounding non-cancer cells (such as stromal cells and lymphocytes) together form the tumor microenvironment, and its spatial heterogeneity is highly complex. Weakly supervised learning algorithms can identify and locate cancer cells, lymphocytes, stromal cells, and other cell types (such as macrophages, T cells, or non-differentiating cells) in digital pathology images, but simple distance measurements, cell density statistics, or clustering methods cannot fully express this heterogeneity. In addition, because the tumor microenvironment contains many cell types, the cell spatial organization relationships currently constructed with graph neural networks cannot support automatic, comprehensive, and simultaneous quantitative analysis of the spatial organization of multiple cell types. New methods for topological clustering of multi-layer networks covering multiple cell types are therefore needed.

The tumor microenvironment controls the formation, development, metastasis, and drug resistance of solid tumors. It results from the interaction between tumor cells and non-tumor cells and tissues, such as stromal cells and immune cells, which generate anti-tumor immune responses. Strong clinical and experimental evidence supports the importance of the tumor microenvironment in cancer progression and in mediating drug resistance. However, the relationship of complex anatomy and the local microenvironment to metabolic and immune responses remains to be explored in depth. It is difficult for pathologists to capture the interaction between a tumor and its microenvironment through conventional qualitative or semi-quantitative visual inspection. Using computational analysis of digital pathological images to decipher the characteristics of the tumor microenvironment, especially the spatial heterogeneity within the tumor, therefore not only offers a new approach to tumor microenvironment analysis but, more importantly, can uncover potential biomarkers related to cancer treatment, so that the most appropriate precision treatment plan can be designed for each patient.

With the development of whole-slide digital pathology imaging and deep-learning-based pathological image processing algorithms, computational pathology can not only help pathologists examine patients' histological data in a high-throughput, quantitative, and objective manner, but can also use the various types of cells obtained by automatic detection algorithms to construct a spatial relationship map of the cells in a tumor and, combined with spatial analysis methods, assess tumor treatment response and prognosis accurately. Early work on spatial analysis of the tumor microenvironment based on digital pathology typically applied clustering algorithms to the spatial positions and morphometric measurements of cell features extracted from digital pathology images, in order to describe the relationship between the spatial distribution pattern of immune cells and disease. For example, one study first used a convolutional neural network to identify tumor-infiltrating lymphocytes (TIL) and segment tumor necrosis regions, then modeled the spatial pattern of the infiltrating lymphocytes with an affinity propagation clustering algorithm, and finally extracted the corresponding clustering features to describe the spatial pattern of TIL, revealing the relationship between TIL patterns and immune subtypes, tumor types, immune cell fractions, and patient survival. Other studies use Euclidean distance to measure cell distribution density, quantitatively represent the spatial relationship between cancer and microenvironment components, and explore its clinical significance. These studies demonstrate that image analysis can go beyond cell counts in a sample and provide distance-based spatial analysis of the tumor microenvironment. To make better use of high-level spatial distribution information, Kun Huang et al. modeled the topological space of deep learning features on Delaunay triangulation graphs. First, stacked autoencoder networks were used to learn high-level semantic features of cells, and then K-means clustering was used to obtain the spatial pattern of cell nuclei. Finally, edge histogram statistics confirmed that the spatial topological features of the renal tumor microenvironment were significantly correlated with survival. They also verified that topological features outperform clinical features and cell morphological features in survival prediction. Guanghua Xiao et al. constructed a regional spatial organization map based on cell density statistics, used a deep convolutional network to identify cell types automatically, and finally calculated two spatial distribution features to predict the survival of lung cancer patients.

To sum up, the tumor microenvironment is very complex and spatially heterogeneous, and simple distance measurements, cell statistics, or clustering methods cannot fully express it. Although recent research attempts to use graphs to construct spatial organization relationships, the spatial analysis of these graphs still relies on manually extracted features, such as the number of adjacent node connections and edge histograms, and can only analyze simple relationships between cells. Because the tumor microenvironment contains many cell types, current spatial analysis methods struggle to perform fully automatic and comprehensive quantitative analysis of the spatial organization of several types of components at the same time (including structures such as blood vessels and different types of cells such as lymphocytes and stromal cells).

Summary of the invention

The purpose of the present invention is to overcome the defects of the above-mentioned prior art, and provide a tumor microenvironment spatial relationship modeling system and method based on digital pathological images, which is a new technical solution for multi-layer network topology clustering of multi-cell types.

According to the first aspect of the present invention, a tumor microenvironment spatial relationship modeling system based on digital pathological images is provided. The system includes:

Image staining standardization module: used to determine the pixel distribution type of the pathological image, perform color standardization on the change of staining distribution according to the overall distribution of each pixel of the pathological image, and obtain a staining standardized image;

Structural region segmentation module: used to detect the region of interest in the staining standardized image with a weakly supervised deep learning model, and then segment it to obtain the target structural region;

Cell detection module: used to extract various types of cell information from the obtained target structure region;

Spatial relationship building module: it is used to model the multi-layer network by using a multi-layer graph to characterize the co-space distribution among various types of cells, and perform cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution, wherein the quantitative model of spatial distribution is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment.

According to the second aspect of the present invention, a method for modeling the spatial relationship of tumor microenvironment based on digital pathological images is provided. The method includes the following steps:

Step S1: Determine the pixel distribution type of the pathological image, perform color standardization on the change of the staining distribution according to the overall distribution of each pixel in the pathological image, and obtain a staining standardized image;

Step S2: For the stained and standardized image, use a weakly supervised deep learning model to detect the region of interest, and then segment to obtain the target structure region;

Step S3: extracting various types of cell information from the obtained target structure region;

Step S4: use a multi-layer graph to model a multi-layer network to characterize the co-space distribution among various types of cells, and perform cluster analysis on the multi-layer graph to obtain a spatial distribution quantitative model, wherein the spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment, the multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.

Compared with the prior art, the present invention has the following advantage. Because tumor cell types are rich and spatially heterogeneous, multiple components of the tumor microenvironment are strongly spatially correlated with cancer cells at the same time. The present invention provides a mathematical model that constructs a topological space of tumor cells and multiple components of the tumor microenvironment, which can reveal the correlation between intra-tumor heterogeneity and the spatial distribution of cells and tissues in the tumor microenvironment, and provides a new idea for quantitative analysis of tumor evolution mechanisms.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

Fig. 1 is an architecture diagram of a tumor microenvironment spatial relationship modeling system based on digital pathological images according to an embodiment of the present invention;

Fig. 2 is a flow chart of a method for modeling the spatial relationship of tumor microenvironment based on digital pathological images according to an embodiment of the present invention.

Detailed description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.

In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Therefore, other instances of the exemplary embodiment may have different values.

It should be noted that like numerals and letters denote like items in the following figures, therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.

Referring to Figure 1, the provided tumor microenvironment spatial relationship modeling system based on digital pathological images includes an image staining standardization module, a structural region segmentation module, a cell detection module, and a spatial relationship building module. The staining standardization module addresses the inconsistent color distribution of different slides. The structural region segmentation module exploits the multi-scale imaging characteristics of pathological images and segments lesion regions and structures at high resolution through regularized weakly supervised learning. The cell detection module detects and identifies multiple types of clustered small-target cells. The spatial relationship building module identifies immune cell types through image registration algorithms and builds a structure-topology spatial relationship model between multiple cell types and tumor cells through a multi-layer graph network, achieving quantitative analysis of the tumor microenvironment.

The functions and specific embodiments of each module will be introduced below.

(1) Image staining standardization module

The distribution of stained pixels in pathological images usually conforms to a Gaussian distribution or a skew-normal distribution, and the applicable type can be determined by a self-supervised algorithm based on parameter estimation of the distribution model. The probability density function (PDF) of the multivariate skew-normal distribution is:

$$f(x \mid \theta) = 2\,\phi_d(x; \mu, \Sigma)\,\Phi\!\left(\lambda^{T} \Sigma^{-1/2} (x - \mu)\right)$$

In the formula, $\theta = (\mu^{T}, a^{T}, \lambda^{T})^{T}$ is the unknown parameter vector, $a$ collects the elements of the upper triangular part of the matrix $\Sigma$, $\Sigma^{-1/2}$ is the square root of the symmetric matrix $\Sigma^{-1}$, $\phi_d(\cdot; \mu, \Sigma)$ is the PDF of a $d$-variate Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and $\Phi(\cdot)$ is the cumulative distribution function of the standard univariate Gaussian distribution. The probability density function of the multivariate Gaussian mixture distribution is:

$$p(x) = \sum_{k} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Here the parameters $\pi_k$ are called the mixing coefficients, with $\sum_{k} \pi_k = 1$ and $0 \le \pi_k \le 1$. $p(k) = \pi_k$ is the prior probability of selecting the $k$-th distribution, and the density $\mathcal{N}(x \mid \mu_k, \Sigma_k) = p(x \mid k)$ is the probability of $x$ given the $k$-th distribution.

In one embodiment, the Jarque-Bera test can be used to confirm the actual distribution model type of the pixels of the pathological image, for example by testing the skewness and kurtosis of the pixel data. When the parameters of the distribution model are then solved, a deep convolutional network can be used to estimate and update them, so as to obtain the overall distribution of the pixels of the pathological image. Finally, color standardization of the image to be analyzed is realized through the staining distribution change model.
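The distribution-type check described above can be sketched with a plain Jarque-Bera statistic. This is a minimal illustration, not the patented procedure: the function names, the 0.05 significance level, the fallback label "skew-normal", and the two synthetic samples standing in for pixel intensities are all assumptions.

```python
import math
from statistics import NormalDist

def jarque_bera(values):
    """Return (JB statistic, p-value, skewness, kurtosis) for a 1-D sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (biased estimators, as in the classical JB statistic).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value, skew, kurt

def pick_distribution(values, alpha=0.05):
    """Hypothetical decision rule: keep 'gaussian' unless normality is rejected."""
    _, p_value, _, _ = jarque_bera(values)
    return "gaussian" if p_value >= alpha else "skew-normal"

# Synthetic stand-ins for pixel intensities (assumed data, not from the source).
gaussian_like = [NormalDist().inv_cdf((i + 0.5) / 500) for i in range(500)]
skewed = [float(x * x) for x in range(1000)]

print(pick_distribution(gaussian_like), pick_distribution(skewed))
```

In practice the test would be run per stain channel on sampled pixels, and the selected model type would then determine which density is fitted.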

Preservation solutions, staining agents, and slide preparation processes differ between manufacturers, and different digital scanners cause significant changes in the color of pathological images. Staining standardization addresses the inconsistent color distribution of pathological images caused by staining operations, staining conditions, or imaging equipment, thereby improving the results of subsequent analysis and identification.

(2) Structural Region Segmentation Module

The structural region segmentation module takes the staining standardized image and sequentially performs regularized encoding, regularized decoding, weakly supervised learning, and region-of-interest detection to obtain a segmentation map of the retained structural regions.

Specifically, one of the key issues in building a weakly supervised segmentation model is to extract enough key feature codes from limited information to assist segmentation effectively. Based on the image-level labels, the features of the key areas in the data are constructed, irrelevant redundant information and noise are removed, and the original data is abstracted into two data matrices: the target structure information forms a low-rank matrix, and the redundant and noise information forms a sparse matrix. The two matrices are solved separately to obtain the characteristic information of the target structure data. This approach is applied to the multi-scale pathological image segmentation model, and a regularizer is designed for training the model with the resulting losses. For example, let an image be I and its label be Y, and let f_θ(I) be the output of a segmentation network parameterized by θ. The corresponding optimization problem can be solved by training a convolutional neural network with a joint regularization loss, expressed as:

where $\ell(S, Y)$ is the loss between the ground truth and the prediction, $R(S)$ is the regularization loss, $S = f_\theta(I) \in [0,1]^{|\Omega| \times K}$ is the K-channel softmax segmentation result generated by the network, and $\lambda$ and $\mu$ are the weights of the corresponding terms.
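Under one reading of the symbols defined above, the joint regularized objective could take the following form. The placement of the weights λ and μ on the two terms is an assumption made for illustration; only the symbols themselves come from the description.

```latex
\min_{\theta}\; \mu\,\ell\bigl(f_\theta(I),\,Y\bigr) \;+\; \lambda\,R\bigl(f_\theta(I)\bigr),
\qquad S = f_\theta(I) \in [0,1]^{|\Omega|\times K}
```

With this form, μ scales the supervised term driven by the image-level label Y, and λ controls how strongly the regularizer shapes the predicted segmentation S.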

In order to integrate the feature information of images at different magnifications into the learning process, the model input can fuse image information from multiple magnifications, giving different attention to cells at high magnification and to tissues at medium and low magnifications, so that both the specificity and the generality of the data samples are considered and the key features of the data are learned. This also simulates the diagnostic process of clinical pathologists: image features at different magnifications receive different attention weights, so that the data characteristics at every magnification are taken into account. For example, the optimization function of the corresponding multi-magnification regularization loss is:

where $I_d$ represents the image input at magnification $d$, and $f_{\theta,\eta}$ represents the feature computation with parameters $\theta$ and attention weights $\eta$; $\eta$ is mainly calculated by Softmax($f_\theta(I)$). Learning and optimizing these parameters with a deep convolutional network gives the model an efficient feature extraction capability, so that it learns the data prior better and faster.
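The attention weighting across magnifications can be illustrated with a small sketch. It assumes, purely for illustration, that each magnification d contributes one feature vector and one scalar relevance score, and that η is the softmax of those scores; the function names and input layout are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_magnifications(features, scores):
    """Weight per-magnification feature vectors by softmax attention and sum them.

    features: dict magnification -> feature vector (all of equal length)
    scores:   dict magnification -> scalar relevance score (hypothetical input)
    Returns (attention weights per magnification, fused feature vector).
    """
    mags = sorted(features)
    eta = softmax([scores[m] for m in mags])
    fused = [0.0] * len(features[mags[0]])
    for weight, m in zip(eta, mags):
        for i, value in enumerate(features[m]):
            fused[i] += weight * value
    return dict(zip(mags, eta)), fused

# Two magnifications (5x and 20x) with equal scores receive equal attention.
weights, fused = fuse_magnifications({5: [1.0, 0.0], 20: [0.0, 1.0]},
                                     {5: 0.0, 20: 0.0})
print(weights, fused)
```

Raising the score of one magnification shifts the fused feature toward that scale, which mirrors how a pathologist weighs cellular detail against tissue context.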

After the parametric model has been trained, a weighted class activation map is computed for the recognized target category by fusing the features of multiple scales, and the target tissue and structural regions are obtained after post-processing.

Weakly supervised learning replaces pixel-wise ground-truth annotations with more readily available annotations, such as image-level labels, thereby reducing the cost of data annotation and improving the efficiency of image segmentation. The structural region segmentation network can use any of several architectures, such as AlexNet, VGG, GoogLeNet, or ResNet, which is not limited in the present invention.

(3) Cell detection module

Because pathological images are very large, the cancer structural region of interest is obtained first, and the key features that discriminate cancerous from non-cancerous tissue are carried over in the form of a key discriminant matrix. This matrix is calculated from the probability distance between the matrices of high-dimensional features of the different image types, with the tumor region as the reference, so that cancer lies far from the other categories. Different types of cell information then need to be extracted from this target structure region.

In one embodiment, based on the differences and correlations between cells and structures, a self-attention-based transformer network encodes and decodes each cell and structure in the image and, at the same time, incorporates the key discriminant matrix for fusion analysis, as follows:

First, the series of feature maps X extracted by the convolutional network is converted into visual tokens T, expressed as:

$$T = \mathrm{SOFTMAX}_{HW}(X W_A)^{T} X \quad (5)$$

where $X \in \mathbb{R}^{HW \times C}$, $W_A \in \mathbb{R}^{C \times L}$ is a learnable weight, and $T \in \mathbb{R}^{L \times C}$. H, W, and C respectively represent the height, width, and number of channels of the features, L represents the number of visual tokens T, and L << HW.

After the visual tokens T are obtained, a self-attention transformation models the dependencies among the tokens, and the result is projected back to the dimensions of the ordinary feature map and combined with the key discriminant matrix G obtained in the preceding step, expressed as:

$$X_{out} = X_{in} + \mathrm{SOFTMAX}_{L}\left((X_{in} W_Q)(T W_K)^{T}\right) T + G \quad (6)$$

where $X_{in}$ represents the image features obtained during tumor region detection on the multi-scale pathological image, $X_{out}$ represents the final output of the cell detection module, and $W_Q$ and $W_K$ are learnable weight parameters. After the feature relationships of the image are constructed in this way, the model is trained on a large amount of data to recognize and locate different types of cells and structures.
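Equations (5) and (6) can be traced with a small pure-Python sketch on list-of-rows matrices. The shapes are assumptions consistent with the symbols: X and X_in are (HW, C), W_A is (C, L), T is (L, C), and G has the same shape as X_in. The helper names are hypothetical, and the weights are fixed toy values rather than learned parameters.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax along each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def tokenize(x, w_a):
    """Equation (5): T = softmax_HW(X W_A)^T X."""
    logits = matmul(x, w_a)                     # (HW, L)
    # Softmax over the HW positions equals a row softmax of the transpose.
    attn_t = softmax_rows(transpose(logits))    # (L, HW)
    return matmul(attn_t, x)                    # (L, C)

def project(x_in, tokens, w_q, w_k, g):
    """Equation (6): X_out = X_in + softmax_L((X_in W_Q)(T W_K)^T) T + G."""
    q = matmul(x_in, w_q)                          # (HW, C)
    k = matmul(tokens, w_k)                        # (L, C)
    attn = softmax_rows(matmul(q, transpose(k)))   # (HW, L), softmax over tokens
    update = matmul(attn, tokens)                  # (HW, C)
    return [[a + u + b for a, u, b in zip(r_in, r_up, r_g)]
            for r_in, r_up, r_g in zip(x_in, update, g)]

# Toy example: HW = 4 positions, C = 2 channels, L = 2 tokens.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
tokens = tokenize(x, [[0.0, 0.0], [0.0, 0.0]])
eye = [[1.0, 0.0], [0.0, 1.0]]
out = project(x, tokens, eye, eye, [[0.0, 0.0] for _ in x])
print(tokens, out[0])
```

Because L is much smaller than HW, the attention in the projection step is computed against a handful of tokens instead of every pixel, which is what keeps the operation tractable on large pathological images.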

(4) Spatial relationship building module

The spatial relationship building module sequentially performs image slicing, image feature encoding, low-magnification rigid registration, high-magnification non-rigid registration, multi-layer graph network construction, graph embedding dimensionality reduction, point cloud distribution data acquisition, persistent homology modeling, and feature cluster analysis, and finally obtains a quantitative model of spatial distribution. The following focuses on multi-layer graph network construction and feature cluster analysis.

Specifically, in order to analyze the co-spatial distribution of multiple types of cells and thereby quantitatively characterize the interaction between tumor cells and the tumor microenvironment, in one embodiment a multi-layer graph is used to model a multi-layer network. A multi-layer graph is a collection of weighted single-layer graph adjacency matrices that includes intra-layer relationships and inter-layer interactions. The implementation comprises constructing the multi-layer network from the multi-layer graph and performing the clustering calculation on that network, so as to build the spatial distribution expression model between tumor cells and the multiple components of the tumor microenvironment.

1) Modeling of spatial high-order relationships in multi-layer networks

A single-layer graph network is defined as $G = (V, E, \omega)$, where $V$ is the set of nodes and $E \subseteq V \times V$ is the set of edges. The total number of nodes in graph $G$ is $n = |V|$. $\omega: V \times V \to \mathbb{R}_{\ge 0}$ is an edge weight function, and the weight of an edge $e_{uv} \in E$ is written $\omega_{uv}$. The adjacency matrix $A$ is symmetric, that is, $A_{ij} = A_{ji}$, and indicates whether each pair of nodes is connected; the nodes carry the information of the different types of cells.

Building on the single-layer definition, a multi-layer network can be constructed that consists of $m$ non-overlapping layers, where each layer is modeled by a weighted graph $G_i$ with adjacency matrix $A_i$, $i = 1, \dots, m$. The elements of the set $A = \{A_1, A_2, \dots, A_m\}$ are called intra-layer matrices and represent the connections within a single layer. The connection between two graphs $G_k$ and $G_l$, whose adjacency matrices are $A_k$ and $A_l$ ($k, l = 1, 2, \dots, m$; $k \ne l$), is modeled by a matrix $A_{l,k}$ that represents the one-to-one symmetric connections between the nodes of the two graphs. This yields a set of cross-layer adjacency matrices $C_p = \{A_{l,k},\ k \ne l\}$, which represents the edges between nodes of different layers, where $p$ represents the number of connection graphs.

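In summary, a multi-layer network comprises the layers together with a collection of inter-layer connections that link nodes across layers: for each inter-layer edge $e_{uv}$, $u \in V(G_k)$ and $v \in V(G_l)$ with $k \ne l$. The supra-adjacency (hyperadjacency) matrix of the multi-layer network defined in this way has a block matrix structure:

$$\begin{pmatrix} A_1 & A_{12} & \cdots & A_{1m} \\ A_{21} & A_2 & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_m \end{pmatrix}$$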

The diagonal blocks, taken from the set A, are the intra-layer matrices, and the off-diagonal blocks $A_{kl}$ ($k, l = 1, 2, \dots, m$; $k \ne l$) represent the inter-layer connections between nodes in layer $G_k$ and nodes in layer $G_l$. In one embodiment, nodes in the same layer represent cells of the same type, and connections between different layers represent the spatial connection relationships between different types of cells or structures. Taking vascular structures and tumor cells as an example, the relationship between the cell layer and the structure layer can be established from spatial distance, which gives the values of the off-diagonal block $A_{kl}$. Tumor or immune cells close to blood vessels have a strong connection with the structural layer, and cells far from blood vessels have a weak one. The diagonal intra-layer matrices are likewise obtained from the Euclidean distance between cells.
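A minimal sketch of assembling such a block matrix from cell and structure centroids follows. The weighting 1 / (1 + d) for pairs within a cutoff radius, the cutoff itself, and the function names are assumptions chosen so that closer pairs get stronger connections; the description only states that the blocks are derived from Euclidean distance.

```python
import math

def distance_weights(points_a, points_b, radius):
    """Weighted adjacency block: 1 / (1 + d) for pairs with 0 < d <= radius, else 0."""
    block = []
    for xa, ya in points_a:
        row = []
        for xb, yb in points_b:
            d = math.hypot(xa - xb, ya - yb)
            row.append(1.0 / (1.0 + d) if 0.0 < d <= radius else 0.0)
        block.append(row)
    return block

def supra_adjacency(layers, radius):
    """Assemble the block matrix with A_k on the diagonal and A_kl off the diagonal.

    layers: list of lists of (x, y) centroids, one list per cell or structure type.
    """
    m = len(layers)
    blocks = [[distance_weights(layers[k], layers[l], radius) for l in range(m)]
              for k in range(m)]
    rows = []
    for k in range(m):
        for i in range(len(layers[k])):
            row = []
            for l in range(m):
                row.extend(blocks[k][l][i])  # concatenate row i of every block in block-row k
            rows.append(row)
    return rows

# Hypothetical coordinates: two tumor cells (layer 1) and one vessel (layer 2).
tumor = [(0.0, 0.0), (1.0, 0.0)]
vessel = [(0.0, 1.0)]
for row in supra_adjacency([tumor, vessel], radius=1.5):
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, and shrinking the radius removes the weaker long-range links first, here the link between the second tumor cell and the vessel.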

After the multi-layer graph network is built, extracting meaningful information from such a complex network requires substantial computation and memory. To address both problems, the network is transformed into a low-dimensional space through node embedding while its structural information is preserved; for example, a graph embedding method is used to achieve dimensionality reduction.

2) Multi-layer network topology analysis method based on persistent graph clustering

In order to draw conclusions about tumor evolution from the node embeddings of the multiple cell types and cancer cells in the tumor microenvironment, the node embeddings need to be clustered. Forming clusters based on shape dynamics helps to discover persistent node clusters with similar patterns. In one embodiment, the concept of topological data analysis (TDA) is introduced into complex multi-layer network topology analysis.

Given a weighted graph $G$, choosing a threshold $\epsilon_j > 0$ and keeping only the edges whose weights satisfy $\omega_{uv} \le \epsilon_j$ yields a graph $G_j$ with adjacency matrix

$$A_{uv} = \mathbb{1}_{\omega_{uv} \le \epsilon_j}$$

Varying the threshold over $\epsilon_1 < \epsilon_2 < \dots < \epsilon_n$ yields a hierarchically nested sequence of graphs

$$G_1 \subseteq G_2 \subseteq \dots \subseteq G_n$$

known as a "network filtration". Taking the widely used Vietoris–Rips (VR) simplicial complex as an example, the VR complex at the threshold $\epsilon_j$ is defined as

$$VR_j = \{\sigma \subseteq V \mid \omega_{uv} \le \epsilon_j \text{ for all } u, v \in \sigma\}$$

With the help of the network filtration, changes in the network topology are evaluated as the threshold grows, in order to detect features that persist over a wide range of thresholds $\epsilon_j$. Such persistent features characterize the internal spatial organization distribution, which allows the persistent graph clustering algorithm to obtain more accurate clustering results.
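The filtration and the VR complex can be made concrete with a short sketch that thresholds a weighted graph, counts connected components at each threshold, and enumerates simplices up to triangles. The edge-weight dictionary keyed by node pairs, the function names, and the toy graph are assumptions for illustration; a production pipeline would normally rely on a dedicated TDA library.

```python
from itertools import combinations

def threshold_graph(weights, eps):
    """Edges whose weight is at most eps; `weights` maps frozenset({u, v}) -> weight."""
    return {edge for edge, w in weights.items() if w <= eps}

def count_components(nodes, edges):
    """Number of connected components, computed with union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in edges:
        u, v = tuple(edge)
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

def vietoris_rips(nodes, weights, eps, max_dim=2):
    """Simplices sigma such that every pair u, v in sigma has weight <= eps."""
    edges = threshold_graph(weights, eps)
    simplices = [frozenset([v]) for v in nodes]
    for size in range(2, max_dim + 2):
        for combo in combinations(sorted(nodes), size):
            if all(frozenset(pair) in edges for pair in combinations(combo, 2)):
                simplices.append(frozenset(combo))
    return simplices

def filtration(nodes, weights, thresholds):
    """Component count of the thresholded graph at each threshold, in increasing order."""
    return [(eps, count_components(nodes, threshold_graph(weights, eps)))
            for eps in sorted(thresholds)]

# Toy weighted graph on four nodes (hypothetical weights).
nodes = ["a", "b", "c", "d"]
weights = {frozenset("ab"): 1.0, frozenset("bc"): 2.0,
           frozenset("ac"): 2.5, frozenset("cd"): 5.0}
print(filtration(nodes, weights, [0.5, 1.0, 2.0, 2.5, 5.0]))
```

In the toy graph, the component {a, b, c} forms at threshold 2.0 and survives until 5.0, when d joins; a feature that survives across such a wide range of thresholds is the kind of persistent structure the clustering step looks for.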

In summary, most current clustering methods for multi-layer networks rely on graph decomposition to embed graphs into Euclidean space and do not explicitly consider the geometry and topology of local graphs. The multi-layer network clustering method adopted in the embodiment of the present invention instead starts from the similarity in shape of the data recorded at multiple resolutions and clusters the multi-layer network under unsupervised conditions. To quantify the shape dynamics of multi-layer networks at evolutionary similar scales, the multi-lens tool of TDA is introduced into the clustering calculation. Its core idea is that if the local neighborhoods of two points are similar in shape at all resolution scales, the two points are considered close enough to be grouped into one cluster. Persistent graph clustering therefore uses both the distance function and the local spatial information around points, and can obtain more accurate clustering results for multi-layer graph networks.

Correspondingly, the present invention also provides a method for modeling the spatial relationship of the tumor microenvironment based on digital pathological images, which is used to realize the functions of each module in the above system. For example, the method includes: step S110, determining the pixel distribution type of the pathological image, performing color standardization on the change of the staining distribution according to the overall distribution of each pixel in the pathological image, and obtaining a staining standardized image; step S120, using a weakly supervised deep learning model to detect the region of interest in the staining standardized image, and then segmenting it to obtain the target structure region; step S130, extracting various types of cell information from the obtained target structure region; and a final step of modeling a multi-layer network with a multi-layer graph to characterize the co-spatial distribution among the various types of cells, and performing cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution. The spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment. The multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.

In summary, compared with the prior art, the present invention has at least the following technical effects:

1) A fast calculation method for multi-scale pathological images based on learnable regularization constraint encoding and decoding weakly supervised learning is designed. Aiming at the difficulty of calculating a single pathological image with over one billion pixels and the incomplete utilization of information at different magnification scales, combined with the idea of weak supervision and deep learning technology, it does not need to rely on large-scale data labeling and makes full use of cross-scale information to achieve rapid detection of lesion regions of interest and accurate positioning of cell nuclei in digital panoramic pathological images.

2) Combining the differences and correlations between cells and structures, the self-attention transformation network is used to realize the encoding and decoding of each cell and structure, thereby realizing the rapid detection and accurate identification of clustered multi-type small target cells.

3) A tumor microenvironment topological space modeling method based on persistent graph clustering is proposed, which further enables the quantitative calculation of pathological diagnostic indicators. Conventional distance-based or statistical methods struggle to analyze the spatial expression of complex tumor microenvironments. The present invention introduces the concept of topological data analysis into the clustering of complex multi-layer networks and proposes a topological space modeling method based on persistent graph clustering. The method reveals the correlation between intra-tumor heterogeneity and the spatial distribution of cells and tissues in the tumor microenvironment, and provides a new approach to the quantitative analysis of tumor evolution mechanisms.

The present invention can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices, such as punched cards or raised-in-recess structures with instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through electrical wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.

Computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the computer readable program instructions, and these electronic circuits can execute the computer readable program instructions, thereby implementing various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium, where they cause a computer, a programmable data processing device, and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

Computer-readable program instructions can also be loaded onto a computer, other programmable data processing device, or other equipment, so that a series of operation steps are executed on the computer, other programmable data processing device, or other equipment to generate a computer-implemented process, so that the instructions executed on the computer, other programmable data processing device, or other equipment realize the functions/actions specified in one or more blocks in the flowchart and/or block diagram.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of instructions comprising one or more executable instructions for implementing specified logical functions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block in the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or actions, or by combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by means of hardware, implementation by means of software, and implementation by a combination of software and hardware are all equivalent.

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

A tumor microenvironment spatial relationship modeling system based on digital pathology images, including:

Image staining standardization module: used to determine the pixel distribution type of the pathological image, perform color standardization on the change of staining distribution according to the overall distribution of each pixel of the pathological image, and obtain a staining standardized image;

Structural region segmentation module: used to detect the region of interest in the staining-standardized image with a weakly supervised deep learning model, and then segment it to obtain the target structure region;

Cell detection module: used to extract various types of cell information from the obtained target structure region;

Spatial relationship building module: used to model a multi-layer network with a multi-layer graph to characterize the co-spatial distribution among various types of cells, and to perform cluster analysis on the multi-layer graph to obtain a quantitative model of spatial distribution, wherein the quantitative model of spatial distribution is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment.
A method for modeling the spatial relationship of the tumor microenvironment based on digital pathology images, comprising the following steps:

Step S1: Determine the pixel distribution type of the pathological image, perform color standardization on the change of the staining distribution according to the overall distribution of each pixel in the pathological image, and obtain a staining standardized image;

Step S2: For the staining-standardized image, use a weakly supervised deep learning model to detect the region of interest, and then segment it to obtain the target structure region;

Step S3: extracting various types of cell information from the obtained target structure region;

Step S4: using a multi-layer graph to model a multi-layer network to characterize the co-spatial distribution among various types of cells, and performing cluster analysis on the multi-layer graph to obtain a spatial distribution quantitative model, wherein the spatial distribution quantitative model is used to quantitatively characterize the interaction between tumor cells and the tumor microenvironment, the multi-layer graph includes intra-layer relationships and inter-layer interactions, nodes in the same layer represent cells of the same type, and connections between different layers represent spatial connection relationships between different types of cells or structures.
The method according to claim 2, wherein the input of the weakly supervised deep learning model fuses image information at multiple magnifications, and the training process adopts a multi-magnification regularization loss as the optimization objective, expressed as:

wherein $I_d$ represents the image input at magnification $d$; $f_{\theta,\eta}$ represents the feature calculation with parameters $\theta$ and attention weights $\eta$, where $\eta$ is computed as $\mathrm{Softmax}(f_\theta(I))$; the data term is the loss between the real value and the predicted value; $R(S)$ is the regularization loss, with $S = f_\theta(I) \in [0,1]^{|\Omega| \times K}$, where $K$ represents the number of channels; $\lambda$ and $\mu$ are preset weight parameters; $I$ represents the image; and $Y$ represents the label corresponding to the image.
The method according to claim 2, wherein step S3 comprises the following sub-steps:

converting the extracted feature map $X$ of the target structure region into visual tokens $T$;

using a self-attention transformation to model the dependencies among the tokens $T$, and projecting the result back to the dimension of the ordinary feature map, expressed as:

$$X_{out} = X_{in} + \mathrm{SOFTMAX}_L\big((X_{in} W_Q)(T W_K)^{T}\big)\, T + G$$

wherein $G$ is the key discriminant matrix, $W_Q$ and $W_K$ are weight parameters, $X_{in}$ represents the image features obtained during tumor region detection on the multi-scale pathological image, and $X_{out}$ represents the output result;

based on the constructed image feature relationships, identifying and localizing different types of cells through data-driven learning.
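
The projection formula in this claim can be illustrated with a small NumPy sketch. It is only a sketch under stated assumptions, not the patented implementation: the shapes, the random inputs, the helper names `softmax` and `project_tokens`, and the reading of the subscript L as "softmax over the L tokens" are all assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def project_tokens(x_in, tokens, w_q, w_k, g):
    """Evaluate X_out = X_in + softmax_L((X_in W_Q)(T W_K)^T) T + G.

    Assumed shapes: x_in (N, C) flattened feature map, tokens (L, C),
    w_q and w_k (C, C), g (N, C).
    """
    scores = (x_in @ w_q) @ (tokens @ w_k).T   # (N, L) pixel-to-token affinities
    attn = softmax(scores, axis=1)             # normalise over the L tokens
    return x_in + attn @ tokens + g            # residual connection plus G

# Hypothetical toy sizes: 6 feature positions, 3 tokens, 4 channels.
rng = np.random.default_rng(0)
n_pixels, n_tokens, channels = 6, 3, 4
x_in = rng.normal(size=(n_pixels, channels))
tokens = rng.normal(size=(n_tokens, channels))
w_q = rng.normal(size=(channels, channels))
w_k = rng.normal(size=(channels, channels))
g = np.zeros((n_pixels, channels))             # G set to zero for the demo
x_out = project_tokens(x_in, tokens, w_q, w_k, g)
```

The output keeps the shape of the input feature map, so it can be passed on to a detection head in place of the original features.
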
The method according to claim 2, wherein the multi-layer graph includes $m$ non-overlapping layers, and each layer is modeled by a weighted graph $G_i$ with adjacency matrix $A_i$, $i = 1, \ldots, m$; the set $A = \{A_1, A_2, \ldots, A_m\}$ is called the set of intra-layer matrices and represents the intra-layer connections; to model the connection between two graphs, $G_k$ and $G_l$ with adjacency matrices $A_k$ and $A_l$, a one-to-one symmetric interconnection between the nodes of the two associated graphs is used; the set of cross-layer adjacency matrices $C_p = \{A_{l,k},\ k \neq l\}$ represents the edges between nodes in different layers, and $p$ represents the number of connection graphs, where $k, l = 1, 2, \ldots, m$ and $k \neq l$.
The method according to claim 5, wherein the multi-layer network constructed from the multi-layer graph has a set of inter-layer connections that connect nodes across layers; for each edge in this set, with endpoints $u$ and $v$, there are $u \in V(G_k)$ and $v \in V(G_l)$ with $k \neq l$; and the supra-adjacency matrix of the multi-layer network has a block matrix structure expressed as:

$$\begin{pmatrix} A_1 & A_{12} & \cdots & A_{1m} \\ A_{21} & A_2 & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_m \end{pmatrix}$$

wherein the diagonal blocks are the intra-layer matrices of the set $A$, and the off-diagonal blocks $A_{kl}$ ($k, l = 1, 2, \ldots, m$; $k \neq l$) represent the inter-layer connections that connect nodes in layer $G_k$ to nodes in layer $G_l$.
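
The block structure described in this claim can be assembled as in the following NumPy sketch. The function name `supra_adjacency`, the dictionary layout for the inter-layer blocks, and the toy matrices are assumptions for illustration; treating a missing inter-layer block as all zeros and filling the mirrored block with the transpose (symmetric connections) are also assumptions.

```python
import numpy as np

def supra_adjacency(intra, inter):
    """Assemble the block matrix with A_i on the diagonal and A_kl off it.

    intra: list of m square intra-layer matrices, A_i of shape (n_i, n_i).
    inter: dict mapping (k, l) to an inter-layer matrix of shape (n_k, n_l).
           The mirrored block (l, k) is filled with the transpose, and
           absent pairs become zero blocks (both are assumptions).
    """
    m = len(intra)
    blocks = [[None] * m for _ in range(m)]
    for k in range(m):
        for l in range(m):
            if k == l:
                blocks[k][l] = intra[k]
            elif (k, l) in inter:
                blocks[k][l] = inter[(k, l)]
            elif (l, k) in inter:
                blocks[k][l] = inter[(l, k)].T
            else:
                blocks[k][l] = np.zeros((intra[k].shape[0], intra[l].shape[0]))
    return np.block(blocks)

# Hypothetical example: layer 0 holds 2 cells of one type, layer 1 holds 3 of another.
a_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
a_2 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
a_12 = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
supra = supra_adjacency([a_1, a_2], {(0, 1): a_12})
```

With these inputs `supra` is a symmetric 5 x 5 matrix whose top-left and bottom-right blocks are the two intra-layer matrices.
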
The method according to claim 6, wherein, for the inter-layer connections, the values of the off-diagonal elements $A_{kl}$ are obtained based on spatial distance, and for the intra-layer matrices, the values of the diagonal elements are obtained from the Euclidean distance between cells.
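
One way to derive such distance-based entries from cell centroid coordinates is sketched below. The claim only states that the values are obtained from distances; using the raw Euclidean distance as the entry and zeroing pairs beyond a cutoff `max_dist` are assumptions for illustration, as are the function name and the coordinates.

```python
import numpy as np

def distance_block(coords_a, coords_b, max_dist=np.inf):
    """Pairwise Euclidean distances between two sets of cell centroids.

    coords_a: (n_a, 2) array, coords_b: (n_b, 2) array.
    Entries larger than max_dist are set to 0, meaning "no edge"
    (the cutoff is an assumption, not part of the claim).
    """
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.where(dist <= max_dist, dist, 0.0)

# Hypothetical centroids for two cell types, in pixel units.
tumor = np.array([[0.0, 0.0], [3.0, 4.0]])
lymph = np.array([[0.0, 4.0], [6.0, 8.0], [30.0, 40.0]])

a_11 = distance_block(tumor, tumor)                 # intra-layer block
a_12 = distance_block(tumor, lymph, max_dist=10.0)  # inter-layer block
```

Here `a_11` is `[[0, 5], [5, 0]]` and `a_12` is `[[4, 10, 0], [3, 5, 0]]`; the third lymphocyte lies beyond the cutoff and is left unconnected.
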
The method according to claim 5, further comprising reducing the dimensionality of the multi-layer network by using graph embedding, and clustering the multi-layer network after dimensionality reduction according to the shape similarity of the local neighborhood of two points at all resolution scales.
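
The dimensionality-reduction part of this claim can be illustrated with a Laplacian spectral embedding. This is a swapped-in, generic graph embedding chosen for illustration; the claim does not name a specific embedding, and the subsequent clustering by neighborhood shape similarity across resolution scales is not implemented here. The function name and the toy graph are assumptions, and the input is assumed to hold similarity weights (larger means closer), so distance-valued entries would need converting first.

```python
import numpy as np

def spectral_embedding(adj, dim=2):
    """Embed graph nodes using eigenvectors of the unnormalised Laplacian.

    adj: symmetric (n, n) matrix of non-negative similarity weights.
    Returns an (n, dim) array built from the eigenvectors that belong to
    the smallest non-trivial eigenvalues (the constant vector is skipped).
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]

# Hypothetical graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
adj[2, 3] = adj[3, 2] = 0.1

embedding = spectral_embedding(adj, dim=1)
groups = embedding[:, 0] > 0   # sign of the first coordinate splits the triangles
```

In this toy case the sign of the single embedding coordinate already separates the two triangles, which is the kind of low-dimensional structure a later clustering step would work on.
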
The method according to claim 2, wherein the pixel distribution type of the pathological image is determined by the Jarque-Bera test.
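
For reference, the Jarque-Bera statistic combines sample skewness $S$ and kurtosis $K$ as $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$ and is asymptotically chi-square distributed with two degrees of freedom under normality. The sketch below computes it with the standard library only; the function name, the sample values, and the idea of applying it to the intensities of one color channel are assumptions for illustration, and the p-value is the asymptotic approximation.

```python
import math

def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # biased central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical pixel intensities: a strongly skewed sample.
skewed_pixels = [1] * 95 + [100] * 5
stat, p_value = jarque_bera(skewed_pixels)
is_normal = p_value >= 0.05   # the 0.05 threshold is an assumed significance level
```

A small p-value, as with the skewed sample above, indicates that the pixel values are unlikely to follow a normal distribution, which could then steer the choice of color standardization.
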
A computer-readable storage medium, on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method according to any one of claims 2 to 9 are implemented.