CN116189785A - Spatial domain identification method based on spatial transcriptomics data feature extraction - Google Patents

Spatial domain identification method based on spatial transcriptomics data feature extraction Download PDF

Info

Publication number
CN116189785A
Authority
CN
China
Prior art keywords
gene expression
matrix
spatial
network
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310097081.0A
Other languages
Chinese (zh)
Inventor
贾松卫 (Jia Songwei)
崔议文 (Cui Yiwen)
兰猛 (Lan Meng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310097081.0A priority Critical patent/CN116189785A/en
Publication of CN116189785A publication Critical patent/CN116189785A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention discloses a spatial domain identification method based on spatial transcriptome data feature extraction, which mainly solves the prior-art problems of overfitting during feature extraction from spatial transcriptome data and low spatial domain identification accuracy. The implementation scheme is as follows: preprocess the gene expression data and spatial information measured in a spatial transcriptome; construct a gene similarity network and a spatial neighborhood network from the gene expression feature matrix and the spatial information; apply data enhancement to the gene similarity network and the spatial neighborhood network; construct a feature extraction model and input the enhanced data into the model to calculate a contrast loss and a reconstruction loss; train the model according to the calculated loss, then input the non-enhanced data into the trained model to obtain a low-dimensional embedding; cluster the low-dimensional embedding to complete spatial domain identification. The method avoids overfitting in the feature extraction process, improves the accuracy of spatial domain identification, and can be used to provide reference data for exploring biological development and treating diseases.

Description

Spatial domain identification method based on spatial transcriptomics data feature extraction
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a spatial domain identification method which can be used for providing reference data for exploring biological development and treating diseases.
Background
In tissue sections, some regions share a similar spatial gene expression profile and form specific structures or substructures in the tissue. Owing to differences in cell type composition and gene expression, these regions perform different functions, thereby forming spatial domains with specific, biologically meaningful structures. Identifying spatial domains is critical for studying tissue structure and cell-cell interactions.
Single-cell transcriptome sequencing (scRNA-seq) provides high-resolution gene expression profiles; however, because spatial position information cannot be retained during sample preparation, downstream analysis is limited. Spatial transcriptome sequencing techniques, including imaging techniques based on in situ hybridization and in situ sequencing techniques based on spatial barcodes, provide both the gene expression profile and the spatial location information that are critical to understanding healthy tissue development and the disease tumor microenvironment. Spatial transcriptome data thus help better describe the spatial organization of cells. Mining regions with similar expression patterns from spatial transcriptome data by clustering in order to interpret the spatial organization of cells, i.e., identifying spatial domains, is therefore one of the most important tasks of spatial transcriptomics.
Traditional clustering algorithms such as Louvain and K-means cannot effectively use the available spatial information, so the clustering result cannot consistently identify tissue regions with a clear layered structure in a tissue section and cannot provide an accurate reference for downstream analysis. A spatial clustering method that exploits both the gene expression profile and the spatial position coordinates of spatial transcriptome data is therefore required.
In 2021, Jian Hu et al. proposed in Nature Methods a deep learning algorithm called SpaGCN that integrates gene expression, spatial location and histological images through a graph convolutional network. It first constructs a graph representing the relationship between spots by combining spatial positions with the histological image, then aggregates gene expression information from adjacent spots using graph convolutional layers, and finally clusters the spots on the aggregated expression matrix with an unsupervised iterative clustering algorithm.
In 2021, Edward Zhao et al. proposed in Nature Biotechnology an algorithm named BayesSpace. BayesSpace models a low-dimensional representation of the gene expression matrix and, through a Bayesian statistical method, introduces the spatial neighbor structure into the prior to encourage adjacent pixels to belong to the same cluster, thereby realizing spatial clustering.
In 2022, Shihua Zhang et al. proposed in Nature Communications a new framework, STAGATE, based on a graph attention auto-encoder; it automatically learns the weights of inter-node edges through an attention mechanism while embedding spatial information, taking into account the spatial similarity of pixels at spatial domain boundaries.
In 2022, Chang Xu et al. proposed in Nucleic Acids Research a deep neural network framework, DeepST, which uses a neural network to extract histological image features, creates a spatially enhanced gene expression matrix from gene expression and spatial location, and combines a graph convolutional network with a denoising auto-encoder to generate a latent representation of the enhanced ST data.
These algorithms all suffer from the following disadvantages:
First, the addition of histological features increases the complexity of the model while improving clustering accuracy, so that memory consumption is large and running time is long.
Second, some algorithms place excessive weight on spatial information, over-correcting the gene expression features and causing the clustering to overfit, so that some fine regions cannot be identified and the identified spatial domains cannot support an accurate analysis of biological function.
Third, the results of repeated runs are unstable and differ greatly, and good results are obtained only on datasets measured by in situ sequencing-based spatial transcriptome techniques, while results on imaging-based datasets are poor, so spatial transcriptome datasets cannot be analyzed broadly.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a spatial domain identification method based on spatial transcriptome data feature extraction, so as to extract the joint features of the gene expression profile and the spatial position information in spatial transcriptome data, improve the generalization capability over spatial transcriptomes produced by both sequencing-based and imaging-based means, and complete an accurate analysis of spatial domain biological functions.
The technical scheme of the invention is as follows: preprocess the gene expression data and spatial information measured in a spatial transcriptome; construct a gene similarity network and a spatial neighborhood network from the gene expression feature matrix and the spatial information; apply data enhancement to the two networks; construct a feature extraction model and input the enhanced data into the model to calculate a contrast loss and a reconstruction loss; train the model according to the calculated loss, then input the non-enhanced data into the trained model to obtain a low-dimensional embedding; cluster the low-dimensional embedding to complete spatial domain identification. The implementation comprises the following steps:
(1) Simultaneously measuring the gene expression value and the spatial position coordinate of each pixel point in the required tissue slice by using a spatial transcriptome sequencing technology, obtaining spatial transcriptome data comprising a pixel point-gene expression matrix and the spatial position of each pixel point in the tissue slice;
(2) Preprocessing the gene expression matrix of the spatial transcriptome data:
(2a) Deleting genes that are expressed in fewer than three pixel points in the spatial transcriptome data;
(2b) Normalizing the filtered data numerically so that the total count of each cell equals the median of all cells, applying a logarithmic transformation to the normalized data, and standardizing it to zero mean and unit variance;
(2c) Performing Principal Component Analysis (PCA) on the standardized data, extracting the first n principal components, and generating a feature matrix X of gene expression;
(3) Constructing a spatial neighborhood network:
(3a) Calculating the Euclidean distance d between each pair of pixel points in the tissue slice based on the spatial coordinate information;
(3b) Selecting the first k nearest neighbors of each pixel point based on the Euclidean distance d calculated from the spatial coordinates, and constructing the adjacency matrix A representing the spatial information;
(3c) Taking the gene expression feature matrix X generated in step (2) as the node attribute feature matrix;
(3d) Based on the adjacency matrix A representing the spatial information and the node attribute feature matrix X, forming the spatial neighborhood network G_1(A, X);
(4) Constructing a gene expression similarity network:
(4a) Calculating the Euclidean distance d' between the gene expression values of each pair of pixel points in the tissue slice based on the gene expression feature matrix X generated in step (2);
(4b) Based on the Euclidean distance d' between the gene expression values, selecting the first k nearest neighbors of each pixel point and constructing the adjacency matrix B representing the gene expression similarity;
(4c) Based on the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X, forming the gene expression similarity network G_2(B, X);
(5) Data enhancement:
(5a) Masking the edges and node attribute features in the spatial neighborhood network according to a given Bernoulli-distributed edge mask probability p_r and node feature mask probability p_m, obtaining the enhanced spatial neighborhood network G_1(A_1, X_1);
(5b) Masking the edges and node attribute features in the gene expression similarity network according to a given Bernoulli-distributed edge mask probability p_r' and node feature mask probability p_m', obtaining the enhanced gene expression similarity network G_2(B_1, X_2);
(6) Constructing a feature extraction model of spatial transcriptome data consisting of an encoder f(·) cascaded respectively with a decoder h(·) and a projector g(·), and using the contrast loss L_con and the reconstruction loss L_recon as the loss function L;
(7) Training a feature extraction model of the spatial transcriptome data:
(7a) Inputting the adjacency matrix A_1 and node attribute feature matrix X_1 of the enhanced spatial neighborhood network G_1(A_1, X_1) and the adjacency matrix B_1 and node attribute feature matrix X_2 of the enhanced gene expression similarity network G_2(B_1, X_2) into the spatial transcriptome feature extraction model, generating the low-dimensional embeddings Z_1 and Z_2 by the encoder and the reconstructed gene expression feature matrices X̂_1 and X̂_2 by the decoder;
(7b) Computing the contrast loss of the low-dimensional embeddings Z_1 and Z_2 and the reconstruction loss between the reconstructed gene expression feature matrices X̂_1, X̂_2 and the node attribute feature matrix X, and updating the network parameters according to the calculated losses until the loss function L converges, obtaining the trained spatial transcriptome feature extraction model;
(8) Inputting the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network without data enhancement into the spatial transcriptome feature extraction model trained in step (7b), obtaining a combined low-dimensional embedding Z containing spatial information and gene expression;
(9) Clustering the obtained combined low-dimensional embedding Z using the Leiden clustering algorithm, obtaining regions of consistent gene expression on the tissue slice, i.e., spatial domains.
Compared with the prior art, the invention has the following advantages:
1) Because the spatial neighborhood network and the gene expression similarity network are constructed by combining the spatial information and the gene expression profile of the spatial transcriptome data, the invention balances spatial information and gene expression information better than existing methods, prevents overfitting to the gene expression profile, and improves the accuracy and robustness of spatial domain identification.
2) Because the invention uses only the gene expression profile and spatial information, without adding histological image features, the efficiency of the model is improved, the running time is reduced, and larger datasets generated in the future can be handled.
3) Because contrast loss is introduced to train the low-dimensional embedding, similar samples are drawn closer and dissimilar samples are pushed apart, which fits the spatial clustering problem well; compared with the prior art, the generalization capability over datasets generated by the two spatial transcriptome sequencing means is improved.
4) Because a model architecture cascading the encoder with the decoder is designed, and the contrast loss and the reconstruction loss are considered at the same time, denoised data can be generated and the biological meaning in the original samples is better retained.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of data enhancement in accordance with the present invention;
FIG. 3 is a feature extraction model diagram of spatial transcriptome data constructed in the present invention;
FIG. 4 is a visualization of the spatial clustering results obtained with the present invention and the existing STAGATE and DeepST methods, respectively.
Detailed Description
Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Existing spatial transcriptome data include imaging techniques based on in situ hybridization and in situ sequencing techniques based on spatial barcodes, where the imaging techniques include STARmap and MERFISH, and the in situ sequencing techniques include Spatial Transcriptomics, 10x Visium and Slide-seq. This example uses the spatial transcriptome dataset of the 10x Visium-sequenced human dorsolateral prefrontal cortex slice 151673, which contains 3639 pixel points with 33538 genes per pixel point.
Referring to fig. 1, the implementation steps of this example are as follows:
step 1, preprocessing a gene expression matrix of space transcriptome data.
1.1 Acquiring pixel-gene expression matrix data in a spatial transcriptome data set of a spatial transcriptome sequenced human dorsal lateral forehead cortex layer 151673 slice of 10x Visum, deleting genes expressed in less than three pixels in gene expression values in the spatial transcriptome data to realize data filtering, and obtaining 3639 pixels and 19151 genes remained after filtering;
1.2 Median normalization of the transcriptome data after filtering, that is, dividing each column of data by the median of the column of data, and then carrying out logarithmic conversion on the data after normalization of the median, and normalizing the data into zero mean and unit variance;
1.3 Main component analysis PCA is carried out on the standardized data, the first 300 main components are extracted, and a feature matrix X of gene expression is generated:
X=[x 1 ;x 2 …;x i ;…;x n ] T
wherein, [;]representing the splicing operation, x i For the gene feature vector of i pixels, i=1..n, n is the number of all pixels in a tissue slice, and T represents the transpose.
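For orientation only, the preprocessing of steps 1.1)-1.3) maps directly onto the Scanpy library; the following is a minimal sketch under that assumption, where the dataset path is hypothetical and Scanpy's normalize_total defaults to the median of total counts:

```python
# Hedged sketch of steps 1.1)-1.3) with Scanpy; the dataset path is hypothetical.
import scanpy as sc

adata = sc.read_visium("DLPFC_151673/")   # 10x Visium slice 151673 (path assumed)

sc.pp.filter_genes(adata, min_cells=3)    # 1.1) drop genes expressed in < 3 pixel points
sc.pp.normalize_total(adata)              # 1.2) normalize each pixel point to the median total count
sc.pp.log1p(adata)                        #      logarithmic transformation
sc.pp.scale(adata)                        #      standardize to zero mean and unit variance
sc.pp.pca(adata, n_comps=300)             # 1.3) PCA, keep the first 300 principal components

X = adata.obsm["X_pca"]                   # gene expression feature matrix X, shape (3639, 300)
```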
Step 2, constructing the spatial neighborhood network.
2.1) Acquire the spatial position coordinate data from the spatial transcriptome dataset of the 10x Visium-sequenced human dorsolateral prefrontal cortex slice 151673, and calculate the Euclidean distance d between the spatial positions of the pixel points based on the spatial coordinate information:

d_ij = sqrt((a_i - a_j)^2 + (b_i - b_j)^2),

where (a_i, b_i) and (a_j, b_j) are the spatial coordinates of pixel point i and pixel point j on the tissue slice;
2.2) Select the first 5 nearest neighbors of each pixel point based on the Euclidean distance d computed from the spatial coordinates, and construct the adjacency matrix A representing the spatial information:

A = [a_ij] ∈ {0,1}^(n×n), with a_ij = 1 if node i is among the first 5 nearest neighbors of node j computed from the spatial coordinates, and a_ij = 0 otherwise,

where a_ij is the element in row i and column j of the adjacency matrix A of the spatial neighborhood network, i and j denote two nodes in the spatial neighborhood network, i, j = 1, …, n, and n = 3639 is the number of nodes contained in the spatial neighborhood network;
2.3) Take the gene expression feature matrix X generated in step 1 as the node attribute feature matrix;
2.4) Based on the adjacency matrix A representing the spatial information and the node attribute feature matrix X, form the spatial neighborhood network G_1(A, X).
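The k-nearest-neighbor construction of steps 2.1)-2.4) can be sketched with scikit-learn as below; the helper name knn_adjacency and the coords array (the (n, 2) spatial coordinates) are our own illustrative assumptions:

```python
# Hedged sketch of the k-nearest-neighbor adjacency construction (k = 5 in this example).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_adjacency(points: np.ndarray, k: int = 5) -> np.ndarray:
    """Return an n x n 0/1 adjacency matrix whose (i, j) entry is 1 when
    node i is among the first k nearest neighbors of node j."""
    n = points.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)  # +1 because each point is its own neighbor
    _, idx = nn.kneighbors(points)                        # idx[j] lists the neighbors of j
    adj = np.zeros((n, n), dtype=np.float32)
    for j in range(n):
        adj[idx[j, 1:], j] = 1.0                          # skip the self-neighbor in column 0
    return adj

A = knn_adjacency(coords, k=5)  # adjacency matrix of the spatial neighborhood network G_1(A, X)
```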
Step 3, constructing the gene expression similarity network.
3.1) Based on the gene expression feature matrix X generated in step 1, calculate the Euclidean distance d' between the gene expression values of the pixel points in the tissue slice:

d'_ij = sqrt(Σ_{k=1}^{m} (x_ik - x_jk)^2),

where x_ik and x_jk are the values of the k-th dimension of the gene expression feature vectors of pixel point i and pixel point j respectively, k = 1, …, m, and m = 300 is the dimension of each pixel point's gene expression feature vector;
3.2) Based on the Euclidean distance d' computed from the gene expression values, select the first 5 nearest neighbors of each pixel point and construct the adjacency matrix B representing the gene expression similarity:

B = [b_ij] ∈ {0,1}^(n×n), with b_ij = 1 if node i is among the first 5 nearest neighbors of node j computed from the gene expression feature matrix, and b_ij = 0 otherwise,

where b_ij is the element in row i and column j of the adjacency matrix B of the gene expression similarity network, i and j denote two nodes in the gene expression similarity network, i, j = 1, …, n, and n = 3639 is the number of nodes contained in the gene expression similarity network, the same as the number of nodes in the spatial neighborhood network;
3.3) Based on the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X, form the gene expression similarity network G_2(B, X).
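Under the same assumptions, the gene expression similarity network of steps 3.1)-3.3) reuses the helper above, applied to the gene expression feature matrix X instead of the spatial coordinates:

```python
# Euclidean distances over the 300-dimensional gene features yield the second graph.
B = knn_adjacency(X, k=5)  # adjacency matrix of the gene expression similarity network G_2(B, X)
```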
Step 4, enhancing the edge and node attribute features in the spatial neighborhood network G_1(A, X) and the gene expression similarity network G_2(B, X).
In order to increase the training samples and improve the self-supervision capability of the model, data enhancement needs to be applied to the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network and to the adjacency matrix B and node attribute feature matrix X of the gene expression similarity network.
Referring to FIG. 2, this step is implemented as follows:
4.1) According to each element a_ij in the adjacency matrix A of the spatial neighborhood network, sample an edge mask matrix R̃ ∈ {0,1}^(N×N) from a Bernoulli distribution:

R̃_ij ~ B(1 - p_r) if a_ij = 1, and R̃_ij = 0 if a_ij = 0,

where R̃_ij is the element in row i and column j of the edge mask matrix R̃, p_r = 0.2 is the probability that each edge in the spatial neighborhood network is deleted, i and j denote two nodes in the spatial neighborhood network, i, j = 1, …, n, and n = 3639 is the number of nodes contained in the spatial neighborhood network;
4.2) Multiply the adjacency matrix A of the spatial neighborhood network element-wise with the mask matrix R̃ sampled in 4.1) to obtain the enhanced adjacency matrix A_1:

A_1 = A ⊙ R̃,

where the operator ⊙ denotes element-wise multiplication of the adjacency matrix A of the spatial neighborhood network and the sampled mask matrix R̃, a_ij is the element in row i and column j of A, and R̃_ij is the element in row i and column j of R̃;
4.3) Sample a random vector from the Bernoulli distribution B(1 - p_m) to generate a node feature mask vector m̃ ∈ {0,1}^m with the same dimension as the gene feature vector, where p_m = 0.3 is the probability that each value in a node feature vector of the spatial neighborhood network is deleted;
4.4) Multiply the node attribute feature matrix X of the spatial neighborhood network element-wise with the node feature mask vector m̃ generated in 4.3) to obtain the enhanced node attribute feature matrix X_1:

X_1 = [x_1 ⊙ m̃; x_2 ⊙ m̃; …; x_n ⊙ m̃]^T,

where [;] denotes the concatenation operation, the i-th row of X_1 represents the gene feature vector of the spatial neighborhood network at node i, i = 1, …, n, and n = 3639 is the number of nodes of the spatial neighborhood network;
4.5) According to each element b_ij in the adjacency matrix B of the gene expression similarity network, sample an edge mask matrix R ∈ {0,1}^(N×N) from a Bernoulli distribution:

R_ij ~ B(1 - p_r') if b_ij = 1, and R_ij = 0 if b_ij = 0,

where R_ij is the element in row i and column j of the edge mask matrix R, p_r' = 0.2 is the probability that each edge in the gene expression similarity network is deleted, i and j denote two nodes in the gene expression similarity network, i, j = 1, …, n, and n = 3639 is the number of nodes contained in the gene expression similarity network, consistent with the number of nodes in the spatial neighborhood network;
4.6) Multiply the adjacency matrix B of the gene expression similarity network element-wise with the mask matrix R sampled in 4.5) to obtain the adjacency matrix B_1 of the enhanced gene expression similarity network:

B_1 = B ⊙ R,

where the operator ⊙ denotes element-wise multiplication of the adjacency matrix B of the gene expression similarity network and the sampled mask matrix R, b_ij is the element in row i and column j of B, and R_ij is the element in row i and column j of R;
4.7) Sample a random vector from the Bernoulli distribution B(1 - p_m') to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m' = 0.3 is the probability that each value in a node feature vector of the gene expression similarity network is deleted;
4.8) Multiply the node attribute feature matrix X of the gene expression similarity network element-wise with the node feature mask vector m generated in 4.7) to obtain the node attribute feature matrix X_2 of the enhanced gene expression similarity network:

X_2 = [x_1 ⊙ m; x_2 ⊙ m; …; x_n ⊙ m]^T,

where [;] denotes the concatenation operation, and the i-th row of X_2 represents the gene feature vector at node i in the gene expression similarity network.
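The Bernoulli masking of steps 4.1)-4.8) amounts to element-wise products with sampled 0/1 masks; a minimal NumPy sketch is given below, assuming A, B and X from the previous steps, with function names of our own choosing. Multiplying the full adjacency matrix by a Bernoulli keep-matrix is equivalent to sampling only at existing edges, since zero entries stay zero:

```python
# Hedged sketch of the edge masking and node-feature masking of steps 4.1)-4.8).
import numpy as np

rng = np.random.default_rng()

def mask_edges(adj: np.ndarray, p_r: float) -> np.ndarray:
    """Keep each existing edge with probability 1 - p_r (steps 4.1)-4.2) and 4.5)-4.6))."""
    keep = rng.binomial(1, 1.0 - p_r, size=adj.shape)  # Bernoulli B(1 - p_r) samples
    return adj * keep                                  # element-wise product, e.g. A ⊙ R̃

def mask_features(feats: np.ndarray, p_m: float) -> np.ndarray:
    """Zero whole feature dimensions with probability p_m (steps 4.3)-4.4) and 4.7)-4.8))."""
    m = rng.binomial(1, 1.0 - p_m, size=feats.shape[1])  # one mask value per feature dimension
    return feats * m                                     # every row x_i is multiplied by the mask

A1, X1 = mask_edges(A, p_r=0.2), mask_features(X, p_m=0.3)  # enhanced G_1(A_1, X_1)
B1, X2 = mask_edges(B, p_r=0.2), mask_features(X, p_m=0.3)  # enhanced G_2(B_1, X_2)
```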
Step 5, constructing the feature extraction model of the spatial transcriptome data.
Referring to FIG. 3, this step is implemented as follows:
5.1) Build an encoder consisting of an input GCN layer and two hidden GCN layers, where the input dimension is the 300-dimensional transcriptome gene feature dimension, the first hidden GCN layer is 256-dimensional, the second hidden GCN layer is 128-dimensional, and a PReLU function is used as the activation function between GCN layers;
5.2) Build a decoder consisting of an input fully connected layer and three hidden fully connected layers in cascade, where the input fully connected layer is 128-dimensional, the first hidden fully connected layer is 128-dimensional, the second hidden fully connected layer is 256-dimensional, the third hidden fully connected layer has the 300-dimensional transcriptome gene feature dimension, and a ReLU function is used as the activation function between fully connected layers;
5.3) Build a projector formed by cascading an input fully connected layer and a hidden fully connected layer, where the input fully connected layer is 128-dimensional and the hidden fully connected layer is 128-dimensional, with no activation function between the layers;
5.4) Cascade the encoder with the decoder and the projector respectively to form the feature extraction model of the spatial transcriptome data;
5.5) Let the loss function L of the feature extraction model be a weighted sum of the contrast loss and the reconstruction loss, expressed as follows:

L = λ_con·L_con + λ_recon·L_recon,

where λ_con = 1 and λ_recon = 0.01 are the hyper-parameters weighting the contrast loss and the reconstruction loss respectively, L_recon denotes the reconstruction loss, and L_con denotes the contrast loss.
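A sketch of the model of steps 5.1)-5.4) using PyTorch and PyTorch Geometric's GCNConv is given below; the class name is our own, and the graph is assumed to be supplied in COO edge_index form rather than as a dense adjacency matrix:

```python
# Hedged sketch of the encoder / decoder / projector of step 5 (PyTorch Geometric assumed).
import torch.nn as nn
from torch_geometric.nn import GCNConv

class FeatureExtractor(nn.Module):
    def __init__(self, in_dim: int = 300):
        super().__init__()
        self.gc1 = GCNConv(in_dim, 256)       # encoder f(.): 300 -> 256 -> 128 GCN layers
        self.gc2 = GCNConv(256, 128)
        self.act = nn.PReLU()                 # PReLU between the GCN layers (step 5.1))
        self.decoder = nn.Sequential(         # decoder h(.): 128 -> 128 -> 256 -> 300 (step 5.2))
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, in_dim))
        self.projector = nn.Sequential(       # projector g(.): two 128-d fully connected
            nn.Linear(128, 128),              # layers with no activation in between (step 5.3))
            nn.Linear(128, 128))

    def forward(self, x, edge_index):
        z = self.gc2(self.act(self.gc1(x, edge_index)), edge_index)  # low-dimensional embedding Z
        return z, self.decoder(z), self.projector(z)                 # Z, X̂, Z'
```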
Step 6, training the feature extraction model of the spatial transcriptome data.
6.1) Extract by the encoder f(·) the node low-dimensional embedding Z_1 of G_1(A_1, X_1) and the node low-dimensional embedding Z_2 of G_2(B_1, X_2):

Z_1 = f(X_1, A_1) = GC_{k+1}(GC_k(X_1, A_1), A_1)
Z_2 = f(X_2, B_1) = GC_{k+1}(GC_k(X_2, B_1), B_1),

where GC_k(·) denotes the k-th GCN layer of the encoder, X_1 and A_1 are respectively the node feature matrix and adjacency matrix of the spatial neighborhood network, X_2 and B_1 are respectively the node feature matrix and adjacency matrix of the gene expression similarity network, and k = 1;
6.2) Use the two low-dimensional embeddings Z_1 and Z_2 obtained in step 6.1) as inputs of the decoder h(·) respectively, obtaining the reconstructed gene expression feature matrix X̂_1 of G_1(A_1, X_1) and the reconstructed gene expression feature matrix X̂_2 of G_2(B_1, X_2):

X̂_1 = h(Z_1)
X̂_2 = h(Z_2);
6.3) Input the two low-dimensional embeddings Z_1 and Z_2 generated in step 6.1) into the projector g(·) respectively, obtaining the low-dimensional embedding Z'_1 of Z_1 for the contrast loss and the low-dimensional embedding Z'_2 of Z_2 for the contrast loss:

Z'_1 = g(Z_1)
Z'_2 = g(Z_2);
6.4) Calculate from the result of step 6.3) the contrast loss l(z'_1i, z'_2i) between each node i and the other nodes k:

l(z'_1i, z'_2i) = -log [ e^(θ(z'_1i, z'_2i)/τ) / ( e^(θ(z'_1i, z'_2i)/τ) + Σ_{k≠i} e^(θ(z'_1i, z'_2k)/τ) + Σ_{k≠i} e^(θ(z'_1i, z'_1k)/τ) ) ],

where θ(·) denotes the cosine similarity, τ is a given hyper-parameter; z'_1i is the vector of the i-th row of Z'_1, representing the output of the projector for pixel point i when G_1(A_1, X_1) is the input; z'_2i is the vector of the i-th row of Z'_2, representing the output of the projector for pixel point i when G_2(B_1, X_2) is the input; i, k = 1, …, n, and n = 3639 is the number of all pixel points in the tissue slice;
6.5) Calculate the contrast loss L_con of the whole model from the contrast loss of each node obtained in step 6.4):

L_con = (1/(2n)) Σ_{i=1}^{n} [ l(z'_1i, z'_2i) + l(z'_2i, z'_1i) ];
6.6) Calculate the reconstruction loss L_recon from the reconstructed gene feature matrices X̂_1 and X̂_2 generated in step 6.2):

L_recon = (1/n) Σ_{i=1}^{n} ( ||x_i - x̂_1i||^2 + ||x_i - x̂_2i||^2 ),

where x_i is the vector of the i-th row of X, representing the gene feature of pixel point i; x̂_1i is the vector of the i-th row of X̂_1, representing the gene feature of pixel point i reconstructed from G_1(A_1, X_1); x̂_2i is the vector of the i-th row of X̂_2, representing the gene feature of pixel point i reconstructed from G_2(B_1, X_2);
6.7) Calculate the loss function L from the contrast loss L_con and the reconstruction loss L_recon:

L = λ_con·L_con + λ_recon·L_recon;

6.8) Update the network parameters of the encoder and the decoder according to the loss function L obtained in step 6.7) until the loss function L converges, obtaining the trained spatial transcriptome feature extraction model.
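Under the same assumptions, the objective of steps 6.4)-6.8) can be sketched as follows; the contrastive term implements the per-node loss of step 6.4) with its inter-view and intra-view negative pairs, and mean squared error stands in for the reconstruction loss of step 6.6):

```python
# Hedged sketch of the contrast loss, the reconstruction loss and one training step.
import torch
import torch.nn.functional as F

def contrast_loss(z1p: torch.Tensor, z2p: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Symmetric cross-view contrastive loss over the projected embeddings Z'_1, Z'_2."""
    def one_side(a, b):
        sim_ab = torch.exp(F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=2) / tau)
        sim_aa = torch.exp(F.cosine_similarity(a.unsqueeze(1), a.unsqueeze(0), dim=2) / tau)
        pos = sim_ab.diag()                                    # positive pairs θ(z'_1i, z'_2i)
        denom = sim_ab.sum(1) + sim_aa.sum(1) - sim_aa.diag()  # plus the negatives over k ≠ i
        return -torch.log(pos / denom).mean()
    return 0.5 * (one_side(z1p, z2p) + one_side(z2p, z1p))

def train_step(model, opt, x1, ei1, x2, ei2, x_target,
               lam_con: float = 1.0, lam_recon: float = 0.01) -> float:
    z1, xhat1, z1p = model(x1, ei1)   # view 1: enhanced spatial neighborhood network
    z2, xhat2, z2p = model(x2, ei2)   # view 2: enhanced gene expression similarity network
    loss = (lam_con * contrast_loss(z1p, z2p)
            + lam_recon * (F.mse_loss(xhat1, x_target) + F.mse_loss(xhat2, x_target)))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```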
Step 7, input the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network without data enhancement into the spatial transcriptome feature extraction model trained in step 6, obtaining the combined low-dimensional embedding Z containing spatial information and gene expression.
Step 8, clustering the combined low-dimensional embedding obtained in step 7 using the Leiden clustering algorithm.
8.1) Calculate the neighbors of each pixel point from the combined low-dimensional embedding Z extracted in step 7, construct a neighborhood graph, and store the neighborhood label l';
8.2) Reduce the dimension of the combined low-dimensional embedding Z by the UMAP algorithm, obtaining a reduced embedding Z';
8.3) Obtain the cluster label l by the Leiden algorithm from the neighborhood label l' of step 8.1) and the reduced embedding Z' of step 8.2);
8.4) Perform UMAP visualization of the cluster label l and the low-dimensional embedding Z', and stain each pixel point on the tissue slice according to its cluster label l; pixel points with the same color are regarded as one domain, realizing the identification of the spatial domain.
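A sketch of step 8 with Scanpy, assuming the combined low-dimensional embedding Z has been attached to the AnnData object from step 1; the key names are illustrative:

```python
# Hedged sketch of step 8: neighborhood graph, UMAP and Leiden clustering on the embedding Z.
import scanpy as sc

adata.obsm["Z_embed"] = Z                         # combined low-dimensional embedding from step 7
sc.pp.neighbors(adata, use_rep="Z_embed")         # 8.1) neighborhood graph over the embedding
sc.tl.umap(adata)                                 # 8.2) UMAP dimension reduction
sc.tl.leiden(adata, key_added="spatial_domain")   # 8.3) Leiden cluster labels
sc.pl.spatial(adata, color="spatial_domain")      # 8.4) stain pixel points by cluster label
sc.pl.umap(adata, color="spatial_domain")         #      UMAP visualization of the clusters
```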
The technical effects of the present invention will be described below in connection with simulation experiments.
First, simulation conditions:
the computer hardware of the simulation experiment is an Intel Core(TM) i7-8700 CPU with 32 GB of memory;
computer software: a Python 3.8 integrated development environment on the WINDOWS 10 system.
Second, simulation content:
simulation 1: the spatial clustering was performed with the present invention and the existing 6 methods SEDR, STAGATE, deepST, scanpy, stlearn, spaGCN on a dataset generated by two spatial transcriptome sequencing means, namely a spatial transcriptome dataset based on 12 slices of 10x visual human dorsal lateral prefrontal cortex layer DLPFC and a spatial transcriptome dataset based on imaged STARmap mouse visual cortex, and the results were as shown in table 1 using the adjusted rand index ARI as an evaluation index for evaluating the spatial clustering results of each method:
table 1 evaluation of the invention and the 6 existing methods in a tagged dataset
[Table 1 is provided as an image in the original publication and is not reproduced here.]
The existing 6 spatial domain identification methods are as follows:
SEDR: Ling S, Huazhu F, et al. Unsupervised Spatially Embedded Deep Representation of Spatial Transcriptomics[J]. bioRxiv, 2021.
STAGATE: Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder[J]. Nature Communications, 2022, 13(1): 1-12.
DeepST: Xu C, Jin X, Wei S, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning[J]. Nucleic Acids Research, 2022.
Scanpy: Wolf F A, Angerer P, Theis F J. SCANPY: large-scale single-cell gene expression data analysis[J]. Genome Biology, 2018, 19(1): 1-5.
stLearn: Pham D, Tan X, Xu J, et al. stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues[J]. bioRxiv, 2020.
SpaGCN: Li M, Hu J, Li X, et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network[J]. Nature Methods, 2021, 18(10): 1342-1351.
As can be seen from Table 1, the present invention achieves better results than the other methods on the 12 DLPFC datasets of 10x Visium, with a higher mean value than the other methods. On the STARmap mouse visual cortex dataset, the performance of the invention and of STAGATE is significantly higher than that of the other methods, and the invention is more accurate than STAGATE. These simulation results show that the invention maintains high accuracy on both in situ sequencing-based and imaging-based datasets and has good generalization capability.
Simulation 2: spatial clustering was performed with the present invention and 3 existing methods, DeepST, SEDR and STAGATE, on the 10x Visium mouse brain slice and human breast cancer spatial transcriptome datasets, using the Silhouette Coefficient score and the Davies-Bouldin score as evaluation indicators of the spatial clustering result of each method; the results are shown in Table 2:
table 2 evaluation of the invention and the prior 3 methods in unlabeled dataset
[Table 2 is provided as an image in the original publication and is not reproduced here.]
As can be seen from Table 2, on the spatial transcriptome dataset of the 10x Visium mouse brain, the performance of the invention and of STAGATE is significantly higher than that of the other methods, with the invention scoring slightly higher than STAGATE. On the spatial transcriptome dataset of 10x Visium human breast cancer, the invention has a clear advantage over the other methods. These simulation results show that on multiple unlabeled datasets requiring fine-grained recognition, the clustering result of the invention is better and the biological meaning in the original samples is better retained.
Simulation 3: the present invention and two existing methods, DeepST and STAGATE, were used to identify spatial domains by spatial clustering on the 10x Visium mouse brain coronal-plane dataset, and each pixel point was stained with its clustering result on the histological section; the results are shown in FIG. 4, where FIG. 4(a) shows the spatial clustering visualization of the present invention, FIG. 4(b) shows that of STAGATE, and FIG. 4(c) shows that of DeepST.
As can be seen from FIG. 4, the existing methods STAGATE and DeepST cannot accurately identify the spatial domains on the mouse brain coronal slice and cannot clearly show the differences between domains, especially in the hippocampal region of this dataset, whereas the spatial clustering result of the present invention accords better with biological meaning. These simulation results show that the features extracted by the invention do not overfit the gene expression profile, which improves the accuracy and robustness of spatial domain identification.

Claims (12)

1. A spatial domain identification method based on spatial transcriptomics data feature extraction, characterized by comprising the following steps:
(1) Simultaneously measuring the gene expression value and the spatial position coordinate of each pixel point in the required tissue slice by using a spatial transcriptome sequencing technology, obtaining spatial transcriptome data comprising a pixel point-gene expression matrix and the spatial position of each pixel point in the tissue slice;
(2) Preprocessing the gene expression matrix of the spatial transcriptome data:
(2a) Deleting genes that are expressed in fewer than three pixel points in the spatial transcriptome data;
(2b) Normalizing the filtered data numerically so that the total count of each cell equals the median of all cells, applying a logarithmic transformation to the normalized data, and standardizing it to zero mean and unit variance;
(2c) Performing Principal Component Analysis (PCA) on the standardized data, extracting the first n principal components, and generating a feature matrix X of gene expression;
(3) Constructing a spatial neighborhood network:
(3a) Calculating the Euclidean distance d between each pair of pixel points in the tissue slice based on the spatial coordinate information;
(3b) Selecting the first k nearest neighbors of each pixel point based on the Euclidean distance d calculated from the spatial coordinates, and constructing the adjacency matrix A representing the spatial information;
(3c) Taking the gene expression feature matrix X generated in step (2) as the node attribute feature matrix;
(3d) Based on the adjacency matrix A representing the spatial information and the node attribute feature matrix X, forming the spatial neighborhood network G_1(A, X);
(4) Constructing a gene expression similarity network:
(4a) Calculating the Euclidean distance d' between the gene expression values of each pair of pixel points in the tissue slice based on the gene expression feature matrix X generated in step (2);
(4b) Based on the Euclidean distance d' calculated from the gene expression values, selecting the first k nearest neighbors of each pixel point and constructing the adjacency matrix B representing the gene expression similarity;
(4c) Based on the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X, forming the gene expression similarity network G_2(B, X);
(5) Data enhancement:
(5a) Masking the edges and node attribute features in the spatial neighborhood network according to a given Bernoulli-distributed edge mask probability p_r and node feature mask probability p_m, obtaining the enhanced spatial neighborhood network G_1(A_1, X_1);
(5b) Masking the edges and node attribute features in the gene expression similarity network according to a given Bernoulli-distributed edge mask probability p_r' and node feature mask probability p_m', obtaining the enhanced gene expression similarity network G_2(B_1, X_2);
(6) Constructing a feature extraction model of spatial transcriptome data consisting of an encoder f(·) cascaded respectively with a decoder h(·) and a projector g(·), and using the contrast loss L_con and the reconstruction loss L_recon as the loss function L;
(7) Training a feature extraction model of the spatial transcriptome data:
(7a) Inputting the adjacency matrix A_1 and node attribute feature matrix X_1 of the enhanced spatial neighborhood network G_1(A_1, X_1) and the adjacency matrix B_1 and node attribute feature matrix X_2 of the enhanced gene expression similarity network G_2(B_1, X_2) into the spatial transcriptome feature extraction model, generating the low-dimensional embeddings Z_1 and Z_2 by the encoder and the reconstructed gene expression feature matrices X̂_1 and X̂_2 by the decoder;
(7b) Computing the contrast loss of the low-dimensional embeddings Z_1 and Z_2 and the reconstruction loss between the reconstructed gene expression feature matrices X̂_1, X̂_2 and the node attribute feature matrix X, and updating the network parameters of the encoder and the decoder according to the calculated losses until the loss function L converges, obtaining the trained spatial transcriptome feature extraction model;
(8) Inputting the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network without data enhancement into the spatial transcriptome feature extraction model trained in step (7b), obtaining a combined low-dimensional embedding Z containing spatial information and gene expression;
(9) Clustering the obtained combined low-dimensional embedding Z using the Leiden clustering algorithm, obtaining regions of consistent gene expression on the tissue slice, i.e., spatial domains.
2. The method of claim 1, wherein the gene expression feature matrix X generated in step (2c) is expressed as follows:

X = [x_1; x_2; …; x_i; …; x_n]^T,

where [;] denotes the concatenation operation, x_i is the gene feature vector of pixel point i, i = 1, …, n, n is the number of all pixel points in the tissue slice, and T denotes the transpose.
3. The method of claim 1, wherein the Euclidean distance d between the spatial positions of the pixel points in the tissue slice is calculated in step (3a) as follows:

d_ij = sqrt((a_i - a_j)^2 + (b_i - b_j)^2),

where (a_i, b_i) and (a_j, b_j) are the spatial coordinates of pixel point i and pixel point j on the tissue slice, respectively.
4. The method of claim 1, wherein the adjacency matrix A of the spatial neighborhood network constructed in step (3b) is expressed as follows:

A = [a_ij] ∈ {0,1}^(n×n), with a_ij = 1 if node i is among the first k nearest neighbors of node j computed from the spatial coordinates, and a_ij = 0 otherwise,

where a_ij is the element in row i and column j of the adjacency matrix A of the spatial neighborhood network, i and j denote two nodes in the spatial neighborhood network, i, j = 1, …, n, and n is the number of nodes contained in the spatial neighborhood network.
5. The method of claim 1, wherein the Euclidean distance d' between the gene expression values of the pixel points in the tissue slice is calculated in step (4a) as follows:

d'_ij = sqrt(Σ_{k=1}^{m} (x_ik - x_jk)^2),

where x_ik and x_jk are the values of the k-th dimension of the gene expression feature vectors of pixel point i and pixel point j respectively, k = 1, …, m, and m is the dimension of each pixel point's gene expression feature vector.
6. The method of claim 1, wherein the adjacency matrix B of the gene expression similarity network constructed in step (4b) is expressed as follows:

B = [b_ij] ∈ {0,1}^(n×n), with b_ij = 1 if node i is among the first k nearest neighbors of node j computed from the gene expression matrix, and b_ij = 0 otherwise,

where b_ij is the element in row i and column j of the adjacency matrix B of the gene expression similarity network, i and j denote two nodes in the gene expression similarity network, i, j = 1, …, n, and n is the number of nodes contained in the gene expression similarity network, the same as the number of nodes in the spatial neighborhood network.
7. The method of claim 1, wherein step (5a) masks the edge and node attribute features in the spatial neighborhood network G_1(A, X) according to probability as follows:
(5a1) According to each element a_ij in the adjacency matrix A of the spatial neighborhood network, sampling an edge mask matrix R̃ ∈ {0,1}^(N×N) from a Bernoulli distribution, expressed as follows:

R̃_ij ~ B(1 - p_r) if a_ij = 1, and R̃_ij = 0 if a_ij = 0,

where R̃_ij is the element in row i and column j of the edge mask matrix R̃, p_r is the probability that each edge in the spatial neighborhood network is deleted, i and j denote two nodes in the spatial neighborhood network, i, j = 1, …, n, and n is the number of nodes contained in the spatial neighborhood network;
(5a2) Multiplying the adjacency matrix A of the spatial neighborhood network element-wise with the mask matrix R̃ sampled in (5a1) to obtain the enhanced adjacency matrix A_1:

A_1 = A ⊙ R̃,

where the operator ⊙ denotes element-wise multiplication of the adjacency matrix A of the spatial neighborhood network and the sampled mask matrix R̃, a_ij is the element in row i and column j of A, and R̃_ij is the element in row i and column j of R̃;
(5a3) Sampling a random vector from the Bernoulli distribution B(1 - p_m) to generate a node feature mask vector m̃ ∈ {0,1}^m with the same dimension as the gene feature vector, where p_m is the probability that each value in a node feature vector of the spatial neighborhood network is deleted;
(5a4) Multiplying the node attribute feature matrix X of the spatial neighborhood network element-wise with the node feature mask vector m̃ generated in (5a3) to obtain the enhanced node attribute feature matrix X_1:

X_1 = [x_1 ⊙ m̃; x_2 ⊙ m̃; …; x_n ⊙ m̃]^T,

where [;] denotes the concatenation operation, and the i-th row of X_1 represents the gene feature vector at node i in the spatial neighborhood network.
8. The method of claim 1, wherein step (5b) masks the edge and node attribute features in the gene expression similarity network G_2(B, X) according to probability as follows:
(5b1) According to each element b_ij in the adjacency matrix B of the gene expression similarity network, sampling an edge mask matrix R ∈ {0,1}^(N×N) from a Bernoulli distribution, expressed as follows:

R_ij ~ B(1 - p_r') if b_ij = 1, and R_ij = 0 if b_ij = 0,

where R_ij is the element in row i and column j of the edge mask matrix R, p_r' is the probability that each edge in the gene expression similarity network is deleted, i and j denote two nodes in the gene expression similarity network, i, j = 1, …, n, and n is the number of nodes contained in the gene expression similarity network;
(5b2) Multiplying the adjacency matrix B of the gene expression similarity network element-wise with the mask matrix R sampled in (5b1) to obtain the adjacency matrix B_1 of the enhanced gene expression similarity network:

B_1 = B ⊙ R,

where the operator ⊙ denotes element-wise multiplication of the adjacency matrix B of the gene expression similarity network and the sampled mask matrix R, b_ij is the element in row i and column j of B, and R_ij is the element in row i and column j of R;
(5b3) Sampling a random vector from the Bernoulli distribution B(1 - p_m') to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m' is the probability that each value in a node feature vector of the gene expression similarity network is deleted;
(5b4) Multiplying the node attribute feature matrix X of the gene expression similarity network element-wise with the node feature mask vector m generated in (5b3) to obtain the node attribute feature matrix X_2 of the enhanced gene expression similarity network:

X_2 = [x_1 ⊙ m; x_2 ⊙ m; …; x_n ⊙ m]^T,

where [;] denotes the concatenation operation, and the i-th row of X_2 represents the gene feature vector at node i in the gene expression similarity network.
9. The method of claim 1, wherein the encoder, decoder and projector parameters and the loss function of the spatial transcriptome data feature extraction model constructed in step (6) are as follows:
the encoder f(·) is formed by cascading an input graph convolutional network GCN layer and two hidden GCN layers, where the input dimension is the number of transcriptome gene features, the first hidden GCN layer is 256-dimensional, the second hidden GCN layer is 128-dimensional, and a PReLU function is used as the activation function between GCN layers;
the decoder h(·) consists of an input fully connected layer and three hidden fully connected layers in cascade, where the input fully connected layer is 128-dimensional, the first hidden fully connected layer is 128-dimensional, the second hidden fully connected layer is 256-dimensional, the third hidden fully connected layer has the number of transcriptome gene features as its dimension, and a ReLU function is used as the activation function between fully connected layers;
the projector g(·) consists of an input fully connected layer and a hidden fully connected layer in cascade, where the input fully connected layer is 128-dimensional and the hidden fully connected layer is 128-dimensional, with no activation function between the layers;
the loss function, a weighted sum of the contrast loss and the reconstruction loss, is expressed as:

L = λ_con·L_con + λ_recon·L_recon,

where λ_con and λ_recon are hyper-parameters weighting the contrast loss and the reconstruction loss, L_recon denotes the reconstruction loss, and L_con denotes the contrast loss.
10. The method of claim 1, wherein step (7a) generates the low-dimensional embeddings by the encoder and the reconstructed gene expression feature matrices by the decoder as follows:
(7a1) The encoder f(·) extracts the node low-dimensional embedding Z_1 of G_1(A_1, X_1) and the node low-dimensional embedding Z_2 of G_2(B_1, X_2):

Z_1 = f(X_1, A_1) = GC_{k+1}(GC_k(X_1, A_1), A_1)
Z_2 = f(X_2, B_1) = GC_{k+1}(GC_k(X_2, B_1), B_1),

where GC_k(·) denotes the k-th GCN layer of the encoder, X_1 and A_1 are respectively the node feature matrix and adjacency matrix of the spatial neighborhood network, X_2 and B_1 are respectively the node feature matrix and adjacency matrix of the gene expression similarity network, and k = 1;
(7a2) The two low-dimensional embeddings Z_1 and Z_2 obtained in (7a1) are used as inputs of the decoder h(·) respectively, obtaining the reconstructed gene expression feature matrix X̂_1 of G_1(A_1, X_1) and the reconstructed gene expression feature matrix X̂_2 of G_2(B_1, X_2):

X̂_1 = h(Z_1)
X̂_2 = h(Z_2).
11. The method of claim 1, wherein the contrast loss and the reconstruction loss in step (7b) are calculated as follows:
(7b1) Inputting the low-dimensional embeddings generated in step (7a) into the projector g(·) respectively, obtaining the low-dimensional embeddings Z'_1 and Z'_2 of Z_1 and Z_2 for the contrast loss:

Z'_1 = g(Z_1)
Z'_2 = g(Z_2);
(7b2) Calculating from the result of (7b1) the contrast loss l(z'_1i, z'_2i) between each node i and the other nodes k:

l(z'_1i, z'_2i) = -log [ e^(θ(z'_1i, z'_2i)/τ) / ( e^(θ(z'_1i, z'_2i)/τ) + Σ_{k≠i} e^(θ(z'_1i, z'_2k)/τ) + Σ_{k≠i} e^(θ(z'_1i, z'_1k)/τ) ) ],

where θ(·) denotes the cosine similarity, τ is a given hyper-parameter; z'_1i is the vector of the i-th row of Z'_1, representing the output of the projector for pixel point i when G_1(A_1, X_1) is the input; z'_2i is the vector of the i-th row of Z'_2, representing the output of the projector for pixel point i when G_2(B_1, X_2) is the input; i, k = 1, …, n, and n is the number of all pixel points in the tissue slice;
(7b3) Calculating the contrast loss L_con of the whole model from the contrast loss of each node obtained in (7b2):

L_con = (1/(2n)) Σ_{i=1}^{n} [ l(z'_1i, z'_2i) + l(z'_2i, z'_1i) ];
(7b4) Calculating the reconstruction loss L_recon from the reconstructed gene feature matrices X̂_1 and X̂_2 generated in step (7a):

L_recon = (1/n) Σ_{i=1}^{n} ( ||x_i - x̂_1i||^2 + ||x_i - x̂_2i||^2 ),

where x_i is the vector of the i-th row of X, representing the gene feature of pixel point i; x̂_1i is the vector of the i-th row of X̂_1, representing the gene feature of pixel point i reconstructed from G_1(A_1, X_1); x̂_2i is the vector of the i-th row of X̂_2, representing the gene feature of pixel point i reconstructed from G_2(B_1, X_2);
(7b5) Calculating the loss function L from the contrast loss L_con and the reconstruction loss L_recon:

L = λ_con·L_con + λ_recon·L_recon,

where λ_con and λ_recon are hyper-parameters weighting the contrast loss and the reconstruction loss, respectively.
12. The method of claim 1, wherein step (9) uses Leiden clustering algorithm to cluster the combined low-dimensional embedding Z obtained after training, implemented as follows:
(9a) Calculating the neighbors of each pixel point from the combined low-dimensional embedding Z obtained in step (8), constructing a neighborhood graph, and storing the neighborhood label l';
(9b) Reducing the dimension of the combined low-dimensional embedding Z by the UMAP algorithm to obtain a reduced embedding Z';
(9c) Obtaining the cluster label l by the Leiden algorithm from the neighborhood label l' of step (9a) and the reduced embedding Z' of step (9b);
(9d) Performing UMAP visualization of the cluster label l and the low-dimensional embedding Z', and staining each pixel point on the tissue slice according to its cluster label l; pixel points with the same color are regarded as one domain, realizing the identification of the spatial domain.
CN202310097081.0A 2023-02-10 2023-02-10 Spatial domain identification method based on spatial transcriptomics data feature extraction Pending CN116189785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310097081.0A CN116189785A (en) 2023-02-10 2023-02-10 Spatial domain identification method based on spatial transcriptomics data feature extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310097081.0A CN116189785A (en) 2023-02-10 2023-02-10 Spatial domain identification method based on spatial transcriptomics data feature extraction

Publications (1)

Publication Number Publication Date
CN116189785A true CN116189785A (en) 2023-05-30

Family

ID=86435993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310097081.0A Pending CN116189785A (en) 2023-02-10 2023-02-10 Spatial domain identification method based on spatial transcriptomics data feature extraction

Country Status (1)

Country Link
CN (1) CN116189785A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118016149A (en) * 2024-04-09 2024-05-10 太原理工大学 Spatial domain identification method for integrating space transcriptome multi-mode information

Similar Documents

Publication Publication Date Title
Xue et al. An application of transfer learning and ensemble learning techniques for cervical histopathology image classification
CN113706487A (en) Multi-organ segmentation method based on self-supervision characteristic small sample learning
Hou Breast cancer pathological image classification based on deep learning
Xu et al. Computerized spermatogenesis staging (CSS) of mouse testis sections via quantitative histomorphological analysis
Shubham et al. Identify glomeruli in human kidney tissue images using a deep learning approach
WO2021073279A1 (en) Staining normalization method and system for digital pathological image, electronic device and storage medium
CN115497623A (en) Lung cancer prognosis prediction system based on image, pathology and gene multiomics
Yu et al. A recognition method of soybean leaf diseases based on an improved deep learning model
Liao et al. A segmentation method for lung parenchyma image sequences based on superpixels and a self-generating neural forest
Shallu et al. Automatic magnification independent classification of breast cancer tissue in histological images using deep convolutional neural network
Routray et al. Ensemble Learning with Symbiotic Organism Search Optimization Algorithm for Breast Cancer Classification & Risk Identification of Other Organs on Histopathological Images
CN117253550A (en) Spatial transcriptome data clustering method
CN116189785A (en) Spatial domain identification method based on spatial transcriptomics data feature extraction
Hacking et al. Deep learning for the classification of medical kidney disease: a pilot study for electron microscopy
CN117036894B (en) Multi-mode data classification method and device based on deep learning and computer equipment
Yu et al. Pyramid multi-loss vision transformer for thyroid cancer classification using cytological smear
Ke et al. Mine local homogeneous representation by interaction information clustering with unsupervised learning in histopathology images
Taheri et al. A Comprehensive Study on Classification of Breast Cancer Histopathological Images: Binary Versus Multi-Category and Magnification-Specific Versus Magnification-Independent
Liu et al. TSDLPP: a novel two-stage deep learning framework for prognosis prediction based on whole slide histopathological images
Martin et al. A graph based neural network approach to immune profiling of multiplexed tissue samples
Su et al. Whole slide cervical image classification based on convolutional neural network and random forest
Chen et al. MSCCNet: Multi-Scale Convolution-Capsule Network for Cervical Cell Classification
Yuan [Retracted] Image Processing Method Based on FGCA and Artificial Neural Network
Li et al. BIS5k: a large-scale dataset for medical segmentation task based on HE-staining images of breast cancer
Qi et al. Discrete Wavelet Transform-Based CNN for Breast Cancer Classification from Histopathology Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination