CN116189785A - Spatial domain identification method based on spatial transcriptomics data feature extraction - Google Patents
Spatial domain identification method based on spatial transcriptomics data feature extraction
- Publication number
- CN116189785A (application CN202310097081.0A)
- Authority
- CN
- China
- Prior art keywords
- gene expression
- matrix
- spatial
- network
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a spatial domain identification method based on spatial transcriptome data feature extraction, which mainly solves the problems of overfitting and low spatial domain identification precision in spatial transcriptome data feature extraction in the prior art. The implementation scheme is as follows: preprocessing the gene expression data and spatial information measured in a spatial transcriptome; constructing a gene similarity network and a spatial neighborhood network based on the gene expression feature matrix and the spatial information; performing data enhancement on the two networks; constructing a feature extraction model and inputting the enhanced data into the model to compute the contrast loss and the reconstruction loss; training the model according to the computed loss, then inputting the non-enhanced data into the trained model to obtain a low-dimensional embedding; clustering the low-dimensional embedding completes the spatial domain identification. The method avoids overfitting in the feature extraction process, improves the accuracy of spatial domain identification, and can provide reference data for exploring biological development and treating diseases.
Description
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a spatial domain identification method which can be used for providing reference data for exploring biological development and treating diseases.
Background
In tissue sections, some regions have a similar spatial gene expression profile, forming specific structures or substructures in the tissue. These regions have different functional compartments due to the differences in cell type composition and gene expression, thereby forming spatial domains with specific biologically significant structures. The identification of spatial domains is critical for studying the effects of tissue structure and cell-cell interactions.
Single-cell transcriptome sequencing (scRNA-seq) provides high-resolution gene expression profiles; however, because spatial position information is lost during sample preparation, downstream analysis is limited. Spatial transcriptome sequencing techniques, including imaging techniques based on in situ hybridization and in situ sequencing techniques based on spatial barcodes, provide both the gene expression profile and the spatial location information that are critical to understanding healthy tissue development and the tumor microenvironment in disease. Spatial transcriptome data thus help better describe the spatial organization of cells. Mining regions with similar expression patterns from spatial transcriptome data by clustering to interpret the spatial organization of cells, i.e., identifying spatial domains, is one of the most important tasks of spatial transcriptomics.
Traditional clustering algorithms such as Louvain and K-means cannot effectively utilize the available spatial information, so their clustering results fail to contiguously identify tissue regions with a clear layered structure in a tissue section and cannot provide an accurate reference for downstream analysis. A spatial clustering method for spatial transcriptome data that exploits both the gene expression profile and the spatial position coordinates is therefore required.
In 2021, Jian Hu et al. proposed in Nature Methods a deep learning algorithm called SpaGCN to integrate gene expression, spatial location and histological images through a graph convolutional network. It first constructs a graph representing the relation between spots by combining the spatial positions and the histological image, then aggregates gene expression information from adjacent spots using graph convolution layers, and finally clusters the spots on the aggregated expression matrix with an unsupervised iterative clustering algorithm.
In 2021, Edward Zhao et al. proposed in Nature Biotechnology an algorithm named BayesSpace. BayesSpace models a low-dimensional representation of the gene expression matrix and, through Bayesian statistics, introduces the spatial neighborhood structure into the prior to encourage adjacent pixels to belong to the same cluster, thereby realizing spatial clustering.
In 2022, Shihua Zhang et al. proposed in Nature Communications a new framework, STAGATE, based on a graph attention autoencoder, which automatically learns the weights of inter-node edges through an attention mechanism while embedding spatial information, taking into account the spatial similarity of pixels at spatial domain boundaries.
In 2022, Chang Xu et al. proposed in Nucleic Acids Research a deep neural network framework, DeepST, which uses a neural network to extract histological image features, creates a spatially enhanced gene expression matrix from the gene expression and spatial locations, and combines a graph convolutional network with a denoising autoencoder to generate a latent representation of the enhanced ST data.
These algorithms all suffer from the following disadvantages.
First, because histological features are added, model complexity increases along with the gain in clustering precision, so the memory footprint is large and the running time is long.
Second, some algorithms over-weight the spatial information and thereby over-correct the gene expression features, causing the clustering to overfit, so that some fine regions cannot be identified and the identified spatial domains do not support accurate analysis of biological functions.
Third, the results of repeated runs are unstable and differ greatly, and the results are good only on data sets measured by in situ sequencing-based spatial transcriptome means but poor on imaging-based data sets, so spatial transcriptome data sets cannot be analyzed broadly.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a spatial domain identification method based on spatial transcriptome data feature extraction, in order to extract the joint features of the gene expression profile and the spatial position information in spatial transcriptome data, improve the generalization capability across spatial transcriptomes produced by both sequencing-based and imaging-based means, and complete the accurate analysis of spatial domain biological functions.
The technical scheme of the invention is as follows: preprocessing the gene expression data and spatial information measured in a spatial transcriptome; constructing a gene similarity network and a spatial neighborhood network based on the gene expression feature matrix and the spatial information; performing data enhancement on the gene similarity network and the spatial neighborhood network; constructing a feature extraction model and inputting the enhanced data into the model to compute the contrast loss and the reconstruction loss; training the model according to the computed loss, then inputting the non-enhanced data into the trained model to obtain a low-dimensional embedding; clustering the low-dimensional embedding completes the spatial domain identification. The implementation comprises the following steps:
(1) Simultaneously measuring a gene expression value and a spatial position coordinate of each pixel point in a required tissue slice by using a spatial transcriptome sequencing technology to obtain spatial transcriptome data comprising a pixel point-gene expression matrix and the spatial position of each pixel point in the tissue slice;
(2) Preprocessing a gene expression matrix of the space transcriptome data:
(2a) Deleting genes expressed in fewer than three pixels in the spatial transcriptome data;
(2b) Carrying out numerical normalization on the filtered data so that the count sum of each cell equals the median over all cells, log-transforming the normalized data, and standardizing it to zero mean and unit variance;
(2c) Performing Principal Component Analysis (PCA) on the standardized data, extracting the first n principal components, and generating a feature matrix X of gene expression;
(3) Constructing a spatial neighborhood network:
(3a) Calculating the Euclidean distance d between each pair of pixel points in the tissue slice based on the spatial coordinate information;
(3b) Selecting the first k nearest neighbors of each pixel point based on the Euclidean distance d calculated by the space coordinates, and constructing an adjacency matrix A representing the space information;
(3c) Taking the gene expression characteristic matrix X generated in the step (2) as a node attribute characteristic matrix;
(3d) Based on the adjacency matrix A representing the spatial information and the node attribute feature matrix X, a spatial neighborhood network G_1(A, X) is formed;
(4) Constructing a gene expression similarity network:
(4a) Calculating the Euclidean distance d' between the gene expression values of each pixel point in the tissue slice based on the gene expression characteristic matrix X generated in the step (2);
(4b) Based on Euclidean distance d' between gene expression values, selecting the first k nearest neighbors of each pixel point, and constructing an adjacency matrix B for representing the gene expression similarity;
(4c) Based on the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X, a gene expression similarity network G_2(B, X) is formed;
(5) Data enhancement:
(5a) Masking the edges and node attribute features of the spatial neighborhood network according to a given Bernoulli-distributed edge masking probability p_r and node feature masking probability p_m, obtaining the enhanced spatial neighborhood network G_1(A_1, X_1);
(5b) Masking the edges and node attribute features of the gene expression similarity network according to a given Bernoulli-distributed edge masking probability p_r and node feature masking probability p_m, obtaining the enhanced gene expression similarity network G_2(B_1, X_2);
(6) Constructing a feature extraction model of the spatial transcriptome data, consisting of an encoder f(·) cascaded with a parallel decoder h(·) and a projector g(·), and using the contrast loss L_con and the reconstruction loss L_recon as the loss function L;
(7) Training a feature extraction model of the spatial transcriptome data:
(7a) Inputting the adjacency matrix A_1 and node attribute feature matrix X_1 of the enhanced spatial neighborhood network G_1(A_1, X_1), and the adjacency matrix B_1 and node attribute feature matrix X_2 of the enhanced gene expression similarity network G_2(B_1, X_2), into the spatial transcriptome feature extraction model; the encoder generates the low-dimensional embeddings Z_1 and Z_2, and the decoder generates the reconstructed gene expression feature matrices X̂_1 and X̂_2;
(7b) Computing the contrast loss of the low-dimensional embeddings Z_1 and Z_2 and the reconstruction loss between the reconstructed gene expression feature matrices X̂_1, X̂_2 and the node attribute feature matrix X, and updating the model parameters according to the computed loss until the loss function L converges, obtaining the trained spatial transcriptome feature extraction model;
(8) Inputting the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network without data enhancement into the spatial transcriptome feature extraction model trained in step (7b), obtaining a joint low-dimensional embedding Z containing spatial information and gene expression;
(9) Clustering the obtained joint low-dimensional embedding Z with the Leiden clustering algorithm, obtaining regions with consistent gene expression on the tissue slice, namely the spatial domains.
Compared with the prior art, the invention has the following advantages:
1) Because the spatial neighborhood network and the gene expression similarity network are constructed by combining the spatial information and the gene expression profile of the spatial transcriptome data, the invention balances spatial information and gene expression information better than prior methods, prevents overfitting to the gene expression profile, and improves the accuracy and robustness of spatial domain identification.
2) The invention uses only the gene expression profile and spatial information, without adding histological image features, so the efficiency of the model is improved, the running time is reduced, and the larger data sets generated in the future can be handled.
3) Because contrast loss is introduced to train the low-dimensional embedding, similar samples become more similar and dissimilar samples are pushed apart, which suits the spatial clustering problem well; compared with the prior art, the generalization capability on data sets generated by the two sequencing means of the spatial transcriptome is improved.
4) Because the model architecture cascades the encoder with the decoder, and the contrast loss and the reconstruction loss are considered simultaneously, denoised data can be generated and the biological significance of the original samples is better preserved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of data enhancement in accordance with the present invention;
FIG. 3 is a feature extraction model diagram of spatial transcriptome data constructed in the present invention;
FIG. 4 is a visual comparison of spatial clustering results using the present invention and the existing STAGATE and DeepST methods, respectively.
Detailed Description
Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Existing spatial transcriptome data come from imaging techniques based on in situ hybridization and in situ sequencing techniques based on spatial barcodes; the imaging techniques include STARmap and MERFISH, and the in situ sequencing techniques include Spatial Transcriptomics, 10x Visium and Slide-seq. This example uses the 10x Visium spatial transcriptome dataset of human dorsolateral prefrontal cortex section 151673, which contains 3639 pixels with 33538 genes per pixel.
Referring to fig. 1, the implementation steps of this example are as follows:
1.1) Acquiring the pixel-gene expression matrix data in the 10x Visium spatial transcriptome dataset of the human dorsolateral prefrontal cortex section 151673, and deleting genes expressed in fewer than three pixels to filter the data, leaving 3639 pixels and 19151 genes after filtering;
1.2) Median-normalizing the filtered transcriptome data, that is, scaling each pixel's counts so that its count sum equals the median count sum over all pixels, then log-transforming the median-normalized data and standardizing it to zero mean and unit variance;
1.3) Performing principal component analysis (PCA) on the standardized data and extracting the first 300 principal components to generate the gene expression feature matrix X:

X = [x_1; x_2; …; x_i; …; x_n]^T

where [;] denotes the concatenation operation, x_i is the gene feature vector of pixel i, i = 1..n, n is the number of all pixels in the tissue slice, and T denotes the transpose.
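As a concrete illustration of steps 1.1–1.3, the following minimal numpy sketch runs the same filter → median-normalize → log-transform → standardize → PCA pipeline on a synthetic count matrix. The function name, toy data and PCA-via-SVD route are illustrative assumptions, not part of the patent:

```python
import numpy as np

def preprocess(counts, n_pcs=10):
    """Filter, normalize, log-transform, scale and PCA-reduce a
    pixel-by-gene count matrix, mirroring steps 1.1-1.3."""
    # 1.1: keep only genes expressed in at least 3 pixels
    counts = counts[:, (counts > 0).sum(axis=0) >= 3]
    # 1.2: scale each pixel so its count sum equals the median total count
    totals = counts.sum(axis=1, keepdims=True)
    norm = counts / totals * np.median(totals)
    logged = np.log1p(norm)                              # log transform
    z = (logged - logged.mean(axis=0)) / (logged.std(axis=0) + 1e-8)
    # 1.3: PCA via SVD, keeping the first n_pcs principal components
    zc = z - z.mean(axis=0)
    U, S, Vt = np.linalg.svd(zc, full_matrices=False)
    return zc @ Vt[:n_pcs].T                             # feature matrix X

rng = np.random.default_rng(0)
X = preprocess(rng.poisson(1.0, size=(50, 200)).astype(float), n_pcs=10)
print(X.shape)  # (50, 10)
```

On the real data the patent uses 300 components over 3639 pixels; the toy sizes here are only to keep the sketch self-contained.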
Step 2, constructing the spatial neighborhood network.
2.1) Acquiring the spatial position coordinate data in the 10x Visium spatial transcriptome dataset of the human dorsolateral prefrontal cortex section 151673, and calculating the Euclidean distance d between each pair of pixel points from the spatial coordinate information:

d(i, j) = sqrt( (a_i − a_j)^2 + (b_i − b_j)^2 )

where (a_i, b_i) and (a_j, b_j) are the spatial coordinates of pixel i and pixel j on the tissue slice;
2.2) Selecting the first 5 nearest neighbors of each pixel point based on the Euclidean distance d calculated from the spatial coordinates, and constructing the adjacency matrix A representing the spatial information:

A_ij = 1 if node i is included in the first 5 nearest neighbors of node j calculated from the spatial coordinates (i.e., i and j are adjacent), and A_ij = 0 otherwise,

where A_ij is the element in row i, column j of the adjacency matrix A of the spatial neighborhood network, i and j respectively represent two nodes in the spatial neighborhood network, i, j = 1..n, and n = 3639 is the number of nodes contained in the spatial neighborhood network;
2.3 Taking the gene expression characteristic matrix X generated in the step 1 as a node attribute characteristic matrix;
2.4) Based on the adjacency matrix A representing the spatial information and the node attribute feature matrix X, the spatial neighborhood network G_1(A, X) is formed.
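Steps 2.1–2.4 amount to a k-nearest-neighbor graph under the Euclidean metric, and the same construction covers step 3 when the gene expression feature matrix X replaces the coordinates. A minimal numpy sketch (function name and toy coordinates are illustrative):

```python
import numpy as np

def knn_adjacency(points, k=5):
    """Binary adjacency matrix connecting each point to its k nearest
    Euclidean neighbors; reusable with gene feature vectors as input."""
    # pairwise Euclidean distances
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
    A = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbors
    np.put_along_axis(A, nn, 1.0, axis=1)
    return A

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(30, 2))   # toy spatial coordinates
A = knn_adjacency(coords, k=5)
print(A.sum(axis=1))  # each row contains exactly 5 ones
```

Note that a kNN relation is not symmetric in general; the sketch keeps the directed form, which is sufficient for illustration.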
Step 3, constructing the gene expression similarity network.
3.1) Calculating, based on the gene expression feature matrix X generated in step 1, the Euclidean distance d' between the gene expression values of each pair of pixel points in the tissue section:

d'(i, j) = sqrt( Σ_{k=1}^{m} (x_ik − x_jk)^2 )

where x_ik and x_jk are respectively the values of the k-th dimension of the gene expression feature vectors of pixel i and pixel j, k = 1..m, and m = 300 is the dimension of each pixel's gene expression feature vector;
3.2) Selecting the first 5 nearest neighbors of each pixel point based on the Euclidean distance d' calculated from the gene expression values, and constructing the adjacency matrix B representing the gene expression similarity:

B_ij = 1 if node i is included in the first 5 nearest neighbors of node j calculated from the gene expression matrix (i.e., i and j are adjacent), and B_ij = 0 otherwise,

where B_ij is the element in row i, column j of the adjacency matrix B of the gene expression similarity network, i and j respectively represent two nodes in the gene expression similarity network, i, j = 1..n, and n = 3639 is the number of nodes contained in the gene expression similarity network, the same as the number of nodes in the spatial neighborhood network;
3.3) Based on the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X, the gene expression similarity network G_2(B, X) is formed.
Step 4, data enhancement. In order to increase the training samples and improve the self-supervision capability of the model, data enhancement needs to be performed on the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network, and on the adjacency matrix B and node attribute feature matrix X of the gene expression similarity network.
Referring to fig. 2, the present step is specifically implemented as follows:
4.1) According to each element a_ij in the adjacency matrix A of the spatial neighborhood network, sampling an edge mask matrix M ∈ {0,1}^{N×N} from the Bernoulli distribution:

M_ij ~ B(1 − p_r) if a_ij = 1, and M_ij = 0 if a_ij = 0,

where M_ij is the element in row i, column j of the edge mask matrix M, p_r = 0.2 is the probability that each edge in the spatial neighborhood network is deleted, i and j respectively represent two nodes in the spatial neighborhood network, i, j = 1..n, and n = 3639 is the number of nodes contained in the spatial neighborhood network;
4.2) Multiplying the adjacency matrix A of the spatial neighborhood network element-wise with the sampling matrix generated in 4.1) to obtain the enhanced adjacency matrix A_1:

A_1 = A ⊙ M

where the operator ⊙ denotes element-wise multiplication, a_ij is the element in row i, column j of the adjacency matrix A of the spatial neighborhood network, and M_ij is the element in row i, column j of the sampling matrix M;
4.3) Sampling a random vector from the Bernoulli distribution B(1 − p_m) to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m = 0.3 is the probability that each value in a node feature vector of the spatial neighborhood network is deleted;
4.4) Multiplying the node attribute feature matrix X of the spatial neighborhood network element-wise with the node feature mask vector m generated in 4.3) to obtain the enhanced node attribute feature matrix X_1:

X_1 = [x_1 ⊙ m; x_2 ⊙ m; …; x_n ⊙ m]^T

where [;] denotes the concatenation operation, x_i ⊙ m is the i-th row of X_1, representing the masked gene feature vector on node i of the spatial neighborhood network, i = 1..n, and n = 3639 is the number of nodes of the spatial neighborhood network;
4.5) According to each element b_ij in the adjacency matrix B of the gene expression similarity network, sampling an edge mask matrix R ∈ {0,1}^{N×N} from the Bernoulli distribution:

R_ij ~ B(1 − p_r') if b_ij = 1, and R_ij = 0 if b_ij = 0,

where R_ij is the element in row i, column j of the edge mask matrix R, p_r' = 0.2 is the probability that each edge in the gene expression similarity network is deleted, i and j respectively represent two nodes in the gene expression similarity network, i, j = 1..n, and n = 3639 is the number of nodes contained in the gene expression similarity network, consistent with the number of nodes in the spatial neighborhood network;
4.6) Multiplying the adjacency matrix B of the gene expression similarity network element-wise with the sampling matrix R generated in 4.5) to obtain the adjacency matrix B_1 of the enhanced gene expression similarity network:

B_1 = B ⊙ R

where the operator ⊙ denotes element-wise multiplication, b_ij is the element in row i, column j of the adjacency matrix B of the gene expression similarity network, and R_ij is the element in row i, column j of the sampling matrix R;
4.7) Sampling a random vector from the Bernoulli distribution B(1 − p_m') to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m' = 0.3 is the probability that each value in a node feature vector of the gene expression similarity network is deleted;
4.8) Multiplying the node attribute feature matrix X of the gene expression similarity network element-wise with the node attribute mask vector m generated in 4.7) to obtain the node attribute feature matrix X_2 of the enhanced gene expression similarity network:

X_2 = [x_1 ⊙ m; x_2 ⊙ m; …; x_n ⊙ m]^T

where [;] denotes the concatenation operation, and x_i ⊙ m is the i-th row of X_2, representing the masked gene feature vector on node i of the gene expression similarity network.
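The masking of steps 4.1–4.8 can be sketched as follows, using the spatial network's (A, X) as the example; the same call covers the gene expression similarity network (B, X). This is a minimal numpy sketch under the stated probabilities p_r = 0.2 and p_m = 0.3 (the function name and toy matrices are illustrative):

```python
import numpy as np

def augment(A, X, p_r=0.2, p_m=0.3, seed=0):
    """Data enhancement: keep each existing edge with probability
    1 - p_r (Bernoulli edge mask), and zero out each feature dimension
    with probability p_m via a shared node feature mask vector."""
    rng = np.random.default_rng(seed)
    # edge mask: Bernoulli(1 - p_r) where an edge exists, 0 elsewhere
    M = (A > 0) * rng.binomial(1, 1 - p_r, size=A.shape)
    A1 = A * M
    # node feature mask vector m, same dimension as the feature vectors
    m = rng.binomial(1, 1 - p_m, size=X.shape[1])
    X1 = X * m            # m broadcasts over every node's feature row
    return A1, X1

A = np.ones((6, 6)) - np.eye(6)                  # toy adjacency
X = np.arange(24, dtype=float).reshape(6, 4)     # toy node features
A1, X1 = augment(A, X)
print(int(A1.sum()), "of", int(A.sum()), "edges kept")
```

Each run with a different seed yields a different augmented view, which is what the contrastive objective in step 6 compares.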
Step 5, constructing the feature extraction model of the spatial transcriptome data.
Referring to fig. 3, the specific implementation of this step is as follows:
5.1) Establishing an encoder consisting of an input GCN layer and two hidden GCN layers, where the input dimension is the 300-dimensional transcriptome gene feature dimension, the first hidden GCN layer is 256-dimensional, the second hidden GCN layer is 128-dimensional, and a PReLU function is used as the activation function between GCN layers;
5.2) Establishing a decoder consisting of an input fully connected layer and three hidden fully connected layers, where the input fully connected layer is 128-dimensional, the first hidden fully connected layer is 128-dimensional, the second is 256-dimensional, and the third is the 300-dimensional transcriptome gene feature dimension, with a ReLU function as the activation function between the fully connected layers;
5.3) Establishing a projector formed by cascading an input fully connected layer with a hidden fully connected layer, where the input fully connected layer is 128-dimensional and the hidden fully connected layer is 128-dimensional, with no activation function between the layers;
5.4 Cascading the encoder with the decoder and the projector respectively to form a feature extraction model of the space transcriptome data;
5.5) Let the loss function L of the feature extraction model be a weighted sum of the contrastive loss and the reconstruction loss, expressed as follows:
L = λ_con·L_con + λ_recon·L_recon,
where λ_con = 1 and λ_recon = 0.01 are the hyperparameters weighting the contrastive loss and the reconstruction loss respectively, L_recon denotes the reconstruction loss, and L_con denotes the contrastive loss.
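A shape-level sketch of the encoder/decoder/projector dimensions in 5.1)–5.4), assuming the common normalized-propagation GCN form Z = Â·X·W (the description does not spell out its GCN variant; weights here are random placeholders, so this only checks the stated layer sizes and activation placement):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10                                    # toy number of pixel points

# Random symmetric adjacency with self-loops, symmetrically normalized
# as is standard for GCN propagation: Â = D^{-1/2} (A + I) D^{-1/2}.
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

X = rng.normal(size=(n, 300))             # 300-dim gene features after PCA

prelu = lambda x, a=0.25: np.where(x > 0, x, a * x)  # PReLU with fixed slope
relu = lambda x: np.maximum(x, 0.0)

# Encoder: 300 -> 256 -> 128, two GCN layers with PReLU in between.
W1 = rng.normal(size=(300, 256)) * 0.01
W2 = rng.normal(size=(256, 128)) * 0.01
Z = A_hat @ prelu(A_hat @ X @ W1) @ W2

# Decoder: 128 -> 128 -> 256 -> 300, fully connected layers with ReLU.
V1 = rng.normal(size=(128, 128)) * 0.01
V2 = rng.normal(size=(128, 256)) * 0.01
V3 = rng.normal(size=(256, 300)) * 0.01
X_rec = relu(relu(Z @ V1) @ V2) @ V3

# Projector: 128 -> 128, no activation.
P = rng.normal(size=(128, 128)) * 0.01
Z_proj = Z @ P

assert Z.shape == (n, 128) and X_rec.shape == (n, 300) and Z_proj.shape == (n, 128)
```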
Step 6, training the feature extraction model of the spatial transcriptome data.
6.1) Extract with the encoder f(·) the node low-dimensional embedding Z_1 of G_1(A_1, X_1) and the node low-dimensional embedding Z_2 of G_2(B_1, X_2):
Z_1 = f(X_1, A_1) = GC_{k+1}(GC_k(X_1, A_1), A_1)
Z_2 = f(X_2, B_1) = GC_{k+1}(GC_k(X_2, B_1), B_1),
where GC_k(·) denotes the k-th GCN layer of the encoder, X_1 and A_1 are the node feature matrix and adjacency matrix of the spatial neighborhood network, X_2 and B_1 are the node feature matrix and adjacency matrix of the gene expression similarity network, and k = 1;
6.2) Feed the two low-dimensional embeddings Z_1 and Z_2 obtained in step 6.1) into the decoder h(·) to obtain the reconstructed gene expression feature matrix X̂_1 of G_1(A_1, X_1) and the reconstructed gene expression feature matrix X̂_2 of G_2(B_1, X_2);
6.3) Feed the two low-dimensional embeddings Z_1 and Z_2 generated in step 6.1) into the projector g(·) to obtain the embeddings Z'_1 and Z'_2 used for the contrastive loss:
Z'_1 = g(Z_1)
Z'_2 = g(Z_2);
6.4) Calculate from the result of step 6.3) the contrastive loss l(z'_1i, z'_2i) between each node i and the other nodes k, where θ(·) denotes the cosine similarity, τ is a given hyperparameter, z'_1i is the i-th row of Z'_1, i.e. the projector output for pixel point i with G_1(A_1, X_1) as input, z'_2i is the i-th row of Z'_2, i.e. the projector output for pixel point i with G_2(B_1, X_2) as input, and i, k = 1..n, with n = 3639 the number of pixel points in the tissue slice;
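The per-node formula for l(z'_1i, z'_2i) appears only as an image in the source. The sketch below assumes the standard GRACE-style contrastive loss (the positive pair (z'_1i, z'_2i) against inter-view and intra-view negatives), which is consistent with the cosine similarity θ(·) and temperature τ described here but is a reconstruction, not a verbatim transcription:

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity θ(·,·) between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def node_contrastive_loss(Z1p, Z2p, tau=0.5):
    """Assumed GRACE-style per-node loss l(z'_1i, z'_2i)."""
    between = np.exp(cosine_sim(Z1p, Z2p) / tau)   # inter-view similarities
    within = np.exp(cosine_sim(Z1p, Z1p) / tau)    # intra-view similarities
    pos = np.diag(between)                         # positive pairs (i, i)
    # Denominator: positive pair + inter-view negatives + intra-view
    # negatives (the self-similarity on the diagonal is excluded).
    denom = between.sum(axis=1) + within.sum(axis=1) - np.diag(within)
    return -np.log(pos / denom)

rng = np.random.default_rng(2)
Z1p, Z2p = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
losses = node_contrastive_loss(Z1p, Z2p)
assert losses.shape == (6,) and (losses > 0).all()
```

In this form the model-level loss L_con would average the per-node losses over both view orderings, symmetrizing l(z'_1i, z'_2i) and l(z'_2i, z'_1i).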
6.5) Calculate the contrastive loss L_con of the whole model from the per-node contrastive losses obtained in step 6.4);
6.6) Calculate the reconstruction loss L_recon from the reconstructed gene feature matrices X̂_1 and X̂_2 generated in step 6.2), where x_i, the i-th row of X, denotes the gene feature of pixel point i; x̂_1i, the i-th row of X̂_1, denotes the gene feature of pixel point i reconstructed from G_1(A_1, X_1); and x̂_2i, the i-th row of X̂_2, denotes the gene feature of pixel point i reconstructed from G_2(B_1, X_2);
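The L_recon formula is likewise an image in the source; the following is a sketch under the assumption that it is a mean squared error between each gene feature x_i and its two reconstructions x̂_1i and x̂_2i, averaged over the n pixel points:

```python
import numpy as np

def reconstruction_loss(X, X1_hat, X2_hat):
    # Assumed MSE-style reconstruction loss L_recon: the gene feature of each
    # pixel point is compared with its reconstructions from both views and
    # the squared errors are averaged over all n pixel points.
    return ((np.linalg.norm(X - X1_hat, axis=1) ** 2
             + np.linalg.norm(X - X2_hat, axis=1) ** 2) / 2).mean()

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 7))
noise = 0.1 * rng.normal(size=(5, 7))

L_perfect = reconstruction_loss(X, X, X)            # perfect reconstruction
L_noisy = reconstruction_loss(X, X + noise, X - noise)
assert L_perfect == 0.0 and L_noisy > 0.0
```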
6.7) Calculate the loss function L from the contrastive loss L_con and the reconstruction loss L_recon:
L = λ_con·L_con + λ_recon·L_recon;
6.8) Update the network parameters of the encoder and the decoder according to the loss function L obtained in step 6.7) until L converges, obtaining the trained spatial transcriptome feature extraction model.
Step 8, clustering the combined low-dimensional embedding obtained in step 7 with the Leiden clustering algorithm.
8.1) Compute the neighbors of each pixel point from the combined low-dimensional embedding Z extracted in step 7, construct a neighborhood graph, and store the neighborhood labels l';
8.2) Reduce the dimensionality of the combined low-dimensional embedding Z with the UMAP algorithm to obtain the reduced embedding Z';
8.3) Obtain the cluster labels l with the Leiden algorithm from the neighborhood labels l' of step 8.1) and the reduced embedding Z' of step 8.2);
8.4) Perform UMAP visualization of the cluster labels l and the reduced embedding Z', and stain each pixel point on the tissue slice according to its cluster label l; pixel points of the same color are regarded as one domain, thereby realizing the identification of spatial domains.
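Step 8.1)'s neighborhood-graph construction over the combined embedding can be sketched as a plain k-nearest-neighbor search (toy NumPy version; in practice a library such as scanpy wraps this step together with UMAP and Leiden, and the variable names below are illustrative):

```python
import numpy as np

def knn_graph(Z, k):
    # Build a boolean adjacency of the k nearest neighbors of each row of Z
    # (Euclidean distance in the embedding space), excluding the point itself.
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]         # indices of the k closest points
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(Z.shape[0]), k)
    A[rows, nbrs.ravel()] = True
    return A

rng = np.random.default_rng(4)
Z = rng.normal(size=(8, 3))       # toy combined low-dimensional embedding
A = knn_graph(Z, k=3)
assert A.sum(axis=1).tolist() == [3] * 8   # every pixel point has k neighbors
assert not A.diagonal().any()              # no self-loops
```

The Leiden algorithm then partitions this neighbor graph into communities, which become the cluster labels l.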
The technical effects of the present invention will be described below in connection with simulation experiments.
1. Simulation conditions:
The simulation experiments were run on a computer with an Intel Core(TM) i7-8700 CPU and 32 GB of memory;
Computer software: Python 3.8 integrated development software on the WINDOWS 10 system.
2. Simulation content:
Simulation 1: spatial clustering was performed with the present invention and 6 existing methods (SEDR, STAGATE, DeepST, Scanpy, stLearn, SpaGCN) on datasets generated by two spatial transcriptome sequencing approaches: a spatial transcriptome dataset of 12 slices of the 10x Visium human dorsolateral prefrontal cortex (DLPFC), and an imaging-based STARmap mouse visual cortex spatial transcriptome dataset. The adjusted Rand index (ARI) was used as the evaluation index of each method's spatial clustering result, and the results are shown in Table 1:
Table 1 Evaluation of the invention and the 6 existing methods on labeled datasets
The existing 6 spatial domain identification methods are as follows:
SEDR: Ling S, Huazhu F, et al. Unsupervised Spatially Embedded Deep Representation of Spatial Transcriptomics[J]. bioRxiv, 2021.
STAGATE: Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder[J]. Nature Communications, 2022, 13(1): 1-12.
DeepST: Xu C, Jin X, Wei S, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning[J]. Nucleic Acids Research, 2022.
Scanpy: Wolf F A, Angerer P, Theis F J. SCANPY: large-scale single-cell gene expression data analysis[J]. Genome Biology, 2018, 19(1): 1-5.
stLearn: Pham D, Tan X, Xu J, et al. stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues[J]. bioRxiv, 2020.
SpaGCN: Hu J, Li X, et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network[J]. Nature Methods, 2021, 18(10): 1342-1351.
As can be seen from Table 1, the present invention achieves better results than the other methods on the 12 DLPFC datasets of 10x Visium, with a higher mean ARI. On the STARmap mouse visual cortex dataset, the performance of the invention and of STAGATE is significantly higher than that of the other methods, and the invention is more accurate than STAGATE. These simulation results show that the invention maintains high accuracy on both in-situ-sequencing-based and imaging-based datasets and has good generalization ability.
Simulation 2: the present invention and 3 existing methods (DeepST, SEDR, STAGATE) were used for spatial clustering on the 10x Visium mouse brain slice and human breast cancer spatial transcriptome datasets, with the Silhouette Coefficient score and the Davies-Bouldin (DB) score used as evaluation indices of each method's spatial clustering result; the results are shown in Table 2:
Table 2 Evaluation of the invention and the 3 existing methods on unlabeled datasets
As can be seen from Table 2, on the 10x Visium mouse brain spatial transcriptome dataset the performance of the invention and of STAGATE is significantly higher than that of the other methods, with the invention slightly ahead of STAGATE on both indices. On the 10x Visium human breast cancer spatial transcriptome dataset the invention has a clear advantage over the other methods. These simulation results show that on unlabeled datasets requiring fine-grained recognition, the clustering results of the invention are better and better preserve the biological meaning of the original samples.
Simulation 3: the present invention and two existing methods, DeepST and STAGATE, were used to identify spatial domains by spatial clustering on the 10x Visium mouse brain coronal dataset, and each pixel point was stained with its clustering result on the histological section; the results are shown in fig. 4, where fig. 4(a) shows the spatial clustering visualization of the present invention, fig. 4(b) that of STAGATE, and fig. 4(c) that of DeepST.
As can be seen from fig. 4, the existing methods STAGATE and DeepST cannot accurately identify the spatial domains on the mouse brain coronal slice and cannot clearly delineate the differences between domains, especially in the hippocampal region of this dataset, whereas the spatial clustering result of the present invention better matches the biological structure. These simulation results show that the features extracted by the invention do not overfit the gene expression profile, improving the accuracy and robustness of spatial domain identification.
Claims (12)
1. A spatial domain identification method based on spatial transcriptomics data feature extraction, characterized by comprising the following steps:
(1) Simultaneously measure the gene expression value and spatial position coordinates of each pixel point in the required tissue slice using a spatial transcriptome sequencing technology, obtaining spatial transcriptome data comprising a pixel point-gene expression matrix and the spatial position of each pixel point in the tissue slice;
(2) Preprocess the gene expression matrix of the spatial transcriptome data:
(2a) Delete genes expressed in fewer than three pixel points of the spatial transcriptome data;
(2b) Normalize the filtered data numerically so that the count sum of each cell equals the median over all cells, log-transform the normalized data, and standardize it to zero mean and unit variance;
(2c) Perform principal component analysis (PCA) on the standardized data and extract the first n principal components, generating the gene expression feature matrix X;
(3) Construct a spatial neighborhood network:
(3a) Calculate the Euclidean distance d between the pixel points in the tissue slice from the spatial coordinate information;
(3b) Select the first k nearest neighbors of each pixel point based on the Euclidean distance d computed from the spatial coordinates, and construct the adjacency matrix A representing the spatial information;
(3c) Take the gene expression feature matrix X generated in step (2) as the node attribute feature matrix;
(3d) Form the spatial neighborhood network G_1(A, X) from the adjacency matrix A representing the spatial information and the node attribute feature matrix X;
(4) Construct a gene expression similarity network:
(4a) Calculate the Euclidean distance d' between the gene expression values of the pixel points in the tissue slice based on the gene expression feature matrix X generated in step (2);
(4b) Select the first k nearest neighbors of each pixel point based on the Euclidean distance d' computed from the gene expression values, and construct the adjacency matrix B representing the gene expression similarity;
(4c) Form the gene expression similarity network G_2(B, X) from the adjacency matrix B representing the gene expression similarity and the node attribute feature matrix X;
(5) Data enhancement:
(5a) Mask the edge and node attribute features of the spatial neighborhood network according to a given edge masking probability p_r and node feature masking probability p_m conforming to the Bernoulli distribution, obtaining the enhanced spatial neighborhood network G_1(A_1, X_1);
(5b) Mask the edge and node attribute features of the gene expression similarity network according to a given edge masking probability p_r' and node feature masking probability p_m' conforming to the Bernoulli distribution, obtaining the enhanced gene expression similarity network G_2(B_1, X_2);
(6) Construct a feature extraction model of the spatial transcriptome data consisting of an encoder f(·) cascaded with a decoder h(·) and with a projector g(·) respectively, and use the weighted sum of the contrastive loss L_con and the reconstruction loss L_recon as the loss function L;
(7) Train the feature extraction model of the spatial transcriptome data:
(7a) Input the adjacency matrix A_1 and node attribute feature matrix X_1 of the data-enhanced spatial neighborhood network G_1(A_1, X_1) and the adjacency matrix B_1 and node attribute feature matrix X_2 of the gene expression similarity network G_2(B_1, X_2) into the spatial transcriptome feature extraction model; the encoder generates the low-dimensional embeddings Z_1 and Z_2, and the decoder generates the reconstructed gene expression feature matrices X̂_1 and X̂_2;
(7b) Compute the contrastive loss of the low-dimensional embeddings Z_1 and Z_2 and the reconstruction loss of the reconstructed gene expression feature matrices X̂_1 and X̂_2, and update the network parameters of the encoder and the decoder according to the computed loss until the loss function L converges, obtaining the trained spatial transcriptome feature extraction model;
(8) Input the adjacency matrix A and node attribute feature matrix X of the spatial neighborhood network without data enhancement into the trained spatial transcriptome feature extraction model of step (7b), obtaining the combined low-dimensional embedding Z containing spatial information and gene expression;
(9) Cluster the obtained combined low-dimensional embedding Z with the Leiden clustering algorithm, obtaining regions of consistent gene expression on the tissue slice, i.e. the spatial domains.
2. The method of claim 1, wherein step (2c) generates the gene expression feature matrix X, expressed as follows:
X = [x_1; x_2; …; x_i; …; x_n]^T,
where [;] denotes the concatenation operation, x_i is the gene feature vector of pixel point i, i = 1..n, n is the number of pixel points in the tissue slice, and T denotes transposition.
4. The method of claim 1, wherein the adjacency matrix A of the spatial neighborhood network constructed in step (3b) is represented as follows:
a_ij = 1 if node i is included in the first k nearest neighbors of node j computed from the spatial coordinates (i and j adjacent), and a_ij = 0 otherwise (i and j not adjacent),
where a_ij is the element in the i-th row and j-th column of A, i and j denote two nodes in the spatial neighborhood network, i, j = 1..n, and n denotes the number of nodes contained in the spatial neighborhood network.
5. The method of claim 1, wherein the Euclidean distance d' between the gene expression values of the pixel points in the tissue slice is calculated in step (4a) as follows:
d'(i, j) = sqrt( Σ_{k=1}^{m} (x_ik − x_jk)² ),
where x_ik and x_jk are the values of the k-th dimension of the gene expression feature vectors of pixel points i and j respectively, k = 1..m, and m is the dimension of each pixel point's gene expression feature vector.
6. The method of claim 1, wherein the adjacency matrix B of the gene expression similarity network constructed in step (4b) is represented as follows:
b_ij = 1 if node i is included in the first k nearest neighbors of node j computed from the gene expression matrix (i and j adjacent), and b_ij = 0 otherwise (i and j not adjacent),
where b_ij is the element in the i-th row and j-th column of B, i and j denote two nodes in the gene expression similarity network, i, j = 1..n, and n denotes the number of nodes contained in the gene expression similarity network.
7. The method of claim 1, wherein step (5a) masks the edge and node attribute features of the spatial neighborhood network G_1(A, X) according to probability as follows:
(5a1) For each element a_ij of the adjacency matrix A of the spatial neighborhood network, sample an edge mask matrix R ∈ {0,1}^{N×N} from the Bernoulli distribution, expressed as follows: if a_ij = 1, then R_ij is sampled from the Bernoulli distribution B(1−p_r); if a_ij = 0, then R_ij = 0, where R_ij is the element in the i-th row and j-th column of the edge mask matrix R, p_r is the probability that each edge in the spatial neighborhood network is deleted, i and j denote two nodes in the spatial neighborhood network, i, j = 1..n, and n denotes the number of nodes contained in the spatial neighborhood network;
(5a2) Multiply the adjacency matrix A of the spatial neighborhood network element-wise with the sampling matrix R generated in (5a1) to obtain the enhanced adjacency matrix A_1:
A_1 = A ∘ R,
where the ∘ operator denotes element-wise multiplication of the adjacency matrix A of the spatial neighborhood network and the sampling matrix R, a_ij is the element in the i-th row and j-th column of A, and R_ij is the element in the i-th row and j-th column of R;
(5a3) Sample a random vector from the Bernoulli distribution B(1−p_m) to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m is the probability that each value in a node feature vector of the spatial neighborhood network is deleted;
(5a4) Multiply the node attribute feature matrix X of the spatial neighborhood network element-wise with the node feature mask vector m generated in (5a3) to obtain the enhanced node attribute feature matrix X_1:
X_1 = [x_1 ∘ m; x_2 ∘ m; …; x_n ∘ m]^T,
where [;] denotes the concatenation operation and x_i is the gene feature vector of node i in the spatial neighborhood network.
8. The method of claim 1, wherein step (5b) masks the edge and node attribute features of the gene expression similarity network G_2(B, X) according to probability as follows:
(5b1) For each element b_ij of the adjacency matrix B of the gene expression similarity network, sample an edge mask matrix R ∈ {0,1}^{N×N} from the Bernoulli distribution, expressed as follows: if b_ij = 1, then R_ij is sampled from the Bernoulli distribution B(1−p_r'); if b_ij = 0, then R_ij = 0, where R_ij is the element in the i-th row and j-th column of the edge mask matrix R, p_r' is the probability that each edge in the gene expression similarity network is deleted, i and j denote two nodes in the gene expression similarity network, i, j = 1..n, and n denotes the number of nodes contained in the gene expression similarity network;
(5b2) Multiply the adjacency matrix B of the gene expression similarity network element-wise with the sampling matrix R generated in (5b1) to obtain the adjacency matrix B_1 of the enhanced gene expression similarity network:
B_1 = B ∘ R,
where the ∘ operator denotes element-wise multiplication of the adjacency matrix B of the gene expression similarity network and the sampling matrix R, b_ij is the element in the i-th row and j-th column of B, and R_ij is the element in the i-th row and j-th column of the sampling matrix R;
(5b3) Sample a random vector from the Bernoulli distribution B(1−p_m') to generate a node feature mask vector m with the same dimension as the gene feature vector, where p_m' is the probability that each value in a node feature vector of the gene expression similarity network is deleted;
(5b4) Multiply the node attribute feature matrix X of the gene expression similarity network element-wise with the node feature mask vector m generated in step (5b3) to obtain the node attribute feature matrix X_2 of the enhanced gene expression similarity network:
X_2 = [x_1 ∘ m; x_2 ∘ m; …; x_n ∘ m]^T,
where [;] denotes the concatenation operation and x_i is the gene feature vector of node i in the gene expression similarity network.
9. The method of claim 1, wherein the encoder, decoder and projector parameters and the loss function of the spatial transcriptome data feature extraction model constructed in step (6) are as follows:
the encoder f(·) is formed by the cascade of an input graph convolutional network (GCN) layer and two hidden GCN layers, where the input dimension equals the number of transcriptome gene features, the first hidden GCN layer is 256-dimensional, the second hidden GCN layer is 128-dimensional, and a PReLU function is used as the activation function between GCN layers;
the decoder h(·) consists of an input fully connected layer and three hidden fully connected layers in cascade, where the input layer is 128-dimensional, the first hidden layer is 128-dimensional, the second hidden layer is 256-dimensional, the third hidden layer equals the number of transcriptome gene features, and a ReLU function is used as the activation function between fully connected layers;
the projector g(·) consists of an input fully connected layer and a hidden fully connected layer in cascade, both 128-dimensional, with no activation function between the layers;
the loss function L, a weighted sum of the contrastive loss and the reconstruction loss, is expressed as:
L = λ_con·L_con + λ_recon·L_recon,
where λ_con and λ_recon are the hyperparameters weighting the contrastive loss and the reconstruction loss, L_recon denotes the reconstruction loss, and L_con denotes the contrastive loss.
10. The method of claim 1, wherein step (7a) generates the low-dimensional embeddings by the encoder and reconstructs the gene expression feature matrices by the decoder as follows:
(7a1) The encoder f(·) extracts the node low-dimensional embedding Z_1 of G_1(A_1, X_1) and the node low-dimensional embedding Z_2 of G_2(B_1, X_2):
Z_1 = f(X_1, A_1) = GC_{k+1}(GC_k(X_1, A_1), A_1)
Z_2 = f(X_2, B_1) = GC_{k+1}(GC_k(X_2, B_1), B_1),
where GC_k(·) denotes the k-th GCN layer of the encoder, X_1 and A_1 are the node feature matrix and adjacency matrix of the spatial neighborhood network, X_2 and B_1 are the node feature matrix and adjacency matrix of the gene expression similarity network, and k = 1;
(7a2) The two low-dimensional embeddings Z_1 and Z_2 obtained in (7a1) are fed into the decoder h(·) to obtain the reconstructed gene expression feature matrix X̂_1 of G_1(A_1, X_1) and the reconstructed gene expression feature matrix X̂_2 of G_2(B_1, X_2).
11. The method of claim 1, wherein the contrastive loss and the reconstruction loss in step (7b) are calculated as follows:
(7b1) The low-dimensional embeddings generated in step (7a) are fed into the projector g(·) to obtain the embeddings Z'_1 and Z'_2 of Z_1 and Z_2 used for the contrastive loss:
Z'_1 = g(Z_1)
Z'_2 = g(Z_2);
(7b2) The contrastive loss l(z'_1i, z'_2i) between each node i and the other nodes k is calculated from the result of (7b1), where θ(·) denotes the cosine similarity, τ is a given hyperparameter, z'_1i is the i-th row of Z'_1, i.e. the projector output for pixel point i with G_1(A_1, X_1) as input, z'_2i is the i-th row of Z'_2, i.e. the projector output for pixel point i with G_2(B_1, X_2) as input, and i, k = 1..n, with n the number of pixel points in the tissue slice;
(7b3) The contrastive loss L_con of the whole model is calculated from the per-node contrastive losses obtained in (7b2);
(7b4) The reconstruction loss L_recon is calculated from the reconstructed gene feature matrices X̂_1 and X̂_2 generated in step (7a), where x_i, the i-th row of X, denotes the gene feature of pixel point i; x̂_1i, the i-th row of X̂_1, denotes the gene feature of pixel point i reconstructed from G_1(A_1, X_1); and x̂_2i, the i-th row of X̂_2, denotes the gene feature of pixel point i reconstructed from G_2(B_1, X_2);
(7b5) The loss function L is calculated from the contrastive loss L_con and the reconstruction loss L_recon:
L = λ_con·L_con + λ_recon·L_recon,
where λ_con and λ_recon are the hyperparameters weighting the contrastive loss and the reconstruction loss respectively.
12. The method of claim 1, wherein step (9) clusters the combined low-dimensional embedding Z obtained after training with the Leiden clustering algorithm, implemented as follows:
(9a) Compute the neighbors of each pixel point from the combined low-dimensional embedding Z extracted in step (8), construct a neighborhood graph, and store the neighborhood labels l';
(9b) Reduce the dimensionality of the combined low-dimensional embedding Z with the UMAP algorithm to obtain the reduced embedding Z';
(9c) Obtain the cluster labels l with the Leiden algorithm from the neighborhood labels l' of step (9a) and the reduced embedding Z' of step (9b);
(9d) Perform UMAP visualization of the cluster labels l and the reduced embedding Z', and stain each pixel point on the tissue slice according to its cluster label l; pixel points of the same color are regarded as one domain, thereby realizing the identification of spatial domains.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310097081.0A CN116189785A (en) | 2023-02-10 | 2023-02-10 | Spatial domain identification method based on spatial transcriptomics data feature extraction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116189785A true CN116189785A (en) | 2023-05-30 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118016149A (en) * | 2024-04-09 | 2024-05-10 | 太原理工大学 | Spatial domain identification method for integrating space transcriptome multi-mode information |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |