CN114944194A - Method and system for deducing cell subset expression mode in space transcriptome - Google Patents
- Publication number
- CN114944194A (application number CN202210552099.0A)
- Authority
- CN
- China
- Prior art keywords
- expression
- cell
- spatial
- transcriptome
- distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention discloses a method and a system for inferring the expression pattern of cell subsets in a spatial transcriptome, and relates to the technical field of spatial transcriptome sequencing data analysis in bioinformatics. The method comprises: performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices; normalizing and standardizing the cell subset expression matrices; constructing a variational neural network to learn the latent variable distribution of each cell subset in the scRNA-seq data set; sampling from the trained latent variable distribution to generate the expression pattern of each cell subset; and deconvoluting the expression patterns of all spatial domains in a spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains. The invention reduces the dimensionality of the single-cell reference data required by spatial transcriptome deconvolution methods while retaining a large amount of relevant information, which improves the running speed and accuracy of the deconvolution method and makes the inferred distribution of cells in the tissue section more accurate.
Description
Technical Field
The invention belongs to the technical field of spatial transcriptome sequencing data analysis in bioinformatics, and particularly relates to a method and a system for inferring the expression pattern of cell subsets in a spatial transcriptome.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Spatial transcriptomics is an interdisciplinary field spanning the life sciences and computer science. Breakthroughs in this area have brought new discoveries to the study of diseases and biological processes. However, current sequencing technologies are limited: spatial transcriptomics techniques can measure where transcripts are produced, but cannot resolve which individual cells produced them. Single-cell RNA sequencing (scRNA-seq), by contrast, obtains the transcripts of each cell but loses the spatial information.
Some analytical tools integrate single-cell data with spatial transcriptome data through deconvolution, i.e. they treat each sampling point (spot or bead) as a mixture of multiple cell types. Such a method builds a model from the expression patterns of the cell subsets in the single-cell data, takes the experimental data of each spot of the spatial transcriptome as input, and outputs the maximum a posteriori estimate of the distribution of the cell subsets in space given the gene expression distribution of the spots.
The inventors found that current deconvolution methods place very high demands on the expression patterns of the cell subsets, while raw scRNA-seq data is large and noisy, which makes the deconvolution method slow and its results mediocre. Down-sampling the data directly loses a large amount of valuable information.
Therefore, it is necessary to develop a method for obtaining the expression pattern of cell subsets to solve the above problems.
Disclosure of Invention
The invention aims to provide a method and a system for inferring the expression pattern of cell subsets in a spatial transcriptome, so that the single-cell reference data required by spatial transcriptome deconvolution methods is reduced in dimensionality while retaining a large amount of relevant information, thereby improving the running speed and accuracy of the deconvolution method.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
In a first aspect, the invention provides a method for inferring the expression pattern of cell subsets in a spatial transcriptome, comprising:
performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
normalizing and standardizing the cell subset expression matrices;
constructing a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
sampling from the trained latent variable distribution to generate the expression pattern of each cell subset;
deconvoluting the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
Preferably, the quality control and preprocessing of the scRNA-seq data set comprises: filtering out cells with low gene counts, genes that are not expressed in any cell, and mitochondrial genes, and selecting the highly expressed genes.
Preferably, the cell subset expression matrix is normalized and standardized as follows:
$X_i = \log(X_i + 1),\ i \in C$
where $X_i$ denotes the expression matrix of cell subset $i$ and $C$ denotes the set of cell subsets; normalization uses log normalization and standardization uses min-max scaling; the resulting expression matrix $X'_i$ takes values in $[0, 1]$.
Preferably, the variational neural network that learns the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set is constructed as follows:
for a preprocessed single-cell transcriptome gene expression matrix $X_i$, the data first passes through an encoder consisting of fully connected layers, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, a decoder consisting of fully connected layers generates the final reference data;
the formulas of the neural network are as follows:
$E = \mathrm{ReLU}(X_i W_E)$
$\mu = \mathrm{ReLU}(X_i W_\mu)$
$\sigma = \mathrm{ReLU}(X_i W_\sigma)$
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$
$D = \mathrm{ReLU}(Z W_D)$
where $E$ and $D$ denote the hidden layers of the encoder and the decoder, respectively; $\mu$ and $\sigma$ denote the parameters of the Gaussian distribution in the latent space; $Z$ denotes the latent variable; and $X'_i$ denotes the reconstructed expression matrix of cell subset $i$.
Preferably, the method further comprises the steps of: setting an activation function, a loss function and a reparameterization method.
Preferably, the loss function expression is:
Preferably, the expression for reparameterizing the latent variable $Z$ is as follows:
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$.
In a second aspect, the present invention provides a system for inferring the expression pattern of cell subsets in a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression pattern of each cell subset;
a deconvolution module configured to: deconvolute the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
The above one or more technical solutions have the following beneficial effects:
By using a variational autoencoder, the invention can accurately obtain the expression pattern of each cell subset in the scRNA-seq data set, so that the spatial transcriptome deconvolution method can accurately obtain the maximum a posteriori estimate of the cell subset distribution in space given the gene expression distribution of a spot.
The invention reduces the dimensionality of the single-cell reference data required by spatial transcriptome deconvolution methods while retaining a large amount of relevant information, thereby improving the running speed and accuracy of the deconvolution method and making the inferred distribution of cells in the tissue section more accurate.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the variational autoencoder of the present invention.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Specific embodiments of the present invention are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the invention with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
This embodiment provides a method for inferring the expression pattern of cell subsets in a spatial transcriptome. The method can be applied in fields such as spatial transcriptomics and single-cell transcriptomics; it uses a variational autoencoder to accurately obtain the expression patterns of the cell subsets, and then uses a deconvolution method to give the maximum a posteriori estimate of the distribution of the cell subsets in space. The method comprises the following steps:
Step 1: perform quality control on the scRNA-seq data set. In this example, the kidney cell data of 18-month-old mice in the Tabula Muris data set is selected; it is an expression matrix of 3138 cells and 20138 genes, denoted X. Quality control is performed on this matrix: cells with low gene counts and genes that are not expressed in any cell are filtered out, and highly expressed genes are selected. After preprocessing, the expression matrix X consists of 2771 cells and 3000 highly variable genes.
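The filtering in step 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `quality_control`, the threshold `min_genes_per_cell`, and the use of per-gene variance to rank genes are hypothetical choices that the embodiment does not specify, and mitochondrial gene filtering is omitted because it requires gene names.

```python
def quality_control(counts, min_genes_per_cell=2, n_top_genes=2):
    """Filter a cells-by-genes count matrix given as a list of rows.

    The thresholds and the variance-based ranking are illustrative assumptions.
    Returns the filtered matrix and the indices of the genes that were kept.
    """
    # 1) Drop cells that express too few genes.
    cells = [row for row in counts
             if sum(1 for v in row if v > 0) >= min_genes_per_cell]
    n_genes = len(counts[0])

    # 2) Drop genes that are not expressed in any remaining cell.
    expressed = [g for g in range(n_genes) if any(row[g] > 0 for row in cells)]

    # 3) Rank the remaining genes by variance and keep the most variable ones.
    def variance(g):
        values = [row[g] for row in cells]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    kept = sorted(sorted(expressed, key=variance, reverse=True)[:n_top_genes])
    return [[row[g] for g in kept] for row in cells], kept
```

In the embodiment the corresponding setting would keep 3000 genes; the toy defaults above are chosen only so that the function can be tried on a small matrix.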
Step 2: normalize and standardize the cell subset expression matrices. Tabula Muris provides the cell subset in C to which each cell belongs. In this example, cell subsets with fewer than 25 cells are excluded, and log normalization followed by min-max standardization is applied to the expression matrix $X_i$ ($i \in C$) of each cell subset, as shown in the equation:
$X_i = \log(X_i + 1),\ i \in C$
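A minimal sketch of step 2 follows. It applies the log transform above and then min-max scaling; the function name `log_minmax` is hypothetical, and taking the minimum and maximum over the whole matrix (rather than per gene) is an assumption, since the text does not state the scaling axis.

```python
import math


def log_minmax(matrix):
    """Apply log(x + 1), then min-max scale all values into [0, 1].

    Scaling uses the global minimum and maximum of the matrix (an assumption).
    """
    logged = [[math.log(v + 1.0) for v in row] for row in matrix]
    lo = min(min(row) for row in logged)
    hi = max(max(row) for row in logged)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in logged]
    return [[(v - lo) / span for v in row] for row in logged]
```

Because the output lies in [0, 1], it is compatible with a sigmoid output layer in the network of the next step.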
Step 3: construct a variational autoencoder (VAE) to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set. In this embodiment, the variational autoencoder is a type of neural network: it learns the cell expression pattern through the connections between nodes, describes the latent variables by a Gaussian distribution, and finally reconstructs the cell subset expression pattern from the latent variables. In this example, the gene expression matrix $X_i$ of the single-cell transcriptome is first passed through an encoder consisting of fully connected layers, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, a decoder consisting of fully connected layers generates the final reference data.
$E = \mathrm{ReLU}(X_i W_E)$
$\mu = \mathrm{ReLU}(X_i W_\mu)$
$\sigma = \mathrm{ReLU}(X_i W_\sigma)$
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$
$D = \mathrm{ReLU}(Z W_D)$
where $E$ and $D$ denote the hidden layers of the encoder and decoder, respectively, which in this embodiment have a dimension of 400; $W_E$ and $W_D$ denote the weight parameters of the corresponding fully connected layers; $\mu$ and $\sigma$ denote the parameters of the Gaussian distribution in the latent space; $Z$ denotes the latent variable, which in this embodiment has a dimension of 20; and $X'_i$ denotes the reconstructed expression matrix of cell subset $i$.
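The forward pass described above can be illustrated with a small pure-Python sketch for a single cell. Several points are assumptions rather than details from the text: the printed formulas apply $W_\mu$ and $W_\sigma$ to $X_i$, whereas this sketch feeds the hidden layer $E$ into them (the conventional arrangement); the output weight `W_out` and all function names are hypothetical; and tiny layer sizes replace the 400 and 20 used in the embodiment so that the example runs quickly.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def init_params(n_genes, n_hidden, n_latent, rng):
    """Small random weights; the parameter names are illustrative."""
    def rand(n_in, n_out):
        return [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return {
        "W_E": rand(n_genes, n_hidden),
        "W_mu": rand(n_hidden, n_latent),
        "W_sigma": rand(n_hidden, n_latent),
        "W_D": rand(n_latent, n_hidden),
        "W_out": rand(n_hidden, n_genes),  # hypothetical output layer
    }


def vae_forward(x, p, rng):
    """Encode one cell, sample a latent vector, and decode it."""
    e = relu(matvec(x, p["W_E"]))              # encoder hidden layer E
    mu = relu(matvec(e, p["W_mu"]))            # assumption: mu computed from E
    sigma = relu(matvec(e, p["W_sigma"]))      # assumption: sigma computed from E
    eps = [rng.gauss(0.0, 1.0) for _ in mu]    # eps ~ Norm(0, 1)
    z = [m + n * s for m, n, s in zip(mu, eps, sigma)]  # Z = mu + eps * sigma
    d = relu(matvec(z, p["W_D"]))              # decoder hidden layer D
    x_rec = sigmoid(matvec(d, p["W_out"]))     # reconstruction in (0, 1)
    return mu, sigma, z, x_rec
```

A call such as `vae_forward(x, init_params(6, 4, 2, random.Random(0)), random.Random(1))` returns the latent parameters, one sampled latent vector, and a reconstruction with the same length as `x`.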
Furthermore, because the values of the standardized expression matrix all lie between 0 and 1, the hidden layers use the ReLU activation function and the output layer uses the sigmoid function. The loss function of the VAE can be expressed as:
$\mathrm{Loss} = -\mathbb{E}_{z \sim q(z|x)}[\log p(x|z)] + \mathrm{KL}(N(\mu, \sigma^2) \,\|\, N(0, I))$
where the first term is also called the reconstruction loss; the model here uses the L2 loss, i.e.:
$\|X_i - X'_i\|^2$
The second term, the KL loss, measures how far the learned latent distribution $N(\mu, \sigma^2)$ deviates from the standard normal prior $N(0, I)$, and in a VAE it can be expressed as:
$\mathrm{KL} = -\frac{1}{2} \sum_j (1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2)$
The final loss function combines the two terms, where $\alpha$ sets the relative weight of the reconstruction loss and the KL loss and is set to 2 in this embodiment. During backpropagation, the latent variable $Z$ must be reparameterized because the sampling operation is not differentiable.
Because $Z \sim N(\mu, \sigma^2)$, it can be written as:
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$. With this technique, the gradient can be propagated back directly through $\mu$ and $\sigma$.
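The two loss terms discussed above can be sketched as follows. This is an illustration under assumptions: the KL term uses the standard closed form for a diagonal Gaussian against $N(0, I)$; the exact position of the weight $\alpha$ in the final loss is not recoverable from the text, so multiplying the reconstruction term by `alpha` is a guess; the constant `tiny` is added only to avoid `log(0)` when a ReLU output makes $\sigma$ zero; and all function names are hypothetical.

```python
import math


def l2_reconstruction(x, x_rec):
    """Squared L2 distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))


def kl_to_standard_normal(mu, sigma, tiny=1e-8):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(
        1.0 + math.log(s * s + tiny) - m * m - s * s
        for m, s in zip(mu, sigma)
    )


def vae_loss(x, x_rec, mu, sigma, alpha=2.0):
    """Weighted sum of both terms; the placement of alpha is an assumption."""
    return alpha * l2_reconstruction(x, x_rec) + kl_to_standard_normal(mu, sigma)
```

When $\mu = 0$ and $\sigma = 1$ the KL term vanishes, which matches the intuition that the latent distribution then coincides with the prior.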
Step 4: sample from the trained latent variable distribution of each cell subset to generate the expression pattern of the cell subset. Specifically, each cell subset with at least 25 cells (those retained in step 2) is used as input to the variational autoencoder. In this example, the maximum number of iterations is set to 1000 and the learning rate to $10^{-3}$; training stops when the KL loss falls below $10^{-5}$. The output is then down-sampled to a dimension of 25, yielding a standard reference cell subset.
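The stopping rule and the final down-sampling of step 4 can be sketched as below. The helper names are hypothetical, and reading "down-sampled to a dimension of 25" as keeping 25 generated expression profiles per cell subset is an interpretation, not something the text states explicitly.

```python
import random


def should_stop(iteration, kl_loss, max_iter=1000, kl_tol=1e-5):
    """Stop at the iteration limit or once the KL loss falls below the tolerance."""
    return iteration >= max_iter or kl_loss < kl_tol


def downsample(profiles, n_keep=25, seed=0):
    """Randomly keep n_keep generated profiles (interpretation of step 4)."""
    if len(profiles) <= n_keep:
        return list(profiles)
    return random.Random(seed).sample(profiles, n_keep)
```

A training loop would call `should_stop` once per iteration and apply `downsample` to the profiles generated for each cell subset after training.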
Step 5: deconvolute the expression patterns of all spots in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain the maximum a posteriori estimate of the distribution of the cell subsets in space. Specifically, in this example the FFPE_Kidney spatial transcriptome data Y, obtained with the 10X Visium sequencing technology, contains 3124 on-tissue spots and 19465 genes, of which 2675 genes are shared with the cell subset reference obtained in step 4. The tissue section is divided into regions by a spatial clustering method; X' and Y are taken as the input of the deconvolution method, and the output is the proportion of each cell subset of X' in each region.
It should be noted that the spatial clustering method may be, for example, Seurat, BayesSpace or SpaGCN, and the deconvolution method may be, for example, SPOTlight, spacexr or stereoScope; these are well-known methods, and all fall within the scope of protection of this patent.
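To make the inputs and outputs of step 5 concrete, the sketch below intersects the gene lists and then estimates cell subset proportions for one spot or region. It does not reproduce SPOTlight, spacexr or stereoScope: a simple non-negative least squares fit with multiplicative updates is swapped in as a stand-in, and the function names and the per-subset signature vectors are hypothetical.

```python
def shared_genes(ref_genes, spatial_genes):
    """Genes present in both the reference and the spatial data, in reference order."""
    spatial = set(spatial_genes)
    return [g for g in ref_genes if g in spatial]


def estimate_proportions(signatures, spot, n_iter=200):
    """Fit spot ~ sum_k w_k * signature_k with w_k >= 0, then normalize w to sum to 1.

    signatures: dict mapping a cell subset name to its expression over shared genes.
    spot: expression of one spot (or region) over the same genes.
    Uses multiplicative updates for non-negative least squares (a stand-in method).
    """
    names = list(signatures)
    n = len(spot)
    w = [1.0 / len(names)] * len(names)
    for _ in range(n_iter):
        # Current fit, computed once per iteration from the old weights.
        fitted = [sum(w[k] * signatures[t][g] for k, t in enumerate(names))
                  for g in range(n)]
        for k, t in enumerate(names):
            num = sum(signatures[t][g] * spot[g] for g in range(n))
            den = sum(signatures[t][g] * fitted[g] for g in range(n)) + 1e-12
            w[k] *= num / den
    total = sum(w) or 1.0
    return {t: w[k] / total for k, t in enumerate(names)}
```

In the setting of this example, the signatures would be derived from X' restricted to the 2675 shared genes, and `spot` would be the expression of a region of Y over the same genes.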
Example two
The object of this embodiment is to provide a computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of the first embodiment when executing the computer program.
Example three
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of the first embodiment.
Example four
The object of this embodiment is to provide a system for inferring the expression pattern of cell subsets in a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on the scRNA-seq data set;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression pattern of each cell subset;
a deconvolution module configured to: deconvolute the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present disclosure.
Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code executable by computing means, whereby the modules or steps may be stored in memory means for execution by the computing means, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A method for inferring the expression pattern of cell subsets in a spatial transcriptome, comprising:
performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
normalizing and standardizing the cell subset expression matrices;
constructing a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
sampling from the trained latent variable distribution to generate the expression pattern of each cell subset;
deconvoluting the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
2. The method for inferring the expression pattern of cell subsets in a spatial transcriptome of claim 1, wherein the quality control and preprocessing of the scRNA-seq data set comprises: filtering out cells with low gene counts, genes that are not expressed in any cell, and mitochondrial genes, and selecting the highly expressed genes.
3. The method for inferring the expression pattern of cell subsets in a spatial transcriptome of claim 1, wherein the cell subset expression matrix is normalized and standardized as follows:
$X_i = \log(X_i + 1),\ i \in C$
where $X_i$ denotes the expression matrix of cell subset $i$ and $C$ denotes the set of cell subsets; normalization uses log normalization and standardization uses min-max scaling; the resulting expression matrix $X'_i$ takes values in $[0, 1]$.
4. The method for inferring the expression pattern of cell subsets in a spatial transcriptome of claim 1, wherein the variational neural network that learns the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set is constructed as follows:
for a preprocessed single-cell transcriptome gene expression matrix $X_i$, the data first passes through an encoder consisting of fully connected layers, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, a decoder consisting of fully connected layers generates the final reference data;
the formulas of the neural network are as follows:
$E = \mathrm{ReLU}(X_i W_E)$
$\mu = \mathrm{ReLU}(X_i W_\mu)$
$\sigma = \mathrm{ReLU}(X_i W_\sigma)$
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$
$D = \mathrm{ReLU}(Z W_D)$
where $E$ and $D$ denote the hidden layers of the encoder and the decoder, respectively; $\mu$ and $\sigma$ denote the parameters of the Gaussian distribution in the latent space; $Z$ denotes the latent variable; and $X'_i$ denotes the reconstructed expression matrix of cell subset $i$.
5. The method of inferring the expression pattern of a subpopulation of cells within a spatial transcriptome of claim 1, further comprising the step of: setting an activation function, a loss function and a reparameterization method.
7. The method for inferring the expression pattern of cell subsets in a spatial transcriptome of claim 5, wherein the expression for reparameterizing the latent variable $Z$ is:
$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$.
8. A system for inferring the expression pattern of cell subsets in a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression pattern of each cell subset;
a deconvolution module configured to: deconvolute the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1 to 7 are performed when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method of any one of the preceding claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210552099.0A CN114944194A (en) | 2022-05-20 | 2022-05-20 | Method and system for deducing cell subset expression mode in space transcriptome |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210552099.0A CN114944194A (en) | 2022-05-20 | 2022-05-20 | Method and system for deducing cell subset expression mode in space transcriptome |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114944194A (en) | 2022-08-26 |
Family
ID=82908702
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210552099.0A (Pending) CN114944194A (en) | 2022-05-20 | 2022-05-20 | Method and system for deducing cell subset expression mode in space transcriptome |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114944194A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110957009A (en) * | 2019-11-05 | 2020-04-03 | 中山大学中山眼科中心 | Single-cell transcriptome missing value filling method based on deep hybrid network |
CN111785329A (en) * | 2020-07-24 | 2020-10-16 | 中国人民解放军国防科技大学 | Single-cell RNA sequencing clustering method based on adversarial autoencoder |
Non-Patent Citations (3)
Title |
---|
ALMA ANDERSSON, ET AL: "Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography", 《COMMUNICATIONS BIOLOGY》, 9 October 2020 (2020-10-09), pages 1 - 8 * |
嵩楠: "Research on dimensionality reduction algorithms for biological data based on intermolecular association relationships", 《China Master's Theses Full-text Database, Basic Sciences》, 15 January 2022 (2022-01-15), pages 11 - 12 * |
苏剑林: "Variational autoencoder (VAE): so that is what it is all about", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/34998569》, 27 March 2018 (2018-03-27), pages 1 - 15 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||