CN114944194A - Method and system for deducing cell subset expression mode in space transcriptome - Google Patents


Info

Publication number
CN114944194A
CN114944194A CN202210552099.0A CN202210552099A CN114944194A CN 114944194 A CN114944194 A CN 114944194A CN 202210552099 A CN202210552099 A CN 202210552099A CN 114944194 A CN114944194 A CN 114944194A
Authority
CN
China
Prior art keywords
expression
cell
spatial
transcriptome
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210552099.0A
Other languages
Chinese (zh)
Inventor
刘健
阮志涵
陈娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN202210552099.0A
Publication of CN114944194A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Medical Informatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computing Systems (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and a system for inferring the expression patterns of cell subsets in a spatial transcriptome, and relates to the analysis of spatial transcriptome sequencing data in bioinformatics. The method comprises: performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices; normalizing and standardizing the cell subset expression matrices; constructing a variational neural network to learn the latent variable distribution of each cell subset in the scRNA-seq data set; sampling from the trained latent variable distribution to generate the expression patterns of the cell subsets; and, based on these expression patterns, deconvolving the expression patterns of all spatial domains in the spatial transcriptome tissue section to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains. The invention reduces the dimensionality of the single-cell reference data required by deconvolution methods for spatial transcriptomes while retaining a large amount of relevant information, which improves the running speed and accuracy of deconvolution and makes the estimated distribution of cells in the tissue section more accurate.

Description

Method and system for deducing cell subset expression mode in space transcriptome
Technical Field
The invention belongs to the technical field of spatial transcriptome sequencing data analysis in bioinformatics, and particularly relates to a method and a system for inferring the expression patterns of cell subsets in a spatial transcriptome.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Spatial transcriptomics is an interdisciplinary field between the life sciences and computer science. Breakthroughs in this field have brought new discoveries to the study of diseases and biological processes. However, current sequencing technologies have complementary limitations: spatial transcriptomics can measure where transcripts are produced but cannot resolve which individual cells produced them, whereas single-cell RNA sequencing (scRNA-seq) captures the transcripts of each cell but loses the spatial information.
Some analytical tools integrate single-cell data with spatial transcriptome data through deconvolution, i.e., they treat each sampling point (spot or bead) as a mixture of multiple cell types. Such a method builds a model from the expression patterns of the cell subsets in the single-cell data, takes the experimental data of each spot of the spatial transcriptome as input, and outputs a maximum a posteriori estimate of the spatial distribution of the cell subsets given the gene expression distribution of the spot.
The inventors found that current deconvolution methods place very high demands on the expression patterns of the cell subsets, while raw scRNA-seq data are large and noisy, which makes deconvolution slow and its results mediocre. Down-sampling the data directly can lose a large amount of valuable information.
Therefore, a method for obtaining the expression patterns of cell subsets is needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a method and a system for inferring the expression patterns of cell subsets in a spatial transcriptome, so that the single-cell reference data required by deconvolution methods for spatial transcriptomes are reduced in dimensionality while retaining a large amount of relevant information, thereby improving the running speed and accuracy of deconvolution.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
In a first aspect, the invention provides a method for inferring the expression patterns of cell subsets in a spatial transcriptome, comprising:
performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
normalizing and standardizing the cell subset expression matrices;
constructing a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
sampling from the trained latent variable distribution to generate the expression patterns of the cell subsets;
deconvolving the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
Preferably, the quality control and preprocessing of the scRNA-seq data set comprises: filtering out cells with low gene counts, genes that are not expressed in any cell, and mitochondrial genes, and selecting highly expressed genes.
Preferably, the cell subset expression matrix is normalized and standardized as follows:
$$X_i = \log(X_i + 1), \quad i \in C$$
$$X'_i = \frac{X_i - \min(X_i)}{\max(X_i) - \min(X_i)}$$
where $X_i$ denotes the expression matrix of each cell subset; the normalization uses log normalization and the standardization uses min-max scaling; the resulting expression matrix $X'_i$ takes values in $[0, 1]$.
Preferably, the variational neural network that learns the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set is constructed as follows:
a preprocessed single-cell transcriptome gene expression matrix $X_i$ is first fed into an encoder consisting of a fully connected layer, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, the final reference data are generated by a decoder consisting of a fully connected layer;
the formulas of the neural network are as follows:
$$E = \mathrm{ReLU}(X_i W_E)$$
$$\mu = \mathrm{ReLU}(X_i W_\mu)$$
$$\sigma = \mathrm{ReLU}(X_i W_\sigma)$$
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$$
$$D = \mathrm{ReLU}(Z W_D)$$
$$X'_i = \mathrm{Sigmoid}(D W_{X'})$$
where $E$ and $D$ denote the hidden layers of the encoder and the decoder, respectively; $\mu$ and $\sigma$ are the parameters of the Gaussian distribution in the latent space; $Z$ is the latent variable; and $X'_i$ is the reconstructed expression matrix of cell subset $i$.
Preferably, the method further comprises setting an activation function, a loss function and a reparameterization method.
Preferably, the loss function is:
$$\mathrm{Loss} = \alpha \, \|X_i - X'_i\|^2 + \frac{1}{2} \sum_j \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)$$
where $\alpha$ sets the ratio between $\|X_i - X'_i\|^2$ and $\frac{1}{2} \sum_j ( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 )$, and $j$ indexes the latent dimensions.
Preferably, the latent variable $Z$ is reparameterized as:
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$.
In a second aspect, the present invention provides a system for inferring the expression patterns of cell subsets in a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression patterns of the cell subsets;
a deconvolution module configured to: deconvolve the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
The above one or more technical solutions have the following beneficial effects:
By using a variational autoencoder, the invention accurately obtains the expression pattern of each cell subset in the scRNA-seq data set, so that a deconvolution method for the spatial transcriptome can accurately obtain the maximum a posteriori estimate of the spatial distribution of the cell subsets given the gene expression distribution of a spot.
The invention reduces the dimensionality of the single-cell reference data required by deconvolution methods for spatial transcriptomes while retaining a large amount of relevant information, thereby improving the running speed and accuracy of deconvolution and making the estimated distribution of cells in the tissue section more accurate.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the variational autoencoder of the present invention.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Specific embodiments of the present invention are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the invention in unnecessary detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example one
This embodiment of the invention provides a method for inferring the expression patterns of cell subsets in a spatial transcriptome. The method can be applied in fields such as spatial transcriptomics and single-cell transcriptomics. It uses a variational autoencoder to accurately obtain the expression patterns of the cell subsets and then uses a deconvolution method to give a maximum a posteriori estimate of the spatial distribution of the cell subsets. The method comprises the following steps:
Step 1: Quality control of the scRNA-seq data set. In this example, the kidney cell data of 18-month-old mice in the Tabula-muris data set are selected; the expression matrix, denoted X, consists of 3138 cells and 20138 genes. Quality control is performed on X: cells with low gene counts and genes that are not expressed in any cell are filtered out, and highly expressed genes are selected. After preprocessing, the expression matrix X consists of 2771 cells and 3000 highly variable genes.
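The filtering in step 1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `quality_control`, the `min_genes=200` threshold and the `mt-` prefix for mitochondrial genes are hypothetical, and ranking genes by variance is one common way to pick highly variable genes. Only the target of 3000 genes comes from the text.

```python
from statistics import pvariance

def quality_control(counts, gene_names, min_genes=200, n_top=3000, mito_prefix="mt-"):
    """counts: list of cells, each a list of per-gene counts (hypothetical layout)."""
    # 1. Drop cells in which too few genes are detected.
    cells = [row for row in counts if sum(v > 0 for v in row) >= min_genes]
    # 2. Drop mitochondrial genes and genes not expressed in any remaining cell.
    keep = [
        j for j, name in enumerate(gene_names)
        if not name.lower().startswith(mito_prefix)
        and any(row[j] > 0 for row in cells)
    ]
    # 3. Rank surviving genes by variance across cells and keep the top n_top,
    #    then restore the original gene order.
    keep.sort(key=lambda j: pvariance([row[j] for row in cells]), reverse=True)
    top = sorted(keep[:n_top])
    return [[row[j] for j in top] for row in cells], [gene_names[j] for j in top]

# Toy data: 4 cells x 4 genes, with deliberately small thresholds.
toy_counts = [[5, 0, 3, 0], [0, 0, 0, 0], [1, 0, 9, 0], [4, 0, 1, 7]]
toy_genes = ["Actb", "Gapdh", "Umod", "mt-Co1"]
filtered, kept_genes = quality_control(toy_counts, toy_genes, min_genes=2, n_top=2)
print(kept_genes, filtered)
```

In the toy call, the all-zero cell, the unexpressed gene and the mitochondrial gene are removed before the variance ranking is applied.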
Step 2: Normalize and standardize the cell subset expression matrices. Tabula-muris annotates each cell with the cell subset it belongs to; C denotes the set of these subsets. In this example, cell subsets with fewer than 25 cells are excluded, and log normalization and min-max standardization are applied to the expression matrix $X_i$ ($i \in C$) of each cell subset, as shown in the equations:
$$X_i = \log(X_i + 1), \quad i \in C$$
$$X'_i = \frac{X_i - \min(X_i)}{\max(X_i) - \min(X_i)}$$
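The two transformations of step 2 can be checked on a small matrix. The sketch below assumes that the minimum and maximum are taken over the whole matrix of one cell subset; the text does not say whether scaling is global or per gene, so that choice and the function name `log_min_max` are assumptions.

```python
import math

def log_min_max(matrix):
    """Apply log(x + 1), then min-max scale the whole matrix to [0, 1]."""
    logged = [[math.log(v + 1.0) for v in row] for row in matrix]
    lo = min(min(row) for row in logged)
    hi = max(max(row) for row in logged)
    if hi == lo:  # constant matrix: avoid division by zero
        return [[0.0 for _ in row] for row in logged]
    return [[(v - lo) / (hi - lo) for v in row] for row in logged]

# Counts 0, 1, 3, 7 become log(1), log(2), log(4), log(8) before scaling.
scaled = log_min_max([[0, 1], [3, 7]])
print(scaled)
```

Because log(8) = 3 log(2), the four toy values land at evenly spaced points of the unit interval, which makes the result easy to verify by hand.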
Step 3: Construct a variational autoencoder (VAE) to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set. In this embodiment, the variational autoencoder is a type of neural network: it learns cell expression patterns through the connections between its nodes, describes the latent variables with a Gaussian distribution, and finally reconstructs the cell subset expression pattern from the latent variables. In this example, the gene expression matrix $X_i$ of the single-cell transcriptome is first passed through an encoder consisting of a fully connected layer, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, the final reference data are generated by a decoder consisting of a fully connected layer.
$$E = \mathrm{ReLU}(X_i W_E)$$
$$\mu = \mathrm{ReLU}(X_i W_\mu)$$
$$\sigma = \mathrm{ReLU}(X_i W_\sigma)$$
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$$
$$D = \mathrm{ReLU}(Z W_D)$$
$$X'_i = \mathrm{Sigmoid}(D W_{X'})$$
where $E$ and $D$ denote the hidden layers of the encoder and the decoder, respectively, which have dimension 400 in this embodiment; $W_E$ and $W_D$ denote the weight parameters of the corresponding fully connected layers; $\mu$ and $\sigma$ are the parameters of the Gaussian distribution in the latent space; $Z$ is the latent variable, which has dimension 20 in this embodiment; and $X'_i$ is the reconstructed expression matrix of cell subset $i$.
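A single forward pass through the formulas above can be sketched without any deep learning library. The helper names, the random weight initialization and the toy dimensions (6 genes, hidden size 8, latent size 3 instead of 400 and 20) are assumptions made for illustration; the output weight `w["out"]` is likewise a hypothetical name for the layer that feeds the sigmoid.

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def sigmoid(m):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]

def init(rows, cols, rng):
    # Small random weights; the initialization scheme is an assumption.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def vae_forward(x, w, rng):
    e = relu(matmul(x, w["E"]))          # E = ReLU(X_i W_E)
    mu = relu(matmul(x, w["mu"]))        # mu and sigma are computed from X_i,
    sigma = relu(matmul(x, w["sigma"]))  # exactly as the formulas are printed
    # Z = Sample[Norm(mu, sigma^2)], drawn element by element
    z = [[rng.gauss(m, s) for m, s in zip(mr, sr)] for mr, sr in zip(mu, sigma)]
    d = relu(matmul(z, w["D"]))          # D = ReLU(Z W_D)
    x_rec = sigmoid(matmul(d, w["out"])) # sigmoid output layer, values in (0, 1)
    return e, mu, sigma, z, d, x_rec

rng = random.Random(0)
n_genes, hidden, latent = 6, 8, 3
w = {"E": init(n_genes, hidden, rng), "mu": init(n_genes, latent, rng),
     "sigma": init(n_genes, latent, rng), "D": init(latent, hidden, rng),
     "out": init(hidden, n_genes, rng)}
x = [[rng.random() for _ in range(n_genes)] for _ in range(4)]  # 4 toy cells
x_rec = vae_forward(x, w, rng)[-1]
print(len(x_rec), len(x_rec[0]))
```

The reconstruction has the same shape as the input, and the sigmoid keeps every value inside the range of the min-max scaled data.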
Furthermore, because the values of the standardized expression matrices all lie between 0 and 1, the hidden layers use the ReLU activation function and the output layer uses the sigmoid function. The loss function of the VAE can be expressed as:
$$\mathrm{Loss} = -\mathbb{E}_{z \sim q(z|x)}[\log p(x|z)] + \mathrm{KL}\left(N(\mu, \sigma^2) \,\|\, N(0, I)\right)$$
where the first term is the reconstruction loss. The model here uses the L2 loss, i.e.:
$$\|X_i - X'_i\|^2$$
The second term, the KL loss, measures how far the learned latent distribution $N(\mu, \sigma^2)$ departs from the standard normal prior $N(0, I)$; in a VAE it can be expressed as:
$$\mathrm{KL} = \frac{1}{2} \sum_j \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)$$
where $j$ indexes the latent dimensions. The final loss function is thus:
$$\mathrm{Loss} = \alpha \, \|X_i - X'_i\|^2 + \frac{1}{2} \sum_j \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)$$
where $\alpha$ sets the ratio between the reconstruction loss and the KL loss and is set to 2 in this embodiment. During backpropagation, the sampling operation is not differentiable, so the latent variable $Z$ must be reparameterized.
Because $Z \sim N(\mu, \sigma^2)$, $Z$ can be written as:
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$. With this technique, the gradient can be propagated back directly through $\mu$ and $\sigma$.
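The reparameterized sample and the loss can be written out for a single cell as follows. This is a minimal sketch under assumptions: α is taken to multiply the reconstruction term, the KL term uses the standard closed form for a diagonal Gaussian against N(0, I), and the small constant `tiny` is added only because a ReLU output of σ = 0 would otherwise make the logarithm undefined. The function names are hypothetical.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """Z = mu + eps * sigma with eps ~ Norm(0, 1), one draw per latent dimension."""
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mu, sigma)]

def kl_loss(mu, sigma, tiny=1e-8):
    # Closed-form KL(N(mu, sigma^2) || N(0, I)); `tiny` guards against log(0).
    return 0.5 * sum(m * m + s * s - math.log(s * s + tiny) - 1.0
                     for m, s in zip(mu, sigma))

def vae_loss(x, x_rec, mu, sigma, alpha=2.0):
    # alpha weights the L2 reconstruction term against the KL term (assumed placement).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return alpha * recon + kl_loss(mu, sigma)

rng = random.Random(0)
mu, sigma = [0.3, 0.0], [1.0, 0.5]
z = reparameterize(mu, sigma, rng)
print(z, vae_loss([1.0, 0.0], [0.5, 0.0], mu, sigma))
```

Since the randomness enters only through ε, the sample is a deterministic function of μ and σ for a fixed draw, which is what lets gradients reach both parameters.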
Step 4: Sample from the trained latent variable distribution of each cell subset to generate the expression pattern of that subset. Specifically, each cell subset with more than 25 cells is used as input to the variational autoencoder. In this example, the maximum number of iterations is set to 1000, the learning rate is set to $10^{-3}$, and training stops when the KL loss falls below $10^{-5}$. The output is down-sampled to a dimension of 25 to obtain the standard reference cell subsets.
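The stopping rule and the down-sampling of step 4 can be sketched as a small control loop. Everything model-specific is hidden behind a hypothetical `train_step` callable that performs one optimizer update (where the learning rate of 10^-3 would be used) and returns the current KL loss. Reading "a dimension of 25" as keeping 25 generated profiles per subset is an assumption.

```python
import random

def train_until_converged(train_step, max_iters=1000, kl_tol=1e-5):
    """Run train_step() until the KL loss drops below kl_tol or max_iters is reached.

    Returns the number of iterations actually run.
    """
    for it in range(1, max_iters + 1):
        if train_step() < kl_tol:
            return it
    return max_iters

def downsample_rows(rows, n=25, seed=0):
    """Keep at most n randomly chosen rows (generated expression profiles)."""
    if len(rows) <= n:
        return list(rows)
    return random.Random(seed).sample(rows, n)

# Stand-in for a real training step: the KL loss simply halves on every call.
state = {"kl": 1.0}
def fake_step():
    state["kl"] *= 0.5
    return state["kl"]

print(train_until_converged(fake_step))
print(len(downsample_rows([[i] for i in range(100)])))
```

Keeping the loop separate from the model makes the two stopping conditions from the text, the iteration cap and the KL threshold, easy to see and to test.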
Step 5: Based on the expression patterns of the cell subsets, deconvolve the expression patterns of all spots in the spatial transcriptome tissue section to obtain a maximum a posteriori estimate of the spatial distribution of the cell subsets. Specifically, in this example the FFPE_Kidney spatial transcriptome data Y, obtained with the 10X Visium sequencing technology, contain 3124 spots on tissue and 19465 genes, of which 2675 genes are shared with the cell subset reference obtained in step 4. The tissue section is divided into regions by a spatial clustering method; X' and Y are then given as input to a deconvolution method, which outputs the proportion of each cell subset in X' within each region.
It should be noted that the spatial clustering method may be, for example, Seurat, BayesSpace or SpaGCN, and the deconvolution method may be, for example, SPOTlight, spacexr or stereoscope; these methods are well known, and all fall within the scope of protection of the present patent.
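To make the inputs and outputs of step 5 concrete, the sketch below intersects the gene lists and then estimates subset proportions for one region with a non-negative least-squares fit solved by projected gradient descent. This is a deliberately simple stand-in and not the algorithm of SPOTlight, spacexr or stereoscope; the function names, step size and toy signatures are all assumptions.

```python
def shared_genes(ref_genes, st_genes):
    """Genes present in both the reference and the spatial data, in reference order."""
    st = set(st_genes)
    return [g for g in ref_genes if g in st]

def deconvolve(signatures, mixture, steps=2000, lr=0.01):
    """signatures: {subset: mean expression per shared gene}; mixture: one region's profile.

    Minimizes ||sum_k w_k * signature_k - mixture||^2 subject to w_k >= 0,
    then rescales the weights to proportions that sum to 1.
    """
    names = list(signatures)
    w = [1.0 / len(names)] * len(names)
    for _ in range(steps):
        pred = [sum(w[k] * signatures[n][g] for k, n in enumerate(names))
                for g in range(len(mixture))]
        resid = [p - y for p, y in zip(pred, mixture)]
        grad = [2.0 * sum(r * signatures[n][g] for g, r in enumerate(resid))
                for n in names]
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]  # project onto w >= 0
    total = sum(w)
    return {n: (wk / total if total > 0 else 0.0) for n, wk in zip(names, w)}

# Toy region built as 70% subset A plus 30% subset B over three shared genes.
sig = {"A": [1.0, 0.0, 0.5], "B": [0.0, 1.0, 0.5]}
region = [0.7, 0.3, 0.5]
print(shared_genes(["Umod", "Actb", "Nphs1"], ["Actb", "Umod", "Havcr1"]))
print(deconvolve(sig, region))
```

On the toy region the fit recovers the mixing proportions used to build it, which is the kind of per-region output the step describes.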
Example two
The object of the present embodiment is to provide a computing device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the steps of the method in the first embodiment.
Example three
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of the first embodiment.
Example four
It is an object of this embodiment to provide a system for inferring the expression patterns of cell subsets in a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on an scRNA-seq data set;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression patterns of the cell subsets;
a deconvolution module configured to: deconvolve the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present disclosure.
Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code executable by computing means, whereby the modules or steps may be stored in memory means for execution by the computing means, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A method of inferring an expression pattern of a cell subset within a spatial transcriptome, comprising:
performing quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
normalizing and standardizing the cell subset expression matrices;
constructing a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
sampling from the trained latent variable distribution to generate the expression patterns of the cell subsets;
deconvolving the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
2. The method of claim 1, wherein the quality control and preprocessing of the scRNA-seq data set comprises: filtering out cells with low gene counts, genes that are not expressed in any cell, and mitochondrial genes, and selecting highly expressed genes.
3. The method of inferring the expression pattern of a cell subset within a spatial transcriptome of claim 1, wherein the cell subset expression matrix is normalized and standardized as follows:
$$X_i = \log(X_i + 1), \quad i \in C$$
$$X'_i = \frac{X_i - \min(X_i)}{\max(X_i) - \min(X_i)}$$
where $X_i$ denotes the expression matrix of each cell subset; the normalization uses log normalization and the standardization uses min-max scaling; the resulting expression matrix $X'_i$ takes values in $[0, 1]$.
4. The method of inferring expression patterns of cell subsets within a spatial transcriptome of claim 1, wherein the variational neural network that learns the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set is constructed as follows:
a preprocessed single-cell transcriptome gene expression matrix $X_i$ is first fed into an encoder consisting of a fully connected layer, which outputs $\mu$ and $\sigma$; a latent variable $Z$ is then sampled from the Gaussian distribution $\mathrm{Norm}(\mu, \sigma^2)$; finally, the final reference data are generated by a decoder consisting of a fully connected layer;
the formulas of the neural network are as follows:
$$E = \mathrm{ReLU}(X_i W_E)$$
$$\mu = \mathrm{ReLU}(X_i W_\mu)$$
$$\sigma = \mathrm{ReLU}(X_i W_\sigma)$$
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)]$$
$$D = \mathrm{ReLU}(Z W_D)$$
$$X'_i = \mathrm{Sigmoid}(D W_{X'})$$
where $E$ and $D$ denote the hidden layers of the encoder and the decoder, respectively; $\mu$ and $\sigma$ are the parameters of the Gaussian distribution in the latent space; $Z$ is the latent variable; and $X'_i$ is the reconstructed expression matrix of cell subset $i$.
5. The method of inferring the expression pattern of a cell subset within a spatial transcriptome of claim 1, further comprising the step of: setting an activation function, a loss function and a reparameterization method.
6. The method of inferring expression patterns of cell subsets within a spatial transcriptome of claim 5, wherein the loss function is:
$$\mathrm{Loss} = \alpha \, \|X_i - X'_i\|^2 + \frac{1}{2} \sum_j \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)$$
where $\alpha$ sets the ratio between $\|X_i - X'_i\|^2$ and $\frac{1}{2} \sum_j ( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 )$, and $j$ indexes the latent dimensions.
7. The method of inferring expression patterns of cell subsets within a spatial transcriptome of claim 5, wherein the latent variable $Z$ is reparameterized as:
$$Z = \mathrm{Sample}[\mathrm{Norm}(\mu, \sigma^2)] = \mu + \varepsilon\sigma$$
where $\varepsilon \sim \mathrm{Norm}(0, 1)$.
8. A system for inferring the expression pattern of a cell subset within a spatial transcriptome, comprising:
a quality control and preprocessing module configured to: perform quality control and preprocessing on an scRNA-seq data set to obtain cell subset expression matrices;
a normalization module configured to: normalize and standardize the cell subset expression matrices;
a latent variable distribution learning module configured to: construct a variational neural network to learn the latent variable distribution of each cell subset expression matrix in the scRNA-seq data set;
an expression pattern generation module configured to: sample from the trained latent variable distribution to generate the expression patterns of the cell subsets;
a deconvolution module configured to: deconvolve the expression patterns of all spatial domains in the spatial transcriptome tissue section based on the expression patterns of the cell subsets, to obtain a maximum a posteriori estimate of the distribution of the cell subsets in the spatial domains.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1 to 7 are performed when the program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method of any one of the preceding claims 1 to 7.
CN202210552099.0A 2022-05-20 2022-05-20 Method and system for deducing cell subset expression mode in space transcriptome Pending CN114944194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210552099.0A CN114944194A (en) 2022-05-20 2022-05-20 Method and system for deducing cell subset expression mode in space transcriptome

Publications (1)

Publication Number Publication Date
CN114944194A true CN114944194A (en) 2022-08-26

Family

ID=82908702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210552099.0A Pending CN114944194A (en) 2022-05-20 2022-05-20 Method and system for deducing cell subset expression mode in space transcriptome

Country Status (1)

Country Link
CN (1) CN114944194A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110957009A (en) * 2019-11-05 2020-04-03 Zhongshan Ophthalmic Center, Sun Yat-sen University Single-cell transcriptome missing value filling method based on deep hybrid network
CN111785329A (en) * 2020-07-24 2020-10-16 National University of Defense Technology Single-cell RNA sequencing clustering method based on adversarial autoencoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALMA ANDERSSON, ET AL: "Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography", 《COMMUNICATIONS BIOLOGY》, 9 October 2020 (2020-10-09), pages 1 - 8 *
Song Nan (嵩楠): "Research on dimensionality reduction algorithms for biological data based on intermolecular association relationships", 《China Master's Theses Full-text Database, Basic Sciences》, 15 January 2022 (2022-01-15), pages 11 - 12 *
Su Jianlin (苏剑林): "Variational autoencoder (VAE): so that is what it is", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/34998569》, 27 March 2018 (2018-03-27), pages 1 - 15 *

Similar Documents

Publication Publication Date Title
Li et al. DeepDSC: a deep learning method to predict drug sensitivity of cancer cell lines
Assefa et al. Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data
CN114022693B (en) Single-cell RNA-seq data clustering method based on double self-supervision
Anderson et al. A functional central limit theorem for a Markov-modulated infinite-server queue
CN110060657B (en) SN-based many-to-many speaker conversion method
CN114202072A (en) Expected value estimation method and system under quantum system
Kuznetsov et al. Interpretable feature generation in ECG using a variational autoencoder
Montserrat et al. Class-conditional vae-gan for local-ancestry simulation
Borisyak et al. Machine Learning on data with sPlot background subtraction
CN113449802A (en) Graph classification method and device based on multi-granularity mutual information maximization
Lee et al. NAS-TasNet: neural architecture search for time-domain speech separation
DE112021005739T5 (en) GENERATION OF PEPTIDE-BASED VACCINE
Venkataramanan et al. Identification of hidden Markov models for ion channel currents. III. Bandlimited, sampled data
Rho et al. Nas-vad: Neural architecture search for voice activity detection
CN111312270B (en) Voice enhancement method and device, electronic equipment and computer readable storage medium
CN114944194A (en) Method and system for deducing cell subset expression mode in space transcriptome
Stadlthanner et al. Hybridizing sparse component analysis with genetic algorithms for microarray analysis
Einipour et al. EinImpute: a local and gene-based approach to imputation of dropout events in ScRNA-seq data
Listgarten Analysis of sibling time series data: alignment and difference detection
Zhang et al. Hierarchical model compression via shape-edge representation of feature maps—an enlightenment from the primate visual system
CN113707172A (en) Single-channel voice separation method, system and computer equipment of sparse orthogonal network
Wang et al. scBKAP: a clustering model for single-cell RNA-Seq data based on bisecting K-means
Feng et al. Advancing single-cell RNA-seq data analysis through the fusion of multi-layer perceptron and graph neural network
Pokorny et al. A connectome manipulation framework for the systematic and reproducible study of structure-function relationships through simulations
Humbert et al. Low rank activations for tensor-based convolutional sparse coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination