CN116386729A - scRNA-seq data dimension reduction method based on graph neural network - Google Patents

Info

Publication number
CN116386729A
Authority
CN
China
Prior art keywords
cell
data
neural network
scrna
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211716676.1A
Other languages
Chinese (zh)
Inventor
王树林
孙鸿福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202211716676.1A priority Critical patent/CN116386729A/en
Publication of CN116386729A publication Critical patent/CN116386729A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to data mining in bioinformatics, and in particular to the mining of single-cell RNA sequencing (scRNA-seq) data. Specifically, it provides a method that uses deep learning to compress the dimensionality of single-cell RNA sequencing data and to cluster the result, so that cell populations can be identified effectively. The method comprises: collecting and preprocessing scRNA-seq data; constructing a graph neural network model; reducing the dimension of the preprocessed data with the constructed model; and performing cluster analysis on the dimension-reduced result. The model constrains the data structure, reduces the dimension through graph neural network modules, and retains both the cell-cell relationship and the gene-gene relationship in the dimension-reduction result. Experiments on five real scRNA-seq datasets, with normalized mutual information and the adjusted Rand index as evaluation metrics, show that the method performs well.

Description

scRNA-seq data dimension reduction method based on graph neural network
Technical Field
The present invention relates to data mining in bioinformatics, and in particular to the mining of single-cell RNA sequencing data. Specifically, it relates to a method that identifies cell populations effectively by compressing the dimensionality of single-cell RNA sequencing data and clustering the result.
Background
With the explosive growth of single-cell RNA sequencing (scRNA-seq) technology in recent years, unprecedented opportunities for single-cell transcriptional analysis have emerged. Traditional bulk RNA sequencing methods sequence a mixture of millions of cells. As a result, the measured expression of a gene reflects its average expression across all cells, and heterogeneity between cells is ignored. Unlike bulk RNA-seq, scRNA-seq first isolates individual cells and then sequences thousands of genes per cell. Depending on the sequencing protocol, millions of expression values are collected for each gene, which makes it possible to identify new cell types, determine gene regulation mechanisms, and study cell dynamics during development.
Single-cell RNA sequencing (scRNA-seq) is an ideal method for studying intercellular variation. Conventional dimension reduction techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) have been applied to scRNA-seq data for visualization and downstream analysis, which has significantly improved our understanding of cellular heterogeneity and developmental processes. The recent advent of massively parallel scRNA-seq (e.g. droplet platforms) has enabled the sequencing of millions of cells in complex biological systems, which offers excellent potential for dissecting tissue and cell microenvironments, identifying rare or new cell types, inferring developmental lineages, and elucidating how cells respond to stimuli. However, the data generated by massively parallel scRNA-seq are characterized by high dropout rates, high noise and complex structure, which poses a series of challenges for dimension reduction. In particular, preserving the complex topology between cells is a major challenge.
Over the past few years, a number of dimension reduction methods have been developed or introduced for scRNA-seq data analysis. Recently developed competing methods include DCA, scVI, scDeepCluster, PHATE, SAUCIE, scGNN, ZINB-WaVE and ivis. Among them, deep learning methods show the greatest potential. For example, DCA, scDeepCluster, ivis and SAUCIE adapt the autoencoder to denoise, visualize and cluster scRNA-seq data. However, these deep learning based models only embed the features of individual cells and ignore cell-to-cell relationships, which limits their ability to reveal complex topologies between cells and makes it difficult to elucidate developmental trajectories. The recently proposed graph autoencoder is very promising because it preserves long-distance relationships between data points in the latent space.
In addition, studies have shown that the gene interactions captured in gene regulatory networks or protein-protein interaction (PPI) networks are informative in different biological contexts. Previous studies have also shown that combining scRNA-seq data with prior gene interaction information can lead to a more meaningful understanding of the data. NetNMF-sc is a network-regularized non-negative matrix factorization method designed specifically for scRNA-seq analysis that uses a prior gene network to obtain a more meaningful low-dimensional representation of genes. Correspondingly, scRNA-seq data also contain rich information from which gene-gene interactions can be inferred.
In light of the above, we propose scTPGAE, a graph neural network-based computational method that uses two graph neural networks to retain both the cell-cell relationship and the gene-gene relationship in the dimension-reduction result, in order to achieve better downstream analysis results.
Disclosure of Invention
To address the shortcomings of the existing methods and the complexity of scRNA-seq data, the invention provides a dimension reduction method for scRNA-seq data based on graph neural networks. The method effectively addresses problems of existing dimension reduction methods such as the loss of important information and insufficient feature extraction, retains both the cell-cell relationship and the gene-gene relationship in the dimension-reduction result, and obtains better clustering accuracy. The steps of the method are as follows:
1. data preprocessing
First, we assume that we have a raw scRNA-seq count matrix C from which genes with no counts in any cell have been filtered out. C is a P × N matrix, where P is the total number of genes and N is the total number of cells; C_ij denotes the expression value of gene i in cell j.
In this work, we first preprocess the raw scRNA-seq count data by logarithmic transformation and z-score normalization. The normalized output X is given by:
Figure SMS_1
X = zscore(X′)
where S_j is the size factor of cell j. This preprocessing preserves the effect of differences in data magnitude and converts discrete counts into continuous values, which gives subsequent modeling more flexibility.
In addition to the gene-cell matrix described above, the graph neural networks require a cell-cell relationship graph and a gene-gene interaction network as inputs.
The cell-cell relationship graph is constructed with the K-nearest-neighbor (KNN) algorithm from the Scikit-learn Python package. The default K was set to 35 in this study and was adjusted according to the dataset in our experiments. The resulting adjacency matrix is a 0-1 matrix, where 1 denotes that two cells are connected and 0 denotes that they are not.
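A minimal sketch of this graph construction, using scikit-learn's `kneighbors_graph`, is given below. The source only states that KNN from Scikit-learn is used with K = 35; the Euclidean metric (the library default), the exclusion of self-loops, and the symmetrization step are assumptions, and `build_cell_graph` is a hypothetical helper name.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_cell_graph(X_cells, k=35):
    """Return a binary cell-cell adjacency matrix from a cells x genes matrix."""
    # Each cell is connected to its k nearest neighbours (Euclidean by default).
    A = kneighbors_graph(X_cells, n_neighbors=k, mode="connectivity",
                         include_self=False)
    # ASSUMPTION: make the graph undirected by keeping an edge if either
    # cell lists the other as a neighbour.
    A = A.maximum(A.T)
    return A.toarray().astype(int)

# Toy data: 20 genes x 50 cells, transposed so that rows are cells.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
A = build_cell_graph(X.T, k=5)
```

Note that the expression matrix in the text is genes × cells, so it is transposed before the neighbor search.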
For the gene-gene interaction networks, we collected seven different human gene interaction networks and one mouse gene interaction network to evaluate the performance of scTPGAE with existing data. One of the best-known gene interaction networks is the STRING database, a PPI network that collects and integrates protein-protein association information from a variety of sources, including the literature and experiments. HumanNet is a human functional gene network that integrates multiple types of omics data through a Bayesian statistical framework. HumanNet includes a hierarchy of human gene networks, i.e., human-derived PPIs, co-functional links, co-citation links, and interologs from other species. In particular, we use two versions of HumanNet, HumanNet-CF and HumanNet-PI, which consist of a co-functional network and a PPI network, respectively. FunCoup is a genome-wide functional association network that uses redundancy-weighted Bayesian integration to combine 10 different types of functional association data. GeneMANIA creates a composite gene network by weighting multiple functional genomic datasets. Furthermore, we collected two functional similarity matrices from pgWalk, which were derived from KEGG pathways and Gene Ontology biological processes, respectively. We transform these two similarity matrices into gene networks by filtering out gene pairs whose similarity values are below a threshold (i.e., 0.9). These two networks are referred to as pgWalk-kegg and pgWalk-gobp, respectively.
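The thresholding of a similarity matrix into a gene network can be sketched as follows. The 0.9 threshold comes from the text; removing self-loops and the name `similarity_to_network` are assumptions made for illustration, and the toy similarity values are invented.

```python
import numpy as np

def similarity_to_network(S, threshold=0.9):
    """Keep gene pairs whose similarity is at least `threshold`."""
    S = np.asarray(S, dtype=float)
    # Pairs with similarity below the threshold are filtered out.
    A = (S >= threshold).astype(int)
    # ASSUMPTION: a gene is not linked to itself.
    np.fill_diagonal(A, 0)
    return A

# Toy 3-gene similarity matrix (values are illustrative only).
S = np.array([[1.00, 0.95, 0.20],
              [0.95, 1.00, 0.90],
              [0.20, 0.90, 1.00]])
A_gene = similarity_to_network(S)
```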
2. Construction of a graph neural network for dimension reduction
(1) Graph neural network G1 retaining the cell-cell relationship
A graph autoencoder is an artificial neural network for unsupervised representation learning on graph-structured data. The graph autoencoder has a low-dimensional bottleneck layer and can therefore be used as a dimension-reduction model. Assume that the input is a cell-cell relationship graph with node matrix X and adjacency matrix A. Our joint graph autoencoder has one encoder E for the whole graph and two decoders, D_X and D_A, for nodes and edges, respectively. In practice, we first encode the input graph as the latent variable h = E(X, A), and then decode h into the reconstructed node matrix X_r = D_X(h) and the reconstructed adjacency matrix A_r = D_A(h). The goal of the learning process is to minimize the reconstruction loss
Figure SMS_2
where the weight is a hyperparameter, set to 0.6 in our experiments.
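The loss formula itself is contained in the source figure and cannot be recovered from the text. As a hedged illustration only, the sketch below assumes a weighted sum of squared reconstruction errors for nodes and edges, with the weight (called `lam` here, a hypothetical name) balancing the two terms. This form is an assumption, not the confirmed loss of the invention.

```python
import numpy as np

def reconstruction_loss(X, X_r, A, A_r, lam=0.6):
    """ASSUMED form: lam * ||X - X_r||^2 + (1 - lam) * ||A - A_r||^2."""
    node_term = np.sum((X - X_r) ** 2)   # node (feature) reconstruction error
    edge_term = np.sum((A - A_r) ** 2)   # edge (adjacency) reconstruction error
    return lam * node_term + (1.0 - lam) * edge_term

# Toy example: node error is 4, edge error is 2.
X, X_r = np.zeros((2, 2)), np.ones((2, 2))
A, A_r = np.eye(2), np.zeros((2, 2))
loss = reconstruction_loss(X, X_r, A, A_r, lam=0.6)
```

Under this assumed form, a weight of 0.6 places slightly more emphasis on reconstructing the node features than on reconstructing the graph structure.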
We use the Python package Spektral to implement our model. Many types of graph neural networks can be used as encoders or decoders. To extract the features of each node with the help of its neighbors, we use a graph attention layer in the encoder by default. Other graph neural networks such as GCN, GraphSAGE and TAGCN may also be used as encoders in scTPGAE. The feature decoder D_X is a four-layer fully connected neural network with 64, 256 and 512 nodes in the hidden layers.
The edge decoder consists of one fully connected layer followed by the composition of a quadratic form and an activation:
A_r = D_A(h) = σ(ZZ^T)
where Z = σ(Wh) is the output of the fully connected layer with weight matrix W, and σ(x) = max(0, x) is the rectified linear unit (ReLU).
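The edge decoder formula can be written out in a few lines of NumPy. This is a sketch only: it stores one node per row of h, so the fully connected layer is computed as h @ W rather than Wh, and it omits a bias term. The shapes and the name `edge_decoder` are assumptions.

```python
import numpy as np

def relu(x):
    # sigma(x) = max(0, x)
    return np.maximum(0.0, x)

def edge_decoder(h, W):
    """A_r = sigma(Z Z^T) with Z = sigma(h W); h has one row per node."""
    Z = relu(h @ W)        # fully connected layer followed by ReLU
    return relu(Z @ Z.T)   # quadratic form followed by ReLU

# Toy latent matrix: 6 nodes with 4 latent dimensions, projected to 3.
rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))
W = rng.normal(size=(4, 3))
A_r = edge_decoder(h, W)
```

Because Z is non-negative after the first ReLU, every entry of ZZ^T is already non-negative, so the outer activation leaves the values unchanged in this formulation.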
(2) Graph neural network G2 retaining gene-gene relationship
Note that when a gene interaction network is applied to a dataset, only those interaction pairs in which both interacting genes occur in the dataset are retained, and the remaining pairs are discarded. In other words, the number of interaction pairs in the gene interaction network may differ from dataset to dataset. To capture both regulatory directions and their corresponding strengths for a pair of genes, the gene interaction network is treated as a directed graph. Thus, an edge between genes A and B in an undirected gene network, e.g., the STRING PPI network, is treated as a pair of edges (i.e., the edge from A to B and the edge from B to A).
The graph neural network is constructed in the same way as the graph neural network that retains the cell-cell relationship, except that its input is changed from the cell-cell relationship graph to the gene-gene interaction network (e.g., a PPI network). The interaction relationships between genes can naturally be represented as a graph, and a graph neural network is applied to model these relationships. In the graph convolution layer, each node represents one gene, and an edge between two nodes represents the relationship between the two corresponding genes. The graph representation module is designed as a graph convolution layer that updates each node by aggregating information from its neighboring nodes.
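The filtering and edge-doubling described above can be sketched in plain Python. The helper name `prepare_gene_edges` and the gene symbols in the example are illustrative only; dropping self-interactions is an assumption.

```python
def prepare_gene_edges(undirected_edges, dataset_genes):
    """Filter edges to genes present in the dataset and make them directed."""
    genes = set(dataset_genes)
    directed = set()
    for a, b in undirected_edges:
        # Keep a pair only if both interacting genes occur in the dataset.
        if a in genes and b in genes and a != b:
            directed.add((a, b))  # edge from A to B
            directed.add((b, a))  # edge from B to A
    return sorted(directed)

# Illustrative network and dataset gene list.
network = [("TP53", "MDM2"), ("TP53", "BRCA1"), ("EGFR", "GRB2")]
edges = prepare_gene_edges(network, ["TP53", "MDM2", "EGFR"])
```

In this toy example only the TP53/MDM2 pair survives, because BRCA1 and GRB2 are not in the dataset, which is why the number of interaction pairs varies between datasets.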
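The neighbor aggregation performed by a graph convolution layer can be sketched as below. The source does not specify the propagation rule, so this sketch assumes the widely used symmetric normalization with self-loops, ReLU(D^(-1/2) (A + I) D^(-1/2) H W), and assumes a symmetric adjacency matrix (each undirected edge stored in both directions). `gcn_layer` is a hypothetical name.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution step: aggregate neighbours, then transform."""
    # ASSUMPTION: add self-loops so a node keeps its own features.
    A_hat = A + np.eye(A.shape[0])
    # ASSUMPTION: symmetric degree normalisation of the adjacency matrix.
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    # Each node's new feature is a normalised sum over itself and its neighbours.
    return np.maximum(0.0, A_norm @ H @ W)

# Toy gene graph: a path of three genes with one-hot input features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)
W = np.ones((3, 2))
H_next = gcn_layer(A, H, W)
```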
3. Dimension reduction for scRNA-seq data
The constructed graph neural networks are used to reduce the dimension of the preprocessed scRNA-seq data.
The gene-cell count matrix and the cell-cell relationship graph are input into the graph neural network G1 to obtain the dimension-reduced cell features θ1.
The gene-cell count matrix and the gene-gene interaction network are input into the graph neural network G2 to obtain the dimension-reduced cell features θ2.
The learned cell features are concatenated to form the dimension-reduction result used in subsequent downstream analysis.
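The final concatenation step can be sketched as follows, assuming that both networks produce one feature row per cell. The embedding sizes and the name `combine_embeddings` are hypothetical, and the random arrays stand in for the outputs of G1 and G2.

```python
import numpy as np

def combine_embeddings(theta1, theta2):
    """Concatenate per-cell features from G1 and G2 along the feature axis."""
    if theta1.shape[0] != theta2.shape[0]:
        raise ValueError("both embeddings must have one row per cell")
    return np.concatenate([theta1, theta2], axis=1)

# Stand-ins for the outputs of G1 and G2 for 5 cells.
rng = np.random.default_rng(0)
theta1 = rng.normal(size=(5, 8))
theta2 = rng.normal(size=(5, 4))
embedding = combine_embeddings(theta1, theta2)
```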
4. K-means clustering
The method uses the zero-inflated negative binomial (ZINB) conditional likelihood to reconstruct the decoder output for the scRNA-seq data. The ZINB distribution has been shown to model scRNA-seq data well and is a widely accepted distribution for gene expression.
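For reference, the ZINB log-likelihood can be sketched in NumPy under the common parameterization with mean `mu`, dispersion `theta` and dropout probability `pi`. These parameter names, and the way the decoder would produce them, are assumptions; the source does not give the formula.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def zinb_log_prob(x, mu, theta, pi, eps=1e-10):
    """Log-probability of counts x under a zero-inflated negative binomial."""
    x = np.asarray(x, dtype=float)
    # Negative binomial log-pmf with mean mu and dispersion theta.
    log_nb = (_lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
              + theta * np.log(theta / (theta + mu) + eps)
              + x * np.log(mu / (theta + mu) + eps))
    # A zero can come from dropout (probability pi) or from the NB itself.
    log_zero = np.log(pi + (1.0 - pi) * np.exp(log_nb) + eps)
    # A non-zero count can only come from the NB component.
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    return np.where(x == 0, log_zero, log_nonzero)

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood summed over all entries."""
    return -np.sum(zinb_log_prob(x, mu, theta, pi))

# Sanity check data: probabilities over a long range of counts.
counts = np.arange(0, 200)
probs = np.exp(zinb_log_prob(counts, mu=2.0, theta=1.5, pi=0.3))
```

Minimizing `zinb_nll` with respect to the decoder outputs is the usual way such a likelihood is used as a reconstruction objective.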
To evaluate the effectiveness of the method, the k-means clustering algorithm is applied to the dimension-reduced data, and normalized mutual information (NMI) is used as the evaluation metric. Let X be the predicted clustering result and Y the true cell-type labels; the NMI score is calculated as follows:
Figure SMS_3
where MI is the mutual information between X and Y, and H is the Shannon entropy.
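The quantities named above can be computed directly from label vectors, as in the sketch below. The normalization used by the source is in the figure and is not recoverable, so the sketch assumes division by the arithmetic mean of the two entropies; other conventions (for example, the maximum of the two entropies) also exist.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H of a label vector (natural logarithm)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(x, y):
    """Mutual information MI between two label vectors."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

def nmi(x, y):
    # ASSUMPTION: normalise by the arithmetic mean of H(X) and H(Y).
    denom = 0.5 * (entropy(x) + entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 1.0

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed labels
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # statistically independent labels
```

The score does not depend on how the clusters are named, which is why it suits comparing k-means output with annotated cell types.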
From the foregoing, it can be seen that the scRNA-seq data dimension reduction method based on the graph neural network provided in one or more embodiments of the present disclosure retains both cell-cell and gene-gene relationships in the dimension reduction results. Our model constrains the data structure and dimension reduction is performed by two graph neural network modules. Experiments performed on five real scRNA-seq datasets indicate that the present method can provide a more accurate low-dimensional representation of the scRNA-seq data.
Detailed Description
The present invention will be described in further detail with reference to the following experiments in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
1. Overview of data set
To evaluate the performance of scTPGAE, we focused on relatively large datasets and selected five real scRNA-seq datasets with known cell types. The following table summarizes the basic information of the five datasets, which are described below.
Figure SMS_4
(i) The 10X PBMC dataset, provided by the 10X scRNA-seq platform, with data collected from a healthy human;
(ii) The mouse embryonic stem cell dataset, which describes the transcriptome heterogeneity of mouse embryonic stem cell differentiation following withdrawal of leukemia inhibitory factor (LIF);
(iii) The mouse bladder cell dataset from the Mouse Cell Atlas project (GSE108097). From the original count matrix, we selected about 2700 cells from bladder tissue;
(iv) The worm neuron cell dataset, profiled by single-cell combinatorial indexing RNA sequencing of L2 larval stage Caenorhabditis elegans;
(v) The Zeisel dataset, which contains 3005 cells from mouse cortex and hippocampus (GSE60361).
2. Experimental environment and parameter setting
The hardware environment is a PC host with an 11th Gen Intel(R) Core(TM) i5-1135G7 CPU at 2.42 GHz, 16 GB of RAM and a 64-bit operating system. The software is implemented in Python in the PyCharm environment on Windows 10, with Python version 3.5.0 and TensorFlow version 1.4.0.
We use the Python package Spektral to implement our model. Many types of graph neural networks can be used as encoders or decoders. To extract the features of each node with the help of its neighbors, we use a graph attention layer in the encoder by default. Other graph neural networks such as GCN, GraphSAGE and TAGCN may also be used as encoders in scTPGAE. The feature decoder D_X is a four-layer fully connected neural network with 64, 256 and 512 nodes in the hidden layers.
The edge decoder consists of one fully connected layer followed by the composition of a quadratic form and an activation:
A_r = D_A(h) = σ(ZZ^T)
where Z = σ(Wh) is the output of the fully connected layer with weight matrix W, and σ(x) = max(0, x) is the rectified linear unit (ReLU).
In addition to the gene-cell matrix described above, the graph neural networks require a cell-cell relationship graph and a gene-gene interaction network as inputs.
The cell-cell relationship graph is constructed with the K-nearest-neighbor (KNN) algorithm from the Scikit-learn Python package. The default K was set to 35 in this study and was adjusted according to the dataset in our experiments. The resulting adjacency matrix is a 0-1 matrix, where 1 denotes that two cells are connected and 0 denotes that they are not.
For the gene-gene interaction networks, we collected seven different human gene interaction networks and one mouse gene interaction network to evaluate the performance of scTPGAE with existing data.
3. Evaluation index
To make the results of the different methods easy to compare, we use K-means for cluster analysis and set the parameter K to the true number of clusters in each dataset. In our experiments, the scTPGAE model was evaluated with two metrics, Normalized Mutual Information (NMI) and the Adjusted Rand Index (ARI), which are widely used to evaluate model performance in unsupervised learning scenarios.
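The evaluation protocol can be sketched with scikit-learn, which provides K-means and both metrics. The random seed, `n_init`, the helper name `evaluate_embedding` and the toy data are assumptions made for illustration; the source does not state which implementation of the metrics was used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def evaluate_embedding(embedding, true_labels, seed=0):
    """Cluster with K-means (K = true number of clusters) and score the result."""
    k = len(np.unique(true_labels))
    pred = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
    return {
        "NMI": normalized_mutual_info_score(true_labels, pred),
        "ARI": adjusted_rand_score(true_labels, pred),
    }

# Toy embedding: two well-separated groups of 30 cells in 4 dimensions.
rng = np.random.default_rng(0)
embedding = np.vstack([rng.normal(0.0, 0.5, size=(30, 4)),
                       rng.normal(10.0, 0.5, size=(30, 4))])
labels = np.array([0] * 30 + [1] * 30)
scores = evaluate_embedding(embedding, labels)
```

Both metrics are invariant to how cluster labels are numbered, so the K-means output can be compared with the annotated cell types directly.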
4. Analysis of experimental results
Experiments were performed with the method on five real datasets; the resulting normalized mutual information and adjusted Rand index values are shown in the following tables.
Normalized Mutual Information (NMI)
Figure SMS_5
Adjusted Rand Index (ARI)
Figure SMS_6
The experimental results show that the scTPGAE method based on graph neural networks is a promising new method. The method achieves better performance on the five real datasets, indicating that it can provide a more accurate low-dimensional representation of scRNA-seq data.
The proposed scTPGAE method performs dimension reduction and cluster analysis on single-cell RNA-seq data and has the following advantages. First, scTPGAE matches the latent-space distribution to a selected prior. Second, scTPGAE retains the cell-cell relationship in the dimension-reduction result. Third, scTPGAE retains the gene-gene relationship alongside the cell-cell relationship. Finally, the method benefits from the parallelism and scalability of the deep neural network framework. Our model constrains the data structure and performs dimension reduction through the graph neural network modules. Experiments on five real scRNA-seq datasets, with normalized mutual information and the adjusted Rand index as evaluation metrics, show that the method performs well.
Drawings
Fig. 1: a flow diagram of the scRNA-seq data dimension reduction method based on a graph neural network;
Fig. 2: experimental results with Normalized Mutual Information (NMI) as a measure;
Fig. 3: experimental results with the Adjusted Rand Index (ARI) as a measure.

Claims (5)

1. A scRNA-seq data dimension reduction method based on a graph neural network, characterized by comprising the following implementation steps:
(1) Preprocessing data: collecting scRNA-seq datasets from different species, of different types and with different cell numbers; preprocessing the collected raw scRNA-seq data by logarithmic transformation and z-score normalization, and reconstructing the input data using a zero-inflated negative binomial (ZINB) distribution to obtain denoised data;
(2) Constructing a graph neural network for dimension reduction, which is an autoencoder framework consisting of a deep encoder, an intermediate hidden layer and a deep decoder, such that the topological structure between cells and the topological structure between genes are both retained in the dimension-reduction result;
(3) Reducing the dimension of the preprocessed scRNA-seq data by using the constructed graph neural network: learning a hidden-layer feature vector with the intermediate hidden layer of the autoencoder, constraining the prior distribution of the hidden-layer feature vector, and matching the hidden-layer feature vector to the selected prior distribution; and concatenating the hidden-layer feature vectors learned by the two graph neural networks to facilitate subsequent downstream analysis;
(4) Clustering the dimension-reduced data by using a k-means clustering algorithm to obtain a normalized mutual information score and an adjusted Rand index.
2. The method for reducing the dimension of scRNA-seq data based on a graph neural network according to claim 1, wherein data are collected and the collected single-cell RNA sequencing data are preprocessed as follows:
we collected five scRNA-seq datasets from different species, of different types and with different cell numbers, and preprocessed them by logarithmic transformation and z-score normalization.
Specifically, we performed data preprocessing on the following five datasets.
(1) The 10X PBMC dataset, provided by the 10X scRNA-seq platform, with data collected from a healthy human;
(2) The mouse embryonic stem cell dataset, which describes the transcriptome heterogeneity of mouse embryonic stem cell differentiation following withdrawal of leukemia inhibitory factor (LIF);
(3) The mouse bladder cell dataset from the Mouse Cell Atlas project (GSE108097). From the original count matrix, we selected about 2700 cells from bladder tissue;
(4) The worm neuron cell dataset, profiled by single-cell combinatorial indexing RNA sequencing of L2 larval stage Caenorhabditis elegans;
(5) The Zeisel dataset, which contains 3005 cells from mouse cortex and hippocampus (GSE60361).
3. The method for reducing the dimension of scRNA-seq data based on a graph neural network according to claim 1, wherein the constructed graph neural network is an autoencoder framework composed of a deep encoder, an intermediate hidden layer and a deep decoder, and specifically comprises:
(1) Graph neural network G1 retaining the cell-cell relationship
A graph autoencoder is an artificial neural network for unsupervised representation learning on graph-structured data. The graph autoencoder has a low-dimensional bottleneck layer and can therefore be used as a dimension-reduction model. Assume that the input is a cell-cell relationship graph with node matrix X and adjacency matrix A. Our joint graph autoencoder has one encoder E for the whole graph and two decoders, D_X and D_A, for nodes and edges, respectively. In practice, we first encode the input graph as the latent variable h = E(X, A), and then decode h into the reconstructed node matrix X_r = D_X(h) and the reconstructed adjacency matrix A_r = D_A(h). The goal of the learning process is to minimize the reconstruction loss
Figure FDA0004014338180000021
where the weight is a hyperparameter, set to 0.6 in our experiments.
We use the Python package Spektral to implement our model. Many types of graph neural networks can be used as encoders or decoders. To extract the features of each node with the help of its neighbors, we use a graph attention layer in the encoder by default. Other graph neural networks such as GCN, GraphSAGE and TAGCN may also be used as encoders in scTPGAE. The feature decoder D_X is a four-layer fully connected neural network with 64, 256 and 512 nodes in the hidden layers.
The edge decoder consists of one fully connected layer followed by the composition of a quadratic form and an activation:
A_r = D_A(h) = σ(ZZ^T)
where Z = σ(Wh) is the output of the fully connected layer with weight matrix W, and σ(x) = max(0, x) is the rectified linear unit (ReLU).
(2) Graph neural network G2 retaining gene-gene relationship
Note that when a gene interaction network is applied to a dataset, only those interaction pairs in which both interacting genes occur in the dataset are retained, and the remaining pairs are discarded. In other words, the number of interaction pairs in the gene interaction network may differ from dataset to dataset. To capture both regulatory directions and their corresponding strengths for a pair of genes, the gene interaction network is treated as a directed graph. Thus, an edge between genes A and B in an undirected gene network, e.g., the STRING PPI network, is treated as a pair of edges (i.e., the edge from A to B and the edge from B to A).
The graph neural network is constructed in the same way as the graph neural network that retains the cell-cell relationship, except that its input is changed from the cell-cell relationship graph to the gene-gene interaction network (e.g., a PPI network). The interaction relationships between genes can naturally be represented as a graph, and a graph neural network is applied to model these relationships. In the graph convolution layer, each node represents one gene, and an edge between two nodes represents the relationship between the two corresponding genes. The graph representation module is designed as a graph convolution layer that updates each node by aggregating information from its neighboring nodes.
4. The method for reducing the dimension of scRNA-seq data based on a graph neural network according to claim 1, wherein reducing the dimension of the preprocessed scRNA-seq data by using the constructed graph neural network comprises the following steps:
the constructed graph neural networks are used to reduce the dimension of the preprocessed scRNA-seq data.
The gene-cell count matrix and the cell-cell relationship graph are input into the graph neural network G1 to obtain the dimension-reduced cell features θ1.
The gene-cell count matrix and the gene-gene interaction network are input into the graph neural network G2 to obtain the dimension-reduced cell features θ2.
The learned cell features are concatenated to form the dimension-reduction result used in subsequent downstream analysis.
5. The method for reducing the dimension of scRNA-seq data based on a graph neural network according to claim 1, wherein the k-means clustering algorithm is applied to cluster the dimension-reduced data, specifically comprising:
the method uses the zero-inflated negative binomial (ZINB) conditional likelihood to reconstruct the decoder output for the scRNA-seq data; the ZINB distribution has been shown to model scRNA-seq data well and is a widely accepted distribution for gene expression.
To evaluate the effectiveness of the method, the k-means clustering algorithm is applied to the dimension-reduced data, and normalized mutual information and the adjusted Rand index are used as evaluation metrics. Experiments on five real scRNA-seq datasets indicate that the method can provide a more accurate low-dimensional representation of scRNA-seq data.
CN202211716676.1A 2022-12-23 2022-12-23 scRNA-seq data dimension reduction method based on graph neural network Pending CN116386729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211716676.1A CN116386729A (en) 2022-12-23 2022-12-23 scRNA-seq data dimension reduction method based on graph neural network

Publications (1)

Publication Number Publication Date
CN116386729A true CN116386729A (en) 2023-07-04

Family

ID=86975628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211716676.1A Pending CN116386729A (en) 2022-12-23 2022-12-23 scRNA-seq data dimension reduction method based on graph neural network

Country Status (1)

Country Link
CN (1) CN116386729A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665786A (en) * 2023-07-21 2023-08-29 曲阜师范大学 RNA layered embedding clustering method based on graph convolution neural network
CN116825204A (en) * 2023-08-30 2023-09-29 鲁东大学 Single-cell RNA sequence gene regulation inference method based on deep learning
CN116825204B (en) * 2023-08-30 2023-11-07 鲁东大学 Single-cell RNA sequence gene regulation inference method based on deep learning
CN118335192A (en) * 2024-06-13 2024-07-12 杭州电子科技大学 Single-cell sequencing data clustering method based on self-attention network and contrast learning
CN118645154A (en) * 2024-08-12 2024-09-13 中国医学科学院基础医学研究所 Single-cell Hi-C map prediction method based on single-cell RNA expression data
CN118645154B (en) * 2024-08-12 2024-11-08 中国医学科学院基础医学研究所 Single-cell Hi-C map prediction method based on single-cell RNA expression data

Similar Documents

Publication Publication Date Title
CN107622182B (en) Method and system for predicting local structural features of protein
CN116386729A (en) scRNA-seq data dimension reduction method based on graph neural network
CN111785329A (en) Single-cell RNA sequencing clustering method based on adversarial autoencoder
CN111210871A (en) Protein-protein interaction prediction method based on deep forest
CN114022693B (en) Single-cell RNA-seq data clustering method based on double self-supervision
Wang et al. Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis
Wang et al. Graph neural networks: Self-supervised learning
CN115732034A (en) Identification method and system of spatial transcriptome cell expression pattern
CN113571125A (en) Drug target interaction prediction method based on multilayer network and graph coding
CN114091603A (en) Spatial transcriptome cell clustering and analyzing method
CN111276187A (en) Gene expression profile feature learning method based on self-encoder
CN114067915A (en) scRNA-seq data dimension reduction method based on deep adversarial variational autoencoder
CN114783526A (en) Deep unsupervised single-cell clustering method based on Gaussian mixture graph variational autoencoder
Celik et al. Biological cartography: Building and benchmarking representations of life
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
Wu et al. AAE-SC: A scRNA-seq clustering framework based on adversarial autoencoder
Zhang et al. Feature selection algorithm for high-dimensional biomedical data using information gain and improved chemical reaction optimization
Wen et al. CellPLM: pre-training of cell language model beyond single cells
CN117594132A (en) Single-cell RNA sequence data clustering method based on robust residual graph convolutional network
Bagyamani et al. Biological significance of gene expression data using similarity based biclustering algorithm
CN112071362A (en) Detection method of protein complex fusing global and local topological structures
Chen et al. A deep graph convolution network with attention for clustering scRNA-seq data
Pavlov et al. Recognition of DNA secondary structures as nucleosome barriers with deep learning methods
Leoshchenko et al. Sequencing for Encoding in Neuroevolutionary Synthesis of Neural Network Models for Medical Diagnosis.
Deng Algorithms for reconstruction of gene regulatory networks from high-throughput gene expression data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination