CN117789817A - Analysis system and retrieval method for enrichment and expression profile of cancer cross-tissue immune cell type - Google Patents

Analysis system and retrieval method for enrichment and expression profile of cancer cross-tissue immune cell type

Info

Publication number
CN117789817A
Authority
CN
China
Prior art keywords
tissue
cancer
cell
expression
cell type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311541116.1A
Other languages
Chinese (zh)
Inventor
申健
何金花
谢芳梅
韩泽平
罗文峰
赵莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Panyu Central Hospital
Original Assignee
Guangzhou Panyu Central Hospital

Abstract

The invention discloses an analysis system and a retrieval method for cancer cross-tissue immune cell type enrichment and expression profiles. The system comprises an enrichment analysis module and an expression analysis module. Based on the marker genes of all tissue-specific cell types, the enrichment analysis module determines the cancer cross-tissue immune cell types or states enriched in a custom gene list input by a user. Based on the same marker genes, the expression analysis module comprehensively analyzes immune cell characteristics mapped across a plurality of tissues according to the gene information input by the user, including expression maps, correlation, similar gene detection, feature scoring and expression value comparison. The marker genes are derived from cancer cross-tissue single-cell RNA-seq data collected from NCBI and publications. The invention provides comprehensive one-stop online calculation and information retrieval of cancer cross-tissue immune cell types, expression profiles and the like, and greatly improves the convenience of using single-cell data.

Description

Analysis system and retrieval method for enrichment and expression profile of cancer cross-tissue immune cell type
Technical Field
The invention relates to the technical field of gene databases, in particular to an analysis system and a retrieval method for enrichment and expression patterns of cancer cross-tissue immune cell types.
Background
Single-cell RNA sequencing (scRNA-seq) uses optimized next-generation sequencing technology to measure the transcriptome of individual cells. It resolves gene expression at higher resolution and gives a better understanding of the function of single cells in their microenvironment. Single-cell transcriptome studies have grown exponentially over the past decade and cover a wide range of tissue types and diseases. In cancer research in particular, scRNA-seq has become an indispensable tool for studying the tumor microenvironment, cancer pathogenesis, metastasis and invasion, and various cancer therapies and diagnostics. The accumulation of large-scale cancer scRNA-seq data places higher demands on data management and standardized processing.
Recently, cross-tissue single-cell studies of cancer have revealed previously unappreciated heterogeneity of immune cell types or states, which is reflected in their expression of marker genes. For example, immune cells from the peripheral blood, primary and metastatic tumors, and adjacent normal tissues of cancer patients exhibit different molecular and functional properties. Thus, there is an urgent need for deep data integration and a web server that characterize cancer immune cells mapped across multiple tissues and, in turn, evaluate which environment-specific cell types or states are enriched in a custom gene set.
Currently published single-cell sequencing data are mainly stored as raw data or expression profiles in comprehensive portals such as NCBI GEO and the NGDC GSA database. Because processing single-cell data requires complex computational procedures and large cluster servers, it is difficult for researchers to access and use such data. Published secondary databases for single-cell sequencing attempt to integrate and preprocess data with standardized pipelines so that users can access the data easily, for example those disclosed in the literature (Franzen O., Gan L.M., Bjorkegren J.L.M., et al., PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford) 2019;2019:baz046; Papatheodorou I., Moreno P., Manning J., et al., Expression Atlas update: from tissues to single cells. Nucleic Acids Res. 2020;48:D77-D83; Lindeboom R.G.H., Regev A., Teichmann S.A., Towards a human cell atlas: taking notes from the past. Trends Genet. 2021;37:625-630; Cao Y., Zhu J., Jia P., et al., scRNASeqDB: a database for RNA-seq based gene expression profiles in human single cells. Genes 2017;8). Other disease-specific or cancer-specific databases, such as that disclosed in the literature (Sun D., Wang J., Han Y., et al., TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 2021;49:D1420-D1430), only offer fixed expression matrices for download and lack a unified method for read quality control and gene expression quantification. These problems may introduce analysis bias or reduce comparability between different databases. The database disclosed in the literature (Yuan H., Yan M., Zhang G., et al., CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res. 2019;47:D900-D908) contains only 41900 single cells and only supports searching the functional states of cancer cells at the single-cell level.
The database disclosed in the literature (Zeng J., Zhang Y., Shang Y., et al., CancerSCEM: a database of single-cell expression map across various human cancers. Nucleic Acids Res. 2022;50:D1147-D1155) contains a large number of single-cell datasets, but only provides analysis and retrieval of genes and samples within a single sample and does not support integrated analysis of multiple samples. Although some cancer-related online analysis or search platforms have been developed, integrated analysis of large-scale cancer cross-tissue single-cell data, together with online search and analysis of the tissue-specific characteristics of immune cells across tissues of cancer patients, has not been reported.
Disclosure of Invention
In view of the above, the invention provides an analysis system and a retrieval method for cancer cross-tissue immune cell type enrichment and expression profile, which can provide convenient analysis and retrieval for cancer cross-tissue immune cell type enrichment and expression profile query.
A first object of the present invention is to provide an analysis system for cancer cross-tissue immune cell type enrichment and expression profiling.
A second object of the present invention is to provide a method for analysis and retrieval of cancer cross-tissue immune cell type enrichment and expression profiling.
The first object of the present invention can be achieved by adopting the following technical scheme:
an analysis system for cancer cross-tissue immune cell type enrichment and expression profiling, the analysis system comprising an enrichment analysis module and an expression analysis module; the enrichment analysis module is used for obtaining cancer cross-tissue immune cell types or enrichment states according to user-input custom genes based on the marker genes of all tissue-specific cell types; the expression analysis module is used for comprehensively analyzing immune cell characteristics drawn across a plurality of tissues based on marker genes of all tissue-specific cell types according to gene information input by a user, and comprises expression patterns, correlation, similar gene detection, characteristic scoring and expression value comparison;
the marker genes of all tissue-specific cell types are obtained as follows:
cancer cross-tissue single-cell RNA-seq data are collected from NCBI and publications; the collected cancer cross-tissue single-cell RNA-seq data are then processed to obtain the marker genes of all tissue-specific cell types.
Further, the enrichment analysis module allows a user to input a custom gene list or gene list file, and calculates matched cell types and states through an enrichment algorithm based on marker genes of all tissue-specific cell types, so as to realize enrichment analysis of cross-tissue immune cell types and states.
Further, the calculating the matched cell type and state by the enrichment algorithm includes:
calculating a foreground gene set and a background gene set according to the marker genes of all tissue-specific cell types, and then calculating the number of the marker genes and the non-marker genes of each enriched cell type;
a statistical test, such as the chi-square test or Fisher's exact test, is then performed to determine the significance of the enrichment and provide the corresponding p-value.
Further, the expression map allows a user to depict the expression of one or more genes in a specific tissue type of cancer using a map, violin plot and/or dot plot, the input parameters including:
input genes: inputting a custom gene or gene list, with genes separated by the Enter key (one gene per line);
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "cell subtype annotation";
cell number: setting a custom nCells threshold;
drawing type: setting the plot type and the violin plot display mode.
Further, the correlation analysis allows a user to calculate the correlation of two genes in a specific cell type of a specific tissue of a cancer, the input parameters including:
input genes: two target genes;
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "cell subtype annotation";
cell number: setting a custom nCells threshold;
cell type: selecting a cell type;
correlation coefficient: selecting the method for calculating the correlation coefficient, namely the Pearson, Spearman or Kendall algorithm;
the similar gene detection allows a user to search, in a specific cell type of a specific tissue of the cancer, for a list of genes whose expression pattern is similar to that of a custom gene; compared with the correlation analysis, only one target gene is input, and the other input parameters are the same.
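As an illustration of the similar gene detection described above, the following minimal sketch ranks all other genes by their Pearson correlation with one target gene. It is not the system's implementation: the function names `pearson` and `similar_genes`, the dictionary layout of the expression data and the `top_n` parameter are assumptions made for this example only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant gene carries no correlation signal
    return cov / sqrt(vx * vy)


def similar_genes(expression, target, top_n=3):
    """Rank every other gene by its correlation with the target gene.

    expression: {gene: [value per cell]} for the selected tissue and
    cell type (hypothetical layout, assumed for this sketch).
    """
    ref = expression[target]
    scored = [(gene, pearson(ref, values))
              for gene, values in expression.items() if gene != target]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data: four cells, four genes.
toy = {
    "CD3E": [1, 2, 3, 4],
    "CD3D": [2, 4, 6, 8],
    "MS4A1": [4, 3, 2, 1],
    "CD2": [1, 2, 2, 4],
}
print(similar_genes(toy, "CD3E", top_n=2))
```

Swapping `pearson` for a rank-based coefficient would give the Spearman variant without changing the ranking logic.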
Further, the feature scoring analysis allows the user to calculate the score of a custom feature in all cell types of a specific tissue of the cancer and to compare the differences between them, the input parameters including:
input genes: inputting at least 2 target genes;
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "cell subtype annotation";
cell number: setting a custom nCells threshold;
the comparison analysis allows the user to compare the expression of a custom gene across all tissue-specific cell types of the cancer; compared with the feature scoring analysis, only one target gene is input, and the other input parameters are the same.
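The text does not state how the feature score is computed. As a hedged illustration only, the sketch below assumes the simplest possible definition, the mean expression of the input genes in each cell, and then averages that score per cell type. The names `feature_scores` and `mean_score_by_cell_type` and the cell record layout are hypothetical.

```python
from statistics import mean


def feature_scores(cells, signature):
    """Score each cell as the mean expression of the signature genes.

    Assumption: a plain mean is used; the actual scoring method of the
    system is not specified in the text.
    """
    if len(signature) < 2:
        raise ValueError("at least 2 target genes are required")
    return [mean(cell["expr"].get(g, 0.0) for g in signature) for cell in cells]


def mean_score_by_cell_type(cells, signature):
    """Average the per-cell scores within each annotated cell type."""
    grouped = {}
    for cell, score in zip(cells, feature_scores(cells, signature)):
        grouped.setdefault(cell["cell_type"], []).append(score)
    return {cell_type: mean(scores) for cell_type, scores in grouped.items()}


# Toy data: genes missing from a cell are treated as not expressed.
toy_cells = [
    {"cell_type": "T", "expr": {"A": 2.0, "B": 4.0}},
    {"cell_type": "T", "expr": {"A": 0.0, "B": 2.0}},
    {"cell_type": "B", "expr": {"A": 1.0}},
]
print(mean_score_by_cell_type(toy_cells, ["A", "B"]))
```

The per-cell-type score lists collected in `grouped` are also the natural input for a difference comparison between cell types.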
Further, the data processing of the collected cancer cross-tissue single cell RNA-seq data results in marker genes of all tissue specific cell types, comprising:
performing single-cell data quality control on each data set in the collected million-level cancer cross-tissue single-cell RNA-seq data to generate a gene expression matrix;
based on the gene expression matrix, identifying and removing doublets and batch effects, respectively;
preliminary cell type annotation is carried out according to the specific marker genes of the main immune cell type and the non-immune cell type, and then sub-cell type clustering and annotation are carried out; finally, a plurality of cell types, including immune and non-immune cell subtypes, are identified and a comprehensive cell type profile is constructed in each tissue type;
marker genes were calculated for all tissue specific cell types based on the various cell types.
Further, the batch effect is removed with the Harmony algorithm, which adjusts the expression profiles of the cells to minimize the differences between batches and comprises two key steps: batch alignment and batch effect removal;
the batch alignment determines the differences between batches by calculating the similarity between cells, specifically:
mapping the expression patterns of the cells into a low-dimensional space using an embedding; and adjusting the positions of the cells by minimizing the embedding differences between batches, so as to align the batches;
the batch effect removal further reduces the batch effect by adjusting the expression value of each gene.
Further, the collected cancer cross-tissue single-cell RNA-seq data comprise million-level single-cell RNA-seq data from multiple cancer tissue types, specifically single-cell RNA-seq data of hundreds of samples of primary cancer tissue and its paracancerous tissue, metastatic cancer tissue and its paracancerous tissue, lymph node tissue, and peripheral blood.
The second object of the invention can be achieved by adopting the following technical scheme:
a method of analysis and retrieval of cancer cross-tissue immune cell type enrichment and expression profiling, the method comprising:
data processing is carried out on the collected million-level cancer cross-tissue single-cell RNA-seq data to obtain marker genes of all tissue specific cell types;
based on the marker genes of all tissue-specific cell types, obtaining a cancer cross-tissue immune cell type or enrichment state according to the user-input custom genes;
based on the marker genes of all tissue-specific cell types, immune cell characteristics drawn across a plurality of tissues are comprehensively analyzed according to the genetic information input by the user, including expression profiling, correlation, similar gene detection, feature scoring and expression value comparison.
Compared with the prior art, the invention has the following beneficial effects:
the analysis system and the retrieval method for cancer cross-tissue immune cell type enrichment and expression profiles collect million-level cancer cross-tissue single-cell RNA-seq data from NCBI and the open literature, and process all the collected data to obtain the marker genes of all tissue-specific cell types. Based on these marker genes, the enrichment analysis module obtains the cancer cross-tissue immune cell type or enrichment state from the custom genes input by a user, and the expression analysis module performs expression map analysis, correlation analysis, similar gene detection, feature scoring analysis and expression value comparison according to the input gene information. The invention thereby provides comprehensive one-stop online calculation and information retrieval of cancer cross-tissue immune cell types, expression profiles and the like, greatly improves the convenience and practicality of using single-cell data, provides a valuable resource for exploring the intrinsic features of immune cells of cancer patients, and may guide the development of new cancer immune biomarkers and immunotherapy strategies.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a functional block diagram of an analysis system for enrichment and expression profiling of cancer cross-tissue immune cell types in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of processing single cell RNA-seq data according to an embodiment of the present invention;
FIG. 3 shows the datasets before and after batch effect removal, wherein (A) is without batch correction, (B) is with the batch balanced KNN algorithm, and (C) is with the Harmony algorithm;
FIG. 4 is a workflow diagram of an analysis system for cancer cross-tissue immune cell type enrichment and expression profiling in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of an enrichment analysis interface according to an embodiment of the present invention, wherein (A) is an input box, and (B) to (D) are output results;
FIG. 6 is a schematic diagram of an expression map interface according to an embodiment of the present invention, wherein (A) is an input box and (B) is an output result;
FIG. 7 is a schematic diagram of a correlation analysis interface according to an embodiment of the present invention, wherein (A) is an input box and (B) is an output result;
FIG. 8 is a schematic diagram of a similar gene detection interface according to an embodiment of the present invention, wherein (A) is an input box and (B) is an output result;
FIG. 9 is a schematic diagram of a feature score analysis interface according to an embodiment of the present invention, wherein (A) is an input box and (B) is an output result;
FIG. 10 is a schematic diagram of a comparison analysis interface according to an embodiment of the present invention, wherein (A) is an input box and (B) is an output result.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention. It should be understood that the description of the specific embodiments is intended for purposes of illustration only and is not intended to limit the scope of the present application.
Examples:
As shown in FIG. 1, the analysis system (CIEC) for cancer cross-tissue immune cell type enrichment and expression profiling provided in this embodiment collects million-level cancer cross-tissue single-cell RNA-seq data and cell type annotation information from NCBI (GEO) and published literature, and processes and integrates all datasets through a standardized pipeline. It provides comprehensive one-stop online calculation and information retrieval of cancer cross-tissue immune cell types, expression profiles and the like, and greatly improves the convenience and practicality of using single-cell data.
The analysis system provided by this embodiment is served by NGINX and stores its data in a MySQL database. The analysis methods are exposed as a RESTful API, and the data on the web page are refreshed dynamically using AJAX. To improve accessibility, the React front-end framework is adopted, the Ant Design component library is used to build a user-friendly layout and display data tables, and ECharts is used for interactive chart display. The analysis system in this example collected data from 162 cancer patients, totaling over one million single cells and covering primary cancer and paracancerous tissue, metastatic cancer and paracancerous tissue, lymph node and peripheral blood samples. Immune cell type/state maps were constructed for the various tissues, and the degree of enrichment of environment-specific immune cell types/states was estimated using the KOBAS enrichment algorithm. In addition, the analysis system provides a simple, easy-to-use online interface for users to comprehensively analyze immune cell characteristics mapped across a plurality of tissues, including expression maps, correlation, similar gene detection, feature scoring and expression value comparison. It provides a valuable resource for exploring the intrinsic features of immune cells of cancer patients and may guide the development of new cancer immune biomarkers and immunotherapy strategies.
Wherein, the data acquisition comprises the following steps:
(1) Source data collection.
Raw sequencing data or expression matrices of cancer cross-tissue single-cell RNA-seq datasets were collected from NCBI (GEO) or publications. The datasets include single-cell RNA-seq data of 316 samples of primary cancer and paracancerous tissue, metastatic cancer and paracancerous tissue, lymph node and peripheral blood from 162 cancer patients (see Table 1), which were processed and analyzed by a standard analysis pipeline. After cell quality control and removal of potential doublets, a total of 839584 high-quality cells were retained.
TABLE 1 Source data information
(2) Data processing.
The analysis system provided in this embodiment is dedicated to the comprehensive analysis of human cancer cross-tissue single-cell sequencing data. First, each dataset undergoes a series of unified, standardized processing steps. After the gene expression matrix is generated, doublets and batch effects are identified and removed using the Scrublet algorithm and the Harmony algorithm, respectively. Preliminary cell type annotation is performed based on specific marker genes of the major immune cell types (including T cells, B cells, NK cells, myeloid cells, and mast cells) and non-immune cell types (including epithelial cells, endothelial cells, and stromal cells) (see Table 2). These major cell types are then extracted one by one for clustering and annotation of cell subtypes. In total, 68 cell types were identified, including immune and non-immune cell subtypes, and a comprehensive cell type map was constructed for each tissue type. Finally, the sc.tl.rank_genes_groups() function was used to calculate the marker genes of all tissue-specific cell types.
TABLE 2 specific marker genes for Primary immune and non-immune cell types
The process of processing and analyzing the acquired single cell data is described below with reference to fig. 2, in which the following operations are performed:
(2-1) Single cell data quality control.
For each dataset, genes detected in fewer than 3 cells were filtered out, as were single cells with fewer than 600 or more than 6000 detected genes and cells in which mitochondrial genes accounted for more than 20% of expression. The functions and fields used include sc.pp.filter_cells(), sc.pp.filter_genes(), and .obs['pct_counts_mt']. The data were normalized using the sc.pp.normalize_total() and sc.pp.log1p() functions to allow comparison between cells.
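The quality-control thresholds above can be illustrated with a small dependency-free sketch. It is a stand-in for the scanpy calls named in the text, not a replacement for them: the function name `quality_control`, the nested-dictionary count matrix and the "MT-" prefix used to recognize mitochondrial genes (the usual human gene symbol convention) are assumptions of this example.

```python
def quality_control(counts, min_cells=3, min_genes=600,
                    max_genes=6000, max_pct_mt=20.0):
    """Apply the gene and cell filters to a raw count matrix.

    counts: {cell_id: {gene: raw_count}} (hypothetical layout).
    Per-cell metrics are computed on the unfiltered profile.
    """
    # Count in how many cells each gene is detected (count > 0).
    n_cells_per_gene = {}
    for profile in counts.values():
        for gene, c in profile.items():
            if c > 0:
                n_cells_per_gene[gene] = n_cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in n_cells_per_gene.items() if n >= min_cells}

    filtered = {}
    for cell, profile in counts.items():
        total = sum(profile.values())
        n_genes = sum(1 for c in profile.values() if c > 0)
        # Assumption: mitochondrial genes are those prefixed with "MT-".
        mt = sum(c for g, c in profile.items() if g.startswith("MT-"))
        pct_mt = 100.0 * mt / total if total else 0.0
        if min_genes <= n_genes <= max_genes and pct_mt <= max_pct_mt:
            filtered[cell] = {g: c for g, c in profile.items() if g in kept_genes}
    return filtered


# Toy run with thresholds scaled down to fit a four-cell matrix.
toy_counts = {
    "c1": {"A": 5, "B": 1, "MT-CO1": 1},
    "c2": {"A": 3, "B": 2},
    "c3": {"A": 1, "MT-CO1": 9},
    "c4": {"A": 2, "C": 1},
}
print(quality_control(toy_counts, min_cells=3, min_genes=2, max_genes=3))
```

In the toy run, cell c3 is dropped because 90% of its counts are mitochondrial, and only gene A is detected in enough cells to be kept.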
(2-2) identification and removal of double cells.
Potential doublets in the data were identified using the sc.external.pp.scrublet() function, with expected_doublet_rate set to 0.03 and threshold set to 0.25. The potential doublets were then filtered out using .obs['predicted_doublet'] and .obs['doublet_score'].
(2-3) Unsupervised clustering and batch effect removal.
All datasets were merged using scanpy. The scanpy.pp.highly_variable_genes() function was used to identify highly variable genes, and the scanpy.tl.pca() function was used to compute the coordinates, loadings and variance decomposition of the principal component analysis (PCA), with svd_solver set to 'arpack'. The first 50 principal components were used for further analysis. Dimensionality reduction was performed using scanpy, and single cells were clustered with an unsupervised graph-based clustering algorithm in scanpy. Preliminary analysis showed a significant batch effect in the data, as shown in FIG. 3A.
Common methods for batch effect removal are based on the k-nearest neighbor (KNN) algorithm and the Harmony algorithm. KNN is a basic classification and regression method. It is one of the simplest algorithms in data mining, and its core use is solving supervised classification problems. KNN can quickly and efficiently solve prediction and classification problems on a specific dataset, but it does not build a model, so its accuracy does not generalize well. The batch balanced KNN algorithm (Batch balanced KNN) modifies the KNN procedure by identifying the top-ranked neighbors of each cell within each batch separately, rather than within the entire cell pool regardless of batch. The nearest neighbors from each batch are then merged to create the final neighbor list of the cell. Batch alignment is thus done in a fast and lightweight way. However, when applied in this project, this method did not remove the batch effect well (as shown in FIG. 3B), so this algorithm was not used.
Another common method of batch effect removal is the Harmony algorithm, which aims to minimize the differences between batches by adjusting the expression profiles of the cells. It is based on two key steps: batch alignment and batch effect removal. First, the Harmony algorithm determines the differences between batches by calculating the similarity between cells. It uses a technique called embedding to map the expression patterns of cells into a low-dimensional space, and then adjusts the cell positions by minimizing the embedding differences between batches to achieve batch alignment. Thereafter, the Harmony algorithm further reduces the batch effect by adjusting the expression value of each gene. It uses a linear model to estimate the relationship between each gene and the batch, and corrects the expression values of the genes based on these relationships to remove batch-induced bias.
The Harmony algorithm can effectively remove the batch effect in single-cell RNA sequencing data, so that cells from different batches can be compared and the accuracy and reliability of the data are improved. In this embodiment, the batch effect was removed using the sc.external.pp.harmony_integrate() function (based on the Harmony algorithm) with key set to 'Patient', which removed the batch effect effectively (as shown in FIG. 3C). After batch effect removal, the data were clustered with an unsupervised clustering method to determine the final cell clusters.
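To make the batch balanced KNN neighbor search discussed above concrete, the following sketch finds, for every cell, its nearest neighbors within each batch separately and merges them into one neighbor list. It illustrates only the idea and is neither the BBKNN package nor the method finally adopted in this embodiment (Harmony); the function name `batch_balanced_neighbors`, the data layout and the Euclidean distance are assumptions of this example.

```python
from math import dist


def batch_balanced_neighbors(embedding, batches, k_per_batch=2):
    """Collect each cell's k nearest neighbours within every batch.

    embedding: {cell_id: coordinates in a low-dimensional space}
    batches:   {cell_id: batch label}
    Both layouts are hypothetical and chosen for readability.
    """
    by_batch = {}
    for cell, batch in batches.items():
        by_batch.setdefault(batch, []).append(cell)

    neighbors = {}
    for cell, coords in embedding.items():
        merged = []
        for batch in sorted(by_batch):
            # Rank the cells of this batch only, excluding the cell itself.
            candidates = [c for c in by_batch[batch] if c != cell]
            candidates.sort(key=lambda c: dist(coords, embedding[c]))
            merged.extend(candidates[:k_per_batch])
        neighbors[cell] = merged
    return neighbors


# Toy data: two batches; b3 sits among the batch A cells.
toy_embedding = {
    "a1": (0, 0), "a2": (0, 1), "a3": (5, 5),
    "b1": (10, 10), "b2": (10, 11), "b3": (0.5, 0.5),
}
toy_batches = {c: c[0].upper() for c in toy_embedding}
print(batch_balanced_neighbors(toy_embedding, toy_batches, k_per_batch=1))
```

Because every cell receives neighbors from every batch, the resulting graph connects batches even when a plain KNN search would link a cell only to cells of its own batch.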
(2-4) cell type annotation.
This example uses a two-step approach to annotate cell types. In the first step, with the clustering resolution set to 2, all cell clusters are annotated with major cell types according to known cell-specific marker genes (Table 2). In the second step, after the major cell types of all clusters have been annotated, the major cell types are extracted one by one for clustering and annotation of cell subtypes. Once the subclusters are obtained, the cell subtypes are annotated by calculating the marker genes that distinguish the subclusters, using the sc.tl.rank_genes_groups() function.
(2-5) constructing a cell type classification tree and calculating marker genes of all nodes in the classification tree.
Classification trees of the major cell types and cell subtypes of cancer tissues were constructed using ECharts. For each nCells setting (selectable values: 20, 500, 100), the sc.tl.rank_genes_groups() function is used to calculate the marker genes of all nodes in the classification tree. These marker genes are used as the background gene sets for the analysis of cancer cross-tissue immune cell types and enrichment states.
As shown in FIG. 4, the analysis system for cancer cross-tissue immune cell type enrichment and expression profiling provided in this embodiment consists of an enrichment analysis module and an expression analysis module. The enrichment analysis module only requires a gene list or gene list file as input and reports which cell types and tissue types are statistically significantly associated with the input gene list. Each task provides a task link so that the user can retrieve the results immediately or at a later time, and the input gene list is automatically mapped to the background marker genes. The functions implemented by the expression analysis module comprise the expression map, correlation analysis, similar gene detection, feature scoring analysis and comparison analysis.
(1) An enrichment analysis (Enrichment Analysis) module.
The enrichment analysis estimates cancer cross-tissue immune cell types and states. The previously developed KEGG Orthology-based annotation system (KOBAS) enrichment algorithm is applied to estimate the enrichment of specific immune cell types under specific conditions. Specifically, this example implements a two-step procedure: the foreground and background gene sets are calculated first, and the numbers of marker genes and non-marker genes of each enriched cell type are then counted. Subsequently, a statistical test, such as the chi-square test or Fisher's exact test, is performed to determine the significance of the enrichment and provide the corresponding p-value. This embodiment provides a rigorous method for identifying highly enriched cell types in cross-tissue analysis of cancers and has potential applications in various biological fields.
Description: this function allows the user to input custom genes to obtain the cancer cross-tissue immune cell type or enrichment state;
parameters: inputting a custom gene list;
the input box is shown in FIG. 5(A), and the results are shown in FIG. 5(B) to FIG. 5(D).
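The two-step enrichment procedure described above (count marker and non-marker genes in the foreground and background, then test for significance) can be sketched with a one-sided Fisher's exact test. This is an illustrative sketch only, not the KOBAS implementation; the function names `hypergeom_pmf` and `enrichment_p_value` and the set-based inputs are assumptions of this example.

```python
from math import comb


def hypergeom_pmf(k, M, n, N):
    """P(X = k) when drawing N genes from M, of which n are markers."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def enrichment_p_value(user_genes, marker_genes, background_genes):
    """One-sided Fisher's exact test for over-representation.

    user_genes:       custom gene list (foreground before mapping)
    marker_genes:     marker genes of one tissue-specific cell type
    background_genes: all marker genes in the background set
    """
    background = set(background_genes)
    foreground = set(user_genes) & background  # map input to the background
    markers = set(marker_genes) & background
    k = len(foreground & markers)              # foreground genes that are markers
    M, n, N = len(background), len(markers), len(foreground)
    # Upper tail: probability of seeing k or more marker genes by chance.
    return sum(hypergeom_pmf(i, M, n, N) for i in range(k, min(n, N) + 1))


# Toy data: 10 background genes, 3 markers, 2 of 3 input genes are markers.
toy_background = [f"g{i}" for i in range(10)]
p = enrichment_p_value(["g0", "g1", "g5"], ["g0", "g1", "g2"], toy_background)
print(round(p, 4))
```

Running the test once per cell type and tissue yields one p-value per candidate; with many candidates, a multiple-testing correction would normally follow, which this sketch omits.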
(2) Expression analysis module.
The functions realized by the expression analysis module comprise expression map, correlation analysis, similar gene detection, feature scoring analysis and comparison analysis.
(2-1) Expression Map (Expression Map).
Description: this function allows the user to depict the expression of one or more genes in a specific tissue type of cancer using a map, violin plot, and/or dot plot;
parameters:
input genes: inputting a custom gene or gene list, with genes separated by the Enter key;
tissue type: primary cancer tissue (Tumor), primary paracancerous tissue (Normal), metastatic cancer tissue (Metastasis), metastatic paracancerous tissue (Metastasis Normal), lymph node tissue (Lymph node), or peripheral blood (PBMC);
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
nCells: setting a custom nCells threshold;
drawing type: setting the plot type and the display mode of the violin plot;
the input box is shown in fig. 6 (A), and the result is shown in fig. 6 (B).
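As an illustration of what a dot plot encodes, the sketch below computes the two quantities usually shown per cell type: the fraction of cells expressing the gene and the mean expression. It assumes that a cell "expresses" a gene when its value is greater than zero; the function name dot_plot_summary and the toy values are hypothetical, and the actual plotting is omitted.

```python
from collections import defaultdict


def dot_plot_summary(cells, gene):
    """cells: list of (cell_type, {gene: expression}) pairs.

    Returns {cell_type: (fraction_expressing, mean_expression)}.
    """
    values = defaultdict(list)
    for cell_type, expr in cells:
        values[cell_type].append(expr.get(gene, 0.0))
    return {
        cell_type: (
            sum(v > 0 for v in vals) / len(vals),  # dot size
            sum(vals) / len(vals),                 # dot colour
        )
        for cell_type, vals in values.items()
    }


toy = [
    ("T cell", {"CD3E": 2.0}),
    ("T cell", {"CD3E": 0.0}),
    ("T cell", {"CD3E": 4.0}),
    ("T cell", {"CD3E": 2.0}),
    ("B cell", {"CD79A": 3.0}),
    ("B cell", {"CD79A": 1.0}),
]
summary = dot_plot_summary(toy, "CD3E")
```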
(2-2) correlation analysis (Correlation Analysis).
Description: this function allows the user to calculate the correlation of two genes in a specific cell type of a specific tissue of a cancer;
parameters:
gene 1: inputting target gene 1;
gene 2: inputting target gene 2;
tissue type: primary cancer tissue (Tumor), primary paracancerous tissue (Normal), metastatic cancer tissue (Metastasis), metastatic paracancerous tissue (Metastasis Normal), lymph node tissue (Lymph node), or peripheral blood (PBMC);
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
nCells: setting a custom nCells threshold;
cell type: selecting a cell type;
correlation coefficient: selecting the method for calculating the correlation coefficient, namely Pearson, Spearman or Kendall;
the input box is shown in fig. 7 (A), and the result is shown in fig. 7 (B).
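The three selectable coefficients can be sketched in plain Python as follows. This is an illustrative implementation, not the system's code: Spearman is computed as the Pearson correlation of average ranks, Kendall is computed as tau-b (an assumption, since the text does not say which variant is used), and the toy expression vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant vector
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def kendall(x, y):
    """Kendall tau-b, computed over all pairs (O(n^2))."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else float("nan")


# Expression of two genes across the same five cells (toy values).
gene1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene2 = [2.0, 1.0, 4.0, 3.0, 5.0]
r_pearson = pearson(gene1, gene2)
r_spearman = spearman(gene1, gene2)
r_kendall = kendall(gene1, gene2)
```

Pearson measures linear association, while Spearman and Kendall depend only on ranks and are therefore less sensitive to the skewed, zero-inflated values typical of single-cell expression data.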
(2-3) similar Gene detection (Similar Genes Detection).
Description: this function allows the user to search, in a specific cell type of a specific tissue of a cancer, for a list of genes whose expression patterns are similar to that of the custom gene;
parameters:
gene: inputting a target gene;
tissue type: primary cancer tissue (Tumor), primary paracancerous tissue (Normal), metastatic cancer tissue (Metastasis), metastatic paracancerous tissue (Metastasis Normal), lymph node tissue (Lymph node), or peripheral blood (PBMC);
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
nCells: setting a custom nCells threshold;
cell type: selecting a cell type;
correlation coefficient: selecting the method for calculating the correlation coefficient, namely Pearson, Spearman or Kendall;
the input box is shown in fig. 8 (A), and the result is shown in fig. 8 (B).
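One plausible reading of similar gene detection is to correlate the target gene with every other gene across the cells of the selected cell type and return the highest-ranking genes. The sketch below follows that reading with the Pearson coefficient; it assumes that "similar" means positively correlated, and the function name similar_genes and the toy matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant vector
    return cov / sqrt(vx * vy)


def similar_genes(expression, target, top_n=3):
    """expression: {gene: [values across the same cells]}.

    Returns up to top_n (gene, r) pairs, most similar first.
    """
    reference = expression[target]
    scored = []
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson(reference, values)
        if r == r:  # skip NaN (genes with constant expression)
            scored.append((gene, r))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_n]


toy = {
    "CD8A": [1.0, 2.0, 3.0, 4.0],
    "GZMB": [2.0, 4.0, 6.0, 8.0],
    "NKG7": [1.0, 3.0, 2.0, 4.0],
    "CD4": [4.0, 3.0, 2.0, 1.0],
    "ACTB": [5.0, 5.0, 5.0, 5.0],
}
hits = similar_genes(toy, "CD8A", top_n=2)
```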
(2-4) feature score analysis (Signature Score Analysis).
Description: this function allows the user to calculate the score of a custom signature for all cell types in a specific tissue of the cancer and to compare the differences;
parameters:
gene: inputting the target genes, separated by the Enter key; at least 2 genes are required;
tissue: primary cancer tissue (Tumor), primary paracancerous tissue (Normal), metastatic cancer tissue (Metastasis), metastatic paracancerous tissue (Metastasis Normal), lymph node tissue (Lymph node), or peripheral blood (PBMC);
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
nCells: setting a custom nCells threshold;
the input box is shown in fig. 9 (A), and the result is shown in fig. 9 (B).
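A signature score can be illustrated with the simplest possible definition: the mean expression of the signature genes in each cell, averaged per cell type. This is an assumption made for the sketch; the text does not state the scoring formula, and common tools additionally subtract the mean of a control gene set. The function name signature_scores and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def signature_scores(cells, signature):
    """cells: list of (cell_type, {gene: expression}) pairs.

    Returns {cell_type: mean per-cell signature score}.
    """
    if len(signature) < 2:
        raise ValueError("a signature needs at least 2 genes")
    per_type = defaultdict(list)
    for cell_type, expr in cells:
        # Per-cell score: mean expression over the signature genes.
        per_type[cell_type].append(mean(expr.get(g, 0.0) for g in signature))
    return {cell_type: mean(scores) for cell_type, scores in per_type.items()}


toy = [
    ("CD8 T", {"GZMB": 4.0, "PRF1": 2.0}),
    ("CD8 T", {"GZMB": 2.0, "PRF1": 0.0}),
    ("B cell", {"CD79A": 5.0}),
]
scores = signature_scores(toy, ["GZMB", "PRF1"])
```

The difference comparison then amounts to comparing the per-cell score distributions between cell types, for example with a rank-based test.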
(2-5) comparative analysis (Comparison Analysis).
Description: this function allows the user to compare the expression profile of a custom gene in all tissue-specific cell types of cancer;
parameters:
gene: inputting a target gene;
tissue: selecting among primary cancer tissue (Tumor), primary paracancerous tissue (Normal), metastatic cancer tissue (Metastasis Tumor), metastatic paracancerous tissue (Metastasis Normal), lymph node tissue (Lymph node), and peripheral blood (PBMC); all are selected by default;
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
nCells: setting a custom nCells threshold;
cell type: selecting a cell type;
the input box is shown in fig. 10 (A), and the result is shown in fig. 10 (B).
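The comparison can be sketched as a group-by over tissues for one gene and one cell type. The sketch assumes that the compared value is the mean expression per tissue and that leaving the tissue selection empty means all tissues, in line with the default described above; the function name compare_across_tissues and the toy values are hypothetical.

```python
from collections import defaultdict


def compare_across_tissues(cells, gene, cell_type, tissues=None):
    """cells: list of (tissue, cell_type, {gene: expression}) triples.

    Returns {tissue: mean expression of gene in the chosen cell type}.
    tissues=None selects every tissue.
    """
    values = defaultdict(list)
    for tissue, ct, expr in cells:
        if ct != cell_type:
            continue
        if tissues is not None and tissue not in tissues:
            continue
        values[tissue].append(expr.get(gene, 0.0))
    return {tissue: sum(v) / len(v) for tissue, v in values.items()}


toy = [
    ("Tumor", "CD8 T", {"PDCD1": 3.0}),
    ("Tumor", "CD8 T", {"PDCD1": 1.0}),
    ("Tumor", "B cell", {"PDCD1": 9.0}),
    ("PBMC", "CD8 T", {"PDCD1": 0.5}),
    ("PBMC", "CD8 T", {"PDCD1": 0.5}),
]
all_tissues = compare_across_tissues(toy, "PDCD1", "CD8 T")
tumor_only = compare_across_tissues(toy, "PDCD1", "CD8 T", tissues={"Tumor"})
```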
The embodiment also provides an analysis and retrieval method of cancer cross-tissue immune cell type enrichment and expression profile, comprising the following steps:
carrying out data processing on the collected single-cell RNA-seq data to obtain marker genes of all tissue-specific cell types;
based on the marker genes of all tissue-specific cell types, obtaining a cancer cross-tissue immune cell type or enrichment state according to the user-input custom genes;
based on the marker genes of all tissue-specific cell types, immune cell characteristics drawn across a plurality of tissues are comprehensively analyzed according to the genetic information input by the user, including expression profiling, correlation, similar gene detection, feature scoring and expression value comparison.
For the specific implementation of each step, refer to the analysis system for cancer cross-tissue immune cell type enrichment and expression profile described above.
In summary, the analysis system (CIEC) and the retrieval method for the cancer cross-tissue immune cell type enrichment and expression profile provided by the invention realize the integrated analysis and reuse of million-level single-cell data from multiple cancer tissue types. CIEC allows users to input a custom gene list and calculates the matched cell types and states in a background data set through an enrichment algorithm, so as to realize enrichment analysis of cross-tissue immune cell types and states; in addition, an integrated gene expression analysis and search system provides the functions of expression map analysis, correlation analysis, similar gene detection analysis, feature scoring analysis and expression value comparison analysis. CIEC processes and integrates all data sets through a standardized workflow and provides a one-stop online calculation and information retrieval system for cancer cross-tissue immune cell types, expression patterns and the like. It makes single-cell data more convenient and practical to use, provides a resource for exploring the intrinsic characteristics of immune cells in cancer patients, and may guide the development of new cancer immune biomarkers and immunotherapy strategies.
The above-mentioned embodiments are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept of the present invention, within the scope disclosed in the present patent, falls within the protection scope of the present invention.

Claims (10)

1. An analysis system for enrichment and expression profiling of cancer across tissue immune cell types, the analysis system comprising an enrichment analysis module and an expression analysis module; the enrichment analysis module is used for obtaining cancer cross-tissue immune cell types or enrichment states according to user-input custom genes based on the marker genes of all tissue-specific cell types; the expression analysis module is used for comprehensively analyzing immune cell characteristics drawn across a plurality of tissues based on marker genes of all tissue-specific cell types according to gene information input by a user, and comprises expression patterns, correlation, similar gene detection, characteristic scoring and expression value comparison;
the marker genes for all tissue-specific cell types are obtained as follows:
collecting cancer cross-tissue single-cell RNA-seq data from NCBI and publications; and performing data processing on the collected cancer cross-tissue single-cell RNA-seq data to obtain the marker genes of all tissue-specific cell types.
2. The analysis system of claim 1, wherein the enrichment analysis module allows a user to input a custom gene list or gene list file, and calculates the matched cell types and states by an enrichment algorithm based on the marker genes of all tissue-specific cell types, so as to achieve enrichment analysis of cross-tissue immune cell types and states.
3. The analysis system of claim 2, wherein calculating the matched cell types and states by the enrichment algorithm comprises:
calculating a foreground gene set and a background gene set according to the marker genes of all tissue-specific cell types, and then calculating the number of marker genes and non-marker genes for each enriched cell type;
then performing a statistical test, either a chi-square test or Fisher's exact test, to determine the significance of the enrichment and provide the corresponding p-value.
4. The analysis system of claim 1, wherein the expression map allows a user to depict the expression of one or more genes in a specific tissue type of cancer using a map, violin plot, and/or dot plot, and wherein the input parameters comprise:
input genes: inputting a custom gene or a gene list, with genes separated by the Enter key;
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
cell type: setting a custom nCells threshold;
drawing type: setting the drawing type and the violin drawing display mode.
5. The analysis system of claim 1, wherein the correlation analysis allows a user to calculate a correlation of two genes in a specific cell type of a specific tissue of a cancer, and wherein inputting parameters comprises:
input genes: two target genes;
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
cell type: setting a custom nCells threshold;
cell type: selecting a cell type;
correlation coefficient: selecting the method for calculating the correlation coefficient, namely Pearson, Spearman or Kendall;
the similar gene detection allows a user to search, in a specific cell type of a specific tissue of the cancer, for a list of genes whose expression patterns are similar to that of a custom gene; compared with the correlation analysis, only one target gene is input, and the other input parameters are the same.
6. The analysis system of claim 1, wherein the feature scoring analysis allows a user to calculate the score of a custom signature for all cell types in a specific tissue of a cancer and to compare the differences, and wherein the input parameters comprise:
input genes: at least 2 target genes are input;
tissue type: selecting a primary cancer tissue, a primary paracancerous tissue, a metastatic cancer tissue, a metastatic paracancerous tissue, a lymph node tissue, or peripheral blood;
cell type annotation: selecting "major cell type annotation" or "subcellular type annotation";
cell type: setting a custom nCells threshold;
the comparison analysis allows the user to compare the expression profile of a custom gene in all tissue-specific cell types of the cancer; compared with the feature scoring analysis, only one target gene is input, and the other input parameters are the same.
7. The analysis system of claim 1, wherein the data processing of the collected cancer cross-tissue single cell RNA-seq data to obtain marker genes for all tissue specific cell types comprises:
performing single-cell data quality control on each data set in the collected million-level cancer cross-tissue single-cell RNA-seq data to generate a gene expression matrix;
according to the gene expression matrix, identifying and removing doublets, and removing batch effects;
preliminary cell type annotation is carried out according to the specific marker genes of the main immune cell type and the non-immune cell type, and then sub-cell type clustering and annotation are carried out; finally, a plurality of cell types, including immune and non-immune cell subtypes, are identified and a comprehensive cell type profile is constructed in each tissue type;
marker genes were calculated for all tissue specific cell types based on the various cell types.
8. The analysis system of claim 7, wherein the batch effects are removed using the Harmony algorithm, which adjusts the expression model of the cells to minimize differences between batches and comprises two key steps: alignment between batches and removal of batch effects;
the alignment between batches determines the difference between batches by calculating the similarity between cells, specifically:
mapping the expression pattern of the cells into a low dimensional space using an embedding mapping; adjusting the positions of the cells by minimizing the embedding difference between batches so as to realize batch alignment;
the removal of the batch effect further reduces the batch effect by adjusting the expression value of each gene.
9. The analysis system of any one of claims 1 to 8, wherein the collected cancer cross-tissue single-cell RNA-seq data comprises million-level single-cell data from multiple cancer tissue types, specifically single-cell RNA-seq data from several hundred samples of primary cancer and paracancerous tissue, metastatic cancer and paracancerous tissue, lymph node, and peripheral blood.
10. A method for analysis and retrieval of cancer cross-tissue immune cell type enrichment and expression profiling, the method comprising:
data processing is carried out on the collected million-level cancer cross-tissue single-cell RNA-seq data to obtain marker genes of all tissue specific cell types;
based on the marker genes of all tissue-specific cell types, obtaining a cancer cross-tissue immune cell type or enrichment state according to the user-input custom genes;
based on the marker genes of all tissue-specific cell types, immune cell characteristics drawn across a plurality of tissues are comprehensively analyzed according to the genetic information input by the user, including expression profiling, correlation, similar gene detection, feature scoring and expression value comparison.
Application CN202311541116.1A, priority date 2023-11-20, filing date 2023-11-20, status Pending: Analysis system and retrieval method for enrichment and expression profile of cancer cross-tissue immune cell type; published as CN117789817A (en)


Publications (1)

Publication Number Publication Date
CN117789817A (en) 2024-03-29

Family

ID=90385885


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination