CN117789828B - Anti-aging target spot detection system based on single-cell sequencing and deep learning technology - Google Patents

Publication number
CN117789828B
Authority
CN
China
Prior art keywords
sasp
cell
gene
cells
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410218131.0A
Other languages
Chinese (zh)
Other versions
CN117789828A (en
Inventor
黄可心
韩瑜娟
周小波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University
Priority to CN202410218131.0A
Publication of CN117789828A
Application granted
Publication of CN117789828B
Legal status: Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to an anti-aging target detection system based on single-cell sequencing and deep learning, and belongs to the technical field of biology. The system combines single-cell sequencing with a deep learning model to screen anti-aging target genes. An interaction network between senescent cells and immune cells is constructed by deep learning and then verified with a linear model, which improves the reliability of the system. The system can be used to detect anti-aging targets and can be extended to other diseases and data modalities; it provides a unique perspective for studying the functions of aging-related genes and a powerful tool for identifying potential interactions between SASP and the immune microenvironment during aging.

Description

Anti-aging target spot detection system based on single-cell sequencing and deep learning technology
Technical Field
The invention relates to an anti-aging target detection system, and belongs to the technical field of biology.
Background
Aging is a complex process in which the functions of an organism gradually decline over time. In recent years, as populations age faster and the elderly population grows rapidly, population aging has posed serious challenges to global socioeconomic and healthcare decision-making. Numerous studies have shown that the aging process is regulated by a wide range of cellular and molecular changes. Studying the complex mechanisms underlying aging, so as to detect anti-aging targets accurately, is therefore of great value for developing anti-aging drugs and for delaying the aging process.
Recent studies have found that lifespan and susceptibility to disease vary greatly among elderly individuals. Starting from how an individual responds to external perturbations (such as infection), this research proposed a new index for measuring the immune state of individuals in different age groups, namely immune elasticity (Immune Resilience, IR). Immune elasticity measures the ability of the human immune system to maintain or rapidly restore immune function when perturbed. Studies have found a close relationship between an individual's immune elasticity and their lifespan and health. In addition, the decline of immune function during aging is regulated by a variety of age-related factors, such as the senescence-associated secretory phenotype (Senescence-Associated Secretory Phenotype, SASP). SASP is a phenotype of senescent cells, which secrete high levels of inflammatory cytokines, immunomodulators, growth factors, proteases and the like; these cause chronic inflammation in the body and further inhibit the immune surveillance function of immune cells. SASP secretion has an important impact on the regulation of immune function, and elucidating its molecular mechanisms will help reveal how immune elasticity changes during aging. Building on existing results, studying the genes involved in immune cell regulation and their interaction with SASP signals, and identifying the genes that regulate immune elasticity, makes it possible to understand and quantify immune changes during aging at the systems level.
Previous studies suggest that SASP is highly heterogeneous and that its exact composition depends on the senescence inducer and the cell type. Recent advances in single-cell sequencing provide powerful means for studying the functional heterogeneity of the SASP factors secreted by senescent cells. The rapid growth of single-cell sequencing data resources also provides a platform for a comprehensive understanding of phenotypic and functional changes in senescent cells. However, single-cell sequencing data are high-dimensional and highly sparse, which poses great challenges to traditional statistical and machine learning methods. Traditional methods such as linear regression require the data to satisfy linearity assumptions, which makes it difficult to capture complex nonlinear relationships. Deep learning models can largely make up for these shortcomings.
Deep learning methods are diverse and include generative models, discriminative models, graph models and others. For example, a graph neural network (Graph Neural Network, GNN) captures the relational structure between entities: it learns data representations by incorporating the topology among data points, handles non-Euclidean data connected by edges well, and can describe molecular interaction networks at different levels, such as genes, proteins and pathways. The attention mechanism increases the weight of some parts of the input to a neural network and decreases the weight of others, focusing the network on the small part of the data that matters most. The Transformer model, which is built on the attention mechanism, likewise makes it possible to focus on associations between different genes in different cell types. BERT (Bidirectional Encoder Representations from Transformers) is a language representation model based on the Transformer architecture that performs very well on many natural language processing (Natural Language Processing, NLP) tasks. Combining deep learning with single-cell sequencing helps identify the key signal transduction pathways and genes that regulate immune elasticity, and thus provides a new strategy for studying anti-aging targets.
Some studies have already examined changes in immune characteristics during aging. The team of Professor Sunil K. Ahuja at the University of Texas Health Science Center at San Antonio proposed a new measure of immune status in aging, immune elasticity. The researchers define immune elasticity as the ability of an individual to maintain or restore immune function in the face of inflammatory stimuli. To test the reliability of this concept, the study integrated four published cohort datasets, defined two indices, and validated the theoretical framework in a large dataset. The team measured immune elasticity in two ways: 1. By measuring the balance between CD8+ and CD4+ T cells. T cells fight infection, but in many infectious and autoimmune diseases their levels are out of balance. The balance between CD8+ and CD4+ T cells is divided into four categories, called immune health grades (Immune Health Grades, IHGs), which were measured across different infection cohorts and age ranges. 2. By measuring the expression levels of genes associated with immune competence and a greater chance of survival, and of genes associated with inflammation and a higher risk of death. From these, the survival-associated signature (Survival-Associated Signature, SAS)-1 and the mortality-associated signature (Mortality-Associated Signature, MAS)-1 were defined, giving the SAS-1/MAS-1 index. In this way, gene expression signatures of high immune competence and low inflammation were identified, and the immune health grade and optimal immune elasticity were obtained [Ahuja, S.K., Manoharan, M.S., Lee, G.C. et al. Immune resilience despite inflammatory stress promotes longevity and favorable health outcomes including resistance to infection. Nat Commun 14, 3286 (2023)].
These prior studies have three limitations: they do not reveal the mechanisms underlying changes in immune elasticity at the cellular level during aging; they rely only on statistical methods such as regression and correlation analysis, without developing or applying a dedicated algorithmic model; and they do not examine how changes in an individual's immune elasticity relate to age-related cytokines.
Disclosure of Invention
The invention aims to establish an anti-aging target detection system based on single-cell sequencing and deep learning.
The invention provides an anti-aging target detection system that screens anti-aging target genes by combining single-cell sequencing with a deep learning model.
The anti-aging target detection system comprises the following functional modules:
a. a data acquisition and processing module;
b. a data analysis module;
c. a module for constructing the crosstalkBERT model and the IRES model and screening anti-aging target genes.
The data acquisition and processing module in item a performs the following steps:
S1: collecting single-cell transcriptome (scRNA-seq) data, and performing quality control and preprocessing on the data; the cells are those potentially characteristic of how immune cells regulate immune elasticity during aging;
S2: scoring the cells from step S1 with senescent-cell marker genes to identify senescent cells within the tissue;
S3: selecting, from the senescent cells identified in step S2, those that interact with immune cells, based on the expression of ligand and receptor genes;
S4: determining the SASP factors secreted by senescent cells, collecting SASP perturbation-response datasets, and establishing a SASP-gene response matrix.
Further preferably, the data in S1 are sequencing data of T cells, NK cells and monocytes from blood, skeletal muscle and lung tissue stored in the AgeAnno database;
The marker gene scoring method in S2 uses ssGSEA to calculate the enrichment score of the senescence marker genes in each cell, giving a senescent cell score (SCS) for each cell, and divides the cells into senescent and non-senescent cells according to the SCS;
The senescent cells in S3 that interact with immune cells promote secondary or paracrine senescence by secreting SASP factors;
The specific method of S3 is to extract ligand gene and receptor gene information from the CellTalkDB database and to normalize the expression levels of the ligand and receptor genes:
L_i is the ligand gene expression value in cell i; after normalization it is denoted α_i:
(1)
R_j is the receptor gene expression value in cell j; after normalization it is denoted β_j:
(2)
The interaction score of senescent cell i with immune cell j is calculated as:
(3)
In S3, the senescent cell-immune cell pairs whose interaction scores rank in the top 25% are selected and their interrelationships identified, and the whole gene expression profiles of the single-cell transcriptome samples selected in this step are taken as the response variable;
In S4, perturbation-response datasets of SASP factors are searched for in public databases using aging-related SASP factors as keywords, the data type being transcriptome or microarray expression data; all collected datasets undergo quality control and processing, SASP factors with both perturbation and control conditions are selected for differential gene expression analysis between the perturbation and control groups, and the top 5% of differential genes form the response matrix, which serves as the explanatory variable.
The data analysis module in item b performs the following step:
S5: scoring the immunosuppressive activity of SASP in the three tissues with a ridge regression model, and establishing a SASP activity matrix; then verifying and evaluating the fitting accuracy of the SASP activity matrix.
For verification, a transcription factor activity-SASP activity prediction model is constructed with the random forest method to check whether SASP activity can predict the activity of downstream transcription factors.
The crosstalkBERT model in item c is constructed as follows:
1) Data preprocessing: integrating the single-cell transcriptome data and the SASP factor activity matrix into an interaction node feature representation;
2) Extracting gene-gene interactions from the STRING database and constructing a gene-gene relation knowledge graph;
3) Processing the knowledge graph with a graph attention network to learn an embedding for each gene node;
4) Pre-training: feeding the interaction feature embeddings into a pre-training model for MLM pre-training;
5) Fine-tuning: the parameters of the original pre-trained model are frozen and not updated; only the parameters of the newly added classification head are updated, so that the network is fine-tuned on the downstream task of predicting immune cell type; when the loss between the predicted immune cell type and the label converges, the model is considered to have reached its best performance;
6) Testing: the immune cell classification performance of the fine-tuned model is evaluated on an independent test set;
7) Outputting the interaction network: the attention matrix is output, the gene-gene, SASP-SASP and gene-SASP interaction network structure is visualized, and important signals affecting immune cells are identified.
The IRES model in item c is constructed as follows:
1) Extracting T cell, NK cell and monocyte expression data from the dataset, and constructing expression matrices for these three types of immune cells;
2) Collecting genes related to T cell, NK cell and monocyte activation from several gene function annotation libraries (KEGG, GO and MSigDB) and constructing a binary feature vector; using a linear regression model, with the feature vector and the overall gene expression matrix of the cells as input, to calculate an activation level score for each cell; performing Pearson correlation analysis between the activity of each SASP and the activation level of each cell, and selecting the SASPs that meet the correlation coefficient threshold as the inhibitory SASPs of the corresponding immune cell type;
3) Identifying immune elasticity marker genes with a linear interaction regression model; for each inhibitory SASP, the linear interaction regression model is:
$$\mathrm{Act} = a \cdot \mathrm{SASP} + b \cdot G + c \cdot (\mathrm{SASP} \times G) + d \qquad (4)$$
Wherein: Act is the activation level of the immune cell, SASP is the activity of the inhibitory SASP, a is the regression coefficient of the SASP activity term, b is the regression coefficient of the gene expression term, c is the regression coefficient of the SASP-gene interaction term, d is the intercept of the regression equation, and G is the expression level of the gene in immune cells; the four parameters a, b, c and d are estimated by the least squares method, and their magnitude and significance reflect the influence of the different explanatory variables on the response variable;
4) Output: a t-test is performed on the coefficient of each SASP-gene interaction term, i.e. the c value, and genes with a significant contribution (p-value < 0.05) are selected as SASP repressor genes.
The invention optimizes the traditional BERT model and provides a graph-network-guided crosstalkBERT model that captures gene-SASP, gene-gene and SASP-SASP interactions. It aims to capture more comprehensively how immune elasticity changes at the cellular level during aging and the molecular mechanisms underlying immune elasticity, and it addresses the fact that the correlation between immune elasticity and SASP factors has not yet been studied. A novel immune cell elasticity score model (Immune Cell Resilience Score, IRES) is also established to verify the immune elasticity of immune cells to SASP factors during aging, and the key regulatory genes affecting immune cell elasticity are identified through the IRES model. The target genes screened in this way are significant for in-depth study of the complex mechanisms of aging, and provide new information for developing anti-aging drugs and for preventing and treating age-related diseases such as Alzheimer's disease.
The beneficial effects of the invention are as follows:
1. The invention constructs an interaction network between senescent cells and immune cells with deep learning and then verifies it with a linear model, which improves the reliability of the system and establishes a complete system for detecting and analyzing anti-aging targets, a step toward delaying the aging process;
2. The invention can be used to detect anti-aging targets and can also be extended to other diseases and data modalities;
3. The crosstalkBERT model established by the invention comprehensively and systematically identifies gene-SASP, gene-gene and SASP-SASP interactions, integrates interaction data among various cytokines, establishes an interaction network between senescent cells and immune cells, and reveals the complex mechanisms underlying it;
4. The IRES model established by the invention measures the immune elasticity of immune cells to SASP factors during aging, extending immune elasticity to the cellular level for the first time;
5. The invention fits the activity level of SASP with a ridge regression model, which improves the prediction accuracy of the system;
6. The invention can also be used to study the heterogeneity of senescent cells, providing a unique perspective for studying the functions of senescence-associated genes and a powerful tool for identifying potential interactions between SASP and the immune microenvironment during senescence.
Drawings
Fig. 1: a system flow diagram;
Fig. 2: data collection and processing diagram;
Fig. 3: identification of senescent and non-senescent cells; wherein Fig. 3A: senescent and non-senescent cells in blood tissue; Fig. 3B: proportion of senescent and non-senescent cells in the young and old groups; Fig. 3C: expression of senescence marker genes in blood cells; Fig. 3D: senescent and non-senescent cells in lung tissue; Fig. 3E: expression of senescence marker genes in lung tissue cells;
Fig. 4: screening senescent cells that interact with immune cells by calculating an interaction score;
Fig. 5: SASP activity prediction workflow diagram;
Fig. 6: workflow diagram of the crosstalkBERT model;
Fig. 7: IRES model building flow chart.
Detailed Description
Example 1 Anti-aging target detection system of the invention based on single-cell sequencing and deep learning
The invention is realized by the following technical scheme:
S1: collecting single-cell transcriptome (scRNA-seq) data, and performing quality control and preprocessing on the data;
S2: identifying senescent cells within the tissue by scoring senescent-cell marker genes;
S3: screening senescent cells that interact with immune cells based on the expression of ligand and receptor genes;
S4: determining the SASP factors secreted by senescent cells, collecting SASP perturbation-response datasets, and establishing a SASP-gene response matrix;
S5: scoring the immunosuppressive activity of SASP in tissues with a ridge regression model and establishing a SASP activity matrix; evaluating the fitting accuracy of the SASP activity matrix;
S6: constructing the crosstalkBERT model, identifying key regulatory genes, and constructing the gene-gene, SASP-gene and SASP-SASP interaction network;
S7: establishing the IRES model, refining the interaction network constructed in S6 with a linear interaction model, and screening anti-aging target genes.
The overall flow of the system is shown in Fig. 1.
Each of the key steps of the present invention is described in further detail below:
S1: collecting single-cell transcriptome (scRNA-seq) data, and performing quality control and preprocessing on the data:
To determine the potential characteristics by which immune cells regulate immune elasticity during aging, we extracted single-cell sequencing data for multiple immune cell types (T cells, NK cells, monocytes) from three tissues (blood, skeletal muscle, lung) stored in the AgeAnno database. As shown in Fig. 2, basic information was extracted for each sample: age, sex, tissue type, cell-specific marker genes, single-cell expression profile, and the like. All samples then underwent quality control and preprocessing.
S2: identifying senescent cells within the tissue by scoring senescent-cell marker genes:
According to a previously published study, 125 widely recognized marker genes of senescent cells, such as JUN, NRG1, WNT2 and MMPs, were collected [D. Saul et al., "A new gene set identifies senescent cells and predicts senescence-associated pathways across tissues," Nature Communications, vol. 13, no. 1, p. 4827, 2022.]. This step uses ssGSEA (single-sample GSEA), a modified form of GSEA (Gene Set Enrichment Analysis) [A. Subramanian et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," Proceedings of the National Academy of Sciences, vol. 102, no. 43, pp. 15545-15550, 2005.], to calculate the enrichment score of the 125 senescence marker genes in each cell, giving a senescent cell score (Senescent Cell Score, SCS) for each cell. All cells were sorted in descending order of SCS; the top 8% were taken as the high group and the remainder as the low group, the high group corresponding to predicted senescent cells and the low group to predicted non-senescent cells. Figs. 3A-C show the identification of senescent cells in blood tissue, and Figs. 3D-E in lung tissue. Figs. 3A and 3D show UMAP (Uniform Manifold Approximation and Projection) plots of the senescent and non-senescent cells identified in blood and lung tissue, respectively. As Fig. 3B shows, the proportion of senescent cells is higher in the elderly group, consistent with previous studies.
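The thresholding step described above can be sketched as follows. This is a minimal illustration that assumes the SCS values have already been computed (for example by an ssGSEA implementation); the function name `label_senescent`, the cell identifiers and the rounding rule for the group size are assumptions made for illustration and are not taken from the patent.

```python
def label_senescent(scs_by_cell, top_fraction=0.08):
    """Split cells into predicted senescent / non-senescent by SCS rank.

    scs_by_cell: dict mapping a cell identifier to its senescent cell score.
    top_fraction: share of cells assigned to the high (senescent) group.
    """
    # Rank cells by SCS, highest score first.
    ranked = sorted(scs_by_cell, key=scs_by_cell.get, reverse=True)
    # Size of the high group; rounding to the nearest integer is an assumption.
    n_high = max(1, round(top_fraction * len(ranked)))
    high = set(ranked[:n_high])
    return {cell: ("senescent" if cell in high else "non-senescent")
            for cell in ranked}


# Toy input: 100 hypothetical cells whose SCS increases with the index.
scores = {f"cell_{k}": k / 100 for k in range(100)}
labels = label_senescent(scores)
```

With 100 cells and the 8% cutoff, the eight highest-scoring cells form the predicted senescent group.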
To verify the reliability of the identified senescent cells, differential gene analysis was first performed between age groups within the same cell type, followed by functional category enrichment analysis of the differential genes. Tables 1 and 2 show that the enrichment scores of gene sets in functional categories such as cellular senescence, extracellular space and secretory vesicles are significantly increased, and these categories are strongly associated with cellular senescence [T. Misawa, Y. Tanaka, R. Okada, and A. Takahashi, "Biology of extracellular vesicles secreted from senescent cells as senescence-associated secretory phenotype factors," Geriatrics & Gerontology International, vol. 20, no. 6, pp. 539-546, 2020.].
CDKN1A/p21 and CDKN2A/p16 [J. A. López-Domínguez et al., "Cdkn1a transcript variant 2 is a marker of aging and cellular senescence," Aging (Albany NY), vol. 13, no. 10, p. 13380, 2021.] have been shown to be the most typical marker genes of senescent cells. A second verification approach is therefore to evaluate the expression of known senescent-cell marker genes. We examined the expression of these typical senescence marker genes in senescent and non-senescent cells. As Figs. 3C and 3E show, the senescence marker genes are expressed at significantly higher levels in senescent cells than in non-senescent cells.
Table 1: gene enrichment score of senescent cells in blood tissue
Table 2: gene enrichment score of senescent cells in lung tissue
S3: screening senescent cells that interact with immune cells based on the expression of ligand and receptor genes:
Senescent cells act on surrounding immune cells mainly by secreting SASP factors, forcing neighboring cells into secondary or paracrine senescence [G. Nelson et al., "A senescent cell bystander effect: senescence-induced senescence," Aging Cell, vol. 11, no. 2, pp. 345-349, 2012.]. It is therefore necessary to identify the immune cells that may be affected by senescent cells. As shown in Fig. 4, this step extracts ligand gene and receptor gene information from the CellTalkDB database [X. Shao, J. Liao, C. Li, X. Lu, J. Cheng, and X. Fan, "CellTalkDB: a manually curated database of ligand–receptor interactions in humans and mice," Briefings in Bioinformatics, vol. 22, no. 4, p. bbaa269, 2021.] and normalizes the expression levels of the ligand and receptor genes.
L_i is the ligand gene expression value in cell i; after normalization it is denoted α_i:
(1)
R_j is the receptor gene expression value in cell j; after normalization it is denoted β_j:
(2)
The interaction score of senescent cell i with immune cell j is calculated as:
(3)
The interaction score reflects how the expression levels of the ligand and receptor may affect signal transmission. The senescent cell-immune cell pairs whose interaction scores rank in the top 25% are then selected to identify their interrelationships and the range of cells that senescent cells may influence, providing cell-cell interaction information for the subsequent models. The whole gene expression profiles of the single-cell transcriptome samples selected in this step, i.e. the y variable in Fig. 5, are taken as the "response variable".
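The pair-screening logic can be sketched as below. Formulas (1) to (3) are not reproduced in this text, so the sketch substitutes two stand-ins that are assumptions and should not be read as the patent's formulas: min-max scaling for the normalization, and the product α_i · β_j for the interaction score. The function names and the toy expression values are likewise hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]. ASSUMED normalization, not the patent's formula."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_interacting_pairs(ligand_expr, receptor_expr, top_fraction=0.25):
    """Score every senescent-immune cell pair and keep the top-ranked fraction.

    ligand_expr: ligand expression L_i, one value per senescent cell i.
    receptor_expr: receptor expression R_j, one value per immune cell j.
    Returns a list of (score, i, j) tuples, highest score first.
    """
    alpha = min_max(ligand_expr)
    beta = min_max(receptor_expr)
    # ASSUMED score: product of normalized ligand and receptor expression.
    scored = [(alpha[i] * beta[j], i, j)
              for i in range(len(alpha)) for j in range(len(beta))]
    scored.sort(reverse=True)
    n_keep = max(1, round(top_fraction * len(scored)))
    return scored[:n_keep]


# Toy input: 4 senescent cells and 4 immune cells give 16 candidate pairs.
pairs = top_interacting_pairs([0.0, 2.0, 4.0, 8.0], [1.0, 3.0, 5.0, 9.0])
```

Only the ranking and the 25% cutoff come from the description; any monotone score built from α_i and β_j could be substituted without changing the rest of the sketch.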
S4: determining the SASP factors secreted by senescent cells, collecting SASP perturbation-response datasets, and establishing a SASP-gene response matrix:
Numerous studies have shown that the secretion of SASP factors is closely linked to cellular senescence, and previous studies have identified more than 60 SASP factors associated with senescence [V. Gorgoulis et al., "Cellular senescence: defining a path forward," Cell, vol. 179, no. 4, pp. 813-827, 2019.]. To estimate SASP activity accurately, public databases were searched, as shown in Fig. 5, for perturbation-response datasets of SASP factors, using these SASP factors as keywords; the data type is transcriptome or microarray expression data. All collected datasets underwent quality control and processing, and only SASP factors with both perturbation and control conditions were retained. Differential gene expression analysis was performed between the perturbation and control groups, and the top 5% of differential genes were taken as the response matrix, i.e. the X variable in Fig. 5, which serves as the "explanatory variable".
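One way to assemble such a response matrix is sketched below. The text does not say how differential genes are ranked, so ranking by absolute log2 fold change is an assumption, as are the function name `build_response_matrix`, the gene names and the choice to keep the union of per-factor top genes.

```python
import numpy as np


def build_response_matrix(log2fc, gene_names, top_fraction=0.05):
    """Keep genes in the top fraction by |log2FC| for at least one SASP factor.

    log2fc: array of shape (n_genes, n_factors) holding perturbation-vs-control
            log2 fold changes, one column per SASP factor.
    Returns (kept_gene_names, response_matrix) where response_matrix has one
    row per kept gene and one column per SASP factor.
    """
    log2fc = np.asarray(log2fc, dtype=float)
    n_genes, n_factors = log2fc.shape
    n_top = max(1, round(top_fraction * n_genes))
    keep = np.zeros(n_genes, dtype=bool)
    for k in range(n_factors):
        # Indices of the n_top genes with the largest absolute change (ASSUMED ranking).
        top_idx = np.argsort(-np.abs(log2fc[:, k]))[:n_top]
        keep[top_idx] = True
    kept_genes = [g for g, flag in zip(gene_names, keep) if flag]
    return kept_genes, log2fc[keep]


# Toy input: 40 hypothetical genes, 2 hypothetical SASP factors.
fc = np.zeros((40, 2))
fc[3, 0], fc[7, 0] = 5.0, -4.0
fc[3, 1], fc[10, 1] = -6.0, 3.0
genes, X = build_response_matrix(fc, [f"gene_{g}" for g in range(40)])
```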
S5: scoring the immunosuppressive activity of SASP in the three tissues with a ridge regression model and establishing a SASP activity matrix; evaluating the fitting accuracy of the SASP activity matrix:
After the explanatory and response variables are constructed, this step builds a ridge regression model to establish the SASP activity matrix. The coefficient matrix fitted for the response matrix is taken as the SASP activity. To verify SASP activity, we constructed a transcription factor activity-SASP activity prediction model with the random forest method to check whether SASP activity can predict the activity of downstream transcription factors.
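A minimal sketch of the ridge step, using the closed-form solution β = (XᵀX + λI)⁻¹XᵀY. It assumes that the rows of X and Y are the same genes, that no intercept or centering is applied, and that the penalty λ is chosen by the user; the patent does not state these details, and the function name `ridge_activity` is hypothetical.

```python
import numpy as np


def ridge_activity(X, Y, lam=1.0):
    """Estimate SASP activity per cell by ridge regression.

    X: (n_genes, n_factors) SASP-gene response matrix (explanatory variable).
    Y: (n_genes, n_cells) expression profiles of the selected cells
       (response variable), restricted to the same genes as X.
    lam: ridge penalty; its value here is an ASSUMPTION.
    Returns an (n_factors, n_cells) matrix of coefficients, read as activities.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_factors = X.shape[1]
    # Closed-form ridge solution: solve (X^T X + lam * I) B = X^T Y for B.
    gram = X.T @ X + lam * np.eye(n_factors)
    return np.linalg.solve(gram, X.T @ Y)


# Toy check on simulated data: 200 genes, 3 SASP factors, 5 cells.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_activity = rng.normal(size=(3, 5))
Y = X @ true_activity + 0.01 * rng.normal(size=(200, 5))
activity = ridge_activity(X, Y, lam=1e-6)
```

A larger `lam` shrinks the estimated activities toward zero, which stabilizes the fit when response signatures of different SASP factors are correlated.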
S6: constructing the crosstalkBERT model, identifying key regulatory genes, and constructing the gene-gene, SASP-gene and SASP-SASP interaction network:
Fig. 6 shows the workflow of the crosstalkBERT model; its implementation is described in further detail below with reference to Fig. 6:
1) Data preprocessing: integrating the single-cell transcriptome data and the SASP factor activity information into an interaction node feature representation;
2) Extracting gene-gene interactions from the STRING database and constructing a gene-gene relation knowledge graph;
3) Processing the knowledge graph with a graph attention network to learn an embedding for each gene node;
4) Pre-training: feeding the interaction feature embeddings into a pre-training model for MLM (Masked Language Modeling) pre-training;
5) Fine-tuning: the parameters of the original pre-trained model are frozen and not updated; only the parameters of the newly added classification head are updated, so that the network is fine-tuned on the downstream task of predicting immune cell type; when the loss between the predicted immune cell type and the label converges, the model is considered to have reached its best performance;
6) Testing: the immune cell classification performance of the fine-tuned model is evaluated on an independent test set;
7) Outputting the interaction network: the attention matrix is output, the gene-gene, SASP-SASP and gene-SASP interaction network structure is visualized, and important signals affecting immune cells are identified.
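Step 3 above relies on a graph attention network, whose architecture the text does not specify. As an illustration only, the sketch below implements one single-head layer in the standard graph attention formulation with NumPy: attention logits are LeakyReLU(aᵀ[z_i ‖ z_j]), masked to the edges of the graph and normalized with a softmax over each node's neighbors. The weights are random rather than trained, the output nonlinearity is omitted, and the four-gene chain graph is hypothetical.

```python
import numpy as np


def gat_layer(H, adj, W, a, slope=0.2):
    """One single-head graph attention layer (no output nonlinearity).

    H: (n, f_in) node features; adj: (n, n) 0/1 adjacency with self-loops;
    W: (f_in, f_out) projection; a: (2 * f_out,) attention vector.
    Returns (node embeddings of shape (n, f_out), attention matrix (n, n)).
    """
    Z = H @ W
    f_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a part for node i and a part for node j.
    src = Z @ a[:f_out]
    dst = Z @ a[f_out:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    # Non-edges get -inf so they receive exactly zero attention.
    e = np.where(adj > 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)       # numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)
    return att @ Z, att


# Hypothetical gene-gene graph: a chain of 4 genes, plus self-loops.
adj = np.eye(4)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))                    # toy node features
W = rng.normal(size=(5, 3))                    # untrained projection
a = rng.normal(size=6)                         # untrained attention vector
embeddings, attention = gat_layer(H, adj, W, a)
```

The returned attention matrix has the same role as the attention matrix in step 7: each row gives the weights one node places on its neighbors, which can be read as weighted edges of an interaction network.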
S 7: and establishing an IRES model. Optimizing an interaction network constructed in S 6 by using a linear interaction model, and screening anti-aging target genes:
FIG. 7 is a flow chart of IRES construction, first extracting T cell, NK cell and monocyte expression data from the dataset, and constructing expression matrices of these three types of immune cells. And collecting genes related to T cell, NK cell and monocyte activation from a plurality of gene function annotation libraries such as KEGG, GO, MSigDB and the like, and constructing a binary feature vector. The activation level score for each cell was calculated using a linear regression model, with the feature vector and the overall gene expression matrix of the cell as inputs. A Pearson correlation analysis was then performed on the individual SASP activities with the level of activation per cell. SASPs satisfying the correlation coefficients are screened for inhibition of SASPs by the corresponding immune cell type. Identifying the immune elasticity marker gene by using a linear interactive regression model, and establishing a linear interactive regression model formula for each inhibition SASP as follows:
$$\mathrm{Activation} = a \cdot \mathrm{SASP} + b \cdot G + c \cdot (\mathrm{SASP} \times G) + d \tag{4}$$
Wherein: Activation is the immune cell activation level (the response variable), SASP is the SASP activity, a is the regression coefficient of the SASP activity term, b is the regression coefficient of the gene expression term, c is the regression coefficient of the SASP-gene interaction term, d is the intercept of the regression equation, and G is the expression level of the gene in immune cells. All four parameters a, b, c and d are estimated by the least squares method, and their magnitude and significance reflect the influence of the different explanatory variables on the response variable.
A t-test is performed on the regression coefficient of each SASP-gene interaction term, i.e. on the value of c: c > 0 indicates that the SASP-gene interaction term is positively correlated with the immune cell activation level, and c < 0 indicates that it is negatively correlated. Genes with a significant contribution (p value < 0.05) are selected as SASP repressor genes. An F test is then applied to select genes for which the model is significant overall (p value < 0.05) and the proportion of interaction terms exceeds 50%. Finally, the interaction network is examined to verify whether the selected genes agree with the experimental results; genes that do are taken as immunoelastic regulatory genes.
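The linear interaction regression and the test on the interaction coefficient can be sketched as follows. This is a minimal illustration on synthetic data, not the patented implementation: the coefficient names a, b, c and d follow claim 1, the function names are hypothetical, and the p-value uses a normal approximation to the t distribution (reasonable only for large samples) so that the example needs nothing beyond the Python standard library.

```python
import random
from statistics import NormalDist

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_interaction_model(sasp, gene, activation):
    """Least-squares fit of activation = a*S + b*G + c*S*G + d, with a test on c."""
    X = [[s, g, s * g, 1.0] for s, g in zip(sasp, gene)]
    n, p = len(X), 4
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, activation)) for i in range(p)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [y - sum(c * x for c, x in zip(coef, row)) for row, y in zip(X, activation)]
    sigma2 = sum(r * r for r in residuals) / (n - p)   # residual variance
    t_c = coef[2] / (sigma2 * xtx_inv[2][2]) ** 0.5    # t statistic of the interaction term
    # Assumption: normal approximation to the two-sided t-test p-value.
    p_c = 2.0 * (1.0 - NormalDist().cdf(abs(t_c)))
    return dict(a=coef[0], b=coef[1], c=coef[2], d=coef[3], t_c=t_c, p_c=p_c)

# Synthetic data with a known negative interaction (c = -0.8); values are illustrative only.
rng = random.Random(0)
sasp = [rng.uniform(0, 2) for _ in range(200)]
gene = [rng.uniform(0, 2) for _ in range(200)]
activation = [0.5 * s + 1.0 * g - 0.8 * s * g + 2.0 + rng.gauss(0, 0.1)
              for s, g in zip(sasp, gene)]
fit = fit_interaction_model(sasp, gene, activation)
print({k: round(v, 3) for k, v in fit.items()})
```

A gene would pass the screening described above when `p_c` falls below 0.05; the sign of `c` then indicates whether the SASP-gene interaction is positively or negatively associated with the activation level.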
The alternatives existing in the steps of the detection system of the present invention are as follows:
1. Alternative to calculation of the interaction value between cytokines:
The contribution of the immune elasticity marker genes can also be assessed at the cellular level using the Shapley Additive Explanations (SHAP) method. An advantage of SHAP in interaction analysis is its ability to assign values to feature pairs: SHAP interaction values can be used to quantify the joint contribution of gene pairs, SASP factor pairs and gene-SASP pairs in immune cells. SHAP interaction values make no assumption about the form of the interaction and can capture more complex, potentially nonlinear interactions.
2. Alternative to identifying senescent cells:
In addition to the ssGSEA method, methods based on probabilistic modeling, such as the variational autoencoder (VAE), can be used: a generative model learns the latent distribution of the single-cell data, which is then used to identify senescent cells.
3. Alternative to establishing the IRES model:
In addition to a linear regression model, the IRES model can be constructed with a nonlinear machine learning algorithm, such as an SVM or a random forest, in order to fit nonlinear relationships between the variables.
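As an illustration of the first alternative above, the following sketch computes an exact Shapley interaction value by brute force for a small toy model. It assumes a single baseline point as the reference (features outside the coalition are set to their baseline value), and it is not the API of the `shap` library; the model, the feature names and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(model, x, baseline, i, j):
    """Exact Shapley interaction value for features i and j against one baseline point."""
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]

    def value(subset):
        # Features in `subset` take the observed value; the rest stay at the baseline.
        point = [x[k] if k in subset else baseline[k] for k in range(m)]
        return model(point)

    total = 0.0
    for size in range(len(others) + 1):
        # Weight of a coalition of this size in the interaction index.
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for combo in combinations(others, size):
            s = set(combo)
            delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
            total += weight * delta
    return total

# Hypothetical model: features are [SASP activity, gene 1 expression, gene 2 expression].
def toy_model(v):
    sasp, gene1, gene2 = v
    return 0.5 * sasp + 1.0 * gene1 - 0.8 * sasp * gene1 + 0.3 * gene2

x = [2.0, 1.5, 4.0]
baseline = [0.0, 0.0, 0.0]
print(shapley_interaction(toy_model, x, baseline, 0, 1))  # SASP x gene 1
print(shapley_interaction(toy_model, x, baseline, 0, 2))  # SASP x gene 2
```

Because the toy model contains a product term only for SASP and gene 1, the SASP-gene 1 value is non-zero while the SASP-gene 2 value vanishes. Brute-force enumeration is exponential in the number of features, so it only serves to show what the quantity measures.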
The key technical points of the invention mainly include the following six points:
(1) The invention designs a complete analysis workflow for constructing the anti-aging target detection system, with the following specific steps:
S1: collecting single-cell transcriptome (scRNA-seq) data, and performing quality control and preprocessing on the data;
S2: identifying senescent cells within the tissue by scoring marker genes of senescent cells;
S3: screening senescent cells that interact with immune cells by the expression of ligand and receptor genes;
S4: determining the SASP secreted by senescent cells, collecting SASP perturbation-response datasets, and establishing a SASP-gene response matrix;
S5: scoring the immunosuppressive activity of SASP in tissues with a ridge regression model, establishing a SASP activity matrix, and evaluating the fitting accuracy of the SASP activity matrix;
S6: constructing the crosstalkBERT model, identifying key regulatory genes, and constructing gene-gene, SASP-gene and SASP-SASP interaction networks;
S7: establishing the IRES model, optimizing the interaction network constructed in S6 with a linear interaction model, and screening anti-aging target genes.
(2) Identification of senescent cells within a tissue by scoring marker genes of senescent cells (S2): the ssGSEA method is used to calculate, for each cell, the enrichment score of the 125 collected senescence-associated genes, thereby distinguishing senescent from non-senescent cells.
(3) Screening of senescent cells that interact with immune cells by the expression of ligand and receptor genes (S3): ligand and receptor gene information is extracted from the CellTalkDB database, the expression levels of the ligand and receptor genes are normalized (formulas 1 and 2), and the interaction score between senescent cells and immune cells is calculated (formula 3); cell pairs whose interaction score exceeds a given threshold are then selected, and their interaction relationships are identified.
(4) Construction of the crosstalkBERT model (S6): first, the data are preprocessed: single-cell transcriptome data and the predicted activity of SASP molecules on different immune cells are integrated into an interaction node feature representation; gene-gene interactions are extracted from the STRING database to construct a gene-gene relationship knowledge graph; the knowledge graph is then processed with a graph attention network; the interaction feature embeddings are input into a pre-training model for MLM pre-training; finally, fine-tuning is carried out and the prediction results are tested.
(5) Establishment of the IRES model (S7): the immune cell activation level is calculated with a linear regression model (formula 4). Pearson correlation analysis is then performed between the activity of each SASP and the activation level of each cell. Next, a t-test is performed on each SASP-gene interaction coefficient, and genes with a significant contribution are selected as SASP repressor genes. An F test is then used to select genes for which the model is significant overall and the proportion of interaction terms exceeds 50%. Finally, the network is examined to verify whether the selected genes agree with the experimental results; genes that do are taken as immunoelastic regulatory genes.
(6) The invention can also be applied to identifying potential interactions between SASP and the immune microenvironment during aging, to anti-aging drug development, and to the prevention and treatment of age-related diseases such as Alzheimer's disease. In addition, the interaction network obtained by the crosstalkBERT model can be used to identify the heterogeneity of senescent cells in different tissues.
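The ridge regression scoring of SASP activity in step S5 can be sketched as below. The sketch assumes, following claim 2, that a cell's gene expression profile is the response variable and that the SASP-gene response matrix supplies the explanatory variables; the closed-form estimate (R^T R + lambda*I)^-1 R^T y is the standard ridge solution, while the penalty value, the toy response matrix and the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_activity(response_matrix, expression, lam=1.0):
    """Ridge estimate of per-SASP activity: (R^T R + lam*I)^-1 R^T y."""
    k = len(response_matrix[0])
    rtr = [[sum(row[i] * row[j] for row in response_matrix) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    rty = [sum(row[i] * y for row, y in zip(response_matrix, expression)) for i in range(k)]
    return solve(rtr, rty)

# Rows are genes, columns are two hypothetical SASP factors (made-up response signatures).
response_matrix = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
    [1.0, -1.0],
]
# One cell's expression profile, generated here as 2.0 * factor 1 + 0.5 * factor 2.
expression = [2.0 * r[0] + 0.5 * r[1] for r in response_matrix]
print(ridge_activity(response_matrix, expression, lam=0.0))  # no shrinkage
print(ridge_activity(response_matrix, expression, lam=1.0))  # shrunk toward zero
```

Repeating the fit for every cell would yield one row of the SASP activity matrix per cell. The penalty shrinks the estimates toward zero, which stabilizes them when the response signatures of different SASP factors are correlated.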

Claims (3)

1. An anti-aging target detection system, characterized in that: anti-aging target genes are screened on the basis of single-cell sequencing combined with a deep learning model; the system comprises the following functional modules:
a. a data acquisition and processing module;
b. a data analysis module;
c. a module for constructing the crosstalkBERT model and the IRES model and screening anti-aging target genes;
the data acquisition and processing module in step a comprises the following steps:
S1: collecting single-cell transcriptome (scRNA-seq) data, and performing quality control and preprocessing on the data; the cells are potential characteristic cells that regulate the immune elasticity of immune cells during aging;
S2: scoring the cells of step S1 by marker genes of senescent cells to identify senescent cells within the tissue;
S3: from the senescent cells identified in step S2, selecting those that interact with immune cells by the expression of ligand and receptor genes;
S4: determining the SASP secreted by senescent cells, collecting SASP perturbation-response datasets, and establishing a SASP-gene response matrix;
the data analysis module in step b comprises:
S5: scoring the immunosuppressive activity of SASP in three tissues with a ridge regression model, and establishing a SASP activity matrix; verifying and evaluating the fitting accuracy of the SASP activity matrix;
the crosstalkBERT model in step c is constructed as follows:
1) Data preprocessing: integrating single-cell transcriptome data and the SASP factor activity matrix into an interaction node feature representation;
2) Extracting the interaction relationships between genes from the STRING database, and constructing a gene-gene relationship knowledge graph;
3) Processing the knowledge graph with a graph attention network, and learning an embedding representation for each gene node;
4) Pre-training: inputting the interaction feature embeddings into a pre-training model for MLM pre-training;
5) Fine-tuning: the parameters of the original pre-trained model are frozen and not updated, and only the parameters of the newly added classification head are updated, so that the network is fine-tuned on the downstream task of predicting immune cell type; when the loss between the predicted immune cell types and the labels converges, the model is considered to have reached its best performance;
6) Testing: the immune cell classification performance of the fine-tuned model is evaluated on an independent test set;
7) Output of the interaction network: outputting the attention matrix, visualizing the gene-gene, SASP-SASP and gene-SASP interaction network structures, and identifying important signals affecting immune cells;
the IRES model in step c is established as follows:
1) Extracting T cell, NK cell and monocyte expression data from the data set, and constructing expression matrixes of the three types of immune cells;
2) Collecting genes related to T cell, NK cell and monocyte activation from the gene function annotation libraries KEGG, GO and MSigDB, and constructing a binary feature vector; using a linear regression model with the feature vector and the overall gene expression matrix of the cells as inputs, calculating the activation level score of each cell; performing Pearson correlation analysis between the activity of each SASP and the activation level of each cell, and selecting SASPs that satisfy the correlation coefficient criterion as inhibitory SASPs of the corresponding immune cell type;
3) Identifying the immune elasticity marker genes with a linear interaction regression model, the model for each inhibitory SASP being formulated as follows:
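$$\mathrm{Activation} = a \cdot \mathrm{SASP} + b \cdot G + c \cdot (\mathrm{SASP} \times G) + d \tag{4}$$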
Wherein: a is the regression coefficient of the SASP activity term, b is the regression coefficient of the gene expression term, c is the regression coefficient of the SASP-gene interaction term, d is the intercept of the regression equation, and G is the expression level of the gene in immune cells; all four parameters a, b, c and d are estimated by the least squares method, and their magnitude and significance reflect the influence of the different explanatory variables on the response variable;
4) Output of results: a t-test is performed on the coefficient of each SASP-gene interaction term, i.e. the value of c, and genes with a significant contribution (p value < 0.05) are selected as SASP repressor genes.
2. The anti-aging target detection system of claim 1, wherein, in the data acquisition and processing module in step a:
the data in S1 are sequencing data of T cells, NK cells and monocytes from blood, skeletal muscle and lung tissue stored in the AgeAnno database;
the marker gene scoring method in S2 is to calculate the enrichment score of the senescence marker genes in each cell by ssGSEA to obtain the senescent cell score (SCS) of each cell, and to divide the cells into senescent and non-senescent cells according to the SCS;
the senescent cells that interact with immune cells described in S3 promote secondary or paracrine senescence by secreting SASP factors;
the specific method of S3 is to extract ligand and receptor gene information from the CellTalkDB database and to normalize the expression levels of the ligand and receptor genes:
l_i is the expression value of the ligand gene in cell i, and its value after normalization is α_i:
(1)
R_j is the expression value of the receptor gene in cell j, and its value after normalization is β_j:
(2)
the interaction score between senescent cell i and immune cell j is calculated:
(3)
in S3, the senescent cell-immune cell pairs whose interaction scores rank in the top 25% are selected and their interaction relationships are identified, and the whole gene expression profile of the single-cell transcriptome samples selected in this step is taken as the response variable;
in S4, perturbation-response datasets of SASP factors are searched for in public databases using aging-related SASP factors as keywords, the data type being transcriptome or microarray expression data; all collected datasets undergo quality control and processing, SASP factors with perturbation-control conditions are selected for differential gene expression analysis between the perturbation group and the control group, and the top 5% of differential genes are used as the response matrix, which serves as the explanatory variable.
3. The anti-aging target detection system of claim 1, wherein: the verification method in step b is to construct a transcription factor activity-SASP activity prediction model by the random forest method and to verify whether SASP activity can predict the activity of downstream transcription factors.
CN202410218131.0A 2024-02-28 2024-02-28 Anti-aging target spot detection system based on single-cell sequencing and deep learning technology Active CN117789828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410218131.0A CN117789828B (en) 2024-02-28 2024-02-28 Anti-aging target spot detection system based on single-cell sequencing and deep learning technology

Publications (2)

Publication Number Publication Date
CN117789828A CN117789828A (en) 2024-03-29
CN117789828B true CN117789828B (en) 2024-04-30

Family

ID=90383785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410218131.0A Active CN117789828B (en) 2024-02-28 2024-02-28 Anti-aging target spot detection system based on single-cell sequencing and deep learning technology

Country Status (1)

Country Link
CN (1) CN117789828B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108279308A (en) * 2017-12-05 2018-07-13 上海相宜本草化妆品股份有限公司 A kind of Chinese herbal medicine substance and its screening technique
CN111218515A (en) * 2020-02-17 2020-06-02 中国科学院动物研究所 Aging marker of multiple tissues, organs and cell types and application of calorie limitation in delaying aging of organism
CN113380327A (en) * 2021-03-15 2021-09-10 浙江大学 Human biological age prediction and human aging degree evaluation method based on whole peripheral blood transcriptome
CN113604474A (en) * 2021-08-30 2021-11-05 秦绪军 Application of GPx8 as molecular target in preparation of anti-aging drugs
CN113838531A (en) * 2021-09-19 2021-12-24 复旦大学 Method for evaluating cell senescence degree based on transcriptome data and machine learning strategy
CN114214363A (en) * 2021-12-03 2022-03-22 浙江大学 Anti-mesenchymal stem cell aging modification method and application thereof
CN114450750A (en) * 2019-05-17 2022-05-06 英科智能有限公司 Deep proteomic markers of human biological aging and method for determining biological aging clock
CN114958953A (en) * 2022-05-31 2022-08-30 云南贝泰妮生物科技集团股份有限公司 Screening method for anti-aging efficacy of in-vitro 3D whole skin model of cosmetic raw materials
CN115966315A (en) * 2023-01-10 2023-04-14 中国中医科学院医学实验中心 Method, equipment and storage medium for predicting anti-aging medicine
CN116344055A (en) * 2023-04-10 2023-06-27 重庆医科大学 Heart failure risk prediction and neural network model construction method
CN116555163A (en) * 2023-05-23 2023-08-08 南开大学 Anti-aging strategy based on epigenetic target
CN116884476A (en) * 2023-07-18 2023-10-13 平安科技(深圳)有限公司 Prediction method, device, equipment and medium of anti-aging drug combination
CN117253543A (en) * 2023-10-20 2023-12-19 广东丸美生物技术股份有限公司 Skin epidermal cell anti-aging gene library and construction method and application thereof
CN117482008A (en) * 2023-11-06 2024-02-02 昆明理工大学 Organic juniper berry essential oil with anti-aging activity and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Aging impairs beige adipocyte differentiation of mesenchymal stem cells via the reduced expression of Sirtuin 1; Vuong Cat Khanh et al.; Biochemical and Biophysical Research Communications; 2018-07-31; Vol. 500, No. 3; pp. 682-690 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant