CN109801676B - Method and device for evaluating activation effect of compound on gene pathway - Google Patents

Info

Publication number
CN109801676B
Authority
CN
China
Prior art keywords
gene
compound
pathway
clustering
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910142574.5A
Other languages
Chinese (zh)
Other versions
CN109801676A (en
Inventor
戴蝉
李瑛颖
管峥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Deep Intelligent Pharma Technology Co ltd
Original Assignee
Beijing Deep Intelligent Pharma Technology Co ltd
Application filed by Beijing Deep Intelligent Pharma Technology Co ltd
Priority to CN201910142574.5A
Publication of CN109801676A
Application granted
Publication of CN109801676B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present application discloses a method for evaluating the activation of a gene pathway by a compound, comprising: obtaining transcriptome data of a control group and transcriptome data of a compound research group; determining transcription differential expression fold data from the transcriptome data of the control group and of the compound research group; clustering the related genes so that co-expressed genes fall into the same group, yielding a plurality of gene co-expression units; acquiring a gene pathway and assigning a weight coefficient to each gene in it according to the promotion, inhibition, phosphorylation or dephosphorylation role the gene plays in the pathway, so as to determine a gene pathway topological coefficient matrix; and determining a scoring result for evaluating the activation of the gene pathway by the compound from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix.

Description

Method and device for evaluating activation effect of compound on gene pathway
Technical Field
The application relates to the technical field of bioinformatics, and in particular to a method and a device for evaluating the activation of a gene pathway by a compound.
Background
Over the past few decades, with the advent of genetic engineering, much research and money has been invested in genomics and gene-based personalized medicine. With the wide application of deep learning and machine learning algorithms, large-scale transcriptome data is effectively applied, and the traditional disease classification, personalized medicine, prognosis models and the like are optimized to a great extent.
However, these classical clinical applications are still limited by several well-recognized challenges. First, one of the most relevant challenges in transcriptome data analysis is the inherent complexity of gene network interactions, which remains a significant obstacle to building comprehensive predictive models from transcriptome data. Furthermore, the high diversity of experimental platforms, the difficulty of interpreting the values obtained, and the inconsistency of data from different types of equipment may also lead to incorrect interpretation of the underlying biological processes.
Despite these challenges, various transcriptome data analysis algorithms have been developed rapidly in both academia and industry, and some have been tried in clinical applications, especially to predict patient response to various cancer treatments by identifying genes that are differentially expressed between sample sets. Although these algorithms have identified potential genetic biomarkers and expression signature patterns, they struggle to capture subtle sample-to-sample differences caused by dynamic interactions between genes at the level of the signaling network.
The IPANDA approach developed in 2016, which incorporates a gene pathway, greatly reduces the biological data dimension, but does not provide accurate assessment of the role a gene plays in the gene pathway.
Disclosure of Invention
The embodiment of the application provides a method for evaluating the activation effect of a compound on a gene pathway, which can accurately evaluate the activation effect of the compound on the gene pathway while reducing the dimension of biological data.
In view of this, the present application provides, in a first aspect, a method for evaluating the effect of a compound on gene pathway activation, the method comprising:
obtaining transcriptome data of a control group and transcriptome data of a compound research group;
obtaining transcription differential expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
clustering related genes, clustering co-expressed genes into the same group, and obtaining a plurality of gene co-expression units;
acquiring a gene pathway, and assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the pathway, to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
determining a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation of the gene pathway by the compound.
Optionally, assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the pathway includes:
setting the weight coefficient of a gene that promotes the gene pathway to +1; setting the weight coefficient of a gene that inhibits the gene pathway to -1;
setting the weight coefficient of a gene that has a phosphorylation effect on the gene pathway to +2; and setting the weight coefficient of a gene that has a dephosphorylation effect on the gene pathway to -2.
Optionally, obtaining the gene pathway topological coefficient matrix includes:
calculating the topological coefficient of each gene on each gene pathway from the genes' respective weight coefficients, using the R packages KEGGgraph and RBGL.
Optionally, clustering the related genes so that co-expressed genes fall into the same group, to obtain a plurality of gene co-expression units, includes:
performing a first clustering pass on the co-expressed genes, and performing a second clustering pass on the first clustering result to obtain the gene co-expression units.
Optionally, clustering the related genes so that co-expressed genes fall into the same group, to obtain a plurality of gene co-expression units, includes:
using a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering methods include: DBSCAN and OPTICS;
the hierarchical clustering methods include: BIRCH.
In a second aspect, the present application provides a device for assessing the activation of a gene pathway by a compound, the device comprising:
the transcriptome data acquisition module is used for acquiring transcriptome data of a control group and transcriptome data of a compound research group;
the transcription difference expression fold data acquisition module is used for acquiring transcription difference expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
the gene co-expression unit acquisition module is used for clustering related genes, clustering co-expressed genes into the same group and acquiring a plurality of gene co-expression units;
the gene pathway topological coefficient matrix acquisition module is used for acquiring a gene pathway, assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the pathway, and obtaining a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
the scoring module is used for determining a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation of the gene pathway by the compound.
Optionally, the gene pathway topological coefficient matrix acquisition module is specifically configured to:
set the weight coefficient of a gene that promotes the gene pathway to +1; set the weight coefficient of a gene that inhibits the gene pathway to -1;
set the weight coefficient of a gene that has a phosphorylation effect on the gene pathway to +2; and set the weight coefficient of a gene that has a dephosphorylation effect on the gene pathway to -2.
Optionally, the gene pathway topological coefficient matrix acquisition module is specifically configured to:
calculate the topological coefficient of each gene on each gene pathway from the genes' respective weight coefficients, using the R packages KEGGgraph and RBGL.
Optionally, the gene co-expression unit acquisition module is specifically configured to:
perform a first clustering pass on the co-expressed genes, and perform a second clustering pass on the first clustering result to obtain the gene co-expression units.
Optionally, the gene co-expression unit acquisition module is specifically configured to:
use a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering methods include: DBSCAN and OPTICS;
the hierarchical clustering methods include: BIRCH.
A third aspect of the present application provides an apparatus for evaluating the effect of a compound on gene pathway activation, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the steps of the method for assessing the effect of a compound on gene pathway activation as described in the first aspect above, according to instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code for performing the method for assessing the effect of a compound on gene pathway activation as described in the first aspect above.
According to the technical scheme, the embodiment of the application has the following advantages:
The embodiment of the application provides a method for evaluating the activation effect of a compound on a gene pathway. First, the transcriptome data of a control group and the transcriptome data of a compound research group are obtained. Then, transcription differential expression fold data is determined from the transcriptome data of the control group and of the compound research group; the related genes are clustered so that co-expressed genes fall into the same group, yielding a plurality of gene co-expression units; and a gene pathway is acquired and a weight coefficient is assigned to each gene in it according to the promotion, inhibition, phosphorylation or dephosphorylation role the gene plays in the pathway, so as to determine a gene pathway topological coefficient matrix. Finally, the scoring result of the compound on each gene pathway is determined with the IPANDA method from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; this scoring result evaluates the activation effect of the compound on the gene pathway. Because the promotion, inhibition, phosphorylation and dephosphorylation effects of the genes are all taken into account when the gene pathway topological coefficient matrix is determined, the role of each gene in the gene pathway is evaluated accurately, which in turn ensures that the scoring result subsequently determined from that matrix characterizes the activation effect of the compound on the gene pathway more accurately.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow diagram of a method for evaluating the effect of a compound on gene pathway activation provided in the examples herein;
FIG. 2 is a schematic diagram of the structure of an apparatus for evaluating the effect of a compound on the activation of a gene pathway provided in the examples of the present application;
FIG. 3 is a schematic diagram showing the structure of an apparatus for evaluating the effect of a compound on the activation of a gene pathway, provided in the examples of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the prior art, when the IPANDA method is used for evaluating the activation effect of a compound, the accuracy of the finally determined activation effect evaluation result is low because the effect of a gene on a gene pathway cannot be accurately evaluated.
In order to solve the technical problems of the prior art, the embodiments of the present application provide a method for evaluating the activation of a compound on a gene pathway, which can accurately evaluate the activation of the gene pathway by the compound while ensuring the reduction of the dimension of biological data.
Specifically, in the method for evaluating the activation effect of a compound on a gene pathway provided in the embodiment of the present application, the transcriptome data of a control group and the transcriptome data of a compound research group are obtained first. Transcription differential expression fold data is then calculated from the transcriptome data of the control group and of the compound research group. Next, the related genes are clustered so that co-expressed genes fall into the same group, yielding a plurality of gene co-expression units. Further, a gene pathway is acquired, a weight coefficient is assigned to each gene in the gene pathway according to the promotion, inhibition, phosphorylation or dephosphorylation effect of that gene in the pathway, and a gene pathway topological coefficient matrix is determined from the weight coefficients of the genes in the pathway. Finally, the scoring result of the compound on each gene pathway is determined from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result can be used to evaluate the activation effect of the compound on the gene pathways.
In this method, the promotion, inhibition, phosphorylation and dephosphorylation effects of the genes in the gene pathway are all taken into account when the gene pathway topological coefficient matrix is determined; that is, the role of each gene in the gene pathway is evaluated accurately at that stage. This in turn ensures that the scoring result of the compound on the gene pathway, determined from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.
The methods provided herein for evaluating the activation of a gene pathway by a compound are described in detail below by way of example:
Referring to FIG. 1, which is a schematic flow chart of a method for evaluating the effect of a compound on gene pathway activation provided in the examples of the present application, the method comprises the following steps:
Step 101: Obtain transcriptome data of the control group and transcriptome data of the compound research group.
The transcriptome data of the control group is transcriptome data not affected by the compound, and the transcriptome data of the compound research group is transcriptome data obtained under the action of a compound. The transcriptome data of different compound research groups may correspond to different compound doses, and/or different compound types, and/or different administration times. That is, experiments may use different types of compound at the same dose, the same type of compound at different doses, or different types of compound at different doses, and each experiment generates one corresponding set of transcriptome data. In addition, on top of these experimental conditions, the dosing interval may be added as a further variable when generating transcriptome data; the conditions under which the transcriptome data is generated are not limited here.
In practical applications, the transcriptome data of one or more control groups and the transcriptome data of one or more compound research groups can be obtained according to actual requirements. Specifically, the transcriptome data of the control group and the transcriptome data of the compound study group may be obtained through an experiment, or the transcriptome data of the control group and the transcriptome data of the compound study group may be obtained from the transcriptome data set online or offline, and the implementation manner of obtaining the transcriptome data of the control group and the transcriptome data of the compound study group is not specifically limited herein.
In practical applications, the obtained transcriptome data of the control group and the transcriptome data of the compound research group are specifically shown in table 1:
TABLE 1
Gene      Compound study group 1   Compound study group 2   Normal group 1   Normal group 2
TSPAN6    737.88411                789.4028003              734.65068        774.0787405
TNMD      0                        0                        0                0
DPM1      685.1781021              659.2014157              607.2866174      648.4525964
SCYL3     177.2012333              181.1893394              179.9709581      173.6596697
C1orf112  364.3984336              385.1411586              379.3234039      345.4718961
FGR       0                        0                        0                0
CFH       79.96773606              62.82444432              90.44694302      79.44006167
FUCA2     1105.917441              1074.389048              1091.823812      978.2212245
GCLC      3505.858247              3347.905533              3424.062843      3341.101198
NFYA      603.3929175              656.4699182              674.6603607      706.6470602
STPG1     146.304608               150.2323668              168.8958222      165.3461749
NIPAL3    245.3555538              295.9122377              275.955469       247.5574015
Wherein, the data of the compound research group 1 and the compound research group 2 are transcriptome data of the compound research group, and the data of the normal group 1 and the normal group 2 are transcriptome data of a corresponding control group.
Step 102: obtaining transcription difference expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group.
After the transcriptome data of the control group and of the compound research group are obtained, the data may first be preprocessed, and the transcription differential expression fold data of the compound research group relative to the control group is then calculated.
Many mature methods for calculating transcription differential expression fold data currently exist. In a specific application, a suitable method can be selected according to actual requirements, provided that the fold data can be determined from the transcriptome data of the control group and of the compound research group; the calculation method is not specifically limited here.
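As one possible illustration of this step, the sketch below computes a log2 fold change per gene from a few rows of Table 1. The text leaves the calculation method open, so the choice of log2 ratio of group means, the pseudocount of 1.0, and the names `TABLE_1` and `log2_fold_change` are assumptions made only for this example.

```python
import math

# Expression values copied from Table 1:
# gene -> (compound study groups 1 and 2, normal groups 1 and 2).
TABLE_1 = {
    "TSPAN6": ([737.88411, 789.4028003], [734.65068, 774.0787405]),
    "TNMD": ([0.0, 0.0], [0.0, 0.0]),
    "DPM1": ([685.1781021, 659.2014157], [607.2866174, 648.4525964]),
    "CFH": ([79.96773606, 62.82444432], [90.44694302, 79.44006167]),
}

PSEUDOCOUNT = 1.0  # assumption: avoids log(0) for genes with zero expression


def log2_fold_change(treated, control, pseudocount=PSEUDOCOUNT):
    """Return log2(mean(treated) + c) - log2(mean(control) + c)."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated + pseudocount) - math.log2(mean_control + pseudocount)


fold_changes = {gene: log2_fold_change(t, c) for gene, (t, c) in TABLE_1.items()}
for gene, lfc in fold_changes.items():
    print(f"{gene}: {lfc:+.3f}")
```

With this convention a positive value means higher expression in the compound research group than in the control group, and a gene that is unexpressed in both groups (such as TNMD) gets a fold change of zero.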
Step 103: Cluster the related genes so that co-expressed genes fall into the same group, obtaining a plurality of gene co-expression units.
Next, the related genes in the transcriptome data of the control group and of the compound research group are clustered, where the related genes are genes whose expression level changes to some extent under the action of the compound. Among the related genes, a series of genes that are controlled by the same expression factor and/or show clear coordinated expression are co-expressed genes; clustering these co-expressed genes into the same group yields a gene co-expression unit. A plurality of gene co-expression units is obtained in this way according to the coordinated expression.
When clustering the co-expressed genes, a single clustering pass may be applied directly to obtain the corresponding gene co-expression units. To obtain a better clustering result, two clustering passes may instead be performed: a first clustering pass on the co-expressed genes, followed by a second clustering pass on the first clustering result, to obtain the gene co-expression units.
It should be noted that a density-based clustering method and/or a hierarchical clustering method may be used to cluster the co-expressed genes. Specifically, the density-based clustering methods include DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure); the hierarchical clustering methods include BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
When the co-expressed genes are clustered only once, the co-expressed genes can be clustered by adopting any one of the clustering methods to obtain a gene co-expression unit; when the co-expressed genes are subjected to multiple clustering, multiple clustering can be performed by only adopting any one clustering method to obtain the gene co-expression unit, or the multiple clustering methods can be combined to cluster the co-expressed genes to obtain the gene co-expression unit, for example, the co-expressed genes are subjected to primary clustering by adopting a density-based clustering method, and then the primary clustering result is subjected to secondary clustering by adopting a hierarchical clustering method to obtain the gene co-expression unit.
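To make the single-pass case concrete, the following is a minimal pure-Python sketch of DBSCAN, one of the density-based methods named above, applied to gene expression profiles. The distance measure (1 minus Pearson correlation), the parameter values `eps=0.1` and `min_pts=2`, and the six profiles are all hypothetical choices for illustration; the text does not prescribe any of them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def dbscan(profiles, eps, min_pts):
    """Cluster genes with DBSCAN on the distance 1 - Pearson correlation.

    Returns {gene: cluster_id}; genes left as noise get the label -1.
    """
    genes = list(profiles)

    def neighbours(g):
        return [h for h in genes if 1.0 - pearson(profiles[g], profiles[h]) <= eps]

    labels = {}
    cluster_id = 0
    for gene in genes:
        if gene in labels:
            continue
        seeds = neighbours(gene)
        if len(seeds) < min_pts:
            labels[gene] = -1  # provisional noise; may become a border point later
            continue
        labels[gene] = cluster_id
        queue = [s for s in seeds if s != gene]
        while queue:
            other = queue.pop()
            if labels.get(other) == -1:
                labels[other] = cluster_id  # noise reachable from a core point is a border point
            if other in labels:
                continue
            labels[other] = cluster_id
            reach = neighbours(other)
            if len(reach) >= min_pts:
                queue.extend(reach)  # 'other' is a core point, so expand through it
        cluster_id += 1
    return labels


# Hypothetical expression profiles across four samples (illustrative values only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.1, 2.0, 3.2, 3.9],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [8.0, 6.1, 4.0, 2.0],
    "g6": [1.0, 3.0, 1.0, 3.0],
}
labels = dbscan(profiles, eps=0.1, min_pts=2)
print(labels)
```

Each non-negative label corresponds to one candidate gene co-expression unit; genes labelled -1 did not fall into any dense group.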
The following description will be made of the process of generating a co-expression unit of a gene by taking the twice clustering process of co-expressed genes as an example:
Specifically, the density-based clustering method OPTICS can be used for the first clustering pass on the co-expressed genes. OPTICS does not require the neighborhood radius and the minimum number of points in a neighborhood to be entered manually, and its clustering result is less sensitive to these two parameters. After the first clustering result is obtained, the similarity between the genes in that result is determined, and the genes whose similarity exceeds a preset threshold are selected for the second clustering pass. For example, only genes with a similarity above 0.3, 0.4, 0.5, 0.6 or 0.7 are retained; preferably, only genes with a similarity above 0.5 are retained.
Then, the hierarchical clustering method BIRCH is applied to the first clustering result as the second clustering pass to generate the gene co-expression units. BIRCH is suited to large data sets, clusters them efficiently, and can run within any given amount of memory.
Through the effective combination of the two clustering methods, accurate gene co-expression units can be obtained in a short time with a small amount of computing resources.
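The screening step between the two clustering passes can be sketched as follows. The text does not define how similarity is measured or aggregated, so this example assumes a precomputed pairwise similarity and keeps a gene when its mean similarity to the other members of its first-pass cluster exceeds the threshold; `filter_cluster` and all gene names and values are hypothetical.

```python
SIMILARITY_THRESHOLD = 0.5  # the preferred threshold value named in the text


def filter_cluster(members, similarity, threshold=SIMILARITY_THRESHOLD):
    """Keep genes whose mean similarity to the other cluster members exceeds the threshold."""
    kept = []
    for gene in members:
        others = [m for m in members if m != gene]
        if not others:
            continue  # a singleton has no partner to be co-expressed with
        mean_sim = sum(similarity[frozenset((gene, o))] for o in others) / len(others)
        if mean_sim > threshold:
            kept.append(gene)
    return kept


# Hypothetical first-pass cluster and pairwise similarities (illustrative values only).
first_pass_cluster = ["geneA", "geneB", "geneC"]
pairwise_similarity = {
    frozenset(("geneA", "geneB")): 0.9,
    frozenset(("geneA", "geneC")): 0.2,
    frozenset(("geneB", "geneC")): 0.4,
}

second_pass_input = filter_cluster(first_pass_cluster, pairwise_similarity)
print(second_pass_input)
```

Only the retained genes would then be handed to the second clustering pass.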
Step 104: Acquire a gene pathway, and assign a weight coefficient to each gene in the gene pathway according to the role the gene plays in the pathway, to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation.
Gene pathways are obtained from a gene pathway database such as KEGG (Kyoto Encyclopedia of Genes and Genomes), and a weight coefficient is assigned to each gene in a gene pathway according to the role that gene plays in it. The topological coefficient of each gene on each gene pathway is then calculated from the genes' weight coefficients, which determines the gene pathway topological coefficient matrix.
When assigning a weight coefficient to each gene, the roles in the gene pathway that are mainly considered are: promotion, inhibition, phosphorylation and dephosphorylation.
Specifically, if a gene promotes a gene pathway, the weight coefficient of that gene may be set to +1; if a gene inhibits a gene pathway, its weight coefficient may be set to -1. Because the addition or removal of a phosphate group acts as a biological "switch" for many reactions, phosphorylation and dephosphorylation are weighted more heavily: if a gene has a phosphorylation effect on a gene pathway, its weight coefficient may be set to +2; if a gene has a dephosphorylation effect on a gene pathway, its weight coefficient may be set to -2.
It should be understood that, in practical applications, the promotion effect, the inhibition effect, the phosphorylation effect and the dephosphorylation effect of the gene on the gene pathway can be considered according to actual requirements, and corresponding weight coefficients can be set for the genes, that is, the weight coefficients can be set to other values commonly used in the art accordingly, and the specific values of the set weight coefficients are not limited at all.
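The weight assignment described above amounts to a lookup from role to coefficient. The sketch below uses the +1, -1, +2 and -2 values from the text as defaults and lets the caller supply other values, as the text allows; the function name `assign_weights`, the gene names, and the alternative values are hypothetical.

```python
# Default weight coefficients as given in the text.
DEFAULT_WEIGHTS = {
    "promotion": 1,
    "inhibition": -1,
    "phosphorylation": 2,
    "dephosphorylation": -2,
}


def assign_weights(gene_roles, weights=DEFAULT_WEIGHTS):
    """Map {gene: role} to {gene: weight}; an unknown role raises ValueError."""
    assigned = {}
    for gene, role in gene_roles.items():
        if role not in weights:
            raise ValueError(f"no weight defined for role {role!r} of gene {gene}")
        assigned[gene] = weights[role]
    return assigned


# Hypothetical roles of four genes in one pathway (illustrative only).
roles = {
    "geneA": "promotion",
    "geneB": "inhibition",
    "geneC": "phosphorylation",
    "geneD": "dephosphorylation",
}
print(assign_weights(roles))

# Other weight values can be substituted; these numbers are purely illustrative.
custom = dict(DEFAULT_WEIGHTS, phosphorylation=1.5, dephosphorylation=-1.5)
print(assign_weights(roles, custom))
```

Raising an error for an unrecognized role, rather than silently assigning zero, is a design choice of this sketch and not something the text specifies.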
After the weight coefficients have been assigned to the genes in the gene pathway according to the roles they play, the topological coefficient of each gene on each gene pathway can be calculated from those weight coefficients using the R packages KEGGgraph and RBGL, and the calculated topological coefficients then form the gene pathway topological coefficient matrix.
The obtained gene pathway topological coefficient matrix is shown in table 2:
TABLE 2
[Table 2 appears only as images in the original publication: Figure BDA0001979010770000101 and Figure BDA0001979010770000111]
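The layout of such a matrix can be sketched as a genes-by-pathways table. The example below only assembles the matrix from per-pathway coefficients that are assumed to be already computed; it does not reproduce the KEGGgraph/RBGL calculation. Filling in 0.0 for a gene that is absent from a pathway is an assumption of this sketch, and all names and values are hypothetical.

```python
def build_coefficient_matrix(per_pathway_coeffs):
    """Return (genes, pathways, matrix) where matrix[i][j] is gene i's coefficient on pathway j."""
    pathways = sorted(per_pathway_coeffs)
    genes = sorted({g for coeffs in per_pathway_coeffs.values() for g in coeffs})
    # Assumption: a gene that does not appear on a pathway gets coefficient 0.0.
    matrix = [[per_pathway_coeffs[p].get(g, 0.0) for p in pathways] for g in genes]
    return genes, pathways, matrix


# Hypothetical per-pathway topological coefficients (illustrative values only).
hypothetical_coeffs = {
    "pathway_X": {"geneA": 1.0, "geneB": -1.0},
    "pathway_Y": {"geneB": 2.0, "geneC": -2.0},
}

genes, pathways, matrix = build_coefficient_matrix(hypothetical_coeffs)
for gene, row in zip(genes, matrix):
    print(gene, row)
```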
It should be noted that, in practical applications, steps 102, 103 and 104 need not be executed in the order described above. Any of the three steps may be executed first, or the three steps may be executed simultaneously; their execution order is not specifically limited here.
Step 105: Determine a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation of the gene pathway by the compound.
After the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix are obtained, the scoring result of the compound on each gene pathway can be calculated from them with the IPANDA method, and the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
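To show how the three inputs might combine, the sketch below computes a simplified pathway score. It is not the published IPANDA formula, which the text does not reproduce; it assumes only that each gene contributes its topological coefficient times its log fold change, and that the genes of one co-expression unit contribute a single averaged term. All names and numbers are hypothetical.

```python
def pathway_score(pathway_coeffs, fold_changes, coexpression_units):
    """Simplified score for one pathway (an illustrative assumption, not IPANDA itself).

    pathway_coeffs: {gene: topological coefficient} for the genes on the pathway
    fold_changes: {gene: log fold change}
    coexpression_units: list of gene lists; each unit contributes one averaged term
    """
    in_unit = set()
    score = 0.0
    for unit in coexpression_units:
        members = [g for g in unit if g in pathway_coeffs and g in fold_changes]
        if not members:
            continue
        in_unit.update(members)
        mean_coeff = sum(pathway_coeffs[g] for g in members) / len(members)
        mean_lfc = sum(fold_changes[g] for g in members) / len(members)
        score += mean_coeff * mean_lfc
    # Genes outside every unit contribute individually.
    for gene, coeff in pathway_coeffs.items():
        if gene not in in_unit and gene in fold_changes:
            score += coeff * fold_changes[gene]
    return score


# Hypothetical inputs (illustrative values only).
coeffs = {"g1": 1.0, "g2": 1.0, "g3": -1.0, "g4": 2.0}
lfc = {"g1": 1.0, "g2": 3.0, "g3": 0.5, "g4": -1.0}
units = [["g1", "g2"]]

score = pathway_score(coeffs, lfc, units)
print(score)
```

Averaging within a unit keeps a large group of co-expressed genes from dominating the score simply because it has many members.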
The scoring results finally determined for the compound on each gene pathway are shown in table 3:
TABLE 3
[Table 3 appears only as images in the original publication: Figure BDA0001979010770000112 and Figure BDA0001979010770000121]
Here, a positive value indicates that the compound strengthens the corresponding gene pathway, a negative value indicates that the compound weakens the corresponding gene pathway, and the larger the absolute value, the stronger the effect.
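This reading of the scores can be expressed as a short helper that labels each pathway by the sign of its score and ranks pathways by absolute value. The scores below are hypothetical placeholders, not values from Table 3, and the function name `interpret` is introduced only for this example.

```python
def interpret(score):
    """Translate the sign of a pathway score into the effect described in the text."""
    if score > 0:
        return "strengthening"
    if score < 0:
        return "weakening"
    return "no net effect"


# Hypothetical pathway scores (placeholders, not taken from Table 3).
hypothetical_scores = {"pathway_1": 2.4, "pathway_2": -3.1, "pathway_3": 0.6}

# Rank pathways from strongest to weakest effect by absolute score.
ranked = sorted(hypothetical_scores, key=lambda p: abs(hypothetical_scores[p]), reverse=True)
for name in ranked:
    value = hypothetical_scores[name]
    print(f"{name}: {value:+.1f} ({interpret(value)})")
```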
It will be appreciated that, in practice, a corresponding scoring result needs to be determined for each compound research group from the transcriptome data of that group; that is, the activation effect on the gene pathway is determined for the compound type, compound dose and/or compound action time used in that compound research group.
In the method for evaluating the activation effect of a compound on a gene pathway described above, the promotion, inhibition, phosphorylation and dephosphorylation roles of genes in the gene pathway are all taken into account when determining the gene pathway topological coefficient matrix; that is, the role each gene plays in the gene pathway is evaluated accurately in that process. This ensures that the scoring result of the compound on the gene pathway, subsequently determined from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.
In view of the above-described methods for assessing the activation of a gene pathway by a compound, the present embodiments accordingly also provide devices for assessing the activation of a gene pathway by a compound.
Referring to FIG. 2, which is a schematic structural diagram of a device for evaluating the activation effect of a compound on a gene pathway according to an embodiment of the present application, the device comprises:
a transcriptome data acquisition module 201, configured to acquire transcriptome data of a control group and transcriptome data of a compound research group;
a transcription differential expression fold data obtaining module 202, configured to obtain transcription differential expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
a gene co-expression unit obtaining module 203, configured to perform clustering processing on related genes, cluster co-expressed genes into the same group, and obtain multiple gene co-expression units;
a gene pathway topological coefficient matrix obtaining module 204, configured to obtain a gene pathway and assign a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway, so as to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
a scoring module 205, configured to determine a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units, and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
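The fold-data step handled by module 202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the computation prescribed by this application: the fold value is taken here as the log2 ratio of the group means, with a pseudocount to avoid division by zero. The function name `log2_fold_changes`, the `pseudocount` parameter and the gene names and values are hypothetical.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {gene: log2 fold change} of the compound group versus the control group.

    control / treated: {gene: [expression values across replicates]}
    Only genes measured in both groups are reported.
    """
    result = {}
    for gene in control.keys() & treated.keys():
        control_mean = sum(control[gene]) / len(control[gene])
        treated_mean = sum(treated[gene]) / len(treated[gene])
        # Pseudocount keeps the ratio defined when a mean is zero (sketch assumption).
        result[gene] = math.log2(
            (treated_mean + pseudocount) / (control_mean + pseudocount)
        )
    return result


# Hypothetical toy expression values.
control = {"GENE1": [3.0, 3.0], "GENE2": [7.0, 7.0]}
treated = {"GENE1": [7.0, 7.0], "GENE2": [3.0, 3.0]}
fold = log2_fold_changes(control, treated)
print(fold)
```

A positive value means higher expression in the compound research group than in the control group, and a negative value means lower expression.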
Optionally, the gene pathway topological coefficient matrix obtaining module 204 is specifically configured to:
set the weight coefficient of a gene that promotes the gene pathway to +1; set the weight coefficient of a gene that inhibits the gene pathway to -1;
set the weight coefficient of a gene that has a phosphorylation effect in the gene pathway to +2; set the weight coefficient of a gene that has a dephosphorylation effect in the gene pathway to -2.
Optionally, the gene pathway topological coefficient matrix obtaining module 204 is specifically configured to:
calculate the topological coefficient of each gene on each gene pathway from the corresponding weight coefficients of the genes, using the R packages KEGGgraph and RBGL.
Optionally, the gene co-expression unit obtaining module 203 is specifically configured to:
perform primary clustering on the co-expressed genes, and perform secondary clustering on the primary clustering result, to obtain the gene co-expression units.
Optionally, the gene co-expression unit obtaining module 203 is specifically configured to:
employ a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering method includes: DBSCAN and OPTICS;
the hierarchical clustering method includes: BIRCH.
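As an illustration of density-based clustering of co-expressed genes, the following is a minimal, didactic DBSCAN written in plain Python. It is a sketch only: the distance (1 minus the Pearson correlation of expression profiles), the `eps` and `min_pts` values, the function names and the toy profiles are all assumptions, and only a single clustering pass is shown, not the primary-plus-secondary clustering described above.

```python
import math


def correlation_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return 1.0  # flat profile: treated as uncorrelated in this sketch
    return 1.0 - cov / (sd_a * sd_b)


def dbscan(profiles, eps, min_pts):
    """Return {gene: cluster id}; genes labelled -1 are noise."""
    genes = list(profiles)

    def neighbors(g):
        # Includes g itself, as in the usual DBSCAN definition.
        return [h for h in genes
                if correlation_distance(profiles[g], profiles[h]) <= eps]

    labels = {}
    cluster = -1
    for g in genes:
        if g in labels:
            continue
        nbrs = neighbors(g)
        if len(nbrs) < min_pts:
            labels[g] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[g] = cluster
        queue = [h for h in nbrs if h != g]
        while queue:
            h = queue.pop()
            if labels.get(h) == -1:
                labels[h] = cluster  # noise point reached from a core point
            if h in labels:
                continue
            labels[h] = cluster
            h_nbrs = neighbors(h)
            if len(h_nbrs) >= min_pts:
                queue.extend(h_nbrs)  # h is a core point: keep expanding
    return labels


# Hypothetical expression profiles across four samples.
profiles = {
    "GENE1": [1, 2, 3, 4],
    "GENE2": [2, 4, 6, 8],
    "GENE3": [4, 3, 2, 1],
    "GENE4": [8, 6, 4, 2],
    "GENE5": [1, 3, 1, 3],
}
labels = dbscan(profiles, eps=0.1, min_pts=2)
print(labels)
```

Each non-negative label plays the role of one gene co-expression unit; in practice an established library implementation of DBSCAN, OPTICS or BIRCH would normally be preferred over a hand-written one.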
In the device for evaluating the activation effect of a compound on a gene pathway described above, the promotion, inhibition, phosphorylation and dephosphorylation roles of genes in the gene pathway are all taken into account when determining the gene pathway topological coefficient matrix; that is, the role each gene plays in the gene pathway is evaluated accurately in that process. This ensures that the scoring result of the compound on the gene pathway, subsequently determined from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.
The present application also provides equipment for evaluating the activation effect of a compound on a gene pathway, where the equipment may be a server or a terminal device; the equipment is described below taking a server as an example.
Referring to FIG. 3, which is a schematic structural diagram of a server 300 for evaluating the activation effect of a compound on a gene pathway according to an embodiment of the present application, the server 300 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing applications 342 or data 344. The memory 332 and the storage media 330 may be transient storage or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 3.
The CPU 322 is configured to execute the following steps:
obtaining transcriptome data of a control group and transcriptome data of a compound research group;
obtaining transcription differential expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
clustering related genes so that co-expressed genes are clustered into the same group, to obtain a plurality of gene co-expression units;
acquiring a gene pathway, and assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway, to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
determining a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
Optionally, the CPU 322 may also perform the method steps of any specific implementation of the method for evaluating the activation effect of a compound on a gene pathway shown in FIG. 2.
The present embodiments also provide a computer-readable storage medium storing program code for performing any one of the embodiments of a method for evaluating the effect of a compound on gene pathway activation as described in the various embodiments above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for evaluating the activation of a gene pathway by a compound, comprising:
obtaining transcriptome data of a control group and transcriptome data of a compound research group;
obtaining transcription differential expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
clustering related genes so that co-expressed genes are clustered into the same group, to obtain a plurality of gene co-expression units; the related genes are genes whose expression level changes to a certain extent under the action of the compound;
acquiring a gene pathway, and assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway, to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
determining a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
2. The method of claim 1, wherein assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway comprises:
setting the weight coefficient of a gene that promotes the gene pathway to +1; setting the weight coefficient of a gene that inhibits the gene pathway to -1;
setting the weight coefficient of a gene that has a phosphorylation effect in the gene pathway to +2; setting the weight coefficient of a gene that has a dephosphorylation effect in the gene pathway to -2.
3. The method of claim 1 or 2, wherein obtaining the gene pathway topological coefficient matrix comprises:
calculating the topological coefficient of each gene on each gene pathway from the corresponding weight coefficients of the genes, using the R packages KEGGgraph and RBGL.
4. The method of claim 1, wherein clustering the related genes so that co-expressed genes are clustered into the same group, to obtain a plurality of gene co-expression units, comprises:
performing primary clustering on the co-expressed genes, and performing secondary clustering on the primary clustering result, to obtain the gene co-expression units.
5. The method of claim 1, wherein clustering the related genes so that co-expressed genes are clustered into the same group, to obtain a plurality of gene co-expression units, comprises:
employing a density-based clustering method and/or a hierarchical clustering method.
6. The method of claim 5, wherein the density-based clustering method comprises: DBSCAN and OPTICS;
the hierarchical clustering method comprises: BIRCH.
7. A device for evaluating the activation of a gene pathway by a compound, the device comprising:
a transcriptome data acquisition module, configured to acquire transcriptome data of a control group and transcriptome data of a compound research group;
a transcription differential expression fold data acquisition module, configured to acquire transcription differential expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group;
a gene co-expression unit acquisition module, configured to cluster related genes so that co-expressed genes are clustered into the same group, to obtain a plurality of gene co-expression units; the related genes are genes whose expression level changes to a certain extent under the action of the compound;
a gene pathway topological coefficient matrix acquisition module, configured to acquire a gene pathway and assign a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway, to obtain a gene pathway topological coefficient matrix; the roles that genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;
a scoring module, configured to determine a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
8. The device according to claim 7, wherein the gene pathway topological coefficient matrix acquisition module is specifically configured to:
set the weight coefficient of a gene that promotes the gene pathway to +1; set the weight coefficient of a gene that inhibits the gene pathway to -1;
set the weight coefficient of a gene that has a phosphorylation effect in the gene pathway to +2; set the weight coefficient of a gene that has a dephosphorylation effect in the gene pathway to -2.
9. An apparatus for evaluating the activation of a gene pathway by a compound, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is adapted to perform the steps of the method for assessing the effect of a compound on the activation of a gene pathway according to any one of claims 1 to 6, according to instructions in the program code.
10. A computer-readable storage medium for storing program code for performing the method for assessing the effect of a compound on the activation of a gene pathway according to any one of claims 1 to 6.
CN201910142574.5A 2019-02-26 2019-02-26 Method and device for evaluating activation effect of compound on gene pathway Active CN109801676B (en)


Publications (2)

Publication Number Publication Date
CN109801676A CN109801676A (en) 2019-05-24
CN109801676B true CN109801676B (en) 2021-01-01

Family

ID=66561331


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444248B (en) * 2019-07-22 2021-09-24 山东大学 Cancer biomolecule marker screening method and system based on network topology parameters

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101553492A (en) * 2006-08-31 2009-10-07 阵列生物制药公司 RAF inhibitor compounds and methods of use thereof
CN103608036A (en) * 2011-06-19 2014-02-26 瓦克西尼私人有限公司 Vaccine adjuvant composition comprising inulin particles
CN104968646A (en) * 2012-12-13 2015-10-07 葛兰素史密斯克莱有限责任公司 Enhancer of Zeste homolog 2 inhibitors
WO2019034576A1 (en) * 2017-08-18 2019-02-21 Koninklijke Philips N.V. Methods for sequencing biomolecules

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093119A (en) * 2013-01-24 2013-05-08 南京大学 Method for recognizing significant biologic pathway through utilization of network structural information
US10460830B2 (en) * 2013-08-22 2019-10-29 Genomoncology, Llc Computer-based systems and methods for analyzing genomes based on discrete data structures corresponding to genetic variants therein
US20170277826A1 (en) * 2016-03-27 2017-09-28 Insilico Medicine, Inc. System, method and software for robust transcriptomic data analysis
US11260078B2 (en) * 2017-07-25 2022-03-01 Insilico Medicine Ip Limited Method of treating senescence with multi-stage longevity therapeutics
CN108763864B (en) * 2018-05-04 2021-06-29 温州大学 Method for evaluating state of biological pathway sample
CN108753915A (en) * 2018-05-12 2018-11-06 内蒙古农业大学 The assay method of millet enzymatic activity



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant