CN109801676B

CN109801676B - Method and device for evaluating activation effect of compound on gene pathway

Info

Publication number: CN109801676B
Application number: CN201910142574.5A
Authority: CN
Inventors: 戴蝉; 李瑛颖; 管峥
Original assignee: Beijing Deep Intelligent Pharma Technology Co ltd
Current assignee: Beijing Deep Intelligent Pharma Technology Co ltd
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2021-01-01
Anticipated expiration: 2039-02-26
Also published as: CN109801676A

Abstract

The present application discloses a method for evaluating the activation of a gene pathway by a compound, comprising: obtaining transcriptome data of a control group and transcriptome data of a compound research group; determining differential expression fold-change data from the transcriptome data of the two groups; clustering related genes so that co-expressed genes fall into the same group, thereby obtaining a plurality of gene co-expression units; acquiring a gene pathway and assigning a weight coefficient to each gene in the pathway according to its promoting, inhibiting, phosphorylating or dephosphorylating role, so as to determine a gene pathway topological coefficient matrix; and determining, from the fold-change data, the gene co-expression units and the gene pathway topological coefficient matrix, a score that evaluates the activation of the gene pathway by the compound.

Description

Method and device for evaluating activation effect of compound on gene pathway

Technical Field

The application relates to the technical field of biological information, in particular to a method and a device for evaluating activation of a compound on a gene pathway.

Background

Over the past few decades, with the advent of genetic engineering, substantial research effort and funding have been invested in genomics and gene-based personalized medicine. With the wide application of deep learning and machine learning algorithms, large-scale transcriptome data can now be used effectively, and traditional disease classification, personalized medicine and prognostic models have been greatly improved.

However, these classical clinical applications are still limited by several well-recognized challenges. First, one of the most relevant challenges in transcriptome data analysis is the inherent complexity of gene network interactions, which remains a significant obstacle to building comprehensive predictive models from transcriptome data. Furthermore, the high diversity of experimental platforms, the difficulty of interpreting the values obtained, and inconsistencies between data from different types of equipment may lead to incorrect interpretation of the underlying biological processes.

Despite these challenges, various transcriptome data analysis algorithms have developed rapidly in both academia and industry, and some have been tried in clinical applications, in particular to predict patient response to various cancer treatments by identifying genes differentially expressed between sample sets. Although such algorithms identify potential genetic biomarkers and expression signature patterns, they have difficulty capturing subtle sample-to-sample differences caused by dynamic interactions between genes at the level of the signaling network.

The IPANDA approach, developed in 2016, incorporates gene pathway information and thereby greatly reduces the dimensionality of biological data, but it does not accurately assess the role each gene plays in a gene pathway.

Disclosure of Invention

The embodiments of the application provide a method for evaluating the activation effect of a compound on a gene pathway that can accurately evaluate this effect while reducing the dimensionality of biological data.

In view of this, the present application provides, in a first aspect, a method for evaluating the effect of a compound on gene pathway activation, the method comprising:

obtaining transcriptome data of a control group and transcriptome data of a compound research group;

obtaining differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound research group;

clustering related genes, clustering co-expressed genes into the same group, and obtaining a plurality of gene co-expression units;

acquiring a gene pathway, and assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in it, to obtain a gene pathway topological coefficient matrix, wherein the roles include promotion, inhibition, phosphorylation and dephosphorylation;

determining a score of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topological coefficient matrix, wherein the score is used to evaluate the activation of the gene pathway by the compound.

Optionally, the assigning a weight coefficient to each gene in the gene pathway according to the role played by the gene in the gene pathway includes:

setting the weight coefficient of a gene that promotes a gene pathway to +1, and the weight coefficient of a gene that inhibits a gene pathway to -1;

setting the weight coefficient of a gene that acts by phosphorylation in a gene pathway to +2, and the weight coefficient of a gene that acts by dephosphorylation in a gene pathway to -2.

Optionally, the obtaining a gene pathway topological coefficient matrix includes:

calculating the topological coefficients of the genes on each gene pathway with the R packages KEGGgraph and RBGL according to the weight coefficients of the genes.

Optionally, the clustering the genes, and clustering the co-expressed genes into the same group to obtain a plurality of co-expression units of the genes includes:

performing a first clustering of the co-expressed genes and a second clustering of the first clustering result to obtain the gene co-expression units, wherein a density-based clustering method and/or a hierarchical clustering method is employed.

Optionally, the density-based clustering method includes DBSCAN and OPTICS;

the hierarchical clustering method includes BIRCH.

In a second aspect, the present application provides a device for assessing the activation of a gene pathway by a compound, the device comprising:

the transcriptome data acquisition module is used for acquiring transcriptome data of a control group and transcriptome data of a compound research group;

the differential expression fold-change data acquisition module is used for acquiring differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound research group;

the gene co-expression unit acquisition module is used for clustering related genes, clustering co-expressed genes into the same group and acquiring a plurality of gene co-expression units;

the gene pathway topological coefficient matrix acquisition module is used for acquiring a gene pathway and assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in it, to obtain a gene pathway topological coefficient matrix, wherein the roles include promotion, inhibition, phosphorylation and dephosphorylation;

the scoring module is used for determining a score of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topological coefficient matrix, wherein the score is used to evaluate the activation of the gene pathway by the compound.

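Optionally, the gene pathway topological coefficient matrix acquisition module is specifically configured to calculate the topological coefficients of the genes on each gene pathway with the R packages KEGGgraph and RBGL according to the weight coefficients of the genes.

Optionally, the gene co-expression unit acquisition module is specifically configured to perform a first clustering of the co-expressed genes and a second clustering of the first clustering result to obtain the gene co-expression units, using a density-based clustering method and/or a hierarchical clustering method.

Optionally, the density-based clustering method includes DBSCAN and OPTICS;

the hierarchical clustering method includes BIRCH.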

A third aspect of the present application provides an apparatus for evaluating the effect of a compound on gene pathway activation, the apparatus comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform the steps of the method for assessing the effect of a compound on gene pathway activation as described in the first aspect above, according to instructions in the program code.

A fourth aspect of the present application provides a computer-readable storage medium for storing program code for performing the method for assessing the effect of a compound on gene pathway activation as described in the first aspect above.

According to the technical scheme, the embodiment of the application has the following advantages:

The embodiments of the application provide a method for evaluating the activation effect of a compound on a gene pathway. First, transcriptome data of a control group and of a compound research group are obtained; next, differential expression fold-change data are determined from these data; related genes are clustered so that co-expressed genes fall into the same group, yielding a plurality of gene co-expression units; a gene pathway is acquired and each gene in it is assigned a weight coefficient according to its promoting, inhibiting, phosphorylating or dephosphorylating role, so as to determine a gene pathway topological coefficient matrix; finally, the score of the compound on each gene pathway is determined with the IPANDA method from the fold-change data, the gene co-expression units and the topological coefficient matrix, and this score evaluates the activation effect of the compound on the gene pathway. Because the promoting, inhibiting, phosphorylating and dephosphorylating effects of the genes are all taken into account when the topological coefficient matrix is determined, the effect of each gene on the pathway is evaluated accurately, so the scores derived from the matrix characterize the activation effect of the compound on the gene pathway more accurately.

Drawings

To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a schematic flowchart of a method for evaluating the effect of a compound on gene pathway activation according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a device for evaluating the effect of a compound on gene pathway activation according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an apparatus for evaluating the effect of a compound on gene pathway activation according to an embodiment of the present application.

Detailed Description

To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.

The terms "first," "second," "third," "fourth," and the like in the description, the claims and the drawings of the present application, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments described herein can operate in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In the prior art, when the IPANDA method is used to evaluate the activation effect of a compound, the effect of each gene on the gene pathway cannot be evaluated accurately, so the accuracy of the resulting activation evaluation is low.

To solve this technical problem, the embodiments of the present application provide a method for evaluating the activation of a gene pathway by a compound that evaluates this activation accurately while still reducing the dimensionality of biological data.

Specifically, in the method provided in the embodiments of the present application, transcriptome data of a control group and of a compound research group are first obtained; differential expression fold-change data are then calculated from them; related genes are clustered so that co-expressed genes fall into the same group, yielding a plurality of gene co-expression units; a gene pathway is acquired, each gene in it is assigned a weight coefficient according to its promoting, inhibiting, phosphorylating or dephosphorylating effect in the pathway, and a gene pathway topological coefficient matrix is determined from these weight coefficients; finally, the score of the compound on each gene pathway is determined from the fold-change data, the gene co-expression units and the topological coefficient matrix, and this score can be used to evaluate the activation effect of the compound on the gene pathway.

Because this method takes the promoting, inhibiting, phosphorylating and dephosphorylating effects of the genes into account when determining the gene pathway topological coefficient matrix, the role of each gene in the pathway is evaluated accurately. Consequently, the score of the compound on the gene pathway, determined from the fold-change data, the gene co-expression units and the topological coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.

The methods provided herein for evaluating the activation of a gene pathway by a compound are described in detail below by way of example:

Referring to FIG. 1, which is a schematic flowchart of a method for evaluating the effect of a compound on gene pathway activation according to an embodiment of the present application, the method comprises the following steps:

Step 101: obtain transcriptome data of the control group and transcriptome data of the compound research group.

The transcriptome data of the control group are transcriptome data not affected by the compound, while the transcriptome data of a compound research group are transcriptome data obtained under the action of a compound. Different compound research groups may correspond to different compound doses, different compound types and/or different administration times: experiments may use different compounds at the same dose, the same compound at different doses, or different compounds at different doses, and each experiment yields one corresponding set of transcriptome data. In addition, the dosing interval may be added as a further experimental variable; the conditions under which the transcriptome data are generated are not limited in any way.

In practical applications, the transcriptome data of one or more control groups and the transcriptome data of one or more compound research groups can be obtained according to actual requirements. Specifically, the transcriptome data of the control group and the transcriptome data of the compound study group may be obtained through an experiment, or the transcriptome data of the control group and the transcriptome data of the compound study group may be obtained from the transcriptome data set online or offline, and the implementation manner of obtaining the transcriptome data of the control group and the transcriptome data of the compound study group is not specifically limited herein.

In practice, the obtained transcriptome data of the control group and of the compound research group are shown in Table 1:

TABLE 1

Gene	Compound study group 1	Compound study group 2	Normal group 1	Normal group 2
TSPAN6	737.88411	789.4028003	734.65068	774.0787405
TNMD	0	0	0	0
DPM1	685.1781021	659.2014157	607.2866174	648.4525964
SCYL3	177.2012333	181.1893394	179.9709581	173.6596697
C1orf112	364.3984336	385.1411586	379.3234039	345.4718961
FGR	0	0	0	0
CFH	79.96773606	62.82444432	90.44694302	79.44006167
FUCA2	1105.917441	1074.389048	1091.823812	978.2212245
GCLC	3505.858247	3347.905533	3424.062843	3341.101198
NFYA	603.3929175	656.4699182	674.6603607	706.6470602
STPG1	146.304608	150.2323668	168.8958222	165.3461749
NIPAL3	245.3555538	295.9122377	275.955469	247.5574015

Wherein, the data of the compound research group 1 and the compound research group 2 are transcriptome data of the compound research group, and the data of the normal group 1 and the normal group 2 are transcriptome data of a corresponding control group.

Step 102: obtaining transcription difference expression fold data according to the transcriptome data of the control group and the transcriptome data of the compound research group.

After the transcriptome data of the control group and of the compound research group are obtained, they can be preliminarily processed, after which the differential expression fold-change data of the compound research group relative to the control group are calculated.

Many mature methods exist for calculating differential expression fold-change data. In a specific application, a suitable method can be chosen according to actual requirements to determine the fold-change data from the transcriptome data of the control group and of the compound research group; the calculation method is not specifically limited herein.
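The text leaves the preliminary processing open. One plausible step, assumed here rather than taken from the source, is to drop genes with no expression in any sample (TNMD and FGR in Table 1), since a fold change carries no information for them. The `min_total` cutoff is a hypothetical parameter; the values below are a subset of Table 1.

```python
# Subset of Table 1: compound study groups 1-2 followed by normal groups 1-2.
table1 = {
    "TSPAN6": [737.88411, 789.4028003, 734.65068, 774.0787405],
    "TNMD": [0, 0, 0, 0],
    "DPM1": [685.1781021, 659.2014157, 607.2866174, 648.4525964],
    "FGR": [0, 0, 0, 0],
    "CFH": [79.96773606, 62.82444432, 90.44694302, 79.44006167],
}


def drop_unexpressed(expr, min_total=0.0):
    """Keep only genes whose summed expression across all samples exceeds min_total.

    min_total is a hypothetical cutoff; 0.0 removes genes that are zero everywhere.
    """
    return {gene: values for gene, values in expr.items() if sum(values) > min_total}


filtered = drop_unexpressed(table1)
print(sorted(filtered))
```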
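Since the fold-change method is left open, the sketch below shows only the simplest choice, assumed for illustration: the log2 ratio of the compound-group mean to the control-group mean, with a pseudocount so that unexpressed genes do not cause division by zero. Model-based tools such as the R packages DESeq2 or edgeR are common alternatives; nothing here is the patent's own formula.

```python
import math

# Hypothetical layout: gene -> (compound-group values, control-group values),
# taken from Table 1 (compound study groups 1-2 vs. normal groups 1-2).
expr = {
    "TSPAN6": ([737.88411, 789.4028003], [734.65068, 774.0787405]),
    "GCLC": ([3505.858247, 3347.905533], [3424.062843, 3341.101198]),
    "NFYA": ([603.3929175, 656.4699182], [674.6603607, 706.6470602]),
}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 of (mean treated + pseudocount) / (mean control + pseudocount)."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


lfc = {gene: log2_fold_change(t, c) for gene, (t, c) in expr.items()}
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.3f}")
```

Positive values mean higher expression in the compound group; the log scale makes up- and down-regulation symmetric, which suits the signed scoring in step 105.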

Step 103: cluster related genes, grouping co-expressed genes together, to obtain a plurality of gene co-expression units.

Next, related genes in the transcriptome data of the control group and of the compound research group are clustered. Here, related genes are genes whose expression level changes to some extent under the action of the compound. Among the related genes, a series of genes that are regulated by the same expression factor and/or show clear synergistic expression are co-expressed genes; these are clustered into the same group to form a gene co-expression unit, and a plurality of gene co-expression units are obtained according to this synergy.
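The text does not say how co-expression is measured. A common choice, assumed here, is the Pearson correlation between genes' expression profiles across samples; gene pairs above a threshold are treated as co-expressed. The threshold of 0.8 and the gene names are hypothetical, and toy profiles are used because Table 1 has only four samples, too few for a reliable correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression information
    return cov / (sx * sy)


def coexpressed_pairs(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every pair whose correlation exceeds threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r > threshold:
                pairs.append((g1, g2, r))
    return pairs


# Hypothetical profiles over five samples.
profiles = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],  # rises with GENE_A
    "GENE_C": [5, 4, 3, 2, 1],   # anti-correlated with both
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```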

When clustering the co-expressed genes, a single clustering pass can be applied directly to obtain the gene co-expression units; alternatively, for a better clustering result, two passes can be used: a first clustering of the co-expressed genes, followed by a second clustering of the first clustering result to obtain the gene co-expression units.

It should be noted that a density-based clustering method and/or a hierarchical clustering method may be used to cluster the co-expressed genes. Specifically, the density-based clustering methods include DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure); the hierarchical clustering methods include BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
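To show what "density-based" means, the following is a minimal dependency-free DBSCAN, offered as an illustration rather than the implementation used in the source. Its two parameters, `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself), are the ones OPTICS makes the result less sensitive to. Library implementations exist as `sklearn.cluster.DBSCAN`, `sklearn.cluster.OPTICS` and `sklearn.cluster.Birch` in scikit-learn. Treating each gene's expression profile as a point is an assumption.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise.

    points: list of equal-length tuples (e.g. per-gene expression profiles).
    """
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(points, eps=0.5, min_pts=2))
```

The isolated point (10, 0) is labeled -1, illustrating how density-based methods set outlying genes aside instead of forcing them into a co-expression unit.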

When the co-expressed genes are clustered only once, any one of the above clustering methods can be used to obtain the gene co-expression units. When they are clustered several times, either a single clustering method can be applied repeatedly, or several methods can be combined; for example, a density-based clustering method can be used for the first clustering and a hierarchical clustering method for the second clustering of the first result, yielding the gene co-expression units.

The process of generating gene co-expression units is described below, taking two-pass clustering of the co-expressed genes as an example:

Specifically, the density-based clustering method OPTICS can be used for the first clustering of the co-expressed genes. OPTICS does not require the two parameters of neighborhood radius and minimum number of points in the neighborhood to be set manually, and its clustering result is less sensitive to these two parameters. After the first clustering result is obtained, the similarity between the genes in it is determined, and only genes whose similarity exceeds a preset threshold are retained for the second clustering; for example, the threshold may be 0.3, 0.4, 0.5, 0.6 or 0.7, and is preferably 0.5.

Then, the hierarchical clustering method BIRCH is used for the second clustering of the first clustering result to generate the gene co-expression units. BIRCH is suited to large-scale data sets, clusters large data efficiently, and can operate within any given memory budget.

By combining the two clustering methods effectively, accurate gene co-expression units can be obtained in a short time with modest computing resources.
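The screening criterion between the two passes is not spelled out. The sketch below assumes one reading: within each first-pass cluster, keep a gene only if its mean Pearson correlation with the other members exceeds the threshold (0.5, the preferred value in the text). The gene names and profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_cluster(profiles, members, threshold=0.5):
    """Keep members whose mean correlation with the other members exceeds threshold.

    This similarity rule is an assumption; the source only says genes with
    similarity above a preset threshold are retained.
    """
    kept = []
    for g in members:
        others = [h for h in members if h != g]
        if not others:
            continue  # a singleton has no partner to be similar to
        mean_r = sum(pearson(profiles[g], profiles[h]) for h in others) / len(others)
        if mean_r > threshold:
            kept.append(g)
    return kept


# Hypothetical first-pass cluster: G3 does not follow the others.
profiles = {
    "G1": [1, 2, 3, 4],
    "G2": [1.1, 2.1, 2.9, 4.2],
    "G3": [4, 1, 3, 2],
    "G4": [2, 4, 6, 8],
}
print(filter_cluster(profiles, ["G1", "G2", "G3", "G4"]))
```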
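BIRCH's memory efficiency comes from summarizing each sub-cluster as a clustering feature (N, LS, SS): the point count, the linear sum of the vectors and the sum of their squared norms. These summaries add element-wise when sub-clusters merge, and give the centroid and radius without storing the points. The sketch below is a simplified flat pass with no CF-tree and no global clustering step, so it is not full BIRCH; `sklearn.cluster.Birch` (parameters `threshold`, `branching_factor`, `n_clusters`) is a complete implementation. The toy points stand in for the genes retained after the first pass.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature: summarises a sub-cluster in O(d) memory."""
    n: int        # number of member vectors
    ls: list      # linear sum of the member vectors
    ss: float     # sum of squared norms of the member vectors

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(v * v for v in p))

    def merged(self, other):
        # CF additivity: the CF of a union is the element-wise sum of the CFs.
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        c = self.centroid()
        # Root-mean-square distance to the centroid: sqrt(SS/N - ||centroid||^2).
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def birch_leaf_pass(points, threshold):
    """Absorb each point into the nearest sub-cluster if the merged radius stays
    within threshold; otherwise open a new sub-cluster."""
    entries = []
    for p in points:
        cf = ClusteringFeature.from_point(p)
        if entries:
            nearest = min(range(len(entries)),
                          key=lambda i: math.dist(entries[i].centroid(), p))
            candidate = entries[nearest].merged(cf)
            if candidate.radius() <= threshold:
                entries[nearest] = candidate
                continue
        entries.append(cf)
    return entries


entries = birch_leaf_pass([(0, 0), (0, 1), (10, 10), (10, 11)], threshold=1.0)
print([(e.n, e.centroid()) for e in entries])
```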

Step 104: acquire a gene pathway and assign a weight coefficient to each gene in it according to the role the gene plays, to obtain a gene pathway topological coefficient matrix; the roles include promotion, inhibition, phosphorylation and dephosphorylation.

Gene pathways are obtained from a gene pathway database such as KEGG (Kyoto Encyclopedia of Genes and Genomes), and each gene in a pathway is assigned a weight coefficient according to its role in that pathway; then, from these weight coefficients, the topological coefficients of the genes on each gene pathway are calculated to determine the gene pathway topological coefficient matrix.

When assigning a weight coefficient to each gene, the roles of the gene in the gene pathway that are mainly considered are promotion, inhibition, phosphorylation and dephosphorylation.

Specifically, if a gene promotes a gene pathway, its weight coefficient may be set to +1; if a gene inhibits a gene pathway, its weight coefficient may be set to -1. Because the addition or removal of a phosphate group acts as a biological "switch" for many reactions, phosphorylation and dephosphorylation are weighted more heavily: if a gene acts by phosphorylation in a gene pathway, its weight coefficient may be set to +2, and if a gene acts by dephosphorylation in a gene pathway, its weight coefficient may be set to -2.

It should be understood that in practice, the promoting, inhibiting, phosphorylating and dephosphorylating effects of genes on a gene pathway can be weighed according to actual requirements, and the weight coefficients may be set to other values commonly used in the art; the specific values are not limited.

After weight coefficients have been assigned to the genes according to their roles in the gene pathway, the topological coefficients of the genes on each gene pathway can be calculated from these weight coefficients with the R packages KEGGgraph and RBGL, and the calculated topological coefficients then form the gene pathway topological coefficient matrix.
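KEGG distributes pathway maps in its KGML (KEGG Markup Language) XML format, which can be retrieved through the KEGG REST API's `get` operation with the `kgml` option and is the format the R package KEGGgraph reads. As a hedged illustration in Python, the sketch extracts gene-to-gene relations and their subtype names from a KGML document with the standard library. The embedded pathway and its gene IDs are hypothetical, and real KGML files carry more elements (such as `graphics`) that are ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy pathway in KGML form.
TOY_KGML = """<pathway name="path:toy" org="hsa" number="00000" title="Toy pathway">
  <entry id="1" name="hsa:1001" type="gene"/>
  <entry id="2" name="hsa:1002" type="gene"/>
  <entry id="3" name="hsa:1003 hsa:1004" type="gene"/>
  <entry id="4" name="cpd:C00001" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="2" entry2="3" type="PPrel">
    <subtype name="inhibition" value="--|"/>
    <subtype name="phosphorylation" value="+p"/>
  </relation>
</pathway>"""


def parse_relations(kgml_text):
    """Return (source genes, target genes, subtype names) for gene-gene relations."""
    root = ET.fromstring(kgml_text)
    # One KGML entry may list several gene IDs separated by spaces.
    genes = {e.get("id"): e.get("name").split()
             for e in root.iter("entry") if e.get("type") == "gene"}
    relations = []
    for rel in root.iter("relation"):
        src, dst = rel.get("entry1"), rel.get("entry2")
        if src in genes and dst in genes:
            subtypes = [s.get("name") for s in rel.iter("subtype")]
            relations.append((genes[src], genes[dst], subtypes))
    return relations


for relation in parse_relations(TOY_KGML):
    print(relation)
```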
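The weighting rule above maps directly to a lookup table. The sketch encodes the +1/-1/+2/-2 values from the text. Two details are assumptions: treating KGML's "activation" subtype as "promotion", and resolving a gene with several roles by taking the weight with the largest magnitude, so the phosphorylation "switch" dominates. Unrecognized roles fall back to a hypothetical default of 0.

```python
# Weight coefficients from the text.
ROLE_WEIGHTS = {
    "promotion": +1,
    "inhibition": -1,
    "phosphorylation": +2,
    "dephosphorylation": -2,
}

# Assumed alias: KGML names promotion "activation"; the other three names match KGML.
ALIASES = {"activation": "promotion"}


def gene_weight(roles, default=0):
    """Weight coefficient for a gene given the role(s) it plays in a pathway.

    Hypothetical tie-break for several roles: keep the weight with the largest
    absolute value.
    """
    weights = [ROLE_WEIGHTS[ALIASES.get(r, r)]
               for r in roles if ALIASES.get(r, r) in ROLE_WEIGHTS]
    if not weights:
        return default
    return max(weights, key=abs)


print(gene_weight(["inhibition", "phosphorylation"]))
```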
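The source computes topological coefficients with KEGGgraph and RBGL but does not state the formula, so the sketch below uses a hypothetical stand-in: each gene's weight coefficient times one plus the number of genes reachable downstream of it (found by breadth-first search, the kind of graph traversal RBGL provides). What it does show concretely is the matrix layout: one row per gene, one column per pathway, and 0 where a gene is absent from a pathway. Pathway and gene names are placeholders.

```python
from collections import deque


def downstream_count(edges, start):
    """Number of genes reachable from start along directed edges (breadth-first search)."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1


def topology_matrix(pathways):
    """pathways: {pathway: (gene_weights, edges)} -> (genes, pathway names, rows).

    Hypothetical coefficient: weight * (1 + number of downstream genes).
    """
    names = sorted(pathways)
    genes = sorted({g for weights, _ in pathways.values() for g in weights})
    rows = []
    for g in genes:
        row = []
        for p in names:
            weights, edges = pathways[p]
            row.append(weights[g] * (1 + downstream_count(edges, g)) if g in weights else 0)
        rows.append(row)
    return genes, names, rows


pathways = {
    "P1": ({"A": 1, "B": -1, "C": 2}, [("A", "B"), ("B", "C")]),
    "P2": ({"A": -2, "D": 1}, [("A", "D")]),
}
genes, names, rows = topology_matrix(pathways)
for gene, row in zip(genes, rows):
    print(gene, row)
```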

The resulting gene pathway topological coefficient matrix is shown in Table 2:

TABLE 2

It should be noted that in practice, steps 102, 103 and 104 need not be executed in the order described above: any of them may be executed first, or all three may be executed simultaneously. Their execution order is not specifically limited herein.

Step 105: determine a score of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topological coefficient matrix; the score is used to evaluate the activation of the gene pathway by the compound.

Once the differential expression fold-change data, the gene co-expression units and the gene pathway topological coefficient matrix have been obtained, the score of the compound on each gene pathway can be calculated from them with the IPANDA method; the score is used to evaluate the activation effect of the compound on the gene pathway.

The final scores of the compound on each gene pathway are shown in Table 3:
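The source names IPANDA for this step but does not reproduce its formula, so the following is a hypothetical aggregation, not the IPANDA calculation. It shows one way the three inputs can combine: each co-expression unit contributes once to a pathway's score, through its member with the largest absolute log2 fold change, weighted by that gene's topological coefficient, so strongly co-regulated genes are not counted repeatedly. Genes outside any multi-gene unit are assumed to be supplied as single-gene units.

```python
def pathway_scores(lfc, units, topo):
    """Hypothetical pathway score in which each co-expression unit contributes once.

    lfc:   {gene: log2 fold change}
    units: list of gene lists (co-expression units, singletons included)
    topo:  {pathway: {gene: topological coefficient}}
    """
    scores = {}
    for pathway, coeffs in topo.items():
        total = 0.0
        for unit in units:
            members = [g for g in unit if g in coeffs and g in lfc]
            if not members:
                continue  # this unit has no genes in the pathway
            # Representative gene: the member with the largest |log2 fold change|.
            rep = max(members, key=lambda g: abs(lfc[g]))
            total += coeffs[rep] * lfc[rep]
        scores[pathway] = total
    return scores


lfc = {"A": 1.0, "B": 0.5, "C": -2.0, "D": 0.25}
units = [["A", "B"], ["C"], ["D"]]
topo = {"P1": {"A": 3, "B": -2, "C": 2}, "P2": {"A": -4, "D": 1}}
print(pathway_scores(lfc, units, topo))
```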

TABLE 3

A positive value indicates that the compound enhances the corresponding gene pathway, a negative value indicates that the compound weakens it, and the larger the absolute value, the stronger the effect.

It will be appreciated that in practice, a score is determined for each compound research group from that group's transcriptome data; that is, the score reflects the activation of the gene pathway under the compound type, dose and/or duration of action used in that group.

In the method for evaluating the activation effect of a compound on a gene pathway described above, the promoting, inhibiting, phosphorylating and dephosphorylating effects of the genes in the gene pathway are all taken into account when determining the gene pathway topological coefficient matrix, so the role of each gene in the pathway is evaluated accurately. This in turn ensures that the scores subsequently determined from the differential expression fold-change data, the gene co-expression units and the topological coefficient matrix characterize the activation effect of the compound on the gene pathway more accurately.
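The reading rule above (sign gives direction, absolute value gives strength) can be applied mechanically. A small sketch, with hypothetical pathway names and scores, ranks pathways by the magnitude of their score and labels the direction of the compound's effect.

```python
def interpret(scores):
    """Rank pathways by |score| and label the direction of the compound's effect."""
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(p, s, "enhanced" if s > 0 else "weakened" if s < 0 else "no effect")
            for p, s in ranked]


for row in interpret({"P1": -1.0, "P2": -3.75, "P3": 0.5}):
    print(row)
```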

In view of the above-described methods for assessing the activation of a gene pathway by a compound, the present embodiments accordingly also provide devices for assessing the activation of a gene pathway by a compound.

Referring to FIG. 2, which is a schematic structural diagram of a device for evaluating activation of a gene pathway by a compound according to an embodiment of the present application, the device comprises:

a transcriptome data acquisition module 201, configured to acquire transcriptome data of a control group and transcriptome data of a compound research group;

a differential expression fold-change data obtaining module 202, configured to obtain differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound research group;

a gene co-expression unit obtaining module 203, configured to perform clustering processing on related genes, cluster co-expressed genes into the same group, and obtain multiple gene co-expression units;

a gene pathway topological coefficient matrix obtaining module 204, configured to obtain a gene pathway, and distribute a weight coefficient to each gene in the gene pathway according to a role of the gene in the gene pathway, so as to obtain a gene pathway topological coefficient matrix; the roles that the genes play in the gene pathway include: promotion, inhibition, phosphorylation and dephosphorylation;

a scoring module 205, configured to determine a scoring result of the compound on each gene pathway according to the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix; the scoring result is used to evaluate the activation of the gene pathway by the compound.
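The data flow through modules 202 to 205 can be sketched in Python as below. Everything here is an illustrative assumption, since this section does not state the formulas: fold change is taken as the log2 ratio of mean treated to mean control expression with a pseudocount, each clustered gene's fold change is replaced by the mean of its co-expression unit, and the pathway score is the topology-weighted sum of source-gene fold changes. The names `log2_fold_change`, `unit_smoothed_fc` and `pathway_score` are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_change(
    control: List[float], treated: List[float], pseudocount: float = 1.0
) -> float:
    """log2 ratio of mean treated to mean control expression for one gene.

    The pseudocount (an assumed default) avoids division by zero and
    log of zero for genes that are not expressed.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def unit_smoothed_fc(fc: Dict[str, float], units: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical use of co-expression units: replace each clustered gene's
    fold change by the mean fold change of its unit. Genes absent from
    `units` keep their own value. Every key of `units` must appear in `fc`."""
    members: Dict[int, List[float]] = {}
    for gene, unit in units.items():
        members.setdefault(unit, []).append(fc[gene])
    unit_mean = {unit: mean(values) for unit, values in members.items()}
    return {
        gene: unit_mean[units[gene]] if gene in units else value
        for gene, value in fc.items()
    }


def pathway_score(
    genes: List[str], matrix: List[List[float]], fc: Dict[str, float]
) -> float:
    """Hypothetical score: sum of M[i][j] * fc[genes[i]] over all pairs, so an
    up-regulated activator raises the score and an up-regulated inhibitor
    lowers it."""
    n = len(genes)
    return sum(matrix[i][j] * fc[genes[i]] for i in range(n) for j in range(n))


if __name__ == "__main__":
    genes = ["A", "B", "C"]
    control = {"A": [3.0, 3.0], "B": [7.0, 7.0], "C": [1.0, 1.0]}
    treated = {"A": [7.0, 7.0], "B": [3.0, 3.0], "C": [1.0, 1.0]}
    fc = {g: log2_fold_change(control[g], treated[g]) for g in genes}
    smoothed = unit_smoothed_fc(fc, {"A": 0, "B": 1, "C": 1})
    # Example topology: A acts positively on B and C, B inhibits C.
    matrix = [[0.0, 1.0, 1.0], [0.0, 0.0, -1.0], [0.0, 0.0, 0.0]]
    print(fc, smoothed, pathway_score(genes, matrix, smoothed))
```

With this toy data the raw fold changes are A = 1.0, B = -1.0 and C = 0.0. After unit smoothing, B and C both become -0.5, and the pathway score is 2 × 1.0 + (-1) × (-0.5) = 2.5. Averaging within a unit is one simple way to let co-expressed genes damp noise in individual measurements.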

Optionally, the gene pathway topological coefficient matrix obtaining module 204 is specifically configured to:

Optionally, the gene co-expression unit obtaining module 203 is specifically configured to:

Optionally, the density-based clustering method includes DBSCAN and OPTICS;

the hierarchical clustering method includes BIRCH.
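The density-based option can be illustrated with a minimal, pure-Python DBSCAN that groups genes whose expression profiles across samples are strongly positively correlated. The distance `1 - Pearson correlation`, the parameter values and the function names are assumptions for illustration; this section names the algorithms but not their settings. In practice, the scikit-learn `sklearn.cluster` module provides `DBSCAN`, `OPTICS` and `Birch` estimators that could be used instead.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def dbscan_genes(
    profiles: Dict[str, List[float]], eps: float, min_pts: int
) -> Dict[str, int]:
    """Minimal DBSCAN over gene expression profiles.

    Distance is 1 - Pearson correlation (an assumed choice), so positively
    co-expressed genes are close. Returns gene -> co-expression unit id;
    genes that are not density-reachable from any core gene get -1 (noise).
    """
    genes = list(profiles)

    def neighbors(g: str) -> List[str]:
        # For non-constant profiles the neighbourhood includes g itself,
        # matching the scikit-learn min_samples convention.
        return [h for h in genes if 1.0 - pearson(profiles[g], profiles[h]) <= eps]

    labels: Dict[str, int] = {}
    cluster = -1
    for g in genes:
        if g in labels:
            continue
        seeds = neighbors(g)
        if len(seeds) < min_pts:
            labels[g] = -1  # provisional noise; may later become a border gene
            continue
        cluster += 1  # g is a core gene: start a new unit
        labels[g] = cluster
        queue = [h for h in seeds if h != g]
        while queue:
            h = queue.pop()
            if labels.get(h) == -1:
                labels[h] = cluster  # former noise reached from a core gene
                continue
            if h in labels:
                continue
            labels[h] = cluster
            h_neighbors = neighbors(h)
            if len(h_neighbors) >= min_pts:
                queue.extend(h_neighbors)  # h is also core: keep expanding
    return labels
```

With profiles G1 = [1, 2, 3, 4], G2 = [2, 4, 6, 8], G3 = [1, 2, 3, 5], G4 = [4, 3, 2, 1], G5 = [8, 6, 4, 2] and G6 = [1, 3, 2, 1], `eps=0.1` and `min_pts=2`, genes G1 to G3 form unit 0, G4 and G5 form unit 1, and G6 is labeled noise (-1). Because the distance is signed, anti-correlated genes end up in separate units; using `1 - |r|` instead would merge them.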

In the device for evaluating the activation effect of a compound on a gene pathway described above, the promotion, inhibition, phosphorylation and dephosphorylation roles of each gene in the gene pathway are all taken into account when the gene pathway topological coefficient matrix is determined. As a result, the role each gene plays in the pathway is evaluated accurately. This in turn ensures that the scoring result of the compound on the gene pathway, which is subsequently determined from the transcription differential expression fold data, the gene co-expression units and the gene pathway topological coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.

The present application also provides equipment for evaluating the activation effect of a compound on a gene pathway; the equipment may be a server or a terminal device. The equipment is described below taking a server as an example.

Referring to fig. 3, which is a schematic structural diagram of a server 300 for evaluating the activation effect of a compound on a gene pathway according to an embodiment of the present application: the server 300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing applications 342 or data 344. The memory 332 and the storage media 330 may provide transient or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations stored in the storage medium 330.

The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 3.

The CPU 322 is configured to execute the following steps:

Alternatively, the CPU 322 may perform the method steps of any specific implementation of the method for evaluating the activation of a gene pathway by a compound shown in FIG. 2.

The embodiments of the present application also provide a computer-readable storage medium storing program code for performing any implementation of the method for evaluating the activation effect of a compound on a gene pathway described in the foregoing embodiments.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in practice. A plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for evaluating the activation of a gene pathway by a compound, comprising:

clustering related genes so that co-expressed genes are grouped into the same group, thereby obtaining a plurality of gene co-expression units; a related gene is a gene whose expression level can change to a certain extent under the action of a compound;

2. The method of claim 1, wherein assigning a weight coefficient to each gene in a gene pathway based on the role that the gene plays in the gene pathway comprises:

3. The method of claim 1 or 2, wherein the obtaining a gene pathway topological coefficient matrix comprises:

4. The method of claim 1, wherein clustering the related genes so that co-expressed genes are grouped into the same group to obtain a plurality of gene co-expression units comprises:

5. The method of claim 1, wherein clustering the related genes so that co-expressed genes are grouped into the same group to obtain a plurality of gene co-expression units comprises:

6. The method of claim 5, wherein the density-based clustering method comprises DBSCAN and OPTICS;

the hierarchical clustering method comprises BIRCH.

7. A device for evaluating the activation of a gene pathway by a compound, the device comprising:

the gene co-expression unit acquisition module, configured to cluster related genes so that co-expressed genes are grouped into the same group, thereby obtaining a plurality of gene co-expression units; a related gene is a gene whose expression level can change to a certain extent under the action of a compound;

8. The apparatus according to claim 7, wherein the gene pathway topological coefficient matrix obtaining module is specifically configured to:

9. An apparatus for evaluating the activation of a gene pathway by a compound, the apparatus comprising a processor and a memory:

the processor is configured to perform, according to instructions in the program code, the steps of the method for evaluating the activation of a gene pathway by a compound according to any one of claims 1 to 6.

10. A computer-readable storage medium for storing program code for performing the method for assessing the effect of a compound on the activation of a gene pathway according to any one of claims 1 to 6.