CN109801676A - Method and device for evaluating the activating effect of a compound on a gene pathway - Google Patents

Method and device for evaluating the activating effect of a compound on a gene pathway

Info

Publication number
CN109801676A
Authority
CN
China
Legal status
Granted
Application number
CN201910142574.5A
Other languages
Chinese (zh)
Other versions
CN109801676B (en)
Inventor
戴蝉
李瑛颖
管峥
Current Assignee
Beijing Deep System Yao Technology Co Ltd
Original Assignee
Beijing Deep System Yao Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Deep System Yao Technology Co Ltd
Priority to CN201910142574.5A
Publication of CN109801676A
Application granted
Publication of CN109801676B
Legal status: Active
Anticipated expiration

Abstract

This application discloses a method for evaluating the activating effect of a compound on a gene pathway, comprising: obtaining transcriptome data of a control group and transcriptome data of a compound study group; determining differential expression fold-change data from the transcriptome data of the control group and of the compound study group; clustering the related genes so that co-expressed genes fall into the same group, yielding multiple gene co-expression units; obtaining gene pathways, assigning each gene in a gene pathway a weight coefficient according to the role it plays in that pathway (promotion, inhibition, phosphorylation, or dephosphorylation), and from these determining a gene pathway topology coefficient matrix; and determining, from the fold-change data, the gene co-expression units, and the gene pathway topology coefficient matrix, a score used to evaluate the activating effect of the compound on the gene pathway.

Description

Method and device for evaluating the activating effect of a compound on a gene pathway
Technical field
This application relates to the technical field of bioinformatics, and in particular to a method and device for evaluating the activating effect of a compound on a gene pathway.
Background
Over the past few decades, with the emergence of genetic engineering, a great deal of research and funding has been devoted to genomics and to gene-based personalized medicine. With the widespread adoption of deep learning and machine learning algorithms, large-scale transcriptome data have been put to effective use, substantially improving areas such as traditional disease classification and personalized medicine.
However, these classic clinical applications are still limited by several recognized challenges. First, one of the most relevant challenges in transcriptome data analysis is the inherent complexity of gene network interactions, which remains the main obstacle to building comprehensive models from transcriptome data. In addition, the great diversity of experimental platforms, the difficulty of interpreting the values obtained, and the inconsistency of data from different types of equipment can also lead to misinterpretation of the underlying biological processes.
Despite these challenges, transcriptome data analysis algorithms are developing rapidly in both academia and industry. Some have already been tried in clinical settings, in particular for predicting patient response to various cancer treatments. These methods identify genes that are differentially expressed between sample groups and use them to predict treatment response. Although they can identify potential genetic biomarkers and expression signatures, they struggle to capture the subtle differences between samples that arise from dynamic interactions between genes at the signaling network level.
The IPANDA method, developed in 2016, incorporates gene pathways and greatly reduces the dimensionality of the biological data, but its assessment of the role each gene plays in a gene pathway is not accurate enough.
Summary of the invention
The embodiments of this application provide a method for evaluating the activating effect of a compound on a gene pathway, which reduces the dimensionality of the biological data while accurately evaluating the compound's activating effect on the gene pathway.
In view of this, a first aspect of this application provides a method for evaluating the activating effect of a compound on a gene pathway, the method comprising:
Obtaining transcriptome data of a control group and transcriptome data of a compound study group;
Obtaining differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound study group;
Clustering the related genes so that co-expressed genes are clustered into the same group, to obtain multiple gene co-expression units;
Obtaining gene pathways and, according to the role each gene plays in a gene pathway, assigning a weight coefficient to each gene in the gene pathway, to obtain a gene pathway topology coefficient matrix; the roles a gene may play in a gene pathway include promotion, inhibition, phosphorylation, and dephosphorylation;
Determining a score for the compound on each gene pathway from the fold-change data, the gene co-expression units, and the gene pathway topology coefficient matrix; the score is used to evaluate the activating effect of the compound on the gene pathway.
Optionally, assigning a weight coefficient to each gene in the gene pathway according to the role the gene plays in the gene pathway comprises:
Setting the weight coefficient of a gene that promotes the gene pathway to +1, and setting the weight coefficient of a gene that inhibits the gene pathway to -1;
Setting the weight coefficient of a gene that acts on the gene pathway by phosphorylation to +2, and setting the weight coefficient of a gene that acts on the gene pathway by dephosphorylation to -2.
Optionally, obtaining the gene pathway topology coefficient matrix comprises:
Calculating, from the weight coefficient of each gene, the topological coefficient of each gene on each gene pathway using the R packages KEGGgraph and RBGL.
Optionally, clustering the genes so that co-expressed genes are clustered into the same group, to obtain multiple gene co-expression units, comprises:
Performing a first clustering on the co-expressed genes, and performing a second clustering on the result of the first clustering, to obtain the gene co-expression units.
Optionally, clustering the genes so that co-expressed genes are clustered into the same group, to obtain multiple gene co-expression units, comprises:
Using a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering method includes DBSCAN and OPTICS;
The hierarchical clustering method includes BIRCH.
A second aspect of this application provides a device for evaluating the activating effect of a compound on a gene pathway, the device comprising:
A transcriptome data acquisition module, configured to obtain transcriptome data of a control group and transcriptome data of a compound study group;
A fold-change data acquisition module, configured to obtain differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound study group;
A gene co-expression unit acquisition module, configured to cluster the related genes so that co-expressed genes are clustered into the same group, to obtain multiple gene co-expression units;
A gene pathway topology coefficient matrix acquisition module, configured to obtain gene pathways and, according to the role each gene plays in a gene pathway, assign a weight coefficient to each gene in the gene pathway, to obtain a gene pathway topology coefficient matrix; the roles a gene may play in a gene pathway include promotion, inhibition, phosphorylation, and dephosphorylation;
A scoring module, configured to determine a score for the compound on each gene pathway from the fold-change data, the gene co-expression units, and the gene pathway topology coefficient matrix; the score is used to evaluate the activating effect of the compound on the gene pathway.
Optionally, the gene pathway topology coefficient matrix acquisition module is specifically configured to:
Set the weight coefficient of a gene that promotes the gene pathway to +1, and set the weight coefficient of a gene that inhibits the gene pathway to -1;
Set the weight coefficient of a gene that acts on the gene pathway by phosphorylation to +2, and set the weight coefficient of a gene that acts on the gene pathway by dephosphorylation to -2.
Optionally, the gene pathway topology coefficient matrix acquisition module is specifically configured to:
Calculate, from the weight coefficient of each gene, the topological coefficient of each gene on each gene pathway using the R packages KEGGgraph and RBGL.
Optionally, the gene co-expression unit acquisition module is specifically configured to:
Perform a first clustering on the co-expressed genes, and perform a second clustering on the result of the first clustering, to obtain the gene co-expression units.
Optionally, the gene co-expression unit acquisition module is specifically configured to:
Use a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering method includes DBSCAN and OPTICS;
The hierarchical clustering method includes BIRCH.
A third aspect of this application provides equipment for evaluating the activating effect of a compound on a gene pathway, the equipment comprising a processor and a memory:
The memory is configured to store program code and to transmit the program code to the processor;
The processor is configured to execute, according to instructions in the program code, the steps of the method for evaluating the activating effect of a compound on a gene pathway described in the first aspect above.
A fourth aspect of this application provides a computer-readable storage medium configured to store program code, the program code being used to execute the method for evaluating the activating effect of a compound on a gene pathway described in the first aspect above.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
The embodiments of this application provide a method for evaluating the activating effect of a compound on a gene pathway. In this method, transcriptome data of a control group and of a compound study group are obtained first. Differential expression fold-change data are then determined from the two sets of transcriptome data. The related genes are clustered so that co-expressed genes fall into the same group, yielding multiple gene co-expression units. Gene pathways are obtained, and each gene in a gene pathway is assigned a weight coefficient according to whether it acts on the pathway by promotion, inhibition, phosphorylation, or dephosphorylation; the gene pathway topology coefficient matrix is determined from these weights. Finally, the score of the compound on each gene pathway is determined with the IPANDA method from the fold-change data, the gene co-expression units, and the gene pathway topology coefficient matrix, and this score is used to evaluate the compound's activating effect on the gene pathway. Because promotion, inhibition, phosphorylation, and dephosphorylation are all taken into account when the topology coefficient matrix is determined, the role of each gene on the gene pathway is assessed accurately, so the scores derived from the matrix characterize the compound's activating effect on the gene pathway more accurately.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method provided by the embodiments of this application for evaluating the activating effect of a compound on a gene pathway;
Fig. 2 is a structural diagram of the device provided by the embodiments of this application for evaluating the activating effect of a compound on a gene pathway;
Fig. 3 is a structural diagram of the equipment provided by the embodiments of this application for evaluating the activating effect of a compound on a gene pathway.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The terms "first", "second", "third", "fourth", and the like (if present) in the description, the claims, and the drawings of this application are used to distinguish similar objects, not to describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion: a process, method, system, product, or equipment that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or equipment.
In the prior art, when the IPANDA method is used to assess the activating effect of a compound, the role each gene plays on a gene pathway often cannot be assessed accurately, so the accuracy of the final assessment is low.
To solve this problem, the embodiments of this application provide a method for evaluating the activating effect of a compound on a gene pathway. The method reduces the dimensionality of the biological data while accurately assessing the activating effect of the compound on the gene pathway.
Specifically, in the method provided by the embodiments of this application, transcriptome data of a control group and of a compound study group are obtained first. Differential expression fold-change data are then calculated from the two sets of transcriptome data. Next, the related genes are clustered so that co-expressed genes fall into the same group, yielding multiple gene co-expression units. Gene pathways are then obtained, and each gene in a gene pathway is assigned a weight coefficient according to whether it acts on the pathway by promotion, inhibition, phosphorylation, or dephosphorylation; the gene pathway topology coefficient matrix is determined from the weight coefficients of the genes in each pathway. Finally, the score of the compound on each gene pathway is determined from the fold-change data, the gene co-expression units, and the gene pathway topology coefficient matrix. This score can be used to evaluate the compound's activating effect on the gene pathway.
When the gene pathway topology coefficient matrix is determined, the method takes into account all four roles a gene may play in a gene pathway: promotion, inhibition, phosphorylation, and dephosphorylation. The role of each gene is therefore assessed accurately at this stage, so the scores later derived from the fold-change data, the gene co-expression units, and the topology coefficient matrix characterize the compound's activating effect on the gene pathway more accurately.
The method provided by this application for evaluating the activating effect of a compound on a gene pathway is described in detail below by way of an embodiment:
Referring to Fig. 1, which is a flow diagram of the method provided by the embodiments of this application, the method includes the following steps:
Step 101: Obtain the transcriptome data of the control group and the transcriptome data of the compound study group.
The transcriptome data of the control group are transcriptome data not exposed to the compound; the transcriptome data of a compound study group are transcriptome data exposed to the compound. Different compound study groups may correspond to different compound doses, different compound types, and/or different administration times. That is, experiments may use different compounds at the same dose, the same compound at different doses, or different compounds at different doses, and each experiment produces one corresponding set of transcriptome data. The administration interval may also be added as a variable on top of these conditions. No restriction is placed here on the conditions under which the transcriptome data are generated.
In practice, one or more sets of control group transcriptome data and, correspondingly, one or more sets of compound study group transcriptome data can be obtained as needed. The data can be obtained by experiment, or from online or offline transcriptome data sets. No specific restriction is placed here on how the transcriptome data of the control group and the compound study group are obtained.
In practice, the transcriptome data obtained for the control group and the compound study group take the form shown in Table 1:
Table 1
Gene       Compound study group 1   Compound study group 2   Normal group 1   Normal group 2
TSPAN6     737.88411                789.4028003              734.65068        774.0787405
TNMD       0                        0                        0                0
DPM1       685.1781021              659.2014157              607.2866174      648.4525964
SCYL3      177.2012333              181.1893394              179.9709581      173.6596697
C1orf112   364.3984336              385.1411586              379.3234039      345.4718961
FGR        0                        0                        0                0
CFH        79.96773606              62.82444432              90.44694302      79.44006167
FUCA2      1105.917441              1074.389048              1091.823812      978.2212245
GCLC       3505.858247              3347.905533              3424.062843      3341.101198
NFYA       603.3929175              656.4699182              674.6603607      706.6470602
STPG1      146.304608               150.2323668              168.8958222      165.3461749
NIPAL3     245.3555538              295.9122377              275.955469       247.5574015
Here, the two columns "Compound study group 1" and "Compound study group 2" are the transcriptome data of the compound study groups, and the two columns "Normal group 1" and "Normal group 2" are the transcriptome data of the corresponding control groups.
Step 102: Obtain differential expression fold-change data from the transcriptome data of the control group and the transcriptome data of the compound study group.
After the transcriptome data of the control group and the compound study group have been obtained, the data can first be preprocessed. The fold change in expression of the compound study group relative to the control group is then calculated.
Many mature methods for calculating differential expression fold change already exist. In a specific application, a suitable method can be chosen as needed to determine the fold-change data from the transcriptome data of the control group and the compound study group. No specific restriction is placed here on how the fold-change data are calculated.
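Since the text leaves the fold-change calculation open, the following is only a minimal sketch of one common choice: the log2 ratio of group means with a pseudocount. The pseudocount value and the function name are assumptions made for illustration; the expression values are four rows copied from Table 1.

```python
import math

# Four rows copied from Table 1:
# (compound study group 1, compound study group 2, normal group 1, normal group 2)
table1 = {
    "TSPAN6": (737.88411, 789.4028003, 734.65068, 774.0787405),
    "TNMD": (0.0, 0.0, 0.0, 0.0),
    "CFH": (79.96773606, 62.82444432, 90.44694302, 79.44006167),
    "NFYA": (603.3929175, 656.4699182, 674.6603607, 706.6470602),
}

PSEUDOCOUNT = 1.0  # assumed value; avoids log(0) for unexpressed genes such as TNMD


def log2_fold_change(treated, control, pseudocount=PSEUDOCOUNT):
    """Return log2(mean(treated) + c) - log2(mean(control) + c)."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated + pseudocount) - math.log2(mean_control + pseudocount)


# First two values are the study groups, last two are the control groups.
fold_changes = {
    gene: log2_fold_change(values[:2], values[2:])
    for gene, values in table1.items()
}

for gene, lfc in fold_changes.items():
    print(f"{gene}: {lfc:+.3f}")
```

A dedicated differential expression tool would normally also model variance across replicates; the sketch only shows the direction and size of the change per gene.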
Step 103: Cluster the related genes so that co-expressed genes are clustered into the same group, to obtain multiple gene co-expression units.
Next, the related genes in the transcriptome data of the control group and the compound study group are clustered. Related genes here are genes whose expression level changes substantially under the action of the compound. Among the related genes, those that are controlled by the same expression factor and/or that show significantly coordinated expression are clustered into the same group. These are the co-expressed genes, and clustering them yields the gene co-expression units. Multiple gene co-expression units are obtained according to this coordinated behavior.
When clustering the co-expressed genes, a single clustering pass can be applied directly to obtain the gene co-expression units. To improve the clustering result, two passes can be applied instead: a first clustering of the co-expressed genes, followed by a second clustering of the first result, to obtain the gene co-expression units.
Note that the co-expressed genes can be clustered with a density-based clustering method and/or a hierarchical clustering method. Density-based clustering methods include DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure); hierarchical clustering methods include BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
When only one clustering pass is applied, any one of the above methods can be used to cluster the co-expressed genes and obtain the gene co-expression units. When several passes are applied, a single method can be used repeatedly, or several of the above methods can be combined. For example, a density-based method can be used for the first clustering of the co-expressed genes, and a hierarchical method can then be used for the second clustering of the first result, to obtain the gene co-expression units.
The process of generating gene co-expression units is introduced below, taking two clustering passes as an example:
Specifically, the density-based method OPTICS can be used for the first clustering of the co-expressed genes. OPTICS does not require the neighborhood radius and the minimum number of neighborhood points to be entered manually, and the resulting clusters are relatively insensitive to these two parameters. After the first clustering result is obtained, the similarity between the genes in that result is determined, and the genes whose similarity exceeds a preset threshold are selected as the input to the second clustering. For example, only genes with a similarity above 0.3, 0.4, 0.5, 0.6, or 0.7 are retained; more specifically, only genes with a similarity above 0.5 are retained.
Then, the hierarchical method BIRCH is used for the second clustering of the first result, generating the gene co-expression units. BIRCH is suited to large data sets: it clusters large-scale data efficiently and can operate within any given amount of memory.
Combining these two clustering methods yields accurate gene co-expression units in a relatively short time and with modest computing resources.
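The filtering step between the two clustering passes can be sketched as follows. The text does not say how similarity between genes is measured, so this sketch assumes Pearson correlation between each gene's expression profile and the mean profile of its first-pass cluster; the function names and the toy profiles are hypothetical. Only the 0.5 threshold comes from the description above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_cluster(profiles, threshold=0.5):
    """Keep genes whose profile correlates with the cluster mean above the threshold."""
    genes = list(profiles)
    n_samples = len(profiles[genes[0]])
    mean_profile = [
        sum(profiles[g][i] for g in genes) / len(genes) for i in range(n_samples)
    ]
    return [g for g in genes if pearson(profiles[g], mean_profile) > threshold]


# Hypothetical first-pass cluster: expression of four genes across four samples.
first_pass_cluster = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [1.5, 2.5, 3.6, 4.4],
    "geneD": [4.0, 3.0, 2.0, 1.0],  # moves against the rest of the cluster
}

retained = filter_cluster(first_pass_cluster)
print(retained)
```

The retained genes would then be passed to the second clustering pass. The clustering passes themselves are not shown here; they would normally come from an existing library rather than hand-written code.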
Step 104: Obtain gene pathways and, according to the role each gene plays in a gene pathway, assign a weight coefficient to each gene in the gene pathway, to obtain the gene pathway topology coefficient matrix; the roles a gene may play in a gene pathway include promotion, inhibition, phosphorylation, and dephosphorylation.
Gene pathways are obtained from a gene pathway database such as KEGG (Kyoto Encyclopedia of Genes and Genomes), and each gene in a gene pathway is assigned a weight coefficient according to the role it plays in that pathway. The topological coefficient of each gene on each gene pathway is then calculated from the weight coefficients, and the gene pathway topology coefficient matrix is determined.
Note that when weight coefficients are assigned, the roles mainly considered are promotion, inhibition, phosphorylation, and dephosphorylation.
Specifically, if a gene promotes the gene pathway, its weight coefficient can be set to +1; if a gene inhibits the gene pathway, its weight coefficient can be set to -1. The addition or removal of a phosphate group acts as a biological "switch" for many reactions, that is, phosphorylation and dephosphorylation play a "switch" role in biology. Accordingly, if a gene acts on the gene pathway by phosphorylation, its weight coefficient can be set to +2; if a gene acts on the gene pathway by dephosphorylation, its weight coefficient can be set to -2.
In practice, the promotion, inhibition, phosphorylation, and dephosphorylation roles can be weighed according to actual needs, and the corresponding weight coefficients can be set to other values commonly used in the art. No restriction is placed here on the specific values of the weight coefficients.
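The role-to-weight assignment described above can be sketched as a small lookup with optional overrides. The default values (+1, -1, +2, -2) come from this embodiment; the enum, the function name, and the gene names are hypothetical placeholders.

```python
from enum import Enum


class Role(Enum):
    PROMOTION = "promotion"
    INHIBITION = "inhibition"
    PHOSPHORYLATION = "phosphorylation"
    DEPHOSPHORYLATION = "dephosphorylation"


# Default weights as given in this embodiment.
DEFAULT_WEIGHTS = {
    Role.PROMOTION: 1,
    Role.INHIBITION: -1,
    Role.PHOSPHORYLATION: 2,
    Role.DEPHOSPHORYLATION: -2,
}


def assign_weights(gene_roles, weights=None):
    """Map each gene to the weight of its role; `weights` may override the defaults."""
    table = {**DEFAULT_WEIGHTS, **(weights or {})}
    return {gene: table[role] for gene, role in gene_roles.items()}


# Hypothetical annotation of one pathway; the gene names are placeholders.
pathway_roles = {
    "gene1": Role.PROMOTION,
    "gene2": Role.INHIBITION,
    "gene3": Role.PHOSPHORYLATION,
    "gene4": Role.DEPHOSPHORYLATION,
}

default_assignment = assign_weights(pathway_roles)
custom_assignment = assign_weights(pathway_roles, {Role.PHOSPHORYLATION: 1.5})
print(default_assignment)
print(custom_assignment)
```

Keeping the weights in a table separate from the role annotation makes it easy to substitute other values, which the embodiment explicitly allows.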
After each gene in a gene pathway has been assigned a weight coefficient according to its role, the topological coefficient of each gene on each gene pathway can be calculated from the weight coefficients using the R packages KEGGgraph and RBGL. The calculated topological coefficients form the gene pathway topology coefficient matrix.
The resulting gene pathway topology coefficient matrix takes the form shown in Table 2:
Table 2
It should be noted that in practical applications, the execution sequence of step 102, step 103 and step 104 is not limited to When sequence as described above, specific implementation, step 102 can be first carried out, step 103 can also be first carried out, it can also be first Execute step 104, may also be performed simultaneously step 102, step 103 and step 104, herein not to step 102, step 103 and The execution sequence of step 104 is specifically limited.
Step 105: it is logical to express multiple data, the gene co-expressing unit and the gene according to the transcriptional differences Road topology coefficient matrix determines marking result of the compound on every gene pathway;The marking result is for evaluating the change Object is closed for the activation of the gene pathway.
It, can after obtaining transcriptional differences expression multiple data, gene co-expressing unit and gene pathway topology coefficient matrix To express multiple data, gene co-expressing unit and gene pathway topology coefficient matrix according to transcriptional differences obtained, adopt Marking of the compound on every gene pathway is calculated as a result, the marking result is used to evaluate compound to base with IPANDA method The activation risen by access.
The scoring results finally determined for the compound on each gene pathway are shown in Table 3:
Table 3
Here, a positive value indicates that the compound enhances the corresponding gene pathway, a negative value indicates that the compound weakens it, and the larger the absolute value, the stronger the effect.
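The reading rule for the scores can be expressed as a small Python helper. The function name `interpret`, the label strings and the example scores are hypothetical; the label for a score of exactly zero is an assumption, since the text only covers positive and negative values.

```python
def interpret(scores):
    """Return (pathway, score, label) tuples, strongest effect first."""

    def label(score):
        if score > 0:
            return "enhanced"
        if score < 0:
            return "weakened"
        return "no net effect"  # ASSUMPTION: zero is not covered by the text

    ordered = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(pathway, score, label(score)) for pathway, score in ordered]


# Toy scores, for illustration only.
ranked = interpret({"P1": 0.5, "P2": -2.0, "P3": 0.0})
```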
It should be understood that, in practical applications, the scoring result corresponding to each compound study group needs to be determined from the transcriptome data of that group, so as to determine the activation effect on the gene pathway of the compound type, compound dose and/or compound action time used in that compound study group.
In the method for evaluating the activation effect of a compound on a gene pathway provided by the embodiments of the present application, the promoting, inhibiting, phosphorylating and dephosphorylating effects that a gene exerts in a gene pathway are all taken into account when the gene pathway topology coefficient matrix is determined; that is, the role each gene plays in the gene pathway is accurately evaluated at that stage. This ensures that the scoring result of the compound on the gene pathway, subsequently determined from the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.
Corresponding to the method described above for evaluating the activation effect of a compound on a gene pathway, the embodiments of the present application also provide a device for evaluating the activation effect of a compound on a gene pathway.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of the device for evaluating the activation effect of a compound on a gene pathway provided by the embodiments of the present application. As shown in Fig. 2, the device includes:
A transcriptome data acquisition module 201, configured to obtain the transcriptome data of a control group and the transcriptome data of a compound study group;
A differential expression fold-change data acquisition module 202, configured to obtain differential expression fold-change data according to the transcriptome data of the control group and the transcriptome data of the compound study group;
A gene co-expression unit acquisition module 203, configured to cluster the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units;
A gene pathway topology coefficient matrix acquisition module 204, configured to obtain gene pathways, assign a weight coefficient to each gene in a gene pathway according to the role the gene plays in that pathway, and obtain the gene pathway topology coefficient matrix; the roles a gene plays in a gene pathway include promotion, inhibition, phosphorylation and dephosphorylation;
A scoring module 205, configured to determine the scoring result of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
Optionally, the gene pathway topology coefficient matrix acquisition module 204 is specifically configured to:
Set the weight coefficient of a gene that promotes the gene pathway to +1, and the weight coefficient of a gene that inhibits the gene pathway to -1;
Set the weight coefficient of a gene that exerts a phosphorylating effect on the gene pathway to +2, and the weight coefficient of a gene that exerts a dephosphorylating effect on the gene pathway to -2.
Optionally, the gene pathway topology coefficient matrix acquisition module 204 is specifically configured to:
Calculate, according to the weight coefficient of each gene and using the R packages KEGGgraph and RBGL, the topological coefficient of each gene on each gene pathway.
Optionally, the gene co-expression unit acquisition module 203 is specifically configured to:
Perform a first clustering pass on the co-expressed genes, and perform a second clustering pass on the result of the first pass, to obtain the gene co-expression units.
Optionally, the gene co-expression unit acquisition module 203 is specifically configured to:
Use a density-based clustering method and/or a hierarchical clustering method.
Optionally, the density-based clustering methods include DBSCAN and OPTICS;
The hierarchical clustering methods include BIRCH.
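As an illustration of density-based clustering of co-expressed genes, the following self-contained Python sketch implements a basic DBSCAN over expression profiles. The use of a Pearson correlation distance, the parameter values `eps` and `min_pts`, and the toy profiles are all assumptions made for the example; the source does not specify the distance measure or the parameters.

```python
import math

NOISE = -1  # label for genes that belong to no co-expression unit


def correlation_distance(a, b):
    """1 - Pearson correlation of two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def dbscan(profiles, eps, min_pts):
    """Cluster genes; returns {gene: cluster index or NOISE}."""
    genes = list(profiles)

    def neighbours(gene):
        return [h for h in genes
                if correlation_distance(profiles[gene], profiles[h]) <= eps]

    labels = {}
    cluster = 0
    for gene in genes:
        if gene in labels:
            continue
        seeds = neighbours(gene)          # includes the gene itself
        if len(seeds) < min_pts:
            labels[gene] = NOISE
            continue
        labels[gene] = cluster
        queue = list(seeds)
        while queue:
            other = queue.pop()
            if labels.get(other) == NOISE:
                labels[other] = cluster   # former noise becomes a border gene
            if other in labels:
                continue
            labels[other] = cluster
            other_seeds = neighbours(other)
            if len(other_seeds) >= min_pts:
                queue.extend(other_seeds)  # core gene: keep expanding
        cluster += 1
    return labels


# Toy expression profiles across four samples, for illustration only.
profiles = {
    "G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8.5], "G3": [1.1, 2, 3.2, 3.9],
    "G4": [4, 3, 2, 1], "G5": [8, 6.1, 4, 2], "G6": [1, 5, 1, 5],
}
labels = dbscan(profiles, eps=0.1, min_pts=2)
```

With these toy values the rising profiles G1 to G3 form one unit, the falling profiles G4 and G5 form another, and G6 is left as noise.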
In the device for evaluating the activation effect of a compound on a gene pathway provided by the embodiments of the present application, the promoting, inhibiting, phosphorylating and dephosphorylating effects that a gene exerts in a gene pathway are all taken into account when the gene pathway topology coefficient matrix is determined; that is, the role each gene plays in the gene pathway is accurately evaluated at that stage. This ensures that the scoring result of the compound on the gene pathway, subsequently determined from the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix, characterizes the activation effect of the compound on the gene pathway more accurately.
The present application also provides an apparatus for evaluating the activation effect of a compound on a gene pathway. The apparatus may be a server or a terminal device; the apparatus is introduced below taking a server as an example.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a server for evaluating the activation effect of a compound on a gene pathway provided by the embodiments of the present application. The server 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations stored in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™ or FreeBSD™.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 3.
The CPU 322 is configured to execute the following steps:
Obtain the transcriptome data of a control group and the transcriptome data of a compound study group;
Obtain differential expression fold-change data according to the transcriptome data of the control group and the transcriptome data of the compound study group;
Cluster the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units;
Obtain gene pathways, assign a weight coefficient to each gene in a gene pathway according to the role the gene plays in that pathway, and obtain the gene pathway topology coefficient matrix; the roles a gene plays in a gene pathway include promotion, inhibition, phosphorylation and dephosphorylation;
Determine the scoring result of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
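The second of the steps above, deriving fold-change data from the two sets of transcriptome data, might look as follows in Python. The use of replicate means, a log2 ratio and a pseudocount of 1 are assumptions made for the sketch, as are the function name `log2_fold_change` and the toy expression values; the source does not prescribe a particular formula.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2 fold change of treated over control.

    control, treated: {gene: [expression value per replicate]}
    ASSUMPTION: replicate means plus a pseudocount, to avoid division by zero.
    """
    result = {}
    for gene in control.keys() & treated.keys():
        control_mean = sum(control[gene]) / len(control[gene])
        treated_mean = sum(treated[gene]) / len(treated[gene])
        result[gene] = math.log2(
            (treated_mean + pseudocount) / (control_mean + pseudocount)
        )
    return result


# Toy expression values, for illustration only.
fc = log2_fold_change(
    control={"G1": [3, 3], "G2": [15, 15]},
    treated={"G1": [7, 7], "G2": [3, 3]},
)
```

Here G1 doubles ((7 + 1) / (3 + 1) = 2, log2 = 1) and G2 falls to a quarter ((3 + 1) / (15 + 1) = 0.25, log2 = -2).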
Optionally, the CPU 322 may also execute the method steps of any specific implementation of the method for evaluating the activation effect of a compound on a gene pathway shown in Fig. 2.
The embodiments of the present application also provide a computer-readable storage medium for storing program code, the program code being used to execute any one implementation of the method for evaluating the activation effect of a compound on a gene pathway described in the foregoing embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for evaluating the activation effect of a compound on a gene pathway, characterized in that the method comprises:
Obtaining the transcriptome data of a control group and the transcriptome data of a compound study group;
Obtaining differential expression fold-change data according to the transcriptome data of the control group and the transcriptome data of the compound study group;
Clustering the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units;
Obtaining gene pathways, assigning a weight coefficient to each gene in a gene pathway according to the role the gene plays in that pathway, and obtaining a gene pathway topology coefficient matrix; the roles a gene plays in a gene pathway include promotion, inhibition, phosphorylation and dephosphorylation;
Determining the scoring result of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
2. The method according to claim 1, characterized in that assigning a weight coefficient to each gene in a gene pathway according to the role the gene plays in that pathway comprises:
Setting the weight coefficient of a gene that promotes the gene pathway to +1, and the weight coefficient of a gene that inhibits the gene pathway to -1;
Setting the weight coefficient of a gene that exerts a phosphorylating effect on the gene pathway to +2, and the weight coefficient of a gene that exerts a dephosphorylating effect on the gene pathway to -2.
3. The method according to claim 1 or 2, characterized in that obtaining the gene pathway topology coefficient matrix comprises:
Calculating, according to the weight coefficient of each gene and using the R packages KEGGgraph and RBGL, the topological coefficient of each gene on each gene pathway.
4. The method according to claim 1, characterized in that clustering the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units, comprises:
Performing a first clustering pass on the co-expressed genes, and performing a second clustering pass on the result of the first pass, to obtain the gene co-expression units.
5. The method according to claim 1, characterized in that clustering the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units, comprises:
Using a density-based clustering method and/or a hierarchical clustering method.
6. The method according to claim 5, characterized in that the density-based clustering methods include DBSCAN and OPTICS;
The hierarchical clustering methods include BIRCH.
7. A device for evaluating the activation effect of a compound on a gene pathway, characterized in that the device comprises:
A transcriptome data acquisition module, configured to obtain the transcriptome data of a control group and the transcriptome data of a compound study group;
A differential expression fold-change data acquisition module, configured to obtain differential expression fold-change data according to the transcriptome data of the control group and the transcriptome data of the compound study group;
A gene co-expression unit acquisition module, configured to cluster the relevant genes so that co-expressed genes are grouped together, obtaining a plurality of gene co-expression units;
A gene pathway topology coefficient matrix acquisition module, configured to obtain gene pathways, assign a weight coefficient to each gene in a gene pathway according to the role the gene plays in that pathway, and obtain a gene pathway topology coefficient matrix; the roles a gene plays in a gene pathway include promotion, inhibition, phosphorylation and dephosphorylation;
A scoring module, configured to determine the scoring result of the compound on each gene pathway according to the differential expression fold-change data, the gene co-expression units and the gene pathway topology coefficient matrix; the scoring result is used to evaluate the activation effect of the compound on the gene pathway.
8. The device according to claim 7, characterized in that the gene pathway topology coefficient matrix acquisition module is specifically configured to:
Set the weight coefficient of a gene that promotes the gene pathway to +1, and the weight coefficient of a gene that inhibits the gene pathway to -1;
Set the weight coefficient of a gene that exerts a phosphorylating effect on the gene pathway to +2, and the weight coefficient of a gene that exerts a dephosphorylating effect on the gene pathway to -2.
9. An apparatus for evaluating the activation effect of a compound on a gene pathway, characterized in that the apparatus comprises a processor and a memory:
The memory is configured to store program code and to transmit the program code to the processor;
The processor is configured to execute, according to the instructions in the program code, the steps of the method for evaluating the activation effect of a compound on a gene pathway according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, the program code being used to execute the method for evaluating the activation effect of a compound on a gene pathway according to any one of claims 1 to 6.
CN201910142574.5A 2019-02-26 2019-02-26 Method and device for evaluating activation effect of compound on gene pathway Active CN109801676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910142574.5A CN109801676B (en) 2019-02-26 2019-02-26 Method and device for evaluating activation effect of compound on gene pathway

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910142574.5A CN109801676B (en) 2019-02-26 2019-02-26 Method and device for evaluating activation effect of compound on gene pathway

Publications (2)

Publication Number Publication Date
CN109801676A true CN109801676A (en) 2019-05-24
CN109801676B CN109801676B (en) 2021-01-01

Family

ID=66561331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910142574.5A Active CN109801676B (en) 2019-02-26 2019-02-26 Method and device for evaluating activation effect of compound on gene pathway

Country Status (1)

Country Link
CN (1) CN109801676B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101553492A (en) * 2006-08-31 2009-10-07 阵列生物制药公司 RAF inhibitor compounds and methods of use thereof
CN103093119A (en) * 2013-01-24 2013-05-08 南京大学 Method for recognizing significant biologic pathway through utilization of network structural information
CN103608036A (en) * 2011-06-19 2014-02-26 瓦克西尼私人有限公司 Vaccine adjuvant composition comprising inulin particles
US20150073719A1 (en) * 2013-08-22 2015-03-12 Genomoncology, Llc Computer-based systems and methods for analyzing genomes based on discrete data structures corresponding to genetic variants therein
CN104968646A (en) * 2012-12-13 2015-10-07 葛兰素史密斯克莱有限责任公司 Enhancer of Zeste homolog 2 inhibitors
US20170277826A1 (en) * 2016-03-27 2017-09-28 Insilico Medicine, Inc. System, method and software for robust transcriptomic data analysis
CN108763864A (en) * 2018-05-04 2018-11-06 温州大学 A method of evaluation biological pathway sample state
CN108753915A (en) * 2018-05-12 2018-11-06 内蒙古农业大学 The assay method of millet enzymatic activity
US20190030078A1 (en) * 2017-07-25 2019-01-31 Insilico Medicine, Inc. Multi-stage personalized longevity therapeutics
WO2019034576A1 (en) * 2017-08-18 2019-02-21 Koninklijke Philips N.V. Methods for sequencing biomolecules

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AFSANEH MOHAMMADNEJAD ET AL: "Weighted gene co-expression network analysis of microarray mRNA expression profiling in response to electroacupuncture", 《2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)》 *
沈慧丽等: "HER-2阴性乳腺癌新靶向基因的生物信息学分析", 《基因组学与应用生物学》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444248A (en) * 2019-07-22 2019-11-12 山东大学 Cancer Biology molecular marker screening technique and system based on network topology parameters
CN110444248B (en) * 2019-07-22 2021-09-24 山东大学 Cancer biomolecule marker screening method and system based on network topology parameters

Also Published As

Publication number Publication date
CN109801676B (en) 2021-01-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant