CN106874704B - Method for identifying key regulators in a gene co-regulatory network based on a linear model - Google Patents

Info

Publication number
CN106874704B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710004254.4A
Other languages
Chinese (zh)
Other versions
CN106874704A (en)
Inventor
王伟胜
曾亚菲
骆嘉伟
刘智明
蔡洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201710004254.4A
Publication of CN106874704A
Application granted
Publication of CN106874704B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Abstract

The invention discloses a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using gene expression profile data and gene regulatory relationship data, the method builds a linear model that predicts the expression of known disease genes and thereby identifies the key regulators in the gene co-regulatory network. The method is simple to implement: it needs only gene expression profile data and gene regulatory relationships to identify key regulators in the gene co-regulatory network with relatively high accuracy. Experiments confirm that the identified regulators are biologically meaningful, so the method has theoretical significance and practical value for research on disease mechanisms.

Description

Method for identifying key regulators in a gene co-regulatory network based on a linear model
Technical field
The invention belongs to the field of computational biology and relates to a linear-model-based method for identifying key regulators in a gene co-regulatory network.
Background art
In the post-genomic era, understanding the functions of genes, non-coding RNAs, proteins and other related biomolecules, and revealing the mechanisms that realize biological processes, has become one of the most important research goals of computational systems biology and bioinformatics. Within this field, the study of gene regulation is a very important topic: understanding the regulatory mechanisms of gene expression plays an important role in understanding the mechanisms of both biological processes and disease. In eukaryotes there are two important classes of regulatory factors, transcription factors (TFs) and microRNAs (miRNAs), which regulate the expression of target genes at the transcriptional and post-transcriptional levels respectively. A transcription factor is a protein with a specific function that initiates the transcription of a gene by binding to the gene's promoter region. miRNAs are a class of gene regulatory elements discovered in recent years: endogenous non-coding RNAs with regulatory function found in eukaryotes, about 20-25 nucleotides in length. Transcription factors and miRNAs play an important role in the regulation of gene expression, and this regulation pervades a wide range of biological activities and disease processes. Building on this, research has found that transcription factors and miRNAs interact and co-regulate extensively, forming a complex co-regulatory network. The co-regulatory network comprises the regulation of miRNAs by transcription factors, the regulation of target genes by transcription factors, and the regulation of transcription factors and target genes by miRNAs. These regulatory interactions reflect each stage of cellular molecular life processes and functional execution, so the co-regulatory network contains richer biological information than a single network. Efficiently identifying the key regulators in a co-regulatory network is therefore important for clinical treatment and drug design, and may provide a new approach to the treatment of human disease.
With the rapid development of high-throughput technologies, large amounts of omics data (genomics, transcriptomics, proteomics and so on) have been produced, providing new opportunities for research into the functions of biomolecules. Earlier algorithms for identifying key nodes focused mainly on identifying key proteins in protein-protein interaction networks. Compared with protein-protein interaction networks, transcriptional regulatory networks have proved more difficult to study. First, reliable transcriptional regulatory network data are still not easy to obtain. Second, because of the functional characteristics of the network itself, the topological properties of existing transcriptional regulatory networks differ considerably from those of protein interaction networks, and the directionality of regulation makes the topology of a regulatory network more complex still. Identifying key regulators in a regulatory network is therefore more complicated than identifying key proteins. In recent years regulatory networks have been studied more and more, and a variety of computational methods for identifying key regulators in regulatory networks have appeared. They fall mainly into the following classes: information-flow models (RWR), ranking algorithms (PageRank), classifiers (SVM, regularized least-squares classification), Bayesian networks, and regression models. However, these methods have various problems: for example, they cannot handle large data sets, their time complexity is too high, or their accuracy needs improvement. In 2015, Alexandra et al. proposed the MIPRIP method, which uses a linear model to identify key regulators in a regulatory network; their experimental results showed that a linear-model-based method can effectively identify biologically important regulators. However, that method considers only the simple relationship between transcription factors and genes and does not take into account the interactions and co-regulation among regulators in a co-regulatory network, and its identification accuracy also needs improvement.
It is therefore necessary to design a linear-model-based method for identifying key regulators in a gene co-regulatory network.
Summary of the invention
The technical problem to be solved by the invention is to provide a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using only gene expression profile data and gene regulatory relationships, the method can identify, with relatively high accuracy, biologically meaningful key regulators in the gene co-regulatory network.
The technical solution of the invention is as follows:
A linear-model-based method for identifying key regulators in a gene co-regulatory network, comprising the following steps:
Step 1) Build the gene co-regulatory network:
Input gene expression profile data, gene regulatory relationships and protein-protein interaction (PPI) data, filter out the interaction pairs that contain a node without expression profile data, and establish the gene co-regulatory network GCN (gene co-regulatory network). The GCN contains three kinds of nodes: miRNA (microRNA) regulators, TF regulators and genes. The edges between nodes are miRNA-gene, TF-gene and gene-gene;
For any two nodes in the GCN, the edge weight is 1 if an interaction exists between them and 0 otherwise;
Step 2) For each known disease gene, compute the activity values of the miRNA regulators, the TF regulators and the adjacent genes;
The activity value is the influence value of a miRNA, TF or adjacent gene on the known disease gene;
Step 3) In the constructed GCN, build a linear model from the gene expression profile data and the regulator and adjacent-gene activity values obtained in step 2), and use it to predict the expression of the known disease genes, giving their predicted expression values;
Step 4) Convert the linear model built in step 3) into an optimization problem by minimizing the difference between the predicted and true expression values of the known disease genes, solve the optimization problem using mixed-integer linear programming, and finally identify the key regulators in the gene co-regulatory network.
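Step 1) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names `build_gcn` and `edge_weight`, the dictionary representation of the network, and the treatment of gene-gene (PPI) edges as undirected are all hypothetical choices made for illustration and are not taken from the patent.

```python
def build_gcn(expressed, mirna_gene, tf_gene, gene_gene):
    """Build the co-regulatory network as a dict {(source, target): edge type}.

    expressed: set of node names that have expression profile data.
    mirna_gene, tf_gene, gene_gene: lists of (a, b) interaction pairs.
    Pairs with an endpoint that lacks expression data are filtered out.
    """
    edges = {}
    for kind, pairs in (("miRNA-gene", mirna_gene),
                        ("TF-gene", tf_gene),
                        ("gene-gene", gene_gene)):
        for a, b in pairs:
            if a in expressed and b in expressed:
                edges[(a, b)] = kind
    return edges


def edge_weight(edges, a, b):
    """Binary edge weight: 1 if an interaction a -> b exists, otherwise 0."""
    if (a, b) in edges:
        return 1
    # Assumption: PPI (gene-gene) edges are undirected, so b-a also counts.
    if edges.get((b, a)) == "gene-gene":
        return 1
    return 0
```

For example, a miRNA-gene pair whose miRNA has no expression profile is dropped, and a regulatory edge is not counted in the reverse direction.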
Further, the linear model built to predict the expression of a known disease gene is:
$$g'_{i,s} = \beta_0 + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}$$
where $i$ denotes a known disease gene, and $m$, $t$ and $g$ denote a miRNA regulator, a TF regulator and an adjacent gene of the known disease gene $i$, respectively;
$g'_{i,s}$ denotes the predicted expression value of known disease gene $i$ in sample $s$; $\beta_0$ is the additive offset of the linear model; $M$, $T$ and $G$ denote the miRNA set, the TF set and the gene set; $\beta_m$, $\beta_t$ and $\beta_g$ are the optimization parameters for $m$, $t$ and $g$, which the optimizer computes directly when solving the optimization problem of step 4);
$es_{m,i}$, $ts_{t,i}$ and $gs_{g,i}$ denote the edge weights between $m$, $t$, $g$ and $i$, each taking the value 0 or 1;
$act_{m,s}$, $act_{t,s}$ and $act_{g,s}$ denote the activity values of $m$, $t$ and $g$ in sample $s$;
The sample $s$ is the data of one observed individual with the known disease.
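As a small worked illustration of the definitions above, the following Python sketch evaluates the predicted expression of one disease gene in one sample as an offset plus a weighted sum of edge weight times activity value. The function name `predict_expression` and the dictionary layout are hypothetical, and the sketch assumes the three regulator classes (miRNA, TF, adjacent gene) can be pooled into a single dictionary because each contributes a term of the same form.

```python
def predict_expression(beta0, beta, edge, act):
    """Predicted expression of one disease gene i in one sample s.

    beta0: additive offset of the linear model.
    beta:  {name: coefficient} for every miRNA, TF and adjacent gene.
    edge:  {name: 0 or 1}, the edge weight between that node and gene i.
    act:   {name: activity value of that node in sample s}.
    """
    return beta0 + sum(beta[r] * edge.get(r, 0) * act.get(r, 0.0)
                       for r in beta)


# Hypothetical numbers: one miRNA, one TF and one unconnected neighbor.
example = predict_expression(
    0.5,
    {"mir1": -1.0, "TF1": 2.0, "g2": 0.5},
    {"mir1": 1, "TF1": 1, "g2": 0},
    {"mir1": 0.2, "TF1": 0.3, "g2": 0.9},
)
```

A node with edge weight 0 contributes nothing, whatever its coefficient or activity value.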
Further, converting the linear model into an optimization problem by minimizing the difference between the predicted and true gene expression values is expressed as:
$$\min \sum_{i \in O} \sum_{s \in S} \left| g_{i,s} - g'_{i,s} \right|$$
where $g_{i,s}$ and $g'_{i,s}$ denote the true and predicted expression values of disease gene $i$ in sample $s$, and $O$ and $S$ denote the set of known disease genes and the full sample set of the disease, respectively;
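The quantity being minimized can be evaluated for given predictions with a few lines of Python. This is a hedged sketch: it assumes the "difference" is measured as a sum of absolute deviations over disease genes and samples, which is the usual choice when a problem is to be solved by mixed-integer linear programming, and the function name `absolute_error` is hypothetical.

```python
def absolute_error(true_expr, pred_expr):
    """Sum of |true - predicted| over all (gene, sample) keys.

    true_expr, pred_expr: {(gene, sample): expression value}.
    """
    return sum(abs(true_expr[key] - pred_expr[key]) for key in true_expr)


# Hypothetical values for one disease gene in two samples.
true_values = {("g1", "s1"): 1.0, ("g1", "s2"): 2.0}
predicted = {("g1", "s1"): 1.5, ("g1", "s2"): 1.0}
error = absolute_error(true_values, predicted)
```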
The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while solving the optimization problems is recorded, all regulators are ranked by selection count, and the top 50 are taken as the final candidate regulators.
Once the Gurobi optimizer is installed, it suffices to load the gurobi package in R and call the gurobi function directly to solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is generally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem that minimizes the difference between the predicted and true expression values of the known disease genes. To obtain a series of representative models of different sizes, linear models are built by constraining the number of regulators of a gene: for each known disease gene, the number of regulators is set to each value from 1 to k in turn.
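One standard way to write such a problem for a single disease gene $i$ as a mixed-integer linear program is sketched below. The auxiliary error variables $e_s$, the binary selection indicators $z_r$, the bound $B$ on the coefficients and the use of "at most $k$" rather than "exactly $k$" are all assumptions introduced for illustration; the text does not state the formulation in this detail.

```latex
\begin{aligned}
\min_{\beta,\,z,\,e} \quad & \sum_{s \in S} e_{s} \\
\text{s.t.} \quad
  & e_{s} \ge g_{i,s} - g'_{i,s}, \qquad e_{s} \ge g'_{i,s} - g_{i,s}, && s \in S, \\
  & -B\,z_{r} \le \beta_{r} \le B\,z_{r}, \qquad z_{r} \in \{0, 1\}, && r \in M \cup T, \\
  & \sum_{r \in M \cup T} z_{r} \le k .
\end{aligned}
```

The first pair of constraints makes each $e_s$ an upper bound on the absolute prediction error in sample $s$, so minimizing their sum minimizes the total absolute error. The indicator $z_r$ is 1 only when regulator $r$ is allowed a nonzero coefficient, and a regulator counts as "selected by the optimizer" in a given model when its indicator is 1.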
Further, the activity values of the miRNA regulators, TF regulators and adjacent genes are computed by the following two methods:
1) Computing the activity value of a miRNA regulator or TF regulator:
First, compute the baseline expression value of every target gene of regulator $r$:
$$\bar{y}_{g_t} = \operatorname{mean}_{\,s \,:\, e(r) \to 0} \; y_{g_t,s}$$
where $r$ denotes a regulator, either a miRNA regulator or a TF regulator; $\bar{y}_{g_t}$ denotes the baseline expression value of target gene $g_t$ of regulator $r$, which is the mean expression value of gene $g_t$ over all samples in which the expression of regulator $r$ tends to 0; $e(r) \to 0$ denotes that the expression of regulator $r$ tends to 0;
The baseline expression value of a target gene is its expression value when it is not subject to any regulatory influence;
Second, compute the difference between the target gene's baseline expression value and its true expression value under the regulator's influence, that is, the expression change of the target gene:
$$\Delta y_{g_t,s} = y_{g_t,s} - \bar{y}_{g_t}$$
where $y_{g_t,s}$ denotes the true expression value of target gene $g_t$ of regulator $r$ in sample $s$, and $\Delta y_{g_t,s}$ denotes the expression change of target gene $g_t$ of regulator $r$;
Third, build a simple linear model from the expression changes of the target genes and solve for the activity value $act_{r,s}$ of the regulator:
$$act_{r,s} = \frac{\sum_{g_t \in G'} \Delta y_{g_t,s}}{\sum_{g_t \in G'} \bar{y}_{g_t}}$$
where $G'$ denotes the target gene set of regulator $r$, and $\sum_{g_t \in G'} \Delta y_{g_t,s}$ and $\sum_{g_t \in G'} \bar{y}_{g_t}$ denote the sum of the expression changes and the sum of the baseline expression values over the target gene set of regulator $r$, respectively;
2) Computing the activity value of an adjacent gene, as the cumulative effect of the expression of all genes that interact with it:
$$act_{g,s} = \sum_{i=1}^{N} gs_{g,i} \, g_{i,s}$$
where $N$ denotes the number of genes in sample $s$, $gs_{g,i}$ denotes the edge weight between gene $g$ and gene $i$, and $g_{i,s}$ denotes the expression value of gene $i$ in sample $s$; the sample $s$ is the data of one observed individual with the known disease.
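The two activity calculations described above can be sketched in Python as follows. This is an illustration under explicit assumptions, not the patented implementation: the threshold `eps` used to decide when a regulator's expression "tends to 0", the function names, and the choice to take the regulator activity as the ratio of the summed expression change to the summed baseline expression are all assumptions made for this sketch.

```python
def regulator_activity(target_expr, regulator_expr, sample, eps=0.05):
    """Activity of regulator r in `sample`.

    target_expr:    {target gene: {sample: expression}} for the targets of r.
    regulator_expr: {sample: expression of r}.
    eps:            assumed cut-off below which r counts as "near zero".
    """
    off = [s for s, v in regulator_expr.items() if abs(v) <= eps]
    if not off:
        raise ValueError("no sample in which the regulator is near zero")
    total_change = 0.0
    total_base = 0.0
    for expr in target_expr.values():
        base = sum(expr[s] for s in off) / len(off)   # baseline expression
        total_change += expr[sample] - base           # expression change
        total_base += base
    # Assumed form: summed change relative to summed baseline.
    return total_change / total_base


def neighbor_activity(edge_weights, expr, sample):
    """Cumulative expression of the genes connected to an adjacent gene.

    edge_weights: {other gene: 0 or 1}; expr: {gene: {sample: expression}}.
    """
    return sum(w * expr[g][sample] for g, w in edge_weights.items())
```

The sketch does not guard against a summed baseline of zero; real data would need a check for that case.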
Further, the activity values of the regulators and adjacent genes obtained in step 2) are normalized before being used to build the linear model in step 3).
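The text does not say which normalization is used. As one possible reading, the sketch below applies min-max scaling to a set of activity values; both the choice of min-max scaling and the function name `min_max_normalize` are assumptions made purely for illustration.

```python
def min_max_normalize(values):
    """Scale a {name: activity value} dict linearly onto the range [0, 1].

    Assumption: min-max scaling; the source does not name the method.
    If all values are equal, every entry is mapped to 0.0.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}
```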
Beneficial effects
The invention provides a linear-model-based method, co-BOTLM, for identifying key regulators in a gene co-regulatory network. Using gene expression profile data and gene regulatory relationships, it builds a linear model that predicts the expression of known disease genes and thereby identifies the key regulators in the gene co-regulatory network.
Compared with existing linear-model-based methods for identifying key regulators, the co-BOTLM method of the invention has the following advantages:
1) It is applied to a co-regulatory network, which contains richer biological information than a single network, so the identified regulators may be of greater biological significance;
2) It adds protein-protein interaction data (PPI information), allowing for the possibility that a gene's expression is influenced by its adjacent genes;
3) It introduces a new way of computing the activity values of regulators and adjacent genes, which effectively improves the accuracy of cancer gene expression prediction. The invention is simple to implement: it needs only gene expression profile data and gene regulatory relationships to identify key regulators in the gene co-regulatory network with relatively high accuracy.
Experiments confirm that the co-BOTLM method of the invention effectively identifies key regulators in the gene co-regulatory network and that the identified key regulators are biologically meaningful. Its accuracy is also higher than that of the method it was compared with. The detailed experimental results, comparison and analysis are given in the embodiment.
Description of the drawings
Fig. 1 is the flow chart of the co-BOTLM method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawing and a specific embodiment:
Embodiment 1:
I. Linear-model-based model for identifying key regulators in a gene co-regulatory network
The invention defines the key regulators in a gene co-regulatory network as follows: the regulators in the co-regulatory network that are found to strongly influence disease gene expression when a linear model built from gene expression profile data and gene regulatory relationships is used to predict the expression of known disease genes.
To describe the model clearly, the inventors give the relevant definitions as follows:
The proposed linear model for predicting the expression of known disease genes has the following form:
$$g'_{i,s} = \beta_0 + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}$$
The goal of the model is to identify the regulators in the co-regulatory network that strongly influence disease gene expression. A linear model built from gene expression profile data and gene regulatory relationships predicts the expression of known disease genes, and this is used to identify the key nodes in the gene co-regulatory network.
The overall flow of the method is shown in Fig. 1. Gene expression profile data, gene regulatory relationships and PPI data are input first. The co-BOTLM method can then be divided into four sub-processes:
1) Build the gene co-regulatory network;
2) Because a gene's expression may be influenced by regulators and adjacent genes, compute, for each known disease gene, the activity values of the miRNAs, TFs and adjacent genes (that is, their influence values on the known disease gene);
3) In the resulting gene co-regulatory network, build a linear model from the gene expression profile data to predict the expression of the known disease genes;
4) Convert the linear model into an optimization problem by minimizing the difference between the predicted and true gene expression values, solve it using mixed-integer linear programming (MILP), and finally identify the key regulators in the gene co-regulatory network, which completes the identification process;
The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while solving the optimization problems is recorded, all regulators are ranked by selection count, and the top 50 are taken as the final candidate regulators.
Once the Gurobi optimizer is installed, it suffices to load the gurobi package in R and call the gurobi function directly to solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is generally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem that minimizes the difference between the predicted and true expression values of the known disease genes. To obtain a series of representative models of different sizes, linear models are built by constraining the number of regulators of a gene: for each known disease gene, the number of regulators is set to each value from 1 to k in turn. In this example k is 5 (repeated experiments showed that the results are best when k is 5).
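The ranking step, in which regulators are ordered by how often the optimizer selects them across all solved models, can be sketched in Python as below. The function name `rank_regulators` and the input format (one list of selected regulator names per solved model) are hypothetical; the sketch assumes a regulator is counted at most once per model.

```python
from collections import Counter


def rank_regulators(selections, top_n=50):
    """Rank regulators by the number of models in which they were selected.

    selections: iterable of lists, one per solved model (for example one per
                disease gene and per regulator limit 1..k), each naming the
                regulators chosen in that model.
    Returns the top_n (regulator, count) pairs, most frequent first.
    """
    counts = Counter()
    for chosen in selections:
        counts.update(set(chosen))  # count each regulator once per model
    return counts.most_common(top_n)


# Hypothetical selections from four solved models.
ranking = rank_regulators(
    [["mir1"], ["mir1", "TF1"], ["mir2", "mir1"], ["TF1"]],
    top_n=2,
)
```

With `top_n=50` this corresponds to keeping the 50 most frequently selected regulators as candidates.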
II. Validation of the linear-model-based method for identifying key regulators in a gene co-regulatory network
To verify the effectiveness of co-BOTLM, the method was applied to an ovarian cancer data set. The experimental data comprise ovarian cancer sample data, gene regulatory relationships, PPI data and known ovarian-cancer-related disease genes. The ovarian cancer sample data were downloaded from the TCGA database, 385 samples in total. After filtering out genes whose absolute expression values were too small or whose expression showed no significant difference across samples, the final ovarian cancer expression profile data set contained 385 samples with 559 miRNAs and 12456 genes. The interaction data comprise miRNA-gene, TF-gene and PPI data, downloaded from the MicroCosm website, the ENCODE database and the BioGrid database respectively. Mapping the ovarian cancer expression profile data set onto the interactions yielded a miRNA-TF gene co-regulatory network containing three types of node (12381 genes, 559 miRNAs and 75 TFs) connected by 59660 gene-gene, 241722 miRNA-gene and 9877 TF-gene interaction pairs. For the known ovarian-cancer-related disease genes, 379 were downloaded from the DDOC database; after filtering out disease genes without expression profile data or without regulatory relationships, 123 remained.
Three-fold cross-validation was carried out in this example, and co-BOTLM was compared with the MIPRIP method proposed by Alexandra et al. in terms of prediction accuracy. The Pearson correlation coefficient (PCC) was used to measure the similarity between the disease gene expression data predicted by co-BOTLM and the true expression data: the larger the PCC, the higher the similarity, which indicates a more accurate linear model and hence more accurate experimental results. The PCC values in this example were computed with the cor function in R. The regulators identified by co-BOTLM were also subjected to characteristic analysis and functional enrichment analysis.
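The evaluation metric can be illustrated with a plain-Python Pearson correlation and a simple three-fold split. This is a sketch, not the procedure used in the experiment: the text computes PCC with R's cor function, and it does not say how the folds were formed, so the interleaved split in `three_fold_indices` and both function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Not defined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def three_fold_indices(n):
    """Return three (train, test) index splits over range(n).

    Assumption: every third item goes to the same fold.
    """
    folds = [list(range(f, n, 3)) for f in range(3)]
    return [(sorted(set(range(n)) - set(test)), test) for test in folds]
```

In each fold the model would be fitted on the training indices and the PCC computed between predicted and true expression on the held-out indices.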
1. Analysis of experimental results: verifying the effectiveness of the algorithm
Table 1: Top 20 regulators in the miRNA-TF gene co-regulatory network
No.  Identified key regulator  Target genes  Times selected by optimizer
1    hsa-mir-106a*             377           50
2    hsa-mir-586               508           43
3    hsa-mir-423-5p            496           38
4    hsa-mir-515-3p            512           34
5    hsa-mir-181a-2*           496           34
6    hsa-mir-768-3p            530           32
7    hsa-mir-663               480           32
8    hsa-mir-539               382           31
9    hsa-mir-206               477           30
10   hsa-mir-509-3p            552           30
11   hsa-mir-362-3p            512           25
12   hsa-mir-378*              519           24
13   hsa-mir-520c-3p           566           24
14   hsa-mir-33a               523           24
15   hsa-mir-29a*              495           23
16   hsa-mir-193a-3p           496           23
17   hsa-mir-601               484           23
18   FOXA2                     169           23
19   hsa-mir-26b               466           22
20   hsa-mir-30b               541           22
In this example, after three-fold cross-validation the final average PCC value was 0.535, showing that the gene expression values predicted by the linear model of the invention are fairly similar to the true expression values. This demonstrates that the linear model built by co-BOTLM is fairly accurate and can effectively identify key regulators in the network. After the experiment finished, all regulators were ranked by the number of times the optimizer selected them, and the top 50 were taken as the candidate key regulators in this example. Table 1 above lists the top 20. Every regulator except FOXA2 regulates no fewer than 300 genes, many of which have been found to be associated with ovarian cancer. FOXA2 has fewer target genes because TF experimental data are scarce. This indicates that the identified regulators interact with a large number of genes in the ovarian cancer gene co-regulatory network and may be related to the expression of many genes (including known ovarian cancer disease genes), and so play a crucial role in this co-regulatory network.
2. Experimental comparison of co-BOTLM and MIPRIP: verifying the accuracy of the algorithm
Table 2: PCC values of the MIPRIP experimental results
No.  1          2          3          4          5
1    0.3329907  0.4312150  0.4436449  0.4731776  0.4893458
2    0.3195237  0.4221495  0.4500000  0.4687850  0.4851402
3    0.3214019  0.4341121  0.4571028  0.4768224  0.4916822
Note: rows 1-3 are the three folds of the cross-validation experiment; columns 1-5 are the number of regulators k used to build the linear model.
Table 3: PCC values of the co-BOTLM experimental results
No.  1          2          3          4          5
1    0.5018750  0.5709821  0.5940179  0.6112500  0.6227679
2    0.4858036  0.5575893  0.5869643  0.6025893  0.6164286
3    0.4956250  0.5518750  0.5691964  0.5918750  0.6059821
Both MIPRIP and the co-BOTLM method of the invention use a linear model to identify the key regulators of a specific disease, but they differ in three respects: 1) MIPRIP is applied to a regulatory network, whereas co-BOTLM is applied to a co-regulatory network; because transcription factors and miRNAs interact and co-regulate extensively, the co-regulatory network contains richer biological information than a single network; 2) among the factors that influence disease gene expression, co-BOTLM considers, in addition to transcription factors and miRNAs, the possible influence of adjacent genes; 3) the two methods compute the activity values of transcription factors and miRNAs differently. Because MIPRIP is applied to a regulatory network and does not consider the co-regulatory relationships in the network, transcription factors were treated as ordinary genes in the comparison experiment. Tables 2 and 3 give the PCC values obtained by MIPRIP and co-BOTLM respectively. co-BOTLM clearly achieves higher PCC values: its average PCC is 0.571, against 0.433 for MIPRIP. The gene expression values predicted by co-BOTLM are thus more similar to the true expression values, which indirectly indicates that co-BOTLM is more accurate and that the key regulators it identifies are more reliable.
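The two averages quoted above can be checked directly from the values in Tables 2 and 3. The short Python sketch below takes the mean over all 15 entries of each table (three folds by five values of k); the variable and function names are hypothetical.

```python
# PCC values copied from Table 2 (MIPRIP) and Table 3 (co-BOTLM).
miprip = [
    [0.3329907, 0.4312150, 0.4436449, 0.4731776, 0.4893458],
    [0.3195237, 0.4221495, 0.4500000, 0.4687850, 0.4851402],
    [0.3214019, 0.4341121, 0.4571028, 0.4768224, 0.4916822],
]
co_botlm = [
    [0.5018750, 0.5709821, 0.5940179, 0.6112500, 0.6227679],
    [0.4858036, 0.5575893, 0.5869643, 0.6025893, 0.6164286],
    [0.4956250, 0.5518750, 0.5691964, 0.5918750, 0.6059821],
]


def mean_pcc(table):
    """Mean over every entry of a folds-by-k table of PCC values."""
    values = [v for row in table for v in row]
    return sum(values) / len(values)


print(round(mean_pcc(miprip), 3))    # 0.433
print(round(mean_pcc(co_botlm), 3))  # 0.571
```

Every co-BOTLM entry is also larger than the MIPRIP entry in the same fold and the same k.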
3. Functional enrichment analysis of the experimental results: verifying the validity of the results
Table 4: GO enrichment analysis of the top 10 regulators
No.: regulator rank; enriched GO terms: the top 3 GO terms ranked by P-value (the smaller the better); GO count: the number of GO terms with P-value < 0.05; P-value: < 0.05 indicates a high degree of enrichment.
Table 5: KEGG pathway enrichment analysis of the top 10 regulators
No.: regulator rank; enriched KEGG pathways: the top 3 KEGG pathways ranked by P-value (the smaller the better); KEGG count: the number of KEGG pathways with P-value < 0.05; P-value: < 0.05 indicates a high degree of enrichment.
To verify that the key regulators identified by the co-BOTLM method of the invention are biologically meaningful, GO enrichment analysis and KEGG pathway enrichment analysis were performed on the identified key regulators in this example using the GOstats package in R. Tables 4 and 5 show the GO and KEGG pathway enrichment results for the top 10 regulators respectively.
Table 4 shows that most of the top 10 regulators identified by co-BOTLM are enriched in 300 or more GO terms. The most frequently enriched GO terms include cellular component organization, cellular process, cell death and negative regulation of dendritic cell differentiation, indicating that the identified regulators are largely involved in cell-related life processes. hsa-mir-515-3p and hsa-mir-768-3p are enriched in fewer than 100 GO terms, possibly because few of the target genes of these two miRNAs match genes in the GOstats library. In addition, Jiang et al. showed in 2016 that downregulation of hsa-mir-768-3p is associated with MEK/ERK-mediated enhancement of protein synthesis in melanoma cells, so hsa-mir-768-3p may have potential prognostic value in ovarian cancer. Similarly, Table 5 shows that most of the top 10 regulators are enriched in 5 or more KEGG pathways. The most frequently enriched pathways include prostate cancer, pathways in cancer, signaling pathway and ErbB signaling pathway, indicating that the identified regulators take part in many cancer and signaling pathways and are closely related to cancer. In summary, the identified regulators take part in a large number of biological processes, especially those related to cellular activity and cancer, and are therefore biologically meaningful.

Claims (5)

1. A method for identifying key regulators in a gene co-regulatory network based on a linear model, characterized by comprising the following steps:
Step 1) Construct the gene co-regulatory network:
Input gene expression profile data, gene regulatory relationships, and protein interaction data; filter out the interaction pairs that contain a node without expression profile data; and build the gene co-regulatory network GCN. The GCN contains three kinds of nodes: regulator miRNAs, regulator TFs, and genes. Interaction edges exist between nodes of the following types: miRNA-gene, TF-gene, and gene-gene;
For any two nodes in the GCN, the edge weight is 1 if an interaction exists between them and 0 otherwise;
Step 2) Calculate the activity values of the regulator miRNAs, the regulator TFs, and the adjacent genes with respect to the known disease genes;
Step 3) In the constructed GCN, use the gene expression profile data and the activity values of the regulators and adjacent genes obtained in step 2) to build a linear model that predicts the expression of the known disease genes, obtaining predicted expression values for the known disease genes;
Step 4) Convert the linear model built in step 3) into an optimization problem by minimizing the difference between the predicted and true expression values of the known disease genes, solve the optimization problem using mixed integer linear programming, and finally identify the key regulators in the gene co-regulatory network.
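Step 1) can be illustrated with a minimal Python sketch. The function names, the dictionary-based data layout, and the toy identifiers (`miR-1`, `TF-A`, `G1`, and so on) are assumptions made for illustration and do not come from the patent. The sketch keeps only interaction pairs whose two endpoints both have expression profile data and assigns each kept pair an edge weight of 1; any other pair has weight 0.

```python
from itertools import chain


def build_gcn(expression, mirna_gene, tf_gene, gene_gene):
    """Return {(source, target): 1} for every interaction pair in which
    both nodes have expression profile data (hypothetical data layout)."""
    edges = {}
    for source, target in chain(mirna_gene, tf_gene, gene_gene):
        # Drop pairs that contain a node without expression data.
        if source in expression and target in expression:
            edges[(source, target)] = 1
    return edges


def edge_weight(edges, a, b):
    """Edge weight is 1 if an interaction exists, otherwise 0."""
    return edges.get((a, b), 0)


# Toy expression profiles: node -> expression value per sample.
expression = {
    "miR-1": [0.0, 2.0],
    "TF-A": [1.0, 3.0],
    "G1": [5.0, 4.0],
    "G2": [2.0, 2.5],
}

gcn = build_gcn(
    expression,
    mirna_gene=[("miR-1", "G1"), ("miR-9", "G2")],  # miR-9 has no expression data
    tf_gene=[("TF-A", "G1")],
    gene_gene=[("G2", "G1")],
)
```

Storing only the existing edges keeps the network sparse, and looking up a missing pair yields the weight 0 described in the claim.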
2. The method for identifying key regulators in a gene co-regulatory network based on a linear model according to claim 1, characterized in that the linear model built to predict the expression of the known disease genes is expressed as follows:
where i denotes a known disease gene, and m, t, and g denote, respectively, a regulator miRNA, a regulator TF, and an adjacent gene of the known disease gene i;
g′_{i,s} denotes the predicted expression value of the known disease gene i in sample s; β_0 is the additional weight (bias term) of the linear model; M, T, and G denote the miRNA set, the TF set, and the gene set, respectively; β_m, β_t, and β_g are the optimization parameters of m, t, and g, respectively, and are computed directly by the optimizer when the optimization problem is solved in step 4);
es_{m,i}, ts_{t,i}, and gs_{g,i} denote the interaction edge weights between i and m, t, and g, respectively, each taking the value 0 or 1;
act_{m,s}, act_{t,s}, and act_{g,s} denote the activity values of m, t, and g in sample s, respectively;
the sample s refers to the data of one observed individual with the known disease.
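The formula itself is not reproduced in this text, so the following Python sketch is only one reading that is consistent with the symbol definitions above: the predicted expression is β_0 plus, for every miRNA, TF, and adjacent gene, the product of its coefficient, its 0/1 edge weight to gene i, and its activity in sample s. The additive form, the signs, the function name, and the toy numbers are all assumptions.

```python
def predict_expression(i, s, beta0, beta, weight, act):
    """Predicted expression of disease gene i in sample s under the
    assumed additive form:
        beta0 + sum over nodes of beta[node] * weight[(node, i)] * act[node][s]
    beta   : node -> coefficient (miRNAs, TFs and adjacent genes together)
    weight : (node, gene) -> 1 for an existing edge; missing pairs count as 0
    act    : node -> {sample: activity value}
    """
    total = beta0
    for node, coefficient in beta.items():
        total += coefficient * weight.get((node, i), 0) * act[node][s]
    return total


# Toy values, chosen only to make the arithmetic easy to follow.
beta = {"miR-1": -0.5, "TF-A": 2.0, "G2": 0.25}
weight = {("miR-1", "G1"): 1, ("TF-A", "G1"): 1}  # G2 is not connected to G1
act = {"miR-1": {"s1": 2.0}, "TF-A": {"s1": 1.5}, "G2": {"s1": 4.0}}

predicted = predict_expression("G1", "s1", 1.0, beta, weight, act)
# 1.0 + (-0.5 * 1 * 2.0) + (2.0 * 1 * 1.5) + (0.25 * 0 * 4.0) = 3.0
```

Because the edge weights are 0 or 1, a node that is not connected to gene i contributes nothing, whatever its coefficient.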
3. The method for identifying key regulators in a gene co-regulatory network based on a linear model according to claim 2, characterized in that the optimization problem obtained by converting the linear model through minimizing the difference between the predicted and true gene expression values is expressed as:
where g_{i,s} and g′_{i,s} denote the true and predicted expression values of disease gene i in sample s, respectively, and O and S denote the set of known disease genes and the full sample set of the disease, respectively;
The optimization problem is solved with the Gurobi optimizer. During solving, the number of times each regulator is selected by the optimizer is recorded; all regulators are ranked by selection count, and the top 50 regulators are taken as the final candidate regulators.
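The ranking step can be sketched in plain Python without calling Gurobi. Two assumptions are made here that the text does not confirm: the objective is taken to be the sum of absolute differences between true and predicted expression over all disease genes and samples, and a regulator counts as "selected" in one solve when its coefficient is non-zero. The function names and the tolerance are hypothetical.

```python
from collections import Counter


def total_abs_error(true_expr, pred_expr):
    """Assumed objective: sum of |true - predicted| over all (gene, sample) keys."""
    return sum(abs(true_expr[key] - pred_expr[key]) for key in true_expr)


def rank_regulators(solutions, top_n=50, tol=1e-9):
    """Count how often each regulator has a non-zero coefficient across
    solver results and return the top_n regulators by that count."""
    counts = Counter()
    for coefficients in solutions:
        for regulator, beta in coefficients.items():
            if abs(beta) > tol:
                counts[regulator] += 1
    # Sort by descending count; break ties by name for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [regulator for regulator, _ in ordered[:top_n]]


# Toy solver outputs: one coefficient dictionary per solved problem.
solutions = [
    {"miR-1": -0.4, "TF-A": 0.0},
    {"miR-1": 0.2, "TF-A": 1.1},
    {"miR-1": 0.3, "TF-B": 0.0},
]
top = rank_regulators(solutions, top_n=2)

error = total_abs_error(
    {("G1", "s1"): 3.0, ("G1", "s2"): 1.0},
    {("G1", "s1"): 2.5, ("G1", "s2"): 2.0},
)
```

In an actual run the coefficient dictionaries would come from the optimizer's solution values; here they are hard-coded so that the counting logic can be checked in isolation.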
4. The method for identifying key regulators in a gene co-regulatory network based on a linear model according to any one of claims 1-3, characterized in that the activity values of the regulator miRNAs and regulator TFs, and the activity values of the adjacent genes, are calculated by the following two methods, respectively:
1) Calculate the activity values of the regulator miRNAs and regulator TFs:
First, calculate the baseline expression value of every target gene of regulator r:
where r denotes a regulator, either a regulator miRNA or a regulator TF; the baseline expression value of target gene g_t of regulator r is the mean expression value of g_t over all samples in which the expression of regulator r tends to 0; e(r)->0 denotes that the expression of regulator r tends to 0;
Second, calculate the difference between the baseline expression value of each target gene and its true expression value under the regulator's influence, i.e. the expression change value of the target gene:
where the two quantities are the true expression value of target gene g_t of regulator r in sample s and the expression change value of target gene g_t of regulator r;
Third, build a simple linear model from the expression change values of the target genes and solve for the regulator's activity value act_{r,s}:
where G′ denotes the target gene set of regulator r, and the two sums are, respectively, the sum of the expression change values and the sum of the baseline expression values over the target gene set of regulator r;
2) The activity value of an adjacent gene is solved from the cumulative expression influence of the adjacent gene on all genes it acts on, namely:
where N denotes the number of genes in sample s, gs_{g,i} denotes the interaction edge weight between gene g and gene i in sample s, and g_{i,s} denotes the expression value of gene i in sample s; the sample s refers to the data of one observed individual with the known disease.
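The formulas of this claim are not reproduced in this text, so the Python sketch below follows the verbal definitions only and rests on several assumptions: "tends to 0" is implemented as an absolute expression at or below a small threshold `eps`; the expression change is taken as true value minus baseline; the "simple linear model" is read as the ratio of the summed changes to the summed baselines; and the adjacent-gene activity is read as the edge-weighted sum of the expression of the genes it acts on. All function names and toy values are hypothetical.

```python
def regulator_activity(reg_expr, target_expr, eps=1e-6):
    """Activity of one regulator in every sample.
    reg_expr    : regulator expression per sample (list)
    target_expr : target gene -> expression per sample (list)
    """
    # Samples in which the regulator expression is (close to) zero.
    off = [s for s, value in enumerate(reg_expr) if abs(value) <= eps]
    if not off:
        raise ValueError("no sample with regulator expression near 0")
    # Baseline: mean target expression over those samples.
    baseline = {
        gene: sum(values[s] for s in off) / len(off)
        for gene, values in target_expr.items()
    }
    baseline_sum = sum(baseline.values())
    if baseline_sum == 0:
        raise ValueError("baseline expression sums to 0")
    activities = []
    for s in range(len(reg_expr)):
        # Assumed sign convention: change = true expression - baseline.
        change_sum = sum(values[s] - baseline[gene]
                         for gene, values in target_expr.items())
        activities.append(change_sum / baseline_sum)
    return activities


def adjacent_gene_activity(g, sample_expr, edges):
    """Assumed form: sum over genes i of edge weight (g, i) times expression of i."""
    return sum(edges.get((g, i), 0) * value for i, value in sample_expr.items())


acts = regulator_activity(
    reg_expr=[0.0, 2.0, 0.0],
    target_expr={"G1": [4.0, 2.0, 6.0], "G2": [3.0, 1.5, 3.0]},
)
adj = adjacent_gene_activity(
    "G2",
    sample_expr={"G1": 5.0, "G3": 2.0, "G4": 7.0},
    edges={("G2", "G1"): 1, ("G2", "G3"): 1},
)
```

In the toy data the regulator is silent in the first and third samples, so the baselines are 5.0 for G1 and 3.0 for G2; in the second sample both targets fall below baseline, which gives a negative activity.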
5. The method for identifying key regulators in a gene co-regulatory network based on a linear model according to claim 4, characterized in that the activity values of the regulators and adjacent genes obtained in step 2) are normalized before being used to build the linear model in step 3).
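The claim does not say which normalization is used. As an illustration only, the sketch below assumes min-max scaling to the range [0, 1]; the choice of min-max scaling and the function name are assumptions, and another scheme such as z-score scaling would fit the wording equally well.

```python
def min_max_normalize(values):
    """Scale a list of activity values to [0, 1] (assumed normalization)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Putting regulator and adjacent-gene activities on a common scale before fitting keeps the coefficients of the linear model comparable across node types.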

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710004254.4A CN106874704B (en) 2017-01-04 2017-01-04 A kind of gene based on linear model is total to the sub- recognition methods of key regulatory in regulated and control network

Publications (2)

Publication Number Publication Date
CN106874704A CN106874704A (en) 2017-06-20
CN106874704B true CN106874704B (en) 2019-02-19

Family

ID=59164588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710004254.4A Active CN106874704B (en) 2017-01-04 2017-01-04 A kind of gene based on linear model is total to the sub- recognition methods of key regulatory in regulated and control network

Country Status (1)

Country Link
CN (1) CN106874704B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391962B (en) * 2017-09-05 2020-12-29 武汉古奥基因科技有限公司 Method for analyzing regulation and control relation of genes or loci to diseases based on multiple groups of theories
CN107679367B (en) * 2017-09-20 2020-02-21 湖南大学 Method and system for identifying co-regulation network function module based on network node association degree
CN109308934A (en) * 2018-08-20 2019-02-05 唐山照澜海洋科技有限公司 A kind of gene regulatory network construction method based on integration characteristic importance and chicken group's algorithm
CN111304200B (en) * 2020-02-11 2022-04-15 山东大学 CeRNA (cellular ribonucleic acid) regulation and control network for regulating and controlling osteointegration around rat implant with hyperlipidemia and application of network
CN111613268B (en) * 2020-05-27 2023-02-24 中山大学 Method for determining gene expression regulation mechanism based on single cell transcriptome data
CN112102876B (en) * 2020-09-27 2023-03-28 西安交通大学 Method for automatically modeling gene circuit and transcription regulation and control relation
CN115798600A (en) * 2023-02-03 2023-03-14 北京灵迅医药科技有限公司 Genome data analysis method, apparatus, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719195A (en) * 2009-12-03 2010-06-02 上海大学 Inference method of stepwise regression gene regulatory network
CN101719194A (en) * 2009-12-03 2010-06-02 上海大学 Artificial gene regulatory network simulation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10159262B4 (en) * 2001-12-03 2007-12-13 Siemens Ag Identify pharmaceutical targets

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
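Transcription factor and miRNA; Ying Lin et al.; Scientific Reports; 20151021; full text
Integrative analysis of gene expression and copy number variation to identify cancer driver genes and regulator miRNAs; Xu Yan et al.; Progress in Modern Biomedicine; 20160215; pp. 940-943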

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant