CN106874704B

CN106874704B - Linear-model-based method for identifying key regulators in a gene co-regulatory network

Info

Publication number: CN106874704B
Application number: CN201710004254.4A
Authority: CN
Inventors: 王伟胜; 曾亚菲; 骆嘉伟; 刘智明; 蔡洁
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2017-01-04
Filing date: 2017-01-04
Publication date: 2019-02-19
Anticipated expiration: 2037-01-04
Also published as: CN106874704A

Abstract

The invention discloses a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using gene expression profile data and gene regulatory relationship data, the method builds a linear model that predicts the expression of known disease genes and thereby identifies the key regulators in the gene co-regulatory network. The invention is simple to implement: key regulators in the gene co-regulatory network can be identified relatively accurately from gene expression profile data and gene regulatory relationships alone. Experiments confirm that the identified regulators are biologically meaningful, which gives the method theoretical significance and practical value for research on disease mechanisms.

Description

Linear-model-based method for identifying key regulators in a gene co-regulatory network

Technical field

The invention belongs to the field of computational biology and relates to a linear-model-based method for identifying key regulators in a gene co-regulatory network.

Background art

In the post-genomic era, understanding the functions of genes, non-coding RNAs, proteins and other related biomolecules, and revealing the mechanisms that underlie biological processes, has become one of the most important research goals of computational systems biology and bioinformatics. Within this field, gene regulation is a very important topic: understanding the regulatory mechanisms of gene expression plays an important role in understanding the mechanisms of biological processes and diseases. In eukaryotes there are two important classes of regulatory factors, transcription factors (TFs) and microRNAs (miRNAs), which regulate the expression of target genes at the transcriptional and post-transcriptional levels, respectively. A transcription factor is a protein with a specific function that initiates the transcription of a gene by binding to the gene's promoter region. miRNAs are a class of gene regulatory elements discovered in recent years: endogenous non-coding RNAs with regulatory function found in eukaryotes, about 20-25 nucleotides in length. Transcription factors and miRNAs play important roles in the regulation of gene expression, and this regulation pervades a wide range of biological activities and disease processes. On this basis, studies have found extensive interaction and cooperative regulation between transcription factors and miRNAs, which together form a complex co-regulatory network. The co-regulatory network includes the regulation of miRNAs by transcription factors, the regulation of target genes by transcription factors, and the regulation of transcription factors and target genes by miRNAs. These regulatory effects reflect every stage of cellular molecular processes and the execution of function, so a co-regulatory network contains richer biological information than a single network. Efficiently identifying the key regulators in a co-regulatory network is therefore important for the clinical treatment of disease and for drug design, and may provide a new approach to treating human diseases.

With the rapid development of high-throughput technologies, large amounts of omics data (genomics, transcriptomics, proteomics and so on) have been produced, providing new opportunities for research on the functions of biomolecules. Earlier algorithms for identifying key nodes focused mainly on identifying key proteins in protein-protein interaction networks. Compared with protein-protein interaction networks, transcriptional regulatory networks are more difficult to study. First, reliable transcriptional regulatory network data are still not easy to obtain. Second, because of the functional characteristics of existing transcriptional regulatory networks, their topological properties differ considerably from those of protein interaction networks, and the directionality of regulation makes the topology more complex still. Identifying key regulators in a regulatory network is therefore more complex than identifying key proteins. In recent years research on regulatory networks has increased, and a variety of computational methods for identifying key regulators in regulatory networks have been proposed. They fall mainly into the following classes: information-flow models (RWR), ranking algorithms (PageRank), classifiers (SVM, regularized least-squares classification), Bayesian networks, and regression models. However, these methods all have shortcomings to some degree, such as being unable to handle large data sets, having excessive time complexity, or leaving room for improvement in accuracy. In 2015, Alexandra et al. proposed the MIPRIP method, which uses a linear model to identify key regulators in a regulatory network; their experimental results show that a linear-model-based method can effectively identify regulators of biological importance. However, that method considers only the simple relationship between transcription factors and genes. It does not take into account the interaction and cooperative regulation among regulators in a co-regulatory network, and its identification accuracy also leaves room for improvement.

Therefore, it is necessary to design a linear-model-based method for identifying key regulators in a gene co-regulatory network.

Summary of the invention

The technical problem to be solved by the invention is to provide a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using only gene expression profile data and gene regulatory relationships, the method can relatively accurately identify biologically meaningful key regulators in the gene co-regulatory network.

The technical solution of the invention is as follows:

A linear-model-based method for identifying key regulators in a gene co-regulatory network, comprising the following steps:

Step 1) Construct the gene co-regulatory network:

Input gene expression profile data, gene regulatory relationships and protein-protein interaction (PPI) data, and filter out every interaction pair that contains a node without expression profile data, to establish the gene co-regulatory network GCN. The GCN contains three kinds of nodes: miRNA (microRNA) regulators, TF regulators and genes. Interaction edges exist between nodes: miRNA-gene, TF-gene and gene-gene;

For any two nodes in the gene co-regulatory network GCN, the edge weight is 1 if an interaction exists between them and 0 otherwise;
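Step 1) can be illustrated with a minimal Python sketch. The function names `build_gcn` and `edge_weight`, the dictionary layout and the toy node names are assumptions made for illustration only, and treating gene-gene (PPI) edges as undirected is likewise an assumption; the text does not prescribe a data structure.

```python
def build_gcn(expression, mirna_gene, tf_gene, gene_gene):
    """Return the three filtered edge sets of a gene co-regulatory network.

    expression: dict mapping node name -> list of expression values.
    mirna_gene, tf_gene, gene_gene: iterables of (source, target) pairs.
    """
    profiled = set(expression)

    def keep(pairs):
        # Drop any pair in which a node has no expression profile.
        return {(u, v) for u, v in pairs if u in profiled and v in profiled}

    return {
        "miRNA-gene": keep(mirna_gene),
        "TF-gene": keep(tf_gene),
        "gene-gene": keep(gene_gene),
    }


def edge_weight(gcn, kind, u, v):
    """Edge weight is 1 if the interaction exists, otherwise 0."""
    edges = gcn[kind]
    if kind == "gene-gene":
        # Assumption: PPI edges are treated as undirected in this sketch.
        return 1 if (u, v) in edges or (v, u) in edges else 0
    return 1 if (u, v) in edges else 0


# Toy data (hypothetical names and values).
expression = {"miR-a": [0.1, 0.4], "TF1": [1.2, 0.9],
              "g1": [2.0, 2.5], "g2": [3.1, 2.8]}
gcn = build_gcn(
    expression,
    mirna_gene=[("miR-a", "g1"), ("miR-b", "g1")],  # miR-b has no profile
    tf_gene=[("TF1", "g2")],
    gene_gene=[("g1", "g2"), ("g2", "g3")],         # g3 has no profile
)
```

Storing edges as sets keeps the 0/1 weight lookup constant-time, which matters for a network with hundreds of thousands of interaction pairs.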

Step 2) Compute the activity values of the miRNA regulators, the TF regulators and the neighboring genes with respect to each known disease gene;

The activity value is the influence value of a miRNA, TF or neighboring gene on the known disease gene;

Step 3) In the constructed gene co-regulatory network GCN, build a linear model from the gene expression profile data and the activity values of the regulators and neighboring genes obtained in step 2), and use it to predict the expression of the known disease genes, obtaining a predicted expression value for each known disease gene;

Step 4) Convert the linear model built in step 3) into an optimization problem by minimizing the difference between the predicted and true expression values of the known disease genes, solve the optimization problem by mixed integer linear programming, and finally identify the key regulators in the gene co-regulatory network.

Further, the linear model built to predict the expression of the known disease genes is expressed as follows:

where i denotes a known disease gene, and m, t and g denote a miRNA regulator, a TF regulator and a neighboring gene of the known disease gene i, respectively;

$g'_{i,s}$ denotes the predicted expression value of the known disease gene i in sample s; $\beta_0$ is the additive offset of the linear model; M, T and G denote the miRNA set, the TF set and the gene set, respectively; $\beta_m$, $\beta_t$ and $\beta_g$ are the optimization parameters for m, t and g, which the optimizer computes directly when the optimization problem in step 4) is solved;

$es_{m,i}$, $ts_{t,i}$ and $gs_{g,i}$ denote the weights of the interaction edges between m, t, g and i, respectively, each taking the value 0 or 1;

$act_{m,s}$, $act_{t,s}$ and $act_{g,s}$ denote the activity values of m, t and g in sample s, respectively;

The sample s refers to the data of one observed individual with the known disease.
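The formula itself is not reproduced in this text. A form consistent with the symbol definitions above, offered as a reconstruction rather than the verbatim expression, would be:

$$
g'_{i,s} = \beta_0 + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}
$$

Because each edge weight is 0 or 1, only regulators and neighboring genes that actually interact with disease gene i contribute to its predicted expression.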

Further, the optimization problem obtained from the linear model by minimizing the difference between the predicted and true gene expression values is expressed as:

where $g_{i,s}$ and $g'_{i,s}$ denote the true and predicted expression values of disease gene i in sample s, respectively, and O and S denote the set of known disease genes and the full sample set of the disease, respectively;
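The objective is likewise missing from this text. Assuming the difference is measured as an absolute error, which is the usual choice when the problem is to remain a mixed integer linear program, a form consistent with the definitions above would be:

$$
\min \sum_{i \in O} \sum_{s \in S} \left| g_{i,s} - g'_{i,s} \right|
$$

The absolute value is an assumption; the text only states that the difference between predicted and true expression values is minimized.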

The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while the optimization problem is being solved is recorded, all regulators are ranked by this selection count, and the top 50 regulators are taken as the final candidate regulators.
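The ranking step can be sketched in a few lines of Python. The function name `rank_regulators` and the input layout (one list of selected regulators per solved model) are assumptions for illustration; breaking ties alphabetically is also an assumption, since the text does not say how ties are ordered.

```python
from collections import Counter


def rank_regulators(selected_per_model, top_n=50):
    """Rank regulators by how many solved models selected them.

    selected_per_model: iterable of lists, one per solved model, each
    holding the regulators the optimizer selected (non-zero coefficient).
    """
    counts = Counter()
    for selected in selected_per_model:
        # Count each regulator at most once per model.
        counts.update(set(selected))
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy example with hypothetical regulator names.
runs = [["miR-a", "TF1"], ["miR-a"], ["miR-a", "miR-b"], ["TF1"]]
top_two = rank_regulators(runs, top_n=2)
```

With `top_n=50` this reproduces the cut-off described above; the "optimizer selection count" column of a ranking table corresponds to the second element of each returned pair.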

Once the Gurobi optimizer is installed, importing the gurobi package in R is all that is needed to call the gurobi function directly and solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is generally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem through minimizing the difference between the predicted and true expression values of the known disease genes. To obtain a representative series of models of different sizes, the linear models are built under a constraint on the number of regulators per gene: for each known disease gene, the number of regulators is set in turn to each value from 1 to k, and a linear model is built for each setting.
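One standard way to express such a limit on the number of regulators in a mixed integer linear program is sketched below. The binary indicators $z_r$, the coefficient bound $B$ and the error variables $e_{i,s}$ are symbols introduced here for illustration; the text does not give the constraint set explicitly.

$$
\begin{aligned}
\min \quad & \sum_{s \in S} e_{i,s} \\
\text{s.t.} \quad & e_{i,s} \ge g_{i,s} - g'_{i,s}, \qquad e_{i,s} \ge g'_{i,s} - g_{i,s} \qquad \forall s \in S, \\
& -B \, z_r \le \beta_r \le B \, z_r, \qquad z_r \in \{0, 1\} \qquad \forall r, \\
& \sum_{r} z_r \le k'
\end{aligned}
$$

Here $k'$ takes each value from 1 to k in turn. A regulator r counts as "selected" in a given model when $z_r = 1$, which is what the selection count used for ranking would tally under this formulation.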

Further, the activity values of the miRNA regulators, the TF regulators and the neighboring genes are computed by the following two methods:

1) Computing the activity values of the miRNA and TF regulators:

First step: compute the benchmark expression value of every target gene of regulator r:

where r denotes a regulator, either a miRNA regulator or a TF regulator; the benchmark expression value of a target gene $g_t$ of regulator r is the mean expression value of $g_t$ over all samples in which the expression of regulator r tends to 0; e(r) -> 0 denotes that the expression of regulator r tends to 0;

The benchmark expression value of a target gene is its expression value in the absence of any regulatory influence;

Second step: compute the difference between the benchmark expression value of the target gene and its true expression value under the regulator's influence, i.e. the expression change value of the target gene:

where $y_{g_t,s}$ denotes the true expression value of target gene $g_t$ of regulator r in sample s, and the result is the expression change value of target gene $g_t$ of regulator r;

Third step: build a simple linear model from the expression change values of the target genes and solve for the activity value $act_{r,s}$ of the regulator:

where G' denotes the target gene set of regulator r, and the two sums in the model are the sum of the expression change values and the sum of the benchmark expression values over the target gene set of regulator r, respectively;
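The three formulas are not reproduced in this text. The following is one reading that is consistent with the definitions above. The symbols $\bar{y}_{g_t}$, $\Delta y_{g_t,s}$ and $S_0$ are notation introduced here, and both the sign of the difference and the ratio form of the third step are assumptions.

$$
\bar{y}_{g_t} = \frac{1}{|S_0|} \sum_{s \in S_0} y_{g_t,s}, \qquad S_0 = \{\, s : e(r) \to 0 \,\}
$$

$$
\Delta y_{g_t,s} = y_{g_t,s} - \bar{y}_{g_t}
$$

$$
act_{r,s} = \frac{\sum_{g_t \in G'} \Delta y_{g_t,s}}{\sum_{g_t \in G'} \bar{y}_{g_t}}
$$

Under this reading the activity of a regulator in a sample is the relative shift of its whole target gene set away from the unregulated baseline.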

2) Computing the activity values of the neighboring genes, based on the cumulative effect of the expression of all genes that interact with the neighboring gene, namely:

where N denotes the number of genes in sample s, $gs_{g,i}$ denotes the weight of the interaction edge between gene g and gene i, and $g_{i,s}$ denotes the expression value of gene i in sample s; the sample s refers to the data of one observed individual with the known disease.
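The definitions above suggest a weighted sum of the form act_{g,s} = sum over i = 1..N of gs_{g,i} * g_{i,s}. The sketch below implements that reading in Python; the plain (unscaled) sum, the function name `neighbor_activity` and the use of `frozenset` pairs for undirected PPI edges are assumptions, since the formula itself is missing from this text.

```python
def neighbor_activity(gene, sample_expr, ppi_edges):
    """Cumulative expression of all genes interacting with `gene` in one sample.

    sample_expr: dict mapping gene name -> expression value in sample s.
    ppi_edges: set of frozenset({a, b}) pairs; edge weight is 1 if present.
    """
    total = 0.0
    for other, value in sample_expr.items():
        # gs_{g,i} is 1 when an interaction edge exists, otherwise 0.
        weight = 1 if frozenset((gene, other)) in ppi_edges else 0
        total += weight * value
    return total


# Toy sample with hypothetical genes and expression values.
sample = {"g1": 2.0, "g2": 3.0, "g3": 5.0}
ppi = {frozenset(("g1", "g2")), frozenset(("g1", "g3"))}
act_g1 = neighbor_activity("g1", sample, ppi)
act_g2 = neighbor_activity("g2", sample, ppi)
```

In the toy data g1 interacts with g2 and g3, so its activity accumulates both of their expression values, while g2 only sees g1.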

Further, the activity values of the regulators and neighboring genes obtained in step 2) are normalized before being used to build the linear model in step 3).
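The text does not say which normalization is used. As a minimal sketch, assuming min-max scaling to the interval [0, 1] (an assumption, as is the function name `min_max_normalize`), the step could look like this:

```python
def min_max_normalize(values):
    """Scale a list of activity values to [0, 1] (assumed normalization)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Activity values of one regulator across three samples (toy numbers).
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Putting miRNA, TF and neighboring-gene activities on a common scale keeps the coefficients of the linear model comparable across the three kinds of predictors; other schemes, such as z-scores, would serve the same purpose.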

Beneficial effects

The present invention provides a linear-model-based method (co-BOTLM) for identifying key regulators in a gene co-regulatory network. Using gene expression profile data and gene regulatory relationships, it identifies the key regulators in the gene co-regulatory network by building a linear model that predicts the expression of known disease genes.

Compared with existing linear-model-based methods for identifying key regulators, the co-BOTLM method of the invention has the following advantages:

1) It is applied to a co-regulatory network. A co-regulatory network contains richer biological information than a single network, so the identified regulators may have greater biological significance;

2) It incorporates protein-protein interaction data (PPI information), allowing for the possibility that a gene's expression is influenced by its neighboring genes;

3) It introduces a new method for computing the activity values of regulators and neighboring genes, which effectively improves the accuracy of cancer gene expression prediction. The invention is simple to implement: key regulators in the gene co-regulatory network can be identified relatively accurately from gene expression profile data and gene regulatory relationships alone.

Experiments confirm that the co-BOTLM method of the invention can effectively identify key regulators in a gene co-regulatory network and that the identified key regulators are biologically meaningful. Its accuracy is also higher than that of the other method it was compared with. The detailed experimental results and their analysis are given in the embodiment.

Brief description of the drawings

Fig. 1 is the flow chart of the co-BOTLM method of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawing and a specific embodiment:

Embodiment 1:

I. Linear-model-based model for identifying key regulators in a gene co-regulatory network

The invention defines a key regulator in a gene co-regulatory network as a regulator in the co-regulatory network that strongly affects disease gene expression, identified by building a linear model from gene expression profile data and gene regulatory relationships to predict the expression of known disease genes.

To describe the linear-model-based model for identifying key regulators in a gene co-regulatory network clearly, the inventors define the model as follows:

The proposed linear model for predicting the expression of known disease genes takes the following form:

The goal of the linear-model-based model for identifying key regulators in a gene co-regulatory network is to identify the regulators in the co-regulatory network that strongly affect disease gene expression. A linear model built from gene expression profile data and gene regulatory relationships predicts the expression of known disease genes, and the key nodes in the gene co-regulatory network are identified on that basis.

The overall flow of the linear-model-based method for identifying key regulators in a gene co-regulatory network is shown in Fig. 1. Gene expression profile data, gene regulatory relationships and PPI data are input first. The co-BOTLM method can then be divided into four sub-processes:

1) Construct the gene co-regulatory network;

2) Because the expression of a gene may be influenced by regulators and by neighboring genes, compute for each known disease gene the activity values of the miRNAs, TFs and neighboring genes (that is, the influence values of the miRNAs, TFs and neighboring genes on the known disease gene);

3) In the resulting gene co-regulatory network, build a linear model from the gene expression profile data and predict the expression of the known disease genes;

4) Convert the linear model into an optimization problem by minimizing the difference between the predicted and true gene expression values, solve it by mixed integer linear programming (MILP), and finally identify the key regulators in the gene co-regulatory network, which completes the identification process;
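Sub-process 2) can be sketched for a single regulator as follows. This is one interpretation under stated assumptions: the baseline of each target gene is its mean expression over the samples in which the regulator's expression is close to zero, the change is measured as true value minus baseline, and the activity is the ratio of summed changes to summed baselines. The function name `regulator_activity` and the threshold parameter `near_zero` are hypothetical.

```python
def regulator_activity(reg_expr, target_expr, near_zero=0.1):
    """Activity of one regulator in every sample (assumed ratio form).

    reg_expr: list of the regulator's expression values, one per sample.
    target_expr: dict mapping target gene -> list of expression values.
    near_zero: hypothetical cut-off for "regulator expression tends to 0".
    """
    baseline_samples = [s for s, e in enumerate(reg_expr) if abs(e) <= near_zero]
    if not baseline_samples:
        raise ValueError("no sample in which the regulator is near zero")

    # Step 1: benchmark (baseline) expression of each target gene.
    baseline = {
        gene: sum(values[s] for s in baseline_samples) / len(baseline_samples)
        for gene, values in target_expr.items()
    }
    baseline_sum = sum(baseline.values())
    if baseline_sum == 0:
        raise ValueError("baseline sum is zero; activity is undefined")

    activity = []
    for s in range(len(reg_expr)):
        # Step 2: expression change of each target gene in sample s.
        change_sum = sum(values[s] - baseline[gene]
                         for gene, values in target_expr.items())
        # Step 3: activity as summed change relative to summed baseline.
        activity.append(change_sum / baseline_sum)
    return activity


# Toy data: the regulator is silent in the first two samples.
reg = [0.0, 0.0, 2.0]
targets = {"a": [1.0, 3.0, 4.0], "b": [2.0, 2.0, 1.0]}
act = regulator_activity(reg, targets)
```

In the toy data both baselines equal 2.0, so the third sample, where target `a` rises by 2 and target `b` falls by 1, has a net activity of 1 / 4.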

Once the Gurobi optimizer is installed, importing the gurobi package in R is all that is needed to call the gurobi function directly and solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is generally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem through minimizing the difference between the predicted and true expression values of the known disease genes. To obtain a representative series of models of different sizes, the linear models are built under a constraint on the number of regulators per gene: for each known disease gene, the number of regulators is set in turn to each value from 1 to k, and a linear model is built for each setting. In this example k is 5 (repeated experiments showed that the results are best when k is 5).

II. Validation of the linear-model-based method for identifying key regulators in a gene co-regulatory network

To verify the effectiveness of co-BOTLM, the method was applied to an ovarian cancer data set. The experimental data comprise ovarian cancer sample data, gene regulatory relationships, PPI data and known ovarian-cancer-related disease genes. The ovarian cancer sample data were downloaded from the TCGA database and comprise 385 samples. After filtering out genes whose absolute expression values are too small or whose expression does not differ significantly across samples, the resulting ovarian cancer expression profile data set contains 385 samples with 559 miRNAs and 12456 genes. The interaction data comprise miRNA-gene, TF-gene and PPI data, downloaded from the MicroCosm website, the ENCODE database and the BioGrid database, respectively. Mapping the ovarian cancer expression profile data set onto the interaction data yields a miRNA-TF-gene co-regulatory network with three types of nodes (12381 genes, 559 miRNAs and 75 TFs) and the following interactions between nodes: 59660 gene-gene pairs, 241722 miRNA-gene pairs and 9877 TF-gene pairs. Of the known ovarian-cancer-related disease genes, 379 were downloaded from the DDOC database; after filtering out disease genes without expression profile data or without regulatory relationships, 123 remain.

Three-fold cross-validation experiments were carried out in this example, and co-BOTLM was compared with the MIPRIP method proposed by Alexandra et al. in terms of prediction accuracy. The Pearson correlation coefficient (PCC) is used to measure the similarity between the disease gene expression data predicted by co-BOTLM and the true expression data. The larger the PCC value, the higher the similarity, which in turn indicates that the linear model built by co-BOTLM is more accurate and therefore that the experimental results are more precise. In this example the PCC values are computed with the cor function of the R language. Characterization and functional enrichment analyses were also performed on the regulators identified by co-BOTLM.
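For readers who want to see what the similarity measure computes, the sample Pearson correlation coefficient can be written out in plain Python as below. This is an illustrative sketch of the standard formula, not the code used in the experiments, which relied on R's cor function; the function name `pearson` and the toy numbers are assumptions.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and standard deviations (the 1/n factors cancel out).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_o)


# Toy check: a perfectly linear prediction and a perfectly inverted one.
r_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A PCC near 1 means the predicted expression rises and falls with the true expression across samples, which is why a larger PCC is read as a more accurate linear model.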

1. Analysis of the experimental results to verify the effectiveness of the algorithm

Table 1: Top 20 regulators in the miRNA-TF-gene co-regulatory network

No.	Key regulator identified	Number of target genes	Times selected by optimizer
1	hsa-mir-106a*	377	50
2	hsa-mir-586	508	43
3	hsa-mir-423-5p	496	38
4	hsa-mir-515-3p	512	34
5	hsa-mir-181a-2*	496	34
6	hsa-mir-768-3p	530	32
7	hsa-mir-663	480	32
8	hsa-mir-539	382	31
9	hsa-mir-206	477	30
10	hsa-mir-509-3p	552	30
11	hsa-mir-362-3p	512	25
12	hsa-mir-378*	519	24
13	hsa-mir-520c-3p	566	24
14	hsa-mir-33a	523	24
15	hsa-mir-29a*	495	23
16	hsa-mir-193a-3p	496	23
17	hsa-mir-601	484	23
18	FOXA2	169	23
19	hsa-mir-26b	466	22
20	hsa-mir-30b	541	22

In this example, the average PCC value obtained after three-fold cross-validation is 0.535, which shows that the gene expression values predicted by the linear model of the invention are fairly similar to the true expression values. This indicates that the linear model built by co-BOTLM is reasonably accurate and can effectively identify the key regulators in the network. After the experiments were run, all regulators were ranked by the number of times the optimizer selected them, and the top 50 were taken as the candidate key regulators of this example. Table 1 above lists the top 20 regulators. Every regulator other than FOXA2 regulates no fewer than 300 genes, many of which have already been found to be related to ovarian cancer. FOXA2 has comparatively few target genes because very little TF experimental data is available. These results indicate that the identified regulators interact with a large number of genes in the ovarian cancer gene co-regulatory network and may be related to the expression of many genes (including known ovarian cancer disease genes), so they occupy a key position in this co-regulatory network.

2. Experimental comparison of co-BOTLM and MIPRIP to verify the accuracy of the algorithm

Table 2: PCC values of the MIPRIP experimental results

Fold	k=1	k=2	k=3	k=4	k=5
1	0.3329907	0.4312150	0.4436449	0.4731776	0.4893458
2	0.3195237	0.4221495	0.4500000	0.4687850	0.4851402
3	0.3214019	0.4341121	0.4571028	0.4768224	0.4916822

Note: rows 1-3 are the three folds of the cross-validation experiment; columns 1-5 are the number of regulators k used to build the linear model. The same layout applies to Table 3.

Table 3: PCC values of the co-BOTLM experimental results

Fold	k=1	k=2	k=3	k=4	k=5
1	0.5018750	0.5709821	0.5940179	0.6112500	0.6227679
2	0.4858036	0.5575893	0.5869643	0.6025893	0.6164286
3	0.4956250	0.5518750	0.5691964	0.5918750	0.6059821

Both the MIPRIP method and the co-BOTLM method of the invention use a linear model to identify the key regulators of a specific disease, but they differ in three respects: 1) MIPRIP is applied to a regulatory network, whereas co-BOTLM is applied to a co-regulatory network; because transcription factors and miRNAs interact and cooperate extensively, a co-regulatory network contains richer biological information than a single network; 2) among the factors that affect disease gene expression, co-BOTLM considers not only transcription factors and miRNAs but also the possible influence of neighboring genes; 3) MIPRIP and co-BOTLM compute the activity values of transcription factors and miRNAs differently. Because MIPRIP is applied to a regulatory network and does not consider the co-regulatory relationships in the network, transcription factors were treated as ordinary genes in the comparison experiments of this example. Tables 2 and 3 give the PCC values obtained with MIPRIP and co-BOTLM, respectively. The tables show clearly that co-BOTLM achieves higher PCC values: its average PCC is 0.571, against 0.433 for MIPRIP. The gene expression values predicted by co-BOTLM are thus more similar to the true expression values, which indirectly shows that co-BOTLM is more accurate and that the key regulators it identifies are more reliable.
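The two averages quoted above can be recomputed from the values in Tables 2 and 3. The short Python sketch below does so; the variable and function names are illustrative, and the numbers are copied from the tables.

```python
# PCC values from Table 2 (MIPRIP): rows are folds, columns are k = 1..5.
miprip = [
    [0.3329907, 0.4312150, 0.4436449, 0.4731776, 0.4893458],
    [0.3195237, 0.4221495, 0.4500000, 0.4687850, 0.4851402],
    [0.3214019, 0.4341121, 0.4571028, 0.4768224, 0.4916822],
]
# PCC values from Table 3 (co-BOTLM), same layout.
cobotlm = [
    [0.5018750, 0.5709821, 0.5940179, 0.6112500, 0.6227679],
    [0.4858036, 0.5575893, 0.5869643, 0.6025893, 0.6164286],
    [0.4956250, 0.5518750, 0.5691964, 0.5918750, 0.6059821],
]


def mean_pcc(table):
    """Average over every fold and every model size in a PCC table."""
    values = [value for row in table for value in row]
    return sum(values) / len(values)


miprip_mean = round(mean_pcc(miprip), 3)
cobotlm_mean = round(mean_pcc(cobotlm), 3)
```

Averaging over all 15 entries of each table gives the 0.433 and 0.571 reported in the text, and co-BOTLM is ahead in every individual cell as well.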

3. Functional enrichment analysis of the experimental results to verify the validity of the results

Table 4: GO enrichment analysis of the top 10 regulators

No.: rank of the regulator; Enriched GO terms: the top 3 GO terms ranked by P-value (smaller is better); GO count: the number of GO terms with P-value < 0.05; P-value: a value < 0.05 indicates a high degree of enrichment.

Table 5: KEGG pathway enrichment analysis of the top 10 regulators

No.: rank of the regulator; Enriched KEGG pathways: the top 3 KEGG pathways ranked by P-value (smaller is better); KEGG count: the number of KEGG pathways with P-value < 0.05; P-value: a value < 0.05 indicates a high degree of enrichment.

To verify that the key regulators identified by the co-BOTLM method of the invention are biologically meaningful, GO enrichment analysis and KEGG pathway enrichment analysis were carried out on the identified key regulators in this example using the GOstats package for R. Tables 4 and 5 show the GO and KEGG pathway enrichment results for the top 10 regulators, respectively.

Table 4 shows that most of the top 10 regulators identified by co-BOTLM are enriched in more than 300 GO terms. The most frequently enriched GO terms include cellular component organization, cellular process, cell death and negative regulation of dendritic cell differentiation, which indicates that the identified regulators take part in a large number of cell-related biological processes. hsa-mir-515-3p and hsa-mir-768-3p are enriched in fewer than 100 GO terms, possibly because few of the target genes of these two miRNAs could be matched to genes in the GOstats library. In addition, Jiang et al. showed in 2016 that down-regulation of hsa-mir-768-3p is associated with MEK/ERK-mediated enhancement of protein synthesis in melanoma cells, so hsa-mir-768-3p may have potential prognostic value in ovarian cancer. Similarly, Table 5 shows that most of the top 10 regulators are enriched in at least 5 KEGG pathways. The most frequently enriched pathways include Prostate cancer, pathways in cancer, signaling pathway and ErbB signaling pathway, which indicates that the identified regulators take part in a large number of cancer and signaling pathways and are closely related to cancer. In summary, the regulators identified in the experiments take part in a large number of biological processes, especially processes related to cellular activity and cancer, and are therefore biologically meaningful.

Claims

1. A linear-model-based method for identifying key regulators in a gene co-regulatory network, characterized by comprising the following steps:

Step 1) Construct the gene co-regulatory network:

Input gene expression profile data, gene regulatory relationships and protein-protein interaction data, and filter out every interaction pair that contains a node without expression profile data, to establish the gene co-regulatory network GCN. The GCN contains three kinds of nodes: miRNA regulators, TF regulators and genes. Interaction edges exist between nodes: miRNA-gene, TF-gene and gene-gene;

Step 2) Compute the activity values of the miRNA regulators, the TF regulators and the neighboring genes with respect to each known disease gene;

Step 3) In the constructed gene co-regulatory network GCN, build a linear model from the gene expression profile data and the activity values of the regulators and neighboring genes obtained in step 2), and use it to predict the expression of the known disease genes, obtaining a predicted expression value for each known disease gene;

Step 4) Convert the linear model built in step 3) into an optimization problem by minimizing the difference between the predicted and true expression values of the known disease genes, solve the optimization problem by mixed integer linear programming, and finally identify the key regulators in the gene co-regulatory network.

2. The linear-model-based method for identifying key regulators in a gene co-regulatory network according to claim 1, characterized in that the linear model built to predict the expression of the known disease genes is expressed as follows:

where i denotes a known disease gene, and m, t and g denote a miRNA regulator, a TF regulator and a neighboring gene of the known disease gene i, respectively;

$g'_{i,s}$ denotes the predicted expression value of the known disease gene i in sample s; $\beta_0$ is the additive offset of the linear model; M, T and G denote the miRNA set, the TF set and the gene set, respectively; $\beta_m$, $\beta_t$ and $\beta_g$ are the optimization parameters for m, t and g, which the optimizer computes directly when the optimization problem in step 4) is solved;

3. The linear-model-based method for identifying key regulators in a gene co-regulatory network according to claim 2, characterized in that the optimization problem obtained from the linear model by minimizing the difference between the predicted and true gene expression values is expressed as:

where $g_{i,s}$ and $g'_{i,s}$ denote the true and predicted expression values of disease gene i in sample s, respectively, and O and S denote the set of known disease genes and the full sample set of the disease, respectively;

The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while the optimization problem is being solved is recorded, all regulators are ranked by this selection count, and the top 50 regulators are taken as the final candidate regulators.

4. The linear-model-based method for identifying key regulators in a gene co-regulatory network according to any one of claims 1-3, characterized in that the activity values of the miRNA regulators, the TF regulators and the neighboring genes are computed by the following two methods:

1) Computing the activity values of the miRNA and TF regulators:

First step: compute the benchmark expression value of every target gene of regulator r:

where r denotes a regulator, either a miRNA regulator or a TF regulator; the benchmark expression value of a target gene $g_t$ of regulator r is the mean expression value of $g_t$ over all samples in which the expression of regulator r tends to 0; e(r) -> 0 denotes that the expression of regulator r tends to 0;

Second step: compute the difference between the benchmark expression value of the target gene and its true expression value under the regulator's influence, i.e. the expression change value of the target gene:

where $y_{g_t,s}$ denotes the true expression value of target gene $g_t$ of regulator r in sample s, and the result is the expression change value of target gene $g_t$ of regulator r;

Third step: build a simple linear model from the expression change values of the target genes and solve for the activity value $act_{r,s}$ of the regulator:

where G' denotes the target gene set of regulator r, and the two sums in the model are the sum of the expression change values and the sum of the benchmark expression values over the target gene set of regulator r, respectively;

2) Computing the activity values of the neighboring genes, based on the cumulative effect of the expression of all genes that interact with the neighboring gene, namely:

where N denotes the number of genes in sample s, $gs_{g,i}$ denotes the weight of the interaction edge between gene g and gene i, and $g_{i,s}$ denotes the expression value of gene i in sample s; the sample s refers to the data of one observed individual with the known disease.

5. The linear-model-based method for identifying key regulators in a gene co-regulatory network according to claim 4, characterized in that the activity values of the regulators and neighboring genes obtained in step 2) are normalized before being used to build the linear model in step 3).