CN106874704B - Linear-model-based method for identifying key regulators in a gene co-regulatory network - Google Patents
- Publication number
- CN106874704B (application CN201710004254.4A)
- Authority
- CN
- China
- Prior art keywords
- gene
- regulator
- value
- expression
- regulated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention discloses a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using gene expression profile data and gene regulatory relationship data, the method builds a linear model that predicts the expression of known disease genes and thereby identifies the key regulators in the co-regulatory network. The invention is simple to implement: it needs only gene expression profile data and gene regulatory relationships to identify key regulators relatively accurately. Experiments show that the identified regulators are biologically meaningful, which gives the method theoretical significance and practical value for research on disease mechanisms.
Description
Technical field
The invention belongs to the field of computational biology and relates to a linear-model-based method for identifying key regulators in a gene co-regulatory network.
Background
In the post-genomic era, understanding the functions of genes, non-coding RNAs, proteins and other related biomolecules, and revealing the mechanisms by which biological processes are carried out, has become one of the most important research goals of computational systems biology and bioinformatics. Within this effort, the study of gene regulation is a very important topic: understanding the regulatory mechanisms of gene expression plays an important role in understanding the mechanisms of biological processes and diseases. In eukaryotes there are two important classes of regulatory factors, transcription factors (TFs) and microRNAs (miRNAs), which regulate the expression of target genes at the transcriptional and post-transcriptional levels, respectively. A transcription factor is a protein with a specific function that initiates the transcription of a gene by binding to the gene's promoter region. miRNAs are a class of gene regulatory elements discovered in recent years: endogenous non-coding RNAs with regulatory functions found in eukaryotes, about 20-25 nucleotides long. Transcription factors and miRNAs play important roles in the regulation of gene expression, and this regulation pervades a wide range of biological activities and disease processes. Building on this, studies have found extensive interaction and cooperative regulation between transcription factors and miRNAs, which together form a complex co-regulatory network. The co-regulatory network contains regulation of miRNAs by transcription factors, regulation of target genes by transcription factors, and regulation of transcription factors and target genes by miRNAs. These regulatory effects are reflected in every stage of cellular and molecular life processes and functions, so the co-regulatory network contains richer biological information than a single network. Effectively identifying the key regulators in a co-regulatory network is therefore important for the clinical treatment of disease and for drug design, and may provide a new approach to treating human diseases.
With the rapid development of high-throughput technologies, large amounts of omics data, including genomics, transcriptomics and proteomics data, have been produced, providing new opportunities for research on the functions of biomolecules. Earlier algorithms for identifying key nodes concentrated mainly on identifying key proteins in protein-protein interaction networks. Compared with protein-protein interaction networks, transcriptional regulatory networks are harder to study. First, reliable transcriptional regulatory network data are still not easy to obtain. Second, in the transcriptional regulatory networks that are available, the functional characteristics of the network give it topological properties quite different from those of protein interaction networks, and the directionality of regulation makes the topology more complex still. Identifying key regulators in a regulatory network is therefore more complicated than identifying key proteins. In recent years regulatory networks have attracted more and more research, and a variety of computational methods have been proposed for identifying key regulators in them. They fall mainly into the following classes: information-flow models (RWR), ranking algorithms (PageRanking), classifiers (SVM, regularized least-squares classification), Bayesian networks, and regression models. However, these methods have shortcomings to varying degrees: for example, they cannot handle large data sets, their time complexity is too high, or their accuracy needs improvement. In 2015, Alexandra et al. proposed the MIPRIP method, which uses a linear model to identify key regulators in a regulatory network; their experimental results showed that a linear-model-based method can effectively identify regulators of biological importance. However, that method considers only the simple relationship between transcription factors and genes, does not account for the interaction and cooperative regulation among regulators in a co-regulatory network, and leaves room for improvement in identification accuracy.
It is therefore necessary to design a linear-model-based method for identifying key regulators in a gene co-regulatory network.
Summary of the invention
The technical problem to be solved by the invention is to provide a linear-model-based method for identifying key regulators in a gene co-regulatory network. Using only gene expression profile data and gene regulatory relationships, the method can relatively accurately identify biologically meaningful key regulators in the gene co-regulatory network.
The technical solution of the invention is as follows:
A linear-model-based method for identifying key regulators in a gene co-regulatory network, comprising the following steps:
Step 1) Build the gene co-regulatory network:
Input gene expression profile data, gene regulatory relationships and protein-protein interaction (PPI) data, filter out the interaction pairs that contain a node without expression profile data, and establish the gene co-regulatory network (GCN). The GCN contains three kinds of nodes, regulator miRNAs (microRNAs), regulator TFs and genes, and three kinds of edges between nodes: miRNA-gene, TF-gene and gene-gene;
For any two nodes in the GCN, the edge weight is 1 if an interaction exists between them and 0 otherwise;
Step 2) For the known disease genes, compute the activity values of the regulator miRNAs, the regulator TFs and the neighboring genes;
The activity value is the influence value of a miRNA, TF or neighboring gene on a known disease gene;
Step 3) In the constructed GCN, build a linear model from the gene expression profile data and the activity values of the regulators and neighboring genes obtained in step 2), predict the expression of the known disease genes, and obtain their predicted expression values;
Step 4) Convert the linear model built in step 3) into an optimization problem by minimizing the difference between the predicted and true expression values of the known disease genes, solve the optimization problem by mixed-integer linear programming, and finally identify the key regulators in the gene co-regulatory network.
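Step 1 can be illustrated with a minimal Python sketch. It assumes the three interaction lists are given as pairs of node names and that gene-gene (PPI) edges are undirected; the function names, the toy node names and the undirected treatment are assumptions made for illustration, not details stated in the patent.

```python
def build_gcn(expressed, mirna_gene, tf_gene, ppi):
    """Keep only interaction pairs whose two nodes both have expression data.

    Returns a dict mapping (source, target) to the edge type.
    """
    edges = {}
    typed = (("miRNA-gene", mirna_gene), ("TF-gene", tf_gene), ("gene-gene", ppi))
    for kind, pairs in typed:
        for a, b in pairs:
            if a in expressed and b in expressed:
                edges[(a, b)] = kind
                if kind == "gene-gene":  # assumption: PPI edges are undirected
                    edges[(b, a)] = kind
    return edges


def edge_weight(edges, a, b):
    """Binary edge weight: 1 if an interaction exists, 0 otherwise."""
    return 1 if (a, b) in edges else 0


# Toy data: miR-9 and G3 have no expression profile, so their pairs are dropped.
expressed = {"miR-1", "TF1", "G1", "G2"}
gcn = build_gcn(
    expressed,
    mirna_gene=[("miR-1", "G1"), ("miR-9", "G2")],
    tf_gene=[("TF1", "G1")],
    ppi=[("G1", "G2"), ("G2", "G3")],
)
```

Storing only the retained edges and answering weight queries with a lookup keeps the 0/1 weights implicit, which avoids materializing a dense matrix for a network of this size.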
Further, the linear model built to predict the expression of a known disease gene is expressed as:
$$g'_{i,s} = \beta_0 + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}$$
where $i$ denotes a known disease gene, and $m$, $t$ and $g$ denote a regulator miRNA, a regulator TF and a neighboring gene of the known disease gene $i$, respectively;
$g'_{i,s}$ denotes the predicted expression value of the known disease gene $i$ in sample $s$; $\beta_0$ is the additive offset of the linear model; $M$, $T$ and $G$ denote the miRNA set, the TF set and the gene set, respectively; $\beta_m$, $\beta_t$ and $\beta_g$ denote the optimization parameters of $m$, $t$ and $g$, which are computed directly by the optimizer when the optimization problem of step 4) is solved;
$es_{m,i}$, $ts_{t,i}$ and $gs_{g,i}$ denote the edge weights between $m$, $t$, $g$ and $i$, respectively, each taking the value 0 or 1;
$act_{m,s}$, $act_{t,s}$ and $act_{g,s}$ denote the activity values of $m$, $t$ and $g$ in sample $s$, respectively;
the sample $s$ is the data of one observed individual with the known disease.
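As a small illustration of how the prediction is assembled from these quantities, the following Python sketch evaluates the model for one disease gene in one sample. The miRNA, TF and neighboring-gene terms are pooled into one dictionary for brevity; the function name and all numeric values are hypothetical.

```python
def predict_expression(beta0, betas, edge_w, activity):
    """Offset plus the sum of beta * edge weight * activity over all candidates.

    betas    : dict candidate -> coefficient (miRNAs, TFs and neighbors pooled)
    edge_w   : dict candidate -> 0/1 edge weight to the disease gene
    activity : dict candidate -> activity value in the sample
    """
    return beta0 + sum(
        betas[r] * edge_w.get(r, 0) * activity[r] for r in betas
    )


betas = {"miR-1": -0.5, "TF1": 2.0, "G2": 1.0}
edge_w = {"miR-1": 1, "TF1": 1}  # G2 has no edge to the disease gene: weight 0
activity = {"miR-1": 0.4, "TF1": 0.25, "G2": 0.9}
pred = predict_expression(1.0, betas, edge_w, activity)
```

Because the edge weights are 0 or 1, a candidate without an edge to the disease gene contributes nothing regardless of its coefficient or activity.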
Further, minimizing the difference between the predicted and true gene expression values converts the linear model into the optimization problem:
$$\min \sum_{i \in O} \sum_{s \in S} \left| g_{i,s} - g'_{i,s} \right|$$
where $g_{i,s}$ and $g'_{i,s}$ denote the true and predicted expression values of disease gene $i$ in sample $s$, respectively, and $O$ and $S$ denote the set of known disease genes and the full sample set of the disease, respectively;
The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while the optimization problems are solved is recorded, all regulators are ranked by selection count, and the top 50 regulators are taken as the final candidate regulators.
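The ranking step can be sketched in Python as follows, assuming that each solved model yields the list of regulators it selected. The function name and the toy selections are hypothetical; only the count-then-rank logic and the default of 50 come from the text.

```python
from collections import Counter


def rank_regulators(selected_per_model, top_n=50):
    """Count how often each regulator is selected and return the top_n."""
    counts = Counter()
    for selected in selected_per_model:
        counts.update(set(selected))  # count a regulator once per model
    return counts.most_common(top_n)


# Each inner list is the set of regulators chosen in one solved model.
runs = [["miR-1", "TF1"], ["miR-1"], ["miR-2", "miR-1"], ["TF1"]]
ranking = rank_regulators(runs, top_n=2)
print(ranking)  # [('miR-1', 3), ('TF1', 2)]
```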
Once the Gurobi optimizer is installed, the gurobi package only needs to be imported in R, after which the gurobi function can be called directly to solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is normally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem through minimizing the difference between the predicted and true expression values of the known disease genes. To obtain a series of representative models of different sizes, linear models are built under a constraint on the number of regulators of a gene: for each known disease gene, the number of regulators is set to 1 through k in turn to build the linear models.
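One common way to write the regulator-count constraint and an absolute-error objective as a mixed-integer linear program is sketched below for a single disease gene $i$. This is an assumed, textbook-style formulation rather than one given in the patent: the binary indicators $z_r$, the error variables $e_s$, the bound $B$ and the pooled index $r$ over all candidate miRNAs, TFs and neighboring genes (with $w_{r,i}$ standing for the corresponding 0/1 edge weight) are introduced here for illustration only.

```latex
\begin{aligned}
\min_{\beta,\, z,\, e} \quad & \sum_{s \in S} e_s \\
\text{s.t.} \quad
& e_s \ge g_{i,s} - \Bigl(\beta_0 + \sum_{r} \beta_r\, w_{r,i}\, act_{r,s}\Bigr), && s \in S, \\
& e_s \ge \Bigl(\beta_0 + \sum_{r} \beta_r\, w_{r,i}\, act_{r,s}\Bigr) - g_{i,s}, && s \in S, \\
& -B\, z_r \le \beta_r \le B\, z_r, && \text{for all } r, \\
& \sum_{r} z_r \le k, \qquad z_r \in \{0, 1\}.
\end{aligned}
```

Under this reading, a regulator counts as selected in a model when $z_r = 1$, and solving the problem for each regulator count from 1 to $k$ yields the series of models of different sizes.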
Further, the activity values of the regulator miRNAs and regulator TFs and of the neighboring genes are computed by the following two methods, respectively:
1) Computing the activity value of a regulator miRNA or regulator TF:
First step: compute the baseline expression value of every target gene of regulator $r$:
$$\bar{y}_{g_t} = \frac{1}{|S_r^0|} \sum_{s \in S_r^0} y_{g_t,s}, \qquad S_r^0 = \{\, s : e(r) \to 0 \,\}$$
where $r$ denotes a regulator, either a regulator miRNA or a regulator TF; $\bar{y}_{g_t}$ denotes the baseline expression value of target gene $g_t$ of regulator $r$, equal to the average expression value of $g_t$ over all samples in which the expression of $r$ tends to 0; $e(r) \to 0$ denotes that the expression of $r$ tends to 0;
the baseline expression value of a target gene is its expression value in the absence of regulatory influence;
Second step: compute the difference between the baseline expression value of the target gene and its true expression value under the influence of the regulator, i.e. the expression change value $\Delta y_{g_t,s}$ of the target gene:
$$\Delta y_{g_t,s} = y_{g_t,s} - \bar{y}_{g_t}$$
where $y_{g_t,s}$ denotes the true expression value of target gene $g_t$ of regulator $r$ in sample $s$, and $\Delta y_{g_t,s}$ denotes the expression change value of target gene $g_t$ of regulator $r$;
Third step: build a simple linear model from the expression change values of the target genes and solve for the activity value $act_{r,s}$ of the regulator, where $G'$ denotes the target gene set of regulator $r$ and the model uses the sum of the expression change values and the sum of the baseline expression values over that target gene set;
2) Computing the activity value of a neighboring gene, which is solved from the cumulative effect on expression of all genes acting on the neighboring gene, where $N$ denotes the number of genes in sample $s$, $gs_{g,i}$ denotes the edge weight between gene $g$ and gene $i$, $g_{i,s}$ denotes the expression value of gene $i$ in sample $s$, and the sample $s$ is the data of one observed individual with the known disease.
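The two activity calculations can be sketched in Python under explicit assumptions, because the text does not pin down every detail. Here "expression tends to 0" is approximated by a hypothetical cutoff `eps`; the regulator activity is assumed to be the summed expression change of the targets divided by their summed baseline; and the neighboring-gene activity is assumed to be the unnormalized sum of edge weight times expression. All three choices, and every name below, are illustrative guesses rather than the patent's stated formulas.

```python
def regulator_activity(expr, reg_expr, targets, eps=0.1):
    """Activity of one regulator in every sample (assumed ratio form).

    expr     : dict gene -> list of expression values, one per sample
    reg_expr : list of the regulator's own expression values, one per sample
    targets  : target genes of the regulator
    eps      : assumed cutoff for "expression tends to 0"
    """
    off = [s for s, e in enumerate(reg_expr) if abs(e) <= eps]
    if not off:
        raise ValueError("no sample in which the regulator is near zero")
    # Step 1: baseline expression of each target over the "off" samples.
    baseline = {g: sum(expr[g][s] for s in off) / len(off) for g in targets}
    base_sum = sum(baseline.values())
    # Steps 2 and 3: summed expression change divided by summed baseline.
    return [
        sum(expr[g][s] - baseline[g] for g in targets) / base_sum
        for s in range(len(reg_expr))
    ]


def neighbour_activity(gene, expr, edge_w, sample):
    """Sum of expression of the genes linked to `gene` (assumed form)."""
    return sum(
        edge_w.get((gene, other), 0) * values[sample]
        for other, values in expr.items()
    )


expr = {"G1": [2.0, 4.0, 2.0], "G2": [1.0, 3.0, 1.0]}
act = regulator_activity(expr, reg_expr=[0.0, 5.0, 0.05], targets=["G1", "G2"])
nb = neighbour_activity("G1", expr, {("G1", "G2"): 1}, sample=1)
```

In the toy data the regulator is near zero in the first and third samples, so those samples define the baseline and receive an activity of 0, while the second sample, where both targets rise above baseline, receives a positive activity.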
Further, the activity values of the regulators and neighboring genes obtained in step 2) are normalized before they are used to build the linear model in step 3).
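The text does not say which normalization is used. As one possible choice, the following Python sketch applies min-max scaling to the activity values of one regulator across samples; min-max scaling and the function name are assumptions made purely for illustration.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 3.0])
print(normalized)  # [0.0, 1.0, 0.5]
```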
Beneficial effects
The invention provides a linear-model-based method for identifying key regulators in a gene co-regulatory network (co-BOTLM). Using gene expression profile data and gene regulatory relationships, it builds a linear model that predicts the expression of known disease genes and thereby identifies the key regulators in the gene co-regulatory network.
Compared with existing linear-model-based methods for identifying key regulators, the co-BOTLM method of the invention has the following advantages:
1) It is applied to a co-regulatory network, which contains richer biological information than a single network, so the identified regulators may be of greater biological significance;
2) It adds protein-protein interaction data (PPI information), taking into account that the expression of a gene may be influenced by its neighboring genes;
3) It introduces a new way of computing the activity values of regulators and neighboring genes, which effectively improves the accuracy of cancer gene expression prediction. The invention is simple to implement and needs only gene expression profile data and gene regulatory relationships to identify key regulators in the gene co-regulatory network relatively accurately.
Experiments show that the co-BOTLM method can effectively identify key regulators in a gene co-regulatory network and that the identified key regulators are biologically meaningful. Its accuracy is also higher than that of the other method it was compared with. The experimental results, comparisons and analysis are given in the embodiment.
Brief description of the drawings
Fig. 1 is the flow chart of the co-BOTLM method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawing and a specific embodiment:
Embodiment 1:
I. The linear-model-based model for identifying key regulators in a gene co-regulatory network
The invention defines the key regulators in a gene co-regulatory network as follows: the regulators that, when gene expression profile data and gene regulatory relationships are used to build a linear model predicting the expression of known disease genes, are identified as strongly affecting disease gene expression in the co-regulatory network.
To describe the identification model clearly, the inventors define it as follows:
The proposed linear model for predicting the expression of a known disease gene has the form:
$$g'_{i,s} = \beta_0 + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s} + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s} + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}$$
The goal of the identification model is to identify the regulators that strongly affect disease gene expression in the co-regulatory network. A linear model built from gene expression profile data and gene regulatory relationships predicts the expression of known disease genes, and the key nodes in the gene co-regulatory network are identified from it.
The overall workflow of the linear-model-based method for identifying key regulators in a gene co-regulatory network is shown in Fig. 1.
Gene expression profile data, gene regulatory relationships and PPI data are input first. The co-BOTLM method can be divided into four sub-processes:
1) Build the gene co-regulatory network;
2) Since the expression of a gene may be affected by regulators and neighboring genes, compute, for the known disease genes, the activity values of the miRNAs, TFs and neighboring genes (that is, the influence values of the miRNAs, TFs and neighboring genes on the known disease genes);
3) In the resulting gene co-regulatory network, build a linear model from the gene expression profile data and predict the expression of the known disease genes;
4) Convert the linear model into an optimization problem by minimizing the difference between the predicted and true gene expression values, solve it by mixed-integer linear programming (MILP), and finally identify the key regulators in the gene co-regulatory network, which completes the identification process;
The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is selected by the optimizer while the optimization problems are solved is recorded, all regulators are ranked by selection count, and the top 50 regulators are taken as the final candidate regulators.
Once the Gurobi optimizer is installed, the gurobi package only needs to be imported in R, after which the gurobi function can be called directly to solve the optimization problem. The gurobi function takes three inputs: the optimization model, timeLimit and OutputFlag. timeLimit is normally set to 600 and OutputFlag is set to 0. The optimization model is obtained by converting the constructed linear model into an optimization problem through minimizing the difference between the predicted and true expression values of the known disease genes. To obtain a series of representative models of different sizes, linear models are built under a constraint on the number of regulators of a gene: for each known disease gene, the number of regulators is set to 1 through k in turn to build the linear models. In this example k is 5 (over many experiments, the results were best with k = 5).
II. Validation of the linear-model-based method for identifying key regulators in a gene co-regulatory network
To verify the effectiveness of co-BOTLM, the method was applied to an ovarian cancer data set. The experimental data comprise ovarian cancer sample data, gene regulatory relationships, PPI data and known ovarian-cancer-related disease genes. The ovarian cancer sample data were downloaded from the TCGA database and contain 385 samples. After filtering out genes whose absolute expression values are too small or whose expression does not differ significantly across samples, the final ovarian cancer expression profile data set contains 385 samples with 559 miRNAs and 12456 genes. The interaction data comprise miRNA-gene, TF-gene and PPI data, downloaded from the MicroCosm website, the ENCODE database and the BioGrid database, respectively. Mapping the ovarian cancer expression profile data set onto the interaction data yields a miRNA-TF gene co-regulatory network with three types of nodes, 12381 genes, 559 miRNAs and 75 TFs, and three types of interactions between nodes: 59660 gene-gene pairs, 241722 miRNA-gene pairs and 9877 TF-gene pairs. Of the known ovarian-cancer-related disease genes, 379 were downloaded from the DDOC database; after filtering out disease genes without expression profile data or without regulatory relationships, 123 remain.
Three-fold cross-validation was carried out in this example, and the co-BOTLM method was compared with the MIPRIP method proposed by Alexandra et al. in terms of prediction accuracy. The Pearson correlation coefficient (PCC) is used to measure the similarity between the disease gene expression data predicted by co-BOTLM and the true expression data: the larger the PCC, the higher the similarity, which indicates a more accurate linear model and hence more accurate experimental results. PCC values in this example were computed with the cor function of R. In addition, characteristic analysis and functional enrichment analysis were performed on the regulators identified by co-BOTLM.
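The evaluation protocol can be sketched in Python as follows. The Pearson correlation is written out directly (the text uses R's cor function), and the interleaved assignment of samples to folds is an assumption, since the text does not say how the three folds were formed. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def three_fold_indices(n_samples, n_folds=3):
    """Return (train, test) index lists, assigning samples to folds in turn."""
    folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]
    all_idx = set(range(n_samples))
    return [(sorted(all_idx - set(test)), test) for test in folds]


splits = three_fold_indices(7)
r_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

For each split, a model would be fitted on the training samples and the PCC computed between predicted and true expression on the held-out samples.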
1. Analysis of experimental results, verifying the effectiveness of the algorithm
Table 1: Top 20 regulators in the miRNA-TF gene co-regulatory network
No. | Identified key regulator | Number of target genes | Times selected by optimizer |
1 | hsa-mir-106a* | 377 | 50 |
2 | hsa-mir-586 | 508 | 43 |
3 | hsa-mir-423-5p | 496 | 38 |
4 | hsa-mir-515-3p | 512 | 34 |
5 | hsa-mir-181a-2* | 496 | 34 |
6 | hsa-mir-768-3p | 530 | 32 |
7 | hsa-mir-663 | 480 | 32 |
8 | hsa-mir-539 | 382 | 31 |
9 | hsa-mir-206 | 477 | 30 |
10 | hsa-mir-509-3p | 552 | 30 |
11 | hsa-mir-362-3p | 512 | 25 |
12 | hsa-mir-378* | 519 | 24 |
13 | hsa-mir-520c-3p | 566 | 24 |
14 | hsa-mir-33a | 523 | 24 |
15 | hsa-mir-29a* | 495 | 23 |
16 | hsa-mir-193a-3p | 496 | 23 |
17 | hsa-mir-601 | 484 | 23 |
18 | FOXA2 | 169 | 23 |
19 | hsa-mir-26b | 466 | 22 |
20 | hsa-mir-30b | 541 | 22 |
In this example, after three-fold cross-validation, the final average PCC value is 0.535, showing that the gene expression values predicted by the linear model of the invention are fairly similar to the true expression values. This indicates that the linear model built by co-BOTLM is fairly accurate and can effectively identify key regulators in the network. After the experimental runs, all regulators were ranked by the number of times the optimizer selected them, and the top 50 were taken as the candidate key regulators in this example. Table 1 above lists the top 20 regulators. Every regulator except FOXA2 regulates no fewer than 300 genes, many of which have been found to be associated with ovarian cancer. Because TF experimental data are scarce, FOXA2 has relatively few target genes. This indicates that the identified regulators interact with a large number of genes in the ovarian cancer gene co-regulatory network and may be related to the expression of many genes (including known ovarian cancer disease genes), so they are of key importance in this co-regulatory network.
2. Experimental comparison of co-BOTLM and MIPRIP, verifying the accuracy of the algorithm
Table 2: PCC values of the MIPRIP experimental results
Fold | k=1 | k=2 | k=3 | k=4 | k=5 |
1 | 0.3329907 | 0.4312150 | 0.4436449 | 0.4731776 | 0.4893458 |
2 | 0.3195237 | 0.4221495 | 0.4500000 | 0.4687850 | 0.4851402 |
3 | 0.3214019 | 0.4341121 | 0.4571028 | 0.4768224 | 0.4916822 |
Note: rows 1-3 are the three folds of the cross-validation experiment; columns 1-5 are the number of regulators k used to build the linear model.
Table 3: PCC values of the co-BOTLM experimental results
Fold | k=1 | k=2 | k=3 | k=4 | k=5 |
1 | 0.5018750 | 0.5709821 | 0.5940179 | 0.6112500 | 0.6227679 |
2 | 0.4858036 | 0.5575893 | 0.5869643 | 0.6025893 | 0.6164286 |
3 | 0.4956250 | 0.5518750 | 0.5691964 | 0.5918750 | 0.6059821 |
Both the MIPRIP method and the co-BOTLM method of the invention identify the key regulators of a specific disease on the basis of a linear model, but they differ in three respects: 1) MIPRIP is applied to a regulatory network, whereas co-BOTLM is applied to a co-regulatory network; because transcription factors and miRNAs interact and cooperate extensively, a co-regulatory network contains richer biological information than a single network; 2) among the factors that influence disease gene expression, co-BOTLM considers, in addition to transcription factors and miRNAs, the possible influence of neighboring genes; 3) MIPRIP and co-BOTLM compute the activity values of transcription factors and miRNAs differently. Because MIPRIP is applied to a regulatory network and does not consider the co-regulatory relationships in the network, transcription factors were treated as ordinary genes in the comparison experiment of this example. Tables 2 and 3 give the PCC values obtained by MIPRIP and co-BOTLM, respectively. The tables show clearly that co-BOTLM achieves higher PCC values: its average PCC is 0.571, against 0.433 for MIPRIP. The gene expression values predicted by co-BOTLM are therefore more similar to the true expression values, which indirectly indicates that co-BOTLM is more accurate and that the key regulators it identifies are more reliable.
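The two averages quoted above can be checked directly from the values in Tables 2 and 3. The short Python sketch below averages all 15 entries of each table (three folds by five values of k); the variable and function names are illustrative.

```python
# PCC values copied from Table 2 (MIPRIP) and Table 3 (co-BOTLM);
# rows are folds 1-3, columns are k = 1..5.
miprip = [
    [0.3329907, 0.4312150, 0.4436449, 0.4731776, 0.4893458],
    [0.3195237, 0.4221495, 0.4500000, 0.4687850, 0.4851402],
    [0.3214019, 0.4341121, 0.4571028, 0.4768224, 0.4916822],
]
co_botlm = [
    [0.5018750, 0.5709821, 0.5940179, 0.6112500, 0.6227679],
    [0.4858036, 0.5575893, 0.5869643, 0.6025893, 0.6164286],
    [0.4956250, 0.5518750, 0.5691964, 0.5918750, 0.6059821],
]


def mean_pcc(table):
    """Average over every fold and every k."""
    values = [v for row in table for v in row]
    return sum(values) / len(values)


print(round(mean_pcc(miprip), 3), round(mean_pcc(co_botlm), 3))  # 0.433 0.571
```

In both tables the PCC also rises with k within every fold, and co-BOTLM is higher than MIPRIP in every cell.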
3. Functional enrichment analysis of the experimental results, verifying the validity of the results
Table 4: GO enrichment analysis of the top 10 regulators
No.: regulator rank; Enriched GO terms: the top 3 GO terms ranked by P-value (smaller is better); GO count: number of GO terms with P-value < 0.05; P-value: a value < 0.05 indicates a high degree of enrichment.
Table 5: KEGG pathway enrichment analysis of the top 10 regulators
No.: regulator rank; Enriched KEGG pathways: the top 3 KEGG pathways ranked by P-value (smaller is better); KEGG count: number of KEGG pathways with P-value < 0.05; P-value: a value < 0.05 indicates a high degree of enrichment.
To verify that the key regulators identified by co-BOTLM are biologically meaningful, GO enrichment analysis and KEGG pathway enrichment analysis were performed on the identified key regulators in this example using the GOstats package for R. Tables 4 and 5 show the GO and KEGG pathway enrichment results for the top 10 regulators, respectively.
Table 4 shows that most of the top 10 regulators identified by co-BOTLM are enriched in more than 300 GO terms. The most frequently enriched GO terms include cellular component organization, cellular process, cell death and negative regulation of dendritic cell differentiation, indicating that most of the identified regulators take part in cell-related life processes. hsa-mir-515-3p and hsa-mir-768-3p are enriched in fewer than 100 GO terms, possibly because few of the target genes of these two miRNAs match genes in the GOstats library. Meanwhile, Jiang et al. showed in 2016 that down-regulation of hsa-mir-768-3p is associated with MEK/ERK-mediated enhancement of protein synthesis in melanoma cells, so hsa-mir-768-3p may have potential prognostic value in ovarian cancer. Similarly, Table 5 shows that most of the top 10 regulators are enriched in at least 5 KEGG pathways. The most frequently enriched pathways include Prostate cancer, pathways in cancer, signaling pathway and ErbB signaling pathway, indicating that the identified regulators take part in many cancer and signaling pathways and are closely related to cancer. In summary, the experimentally identified regulators take part in a large number of biological processes, especially those related to cellular activity and cancer, and are therefore biologically meaningful.
Claims (5)
1. A method for identifying key regulators in a gene co-regulatory network based on a linear model,
characterized by comprising the following steps:
Step 1) Construct the gene co-regulatory network:
Input gene expression profile data, gene regulatory relationship data and protein interaction data,
filter out the interaction pairs that contain a node without expression profile data, and establish the
gene co-regulatory network GCN. The GCN contains three kinds of nodes: regulator miRNA, regulator TF and
gene; the interaction edges between nodes are miRNA-gene, TF-gene and gene-gene;
For any two nodes in the gene co-regulatory network GCN, the edge weight is 1 if an interaction exists
between them and 0 otherwise;
Step 2) Calculate the activity values of regulator miRNA, regulator TF and adjacent genes with respect to
the known disease genes;
Step 3) In the constructed gene co-regulatory network GCN, use the gene expression profile data and the
activity values of the regulators and adjacent genes obtained in step 2) to build a linear model that
predicts the expression of the known disease genes, obtaining the predicted expression values of the
known disease genes;
Step 4) By minimizing the difference between the predicted and true expression values of the known
disease genes, convert the linear model built in step 3) into an optimization problem, solve the
optimization problem using mixed integer linear programming, and finally identify the key regulators in
the gene co-regulatory network.
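Step 1) of claim 1 can be illustrated with a minimal sketch. It assumes the expression data is reduced to a set of node names that have a profile and that the interactions arrive as (source, target) pairs; the function names `build_gcn` and `edge_weight` and the example node names are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_gcn(expressed: Set[str], interactions: Iterable[Edge]) -> Dict[Edge, int]:
    """Keep only interaction pairs whose two nodes both have expression data.

    Every retained pair (miRNA-gene, TF-gene or gene-gene) gets weight 1.
    """
    weights: Dict[Edge, int] = {}
    for source, target in interactions:
        if source in expressed and target in expressed:
            weights[(source, target)] = 1
    return weights


def edge_weight(gcn: Dict[Edge, int], source: str, target: str) -> int:
    """Return 1 if the interaction edge exists in the network, otherwise 0."""
    return gcn.get((source, target), 0)


# Hypothetical example: E2F1 has no expression profile, so its edge is dropped.
example_gcn = build_gcn(
    {"miR-21", "TP53", "PTEN"},
    [("miR-21", "PTEN"), ("E2F1", "TP53"), ("TP53", "PTEN")],
)
```

Storing only the edges that exist keeps the 0/1 weight rule implicit: any pair absent from the dictionary has weight 0.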
2. The method for identifying key regulators in a gene co-regulatory network based on a linear model
according to claim 1, characterized in that the linear model built to predict the expression of the known
disease genes is expressed as follows:
wherein i denotes a known disease gene, and m, t, g denote respectively a regulator miRNA, a regulator TF
and an adjacent gene of the known disease gene i;
g'_{i,s} denotes the predicted expression value of the known disease gene i in sample s; β_0 is the
intercept (additional weight) of the linear model; M, T, G denote respectively the miRNA set, the TF set
and the gene set; β_m, β_t, β_g denote the optimization parameters of m, t, g, which the optimizer
computes directly when the optimization problem is solved in step 4);
es_{m,i}, ts_{t,i}, gs_{g,i} denote the interaction edge weights between m, t, g and i respectively, each
taking the value 0 or 1;
act_{m,s}, act_{t,s}, act_{g,s} denote the activity values of m, t, g in sample s respectively;
the sample s refers to the data of one observed individual of the known disease.
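The formula of claim 2 is not present in this text. Based only on the symbol definitions in the claim, one plausible form of the linear model is sketched below; the way the terms are combined is an assumption, not the patent's verbatim formula.

```latex
g'_{i,s} = \beta_0
  + \sum_{m \in M} \beta_m \, es_{m,i} \, act_{m,s}
  + \sum_{t \in T} \beta_t \, ts_{t,i} \, act_{t,s}
  + \sum_{g \in G} \beta_g \, gs_{g,i} \, act_{g,s}
```

Under this reading, a regulator or adjacent gene contributes to the prediction for disease gene i only when its edge weight to i is 1.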
3. The method for identifying key regulators in a gene co-regulatory network based on a linear model
according to claim 2, characterized in that converting the linear model into an optimization problem by
minimizing the difference between the predicted and true gene expression values is expressed as:
wherein g_{i,s} and g'_{i,s} denote respectively the true and the predicted expression value of disease
gene i in sample s, and O and S denote respectively the set of known disease genes and the full sample
set of the disease;
The optimization problem is solved with the Gurobi optimizer. The number of times each regulator is
selected by the optimizer while the optimization problem is being solved is recorded, all regulators are
ranked by selection count, and the top 50 regulators are taken as the final candidate regulators.
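The objective and the ranking step of claim 3 can be sketched as follows. Two assumptions are made here: the "difference" is taken as the sum over i in O and s in S of |g_{i,s} - g'_{i,s}|, a form that can be linearized for a mixed integer linear program, and the optimizer output is available as one list of selected regulators per solve. The solver call itself is omitted, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Expression = Dict[str, Dict[str, float]]  # expression[gene][sample]


def absolute_error(true_expr: Expression, pred_expr: Expression) -> float:
    """Sum of |true - predicted| over all disease genes and samples (assumed objective)."""
    return sum(
        abs(true_expr[gene][sample] - pred_expr[gene][sample])
        for gene in true_expr
        for sample in true_expr[gene]
    )


def rank_regulators(selections: Iterable[Iterable[str]], top_k: int = 50) -> List[str]:
    """Rank regulators by how often they were selected and keep the top_k."""
    counts: Counter = Counter()
    for chosen in selections:
        counts.update(set(chosen))  # count each regulator at most once per solve
    # Sort by descending count; break ties alphabetically for a stable result.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return ranked[:top_k]
```

For example, `rank_regulators([["miR-21", "E2F1"], ["miR-21"], ["E2F1", "MYC"], ["miR-21"]], top_k=2)` keeps the two most frequently selected names; the regulator names are illustrative only.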
4. The method for identifying key regulators in a gene co-regulatory network based on a linear model
according to any one of claims 1-3, characterized in that the activity values of regulator miRNA,
regulator TF and the adjacent genes are calculated by the following two methods respectively:
1) Calculating the activity values of regulator miRNA and regulator TF:
First step: calculate the benchmark expression value of every target gene of regulator r:
wherein r denotes a regulator, which is either a regulator miRNA or a regulator TF; the benchmark
expression value of the target gene g_t of regulator r is the mean expression value of gene g_t over all
samples in which the expression of regulator r tends to 0; e(r)->0 denotes that the expression of
regulator r tends to 0;
Second step: calculate the difference between the benchmark expression value of the target gene and its
true expression value under the influence of the regulator, i.e. the expression change value of the
target gene:
wherein the symbols denote respectively the true expression value of the target gene g_t of regulator r
in sample s and the expression change value of the target gene g_t of regulator r;
Third step: build a simple linear model from the expression change values of the target genes and solve
for the activity value act_{r,s} of the regulator:
wherein G' denotes the target gene set of regulator r, and the two sums denote respectively the total
expression change value and the total benchmark expression value of the target gene set of regulator r;
2) The activity value of an adjacent gene is solved from the cumulative effect of the adjacent gene on
the expression of all the genes it acts on, namely:
wherein N denotes the number of genes in sample s, gs_{g,i} denotes the interaction edge weight between
gene g and gene i in sample s, and g_{i,s} denotes the expression value of gene i in sample s; the
sample s refers to the data of one observed individual of the known disease.
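The formulas of claim 4 are not present in this text, so the following is only a sketch under stated assumptions: the benchmark value is a mean over samples where the regulator's expression is within a small threshold `eps` of 0; the change value is taken as true minus benchmark (the sign is a guess); the regulator activity is the summed change divided by the summed benchmark; and the adjacent-gene activity is the edge-weighted sum of the expression values of the genes it acts on. All function names and the `eps` parameter are hypothetical.

```python
from typing import Dict, List

Expression = Dict[str, Dict[str, float]]  # expression[node][sample]


def baseline_expression(expr: Expression, regulator: str, target: str,
                        eps: float = 1e-6) -> float:
    """Mean expression of the target over samples where the regulator is ~0."""
    low = [s for s, value in expr[regulator].items() if abs(value) <= eps]
    if not low:
        raise ValueError(f"no sample with near-zero expression of {regulator}")
    return sum(expr[target][s] for s in low) / len(low)


def regulator_activity(expr: Expression, regulator: str, targets: List[str],
                       sample: str, eps: float = 1e-6) -> float:
    """Assumed form: sum of expression changes divided by sum of baselines."""
    baselines = {t: baseline_expression(expr, regulator, t, eps) for t in targets}
    total_baseline = sum(baselines.values())
    if total_baseline == 0:
        raise ValueError("sum of baseline expression values is zero")
    total_change = sum(expr[t][sample] - baselines[t] for t in targets)
    return total_change / total_baseline


def neighbor_activity(expr: Expression, weights: Dict[str, Dict[str, int]],
                      gene: str, sample: str) -> float:
    """Assumed form: edge-weighted sum of the expression of the genes acted on."""
    return sum(w * expr[i][sample] for i, w in weights[gene].items())
```

With this convention a negative activity means the target genes are expressed below their benchmark in that sample, as would be expected for a repressing regulator such as a miRNA.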
5. The method for identifying key regulators in a gene co-regulatory network based on a linear model
according to claim 4, characterized in that the activity values of the regulators and adjacent genes
obtained in step 2) are normalized before being used to build the linear model in step 3).
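Claim 5 does not state which normalization is used. As one possible reading, the sketch below applies min-max scaling to the activity values of a single regulator or adjacent gene across samples; both the choice of min-max scaling and the function name are assumptions.

```python
from typing import Dict


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale activity values (keyed by sample) linearly into the range [0, 1]."""
    lowest, highest = min(values.values()), max(values.values())
    if highest == lowest:
        # All activities are identical; map them to 0 to avoid dividing by zero.
        return {sample: 0.0 for sample in values}
    span = highest - lowest
    return {sample: (value - lowest) / span for sample, value in values.items()}
```

Bringing miRNA, TF and adjacent-gene activities onto a common scale keeps any one group from dominating the fitted coefficients purely because of its numeric range.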
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710004254.4A CN106874704B (en) | 2017-01-04 | 2017-01-04 | Method for identifying key regulators in a gene co-regulatory network based on a linear model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106874704A CN106874704A (en) | 2017-06-20 |
CN106874704B true CN106874704B (en) | 2019-02-19 |
Family ID=59164588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710004254.4A Active CN106874704B (en) | 2017-01-04 | 2017-01-04 | Method for identifying key regulators in a gene co-regulatory network based on a linear model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874704B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391962B (en) * | 2017-09-05 | 2020-12-29 | 武汉古奥基因科技有限公司 | Method for analyzing regulation and control relation of genes or loci to diseases based on multiple groups of theories |
CN107679367B (en) * | 2017-09-20 | 2020-02-21 | 湖南大学 | Method and system for identifying co-regulation network function module based on network node association degree |
CN109308934A (en) * | 2018-08-20 | 2019-02-05 | 唐山照澜海洋科技有限公司 | A kind of gene regulatory network construction method based on integration characteristic importance and chicken group's algorithm |
CN111304200B (en) * | 2020-02-11 | 2022-04-15 | 山东大学 | CeRNA (cellular ribonucleic acid) regulation and control network for regulating and controlling osteointegration around rat implant with hyperlipidemia and application of network |
CN111613268B (en) * | 2020-05-27 | 2023-02-24 | 中山大学 | Method for determining gene expression regulation mechanism based on single cell transcriptome data |
CN112102876B (en) * | 2020-09-27 | 2023-03-28 | 西安交通大学 | Method for automatically modeling gene circuit and transcription regulation and control relation |
CN115798600A (en) * | 2023-02-03 | 2023-03-14 | 北京灵迅医药科技有限公司 | Genome data analysis method, apparatus, device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719195A (en) * | 2009-12-03 | 2010-06-02 | 上海大学 | Inference method of stepwise regression gene regulatory network |
CN101719194A (en) * | 2009-12-03 | 2010-06-02 | 上海大学 | Artificial gene regulatory network simulation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10159262B4 (en) * | 2001-12-03 | 2007-12-13 | Siemens Ag | Identify pharmaceutical targets |
Non-Patent Citations (2)
Title |
---|
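Transcription factor and miRNA; Ying Lin et al.; SCIENTIFIC REPORTS; 2015-10-21; full text |
Integrative analysis of gene expression and copy number variation to identify cancer driver genes and regulator miRNAs; Xu Yan et al.; Progress in Modern Biomedicine; 2016-02-15; pp. 940-943 |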
Also Published As
Publication number | Publication date |
---|---|
CN106874704A (en) | 2017-06-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||