CN111223523B - Gene regulation network construction method and system based on multi-time-lag causal entropy - Google Patents

Publication number
CN111223523B
CN111223523B CN202010013036.9A CN202010013036A CN111223523B CN 111223523 B CN111223523 B CN 111223523B CN 202010013036 A CN202010013036 A CN 202010013036A CN 111223523 B CN111223523 B CN 111223523B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010013036.9A
Other languages
Chinese (zh)
Other versions
CN111223523A (en)
Inventor
李敏
冯浩楠
郑瑞清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-01-06
Filing date
2020-01-06
Publication date
2023-10-03
Application filed by Central South University
Priority to CN202010013036.9A
Publication of CN111223523A
Application granted
Publication of CN111223523B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method and a system for constructing a gene regulation network based on multi-time-lag causal entropy. The input time-series gene expression data are divided into time windows with different time lags. For gene expression data of t time slices, gene expression matrices are constructed under the t-τ time windows. For each pair of genes, the multi-time-lag transfer entropy between the target gene under time window t and the candidate regulatory gene under the preceding time windows t-τ to t-1 is calculated, giving a multi-time-lag gene correlation matrix whose elements represent the probability that an edge exists between two genes. The edges of this matrix are clustered into two classes by k-means, and the low-probability cluster is filtered out. For each remaining edge, the multi-time-lag conditional transfer entropy (causal entropy) given conditional genes is calculated, and indirectly regulated edges whose maximum causal entropy is smaller than a threshold are filtered out to obtain the final network structure. The application effectively improves the accuracy of inference.

Description

Gene regulation network construction method and system based on multi-time-lag causal entropy
Technical Field
The application relates to the field of bioinformatics, in particular to a construction method of a complex biological network.
Background
In organisms, cells are the fundamental unit of all tissue structure and function. All cells in a living body carry the same DNA information, yet cells of different tissues and organs differ, because a complex gene regulatory mechanism exists in the cells that makes the expression of different cells highly specific. The mechanisms controlling gene expression are collectively called gene expression regulation. Different organisms also differ in the way they regulate genes. In prokaryotes, environmental stimuli have a critical impact on gene expression: through contact with the external environment, prokaryotes adapt to different conditions by turning the expression of some genes on and off. Gene regulation in eukaryotes is more complex than in prokaryotes. It is mainly affected by hormones and the cell growth cycle, and the influence of environmental factors is greatly reduced. Specific characteristics of gene regulation include: (1) complex structure; (2) variable regulation modes: there is both one-to-one regulation between genes and one-to-many or many-to-one multi-factor regulation; (3) type diversity: many types of molecules may be involved, such as DNA, mRNA, proteins, and small molecules; (4) dynamically changing regulatory relationships. Therefore, the gene regulation mechanism is one of the important foundations for studying the growth and development rules and the basic morphological structure of animals and plants.
Constructing gene regulation networks by computational means from different types of gene expression data has become one of the important challenges of systems biology. Common computational methods for constructing gene regulation networks draw on several theoretical fields, including correlation analysis, Bayesian networks, feature selection, and Boolean networks. These methods determine gene regulatory relationships by analyzing the correlation between genes or by modeling the relationship between their expression levels, and finally construct a regulation network.
The construction of gene regulation networks by correlation analysis is one of the most intuitive methods. Researchers analyze the correlation between genes by means of the Pearson correlation coefficient, mutual information, and similar measures. Among the most popular approaches is the construction of gene regulation networks based on mutual information. Compared with the Pearson correlation coefficient, mutual information can reveal nonlinear regulatory relationships among genes. Margolin et al. propose the ARACNE algorithm, which uses the Data Processing Inequality (DPI): if, in any triplet $(X_1, X_2, X_3)$, genes $X_1$ and $X_3$ interact only through $X_2$, then $I(X_1; X_3) \le \min[I(X_1; X_2), I(X_2; X_3)]$. ARACNE calculates the mutual information $I$ for every pair of genes and applies a threshold $I_0$; Margolin et al. consider that a regulatory relationship exists between a gene pair only if $I \ge I_0$. Meyer et al. further propose the MRNET algorithm based on ARACNE. The algorithm follows a maximum relevance/minimum redundancy (MRMR) strategy and uses a greedy algorithm to pick out a node $X_j$ that is relevant to the target node $Y$ and has the maximum score difference in information with respect to the already selected node set $S$. For a pair of nodes $\{X_i, X_j\}$, the larger MRMR value is used as their weight. In further studies, Luo et al. consider that a gene expression regulation relationship generally involves more than 3 genes, i.e., for a target gene T there are generally two or more regulatory genes. Based on this hypothesis, they propose a new algorithm, MI3, which scores the target gene T and two regulatory genes R1 and R2 by a correlation part and a coordination part in order to discover higher-order interactions. Zhang et al. combine conditional mutual information with the path consistency algorithm (PCA) and propose the network construction algorithm CMI-PCA. CMI-PCA uses multivariate conditional mutual information to test for and filter indirect regulatory relationships. Zhao et al. propose a new mutual information estimation method, PMI, to address the problem that CMI-PCA underestimates regulatory relationships in its calculations. The key to mutual-information-based gene regulation network construction methods is filtering indirect regulatory relationships.
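The thresholding and DPI steps described for ARACNE can be sketched as follows. This is only a minimal illustration and not the ARACNE implementation: the function name `dpi_prune`, the list-of-lists input, the rule of removing the weakest edge in each fully connected triplet without any tolerance, and the toy mutual information values are all assumptions made for this example.

```python
from itertools import combinations


def dpi_prune(mi, threshold):
    """Return the undirected edges kept after thresholding and DPI pruning.

    mi is a symmetric list-of-lists matrix of pairwise mutual information.
    """
    n = len(mi)
    # Step 1: keep only gene pairs whose mutual information reaches I0.
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] >= threshold}
    # Step 2: in every fully connected triplet, mark the weakest edge as indirect.
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        triplet = [(i, j), (i, k), (j, k)]
        if all(edge in edges for edge in triplet):
            weakest = min(triplet, key=lambda e: mi[e[0]][e[1]])
            to_remove.add(weakest)
    return edges - to_remove


# Toy matrix (hypothetical values): gene 0 and gene 2 are linked only via gene 1.
mi = [
    [0.0, 0.9, 0.3],
    [0.9, 0.0, 0.8],
    [0.3, 0.8, 0.0],
]
kept = dpi_prune(mi, threshold=0.1)
print(sorted(kept))
```

In the toy matrix the pair (0, 2) has the smallest mutual information in the triplet, so it is the edge treated as indirect.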
Bayesian networks are another common method of gene regulation network construction. Bayesian networks quantify the attributes of directed biological networks by combining methods from graph theory and probability theory. The difficulty of applying Bayesian networks to gene regulation networks lies mainly in two parts: structure learning and parameter learning. Werhli and Husmeier integrate gene expression data with prior knowledge from multiple sources. By constructing an energy function $E(G)$ combined with a Gibbs distribution for learning the Bayesian network structure, they improve the accuracy of the learned structure. They also use the Markov chain Monte Carlo (MCMC) method to estimate the hyperparameters associated with the different sources of prior knowledge. Qin et al. add Ontology Fingerprint to the prior knowledge to assess similarity between genes and infer cell-type-specific signaling networks. In their algorithm, a Bayesian network is established on a given normalized signaling network, a heuristic search algorithm adds and deletes edges according to Ontology Fingerprint similarity, and the degree of fit between a candidate model and the observed data is calculated as the selection index. During Bayesian network structure learning, Qin et al. adopt BIC as the selection criterion and use a Monte Carlo EM algorithm to infer the hidden states of nodes in the network and then estimate the candidate model parameters. Hill et al., inspired by the idea of a minimal number of upstream regulatory factors, limit the in-degree of genes in the network to $d_{max} = 4$, thereby effectively reducing the uncertainty of the network. Hill's method is reported to be highly effective in breast cancer cell lines, and it constructs a network with an AUC of up to 0.82 on the yeast complex network given prior knowledge of the network structure.
Li et al. propose MMHO-DBN, a dynamic Bayesian network algorithm that combines a high-order temporal model with Max-Min hill-climbing heuristic search. MMHO-DBN improves on local search by using Dynamic Max-Min Parent (DMMP) to obtain a set of highly probable parent nodes, thereby effectively reducing the space of candidate network structures.
Many gene regulation network construction methods have been proposed, but they are limited by the complexity of gene regulation, and their accuracy still leaves considerable room for improvement. The main open problems are: (1) how to design an effective algorithm to filter indirect regulatory relationships among genes; (2) how to incorporate other biological information to improve the accuracy of network construction.
Disclosure of Invention
The technical problem to be solved by the application is to provide, in view of the defects of the prior art, a gene regulation network construction method based on multi-time-lag causal entropy that improves the accuracy of network construction.
In order to solve the technical problem, the application adopts the following technical scheme: a method for constructing a gene regulation network based on multi-time-lag causal entropy, comprising the following steps:
1) Dividing the input time-series gene expression data into different time windows according to the time lag τ;
2) For the windowed gene expression data of t time slices, respectively constructing the time-series gene expression matrices under the t-τ time windows, namely the time-series gene expression matrices from t-τ to t-1;
3) For each gene in the time-series gene expression matrices under the t-τ time windows, taking the expression profile under window t for the target gene and the expression profiles under windows t-τ to t-1 for the regulatory factor, and calculating the multi-time-lag transfer entropy between genes to obtain a gene correlation matrix;
4) For the fully connected network of the gene correlation matrix, clustering the edges into two classes and filtering out the class of edges with low probability values; calculating, for each remaining edge, the multi-time-lag causal entropy under different conditional genes; and filtering out indirectly regulated edges whose maximum causal entropy is lower than a threshold θ to obtain the final gene regulation network.
In step 1), using multiple time lags allows regulatory relationships to be identified more accurately. The data are divided into different time windows $G^{\tau}$ according to the time lag τ. Each element of $G^{\tau}$ is the expression value of gene N in the time-window expression matrix $G^{\tau}$ under time window T of sample M; T indicates which moving time window the gene expression vector belongs to; N is the index of the gene and ranges over the number of genes; M is the index of the sample cell and ranges over the number of samples.
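The windowing in steps 1) and 2) can be sketched as below. This is a minimal sketch, not the patented implementation: it assumes the expression data are stored as a NumPy array with layout (genes, samples, time points), and the function name `lagged_windows` and its return layout are hypothetical choices for this example.

```python
import numpy as np


def lagged_windows(expr, tau):
    """Split a (genes, samples, time) array into target and lagged views.

    Returns
      target: shape (genes, L), expression at time t for t = tau .. T-1
      lagged: shape (genes, L, tau), expression at t-1, t-2, ..., t-tau
    where L = samples * (T - tau).
    """
    n_genes, n_samples, n_time = expr.shape
    if not 1 <= tau < n_time:
        raise ValueError("tau must satisfy 1 <= tau < number of time points")
    # Target profile: the window at time t.
    target = expr[:, :, tau:].reshape(n_genes, -1)
    # Regulator profiles: one shifted copy per lag k = 1 .. tau.
    lags = [
        expr[:, :, tau - k : n_time - k].reshape(n_genes, -1)
        for k in range(1, tau + 1)
    ]
    lagged = np.stack(lags, axis=-1)
    return target, lagged


# Toy data (hypothetical): 2 genes, 3 samples, 5 time points.
expr = np.arange(30, dtype=float).reshape(2, 3, 5)
target, lagged = lagged_windows(expr, tau=2)
print(target.shape, lagged.shape)
```

Stacking the samples along one axis gives M(T-τ) observations per gene, which helps when the number of time slices is small.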
In step 3), in order to identify regulatory relationships under multiple time lags more accurately, the transfer entropy is generalized to multiple time lags. The multi-time-lag transfer entropy $T_{X \to Y}$ between genes is calculated as:
$$T_{X \to Y} = I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau}) = H(Y_t \mid Y_{t-1:t-\tau}) - H(Y_t \mid Y_{t-1:t-\tau}, X_{t-1:t-\tau});$$
where $I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau})$ represents the conditional mutual information of $Y_t$ and $X_{t-1:t-\tau}$ given the condition variable $Y_{t-1:t-\tau}$. For generic variables $X$, $Y$ and a condition variable $Z$, conditional mutual information has the standard form:
$$I(X; Y \mid Z) = \sum_{x, y, z} p_{X,Y,Z}(x, y, z) \log \frac{p_Z(z)\, p_{X,Y,Z}(x, y, z)}{p_{X,Z}(x, z)\, p_{Y,Z}(y, z)}$$
where $p_{X,Y,Z}(x, y, z)$ represents the joint probability density, $p_Z(z)$ represents the marginal probability density, and $p_{X,Z}(x, z)$ represents the marginal probability density between the variables x and z (likewise $p_{Y,Z}(y, z)$ for y and z); $X_{t-1:t-\tau}$ represents the expression values of gene X under the time windows from t-1 to t-τ; $H(\cdot \mid \cdot)$ represents conditional entropy:
$$H(Y \mid X) = -\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)}$$
where $p(x, y)$ represents the joint probability and $p(x)$ represents the marginal probability density.
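The identity $T_{X \to Y} = H(Y_t \mid Y_{t-1:t-\tau}) - H(Y_t \mid Y_{t-1:t-\tau}, X_{t-1:t-\tau})$ can be illustrated with a small sketch. Note the swap: the code below uses a discrete plug-in (frequency-count) estimator on symbol sequences rather than the kernel density estimate applied to continuous expression values in the application, and the function names and toy sequences are assumptions for this example.

```python
import math
from collections import Counter


def conditional_entropy(targets, conditions):
    """Plug-in estimate of H(target | condition) in bits from paired observations."""
    n = len(targets)
    joint = Counter(zip(targets, conditions))
    marginal = Counter(conditions)
    return -sum(
        count / n * math.log2(count / marginal[cond])
        for (_, cond), count in joint.items()
    )


def transfer_entropy(x, y, tau):
    """T_{X->Y} = H(Y_t | Y_past) - H(Y_t | Y_past, X_past) using tau lags."""
    steps = range(tau, len(y))
    y_now = [y[t] for t in steps]
    y_past = [tuple(y[t - tau:t]) for t in steps]
    xy_past = [(tuple(y[t - tau:t]), tuple(x[t - tau:t])) for t in steps]
    return conditional_entropy(y_now, y_past) - conditional_entropy(y_now, xy_past)


# Toy sequences (hypothetical): y copies x with a delay of one step.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y, tau=1)
te_yx = transfer_entropy(y, x, tau=1)
print(te_xy, te_yx)
```

Because the past of x fully determines the present of y in the toy data, the second conditional entropy vanishes and $T_{X \to Y}$ equals $H(Y_t \mid Y_{t-1})$.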
In step 4), in order to filter out indirect regulation, a path consistency algorithm based on causal entropy is used. The specific process of filtering out the indirectly regulated edges whose maximum causal entropy is lower than the threshold θ comprises the following steps:
1) For the gene correlation matrix $G_{\text{zero-order}}$, whose elements represent the probability that a regulatory relationship exists between genes, filter the low-value edges: cluster the edges into two clusters by k-means and filter out the edges in the cluster with low probability values;
2) Filter indirectly regulated edges based on the path consistency algorithm: for each edge (X, Y) in the filtered network, every neighboring node Z for which both the edge (Y, Z) and the edge (X, Z) exist is taken as a conditional gene, and the causal entropy under that conditional gene is calculated as $CE_{X \to Y \mid Z} = I(Y_t; X_{t-1:t-\tau} \mid Z_{t-1:t-\tau})$;
3) For the conditional gene set $K \in \{K_1, K_2, K_3, \dots, K_n\}$ of genes X and Y, filter out the edges whose maximum causal entropy over the conditional genes, $\max_{Z \in K} \{CE_{X \to Y \mid Z}\}$, is less than the threshold θ.
The threshold of the present application may be set to θ = 0.03.
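The edge filtering in steps 2) and 3) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method as implemented by the applicant: the callable `causal_entropy` is a hypothetical stand-in for the estimate of $CE_{X \to Y \mid Z}$, neighbors are tested ignoring edge direction, edges without any conditional gene are kept, and all edges are evaluated against the same input edge set.

```python
def filter_indirect_edges(edges, causal_entropy, theta=0.03):
    """Drop edges whose maximum conditional causal entropy is below theta.

    edges: set of directed (regulator, target) pairs kept after clustering.
    causal_entropy: callable (x, y, z) -> CE_{X->Y|Z}, supplied by the caller.
    """
    def connected(a, b):
        # Assumption: adjacency is tested without regard to direction.
        return (a, b) in edges or (b, a) in edges

    nodes = {gene for edge in edges for gene in edge}
    kept = set()
    for x, y in edges:
        # Conditional genes: neighbors Z linked to both X and Y.
        cond = [
            z for z in nodes
            if z not in (x, y) and connected(x, z) and connected(y, z)
        ]
        # Assumption: an edge with no conditional gene is kept unchanged.
        if not cond or max(causal_entropy(x, y, z) for z in cond) >= theta:
            kept.add((x, y))
    return kept


# Toy network (hypothetical values): A -> C is explained away by B.
edges = {("A", "B"), ("B", "C"), ("A", "C")}
table = {("A", "B", "C"): 0.2, ("B", "C", "A"): 0.3, ("A", "C", "B"): 0.001}
kept = filter_indirect_edges(edges, lambda x, y, z: table[(x, y, z)])
print(sorted(kept))
```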
The application also provides a gene regulation network construction system based on multi-time-lag causal entropy, comprising:
an input unit for dividing the input time-series gene expression data into different time windows according to the time lag τ;
a gene expression matrix construction unit for respectively constructing, for the windowed gene expression data of t time slices, the time-series gene expression matrices under the t-τ time windows, namely the gene expression matrices from t-τ to t-1;
a gene correlation matrix construction unit for taking, for each gene in the time-series gene expression matrices under the t-τ time windows, the expression profile under window t for the target gene and the expression profiles under windows t-τ to t-1 for the regulatory factor, and calculating the multi-time-lag transfer entropy between genes to obtain a gene correlation matrix;
and a clustering unit for clustering the edges of the fully connected network of the gene correlation matrix into two classes, filtering out the class of edges with low probability values, calculating for each remaining edge the multi-time-lag causal entropy under different conditional genes, and filtering out indirectly regulated edges whose maximum causal entropy is lower than the threshold θ to obtain the final gene regulation network.
Compared with the prior art, the application has the following beneficial effects: the method is suited to real time-series gene expression data with few time slices, can calculate the regulatory relationships of genes across multiple time slices, and filters indirectly regulated edges through conditional transfer entropy, thereby effectively improving the accuracy of network construction.
Drawings
FIG. 1 is a flow chart of the NIMCE of the application;
FIG. 2 compares NIMCE with the GENIE3, Jump3 and fastBMA methods based on PR curves and the area under them (AUPR);
FIG. 3 compares NIMCE with the GENIE3, Jump3 and fastBMA methods based on Recall and Precision.
Detailed Description
1. Construction of time window Gene expression matrices
Read in the time-series gene expression data file, where G denotes the gene expression matrix and $G^{\tau}$ denotes the expression matrix under a moving time window. Each element of $G^{\tau}$ is the expression value of gene N in the time-window expression matrix $G^{\tau}$ under time window T of sample M; T indicates which moving time window the gene expression vector belongs to; N is the index of the gene (ranging over the number of genes) and M is the index of the sample cell (ranging over the number of samples).
2. Construction of a Gene correlation matrix
For each pair of genes, the target gene takes the expression profile under window t, the regulatory factor takes the expression profiles under windows t-τ to t-1, and the multi-time-lag transfer entropy between the regulatory factor and the target gene is calculated:
$$T_{X \to Y} = I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau}) = H(Y_t \mid Y_{t-1:t-\tau}) - H(Y_t \mid Y_{t-1:t-\tau}, X_{t-1:t-\tau})$$
where $I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau})$ represents the conditional mutual information of $Y_t$ and $X_{t-1:t-\tau}$ given the condition variable $Y_{t-1:t-\tau}$. For generic variables $X$, $Y$ and a condition variable $Z$, conditional mutual information has the standard form:
$$I(X; Y \mid Z) = \sum_{x, y, z} p_{X,Y,Z}(x, y, z) \log \frac{p_Z(z)\, p_{X,Y,Z}(x, y, z)}{p_{X,Z}(x, z)\, p_{Y,Z}(y, z)}$$
where $p_{X,Y,Z}(x, y, z)$ represents the joint probability density, $p_Z(z)$ represents the marginal probability density, and $p_{X,Z}(x, z)$ represents the marginal probability density between the variables x and z.
$X_{t-1:t-\tau}$ represents the expression values of gene x under the time windows from t-1 to t-τ, gene y is represented similarly, and $H(\cdot \mid \cdot)$ represents conditional entropy:
$$H(Y \mid X) = -\sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)}$$
where $p(x, y)$ represents the joint probability and $p(x)$ represents the marginal probability density.
In NIMCE, to estimate the probability distributions in the transfer entropy effectively, the probability densities of the continuous variables are computed using Kernel Density Estimation (KDE).
Here $X_p$ denotes $X_{t-1:t-\tau}$, $Y_p$ denotes $Y_{t-1:t-\tau}$, $L = M(T - \tau)$ denotes the number of samples under the time-shifted matrix, and $f_h(x)$ is the kernel density function, which in its standard form is:
$$f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
where h is the bandwidth, K is a kernel function, and $x_1, \dots, x_n$ are the observed samples.
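A kernel density estimate of the kind referred to above can be sketched in a few lines. This is a minimal one-dimensional sketch: the Gaussian kernel, the function name `gaussian_kde_1d`, and the sample values are assumptions for illustration, since the text does not state which kernel or bandwidth rule is used.

```python
import numpy as np


def gaussian_kde_1d(samples, x, h):
    """Evaluate f_h(x) = 1/(n*h) * sum_i K((x - x_i) / h) with a Gaussian K."""
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Scaled distance between every evaluation point and every sample.
    u = (x[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * h)


# Hypothetical expression values of one gene and two evaluation points.
samples = [-1.0, 0.0, 0.5, 2.0]
density = gaussian_kde_1d(samples, [0.0, 1.0], h=0.5)
print(density)
```

A smaller bandwidth h follows the samples more closely, while a larger h gives a smoother estimate; the joint densities needed for transfer entropy would require a multivariate extension of this sketch.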
3. Filtering indirectly regulated edges
For the gene correlation matrix $G_{\text{zero-order}}$, whose elements represent the probability that a regulatory relationship exists between genes, the edges are clustered into two clusters by k-means, and the edges in the cluster with low probability values are filtered out;
Indirectly regulated edges are then filtered based on the path consistency algorithm: for each edge (X, Y) in the filtered network, every neighboring node Z for which both the edge (Y, Z) and the edge (X, Z) exist is taken as a conditional gene, and the causal entropy under that conditional gene is calculated as $CE_{X \to Y \mid Z} = I(Y_t; X_{t-1:t-\tau} \mid Z_{t-1:t-\tau})$;
For the conditional gene set $K \in \{K_1, K_2, K_3, \dots, K_n\}$ of genes X and Y, the edges whose maximum causal entropy over the conditional genes, $\max_{Z \in K} \{CE_{X \to Y \mid Z}\}$, is less than the threshold θ (a manually set hyperparameter, 0.03 by default) are filtered out.
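The two-cluster k-means split of the edge scores can be sketched as below. This is a hedged, plain-Python sketch of one-dimensional k-means with k = 2: the function name `split_edges_two_means`, the initialization of the two centroids at the minimum and maximum score, and the toy scores are assumptions, not details taken from the application.

```python
def split_edges_two_means(scores, n_iter=100):
    """1-D k-means with k=2 on edge scores; returns edges in the high-score cluster.

    scores: dict mapping (regulator, target) -> transfer entropy score.
    """
    values = list(scores.values())
    # Assumption: initialize the two centroids at the extreme scores.
    low, high = min(values), max(values)
    for _ in range(n_iter):
        high_group = [v for v in values if abs(v - high) < abs(v - low)]
        low_group = [v for v in values if abs(v - high) >= abs(v - low)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    # Keep only the edges assigned to the high-score centroid.
    return {edge for edge, v in scores.items() if abs(v - high) < abs(v - low)}


# Toy scores (hypothetical values).
scores = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.80,
    ("A", "C"): 0.05,
    ("C", "A"): 0.10,
    ("B", "A"): 0.02,
}
high_edges = split_edges_two_means(scores)
print(sorted(high_edges))
```

If every score is identical the two centroids coincide and this sketch returns an empty set, so a practical implementation would need to handle that case explicitly.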
4. Experiment verification
To verify the effectiveness of the method of the present application, we tested it on 5 simulated datasets generated by GeneNetWeaver (GNW) and compared it with the random-forest-based GENIE3, the tree-based Jump3, and the dynamic Bayesian fastBMA. The GeneNetWeaver datasets are time-series perturbation data for subnetworks extracted from the E. coli or S. cerevisiae gene regulatory networks, as used in the DREAM4 challenge. We generated 5 datasets using GNW, each containing 50 genes and 10 samples, with time-series expression data at a total of 21 time points.
To evaluate the consistency and accuracy of the inferred results, we used several metrics for comparison, namely AUPR, Recall and Precision. AUPR is the area under the PR curve; Recall is the ratio of the number of correctly predicted edges to the number of true directed edges; Precision is the ratio of the number of correctly predicted edges to the number of predicted edges. The results for AUPR and for Recall and Precision are shown in FIG. 2 and FIG. 3, respectively.
As can be seen from FIG. 2 and FIG. 3, our method outperforms the other methods across the different samples, whether measured by AUPR or by Recall and Precision. The proposed method NIMCE therefore shows good stability. The experiments also show that the running time of Jump3 grows exponentially as the network scale increases, making the computation practically infeasible, whereas NIMCE obtains results in a shorter time.
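The Recall and Precision definitions above, together with one common way of computing the area under the PR curve, can be sketched as follows. The step-wise summation used for `aupr` (often called average precision) is an assumption, since the text does not say how the area is integrated; the function names and toy edges are also hypothetical.

```python
def precision_recall(predicted, truth):
    """Precision = correct / predicted edges; Recall = correct / true edges."""
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


def aupr(scored_edges, truth):
    """Area under the PR curve by step-wise summation over the ranked edges.

    scored_edges: dict mapping (regulator, target) -> confidence score.
    """
    truth = set(truth)
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    hits, area = 0, 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in truth:
            hits += 1
            area += hits / rank  # precision at the rank where recall increases
    return area / len(truth) if truth else 0.0


# Toy example (hypothetical values).
truth = {("A", "B"), ("B", "C")}
scores = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.4, ("C", "A"): 0.1}
p, r = precision_recall({("A", "B"), ("A", "C")}, truth)
area = aupr(scores, truth)
print(p, r, area)
```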

Claims (3)

1. A gene regulation network construction method based on multi-time-lag causal entropy, characterized by comprising the following steps:
1) dividing the input time-series gene expression data into different time windows according to the time lag τ;
2) for the windowed gene expression data of t time slices, respectively constructing the time-series gene expression matrices under the t-τ time windows, namely the time-series gene expression matrices from t-τ to t-1;
3) for each gene in the time-series gene expression matrices under the t-τ time windows, taking the expression profile under window t for the target gene and the expression profiles under windows t-τ to t-1 for the regulatory factor, and calculating the multi-time-lag transfer entropy between genes to obtain a gene correlation matrix;
4) for the fully connected network of the gene correlation matrix, clustering the edges into two classes, filtering out the class of edges with low probability values, calculating for each remaining edge the multi-time-lag causal entropy under different conditional genes, and filtering out indirectly regulated edges whose maximum causal entropy is lower than a threshold θ to obtain the final gene regulation network; the specific process of filtering out the indirectly regulated edges whose maximum causal entropy is lower than the threshold θ comprises:
for the gene correlation matrix $G_{\text{zero-order}}$, filtering the low-value edges by clustering the edges into two clusters by k-means and filtering out the edges in the cluster with low probability values; the elements of the gene correlation matrix represent the probability that a regulatory relationship exists between genes;
filtering indirectly regulated edges based on a path consistency algorithm: for each edge (X, Y) in the filtered network, every neighboring node Z for which both the edge (Y, Z) and the edge (X, Z) exist is taken as a conditional gene, and the causal entropy under that conditional gene is calculated as $CE_{X \to Y \mid Z} = I(Y_t; X_{t-1:t-\tau} \mid Z_{t-1:t-\tau})$;
for the conditional gene set $K \in \{K_1, K_2, K_3, \dots, K_n\}$ of genes X and Y, filtering out the edges whose maximum causal entropy over the conditional genes, $\max_{Z \in K} \{CE_{X \to Y \mid Z}\}$, is less than the threshold θ;
in step 1), the data are divided into different time windows $G^{\tau}$ according to the time lag τ, where each element of $G^{\tau}$ represents the expression value of gene N in the time-window expression matrix $G^{\tau}$ under time window T of sample M; n represents the total number of genes; m represents the total number of sample cells;
in step 3), the multi-time-lag transfer entropy $T_{X \to Y}$ between genes is calculated as:
$$T_{X \to Y} = I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau}) = H(Y_t \mid Y_{t-1:t-\tau}) - H(Y_t \mid Y_{t-1:t-\tau}, X_{t-1:t-\tau});$$
where $I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau})$ represents the conditional mutual information of $Y_t$ and $X_{t-1:t-\tau}$ given the condition variable $Y_{t-1:t-\tau}$, in which $p_{X,Y,Z}(x, y, z)$ represents the joint probability density, $p_Z(z)$ represents the marginal probability density, and $p_{X,Z}(x, z)$ represents the marginal probability density between the variables x and z;
$X_{t-1:t-\tau}$ represents the expression values of gene X under the time windows from t-1 to t-τ, and $H(\cdot \mid \cdot)$ represents the conditional entropy, in which $p(x, y)$ represents the joint probability and $p(x)$ represents the marginal probability density.
2. The method for constructing a gene regulation network based on multi-time-lag causal entropy according to claim 1, wherein the threshold θ is 0.03.
3. A gene regulation network construction system based on multi-time-lag causal entropy, characterized by comprising:
an input unit for dividing the input time-series gene expression data into different time windows according to the time lag τ;
a gene expression matrix construction unit for respectively constructing, for the windowed gene expression data of t time slices, the time-series gene expression matrices under the t-τ time windows, namely the gene expression matrices from t-τ to t-1;
a gene correlation matrix construction unit for taking, for each gene in the time-series gene expression matrices under the t-τ time windows, the expression profile under window t for the target gene and the expression profiles under windows t-τ to t-1 for the regulatory factor, and calculating the multi-time-lag transfer entropy between genes to obtain a gene correlation matrix;
a clustering unit for clustering the edges of the fully connected network of the gene correlation matrix into two classes, filtering out the class of edges with low probability values, calculating for each remaining edge the multi-time-lag causal entropy under different conditional genes, and filtering out indirectly regulated edges whose maximum causal entropy is lower than a threshold θ to obtain the final gene regulation network;
the specific process of filtering out the indirectly regulated edges whose maximum causal entropy is lower than the threshold θ comprises:
for the gene correlation matrix $G_{\text{zero-order}}$, filtering the low-value edges by clustering the edges into two clusters by k-means and filtering out the edges in the cluster with low probability values; the elements of the gene correlation matrix represent the probability that a regulatory relationship exists between genes;
filtering indirectly regulated edges based on a path consistency algorithm: for each edge (X, Y) in the filtered network, every neighboring node Z for which both the edge (Y, Z) and the edge (X, Z) exist is taken as a conditional gene, and the causal entropy under that conditional gene is calculated as $CE_{X \to Y \mid Z} = I(Y_t; X_{t-1:t-\tau} \mid Z_{t-1:t-\tau})$;
for the conditional gene set $K \in \{K_1, K_2, K_3, \dots, K_n\}$ of genes X and Y, filtering out the edges whose maximum causal entropy over the conditional genes, $\max_{Z \in K} \{CE_{X \to Y \mid Z}\}$, is less than the threshold θ;
the data are divided into different time windows $G^{\tau}$ according to the time lag τ, where each element of $G^{\tau}$ represents the expression value of gene N in the time-window expression matrix $G^{\tau}$ under time window T of sample M; n represents the total number of genes; m represents the total number of sample cells;
the multi-time-lag transfer entropy $T_{X \to Y}$ between genes is calculated as:
$$T_{X \to Y} = I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau}) = H(Y_t \mid Y_{t-1:t-\tau}) - H(Y_t \mid Y_{t-1:t-\tau}, X_{t-1:t-\tau});$$
where $I(Y_t; X_{t-1:t-\tau} \mid Y_{t-1:t-\tau})$ represents the conditional mutual information of $Y_t$ and $X_{t-1:t-\tau}$ given the condition variable $Y_{t-1:t-\tau}$, in which $p_{X,Y,Z}(x, y, z)$ represents the joint probability density, $p_Z(z)$ represents the marginal probability density, and $p_{X,Z}(x, z)$ represents the marginal probability density between the variables x and z;
$X_{t-1:t-\tau}$ represents the expression values of gene X under the time windows from t-1 to t-τ, and $H(\cdot \mid \cdot)$ represents the conditional entropy, in which $p(x, y)$ represents the joint probability and $p(x)$ represents the marginal probability density.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010013036.9A CN111223523B (en) 2020-01-06 2020-01-06 Gene regulation network construction method and system based on multi-time-lag causal entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010013036.9A CN111223523B (en) 2020-01-06 2020-01-06 Gene regulation network construction method and system based on multi-time-lag causal entropy

Publications (2)

Publication Number Publication Date
CN111223523A (en) 2020-06-02
CN111223523B (en) 2023-10-03

Family

ID=70811155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010013036.9A Active CN111223523B (en) 2020-01-06 2020-01-06 Gene regulation network construction method and system based on multi-time-lag causal entropy

Country Status (1)

Country Link
CN (1) CN111223523B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610760B (en) * 2021-07-05 2024-03-12 Hohai University Cell image segmentation tracing method based on U-shaped residual neural network
CN113889180B (en) * 2021-09-30 2024-05-24 Shandong University Biomarker identification method and system based on dynamic network entropy
CN114925837B (en) * 2022-03-23 2024-04-16 Huazhong Agricultural University Gene regulation network construction method based on mixed entropy optimization mutual information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003058549A (en) * 2001-08-21 2003-02-28 Mamoru Kato Computer readable recording medium with program recorded thereon for estimating control relation between genes from gene expression quantity data and gene arrangement data
CN108491686A (en) * 2018-03-30 2018-09-04 中南大学 A kind of gene regulatory network construction method based on two-way XGBoost
KR20190054386A (en) * 2017-11-13 2019-05-22 한양대학교 산학협력단 Genome analysis method based on modularization

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050256652A1 (en) * 2004-05-16 2005-11-17 Sai-Ping Li Reconstruction of gene networks from time-series microarray data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Gene regulatory networks on transfer entropy (GRNTE): a novel approach to reconstruct gene regulatory interactions applied to a case study for the plant pathogen Phytophthora infestans; Juan Camilo Castro et al.; Theoretical Biology and Medical Modelling; full text *
On the interplay between entropy and robustness of gene regulatory networks; Bor-Sen Chen et al.; Entropy in Genetics and Computational Biology; full text *
Xiang Chen et al. A novel method of gene regulatory network structure inference from gene knock-out expression data. Tsinghua Science and Technology. 2019, Vol. 24, 446-455. *
Xiujun Zhang et al. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2011, Vol. 28, 98-104. *
Inferring gene regulatory networks with geometric-pattern dynamic Bayesian networks; Wang Kaijun; Zhang Junying; Zhao Feng; Zhang Hongyi; Journal of Xidian University (06); full text *
Wang Wenjie et al. Network construction and analysis methods for genomics data. Chinese Journal of Health Statistics. 2017, Vol. 34, 177-180+184. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant