CN110060730A - Gene module analysis method
- Publication number
- CN110060730A (application CN201910267199.7A)
- Authority
- CN
- China
- Prior art keywords
- gene
- phenotype
- disease
- candidate
- topology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Public Health (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Pathology (AREA)
- Primary Health Care (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a gene module analysis method, comprising: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes related to a disease phenotype; adding connections between genes and phenotypes in the gene-phenotype double-layer network; taking as candidate genes those genes in the double-layer network that have an edge to a disease gene in $s_0$ and are not themselves in $s_0$; computing the semantic similarity, topological similarity and phenotype relevance of each candidate gene and adding to $s_0$ the candidate gene for which their sum is largest; and, when the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to the disease phenotype, for biological pathway genes, and for genes differentially expressed between disease phenotype samples and normal samples, denoting the current generation as m and outputting the candidate genes added to $s_0$ over the first m-1 generations together with the known disease genes that have an edge to those candidate genes.
Description
Technical field
The invention relates to the technical field of data analysis, and in particular to a gene module analysis method.
Background art
At present, methods for detecting disease modules in the human interaction network fall into roughly three classes.
The first class detects disease modules from gene expression data. Gene expression profiles are a common and effective data resource for mining and studying disease modules. However, available data still contain many errors and missing values and are high-dimensional, so applying conventional clustering algorithms alone to mine disease module structure does not give good results.
The second class detects disease modules from a single network. The one-sidedness and errors of single-network data strongly affect gene module mining, so the resulting gene modules deviate substantially from the actual gene modules.
The third class detects disease modules from multiple networks. Algorithms that mine module structure across multiple networks focus on finding module structures that occur frequently across all the networks. Because the networks used by these methods are all built from gene expression profiles that are highly associated with a given disease phenotype, the more frequently a gene module occurs, the more strongly it is associated with that disease phenotype.
Summary of the invention
In view of the technical problems described in the background art, the invention proposes a gene module analysis method.
The gene module analysis method proposed by the invention comprises:
S1: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes related to disease phenotype x;
S2: adding connections between genes and phenotypes in the gene-phenotype double-layer network;
S3: taking as candidate genes those genes in the gene-phenotype double-layer network that have an edge to any disease gene in $s_0$ and are not themselves in $s_0$; iteratively computing the topological similarity, semantic similarity and phenotype relevance of each candidate gene; adding and deleting connections between genes according to topological similarity and semantic similarity; recomputing the measures and adding to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity and phenotype relevance; iterating until the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to disease phenotype x, for biological pathway genes, and for genes differentially expressed between disease phenotype x samples and normal samples; denoting the number of generations of candidate expansion at that point as m; and outputting, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ over the first m-1 generations together with the known disease genes that have an edge to those candidate genes.
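The candidate selection rule in step S3 (genes that share an edge with a gene in $s_0$ but are not in $s_0$) can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary-of-sets network representation, the function name `candidate_genes` and the toy gene names are all assumptions made for the example.

```python
def candidate_genes(adjacency, seed_genes):
    """Return genes that share an edge with a seed gene but are not seeds.

    adjacency:  dict mapping each gene to the set of its neighbouring genes
                (a hypothetical representation of the gene layer).
    seed_genes: the current known disease gene set s0.
    """
    seeds = set(seed_genes)
    candidates = set()
    for seed in seeds:
        # Collect every neighbour of every seed gene.
        candidates |= adjacency.get(seed, set())
    # Genes already in s0 are not candidates.
    return candidates - seeds


# Toy network in which g1 and g2 are the known disease genes.
network = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g4"},
    "g3": {"g1", "g5"},
    "g4": {"g2"},
    "g5": {"g3"},
}
print(sorted(candidate_genes(network, {"g1", "g2"})))  # ['g3', 'g4']
```

Gene g5 is not returned because it is two steps away from the seed set; it would only become a candidate after g3 has been added to $s_0$.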
Preferably, in step S1, the gene-phenotype double-layer network is specifically as follows:
the gene-phenotype double-layer network is constructed from a gene network A, a phenotype similarity network B and a gene-phenotype relation network C, and the adjacency matrix of the double-layer network can be expressed as $\begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$, where $C^T$ is the transpose of C.
Preferably, step S2 specifically comprises:
defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network by [formula missing from source], where $p(i)$ is the set of phenotypes that include disease gene i, $p(j)$ is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, $|p(i)|$ is the number of phenotypes in $p(i)$, $|p(j)|$ is the number of phenotypes in $p(j)$, and $|p(u)|$ is the number of known disease genes of phenotype u;
computing the normalized value of the phenotype relevance $w_{ij}$ [formula missing from source], where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene $j'$ in the gene-phenotype double-layer network;
computing the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ [formula missing from source], where [symbol missing from source] denotes the phenotype relevance of genes $g_i$ and $g_j$, and $|p_t|$ denotes the size of the known disease gene set of disease phenotype $p_t$;
determining whether this likelihood is greater than $\gamma$ and, if so, setting the element in row i, column t of the gene-phenotype relation network C to 1, thereby adding a connection between $g_i$ and $p_t$, where $\gamma$ is a preset adjustable parameter.
Preferably, step S3 specifically comprises:
S301: defining the current iteration number as $n$ and initializing $n = 1$;
S302: obtaining the genes in the gene-phenotype double-layer network that have an edge to any gene in the disease gene set $s_0$ related to disease phenotype x, taking them as candidate genes, denoted $c_0 = \{g_1, g_2, \dots, g_i, \dots, g_w\}$, and defining the topological similarity of candidate gene i in the n-th iteration as [formula missing from source], where $k$ is the number of edges between gene i and the other genes in the gene-phenotype double-layer network, $k_s$ is the number of edges between gene i and the genes in $s_0$, and $N$ is the number of genes in the network;
S303: computing the average topological similarity of all candidate genes in the n-th iteration, $ave\_Topology = \frac{1}{W}\sum_{i=1}^{W} Topology_i$, where $W$ is the number of candidate genes in the current generation;
S304: comparing the topological similarity of candidate gene i with the average $ave\_Topology$; if the topological similarity of candidate gene i is greater than the average, adding candidate gene i to gene set TB; otherwise adding candidate gene i to gene set TS;
S305: for each gene $g_i$ in TB, computing the average functional similarity between $g_i$ and the genes in $s_0$ to which it has an edge, $ave1\_similar_i = \frac{1}{l_1}\sum_{g_j \in s_0,\, A_{ij}=1} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 1$ indicates that $g_i$ and $g_j$ are connected by an edge, and $l_1$ is the number of genes in $s_0$ that have an edge to $g_i$;
S306: comparing $similar_{ij}$ with $ave1\_similar_i$; if $similar_{ij} > ave1\_similar_i$ and $A_{ij} = 0$, setting the element $A_{ij}$ in row i, column j of the gene network A to 1, thereby adding an edge between $g_i$ and $g_j$, where $g_i \in TB$ and $g_j \in s_0$;
S307: for each gene $g_i$ in TS, computing the average functional similarity between $g_i$ and the genes in $s_0$ to which it has no edge, $ave2\_similar_i = \frac{1}{l_2}\sum_{g_j \in s_0,\, A_{ij}=0} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 0$ indicates that $g_i$ and $g_j$ are not connected by an edge, and $l_2$ is the number of genes in $s_0$ that have no edge to $g_i$;
S308: comparing $similar_{ij}$ with $ave2\_similar_i$; if $similar_{ij} < ave2\_similar_i$ and $A_{ij} = 1$, setting the element $A_{ij}$ in row i, column j of the gene network A to 0, thereby deleting the edge between $g_i$ and $g_j$, where $g_i \in TS$ and $g_j \in s_0$;
S309: obtaining again the genes in the gene-phenotype double-layer network that have an edge to any gene in $s_0$ and taking them as candidate genes, the candidate gene set being denoted $c'_0$;
S310: computing the topological similarity of each candidate gene i in $c'_0$ and obtaining its normalized value $NTopology_i$ [formula missing from source], where $Topology_i$ is the topological similarity of candidate gene i, $Topology_{min}$ is the smallest topological similarity among all candidate genes, and $Topology_{max}$ is the largest topological similarity among all candidate genes;
S311: computing the semantic similarity of each candidate gene i in $c'_0$ [formula missing from source], where gene j is a gene in the disease gene set $s_0$ and $z$ is the number of genes in $s_0$;
S312: obtaining the normalized value $Nsimilar_i$ of the semantic similarity of each candidate gene i in $c'_0$ [formula missing from source], where $Similar_i$ is the semantic similarity of candidate gene i, $Similar_{min}$ is the smallest semantic similarity among all candidate genes, and $Similar_{max}$ is the largest semantic similarity among all candidate genes;
S313: where candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x, expressing the phenotype similarity between these e disease phenotypes and disease phenotype x, together with their disease gene sets and disease gene counts, as a set of triples $O = \{(s_1, c_1, t_1), \dots, (s_k, c_k, t_k), \dots, (s_e, c_e, t_e)\}$, and computing the degree of association $r_i$ between candidate gene i and disease phenotype x [formula missing from source];
S314: computing the score of each candidate gene i in $c'_0$, $score_i = Nsimilar_i - NTopology_i + r_i$;
S315: adding the candidate gene i with the highest $score_i$ to the disease gene set $s_0$, setting $n = n + 1$, and determining whether any candidate gene remains in the gene-phenotype double-layer network; if so, returning to step S302; otherwise, proceeding to step S316;
S316: when the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to disease phenotype x, for biological pathway genes, and for genes differentially expressed between disease phenotype x samples and normal samples, denoting the number of generations of candidate expansion as m, and outputting, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set over the first m-1 generations together with the known disease genes connected to those candidate genes.
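The stopping rule in step S316 depends on whether the expanded gene set is still significantly enriched for three reference gene sets at once. The source does not say which statistical test is used, so the sketch below assumes a one-sided hypergeometric test, which is a common choice for gene set enrichment, and an assumed significance level `alpha`. The function names and toy gene identifiers are hypothetical. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(module, annotated, background_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    module:          the expanded candidate gene set being tested.
    annotated:       a reference set (for example GO-annotated genes).
    background_size: total number of genes in the network.
    """
    module, annotated = set(module), set(annotated)
    overlap = len(module & annotated)
    n, big_k = len(module), len(annotated)
    total = comb(background_size, n)
    # P(X >= overlap) when n genes are drawn without replacement.
    tail = sum(
        comb(big_k, x) * comb(background_size - big_k, n - x)
        for x in range(overlap, min(n, big_k) + 1)
    )
    return tail / total


def still_enriched(module, reference_sets, background_size, alpha=0.05):
    """True only if the module is enriched for every reference set at once."""
    return all(
        enrichment_p_value(module, ref, background_size) < alpha
        for ref in reference_sets
    )


# Toy data: 20 background genes, a 4-gene module.
module = {"g1", "g2", "g3", "g4"}
go_genes = {"g1", "g2", "g3", "g4", "g5"}    # strong overlap with the module
pathway_genes = {"g10", "g11", "g12"}        # no overlap with the module

print(still_enriched(module, [go_genes], 20))                 # True
print(still_enriched(module, [go_genes, pathway_genes], 20))  # False
```

In an expansion loop, the first generation at which `still_enriched` returns false would play the role of m, and the genes added up to generation m-1 would be kept.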
The invention fuses the gene network and the phenotype network into a gene-phenotype double-layer network, computes the topological similarity and semantic similarity between candidate genes and disease genes as well as the relevance between candidate genes and diseases similar to the disease under study, and combines topological similarity, semantic similarity and relevance to detect disease modules. By effectively integrating multiple kinds of biological data and multiple attributes, it greatly improves the accuracy of the disease module detection algorithm. Further, existing protein interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relations are added by collaborative filtering; adjusting the gene-phenotype double-layer network in this way effectively reduces the influence of false-positive and false-negative data on disease module detection.
Brief description of the drawings
Fig. 1 is a flow diagram of the gene module analysis method proposed by the invention.
Detailed description of the embodiments
Referring to Fig. 1, the gene module analysis method proposed by the invention comprises:
Step S1: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes related to disease phenotype x.
In this step, the gene-phenotype double-layer network is constructed from a gene network A, a phenotype similarity network B and a gene-phenotype relation network C, and the adjacency matrix of the double-layer network can be expressed as $\begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$, where $C^T$ is the transpose of C.
In the specific scheme, $l_1$ pairs of protein interactions involving N proteins are collected from database $D_1$. The protein network can be abstracted as a graph $G_1 = (V_1, E_1)$ composed of a protein node set $V_1$ and an interaction edge set $E_1$, with node count $N = |V_1|$ and edge count $l_1 = |E_1|$; each edge in $E_1$ corresponds to a pair of nodes in $V_1$. The adjacency matrix $A = (A_{ij})_{N \times N}$ represents the protein network: $A_{ij} = 1$ means that node i and node j are connected by an edge, and $A_{ij} = 0$ means that they are not. A is a symmetric matrix composed of the elements 0 and 1. Because genes and proteins are in essentially one-to-one correspondence, the protein network is essentially equivalent to the gene network.
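As a small illustration of the paragraph above, the following sketch builds the symmetric 0/1 adjacency matrix A from a list of interaction pairs. It uses plain Python lists; the function name `adjacency_from_pairs` and the protein names are assumptions made for the example, and the source does not specify any particular data structure.

```python
def adjacency_from_pairs(nodes, pairs):
    """Build a symmetric 0/1 adjacency matrix from undirected node pairs."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for u, v in pairs:
        i, j = index[u], index[v]
        matrix[i][j] = 1
        matrix[j][i] = 1  # interactions are undirected, so A stays symmetric
    return matrix


proteins = ["p1", "p2", "p3"]
interactions = [("p1", "p2"), ("p2", "p3")]
A = adjacency_from_pairs(proteins, interactions)
for row in A:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```

The phenotype similarity matrix B could be built the same way by storing the similarity value in [0, 1] instead of 1.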
From database $D_2$, $l_2$ pairs of phenotype similarity relations involving M phenotypes are collected. The phenotype similarity network can be abstracted as a graph $G_2 = (V_2, E_2)$ composed of a phenotype node set $V_2$ and a phenotype similarity edge set $E_2$, with node count $M = |V_2|$ and edge count $l_2 = |E_2|$; each edge in $E_2$ corresponds to a pair of nodes in $V_2$. The adjacency matrix $B = (B_{ij})_{M \times M}$ represents the phenotype similarity network: $B_{ij} > 0$ means that node i and node j are connected by an edge, and $B_{ij} = 0$ means that they are not. B is a symmetric matrix whose elements lie in the interval [0, 1].
From database $D_3$, $l_3$ pairs of gene-phenotype relations involving $N'$ disease genes and M phenotypes are collected; the set of the $N'$ disease genes is denoted $V'_3$. The gene-phenotype relation network can be abstracted as a graph $G_3 = (V_3, E_3)$ composed of a gene and phenotype node set $V_3 = V_2 \cup V'_3$ and a gene-phenotype relation edge set $E_3$, with node count $N' + M = |V_3|$ and edge count $l_3 = |E_3|$; each edge in $E_3$ corresponds to a pair of nodes in $V_3$. The adjacency matrix $C = (C_{ij})_{N \times M}$ represents the gene-phenotype relation network: $C_{ij} = 1$ means that node i and node j are connected by an edge, and $C_{ij} = 0$ means that they are not. C is a matrix composed of the elements 0 and 1.
Using the gene network A, the phenotype similarity network B and the gene-phenotype relation network C, the gene-phenotype double-layer network is constructed; its adjacency matrix can be expressed as $\begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$, where $C^T$ is the transpose of C.
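The construction of the double-layer adjacency matrix can be sketched in a few lines. The sketch assumes the usual block layout for a two-layer network, with A and B on the diagonal and C and its transpose off the diagonal, which is consistent with the stated dimensions (A is N×N, B is M×M, C is N×M) and with the reference to $C^T$. The function name and the toy values are hypothetical.

```python
def double_layer_adjacency(A, B, C):
    """Assemble the (N+M) x (N+M) block matrix [[A, C], [C^T, B]]."""
    n, m = len(A), len(B)
    if len(C) != n or any(len(row) != m for row in C):
        raise ValueError("C must have one row per gene and one column per phenotype")
    # Upper block row: gene-gene edges followed by gene-phenotype edges.
    top = [list(A[i]) + list(C[i]) for i in range(n)]
    # Lower block row: transposed gene-phenotype edges followed by phenotype-phenotype edges.
    bottom = [[C[i][t] for i in range(n)] + list(B[t]) for t in range(m)]
    return top + bottom


A = [[0, 1], [1, 0]]          # two genes connected to each other
B = [[0, 0.6], [0.6, 0]]      # two phenotypes with similarity 0.6
C = [[1, 0], [0, 1]]          # gene 1 - phenotype 1, gene 2 - phenotype 2
H = double_layer_adjacency(A, B, C)
for row in H:
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 1]
# [1, 0, 0, 0.6]
# [0, 1, 0.6, 0]
```

Because A and B are symmetric and the off-diagonal blocks are transposes of each other, the assembled matrix is symmetric as well.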
Step S2: adding connections between genes and phenotypes in the gene-phenotype double-layer network.
This step specifically comprises:
defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network by [formula missing from source], where $p(i)$ is the set of phenotypes that include disease gene i, $p(j)$ is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, $|p(i)|$ is the number of phenotypes in $p(i)$, $|p(j)|$ is the number of phenotypes in $p(j)$, and $|p(u)|$ is the number of known disease genes of phenotype u;
computing the normalized value of the phenotype relevance $w_{ij}$ [formula missing from source], where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene $j'$ in the gene-phenotype double-layer network;
computing the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ [formula missing from source], where [symbol missing from source] denotes the phenotype relevance of genes $g_i$ and $g_j$, and $|p_t|$ denotes the size of the known disease gene set of disease phenotype $p_t$;
determining whether this likelihood is greater than $\gamma$ and, if so, setting the element in row i, column t of the gene-phenotype relation network C to 1, thereby adding a connection between $g_i$ and $p_t$, where $\gamma$ is a preset adjustable parameter.
In the specific scheme, existing protein interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relations are added by collaborative filtering. This adjusts the structure of the gene-phenotype double-layer network and thereby effectively reduces the influence of false-positive and false-negative data on disease module detection.
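The collaborative filtering step can be illustrated with the sketch below. The exact formulas are not reproduced in this text, so the code substitutes a standard item-based collaborative filtering weight that uses the same quantities the description names: shared phenotypes $p(i) \cap p(j)$, the counts $|p(i)|$ and $|p(j)|$, and a penalty based on $|p(u)|$. The specific weight (inverse log penalty divided by a square-root normalizer), the averaging used for the likelihood, and all names in the code are assumptions, not the patent's own definitions.

```python
import math


def phenotype_relevance(gene_phenos, pheno_genes):
    """Assumed weight: sum over shared phenotypes u of 1/log(1+|p(u)|),
    divided by sqrt(|p(i)| * |p(j)|)."""
    w = {g: {} for g in gene_phenos}
    for i in gene_phenos:
        for j in gene_phenos:
            shared = gene_phenos[i] & gene_phenos[j]
            if i == j or not shared:
                continue
            total = sum(1.0 / math.log(1 + len(pheno_genes[u])) for u in shared)
            w[i][j] = total / math.sqrt(len(gene_phenos[i]) * len(gene_phenos[j]))
    return w


def normalise_rows(w):
    """Divide each w[i][j] by the largest value in row i."""
    out = {}
    for i, row in w.items():
        top = max(row.values(), default=0.0)
        out[i] = {j: (v / top if top > 0 else 0.0) for j, v in row.items()}
    return out


def predict_links(gene_phenos, gamma):
    """Return (gene, phenotype) pairs whose assumed likelihood exceeds gamma."""
    pheno_genes = {}
    for gene, phenos in gene_phenos.items():
        for p in phenos:
            pheno_genes.setdefault(p, set()).add(gene)
    w = normalise_rows(phenotype_relevance(gene_phenos, pheno_genes))
    new_links = set()
    for gene in gene_phenos:
        for p, members in pheno_genes.items():
            if gene in members:
                continue  # relation already known
            # Assumed likelihood: mean normalised relevance to known genes of p.
            likelihood = sum(w[gene].get(j, 0.0) for j in members) / len(members)
            if likelihood > gamma:
                new_links.add((gene, p))
    return new_links


annotations = {"g1": {"pA"}, "g2": {"pA", "pB"}, "g3": {"pB"}}
print(sorted(predict_links(annotations, gamma=0.4)))
# [('g1', 'pB'), ('g3', 'pA')]
```

Each predicted pair corresponds to setting the matching element of C to 1. Raising $\gamma$ makes the step more conservative: with `gamma=0.6` the toy example adds no links.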
Step S3: taking as candidate genes those genes in the gene-phenotype double-layer network that have an edge to any disease gene in $s_0$ and are not themselves in $s_0$; iteratively computing the topological similarity, semantic similarity and phenotype relevance of each candidate gene; adding and deleting connections between genes according to topological similarity and semantic similarity; recomputing the measures and adding to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity and phenotype relevance; iterating until the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to disease phenotype x, for biological pathway genes, and for genes differentially expressed between disease phenotype x samples and normal samples; denoting the number of generations of candidate expansion at that point as m; and outputting, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ over the first m-1 generations together with the known disease genes that have an edge to those candidate genes.
This step specifically comprises:
S301: defining the current iteration number as $n$ and initializing $n = 1$;
S302: obtaining the genes in the gene-phenotype double-layer network that have an edge to any gene in the disease gene set $s_0$ related to disease phenotype x, taking them as candidate genes, denoted $c_0 = \{g_1, g_2, \dots, g_i, \dots, g_w\}$, and defining the topological similarity of candidate gene i in the n-th iteration as [formula missing from source], where $k$ is the number of edges between gene i and the other genes in the gene-phenotype double-layer network, $k_s$ is the number of edges between gene i and the genes in $s_0$, and $N$ is the number of genes in the network;
S303: computing the average topological similarity of all candidate genes in the n-th iteration, $ave\_Topology = \frac{1}{W}\sum_{i=1}^{W} Topology_i$, where $W$ is the number of candidate genes in the current generation;
S304: comparing the topological similarity of candidate gene i with the average $ave\_Topology$; if the topological similarity of candidate gene i is greater than the average, adding candidate gene i to gene set TB; otherwise adding candidate gene i to gene set TS;
S305: for each gene $g_i$ in TB, computing the average functional similarity between $g_i$ and the genes in $s_0$ to which it has an edge, $ave1\_similar_i = \frac{1}{l_1}\sum_{g_j \in s_0,\, A_{ij}=1} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 1$ indicates that $g_i$ and $g_j$ are connected by an edge, and $l_1$ is the number of genes in $s_0$ that have an edge to $g_i$;
S306: comparing $similar_{ij}$ with $ave1\_similar_i$; if $similar_{ij} > ave1\_similar_i$ and $A_{ij} = 0$, setting the element $A_{ij}$ in row i, column j of the gene network A to 1, thereby adding an edge between $g_i$ and $g_j$, where $g_i \in TB$ and $g_j \in s_0$;
S307: for each gene $g_i$ in TS, computing the average functional similarity between $g_i$ and the genes in $s_0$ to which it has no edge, $ave2\_similar_i = \frac{1}{l_2}\sum_{g_j \in s_0,\, A_{ij}=0} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 0$ indicates that $g_i$ and $g_j$ are not connected by an edge, and $l_2$ is the number of genes in $s_0$ that have no edge to $g_i$;
S308: comparing $similar_{ij}$ with $ave2\_similar_i$; if $similar_{ij} < ave2\_similar_i$ and $A_{ij} = 1$, setting the element $A_{ij}$ in row i, column j of the gene network A to 0, thereby deleting the edge between $g_i$ and $g_j$, where $g_i \in TS$ and $g_j \in s_0$;
S309: obtaining again the genes in the gene-phenotype double-layer network that have an edge to any gene in $s_0$ and taking them as candidate genes, the candidate gene set being denoted $c'_0$;
S310: computing the topological similarity of each candidate gene i in $c'_0$ and obtaining its normalized value $NTopology_i$ [formula missing from source], where $Topology_i$ is the topological similarity of candidate gene i, $Topology_{min}$ is the smallest topological similarity among all candidate genes, and $Topology_{max}$ is the largest topological similarity among all candidate genes;
S311: computing the semantic similarity of each candidate gene i in $c'_0$ [formula missing from source], where gene j is a gene in the disease gene set $s_0$ and $z$ is the number of genes in $s_0$;
S312: obtaining the normalized value $Nsimilar_i$ of the semantic similarity of each candidate gene i in $c'_0$ [formula missing from source], where $Similar_i$ is the semantic similarity of candidate gene i, $Similar_{min}$ is the smallest semantic similarity among all candidate genes, and $Similar_{max}$ is the largest semantic similarity among all candidate genes;
S313: where candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x, expressing the phenotype similarity between these e disease phenotypes and disease phenotype x, together with their disease gene sets and disease gene counts, as a set of triples $O = \{(s_1, c_1, t_1), \dots, (s_k, c_k, t_k), \dots, (s_e, c_e, t_e)\}$, and computing the degree of association $r_i$ between candidate gene i and disease phenotype x [formula missing from source];
S314: computing the score of each candidate gene i in $c'_0$, $score_i = Nsimilar_i - NTopology_i + r_i$;
S315: adding the candidate gene i with the highest $score_i$ to the disease gene set $s_0$, setting $n = n + 1$, and determining whether any candidate gene remains in the gene-phenotype double-layer network; if so, returning to step S302; otherwise, proceeding to step S316;
S316: when the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to disease phenotype x, for biological pathway genes, and for genes differentially expressed between disease phenotype x samples and normal samples, denoting the number of generations of candidate expansion as m, and outputting, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set over the first m-1 generations together with the known disease genes connected to those candidate genes.
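The edge adjustment in steps S303 to S308 can be sketched as follows. Because the topological similarity formula is not reproduced in this text, the sketch takes the per-candidate topology scores as a given input rather than computing them. The dictionary-based data structures, the function name `rewire_edges` and the toy values are assumptions made for illustration.

```python
def rewire_edges(adj, similar, topology, seeds):
    """Add or delete candidate-seed edges based on functional similarity.

    adj:      dict gene -> set of neighbours (gene network A).
    similar:  dict of dicts, similar[i][j] = functional similarity.
    topology: dict candidate -> topological similarity (given, not computed).
    seeds:    the disease gene set s0.
    """
    adj = {g: set(nbrs) for g, nbrs in adj.items()}   # work on a copy
    mean_topology = sum(topology.values()) / len(topology)  # S303
    for i, score in topology.items():
        linked = [j for j in seeds if j in adj[i]]
        unlinked = [j for j in seeds if j not in adj[i]]
        if score > mean_topology:            # S304: gene i belongs to TB
            if not linked:
                continue
            ave1 = sum(similar[i][j] for j in linked) / len(linked)    # S305
            for j in unlinked:
                if similar[i][j] > ave1:     # S306: add a missing edge
                    adj[i].add(j)
                    adj[j].add(i)
        else:                                # S304: gene i belongs to TS
            if not unlinked:
                continue
            ave2 = sum(similar[i][j] for j in unlinked) / len(unlinked)  # S307
            for j in linked:
                if similar[i][j] < ave2:     # S308: delete a weak edge
                    adj[i].discard(j)
                    adj[j].discard(i)
    return adj


network = {
    "s1": {"c1", "c2"}, "s2": {"c2"}, "s3": set(),
    "c1": {"s1"}, "c2": {"s1", "s2"},
}
functional = {
    "c1": {"s1": 0.5, "s2": 0.8, "s3": 0.2},
    "c2": {"s1": 0.1, "s2": 0.9, "s3": 0.4},
}
scores = {"c1": 0.9, "c2": 0.1}
updated = rewire_edges(network, functional, scores, {"s1", "s2", "s3"})
print(sorted(updated["c1"]), sorted(updated["c2"]))  # ['s1', 's2'] ['s2']
```

In the toy run, c1 falls in TB and gains an edge to s2 (0.8 exceeds its average of 0.5), while c2 falls in TS and loses its edge to s1 (0.1 is below its average of 0.4). The input network is left unchanged because the function works on a copy.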
In the specific scheme, the gene network and the phenotype network are fused into a gene-phenotype double-layer network. Because existing biological data contain many errors and missing values, existing gene interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relations are added by collaborative filtering, so as to adjust the gene-phenotype double-layer network. Together with the incorporation of disease phenotype data, this effectively reduces the influence of false-positive and false-negative data and improves the accuracy of the disease module detection algorithm.
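The scoring and selection in steps S310 to S315 can be sketched as below. The score expression follows step S314 as printed ($Nsimilar_i - NTopology_i + r_i$). The normalization formulas are not reproduced in this text; the sketch assumes min-max scaling, which matches the minimum and maximum quantities the steps define but remains an assumption. The input values for topology, semantic similarity and relevance are taken as given, and all names are hypothetical.

```python
def min_max(values):
    """Scale a dict of values to [0, 1]; constant inputs map to 0.0 (assumed)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def best_candidate(topology, semantic, relevance):
    """Return the highest-scoring candidate and all scores (step S314 form)."""
    n_topology = min_max(topology)
    n_similar = min_max(semantic)
    scores = {
        g: n_similar[g] - n_topology[g] + relevance[g]
        for g in topology
    }
    return max(scores, key=scores.get), scores


topology = {"c1": 0.2, "c2": 0.8}
semantic = {"c1": 0.9, "c2": 0.3}
relevance = {"c1": 0.1, "c2": 0.5}
winner, scores = best_candidate(topology, semantic, relevance)
print(winner)  # c1
```

The winner would then be moved into $s_0$ and the candidate set recomputed for the next generation.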
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Claims (4)
1. A gene module analysis method, characterized by comprising:
S1: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes related to disease phenotype x;
S2: adding connections between genes and phenotypes in the gene-phenotype double-layer network;
S3: taking as candidate genes those genes in the gene-phenotype double-layer network that have an edge to any disease gene in $s_0$ and are not themselves in $s_0$; iteratively computing the topological similarity, semantic similarity and phenotype relevance of each candidate gene; adding and deleting connections between genes according to topological similarity and semantic similarity; recomputing the measures and adding to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity and phenotype relevance; iterating until the expanded candidate gene set is no longer simultaneously significantly enriched for GO ontology functional annotations related to disease phenotype x, for biological pathway genes, and for genes differentially expressed between disease phenotype x samples and normal samples; denoting the number of generations of candidate expansion at that point as m; and outputting, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ over the first m-1 generations together with the known disease genes that have an edge to those candidate genes.
2. The gene module analysis method according to claim 1, characterized in that, in step S1, the gene-phenotype double-layer network is specifically as follows:
the gene-phenotype double-layer network is constructed from a gene network A, a phenotype similarity network B and a gene-phenotype relation network C, and the adjacency matrix of the double-layer network can be expressed as $\begin{pmatrix} A & C \\ C^T & B \end{pmatrix}$, where $C^T$ is the transpose of C.
3. The gene module analysis method according to claim 2, characterized in that step S2 specifically comprises:
defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network by [formula missing from source], where $p(i)$ is the set of phenotypes that include disease gene i, $p(j)$ is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, $|p(i)|$ is the number of phenotypes in $p(i)$, $|p(j)|$ is the number of phenotypes in $p(j)$, and $|p(u)|$ is the number of known disease genes of phenotype u;
computing the normalized value of the phenotype relevance $w_{ij}$ [formula missing from source], where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene $j'$ in the gene-phenotype double-layer network;
computing the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ [formula missing from source], where [symbol missing from source] denotes the phenotype relevance of genes $g_i$ and $g_j$, and $|p_t|$ denotes the size of the known disease gene set of disease phenotype $p_t$;
determining whether this likelihood is greater than $\gamma$ and, if so, setting the element in row i, column t of the gene-phenotype relation network C to 1, thereby adding a connection between $g_i$ and $p_t$, where $\gamma$ is a preset adjustable parameter.
4. netic module analysis method according to claim 3, which is characterized in that step S3 is specifically included:
S301, definition current iteration number are n, and initialize n=1;
S302, disease gene set s relevant to disease phenotype x in gene phenotype double-layer network is obtained0Any of gene have side
The gene of connection, and as candidate gene, it is denoted as c0={ g1,g2,…,gi,…,gw, and define candidate in nth iteration
The Topology Similarity of gene i
Wherein, k is the connection number of edges in gene i and gene phenotype double-layer network between other genes, ksFor gene i and set s0
Connection number of edges between middle gene, N are the gene number in network;
S303, calculate n times iteration all candidate gene Topology Similarities average value W is the number when former generation candidate gene;
S304, by the average value ave_Topology of the Topology Similarity of candidate gene i and all candidate gene Topology Similarities
It is compared, if the Topology Similarity of candidate gene i is greater than the average value of all candidate gene Topology Similarities, by candidate base
Because gene sets TB is added in i;Otherwise gene sets TS is added in candidate gene i;
S305, for each gene g_i in TB, calculate the average functional similarity ave1_similar_i between g_i and the genes in set s_0 to which it is connected by an edge (those with A_ij = 1), where similar_ij is the functional similarity between gene g_i and gene g_j, A_ij = 1 indicates that gene g_i and gene g_j are connected by an edge, and l_1 is the number of genes in set s_0 that are connected by an edge to gene g_i;
S306, compare similar_ij with ave1_similar_i; if similar_ij > ave1_similar_i and A_ij = 0, set the element A_ij in row i, column j of the gene network A to 1, thereby adding an edge between g_i and g_j, where g_i ∈ TB and g_j ∈ s_0;
S307, for each gene g_i in TS, calculate the average functional similarity ave2_similar_i between g_i and the genes in set s_0 to which it is not connected by an edge (those with A_ij = 0), where similar_ij is the functional similarity between gene g_i and gene g_j, A_ij = 0 indicates that gene g_i and gene g_j are not connected by an edge, and l_2 is the number of genes in set s_0 that are not connected by an edge to gene g_i;
S308, compare similar_ij with ave2_similar_i; if similar_ij < ave2_similar_i and A_ij = 1, set the element A_ij in row i, column j of the gene network A to 0, thereby deleting the edge between g_i and g_j, where g_i ∈ TS and g_j ∈ s_0;
S309, again obtain the genes in the gene-phenotype two-layer network that are connected by an edge to any gene in the gene set s_0, take them as candidate genes, and denote the candidate gene set as c′_0;
S310, calculate the topological similarity of each candidate gene i in c′_0 and obtain its normalized value NTopology_i, where Topology_i denotes the topological similarity of candidate gene i, Topology_min denotes the minimum topological similarity among all candidate genes, and Topology_max denotes the maximum topological similarity among all candidate genes;
S311, calculate the semantic similarity of each candidate gene i in c′_0, where gene j is a gene in the disease gene set s_0 and z is the number of genes in set s_0;
S312, obtain the normalized value Nsimilar_i of the semantic similarity of each candidate gene i in c′_0, where Similar_i denotes the semantic similarity of candidate gene i, Similar_min denotes the minimum semantic similarity among all candidate genes, and Similar_max denotes the maximum semantic similarity among all candidate genes;
S313, where candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x, express the phenotype similarities between these e disease phenotypes and disease phenotype x, together with the corresponding disease gene sets and disease gene counts, as the set of triples O = {(s_1, c_1, t_1), …, (s_k, c_k, t_k), …, (s_e, c_e, t_e)}, and calculate the degree of association r_i between candidate gene i and disease phenotype x;
S314, calculate the score of each candidate gene i in the set c′_0: score_i = Nsimilar_i - NTopology_i + r_i;
S315, add the candidate gene i with the highest score score_i to the disease gene set s_0, set n = n + 1, and determine whether any candidate genes remain in the gene-phenotype two-layer network; if so, return to step S302; otherwise, proceed to step S316;
S316, stop expanding once the expanded candidate gene set no longer simultaneously shows significant enrichment for the GO ontology functional annotations associated with disease phenotype x, the biological pathway genes, and the genes differentially expressed between disease phenotype x samples and normal samples; denote the expansion generation at that point as m, and output, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set in the first m-1 expansions together with the known disease genes connected to these expanded candidate genes.
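The edge-rewiring rule of steps S305 to S308 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `rewire` is hypothetical, the gene network A is assumed to be an undirected 0/1 adjacency matrix (so both A_ij and A_ji are updated), and a gene with no qualifying seed genes to average over is simply skipped.

```python
def rewire(A, sim, seeds, TB, TS):
    """Return a rewired copy of the gene adjacency matrix A.

    A     : square 0/1 list of lists (gene network)
    sim   : square list of lists, sim[i][j] = functional similarity similar_ij
    seeds : indices of the genes in the disease gene set s_0
    TB    : candidate indices with above-average topological similarity
    TS    : the remaining candidate indices
    """
    A = [row[:] for row in A]  # work on a copy
    seeds = list(seeds)

    # S305/S306: for genes in TB, add edges to sufficiently similar seeds.
    for i in TB:
        linked = [j for j in seeds if A[i][j] == 1]
        if not linked:
            continue
        ave1 = sum(sim[i][j] for j in linked) / len(linked)  # l_1 = len(linked)
        for j in seeds:
            if A[i][j] == 0 and sim[i][j] > ave1:
                A[i][j] = A[j][i] = 1  # ASSUMPTION: undirected network

    # S307/S308: for genes in TS, delete edges to weakly similar seeds.
    for i in TS:
        unlinked = [j for j in seeds if A[i][j] == 0]
        if not unlinked:
            continue
        ave2 = sum(sim[i][j] for j in unlinked) / len(unlinked)  # l_2
        for j in seeds:
            if A[i][j] == 1 and sim[i][j] < ave2:
                A[i][j] = A[j][i] = 0
    return A
```

Each average is computed before any edge of that gene is changed, so the threshold for gene g_i reflects the network as it stood when the step began.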
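The scoring and greedy expansion of steps S309 to S316 can be sketched as follows, under several stated assumptions. The normalizations are assumed to be min-max scaling, and the semantic similarity of a candidate is assumed to be its mean similarity to the z genes of s_0; neither formula survives in the extracted text. The topological similarity formula is also missing, so it is passed in as a hypothetical callable `topology(gene, seeds)`. The enrichment-based stopping test of S316 is replaced by a plain `max_rounds` limit, and all function names are illustrative.

```python
def candidates(adj, seeds):
    """Genes connected by an edge to any seed gene, excluding the seeds."""
    seeds = set(seeds)
    return {g for s in seeds for g in adj.get(s, ())} - seeds


def min_max(values):
    """ASSUMED normalization: (v - min) / (max - min); all zeros if flat."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {g: 0.0 for g in values}
    return {g: (v - lo) / (hi - lo) for g, v in values.items()}


def pick_best(adj, seeds, topology, sim, r):
    """Return the candidate with the highest score_i, or None if none remain."""
    cand = candidates(adj, seeds)
    if not cand:
        return None
    n_top = min_max({g: topology(g, seeds) for g in cand})
    # ASSUMPTION: semantic similarity = mean similarity to the genes in s_0.
    n_sim = min_max({g: sum(sim[g][j] for j in seeds) / len(seeds) for g in cand})
    # score_i = Nsimilar_i - NTopology_i + r_i (step S314)
    score = {g: n_sim[g] - n_top[g] + r.get(g, 0.0) for g in cand}
    return max(sorted(score), key=score.get)  # sorted() makes ties deterministic


def expand(adj, seeds, topology, sim, r, max_rounds):
    """Greedily add one best-scoring candidate per round (step S315)."""
    seeds, added = list(seeds), []
    for _ in range(max_rounds):  # stand-in for the enrichment test of S316
        best = pick_best(adj, seeds, topology, sim, r)
        if best is None:
            break
        seeds.append(best)
        added.append(best)
    return added


def target_module(adj, known, added, m):
    """First m-1 expanded genes plus the known disease genes linked to them."""
    kept = added[: m - 1]
    linked = {s for s in known if any(s in adj.get(g, ()) for g in kept)}
    return set(kept) | linked
```

Because the topological term enters the score with a minus sign, a candidate with a lower normalized topological value ranks higher when the other two terms are equal.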
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267199.7A CN110060730B (en) | 2019-04-03 | 2019-04-03 | Gene module analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110060730A (en) | 2019-07-26 |
CN110060730B (en) | 2022-11-01 |
Family
ID=67318273
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910267199.7A Active CN110060730B (en) | 2019-04-03 | 2019-04-03 | Gene module analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110060730B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170242959A1 (en) * | 2016-02-24 | 2017-08-24 | Ucb Biopharma Sprl | Method and system for quantifying the likelihood that a gene is casually linked to a disease |
CN106055922A (en) * | 2016-06-08 | 2016-10-26 | 哈尔滨工业大学深圳研究生院 | Hybrid network gene screening method based on gene expression data |
Non-Patent Citations (1)
Title |
---|
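Huang Junheng et al.: "Research on a disease-gene prediction method using a protein-phenotype network", Computer Engineering and Applications * |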
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706740A (en) * | 2019-09-29 | 2020-01-17 | 长沙理工大学 | Method, device and equipment for predicting protein function based on module decomposition |
CN110706740B (en) * | 2019-09-29 | 2022-03-22 | 长沙理工大学 | Method, device and equipment for predicting protein function based on module decomposition |
CN110704747A (en) * | 2019-10-11 | 2020-01-17 | 安徽大学 | Intelligent patent recommendation method based on deep semantic similarity |
CN111540405A (en) * | 2020-04-29 | 2020-08-14 | 新疆大学 | Disease gene prediction method based on rapid network embedding |
CN111951951A (en) * | 2020-07-14 | 2020-11-17 | 西安电子科技大学 | Disease module detection method and system based on connectivity significance |
CN111951951B (en) * | 2020-07-14 | 2023-06-23 | 西安电子科技大学 | Disease module detection method and system based on connected significance |
CN113947149A (en) * | 2021-10-19 | 2022-01-18 | 大理大学 | Similarity measurement method and device for gene module group, electronic device and storage medium |
CN116343913A (en) * | 2023-03-15 | 2023-06-27 | 昆明市延安医院 | Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network |
CN116343913B (en) * | 2023-03-15 | 2023-11-14 | 昆明市延安医院 | Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network |
Also Published As
Publication number | Publication date |
---|---|
CN110060730B (en) | 2022-11-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||