CN110060730A - Gene module analysis method - Google Patents

Gene module analysis method

Info

Publication number
CN110060730A
Authority
CN
China
Prior art keywords
gene
phenotype
disease
candidate
topology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910267199.7A
Other languages
Chinese (zh)
Other versions
CN110060730B (en)
Inventor
苏延森
祝火乐
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN201910267199.7A
Publication of CN110060730A
Application granted
Publication of CN110060730B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a gene module analysis method, comprising: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes associated with a disease phenotype; adding connections between genes and phenotypes in the gene-phenotype double-layer network; taking as candidate genes those genes in the double-layer network that share an edge with a disease gene in $s_0$ and are not themselves in $s_0$; computing the semantic similarity, topological similarity, and phenotype relevance of each candidate gene and adding the candidate with the largest combined value to $s_0$; when the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with the disease phenotype, in biological pathway genes, and in genes differentially expressed between disease-phenotype samples and normal samples, recording the current generation as m; and outputting the candidate genes added to $s_0$ during the first m-1 generations together with the known disease genes that share an edge with those added candidates.

Description

Gene module analysis method
Technical field
The present invention relates to the technical field of data analysis, and in particular to a gene module analysis method.
Background art
At present, methods for detecting disease modules in the human interaction network fall roughly into three categories:
The first category detects disease modules from gene expression data. Gene expression profiles are a common and effective data resource for mining and studying disease modules. However, existing data still suffer from many defects, such as errors, missing values, and high dimensionality, so applying conventional clustering algorithms alone to mine disease module structure does not give good results.
The second category detects disease modules from a single network. The one-sidedness and errors of single-network data strongly affect gene module mining, so the resulting gene modules can deviate considerably from the actual ones.
The third category detects disease modules from multiple networks. Algorithms that mine module structure from multiple networks focus on finding module structures that occur frequently across all of the networks. Because the networks used are all built from gene expression profiles that are highly correlated with a given disease phenotype, the more often a gene module occurs, the more strongly it is associated with that phenotype.
Summary of the invention
Based on the technical problems described in the background art, the present invention proposes a gene module analysis method.
The gene module analysis method proposed by the present invention comprises:
S1: input a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes associated with disease phenotype x;
S2: add connections between genes and phenotypes in the gene-phenotype double-layer network;
S3: take as candidate genes those genes in the gene-phenotype double-layer network that share an edge with any disease gene in $s_0$ and are not in $s_0$; iteratively compute the topological similarity, semantic similarity, and phenotype relevance of each candidate gene; add and delete connections between genes according to topological similarity and semantic similarity; then recompute and add to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity, and phenotype relevance. Iterate until the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with disease phenotype x, in biological pathway genes, and in genes differentially expressed between disease phenotype x samples and normal samples; record the generation at which this occurs as m; and output, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ during the first m-1 generations together with the known disease genes that share an edge with those candidates.
Preferably, in step S1, the gene-phenotype double-layer network is constructed as follows:
The gene-phenotype double-layer network is built from the gene network A, the phenotype similarity network B, and the gene-phenotype relationship network C. Its adjacency matrix can be written as $\begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix}$, where $C^{T}$ is the transpose of C.
Preferably, step S2 specifically comprises:
Defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network (formula not reproduced in this text), where p(i) is the set of phenotypes that include disease gene i, p(j) is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, |p(i)| is the number of phenotypes in p(i), |p(j)| is the number of phenotypes in p(j), and |p(u)| is the number of known disease genes of phenotype u;
Calculating the normalized value of the phenotype relevance $w_{ij}$ (formula not reproduced in this text), where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene j' in the gene-phenotype double-layer network;
Calculating the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ (formula not reproduced in this text), where the normalized relevance is that of genes $g_i$ and $g_j$, and $|p_t|$ is the size of the known disease gene set of disease phenotype $p_t$;
Judging whether this likelihood is greater than γ; if so, setting the element in row i, column t of the gene-phenotype relationship network C to 1, thereby adding a connection between $g_i$ and $p_t$, where γ is a preset variable parameter.
Preferably, step S3 specifically comprises:
S301: define the current iteration number as n and initialize n = 1;
S302: obtain the genes in the gene-phenotype double-layer network that share an edge with any gene in the disease gene set $s_0$ associated with disease phenotype x, take them as candidate genes, denoted $c_0 = \{g_1, g_2, \ldots, g_i, \ldots, g_w\}$, and define the topological similarity of candidate gene i in the n-th iteration (formula not reproduced in this text),
where k is the number of edges between gene i and the other genes in the gene-phenotype double-layer network, $k_s$ is the number of edges between gene i and the genes in $s_0$, and N is the number of genes in the network;
S303: calculate the average topological similarity of all candidate genes in the n-th iteration, $\mathrm{ave\_Topology} = \frac{1}{W}\sum_{i=1}^{W} \mathrm{Topology}_i$, where W is the number of candidate genes in the current generation;
S304: compare the topological similarity of candidate gene i with the average ave_Topology; if it is greater than the average, add candidate gene i to gene set TB; otherwise add it to gene set TS;
S305: for each gene $g_i$ in TB, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that share an edge with it, $\mathrm{ave1\_similar}_i = \frac{1}{l_1}\sum_{g_j \in s_0,\ A_{ij}=1} \mathrm{similar}_{ij}$, where $\mathrm{similar}_{ij}$ is the functional similarity between genes $g_i$ and $g_j$, $A_{ij} = 1$ means that $g_i$ and $g_j$ share an edge, and $l_1$ is the number of genes in $s_0$ that share an edge with $g_i$;
S306: compare $\mathrm{similar}_{ij}$ with $\mathrm{ave1\_similar}_i$; if $\mathrm{similar}_{ij} > \mathrm{ave1\_similar}_i$ and $A_{ij} = 0$, set the element $A_{ij}$ in row i, column j of gene network A to 1, adding an edge between $g_i$ and $g_j$, where $g_i \in TB$ and $g_j \in s_0$;
S307: for each gene $g_i$ in TS, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that do not share an edge with it, $\mathrm{ave2\_similar}_i = \frac{1}{l_2}\sum_{g_j \in s_0,\ A_{ij}=0} \mathrm{similar}_{ij}$, where $\mathrm{similar}_{ij}$ is the functional similarity between genes $g_i$ and $g_j$, $A_{ij} = 0$ means that $g_i$ and $g_j$ do not share an edge, and $l_2$ is the number of genes in $s_0$ that do not share an edge with $g_i$;
S308: compare $\mathrm{similar}_{ij}$ with $\mathrm{ave2\_similar}_i$; if $\mathrm{similar}_{ij} < \mathrm{ave2\_similar}_i$ and $A_{ij} = 1$, set the element $A_{ij}$ in row i, column j of gene network A to 0, deleting the edge between $g_i$ and $g_j$, where $g_i \in TS$ and $g_j \in s_0$;
S309: again obtain the genes in the gene-phenotype double-layer network that share an edge with any gene in $s_0$, take them as candidate genes, and denote the candidate gene set $c'_0$;
S310: calculate the topological similarity of each candidate gene i in $c'_0$ and its normalized value $\mathrm{NTopology}_i = \frac{\mathrm{Topology}_i - \mathrm{Topology}_{\min}}{\mathrm{Topology}_{\max} - \mathrm{Topology}_{\min}}$, where $\mathrm{Topology}_i$ is the topological similarity of candidate gene i, and $\mathrm{Topology}_{\min}$ and $\mathrm{Topology}_{\max}$ are the minimum and maximum topological similarity over all candidate genes;
S311: calculate the semantic similarity of each candidate gene i in $c'_0$ (formula not reproduced in this text), where gene j is a gene in the disease gene set $s_0$ and z is the number of genes in $s_0$;
S312: obtain the normalized semantic similarity of each candidate gene i in $c'_0$, $\mathrm{Nsimilar}_i = \frac{\mathrm{Similar}_i - \mathrm{Similar}_{\min}}{\mathrm{Similar}_{\max} - \mathrm{Similar}_{\min}}$, where $\mathrm{Similar}_i$ is the semantic similarity of candidate gene i, and $\mathrm{Similar}_{\min}$ and $\mathrm{Similar}_{\max}$ are the minimum and maximum semantic similarity over all candidate genes;
S313: suppose candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x; the phenotype similarities between these e disease phenotypes and disease phenotype x, together with their disease gene sets and disease gene counts, are expressed as a set of triples $O = \{(s_1, c_1, t_1), \ldots, (s_k, c_k, t_k), \ldots, (s_e, c_e, t_e)\}$; calculate the degree of association $r_i$ between candidate gene i and disease phenotype x (formula not reproduced in this text);
S314: calculate the score of each candidate gene i in $c'_0$: $\mathrm{score}_i = \mathrm{Nsimilar}_i - \mathrm{NTopology}_i + r_i$;
S315: add the candidate gene i with the highest $\mathrm{score}_i$ to the disease gene set $s_0$, set n = n + 1, and judge whether candidate genes remain in the gene-phenotype double-layer network; if so, return to step S302; otherwise, go to step S316;
S316: when the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with disease phenotype x, in biological pathway genes, and in genes differentially expressed between disease phenotype x samples and normal samples, record the generation as m, and output, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set during the first m-1 generations together with the known disease genes connected to those added candidates.
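The scoring in steps S310 to S315 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `min_max` and `pick_best` and all toy values are hypothetical. The topological similarity and the association degree r_i are passed in as precomputed numbers because their formulas are not given in this text, and the semantic similarity of a candidate is assumed to be its mean functional similarity to the genes in s0.

```python
def min_max(values):
    """Min-max normalize a dict of scores to [0, 1] (S310, S312)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # all candidates tie, so avoid dividing by zero
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def pick_best(candidates, topology, similar, s0, r):
    """Return (best_gene, scores) with score_i = Nsimilar_i - NTopology_i + r_i."""
    z = len(s0)
    # ASSUMPTION: semantic similarity = mean similarity to the genes in s0.
    semantic = {i: sum(similar[i][j] for j in s0) / z for i in candidates}
    n_sim = min_max(semantic)
    n_top = min_max({i: topology[i] for i in candidates})
    scores = {i: n_sim[i] - n_top[i] + r[i] for i in candidates}
    best = max(candidates, key=lambda i: scores[i])
    return best, scores


# Toy data (hypothetical): seed genes 0 and 1, candidate genes 2, 3, 4.
s0 = [0, 1]
candidates = [2, 3, 4]
topology = {2: 0.9, 3: 0.2, 4: 0.5}   # precomputed, formula not given
similar = {2: {0: 0.4, 1: 0.7},
           3: {0: 0.5, 1: 0.1},
           4: {0: 0.2, 1: 0.2}}
r = {2: 0.1, 3: 0.3, 4: 0.0}          # precomputed, formula not given

best, scores = pick_best(candidates, topology, similar, s0, r)
```

On this toy input gene 3 wins: its topological term is the smallest, so the subtraction in the score penalizes it least, and it has the largest association degree.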
The present invention fuses the gene network and the phenotype network into a gene-phenotype double-layer network, computes the topological similarity and semantic similarity between candidate genes and disease genes as well as the association between candidate genes and diseases similar to the disease under study, and combines topological similarity, semantic similarity, and association to detect disease modules. Effectively integrating multiple kinds of biological data and multiple attributes improves the accuracy of the disease module detection algorithm. Further, existing protein interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relationships are added by collaborative filtering; adjusting the gene-phenotype double-layer network in this way effectively reduces the influence of false-positive and false-negative data on disease module detection.
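The edge adjustment summarized above (steps S302 to S308) can be sketched in Python as follows. This is an illustrative sketch only. The function name `rewire` and the toy matrices are hypothetical, and the topological similarities are supplied as precomputed numbers because their formula is not given in this text.

```python
def rewire(A, s0, topology, similar):
    """Sketch of S302-S308; returns (new_A, TB, TS) and leaves A untouched."""
    A = [row[:] for row in A]
    seeds = sorted(s0)
    # S302: candidates are non-seed genes with at least one edge into s0.
    candidates = [i for i in range(len(A))
                  if i not in s0 and any(A[i][j] == 1 for j in seeds)]
    if not candidates:
        return A, [], []
    # S303-S304: split candidates around the mean topological similarity.
    ave_topology = sum(topology[i] for i in candidates) / len(candidates)
    TB = [i for i in candidates if topology[i] > ave_topology]
    TS = [i for i in candidates if topology[i] <= ave_topology]
    # S305-S306: for TB genes, add edges to unlinked seeds that are more
    # similar than the average over the seeds already linked.
    for i in TB:
        linked = [j for j in seeds if A[i][j] == 1]
        ave1 = sum(similar[i][j] for j in linked) / len(linked)
        for j in seeds:
            if A[i][j] == 0 and similar[i][j] > ave1:
                A[i][j] = A[j][i] = 1
    # S307-S308: for TS genes, delete edges to linked seeds that are less
    # similar than the average over the seeds not linked.
    for i in TS:
        unlinked = [j for j in seeds if A[i][j] == 0]
        if not unlinked:
            continue
        ave2 = sum(similar[i][j] for j in unlinked) / len(unlinked)
        for j in seeds:
            if A[i][j] == 1 and similar[i][j] < ave2:
                A[i][j] = A[j][i] = 0
    return A, TB, TS


# Toy data (hypothetical): 5 genes, seed set s0 = {0, 1}.
A = [[0, 1, 1, 0, 0],
     [1, 0, 0, 1, 0],
     [1, 0, 0, 0, 1],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]
similar = [[1.0, 0.0, 0.4, 0.5, 0.0],
           [0.0, 1.0, 0.7, 0.1, 0.0],
           [0.4, 0.7, 1.0, 0.0, 0.0],
           [0.5, 0.1, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 1.0]]
topology = [0.0, 0.0, 0.9, 0.2, 0.0]  # precomputed, formula not given

new_A, TB, TS = rewire(A, {0, 1}, topology, similar)
```

In the toy run, gene 2 falls in TB and gains an edge to seed 1, while gene 3 falls in TS and loses its edge to seed 1. Gene 4 touches no seed gene and is never a candidate.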
Brief description of the drawings
Fig. 1 is a flow diagram of the gene module analysis method proposed by the present invention.
Detailed description of the embodiments
Referring to Fig. 1, the gene module analysis method proposed by the present invention comprises:
S1: input a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes associated with disease phenotype x.
In this step, the gene-phenotype double-layer network is built from the gene network A, the phenotype similarity network B, and the gene-phenotype relationship network C. Its adjacency matrix can be written as $\begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix}$, where $C^{T}$ is the transpose of C.
In a specific scheme, $l_1$ pairs of protein interactions involving N proteins are collected from $D_1$ databases. The protein network can be abstracted as a graph $G_1 = (V_1, E_1)$ composed of a protein node set $V_1$ and an interaction edge set $E_1$, with node count $N = |V_1|$ and edge count $l_1 = |E_1|$. Each edge in $E_1$ corresponds to a pair of nodes in $V_1$. The adjacency matrix $A = (A_{ij})_{N \times N}$ represents the protein network: $A_{ij} = 1$ means that nodes i and j are connected by an edge, and $A_{ij} = 0$ means that they are not. A is a symmetric matrix of 0 and 1 elements. Because genes and proteins are essentially in one-to-one correspondence, the protein network is essentially equivalent to the gene network.
$l_2$ pairs of phenotype similarity relationships involving M phenotypes are collected from $D_2$ databases. The phenotype similarity network can be abstracted as a graph $G_2 = (V_2, E_2)$ composed of a phenotype node set $V_2$ and a phenotype similarity edge set $E_2$, with node count $M = |V_2|$ and edge count $l_2 = |E_2|$. Each edge in $E_2$ corresponds to a pair of nodes in $V_2$. The adjacency matrix $B = (B_{ij})_{M \times M}$ represents the phenotype similarity network: $B_{ij} > 0$ means that nodes i and j are connected by an edge, and $B_{ij} = 0$ means that they are not. B is a symmetric matrix whose elements lie in the interval [0, 1].
$l_3$ pairs of gene-phenotype relationships involving N' disease genes and M phenotypes are collected from $D_3$ databases; the set of the N' disease genes is denoted $V'_3$. The gene-phenotype relationship network can be abstracted as a graph $G_3 = (V_3, E_3)$ composed of the gene and phenotype node set $V_3 = V_2 \cup V'_3$ and the gene-phenotype relationship edge set $E_3$, with node count $N + M = |V_3|$ and edge count $l_3 = |E_3|$. Each edge in $E_3$ corresponds to a pair of nodes in $V_3$. The adjacency matrix $C = (C_{ij})_{N \times M}$ represents the gene-phenotype relationship network: $C_{ij} = 1$ means that gene i and phenotype j are connected by an edge, and $C_{ij} = 0$ means that they are not. C is a matrix of 0 and 1 elements.
The gene-phenotype double-layer network is built from the gene network A, the phenotype similarity network B, and the gene-phenotype relationship network C. Its adjacency matrix can be written as $\begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix}$, where $C^{T}$ is the transpose of C.
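As an illustration of the block structure described above, the following Python sketch assembles the double-layer adjacency matrix from A, B, and C stored as nested lists. The function name `build_double_layer` and the toy matrices are hypothetical; only the layout, with A and C on top and the transpose of C and B below, follows the text.

```python
def build_double_layer(A, B, C):
    """Assemble the block matrix [[A, C], [C^T, B]] from nested lists."""
    n, m = len(A), len(B)
    # C needs one row per gene and one column per phenotype.
    assert len(C) == n and all(len(row) == m for row in C)
    top = [list(A[i]) + list(C[i]) for i in range(n)]
    # Row j of the lower half is column j of C followed by row j of B.
    bottom = [[C[i][j] for i in range(n)] + list(B[j]) for j in range(m)]
    return top + bottom


# Toy data (hypothetical): 3 genes, 2 phenotypes.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
B = [[0.0, 0.6],
     [0.6, 0.0]]
C = [[1, 0],
     [0, 0],
     [0, 1]]

H = build_double_layer(A, B, C)
```

Because A and B are symmetric and the lower-left block is the transpose of C, the assembled (N + M) x (N + M) matrix is symmetric as well.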
Step S2: add connections between genes and phenotypes in the gene-phenotype double-layer network.
This step specifically comprises:
Defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network (formula not reproduced in this text), where p(i) is the set of phenotypes that include disease gene i, p(j) is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, |p(i)| is the number of phenotypes in p(i), |p(j)| is the number of phenotypes in p(j), and |p(u)| is the number of known disease genes of phenotype u;
Calculating the normalized value of the phenotype relevance $w_{ij}$ (formula not reproduced in this text), where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene j' in the gene-phenotype double-layer network;
Calculating the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ (formula not reproduced in this text), where the normalized relevance is that of genes $g_i$ and $g_j$, and $|p_t|$ is the size of the known disease gene set of disease phenotype $p_t$;
Judging whether this likelihood is greater than γ; if so, setting the element in row i, column t of the gene-phenotype relationship network C to 1, thereby adding a connection between $g_i$ and $p_t$, where γ is a preset variable parameter.
In a specific scheme, existing protein interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relationships are added by collaborative filtering. This adjusts the structure of the gene-phenotype double-layer network and effectively reduces the influence of false-positive and false-negative data on disease module detection.
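The thresholding part of step S2 can be sketched in Python as follows. This is a hedged illustration, not the method as claimed. The function name `add_gene_phenotype_links` and the toy values are hypothetical, and the phenotype relevance matrix w is taken as a precomputed input because its formula is not given in this text. Two further assumptions are made: the normalization divides each row of w by its largest off-diagonal entry, and the likelihood for gene i and phenotype t is the mean normalized relevance between gene i and the known disease genes of that phenotype.

```python
def add_gene_phenotype_links(w, C, gamma):
    """Return a copy of C with extra gene-phenotype links (sketch of S2).

    w     : square nested list, w[i][j] = phenotype relevance of genes i, j
    C     : gene-by-phenotype 0/1 matrix (rows = genes, columns = phenotypes)
    gamma : preset threshold
    """
    n_genes, n_phen = len(C), len(C[0])
    # ASSUMPTION: normalize row i by its largest relevance to another gene.
    w_norm = []
    for i in range(n_genes):
        row_max = max(w[i][j] for j in range(n_genes) if j != i)
        w_norm.append([w[i][j] / row_max if row_max > 0 else 0.0
                       for j in range(n_genes)])
    new_C = [row[:] for row in C]
    for t in range(n_phen):
        known = [j for j in range(n_genes) if C[j][t] == 1]
        if not known:
            continue
        for i in range(n_genes):
            if C[i][t] == 1:
                continue
            # ASSUMPTION: likelihood = mean normalized relevance between
            # gene i and the known disease genes of phenotype t.
            likelihood = sum(w_norm[i][j] for j in known) / len(known)
            if likelihood > gamma:
                new_C[i][t] = 1
    return new_C


# Toy data (hypothetical): 3 genes, 2 phenotypes.
w = [[0.0, 0.8, 0.05],
     [0.8, 0.0, 0.2],
     [0.05, 0.2, 0.0]]
C = [[1, 0],
     [0, 0],
     [0, 1]]

new_C = add_gene_phenotype_links(w, C, gamma=0.5)
```

Here gene 1 is strongly related to gene 0, the only known disease gene of phenotype 0, so the sketch links gene 1 to phenotype 0 and leaves every other entry of C unchanged.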
Step S3: take as candidate genes those genes in the gene-phenotype double-layer network that share an edge with any disease gene in $s_0$ and are not in $s_0$; iteratively compute the topological similarity, semantic similarity, and phenotype relevance of each candidate gene; add and delete connections between genes according to topological similarity and semantic similarity; then recompute and add to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity, and phenotype relevance. Iterate until the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with disease phenotype x, in biological pathway genes, and in genes differentially expressed between disease phenotype x samples and normal samples; record the generation at which this occurs as m; and output, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ during the first m-1 generations together with the known disease genes that share an edge with those candidates.
This step specifically comprises:
S301: define the current iteration number as n and initialize n = 1;
S302: obtain the genes in the gene-phenotype double-layer network that share an edge with any gene in the disease gene set $s_0$ associated with disease phenotype x, take them as candidate genes, denoted $c_0 = \{g_1, g_2, \ldots, g_i, \ldots, g_w\}$, and define the topological similarity of candidate gene i in the n-th iteration (formula not reproduced in this text),
where k is the number of edges between gene i and the other genes in the gene-phenotype double-layer network, $k_s$ is the number of edges between gene i and the genes in $s_0$, and N is the number of genes in the network;
S303: calculate the average topological similarity of all candidate genes in the n-th iteration, $\mathrm{ave\_Topology} = \frac{1}{W}\sum_{i=1}^{W} \mathrm{Topology}_i$, where W is the number of candidate genes in the current generation;
S304: compare the topological similarity of candidate gene i with the average ave_Topology; if it is greater than the average, add candidate gene i to gene set TB; otherwise add it to gene set TS;
S305: for each gene $g_i$ in TB, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that share an edge with it, $\mathrm{ave1\_similar}_i = \frac{1}{l_1}\sum_{g_j \in s_0,\ A_{ij}=1} \mathrm{similar}_{ij}$, where $\mathrm{similar}_{ij}$ is the functional similarity between genes $g_i$ and $g_j$, $A_{ij} = 1$ means that $g_i$ and $g_j$ share an edge, and $l_1$ is the number of genes in $s_0$ that share an edge with $g_i$;
S306: compare $\mathrm{similar}_{ij}$ with $\mathrm{ave1\_similar}_i$; if $\mathrm{similar}_{ij} > \mathrm{ave1\_similar}_i$ and $A_{ij} = 0$, set the element $A_{ij}$ in row i, column j of gene network A to 1, adding an edge between $g_i$ and $g_j$, where $g_i \in TB$ and $g_j \in s_0$;
S307: for each gene $g_i$ in TS, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that do not share an edge with it, $\mathrm{ave2\_similar}_i = \frac{1}{l_2}\sum_{g_j \in s_0,\ A_{ij}=0} \mathrm{similar}_{ij}$, where $\mathrm{similar}_{ij}$ is the functional similarity between genes $g_i$ and $g_j$, $A_{ij} = 0$ means that $g_i$ and $g_j$ do not share an edge, and $l_2$ is the number of genes in $s_0$ that do not share an edge with $g_i$;
S308: compare $\mathrm{similar}_{ij}$ with $\mathrm{ave2\_similar}_i$; if $\mathrm{similar}_{ij} < \mathrm{ave2\_similar}_i$ and $A_{ij} = 1$, set the element $A_{ij}$ in row i, column j of gene network A to 0, deleting the edge between $g_i$ and $g_j$, where $g_i \in TS$ and $g_j \in s_0$;
S309: again obtain the genes in the gene-phenotype double-layer network that share an edge with any gene in $s_0$, take them as candidate genes, and denote the candidate gene set $c'_0$;
S310: calculate the topological similarity of each candidate gene i in $c'_0$ and its normalized value $\mathrm{NTopology}_i = \frac{\mathrm{Topology}_i - \mathrm{Topology}_{\min}}{\mathrm{Topology}_{\max} - \mathrm{Topology}_{\min}}$, where $\mathrm{Topology}_i$ is the topological similarity of candidate gene i, and $\mathrm{Topology}_{\min}$ and $\mathrm{Topology}_{\max}$ are the minimum and maximum topological similarity over all candidate genes;
S311: calculate the semantic similarity of each candidate gene i in $c'_0$ (formula not reproduced in this text), where gene j is a gene in the disease gene set $s_0$ and z is the number of genes in $s_0$;
S312: obtain the normalized semantic similarity of each candidate gene i in $c'_0$, $\mathrm{Nsimilar}_i = \frac{\mathrm{Similar}_i - \mathrm{Similar}_{\min}}{\mathrm{Similar}_{\max} - \mathrm{Similar}_{\min}}$, where $\mathrm{Similar}_i$ is the semantic similarity of candidate gene i, and $\mathrm{Similar}_{\min}$ and $\mathrm{Similar}_{\max}$ are the minimum and maximum semantic similarity over all candidate genes;
S313: suppose candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x; the phenotype similarities between these e disease phenotypes and disease phenotype x, together with their disease gene sets and disease gene counts, are expressed as a set of triples $O = \{(s_1, c_1, t_1), \ldots, (s_k, c_k, t_k), \ldots, (s_e, c_e, t_e)\}$; calculate the degree of association $r_i$ between candidate gene i and disease phenotype x (formula not reproduced in this text);
S314: calculate the score of each candidate gene i in $c'_0$: $\mathrm{score}_i = \mathrm{Nsimilar}_i - \mathrm{NTopology}_i + r_i$;
S315: add the candidate gene i with the highest $\mathrm{score}_i$ to the disease gene set $s_0$, set n = n + 1, and judge whether candidate genes remain in the gene-phenotype double-layer network; if so, return to step S302; otherwise, go to step S316;
S316: when the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with disease phenotype x, in biological pathway genes, and in genes differentially expressed between disease phenotype x samples and normal samples, record the generation as m, and output, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set during the first m-1 generations together with the known disease genes connected to those added candidates.
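The stopping test in step S316 can be sketched in Python as follows. The text does not say which statistical test or significance level is used, so this sketch assumes a one-sided hypergeometric test at a hypothetical level of 0.05. The names `hypergeom_pvalue` and `still_enriched`, the reference set labels, and the toy gene identifiers are all illustrative.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K are annotated."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def still_enriched(expanded, reference_sets, background_size, alpha=0.05):
    """True only if the expanded set is enriched in every reference set."""
    for ref in reference_sets.values():
        overlap = len(expanded & ref)
        p = hypergeom_pvalue(background_size, len(ref), len(expanded), overlap)
        if p >= alpha:  # one failed test breaks simultaneous enrichment
            return False
    return True


# Toy data (hypothetical): 20 background genes, three reference gene sets.
reference_sets = {
    "GO": {1, 2, 3, 4, 5},
    "pathway": {1, 2, 3, 6},
    "diff_expr": {2, 3, 4, 7, 8},
}
early = {1, 2, 3, 4}                  # expanded set after a few generations
late = {1, 2, 3, 4, 9, 10, 11, 12}    # same set diluted by later additions

ok_early = still_enriched(early, reference_sets, background_size=20)
ok_late = still_enriched(late, reference_sets, background_size=20)
```

In this toy run the early set passes all three tests, while the diluted later set fails the GO test. Under the rule in S316, the generation at which the check first fails would be recorded as m, and the module from generation m-1 would be reported.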
In a specific scheme, the gene network and the phenotype network are fused into a gene-phenotype double-layer network. Because existing biological data contain many errors and gaps, existing gene interactions are added and deleted according to topological similarity and semantic similarity, and gene-phenotype relationships are added by collaborative filtering, so as to adjust the gene-phenotype double-layer network. Together with the incorporation of disease phenotype data, this effectively reduces the influence of false-positive and false-negative data and improves the accuracy of the disease module detection algorithm.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (4)

1. A gene module analysis method, characterized by comprising:
S1: inputting a gene-phenotype double-layer network, a gene functional similarity network, and a set $s_0$ of known disease genes associated with disease phenotype x;
S2: adding connections between genes and phenotypes in the gene-phenotype double-layer network;
S3: taking as candidate genes those genes in the gene-phenotype double-layer network that share an edge with any disease gene in $s_0$ and are not in $s_0$; iteratively computing the topological similarity, semantic similarity, and phenotype relevance of each candidate gene; adding and deleting connections between genes according to topological similarity and semantic similarity; then recomputing and adding to $s_0$ the candidate gene with the largest sum of semantic similarity, topological similarity, and phenotype relevance; iterating until the expanded candidate gene set is no longer simultaneously significantly enriched in GO functional annotations associated with disease phenotype x, in biological pathway genes, and in genes differentially expressed between disease phenotype x samples and normal samples; recording the generation at which this occurs as m; and outputting, as the target disease gene module of disease phenotype x, the candidate genes added to $s_0$ during the first m-1 generations together with the known disease genes that share an edge with those candidates.
2. The gene module analysis method according to claim 1, characterized in that, in step S1, the gene-phenotype double-layer network is constructed as follows:
the gene-phenotype double-layer network is built from the gene network A, the phenotype similarity network B, and the gene-phenotype relationship network C, and its adjacency matrix can be written as $\begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix}$, where $C^{T}$ is the transpose of C.
3. The gene module analysis method according to claim 2, characterized in that step S2 specifically comprises:
defining the phenotype relevance $w_{ij}$ of gene i and gene j in the gene-phenotype double-layer network (formula not reproduced in this text), where p(i) is the set of phenotypes that include disease gene i, p(j) is the set of phenotypes that include disease gene j, $p(i) \cap p(j)$ is the set of phenotypes that include both disease gene i and disease gene j, |p(i)| is the number of phenotypes in p(i), |p(j)| is the number of phenotypes in p(j), and |p(u)| is the number of known disease genes of phenotype u;
calculating the normalized value of the phenotype relevance $w_{ij}$ (formula not reproduced in this text), where $\max_{j'}(w_{ij'})$ is the largest phenotype relevance between gene i and any gene j' in the gene-phenotype double-layer network;
calculating the likelihood that gene $g_i$ in the gene-phenotype double-layer network is a disease gene of disease phenotype $p_t$ (formula not reproduced in this text), where the normalized relevance is that of genes $g_i$ and $g_j$, and $|p_t|$ is the size of the known disease gene set of disease phenotype $p_t$;
judging whether this likelihood is greater than γ; if so, setting the element in row i, column t of the gene-phenotype relationship network C to 1, thereby adding a connection between $g_i$ and $p_t$, where γ is a preset variable parameter.
4. The gene module analysis method according to claim 3, characterized in that step S3 specifically comprises:
S301, define the current iteration number as n and initialize n = 1;
S302, obtain the genes in the gene-phenotype two-layer network that are connected by an edge to any gene in the disease gene set $s_0$ associated with disease phenotype x, take them as candidate genes, denoted $c_0 = \{g_1, g_2, \ldots, g_i, \ldots, g_w\}$, and define the topological similarity of candidate gene i in the n-th iteration as [formula],
where k is the number of edges between gene i and the other genes in the gene-phenotype two-layer network, $k_s$ is the number of edges between gene i and the genes in $s_0$, and N is the number of genes in the network;
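The candidate selection in S302, together with the two edge counts k and $k_s$ that feed the topological similarity, can be sketched as below. The similarity formula itself is not reproduced in this text, so the sketch stops at the counts. The neighbor-set representation and the function name are assumptions made for illustration.

```python
def candidates_with_degrees(neighbors, s0):
    """Return {candidate: (k, k_s)} for every gene adjacent to the seed set s0.

    neighbors maps each gene to the set of genes it shares an edge with.
    k   = number of edges from the candidate to any gene,
    k_s = number of edges from the candidate to genes in s0.
    """
    seeds = set(s0)
    # Candidates are all neighbors of seed genes that are not seeds themselves.
    candidates = set().union(*(neighbors[g] for g in seeds)) - seeds
    return {g: (len(neighbors[g]), len(neighbors[g] & seeds)) for g in candidates}


neighbors = {
    "d1": {"a", "b"},
    "d2": {"a"},
    "a": {"d1", "d2", "c"},
    "b": {"d1"},
    "c": {"a"},
}
s0 = ["d1", "d2"]
degrees = candidates_with_degrees(neighbors, s0)
```

Gene "c" is not a candidate here because it has no edge to a seed gene, even though it is two steps away from both.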
S303, calculate the average topological similarity of all candidate genes in the n-th iteration, $ave\_Topology = \frac{1}{w}\sum_{i=1}^{w} Topology_i$, where w is the number of candidate genes in the current generation;
S304, compare the topological similarity of candidate gene i with the average $ave\_Topology$ over all candidate genes; if the topological similarity of candidate gene i is greater than the average, add candidate gene i to the gene set TB; otherwise, add candidate gene i to the gene set TS;
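Steps S303 and S304 amount to splitting the candidates around their mean topological similarity. A minimal sketch, assuming the similarities have already been computed and are held in a dictionary (the values below are invented for illustration):

```python
def split_by_average(topology):
    """Split candidates into TB (above the mean) and TS (at or below the mean).

    topology maps each candidate gene to its topological similarity.
    """
    ave = sum(topology.values()) / len(topology)
    TB = {g for g, v in topology.items() if v > ave}
    TS = {g for g, v in topology.items() if v <= ave}
    return ave, TB, TS


topology = {"g1": 0.5, "g2": 0.25, "g3": 0.75}
ave_topology, TB, TS = split_by_average(topology)
```

A candidate whose similarity equals the average goes to TS, since the text admits a gene to TB only when its value is strictly greater.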
S305, for each gene $g_i$ in TB, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that are connected to it by an edge, $ave1\_similar_i = \frac{1}{l_1}\sum_{g_j \in s_0,\ A_{ij}=1} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 1$ indicates that $g_i$ and $g_j$ are connected by an edge, and $l_1$ is the number of genes in $s_0$ connected by an edge to $g_i$;
S306, compare $similar_{ij}$ with $ave1\_similar_i$; if $similar_{ij} > ave1\_similar_i$ and $A_{ij} = 0$, set the element $A_{ij}$ in row i, column j of the gene network A to 1, thereby adding an edge between $g_i$ and $g_j$, where $g_i \in TB$ and $g_j \in s_0$;
S307, for each gene $g_i$ in TS, calculate the average functional similarity between $g_i$ and the genes in $s_0$ that are not connected to it by an edge, $ave2\_similar_i = \frac{1}{l_2}\sum_{g_j \in s_0,\ A_{ij}=0} similar_{ij}$, where $similar_{ij}$ is the functional similarity between gene $g_i$ and gene $g_j$, $A_{ij} = 0$ indicates that $g_i$ and $g_j$ are not connected by an edge, and $l_2$ is the number of genes in $s_0$ not connected by an edge to $g_i$;
S308, compare $similar_{ij}$ with $ave2\_similar_i$; if $similar_{ij} < ave2\_similar_i$ and $A_{ij} = 1$, set the element $A_{ij}$ in row i, column j of the gene network A to 0, thereby deleting the edge between $g_i$ and $g_j$, where $g_i \in TS$ and $g_j \in s_0$;
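The edge updates in S305 to S308 can be sketched as a single pass over TB and TS. This is an illustrative reading, not the patented implementation: the gene network is held as a set of undirected edges, the functional similarities are given as a lookup table with invented values, and a candidate whose average is undefined (no qualifying seed genes) is simply skipped, which is a choice of this sketch.

```python
def rewire(edges, similar, TB, TS, s0):
    """Return a new edge set after the S305-S308 updates.

    edges   : set of frozenset({gene, gene}) pairs (undirected gene network A)
    similar : dict mapping (candidate, seed) -> functional similarity
    """
    def linked(gi, gj):
        return frozenset((gi, gj)) in edges

    new_edges = set(edges)
    for gi in TB:
        nbrs = [gj for gj in s0 if linked(gi, gj)]
        if not nbrs:
            continue  # average undefined; skipping is an assumption
        ave1 = sum(similar[gi, gj] for gj in nbrs) / len(nbrs)
        for gj in s0:
            # S306: add an edge to an unconnected but highly similar seed gene.
            if not linked(gi, gj) and similar[gi, gj] > ave1:
                new_edges.add(frozenset((gi, gj)))
    for gi in TS:
        non_nbrs = [gj for gj in s0 if not linked(gi, gj)]
        if not non_nbrs:
            continue  # average undefined; skipping is an assumption
        ave2 = sum(similar[gi, gj] for gj in non_nbrs) / len(non_nbrs)
        for gj in s0:
            # S308: delete an edge to a connected but weakly similar seed gene.
            if linked(gi, gj) and similar[gi, gj] < ave2:
                new_edges.discard(frozenset((gi, gj)))
    return new_edges


s0 = ["d1", "d2"]
edges = {frozenset(("a", "d1")), frozenset(("b", "d1"))}
similar = {("a", "d1"): 0.4, ("a", "d2"): 0.9, ("b", "d1"): 0.1, ("b", "d2"): 0.5}
updated = rewire(edges, similar, TB={"a"}, TS={"b"}, s0=s0)
```

In the toy data, gene "a" gains an edge to "d2" because 0.9 exceeds its average of 0.4, and gene "b" loses its edge to "d1" because 0.1 falls below its average of 0.5.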
S309, obtain again the genes in the gene-phenotype two-layer network that are connected by an edge to any gene in the gene set $s_0$, and take them as candidate genes, the candidate gene set being denoted $c'_0$;
S310, calculate the topological similarity of each candidate gene i in $c'_0$ and obtain its normalized value $NTopology_i = \frac{Topology_i - Topology_{min}}{Topology_{max} - Topology_{min}}$, where $Topology_i$ is the topological similarity of candidate gene i, $Topology_{min}$ is the minimum topological similarity among all candidate genes, and $Topology_{max}$ is the maximum topological similarity among all candidate genes;
S311, calculate the semantic similarity of each candidate gene i in $c'_0$, [formula], where gene j is a gene in the disease gene set $s_0$ and z is the number of genes in $s_0$;
S312, normalize the semantic similarity of each candidate gene i in $c'_0$ to obtain $Nsimilar_i = \frac{Similar_i - Similar_{min}}{Similar_{max} - Similar_{min}}$, where $Similar_i$ is the semantic similarity of candidate gene i, $Similar_{min}$ is the minimum semantic similarity among all candidate genes, and $Similar_{max}$ is the maximum semantic similarity among all candidate genes;
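Both normalizations in S310 and S312 have the min-max form suggested by the minimum and maximum terms defined in the text. A minimal sketch, assuming that reading; returning zeros when all candidates share one value is a choice of this sketch, since the text does not cover that case:

```python
def min_max(values):
    """Min-max normalize a dict of gene -> value into the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Degenerate case (all values equal): not specified in the text.
        return {g: 0.0 for g in values}
    return {g: (v - lo) / (hi - lo) for g, v in values.items()}


topology = {"g1": 2.0, "g2": 4.0, "g3": 6.0}   # illustrative values
n_topology = min_max(topology)
```

The same helper can be applied unchanged to the semantic similarities, so that both terms of the later score are on a comparable scale.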
S313, candidate gene i appears in the disease gene sets of e disease phenotypes similar to disease phenotype x; the disease gene sets and disease gene counts of these e disease phenotypes, together with their phenotype similarities to disease phenotype x, are expressed as a set of triples $O = \{(s_1, c_1, t_1), \ldots, (s_k, c_k, t_k), \ldots, (s_e, c_e, t_e)\}$; calculate the degree of association $r_i$ between candidate gene i and disease phenotype x, [formula];
S314, calculate the score of each candidate gene i in the set $c'_0$: $score_i = Nsimilar_i - NTopology_i + r_i$;
S315, add the candidate gene i with the highest $score_i$ to the disease gene set $s_0$, set n = n + 1, and determine whether candidate genes remain in the gene-phenotype two-layer network; if so, execute step S302; otherwise, execute step S316;
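The scoring and selection in S314 and S315 can be sketched as follows, assuming the three per-candidate terms have already been computed. The input values are invented, and the association term r is passed in directly because its formula is not reproduced in this text.

```python
def best_candidate(n_similar, n_topology, r):
    """Score each candidate as Nsimilar - NTopology + r and return the top one."""
    scores = {g: n_similar[g] - n_topology[g] + r[g] for g in n_similar}
    best = max(scores, key=scores.get)
    return best, scores


n_similar = {"g1": 1.0, "g2": 0.5, "g3": 0.0}
n_topology = {"g1": 0.5, "g2": 0.0, "g3": 1.0}
r = {"g1": 0.25, "g2": 0.5, "g3": 0.0}

best, scores = best_candidate(n_similar, n_topology, r)

# One expansion step: the winner joins the disease gene set.
s0 = ["d1", "d2"]
s0.append(best)
```

The normalized topological term enters with a minus sign, so a lower topological value raises a candidate's score; ties are broken here by dictionary order, which the text does not specify.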
S316, when the expanded candidate gene set is no longer simultaneously associated with the significantly enriched GO (Gene Ontology) functional annotations related to disease phenotype x, the biological pathway genes, and the genes differentially expressed between disease phenotype x samples and normal samples, denote the current generation of candidate gene expansion as m, and output, as the target disease module of disease phenotype x, the candidate genes added to the disease gene set through generation m−1 together with the known disease genes connected to these added candidate genes.
CN201910267199.7A 2019-04-03 2019-04-03 Gene module analysis method Active CN110060730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910267199.7A CN110060730B (en) 2019-04-03 2019-04-03 Gene module analysis method

Publications (2)

Publication Number Publication Date
CN110060730A true CN110060730A (en) 2019-07-26
CN110060730B CN110060730B (en) 2022-11-01

Family

ID=67318273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910267199.7A Active CN110060730B (en) 2019-04-03 2019-04-03 Gene module analysis method

Country Status (1)

Country Link
CN (1) CN110060730B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704747A (en) * 2019-10-11 2020-01-17 安徽大学 Intelligent patent recommendation method based on deep semantic similarity
CN110706740A (en) * 2019-09-29 2020-01-17 长沙理工大学 Method, device and equipment for predicting protein function based on module decomposition
CN111540405A (en) * 2020-04-29 2020-08-14 新疆大学 Disease gene prediction method based on rapid network embedding
CN111951951A (en) * 2020-07-14 2020-11-17 西安电子科技大学 Disease module detection method and system based on connectivity significance
CN113947149A (en) * 2021-10-19 2022-01-18 大理大学 Similarity measurement method and device for gene module group, electronic device and storage medium
CN116343913A (en) * 2023-03-15 2023-06-27 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106055922A (en) * 2016-06-08 2016-10-26 哈尔滨工业大学深圳研究生院 Hybrid network gene screening method based on gene expression data
US20170242959A1 (en) * 2016-02-24 2017-08-24 Ucb Biopharma Sprl Method and system for quantifying the likelihood that a gene is casually linked to a disease

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
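黄俊恒 et al.: "Research on a disease-causing gene prediction method using a protein-phenotype network", 《计算机工程与应用》 (Computer Engineering and Applications) *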

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706740A (en) * 2019-09-29 2020-01-17 长沙理工大学 Method, device and equipment for predicting protein function based on module decomposition
CN110706740B (en) * 2019-09-29 2022-03-22 长沙理工大学 Method, device and equipment for predicting protein function based on module decomposition
CN110704747A (en) * 2019-10-11 2020-01-17 安徽大学 Intelligent patent recommendation method based on deep semantic similarity
CN111540405A (en) * 2020-04-29 2020-08-14 新疆大学 Disease gene prediction method based on rapid network embedding
CN111951951A (en) * 2020-07-14 2020-11-17 西安电子科技大学 Disease module detection method and system based on connectivity significance
CN111951951B (en) * 2020-07-14 2023-06-23 西安电子科技大学 Disease module detection method and system based on connected significance
CN113947149A (en) * 2021-10-19 2022-01-18 大理大学 Similarity measurement method and device for gene module group, electronic device and storage medium
CN116343913A (en) * 2023-03-15 2023-06-27 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network
CN116343913B (en) * 2023-03-15 2023-11-14 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network

Also Published As

Publication number Publication date
CN110060730B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN110060730A (en) Gene module analysis method
Huang et al. Shrink: a structural clustering algorithm for detecting hierarchical communities in networks
CN110532436B (en) Cross-social network user identity recognition method based on community structure
CN104008165B (en) Club detecting method based on network topology and node attribute
CN103325061B (en) A kind of community discovery method and system
CN109935332A (en) A kind of miRNA- disease association prediction technique based on double random walk models
Zhang et al. Protein complex prediction in large ontology attributed protein-protein interaction networks
Wang et al. Dynamic community detection based on network structural perturbation and topological similarity
Khan et al. Virtual community detection through the association between prime nodes in online social networks and its application to ranking algorithms
CN103034687B (en) A kind of relating module recognition methodss based on 2 class heterogeneous networks
CN112084373B (en) Graph embedding-based multi-source heterogeneous network user alignment method
CN103020163A (en) Node-similarity-based network community division method in network
CN110533253A (en) A kind of scientific research cooperative Relationship Prediction method based on Heterogeneous Information network
Botta et al. Finding network communities using modularity density
CN109165040A (en) A method of the code copy suspicion detection based on Random Forest model
Bro et al. Surname affinity in Santiago, Chile: A network-based approach that uncovers urban segregation
CN114969369A (en) Knowledge graph human cancer lethal prediction method based on mixed network and knowledge graph construction method
Mihelčić et al. A framework for redescription set construction
CN115620143A (en) New classical architecture style identification system, construction method and identification method
CN115440392A (en) Important super-edge identification method based on post-deletion Laplace matrix
CN105893481A (en) Method for decomposing relation among entities based on Markov clustering
CN102779241B (en) PPI (Point-Point Interaction) network clustering method based on artificial swarm reproduction mechanism
CN109101783B (en) Cancer network marker determination method and system based on probability model
CN106354886A (en) Method for screening nearest neighbor by using potential neighbor relation graph in recommendation system
CN107862073B (en) Web community division method based on node importance and separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant