CN107766697A - Pan-cancer gene expression and methylation association analysis method - Google Patents


Publication number
CN107766697A
Authority
CN
China
Prior art keywords
gene
module
expression
sub
methylates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710838076.5A
Other languages
Chinese (zh)
Inventor
杨利英
耿芳歌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201710838076.5A
Publication of CN107766697A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the technical field of bioinformatics and supervised learning, and discloses a pan-cancer method for analyzing the association between gene expression and methylation. Differentially expressed genes and differentially methylated sites are screened with a robust t-test that combines a linear model with empirical Bayes. The differential genes and methylation sites are then combined with a protein-protein interaction (PPI) network, which serves as the framework for finding significantly differential modules. Sub-network modules whose false discovery rate (FDR) exceeds a threshold are deleted, leaving sub-network modules with statistical significance. Finally, similar association patterns and commonly regulated genes between gene expression and methylation are identified. By incorporating the PPI network, the invention finds molecular pathways that influence the occurrence of cancer, and the sub-network modules obtained with the SPG module-discovery algorithm have biological meaning. The invention is used to find pathogenic patterns shared among multiple cancers, and is therefore well suited to the study of cancer pathogenesis.

Description

A pan-cancer gene expression and methylation association analysis method
Technical field
The invention belongs to the technical field of bioinformatics and supervised learning, and more particularly relates to a pan-cancer method for analyzing the association between gene expression and methylation.
Background art
Epigenetics is the branch of genetics that studies heritable changes in gene expression that occur without any change in the nucleotide sequence. There are many kinds of epigenetic modification, and DNA methylation is the most common. Research shows that aberrant DNA methylation is an important factor in the occurrence of cancer, and that under normal circumstances there is a certain association between DNA methylation and gene expression levels. Aberrant methylation of promoter regions is considered a hallmark of cancer and frequently leads to the silencing of tumor suppressor genes and the activation of oncogenes. Pan-cancer analysis is a recent breakthrough in cancer research: by finding pathogenic factors that are shared, or that differ, across tissue boundaries between different cancers, it explores the mechanism of cancer at the molecular level. At present, studies of the relationship between DNA methylation and gene expression in cancer are limited by the high-dimensional, small-sample nature of the data. The influence of methylation and gene expression on cancer that traditional association analysis derives for individual genes is incomplete, and the pathogenesis of cancer cannot be examined at the whole-genome level, because genes in an organism usually regulate and influence one another and many diseases result from the joint action of multiple genes. Furthermore, pan-cancer studies analyze similar association patterns between DNA methylation and gene expression across multiple cancers. Related studies show that similar methylation patterns exist among different tumor subgroups. For example, TP53 mutation can lead to highly malignant serous ovarian cancer, serous endometrial cancer and basal-like breast cancer, which share a common transcriptional signature involving the activation of similar oncogenic pathways.
In summary, the problem with the prior art is that the influence of methylation and gene expression on cancer that traditional association analysis derives for individual genes is incomplete, so the pathogenesis of complex diseases such as cancer cannot be examined at the whole-genome level.
Summary of the invention
In view of the problems of the prior art, the invention provides a pan-cancer gene expression and methylation association analysis method.
The invention is realized as follows. The pan-cancer gene expression and methylation association analysis method screens differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes; combines the differential genes and methylation sites with a protein-protein interaction (PPI) network, using the PPI network as the framework for finding significantly differential modules; deletes sub-network modules whose false discovery rate (FDR) exceeds a threshold, obtaining sub-network modules with statistical significance; and identifies similar association patterns and commonly regulated genes between gene expression and methylation.
Further, the pan-cancer gene expression and methylation association analysis method comprises the following steps:
Step 1, preprocessing of the gene expression data and the methylation data. Genes whose number of missing values and zeros exceeds 70% of the number of columns are deleted, then genes whose standard deviation is less than 1 are deleted, and the remaining gene names are converted to the corresponding NCBI Entrez gene IDs, which serve as the gene identifiers in the data;
Rows with missing data are deleted, every row is renamed with the corresponding probe name, and, based on the genes retained in the previous step, the methylation sites located in the promoters of the corresponding genes are selected;
Step 2, differentially expressed genes and differentially methylated sites are screened with a robust t-test that combines a linear model with empirical Bayes. The gene expression data and methylation data of normal and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
Step 3, the protein-protein interaction (PPI) network is used as the framework; the network contains 8434 genes and 303600 edges. The results of the previous step are integrated as weights on the PPI network, and significantly differential modules are obtained with the SPG community-discovery algorithm. For each gene g, the statistic $t_g^{(D)}$ (differential analysis result for the methylation data) and the statistic $t_g^{(R)}$ (differential analysis result for the gene expression data) are standardized so that they have the same variance, and are then combined:
$t_g = \{H(t_g^{(D)})H(-t_g^{(R)}) + H(-t_g^{(D)})H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have different signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
where $H(x)$ is the Heaviside step function, $H(x) = 1$ for $x > 0$ and $H(x) = 0$ otherwise. The weight of the edge connecting two genes g and h in the network is defined as $w_{gh} = (t_g + t_h)/2$. Using the spin-glass algorithm, the 100 genes with the largest $t_g$ are taken as seeds, and the community-discovery algorithm finds, for each seed, the sub-network module containing the seed gene that maximizes the sum of weights;
Step 4, the derived differential expression modules depend on the topology of the PPI network, so the statistical significance of the resulting sub-network modules is assessed with a Monte Carlo method. The node data in the network are permuted 1000 times and the sub-network modules are recomputed each time; sub-network modules whose false discovery rate (FDR) exceeds the threshold are deleted, leaving sub-network modules with statistical significance;
Step 5, the results for the six cancers are analyzed together. Each cancer yields multiple sub-network modules, from which similar association patterns and commonly regulated genes between gene expression and methylation are identified.
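The gene-level filtering in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the data layout (a dictionary of per-sample value lists with None for missing entries), the function name filter_genes and the toy gene names are all hypothetical, and the conversion to Entrez gene IDs is omitted.

```python
import statistics

def filter_genes(expr, max_missing_frac=0.70, min_sd=1.0):
    """Keep genes with enough observed, non-zero values and enough spread.

    expr maps a gene name to a list of per-sample values; None marks a
    missing value. Both thresholds come from Step 1 of the text.
    """
    kept = {}
    for gene, values in expr.items():
        n_bad = sum(1 for v in values if v is None or v == 0)
        if n_bad > max_missing_frac * len(values):
            continue  # too many missing or zero entries
        observed = [v for v in values if v is not None]
        if len(observed) < 2 or statistics.stdev(observed) < min_sd:
            continue  # nearly constant across samples
        kept[gene] = values
    return kept

# Toy data with placeholder gene names (not from the patent).
toy = {
    "GENE_A": [0, 0, 0, None, 5.0],       # 4 of 5 entries missing or zero
    "GENE_B": [2.0, 2.1, 2.0, 2.2, 2.1],  # standard deviation below 1
    "GENE_C": [1.0, 4.0, 2.5, 6.0, 0.5],  # passes both filters
}
filtered = filter_genes(toy)
```

Only GENE_C survives here: GENE_A fails the 70% rule and GENE_B fails the standard-deviation rule.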
Further, in step 2, the statistic is computed as:
$\tilde{t}_i = \dfrac{\hat{\beta}_i}{\tilde{s}_i \sqrt{v_i}}, \qquad \tilde{s}_i^2 = \dfrac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual sample variance, $v_i$ is the diagonal element of the covariance matrix when simple regression is fitted to the i-th gene, $d_i$ is the residual degrees of freedom of the linear model for the i-th gene, $d_0$ is the prior estimate of $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ at $d_0$ degrees of freedom. The prior estimates can be obtained from the assumed prior distribution and the gene expression and methylation data. Differentially expressed genes are screened according to the final t value of each gene; differentially methylated sites are screened by computing the t value of each gene methylation site.
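As a small worked example of the statistic defined by these symbols, the sketch below assumes the standard empirical-Bayes moderated form, in which the gene-wise variance is shrunk toward the prior as (d0·s0² + d·s²)/(d0 + d) and the coefficient is divided by the square root of that variance times v. The function name and the toy numbers are illustrative only, and the prior values d0 and s0² are taken as given rather than estimated.

```python
import math

def moderated_t(beta_hat, s2, v, d, d0, s0_sq):
    """Moderated t-statistic for one gene (assumed standard form).

    beta_hat : estimated group-difference coefficient
    s2       : residual variance s_i^2 of the gene
    v        : variance factor v_i of the coefficient
    d        : residual degrees of freedom d_i
    d0, s0_sq: prior degrees of freedom and prior variance (taken as given)
    """
    # Shrink the gene-wise variance toward the prior variance.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return beta_hat / math.sqrt(s2_post * v)

# Toy numbers: with d0 = 0 the statistic reduces to the ordinary t.
t_ordinary = moderated_t(beta_hat=2.0, s2=4.0, v=0.25, d=8, d0=0, s0_sq=1.0)
t_shrunk = moderated_t(beta_hat=2.0, s2=4.0, v=0.25, d=8, d0=8, s0_sq=1.0)
```

In this toy case the prior variance is smaller than the gene's own variance, so shrinkage lowers the variance estimate and raises the statistic.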
Further, in step 3, the relative difference between the degree of differential expression and the degree of differential methylation of each gene is taken as the value of the corresponding PPI node, and the average of two node values is taken as the weight of the edge between them. Significant sub-network modules in the network are found with the SPG algorithm, using the following Hamiltonian as the measure of association between modules:
$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j} (W_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j)$
where $\sigma_i$ is the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, and $p_{ij}$ is the probability that an edge exists between node i and node j; $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta, which is 1 when its two arguments are equal and 0 otherwise. $\gamma = 0.5$ is chosen so that the size of the resulting modules is between 10 and 100 genes.
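To make the roles of σ, W, p and γ concrete, the sketch below evaluates a spin-glass Hamiltonian for a given partition, assuming the usual community-detection form H = -Σ_{i≠j} (W_ij - γ p_ij) δ(σ_i, σ_j). The uniform null-model probability p = 0.5 and the 4-node toy network are assumptions made for illustration; lower energy indicates a better partition.

```python
def spin_glass_hamiltonian(W, p, sigma, gamma=0.5):
    """Energy of a partition: -sum over i != j of (W_ij - gamma * p_ij) * delta(sigma_i, sigma_j)."""
    n = len(sigma)
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and sigma[i] == sigma[j]:  # Kronecker delta equals 1
                energy -= W[i][j] - gamma * p[i][j]
    return energy

# Toy 4-node network: nodes 0-1 and nodes 2-3 are strongly linked.
W = [[0, 3, 0.1, 0],
     [3, 0, 0, 0.1],
     [0.1, 0, 0, 2],
     [0, 0.1, 2, 0]]
# Assumed null model: every pair of distinct nodes is linked with probability 0.5.
p = [[0 if i == j else 0.5 for j in range(4)] for i in range(4)]

good = spin_glass_hamiltonian(W, p, sigma=[0, 0, 1, 1])  # groups the strong pairs
bad = spin_glass_hamiltonian(W, p, sigma=[0, 1, 0, 1])   # groups the weak pairs
```

The partition that keeps strongly weighted pairs together has the lower energy, which is the behaviour a module-finding procedure exploits.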
Further, in step 4, to verify the statistical significance of each resulting sub-network module, the module value of module C is computed [formula not recoverable from the source text], where:
the weight of each edge in C is $w_{gh}$, and the set of nodes in the module is $V(C)$;
the MC method permutes the node values 1000 times to assess the significance of the resulting module value, and results with a significance level below 0.05 are selected.
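The permutation test can be sketched as below. Because the module-value formula is not available in this text, the sketch assumes a stand-in statistic, the mean edge weight (t_g + t_h)/2 over edges inside the module; the function names, the ring-shaped toy network and the add-one p-value correction are likewise assumptions, not details from the patent.

```python
import random

def module_score(nodes, edges, t):
    """Assumed module statistic: mean weight (t_g + t_h) / 2 over edges inside the module."""
    inside = [(g, h) for g, h in edges if g in nodes and h in nodes]
    if not inside:
        return 0.0
    return sum((t[g] + t[h]) / 2 for g, h in inside) / len(inside)

def permutation_p_value(nodes, edges, t, n_perm=1000, seed=0):
    """Fraction of node-value permutations scoring at least as high as the observed module."""
    rng = random.Random(seed)
    observed = module_score(nodes, edges, t)
    genes = list(t)
    values = [t[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign node statistics at random
        permuted = dict(zip(genes, values))
        if module_score(nodes, edges, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy network: 20 genes on a ring; genes 0-3 carry the only large statistics.
t = {g: (5.0 if g < 4 else 0.1) for g in range(20)}
edges = [(g, (g + 1) % 20) for g in range(20)]
p_val = permutation_p_value({0, 1, 2, 3}, edges, t)
```

A module would be retained when its p-value falls below 0.05, the significance level named in the text.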
The advantages and positive effects of the invention are as follows. For pan-cancer data, associations between DNA methylation and gene expression are mined at the whole-genome level; the invention is used to find pathogenic patterns shared among multiple cancers and is therefore well suited to the study of cancer pathogenesis. The differential analysis uses a robust t-test, with a linear model and empirical Bayes yielding highly reliable differentially methylated sites and differentially expressed genes. By incorporating the PPI network, molecular pathways that influence the occurrence of cancer are found, and sub-network modules are obtained with the SPG module-discovery algorithm. A comparison of the results of the method of the invention with those of an existing method shows that the method of the invention is clearly improved in sensitivity and specificity.
Brief description of the drawings
Fig. 1 is a flow chart of the pan-cancer gene expression and methylation association analysis method provided by an embodiment of the invention.
Fig. 2 shows the methylation and gene expression association patterns of genes in the six cancers, provided by an embodiment of the invention.
Fig. 3 compares the method of the invention (FEM) with an existing method (BioNet) in terms of the sensitivity and specificity of the results, provided by an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
The invention is an association analysis method based on the robust t-test and FEM that identifies similar pathways and shared associated genes affecting multiple cancers. The FEM method works on gene expression and methylation data to find molecular pathways and differential expression modules whose expression changes in cancer.
The application principle of the invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the pan-cancer gene expression and methylation association analysis method provided by an embodiment of the invention comprises the following steps:
S101: Preprocess the gene expression data and the methylation data. Genes whose number of missing values and zeros exceeds 70% of the number of columns are deleted, then genes whose standard deviation is less than 1 are deleted, and the remaining gene names are converted to the corresponding NCBI Entrez gene IDs. Rows with missing data are deleted, every row is renamed with the corresponding probe name, and, based on the genes retained in the previous step, the methylation sites located in the promoters of the corresponding genes are selected;
S102: Screen differentially expressed genes and differentially methylated sites using a robust t-test that combines a linear model with empirical Bayes. The gene expression data and methylation data of normal samples and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
S103: Combine the resulting differential genes and methylation sites with the protein-protein interaction (PPI) network and find significantly differential modules using the PPI network as the framework; the network contains 8434 genes and 303600 edges. The relative difference between the degree of differential expression and the degree of differential methylation of each gene is taken as the value of the corresponding PPI node, and the average of two node values is taken as the weight of the edge between them. Significantly differential modules are obtained with the SPG community-discovery algorithm;
S104: Screen sub-network modules with statistical significance. The derived differential expression modules depend on the topology of the PPI network, so the statistical significance of the resulting sub-network modules is assessed with a Monte Carlo method. The node data in the network are permuted 1000 times and the sub-network modules are recomputed each time; sub-network modules whose false discovery rate (FDR) exceeds the threshold are deleted, leaving sub-network modules with statistical significance;
S105: Analyze the results for the six cancers together. Each cancer yields multiple sub-network modules, from which similar association patterns and commonly regulated genes between gene expression and methylation are identified.
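Step S104 removes modules whose false discovery rate exceeds a threshold but does not say how the FDR is computed. One common choice is the Benjamini-Hochberg adjustment, sketched below purely as an assumption; the module names, the p-values and the 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical module p-values, e.g. from a permutation test.
module_p = {"module_1": 0.001, "module_2": 0.02, "module_3": 0.04, "module_4": 0.5}
names = list(module_p)
q_values = benjamini_hochberg([module_p[n] for n in names])
kept = [n for n, q in zip(names, q_values) if q <= 0.05]  # assumed FDR threshold
```

With these toy values module_3 is dropped even though its raw p-value is below 0.05, which illustrates why an FDR cutoff is stricter than an unadjusted one.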
In step S102, the statistic is computed as:
$\tilde{t}_i = \dfrac{\hat{\beta}_i}{\tilde{s}_i \sqrt{v_i}}, \qquad \tilde{s}_i^2 = \dfrac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual sample variance, $v_i$ is the diagonal element of the covariance matrix when simple regression is fitted to the i-th gene, $d_i$ is the residual degrees of freedom of the linear model for the i-th gene, $d_0$ is the prior estimate of $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ at $d_0$ degrees of freedom. The prior estimates can be obtained from the assumed prior distribution and the gene expression and methylation data. Differentially expressed genes are screened according to the final t value of each gene. Similarly, differentially methylated sites are screened by computing the t value of each gene methylation site.
In step S103, significantly differential modules are obtained with the SPG community-discovery algorithm. For each gene g, the statistic $t_g^{(D)}$ (the differential analysis result for the methylation data) and the statistic $t_g^{(R)}$ (the differential analysis result for the gene expression data) are standardized so that they have the same variance. Because gene expression and methylation are usually negatively correlated, the two statistics are combined as:
$t_g = \{H(t_g^{(D)})H(-t_g^{(R)}) + H(-t_g^{(D)})H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have different signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
where $H(x)$ is the Heaviside step function, $H(x) = 1$ for $x > 0$ and $H(x) = 0$ otherwise. The weight of the edge connecting two genes g and h in the network is then defined as $w_{gh} = (t_g + t_h)/2$.
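The node statistic and edge weight can be illustrated in a few lines. The sketch assumes H is the Heaviside step function (1 for positive arguments, 0 otherwise) and that the two input statistics have already been scaled to a common variance; the gene names and numbers are made up.

```python
def heaviside(x):
    """Step function: 1 for x > 0, otherwise 0."""
    return 1 if x > 0 else 0

def node_statistic(t_meth, t_expr):
    """Combine the methylation (D) and expression (R) statistics of one gene."""
    anti = heaviside(t_meth) * heaviside(-t_expr) + heaviside(-t_meth) * heaviside(t_expr)
    return anti * abs(t_meth - t_expr)

def edge_weight(t_g, t_h):
    """Weight of the PPI edge between genes g and h."""
    return (t_g + t_h) / 2

# Toy statistics (made up): (methylation t, expression t) per gene.
stats = {"G1": (3.0, -2.0), "G2": (-1.5, 2.5), "G3": (2.0, 1.0)}
t = {g: node_statistic(d, r) for g, (d, r) in stats.items()}
w_12 = edge_weight(t["G1"], t["G2"])
```

G1 and G2 have opposite-sign statistics and receive non-zero values (5.0 and 4.0), while G3 has same-sign statistics and receives 0, so edges touching G3 are down-weighted.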
Using the spin-glass algorithm (SPG), the 100 genes with the largest $t_g$ are taken as seeds, and a community-discovery algorithm finds, for each seed, the sub-network module containing the seed gene that maximizes the sum of weights. The input to the SPG algorithm is the weighted adjacency matrix W. The algorithm is used for two reasons. First, the inferred modules should be allowed to overlap to an appropriate degree. Second, the algorithm should avoid highly overlapping results. The SPG algorithm is a greedy algorithm that finds significant modules from specified seed nodes. Not every seed node yields a module, however, because some seeds may be isolated nodes. The size of the resulting modules can be controlled by adjusting the parameter γ (0≤γ≤1) of the SPG algorithm, which appears in the Hamiltonian:
$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j} (W_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j)$
Here $\sigma_i$ is the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, and $p_{ij}$ is the probability that an edge exists between node i and node j. $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta, which is 1 when its two arguments are equal and 0 otherwise.
γ = 0.5 is set so that the size of the resulting modules is between 10 and 100 genes. Studies that apply unsupervised linear dimension-reduction methods (PCA and ICA) to gene expression data have found that the number of genes in a co-expressed gene module is usually about 1% of the number of genes studied. The number of genes selected after the differential analysis in the experiment is about 8000, so a resulting gene module should contain about 80 genes. Research also shows that with γ = 0.5 the resulting modules have minimal overlap.
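Seed selection as described above can be sketched as follows: rank genes by t_g, take the top n, and separate out seeds that are isolated in the PPI network, since those cannot grow into a module. The adjacency representation (a dictionary of neighbour sets), the function name and the toy values are assumptions for illustration; the module-growing step itself is not shown.

```python
def select_seeds(t, neighbours, n_seeds=100):
    """Take the n_seeds genes with the largest t_g and split off isolated ones."""
    ranked = sorted(t, key=lambda g: t[g], reverse=True)[:n_seeds]
    usable = [g for g in ranked if neighbours.get(g)]        # has at least one PPI neighbour
    isolated = [g for g in ranked if not neighbours.get(g)]  # cannot yield a module
    return usable, isolated

# Toy node statistics and PPI neighbourhoods (made up).
t = {"G1": 5.0, "G2": 4.0, "G3": 0.0, "G4": 6.5}
neighbours = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2"}, "G4": set()}
usable, isolated = select_seeds(t, neighbours, n_seeds=3)
```

Here G4 has the largest statistic but no neighbours, so it is reported as isolated and only G1 and G2 remain as usable seeds.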
In step S104, the statistical significance of each resulting sub-network module is verified as follows:
The module value of module C is computed [formula not recoverable from the source text], where the weight of each edge in C is $w_{gh}$ and the set of nodes in the module is $V(C)$. The MC method permutes the node values 1000 times, after which the significance of the resulting module value can be assessed; results with a significance level below 0.05 are selected.
The application principle of the invention is further described below with reference to a specific embodiment.
The pan-cancer gene expression and methylation association analysis method provided by an embodiment of the invention comprises the following steps:
Step 1: Preprocessing of the methylation data and gene expression data of the six cancers
The invention mainly uses pan-cancer project data provided by the TCGA (The Cancer Genome Atlas) database, covering six cancers: head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), breast cancer (BRCA), lung squamous cell carcinoma (LUSC), lung adenocarcinoma (LUAD) and endometrial carcinoma (UCEC). For the methylation data of each cancer, rows containing NA values are removed. For the gene expression data, rows without a gene name and rows with variance less than 1 are removed, and the data are then log-transformed.
Preprocessing results for the methylation data: the raw data have 20530 rows; after preprocessing, UCEC retains 15469 rows, BRCA 17489 rows, KIRC 17318 rows, LUSC 17113 rows, LUAD 17386 rows and HNSC 17553 rows.
Step 2: Screening of differentially methylated sites and differentially expressed genes in the six cancers
The robust t-test is used to compute the methylation t value and the gene expression t value of each gene.
Step 3: Obtaining significant sub-network modules
The PPI network is built with the differential analysis results as input data, namely $t_g^{(D)}$ (the differential analysis result for the methylation data) and $t_g^{(R)}$ (the differential analysis result for the gene expression data) of each gene. The weight of the edge connecting two genes g and h in the PPI network is defined as $w_{gh} = (t_g + t_h)/2$. The six cancers are analyzed separately, and the SPG algorithm is used to find differential sub-modules whose significance is higher than that of the rest of the network. To verify the statistical significance of the resulting sub-network modules, a statistical test is carried out with the MC method; the following table lists the significant sub-network modules with a p value below 0.05.
Table 1. Sub-network modules of the six cancers
The genes in the six cancers that show one of the four significant association patterns (high methylation with low expression, low methylation with high expression, high methylation with high expression, low methylation with low expression) are summarized in Table 2. Here MLEL denotes low methylation with low expression, MLEH low methylation with high expression, MHEL high methylation with low expression, and MHEH high methylation with high expression; the same applies below. The values in the table are the number of genes of each cancer under the corresponding association pattern.
Table 2. Number of genes in the four association patterns
Pattern   BRCA   KIRC   UCEC   LUSC   HNSC   LUAD
MLEL      25     22     19     34     31     32
MLEH      25     97     29     24     45     30
MHEL      88     111    18     85     23     56
MHEH      32     90     8      57     21     24
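The four pattern labels can be assigned from the signs of the two per-gene statistics, as in the sketch below. It assumes that a positive t value means higher methylation or expression in cancer than in normal tissue, and it omits the significance filtering that precedes the counts in Table 2; the gene names and statistics are toy values, not data from the table.

```python
from collections import Counter

def association_pattern(t_meth, t_expr):
    """Label a gene by the signs of its methylation and expression statistics."""
    meth = "MH" if t_meth > 0 else "ML"  # higher vs lower methylation in cancer
    expr = "EH" if t_expr > 0 else "EL"  # higher vs lower expression in cancer
    return meth + expr

# Toy statistics (made up): (methylation t, expression t) per gene.
stats = {"G1": (3.0, -2.0), "G2": (-1.5, 2.5), "G3": (2.0, 1.0), "G4": (2.2, -0.7)}
counts = Counter(association_pattern(d, r) for d, r in stats.values())
```

Counting the labels per cancer gives one column of a table shaped like Table 2.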
The IntOGen database (http://www.intogen.org/search) was queried for the driver genes of the six cancers. SETD2, a disease gene shared by KIRC and LUSC, BMPR2, a disease gene shared by LUAD and BRCA, and LRP6, a disease gene shared by KIRC and LUSC, are all present in the database. The GeneCards database collects a large number of genes related to diseases such as cancer. Querying this database for genes related to the six cancers and comparing them with the results obtained here shows that LUSC has 67, LUAD has 50, KIRC has 73 and HNSC has 41 disease genes in common with the database; the results are given in Table 3.
Table 3. Genes shared between the GeneCards database and the results of this experiment
Some of the genes that show the same significant pattern (high methylation with low expression, low methylation with high expression, high methylation with high expression, or low methylation with low expression) across the six cancers are listed in Table 4.
Table 4. DNA methylation and gene expression association patterns of the six cancers
The experiment finds that the two cancers BRCA and LUAD share the largest number of genes with the same association pattern; the number of genes with the same association pattern among BRCA, LUAD and UCEC is the next largest.
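Comparing cancers by the number of genes that carry the same association pattern amounts to a pairwise intersection, sketched below. The pattern assignments use placeholder gene names and are not results from Table 4; the function name is hypothetical.

```python
from itertools import combinations

def shared_pattern_genes(patterns_by_cancer):
    """For every pair of cancers, list the genes assigned the same pattern in both."""
    shared = {}
    for a, b in combinations(sorted(patterns_by_cancer), 2):
        pa, pb = patterns_by_cancer[a], patterns_by_cancer[b]
        shared[(a, b)] = sorted(g for g in pa.keys() & pb.keys() if pa[g] == pb[g])
    return shared

# Toy assignments with placeholder gene names (not results from the patent).
patterns = {
    "BRCA": {"G1": "MHEL", "G2": "MLEH", "G3": "MHEH"},
    "LUAD": {"G1": "MHEL", "G2": "MLEH", "G4": "MLEL"},
    "UCEC": {"G1": "MHEL", "G2": "MHEL"},
}
shared = shared_pattern_genes(patterns)
```

Ranking the pairs by the length of each list identifies which cancers share the most genes with an identical pattern.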
The data in Table 4 show that a negative correlation between the degree of methylation change and the change in gene expression is present in most of the cancers.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A pan-cancer gene expression and methylation association analysis method, characterized in that the method screens differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes; combines the differential genes and methylation sites with a protein-protein interaction (PPI) network, using the PPI network as the framework for finding significantly differential modules; deletes sub-network modules whose false discovery rate (FDR) exceeds a threshold, obtaining sub-network modules with statistical significance; and identifies similar association patterns and commonly regulated genes between gene expression and methylation.
2. The pan-cancer gene expression and methylation association analysis method according to claim 1, characterized in that the method comprises the following steps:
Step 1, preprocessing of the gene expression data and the methylation data. Genes whose number of missing values and zeros exceeds 70% of the number of columns are deleted, then genes whose standard deviation is less than 1 are deleted, and the remaining gene names are converted to the corresponding NCBI Entrez gene IDs, which serve as the gene identifiers in the data;
Rows with missing data are deleted, every row is renamed with the corresponding probe name, and, based on the genes retained in the previous step, the methylation sites located in the promoters of the corresponding genes are selected;
Step 2, differentially expressed genes and differentially methylated sites are screened with a robust t-test that combines a linear model with empirical Bayes. The gene expression data and methylation data of normal and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
Step 3, the protein-protein interaction (PPI) network is used as the framework; the network contains 8434 genes and 303600 edges. The results of the previous step are integrated as weights on the PPI network, and significantly differential modules are obtained with the SPG community-discovery algorithm. For each gene g, the statistic $t_g^{(D)}$ (differential analysis result for the methylation data) and the statistic $t_g^{(R)}$ (differential analysis result for the gene expression data) are standardized so that they have the same variance, and are then combined:
$t_g = \{H(t_g^{(D)})H(-t_g^{(R)}) + H(-t_g^{(D)})H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
if $t_g^{(D)}$ and $t_g^{(R)}$ have different signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
where $H(x)$ is the Heaviside step function, $H(x) = 1$ for $x > 0$ and $H(x) = 0$ otherwise. The weight of the edge connecting two genes g and h in the network is defined as $w_{gh} = (t_g + t_h)/2$. Using the spin-glass algorithm, the 100 genes with the largest $t_g$ are taken as seeds, and the community-discovery algorithm finds, for each seed, the sub-network module containing the seed gene that maximizes the sum of weights;
Step 4, the derived differential expression modules depend on the topology of the PPI network, so the statistical significance of the resulting sub-network modules is assessed with a Monte Carlo method. The node data in the network are permuted 1000 times and the sub-network modules are recomputed each time; sub-network modules whose false discovery rate (FDR) exceeds the threshold are deleted, leaving sub-network modules with statistical significance;
Step 5, the results for the six cancers are analyzed together. Each cancer yields multiple sub-network modules, from which similar association patterns and commonly regulated genes between gene expression and methylation are identified.
3. The pan-cancer gene expression and methylation association analysis method according to claim 2, characterized in that in step 2 the statistic is computed as:
$\tilde{t}_i = \dfrac{\hat{\beta}_i}{\tilde{s}_i \sqrt{v_i}}, \qquad \tilde{s}_i^2 = \dfrac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual sample variance, $v_i$ is the diagonal element of the covariance matrix when simple regression is fitted to the i-th gene, $d_i$ is the residual degrees of freedom of the linear model for the i-th gene, $d_0$ is the prior estimate of $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ at $d_0$ degrees of freedom. The prior estimates can be obtained from the assumed prior distribution and the gene expression and methylation data. Differentially expressed genes are screened according to the final t value of each gene; differentially methylated sites are screened by computing the t value of each gene methylation site.
4. The pan-cancer gene expression and methylation association analysis method according to claim 2, characterized in that in step 3 the relative difference between the degree of differential expression and the degree of differential methylation of each gene is taken as the value of the corresponding PPI node, and the average of two node values is taken as the weight of the edge between them; significant sub-network modules in the network are found with the SPG algorithm, using the following Hamiltonian as the measure of association between modules:
$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j} (W_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j)$
where $\sigma_i$ is the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, and $p_{ij}$ is the probability that an edge exists between node i and node j; $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta, which is 1 when its two arguments are equal and 0 otherwise; $\gamma = 0.5$ is chosen so that the size of the resulting modules is between 10 and 100 genes.
5. The pan-cancer gene expression and methylation association analysis method according to claim 2, characterized in that in step four: to verify the statistical significance of each sub-network module obtained, the module value of module C is calculated according to the following formula:
where $w_{gh}$ is the weight of each edge in C and $V(C)$ is the set of nodes in the module;
a Monte Carlo (MC) method permutes the node values 1000 times to assess the significance of the module value obtained, and results with a significance level below 0.05 are selected.
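The permutation test in claim 5 can be sketched as below. The module-value formula is not legible in this text, so the sketch assumes, purely for illustration, that the module value is the sum of the edge weights $w_{gh}$ over edges inside $V(C)$ (an average would be another plausible choice). The network, node names and values are hypothetical, and the p-value is taken as the fraction of permutations whose module value reaches the observed one.

```python
import random

def module_value(module_edges, node_value):
    # Assumed definition: sum of edge weights, each weight being the
    # average of the two node values it joins.
    return sum((node_value[g] + node_value[h]) / 2 for g, h in module_edges)

def permutation_p_value(module_edges, node_value, n_perm=1000, seed=0):
    """Monte Carlo p-value from permuting node values over the network."""
    rng = random.Random(seed)
    observed = module_value(module_edges, node_value)
    nodes = list(node_value)
    values = [node_value[name] for name in nodes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign node values at random
        permuted = dict(zip(nodes, values))
        if module_value(module_edges, permuted) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical network: three high-valued genes plus 17 background genes.
node_value = {"a": 5.0, "b": 4.0, "c": 4.5}
node_value.update({f"bg{k}": 0.1 for k in range(17)})
module_edges = [("a", "b"), ("b", "c"), ("a", "c")]

observed = module_value(module_edges, node_value)
p = permutation_p_value(module_edges, node_value, n_perm=1000)
keep_module = p < 0.05  # selection threshold from claim 5
```

Permuting values over the whole network, rather than only within the module, is what makes the test meaningful: it asks how often a module of this shape would score as high if node values were unrelated to network position.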
CN201710838076.5A 2017-09-18 2017-09-18 Pan-cancer gene expression and methylation association analysis method Pending CN107766697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710838076.5A CN107766697A (en) 2017-09-18 2017-09-18 A kind of general cancer gene expression and the association analysis method that methylates

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710838076.5A CN107766697A (en) 2017-09-18 2017-09-18 A kind of general cancer gene expression and the association analysis method that methylates

Publications (1)

Publication Number Publication Date
CN107766697A true CN107766697A (en) 2018-03-06

Family

ID=61265614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710838076.5A Pending CN107766697A (en) 2017-09-18 2017-09-18 Pan-cancer gene expression and methylation association analysis method

Country Status (1)

Country Link
CN (1) CN107766697A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740651A (en) * 2016-03-07 2016-07-06 吉林大学 Construction method for specific cancer differential expression gene regulation and control network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GORDON K. SMYTH ET AL.: "Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments", 《STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY 3》 *
JAMES WEST ET AL.: "An integrative network algorithm identifies age-associated differential methylation interactome hotspots targeting stem-cell differentiation pathways", 《SCIENTIFIC REPORTS》 *
YINMING JIAO ET AL.: "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control", 《ORIGINAL PAPER》 *
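殷黎洋: "Genome-wide association patterns between DNA methylation and gene expression in breast cancer", 《China Master's Theses Full-text Database, Medicine and Health Sciences》 *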

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109411023A (en) * 2018-09-30 2019-03-01 华中农业大学 Method for mining inter-gene interaction relations based on Bayesian network inference
CN109411023B (en) * 2018-09-30 2022-03-18 华中农业大学 Method for mining inter-gene interaction relations based on Bayesian network inference
CN110349633A (en) * 2019-07-12 2019-10-18 大连海事大学 Method for screening radiation biomarkers and predicting radiation dose based on radiation-response biological pathways
CN110428866A (en) * 2019-07-23 2019-11-08 哈尔滨工业大学 Cancer-related pathway identification method based on network integration of multi-omics data
CN112662763A (en) * 2020-03-10 2021-04-16 博尔诚(北京)科技有限公司 Probe composition for detecting common amphoteric cancers
CN111640468A (en) * 2020-05-18 2020-09-08 天士力国际基因网络药物创新中心有限公司 Method for screening disease-related protein based on complex network
CN112599201A (en) * 2020-12-15 2021-04-02 中国人民解放军军事科学院军事医学研究院 System for analyzing infection path between virus receptor and human target organ, and electronic device
CN112951418A (en) * 2021-05-17 2021-06-11 臻和(北京)生物科技有限公司 Method and device for evaluating methylation of linked regions based on liquid biopsy, terminal equipment and storage medium
CN114373502A (en) * 2022-01-07 2022-04-19 吉林大学第一医院 Tumor data analysis system based on methylation
CN114373502B (en) * 2022-01-07 2022-12-06 吉林大学第一医院 Tumor data analysis system based on methylation
CN115019884A (en) * 2022-05-13 2022-09-06 华东交通大学 Network marker identification method integrating multi-omics data
CN115019884B (en) * 2022-05-13 2023-11-03 华东交通大学 Network marker identification method integrating multi-omics data

Similar Documents

Publication Publication Date Title
CN107766697A (en) Pan-cancer gene expression and methylation association analysis method
Gawad et al. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics
Hackett et al. Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps
CN109326316A (en) Construction method and application of a multi-layer network model of cancer-related SNP, gene, miRNA and protein interactions
Silver et al. Fast identification of biological pathways associated with a quantitative trait using group lasso with overlaps
US20200239965A1 (en) Source of origin deconvolution based on methylation fragments in cell-free dna samples
Bhattacharyya et al. MicroRNA signatures highlight new breast cancer subtypes
CN108319984B (en) Construction method and prediction method for a model predicting woody plant leaf morphological and photosynthetic characteristics based on DNA methylation level
KR101949286B1 (en) Method and system for tailored anti-cancer therapy based on the information of genomic sequence variant and survival of cancer patient
CN108804876B (en) Method and apparatus for calculating purity and chromosome ploidy of cancer sample
Hopp et al. Portraying the expression landscapes of cancer subtypes: A case study of glioblastoma multiforme and prostate cancer
Zeng et al. Cancer classification and pathway discovery using non-negative matrix factorization
KR20220086603A (en) Cancer classification using tissue-of-origin thresholding
CN115631789A (en) Pangenome-based group joint variation detection method
CN113192556B (en) Small-sample-based genotype and phenotype association analysis method for multi-omics data
CN107368702A (en) Method for constructing miRNA regulatory networks
CN115019884B (en) Network marker identification method integrating multi-omics data
Tian et al. Sparse group selection on fused lasso components for identifying group-specific DNA copy number variations
Li et al. Ensemble-based multi-objective clustering algorithms for gene expression data sets
Wu et al. Network-based method for inferring cancer progression at the pathway level from cross-sectional mutation data
Elsheikh et al. Relating connectivity changes in brain networks to genetic information in Alzheimer patients
Chakrapani et al. Effective utilisation of influence maximization technique for the identification of significant nodes in breast cancer gene networks
Nemes et al. A diagnostic algorithm to identify paired tumors with clonal origin
Wu et al. Network‐based method for detecting dysregulated pathways in glioblastoma cancer
Fang et al. Joint detection of associations between DNA methylation and gene expression from multiple cancers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180306