CN107766697A - A pan-cancer association analysis method for gene expression and methylation - Google Patents
A pan-cancer association analysis method for gene expression and methylation
- Publication number
- CN107766697A
- Authority
- CN
- China
- Prior art keywords
- gene
- module
- expression
- sub
- methylation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G: PHYSICS
- G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention belongs to the fields of bioinformatics and supervised learning and discloses a pan-cancer association analysis method for gene expression and methylation. Differentially expressed genes and differentially methylated sites are screened with a robust t-test that combines a linear model with empirical Bayes; the differential genes and methylation sites are combined with a protein-protein interaction (PPI) network, which serves as the scaffold for finding significantly differential modules; submodules whose false discovery rate (FDR) exceeds a threshold are deleted to obtain statistically significant submodules; and similar association patterns and commonly regulated genes between gene expression and methylation are identified. By incorporating the PPI network, the invention finds molecular pathways that influence cancer development, and by using the SPG module-finding algorithm it obtains submodules with biological meaning. The invention is used to find shared pathogenic patterns among multiple cancers and can therefore be better applied to studying cancer pathogenesis.
Description
Technical field
The invention belongs to the fields of bioinformatics and supervised learning, and more particularly relates to a pan-cancer association analysis method for gene expression and methylation.
Background
Epigenetics is a subdiscipline of genetics that studies heritable changes in gene expression that occur without any change in the nucleotide sequence. There are many kinds of epigenetic modification, and DNA methylation is the most common. Studies show that aberrant DNA methylation is an important factor in cancer development, and that DNA methylation and gene expression levels are normally associated. Aberrant methylation of promoter regions is regarded as a hallmark of cancer and frequently leads to the silencing of tumor suppressor genes and the activation of oncogenes. Pan-cancer analysis is a new breakthrough in current cancer research: by finding the pathogenic factors that are shared by, or differ among, cancers across tissue boundaries, it explores the mechanisms of carcinogenesis at the molecular level. Current studies of the relationship between DNA methylation and gene expression in cancer face data that are high-dimensional with small sample sizes, and the conclusions that traditional association analysis draws about how the methylation and expression of an individual gene affect cancer are incomplete. Such methods cannot study cancer pathogenesis at the whole-genome level, because genes in an organism usually regulate and interact with one another, and many diseases result from the joint action of multiple genes. Furthermore, pan-cancer research analyzes similar association patterns between DNA methylation and gene expression across multiple cancers. Related studies show that similar methylation patterns exist among different tumor subgroups. For example, TP53 mutations can cause serous ovarian cancer, highly malignant serous endometrial cancer, and basal-like breast cancer, which share common transcriptional features involving the activation of similar oncogenic pathways.
In summary, the problem with the prior art is that the conclusions drawn by traditional association analysis about the effect of an individual gene's methylation and expression on cancer are incomplete, so the pathogenesis of complex diseases such as cancer cannot be studied at the whole-genome level.
Summary of the invention
To address the problems in the prior art, the invention provides a pan-cancer association analysis method for gene expression and methylation.
The invention is implemented as follows. In the pan-cancer association analysis method for gene expression and methylation, differentially expressed genes and differentially methylated sites are screened with a robust t-test that combines a linear model with empirical Bayes; the differential genes and methylation sites are combined with a protein-protein interaction network, which serves as the scaffold for finding significantly differential modules; submodules whose false discovery rate (FDR) exceeds a threshold are deleted to obtain statistically significant submodules; and similar association patterns and commonly regulated genes between gene expression and methylation are identified.
Further, the pan-cancer association analysis method for gene expression and methylation comprises the following steps:
Step 1: Preprocess the gene expression and methylation data. For the gene expression data, delete genes whose number of missing and null values exceeds 70% of the number of columns, then delete genes whose standard deviation is below 1, and convert the remaining gene names to the corresponding NCBI Entrez gene IDs, which serve as the gene identifiers in the data;
for the methylation data, delete rows with missing data, rename each row with the corresponding probe name, and, according to the genes retained in the previous step, screen the methylation sites located in the promoter regions of those genes;
Step 2: Screen differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes; the gene expression and methylation data of the normal and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
Step 3: Using the protein-protein interaction (PPI) network, which contains 8434 genes and 303600 edges, as the scaffold, take the integrated results of the previous step as the weights of the PPI network and obtain significantly differential modules with the SPG community-discovery algorithm. The methylation differential-analysis statistic $t_g^{(D)}$ and the gene expression differential-analysis statistic $t_g^{(R)}$ of each gene are normalized to the same variance and then combined as
$$t_g = \{H(t_g^{(D)})\,H(-t_g^{(R)}) + H(-t_g^{(D)})\,H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$$
If $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
If $t_g^{(D)}$ and $t_g^{(R)}$ have opposite signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
where $H(\cdot)$ is the Heaviside step function. The weight of the edge connecting genes g and h in the network is defined as $w_{gh} = (t_g + t_h)/2$. Using the spin-glass algorithm, the 100 genes with the largest $t_g$ values are taken as seeds, and the community-discovery algorithm finds the submodules that contain the seed genes and maximize the sum of the weights;
Step 4: Because the derived differential expression modules depend on the PPI network topology, assess the statistical significance of the resulting submodules with a Monte Carlo method: permute the node data in the network 1000 times and recompute the submodules, delete the submodules whose false discovery rate (FDR) exceeds the threshold, and keep the statistically significant submodules;
Step 5: Comprehensively analyze the results for the six cancers, each of which yields multiple submodules, and identify the similar association patterns and commonly regulated genes between gene expression and methylation in each cancer.
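The Step 1 filters on the expression matrix can be sketched as below. This is a minimal illustration in plain Python, assuming the matrix is held as a dict mapping gene name to a list of per-sample values, with `None` marking a missing or null entry. The function name `filter_genes` and the choice to compute the standard deviation over non-missing values only are illustrative assumptions, not details from the patent; the Entrez ID conversion is omitted.

```python
import statistics
from typing import Dict, List, Optional

Row = List[Optional[float]]  # per-sample values; None marks a missing/null entry


def filter_genes(data: Dict[str, Row],
                 max_missing_frac: float = 0.70,
                 min_sd: float = 1.0) -> Dict[str, Row]:
    """Hypothetical Step 1 filter for the gene expression matrix."""
    kept: Dict[str, Row] = {}
    for gene, values in data.items():
        n_missing = sum(v is None for v in values)
        # Delete genes whose missing/null count exceeds 70% of the number of columns.
        if n_missing > max_missing_frac * len(values):
            continue
        observed = [v for v in values if v is not None]
        # Assumption: the SD is taken over observed values only, and a gene
        # needs at least two observations for a sample SD to exist.
        if len(observed) < 2 or statistics.stdev(observed) < min_sd:
            continue
        kept[gene] = values
    return kept


if __name__ == "__main__":
    toy = {
        "GENE_A": [1.0, 5.0, 9.0, None],    # 25% missing, SD = 4.0 -> kept
        "GENE_B": [None, None, None, 2.0],  # 75% missing -> deleted
        "GENE_C": [3.0, 3.1, 2.9, 3.0],     # SD below 1 -> deleted
    }
    print(sorted(filter_genes(toy)))  # ['GENE_A']
```

The two rules are applied in the order the text gives: the missing-value rule first, then the standard-deviation rule on whatever survives.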
Further, in Step 2, the statistic is computed as:
$$\tilde{t}_i = \frac{\hat{\beta}_i}{\tilde{s}_i\sqrt{v_i}}, \qquad \tilde{s}_i^2 = \frac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual variance of gene i, $v_i$ is the diagonal element of the covariance matrix of gene i in the simple regression, $d_i$ is the residual degrees of freedom of the linear model for gene i, $d_0$ is the prior degrees of freedom associated with $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ with $d_0$ degrees of freedom. The prior estimates can be obtained from the assumed prior distribution together with the gene expression and methylation data. Differentially expressed genes are screened by the final t value of each gene, and differentially methylated sites are screened by computing the t value of each gene methylation site.
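The definitions above match the empirical Bayes moderated t-statistic of the limma framework, $\tilde t_i = \hat\beta_i / (\tilde s_i \sqrt{v_i})$ with $\tilde s_i^2 = (d_0 s_0^2 + d_i s_i^2)/(d_0 + d_i)$. A minimal sketch for one gene or one methylation site follows, assuming a two-group design in which the phenotype vector is coded 0 for normal and 1 for cancer. It fits the simple regression by ordinary least squares and then shrinks the residual variance toward the prior. The prior values `d0` and `s0_sq` are passed in as fixed inputs here; in practice they are estimated jointly from all genes, which this sketch does not do. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def simple_regression(y: Sequence[float],
                      pheno: Sequence[int]) -> Tuple[float, float, float, int]:
    """OLS fit of y = a + b * pheno for one gene or one methylation site.

    Returns (beta_hat, s_sq, v, d): the group-difference coefficient, the
    residual variance, the unscaled variance of beta_hat (the slope's diagonal
    element of (X^T X)^-1), and the residual degrees of freedom.
    """
    n = len(y)
    x_bar = sum(pheno) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in pheno)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(pheno, y))
    beta_hat = sxy / sxx
    intercept = y_bar - beta_hat * x_bar
    rss = sum((yi - intercept - beta_hat * x) ** 2 for x, yi in zip(pheno, y))
    d = n - 2  # intercept and slope are estimated
    return beta_hat, rss / d, 1.0 / sxx, d


def moderated_t(beta_hat: float, s_sq: float, v: float, d: int,
                d0: float, s0_sq: float) -> float:
    """Shrink s_sq toward the prior s0_sq (weight d0) and form the t-statistic."""
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return beta_hat / math.sqrt(s_tilde_sq * v)


if __name__ == "__main__":
    pheno = [0, 0, 0, 1, 1, 1]             # 0 = normal, 1 = cancer
    y = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]     # toy values for one gene
    fit = simple_regression(y, pheno)
    print(round(moderated_t(*fit, d0=0, s0_sq=1.0), 3))   # 12.247 (ordinary t)
    print(round(moderated_t(*fit, d0=4, s0_sq=0.16), 3))  # 7.746 (shrunk)
```

With `d0=0` the statistic reduces to the ordinary two-sample t; a larger prior variance pulls the gene's variance up and the statistic down, which is what stabilizes the test when samples are few.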
Further, in Step 3, the relative difference between each gene's differential expression and its degree of differential methylation is used as the value of the corresponding PPI node, and the mean of the two node values is used as the edge weight. Significant submodules in the network are found with the SPG algorithm, using the Hamiltonian as the function that measures the association between different modules:
$$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j}\left(W_{ij} - \gamma\, p_{ij}\right)\delta(\sigma_i, \sigma_j)$$
where $\sigma_i$ denotes the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, $p_{ij}$ is the probability that an edge exists between nodes i and j, and $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta, which outputs 1 if its inputs are equal and 0 otherwise. $\gamma = 0.5$ is chosen so that the resulting modules contain 10 to 100 genes.
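To make the objective concrete, the sketch below evaluates the spin-glass Hamiltonian $\mathcal{H} = -\sum_{i \ne j}(W_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j)$ for a given module assignment. It assumes the configuration null model $p_{ij} = k_i k_j / (2m)$, where $k_i$ is the weighted degree of node i and $2m = \sum_i k_i$; this is the usual choice in the spin-glass community formulation, but the text only describes $p_{ij}$ as an edge probability. The search over assignments that minimizes $\mathcal{H}$ is not implemented; the sketch only scores a partition, and a lower value means a better partition. The toy network is made up.

```python
from typing import List, Sequence


def spin_glass_energy(W: Sequence[Sequence[float]], sigma: Sequence[int],
                      gamma: float = 0.5) -> float:
    """H = -sum_{i != j} (W_ij - gamma * p_ij) * delta(sigma_i, sigma_j).

    Assumes the configuration null model p_ij = k_i * k_j / (2m), where k_i is
    the weighted degree of node i and 2m is the sum of all weighted degrees.
    Lower energy means a better module assignment.
    """
    n = len(W)
    k = [sum(row) for row in W]
    two_m = sum(k)
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and sigma[i] == sigma[j]:  # Kronecker delta is 1
                energy -= W[i][j] - gamma * k[i] * k[j] / two_m
    return energy


def toy_network() -> List[List[float]]:
    """Two triangles (weight 1.0) joined by one weak edge (weight 0.1)."""
    W = [[0.0] * 6 for _ in range(6)]
    for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
        W[a][b] = W[b][a] = w
    return W


if __name__ == "__main__":
    W = toy_network()
    print(spin_glass_energy(W, [0, 0, 0, 1, 1, 1]))  # about -9.97
    print(spin_glass_energy(W, [0] * 6))             # about -7.12
```

Splitting the network into its two triangles gives a lower energy than merging everything into one module, so the Hamiltonian prefers the split, as expected. Raising `gamma` penalizes large modules more strongly, which is how the parameter controls module size.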
Further, in Step 4, to verify the statistical significance of each resulting submodule, the module value of module C is computed from the weights $w_{gh}$ of the edges in C, where V(C) denotes the set of nodes in the module.
The Monte Carlo (MC) method permutes the node values 1000 times to assess the significance of the observed module value, and results with a significance level below 0.05 are selected.
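A sketch of the Monte Carlo check, under stated assumptions: node values $t_g$ are permuted across all network genes, edge weights are recomputed as $w_{gh} = (t_g + t_h)/2$, and, because the module-value formula itself is not given in the text, the module statistic is assumed to be the sum of within-module edge weights divided by $|V(C)|$. The names `module_value` and `permutation_p_value` are hypothetical, and the ring network is a toy.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def module_value(module: Sequence[str], edges: Sequence[Edge],
                 t: Dict[str, float]) -> float:
    """Assumed statistic: sum of w_gh = (t_g + t_h) / 2 over edges inside the
    module, divided by |V(C)|."""
    nodes = set(module)
    inside = sum((t[g] + t[h]) / 2 for g, h in edges if g in nodes and h in nodes)
    return inside / len(nodes)


def permutation_p_value(module: Sequence[str], edges: Sequence[Edge],
                        t: Dict[str, float], n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Monte Carlo p-value: shuffle node values across all genes n_perm times."""
    rng = random.Random(seed)
    genes: List[str] = list(t)
    values = [t[g] for g in genes]
    observed = module_value(module, edges, t)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # a fresh random permutation of the node values
        if module_value(module, edges, dict(zip(genes, values))) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one so p is never exactly 0


if __name__ == "__main__":
    genes = [f"G{i}" for i in range(20)]
    ring = [(genes[i], genes[(i + 1) % 20]) for i in range(20)]
    t = {g: (5.0 if i < 4 else 0.0) for i, g in enumerate(genes)}
    print(permutation_p_value(genes[:4], ring, t))
```

Keeping the edges fixed while shuffling node values preserves the PPI topology, so the p-value asks whether the module's values are unusually high for a module of that shape, not whether the module is densely connected.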
The advantages and positive effects of the invention are as follows. For pan-cancer analysis, the invention mines the association between DNA methylation and gene expression at the whole-genome level and finds shared pathogenic patterns among multiple cancers, so it can be better applied to studying cancer pathogenesis. The robust t-test used in the differential analysis combines a linear model with empirical Bayes to obtain highly reliable differentially methylated sites and differentially expressed genes. By incorporating the PPI network, the invention finds molecular pathways that influence cancer development, and the SPG module-finding algorithm yields the submodules. A comparison with an existing method (see Fig. 3) shows that the results computed by the method of the invention are clearly improved in both sensitivity and specificity.
Brief description of the drawings
Fig. 1 is a flowchart of the pan-cancer association analysis method for gene expression and methylation provided by an embodiment of the invention.
Fig. 2 shows the methylation and gene expression association patterns of genes in six cancers, provided by an embodiment of the invention.
Fig. 3 compares the sensitivity and specificity of the results computed by the method of the invention (FEM) and by an existing method (BioNet), provided by an embodiment of the invention.
Embodiment
To make the objectives, technical solutions and advantages of the invention clearer, the invention is further described in detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The invention uses an association analysis method based on the robust t-test and FEM to find similar pathways and shared associated genes that affect multiple cancers. The FEM method works on gene expression and methylation data and finds the molecular pathways and differentially expressed modules whose expression changes in cancer.
The principle of the invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the pan-cancer association analysis method for gene expression and methylation provided by an embodiment of the invention comprises the following steps:
S101: Preprocess the gene expression and methylation data. For the gene expression data, delete genes whose number of missing and null values exceeds 70% of the number of columns, then delete genes whose standard deviation is below 1, and convert the remaining gene names to the corresponding NCBI Entrez gene IDs. For the methylation data, delete rows with missing data, rename each row with the corresponding probe name, and, according to the genes retained in the previous step, screen the methylation sites located in the promoter regions of those genes;
S102: Screen differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes. The gene expression and methylation data of the normal and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
S103: Combine the resulting differential genes and methylation sites with the protein-protein interaction network to find significantly differential modules, using the PPI network, which contains 8434 genes and 303600 edges, as the scaffold. The relative difference between each gene's differential expression and its degree of differential methylation is used as the value of the corresponding PPI node, and the mean of the two node values is used as the edge weight. Significantly differential modules are obtained with the SPG community-discovery algorithm;
S104: Screen statistically significant submodules. Because the derived differential expression modules depend on the PPI network topology, the statistical significance of the resulting submodules is assessed with a Monte Carlo method: the node data in the network are permuted 1000 times and the submodules are recomputed, submodules whose false discovery rate (FDR) exceeds the threshold are deleted, and the statistically significant submodules are kept;
S105: Comprehensively analyze the results for the six cancers, each of which yields multiple submodules, and identify the similar association patterns and commonly regulated genes between gene expression and methylation in each cancer.
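Step S104 deletes submodules whose FDR exceeds a threshold but does not say how the FDR is computed. One common choice, assumed here, is the Benjamini-Hochberg adjustment of the per-module permutation p-values; the 0.05 threshold, the toy p-values, and the function names are illustrative.

```python
from typing import Dict, List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keep_significant_modules(module_p: Dict[str, float],
                             fdr: float = 0.05) -> List[str]:
    """Delete modules whose BH-adjusted p-value exceeds the FDR threshold."""
    names = list(module_p)
    q = benjamini_hochberg([module_p[n] for n in names])
    return [n for n, qi in zip(names, q) if qi <= fdr]


if __name__ == "__main__":
    toy = {"M1": 0.01, "M2": 0.04, "M3": 0.03, "M4": 0.20}
    print(keep_significant_modules(toy))  # ['M1']
```

Note that M2 and M3 have raw p-values below 0.05 but are removed after adjustment, which is the point of controlling the FDR when many modules are tested at once.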
In step S102, the statistic is computed as:
$$\tilde{t}_i = \frac{\hat{\beta}_i}{\tilde{s}_i\sqrt{v_i}}, \qquad \tilde{s}_i^2 = \frac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual variance of gene i, $v_i$ is the diagonal element of the covariance matrix of gene i in the simple regression, $d_i$ is the residual degrees of freedom of the linear model for gene i, $d_0$ is the prior degrees of freedom associated with $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ with $d_0$ degrees of freedom. The prior estimates can be obtained from the assumed prior distribution together with the gene expression and methylation data. Differentially expressed genes are screened by the final t value of each gene. Similarly, differentially methylated sites are screened by computing the t value of each gene methylation site.
In step S103, significantly differential modules are obtained with the SPG community-discovery algorithm. For each gene, $t_g^{(D)}$ (the methylation differential-analysis result) and $t_g^{(R)}$ (the gene expression differential-analysis result) are normalized so that they have the same variance. Because gene expression and methylation are usually negatively correlated:
If $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
If $t_g^{(D)}$ and $t_g^{(R)}$ have opposite signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
that is, $t_g = \{H(t_g^{(D)})\,H(-t_g^{(R)}) + H(-t_g^{(D)})\,H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$, where $H(\cdot)$ is the Heaviside step function. The weight of the edge connecting genes g and h in the network is then defined as $w_{gh} = (t_g + t_h)/2$.
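The node and edge rule can be sketched as follows. The sketch assumes that normalizing to the same variance means dividing each statistic vector by its own standard deviation over genes; the exact scaling is not stated in the text. Genes whose methylation and expression change in the same direction receive $t_g = 0$; anticorrelated changes receive $|t_g^{(D)} - t_g^{(R)}|$. Helper names and toy values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def heaviside(x: float) -> float:
    """Step function H(x): 1 for positive x, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def node_statistics(t_meth: Dict[str, float],
                    t_expr: Dict[str, float]) -> Dict[str, float]:
    """Combine per-gene methylation (D) and expression (R) statistics into t_g."""
    genes = sorted(set(t_meth) & set(t_expr))
    # Assumed scaling: divide each statistic by its (population) SD over genes,
    # so that both have unit variance.
    sd_d = statistics.pstdev([t_meth[g] for g in genes])
    sd_r = statistics.pstdev([t_expr[g] for g in genes])
    result = {}
    for g in genes:
        d, r = t_meth[g] / sd_d, t_expr[g] / sd_r
        # Nonzero only when methylation and expression move in opposite directions.
        opposite = heaviside(d) * heaviside(-r) + heaviside(-d) * heaviside(r)
        result[g] = opposite * abs(d - r)
    return result


def edge_weights(edges: List[Tuple[str, str]],
                 t_g: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """w_gh = (t_g + t_h) / 2 for every PPI edge whose endpoints both have t_g."""
    return {(g, h): (t_g[g] + t_g[h]) / 2 for g, h in edges if g in t_g and h in t_g}


if __name__ == "__main__":
    t_meth = {"A": 2.0, "B": -2.0, "C": 2.0, "D": -2.0}
    t_expr = {"A": -3.0, "B": 3.0, "C": 3.0, "D": -3.0}
    t_g = node_statistics(t_meth, t_expr)
    print(t_g)  # {'A': 2.0, 'B': 2.0, 'C': 0.0, 'D': 0.0}
    print(edge_weights([("A", "B"), ("A", "C"), ("C", "D")], t_g))
```

Scaling before combining matters because methylation and expression t-statistics can live on very different ranges; without it, whichever platform has larger statistics would dominate $|t_g^{(D)} - t_g^{(R)}|$.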
Using the spin-glass (SPG) algorithm, the 100 genes with the largest $t_g$ values are taken as seeds, and a community-discovery algorithm finds the submodules that contain the seed genes and maximize the sum of the weights. The input to the SPG algorithm is the weighted adjacency matrix W. The algorithm is used for the following reasons. First, it allows the derived modules to overlap, that is, a moderate degree of overlap between different modules is permitted. Second, it avoids highly overlapping results. The SPG algorithm is a greedy algorithm that finds significant modules from specified seed nodes. Not every seed node yields a module, because some seeds may be isolated nodes. The size of the resulting modules can be controlled by adjusting the parameter γ (0 ≤ γ ≤ 1) of the SPG algorithm.
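The Hamiltonian used by the SPG algorithm is
$$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j}\left(W_{ij} - \gamma\, p_{ij}\right)\delta(\sigma_i, \sigma_j)$$
where $\sigma_i$ denotes the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, and $p_{ij}$ is the probability that an edge exists between nodes i and j. $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta: it outputs 1 if its inputs are equal and 0 otherwise.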
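The text describes module finding as a greedy search driven by seed nodes. As a simplified illustration only (this is not the spin-glass algorithm, which optimizes the Hamiltonian), the sketch below grows a module from one seed by repeatedly adding the neighbouring gene with the largest positive total edge weight into the module, stopping at a size cap or when no neighbour adds positive weight. The cap of 100 mirrors the upper end of the 10 to 100 gene target; all names and the toy graph are hypothetical. An isolated seed returns only itself, echoing the remark that isolated seeds yield no module.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # gene -> {neighbour: edge weight}


def build_graph(weights: Dict[Tuple[str, str], float]) -> Graph:
    """Symmetric adjacency map from undirected edge weights w_gh."""
    graph: Graph = {}
    for (g, h), w in weights.items():
        graph.setdefault(g, {})[h] = w
        graph.setdefault(h, {})[g] = w
    return graph


def grow_module(graph: Graph, seed: str, max_size: int = 100) -> Set[str]:
    """Simplified greedy expansion (not spin-glass): add the neighbour with the
    largest positive total edge weight into the module until none remains."""
    module = {seed}
    while len(module) < max_size:
        frontier = {nb for g in module for nb in graph.get(g, {})} - module
        best: Optional[str] = None
        best_gain = 0.0
        for cand in sorted(frontier):  # sorted so ties resolve deterministically
            gain = sum(w for nb, w in graph[cand].items() if nb in module)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # isolated seed, or no neighbour adds positive weight
            break
        module.add(best)
    return module


if __name__ == "__main__":
    g = build_graph({("S", "A"): 3.0, ("S", "B"): 0.5, ("A", "B"): 1.0,
                     ("B", "C"): 0.0, ("A", "D"): 0.0})
    print(sorted(grow_module(g, "S")))  # ['A', 'B', 'S']
    print(grow_module(g, "Z"))          # {'Z'}: an isolated seed stays alone
```

Unlike this heuristic, the spin-glass approach also weighs each candidate against the null-model term $\gamma p_{ij}$, so it can reject heavily connected hub genes that a pure weight-sum greedy search would absorb.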
Setting γ = 0.5 keeps the resulting modules between 10 and 100 genes. First, when unsupervised linear dimension-reduction methods (PCA and ICA) are applied to gene expression data, the number of genes in a co-expression module is usually about 1% of the number of genes studied. In the experiment, about 8000 genes remain after differential analysis, so the resulting gene modules should contain about 80 genes. Second, studies show that with γ = 0.5 the resulting modules have minimal overlap.
In step S104, the statistical significance of each resulting submodule is verified as follows. The module value of module C is computed from the weights $w_{gh}$ of the edges in C, where V(C) denotes the set of nodes in the module. The MC method permutes the node values 1000 times to assess the significance of the observed module value, and the results with a significance level below 0.05 are selected.
The principle of the invention is further described below with reference to a specific embodiment.
The pan-cancer association analysis method for gene expression and methylation provided by an embodiment of the invention comprises the following steps:
Step 1: Preprocessing of the methylation and gene expression data of six cancers
The invention mainly uses pan-cancer project data for six cancers provided by the TCGA (The Cancer Genome Atlas) database: head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), breast cancer (BRCA), lung squamous cell carcinoma (LUSC), lung adenocarcinoma (LUAD), and uterine corpus endometrial carcinoma (UCEC). For the methylation data of each cancer, rows containing NA are removed; for the gene expression data, rows without a gene name and rows with variance below 1 are removed, and the data are then log-scaled.
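The cleaning rules in this paragraph can be sketched as below. Assumptions: rows are (name, values) pairs; a missing gene name is represented by an empty string or a name starting with `"?"` (a hypothetical convention); variance means the sample variance; and "log-scaled" is taken to be log2(x + 1), a common transform for count-like expression data that the text does not specify. Gene and probe names in the usage example are toy values.

```python
import math
import statistics
from typing import List, Optional, Tuple


def clean_methylation(rows: List[Tuple[str, List[Optional[float]]]]
                      ) -> List[Tuple[str, List[Optional[float]]]]:
    """Remove methylation rows (probe, values) that contain any NA (None)."""
    return [(probe, vals) for probe, vals in rows if all(v is not None for v in vals)]


def clean_expression(rows: List[Tuple[str, List[float]]],
                     min_var: float = 1.0) -> List[Tuple[str, List[float]]]:
    """Drop unnamed genes and rows with variance below min_var, then log-scale."""
    cleaned = []
    for name, values in rows:
        if not name or name.startswith("?"):  # assumed markers for "no gene name"
            continue
        if statistics.variance(values) < min_var:  # sample variance
            continue
        # Assumed log transform: log2(x + 1), which keeps zero values finite.
        cleaned.append((name, [math.log2(v + 1) for v in values]))
    return cleaned


if __name__ == "__main__":
    expr = [("GENE_A", [0.0, 3.0, 7.0]),   # variance about 12.3 -> kept
            ("?", [1.0, 100.0, 5.0]),      # no gene name -> dropped
            ("GENE_B", [2.0, 2.0, 2.5])]   # variance about 0.083 -> dropped
    print(clean_expression(expr))  # [('GENE_A', [0.0, 2.0, 3.0])]
```

The variance filter is applied before the log transform, following the order given in the text; applying it afterwards would remove a different set of genes because the log compresses large values.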
Methylation data preprocessing results: the raw data contain 20530 rows; after preprocessing, the six cancers retain UCEC 15469 rows, BRCA 17489 rows, KIRC 17318 rows, LUSC 17113 rows, LUAD 17386 rows, and HNSC 17553 rows.
Step 2: Screening of differentially methylated sites and differentially expressed genes in six cancers
The robust t-test is used to compute the methylation t value and the gene expression t value of each gene.
Step 3: Obtaining significant submodules
A PPI network is built with the differential-analysis results as input data: for each gene, $t_g^{(D)}$ (the methylation differential-analysis result) and $t_g^{(R)}$ (the gene expression differential-analysis result). The weight of the edge connecting genes g and h in the PPI network is defined as $w_{gh} = (t_g + t_h)/2$. Each of the six cancers is tested separately, and the SPG algorithm finds differential submodules that are significantly more differential than the rest of the network. To verify the statistical significance of the resulting submodules, a statistical test is performed with the MC method; Table 1 lists the significant submodules with p < 0.05.
Table 1 Submodules of the six cancers
Table 2 shows the genes of the four association patterns that are significant in the six cancers: hypermethylation with low expression, hypomethylation with high expression, hypermethylation with high expression, and hypomethylation with low expression. MLEL denotes hypomethylation with low expression, MLEH hypomethylation with high expression, MHEL hypermethylation with low expression, and MHEH hypermethylation with high expression (likewise below). The table gives, for each cancer, the number of genes under each association pattern.
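Assigning a gene to one of the four patterns can be sketched from the signs of its two differential statistics. The sketch assumes that a positive methylation statistic means hypermethylation and a positive expression statistic means higher expression in cancer than in normal samples; the function names, the zero-handling rule, and the toy statistics are illustrative.

```python
from typing import Dict, Tuple


def association_pattern(t_meth: float, t_expr: float) -> str:
    """Map the signs of the two statistics to MLEL, MLEH, MHEL or MHEH."""
    if t_meth == 0 or t_expr == 0:
        raise ValueError("pattern is undefined when a statistic is exactly 0")
    meth = "MH" if t_meth > 0 else "ML"  # hyper- vs hypomethylation
    expr = "EH" if t_expr > 0 else "EL"  # high vs low expression
    return meth + expr


def count_patterns(stats: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Count genes per pattern; stats maps gene -> (t_meth, t_expr)."""
    counts = {"MLEL": 0, "MLEH": 0, "MHEL": 0, "MHEH": 0}
    for t_meth, t_expr in stats.values():
        counts[association_pattern(t_meth, t_expr)] += 1
    return counts


if __name__ == "__main__":
    toy = {"G1": (2.5, -3.1), "G2": (-1.2, 2.0), "G3": (1.5, -0.7)}
    print(count_patterns(toy))  # {'MLEL': 0, 'MLEH': 1, 'MHEL': 2, 'MHEH': 0}
```

MHEL and MLEH are the anticorrelated patterns that the $t_g$ rule rewards; MLEL and MHEH genes have $t_g = 0$ but can still appear in a module through their neighbours.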
Table 2 Number of genes in the four association patterns
Pattern | BRCA | KIRC | UCEC | LUSC | HNSC | LUAD |
---|---|---|---|---|---|---|
MLEL | 25 | 22 | 19 | 34 | 31 | 32 |
MLEH | 25 | 97 | 29 | 24 | 45 | 30 |
MHEL | 88 | 111 | 18 | 85 | 23 | 56 |
MHEH | 32 | 90 | 8 | 57 | 21 | 24 |
Querying the IntOGen database (http://www.intogen.org/search) yields the disease-causing genes corresponding to the six cancers: SETD2, a disease-causing gene shared by KIRC and LUSC; BMPR2, shared by LUAD and BRCA; and LRP6, shared by KIRC and LUSC, are all present in the database. The GeneCards database collects a large number of disease-related genes, such as cancer-related genes. Querying it for genes related to the six cancers and comparing them with our results shows that LUSC has 67, LUAD 50, KIRC 73, and HNSC 41 disease-causing genes in common with the database; the results are shown in Table 3.
Table 3 Genes shared between the GeneCards database and the results of this experiment
Table 4 lists genes that share the same pattern among the significant hypermethylation-low expression, hypomethylation-high expression, hypermethylation-high expression, and hypomethylation-low expression genes of the six cancers.
Table 4 DNA methylation and gene expression association patterns of the six cancers
The experiment finds that BRCA and LUAD share the largest number of genes with the same association pattern, and that BRCA, LUAD and UCEC share the second largest number.
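The cross-cancer comparison amounts to intersecting, for each association pattern, the gene sets found in each pair of cancers and adding up the overlaps. A minimal sketch follows; the gene sets are made up for illustration and are not the results reported here, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# pattern -> cancer -> genes assigned to that pattern in that cancer
Results = Dict[str, Dict[str, Set[str]]]


def shared_gene_counts(results: Results) -> Dict[Tuple[str, str], int]:
    """For each pair of cancers, count genes that share an association pattern,
    summed over patterns."""
    counts: Dict[Tuple[str, str], int] = {}
    for by_cancer in results.values():
        for a, b in combinations(sorted(by_cancer), 2):
            counts[(a, b)] = counts.get((a, b), 0) + len(by_cancer[a] & by_cancer[b])
    return counts


if __name__ == "__main__":
    toy: Results = {  # made-up gene sets for illustration only
        "MHEL": {"BRCA": {"G1", "G2", "G3"}, "LUAD": {"G1", "G2"}, "UCEC": {"G3"}},
        "MLEH": {"BRCA": {"G4"}, "LUAD": {"G4", "G5"}, "UCEC": {"G5"}},
    }
    counts = shared_gene_counts(toy)
    print(max(counts, key=counts.get))  # ('BRCA', 'LUAD')
```

A gene counts only when it falls in the same pattern in both cancers, so a gene that is MHEL in one cancer and MLEH in another does not contribute.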
The data in Table 4 show that a negative correlation between the degree of methylation change and the change in gene expression exists in most cancers.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of the invention.
Claims (5)
1. A pan-cancer association analysis method for gene expression and methylation, characterized in that the method screens differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes; combines the differential genes and methylation sites with a protein-protein interaction network, using the PPI network as the scaffold for finding significantly differential modules; deletes submodules whose false discovery rate (FDR) exceeds a threshold to obtain statistically significant submodules; and identifies similar association patterns and commonly regulated genes between gene expression and methylation.
2. The pan-cancer association analysis method for gene expression and methylation according to claim 1, characterized in that the method comprises the following steps:
Step 1: Preprocess the gene expression and methylation data. For the gene expression data, delete genes whose number of missing and null values exceeds 70% of the number of columns, then delete genes whose standard deviation is below 1, and convert the remaining gene names to the corresponding NCBI Entrez gene IDs, which serve as the gene identifiers in the data;
for the methylation data, delete rows with missing data, rename each row with the corresponding probe name, and, according to the genes retained in the previous step, screen the methylation sites located in the promoter regions of those genes;
Step 2: Screen differentially expressed genes and differentially methylated sites with a robust t-test that combines a linear model with empirical Bayes; the gene expression and methylation data of the normal and cancer samples are integrated, and a phenotype vector is used to distinguish the two sample types;
Step 3: Using the protein-protein interaction (PPI) network, which contains 8434 genes and 303600 edges, as the scaffold, take the integrated results of the previous step as the weights of the PPI network and obtain significantly differential modules with the SPG community-discovery algorithm. The methylation differential-analysis statistic $t_g^{(D)}$ and the gene expression differential-analysis statistic $t_g^{(R)}$ of each gene are normalized to the same variance and then combined as
$$t_g = \{H(t_g^{(D)})\,H(-t_g^{(R)}) + H(-t_g^{(D)})\,H(t_g^{(R)})\}\,|t_g^{(D)} - t_g^{(R)}|$$
If $t_g^{(D)}$ and $t_g^{(R)}$ have the same sign: $t_g = 0$;
If $t_g^{(D)}$ and $t_g^{(R)}$ have opposite signs: $t_g = |t_g^{(D)} - t_g^{(R)}|$;
where $H(\cdot)$ is the Heaviside step function. The weight of the edge connecting genes g and h in the network is defined as $w_{gh} = (t_g + t_h)/2$. Using the spin-glass algorithm, the 100 genes with the largest $t_g$ values are taken as seeds, and the community-discovery algorithm finds the submodules that contain the seed genes and maximize the sum of the weights;
Step 4: Because the derived differential expression modules depend on the PPI network topology, assess the statistical significance of the resulting submodules with a Monte Carlo method: permute the node data in the network 1000 times and recompute the submodules, delete the submodules whose false discovery rate (FDR) exceeds the threshold, and keep the statistically significant submodules;
Step 5: Comprehensively analyze the results for the six cancers, each of which yields multiple submodules, and identify the similar association patterns and commonly regulated genes between gene expression and methylation in each cancer.
3. The pan-cancer association analysis method for gene expression and methylation according to claim 2, characterized in that, in Step 2, the statistic is computed as:
$$\tilde{t}_i = \frac{\hat{\beta}_i}{\tilde{s}_i\sqrt{v_i}}, \qquad \tilde{s}_i^2 = \frac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$$
where $\hat{\beta}_i$ is the estimated regression coefficient for the group comparison, $s_i^2$ is the residual variance of gene i, $v_i$ is the diagonal element of the covariance matrix of gene i in the simple regression, $d_i$ is the residual degrees of freedom of the linear model for gene i, $d_0$ is the prior degrees of freedom associated with $d_i$, and $s_0^2$ is the prior estimate of $s_i^2$ with $d_0$ degrees of freedom; the prior estimates can be obtained from the assumed prior distribution together with the gene expression and methylation data; differentially expressed genes are screened by the final t value of each gene, and differentially methylated sites are screened by computing the t value of each gene methylation site.
4. The pan-cancer association analysis method for gene expression and methylation according to claim 2, characterized in that, in Step 3, the relative difference between each gene's differential expression and its degree of differential methylation is used as the value of the corresponding PPI node, and the mean of the two node values is used as the edge weight; significant submodules in the network are found with the SPG algorithm, using the Hamiltonian as the function that measures the association between different modules:
$$\mathcal{H}(\{\sigma\}) = -\sum_{i \neq j}\left(W_{ij} - \gamma\, p_{ij}\right)\delta(\sigma_i, \sigma_j)$$
where $\sigma_i$ denotes the module to which node i belongs, $W_{ij}$ is the weighted adjacency matrix of the network, $p_{ij}$ is the probability that an edge exists between nodes i and j, and $\delta(\sigma_i, \sigma_j)$ is the Kronecker delta, which outputs 1 if its inputs are equal and 0 otherwise; $\gamma = 0.5$ is chosen so that the resulting modules contain 10 to 100 genes.
5. The pan-cancer association analysis method for gene expression and methylation according to claim 2, characterized in that, in Step 4, to verify the statistical significance of each resulting submodule, the module value of module C is computed from the weights $w_{gh}$ of the edges in C, where V(C) denotes the set of nodes in the module;
the MC method permutes the node values 1000 times to assess the significance of the observed module value, and results with a significance level below 0.05 are selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710838076.5A CN107766697A (en) | 2017-09-18 | 2017-09-18 | A kind of general cancer gene expression and the association analysis method that methylates |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710838076.5A CN107766697A (en) | 2017-09-18 | 2017-09-18 | A kind of general cancer gene expression and the association analysis method that methylates |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107766697A true CN107766697A (en) | 2018-03-06 |
Family
ID=61265614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710838076.5A Pending CN107766697A (en) | 2017-09-18 | 2017-09-18 | A kind of general cancer gene expression and the association analysis method that methylates |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766697A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740651A (en) * | 2016-03-07 | 2016-07-06 | Jilin University | Construction method for specific cancer differential expression gene regulation and control network |
Non-Patent Citations (4)
Title |
---|
Gordon K. Smyth et al.: "Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments", *Statistical Applications in Genetics and Molecular Biology* 3 * |
James West et al.: "An integrative network algorithm identifies age-associated differential methylation interactome hotspots targeting stem-cell differentiation pathways", *Scientific Reports* * |
Yinming Jiao et al.: "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control", Original Paper * |
Yin Liyang: "Genome-wide association patterns of DNA methylation and gene expression in breast cancer", *China Master's Theses Full-text Database, Medicine and Health Sciences* * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109411023A (en) * | 2018-09-30 | 2019-03-01 | Huazhong Agricultural University | Interactive relation method for digging between a kind of gene based on Bayesian Network Inference |
CN109411023B (en) * | 2018-09-30 | 2022-03-18 | Huazhong Agricultural University | Method for mining inter-gene interaction relation based on Bayesian network inference |
CN110349633A (en) * | 2019-07-12 | 2019-10-18 | Dalian Maritime University | A method of irradiating biological marker and predicting radiation dosage are screened based on radiation response biological pathways |
CN110428866A (en) * | 2019-07-23 | 2019-11-08 | Harbin Institute of Technology | Cancer related pathways recognition methods based on network integration multiple groups data |
CN112662763A (en) * | 2020-03-10 | 2021-04-16 | Boercheng (Beijing) Technology Co., Ltd. | Probe composition for detecting common amphoteric cancers |
CN111640468A (en) * | 2020-05-18 | 2020-09-08 | Tasly International Gene Network Drug Innovation Center Co., Ltd. | Method for screening disease-related protein based on complex network |
CN112599201A (en) * | 2020-12-15 | 2021-04-02 | Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese PLA | System for analyzing infection path between virus receptor and human target organ, and electronic device |
CN112951418A (en) * | 2021-05-17 | 2021-06-11 | Zhenhe (Beijing) Biotechnology Co., Ltd. | Method and device for evaluating methylation of linked regions based on liquid biopsy, terminal equipment and storage medium |
CN114373502A (en) * | 2022-01-07 | 2022-04-19 | The First Hospital of Jilin University | Tumor data analysis system based on methylation |
CN114373502B (en) * | 2022-01-07 | 2022-12-06 | The First Hospital of Jilin University | Tumor data analysis system based on methylation |
CN115019884A (en) * | 2022-05-13 | 2022-09-06 | East China Jiaotong University | Network marker identification method fusing multiple groups of mathematical data |
CN115019884B (en) * | 2022-05-13 | 2023-11-03 | East China Jiaotong University | Network marker identification method integrating multiple groups of chemical data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766697A (en) | A kind of general cancer gene expression and the association analysis method that methylates | |
Gawad et al. | Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics | |
Hackett et al. | Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps | |
CN109326316A (en) | A kind of Multi-Layered Network Model construction method and the application of cancer related SNP, gene, miRNA and protein interaction | |
Silver et al. | Fast identification of biological pathways associated with a quantitative trait using group lasso with overlaps | |
US20200239965A1 (en) | Source of origin deconvolution based on methylation fragments in cell-free dna samples | |
Bhattacharyya et al. | MicroRNA signatures highlight new breast cancer subtypes | |
CN108319984B (en) | The construction method and prediction technique of xylophyta leaf morphology feature and photosynthesis characteristics prediction model based on DNA methylation level | |
KR101949286B1 (en) | Method and system for tailored anti-cancer therapy based on the information of genomic sequence variant and survival of cancer patient | |
CN108804876B (en) | Method and apparatus for calculating purity and chromosome ploidy of cancer sample | |
Hopp et al. | Portraying the expression landscapes of cancer subtypes: A case study of glioblastoma multiforme and prostate cancer | |
Zeng et al. | Cancer classification and pathway discovery using non-negative matrix factorization | |
KR20220086603A (en) | Cancer classification using tissue-of-origin thresholding | |
CN115631789A (en) | Pangenome-based group joint variation detection method | |
CN113192556B (en) | Genotype and phenotype association analysis method in multigroup chemical data based on small sample | |
CN107368702A (en) | A kind of method of structure miRNA regulated and control networks | |
CN115019884B (en) | Network marker identification method integrating multiple groups of chemical data | |
Tian et al. | Sparse group selection on fused lasso components for identifying group-specific DNA copy number variations | |
Li et al. | Ensemble-based multi-objective clustering algorithms for gene expression data sets | |
Wu et al. | Network-based method for inferring cancer progression at the pathway level from cross-sectional mutation data | |
Elsheikh et al. | Relating connectivity changes in brain networks to genetic information in Alzheimer patients | |
Chakrapani et al. | Effective utilisation of influence maximization technique for the identification of significant nodes in breast cancer gene networks | |
Nemes et al. | A diagnostic algorithm to identify paired tumors with clonal origin | |
Wu et al. | Network‐based method for detecting dysregulated pathways in glioblastoma cancer | |
Fang et al. | Joint detection of associations between DNA methylation and gene expression from multiple cancers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180306 |