CN109346127A - Statistical analysis method for detecting potential cancer driver genes - Google Patents

Statistical analysis method for detecting potential cancer driver genes

Info

Publication number: CN109346127A
Application number: CN201810902841.XA
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN109346127B (en)
Inventors: 李淼新, 蒋琳
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Original assignee: National Sun Yat Sen University
Legal status: Granted, active (the legal status is an assumption and is not a legal conclusion)

Abstract

The present invention provides a statistical analysis method for detecting potential cancer driver genes. The method fits the somatic mutation rate across the genome accurately, so that cancer driver genes can be screened more effectively. In particular, the method is not limited by sample size and also improves the detection of cancer driver genes in small samples.

Description

Statistical analysis method for detecting potential cancer driver genes
Technical field
The present invention relates to the field of biotechnology, and more particularly to a statistical analysis method for detecting potential cancer driver genes.
Background
At present there are two main classes of methods for detecting cancer driver genes from somatic mutations: (1) background mutation rate (BMR) methods and (2) ratio-metric methods. BMR methods assess whether a gene carries more somatic mutations in cancer samples than expected; examples are MutSigCV [1] and MuSiC [2], in which the expected mutation count is predicted from several correlated covariates such as gene characteristics and coding-region length. MutSigCV additionally proposes three covariates directly related to cancer cells (DNA replication timing, transcriptional activity, and chromatin state in cancer cells) to improve the prediction of the expected background mutations. Ratio-metric methods detect cancer driver genes by examining the proportions of different types of somatic mutation within a gene. For example, the 20/20 rule makes a simple assessment of the proportions of inactivating mutations and recurrent missense mutations [3]. Oncodrive-fm [4] and OncodriveFML [5] incorporate the functional impact of mutations into the assessment to improve prediction. OncodriveCLUST [6] considers the positional clustering of mutations. A more recent method, 20/20+ [7], extends the 20/20 ratio idea and integrates 18 further features of possible positive selection in cancer cells (for example, protein interaction network degree) to predict cancer driver genes by machine learning. Because it relies on Monte Carlo simulation to evaluate the statistical significance (the P value) of its scores, it can be slow.
Although the principles of both classes of methods are fairly simple, technical obstacles remain, in particular low power in small samples. For example, a recent study [7] showed that the P values produced by existing cancer driver gene tests do not follow a uniform distribution, which indicates that their background mutation models fit poorly. Computer-based random simulation can be used to correct the P value distribution, but the key is to fit the background genes properly; only then can true cancer driver genes be distinguished accurately from noisy background genes. The problem becomes more acute when the sample is too small to yield a stable model, so existing statistical analyses are usually inefficient at detecting cancer driver genes in small samples. Some studies have therefore proposed integrating shared gene features by supervised learning before testing a sample [7]. However, cancer is highly heterogeneous and specific [1]. If too many additional general predictive features are added, driver genes specific to the tested sample may be overlooked, and a model built from known driver genes is often limited in its ability to find new ones. In addition, because of poor fitting, the driver genes predicted by different tools rarely agree with one another, and merging their results is usually both difficult and biased. More effective methods are therefore urgently needed to reveal the complete landscape of cancer driver genes.
Summary of the invention
Because cancer is highly heterogeneous and most cancer driver genes appear to have relatively mild, inconspicuous effects, existing methods are usually inefficient at identifying driver genes at typical sample sizes. In practice, sample sizes are often limited by resources and funding, and existing methods find it even harder to identify cancer driver genes accurately in small samples.
To achieve the above object of the invention, the following technical solution is adopted:
A statistical analysis method for detecting potential cancer driver genes, comprising the following steps:
S1. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splicing mutation site $j$ of a background gene $i$. If the gene has $m_i$ mutation sites, let $y_i=\sum_{j=1}^{m_i}c_{i,j}$ denote the total number of mutant alleles over all of its mutation sites; $y_i$ follows a negative binomial (NB) distribution:
$$y_i\sim\mathrm{NB}(\mu_i,\theta),$$
where $\mu_i$ is the expected number of mutations and $\theta$ is the dispersion parameter of the distribution;
the probability mass function is then
$$f(y_i;\mu_i,\theta)=\frac{\Gamma(y_i+\theta)}{\Gamma(\theta)\,y_i!}\left(\frac{\theta}{\theta+\mu_i}\right)^{\theta}\left(\frac{\mu_i}{\mu_i+\theta}\right)^{y_i},$$
where $\Gamma(\cdot)$ is the gamma function;
S2. The mutant allele counts of background gene $i$ are fitted with a zero-truncated negative binomial distribution, whose probability mass function is
$$g(y_i;\mu_i,\theta)=\frac{f(y_i;\mu_i,\theta)}{1-f(0;\mu_i,\theta)},\qquad y_i=1,2,\ldots;$$
S3. A generalized linear regression model is constructed:
$$\begin{aligned}
\eta=\log(\mu_i)=\beta_0&+\beta_1\times[x_1,\ \text{number of mutant alleles at synonymous mutation sites}]\\
&+\beta_2\times[x_2,\ \text{coding-region length}]\\
&+\beta_3\times[x_3,\ \text{constraint score for potential de novo mutations}]\\
&+\beta_4\times[x_4,\ \text{expression in cancer cell lines from the Cancer Cell Line Encyclopedia database}]\\
&+\beta_5\times[x_5,\ \text{DNA replication timing in HeLa cells}]\\
&+\beta_6\times[x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}$$
The regression coefficients and the distribution parameter $\theta$ are estimated by maximum likelihood;
the logarithm of the expected number of mutant alleles at the nonsynonymous and splicing mutation sites of gene $i$, $\log(\hat\mu_i)$, can then be calculated as
$$\log(\hat\mu_i)=\hat\beta_0+\sum_{k=1}^{6}\hat\beta_k\,x_{k,i},$$
where $\hat\beta_0,\ldots,\hat\beta_6$ are the estimated regression coefficients and $x_{k,i}$ is the value of covariate $x_k$ for gene $i$;
S4. Once the regression parameters are determined, the probability that gene $i$ has zero mutations under the negative binomial distribution is
$$f(0;\hat\mu_i,\hat\theta)=\left(\frac{\hat\theta}{\hat\theta+\hat\mu_i}\right)^{\hat\theta}.$$
In the zero-truncated model, the raw residual of gene $i$ is: [equation not reproduced in the source text]
The deviance residual of gene $i$ is: [equation not reproduced in the source text]
where $\mathrm{sign}(x)$ is the standard sign function and $ll(\mu,\theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i\mid\mu,\theta)=\ln[g(y_i;\mu_i,\theta)].$$
The mean of the observation $y_i$ that enters the deviance residual is obtained from: [equation not reproduced in the source text]
S5. The deviance residuals are standardized: $\acute{e}_i=(d_i-\bar d)/\hat\sigma_d$, where $d_i$ denotes the deviance residual of gene $i$, and $\bar d$ and $\hat\sigma_d$ are the estimated mean and standard deviation of the deviance residuals. The P value of the standardized deviance residual is calculated from the standard normal distribution:
$$p_i=1-\Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution;
S6. The P values of all genes are calculated using steps S1 to S5;
S7. Significant genes whose P values fall below a threshold are removed;
S8. The P values of the remaining genes are recalculated using steps S1 to S5;
S9. Steps S7 and S8 are repeated until no significant genes remain; the P values of all genes are then calculated with the model parameters determined at that point;
Steps S1 to S9 are used to generate P values for the genes of a reference somatic mutation sample, and genes with small P values are then removed. The somatic mutations of the retained genes are integrated into the small sample, and the high-frequency somatic mutations in genes and the corresponding P values are then estimated by the method of steps S1 to S9.
To further improve detection power, in step S1 a weighting model is built on a random forest model to predict a score $s_{i,j}$ for mutation site $j$ of gene $i$, and $s_{i,j}$ is converted to an integer score $w_{i,j}=\mathrm{Integer}(s_{i,j}/0.1)$. The integer score serves as the prior weight of the mutation site, and the weighted number of mutant alleles is
$$y_i^{w}=\sum_{j=1}^{m_i}w_{i,j}\,c_{i,j}.$$
The weighted number of mutant alleles is also assumed to follow a negative binomial distribution:
$$y_i^{w}\sim\mathrm{NB}(\mu_i^{w},\theta^{w}),$$
where $\mu_i^{w}$ is the expected number of mutations and $\theta^{w}$ is the dispersion parameter of the distribution.
The subsequent steps are then carried out as before.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The method provided by the invention fits the somatic mutation rate across the genome accurately, so that cancer driver genes can be screened from background genes more effectively. In particular, the method is not limited by sample size and also improves the detection of driver genes in small samples.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of the method.
Fig. 2 compares the detection of cancer driver genes by the present method and 4 other methods in 11 cancers.
In Fig. 2, a: mean log fold difference; b: number of significant genes; c: number of significant cancer genes shared by the 5 methods; d: number of significant genes unique to each method; e: number of genes unique to each method that match a cancer gene set.
Genes with P values below the threshold of FDR = 0.1 were removed. Cancer labels: BLCA: bladder urothelial carcinoma; BRCA: breast cancer; COAD: colon cancer; UCEC: endometrial carcinoma; HNSC: head and neck squamous cell carcinoma; KIRC: kidney renal clear cell carcinoma; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; MEL: melanoma; OV: ovarian serous cystadenocarcinoma; STAD: stomach adenocarcinoma.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the framework of the method comprises three layers. Layer 1 is iterative zero-truncated negative binomial regression (ITER), a background mutation model that estimates the expected number of nonsynonymous and splicing mutations in each gene from several genomic features. Layer 2 is weighted iterative zero-truncated negative binomial regression (WITER), which generates prior weights and uses the predicted weights to distinguish potentially high-risk mutations from low-risk mutations in the cancer samples. Layer 3 integrates a reference sample: by adding an independent reference sample, it mitigates the instability of the regression model caused by an insufficient sample size in the user's own data. Layer 1 is part of layer 2, and layers 1 and 2 are part of layer 3. The main inputs are the mutant allele counts at nonsynonymous, splicing, and synonymous mutation sites in the somatic cells of cancer patients. The outputs are a somatic mutation z-score and P value for each gene in the cancer sample.
Layer 1: iterative zero-truncated negative binomial regression
Layer 1 proposes a new method for fitting the distribution of somatic mutation counts of genes across the genome, named iterative zero-truncated negative binomial regression (ITER). The difference between the observed and the estimated expected number of somatic mutations is used to measure whether a gene carries an excess of somatic mutations in a given cancer. Nonsynonymous and splicing mutations are the mutations of interest: if a gene contains more mutations than expected for a gene of its type, it may be a driver gene that significantly promotes cancer growth. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splicing mutation site $j$ of a background gene $i$. Suppose the gene has $m_i$ mutation sites and let $y_i=\sum_{j=1}^{m_i}c_{i,j}$ denote the total number of mutant alleles over all of its mutation sites; $y_i$ follows a negative binomial (NB) distribution:
$$y_i\sim\mathrm{NB}(\mu_i,\theta),$$
where $\mu_i$ is the expected number of mutations and $\theta$ is the dispersion parameter of the distribution. The probability mass function (PMF) is
$$f(y_i;\mu_i,\theta)=\frac{\Gamma(y_i+\theta)}{\Gamma(\theta)\,y_i!}\left(\frac{\theta}{\theta+\mu_i}\right)^{\theta}\left(\frac{\mu_i}{\mu_i+\theta}\right)^{y_i},$$
where $\Gamma(\cdot)$ is the gamma function.
However, somatic mutations are rare, and many genes carry no somatic mutation in samples of ordinary size, so the observed data contain an excess of zeros. This zero inflation makes it difficult for the regression to fit the mutation counts accurately. Layer 1 therefore fits the mutant allele counts of background gene $i$ with a zero-truncated negative binomial distribution, whose probability mass function is
$$g(y_i;\mu_i,\theta)=\frac{f(y_i;\mu_i,\theta)}{1-f(0;\mu_i,\theta)},\qquad y_i=1,2,\ldots$$
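The two mass functions can be sketched in Python as follows. This is a minimal illustration that assumes the usual mean-dispersion parameterisation of the negative binomial distribution (mean `mu`, dispersion `theta`); the function names `nb_pmf` and `ztnb_pmf` are hypothetical and do not come from the patent.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial PMF with mean mu and dispersion theta (assumed parameterisation)."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (mu + theta))
    )
    return math.exp(log_p)


def ztnb_pmf(y, mu, theta):
    """Zero-truncated NB PMF: the NB mass renormalised over y >= 1."""
    if y < 1:
        return 0.0  # zero counts are excluded by the truncation
    return nb_pmf(y, mu, theta) / (1.0 - nb_pmf(0, mu, theta))
```

Because the truncated distribution only redistributes the mass of the zero class, `ztnb_pmf` sums to one over the positive integers, which is a quick sanity check for any implementation.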
Based on the zero-truncated negative binomial distribution, we construct a generalized linear regression model to predict the expected number of mutant alleles at the nonsynonymous and splicing mutation sites of gene $i$. The regression equation includes 6 covariates:
$$\begin{aligned}
\eta=\log(\mu_i)=\beta_0&+\beta_1\times[x_1,\ \text{number of mutant alleles at synonymous mutation sites}]\\
&+\beta_2\times[x_2,\ \text{coding-region length}]\\
&+\beta_3\times[x_3,\ \text{constraint score for potential de novo mutations}]\\
&+\beta_4\times[x_4,\ \text{expression in cancer cell lines from the Cancer Cell Line Encyclopedia database}]\\
&+\beta_5\times[x_5,\ \text{DNA replication timing in HeLa cells}]\\
&+\beta_6\times[x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}$$
The number of mutant alleles at synonymous mutation sites is counted in the test sample provided by the user. Coding-region length is estimated from the RefGene reference gene model. The gene constraint score is calculated according to Samocha et al. (2014) [8]. The last three covariates are the predictor variables used by MutSigCV [1]. The expression value is the average expression across 91 cell lines in the Cancer Cell Line Encyclopedia (CCLE). DNA replication timing is measured in HeLa cells and ranges from 100 (early) to 1000 (late). The chromatin state of a gene is measured by HiC experiments in the K562 cell line and ranges approximately from -50 (closed) to +50 (open). Because some covariate values are missing, the missForest method, a widely used nonparametric imputation method based on random forests, is used to fill them in. Other covariates can of course be added to the model. We use the maximum likelihood method in the R package countreg (https://r-forge.r-project.org/R/?group_id=522) to estimate the regression coefficients and the distribution parameter $\theta$.
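To make the estimation target concrete, the following Python sketch writes out the log link and the zero-truncated negative binomial log-likelihood that a maximum likelihood routine would maximise. It is not the countreg implementation: the function names, the argument layout (`beta[0]` as intercept, one covariate row per gene), and the mean-dispersion parameterisation are assumptions made for illustration, and only genes with at least one observed mutation enter the likelihood.

```python
import math


def expected_count(beta, x):
    """Log link: mu = exp(beta0 + sum_k beta_k * x_k)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return math.exp(eta)


def ztnb_log_pmf(y, mu, theta):
    """Log PMF of the zero-truncated NB (mean mu, dispersion theta) for y >= 1."""
    log_f0 = theta * math.log(theta / (theta + mu))  # log P(Y = 0) under the NB
    log_f = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + log_f0
        + y * math.log(mu / (mu + theta))
    )
    return log_f - math.log1p(-math.exp(log_f0))


def log_likelihood(beta, theta, X, y):
    """Sum of zero-truncated NB log-likelihoods over genes with y_i >= 1."""
    return sum(
        ztnb_log_pmf(yi, expected_count(beta, xi), theta)
        for xi, yi in zip(X, y)
    )
```

In practice `log_likelihood` would be handed to a numerical optimiser over `beta` and `theta`; the sketch only fixes what is being optimised.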
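Once the above model has been fitted, the logarithm of the expected number of mutant alleles at the nonsynonymous and splicing mutation sites of gene $i$, $\log(\hat\mu_i)$, can be calculated as
$$\log(\hat\mu_i)=\hat\beta_0+\sum_{k=1}^{6}\hat\beta_k\,x_{k,i},$$
where $\hat\beta_0,\ldots,\hat\beta_6$ are the estimated regression coefficients and $x_{k,i}$ is the value of covariate $x_k$ for gene $i$.
Once the regression parameters and the dispersion parameter are determined, the probability that gene $i$ has zero mutations under the negative binomial distribution is
$$f(0;\hat\mu_i,\hat\theta)=\left(\frac{\hat\theta}{\hat\theta+\hat\mu_i}\right)^{\hat\theta}.$$
In the zero-truncated model, the raw residual of gene $i$ is: [equation not reproduced in the source text]
The deviance residual of gene $i$ is: [equation not reproduced in the source text]
where $\mathrm{sign}(x)$ is the standard sign function and $ll(\mu,\theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i\mid\mu,\theta)=\ln[g(y_i;\mu_i,\theta)].$$
The mean of the observation $y_i$ that enters the deviance residual is obtained from: [equation not reproduced in the source text]
In the analysis of actual data, we found that the standard normal distribution can be used to approximate the P values of the standardized deviance residuals. The deviance residuals are standardized: $\acute{e}_i=(d_i-\bar d)/\hat\sigma_d$, where $d_i$ denotes the deviance residual of gene $i$, and $\bar d$ and $\hat\sigma_d$ are the estimated mean and standard deviation of the deviance residuals. The P value is calculated as
$$p_i=1-\Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.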
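The standardization and the one-sided normal P value can be sketched in Python as below. The deviance residuals are taken as given input. The text only says that the mean and standard deviation are estimated, so the use of the plain sample mean and sample standard deviation here is an assumption, as are the function names.

```python
import math
import statistics


def normal_cdf(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def residual_p_values(deviance_residuals):
    """Standardize deviance residuals and return (z-scores, upper-tail P values)."""
    mean = statistics.fmean(deviance_residuals)  # assumed estimator of the mean
    sd = statistics.stdev(deviance_residuals)    # assumed estimator of the SD
    z = [(d - mean) / sd for d in deviance_residuals]
    p = [1.0 - normal_cdf(v) for v in z]
    return z, p
```

The test is one-sided: only genes with more mutations than expected (large positive standardized residuals) receive small P values.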
Under the null hypothesis that most genes are non-driver background genes, the ITER model estimates the expected number of mutant alleles at somatic nonsynonymous and splicing mutation sites. The larger $\acute{e}_i$ is, the more the observed number of mutant alleles at somatic mutation sites exceeds the predicted expectation, and the more likely the gene is a cancer driver gene.
Layer 1 uses an iterative regression scheme to reduce the influence of driver genes on the null-hypothesis regression model.
Step 1: Calculate the P values of all genes with ITER.
Step 2: Remove significant genes at a false discovery rate (FDR) threshold of ≤ 0.1.
Step 3: Recalculate the P values of the remaining genes with ITER.
Step 4: Repeat steps 2 and 3 until no significant genes remain.
The ITER model obtained in the last iteration is the closest to the null-hypothesis model, and it is used to calculate the P values of all genes (including those removed during iteration).
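The four steps can be sketched as a small Python loop. The regression itself is passed in as two hypothetical callables, `fit` and `predict_p`, so that the sketch stays independent of any particular model implementation. The text does not say how the FDR is computed; the Benjamini-Hochberg adjustment used here is an assumption.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def iterative_fit(genes, fit, predict_p, fdr=0.1, max_iter=50):
    """Refit on background genes until no gene is significant at the FDR threshold.

    fit(genes) -> model; predict_p(model, genes) -> list of P values.
    Returns the final model and the P values of ALL genes under that model.
    """
    background = list(genes)
    model = fit(background)
    for _ in range(max_iter):
        q = bh_adjust(predict_p(model, background))
        kept = [g for g, qv in zip(background, q) if qv > fdr]
        if len(kept) == len(background) or not kept:
            break  # nothing significant left (or nothing left to fit on)
        background = kept
        model = fit(background)
    return model, predict_p(model, genes)
```

Returning P values for all genes, including those removed along the way, mirrors the final sentence of the procedure: the last model is treated as the best approximation of the null and is applied to every gene.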
Layer 2: weighted iterative zero-truncated negative binomial regression
Layer 2 extends ITER by weighting mutation sites, giving the more powerful WITER method. Let $s_{i,j}\in[0,1]$ denote the score of mutation site $j$ of gene $i$; the score indicates how likely the site is to be a cancer driver mutation site. $s_{i,j}$ is converted to an integer score $w_{i,j}$ by taking the largest integer of $s_{i,j}/0.1$. This integer score serves as the prior weight of the mutation site. ITER is the special case of WITER in which $w_{i,j}=1$. The weighted number of mutant alleles is
$$y_i^{w}=\sum_{j=1}^{m_i}w_{i,j}\,c_{i,j}.$$
Likewise, the weighted number of mutant alleles is assumed to follow a negative binomial distribution:
$$y_i^{w}\sim\mathrm{NB}(\mu_i^{w},\theta^{w}),$$
where $\mu_i^{w}$ is the expected number of mutations and $\theta^{w}$ is the dispersion parameter of the distribution. After the original mutant allele count $y_i$ is replaced by the weighted count $y_i^{w}$, the iterative zero-truncated negative binomial regression procedure is unchanged; it tests whether a gene has an excess of weighted mutant alleles at its nonsynonymous and splicing mutation sites.
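The weighting step can be illustrated with a short Python sketch. The text leaves the rounding direction of the integer conversion unclear; rounding up (ceiling) is used below purely as an assumption, and the function names are hypothetical.

```python
import math


def site_weight(score, step=0.1):
    """Integer prior weight from a site score in [0, 1].

    ASSUMPTION: s/0.1 is rounded up; the source does not state the direction.
    round(..., 9) guards against floating-point noise such as 0.9/0.1 = 9.000000000000002.
    """
    return math.ceil(round(score / step, 9))


def weighted_allele_count(site_counts, site_scores):
    """Weighted mutant allele count of one gene: sum over sites of w_ij * c_ij."""
    return sum(site_weight(s) * c for c, s in zip(site_counts, site_scores))
```

With every weight equal to 1 the weighted count reduces to the plain sum of site counts, which corresponds to ITER being a special case of WITER.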
Layer 2 builds a weighting model to predict the score $s_{i,j}$ of potential high-frequency somatic driver mutations. The weighting model is a random forest model (with 500 decision trees) trained on the large cancer somatic mutation database COSMIC (V83). To avoid circularity, all samples (n = 7,916) of the 34 cancers used for testing were removed from the COSMIC database. A total of 4,320 somatic mutations in COSMIC (V83) were collected as the positive training set; their frequency in primary cancer tissues is more than 15 times the average level. A further 258,846 somatic mutations were randomly drawn from the COSMIC samples as the negative control set; each control mutation occurs only once in primary cancer tissues. The predictors for each mutation comprise 19 scores of functional deleteriousness from the dbNSFP v3.5 [9] database.
Layer 3: ITER or WITER borrowing a reference sample to analyze small cancer samples
In a small sample, when the number of somatic mutations is too small (for example, fewer than 28,000), it is difficult to build a stable regression model. Note, however, that the core of ITER and WITER is to model non-driver background genes. When the mutation rates of non-driver genes in two cancers are close, integrating the background genes of one cancer into those of the other is feasible and effective. Layer 3 therefore proposes a strategy of borrowing a reference sample, which makes it possible to build a stable ITER or WITER model for a small sample. It is implemented in two steps:
Step 1: Generate P values for the genes of a reference somatic mutation sample with the ITER or WITER method described above (the reference sample is assumed to contain a sufficient number of mutations). Remove genes with small P values using a very loose threshold, for example genes whose P values correspond to an FDR below 0.8.
Step 2: Integrate the somatic mutations of the retained genes into the user's own small sample, then input all of them into ITER or WITER to build a new regression model. Finally, use the new model to estimate the high-frequency somatic mutations in genes and the corresponding P values.
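The two steps can be sketched in Python as follows. The sketch reads "integrating the somatic mutations of the retained genes" as pooling per-gene mutant allele counts, which is one plausible interpretation rather than a confirmed detail of the method; the function name, the dictionary layout, and the treatment of genes without an FDR value as not retained are all assumptions.

```python
def borrow_reference(small_counts, reference_counts, reference_fdr, fdr_cutoff=0.8):
    """Pool background-like reference genes into a small sample.

    small_counts, reference_counts: dict mapping gene -> mutant allele count.
    reference_fdr: dict mapping gene -> FDR from the reference ITER/WITER run.
    Reference genes with FDR below the cutoff are treated as possible drivers
    and are not borrowed.
    """
    merged = dict(small_counts)
    for gene, count in reference_counts.items():
        if reference_fdr.get(gene, 0.0) >= fdr_cutoff:  # keep background genes only
            merged[gene] = merged.get(gene, 0) + count
    return merged
```

The merged counts would then be passed to ITER or WITER in place of the small sample alone. The loose cutoff removes a large share of reference genes on purpose, so that mostly clear background genes stabilise the regression.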
Compared with other methods, the method provided by the invention not only detects more cancer driver genes accurately but is also fast. In all 11 cancers tested, the method always detected more significant cancer genes while avoiding inflation and deflation of statistical significance (see Fig. 2). In evaluations on several very small samples, the method detected significant cancer driver genes even with a sample size of only about 30. Based on these results, the method was used to produce a global landscape of potential driver genes for 32 cancers. The landscape includes more than 100 genes specific to 23 cancers, which are potential markers for diagnosing and treating cancer.
Obviously, the above embodiment is merely an example given to illustrate the present invention clearly and does not limit its embodiments. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
References
[1] Lawrence M S, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes[J]. Nature, 2013, 499(7457): 214-218.
[2] Dees N D, Zhang Q, Kandoth C, et al. MuSiC: identifying mutational significance in cancer genomes[J]. Genome Res, 2012, 22(8): 1589-98.
[3] Vogelstein B, Papadopoulos N, Velculescu V E, et al. Cancer genome landscapes[J]. Science, 2013, 339(6127): 1546-58.
[4] Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers[J]. Nucleic Acids Res, 2012, 40(21): e169.
[5] Mularoni L, Sabarinathan R, Deu-Pons J, et al. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations[J]. Genome Biol, 2016, 17(1): 128.
[6] Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes[J]. Bioinformatics, 2013, 29(18): 2238-44.
[7] Tokheim C J, Papadopoulos N, Kinzler K W, et al. Evaluating the evaluation of cancer driver genes[J]. Proc Natl Acad Sci U S A, 2016, 113(50): 14330-14335.
[8] Samocha K E, Robinson E B, Sanders S J, et al. A framework for the interpretation of de novo mutation in human disease[J]. Nat Genet, 2014, 46(9): 944-50.
[9] Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions[J]. Hum Mutat, 2011, 32(8): 894-9.

Claims (2)

1. A statistical analysis method for detecting potential cancer driver genes, characterized by comprising the following steps:
S1. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splicing mutation site $j$ of a background gene $i$. If gene $i$ has $m_i$ mutation sites, let $y_i=\sum_{j=1}^{m_i}c_{i,j}$ denote the total number of mutant alleles over all of its mutation sites; $y_i$ follows a negative binomial (NB) distribution:
$$y_i\sim\mathrm{NB}(\mu_i,\theta),$$
where $\mu_i$ is the expected number of mutations and $\theta$ is the dispersion parameter of the distribution;
the probability mass function is then
$$f(y_i;\mu_i,\theta)=\frac{\Gamma(y_i+\theta)}{\Gamma(\theta)\,y_i!}\left(\frac{\theta}{\theta+\mu_i}\right)^{\theta}\left(\frac{\mu_i}{\mu_i+\theta}\right)^{y_i},$$
where $\Gamma(\cdot)$ is the gamma function;
S2. The mutant allele counts of background gene $i$ are fitted with a zero-truncated negative binomial distribution, whose probability mass function is
$$g(y_i;\mu_i,\theta)=\frac{f(y_i;\mu_i,\theta)}{1-f(0;\mu_i,\theta)},\qquad y_i=1,2,\ldots;$$
S3. A generalized linear regression model is constructed:
$$\begin{aligned}
\eta=\log(\mu_i)=\beta_0&+\beta_1\times[x_1,\ \text{number of mutant alleles at synonymous mutation sites}]\\
&+\beta_2\times[x_2,\ \text{coding-region length}]\\
&+\beta_3\times[x_3,\ \text{constraint score for potential de novo mutations}]\\
&+\beta_4\times[x_4,\ \text{expression in cancer cell lines from the Cancer Cell Line Encyclopedia database}]\\
&+\beta_5\times[x_5,\ \text{DNA replication timing in HeLa cells}]\\
&+\beta_6\times[x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}$$
The regression coefficients and the distribution parameter $\theta$ are estimated by maximum likelihood;
the logarithm of the expected number of mutant alleles at the nonsynonymous and splicing mutation sites of gene $i$, $\log(\hat\mu_i)$, can then be calculated as
$$\log(\hat\mu_i)=\hat\beta_0+\sum_{k=1}^{6}\hat\beta_k\,x_{k,i},$$
where $\hat\beta_0,\ldots,\hat\beta_6$ are the estimated regression coefficients and $x_{k,i}$ is the value of covariate $x_k$ for gene $i$;
S4. Once the regression parameters are determined, the probability that gene $i$ has zero mutations under the negative binomial distribution is
$$f(0;\hat\mu_i,\hat\theta)=\left(\frac{\hat\theta}{\hat\theta+\hat\mu_i}\right)^{\hat\theta}.$$
In the zero-truncated model, the raw residual of gene $i$ is: [equation not reproduced in the source text]
The deviance residual of gene $i$ is: [equation not reproduced in the source text]
where $\mathrm{sign}(x)$ is the standard sign function and $ll(\mu,\theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i\mid\mu,\theta)=\ln[g(y_i;\mu_i,\theta)].$$
The mean of the observation $y_i$ that enters the deviance residual is obtained from: [equation not reproduced in the source text]
S5. The deviance residuals are standardized: $\acute{e}_i=(d_i-\bar d)/\hat\sigma_d$, where $d_i$ denotes the deviance residual of gene $i$, and $\bar d$ and $\hat\sigma_d$ are the estimated mean and standard deviation of the deviance residuals; the P value of the standardized deviance residual is calculated from the standard normal distribution:
$$p_i=1-\Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution;
S6. The P values of all genes are calculated using steps S1 to S5;
S7. Significant genes whose P values fall below a threshold are removed;
S8. The P values of the remaining genes are recalculated using steps S1 to S5;
S9. Steps S7 and S8 are repeated until no significant genes remain; the P values of all genes are then calculated with the model parameters determined at that point;
Steps S1 to S9 are used to generate P values for the genes of a reference somatic mutation sample, and genes with small P values are then removed. The somatic mutations of the retained genes are integrated into the small sample, and the high-frequency somatic mutations in genes and the corresponding P values are then estimated by the method of steps S1 to S9.
2. The statistical analysis method for detecting potential cancer driver genes according to claim 1, characterized in that: in step S1, a weighting model is built on a random forest model to predict a score $s_{i,j}$ for mutation site $j$ of gene $i$, and $s_{i,j}$ is converted to an integer score $w_{i,j}=\mathrm{Integer}(s_{i,j}/0.1)$; the integer score serves as the prior weight of the mutation site, and the weighted number of mutant alleles is
$$y_i^{w}=\sum_{j=1}^{m_i}w_{i,j}\,c_{i,j}.$$
The weighted number of mutant alleles is assumed to follow a negative binomial distribution:
$$y_i^{w}\sim\mathrm{NB}(\mu_i^{w},\theta^{w}),$$
where $\mu_i^{w}$ is the expected number of mutations and $\theta^{w}$ is the dispersion parameter of the distribution.
The subsequent steps are then carried out as before.
CN201810902841.XA 2018-08-09 2018-08-09 Statistical analysis method for detecting potential cancer driver gene Active CN109346127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810902841.XA CN109346127B (en) 2018-08-09 2018-08-09 Statistical analysis method for detecting potential cancer driver gene

Publications (2)

Publication Number Publication Date
CN109346127A true CN109346127A (en) 2019-02-15
CN109346127B CN109346127B (en) 2021-10-08

Family

ID=65296755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810902841.XA Active CN109346127B (en) 2018-08-09 2018-08-09 Statistical analysis method for detecting potential cancer driver gene

Country Status (1)

Country Link
CN (1) CN109346127B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910955A (en) * 2019-10-21 2020-03-24 中山大学 Establishment method of longitudinal analysis model of rare variation sites of susceptibility genes
CN112259163A (en) * 2020-10-28 2021-01-22 广西师范大学 Cancer driving module identification method based on biological network and subcellular localization data
WO2021042237A1 (en) * 2019-09-02 2021-03-11 北京哲源科技有限责任公司 Method for obtaining intracellular deterministic event, and electronic device
CN113517021A (en) * 2021-06-09 2021-10-19 海南精准医疗科技有限公司 Cancer driver gene prediction method
CN117809741A (en) * 2024-03-01 2024-04-02 浙江大学 Method and device for predicting cancer characteristic genes based on molecular evolution selective pressure

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170319692A1 (en) * 2016-04-22 2017-11-09 The Cleveland Clinic Foundation Anti-ar agent and radiation therapy for androgen receptor positive cancer
CN108256291A (en) * 2016-12-28 2018-07-06 杭州米天基因科技有限公司 A method for generating gene mutation detection results with higher confidence
CN106980763A (en) * 2017-03-30 2017-07-25 大连理工大学 A screening method for cancer driver genes based on gene mutation frequency

Non-Patent Citations (3)

Title
H SAKAMOTO ET AL: "Disproportionate representation of KRAS gene mutation in atypical adenomatous hyperplasia, but even distribution of EGFR gene mutation from preinvasive to invasive adenocarcinomas", 《JOURNAL OF PATHOLOGY》 *
MICHAEL S. LAWRENCE ET AL: "Mutational heterogeneity in cancer and the search for new cancer-associated genes", 《NATURE》 *
付立平 ET AL: "Mutation analysis of 20 STR loci in 1,834 paternity testing cases", 《WORLD LATEST MEDICINE INFORMATION (ELECTRONIC VERSION)》 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
WO2021042237A1 (en) * 2019-09-02 2021-03-11 北京哲源科技有限责任公司 Method for obtaining intracellular deterministic event, and electronic device
CN112840402A (en) * 2019-09-02 2021-05-25 北京哲源科技有限责任公司 Method and electronic device for obtaining deterministic events in cells
CN110910955A (en) * 2019-10-21 2020-03-24 中山大学 Establishment method of longitudinal analysis model of rare variation sites of susceptibility genes
CN110910955B (en) * 2019-10-21 2024-03-01 中山大学 Method for establishing longitudinal analysis model of rare mutation sites of susceptibility genes
CN112259163A (en) * 2020-10-28 2021-01-22 广西师范大学 Cancer driving module identification method based on biological network and subcellular localization data
CN112259163B (en) * 2020-10-28 2022-04-22 广西师范大学 Cancer driving module identification method based on biological network and subcellular localization data
CN113517021A (en) * 2021-06-09 2021-10-19 海南精准医疗科技有限公司 Cancer driver gene prediction method
CN113517021B (en) * 2021-06-09 2022-09-06 海南精准医疗科技有限公司 Cancer driver gene prediction method
CN117809741A (en) * 2024-03-01 2024-04-02 浙江大学 Method and device for predicting cancer characteristic genes based on molecular evolution selective pressure

Also Published As

Publication number Publication date
CN109346127B (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN109346127A (en) Statistical analysis method for detecting potential cancer driver gene
CN107025384A (en) Construction method for a complex data prediction model
CN114686591B (en) Lung squamous cell carcinoma immunotherapy curative effect prediction model based on gene expression condition, construction method and application thereof
CN106033502A (en) Virus identification method and device
EP3794145A1 (en) Inferring selection in white blood cell matched cell-free dna variants and/or in rna variants
CN111440869A (en) DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof
Dash et al. Performance analysis of clustering techniques over microarray data: A case study
Dawany et al. Asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types
Karimnezhad et al. Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach
Luo et al. A new approach for the 10.7-cm solar radio flux forecasting: based on empirical mode decomposition and LSTM
Sobhan et al. Explainable machine learning to identify patient-specific biomarkers for lung cancer
Zhuge et al. Construction of the model for predicting prognosis by key genes regulating EGFR-TKI resistance
Tai et al. Bayice: a Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data
Shahweli et al. In Silico Molecular Classification of Breast and Prostate Cancers using Back Propagation Neural Network
Bell-Glenn et al. A novel framework for the identification of reference dna methylation libraries for reference-based deconvolution of cellular mixtures
US11435357B2 (en) System and method for discovery of gene-environment interactions
Jin et al. Feature selection and classification over the network with missing node observations
Tsai et al. Significance analysis of ROC indices for comparing diagnostic markers: applications to gene microarray data
US8793209B2 (en) Reflecting the quantitative impact of ordinal indicators
Li et al. Using the SVM Method for Lung Adenocarcinoma Prognosis Based on Expression Level
Zhang et al. Network propagation models for gene selection
KR20220085139A (en) Method of gene selection for predicting medical information of patients and uses thereof
Zeng et al. A Novel Prognosis Model based on Comprehensive Analysis of Pyroptosis-Related Genes in Breast Cancer
Lakshmi et al. Design of a state-machine based genomic simulator and development of a system for prediction of Rheumatoid Arthritis (RA) using signal processing techniques
Zheng et al. TsImpute: an accurate two-step imputation method for single-cell RNA-seq data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant