CN109346127A - Statistical analysis method for detecting potential cancer driver genes - Google Patents
- Publication number
- CN109346127A (application number CN201810902841.XA)
- Authority
- CN
- China
- Prior art keywords
- gene
- mutation
- value
- cancer
- distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention provides a statistical analysis method for detecting potential cancer driver genes. The method fits the genome-wide somatic mutation rate accurately, so that cancer driver genes can be screened more effectively. In particular, the method is not limited by sample size and also improves the detection of cancer driver genes in small samples.
Description
Technical field
The present invention relates to the field of biotechnology, and more particularly to a statistical analysis method for detecting potential cancer driver genes.
Background art
At present there are two main approaches to detecting cancer driver genes from somatic mutations: (1) background mutation rate (BMR) methods and (2) ratio-metric methods. BMR methods, such as MutSigCV[1] and MuSiC[2], assess whether a gene carries more somatic mutations in cancer samples than expected, where the expected mutation count is predicted from several correlated indicators. These predictors include gene characteristics, coding-region length and so on. MutSigCV additionally proposes three variables directly related to cancer cells (DNA replication timing, transcriptional activity and chromatin state in cancer cells) to improve the prediction of the expected background mutation count.
Ratio-metric methods detect cancer driver genes by examining the proportions of different types of somatic mutations in a gene. For example, a method called the 20/20 rule makes a simple assessment of the proportions of inactivating mutations and recurrent missense mutations[3]. Oncodrive-fm[4] and OncodriveFML[5] integrate the functional impact of mutations into the assessment to improve prediction. OncodriveCLUST[6] considers the positional clustering of mutations. A recent method named 20/20+[7] continues the 20/20 ratio-metric idea and incorporates 18 additional features of positive selection in cancer cells (for example, protein interaction network degree) to predict cancer driver genes by machine learning. Because it relies on Monte Carlo simulation to estimate the statistical significance (that is, the P value) of its scores, it can be slow.
Although the basic principles of both approaches are fairly simple, technical obstacles remain, especially poor performance in small samples. For example, a recent study[7] showed that the P values computed by existing cancer driver gene detection methods do not follow a uniform distribution, which indicates that the background mutation models they obtain are poorly fitted. Computer-based random simulation can be used to correct the P value distribution, but the key point is that real cancer driver genes can only be identified accurately among noisy background genes when the background genes are fitted properly. The problem becomes more acute when the sample is too small to yield a stable model, so existing statistical analyses are usually inefficient at detecting cancer driver genes from small samples.
For this reason, some studies have proposed integrating shared gene features by supervised learning before the sample is examined[7]. However, cancer is highly heterogeneous and specific[1]. If too many additional general predictive features are added, driver genes specific to the cancer under test may be overlooked, and a model built from known driver genes is often limited in its ability to find new driver genes. In addition, because of poor fitting, the cancer driver genes predicted by different tools rarely agree with one another, and merging these results is usually both difficult and biased. More effective methods are therefore urgently needed to reveal the complete landscape of cancer driver genes.
Summary of the invention
Because cancer is highly heterogeneous, and most cancer driver genes appear to have relatively mild and inconspicuous effects, existing methods are usually inefficient at identifying cancer driver genes at typical sample sizes. In practical research, limited resources and funding mean that samples are often not large enough, and existing methods find it even harder to identify cancer driver genes accurately in small samples.
To achieve the above object of the invention, the technical solution adopted is as follows.
A statistical analysis method for detecting potential cancer driver genes, comprising the following steps:
S1. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splice-site mutation site $j$ of a background gene $i$. If gene $i$ has $m_i$ mutation sites, $y_i = \sum_{j=1}^{m_i} c_{i,j}$ denotes the total number of mutant alleles over all of its mutation sites, and $y_i$ follows a negative binomial (NB) distribution:
$$y_i \sim \mathrm{NB}(\mu_i, \theta),$$
where $\mu_i$ is the expected mutation count and $\theta$ is the dispersion parameter of the distribution.
The probability mass function is then
$$f(y_i \mid \mu_i, \theta) = \frac{\Gamma(y_i + \theta)}{\Gamma(\theta)\, y_i!} \cdot \frac{\mu_i^{y_i}\, \theta^{\theta}}{(\mu_i + \theta)^{y_i + \theta}},$$
where $\Gamma(\cdot)$ is the gamma function;
S2. The mutant allele counts of background gene $i$ are fitted with a zero-truncated negative binomial model, whose probability mass function is
$$g(y_i \mid \mu_i, \theta) = \frac{f(y_i \mid \mu_i, \theta)}{1 - f(0 \mid \mu_i, \theta)}, \qquad y_i = 1, 2, \ldots;$$
S3. A generalized linear regression model is constructed:
$$
\begin{aligned}
\eta_i = \log(\mu_i) = \beta_0 &+ \beta_1 \times [x_1,\ \text{number of mutant alleles at synonymous mutation sites}] \\
&+ \beta_2 \times [x_2,\ \text{coding-region length}] \\
&+ \beta_3 \times [x_3,\ \text{constraint score for potential de novo mutations}] \\
&+ \beta_4 \times [x_4,\ \text{cancer cell line expression in the Cancer Cell Line Encyclopedia database}] \\
&+ \beta_5 \times [x_5,\ \text{DNA replication timing in HeLa cells}] \\
&+ \beta_6 \times [x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}
$$
The coefficients of the regression equation and the distribution parameter $\theta$ are estimated by maximum likelihood.
The logarithm of the expected number of mutant alleles at nonsynonymous and splice-site mutation sites of gene $i$, $\hat{\eta}_i$, can then be calculated as
$$\hat{\eta}_i = \log(\hat{\mu}_i) = \hat{\beta}_0 + \sum_{k=1}^{6} \hat{\beta}_k x_{k,i},$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_6$ are the estimated coefficients of the regression equation;
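S4. Once the parameters of the regression equation have been determined, the probability that gene $i$ has zero mutations is
$$\hat{p}_{i,0} = f(0 \mid \hat{\mu}_i, \hat{\theta}) = \left( \frac{\hat{\theta}}{\hat{\mu}_i + \hat{\theta}} \right)^{\hat{\theta}}.$$
In the zero-truncated model, the raw residual of gene $i$ is
$$r_i = y_i - \frac{\hat{\mu}_i}{1 - \hat{p}_{i,0}}.$$
The deviance residual of gene $i$ is
$$d_i = \operatorname{sign}(r_i) \sqrt{2 \left[ ll(y_i \mid \mu_i^{*}, \hat{\theta}) - ll(y_i \mid \hat{\mu}_i, \hat{\theta}) \right]},$$
where $\operatorname{sign}(x)$ is the standard sign function and $ll(\mu, \theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i \mid \mu, \theta) = \ln\left[ g(y_i \mid \mu_i, \theta) \right].$$
$\mu_i^{*}$ is the mean corresponding to the observation $y_i$, obtained from
$$y_i = \frac{\mu_i^{*}}{1 - f(0 \mid \mu_i^{*}, \hat{\theta})};$$
S5. The deviance residuals are standardized: $\acute{e}_i = (d_i - \bar{d}) / s_d$, where $\bar{d}$ and $s_d$ are the estimated mean and standard deviation of the deviance residuals. The P value of the standardized deviance residual is calculated from the standard normal distribution:
$$p_i = 1 - \Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^{2}/2}\, dt;$$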
S6. The P values of all genes are calculated using steps S1 to S5;
S7. Significant genes whose P values are too small are removed using a threshold;
S8. The P values of the remaining genes are calculated using steps S1 to S5;
S9. Steps S7 and S8 are repeated until no significant genes with too-small P values remain; the model parameters determined at this point are used to calculate the P values of all genes;
Steps S1 to S9 are first used to generate P values for the genes of a reference somatic mutation sample, and genes whose P values are too small are removed. The somatic mutations of the retained genes are then integrated into the small sample, and the method of steps S1 to S9 is applied to estimate the high-frequency somatic mutations in genes and the corresponding P values.
To further improve detection power, in step S1 a weighting model based on a random forest is built to predict the score $s_{i,j}$ of mutation site $j$ of gene $i$, and $s_{i,j}$ is converted to an integer score $w_{i,j} = \lceil s_{i,j} / 0.1 \rceil$. The integer score serves as the priority weight of the mutation site, so the weighted mutant allele count is
$$\tilde{y}_i = \sum_{j=1}^{m_i} w_{i,j}\, c_{i,j}.$$
The weighted mutant allele count is likewise assumed to follow a negative binomial distribution:
$$\tilde{y}_i \sim \mathrm{NB}(\tilde{\mu}_i, \tilde{\theta}),$$
where $\tilde{\mu}_i$ is the expected mutation count and $\tilde{\theta}$ is the dispersion parameter of the negative binomial distribution.
The subsequent steps are then carried out as before.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The method provided by the invention fits the genome-wide somatic mutation rate accurately, so that cancer driver genes can be screened from background genes more effectively. In particular, the method is not limited by sample size and also improves the detection of driver genes in small samples.
Brief description of the drawings
Fig. 1 is a schematic diagram of the framework of the method.
Fig. 2 compares the performance of 4 methods in detecting cancer driver genes across 11 cancers.
In Fig. 2, a: mean of the log fold change; b: number of significant genes; c: number of significant cancer genes shared by the 5 methods; d: number of significant genes unique to each method; e: number of genes unique to each method that match genes in cancer gene sets.
Genes whose P values fall below the threshold FDR = 0.1 are removed. Cancer labels: BLCA: bladder urothelial carcinoma; BRCA: breast cancer; COAD: colon cancer; UCEC: endometrial carcinoma; HNSC: head and neck squamous cell carcinoma; KIRC: kidney renal clear cell carcinoma; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; MEL: melanoma; OV: ovarian serous cystadenocarcinoma; STAD: stomach adenocarcinoma.
Specific embodiment
The drawings are for illustrative purposes only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the framework of the method provided by the invention comprises three layers. Layer 1 is iterative zero-truncated negative binomial regression (ITER), a background mutation model that estimates the expected number of nonsynonymous and splice-site mutations in a single gene from various genomic features. Layer 2 is weighted iterative zero-truncated negative binomial regression (WITER), which generates priority weights and uses the predicted weights to distinguish potentially high-risk mutations from low-risk mutations in cancer samples. Layer 3 integrates reference samples: by adding independent reference samples, it mitigates the instability of the regression model caused by the insufficient size of the user's own sample. Layer 1 is part of layer 2, and layers 1 and 2 are part of layer 3. The main inputs are the mutant allele counts at nonsynonymous, splice-site and synonymous mutation sites in the somatic cells of cancer patients. The output is the z-score and P value of the somatic mutations of each gene in the cancer samples.
Layer 1: iterative zero-truncated negative binomial regression
Layer 1 proposes a new method for fitting the distribution of somatic mutation counts of genes across the genome, named iterative zero-truncated negative binomial regression (ITER). The difference between the observed number of somatic mutations and the estimated expected number is used to measure whether a gene carries an excess of somatic mutations in a given cancer. Nonsynonymous and splice-site mutations are the mutations of interest: if a gene contains significantly more mutations than expected for a gene of its type, it may be a driver gene that promotes cancer growth. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splice-site mutation site $j$ of a background gene $i$. Assuming that the gene has $m_i$ mutation sites, $y_i = \sum_{j=1}^{m_i} c_{i,j}$ denotes the total number of mutant alleles over all of its mutation sites, and $y_i$ follows a negative binomial (NB) distribution:
$$y_i \sim \mathrm{NB}(\mu_i, \theta),$$
where $\mu_i$ is the expected mutation count and $\theta$ is the dispersion parameter of the distribution. The probability mass function (PMF) is
$$f(y_i \mid \mu_i, \theta) = \frac{\Gamma(y_i + \theta)}{\Gamma(\theta)\, y_i!} \cdot \frac{\mu_i^{y_i}\, \theta^{\theta}}{(\mu_i + \theta)^{y_i + \theta}},$$
where $\Gamma(\cdot)$ is the gamma function.
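The negative binomial PMF with mean $\mu$ and dispersion $\theta$ can be sketched in Python as follows. This is an illustrative sketch only: the function names `nb_logpmf` and `nb_pmf` are hypothetical, and the parameter values in the check are arbitrary.

```python
import math

def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log PMF of a negative binomial with mean mu and dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + y * math.log(mu) + theta * math.log(theta)
            - (y + theta) * math.log(mu + theta))

def nb_pmf(y: int, mu: float, theta: float) -> float:
    """PMF obtained by exponentiating the log PMF (numerically safer)."""
    return math.exp(nb_logpmf(y, mu, theta))

# Arbitrary illustrative parameters: the probabilities should sum to 1
# and the mean of the distribution should equal mu.
mu, theta = 2.5, 1.2
total = sum(nb_pmf(y, mu, theta) for y in range(2000))
mean = sum(y * nb_pmf(y, mu, theta) for y in range(2000))
```

Working on the log scale with `math.lgamma` avoids overflow of the gamma function for large counts.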
However, somatic mutations are rare, and many genes carry no somatic mutation at ordinary sample sizes, so the observed data contain an excess of zeros. This zero inflation makes it difficult for the regression equation to fit the mutation counts accurately. Layer 1 therefore fits the mutant allele counts of background gene $i$ with a zero-truncated negative binomial model. The probability mass function of the zero-truncated negative binomial distribution is
$$g(y_i \mid \mu_i, \theta) = \frac{f(y_i \mid \mu_i, \theta)}{1 - f(0 \mid \mu_i, \theta)}, \qquad y_i = 1, 2, \ldots$$
Based on the zero-truncated negative binomial distribution, we construct a generalized linear regression model to predict the expected number of mutant alleles at nonsynonymous and splice-site mutation sites of gene $i$. The regression equation includes 6 covariates:
$$
\begin{aligned}
\eta_i = \log(\mu_i) = \beta_0 &+ \beta_1 \times [x_1,\ \text{number of mutant alleles at synonymous mutation sites}] \\
&+ \beta_2 \times [x_2,\ \text{coding-region length}] \\
&+ \beta_3 \times [x_3,\ \text{constraint score for potential de novo mutations}] \\
&+ \beta_4 \times [x_4,\ \text{cancer cell line expression in the Cancer Cell Line Encyclopedia database}] \\
&+ \beta_5 \times [x_5,\ \text{DNA replication timing in HeLa cells}] \\
&+ \beta_6 \times [x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}
$$
The number of mutant alleles at synonymous mutation sites is counted in the user's own test sample. Coding-region length is estimated from the RefGene reference gene model data. The gene constraint score is calculated according to Samocha et al. (2014)[8]. The last three covariates reuse the predictors of the MutSigCV[1] method. The expression value is the average expression of 91 cell lines in the Cancer Cell Line Encyclopedia (CCLE). DNA replication timing is measured in HeLa cells and ranges from 100 (early) to 1000 (late). The chromatin state of a gene is measured by HiC experiments in the K562 cell line and ranges roughly from -50 (closed) to +50 (open). Because some covariates have missing values, the missForest method is used to impute them. missForest is a widely used nonparametric missing-value imputation method based on random forests. Other covariates can of course be added to the model. We use the maximum likelihood method in the R package countreg (https://r-forge.r-project.org/R/?group_id=522) to estimate the regression coefficients and the distribution parameter θ.
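After the model has been fitted, the logarithm of the expected number of mutant alleles at nonsynonymous and splice-site mutation sites of gene $i$, $\hat{\eta}_i$, can be calculated as
$$\hat{\eta}_i = \log(\hat{\mu}_i) = \hat{\beta}_0 + \sum_{k=1}^{6} \hat{\beta}_k x_{k,i},$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_6$ are the estimated coefficients of the regression equation.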
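Given fitted coefficients, the expected count follows from the log link by exponentiating the linear predictor. The sketch below assumes the coefficients have already been estimated elsewhere (for example by the countreg package); the function name `expected_count` and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def expected_count(beta, covariates):
    """Return mu_hat = exp(beta_0 + sum_k beta_k * x_k) for one gene.

    beta: intercept followed by one coefficient per covariate.
    covariates: covariate values x_1..x_K for the gene.
    """
    if len(beta) != len(covariates) + 1:
        raise ValueError("need one intercept plus one coefficient per covariate")
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], covariates))
    return math.exp(eta)

# Hypothetical coefficients and covariate values, not taken from the patent.
beta_hat = [math.log(2.0), 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
x_gene = [2.0, 1500.0, 0.3, 4.2, 350.0, -12.0]
mu_hat = expected_count(beta_hat, x_gene)
```

In practice the covariates would usually be put on comparable scales before fitting, which the source does not specify.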
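Once the parameters of the equation and the dispersion parameter have been determined, the probability that gene $i$ has zero mutations is
$$\hat{p}_{i,0} = f(0 \mid \hat{\mu}_i, \hat{\theta}) = \left( \frac{\hat{\theta}}{\hat{\mu}_i + \hat{\theta}} \right)^{\hat{\theta}}.$$
In the zero-truncated model, the raw residual of gene $i$ is
$$r_i = y_i - \frac{\hat{\mu}_i}{1 - \hat{p}_{i,0}}.$$
The deviance residual of gene $i$ is
$$d_i = \operatorname{sign}(r_i) \sqrt{2 \left[ ll(y_i \mid \mu_i^{*}, \hat{\theta}) - ll(y_i \mid \hat{\mu}_i, \hat{\theta}) \right]},$$
where $\operatorname{sign}(x)$ is the standard sign function and $ll(\mu, \theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i \mid \mu, \theta) = \ln\left[ g(y_i \mid \mu_i, \theta) \right].$$
$\mu_i^{*}$ is the mean corresponding to the observation $y_i$, obtained from
$$y_i = \frac{\mu_i^{*}}{1 - f(0 \mid \mu_i^{*}, \hat{\theta})}.$$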
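The residual calculation can be sketched as follows, under the assumption that the saturated mean is the value of $\mu$ at which the zero-truncated mean equals the observed count. The bisection solver and all function names are hypothetical choices made for this illustration, not the procedure used by the invention or by countreg.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log PMF of a negative binomial with mean mu and dispersion theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + y * math.log(mu) + theta * math.log(theta)
            - (y + theta) * math.log(mu + theta))

def one_minus_p0(mu, theta):
    """1 - f(0 | mu, theta), computed stably for small mu."""
    return -math.expm1(-theta * math.log1p(mu / theta))

def ztnb_loglik(y, mu, theta):
    """Log-likelihood of one observation under the zero-truncated model."""
    return nb_logpmf(y, mu, theta) - math.log(one_minus_p0(mu, theta))

def ztnb_mean(mu, theta):
    """Mean of the zero-truncated distribution."""
    return mu / one_minus_p0(mu, theta)

def saturated_mu(y, theta, iterations=200):
    """Solve ztnb_mean(mu) = y by bisection (valid for y >= 2)."""
    lo, hi = 1e-12, float(y)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ztnb_mean(mid, theta) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def deviance_residual(y, mu_hat, theta):
    """Signed square root of the deviance contribution of one gene."""
    raw = y - ztnb_mean(mu_hat, theta)
    # For y = 1 the saturated mean tends to 0 and its log-likelihood to 0.
    ll_sat = 0.0 if y == 1 else ztnb_loglik(y, saturated_mu(y, theta), theta)
    deviance = max(0.0, 2.0 * (ll_sat - ztnb_loglik(y, mu_hat, theta)))
    return math.copysign(math.sqrt(deviance), raw)
```

A gene with more mutant alleles than the truncated mean predicts receives a positive residual, and one with fewer receives a negative residual.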
In the analysis of real data, we found that the standard normal distribution can be used to approximate the P values of the standardized deviance residuals. The deviance residuals are standardized as $\acute{e}_i = (d_i - \bar{d}) / s_d$, where $\bar{d}$ and $s_d$ are the estimated mean and standard deviation of the deviance residuals. The P value is calculated as
$$p_i = 1 - \Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
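The standardization and the upper-tail P value can be sketched with the Python standard library. The use of the sample standard deviation is an assumption, since the source only says "estimated standard deviation", and the function name and input values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def standardized_p_values(deviance_residuals):
    """Standardize deviance residuals and return (scores, upper-tail P values)."""
    d_bar = mean(deviance_residuals)
    s_d = stdev(deviance_residuals)  # sample standard deviation (assumption)
    scores = [(d - d_bar) / s_d for d in deviance_residuals]
    std_normal = NormalDist()        # mean 0, standard deviation 1
    p_values = [1.0 - std_normal.cdf(e) for e in scores]
    return scores, p_values

# Toy residuals, for illustration only.
scores, p_values = standardized_p_values([-1.0, 0.0, 1.0])
```

Only the upper tail is used because a driver gene is expected to carry more mutations than the background model predicts, not fewer.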
Assuming that most genes are non-driver background genes, the ITER model estimates, under this null hypothesis, the expected number of alleles at somatic nonsynonymous and splice-site mutation sites. A larger $\acute{e}_i$ indicates that the observed number of alleles at somatic mutation sites exceeds the predicted expectation by a larger margin, and that the gene is more likely to be a cancer driver gene.
Layer 1 uses an iterative regression procedure to reduce the influence of driver genes on the null-hypothesis regression model.
Step 1: Calculate the P values of all genes with ITER.
Step 2: Remove the significant genes whose P values are too small, using a false discovery rate (FDR) threshold of ≤ 0.1.
Step 3: Calculate the P values of the remaining genes with ITER.
Step 4: Repeat steps 2 and 3 until no significant genes with too-small P values remain.
The ITER model obtained in the last iteration is the one closest to the null-hypothesis model; it is used to calculate the P values of all genes (including the genes removed during iteration).
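The four steps above can be sketched as a loop around a generic model-fitting routine. In this sketch `fit` and `predict_p` are hypothetical callables standing in for the zero-truncated regression, and the Benjamini-Hochberg procedure is an assumed choice of FDR control, since the source does not name one.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest to smallest P value
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def iterative_fit(genes, fit, predict_p, fdr=0.1, max_iter=50):
    """Refit on background genes until no gene is significant at the FDR threshold."""
    background = list(genes)
    model = fit(background)                               # step 1
    for _ in range(max_iter):
        q = bh_adjust([predict_p(model, g) for g in background])
        kept = [g for g, qv in zip(background, q) if qv > fdr]   # step 2
        if not kept or len(kept) == len(background):
            break                                         # nothing left to remove
        background = kept
        model = fit(background)                           # step 3, repeated (step 4)
    # The final model scores every gene, including those removed on the way.
    return model, [predict_p(model, g) for g in genes]

# Toy stand-ins: the "model" is just the tuple of background genes and
# the P values come from a fixed lookup table.
table = {"A": 0.001, "B": 0.5, "C": 0.6, "D": 0.9}
model, p_all = iterative_fit(list(table), fit=tuple,
                             predict_p=lambda m, g: table[g])
```

In the toy run, gene "A" is removed in the first pass and the final model is fitted on the three remaining background genes, while "A" still receives a P value from that final model.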
Layer 2: weighted iterative zero-truncated negative binomial regression
Layer 2 extends the ITER method by weighting mutation sites, which yields the more powerful WITER method. Let $s_{i,j} \in [0, 1]$ denote the score of mutation site $j$ of gene $i$. The score indicates how likely the mutation site is to be a cancer driver mutation site. $s_{i,j}$ is converted to an integer score $w_{i,j}$ by rounding $s_{i,j}/0.1$ to an integer, that is, $w_{i,j} = \lceil s_{i,j} / 0.1 \rceil$. This integer score serves as the priority weight of the mutation site. ITER is the special case of WITER in which $w_{i,j} = 1$. The weighted mutant allele count is
$$\tilde{y}_i = \sum_{j=1}^{m_i} w_{i,j}\, c_{i,j}.$$
Likewise, the weighted mutant allele count is assumed to follow a negative binomial distribution:
$$\tilde{y}_i \sim \mathrm{NB}(\tilde{\mu}_i, \tilde{\theta}),$$
where $\tilde{\mu}_i$ is the expected mutation count and $\tilde{\theta}$ is the dispersion parameter of the negative binomial distribution. After the original mutant allele count $y_i$ is replaced by the weighted count $\tilde{y}_i$, the iterative zero-truncated negative binomial regression procedure is unchanged; it is used to detect whether a gene carries an excess of weighted mutant alleles at its nonsynonymous and splice-site mutation sites.
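The weighting step can be illustrated with a short sketch. It assumes that the conversion rounds $s_{i,j}/0.1$ up to the next integer; the function names and the example scores are hypothetical.

```python
import math

def site_weight(score: float) -> int:
    """Convert a site score in [0, 1] to an integer weight by rounding score/0.1 up."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # round() first guards against floating-point noise such as 0.3 / 0.1 = 2.999...
    return math.ceil(round(score / 0.1, 9))

def weighted_count(allele_counts, scores):
    """Weighted mutant allele count of one gene: sum of w_ij * c_ij over its sites."""
    return sum(site_weight(s) * c for c, s in zip(allele_counts, scores))

# Three mutation sites of a hypothetical gene.
counts = [2, 1, 3]
scores = [0.05, 0.95, 0.30]
y_weighted = weighted_count(counts, scores)   # 1*2 + 10*1 + 3*3
```

When every site has weight 1 the weighted count reduces to the plain count, which matches the statement that ITER is a special case of WITER.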
Layer 2 builds a weighting model to predict the score $s_{i,j}$ of potential high-frequency somatic driver mutations. The weighting model is based on a random forest. The training set of the random forest model (500 decision trees) is drawn from the large cancer somatic mutation database COSMIC (V83). To avoid circularity, all samples (n = 7,916) of the 34 cancers used for testing were removed from the COSMIC database. 4,320 somatic mutations in COSMIC (V83) were collected as the positive training set; the occurrence of these mutations in primary cancer tissue is more than 15 times the average level. 258,846 somatic mutations were also randomly selected from the COSMIC samples as the negative control set; each control mutation occurs only once in primary cancer tissue. The predictors for each mutation comprise 19 functional deleteriousness scores from the dbNSFP v3.5[9] database.
Layer 3: ITER or WITER with borrowed reference samples for small cancer samples
In a small sample, when the number of somatic mutations is too small (for example, fewer than 28,000), it is difficult to build a stable regression model. However, the core of ITER and WITER is to build a model of non-driver background genes. When the mutation rates of the non-driver genes of two cancers are close to each other, integrating the background genes of one cancer into the background genes of the other is feasible and effective. Layer 3 therefore proposes a strategy of borrowing reference samples, which makes it possible to build a stable ITER or WITER model for a small sample. It is generally implemented in two steps:
In the first step, the ITER or WITER method described above is used to generate P values for the genes of a reference somatic mutation sample (the reference sample is assumed to contain a sufficient number of mutations). Genes whose P values are too small are removed with a very loose threshold, for example by removing genes whose corresponding FDR is less than 0.8.
In the second step, the somatic mutations of the retained genes are integrated into the user's own small sample, and all of them are then input into ITER or WITER to build a new regression model. Finally, the new model is used to estimate the high-frequency somatic mutations in genes and the corresponding P values.
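One possible reading of the two steps is sketched below, in which the retained reference genes enter the regression as additional background observations alongside the user's genes. The source does not specify how the samples are merged, so this data layout, the function name and the gene labels are assumptions made purely for illustration.

```python
def build_training_rows(user_counts, reference_counts, reference_fdr,
                        fdr_threshold=0.8):
    """Combine the user's genes with reference genes that look like background.

    user_counts / reference_counts: dict mapping gene -> mutant allele count.
    reference_fdr: dict mapping gene -> FDR-adjusted P value in the reference sample.
    Reference genes with FDR below the (loose) threshold are dropped.
    """
    rows = [("user", gene, count) for gene, count in sorted(user_counts.items())]
    rows += [("reference", gene, count)
             for gene, count in sorted(reference_counts.items())
             if reference_fdr.get(gene, 0.0) >= fdr_threshold]
    return rows

# Hypothetical gene labels and counts.
rows = build_training_rows(
    user_counts={"GENE_A": 5, "GENE_B": 1},
    reference_counts={"GENE_A": 40, "GENE_B": 3, "GENE_C": 2},
    reference_fdr={"GENE_A": 0.001, "GENE_B": 0.95, "GENE_C": 0.85},
)
```

The combined rows would then be passed to ITER or WITER in place of the small sample alone; reference genes without an FDR value are conservatively excluded in this sketch.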
Compared with other methods, the method provided by the invention not only detects more cancer driver genes accurately but is also fast. In all 11 cancers tested, the method consistently detected more significant cancer genes while avoiding inflation and deflation of statistical significance (see Fig. 2). In evaluation tests on several very small samples, the method detected significant cancer driver genes even with a sample size of only about 30. Based on these detection results, the method was used to produce a global landscape of potential driver genes for 32 cancers. The landscape includes more than 100 cancer-specific genes across 23 cancers; these genes are potential markers for diagnosing and treating cancer.
Obviously, the above embodiment is merely an example given to illustrate the present invention clearly and does not limit the embodiments of the present invention. Those of ordinary skill in the art may make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
References
[1] Lawrence M S, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes[J]. Nature, 2013, 499(7457): 214-218.
[2] Dees N D, Zhang Q, Kandoth C, et al. MuSiC: identifying mutational significance in cancer genomes[J]. Genome Res, 2012, 22(8): 1589-98.
[3] Vogelstein B, Papadopoulos N, Velculescu V E, et al. Cancer genome landscapes[J]. Science, 2013, 339(6127): 1546-58.
[4] Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers[J]. Nucleic Acids Res, 2012, 40(21): e169.
[5] Mularoni L, Sabarinathan R, Deu-Pons J, et al. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations[J]. Genome Biol, 2016, 17(1): 128.
[6] Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes[J]. Bioinformatics, 2013, 29(18): 2238-44.
[7] Tokheim C J, Papadopoulos N, Kinzler K W, et al. Evaluating the evaluation of cancer driver genes[J]. Proc Natl Acad Sci U S A, 2016, 113(50): 14330-14335.
[8] Samocha K E, Robinson E B, Sanders S J, et al. A framework for the interpretation of de novo mutation in human disease[J]. Nat Genet, 2014, 46(9): 944-50.
[9] Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions[J]. Hum Mutat, 2011, 32(8): 894-9.
Claims (2)
1. A statistical analysis method for detecting potential cancer driver genes, characterized by comprising the following steps:
S1. Let $c_{i,j}$ denote the number of mutant alleles observed in the cancer samples at nonsynonymous or splice-site mutation site $j$ of a background gene $i$. If gene $i$ has $m_i$ mutation sites, $y_i = \sum_{j=1}^{m_i} c_{i,j}$ denotes the total number of mutant alleles over all of its mutation sites, and $y_i$ follows a negative binomial (NB) distribution:
$$y_i \sim \mathrm{NB}(\mu_i, \theta),$$
where $\mu_i$ is the expected mutation count and $\theta$ is the dispersion parameter of the distribution.
The probability mass function is then
$$f(y_i \mid \mu_i, \theta) = \frac{\Gamma(y_i + \theta)}{\Gamma(\theta)\, y_i!} \cdot \frac{\mu_i^{y_i}\, \theta^{\theta}}{(\mu_i + \theta)^{y_i + \theta}},$$
where $\Gamma(\cdot)$ is the gamma function;
S2. The mutant allele counts of background gene $i$ are fitted with a zero-truncated negative binomial model, whose probability mass function is
$$g(y_i \mid \mu_i, \theta) = \frac{f(y_i \mid \mu_i, \theta)}{1 - f(0 \mid \mu_i, \theta)}, \qquad y_i = 1, 2, \ldots;$$
S3. A generalized linear regression model is constructed:
$$
\begin{aligned}
\eta_i = \log(\mu_i) = \beta_0 &+ \beta_1 \times [x_1,\ \text{number of mutant alleles at synonymous mutation sites}] \\
&+ \beta_2 \times [x_2,\ \text{coding-region length}] \\
&+ \beta_3 \times [x_3,\ \text{constraint score for potential de novo mutations}] \\
&+ \beta_4 \times [x_4,\ \text{cancer cell line expression in the Cancer Cell Line Encyclopedia database}] \\
&+ \beta_5 \times [x_5,\ \text{DNA replication timing in HeLa cells}] \\
&+ \beta_6 \times [x_6,\ \text{HiC long-range chromatin interaction in K562 cells}]
\end{aligned}
$$
The coefficients of the regression equation and the distribution parameter $\theta$ are estimated by maximum likelihood.
The logarithm of the expected number of mutant alleles at nonsynonymous and splice-site mutation sites of gene $i$, $\hat{\eta}_i$, can then be calculated as
$$\hat{\eta}_i = \log(\hat{\mu}_i) = \hat{\beta}_0 + \sum_{k=1}^{6} \hat{\beta}_k x_{k,i},$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_6$ are the estimated coefficients of the regression equation;
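S4. Once the parameters of the regression equation have been determined, the probability that gene $i$ has zero mutations is
$$\hat{p}_{i,0} = f(0 \mid \hat{\mu}_i, \hat{\theta}) = \left( \frac{\hat{\theta}}{\hat{\mu}_i + \hat{\theta}} \right)^{\hat{\theta}}.$$
In the zero-truncated model, the raw residual of gene $i$ is
$$r_i = y_i - \frac{\hat{\mu}_i}{1 - \hat{p}_{i,0}}.$$
The deviance residual of gene $i$ is
$$d_i = \operatorname{sign}(r_i) \sqrt{2 \left[ ll(y_i \mid \mu_i^{*}, \hat{\theta}) - ll(y_i \mid \hat{\mu}_i, \hat{\theta}) \right]},$$
where $\operatorname{sign}(x)$ is the standard sign function and $ll(\mu, \theta)$ is the natural log-likelihood function of the zero-truncated negative binomial distribution:
$$ll(y_i \mid \mu, \theta) = \ln\left[ g(y_i \mid \mu_i, \theta) \right].$$
$\mu_i^{*}$ is the mean corresponding to the observation $y_i$, obtained from
$$y_i = \frac{\mu_i^{*}}{1 - f(0 \mid \mu_i^{*}, \hat{\theta})};$$
S5. The deviance residuals are standardized: $\acute{e}_i = (d_i - \bar{d}) / s_d$, where $\bar{d}$ and $s_d$ are the estimated mean and standard deviation of the deviance residuals. The P value of the standardized deviance residual is calculated from the standard normal distribution:
$$p_i = 1 - \Phi(\acute{e}_i),$$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution;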
S6. The P values of all genes are calculated using steps S1 to S5;
S7. Significant genes whose P values are too small are removed using a threshold;
S8. The P values of the remaining genes are calculated using steps S1 to S5;
S9. Steps S7 and S8 are repeated until no significant genes with too-small P values remain; the model parameters determined at this point are used to calculate the P values of all genes;
Steps S1 to S9 are first used to generate P values for the genes of a reference somatic mutation sample, and genes whose P values are too small are removed. The somatic mutations of the retained genes are then integrated into the small sample, and the method of steps S1 to S9 is applied to estimate the high-frequency somatic mutations in genes and the corresponding P values.
2. The statistical analysis method for detecting potential cancer driver genes according to claim 1, characterized in that: in step S1, a weighting model based on a random forest is built to predict the score $s_{i,j}$ of mutation site $j$ of gene $i$, and $s_{i,j}$ is converted to an integer score $w_{i,j} = \lceil s_{i,j} / 0.1 \rceil$. The integer score serves as the priority weight of the mutation site, so the weighted mutant allele count is
$$\tilde{y}_i = \sum_{j=1}^{m_i} w_{i,j}\, c_{i,j}.$$
The weighted mutant allele count is assumed to follow a negative binomial distribution:
$$\tilde{y}_i \sim \mathrm{NB}(\tilde{\mu}_i, \tilde{\theta}),$$
where $\tilde{\mu}_i$ is the expected mutation count and $\tilde{\theta}$ is the dispersion parameter of the negative binomial distribution.
The subsequent steps are then carried out as before.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810902841.XA CN109346127B (en) | 2018-08-09 | 2018-08-09 | Statistical analysis method for detecting potential cancer driver gene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109346127A true CN109346127A (en) | 2019-02-15 |
CN109346127B CN109346127B (en) | 2021-10-08 |
Family
ID=65296755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810902841.XA Active CN109346127B (en) | 2018-08-09 | 2018-08-09 | Statistical analysis method for detecting potential cancer driver gene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109346127B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910955A (en) * | 2019-10-21 | 2020-03-24 | 中山大学 | Establishment method of longitudinal analysis model of rare variation sites of susceptibility genes |
CN112259163A (en) * | 2020-10-28 | 2021-01-22 | 广西师范大学 | Cancer driving module identification method based on biological network and subcellular localization data |
WO2021042237A1 (en) * | 2019-09-02 | 2021-03-11 | 北京哲源科技有限责任公司 | Method for obtaining intracellular deterministic event, and electronic device |
CN113517021A (en) * | 2021-06-09 | 2021-10-19 | 海南精准医疗科技有限公司 | Cancer driver gene prediction method |
CN117809741A (en) * | 2024-03-01 | 2024-04-02 | 浙江大学 | Method and device for predicting cancer characteristic genes based on molecular evolution selective pressure |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980763A (en) * | 2017-03-30 | 2017-07-25 | 大连理工大学 | A kind of cancer based on gene mutation frequency drives the screening technique of gene |
US20170319692A1 (en) * | 2016-04-22 | 2017-11-09 | The Cleveland Clinic Foundation | Anti-ar agent and radiation therapy for androgen receptor positive cancer |
CN108256291A (en) * | 2016-12-28 | 2018-07-06 | 杭州米天基因科技有限公司 | It is a kind of to generate the method with higher confidence level detection in Gene Mutation result |
Non-Patent Citations (3)
Title |
---|
H SAKAMOTO ET AL: "Disproportionate representation of KRAS gene mutation in atypical adenomatous hyperplasia, but even distribution of EGFR gene mutation from preinvasive to invasive adenocarcinomas", 《JOURNAL OF PATHOLOGY》 * |
MICHAEL S. LAWRENCE ET AL: "Mutational heterogeneity in cancer and the search for new cancer-associated genes", 《NATURE》 * |
付立平 等: "1834 个亲子鉴定案例中 20 个 STR 基因座的突变分析", 《WORLD LATEST MEDICINE INFORMATION (ELECTRONIC VERSION) 》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021042237A1 (en) * | 2019-09-02 | 2021-03-11 | 北京哲源科技有限责任公司 | Method for obtaining intracellular deterministic event, and electronic device |
CN112840402A (en) * | 2019-09-02 | 2021-05-25 | 北京哲源科技有限责任公司 | Method and electronic device for obtaining deterministic events in cells |
CN110910955A (en) * | 2019-10-21 | 2020-03-24 | 中山大学 | Establishment method of longitudinal analysis model of rare variation sites of susceptibility genes |
CN110910955B (en) * | 2019-10-21 | 2024-03-01 | 中山大学 | Method for establishing longitudinal analysis model of rare mutation sites of susceptibility genes |
CN112259163A (en) * | 2020-10-28 | 2021-01-22 | 广西师范大学 | Cancer driving module identification method based on biological network and subcellular localization data |
CN112259163B (en) * | 2020-10-28 | 2022-04-22 | 广西师范大学 | Cancer driving module identification method based on biological network and subcellular localization data |
CN113517021A (en) * | 2021-06-09 | 2021-10-19 | 海南精准医疗科技有限公司 | Cancer driver gene prediction method |
CN113517021B (en) * | 2021-06-09 | 2022-09-06 | 海南精准医疗科技有限公司 | Cancer driver gene prediction method |
CN117809741A (en) * | 2024-03-01 | 2024-04-02 | 浙江大学 | Method and device for predicting cancer characteristic genes based on molecular evolution selective pressure |
Also Published As
Publication number | Publication date |
---|---|
CN109346127B (en) | 2021-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109346127A (en) | A kind of statistical analysis technique driving gene for detecting potential cancer | |
CN107025384A (en) | A construction method for a complex data prediction model | |
CN114686591B (en) | Lung squamous cell carcinoma immunotherapy curative effect prediction model based on gene expression condition, construction method and application thereof | |
CN106033502A (en) | Virus identification method and device | |
EP3794145A1 (en) | Inferring selection in white blood cell matched cell-free dna variants and/or in rna variants | |
CN111440869A (en) | DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof | |
Dash et al. | Performance analysis of clustering techniques over microarray data: A case study | |
Dawany et al. | Asymmetric microarray data produces gene lists highly predictive of research literature on multiple cancer types | |
Karimnezhad et al. | Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach | |
Luo et al. | A new approach for the 10.7-cm solar radio flux forecasting: based on empirical mode decomposition and LSTM | |
Sobhan et al. | Explainable machine learning to identify patient-specific biomarkers for lung cancer | |
Zhuge et al. | Construction of the model for predicting prognosis by key genes regulating EGFR-TKI resistance | |
Tai et al. | Bayice: a Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data | |
Shahweli et al. | In Silico Molecular Classification of Breast and Prostate Cancers using Back Propagation Neural Network | |
Bell-Glenn et al. | A novel framework for the identification of reference dna methylation libraries for reference-based deconvolution of cellular mixtures | |
US11435357B2 (en) | System and method for discovery of gene-environment interactions | |
Jin et al. | Feature selection and classification over the network with missing node observations | |
Tsai et al. | Significance analysis of ROC indices for comparing diagnostic markers: applications to gene microarray data | |
US8793209B2 (en) | Reflecting the quantitative impact of ordinal indicators | |
Li et al. | Using the SVM Method for Lung Adenocarcinoma Prognosis Based on Expression Level | |
Zhang et al. | Network propagation models for gene selection | |
KR20220085139A (en) | Method of gene selection for predicting medical information of patients and uses thereof | |
Zeng et al. | A Novel Prognosis Model based on Comprehensive Analysis of Pyroptosis-Related Genes in Breast Cancer | |
Lakshmi et al. | Design of a state-machine based genomic simulator and development of a system for prediction of Rheumatoid Arthritis (RA) using signal processing techniques | |
Zheng et al. | TsImpute: an accurate two-step imputation method for single-cell RNA-seq data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |