CN106682455B - A statistical test method for multi-sample consistent copy number variation regions - Google Patents
A statistical test method for multi-sample consistent copy number variation regions
- Publication number
- CN106682455B (application number CN201611040980.3A)
- Authority
- CN
- China
- Prior art keywords
- copy number
- multisample
- variable region
- sample
- cnvs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
The invention discloses a statistical test method for multi-sample consistent copy number variation regions. Correlation coefficients between copy number sites are computed and fitted to a curve, and the derivative of the curve is calculated at each site. Significant derivative values are detected by hypothesis testing, which determines the copy number breakpoints and establishes the candidate copy number variation regions. A null distribution for hypothesis testing is then constructed by randomly permuting CNVs in two directions, across the whole genome and across samples, and the consistent copy number variation regions in the multiple samples are detected. Because the invention avoids using sequencing read counts directly, it tolerates a certain amount of sequencing error and noise and can locate the boundaries of copy number variation regions accurately. Compared with permutation in a single direction, random permutation of CNVs across both the whole genome and the samples yields a more realistic null distribution. The method also helps detect diverse consistently varying CNVs, that is, consistent copy number variation regions present in subsets of the samples.
Description
Technical field
The invention belongs to the field of copy number variation, and in particular relates to a statistical test method for multi-sample consistent copy number variation regions.
Background art
Next-generation sequencing technology provides more comprehensive and richer genome variation data and offers an important platform for understanding the mechanisms of life and of cancer cell development. Copy number variation (CNV) is an important type of variation in the genome and is closely related to the occurrence and development of cancer. Systematic analysis of CNV data from next-generation sequencing platforms therefore provides an important route to discovering cancer genes and studying the molecular mechanisms of cancer cells. The difficulty lies in accurately detecting diverse CNV patterns from high-resolution read data of low sequencing depth. Prior art: researchers in China and abroad have proposed various copy number variation detection schemes, which can be broadly divided into schemes based on a single tumor sample and schemes based on tumor-normal paired samples, such as SegSeq [D.Y.Chiang et al., “High-resolution
mapping of copy-number alterations with massively parallel sequencing,”Nat
Methods,vol.6,no.1,pp.99-103,Jan,2009],EWT[S.T.Yoon et al.,“Sensitive and
accurate detection of copy number variants using read depth of coverage,”
Genome Research,vol.19,no.9,pp.1586-1592,Sep,2009],BIC-seq[R.Xi et al.,“Copy
number variation detection in whole-genome sequencing data using the Bayesian
information criterion,”Proc Natl Acad Sci U S A,vol.108,no.46,pp.E1128-36,Nov
15,2011],CNVnator[A.Abyzov et al.,“CNVnator:an approach to discover,genotype,
and characterize typical and atypical CNVs from family and population genome
sequencing,”Genome Res,vol.21,no.6,pp.974-84,Jun,2011],ReadDepth[C.A.Miller
et al.,“ReadDepth:a parallel R package for detecting copy number alterations
from short sequencing reads,”PLoS One,vol.6,no.1,pp.e16327,2011],Control-
FREEC[V.Boeva et al.,“Control-free calling of copy number alterations in
deep-sequencing data using GC-content normalization,”Bioinformatics,vol.27,
no.2,pp.268-9,Jan 15,2011],CNV-TV[J.Duan et al.,“CNV-TV:a robust method to
discover copy number variation from short sequencing reads,”BMC
Bioinformatics,vol.14,pp.150,2013],CNVeM[Z.Wang et al.,“CNVeM:copy number
variation detection using uncertainty of read mapping,”J Comput Biol,vol.20,
no.3,pp.224-36,Mar,2013],m-HMM[H.Wang et al.,“Copy number variation detection
using next generation sequencing read counts,”Bmc Bioinformatics,vol.15,Apr
14,2014], and others. Most of these methods use the sequencing depth to compute the read count at each genomic site and then predict copy number variation regions from changes in read count across the whole genome or a whole chromosome. Such methods are relatively easy to implement and perform well on data of high sequencing depth. Their weakness is that they depend directly on the read count, which is itself unstable: read counts vary randomly to some degree, and this random variation is often misinterpreted as being caused by copy number variation. For data of low sequencing depth in particular, the ratio of the amplitude of random variation to the amplitude of copy number variation is high, so these methods struggle to detect copy number variation well. In addition, some researchers have proposed multi-sample copy number variation detection methods, such as cnvHiTSeq [E.Bellos et al., “cnvHiTSeq:
integrative models for high-resolution copy number variation detection and
genotyping using population sequencing data,”Genome Biol,vol.13,no.12,
pp.R120,2012],VarScan2+CMDS[D.C.Koboldt et al.,“VarScan 2:somatic mutation
and copy number alteration discovery in cancer by exome sequencing,”Genome
Res,vol.22,no.3,pp.568-76,Mar,2012,Q.Zhang et al.,“CMDS:a population-based
method for identifying recurrent DNA copy number aberrations in cancer from
high-resolution data,”Bioinformatics,vol.26,no.4,pp.464-9,Feb 15,2010],
JointSLM[A.Magi et al.,“Detecting common copy number variants in high-
throughput sequencing data by using JointSLM algorithm,”Nucleic Acids
Research,vol.39,no.10,May,2011],cn.MOPS[G.Klambauer et al.,“cn.MOPS:mixture
of Poissons for discovering copy number variations in next-generation
sequencing data with a low false discovery rate,”Nucleic Acids Res,vol.40,
no.9,pp.e69,May,2012],CBSBR[J.Duan et al.,“Common copy number variation
detection from multiple sequenced samples,”IEEE Trans Biomed Eng,vol.61,no.3,
pp.928-37,Mar,2014],CODEX[Y.Jiang et al.,“CODEX:a normalization and copy
number variation detection method for whole exome sequencing,”Nucleic Acids
Res, vol.43, no.6, pp.e39, Mar 31,2015], and others. Most of these methods detect consistent copy number variation regions from the correlation between copy number variation sites or from differences between samples. Their advantage is that they capture the biological characteristics of copy number structural variation and can therefore distinguish consistent copy number variation regions from random copy number variation. Their disadvantage is that weakly significant consistent copy number variation regions are hard to detect. Moreover, these methods often restrict the number of samples in multi-sample detection, which limits their ability to detect highly consistent copy number variation regions in a given cancer type or across cancer types.
In summary, existing statistical test methods for consistent copy number variation regions rely too heavily on changes in sequencing read counts and struggle to achieve statistically significant detection; they cannot handle large numbers of samples and have high computational complexity, which hinders the detection of consistent copy number variation regions in multiple samples.
Summary of the invention
The purpose of the present invention is to provide a statistical test method for multi-sample consistent copy number variation regions. It is intended to solve the problems that existing statistical test methods for consistent copy number variation regions rely too heavily on changes in sequencing read counts and struggle to achieve statistically significant detection, and that they cannot handle large numbers of samples and have high computational complexity, which hinders the detection of consistent copy number variation regions in multiple samples.
The invention is realized as follows. In the statistical test method for multi-sample consistent copy number variation regions, correlation coefficients between copy number sites are fitted to a curve, from which the derivative value at each site is calculated; significant derivative values are detected by hypothesis testing to determine the copy number breakpoints and establish the candidate copy number variation regions; a null distribution for hypothesis testing is constructed by randomly permuting CNVs in both the whole-genome and sample directions, and the consistent copy number variation regions in the multiple samples are detected.
Further, the sequencing data files must be preprocessed before the correlation coefficient curve is fitted, specifically:
After the sequencing data files have been aligned, the read count at each site is calculated; the read counts are normalized by the sample mean read count to obtain read count signals that are comparable between samples. In the normalization formula, mean_RC_n and mean_RC denote the mean read count of the n-th sample and the mean read count over the multiple samples, respectively, x_nm denotes the read count at the m-th site of the n-th sample, and x'_nm denotes the normalized read count at that site.
Further, equal-length bins are defined, and the per-site read counts of each sample are converted into per-bin read counts; copy number variation is then detected in units of bins.
Further, based on the preprocessed data matrix M, in which each row represents a sample and each column represents a bin, the correlation coefficients between bins are calculated by Pearson correlation analysis and fitted to a curve, from which the derivative value of each bin is obtained.
Further, a null distribution for hypothesis testing is established with the derivative values as background, and the derivative values are tested for significance; a significant derivative value means that there is a breakpoint at the position of that bin, which yields the candidate copy number variation regions.
Further, significant CNVs are detected in the candidate copy number variation regions by an iterative loop, specifically: a null distribution for hypothesis testing is constructed by randomly permuting the candidate CNV regions across the whole genome, and the candidate CNV regions are tested against it; if a significant CNV is found, it is removed from the genome, the null distribution is rebuilt, and the candidate CNV regions are tested again, until no new CNVs are found.
Further, detecting the multi-sample consistent copy number variation regions comprises: randomly permuting CNVs across the whole genome and across samples to construct a permuted data matrix Mt, and calculating the frequency f with which the random CNVs occur in the multiple samples; repeating this process N times, N > 1000, to obtain a distribution of the frequency f, which is the null distribution for hypothesis testing; and testing the CNV frequencies of the data matrix before permutation against it, calculating the p value of each CNV, and determining the CNVs that vary consistently across the samples according to a significance threshold.
Another object of the present invention is to provide a cancer gene to which the statistical test method for multi-sample consistent copy number variation regions is applied.
Another object of the present invention is to provide a cancer cell molecule to which the statistical test method for multi-sample consistent copy number variation regions is applied.
The statistical test method for multi-sample consistent copy number variation regions provided by the invention establishes a computational method grounded in statistical theory to detect consistent copy number variation regions in multiple samples, providing a direct and feasible technical means of discovering potential cancer genes. After normalizing the read counts, the invention derives bins and uses them as the basic units; in the multi-sample space it calculates the correlation coefficients between bins and fits them to a curve, from which the derivative at each bin is calculated. Significance testing of the derivative values detects the copy number breakpoints and thereby yields the candidate CNV regions. CNV regions are then detected in each single sample by an iterative loop: for the candidate CNV regions, a random permutation procedure is used to construct a null distribution, against which the significance of each CNV is tested; CNVs detected as significant are removed and the null distribution is rebuilt, and the loop terminates when no new CNV is detected. The advantage of this procedure is that weakly significant CNVs can be detected. Building on single-sample CNV detection, consistent copy number variation regions are detected in the multi-sample space according to CNV frequency: a statistic is constructed from the frequency with which a CNV occurs in the multiple samples, and the consistent copy number variation regions are detected by a multi-sample permutation test.
Most existing methods rely too heavily on changes in sequencing read counts. Because sequencing technology itself introduces errors and the reads contain strong noise, these methods struggle to achieve statistically significant detection on samples of low sequencing depth. The invention therefore proposes to fit a curve to the correlation coefficients between copy number variation sites and then calculate the derivative value at each genomic site, converting the problem of detecting copy number variation regions into a problem of testing the significance of the derivative values. The method thus does not depend directly on the magnitude of the sequencing read counts and can tolerate a certain amount of sequencing error and noise.
Existing multi-sample copy number variation detection methods place restrictions on the number or characteristics of the samples. For example, CBSBR requires that the number of samples not be too large (the algorithm defaults to 6 samples) and has high computational complexity; cn.MOPS requires clear differences between samples, which hinders the detection of consistent copy number variation regions in multiple samples. The invention therefore establishes a new statistical test model that detects diverse copy number variation patterns by an iterative removal process, places no limit on the number of samples, and has controllable computational complexity; Table 1 compares the methods.
Table 1. Comparison of the computational complexity of four methods
Method | DCC | CBSBR | FREEC | cn.MOPS |
---|---|---|---|---|
Running time | 22s | 1721s | 50s | 38s |
Time complexity | O(mn) | O(mnk) | O(n) | O(mn) |
Space complexity | O(mn) | O(m²n²) | O(n) | O(mn) |
Software platform | C++ | MATLAB | C++ | R |
Here DCC is the method of the invention; the table reports results for detection on a genome 5 Gb in length.
The invention calculates derivative values from the fitted correlation coefficient curve to test for copy number breakpoints and thereby determine the candidate copy number variation regions. On the one hand, this avoids using sequencing read counts directly and so tolerates a certain amount of sequencing error and noise; on the other hand, it locates the boundaries of copy number variation regions accurately. CNVs are randomly permuted in both the whole-genome and sample directions; compared with permutation in a single direction, this strategy yields a more realistic null distribution for hypothesis testing. It also helps detect diverse consistently varying CNVs, that is, consistent copy number variation regions present in subsets of the samples.
Brief description of the drawings
Fig. 1 is a flow chart of the statistical test method for multi-sample consistent copy number variation regions provided by an embodiment of the present invention.
Fig. 2 is a schematic comparison of the performance of the present invention (DCC) and the cn.MOPS method, provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The application principle of the invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the statistical test method for multi-sample consistent copy number variation regions provided by an embodiment of the present invention comprises the following steps:
S101: after the sequencing data files (i.e. Fastq files) have been aligned, calculate the read count at each site; normalize the read counts by the sample mean read count to obtain read count signals that are comparable between samples;
S102: based on the preprocessed data matrix M (in which each row represents a sample and each column represents a bin), calculate the correlation coefficients between bins by Pearson correlation analysis, construct a fitted curve of the correlation coefficients, and obtain the derivative value of each bin from it; establish a null distribution for hypothesis testing with the derivative values as background and test the derivative values for significance, where a significant value means that there is a breakpoint at the position of that bin, to obtain the candidate copy number variation regions;
S103: based on the CNVs defined in single samples, construct a null distribution for hypothesis testing by a multi-sample permutation strategy; test the CNV frequencies of the data matrix before permutation against it, calculate the p value of each CNV, and determine the CNVs that vary consistently across the samples according to a significance threshold.
The application principle of the invention is further described below with reference to specific embodiments.
(1) Data preprocessing
After the sequencing data files (i.e. Fastq files) have been aligned, the read count at each site is calculated; the read counts are normalized by the sample mean read count to obtain read count signals that are comparable between samples, as shown in formula (1).
Here mean_RC_n and mean_RC denote the mean read count of the n-th sample and the mean read count over the multiple samples, respectively, x_nm denotes the read count at the m-th site of the n-th sample, and x'_nm denotes the normalized read count at that site.
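Formula (1) itself is not reproduced in this text. The sketch below shows one normalization that is consistent with the symbol definitions, under the assumption that each read count is rescaled by the ratio mean_RC / mean_RC_n; the function name `normalize_read_counts` and the choice of taking mean_RC as the mean of the per-sample means are illustrative assumptions, not the patent's stated formula.

```python
import numpy as np

def normalize_read_counts(counts):
    """Rescale each sample so that all samples share the same mean read count.

    counts: array of shape (n_samples, n_sites) holding raw read counts x_nm.
    Assumed form (not given in the text): x'_nm = x_nm * mean_RC / mean_RC_n.
    Samples whose mean read count is zero are not handled here.
    """
    counts = np.asarray(counts, dtype=float)
    sample_means = counts.mean(axis=1, keepdims=True)  # mean_RC_n, one per sample
    global_mean = sample_means.mean()                  # mean_RC (assumed: mean of sample means)
    return counts * global_mean / sample_means

raw = np.array([[1, 2, 3],
                [2, 4, 6]])
normalized = normalize_read_counts(raw)
```

After this rescaling every sample has the same mean read count, so differences in overall sequencing depth no longer dominate comparisons between samples.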
On the basis of the normalized data, in order to reduce the data dimension and to reduce the differences between sites caused by random factors, the invention defines equal-length bins and converts the per-site read counts of each sample into per-bin read counts. Copy number variation is then detected in units of bins.
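The binning step can be sketched as follows. The text does not say whether a bin's value is the sum or the mean of its site counts, nor how a trailing partial bin is handled; this sketch assumes the sum and drops any incomplete final bin, and the function name `bin_read_counts` is hypothetical.

```python
import numpy as np

def bin_read_counts(site_counts, bin_size):
    """Aggregate per-site read counts into equal-length bins.

    site_counts: array of shape (n_samples, n_sites).
    Returns an array of shape (n_samples, n_sites // bin_size).
    Assumption: the bin value is the sum of its sites; leftover sites
    that do not fill a whole bin are discarded.
    """
    site_counts = np.asarray(site_counts, dtype=float)
    n_samples, n_sites = site_counts.shape
    n_bins = n_sites // bin_size
    trimmed = site_counts[:, :n_bins * bin_size]
    return trimmed.reshape(n_samples, n_bins, bin_size).sum(axis=2)

M = bin_read_counts(np.arange(10).reshape(1, 10), bin_size=3)
```

The resulting matrix has one row per sample and one column per bin, which is the layout the text describes for the preprocessed data matrix M.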
(2) Derivative value testing and copy number variation detection in single samples
Based on the preprocessed data matrix M (in which each row represents a sample and each column represents a bin), the correlation coefficients between bins are calculated by Pearson correlation analysis and a fitted curve of the correlation coefficients is constructed, from which the derivative value of each bin is obtained.
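The text does not specify which pairs of bins are correlated or how the curve is fitted. As a rough sketch only, the code below assumes that each bin is correlated with its right-hand neighbour across samples, that the "fitted curve" is a moving-average smoothing of those coefficients, and that the derivative is a finite difference; `adjacent_bin_correlation`, `correlation_derivative`, and the `window` parameter are all hypothetical.

```python
import numpy as np

def adjacent_bin_correlation(M):
    """Pearson correlation, across samples, between each bin and the next bin.

    M: array of shape (n_samples, n_bins). Returns n_bins - 1 coefficients.
    A bin that is constant across samples has undefined correlation; it is set to 0.
    """
    M = np.asarray(M, dtype=float)
    left, right = M[:, :-1], M[:, 1:]
    lc = left - left.mean(axis=0)
    rc = right - right.mean(axis=0)
    denom = np.sqrt((lc ** 2).sum(axis=0) * (rc ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (lc * rc).sum(axis=0) / denom
    return np.nan_to_num(r)

def correlation_derivative(r, window=3):
    """Smooth the coefficients with a moving average (a stand-in for curve
    fitting) and return the finite-difference derivative at each position.
    Requires len(r) >= window."""
    kernel = np.ones(window) / window
    smooth = np.convolve(r, kernel, mode="same")
    return np.gradient(smooth)

M = np.array([[1, 2, 4, 1, 5],
              [2, 4, 3, 3, 5],
              [3, 6, 2, 2, 5],
              [4, 8, 1, 4, 5]])
r = adjacent_bin_correlation(M)
d = correlation_derivative(r)
```

A sharp change in the correlation curve, and hence a large derivative in absolute value, is what the next step tests as a possible breakpoint.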
A null distribution for hypothesis testing is established with the derivative values as background, and the derivative values are tested for significance; a significant value means that there is a breakpoint at the position of that bin, which yields the candidate copy number variation regions. This approach makes full use of the intrinsic correlation between copy number variation sites: sites within the same copy number variation region have correlation coefficients of similar level, so testing the derivative values finds the positions where the correlation coefficient changes abruptly and yields candidate copy number variation regions of unequal length.
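How the background null distribution is built from the derivative values is not detailed in the text. The following sketch assumes a simple normal approximation to the background (mean and standard deviation of all derivative values) with a two-sided p value, and then cuts the bin axis at every significant position; both function names and the `alpha` default are illustrative assumptions.

```python
import math
import numpy as np

def breakpoint_pvalues(deriv):
    """Two-sided p value of each derivative value against a normal background
    estimated from all derivative values (an assumed null model)."""
    deriv = np.asarray(deriv, dtype=float)
    mu, sigma = deriv.mean(), deriv.std()
    if sigma == 0:
        return np.ones_like(deriv)
    z = np.abs(deriv - mu) / sigma
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])

def candidate_regions(pvals, alpha=0.01):
    """Split the bin axis at significant positions (breakpoints).

    Returns half-open (start, end) bin intervals of unequal length."""
    breaks = [i for i, p in enumerate(pvals) if p < alpha]
    edges = [0] + breaks + [len(pvals)]
    return [(s, e) for s, e in zip(edges[:-1], edges[1:]) if e > s]

deriv = np.zeros(200)
deriv[50], deriv[120] = 5.0, -5.0
regions = candidate_regions(breakpoint_pvalues(deriv))
```

With two isolated spikes in the derivative, the bin axis is cut into three candidate regions whose boundaries sit at the spike positions.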
For the candidate copy number variation regions, significant CNVs are detected by an iterative loop, as follows: a null distribution for hypothesis testing is constructed by randomly permuting the candidate CNV regions across the whole genome, and the candidate CNV regions are tested against it; if a significant CNV is found, it is removed from the genome, the null distribution is rebuilt, and the candidate CNV regions are tested again, until no new CNVs are found.
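The loop above can be sketched for a single sample as follows. This is a minimal illustration under several assumptions not stated in the text: the test statistic is the absolute deviation of a region's mean signal from the background median, the permutation draws the same number of bins at random from the remaining background, and the names `iterative_cnv_test`, `alpha`, and `n_perm` are hypothetical.

```python
import numpy as np

def iterative_cnv_test(signal, regions, alpha=0.05, n_perm=1000, seed=0):
    """Iteratively detect significant regions, removing each detected region
    from the background before rebuilding the null distribution.

    signal: 1-D per-bin signal of one sample.
    regions: list of half-open (start, end) candidate intervals.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    background = np.ones(signal.size, dtype=bool)  # bins still in the null pool
    significant, remaining = [], list(regions)
    while remaining:
        pool = signal[background]
        if pool.size == 0:
            break
        center = np.median(pool)
        newly = []
        for s, e in remaining:
            length = e - s
            if length > pool.size:
                continue  # cannot draw a same-sized random region
            obs = abs(signal[s:e].mean() - center)
            null = np.array([
                abs(rng.choice(pool, size=length, replace=False).mean() - center)
                for _ in range(n_perm)
            ])
            p = (1 + np.sum(null >= obs)) / (n_perm + 1)  # empirical p value
            if p < alpha:
                newly.append((s, e))
        if not newly:
            break  # no new CNVs: stop the loop
        for s, e in newly:
            background[s:e] = False
        significant.extend(newly)
        remaining = [r for r in remaining if r not in newly]
    return significant

rng = np.random.default_rng(1)
signal = rng.normal(0.0, 0.1, 300)
signal[100:120] += 3.0
found = iterative_cnv_test(signal, [(0, 100), (100, 120), (120, 300)], n_perm=200)
```

Removing strong CNVs from the background tightens the null distribution in later rounds, which is how the text explains the ability to pick up weakly significant CNVs.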
(3) Detection of multi-sample consistent copy number variation regions
Based on the CNVs defined in single samples, a null distribution for hypothesis testing is constructed by a multi-sample permutation strategy: CNVs are randomly permuted across the whole genome and across samples to construct a permuted data matrix Mt, from which the frequency f with which the random CNVs occur in the multiple samples is calculated; this process is repeated N times (N > 1000) to obtain a distribution of the frequency f, which is the null distribution for hypothesis testing. The CNV frequencies of the data matrix before permutation are tested against it, the p value of each CNV is calculated, and the CNVs that vary consistently across the samples are determined according to a significance threshold.
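The multi-sample permutation test can be illustrated with a simplified sketch. It assumes the single-sample calls are encoded as a binary matrix (1 where a bin lies in a CNV), shuffles individual cells over the whole matrix as a stand-in for permuting CNV units in both the genome and sample directions, and pools the per-bin frequencies of all permutations into the null distribution; these simplifications and the name `recurrent_cnv_pvalues` are assumptions rather than the patent's exact procedure.

```python
import numpy as np

def recurrent_cnv_pvalues(C, n_perm=1001, seed=0):
    """Empirical p value of each bin's CNV frequency across samples.

    C: binary array of shape (n_samples, n_bins).
    Each permutation shuffles all cells, giving a permuted matrix Mt, and the
    per-bin frequencies f of Mt are added to the pooled null distribution.
    """
    rng = np.random.default_rng(seed)
    C = np.asarray(C, dtype=int)
    n_samples, n_bins = C.shape
    f_obs = C.mean(axis=0)                       # observed frequency per bin
    exceed = np.zeros(n_bins)
    for _ in range(n_perm):
        Mt = rng.permutation(C.ravel()).reshape(C.shape)
        f_null = np.sort(Mt.mean(axis=0))
        # count null frequencies that are >= each observed frequency
        exceed += n_bins - np.searchsorted(f_null, f_obs, side="left")
    return (1 + exceed) / (1 + n_perm * n_bins)

C = np.zeros((20, 50), dtype=int)
C[:, 10] = 1        # a CNV shared by every sample
C[0, 3] = 1         # sporadic calls in single samples
C[5, 30] = 1
p = recurrent_cnv_pvalues(C, n_perm=200)
```

Bins whose p value falls below the chosen significance threshold would be reported as consistently varying; a permutation that keeps whole CNV segments intact would be closer to the description in the text.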
Performance comparison. Fig. 2 compares the performance of the present invention (DCC) with that of the cn.MOPS method; the experiment tests CNV detection performance on sequenced DNA at different tumor purities. Fig. 2 shows that the method of the invention performs comparatively well.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (7)
1. A statistical test method for multi-sample consistent copy number variation regions, characterized in that correlation coefficients between copy number sites are fitted to a curve and the derivative value at each site is calculated; significant derivative values are detected by hypothesis testing to determine the copy number breakpoints and establish the candidate copy number variation regions; a null distribution for hypothesis testing is constructed by randomly permuting copy number variations (CNVs) in both the whole-genome and sample directions, and the consistent copy number variation regions in the multiple samples are detected;
the statistical test method for multi-sample consistent copy number variation regions specifically comprises:
(1) after the sequencing data files have been aligned, calculating the read count at each site; normalizing the read counts by the sample mean read count to obtain read count signals that are comparable between samples;
(2) based on the preprocessed data matrix M, in which each row represents a sample and each column represents a bin, calculating the correlation coefficients between bins by Pearson correlation analysis, constructing a fitted curve of the correlation coefficients, and obtaining the derivative value of each bin from it; establishing a null distribution for hypothesis testing with the derivative values as background and testing the derivative values for significance, where a significant value means that there is a breakpoint at the position of that bin, to obtain the candidate copy number variation regions;
(3) based on the CNVs defined in single samples, constructing a null distribution for hypothesis testing by a multi-sample permutation strategy; testing the CNV frequencies of the data matrix before permutation against it, calculating the p value of each CNV, and determining the CNVs that vary consistently across the samples according to a significance threshold.
2. The statistical test method for multi-sample consistent copy number variation regions according to claim 1, characterized in that the sequencing data files are preprocessed before the correlation coefficient curve is fitted, specifically:
after the sequencing data files have been aligned, the read count at each site is calculated; the read counts are normalized by the sample mean read count to obtain read count signals that are comparable between samples. In the normalization formula, mean_RC_n and mean_RC denote the mean read count of the n-th sample and the mean read count over the multiple samples, respectively, x_nm denotes the read count at the m-th site of the n-th sample, and x'_nm denotes the normalized read count at that site.
3. The statistical test method for multi-sample consistent copy number variation regions according to claim 2, characterized in that equal-length bins are defined, the per-site read counts of each sample are converted into per-bin read counts, and copy number variation is detected in units of bins.
4. The statistical test method for multi-sample consistent copy number variation regions according to claim 1, characterized in that significant CNVs are detected in the candidate copy number variation regions by an iterative loop, specifically: a null distribution for hypothesis testing is constructed by randomly permuting the candidate CNV regions across the whole genome, and the candidate CNV regions are tested against it; if a significant CNV is found, it is removed from the genome, the null distribution is rebuilt, and the candidate CNV regions are tested again, until no new CNVs are found.
5. The statistical test method for multi-sample consistent copy number variation regions according to claim 1, characterized in that detecting the multi-sample consistent copy number variation regions comprises: randomly permuting CNVs across the whole genome and across samples to construct a permuted data matrix Mt, and calculating the frequency f with which the random CNVs occur in the multiple samples; repeating this process N times, N > 1000, to obtain a distribution of the frequency f, which is the null distribution for hypothesis testing; and testing the CNV frequencies of the data matrix before permutation against it, calculating the p value of each CNV, and determining the CNVs that vary consistently across the samples according to a significance threshold.
6. A cancer gene to which the statistical test method for multi-sample consistent copy number variation regions according to any one of claims 1 to 5 is applied.
7. A cancer cell molecule to which the statistical test method for multi-sample consistent copy number variation regions according to any one of claims 1 to 5 is applied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611040980.3A CN106682455B (en) | 2016-11-24 | 2016-11-24 | A statistical test method for multi-sample consistent copy number variation regions |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611040980.3A CN106682455B (en) | 2016-11-24 | 2016-11-24 | A statistical test method for multi-sample consistent copy number variation regions |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106682455A (en) | 2017-05-17 |
CN106682455B (en) | 2019-03-26 |
Family
ID=58866051
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611040980.3A Active CN106682455B (en) | 2016-11-24 | 2016-11-24 | A statistical test method for multi-sample consistent copy number variation regions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106682455B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107967410B (en) * | 2017-11-27 | 2021-07-30 | 电子科技大学 | Fusion method for gene expression and methylation data |
CN111508559B (en) * | 2020-04-21 | 2021-08-13 | 北京橡鑫生物科技有限公司 | Method and device for detecting target area CNV |
CN112767999A (en) * | 2021-01-05 | 2021-05-07 | 中国科学院上海药物研究所 | Analysis method and device for whole genome sequencing data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778350A (en) * | 2014-01-09 | 2014-05-07 | 西安电子科技大学 | Somatic copy number alteration obviousness detection method based on two-dimension statistic model |
CN105760712A (en) * | 2016-03-01 | 2016-07-13 | 西安电子科技大学 | Copy number variation detection method based on next generation sequencing |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778350A (en) * | 2014-01-09 | 2014-05-07 | 西安电子科技大学 | Somatic copy number alteration obviousness detection method based on two-dimension statistic model |
CN105760712A (en) * | 2016-03-01 | 2016-07-13 | 西安电子科技大学 | Copy number variation detection method based on next generation sequencing |
Non-Patent Citations (4)
Title |
---|
CNV-TV: A robust method to discover copy number variation from short sequencing reads; Junbo Duan et al.; BMC Bioinformatics; 2013-12-31; Vol. 14, No. 150; pp. 1-12 |
Common Copy Number Variation Detection From Multiple Sequenced Samples; Junbo Duan et al.; IEEE Trans Biomed Eng; 2014-03-31; Vol. 61, No. 3; pp. 928-937 |
Copy number variation detection using next generation sequencing read counts; Heng Wang et al.; BMC Bioinformatics; 2014-12-31; Vol. 15, No. 109; pp. 1-14 |
Research and design of copy number variation detection algorithms for next-generation sequencing; Li Yan (李燕) et al.; Chinese Journal of Bioinformatics (生物信息学); 2015-09-30; Vol. 13, No. 3; pp. 186-191 |
Also Published As
Publication number | Publication date |
---|---|
CN106682455A (en) | 2017-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tarabichi et al. | A practical guide to cancer subclonal reconstruction from DNA sequencing | |
Agrawal et al. | Large-scale analysis of disease pathways in the human interactome | |
Ay et al. | Analysis methods for studying the 3D architecture of the genome | |
JP6240210B2 (en) | Accurate and rapid mapping of target sequencing leads | |
Hong et al. | Inferring the origin of metastases from cancer phylogenies | |
WO2020035446A1 (en) | Systems and methods for using neural networks for germline and somatic variant calling | |
Liu et al. | Quantitative assessment of cell population diversity in single-cell landscapes | |
CN106682455B (en) | A kind of Statistical Identifying Method of multisample copy number consistency variable region | |
Halperin et al. | A method to reduce ancestry related germline false positives in tumor only somatic variant calling | |
Zhang et al. | Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data | |
Sefer | A comparison of topologically associating domain callers over mammals at high resolution | |
Park et al. | i6mA-DNC: Prediction of DNA N6-Methyladenosine sites in rice genome based on dinucleotide representation using deep learning | |
Rackham et al. | A Bayesian approach for analysis of whole-genome bisulfite sequencing data identifies disease-associated changes in DNA methylation | |
Gilmore et al. | ACE: A workbench using evolutionary genetic algorithms for analyzing association in TCGA | |
Wyllie et al. | M. tuberculosis microvariation is common and is associated with transmission: analysis of three years prospective universal sequencing in England | |
US20210324465A1 (en) | Systems and methods for analyzing and aggregating open chromatin signatures at single cell resolution | |
WO2017201400A1 (en) | Determination of cell types in mixtures using targeted bisulfite sequencing | |
Li et al. | SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples | |
Wu et al. | Computational Systems Biology | |
Hu et al. | Processing UMI Datasets at High Accuracy and Efficiency with the Sentieon ctDNA Analysis Pipeline | |
CN116825182B (en) | Method for screening bacterial drug resistance characteristics based on genome ORFs and application | |
Lauria | Rank-based miRNA signatures for early cancer detection | |
Aljouie et al. | Cross-validation and cross-study validation of chronic lymphocytic leukaemia with exome sequences and machine learning | |
Haque et al. | Detection of copy number variations from NGS data by using an adaptive kernel density estimation-based outlier factor | |
Shi et al. | Ultra-rapid metagenotyping of the human gut microbiome |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||