CN110246543B - Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology - Google Patents


Info

Publication number
CN110246543B
Authority
CN
China
Prior art keywords
copy number
number variation
gene
sample
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910541057.5A
Other languages
Chinese (zh)
Other versions
CN110246543A (en)
Inventor
郎继东
王博
杨家亮
田埂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geneis Beijing Co ltd
Original Assignee
Geneis Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geneis Beijing Co ltd
Priority to CN201910541057.5A
Publication of CN110246543A
Application granted
Publication of CN110246543B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and a computer system for detecting copy number variation (CNV) in a single sample based on second-generation sequencing technology. The method detects CNV in a single sample directly from raw sequencing data, without relying on the factors that traditional methods require, such as alignment, sequencing depth, GC content correction and sample pairing. The method therefore simplifies the experimental and analysis steps and reduces cost, while producing analysis results highly consistent with those of traditional methods; by incorporating clinical verification results (such as FISH verification), it also effectively corrects the false positives and false negatives detected by traditional methods.

Description

Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology
Technical Field
The invention relates to gene detection, in particular to a method and a computer system for detecting copy number variation by using a single sample based on a second-generation sequencing technology.
Background
Copy number variation (CNV) is a form of structural variation caused by rearrangement of the genome, and can be classified by size into the microscopic and sub-microscopic levels. Microscopic structural variation mainly refers to chromosome aberrations that can be seen under a microscope, including euploidy or aneuploidy, insertion, deletion, inversion, translocation and the like; structural variation at the sub-microscopic level mainly refers to variation of DNA fragments longer than 1 Kb, including insertion, deletion, duplication, inversion, translocation and the like. Copy number variation is one of the important pathogenic factors of human disease, and current research has found that CNV is related to the pathogenic mechanism of, or susceptibility to, many complex genetic diseases, including tumors, acquired immunodeficiency syndrome, systemic lupus erythematosus, autoimmune inflammatory diseases and the like. Copy number variation detection is therefore clinically necessary: it allows variation of large DNA fragments in the genome to be discovered early, providing a reference basis for the diagnosis and treatment of disease.
Many means and methods for detecting copy number variation currently exist: methods based on the polymerase chain reaction, including multiplex ligation-dependent probe amplification, multiplex amplifiable probe hybridization and the like; methods based on hybridization techniques, including fluorescence in situ hybridization, Giemsa banding and the like; and methods based on chip technology, including single nucleotide polymorphism chips and the like. These methods are complicated to operate, have low resolution and struggle to provide specific information about the variant region; they also have low analysis throughput and are expensive, so their cost performance is poor. With the rapid development of second-generation sequencing technology, sequencing cost has fallen sharply, analysis throughput has risen exponentially, and resolution can reach the Kb level, allowing copy number variation at the sub-microscopic level to be studied in greater depth. At present, algorithms for detecting CNV, such as CNVkit, CNVnator, Control-FREEC and the like, are mostly developed for whole genome sequencing (WGS) data and, for the sake of detection accuracy, generally require a paired sample to detect CNV; for single-sample detection, CNV identification is typically corrected based on sequencing depth and GC content. With the growing demand for targeted sequencing, more specialized algorithms, such as PatternCNV, Ioncopy and the like, have appeared in addition to those above. However, all of these methods require alignment, depend heavily on sequencing depth and GC content, and are constrained by the parameter settings of the alignment and of the analysis algorithm, so the overall process is complicated and the experimental and analysis costs are high.
Disclosure of Invention
In view of this, the present invention provides a method and a computer system for detecting copy number variation in a single sample based on second-generation sequencing technology. The method detects CNV in a single sample directly from raw sequencing data, without relying on the factors that traditional methods require, such as alignment, sequencing depth, GC content correction and sample pairing.
Specifically, the present invention includes the following.
In a first aspect of the present invention, a method for detecting copy number variation using a single sample based on a second generation sequencing technology is provided, which comprises the following steps:
(1) Establishing a first gene sample database and a second gene sample database, wherein the first gene sample database comprises A samples of a gene having copy number variation, the second gene sample database comprises B samples of the corresponding gene without copy number variation, and A and B are each natural numbers of 50 or more. Preferably, the genes in the first gene sample database contain copy number variations, and the genes in the second gene sample database have no corresponding copy number variation at the positions (i.e. regions) where the genes in the first gene sample database carry copy number variations. It should be noted that the genes in the second gene sample database may carry mutations, including copy number variations, outside the copy number variation regions of the first gene sample database. To ensure the reliability and accuracy of the method of the present invention, A and B are each generally required to be natural numbers of 50 or more, preferably 100 or more, more preferably 200 or more, and further preferably 300 or more. The upper limits of A and B are not particularly limited.
(2) Dividing j copy number variation regions, each having a length of Lj bp, with a sliding window of size m bp and a step length of n bp, so that each copy number variation region yields i = Lj/n seed sequences, where i is the quotient if Lj is exactly divisible by n and is the quotient rounded down plus 1 otherwise, giving a matrix of j × i seed sequences in total. In general, j is a natural number of 1 or more, preferably 10 or more, and more preferably 30 or more. m is a natural number of 50 or more, more preferably 80 or more, and still more preferably Lj or less. n is a natural number of 1 or more and Lj or less, preferably 5 or more and Lj or less.
(3) Performing exact sequence matching, with no mismatches tolerated, of the j × i seed sequences against the first gene sample database and the second gene sample database respectively, to obtain a j × i matrix of the numbers of perfectly matched seed sequences in each database.
(4) Normalizing the matrix of perfectly matched seed sequence numbers in each database, i.e. dividing each perfectly matched seed sequence number by the average of all perfectly matched seed sequence numbers of the copy number variation region.
(5) Zero-padding the normalized matrix of perfectly matched seed sequence numbers, i.e. taking as reference the copy number variation region that yields the largest number of seed sequences, and setting to 0 the matrix entries of the remaining regions that fall short of that number.
(6) Performing mathematical modeling on the A + B normalized, zero-padded matrices of perfectly matched seed sequence numbers, building a statistical model from the negative and positive results, and finally obtaining a mathematical model for judging copy number variation as negative or positive.
(7) Repeating steps (2) to (5) on the sample to be judged and predicting copy number variation with the mathematical model obtained in step (6); the sample is judged positive if the predicted value is greater than 0.5, preferably greater than 0.6, and more preferably greater than 0.8, and negative otherwise.
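Steps (2) to (5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy regions and the toy reads are hypothetical, reads are assumed to be plain strings, a seed counts as matched when it occurs as an exact substring of a read, the average in step (4) is taken per region, and windows that run past the end of a region are simply left shorter.

```python
from statistics import mean


def seed_sequences(region, m, n):
    """Step (2): slide an m-bp window along the region with an n-bp step.

    The number of seeds is Lj/n rounded up. Windows that run past the end of
    the region come out shorter (an assumption of this sketch).
    """
    count = -(-len(region) // n)  # ceiling division
    return [region[k * n:k * n + m] for k in range(count)]


def count_exact_matches(seed, reads):
    """Step (3): number of reads that contain the seed with no mismatches."""
    return sum(1 for read in reads if seed in read)


def feature_matrix(regions, reads, m, n):
    """Steps (2) to (5): count, normalize per region, then zero-pad."""
    rows = []
    for region in regions:
        counts = [count_exact_matches(s, reads) for s in seed_sequences(region, m, n)]
        avg = mean(counts)
        # Step (4): divide each count by the region average (0 if nothing matched).
        rows.append([c / avg if avg else 0.0 for c in counts])
    width = max(len(row) for row in rows)
    # Step (5): pad shorter rows with 0 up to the width of the longest row.
    return [row + [0.0] * (width - len(row)) for row in rows]


# Hypothetical toy data: two regions (10 bp and 8 bp) and three reads.
regions = ["ACGTACGTAC", "GGGTTTCC"]
reads = ["TTACGTACGG", "ACGTACGTAC", "GGGTTTCCAA"]
matrix = feature_matrix(regions, reads, m=4, n=3)
for row in matrix:
    print([round(value, 3) for value in row])
```

With m = 4 and n = 3 the first region yields four seeds and the second three, so the second row is padded with a single 0 to give a 2 × 4 matrix.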
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the gene sample data in the first gene sample database and the second gene sample database are derived from data obtained by whole genome sequencing and/or target region capture/amplification sequencing.
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the copy number variation comprises gene copy number amplification and/or deletion.
Preferably, in the method for detecting copy number variation using a single sample based on the second-generation sequencing technology of the present invention, the copy number variation includes euploid or aneuploid chromosome aberrations, chromosome insertions, deletions, inversions and translocations, and DNA fragment insertions, deletions, duplications, inversions or translocations.
Preferably, in the method for detecting copy number variation using a single sample based on the second-generation sequencing technology of the present invention, the length of the DNA fragment is 1Kb or more, preferably 1.5Kb or more. On the other hand, it is preferably 10Kb or less, more preferably 8Kb or less.
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the gene is ERBB2.
Preferably, in the method for detecting copy number variation using a single sample based on the second-generation sequencing technology of the present invention, the data statistical model is established by a logistic regression or deep learning algorithm in step (6).
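As an illustration of the logistic regression option for step (6) and the thresholding of step (7), the following is a minimal pure-Python sketch. It is not the model used in the patent: the function names, the learning rate, the number of epochs and the two-element feature vectors (standing in for flattened, normalized, zero-padded matrices) are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Step (7): positive if the predicted value exceeds the threshold."""
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return p, ("positive" if p > threshold else "negative")


# Hypothetical training data: 1 = CNV positive, 0 = CNV negative.
train_x = [[1.5, 0.5], [1.4, 0.6], [0.5, 1.5], [0.6, 1.4]]
train_y = [1, 1, 0, 0]
weights, bias = train_logistic(train_x, train_y)

value_a, call_a = predict(weights, bias, [1.3, 0.7])
value_b, call_b = predict(weights, bias, [0.7, 1.3])
print(call_a, call_b)
```

The threshold is left as a parameter so that the stricter cut-offs mentioned in step (7), 0.6 or 0.8, can be passed in without changing the model.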
In a second aspect of the invention, there is provided a computer system comprising a processor and configured to perform the method of the first aspect of the invention.
The invention simplifies the experimental and analysis steps and reduces cost, yields analysis results highly consistent with those of traditional methods, and, by incorporating clinical verification results (such as FISH verification), effectively corrects the false positives and false negatives detected by traditional methods.
Drawings
FIG. 1 is an exemplary flow chart of the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention but as a more detailed description of certain aspects, features and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that the upper and lower limits of the range, and each intervening value therebetween, is specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control. Unless otherwise indicated, "%" or "amount" are percentages by weight.
Examples
The analysis method of the present invention was tested on whole-exome data from 5 known ERBB2 amplification-positive samples and 5 known ERBB2 amplification-negative samples. Taking one ERBB2-positive sample, Sample1, as an example (as shown in the flow chart of FIG. 1), the steps are as follows; the same steps were repeated for the other samples:
1. Whole-exome sequencing data were collected from 272 ERBB2 amplification-positive samples and 1029 ERBB2 amplification-negative samples and divided into a training set and a test set; the training set comprises 223 positive samples and 817 negative samples, and the test set comprises 49 positive samples and 212 negative samples.
2. The ERBB2 gene comprises 27 exons. The 27 amplified or deleted regions, of length Lj bp (0 < j < 28), are divided with a sliding window of size 50 bp and a step length of 40 bp, so that each amplified or deleted region yields i = Lj/40 seed sequences, where i is the quotient if Lj is exactly divisible by 40 and is the quotient rounded down plus 1 otherwise, giving a 27 × i seed sequence matrix in total. The lengths Lj are respectively 311bp, 152bp, 214bp, 135bp, 69bp, 116bp, 142bp, 120bp, 127bp, 74bp, 91bp, 200bp, 133bp, 91bp, 161bp, 48bp, 139bp, 123bp, 99bp, 186bp, 156bp, 76bp, 147bp, 98bp, 189bp, 253bp and 974 bp. The corresponding i are 8, 4, 6, 4, 2, 3, 4, 2, 3, 6, 4, 3, 5, 2, 4, 3, 5, 4, 2, 4, 3, 5, 7, 25, respectively.
3. Exact sequence matching, with no mismatches tolerated, of the 27 × i seed sequences was performed against the data of each of the 1040 samples in the training set, giving a 27 × i matrix of perfectly matched seed sequence numbers for each sample.
4. The matrix of perfectly matched seed sequence numbers for each sample was normalized by dividing each perfectly matched seed sequence number by the average of all perfectly matched seed sequence numbers for the amplified or deleted region.
5. The normalized matrix of perfectly matched seed sequence numbers was zero-padded, i.e. taking as reference the amplified or deleted region that yields the largest number of seed sequences, the matrix entries of the remaining regions that fall short of that number were set to 0.
6. Mathematical modeling was performed on the 1040 normalized, zero-padded 27 × 25 matrices of perfectly matched seed sequence numbers. First, 10-fold cross-validation was carried out on the 1040 samples, and a convolutional neural network (CNN) deep learning algorithm was used, together with the negative and positive results, to select hyper-parameters and tune the model. The resulting optimal mathematical model, with a training set AUC of 93.04% and a test set AUC of 94.54%, is used as the model for judging new samples of the same data type. The model parameters are shown in Table 1.
TABLE 1 model parameters
[Table 1 appears as images in the original publication: Figure BDA0002102563590000061 and Figure BDA0002102563590000071]
7. Steps 2 to 5 were repeated for Sample1, and copy number variation was predicted with the optimal mathematical model obtained in step 6; the predicted value was 0.9916596, which is greater than 0.5, so the sample was judged positive for copy number variation.
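The arithmetic behind steps 1, 2 and 6 can be checked with a short Python sketch. The exon lengths, the 40 bp step length and the sample counts are taken from the steps above; the function and variable names are hypothetical, and the seed counts are recomputed from the stated rounding rule rather than copied from the text.

```python
# Exon lengths in bp, as listed in step 2.
EXON_LENGTHS_BP = [
    311, 152, 214, 135, 69, 116, 142, 120, 127, 74, 91, 200, 133, 91,
    161, 48, 139, 123, 99, 186, 156, 76, 147, 98, 189, 253, 974,
]
STEP_BP = 40  # step length from step 2


def seeds_per_region(length_bp, step_bp):
    """Quotient if exactly divisible, otherwise quotient rounded down plus 1."""
    quotient, remainder = divmod(length_bp, step_bp)
    return quotient if remainder == 0 else quotient + 1


seed_counts = [seeds_per_region(length, STEP_BP) for length in EXON_LENGTHS_BP]

# After zero-padding, every row is as wide as the region with the most seeds.
matrix_shape = (len(EXON_LENGTHS_BP), max(seed_counts))

# Sample bookkeeping from step 1.
training_total = 223 + 817
test_total = 49 + 212

print(matrix_shape)    # (27, 25)
print(training_total)  # 1040
print(test_total)      # 261
```

The widest row comes from the 974 bp region (25 seeds), which is why step 6 works with 27 × 25 matrices, and the training total matches the 1040 samples used in steps 3 and 6.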
The predicted values and judgment results of Sample2-Sample10 are shown in Table 2 below.
TABLE 2 summary of predicted results for each sample
Sample ID    Predicted value    Observed value
Sample1      0.9916596          Positive
Sample2      0.9989957          Positive
Sample3      0.9999901          Positive
Sample4      0.9990958          Positive
Sample5      0.99751943         Positive
Sample6      0.012844639        Negative
Sample7      0.006111831        Negative
Sample8      0.003521628        Negative
Sample9      0.008016149        Negative
Sample10     0.002645513        Negative
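The judgment rule can be replayed on the predicted values in Table 2 with a few lines of Python. This is only a sketch of the thresholding: the predicted values are copied from the table, while the dictionary and function names are hypothetical.

```python
# Predicted values copied from Table 2.
PREDICTED_VALUES = {
    "Sample1": 0.9916596,
    "Sample2": 0.9989957,
    "Sample3": 0.9999901,
    "Sample4": 0.9990958,
    "Sample5": 0.99751943,
    "Sample6": 0.012844639,
    "Sample7": 0.006111831,
    "Sample8": 0.003521628,
    "Sample9": 0.008016149,
    "Sample10": 0.002645513,
}


def judge(predicted_value, threshold=0.5):
    """Positive if the predicted value exceeds the threshold, else negative."""
    return "Positive" if predicted_value > threshold else "Negative"


calls = {sample: judge(value) for sample, value in PREDICTED_VALUES.items()}
strict_calls = {sample: judge(value, threshold=0.8) for sample, value in PREDICTED_VALUES.items()}

for sample, call in calls.items():
    print(sample, call)
```

For these ten samples the calls are the same at the 0.5 threshold and at the stricter 0.8 threshold, since every predicted value lies either above 0.99 or below 0.02.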
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.

Claims (9)

1. A method for detecting copy number variation by using a single sample based on a second-generation sequencing technology is characterized by comprising the following steps:
(1) establishing a first gene sample database and a second gene sample database, wherein the first gene sample database comprises A samples of a gene having copy number variation, the second gene sample database comprises B samples of the gene without copy number variation at the corresponding positions, and A and B are each natural numbers of 50 or more;
(2) dividing j copy number variation regions, each having a length of Lj bp, with a sliding window of size m bp and a step length of n bp, so that each copy number variation region yields i = Lj/n seed sequences, where i is the quotient if Lj is exactly divisible by n and is the quotient rounded down plus 1 otherwise, giving a matrix of j × i seed sequences in total;
(3) performing exact sequence matching, with no mismatches tolerated, of the j × i seed sequences against the first gene sample database and the second gene sample database respectively, to obtain a j × i matrix of the numbers of perfectly matched seed sequences in each database;
(4) normalizing the matrix of perfectly matched seed sequence numbers in each database, i.e. dividing each perfectly matched seed sequence number by the average of all perfectly matched seed sequence numbers of the copy number variation region;
(5) zero-padding the normalized matrix of perfectly matched seed sequence numbers, i.e. taking as reference the copy number variation region that yields the largest number of seed sequences, and setting to 0 the matrix entries of the remaining regions that fall short of that number;
(6) performing mathematical modeling on the A + B normalized, zero-padded matrices of perfectly matched seed sequence numbers, building a statistical model from the negative and positive results, and finally obtaining a mathematical model for judging copy number variation as negative or positive;
(7) repeating steps (2) to (5) on the sample to be judged, predicting copy number variation with the mathematical model obtained in step (6), and judging the sample to be positive if the predicted value is greater than 0.5 and negative otherwise.
2. The method of claim 1, wherein j is a natural number greater than 1.
3. The method for detecting copy number variation using a single sample based on second-generation sequencing technology according to claim 2, wherein the gene sample data in the first gene sample database and the second gene sample database are derived from data obtained by whole genome sequencing and/or by target region capture sequencing or target region amplification sequencing.
4. The method of claim 3, wherein the copy number variation comprises gene copy number amplification and/or deletion.
5. The method of claim 4, wherein the copy number variation comprises chromosome euploid or aneuploid aberrations, chromosome insertions, deletions, inversions or translocations, and DNA fragment insertions, deletions, duplications, inversions or translocations.
6. The method of claim 5, wherein the DNA fragment has a length of 1Kb or more.
7. The method for detecting copy number variation using a single sample based on second-generation sequencing technology according to claim 1, wherein the gene is ERBB2.
8. The method for detecting copy number variation using a single sample based on second-generation sequencing technology according to claim 1, wherein the data statistical model is established in step (6) by a logistic regression or deep learning algorithm.
9. A computer system comprising a processor and configured to perform the method of any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910541057.5A CN110246543B (en) 2019-06-21 2019-06-21 Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910541057.5A CN110246543B (en) 2019-06-21 2019-06-21 Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology

Publications (2)

Publication Number Publication Date
CN110246543A CN110246543A (en) 2019-09-17
CN110246543B true CN110246543B (en) 2021-02-26

Family

ID=67888607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910541057.5A Active CN110246543B (en) 2019-06-21 2019-06-21 Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology

Country Status (1)

Country Link
CN (1) CN110246543B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111276189B (en) * 2020-02-26 2020-12-29 广州市金域转化医学研究院有限公司 Chromosome balance translocation detection and analysis system based on NGS and application thereof
CN112634987B (en) * 2020-12-25 2021-07-27 北京吉因加医学检验实验室有限公司 Method and device for detecting copy number variation of single-sample tumor DNA
CN113736865A (en) * 2021-09-09 2021-12-03 元码基因科技(北京)股份有限公司 Kit, reaction system and method for detecting gene copy number variation in sample

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372459A (en) * 2016-08-30 2017-02-01 天津诺禾致源生物信息科技有限公司 Method and device for detecting copy number variation based on amplicon next generation sequencing
CN108073791A (en) * 2017-12-12 2018-05-25 元码基因科技(北京)股份有限公司 Method based on two generation sequencing datas detection target gene structure variation
CN108256289A (en) * 2018-01-17 2018-07-06 湖南大地同年生物科技有限公司 A kind of method based on target area capture sequencing genomes copy number variation
CN108920899A (en) * 2018-06-10 2018-11-30 杭州迈迪科生物科技有限公司 A kind of single exon copy number variation prediction technique based on target area sequencing
CN110808084A (en) * 2019-09-19 2020-02-18 西安电子科技大学 Copy number variation detection method based on single-sample second-generation sequencing data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130184999A1 (en) * 2012-01-05 2013-07-18 Yan Ding Systems and methods for cancer-specific drug targets and biomarkers discovery
CN108304694B (en) * 2018-01-30 2021-08-31 元码基因科技(北京)股份有限公司 Method for analyzing gene mutation based on second-generation sequencing data
CN108427864B (en) * 2018-02-14 2019-01-29 南京世和基因生物技术有限公司 A kind of detection method, device and computer-readable medium copying number variation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
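"Computational tools for copy number variation"; Min Zhao et al.; BMC Bioinformatics; 20130913; full text *
"Genomic variation detection based on high-throughput sequencing data"; 刘永壮; China Doctoral Dissertations Full-text Database (Electronic Journal); 20180115 (No. 1); full text *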

Also Published As

Publication number Publication date
CN110246543A (en) 2019-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant