CN110246543B - Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology
- Publication number
- CN110246543B (application CN201910541057.5A)
- Authority
- CN
- China
- Prior art keywords
- copy number
- number variation
- gene
- sample
- database
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method and a computer system for detecting copy number variation (CNV) in a single sample based on second-generation sequencing technology. The method detects CNV in a single sample directly from raw sequencing data, without relying on the elements that traditional methods require, such as alignment, sequencing depth, GC-content correction, and sample pairing. The method therefore simplifies the experimental and analysis steps and reduces cost, while its results remain highly consistent with those of traditional methods; by incorporating clinical verification results (such as FISH verification), it also effectively corrects false positives and false negatives produced by traditional methods.
Description
Technical Field
The invention relates to gene detection, in particular to a method and a computer system for detecting copy number variation by using a single sample based on a second-generation sequencing technology.
Background
Copy number variation (CNV) is a type of structural variation caused by genomic rearrangement. By size, structural variation can be classified into microscopic and submicroscopic levels. Microscopic structural variation mainly refers to chromosomal aberrations visible under a microscope, including euploidy or aneuploidy, insertion, deletion, inversion, and translocation; submicroscopic structural variation mainly refers to variation of DNA fragments longer than 1 Kb, including insertion, deletion, duplication, inversion, and translocation. Copy number variation is an important pathogenic factor in human disease. Current research finds that CNV is associated with the pathogenesis of, or susceptibility to, many complex genetic diseases, including tumors, acquired immunodeficiency syndrome, systemic lupus erythematosus, and autoimmune inflammatory diseases. Clinically, copy number variation detection is therefore necessary: it allows variation in large DNA segments of the genome to be discovered early and provides a reference for diagnosis and treatment.
At present, many methods exist for detecting copy number variation: methods based on the polymerase chain reaction, including multiplex ligation-dependent probe amplification (MLPA) and multiplex amplifiable probe hybridization (MAPH); methods based on hybridization, including fluorescence in situ hybridization (FISH) and Giemsa banding; and chip-based methods, including single nucleotide polymorphism (SNP) arrays. These methods are complicated to operate, low in resolution, and poor at providing specific information about the variant region; they are also low in throughput and expensive, so their cost-effectiveness is poor. With the rapid development of second-generation sequencing, sequencing cost has fallen sharply, throughput has risen exponentially, and resolution can reach the Kb level, enabling deeper study of submicroscopic copy number variation. Current CNV detection algorithms, such as CNVkit, CNVnator, and Control-FREEC, were mostly developed for whole genome sequencing (WGS) data, and for accuracy they generally require a paired sample; for single-sample detection, CNV calls are typically corrected based on sequencing depth and GC content. With growing demand for targeted sequencing, more specialized algorithms such as PatternCNV and ioncopy have appeared in addition to those above. However, all of these methods require alignment, depend heavily on sequencing depth and GC content, and are constrained by the settings of the alignment parameters and the analysis-algorithm parameters, so the overall workflow is cumbersome and the experimental and analysis costs are high.
Disclosure of Invention
In view of this, the present invention provides a method and a computer system for single-sample copy number variation detection based on second-generation sequencing technology. The method detects CNV in a single sample directly from raw sequencing data, without relying on the elements that traditional methods require, such as alignment, sequencing depth, GC-content correction, and sample pairing.
Specifically, the present invention includes the following.
In a first aspect of the present invention, a method for detecting copy number variation using a single sample based on a second generation sequencing technology is provided, which comprises the following steps:
(1) establishing a first gene sample database and a second gene sample database, wherein the first gene sample database comprises A cases in which the gene carries a copy number variation, the second gene sample database comprises B cases in which the corresponding gene carries no copy number variation, and A and B are each a natural number of 50 or more. Preferably, the genes in the first gene sample database carry copy number variations, and the genes in the second gene sample database have no corresponding copy number variation at the positions (i.e. regions) where the genes in the first gene sample database are varied. Note that genes in the second gene sample database may carry variations, including copy number variations, in regions other than the copy number variation regions of the first gene sample database. To ensure the reliability and accuracy of the method of the present invention, A and B each generally need to be a natural number of 50 or more, preferably 100 or more, more preferably 200 or more, and further preferably 300 or more. The upper limits of A and B are not particularly limited.
(2) dividing j copy number variation regions, each of length $L_j$ bp, with a sliding window of size m bp and a step of n bp, so that each copy number variation region yields $i = L_j/n$ seed sequences, where i is the quotient if $L_j/n$ is an integer, and otherwise i is $L_j/n$ rounded down plus 1, giving a matrix of j × i seed sequences in total. In general, j is a natural number of 1 or more, preferably 10 or more, and more preferably 30 or more. m is a natural number of 50 or more, more preferably 80 or more, and still more preferably m is L or less. n is a natural number of 1 or more and L or less, preferably 5 or more and L or less.
(3) performing exact (zero-mismatch) whole-sequence matching of the j × i seed sequences against the first gene sample database and the second gene sample database, respectively, to obtain a j × i matrix of exactly matched seed sequence counts in each database.
(4) normalizing the matrix of exactly matched seed sequence counts in each database, that is, dividing each count by the mean of all exactly matched seed sequence counts in the same copy number variation region.
(5) zero-padding the normalized matrix of exactly matched seed sequence counts: the largest number of seed sequences obtained from any copy number variation region is taken as the reference, and the matrix entries of regions that have fewer seed sequences than this number are set to 0.
(6) performing mathematical modeling on the A + B normalized, zero-padded matrices of exactly matched seed sequence counts, building a statistical model from the known negative and positive results, and finally obtaining a mathematical model for judging copy number variation as negative or positive.
(7) repeating steps (2) to (5) on the sample to be judged and predicting copy number variation with the mathematical model obtained in step (6); the sample is judged positive if the predicted value is greater than 0.5, preferably greater than 0.6, and more preferably greater than 0.8, and negative otherwise.
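Steps (2) to (5) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, reads are plain strings held in memory, a window that runs past the end of a region is assumed to be truncated (which gives the seed count described in step (2)), and a region whose counts are all zero is assumed to normalize to zeros.

```python
def seed_sequences(region, m, n):
    """Step (2): slide an m-bp window along the region in n-bp steps.

    Assumption: a window running past the region end is truncated, so a
    region of length L yields L/n seeds, rounded up.
    """
    return [region[start:start + m] for start in range(0, len(region), n)]


def count_exact_matches(seed, reads):
    """Step (3): number of reads that contain the seed with zero mismatches."""
    return sum(1 for read in reads if seed in read)


def seed_count_matrix(regions, reads, m, n):
    """Steps (3) to (5): count, normalize per region, then zero-pad."""
    rows = []
    for region in regions:
        counts = [count_exact_matches(s, reads)
                  for s in seed_sequences(region, m, n)]
        mean = sum(counts) / len(counts)
        # Step (4): divide each count by the mean count of its region.
        # Assumption: an all-zero region stays all zero.
        rows.append([c / mean if mean else 0.0 for c in counts])
    width = max(len(row) for row in rows)
    # Step (5): pad shorter rows with 0 up to the longest row.
    return [row + [0.0] * (width - len(row)) for row in rows]


# Toy input, invented for illustration only.
toy_regions = ["ACGTACGTAC", "TTGCA"]
toy_reads = ["ACGTACGT", "TTGCATT", "GGGTACGG"]
toy_matrix = seed_count_matrix(toy_regions, toy_reads, m=4, n=3)
for row in toy_matrix:
    print(row)
```

Because matching is done directly against the reads, no alignment to a reference genome is involved, which is the point the method emphasizes. A production version would need an indexed search rather than the linear scan used here.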
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the gene sample data in the first gene sample database and the second gene sample database are derived from data obtained by whole genome sequencing and/or target region capture/amplification sequencing.
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the copy number variation comprises gene copy number amplification and/or deletion.
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the copy number variation includes euploid or aneuploid chromosomal aberrations, chromosomal insertions, deletions, inversions, and translocations, and insertions, deletions, duplications, inversions, or translocations of DNA fragments.
Preferably, in the method for detecting copy number variation using a single sample based on the second-generation sequencing technology of the present invention, the length of the DNA fragment is 1Kb or more, preferably 1.5Kb or more. On the other hand, it is preferably 10Kb or less, more preferably 8Kb or less.
Preferably, in the method for detecting copy number variation by using a single sample based on the second generation sequencing technology, the gene is ERBB2.
Preferably, in the method for detecting copy number variation using a single sample based on the second-generation sequencing technology of the present invention, the data statistical model is established by a logistic regression or deep learning algorithm in step (6).
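As a rough illustration of the logistic regression option, the sketch below fits a logistic model by batch gradient descent in plain Python. Everything in it is an assumption made for illustration: the function names, the learning rate, the number of epochs, and above all the four toy feature vectors, which are invented and are not real normalized seed counts. In practice each sample's zero-padded matrix would be flattened into one feature vector.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Predicted probability that feature vector x is CNV positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss (hypothetical settings)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy vectors: label 1 = positive, 0 = negative.
X = [[1.6, 1.5, 1.7], [1.4, 1.8, 1.5], [0.9, 1.0, 1.1], [1.0, 0.8, 1.0]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)

for features, label in zip(X, y):
    print(label, round(predict(w, b, features), 3))
```

The predicted value plays the role of the score compared against the 0.5 threshold in step (7). A deep learning model, the other option named above, would replace `train_logistic` and `predict` while keeping the same input and output.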
In a second aspect of the invention, there is provided a computer system comprising a processor and configured to perform the method of the first aspect of the invention.
The invention not only simplifies the experimental and analysis steps and reduces cost, but also yields results highly consistent with those of traditional methods, and, by incorporating clinical verification results (such as FISH verification), effectively corrects false positives and false negatives produced by traditional methods.
Drawings
FIG. 1 is an exemplary flow chart of the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention, but rather as a more detailed description of certain aspects, features, and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that the upper and lower limits of the range, and each intervening value therebetween, is specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control. Unless otherwise indicated, "%" or "amount" are percentages by weight.
Examples
The analysis method of the present invention was tested on whole exome data from 5 samples known to be ERBB2 amplification positive and 5 samples known to be ERBB2 amplification negative. The procedure is illustrated below with one ERBB2-positive sample, Sample1 (see the flow chart in FIG. 1); steps 1 to 11 are repeated for the other samples:
1. Whole exome sequencing data were collected for 272 ERBB2 amplification-positive samples and 1029 ERBB2 amplification-negative samples and divided into a training set and a test set; the training set comprises 223 positive samples and 817 negative samples, and the test set comprises 49 positive samples and 212 negative samples.
2. The ERBB2 gene comprises 27 exons in total. The 27 amplified or deleted regions, of length $L_j$ bp (0 < j < 28), are divided with a sliding window of 50 bp and a step of 40 bp, so that each amplified or deleted region yields $i = L_j/40$ seed sequences, where i is the quotient if $L_j/40$ is an integer, and otherwise i is $L_j/40$ rounded down plus 1; a 27 × i matrix of seed sequences is thus obtained in total. The $L_j$ values are, respectively, 311bp, 152bp, 214bp, 135bp, 69bp, 116bp, 142bp, 120bp, 127bp, 74bp, 91bp, 200bp, 133bp, 91bp, 161bp, 48bp, 139bp, 123bp, 99bp, 186bp, 156bp, 76bp, 147bp, 98bp, 189bp, 253bp and 974bp. The corresponding i are 8, 4, 6, 4, 2, 3, 4, 2, 3, 6, 4, 3, 5, 2, 4, 3, 5, 4, 2, 4, 3, 5, 7, 25, respectively.
3. Exact (zero-mismatch) whole-sequence matching of the 27 × i seed sequences is performed against the data of each of the 1040 training-set samples, giving a 27 × i matrix of exactly matched seed sequence counts for each sample.
4. The matrix of exactly matched seed sequence counts for each sample is normalized by dividing each count by the mean of all exactly matched seed sequence counts in the same amplified or deleted region.
5. The normalized matrix of exactly matched seed sequence counts is zero-padded: the largest number of seed sequences obtained from any amplified or deleted region is taken as the reference, and the matrix entries of regions that have fewer seed sequences than this number are set to 0.
6. Mathematical modeling is performed on the 1040 normalized, zero-padded 27 × 25 matrices of exactly matched seed sequence counts. First, 10-fold cross-validation is carried out on the 1040 samples, and a convolutional neural network (CNN), a deep learning algorithm, is used together with the negative and positive labels to select hyperparameters and tune the model. The final optimal model, with a training-set AUC of 93.04% and a test-set AUC of 94.54%, is used to judge new samples of the same data type. The model parameters are shown in Table 1.
TABLE 1 model parameters
7. Steps 2 to 5 are repeated for Sample1, and copy number variation is predicted with the optimal mathematical model obtained in step 6; the predicted value is 0.9916596, which is greater than 0.5, so the sample is judged positive.
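The rounding rule in step 2 can be applied mechanically to the 27 listed exon lengths. The short sketch below does only that; the variable names are hypothetical, and the printed counts are simply what the stated rule implies for the listed lengths with a 40 bp step.

```python
# Exon lengths in bp, copied from step 2 of the example.
EXON_LENGTHS = [311, 152, 214, 135, 69, 116, 142, 120, 127, 74, 91, 200, 133,
                91, 161, 48, 139, 123, 99, 186, 156, 76, 147, 98, 189, 253, 974]
STEP = 40  # step size n in bp

# i = L_j / 40 if it divides evenly, otherwise rounded down plus 1
# (integer ceiling division).
seeds_per_exon = [(length + STEP - 1) // STEP for length in EXON_LENGTHS]

# After zero-padding, every row is as wide as the longest row.
matrix_shape = (len(EXON_LENGTHS), max(seeds_per_exon))

print(seeds_per_exon)
print(matrix_shape)
```

The resulting shape agrees with the 27 × 25 matrix used for modeling in step 6: the longest exon, 974 bp, gives 25 seeds, and all other rows are padded with zeros to that width.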
The predicted values and judgment results of Sample2-Sample10 are shown in Table 2 below.
TABLE 2 summary of predicted results for each sample
Sample ID | Predicted value | Observed value |
---|---|---|
Sample1 | 0.9916596 | Positive |
Sample2 | 0.9989957 | Positive |
Sample3 | 0.9999901 | Positive |
Sample4 | 0.9990958 | Positive |
Sample5 | 0.99751943 | Positive |
Sample6 | 0.012844639 | Negative |
Sample7 | 0.006111831 | Negative |
Sample8 | 0.003521628 | Negative |
Sample9 | 0.008016149 | Negative |
Sample10 | 0.002645513 | Negative |
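The judgment rule of step 7 can be checked against the values in Table 2 with a few lines of Python. This is a minimal sketch: the dictionary and variable names are hypothetical, and only the 0.5 threshold and the predicted values come from the text.

```python
THRESHOLD = 0.5  # a predicted value above this is judged positive

# Predicted values copied from Table 2.
predictions = {
    "Sample1": 0.9916596,
    "Sample2": 0.9989957,
    "Sample3": 0.9999901,
    "Sample4": 0.9990958,
    "Sample5": 0.99751943,
    "Sample6": 0.012844639,
    "Sample7": 0.006111831,
    "Sample8": 0.003521628,
    "Sample9": 0.008016149,
    "Sample10": 0.002645513,
}

calls = {
    sample: "Positive" if value > THRESHOLD else "Negative"
    for sample, value in predictions.items()
}

for sample, call in calls.items():
    print(sample, call)
```

With these values the stricter preferred thresholds of 0.6 and 0.8 would give the same calls, since every predicted value lies either above 0.99 or below 0.02.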
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
Claims (9)
1. A method for detecting copy number variation by using a single sample based on a second-generation sequencing technology is characterized by comprising the following steps:
(1) establishing a first gene sample database and a second gene sample database, wherein the first gene sample database comprises A cases of copy number variation genes, the second gene sample database comprises B cases of genes which do not have copy number variation at corresponding positions, and A and B are natural numbers of more than 50 respectively;
(2) dividing j copy number variation regions, each of length $L_j$ bp, with a sliding window of size m bp and a step of n bp, so that each copy number variation region yields $i = L_j/n$ seed sequences, where i is the quotient if $L_j/n$ is an integer, and otherwise i is $L_j/n$ rounded down plus 1, giving a matrix of j × i seed sequences in total;
(3) performing exact (zero-mismatch) whole-sequence matching of the j × i seed sequences against the first gene sample database and the second gene sample database, respectively, to obtain a j × i matrix of exactly matched seed sequence counts in each database;
(4) normalizing the matrix of exactly matched seed sequence counts in each database, that is, dividing each count by the mean of all exactly matched seed sequence counts in the same copy number variation region;
(5) zero-padding the normalized matrix of exactly matched seed sequence counts, that is, taking the largest number of seed sequences obtained from any copy number variation region as the reference and setting to 0 the matrix entries of regions that have fewer seed sequences than this number;
(6) performing mathematical modeling on the A + B normalized, zero-padded matrices of exactly matched seed sequence counts, building a statistical model from the known negative and positive results, and finally obtaining a mathematical model for judging copy number variation as negative or positive;
(7) repeating steps (2) to (5) on the sample to be judged and predicting copy number variation with the mathematical model obtained in step (6), the sample being judged positive if the predicted value is greater than 0.5 and negative otherwise.
2. The method of claim 1, wherein j is a natural number greater than 1.
3. The method for detecting copy number variation using single sample based on second-generation sequencing technology according to claim 2, wherein the gene sample data in the first gene sample database and the second gene sample database is derived from whole genome sequencing and/or data obtained by target region capture sequencing or target region amplification sequencing.
4. The method of claim 3, wherein the copy number variation comprises gene copy number amplification and/or deletion.
5. The method of claim 4, wherein the copy number variation comprises chromosome euploid or aneuploid aberrations, chromosome insertions, deletions, inversions or translocations, and DNA fragment insertions, deletions, duplications, inversions or translocations.
6. The method of claim 5, wherein the DNA fragment has a length of 1Kb or more.
7. The method for detecting copy number variation by using a single sample based on the second-generation sequencing technology of claim 1, wherein the gene is ERBB2.
8. The method for detecting copy number variation using single sample based on next-generation sequencing technology according to claim 1, wherein the data statistical model is established in step (6) by logistic regression or deep learning algorithm.
9. A computer system comprising a processor and configured to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910541057.5A CN110246543B (en) | 2019-06-21 | 2019-06-21 | Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110246543A (en) | 2019-09-17 |
CN110246543B (en) | 2021-02-26 |
Family
ID=67888607
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910541057.5A Active CN110246543B (en) | 2019-06-21 | 2019-06-21 | Method and computer system for detecting copy number variation by using single sample based on second-generation sequencing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110246543B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111276189B (en) * | 2020-02-26 | 2020-12-29 | 广州市金域转化医学研究院有限公司 | Chromosome balance translocation detection and analysis system based on NGS and application thereof |
CN112634987B (en) * | 2020-12-25 | 2021-07-27 | 北京吉因加医学检验实验室有限公司 | Method and device for detecting copy number variation of single-sample tumor DNA |
CN113736865A (en) * | 2021-09-09 | 2021-12-03 | 元码基因科技(北京)股份有限公司 | Kit, reaction system and method for detecting gene copy number variation in sample |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106372459A (en) * | 2016-08-30 | 2017-02-01 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting copy number variation based on amplicon next generation sequencing |
CN108073791A (en) * | 2017-12-12 | 2018-05-25 | 元码基因科技(北京)股份有限公司 | Method based on two generation sequencing datas detection target gene structure variation |
CN108256289A (en) * | 2018-01-17 | 2018-07-06 | 湖南大地同年生物科技有限公司 | A kind of method based on target area capture sequencing genomes copy number variation |
CN108920899A (en) * | 2018-06-10 | 2018-11-30 | 杭州迈迪科生物科技有限公司 | A kind of single exon copy number variation prediction technique based on target area sequencing |
CN110808084A (en) * | 2019-09-19 | 2020-02-18 | 西安电子科技大学 | Copy number variation detection method based on single-sample second-generation sequencing data |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130184999A1 (en) * | 2012-01-05 | 2013-07-18 | Yan Ding | Systems and methods for cancer-specific drug targets and biomarkers discovery |
CN108304694B (en) * | 2018-01-30 | 2021-08-31 | 元码基因科技(北京)股份有限公司 | Method for analyzing gene mutation based on second-generation sequencing data |
CN108427864B (en) * | 2018-02-14 | 2019-01-29 | 南京世和基因生物技术有限公司 | A kind of detection method, device and computer-readable medium copying number variation |
Non-Patent Citations (2)
Title |
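---|
"Computational tools for copy number variation"; Min Zhao et al.; BMC Bioinformatics; 2013-09-13; full text * |
"Genomic variation detection based on high-throughput sequencing data"; Liu Yongzhuang; China Doctoral Dissertations Full-text Database (Electronic Journal); 2018-01-15 (No. 1); full text * |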
Also Published As
Publication number | Publication date |
---|---|
CN110246543A (en) | 2019-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||