CN113793641B - Method for rapidly judging sample gender from FASTQ file - Google Patents
- Publication number
- CN113793641B (application CN202111149249.5A)
- Authority
- CN
- China
- Prior art keywords
- mers
- fastq
- mer
- data
- chromosome
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method for rapidly determining sample sex from FASTQ files, comprising the following steps: (1) generating K-mers unique to the Y chromosome (Y-specific K-mers) from the reference genome; (2) obtaining the intersection of the design intervals of whole-exome sequencing capture probes, removing K-mers that fall outside the intersection, sorting the retained K-mers in descending order of their number of occurrences within the capture-probe design intervals, and selecting the top-ranked K-mers as the Y-specific K-mer set; (3) randomly reading records from FASTQ files, counting the Y-specific K-mers, analyzing how their distribution differs between FASTQ files of different sexes using real data from equal numbers of males and females, and determining a sex-determination threshold; (4) determining the sex of a FASTQ file according to the threshold. The method applies to the various NGS data types, has a simple analysis workflow, is convenient to operate, and greatly improves the efficiency of sex determination.
Description
Technical Field
The invention relates to the technical field of biology and of high-throughput sequencing and mutation detection for precision medicine, and in particular to a method for rapidly determining the sex of a sample from a FASTQ file.
Background
With the rapid development of modern medicine, the cost of high-throughput sequencing (Next-Generation Sequencing, NGS) keeps falling, and NGS is becoming the first choice for genetic testing of hereditary diseases, tumors and other conditions. FASTQ is the most common file format for storing NGS base calls together with their quality scores and other related information. FASTQ is also the raw data for sequencing data delivery and genomic analysis; NGS data and results in other formats, such as the alignment file BAM and the variant-call file VCF, are derived from it through extensive computation. When analyzing NGS data, researchers often need to verify that the sex recorded for the sample matches the sex inferred from the data. This check is critical for confirming that the data and the sample correspond, for detecting contamination, and for subsequent chromosome copy-number analysis and variant interpretation.
Existing approaches to determining the sex of NGS data mainly analyze the coverage of specific genes on the X and Y chromosomes from the BAM file, or the genotype distribution on the X and Y chromosomes from the VCF file. These methods have the following disadvantages:
(1) Generating the alignment file BAM and the variant-call file VCF from FASTQ requires substantial computing resources and storage space, and the analysis workflow typically takes several hours to tens of hours depending on the data volume. This drawback is especially pronounced in scenarios where only the sex of the data needs to be determined and no downstream analysis is needed for the time being.
(2) Most of the software used in the analysis workflow runs only on Linux and is very difficult to install and run on a Windows computer. Much data is delivered through network-disk software on Windows, so it must be uploaded to a Linux server before sex can be determined, which is inconvenient for analysts.
Analysts therefore urgently need a new technical solution that significantly reduces resource requirements and system dependencies and that can rapidly determine, from FASTQ files, both the sex of a sample and contamination between samples of different sexes.
Disclosure of Invention
The invention aims to solve the problems of the prior art by providing a method for rapidly determining the sex of a sample from a FASTQ file, which significantly reduces resource requirements, reduces system dependencies, and determines sample sex quickly.
The technical solution of the invention is as follows:
A method for rapidly determining the sex of a sample from a FASTQ file, comprising the following steps:
(1) Generating K-mers unique to the Y chromosome (Y-specific K-mers) from the reference genome;
(2) Obtaining the intersection of the design intervals of whole-exome sequencing capture probes from different sources, removing K-mers outside the intersection, sorting the retained K-mers in descending order of their number of occurrences within the capture-probe design intervals, and selecting a preset number of top-ranked K-mers as the final Y-specific K-mer set;
(3) Randomly reading records from different FASTQ files, counting the Y-specific K-mers they contain, analyzing how the distribution of the Y-specific K-mers differs between FASTQ files using real data from equal numbers of males and females, and determining a sex-determination threshold;
(4) Determining the sex of the FASTQ file according to the threshold.
Optionally or preferably, in the above method, the threshold comprises an upper threshold U and a lower threshold L on the number of K-mers: data with a count greater than U are male, and data with a count less than L are female; when the count lies between L and U, contamination between samples of different sexes is judged to exist.
Optionally or preferably, in the above method, the FASTQ file is a FASTQ file generated by whole-genome sequencing or whole-exome sequencing.
Optionally or preferably, in the above method, the K-mers removed in step (2) as lying outside the intersection include those with coverage below 50% and those occurring fewer than 3 times on the Y chromosome.
Optionally or preferably, in the above method, the preset number of top-ranked K-mers in step (2) is the top 100.
Optionally or preferably, in the above method, the number of records randomly read from a FASTQ file in step (3) is 100,000.
Compared with the prior art, the invention has the following beneficial effects:
The method is based on Y-specific K-mers, which in theory occur only in data from male samples and therefore carry potential sex information. The threshold separating male and female data is determined from the difference in how often these K-mers occur in FASTQ files of different sexes, so that both the sex of the data and contamination between samples of different sexes can be determined from raw NGS data.
Removing K-mers that are not covered or have low coverage, as well as K-mers that occur relatively rarely on the Y chromosome, further improves the robustness of the K-mer set and the speed of computation.
In addition, the invention has the following advantages:
1. Rapid determination without large computing resources
Conventionally, determining the sex of data from the alignment file BAM or the variant-call file VCF requires several to tens of hours of computation on a dedicated server. The workflow designed in the invention is simple to deploy and convenient to operate; the whole analysis can be completed by deploying only the relevant executable files. Demands on server computing resources are low: using multithreading, an ordinary notebook computer can determine the sex of dozens of FASTQ files per minute.
2. Operating-system independence and wide applicability
The method applies to the various current NGS data types, including whole-genome sequencing data of different depths and whole-exome sequencing data from various capture probes. It runs not only on large Linux servers but also on personal Windows notebook computers.
Drawings
FIG. 1 is a flowchart of the overall determination method of Example 1;
FIG. 2 is the first partial flowchart of Example 1;
FIG. 3 is the second partial flowchart of Example 1;
FIG. 4 is the third partial flowchart of Example 1;
FIG. 5 is the fourth partial flowchart of Example 1.
Detailed Description
The following detailed description of the invention is presented in conjunction with the drawings and preferred embodiments to enable one skilled in the art to better understand and practice the invention.
Example 1
Referring to FIG. 1, the method for rapidly determining the sex of a sample from a FASTQ file comprises the following parts:
Part 1: generating Y-specific K-mers from the reference genome;
Part 2: screening the Y-specific K-mers according to probe intervals and occurrence counts;
Part 3: analyzing, with real data, how the distribution of the screened K-mers differs between FASTQ files of different sexes, in order to determine the sex-determination threshold;
Part 4: determining the sex of NGS FASTQ data according to the threshold.
The steps of each part are described in detail below.
Part 1: Generating Y-specific K-mers from the reference genome
By comparing the K-mers of the Y chromosome with those of the other chromosomes in the reference genome, K-mers unique to the Y chromosome are identified. In theory these K-mers occur only in data from male samples and therefore carry potential sex information. See FIG. 2 for the specific flow.
Input: the human genome reference sequence;
Output: Y-specific K-mers.
The steps are as follows:
(1) Download the human genome reference sequence in FASTA format, e.g., hg38.fa.gz, from UCSC or another public database.
(2) Use a script to split the reference sequence by chromosome into two parts: the Y chromosome sequence (Y.fa) and the sequences of the other chromosomes (other.fa).
(3) Set different K-mer lengths (in this embodiment 7, 9, 11, 13, 15, 17, 19 and 21) and count the K-mers of each of the two sequence files from step (2) using the Jellyfish software.
(4) Compare the K-mer sets of the two sequence files to find the K-mers unique to the Y chromosome.
(5) Weighing running time against the number of Y-specific K-mers, set the K-mer length to 13.
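The comparison in step (4) amounts to a set difference. The following is a minimal Python sketch of that idea on toy sequences, not the patent's implementation: the embodiment counts K-mers with Jellyfish on Y.fa and other.fa, whereas the function names `kmers` and `y_specific_kmers`, the in-memory dictionary of chromosomes, and the choice to ignore reverse complements are all assumptions made here for illustration.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return every k-mer of seq that consists only of A, C, G and T."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # skip k-mers containing N or other codes
    }


def y_specific_kmers(chroms: Dict[str, str], k: int, y_name: str = "chrY") -> Set[str]:
    """K-mers present on the Y chromosome and absent from every other chromosome.

    Simplification (assumption): strands are not merged, so a k-mer and its
    reverse complement are treated as unrelated.
    """
    y_set = kmers(chroms[y_name], k)
    other_set: Set[str] = set()
    for name, seq in chroms.items():
        if name != y_name:
            other_set |= kmers(seq, k)
    return y_set - other_set


# Toy reference (hypothetical sequences, far shorter than a real genome).
toy = {"chr1": "ACGTACGTAA", "chrY": "ACGTTTGCA"}
print(sorted(y_specific_kmers(toy, k=4)))
```

On a real genome the sets for k = 13 are large, which is one reason a dedicated counter such as Jellyfish is used in the embodiment rather than in-memory Python sets.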
Part 2: Screening the Y-specific K-mers according to probe intervals and occurrence counts
To ensure that the Y-specific K-mers are well covered across different sequencing technologies and capture probes, the intersection of capture-probe design intervals is obtained from the mainstream whole-exome capture probes on the market from different sources (different manufacturers). K-mers that are not covered or have low coverage are filtered out, and K-mers that occur relatively rarely on the Y chromosome are also removed, which improves the robustness of the K-mer set and the speed of computation. The remaining K-mers are sorted in descending order of their number of occurrences within the capture-probe design intervals, and the top 100 are selected as the final Y-specific K-mer set. See FIG. 3 for the specific flow.
Input: Y-specific K-mers and probe capture regions;
Output: the screened Y-specific K-mers.
The steps are as follows:
(1) Obtain the design intervals of whole-exome sequencing capture probes from different probe design companies;
(2) Obtain the intersection of the capture-probe design intervals of the different companies using the program tool bedtk;
(3) Remove K-mers outside the intersection of the capture-probe design intervals;
(4) Sort the K-mers in descending order of their number of occurrences within the capture-probe design intervals;
(5) Select the top 100 K-mers as the final Y-specific K-mer set.
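The interval intersection and ranking in steps (2) to (5) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the embodiment uses bedtk on real probe design files, while here intervals are half-open `(start, end)` pairs on the Y chromosome held in sorted, non-overlapping lists, and the names `intersect` and `top_kmers` are hypothetical. The coverage and minimum-frequency filters mentioned in the text are omitted.

```python
from collections import Counter
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the Y chromosome


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def top_kmers(y_seq: str, candidates: Set[str], regions: List[Interval],
              k: int, top_n: int = 100) -> List[str]:
    """Rank candidate k-mers by how often they occur fully inside the regions."""
    counts: Counter = Counter()
    for start, end in regions:
        for pos in range(start, end - k + 1):
            kmer = y_seq[pos:pos + k]
            if kmer in candidates:
                counts[kmer] += 1
    # K-mers never seen inside the regions are dropped; the rest are ranked.
    return [kmer for kmer, _ in counts.most_common(top_n)]


# Toy probe designs from two hypothetical vendors.
shared = intersect([(0, 10), (20, 30)], [(5, 25)])
print(shared)
print(top_kmers("AAAAACCC", {"AAA", "CCC", "GGG"}, [(0, 8)], k=3))
```

Intersecting more than two probe designs can be done by folding `intersect` over the list of designs, since the output of each call is again sorted and non-overlapping.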
Part 3: Analyzing, with real data, how the distribution of the screened K-mers differs between FASTQ files of different sexes, in order to determine the sex-determination threshold
100,000 records are randomly read from each FASTQ file (the files include both sexes), and a script counts the Y-specific K-mers screened in Part 2, that is, calculates the number of Y-specific K-mers in the FASTQ file. A large body of real data with equal numbers of males and females is used to analyze how the distribution of the Y-specific K-mers differs between FASTQ files, and to derive an upper threshold (U; data above it are male) and a lower threshold (L; data below it are female) on the K-mer count that separate the two sexes well. If the K-mer count lies between L and U, there may be contamination between samples of different sexes. See FIG. 4 for the specific flow.
Input: the screened Y-specific K-mers, FASTQ files, and the true sex of each;
Output: the sex-determination thresholds.
The steps are as follows:
(1) Randomly read 100,000 records from the FASTQ file;
(2) Count the screened K-mers using a script;
(3) Set the thresholds according to the true sex of the data.
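The text does not state how U and L are derived from the labeled counts. Purely as an illustration, the sketch below assumes one simple rule: place L just above the highest count seen in female data and U just below the lowest count seen in male data, so that every training sample satisfies "less than L is female" or "greater than U is male". The function name `choose_thresholds` and the example counts are hypothetical.

```python
from typing import Sequence, Tuple


def choose_thresholds(male_counts: Sequence[int],
                      female_counts: Sequence[int]) -> Tuple[int, int]:
    """Return (L, U) from labeled K-mer counts.

    Assumed rule (not taken from the patent text):
      L = max(female) + 1, so every female training count is < L
      U = min(male) - 1,   so every male training count is > U
    """
    highest_female = max(female_counts)
    lowest_male = min(male_counts)
    if highest_female >= lowest_male:
        raise ValueError("male and female counts overlap; no clean threshold pair")
    return highest_female + 1, lowest_male - 1


# Hypothetical counts from equal numbers of male and female FASTQ files.
lower, upper = choose_thresholds(male_counts=[900, 1200, 1500],
                                 female_counts=[0, 3, 12])
print(lower, upper)
```

In practice a margin based on the observed spread of each group may be preferable to the raw extremes, since a single outlier sample moves these thresholds directly.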
Part 4: Determining the sex of NGS FASTQ data according to the threshold
For FASTQ files generated by whole-genome sequencing (WGS) or whole-exome sequencing (WES), the screened Y-specific K-mers from Part 2 are counted, and sex is determined using the threshold interval obtained in Part 3. See FIG. 5.
Input: the screened Y-specific K-mers, the FASTQ file, and the sex-determination thresholds;
Output: the determined sex.
The steps are as follows:
(1) Randomly read 100,000 records from the FASTQ file;
(2) Count the screened Y-specific K-mers using a script;
(3) Determine the sex according to the thresholds.
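The three steps above can be sketched end to end in Python. This is a hedged illustration, not the patent's script: the text does not say how records are "randomly read", so single-pass reservoir sampling is an assumption here, as are the function names, the four-line FASTQ layout without wrapped sequences, the decision to count only the forward strand, and the toy thresholds.

```python
import io
import random
from typing import Iterable, Iterator, List, Set, TextIO


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for line_no, line in enumerate(handle):
        if line_no % 4 == 1:
            yield line.strip().upper()


def sample_reads(reads: Iterable[str], n: int, seed: int = 0) -> List[str]:
    """Reservoir-sample up to n reads in one pass (assumed sampling scheme)."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for idx, read in enumerate(reads):
        if idx < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, idx)  # inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir


def count_hits(reads: Iterable[str], kmer_set: Set[str], k: int) -> int:
    """Total occurrences of the selected K-mers across all sampled reads."""
    return sum(
        1
        for read in reads
        for i in range(len(read) - k + 1)
        if read[i:i + k] in kmer_set
    )


def classify(count: int, lower: int, upper: int) -> str:
    """Apply the rule from the text: > U male, < L female, otherwise flag."""
    if count > upper:
        return "male"
    if count < lower:
        return "female"
    return "possible cross-sex contamination"


# Toy FASTQ content and toy thresholds (both hypothetical).
demo = io.StringIO("@r1\nACGTTTGCA\n+\nIIIIIIIII\n@r2\nAAAAAAAA\n+\nIIIIIIII\n")
reads = sample_reads(fastq_sequences(demo), n=100_000)
hits = count_hits(reads, {"CGTT", "TGCA"}, k=4)
print(hits, classify(hits, lower=1, upper=1))
```

For gzip-compressed input, passing `gzip.open(path, "rt")` as the handle would work with the same functions, because they only iterate over text lines.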
The method uses K-mers unique to the Y chromosome as the basis for determination and randomly samples the raw FASTQ data to determine the sex of NGS data. It applies to the various NGS data types, has a simple analysis workflow, and is convenient to operate: the whole analysis can be completed by deploying only the relevant executable files. Using multithreading, an ordinary notebook computer can determine the sex of dozens of FASTQ files per minute, a large efficiency gain over the conventional approach of computing for several to tens of hours on a dedicated server.
Specific examples are used herein to describe the invention in detail; the description of the above examples is intended only to aid understanding of the core concept of the invention. Any obvious modification, equivalent substitution, or other improvement made by those skilled in the art without departing from the inventive concept falls within the scope of protection of the invention.
Claims (6)
1. A method for rapidly determining the sex of a sample from a FASTQ file, comprising the following steps:
(1) Generating K-mers unique to the Y chromosome (Y-specific K-mers) from the reference genome, specifically by:
a. obtaining the reference sequence of the reference genome in FASTA format;
b. splitting the reference sequence by chromosome into two sequence files: the Y chromosome and the other chromosomes;
c. setting different K-mer lengths and counting the K-mers of each of the two sequence files using the Jellyfish program;
d. comparing the K-mer sets of the two sequence files to obtain the K-mers unique to the Y chromosome;
e. setting the length of the Y-specific K-mers to 13;
(2) Obtaining the intersection of the design intervals of whole-exome sequencing capture probes from different sources, removing K-mers outside the intersection, sorting the retained K-mers in descending order of their number of occurrences within the capture-probe design intervals, and selecting a preset number of top-ranked K-mers as the final Y-specific K-mer set;
(3) Randomly reading records from different FASTQ files, counting the Y-specific K-mers they contain, analyzing how the distribution of the Y-specific K-mers differs between FASTQ files using real data from equal numbers of males and females, and determining a sex-determination threshold;
(4) Determining the sex of the FASTQ file according to the threshold.
2. The method of claim 1, wherein the K-mers outside the intersection in step (2) include those with coverage below 50% and those occurring fewer than 3 times on the Y chromosome.
3. The method of claim 1, wherein the preset number of top-ranked K-mers in step (2) is the top 100.
4. The method of claim 1, wherein the threshold in step (3) comprises an upper threshold U and a lower threshold L on the number of K-mers, data with a count greater than U being male and data with a count less than L being female; when the count lies between L and U, contamination between samples of different sexes is judged to exist.
5. The method of claim 1, wherein the number of records randomly read from a FASTQ file in step (3) is 100,000.
6. The method of claim 1, wherein the FASTQ file is a FASTQ file generated by whole-genome sequencing or whole-exome sequencing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111149249.5A CN113793641B (en) | 2021-09-29 | 2021-09-29 | Method for rapidly judging sample gender from FASTQ file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113793641A | 2021-12-14 |
CN113793641B | 2023-11-28 |
Family
ID=78877534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111149249.5A (Active; published as CN113793641B) | Method for rapidly judging sample gender from FASTQ file | 2021-09-29 | 2021-09-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113793641B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004063390A2 (en) * | 2003-01-10 | 2004-07-29 | Mmi Genomics, Inc. | Compositions and methods for determining canine gender |
WO2015035555A1 (en) * | 2013-09-10 | 2015-03-19 | 深圳华大基因科技有限公司 | Method, system, and computer readable medium for determining whether fetus has abnormal number of sex chromosomes |
WO2016008146A1 (en) * | 2014-07-18 | 2016-01-21 | 深圳华大基因研究院 | Gender identification method and apparatus for samples |
KR20160134106A (en) * | 2015-05-14 | 2016-11-23 | 배재대학교 산학협력단 | Kit for gender determination |
CN106520940A (en) * | 2016-11-04 | 2017-03-22 | 深圳华大基因研究院 | Chromosomal aneuploid and copy number variation detecting method and application thereof |
CN109192246A (en) * | 2018-06-22 | 2019-01-11 | 深圳市达仁基因科技有限公司 | Detect the method, apparatus and storage medium of chromosomal copy number exception |
WO2019025004A1 (en) * | 2017-08-04 | 2019-02-07 | Trisomytest, S.R.O. | A method for non-invasive prenatal detection of fetal sex chromosomal abnormalities and fetal sex determination for singleton and twin pregnancies |
CN109402241A (en) * | 2017-08-07 | 2019-03-01 | 深圳华大基因研究院 | Identification and the method for analyzing ancient DNA sample |
CN110033828A (en) * | 2019-04-03 | 2019-07-19 | 北京各色科技有限公司 | Sexual discriminating method based on chip detection DNA data |
CN110648721A (en) * | 2019-09-19 | 2020-01-03 | 北京市儿科研究所 | Method and device for detecting copy number variation by aiming at exon capture technology |
KR102150078B1 (en) * | 2019-12-30 | 2020-09-01 | 주식회사 마크로젠 | Prediction method for gender of fetus based on directivity with number of reads and analysis apparatus |
CN113053460A (en) * | 2019-12-27 | 2021-06-29 | 分子健康有限责任公司 | Systems and methods for genomic and genetic analysis |
JP2021101629A (en) * | 2019-12-24 | 2021-07-15 | モレキュラー ヘルス ゲーエムベーハー | System and method for genome analysis and gene analysis |
CN113192555A (en) * | 2021-04-21 | 2021-07-30 | 杭州博圣医学检验实验室有限公司 | Method for detecting copy number of second-generation sequencing data SMN gene by calculating sequencing depth of differential allele |
Non-Patent Citations (3)
Title |
---|
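Assessing the Sex-Related Genomic Composition Difference Using a k-mer-Based Approach: A Case of Study in Arapaima gigas (Pirarucu); Cavalcante, RLD, et al.; Advances in Bioinformatics and Computational Biology, BSB 2020; Vol. 12558; pp. 50-56 * |
Detection of the human sex-determining gene (SRY) and its clinical application; Chen Yong, et al.; Journal of Molecular Diagnostics and Therapy (No. 03); pp. 161-164 * |
Huang Jiangping, et al. Journal of Forensic Medicine. 2016, Vol. 32 (No. 5), pp. 371-377. * |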
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||