CN116705155A - Definition method of whole-gene DNA data - Google Patents

Info

Publication number
CN116705155A
Authority
CN
China
Prior art keywords
file
genome
data
gene
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310970716.3A
Other languages
Chinese (zh)
Inventor
夏志强
田阳阳
江思容
赵龙
夏成材
邹枚伶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanya Nanfan Research Institute Of Hainan University
Sanya Research Institute of Hainan University
Original Assignee
Sanya Nanfan Research Institute Of Hainan University
Sanya Research Institute of Hainan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-08-03
Filing date
2023-08-03
Publication date
2023-09-05
Application filed by Sanya Nanfan Research Institute Of Hainan University, Sanya Research Institute of Hainan University

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention relates to the technical field of genome data analysis, in particular to a method for defining whole-gene DNA data. The method comprises the following steps: acquiring a genome region file, a reference genome file and a genome variation site file of the species to be tested, the genome variation site file being determined by alignment against the reference genome file; extracting the position information of the genes of the species to be tested from the genome region file; determining mutation sites from the genome variation site file; classifying the data information in the genome region file; using the gene coding region as the weighting reference, scoring the other regions from high to low according to the role they play in the biological processes of the species to be tested and their distance from the mutation site, assigning 10 points to the region whose position is nearest to the mutation site and successively lower scores to the other regions; and defining the species to be tested according to the scores. The advantages are simple calculation, time savings and wide applicability.

Description

Definition method of whole-gene DNA data
Technical Field
The invention relates to the technical field of genome data analysis, in particular to a definition method of whole-gene DNA data.
Background
A gene is the entire nucleotide sequence required to produce a single polypeptide chain or functional RNA, and a DNA fragment with genetic information is called a gene.
The genome region file is an important file format. Reading it reveals key information about a genomic DNA sequence, such as coding regions, non-coding regions, gene structures, protein-coding regions, promoter regions and transcription factor binding sites. This information can play a key role in the subsequent functional analysis of a species. A common example is the GFF format. The genome variation site file records the variant sites of the genome; whether or not a site matches the reference genome can be represented in this file. A common example is the VCF file.
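As a small illustration of how mutation sites can be read from a variation site file, the sketch below relies only on the standard VCF layout, in which each data line starts with the tab-separated columns CHROM, POS, ID, REF and ALT. The function name `read_variant_sites` and the two records are hypothetical and are not taken from the patent.

```python
from typing import Iterable, List, Tuple


def read_variant_sites(lines: Iterable[str]) -> List[Tuple[str, int, str, str]]:
    """Collect (chromosome, position, REF, ALT) from VCF text lines."""
    sites = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = cols[:5]
        sites.append((chrom, int(pos), ref, alt))
    return sites


# Made-up records, for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Chr1\t336000\t.\tA\tG\t50\tPASS\t.",
    "Chr1\t337748\t.\tC\tT\t60\tPASS\t.",
]
sites = read_variant_sites(vcf_lines)
print(sites)
```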
The GFF format was defined by the Sanger Institute. It is a simple and convenient data format for describing features of DNA, RNA and protein sequences, and is now a common format for sequence annotation. GFF stands for "General Feature Format"; it is a text file format that describes genes, transcripts, exons, introns and other features of biological sequences. These features are typically used in applications such as genome annotation, gene identification, sequence alignment and gene function prediction. Besides the position of each feature, a GFF file can record information such as the name, role and references of the feature, giving a fuller description of all features in the sequence.
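To make the layout concrete: in the standard GFF3 format each record has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below splits one record into those fields. The helper `parse_gff_line` and the sample record are hypothetical, chosen only for illustration.

```python
from typing import Dict, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff_line(line: str) -> GffFeature:
    """Split one tab-separated GFF3 record into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs = {}
    for item in cols[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


# Made-up record, for illustration only.
example = "Chr1\texample\tCDS\t335869\t337498\t.\t+\t0\tID=cds1;Parent=mrna1"
feature = parse_gff_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```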
A CDS (coding sequence) is the sequence that codes for a protein product. DNA is transcribed into mRNA, and the mRNA is translated into protein after processing such as splicing. The CDS is the DNA sequence that corresponds one-to-one with the protein sequence: it contains no sequence that lacks a counterpart in the protein, and sequence changes introduced during mRNA processing are not considered. In short, the CDS corresponds exactly to the codons of the protein. Studying the CDS reveals the amino acid sequence and function of the encoded protein and allows the evolution and variation of the gene to be investigated. Analysis and modification of CDS sequences are also of great value in fields such as gene editing and gene therapy.
At the whole-gene level, DNA data are important biological data, and studying DNA data and interpreting their meaning is a major research task of the genomic era. DNA data are widely used, for example to store arbitrary digital information, build DNA databases, determine genotypes, perform gene sequencing and carry out subsequent analyses. However, in the prior art a species cannot be directly analyzed from its GFF and VCF files.
Disclosure of Invention
To solve the above problems, the present invention provides a method for defining whole-gene DNA data.
The object of the invention is to provide a method for defining whole-gene DNA data, which specifically comprises the following steps:
S1, acquiring files: acquiring a genome region file, a reference genome file and a genome variation site file of the species to be tested; the genome variation site file is determined by alignment against the reference genome file;
S2, data information processing: extracting the position information of the genes of the species to be tested from the genome region file; determining mutation sites from the genome variation site file;
S3, classifying data information: classifying the data information in the genome region file;
S4, scoring and sorting: using the gene coding region as the weighting reference, scoring the other functional regions from high to low according to the role they play in the biological processes of the species to be tested and their distance from the mutation site; the region whose position is nearest to the mutation site is assigned 10 points, and the scores decrease successively within 800 to 1500 bp upstream and downstream of the mutation site; and defining the species to be tested according to the scores.
Preferably, the other regions in step S4 are mRNA, gene, exon, UTR, QTL and/or methylated regions.
Preferably, in step S4 the scores decrease successively within 1000 bp upstream and downstream of the mutation site.
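The scoring rule of step S4 can be sketched as follows. The text fixes only the top score (10 points) and the window (800 to 1500 bp, preferably 1000 bp); it does not state how the score falls off, so the linear decay below is an assumption, as are the function names `distance_to_region` and `score_site`. The CDS coordinates are borrowed from the rice example in the detailed description.

```python
def distance_to_region(pos: int, start: int, end: int) -> int:
    """Distance in bp from a site to a region; 0 if the site lies inside it."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def score_site(pos: int, start: int, end: int,
               window: int = 1000, top: float = 10.0) -> float:
    """Score a mutation site against one region.

    Assumption: the score decays linearly from `top` at distance 0 to 0 at
    the edge of the window; sites beyond the window score 0.
    """
    d = distance_to_region(pos, start, end)
    if d > window:
        return 0.0
    return top * (1 - d / window)


cds_start, cds_end = 335869, 337498
print(score_site(336000, cds_start, cds_end))  # site inside the CDS
print(score_site(335369, cds_start, cds_end))  # 500 bp upstream
print(score_site(339000, cds_start, cds_end))  # outside the 1000 bp window
```

A different decay shape (stepwise, exponential) could be substituted in `score_site` without changing the rest of the workflow.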
Preferably, the classification in step S3 specifically comprises the following steps:
S31, extracting the records of the gene coding region (CDS) from the genome region file, extracting the records of mRNA, gene, exon, UTR and/or other functional regions, and classifying them with the awk command in Linux;
S32, selecting all the classified data and filtering them by region type;
S33, importing the genome region file and the reference genome file into the software TBtools, outputting the types in the files, and completing the classification.
Preferably, the awk command is:
[the command is not reproduced in this text; step S31 of Example 1 gives an instance for X = CDS]
where X is CDS, mRNA, gene, exon, UTR, QTL or a methylated region.
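For readers who prefer a script to awk and Excel, the type-based classification of S31 to S33 can be approximated in Python by grouping records on the third GFF column. This is a sketch under that assumption, not the patented procedure itself; `classify_by_type` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def classify_by_type(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GFF records by the value of their third column (feature type)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            groups[cols[2]].append(line.rstrip("\n"))
    return dict(groups)


# Made-up records, for illustration only.
records = [
    "##gff-version 3",
    "Chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=g1",
    "Chr1\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=m1;Parent=g1",
    "Chr1\texample\tCDS\t1200\t1800\t.\t+\t0\tID=c1;Parent=m1",
    "Chr1\texample\tCDS\t2500\t3100\t.\t+\t0\tID=c2;Parent=m1",
]
groups = classify_by_type(records)
print({feature_type: len(rows) for feature_type, rows in groups.items()})
```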
Preferably, the genome region file is a GFF file; the reference genome file is a FASTA sequence file; the genome variation site file is a VCF file.
Preferably, the method for acquiring the VCF file in step S1 is specifically as follows:
adapters are trimmed from the raw sequencer output with the quality-control software fastp to obtain sequencing data; the sequencing data are aligned to the reference genome with the sequence alignment software bwa, and the aligned reads are sorted by genomic position with samtools; PCR duplicates in the sorted reads are removed with the Picard toolkit for high-throughput sequencing data; and variant calling is performed on the filtered data with GATK, finally yielding the VCF file.
Preferably, the method for acquiring the GFF file in step S1 is specifically as follows: using the Latin name of the species to be tested, the corresponding GFF file is downloaded from the NCBI, Ensembl, UCSC or GENCODE database, or from the published literature.
Preferably, the method for acquiring the GFF file in step S1 is specifically as follows: the GFF file is obtained by annotation using the reference genome file and the sequencing data.
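The tool chain above might be assembled roughly as in the sketch below, which only builds the command lines as strings and does not run them. The text names the tools but not the subcommands, so the choice of `bwa mem`, Picard `MarkDuplicates` and GATK `HaplotypeCaller`, the read-group values and all file names are assumptions. Preparatory steps such as indexing the reference are omitted.

```python
import shlex
from typing import List


def build_variant_calling_commands(sample: str, ref: str,
                                   fq1: str, fq2: str) -> List[str]:
    """Return the shell commands for one paired-end sample, in run order."""
    q = shlex.quote
    clean1, clean2 = f"{sample}.clean_1.fq.gz", f"{sample}.clean_2.fq.gz"
    sorted_bam, dedup_bam = f"{sample}.sorted.bam", f"{sample}.dedup.bam"
    # bwa itself expands the literal "\t" sequences in the read-group header.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Adapter trimming and quality control.
        f"fastp -i {q(fq1)} -I {q(fq2)} -o {q(clean1)} -O {q(clean2)}",
        # 2. Alignment, piped into a coordinate sort.
        f"bwa mem -R {q(read_group)} {q(ref)} {q(clean1)} {q(clean2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # 3. Removal of PCR duplicates.
        f"picard MarkDuplicates I={q(sorted_bam)} O={q(dedup_bam)}"
        f" M={q(sample + '.dup_metrics.txt')} REMOVE_DUPLICATES=true",
        # 4. BAM index required before variant calling.
        f"samtools index {q(dedup_bam)}",
        # 5. Variant calling, producing the VCF file.
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(dedup_bam)}"
        f" -O {q(sample + '.vcf.gz')}",
    ]


commands = build_variant_calling_commands(
    "sample1", "ref.fa", "sample1_1.fq.gz", "sample1_2.fq.gz")
for command in commands:
    print(command)
```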
Compared with the prior art, the invention has the following beneficial effects:
(1) Biological meaning is assigned to GFF files, whose format by itself carries no biological interpretation;
(2) Only the GFF file and VCF file of the species to be tested are processed, followed by the corresponding scoring and sorting; no heavy computation is needed, which saves time and simplifies subsequent analysis;
(3) Classification and sorting require only the genome region file, the reference genome file and the genome variation site file of the species to be analyzed, and the method can be applied in many areas, such as genome-wide association studies (GWAS), genomic selection (GS) breeding, QTL mapping, species evolution, large-population screening, identification standards for new species and radiation mutagenesis screening.
Drawings
FIG. 1 is a flow chart of a method of defining whole gene DNA data according to the present invention.
FIG. 2 is a flow chart of a method for defining whole-gene DNA data of rice according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
Example 1
This embodiment takes rice as an example and provides a method for defining whole-gene DNA data, which specifically comprises the following steps:
S1, acquiring files: obtaining the GFF file, the reference genome file and the VCF file for chromosome 1 of 533 rice samples; the VCF file is determined by alignment against the reference genome file;
the GFF file is obtained as follows: first look up the Latin name of rice, then use it to search for and download the file from a database such as NCBI, Ensembl, UCSC or GENCODE; or locate and download the corresponding GFF file from the published literature; or generate it by annotation using the reference genome file and the sequencing data;
the reference genome file is obtained as follows: look up the Latin name of the species, go to the website of a database such as NCBI, Ensembl, UCSC or GENCODE, and search by the Latin name to obtain the reference genome file of the species; or locate and download the corresponding reference genome file from the published literature; for a species without a reference genome, sequencing data can be obtained by sequencing and assembled into a genome to produce the reference genome file;
the VCF file is a commonly used bioinformatics file format for storing genomic or transcriptomic variant information; it is obtained as follows: adapters are trimmed from the raw sequencer output of the parents and the individual hybrid progeny (F1) plants with the quality-control software fastp to obtain their sequencing data; the sequencing data of the parents and the individual progeny are aligned to the reference genome with the sequence alignment software bwa, and the aligned reads of each are sorted by genomic position with samtools; PCR duplicates in the sorted reads are removed with the Picard toolkit for high-throughput sequencing data; and variant calling is performed on the filtered data of the parents and the individual progeny with GATK, finally yielding the VCF file.
S2, data information processing: extracting the data information of each region, such as the gene coding region, mRNA, gene, exon and five_prime_UTR, from the GFF file; determining mutation sites from the VCF file;
S3, classifying data information: the data information in the GFF file is classified as follows:
S31, extracting the records that contain only the gene coding region (CDS) from the GFF file, classifying with the awk command in Linux, for example: awk 'BEGIN { FS=OFS="\t" } $3 == "CDS" { print $0 }' wuzhong.gff3 > cds.txt; to extract another category, replace the value between the double quotes;
S32, opening the file to be processed in Excel, selecting all data, filtering on the third column (the GFF file format is fixed and the third column is Type), and classifying the data according to the values in that column;
S33, importing the GFF file and the FASTA sequence file into the existing software TBtools, outputting the types in the files, and completing the classification; the results are shown in Table 1 below;
Table 1 Data information for each region of chromosome 1 in the rice data
Taking as an example a CDS region on rice chromosome 1 with start position 335869 and end position 337498, the VCF file is shown in the following table:
Table 2 Rice VCF file
S4, scoring and sorting: using the gene coding region as the weighting reference, sites are scored from high to low according to the role of the region in the biological processes of rice and their distance from the region; for the specified CDS region, mutation sites that fall inside the region receive the highest score, and the scores of the remaining sites decrease with their distance from the CDS region; the scoring results are shown in Table 3:
Table 3 Scoring results
Note: the entries in italics in the table are the mutation sites that fall within the CDS region;
finally, the DNA data of rice are redefined according to the scoring results, and the high-value sites are identified.
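The sorting part of S4 can be illustrated for the CDS interval used in this example (335869 to 337498 on chromosome 1). The mutation positions below are hypothetical and are not taken from Table 2 or Table 3; the 1000 bp window follows the preferred value in the disclosure, and the linear decay is an assumption, since the text does not specify how the score decreases.

```python
CDS_START, CDS_END = 335869, 337498  # CDS interval from the rice example
WINDOW = 1000                        # bp upstream and downstream


def score(pos: int) -> float:
    """10 inside the CDS, decreasing linearly (assumed) to 0 at WINDOW bp."""
    d = max(CDS_START - pos, pos - CDS_END, 0)
    return 0.0 if d > WINDOW else 10.0 * (1 - d / WINDOW)


# Hypothetical mutation positions, for illustration only.
positions = [335369, 336000, 337748, 339000]

# Rank sites from highest to lowest score and flag those inside the CDS.
ranked = sorted(positions, key=score, reverse=True)
for pos in ranked:
    in_cds = CDS_START <= pos <= CDS_END
    print(pos, score(pos), "in CDS" if in_cds else "")
```

Sites at the top of the ranking are the candidates that the method would treat as high-value.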
The method of the invention is a new way of defining conventional single nucleotide polymorphisms. A SNP (single nucleotide polymorphism) is a DNA sequence polymorphism caused by variation of a single nucleotide at the genomic level. Some SNPs located within genes are likely to directly affect protein structure or expression level, so studying SNP mutation sites in combination with the various regions of the GFF file is highly representative.
It should be appreciated that steps may be reordered, added or deleted in the flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method for defining whole-gene DNA data, characterized by comprising the following steps:
S1, acquiring files: acquiring a genome region file, a reference genome file and a genome variation site file of the species to be tested; the genome variation site file is determined by alignment against the reference genome file;
S2, data information processing: extracting the position information of the genes of the species to be tested from the genome region file; determining mutation sites from the genome variation site file;
S3, classifying data information: classifying the data information in the genome region file;
S4, scoring and sorting: using the gene coding region as the weighting reference, scoring the other functional regions from high to low according to the role they play in the biological processes of the species to be tested and their distance from the mutation site; the region whose position is nearest to the mutation site is assigned 10 points, and the scores decrease successively within 800 to 1500 bp upstream and downstream of the mutation site; and defining the species to be tested according to the scores.
2. The method for defining whole-gene DNA data according to claim 1, wherein the other regions in step S4 are mRNA, gene, exon, UTR, QTL and/or methylated regions.
3. The method for defining whole-gene DNA data according to claim 2, wherein in step S4 the scores decrease successively within 1000 bp upstream and downstream of the mutation site.
4. The method for defining whole-gene DNA data according to claim 3, wherein the classification in step S3 specifically comprises the following steps:
S31, extracting the records of the gene coding region (CDS) from the genome region file, extracting the records of mRNA, gene, exon, UTR and/or other functional regions, and classifying them with the awk command in Linux;
S32, selecting all the classified data and filtering them by region type;
S33, importing the genome region file and the reference genome file into the software TBtools, outputting the types in the files, and completing the classification.
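5. The method for defining whole-gene DNA data according to claim 4, wherein the awk command is:
[the command is not reproduced in this text; step S31 of Example 1 in the description gives an instance for X = CDS]
where X is CDS, mRNA, gene, exon, UTR, QTL or a methylated region.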
6. The method for defining whole-gene DNA data according to claim 5, wherein the genome region file is a GFF file; the reference genome file is a FASTA sequence file; the genome variation site file is a VCF file.
7. The method for defining whole-gene DNA data according to claim 6, wherein the method for acquiring the VCF file in step S1 is specifically as follows:
adapters are trimmed from the raw sequencer output with the quality-control software fastp to obtain sequencing data; the sequencing data are aligned to the reference genome with the sequence alignment software bwa, and the aligned reads are sorted by genomic position with samtools; PCR duplicates in the sorted reads are removed with the Picard toolkit for high-throughput sequencing data; and variant calling is performed on the filtered data with GATK, finally yielding the VCF file.
8. The method for defining whole-gene DNA data according to claim 7, wherein the method for acquiring the GFF file in step S1 is specifically as follows: using the Latin name of the species to be tested, the corresponding GFF file is downloaded from the NCBI, Ensembl, UCSC or GENCODE database, or from the published literature.
9. The method for defining whole-gene DNA data according to claim 7, wherein the method for acquiring the GFF file in step S1 is specifically as follows: the GFF file is obtained by annotation using the reference genome file and the sequencing data.
CN202310970716.3A 2023-08-03 2023-08-03 Definition method of whole-gene DNA data Pending CN116705155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310970716.3A CN116705155A (en) 2023-08-03 2023-08-03 Definition method of whole-gene DNA data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310970716.3A CN116705155A (en) 2023-08-03 2023-08-03 Definition method of whole-gene DNA data

Publications (1)

Publication Number Publication Date
CN116705155A true CN116705155A (en) 2023-09-05

Family

ID=87837808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310970716.3A Pending CN116705155A (en) 2023-08-03 2023-08-03 Definition method of whole-gene DNA data

Country Status (1)

Country Link
CN (1) CN116705155A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104450898A (en) * 2014-11-26 2015-03-25 江苏出入境检验检疫局动植物与食品检测中心 Species identification method of euproctis insects
CN105008599A (en) * 2013-02-07 2015-10-28 中国种子集团有限公司 Rice whole genome breeding chip and application thereof
CN112349350A (en) * 2020-11-09 2021-02-09 山西大学 Method for strain identification based on Dunaliella core genome sequence
CN112927755A (en) * 2021-02-09 2021-06-08 北京博奥医学检验所有限公司 Method and system for identifying cfDNA (cfDNA) variation source
US20230057154A1 (en) * 2021-08-05 2023-02-23 Grail, Llc Somatic variant cooccurrence with abnormally methylated fragments
CN115838808A (en) * 2022-07-29 2023-03-24 江苏省家禽科学研究所科技创新有限公司 Molecular marker for identifying Wenshang Luhua chicken variety and application thereof
CN116426647A (en) * 2023-03-10 2023-07-14 江苏省家禽科学研究所 Molecular marker combination for identifying Tianjin monkey chicken variety and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
伊日贵 等: "基于 cyt b 基因的黄河上游高原鳅复合体的物种鉴定", 《四川动物》, vol. 42, no. 3, pages 272 - 279 *
果蝇饲养员的生信笔记: "软件4 —— hisat2和samtools", pages 1 - 7, Retrieved from the Internet <URL:https://www.jianshu.com/p/49fc02ec076e> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination