CN113077842A - Third-generation full-length transcriptome auxiliary gene prediction method - Google Patents

Info

Publication number
CN113077842A
Authority
CN
China
Prior art keywords
intron
species
generation
full
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110322129.4A
Other languages
Chinese (zh)
Inventor
郑洪坤
刘福
李绪明
李婧姬
王晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Biomarker Technologies Co ltd
Original Assignee
Beijing Biomarker Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Biomarker Technologies Co ltd
Priority to CN202110322129.4A
Publication of CN113077842A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of bioinformatics, and in particular to a gene prediction method assisted by third-generation full-length transcriptome data. The method uses gene information from a homologous species together with second-generation transcriptome data to obtain highly reliable intron splice sites for the genome of the species to be predicted. These high-reliability splice sites are then used to automatically correct the intron splice sites predicted from third-generation full-length transcriptome sequencing data, and the gene structure is predicted from the resulting high-reliability transcripts. The method also supports the prediction of alternative splicing, so that third-generation full-length transcriptome data can be used to predict the gene structures of animal and plant genomes with high accuracy at the whole-genome level.

Description

Third-generation full-length transcriptome auxiliary gene prediction method
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for predicting gene structure at the whole-genome level with the assistance of third-generation full-length transcriptome sequencing data.
Background
Eukaryotic genes remove introns during transcription and splice the exons together to form transcripts. Because a gene can be spliced in different ways (alternative splicing) to produce different transcripts, eukaryotic genes can perform broader and more precisely regulated functions, but their structures are correspondingly difficult to predict.
Three strategies are currently used for eukaryotic gene prediction: homology-based prediction, de novo prediction, and transcriptome-based prediction. Because a large number of genomes have now been published, splice sites can be determined by exploiting the conservation of gene sequences among homologous species. Transcriptome-based prediction uses RNA-seq data and third-generation full-length transcript data from mixed tissue samples to assist gene prediction. Because transcriptome data reflect the gene structure of the species most directly, exon regions and splice sites can be determined accurately from them, which makes this the most reliable of the three strategies.
The commonly used transcriptome-assisted gene prediction methods rely on second-generation transcriptome sequencing data. In second-generation transcriptome sequencing, however, the extracted RNA is usually fragmented before sequencing, and relatively complete transcripts are later obtained by assembling the short reads (e.g., with Trinity). Because the sequenced fragments are short, the assembly may contain errors or be incomplete, so complete transcripts cannot be obtained accurately, which seriously affects the completeness and accuracy of gene prediction.
Third-generation sequencing platforms, especially the Nanopore platform, are low in cost and can directly yield high-quality full-length transcript sequences without assembly. A single read can span a full-length transcript, so the position of a gene on the genome and its complete structure can be determined easily by aligning the reads to the genome, which greatly benefits gene annotation and improves accuracy. However, the per-base accuracy of full-length transcripts from current third-generation platforms is only about 85%, and insertion and deletion errors occur. Errors at intron splice sites are especially harmful to gene prediction. Errors outside splice sites, i.e., within exons, can be corrected directly by alignment to the genome, whereas at an intron splice site it is hard to tell whether a discrepancy reflects a real intron boundary or a sequencing error, so this region is difficult to correct. As a result, most gene prediction software developed for second-generation transcriptomes is of limited use here, and new software is needed that can handle third-generation full-length data containing a small fraction of erroneous data.
At present, LoReAn is the only software that analyzes third-generation sequencing data to assist gene prediction. It corrects only the transcript sequence as a whole and does not specifically address intron splice sites, which are the most critical element in gene prediction, so substantial errors remain after correction. It also relies only on its own predictions and on second-generation transcriptome predictions, without information from homologous species, which can leave the correction incomplete. It is therefore necessary to develop an error correction method that combines multiple levels of data (homology and transcriptome) and targets intron splice sites, which have the greatest influence on gene prediction, so as to achieve comprehensive and accurate prediction of genome gene structure.
Disclosure of Invention
The invention aims to provide a method for predicting gene structure at the whole-genome level with the assistance of a third-generation full-length transcriptome.
To achieve this object, the first aspect of the invention provides a method for whole-genome-level gene structure prediction, comprising:
predicting the gene structure of the species from second-generation sequencing data of the species and from gene information of a homologous species, and obtaining the intersection of the intron splice sites derived from the second-generation sequencing data and from the homologous species gene information;
merging the intersection with the intron splice sites in the third-generation full-length transcriptome data of the species;
using the merged intron splice sites to identify and correct the intron splice sites obtained when the gene structure is predicted from the third-generation full-length transcriptome data of the species, thereby obtaining transcripts.
Specifically, the method for whole-genome-level gene structure prediction provided by the invention comprises the following steps:
(1) outputting the intron splice sites from the second-generation sequencing data of the species and from the homologous species gene information, and taking their intersection;
(2) outputting the intron splice sites from the third-generation full-length transcriptome data of the species;
(3) merging the intron splice sites in the intersection from step (1) with the intron splice sites from step (2), and using the merged set to identify and correct the intron splice sites obtained when the gene structure of the species is predicted from the third-generation full-length transcriptome data, thereby obtaining gene transcripts.
In the method provided by the invention, obtaining the intersection comprises:
predicting the gene structure of the species to be predicted from transcriptome data obtained on a second-generation Illumina platform, or from gene information of a homologous species; retaining intron splice sites supported by more than 2 reads; and ensuring that the ratio of splice sites located in introns to splice sites located in exons is greater than 0.5.
Specifically, in the method provided by the invention, obtaining the intersection comprises:
S1: predicting the gene structure of the species from transcriptome data obtained on the second-generation Illumina platform;
S2: predicting the gene structure of the species from gene information of a homologous species;
S3: outputting, for S1 and S2 respectively, the intron splice sites supported by more than 2 reads, while ensuring that the ratio of splice sites located in introns to splice sites located in exons is greater than 0.5;
aligning the two sets of intron splice sites obtained in S3 and taking their intersection.
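The filtering and intersection in S1 to S3 can be sketched as plain set operations. This is a minimal illustration, not the patented implementation: the `Intron` tuple, the function name `reliable_introns`, and the toy counts are all hypothetical, and the two thresholds (more than 2 reads, ratio above 0.5) are taken from this passage as written.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (chromosome, start, end, strand).
Intron = Tuple[str, int, int, str]


def reliable_introns(
    support: Dict[Intron, int],
    ratio: Dict[Intron, float],
    min_reads: int = 2,
    min_ratio: float = 0.5,
) -> Set[Intron]:
    """Keep introns with more than `min_reads` reads and a ratio above `min_ratio`."""
    return {
        intron
        for intron, n_reads in support.items()
        if n_reads > min_reads and ratio.get(intron, 0.0) > min_ratio
    }


# Toy inputs (assumed values, for illustration only).
a = ("chr1", 101, 200, "+")
b = ("chr1", 301, 400, "+")
c = ("chr1", 501, 600, "+")

illumina = reliable_introns({a: 7, b: 2, c: 9}, {a: 0.9, b: 0.8, c: 0.3})
homolog = reliable_introns({a: 4, c: 6}, {a: 0.7, c: 0.8})

# The intersection of the two filtered sets is the high-reliability set.
high_confidence = illumina & homolog
```

Here only intron `a` survives: `b` has too few reads and `c` fails the ratio check in the Illumina set.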
In the method provided by the invention, obtaining the intron splice sites in the third-generation full-length transcriptome data of the species comprises:
predicting the gene structure of the species to be predicted from full-length transcriptome data obtained on a third-generation Nanopore platform, and retaining, from the spliced introns in those data, the intron splice sites whose introns start with GT and end with AG, or start with GC and end with AG, and which are supported by more than 5 reads.
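A rough sketch of this selection is given below, assuming the genome is held in memory as a dictionary of sequences and that intron coordinates are 1-based and inclusive. The names `is_canonical` and `tgs_candidates` and the toy genome are hypothetical. Introns on the minus strand are reverse-complemented before the GT-AG / GC-AG check, which is an assumption about how strand would be handled.

```python
from typing import Dict, Set, Tuple

Intron = Tuple[str, int, int, str]  # (chromosome, start, end, strand), 1-based inclusive

CANONICAL = {("GT", "AG"), ("GC", "AG")}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_canonical(genome: Dict[str, str], intron: Intron) -> bool:
    """True if the intron starts with GT or GC and ends with AG on its own strand."""
    chrom, start, end, strand = intron
    seq = genome[chrom][start - 1:end].upper()
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return len(seq) >= 4 and (seq[:2], seq[-2:]) in CANONICAL


def tgs_candidates(
    genome: Dict[str, str], support: Dict[Intron, int], min_reads: int = 5
) -> Set[Intron]:
    """Canonical introns supported by more than `min_reads` long reads."""
    return {
        intron
        for intron, n_reads in support.items()
        if n_reads > min_reads and is_canonical(genome, intron)
    }


# Toy genome: introns at 5-12 (GT..AG), 17-24 (GC..AG), 29-36 (GT..AG on minus strand).
genome = {"chr1": "AAAA" "GTCCCCAG" "AAAA" "GCTTTTAG" "AAAA" "CTGGGGAC" "AAAA"}
support = {
    ("chr1", 5, 12, "+"): 8,
    ("chr1", 17, 24, "+"): 5,   # canonical, but not more than 5 reads
    ("chr1", 29, 36, "-"): 6,
    ("chr1", 13, 16, "+"): 20,  # well supported, but not canonical
}
candidates = tgs_candidates(genome, support)
```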
In the method provided by the invention, obtaining the transcripts comprises:
extending the merged intron splice sites by 10-30 bases to the left and to the right, taking the intersection with the intron splice sites obtained when the gene structure of the species is predicted from its third-generation full-length transcriptome data, and replacing the predicted intron splice sites that fall within the intersection with the merged intron splice sites. After the transcripts have been aligned to the genome of the species, gene structure prediction at the whole-genome level is carried out with genemaker-ST.
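The window-based replacement can be illustrated as follows. This sketch treats the two ends of an intron independently and works on a single chromosome. When more than one trusted site lies inside the window, it picks the nearest one, which is an assumption because the text does not specify a tie-breaking rule. The function names and coordinates are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def nearest_within(sorted_sites: Sequence[int], pos: int, window: int) -> Optional[int]:
    """Return the trusted site closest to `pos` within +/- `window`, or None."""
    i = bisect_left(sorted_sites, pos)
    best = None
    for j in (i - 1, i):
        if 0 <= j < len(sorted_sites):
            cand = sorted_sites[j]
            if abs(cand - pos) <= window and (
                best is None or abs(cand - pos) < abs(best - pos)
            ):
                best = cand
    return best


def correct_intron(
    intron: Tuple[int, int],
    trusted_starts: Sequence[int],
    trusted_ends: Sequence[int],
    window: int = 20,
) -> Tuple[int, int]:
    """Replace each intron boundary with a trusted site if one lies within the window."""
    start, end = intron
    new_start = nearest_within(trusted_starts, start, window)
    new_end = nearest_within(trusted_ends, end, window)
    return (
        start if new_start is None else new_start,
        end if new_end is None else new_end,
    )


# Trusted sites must be sorted; values are toy coordinates.
trusted_starts = [1000, 5000]
trusted_ends = [1500, 5600]

fixed = correct_intron((1004, 1497), trusted_starts, trusted_ends)      # both ends corrected
untouched = correct_intron((3000, 3400), trusted_starts, trusted_ends)  # nothing nearby
partial = correct_intron((5021, 5600), trusted_starts, trusted_ends)    # start is 21 bases away
```

A window of 20 is used as the default because it is the value given in the embodiment; the general statement allows 10-30 bases.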
As an embodiment of the invention, the method for whole-genome gene structure prediction follows the scheme shown in Fig. 1:
(1) downloading the genome file and gff3 file of a species homologous to the species to be predicted from the NCBI website (https://www.ncbi.nlm.nih.gov/);
(2) using GeMoMa to predict the gene information of the genome of the species to be predicted from the downloaded homologous species gene information, generating a gff3 file that contains the required intron splice site information;
(3) sequencing the second-generation transcriptome of the species to be predicted on an Illumina platform, with sequencing material taken from mixed samples of multiple tissues and organs so that as many transcripts as possible are covered;
(4) aligning the raw sequencing reads to the genome of the species to be predicted with HISAT2 to obtain an alignment file in bam format;
(5) converting the bam file to bed format with the group command in bedtools, and extracting the intron splice positions;
(6) counting the number of reads supporting each intron splice position and keeping positions with support greater than 2; at the same time, counting the number of splice sites located in introns and in exons, and ensuring that the ratio of the two is less than 0.5;
(7) taking the intersection of the splice sites obtained from the second-generation transcriptome and from the homologous species to obtain high-reliability intron splice sites;
(8) sequencing the third-generation transcriptome of the species on a Nanopore platform, with sequencing material taken from mixed samples of multiple tissues and organs so that as many transcripts as possible are covered;
(9) aligning the sequencing reads to the genome with GMAP and writing the output in gff3 format; this file contains the original, uncorrected intron splice site information;
(10) from the original intron splice sites, selecting those that follow the GT-AG or GC-AG rule and are supported by at least 5 reads as the more reliable candidate splice sites in the third-generation transcriptome prediction, and merging them with the high-reliability intron splice sites from step (7) into a non-redundant set of high-reliability intron splice sites; here GT-AG denotes introns that start with GT and end with AG, and GC-AG denotes introns that start with GC and end with AG;
(11) extending the non-redundant high-reliability intron splice sites by 20 bases to the left and to the right, using bedtools to intersect them with the intron splice sites identified in step (9), and replacing the third-generation splice sites that fall within the intersection with the non-redundant high-reliability splice sites, thereby completing the correction of the intron splice sites predicted from the third-generation full-length transcriptome sequencing data;
(12) after the correction, merging any two transcripts whose splice sites on the genome are identical, to obtain a non-redundant set of high-accuracy candidate transcripts;
(13) further assembling the candidate transcripts with PASA, so that the remaining incomplete transcripts are assembled into complete transcripts, to obtain a non-redundant set of transcripts;
(14) predicting the gene structure of the species to be predicted with genemaker-ST based on the non-redundant set of transcripts.
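Steps (5) and (6) can be sketched as follows, assuming the alignments are already available as BED12 records (one tab-separated line per read, with block sizes in column 11 and block starts in column 12). Under that assumption, an intron is the gap between two consecutive blocks. The function names and the toy reads are hypothetical, and the sketch covers only the read-support count, not the intron/exon ratio check.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Intron = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def introns_from_bed12(line: str) -> Iterator[Intron]:
    """Yield the gaps between consecutive alignment blocks of one BED12 record."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    for i in range(len(sizes) - 1):
        left_block_end = chrom_start + starts[i] + sizes[i]  # 0-based, exclusive
        right_block_start = chrom_start + starts[i + 1]      # 0-based, inclusive
        yield (chrom, left_block_end + 1, right_block_start)


def count_support(lines: Iterable[str]) -> Counter:
    """Count how many reads contain each intron."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts.update(introns_from_bed12(line))
    return counts


# Toy reads: three share one intron, one is unspliced, one has a different intron.
reads = [
    "chr1\t100\t400\tr1\t60\t+\t100\t400\t0,0,0\t2\t50,100\t0,200",
    "chr1\t120\t350\tr2\t60\t+\t120\t350\t0,0,0\t2\t30,50\t0,180",
    "chr1\t140\t380\tr3\t60\t+\t140\t380\t0,0,0\t2\t10,80\t0,160",
    "chr1\t100\t200\tr4\t60\t+\t100\t200\t0,0,0\t1\t100\t0",
    "chr1\t500\t900\tr5\t60\t+\t500\t900\t0,0,0\t2\t100,100\t0,300",
]
support = count_support(reads)
kept = {intron for intron, n in support.items() if n > 2}  # support greater than 2
```

The `Counter` plays the same role as a sort-then-count pass over the extracted positions.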
The second aspect of the invention provides third-generation full-length transcripts for whole-genome gene structure prediction, obtained by the method provided by the invention.
The third-generation full-length transcripts for whole-genome gene structure prediction provided by the invention further include transcripts obtained by assembling non-full-length transcripts with PASA.
Among the third-generation full-length transcripts for whole-genome gene structure prediction provided by the invention, transcripts whose splice sites on the genome are identical are merged to remove duplicates.
The invention also claims the use of the method and of the third-generation full-length transcripts provided by the invention in the prediction of alternative splicing.
The beneficial effects of the invention include at least the following:
(1) the method is generally applicable to whole-genome gene structure prediction assisted by third-generation full-length transcriptome sequencing data in different species, including animals and plants;
(2) the method uses third-generation full-length transcriptome sequencing data to identify the genes of animal and plant genomes at the whole-genome level, which helps improve the completeness and accuracy of gene prediction;
(3) the method obtains splice sites from three types of data (second-generation transcriptome data, homologous species data, and third-generation transcriptome data), identifies and corrects the splice sites in the third-generation transcriptome data to obtain high-reliability transcripts, and uses the high-reliability transcript information to assist gene prediction (including alternative splicing prediction), improving the reliability of gene prediction.
Drawings
Fig. 1 is a detailed flow chart of the method of the invention.
Fig. 2 shows third-generation full-length transcriptome data assisting Arabidopsis gene prediction in Example 1 of the invention.
Fig. 3 shows third-generation full-length transcriptome data assisting zebrafish gene prediction in Example 2 of the invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail with reference to the following examples. It is to be understood that the following examples are given for illustrative purposes only and are not intended to limit the scope of the present invention. Various modifications and alterations of this invention will become apparent to those skilled in the art without departing from the spirit and scope of this invention.
The experimental procedures used in the following examples are all conventional procedures unless otherwise specified. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
Example 1: Nanopore third-generation full-length transcriptome data assisting prediction of Arabidopsis genome gene structure
This example takes gene prediction for the Arabidopsis genome as the object of analysis and provides a method for whole-genome gene structure prediction assisted by third-generation full-length transcriptome sequencing data, as follows:
1. Second- and third-generation transcriptome sequencing
Sequencing material was taken from mixed samples of roots, stems, leaves, flowers, and seeds of Arabidopsis thaliana. mRNA was extracted and sequenced on a second-generation Illumina platform and a third-generation Nanopore platform, respectively, to obtain the raw reads.
2. Transcriptome data alignment and splice site prediction from homologous species gene information
(1) The raw second-generation transcriptome data were aligned to the Arabidopsis thaliana genome with HISAT2 to obtain a bam file;
(2) the raw third-generation transcriptome data were aligned to the Arabidopsis thaliana genome with GMAP, and the alignments were output in gff3 format;
(3) the genome and gff3 file of turnip, a species homologous to Arabidopsis thaliana, were downloaded (NCBI No. GCF_000309985.1); Arabidopsis gene information was then predicted from the turnip gene information with GeMoMa, the result was output in gff3 format, and the splice site information was extracted.
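Extracting splice site information from a gff3 file, as in steps (2) and (3) above, might look like the sketch below. It assumes that each transcript's exons are listed as features carrying a `Parent` attribute, and that introns are the gaps between consecutive exons of the same parent. The feature type is a parameter because different tools label the relevant rows differently; the function name and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Intron = Tuple[str, int, int, str]  # (chromosome, start, end, strand), 1-based inclusive


def introns_from_gff3(lines: Iterable[str], feature_type: str = "exon") -> Set[Intron]:
    """Collect the gaps between consecutive features that share a Parent."""
    blocks_by_parent: Dict[Tuple[str, str, str], List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                key = (cols[0], cols[6], parent)
                blocks_by_parent[key].append((int(cols[3]), int(cols[4])))

    introns: Set[Intron] = set()
    for (chrom, strand, _parent), blocks in blocks_by_parent.items():
        blocks.sort()
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            if right_start - left_end > 1:  # skip abutting or overlapping features
                introns.add((chrom, left_end + 1, right_start - 1, strand))
    return introns


# Toy gff3 records (hypothetical identifiers and coordinates).
gff3 = [
    "##gff-version 3",
    "chr1\ttool\tmRNA\t1\t500\t.\t+\t.\tID=mRNA1",
    "chr1\ttool\texon\t1\t100\t.\t+\t.\tID=e1;Parent=mRNA1",
    "chr1\ttool\texon\t201\t300\t.\t+\t.\tID=e2;Parent=mRNA1",
    "chr1\ttool\texon\t401\t500\t.\t+\t.\tID=e3;Parent=mRNA1",
    "chr1\ttool\texon\t50\t100\t.\t+\t.\tID=e4;Parent=mRNA2",
    "chr1\ttool\texon\t201\t350\t.\t+\t.\tID=e5;Parent=mRNA2",
]
introns = introns_from_gff3(gff3)
```

Returning a set removes introns shared by several transcripts, so the result can be intersected directly with another source.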
3. Correction of intron splice sites predicted from third-generation full-length transcriptome sequencing data
(1) The bam file produced by the second-generation alignment was converted to bed format with the group command in bedtools, and the intron splice positions were extracted. The number of reads supporting each intron splice position was counted with the Linux sort and uniq commands, and positions with support greater than 2 were kept. At the same time, the number of splice sites located in introns and in exons was counted, and the ratio of the two was required to be less than 0.5, giving the high-reliability intron splice sites;
(2) from the original intron splice sites in the third-generation transcriptome data, those that follow the GT-AG or GC-AG rule and are supported by at least 5 reads were selected as the more reliable candidate splice sites in the third-generation transcriptome prediction, and were merged with the high-reliability intron splice sites into a non-redundant set of high-reliability intron splice sites;
(3) the non-redundant high-reliability intron splice sites were extended by 20 bases to the left and to the right and intersected, using bedtools, with the candidate splice sites in the gene structure predicted from the third-generation transcriptome; the candidate splice sites falling within the intersection were replaced with the non-redundant high-reliability intron splice sites, completing the correction of the intron splice sites predicted from the third-generation full-length transcriptome sequencing data.
4. Gene structure prediction from the high-reliability transcripts
(1) After the splice sites predicted from the third-generation transcriptome sequencing data were corrected, any two transcripts with identical splice sites on the genome were merged, giving a non-redundant set of high-accuracy candidate transcripts;
(2) the candidate transcripts were further assembled with PASA, so that the remaining incomplete transcripts were assembled into complete transcripts, giving a non-redundant set of transcripts;
(3) based on these transcripts, genemaker-ST was used to obtain the gene structure at the whole-genome level, and IGV was used to inspect the consistency between the predicted gene structure and the reference gene structure.
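The merging in step (1) of this section can be sketched by grouping transcripts on their intron chain. Two choices here are assumptions not stated in the text: single-exon transcripts, which have no splice sites, are left unmerged, and a merged transcript takes the widest outer boundaries of its members. The names `intron_chain` and `collapse` and the toy transcripts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exons = List[Tuple[int, int]]        # sorted (start, end) pairs, 1-based inclusive
Transcript = Tuple[str, str, Exons]  # (chromosome, strand, exons)


def intron_chain(exons: Exons) -> Tuple[Tuple[int, int], ...]:
    """Return the ordered introns implied by a sorted exon list."""
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    )


def collapse(transcripts: Dict[str, Transcript]) -> Dict[str, Transcript]:
    """Merge transcripts on the same chromosome and strand with identical intron chains."""
    groups = defaultdict(list)
    for tid, (chrom, strand, exons) in transcripts.items():
        chain = intron_chain(exons)
        # Assumption: single-exon transcripts are kept separate, keyed by their own id.
        key = (chrom, strand, chain) if chain else (chrom, strand, tid)
        groups[key].append(tid)

    merged: Dict[str, Transcript] = {}
    for (chrom, strand, _), tids in groups.items():
        exons = list(transcripts[tids[0]][2])
        # Assumption: use the widest outer boundaries among the group members.
        start = min(transcripts[t][2][0][0] for t in tids)
        end = max(transcripts[t][2][-1][1] for t in tids)
        exons[0] = (start, exons[0][1])
        exons[-1] = (exons[-1][0], end)
        merged[tids[0]] = (chrom, strand, exons)
    return merged


# Toy transcripts: t1 and t2 share one intron (201-300); t3 has a different acceptor.
transcripts = {
    "t1": ("chr1", "+", [(100, 200), (301, 400)]),
    "t2": ("chr1", "+", [(120, 200), (301, 450)]),
    "t3": ("chr1", "+", [(100, 200), (311, 400)]),
}
non_redundant = collapse(transcripts)
```

Keeping `t3` separate preserves it as a possible alternative splicing isoform rather than folding it into `t1`.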
The results were as follows. The third-generation sequencing yielded 30 Gb of raw data, from which the method predicted 50,531 transcripts; on evaluation, the consistency with the reference gene set reached 72.04%, showing that most genes were predicted and that the method plays an important role in assisting gene prediction. Fig. 2 shows the prediction for one gene, with IGV used to compare the prediction with the reference. Three transcripts were predicted for this gene, one of which is fully consistent with the reference gene structure. The additional newly predicted transcripts indicate that the method not only ensures the accuracy of the predicted gene structure but can also predict more alternative splicing events.
Example 2: Nanopore third-generation full-length transcriptome data assisting prediction of zebrafish genome gene structure
In this example, the same method as in Example 1 was used to predict the whole-genome gene structure of zebrafish from 30 Gb of third-generation ONT full-length transcriptome data. The consistency with the reference gene set reached 80.28%. Fig. 3 shows the information for one of the transcripts.
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (9)

1. A method for whole-genome-level gene structure prediction, comprising:
predicting the gene structure of the species from second-generation sequencing data of the species and from gene information of a homologous species, and obtaining the intersection of the intron splice sites derived from the second-generation sequencing data and from the homologous species gene information;
merging the intersection with the intron splice sites in the third-generation full-length transcriptome data of the species;
using the merged intron splice sites to identify and correct the intron splice sites obtained when the gene structure is predicted from the third-generation full-length transcriptome data of the species, thereby obtaining transcripts.
2. The method of claim 1, wherein obtaining the intersection comprises:
predicting the gene structure of the species from transcriptome data obtained on a second-generation Illumina platform and from gene information of a homologous species, retaining intron splice sites supported by more than 2 reads, and ensuring that the ratio of splice sites located in introns to splice sites located in exons is greater than 0.5.
3. The method of claim 1, wherein obtaining the intron splice sites in the third-generation full-length transcriptome of the species comprises:
predicting the gene structure of the species from full-length transcriptome data obtained on a third-generation Nanopore platform, and retaining, from the spliced introns in those data, the intron splice sites whose introns start with GT and end with AG, or start with GC and end with AG, and which are supported by more than 5 reads.
4. The method of claim 1, wherein obtaining the transcripts comprises:
extending the merged intron splice sites by 10-30 bases to the left and to the right, taking the intersection with the intron splice sites obtained when the gene structure of the species is predicted from its third-generation full-length transcriptome data, and replacing the predicted intron splice sites that fall within the intersection with the merged intron splice sites.
5. The method of any one of claims 1 to 4, wherein, after the transcripts are aligned to the genome of the species, gene structure prediction at the whole-genome level is carried out with genemaker-ST.
6. Third-generation full-length transcripts for whole-genome gene structure prediction, obtained by the method of any one of claims 1 to 5.
7. The third-generation full-length transcripts of claim 6, further comprising transcripts obtained by assembling non-full-length transcripts with PASA.
8. The third-generation full-length transcripts of claim 7, wherein third-generation full-length transcripts whose splice sites on the genome are identical are deduplicated.
9. Use of the method according to any one of claims 1 to 5, or of the third-generation full-length transcripts according to any one of claims 6 to 8, in the prediction of alternative splicing.
CN202110322129.4A 2021-03-25 2021-03-25 Third-generation full-length transcriptome auxiliary gene prediction method Pending CN113077842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110322129.4A CN113077842A (en) 2021-03-25 2021-03-25 Third-generation full-length transcriptome auxiliary gene prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110322129.4A CN113077842A (en) 2021-03-25 2021-03-25 Third-generation full-length transcriptome auxiliary gene prediction method

Publications (1)

Publication Number Publication Date
CN113077842A true CN113077842A (en) 2021-07-06

Family

ID=76610591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110322129.4A Pending CN113077842A (en) 2021-03-25 2021-03-25 Third-generation full-length transcriptome auxiliary gene prediction method

Country Status (1)

Country Link
CN (1) CN113077842A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121160A (en) * 2021-11-25 2022-03-01 广东美格基因科技有限公司 Method and system for detecting macrovirus group in sample
CN115579060A (en) * 2022-12-08 2023-01-06 国家超级计算天津中心 Gene locus detection method, device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121160A (en) * 2021-11-25 2022-03-01 广东美格基因科技有限公司 Method and system for detecting macrovirus group in sample
CN114121160B (en) * 2021-11-25 2022-06-21 广东美格基因科技有限公司 Method and system for detecting macrovirus group in sample
CN115579060A (en) * 2022-12-08 2023-01-06 国家超级计算天津中心 Gene locus detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination