CN107563149B - Structural annotation and alignment result evaluation method for full-length transcripts - Google Patents


Info

Publication number
CN107563149B
Authority
CN
China
Legal status
Active
Application number
CN201710720711.XA
Other languages
English (en)
Other versions
CN107563149A (zh)
Inventor
王智健
简洁
姜丽荣
孙子奎
Current Assignee
Nanjing Personal Gene Technology Co ltd
Original Assignee
Shanghai Personal Biotechnology Co ltd
Priority date
2017-08-21
Filing date
2017-08-21
Publication date
2020-10-23
2017-08-21 Application filed by Shanghai Personal Biotechnology Co ltd
2017-08-21 Priority to CN201710720711.XA
2018-01-09 Publication of CN107563149A
Application granted
2020-10-23 Publication of CN107563149B
Status: Active

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

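The present invention provides a method for structural annotation and alignment result evaluation of full-length transcripts. The disclosed alignment evaluation and gene structure annotation method uses the matchAnnot software: scripts convert an existing annotation GTF file and SAM file into the format that matchAnnot requires, matchAnnot performs the structural annotation and alignment evaluation, and the matchAnnot results are then presented in a clearer form and summarized statistically.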

Description

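Structural annotation and alignment result evaluation method for full-length transcripts
Technical Field
The present invention relates to the field of biotechnology, and in particular to a method for structural annotation and alignment result evaluation of full-length transcripts.
Background
Structural annotation refers to predicting the primary structure of full-length transcripts from their alignment to a reference genome. Alignment result evaluation refers to comparing the predicted full-length transcript structures with known gene structures, in order to assess how well the full-length transcripts align to the reference genome. At present, structural annotation and alignment evaluation of full-length transcripts are performed by running matchAnnot directly. This approach has the following problems: 1) matchAnnot places specific requirements on its input files, so ordinary GTF and SAM files may cause errors; 2) matchAnnot output is verbose and hard to read.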
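Summary of the Invention
The technical problem addressed by the present invention is to provide an optimized method that overcomes the above problems of existing alignment evaluation methods for full-length transcripts.
This technical problem is solved by the following technical solution:
An alignment result evaluation and gene structure annotation method, comprising the following steps:
(1) obtaining the chromosome IDs shared by the reference genome annotation GTF file and the SAM file containing the alignment of the full-length transcriptome to the reference genome;
(2) selecting the SAM entries aligned to non-shared chromosomes, organizing them, and writing them to no_annotation.txt (these full-length transcripts can be further annotated by reference-free annotation methods to discover new genes), while entries aligned to shared chromosomes are written to tmp.sam;
(3) performing structural annotation and alignment result evaluation with matchAnnot, using tmp.sam and the GTF file as input files;
(4) processing the matchAnnot results: writing the polyA motifs of the full-length transcripts separately to polyA_motif.txt; for each full-length transcript, extracting information on its best-matching reference gene and reference transcript and writing it, together with that gene's information from the GTF, to matchinfo.xls; writing the correspondence between each full-length transcript and its best-matching reference transcript and reference gene to transcript_summary.txt (which can be used to find full-length transcripts originating from the same gene); and tallying the highest match score of each full-length transcript and plotting the result as a pie chart in R.
With the above technical solution, the core of the present invention uses the matchAnnot software: scripts convert the existing annotation GTF file and SAM file into the format that matchAnnot requires, matchAnnot performs the structural annotation and alignment evaluation, and the presentation of the matchAnnot results is improved and summarized statistically.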
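Steps (1) and (2) can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the patent: chromosome IDs are read from GTF column 1 (seqname) and SAM column 3 (RNAME); unmapped records (RNAME "*") are dropped; SAM header lines are copied to tmp.sam unchanged; and the three-column layout of no_annotation.txt (read ID, chromosome, position) is hypothetical, because the source does not say how those entries are "organized". All function names are illustrative.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def gtf_chromosomes(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect chromosome IDs (GTF column 1, seqname), skipping comment lines."""
    chroms: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chroms.add(line.split("\t", 1)[0])
    return chroms


def sam_chromosomes(sam_lines: Iterable[str]) -> Set[str]:
    """Collect reference names (SAM column 3, RNAME) of aligned records."""
    chroms: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        rname = line.split("\t")[2]
        if rname != "*":  # "*" marks an unmapped read
            chroms.add(rname)
    return chroms


def split_sam(sam_lines: Iterable[str], shared: Set[str]) -> Tuple[List[str], List[str]]:
    """Split SAM lines into (lines for tmp.sam, rows for no_annotation.txt).

    Header lines are kept unchanged; records on shared chromosomes go to
    tmp.sam; records on other chromosomes become hypothetical
    "qname<TAB>rname<TAB>pos" rows; unmapped records are dropped.
    """
    kept: List[str] = []
    no_annotation: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, rname, pos = fields[0], fields[2], fields[3]
        if rname in shared:
            kept.append(line)
        elif rname != "*":
            no_annotation.append(f"{qname}\t{rname}\t{pos}")
    return kept, no_annotation


def prepare_inputs(gtf_path: str, sam_path: str, out_dir: str = ".") -> Set[str]:
    """Steps (1) and (2): write tmp.sam and no_annotation.txt into out_dir."""
    with open(gtf_path) as fh:
        gtf_chroms = gtf_chromosomes(fh)
    with open(sam_path) as fh:
        sam_lines = fh.readlines()
    shared = gtf_chroms & sam_chromosomes(sam_lines)
    kept, no_annotation = split_sam(sam_lines, shared)
    out = Path(out_dir)
    (out / "tmp.sam").write_text("".join(kept))
    (out / "no_annotation.txt").write_text("".join(row + "\n" for row in no_annotation))
    return shared
```

Working on in-memory lines keeps the filtering logic testable; prepare_inputs only adds file I/O around it. Whether matchAnnot also needs @SQ header lines for non-shared chromosomes removed, or the SAM sorted, is not stated in the source and would need checking against the tool's documentation.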
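Description of Drawings
Figure 1 is a flowchart of the alignment result evaluation and structural annotation method of the present invention.
Detailed Description
Referring to Figure 1, the alignment result evaluation and gene structure annotation method comprises the following steps:
(1) obtaining the chromosome IDs shared by the reference genome annotation GTF file and the SAM file containing the alignment of the full-length transcriptome to the reference genome;
(2) selecting the SAM entries aligned to non-shared chromosomes, organizing them, and writing them to no_annotation.txt (these full-length transcripts can be further annotated by reference-free annotation methods to discover new genes), while entries aligned to shared chromosomes are written to tmp.sam;
(3) performing structural annotation and alignment result evaluation with matchAnnot, using tmp.sam and the GTF file as input files;
(4) processing the matchAnnot results: writing the polyA motifs of the full-length transcripts separately to polyA_motif.txt; for each full-length transcript, extracting information on its best-matching reference gene and reference transcript and writing it, together with that gene's information from the GTF, to matchinfo.xls; writing the correspondence between each full-length transcript and its best-matching reference transcript and reference gene to transcript_summary.txt (which can be used to find full-length transcripts originating from the same gene); and tallying the highest match score of each full-length transcript and plotting the result as a pie chart in R.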
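Part of step (4) can be sketched in Python as well. The layout of matchAnnot's text output is not described in the patent, so this sketch assumes it has already been parsed into hypothetical Match records (full-length transcript ID, reference gene, reference transcript, score) and shows only the downstream logic: choosing the best match per full-length transcript, building transcript_summary.txt-style rows, grouping transcripts by gene, and tallying the highest scores for the pie chart. Resolving ties by keeping the first match seen is also an assumption; polyA_motif.txt and matchinfo.xls are not covered.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Match:
    """One candidate match for a full-length transcript (hypothetical record)."""
    transcript: str      # full-length transcript (query) ID
    ref_gene: str        # reference gene ID from the GTF
    ref_transcript: str  # reference transcript ID from the GTF
    score: int           # match score reported by matchAnnot


def best_matches(matches: Iterable[Match]) -> Dict[str, Match]:
    """Keep the highest-scoring match per transcript; ties keep the first seen."""
    best: Dict[str, Match] = {}
    for m in matches:
        current = best.get(m.transcript)
        if current is None or m.score > current.score:
            best[m.transcript] = m
    return best


def transcript_summary_rows(best: Dict[str, Match]) -> List[str]:
    """Rows of transcript<TAB>reference transcript<TAB>reference gene, sorted by transcript."""
    return [
        f"{m.transcript}\t{m.ref_transcript}\t{m.ref_gene}"
        for m in sorted(best.values(), key=lambda m: m.transcript)
    ]


def transcripts_by_gene(best: Dict[str, Match]) -> Dict[str, List[str]]:
    """Group full-length transcripts by their best-matching reference gene."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for m in sorted(best.values(), key=lambda m: m.transcript):
        groups[m.ref_gene].append(m.transcript)
    return dict(groups)


def score_distribution(best: Dict[str, Match]) -> Counter:
    """Count how many transcripts reach each highest match score."""
    return Counter(m.score for m in best.values())


def write_score_table(counts: Counter, path: str) -> None:
    """Write score<TAB>count rows for a downstream R plotting script."""
    with open(path, "w") as fh:
        fh.write("score\tcount\n")
        for score in sorted(counts):
            fh.write(f"{score}\t{counts[score]}\n")
```

The score table written by write_score_table could then be read in R and drawn with the base graphics function pie(); the patent states only that R is used for the pie chart, not how.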

Claims (1)

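1. An alignment result evaluation and gene structure annotation method, characterized by comprising the following steps:
(1) obtaining the chromosome IDs shared by the reference genome annotation GTF file and the SAM file containing the alignment of the full-length transcriptome to the reference genome;
(2) selecting the SAM entries aligned to non-shared chromosomes, organizing them, and writing them to no_annotation.txt, while entries aligned to shared chromosomes are written to tmp.sam;
(3) performing structural annotation and alignment result evaluation with matchAnnot, using tmp.sam and the GTF file as input files;
(4) processing the matchAnnot results: writing the polyA and motif of the full-length transcripts separately to polyA_motif.txt; for each full-length transcript, extracting information on its best-matching reference gene and reference transcript and writing it, together with that gene's information from the GTF, to matchinfo.xls; writing the correspondence between each full-length transcript and its best-matching reference transcript and reference gene to transcript_summary.txt; and tallying the highest match score of each full-length transcript and plotting the result as a pie chart in R.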

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710720711.XA CN107563149B (zh) 2017-08-21 2017-08-21 Structural annotation and alignment result evaluation method for full-length transcripts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710720711.XA CN107563149B (zh) 2017-08-21 2017-08-21 Structural annotation and alignment result evaluation method for full-length transcripts

Publications (2)

Publication Number Publication Date
CN107563149A (zh) 2018-01-09
CN107563149B (zh) 2020-10-23

Family

ID=60976516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710720711.XA Active CN107563149B (zh) 2017-08-21 2017-08-21 Structural annotation and alignment result evaluation method for full-length transcripts

Country Status (1)

Country Link
CN (1) CN107563149B (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627967A (zh) * 2022-03-15 2022-06-14 北京基石生命科技有限公司 A method for accurately annotating third-generation full-length transcripts

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020064792A1 (en) * 1997-11-13 2002-05-30 Lincoln Stephen E. Database for storage and analysis of full-length sequences
EP1758792A2 (en) * 2004-06-04 2007-03-07 Linkagene LTD. Methods for detecting gene expression in peripheral blood cells and uses thereof
US8005621B2 (en) * 2004-09-13 2011-08-23 Agency For Science Technology And Research Transcript mapping method
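KR102505097B1 (ko) * 2009-12-07 2023-03-02 The Trustees of the University of Pennsylvania RNA preparations comprising purified modified RNA for cell reprogramming
CN106202992A (zh) * 2016-07-11 2016-12-07 Southeast University High-throughput chip processing and analysis workflow control method for long non-coding RNA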

Also Published As

Publication number Publication date
CN107563149A (zh) 2018-01-09

Similar Documents

Publication Publication Date Title
Hsiao et al. RNA editing in nascent RNA affects pre-mRNA splicing
Hoff et al. Predicting genes in single genomes with AUGUSTUS
Linder et al. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome
Seemann Prokka: rapid prokaryotic genome annotation
Travis et al. Hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data
Sievers et al. Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data
Hoffmann et al. Accurate mapping of tRNA reads
JP7319197B2 (ja) Method for aligning sequencing data of a target nucleic acid
Liu et al. Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression
CN107506614B (zh) A bacterial ncRNA prediction method
US8484229B2 (en) Method and system for identifying traditional arabic poems
CN107563149B (zh) Structural annotation and alignment result evaluation method for full-length transcripts
Zhao et al. Bioinformatics analysis of alternative polyadenylation in green alga Chlamydomonas reinhardtii using transcriptome sequences from three different sequencing platforms
Toffano-Nioche et al. Detection of non-coding RNA in bacteria and archaea using the DETR’PROK Galaxy pipeline
US20210398605A1 (en) System and method for promoter prediction in human genome
Diendorfer et al. Annotation of additional evolutionary conserved microRNAs in CHO cells from updated genomic data
Tárraga et al. A parallel and sensitive software tool for methylation analysis on multicore platforms
US20130226467A1 (en) System and method for processing reference sequence for analyzing genome sequence
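JP6115526B2 (ja) Product collection plan creation method and product collection plan creation device
JP5414130B2 (ja) Program for determining read errors in nucleotide sequences
JP6536580B2 (ja) Sentence set extraction system, method, and program
CN103678424A (zh) Method and device for document proofreading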
Morton et al. TIPR: transcription initiation pattern recognition on a genome scale
van Kooten et al. The transcriptional landscape of a rewritten bacterial genome reveals control elements and genome design principles
Yang et al. Terminitor: cleavage site prediction using deep learning models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200231 1, 2 floor 2, 218 Yin do road, Xuhui District, Shanghai.

Patentee after: SHANGHAI PERSONAL BIOTECHNOLOGY Co.,Ltd.

Address before: 200231 1, 2 floor 2, 218 Yin do road, Xuhui District, Shanghai.

Patentee before: SHANGHAI PERSONAL BIOTECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220623

Address after: Room 2401, 24 / F, building a, Yangzi science and technology innovation center, 211 pubin Road, Jiangbei new district, Nanjing, Jiangsu, 211800

Patentee after: NANJING PERSONAL GENE TECHNOLOGY Co.,Ltd.

Address before: 200231 1, 2 floor 2, 218 Yin do road, Xuhui District, Shanghai.

Patentee before: SHANGHAI PERSONAL BIOTECHNOLOGY Co.,Ltd.
