CN107609347A - Metatranscriptome data analysis method based on high-throughput sequencing technology - Google Patents

Publication number
CN107609347A
Authority
CN
China
Legal status
Pending
Application number
CN201710720413.0A
Other languages
Chinese (zh)
Inventor
薛正晟
杨洋
姜丽荣
孙子奎
Current Assignee
SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Original Assignee
SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Priority date
2017-08-21
Filing date
2017-08-21
Publication date
2018-01-19
Application filed by SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Priority to CN201710720413.0A
Publication of CN107609347A
Current legal status: Pending

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a metatranscriptome data analysis method based on high-throughput sequencing technology, comprising the following steps: (1) obtaining a high-quality dataset; (2) obtaining an mRNA transcript sequence set; (3) obtaining a non-redundant protein sequence set; (4) obtaining and analyzing functional-group abundance profiles at each level; (5) performing taxonomic annotation on the gene sequences to obtain and analyze species composition profiles at the species level and finer levels; (6) based on the functional abundance profiles and species composition profiles obtained above, further performing alpha and beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome using a variety of multivariate statistical methods; (7) drawing two- and three-dimensional charts with a variety of data visualization and interactive tools to present the above analysis results comprehensively and objectively; (8) selecting a specific functional database for annotation analysis according to the source of the samples.

Description

Metatranscriptome data analysis method based on high-throughput sequencing technology
Technical field
The present invention relates to the field of biotechnology, and more particularly to a metatranscriptome data analysis method based on high-throughput sequencing technology.
Background
The object of study in metatranscriptomics is the mRNA of a microbial community. Total RNA is extracted from the community, rRNA is removed, the remaining RNA is reverse-transcribed into cDNA, an insert library of suitable length is constructed, and the library is subjected to paired-end (PE) high-throughput sequencing. This allows the fine-grained composition of active species across the whole community, and their corresponding functional expression, to be accurately quantified, so that key biomarkers in the community can be pinpointed and their biological significance elucidated.
Summary of the invention
The technical problem to be solved by the present invention is to provide a metatranscriptome data analysis method based on high-throughput sequencing technology.
This technical problem is solved by the following technical solution:
A metatranscriptome data analysis method based on high-throughput sequencing technology, specifically comprising the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality dataset suitable for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality reads to obtain an mRNA transcript sequence set;
(3) performing metatranscriptome sequence assembly separately for each sample to construct metatranscriptome contig and scaffold sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against multiple commonly used databases to obtain functional-group abundance profiles at each level, followed by differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis;
(5) performing taxonomic annotation on the gene sequences to obtain species composition profiles at the species level and finer (below-species) levels, followed by differential comparison analysis, cluster analysis, species richness and evenness analysis, and correlation network analysis;
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing alpha and beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome using a variety of multivariate statistical methods;
(7) drawing two- and three-dimensional charts with a variety of data visualization and interactive tools to present the above analysis results comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the source of the samples.
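Steps (1) and (2) can be illustrated with a minimal Python sketch. The patent does not name its quality-control tools, thresholds or rRNA predictor, so everything below is an assumption: reads are Phred+33 encoded, both mates of a pair share one name, 3' bases below Q20 are trimmed, a trimmed read must keep at least 50 bases, a mean quality of at least 20 and at most 10% N bases, and `rrna_ids` is a hypothetical set of pair names already flagged as rRNA by a separate prediction step. A pair is kept only if both mates pass.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str  # pair identifier shared by R1 and R2 (assumption)
    seq: str
    qual: str  # Phred+33 quality string (assumption)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def trim_3prime(read: Read, min_q: int = 20) -> Read:
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(read.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def passes(read: Read, min_len: int, min_mean_q: float, max_n_frac: float) -> bool:
    """Length, mean-quality and N-content filters for one trimmed read."""
    if len(read.seq) < max(min_len, 1):
        return False
    scores = phred_scores(read.qual)
    mean_q = sum(scores) / len(scores)
    n_frac = read.seq.upper().count("N") / len(read.seq)
    return mean_q >= min_mean_q and n_frac <= max_n_frac


def clean_pairs(pairs, rrna_ids=frozenset(), min_len=50, min_mean_q=20.0, max_n_frac=0.1):
    """Step (1) quality control plus step (2) rRNA removal on paired-end reads.

    rrna_ids is a hypothetical set of pair names flagged as rRNA upstream.
    """
    kept = []
    for r1, r2 in pairs:
        if r1.name in rrna_ids:
            continue  # step (2): drop the whole pair if it was flagged as rRNA
        t1, t2 = trim_3prime(r1), trim_3prime(r2)
        if passes(t1, min_len, min_mean_q, max_n_frac) and passes(
            t2, min_len, min_mean_q, max_n_frac
        ):
            kept.append((t1, t2))
    return kept
```

Requiring both mates to pass keeps the output properly paired for the assembly in step (3); production pipelines would normally rely on dedicated trimming and rRNA-filtering tools rather than hand-written code like this.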
As a result of the above technical solution, the present invention has the following features:
(1) the genetic fragments actively expressed in the community sample are sequenced directly, achieving accurate quantification of the active species and their expressed functions;
(2) multiple functional annotation databases are available: depending on research needs, databases such as KEGG/EggNOG/CAZy/NR/Swiss-Prot/GO/VFDB/CARD can be selected, optimizing the annotation of the active functional metabolic profile of the metatranscriptome;
(3) the taxonomic source is accurately identified from microbial gene information, yielding "high-resolution" fine composition profiles of active species at the species level and below;
(4) a variety of multivariate statistical analysis and machine learning methods are used to mine the metatranscriptome big data systematically and in depth for differentially related active species and their corresponding functions, thereby accurately identifying key active biomarkers.
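Feature (3) does not say how a species-level origin is derived from gene information. One common approach, used here purely as an illustration, is a lowest-common-ancestor (LCA) assignment: each gene is placed at the deepest taxon shared by all of its database hits, and gene abundances are then summed per taxon. The sketch assumes lineages are tuples ordered from domain to species (depth 7), with a hypothetical extra strain rank at depth 8; `gene_hits`, `gene_abundance` and `level_profile` are illustrative names, not part of the patent.

```python
from collections import Counter
from typing import Mapping, Sequence

Lineage = tuple[str, ...]  # ranks ordered from root (domain) to leaf


def lca(lineages: Sequence[Lineage]) -> Lineage:
    """Longest shared prefix of the lineages, i.e. their lowest common ancestor."""
    if not lineages:
        return ()
    shared = []
    for ranks in zip(*lineages):  # walk rank by rank from the root
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break  # hits disagree from this rank downwards
    return tuple(shared)


def level_profile(
    gene_hits: Mapping[str, Sequence[Lineage]],
    gene_abundance: Mapping[str, float],
    depth: int = 7,
) -> Counter:
    """Sum gene abundance per taxon at `depth` (7 = species in a
    domain-to-species lineage; 8 would be a strain rank, if present)."""
    profile: Counter = Counter()
    for gene, lineages in gene_hits.items():
        assigned = lca(lineages)
        if len(assigned) >= depth:  # only genes resolved at least this deep
            profile[assigned[depth - 1]] += gene_abundance.get(gene, 0.0)
    return profile
```

Genes whose hits disagree below the genus level contribute only to coarser levels, which keeps a species-level profile conservative.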
Brief description of the drawings
Fig. 1 is a schematic flow chart of the metatranscriptome data analysis method based on high-throughput sequencing technology according to the present invention.
Fig. 2 is a statistical chart of the annotation results for EggNOG functional groups according to the present invention. In the figure, the abscissa corresponds to the 25 major EggNOG gene function categories, each represented by an uppercase English letter, and the ordinate is the number of EggNOG functional groups annotated to the corresponding category.
Fig. 3 is an MA plot of Unigene differential expression according to the present invention. In the figure, the abscissa shows the average expression intensity of each Unigene across the two samples (groups), i.e. the A value, $A = [\log_2(\mathrm{Case}) + \log_2(\mathrm{Control})]/2$, where Case and Control denote the expression levels of the Unigene in the two samples (groups); the larger the abscissa value, the stronger the average expression of the corresponding Unigene. The ordinate is the log fold change in expression of each Unigene between the two samples (groups), i.e. the M value, $M = \log_2(\mathrm{Control}/\mathrm{Case})$. The larger the ordinate value, the higher the expression of the corresponding Unigene in the Control sample (group) and the lower its expression in the Case sample (group); the smaller the value, the higher its expression in the Case sample (group) and the lower its expression in the Control sample (group). Unigenes differentially expressed between the two samples (groups) are shown as red dots, and Unigenes without a difference in expression are shown as cyan dots.
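The M and A values defined for Fig. 3 can be computed as below. The pseudocount (added so that zero counts do not produce an undefined logarithm) and the fixed |M| cutoff used to flag a Unigene as differentially expressed are assumptions for illustration only; the patent does not state how differential Unigenes are called, and replicate-aware statistical tests would normally be used rather than a fold-change cutoff alone.

```python
import math


def ma_values(case: float, control: float, pseudocount: float = 1.0) -> tuple[float, float]:
    """Return (M, A) for one Unigene using the Fig. 3 definitions:
    M = log2(Control / Case), A = [log2(Case) + log2(Control)] / 2.
    The pseudocount is an assumption that keeps zero counts finite."""
    case_adj = case + pseudocount
    control_adj = control + pseudocount
    m = math.log2(control_adj / case_adj)
    a = (math.log2(case_adj) + math.log2(control_adj)) / 2
    return m, a


def flag_differential(
    expression: dict[str, tuple[float, float]], m_cutoff: float = 1.0
) -> dict[str, tuple[float, float, bool]]:
    """Map each Unigene ID to (M, A, flagged).

    `expression` maps a hypothetical Unigene ID to (case, control) expression;
    |M| >= m_cutoff (a hypothetical 2-fold threshold by default) marks it as a
    candidate "red dot" on the MA plot.
    """
    result = {}
    for unigene, (case, control) in expression.items():
        m, a = ma_values(case, control)
        result[unigene] = (m, a, abs(m) >= m_cutoff)
    return result
```

For example, a Unigene with expression 3 in Case and 15 in Control becomes 4 and 16 after the pseudocount, giving M = 2 and A = 3.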
Fig. 4 shows an example display of results according to the present invention. Based on the relative expression distribution table of the KO functional groups annotated against the KEGG functional database for each sample, the KOs enriched (i.e. with significantly increased expression) in each sample (group) can be identified, and the significance of the differences is evaluated by statistical testing. The way the metabolic pathway enrichment results are displayed varies with the selected functional category.
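The patent does not name the statistical test behind the Fig. 4 enrichment. A hypergeometric (one-sided Fisher) test is a common choice and is swapped in here as an assumption: for each pathway it gives the probability of observing at least the seen number of up-regulated KOs if the up-regulated set had been drawn at random from all annotated KOs. The inputs (sets of KO identifiers and a pathway-to-KO mapping) are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def pathway_enrichment(
    up_kos: set[str], all_kos: set[str], pathways: dict[str, set[str]]
) -> dict[str, tuple[int, int, float]]:
    """For each pathway return (up-regulated KOs in it, annotated KOs in it, p-value)."""
    population = len(all_kos)          # every KO annotated in the sample(s)
    draws = len(up_kos & all_kos)      # KOs called up-regulated
    results = {}
    for pathway_id, members in pathways.items():
        annotated = members & all_kos  # pathway KOs actually observed
        hits = len(annotated & up_kos)
        p = hypergeom_sf(hits, population, len(annotated), draws)
        results[pathway_id] = (hits, len(annotated), p)
    return results
```

With many pathways tested at once, the p-values would also need multiple-testing correction (for example Benjamini-Hochberg) before being reported.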
Fig. 5 is a statistical chart of the PHI database annotation results according to the present invention. In the figure, the abscissa corresponds to the 9 major PHI gene categories, and the ordinate is the number of genes annotated to the corresponding category.
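The bar charts in Fig. 2 and Fig. 5 both reduce to counting annotated items per category. A minimal sketch, assuming annotations arrive as a mapping from an item (an EggNOG group or a gene) to its category labels; for EggNOG, a multi-letter code such as "KL" is counted once toward each letter, which is a convention chosen here rather than one stated in the patent. The PHI labels in the usage are illustrative.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_per_category(annotations: Mapping[str, Iterable[str]]) -> Counter:
    """Count annotated items per category label.

    `annotations` maps an item (EggNOG group or gene ID) to its category labels.
    An EggNOG code string such as "KL" iterates as "K", "L"; an item with several
    labels counts once toward each (a convention assumed here).
    """
    counts: Counter = Counter()
    for labels in annotations.values():
        for label in set(labels):  # set() avoids double-counting a repeated label
            counts[label] += 1
    return counts


eggnog_counts = count_per_category({"OG1": "K", "OG2": "KL", "OG3": "S"})
phi_counts = count_per_category({"g1": ["reduced virulence"],
                                 "g2": ["reduced virulence", "lethal"]})
```

`sorted(eggnog_counts.items())` then gives the categories in alphabetical order for the abscissa, with the counts as bar heights.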
Embodiment
Referring to Fig. 1, the metatranscriptome data analysis method based on high-throughput sequencing technology provided in the figure specifically comprises the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality dataset suitable for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality reads to obtain an mRNA transcript sequence set;
(3) performing metatranscriptome sequence assembly separately for each sample to construct metatranscriptome contig and scaffold sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against multiple commonly used databases to obtain functional-group abundance profiles at each level, followed by differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis (see Fig. 2 and Fig. 3);
(5) performing taxonomic annotation on the gene sequences to obtain species composition profiles at the species level and finer (below-species) levels, followed by differential comparison analysis, cluster analysis, species richness and evenness analysis, and correlation network analysis (see Fig. 4);
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing alpha and beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome using a variety of multivariate statistical methods (see Fig. 5);
(7) drawing two- and three-dimensional charts with a variety of data visualization and interactive tools to present the above analysis results comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the source of the samples.
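Step (6) names alpha and beta diversity without specifying the indices. The sketch below uses common choices as assumptions: the Shannon and Gini-Simpson indices and Pielou evenness for alpha diversity (which also cover the richness and evenness analysis of step (5)), and Bray-Curtis dissimilarity for beta diversity, computed on per-sample abundance vectors that share the same taxon or function order.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts: Sequence[float]) -> float:
    """1 - sum(p_i^2): chance that two reads drawn with replacement differ in taxon."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts: Sequence[float]) -> float:
    """H' / ln(S), where S (richness) is the number of non-zero taxa."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must share the same feature order")
    denom = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


def bray_curtis_matrix(samples: dict[str, Sequence[float]]) -> dict[tuple[str, str], float]:
    """Pairwise dissimilarities keyed by (sample, sample), sample names sorted."""
    names = sorted(samples)
    return {(x, y): bray_curtis(samples[x], samples[y]) for x in names for y in names}
```

A Bray-Curtis matrix like this is a typical input to ordination methods such as PCoA, which is one possible reading of the multivariate statistical methods referred to in step (6).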

Claims (1)

1. A metatranscriptome data analysis method based on high-throughput sequencing technology, characterized by comprising the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality dataset suitable for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality reads to obtain an mRNA transcript sequence set;
(3) performing metatranscriptome sequence assembly separately for each sample to construct metatranscriptome contig and scaffold sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against multiple commonly used databases to obtain functional-group abundance profiles at each level, followed by differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis;
(5) performing taxonomic annotation on the gene sequences to obtain species composition profiles at the species level and finer (below-species) levels, followed by differential comparison analysis, cluster analysis, species richness and evenness analysis, and correlation network analysis;
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing alpha and beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome using a variety of multivariate statistical methods;
(7) drawing two- and three-dimensional charts with a variety of data visualization and interactive tools to present the above analysis results comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the source of the samples.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710720413.0A CN107609347A (en) 2017-08-21 2017-08-21 Metatranscriptome data analysis method based on high-throughput sequencing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710720413.0A CN107609347A (en) 2017-08-21 2017-08-21 Metatranscriptome data analysis method based on high-throughput sequencing technology

Publications (1)

Publication Number Publication Date
CN107609347A 2018-01-19

Family

ID=61065596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710720413.0A Pending CN107609347A (en) 2017-08-21 2017-08-21 Metatranscriptome data analysis method based on high-throughput sequencing technology

Country Status (1)

Country Link
CN (1) CN107609347A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104630206A (en) * 2015-02-05 2015-05-20 北京诺禾致源生物信息科技有限公司 Method for constructing transcriptome library
EP2955232A1 (en) * 2014-06-12 2015-12-16 Peer Bork Method for diagnosing adenomas and/or colorectal cancer (CRC) based on analyzing the gut microbiome
CN105279391A (en) * 2015-09-06 2016-01-27 苏州协云和创生物科技有限公司 Metagenome 16S rRNA high-throughput sequencing data processing and analysis process control method
CN107034279A (en) * 2017-05-05 2017-08-11 中山大学 Application of the tuberculosis microbial markers in the reagent of diagnosis of tuberculosis is prepared

Non-Patent Citations (2)

Title
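北京诺禾致源生物信息科技有限公司 (Beijing Novogene Bioinformatics Technology Co., Ltd.): "Novogene Metatranscriptome Report", Baidu Wenku, HTTPS://WENKU.BAIDU.COM/VIEW/7119907FAE1FFC4FFE4733687E21AF45B307FE67.HTML *
周华 (Zhou Hua) et al.: "Data analysis and gene discovery from high-throughput transcriptome sequencing", Jiangxi Science (江西科学) *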

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN108804875B * 2018-06-21 2020-11-17 中国科学院北京基因组研究所 Method for analyzing microbial population function using metagenomic data
CN108804875A * 2018-06-21 2018-11-13 中国科学院北京基因组研究所 Method for analyzing microbial population function using metagenomic data
CN109166602A * 2018-08-29 2019-01-08 苏州微宏生物科技有限公司 Microbial metagenome analysis system and method for aerobic composting of kitchen waste
CN109166602B * 2018-08-29 2022-04-12 苏州微宏生物科技有限公司 Microbial metagenome analysis system and method for aerobic composting of kitchen waste
CN109378038A * 2018-09-17 2019-02-22 上海派森诺生物科技股份有限公司 Automated analysis method based on BSA gene mapping
CN110033826A * 2018-12-10 2019-07-19 上海派森诺生物科技股份有限公司 Analysis method applied to metavirome high-throughput sequencing data
CN110033826B * 2018-12-10 2023-08-08 上海派森诺生物科技股份有限公司 Analysis method applied to metavirome high-throughput sequencing data
CN109929862A * 2019-03-14 2019-06-25 云南农业大学 Method for screening and cloning cellulase genes from ruminant rumen metatranscriptome data
WO2021142625A1 * 2020-01-14 2021-07-22 北京大学 Method for predicting cell spatial relation based on single-cell transcriptome sequencing data
CN111261229A * 2020-01-17 2020-06-09 广州基迪奥生物科技有限公司 Biological analysis process of MeRIP-seq high-throughput sequencing data
CN111304307A * 2020-02-20 2020-06-19 深圳未知君生物科技有限公司 Method and device for analyzing function of flora metagenome gene and storage device
CN111462819A * 2020-02-26 2020-07-28 康美华大基因技术有限公司 Method for analyzing intestinal microorganism detection data, automatic interpretation system and medium
CN111816258A * 2020-07-20 2020-10-23 杭州谷禾信息技术有限公司 Optimization method for accurately identifying human flora 16S rDNA high-throughput sequencing species
CN111816258B * 2020-07-20 2023-10-31 杭州谷禾信息技术有限公司 Optimization method for accurate identification of human flora 16S rDNA high-throughput sequencing species
CN112750501A * 2020-12-29 2021-05-04 上海派森诺生物科技股份有限公司 Optimized analysis method for the metavirome pipeline
CN112750501B * 2020-12-29 2024-04-02 上海派森诺生物科技股份有限公司 Optimized analysis method for the metavirome pipeline
CN113035269A * 2021-04-16 2021-06-25 北京计算科学研究中心 Genome metabolism model construction, optimization and visualization method based on high-throughput sequencing technology
CN113257348A * 2021-05-26 2021-08-13 南开大学 Metatranscriptome sequencing data processing method and system
CN114203256A * 2022-02-18 2022-03-18 上海仁东医学检验所有限公司 MIBC typing and prognosis prediction model construction method based on microbial abundance
CN117198409A * 2023-09-15 2023-12-08 云南省农业科学院农业环境资源研究所 microRNA prediction method and system based on transcriptome data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180119