CN107609347A - Metatranscriptome data analysis method based on high-throughput sequencing technology - Google Patents
- Publication number
- CN107609347A (application number CN201710720413.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a metatranscriptome data analysis method based on high-throughput sequencing technology, comprising the following steps: (1) obtaining a high-quality data set; (2) obtaining an mRNA transcript sequence set; (3) obtaining a non-redundant protein sequence set; (4) obtaining and analyzing the abundance profiles of functional groups at each level; (5) performing taxonomic annotation of the gene sequences to obtain species composition profiles at the species level and at finer levels below species, and analyzing them; (6) based on the functional abundance profiles and species composition profiles obtained above, further performing Alpha and Beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome by means of several multivariate statistical methods; (7) drawing two-dimensional/three-dimensional charts with a variety of data visualization and interactive tools, so that the above analysis results are presented comprehensively and objectively; (8) selecting a specific functional database for annotation analysis according to the sample source.
Description
Technical field
The present invention relates to the field of biotechnology, and more particularly to a metatranscriptome data analysis method based on high-throughput sequencing technology.
Background art
The research object of metatranscriptomics is the mRNA of a microbial community. After the total RNA of the community is obtained and the rRNA is removed, the remaining RNA is reverse-transcribed into cDNA and insert-fragment libraries of appropriate length are constructed. These libraries are then subjected to paired-end (PE) high-throughput sequencing, so that the fine-scale composition of the active species in the whole community and their corresponding functional expression can be accurately quantified. The key biomarkers in the community can then be identified and their biological significance elucidated.
Summary of the invention
The technical problem to be solved by the present invention is to provide a metatranscriptome data analysis method based on high-throughput sequencing technology.
The technical problem to be solved by the present invention can be solved by the following technical solution:
A metatranscriptome data analysis method based on high-throughput sequencing technology, specifically comprising the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality data set that can be used for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality sequences to obtain an mRNA transcript sequence set;
(3) assembling the metatranscriptome sequences of each sample separately to construct metatranscriptome Contigs and Scaffolds sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against several commonly used databases to obtain the abundance profiles of functional groups at each level, and performing differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis;
(5) performing taxonomic annotation of the gene sequences to obtain species composition profiles at the species level and at finer levels below species, and performing differential comparison analysis, cluster analysis, species richness and evenness analysis, and association network analysis;
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing Alpha and Beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome by means of several multivariate statistical methods;
(7) drawing two-dimensional/three-dimensional charts with a variety of data visualization and interactive tools, so that the above analysis results are presented comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the sample source.
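Steps (5) and (6) call for species richness, evenness and Alpha diversity analysis but do not name specific indices. As a minimal sketch only, assuming the Shannon index and Pielou evenness are acceptable choices (the patent does not specify them), the calculation for one sample's species count vector could look like the following. The function name and the example counts are hypothetical.

```python
import math

def alpha_diversity(counts):
    """Return (richness, Shannon index, Pielou evenness) for one sample.

    counts: per-species read or transcript counts (assumed non-negative).
    The Shannon index uses the natural logarithm.
    """
    positive = [c for c in counts if c > 0]
    richness = len(positive)          # number of species actually observed
    total = sum(positive)
    if total == 0:
        return 0, 0.0, 0.0
    shannon = -sum((c / total) * math.log(c / total) for c in positive)
    # Pielou evenness J = H / ln(S); undefined for a single species, so use 0.0
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical sample: four equally abundant species and one absent species
richness, shannon, evenness = alpha_diversity([25, 25, 25, 25, 0])
```

For a perfectly even community the Shannon index equals the logarithm of the richness, so the evenness is 1; skewed communities give values closer to 0.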
By adopting the above technical solution, the present invention has the following features:
(1) the gene fragments actively expressed in a community sample are sequenced directly, so that active species and expressed functions are quantified accurately;
(2) several functional annotation databases are available; databases such as KEGG, EggNOG, CAZy, NR, Swiss-Prot, GO, VFDB and CARD are selected according to research needs to optimize the annotation of the active functional and metabolic profile of the metatranscriptome;
(3) the species of origin is accurately identified from microbial gene information, yielding a "high-resolution" fine composition profile of active species at the species level and below;
(4) by means of several multivariate statistical analysis and machine learning methods, the differentially associated active species and their corresponding functions are mined systematically and in depth from metatranscriptome big data, so that key active biomarkers are accurately identified.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the metatranscriptome data analysis method based on high-throughput sequencing technology according to the present invention.
Fig. 2 is a statistical chart of the annotation results for the EggNOG functional groups according to the present invention. In the figure, the abscissa corresponds to the 25 major gene function categories of EggNOG, each represented by one capital letter, and the ordinate is the number of EggNOG functional groups annotated to the corresponding category.
Fig. 3 is an MA plot of differential Unigene expression according to the present invention. In the figure, the abscissa shows the average expression intensity of each Unigene in the two samples (groups), i.e. the A value, A = [log2(Case) + log2(Control)]/2, where Case and Control denote the expression levels of the Unigene in the two samples (groups) respectively; the larger the abscissa value, the stronger the average expression intensity of the corresponding Unigene. The ordinate is the logarithm of the fold difference in expression of each Unigene between the two samples (groups), i.e. the M value, M = log2(Control/Case). The larger the ordinate value, the higher the expression of the corresponding Unigene in the Control sample (group) and the lower its expression in the Case sample (group); the smaller the value, the higher its expression in the Case sample (group) and the lower its expression in the Control sample (group). Unigenes that are differentially expressed between the two samples (groups) are shown as red dots, and Unigenes with no difference in expression are shown as cyan dots.
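The A and M values defined for Fig. 3 can be computed directly from the two expression levels of a Unigene. The sketch below follows the formulas as stated, with M taken as log2(Control/Case); the function name, the optional `pseudocount` parameter and the example values are assumptions added for illustration and are not part of the patent.

```python
import math

def ma_values(case, control, pseudocount=0.0):
    """Return (M, A) for one Unigene.

    A = [log2(Case) + log2(Control)] / 2   (average expression intensity)
    M = log2(Control / Case)               (log fold difference)

    pseudocount is a hypothetical convenience: with the default of 0.0 the
    formulas match the text exactly, but an expression level of zero then
    raises ValueError; a small positive value avoids that.
    """
    case = case + pseudocount
    control = control + pseudocount
    a = (math.log2(case) + math.log2(control)) / 2
    m = math.log2(control / case)
    return m, a

# Hypothetical Unigene expressed at 4 in the Case group and 16 in the Control group
m, a = ma_values(4, 16)   # m = 2.0, a = 3.0
```

A positive M here means higher expression in the Control sample (group), matching the sign convention described for the ordinate.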
Fig. 4 is an example display of results according to the present invention. Based on the relative expression distribution table of the KO functional groups obtained by annotating each sample against the KEGG functional database, the KOs that are enriched (i.e. significantly up-regulated in expression) in each sample (group) can be analyzed, and statistical tests are used to evaluate whether the difference is significant. The form in which the metabolic pathway enrichment results are displayed will vary with the functional category selected.
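The description of Fig. 4 relies on a relative expression table of KO functional groups and on a statistical test of the difference between groups, but it does not name the test. The following is a sketch under stated assumptions: counts are normalized to proportions per sample, and an exact two-sided permutation test on the difference of group means stands in for whatever test is actually used. All function names, KO labels and numbers are hypothetical.

```python
from itertools import combinations

def relative_abundance(counts):
    """Convert raw KO counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    return {ko: n / total for ko, n in counts.items()}

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    hits = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
        total += 1
    return hits / total

# Hypothetical data: one sample's KO counts, and one KO's relative
# expression in three samples per group.
profile = relative_abundance({"KO_a": 30, "KO_b": 70})
p_value = exact_permutation_p([0.30, 0.32, 0.31], [0.10, 0.12, 0.11])
```

With three samples per group there are only 20 possible relabellings, so the smallest attainable two-sided p-value is 0.1; very small groups therefore cannot reach conventional significance thresholds with this kind of test.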
Fig. 5 is a statistical chart of the PHI database annotation results according to the present invention. In the figure, the abscissa corresponds to the 9 major gene categories of PHI, and the ordinate is the number of genes annotated to the corresponding category.
Detailed description of the embodiments
Referring to Fig. 1, the figure shows a metatranscriptome data analysis method based on high-throughput sequencing technology, which specifically comprises the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality data set that can be used for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality sequences to obtain an mRNA transcript sequence set;
(3) assembling the metatranscriptome sequences of each sample separately to construct metatranscriptome Contigs and Scaffolds sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against several commonly used databases to obtain the abundance profiles of functional groups at each level, and performing differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis (see Fig. 2 and Fig. 3);
(5) performing taxonomic annotation of the gene sequences to obtain species composition profiles at the species level and at finer levels below species, and performing differential comparison analysis, cluster analysis, species richness and evenness analysis, and association network analysis (see Fig. 4);
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing Alpha and Beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome by means of several multivariate statistical methods (see Fig. 5);
(7) drawing two-dimensional/three-dimensional charts with a variety of data visualization and interactive tools, so that the above analysis results are presented comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the sample source.
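Step (6) mentions Beta diversity analysis between samples without fixing a distance measure. As an illustration only, assuming Bray-Curtis dissimilarity is used (a common choice, but not one stated in the patent), pairwise distances between abundance profiles could be computed as follows. The sample names and abundance vectors are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    0.0 means identical profiles; 1.0 means no shared features.
    """
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    # Two all-zero profiles are treated as identical to avoid division by zero.
    return numerator / denominator if denominator else 0.0

def pairwise_matrix(samples):
    """Return {(name_i, name_j): dissimilarity} for every ordered sample pair."""
    names = list(samples)
    return {(p, q): bray_curtis(samples[p], samples[q]) for p in names for q in names}

# Hypothetical abundance profiles over the same three species or functional groups
samples = {"S1": [10, 0, 5], "S2": [5, 5, 5], "S3": [0, 20, 0]}
dist = pairwise_matrix(samples)
```

A matrix of this kind is the usual input to ordination or clustering, which is where the multivariate statistical methods mentioned in step (6) would take over.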
Claims (1)
1. A metatranscriptome data analysis method based on high-throughput sequencing technology, characterized by comprising the following steps:
(1) performing quality control on the raw paired-end sequencing data output by the high-throughput sequencer to obtain a high-quality data set that can be used for downstream metatranscriptomic analysis;
(2) predicting and removing ribosomal RNA sequences from the high-quality sequences to obtain an mRNA transcript sequence set;
(3) assembling the metatranscriptome sequences of each sample separately to construct metatranscriptome Contigs and Scaffolds sequence sets, and performing gene prediction to obtain a non-redundant protein sequence set;
(4) performing functional annotation of the protein sequences against several commonly used databases to obtain the abundance profiles of functional groups at each level, and performing differential comparison analysis, metabolic pathway enrichment analysis and cluster analysis;
(5) performing taxonomic annotation of the gene sequences to obtain species composition profiles at the species level and at finer levels below species, and performing differential comparison analysis, cluster analysis, species richness and evenness analysis, and association network analysis;
(6) based on the functional abundance profiles and species composition profiles obtained above, further performing Alpha and Beta diversity analysis on the metatranscriptome samples, and then screening for key biomarkers in the metagenome by means of several multivariate statistical methods;
(7) drawing two-dimensional/three-dimensional charts with a variety of data visualization and interactive tools, so that the above analysis results are presented comprehensively and objectively;
(8) selecting a specific functional database for annotation analysis according to the sample source.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720413.0A CN107609347A (en) | 2017-08-21 | 2017-08-21 | Metatranscriptome data analysis method based on high-throughput sequencing technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720413.0A CN107609347A (en) | 2017-08-21 | 2017-08-21 | Metatranscriptome data analysis method based on high-throughput sequencing technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107609347A true CN107609347A (en) | 2018-01-19 |
Family ID: 61065596
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720413.0A Pending CN107609347A (en) | 2017-08-21 | 2017-08-21 | Metatranscriptome data analysis method based on high-throughput sequencing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609347A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804875A (en) * | 2018-06-21 | 2018-11-13 | 中国科学院北京基因组研究所 | Method for analyzing microbial population function by using metagenome data |
CN109166602A (en) * | 2018-08-29 | 2019-01-08 | 苏州微宏生物科技有限公司 | Microbe macro-gene analysis system and method for aerobic composting of kitchen waste |
CN109378038A (en) * | 2018-09-17 | 2019-02-22 | 上海派森诺生物科技股份有限公司 | Automated analysis method based on BSA gene mapping |
CN109929862A (en) * | 2019-03-14 | 2019-06-25 | 云南农业大学 | Method for screening and cloning cellulase genes from ruminant rumen metatranscriptome data |
CN110033826A (en) * | 2018-12-10 | 2019-07-19 | 上海派森诺生物科技股份有限公司 | Analysis method applied to macrovirome high-throughput sequencing data |
CN111261229A (en) * | 2020-01-17 | 2020-06-09 | 广州基迪奥生物科技有限公司 | Biological analysis process of MeRIP-seq high-throughput sequencing data |
CN111304307A (en) * | 2020-02-20 | 2020-06-19 | 深圳未知君生物科技有限公司 | Method and device for analyzing function of flora metagenome gene and storage device |
CN111462819A (en) * | 2020-02-26 | 2020-07-28 | 康美华大基因技术有限公司 | Method for analyzing intestinal microorganism detection data, automatic interpretation system and medium |
CN111816258A (en) * | 2020-07-20 | 2020-10-23 | 杭州谷禾信息技术有限公司 | Optimization method for accurately identifying human flora 16S rDNA high-throughput sequencing species |
CN112750501A (en) * | 2020-12-29 | 2021-05-04 | 上海派森诺生物科技股份有限公司 | Optimized analysis method for macrovirome process |
CN113035269A (en) * | 2021-04-16 | 2021-06-25 | 北京计算科学研究中心 | Genome metabolism model construction, optimization and visualization method based on high-throughput sequencing technology |
WO2021142625A1 (en) * | 2020-01-14 | 2021-07-22 | 北京大学 | Method for predicting cell spatial relation based on single-cell transcriptome sequencing data |
CN113257348A (en) * | 2021-05-26 | 2021-08-13 | 南开大学 | Macro-transcriptome sequencing data processing method and system |
CN114203256A (en) * | 2022-02-18 | 2022-03-18 | 上海仁东医学检验所有限公司 | MIBC typing and prognosis prediction model construction method based on microbial abundance |
CN117198409A (en) * | 2023-09-15 | 2023-12-08 | 云南省农业科学院农业环境资源研究所 | microRNA prediction method and system based on transcriptome data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104630206A (en) * | 2015-02-05 | 2015-05-20 | 北京诺禾致源生物信息科技有限公司 | Method for constructing transcriptome library |
EP2955232A1 (en) * | 2014-06-12 | 2015-12-16 | Peer Bork | Method for diagnosing adenomas and/or colorectal cancer (CRC) based on analyzing the gut microbiome |
CN105279391A (en) * | 2015-09-06 | 2016-01-27 | 苏州协云和创生物科技有限公司 | Metagenome 16S rRNA high-throughput sequencing data processing and analysis process control method |
CN107034279A (en) * | 2017-05-05 | 2017-08-11 | 中山大学 | Application of the tuberculosis microbial markers in the reagent of diagnosis of tuberculosis is prepared |
-
2017
- 2017-08-21 CN CN201710720413.0A patent/CN107609347A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2955232A1 (en) * | 2014-06-12 | 2015-12-16 | Peer Bork | Method for diagnosing adenomas and/or colorectal cancer (CRC) based on analyzing the gut microbiome |
CN104630206A (en) * | 2015-02-05 | 2015-05-20 | 北京诺禾致源生物信息科技有限公司 | Method for constructing transcriptome library |
CN105279391A (en) * | 2015-09-06 | 2016-01-27 | 苏州协云和创生物科技有限公司 | Metagenome 16S rRNA high-throughput sequencing data processing and analysis process control method |
CN107034279A (en) * | 2017-05-05 | 2017-08-11 | 中山大学 | Application of the tuberculosis microbial markers in the reagent of diagnosis of tuberculosis is prepared |
Non-Patent Citations (2)
Title |
---|
北京诺禾致源生物信息科技有限公司: "诺禾致源宏转录组报告", 《百度文库HTTPS://WENKU.BAIDU.COM/VIEW/7119907FAE1FFC4FFE4733687E21AF45B307FE67.HTML》 * |
周华等: "高通量转录组测序的数据分析与基因发掘", 《江西科学》 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804875B (en) * | 2018-06-21 | 2020-11-17 | 中国科学院北京基因组研究所 | Method for analyzing microbial population function by using metagenome data |
CN108804875A (en) * | 2018-06-21 | 2018-11-13 | 中国科学院北京基因组研究所 | Method for analyzing microbial population function by using metagenome data |
CN109166602A (en) * | 2018-08-29 | 2019-01-08 | 苏州微宏生物科技有限公司 | Microbe macro-gene analysis system and method for aerobic composting of kitchen waste |
CN109166602B (en) * | 2018-08-29 | 2022-04-12 | 苏州微宏生物科技有限公司 | Microbe macro-gene analysis system and method for aerobic composting of kitchen waste |
CN109378038A (en) * | 2018-09-17 | 2019-02-22 | 上海派森诺生物科技股份有限公司 | Automated analysis method based on BSA gene mapping |
CN110033826A (en) * | 2018-12-10 | 2019-07-19 | 上海派森诺生物科技股份有限公司 | Analysis method applied to macrovirome high-throughput sequencing data |
CN110033826B (en) * | 2018-12-10 | 2023-08-08 | 上海派森诺生物科技股份有限公司 | Analysis method applied to macrovirome high-throughput sequencing data |
CN109929862A (en) * | 2019-03-14 | 2019-06-25 | 云南农业大学 | Method for screening and cloning cellulase genes from ruminant rumen metatranscriptome data |
WO2021142625A1 (en) * | 2020-01-14 | 2021-07-22 | 北京大学 | Method for predicting cell spatial relation based on single-cell transcriptome sequencing data |
CN111261229A (en) * | 2020-01-17 | 2020-06-09 | 广州基迪奥生物科技有限公司 | Biological analysis process of MeRIP-seq high-throughput sequencing data |
CN111304307A (en) * | 2020-02-20 | 2020-06-19 | 深圳未知君生物科技有限公司 | Method and device for analyzing function of flora metagenome gene and storage device |
CN111462819A (en) * | 2020-02-26 | 2020-07-28 | 康美华大基因技术有限公司 | Method for analyzing intestinal microorganism detection data, automatic interpretation system and medium |
CN111816258A (en) * | 2020-07-20 | 2020-10-23 | 杭州谷禾信息技术有限公司 | Optimization method for accurately identifying human flora 16S rDNA high-throughput sequencing species |
CN111816258B (en) * | 2020-07-20 | 2023-10-31 | 杭州谷禾信息技术有限公司 | Optimization method for accurate identification of human flora 16S rDNA high-throughput sequencing species |
CN112750501A (en) * | 2020-12-29 | 2021-05-04 | 上海派森诺生物科技股份有限公司 | Optimized analysis method for macrovirome process |
CN112750501B (en) * | 2020-12-29 | 2024-04-02 | 上海派森诺生物科技股份有限公司 | Optimized analysis method for macro virus group flow |
CN113035269A (en) * | 2021-04-16 | 2021-06-25 | 北京计算科学研究中心 | Genome metabolism model construction, optimization and visualization method based on high-throughput sequencing technology |
CN113257348A (en) * | 2021-05-26 | 2021-08-13 | 南开大学 | Macro-transcriptome sequencing data processing method and system |
CN114203256A (en) * | 2022-02-18 | 2022-03-18 | 上海仁东医学检验所有限公司 | MIBC typing and prognosis prediction model construction method based on microbial abundance |
CN117198409A (en) * | 2023-09-15 | 2023-12-08 | 云南省农业科学院农业环境资源研究所 | microRNA prediction method and system based on transcriptome data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609347A (en) | A kind of grand transcript profile data analysing method based on high throughput sequencing technologies | |
Schep et al. | chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data | |
CN111933218B (en) | Optimized metagenome binding method for analyzing microbial community | |
CN107577919A (en) | A kind of grand genomic data analysis method based on high throughput sequencing technologies | |
Mulligan et al. | GeneNetwork: a toolbox for systems genetics | |
Law et al. | RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR | |
CN107463800B (en) | A kind of enteric microorganism information analysis method and system | |
CN111261229B (en) | Biological analysis process of MeRIP-seq high-throughput sequencing data | |
CN107391963A (en) | Eucaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method | |
CA2840459A1 (en) | Compositions and methods for identifying and comparing members of microbial communities by computational analysis of amplicon sequences | |
Pehkonen et al. | Theme discovery from gene lists for identification and viewing of multiple functional groups | |
CN110544509B (en) | Single-cell ATAC-seq data analysis method | |
CN107292123A (en) | A kind of method and apparatus of microbiologic population's composition based on high-flux sequence | |
CN114708910B (en) | Method for calculating enrichment score of cell subpopulations in cell sequencing by using single cell sequencing data | |
Batut et al. | Reference-based RNA-Seq data analysis | |
CN109686406A (en) | A kind of phylogenetic tree figure production method and system | |
Markowitz et al. | Applying data warehouse concepts to gene expression data management | |
CN109762909A (en) | A kind of 44 site InDels composite amplification detection kits for sample medical jurisprudence individual appreciation of degrading | |
Zheng et al. | MOCHI: a comprehensive cross-platform tool for amplicon-based microbiota analysis | |
CN113793647A (en) | Metagenome data analysis device and method based on next generation sequencing | |
CN107609349A (en) | A kind of project implementation quality control system in bioanalysis platform | |
Guzzi et al. | Challenges in microarray data management and analysis | |
Sintsova et al. | mBARq: a versatile and user-friendly framework for the analysis of DNA barcodes from transposon insertion libraries, knockout mutants, and isogenic strain populations | |
Trostle et al. | MECP2pedia: a comprehensive transcriptome portal for MECP2 disease research | |
CN111128297B (en) | Preparation method of gene chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180119 |