CN107391963A - Interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform


Publication number
CN107391963A
Authority
CN
China
Prior art keywords
analysis
module
project
result
eukaryote
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710598315.4A
Other languages
Chinese (zh)
Inventor
刘彬旭
余果
郭权
任一
史彩萍
曾静
石今
周玄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sangge Information Technology Co Ltd
Original Assignee
Shanghai Sangge Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sangge Information Technology Co Ltd
Priority to CN201710598315.4A
Publication of CN107391963A
Priority to CN201810797352.2A (CN109243532A)
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Abstract

The invention discloses an interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform. The system comprises a project management module, a basic analysis task submission module, and an interactive result analysis module. First, sequencing data are uploaded to a local cluster server and a project is created in the project management module; at the same time, the client's database is uploaded to the local cluster server or an online database is selected, and the project can be locked or shared with others for operation in the project management module. Then, in the basic analysis task submission module, the user can run a parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis. Before the analysis, it is first judged whether the data pass quality control: if they pass, the parameter analysis is carried out; if not, an error is returned directly. The resulting project file is sent to the interactive result analysis module for interactive analysis, yielding an intuitively presented report.

Description

Interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform
Technical field
The present invention relates to the technical field of bioinformatic analysis, and in particular to an interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform.
Background
The transcriptome, also called the transcript set or expression profile, is the sum of all RNA expressed by a particular species, tissue, or cell type during a given period, including protein-coding mRNA and various non-coding RNAs (rRNA, tRNA, snoRNA, snRNA, microRNA, etc.). The transcriptome is defined with respect to time and space; it is the dynamic link between the genome and external physical traits. It reflects the genes actively expressed under specific conditions and is an important means of studying cell phenotype and function. Transcriptomics studies changes in gene expression at the RNA level; for species whose whole-genome sequence is not yet available, transcriptome sequencing has become an important bridge between phenotype and genotype.
Transcriptome research provides gene expression information under specific conditions, from which the function of unknown genes can be inferred and the mechanism of action of specific regulatory genes revealed. It can also determine when and where genes in different cell types and tissues are activated or become dormant. Quantifying transcripts reveals the activity and expression level of specific genes, which can be used for disease diagnosis and treatment.
Compared with the whole eukaryotic genome, transcriptome sequences do not contain introns and other non-coding sequences and can provide more efficient, useful information. Transcriptome sequencing and analysis can address problems such as in-depth mining of new genes, discovery of low-abundance transcripts, transcript map construction, metabolic pathway determination, gene family identification, and evolutionary analysis. Eukaryotic transcriptome sequencing without a reference genome requires no probe design; it detects known transcripts and can also discover new ones, and it can detect low-abundance transcripts when sequencing coverage is sufficiently large.
Bioinformatic data analysis is the most critical step in applying high-throughput sequencing to transcriptome research. A single Illumina HiSeq run can produce up to 1000G of data, which personal computers and workstations clearly cannot process.
High-throughput data processing involves adjusting, screening, and comparing data, which requires bioinformatics staff to have scripting skills. Existing bioinformatic analysis of eukaryotic reference-free transcriptomes consists mainly of three parts. Standard bioinformatic analysis is the basis of the whole transcriptome analysis; its results include data output statistics, data quality control, transcriptome assembly, length statistics for assembled transcripts/Unigenes, sequence analysis, and expression analysis. Sequence analysis includes ORF prediction, gene function annotation, SNP analysis, and SSR analysis; gene function annotation aligns against databases such as NR, Pfam, Swissprot, String, KEGG, GO, and COG. Expression analysis includes correlation analysis between replicate samples, differential gene expression analysis, GO/KEGG enrichment analysis of differential genes, expression-pattern clustering of differential genes, Venn analysis of differential genes, and significant GO directed acyclic graph analysis. Advanced bioinformatic analysis includes gene co-expression network construction, Ipath summary analysis, protein interaction network analysis, transcription factor analysis, and so on. Personalized bioinformatic analysis includes homologous annotation analysis against model species, time-series gene expression analysis, phylogenetic tree construction from transcriptome data, orthologous gene analysis of closely related species, GO/KEGG analysis of divergent homologous genes, GO/KEGG analysis of conserved homologous genes, selection pressure analysis at the GO category level, tree hypothesis testing, and so on.
In the prior art this workflow is carried out manually, so working efficiency is low and market demand cannot be met.
Summary of the invention
To solve the above problems, the present invention provides an interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform, addressing the inability of personal computers and workstations to complete the data processing and the low efficiency of the existing manual workflow.
To achieve one of the above objects, the present invention provides an interactive analysis system for eukaryotic reference-free transcriptomes based on a cloud computing platform, comprising:
A project management module, for viewing and managing project information, and for integrated management of analysis projects in their various states through projects, tasks, applications, and files;
A basic analysis task submission module, for setting basic parameters and running tasks, and for integrating the results and raw data according to a preset format and packaging them into a corresponding project file; the basic parameter tasks include sequencing data quality control, transcriptome assembly, gene function annotation, expression analysis, and gene structure analysis;
An interactive result analysis module, for generating analysis results on user request and displaying them visually, including advanced bioinformatic analysis and personalized bioinformatic analysis;
The project management module is connected to the interactive result analysis module through the basic analysis task submission module;
First, sequencing data are uploaded to a local cluster server and a project is created in the project management module, where the project can be locked or shared with others for operation;
Then, in the basic analysis task submission module, the user can run a parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis; before the analysis, it is first judged whether the data pass quality control: if they pass, the parameter analysis is carried out; if not, an error is returned directly;
Finally, the resulting project file is sent to the interactive result analysis module for interactive analysis, where secondary analysis and statistics are performed on the project file according to user needs, yielding an intuitively presented report.
Specifically, the advanced and personalized bioinformatic analyses in the interactive result analysis module include gene co-expression network construction, Ipath summary analysis, protein interaction network analysis, transcription factor analysis, homologous annotation analysis against model species, time-series gene expression analysis, phylogenetic tree construction from transcriptome data, orthologous gene analysis of closely related species, GO/KEGG analysis of divergent homologous genes, GO/KEGG analysis of conserved homologous genes, selection pressure analysis at the GO category level, and tree hypothesis testing.
To achieve one of the above objects, the present invention provides an interactive analysis method for eukaryotic reference-free transcriptomes based on a cloud computing platform, comprising the following steps:
Step 1: create a project;
Step 2: upload sequencing data to a local cluster server and create the project in the project management module, where the project can be locked or shared with others for operation;
Step 3: create a task;
Step 4: in the basic analysis task submission module, the user can run a parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis; before the analysis, it is first judged whether the data pass quality control: if they pass, the parameter analysis is carried out; if not, an error is returned directly; the parameter analysis includes sequencing data quality control, transcriptome assembly, gene function annotation, expression analysis, and gene structure analysis;
Step 5: the resulting project file is sent to the interactive result analysis module for interactive analysis, where secondary analysis and statistics are performed on the project file according to user needs, yielding an intuitively presented report; this includes advanced bioinformatic analysis and personalized bioinformatic analysis.
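The quality control gate in Step 4 can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `QcSummary`, `passes_qc`, and `run_basic_analysis`, and the threshold values, are hypothetical and chosen only to show the "pass, then analyze; fail, then return an error" control flow.

```python
from dataclasses import dataclass


class QualityControlError(Exception):
    """Raised when the sequencing data fail the pre-analysis QC check."""


@dataclass
class QcSummary:
    total_reads: int
    q20_fraction: float  # fraction of bases with sequencing error rate <= 1%


def passes_qc(summary: QcSummary, min_reads: int = 1, min_q20: float = 0.9) -> bool:
    # Thresholds are illustrative assumptions, not values from the patent.
    return summary.total_reads >= min_reads and summary.q20_fraction >= min_q20


def run_basic_analysis(summary: QcSummary) -> dict:
    """Gate on QC, then list the parameter analysis stages in order."""
    if not passes_qc(summary):
        # "If not, an error is returned directly."
        raise QualityControlError("sequencing data failed quality control")
    stages = [
        "sequencing_data_quality_control",
        "transcriptome_assembly",
        "gene_function_annotation",
        "expression_analysis",
        "gene_structure_analysis",
    ]
    # A real system would execute each stage; this sketch only records the order.
    return {"stages_run": stages, "status": "completed"}
```

Raising an exception is one way to model "return an error directly"; a web back end might instead return an error payload to the visual interface.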
Specifically, the project management module can also be used to view and manage files that were uploaded or produced by analysis; these files can be uploaded, searched, copied, moved, deleted, and downloaded. The project management module is also used to mark project progress status, which can be not started, in progress, completed, terminated, or problem. The project management module is also used to view task run status and log information, and to share projects and manage member permissions.
Specifically, the interactive result analysis module includes a chart tool that can change the color scheme, shape scheme, and bar orientation, and can show legends and point names and provide merging or sorting functions. Analysis results of the interactive result analysis module can be saved to a report and displayed in it; result figures can be downloaded in PNG, JPEG, PDF, or SVG format; and the report format can be html or pdf.
Specifically, the operation of the project management module, the basic analysis task submission module, and the interactive result analysis module is based on an html+Css+jquery front end and a PHP+Alpha server back end; after receiving a task execution command, the interactive analysis module calls server-side scripts written in languages such as Perl, C, python, and R to perform the basic analysis of the sequencing data.
Specifically, at each stage of analyzing the sequencing data, the basic analysis task submission module selects the corresponding analysis software from the analysis software it stores.
The beneficial effects of the present invention are as follows. The interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform mainly comprise three modules: the project management module, the basic analysis task submission module, and the interactive result analysis module. By means of cloud computing, the system and method readily obtain over the network the basic computing resources needed to analyze large volumes of sequencing data, meeting researchers' great demand for computing resources in the big-data setting. The method also provides a highly integrated data analysis platform, so users need not integrate analysis software or build analysis pipelines themselves, achieving true one-click bioinformatic analysis. In addition, the analysis can align against multiple databases, supports multiple algorithms, and presents visual results and interactive reports. Finally, the analysis content is comprehensive: it covers not only the standard and advanced analyses of eukaryotic reference-free transcriptome analysis but also some personalized analyses, better meeting user needs.
Brief description of the drawings
Fig. 1 is a block diagram of the interactive analysis system for eukaryotic reference-free transcriptomes based on a cloud computing platform of the present invention;
Fig. 2 is a flow chart of the interactive analysis method for eukaryotic reference-free transcriptomes based on a cloud computing platform of the present invention;
Fig. 3 is a schematic diagram of eukaryotic reference-free transcriptome analysis in the basic analysis task submission module of the present invention;
Fig. 4 is a schematic diagram of creating a new project for eukaryotic reference-free transcriptome analysis in the present invention;
Fig. 5 is a schematic diagram of background task parameter submission for eukaryotic reference-free transcriptome analysis in the present invention;
Fig. 6 is a schematic diagram of interactive analysis of eukaryotic reference-free transcriptomes in the present invention;
Fig. 7 is a schematic diagram of the PCA chart tool for assessing expression levels between eukaryotic reference-free transcriptome samples in the present invention;
The main reference numerals are as follows:
10, project management module; 11, basic analysis task submission module;
12, interactive result analysis module.
Detailed description
To describe the present invention more clearly, it is further described below with reference to the accompanying drawings.
Referring to Fig. 1, the interactive analysis system for eukaryotic reference-free transcriptomes based on a cloud computing platform of the present invention comprises:
A project management module 10, for viewing and managing project information, and for integrated management of analysis projects in their various states through projects, tasks, applications, and files;
A basic analysis task submission module 11, for setting basic parameters and running tasks, and for integrating the results and raw data according to a preset format and packaging them into a corresponding project file;
An interactive result analysis module 12, for generating analysis results on user request and displaying them visually;
The project management module is connected to the interactive result analysis module through the basic analysis task submission module;
First, sequencing data are uploaded to a local cluster server and a project is created in the project management module, where the project can be locked or shared with others for operation. Then, in the basic analysis task submission module, the user can run a parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis; before the analysis, it is first judged whether the data pass quality control: if they pass, the parameter analysis is carried out; if not, an error is returned directly;
Finally, the resulting project file is sent to the interactive result analysis module for interactive analysis, where secondary analysis and statistics are performed on the project file according to user needs, yielding an intuitively presented report.
With further reference to Fig. 2, the present invention also provides an interactive analysis method for eukaryotic reference-free transcriptomes based on a cloud computing platform, comprising the following steps:
Step 1: create a project;
Step 2: upload sequencing data to a local cluster server and create the project in the project management module, where the project can be locked or shared with others for operation;
Step 3: create a task;
Step 4: in the basic analysis task submission module, the user can run a parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis; before the analysis, it is first judged whether the data pass quality control: if they pass, the parameter analysis is carried out; if not, an error is returned directly;
Step 5: the resulting project file is sent to the interactive result analysis module for interactive analysis, where secondary analysis and statistics are performed on the project file according to user needs, yielding an intuitively presented report.
Compared with the prior art, the interactive analysis system and method for eukaryotic reference-free transcriptomes based on a cloud computing platform provided by the present invention mainly comprise three modules: the project management module 10, the basic analysis task submission module 11, and the interactive result analysis module 12. By means of cloud computing, the system and method readily obtain over the network the basic computing resources needed to analyze large volumes of sequencing data, meeting researchers' great demand for computing resources in the big-data setting. The method also provides a highly integrated data analysis platform, so users need not integrate analysis software or build analysis pipelines themselves, achieving true one-click bioinformatic analysis. In addition, the analysis can align against multiple databases, supports multiple algorithms, and presents visual results and interactive reports. Finally, the analysis content is comprehensive: it covers not only the basic and advanced analyses of eukaryotic reference-free transcriptome analysis but also some personalized analyses, better meeting user needs.
In this embodiment, the project management module can also be used to view and manage files that were uploaded or produced by analysis; these files can be uploaded, searched, copied, moved, deleted, and downloaded. The project management module is also used to mark project progress status, which can be not started, in progress, completed, terminated, or problem. The project management module is also used to view task run status and log information, and to share projects and manage member permissions.
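The project states and the lock/share behavior described above could be modeled roughly as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, and the rule that a locked project refuses new shares and non-owner operations is an interpretation, since the patent does not specify how locking and sharing interact.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class ProjectStatus(Enum):
    # The five progress states named in the description.
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"
    TERMINATED = "terminated"
    PROBLEM = "problem"


@dataclass
class Project:
    name: str
    owner: str
    status: ProjectStatus = ProjectStatus.NOT_STARTED
    locked: bool = False
    members: Set[str] = field(default_factory=set)

    def share_with(self, user: str) -> None:
        # Assumption: a locked project cannot be shared further.
        if self.locked:
            raise PermissionError("project is locked")
        self.members.add(user)

    def can_operate(self, user: str) -> bool:
        # Assumption: the owner always has access; members only while unlocked.
        return user == self.owner or (not self.locked and user in self.members)
```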
With further reference to Fig. 3, the basic analysis task submission module can be used for sequencing data quality control, transcriptome assembly, gene function annotation, expression analysis, and gene structure analysis. An important indicator of sequencing quality is the data yield; sequencing data quality control includes statistics on the data obtained from sequencing and on the data after quality control. Transcriptome assembly takes the high-quality RNA-seq sequencing reads and assembles them de novo into contigs and singletons. Gene function annotation compiles comprehensive statistics on the results of aligning the assembled sequences against each database. Expression analysis first computes expression statistics, then, based on the calculated gene expression levels, analyzes differential expression between pairs of samples or between groups, and finally studies the differential genes from multiple angles using different analysis methods. Gene structure analysis includes single nucleotide polymorphism analysis (SNP analysis), simple sequence repeat analysis (SSR analysis), and open reading frame prediction (ORF prediction).
The results generated here can be viewed visually in the interactive analysis module, and the corresponding result files can also be viewed in the project file.
In this embodiment, the interactive result analysis module is also used for advanced and personalized bioinformatic analysis, including gene co-expression network construction, Ipath summary analysis, protein interaction network analysis, transcription factor analysis, homologous annotation analysis against model species, time-series gene expression analysis, phylogenetic tree construction from transcriptome data, orthologous gene analysis of closely related species, GO/KEGG analysis of divergent homologous genes, GO/KEGG analysis of conserved homologous genes, selection pressure analysis at the GO category level, tree hypothesis testing, and so on; the interactive result analysis module is also used to change the control group scheme and select the samples to analyze.
The interactive result analysis module includes a chart tool that can change the color scheme, shape scheme, and bar orientation, and can show legends and point names and provide merging or sorting functions. Analysis results can be saved to a report and displayed in it; result figures can be downloaded in PNG, JPEG, PDF, or SVG format; and the report format can be html or pdf.
The operation of the project management module, the basic analysis task submission module, and the interactive result analysis module is based on an html+Css+jquery front end and a PHP+Alpha server back end; after receiving a task execution command, the interactive analysis module calls server-side scripts written in Perl, C, python, and R to perform the basic analysis of the sequencing data. At each stage of analyzing the sequencing data, the basic analysis task submission module selects the corresponding analysis software from the analysis software it stores.
With further reference to Fig. 4, the steps for creating a project and a task in the present invention are: click the analysis platform to enter My Projects, click New Project, enter the project name and project description, and select the field label and species label. Then click the name of the created project and create a new task.
Referring to Fig. 5, parameter setting mainly covers sequencing data quality control, transcriptome assembly, gene function annotation, expression analysis, and gene structure analysis. A folder of sequence files in fastq format can be selected as input here.
Sequencing data quality control evaluates the sequencing quality of the raw data of each sample in the selected fastq folder, including A/T/G/C base content distribution statistics, base quality distribution statistics, and base error rate distribution statistics. The raw sequencing data are then filtered; parameters such as sequencing type, minimum quality value, and minimum length must be set in order to remove sequencing adapter sequences, low-quality reads, sequences with a high proportion of undetermined bases, and sequences that are too short. The data after quality control are then summarized again with the same statistics: A/T/G/C base content distribution, base quality distribution, and base error rate distribution.
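The filtering criteria above can be sketched in Python. The patent does not name the filtering tool or its defaults, so the function names, the exact-match adapter trimming, the use of mean quality as the low-quality criterion, and all threshold values below are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def trim_adapter(seq: str, quals: Sequence[int], adapter: str) -> Tuple[str, List[int]]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, list(quals)
    return seq[:pos], list(quals[:pos])


def keep_read(
    seq: str,
    quals: Sequence[int],
    min_quality: float = 20.0,    # hypothetical "minimum quality value"
    min_length: int = 50,         # hypothetical "minimum length"
    max_n_fraction: float = 0.1,  # hypothetical cap on undetermined bases (N)
) -> bool:
    """Return True if the read survives the length, N-content, and quality filters."""
    if not seq or len(seq) < min_length:
        return False  # too short
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False  # too many undetermined bases
    mean_quality = sum(quals) / len(quals) if quals else 0.0
    return mean_quality >= min_quality  # low-quality reads are dropped
```

In practice a dedicated read-trimming tool would handle partial adapter matches and paired-end reads; the sketch only shows how the user-set parameters map onto the four removal criteria.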
Transcriptome assembly takes all sequencing reads and assembles them de novo into contigs and singletons; this analysis is the basis of subsequent processing and biological function analysis. The parameters to set are the minimum contig length, read orientation, k-mer length, and minimum k-mer count. The visual presentation of the assembly includes an assembly result statistics table, the sequence length distribution, and a sample-versus-assembly comparison table.
Gene function annotation aligns the assembled sequences against databases such as NR, Pfam, Swissprot, String, KEGG, GO, and COG, and provides per-database statistics, comprehensive statistics, and custom screening of the alignment results.
Expression analysis first computes expression statistics, then, based on the calculated gene expression levels, analyzes differential expression between pairs of samples or between groups, and finally studies the differential genes from multiple angles using different analysis methods. The settable parameters are the sample grouping file, control group scheme, expression metric, and significance level. The analyses available for differential gene research include cluster analysis, KEGG enrichment analysis, GO enrichment analysis, KEGG statistical analysis, GO statistical analysis, and so on.
Gene structure analysis includes SNP analysis, SSR analysis, and ORF prediction; the user chooses whether to design SSR primers and whether to align against the Pfam database.
The analysis in the interactive result analysis module is shown in Fig. 6 and mainly includes sequencing data quality control, transcriptome assembly, expression and differential gene analysis, and gene structure analysis.
Sequencing data quality control comprises two modules: raw data statistics and quality-controlled data statistics.
The raw data statistics table reports, for each sample, the total number of raw reads, the total number of bases, the sequencing error rate, the proportion of bases with a sequencing error rate ≤1%, the proportion of bases with a sequencing error rate ≤0.1%, and the number of G/C bases as a percentage of total bases; the related information can also be viewed through the base quality distribution plot, base error rate distribution plot, and base distribution plot. The table can be downloaded directly on the interactive page, and can also be viewed and downloaded in the project file. In the chart tool, the user can choose colors and change the main title, X-axis title, and Y-axis title. A region of a distribution plot can be enlarged by selecting it, and the whole plot can be enlarged with the drag-to-zoom tool in its lower right corner. Clicking the Save to Report button stores the picture at the corresponding position in the report.
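The columns of the raw data statistics table can be computed from Phred quality scores, where a score Q corresponds to an error probability of 10^(-Q/10), so an error rate ≤1% means Q ≥ 20 and ≤0.1% means Q ≥ 30. The following Python sketch is illustrative only; the function names and output keys are hypothetical, and it assumes each read carries one quality score per base.

```python
from typing import Dict, Iterable, Sequence, Tuple


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def summarize_reads(reads: Iterable[Tuple[str, Sequence[int]]]) -> Dict[str, float]:
    """Compute per-sample statistics from (sequence, Phred scores) pairs."""
    total_reads = total_bases = q20 = q30 = gc = 0
    error_sum = 0.0
    for seq, quals in reads:
        total_reads += 1
        total_bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        for q in quals:
            error_sum += phred_to_error(q)
            q20 += q >= 20  # error rate <= 1%
            q30 += q >= 30  # error rate <= 0.1%
    if total_bases == 0:
        return {"total_reads": total_reads, "total_bases": 0, "error_rate": 0.0,
                "q20_fraction": 0.0, "q30_fraction": 0.0, "gc_fraction": 0.0}
    return {
        "total_reads": total_reads,
        "total_bases": total_bases,
        "error_rate": error_sum / total_bases,  # mean per-base error probability
        "q20_fraction": q20 / total_bases,
        "q30_fraction": q30 / total_bases,
        "gc_fraction": gc / total_bases,
    }
```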
The quality-controlled data statistics report the same items for the sequencing data after quality control.
Transcriptome assembly comprises two modules: sequence assembly and assembly result comparison.
The sequence assembly results are shown as an assembly result statistics table, a sequence length distribution table, and a sequence length distribution plot. The sequence length distribution table and plot can be displayed with a switchable step size: the number of sequences within each step range is shown as requested by the user, and sequences longer than a given length can be summed and shown together.
The assembly result comparison generates a comparison result statistics table.
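The switchable-step length distribution described above amounts to binning sequence lengths by a step size and pooling everything above a cutoff. A minimal Python sketch follows; the function name, bin labels, and default values are hypothetical.

```python
from typing import Dict, Iterable


def length_distribution(
    lengths: Iterable[int], step: int = 200, max_length: int = 1000
) -> Dict[str, int]:
    """Count sequences per length bin of width `step`; pool lengths > max_length."""
    bins: Dict[str, int] = {}
    for start in range(0, max_length, step):
        bins[f"{start + 1}-{start + step}"] = 0
    overflow = f">{max_length}"
    bins[overflow] = 0
    for n in lengths:
        if n < 1:
            raise ValueError("sequence lengths must be positive")
        if n > max_length:
            bins[overflow] += 1  # summed display of long sequences
        else:
            start = ((n - 1) // step) * step
            bins[f"{start + 1}-{start + step}"] += 1
    return bins
```

Changing `step` corresponds to switching the step size in the table and plot; `max_length` is the cutoff above which sequences are summed into a single row.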
Expression and differential gene analysis comprises four modules: functional annotation, expression analysis, differential gene research, and gene co-expression network analysis.
Functional annotation comprises two modules, functional annotation overview and functional annotation query. The functional annotation overview includes a general overview and the display of alignment results for NR, Pfam, Swissprot, String, GO, COG, and KEGG. The general overview provides comprehensive statistics and custom screening of the results, including an annotation summary statistics table, an annotation statistics bar chart, and an annotation statistics Venn diagram. The bar chart shows the annotation status of genes or transcripts; in the Venn diagram, clicking a number displays the associated elements, and elements can also be searched by input. Alignment against the NR database shows the similarity between the transcript sequences of this species and those of related species, together with the function information of homologous sequences. After alignment with the NR database, the species matched, the E-value distribution, and the sequence similarity distribution are tallied; these three aspects reflect the reliability of the annotation results. The results are shown as a species classification statistics table, a species classification pie chart, a species classification bar chart, an NR E-value distribution pie chart, and an NR similarity distribution pie chart. The species classification statistics table can be filtered by taxonomic level. The Pfam database is a large collection of protein families, and can be used to annotate the assembled transcripts with protein families.
Using Swiss-Prot protein group sequence alignment results, GO classification is carried out to gene.GO databases are to gene and protein function Unified restriction and description are carried out, the bioprocess that can be participated in using GO databases for one or one group of gene according to it, Three aspects of molecular function and cellular component carry out classification annotation, there is many small levels again below these three big branches (level), level ranks numeral is bigger, and function is more careful.Gene or protein can be corresponding by ID or Sequence annotation Method finds corresponding GO numberings, and GO numberings can be used for functional category or cellular localization.GO comparison informations can be shown Summary statistics table, the level statistical forms of GO 234 and GO statistic of classification figures are annotated for GO.The wherein level statistical forms of GO 234 can Screened according to sequence type (gene, transcript) and GO level horizontals.And GO statistic of classifications figure then can by abundance by height to Low screening species display.String databases can be used for the interaction of prediction protein, can by comparing String databases To obtain the COG classification informations of protein coding gene, COG annotations are carried out to result, function classification is carried out to all transcripts. 
The result that COG is compared is shown as COG categorised statistical forms and COG statistic of classification block diagrams, can be according to different sequence type (bases Cause, transcript) checked.KEGG databases are the big of network analysis gene function, contact genomic information and function information Type knowledge base.Compared with KEGG databases, obtain KO numberings corresponding to transcript, can obtain certain transcript according to KO numberings can The specific biological pathways that can be participated in.The result that KEGG is compared is shown as Pathway distribution histograms, Pathway statistic of classifications Table, Pathway Information Statistics table, Pathway paths figure and Pathway statistic of classification block diagrams, can be according to sequence type (base Because of, transcript) checked, it can also screen transcript profile or number gene pathway in the top is shown.Functional annotation is inquired about It can be carried out according to transcript length, sequence name, species, COG ID, GO ID, KO ID and KO name, as a result be shown as functional annotation Information table, the comparison annotation result comprising each database.
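The annotation summary statistics and the Venn diagram regions can be illustrated with plain set operations. The sketch below assumes the annotation results are already available as a mapping from database name to the set of annotated sequence IDs; the function names and the sample IDs are hypothetical.

```python
def annotation_summary(annotated, total):
    """Return {database: (annotated count, percentage of all sequences)}."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {db: (len(ids), round(100.0 * len(ids) / total, 2))
            for db, ids in annotated.items()}


def venn_regions(annotated):
    """Group sequence IDs by the exact combination of databases annotating them.

    Keys are sorted tuples of database names; each value is the set of IDs
    found in exactly those databases, i.e. one region of the Venn diagram.
    """
    regions = {}
    all_ids = set().union(*annotated.values())
    for seq_id in all_ids:
        key = tuple(sorted(db for db, ids in annotated.items() if seq_id in ids))
        regions.setdefault(key, set()).add(seq_id)
    return regions


# Hypothetical input: five transcripts in total, one of them unannotated.
annotated = {
    "NR": {"t1", "t2", "t3"},
    "GO": {"t2", "t3"},
    "KEGG": {"t3", "t4"},
}
print(annotation_summary(annotated, total=5))
for combo, ids in sorted(venn_regions(annotated).items()):
    print(combo, sorted(ids))
```

Clicking a number on the Venn diagram would then correspond to looking up one key of the `venn_regions` result.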
Expression analysis includes expression quantity statistics, inter-sample correlation analysis and PCA analysis. The results of expression quantity statistics are a single-sample expression quantity distribution table, an expression quantity distribution chart and an expression quantity statistics table. In the statistics table, transcripts can be searched according to the gene expression matrix, transcript expression matrix, gene count matrix and transcript count matrix, and annotation details can be viewed. The results of inter-sample correlation analysis include a correlation coefficient matrix clustering tree, a correlation coefficient matrix heat map and an inter-sample correlation coefficient table. The PCA results include a PCA plot, a principal component explained-variance table and the related PCA data. Referring to Fig. 7, the chart tool for the PCA plot lets the user freely choose the principal components shown on the X and Y axes and, by choosing whether to select a Z axis, present either a planar or a three-dimensional plot. The color of the points is determined by the color scheme and their shape by the shape scheme. The user can choose whether to display point names and whether to display a main title, and can customize the main title and the X, Y and Z axis titles.
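As an illustration of the inter-sample correlation coefficient table, the following sketch computes Pearson correlation coefficients between samples from an expression matrix. The source does not state which correlation coefficient is used, so Pearson is an assumption here, and the function names and sample data are hypothetical. PCA is not covered by this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / (sd_x * sd_y)


def sample_correlation(expr):
    """expr maps sample name -> expression values in the same gene order.

    Returns a nested dict forming the symmetric correlation coefficient table.
    """
    samples = list(expr)
    return {a: {b: pearson(expr[a], expr[b]) for b in samples} for a in samples}


# Hypothetical expression values for four genes in three samples.
expr = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [2.0, 4.0, 6.0, 8.0],
    "sample_C": [4.0, 3.0, 2.0, 1.0],
}
table = sample_correlation(expr)
for name, row in table.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

The same table could feed both the heat map and the clustering tree, since each is a different view of the coefficient matrix.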
Differential gene research includes differential gene statistics, differential gene analysis, differential gene GO annotation and enrichment analysis, and differential gene KEGG annotation and enrichment analysis. The results of differential gene statistics are presented as a differential gene statistics table and differential gene statistics plots (scatter plot and volcano plot). For this analysis the significance level, grouping scheme and control group scheme can be selected before calculation. Differential gene analysis includes differential gene screening, expression pattern clustering and Venn analysis. Differential gene screening is calculated from the gene expression difference statistics table by selecting pairwise combinations and a combination mode (union screening or intersection screening). The result is a differential gene screening result table, in which genes can be searched and annotation details viewed. Expression pattern clustering is computed from the differential gene expression table after selecting the clustering method (hclust, kmeans), distance algorithm (manhattan, euclidean), log base (10, 2), number of subclusters and gene selection scheme. The results are shown as a differential gene expression heat map, a differential gene heatmap analysis table, a subcluster heatmap analysis table and a gene cluster trend chart. Venn analysis is computed from the differential gene expression table after selecting or creating a grouping scheme, and the results are shown as a differential gene Venn statistics table and a differential gene Venn diagram. Differential gene GO annotation and enrichment analysis includes GO classification statistics and GO enrichment analysis. GO classification statistics is computed from the gene expression difference statistics table by selecting pairwise combinations, and the results are presented as a differential gene GO annotation bar chart and a GO analysis classification statistics table. GO enrichment analysis is computed from the gene expression difference statistics table after selecting pairwise combinations, regulation type (up-regulated, down-regulated), significance level and multiple testing correction method (BH, FDR). The results are presented as a GO enrichment statistics table, a GO enrichment analysis bar chart, a directed acyclic graph of significant GO terms and a GO enrichment analysis bubble chart. Differential gene KEGG annotation and enrichment analysis includes KEGG statistical analysis and KEGG enrichment analysis. KEGG statistical analysis is computed from the gene expression difference statistics table by selecting pairwise combinations, and the results are shown as a KEGG statistical analysis table and differential gene KEGG pathway maps. KEGG enrichment analysis is computed from the gene expression difference statistics table after selecting pairwise combinations, regulation type (up-regulated, down-regulated) and multiple testing correction method (BY, BH, None, Qvalue). The results are a KEGG enrichment analysis bar chart and a KEGG enrichment analysis scatter plot.
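Among the multiple testing correction options listed for enrichment analysis, "BH" is assumed here to denote the Benjamini-Hochberg procedure. The sketch below shows that procedure on a list of raw p-values; the function name `bh_adjust` is hypothetical and the code is not taken from the platform.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then a
    running minimum is taken from the largest rank downwards so that the
    adjusted values are monotone and never exceed 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print([round(q, 4) for q in adjusted], significant)
```

Terms whose adjusted value falls below the chosen significance level would be the ones reported as enriched.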
Gene co-expression network analysis is computed from the differential gene expression table after setting the β soft threshold and the module similarity threshold. The results are displayed as a differential gene network table, a network diagram, a differential gene module table, single-module networks, a soft power distribution chart and a module tree diagram.
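The role of the β soft threshold can be illustrated with the adjacency used in weighted co-expression networks, where the connection strength between two genes is the absolute correlation raised to the power β. The source does not name the underlying tool, so this formula is an assumption, and the names below are hypothetical.

```python
import math


def _pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def soft_threshold_adjacency(expr, beta=6):
    """expr maps gene -> expression values across the same samples.

    Returns adjacency[g][h] = |cor(g, h)| ** beta, with 1.0 on the diagonal.
    A larger beta suppresses weak correlations more strongly.
    """
    genes = list(expr)
    return {
        g: {h: 1.0 if g == h else abs(_pearson(expr[g], expr[h])) ** beta
            for h in genes}
        for g in genes
    }


# Hypothetical expression profiles of three genes over four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(expr, beta=2)
print(round(adjacency["g1"]["g3"], 4))
```

Module detection and the module similarity threshold are outside the scope of this sketch.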
Gene structure analysis includes three modules: SSR analysis, SNP analysis and ORF analysis. SSR analysis is computed from the gene sequences and the transcript file, with the option of whether to design primers. The results include an SSR statistics table, an SSR type statistics table, an SSR type statistics bar chart and an SSR primer statistics table. The results of SNP analysis are presented as an SNP type statistics table, an SNP type distribution chart, an SNP position statistics table, an SNP position statistics pie chart and an SNP result statistics table. The SNP statistics table can be screened by sample or SNP type (A/T, A/C, A/G, C/T, C/G). ORF prediction results are presented as an ORF prediction result table, an ORF sequence length distribution table, a sequence length distribution chart and an ORF protein domain annotation table.
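A simple way to picture SSR detection is a regular-expression scan for short motifs repeated in tandem. The sketch below is only an illustration: the source does not specify the SSR tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions.

```python
import re

# Assumed minimum number of tandem repeats for each motif length (1 to 6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat count, start, end) tuples with 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((motif, repeats, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical transcript containing one dinucleotide and one trinucleotide SSR.
transcript = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
for motif, repeats, start, end in find_ssrs(transcript):
    print(motif, repeats, start, end)
```

Counting the hits by motif length would give the kind of figures shown in an SSR type statistics table.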
Clicking the save-to-report button on the interactive analysis page stores the analysis results in the corresponding position of the report. Descriptions of the software and methods, and of the biological significance, can be viewed in the report. A question-and-answer section in the upper right corner of the interactive analysis page further answers questions that may arise about parameter settings or biological significance.
In the cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes of the present invention, the result files produced by interactive analysis are integrated and packaged into the corresponding project file according to a preset format. The result files can be downloaded for further analysis.
In the cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes of the present invention, the user can freely select the sequencing data and customize the required parameters. Basic analysis is performed on the sequencing data using the configuration file, and the results are presented as charts and reports. Compared with prior-art analysis performed manually, the present invention performs the analysis automatically and can therefore improve the efficiency of eukaryotic reference-free transcriptome analysis.
The embodiment of the present invention further includes advanced bioinformatics analysis and personalized bioinformatics analysis, which perform further analysis on the basis of the standard analysis. This improves the utilization of the data obtained from the standard analysis and mines the data more deeply and in a more targeted way, so that eukaryotic reference-free transcriptome analysis is no longer limited to the single path of a traditional commercial pipeline. The efficiency and data utilization of eukaryotic reference-free transcriptome analysis are improved: unlimited advanced and personalized analyses can be performed on one set of basic data, saving time and experimental cost.
In this embodiment, the cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes analyzes the sequencing data with Trinity, which is widely recognized in the industry, and the generated charts meet the requirements of professional journals. In the workflow interface, following step 1 to step 5 in order, the user selects the Fastq files of interest, selects the sequencing type, sets the analysis parameters, and clicks save and run to carry out the interactive analysis of a eukaryotic reference-free transcriptome project. On the interactive analysis page, the user can freely select samples, change the grouping, change the color scheme and perform similar operations without repeatedly communicating with and waiting for an analyst, which greatly shortens the project cycle.
The above embodiments merely describe preferred embodiments of the present invention and do not limit its scope. Without departing from the design spirit of the present invention, all modifications and improvements made to the technical solution of the present invention by a person of ordinary skill in the art shall fall within the scope of protection defined by the claims of the present invention.

Claims (7)

1. A cloud-computing-platform-based interactive analysis system for eukaryotic reference-free transcriptomes, characterized by comprising:
A project management module, for viewing and managing project information, and for the integrated management of analysis projects in all states through projects, tasks, applications and files;
A basic analysis task submission module, for setting basic parameters and running tasks, and for integrating and packaging the results and raw data into a corresponding project file according to a preset format, the basic-parameter tasks including sequencing data quality control, transcriptome assembly, gene functional annotation, expression analysis and gene structure analysis;
An interactive result analysis module, for generating analysis results according to user requests and displaying the results visually, including advanced bioinformatics analysis and personalized bioinformatics analysis;
The project management module is connected to the interactive result analysis module through the basic analysis task submission module.
2. The cloud-computing-platform-based interactive analysis system for eukaryotic reference-free transcriptomes according to claim 1, characterized in that the advanced bioinformatics analysis and personalized bioinformatics analysis in the interactive result analysis module include gene co-expression network construction, Ipath integrated analysis, protein interaction network analysis, transcription factor analysis, homologous annotation analysis against model species, time-series-based gene expression analysis, phylogenetic tree construction from transcriptome data, orthologous gene analysis of closely related species, GO/KEGG analysis of divergent homologous genes, GO/KEGG analysis of conserved homologous genes, selection pressure analysis at the GO classification level, and tree hypothesis testing analysis.
3. An analysis method of the cloud-computing-platform-based interactive analysis system for eukaryotic reference-free transcriptomes according to claim 1, characterized by comprising the following steps:
Step 1, creating a project;
Step 2, uploading sequencing data to a local cluster server and establishing the project in the project management module, where the project can be locked or shared with others for operation;
Step 3, creating a task;
Step 4, in the basic analysis task submission module, the user performs parameter analysis on the sequencing data through a visual interface, and a project file is produced after the analysis; before the analysis, quality control first checks whether the data are qualified; if qualified, parameter analysis is performed; if unqualified, an error is returned directly; parameter analysis includes sequencing data quality control, transcriptome assembly, gene functional annotation, expression analysis and gene structure analysis;
Step 5, the project file produced is sent to the interactive result analysis module for interactive analysis, in which secondary analysis and statistics are performed on the project file according to the user's needs and an intuitively presented report is obtained, including advanced bioinformatics analysis and personalized bioinformatics analysis.
4. The cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes according to claim 2, characterized in that the project management module is further used to view and manage uploaded files and files produced by analysis, which can be uploaded, searched, copied, moved, deleted and downloaded; the project management module is further used to mark project status, which can be not started, in progress, completed, terminated or problematic; the project management module is further used to view the status and log information of running tasks; and the project management module is further used to share projects and manage member permissions.
5. The cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes according to claim 2, characterized in that the interactive result analysis module includes a chart tool that can change the color scheme, shape scheme and bar orientation, and can display legends and point names and perform merging or sorting; analysis results of the interactive result analysis module can be saved to a report and displayed in the report; result figures of the interactive result analysis module can be downloaded in PNG, JPEG, PDF or SVG format; and the report of the interactive result analysis module can be in html or pdf format.
6. The cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes according to claim 2, characterized in that the operation of the project management module, the basic analysis task submission module and the interactive result analysis module is based on an html+css+jquery front-end page and a PHP+Alpha server back end; after receiving a task execution command, the interactive analysis module calls server-side scripts written in computer languages such as Perl, C, python and R to perform basic analysis on the sequencing data.
7. The cloud-computing-platform-based interactive analysis method for eukaryotic reference-free transcriptomes according to claim 2, characterized in that, at different stages of analyzing the sequencing data, the basic analysis task submission module selects the corresponding analysis software from the analysis software it stores to analyze the sequencing data.
CN201710598315.4A 2017-07-21 2017-07-21 Eucaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method Pending CN107391963A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710598315.4A CN107391963A (en) 2017-07-21 2017-07-21 Eucaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method
CN201810797352.2A CN109243532A (en) 2017-07-21 2018-07-19 Eukaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method

Publications (1)

Publication Number Publication Date
CN107391963A true CN107391963A (en) 2017-11-24

Family

ID=60336487

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710598315.4A Pending CN107391963A (en) 2017-07-21 2017-07-21 Eucaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method
CN201810797352.2A Pending CN109243532A (en) 2017-07-21 2018-07-19 Eukaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810797352.2A Pending CN109243532A (en) 2017-07-21 2018-07-19 Eukaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method

Country Status (1)

Country Link
CN (2) CN107391963A (en)

Also Published As

Publication number Publication date
CN109243532A (en) 2019-01-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171124
