CN112599192A - New coronavirus whole genome analysis system based on nanopore sequencing - Google Patents
- Publication number
- CN112599192A CN202011641513.2A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
Abstract
The invention provides a nanopore sequencing-based whole genome analysis system for the new coronavirus. The system establishes a complete analysis workflow for new coronavirus sequencing data, performs quality control, genome coverage, variation detection, genome assembly and genome integrity analysis on both second-generation and third-generation sequencing data, and performs tree analysis on the variation detection results. The variation detection results are associated with the sample, which makes it easier to trace and control the epidemic history of the new coronavirus. In addition, the whole sequencing analysis workflow is displayed visually, an operator can carry out the analysis simply by following the operation instructions on the operation interface, and the analysis results are presented comprehensively in chart form.
Description
Technical Field
The invention relates to a gene analysis system, in particular to a new coronavirus whole genome analysis system based on nanopore sequencing.
Background
The new coronavirus (SARS-CoV-2), the virus that causes an infectious atypical pneumonia, can be diagnosed by real-time PCR, viral gene sequencing or virus-specific antibody detection, and whole genome sequencing is an effective way to detect and control it. Nanopore sequencing, which is based on electrical signal detection, is currently the high-throughput sequencing platform with the simplest experimental operation and the fastest sequencing speed. However, the large amount of raw data it generates must be analyzed by professional experimenters who have undergone long training. These experimenters call various programs from the Linux shell command line to carry out analysis work on the raw sequences, such as filtering, sequence alignment, microbial species classification, microbial read count statistics, pathogenic microorganism detection, target species data extraction and genome integrity calculation. The drawbacks are that the experimenters need strong bioinformatics and Linux operating skills, and that each analysis program has different options and parameters, so a great deal of time is spent repeatedly searching for and adjusting programs and parameters. As a result, efficiency is low, visual presentation of the data is poor, and the degree of automation is low.
In summary, there is currently no simple, easy-to-operate sequencing analysis system for whole genome analysis of the new coronavirus, and variation detection of the new coronavirus cannot be performed rapidly.
Disclosure of Invention
The invention aims to provide a nanopore sequencing-based whole genome analysis system for the new coronavirus that integrates multiple analysis programs, has a simple and clear operation interface that experimenters can learn to operate in a short time, can detect variant genes, and associates the analysis results with sample information to facilitate management and control of the new coronavirus.
In order to achieve the above object, the present technical solution provides a new coronavirus whole genome analysis system based on nanopore sequencing, comprising:
the system comprises a data analysis system and a sample management system which are mutually associated, wherein the data analysis system is used for acquiring sequencing data of a pathogen to be detected so as to identify the type of the pathogen to be detected, the sample management system is used for acquiring sample information corresponding to the pathogen to be detected, and the sequencing data is associated with the sample information;
the data analysis system includes:
the task establishing unit is used for establishing an analysis task corresponding to the sequencing data of the pathogen to be detected, wherein the analysis task stores the sequencing data and analysis parameters of the pathogen to be detected;
a reference genome storing a reference gene sequence of the new coronavirus;
the sequence comparison unit is used for obtaining a comparison instruction, comparing the sequencing data with the reference genome and obtaining a detection sequence of the pathogen to be detected;
the sequence analysis unit is used for acquiring a sequence analysis instruction and performing at least one sequence analysis task of genome coverage rate, variation detection, genome assembly and genome integrity on the basis of the detection sequence;
and the analysis report generating unit is used for acquiring the report instruction, extracting the analysis result data of the sequence analyzing unit and the sequence comparing unit, and associating the analysis result data with the sample management system to generate an analysis report.
Compared with the prior art, the technical solution has the following features and beneficial effects: it displays the analysis workflow visually, simplifies the input of parameter adjustments, and analyzes sequencing results with one click; it integrates the analysis workflow and provides negative/positive detection of the new coronavirus genome as well as genome variation detection; and it provides a graphical interface and one-click generation of a detection report in PDF format, which makes interpretation of the sequencing data simpler.
Drawings
FIG. 1 is a schematic diagram of a framework of a nanopore sequencing based genome wide analysis system for a new coronavirus according to the present invention.
FIG. 2 is a schematic diagram of input data and analysis parameters.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
It is understood that the terms "a" and "an" should be interpreted as "at least one" or "one or more". That is, the number of an element may be one in one embodiment and more than one in another embodiment, and the terms "a" and "an" should not be interpreted as limiting the number.
The invention constructs a nanopore sequencing-based whole genome analysis system for the new coronavirus. The system establishes a complete analysis workflow for new coronavirus sequencing data, performs quality control, genome coverage, variation detection, genome assembly and genome integrity analysis on both second-generation and third-generation sequencing data, and performs tree analysis on the variation detection results. The variation detection results are associated with the sample, which makes it easier to trace and control the epidemic history of the new coronavirus. In addition, the whole sequencing analysis workflow is displayed visually, an operator can carry out the analysis simply by following the operation instructions on the operation interface, and the analysis results are presented comprehensively in chart form.
The scheme of the nanopore sequencing-based new coronavirus whole genome analysis system is as follows. A comprehensive analysis workflow is built on the sequencing data of suspected pathogens obtained by nanopore sequencing, and the suspected pathogens are aligned against the reference genome and subjected to variation detection. This solves the problem that conventional fluorescence quantitative PCR detection of the new coronavirus cannot be used for viral variation detection or viral evolution analysis. In addition, the analysis system supports a full range of data types: it supports nanopore sequencing data in fastq, fast5 and barcoded fastq formats, and also single-end and paired-end fastq second-generation data. The analysis workflow is visualized, and the analysis results can be generated with one click and displayed graphically.
Fig. 1 shows a schematic diagram of the framework of the nanopore sequencing-based new coronavirus whole genome analysis system, which can perform gene sequence analysis on a pathogen to be detected, identify whether the pathogen is the new coronavirus, and identify variations of the new coronavirus. The system comprises:
the system comprises a data analysis system and a sample management system which are mutually associated, wherein the data analysis system is used for acquiring sequencing data of a pathogen to be detected so as to identify the type of the pathogen to be detected, the sample management system is used for acquiring sample information corresponding to the pathogen to be detected, and the sequencing data is associated with the sample information;
the data analysis system includes:
the task establishing unit is used for establishing an analysis task corresponding to the sequencing data of the pathogen to be detected, wherein the analysis task stores the sequencing data and analysis parameters of the pathogen to be detected;
a reference genome storing a reference gene sequence of the new coronavirus;
the sequence comparison unit is used for obtaining a comparison instruction, comparing the sequencing data with the reference genome and obtaining a detection sequence of the pathogen to be detected;
the sequence analysis unit is used for acquiring a sequence analysis instruction and performing at least one sequence analysis task of genome coverage rate, variation detection, genome assembly and genome integrity on the basis of the detection sequence;
and the analysis report generating unit is used for acquiring the report instruction, extracting the analysis result data of the sequence analyzing unit and the sequence comparing unit, and associating the analysis result data with the sample management system to generate a visual analysis report.
In this scheme, the system can be used both to identify whether an unknown pathogen is the new coronavirus and to detect the variations and evolution of the new coronavirus. That is, when the match rate between the gene sequence of the pathogen to be detected and the gene sequence in the reference genome is greater than a set threshold, the pathogen to be detected is determined to be the new coronavirus, and subsequent variation detection is carried out on it to obtain the variation status and evolutionary course of the new coronavirus. After the sequence comparison unit obtains the comparison instruction, it checks whether the sequencing data of the pathogen to be detected contain a gene sequence corresponding to the reference genome, and if so, the pathogen to be detected is judged to be the new coronavirus.
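The threshold rule described above can be sketched as follows. This is only a minimal illustration: the patent does not state how the match rate is computed or what threshold is used, so the function names, the position-by-position comparison of pre-aligned sequences and the default threshold of 0.9 are all assumptions.

```python
def match_rate(detected: str, reference: str) -> float:
    """Fraction of reference positions matched by the detected sequence.

    Assumes both strings are already aligned to the same length and that
    '-' marks a gap (hypothetical convention, not from the patent).
    """
    if len(detected) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        return 0.0
    matches = sum(1 for d, r in zip(detected, reference) if d == r and r != "-")
    return matches / len(reference)


def is_new_coronavirus(detected: str, reference: str, threshold: float = 0.9) -> bool:
    """Apply the rule 'match rate greater than a set threshold'.

    The default threshold is an illustrative placeholder.
    """
    return match_rate(detected, reference) > threshold
```

In a real pipeline the match rate would more likely be derived from read alignments than from a single pre-aligned string; the sketch only shows the decision step.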
The task establishing unit is provided with a plurality of interfaces for different types of data, a different sequencing analysis channel is arranged for each interface, and the corresponding sequencing analysis channel is selected according to the type of the sequencing data. Specifically, the multiple interfaces of the task establishing unit enable the analysis system to analyze not only fastq and fast5 nanopore sequencing data but also single-end and paired-end fastq second-generation sequencing data, and to process multiple types of sequencing data at the same time. The analysis system is therefore applicable to data from almost all high-throughput sequencing platforms, such as Illumina, BGI (Huada), Ion Torrent and PacBio. This is because the scheme classifies the tasks created by the task establishing unit and sets up a separate sequencing analysis channel for each.
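One way to picture the per-type channel selection is a small lookup keyed on sequencing generation and data mode. The sketch below is illustrative only: the enum, the channel naming scheme and the function name are assumptions, while the mode names (fast5, fastq, barcoded fastq) are the ones listed in the description.

```python
from enum import Enum


class Generation(Enum):
    SECOND = "second-generation"
    THIRD = "third-generation"


# Modes named in the description; the mapping structure itself is hypothetical.
ALLOWED_MODES = {
    Generation.SECOND: {"fastq"},
    Generation.THIRD: {"fast5", "fastq", "barcoded fastq"},
}


def select_channel(generation: Generation, mode: str) -> str:
    """Return an identifier for the analysis channel matching the input data."""
    mode = mode.strip().lower()
    if mode not in ALLOWED_MODES[generation]:
        raise ValueError(f"mode {mode!r} is not valid for {generation.value} data")
    return f"{generation.value}/{mode.replace(' ', '_')}"
```

Keeping the allowed modes in one table means an unsupported combination, such as fast5 for a second-generation task, is rejected before any analysis starts.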
Specifically, the task establishing unit comprises a second-generation sequencing task module and a third-generation sequencing task module. The second-generation sequencing task module stores second-generation sequencing tasks and their analysis parameters, and the third-generation sequencing task module stores third-generation sequencing tasks and their analysis parameters. The task establishing unit is provided with a parameter setting module for manually adjusting the analysis parameters of the sequencing data, and the parameter setting module is displayed on the system interface as part of the visualized workflow. It is worth mentioning that an independent sequencing analysis channel and a storage folder are established for each sequencing task.
Specifically, the analysis parameters for a third-generation sequencing task include the task name, mode, sequence path, sample mixing reagent, number of threads, length limit, accuracy limit, consistency depth and SNP accuracy Q value. The task name names each sequencing analysis channel so that the user can quickly locate and manage the sequencing data that have been set up. The mode is one of fast5, fastq and barcoded fastq, and a different downstream analysis channel is selected for each mode. The sequence path is the file path of the folder containing the sequencing data. The sample mixing reagent selects, according to the analysis type, between single-sample sequencing and the multi-sample sequencing reagent scheme used for Nanopore sequencing.
The analysis parameters for a second-generation sequencing task include the task name, sequence selection, number of threads, consistency depth and SNP accuracy Q value, and the mode corresponding to a second-generation sequencing task is the fastq mode.
In particular, the analysis parameters of a sequencing task can be set manually. The mode of the sequencing data determines the subsequent sequencing analysis work. By default the number of threads is 10, the length limit is 500, the accuracy limit is 80, the consistency depth is 20 and the SNP accuracy Q value is 20, and each parameter can be adjusted through the parameter setting module. Because the scheme provides a separate sequencing analysis channel for each task, it can handle different types of data in a targeted manner.
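The third-generation parameter set and its defaults could be captured in a small record type. The sketch below is an assumption about structure only: the field names and the boolean used for the sample mixing reagent are hypothetical, while the default values (10, 500, 80, 20, 20) and the three modes are taken from the description.

```python
from dataclasses import dataclass

THIRD_GEN_MODES = ("fast5", "fastq", "barcoded fastq")


@dataclass
class ThirdGenTaskParams:
    task_name: str                # names the sequencing analysis channel
    mode: str                     # one of THIRD_GEN_MODES
    sequence_path: str            # folder containing the sequencing data
    multi_sample: bool = False    # sample mixing reagent: single vs multi-sample (assumed encoding)
    threads: int = 10
    length_limit: int = 500
    accuracy_limit: int = 80
    consistency_depth: int = 20
    snp_accuracy_q: int = 20

    def __post_init__(self) -> None:
        # Reject modes the description does not list for third-generation tasks.
        if self.mode not in THIRD_GEN_MODES:
            raise ValueError(f"unsupported mode: {self.mode!r}")
        if self.threads < 1:
            raise ValueError("threads must be at least 1")
```

A task created with only the three required fields picks up the defaults, and any single value can then be overridden, which mirrors the role of the parameter setting module.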
The user inputs sequencing data on the interface of the analysis system and fills in or modifies the corresponding analysis parameters as prompted. The task establishing unit creates the corresponding storage folder based on the sequencing data and analysis parameters obtained. If the input is a third-generation sequencing task, the options for the different data modes are displayed.
Sample information corresponding to the sequencing data is entered into the sample management system. The sample information includes, but is not limited to, the task name of the sequencing data, the sampling information of the sequencing data, and the personal information of the person sampled. The sampling information includes the sample type, sampling date and sequencing date. The personal information includes the name, gender and age of the person sampled. The sample information is associated with the sequencing data and stored in the folder corresponding to the sequencing data, or the sequencing data are associated with the sample information and stored in the sample management system.
The user fills in the sample information on the interface of the analysis system as prompted. The task names in the task establishing unit are offered as options for the "task name of the sequencing data" field so that the user can select one directly. Alternatively, the user may type in the task name of the sequencing data to match the corresponding sequencing data from the task establishing unit.
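The association between sample information and sequencing tasks by task name might look like the following sketch. The class and field names are hypothetical; only the listed fields (task name, sample type, sampling date, sequencing date, name, gender, age) come from the description.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SampleInfo:
    task_name: str
    sample_type: str
    sampling_date: date
    sequencing_date: date
    person_name: str
    person_gender: str
    person_age: int


class SampleRegistry:
    """Links sample records to existing analysis tasks via the task name."""

    def __init__(self, task_names):
        self._task_names = set(task_names)
        self._samples = {}

    def options(self):
        """Task names offered to the user for selection."""
        return sorted(self._task_names)

    def register(self, info: SampleInfo) -> None:
        # A sample can only be attached to a task that has been created.
        if info.task_name not in self._task_names:
            raise KeyError(f"no analysis task named {info.task_name!r}")
        self._samples[info.task_name] = info

    def lookup(self, task_name: str) -> SampleInfo:
        return self._samples[task_name]
```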
Once the analysis system has stored the sequencing data in the corresponding folder, subsequent sequencing analysis is carried out according to the user's operation instructions. The scheme configures the trigger interfaces and the cascade relationship of the reference genome, the sequence comparison unit, the sequence analysis unit and the analysis report generation unit according to the sequence analysis workflow. The cascade relationship is as follows: the sequence comparison unit is a lower-level task node of the reference genome, the sequence analysis unit is a lower-level associated task node of the sequence comparison unit, and the analysis report generation unit is a lower-level associated task node of the sequence comparison unit and the sequence analysis unit. The trigger interface of the sequence comparison unit corresponds to the comparison instruction, and the sequence comparison unit is triggered only after the comparison instruction is obtained; the trigger interface of the sequence analysis unit corresponds to the analysis instruction, and the sequence analysis unit is triggered only after the analysis instruction is obtained.
This arrangement reduces the operating load on the analysis system and makes operation easier for operators. The operator selects the required content on the application interface of the analysis system, which generates the corresponding instructions. Because a cascade relationship is set among the units of the analysis system, a lower-level task node cannot be triggered until its upper-level cascade conditions are met, and the data to be analyzed flow through the analysis system in the set direction.
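The cascade relationship can be read as a small dependency graph in which a unit may only be triggered once all of its upstream units have completed. The sketch below assumes that reading; the unit identifiers and the class are hypothetical, and the reference genome is treated as available from the start.

```python
# Upstream units that must be complete before each unit can be triggered.
UPSTREAM = {
    "reference_genome": [],
    "sequence_comparison": ["reference_genome"],
    "sequence_analysis": ["sequence_comparison"],
    "report_generation": ["sequence_comparison", "sequence_analysis"],
}


class Cascade:
    def __init__(self) -> None:
        # Assumption: the stored reference genome counts as already complete.
        self.completed = {"reference_genome"}

    def can_trigger(self, unit: str) -> bool:
        return all(dep in self.completed for dep in UPSTREAM[unit])

    def trigger(self, unit: str) -> None:
        """Mark a unit as run, refusing if its upstream conditions are unmet."""
        if not self.can_trigger(unit):
            raise RuntimeError(f"{unit} cannot run before its upstream units")
        self.completed.add(unit)
```

With this structure, an instruction for a lower-level node issued too early is simply rejected, which matches the statement that lower task nodes cannot be triggered until the higher cascade conditions are met.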
In this scheme, the processing steps of the sequence comparison unit and the sequence analysis unit are independent of each other yet linked, so that the comparison results can be extracted directly by the analysis report generation unit. In addition, because the task establishing unit classifies the data by mode, the sequence comparison unit can run normally using the comparison procedure for the corresponding mode.
In some embodiments, the system comprises a data quality control unit, and the data quality control unit performs quality control on sequencing data of a pathogen to be detected according to set quality control conditions. At this time, the sequence comparison unit is a lower-level associated task node of the data quality control unit, and the sequencing data triggers the sequence comparison unit to perform comparison only after the quality control is completed.
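As an illustration of what set quality control conditions might look like, the sketch below filters a read by length and by an accuracy estimate derived from its Phred+33 quality string. This is an assumption: the patent does not define the conditions, and linking them to the length limit (500) and accuracy limit (80) defaults, as well as the accuracy formula, is a guess made for the example.

```python
def read_accuracy(quality: str, offset: int = 33) -> float:
    """Estimate percent accuracy from a Phred-encoded quality string.

    Uses the mean per-base error probability 10^(-Q/10); this formula is an
    illustrative choice, not one stated in the patent.
    """
    if not quality:
        return 0.0
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return 100.0 * (1.0 - sum(errors) / len(errors))


def passes_qc(sequence: str, quality: str,
              length_limit: int = 500, accuracy_limit: float = 80.0) -> bool:
    """Keep a read only if it is long enough and accurate enough."""
    return len(sequence) >= length_limit and read_accuracy(quality) >= accuracy_limit
```

Only reads for which `passes_qc` returns true would be handed on to the sequence comparison unit in this sketch.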
In this scenario, the reference genome comprises the gene sequence for the new coronavirus. It is worth mentioning that the present scheme detects the variation of the new coronavirus, and if the variation gene sequence is obtained, the reference genome can be updated.
The sequence analysis unit can be divided, according to the analysis content, into an independent variation detection unit, genome assembly unit, genome coverage unit and genome integrity calculation unit, and one or more of these units are triggered according to the analysis instruction, so that the whole workflow can be operated and analyzed simply. That is, the scheme brings together the tasks of new coronavirus gene sequence analysis; the operator selects the tasks to be run as required and triggers the corresponding sequence analysis unit.
Correspondingly, the user selects one or more of variation detection, genome assembly, genome coverage and genome integrity as the analysis task on the corresponding page of the analysis system, and the corresponding sequence analysis results are generated.
The sequence analysis unit is triggered after the sequence comparison unit, and its analysis tasks are carried out only when a new coronavirus gene sequence is detected in the pathogen to be detected. The sequence analysis unit analyzes the new coronavirus gene sequence and the whole gene sequence of the pathogen to be detected. The genome coverage is the coverage of the new coronavirus gene sequence within the whole gene sequence, and the genome integrity is the completeness of the new coronavirus gene sequence.
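The two metrics are defined only loosely in the description. One common interpretation, used here purely as an assumption, is that coverage is the fraction of reference positions reaching a minimum read depth and integrity is the fraction of the reference length with an unambiguous base call in the consensus. Function names and parameters are hypothetical.

```python
def genome_coverage(depths, min_depth: int = 1) -> float:
    """Fraction of reference positions whose read depth is at least min_depth.

    `depths` holds one depth value per reference position.
    """
    depths = list(depths)
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def genome_integrity(consensus: str, reference_length: int) -> float:
    """Fraction of the reference length covered by unambiguous A/C/G/T calls."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / reference_length
```

Setting `min_depth` to the consistency depth (default 20) would count only positions deep enough to support a consensus call, although the patent does not say that the two are linked.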
It is worth mentioning that the sequence analysis unit of this scheme includes a variation detection unit designed for the new coronavirus, and the results of the variation detection unit are displayed as a tree so that the experimenter can analyze them visually. The variation detection unit performs at least one of the following variation detection tasks: variant effect annotation, evolution analysis and sample grouping. Different variation detection tasks are carried out according to different instructions.
Variant effect annotation: a gene annotation file is built in, and the variations are annotated based on it. Evolution analysis: the gene sequence is analyzed against the GISAID reference strains to determine whether it has evolved or changed. Sample grouping: the gene sequences are grouped according to the GISAID grouping standard, and gene sequences of the same type are placed in one group. The results of the evolution analysis and sample grouping are displayed as a tree.
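A toy version of variant annotation and sample grouping is sketched below. It is not the GISAID grouping standard, whose rules are not given in the patent; grouping by identical variant sets is a stand-in, and the gene intervals passed to `annotate` are hypothetical placeholders for the built-in gene annotation file.

```python
from collections import defaultdict


def call_snps(reference: str, consensus: str):
    """List (position, ref_base, alt_base) substitutions, 1-based positions.

    Assumes the consensus is already aligned to the reference; positions
    with ambiguous bases are skipped.
    """
    return [(i + 1, r, c)
            for i, (r, c) in enumerate(zip(reference, consensus))
            if r != c and r in "ACGT" and c in "ACGT"]


def annotate(snps, genes):
    """Attach a gene name to each SNP; `genes` maps name -> (start, end), inclusive."""
    annotated = []
    for pos, ref, alt in snps:
        gene = next((name for name, (start, end) in genes.items()
                     if start <= pos <= end), "intergenic")
        annotated.append((pos, ref, alt, gene))
    return annotated


def group_samples(sample_snps):
    """Group sample names that share exactly the same set of SNPs."""
    groups = defaultdict(list)
    for name, snps in sample_snps.items():
        groups[frozenset(snps)].append(name)
    return sorted(sorted(members) for members in groups.values())
```

The grouped output could then feed a tree display, with each group forming one branch.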
In the scheme, the mutation detection result obtained by the mutation detection unit is correlated with the sample management system. Specifically, the mutation detection result is associated with the corresponding sample to be detected and the detection personnel so as to facilitate the tracking treatment of the new coronavirus.
In addition, the analysis report generation unit has a built-in analysis report template. It extracts the corresponding data content according to the gene analysis instruction and fills it into the template. The extracted data content includes one or more of sample information, sequencing data, sequence alignment results and sequence analysis results. Because the processing steps of this scheme are independent of each other yet linked, the analysis report generation unit can conveniently extract each piece of content on its own. The analysis report generation unit can also generate a chart-based analysis report from the extracted data content.
The workflow interface of the analysis system provided by this scheme is simple and easy to operate. After the operator inputs the sequencing data and sample information, the corresponding analysis content is displayed as prompted, and finally the analysis content is summarized into an analysis report, achieving one-click output from raw off-instrument data through bioinformatics analysis to a result report.
The nanopore sequencing-based new coronavirus whole genome analysis system provided by this scheme can be deployed and run on a computer system. The computer system of the server comprises a Central Processing Unit (CPU), which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage part into a Random Access Memory (RAM). The RAM also stores the various programs and data necessary for system operation. The CPU, ROM and RAM are connected to each other via a bus, and an input/output (I/O) interface is also connected to the bus. The modules described in the embodiments of the present invention may be implemented in software or in hardware, and the described modules may also be disposed in a processor.
The present invention is not limited to the above preferred embodiments. Anyone may derive products in various other forms in light of the present invention, but any change in shape or structure that has a technical solution identical or similar to that of the present application falls within the protection scope of the present invention.
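Filling a built-in template with extracted data can be sketched with the Python standard library's `string.Template`. The template text, field names and dictionary keys below are hypothetical; the patent states only that sample information, alignment results and analysis results are filled into a template.

```python
from string import Template

# Hypothetical plain-text report template.
REPORT_TEMPLATE = Template(
    "Task: $task_name\n"
    "Sample type: $sample_type\n"
    "Result: $result\n"
    "Genome coverage: $coverage\n"
    "Variants: $variants\n"
)


def build_report(sample: dict, alignment: dict, analysis: dict) -> str:
    """Combine sample, alignment and analysis data into one text report."""
    variants = ", ".join(f"{ref}{pos}{alt}"
                         for pos, ref, alt in analysis.get("variants", []))
    return REPORT_TEMPLATE.substitute(
        task_name=sample["task_name"],
        sample_type=sample["sample_type"],
        result="positive" if alignment["is_target"] else "negative",
        coverage=f"{analysis['coverage']:.1%}",
        variants=variants or "none",
    )
```

A PDF or chart-based report, as mentioned in the description, would need a rendering layer on top of this; the sketch covers only the extract-and-fill step.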
Claims (10)
1. A nanopore sequencing-based new coronavirus whole genome analysis system, comprising:
a data analysis system and a sample management system which are associated with each other, wherein the data analysis system is used for acquiring sequencing data of a pathogen to be detected so as to identify the type of the pathogen to be detected, the sample management system is used for acquiring sample information corresponding to the pathogen to be detected, and the sequencing data is associated with the sample information;
wherein the data analysis system comprises:
a task establishment unit, used for establishing an analysis task corresponding to the sequencing data of the pathogen to be detected, wherein the analysis task stores the sequencing data and analysis parameters of the pathogen to be detected;
a reference genome, storing a reference gene sequence of the new coronavirus;
a sequence alignment unit, used for acquiring an alignment instruction and aligning the sequencing data against the reference genome to obtain a detection sequence of the pathogen to be detected;
a sequence analysis unit, used for acquiring a sequence analysis instruction and performing, on the basis of the detection sequence, at least one sequence analysis task selected from genome coverage, variation detection, genome assembly, and genome integrity; and
an analysis report generation unit, used for acquiring a report instruction, extracting the analysis result data of the sequence analysis unit, and associating the analysis result data with the sample management system to generate an analysis report.
2. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 1, wherein the task establishment unit comprises a second-generation sequencing task module and a third-generation sequencing task module, the second-generation sequencing task module stores a second-generation sequencing task and corresponding analysis parameters, the third-generation sequencing task module stores a third-generation sequencing task and corresponding analysis parameters, and the task establishment unit is provided with a parameter setting module.
3. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 2, wherein the analysis parameters for the third-generation sequencing task comprise task name, mode, sequence path, mixing reagent, number of threads, length limit, accuracy limit, depth of identity, and SNP accuracy Q-value, and the analysis parameters for the second-generation sequencing task comprise task name, sequence selection, number of threads, depth of identity, and SNP accuracy Q-value.
4. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 3, wherein the mode of the third-generation sequencing task is selected from fast5, fastq, and barcodefastq, and the mode of the second-generation sequencing task is the fastq mode.
5. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 1, wherein the trigger interfaces and the cascade relationship of the reference genome, the sequence alignment unit, the sequence analysis unit, and the analysis report generation unit are reconfigured according to the sequence analysis process.
6. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 1, wherein the sequence analysis unit is further divided, according to the analysis content, into independent units: a variation detection unit, a genome assembly unit, a genome coverage unit, and a genome integrity calculation unit.
7. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 6, wherein the variation detection unit performs at least one variation detection task selected from mutation type effect annotation, evolutionary analysis, and sample grouping.
8. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 7, wherein the variation detection result obtained by the variation detection unit is displayed in a tree form.
9. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 6, wherein the variation detection unit detects a variation and associates it with the sample management system.
10. The nanopore sequencing-based new coronavirus whole genome analysis system of claim 1, further comprising a data quality control unit, wherein the sequence alignment unit is a subordinate associated task node of the data quality control unit, and the sequence alignment unit is triggered to perform alignment only after quality control of the sequencing data is completed.
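The cascade of task nodes described in claims 1, 5, and 10 (quality control gating alignment, which in turn triggers sequence analysis and report generation) might be wired roughly as in the following Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `AnalysisTask` and `Node` classes, the `length_limit` parameter key, the toy `REFERENCE` string, and the exact-substring "alignment", which merely stands in for a real read aligner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy stand-in for the stored reference genome; NOT a real viral sequence.
REFERENCE = "ACGTTAGCCGATAGGCTTAA"


@dataclass
class AnalysisTask:
    """Hypothetical analysis task holding sequencing data and parameters."""
    name: str
    reads: List[str]
    params: Dict[str, float] = field(default_factory=dict)
    results: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


@dataclass
class Node:
    """A task node that triggers its subordinate nodes only if it succeeds."""
    name: str
    run: Callable[[AnalysisTask], bool]
    children: List["Node"] = field(default_factory=list)

    def trigger(self, task: AnalysisTask) -> None:
        ok = self.run(task)
        task.log.append(f"{self.name}:{'done' if ok else 'failed'}")
        if ok:
            for child in self.children:
                child.trigger(task)


def quality_control(task: AnalysisTask) -> bool:
    # Drop reads shorter than the (assumed) length limit parameter.
    min_len = int(task.params.get("length_limit", 0))
    task.reads = [r for r in task.reads if len(r) >= min_len]
    return bool(task.reads)


def align(task: AnalysisTask) -> bool:
    # Exact-substring placement as a placeholder for a real aligner.
    hits = [(REFERENCE.find(r), len(r)) for r in task.reads if r in REFERENCE]
    task.results["hits"] = hits
    return bool(hits)


def genome_coverage(task: AnalysisTask) -> bool:
    # Fraction of reference positions covered by at least one placed read.
    covered = [False] * len(REFERENCE)
    for start, length in task.results["hits"]:
        for i in range(start, start + length):
            covered[i] = True
    task.results["coverage"] = sum(covered) / len(REFERENCE)
    return True


def report(task: AnalysisTask) -> bool:
    task.results["report"] = (
        f"{task.name}: coverage={task.results['coverage']:.2f}"
    )
    return True


def build_pipeline() -> Node:
    # Cascade: quality control -> alignment -> sequence analysis -> report.
    report_node = Node("report", report)
    analysis_node = Node("sequence_analysis", genome_coverage, [report_node])
    align_node = Node("alignment", align, [analysis_node])
    return Node("quality_control", quality_control, [align_node])
```

For example, `build_pipeline().trigger(AnalysisTask("demo", ["ACGTTAGC", "GGCTTAA", "AC"], {"length_limit": 5}))` would filter out the short read, place the remaining two reads, and record the coverage and report text in `results`. Because each node holds its own list of children, the cascade can be rewired by building the nodes in a different order, which is one plausible reading of the reconfigurable trigger relationship in claim 5.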
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011641513.2A CN112599192A (en) | 2020-12-31 | 2020-12-31 | New coronavirus whole genome analysis system based on nanopore sequencing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011641513.2A CN112599192A (en) | 2020-12-31 | 2020-12-31 | New coronavirus whole genome analysis system based on nanopore sequencing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112599192A (en) | 2021-04-02 |
Family
ID=75206712
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011641513.2A Pending CN112599192A (en) | 2020-12-31 | 2020-12-31 | New coronavirus whole genome analysis system based on nanopore sequencing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112599192A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113936739A (en) * | 2021-05-28 | 2022-01-14 | 四川大学 | Novel automatic assessment method for base mutation of coronavirus sample |
CN117727368A (en) * | 2023-12-13 | 2024-03-19 | 广州凯普医学检验所有限公司 | Automatic novel coronavirus genome rapid typing report system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021979A (en) * | 2016-05-12 | 2016-10-12 | 北京百迈客云科技有限公司 | Analysis system and method for human genome re-sequencing data |
CN106599614A (en) * | 2016-11-07 | 2017-04-26 | 为朔医学数据科技(北京)有限公司 | Control method and system for processing and analysis process of high-throughput sequencing data |
CN107180166A (en) * | 2017-04-21 | 2017-09-19 | 北京希望组生物科技有限公司 | A kind of full-length genome structure variation analysis method and system being sequenced based on three generations |
KR20190061771A (en) * | 2017-11-28 | 2019-06-05 | 단국대학교 천안캠퍼스 산학협력단 | Method of genome analysis using public next-generation sequencing data in the gene expression omnibus database |
CN111118226A (en) * | 2020-03-25 | 2020-05-08 | 北京微未来科技有限公司 | Novel coronavirus whole genome capture method, primer group and kit |
CN111455062A (en) * | 2020-04-01 | 2020-07-28 | 中国人民解放军总医院 | Kit and platform for detecting susceptibility genes of novel coronavirus |
US20200350035A1 (en) * | 2017-10-27 | 2020-11-05 | Sysmex Corporation | Gene analysis method, gene analysis apparatus, management server, gene analysis system, program, and storage medium |
Non-Patent Citations (2)
Title |
---|
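Sheng Nan et al.: "Research progress on nucleic acid detection technology platforms for the novel coronavirus SARS-CoV-2", Chinese Journal of Analytical Chemistry (《分析化学》) * |
Ma Lina et al.: "Research progress on third-generation sequencing technology and its applications", China Animal Husbandry & Veterinary Medicine (《中国畜牧兽医》) * |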
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112599192A (en) | New coronavirus whole genome analysis system based on nanopore sequencing | |
CN107391963A (en) | Eucaryon based on calculating cloud platform is without ginseng transcript profile interaction analysis system and method | |
CN110462372B (en) | Visualization, comparative analysis, and automatic difference detection of large multi-parameter datasets | |
JP2022512633A (en) | Adaptive sorting for particle analyzers | |
Stolarek et al. | Dimensionality reduction by UMAP for visualizing and aiding in classification of imaging flow cytometry data | |
EP4016533B1 (en) | Method and apparatus for machine learning based identification of structural variants in cancer genomes | |
CN116825184A (en) | Method, device, equipment and storage medium for detecting cell composition of biological sample | |
CN113066532B (en) | Method for analyzing virus source sRNA data in host based on high-throughput sequencing technology | |
US20230393048A1 (en) | Optimized Sorting Gates | |
Trapnell et al. | Monocle: Cell counting, differential expression, and trajectory analysis for single-cell RNA-Seq experiments | |
CN110277139B (en) | Microorganism limit checking system and method based on Internet | |
CN110570901B (en) | Method and system for SSR typing based on sequencing data | |
US10883912B2 (en) | Biexponential transformation for graphics display | |
CN114118306B (en) | Method and device for analyzing SDS (sodium dodecyl sulfate) gel electrophoresis experimental data and SDS gel reagent | |
CN112687343A (en) | Nanopore sequencing-based broad-spectrum pathogenic microorganism and drug resistance analysis system | |
CN114863994A (en) | Pollution assessment method, device, electronic equipment and storage medium | |
JP2008226095A (en) | Gene expression variation analysis method, system and program | |
Ignatieva et al. | Investigation of ongoing recombination through genealogical reconstruction for SARS-CoV-2 | |
WO2023153413A1 (en) | System, program and method for predicting proportion of target cells in cultured cells containing two or more types of cells | |
WO2023162394A1 (en) | Model selection method and image processing method | |
TWI834429B (en) | Model selection method and image processing method | |
KR20180090680A (en) | Geneome analysis system | |
CN117912572A (en) | Cervical cancer information acquisition method and system based on big data platform and methylation inspection | |
Thallinger | Comparison of ddRAD Analysis Pipelines | |
Barak | VirMake: a Snakemake pipeline for metagenomic data analysis | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210402 |