CN111653316A - Visualization analysis method, system and storage medium based on next generation sequencing - Google Patents

Visualization analysis method, system and storage medium based on next generation sequencing

Info

Publication number
CN111653316A
Authority
CN
China
Prior art keywords
gene
generation sequencing
data
sequencing
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460556.4A
Other languages
Chinese (zh)
Inventor
周在威
王剑
陈瑛
陈爱玲
吴境雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xunyin Biotechnology Co ltd
Original Assignee
Shanghai Xunyin Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xunyin Biotechnology Co ltd
Priority to CN202010460556.4A
Publication of CN111653316A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a visualization analysis method, system and storage medium based on next-generation sequencing (NGS). The method comprises the following steps: acquiring gene next-generation sequencing data stored in a database and a preset sample type; generating sequencing task information according to preset mode information; screening the gene next-generation sequencing data according to the sequencing task information; and analyzing the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type, to obtain an analysis result. The method, system and storage medium provided by the invention allow medical personnel who are not familiar with computer programming to carry out the analysis. The method is simple to implement and convenient to operate, and meets the needs of medical personnel analyzing NGS data.

Description

Visualization analysis method, system and storage medium based on next generation sequencing
Technical Field
The invention relates to the field of gene next-generation sequencing, and in particular to a visualization analysis method, system and storage medium based on next-generation sequencing.
Background
With the popularization and commercialization of next-generation sequencing (NGS) technologies, a great deal of human genetic information has been examined. However, the human genome is about 3 billion base pairs (3000 Mbp) in size. Genome size is usually expressed as a number of nucleotide base pairs, in units of millions written as Mb or Mbp. The human genome consists of 23 pairs of chromosomes (46 in total), each containing hundreds of genes. Chromosomes 1 to 22 are numbered in order of size from largest to smallest, and the 23rd pair is the sex chromosomes, which determine sex. The largest chromosome contains about 250 million base pairs and the smallest about 38 million base pairs.
In the prior art, analyzing human genome data from next-generation sequencing (NGS) is very demanding for the person performing the analysis, requiring both a strong computing background and medical knowledge.
Based on this, the inventors of the present application found that the methods in the prior art are difficult to implement and cannot meet the requirements of data analysis.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a visualization analysis method, system and storage medium based on next-generation sequencing.
In a first aspect, an embodiment of the present invention provides a visualization analysis method based on next-generation sequencing, including: acquiring gene next-generation sequencing data stored in a database and a preset sample type; generating sequencing task information according to preset mode information; screening the gene next-generation sequencing data according to the sequencing task information; and analyzing the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type, to obtain an analysis result.
In one possible implementation, acquiring the gene next-generation sequencing data stored in the database includes: judging whether the gene next-generation sequencing data is more than 10 G; if the data is more than 10 G, acquiring the gene next-generation sequencing data; if the data is not more than 10 G, performing next-generation sequencing on the gene sample again.
In one possible implementation, the preset mode information includes an allele frequency of less than one in a thousand, or biological information that affects protein function.
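As a rough illustration of this kind of mode information, the following Python sketch keeps a variant when its allele frequency is below one in a thousand or when its annotated effect is one that may alter the protein. The `Variant` record, its field names and the set of protein-affecting effect labels are assumptions made for the example; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold taken from the text: "less than one in a thousand".
MAX_ALLELE_FREQUENCY = 0.001

# Hypothetical effect labels treated as "affecting protein function".
PROTEIN_AFFECTING_EFFECTS = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    position: int
    allele_frequency: float  # population frequency of the variant allele
    effect: str              # annotated consequence, e.g. "missense"


def passes_screen(variant: Variant) -> bool:
    """Return True if the variant is rare OR may affect protein function."""
    is_rare = variant.allele_frequency < MAX_ALLELE_FREQUENCY
    affects_protein = variant.effect in PROTEIN_AFFECTING_EFFECTS
    return is_rare or affects_protein


def screen(variants: Iterable[Variant]) -> List[Variant]:
    """Keep only the variants that satisfy the screening rule."""
    return [v for v in variants if passes_screen(v)]
```

The two conditions are joined with a logical "or" because the text lists them as alternatives; a stricter system might require both.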
In one possible implementation, the preset sample types include: a genetic pattern type comprising dominant inheritance or recessive inheritance, and a medical phenotype.
In one possible implementation, the visual analysis method further includes displaying the analysis result.
In one possible implementation, the analysis results are a table of gene mutations that are characteristic of the disease.
In a second aspect, an embodiment of the present invention further provides a visualization analysis system based on next-generation sequencing, including: an acquisition unit, configured to acquire gene next-generation sequencing data stored in a database and a preset sample type; a test task generation module, configured to generate sequencing task information according to preset mode information; a screening unit, configured to screen the gene next-generation sequencing data according to the sequencing task information; and an analysis unit, configured to analyze the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type, to obtain an analysis result.
In one possible implementation, the acquisition unit is further configured to: judge whether the gene next-generation sequencing data is more than 10 G; if the data is more than 10 G, acquire the gene next-generation sequencing data; if the data is not more than 10 G, have next-generation sequencing performed on the gene sample again.
In one possible implementation, the visualization analysis system further includes: a display unit, configured to display the result of the genome re-sequencing analysis.
In a third aspect, embodiments of the present invention further provide a storage medium storing computer-executable instructions for performing the visualization analysis method based on next-generation sequencing described above.
The visualized analysis method, the visualized analysis system and the storage medium based on the second-generation sequencing provided by the embodiment of the invention can effectively solve the problem that medical personnel are not familiar with computer programming codes. The method is simple to implement and convenient to operate, and meets the requirement of medical personnel on the analysis of Next Generation Sequencing (NGS) data.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for visualization analysis based on next generation sequencing according to an embodiment of the present invention;
FIG. 2 shows a schematic structural diagram of a visualization analysis system based on next generation sequencing provided by an embodiment of the invention.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
An embodiment of the invention provides a visualization analysis method based on next-generation sequencing which, as shown in FIG. 1, comprises the following steps:
Step 1, acquiring gene next-generation sequencing (NGS) data and a preset sample type;
Step 2, generating sequencing task information according to preset mode information;
In one implementation, the preset mode information includes an allele frequency of less than one in a thousand, or biological information that affects protein function.
Step 3, screening the gene next-generation sequencing data according to the sequencing task information;
Step 4, analyzing the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type, to obtain an analysis result.
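The four steps above can be sketched as a small chain of functions. This is only a minimal sketch under stated assumptions: the in-memory list standing in for the database, the dictionary-based records, the `max_allele_frequency` key and the placeholder analysis are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def acquire(database: List[Record]) -> List[Record]:
    """Step 1: read the stored sequencing records (here, an in-memory list)."""
    return list(database)


def generate_task(mode_info: Dict[str, float]) -> Callable[[Record], bool]:
    """Step 2: turn preset mode information into a screening predicate."""
    max_af = mode_info["max_allele_frequency"]
    return lambda rec: float(rec["allele_frequency"]) < max_af


def screen(records: List[Record], task: Callable[[Record], bool]) -> List[Record]:
    """Step 3: keep only the records selected by the task."""
    return [r for r in records if task(r)]


def analyze(records: List[Record], sample_type: Dict[str, str]) -> List[Record]:
    """Step 4: placeholder analysis that tags each record with the inheritance mode.

    A real system would call external bioinformatics tools and scripts here.
    """
    return [{**r, "inheritance": sample_type["inheritance"]} for r in records]


def run_pipeline(
    database: List[Record],
    mode_info: Dict[str, float],
    sample_type: Dict[str, str],
) -> List[Record]:
    """Run steps 1 to 4 in order and return the analysis result."""
    data = acquire(database)
    task = generate_task(mode_info)
    return analyze(screen(data, task), sample_type)
```

Keeping each step as its own function mirrors the way the description separates acquisition, task generation, screening and analysis, so that any one step could be replaced without touching the others.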
Therefore, the visualization analysis method based on next-generation sequencing provided by this embodiment allows medical personnel who are not familiar with computer programming to carry out the analysis. The method is simple to implement and convenient to operate, and meets the needs of medical personnel analyzing next-generation sequencing (NGS) data.
In one implementation, the preset sample types include: a genetic pattern type comprising dominant inheritance or recessive inheritance, and a medical phenotype.
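One possible way to represent such a preset sample type is a small record holding the inheritance mode and the phenotype. The sketch below is an assumption-laden illustration: the class names, the `genotype_matches` helper and its copy-count rule (one variant copy for dominant, two for recessive) are simplifications introduced here, and cases such as compound heterozygosity are deliberately ignored.

```python
from dataclasses import dataclass
from enum import Enum


class InheritanceMode(Enum):
    DOMINANT = "dominant"
    RECESSIVE = "recessive"


@dataclass(frozen=True)
class SampleType:
    inheritance: InheritanceMode
    phenotype: str  # free-text medical phenotype, e.g. a clinical finding


def genotype_matches(sample: SampleType, alt_allele_count: int) -> bool:
    """Simplified rule: one variant copy suffices for dominant inheritance,
    two copies are required for recessive inheritance."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("alt_allele_count must be 0, 1 or 2 for a diploid genotype")
    required = 1 if sample.inheritance is InheritanceMode.DOMINANT else 2
    return alt_allele_count >= required
```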
In one implementation, step 1 may be preceded by: performing next-generation sequencing on the gene sample.
In one implementation, step 1 may include: judging whether the gene next-generation sequencing data is more than 10 G; if the data is more than 10 G, acquiring the gene next-generation sequencing data; if the data is not more than 10 G, performing next-generation sequencing on the gene sample again.
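The data-volume check described above reduces to a single comparison. The sketch below treats the amount as a plain number expressed in the same unit as the threshold; the names `MIN_DATA_G`, `Action` and `decide` are hypothetical and chosen only for this example.

```python
from enum import Enum

MIN_DATA_G = 10.0  # threshold from the text; the unit is taken as given


class Action(Enum):
    ACQUIRE = "acquire"
    RESEQUENCE = "resequence"


def decide(data_amount_g: float) -> Action:
    """Acquire the data only if it exceeds the threshold; otherwise re-sequence."""
    if data_amount_g < 0:
        raise ValueError("data amount cannot be negative")
    return Action.ACQUIRE if data_amount_g > MIN_DATA_G else Action.RESEQUENCE
```

Note that an amount exactly equal to the threshold leads to re-sequencing, because the text requires the data to be "more than" the threshold.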
In one implementation, step 4 may be further followed by:
Step 5, displaying the analysis result.
In one implementation, the analysis results are a table of gene mutations that are characteristic of the disease.
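To make the idea of a displayed gene mutation table concrete, the following sketch renders a list of result rows as a fixed-width text table. The column names are assumptions made for illustration; the patent states only that the result is a gene mutation table and does not define its columns.

```python
from typing import Dict, List, Sequence

# Hypothetical column set; the patent does not define the table layout.
COLUMNS: Sequence[str] = ("gene", "position", "change", "allele_frequency")


def render_table(rows: List[Dict[str, object]]) -> str:
    """Render rows as a fixed-width text table with one header line."""
    cells = [[str(row.get(col, "")) for col in COLUMNS] for row in rows]
    # Each column is as wide as its header or its longest value.
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(COLUMNS)
    ]

    def fmt(values: Sequence[str]) -> str:
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    return "\n".join([fmt(COLUMNS)] + [fmt(line) for line in cells])
```

Missing fields are rendered as empty cells rather than raising an error, which keeps the display step tolerant of incomplete annotation.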
After a hospital or third-party testing institution receives the next-generation sequencing data, medical and genetics professionals operate the system to extract the relevant sequencing data; once the sample type, inheritance pattern and medical phenotype have been matched and processed, the relevant gene mutation table is obtained.
In this embodiment, the bioinformatics software library may include SEEDERSEQ, GATK and ANNOVAR.
The script library may include SNP detection scripts, medical analysis scripts, and scripts for merging non-conventional raw data.
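How a system might call tools from such a software library can be sketched by building the command lines in code. The example below only constructs argument lists and does not run anything; the function names and all file paths are hypothetical, and the options shown follow the commonly documented usage of GATK4 `HaplotypeCaller` and ANNOVAR `table_annovar.pl`, which should be checked against the installed versions. Nothing here is stated by the patent beyond the tool names.

```python
from typing import List


def gatk_haplotypecaller_cmd(reference: str, bam: str, out_vcf: str) -> List[str]:
    """Argument list for GATK4 HaplotypeCaller (germline variant calling)."""
    return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]


def annovar_table_cmd(
    vcf: str, db_dir: str, out_prefix: str, build: str = "hg19"
) -> List[str]:
    """Argument list for ANNOVAR table_annovar.pl with gene-based annotation."""
    return [
        "table_annovar.pl", vcf, db_dir,
        "-buildver", build,
        "-out", out_prefix,
        "-remove",
        "-protocol", "refGene",
        "-operation", "g",
        "-nastring", ".",
        "-vcfinput",
    ]
```

On a machine where the tools are installed, each list could be passed to `subprocess.run(cmd, check=True)`; returning lists rather than shell strings avoids quoting problems with file names.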
It is understood that the visualization analysis in this embodiment is based on genome sequencing against the known human genome sequence, and that medical relevance analysis is performed on affected individuals or populations. Through sequence alignment, a large number of single nucleotide polymorphism (SNP) sites can be found in a sequenced individual, and a comprehensive analysis is then carried out according to the patient's medical phenotype and the inheritance pattern of the genetic disease concerned.
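The idea of finding SNP sites by sequence comparison can be shown with a toy example that compares two already-aligned sequences of equal length position by position. This is a deliberately simplified sketch and not a variant caller: real pipelines work on read alignments with quality scores, and the function name `find_snps` is an assumption of this example.

```python
from typing import List, Tuple

BASES = "ACGT"


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != alt_base and ref_base in BASES and alt_base in BASES:
            snps.append((pos, ref_base, alt_base))
    return snps
```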
In the prior art, sequencing data analysis requires several pieces of bioinformatics software, the analyst must be reasonably familiar with how each one is used, and linking the different analysis modules requires manual intervention, so the analysis is cumbersome and inefficient.
Furthermore, by calling the bioinformatics software and the personalized analysis script library, this embodiment simplifies the analysis process of the visualization analysis method based on next-generation sequencing, improves the efficiency of genome sequencing analysis and saves research costs.
The present embodiment also provides a visualization analysis system based on next-generation sequencing which, as shown in FIG. 2, includes: an acquisition unit 1, a test task generation module 2, a screening unit 3 and an analysis unit 4.
The acquisition unit 1 is used for acquiring gene next-generation sequencing data and a preset sample type, wherein the gene next-generation sequencing data are stored in a database and are obtained by performing next-generation sequencing on a gene sample.
The test task generation module 2 is used for generating sequencing task information according to preset mode information.
The screening unit 3 is used for screening the gene next-generation sequencing data according to the sequencing task information.
The analysis unit 4 is used for analyzing the screened gene next-generation sequencing data by calling the bioinformatics software library and the script library according to the preset sample type, to obtain an analysis result.
The acquisition unit 1 is further configured to: judge whether the gene next-generation sequencing data is more than 10 G; if the data is more than 10 G, acquire the gene next-generation sequencing data; if the data is not more than 10 G, have next-generation sequencing performed on the gene sample again.
In one implementation, the visualization analysis system further includes: a display unit 5, used for displaying the genome re-sequencing analysis result.
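The unit structure of the system can be pictured as a set of small components wired together by one coordinating object. The sketch below is an illustrative assumption, not the patented implementation: every class name, the dictionary-based records and the tab-separated display format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


@dataclass
class AcquisitionUnit:
    database: List[Record]

    def acquire(self) -> List[Record]:
        return list(self.database)


@dataclass
class TaskGenerationModule:
    max_allele_frequency: float

    def generate(self) -> Predicate:
        limit = self.max_allele_frequency
        return lambda rec: float(rec["allele_frequency"]) < limit


class ScreeningUnit:
    def screen(self, records: List[Record], task: Predicate) -> List[Record]:
        return [r for r in records if task(r)]


class AnalysisUnit:
    def analyze(self, records: List[Record], sample_type: str) -> List[Record]:
        # Placeholder: a real unit would call external tools and scripts.
        return [{**r, "sample_type": sample_type} for r in records]


class DisplayUnit:
    def display(self, results: List[Record]) -> str:
        return "\n".join(
            f"{r['gene']}\t{r['allele_frequency']}\t{r['sample_type']}" for r in results
        )


@dataclass
class VisualizationAnalysisSystem:
    acquisition: AcquisitionUnit
    task_module: TaskGenerationModule
    screening: ScreeningUnit
    analysis: AnalysisUnit
    display_unit: DisplayUnit

    def run(self, sample_type: str) -> str:
        """Pass data through units 1 to 5 and return the displayed text."""
        records = self.acquisition.acquire()
        task = self.task_module.generate()
        screened = self.screening.screen(records, task)
        return self.display_unit.display(self.analysis.analyze(screened, sample_type))
```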
Therefore, the visualization analysis system based on next-generation sequencing provided by this embodiment allows medical personnel who are not familiar with computer programming to carry out the analysis. The system is simple to implement and convenient to operate, and meets the needs of medical personnel analyzing next-generation sequencing (NGS) data.
Embodiments of the present invention further provide a storage medium, where the storage medium stores computer-executable instructions, which include a program for executing the above visualization analysis method based on next generation sequencing, and the computer-executable instructions may execute the method in any of the above method embodiments.
The storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, nonvolatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A visual analysis method based on next generation sequencing is characterized by comprising the following steps:
acquiring gene next-generation sequencing data stored in a database and a preset sample type;
generating sequencing task information according to preset mode information;
screening the gene next-generation sequencing data according to the sequencing task information;
and analyzing the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type to obtain an analysis result.
2. The visual analysis method of claim 1, wherein acquiring the gene next-generation sequencing data stored in the database comprises:
judging whether the gene next-generation sequencing data is more than 10 G;
if the data is more than 10 G, acquiring the gene next-generation sequencing data;
if the data is not more than 10 G, performing next-generation sequencing on the gene sample again.
3. A visual analysis method according to claim 1, wherein the predetermined pattern information includes allele frequencies less than one in a thousand, or biological information affecting protein function.
4. A visual analysis method according to claim 1, wherein said preset sample types include: a genetic pattern type comprising dominant inheritance or recessive inheritance, and a medical phenotype.
5. A visual analytics method as claimed in claim 1, further comprising displaying the analytics results.
6. The visual analysis method of claim 1, wherein the analysis result is a gene mutation table according to disease characteristics.
7. A visual analysis system based on next generation sequencing, comprising:
an acquisition unit, configured to acquire gene next-generation sequencing data stored in a database and a preset sample type;
a test task generation module, configured to generate sequencing task information according to preset mode information;
a screening unit, configured to screen the gene next-generation sequencing data according to the sequencing task information;
and an analysis unit, configured to analyze the screened gene next-generation sequencing data by calling a bioinformatics software library and a script library according to the preset sample type to obtain an analysis result.
8. The visualization analysis system of claim 7, wherein the acquisition unit is further configured to:
judge whether the gene next-generation sequencing data is more than 10 G;
if the data is more than 10 G, acquire the gene next-generation sequencing data;
if the data is not more than 10 G, have next-generation sequencing performed on the gene sample again.
9. A visualization analysis system as recited in claim 7, wherein the visualization analysis system further comprises:
a display unit, configured to display the result of the genome re-sequencing analysis.
10. A storage medium storing computer-executable instructions for performing the method for visual analysis based on next-generation sequencing of any one of claims 1-6.
CN202010460556.4A 2020-05-27 2020-05-27 Visualization analysis method, system and storage medium based on next generation sequencing Pending CN111653316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460556.4A CN111653316A (en) 2020-05-27 2020-05-27 Visualization analysis method, system and storage medium based on next generation sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460556.4A CN111653316A (en) 2020-05-27 2020-05-27 Visualization analysis method, system and storage medium based on next generation sequencing

Publications (1)

Publication Number Publication Date
CN111653316A true CN111653316A (en) 2020-09-11

Family

ID=72348325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460556.4A Pending CN111653316A (en) 2020-05-27 2020-05-27 Visualization analysis method, system and storage medium based on next generation sequencing

Country Status (1)

Country Link
CN (1) CN111653316A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090240441A1 (en) * 2008-03-20 2009-09-24 Helicos Biosciences Corporation System and method for analysis and presentation of genomic data
CN105653893A (en) * 2015-12-25 2016-06-08 北京百迈客生物科技有限公司 Genome re-sequencing analysis system and method
CN109545281A (en) * 2018-09-30 2019-03-29 南京派森诺基因科技有限公司 A kind of analysis method of the trio family genetic mutation mode based on two generation high-flux sequences
CN109637584A (en) * 2019-01-24 2019-04-16 上海海云生物科技有限公司 Oncogene diagnostic assistance decision system
CN109750101A (en) * 2019-02-15 2019-05-14 中国医学科学院阜外医院 Detect gene panel and its application of monogenic inheritance hypertension
CN110931081A (en) * 2019-11-28 2020-03-27 广州基迪奥生物科技有限公司 Biological information analysis method for human monogenic genetic disease detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
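徐望红: "Cancer Epidemiology" (《肿瘤流行病学》), 复旦大学出版社 (Fudan University Press), pages: 40 - 41 *
陈静 et al.: "Molecular diagnosis of a family with hereditary elliptocytosis caused by EPB41 gene mutation" ("EPB41基因突变导致遗传性椭圆形红细胞增多症一家系的分子诊断研究"), vol. 51, no. 01, pages 97 - 101 *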

Similar Documents

Publication Publication Date Title
US10127353B2 (en) Method and systems for querying sequence-centric scientific information
Han et al. Advanced applications of RNA sequencing and challenges
JP6420543B2 (en) Genome data processing method
JP2014508994A5 (en)
Huang et al. Evaluation of variant detection software for pooled next-generation sequence data
CA3204451A1 (en) Systems and methods for joint low-coverage whole genome sequencing and whole exome sequencing inference of copy number variation for clinical diagnostics
Sana et al. GAMES identifies and annotates mutations in next-generation sequencing projects
Schilder et al. echolocatoR: an automated end-to-end statistical and functional genomic fine-mapping pipeline
Wood et al. Recommendations for accurate resolution of gene and isoform allele-specific expression in RNA-Seq data
Llinares-López et al. Genome-wide genetic heterogeneity discovery with categorical covariates
Holtgrewe et al. Methods for the detection and assembly of novel sequence in high-throughput sequencing data
Arkin et al. EPIQ—efficient detection of SNP–SNP epistatic interactions for quantitative traits
WO2016114009A1 (en) Fusion gene analysis device, fusion gene analysis method, and program
WO2022029567A1 (en) A method for determining the pathogenicity/benignity of a genomic variant in connection with a given disease
US20220375544A1 (en) Kit and method of using kit
Benton et al. Variant call format–diagnostic annotation and reporting tool: A customizable analysis pipeline for identification of clinically relevant genetic variants in next-generation sequencing data
CN112735594A (en) Method for screening disease phenotype related mutation sites and application thereof
CN111653316A (en) Visualization analysis method, system and storage medium based on next generation sequencing
Bakhtiar et al. Identifying human disease genes: advances in molecular genetics and computational approaches
US20220293214A1 (en) Methods of analyzing genetic variants based on genetic material
Huang et al. Reveel: large-scale population genotyping using low-coverage sequencing data
Halim-Fikri et al. Central resources of variant discovery and annotation and its role in precision medicine
Kumaran et al. eyeVarP: a computational framework for the identification of pathogenic variants specific to eye disease
CN115810393B (en) Sequencing sample homology detection method and system based on SNPs library of construction crowd
WO2023136297A1 (en) Information processing system, information processing device, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination