CN117153248B - Gene region variation detection and visualization method and system based on pan genome - Google Patents

Gene region variation detection and visualization method and system based on pan genome

Info

Publication number
CN117153248B
CN117153248B CN202311133414.7A CN202311133414A CN117153248B CN 117153248 B CN117153248 B CN 117153248B CN 202311133414 A CN202311133414 A CN 202311133414A CN 117153248 B CN117153248 B CN 117153248B
Authority
CN
China
Prior art keywords
genome
gene
result
region
variation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311133414.7A
Other languages
Chinese (zh)
Other versions
CN117153248A (en
Inventor
焦成智
陈力杨
高丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Jizhi Gene Technology Co ltd
Original Assignee
Tianjin Jizhi Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Jizhi Gene Technology Co ltd
Priority to CN202311133414.7A
Publication of CN117153248A
Application granted
Publication of CN117153248B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the technical field of gene information detection and discloses a pan-genome-based gene region variation detection and visualization method and system. In the method, the preliminary annotation result obtained by mapping is filtered and screened, and a coverage value and an identity value are calculated from the alignments; the optimal alignment result is selected according to the coverage value and the identity value, whether the gene is located in the same collinear region on the different genomes is judged, and a final standard annotation result is determined according to the judgment result; the gene region and the upstream and downstream sequences of each genome are extracted according to the final standard annotation result of the annotation file; the extracted sequences are aligned in a specific order and variation among the genomes is detected; and the variation detection result is filtered to remove N-containing fragments. Because the invention visualizes variation among multiple genomes, the influence of that variation on each functional region of a gene can be found more conveniently and intuitively.

Description

Gene region variation detection and visualization method and system based on pan genome
Technical Field
The invention belongs to the technical field of gene information detection, and particularly relates to a pan-genome-based gene region variation detection and visualization method and system.
Background
As the cost of high-throughput sequencing falls, a growing number of species now have a pan-genome. Comparing and analyzing genomic differences and variation between individuals reveals the genetic diversity and evolutionary history within a species, which is important for understanding differences in phenotype, adaptability, and disease susceptibility among individuals. Pan-genome research helps reveal genetic differences and genomic variation among individuals of the same species, and is significant for the study of evolutionary processes, phenotypic differences, and related questions. Visualizing pan-genome variation detection can further deepen the understanding of these genetic differences and genomic variations. Its value lies in presenting complex genomic variation information in an intuitive, easily understood graphical form, so that researchers can analyze the data and draw conclusions more easily. By visualizing the differences in genomic variation, the genetic characteristics of and differences between individuals of the same species can be better understood, and more associations can be found.
In biological research there are often particular genes of interest. By analyzing and comparing the genome information of individuals carrying these genes, genetic variation associated with specific individuals can be found, revealing possible genetic risks, biological characteristics, gene functions, and so on. This information is important in life science research, medical diagnosis, personalized treatment, and other areas. However, the variant file format VCF is not convenient for producing summary statistics of variants. Given this complexity and difficulty, visual analysis lets researchers or doctors further analyze and understand the variation in a gene region of interest, such as the variant type, distribution, frequency, and position in a specific sample, and thus provides more comprehensive support for subsequent work. Visualizing variation detection in the regions of high-interest ("star") genes is therefore of great importance.
Current pan-genome variation visualization tools display variation across the whole genome and offer no presentation focused on a particular gene region.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) Current pan-genome variation visualization tools display variation across the whole genome. The display is macroscopic and cannot focus on the variation pattern of a specific gene region. No existing tool presents multiple genomes, variation detection results, gene structure, and related information together. Prior-art variation displays typically show the distribution or content of all variants across a genome or a chromosome. Because genomic sequences are long, such displays can only show at a macroscopic level which regions contain many variants and which contain few. Researchers, however, often want to know whether a given gene carries variation in its region on different genomes, what type the variation is, and where it is located. Answering these questions with existing tools increases the workload and affects the accuracy of the visual presentation of the data obtained.
(2) Because genomes and annotations are currently provided by many different organizations, the quality and standards of genome annotation are not uniform. As a result, the specific genes to be analyzed often have incomplete gene structures or are not annotated at all, which affects the accurate determination of candidate gene regions.
Disclosure of Invention
To overcome the problems in the related art, the invention provides a pan-genome-based gene region variation detection and visualization method and system. The invention aims to display the variation of a gene region at a fine (microscopic) scale. The invention also aims to standardize annotation results, so that a specific region is not missed on a genome because of non-standardized annotation.
The technical scheme of the invention is as follows: the pan-genome-based gene region variation detection and visualization method comprises the following steps:
S1, mapping the CDS sequence of a gene of interest to the genome sequences by alignment to obtain a preliminary annotation result;
S2, filtering and screening the preliminary annotation result obtained by mapping, calculating a coverage value and an identity value from the alignments, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining a final standard annotation result according to the judgment result;
S3, extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
S4, aligning the extracted sequences and detecting variation among the genomes;
S5, filtering the variation detection result, removing N-containing fragments, and analyzing and classifying the variation types;
S6, visually displaying the classified variation types using SVG.
In step S1, the preliminary annotation result is the gene structure information of the gene on each genome, including the chromosome on which the gene is located and its specific position, together with information on the gene region, the exon regions, and the intron regions, including their lengths.
In step S2, filtering, screening, and calculating the coverage value and the identity value from the alignments includes: aligning the specific gene sequence to the genome with gmap software, determining the specific position of the gene on the genome, and calculating a coverage value and an identity value from the alignment result. The calculation formulas are as follows:
coverage = (length of the aligned sequence / total length of the gene) × 100;
identity = (length of the base-identical sequence / length of the aligned sequence) × 100.
The step further comprises: aligning the same gene to the different genomes, determining its position and gene structure on each genome, selecting alignment results by coverage value first and by identity value second, and retaining the best gene mapping result as the final annotation result.
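The two formulas and the selection rule can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the helper functions are hypothetical and are not part of the patent or of the gmap output format; the sketch assumes the aligned length and the number of identical bases have already been parsed from the alignment result.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    """One alignment of the gene sequence to one genome (hypothetical record)."""
    genome: str
    chrom: str
    start: int
    end: int
    aligned_len: int  # length of the gene sequence that aligned
    matches: int      # identical bases within the aligned part
    gene_len: int     # total length of the gene sequence

    @property
    def coverage(self) -> float:
        # coverage = aligned length / total gene length x 100
        return self.aligned_len / self.gene_len * 100

    @property
    def identity(self) -> float:
        # identity = identical bases / aligned length x 100
        return self.matches / self.aligned_len * 100


def best_hit(hits: Iterable[Hit]) -> Hit:
    """Prefer the highest coverage; break ties with the highest identity."""
    return max(hits, key=lambda h: (h.coverage, h.identity))


def best_hit_per_genome(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Group hits by genome and keep one best hit for each genome."""
    grouped: Dict[str, List[Hit]] = {}
    for hit in hits:
        grouped.setdefault(hit.genome, []).append(hit)
    return {genome: best_hit(group) for genome, group in grouped.items()}
```

Comparing the tuple `(coverage, identity)` is one simple way to express "coverage first, identity second"; the collinearity judgment of step S2 is not covered by this sketch.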
In step S3, extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file includes: writing a script that extracts the sequences at the corresponding positions according to the final standard annotation result.
Further, the upstream and downstream sequences are the 5 kb upstream and the 5 kb downstream of the gene region, respectively.
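A minimal sketch of such an extraction script is given below. It assumes the chromosome sequence is already loaded as a Python string and that the annotation gives 1-based inclusive coordinates; the function name and the clamping at chromosome ends are illustrative choices, not details stated in the patent.

```python
from typing import Tuple


def extract_with_flanks(chrom_seq: str, start: int, end: int,
                        flank: int = 5000) -> Tuple[str, int, int]:
    """Return the gene region plus `flank` bases on each side.

    `start` and `end` are 1-based inclusive gene coordinates. The window is
    clamped to the chromosome so genes near an end still yield a sequence.
    The returned coordinates are the 1-based inclusive window actually used.
    """
    if not (1 <= start <= end <= len(chrom_seq)):
        raise ValueError("gene coordinates fall outside the chromosome")
    left = max(1, start - flank)
    right = min(len(chrom_seq), end + flank)
    # Convert the 1-based inclusive window to a Python slice.
    return chrom_seq[left - 1:right], left, right
```

Returning the window coordinates alongside the sequence lets later steps translate variant positions in the extracted fragment back to genome coordinates.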
In step S4, aligning the extracted sequences and detecting variation among the genomes includes: aligning the extracted sequences with mummer software and detecting variation between genomes sequentially, following the order in which the genomes are arranged.
Further, the variation detection method comprises: after the genomes are aligned with mummer software, a linear delta.1 records file is generated, and a script written according to the content of that file extracts the variation information.
Another object of the present invention is to provide a pan-genome-based gene region variation detection and visualization system that implements the pan-genome-based gene region variation detection and visualization method, the system comprising:
a preliminary annotation result acquisition unit, used for mapping the CDS sequence of the gene of interest to the genome sequences by alignment to obtain a preliminary annotation result;
a final standard annotation result determining unit, used for filtering and screening the preliminary annotation result obtained by mapping, calculating a coverage value and an identity value from the alignments, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining a final standard annotation result according to the judgment result;
a gene region and upstream and downstream sequence extraction unit, used for extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
an inter-genome variation detection unit, used for aligning the extracted sequences and detecting variation among the genomes;
a variation type filtering and classifying unit, used for filtering the variation detection result, removing N-containing fragments, and analyzing and classifying the variation types;
and a display unit, used for visually displaying the classified variation types using SVG.
Further, the system is deployed on a bioinformatics analysis platform and executes the function of each unit.
Combining all of the above technical schemes, the advantages and positive effects of the invention are as follows: by providing standard annotation information for a specific gene in different genomes and variation detection results for the region of that gene, the invention visually displays the variation of the specific gene region among different genomes.
The invention provides a way to visualize multi-genome gene region variation detection results, which suits the current trend in which falling whole-genome sequencing costs allow whole-genome assembly of large numbers of samples. Because the invention visualizes variation among multiple genomes, the influence of that variation on each functional region of a gene can be found more conveniently and intuitively.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a diagram of the pan-genome-based gene region variation detection and visualization method;
FIG. 2 is a schematic diagram of the pan-genome-based gene region variation detection and visualization method;
FIG. 3 is a schematic diagram of the pan-genome-based gene region variation detection and visualization system provided by the invention.
In the figures: 1. preliminary annotation result acquisition unit; 2. final standard annotation result determining unit; 3. gene region and upstream and downstream sequence extraction unit; 4. inter-genome variation detection unit; 5. variation type filtering and classifying unit; 6. display unit.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit or scope of the invention, which is therefore not limited to the specific embodiments disclosed below.
Embodiment 1: as shown in FIG. 1, the pan-genome-based gene region variation detection and visualization method provided by this embodiment of the invention comprises the following steps:
S1, mapping the CDS sequence of a gene of interest to the genome sequences by alignment to obtain a preliminary annotation result;
S2, filtering and screening the preliminary annotation result obtained by mapping, calculating a coverage value and an identity value from the alignments, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining a final standard annotation result according to the judgment result;
S3, extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
S4, aligning the extracted sequences and detecting variation among the genomes;
S5, filtering the variation detection result, removing N-containing fragments, and analyzing and classifying the variation types;
S6, visually displaying the classified variation types using SVG.
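Step S5 can be illustrated with a short Python sketch. The patent does not list the variation classes or the record layout, so the `Variant` tuple and the four classes used here (SNP, insertion, deletion, substitution) are assumptions made only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A detected difference between two genomes (hypothetical record)."""
    pos: int   # position in the reference fragment
    ref: str   # allele in the first genome
    alt: str   # allele in the second genome


def contains_n(variant: Variant) -> bool:
    """True if either allele contains an unknown base N."""
    return "N" in variant.ref.upper() or "N" in variant.alt.upper()


def classify(variant: Variant) -> str:
    """Assign an assumed variation class from the allele lengths."""
    if len(variant.ref) == 1 and len(variant.alt) == 1:
        return "SNP"
    if len(variant.ref) < len(variant.alt):
        return "insertion"
    if len(variant.ref) > len(variant.alt):
        return "deletion"
    return "substitution"  # equal lengths greater than one


def filter_and_classify(variants: Iterable[Variant]) -> Dict[str, List[Variant]]:
    """Drop N-containing variants, then group the rest by class."""
    groups: Dict[str, List[Variant]] = {}
    for variant in variants:
        if contains_n(variant):
            continue
        groups.setdefault(classify(variant), []).append(variant)
    return groups
```

Filtering before classification keeps gap-derived calls out of every class, which matches the order given in step S5.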
In step S1, the preliminary annotation result is the gene structure information of the relevant genes on each genome, including the chromosome on which each gene is located and its specific position, together with information on the gene region, the exon regions, and the intron regions, such as their lengths.
This mapping-based annotation is preferred because genomes and annotations are currently provided by many different organizations and the quality and standards of genome annotation are not uniform; the specific genes to be obtained often have incomplete gene structures, or are not annotated at all, which affects the determination of candidate gene regions.
In step S2, the specific gene sequence is aligned to the genome with gmap software, and the specific position of the gene on the genome is determined; a coverage value and an identity value are then calculated from the alignment result. The calculation formulas are as follows:
coverage = (length of the aligned sequence / total length of the gene) × 100;
identity = (length of the base-identical sequence / length of the aligned sequence) × 100.
Because some genes are highly homologous, different genes may align to the same genome region. When this happens, alignment results are selected by coverage value first and by identity value second, and the gene mapping result with the better alignment is retained as the final annotation result.
In step S3, the gene region and the upstream and downstream sequences of each genome are extracted according to the final standard annotation result of the annotation file; a script is written that extracts the sequences at the corresponding positions according to the final standard annotation result.
In step S4, the extracted sequences are aligned with mummer software in a specific order and variation among the genomes is detected. Variation is detected between genomes following the order in which they are arranged, for example genome 1 against 2, 2 against 3, 3 against 4, ..., 8 against 9, and 9 against 10.
The detection method is as follows: after the genomes are aligned with mummer software, a cooling.delta.1copies file is generated, and a script written according to the content of that file extracts the variation information.
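The sequential comparison order (1 against 2, 2 against 3, and so on) can be sketched as follows. The function names are hypothetical, and the `compare` callback stands in for whatever alignment and variation extraction is actually run for one pair of genomes; no mummer command line is assumed here.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def consecutive_pairs(genomes: List[str]) -> List[Tuple[str, str]]:
    """Pair each genome with the next one in the given display order."""
    return list(zip(genomes, genomes[1:]))


def detect_in_order(genomes: List[str],
                    compare: Callable[[str, str], T]) -> Dict[Tuple[str, str], T]:
    """Run `compare` on every adjacent pair and collect the results."""
    return {(a, b): compare(a, b) for a, b in consecutive_pairs(genomes)}
```

With ten genomes this yields nine comparisons rather than the 45 needed for all pairs, and each result corresponds to two tracks that are adjacent in the final display.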
Embodiment 2: as another implementation of the invention, as shown in FIG. 2, the pan-genome-based gene region variation detection and visualization method provided by this embodiment includes:
mapping the CDS of the gene of interest onto the pan-genome sequences;
using gmap software to obtain a preliminary annotation result;
filtering and screening to obtain a standard annotation result;
extracting the genes and the upstream and downstream sequences to obtain candidate distinguishing sequences;
aligning the extracted sequences with mummer software to obtain a variation detection result;
filtering and classifying the variation detection result, and visually displaying it using SVG.
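The final display step can be illustrated by generating SVG markup directly. The layout below (one gene line, exon boxes, one colored dot per variant) and the names `render_svg` and `COLORS` are assumptions for illustration; the patent does not specify how the figure is drawn.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape

# Assumed color scheme for the assumed variation classes.
COLORS = {"SNP": "crimson", "insertion": "seagreen", "deletion": "darkorange"}


def render_svg(region_len: int,
               exons: Iterable[Tuple[int, int]],
               variants: Iterable[Tuple[int, str]],
               width: int = 800, height: int = 60) -> str:
    """Draw one gene track: a baseline, exon boxes, and variant markers.

    `exons` holds 1-based inclusive (start, end) pairs and `variants` holds
    (position, class) pairs, all in coordinates of the extracted region.
    """
    scale = width / region_len
    mid = height / 2
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    parts.append(f'<line x1="0" y1="{mid}" x2="{width}" y2="{mid}" '
                 f'stroke="black"/>')
    for start, end in exons:
        parts.append(f'<rect x="{(start - 1) * scale:.1f}" y="{mid - 8:.1f}" '
                     f'width="{(end - start + 1) * scale:.1f}" height="16" '
                     f'fill="steelblue"/>')
    for pos, kind in variants:
        color = COLORS.get(kind, "gray")
        parts.append(f'<circle cx="{(pos - 0.5) * scale:.1f}" '
                     f'cy="{mid - 14:.1f}" r="3" fill="{color}">'
                     f'<title>{escape(kind)} at {pos}</title></circle>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Calling `render_svg` once per genome and stacking the outputs vertically would give one track per genome; the `<title>` child is shown as a tooltip by most SVG viewers.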
Embodiment 3: as shown in FIG. 3, the pan-genome-based gene region variation detection and visualization system provided by this embodiment of the invention includes:
a preliminary annotation result acquisition unit 1, used for mapping the CDS sequence of the gene of interest to the genome sequences by alignment to obtain a preliminary annotation result;
a final standard annotation result determining unit 2, used for filtering and screening the preliminary annotation result obtained by mapping, calculating a coverage value and an identity value from the alignments, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining a final standard annotation result according to the judgment result;
a gene region and upstream and downstream sequence extraction unit 3, used for extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
wherein the upstream and downstream sequences are the 5 kb upstream and the 5 kb downstream of the gene region, respectively;
an inter-genome variation detection unit 4, used for aligning the extracted sequences and detecting variation among the genomes;
a variation type filtering and classifying unit 5, used for filtering the variation detection result, removing N-containing fragments, and analyzing and classifying the variation types;
and a display unit 6, used for visually displaying the classified variation types using SVG.
Embodiment 4: the pan-genome-based gene region variation detection and visualization system of Embodiment 3 is deployed on a high-throughput sequencer and performs the function of each unit.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
The content of the information interaction and the execution process between the devices/units and the like is based on the same conception as the method embodiment of the present invention, and specific functions and technical effects brought by the content can be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. For specific working processes of the units and modules in the system, reference may be made to corresponding processes in the foregoing method embodiments.
Based on the technical solutions described in the embodiments of the present invention, the following application examples may be further proposed.
According to an embodiment of the present invention, there is also provided a computer apparatus including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, which when executed by the processor performs the steps of any of the various method embodiments described above.
Embodiments of the present invention also provide a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the respective method embodiments described above.
The embodiment of the invention also provides an information data processing terminal which, when implemented on an electronic device, provides a user input interface to implement the steps in the method embodiments described above; the information data processing terminal includes, but is not limited to, a mobile phone, a computer, or a switch.
The embodiment of the invention also provides a server which, when executed on an electronic device, provides a user input interface to implement the steps in the method embodiments described above.
Embodiments of the present invention also provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the method embodiments above may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
While the invention has been described with respect to what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A pan-genome-based gene region variation detection and visualization method, characterized by comprising the following steps:
S1, mapping the CDS sequence of a gene of interest to the genome sequences by alignment to obtain a preliminary annotation result;
S2, filtering and screening the preliminary annotation result obtained by mapping, calculating a coverage value and an identity value from the alignments, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining a final standard annotation result according to the judgment result;
S3, extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
S4, aligning the extracted sequences and detecting variation among the genomes;
S5, filtering the variation detection result, removing N-containing fragments, and analyzing and classifying the variation types;
S6, visually displaying the classified variation types using SVG;
in step S1, the preliminary annotation result is the gene structure information on each genome, including the chromosome on which the gene is located and its specific position, together with information on the gene region, the exon regions, and the intron regions, including their lengths;
in step S2, screening the optimal alignment result according to the coverage value and the identity value, judging whether the gene is located in the same collinear region on the different genomes, and determining the final standard annotation result according to the judgment result includes: aligning the same gene to the different genomes, determining its position and gene structure on each genome, selecting alignment results by coverage value first and by identity value second, and retaining the gene mapping result with the better alignment as the final annotation result.
2. The pan-genome-based gene region variation detection and visualization method according to claim 1, wherein in step S2, filtering, screening, and calculating the coverage value and the identity value from the alignments comprises:
aligning the specific gene sequence to the genome with gmap software, determining the specific position of the gene on the genome, and calculating a coverage value and an identity value from the alignment result; the calculation formulas are as follows:
coverage = (length of the aligned sequence / total length of the gene) × 100;
identity = (length of the base-identical sequence / length of the aligned sequence) × 100.
3. The method for detecting and visualizing a genomic region variation according to claim 1, wherein in step S3, extracting each genomic region and upstream and downstream sequences according to the final standard annotation result of the annotation file comprises: and writing a script according to the final standard annotation result to extract the sequence of the corresponding position.
4. The method for detecting and visualizing a genomic region variation according to claim 3, wherein the upstream and downstream sequences comprise sequences of 5kb up and down, respectively.
5. The method for detecting and visualizing a genomic region variation according to claim 1, wherein in step S4, sequence alignment is performed on the extracted sequences and the variation between genomes is detected, comprising: and carrying out sequence comparison on the extracted sequences by mummer software, and sequentially carrying out inter-genome variation detection according to the arrangement sequence of the genomes.
6. The method for detecting and visualizing a gene region variation according to claim 5, wherein the variation detection comprises: after the genomes are aligned with MUMmer software, a linear delta.1 records file is generated, and a script written against the content of that file extracts the variation information.
7. A pan-genome-based gene region variation detection and visualization system, characterized in that it implements the pan-genome-based gene region variation detection and visualization method according to any one of claims 1 to 6, the system comprising:
a preliminary annotation result acquisition unit (1) for mapping the CDS sequence of the gene of interest onto the genome sequence by alignment to obtain a preliminary annotation result; the preliminary annotation result is the gene structure information on each genome, including the chromosome and specific position of the gene, and the lengths of the gene region, the exon regions and the intron regions;
a final standard annotation result determination unit (2) for filtering and comparing the preliminary annotation results obtained by mapping, calculating a coverage value and an identity value, screening the optimal alignment result according to the coverage value and the identity value, determining whether the gene is located in the same collinear region on different genomes, and determining a final standard annotation result according to the determination result, which specifically includes: aligning the same gene to the different genomes, determining its position and gene structure on each genome, ranking the alignments by coverage value first and by identity value second, and retaining the best-aligned gene mapping result as the final annotation result;
a genome gene region and upstream and downstream sequence extraction unit (3) for extracting the gene region and the upstream and downstream sequences of each genome according to the final standard annotation result of the annotation file;
an inter-genome variation detection unit (4) for aligning the extracted sequences and detecting variation between genomes;
a variation type filtering and classification unit (5) for filtering the variation detection results, removing N-containing fragments, and analyzing and classifying the variation types;
and a display unit (6) for visually displaying the classified variation types using SVG.
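The filtering and classification performed by unit (5) can be sketched as below. The claims only state that N-containing fragments are removed and that variation types are classified; the `(position, ref, alt)` tuple layout and the four labels used here (SNP, insertion, deletion, substitution) are assumptions made for illustration.

```python
def classify(ref: str, alt: str) -> str:
    """Label a variant by comparing allele lengths (labels are illustrative)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "substitution"  # same length, more than one base


def filter_and_classify(variants):
    """Drop variants whose alleles contain N, then group the rest by type."""
    grouped = {}
    for pos, ref, alt in variants:
        if "N" in ref.upper() or "N" in alt.upper():
            continue  # N marks unresolved bases, so the call is not kept
        grouped.setdefault(classify(ref, alt), []).append((pos, ref, alt))
    return grouped


# Illustrative input: (position, reference allele, alternative allele).
calls = [
    (10, "A", "G"),
    (20, "A", "ANNT"),   # contains N and is removed
    (30, "AT", "A"),
    (40, "C", "CTT"),
    (50, "AC", "GT"),
]
by_type = filter_and_classify(calls)
```

Filtering before classification keeps gap-derived calls from inflating the insertion and deletion counts.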
8. The pan-genome-based gene region variation detection and visualization system of claim 7, wherein the system is deployed on a biological analysis platform and performs the corresponding functions of each unit.
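As a rough illustration of what the SVG display of unit (6) might produce, the sketch below draws one horizontal track per genome and one coloured tick per classified variation. The layout, colours, function name and input structure are all assumptions; the claims specify only that SVG is used for the display.

```python
from xml.sax.saxutils import escape

# Hypothetical colour scheme; unknown types fall back to grey.
COLORS = {"SNP": "#d62728", "insertion": "#2ca02c", "deletion": "#1f77b4"}


def render_svg(tracks, region_length, width=800, row_height=40):
    """Return an SVG string with one line per genome and a tick per variant.

    tracks: list of (genome_name, [(position, variant_type), ...]) where
    positions are 1-based within the extracted region.
    """
    margin = 100
    scale = (width - 2 * margin) / region_length
    height = row_height * (len(tracks) + 1)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for row, (name, variants) in enumerate(tracks, start=1):
        y = row * row_height
        parts.append(f'<text x="5" y="{y + 4}" font-size="12">{escape(name)}</text>')
        parts.append(
            f'<line x1="{margin}" y1="{y}" x2="{width - margin}" y2="{y}" stroke="black"/>'
        )
        for pos, vtype in variants:
            x = margin + pos * scale
            color = COLORS.get(vtype, "#7f7f7f")
            parts.append(
                f'<line x1="{x:.1f}" y1="{y - 8}" x2="{x:.1f}" y2="{y + 8}" '
                f'stroke="{color}" stroke-width="2"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


svg = render_svg(
    [("genomeA", [(500, "SNP")]), ("genomeB", [])],
    region_length=1000,
)
```

Writing `svg` to a file with an `.svg` extension gives an image that a web browser can open directly.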
CN202311133414.7A 2023-09-05 2023-09-05 Gene region variation detection and visualization method and system based on pan genome Active CN117153248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311133414.7A CN117153248B (en) 2023-09-05 2023-09-05 Gene region variation detection and visualization method and system based on pan genome

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311133414.7A CN117153248B (en) 2023-09-05 2023-09-05 Gene region variation detection and visualization method and system based on pan genome

Publications (2)

Publication Number Publication Date
CN117153248A (en) 2023-12-01
CN117153248B (en) 2024-05-07

Family

ID=88909595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311133414.7A Active CN117153248B (en) 2023-09-05 2023-09-05 Gene region variation detection and visualization method and system based on pan genome

Country Status (1)

Country Link
CN (1) CN117153248B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012034251A2 (en) * 2010-09-14 2012-03-22 深圳华大基因科技有限公司 Methods and systems for detecting genomic structure variations
WO2016154493A1 (en) * 2015-03-24 2016-09-29 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for multi-scale, annotation-independent detection of functionally-diverse units of recurrent genomic alteration
CN106498070A (en) * 2016-11-17 2017-03-15 中国科学院华南植物园 A kind of method based on genome LoF site examination indirect association Kiwi berry kinds
EP3243907A1 (en) * 2016-05-13 2017-11-15 Curetis GmbH Stable pan-genomes and their use
WO2020183428A2 (en) * 2019-03-14 2020-09-17 Tata Consultancy Services Limited Method and system for mapping read sequences using a pangenome reference
CN112233726A (en) * 2020-10-23 2021-01-15 深圳未知君生物科技有限公司 Analysis method and analysis device for bacterial strains and storage medium
CN114373506A (en) * 2021-12-31 2022-04-19 海南大学 Annotation method for pan-transcriptome of eukaryote
CN115064215A (en) * 2022-08-18 2022-09-16 北京大学人民医院 Method for tracing strain and identifying attribute through similarity
CN115216557A (en) * 2022-07-05 2022-10-21 河南农业大学 Preparation method and application of wheat ultra-high density SNP chip
CN115631789A (en) * 2022-10-25 2023-01-20 哈尔滨工业大学 Pangenome-based group joint variation detection method
CN115662521A (en) * 2022-11-07 2023-01-31 哈尔滨工业大学 Sequence real-time comparison method based on pan-genome

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
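Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations; Peng Qin et al.; Cell Press; Vol. 184, No. 13; pp. 3542-3558 *
Development and application of pan-genomics analysis methods; Zhao Yongbing; China Doctoral Dissertations Full-text Database, Basic Sciences; pp. 1-73 *
Research progress on the application of pan-genomics in plants; Wang Can; Wang Yanfang; Zhang Yinghua; Xu Junqiang; Journal of Hunan Ecological Science; 2020-06-25 (No. 02); pp. 55-62 *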

Also Published As

Publication number Publication date
CN117153248A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN109887548B (en) ctDNA ratio detection method and detection device based on capture sequencing
CN108319813B (en) Method and device for detecting circulating tumor DNA copy number variation
US7333907B2 (en) System and methods for characterization of chemical arrays for quality control
CN108664766B (en) Method, device, and apparatus for analyzing copy number variation, and storage medium
CN107944228B (en) Visualization method for gene sequencing variation site
Arrigo et al. Automated scoring of AFLPs using RawGeno v 2.0, a free R CRAN library
CN109243530B (en) Genetic variation determination method, system, and storage medium
Oliva et al. Systematic benchmark of ancient DNA read mapping
KR101795662B1 (en) Apparatus and Method for Diagnosis of metabolic disease
CN109524060B (en) Genetic disease risk prompting gene sequencing data processing system and processing method
CN116030892A (en) System and method for identifying chromosome reciprocal translocation breakpoint position
CN109461473B (en) Method and device for acquiring concentration of free DNA of fetus
CN117153248B (en) Gene region variation detection and visualization method and system based on pan genome
WO2003074739A2 (en) Automated allele determination using fluorometric genotyping
CN111128308B (en) New mutation information knowledge platform for neuropsychiatric diseases
CN112102944A (en) NGS-based brain tumor molecular diagnosis analysis method
NL2033442B1 (en) A Preparation Method and the Application of an Ultra-high Density SNP Chip for Wheat
CN107885972A (en) It is a kind of based on the fusion detection method of single-ended sequencing and its application
EP1798651B1 (en) Gene information display method and apparatus
CN112837746B (en) Probe design method and positioning method for wheat exon sequencing gene positioning
CN108959853B (en) Analysis method, analysis device, equipment and storage medium for copy number variation
CN114566213A (en) Single-parent diploid analysis method and system for family high-throughput sequencing data
US20230282307A1 (en) Method for detecting uniparental disomy based upon ngs-trio, and use thereof
JP2005284964A (en) Method for displaying data and process in system for analyzing gene manifestation as well as system for analyzing gene expression
WO2023181370A1 (en) Information processing device, information processing method, and information processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant