CN114121153A - Gene mutation site detection method, device, electronic equipment and storage medium - Google Patents

Publication number
CN114121153A
Authority
CN
China
Prior art keywords
sequence
mutation
gene
amino acid
detection
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111394703.3A
Other languages
Chinese (zh)
Inventor
金文祥
刘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kingmed Diagnostics Central Co Ltd
Original Assignee
Guangzhou Kingmed Diagnostics Central Co Ltd
Application filed by Guangzhou Kingmed Diagnostics Central Co Ltd

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50: Mutagenesis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The embodiment of the application discloses a method and a device for detecting gene mutation sites, electronic equipment and a storage medium. The method comprises the following steps: acquiring a detection gene sequence, a reference gene sequence and comparison parameters; comparing the detection gene sequence with the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result; extracting a target detection gene segment from the detection gene sequence and translating it into a target amino acid sequence; comparing, based on the comparison parameters, the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence to obtain a second sequence comparison result; searching for mutation sites in the first sequence comparison result and the second sequence comparison result, and determining the variation type of each mutation site; and displaying a gene variation detection result and an amino acid variation detection result, each of which comprises the position information of the mutation site, the mutation base and the variation type of the mutation site.

Description

Gene mutation site detection method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing and analysis technologies, and in particular, to a method and an apparatus for detecting a gene mutation site, an electronic device, and a storage medium.
Background
The detection of gene mutation sites has important applications in prokaryotic genotyping, drug resistance identification, antigenicity change and the like. The measured gene sequence is compared with a reference gene sequence, and a mutation is defined at each position where the two sequences are not identical.
Existing gene sequence comparison algorithms and software generally provide several tools, such as nucleic acid sequence comparison and protein sequence comparison. They allow a user to input sequences in a preset format and return a sequence comparison result, but the display of gene mutation sites is not intuitive or detailed enough, and the user has to operate and check manually to obtain more information.
Disclosure of Invention
In view of the above, it is necessary to provide a gene mutation site detection method, apparatus, electronic device and medium for addressing the above problems.
In a first aspect, a method for detecting a gene mutation site is provided, which comprises:
obtaining a detection gene sequence, obtaining a reference gene sequence and comparison parameters, wherein the comparison parameters are used for calculating scores of different types of base pairs in sequence comparison;
comparing the detection gene sequence with the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result;
extracting a target detection gene segment from the detection gene sequence, and translating the target detection gene segment into a target amino acid sequence;
comparing, based on the comparison parameters, the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence to obtain a second sequence comparison result;
searching mutation sites in the first sequence comparison result and the second sequence comparison result, and determining the variation type of the mutation sites;
and displaying a gene variation detection result obtained based on the first sequence alignment result and an amino acid variation detection result obtained based on the second sequence alignment result, wherein the gene variation detection result and the amino acid variation detection result respectively comprise the position information of the mutation site, the mutation base of the mutation site and the variation type.
Optionally, comparing the detection gene sequence with the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result includes:
aligning and comparing the detection gene sequence with the reference gene sequence based on a dynamic programming algorithm to obtain the first sequence comparison result;
and comparing the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence based on the comparison parameters to obtain a second sequence comparison result includes:
aligning and comparing the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence based on the dynamic programming algorithm to obtain the second sequence comparison result.
Optionally, the searching for the mutation site in the first sequence alignment result and the second sequence alignment result, and determining the variation type of the mutation site include:
searching mismatched bases by traversing the first sequence comparison result and the second sequence comparison result, determining the positions of the mismatched bases as mutation sites, recording the positions of the mutation sites, and determining the mutation type of the mutation sites as point mutation;
searching the initial position where deletion occurs by traversing the first sequence comparison result and the second sequence comparison result, after the initial position where deletion occurs is found, continuing to search the sequence until the deletion ending position is found, determining that the position from the initial position to the deletion ending position is a mutation site, recording the position of the mutation site and a corresponding deletion base, and determining that the mutation type of the mutation site is deletion mutation;
searching for the initial position where an insertion occurs by traversing the first sequence comparison result and the second sequence comparison result, after the initial position of the insertion is found, continuing to search the sequence until the end position of the insertion is found, determining the positions from the initial position to the end position of the insertion as a mutation site, recording the position of the mutation site and the corresponding inserted bases, and determining the mutation type of the mutation site as insertion mutation.
Optionally, the extracting a target detection gene segment from the detection gene sequence, and translating the target detection gene segment into a target amino acid sequence, includes:
intercepting the aligned detection gene sequence according to the initial position and the end position of a coding sequence of a preset reference gene sequence to obtain the target detection gene segment;
and translating the codons of the target detection gene segment into amino acids according to the universal codon table to obtain the target amino acid sequence.
Optionally, after the obtaining of the detection gene sequence, the method further comprises:
checking the input detection gene sequence, and outputting an input-error prompt if the input is empty or contains characters other than the preset characters; and if the input detection gene sequence contains only the preset characters, removing the spaces and line breaks in the detection gene sequence and converting the sequence to capital letters.
Optionally, the obtaining of the alignment parameters includes:
acquiring input custom comparison parameters, or acquiring default comparison parameters;
the alignment parameters include a first parameter representing a score for a base perfect match, a second parameter representing a score for a base mismatch, a third parameter representing a score for a base alignment gap, and a fourth parameter representing a score for a base alignment gap extension, wherein the first and second parameters are positive numbers, the third and fourth parameters are negative numbers, and the fourth parameter is greater than or equal to the third parameter.
Optionally, the method further includes:
and responding to a downloading instruction of a target detection result, and downloading a text file of the target detection result, wherein the target detection result comprises the gene variation detection result and/or the amino acid variation detection result.
In a second aspect, there is provided a gene mutation site detection apparatus comprising:
the acquisition module is used for acquiring a detection gene sequence, acquiring a reference gene sequence and comparison parameters, wherein the comparison parameters are used for calculating scores of different types of base pairs in sequence comparison;
the comparison module is used for comparing the sequences of the detection gene sequence and the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result;
the extraction module is used for extracting a target detection gene segment from the detection gene sequence and translating the target detection gene segment into a target amino acid sequence;
the comparison module is further used for comparing the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence based on the comparison parameters to obtain a second sequence comparison result;
the search module is used for searching mutation sites in the first sequence comparison result and the second sequence comparison result and determining the variation type of the mutation sites;
a display module, configured to display a gene variation detection result obtained based on the first sequence comparison result and an amino acid variation detection result obtained based on the second sequence comparison result, where the gene variation detection result and the amino acid variation detection result both include position information of the mutation site, a mutation base of the mutation site, and the variation type.
In a third aspect, an electronic device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps as in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of the first aspect and any possible implementation thereof.
The main objective of the present application is to provide a method, an apparatus, an electronic device and a storage medium for detecting gene mutation sites. A detection gene sequence, a reference gene sequence and comparison parameters are obtained, the comparison parameters being used for calculating scores of different types of base pairs in sequence comparison. The detection gene sequence is compared with the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result. A target detection gene segment is extracted from the detection gene sequence and translated into a target amino acid sequence. Based on the comparison parameters, the target amino acid sequence is compared with the amino acid sequence corresponding to the reference gene sequence to obtain a second sequence comparison result. Mutation sites are searched for in the first sequence comparison result and the second sequence comparison result, and the variation type of each mutation site is determined. A gene variation detection result obtained from the first sequence comparison result and an amino acid variation detection result obtained from the second sequence comparison result are displayed, both of which comprise the position information of the mutation site, the mutation base of the mutation site and the variation type. User operation is thereby simplified: the gene sequence is translated into the amino acid sequence automatically; gene sequence comparison, gene annotation, amino acid sequence comparison and mutation site detection are carried out in one workflow; and the specific position information, mutation base and variation type of each mutation site are displayed, providing a more intuitive and detailed detection result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
FIG. 1 is a schematic flow chart of a method for detecting a gene mutation site according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the type of gene mutation provided in the examples of the present application;
FIG. 3 is a schematic page diagram of a detection result provided in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of a gene mutation site detection apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for detecting a gene mutation site according to an embodiment of the present disclosure. The method can comprise the following steps:
101. Obtaining a detection gene sequence, a reference gene sequence and comparison parameters, wherein the comparison parameters are used for calculating scores of different types of base pairs in sequence comparison.
The execution subject of the embodiment of the present application may be a gene mutation site detection apparatus, which may be an electronic device. In a specific implementation, the electronic device may be a terminal, also referred to as a terminal device, such as a desktop computer. In some embodiments, the gene mutation site detection apparatus may be another portable device such as a laptop computer or a tablet computer.
The detection gene sequence in the embodiment of the application is the gene sequence to be detected and analyzed, provided by the user. The reference gene sequence used for comparison and the comparison parameters to be adopted can be preset by the user, and the comparison parameters can be used for calculating scores of different types of base pairs during sequence comparison.
Optionally, after obtaining the detection gene sequence, the method further comprises:
checking the input detection gene sequence, and outputting an input-error prompt if the input is empty or contains characters other than the preset characters; if the input detection gene sequence contains only the preset characters, removing the spaces and line breaks in the detection gene sequence and converting the sequence to capital letters.
Specifically, the reference gene sequence input by the user may be checked first: if the input is empty or contains characters other than the preset characters (e.g., A, T, C, G, N), an input-error prompt may be given. For legal input, spaces and line breaks in the input sequence may be removed automatically and the case converted (e.g., all letters converted to capital letters). Optionally, an example-sequence option and a sequence-clearing option may also be provided. For example, an Example button displayed below the text box supplies an example sequence when clicked, and a Clear button displayed below the text box clears the input reference gene sequence and the output result field when clicked, which makes the system convenient to operate.
Similar checking steps can be performed on the detection gene sequence input by the user and are not described in detail here.
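The input check described above can be sketched in Python as follows. This is a minimal illustration rather than the patented software: the function name `normalize_sequence`, the error messages, and the choice to strip whitespace before testing for illegal characters are all assumptions; only the preset character set A, T, C, G, N comes from the text.

```python
ALLOWED_BASES = frozenset("ATCGN")  # preset characters named in the text


def normalize_sequence(raw: str) -> str:
    """Return the cleaned, upper-case sequence, or raise ValueError on bad input.

    Hypothetical helper: the name and messages are illustrative only.
    """
    # Remove spaces, tabs and line breaks, then convert to capital letters.
    cleaned = "".join(raw.split()).upper()
    if not cleaned:
        raise ValueError("input error: the sequence is empty")
    # Anything outside the preset character set is reported as an input error.
    illegal = sorted(set(cleaned) - ALLOWED_BASES)
    if illegal:
        raise ValueError(f"input error: unexpected characters {illegal}")
    return cleaned
```

The same helper could be applied to both the reference gene sequence and the detection gene sequence before alignment.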
In an optional embodiment, the obtaining of the alignment parameters includes:
acquiring input custom comparison parameters, or acquiring default comparison parameters;
the alignment parameters include a first parameter indicating a score of perfect base match, a second parameter indicating a score of mismatch, a third parameter indicating a score of gap in base alignment, and a fourth parameter indicating a score of gap extension in base alignment, wherein the first parameter and the second parameter are positive numbers, the third parameter and the fourth parameter are negative numbers, and the fourth parameter is equal to or greater than the third parameter.
Specifically, options for default parameters and custom comparison parameters may be provided. If the user selects the default comparison parameters, the parameters in the custom comparison parameter column are left unset; if the custom comparison parameter option is selected, each parameter in the custom parameter column can be set. Four parameters are provided in the embodiment of the present application:
identical characters points indicates the score for a perfect base match,
unidentical characters points indicates the score for a base mismatch,
gap open points indicates the score for opening a gap in the base alignment,
gap extension points indicates the score for extending a gap in the base alignment;
wherein the software also has built-in validity conditions for the parameters: identical characters points is a positive number, gap open points and gap extension points are negative numbers, and gap extension points is greater than or equal to gap open points. The default parameter option uses the software's default values.
Through the comparison parameters, the comparison score of the gene sequence can be more accurately and comprehensively calculated, and the mutation condition in the gene sequence can be evaluated.
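As an illustration of the parameter conditions, the sketch below stores the four scores in a small Python dataclass and rejects invalid combinations. The class name `AlignmentParams` and the field names are assumptions made for this example, and the checks simply mirror the conditions as worded in the text (first and second parameters positive, third and fourth negative, fourth not smaller than third). The text does not state the software's default values, so none are supplied here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentParams:
    """Hypothetical container for the four comparison parameters."""
    match_score: float     # first parameter: perfect base match
    mismatch_score: float  # second parameter: base mismatch
    gap_open: float        # third parameter: opening a gap
    gap_extended: float    # fourth parameter: extending a gap

    def __post_init__(self):
        # Conditions as stated in the text; violations are reported as errors.
        if self.match_score <= 0 or self.mismatch_score <= 0:
            raise ValueError("match and mismatch scores must be positive")
        if self.gap_open >= 0 or self.gap_extended >= 0:
            raise ValueError("gap scores must be negative")
        if self.gap_extended < self.gap_open:
            raise ValueError("gap extension must be >= gap open")
```

For example, `AlignmentParams(5, 1, -4, -2)` is accepted, while `AlignmentParams(5, 1, -2, -4)` is rejected because the extension score is smaller than the opening score. These numbers are illustrative only.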
102. Comparing the detection gene sequence with the reference gene sequence based on the comparison parameters to obtain a first sequence comparison result.
The detection of gene mutation sites has important applications in prokaryotic genotyping, drug resistance identification, antigenicity change and the like. By comparing the detection gene sequence with the reference gene sequence, each site at which the two are inconsistent is defined as a mutation.
In the embodiment of the present application, the amino acid sequences corresponding to the detection gene sequence and the reference gene sequence may also be compared to obtain the amino acid sequence comparison result (also referred to as the protein sequence comparison result).
In an alternative embodiment, the step 102 includes:
and aligning and comparing the detection gene sequence with the reference gene sequence based on a dynamic programming algorithm to obtain the comparison result of the first sequence.
Specifically, the lengths of the reference gene sequence and the detection gene sequence provided by the user can be calculated automatically. In the embodiment of the application, a dynamic programming algorithm is adopted for sequence comparison, so that the detection gene sequence provided by the user can be compared with the reference gene sequence to obtain the optimal sequence comparison result, which is displayed on the software interface together with the comparison result of the amino acid sequence.
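The text does not spell out which dynamic programming algorithm is used. As one possible reading, the following Python sketch performs a global alignment with affine gap scores (a Gotoh-style variant of Needleman-Wunsch). The function name `affine_align`, the three-layer layout, and the convention that a gap of length k scores `gap_open + (k - 1) * gap_extended` are assumptions for this example, not details confirmed by the source.

```python
from math import inf


def affine_align(ref, query, match_score, mismatch_score, gap_open, gap_extended):
    """Global alignment with affine gaps; returns (aligned_ref, aligned_query, score)."""
    n, m = len(ref), len(query)
    # Layer 0: column pairs two residues; 1: gap in query; 2: gap in reference.
    S = [[[-inf] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    P = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]  # previous layer
    S[0][0][0] = 0
    for i in range(1, n + 1):          # leading gap in the query
        S[1][i][0] = gap_open + (i - 1) * gap_extended
        P[1][i][0] = 1
    for j in range(1, m + 1):          # leading gap in the reference
        S[2][0][j] = gap_open + (j - 1) * gap_extended
        P[2][0][j] = 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match_score if ref[i - 1] == query[j - 1] else mismatch_score
            diag = [S[k][i - 1][j - 1] + pair for k in range(3)]
            up = [S[0][i - 1][j] + gap_open, S[1][i - 1][j] + gap_extended,
                  S[2][i - 1][j] + gap_open]
            left = [S[0][i][j - 1] + gap_open, S[1][i][j - 1] + gap_open,
                    S[2][i][j - 1] + gap_extended]
            for layer, cands in enumerate((diag, up, left)):
                best = max(range(3), key=cands.__getitem__)
                S[layer][i][j] = cands[best]
                P[layer][i][j] = best
    # Trace back from the best-scoring layer in the bottom-right cell.
    layer = max(range(3), key=lambda k: S[k][n][m])
    score = S[layer][n][m]
    out_ref, out_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev = P[layer][i][j]
        if layer == 0:
            out_ref.append(ref[i - 1]); out_query.append(query[j - 1])
            i -= 1; j -= 1
        elif layer == 1:
            out_ref.append(ref[i - 1]); out_query.append("-")
            i -= 1
        else:
            out_ref.append("-"); out_query.append(query[j - 1])
            j -= 1
        layer = prev
    return "".join(reversed(out_ref)), "".join(reversed(out_query)), score
```

With the illustrative scores `affine_align("ACGT", "ACT", 2, 1, -3, -1)`, the sketch aligns the shorter sequence with a single gap opposite the G. The same function could be applied to amino acid strings, since it only tests characters for equality.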
103. Extracting target detection gene segment from the detection gene sequence, and translating the target detection gene segment into a target amino acid sequence.
The embodiments of this application involve the protein coding sequence (CDS), which is part of cellular DNA. Prokaryotic genes are divided into coding regions and non-coding regions: a coding region is a portion that can be transcribed into messenger RNA and from which the corresponding protein can be synthesized, whereas a non-coding region is DNA that cannot be transcribed into messenger RNA.
In the embodiment of the application, the CDS in the detection gene sequence can be translated into an amino acid sequence for display, and the translated amino acid sequence can then be aligned.
In an alternative embodiment, the step 103 includes:
intercepting the aligned detection gene sequence according to the initial position and the end position of a coding sequence of a preset reference gene sequence to obtain the target detection gene segment;
and translating the codons of the target detection gene segment into amino acids according to the universal codon table to obtain the target amino acid sequence.
Specifically, the start position and the end position of the coding sequence of the preset reference gene sequence can be selected or provided by the user. Optionally, the software can automatically check whether the CDS start and end positions provided by the user are valid, that is, both positions must lie between 1 and the total length of the reference gene sequence. Using the CDS start and end positions of the reference gene sequence provided by the user, the software then extracts the portion of the detection gene sequence in the corresponding interval, namely the target detection gene segment, translates its codons to obtain the target amino acid sequence, and displays the target amino acid sequence on the software interface.
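The extraction and translation step can be sketched as below. The function names, the 1-based inclusive coordinates, the extra check that the start does not exceed the end, the handling of insertion columns inside the CDS, and the use of `X` for codons that cannot be looked up are all assumptions for illustration. The codon table is the standard genetic code written in the compact TCAG ordering.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def extract_target_segment(aligned_ref, aligned_query, cds_start, cds_end):
    """Cut the query bases that align to reference positions cds_start..cds_end."""
    ref_length = len(aligned_ref.replace("-", ""))
    if not 1 <= cds_start <= cds_end <= ref_length:
        raise ValueError("CDS positions must lie between 1 and the reference length")
    segment = []
    ref_pos = 0  # 1-based coordinate of the last reference base seen
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_pos += 1
            inside = cds_start <= ref_pos <= cds_end
        else:
            # Insertion column: keep it only when it falls strictly inside the CDS.
            inside = cds_start <= ref_pos < cds_end
        if inside and query_char != "-":
            segment.append(query_char)
    return "".join(segment)


def translate(segment):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    usable = len(segment) - len(segment) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(segment[i:i + 3], "X") for i in range(0, usable, 3))
```

For instance, cutting positions 3 to 11 out of a query aligned to the reference `CCATGGCTTAAGG` yields a nine-base segment that `translate` turns into a three-residue string ending in the stop symbol `*`.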
104. Performing the sequence comparison on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the comparison parameters to obtain a second sequence comparison result.
In an alternative embodiment, the step 104 includes:
aligning and comparing the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence based on the dynamic programming algorithm to obtain the second sequence comparison result.
The amino acid sequences in step 104 can be processed in the manner described in detail above for step 102, which is not repeated here.
105. Searching for mutation sites in the first sequence comparison result and the second sequence comparison result, and determining the variation type of the mutation sites.
This step 105 may be performed after the above-mentioned steps 102 and 104.
In the gene mutation site detection of the embodiment of the present application, mutation sites can be searched for in the comparison results of the gene sequence and of the amino acid sequence, and the mutation types are divided into point mutations, insertion mutations and deletion mutations. A new window can then be generated to display the results, that is, step 106 can be performed.
First, point mutation, deletion mutation and insertion mutation are defined. FIG. 2 is a schematic diagram of the gene mutation types provided in the embodiments of the present application. As shown in FIG. 2, the detection gene sequence is first aligned with the reference gene sequence. In the embodiment of the present application, a dynamic programming algorithm is used for the sequence alignment, and mismatches and gaps are allowed in the alignment. The score of a perfect base match is the parameter match_score, the score of a base mismatch is the parameter mismatch_score, the score of opening a gap is the parameter gap_open, and the score of extending a gap is the parameter gap_extended. The sequence alignment function can be as follows:
[Figures BDA0003369616560000091 and BDA0003369616560000101: listing of the sequence alignment function, not reproduced in this text]
Mutation type detection is then carried out. If the detection gene sequence and the reference gene sequence differ by one base (a mismatched base) at the same position a, the mutation is a point mutation; if the detection gene sequence lacks a base at position b, the mutation is a deletion mutation; and if the detection gene sequence has an extra base at position c, the mutation is an insertion mutation.
In an alternative embodiment, the step 105 includes:
searching mismatched bases by traversing the first sequence comparison result and the second sequence comparison result, determining the position of the mismatched bases as a mutation site, recording the position of the mutation site, and determining the mutation type of the mutation site as point mutation;
searching the initial position where deletion occurs by traversing the first sequence comparison result and the second sequence comparison result, after the initial position where deletion occurs is found, continuing to search the sequence until the deletion ending position is found, determining the position from the initial position to the deletion ending position as a mutation site, recording the position of the mutation site and a corresponding deletion base, and determining the mutation type of the mutation site as deletion mutation;
searching for the initial position where an insertion occurs by traversing the first sequence comparison result and the second sequence comparison result, after the initial position of the insertion is found, continuing to search the sequence until the end position of the insertion is found, determining the positions from the initial position to the end position of the insertion as a mutation site, recording the position of the mutation site and the corresponding inserted bases, and determining the mutation type of the mutation site as insertion mutation.
Specifically, mutation detection is performed on the aligned sequences: point mutations may be detected first, followed by deletion and insertion mutations. To detect point mutations, the aligned sequences are traversed to find mismatched bases, and the position and base of each are recorded. To detect deletion and insertion mutations, the aligned sequences are traversed to find the start position of a deletion or insertion; once a start position is found, the search continues along the sequence until the end position of the mutation is found, and the position and the mutated bases are recorded. After the mutation search is finished, the positions and mutated bases of all mutation results are output.
The mutation searching step described above in the embodiments of the present application may be implemented by programming.
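The search described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: it assumes the two aligned sequences have equal length and use "-" as the gap character, it reports positions as 1-based reference coordinates, and it finds all three mutation types in a single traversal rather than in separate passes. The names `Mutation` and `find_mutations` are hypothetical.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    position: int  # 1-based reference position (for insertions: the reference
                   # base just before the inserted run, 0 if before the first base)
    ref: str       # reference base(s), or the gap character for insertions
    alt: str       # observed base(s), or the gap character for deletions
    kind: str      # "point", "deletion" or "insertion"


def find_mutations(ref_aln: str, qry_aln: str, gap: str = "-") -> List[Mutation]:
    """Traverse two aligned sequences and collect mutation sites."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations: List[Mutation] = []
    ref_pos = 0  # number of reference residues consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r != gap and q == gap:
            # Deletion: keep scanning until the gap run in the query ends.
            start = i
            while i < n and qry_aln[i] == gap and ref_aln[i] != gap:
                i += 1
            deleted = ref_aln[start:i]
            mutations.append(Mutation(ref_pos + 1, deleted, gap, "deletion"))
            ref_pos += len(deleted)
        elif r == gap and q != gap:
            # Insertion: keep scanning until the gap run in the reference ends.
            start = i
            while i < n and ref_aln[i] == gap and qry_aln[i] != gap:
                i += 1
            mutations.append(Mutation(ref_pos, gap, qry_aln[start:i], "insertion"))
        else:
            if r != gap:
                ref_pos += 1
                if r != q:
                    # Aligned but different residues: point mutation.
                    mutations.append(Mutation(ref_pos, r, q, "point"))
            i += 1
    return mutations
```

Calling `find_mutations` on an aligned reference/detection pair returns one record per mutation site. The same function can be applied to aligned amino acid sequences, since it only compares characters.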
106. Displaying a gene variation detection result obtained based on the first sequence alignment result and an amino acid variation detection result obtained based on the second sequence alignment result, wherein the gene variation detection result and the amino acid variation detection result each include the position information of the mutation site, the mutated base of the mutation site, and the variation type.
After detection is complete, the gene and amino acid mutation results can be visualized automatically. For example, fig. 3 is a schematic diagram of a page showing mutation detection results. As shown in fig. 3, the page displays the gene mutation detection result and the amino acid mutation detection result, each including the mutation position, reference base, mutated base, and mutation type. The embodiments of the present application do not limit the appearance of the display interface.
Further optionally, the method further includes:
in response to a download instruction for a target detection result, downloading a text file of the target detection result, wherein the target detection result comprises the gene variation detection result and/or the amino acid variation detection result.
The user can download the result as a text file in txt format by clicking the download button, as shown in Table 1 below:
Figure BDA0003369616560000111
TABLE 1
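As a rough illustration of the text export, the sketch below writes mutation records to a tab-separated text file. Table 1 is only available as an image, so the column names and the tab-separated layout here are assumptions based on the fields named in the description (mutation position, reference base, mutated base, mutation type). The function names `format_results` and `save_results` are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple

# (position, reference base, mutated base, mutation type); assumed layout
Row = Tuple[int, str, str, str]


def format_results(rows: Iterable[Row]) -> str:
    """Render mutation records as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["position", "reference", "mutation", "type"])
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def save_results(rows: Iterable[Row], path: str) -> None:
    """Write the formatted records to a .txt file at the given path."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(format_results(rows))
```

A download handler could call `save_results` with the gene variation records, the amino acid variation records, or both, depending on which target detection result the user selected.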
In conventional methods, detecting mutations in an amino acid sequence requires first annotating the gene sequence as an amino acid sequence, and then aligning the amino acid sequence and performing mutation detection on it.
The embodiments of the present application combine gene sequence alignment, gene annotation, amino acid sequence alignment, and mutation site detection into a single automated workflow. Verification shows that the mutation results obtained from the alignment are highly accurate. No manual inspection or data entry is needed during the process: the user does not have to manually excerpt the sequence and then translate it, nor manually check the alignment result for mutation site information, which simplifies user operation.
The method in the embodiments of the present application can be implemented as software written in Python, consisting mainly of a graphical interface program and a back-end program for the graphical interface. The programs are finally packaged with PyInstaller into a standalone executable software package that runs on Windows systems and can be run easily on a personal computer.
Based on the description of the embodiment of the gene mutation site detection method, the embodiment of the application also discloses a gene mutation site detection device. Referring to fig. 4, the gene mutation site detecting apparatus 400 includes:
an obtaining module 410, configured to obtain a detection gene sequence, obtain a reference gene sequence and an alignment parameter, where the alignment parameter is used to calculate scores of different types of base pairs in sequence alignment;
an alignment module 420, configured to perform the sequence alignment on the detection gene sequence and the reference gene sequence based on the alignment parameter, so as to obtain a first sequence alignment result;
an extraction module 430, configured to extract a target detection gene segment from the detection gene sequence, and translate the target detection gene segment into a target amino acid sequence;
the alignment module 420 is further configured to perform the sequence alignment on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the alignment parameter to obtain a second sequence alignment result;
a searching module 440, configured to search mutation sites in the first sequence alignment result and the second sequence alignment result, and determine a variation type of the mutation sites;
a display module 450, configured to display a gene variation detection result obtained based on the first sequence comparison result and an amino acid variation detection result obtained based on the second sequence comparison result, where the gene variation detection result and the amino acid variation detection result both include the position information of the mutation site, the mutation base of the mutation site, and the variation type.
According to an embodiment of the present application, each step involved in the method shown in fig. 1 may be performed by each module in the gene mutation site detecting apparatus 400 shown in fig. 4, and is not described herein again.
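To make the role of the extraction module 430 more concrete, the following sketch clips a coding region out of a gene sequence and translates it with the standard genetic code. It is a simplified illustration under stated assumptions: the CDS start and end are taken as 1-based inclusive positions on an ungapped sequence, mapping reference coordinates through alignment gaps is omitted, and any codon not in the table is rendered as "X". The function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_cds(sequence: str, cds_start: int, cds_end: int) -> str:
    """Return the segment between 1-based inclusive start and end positions."""
    if not 1 <= cds_start <= cds_end <= len(sequence):
        raise ValueError("CDS coordinates fall outside the sequence")
    return sequence[cds_start - 1:cds_end]


def translate(segment: str) -> str:
    """Translate a nucleotide segment codon by codon; unknown codons give 'X'."""
    segment = segment.upper()
    usable = len(segment) - len(segment) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(segment[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

For example, `translate(extract_cds(seq, start, end))` yields the target amino acid sequence that the alignment module 420 would then align against the amino acid sequence of the reference gene.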
The gene mutation site detection device 400 in the embodiment of the present application can: obtain a detection gene sequence, and obtain a reference gene sequence and alignment parameters, where the alignment parameters are used for calculating scores of different types of base pairs in sequence alignment; perform the sequence alignment on the detection gene sequence and the reference gene sequence based on the alignment parameters to obtain a first sequence alignment result; extract a target detection gene segment from the detection gene sequence and translate the target detection gene segment into a target amino acid sequence; perform the sequence alignment on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the alignment parameters to obtain a second sequence alignment result; search for mutation sites in the first sequence alignment result and the second sequence alignment result and determine the variation type of the mutation sites; and display a gene variation detection result obtained based on the first sequence alignment result and an amino acid variation detection result obtained based on the second sequence alignment result, where both results include the position information of the mutation site, the mutated base of the mutation site, and the variation type. This simplifies user operation: the gene sequence is automatically translated into an amino acid sequence; gene sequence alignment, gene annotation, amino acid sequence alignment, and mutation site detection are performed together; and the specific position information, mutated base, and variation type of each mutation site are displayed, providing a more intuitive and detailed detection result.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 5, the electronic device 500 includes at least a processor 501, an input device 502, an output device 503, and a computer storage medium 504. The processor 501, the input device 502, the output device 503, and the computer storage medium 504 within the electronic device may be connected by a bus or other means.
A computer storage medium 504 may be stored in the memory of the electronic device, the computer storage medium 504 being used for storing a computer program comprising program instructions, and the processor 501 being used for executing the program instructions stored by the computer storage medium 504. The processor 501 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 501 described above in the embodiments of the present application may be configured to perform a series of processes, including various steps involved in the method shown in fig. 1, and the like.
An embodiment of the present application further provides a computer storage medium (Memory), which is a Memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium in the electronic device and, of course, an extended storage medium supported by the electronic device. Computer storage media provide storage space that stores an operating system for an electronic device. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 501. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 501 to perform the corresponding steps in the above embodiments; in a specific implementation, one or more instructions in the computer storage medium may be loaded by the processor 501 and perform any steps involved in the method shown in fig. 1, and are not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the module is only one logical division, and other divisions may be possible in actual implementation, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).

Claims (10)

1. A method for detecting a gene mutation site, comprising:
obtaining a detection gene sequence, and obtaining a reference gene sequence and alignment parameters, wherein the alignment parameters are used for calculating scores of different types of base pairs in sequence alignment;
performing the sequence alignment on the detection gene sequence and the reference gene sequence based on the alignment parameters to obtain a first sequence alignment result;
extracting a target detection gene segment from the detection gene sequence, and translating the target detection gene segment into a target amino acid sequence;
performing the sequence alignment on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the alignment parameters to obtain a second sequence alignment result;
searching for mutation sites in the first sequence alignment result and the second sequence alignment result, and determining the variation type of the mutation sites;
and displaying a gene variation detection result obtained based on the first sequence alignment result and an amino acid variation detection result obtained based on the second sequence alignment result, wherein the gene variation detection result and the amino acid variation detection result each comprise the position information of the mutation site, the mutated base of the mutation site, and the variation type.
2. The method of claim 1, wherein the performing the sequence alignment on the detection gene sequence and the reference gene sequence based on the alignment parameters to obtain a first sequence alignment result comprises:
aligning the detection gene sequence with the reference gene sequence based on a dynamic programming algorithm to obtain the first sequence alignment result;
and the performing the sequence alignment on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the alignment parameters to obtain a second sequence alignment result comprises:
aligning the target amino acid sequence with the amino acid sequence corresponding to the reference gene sequence based on the dynamic programming algorithm to obtain the second sequence alignment result.
3. The method of claim 2, wherein searching for the mutation site in the first sequence alignment result and the second sequence alignment result and determining the type of variation of the mutation site comprises:
searching for mismatched bases by traversing the first sequence alignment result and the second sequence alignment result, determining the position of each mismatched base as a mutation site, recording the position of the mutation site, and determining the variation type of the mutation site as point mutation;
searching for the start position of a deletion by traversing the first sequence alignment result and the second sequence alignment result, and, once the start position is found, continuing along the sequence until the end position of the deletion is found; determining the span from the start position to the end position as a mutation site, recording the position of the mutation site and the corresponding deleted bases, and determining the variation type of the mutation site as deletion mutation;
searching for the start position of an insertion by traversing the first sequence alignment result and the second sequence alignment result, and, once the start position is found, continuing along the sequence until the end position of the insertion is found; determining the span from the start position to the end position as a mutation site, recording the position of the mutation site and the corresponding inserted bases, and determining the variation type of the mutation site as insertion mutation.
4. The method of claim 1, wherein the extracting a target detection gene segment from the detection gene sequence and translating the target detection gene segment into a target amino acid sequence comprises:
clipping the aligned detection gene sequence according to the start position and the end position of the coding sequence of a preset reference gene sequence to obtain the target detection gene segment;
and translating the target detection gene segment into the target amino acid sequence according to the universal amino acid codon table.
5. The method of any one of claims 1-4, wherein after the obtaining the detection gene sequence, the method further comprises:
checking the input detection gene sequence, and outputting an input-error prompt if the input is empty or contains characters other than preset characters; and if the input detection gene sequence contains only the preset characters, removing spaces and line breaks from the detection gene sequence and converting the detection gene sequence to uppercase letters.
6. The method of claim 1, wherein the obtaining alignment parameters comprises:
acquiring input custom alignment parameters, or acquiring default alignment parameters;
the alignment parameters include a first parameter representing a score for a base perfect match, a second parameter representing a score for a base mismatch, a third parameter representing a score for a base alignment gap, and a fourth parameter representing a score for a base alignment gap extension, wherein the first and second parameters are positive numbers, the third and fourth parameters are negative numbers, and the fourth parameter is greater than or equal to the third parameter.
7. The method of claim 1, further comprising:
in response to a download instruction for a target detection result, downloading a text file of the target detection result, wherein the target detection result comprises the gene variation detection result and/or the amino acid variation detection result.
8. A gene mutation site detection device, comprising:
an acquisition module, configured to obtain a detection gene sequence, and obtain a reference gene sequence and alignment parameters, wherein the alignment parameters are used for calculating scores of different types of base pairs in sequence alignment;
an alignment module, configured to perform the sequence alignment on the detection gene sequence and the reference gene sequence based on the alignment parameters to obtain a first sequence alignment result;
an extraction module, configured to extract a target detection gene segment from the detection gene sequence and translate the target detection gene segment into a target amino acid sequence;
the alignment module being further configured to perform the sequence alignment on the target amino acid sequence and the amino acid sequence corresponding to the reference gene sequence based on the alignment parameters to obtain a second sequence alignment result;
a search module, configured to search for mutation sites in the first sequence alignment result and the second sequence alignment result and determine the variation type of the mutation sites;
a display module, configured to display a gene variation detection result obtained based on the first sequence alignment result and an amino acid variation detection result obtained based on the second sequence alignment result, wherein the gene variation detection result and the amino acid variation detection result each comprise the position information of the mutation site, the mutated base of the mutation site, and the variation type.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of the method as claimed in any one of claims 1 to 7.
CN202111394703.3A 2021-11-23 2021-11-23 Gene mutation site detection method, device, electronic equipment and storage medium Pending CN114121153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111394703.3A CN114121153A (en) 2021-11-23 2021-11-23 Gene mutation site detection method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111394703.3A CN114121153A (en) 2021-11-23 2021-11-23 Gene mutation site detection method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114121153A true CN114121153A (en) 2022-03-01

Family

ID=80440278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111394703.3A Pending CN114121153A (en) 2021-11-23 2021-11-23 Gene mutation site detection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114121153A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115148281A (en) * 2022-06-29 2022-10-04 广州源井生物科技有限公司 Automatic design method and system for gene editing point mutation scheme
CN116564405A (en) * 2023-04-19 2023-08-08 江苏先声医学诊断有限公司 Average-disorder-based genome sequencing mutation site filtering method
CN116564405B (en) * 2023-04-19 2023-12-15 江苏先声医学诊断有限公司 Average-disorder-based genome sequencing mutation site filtering method

Similar Documents

Publication Publication Date Title
CN114121153A (en) Gene mutation site detection method, device, electronic equipment and storage medium
US20150339384A1 (en) Recommendation system and method for search input
CN107944228B (en) Visualization method for gene sequencing variation site
CN107133165B (en) Browser compatibility detection method and device
CN106445476B (en) Code change information determination method and device and electronic equipment
CN106328145A (en) Voice correction method and voice correction device
JP2015225669A (en) Annotation display assistance device and annotation display assistance method
CN102955773B (en) For identifying the method and system of chemical name in Chinese document
CN111933214B (en) Method and computing device for detecting RNA level somatic gene variation
CN111292809B (en) Method, electronic device, and computer storage medium for detecting RNA level gene fusion
CN108733674B (en) A2L file merging method and device
CN109917982B (en) Voice input method, device, equipment and readable storage medium
CN105094562A (en) Information processing method and terminal
WO2021042542A1 (en) Table of contents storage method and apparatus, computer device and storage medium
CN112802495A (en) Robot voice test method and device, storage medium and terminal equipment
CN114220113A (en) Paper quality detection method, device and equipment
CN110570908B (en) Sequencing sequence polymorphic identification method and device, storage medium and electronic equipment
CN112667502A (en) Page testing method, device and medium
JP6091471B2 (en) Source code analysis apparatus, source code analysis method, and source code analysis program
CN117291163B (en) Data extraction method, device, equipment and storage medium
CN110968677B (en) Text addressing method and device, medium and electronic equipment
KR20190078846A (en) Abnormal sequence identification method based on intron and exon
CN113220587A (en) Web interface compatibility testing method and device
CN107562725B (en) Index extraction verification method and device
CN111095309A (en) Information processing device and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination