CN118280441A - Novel coronavirus sample evaluation method - Google Patents
- Publication number
- CN118280441A (application CN202410381953.0A)
- Authority
- CN
- China
- Prior art keywords
- novel coronavirus
- evaluated
- mutation
- sample
- coronavirus sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Abstract
The invention discloses a novel coronavirus sample evaluation method, medium and device. The method comprises: obtaining raw data of a novel coronavirus sample to be evaluated, and aligning the raw data to generate an intermediate file; parsing the intermediate file to obtain base sequence information, and comparing the base sequence information of the sample with a reference sequence to obtain mutation information; classifying the sample according to the mutation information to generate classification information; and performing epidemic analysis, function prediction and clinical association analysis on the sample, and generating evaluation data from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. Base mutations in the novel coronavirus sample can thus be evaluated automatically and comprehensively, so that the virus sample is assessed accurately, which supports virus tracing, virus typing and virus research.
Description
This application is a divisional of Chinese patent application No. CN202110590258.1, filed on May 28, 2021 and entitled "Automatic evaluation method for base mutations in novel coronavirus samples".
Technical Field
The invention relates to the technical field of virus sample data analysis, in particular to a novel coronavirus sample evaluation method, a computer readable storage medium and computer equipment.
Background
In the related art, when a novel coronavirus is detected, the virus sample data is processed only through a conventional, generic workflow that yields data a user can read directly. Processing the data in this way serves a single purpose and lacks in-depth interpretation of the data, which hinders deeper study of the virus.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the technology described above. To this end, a first object of the present invention is to provide an automatic evaluation method for base mutations in novel coronavirus samples, which evaluates the base mutations of a novel coronavirus sample automatically and comprehensively so that the virus sample is assessed accurately, thereby supporting virus tracing, virus typing and virus research.
A second object of the present invention is to propose a computer readable storage medium.
A third object of the invention is to propose a computer device.
To achieve the above object, an embodiment of the first aspect of the present invention provides an automatic evaluation method for base mutations in novel coronavirus samples, comprising the following steps: obtaining raw data of a novel coronavirus sample to be evaluated, and aligning the raw data against a reference sequence to generate an intermediate file; parsing the intermediate file to obtain the base sequence information of the sample, and comparing that base sequence information with the reference sequence to obtain mutation information; classifying the sample according to the mutation information to generate classification information; and performing epidemic analysis, function prediction and clinical association analysis on the sample, and generating evaluation data from the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
According to the method of the embodiment of the invention, first, raw data of the novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; then, the intermediate file is parsed to obtain the base sequence information of the sample, which is compared with the reference sequence to obtain mutation information; next, the sample is classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association analysis are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. Base mutations in the novel coronavirus sample are thus evaluated automatically and comprehensively, so that the virus sample is assessed accurately, which supports virus tracing, virus typing and virus research.
In addition, the automatic evaluation method for base mutations in novel coronavirus samples provided by the embodiment of the invention may have the following additional technical features:
Optionally, the original data is a fastq format file corresponding to the novel coronavirus sample to be evaluated, and the intermediate file is a bam format file.
Optionally, before the base sequence information of the novel coronavirus sample to be evaluated is compared with the reference sequence, sequencing errors in that base sequence information are removed according to sequence identity and entropy.
Optionally, performing epidemic analysis on the novel coronavirus sample to be evaluated comprises: querying a history database according to the base sequence information of the sample to obtain the outbreak time, outbreak country and outbreak region corresponding to the sample.
Optionally, performing function prediction on the novel coronavirus sample to be evaluated comprises: comparing the base sequence information of the sample with the reference sequence, and determining from the mutation information whether a mutation site affects protein coding and whether it changes amino acid properties.
Optionally, performing function prediction on the novel coronavirus sample to be evaluated comprises: querying a novel coronavirus protein database according to the mutation information, determining from the query result whether a mutation site is located in a core structural region of a protein, and assigning a conservation score to the mutation site.
Optionally, performing clinical association analysis on the novel coronavirus sample to be evaluated comprises: acquiring clinical data corresponding to the sample, preprocessing the clinical data to generate standardized data, and performing association analysis and enrichment analysis on the standardized data and the mutation information.
To achieve the above object, a second aspect of the present invention provides a computer-readable storage medium having stored thereon a novel coronavirus sample base mutation automatic assessment program which, when executed by a processor, implements the novel coronavirus sample base mutation automatic assessment method as described above.
According to the computer readable storage medium, the novel coronavirus sample base mutation automatic evaluation program is stored, so that the novel coronavirus sample base mutation automatic evaluation method is realized when the novel coronavirus sample base mutation automatic evaluation program is executed by the processor, and the novel coronavirus sample base mutation is automatically and comprehensively evaluated, so that the accurate judgment of a virus sample is realized, and further powerful help is provided for virus tracing, virus typing and virus research.
To achieve the above object, an embodiment of the third aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above automatic evaluation method for base mutations in novel coronavirus samples.
According to the computer equipment provided by the embodiment of the invention, the novel coronavirus sample base mutation automatic evaluation program is stored through the memory, so that the novel coronavirus sample base mutation automatic evaluation method is realized when the novel coronavirus sample base mutation automatic evaluation program is executed by the processor, the novel coronavirus sample base mutation is automatically and comprehensively evaluated, the virus sample is accurately judged, and powerful help is provided for virus tracing, virus typing and virus research.
Drawings
FIG. 1 is a schematic flow chart of an automatic evaluation method for base mutations in novel coronavirus samples according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the related art, when a novel coronavirus is detected, the virus sample data is processed only through a conventional, generic workflow that yields data a user can read directly. Processing the data in this way serves a single purpose and lacks in-depth interpretation of the data, which hinders deeper study of the virus. According to the method of the embodiment of the invention, first, raw data of the novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; then, the intermediate file is parsed to obtain the base sequence information of the sample, which is compared with the reference sequence to obtain mutation information; next, the sample is classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association analysis are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. Base mutations in the novel coronavirus sample are thus evaluated automatically and comprehensively, so that the virus sample is assessed accurately, which supports virus tracing, virus typing and virus research.
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of an automatic evaluation method for base mutations in novel coronavirus samples according to an embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps:
S101, obtaining original data of a novel coronavirus sample to be evaluated, and comparing the original data according to a reference sequence to generate an intermediate file.
In some embodiments, the raw data is a fastq format file corresponding to the new coronavirus sample to be evaluated (e.g., the raw data of second generation sequencing or third generation sequencing is a fastq format file), and the intermediate file is a bam format file.
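Step S101 can be sketched as follows. The text specifies only the input and output formats (fastq and bam), not the tools, so the use of minimap2 and samtools here is an assumption, and the function name `build_alignment_commands` is hypothetical. The sketch only builds the command lines; aligner presets are deliberately left out because they depend on the sequencing platform.

```python
from pathlib import Path


def build_alignment_commands(reference_fasta, fastq, out_bam):
    """Return command lines that turn a FASTQ file into a sorted, indexed BAM.

    Assumption: minimap2 and samtools are the tools in use; the source text
    names only the fastq input format and the bam intermediate format.
    """
    out_bam = Path(out_bam)
    sam = out_bam.with_suffix(".sam")  # temporary SAM file next to the BAM
    return [
        # Align reads to the reference and write SAM output to a file.
        ["minimap2", "-a", "-o", str(sam), str(reference_fasta), str(fastq)],
        # Sort the alignments by coordinate and write them as BAM.
        ["samtools", "sort", "-o", str(out_bam), str(sam)],
        # Index the BAM so later steps can access regions directly.
        ["samtools", "index", str(out_bam)],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are installed.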
S102, analyzing the intermediate file to obtain the base sequence information of the novel coronavirus sample to be evaluated, and comparing the base sequence information of the novel coronavirus sample to be evaluated with a reference sequence to obtain mutation information.
That is, the intermediate file is parsed to obtain base sequence information of the novel coronavirus sample to be evaluated; and then, comparing the base sequence information of the novel coronavirus sample to be evaluated with a reference sequence to find out the base different from the reference sequence in the novel coronavirus sample to be evaluated, and obtaining mutation information.
The mutation information may include, among others, mutation site information and mutation type information (e.g., single base mutation, base deletion mutation, and base insertion mutation).
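As an illustration of how the comparison in S102 can yield mutation sites and mutation types, the following minimal sketch walks a pairwise alignment of the reference and the sample. It assumes the two sequences have already been aligned to equal length with `-` marking gaps; the function name and the tuple layout are hypothetical and not taken from the source.

```python
def call_mutations(ref_aln, sample_aln):
    """List differences between two gapped, equal-length aligned sequences.

    Returns tuples (ref_position, type, ref_bases, sample_bases). Positions
    are 1-based on the ungapped reference; an insertion is reported at the
    reference base it follows.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0  # number of reference bases consumed so far
    i = 0
    while i < len(ref_aln):
        r, s = ref_aln[i], sample_aln[i]
        if r == "-":
            # Gap in the reference: the sample carries inserted bases.
            j = i
            while j < len(ref_aln) and ref_aln[j] == "-":
                j += 1
            mutations.append((ref_pos, "insertion", "", sample_aln[i:j]))
            i = j
        elif s == "-":
            # Gap in the sample: reference bases are deleted.
            j = i
            while j < len(ref_aln) and sample_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            mutations.append((ref_pos + 1, "deletion", ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r != s:
                # Single-base mutation.
                mutations.append((ref_pos, "substitution", r, s))
            i += 1
    return mutations
```

For example, `call_mutations("ACGT-ACGT", "ACCTTA--T")` reports one substitution, one insertion and one deletion.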
In some embodiments, to improve the accuracy of the evaluation result, sequencing error information in the base sequence information of the novel coronavirus sample to be evaluated is also removed according to sequence identity and entropy before the base sequence information of the novel coronavirus sample to be evaluated is aligned with the reference sequence.
It can be appreciated that sequencing errors may introduce positions at which the base sequence information of the novel coronavirus sample to be evaluated differs from the reference sequence, that is, potential errors; such errors are random and show no preference for particular bases. These random errors are therefore eliminated according to sequence identity and entropy, which improves the accuracy of the final evaluation result.
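One way to read "sequence identity and entropy" is as a per-position filter over the bases that the reads report at a site: random errors scatter across bases, which lowers agreement and raises Shannon entropy. The sketch below follows that reading. The thresholds `min_identity` and `max_entropy` and both function names are assumptions; the source gives no formula or cut-off.

```python
import math
from collections import Counter


def shannon_entropy(bases):
    """Shannon entropy, in bits, of the base calls observed at one position."""
    counts = Counter(bases)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def consensus_base(bases, min_identity=0.8, max_entropy=1.0):
    """Return the majority base if the reads agree well enough, else None.

    min_identity and max_entropy are illustrative values, not values taken
    from the source.
    """
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    identity = count / len(bases)  # fraction of reads supporting the majority
    if identity >= min_identity and shannon_entropy(bases) <= max_entropy:
        return base
    return None
```

A position covered by nine `A` reads and one `G` read keeps `A`, while a position split evenly across four bases is rejected.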
And S103, classifying the novel coronavirus sample to be evaluated according to the mutation information to generate classification information.
As an example, a classification database (e.g., one based on the GISAID, Nextstrain or Pangolin classification criteria) is queried according to the mutation information to determine the classification of the novel coronavirus sample to be evaluated; the international epidemic trend of the class to which the sample belongs can then be tallied from the classification result.
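The lookup in S103 can be pictured as matching the sample's mutations against sets of class-defining mutations. The sketch below is a simplification under that assumption; it does not reproduce how GISAID, Nextstrain or Pangolin actually assign classes, and the class names and mutation labels in the usage example are placeholders.

```python
def classify_sample(sample_mutations, class_definitions):
    """Pick the class whose defining mutations are best covered by the sample.

    class_definitions maps a class name to an iterable of defining mutations.
    Returns (class_name, fraction_matched), or (None, 0.0) if nothing matches.
    """
    sample = set(sample_mutations)
    best_name, best_score = None, 0.0
    for name, defining in class_definitions.items():
        defining = set(defining)
        if not defining:
            continue  # an empty definition cannot be matched
        score = len(sample & defining) / len(defining)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

With placeholder definitions `{"class_X": {"m1", "m2", "m3"}, "class_Y": {"m2", "m4"}}`, a sample carrying `m1`, `m2`, `m3` and `m9` is assigned to `class_X` with a matched fraction of 1.0.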
S104, carrying out epidemic analysis, function prediction and clinical association on the novel coronavirus sample to be evaluated, and generating evaluation data according to the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
That is, after the novel coronavirus sample to be evaluated has been classified, epidemic analysis, function prediction and clinical association analysis are also performed on it, so that the sample is evaluated from all of these angles and the evaluation result is reliable enough to serve as a reference; finally, evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
As an example, performing epidemic analysis on the novel coronavirus sample to be evaluated includes: querying a history database according to the base sequence information of the sample to obtain the outbreak time, outbreak country and outbreak region corresponding to the sample.
That is, in the method provided by the embodiment of the invention, a history database storing high-quality novel coronavirus sequence information is preset; this information comprises virus sequences together with details such as the outbreak time, outbreak country and outbreak region corresponding to each sequence. Thus, when the history database is queried with the base sequence information of the sample to be evaluated, the outbreak time, outbreak country, outbreak region and similar details corresponding to the sample can be obtained; furthermore, visualizing the history database together with map information facilitates analysis of how the virus is spreading.
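The history-database query can be sketched as a nearest-sequence lookup that returns the stored outbreak time, country and region. The record layout, the identity measure over equal-length sequences and the `min_identity` cut-off are all assumptions for illustration; the source does not describe the database schema or the matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalRecord:
    """One entry of a hypothetical history database."""
    sequence: str
    outbreak_time: str
    country: str
    region: str


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def query_history(sample_seq, records, min_identity=0.99):
    """Return the most similar record, or None if none is similar enough."""
    best = max(records, key=lambda r: identity(sample_seq, r.sequence), default=None)
    if best is None or identity(sample_seq, best.sequence) < min_identity:
        return None
    return best
```

The returned record carries the fields that a map-based visualization would need; real genomes would first have to be aligned to a common coordinate system before a position-wise identity is meaningful.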
As an example, performing function prediction on the novel coronavirus sample to be evaluated includes: comparing the base sequence information of the sample with the reference sequence, and determining from the mutation information whether a mutation site affects protein coding and whether it changes amino acid properties.
That is, the biological effect of the mutations in the novel coronavirus sample to be evaluated is also predicted. For example, the prediction determines the position of the mutation site in the novel coronavirus genome, its position in the novel coronavirus protein, and its position within the codon; whether the mutation changes the protein sequence; whether the biochemical properties of the encoded amino acid differ before and after the mutation; and whether the mutation site lies in a functional domain of the protein. From these results it is determined whether the mutation site affects protein coding, alters amino acid properties, or is located in the core structural region of the protein.
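The codon-level checks described above can be sketched for a single-base substitution inside a coding sequence. The sketch uses the standard genetic code; the grouping of amino acids into nonpolar, polar, acidic and basic classes is one common textbook grouping chosen here as an assumption, and the function name and returned keys are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One common grouping of amino acids by side-chain property (an assumption).
PROPERTY_CLASS = {
    aa: cls
    for cls, residues in {
        "nonpolar": "AVLIMFWPG",
        "polar": "STCYNQ",
        "acidic": "DE",
        "basic": "KRH",
        "stop": "*",
    }.items()
    for aa in residues
}


def annotate_substitution(cds, pos, alt):
    """Describe the effect of replacing cds[pos] (0-based) with base alt."""
    codon_index, codon_offset = divmod(pos, 3)
    ref_codon = cds[3 * codon_index: 3 * codon_index + 3]
    if len(ref_codon) != 3:
        raise ValueError("position does not fall inside a complete codon")
    alt_codon = ref_codon[:codon_offset] + alt + ref_codon[codon_offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {
        "protein_position": codon_index + 1,  # 1-based residue number
        "codon_position": codon_offset + 1,   # 1, 2 or 3 within the codon
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "changes_protein": ref_aa != alt_aa,
        "property_changed": PROPERTY_CLASS[ref_aa] != PROPERTY_CLASS[alt_aa],
    }
```

For the toy coding sequence `ATGGATGCT`, changing the base at index 4 to `G` turns codon `GAT` (D, acidic) into `GGT` (G, nonpolar), so both `changes_protein` and `property_changed` are true, whereas changing index 5 to `C` gives the synonymous codon `GAC`.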
In some embodiments, performing a functional prediction on a novel coronavirus sample to be evaluated comprises: and inquiring a new coronavirus protein database according to the mutation information, judging whether the mutation site is positioned in a protein core structural region according to the inquiry result, and carrying out conservation scoring on the mutation site.
As an example, the coronavirus sequences recorded in the UCSC database are used to estimate the conservation of each site in the novel coronavirus, that is, to assign a conservation score to each mutation site.
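The source does not say how the conservation score is computed. A minimal sketch, assuming the score for a site is simply the fraction of aligned coronavirus sequences that share the most common base at that column, could look like this; the function name is hypothetical and the input is assumed to be an existing multiple sequence alignment.

```python
from collections import Counter


def conservation_scores(aligned_sequences):
    """Per-column conservation of a multiple sequence alignment.

    Each score is the fraction of sequences sharing the most common symbol
    in that column, so 1.0 means the site is identical in every sequence.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_sequences):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores
```

The score of a mutation site is then read off at its alignment column; a mutation at a column scoring near 1.0 hits a site that rarely varies across the recorded sequences.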
In some embodiments, performing clinical association analysis on the novel coronavirus sample to be evaluated comprises: acquiring clinical data corresponding to the sample, preprocessing the clinical data to generate standardized data, and performing association analysis and enrichment analysis on the standardized data and the mutation information.
That is, provided that a researcher can supply clinical data corresponding to the novel coronavirus sample to be evaluated, the clinical data is acquired and preprocessed to generate corresponding standardized data; association analysis (such as correlation analysis and enrichment analysis) is then performed on the standardized data and the mutation information, in order to evaluate the clinical manifestations in the infected host that are associated with the sample's mutations and to comprehensively evaluate the clinical effect of the genetic variation.
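As one possible form of the enrichment analysis, a one-sided Fisher exact test can ask whether samples carrying a mutation show a clinical feature more often than expected by chance. The choice of this test is an assumption, since the source does not name a statistic, and the function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(mut_with, mut_without, wt_with, wt_without):
    """One-sided Fisher exact test on a 2x2 table of sample counts.

    mut_with / mut_without: mutated samples with / without the clinical feature.
    wt_with / wt_without: unmutated samples with / without the clinical feature.
    Returns the probability of seeing at least mut_with mutated samples with
    the feature if mutation status and the feature were independent.
    """
    total = mut_with + mut_without + wt_with + wt_without
    with_feature = mut_with + wt_with
    mutated = mut_with + mut_without
    denominator = comb(total, mutated)
    upper = min(with_feature, mutated)
    # Sum the hypergeometric tail from the observed count upwards.
    numerator = sum(
        comb(with_feature, k) * comb(total - with_feature, mutated - k)
        for k in range(mut_with, upper + 1)
    )
    return numerator / denominator
```

For instance, if all three mutated samples show the feature and none of the three unmutated samples do, `enrichment_p_value(3, 0, 0, 3)` is 1/20. Counts like these are illustrative only; real use would start from the standardized clinical data.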
In summary, according to the automatic evaluation method for base mutations in novel coronavirus samples of the embodiment of the invention, first, raw data of the novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; then, the intermediate file is parsed to obtain the base sequence information of the sample, which is compared with the reference sequence to obtain mutation information; next, the sample is classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association analysis are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. Base mutations in the novel coronavirus sample are thus evaluated automatically and comprehensively, so that the virus sample is assessed accurately, which supports virus tracing, virus typing and virus research.
In order to achieve the above-described embodiments, an embodiment of the present invention proposes a computer-readable storage medium having stored thereon a novel coronavirus sample base mutation automatic assessment program which, when executed by a processor, implements the novel coronavirus sample base mutation automatic assessment method as described above.
According to the computer readable storage medium, the novel coronavirus sample base mutation automatic evaluation program is stored, so that the novel coronavirus sample base mutation automatic evaluation method is realized when the novel coronavirus sample base mutation automatic evaluation program is executed by the processor, and the novel coronavirus sample base mutation is automatically and comprehensively evaluated, so that the accurate judgment of a virus sample is realized, and further powerful help is provided for virus tracing, virus typing and virus research.
In order to implement the above embodiment, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above automatic evaluation method for base mutations in novel coronavirus samples.
According to the computer equipment provided by the embodiment of the invention, the novel coronavirus sample base mutation automatic evaluation program is stored through the memory, so that the novel coronavirus sample base mutation automatic evaluation method is realized when the novel coronavirus sample base mutation automatic evaluation program is executed by the processor, the novel coronavirus sample base mutation is automatically and comprehensively evaluated, the virus sample is accurately judged, and powerful help is provided for virus tracing, virus typing and virus research.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact via an intervening medium. Moreover, a first feature being "above," "over," or "on" a second feature may mean that the first feature is directly above or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or may simply indicate that the first feature is at a lower level than the second feature.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms should not be understood as necessarily being directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (6)
1. An automatic base mutation assessment method for a novel coronavirus sample, characterized by comprising the following steps:
obtaining original data of a novel coronavirus sample to be evaluated, and aligning the original data against a reference sequence to generate an intermediate file;
parsing the intermediate file to obtain base sequence information of the novel coronavirus sample to be evaluated, and comparing the base sequence information of the novel coronavirus sample to be evaluated with the reference sequence to obtain mutation information;
classifying the novel coronavirus sample to be evaluated according to the mutation information to generate classification information;
performing epidemic analysis, function prediction, and clinical correlation analysis on the novel coronavirus sample to be evaluated, and generating evaluation data according to an epidemic analysis result, a function prediction result, a clinical correlation result, and the classification information;
wherein the clinical correlation analysis comprises obtaining clinical data corresponding to the novel coronavirus sample to be evaluated, preprocessing the clinical data to generate standardized data, and performing correlation analysis and enrichment analysis on the standardized data and the mutation information, so as to evaluate how mutations of the novel coronavirus sample to be evaluated affect the clinical characteristics of the infected host;
the function prediction comprises comparing the base sequence information of the novel coronavirus sample to be evaluated with the reference sequence, so as to determine, according to the mutation information, whether a mutation site affects protein coding and whether the mutation site changes amino acid properties; or
querying a novel coronavirus protein database according to the mutation information, determining from the query result whether the mutation site is located in a protein core structure region, and performing conservation scoring on the mutation site.
2. The automatic base mutation assessment method for a novel coronavirus sample according to claim 1, wherein the original data is a fastq format file corresponding to the novel coronavirus sample to be assessed, and the intermediate file is a bam format file.
3. The method for automatically evaluating the base mutation of a novel coronavirus sample according to claim 1, wherein, before the base sequence information of the novel coronavirus sample to be evaluated is compared with the reference sequence, sequencing error information in the base sequence information is removed according to sequence identity and entropy.
4. The method for automatically assessing the base mutation of a novel coronavirus sample of claim 1, wherein performing epidemic analysis on the novel coronavirus sample to be evaluated comprises:
querying a historical database according to the base sequence information of the novel coronavirus sample to be evaluated, so as to obtain the outbreak time, the outbreak country, and the outbreak region corresponding to the novel coronavirus sample to be evaluated.
5. A computer-readable storage medium, characterized in that a novel coronavirus sample base mutation automatic assessment program is stored thereon, which when executed by a processor, implements the novel coronavirus sample base mutation automatic assessment method according to any one of claims 1 to 4.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the novel coronavirus sample base mutation automatic assessment method of any one of claims 1-4.
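By way of illustration only, and not as part of the claims, the comparison step and the first function prediction branch of claim 1 might be sketched in Python as follows. The sketch assumes that the sample sequence has already been aligned to the reference with no insertions or deletions, that both sequences are uppercase A/C/G/T strings, and that a single coding region starts at `cds_start`. The function names `call_substitutions` and `predict_effect` and the four-way grouping of amino acid properties are hypothetical choices made for this example and do not appear in the specification.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Coarse property grouping (an assumption made for this sketch only).
PROPERTY_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STNQYC",
    "positive": "KRH",
    "negative": "DE",
    "stop": "*",
}
AA_PROPERTY = {
    aa: name for name, members in PROPERTY_CLASSES.items() for aa in members
}


def call_substitutions(reference, sample):
    """Return (0-based position, reference base, sample base) per mismatch.

    Positions where the sample base is not A/C/G/T (for example N) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt and alt in BASES
    ]


def predict_effect(reference, position, alt_base, cds_start=0):
    """Classify one substitution within a coding region starting at cds_start."""
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies before the coding region")
    codon_start = cds_start + (offset // 3) * 3
    ref_codon = reference[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position lies in an incomplete codon")
    index = offset % 3
    alt_codon = ref_codon[:index] + alt_base + ref_codon[index + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"   # protein coding is unaffected
    elif alt_aa == "*":
        effect = "nonsense"     # premature stop codon
    else:
        effect = "missense"     # amino acid replaced
    return {
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "effect": effect,
        "property_changed": AA_PROPERTY[ref_aa] != AA_PROPERTY[alt_aa],
    }


if __name__ == "__main__":
    reference = "ATGGATTTT"  # toy reference encoding M, D, F
    sample = "ATGGGTTTC"     # toy sample with two substitutions
    for pos, ref, alt in call_substitutions(reference, sample):
        print(pos, ref, alt, predict_effect(reference, pos, alt))
```

In practice the mutation information would be derived from the intermediate bam file rather than from two plain strings, and insertions, deletions, and multiple coding regions would need separate handling; the sketch only shows how a substitution can be mapped to its codon and judged for its effect on protein coding and amino acid properties.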
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410381953.0A CN118280441A (en) | 2021-05-28 | 2021-05-28 | Novel coronavirus sample evaluation method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110590258.1A CN113936739A (en) | 2021-05-28 | 2021-05-28 | Novel automatic assessment method for base mutation of coronavirus sample |
CN202410381953.0A CN118280441A (en) | 2021-05-28 | 2021-05-28 | Novel coronavirus sample evaluation method |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110590258.1A Division CN113936739A (en) | 2021-05-28 | 2021-05-28 | Novel automatic assessment method for base mutation of coronavirus sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118280441A (en) | 2024-07-02 |
Family
ID=79274248
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410381953.0A Pending CN118280441A (en) | 2021-05-28 | 2021-05-28 | Novel coronavirus sample evaluation method |
CN202110590258.1A Pending CN113936739A (en) | 2021-05-28 | 2021-05-28 | Novel automatic assessment method for base mutation of coronavirus sample |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110590258.1A Pending CN113936739A (en) | 2021-05-28 | 2021-05-28 | Novel automatic assessment method for base mutation of coronavirus sample |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN118280441A (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1446926A (en) * | 2002-03-27 | 2003-10-08 | 江斌 | Method for positioning genes related to body odor |
CN104372010A (en) * | 2014-11-13 | 2015-02-25 | 深圳华大基因科技有限公司 | New mutant pathogenic gene of febrile convulsion as well as coding protein and application thereof |
CN107577921A (en) * | 2017-08-25 | 2018-01-12 | 云壹生物技术(大连)有限公司 | A kind of tumor target gene sequencing data analytic method |
CN113627458A (en) * | 2017-10-16 | 2021-11-09 | 因美纳有限公司 | Variant pathogenicity classifier based on recurrent neural network |
CN109961825B (en) * | 2019-03-29 | 2022-12-02 | 郑州大学 | Protein structure local three-dimensional modeling method based on gene SNP site mutation |
US20210118559A1 (en) * | 2019-10-22 | 2021-04-22 | Tempus Labs, Inc. | Artificial intelligence assisted precision medicine enhancements to standardized laboratory diagnostic testing |
CN111292802B (en) * | 2020-02-03 | 2021-03-16 | 至本医疗科技(上海)有限公司 | Method, electronic device, and computer storage medium for detecting sudden change |
CN111445955B (en) * | 2020-04-10 | 2021-09-10 | 广州微远医疗器械有限公司 | Novel coronavirus variation analysis method and application |
CN111321252B (en) * | 2020-04-17 | 2021-06-15 | 山东仕达思生物产业有限公司 | Novel coronavirus nucleic acid detection primer pair with mutation resistance, kit and application thereof |
CN112599192A (en) * | 2020-12-31 | 2021-04-02 | 杭州柏熠科技有限公司 | New coronavirus whole genome analysis system based on nanopore sequencing |
Also Published As
Publication number | Publication date |
---|---|
CN113936739A (en) | 2022-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication |