CN118280441A - Novel coronavirus sample evaluation method - Google Patents

Novel coronavirus sample evaluation method

Info

Publication number
CN118280441A
Authority
CN
China
Prior art keywords
novel coronavirus
evaluated
mutation
sample
coronavirus sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410381953.0A
Other languages
Chinese (zh)
Inventor
陈路
唐超
林静雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202410381953.0A
Publication of CN118280441A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a novel coronavirus sample evaluation method, medium and device. The method comprises the following steps: obtaining raw data of a novel coronavirus sample to be evaluated, and aligning the raw data to generate an intermediate file; parsing the intermediate file to obtain base sequence information, and comparing the base sequence information of the sample with a reference sequence to obtain mutation information; classifying the sample according to the mutation information to generate classification information; and performing epidemic analysis, function prediction and clinical association on the sample, and generating evaluation data from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. Base mutations in the novel coronavirus sample can thereby be evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.

Description

Novel coronavirus sample evaluation method
This application is a divisional application of Chinese patent application No. CN202110590258.1, filed on May 28, 2021 and entitled "Automatic evaluation method for base mutations in novel coronavirus samples".
Technical Field
The invention relates to the technical field of virus sample data analysis, in particular to a novel coronavirus sample evaluation method, a computer readable storage medium and computer equipment.
Background
In the related art, when a novel coronavirus is detected, the virus sample data is processed only through a conventional general-purpose workflow to obtain data that a user can interpret directly. Processing virus data in this way serves a single function, lacks in-depth interpretation of the data, and does not facilitate in-depth research on the virus.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the above-described technology. To this end, a first object of the present invention is to provide an automatic evaluation method for base mutations in novel coronavirus samples, which can automatically and comprehensively evaluate the base mutations of a novel coronavirus sample so as to judge the virus sample accurately, thereby providing strong support for virus tracing, virus typing and virus research.
A second object of the present invention is to propose a computer readable storage medium.
A third object of the invention is to propose a computer device.
To achieve the above objects, an embodiment of the first aspect of the present invention provides an automatic evaluation method for base mutations in novel coronavirus samples, comprising the following steps: obtaining raw data of a novel coronavirus sample to be evaluated, and aligning the raw data against a reference sequence to generate an intermediate file; parsing the intermediate file to obtain base sequence information of the sample, and comparing that base sequence information with the reference sequence to obtain mutation information; classifying the sample according to the mutation information to generate classification information; and performing epidemic analysis, function prediction and clinical association on the sample, and generating evaluation data from the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
According to the automatic evaluation method for base mutations in novel coronavirus samples of the embodiment of the present invention, first, raw data of a novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; next, the intermediate file is parsed to obtain base sequence information of the sample, and that base sequence information is compared with the reference sequence to obtain mutation information; the sample is then classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. In this way, base mutations in the novel coronavirus sample can be evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
In addition, the automatic evaluation method for base mutations in novel coronavirus samples according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the raw data is a FASTQ file corresponding to the novel coronavirus sample to be evaluated, and the intermediate file is a BAM file.
Optionally, before the base sequence information of the novel coronavirus sample to be evaluated is compared with the reference sequence, sequencing errors in that base sequence information are removed according to sequence consistency and entropy.
Optionally, performing epidemic analysis on the novel coronavirus sample to be evaluated comprises: querying a historical database according to the base sequence information of the sample to obtain the outbreak time, outbreak country and outbreak region corresponding to the sample.
Optionally, performing function prediction on the novel coronavirus sample to be evaluated comprises: comparing the base sequence information of the sample with the reference sequence, so as to determine from the mutation information whether a mutation site affects protein coding and whether it changes amino acid properties.
Optionally, performing function prediction on the novel coronavirus sample to be evaluated comprises: querying a novel coronavirus protein database according to the mutation information, determining from the query result whether a mutation site lies in a core structural region of a protein, and assigning a conservation score to the mutation site.
Optionally, performing clinical association on the novel coronavirus sample to be evaluated comprises: obtaining clinical data corresponding to the sample, preprocessing the clinical data to generate standardized data, and performing association analysis and enrichment analysis on the standardized data and the mutation information.
To achieve the above objects, an embodiment of the second aspect of the present invention provides a computer-readable storage medium storing an automatic evaluation program for base mutations in novel coronavirus samples; when executed by a processor, the program implements the automatic evaluation method for base mutations in novel coronavirus samples described above.
The computer-readable storage medium according to the embodiment of the present invention stores the automatic evaluation program, so that the processor implements the method described above when executing the program. Base mutations in the novel coronavirus sample are thereby evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
To achieve the above objects, an embodiment of the third aspect of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the automatic evaluation method for base mutations in novel coronavirus samples described above when executing the program.
In the computer device according to the embodiment of the present invention, the memory stores the automatic evaluation program, so that the processor implements the method described above when executing the program. Base mutations in the novel coronavirus sample are thereby evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
Drawings
FIG. 1 is a schematic flow chart of an automatic evaluation method for base mutations in novel coronavirus samples according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the related art, when a novel coronavirus is detected, the virus sample data is processed only through a conventional general-purpose workflow to obtain data that a user can interpret directly. Processing virus data in this way serves a single function, lacks in-depth interpretation of the data, and does not facilitate in-depth research on the virus. According to the automatic evaluation method for base mutations in novel coronavirus samples of the embodiment of the present invention, first, raw data of a novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; next, the intermediate file is parsed to obtain base sequence information of the sample, and that base sequence information is compared with the reference sequence to obtain mutation information; the sample is then classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. In this way, base mutations in the novel coronavirus sample can be evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic flow chart of an automatic evaluation method for base mutations in novel coronavirus samples according to an embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps:
S101, obtaining raw data of a novel coronavirus sample to be evaluated, and aligning the raw data against a reference sequence to generate an intermediate file.
In some embodiments, the raw data is a FASTQ file corresponding to the novel coronavirus sample to be evaluated (for example, the raw data produced by second-generation or third-generation sequencing is a FASTQ file), and the intermediate file is a BAM file.
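The text does not name an aligner, so the alignment step itself is not reproduced here. As a minimal sketch of the input side only, the following Python reads four-line FASTQ records and reports a mean base quality. The `FastqRecord` and `parse_fastq` names and the Phred+33 quality encoding are assumptions made for illustration, not part of the described method.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FastqRecord:
    read_id: str
    sequence: str
    quality: str

    def mean_phred(self) -> float:
        """Mean base quality, assuming Phred+33 encoding (an assumption)."""
        if not self.quality:
            return 0.0
        return sum(ord(c) - 33 for c in self.quality) / len(self.quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ block (header, bases, '+', qualities)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None:
            return  # end of input
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, "")
        plus = next(stripped, "")
        quality = next(stripped, "")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        read_id = (header[1:].split() or [""])[0]
        yield FastqRecord(read_id, sequence, quality)
```

In practice the records would be handed to an external aligner that writes the BAM intermediate file; which aligner to use is left open by the description.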
S102, parsing the intermediate file to obtain base sequence information of the novel coronavirus sample to be evaluated, and comparing that base sequence information with the reference sequence to obtain mutation information.
That is, the intermediate file is parsed to obtain the base sequence information of the novel coronavirus sample to be evaluated; this base sequence information is then compared with the reference sequence to find the bases in the sample that differ from the reference sequence, which yields the mutation information.
The mutation information may include, among other things, mutation site information and mutation type information (e.g., single-base mutation, base deletion mutation, and base insertion mutation).
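The mutation information described above can be illustrated with a small sketch. It assumes the sample and the reference have already been aligned into two equal-length strings in which `-` marks a gap; the `Mutation` type, the `call_mutations` function and the choice to skip `N` bases as no-calls are hypothetical and only show one way to record site and type.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    position: int  # 1-based reference coordinate
    kind: str      # "substitution", "deletion" or "insertion"
    ref: str       # reference bases ("" for an insertion)
    alt: str       # sample bases ("" for a deletion)


def call_mutations(aligned_ref: str, aligned_sample: str) -> List[Mutation]:
    """List the differences between two gapped, equal-length sequences."""
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned sequences must have the same length")
    mutations: List[Mutation] = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, s in zip(aligned_ref.upper(), aligned_sample.upper()):
        if r != "-":
            ref_pos += 1
        if r == s or s == "N":
            continue  # identical column, or a no-call in the sample
        prev = mutations[-1] if mutations else None
        if r == "-":
            # Insertion, reported at the reference base that precedes it.
            if prev and prev.kind == "insertion" and prev.position == ref_pos:
                mutations[-1] = prev._replace(alt=prev.alt + s)
            else:
                mutations.append(Mutation(ref_pos, "insertion", "", s))
        elif s == "-":
            # Deletion; extend the previous one when the bases are adjacent.
            if (prev and prev.kind == "deletion"
                    and prev.position + len(prev.ref) == ref_pos):
                mutations[-1] = prev._replace(ref=prev.ref + r)
            else:
                mutations.append(Mutation(ref_pos, "deletion", r, ""))
        else:
            mutations.append(Mutation(ref_pos, "substitution", r, s))
    return mutations
```

Adjacent gap columns are merged into a single event so that a multi-base deletion or insertion is reported once rather than base by base.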
In some embodiments, to improve the accuracy of the evaluation result, sequencing errors in the base sequence information of the novel coronavirus sample to be evaluated are also removed according to sequence consistency and entropy before that base sequence information is compared with the reference sequence.
It can be appreciated that sequencing errors may cause the base sequence information of the sample to differ from the reference sequence at some positions; these differences are potential errors rather than true mutations. Such errors are random and show no positional preference, so they can be eliminated using sequence consistency and entropy, which improves the accuracy of the final evaluation result.
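The description does not give formulas or thresholds for this filter. One possible reading, sketched below, treats each position as a column of bases observed across the reads covering it, measures consistency as the fraction of reads agreeing with the majority base, and measures disorder with Shannon entropy. The function names and the default thresholds (`min_consistency=0.8`, `max_entropy=1.0`) are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def shannon_entropy(bases: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the base distribution in one column."""
    counts = Counter(bases)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def consensus_base(column: Sequence[str],
                   min_consistency: float = 0.8,
                   max_entropy: float = 1.0) -> str:
    """Return the majority base, or 'N' when the column is too noisy to trust."""
    if not column:
        return "N"  # no read covers this position
    base, count = Counter(column).most_common(1)[0]
    consistency = count / len(column)
    if consistency >= min_consistency and shannon_entropy(column) <= max_entropy:
        return base
    return "N"


def denoise(columns: Iterable[Sequence[str]]) -> str:
    """Build a consensus sequence from per-position read columns."""
    return "".join(consensus_base(column) for column in columns)
```

A random error typically appears in only a few reads at a position, so it is outvoted by the majority base; a position where reads genuinely disagree is masked instead of being reported as a mutation.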
S103, classifying the novel coronavirus sample to be evaluated according to the mutation information to generate classification information.
As an example, a classification database (e.g., the GISAID, Nextstrain or Pangolin classification criteria) is queried according to the mutation information to determine the classification of the novel coronavirus sample to be evaluated; the international epidemic trend of the class to which the sample belongs can then be compiled from the classification result.
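How the classification database is queried is not specified. As a rough sketch, assume each class is defined by a set of characteristic mutations and a sample is assigned to the most specific class whose defining mutations are all present. The `classify_sample` function, the mutation labels and the class names in the usage note are hypothetical placeholders, not entries from GISAID, Nextstrain or Pangolin.

```python
from typing import AbstractSet, Iterable, Mapping, Optional


def classify_sample(sample_mutations: Iterable[str],
                    definitions: Mapping[str, AbstractSet[str]]) -> Optional[str]:
    """Return the class whose defining mutations are all observed.

    When several classes match, the one with the most defining mutations
    (the most specific one) wins; ties are broken by class name.
    """
    observed = set(sample_mutations)
    matches = [
        (len(required), name)
        for name, required in definitions.items()
        if required and required <= observed
    ]
    if not matches:
        return None  # no known class fits this sample
    return max(matches)[1]
```

For example, with hypothetical definitions `{"class_A": {"C100T"}, "class_A.1": {"C100T", "G200A"}}`, a sample carrying both `C100T` and `G200A` would be assigned to `class_A.1`.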
S104, performing epidemic analysis, function prediction and clinical association on the novel coronavirus sample to be evaluated, and generating evaluation data from the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
That is, after the novel coronavirus sample to be evaluated has been classified, epidemic analysis, function prediction and clinical association are also performed on it, so that the sample is evaluated from all angles and the evaluation result is a reliable reference; finally, evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information.
As an example, performing epidemic analysis on the novel coronavirus sample to be evaluated includes: querying a historical database according to the base sequence information of the sample to obtain the outbreak time, outbreak country and outbreak region corresponding to the sample.
That is, in the automatic evaluation method for base mutations in novel coronavirus samples provided by the embodiment of the present invention, a historical database storing high-quality novel coronavirus sequence information is set up in advance; this information comprises each virus sequence together with details such as the outbreak time, outbreak country and outbreak region corresponding to that sequence. Thus, when the historical database is queried according to the base sequence information of the sample to be evaluated, details such as the outbreak time, outbreak country and outbreak region corresponding to the sample can be obtained. Furthermore, visualizing the historical database together with map information facilitates analysis of the epidemic situation of the virus.
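The query mechanism is not detailed in the text. The sketch below assumes an in-memory list of records and returns the stored sequence with the fewest mismatches to the query, together with its outbreak metadata. The `HistoricalRecord` fields, the `closest_record` function and the restriction to equal-length sequences are simplifying assumptions; a real database would more likely be indexed by mutation profile.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HistoricalRecord:
    sequence: str
    outbreak_time: str
    country: str
    region: str


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_record(query: str,
                   records: Iterable[HistoricalRecord],
                   max_distance: Optional[int] = None) -> Optional[HistoricalRecord]:
    """Return the historical record most similar to the query sequence."""
    best: Optional[HistoricalRecord] = None
    best_distance: Optional[int] = None
    for record in records:
        if len(record.sequence) != len(query):
            continue  # this sketch only compares equal-length sequences
        distance = mismatches(query, record.sequence)
        if best_distance is None or distance < best_distance:
            best, best_distance = record, distance
    if best is None:
        return None
    if max_distance is not None and best_distance > max_distance:
        return None  # nothing in the database is close enough
    return best
```

The returned record carries the outbreak time, country and region, which could then be plotted on a map for the visual analysis mentioned above.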
As an example, performing function prediction on the novel coronavirus sample to be evaluated includes: comparing the base sequence information of the sample with the reference sequence, so as to determine from the mutation information whether a mutation site affects protein coding and whether it changes amino acid properties.
That is, the biological function of the novel coronavirus sample to be evaluated is also predicted. For example, the following are determined: the position of the mutation site in the novel coronavirus genome, its position in the novel coronavirus protein, its position within the codon, whether the mutation changes the protein sequence, whether the biochemical properties of the encoded amino acid differ before and after the mutation, and whether the mutation site lies in a functional domain of the protein. From these it is judged whether the mutation site affects protein coding, changes amino acid properties, or lies in a core domain of the protein.
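The codon-level checks listed above can be sketched for a single-base substitution. The example assumes the mutation has already been mapped to a 1-based position within a coding sequence written in the DNA alphabet, uses the standard genetic code, and groups amino acids into one common textbook set of property classes; the `annotate_substitution` function and that grouping are illustrative assumptions.

```python
from itertools import product
from typing import Dict

_BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

# One common grouping of amino acids by side-chain property (an assumption).
_PROPERTY_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
    "stop": "*",
}
PROPERTY_OF = {aa: name for name, members in _PROPERTY_CLASSES.items() for aa in members}


def annotate_substitution(cds: str, position: int, alt: str) -> dict:
    """Describe the coding effect of replacing one base of a coding sequence."""
    cds, alt = cds.upper(), alt.upper()
    index = position - 1
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if not 0 <= index < usable:
        raise ValueError("position lies outside the complete codons of the CDS")
    if len(alt) != 1 or alt not in _BASES:
        raise ValueError("alt must be one of A, C, G, T")
    offset = index % 3
    start = index - offset
    ref_codon = cds[start:start + 3]
    if set(ref_codon) - set(_BASES):
        raise ValueError("reference codon contains an ambiguous base")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return {
        "codon_number": start // 3 + 1,   # 1-based residue index in the protein
        "codon_position": offset + 1,     # 1, 2 or 3 within the codon
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "effect": effect,
        "property_changed": PROPERTY_OF[ref_aa] != PROPERTY_OF[alt_aa],
    }
```

Whether the affected residue lies in a functional or core domain would require the protein annotation database mentioned in the text and is outside this sketch.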
In some embodiments, performing function prediction on the novel coronavirus sample to be evaluated comprises: querying a novel coronavirus protein database according to the mutation information, determining from the query result whether the mutation site lies in a core structural region of a protein, and assigning a conservation score to the mutation site.
As an example, the coronavirus sequences recorded in the UCSC database are used to estimate the conservation of each site in the novel coronavirus, that is, to score the conservation of each mutation site.
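The scoring formula is not given in the text. One simple possibility, shown here purely as an assumption, is to take a multiple alignment of related coronavirus sequences and score each column as one minus its normalized Shannon entropy, so that a fully conserved site scores 1 and a site where all four bases are equally frequent scores 0. The `conservation_scores` function is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def conservation_scores(alignment: Sequence[str]) -> List[float]:
    """Per-column conservation in [0, 1] for an alignment of equal-length rows."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores: List[float] = []
    for column in zip(*alignment):
        bases = [b.upper() for b in column if b.upper() in "ACGT"]
        if not bases:
            scores.append(0.0)  # only gaps or ambiguous bases: no evidence
            continue
        total = len(bases)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in Counter(bases).values()
        )
        # log2(4) = 2 bits is the maximum entropy for four nucleotides.
        scores.append(1.0 - entropy / 2.0)
    return scores
```

A mutation at a site with a score near 1 changes a position that is otherwise invariant across the aligned sequences, which is the kind of signal a conservation score is meant to capture.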
In some embodiments, performing clinical association on the novel coronavirus sample to be evaluated comprises: obtaining clinical data corresponding to the sample, preprocessing the clinical data to generate standardized data, and performing association analysis and enrichment analysis on the standardized data and the mutation information.
That is, provided that a researcher can supply clinical data corresponding to the novel coronavirus sample to be evaluated, the clinical data is obtained and preprocessed to generate corresponding standardized data. The standardized data and the mutation information can then be analyzed together (e.g., by correlation analysis, enrichment analysis and the like) to evaluate the clinical manifestations that the mutations of the sample produce in the infected host, and to comprehensively evaluate the clinical function of the genetic variation.
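The statistical test used for the enrichment analysis is not specified. A common choice for a binary mutation status and a binary clinical outcome is a one-sided Fisher exact test on a 2x2 table, sketched below in plain Python; the `enrichment_p_value` function and the outcome labels in its docstring are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test for enrichment in a 2x2 table.

    a: samples with the mutation and with the outcome (e.g. severe disease)
    b: samples with the mutation and without the outcome
    c: samples without the mutation and with the outcome
    d: samples without the mutation and without the outcome

    Returns the probability of seeing at least `a` co-occurrences if the
    mutation and the outcome were independent (hypergeometric tail).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    with_mutation = a + b
    with_outcome = a + c
    total = a + b + c + d
    denominator = comb(total, with_mutation)
    upper = min(with_mutation, with_outcome)
    numerator = sum(
        comb(with_outcome, k) * comb(total - with_outcome, with_mutation - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator
```

A small p-value suggests that the mutation co-occurs with the outcome more often than chance would explain; with many mutations tested at once, a multiple-testing correction would normally be applied on top of this.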
In summary, according to the automatic evaluation method for base mutations in novel coronavirus samples of the embodiment of the present invention, first, raw data of a novel coronavirus sample to be evaluated is obtained and aligned against a reference sequence to generate an intermediate file; next, the intermediate file is parsed to obtain base sequence information of the sample, and that base sequence information is compared with the reference sequence to obtain mutation information; the sample is then classified according to the mutation information to generate classification information; finally, epidemic analysis, function prediction and clinical association are performed on the sample, and evaluation data is generated from the epidemic analysis result, the function prediction result, the clinical association result and the classification information. In this way, base mutations in the novel coronavirus sample can be evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
To implement the above embodiments, an embodiment of the present invention provides a computer-readable storage medium storing an automatic evaluation program for base mutations in novel coronavirus samples; when executed by a processor, the program implements the automatic evaluation method for base mutations in novel coronavirus samples described above.
The computer-readable storage medium according to the embodiment of the present invention stores the automatic evaluation program, so that the processor implements the method described above when executing the program. Base mutations in the novel coronavirus sample are thereby evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
To implement the above embodiments, an embodiment of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the automatic evaluation method for base mutations in novel coronavirus samples described above when executing the program.
In the computer device according to the embodiment of the present invention, the memory stores the automatic evaluation program, so that the processor implements the method described above when executing the program. Base mutations in the novel coronavirus sample are thereby evaluated automatically and comprehensively so that the virus sample is judged accurately, which provides strong support for virus tracing, virus typing and virus research.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (6)

1. An automatic base mutation assessment method for a novel coronavirus sample, characterized by comprising the following steps:
obtaining original data of a novel coronavirus sample to be evaluated, and aligning the original data to a reference sequence to generate an intermediate file;
parsing the intermediate file to obtain base sequence information of the novel coronavirus sample to be evaluated, and comparing the base sequence information of the novel coronavirus sample to be evaluated with the reference sequence to obtain mutation information;
classifying the novel coronavirus sample to be evaluated according to the mutation information to generate classification information;
performing epidemic analysis, functional prediction and clinical correlation analysis on the novel coronavirus sample to be evaluated, and generating evaluation data according to the epidemic analysis result, the functional prediction result, the clinical correlation result and the classification information;
wherein the clinical correlation analysis comprises obtaining clinical data corresponding to the novel coronavirus sample to be evaluated, preprocessing the clinical data to generate standardized data, and performing correlation analysis and enrichment analysis on the standardized data and the mutation information, so as to evaluate how mutations of the novel coronavirus sample to be evaluated affect the clinical presentation of the infected host;
and the functional prediction comprises comparing the base sequence information of the novel coronavirus sample to be evaluated with the reference sequence, so as to determine, according to the mutation information, whether a mutation site affects protein coding and whether the mutation site changes amino acid properties; or
querying a novel coronavirus protein database according to the mutation information, determining from the query result whether the mutation site is located in a core structural region of a protein, and assigning a conservation score to the mutation site.
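The first functional-prediction branch of claim 1 (whether a mutation site affects protein coding, and whether it changes amino acid properties) can be illustrated with a minimal sketch. It assumes the reference and sample coding sequences are already aligned, in frame and of equal length. The function name `classify_codon_changes` and the coarse property grouping are hypothetical choices made for illustration and are not taken from the patent.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coarse physico-chemical grouping of residues (an illustrative assumption).
PROPERTY = {}
for group, residues in {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "basic": "KRH",
    "acidic": "DE",
    "stop": "*",
}.items():
    for residue in residues:
        PROPERTY[residue] = group


def classify_codon_changes(ref_cds, sample_cds):
    """Compare two in-frame coding sequences codon by codon.

    Returns one record per differing codon, stating whether the encoded
    amino acid changes and whether its property group changes.
    """
    if len(ref_cds) != len(sample_cds) or len(ref_cds) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[i:i + 3].upper()
        alt_codon = sample_cds[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon)
        alt_aa = CODON_TABLE.get(alt_codon)
        if ref_aa is None or alt_aa is None:
            # Codons with ambiguous bases (e.g. N) are skipped in this sketch.
            continue
        changes.append({
            "codon_index": i // 3 + 1,  # 1-based codon position
            "ref_codon": ref_codon,
            "alt_codon": alt_codon,
            "ref_aa": ref_aa,
            "alt_aa": alt_aa,
            "affects_coding": ref_aa != alt_aa,
            "changes_property": PROPERTY[ref_aa] != PROPERTY[alt_aa],
        })
    return changes
```

For example, `classify_codon_changes("ATGGATGGT", "ATGGGTGGC")` reports two differing codons: GAT to GGT (aspartate to glycine, a coding change that also changes the property group) and GGT to GGC (glycine in both cases, so a synonymous change).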
2. The automatic base mutation assessment method for a novel coronavirus sample according to claim 1, wherein the original data is a fastq format file corresponding to the novel coronavirus sample to be assessed, and the intermediate file is a bam format file.
3. The automatic base mutation assessment method for a novel coronavirus sample according to claim 1, wherein, before the base sequence information of the novel coronavirus sample to be evaluated is compared with the reference sequence, sequencing error information in the base sequence information is removed according to sequence identity and entropy.
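Claim 3 does not state how identity and entropy are computed or which cut-offs apply, so the following is only a sketch under stated assumptions. Identity is taken to be the fraction of matching positions between an aligned sequence and the reference, and entropy is taken to be the Shannon entropy of the base composition. The function names and the default thresholds `min_identity=0.9` and `min_entropy=1.5` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy, in bits, of the base composition of seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def identity(seq, ref):
    """Fraction of aligned positions at which seq matches ref."""
    if not seq or len(seq) != len(ref):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq.upper(), ref.upper()))
    return matches / len(ref)


def passes_filter(seq, ref, min_identity=0.9, min_entropy=1.5):
    """Keep a sequence only if it is close to the reference and not low-complexity.

    Both thresholds are illustrative assumptions, not values from the patent.
    """
    return identity(seq, ref) >= min_identity and shannon_entropy(seq) >= min_entropy
```

Under these assumptions, a homopolymer run such as `"AAAAAAAAAA"` has zero entropy and is rejected even when it matches the reference exactly, while a sequence with balanced base composition and high identity is kept.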
4. The automatic base mutation assessment method for a novel coronavirus sample according to claim 1, wherein performing the epidemic analysis on the novel coronavirus sample to be evaluated comprises:
querying a historical database according to the base sequence information of the novel coronavirus sample to be evaluated, so as to obtain the outbreak time, the outbreak country and the outbreak region corresponding to the novel coronavirus sample to be evaluated.
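The historical-database query of claim 4 could look roughly like the sketch below. The patent does not describe the database schema or the matching rule, so everything here is an assumption: the in-memory SQLite table `history`, its column names, the exact-match lookup by sequence digest, and the placeholder row are all hypothetical. A practical system would more likely match by similarity or by mutation signature.

```python
import hashlib
import sqlite3


def sequence_key(seq):
    """Digest of the upper-cased sequence, used as an exact-match lookup key."""
    return hashlib.sha256(seq.upper().encode("ascii")).hexdigest()


def build_demo_db():
    """Create an in-memory database holding one placeholder record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history ("
        "seq_key TEXT PRIMARY KEY, outbreak_time TEXT, country TEXT, region TEXT)"
    )
    # Placeholder values only; they do not describe any real outbreak.
    conn.execute(
        "INSERT INTO history VALUES (?, ?, ?, ?)",
        (sequence_key("ACGTACGT"), "YYYY-MM", "COUNTRY_A", "REGION_A"),
    )
    conn.commit()
    return conn


def lookup_outbreak(conn, seq):
    """Return outbreak time, country and region for seq, or None if unknown."""
    row = conn.execute(
        "SELECT outbreak_time, country, region FROM history WHERE seq_key = ?",
        (sequence_key(seq),),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("outbreak_time", "country", "region"), row))
```

Calling `lookup_outbreak(build_demo_db(), "acgtacgt")` returns the placeholder record, because the key is computed on the upper-cased sequence; a sequence absent from the table yields `None`.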
5. A computer-readable storage medium, characterized in that a program for automatic base mutation assessment of a novel coronavirus sample is stored thereon, and the program, when executed by a processor, implements the automatic base mutation assessment method for a novel coronavirus sample according to any one of claims 1 to 4.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the automatic base mutation assessment method for a novel coronavirus sample according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410381953.0A CN118280441A (en) 2021-05-28 2021-05-28 Novel coronavirus sample evaluation method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110590258.1A CN113936739A (en) 2021-05-28 2021-05-28 Novel automatic assessment method for base mutation of coronavirus sample
CN202410381953.0A CN118280441A (en) 2021-05-28 2021-05-28 Novel coronavirus sample evaluation method

Related Parent Applications (1)

Application Number Priority Date Filing Date Title
CN202110590258.1A Division CN113936739A (en) 2021-05-28 2021-05-28 Novel automatic assessment method for base mutation of coronavirus sample

Publications (1)

Publication Number Publication Date
CN118280441A (en) 2024-07-02

Family

ID=79274248

Family Applications (2)

Application Number Priority Date Filing Date Title
CN202410381953.0A Pending CN118280441A (en) 2021-05-28 2021-05-28 Novel coronavirus sample evaluation method
CN202110590258.1A Pending CN113936739A (en) 2021-05-28 2021-05-28 Novel automatic assessment method for base mutation of coronavirus sample

Country Status (1)

Country Link
CN (2) CN118280441A (en)

Also Published As

Publication number Publication date
CN113936739A (en) 2022-01-14

Legal Events

Date Code Title Description
PB01 Publication