CN112270960A - Two-stage tumor diagnosis knowledge base and tumor mutation analysis system - Google Patents

Info

Publication number
CN112270960A
Authority
CN
China
Prior art keywords
tumor
mutation
information
clinical
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011207776.2A
Other languages
Chinese (zh)
Other versions
CN112270960B (en)
Inventor
开震天
龚振华
严志祥
沈伟强
罗鹏
蔡丽君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Topgen Biomedical Technology Co ltd
Original Assignee
Shanghai Topgen Biomedical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Topgen Biomedical Technology Co ltd
Priority to CN202011207776.2A
Publication of CN112270960A
Application granted
Publication of CN112270960B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/027 Frames
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/20 ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/60 ICT specially adapted for the handling or processing of medical references relating to pathologies
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the tertiary analysis of tumor mutations in the field of tumor precision medicine, and particularly relates to a two-stage tumor diagnosis knowledge base and a tumor mutation analysis system. The tumor mutation analysis system provides a comprehensive and systematic method for grading tumor mutations and ensures that tumor mutation interpretation is standardized, accurate and applicable. It integrates relevant information from public resources in a timely, efficient and accurate manner according to the needs of the individual case, avoids duplicated, omitted and erroneous information, does not require large amounts of manual effort, and improves the reliability of the interpreted information as a clinical reference.

Description

Two-stage tumor diagnosis knowledge base and tumor mutation analysis system
Technical Field
The invention belongs to the tertiary analysis of tumor mutations in the field of tumor precision medicine, and particularly relates to a two-stage tumor diagnosis knowledge base and a method for constructing it, and to a tumor mutation analysis system and a tumor mutation analysis method that integrate the information of the two-stage tumor diagnosis knowledge base.
Background
Clinical diagnosis is a complex process in which doctors often need to combine the patient's past medical history, current symptoms and various test results into a comprehensive evaluation. With the advent of genetic testing, it has become possible to study the pathogenesis of disease at the molecular level, which also provides genetic support for clinical diagnosis. In recent years, the spread of high-throughput next-generation sequencing has let researchers discover mutations at a rate far exceeding that of the past. This advance has created opportunities as well as significant challenges. Mining the clinical significance behind these mutations has become the most critical step in current clinical practice, especially in oncology.
To address these problems, AMP, ASCO and CAP jointly published guidelines for the classification and interpretation of tumor mutations in 2017. The guidelines are intended to provide a theoretical framework for interpreting tumor mutations, and they propose ten major criteria. However, these criteria are incomplete and have not achieved consensus in oncology; for example, "Identification of Germline Variants in Tumor Genomic Sequencing Analysis" (Montgomery ND, Selitsky SR, Patel NM, Hayes DN, Parker JS, Weck KE. J Mol Diagn 2018, 20(1): 123-) offers some criticism. In addition, the guidelines give no practical advice on questions such as how much weight each of the ten criteria should carry, how the notation for mutations should be unified, and how clinical evidence should be collected.
In the niche of clinical databases of tumor markers, there is currently no unified standard. The major databases (such as CIViC, PMKB and CGI) were developed independently and differ widely, from mutation description style, drug names and gene names to evidence grading schemes and tumor classification. Moreover, no system is currently available for integrating such data. Manual integration is time-consuming and labor-intensive; because of the large differences between databases its accuracy cannot be guaranteed, and updates to each database cannot be tracked and acquired in time.
In the niche of mutation annotation, mature software such as ANNOVAR, Ensembl VEP and SnpEff is available. However, such software provides only basic functional annotation and population frequency annotation. It cannot annotate the clinical significance of a mutation in tumors, and that clinical information is an important basis for grading and interpreting tumor mutations.
Even if the interpreter has all of the above information, grading a single tumor mutation on that basis remains difficult, because the grade of a tumor mutation and the grade of its clinical evidence are closely tied to the patient's tumor type. Clinical tumor typing systems are very large, and even versions released by the same organization at different times can differ greatly. It is therefore important to unify the patient's tumor typing input and to normalize the database of clinical tumor markers.
Disclosure of Invention
High-throughput second-generation sequencing allows massive numbers of mutations to be detected simultaneously, and interpreting them has become an industry-wide problem. Correctly interpreting each mutation and mining the clinical significance behind it is therefore a necessary step in precision medicine. The problem is particularly acute in oncology. Unlike hereditary diseases, tumors are not caused by a single mutation or a small number of mutations; they are complex diseases caused by the environment together with mutations in many genes. Because of this, physicians and researchers who want to understand the precise effect of each mutation on a tumor can only search the medical literature for known clinical evidence on each mutation, one by one. Public resources that collect and integrate clinical tumor information have therefore become the most critical issue in the field.
The existing public resources of clinical tumor information were each developed independently, follow no unified standard, differ widely in the information they annotate, and cannot be tracked and updated comprehensively in real time. This hinders timely judgment, research and reference by those who consult them. Even doctors or researchers with the relevant background knowledge cannot make an accurate judgment on each mutation in a tumor when consulting these resources, because a clear and unified interpretation standard is lacking.
In addition, clinical evidence for tumors rests on the explicit clinical diagnostic typing of the tumor in each study, since the same mutation may behave completely differently in different tumor types. For personalized testing, the patient's tumor type must therefore be clearly defined, and it must be determined accurately whether that type matches the tumor type described in the known clinical evidence. However, oncology has no uniform rule for matching clinical classifications, so implementing this matching is a key step for personalized testing.
To address the difficulty of integrating public resources of clinical tumor information and the inaccuracy of their interpretation, the invention provides a two-stage tumor diagnosis knowledge base and a method for constructing it. The construction method integrates the clinical tumor information held in existing public knowledge bases using dedicated rules such as mutation parsing and tumor typing parsing. It unifies the standards for information annotation and integration, reduces operator-to-operator differences in integration, avoids duplicated, omitted and misjudged information, and ensures that the integrated information is comprehensive, timely and accurate. It also standardizes the rating criteria for interpretation: clinical tumor information relevant to the individual is matched accurately from the public resources, ranked in tiers by its relevance to the individual, and assigned an individual reference grade, which reduces differences between interpreters. This gives information users, especially doctors and researchers, a reliable information resource for their research and judgment and improves its value for clinical guidance.
The technical scheme of the invention is a two-stage tumor diagnosis knowledge base, comprising:
an information acquisition unit, which tracks and acquires the latest version of each public knowledge base of tumor clinical information and sends it to the initial tumor diagnosis knowledge base unit;
an initial tumor diagnosis knowledge base unit, which stores the latest version of each public knowledge base of tumor clinical information received from the information acquisition unit in the initial tumor diagnosis knowledge base, compares it with the historical version from the same source already stored there, captures and records the updated tumor clinical information, including but not limited to mutation descriptions, clinical information and clinical evidence grade information, and sends the updated tumor clinical information to the secondary tumor diagnosis knowledge base unit;
the mutation description includes, but is not limited to, the gene name, transcript name, exon, codon and mutation type corresponding to the mutation;
the clinical information includes, but is not limited to, available drugs, tumor typing, clinical diagnosis and treatment description, prognosis and metastasis status;
the clinical evidence grade information includes, but is not limited to, the evidence level, evidence type, evidence grade and clinical significance, assigned according to indexes such as the source of the clinical information and the stage it has reached;
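The version comparison performed by the initial tumor diagnosis knowledge base unit can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that each downloaded knowledge base version has already been loaded into a dictionary keyed by a stable record identifier, and the function name `diff_versions` and the record fields are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical flat record: field name -> value


def diff_versions(old: Dict[str, Record], new: Dict[str, Record]) -> Dict[str, List[str]]:
    """Compare two versions of one source knowledge base.

    Returns the identifiers of records that were added, removed or changed,
    so that only updated tumor clinical information is passed downstream.
    """
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    changed = sorted(key for key in new if key in old and new[key] != old[key])
    return {"added": added, "removed": removed, "changed": changed}
```

Only the records listed under `added` and `changed` would need to be re-parsed by the secondary unit; how removed records are handled is not stated in the text and is left open here.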
a secondary tumor diagnosis knowledge base unit, which acquires the updated tumor clinical information from the initial tumor diagnosis knowledge base unit and parses the mutation description in each piece of tumor clinical information, one by one, into explicit mutation site information at every level of a containment hierarchy and mutation type information at every level of a containment hierarchy. Following the HGVS nomenclature, the mutation sites include, but are not limited to, gene names, transcript names, exons and codons, and the mutation types include, but are not limited to, insertions, deletions, insertion-deletions (indels), missense mutations, frameshift mutations, synonymous mutations, nonsense frameshift mutations, splice mutations, any mutation and other mutations;
the unit also parses the tumor typing in the clinical information into a tumor diagnosis typing tree, which includes, but is not limited to, tumor types parsed according to the tumor classification rules of the World Health Organization and arranged in parent-child relationships, each tumor type having an internal code; and it normalizes the different names used for the same gene and for the same drug;
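The idea of parsing a mutation description into site levels and type levels that contain one another can be illustrated with a small sketch. It is deliberately simplified and rests on assumptions: it handles only one-letter protein notation such as `p.V600E`, `p.V600fs` or `p.V600del`, it is not a full HGVS parser, and the names `ParsedMutation`, `parse_mutation` and `entry_matches` are hypothetical.

```python
import re
from typing import List, NamedTuple


class ParsedMutation(NamedTuple):
    sites: List[str]  # broadest level (gene) first, narrowest (codon) last
    types: List[str]  # most specific type first, then the catch-all "any"


# Simplified one-letter protein notation only (an assumption of this sketch).
_PROTEIN_CHANGE = re.compile(r"^p\.([A-Z])(\d+)([A-Z]|fs|del|=)$")


def parse_mutation(gene: str, protein_change: str) -> ParsedMutation:
    """Expand one mutation description into all site and type levels."""
    match = _PROTEIN_CHANGE.match(protein_change)
    if match is None:
        # Unparsed descriptions fall back to the gene level and "any" type.
        return ParsedMutation([gene], ["any"])
    ref, codon, alt = match.groups()
    if alt in ("=", ref):
        kind = "synonymous"
    elif alt == "fs":
        kind = "frameshift"
    elif alt == "del":
        kind = "deletion"
    else:
        kind = "missense"
    return ParsedMutation([gene, f"{gene}:codon{codon}"], [kind, "any"])


def entry_matches(entry_site: str, entry_type: str, parsed: ParsedMutation) -> bool:
    """A knowledge base entry matches if its site and type equal or contain the input's."""
    return entry_site in parsed.sites and entry_type in parsed.types
```

With this representation, an entry recorded for "any mutation in BRAF" matches an input of `p.V600E` in BRAF, while an entry recorded for a different codon does not.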
an information retrieval and output unit, which accepts an individual's mutation description and tumor type as input. It parses the input mutation description into mutation site information and mutation type information at every level of their containment hierarchies, then matches, displays and outputs from the secondary tumor diagnosis knowledge base all tumor clinical information whose mutation site and mutation type are the same as, or stand in a containment relationship with, the parsed ones. Next, using the code of the input tumor type, it identifies among the retrieved tumor clinical information the entries whose tumor type is the same as the input type or is a superior type that contains it; the evidence grade of these entries is left unchanged, and the evidence grade of all other entries is lowered. If the evidence grade is A or B, the mutation is classified as Grade I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as Grade II (potential clinical significance). Finally, all tumor clinical information related to the individual is ranked in tiers by the strength of its relevance to the individual and an individual reference grade is assigned, so that doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, past medical history and other diagnostic information.
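The grade adjustment and the mapping from evidence grade to mutation grade described above can be sketched as below. Two points are assumptions of this sketch, because the text does not specify them: evidence for a non-matching tumor type is lowered by one step (configurable through the hypothetical `steps` parameter), and the best remaining evidence grade determines the mutation grade.

```python
from typing import List, Optional

EVIDENCE_GRADES = ["A", "B", "C", "D"]  # strongest to weakest


def adjust_grade(grade: str, tumor_type_matches: bool, steps: int = 1) -> str:
    """Keep the grade for matching tumor types, otherwise lower it.

    The size of the downgrade (steps) is an assumption of this sketch.
    """
    if tumor_type_matches:
        return grade
    index = EVIDENCE_GRADES.index(grade) + steps
    return EVIDENCE_GRADES[min(index, len(EVIDENCE_GRADES) - 1)]


def mutation_grade(adjusted_grades: List[str]) -> Optional[str]:
    """Map adjusted evidence grades to Grade I or Grade II.

    Returns None when no clinical evidence was matched at all.
    """
    if not adjusted_grades:
        return None
    best = min(adjusted_grades, key=EVIDENCE_GRADES.index)
    return "I" if best in ("A", "B") else "II"
```

For example, a grade B entry for a different tumor type becomes grade C under the one-step assumption and then contributes to Grade II rather than Grade I.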
The invention also provides a method for constructing the secondary tumor diagnosis knowledge base, comprising the following steps:
S1, information acquisition: tracking and acquiring the latest version of each public knowledge base of tumor clinical information;
S2, constructing an initial tumor diagnosis knowledge base: storing the latest version of each acquired public knowledge base of tumor clinical information to generate the initial tumor diagnosis knowledge base, comparing it with the historical version from the same source already stored in the initial tumor diagnosis knowledge base, and capturing and recording the updated tumor clinical information, including but not limited to mutation descriptions, clinical information and clinical evidence grade information;
the mutation description includes, but is not limited to, the gene name, transcript name, exon, codon and mutation type corresponding to the mutation;
the clinical information includes, but is not limited to, available drugs, tumor typing, clinical diagnosis and treatment description, prognosis and metastasis status;
the clinical evidence grade information includes, but is not limited to, the evidence level, evidence type, evidence grade and clinical significance, assigned according to indexes such as the source of the clinical information and the stage it has reached;
S3, constructing a secondary tumor diagnosis knowledge base: parsing the mutation description in each piece of updated tumor clinical information acquired from the initial tumor diagnosis knowledge base, one by one, into mutation site information at every level of a containment hierarchy and mutation type information at every level of a containment hierarchy, where, following the HGVS nomenclature, the mutation sites include, but are not limited to, gene names, transcript names, exons and codons, and the mutation types include, but are not limited to, insertions, deletions, indels, missense mutations, frameshift mutations, synonymous mutations, nonsense frameshift mutations, splice mutations, any mutation and other mutations; parsing the tumor typing in the clinical information into a tumor diagnosis typing tree, which includes, but is not limited to, tumor types parsed according to the tumor classification rules of the World Health Organization and arranged in parent-child relationships, each tumor type having an internal code; and normalizing the different names used for the same gene and for the same drug, thereby generating the secondary tumor diagnosis knowledge base;
once mutation descriptions and tumor typing have been parsed in the secondary tumor diagnosis knowledge base, an input individual mutation description is parsed into mutation site information and mutation type information at every level of their containment hierarchies, and all tumor clinical information whose mutation site and mutation type are the same as, or stand in a containment relationship with, the parsed ones is matched and displayed from the secondary tumor diagnosis knowledge base. Using the code of the individual's input tumor type, the entries whose tumor type is the same as the input type or is a superior type that contains it are identified among all the matched tumor clinical information; the evidence grade of these entries is left unchanged, and the evidence grade of all other entries is lowered. If the evidence grade is A or B, the mutation is classified as Grade I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as Grade II (potential clinical significance). All tumor clinical information related to the individual is then ranked in tiers by the strength of its relevance to the individual and an individual reference grade is determined, so that doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, past medical history and other diagnostic information.
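Matching an individual's tumor type against the typing tree by internal code can be sketched as follows. The codes, the tree contents and the function names are hypothetical placeholders; the text only states that each tumor type has an internal code and that types are arranged in superior and subordinate levels.

```python
from typing import Dict, List, Optional

# Hypothetical internal codes: child code -> parent code (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "T1": None,
    "T1.1": "T1",
    "T1.1.1": "T1.1",
    "T2": None,
}


def self_and_superiors(code: str) -> List[str]:
    """Return the code itself followed by every superior type up to the root."""
    chain: List[str] = []
    current: Optional[str] = code
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def tumor_type_matches(individual_code: str, evidence_code: str) -> bool:
    """Evidence matches if it is for the same type or a superior type containing it."""
    return evidence_code in self_and_superiors(individual_code)
```

Evidence recorded for `T1` therefore applies to an individual typed as `T1.1.1`, whereas evidence recorded for the narrower `T1.1.1` does not automatically apply to an individual typed only as `T1`.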
In the invention, a public knowledge base of clinical tumor information is a large website, database, software package or similar resource that can annotate clinical tumor information, including but not limited to CIViC, PMKB and CGI.
The names of the same gene are normalized as follows: the different names used for the same gene are matched against the HUGO Gene Nomenclature Committee (HGNC) database and converted to the name used in that database, which is then used uniformly.
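Gene name normalization of this kind amounts to a lookup from any known alias to the approved symbol. The sketch below assumes that an alias table has already been built from an HGNC export; the small table shown is a hand-written stand-in for illustration, and `normalize_gene` is a hypothetical helper.

```python
from typing import Dict

# Stand-in alias table; in practice it would be built from an HGNC export.
ALIAS_TO_APPROVED: Dict[str, str] = {
    "ERBB2": "ERBB2",
    "HER2": "ERBB2",
    "NEU": "ERBB2",
}


def normalize_gene(name: str, table: Dict[str, str] = ALIAS_TO_APPROVED) -> str:
    """Return the approved symbol for a gene name, or the cleaned input if unknown."""
    key = name.strip().upper()
    return table.get(key, key)
```

Unknown names are passed through unchanged apart from whitespace and case, so that nothing is silently dropped; whether to flag such names for manual review is a design choice the text does not address.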
The names of the same drug are normalized as follows: for a drug that appears under different names, the name that occurs most frequently in the compound information databases is obtained by matching, and all names are converted to that name, which is then used uniformly. Info, PubChem, and ChEMBL.
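Choosing the most frequent name among the synonyms of one drug can be sketched with a simple counter. The input is assumed to be the list of names under which the same compound appears across the compound databases; how those occurrences are collected is not shown, and breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter
from typing import Iterable


def most_frequent_name(occurrences: Iterable[str]) -> str:
    """Pick the name that occurs most often; ties are broken alphabetically."""
    counts = Counter(name.strip().lower() for name in occurrences)
    if not counts:
        raise ValueError("no names supplied")
    # Sort by descending count, then by name, for a deterministic result.
    return min(counts, key=lambda name: (-counts[name], name))
```

Names are lowercased before counting so that spelling variants such as "Imatinib" and "imatinib" are counted together.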
The invention also provides a device for executing the method for constructing the secondary tumor diagnosis knowledge base, comprising a memory for storing a program and a processor for executing the program so as to carry out the construction method.
The invention further provides a computer-readable medium containing a program that can be executed by a processor to carry out the method for constructing the secondary tumor diagnosis knowledge base of the invention.
For the interpreter, the basic annotation of one and the same base mutation, such as its population frequency, protein function prediction, HGVS name and mutant transcript information, differs between reference sequence versions. The same base mutation can correspond to several mutant transcripts, and the HGVS name differs for each of them. The tumor-related information annotated for each mutant transcript, such as the mutation frequency in tumors, whether the mutation lies in a tumor pathway or a related pathway and whether it is an initiating mutation, as well as the tumor clinical information matched by HGVS name (that is, by mutation description, such as mutation site and mutation type), therefore also differs markedly. There is no unified standard for handling these annotation differences, and different interpreters integrate the annotation fields differently, so the clinical practicability and clinical reference value of the finally integrated tumor clinical information for the individual mutation cannot be guaranteed.
Furthermore, there is no uniform standard for interpreting each mutation from the integrated fields of tumor mutation information. Acquiring data item by item from each public resource costs the interpreter a great deal of time and energy and demands a high level of professional expertise. Summarizing all the information first and then interpreting the clinical significance of each mutation is inefficient and inaccurate. More importantly, the work cannot be made consistent between interpreters: manual interpretation by two different interpreters can give completely different results.
To obtain accurately, from an individual's mutation information and tumor type, the tumor clinical information relevant to that individual together with the other annotation information, to eliminate operator-to-operator differences in integration and annotation, and to improve practicability for the individual and clinical reference value, the invention provides a tumor mutation analysis system and a tumor mutation analysis method. Building on the two-stage tumor diagnosis knowledge base, they fix the reference genome, the clinically most relevant transcript for the individual's mutation and the mutation description, thereby standardizing information annotation, integration and interpretation and reducing differences between operators. The result is tumor clinical information specific to the individual, plus other annotation information, ranked by relevance, which improves the accuracy and clinical practicability of interpretation and provides a reliable means of obtaining clinically targeted, individual-relevant information.
That is, the tumor mutation analysis system and the tumor mutation analysis method meet the needs of individual research and integrate the relevant information from each public resource in a timely, efficient and accurate manner. They avoid duplicated, omitted and erroneous information, ensure relevance to the individual, and do not require large amounts of manual effort. Following a comprehensive and systematic tumor mutation grading method, specific judgment rules are refined from the AMP-ASCO-CAP guidelines for tumor mutation classification and interpretation, and the information obtained for each mutation-related field, especially tumor clinical information, is ranked in tiers by the strength of its relevance to the individual so that an individual reference grade is determined. This ensures that tumor mutation interpretation is standardized, accurate and applicable.
The technical scheme is a tumor mutation analysis system, comprising:
an information acquisition unit, which acquires the individual's mutation information, reference sequence version and tumor type from the input port, sends the mutation information and reference sequence version to the mutation basic information annotation unit, and sends the tumor type to the mutation tumor clinical information acquisition unit;
a mutation basic information annotation unit, which, according to the mutation information and reference sequence version received from the information acquisition unit, obtains from public resources for basic mutation annotation information including but not limited to the population frequency of the mutation, protein function prediction, the HGVS name and mutant transcript information, and sends the HGVS name to the mutation transcript acquisition unit;
a mutation transcript acquisition unit, which extracts the gene name from the HGVS name received from the mutation basic information annotation unit and checks in the Locus Reference Genomic (LRG) database whether a clinically most relevant transcript exists for the gene carrying the mutation; if so, that transcript is used as the clinically most relevant transcript; if not, the longest of the mutant transcripts obtained by the mutation basic information annotation unit is selected as the clinically most relevant transcript; the unit then extracts the HGVS name corresponding to the clinically most relevant transcript and sends it to the mutation tumor related information acquisition unit and the mutation tumor clinical information acquisition unit;
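The transcript selection rule of the mutation transcript acquisition unit (use the transcript listed in the reference database for the gene if there is one, otherwise fall back to the longest annotated transcript) can be sketched as follows. The lookup table, the transcript identifiers and the function name are hypothetical; in practice the table would be derived from the reference database named in the text.

```python
from typing import Dict, Optional


def select_transcript(
    gene: str,
    reference_transcripts: Dict[str, str],
    annotated_lengths: Dict[str, int],
) -> str:
    """Choose the clinically most relevant transcript for a gene.

    reference_transcripts maps gene -> preferred transcript (hypothetical table).
    annotated_lengths maps each annotated transcript -> its length.
    """
    preferred: Optional[str] = reference_transcripts.get(gene)
    if preferred is not None:
        return preferred
    if not annotated_lengths:
        raise ValueError("no annotated transcripts to choose from")
    # Fallback: the longest annotated transcript (first one wins on ties).
    return max(annotated_lengths, key=lambda tx: annotated_lengths[tx])
```

The HGVS name belonging to the returned transcript is then the one forwarded to the downstream units.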
a mutation tumor related information acquisition unit, which, using the HGVS name corresponding to the clinically most relevant transcript received from the mutation transcript acquisition unit, obtains from public resources of mutation tumor related information the mutation frequency in tumors, whether the mutation lies in a tumor pathway or a related pathway, and whether it is an initiating mutation;
a mutation tumor clinical information acquisition unit, which parses the HGVS name corresponding to the clinically most relevant transcript received from the mutation transcript acquisition unit and, using the parsed mutation site information and mutation type information at every level of their containment hierarchies, obtains from the secondary tumor diagnosis knowledge base all tumor clinical information whose site is the same as or contains the parsed site and whose mutation type is the same as or contains the parsed type;
next, using the code of the individual's tumor type received from the information acquisition unit, the entries whose tumor type is the same as the input type or is a superior type that contains it are identified among all the tumor clinical information obtained; the evidence grade of these entries is left unchanged, and the evidence grade of all other entries is lowered; if the evidence grade is A or B, the mutation is classified as Grade I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as Grade II (potential clinical significance);
if no tumor clinical information can be matched from the parsed mutation site information and mutation type information, the following indexes are extracted from the information obtained by the mutation basic information annotation unit, the mutation transcript acquisition unit and the mutation tumor related information acquisition unit, and the clinical relevance of the mutation to tumors is graded: mutation frequency, population mutation frequency, tumor mutation frequency, germline mutation knowledge base, functional prediction for the protein affected by the mutation, and the effect of the mutation on tumor pathways. The conditions checked are: the mutation frequency is close to 50% or 100%; the population mutation frequency is higher than 1%; the tumor mutation frequency is lower than 1%; the mutation is absent in germline mutation; the mutation has no functional effect on the protein; the mutation is absent from tumor pathways and related pathways; the mutation shows no prominent phenotype in known related cell and animal experiments. If more than three of these conditions are met, the mutation is rated Grade IV, namely the correlation is the highest; if one to three are met, the mutation is rated Grade III, of unknown clinical significance; if none is met, the mutation is marked as having potential clinical significance for which no known clinical evidence has been observed;
and acquiring the hierarchical ranking of all tumor clinical information and other information except the tumor clinical information related to the individual according to the individual correlation strength, and determining the individual reference level for doctors and researchers to make final diagnosis and treatment judgment by combining the actual disease condition, past medical history and other diagnosis and treatment information of the individual.
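The transcript selection rule of the mutation transcript acquisition unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Transcript` record, the `lrg_transcripts` lookup table and the function name are hypothetical, and the sketch assumes that a transcript designated in the locus reference database is used only if it also appears among the annotated transcripts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Transcript:
    transcript_id: str  # hypothetical identifier of the transcript
    gene: str           # gene symbol the transcript belongs to
    length: int         # transcript length in bases
    hgvs: str           # HGVS name of the mutation on this transcript


def select_clinical_transcript(
    gene: str,
    candidates: List[Transcript],
    lrg_transcripts: Dict[str, str],
) -> Optional[Transcript]:
    """Pick the designated transcript for the gene if one is known,
    otherwise fall back to the longest annotated transcript."""
    for_gene = [t for t in candidates if t.gene == gene]
    if not for_gene:
        return None
    preferred_id = lrg_transcripts.get(gene)
    for transcript in for_gene:
        if transcript.transcript_id == preferred_id:
            return transcript
    # No designated transcript available: use the longest one.
    return max(for_gene, key=lambda t: t.length)
```

The HGVS name carried by the returned record is what would be passed on to the two downstream acquisition units.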
The invention also provides a tumor mutation analysis method, which comprises the following steps:
S1, information acquisition: obtaining the individual's mutation information, reference sequence version and tumor typing;
S2, mutation basic information annotation: acquiring, from a mutation basic information annotation public resource and according to the acquired mutation information and reference sequence version, information including but not limited to: the population frequency of the mutation, protein function prediction, HGVS naming and mutation transcript information;
S3, mutation transcript acquisition: extracting the gene name from the HGVS name and checking in the locus reference sequence genome database whether a clinically most relevant transcript exists for the gene carrying the mutation; if so, using that transcript as the clinically most relevant transcript; if not, selecting the longest of the mutation transcripts obtained in the mutation basic information annotation as the clinically most relevant transcript; and extracting the HGVS naming information corresponding to the clinically most relevant transcript;
S4, mutation tumor related information acquisition: acquiring, from a mutation tumor related information public resource and according to the HGVS naming information corresponding to the clinically most relevant transcript, the mutation frequency in tumors, whether the mutation is located in a tumor pathway or a related pathway, and whether it is a driver mutation;
S5, mutation tumor clinical information acquisition: parsing the HGVS naming information corresponding to the clinically most relevant transcript into all mutation site information having hierarchical inclusion relations and all mutation type information having hierarchical inclusion relations, and acquiring from the secondary tumor diagnosis knowledge base all tumor clinical information whose mutation site is the same as, or contains, a parsed site and whose mutation type is the same as, or contains, a parsed type;
next, according to the code corresponding to the tumor typing, matching the acquired tumor clinical information against the tumor typing: entries whose tumor typing is the same as the input tumor typing, or is a superordinate type that contains it, keep their evidence grade, and the evidence grade of all other entries is adjusted downward; if the evidence grade is A or B, the mutation is classified as grade I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as grade II (potential clinical significance);
if no tumor clinical information can be matched from the parsed mutation site information and mutation type information, extracting the following indexes from the information acquired in the mutation basic information annotation, mutation transcript acquisition and mutation tumor related information acquisition steps and grading tumor clinical relevance: mutation frequency (variant allele frequency), population mutation frequency (minor allele frequency), tumor mutation frequency, germline mutation knowledge base, function prediction of the protein corresponding to the mutation, and influence of the mutation on tumor pathways; if more than three of the following items apply, the mutation is judged to be grade IV, namely the correlation is the highest: the mutation frequency is close to 50% or 100%; the population mutation frequency is higher than 1%; the tumor mutation frequency is lower than 1%; the mutation is absent from the germline mutation knowledge base; the mutation has no functional effect on the protein; the mutation is absent from tumor and related pathways; the mutation shows no prominent phenotype in known relevant cell and animal experiments; if one to three items apply, the mutation is rated grade III, of unknown clinical significance; if none applies, the mutation is marked as having potential clinical significance, although no known clinical evidence has been observed;
finally, ranking all tumor clinical information related to the individual, together with the other information, in grades by strength of individual relevance, which determines the individualized reference level, so that doctors and researchers can make the final diagnosis and treatment judgment in combination with the individual's actual condition, medical history and other diagnosis and treatment information.
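The fallback grading used when no tumor clinical information can be matched amounts to counting how many of the seven listed items apply. The sketch below illustrates that counting rule only; the field names, the result labels and the 0.05 tolerance used to decide that a mutation frequency is "close to" 50% or 100% are assumptions, since the text does not quantify closeness.

```python
from dataclasses import dataclass


@dataclass
class MutationIndicators:
    variant_allele_frequency: float   # mutation frequency in the sample, 0..1
    population_frequency: float       # population mutation frequency, 0..1
    tumor_frequency: float            # mutation frequency in tumors, 0..1
    in_germline_knowledge_base: bool
    affects_protein_function: bool
    in_tumor_pathway: bool
    has_experimental_phenotype: bool


NO_KNOWN_EVIDENCE = "potential clinical significance, no known clinical evidence"


def count_items(ind: MutationIndicators, tolerance: float = 0.05) -> int:
    """Count how many of the seven listed items apply to the mutation."""
    vaf = ind.variant_allele_frequency
    items = [
        abs(vaf - 0.5) <= tolerance or abs(vaf - 1.0) <= tolerance,
        ind.population_frequency > 0.01,
        ind.tumor_frequency < 0.01,
        not ind.in_germline_knowledge_base,
        not ind.affects_protein_function,
        not ind.in_tumor_pathway,
        not ind.has_experimental_phenotype,
    ]
    return sum(items)


def grade_without_clinical_match(ind: MutationIndicators) -> str:
    """More than three items: grade IV; one to three: grade III; none: no grade."""
    n = count_items(ind)
    if n > 3:
        return "IV"
    if n >= 1:
        return "III"
    return NO_KNOWN_EVIDENCE
```

For example, a mutation to which exactly three items apply is rated grade III, and a fourth item moves it to grade IV.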
The mutation basic information annotation public resource of the invention is any website, database, software or the like capable of annotating basic mutation information, including but not limited to ANNOVAR, ENSEMBL-VEP and SnpEff.
The locus reference sequence genome database is any website, database, software or the like capable of transcript annotation, including but not limited to the LRG (Locus Reference Genomic) knowledge base.
The tumor related information public resource provided by the invention is a website, a database, software and the like capable of carrying out tumor related information annotation, and comprises but is not limited to COSMIC, TCGA, ICGC and the like.
In order to ensure the implementation of the tumor mutation analysis method, the invention provides an execution device of the tumor mutation analysis method, which comprises a memory for storing a program and a processor for executing the program, so as to complete the tumor mutation analysis method.
The present invention also provides a computer readable medium containing a program executable by a processor to perform the tumor mutation analyzing method of the present invention.
However, it should be understood that the secondary tumor diagnosis knowledge base and the tumor mutation analysis system of the invention can give a personalized, comprehensive interpretation of each mutation of a patient, but cannot combine the clinical meanings of the individual mutations into clinical guidance. This is not only because a patient may carry mutations with conflicting clinical meanings, but also because the system cannot evaluate the patient's medical history, current symptoms or other test results; even if the same mutation is detected in two patients, the physician may reach completely different diagnosis and treatment decisions for these reasons.
Compared with the prior art, the invention has the advantages that:
(1) Secondary tumor diagnosis knowledge base
Through the information acquisition unit, the secondary tumor diagnosis knowledge base tracks in real time the update status of each public tumor clinical information knowledge base and promptly obtains the latest version of each.
Through the initial tumor diagnosis knowledge base unit, the secondary tumor diagnosis knowledge base promptly stores each updated version of the public tumor clinical information knowledge bases, compares it with the stored historical version and extracts the updated tumor clinical information; this keeps a traceable record of the update path and avoids omissions or errors caused by manual operation.
Through the information acquisition unit and the initial tumor diagnosis knowledge base unit, the secondary tumor diagnosis knowledge base overcomes the incompleteness of the individual knowledge bases, which arises because the public tumor clinical information knowledge bases follow non-uniform standards and have different focuses; it ensures the comprehensiveness and accuracy of information collection at the knowledge base level and improves the reference value.
Through the secondary tumor diagnosis knowledge base unit, the mutation descriptions in the tumor clinical information are parsed into all specific mutation sites having hierarchical inclusion relations and all mutation types having hierarchical inclusion relations, which facilitates subsequent matching and retrieval; the tumor typing in the tumor clinical information is parsed into a tumor typing tree in which each tumor type is assigned a code, which likewise facilitates subsequent matching and retrieval; and the names of identical genes and of identical drugs in the updated tumor information are normalized, eliminating the obstacles caused by aliases, common names and the like and making lookup easier. Because mutation descriptions are parsed, all relevant tumor clinical information can be collected for a given mutation, ensuring the comprehensiveness and accuracy of the information; because tumor typing is parsed, the evidence grade of each piece of tumor clinical information can be marked according to how closely its tumor typing is related to that of the individual, so that the clinical information is graded by individual relevance, which improves its individual pertinence and reference value and makes it easier for doctors and researchers to consult.
In this process, mutation description parsing and tumor typing parsing each occur twice. The first time is during construction of the secondary tumor diagnosis knowledge base, when the mutation descriptions and tumor typing in the known clinical information are parsed. The second time is when an individual's mutation description and tumor typing are input and parsed. The parsed individual mutation description is then matched against the parsed mutation descriptions in the secondary tumor diagnosis knowledge base, so that the tumor clinical information related to the individual is obtained comprehensively and accurately; next, the parsed individual tumor typing is matched against the parsed tumor typing of the clinical information so obtained, and the tumor clinical information is ranked in grades by strength of individual relevance. This enhances the practical value of the clinical information for the individual and provides a feasible scheme for the personalized screening of massive data in clinical practice.
Based on the tumor clinical diagnosis typing tree guided by the World Health Organization, the invention provides a solid foundation for unifying the tumor typing of patients and normalizing the tumor typing in the major tumor clinical marker databases, so that mutation grading and clinical evidence grading can be carried out quickly and accurately.
The secondary tumor diagnosis knowledge base integrates the major tumor marker clinical databases on the market and parses and unifies mutation descriptions, tumor typing, drug names and gene names, ensuring the accuracy of the integration; the information acquisition and comparison unit of the secondary tumor diagnosis knowledge base tracks each major tumor marker clinical database in real time, ensuring timely updates and avoiding omissions in manual reading.
The secondary tumor diagnosis knowledge base parses and normalizes mutation descriptions, tumor typing, drug names and gene names, unifying the annotated information and ensuring that information integration is uniform and standardized. Clinical information related to the individual is matched comprehensively and accurately through mutation description parsing and tumor typing parsing, and is ranked by strength of individual relevance to determine the individualized reference level of each piece of clinical information. This unifies the standard for the integrated interpretation of known clinical information for the individual, improves its individual pertinence, allows doctors and researchers to make the final diagnosis and treatment judgment in light of the individual's actual condition, medical history and other diagnosis and treatment information, and improves the clinical significance of mutation interpretation.
(2) Tumor mutation analysis system
The mutation basic information annotation unit, the mutation transcript acquisition unit, the mutation tumor related information acquisition unit and the mutation tumor clinical information acquisition unit of the tumor mutation analysis system integrate large database resources that provide biological, clinical and other relevant information: tumor clinical marker databases (CIViC, PMKB and CGI), population frequency databases (1000 Genomes, gnomAD, ExAC and ESP), germline mutation databases (HGMD and ClinVar), tumor mutation frequency databases (TCGA, ICGC, COSMIC and the like), biological pathway databases (KEGG and the like) and literature databases (PubMed and the like). Using the mutation and the tumor typing as the keywords and main thread of information integration, the system interprets and annotates the mutation from multiple aspects, including basic information, transcripts, tumor related information and tumor clinical information. This ensures the comprehensiveness, accuracy and pertinence of the information, overcomes the incompleteness, difficult integration and weak pertinence of the information in existing knowledge bases, and avoids consuming a large amount of manpower and effort.
Specifically, starting from the basic information annotation of the mutation, the invention defines the standard for each annotation step: a definite reference sequence version, a definite method for obtaining the clinically most relevant transcript, a definite mutation description (HGVS naming information) on that transcript, and definite standards for integrating and acquiring tumor clinical information (the secondary tumor diagnosis knowledge base of the invention). On this basis, all tumor clinical information relevant to the individual, together with the other annotation information, is acquired accurately and graded for reference by individual relevance, which reduces individual differences in the integrated information. Reducing the individual differences in the annotation information other than tumor clinical information (in particular the individual's clinically most relevant transcript and mutation description) in turn improves the individual relevance of the tumor clinical information finally obtained, and therefore the individual relevance, reference value and interpretive significance of all the integrated information. In this process, establishing the clinically most relevant transcript of the patient's mutation plays an essential role in reducing individual differences in the integrated information and ensuring its individual relevance.
The tumor clinical information resource of the tumor mutation analysis system is the secondary tumor diagnosis knowledge base constructed by the invention. All relevant tumor clinical information is collected by mutation parsing, ensuring the comprehensiveness and accuracy of the information; the evidence grade of each piece of tumor clinical information is marked according to tumor typing parsing, that is, according to how closely the tumor typing is related, so that the clinical information is graded by individual relevance and the reference value of the annotated tumor clinical information is ensured.
The mutation analysis system also provides a solution for the case in which no tumor clinical information can be matched from the mutation site and mutation type, which highlights the comprehensiveness and practicability of the mutation interpretation system. The solution is as follows: tumor clinical relevance is graded according to the mutation frequency, population mutation frequency, tumor mutation frequency, germline mutation knowledge base, function prediction of the protein corresponding to the mutation, influence of the mutation on tumor pathways and other information extracted from the mutation basic information annotation unit, the mutation transcript acquisition unit and the mutation tumor related information acquisition unit. This provides data of reference value for mutations for which no evidence from clinical practice can be found, and helps doctors and researchers reach a reliable diagnosis in combination with other diagnosis and treatment information.
The invention also provides a systematic tumor mutation analysis method that clarifies and supplements the ten indexes mentioned in the AMP-ASCO-CAP guidelines for tumor mutation grading and interpretation, so that each tumor mutation can be interpreted along two dimensions, known clinical evidence and clinical relevance, avoiding the conflicts in the clinical meaning of a single mutation that arise when the guidelines alone are relied on.
In short, the secondary tumor diagnosis knowledge base and the tumor mutation analysis system of the invention unify the standards for annotating, integrating and interpreting mutation information, reduce individual differences in information integration and interpretation, and improve the individual relevance of the integrated and interpreted information. All tumor clinical information related to the individual, together with the other annotation information, is acquired accurately and interpreted with clinical reference grading according to the strength of individual relevance. In particular, the clinical information that forms the key basis for tumor mutation grading and interpretation is acquired comprehensively and accurately and graded by strength of relevance, which provides reliable and valuable individualized reference information and gives clinical practice a practical method of acquiring and processing information.
Drawings
FIG. 1 is a schematic diagram of the construction process of the secondary tumor diagnosis knowledge base in Example 1.
FIG. 2 is a schematic diagram of tumor mutation analysis in Example 2.
Detailed Description
The present invention is illustrated by the following examples for the purpose of facilitating understanding of the invention, but is not to be construed as being limited thereto.
Example 1
As shown in FIG. 1, the initial tumor diagnosis knowledge base integrates all tumor diagnosis knowledge bases currently on the market and keeps tracking and recording them, serving as the basis for content management and version updates of the secondary tumor diagnosis knowledge base.
First, the information acquisition unit tracks whether each tumor diagnosis knowledge base on the market (any website, database, software or the like that annotates tumor clinical information, such as CIViC, PMKB and CGI) has been updated. If an update has occurred, the unit automatically fetches the updated version and stores it locally to generate the initial tumor diagnosis knowledge base, then compares it with the locally stored historical version to determine whether the content differs. If it does, the differing tumor clinical information is captured and recorded, and is sent to the secondary tumor diagnosis knowledge base as updated tumor clinical information.
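The comparison between a newly fetched version and the stored historical version can be sketched as a record-level diff. This is only an illustration under the assumption that each knowledge base entry can be keyed by a stable record identifier; the function name and the dictionary layout are hypothetical and do not reflect the export format of any particular knowledge base.

```python
from typing import Any, Dict


def diff_versions(historical: Dict[str, Any], updated: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two versions of a knowledge base keyed by record identifier
    and return the entries that were added, changed or removed."""
    added = {key: value for key, value in updated.items() if key not in historical}
    changed = {
        key: value
        for key, value in updated.items()
        if key in historical and historical[key] != value
    }
    removed = [key for key in historical if key not in updated]
    return {"added": added, "changed": changed, "removed": removed}
```

In this sketch the `added` and `changed` entries together play the role of the updated tumor clinical information that is forwarded to the secondary tumor diagnosis knowledge base, while the full result can be kept as the record of the update path.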
The updated tumor clinical information includes but is not limited to the mutation description, clinical information and clinical evidence grade information. The mutation description includes but is not limited to the gene name, transcript name, exon, codon and mutation type corresponding to the mutation; the clinical information includes but is not limited to available drugs, tumor typing, clinical diagnosis and treatment description (pharmacological and pathological description), prognosis and metastasis; the clinical evidence grade information includes but is not limited to the evidence level, evidence type, evidence grade and clinical significance classification assigned according to indexes such as the source of the clinical information and the clinical stage it comes from (preclinical, mid-clinical, late clinical and so on).
The secondary tumor diagnosis knowledge base receives the updated tumor clinical information sent by the initial tumor diagnosis knowledge base. Updated content concerning non-DNA-level alterations (including but not limited to methylation changes, RNA-level mutations, structural variations and protein expression changes) is removed. For the remaining content, each mutation description is parsed, according to the HGVS naming rules, into explicit mutation site information having hierarchical inclusion relations and explicit mutation type information having hierarchical inclusion relations. Mutation sites include but are not limited to gene names, transcript names, exons and codons. Mutation types include but are not limited to insertion, deletion, insertion-deletion (indel), missense mutation, frameshift mutation, synonymous mutation, nonsense frameshift mutation, splice mutation, any mutation and other mutations; these types have inclusion relations, for example frameshift mutations include nonsense frameshift mutations, nonsense mutations include nonsense frameshift mutations, and insertions include indels. The tumor typing in the clinical information is parsed into a tumor diagnosis typing tree, which includes but is not limited to tumor types parsed according to the tumor typing rules provided by the World Health Organization and related by superordinate and subordinate membership, each tumor type having an internal code. Finally, the names of identical genes and of identical drugs are normalized, generating the secondary tumor diagnosis knowledge base.
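The inclusion relations between mutation types can be represented as a small directed graph in which a type may have more than one containing type. The sketch below encodes only the three relations given as examples in the text; treating "any" as a type that contains every other type is an assumption of this sketch, and all names are hypothetical.

```python
from typing import Dict, Set

# Containing (superordinate) types for each mutation type.
CONTAINING_TYPES: Dict[str, Set[str]] = {
    "nonsense frameshift": {"frameshift", "nonsense"},
    "indel": {"insertion"},
    "frameshift": {"any"},
    "nonsense": {"any"},
    "insertion": {"any"},
    "deletion": {"any"},
    "missense": {"any"},
}


def type_and_containers(mutation_type: str) -> Set[str]:
    """Return the type itself plus every type that contains it."""
    seen: Set[str] = set()
    stack = [mutation_type]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(CONTAINING_TYPES.get(current, ()))
    return seen


def entry_applies(entry_type: str, query_type: str) -> bool:
    """A knowledge base entry applies when its mutation type is the same as,
    or contains, the mutation type of the queried mutation."""
    return entry_type in type_and_containers(query_type)
```

With this table, an entry recorded for "frameshift" is returned for a queried nonsense frameshift mutation, whereas an entry recorded for "nonsense frameshift" is not returned for a query that is only known to be a frameshift.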
The same gene name is normalized as follows: the differently named forms of the same gene are matched in the HGNC (HUGO Gene Nomenclature Committee) database, and all forms are converted to the name used in that database.
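Gene name normalization reduces to a lookup from any known alias to one approved symbol. The sketch below uses a tiny hand-written table for illustration only; in practice the table would be built from the nomenclature database, and the table contents, function name and case-insensitive matching are assumptions of this sketch.

```python
from typing import Dict

# Illustrative alias table: alias or previous symbol -> approved symbol.
ALIAS_TO_APPROVED: Dict[str, str] = {
    "HER2": "ERBB2",
    "NEU": "ERBB2",
    "ERBB2": "ERBB2",
    "MLL": "KMT2A",
    "KMT2A": "KMT2A",
}


def normalize_gene_name(name: str, alias_table: Dict[str, str]) -> str:
    """Map a gene name to its approved symbol; names that are not in the
    table are returned unchanged apart from surrounding whitespace."""
    cleaned = name.strip()
    return alias_table.get(cleaned.upper(), cleaned)
```

Returning unknown names unchanged keeps entries for unlisted genes searchable instead of silently dropping them.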
The same drug name is normalized as follows: the differently named forms of the same drug are matched in a compound information database, the name that occurs most frequently in that database is obtained, and all forms are converted to that name. Info, PubChem, and ChEMBL.
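Choosing the most frequent name within each group of synonyms can be sketched as below. The inputs are assumptions of this sketch: `synonym_groups` stands for the sets of names that a compound database lists for one drug, and `mentions` stands for the occurrences of names counted in that database. Breaking ties alphabetically is an added convention, not something the text specifies.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_drug_name_map(
    synonym_groups: Iterable[List[str]],
    mentions: Iterable[str],
) -> Dict[str, str]:
    """Map every synonym (lower-cased) to the most frequently occurring
    name of its group; ties are resolved alphabetically."""
    counts = Counter(name.lower() for name in mentions)
    mapping: Dict[str, str] = {}
    for group in synonym_groups:
        names = sorted(name.lower() for name in group)
        # max() keeps the first maximal element, i.e. the alphabetically first.
        preferred = max(names, key=lambda n: counts[n])
        for name in names:
            mapping[name] = preferred
    return mapping
```

A lookup such as `mapping["gleevec"]` then yields the single name used throughout the knowledge base for that drug.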
Mutation description parsing and tumor typing parsing in the secondary tumor diagnosis knowledge base make matching straightforward. After a mutation description is input and parsed into all mutation site information and all mutation type information having hierarchical inclusion relations, all tumor clinical information whose mutation site and mutation type are the same as, or contain, the parsed ones is matched from the secondary tumor diagnosis knowledge base and displayed or output. Then, according to the code of the input tumor typing, entries whose tumor typing is the same as the input tumor typing, or is a superordinate type containing it, keep their evidence grade, and the evidence grade of the other entries is adjusted downward. If the evidence grade is A or B, the mutation is classified as grade I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as grade II (potential clinical significance). Doctors and researchers can then make the final diagnosis in combination with the individual's actual condition, medical history and other diagnosis and treatment information.
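The evidence grade adjustment and the mapping from evidence grade to mutation grade can be sketched with a parent table for the tumor typing tree. Several points are assumptions of this sketch: the small tree is illustrative and is not the World Health Organization classification, tumor types are identified by name rather than by internal code, and a downward adjustment is taken to mean one step (capped at D), because the text does not state the size of the adjustment.

```python
from typing import Dict, Optional

# Illustrative typing tree: tumor type -> superordinate tumor type.
PARENT_TYPE: Dict[str, Optional[str]] = {
    "lung adenocarcinoma": "non-small cell lung cancer",
    "non-small cell lung cancer": "lung cancer",
    "lung cancer": None,
    "melanoma": None,
}
EVIDENCE_GRADES = ["A", "B", "C", "D"]


def is_same_or_superordinate(entry_type: str, patient_type: str,
                             parents: Dict[str, Optional[str]]) -> bool:
    """True if the entry's tumor type equals the patient's type or contains it."""
    current: Optional[str] = patient_type
    while current is not None:
        if current == entry_type:
            return True
        current = parents.get(current)
    return False


def adjusted_evidence_grade(grade: str, entry_type: str, patient_type: str,
                            parents: Dict[str, Optional[str]], step: int = 1) -> str:
    """Keep the grade for matching tumor types, otherwise lower it by `step`."""
    if is_same_or_superordinate(entry_type, patient_type, parents):
        return grade
    index = min(EVIDENCE_GRADES.index(grade) + step, len(EVIDENCE_GRADES) - 1)
    return EVIDENCE_GRADES[index]


def mutation_grade(evidence_grade: str) -> str:
    """Evidence grade A or B gives mutation grade I; C or D gives grade II."""
    return "I" if evidence_grade in ("A", "B") else "II"
```

Under these assumptions, grade B evidence recorded for melanoma would count as grade C for a lung adenocarcinoma patient, so the mutation would be classified as grade II rather than grade I.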
Example 2
In order to interpret tumor mutations promptly, comprehensively and accurately and to provide precision medicine services for individuals, the invention provides a tumor mutation analysis system. As shown in the full-flow schematic diagram of FIG. 2, a sample is interpreted with the system as follows:
In use, the user inputs three kinds of information at the input port: the mutations contained in the tumor sample, the clinical typing of the tumor sample, and the reference genome used (hg19 or hg38). The storage format of the mutations is determined first: a single mutation entered directly is converted into the format required by VEP, whereas a file in VCF format can be read by VEP directly without conversion. Once the formats are unified, the mutation basic information annotation unit uses ENSEMBL-VEP (the mutation basic information annotation database) to annotate all mutations with basic information according to the reference sequence version provided by the user, including the population frequency of the mutation, protein function prediction, HGVS naming and mutation transcript information, and sends the HGVS naming information to the mutation transcript acquisition unit. The mutation transcript acquisition unit then extracts the gene name from the received HGVS name and searches the locus reference sequence genome database (LRG) for a clinically most relevant transcript of that gene. If one exists, it is used; if not, the longest of all published transcripts is selected, thereby determining the transcript with the highest clinical relevance for the mutation. The HGVS naming information corresponding to the clinically most relevant transcript is extracted and sent to the mutation tumor related information acquisition unit and the mutation tumor clinical information acquisition unit.
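Extracting the gene name from an HGVS name can be sketched with a regular expression. The sketch assumes the name is written in the form `reference(GENE):description`, for example `NM_004333.4(BRAF):c.1799T>A`; the text does not state which form the annotation output takes, so the pattern, the `ParsedHgvs` type and the function name are hypothetical. When the gene symbol is not part of the name, it would have to come from another annotation field.

```python
import re
from typing import NamedTuple, Optional


class ParsedHgvs(NamedTuple):
    reference: str          # reference sequence or transcript accession
    gene: Optional[str]     # gene symbol, if present in parentheses
    description: str        # variant description such as "c.1799T>A"


HGVS_PATTERN = re.compile(
    r"^(?P<reference>[^():\s]+)"           # reference sequence
    r"(?:\((?P<gene>[^()\s]+)\))?"         # optional gene symbol
    r":(?P<description>[cgmnopr]\.\S+)$"   # coordinate type and change
)


def parse_hgvs(name: str) -> ParsedHgvs:
    """Split an HGVS name into reference, optional gene symbol and description."""
    match = HGVS_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized HGVS name: {name!r}")
    return ParsedHgvs(match["reference"], match["gene"], match["description"])
```

The gene symbol obtained this way is the key used to look up a clinically most relevant transcript for the gene.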
When the transcript is established, the mutation tumor related information acquisition unit annotates the tumor related information to the tumor sample according to the HGVS naming information corresponding to the received clinically most relevant transcript, wherein the steps of the step of annotating the tumor related information comprise: tumor median mutation assessment, whether it is located in the tumor and its associated pathways, and whether it is a start-up mutation.
Finally, the mutation tumor clinical information acquisition unit uses the secondary tumor diagnosis knowledge base of FIG. 1. It parses the HGVS name corresponding to the clinically most relevant transcript into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and retrieves from the secondary tumor diagnosis knowledge base all tumor clinical information whose site is identical to, or contains, a parsed site and whose mutation type is identical to, or contains, a parsed mutation type.
Next, using the code of the tumor type entered for the individual, the unit selects from all the retrieved tumor clinical information the records whose tumor type is identical to the input tumor type or is a superordinate type containing it. The evidence grade of these records is left unchanged, whereas the evidence grade of all other tumor clinical information is lowered. If the evidence grade is A or B, the mutation is classified as Tier I (strong clinical significance); if the evidence grade is C or D, the mutation is classified as Tier II (potential clinical significance).
If no tumor clinical information can be matched from the parsed site information and mutation type, the following indicators are extracted from the information acquired by the mutation basic information annotation unit, the mutation transcript acquisition unit and the mutation tumor related information acquisition unit, and the clinical tumor relevance is graded: mutation frequency, population mutation frequency, tumor mutation frequency, germline mutation knowledge base, functional prediction for the protein affected by the mutation, and effect of the mutation on tumor pathways. The items counted are: the mutation frequency is close to 50% or 100%; the population frequency of the mutation is higher than 1%; the frequency of the mutation in tumors is lower than 1%; the mutation is absent from the germline mutation knowledge base; the mutation has no functional effect on the protein; the mutation lies outside tumor pathways and related pathways; and the mutation shows no prominent phenotype in known cell and animal experiments. If more than three of these items are present, the mutation is rated Tier IV, the highest-numbered tier; if one to three items are present, the mutation is rated Tier III (unknown clinical significance); if none of the items is present, the mutation is flagged as being of potential clinical significance for which no known clinical evidence has been observed.
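The tiering rules of the preceding paragraphs can be summarized in a short sketch. It is a reading of the text rather than the patented implementation: the function names and indicator keys are hypothetical, and each indicator is passed in as an already-decided boolean because the text does not define how close to 50% or 100% a mutation frequency must be.

```python
def tier_from_evidence(grade):
    """Tier for a mutation that matched a clinical record with this evidence grade."""
    if grade in ("A", "B"):
        return "I"   # strong clinical significance
    if grade in ("C", "D"):
        return "II"  # potential clinical significance
    raise ValueError(f"unknown evidence grade: {grade!r}")


def tier_without_evidence(indicators):
    """Tier for a mutation with no matching clinical record.

    indicators: mapping of indicator name -> True if the item is present.
    """
    hits = sum(1 for present in indicators.values() if present)
    if hits > 3:
        return "IV"
    if hits >= 1:
        return "III"  # unknown clinical significance
    return "potential clinical significance, no known clinical evidence"


# Hypothetical indicator values for one unmatched mutation.
example = {
    "mutation_frequency_near_50_or_100_percent": True,
    "population_frequency_above_1_percent": True,
    "tumor_frequency_below_1_percent": False,
    "absent_from_germline_knowledge_base": False,
    "no_functional_effect_on_protein": False,
    "outside_tumor_pathways": False,
    "no_phenotype_in_cell_or_animal_experiments": False,
}
example_tier = tier_without_evidence(example)  # two items present
```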
Once annotation, biomarker rating and interpretation of the significance of each mutation are complete, all the information is displayed to the user and can be exported and downloaded. Doctors and researchers can then make the final diagnosis by combining it with the individual's actual condition, medical history and other diagnostic and treatment information.
However, it should be understood that although the secondary tumor diagnosis knowledge base and the tumor mutation analysis system of the present invention can provide a personalized, comprehensive interpretation of each mutation of a patient, they cannot integrate the clinical meanings of the individual mutations into clinical guidance. This is not only because a patient may carry mutations with conflicting clinical meanings, but also because the system cannot evaluate the patient's medical history, current symptoms and other test results. Even if the same mutation is detected in two patients, the physician may arrive at completely different diagnoses and treatments for the reasons given above.

Claims (9)

1. A tumor mutation analysis system comprising:
an information acquisition unit, which acquires an individual's mutation information, reference sequence version and tumor type from the input port, sends the mutation information and the reference sequence version to the mutation basic information annotation unit, and sends the tumor type to the mutation tumor clinical information acquisition unit;
a mutation basic information annotation unit, which, according to the mutation information and the reference sequence version received from the information acquisition unit, acquires from a public resource for basic mutation annotation information including but not limited to: population frequency of the mutation, protein function prediction, HGVS name and mutated transcript information, and sends the HGVS name to the mutation transcript acquisition unit;
a mutation transcript acquisition unit, which extracts the gene name from the HGVS name received from the mutation basic information annotation unit and checks in the Locus Reference Genomic (LRG) database whether a clinically most relevant transcript exists for the gene carrying the mutation; if so, that transcript is used as the clinically most relevant transcript; if not, the longest of the mutated transcripts obtained from the mutation basic information annotation unit is selected as the clinically most relevant transcript; the unit then extracts the HGVS name corresponding to the clinically most relevant transcript and sends it to the mutation tumor related information acquisition unit and the mutation tumor clinical information acquisition unit;
a mutation tumor related information acquisition unit, which, according to the HGVS name corresponding to the clinically most relevant transcript received from the mutation transcript acquisition unit, acquires from a public resource of mutation tumor related information the frequency of the mutation in tumors, whether the mutation lies in a tumor pathway or a related pathway, and whether it is an initiating mutation;
a mutation tumor clinical information acquisition unit, which parses the HGVS name corresponding to the clinically most relevant transcript received from the mutation transcript acquisition unit into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and acquires from the secondary tumor diagnosis knowledge base all tumor clinical information whose mutation site is identical to, or contains, a parsed site and whose mutation type is identical to, or contains, a parsed mutation type;
then, according to the code of the individual's tumor type received from the information acquisition unit, selects from all the acquired tumor clinical information the records whose tumor type is identical to the input tumor type or is a superordinate type containing it, the evidence grade of these records remaining unchanged and the evidence grade of all other tumor clinical information being lowered;
if no tumor clinical information can be matched from the parsed mutation site information and mutation type information, extracts the following indicators from the information acquired by the mutation basic information annotation unit, the mutation transcript acquisition unit and the mutation tumor related information acquisition unit, and grades the clinical tumor relevance: mutation frequency, population mutation frequency, tumor mutation frequency, germline mutation knowledge base, functional prediction for the protein affected by the mutation, and effect of the mutation on tumor pathways;
and ranks, by strength of relevance to the individual, all tumor clinical information and all other information related to the individual, thereby defining an individual reference level with which doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, medical history and other diagnostic and treatment information.
2. A method for analyzing a tumor mutation, comprising the steps of:
S1, information acquisition: acquiring an individual's mutation information, reference sequence version and tumor type;
S2, basic mutation annotation: according to the acquired mutation information and reference sequence version, acquiring from a public resource for basic mutation annotation information including but not limited to: population frequency of the mutation, protein function prediction, HGVS name and mutated transcript information;
S3, mutation transcript acquisition: extracting the gene name from the HGVS name and checking in the Locus Reference Genomic (LRG) database whether a clinically most relevant transcript exists for the gene carrying the mutation; if so, using that transcript as the clinically most relevant transcript; if not, selecting the longest of the mutated transcripts obtained in step S2 as the clinically most relevant transcript; and extracting the HGVS name corresponding to the clinically most relevant transcript;
S4, acquisition of mutation tumor related information: according to the HGVS name corresponding to the clinically most relevant transcript, acquiring from a public resource of mutation tumor related information the frequency of the mutation in tumors, whether the mutation lies in a tumor pathway or a related pathway, and whether it is an initiating mutation;
S5, acquisition of mutation tumor clinical information: parsing the HGVS name corresponding to the clinically most relevant transcript into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and acquiring from the secondary tumor diagnosis knowledge base all tumor clinical information whose mutation site is identical to, or contains, a parsed site and whose mutation type is identical to, or contains, a parsed mutation type;
next, according to the code of the acquired tumor type of the individual, selecting from all the acquired tumor clinical information the records whose tumor type is identical to the input tumor type or is a superordinate type containing it, the evidence grade of these records remaining unchanged and the evidence grade of all other tumor clinical information being lowered;
if no tumor clinical information can be matched from the parsed mutation site information and mutation type information, extracting the following indicators from the information acquired in steps S2, S3 and S4 and grading the clinical tumor relevance: mutation frequency, population mutation frequency, tumor mutation frequency, germline mutation knowledge base, functional prediction for the protein affected by the mutation, and effect of the mutation on tumor pathways;
and ranking, by strength of relevance to the individual, all tumor clinical information and all other information related to the individual, thereby defining an individual reference level with which doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, medical history and other diagnostic and treatment information.
3. A device for executing the tumor mutation analysis method, comprising a memory for storing a program and a processor for executing the program so as to carry out the tumor mutation analysis method.
4. A computer-readable medium containing a program executable by a processor to carry out the tumor mutation analysis method of the present invention.
5. A secondary knowledge base of tumor diagnosis comprising:
an information acquisition unit, which tracks each public knowledge base of tumor clinical information, acquires its latest version and sends it to the initial tumor diagnosis knowledge base unit;
an initial tumor diagnosis knowledge base unit, which stores in the initial tumor diagnosis knowledge base the latest version of each public knowledge base of tumor clinical information received from the information acquisition unit, compares the content of the latest version with that of the historical version from the same source already stored in the initial tumor diagnosis knowledge base, captures and records the updated tumor clinical information, including but not limited to mutation description, clinical information and clinical evidence grade information, and sends the updated tumor clinical information to the secondary tumor diagnosis knowledge base unit;
a secondary tumor diagnosis knowledge base unit, which acquires the updated tumor clinical information from the initial tumor diagnosis knowledge base unit and parses, record by record, the mutation description in each piece of tumor clinical information into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy; parses the tumor types in the clinical information into a tumor diagnosis typing tree, which includes but is not limited to tumor types with superordinate and subordinate membership relations parsed according to the tumor classification rules provided by the World Health Organization, each tumor type having an internal code; and normalizes the different names of the same gene and the different names of the same drug;
an information retrieval output unit, which receives an individual's mutation description and tumor type as input, parses the input mutation description into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and matches from the secondary tumor diagnosis knowledge base, displays and outputs all tumor clinical information whose mutation site and mutation type are identical to the parsed ones or stand in a containment relation with them; and which, according to the code of the input tumor type, selects from all the acquired tumor clinical information the records whose tumor type is identical to the input tumor type or is a superordinate type containing it, the evidence grade of these records remaining unchanged and the evidence grade of all other tumor clinical information being lowered;
wherein the secondary tumor diagnosis knowledge base ranks all tumor clinical information related to the individual by strength of relevance to the individual, thereby defining an individual reference level with which doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, medical history and other diagnostic and treatment information.
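The tumor type matching performed by the information retrieval output unit can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration: the internal codes (`T1`, `T1.1`, ...), the `parent_of` mapping that stands in for the tumor diagnosis typing tree, and the choice to lower a non-matching record by exactly one evidence grade with a floor at D (the text only says that the grade is lowered).

```python
GRADES = ["A", "B", "C", "D"]  # evidence grades, strongest first


def ancestors(code, parent_of):
    """Return the code itself followed by all of its superordinate codes."""
    chain, seen = [], set()
    while code is not None and code not in seen:
        chain.append(code)
        seen.add(code)
        code = parent_of.get(code)
    return chain


def downgrade(grade, steps=1):
    """Lower an evidence grade by `steps` levels, never below D (assumed rule)."""
    return GRADES[min(GRADES.index(grade) + steps, len(GRADES) - 1)]


def adjust_evidence(records, input_code, parent_of):
    """Keep the grade of records for the input type or its ancestors; lower the rest."""
    keep = set(ancestors(input_code, parent_of))
    adjusted = []
    for rec in records:
        grade = rec["grade"] if rec["tumor_code"] in keep else downgrade(rec["grade"])
        adjusted.append({**rec, "grade": grade})
    return adjusted


# Hypothetical typing tree: T1 > T1.1 > T1.1.1, and an unrelated type T2.
parent_of = {"T1": None, "T1.1": "T1", "T1.1.1": "T1.1", "T2": None}
records = [
    {"id": "r1", "tumor_code": "T1.1", "grade": "A"},  # superordinate of the input
    {"id": "r2", "tumor_code": "T2", "grade": "A"},    # other tumor type
    {"id": "r3", "tumor_code": "T2", "grade": "D"},    # other type, already lowest
]
adjusted = adjust_evidence(records, "T1.1.1", parent_of)
```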
6. The secondary tumor diagnosis knowledge base as claimed in claim 5, wherein, in the initial tumor diagnosis knowledge base unit, the mutation description includes but is not limited to the gene name, transcript name, exon, codon and mutation type corresponding to the mutation; the clinical information includes but is not limited to available drugs, tumor type, description of clinical diagnosis and treatment, prognosis, and metastasis; and the clinical evidence grade information includes but is not limited to the evidence level, evidence type, evidence grade and clinical significance assigned according to the source of the clinical information and the various indicators of the stage of the clinical information;
in the secondary tumor diagnosis knowledge base unit, mutation sites include but are not limited to gene names, transcript names, exons and codons, and mutation types include but are not limited to insertions, deletions, indels, missense mutations, frameshift mutations, synonymous mutations, nonsense frameshift mutations, splice mutations, any mutation, and other mutations.
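The idea of parsing a mutation into every level of its containment hierarchy can be sketched as follows. This is a simplified illustration, not the claimed parser: it covers only the gene, exon and codon levels (the transcript level is omitted), it treats "any mutation" as the only type that contains other types, and the knowledge base is modelled as a plain dictionary with made-up record strings.

```python
from itertools import product

ANY_TYPE = "any mutation"  # the type level that contains every specific type


def site_levels(gene, exon=None, codon=None):
    """All sites that contain the given site, from the whole gene down to the codon."""
    levels = [(gene, None, None)]
    if exon is not None:
        levels.append((gene, exon, None))
    if codon is not None:
        levels.append((gene, exon, codon))
    return levels


def type_levels(mutation_type):
    """The general type followed by the specific one (if it is more specific)."""
    return [ANY_TYPE] if mutation_type == ANY_TYPE else [ANY_TYPE, mutation_type]


def lookup_keys(gene, exon, codon, mutation_type):
    """Every (site, type) combination under which a record may be stored."""
    return list(product(site_levels(gene, exon, codon), type_levels(mutation_type)))


def match_records(knowledge_base, gene, exon, codon, mutation_type):
    """Collect records stored at the same or a containing site and type."""
    hits = []
    for key in lookup_keys(gene, exon, codon, mutation_type):
        hits.extend(knowledge_base.get(key, []))
    return hits


# Illustrative knowledge base keyed by (site, type); the records are placeholders.
kb = {
    (("BRAF", None, None), ANY_TYPE): ["record for any BRAF mutation"],
    (("BRAF", 15, 600), "missense mutation"): ["record for a missense mutation at BRAF codon 600"],
    (("BRAF", 11, None), ANY_TYPE): ["record for BRAF exon 11 mutations"],
}
hits = match_records(kb, "BRAF", 15, 600, "missense mutation")
```

A missense mutation at codon 600 in exon 15 therefore retrieves the gene-level and codon-level records but not the exon 11 record, which belongs to a site that does not contain it.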
7. A method for constructing a secondary tumor diagnosis knowledge base is characterized by comprising the following steps:
S1, information acquisition: tracking each public knowledge base of tumor clinical information and acquiring its latest version;
S2, construction of an initial tumor diagnosis knowledge base: storing the latest version of each acquired public knowledge base of tumor clinical information to generate the initial tumor diagnosis knowledge base, comparing the content of the latest version with that of the historical version from the same source stored in the initial tumor diagnosis knowledge base, and capturing and recording the updated tumor clinical information, including but not limited to mutation description, clinical information and clinical evidence grade information;
S3, construction of a secondary tumor diagnosis knowledge base: parsing, record by record, the mutation description in each piece of updated tumor clinical information acquired from the initial tumor diagnosis knowledge base into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and parsing the tumor types in the clinical information into a tumor diagnosis typing tree, which includes but is not limited to all tumor types with superordinate and subordinate membership relations parsed according to the tumor classification rules provided by the World Health Organization, each tumor type having an internal code; and, at the same time, normalizing the different names of the same gene and the different names of the same drug, to generate the secondary tumor diagnosis knowledge base;
once the mutation descriptions and tumor types in the secondary tumor diagnosis knowledge base have been parsed, an input individual mutation description is parsed into all mutation site information at every level of the containment hierarchy and all mutation type information at every level of the containment hierarchy, and all tumor clinical information whose mutation site and mutation type are identical to the parsed ones or stand in a containment relation with them is matched from the secondary tumor diagnosis knowledge base and displayed; according to the code of the individual's input tumor type, the records whose tumor type is identical to the input tumor type or is a superordinate type containing it are selected from all the acquired tumor clinical information, the evidence grade of these records remaining unchanged and the evidence grade of all other tumor clinical information being lowered;
and all tumor clinical information related to the individual is ranked by strength of relevance to the individual, thereby defining an individual reference level with which doctors and researchers can make the final diagnosis and treatment decision in combination with the individual's actual condition, medical history and other diagnostic and treatment information.
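The version comparison in step S2 can be pictured with a minimal sketch. It assumes, purely for illustration, that each snapshot of a public knowledge base is available as a dictionary keyed by a record identifier; the identifiers, field names and drug names are hypothetical, and the handling of removed records is an addition that the claim does not mention.

```python
def diff_versions(old, new):
    """Compare two snapshots of one source and report what changed.

    old, new: mapping of record id -> record (a dict of fields).
    """
    added = {rid: rec for rid, rec in new.items() if rid not in old}
    changed = {rid: rec for rid, rec in new.items() if rid in old and old[rid] != rec}
    removed = {rid: rec for rid, rec in old.items() if rid not in new}
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical snapshots of the same public knowledge base.
old_version = {
    "rec1": {"mutation": "GENE1 exon 2 deletion", "drug": "drug A", "evidence": "B"},
    "rec2": {"mutation": "GENE2 any mutation", "drug": "drug B", "evidence": "C"},
}
new_version = {
    "rec1": {"mutation": "GENE1 exon 2 deletion", "drug": "drug A", "evidence": "A"},
    "rec3": {"mutation": "GENE3 codon 12 missense mutation", "drug": "drug C", "evidence": "D"},
}
updates = diff_versions(old_version, new_version)
# The added and changed records are what would be passed on to step S3.
to_parse = {**updates["added"], **updates["changed"]}
```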
8. A device for executing the method for constructing a secondary tumor diagnosis knowledge base, comprising a memory for storing a program and a processor for executing the program so as to carry out the method for constructing a secondary tumor diagnosis knowledge base.
9. A computer-readable medium containing a program executable by a processor to carry out the method for constructing a secondary tumor diagnosis knowledge base of the present invention.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011207776.2A CN112270960B (en) 2020-11-03 2020-11-03 Secondary tumor diagnosis knowledge base and tumor mutation analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011207776.2A CN112270960B (en) 2020-11-03 2020-11-03 Secondary tumor diagnosis knowledge base and tumor mutation analysis system

Publications (2)

Publication Number Publication Date
CN112270960A true CN112270960A (en) 2021-01-26
CN112270960B CN112270960B (en) 2023-06-06

Family

ID=74345712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011207776.2A Active CN112270960B (en) 2020-11-03 2020-11-03 Secondary tumor diagnosis knowledge base and tumor mutation analysis system

Country Status (1)

Country Link
CN (1) CN112270960B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735520A (en) * 2021-02-03 2021-04-30 深圳裕康医学检验实验室 Interpretation method, system and storage medium for tumor individualized immunotherapy gene detection result
CN113836931A (en) * 2021-11-24 2021-12-24 慧算医疗科技(上海)有限公司 Method, system and terminal for building cancer medication knowledge base based on domain ontology
CN118248214A (en) * 2024-02-01 2024-06-25 南方海洋科学与工程广东省实验室(珠海) High-confidence intestinal flora single amino acid mutation identification method and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013067001A1 (en) * 2011-10-31 2013-05-10 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US20160092631A1 (en) * 2014-01-14 2016-03-31 Omicia, Inc. Methods and systems for genome analysis
CN105956378A (en) * 2016-04-26 2016-09-21 成都聚恒康科技有限公司 Clinical decision supporting system of tumor diagnosis and treatment
CN109686456A (en) * 2018-12-26 2019-04-26 博奥生物集团有限公司 A kind of accurate medication interpretation system and method for tumour
CN111833962A (en) * 2020-06-16 2020-10-27 荣联科技集团股份有限公司 Tumor medication interpretation database and construction method and device thereof

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735520A (en) * 2021-02-03 2021-04-30 深圳裕康医学检验实验室 Interpretation method, system and storage medium for tumor individualized immunotherapy gene detection result
CN112735520B (en) * 2021-02-03 2021-07-20 深圳裕康医学检验实验室 Interpretation method, system and storage medium for tumor individualized immunotherapy gene detection result
CN113836931A (en) * 2021-11-24 2021-12-24 慧算医疗科技(上海)有限公司 Method, system and terminal for building cancer medication knowledge base based on domain ontology
CN113836931B (en) * 2021-11-24 2022-03-08 慧算医疗科技(上海)有限公司 Method, system and terminal for building cancer medication knowledge base based on domain ontology
CN118248214A (en) * 2024-02-01 2024-06-25 南方海洋科学与工程广东省实验室(珠海) High-confidence intestinal flora single amino acid mutation identification method and application thereof

Also Published As

Publication number Publication date
CN112270960B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN112270960B (en) Secondary tumor diagnosis knowledge base and tumor mutation analysis system
CN103975328B (en) The clinical related information for clinical decision support is extracted from patient's sequencing data retrospective
Waters Systems toxicology and the Chemical Effects in Biological Systems (CEBS) knowledge base
CN109686439B (en) Data analysis method, system and storage medium for genetic disease gene detection
US8175816B2 (en) System and method for analyzing metabolomic data
CN1385702A (en) Method for supply clinical diagnosis
US20060184489A1 (en) Genetic knowledgebase creation for personalized analysis of medical conditions
CN112466463B (en) Intelligent answering system based on tumor accurate diagnosis and treatment knowledge graph
JP2009505231A (en) System, method, and computer program for comparing and editing metabolite data obtained from a plurality of samples using a computer system database
Han et al. Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing
Callahan et al. Ontologizing health systems data at scale: making translational discovery a reality
CN110111844A (en) A kind of gene data interpretation annotation system
Ruau et al. Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets
CN116721699A (en) Intelligent recommendation method based on tumor gene detection result
Grewal et al. Analysis of expression data: an overview
KR20230102240A (en) Multidimensional omics data transformation system and method therefor
CN114566221A (en) Automatic analysis and interpretation system for NGS data of genetic diseases
Cristiano et al. Methods and techniques for miRNA data analysis
CN111243661A (en) Gene physical examination system based on gene data
CN114882943B (en) Method and device for analyzing somatic cell variation
Layer et al. Mining thousands of genomes to classify somatic and pathogenic structural variants
Reches et al. From phenotyping to genotyping-bioinformatics for the busy clinician
De Filippis et al. Computational strategies in nutrigenetics: constructing a reference dataset of nutrition-associated genetic polymorphisms
KR102483880B1 (en) disease profiling information providing system based on multiple database information and method therefor
US11978531B2 (en) Method for monitoring and management of cell lines using periodic low-coverage DNA sequencing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 312000 block a, Kechuang building, No. 586, West Ring Road, Qixian street, Keqiao District, Shaoxing City, Zhejiang Province
Applicant after: Zhejiang Shaoxing Dingjing Biomedical Technology Co.,Ltd.
Address before: 200000 floor 5, No.3, Lane 118, Furonghua Road, Pudong New Area, Shanghai
Applicant before: SHANGHAI TOPGEN BIOMEDICAL TECHNOLOGY CO.,LTD.
GR01 Patent grant