CN114927191A - Interpretation method for NGS report of blood system disease - Google Patents


Info

Publication number
CN114927191A
Authority
CN
China
Prior art keywords
variation
information
report
evidence
diseases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210385521.8A
Other languages
Chinese (zh)
Other versions
CN114927191B (en)
Inventor
付海阔
王奇隆
舒金才
陈金雄
尚华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gaolingzhiteng Information Technology Co ltd
Original Assignee
Beijing Gaolingzhiteng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gaolingzhiteng Information Technology Co ltd
Priority to CN202210385521.8A
Publication of CN114927191A
Application granted
Publication of CN114927191B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method for interpreting NGS reports for hematological diseases, belongs to the field of bioinformatics interpretation, and comprises the following steps: S1: uploading a VCF file containing the result data of the bioinformatics analysis; S2: determining whether the uploaded file is an annotated file, and if so, executing step S4, otherwise executing step S3; S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering accordingly, and generating an annotated file; S4: reading the variant site information in the annotated file and storing it in structured form; S5: matching the variant site information against a knowledge base; S6: acquiring report data; S7: acquiring a report template according to the submitting institution; S8: generating and exporting a report. The invention enables convenient data management and rapid report output, helps users manage genetic data intelligently, and improves both the capability and the efficiency of report interpretation.

Description

Interpretation method for NGS report of blood system disease
Technical Field
The invention belongs to the field of bioinformatics interpretation and relates to a method for interpreting NGS reports for hematological diseases.
Background
High-throughput sequencing, also called next-generation sequencing (NGS), can sequence hundreds of thousands to millions of DNA molecules in parallel in a single run, generally with short read lengths, and the complete sequence information is assembled from many short DNA fragments. NGS bioinformatics analysis can be divided into three levels: primary analysis, which converts raw off-instrument data (BCL format) into readable data (VCF format); secondary analysis, which annotates and filters the sites in the VCF data; and tertiary analysis, which interprets the clinical significance of the variant gene sites in light of the patient's clinical diagnosis and treatment. Report interpretation is the last and most important link. Interpreting an NGS report requires searching a large number of databases and specialist publications, and faces large data volumes, complex operations and difficult queries; fully manual interpretation of a single tumor report takes about 6 hours.
Disclosure of Invention
In view of the above, the present invention provides a method for interpreting a hematological disease NGS report.
In order to achieve the purpose, the invention provides the following technical scheme:
a method for interpreting a hematological disease NGS report, comprising the following steps:
S1: uploading a VCF file containing the result data of the bioinformatics analysis;
S2: determining whether the uploaded file is an annotated file, and if so, executing step S4, otherwise executing step S3;
S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering accordingly, and generating an annotated file;
S4: reading the variant site information in the annotated file and storing it in structured form;
S5: matching the variant site information against a knowledge base;
S6: acquiring report data;
S7: acquiring a report template according to the submitting institution;
S8: generating and exporting a report.
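Steps S2 and S4 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an annotated VCF can be recognized by an `ANN` or `CSQ` INFO header line; the patent does not say how the check is made. The function names `is_annotated` and `parse_records` and the record layout are hypothetical.

```python
# Assumed markers of an annotated VCF; not specified by the patent.
ANNOTATION_KEYS = ("ANN", "CSQ")


def is_annotated(vcf_text: str) -> bool:
    """S2: report whether any INFO header line declares an annotation field."""
    for line in vcf_text.splitlines():
        if not line.startswith("##"):
            break  # meta-information lines end here
        if line.startswith("##INFO=<ID="):
            field_id = line[len("##INFO=<ID="):].split(",", 1)[0]
            if field_id in ANNOTATION_KEYS:
                return True
    return False


def parse_records(vcf_text: str) -> list[dict]:
    """S4: turn each data line into a structured record (first 8 VCF columns)."""
    records = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        info_map = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value  # flag-style keys get an empty string
        records.append(
            {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt, "info": info_map}
        )
    return records
```

A file for which `is_annotated` returns False would first go through the filtering and annotation of step S3 before `parse_records` is applied.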
Further, the knowledge base comprises the following structural information:
gene: records information on human genes, including basic information, description information and other related information, wherein the basic information includes the gene name, gene position, gene type and common transcripts, and the other related information includes the protein domains of the gene, related diseases and evidence;
variation: divided into parent variations and sub-variations;
parent variation: integrates and summarizes information on a general class of variation and assigns it a variation grade; the sub-variations of that class are associated with the parent variation; the variation descriptions of the parent variation are summarized; variation summary information is compiled for each disease related to the parent variation; and evidence summary information is compiled by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
sub-variation: comprises several types, wherein the details of each sub-variation include basic information, variation description information and grading information for the variation, and the related parent variation is associated; variation summary information is compiled for each disease related to the sub-variation, and evidence summary information is compiled by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
evidence: comprises approved medication, guidelines, clinical trials and literature evidence, wherein, through interpretation of the different types of literature, the genes, variations and diseases addressed by each publication are summarized, together with the evidence type, evidence relationship and evidence grade;
diseases: records clinical hematological diseases and arranges them in a graded hierarchy.
Further, the sub-variant types include:
SNP/InDel: single base variation, and insertion and deletion of small fragments;
fusion: fusion gene, two genes spliced together;
CNV: copy number variation, duplication of large fragments;
SV: other structural variations.
Further, the matching of the mutation site information in the knowledge base in step S5 specifically includes:
determining, for each piece of site information, whether it is a frameshift hotspot variation, and if so, correcting it with TransVar; then matching the site against the knowledge base by gene, transcript and p. (protein-level) notation, wherein if several knowledge base variant sites are matched, the one whose c. (coding DNA) notation is empty is preferred; if no site is matched, matching by gene, transcript and c. notation; and if variant site information is matched, returning the interpretation information for that variant site;
if the unmatched variation is of a frameshift or hotspot type, matching a parent variation through the variant site; if the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is treated as a clinically significant variant site, is automatically graded and preliminarily interpreted according to the related information, and is recorded in the knowledge base and marked to await interpretation by an expert;
and if no variant site information is matched, entering a manual interpretation stage as appropriate, after which the report interpreter confirms the sites to be reported and the report is generated.
Further, the details of interpretation are as follows:
according to the grading of the variant sites, the variations with clinical significance and potential clinical significance are identified; according to the position of the patient's clinically diagnosed disease in the disease tree, the multiple pieces of evidence under the same variation, covering dimensions such as targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism, are weighted and scored, and the optimal interpretation data are selected;
evidence weight scoring: every piece of evidence for a variation has three dimensions, namely evidence type, evidence label and related disease; during interpretation the three dimensions are scored according to the sample information and the report purpose, and the evidence with the lowest score is the best interpretation evidence;
evidence type: includes approved medication, guidelines, clinical trials and literature, and is scored against the report purpose;
evidence label: includes treatment, prognosis, diagnosis, risk, clinical characteristics and population distribution, and is likewise scored against the report purpose;
related disease: scored according to the hierarchical relation, in the disease tree, between the patient's diagnosed disease in the sample information and the disease of the evidence, wherein the rule is as follows: the diagnosed node itself and its child nodes have the highest priority and score 1; sibling nodes of the diagnosed node and the child nodes below them score 1.5; the parent node scores 2; sibling nodes of the parent node and the child nodes below them score 2.5; and all evidence is scored step by step according to this logic;
after the scores of the three dimensions are obtained, they are multiplied as required to obtain the best matching evidence, or the evidence is grouped and sorted by the three dimensions, and the best interpretation data are selected to generate the report.
The invention has the following beneficial effects: by storing the bioinformatics analysis results in structured form, the invention makes it convenient for users to interpret site data more accurately and from multiple dimensions; detailed interpretation information can be queried quickly by gene site, together with related information such as functional impact, diseases and drugs; each piece of information can be traced back to the original text of its source, and the interpretation content can be updated promptly according to the latest research; interpreted content gradually accumulates into a knowledge base, so that after a user uploads NGS data the system automatically matches it against the knowledge base, marks the gene sites that have already been interpreted, and helps the user screen sites. Each piece of interpretation information can be retrieved by multiple dimensions such as information source, gene variation type, disease and drug. By selecting the gene sites to be written into the report, a user can generate the report with one click, and the whole process from uploading sequencing data to outputting the report can be completed in a few minutes.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a general flowchart of a method for interpreting a hematological disease NGS report;
FIG. 2 is a diagram of a knowledge base architecture;
FIG. 3 is an automatic + manual matching flow chart;
FIG. 4 is an automatic + manual matching implementation;
FIG. 5 is a chart of specific disease categories given in the examples.
Detailed Description
The following embodiments of the present invention are provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and embodiments may be combined with each other without conflict.
The drawings are for illustration only and are not intended to limit the invention; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not intended to indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present invention, and the specific meaning of the terms described above will be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1, the method for interpreting a hematological disease NGS report provided by the present invention includes the following steps:
first, uploading a VCF file;
second, determining whether the uploaded file is an annotated file, and if so, executing the fourth step, otherwise executing the third step;
third, obtaining the filtering conditions of the test item corresponding to the sample, filtering accordingly, and generating an annotated file;
fourth, reading the variant site information in the file and storing it in structured form;
fifth, matching the variant site information against the knowledge base;
sixth, acquiring report data;
seventh, acquiring a report template according to the submitting institution;
eighth, generating and exporting a report.
The structure of the knowledge base is shown in fig. 2 and is divided into the following parts: gene, variation, evidence, literature, disease.
Gene: mainly records information on human genes, including basic information such as gene name, gene position, gene type and common transcripts; a gene description written by the interpreter; the descriptions of the gene on related websites such as GeneCards, OMIM and UniProt; and related information such as the protein domains of the gene, related diseases and evidence.
Variation: divided into two major categories, parent variation and sub-variation.
Parent variation: integrates and summarizes information on a common class of variation and assigns it a variation grade; the sub-variations of that class are associated with the parent variation; the variation descriptions of the parent variation are summarized; variation summary information is compiled for each disease related to the parent variation; and evidence summary information is compiled by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism.
Sub-variation: the main categories are SNP/InDel (single base variation, and insertions and deletions of small fragments), Fusion (fusion gene, two genes spliced together), CNV (copy number variation, duplication of large fragments) and SV (other structural variations).
The details of each sub-variation mainly comprise basic information, variation description information and grading information for the variation, and the related parent variation is associated; variation summary information is compiled for each disease related to the sub-variation, and evidence summary information is compiled by evidence category, such as targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism.
Evidence: includes evidence types such as approved medication, guidelines, clinical trials and literature; through interpretation of these different types of literature, the genes, variations and diseases addressed by each publication are summarized, together with the evidence type, evidence relationship and evidence grade.
Diseases: clinical blood-related diseases are recorded and graded according to the diseases.
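The knowledge base structure described above can be sketched as a set of Python dataclasses. This is only an illustrative model under stated assumptions: every class name, field name and type below is hypothetical, and the patent does not prescribe a storage schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SubVariantType(Enum):
    """The four sub-variation categories named in the description."""
    SNP_INDEL = "SNP/InDel"
    FUSION = "Fusion"
    CNV = "CNV"
    SV = "SV"


@dataclass
class Gene:
    name: str
    position: str
    gene_type: str
    common_transcript: str
    description: str = ""


@dataclass
class Evidence:
    evidence_type: str  # approved medication, guideline, clinical trial, literature
    label: str          # treatment, prognosis, diagnosis, risk, ...
    disease: str        # node in the disease tree
    grade: str = ""


@dataclass
class ParentVariant:
    gene: str
    name: str
    grade: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class SubVariant:
    gene: str
    transcript: str
    variant_type: SubVariantType
    p_change: Optional[str] = None  # protein-level (p.) notation
    c_change: Optional[str] = None  # coding DNA (c.) notation
    grade: str = ""
    parent: Optional[ParentVariant] = None  # link to the associated parent variation
    evidence: list[Evidence] = field(default_factory=list)
```

Keeping the parent link on the sub-variation mirrors the description, in which each sub-variation is associated with its parent variation while carrying its own evidence.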
The specific steps of knowledge base matching and interpretation are shown in FIGS. 3-4. The patient's VCF file is first filtered according to the test item, imported into the system and formatted. Each piece of site information is then checked for a frameshift hotspot variation, and if one is present it is corrected with TransVar. Site matching against the knowledge base follows: matching is first performed by gene, transcript and p. (protein-level) notation, and if several knowledge base variant sites are matched, the one whose c. (coding DNA) notation is empty is preferred; if no site is matched, matching is performed by gene, transcript and c. notation; and if variant site information is matched, the interpretation information for that variant site is returned.
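The two-stage lookup can be sketched in Python as follows. The sketch assumes each site and each knowledge base entry is a dictionary with `gene`, `transcript`, `p` and `c` keys; these key names and the function `match_site` are hypothetical, and the TransVar correction step is deliberately left out.

```python
from typing import Optional


def match_site(site: dict, kb: list[dict]) -> Optional[dict]:
    """Match by gene + transcript + p. notation, then fall back to c. notation."""
    same_tx = [
        e for e in kb
        if e["gene"] == site["gene"] and e["transcript"] == site["transcript"]
    ]
    if site.get("p"):
        by_p = [e for e in same_tx if e.get("p") == site["p"]]
        if by_p:
            # Several hits: prefer the entry whose c. notation is empty.
            empty_c = [e for e in by_p if not e.get("c")]
            return (empty_c or by_p)[0]
    if site.get("c"):
        by_c = [e for e in same_tx if e.get("c") == site["c"]]
        if by_c:
            return by_c[0]
    return None  # unmatched: handled by the fallback flow or manual interpretation
```

Preferring the entry with an empty c. notation means a protein-level entry that covers every coding change with the same protein consequence wins over an entry tied to one specific coding change.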
Because the variant sites associated with hematological diseases are highly heterogeneous, sites that cannot be matched are unavoidable even with a very complete knowledge base, so a process for automatically discovering variant sites is provided. When the unmatched variation is of a frameshift or hotspot type, a parent variation is matched through the variant site; if the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is treated as a clinically significant variant site, is automatically graded and preliminarily interpreted according to the related information, and is recorded in the knowledge base and marked to await more detailed interpretation by an expert.
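A minimal Python sketch of this fallback is given below. It assumes, as a simplification not stated in the patent, that a parent variation applies to a site when the gene and the variation class (`effect`) agree; the field names, the `pending_expert_review` status and the function `discover_site` are all hypothetical.

```python
from typing import Optional


def discover_site(site: dict, parents: list[dict], kb: list[dict]) -> Optional[dict]:
    """Fallback for a site that found no direct knowledge base match."""
    if site["effect"] not in ("frameshift", "hotspot"):
        return None  # only frameshift and hotspot types use this fallback
    for parent in parents:
        same_class = (
            parent["gene"] == site["gene"] and parent["effect"] == site["effect"]
        )
        if same_class and parent["clinically_significant"]:
            entry = dict(site)
            entry["grade"] = parent["grade"]   # preliminary grade taken from the parent
            entry["parent"] = parent["name"]
            entry["status"] = "pending_expert_review"  # marked for expert interpretation
            kb.append(entry)  # record the newly discovered site
            return entry
    return None
```

A `None` result corresponds to the case that proceeds to manual interpretation.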
If no variant site information is matched, a manual interpretation stage is entered as appropriate; finally, the report interpreter confirms the sites to be reported and the report is generated.
Interpretation: first, according to the grading of the variant sites, the variations with clinical significance and potential clinical significance are identified. Second, according to the position of the patient's clinically diagnosed disease in the disease tree, the multiple pieces of evidence under the same variation, covering dimensions such as targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism, are weighted and scored, and the optimal interpretation data are selected.
Evidence weight scoring: every piece of evidence for a variation has an evidence type (approved medication, guideline, clinical trial or literature), an evidence label (treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and so on), and an associated standard disease. During interpretation, these three dimensions are scored according to the sample information and the report purpose, and the evidence with the lowest score is the best interpretation evidence.
(1) Evidence type: scored against the report purpose; if the report purpose is treatment, evidence of the approved medication and clinical trial types scores 0.5, and all other evidence scores the default of 1.
(2) Evidence label: scored in the same way as the evidence type; if the report purpose is diagnosis, evidence carrying related labels such as prognosis, diagnosis and risk scores between 0.5 and 1, and all other evidence scores 1.
(3) Related disease: scored according to the hierarchical relation, in the disease tree, between the patient's diagnosed disease in the sample information and the disease of the evidence. The rule is that the diagnosed node itself and its child nodes have the highest priority and score 1; sibling nodes of the diagnosed node and the child nodes below them score 1.5; the parent node scores 2; sibling nodes of the parent node and the child nodes below them score 2.5; and all evidence is scored step by step according to this logic.
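The related-disease rule in (3) can be sketched in Python as below. The disease tree is assumed to be stored as a child-to-parent mapping, and the toy node names A to G are placeholders rather than the categories of FIG. 5. Scores beyond 2.5 (for example 3 for the grandparent) are an extrapolation of the "step by step" wording, not values given in the text.

```python
from typing import Optional


def ancestors(node: str, parent_of: dict) -> list[str]:
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def disease_score(diagnosis: str, evidence_disease: str, parent_of: dict) -> Optional[float]:
    """Score an evidence disease relative to the diagnosed disease (lower is better)."""
    evidence_chain = set(ancestors(evidence_disease, parent_of))
    for k, anc in enumerate(ancestors(diagnosis, parent_of)):
        if anc in evidence_chain:
            # anc is the nearest ancestor-or-self of the diagnosis that
            # also contains the evidence disease in its subtree.
            if k == 0:
                return 1.0       # the diagnosed node itself or one of its descendants
            if anc == evidence_disease:
                return k + 1.0   # parent = 2, grandparent = 3 (extrapolated), ...
            return k + 0.5       # sibling branch at that level: 1.5, 2.5, ...
    return None                  # the two diseases share no common ancestor


# Toy tree: A is the root; B and C are its children; D and E are under B;
# F is under D; G is under C.
PARENT_OF = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "D", "G": "C"}
```

With a diagnosis of `D`, evidence attached to `F` scores 1, `E` scores 1.5, `B` scores 2, and `C` or `G` scores 2.5.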
As shown in FIG. 5, in this example, when the patient was diagnosed with mature B-cell lymphoma, the following scores were made as shown in Table 1 below:
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced here.]
Finally, once the scores of the three dimensions are obtained, they can be multiplied as required to obtain the best matching evidence, or the evidence can be grouped and sorted by the three dimensions; the best interpretation data are then selected to generate the report, so that a clinician can quickly understand the details of the patient's disease.
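The multiplication variant of this final step can be sketched in Python as follows. The type rule uses the 0.5 and 1 values given in the description; for the label rule the text only gives a range of 0.5 to 1, so the `related_score` default of 0.5 is an assumption, as are the dictionary keys and function names. The related-disease score is taken as already computed.

```python
def type_score(evidence_type: str, purpose: str) -> float:
    """Evidence-type dimension: 0.5 for treatment-relevant types, else 1."""
    if purpose == "treatment" and evidence_type in ("approved medication", "clinical trial"):
        return 0.5
    return 1.0


def label_score(label: str, purpose: str, related_score: float = 0.5) -> float:
    """Evidence-label dimension; related_score is an assumed value in [0.5, 1]."""
    if purpose == "diagnosis" and label in ("prognosis", "diagnosis", "risk"):
        return related_score
    return 1.0


def total_score(evidence: dict, purpose: str) -> float:
    """Multiply the three dimension scores; lower means a better match."""
    return (
        type_score(evidence["type"], purpose)
        * label_score(evidence["label"], purpose)
        * evidence["disease_score"]
    )


def best_evidence(evidence_list: list[dict], purpose: str) -> dict:
    """Return the evidence item with the lowest combined score."""
    return min(evidence_list, key=lambda ev: total_score(ev, purpose))
```

Because the lowest product wins, a strongly preferred evidence type can outweigh a slightly more distant disease node, which is one reason the text also allows grouping and sorting by dimension instead of multiplying.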
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (5)

1. A method for interpreting a hematological disease NGS report, characterized by comprising the following steps:
S1: uploading a VCF file containing the result data of the bioinformatics analysis;
S2: determining whether the uploaded file is an annotated file, and if so, executing step S4, otherwise executing step S3;
S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering accordingly, and generating an annotated file;
S4: reading the variant site information in the annotated file and storing it in structured form;
S5: matching the variant site information against a knowledge base;
S6: acquiring report data;
S7: acquiring a report template according to the submitting institution;
S8: generating and exporting a report.
2. The method for interpreting a hematological disease NGS report according to claim 1, wherein: the knowledge base comprises the following structural information:
gene: records information on human genes, including basic information, description information and other related information, wherein the basic information includes the gene name, gene position, gene type and common transcripts, and the other related information includes the protein domains of the gene, related diseases and evidence;
variation: divided into parent variations and sub-variations;
parent variation: integrates and summarizes information on a general class of variation and assigns it a variation grade; the sub-variations of that class are associated with the parent variation; the variation descriptions of the parent variation are summarized; variation summary information is compiled for each disease related to the parent variation; and evidence summary information is compiled by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
sub-variation: comprises several types, wherein the details of each sub-variation include basic information, variation description information and grading information for the variation, and the related parent variation is associated; variation summary information is compiled for each disease related to the sub-variation, and evidence summary information is compiled by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
evidence: comprises approved medication, guidelines, clinical trials and literature evidence, wherein, through interpretation of the different types of literature, the genes, variations and diseases addressed by each publication are summarized, together with the evidence type, evidence relationship and evidence grade;
diseases: records clinical hematological diseases and arranges them in a graded hierarchy.
3. The method for interpreting a hematological disease NGS report according to claim 1, wherein: the sub-variant types include:
SNP/InDel: single base variation, and insertion and deletion of small fragments;
fusion: fusion genes, two genes spliced together;
CNV: copy number variation, duplication of large fragments;
SV: other structural variations.
4. The method for interpreting a hematological disease NGS report according to claim 1, wherein: matching the variation site information against the knowledge base in step S5 specifically includes:
for each piece of site information, judging whether it is a frameshift hotspot variation and, if so, correcting it with TransVar; then matching against the site knowledge base, first by gene, transcript and p. (protein-level) notation; if several knowledge-base variation sites are matched, preferring the variation site whose c. (coding DNA) notation is empty; if no variation site is matched, matching by gene, transcript and c. notation; and if variation site information is matched, returning the interpretation information of that variation site;
if the unmatched variation is of a frameshift or hotspot type, matching a parent variation through the variation site; if the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is taken to be a clinically significant variation site, is automatically graded and preliminarily interpreted from the related information, and is recorded in the knowledge base with a mark, pending interpretation by an expert;
and if no variation site information is matched, entering a manual interpretation stage as appropriate, after which the report interpreter confirms the sites to be reported and the report is generated.
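The two-pass site lookup described in this claim can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `KbSite` record, the `site` dictionary keys and the `normalize` callable are hypothetical, and `normalize` merely stands in for the TransVar correction step, whose invocation is not modeled here. The parent-variation fallback and the manual stage are also left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class KbSite:
    """Hypothetical knowledge-base record for one variation site."""
    gene: str
    transcript: str
    p: str  # protein-level (p.) notation, "" if absent
    c: str  # coding-DNA (c.) notation, "" if absent
    interpretation: str


def match_site(site: dict, kb: List[KbSite],
               normalize: Callable[[dict], dict] = lambda s: s
               ) -> Optional[KbSite]:
    """Return the matching knowledge-base site, or None if nothing matches."""
    # Frameshift hotspot sites are corrected first; `normalize` is a
    # caller-supplied placeholder for that correction.
    if site.get("frameshift_hotspot"):
        site = normalize(site)
    same_tx = [k for k in kb
               if k.gene == site["gene"] and k.transcript == site["transcript"]]
    # Pass 1: gene + transcript + p. notation.
    p_hits = [k for k in same_tx if site.get("p") and k.p == site["p"]]
    if p_hits:
        # Among several hits, prefer the entry whose c. notation is empty.
        return min(p_hits, key=lambda k: k.c != "")
    # Pass 2: gene + transcript + c. notation.
    for k in same_tx:
        if site.get("c") and k.c == site["c"]:
            return k
    return None


# Placeholder data; the gene, transcript and notations are not real.
kb = [
    KbSite("GENE1", "NM_X", "p.A1B", "c.1A>T", "entry with c."),
    KbSite("GENE1", "NM_X", "p.A1B", "", "entry without c."),
    KbSite("GENE1", "NM_X", "", "c.5G>C", "c.-only entry"),
]
hit = match_site({"gene": "GENE1", "transcript": "NM_X", "p": "p.A1B"}, kb)
```

A `None` result is where the claim continues with the parent-variation lookup for frameshift or hotspot types, or with manual interpretation.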
5. The method for interpreting a hematological disease NGS report according to claim 1, wherein: the details of interpretation are as follows:
according to the grading of the variation sites, the variations with clinical significance and potential clinical significance are identified; according to the position of the patient's clinically diagnosed disease in the disease tree, the multiple evidences of different dimensions under the same variation, such as targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism, are weighted and scored to find the optimal interpretation data;
evidence weight scoring: every evidence of a variation has three dimensions, namely evidence type, evidence label and related disease; during interpretation the three dimensions are scored according to the sample information and the report purpose, and the evidence with the lowest score is the best interpretation evidence;
evidence type: includes approved medications, guidelines, clinical trials and literature, and is scored against the report purpose;
evidence label: includes treatment, prognosis, diagnosis, risk, clinical characteristics and population distribution labels, and is likewise scored against the report purpose;
related disease: the score is determined from the hierarchical relation of the disease number of the patient's diagnosed disease, taken from the sample information, by the following rule: the node itself and its child nodes score 1; the sibling nodes of the node and the child nodes below them score 1.5; the parent node scores 2; the sibling nodes of the parent node and the child nodes below them score 2.5; and all evidences are scored level by level with this logic;
and after the scores of the three dimensions are obtained, either the three scores are multiplied to find the best-matching evidence or the evidences are grouped and sorted by the three dimensions, as required, and the best interpretation data are selected to generate the report.
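The disease-tree rule and the multiplication of the three scores can be sketched as follows. This is a minimal sketch under assumptions: the tree, the `TYPE_SCORE` table and the label rule (1 if the label equals the report purpose, otherwise 2) are hypothetical, since the claim gives no numeric values for the evidence type and evidence label dimensions. Continuing the pattern above the parent level (grandparent 3, and so on) is one reading of "level by level".

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def disease_score(patient: str, evidence: str,
                  parent: Dict[str, str]) -> Optional[float]:
    """Score an evidence disease relative to the patient's diagnosed disease."""
    e_chain = ancestors(evidence, parent)
    if patient in e_chain:              # the node itself or one of its descendants
        return 1.0
    for k, anc in enumerate(ancestors(patient, parent)[1:], start=1):
        if evidence == anc:             # k-th ancestor: parent 2, grandparent 3
            return 1.0 + k
        if anc in e_chain:              # sibling branch under the k-th ancestor
            return k + 0.5
    return None                         # not in the same tree


# Hypothetical weights; lower is better, as in the claim.
TYPE_SCORE = {"approved_medication": 1.0, "guideline": 2.0,
              "clinical_trial": 3.0, "literature": 4.0}


def evidence_score(ev: dict, patient: str, purpose: str,
                   parent: Dict[str, str]) -> Optional[float]:
    """Multiply the type, label and disease scores of one evidence."""
    d = disease_score(patient, ev["disease"], parent)
    if d is None:
        return None
    label = 1.0 if ev["label"] == purpose else 2.0  # assumed label rule
    return TYPE_SCORE[ev["type"]] * label * d


def best_evidence(evidences: List[dict], patient: str, purpose: str,
                  parent: Dict[str, str]) -> Optional[dict]:
    """Return the evidence with the lowest combined score, if any."""
    scored = [(evidence_score(e, patient, purpose, parent), e) for e in evidences]
    scored = [(s, e) for s, e in scored if s is not None]
    return min(scored, key=lambda t: t[0])[1] if scored else None


# Abstract disease tree (child -> parent); node names are placeholders.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "A1a": "A1", "B1": "B"}
EVIDENCES = [
    {"id": "e1", "type": "literature", "label": "treatment", "disease": "A1"},
    {"id": "e2", "type": "guideline", "label": "treatment", "disease": "A2"},
    {"id": "e3", "type": "approved_medication", "label": "prognosis", "disease": "B"},
]
best = best_evidence(EVIDENCES, patient="A1", purpose="treatment", parent=PARENT)
```

With these assumed weights the combined scores are 4 for `e1`, 3 for `e2` and 5 for `e3`, so `e2` is selected. The grouping-and-sorting alternative mentioned in the claim would instead sort on the three scores as a tuple.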
CN202210385521.8A 2022-04-13 2022-04-13 NGS report interpretation method for blood system diseases Active CN114927191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210385521.8A CN114927191B (en) 2022-04-13 2022-04-13 NGS report interpretation method for blood system diseases

Publications (2)

Publication Number Publication Date
CN114927191A (en) 2022-08-19
CN114927191B (en) 2024-07-02

Family

ID=82806459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210385521.8A Active CN114927191B (en) 2022-04-13 2022-04-13 NGS report interpretation method for blood system diseases

Country Status (1)

Country Link
CN (1) CN114927191B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453591A (en) * 2023-05-08 2023-07-18 上海信诺佰世医学检验有限公司 RNA-seq data analysis-based variation rating and report generation system and method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013067001A1 (en) * 2011-10-31 2013-05-10 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
CN109754856A (en) * 2018-12-07 2019-05-14 北京荣之联科技股份有限公司 Automatically generate method and device, the electronic equipment of genetic test report
CN111966708A (en) * 2020-09-02 2020-11-20 荣联科技集团股份有限公司 Tumor accurate medication reading system, reading method and device
CN112233725A (en) * 2020-10-14 2021-01-15 合肥达徽基因科技有限公司 ATP7B gene mutation second-generation sequencing automated analysis reading method and report system
CN113707218A (en) * 2020-05-22 2021-11-26 苏州安智因医学检验所有限公司 Intelligent reading method and system for human genetic disease gene detection
WO2021248695A1 (en) * 2020-06-08 2021-12-16 国家卫生健康委科学技术研究所 Monogenic disease name recommendation method and system based on clinical features and sequence variations
WO2021248694A1 (en) * 2020-06-11 2021-12-16 国家卫生健康委科学技术研究所 Report interpretation method and system for structural variations in sample data of patient
CN114023384A (en) * 2022-01-06 2022-02-08 天津金域医学检验实验室有限公司 Method for automatically generating standardized report of full exome sequencing annotation table

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
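刘粉香; 杨文国; 孙勤红: "Research on transcriptome sequencing data analysis and high-throughput GO annotation theory", Journal of Anhui Agricultural Sciences (安徽农业科学), no. 31, 15 November 2018 (2018-11-15), pages 94 - 97 *
张绪超 et al.: "Guidelines for interpreting clinical next-generation sequencing reports", Journal of Evidence-Based Medicine (循证医学), vol. 20, no. 4, 31 August 2020 (2020-08-31), pages 194 - 195 *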

Also Published As

Publication number Publication date
CN114927191B (en) 2024-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant