CN114927191A - Interpretation method for NGS report of blood system disease - Google Patents
- Publication number
- CN114927191A (application CN202210385521.8A)
- Authority
- CN
- China
- Prior art keywords
- variation
- information
- report
- evidence
- diseases
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to a method for interpreting NGS reports for hematological diseases, belonging to the field of bioinformatics interpretation, and comprising the following steps: S1: uploading the VCF result file produced by bioinformatics analysis; S2: determining whether the uploaded file is an annotation file, and if so, executing step S4, otherwise executing step S3; S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering the data, and generating an annotation file; S4: reading the variant site information in the annotation file and storing it in structured form; S5: matching the variant site information against a knowledge base; S6: acquiring the report data; S7: acquiring a report template according to the submitting institution; S8: generating and exporting the report. The invention provides convenient data management and rapid report output, helps users manage genetic data intelligently, and improves both the capability and the efficiency of report interpretation.
Description
Technical Field
The invention belongs to the field of bioinformatics interpretation and relates to a method for interpreting NGS reports for hematological diseases.
Background
High-throughput sequencing, also called next-generation sequencing (NGS), can sequence hundreds of thousands to millions of DNA molecules in parallel in a single run. Read lengths are generally short, and the complete sequence information is assembled by reading many short DNA fragments. NGS bioinformatics analysis can be divided into three levels: primary analysis, which converts the raw data from the sequencer (BCL format) into readable data (VCF format); secondary analysis, which annotates and filters the sites in the VCF data; and tertiary analysis, which interprets the clinical significance of the variant gene sites in light of the patient's clinical diagnosis and treatment. Report interpretation is the last and most important step. Interpreting an NGS report requires searching a large number of databases and specialist publications, and it faces large data volumes, complex operations, and difficult queries; interpreting a single tumor report entirely by hand takes about 6 hours.
Disclosure of Invention
In view of the above, the present invention provides a method for interpreting a hematological disease NGS report.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for interpreting a hematological disease NGS report, comprising the following steps:
S1: uploading the VCF result file produced by bioinformatics analysis;
S2: determining whether the uploaded file is an annotation file, and if so, executing step S4, otherwise executing step S3;
S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering the data, and generating an annotation file;
S4: reading the variant site information in the annotation file and storing it in structured form;
S5: matching the variant site information against a knowledge base;
S6: acquiring the report data;
S7: acquiring a report template according to the submitting institution;
S8: generating and exporting the report.
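The control flow of steps S1 to S8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names and the callables passed to `run_pipeline` are hypothetical, and the rule used to recognise an annotation file (a header line declaring an `ANN` or `CSQ` INFO field, as written by annotators such as SnpEff and VEP) is an assumption, since the text does not say how step S2 makes its decision.

```python
from typing import Callable, Iterable, List

# Assumption: an "annotation file" is a VCF whose header declares an
# annotation INFO field. ANN (SnpEff) and CSQ (VEP) serve as examples.
ANNOTATION_INFO_IDS = ("ANN", "CSQ")
INFO_PREFIX = "##INFO=<ID="


def is_annotation_file(lines: Iterable[str]) -> bool:
    """S2: decide whether an uploaded VCF is already annotated."""
    for line in lines:
        if not line.startswith("##"):
            break  # the meta-information header is over
        if line.startswith(INFO_PREFIX):
            info_id = line[len(INFO_PREFIX):].split(",", 1)[0]
            if info_id in ANNOTATION_INFO_IDS:
                return True
    return False


def run_pipeline(
    vcf_lines: List[str],
    annotate: Callable[[List[str]], List[str]],     # S3 (hypothetical)
    read_sites: Callable[[List[str]], List[dict]],  # S4 (hypothetical)
    match_site: Callable[[dict], dict],             # S5 (hypothetical)
    render: Callable[[List[dict]], str],            # S6 to S8 (hypothetical)
) -> str:
    # S2: annotated files go straight to S4; others pass through S3 first.
    if not is_annotation_file(vcf_lines):
        vcf_lines = annotate(vcf_lines)
    sites = read_sites(vcf_lines)
    matched = [match_site(site) for site in sites]
    return render(matched)
```

Passing the later stages in as callables keeps the S2 branch testable on its own; a real system would replace them with its filtering, storage, matching, and templating components.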
Further, the knowledge base comprises the following structured information:
Gene: records information on human genes, including basic information, descriptive information, and other related information, where the basic information includes the gene name, gene position, gene type, and common transcripts, and the other related information includes the gene's protein domains, related diseases, and evidence;
Variation: divided into parent variations and sub-variations;
Parent variation: consolidates and summarizes information on common variations and assigns each a variation grade; associates sub-variations of the same type with the parent variation; summarizes the variation descriptions of the parent variation; compiles a per-disease variation summary according to the diseases related to the parent variation; and compiles evidence summaries by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and drug metabolism;
Sub-variation: comprises several types; the details of each sub-variation include its basic information, variation description, and grading information, together with a link to the related parent variation; a per-disease variation summary is compiled according to the diseases related to the sub-variation, and evidence summaries are compiled by the same evidence categories, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and drug metabolism;
Evidence: comprises approved medications, guidelines, clinical trials, and literature; by interpreting these different types of documents, the genes, variations, and diseases they address are summarized, together with the evidence type, evidence relationship, and evidence grade;
Disease: records clinical hematological diseases and organizes them into levels by disease.
Further, the sub-variant types include:
SNP/InDel: single base variation, and insertion and deletion of small fragments;
fusion: fusion gene, two genes spliced together;
CNV: copy number variation, duplication of large fragments;
SV: other structural variations.
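The knowledge-base entities and sub-variant types listed above could be modelled roughly as below. This is a sketch only: the text names the entities and some of their contents but gives no schema, so every class name, field name, and field type here is an assumption, and the values in the usage note are placeholders rather than real genes or grades.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SubVariantType(Enum):
    SNP_INDEL = "SNP/InDel"  # single-base changes and small insertions/deletions
    FUSION = "Fusion"        # two genes spliced together
    CNV = "CNV"              # copy number variation
    SV = "SV"                # other structural variations


@dataclass
class Gene:
    name: str
    position: str
    gene_type: str
    common_transcript: str
    protein_domains: List[str] = field(default_factory=list)


@dataclass
class Disease:
    name: str
    parent: Optional[str] = None  # parent node in the disease hierarchy


@dataclass
class ParentVariant:
    gene: str
    description: str
    grade: str


@dataclass
class SubVariant:
    gene: str
    transcript: str
    variant_type: SubVariantType
    grade: str
    p_notation: Optional[str] = None  # protein-level change
    c_notation: Optional[str] = None  # coding-DNA-level change
    parent: Optional[ParentVariant] = None


@dataclass
class Evidence:
    evidence_type: str  # approved medication, guideline, clinical trial, literature
    label: str          # e.g. treatment, prognosis, diagnosis
    disease: str
    grade: str
    variant: Optional[SubVariant] = None
```

For example, `SubVariant(gene="GENE1", transcript="TX1", variant_type=SubVariantType.SNP_INDEL, grade="I", parent=some_parent)` links a placeholder sub-variation to its parent variation, mirroring the association the text describes.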
Further, matching the variant site information against the knowledge base in step S5 specifically comprises:
determining, for each site, whether it is a frameshift hotspot variation, and if so, correcting it with TransVar; then matching the site against the knowledge base by gene, transcript, and p. notation (the protein-level change); if several knowledge-base variant sites match, preferring the one whose c. notation (the coding-DNA-level change) is empty; if no site matches, matching by gene, transcript, and c. notation; and if a variant site is matched, returning its interpretation information;
if an unmatched variation is of the frameshift or hotspot type, matching it to a parent variation via the variant site; if the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is treated as a clinically significant variant site: it is automatically graded and given a preliminary interpretation based on the related information, and it is recorded in the knowledge base and flagged to await expert interpretation;
if no variant site information is matched, the site enters manual interpretation as appropriate; finally, the report interpreter confirms the sites to be reported and the report is generated.
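The two-pass site lookup described above (gene, transcript, and protein-level change first; gene, transcript, and coding-DNA-level change second) can be sketched as follows. The dictionary keys `gene`, `transcript`, `p`, and `c` and the list-of-dicts knowledge base are assumptions made for illustration; the TransVar correction step is deliberately left out, and the variant strings in any usage should be treated as placeholders.

```python
from typing import Dict, List, Optional

Entry = Dict[str, Optional[str]]


def match_site(site: Entry, kb: List[Entry]) -> Optional[Entry]:
    """Match one site by gene + transcript + p., then by gene + transcript + c."""
    same_tx = [
        e for e in kb
        if e["gene"] == site["gene"] and e["transcript"] == site["transcript"]
    ]

    # First pass: protein-level (p.) change.
    if site.get("p"):
        hits = [e for e in same_tx if e.get("p") == site["p"]]
        if hits:
            # Several entries may share one p. change; prefer the entry
            # whose c. field is empty, as the text specifies.
            empty_c = [e for e in hits if not e.get("c")]
            return (empty_c or hits)[0]

    # Second pass: coding-DNA-level (c.) change.
    if site.get("c"):
        for e in same_tx:
            if e.get("c") == site["c"]:
                return e

    return None  # caller falls through to parent matching or manual review
```

Returning `None` rather than raising keeps the unmatched case explicit, so the caller can decide between the parent-variation fallback and manual interpretation.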
Further, interpretation proceeds as follows:
according to the classification of the variant sites, variations with clinical significance or potential clinical significance are identified; then, based on the position of the patient's clinically diagnosed disease in the disease tree, the evidence under the same variation is weighted and scored across its different dimensions (targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, drug metabolism, and so on) to find the best interpretation data;
Evidence weight scoring: every piece of evidence for a variation has three dimensions, namely evidence type, evidence label, and related disease; during interpretation, each dimension is scored according to the sample information and the report purpose, and the evidence with the lowest score is the best interpretation evidence;
Evidence type: approved medication, guideline, clinical trial, or literature; scored according to the report purpose;
Evidence label: treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and so on; likewise scored according to the report purpose;
Related disease: scored according to the hierarchical relationship, in the disease tree, between the patient's diagnosed disease in the sample information and the disease of the evidence, under the following rule: the diagnosed node itself and its child nodes have the highest priority and score 1; sibling nodes of the diagnosed node and the child nodes below them score 1.5; the parent node scores 2; sibling nodes of the parent node and the child nodes below them score 2.5; all evidence is scored level by level following this logic;
after the three dimension scores are obtained, they are multiplied together as required to obtain the best-matching evidence, or the evidence is grouped and sorted by the three dimensions, and the best interpretation data is selected to generate the report.
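The related-disease rule above can be sketched with a child-to-parent map. The tree used in the usage note is hypothetical (single-letter node names, not the categories of FIG. 5), and two points are assumptions: "child nodes" is read as all descendants, and scores above 2.5 continue the same pattern (3 for the grandparent, 3.5 for its siblings, and so on), which the text only implies with "level by level".

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def disease_score(diagnosis: str, evidence_disease: str,
                  parent: Dict[str, str]) -> Optional[float]:
    """Score an evidence disease relative to the diagnosed disease (lower is better)."""
    # level[n] = how many steps n sits above the diagnosis (0 = the diagnosis).
    level = {n: k for k, n in enumerate(ancestors(diagnosis, parent))}
    for node in ancestors(evidence_disease, parent):
        if node in level:
            k = level[node]
            if node == evidence_disease:
                return 1.0 + k                     # diagnosis = 1, parent = 2, ...
            return 1.0 if k == 0 else k + 0.5      # descendants = 1, siblings = 1.5, ...
    return None  # the two diseases share no ancestor
```

With `parent = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "D", "G": "E", "H": "C"}` and diagnosis `"D"`, the node `F` (a child of the diagnosis) scores 1, the sibling `E` and its child `G` score 1.5, the parent `B` scores 2, and the parent's sibling `C` and its child `H` score 2.5.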
The beneficial effects of the invention are as follows. Structured storage of the bioinformatics analysis results gives users more accurate, multidimensional site interpretation data. Detailed interpretation information can be queried quickly by gene site, including the associated functional impact, diseases, and drugs; every item can be traced back to the original text of its source, and the interpretation content can be updated promptly in line with the latest research. Interpreted content accumulates over time into a knowledge base; after a user uploads NGS data, the system automatically matches it against the knowledge base and marks the gene sites that have already been interpreted, helping the user screen sites. Each piece of interpretation information can be retrieved along multiple dimensions, such as information source, gene variation type, disease, and drug. By selecting the gene sites to be written into the report, a user can generate the report with one click, and the whole process from uploading sequencing data to exporting the report can be completed in a few minutes.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a general flowchart of a method for interpreting a hematological disease NGS report;
FIG. 2 is a diagram of a knowledge base architecture;
FIG. 3 is an automatic + manual matching flow chart;
FIG. 4 is an automatic + manual matching implementation;
FIG. 5 is a chart of specific disease categories given in the examples.
Detailed Description
The following embodiments of the present invention are provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and embodiments may be combined with each other without conflict.
The drawings are for illustrative purposes only; they are schematic representations rather than depictions of an actual product and are not to be construed as limiting the invention. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not intended to indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present invention, and the specific meaning of the terms described above will be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1, the method for interpreting a hematological disease NGS report provided by the present invention comprises the following steps:
(1) uploading a VCF file;
(2) determining whether the uploaded file is an annotation file, and if so, proceeding to step (4);
(3) obtaining the filtering conditions of the test item corresponding to the sample, filtering the data, and generating an annotation file;
(4) reading the variant site information in the file and storing it in structured form;
(5) matching the variant site information against the knowledge base;
(6) acquiring the report data;
(7) acquiring a report template according to the submitting institution;
(8) generating and exporting the report.
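The step that reads variant site information and stores it in structured form might look like the sketch below. It relies only on the standard eight tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the `SiteRecord` class, its field names, and the choice to emit one record per alternate allele are assumptions, and annotation-specific fields such as gene or transcript are left inside the raw INFO dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteRecord:
    chrom: str
    pos: int
    ref: str
    alt: str
    info: Dict[str, str]


def parse_info(info_field: str) -> Dict[str, str]:
    """Split a VCF INFO column into key/value pairs."""
    info: Dict[str, str] = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # flag entries (no '=') get an empty value
    return info


def read_sites(lines: List[str]) -> List[SiteRecord]:
    """Turn VCF data lines into structured records, one per ALT allele."""
    records: List[SiteRecord] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, _filter, info = cols[:8]
        for alt in alts.split(","):
            records.append(SiteRecord(chrom, int(pos), ref, alt, parse_info(info)))
    return records
```

Splitting multi-allelic lines up front means every downstream lookup deals with exactly one reference/alternate pair per record.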
The structure of the knowledge base is shown in fig. 2 and is divided into the following parts: gene, variation, evidence, literature, disease.
Gene: mainly records information on human genes, including basic information such as the gene name, gene position, gene type, and common transcripts; a gene description compiled by the interpreter; descriptions of the gene from reference websites such as GeneCards, OMIM, and UniProt; and related information such as the gene's protein domains, related diseases, and evidence.
Variation: divided into two major categories, parent variations and sub-variations.
Parent variation: consolidates and summarizes information on some common variations and assigns each a variation grade; associates sub-variations of the same type with the parent variation; summarizes the variation descriptions of the parent variation; compiles a per-disease variation summary according to the diseases related to the parent variation; and compiles evidence summaries by evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and drug metabolism.
Sub-variation: the main categories are SNP/InDel (single-base variations and insertions and deletions of small fragments), Fusion (fusion genes, in which two genes are spliced together), CNV (copy number variation, duplication of large fragments), and SV (other structural variations).
The details of each sub-variation mainly comprise its basic information, variation description, and grading information, together with a link to the related parent variation; a per-disease variation summary is compiled according to the diseases related to the sub-variation, and evidence summaries are compiled by evidence category, such as targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and drug metabolism.
Evidence: includes evidence types such as approved medications, guidelines, clinical trials, and literature; by interpreting these different types of documents, the genes, variations, and diseases they address are summarized, together with the evidence type, evidence relationship, and evidence grade.
Diseases: clinical blood-related diseases are recorded and graded according to the diseases.
The specific steps of knowledge-base matching and interpretation are shown in FIGS. 3 and 4. The patient's VCF file is first filtered according to the test item, formatted, and imported into the system. Each site is then checked for frameshift hotspot variation, and any such variation is corrected with TransVar. Next, the site is matched against the knowledge base by gene, transcript, and p. notation (the protein-level change); if several knowledge-base variant sites match, the one whose c. notation (the coding-DNA-level change) is empty is preferred; if no site matches, matching is repeated by gene, transcript, and c. notation; and if a variant site is matched, its interpretation information is returned.
Because variant sites associated with hematological diseases are highly heterogeneous, some sites will fail to match even a very complete knowledge base, so a procedure for automatically discovering variant sites was designed. When an unmatched variation is of the frameshift or hotspot type, it is matched to a parent variation via the variant site. If the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is treated as a clinically significant variant site: it is automatically graded and given a preliminary interpretation based on the related information, and it is recorded in the knowledge base and flagged to await more detailed interpretation by an expert.
If no variant site information is matched, the site enters manual interpretation as appropriate; finally, the report interpreter confirms the sites to be reported and the report is generated.
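The fallback for sites with no knowledge-base match can be sketched as a small triage function. The text does not say how a site is mapped to its parent variation, so the lookup key used here, a `(gene, variant class)` pair, is an assumption, as are the field names, the status strings, and the idea of inheriting the parent's grade as the preliminary grade.

```python
from typing import Dict, List, Tuple

ParentKey = Tuple[str, str]  # (gene, variant class): an assumed lookup key


def triage_unmatched(site: Dict[str, str],
                     parents: Dict[ParentKey, Dict[str, object]],
                     pending: List[dict]) -> str:
    """Decide what happens to a site that matched no knowledge-base entry."""
    if site["variant_class"] in ("frameshift", "hotspot"):
        parent = parents.get((site["gene"], site["variant_class"]))
        if parent is not None and parent["clinically_significant"]:
            # Preliminary interpretation: inherit the parent's grade, record
            # the new site, and flag it for expert review.
            pending.append({**site, "grade": parent["grade"],
                            "status": "awaiting expert"})
            return "auto_interpreted"
    # Everything else goes to the report interpreter.
    return "manual_interpretation"
```

Keeping the newly recorded sites in a separate `pending` list makes the "recorded and flagged" state visible until an expert confirms or revises the automatic grade.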
Interpretation: first, according to the classification of the variant sites, variations with clinical significance or potential clinical significance are identified. Second, based on the position of the patient's clinically diagnosed disease in the disease tree, the evidence under the same variation is weighted and scored across its different dimensions (targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, drug metabolism, and so on) to find the best interpretation data.
Evidence weight scoring: every piece of evidence for a variation carries an evidence type (approved medication, guideline, clinical trial, or literature), an evidence label (treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, and so on), and an associated standard disease. During interpretation, these three dimensions are scored according to the sample information and the report purpose, and the evidence with the lowest score is the best interpretation evidence.
(1) Evidence type: scored according to the report purpose. If the report purpose is treatment, evidence of the approved-medication and clinical-trial types scores 0.5, and all other evidence scores the default of 1.
(2) Evidence label: scored in the same way as evidence type. If the report purpose is diagnosis, evidence carrying labels such as prognosis, diagnosis, or risk scores between 0.5 and 1, and all other evidence scores 1.
(3) Related disease: scored according to the hierarchical relationship, in the disease tree, between the patient's diagnosed disease in the sample information and the disease of the evidence. The rule is that the diagnosed node itself and its child nodes have the highest priority and score 1; sibling nodes of the diagnosed node and the child nodes below them score 1.5; the parent node scores 2; sibling nodes of the parent node and the child nodes below them score 2.5; and all evidence is scored level by level following this logic.
As shown in FIG. 5, in this example, when the patient was diagnosed with mature B-cell lymphoma, the following scores were made as shown in Table 1 below:
TABLE 1
Finally, once the three dimension scores are obtained, they can be multiplied together as required to obtain the best-matching evidence, or the evidence can be grouped and sorted by the three dimensions; the best interpretation data is then selected to generate the report, so that the clinician can quickly understand the details of the patient's disease.
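Combining the three dimension scores by multiplication can be sketched as follows. The function and key names are hypothetical, and one value is an assumption: the text gives a range of 0.5 to 1 for favoured labels when the report purpose is diagnosis, and this sketch simply uses 0.5. The related-disease score is taken as an input computed beforehand.

```python
from typing import Dict, List


def type_score(evidence_type: str, purpose: str) -> float:
    # From the text: for a treatment report, approved medication and
    # clinical trial evidence score 0.5; everything else defaults to 1.
    if purpose == "treatment" and evidence_type in ("approved medication", "clinical trial"):
        return 0.5
    return 1.0


def label_score(label: str, purpose: str) -> float:
    # Assumption: 0.5 is picked from the stated 0.5-to-1 range.
    if purpose == "diagnosis" and label in ("prognosis", "diagnosis", "risk"):
        return 0.5
    return 1.0


def total_score(evidence: Dict[str, object], purpose: str) -> float:
    """Multiply the three dimension scores; lower means a better match."""
    return (type_score(str(evidence["type"]), purpose)
            * label_score(str(evidence["label"]), purpose)
            * float(evidence["disease_score"]))


def rank_evidence(evidences: List[Dict[str, object]], purpose: str) -> List[Dict[str, object]]:
    """Return the evidence sorted from best (lowest score) to worst."""
    return sorted(evidences, key=lambda e: total_score(e, purpose))
```

Because the product rewards evidence that is favourable on every dimension at once, an approved-medication record for a sibling disease (0.5 × 1 × 1.5 = 0.75) can outrank a literature record for the diagnosed disease itself (1 × 1 × 1 = 1) in a treatment report; grouping and sorting by dimension, the alternative the text mentions, would avoid that trade-off.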
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (5)
1. A method for interpreting a hematological disease NGS report, characterized by comprising the following steps:
S1: uploading the VCF result file produced by bioinformatics analysis;
S2: determining whether the uploaded file is an annotation file, and if so, executing step S4, otherwise executing step S3;
S3: obtaining the filtering conditions of the test item corresponding to the sample, filtering the data, and generating an annotation file;
S4: reading the variant site information in the annotation file and storing it in structured form;
S5: matching the variant site information against a knowledge base;
S6: acquiring the report data;
S7: acquiring a report template according to the submitting institution;
S8: generating and exporting the report.
2. The method for interpreting a hematological disease NGS report according to claim 1, characterized in that the knowledge base comprises the following structured information:
gene: records information related to human genes, including the basic information, descriptive information and other related information of each gene, wherein the basic information includes the gene name, gene position, gene type and common transcript, and the other related information includes the protein domains of the gene, related diseases and evidence;
variation: divided into parent variations and sub-variations;
parent variation: integrates and summarizes the information on a general class of variation and assigns it a variation grade; the sub-variations of that class are associated with the parent variation; the variation description of the parent variation is summarized; variation summary information is compiled for each disease related to the parent variation; and evidence summary information is compiled for each evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
sub-variation: comprises a plurality of types; the details of each sub-variation include its basic information, variation description information and grading information, together with the associated parent variation information; variation summary information is compiled for each disease related to the sub-variation, and evidence summary information is compiled for each evidence category, namely targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution and drug metabolism;
evidence: comprises approved medications, guidelines, clinical trials and literature evidence; by interpreting these different types of literature, the genes, variations and diseases that each item addresses are summarized together with its evidence type, evidence relation and evidence grade;
diseases: clinical hematological diseases are recorded and arranged into levels by disease.
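One way to picture these knowledge-base entities is as a handful of linked records. The sketch below is only an illustration under assumptions: the field names, the `link` helper and the use of Python dataclasses are hypothetical and do not come from the patent, which does not specify a storage schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Disease:
    code: str
    name: str
    parent_code: Optional[str] = None  # position in the disease hierarchy

@dataclass
class Gene:
    name: str
    position: str
    gene_type: str
    common_transcript: str
    protein_domains: List[str] = field(default_factory=list)

@dataclass
class Evidence:
    evidence_type: str  # approved medication, guideline, clinical trial or literature
    label: str          # e.g. treatment, prognosis, diagnosis
    gene: str
    variation: str
    disease_code: str
    grade: str

@dataclass
class ParentVariation:
    name: str
    gene: str
    grade: str
    description: str = ""
    evidence: List[Evidence] = field(default_factory=list)
    children: List["SubVariation"] = field(default_factory=list)

@dataclass
class SubVariation:
    name: str
    gene: str
    variation_type: str
    grade: str
    description: str = ""
    evidence: List[Evidence] = field(default_factory=list)
    # Excluded from repr/compare so the parent <-> child cycle stays harmless.
    parent: Optional[ParentVariation] = field(default=None, repr=False, compare=False)

def link(parent: ParentVariation, child: SubVariation) -> None:
    """Associate a sub-variation with its parent variation in both directions."""
    child.parent = parent
    parent.children.append(child)
```

Keeping evidence as a list on both parent and sub-variations mirrors the text, where evidence summaries are compiled at both levels.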
3. The method for interpreting a hematological disease NGS report according to claim 1, characterized in that the sub-variation types include:
SNP/InDel: single-base variation, and insertion and deletion of small fragments;
Fusion: gene fusion, in which two genes are joined together;
CNV: copy number variation, duplication of large fragments;
SV: other structural variations.
4. The method for interpreting a hematological disease NGS report according to claim 1, characterized in that the matching of the variation site information against the knowledge base in step S5 specifically includes:
for each piece of site information, determining whether it is a frameshift hotspot variation and, if so, correcting it with TransVar; then matching against the site knowledge base by gene, transcript and p. notation (protein-level change); if multiple knowledge-base variation sites are matched, preferring the variation site whose c. notation (coding-DNA change) is empty; if no site is matched, matching by gene, transcript and c. notation; and if variation site information is matched, returning the interpretation information of that variation site;
if the unmatched variation is of a frameshift or hotspot type, matching a parent variation through the variation site; if the matched parent variation is clinically significant, the site, although not recorded in the knowledge base, is regarded as a clinically significant variation site; the site is automatically graded and preliminarily interpreted according to the related information, and at the same time is recorded in the knowledge base with a flag, pending interpretation by an expert;
and if no variation site information is matched, entering a manual interpretation stage as appropriate, in which a report interpreter finally confirms the sites to be reported and the report is generated.
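The tiered matching can be sketched as a short cascade. This is a simplified illustration under stated assumptions: the record fields, the `kind` label and the rule for finding a parent variation (same gene and same variation kind) are hypothetical, and the TransVar correction step is omitted rather than guessed at.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Site:
    gene: str
    transcript: str
    p_change: Optional[str] = None  # protein-level change (p. notation)
    c_change: Optional[str] = None  # coding-DNA change (c. notation)
    kind: str = "other"             # hypothetical label: "frameshift", "hotspot", ...
    interpretation: str = ""

@dataclass(frozen=True)
class ParentVariation:
    gene: str
    kind: str
    clinically_significant: bool

def match_site(query: Site, kb_sites: List[Site],
               kb_parents: List[ParentVariation]) -> Tuple[str, object]:
    """Return (status, record) for one detected site."""
    same_tx = [s for s in kb_sites
               if s.gene == query.gene and s.transcript == query.transcript]
    # Tier 1: gene + transcript + p.; prefer the entry whose c. is empty.
    if query.p_change:
        hits = [s for s in same_tx if s.p_change == query.p_change]
        if hits:
            return "matched", min(hits, key=lambda s: bool(s.c_change))
    # Tier 2: gene + transcript + c.
    if query.c_change:
        for s in same_tx:
            if s.c_change == query.c_change:
                return "matched", s
    # Tier 3: frameshift/hotspot sites fall back to a clinically significant parent.
    if query.kind in ("frameshift", "hotspot"):
        for p in kb_parents:
            if p.gene == query.gene and p.kind == query.kind and p.clinically_significant:
                return "pending_expert_review", p
    # Nothing matched: hand over to manual interpretation.
    return "manual", None
```

The returned status string separates the three outcomes in the text: a direct knowledge-base hit, a provisional result awaiting expert review, and a site that needs manual interpretation.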
5. The method for interpreting a hematological disease NGS report according to claim 1, characterized in that the details of interpretation are as follows:
according to the grading of the variation sites, the variations with clinical significance and potential clinical significance are identified; according to the hierarchical position of the patient's clinically diagnosed disease in the disease tree, the multiple pieces of evidence of different dimensions under the same variation (targeted medication, treatment, prognosis, diagnosis, risk, clinical characteristics, population distribution, drug metabolism and the like) are weighted and scored to find the optimal interpretation data;
evidence weight scoring: every piece of evidence for a variation has three dimensions, namely evidence type, evidence label and related disease; during interpretation each dimension is scored according to the sample information and the purpose of the report, and the evidence with the lowest score is the best interpretation evidence;
evidence type: includes approved medications, guidelines, clinical trials and literature, and is scored according to the purpose of the report;
evidence label: includes treatment, prognosis, diagnosis, risk, clinical characteristics and population distribution labels, and is likewise scored according to the purpose of the report;
related disease: the score is determined from the hierarchical relation between the disease of the evidence and the disease number of the patient's diagnosed disease in the sample information, under the following rule: the node itself and its child nodes score 1; the sibling nodes of the node and the child nodes below them score 1.5; the parent node scores 2; the sibling nodes of the parent node and the child nodes below them score 2.5; and so on level by level, so that all evidence is scored by this logic;
after the three dimension scores are obtained, either the three scores are multiplied to obtain the best-matching evidence as required, or the evidence is grouped and sorted by the three dimensions, and the best interpretation data is selected to generate the report.
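The disease-tree rule and the product of the three scores can be sketched as below. Only the values 1, 1.5, 2 and 2.5 come from the text; the continuation to higher levels (grandparent 3, its siblings 3.5, and so on) is an extrapolation of "level by level". The disease tree, the `TYPE_SCORE` and `LABEL_SCORE` tables and all function names are hypothetical placeholders.

```python
from math import inf

# Hypothetical disease tree as child -> parent (None marks the root).
PARENT = {
    "hematologic_neoplasm": None,
    "myeloid": "hematologic_neoplasm",
    "lymphoid": "hematologic_neoplasm",
    "AML": "myeloid",
    "MDS": "myeloid",
    "AML_subtype": "AML",
    "B_ALL": "lymphoid",
}

# Assumed score tables; the source does not give concrete values for these.
TYPE_SCORE = {"approved_medication": 1.0, "guideline": 1.5,
              "clinical_trial": 2.0, "literature": 2.5}
LABEL_SCORE = {
    "treatment": {"targeted_medication": 1.0, "treatment": 1.5},
    "diagnosis": {"diagnosis": 1.0, "clinical_characteristics": 1.5},
}
DEFAULT_LABEL_SCORE = 3.0

def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def disease_score(patient, evidence_disease):
    """Score the evidence disease relative to the patient's diagnosed disease."""
    up = ancestors(patient)
    for node in ancestors(evidence_disease):
        if node in up:
            k = up.index(node)  # levels climbed from the patient's node
            if k == 0:
                return 1.0      # the node itself or one of its descendants
            # An ancestor scores k + 1; that ancestor's other branches score k + 0.5.
            return k + 1.0 if node == evidence_disease else k + 0.5
    return inf                  # not in the same tree

def total_score(ev, patient, purpose):
    label = LABEL_SCORE[purpose].get(ev["label"], DEFAULT_LABEL_SCORE)
    return TYPE_SCORE[ev["type"]] * label * disease_score(patient, ev["disease"])

def best_evidence(evidence, patient, purpose):
    """The lowest product of the three dimension scores wins."""
    return min(evidence, key=lambda ev: total_score(ev, patient, purpose))
```

With a patient diagnosed at the toy node `AML`, evidence attached to `AML_subtype` scores 1, to the sibling `MDS` 1.5, to the parent `myeloid` 2, and to `lymphoid` or `B_ALL` 2.5, which follows the rule stated in the text. The alternative in the text, grouping and sorting by the three dimensions, could replace the product with a tuple sort key.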
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210385521.8A CN114927191B (en) | 2022-04-13 | 2022-04-13 | NGS report interpretation method for blood system diseases |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114927191A (en) | 2022-08-19 |
CN114927191B (en) | 2024-07-02 |
Family
ID=82806459
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210385521.8A (Active) CN114927191B (en) | 2022-04-13 | 2022-04-13 | NGS report interpretation method for blood system diseases |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114927191B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116453591A (en) * | 2023-05-08 | 2023-07-18 | 上海信诺佰世医学检验有限公司 | RNA-seq data analysis-based variation rating and report generation system and method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013067001A1 (en) * | 2011-10-31 | 2013-05-10 | The Scripps Research Institute | Systems and methods for genomic annotation and distributed variant interpretation |
CN109754856A (en) * | 2018-12-07 | 2019-05-14 | 北京荣之联科技股份有限公司 | Automatically generate method and device, the electronic equipment of genetic test report |
CN111966708A (en) * | 2020-09-02 | 2020-11-20 | 荣联科技集团股份有限公司 | Tumor accurate medication reading system, reading method and device |
CN112233725A (en) * | 2020-10-14 | 2021-01-15 | 合肥达徽基因科技有限公司 | ATP7B gene mutation second-generation sequencing automated analysis reading method and report system |
CN113707218A (en) * | 2020-05-22 | 2021-11-26 | 苏州安智因医学检验所有限公司 | Intelligent reading method and system for human genetic disease gene detection |
WO2021248695A1 (en) * | 2020-06-08 | 2021-12-16 | 国家卫生健康委科学技术研究所 | Monogenic disease name recommendation method and system based on clinical features and sequence variations |
WO2021248694A1 (en) * | 2020-06-11 | 2021-12-16 | 国家卫生健康委科学技术研究所 | Report interpretation method and system for structural variations in sample data of patient |
CN114023384A (en) * | 2022-01-06 | 2022-02-08 | 天津金域医学检验实验室有限公司 | Method for automatically generating standardized report of full exome sequencing annotation table |
Non-Patent Citations (2)
Title |
---|
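Liu Fenxiang; Yang Wenguo; Sun Qinhong: "Research on transcriptome sequencing data analysis and high-throughput GO annotation theory", Journal of Anhui Agricultural Sciences (安徽农业科学), no. 31, 15 November 2018 (2018-11-15), pages 94 - 97 * |
Zhang Xuchao et al.: "Guidelines for the interpretation of next-generation sequencing clinical reports", Journal of Evidence-Based Medicine (《循证医学》), vol. 20, no. 4, 31 August 2020 (2020-08-31), pages 194 - 195 * |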
Also Published As
Publication number | Publication date |
---|---|
CN114927191B (en) | 2024-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |