EP3738054A1 - A system and method for extracting oncological information of prognostic significance from natural language - Google Patents
- Publication number
- EP3738054A1 (application EP18900405.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- attribute
- data points
- unstructured
- text
- health information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- the present invention relates generally to identifying and extracting (together referred to as abstraction) data from unstructured text, and more particularly to extracting prognostically significant data from unstructured medical text to provide useful oncologic information of prognostic significance, to validate the data from an oncologic standpoint, and to transform the data into information that can be further analyzed to provide actionable insights.
- cancer stem cells (see, e.g., Clarke, M.F., “Self-renewal and solid-tumor stem cells,” Biology of Blood and Marrow Transplantation 11: 14-16 (2005)); metabolic derangements in cancer cells (see, e.g., Stine, ZE et al., (2015) “MYC, metabolism and cancer,” Cancer Discov. 5 (10): 1024-39); and discoveries in tumor immunology (see, e.g., Nestle, FO, (2000) “Dendritic cell vaccination for cancer therapy,” Oncogene 19 (56): 6673-9; Khanna, R., (1998) “Tumour surveillance:
- a doctor evaluating an oncologic condition of a patient typically records notes of the evaluation in the patient’s medical records in the form of a natural language, such as, e.g., English.
- an extraction system for extracting prognostically significant data from unstructured text comprising medical data.
- the extraction system may be implemented for complex data, such as, e.g., oncologic data.
- the extraction system extracts prognostically significant data from unstructured text to allow for further processing, e.g., to validate the data from an oncologic standpoint, and to transform the data into information that can be further analyzed to derive actionable insights.
- a system and method are provided for extracting data from unstructured medical text. Data points are identified in unstructured medical text, where the data points are determined from a dictionary database. A value associated with each of the data points is determined from the unstructured medical text. Each of the data points is mapped to its respective value for extraction from the unstructured medical text.
- the unstructured medical text may be sentences or phrases based on grammar rules of a natural language (e.g., English).
- the unstructured medical text may be notations regarding a patient by a doctor.
- the unstructured medical text may include unstructured oncologic text.
- the dictionary database is generated (e.g., as a preprocessing step).
- a plurality of data points is received that is known to be diagnostically and prognostically significant (e.g., as identified by the doctor evaluating the patient, other doctors or medical experts, or advisory boards).
- Equivalent data points of the plurality of data points are determined.
- the equivalent data points are data points that are synonyms of the plurality of data points or morphological variations of the plurality of data points.
- the plurality of data points and the equivalent data points are stored to generate the dictionary database.
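The dictionary-generation step described above might be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_dictionary`, the sample data points, the synonym lists, and the naive plural rule standing in for morphological variation are all assumptions made for the example.

```python
def build_dictionary(data_points, synonyms):
    """Map every surface form (canonical term, synonym, naive plural) to its data point."""
    dictionary = {}
    for point in data_points:
        # Start from the data point itself plus any known synonyms (hypothetical lists).
        forms = {point, *synonyms.get(point, [])}
        # Assumption: a plural "s" stands in for morphological variation;
        # a real system would more likely use a stemmer or lemmatizer.
        forms |= {form + "s" for form in forms}
        for form in forms:
            dictionary[form.lower()] = point
    return dictionary


# Hypothetical inputs, e.g. as supplied by medical experts.
DATA_POINTS = ["estrogen receptor", "tumor size"]
SYNONYMS = {
    "estrogen receptor": ["ER"],
    "tumor size": ["tumour size", "size of tumor"],
}

dictionary = build_dictionary(DATA_POINTS, SYNONYMS)
print(dictionary["er"])
```

Keys are stored in lower case so that lookups can be case-insensitive; every equivalent form resolves to the same canonical data point.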
- each of the data points may be mapped to its respective value to generate attribute-value pairs.
- the mapping may store the attribute-value pairs as a list of attribute-value pairs, a collection of tuples, a table, or any other suitable data structure.
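As a rough sketch of the mapping step, the example below finds dictionary terms in a note and pairs each with the text that immediately follows it, storing the result as a list of tuples. Adjacency is used here only as a simple stand-in for the syntactic or semantic dependency the text describes; the dictionary contents, the `VALUE` pattern, and the function name `extract_pairs` are hypothetical.

```python
import re

# Hypothetical mini dictionary: surface form -> canonical data point (attribute).
DICTIONARY = {
    "er": "estrogen receptor",
    "estrogen receptor": "estrogen receptor",
    "tumor size": "tumor size",
}

# Assumption: a value is either a number followed by a unit, or a single word.
VALUE = r"(\d+(?:\.\d+)?\s*[a-z]+|[a-z]+)"


def extract_pairs(text, dictionary):
    """Return (attribute, value) tuples for every dictionary term found in the text."""
    pairs = []
    for term, attribute in dictionary.items():
        # Allow an optional linking token ("is", "was", ":", "=") between term and value.
        pattern = rf"\b{re.escape(term)}\b\s*(?:is\b|was\b|:|=)?\s*{VALUE}"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            pairs.append((attribute, match.group(1)))
    return pairs


note = "ER positive. Tumor size is 2.3 cm."
pairs = extract_pairs(note, DICTIONARY)
print(pairs)
```

A production system would likely replace the regular expression with a dependency parser, but the output shape (a list of attribute-value tuples) matches one of the data structures named above.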
- the data points i.e., attributes in the attribute-value pairs
- UMLS: unified medical language system
- the data points and their respective values are validated to ensure integrity.
- the data points and their respective values may be validated by identifying inconsistencies with the set of data points and values or by identifying data points having a respective value that cannot be correct.
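The two validation checks described above (inconsistencies within the set, and values that cannot be correct) could look something like the sketch below. The attribute names, the specific rules, and the function name `validate` are illustrative assumptions; the only domain fact relied on is that triple-negative breast cancer is negative for ER, PR, and HER2.

```python
def validate(pairs):
    """Return a list of human-readable problems found in attribute-value pairs."""
    problems = []
    record = dict(pairs)

    # Value that cannot be correct: a tumor size that is zero or negative.
    size = record.get("tumor size cm")
    if size is not None and float(size) <= 0:
        problems.append("tumor size must be positive")

    # Inconsistency: a triple-negative subtype conflicts with any positive receptor.
    if record.get("subtype") == "triple negative":
        for receptor in ("ER", "PR", "HER2"):
            if record.get(receptor) == "positive":
                problems.append(
                    f"{receptor} positive conflicts with triple negative subtype"
                )
    return problems


# Hypothetical extracted pairs containing one impossible value and one conflict.
problems = validate(
    [("subtype", "triple negative"), ("ER", "positive"), ("tumor size cm", "-1")]
)
print(problems)
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a record at once.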
- the data points and their respective values are modelled to transform the data into information that can be further analyzed to provide actionable insights.
- a nodal address is assigned to a patient based on the data points and their respective values
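The claims describe a nodal address as a discrete punctuated string of digits comprising a prefix, a middle, and a suffix. A purely illustrative sketch of that idea follows; the choice of variables, the digit codes, and the "." separator are all assumptions, since the source does not specify them here.

```python
from collections import defaultdict

# Hypothetical code tables: each preselected variable maps its values to digits.
CODES = {
    "disease": {"breast cancer": "01", "lung cancer": "02"},
    "ER": {"negative": "0", "positive": "1"},
    "stage": {"I": "1", "II": "2", "III": "3", "IV": "4"},
}


def nodal_address(record):
    """Build a punctuated digit string (prefix.middle.suffix) from a patient record."""
    prefix = CODES["disease"][record["disease"]]
    middle = CODES["ER"][record["ER"]]
    suffix = CODES["stage"][record["stage"]]
    return ".".join((prefix, middle, suffix))


def group_by_address(patients):
    """Group patient identifiers that share the same nodal address."""
    groups = defaultdict(list)
    for patient_id, record in patients.items():
        groups[nodal_address(record)].append(patient_id)
    return dict(groups)


# Hypothetical patient records built from validated attribute-value pairs.
patients = {
    "p1": {"disease": "breast cancer", "ER": "positive", "stage": "II"},
    "p2": {"disease": "breast cancer", "ER": "positive", "stage": "II"},
    "p3": {"disease": "lung cancer", "ER": "negative", "stage": "IV"},
}
groups = group_by_address(patients)
print(groups)
```

Patients with identical values for the preselected variables land at the same address, which is what allows outcomes to be compared within a clinically similar group.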
- the described invention provides a method for extracting objective oncologic data of prognostic significance from subjective unstructured medical text in a natural language, comprising: A. receiving input comprising (i) one or more lists of data points; (ii) unstructured medical text comprising unstructured oncologic text and (iii) a database comprising an exhaustive dictionary of medical knowledge that identifies one or more data points as significant to diagnosis and prognosis; B.
- processing the inputs in A to generate lists of attribute-value pairs by: (a) identifying in the unstructured medical text data points and equivalents of the data points that are significant for diagnosis and prognosis; (b) extracting from the unstructured medical text all facts of known importance associated with words or phrases in the unstructured text that are syntactically or semantically dependent on or related to the extracted data points; (c) associating the extracted data points with the words or phrases in the unstructured text that are syntactically or semantically dependent on, or related to, the extracted data points; (d) mapping each extracted data point (attribute) with its syntactically or semantically dependent word or phrase (value) in (c) to generate attribute value pairs; (e) standardizing each attribute in the attribute-value pairs according to a code that represents each attribute; (C) validating the standardized attribute-data pairs to ensure oncologic integrity by: identifying inconsistencies between standardized attribute pairs; and identifying medical errors; (D) outputting a list of validated, standardized attribute-
- the method further comprises (E) based on the list of attribute-value pairs generated in B, classifying like personal health information, and grouping types of patients in the patient population based on the personal health information associated with the patient population as belonging to a plurality of nodal addresses by (1) representing each nodal address as a discrete punctuated string of digits comprising a prefix, a middle, and a suffix that represent a set of preselected variables that partition sorted personal health information for each patient in the patient population using a sorting filter to provide a sorted set of personal health information for that population, and to identify patients satisfying each parameter in the patient population, and classified like personal health information into a clinically relevant set of health information; (2) reducing trillions of possible permutations to a reduced number of clinically meaningful permutations based on the discrete punctuated string of digits representing each nodal address that enable analysis of first behavioral and then consequent clinical and cost outcome variance from an ideal value, expressed as best clinical outcome at lowest possible cost, in a requisite
- the unstructured medical text is stored as an image or as text; or (b) the unstructured medical text is a notation from a doctor in form of sentences or phrases based on grammar rules relating to an oncologic condition of a patient evaluated by the doctor; or (c) the unstructured oncologic text includes one or more of demographic parameters, a simple indicator, a numerically based parameter, a standards based parameter, dates of service, medical history, medicines, diagnoses, allergies, immunization status, lab tests results, vital signs, and personal statistics.
- the equivalents of the data points comprise synonyms and morphological variations of the data points.
- the code that represents each attribute comprises a unified medical language system (UMLS) code.
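Standardizing attributes against a code table could be sketched as below. The codes shown are placeholders, not real UMLS concept identifiers, and the names `CODE_TABLE` and `standardize` are hypothetical; an actual system would look codes up in a UMLS release.

```python
# Placeholder codes only: real UMLS identifiers would come from a UMLS release.
CODE_TABLE = {
    "estrogen receptor": "CODE-0001",
    "tumor size": "CODE-0002",
}


def standardize(pairs, code_table):
    """Attach a code to each attribute; collect attributes that have no code."""
    standardized, unmapped = [], []
    for attribute, value in pairs:
        code = code_table.get(attribute)
        if code is None:
            # Keep track of attributes that could not be standardized.
            unmapped.append(attribute)
        else:
            standardized.append(
                {"code": code, "attribute": attribute, "value": value}
            )
    return standardized, unmapped


# Hypothetical attribute-value pairs from the extraction step.
standardized, unmapped = standardize(
    [("estrogen receptor", "positive"), ("ki-67", "20%")], CODE_TABLE
)
print(standardized, unmapped)
```

Reporting unmapped attributes separately, rather than dropping them silently, makes gaps in the code table visible.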
- the outputting is on a per patient basis, a per event basis or both.
- the outputting on a per event basis comprises e.g., diagnosis, treatment, progression (e.g., ECOG), or clinical outcome.
- clinical outcome comprises one or more of overall survival (OS), progression free survival (PFS), response metrics, quality of life metrics, incidence of drug toxicity, severity of drug toxicity, delivered dose intensity, drugs received, drug interval, drug duration, cost of care, or death
- the outputting is in the form of a JavaScript Object Notation (JSON) document.
- JSON: JavaScript Object Notation
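One possible shape for a per-patient JSON document with per-event entries is sketched below using Python's standard `json` module. The field names (`patient_id`, `events`, `type`, `pairs`) and the sample values are assumptions for illustration; the source does not define a schema.

```python
import json


def to_json(patient_id, events):
    """Serialize one patient's events and attribute-value pairs as a JSON document."""
    document = {"patient_id": patient_id, "events": events}
    # sort_keys gives a stable key order, which helps when comparing outputs.
    return json.dumps(document, indent=2, sort_keys=True)


# Hypothetical per-event output for a single patient.
events = [
    {"type": "diagnosis", "pairs": [{"attribute": "estrogen receptor", "value": "positive"}]},
    {"type": "treatment", "pairs": [{"attribute": "drug interval", "value": "21 days"}]},
]
output = to_json("patient-001", events)
print(output)
```

Nesting events under a patient covers both the per-patient and the per-event output described above in a single document.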
- the described invention provides a non-transitory computer readable storage medium storing computer program instructions for extracting objective oncologic data of prognostic significance from unstructured medical text in a natural language, which, when executed on a processor, cause the processor to perform operations comprising: A. receiving input comprising (i) one or more lists of data points; (ii) unstructured medical text comprising unstructured oncologic text and (iii) a database comprising an exhaustive dictionary of medical knowledge that identifies one or more data points as significant to diagnosis and prognosis; B.
- processing the inputs in A to generate lists of attribute-value pairs by: (a) identifying in the unstructured medical text data points and equivalents of the data points that are significant for diagnosis and prognosis; (b) extracting from the unstructured medical text all facts of known importance associated with words or phrases in the unstructured text that are syntactically or semantically dependent on or related to the extracted data points; (c) associating the extracted data points with the words or phrases in the unstructured text that are syntactically or semantically dependent on, or related to, the extracted data points; (d) mapping each extracted data point (attribute) with its syntactically or semantically dependent word or phrase (value) in (c) to generate attribute value pairs; (e) standardizing each attribute in the attribute-value pairs according to a code that represents each attribute; (C) validating the standardized attribute-data pairs to ensure oncologic integrity by: identifying inconsistencies between standardized attribute pairs; identifying medical errors; (D) outputting a list of validated, standardized attribute-value
- the non-transitory computer readable storage medium which, when executed on a processor, causes the processor to perform operations further comprising (E) based on the list of attribute-value pairs generated in B, classifying like personal health information, and grouping types of patients in the patient population based on the personal health information associated with the patient population as belonging to a plurality of nodal addresses by: (i) representing each nodal address as a discrete punctuated string of digits comprising a prefix, a middle, and a suffix that represent a set of preselected variables that partition sorted personal health information for each patient in the patient population using a sorting filter to provide a sorted set of personal health
- the unstructured medical text is stored as an image or as text; or (b) the unstructured medical text is a notation from a doctor in form of sentences or phrases based on grammar rules relating to an oncologic condition of a patient evaluated by the doctor; or (c) the unstructured oncologic text includes one or more of demographic parameters, a simple indicator, a numerically based parameter, a standards based parameter, dates of service, medical history, medicines, diagnoses, allergies, immunization status, lab tests results, vital signs, personal statistics.
- the equivalents of the data points comprise synonyms and morphological variations of the data points.
- the code that represents each attribute comprises a unified medical language system (UMLS) code.
- the outputting is on a per patient basis, a per event basis or both.
- the outputting on a per event basis comprises e.g., diagnosis, treatment, progression (e.g., ECOG), or clinical outcome.
- clinical outcome comprises one or more of overall survival (OS), progression free survival (PFS), response metrics, quality of life metrics, incidence of drug toxicity, severity of drug toxicity, delivered dose intensity, drugs received, drug interval, drug duration, cost of care, or death
- OS: overall survival
- PFS: progression free survival
- the described invention provides a system for extracting objective oncologic data of prognostic significance from unstructured medical text in a natural language, comprising: a first database comprising an exhaustive dictionary of medical knowledge that identifies one or more data points as significant to diagnosis and prognosis; a second database comprising personal health information data for a population of human subjects; wherein the first and second database are communicatively linked using a common patient identifier and through the use of database access using the common patient identifier; an extraction system comprising: a computer server comprising a processor comprising a clinical outcome tracking and analysis module communicatively linked to the first database, the second database and the network; and a memory to store computer program instructions, the computer program instructions when executed on the processor cause the processor to perform operations comprising: A.
- receiving input comprising (i) one or more lists of data points; (ii) unstructured medical text comprising unstructured oncologic text and (iii) a database comprising an exhaustive dictionary of medical knowledge that identifies one or more data points as significant to diagnosis and prognosis; B.
- processing the inputs in A to generate lists of attribute-value pairs by: (a) identifying in the unstructured medical text data points and equivalents of the data points that are significant for diagnosis and prognosis; (b) extracting from the unstructured medical text all facts of known importance associated with words or phrases in the unstructured text that are syntactically or semantically dependent on or related to the extracted data points; (c) associating each extracted data point with the words or phrases in unstructured text that are syntactically or semantically dependent on, or related to, each extracted data point to produce a set of extracted data points that are of prognostic significance; (d) mapping each extracted data point (attribute) with its syntactically or semantically dependent word or phrase (value) in (c) to generate attribute value pairs; (e) standardizing each attribute in the attribute-value pairs according to a code that represents each attribute; (C) validating the standardized attribute-data pairs to ensure oncologic integrity by: identifying inconsistencies between standardized attribute pairs; and identifying medical errors;
- the computer program instructions of the system when executed on the processor cause the processor to perform operations further comprising (E) based on the list of attribute-value pairs generated in B, classifying like personal health information, and grouping types of patients in the patient population based on the personal health information associated with the patient population as belonging to a plurality of nodal addresses by: (i) representing each nodal address as a discrete punctuated string of digits comprising a prefix, a middle, and a suffix that represent a set of preselected variables that partition sorted personal health information for each patient in the patient population using a sorting filter to provide a sorted set of personal health information for that population, and to identify patients satisfying each parameter in the patient population, and classified like personal health information into a clinically relevant set of health information; (ii) reducing trillions of possible permutations to a reduced number of clinically meaningful permutations based on the discrete punctuated string of digits representing each nodal address that enable analysis of first
- the unstructured medical text is stored as an image or as text; or (b) the unstructured medical text is a notation from a doctor in form of sentences or phrases based on grammar rules relating to an oncologic condition of a patient evaluated by the doctor; or (c) the unstructured oncologic text includes one or more of demographic parameters, a simple indicator, a numerically based parameter, a standards based parameter, dates of service, medical history, medicines, diagnoses, allergies, immunization status, lab tests results, vital signs, personal statistics; or (d) the equivalents of the data points comprise synonyms and morphological variations of the data points; or (e) the code that represents each attribute comprises a unified medical language system (UMLS) code; or (f) the outputting is on a per patient basis, a per event basis or both.
- the outputting on a per event basis comprises e.g., diagnosis, treatment, progression (e.g., ECOG), or clinical outcome.
- clinical outcome comprises one or more of overall survival (OS), progression free survival (PFS), response metrics, quality of life metrics, incidence of drug toxicity, severity of drug toxicity, delivered dose intensity, drugs received, drug interval, drug duration, cost of care, or death
- the outputting is in the form of a JavaScript Object Notation (JSON) document.
- Figure 1 shows a high-level diagram of a communications system, in accordance with one embodiment
- Figure 2 shows a system architecture of an extraction system for identifying and extracting prognostically significant data from unstructured text, in accordance with one embodiment
- Figure 3 shows an example of unstructured oncologic text, in accordance with one embodiment
- Figure 4 shows an exemplary table of attribute-value pairs, in accordance with one embodiment
- Figure 5 illustratively depicts a flow diagram of a method for identifying and extracting prognostically significant data from unstructured text, in accordance with one embodiment
- Figure 6 shows a high-level block diagram of a computer for an extraction system, in accordance with one embodiment.
- condition refers to a variety of health states and is meant to include disorders or diseases caused by any underlying mechanism or disorder.
- data integrity refers to the extent to which all data are complete, consistent, and accurate throughout the data.
- diagnosis and its other grammatical forms is used herein to refer to a determination of the nature of a disease.
- a group of words that can stand alone and make a complete thought that consists of a subject and a predicate is an independent clause or a sentence.
- the subject is the thing that is the focus of the sentence.
- the predicate tells the action that the subject is taking or something about the subject.
- a compound sentence is one with two independent clauses joined by a conjunction or a semicolon.
- a noun is a part of speech that denotes a person, animal, place, thing, quality, idea, activity, or feeling.
- a noun can be singular, plural, or show possession.
- a pronoun is a word that takes the place of a noun, like: “I”, “he”, “she”,
- a verb is a part of speech used to describe an action, state of being or occurrence, and can be a main verb or a helping verb. Verbs also indicate tense and sometimes change their form to show past, present, or future tense. State of being (linking) verbs link the subject to the rest of the sentence.
- An adjective is a part of speech that describes, identifies or further defines a noun or a pronoun. For example, an adjective can add meaning by telling how much, which one, what kind, or describing it in other ways.
- An article (e.g., “a”, “an”, or “the”) is an adjective used to point out or refer to a noun. “A” and “an” are indefinite articles. “The” is a definite article.
- An adverb is a part of speech that modifies or qualifies a verb, an adjective, or other adverb or a word group telling, e.g., when, where, how, why, in what manner, or to what extent an action is performed.
- a preposition is a part of speech that shows a relationship between a noun or pronoun and some other nearby word or element in the rest of the sentence. A sentence should not end with a preposition.
- a conjunction is a part of speech used to connect clauses, phrases, or sentences, or to coordinate words in the same clause. A conjunction should not be used to start a sentence.
- Every sentence needs a punctuation mark (e.g., a period, exclamation mark, or question mark) at the end of it.
- An apostrophe is a punctuation mark used to indicate either possession or the omission of letters or numbers.
- a colon is used to separate a sentence from a list of items, between two sentences when the second one explains the first, and to introduce a long direct quote.
- a comma separates things in a series and goes wherever there is a pause in the sentence. For example, commas surround the name of a person being addressed, separate the day of the month from the year in a date, and separate a town from the state. When the clauses of a compound sentence are joined by a conjunction, a comma is usually placed before the conjunction.
- Parentheses are a pair of round brackets used to mark off a parenthetical word or phrase.
- a parenthetical word or phrase is a word, clause or sentence inserted as an explanation or an aside into a passage that is grammatically complete without it.
- a semicolon is used to take the place of a conjunction, and is placed before introductory words like “therefore” or “however.” It is also used to separate a list of things if there are commas within each unit.
- ICD-10 The International Statistical Classification of Diseases and Related Health Problems, 10th Revision
- WHO World Health Organization
- the code set allows more than 14,400 different codes and permits the tracking of many new diagnoses.
- ICD-10 is an updated version of the ICD-9 code sets.
- Health plan systems and health care providers are required by the Health Insurance Portability and Accountability Act (HIPAA) to use a standard code set to indicate diagnoses and procedures on transactions. For diagnoses, the ICD-9-CM code set is used.
- HIPAA Health Insurance Portability and Accountability Act
- the ICD-9-CM procedure code set is used for inpatient hospital procedures.
- CPT Current Procedural Terminology
- HCPCS Healthcare Common Procedure Coding System
- “information extraction” refers to the act of extracting, by computer, recognizable information from documents written in a human language. It is distinguished from “information retrieval”, which refers to the act of identifying, by computer, documents written in a human language that are relevant to some specific question. This often includes, e.g., statistical analysis of vocabulary (to determine subject matter) and a considerable amount of natural language processing. It is easier to process natural-language texts in a way that falls short of full understanding, but still allows some of the meaning to be extracted.
- natural language refers to an actual language as used in ordinary discourse to communicate in everyday life.
- natural language processing refers to the use of computers to process information expressed in human (natural) languages. Getting computers to understand a human language, which includes signal processing/speech recognition, syntactic analysis or parsing to determine sentence structure, semantic analysis to determine meaning, and pragmatics/knowledge representation to encode the meaning into a computer language, is a difficult, largely unsolved problem.
- oncologic refers to relating to the branch of medicine that deals with the physical, chemical and biologic properties and features of neoplasms, including causation, pathogenesis and treatment.
- parsing refers to the analysis, by computer, of the structure of statements in a human or artificial language. Programs that accept natural language input generally have to parse sentences in human languages.
- processing refers to both interpretation (meaning understanding) and generation (meaning production).
- prognosis and its other grammatical forms as used herein refers to a prediction about the probable course and/or outcome of a disease.
- prognostic refers to relating to prognosis; a symptom upon which a prognosis is based or one indicative of the likely outcome.
- speech recognition refers to use of computers to recognize spoken words.
- the same spoken word does not produce entirely the same sound waves when pronounced by different individuals, or even when pronounced by the same person on more than one occasion.
- the computer must digitize the sound, transform it to discard unneeded information, and try to match it with words stored in a dictionary.
- Most speech recognition systems are speaker-dependent; they have to be trained to recognize a particular person’s speech and then can distinguish thousands of words (but only the words on which they were trained). Speaker-independent speech recognition is less effective.
- semantic analysis refers to the use of contextual clues surrounding words and phrases in natural language text so that the computer can better understand the implied or practical meaning and relevance of that text.
- Machine-driven semantic analysis can extract relevant and useful information from large bodies of unstructured data; find an answer to a question without having to ask a human; discover the meaning of colloquial speech, and uncover specific meanings of words that are not commonly used in our own language.
- syntax refers to the set of rules that specify how the symbols of a language can be put together to form meaningful statements.
- A “syntax error” is a place in a program where the syntax rules of the programming language were not followed.
- FIG. 1 shows a high-level diagram of a communications system 100, in accordance with one or more embodiments.
- Communications system 100 includes one or more computing devices 102-A, . . ., 102-N (collectively referred to as computing devices 102).
- Computing devices 102 may comprise any type of computing device, such as, e.g., a computer, a workstation, a tablet, a mobile device, a server, or a database.
- Computing devices 102 are operated by users for communicating via network 104.
- Network 104 may include any type of network or combination of different types of networks, and may be implemented in a wired and/or a wireless configuration.
- network 104 may include one or more of the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a Fibre Channel storage area network (SAN), a cellular communications network, etc.
- LAN local area network
- WAN wide area network
- SAN Fibre Channel storage area network
- a user may interact with computing device 102 for the processing of data.
- a user such as a doctor or other medical professional, may interact with computing device 102 to store medical information of a patient in a patient medical record database (not shown). The medical information is typically input into
- unstructured data e.g., text or images.
- unstructured text may be in the form of sentences or phrases generally following English grammar rules.
- embodiments of the present invention provide for an extraction system 106, which is configured to generate a dictionary of prognostically significant data points (e.g., data points of medical significance from a prognostic standpoint) and to identify and extract prognostically significant data points from unstructured medical text using the dictionary, thereby transforming subjective data into objective data.
- Extraction system 106 in accordance with embodiments of the invention thus provides for improvements in computer related technology by facilitating the identification and extraction of prognostically significant data points from unstructured medical text.
- Figure 2 illustratively depicts a system architecture 200 of extraction system 106 for the identification and extraction of prognostically significant data from unstructured text, in accordance with one or more embodiments.
- Extraction system 106 receives input 202, e.g., from a user (e.g., a doctor or medical professional) interacting with computing device 102 via network 104 in Figure 1.
- Input 202 includes unstructured data, such as unstructured text 204, a list of data points from advisory boards, and an exhaustive dictionary of medical knowledge from advisory boards of what is significant to diagnosis, prognosis or both in the context of, e.g., oncology.
- Unstructured text 204 is any textual data that is not structured (such as, e.g., in fields or tables).
- unstructured text 204 may be in the form of sentences or phrases generally following grammar rules (e.g., English grammar rules).
- the unstructured data of input 202 includes unstructured image data (e.g., charts).
- the unstructured image data is converted to unstructured text 204, e.g., using methods known in the art.
- unstructured text 204 includes unstructured medical text of a patient.
- the unstructured medical text may be notations of a patient from a doctor or other medical professional recorded in the patient’s medical records.
- unstructured medical text of the patient may include, for example, demographic parameters.
- Exemplary parameters include, without limitation, sex, age, ethnicity, comorbidities, tobacco use, medical record number, source of insurance, primary care medical professional, referring medical professional, hospital, approved service vendors (e.g., pharmacy), disease specific clinical and molecular phenotype, therapy intent, stage of therapy with respect to progression of disease, biomarkers and cost of care.
- the element Eastern Cooperative Oncology Group (ECOG) performance status/quality of life metrics refers to a scale by which the quality of life of the patient over time can be tracked. It is part of the demographic parameter disease specific clinical molecular phenotype, i.e., the stage of a patient’s health at the start of therapy.
- a comparison of ECOG at start of therapy e.g., ECOG of 3
- ECOG after therapy e.g., ECOG of 2
- the unstructured medical text may be a simple indicator (e.g., positive, negative, not accessed), a numerically based parameter (e.g., tumor size), a standards based parameter (e.g., tumor grade), dates of service, medical history, medication, diagnoses, allergies, immunization status, laboratory test results, vital signs, personal statistics, or any other suitable medical information of the patient.
- the unstructured medical text of the patient is
- Unstructured oncologic text is particularly difficult to
- Figure 3 illustratively shows an example of unstructured oncologic text 300, in accordance with one embodiment.
- Unstructured oncologic text 300 may be unstructured text 204.
- Unstructured oncologic text 300 may contain notations from a doctor recorded in a patient’s medical records after evaluating the patient for breast cancer.
- Unstructured oncologic text 300 includes, e.g., a chief complaint of the patient, a history of the present illness, dates of service, allergies of the patient, and current medications of the patient.
- Input 202 of Figure 2 also includes data points 206.
- Data points 206 are data points that are significant for diagnosis and prognosis.
- extraction system 106 is configured to analyze unstructured text 204 to determine oncologic information of diagnostic and prognostic significance
- data points 206 are data points of oncologic
- Data points 206 may be determined by the particular doctor evaluating the patient, other doctors or medical experts, advisory boards, medical journals or publications, etc.
- Examples of data points 206 that are significant for breast cancer diagnosis and prognosis include presence (+) or absence (-) of the estrogen receptor (ER), human epidermal growth factor receptor 2 (Her2 receptor), and progesterone receptor (PR).
- ER estrogen receptor
- Her2 receptor human epidermal growth factor receptor 2
- PR progesterone receptor
- Data point engine 208 of extraction system 106 is configured to receive data points 206 and identify equivalent data points of data points 206.
- Equivalent data points of data points 206 are data points that are synonyms of data points 206 or morphological variations of data points 206.
- the equivalent data points of data points 206 may be identified by, e.g., referencing a list, a table, or a database of equivalent data points.
- data point engine 208 identifies the following equivalent data points for the data point of estrogen receptor: human estrogen receptor and ER.
- the equivalent data points of data point 206 are data points that, in combination, are equivalent to data point 206.
- a diagnosis of triple negative breast cancer means that estrogen receptors, Her2 receptors, and progesterone receptors are not present.
- Data point engine 208 thus identifies the following equivalent data points for the data point of triple negative breast cancer: ER neg, HER2 neg, and PR neg.
- Data point engine 208 stores data points 206 and the identified equivalent data points of data points 206 in dictionary 210 (e.g., a dictionary database).
- Dictionary 210 thus provides for an exhaustive list of all data points that are significant for diagnosis or prognosis.
- data point engine 208 receives data points 206 and identifies equivalent data points of data points 206 to generate dictionary 210 in a preprocessing step. In this manner, extraction system 106 can analyze a plurality of inputs 202 of unstructured text 204 without having to regenerate dictionary 210 for each input 202. In one embodiment, instead of receiving data points 206 and identifying equivalent data points of data points 206 to generate dictionary 210, dictionary 210 is directly received as input 202 or was previously generated and stored in a database (not shown) and retrieved by extraction system 106 as necessary.
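The dictionary generation described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `build_dictionary` and the `EQUIVALENTS` table are hypothetical, and the only equivalents listed are the ones the text gives for the estrogen receptor. Combination equivalents (such as triple negative breast cancer) are deliberately left out of this sketch.

```python
# Hypothetical table of equivalent data points (synonyms and morphological
# variations). The entries mirror the estrogen receptor example in the text.
EQUIVALENTS = {
    "estrogen receptor": ["human estrogen receptor", "ER"],
}


def build_dictionary(data_points, equivalents):
    """Map every surface form (lower-cased) to its canonical data point."""
    dictionary = {}
    for point in data_points:
        dictionary[point.lower()] = point
        for variant in equivalents.get(point, []):
            dictionary[variant.lower()] = point
    return dictionary


# Built once as a preprocessing step, then reused for every input text.
dictionary = build_dictionary(
    ["estrogen receptor", "progesterone receptor"], EQUIVALENTS
)
```

Because the dictionary is built once and only read afterwards, many unstructured texts can be analyzed against it without regenerating it.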
- Natural language processor 212 receives unstructured text 204 and analyzes unstructured text 204 using dictionary 210 to extract all facts of known importance, e.g., cancer type, date of visit, ICD9 classification, stage, ER (+/-), Her2 (+/-), PD (+/-). For example, natural language processor 212 may analyze unstructured text 204 to identify and extract prognostically significant data points in dictionary 210 that are present in unstructured text 204. The extracted data points are associated with words or phrases in unstructured text 204 that are syntactically or semantically dependent on (or related to) the respective extracted data point. In one embodiment, natural language processor 212 determines the words or phrases from unstructured text 204 that are syntactically or
- the output of natural language processor 212 is a set of extracted data points, identified as being prognostically significant in dictionary 210, each associated with the words or phrases that are syntactically or semantically dependent on it.
- unstructured text 204 may state: “ER was positive.”
- Natural language processor 212 would identify ER as being a significant data point (as indicated by dictionary 210) and determine that ER is syntactically dependent on the word “positive.”
- Mapping module 214 maps each extracted data point with its respective syntactically or semantically dependent word or phrase, as determined by natural language processor 212, to generate attribute-value pairs.
- each extracted data point is an attribute and its respective syntactically or semantically dependent word or phrase is the value.
- where unstructured text 204 comprises “ER was positive,” ER is the attribute and positive is the value.
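The “ER was positive” example can be sketched in Python as follows. This is a deliberately simplified stand-in for natural language processor 212 and mapping module 214: instead of real syntactic or semantic dependency analysis, it pairs each dictionary term with the nearest following value word in the same sentence. `DICTIONARY`, `VALUE_WORDS`, and `extract_pairs` are assumed names, and negation (e.g., “not positive”) is not handled.

```python
import re

# Hypothetical dictionary of surface forms mapped to canonical attributes.
DICTIONARY = {"er": "ER", "pr": "PR", "her2": "HER2"}
# Hypothetical vocabulary of words that can serve as values.
VALUE_WORDS = {"positive", "negative"}


def extract_pairs(text):
    """Return (attribute, value) tuples found in unstructured text."""
    pairs = []
    # Split into sentences so a value is never taken from another sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", sentence)]
        for i, token in enumerate(tokens):
            if token not in DICTIONARY:
                continue
            # Proximity heuristic: first value word after the attribute.
            for later in tokens[i + 1:]:
                if later in VALUE_WORDS:
                    pairs.append((DICTIONARY[token], later))
                    break
    return pairs


pairs = extract_pairs("ER was positive. PR was negative.")
```

A production system would replace the proximity heuristic with a dependency parser so that values are attached on grammatical grounds rather than word order.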
- unstructured text comprises:
- the attribute-value pairs may be stored in any suitable data structure, such as, e.g., a collection of tuples (in the form of, e.g., attribute:value or <attribute, value>) or a table having rows of attributes and corresponding values.
- FIG. 4 illustratively shows an exemplary table 400 of attribute-value pairs, in accordance with one embodiment.
- the attribute-value pairs in table 400 may be identified from unstructured medical text 300 in Figure 3.
- Table 400 includes attributes 402 with corresponding values 404. Attributes 402 are prognostically significant data points extracted from unstructured medical text 300 using dictionary 210. Values 404 are syntactically or semantically dependent on the attribute 402 in unstructured medical text 300.
- Mapping module 214 maps the attributes with respective values and stores the mapping as table 400.
- Standardization module 216 in Figure 2 is configured to standardize the attributes in the attribute-value pairs.
- the attributes are standardized according to the unified medical language system (UMLS), which is an industry accepted standard.
- UMLS unified medical language system
- the UMLS code for each attribute is obtained by looking up a database that has mapped the UMLS codes and the corresponding attributes (see, e.g.,
- Each attribute is assigned the UMLS code that represents that particular attribute. For example, where UMLS code C12345 represents ER, the attribute in the attribute-value pair <ER, positive> is assigned UMLS code C12345. The attribute-value pair is thus standardized to <C12345, positive>. It should be understood that standardization module 216 may
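The standardization step can be sketched as a simple table lookup. The sketch below assumes a hypothetical in-memory mapping `UMLS_CODES`; C12345 is the placeholder code used in the text, not a real UMLS identifier, and `standardize` is an illustrative name rather than anything defined by the source.

```python
# Hypothetical attribute-to-code table. C12345 is the placeholder from the
# text and should not be read as an actual UMLS concept identifier.
UMLS_CODES = {"ER": "C12345"}


def standardize(pairs, codes):
    """Replace each attribute with its code; collect attributes with no code."""
    standardized, unmapped = [], []
    for attribute, value in pairs:
        code = codes.get(attribute)
        if code is None:
            unmapped.append((attribute, value))
        else:
            standardized.append((code, value))
    return standardized, unmapped


standardized, unmapped = standardize([("ER", "positive")], UMLS_CODES)
```

Returning unmapped pairs separately, rather than dropping them, is a design choice of this sketch so that missing codes can be reviewed.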
- Validation module 218 is configured to validate the standardized attribute-value pairs to ensure integrity. Validation is performed through the use of standard lists that map ICD9 codes to cancer types, and comparing attribute-value pairs with the cancer type referenced in the extracted data. In particular, validation module 218 may identify
- Validation module 218 may also identify standardized attribute-value pairs that cannot be correct. Validation module 218 validates the standardized attribute-value pairs at the field level and amongst those fields. For example, a patient’s performance is evaluated according to the Eastern Cooperative Oncology Group (ECOG) scale, ranging from Grade 0 (i.e., fully active) to Grade 5 (i.e., dead). A standardized attribute-value pair indicating ECOG is 7 cannot be correct. Validation module 218 will identify the attribute-value pair indicating ECOG is 7 for, e.g., manual review or correction or removal.
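The field-level ECOG check can be sketched as a range test. This is a minimal illustration, assuming ECOG values have already been parsed to integers; `ECOG_GRADES` and `validate` are hypothetical names, and only the ECOG rule from the text (Grade 0 to Grade 5) is implemented.

```python
# ECOG performance status runs from Grade 0 (fully active) to Grade 5 (dead).
ECOG_GRADES = range(0, 6)


def validate(pairs):
    """Split pairs into valid ones and ones flagged for manual review."""
    valid, flagged = [], []
    for attribute, value in pairs:
        if attribute == "ECOG" and value not in ECOG_GRADES:
            flagged.append((attribute, value))
        else:
            valid.append((attribute, value))
    return valid, flagged


valid, flagged = validate([("ECOG", 2), ("ECOG", 7), ("ER", "positive")])
```

Here the pair indicating an ECOG of 7 ends up in `flagged`, which corresponds to the manual review, correction, or removal path described above.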
- ECOG Eastern Cooperative Oncology Group
- Validation module 218 will identify the attribute-value pair indicating an ICD9 code of 174.9.
- Modelling module 220 models the validated, standardized attribute-value pairs as a model that is optimal for analyzing and deriving actionable insights (i.e., a model that best fits analysis of the oncologic data). For example, a patient diagnosed with stage 1 cancer is subsequently diagnosed with stage 2 cancer. Modelling module 220 models the validated, standardized attribute-value pairs to identify the progression of the cancer from stage 1 to stage 2, thus providing actionable insights as the patient would have received different treatment for each stage. Modeling of the data is done in a fashion that enables grouping of data points that belong to a particular longitudinal point in the patient’s journey through cancer. All data points that are required to enable diagnosis and prognosis analysis are grouped together. The result of the modeling is used to look up which data points are to be extracted from the free text.
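One way to picture the longitudinal grouping and the stage 1 to stage 2 example is the Python sketch below. It is an assumption-laden illustration, not the modelling approach of the source: records are taken to be `(visit_date, attribute, value)` tuples with ISO-format date strings, and `group_by_visit` and `stage_progressions` are hypothetical helper names.

```python
from collections import defaultdict


def group_by_visit(records):
    """Group (visit_date, attribute, value) records by date, oldest first."""
    grouped = defaultdict(dict)
    for visit_date, attribute, value in records:
        grouped[visit_date][attribute] = value
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return dict(sorted(grouped.items()))


def stage_progressions(timeline):
    """Return (date, old_stage, new_stage) for each change in stage."""
    changes = []
    previous = None
    for visit_date, attributes in timeline.items():
        stage = attributes.get("stage")
        if stage is None:
            continue
        if previous is not None and stage != previous:
            changes.append((visit_date, previous, stage))
        previous = stage
    return changes


records = [
    ("2017-06-01", "stage", 2),
    ("2017-01-10", "stage", 1),
    ("2017-01-10", "ER", "positive"),
]
timeline = group_by_visit(records)
changes = stage_progressions(timeline)
```

Grouping by visit date keeps together all data points that belong to one longitudinal point in the patient's journey, which is what makes the stage change detectable.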
- Extraction system 106 analyzes input 202 to provide output 222.
- Output 222 includes a list of the validated, standardized attribute-value pairs.
- output 222 may include a list of attribute-value pairs in the format of attribute:value, <attribute, value>, or any other suitable format.
- extraction system 106 receives input 202 to provide the list of attribute-value pairs as output 222 on a per patient basis.
- extraction system 106 also may provide the list of attribute-value pairs as output 222 on a per event basis (e.g., diagnosis, treatment, progression (e.g., ECOG), outcomes (e.g., overall survival (OS), progression free survival (PFS), toxicity)).
- OS overall survival
- PFS progression free survival
- clinical outcomes comprise at least one of survival, response metrics, quality of life metrics, incidence of drug toxicity, severity of drug toxicity, delivered dose intensity, drugs received, drug interval, drug duration, cost of care, and death.
- Output 222 may be in the format of a JavaScript Object Notation (JSON) document; however, any other suitable format may also be employed.
- JSON JavaScript Object Notation
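A per-patient JSON output could look like the sketch below. The document layout (`patient_id`, `attributes`, and the key names inside each entry) is an assumption made for illustration; the source does not specify a schema. Only the standard-library `json` module is used.

```python
import json


def to_json(patient_id, pairs):
    """Serialize a patient's attribute-value pairs as a JSON document."""
    document = {
        "patient_id": patient_id,  # hypothetical identifier field
        "attributes": [
            {"attribute": attribute, "value": value}
            for attribute, value in pairs
        ],
    }
    return json.dumps(document, indent=2)


output = to_json("patient-001", [("C12345", "positive")])
```

The same structure could carry an additional event field if output is produced on a per event basis instead.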
- extraction system 106 extracts significant data points from unstructured medical text 204.
- the extracted significant data points provide useful medical (e.g., oncologic) information of prognostic significance that can be used to provide actionable insights.
- Each clinical outcome tracking and analysis nodal address (CNA) is a subset of that list.
- Like personal health information can be classified and types of patients in the patient population grouped based on personal health information associated with the patient population by generating and assigning a plurality of nodal addresses within a computer containing a processor comprising a first clinical outcome tracking and analysis module.
- a patient is classified into one or more Clinical outcome tracking and analysis Nodal Addresses (CNAs) based on the list of attribute-value pairs of output 222 determined by extraction system 106 from unstructured text 204 for that patient.
- the CNAs represent a set of preselected variables that can be used to classify groups of patients (or data) into clinically relevant sets.
- the list of attribute-value pairs of output 222 are used to generate a unique CNA for each combination of prognostically significant data points.
- the CNA is a list of variables (as a function of a letter representing the variable and a number representing the selection within the variable).
- the letter A may represent the sex or gender variable and numbers 1 and 2 represent female and male patient
- the letter B may represent the race variable and number 1 through 4 represent different races.
- a CNA may be represented as A1-2, B1-4, . . ., N1.
- the CNA is represented as a plurality of discrete strings of digits separated by periods, where each string of digits indicates one or more variables (e.g., disease, phenotype, therapy type, progression/track, sex, etc.).
- a first string of digits may represent a particular disease
- a second string of digits may represent a type of disease
- a third string of digits may indicate a subtype of the disease
- a further string of digits may indicate a phenotype.
- the first string of digits may be 01 indicating cancer
- the second string of digits may be 02 indicating breast oncology
- a third string of digits may be 01 indicating breast cancer
- a fourth string of digits may be 1201 representing particular characteristics of a phenotype such that the nodal address is 01.02.01.1201. It should be understood that the nodal address may include any number of strings of digits and is not limited to four strings.
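The punctuated nodal address from this example can be assembled and split with a few lines of Python. The helper names `format_nodal_address` and `parse_nodal_address` are hypothetical; the segment values are the ones given in the text.

```python
def format_nodal_address(segments):
    """Join digit strings with periods, e.g. ['01', '02'] -> '01.02'."""
    if not segments:
        raise ValueError("a nodal address needs at least one segment")
    for segment in segments:
        if not segment.isdigit():
            raise ValueError(f"segment {segment!r} is not a string of digits")
    return ".".join(segments)


def parse_nodal_address(address):
    """Split a nodal address back into its digit strings."""
    return address.split(".")


# 01 = cancer, 02 = breast oncology, 01 = breast cancer, 1201 = phenotype.
address = format_nodal_address(["01", "02", "01", "1201"])
```

Segments are kept as strings so that leading zeros survive, and no limit is placed on the number of segments, in line with the statement that the address is not limited to four strings.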
- Each CNA may be associated with one or more bundles of predetermined patient care services (e.g., treatment plans). Each bundle may also be associated with one or more nodes. The services included in each bundle may be determined by one or more medical professionals, a hospital, a group, an insurance company, etc. to optimize patient care and/or cost. In one example, a bundle may indicate a number of imaging scans, a drug or choice of drugs, a schedule of when to administer the drugs, an operation or procedure, a number and frequency of follow up visits, etc. The bundling of patient care services may be particularly useful for risk contracting.
- each bundle corresponding to a nodal address may have a predetermined cost allowing a user (e.g., doctor, patient, etc.) to choose an appropriate bundle.
- the cost may be determined or negotiated based on historical data associated with that particular disease or nodal address.
- the bundling of services provides cost certainty to an insurance company and/or hospital for a particular disease. This also reduces the cost of processing and maintaining records. Additionally, medical professionals will know ahead of time the predetermined course of treatment, which provides incentives to physicians to obtain better outcomes at lower costs.
- Each nodal address reduces trillions of possible permutations to a reduced number of clinically meaningful permutations based on the discrete punctuated string of digits representing each nodal address. According to some embodiments, this enables analysis of first behavioral and then consequent clinical and cost outcome variance from an ideal value, expressed as best clinical outcome at lowest possible cost, in a requisite time needed to alert for necessary care and avoidance of unnecessary care, thereby increasing the value of care, meaning better clinical outcomes at a lowest possible cost. According to some embodiments, the CNA enables identification of a specific patient as a candidate for a specific treatment, clinical trial, or drug.
- the CNA provides an analytic interface with connections to claims data to support health plans, hospitals and physician practices in managing doctors and other health care providers.
- CNAs reduce processing requirements and time for processing to make real-time monitoring efficient based on the discrete punctuated string of digits representing each nodal address and based on the reduction in permutations. This real-time monitoring enables prediction of key points in time at which, for example, behavioral variance is likely to occur and interrupts treatment flow to avoid over-/under-utilization of care to prevent the behavioral variance.
- Figure 5 shows a flow diagram of a method 500 of operation of the extraction system 106, in accordance with one or more embodiments.
- unstructured medical text is received.
- the unstructured medical text may be sentences or phrases in the form of a natural language (e.g., based on grammar rules (e.g., English language grammar rules)).
- the unstructured medical text may be notations of a patient from a doctor.
- the unstructured medical text includes unstructured oncologic text.
- At step 504, data points determined from a dictionary database are identified in the unstructured medical text.
- the unstructured medical text may be parsed to identify the data points.
- the dictionary database is generated (e.g., as a preprocessing step).
- a plurality of data points is received that are known to be diagnostically and prognostically significant.
- the plurality of data points may be determined from the doctor evaluating the patient, other doctors or medical experts, advisory boards, or any other suitable source.
- Equivalent data points of the plurality of data points are determined.
- the equivalent data points are data points that are synonyms of the plurality of data points or morphological variations of the plurality of data points.
- the plurality of data points and the equivalent data points are stored to generate the dictionary database.
- a value associated with each of the data points is determined from the unstructured medical text.
- the value may be syntactically or semantically dependent on its respective data points. For example, probabilistic or semantic analysis may be performed to identify a value from the unstructured medical text for each of the data points.
- each of the data points is mapped to its respective value to generate attribute-value pairs.
- the mapping may store the attribute-value pairs as a list of attribute-value pairs, a collection of tuples, a table, or any other suitable data structure.
- the data points i.e., attributes in the attribute-value pairs
- each attribute is assigned or converted to a corresponding UMLS code.
- Other standardizations may be employed in accordance with the present principles.
- the data points and their respective values are validated to ensure integrity.
- the data points and their respective values may be validated by identifying inconsistencies with the set of data points and values or by identifying data points having a respective value that cannot be correct.
- At step 512, the data points and their respective values are modelled to provide actionable insight.
- data points and their respective values are extracted from the unstructured medical text to provide useful information of prognostic and diagnostic significance. These data points and their respective values can be further analyzed to provide actionable insights. For example, the data points and their respective values can be employed to assign a CNA to a patient.
- the CNA may be represented as a discrete punctuated string of digits each representing a set of preselected variables
- Systems, apparatuses, and methods described herein may be implemented using digital circuitry, or using one or more computers using well-known computer processors, memory units, storage devices, computer software, and other components.
- a computer includes a processor for executing instructions and one or more memories for storing instructions and data.
- a computer may also include, or be coupled to, one or more mass storage devices, such as one or more magnetic disks, internal hard disks and removable disks, magneto-optical disks, optical disks, etc.
- Systems, apparatus, and methods described herein may be implemented using computers operating in a client-server relationship.
- the client computers are located remotely from the server computer and interact via a network.
- the client-server relationship may be defined and controlled by computer programs running on the respective client and server computers.
- Systems, apparatus, and methods described herein may be implemented within a network-based cloud computing system.
- a server or another processor that is connected to a network communicates with one or more client computers via a network.
- a client computer may communicate with the server via a network browser application residing and operating on the client computer, for example.
- a client computer may store data on the server and access the data via the network.
- a client computer may transmit requests for data, or requests for online services, to the server via the network.
- the server may perform requested services and provide data to the client
- the server may also transmit data adapted to cause a client computer to perform a specified function, e.g., to perform a calculation, to display specified data on a screen, etc.
- the server may transmit a request adapted to cause a client computer to perform one or more of the method steps described herein, including one or more of the steps of Figure 5.
- Certain steps of the methods described herein, including one or more of the steps of Figure 5, may be performed by a server or by another processor in a network-based cloud computing system.
- Certain steps of the methods described herein, including one or more of the steps of Figure 5, may be performed by a client computer in a network-based cloud computing system.
- the steps of the methods described herein, including one or more of the steps of Figure 5, may be performed by a server and/or by a client computer in a network-based cloud computing system, in any combination.
- Systems, apparatus, and methods described herein may be implemented using a computer program product tangibly embodied in an information carrier, e.g., in a non-transitory machine-readable storage device, for execution by a programmable processor; and the method steps described herein, including one or more of the steps of Figure 5, may be implemented using one or more computer programs that are executable by such a processor.
- a computer program is a set of computer program instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result.
- a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing
- A high-level block diagram 600 of an example computer that may be used to implement systems, apparatus, and methods described herein is depicted in Figure 6.
- Computer 602 includes a processor 604 operatively coupled to a data storage device 612 and a memory 610.
- Processor 604 controls the overall operation of computer 602 by executing computer program instructions that define such operations.
- Computer 602 may also include one or more network interfaces 606 for communicating with other devices via a network.
- Computer 602 may also include one or more input/output devices 608 that enable user interaction with computer 602 (e.g., display, keyboard, mouse, speakers, buttons, etc.).
- Processor 604 may include both general and special purpose
- Processor 604 may include one or more central processing units (CPUs), for example.
- Processor 604, data storage device 612, and/or memory 610 may include, be supplemented by, or incorporated in, one or more application-specific integrated circuits (ASICs) and/or one or more field programmable gate arrays (FPGAs).
- ASICs application-specific integrated circuits
- FPGAs field programmable gate arrays
- Data storage device 612 and memory 610 each include a tangible non-transitory computer readable storage medium.
- Data storage device 612 and memory 610 may each include high-speed random access memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate synchronous dynamic random access memory (DDR RAM), or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices such as internal hard disks and removable disks, magneto-optical disk storage devices, optical disk storage devices, flash memory devices, and semiconductor memory devices, such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), and digital versatile disc read-only memory (DVD-ROM) disks.
- DRAM dynamic random access memory
- SRAM static random access memory
- DDR RAM double data rate synchronous dynamic random access memory
- EPROM erasable programmable read-only memory
- EEPROM electrically erasable programmable read-only memory
- CD-ROM compact disc read-only memory
- DVD-ROM digital versatile disc read-only memory
- Input/output devices 608 may include peripherals, such as a printer, scanner, display screen, etc.
- Input/output devices 608 may include a display device such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor for displaying information to the user, a keyboard, and a pointing device such as a mouse or a trackball by which the user can provide input to computer 602.
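The client/server exchange described in the network-based passages above (a client computer transmits a request, and the server performs the requested service and provides data back) can be illustrated with a minimal sketch. It is not taken from the patent: the handler class, the function names, the JSON message shape, and the toy "sum" service are all hypothetical, and only the Python standard library is used.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ServiceHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: performs a requested service and returns data."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        # The "requested service" is a toy calculation (an assumption for illustration).
        body = json.dumps({"sum": sum(request["values"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def request_service(url, values):
    """Hypothetical client side: transmits a request and reads the server's reply."""
    payload = json.dumps({"values": values}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as response:
        return json.loads(response.read())


def run_demo():
    """Start the server on a free local port, send one request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), ServiceHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return request_service(url, [1, 2, 3])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` runs both roles in one process for convenience; in the arrangement the description contemplates, the client and server would be separate machines connected by a network.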
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- Epidemiology (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2018/013118 WO2019139570A1 (en) | 2018-01-10 | 2018-01-10 | A system and method for extracting oncological information of prognostic significance from natural language |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3738054A1 (en) | 2020-11-18 |
EP3738054A4 (en) | 2021-08-18 |
Family
ID=67219872
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18900405.4A Pending EP3738054A4 (en) | 2018-01-10 | 2018-01-10 | A system and method for extracting oncological information of prognostic significance from natural language |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3738054A4 (en) |
WO (1) | WO2019139570A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116910817B (en) * | 2023-09-13 | 2023-12-29 | 北京国药新创科技发展有限公司 | Desensitization processing method and device for medical data and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7672987B2 (en) * | 2005-05-25 | 2010-03-02 | Siemens Corporate Research, Inc. | System and method for integration of medical information |
US8811692B2 (en) * | 2007-04-17 | 2014-08-19 | Francine J. Prokoski | System and method for using three dimensional infrared imaging for libraries of standardized medical imagery |
US8229881B2 (en) * | 2007-07-16 | 2012-07-24 | Siemens Medical Solutions Usa, Inc. | System and method for creating and searching medical ontologies |
AU2012225661A1 (en) * | 2011-03-07 | 2013-09-19 | Health Fidelity, Inc. | Systems and methods for processing patient history data |
US9734291B2 (en) * | 2013-10-08 | 2017-08-15 | COTA, Inc. | CNA-guided care for improving clinical outcomes and decreasing total cost of care |
2018
- 2018-01-10 WO PCT/US2018/013118 patent/WO2019139570A1/en unknown
- 2018-01-10 EP EP18900405.4A patent/EP3738054A4/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2019139570A1 (en) | 2019-07-18 |
EP3738054A4 (en) | 2021-08-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10878962B2 (en) | System and method for extracting oncological information of prognostic significance from natural language | |
US20210210184A1 (en) | Clinical concept identification, extraction, and prediction system and related methods | |
US20220020495A1 (en) | Methods and apparatus for providing guidance to medical professionals | |
US11101024B2 (en) | Medical coding system with CDI clarification request notification | |
US11152084B2 (en) | Medical report coding with acronym/abbreviation disambiguation | |
US9165116B2 (en) | Patient data mining | |
US8612261B1 (en) | Automated learning for medical data processing system | |
WO2020243732A1 (en) | Systems and methods of clinical trial evaluation | |
CN114026651A (en) | Automatic generation of structured patient data records | |
US20140365239A1 (en) | Methods and apparatus for facilitating guideline compliance | |
Zeng et al. | Identifying breast cancer distant recurrences from electronic health records using machine learning | |
US10318635B2 (en) | Automated mapping of service codes in healthcare systems | |
Hammami et al. | Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach | |
US11875884B2 (en) | Expression of clinical logic with positive and negative explainability | |
EP3000064A1 (en) | Methods and apparatus for providing guidance to medical professionals | |
Kaswan et al. | AI-based natural language processing for the generation of meaningful information electronic health record (EHR) data | |
EP3738054A1 (en) | A system and method for extracting oncological information of prognostic significance from natural language | |
US8756234B1 (en) | Information theory entropy reduction program | |
Macri et al. | Automated identification of clinical procedures in free-text electronic clinical records with a low-code named entity recognition workflow | |
US20240086771A1 (en) | Machine learning to generate service recommendations | |
US20240020740A1 (en) | Real-time radiology report completeness check and feedback generation for billing purposes based on multi-modality deep learning | |
US20240071623A1 (en) | Patient health platform | |
de Paula et al. | Core determinants of quality criteria for mhealth for hypertension: evidence from machine learning instruments | |
WO2023091495A1 (en) | System and method for rapid informatics-based prognosis and treatment development | |
CN116992839A (en) | Automatic generation method, device and equipment for medical records front page | |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20200723 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the European patent | Extension state: BA ME |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: SMITH, STEPHEN AUGUST; Inventor name: PARENT, CORY; Inventor name: WAISMAN, IDAN; Inventor name: RAJ, DILIP; Inventor name: HASAN, ALI |
DAV | Request for validation of the European patent (deleted) | |
DAX | Request for extension of the European patent (deleted) | |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R079; Free format text: PREVIOUS MAIN CLASS: G06F0017000000; Ipc: G06F0040300000 |
A4 | Supplementary search report drawn up and despatched | Effective date: 20210720 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G06F 40/30 20200101AFI20210714BHEP; Ipc: G06F 40/279 20200101ALI20210714BHEP; Ipc: G16H 10/60 20180101ALI20210714BHEP |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: COTA INC. |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: COTA, INC. |