US20230360752A1 - Transforming unstructured patient data streams using schema mapping and concept mapping with quality testing and user feedback mechanisms - Google Patents
- Publication number
- US20230360752A1 (application US 18/222,324)
- Authority
- US
- United States
- Prior art keywords
- data
- patient
- structured
- processors
- unstructured
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
Definitions
- This application is directed to systems and methods for ensuring accurate data entry in one or more computer systems.
- Rich and meaningful data can be found in source clinical documents and records, such as diagnoses, progress notes, pathology reports, radiology reports, lab test results, follow-up notes, images, and flow sheets. These types of records are referred to as “raw clinical data.”
- Unfortunately, many electronic health records do not include robust structured data fields that permit storage of clinical data in a structured format. Where electronic medical record systems do capture clinical data in a structured format, they do so with a primary focus on data fields required for billing operations or compliance with regulatory requirements. The remainder of a patient’s record remains isolated, unstructured, and inaccessible within text-based or other raw documents, which may even be stored in adjacent systems outside of the formal electronic health record. Additionally, physicians and other clinicians would be overburdened by having to manually record hundreds of data elements across hundreds of discrete data fields.
- Efforts to structure clinical data also may be limited by conflicting information within a single patient’s record or among multiple records within an institution.
- To the extent health systems have structured such data, they may have done so in different formats.
- Different health systems may have one data structure for oncology data, a different data structure for genomic sequencing data, and yet another different data structure for radiology data.
- different health systems may have different data structures for the same type of clinical data. For instance, one health system may use one EMR for its oncology data, while a second health system uses a different EMR for its oncology data.
- the data schema in each EMR will usually be different.
- a health system may even store the same type of data in different formats throughout its organization. Determining data quality across these various data sources is both a common occurrence and a challenge within the healthcare industry.
- a computer-implemented method of performing improved automated quality assurance testing of structured patient data based on transformed unstructured data streams includes (i) receiving, via one or more electronic data streams, unstructured patient data; (ii) processing, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validating the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) causing, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receiving, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
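As an illustration only, steps (i)-(iii) of the claimed flow might be sketched as follows; every function name, field name, and mapping here is an invented stand-in under stated assumptions, not the patented implementation:

```python
# Hypothetical sketch of the claimed receive -> map -> validate flow.
# All names and mappings are assumptions for illustration only.

def schema_map(raw: dict) -> dict:
    """Rename source fields onto an assumed target schema."""
    mapping = {"dx_date": "diagnosis_date", "pt_sex": "gender"}
    return {mapping.get(k, k): v for k, v in raw.items()}

def concept_map(record: dict) -> dict:
    """Normalize raw values onto assumed controlled-vocabulary terms."""
    vocab = {"m": "male", "f": "female"}
    if "gender" in record:
        record["gender"] = vocab.get(str(record["gender"]).lower(), record["gender"])
    return record

def quality_test(record: dict) -> list:
    """Return error strings for missing or incomplete fields."""
    errors = []
    if "diagnosis_date" not in record:
        errors.append("missing diagnosis_date")
    return errors

# (i) receive unstructured data, (ii) schema + concept map, (iii) validate
raw_stream = [{"pt_sex": "F", "dx_date": "2016-10-10"}, {"pt_sex": "M"}]
structured = [concept_map(schema_map(r)) for r in raw_stream]
findings = [quality_test(r) for r in structured]
# (iv)/(v) would surface `findings` to a user, whose revisions re-enter the stream
```

The sketch keeps schema mapping (field renaming) and concept mapping (value normalization) as separate passes, matching the claim's ordering of steps.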
- a computing system includes one or more processors; and one or more memories having stored thereon instructions that, when executed by one or more processors, cause the computing system to: (i) receive, via one or more electronic data streams, unstructured patient data; (ii) process, via the one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via the one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) cause, via the one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receive, via the one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
- a computer-readable medium includes instructions that, when executed by one or more processors, cause a computer to: (i) receive, via one or more electronic data streams, unstructured patient data; (ii) process, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) cause, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receive, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
- FIG. 1 shows an exemplary user interface that a clinical data analyst may utilize to structure clinical data from raw clinical data
- FIG. 2 depicts one example of EMR-extracted structured data that includes a payload of diagnosis-related data
- FIG. 3 depicts one example of EMR-extracted structured data that includes a payload of medication-related data
- FIG. 4 depicts a user interface that may be used by a conflict resolution user when a complex disagreement is identified for a patient record
- FIG. 5 depicts a user interface that may be used by a conflict resolution user when a more straightforward disagreement is identified for a patient record
- FIG. 6 depicts a list of test suites within a “demographics” root level category
- FIG. 7 depicts an exemplary test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing
- FIG. 8 depicts a second exemplary test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing
- FIG. 9 depicts one example of a user interface through which a manager-level user can view and maintain validations, quickly determine which patient cases have passed or failed, obtain the specific detail about any failed validation, and quickly re-assign cases for further manual QA and issue resolution prior to clinical sign-out and approval;
- FIG. 10 depicts an exemplary user interface for performing quality assurance testing based on generic abstractions from raw documents
- FIG. 11 depicts an exemplary user interface that is used to provide abstraction across multiple streams of raw clinical data and documents
- FIG. 12 depicts an exemplary user interface for performing an inter-rater reliability analysis
- FIG. 13 depicts another exemplary user interface
- FIG. 14 depicts another visualization of the exemplary user interface of FIG. 13;
- FIG. 15 depicts one example of various metrics or reports generated by the present system
- FIG. 16 depicts a second example of various metrics or reports generated by the present system
- FIG. 17 depicts a third example of various metrics or reports generated by the present system
- FIG. 18 depicts a fourth example of various metrics or reports generated by the present system.
- FIG. 19 reflects a generalized process flow diagram for carrying out the method disclosed herein, from raw data importation, through data structuring, and then through automated quality assurance testing.
- a comprehensive data integrity evaluation and validation system is described herein, the system usable, e.g., to generate a definitive clinical record for a patient or consistency among groups, projects, or cohorts of patients. Due to the quantity and varying intricacy or elements of a clinical record, multiple categories of basic and complex validations may be needed to provide the requisite completeness and accuracy.
- various authors use software tools to compose validation rules that can be run independently on one or more patient records or applied in aggregate to all patient records comprising a given grouping, project or defined cohort.
- validations can be applied to a specific attribute (e.g. gender) or to a combination of attributes (e.g. gender and primary diagnosis) that results in the authoring of basic and advanced rule-based logic.
- the system may include a dynamic user interface enabling a user to design and build a new query by selecting one or more attributes represented in the system and then associating a desired rule (e.g. is present, is above/below/within a certain threshold value or range, etc.) with those attributes.
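The attribute-plus-rule construction described above could be sketched as a small rule factory; the operator names ("is present", range checks) come from the text, while the function signatures and record layout are assumptions:

```python
# Illustrative rule factory: pair an attribute with a desired check.
# Operator names follow the text; everything else is an assumption.

def make_rule(attribute, check, **params):
    """Build a predicate over a patient record dict."""
    if check == "is_present":
        return lambda rec: attribute in rec and rec[attribute] is not None
    if check == "in_range":
        lo, hi = params["lo"], params["hi"]
        return lambda rec: attribute in rec and lo <= rec[attribute] <= hi
    raise ValueError(f"unknown check: {check}")

# A basic rule on one attribute, and a range rule on another
gender_present = make_rule("gender", "is_present")
ecog_valid = make_rule("ecog", "in_range", lo=0, hi=5)

record = {"gender": "female", "ecog": 3}
```

Returning plain predicates keeps each rule stand-alone while still allowing rules to be chained or bundled into sets, as the surrounding text describes.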
- Validation rules can operate in a stand-alone fashion or can be chained and/or linked together at a project and/or patient cohort level.
- validation checks can also be grouped and bundled into query sets or used individually as part of an ad-hoc quality assurance check initiated either manually or automatically upon delivery of a cohort of patient data.
- the system may maintain the ability to programmatically seed and/or populate a predefined set of validation rules that may be applicable to one or more streams.
- a validation rule may be composed of a seeded set of rules and/or checks that enable data integrity.
- Examples of validation rules may include date-related rules such as including a date field and requiring an entry in that field, confirming whether a date is within a proper time period (e.g., providing an error if the rule requires the date to be earlier than or equal to the present date and the entered date is sometime in the future, or requiring a future date and the entered date is in the past), confirming that an entered value is within a predetermined range (e.g., an ECOG value must be between 0 and 5 or a Karnofsky score must be between 0 and 100), determining whether a metastatic site distance is sufficiently close to a tumor site diagnosis, or determining whether data received in a certain field conflicts with entered data in another field (e.g., any non-zero entry or any entry whatsoever in a gravidity field should return an error if the patient’s gender in another field is indicated to be “male” or given a certain diagnosis or cancer
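The example rules above (date ordering, value ranges such as ECOG 0 to 5 and Karnofsky 0 to 100, and cross-field conflicts such as a gravidity entry for a male patient) can be sketched as simple predicate functions; the field names and the reference date are assumptions:

```python
# Sketch of the example validation rules named in the text.
# Field names and the reference date are illustrative assumptions.
from datetime import date

def check_date_not_future(rec, field, today):
    """Entered date must be earlier than or equal to the reference date."""
    return field in rec and rec[field] <= today

def check_range(rec, field, lo, hi):
    """Entered value must fall within a predetermined range."""
    return field in rec and lo <= rec[field] <= hi

def check_gravidity_conflict(rec):
    """Any gravidity entry conflicts with a gender of 'male'."""
    return not (rec.get("gender") == "male" and "gravidity" in rec)

rec = {"gender": "female", "gravidity": 2, "ecog": 3,
       "karnofsky": 80, "diagnosis_date": date(2016, 10, 10)}
```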
- a series of API endpoints await a sufficiently articulated and valid rule definition as well as a corresponding validation rule name.
- the API for the service may enable the creation, update, and/or deletion of the validations; alternatively, the validations may be managed in an administrative user interface or directly via database queries.
- a plurality of rules optionally may be grouped as a set, as compared to being evaluated independently.
- a rule can be associated with a query set (a combination of validation queries) and/or a specific cohort of patients where it can be run automatically to detect data inconsistencies and anomalies.
- Query sets may be groupings of validation rules and checks that are grouped as a result of similarity in the types of checks performed and/or the needs of a quality assurance (“QA”) user wanting to identify the integrity of patient records via use of bulk and/or combined validation rules and checks.
- an example of a patient cohort includes a sub-group of patients with a certain diagnosis sub-type (e.g., ovarian or lung within a cancer type) and/or a sub-subgroup of patients with a particular diagnosis stage or molecular mutation or variant within the sub-group.
- patient cohorts are not limited to oncological areas of medicine but may apply to groupings of patients in other disease areas as well, such as cardiovascular disease, atrial fibrillation, immunodeficiency diseases, etc.
- rules can be evaluated on those cohorts to determine if a subset of patients satisfy validation requirements specific to the subset as compared to generic rules that may apply to all patients.
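Running a bundled query set against a patient cohort, as described above, might look like the following sketch; the report shape and all record fields are assumptions:

```python
# Sketch of applying a query set (bundle of validation rules) to a cohort.
# Rule names, record fields, and the report format are assumptions.

def run_query_set(rules, cohort):
    """Apply every rule to every patient; collect failed rule names per patient."""
    report = {}
    for patient_id, rec in cohort.items():
        report[patient_id] = [name for name, rule in rules.items() if not rule(rec)]
    return report

query_set = {
    "has_diagnosis": lambda r: "diagnosis" in r,
    "has_stage": lambda r: "stage" in r,
}
cohort = {
    "pt1": {"diagnosis": "ovarian cancer", "stage": "III"},
    "pt2": {"diagnosis": "lung cancer"},  # missing stage
}
report = run_query_set(query_set, cohort)
```

A per-patient failure list like this supports both the bulk QA use (which patients failed?) and the drill-down use (which specific validation failed?) described in the text.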
- Applying a query set to a patient record or a portion thereof may result in the system verifying an accuracy of the data structuring within an acceptable system- or user-defined threshold level, in which case the structured data may be deemed accepted and the patient record may be amended to include that structured data.
- the query set result may indicate the presence of one or more errors in the data structuring, requiring further review and/or modifications to the structured data, and the patient record then may be amended to include the modified structured data.
- the structured clinical data may differ on the basis of the types of data elements within each list of structured clinical data, the organization of data elements within a structured clinical data schema, or in other ways.
- Structured clinical data refers to clinical data that has been ingested into a structured format governed by a data schema.
- structured clinical data may be patient name, diagnosis date, and a list of medications, arranged in a JSON format. It should be understood that there are many, more complicated types of structured clinical data, which may take different formats.
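A minimal record of the shape just described (patient name, diagnosis date, medication list, arranged as JSON) might look like the following; all values are invented for illustration:

```python
# Illustrative JSON rendering of a minimal structured clinical record.
# All values are invented examples.
import json

record_json = """
{
  "patient_name": "Jane Doe",
  "diagnosis_date": "2016-10-10",
  "medications": ["paclitaxel", "carboplatin"]
}
"""
record = json.loads(record_json)
```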
- Data schema means a particular set of data attributes and relationships therein that comprise a set of structured data to be used for various purposes (e.g. internal analysis, integration with purpose-built applications, etc.).
- Data element means a particular clinical and/or phenotypic data attribute. For instance, a comorbidity (e.g. acute myocardial infarction), adverse event (e.g. conjunctivitis), performance score (e.g. ECOG score of 3), etc.
- Data value means the value of the data in a data element. For instance, in a “Diagnosis Date” data element, the data value may be “Oct. 10, 2016”.
- Certain systems and methods described herein permit a patient’s structured clinical record to be automatically evaluated and scored in a consistent manner, while also simultaneously allowing for the determination of data integrity across various data sources.
- Because a patient may have disparate structured data residing in multiple applications and/or EMR databases within and across institutions, it may be a challenge to determine whether the structured data that exists within these sources is at a sufficient level of granularity and/or accuracy when analyzed independently and/or combined. Issues also may arise relating to clinical informatics, where a particular raw value may not have been correlated with a recognized medical ontology and/or vocabulary.
- a structured clinical record may benefit from the use of validation rules and checks like those described herein.
- Once the data is structured, it may be possible to determine whether the particular data in a field is in an appropriate format, is in an acceptable range, etc.
- certain such results may be represented as numbers
- structuring that data may permit it to be captured in a manner that can be validated automatically and/or used for aggregate population evaluation.
- a system as described in this application can uniquely identify and support the resolution of gaps in a patient’s record.
- inter-rater reliability and a comprehensive clinical data validation system facilitate the identification and resolution of gaps in a patient’s record when abstracted across multiple disparate streams.
- the platform may include a workflow tool and an administrative user interface for querying, reporting, and output tagging.
- the system may support externally sourced data validations and/or edit checks corresponding to custom data science analysis workflows as well as data integrity enforcement for various purposes, such as for clinical trial management.
- externally sourced may refer to validation rules or checks authored by one or more external parties, e.g., health systems, clinical trial management services, etc., importable and ingestible into the present validation system, for use and integration with other rules and/or validation checks.
- Externally sourced also may refer to ingestion of other validations originated by other individuals or applications other than the present validation system while still internal to the entity employing the present system.
- the system may compare multiple sets of structured clinical data for a single patient, select the most correct data element for each of the structured data elements, and return a new list of structured clinical data containing the most correct data element value for each data element.
- the new list reflects a single “source of truth” for a patient based on the raw clinical data for that patient.
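The "source of truth" selection described above could be sketched as a per-element merge across structured versions of the same patient; the specification does not say how "most correct" is judged, so the confidence-score tiebreak below is purely an assumption:

```python
# Sketch of per-element "source of truth" selection across multiple
# structured versions of one patient. The confidence-score criterion
# is an assumption; the actual selection logic is not specified.

def merge_sources(versions):
    """versions: list of (confidence, record) pairs; highest confidence wins per field."""
    merged, best = {}, {}
    for conf, rec in versions:
        for field, value in rec.items():
            if field not in merged or conf > best[field]:
                merged[field], best[field] = value, conf
    return merged

merged = merge_sources([
    (0.6, {"stage": "II", "smoking_status": "former"}),
    (0.9, {"stage": "III"}),
])
```

Note the merge is per field, not per record: a lower-confidence source still contributes any field the higher-confidence source lacks, yielding a single combined record.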
- Certain systems and methods may make use of various systematic validation checks at multiple stages in a process that commences with raw data input and ends with the data being curated, including at a data abstraction stage and/or a quality assurance stage. Additional stages in this timeline may include a data sufficiency score-carding stage in which the raw inputs are analyzed to determine whether they contain a sufficient amount of clinical data to proceed with the abstraction stage, and a downstream stage in which validation checks are used for patient cohorts.
- these systematic validation checks can be applied before data abstraction of raw clinical documents and notes.
- the validation checks can be re-run or re-initiated to evaluate the quality of the abstraction.
- the structured clinical data may be merged into a larger dataset.
- the larger dataset may have the same or a similar data schema to the structured clinical data.
- the larger dataset may be used for the conduct of research, may be associated with published research or clinical guidelines, and may be provided to third parties for their own research and analysis.
- FIG. 1 depicts an exemplary user interface that a clinical data analyst may utilize to structure clinical data from raw clinical data.
- the input data may be abstracted data that signifies a comprehensive, dynamic representation of a patient’s clinical attributes across multiple categories, e.g., demographics, diagnosis, treatments, outcomes, genetic testing, labs, etc. Within each of these categories, attributes may be repeated to reflect multiple instances of a particular clinical data attribute present in multiple locations within the patient data. In particular, since abstraction is based on a full history of a patient’s clinical record and associated documents, multiple attributes may be repeated across different data collection time periods and visit dates. For example, attributes like demographics (e.g. smoking status), treatments (e.g. therapies prescribed), outcomes (e.g. RECIST response level), and others can have one or more values ascribed to a given patient’s abstracted clinical attributes.
- patient data can be extracted from source records, research projects, tracking sheets and the like.
- sample source fields from unstructured flat files may include: enrollment-date, age_at_enrollment, sex, race, marital_status, gravidity, menopause, cancer status, age_at_diagnosis, laterality, T_stage_clinical, T_stage_pathological, histology, grade, etc., and the system may extract both the source fields as well as their respective data values.
- this input data often is inconsistent and dynamic, varying with the principal investigator, researcher, and/or partnering organization providing the patient data.
- data models may vary substantially between a researcher, principal investigator and/or organization collecting and maintaining patient data.
- the raw data ascribed to the data model must be considered capable of dynamic, continuous updates as more information is obtained and/or recorded.
- a mapping exercise may be required to relate information from unstructured data originating in flat files into a canonical schema, format and/or model for evaluation purposes. Mapping also may be required to be able to run validation rules and checks across consistent data that has been merged into a single data model and/or schema for evaluation.
- the mapping exercise may identify source data fields and attributes from the data provider, e.g., a third party organization or researcher, and analyze that data in its raw form in order to determine linkages between the data and medical concepts or terminology reflected by the data and a data model used by the system.
- Such concept mapping may be performed manually by specially-trained informatics engineers or other specialists, or by one or more software applications specifically designed to undertake such mapping, as would be appreciated by one of ordinary skill in the relevant art.
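A minimal sketch of the schema-mapping exercise described above is given below. The source field names follow the flat-file examples earlier in the text, but the canonical dotted-path schema and the handling of unmapped fields are assumptions for illustration.

```python
# Hypothetical sketch of schema mapping: relate flat-file source fields to a
# canonical schema. FIELD_MAP and the dotted-path target schema are assumed.

FIELD_MAP = {
    "age_at_diagnosis": "diagnosis.age_at_diagnosis",
    "T_stage_clinical": "diagnosis.staging.t_clinical",
    "histology": "diagnosis.histology",
}

def map_row(row):
    """Translate one flat-file row into canonical keys, collecting unmapped
    source fields for manual concept-mapping review."""
    mapped, unmapped = {}, []
    for field, value in row.items():
        if field in FIELD_MAP:
            mapped[FIELD_MAP[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped

row = {"histology": "Serous carcinoma", "gravidity": "2"}
mapped, unmapped = map_row(row)
print(mapped)    # fields aligned to the canonical schema
print(unmapped)  # fields requiring informatics review
```

The unmapped list models the hand-off to the informatics specialists mentioned above; a production mapping would also carry concept codes, not just field renames.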
- patient data may be Electronic Medical Record (EMR)-extracted structured data.
- This data can include a set of text strings representing various clinical attributes but may also include various ontological code systems and concepts to represent each text string in a way that can be compared against other data sets and/or validations.
- FIG. 2 depicts one example of EMR-extracted structured data that includes a payload of diagnosis-related data, specifically, data pertaining to a diagnosis of Malignant neoplasm of larynx, unspecified.
- FIG. 3 depicts one example of EMR-extracted structured data relating to the medication Paclitaxel, provided intravenously.
- patient data may be extracted through a clinical concept identification, extraction, prediction, and learning engine such as the one described in the commonly-owned U.S. Pat. No. 10,395,772, titled “Mobile Supplementation, Extraction, and Analysis of Health Records,” and issued Aug. 27, 2019, the contents of which are incorporated herein by reference in their entirety. Additional relevant details may be found in the commonly-owned U.S. Pat. Application No. 16/702,510, titled “Clinical Concept Identification, Extraction, and Prediction System and Related Methods,” filed Dec. 3, 2019, the contents of which also are incorporated herein by reference in their entirety.
- the output of this engine may be a configurable and extensible set of predictions about a given patient’s clinical attributes across a variety of content types.
- These types may include (but may not be limited to) primary diagnosis & metastases sites, tumor characterization histology, standard grade, tumor characterization alternative grade, medication / ingredient, associated outcomes, procedures, adverse events, comorbidities, smoking status, performance scores, radiotherapies, imaging modality, etc.
- the system may be configured to automatically initiate the evaluation of both partial and fully structured patient clinical records across multiple sources and/or streams through a variety of triggering events.
- Such events may include, e.g.: (1) receiving an on-demand request, e.g., via a job runner by an Administrative user using an Administrator-driven user interface that can initiate the process programmatically, (2) via a background service triggered upon receipt of new software code commits or corresponding application build phases, (3) when new data is either received or ingested across sources and streams, (4) upon achieving a sufficient inter-rater or intra-rater reliability scoring system, which is run automatically on a configurable percentage of patient records as part of a project or batch, (5) upon completion of either a case abstraction and/or QA activity, (6) a bulk initiation of evaluation of multiple structured clinical records once all have been completed, e.g., upon receipt of clinical data and/or records for patients participating in an institution’s clinical trial, which may be obtained via a site
- Data analysis also may be triggered in one or more other ways, including via an automated trigger.
- automated triggers may occur, e.g., when a case has been submitted and recorded successfully, when a case has generated a data product representing all of the structured content, or in preparation for data delivery to a partner expecting a set of de-identified patient records containing structured clinical records that have been validated for quality, accuracy and consistency.
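The triggering events enumerated above might dispatch to a common evaluation routine as sketched below. The trigger names, registry pattern, and `evaluate()` callable are illustrative assumptions; the text does not prescribe an implementation.

```python
# Minimal sketch of a trigger registry: several triggering events dispatch to
# one shared evaluation routine. Event names and payload shape are assumed.

def evaluate(patient_ids, reason):
    # placeholder for running the relevant test suites
    return {"patients": list(patient_ids), "trigger": reason}

TRIGGERS = {}

def on(event):
    def register(handler):
        TRIGGERS[event] = handler
        return handler
    return register

@on("on_demand")
def _(payload):
    return evaluate(payload["patient_ids"], "on_demand")

@on("new_data_ingested")
def _(payload):
    return evaluate(payload["patient_ids"], "new_data_ingested")

def fire(event, payload):
    return TRIGGERS[event](payload)

print(fire("on_demand", {"patient_ids": ["p1"]}))
```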
- Trigger #1 (on-demand): a user with appropriate authorization can manually initiate one or more distinct tests to support the evaluation of one or more patient clinical records. In its default state, this functionality manifests itself as part of a graphical user interface presented after entering a specific request for one or more tests at a terminal window command line.
- Trigger #2 (on receipt of code commits): tests can be initiated en masse via a background service or selectively when only a subset of tests are required to validate specific patient clinical data and/or attributes.
- validation may take advantage of “continuous integration,” or the practice of integrating new code with existing code while embedding automated testing and checks into this process to minimize and/or eliminate gaps and issues in production-level software and applications.
- new code commits are made, reviewed, approved and merged into various code branches for subsequent application build phases while intermediate software (e.g. Jenkins) maintains responsibility for running one or more test suites programmatically and recording their output (e.g. failed, pending and passed) as well as collecting details, stacktraces and/or screenshots resulting from these tests.
- Trigger #3 (new data ingested): an integration engine and/or intermediate data lake receives and processes new structured data, which may also initiate corresponding tests to evaluate and score the data as its own distinct stream as well as comparatively to any existing data received for the patient.
- an integration engine may receive a stream of XML and/or JSON content comprising structured data and corresponding ontological code systems and concepts as extracted from a health system’s EMR at a single point in time. Upon receipt, this data would be evaluated against one or more test suites for accuracy, coverage and/or insufficiency. It may also be compared and evaluated relative to other patient record data received via other sources and similarly run through one or more test suites.
- the system may receive a FHIR-compliant payload from partners that contains one or more genetic / genomic testing results for one or more patients.
- the test suite for genetic testing referenced above may be run programmatically to evaluate the integrity of this data and may also be compared and evaluated relative to other genetic testing content already ingested and/or abstracted as part of one or more patient records.
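A hedged sketch of a first-pass screen on a FHIR-style genetic testing payload follows. The required-field list is an assumption for illustration; the text does not enumerate the actual test suite criteria for genetic testing.

```python
# Hypothetical screen of a FHIR-style payload for required fields before
# deeper evaluation. REQUIRED is assumed, not taken from the disclosure.

REQUIRED = ("resourceType", "subject", "code", "result")

def screen_payload(payload):
    """Return ('pass', []) or ('fail', [missing field names])."""
    missing = [f for f in REQUIRED if f not in payload]
    return ("pass", []) if not missing else ("fail", missing)

payload = {
    "resourceType": "DiagnosticReport",
    "subject": {"reference": "Patient/example"},
    "code": {"text": "Genetic panel"},
}
print(screen_payload(payload))
```

A failing screen would route the payload to the comparison-and-review path described above rather than rejecting it outright.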
- Trigger #4A (inter-rater reliability): the system will evaluate two instances of a patient’s abstracted clinical data and compose a score at both the case and field levels to determine a level of agreement between a plurality of abstractors (or “raters”) in order to determine whether to automatically begin the evaluation process.
- “automatically” may refer to a systematic assignment of a subset of patient cases that will be abstracted by two distinct individuals in a “double-blind” manner where the reviewer may also be unaware of participant identities.
- a scoring scheme is used to calculate the proficiency and accuracy of each submission by taking into account the modifications and updates made by a conflict resolution user.
- the system may assign a first version or instance of a case or data stream to a first rater and a second version or instance of the case or data stream to a second rater, i.e., the plurality of raters may review the same subset of cases or records, after which the system may determine whether there is a sufficiently high degree of overlap and/or agreement between each rater’s abstraction.
- a third-party conflict resolver may review the raw clinical data and each rater’s abstraction content in order to generate a de facto or “best” abstraction of the patient record.
- the conflict resolver may select from among the abstractions provided by the other raters.
- the conflict resolver additionally or alternatively may provide its own abstraction and select the “best” abstraction from the group that includes its own abstraction and those of the other raters.
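One simple field-level agreement score consistent with the inter-rater comparison described above is the fraction of fields on which two raters agree. This sketch, the threshold value, and the field names are illustrative assumptions; the disclosed scoring scheme may be more elaborate.

```python
# Hedged sketch of a field-level inter-rater agreement score.

AGREEMENT_THRESHOLD = 0.9  # assumed; described as configurable per project

def irr_score(abstraction_a, abstraction_b):
    """Fraction of the union of abstracted fields on which two raters agree."""
    fields = set(abstraction_a) | set(abstraction_b)
    if not fields:
        return 1.0
    agree = sum(
        1 for f in fields if abstraction_a.get(f) == abstraction_b.get(f)
    )
    return agree / len(fields)

a = {"race": "White", "smoking_status": "Never", "stage": "II"}
b = {"race": "White", "smoking_status": "Former", "stage": "II"}
score = irr_score(a, b)
print(score)                          # 2 of 3 fields agree
print(score >= AGREEMENT_THRESHOLD)   # below threshold: route to conflict resolution
```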
- FIG. 4 illustrates one of the steps to be performed by a conflict resolution user when a complex disagreement is identified for a patient record.
- a conflict resolver must evaluate the radiotherapies cited by the two abstractors and determine which are in fact appropriate for the “de facto” patient clinical record by moving the most correct items to therapy groups.
- FIG. 5 illustrates one of the steps to be performed by a conflict resolution user when a basic disagreement is identified for a patient record.
- a conflict resolver must evaluate the demographic data cited by the two abstractors and determine which are in fact appropriate for the “de facto” patient clinical record by selecting the correct “race” clinical data value.
- Trigger #4B (intra-rater reliability): like the previously-disclosed trigger, the system also may be used to evaluate a plurality of abstractions from a single rater, in order to determine how consistent the rater is in his or her efforts.
- the notes or other clinical data reviewed by the rater may relate to the same patient, e.g., different portions of a patient’s record, or they may be similar or distinct portions of raw clinical data from multiple patients.
- Trigger #5 (case abstraction completion and/or quality assurance completion): clinical data attributes for the patient record may be evaluated systematically for gaps in logic through the use of a clinical data validation service that centralizes a number of rules (see below for details) and works in conjunction with a cohort sign-out process.
- Trigger #6 (upon receipt of clinical data and/or records for patients participating in an institution’s clinical trial): clinical data attributes for a patient potentially eligible for participation in a clinical trial may be evaluated on-demand or as part of a broader batch of patients from that institution on a rolling basis.
- the present system and method may support the workflow’s ability to identify gaps in clinical attributes that may be required for inclusion / exclusion criteria evaluation and matching.
- Trigger #7 (on-demand analysis): structured data may be extracted, either directly or via a mapping procedure, from a clinical note while that note is being created or dictated by a physician or other clinician. The structured data is analyzed, and errors, incomplete information, or conflicting information in the underlying data are reported back to the clinician in real time.
- the default set of evaluation criteria may be composed at a category-level (e.g. demographics, diagnosis, genetic testing and labs, treatments and outcomes) along with nested sub-groupings that allow for granular and precise evaluation of clinical patient attributes by type.
- For example, and with regard to the depiction in FIG. 6 of a list of test suites within a “demographics” root level category, a test may be written to determine whether a record of ovarian cancer was a correctly structured instance:
- a determination that the record was structured “correctly” may mean more than simply determining whether there are data values in each of the specified fields and attributes. Instead, correct structuring also may signify that all of the attributes listed were adequately provided and mapped to accepted and/or preferred medical concepts, i.e., that the requisite data was provided, represented, and properly fulfilled all validation checks managed by the system. Mapping may relate to both a system-defined data model as well as one or more external models, such as the Fast Healthcare Interoperability Resources (“FHIR”) specification.
- the system may include one or more test suites that define the criteria for the relevant categories and nested sub-groupings and then may execute relevant validation checks to carry out those test suites.
- Medical concepts can span numerous dictionaries, vocabularies and ontologies, and data elements within structured data generally conform to a specific system, concept code and preferred text descriptor. For instance, in the example discussed above, for “Ovary,” i.e., the tissue of origin identified for a corresponding primary tumor instance, the system may determine whether that data instance is mapped to the “SNOMED” code of 93934004 with a preferred text descriptor of “Primary malignant neoplasm of ovary (disorder)” in order to comply with a test suite that includes the same relationship.
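The ovary example above can be sketched as a check that a structured element carries the expected system / code / preferred-text triple. The SNOMED code and descriptor come from the text; the test-suite layout and field names are assumptions for illustration.

```python
# Sketch of a concept-mapping check: verify the system, concept code, and
# preferred text descriptor of a structured data element. The dict layout is
# assumed; the SNOMED entry is the example given in the text.

EXPECTED = {
    ("SNOMED", "93934004"): "Primary malignant neoplasm of ovary (disorder)",
}

def check_concept(element):
    key = (element.get("system"), element.get("code"))
    expected_text = EXPECTED.get(key)
    if expected_text is None:
        return "fail: unknown system/code pair"
    if element.get("display") != expected_text:
        return "fail: preferred text descriptor mismatch"
    return "pass"

element = {
    "system": "SNOMED",
    "code": "93934004",
    "display": "Primary malignant neoplasm of ovary (disorder)",
}
print(check_concept(element))
```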
- the test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing may include evaluating whether values for the following criteria are present and accurately structured:
- the evaluation and/or analysis performed as part of the system referenced above may comprise a combination of several of the trigger mechanisms discussed above.
- the analysis of data can be initiated programmatically or manually by one or more of the triggers on a particular set of patient data records (either structured or unstructured) and from multiple disparate data sources.
- the system may include: (1) automated and continuously maintained test suites specific to one or more clinical attributes and/or content types, (2) clinical data validation processes performed at run-time during abstraction as well as quality assurance activities, and (3) inter-rater reliability (IRR).
- the triggers may evolve or be revised over time to generate a more robust, more complete quality assurance system. For example, test suites may grow continuously to support more templates or later-generated abstraction fields for clinical data structuring.
- the clinical data validations may be maintained in a library programmatically via web service endpoints or a user interface that supports the addition of new validations and corresponding definitions of rules, e.g., using a rule builder.
- the system may generate multiple streams of abstracted clinical data that can be evaluated and re-assigned to a more sophisticated user with deeper clinical background to help resolve any conflicts, thereby producing a de facto “source of truth” for a given patient’s clinical record.
- the system may rely on data from other patients to determine whether the data in a target patient’s record appears correct or whether it may warrant an alert signifying a potential error or an otherwise unexpected finding.
- anomalies can automatically be detected or ascertained when a newly validated patient record contains data (e.g. clinical or molecular) that have not been found in any previous patient records run through the validation rule and/or check.
- a patient record may include both clinical and molecular data, where the molecular data may include data reflecting a “new” gene, in that there may not be much, if any, clinical knowledge regarding the medical effects of having the gene.
- a molecular variant present in structured data for a patient from a third-party NGS lab that is invalid or unknown to science may be flagged for manual review, as it may have been mis-keyed or entered incorrectly into a structured clinical record.
- the system may search its data store for indications of other patients with that gene. For example, the system may use a library of known valid molecular variants as well as a review of all previous molecular variants found in previous data structuring activities for other patient records to detect anomalous data elements. The system then may search for similarities in clinical data among those other patients in order to develop a template test suite.
- the system may assume that the other patients’ clinical data is accurate, such that deviations from that data when a validation check is performed on a subject patient’s data may trigger an alert to the provider or reviewer as to either an error in the subject patient’s data or, alternatively, to an unexpected result that may warrant further investigation.
- validations may be fairly straightforward, e.g., when comparing different portions of a patient record, is the system able to extract a patient’s gender from more than one location and do those gender-based attributes match up?
- a test suite that instructs the system to query one or more known portions of a record for gender-identifying information, to review that information for internal consistency (if more than one portion of the record is considered), and to return that gender as an attribute for the patient may be usable for multiple use cases as a fairly generic test suite.
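The generic gender-consistency test suite described above might be sketched as follows; the record layout and portion names are assumed examples, not the system's actual schema.

```python
# Minimal sketch: query several record portions for a gender attribute,
# verify internal consistency, and return the value. Portion names assumed.

def extract_gender(record, portions=("demographics", "emr_extract", "intake_form")):
    values = {
        record[p]["gender"]
        for p in portions
        if p in record and record[p].get("gender")
    }
    if not values:
        return None, "fail: no gender attribute found"
    if len(values) > 1:
        return None, "fail: conflicting gender attributes " + str(sorted(values))
    return values.pop(), "pass"

record = {
    "demographics": {"gender": "female"},
    "emr_extract": {"gender": "female"},
}
print(extract_gender(record))
```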
- the test suite may seek to compare the structured patient data against a set of one or more guidelines, e.g., clinical trial inputs or metrics reflecting general patient population results (e.g., survival, progression, etc.), to determine whether the patient’s data is in-line with those guidelines or reflects a potential error or outlier.
- the validation may be structured to match a clinical practice guideline that must be met before a patient is eligible to receive a therapy.
- One example set of clinical practice guidelines is the National Comprehensive Cancer Network Clinical Practice Guidelines.
- a validation may be structured to include the relevant criteria from one or more practice guidelines. If the patient record contains information that permits the validation to pass successfully, then the patient may be permitted to receive the therapy.
- validations may be specific to certain use cases based, e.g., on other data extracted from a patient record. For example, certain types of cancer are gender-specific. Thus, a quality assurance validation or rule that says “if structured data extracted from the patient record includes an attribute for prostate cancer, then a patient gender of ‘female’ represents an error” is useful for prostate cancer use cases but not for other cancers or diseases.
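The prostate-cancer rule quoted above can be written as a rule with an applicability precondition, as sketched below; the three-state return value is an assumption for illustration.

```python
# Sketch of a use-case-specific validation: the rule applies only when its
# precondition (a prostate cancer diagnosis) matches the extracted data.

def prostate_gender_rule(data):
    if data.get("diagnosis") != "prostate cancer":
        return "not_applicable"   # rule is scoped to prostate cancer use cases
    return "fail" if data.get("gender") == "female" else "pass"

print(prostate_gender_rule({"diagnosis": "prostate cancer", "gender": "male"}))
print(prostate_gender_rule({"diagnosis": "breast cancer", "gender": "female"}))
```

Scoping rules this way keeps a shared validation library from raising spurious failures on records the rule was never meant to cover.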
- validations may be multi-variable or require more than a simple cross-check of two fields against one another.
- a patient record may document scenarios that reflect valid or invalid staging, and the relevant cancer also may have subtypes that vary based on staging.
- a complete validation check of a test suite may require that the system evaluate all of the possibilities at each stage to determine whether the structured data is complete and internally consistent.
- the system may include an automated process for evaluating each test suite to determine whether it represents an accurate test. That process may require running through each of the possibilities that are queried in the test suite and determining that none of the tests conflict with other tests in the suite. Thus, e.g., the system may assume that a first test yields a “true” or valid result. Then, given that result, the system determines whether it is possible for a second test to also yield a “true” or valid result. The system continues in that process until a “false” or invalid result is reached or until all tests have been evaluated. In the latter case, the system may recognize that the test suite does not include any failures and may publish the test suite for actual implementation. In the former case, once an invalid result is determined, the system may flag the test suite for further review and either amendment or definitive approval, despite the invalid result.
- One objective of the system is to allow for the creation, management and assignment of specific clinical data fields and their corresponding attributes via a single user interface.
- a dynamic management and rendering engine for template-specific fields enables the system to achieve this objective by permitting different classes of users to rapidly configure new templates with custom field configurations in minutes without code by employing a user interface that permits those users to select both the fields, as well as the hierarchy among the fields, that are desired for a given clinical data structuring project or use case. Templates may drive a determination of what content from the raw data is available to an abstractor. Finally, the system maintains a version history of every template modification made by authorized users for auditing purposes.
- validations can be leveraged at a more granular project-specific level (rather than at an individual level or a cohort level), which may allow for the evaluation and scoring of specific template configurations as well as their corresponding data fields.
- Rather than running validations against a single patient’s clinical data elements and content generally, the validation service also may be run with a batch or bulk set of patient clinical data elements that correspond to one or more projects. Data may be sourced from one or more sources, including upstream abstracted patient content (e.g., prior to structuring) or from more finalized versions of the data (e.g., from a downstream data warehouse in a structured format).
- these bulk or test validation service checks may be configured to run either sequentially or simultaneously.
- the system may be configured to perform these validation checks on patients associated with projects that have been configured to these templates to ensure that data has been abstracted, captured and/or encoded properly.
- Results of the foregoing validations may be output as structured code, e.g., in a JSON file format.
- the file may include one or more indicators describing which clinical data attributes passed or failed a particular validation.
- results of a test suite processing all clinical data attributes may produce a result output as structured code, e.g., also in a JSON format, that describes which particular test(s) within the suite passed or failed for one or more given patient records passed to it.
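A hedged sketch of such a JSON result payload follows. The text specifies only structured output with pass/fail indicators per attribute; the exact keys (`suite`, `patient_id`, `results`, `status`) are assumptions for illustration.

```python
# Illustrative JSON result for one test suite run; key names are assumed.
import json

result = {
    "suite": "demographics",
    "patient_id": "example-patient-001",
    "results": [
        {"test": "gender_consistency", "status": "passed"},
        {"test": "smoking_status_present", "status": "failed",
         "detail": "no value found in any source stream"},
    ],
}
payload = json.dumps(result, indent=2)
print(payload)

# A downstream consumer can filter for failures to drive re-assignment or QA.
parsed = json.loads(payload)
failed = [r["test"] for r in parsed["results"] if r["status"] == "failed"]
print(failed)
```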
- the system may be usable by a plurality of different users having distinct roles.
- the following list describes various user roles or use cases, the corresponding actions each user may take, and one or more benefits that may result from use of the system as a result of those actions:
- a clinical manager may want to evaluate a single patient, a project, an in-progress or completed cohort or one or more patients abstracted and/or QA’ed by a specific abstractor or lead user for accuracy. Additionally, this user may want to obtain an analysis of a data stream sourced externally (e.g. via EMR or structured data extract) to determine the need for further incremental abstraction of a patient’s clinical record.
- a single abstracted patient can be evaluated for accuracy through the use of the clinical data validation service either upon request, when the corresponding patient case is being submitted via Workbench or when clinical attributes are modified. Validation rules are run atop all structured clinical data for a single abstracted patient and pass / fail assignments are made as a result.
- the clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on a single abstracted patient at that point in time.
- a project can be evaluated for accuracy through the use of the clinical data validation service either upon request or when the project is used as a filter within the QA Manager Console user interface.
- validation rules will have already been run atop all structured clinical data for all completed and submitted patients within the given project and pass / fail assignments are retrieved as a result.
- the clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on abstracted patients within a project at that point in time.
- a cohort can similarly be evaluated for accuracy through the use of the clinical data validation service either upon request or when the cohort is used as a filter within the QA Manager Console.
- validation rules will have already been run atop all structured clinical data for all completed and submitted patients with the given cohort and pass / fail assignments are retrieved as a result.
- the clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on abstracted patients within a cohort at that point in time.
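The "effective as of" behavior described above might be sketched as filtering the validation library by an effective window, as below; the window fields and dates are assumed representations.

```python
# Sketch: run only validations whose effective window covers the evaluation
# time. The effective_from/effective_to fields and dates are illustrative.
from datetime import date

VALIDATIONS = [
    {"name": "death_date_check", "effective_from": date(2020, 1, 1),
     "effective_to": None},                       # still in force
    {"name": "legacy_stage_check", "effective_from": date(2019, 1, 1),
     "effective_to": date(2021, 6, 30)},          # retired mid-2021
]

def applicable(validations, as_of):
    return [
        v["name"] for v in validations
        if v["effective_from"] <= as_of
        and (v["effective_to"] is None or as_of <= v["effective_to"])
    ]

print(applicable(VALIDATIONS, date(2023, 1, 15)))
```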
- Externally sourced data streams may first be ingested and mapped to a source-specific schema by a member of an integrations team. Subsequently, the schema may be aligned to a clinical data model by a member of an informatics team that allows for mapping of concepts to a canonical set of systems, codes, and values. After the schema mapping and concept mapping steps, the clinical data validation service can evaluate an externally sourced patient record upon request by using the default set of validations checks. Further, source-specific custom rules and validations may be authored within the QA Manager Console to ensure proper coverage of all desired data integrity checks.
- a clinical abstraction lead may want to identify gaps in abstraction for a patient and/or project assigned to their abstraction team, perhaps specific to a cancer type (e.g. colorectal team).
- the clinical abstraction lead may want to obtain the IRR score for a project, manually initiate a test suite for one or more clinical data attributes as well as perform various validation checks.
- IRR scores at a project-level are aggregated and averaged across all eligible and completed IRR cases within that project.
- IRR case agreement thresholds and case eligibility percentage are configurable at the project level and will vary.
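The project-level aggregation described above (case scores averaged across eligible, completed IRR cases, with configurable thresholds) can be sketched as follows; the numeric values are illustrative assumptions.

```python
# Sketch of project-level IRR aggregation. Threshold and eligibility values
# are assumed defaults; the text describes them as configurable per project.

def project_irr(case_scores, eligibility_pct=0.25, agreement_threshold=0.9):
    """case_scores: {case_id: agreement score} for completed IRR cases."""
    if not case_scores:
        return None
    avg = sum(case_scores.values()) / len(case_scores)
    return {
        "project_irr": round(avg, 3),
        "meets_threshold": avg >= agreement_threshold,
        "eligibility_pct": eligibility_pct,  # share of cases double-abstracted
    }

print(project_irr({"case-1": 0.95, "case-2": 0.88, "case-3": 0.97}))
```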
- a global set of validation checks are available via the clinical data validation service and can be run atop one or more patient records corresponding to a project.
- a clinical data abstractor may want to preview content ingested from third party sources into various data streams and obtain a report including quantitative insights specific to clinical data attributes (e.g. medications, procedures, adverse events, genetic testing, etc) that will help them to more fully abstract a patient’s clinical record from various disparate sources.
- An operational lead may want to better understand data coverage and quality gaps specific to one or more patients or in aggregate across specific projects/cohorts. Further, they may want to receive automated notifications and warnings that will alert them to take action directly with health system content partners when data validations fail and/or the automated evaluation and scoring for various clinical data streams is insufficient.
- a data scientist may want to integrate with the system to better train machine learning models based on various levels of priority and/or a trust scale for various clinical data ingested and/or structured across clinical data streams. For example, a project or cohort with a high IRR score, near-perfect clinical data validation checks and automated test suites passing may be treated preferentially to other unstructured or semi-structured clinical data with lower scores.
- An integration and/or infrastructure engineer may want to monitor various clinical data streams being ingested from external sources to verify connectivity, data sufficiency as well as quality over time.
- a quality assurance engineer may want to compare the output of their manually maintained clinical data test suites against externally sourced content programmatically or on an ad-hoc basis.
- a product manager may want to better understand the project, cohort and/or field level scoring of either/both abstracted and structured data to determine further improvements to various workflows, user interfaces and design patterns to accelerate and further streamline the data structuring operation.
- the system maintains a continuously growing set of stream-specific validations, warnings, and errors that help proactively inform and/or alert administrators of patient data quality and integrity issues.
- a supported application and any of its users can quickly identify whether a patient case, either individually or one within a specific cohort, has passed or failed one or more validation checks.
- Validations may be managed through a QA Manager Console user interface where they are constructed and/or grouped for use as part of quality assurance activities (at a batch and/or cohort level) and as part of on-demand evaluation criteria for one or more patient records. These validations are also useful when accounting for inclusion and exclusion criteria specific to patient cohorts for research and/or clinical trial consideration purposes.
- FIGS. 9 - 12 depict one example of the user interface through which a manager-level user can view and maintain these validations, quickly determine which patient cases have passed or failed, obtain the specific detail about any failed validation, and quickly re-assign cases for further manual QA and issue resolution prior to clinical sign-out and approval.
- FIG. 10 depicts an exemplary user interface for performing quality assurance testing based on generic abstractions from raw documents.
- FIG. 11 depicts an exemplary user interface that is used to provide abstraction across multiple streams of raw clinical data and documents.
- FIG. 12 depicts an exemplary user interface for performing an inter-rater reliability analysis.
- FIGS. 13 and 14 show a second exemplary user interface that a clinical data analyst may utilize to compare, merge and generate a “single source of truth” patient record across multiple data schemas, sources and/or streams.
- The system additionally may output and/or deliver various metrics and reports that provide insight into the accuracy and/or completeness of patient clinical records specific to a project as well as across selected projects for comparative and benchmarking purposes.
- Reporting data may include rankings and scores at both the patient record and clinical data attribute/field grain, indicative of data source/stream quality, completeness, and integrity. This information becomes available to clinical data abstractors within a data curation, abstraction, and/or structuring toolset and user interface to aid their efforts to generate a “single source of truth” consolidated patient record atop various sources. It can also be used by clinical data managers to ensure a high-quality data product deliverable for partners. As seen in these figures, the system may generate outputs permitting a user to visualize the IRR scoring and conflict resolution processes, as well as to review the subsequent reporting and insights generated afterwards. Additionally, a sample visualization describing data quality across various clinical data attributes and types is included for reference.
- Validation rules may be composed of hard, blocking errors (e.g., an indication of a new problem emerging after a recorded date of death) and loose warning notifications (e.g., an indication from one portion of the patient’s record that the patient has stage 2 lung cancer while a second portion of the record indicates that the cancer is stage 3) that help to improve the integrity of a patient record during the clinical data structuring process as well as afterwards during subsequent QA activities.
- These validation rules can have various severity levels that indicate to an application and/or system process whether to reject fully or accept but call attention to a particular issue found in the data analyzed. Because the system may include a “sliding scale” of error severity, the results of the data quality tests may not be an “all-or-nothing” situation.
- The system may generate quantitative metrics, such as a “% success” indicator, to measure the accuracy of the data structuring.
- This indicator also may account for the fact that a test suite may comprise dozens, if not hundreds, of different validation checks and that some may return acceptable results while others may indicate errors, missing information, or incomplete information.
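The sliding-scale severity and “% success” concepts above can be illustrated with a short sketch; all class, function, and check names here are hypothetical, not part of the disclosed system:

```python
from dataclasses import dataclass

# Hypothetical severity levels mirroring the "sliding scale" described
# above: hard, blocking errors vs. loose warning notifications.
BLOCKING = "error"
WARNING = "warning"

@dataclass
class CheckResult:
    name: str
    severity: str   # BLOCKING or WARNING
    passed: bool

def percent_success(results):
    """Share of validation checks that passed, regardless of severity."""
    if not results:
        return 100.0
    passed = sum(1 for r in results if r.passed)
    return 100.0 * passed / len(results)

def record_accepted(results):
    """A record is rejected only by failed blocking checks; failed
    warnings merely call attention to the issue."""
    return all(r.passed for r in results if r.severity == BLOCKING)

results = [
    CheckResult("diagnosis_date_not_in_future", BLOCKING, True),
    CheckResult("stage_consistent_across_record", WARNING, False),
    CheckResult("ecog_in_range_0_to_5", BLOCKING, True),
]
print(round(percent_success(results), 1))  # 66.7
print(record_accepted(results))            # True: only a warning failed
```

This captures the idea that a suite of dozens or hundreds of checks need not be “all-or-nothing”: some checks may fail while the record as a whole is still accepted.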
- FIG. 19 depicts one exemplary process flow of the present disclosure.
- External data is received by the system, where it is ranked, scored, or otherwise structured, either on its own or in consideration with other data streams from the same patient.
- The structured data then is run through one or more QA Automation processes, such as the processes discussed herein, in order to generate metrics and reports that can be output, e.g., to an administrative user or to the institution providing the external data.
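The process flow just described, from ingestion through structuring to QA automation and reporting, might be sketched as follows; every helper name and data shape is a hypothetical placeholder for the stages in FIG. 19:

```python
def run_pipeline(raw_stream):
    """Hypothetical end-to-end flow: external data is ingested,
    structured, run through QA automation, and summarized."""
    structured = [structure_record(raw) for raw in raw_stream]
    qa_results = [run_quality_tests(rec) for rec in structured]
    return build_report(structured, qa_results)

def structure_record(raw):
    # Placeholder for the schema mapping + concept mapping step.
    return {"source": raw, "fields": raw.get("payload", {})}

def run_quality_tests(record):
    # Placeholder for the QA Automation processes discussed herein.
    return {"passed": bool(record["fields"])}

def build_report(structured, qa_results):
    # Summary metrics suitable for an administrative user or partner.
    n = len(structured)
    passed = sum(1 for r in qa_results if r["passed"])
    return {"records": n, "passed": passed, "failed": n - passed}

report = run_pipeline([{"payload": {"dx": "C32.9"}}, {"payload": {}}])
print(report)  # {'records': 2, 'passed': 1, 'failed': 1}
```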
Abstract
A method includes receiving unstructured data; processing the unstructured data to generate corresponding structured records; validating the structured patient records using quality tests; causing errors or incomplete information to be displayed; and receiving a revision to the unstructured patient data. A computing system includes a processor; and a memory having stored thereon instructions that, when executed by the processor, cause the computing system to: receive unstructured data; process the unstructured data to generate corresponding structured records; validate the structured patient records using quality tests; cause errors or incomplete information to be displayed; and receive a revision to the unstructured patient data. A computer-readable medium includes instructions that, when executed by a processor, cause a computer to: receive unstructured data; process the unstructured data to generate corresponding structured records; validate the structured patient records using quality tests; cause errors or incomplete information to be displayed; and receive a revision to the unstructured patient data.
Description
- This application is a continuation of U.S. Pat. Application No. 16/732,210, entitled AUTOMATED QUALITY ASSURANCE TESTING OF STRUCTURED CLINICAL DATA, filed on Dec. 31, 2019, which claims priority to U.S. Provisional Pat. Application No. 62/787,249, entitled AUTOMATED QUALITY ASSURANCE TESTING OF STRUCTURED CLINICAL DATA, filed on Dec. 31, 2018, each of which is hereby incorporated by reference in its entirety.
- This application is directed to systems and methods for ensuring accurate data entry in one or more computer systems.
- In precision medicine, physicians and other clinicians provide medical care designed to optimize efficiency or therapeutic benefit for patients on the basis of their particular characteristics. Each patient is different, and their different needs and conditions can present a challenge to health systems that must grapple with providing the right resources to their clinicians, at the right time, for the right patients. Health systems have a significant need for systems and methods that allow for precision-level analysis of patient health needs, in order to provide the right resources, at the right time, to the right patients.
- Rich and meaningful data can be found in source clinical documents and records, such as diagnoses, progress notes, pathology reports, radiology reports, lab test results, follow-up notes, images, and flow sheets. These types of records are referred to as “raw clinical data.” However, many electronic health records do not include robust structured data fields that permit storage of clinical data in a structured format. Where electronic medical record systems capture clinical data in a structured format, they do so with a primary focus on data fields required for billing operations or compliance with regulatory requirements. The remainder of a patient’s record remains isolated, unstructured, and inaccessible within text-based or other raw documents, which may even be stored in adjacent systems outside of the formal electronic health record. Additionally, physicians and other clinicians would be overburdened by having to manually record hundreds of data elements across hundreds of discrete data fields.
- As a result, most raw clinical data is not structured in the medical record. Hospital systems, therefore, are unable to mine and/or uncover many different types of clinical data in an automated, efficient process. This gap in data accessibility can limit a hospital system’s ability to plan for precision medicine care, which in turn limits a clinician’s ability to provide such care.
- Several software applications have been developed to provide automated structuring, e.g., through natural language processing or other efforts to identify concepts or other medical ontological terms within the data. Like manual structuring, however, many such efforts remain limited by errors or incomplete information.
- Efforts to structure clinical data also may be limited by conflicting information within a single patient’s record or among multiple records within an institution. For example, where health systems have structured their data, they may have done so in different formats. Different health systems may have one data structure for oncology data, a different data structure for genomic sequencing data, and yet another different data structure for radiology data. Additionally, different health systems may have different data structures for the same type of clinical data. For instance, one health system may use one EMR for its oncology data, while a second health system uses a different EMR for its oncology data. The data schema in each EMR will usually be different. Sometimes, a health system may even store the same type of data in different formats throughout its organization. Determining data quality across various data sources is thus both a common occurrence and a significant challenge within the healthcare industry.
- What is needed is a system that addresses one or more of these challenges.
- In one aspect, in general, a computer-implemented method of performing improved automated quality assurance testing of structured patient data based on transformed unstructured data streams includes (i) receiving, via one or more electronic data streams, unstructured patient data; (ii) processing, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validating the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) causing, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receiving, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
- In another aspect, in general, a computing system includes one or more processors; and one or more memories having stored thereon instructions that, when executed by the one or more processors, cause the computing system to: (i) receive, via one or more electronic data streams, unstructured patient data; (ii) process, via the one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via the one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) cause, via the one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receive, via the one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
- In yet another aspect, in general, a computer-readable medium includes instructions that, when executed by one or more processors, cause a computer to: (i) receive, via one or more electronic data streams, unstructured patient data; (ii) process, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping; (iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records; (iv) cause, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and (v) receive, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
-
FIG. 1 shows an exemplary user interface that a clinical data analyst may utilize to structure clinical data from raw clinical data; -
FIG. 2 depicts one example of EMR-extracted structured data that includes a payload of diagnosis-related data; -
FIG. 3 depicts one example of EMR-extracted structured data that includes a payload of medication-related data; -
FIG. 4 depicts a user interface that may be used by a conflict resolution user when a complex disagreement is identified for a patient record; -
FIG. 5 depicts a user interface that may be used by a conflict resolution user when a more straightforward disagreement is identified for a patient record; -
FIG. 6 depicts a list of test suites within a “demographics” root level category; -
FIG. 7 depicts an exemplary test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing; -
FIG. 8 depicts a second exemplary test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing; -
FIG. 9 depicts one example of a user interface through which a manager-level user can view and maintain validations, quickly determine which patient cases have passed or failed, obtain the specific detail about any failed validation, and quickly re-assign cases for further manual QA and issue resolution prior to clinical sign-out and approval; -
FIG. 10 depicts an exemplary user interface for performing quality assurance testing based on generic abstractions from raw documents; -
FIG. 11 depicts an exemplary user interface that is used to provide abstraction across multiple streams of raw clinical data and documents; -
FIG. 12 depicts an exemplary user interface for performing an inter-rater reliability analysis; -
FIG. 13 depicts another exemplary user interface; -
FIG. 14 depicts another visualization of the exemplary user interface of FIG. 13; -
FIG. 15 depicts one example of various metrics or reports generated by the present system; -
FIG. 16 depicts a second example of various metrics or reports generated by the present system; -
FIG. 17 depicts a third example of various metrics or reports generated by the present system; -
FIG. 18 depicts a fourth example of various metrics or reports generated by the present system; and -
FIG. 19 reflects a generalized process flow diagram for carrying out the method disclosed herein, from raw data importation, through data structuring, and then through automated quality assurance testing.
- A comprehensive data integrity evaluation and validation system is described herein, the system usable, e.g., to generate a definitive clinical record for a patient or consistency among groups, projects, or cohorts of patients. Due to the quantity and varying intricacy of the elements of a clinical record, multiple categories of basic and complex validations may be needed to provide the requisite completeness and accuracy. In the functionality described below, various authors use software tools to compose validation rules that can be run independently on one or more patient records or applied in aggregate to all patient records comprising a given grouping, project, or defined cohort.
- These validations can be applied to a specific attribute (e.g. gender) or to a combination of attributes (e.g. gender and primary diagnosis) that results in the authoring of basic and advanced rule-based logic. In particular, the system may include a dynamic user interface enabling a user to design and build a new query by selecting one or more attributes represented in the system and then associating a desired rule (e.g. is present, is above/below/within a certain threshold value or range, etc.) with those attributes. Validation rules can operate in a stand-alone fashion or can be chained and/or linked together at a project and/or patient cohort level.
- The construction of these validations is performed through the selection of one or more existing query sets as part of a validation query and/or through the design of a new query. Alternatively, validation checks can also be grouped and bundled into query sets or used individually as part of an ad-hoc quality assurance check initiated either manually or automatically upon delivery of a cohort of patient data. Still further, the system may maintain the ability to programmatically seed and/or populate a predefined set of validation rules that may be applicable to one or more streams.
- A validation rule may be composed of a seeded set of rules and/or checks that enable data integrity. Examples of validation rules include: date-related rules, such as requiring an entry in a date field or confirming that a date falls within a proper time period (e.g., returning an error if the rule requires the date to be earlier than or equal to the present date and the entered date is in the future, or if the rule requires a future date and the entered date is in the past); range rules confirming that an entered value is within a predetermined range (e.g., an ECOG value must be between 0 and 5, and a Karnofsky score must be between 0 and 100); rules determining whether a metastatic site distance is sufficiently close to a tumor site diagnosis; and cross-field rules determining whether data received in a certain field conflicts with entered data in another field (e.g., any non-zero entry or any entry whatsoever in a gravidity field should return an error if the patient’s gender in another field is indicated to be “male,” or an entered staging value is invalid given a cancer diagnosis or sub-type entry in another field).
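A minimal sketch of such rules, with hypothetical rule and field names loosely following the examples above (date checks, range checks such as ECOG 0-5 and Karnofsky 0-100, and cross-field conflicts such as gravidity vs. gender):

```python
from datetime import date

def check_date_not_future(record, field):
    """Date rule: the field must be present and not later than today."""
    d = record.get(field)
    return d is not None and d <= date.today()

def check_in_range(record, field, lo, hi):
    """Range rule: the field must be present and within [lo, hi]."""
    v = record.get(field)
    return v is not None and lo <= v <= hi

def check_gravidity_consistent(record):
    """Cross-field rule: any gravidity entry conflicts with a recorded
    gender of "male" in another field."""
    if record.get("gender") == "male":
        return record.get("gravidity") is None
    return True

record = {
    "gender": "female",
    "gravidity": 2,
    "diagnosis_date": date(2016, 10, 10),
    "ecog": 3,
}
assert check_date_not_future(record, "diagnosis_date")
assert check_in_range(record, "ecog", 0, 5)                   # ECOG 0-5
assert check_in_range(record, "karnofsky", 0, 100) is False   # missing field fails
assert check_gravidity_consistent(record)
```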
- From a system perspective, a series of API endpoints await a sufficiently articulated and valid rule definition as well as a corresponding validation rule name. The API for the service may enable the creation, update, and/or deletion of the validations; alternatively, the validations may be managed in an administrative user interface or directly via database queries.
- A plurality of rules optionally may be grouped as a set, as compared to being evaluated independently. Thus, in a separate transaction, a rule can be associated with a query set (a combination of validation queries) and/or a specific cohort of patients, where it can be run automatically to detect data inconsistencies and anomalies. Query sets are groupings of validation rules and checks, grouped as a result of similarity in the types of checks performed and/or the needs of a quality assurance (“QA”) user wanting to verify the integrity of patient records via bulk and/or combined validation rules and checks. As an example, a patient cohort may include a sub-group of patients with a certain diagnosis sub-type (e.g., ovarian or lung within a cancer type) and/or a sub-subgroup of patients with a particular diagnosis stage or molecular mutation or variant within the sub-group. It will be understood that patient cohorts are not limited to oncological areas of medicine but may apply to groupings of patients in other disease areas as well, such as cardiovascular disease, atrial fibrillation, immunodeficiency diseases, etc. With regard to patient cohorts, rules can be evaluated on those cohorts to determine whether a subset of patients satisfy validation requirements specific to the subset, as compared to generic rules that may apply to all patients.
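Grouping rules into a query set and evaluating it over a patient cohort could be sketched as follows; the rule names, record fields, and cohort are hypothetical:

```python
def run_query_set(query_set, patient):
    """Apply every rule in a query set to one patient record."""
    return {name: rule(patient) for name, rule in query_set.items()}

def run_on_cohort(query_set, cohort):
    """Evaluate a cohort; each patient gets a pass/fail map."""
    return {p["id"]: run_query_set(query_set, p) for p in cohort}

# A query set: a combination of validation queries grouped by similarity.
query_set = {
    "has_diagnosis": lambda p: "diagnosis" in p,
    "stage_valid": lambda p: p.get("stage") in {"1", "2", "3", "4"},
}

# A cohort: a sub-group of patients sharing a diagnosis sub-type.
cohort = [
    {"id": "pt-1", "diagnosis": "lung", "stage": "3"},
    {"id": "pt-2", "diagnosis": "lung"},
]
results = run_on_cohort(query_set, cohort)
print(results["pt-2"])  # {'has_diagnosis': True, 'stage_valid': False}
```

The same query set can equally be run against a single patient record for an ad-hoc check, or against the whole cohort in bulk, as described above.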
- Applying a query set to a patient record or a portion thereof may result in the system verifying an accuracy of the data structuring within an acceptable system- or user-defined threshold level, in which case the structured data may be deemed accepted and the patient record may be amended to include that structured data. In another instance, the query set result may indicate the presence of one or more errors in the data structuring, requiring further review and/or modifications to the structured data, and the patient record then may be amended to include the modified structured data.
- In order to properly apply the validation rules, it may be necessary to standardize, normalize, or otherwise structure the input data. Thus, systems and methods are described herein that permit the automatic analysis of different types of structured clinical data. The structured clinical data may differ on the basis of the types of data elements within each list of structured clinical data, the organization of data elements within a structured clinical data schema, or in other ways.
- Within the context of the present disclosure, the following terms may be understood to have the following meanings:
- “Structured” clinical data refers to clinical data that has been ingested into a structured format governed by a data schema. As one simple example, structured clinical data may be patient name, diagnosis date, and a list of medications, arranged in a JSON format. It should be understood that there are many, more complicated types of structured clinical data, which may take different formats.
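As a hypothetical illustration of that simple case, such a record might be serialized as:

```python
import json

# A hypothetical patient record in the simple JSON shape described
# above: patient name, diagnosis date, and a list of medications.
structured_record = {
    "patient_name": "Jane Doe",
    "diagnosis_date": "2016-10-10",
    "medications": ["Paclitaxel", "Carboplatin"],
}
payload = json.dumps(structured_record, indent=2)
print(payload)
```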
- “Data schema” means a particular set of data attributes and relationships therein that comprise a set of structured data to be used for various purposes (e.g. internal analysis, integration with purpose-built applications, etc.).
- “Data element” means a particular clinical and/or phenotypic data attribute. For instance, a comorbidity (e.g. acute myocardial infarction), adverse event (e.g. conjunctivitis), performance score (e.g. ECOG score of 3), etc.
- “Data value” means the value of the data in a data element. For instance, in a “Diagnosis Date” data element, the data value may be “Oct. 10, 2016”.
- Certain systems and methods described herein permit a patient’s structured clinical record to be automatically evaluated and scored in a consistent manner, while also allowing for the determination of data integrity across various data sources. In particular, given that a patient may have disparate structured data residing in multiple applications and/or EMR databases within and across institutions, it may be a challenge to determine whether the structured data that exists within these sources is at a sufficient level of granularity and/or accuracy when analyzed independently and/or combined. Issues also may arise relating to clinical informatics, where a particular raw value may not have been correlated with a recognized medical ontology and/or vocabulary. Given this context, a structured clinical record may benefit from the use of validation rules and checks like those described herein. In particular, because the data is structured, it may be possible to determine whether the particular data in a field is in an appropriate format, is in an acceptable range, etc. For example, with regard to lab results and/or readings, while certain such results may be represented as numbers, structuring that data may permit it to be captured in a manner that can be validated automatically and/or used for aggregate population evaluation. Because these validation checks can apply across phenotypic/clinical, molecular, and other types of patient-specific information from various structured data sources, a system as described in this application can uniquely identify and support the resolution of gaps in a patient’s record. Additionally, with validation rules and checks, as well as a toolset as described in this application, structured and unstructured data across multiple data sources can automatically be analyzed to ensure a higher degree of patient data accuracy and fidelity.
Thus, in some aspects, inter-rater reliability and a comprehensive clinical data validation system facilitate the identification and resolution of gaps in a patient’s record when abstracted across multiple disparate streams.
- Certain systems and methods may be utilized within an overall clinical data structuring platform. The platform may include a workflow tool and an administrative user interface for querying, reporting, and output tagging.
- In one aspect, the system may support externally sourced data validations and/or edit checks corresponding to custom data science analysis workflows as well as data integrity enforcement for various purposes, such as for clinical trial management. In this context, “externally sourced” may refer to validation rules or checks authored by one or more external parties, e.g., health systems, clinical trial management services, etc., importable and ingestible into the present validation system, for use and integration with other rules and/or validation checks. “Externally sourced” also may refer to ingestion of other validations originated by other individuals or applications other than the present validation system while still internal to the entity employing the present system.
- Additionally or alternatively, the system may compare multiple sets of structured clinical data for a single patient, select the most correct data element for each of the structured data elements, and return a new list of structured clinical data containing the most correct data element value for each data element. The new list reflects a single “source of truth” for a patient based on the raw clinical data for that patient.
- Certain systems and methods may make use of various systematic validation checks at multiple stages in a process that commences with raw data input and ends with the data being curated, including at a data abstraction stage and/or a quality assurance stage. Additional stages in this timeline may include a data sufficiency score-carding stage in which the raw inputs are analyzed to determine whether they contain a sufficient amount of clinical data to proceed with the abstraction stage, and a downstream stage in which validation checks are used for patient cohorts. Thus, these systematic validation checks can be applied before data abstraction of raw clinical documents and notes. Additionally or alternatively, once the data abstraction process has been completed, the validation checks can be re-run or re-initiated to evaluate the quality of the abstraction.
- In certain embodiments, the structured clinical data may be merged into a larger dataset. The larger dataset may have the same or a similar data schema to the structured clinical data. The larger dataset may be used for the conduct of research, may be associated with published research or clinical guidelines, and may be provided to third parties for their own research and analysis.
- Turning now to
FIG. 1, an exemplary user interface that a clinical data analyst may utilize to structure clinical data from raw clinical data is depicted.
- In one aspect, the input data may be abstracted data that signifies a comprehensive, dynamic representation of a patient’s clinical attributes across multiple categories, e.g., demographics, diagnosis, treatments, outcomes, genetic testing, labs, etc. Within each of these categories, attributes may be repeated to reflect multiple instances of a particular clinical data attribute present in multiple locations within the patient data. In particular, since abstraction is based on a full history of a patient’s clinical record and associated documents, multiple attributes may be repeated across different data collection time periods and visit dates. For example, attributes like demographics (e.g. smoking status), treatments (e.g. therapies prescribed), outcomes (e.g. RECIST response level), and others can have one or more values ascribed to a given patient’s abstracted clinical attributes.
- In a second aspect, patient data can be extracted from source records, research projects, tracking sheets and the like. For example, sample source fields from unstructured flat files may include: enrollment-date, age_at_enrollment, sex, race, marital_status, gravidity, menopause, cancer status, age_at_diagnosis, laterality, T_stage_clinical, T_stage_pathological, histology, grade, etc., and the system may extract both the source fields as well as their respective data values.
- In both aspects, the form of this input data often is inconsistent and varies with the principal investigator, researcher, and/or partnering organization providing the patient data. For example, data models may vary substantially between a researcher, principal investigator, and/or organization collecting and maintaining patient data. Additionally, since the data can be collected in real-time from various systems or data capture mechanisms, the raw data ascribed to the data model must be considered capable of dynamic, continuous updates as more information is obtained and/or recorded. As a result, a mapping exercise may be required to relate information from unstructured data originating in flat files into a canonical schema, format, and/or model for evaluation purposes. Mapping also may be required to be able to run validation rules and checks across consistent data that has been merged into a single data model and/or schema for evaluation. In particular, the mapping exercise may identify source data fields and attributes from the data provider, e.g., a third party organization or researcher, and analyze that data in its raw form in order to determine linkages between the data and medical concepts or terminology reflected by the data and a data model used by the system. Such concept mapping may be performed manually by specially-trained informatics engineers or other specialists, or by one or more software applications specifically designed to undertake such mapping, as would be appreciated by one of ordinary skill in the relevant art.
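A rough sketch of such a schema-and-concept mapping, assuming a hypothetical canonical model and using source field names from the flat-file example above:

```python
# Hypothetical schema map: flat-file source fields (as in the example
# above) mapped to dotted paths in a canonical data model.
SCHEMA_MAP = {
    "age_at_diagnosis": "diagnosis.age",
    "T_stage_clinical": "diagnosis.t_stage_clinical",
    "histology": "diagnosis.histology",
    "sex": "demographics.sex",
}

# Hypothetical concept map: raw source values normalized to
# canonical concepts for comparison and validation.
CONCEPT_MAP = {
    ("demographics.sex", "F"): "female",
    ("demographics.sex", "M"): "male",
}

def map_record(flat_row):
    """Relate one flat-file row into the canonical schema."""
    canonical = {}
    for src_field, target in SCHEMA_MAP.items():
        if src_field not in flat_row:
            continue
        value = flat_row[src_field]
        value = CONCEPT_MAP.get((target, value), value)
        section, attr = target.split(".")
        canonical.setdefault(section, {})[attr] = value
    return canonical

row = {"sex": "F", "histology": "adenocarcinoma", "age_at_diagnosis": 54}
print(map_record(row))
# {'diagnosis': {'age': 54, 'histology': 'adenocarcinoma'},
#  'demographics': {'sex': 'female'}}
```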
- In a third aspect, patient data may be Electronic Medical Record (EMR)-extracted structured data. This data can include a set of text strings representing various clinical attributes but may also include various ontological code systems and concepts to represent each text string in a way that can be compared against other data sets and/or validations. As a result of this structuring, the data mapping exercise may be significantly more straightforward than the exercise required for either of the other two instances.
FIG. 2 depicts one example of EMR-extracted structured data that includes a payload of diagnosis-related data, specifically, data pertaining to a diagnosis of Malignant neoplasm of larynx, unspecified. Similarly, FIG. 3 depicts one example of EMR-extracted structured data relating to the medication Paclitaxel, provided intravenously.
- In a fourth aspect, patient data may be extracted through a clinical concept identification, extraction, prediction, and learning engine such as the one described in the commonly-owned U.S. Pat. No. 10,395,772, titled “Mobile Supplementation, Extraction, and Analysis of Health Records,” and issued Aug. 27, 2019, the contents of which are incorporated herein by reference in their entirety. Additional relevant details may be found in the commonly-owned U.S. Pat. Application No. 16/702,510, titled “Clinical Concept Identification, Extraction, and Prediction System and Related Methods,” filed Dec. 3, 2019, the contents of which also are incorporated herein by reference in their entirety. The output of this engine may be a configurable and extensible set of predictions about a given patient’s clinical attributes across a variety of content types. These types may include (but may not be limited to) primary diagnosis & metastases sites, tumor characterization histology, standard grade, tumor characterization alternative grade, medication / ingredient, associated outcomes, procedures, adverse events, comorbidities, smoking status, performance scores, radiotherapies, imaging modality, etc.
- In order to make use of data from one or more of these streams, the system may be configured to automatically initiate the evaluation of both partial and fully structured patient clinical records across multiple sources and/or streams through a variety of triggering events. Such events may include, e.g.: (1) receiving an on-demand request, e.g., via a job runner by an Administrative user using an Administrator-driven user interface that can initiate the process programmatically, (2) via a background service triggered upon receipt of new software code commits or corresponding application build phases, (3) when new data is either received or ingested across sources and streams, (4) upon achieving a sufficient inter-rater or intra-rater reliability scoring system, which is run automatically on a configurable percentage of patient records as part of a project or batch, (5) upon completion of either a case abstraction and/or QA activity, (6) a bulk initiation of evaluation of multiple structured clinical records once all have been completed, e.g., upon receipt of clinical data and/or records for patients participating in an institution’s clinical trial, which may be obtained via a site coordinator, via EMR or source records, or (7) real-time analysis during creation of a patient note or other clinical data. Each of these trigger events is discussed in greater detail, as follows. Data analysis also may be triggered in one or more other ways, including via an automated trigger. Such automated triggers may occur, e.g., when a case has been submitted and recorded successfully, when a case has generated a data product representing all of the structured content, or in preparation for data delivery to a partner expecting a set of de-identified patient records containing structured clinical records that have been validated for quality, accuracy and consistency.
- Trigger #1 (on-demand): a user with appropriate authorization can manually initiate one or more distinct tests to support the evaluation of one or more patient clinical records. In its default state, this functionality manifests itself as part of a graphical user interface presented after entering a specific request for one or more tests at a terminal window command line.
- Trigger #2 (on receipt of code commits): tests can be initiated en masse via a background service or selectively when only a subset of tests are required to validate specific patient clinical data and/or attributes. In this aspect, validation may take advantage of “continuous integration,” or the practice of integrating new code with existing code while embedding automated testing and checks into this process to minimize and/or eliminate gaps and issues in production-level software and applications. As part of this process, new code commits are made, reviewed, approved and merged into various code branches for subsequent application build phases while intermediate software (e.g. Jenkins) maintains responsibility for running one or more test suites programmatically and recording their output (e.g. failed, pending and passed) as well as collecting details, stacktraces and/or screenshots resulting from these tests.
- Trigger #3 (new data ingested): an integration engine and/or intermediate data lake receives and processes new structured data which may also initiate corresponding tests to evaluate and score the data as its own distinct stream as well as comparatively to any existing data received for the patient. In one possible implementation, an integration engine may receive a stream of XML and/or JSON content comprising structured data and corresponding ontological code systems and concepts as extracted from a health system’s EMR at a single point in time. Upon receipt, this data would be evaluated against one or more test suites for accuracy, coverage and/or insufficiency. It may also be compared and evaluated relative to other patient record data received via other sources and similarly run through one or more test suites. In another possible implementation, the system may receive a FHIR-compliant payload from partners that contains one or more genetic / genomic testing results for one or more patients. In this example, the test suite for genetic testing referenced above may be run programmatically to evaluate the integrity of this data and may also be compared and evaluated relative to other genetic testing content already ingested and/or abstracted as part of one or more patient records.
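A minimal sketch of the structural-integrity evaluation of an ingested genetic-testing payload might look as follows; the field names are illustrative assumptions rather than the actual FHIR profile used:

```python
# Sketch of evaluating an ingested payload for missing required fields
# before comparing it against previously ingested content. Field names
# are assumptions, not the actual FHIR-compliant schema.
REQUIRED_KEYS = {"patientId", "gene", "result", "testDate", "provider"}

def evaluate_payload(payload: dict) -> list:
    """Return a sorted list of missing required fields (empty means pass)."""
    return sorted(REQUIRED_KEYS - payload.keys())

incoming = {"patientId": "p-42", "gene": "KRAS", "result": "Amplification"}
missing = evaluate_payload(incoming)  # ['provider', 'testDate']
```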
- Trigger #4A (inter-rater reliability): the system will evaluate two instances of a patient’s abstracted clinical data and compose a score at both the case and field levels to assess the level of agreement between a plurality of abstractors (or “raters”) and to determine whether to automatically begin the evaluation process. In this example, “automatically” may refer to a systematic assignment of a subset of patient cases that will be abstracted by two distinct individuals in a “double-blind” manner where the reviewer may also be unaware of participant identities. Further, a scoring scheme is used to calculate the proficiency and accuracy of each submission by taking into account the modifications and updates made by a conflict resolution user.
- The system may assign a first version or instance of a case or data stream to a first rater and a second version or instance of the case or data stream to a second rater, i.e., the plurality of raters may review the same subset of cases or records, after which the system may determine whether there is a sufficiently high degree of overlap and/or agreement between each rater’s abstraction. When the requisite threshold is not met, a third-party conflict resolver may review the raw clinical data and each rater’s abstraction content in order to generate a de facto or “best” abstraction of the patient record. In one aspect, the conflict resolver may select from among the abstractions provided by the other raters. In another aspect, the conflict resolver additionally or alternatively may provide its own abstraction and select the “best” abstraction from the group that includes its own abstraction and those of the other raters.
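The degree of overlap between two raters may be computed in many ways; the following sketch uses simple percent agreement at the field level as a hypothetical stand-in for the system's scoring scheme:

```python
# Sketch of a field-level inter-rater agreement score. Simple percent
# agreement stands in for whatever scoring scheme the system actually uses;
# the 0.8 threshold is an illustrative assumption.
def field_agreement(rater_a: dict, rater_b: dict) -> float:
    """Fraction of abstracted fields on which the two raters agree."""
    fields = set(rater_a) | set(rater_b)
    if not fields:
        return 1.0
    matches = sum(1 for f in fields if rater_a.get(f) == rater_b.get(f))
    return matches / len(fields)

a = {"gender": "F", "race": "Asian", "stage": "IIB"}
b = {"gender": "F", "race": "White", "stage": "IIB"}
score = field_agreement(a, b)  # 2 of 3 fields agree
needs_conflict_resolution = score < 0.8  # route to a conflict resolver
```

When the threshold is not met, the record would be routed to the third-party conflict resolver described above.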
- With regard to this trigger,
FIG. 4 illustrates one of the steps to be performed by a conflict resolution user when a complex disagreement is identified for a patient record. In this example, a conflict resolver must evaluate the radiotherapies cited by the two abstractors and determine which are in fact appropriate for the “de facto” patient clinical record by moving the most correct items to therapy groups. - Conversely,
FIG. 5 illustrates one of the steps to be performed by a conflict resolution user when a basic disagreement is identified for a patient record. In this example, a conflict resolver must evaluate the demographic data cited by the two abstractors and determine which are in fact appropriate for the “de facto” patient clinical record by selecting the correct “race” clinical data value. - Trigger #4B (intra-rater reliability): like the previously-disclosed trigger, the system also may be used to evaluate a plurality of abstractions from a single rater, in order to determine how consistent the rater is in his or her efforts. The notes or other clinical data reviewed by the rater may relate to the same patient, e.g., different portions of a patient’s record, or they may be similar or distinct portions of raw clinical data from multiple patients.
- Trigger #5 (case abstraction completion and/or quality assurance completion): clinical data attributes for the patient record may be evaluated systematically for gaps in logic through the use of a clinical data validation service that centralizes a number of rules (see below for details) and works in conjunction with a cohort sign-out process.
- Trigger #6 (upon receipt of clinical data and/or records for patients participating in an institution’s clinical trial): clinical data attributes for a patient potentially eligible for participation in a clinical trial may be evaluated on-demand or as part of a broader batch of patients from that institution on a rolling basis. With regard to this workflow, the present system and method may support the workflow’s ability to identify gaps in clinical attributes that may be required for inclusion / exclusion criteria evaluation and matching.
- Trigger #7 (on-demand analysis): structured data may be extracted, either directly or via a mapping procedure, from a clinical note while that note is being created or dictated by a physician or other clinician. The structured data is analyzed, and errors, incomplete information, or conflicting information in the underlying data are reported back to the clinician in real time.
- Regardless of the choice of triggering event, the default set of evaluation criteria (e.g. test suites) may be composed at a category-level (e.g. demographics, diagnosis, genetic testing and labs, treatments and outcomes) along with nested sub-groupings that allow for granular and precise evaluation of clinical patient attributes by type. For example, and with regard to the depiction in
FIG. 6 of a list of test suites within a “demographics” root level category, a test may be written to determine whether a record of ovarian cancer was a correctly structured instance: - Primary tumor instance identified as part of a patient record
- Tissue of origin identified for a corresponding primary tumor instance
- e.g. Ovary
- Date of diagnosis identified for a primary diagnosis
- e.g. Dec. 15, 2015
- Date of recurrence identified for a primary diagnosis
- e.g. Mar. 5, 2016
- Diagnosis (e.g. histology) identified for the corresponding primary diagnosis
- e.g. Ovarian stromal tumor
- Standard grade identified for the corresponding primary diagnosis
- e.g. Grade 2 (moderately differentiated)
- AJCC staging identified for the corresponding primary diagnosis
- e.g. T1B, N0, M0 (Stage 1B)
- In this example, a determination that the record was structured “correctly” may mean more than simply determining whether there are data values in each of the specified fields and attributes. Instead, correct structuring also may signify that all of the attributes listed were adequately provided and mapped to accepted and/or preferred medical concepts, i.e., that the requisite data was provided, represented, and properly fulfilled all validation checks managed by the system. Mapping may relate to both a system-defined data model as well as one or more external models, such as the Fast Healthcare Interoperability Resources (“FHIR”) specification. In this regard, the system may include one or more test suites that define the criteria for the relevant categories and nested sub-groupings and then may execute relevant validation checks to carry out those test suites.
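The required-attribute portion of such a test suite might be sketched as follows, with hypothetical field names standing in for the system-defined data model:

```python
# Sketch of the FIG. 6-style check: verify that a primary-tumor instance
# carries each required attribute. Field names are illustrative assumptions.
REQUIRED = ["tissue_of_origin", "date_of_diagnosis", "date_of_recurrence",
            "histology", "standard_grade", "ajcc_staging"]

def check_primary_tumor(record: dict) -> dict:
    """Return per-field presence results for a patient's primary tumor."""
    tumor = record.get("primary_tumor")
    if tumor is None:
        return {"primary_tumor": False}
    return {field: bool(tumor.get(field)) for field in REQUIRED}

record = {"primary_tumor": {
    "tissue_of_origin": "Ovary",
    "date_of_diagnosis": "2015-12-15",
    "date_of_recurrence": "2016-03-05",
    "histology": "Ovarian stromal tumor",
    "standard_grade": "Grade 2 (moderately differentiated)",
    "ajcc_staging": "T1B, N0, M0 (Stage 1B)",
}}
report = check_primary_tumor(record)
passed = all(report.values())
```

As the text notes, presence alone is not sufficient; each value must also map to an accepted concept, which a separate concept-mapping check would enforce.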
- Medical concepts can span numerous dictionaries, vocabularies and ontologies, and data elements within structured data generally conform to a specific system, concept code and preferred text descriptor. For instance, in the example discussed above, for “Ovary,” i.e., the tissue of origin identified for a corresponding primary tumor instance, the system may determine whether that data instance is mapped to the “SNOMED” code of 93934004 with a preferred text descriptor of “Primary malignant neoplasm of ovary (disorder)” in order to comply with a test suite that includes the same relationship.
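Using the SNOMED example above, a concept-mapping check might be sketched as follows; the lookup-table structure is an assumption, while the code and preferred descriptor are taken from the text:

```python
# Sketch of a concept-mapping validation: the "Ovary" tissue of origin must
# resolve to the expected SNOMED code and preferred text descriptor.
EXPECTED_MAPPINGS = {
    ("tissue_of_origin", "Ovary"): {
        "system": "SNOMED",
        "code": "93934004",
        "display": "Primary malignant neoplasm of ovary (disorder)",
    },
}

def concept_mapped_correctly(field: str, value: str, mapped: dict) -> bool:
    """True only if the mapped concept matches the expected system/code/display."""
    expected = EXPECTED_MAPPINGS.get((field, value))
    return expected is not None and mapped == expected

ok = concept_mapped_correctly(
    "tissue_of_origin", "Ovary",
    {"system": "SNOMED", "code": "93934004",
     "display": "Primary malignant neoplasm of ovary (disorder)"})
```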
- In a second example, and with regard to
FIG. 7, the test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing may include evaluating whether values for the following criteria are present and accurately structured: - Initial genetic testing instance identified and/or added to a patient record
- Date identified for an instance of genetic testing
- e.g. Jan. 1, 2017
- Testing provider identified for an instance of genetic testing
- e.g. Tempus
- Test method identified for an instance of genetic testing
- e.g. Mutation analysis
- Gene result detail identified for an instance of genetic testing
- e.g. Gene: KRAS
- e.g. Result: Amplification
- e.g. Raw Result: 100
- e.g. Detail: N/A
- Tumor mutational burden identified for an instance of genetic testing
- e.g. 10
- Microsatellite instability identified for an instance of genetic testing
- e.g. High
- In a third example, and with regard to
FIG. 8, a test suite for determining sufficiency of a structured and/or abstracted instance of genetic testing may include the following criteria: - Initial genetic testing instance identified and/or added to a patient record
- Date identified for an instance of genetic testing
- e.g. Jan. 1, 2017
- Testing provider identified for an instance of genetic testing
- e.g. Tempus
- Test method identified for an instance of genetic testing
- e.g. Mutation analysis
- Gene result detail identified for an instance of genetic testing
- e.g. Gene: KRAS
- e.g. Result: Amplification
- e.g. Raw Result: 100
- e.g. Detail: N/A
- Tumor mutational burden identified for an instance of genetic testing
- e.g. 10
- Microsatellite instability identified for an instance of genetic testing
- e.g. High
- In one aspect, the evaluation and/or analysis performed as part of the system referenced above may comprise a combination of several of the trigger mechanisms discussed above. Thus, the analysis of data can be initiated programmatically or manually by one or more of the triggers on a particular set of patient data records (either structured or unstructured) and from multiple disparate data sources. For example, the system may include: (1) automated and continuously maintained test suites specific to one or more clinical attributes and/or content types, (2) clinical data validation processes performed at run-time during abstraction as well as quality assurance activities, and (3) inter-rater reliability (IRR). Additionally, the triggers may evolve or be revised over time to generate a more robust, more complete quality assurance system. For example, test suites may grow continuously to support more templates or later-generated abstraction fields for clinical data structuring. Similarly, the clinical data validations (errors, warnings, etc.) may be maintained in a library programmatically via web service endpoints or a user interface that supports the addition of new validations and corresponding definitions of rules, e.g., using a rule builder. The system may generate multiple streams of abstracted clinical data that can be evaluated and re-assigned to a more sophisticated user with deeper clinical background to help resolve any conflicts, thereby producing a de facto “source of truth” for a given patient’s clinical record.
- In still another example, the system may rely on data from other patients to determine whether the data in a target patient’s record appears correct or whether it may warrant an alert signifying a potential error or an otherwise unexpected finding. For each validation rule and/or check performed (or triggered) on a given patient record, anomalies can automatically be detected or ascertained when a newly validated patient record contains data (e.g. clinical or molecular) that have not been found in any previous patient records run through the validation rule and/or check. For example, a patient record may include both clinical and molecular data, where the molecular data may include data reflecting a “new” gene, in that there may not be much, if any, clinical knowledge regarding the medical effects of having the gene. In another example, a molecular variant present in structured data for a patient from a 3rd party NGS lab that is invalid or unknown to science may be flagged for manual review as it may have been mis-keyed or entered incorrectly into a structured clinical record. In both cases, the system may search its data store for indications of other patients with that gene. For example, the system may use a library of known valid molecular variants as well as a review of all previous molecular variants found in previous data structuring activities for other patient records to detect anomalous data elements. The system then may search for similarities in clinical data among those other patients in order to develop a template test suite. Thus, the system may assume that the other patients’ clinical data is accurate, such that deviations from that data when a validation check is performed on a subject patient’s data may trigger an alert to the provider or reviewer as to either an error in the subject patient’s data or, alternatively, to an unexpected result that may warrant further investigation.
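The variant-level anomaly detection described above might be sketched as follows, with hypothetical variant libraries standing in for the system's data store:

```python
# Sketch of anomaly detection against a library of known valid molecular
# variants plus variants seen in previously structured patient records.
# The variant sets below are illustrative assumptions.
KNOWN_VALID = {"KRAS G12D", "EGFR L858R", "BRAF V600E"}
PREVIOUSLY_SEEN = {"KRAS G12D", "TP53 R175H"}

def flag_anomalous_variants(variants: list) -> list:
    """Variants neither known to science nor previously observed are
    flagged for manual review (possible mis-keyed entries)."""
    recognized = KNOWN_VALID | PREVIOUSLY_SEEN
    return [v for v in variants if v not in recognized]

# 'KRSA G12D' resembles a mis-keyed 'KRAS G12D' and is flagged
flagged = flag_anomalous_variants(["EGFR L858R", "KRSA G12D"])
```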
- In one instance, validations may be fairly straightforward, e.g., when comparing different portions of a patient record, is the system able to extract a patient’s gender from more than one location and do those gender-based attributes match up? In those instances, a test suite that instructs the system to query one or more known portions of a record for gender-identifying information, to review that information for internal consistency (if more than one portion of the record is considered), and to return that gender as an attribute for the patient may be usable for multiple use cases as a fairly generic test suite. In another example, the test suite may seek to compare the structured patient data against a set of one or more guidelines, e.g., clinical trial inputs or metrics reflecting general patient population results (e.g., survival, progression, etc.), to determine whether the patient’s data is in-line with those guidelines or reflects a potential error or outlier.
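The generic cross-section gender-consistency check might be sketched as follows; the section names are illustrative:

```python
# Sketch of the generic consistency check described above: extract a
# patient's gender from multiple record sections and verify agreement.
# Section names are hypothetical.
def gender_consistency(record: dict, sections: list) -> tuple:
    """Return (consistent, gender); gender is None when sections disagree
    or no gender attribute was found."""
    values = {record[s].get("gender") for s in sections if s in record}
    values.discard(None)
    consistent = len(values) <= 1
    return consistent, (values.pop() if consistent and values else None)

record = {"demographics": {"gender": "female"},
          "encounter_note": {"gender": "female"}}
ok, gender = gender_consistency(record, ["demographics", "encounter_note"])
```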
- In another instance, the validation may be structured to match a clinical practice guideline that must be met before a patient is eligible to receive a therapy. One example set of clinical practice guidelines is the National Comprehensive Cancer Network Clinical Practice Guidelines. A validation may be structured to include the relevant criteria from one or more practice guidelines. If the patient record contains information that permits the validation to pass successfully, then the patient may be permitted to receive the therapy.
- In another instance, validations may be specific to certain use cases based, e.g., on other data extracted from a patient record. For example, certain types of cancer are gender-specific. Thus, a quality assurance validation or rule that says “if structured data extracted from the patient record includes an attribute for prostate cancer, then a patient gender of ‘female’ represents an error” is useful for prostate cancer use cases but not for other cancers or diseases.
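The prostate-cancer rule quoted above might be encoded as follows; the record schema is a hypothetical simplification:

```python
# Sketch of a use-case-specific validation rule: a prostate cancer
# diagnosis combined with a gender of 'female' represents an error.
# The flat record schema is an illustrative assumption.
def validate_gender_specific(record: dict) -> list:
    """Return a list of error messages (empty means the rule passes)."""
    errors = []
    if (record.get("diagnosis") == "prostate cancer"
            and record.get("gender") == "female"):
        errors.append("gender 'female' conflicts with prostate cancer diagnosis")
    return errors

errs = validate_gender_specific({"diagnosis": "prostate cancer",
                                 "gender": "female"})
```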
- In still another instance, validations may be multi-variable or require more than a simple cross-check of two fields against one another. For example, with regard to lung or breast cancer, a patient record may document scenarios that reflect valid or invalid staging, and the relevant cancer also may have subtypes that vary based on staging. Thus, a complete validation check of a test suite may require that the system evaluate all of the possibilities at each stage to determine whether the structured data is complete and internally consistent.
- Still further, the system may include an automated process for evaluating each test suite to determine whether it represents an accurate test. That process may require running through each of the possibilities that are queried in the test suite and determining that none of the tests conflict with other tests in the suite. Thus, e.g., the system may assume that a first test yields a “true” or valid result. Then, given that result, the system determines whether it is possible for a second test to also yield a “true” or valid result. The system continues in that process until a “false” or invalid result is reached or until all tests have been evaluated. In the latter case, the system may recognize that the test suite does not include any failures and may publish the test suite for actual implementation. In the former case, once an invalid result is determined, the system may flag the test suite for further review and either amendment or definitive approval, despite the invalid result.
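The suite-consistency evaluation might be sketched as follows, modeling each test as a constraint on a field's allowed values (an illustrative simplification of the process described above):

```python
# Sketch of the automated suite-consistency check: walk the tests, assume
# each yields a valid result, and flag the suite as soon as two tests
# cannot both hold. Modeling tests as (field, allowed-values) constraints
# is an illustrative assumption.
def suite_is_consistent(tests: list) -> bool:
    """The suite is consistent if every field retains at least one value
    satisfying all of its constraints."""
    allowed = {}
    for field, values in tests:
        allowed[field] = allowed.get(field, set(values)) & set(values)
        if not allowed[field]:
            return False  # conflicting tests: flag for further review
    return True  # no failures: suite may be published

ok = suite_is_consistent([("stage", {"1", "2"}), ("stage", {"2", "3"})])
bad = suite_is_consistent([("grade", {"1"}), ("grade", {"2"})])
```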
- One objective of the system is to allow for the creation, management and assignment of specific clinical data fields and their corresponding attributes via a single user interface. A dynamic management and rendering engine for template-specific fields enables the system to achieve this objective by permitting different classes of users to rapidly configure new templates with custom field configurations in minutes without code by employing a user interface that permits those users to select both the fields, as well as the hierarchy among the fields, that are desired for a given clinical data structuring project or use case. Templates may drive a determination of what content from the raw data is available to an abstractor. Finally, the system maintains a version history of every template modification made by authorized users for auditing purposes.
- In addition to the single-user-centric analysis described above, in another aspect, validations can be leveraged at a more granular project-specific level (rather than at an individual level or a cohort level), which may allow for the evaluation and scoring of specific template configurations as well as their corresponding data fields. Thus, rather than running validations against a single patient’s clinical data elements and content generally, the validation service also may be run with a batch or bulk set of patient clinical data elements that correspond to one or more projects. Data may be sourced from one or more sources, including upstream abstracted patient content (e.g., prior to structuring) or from more finalized versions of the data (e.g., from a downstream data warehouse in a structured format). Like the single-user-centric analysis described above, these bulk or test validation service checks may be configured to run either sequentially or simultaneously. The system may be configured to perform these validation checks on patients associated with projects that have been configured to these templates to ensure that data has been abstracted, captured and/or encoded properly.
- Results of the foregoing validations may be output as structured code, e.g., in a JSON file format. The file may include one or more indicators describing which clinical data attributes passed or failed a particular validation. Similarly, results of a test suite processing all clinical data attributes may produce a result output as structured code, e.g., also in a JSON format, that describes which particular test(s) within the suite passed or failed for one or more given patient records passed to it.
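One possible shape for such a JSON result file is sketched below; the key names are assumptions rather than the system's actual output schema:

```python
# Sketch of serializing validation results as structured JSON, with
# per-check pass/fail indicators and an overall result. Key names are
# illustrative assumptions.
import json

def results_to_json(patient_id: str, checks: dict) -> str:
    payload = {
        "patientId": patient_id,
        "validations": [
            {"name": name, "status": "passed" if ok else "failed"}
            for name, ok in sorted(checks.items())
        ],
        "passed": all(checks.values()),
    }
    return json.dumps(payload, indent=2)

doc = results_to_json("p-42", {"gender_consistency": True,
                               "staging_complete": False})
parsed = json.loads(doc)
```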
- The system may be usable by a plurality of different users having distinct roles. For example, the following list describes various user roles or use cases, the corresponding actions each user may take, and one or more benefits that may result from use of the system as a result of those actions:
- A clinical manager may want to evaluate a single patient, a project, an in-progress or completed cohort or one or more patients abstracted and/or QA’ed by a specific abstractor or lead user for accuracy. Additionally, this user may want to obtain an analysis of a data stream sourced externally (e.g. via EMR or structured data extract) to determine the need for further incremental abstraction of a patient’s clinical record.
- A single abstracted patient can be evaluated for accuracy through the use of the clinical data validation service either upon request, when the corresponding patient case is being submitted via Workbench or when clinical attributes are modified. Validation rules are run atop all structured clinical data for a single abstracted patient and pass / fail assignments are made as a result. The clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on a single abstracted patient at that point in time.
- A project can be evaluated for accuracy through the use of the clinical data validation service either upon request or when the project is used as a filter within the QA Manager Console user interface. At this point in time, validation rules will have already been run atop all structured clinical data for all completed and submitted patients within the given project and pass / fail assignments are retrieved as a result. The clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on abstracted patients within a project at that point in time.
- A cohort can similarly be evaluated for accuracy through the use of the clinical data validation service either upon request or when the cohort is used as a filter within the QA Manager Console. At this point in time, validation rules will have already been run atop all structured clinical data for all completed and submitted patients with the given cohort and pass / fail assignments are retrieved as a result. The clinical data validation service also maintains an “effective as of” timestamp that ensures that only appropriate validations are run on abstracted patients within a cohort at that point in time.
- Externally sourced data streams may first be ingested and mapped to a source-specific schema by a member of an integrations team. Subsequently, the schema may be aligned by a member of an informatics team to a clinical data model that allows for mapping of concepts to a canonical set of systems, codes, and values. After the schema mapping and concept mapping steps, the clinical data validation service can evaluate an externally sourced patient record upon request by using the default set of validation checks. Further, source-specific custom rules and validations may be authored within the QA Manager Console to ensure proper coverage of all desired data integrity checks.
- A clinical abstraction lead may want to identify gaps in abstraction for a patient and/or project assigned to their abstraction team, perhaps specific to a cancer type (e.g. colorectal team). In this instance, the clinical abstraction lead may want to obtain the IRR score for a project, manually initiate a test suite for one or more clinical data attributes as well as perform various validation checks. IRR scores at a project-level are aggregated and averaged across all eligible and completed IRR cases within that project. As a reminder, IRR case agreement thresholds and case eligibility percentage are configurable at the project level and will vary. A global set of validation checks are available via the clinical data validation service and can be run atop one or more patient records corresponding to a project.
- A clinical data abstractor may want to preview content ingested from third-party sources into various data streams and obtain a report including quantitative insights specific to clinical data attributes (e.g. medications, procedures, adverse events, genetic testing, etc.) that will help them to more fully abstract a patient’s clinical record from various disparate sources.
- An operational lead may want to better understand data coverage and quality gaps specific to one or more patients or in aggregate across specific projects/cohorts. Further, they may want to receive automated notifications and warnings that will alert them to take action directly with health system content partners when data validations fail and/or the automated evaluation and scoring for various clinical data streams is insufficient.
- A data scientist may want to integrate with the system to better train machine learning models based on various levels of priority and/or a trust scale for various clinical data ingested and/or structured across clinical data streams. For example, a project or cohort with a high IRR score, near-perfect clinical data validation checks and automated test suites passing may be treated preferentially to other unstructured or semi-structured clinical data with lower scores.
- An integration and/or infrastructure engineer may want to monitor various clinical data streams being ingested from external sources to verify connectivity, data sufficiency as well as quality over time.
- A quality assurance engineer may want to compare the output of their manually maintained clinical data test suites against externally sourced content programmatically or on an ad-hoc basis.
- A product manager may want to better understand the project, cohort and/or field-level scoring of abstracted and/or structured data to determine further improvements to various workflows, user interfaces and design patterns to accelerate and further streamline the data structuring operation.
- For each of the triggers discussed above, as well as for other events that may trigger the quality assurance testing disclosed herein, the system maintains a continuously growing set of stream-specific validations, warnings, and errors that help proactively inform and/or alert administrators of patient data quality and integrity issues. By making a request to the clinical data validation service, a supported application and any of its users can quickly identify whether a patient case, either individually or one within a specific cohort, has passed or failed one or more validation checks.
- Validations may be managed through a QA Manager Console user interface where they are constructed and/or grouped for use as part of quality assurance activities (at a batch and/or cohort level) and as part of on-demand evaluation criteria for one or more patient records. These validations are also useful when accounting for inclusion and exclusion criteria specific to patient cohorts for research and/or clinical trial consideration purposes.
-
FIGS. 9-12 depict one example of the user interface through which a manager-level user can view and maintain these validations, quickly determine which patient cases have passed or failed, obtain the specific detail about any failed validation, and quickly re-assign cases for further manual QA and issue resolution prior to clinical sign-out and approval. In particular, FIG. 10 depicts an exemplary user interface for performing quality assurance testing based on generic abstractions from raw documents. FIG. 11 depicts an exemplary user interface that is used to provide abstraction across multiple streams of raw clinical data and documents. FIG. 12 depicts an exemplary user interface for performing an inter-rater reliability analysis. - In another aspect,
FIGS. 13 and 14 show a second exemplary user interface that a clinical data analyst may utilize to compare, merge and generate a “single source of truth” patient record across multiple data schemas, sources and/or streams. - Turning now to
FIGS. 15-18, the system additionally may output and/or deliver various metrics and reports that provide insight into the accuracy and/or completeness of patient clinical records specific to a project as well as across selected projects for comparative and benchmarking purposes. Reporting data may include rankings and scores at both the patient record and clinical data attribute / field grain, indicative of data source / stream quality, completeness and integrity. This information becomes available to clinical data abstractors within a data curation, abstraction, and/or structuring toolset and user interface to aid in their desire to generate a “single source of truth” consolidated patient record atop various sources. It can also be used by clinical data managers to ensure a high-quality data product deliverable for partners. As seen in these figures, the system may generate outputs permitting a user to visualize the IRR scoring and conflict resolution processes, as well as to review the subsequent reporting and insights generated afterwards. Additionally, a sample visualization describing data quality across various clinical data attributes and types is included for reference. - With regard to the analytical tools described above, validation rules may be composed of hard, blocking errors (e.g., an indication of a new problem emerging after a recorded date of death) and loose warning notifications (e.g., an indication from one portion of the patient’s record that the patient has
stage 2 lung cancer while a second portion of the record indicates that the cancer is stage 3) that help to improve the integrity of a patient record during the clinical data structuring process as well as afterwards during subsequent QA activities. These validation rules can have various severity levels that indicate to an application and/or system process whether to reject fully or accept but call attention to a particular issue found in the data analyzed. Because the system may include a “sliding scale” of error severity, the results of the data quality tests may not be an “all-or-nothing” situation. Instead, as seen in FIG. 17, the system may generate quantitative metrics such as a “% success” indicator to measure the accuracy of the data structuring. This indicator also may account for the fact that a test suite may comprise dozens, if not hundreds, of different validation checks and that some may return acceptable results while others may indicate errors, missing information, or incomplete information. - Finally,
FIG. 19 depicts one exemplary process flow of the present disclosure. In that figure, external data is received by the system, where it is ranked, scored, or otherwise structured, either on its own or in consideration with other data streams from the same patient. The structured data then is run through one or more QA Automation processes, such as the processes discussed herein in order to generate metrics and reports that can be output, e.g., to an administrative user or to the institution providing the external data. - In addition to the details of data acquisition, structuring, analytical triggering, and post-trigger analysis for a plurality of different use cases set forth herein, other relevant details of those actions may be found in the commonly-owned U.S. Pat. Application No. 16/657,804, titled “Data Based Cancer Research and Treatment Systems and Methods,” filed Oct. 18, 2019, the contents of which are incorporated herein by reference in their entirety.
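The severity-aware scoring and '% success' metric described earlier may be sketched as follows, with hypothetical severity names and checks:

```python
# Sketch of severity-aware scoring: hard errors block a record, warnings
# are surfaced but accepted, and a '% success' metric summarizes the suite.
# Severity names and check names are illustrative assumptions.
def score_results(results: list) -> dict:
    """results: list of (check_name, passed, severity) tuples."""
    hard_failures = [n for n, ok, sev in results if not ok and sev == "error"]
    warnings = [n for n, ok, sev in results if not ok and sev == "warning"]
    pct = 100.0 * sum(1 for _, ok, _ in results if ok) / len(results)
    return {"blocked": bool(hard_failures), "warnings": warnings,
            "percent_success": round(pct, 1)}

summary = score_results([
    ("problem_after_death_date", True, "error"),
    ("stage_consistency", False, "warning"),  # e.g. stage 2 vs stage 3
    ("gender_consistency", True, "error"),
    ("dates_complete", True, "warning"),
])
```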
- While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Thus, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.
Claims (20)
1. A computer-implemented method of performing improved automated quality assurance testing of structured patient data based on transformed unstructured data streams, the method comprising:
(i) receiving, via one or more electronic data streams, unstructured patient data;
(ii) processing, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping;
(iii) after processing the unstructured patient data, validating the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records;
(iv) causing, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and
(v) receiving, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
2. The computer-implemented method of claim 1 , further comprising:
repeating steps (ii)-(v) until one or more of the structured patient records satisfies a threshold for accuracy.
3. The computer-implemented method of claim 2 , further comprising:
generating, via one or more processors, a digital abstraction of a patient record when the threshold for accuracy is not met.
4. The computer-implemented method of claim 1 , wherein the electronic data streams include JavaScript Object Notation-formatted data.
5. The computer-implemented method of claim 1, wherein the unstructured patient data includes at least one of: (i) patient demographics data, (ii) diagnosis data, (iii) treatments data, (iv) outcomes data, (v) genetic testing data, (vi) labs data, or (vii) other patient data.
6. The computer-implemented method of claim 1, wherein processing the unstructured patient data to generate structured patient records includes processing the unstructured patient data using natural language processing to identify medical ontological terms within the unstructured patient data.
7. The computer-implemented method of claim 1 ,
wherein performing the schema mapping includes determining a mapping between a plurality of data elements and a plurality of data values within a data schema and the unstructured patient data;
wherein the plurality of data elements includes one or more clinical and/or phenotypic data attributes;
wherein the plurality of data values includes values extracted from the unstructured patient data corresponding to respective ones of the plurality of data elements; and
wherein the data schema includes semantic relationship information describing correspondence of respective ones of the plurality of data elements to respective ones of the plurality of the data values.
8. The computer-implemented method of claim 1 ,
wherein performing the concept mapping includes determining a mapping between a data model and one or more medical concepts included in the unstructured patient data, by identifying and processing one or more source data fields and attributes in the unstructured patient data.
9. The computer-implemented method of claim 1 , further comprising:
processing, via one or more processors, at least a portion of the structured patient records to identify one or more clinical attributes; and
determining, based on the clinical attributes, whether to include at least one of the structured patient records corresponding to the at least the portion of the structured patient records in a clinical trial.
10. The computer-implemented method of claim 9 , further comprising:
determining whether to include the at least the one of the structured patient records corresponding to the at least the portion of the structured patient records in the clinical trial based on the at least one data quality test.
11. The computer-implemented method of claim 1, wherein performing the at least one data quality test on the structured patient records to identify one or more errors or one or more instances of incomplete information includes processing the structured patient records using (i) a recursive algorithm, (ii) an iterative algorithm, or (iii) a conflict resolution procedure.
12. The computer-implemented method of claim 11 , further comprising:
generating an indication of support of the at least one data quality test based on the processing of the structured patient records using the recursive algorithm; or
generating a modification of the at least one data quality test based on the processing of the structured patient records using the recursive algorithm.
13. The computer-implemented method of claim 1 ,
wherein the at least one data quality test is a genetic testing test including molecular variant criteria, and further comprising:
identifying a gene and a type of molecular variant of the gene.
14. The computer-implemented method of claim 1 , wherein the method is performed in response to detecting at least one trigger event.
15. The computer-implemented method of claim 14 , wherein detecting the at least the one trigger event includes:
(i) detecting, via one or more processors, an on-demand request;
(ii) detecting, via one or more processors, a new software code commit;
(iii) detecting, via one or more processors, an application build phase;
(iv) detecting, via one or more processors, receipt of new data;
(v) detecting, via one or more processors, ingesting of data across a source or a stream;
(vi) detecting, via one or more processors, a sufficient inter-rater or intra-rater reliability score;
(vii) detecting, via one or more processors, a completion of a case abstraction and/or a quality assurance activity;
(viii) detecting, via one or more processors, a bulk initiation of evaluation of multiple structured patient records once all have been completed; or
(ix) detecting, via one or more processors, a real-time analysis during creation of a patient note or other patient data.
16. The computer-implemented method of claim 14 , further comprising:
appending the at least the one trigger event to a continuously growing set of stream-specific validations, warnings, and errors monitored by administrators.
17. The computer-implemented method of claim 1 , further comprising:
training, via one or more processors, one or more machine learning models based on the structured patient records.
18. The computer-implemented method of claim 1 , further comprising:
generating, via one or more processors, a definitive clinical record for one or more patients.
19. A computing system, comprising:
one or more processors; and
one or more memories having stored thereon instructions that, when executed by the one or more processors, cause the computing system to:
(i) receive, via one or more electronic data streams, unstructured patient data;
(ii) process, via the one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping;
(iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via the one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records;
(iv) cause, via the one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and
(v) receive, via the one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
20. A computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause a computer to:
(i) receive, via one or more electronic data streams, unstructured patient data;
(ii) process, via one or more processors, the unstructured patient data to generate corresponding structured patient records by performing a schema mapping and a concept mapping;
(iii) after processing the unstructured patient data, validate the structured patient records generated by the schema mapping and the concept mapping, by performing, via one or more processors, at least one data quality test on the structured patient records to identify one or more errors or instances of incomplete information in the structured patient records;
(iv) cause, via one or more processors, an indication of the identified errors or instances of incomplete information to be displayed on a display device accessible by a user; and
(v) receive, via one or more processors, a revision to the unstructured patient data from the user via a computing device accessible to the user.
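The loop recited in claims 1 and 2 (transform, validate, surface errors to a user, accept a revision, repeat until an accuracy threshold is met) can be sketched as follows. The threshold value, helper names, and stand-in transform/test logic are all assumptions for illustration only.

```python
# Hedged sketch of the claim 1 / claim 2 feedback loop. The transform stands
# in for schema mapping + concept mapping; the quality test stands in for the
# at least one data quality test. Neither implements the claimed system.

ACCURACY_THRESHOLD = 0.9  # assumed threshold for accuracy (claim 2)

def transform(unstructured: str) -> dict:
    """Placeholder for schema mapping and concept mapping (step ii)."""
    return {"diagnosis": "lung cancer"} if "lung cancer" in unstructured else {}

def quality_test(record: dict) -> float:
    """Placeholder data quality test returning a fractional accuracy (step iii)."""
    return 1.0 if record.get("diagnosis") else 0.0

def structure_with_feedback(unstructured: str, revisions: list[str]) -> dict:
    """Repeat steps (ii)-(v) until the record meets the accuracy threshold
    or the user supplies no further revisions."""
    record = transform(unstructured)
    while quality_test(record) < ACCURACY_THRESHOLD and revisions:
        unstructured = revisions.pop(0)  # user-supplied revision (step v)
        record = transform(unstructured)
    return record
```

Here a record that fails validation is re-derived from the user's revised source text, which mirrors the claimed cycle of displaying errors and receiving revisions rather than editing the structured record directly.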
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/222,324 US20230360752A1 (en) | 2018-12-31 | 2023-07-14 | Transforming unstructured patient data streams using schema mapping and concept mapping with quality testing and user feedback mechanisms |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862787249P | 2018-12-31 | 2018-12-31 | |
US16/732,210 US11742064B2 (en) | 2018-12-31 | 2019-12-31 | Automated quality assurance testing of structured clinical data |
US18/222,324 US20230360752A1 (en) | 2018-12-31 | 2023-07-14 | Transforming unstructured patient data streams using schema mapping and concept mapping with quality testing and user feedback mechanisms |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/732,210 Continuation US11742064B2 (en) | 2018-12-31 | 2019-12-31 | Automated quality assurance testing of structured clinical data |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230360752A1 true US20230360752A1 (en) | 2023-11-09 |
Family
ID=71407241
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/732,210 Active 2040-06-20 US11742064B2 (en) | 2018-12-31 | 2019-12-31 | Automated quality assurance testing of structured clinical data |
US18/222,324 Pending US20230360752A1 (en) | 2018-12-31 | 2023-07-14 | Transforming unstructured patient data streams using schema mapping and concept mapping with quality testing and user feedback mechanisms |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/732,210 Active 2040-06-20 US11742064B2 (en) | 2018-12-31 | 2019-12-31 | Automated quality assurance testing of structured clinical data |
Country Status (2)
Country | Link |
---|---|
US (2) | US11742064B2 (en) |
WO (1) | WO2020142558A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11693836B2 (en) * | 2019-01-14 | 2023-07-04 | Visa International Service Association | System, method, and computer program product for monitoring and improving data quality |
US11615868B2 (en) * | 2019-08-23 | 2023-03-28 | Omnicomm Systems, Inc. | Systems and methods for automated edit check generation in clinical trial datasets |
EP4026047A1 (en) * | 2019-09-06 | 2022-07-13 | F. Hoffmann-La Roche AG | Automated information extraction and enrichment in pathology report using natural language processing |
US11942226B2 (en) * | 2019-10-22 | 2024-03-26 | International Business Machines Corporation | Providing clinical practical guidelines |
US11741404B2 (en) * | 2019-11-05 | 2023-08-29 | Mckesson Corporation | Methods and systems for user interface interaction |
US11604785B2 (en) * | 2021-03-26 | 2023-03-14 | Jpmorgan Chase Bank, N.A. | System and method for implementing a data quality check module |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2001265205A1 (en) * | 2000-05-31 | 2001-12-11 | Fasttrack Systems, Inc. | Clinical trials management system and method |
US7913159B2 (en) | 2003-03-28 | 2011-03-22 | Microsoft Corporation | System and method for real-time validation of structured data files |
US20050234740A1 (en) * | 2003-06-25 | 2005-10-20 | Sriram Krishnan | Business methods and systems for providing healthcare management and decision support services using structured clinical information extracted from healthcare provider data |
US7532942B2 (en) * | 2006-01-30 | 2009-05-12 | Bruce Reiner | Method and apparatus for generating a technologist quality assurance scorecard |
US20110276346A1 (en) * | 2008-11-03 | 2011-11-10 | Bruce Reiner | Automated method for medical quality assurance |
CN101714191A (en) * | 2009-11-13 | 2010-05-26 | 无锡曼荼罗软件有限公司 | Quality control method and device for electronic medical records |
US11024406B2 (en) * | 2013-03-12 | 2021-06-01 | Nuance Communications, Inc. | Systems and methods for identifying errors and/or critical results in medical reports |
US10318881B2 (en) * | 2013-06-28 | 2019-06-11 | D-Wave Systems Inc. | Systems and methods for quantum processing of data |
WO2015134668A1 (en) * | 2014-03-04 | 2015-09-11 | The Regents Of The University Of California | Automated quality control of diagnostic radiology |
DE102015009187B3 (en) * | 2015-07-16 | 2016-10-13 | Dimo Dietrich | Method for determining a mutation in genomic DNA, use of the method and kit for carrying out the method |
US20170185720A1 (en) * | 2015-12-24 | 2017-06-29 | Yougene Corp. | Curated genetic database for in silico testing, licensing and payment |
US20200185069A1 (en) * | 2017-06-07 | 2020-06-11 | 3M Innovative Properties Company | Medical coding quality control |
US10489502B2 (en) * | 2017-06-30 | 2019-11-26 | Accenture Global Solutions Limited | Document processing |
US20190156947A1 (en) * | 2017-11-22 | 2019-05-23 | Vital Images, Inc. | Automated information collection and evaluation of clinical data |
AU2019229273B2 (en) * | 2018-02-27 | 2023-04-27 | Cornell University | Ultra-sensitive detection of circulating tumor DNA through genome-wide integration |
US10395772B1 (en) | 2018-10-17 | 2019-08-27 | Tempus Labs | Mobile supplementation, extraction, and analysis of health records |
2019
- 2019-12-31 WO PCT/US2019/069156 patent/WO2020142558A1/en active Application Filing
- 2019-12-31 US US16/732,210 patent/US11742064B2/en active Active

2023
- 2023-07-14 US US18/222,324 patent/US20230360752A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2020142558A1 (en) | 2020-07-09 |
US20200279623A1 (en) | 2020-09-03 |
US11742064B2 (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230360752A1 (en) | Transforming unstructured patient data streams using schema mapping and concept mapping with quality testing and user feedback mechanisms | |
Jiang et al. | Health system-scale language models are all-purpose prediction engines | |
Johnson et al. | The MIMIC Code Repository: enabling reproducibility in critical care research | |
US20200381087A1 (en) | Systems and methods of clinical trial evaluation | |
US20200411199A1 (en) | Platforms for conducting virtual trials | |
US11373739B2 (en) | Systems and methods for interrogating clinical documents for characteristic data | |
Verma et al. | Assessing the quality of clinical and administrative data extracted from hospitals: the General Medicine Inpatient Initiative (GEMINI) experience | |
US7917525B2 (en) | Analyzing administrative healthcare claims data and other data sources | |
US20060173663A1 (en) | Methods, system, and computer program products for developing and using predictive models for predicting a plurality of medical outcomes, for evaluating intervention strategies, and for simultaneously validating biomarker causality | |
Callahan et al. | A comparison of data quality assessment checks in six data sharing networks | |
Bettencourt-Silva et al. | Building data-driven pathways from routinely collected hospital data: a case study on prostate cancer | |
Liu et al. | Data completeness in healthcare: a literature survey | |
US20210343420A1 (en) | Systems and methods for providing accurate patient data corresponding with progression milestones for providing treatment options and outcome tracking | |
Gálvez et al. | Visual analytical tool for evaluation of 10-year perioperative transfusion practice at a children's hospital | |
Wang et al. | Bayesian hierarchical latent class models for estimating diagnostic accuracy | |
Zhang et al. | Moving towards vertically integrated artificial intelligence development | |
Diaz-Garelli et al. | DataGauge: a practical process for systematically designing and implementing quality assessments of repurposed clinical data | |
Dentler et al. | Formalization and computation of quality measures based on electronic medical records | |
Naik et al. | Assessment of the Nursing Quality Indicators for Reporting and Evaluation (NQuIRE) database using a data quality index | |
Devine et al. | Preparing electronic clinical data for quality improvement and comparative effectiveness research: the SCOAP CERTAIN automation and validation project | |
Kondylakis et al. | Status and recommendations of technological and data-driven innovations in cancer care: Focus group study | |
Wang et al. | Development of a novel imaging informatics-based system with an intelligent workflow engine (IWEIS) to support imaging-based clinical trials | |
US11243972B1 (en) | Data validation system | |
Fox et al. | Developing an expert panel process to refine health outcome definitions in observational data | |
Castellanos et al. | Raising the bar for real-world data in oncology: Approaches to quality across multiple dimensions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
AS | Assignment |
Owner name: TEMPUS AI, INC., ILLINOIS Free format text: CHANGE OF NAME;ASSIGNOR:TEMPUS LABS, INC.;REEL/FRAME:066544/0110 Effective date: 20231207 |