AU2021102318A4

AU2021102318A4 - System for Improving Prediction Accuracy of Healthcare Ontology

Info

Publication number: AU2021102318A4
Application number: AU2021102318A
Authority: AU
Inventors: Devaraju B. M.; Priyanka Chandrashekhar Hiremath; Monika P.; Piyush Kumar Pareek; G. T. Raju; Venkatesan Selvam; H. R. Shashidhara
Original assignee: Hiremath Priyanka Chandrashekhar Mrs; P Monika Dr; Raju G T Dr; Selvam Venkatesan Dr
Current assignee: Hiremath Priyanka Chandrashekhar Mrs; P Monika Dr; Raju G T Dr; Selvam Venkatesan Dr
Priority date: 2021-03-19
Filing date: 2021-05-02
Publication date: 2021-06-17
Anticipated expiration: 2029-05-02

Abstract

TITLE OF INVENTION: System for Improving Prediction Accuracy of Healthcare Ontology FIELD OF INVENTION: COMPUTER SCIENCE ABSTRACT There is considerable concern about the lack of data in this digital age because the web is packed with a great deal of data in different formats and sizes. Healthcare data is used in online and other databases on a regular basis. The Semantic Web intelligently parses its ontological details. Machine learning and big data analysis make use of tree-based algorithms such as ID3 and C4.5. If the dataset quality is relatively low, mis-pruning of trees can decrease the prediction accuracy considerably. The Effective Model Integration Algorithm (EMIA) is proposed as a solution based on the principles of the Hidden Markov Model (HMM) and the Stochastic Automata Model (SAM). The proposed application incorporates the rules of diagnosed models, selected on the basis of thresholds, into an integrated healthcare ontology, resulting in a prediction accuracy of 86 percent.

Description

TITLE OF INVENTION: System for Improving Prediction Accuracy of Healthcare Ontology FIELD OF INVENTION: COMPUTER SCIENCE

BACKGROUND OF STUDY

[001] The Internet holds knowledge on a gigantic scale. The Semantic Web deals with information about, or the meaning of, data present on the web. Medicine is one of the major domains that produce data in large quantities every minute of day-to-day life. However, the data created cannot be used effectively, even within the same domain, if it is not connected to other data present on the same web.

[002] The data available on the Semantic Web is already vast and complex. It contains many patterns, missing values, varied labels and other irregularities. Therefore, it is not directly suitable for automated information exploration by computer. The data featured in the proposed graphical technique contributes towards better decision-making and increases the prediction accuracy.

PRIOR ART OF WORK

[003] ("US7899764B2") : A medical ontology may be used for computer assisted clinical decision support. Multi-level and/or semantically grouped medical ontology is incorporated into a machine learning algorithm. The resulting machine-learnt algorithm outputs information to assist in clinical decisions. For example, a patient record is input to the algorithm. Based on the incorporated medical ontology, similarities are aggregated in different groups. An aggregate similarity of at least one group is a function of an aggregate similarity of another group. One or more similar patients and/or outcomes are identified based on similarity. Probability based outputs may be provided.

[004] ("US20070178501A"): The system described herein enables clinicians and researchers to use aggregated genetic and phenotypic data from clinical trials and medical records to make the safest, most effective treatment decisions for each patient. This involves (i) the creation of a standardized ontology for genetic, phenotypic, clinical, pharmacokinetic, pharmacodynamic and other data sets, (ii) the creation of a translation engine to integrate heterogeneous data sets into a database using the standardized ontology, and (iii) the development of statistical methods to perform data validation and outcome prediction with the integrated data. The system is designed to interface with patient electronic medical records (EMRs) in hospitals and laboratories to extract a particular patient's relevant data. The system may be used in the context of generating phenotypic predictions and enhanced medical laboratory reports for treating clinicians. The system may also be used in the context of leveraging the huge amount of data created in medical and pharmaceutical clinical trials. The ontology and validation rules are designed to be flexible so as to accommodate a disparate set of clients. The system is designed to be flexible so that it can change to accommodate scientific progress and remain optimally configured.

[005] ("WO2006130162A2"): An information system using a healthcare ontology to provide a standardized representation for healthcare data is disclosed. One embodiment of the information system comprises a digital logic platform for storing and using the healthcare ontology. The healthcare ontology describes concepts and relationships between the contents derived from the corpus of domain specific knowledge and linking with standardized terminological systems.

[006] ("US8024128B2"): The information management system disclosed enables caregivers to make better decisions by using aggregated data. The system enables the integration, validation and analysis of genetic, phenotypic and clinical data from multiple subjects. A standardized data model stores a range of patient data in standardized data classes comprising patient profile, genetic, symptomatic, treatment and diagnostic information. Data is converted into standardized data classes using a data parser specifically tailored to the source system. Relationships exist between standardized data classes, based on expert rules and statistical models, and are used to validate new data and predict phenotypic outcomes. The prediction may comprise a clinical outcome in response to a proposed intervention. The statistical models and methods for training those models may be input according to a standardized template. Methods are described for selecting, creating and training the statistical models to operate on genetic, phenotypic, clinical and undetermined data sets.

[007] ("US20120095300A1"): Methods, systems, and computer storage media are provided for predicting a probability of acute deterioration for a specific patient. Various discrete measurements are taken regarding the patient's current health. Those measurements are used to determine a PPOD score, which is displayed for clinicians.

SUMMARY OF THE INVENTION

[008] The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the present invention. It is not intended to identify the key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concept of the invention in a simplified form as a prelude to a more detailed description of the invention presented later.

Work is in progress to extend the proposal so as to enhance prediction accuracy across ontologies by connecting the semantic data at the schema level, for efficient utilization of the available datasets.

DETAILED DESCRIPTION OF THE INVENTION

[009] The following description is of exemplary embodiments only and is not intended to limit the scope, applicability or configuration of the invention in any way. Rather, the following description provides a convenient illustration for implementing exemplary embodiments of the invention. Various changes to the described embodiments may be made in the function and arrangement of the elements described without departing from the scope of the invention.

Algorithm: Effective Model Integration Algorithm (EMIA) to improve prediction accuracy

Input: Training dataset (T), Number of sets (M), Number of Models in each set (N)
Output: Enhanced Model based on C4.5 classification rules

1. Let DS be the dataset created from T of varied size, where |DS| = M
2. Let A be a vector representing prediction accuracy
3. Let BSM be a list with 3 data fields representing Binary Similarity Measure
4. Let TN represent threshold of all models in a particular set of M
5. ∀ m ∈ M, |N| = C /* C is a decimal constant > 5 for optimal solution */
6. for i = 1 to |M| do
7.     for j = 1 to |N| do
8.         Using DSi, execute C4.5 algorithm
9.         Document rules of model(j)
10.        Document prediction accuracy of model(j) in A
11.    end for
12. end for
13. ∀ N ∈ M, compute threshold (TN) based on A grouped on M
14. for i = 1 to |M| do
15.    for j = 1 to |N| do
16.        if A(N) ≥ TN then
17.            Mark model n eligible for integration /* where n ∈ N */
18.        end if
19.    end for
20. end for
21. for i = 1 to |M| do
22.    ∀ (a, b) ∈ N, /* N gets updated with integrated models every iteration */
23.    if models (a and b) are marked as eligible for integration then
24.        Compute Binary Similarity Measure of a and b and store in BSM
25.    Compute threshold (TN) based on BSM scores from recent construct
26.    ∀ (a, b) ∈ N present in BSM /* BSM from recent construct */
27.    if BSM(a, b) > TN then /* TN from recent construct */
28.        Integrate the models a & b and rename as Integrated model
29. end for
30. Repeat the preceding for loop until no more intra models are eligible for integration
31. Consider M itself as N as result of previous step
32. Repeat the preceding for loop until no more inter models are eligible for integration
33. Finally, from the resultant integrated models, choose the model with the highest prediction accuracy by verifying on C4.5 results as the final Enhanced Model
34. Return the Enhanced Model
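By way of illustration only, steps 13 to 20 of the algorithm (computing a threshold per set and marking models as eligible for integration) may be sketched in Python as follows. The specification does not state how the threshold is derived from the documented accuracies, so the sketch assumes the mean accuracy of each set. The function name mark_eligible, the use of a "meets or exceeds" comparison, and the sample accuracies (whole percentages) are likewise hypothetical and are not part of the disclosure.

```python
from statistics import mean


def mark_eligible(accuracies_by_set):
    """Return, for each set, the indices of models eligible for integration.

    accuracies_by_set maps a set identifier to the list of prediction
    accuracies documented for the models in that set (vector A, grouped on M).

    ASSUMPTION: the threshold TN of a set is the mean accuracy of its models.
    The source only says the threshold is computed "based on A".
    """
    eligible = {}
    for set_id, accuracies in accuracies_by_set.items():
        threshold = mean(accuracies)  # assumed threshold rule
        # A model is marked eligible when its accuracy meets the threshold.
        eligible[set_id] = [
            j for j, accuracy in enumerate(accuracies) if accuracy >= threshold
        ]
    return eligible


# Hypothetical accuracies, in whole percent, for two sets of models.
A = {"set1": [70, 80, 90], "set2": [60, 65, 85, 50]}
print(mark_eligible(A))
```

Only the models marked here would be carried forward to the similarity computation of steps 21 to 29; a different threshold rule could be substituted without changing the rest of the flow.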

[010] The proposed idea starts with m sets, each set having n models, equivalent to the states of an HMM in SAM form. Instead of the probability functions of an HMM, the suggested approach uses threshold values to decide on the integration of models that are found to be similar, as established by computing binary similarity scores between the models, for model generalization. To begin with, the accuracies of all models in all the sets are documented at Level-1. The intra models are selected for integration at further levels based on the following steps, for every model a belonging to n in each set m:
• At Level-1: calculate the accuracy and apply the threshold to select model a for further levels.
• From Level-2 to Level-p: using the Asymmetric Binary Similarity measure method, the concept of clustering is applied to determine the pairs of rules to be integrated, and the process continues until the last level to obtain the final set of rules, so as to attain better decision accuracy.
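The similarity-driven integration described above may be illustrated, under stated assumptions, by the following Python sketch. It assumes that each model is represented as a set of rule strings, that the asymmetric binary similarity is the Jaccard coefficient (shared rules divided by rules present in either model, ignoring rules absent from both), and that integrating two models means taking the union of their rules. None of these representation choices, nor the names asymmetric_binary_similarity and integrate_similar, is stated in the specification.

```python
from itertools import combinations


def asymmetric_binary_similarity(rules_a, rules_b):
    """Jaccard coefficient: matches on present rules only (negative matches ignored)."""
    a, b = set(rules_a), set(rules_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def integrate_similar(models, threshold):
    """Repeatedly merge the most similar pair whose score exceeds the threshold.

    ASSUMPTION: integration is the union of the two rule sets. The loop stops
    when no remaining pair is similar enough, mirroring "repeat until no more
    models are eligible for integration".
    """
    models = [frozenset(m) for m in models]
    while True:
        best = None
        for i, j in combinations(range(len(models)), 2):
            score = asymmetric_binary_similarity(models[i], models[j])
            if score > threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return models
        _, i, j = best
        merged = models[i] | models[j]
        # Replace the two merged models with the integrated model.
        models = [m for k, m in enumerate(models) if k not in (i, j)] + [merged]


# Hypothetical rule identifiers for three models.
models = [{"r1", "r2", "r3"}, {"r2", "r3", "r4"}, {"r9"}]
print(integrate_similar(models, threshold=0.4))
```

In this toy input the first two models share two of four distinct rules (score 0.5) and are merged, while the third shares none and is left as it is.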

[011] The enhanced rules obtained using EMIA can be utilized efficiently by following a sequence of steps: generating the enhanced rules; building the ontology by applying the rules defined in a .pie file using a tool such as GraphDB, or by using the Cellfie tool available in Protégé; and querying the constructed knowledge graph with SPARQL queries for efficient knowledge retrieval. Observations indicate that the ontology built using the stated methodology yields up to 10% more derived instances than those obtained with well-known decision algorithms, resulting in a comparatively good prediction rate.
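The idea of applying a decision rule to derive new instances and then querying the result can be illustrated with a small, self-contained Python sketch. It does not use GraphDB, Cellfie or SPARQL; it only mimics, with plain tuples, how a threshold rule adds derived triples to a graph and how a single triple pattern is matched. The vocabulary (ex:hasGlucose, ex:hasDiagnosis, ex:Diabetes), the cutoff value and the patient data are hypothetical placeholders with no clinical meaning, and the function names are not taken from the specification.

```python
def apply_rule(triples, attribute, cutoff, diagnosis):
    """Add a derived diagnosis triple for every subject whose attribute exceeds cutoff.

    This stands in for a single enhanced decision rule of the form
    "if attribute > cutoff then diagnosis" (hypothetical rule shape).
    """
    derived = {
        (s, "ex:hasDiagnosis", diagnosis)
        for (s, p, o) in triples
        if p == attribute and isinstance(o, (int, float)) and o > cutoff
    }
    return set(triples) | derived


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard (like a query variable)."""
    hits = [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    # Sort on a string key so mixed object types never get compared directly.
    return sorted(hits, key=lambda t: (t[0], t[1], str(t[2])))


# Hypothetical asserted facts.
graph = {
    ("ex:Patient1", "ex:hasGlucose", 180),
    ("ex:Patient2", "ex:hasGlucose", 95),
}
graph = apply_rule(graph, "ex:hasGlucose", 140, "ex:Diabetes")
print(match(graph, p="ex:hasDiagnosis"))
```

In an actual deployment the rule would be expressed in the rule language of the chosen tool and the pattern would be written as a SPARQL query; the sketch only shows the relationship between rules, derived instances and retrieval.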

[012] The Semantic Web, instead of working on existing data files, works on the concepts represented in the documents. Despite the advancements in artificial intelligence and machine learning, current algorithms are not able to predict accurately without rule pruning. To address these disadvantages, an Effective Model Integration Algorithm (EMIA) has been proposed, based on the principles of the Hidden Markov Model (HMM) and the Stochastic Automata Model (SAM). Thresholds are determined by accuracy ratings and Binary Similarity Tests. Different models from different sets are combined until the baseline hypothesis is supported.

The experimental results show that LUCC is better than current techniques. The proposed model performed well, with an overall accuracy of 86%. Hence, the proposed solution is expected to perform better regardless of the domain in which it is applied.

Claims

TITLE OF INVENTION: System for Improving Prediction Accuracy of Healthcare Ontology

We claim,

1. A system for improving prediction accuracy of a healthcare ontology, wherein the system generates improved decision rules by integrating the rules of chosen models, selected on the basis of a threshold, into a healthcare ontology.