WO2022019745A1 - System and method for facilitating consolidation and analysis of time-based instances of data - Google Patents
System and method for facilitating consolidation and analysis of time-based instances of data
- Publication number
- WO2022019745A1 (PCT/MY2020/050171)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- time
- processors
- requirements
- consolidation
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- The present invention generally relates to a computer-implemented system and method that process data for business intelligence related analytics. Particularly, the present invention relates to a system and a method that facilitate consolidation of time-based instances of data from disparate data sources for fixed format reporting and ad-hoc analysis.
- Modern organizations produce huge amounts of data, generated by various computer application-based modules or sub-applications, stored in disparate data sources and categorized using different variables and rules in the respective data sources.
- The healthcare industry produces huge amounts of registration and patient data covering disparate services such as inpatient, daycare, outpatient, clinical support, family health, oral health and oncology.
- The data from each of these services is saved in discrete database tables, wherein each table may have different column names, values and rules for capturing and storing the data.
- This data may also be stored in different database types, such as relational or non-relational databases.
- A system and method for facilitating consolidation and analysis of time-based instances of data from disparate data sources is disclosed.
- The system provides a self-service approach that enables users to interact with their data directly.
- The system facilitates consolidation of time-based instances of data for a designated application, wherein the application's underlying data has different and multiple cut-off dates, variables and business rules.
- The system consolidates the data to facilitate self-service fixed format data reporting and ad-hoc analysis.
- The system includes one or more processors, input/output (I/O) interface(s), and one or more data storage devices or memory operatively coupled to the one or more processors, wherein the one or more processors are configured by the instructions to: connect a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receive, at a user interface, requirements including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detect one or more keywords from the received requirements; compose a script for consolidating data using the detected requirements; and schedule consolidation of time-based instances of data from the source database to the target database using the composed script.
- I/O: input/output
- A method for consolidating time-based instances of data from disparate data sources is performed by one or more processors executing instructions for one or more modules stored in a memory, the method comprising the following steps: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements, at a user interface generated via the one or more processors, including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
- One or more non-transitory machine-readable information storage media store instructions which, when executed by one or more processors, cause the one or more processors to execute a method comprising: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
- FIGURE 1 illustrates components for a system for facilitating consolidation and analysis of time-based instances of data from disparate data sources, in accordance with an embodiment of the present invention.
- FIGURE 2 illustrates a functional process flow showing the steps for facilitating consolidation and analysis of time-based instances of data from disparate data sources, in accordance with an embodiment of the present invention.
- FIGURE 3 shows sample metadata being generated for consolidating time-based instances of data between the source database and the target database in accordance with the present invention.
- FIGURE 4 is an exemplary stack area chart providing visualization of the time-based instance of data being consolidated by a composer module of the present invention.
- The term ‘consolidation’ herein refers to collecting and integrating data from multiple and different types of data sources into a single database.
- The present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit,” “module,” or “system.”
- FIGURE 1 illustrates an exemplary architecture in which or with which the proposed system (100) for facilitating consolidation and analysis of time-based instances of data from disparate data sources can be implemented, in accordance with an embodiment of the present invention.
- The system (100) disclosed provides a self-service approach that enables users to interact with their data directly and generate business intelligence related analytics such as fixed format reports and ad-hoc analysis.
- The system (100) enables users to consolidate and load time-based instances of data, such as monthly or yearly data, from various data sources (110) into a target database (112).
- FIGURE 2 depicts example functional units, or components, and the functional process flow of the compiler (118), composer (120) and scheduler (122) modules and the connectors that fetch data from the relevant disparate data sources (110), which have different and multiple cut-off dates, cut-off variables and business rules, and consolidate the data in the target database (112) to facilitate self-service fixed format data reporting and ad-hoc analysis.
- The system (100) includes one or more processors (104), input/output (I/O) interface(s) (106), and one or more data storage devices or memory (102) operatively coupled to the one or more processors (104).
- The one or more processors (104) may be one or more software processing modules and/or hardware processors.
- The memory (102) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
- A plurality of modules (108) can be stored in the memory (102), wherein the modules (108) may comprise a connector module (116), a compiler module (118), a composer module (120) and a scheduler module (122) which, when executed by the one or more processors (104), are configured to consolidate and load time-based instances of data from various data sources which have different and multiple dates, variables and business rules.
- FIGURE 1 represents the major components of the system (100) for facilitating consolidation and analysis of time-based instances of data from disparate data sources (110), but these components may be combined or divided depending on the particular design without limiting the scope of the present disclosure.
- The working of the system (100) is described in conjunction with FIGURES 1 and 2, explained below.
- Data generated from various computer application-based modules hosted by an organization, stored in disparate data sources (110) and categorized using different variables and rules in the respective data sources (110), is copied to one or more tables of a source database (111) by the one or more processors (104).
- The source database (111) is pre-loaded with data from the disparate sources (110) of the computer application-based modules in one or more tables of the source database.
- The source database (111) tables include the metadata necessary to consolidate data from the disparate data sources (110).
- The system (100) includes a connector module (116) that facilitates configuring a connection between the source database (111) and the target database (112).
- The source database (111) and the target database (112) may be of the same database type.
- The source database (111) and the target database (112) may be of different database types, operating according to different database protocols and/or query languages, and may even have different structures.
- The time-instance based consolidation process is initiated on receiving, at a user interface (114), the requirements including the time-based instance value and range and the metadata required for fetching the time-based instances of data.
- The time-based instance value is selected from the group comprising daily, weekly, monthly, quarterly and yearly.
- The time-based instance range is a date or a date range.
- The metadata includes one or more variables representing column name(s), cut-off date(s) and the type of data that will be stored in one or more tables or documents hosted in the source database (111) and the target database (112).
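As one concrete illustration, the requirements received at the user interface could take a shape like the following; the field names and sample values here are assumptions made for this sketch, not the patent's actual schema:

```python
# Illustrative shape of the requirements captured at the user interface (114).
# Every field name and sample value is invented for this sketch.
requirements = {
    "time_instance_value": "monthly",   # daily, weekly, monthly, quarterly or yearly
    "time_instance_range": ("2020-01-01", "2020-12-31"),
    "metadata": [
        {"column": "admission_date_key", "cutoff_date": "22", "dtype": "date"},
        {"column": "patient_id", "cutoff_date": None, "dtype": "int"},
    ],
}

# The instance value must be one of the periods named in the text above.
allowed = {"daily", "weekly", "monthly", "quarterly", "yearly"}
assert requirements["time_instance_value"] in allowed
```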
- The compiler module (118) detects one or more keywords from the requirements using probability analysis and data clustering techniques.
- The compiler module (118) analyzes the received requirements using probability analysis and then classifies the one or more detected keywords using data clustering techniques based on knowledgebase repositories.
- The compiler module (118) collects and stores the detected details in the target database (112), wherein the detected keywords from the requirements determine the columns and the corresponding column names and data types of the target database (112).
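A toy version of that keyword-to-column classification might look like the following; the knowledgebase contents and the function are invented for illustration, standing in for the patent's unspecified clustering techniques:

```python
# Sketch: classify detected keywords against a small knowledgebase repository,
# yielding a (column_name, data_type) pair for each recognised keyword.
# The repository contents below are invented for illustration.
KNOWLEDGEBASE = {
    "admission": ("admission_date_key", "DATE"),
    "encounter": ("encounter_date_key", "DATE"),
    "patient": ("patient_id", "INTEGER"),
}

def classify(keywords):
    """Return (column_name, data_type) pairs for recognised keywords only."""
    return [KNOWLEDGEBASE[k] for k in keywords if k in KNOWLEDGEBASE]

# Unrecognised keywords are simply dropped in this sketch.
columns = classify(["admission", "patient", "unrecognised"])
```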
- The one or more processors (104) activate the composer module (120) on sensing a button click performed on a designated area of the user interface (114).
- The composer module (120) may be activated by a single input including a keyboard stroke, a touch, a gesture, a voice command or an artificial intelligence (AI) based input on the designated area of the user interface (114).
- The composer module (120) composes a script for consolidating data, using the detected keywords or requirements saved in the target database (112), with only the single input.
- The composer module (120) uses interactive classification and cohesion scoring techniques to compose the script for consolidating data using the detected keywords or requirements.
- The composer module (120) may generate an alert in the event any outlier pattern is found in the time-based instance data being consolidated. These alerts are represented in visual format on the user interface (114); in one embodiment the visual alert is a stack area chart that shows the total data being consolidated and stored in the target database (112) on executing the script. On receiving the alert for data outliers on the user interface (114), the users may change the requirements entered on the user interface (114).
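One simple form such an outlier check could take is a z-score test over the per-period row counts; the patent does not specify the outlier test, so the technique and the threshold below are purely illustrative assumptions:

```python
# Sketch of an outlier check over per-period row counts, of the kind that
# could drive the composer's visual alert. The z-score test and the 1.5
# threshold are illustrative assumptions, not the patented method.
from statistics import mean, stdev

def outlier_periods(row_counts, threshold=1.5):
    """Return indices of periods whose row count deviates strongly from the mean."""
    mu, sigma = mean(row_counts), stdev(row_counts)
    return [i for i, n in enumerate(row_counts)
            if sigma and abs(n - mu) / sigma > threshold]

counts = [1000, 1020, 980, 1010, 30]   # the last period looks suspicious
flagged = outlier_periods(counts)       # → [4]
```

A real deployment would compare against historical loads per sub-application rather than a single window, but the shape of the alert is the same.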
- The scheduler module (122) performs the task of scheduling consolidation of time-based instances of data from the source database (111) to the target database (112) using the composed script, loading data into the target database (112) through an extract-transform-load (ETL) process.
- The time-based instances of consolidated data are stored in the target database (112).
- The one or more tables or documents (depending on the type of database) of the target database (112) are overwritten with the latest time-based instances of consolidated data for the time-based instance value, such as daily, weekly, monthly, quarterly or yearly, selected by the user.
- The time-based instances of data in one or more tables or documents hosted in the target database (112) are used by business analytics or visualization tools, including Tableau, SAS and SAP BusinessObjects, for generating self-service business analytical reports or dashboards.
- The scheduler module (122) can analyze patterns of one or more scheduled consolidations of time-based instances of data to auto-schedule consolidation of time-based instances of data for one or more sub-applications of the designated application.
- A log table is maintained in the target database (112) that stores the details, parameters and status of the scheduled jobs, such as sub-application sequences, time-based instance value and range, source tables, target tables, start_datetime, end_datetime and a flag that indicates job completion.
- The scheduler module (122) analyses the pattern and sequence of the jobs using start_datetime and end_datetime, and auto-starts the designated application's next sub-application's data consolidation job upon sensing that a flag has been marked completed.
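The chaining decision described above can be sketched as follows; the sub-application sequence, the log shape and the function names are assumptions standing in for the log table and scheduler internals:

```python
# Sketch of the scheduler's chaining decision: scan a job log (standing in
# for the log table's completion flags) and pick the next sub-application
# whose consolidation has not yet completed. All names are illustrative.
SUBAPP_SEQUENCE = ["inpatient", "daycare", "outpatient"]

def next_job(job_log):
    """job_log maps sub-application name -> completion flag (True when done)."""
    for subapp in SUBAPP_SEQUENCE:
        if not job_log.get(subapp, False):
            return subapp
    return None  # every sub-application is flagged complete

# With inpatient complete, daycare is the next job to auto-start.
assert next_job({"inpatient": True}) == "daycare"
```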
- A computer-implemented method for consolidating time-based instances of data from disparate data sources is performed by one or more processors executing instructions for one or more modules stored in a memory.
- The method comprises the following steps: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements, at a user interface generated via the one or more processors, including the time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
- The step of detecting one or more keywords from the received requirements includes analyzing the received requirements using probability analysis; and classifying the one or more detected keywords using data clustering techniques.
- The step of composing a script for consolidating data using the detected requirements includes analyzing the time-based instance of data being generated using the composed script; and generating an alert in the event any outlier pattern is found in the generated time-based instances of data.
- The step of scheduling consolidation of time-based instances of data includes identifying patterns of one or more scheduled consolidations of time-based instances of data; and auto-scheduling upcoming consolidations of time-based instances of data for the designated application based on the identified patterns.
- The present invention can take the form of a computer program product accessible from machine-readable media providing programming code for use by the system (100).
- The software and/or computer program product can be hosted in the environment of FIGURE 1 to implement the teachings of the present invention.
- One or more non-transitory machine-readable information storage media store instructions which, when executed by one or more processors, cause the one or more processors to execute a method comprising: connecting a source database and a target database; loading data from disparate sources for a designated application in one or more tables of the source database; receiving requirements including the time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database for one or more sub-application modules of the designated application using the composed script.
- FIGURE 3 shows sample metadata being generated for consolidating time-based instances of data between the source database and the target database.
- FIGURE 4 shows an exemplary stack area chart for providing visualization of the time-based instance of data being consolidated by the composer module (120) of the present invention.
- The different sub-applications of MyHDW have different data sources and different columns, for instance, admission date, encounter date and the like.
- The MyHDW users set the consolidation date as the 22nd of each month, and this date is stored in the cutoff_date_list of the target database (112).
- Using the user interface (114), the user enters the following details, including but not limited to: source table names, target table names, time-instance based date, variables and status.
- The compiler module (118) interactively classifies and analyses these details and data via probability analysis.
- The compiler module (118) then performs classification or clustering of the textual data using predefined knowledgebase repositories to detect the keywords, which are then stored in a cut-off table as seen in FIGURE 3.
- Based on probability analysis, the compiler module (118) will suggest ‘admission_date_key’ as the variable to be used for the inpatient sub-application in the target database (112) for consolidation of data.
- The probability analysis is conducted by comparing ‘admission’ with the existing variable names used in one or more tables for the inpatient sub-application present in the source database (111) and finding the most relevant match for the keyword ‘admission’.
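A minimal stand-in for that comparison is closest-string scoring; `difflib` here is our assumption for the patent's unspecified probability analysis, and the column names are invented:

```python
# Sketch of matching a user keyword against existing source-table column
# names. difflib's similarity ratio stands in for the patent's unspecified
# probability analysis; the column list is invented for illustration.
import difflib

SOURCE_COLUMNS = ["admission_date_key", "discharge_date_key", "patient_id"]

def best_match(keyword, columns):
    """Return the column whose name is most similar to the keyword."""
    return max(columns,
               key=lambda c: difflib.SequenceMatcher(None, keyword, c).ratio())

match = best_match("admission", SOURCE_COLUMNS)   # → "admission_date_key"
```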
- The composer module (120) uses interactive classification and cohesion scoring techniques to generate a script for data consolidation.
- The source tables in the source database (111) have table names with the suffix “_fact”, while target tables in the target database (112) have the prefix “prelim” and the suffix “_fact”. The key for a date-based column in the target database (112) has the suffix “_date_key”.
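Those naming conventions can be expressed as small helpers; the prefix and suffix values come from the text above, while the helper functions and the underscore separator after “prelim” are illustrative assumptions:

```python
# Helpers encoding the naming conventions described above: source fact tables
# end in "_fact", target tables add a "prelim" prefix, and date-key columns
# end in "_date_key". The functions (and the "prelim_" separator) are
# illustrative assumptions.
def target_table_name(source_table):
    """Map a source fact table to its preliminary consolidated target table."""
    assert source_table.endswith("_fact"), "source tables carry the _fact suffix"
    return "prelim_" + source_table

def date_key_column(base_name):
    """Build the date-based key column name for the target database."""
    return base_name + "_date_key"

target = target_table_name("inpatient_fact")   # → "prelim_inpatient_fact"
key = date_key_column("admission")             # → "admission_date_key"
```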
- Consolidation begins by fetching data from the source database (111) based on the date_format.
- The details stored in a knowledgebase repository are used in probability analysis and classification/clustering by the one or more processors (104) to auto-create the necessary data structures in the target database (112).
- The structure comprises one or more columns in one or more tables of the target database (112) to facilitate the data consolidation.
- The composer module (120) also generates visual alerts for users if any outliers are found in the data, to enable the users to change the variables provided on the user interface.
- A sample of the visualization is seen in FIGURE 4 in the form of a stack area chart showing the expected result that will be captured in consolidated or prelim tables in the target database (112), based on the compiled keywords generated by the compiler module (118).
- The scheduler module (122) is configured to run the data extract-transform-load (ETL) process from the source database (111) to the target database (112) daily at midnight.
- The data ETL process checks the status of the scheduled jobs in the relevant cut-off table in the target database (112) to kick-start the next sub-application's data ETL process upon completion of the current process. For instance, if the scheduler module (122) is currently running the data ETL process for the inpatient sub-application of MyHDW, then upon completion of inpatient's data ETL process, the scheduler module (122) checks the status of the other sub-applications. If the completion flag is not marked, the scheduler module (122) automatically starts the data ETL process for the daycare or outpatient sub-applications, and so on until all the sub-applications are flagged as complete.
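Put together, the nightly chaining reads like the loop below; the run function and flag handling are assumptions sketching the behaviour just described, not the patented scheduler:

```python
# Sketch of the nightly chaining: run each sub-application's ETL in sequence,
# marking its completion flag (standing in for the cut-off table) before the
# next job starts. All names are illustrative assumptions.
def run_nightly(sequence, run_etl):
    flags = {}
    for subapp in sequence:
        run_etl(subapp)          # extract-transform-load for this sub-application
        flags[subapp] = True     # mark completion, enabling the next job
    return flags

executed = []
flags = run_nightly(["inpatient", "daycare", "outpatient"], executed.append)
# executed: ["inpatient", "daycare", "outpatient"]; all flags True
```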
- ETL: extract, transform, load
- An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, routines, subroutines, or subparts of a computer program product.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- General Business, Economics & Management (AREA)
- Biomedical Technology (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Stored Programmes (AREA)
Abstract
A system (100) and method for facilitating consolidation and analysis of time-based instances of data from disparate data sources (110) has been disclosed. The system (100) provides a self-service approach to enable users to interact with their data directly. The system (100) facilitates consolidation of time-based instances of data for a designated application, wherein the application's underlying data has different and multiple cut-off dates, cut-off variables and business rules. The system (100) consolidates the data in a target database (112) to facilitate self-service fixed format data reporting and ad-hoc analysis.
Description
SYSTEM AND METHOD FOR FACILITATING CONSOLIDATION AND ANALYSIS
OF TIME-BASED INSTANCES OF DATA
FIELD OF THE INVENTION
The present invention generally relates to a computer-implemented system and method that process data for business intelligence related analytics. Particularly, the present invention relates to a system and a method that facilitate consolidation of time-based instances of data from disparate data sources for fixed format reporting and ad-hoc analysis.
BACKGROUND OF THE INVENTION
Modern organizations produce huge amounts of data, generated by various computer application-based modules or sub-applications, stored in disparate data sources and categorized using different variables and rules in the respective data sources.
Organizations are increasingly realizing the importance of this data to extract insights and valuable information which will enable them to make data-driven decisions. This data also forms the source to extract business intelligence related analytics to enable organizations to leverage on their current data to make decisions.
The healthcare industry, for instance, produces huge amounts of registration and patient data covering disparate services such as inpatient, daycare, outpatient, clinical support, family health, oral health and oncology. The data from each of these services is saved in discrete database tables, wherein each table may have different column names, values and rules for capturing and storing this data. This data may also be stored in different database types, such as relational or non-relational databases.
Currently, if business intelligence related analytics need to be extracted for the organization, this extraction has to be done manually by designated database personnel using different sets of manual scripts to cater for the different requirements of each of the modules and data sources, such as different and multiple dates, variables and business rules.
Due to the above shortcomings, the management of these organizations does not have ready access to business intelligence related analytics. There is therefore a need for a self-service system that allows decision makers to interact with their data directly and generate business intelligence related analytics such as fixed format reports and ad-hoc analysis.
SUMMARY
A system and method for facilitating consolidation and analysis of time-based instances of data from disparate data sources is disclosed. The system provides a self-service approach to enable users to interact with their data directly. The system facilitates consolidation of time-based instances of data for a designated application, wherein the application's underlying data has different and multiple cut-off dates, variables and business rules. The system consolidates the data to facilitate self-service fixed format data reporting and ad-hoc analysis.
According to one aspect of the present invention there is disclosed a system for facilitating consolidation and analysis of time-based instances of data from disparate data sources. The system includes one or more processors, input/output (I/O) interface(s), and one or more data storage devices or memory operatively coupled to the one or more processors, wherein the one or more processors are configured by the instructions to: connect a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receive, at a user interface, requirements including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detect one or more keywords from the received requirements; compose a script for consolidating data using the detected requirements; and schedule consolidation of time-based instances of data from the source database to the target database using the composed script.
According to another aspect of the present invention there is provided a method for consolidating time-based instances of data from disparate data sources, the method being performed by one or more processors executing instructions for one or more modules stored in a memory, the method comprising the following steps: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements, at a user interface generated via the one or more processors, including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
In yet another aspect, there are provided one or more non-transitory machine-readable information storage media storing instructions which, when executed by one or more processors, cause the one or more processors to execute a method comprising: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements including a time-based instance value and range and the metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
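The claimed steps can be sketched end-to-end as a minimal Python pipeline; every function, table name and string format below is an illustrative assumption for the sketch, not the patented implementation:

```python
# Minimal sketch of the claimed pipeline: detect keywords from the received
# requirements, then compose a consolidation script from them. All names and
# the naive word-harvesting heuristic are illustrative assumptions.

def detect_keywords(requirements):
    """Naive keyword detection: harvest longer words from requirement text."""
    text = " ".join(str(v) for v in requirements.values())
    return [w.strip(",.").lower() for w in text.split() if len(w) > 3]

def compose_script(keywords, source_table, target_table):
    """Compose a consolidation query from the detected keywords."""
    cols = ", ".join(keywords) if keywords else "*"
    return f"INSERT INTO {target_table} SELECT {cols} FROM {source_table}"

requirements = {"instance_value": "monthly", "metadata": "admission discharge"}
script = compose_script(detect_keywords(requirements),
                        "inpatient_fact", "prelim_inpatient_fact")
```

Scheduling the composed script (the final claimed step) would then be a matter of handing `script` to whatever job scheduler drives the ETL runs.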
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The present invention will be fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only, and thus are not limitative of the present invention, wherein:
FIGURE 1 illustrates components for a system for facilitating consolidation and analysis of time-based instances of data from disparate data sources, in accordance with an embodiment of the present invention.
FIGURE 2 illustrates a functional process flow showing the steps for facilitating consolidation and analysis of time-based instances of data from disparate data sources, in accordance with an embodiment of the present invention.
FIGURE 3 shows sample metadata being generated for consolidating time-based instances of data between the source database and the target database in accordance with the present invention.
FIGURE 4 is an exemplary stack area chart for providing visualization of the time-based instance of data being consolidated by a composer module of the present invention.
DETAILED DESCRIPTION

In accordance with the present invention, there is provided a system and a method for facilitating consolidation and analysis of time-based instances of data from disparate data sources for a designated application, which will now be described with reference to the embodiment shown in the accompanying drawings. The embodiment does not limit the scope and ambit of the invention. The description relates purely to the exemplary embodiment and its suggested applications.
The embodiment herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiment in the following description. Descriptions of well-known components and processing techniques are omitted so as not to unnecessarily obscure the embodiment herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiment herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the description should not be construed as limiting the scope of the embodiment herein.
The following description of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such a specific embodiment for various applications without departing from the generic concept; therefore, such adaptations and modifications should be, and are intended to be, comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation.
The term ‘consolidation’ herein refers to collecting and integrating data from multiple and different types of data sources into a single database.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a “unit,” “module,” or “system.”
Referring to the accompanying drawings, FIGURE 1 illustrates an exemplary architecture in which, or with which, the proposed system (100) for facilitating consolidation and analysis of time-based instances of data from disparate data sources can be implemented, in accordance with an embodiment of the present invention. The system (100) disclosed provides a self-service approach enabling users to interact with their data directly and generate business-intelligence-related analytics such as fixed-format reports and ad-hoc analysis. Particularly, the system (100) enables users to consolidate and load time-based instances of data, such as monthly or yearly data, from various data sources (110) into a target database (112).
FIGURE 2 depicts example functional units, or components, and the functional process flow of the compiler (118), composer (120) and scheduler (122) modules and the connectors that fetch data from relevant disparate data sources (110), which have different and multiple cut-off dates, cut-off variables and business rules, and consolidate the data in the target database (112) to facilitate self-service fixed-format data reporting and ad-hoc analysis.
According to an embodiment, the system (100) includes one or more processors (104), input/output (I/O) interface(s) (106), and one or more data storage devices or memory (102) operatively coupled to the one or more processors (104). The one or more processors (104) may be one or more software processing modules and/or hardware processors. The memory (102) may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
In an embodiment, a plurality of modules (108) can be stored in the memory (102), wherein the modules (108) may comprise a connector module (116), a compiler module (118), a composer module (120) and a scheduler module (122) which, when executed by the one or more processors (104), are configured to consolidate and load time-based instances of data from various data sources which have different and multiple dates, variables and business rules.
Those skilled in the art would appreciate that, although the system (100) includes a number of distinct units and modules, as illustrated in FIGURE 1, it should be recognized that some units and modules may be combined, and/or some functions may be performed by one or more units or modules. Therefore, the embodiment of FIGURE 1 represents the major components of the system (100) for facilitating consolidation and analysis of time-based instances of data from disparate data sources (110), but these components may be combined or divided depending on the particular design without limiting the scope of the present disclosure.
The working of the system (100) is described in conjunction with FIGURES 1 and 2 explained below.
According to the invention, data which is generated from various computer application-based modules hosted by an organization, stored in disparate data sources (110) and categorized using different variables and rules in the respective data sources (110), is copied to one or more tables of a source database (111) by the one or more processors (104). In one embodiment, the source database (111) is pre-loaded with data from the disparate sources (110) of the computer application-based modules in one or more tables of the source database. The source database (111) tables include the necessary metadata to consolidate data from the disparate data sources (110).
In accordance with one aspect of the present disclosure, the system (100) includes a connector module (116) which facilitates configuring a connection between the source database (111) and the target database (112). According to the invention, the source database (111) and the target database (112) may be of the same database type. Alternatively, they may be of different database types operating according to different database protocols and/or query languages, and/or even having different structures.
The time-instance-based consolidation process is initiated on receiving, at a user interface (114), the requirements including a time-based instance value and range and the metadata required for fetching the time-based instances of data. According to one embodiment of the present invention, the time-based instance value is selected from the group comprising daily, weekly, monthly, quarterly and yearly, and the time-based instance range is a date or a date range. The metadata includes one or more variables representing column name(s), cut-off date(s) and the type of data that will be stored in one or more tables or documents hosted in the source database (111) and the target database (112).
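The requirements received at the user interface may be represented, purely as an illustrative sketch (the patent does not prescribe a concrete schema, so every field name below is an assumption), for example in Python:

```python
from dataclasses import dataclass, field

# Hypothetical shape of the "requirements" captured at the user interface:
# a time-based instance value, an instance range, and fetching metadata.
@dataclass
class ConsolidationRequirements:
    instance_value: str           # one of: daily, weekly, monthly, quarterly, yearly
    instance_range: tuple         # a date or a (start_date, end_date) range
    metadata: dict = field(default_factory=dict)  # column names, cut-off dates, data types

req = ConsolidationRequirements(
    instance_value="monthly",
    instance_range=("2020-01-01", "2020-12-31"),
    metadata={"column": "admission_date_key",
              "cutoff_date": "2020-07-22",
              "dtype": "int"},
)
assert req.instance_value in {"daily", "weekly", "monthly", "quarterly", "yearly"}
```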
The compiler module (118) detects one or more keywords from the requirements using probability analysis and data clustering techniques. The compiler module (118) analyzes the received requirements using probability analysis followed by classifying
the one or more detected keywords using data clustering techniques based on knowledgebase repositories. The compiler module (118) collects and stores the detected details in the target database (112), wherein the detected keywords from the requirements form the columns and corresponding name and data type of the target database (112).
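The compiler module's keyword detection might be sketched as follows. This is an illustrative stand-in only: the patent names probability analysis and data clustering without fixing particular algorithms, so a string-similarity ratio serves here as the probability score, and the threshold and column names are assumptions.

```python
import difflib

def detect_keywords(requirement_terms, known_columns):
    """Map each free-text requirement term to its most likely column name."""
    detected = {}
    for term in requirement_terms:
        # Score each known column against the term; the similarity ratio
        # stands in for the probability analysis described in the text.
        scored = [(difflib.SequenceMatcher(None, term, col).ratio(), col)
                  for col in known_columns]
        score, best = max(scored)
        if score > 0.4:  # threshold chosen arbitrarily for this sketch
            detected[term] = best
    return detected

cols = ["admission_date_key", "encounter_date_key", "discharge_date_key"]
print(detect_keywords(["admission"], cols))  # -> {'admission': 'admission_date_key'}
```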
The one or more processors (104) activate the composer module (120) on sensing a button click being performed on a designated area of the user interface (114). Alternatively, the composer module (120) may be activated by a single input including a keyboard stroke, a touch, a gesture, a voice or an artificial intelligence (AI) based input on the designated area of the user interface (114). The composer module (120) composes a script for consolidating data using the detected keywords or requirements saved in the target database (112) with only the single input. The composer module (120) uses interactive classification and cohesion scoring techniques to compose the script for consolidating data using the detected keywords or requirements.
In accordance with an embodiment of the present invention, the composer module (120) may generate an alert in the event any outlier pattern is found in the time-based instance data being consolidated. These alerts are represented in visual format on the user interface (114); in one embodiment the visual alert is a stack area chart which shows the total data being consolidated and stored in the target database (112) on executing the script. On receiving the alert for data outliers on the user interface (114), the users may change the requirements entered on the user interface (114).
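How an outlier pattern is detected is not specified in the text; as a hedged illustration, a simple z-score over per-instance row counts could drive such an alert. The threshold and the counts below are hypothetical.

```python
import statistics

def outlier_instances(row_counts, threshold=1.5):
    """Return indices of time-based instances whose row counts deviate strongly."""
    mean = statistics.mean(row_counts)
    stdev = statistics.pstdev(row_counts) or 1.0  # avoid division by zero
    # Flag any instance whose z-score exceeds the (arbitrary) threshold.
    return [i for i, n in enumerate(row_counts)
            if abs(n - mean) / stdev > threshold]

# Hypothetical monthly row counts; the fourth month is suspiciously low.
counts = [1020, 980, 1005, 130, 990]
print(outlier_instances(counts))  # -> [3]
```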
The scheduler module (122) performs the tasks of scheduling consolidation of time-based instances of data from the source database (111) to the target database (112) using the composed script to load data into the target database (112) through an extract, transform, load (ETL) process. The time-based instances of consolidated data are stored in the target database (112). The one or more tables or documents, depending on the type of database, of the target database (112) are overwritten with the latest time-based instances of consolidated data for the time-based instance value, such as daily, weekly, monthly, quarterly or yearly, selected by the user.
The time-based instances of data in one or more tables or documents hosted in the target database are used by business analytical or visualization tools, including Tableau, SAS and SAP BusinessObjects, for generating self-service business analytical reports or dashboards.
In another embodiment, the scheduler module (122) can analyze patterns of one or more scheduled consolidations of time-based instances of data to auto-schedule consolidation of time-based instances of data for one or more sub-applications of the designated application. A log table is maintained in the target database (112) that stores the details, parameters and status of the scheduled jobs running, such as sub-application sequences, time-based instance value and range, source tables, target tables, start_datetime, end_datetime and a flag that indicates the job completion. The scheduler module (122) analyses the pattern of the jobs running and sequence to be followed using the start_datetime and end_datetime, and auto starts the designated application’s next sub-application’s data consolidation job upon sensing that a flag has been marked completed.
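The flag-driven sequencing of sub-application jobs described above can be sketched as follows; the log-row fields and the sub-application order are illustrative assumptions, not details fixed by the text.

```python
from datetime import datetime

# Assumed sub-application sequence for the designated application.
SUBAPPS = ["inpatient", "daycare", "outpatient"]

def next_job(log_rows):
    """Return the first sub-application whose consolidation is not flagged complete."""
    done = {row["subapp"] for row in log_rows if row["flag"] == "COMPLETED"}
    for subapp in SUBAPPS:
        if subapp not in done:
            return subapp
    return None  # all sub-applications flagged complete

# Hypothetical log-table rows, mirroring the start/end datetimes and flag
# columns described above.
log = [{"subapp": "inpatient", "flag": "COMPLETED",
        "start_datetime": datetime(2020, 7, 22, 0, 0),
        "end_datetime": datetime(2020, 7, 22, 0, 45)}]
print(next_job(log))  # -> daycare
```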
According to another aspect of the present invention, there is provided a computer-implemented method for consolidating time-based instances of data from disparate data sources. The method is performed by one or more processors executing instructions for one or more modules stored in a memory. The method comprises the following steps: connecting a source database and a target database, wherein the source database includes pre-loaded data from disparate sources for a designated application in one or more tables of the source database; receiving requirements, at a user interface generated via the one or more processors, including a time-based instance value and range and metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
In an embodiment, the step of detecting one or more keywords from the received requirements includes analyzing the received requirements using probability
analysis; and classifying the one or more detected keywords using data clustering techniques.
In addition, the step of composing a script for consolidating data using the detected requirements includes analyzing the time-based instance of data being generated using the composed script; and generating an alert in the event any outlier pattern is found in generated time-based instances of data.
Furthermore, the step of scheduling consolidation of time-based instances of data includes identifying patterns of one or more scheduled consolidations of time-based instances of data; and auto-scheduling upcoming consolidation of time-based instances of data for the designated application based on the identified patterns.
In accordance with an additional aspect, the present invention can take the form of a computer program product accessible from machine-readable media providing programming code for use by the system (100). The software and/or computer program product can be hosted in the environment of FIGURE 1 to implement the teachings of the present invention. One or more non-transitory machine-readable information storage media storing instructions which, when executed by one or more processors, cause the one or more processors to execute a method comprising: connecting a source database and a target database; loading data from disparate sources for a designated application in one or more tables of the source database; receiving requirements including a time-based instance value and range and metadata required for fetching the time-based instances of data; detecting one or more keywords from the received requirements; composing a script for consolidating data using the detected requirements; and scheduling consolidation of time-based instances of data from the source database to the target database for one or more sub-application modules of the designated application using the composed script.
EXAMPLE:
The present invention will now be explained by implementing its teachings for the Malaysia Healthcare Data Warehouse (MyHDW) application, which has multiple sub-applications including inpatient, daycare, outpatient, clinical support, cancer registries, family health, oral health and the like.
FIGURE 3 shows sample metadata being generated for consolidating time-based instances of data between the source database and the target database and FIGURE 4 shows an exemplary stack area chart for providing visualization of the time-based instance of data being consolidated by the composer module (120) of the present invention.
Referring to FIGURES 3 and 4, the different sub-applications of MyHDW have different data sources and different columns, for instance, admission date, encounter date and the like. The MyHDW users set the consolidation date as the 22nd of each month, and this date is stored in cutoff_date_list of the target database (112). Further, using the user interface (114), the user enters the following details including but not limited to source table names, target table names, time-instance-based date, variables and status. The compiler module (118) interactively classifies and analyses these details and data provided, via probability analysis. The compiler module (118) then performs classification or clustering of the textual data using predefined knowledgebase repositories to detect the keywords, which are then stored in a cut-off table as seen in FIGURE 3. For instance, if the user on the user interface (114) enters the metadata ‘admission’ as a requirement for the inpatient sub-application, then the compiler module (118), based on probability analysis, will suggest ‘admission_date_key’ as the variable to be used for the inpatient sub-application in the target database (112) for consolidation of data. The probability analysis is conducted by comparing ‘admission’ with existing variable names used in one or more tables for the inpatient sub-application present in the source database (111) and finding a relevant match for the keyword “admission”.
Thereafter, on sensing a single input including but not limited to a click, a stroke, a touch, a gesture, a voice and an AI-based input on a designated area of the user interface (114), the composer module (120) uses interactive classification and cohesion scoring techniques to generate a script for data consolidation. A sample script generated by the composer module (120) is as follows:

insert into ${target_table_name}
select * from ${source_table_name}
where date(updated_datetime) = date('${cutoff_date}')
and substring(${transaction_column_name}, 5, 2)::int >= ${cutoff_month}
and substring(${transaction_column_name}, 1, 4)::int = ${cutoff_year}
and ${transaction_column_name} <= ${cutoff_date_key}
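The ${...} placeholders in the sample script above might be filled as in the following sketch; the parameter values are illustrative and not taken from the document.

```python
from string import Template

# Template mirroring the sample consolidation script; ${...} placeholders
# are substituted from the detected requirements.
SQL_TEMPLATE = Template(
    "insert into ${target_table_name} "
    "select * from ${source_table_name} "
    "where date(updated_datetime) = date('${cutoff_date}') "
    "and substring(${transaction_column_name}, 5, 2)::int >= ${cutoff_month} "
    "and substring(${transaction_column_name}, 1, 4)::int = ${cutoff_year} "
    "and ${transaction_column_name} <= ${cutoff_date_key}"
)

# Hypothetical parameter values for the inpatient sub-application.
params = {
    "target_table_name": "prelim_inpatient_fact",
    "source_table_name": "inpatient_fact",
    "transaction_column_name": "admission_date_key",
    "cutoff_date": "2020-07-22",
    "cutoff_month": "07",
    "cutoff_year": "2020",
    "cutoff_date_key": "20200722",
}
script = SQL_TEMPLATE.substitute(params)
print(script)
```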
According to an embodiment of the invention, there is a predefined standard naming convention for ETL tables for a data warehouse. For instance, as seen in FIGURE 3, in MyHDW, the source tables in the source database (111) have table names with the suffix “_fact”, while target tables in the target database (112) have the prefix “prelim” and the suffix “_fact”. The key for a date-based column in the target database (112) has the suffix “_date_key”. Consolidation begins by fetching data from the source database (111) based on the date_format. The details stored in a knowledgebase repository are used in probability analysis and classification/clustering by the one or more processors (104) to auto-create the necessary data structures in the target database (112), the structures being one or more columns in one or more tables of the target database (112) to facilitate the data consolidation.
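The naming convention described above can be captured in a short sketch; the exact joining of the “prelim” prefix with an underscore is an assumption, as are the helper function names.

```python
def target_table_name(source_table: str) -> str:
    """Derive the target (prelim) table name from a source table name."""
    # Source tables carry the "_fact" suffix per the described convention.
    assert source_table.endswith("_fact"), "source tables carry the _fact suffix"
    return "prelim_" + source_table  # underscore join is an assumption

def date_key_column(variable: str) -> str:
    """Ensure a date-based variable carries the "_date_key" suffix."""
    return variable if variable.endswith("_date_key") else variable + "_date_key"

print(target_table_name("inpatient_fact"))  # -> prelim_inpatient_fact
print(date_key_column("admission"))         # -> admission_date_key
```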
The composer module (120) also generates visual alerts for users if there are any outliers found in the data to enable the users to change the variables provided on the user interface. A sample of the visualization is seen in FIGURE 4 in the form of a stack area chart to show an expected result that will be captured in consolidated or prelim tables in the target database (112), based on the compiled keywords generated by the compiler module (118).
The scheduler module (122) is configured to run the data extract, transform, load (ETL) process from the source database (111) to the target database (112) on a daily basis at midnight. The data ETL process checks the status of the scheduled jobs from the relevant cut-off table in the target database (112) to kick-start the next sub-application's data ETL process upon completion of the current process. For instance, if the scheduler module (122) is currently running the data ETL process for the inpatient sub-application of MyHDW, then upon completion of the inpatient data ETL process, the scheduler module (122) checks the status of the other sub-applications. If the completion flag is not marked, the scheduler module (122) automatically starts the data ETL process for the daycare or outpatient sub-applications, and so on until all the sub-applications are flagged as complete.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises," "comprising," “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.
The use of the expression “at least” or “at least one” suggests the use of one or more elements, as the use may be in one of the embodiments to achieve one or more of the desired objects or results.
Although the process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously, in parallel, or concurrently.
Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to
computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, routines, subroutines, or subparts of a computer program product.
While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
Claims
1. A system (100) for facilitating consolidation of time-based instances of data from disparate data sources (110), comprising: a memory (102) for storing instructions for one or more modules (108); one or more input/output interfaces (106); and one or more processors (104) coupled to the memory (102) via the one or more interfaces (106), wherein the one or more modules (108) are selectively executed by the one or more processors (104) to facilitate consolidation of time-based instances of data, characterized in that the one or more processors (104) are configured by the instructions to:
- connect a source database (111) and a target database (112);
- receive, at a user interface (114), requirements including time-based instance value and range, and metadata required for fetching the time-based instances of data;
- detect one or more keywords from the received requirements;
- compose a script for consolidating data using the detected requirements; and
- schedule consolidation of time-based instances of data from the source database (111) to the target database (112) using the composed script.
2. The system as claimed in claim 1, wherein the time-based instance value is selected from the group comprising daily, weekly, monthly, quarterly and yearly; and the range is a date range.
3. The system as claimed in claim 1, wherein the metadata includes one or more variables representing name, cut-off date and type of data in the source database (111) and the target database (112).
4. The system as claimed in claim 1, wherein the one or more processors (104) detect one or more keywords from the received requirements using probability analysis and data clustering techniques.
5. The system as claimed in claim 1, wherein the one or more processors (104) compose a script for consolidating data using the detected requirements using interactive classification and cohesion scoring techniques.
6. The system as claimed in claim 1, wherein the one or more processors (104) compose the script in response to a single input received on a designated area of the user interface (114), wherein the input includes a click, a stroke, a touch, a gesture, a voice and an artificial intelligence (AI) based input.
7. The system as claimed in claim 1, wherein the one or more processors (104) are further configured to generate an alert in the event any outlier pattern is found in the time-based instance data being consolidated, wherein the alert is represented in visual format on the user interface (114).
8. The system as claimed in claim 1, wherein the one or more processors (104) are further configured to analyse patterns of one or more scheduled consolidations of time-based instances of data to auto-schedule consolidation of time-based instances of data for one or more sub-applications of the designated application.
9. A computer-implemented method for consolidating time-based instances of data from disparate data sources, the method performed by one or more processors executing instructions for one or more modules stored in a memory, the method comprising the following steps:
- connecting a source database and a target database;
- receiving, at a user interface generated via the one or more processors, requirements including time-based instance value and range, and metadata required for fetching the time-based instances of data;
- detecting one or more keywords from the received requirements;
- composing a script for consolidating data using the detected requirements; and
- scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
10. The computer-implemented method as claimed in claim 9, wherein the step of detecting one or more keywords from the received requirements includes: i. analyzing the received requirements using probability analysis; and ii. classifying the one or more detected keywords using data clustering techniques.
11. The computer-implemented method as claimed in claim 9, wherein the step of composing a script for consolidating data using the detected requirements includes analyzing the time-based instance of data being generated using the composed script; and generating an alert in the event any outlier pattern is found in generated time-based instance of data.
12. The computer-implemented method as claimed in claim 9, wherein the step of scheduling consolidation of time-based instances of data includes identifying patterns of one or more scheduled consolidations of time-based instances of data; and auto-scheduling consolidation of time-based instances of data for one or more sub-applications of the designated application based on the identified patterns.
13. One or more non-transitory machine-readable information storage media storing instructions for one or more modules (108) which, when executed by one or more processors, cause the one or more processors to execute a method comprising: i. connecting a source database and a target database; ii. receiving requirements including time-based instance value and range and metadata required for fetching the time-based instances of data; iii. detecting one or more keywords from the received requirements; iv. composing a script for consolidating data using the detected requirements; and v. scheduling consolidation of time-based instances of data from the source database to the target database using the composed script.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MYPI2020003850 | 2020-07-24 | ||
MYPI2020003850 | 2020-07-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022019745A1 true WO2022019745A1 (en) | 2022-01-27 |
Family
ID=79729958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/MY2020/050171 WO2022019745A1 (en) | 2020-07-24 | 2020-11-25 | System and method for facilitating consolidation and analysis of time-based instances of data |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022019745A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101013436A (en) * | 2007-01-25 | 2007-08-08 | 无敌科技(西安)有限公司 | Method and system for converting text data of different formats to uniform format |
EP2778955A1 (en) * | 2013-03-15 | 2014-09-17 | Ricoh Company Ltd. | Method for processing data and data warehouse for processing data |
KR20170043701A (en) * | 2015-10-13 | 2017-04-24 | (주)섬엔지니어링 | Real-time Virtual Integration of different system and Union Query System based keyward |
KR20190074552A (en) * | 2017-12-20 | 2019-06-28 | 주식회사 케이티 | Mash-up method for integrating heterogeneous data and apparatus thereof |
US20190370262A1 (en) * | 2016-06-19 | 2019-12-05 | Data.World, Inc. | Computerized tools to discover, form, and analyze dataset interrelations among a system of networked collaborative datasets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20945851 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20945851 Country of ref document: EP Kind code of ref document: A1 |