WO2021133164A1 - Unstructured data in enterprise data warehouse - Google Patents

Unstructured data in enterprise data warehouse

Info

Publication number
WO2021133164A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unstructured
warehouse
unstructured data
harmonization
Prior art date
Application number
PCT/MY2020/050170
Other languages
French (fr)
Inventor
Mohamad Zakaria ALLI
Nur Syafiqah MUNIR
Wan Zawawi MD ZIN
Shahirina MOHD TAHIR
Original Assignee
Mimos Berhad
Priority date
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2021133164A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a system and method for analyzing unstructured data. The system includes at least one data source module for providing at least one unstructured data, a Privacy Assurance Services component for conducting pseudonymization on the unstructured data to mask personal identification information, a data harmonization tool configured for codification of the pseudonymized data based on at least one reference dataset, and at least one data analytics and visualization module configured for visualizing and analyzing transformed codified data loaded into a data warehouse.

Description

UNSTRUCTURED DATA IN ENTERPRISE DATA WAREHOUSE
FIELD OF INVENTION
The present invention generally relates to data analysis. More particularly, the invention relates to a system and method for analysis of unstructured data in an enterprise data warehouse.
BACKGROUND OF THE INVENTION
Clinical information in electronic health records (EHRs), for example procedure and diagnosis data, is mostly generated in the form of unstructured data. This data is usually written by a medical officer as a required part of the discharge process. It provides important information for decision making when it is analyzed collectively.
Unstructured data types typically do not fit well in traditional data warehouses that are based on relational databases, as these are inherently limited in analyzing such data. As a result, this information cannot be used during analysis for decision making. However, because unstructured data is also important for producing an exhaustive analysis, a solution needs to be introduced so that both structured and unstructured data can be analyzed.
One prior art document, US7849048 B2, discloses a system and method of making unstructured data available to structured data analysis tools. The system includes middleware software that can be used in combination with structured data tools to perform analysis on both structured and unstructured data. Data can be read from a wide variety of unstructured sources. The data may then be transformed with commercial data transformation products that may, for example, extract individual pieces of data and determine relationships between the extracted data. The transformed data and relationships may then be passed through an extraction/transform/load (ETL) layer and placed in a structured schema. The structured schema may then be made available to commercial or proprietary structured data analysis tools.
Another prior art document, US 7668849 B1, discloses a method, system, and software for relating structured data to unstructured data, which includes displaying unstructured data in a first display area and displaying structured data related to the unstructured data in a second display area. In response to a change in the display of either the unstructured data in the first display area or the structured data in the second display area, the display in the other area is automatically and dynamically changed to show changed data based on its relation to the changed data in the area where the change occurred.
However, none of the prior art addresses the issues arising in analyzing structured and unstructured data. In view of the foregoing, there is a need for an improved method and system for overcoming the shortcomings associated with the prior art.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides a system for analyzing unstructured data. The system includes at least one data source module for providing at least one unstructured data, a Privacy Assurance Services component for conducting pseudonymization on the unstructured data to mask personal identification information, a data harmonization tool configured for codification of the pseudonymized data based on at least one reference dataset, and at least one data analytics and visualization module configured for visualizing and analyzing transformed codified data loaded into a data warehouse.
In an embodiment, the present invention provides a method of analyzing unstructured data. The method includes extracting at least one unstructured data from a data source, pseudonymizing the data through a Privacy Assurance Services component, sending the pseudonymized data to a data harmonization module for codification, transforming and loading the codified data into a data warehouse, and sending the transformed data to at least one data analytics and visualization module for visualization and analytics.
BRIEF DESCRIPTION OF THE DRAWINGS
The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings, in which:
Fig. 1 shows an architecture diagram of a system for extracting, transforming and loading (ETL) unstructured data inside an enterprise data warehouse using a data harmonization tool in accordance with an embodiment of the present invention.
Fig. 2 shows a flow diagram of a method for analyzing the unstructured data utilizing the data harmonization tool in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Various embodiments of the present invention provide a system and method for analyzing unstructured data in an enterprise data warehouse. The following description provides specific details of certain embodiments of the invention illustrated in the drawings to provide a thorough understanding of those embodiments. It should be recognized, however, that the present invention can be reflected in additional embodiments and that the invention may be practiced without some of the details in the following description.
The various embodiments including the example embodiments will now be described more fully with reference to the accompanying drawings, in which the various embodiments of the invention are shown. The invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. In the drawings, the sizes of components may be exaggerated for clarity.
It will be understood that as used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
Spatially relative terms, such as “data,” “unstructured data,” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the structure in use or operation in addition to the orientation depicted in the figures.
Embodiments described herein will refer to plan views and/or cross-sectional views by way of ideal schematic views. Accordingly, the views may be modified depending on simplistic assembling or manufacturing technologies and/or tolerances. Therefore, example embodiments are not limited to those shown in the views but include modifications in configurations formed on basis of assembling process. Therefore, regions exemplified in the figures have schematic properties and shapes of regions shown in the figures exemplify specific shapes or regions of elements, and do not limit the various embodiments including the example embodiments.
The subject matter of example embodiments, as disclosed herein, is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different features or combinations of features similar to the ones described in this document, in conjunction with other technologies. Generally, the various embodiments including the example embodiments relate to a system and method for extracting, transforming and loading unstructured data into a data warehouse for analytics.
Referring to Fig. 1, an architecture diagram of a system for extracting, transforming and loading (ETL) unstructured data inside an enterprise data warehouse using a data harmonization tool is shown in accordance with an embodiment of the present invention. The system 100 includes a data analytics and visualization module 110, an ETL support tool 120, a privacy assurance service (PAS) component 130, a data source 140, a data harmonization tool 150, a reference dataset 160 and an enterprise data warehouse 170.
The data analytics and visualization module 110 is configured for composing and exposing data from the data warehouse 170 for analyzing and visualizing heterogeneous data.
The ETL support tool 120 is configured for extracting the data from the source and generating a JSON file. The JSON file is then transformed based on the business rules, and the transformed data is loaded into the data warehouse 170. The privacy assurance service component 130 performs a pseudonymization process in order to mask personal information.
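The pseudonymization performed by the privacy assurance service component 130 can be illustrated with a minimal sketch. The description does not state which algorithm the component uses, so the keyed hash (HMAC-SHA256), the secret key, and the field names `patient_name` and `national_id` below are all assumptions made for illustration only.

```python
import hashlib
import hmac
import json

# Hypothetical secret held by the privacy assurance service; not from the patent.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical field names treated as personal identification information.
PII_FIELDS = ("patient_name", "national_id")


def pseudonymize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of `record` with each PII field replaced by a keyed hash."""
    masked = dict(record)
    for field in PII_FIELDS:
        if field in masked:
            digest = hmac.new(key, str(masked[field]).encode("utf-8"), hashlib.sha256)
            # Truncated hex digest serves as a stable pseudonym for the same input.
            masked[field] = digest.hexdigest()[:16]
    return masked


record = {
    "patient_name": "Jane Doe",
    "national_id": "A1234567",
    "discharge_note": "Patient admitted for cardiovascular implant.",
}
masked = pseudonymize(record)
print(json.dumps(masked, indent=2))
```

Because the hash is keyed and deterministic, the same patient maps to the same pseudonym across records, which keeps records linkable in the warehouse without exposing the identifier. This sketch masks only named fields; identifiers embedded in the free-text note itself would need separate handling.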
The data source 140 is configured to capture and collect the unstructured data.
The data harmonization tool 150 is configured for codifying unstructured data into integrated, consistent and unambiguous data based on terminology inside the reference dataset 160.
The reference dataset 160 is configured to store all relevant terminology with unique codes.
In an embodiment, the at least one data warehouse 170 includes a plurality of warehouse databases for storing the data. In an embodiment, the at least one reference dataset 160 includes a plurality of reference databases for storing a list of terminologies with unique codes, each unique code being required for codification of the unstructured data. The terminology herein refers to a hierarchy of terms which include unique codes and definitions. The unique code functions as an identifier for the term. For example, the term ‘Cardiovascular Implant’ is tagged with the unique code ‘309513005’, which the data harmonization tool 150 then uses to tag each identified term found inside the unstructured data. A similar term such as ‘Cardiovascular Therapy’ is also tagged with the same unique code ‘309513005’ to ensure wider coverage of the area.
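The tagging of terms with unique codes can be sketched as a simple dictionary lookup over free text. The two terms and the code ‘309513005’ are the examples given above; the case-insensitive whole-phrase matching and the function name `codify` are assumptions, since the description does not specify how the data harmonization tool 150 identifies terms.

```python
import re

# Toy reference dataset: term -> unique code. The two entries are the
# examples given in the description; a real dataset would be far larger.
REFERENCE_DATASET = {
    "cardiovascular implant": "309513005",
    "cardiovascular therapy": "309513005",
}


def codify(text: str, reference: dict = REFERENCE_DATASET) -> list:
    """Return (term, code) pairs for every reference term found in `text`."""
    found = []
    for term, code in reference.items():
        # Case-insensitive whole-phrase match; an assumption, not the patent's matcher.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((term, code))
    return found


note = "Follow-up after Cardiovascular Implant; continue cardiovascular therapy."
tags = codify(note)
print(tags)
```

Both terms resolve to the same code, so downstream analysis can group them under one identifier regardless of which wording the medical officer used.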
In an embodiment, the data harmonization module 150 includes a plurality of components for codifying unstructured data.
In an embodiment, the visualization module 110, which is based on a Business Intelligence (BI) tool, is used to visualize the data from the data warehouse 170. A visualization medium such as a fixed-format report or dashboard is created using the BI tool, which then extracts the data from the data warehouse 170 and populates the data according to the designed report.
In an embodiment, the system 100 includes at least, but is not limited to, four processors and sixteen gigabits of Random Access Memory (RAM) configured for processing the extraction, transformation and loading of data in the data warehouse 170. The number of processors and the size of RAM may vary depending on data velocity and volume.
Referring to Fig. 2, a flow diagram 200 of a method for analyzing the unstructured data utilizing the data harmonization tool is shown in accordance with an embodiment of the present invention. The method includes step S210 of collecting and extracting data from a source system, where the data extraction output is in JSON format. It should be noted that JSON is an output format for codified data that has undergone data harmonization, in which the unstructured data is structured before it can be processed by the ETL tool in one of the subsequent steps in accordance with the embodiment of the present invention.
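As a rough illustration of an extraction output in JSON format, the sketch below wraps raw free-text notes into a JSON document. The layout (a `records` array with `record_id` and `note` keys) and the function name `extract_to_json` are hypothetical; the description does not define the JSON schema.

```python
import json


def extract_to_json(notes: list) -> str:
    """Wrap raw free-text notes in a JSON document, one object per note."""
    records = [
        {"record_id": i, "note": text.strip()}
        for i, text in enumerate(notes, start=1)
    ]
    return json.dumps({"records": records}, indent=2)


raw_notes = [
    "  Patient admitted for cardiovascular implant.  ",
    "Discharged in stable condition.",
]
payload = extract_to_json(raw_notes)
print(payload)
```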
In S220, pseudonymization of the unstructured data is done via the Privacy Assurance Service component. In S230, the data harmonization module fetches the pseudonymized data from a file server for codification and generates a JSON file. In S240, the ETL support tool fetches the JSON file with codified data and performs the transformation and loading into the data warehouse. In S250, data analytics and visualization is performed; during this step, data from the warehouse is composed and exposed for analyzing and visualizing heterogeneous data.
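Steps S210 to S250 can be strung together in a small end-to-end sketch. Everything concrete here is an assumption made for illustration: an in-memory SQLite database stands in for the data warehouse, an unkeyed SHA-256 hash stands in for the Privacy Assurance Service, a substring lookup stands in for the data harmonization module, and the table name `fact_code` and field names are hypothetical.

```python
import hashlib
import json
import sqlite3

# Example term and code taken from the description; the rest is illustrative.
REFERENCE = {"cardiovascular implant": "309513005"}


def extract(source_rows):
    """S210: emit the extracted source data as JSON."""
    return json.dumps(source_rows)


def pseudonymize(payload):
    """S220: mask the patient identifier (unkeyed hash, for brevity only)."""
    rows = json.loads(payload)
    for row in rows:
        row["patient"] = hashlib.sha256(row["patient"].encode("utf-8")).hexdigest()[:12]
    return json.dumps(rows)


def harmonize(payload):
    """S230: attach reference codes found in each note and emit JSON."""
    rows = json.loads(payload)
    for row in rows:
        note = row["note"].lower()
        row["codes"] = [code for term, code in REFERENCE.items() if term in note]
    return json.dumps(rows)


def load(payload, conn):
    """S240: flatten codified rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_code (patient TEXT, code TEXT)")
    for row in json.loads(payload):
        conn.executemany(
            "INSERT INTO fact_code VALUES (?, ?)",
            [(row["patient"], code) for code in row["codes"]],
        )
    conn.commit()


def analyze(conn):
    """S250: a simple aggregate a BI report might display."""
    cur = conn.execute(
        "SELECT code, COUNT(*) FROM fact_code GROUP BY code ORDER BY code"
    )
    return cur.fetchall()


source = [
    {"patient": "Jane Doe", "note": "Cardiovascular implant performed."},
    {"patient": "John Roe", "note": "Cardiovascular implant review; stable."},
    {"patient": "Ann Poe", "note": "Routine discharge."},
]
conn = sqlite3.connect(":memory:")
load(harmonize(pseudonymize(extract(source))), conn)
report = analyze(conn)
print(report)
```

Passing JSON strings between stages mirrors the file hand-offs described in the steps; in practice each stage would read from and write to the file server rather than call the next stage directly.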
The present invention utilizes a data harmonization tool based on semantic technology that uses different terminologies to combine textual data into integrated, consistent and unambiguous data.
Data captured or collected from a source system in the form of unstructured data needs to go through a pseudonymization process to mask personal identification information. Once the data is pseudonymized, the ETL support tool processes it and sends the output to the data harmonization component in order to codify the unstructured data based on the reference dataset provided. Codified data is generated and processed by the ETL support tool for data warehouse consumption. Processed data inside the data warehouse can be used by intelligent tools for analysis.
The data source, ETL support tool, data harmonization module, and/or processor included in or associated with the system 100 described herein may comprise one or more microprocessors, digital signal processors, application specific integrated circuits, field programmable gate arrays, and/or other types of digital processing circuits, configured according to computer program instructions implemented in software (or firmware). As would be apparent to a person having ordinary skill in the art, the afore-described methods and systems may be provided in many variations, modifications or alternatives to existing methods and systems. The principles and concepts disclosed herein may also be implemented in various manners which may not have been specifically described herein but which are to be understood as encompassed within the scope of the appended claims.

Claims

1. A system (100) for analyzing unstructured data, characterized in that, the system (100) comprising: at least one data source module (140) for providing at least one unstructured data; a privacy assurance services component (130) for conducting pseudonymization on the unstructured data to mask personal identification information; a data harmonization tool (150) configured for codification of the pseudonymized data based on at least one reference dataset (160); and at least one data analytics and visualization module (110) configured for visualizing and analyzing transformed codified data loaded into a data warehouse (170).
2. The system (100) of claim 1 further comprises at least one file server at a client side for providing access of files.
3. The system (100) of claim 1 wherein the data warehouse (170) includes a plurality of warehouse databases for storing the data.
4. The system (100) of claim 1 wherein the at least one reference dataset (160) includes a plurality of reference database for storing list of a plurality of terminologies with a unique code, each unique code is required for codification of the unstructured data.
5. The system (100) of claim 1 wherein the data harmonization tool (150) includes a plurality of components for codifying unstructured data.
6. The system (100) of claim 1 wherein the data analytics and visualization tool (110) is further configured for composing and exposing the data from the data warehouse (170).
7. The system (100) of claim 1 further comprises at least one processor configured for processing the extraction, transformation and loading of data in the data warehouse (170).
8. A method of analyzing unstructured data, characterized by the steps of: extracting at least one unstructured data from a data source module; pseudonymizing the data through privacy assurance services component; sending pseudonymized data to a data harmonization module for codification; transforming and loading codified data into a data warehouse; and sending the transformed codified data to at least one data analytics and visualization module for visualization and analytics.
9. The method of claim 8 further comprising the step of outputting the extracted unstructured data in JavaScript Object Notation (JSON) format.
10. The method of claim 8 wherein the data is pseudonymized for masking personally identifiable information.
11. The method of claim 8 wherein the codification of the data is based on a reference dataset.
PCT/MY2020/050170 2019-12-24 2020-11-25 Unstructured data in enterprise data warehouse WO2021133164A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2019007773 2019-12-24

Publications (1)

Publication Number Publication Date
WO2021133164A1 (en) 2021-07-01

Family

ID=76574601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2020/050170 WO2021133164A1 (en) 2019-12-24 2020-11-25 Unstructured data in enterprise data warehouse

Country Status (1)

Country Link
WO (1) WO2021133164A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6643635B2 (en) * 2001-03-15 2003-11-04 Sagemetrics Corporation Methods for dynamically accessing, processing, and presenting data acquired from disparate data sources
US20070294113A1 (en) * 2006-06-14 2007-12-20 General Electric Company Method for evaluating correlations between structured and normalized information on genetic variations between humans and their personal clinical patient data from electronic medical patient records
US20130197938A1 (en) * 2011-08-26 2013-08-01 Wellpoint, Inc. System and method for creating and using health data record
WO2014152305A1 (en) * 2013-03-14 2014-09-25 Ontomics, Inc. System and methods for personalized clinical decision support tools
WO2017130305A1 (en) * 2016-01-27 2017-08-03 株式会社日立製作所 Computer system and data amount reduction method
JP2019159608A (en) * 2018-03-09 2019-09-19 株式会社日立製作所 Search device and search method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20904858; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20904858; Country of ref document: EP; Kind code of ref document: A1)