WO2017189693A1 - Learning from historical logs and recommending database operations on a data-asset in an etl tool - Google Patents

Publication number
WO2017189693A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
context data
database
application
predictive model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2017/029583
Other languages
English (en)
French (fr)
Inventor
Atreyee Dey
Sanjay Kaluskar
Udayakumar Dhansingh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Informatica LLC
Original Assignee
Informatica LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Informatica LLC
Priority to JP2018555888A (patent JP6843882B2)
Priority to EP17790324.2A (patent EP3449334A4)
Priority to AU2017255561A (patent AU2017255561B2)
Priority to CA3022113A (patent CA3022113A1)
Publication of WO2017189693A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90324 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • a graphical user interface of the data analysis application includes a data section, an information section, and various user interface controls.
  • the data section is for displaying the tables for analysis.
  • the information section is for displaying profiled information about the tables, based on the schema definitions of the tables.
  • a composite data control is for receiving a database operation (equivalently, a database command) to unify the tables into a composite table based on at least one matching column among the tables.
  • the composite data control may be multiple different controls for the various unifying database operations.
  • a recommendation control of the UI is for displaying recommended database operations determined by the database operation recommendation module.
  • FIG. 1 is a high-level block diagram of a computing environment for generating predictive models from historical logs of database operations and recommending database operations on data in a data analysis application according to one embodiment.
  • FIG. 5A is a flowchart illustrating a method for building and training a predictive model for determining and recommending database operations to a guided user of a data analysis application according to one embodiment.
  • FIG. 7 is a flowchart illustrating a method for presenting, in a data analysis application, recommended database operations and operands received from a data analysis server.
  • the computing environment 100 includes data repositories 102, a data analysis server 104, and a data analysis application 125.
  • the data repositories 102 include one or more systems for managing data. Each data repository 102 provides a channel for accessing and updating data stored within the data repository 102. Data in a data repository 102 may be associated with users, groups of users, entities, and/or workflows.
  • a data repository 102 may be a customer relationship management (CRM) system or a human resource (HR) management system that stores data associated with all individuals associated with a particular entity.
  • a data repository 102 may be a data source or an export target for an ETL process. Examples of data sources include databases, applications, and local files.
  • the data analysis application 125 is a software application that enables users to manipulate data extracted from the data repositories 102 by the data analysis server 104 and select and specify database operations to be performed on single tables or multiple tables, and is one means for performing this function.
  • the data analysis application 125 provides data to users in the form of projects, which are sets of tables.
  • the various modules of the data analysis application 125 are not native or standard components of a generic computer system, and provide specific functionality as described herein that extends beyond the generic functions of a computer system.
  • the functions and operations of the modules are sufficiently complex as to require an implementation by a computer system, and as such cannot be performed in any practical embodiment by mental steps in the human mind. Each of these components is described in greater detail below.
  • the database operation UI module 124 provides one or more database operation controls for applying to data in the table(s) generated by the UI module 122, and is one means for performing this function. Specifically, the database operation UI module 124 provides controls that allow a user of the data analysis application 125 to select, specify and/or cause the application of database operations associated with the tables.
  • column_trimmable : 0.0
  • the classes for the model are names of database operations that were performed on the columns, as identified in Table 1, shown in column 310. As shown in Table 5, these particular example features and classes would be used to train the OP Model.
  • the example of FIG. 3 shows 14 data entries but in practice, the predictive models described above may be trained using hundreds, thousands, millions or more data entries.
  • the pieces of context data and database operation history data that make up the data entries are selected from log entries by the model training module 210 as described above with respect to FIG. 2.
  • Data entries may be stored in the database operation recommendation store 121.
  • the database operation recommendation module 114 builds 505 a predictive model.
  • the predictive model can be the operation model (OP), the operand model (OPD), or the column operation model (OPC), or any combination thereof.
  • Building the predictive model includes determining the training users whose database operations will be used as the training data for the model.
  • Building the predictive model further includes determining model classes. For example, if the predictive model is the OP Model, the classes are database operations. If the predictive model is the OPD Model, the classes are operands. If the predictive model is the OPC Model, the classes are join and union operations, or defined two-table operations.
  • the model training module 210 trains 510 the model using the maintained database operation history data and context data from the determined training users.
  • the model training module 210 retrieves the database operation history data and training context data corresponding to the training users from the profiling data store 118 and the database operation history store 120.
  • the model training module 210 determines what context data is predictive of a particular database operation or operand.
  • the model training module 210 determines feature weights for each model feature, as described above with respect to FIG. 2. Feature weights and other parameters may be stored in the database operation recommendation store 121 and retrieved for use as necessary.
  • the model training module 210 preprocesses the context data prior to training the model, as discussed above with respect to FIG. 2. Once a model is trained, it may be used to determine the probability of classes (operations or operands) based on a set of features (context data received from the data analysis application).
  • recommendation generation module 220 selects a number of the most probable database operations as determined by the model to provide as recommendations to the guided user. If the OPD Model is also used, the selected database operations determined by the OP Model are used as inputs to the OPD Model to determine a number of the most probable operands for the selected database operations.
  • each recommended database operation includes an operation identifier that uniquely identifies the database operation to the data analysis application 125.
  • each recommended database operation further includes a textual name or description of the database operation for presentation to the user of the data analysis application 125.
  • Database operations, operation identifiers, and textual names and descriptions may be stored in the database operation recommendation store 121 and retrieved by the database operation recommendation module 114 prior to sending recommended database operations to the data analysis application 125.
  • suggestions card 435 includes recommended database operations.
  • the column 650 contains phone numbers that are formatted in different ways.
  • the present invention also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of computer-readable storage medium suitable for storing electronic instructions, each coupled to a computer system bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
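
The definitions above describe training an OP Model on context data and logged database operations, selecting the most probable operations, and then feeding each selected operation into an OPD Model to rank operands. The following is a minimal sketch of that flow, not the method of the application itself. It assumes a small naive-Bayes-style frequency classifier as a stand-in for the predictive model, and it assumes the selected operation is used as an input by choosing a per-operation operand model. The class `FrequencyModel`, the function `recommend`, the feature names (`column_type`, `has_nulls`), and the sample history are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class FrequencyModel:
    """Illustrative stand-in for a predictive model: naive Bayes over categorical features."""

    def __init__(self):
        self.class_counts = Counter()               # class -> number of data entries
        self.feature_counts = defaultdict(Counter)  # class -> Counter of (feature, value)

    def fit(self, entries):
        # entries: iterable of (context_features: dict, class_label)
        for features, label in entries:
            self.class_counts[label] += 1
            for item in features.items():
                self.feature_counts[label][item] += 1
        return self

    def predict_proba(self, features):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            for item in features.items():
                # Add-one smoothing so unseen feature values do not zero out a class.
                score += math.log((self.feature_counts[label][item] + 1) / (count + 2))
            log_scores[label] = score
        # Normalize log scores into probabilities that sum to 1.
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def top_k(self, features, k):
        probs = self.predict_proba(features)
        return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def recommend(op_model, opd_models, context, k_ops=2, k_operands=1):
    """Pick the most probable operations, then the most probable operands for each."""
    recommendations = []
    for operation, probability in op_model.top_k(context, k_ops):
        operands = opd_models[operation].top_k(context, k_operands)
        recommendations.append({
            "operation": operation,
            "probability": probability,
            "operands": [operand for operand, _ in operands],
        })
    return recommendations


# Hypothetical log-derived data entries: (context data, operation, operand).
history = [
    ({"column_type": "string", "has_nulls": "no"}, "trim", "whitespace"),
    ({"column_type": "string", "has_nulls": "no"}, "trim", "whitespace"),
    ({"column_type": "string", "has_nulls": "yes"}, "replace_null", "N/A"),
    ({"column_type": "number", "has_nulls": "yes"}, "replace_null", "0"),
    ({"column_type": "number", "has_nulls": "no"}, "round", "2"),
]

# OP Model: classes are operations.
op_model = FrequencyModel().fit((ctx, op) for ctx, op, _ in history)

# OPD Models: classes are operands, one model per operation (an assumption of this sketch).
by_operation = defaultdict(list)
for ctx, op, operand in history:
    by_operation[op].append((ctx, operand))
opd_models = {op: FrequencyModel().fit(rows) for op, rows in by_operation.items()}

recs = recommend(op_model, opd_models, {"column_type": "string", "has_nulls": "no"})
for rec in recs:
    print(rec["operation"], rec["operands"], round(rec["probability"], 3))
```

With this toy history, a string column without nulls ranks `trim` first and `replace_null` second. A production system would train on far more data entries and could use any classifier that outputs class probabilities.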

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/US2017/029583 2016-04-26 2017-04-26 Learning from historical logs and recommending database operations on a data-asset in an etl tool Ceased WO2017189693A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2018555888A JP6843882B2 (ja) 2016-04-26 2017-04-26 Learning from historical logs and recommending database operations on a data asset in an ETL tool
EP17790324.2A EP3449334A4 (en) 2016-04-26 2017-04-26 LEARNING FROM HISTORICAL LOGS AND RECOMMENDING DATABASE OPERATIONS ON A DATA ASSET IN AN ETL TOOL
AU2017255561A AU2017255561B2 (en) 2016-04-26 2017-04-26 Learning from historical logs and recommending database operations on a data-asset in an ETL tool
CA3022113A CA3022113A1 (en) 2016-04-26 2017-04-26 Learning from historical logs and recommending database operations on a data-asset in an etl tool

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/139,186 US10324947B2 (en) 2016-04-26 2016-04-26 Learning from historical logs and recommending database operations on a data-asset in an ETL tool
US15/139,186 2016-04-26

Publications (1)

Publication Number Publication Date
WO2017189693A1 (en) 2017-11-02

Family

ID=60090217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/029583 Ceased WO2017189693A1 (en) 2016-04-26 2017-04-26 Learning from historical logs and recommending database operations on a data-asset in an etl tool

Country Status (6)

Country Link
US (1) US10324947B2 (en)
EP (1) EP3449334A4 (en)
JP (1) JP6843882B2 (en)
AU (1) AU2017255561B2 (en)
CA (1) CA3022113A1 (en)
WO (1) WO2017189693A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10073858B2 (en) 2013-05-16 2018-09-11 Oracle International Corporation Systems and methods for tuning a storage system
US11289200B1 (en) 2017-03-13 2022-03-29 C/Hca, Inc. Authorized user modeling for decision support
US10412028B1 (en) * 2013-05-24 2019-09-10 HCA Holdings, Inc. Data derived user behavior modeling
US10936599B2 (en) 2017-09-29 2021-03-02 Oracle International Corporation Adaptive recommendations
US11269822B2 (en) * 2017-10-09 2022-03-08 Sap Se Generation of automated data migration model
US11222274B2 (en) * 2017-12-01 2022-01-11 At&T Intellectual Property I, L.P. Scalable integrated information structure system
US10783161B2 (en) * 2017-12-15 2020-09-22 International Business Machines Corporation Generating a recommended shaping function to integrate data within a data repository
US11323564B2 (en) * 2018-01-04 2022-05-03 Dell Products L.P. Case management virtual assistant to enable predictive outputs
JP7246095B2 (ja) * 2018-02-09 2023-03-27 国立大学法人静岡大学 Machine learning system and machine learning method
JP7119411B2 (ja) * 2018-02-16 2022-08-17 日本電気株式会社 Database device, data management method, and computer program
US12190250B2 (en) * 2018-03-16 2025-01-07 International Business Machines Corporation Contextual intelligence for unified data governance
US11908573B1 (en) 2020-02-18 2024-02-20 C/Hca, Inc. Predictive resource management
US11531929B2 (en) * 2018-11-09 2022-12-20 Citrix Sysiems, Inc. Systems and methods for machine generated training and imitation learning
US11636071B2 (en) * 2020-01-10 2023-04-25 Salesforce.Com, Inc. Database replication error recovery based on supervised learning
CN111324455A (zh) * 2020-02-10 2020-06-23 浙江中智达科技有限公司 Industrial cloud platform scheduling method, device, and system
US11455316B2 (en) 2020-02-28 2022-09-27 Clumio, Inc. Modification of data in a time-series data lake
US11379500B2 (en) * 2020-03-30 2022-07-05 Sap Se Automated data integration, reconciliation, and self healing using machine learning
US11763178B2 (en) 2020-05-29 2023-09-19 Capital One Services, Llc Predictive scheduling and execution of data analytics applications based on machine learning techniques
CN113392174A (zh) * 2020-08-28 2021-09-14 郭举 Information analysis method and system based on big data and artificial intelligence
US11886891B2 (en) * 2021-09-10 2024-01-30 Sap Se Context-based multiexperience element dynamically generated using natural language processing
CN117055479B (zh) * 2023-07-19 2024-05-24 河南上恒医药科技有限公司 Method and system for monitoring equipment status during oral solution drug production

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8978114B1 (en) * 2012-07-15 2015-03-10 Identropy, Inc. Recommendation engine for unified identity management across internal and shared computing applications
US20160055426A1 (en) * 2014-08-25 2016-02-25 Sunstone Analytics Customizable machine learning models
US20160092475A1 (en) * 2014-09-26 2016-03-31 Oracle International Corporation Automated entity correlation and classification across heterogeneous datasets

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7698170B1 (en) * 2004-08-05 2010-04-13 Versata Development Group, Inc. Retail recommendation domain model
US7720720B1 (en) * 2004-08-05 2010-05-18 Versata Development Group, Inc. System and method for generating effective recommendations
JP4958476B2 (ja) * 2006-05-24 2012-06-20 株式会社日立製作所 Search device
US20090144262A1 (en) * 2007-12-04 2009-06-04 Microsoft Corporation Search query transformation using direct manipulation
JP5049223B2 (ja) * 2008-07-29 2012-10-17 ヤフー株式会社 Search device, search method, and program for automatically estimating search request attributes for web queries
US8266148B2 (en) * 2008-10-07 2012-09-11 Aumni Data, Inc. Method and system for business intelligence analytics on unstructured data
US20100114976A1 (en) * 2008-10-21 2010-05-06 Castellanos Maria G Method For Database Design
US8200661B1 (en) * 2008-12-18 2012-06-12 Google Inc. Dynamic recommendations based on user actions
US8161077B2 (en) * 2009-10-21 2012-04-17 Delphix Corp. Datacenter workflow automation scenarios using virtual databases
US20120102007A1 (en) * 2010-10-22 2012-04-26 Alpine Consulting, Inc. Managing etl jobs
US8978137B2 (en) * 2012-02-29 2015-03-10 Cisco Technology, Inc. Method and apparatus for retroactively detecting malicious or otherwise undesirable software
US9646262B2 (en) * 2013-06-17 2017-05-09 Purepredictive, Inc. Data intelligence using machine learning
KR101404710B1 (ko) * 2013-12-16 2014-06-11 김민수 Method for providing a call-based advertising service
US11227104B2 (en) * 2014-05-11 2022-01-18 Informatica Llc Composite data creation with refinement suggestions
US9881059B2 (en) * 2014-08-08 2018-01-30 Yahoo Holdings, Inc. Systems and methods for suggesting headlines
US9507824B2 (en) * 2014-08-22 2016-11-29 Attivio Inc. Automated creation of join graphs for unrelated data sets among relational databases
US20160117087A1 (en) * 2014-10-23 2016-04-28 Microsoft Corporation Job creation and reuse
US9686086B1 (en) * 2014-12-01 2017-06-20 Arimo, Inc. Distributed data framework for data analytics
US10430421B2 (en) * 2014-12-29 2019-10-01 Facebook, Inc. Recommending content items in a social network using delayed interaction
US10713587B2 (en) * 2015-11-09 2020-07-14 Xerox Corporation Method and system using machine learning techniques for checking data integrity in a data warehouse feed

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3449334A4 *

Also Published As

Publication number Publication date
EP3449334A4 (en) 2019-12-25
EP3449334A1 (en) 2019-03-06
CA3022113A1 (en) 2017-11-02
JP6843882B2 (ja) 2021-03-17
US10324947B2 (en) 2019-06-18
JP2019519027A (ja) 2019-07-04
AU2017255561A1 (en) 2018-12-13
US20170308595A1 (en) 2017-10-26
AU2017255561B2 (en) 2020-11-26

Similar Documents

Publication Publication Date Title
AU2017255561B2 (en) Learning from historical logs and recommending database operations on a data-asset in an ETL tool
US10558629B2 (en) Intelligent data quality
US11227104B2 (en) Composite data creation with refinement suggestions
US9798829B1 (en) Data graph interface
US12443908B2 (en) Data distillery for signal detection
US20170060931A1 (en) Intelligent data munging
US20110246415A1 (en) Method and system for validating data
US11030258B1 (en) Ranking anomalies associated with populations of users based on relevance
US20140279803A1 (en) Disambiguating data using contextual and historical information
US11037096B2 (en) Delivery prediction with degree of delivery reliability
US11379466B2 (en) Data accuracy using natural language processing
US20150193511A1 (en) Graphical record matching process replay for a data quality user interface
US10824606B1 (en) Standardizing values of a dataset
Baizyldayeva et al. Decision making procedure: applications of IBM SPSS cluster analysis and decision tree
CN110019182B (zh) Data tracing method and device
US20190205299A1 (en) Library search apparatus, library search system, and library search method
Fajri et al. Implementation of business intelligence to determine evaluation of activities (Case Study Indonesia Stock Exchange)
Serpell Incorporating data quality improvement into supply–use table balancing
Avdeenko et al. Modeling information space for decision-making in the interaction of higher education system with regional labor market
US20200342302A1 (en) Cognitive forecasting
WO2014168961A1 (en) Generating data analytics using a domain model
CN113901332B (zh) Method and device for mining employment history information, storage medium, and electronic device
US20230169072A1 (en) Augmented query validation and realization
Ayyavaraiah Data Mining For Business Intelligence
CN116756410A (zh) Product service recommendation method, product service recommendation device, equipment, and storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 3022113. Country of ref document: CA.
ENP Entry into the national phase. Ref document number: 2018555888. Country of ref document: JP. Kind code of ref document: A.
NENP Non-entry into the national phase. Ref country code: DE.
WWE Wipo information: entry into national phase. Ref document number: 2017790324. Country of ref document: EP.
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 17790324. Country of ref document: EP. Kind code of ref document: A1.
ENP Entry into the national phase. Ref document number: 2017790324. Country of ref document: EP. Effective date: 20181126.
ENP Entry into the national phase. Ref document number: 2017255561. Country of ref document: AU. Date of ref document: 20170426. Kind code of ref document: A.