WO2017189693A1 - Learning from historical logs and recommending database operations on a data-asset in an etl tool - Google Patents
- Publication number
- WO2017189693A1 (PCT/US2017/029583)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- context data
- database
- application
- predictive model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24575—Query processing with adaptation to user needs using context
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90324—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- a graphical user interface of the data analysis application includes a data section, an information section, and various user interface controls.
- the data section is for displaying the tables for analysis.
- the information section is for displaying profiled information about the tables, based on the schema definitions of the tables.
- a composite data control is for receiving a database operation (equivalently, a database command) to unify the tables into a composite table based on at least one matching column among the tables.
- the composite data control may be multiple different controls for the various unifying database operations.
- a recommendation control of the UI is for displaying recommended database operations determined by the database operation recommendation module.
- FIG. 1 is a high-level block diagram of a computing environment for generating predictive models from historical logs of database operations and recommending database operations on data in a data analysis application according to one embodiment.
- FIG. 5A is a flowchart illustrating a method for building and training a predictive model for determining and recommending database operations to a guided user of a data analysis application according to one embodiment.
- FIG. 7 is a flowchart illustrating a method for presenting, in a data analysis application, recommended database operations and operands received from a data analysis server.
- the computing environment 100 includes data repositories 102, a data analysis server 104, and a data analysis application 125.
- the data repositories 102 include one or more systems for managing data. Each data repository 102 provides a channel for accessing and updating data stored within the data repository 102. Data in a data repository 102 may be associated with users, groups of users, entities, and/or workflows.
- a data repository 102 may be a customer relationship management (CRM) system or a human resource (HR) management system that stores data associated with all individuals associated with a particular entity.
- a data repository 102 may be a data source or an export target for an ETL process. Examples of data sources include databases, applications, and local files.
- the data analysis application 125 is a software application that enables users to manipulate data extracted from the data repositories 102 by the data analysis server 104 and select and specify database operations to be performed on single tables or multiple tables, and is one means for performing this function.
- the data analysis application 125 provides data to users in the form of projects, which are sets of tables.
- the various modules of the data analysis application 125 are not native or standard components of a generic computer system, and provide specific functionality as described herein that extends beyond the generic functions of a computer system.
- the functions and operations of the modules are sufficiently complex as to require an implementation by a computer system, and as such cannot be performed in any practical embodiment by mental steps in the human mind. Each of these components is described in greater detail below.
- the database operation UI module 124 provides one or more database operation controls for applying to data in the table(s) generated by the UI module 122, and is one means for performing this function. Specifically, the database operation UI module 124 provides controls that allow a user of the data analysis application 125 to select, specify and/or cause the application of database operations associated with the tables.
- column_trimmable : 0.0
- the classes for the model are names of database operations that were performed on the columns, as identified in Table 1, shown in column 310. As shown in Table 5, these particular example features and classes would be used to train the OP Model.
- the example of FIG. 3 shows 14 data entries but in practice, the predictive models described above may be trained using hundreds, thousands, millions or more data entries.
- the pieces of context data and database operation history data that make up the data entries are selected from log entries by the model training module 210 as described above with respect to FIG. 2.
- Data entries may be stored in the database operation recommendation store 121.
- the database operation recommendation module 114 builds 505 a predictive model.
- the predictive model can be the operation model (OP), the operand model (OPD), or the column operation model (OPC), or any combination thereof.
- Building the predictive model includes determining the training users whose database operations will be used as the training data for the model.
- Building the predictive model further includes determining model classes. For example, if the predictive model is the OP Model, the classes are database operations. If the predictive model is the OPD Model, the classes are operands. If the predictive model is the OPC Model, the classes are join and union operations, or defined two-table operations.
- the model training module 210 trains 510 the model using the maintained database operation history data and context data from the determined training users.
- the model training module 210 retrieves the database operation history data and training context data corresponding to the training users from the profiling data store 118 and the database operation history store 120.
- the model training module 210 determines what context data is predictive of a particular database operation or operand.
- the model training module 210 determines feature weights for each model feature, as described above with respect to FIG. 2. Feature weights and other parameters may be stored in the database operation recommendation store 121 and retrieved for use as necessary.
- the model training module 210 preprocesses the context data prior to training the model, as discussed above with respect to FIG. 2. Once a model is trained, it may be used to determine the probability of classes (operations or operands) based on a set of features (context data received from the data analysis application).
- recommendation generation module 220 selects a number of the most probable database operations as determined by the model to provide as recommendations to the guided user. If the OPD Model is also used, the selected database operations determined by the OP Model are used as inputs to the OPD Model to determine a number of the most probable operands for the selected database operations.
- each recommended database operation includes an operation identifier that uniquely identifies the database operation to the data analysis application 125.
- each recommended database operation further includes a textual name or description of the database operation for presentation to the user of the data analysis application 125.
- Database operations, operation identifiers, and textual names and descriptions may be stored in the database operation recommendation store 121 and retrieved by the database operation recommendation module 114 prior to sending recommended database operations to the data analysis application 125.
- suggestions card 435 includes recommended database operations.
- the column 650 contains phone numbers that are formatted in different ways.
- the present invention also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer.
- a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of computer-readable storage medium suitable for storing electronic instructions, and each coupled to a computer system bus.
- the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
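
The definitions above describe training an operation (OP) model on historical context data and then selecting the most probable database operations as recommendations. A minimal sketch of that flow follows. It is not the patent's implementation: the excerpt does not name a model family, so a count-based naive Bayes classifier with Laplace smoothing is swapped in as a stand-in. The names `HISTORY`, `train_op_model`, and `recommend`, the feature `column_is_numeric`, and the operation names are hypothetical; only `column_trimmable` appears in the source. Features are assumed to be boolean.

```python
import math
from collections import Counter, defaultdict

# Hypothetical log-derived data entries: (context features, operation performed).
HISTORY = [
    ({"column_trimmable": True, "column_is_numeric": False}, "trim"),
    ({"column_trimmable": True, "column_is_numeric": False}, "trim"),
    ({"column_trimmable": False, "column_is_numeric": True}, "sum"),
    ({"column_trimmable": False, "column_is_numeric": True}, "sum"),
    ({"column_trimmable": False, "column_is_numeric": False}, "uppercase"),
]


def train_op_model(entries):
    """Count how often each (feature, value) pair co-occurs with each operation."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # operation -> Counter of (feature, value)
    for features, operation in entries:
        class_counts[operation] += 1
        for item in features.items():
            feature_counts[operation][item] += 1
    return {"class_counts": class_counts, "feature_counts": feature_counts}


def recommend(model, features, top_k=2):
    """Return the top_k (operation, probability) pairs for the given context."""
    total = sum(model["class_counts"].values())
    log_scores = {}
    for operation, n in model["class_counts"].items():
        log_p = math.log(n / total)  # class prior
        for item in features.items():
            seen = model["feature_counts"][operation][item]
            # Laplace smoothing; the "+ 2" assumes two possible values per feature.
            log_p += math.log((seen + 1) / (n + 2))
        log_scores[operation] = log_p
    # Normalize the log scores into probabilities over all operations.
    peak = max(log_scores.values())
    weights = {op: math.exp(s - peak) for op, s in log_scores.items()}
    norm = sum(weights.values())
    ranked = sorted(((w / norm, op) for op, w in weights.items()), reverse=True)
    return [(op, prob) for prob, op in ranked[:top_k]]


if __name__ == "__main__":
    model = train_op_model(HISTORY)
    context = {"column_trimmable": True, "column_is_numeric": False}
    for operation, probability in recommend(model, context):
        print(f"{operation}: {probability:.3f}")
```

Under the same assumptions, an operand (OPD) model could be sketched by training a second classifier whose inputs include the operation selected here, which mirrors the chaining of the OP Model into the OPD Model described above.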
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2018555888A JP6843882B2 (ja) | 2016-04-26 | 2017-04-26 | Learning from historical logs and recommending database operations on a data asset in an ETL tool |
| EP17790324.2A EP3449334A4 (en) | 2016-04-26 | 2017-04-26 | LEARNING FROM HISTORICAL LOGS AND RECOMMENDING DATABASE OPERATIONS ON A DATA ASSET IN AN ETL TOOL |
| AU2017255561A AU2017255561B2 (en) | 2016-04-26 | 2017-04-26 | Learning from historical logs and recommending database operations on a data-asset in an ETL tool |
| CA3022113A CA3022113A1 (en) | 2016-04-26 | 2017-04-26 | Learning from historical logs and recommending database operations on a data-asset in an etl tool |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/139,186 US10324947B2 (en) | 2016-04-26 | 2016-04-26 | Learning from historical logs and recommending database operations on a data-asset in an ETL tool |
| US15/139,186 | 2016-04-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017189693A1 (en) | 2017-11-02 |
Family
ID=60090217
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2017/029583 Ceased WO2017189693A1 (en) | 2016-04-26 | 2017-04-26 | Learning from historical logs and recommending database operations on a data-asset in an etl tool |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US10324947B2 (en) |
| EP (1) | EP3449334A4 (en) |
| JP (1) | JP6843882B2 (en) |
| AU (1) | AU2017255561B2 (en) |
| CA (1) | CA3022113A1 (en) |
| WO (1) | WO2017189693A1 (en) |
Families Citing this family (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10073858B2 (en) | 2013-05-16 | 2018-09-11 | Oracle International Corporation | Systems and methods for tuning a storage system |
| US11289200B1 (en) | 2017-03-13 | 2022-03-29 | C/Hca, Inc. | Authorized user modeling for decision support |
| US10412028B1 (en) * | 2013-05-24 | 2019-09-10 | HCA Holdings, Inc. | Data derived user behavior modeling |
| US10936599B2 (en) | 2017-09-29 | 2021-03-02 | Oracle International Corporation | Adaptive recommendations |
| US11269822B2 (en) * | 2017-10-09 | 2022-03-08 | Sap Se | Generation of automated data migration model |
| US11222274B2 (en) * | 2017-12-01 | 2022-01-11 | At&T Intellectual Property I, L.P. | Scalable integrated information structure system |
| US10783161B2 (en) * | 2017-12-15 | 2020-09-22 | International Business Machines Corporation | Generating a recommended shaping function to integrate data within a data repository |
| US11323564B2 (en) * | 2018-01-04 | 2022-05-03 | Dell Products L.P. | Case management virtual assistant to enable predictive outputs |
| JP7246095B2 (ja) * | 2018-02-09 | 2023-03-27 | 国立大学法人静岡大学 | Machine learning system and machine learning method |
| JP7119411B2 (ja) * | 2018-02-16 | 2022-08-17 | 日本電気株式会社 | Database device, data management method, and computer program |
| US12190250B2 (en) * | 2018-03-16 | 2025-01-07 | International Business Machines Corporation | Contextual intelligence for unified data governance |
| US11908573B1 (en) | 2020-02-18 | 2024-02-20 | C/Hca, Inc. | Predictive resource management |
| US11531929B2 (en) * | 2018-11-09 | 2022-12-20 | Citrix Systems, Inc. | Systems and methods for machine generated training and imitation learning |
| US11636071B2 (en) * | 2020-01-10 | 2023-04-25 | Salesforce.Com, Inc. | Database replication error recovery based on supervised learning |
| CN111324455A (zh) * | 2020-02-10 | 2020-06-23 | 浙江中智达科技有限公司 | Industrial cloud platform scheduling method, device, and system |
| US11455316B2 (en) | 2020-02-28 | 2022-09-27 | Clumio, Inc. | Modification of data in a time-series data lake |
| US11379500B2 (en) * | 2020-03-30 | 2022-07-05 | Sap Se | Automated data integration, reconciliation, and self healing using machine learning |
| US11763178B2 (en) | 2020-05-29 | 2023-09-19 | Capital One Services, Llc | Predictive scheduling and execution of data analytics applications based on machine learning techniques |
| CN113392174A (zh) * | 2020-08-28 | 2021-09-14 | 郭举 | Information analysis method and system based on big data and artificial intelligence |
| US11886891B2 (en) * | 2021-09-10 | 2024-01-30 | Sap Se | Context-based multiexperience element dynamically generated using natural language processing |
| CN117055479B (zh) * | 2023-07-19 | 2024-05-24 | 河南上恒医药科技有限公司 | Method and system for monitoring equipment status during oral solution drug production |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8978114B1 (en) * | 2012-07-15 | 2015-03-10 | Identropy, Inc. | Recommendation engine for unified identity management across internal and shared computing applications |
| US20160055426A1 (en) * | 2014-08-25 | 2016-02-25 | Sunstone Analytics | Customizable machine learning models |
| US20160092475A1 (en) * | 2014-09-26 | 2016-03-31 | Oracle International Corporation | Automated entity correlation and classification across heterogeneous datasets |
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7698170B1 (en) * | 2004-08-05 | 2010-04-13 | Versata Development Group, Inc. | Retail recommendation domain model |
| US7720720B1 (en) * | 2004-08-05 | 2010-05-18 | Versata Development Group, Inc. | System and method for generating effective recommendations |
| JP4958476B2 (ja) * | 2006-05-24 | 2012-06-20 | 株式会社日立製作所 | Search device |
| US20090144262A1 (en) * | 2007-12-04 | 2009-06-04 | Microsoft Corporation | Search query transformation using direct manipulation |
| JP5049223B2 (ja) * | 2008-07-29 | 2012-10-17 | ヤフー株式会社 | Search device, search method, and program for automatically estimating search request attributes for web queries |
| US8266148B2 (en) * | 2008-10-07 | 2012-09-11 | Aumni Data, Inc. | Method and system for business intelligence analytics on unstructured data |
| US20100114976A1 (en) * | 2008-10-21 | 2010-05-06 | Castellanos Maria G | Method For Database Design |
| US8200661B1 (en) * | 2008-12-18 | 2012-06-12 | Google Inc. | Dynamic recommendations based on user actions |
| US8161077B2 (en) * | 2009-10-21 | 2012-04-17 | Delphix Corp. | Datacenter workflow automation scenarios using virtual databases |
| US20120102007A1 (en) * | 2010-10-22 | 2012-04-26 | Alpine Consulting, Inc. | Managing etl jobs |
| US8978137B2 (en) * | 2012-02-29 | 2015-03-10 | Cisco Technology, Inc. | Method and apparatus for retroactively detecting malicious or otherwise undesirable software |
| US9646262B2 (en) * | 2013-06-17 | 2017-05-09 | Purepredictive, Inc. | Data intelligence using machine learning |
| KR101404710B1 (ko) * | 2013-12-16 | 2014-06-11 | 김민수 | Method for providing a call-based advertisement service |
| US11227104B2 (en) * | 2014-05-11 | 2022-01-18 | Informatica Llc | Composite data creation with refinement suggestions |
| US9881059B2 (en) * | 2014-08-08 | 2018-01-30 | Yahoo Holdings, Inc. | Systems and methods for suggesting headlines |
| US9507824B2 (en) * | 2014-08-22 | 2016-11-29 | Attivio Inc. | Automated creation of join graphs for unrelated data sets among relational databases |
| US20160117087A1 (en) * | 2014-10-23 | 2016-04-28 | Microsoft Corporation | Job creation and reuse |
| US9686086B1 (en) * | 2014-12-01 | 2017-06-20 | Arimo, Inc. | Distributed data framework for data analytics |
| US10430421B2 (en) * | 2014-12-29 | 2019-10-01 | Facebook, Inc. | Recommending content items in a social network using delayed interaction |
| US10713587B2 (en) * | 2015-11-09 | 2020-07-14 | Xerox Corporation | Method and system using machine learning techniques for checking data integrity in a data warehouse feed |
- 2016
  - 2016-04-26 US US15/139,186 patent/US10324947B2/en active Active
- 2017
  - 2017-04-26 AU AU2017255561A patent/AU2017255561B2/en active Active
  - 2017-04-26 JP JP2018555888A patent/JP6843882B2/ja active Active
  - 2017-04-26 WO PCT/US2017/029583 patent/WO2017189693A1/en not_active Ceased
  - 2017-04-26 EP EP17790324.2A patent/EP3449334A4/en not_active Withdrawn
  - 2017-04-26 CA CA3022113A patent/CA3022113A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3449334A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3449334A4 (en) | 2019-12-25 |
| EP3449334A1 (en) | 2019-03-06 |
| CA3022113A1 (en) | 2017-11-02 |
| JP6843882B2 (ja) | 2021-03-17 |
| US10324947B2 (en) | 2019-06-18 |
| JP2019519027A (ja) | 2019-07-04 |
| AU2017255561A1 (en) | 2018-12-13 |
| US20170308595A1 (en) | 2017-10-26 |
| AU2017255561B2 (en) | 2020-11-26 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| AU2017255561B2 (en) | Learning from historical logs and recommending database operations on a data-asset in an ETL tool | |
| US10558629B2 (en) | Intelligent data quality | |
| US11227104B2 (en) | Composite data creation with refinement suggestions | |
| US9798829B1 (en) | Data graph interface | |
| US12443908B2 (en) | Data distillery for signal detection | |
| US20170060931A1 (en) | Intelligent data munging | |
| US20110246415A1 (en) | Method and system for validating data | |
| US11030258B1 (en) | Ranking anomalies associated with populations of users based on relevance | |
| US20140279803A1 (en) | Disambiguating data using contextual and historical information | |
| US11037096B2 (en) | Delivery prediction with degree of delivery reliability | |
| US11379466B2 (en) | Data accuracy using natural language processing | |
| US20150193511A1 (en) | Graphical record matching process replay for a data quality user interface | |
| US10824606B1 (en) | Standardizing values of a dataset | |
| Baizyldayeva et al. | Decision making procedure: applications of IBM SPSS cluster analysis and decision tree | |
| CN110019182B (zh) | Data tracing method and device | |
| US20190205299A1 (en) | Library search apparatus, library search system, and library search method | |
| Fajri et al. | Implementation of business intelligence to determine evaluation of activities (Case Study Indonesia Stock Exchange) | |
| Serpell | Incorporating data quality improvement into supply–use table balancing | |
| Avdeenko et al. | Modeling information space for decision-making in the interaction of higher education system with regional labor market | |
| US20200342302A1 (en) | Cognitive forecasting | |
| WO2014168961A1 (en) | Generating data analytics using a domain model | |
| CN113901332B (zh) | Method and device for mining employment history information, storage medium, and electronic device | |
| US20230169072A1 (en) | Augmented query validation and realization | |
| Ayyavaraiah | Data Mining For Business Intelligence | |
| CN116756410A (zh) | Product service recommendation method, product service recommendation device, equipment, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 3022113; Country of ref document: CA |
| | ENP | Entry into the national phase | Ref document number: 2018555888; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2017790324; Country of ref document: EP |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17790324; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2017790324; Country of ref document: EP; Effective date: 20181126 |
| | ENP | Entry into the national phase | Ref document number: 2017255561; Country of ref document: AU; Date of ref document: 20170426; Kind code of ref document: A |