WO2016049034A1 - Guided data exploration - Google Patents

Guided data exploration

Info

Publication number
WO2016049034A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
attributes
entropy
interestingness
sorted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2015/051462
Other languages
English (en)
French (fr)
Inventor
Uri Sheffer
Adam Craig POCOCK
Brook Stevens
Mashhood Ishaque
Vladimir Zelevinsky
Tristan R. SPAULDING
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Priority to EP15843760.8A (patent EP3198489A4)
Priority to CN201580047313.5A (patent CN106605222B)
Priority to JP2017515979A (patent JP6637968B2)
Publication of WO2016049034A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor

Definitions

  • Fig. 2 is a flow diagram of the functionality of the guided data exploration module of Fig. 1 and other elements in accordance with one embodiment of the present invention.
  • System 10 can be implemented as a distributed system. Further, the functionality disclosed herein can be implemented on separate servers or devices that may be coupled together over a network. Further, one or more components of system 10 may not be included. For example, when providing the functionality of a user client, system 10 may be a smartphone that includes a processor, memory and a display, but may not include one or more of the other components shown in Fig. 1.
  • System 10 includes a bus 12 or other communication mechanism for communicating information, and a processor 22 coupled to bus 12 for processing information.
  • Processor 22 may be any type of general or specific purpose processor.
  • System 10 further includes a memory 14 for storing information and instructions to be executed by processor 22.
  • Memory 14 can comprise any combination of random access memory ("RAM"), read only memory ("ROM"), static storage such as a magnetic or optical disk, or any other type of computer readable media.
  • System 10 further includes a communication device 20, such as a network interface card, to provide access to a network. Therefore, a user may interface with system 10 directly, or remotely through a network, or any other method.
  • Processor 22 is further coupled via bus 12 to a display 24, such as a Liquid Crystal Display ("LCD").
  • A keyboard 26 and a cursor control device 28, such as a computer mouse, are further coupled to bus 12 to enable a user to interface with system 10.
  • Unstructured or partially structured data is stored in database 17 of Fig. 1.
  • The data is stored in Apache Hive, which is a data warehouse infrastructure built on top of Hadoop for providing data
  • The data from 204 is indexed into a server and published to a user interface.
  • The data is indexed at 206 as an Endeca index in an "MDEX" engine from Oracle Corp.
  • the entropy value can be further diminished, since the uncertainty inherent in this variable has been reduced. For example, tossing an unbiased coin yields an equal 0.5 probability of a heads or tails outcome. Since the uncertainty is at its maximum, the entropy takes its highest value (i.e., 1 bit). If, however, the outcome records whether a woman is pregnant, and it is known that pregnant women account for 5% of the female population, the entropy drops to 0.2864 bits.
  • Certain embodiments can apply different mappings of entropy to interestingness based on each attribute type. For example, geocodes can be considered always interesting, no matter the distribution of their values.
  • Some embodiments allow users to dynamically modify the lists of the attributes that have been sorted according to their interestingness. The possibilities include user interface elements such as "remove" and "like" buttons, to
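
The entropy figures quoted in the excerpts above (1 bit for a fair coin, 0.2864 bits for a 5% outcome) can be reproduced with a short Python sketch, which also illustrates sorting attributes by an entropy-derived score. This is only an illustration under stated assumptions: the function names, the `records` layout (a list of dictionaries), and in particular the mapping from entropy to a score (entropy divided by its maximum for the observed number of distinct values, with an `always_interesting` override standing in for the geocode case) are hypothetical and are not taken from the patent text, which does not give its mapping in these excerpts.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def rank_attributes(records, always_interesting=()):
    """Return (attribute, score) pairs sorted by descending score.

    Hypothetical mapping (an assumption, not the patent's): the score is
    the entropy divided by its maximum for the observed number of distinct
    values; attributes named in `always_interesting` are scored 1.0.
    """
    names = sorted({name for record in records for name in record})
    scored = []
    for name in names:
        values = [record[name] for record in records if name in record]
        distinct = len(set(values))
        if name in always_interesting:
            score = 1.0
        elif distinct < 2:
            score = 0.0  # a constant attribute has zero entropy
        else:
            score = shannon_entropy(values) / math.log2(distinct)
        scored.append((name, score))
    # Highest score first; ties are broken alphabetically by attribute name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(binary_entropy(0.5))             # fair coin
    print(round(binary_entropy(0.05), 4))  # 5% outcome
    records = [
        {"coin": "H" if i % 2 == 0 else "T", "pregnant": i == 0, "geocode": "x"}
        for i in range(20)
    ]
    for name, score in rank_attributes(records, always_interesting={"geocode"}):
        print(name, round(score, 4))
```

Dividing by the maximum entropy keeps scores comparable between attributes with different numbers of distinct values; a different mapping could be substituted in `rank_attributes` without changing the entropy helpers.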

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/US2015/051462 2014-09-24 2015-09-22 Guided data exploration Ceased WO2016049034A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP15843760.8A EP3198489A4 (en) 2014-09-24 2015-09-22 Guided data exploration
CN201580047313.5A CN106605222B (zh) 2014-09-24 2015-09-22 有指导的数据探索
JP2017515979A JP6637968B2 (ja) 2014-09-24 2015-09-22 ガイド付きデータ探索

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201462054517P 2014-09-24 2014-09-24
US62/054,517 2014-09-24
US14/678,218 US10387494B2 (en) 2014-09-24 2015-04-03 Guided data exploration
US14/678,218 2015-04-03

Publications (1)

Publication Number Publication Date
WO2016049034A1 (en) 2016-03-31

Family

ID=55525958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/051462 Ceased WO2016049034A1 (en) 2014-09-24 2015-09-22 Guided data exploration

Country Status (5)

Country Link
US (2) US10387494B2 (en)
EP (1) EP3198489A4 (en)
JP (2) JP6637968B2 (en)
CN (1) CN106605222B (en)
WO (1) WO2016049034A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014228991A (ja) * 2013-05-21 2014-12-08 ソニー株式会社 情報処理装置および方法、並びにプログラム
US11093640B2 (en) 2018-04-12 2021-08-17 International Business Machines Corporation Augmenting datasets with selected de-identified data records
US10770171B2 (en) 2018-04-12 2020-09-08 International Business Machines Corporation Augmenting datasets using de-identified data and selected authorized records
CN110007989A (zh) * 2018-12-13 2019-07-12 国网信通亿力科技有限责任公司 数据可视化平台系统
CN110362303B (zh) * 2019-07-15 2020-08-25 深圳市宇数科技有限公司 数据探索方法和系统
US11893038B2 (en) 2021-10-21 2024-02-06 Treasure Data, Inc. Data type based visual profiling of large-scale database tables

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070094060A1 (en) * 2005-10-25 2007-04-26 Angoss Software Corporation Strategy trees for data mining
US20090112904A1 (en) * 2006-10-31 2009-04-30 Business Objects, S.A. Apparatus and Method for Categorical Filtering of Data
US20110302226A1 (en) * 2010-06-04 2011-12-08 Yale University Data loading systems and methods
US20130080373A1 (en) * 2011-09-22 2013-03-28 Bio-Rad Laboratories, Inc. Systems and methods for biochemical data analysis
US20140218383A1 (en) * 2013-02-07 2014-08-07 Oracle International Corporation Visual data analysis for large data sets

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6012053A (en) 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US6035294A (en) 1998-08-03 2000-03-07 Big Fat Fish, Inc. Wide access databases and database systems
WO2000008539A1 (en) 1998-08-03 2000-02-17 Fish Robert D Self-evolving database and method of using same
US20020138492A1 (en) * 2001-03-07 2002-09-26 David Kil Data mining application with improved data mining algorithm selection
US7383257B2 (en) * 2003-05-30 2008-06-03 International Business Machines Corporation Text explanation for on-line analytic processing events
US7587685B2 (en) 2004-02-17 2009-09-08 Wallace James H Data exploration system
JP2005327172A (ja) 2004-05-17 2005-11-24 Canon Inc オブジェクト検索装置(検索式の再構成)
US7912875B2 (en) 2006-10-31 2011-03-22 Business Objects Software Ltd. Apparatus and method for filtering data using nested panels
US7873220B2 (en) 2007-01-03 2011-01-18 Collins Dennis G Algorithm to measure symmetry and positional entropy of a data set
US8935249B2 (en) * 2007-06-26 2015-01-13 Oracle Otc Subsidiary Llc Visualization of concepts within a collection of information
US8832140B2 (en) 2007-06-26 2014-09-09 Oracle Otc Subsidiary Llc System and method for measuring the quality of document sets
US8417715B1 (en) 2007-12-19 2013-04-09 Tilmann Bruckhaus Platform independent plug-in methods and systems for data mining and analytics
US8396870B2 (en) * 2009-06-25 2013-03-12 University Of Tennessee Research Foundation Method and apparatus for predicting object properties and events using similarity-based information retrieval and modeling
US9183203B1 (en) 2009-07-01 2015-11-10 Quantifind, Inc. Generalized data mining and analytics apparatuses, methods and systems
US20110055246A1 (en) * 2009-09-01 2011-03-03 Yann Le Biannic Navigation and visualization of relational database
US8336539B2 (en) * 2010-08-03 2012-12-25 Sunpower Corporation Opposing row linear concentrator architecture
US9299173B2 (en) 2011-06-07 2016-03-29 International Business Machines Corporation Automatic selection of different visualizations for the organization of multivariate data
JP2013021496A (ja) * 2011-07-11 2013-01-31 Fujitsu Ltd 移動局、及び送信制御方法
JP2013037515A (ja) 2011-08-08 2013-02-21 Sony Corp 情報処理装置、情報処理方法、プログラム、及び情報処理システム
WO2013096887A1 (en) 2011-12-23 2013-06-27 Amiato, Inc. Scalable analysis platform for semi-structured data
US9201934B2 (en) 2012-10-02 2015-12-01 Oracle International Corporation Interactive data mining
US10395215B2 (en) 2012-10-19 2019-08-27 International Business Machines Corporation Interpretation of statistical results
US9934299B2 (en) * 2012-10-22 2018-04-03 Workday, Inc. Systems and methods for interest-driven data visualization systems utilizing visualization image data and trellised visualizations
US9240016B2 (en) 2013-03-13 2016-01-19 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing predictive query interface as a cloud service
US20140344235A1 (en) * 2013-05-17 2014-11-20 Emmanuel Zarpas Determination of data modification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3198489A4 *

Also Published As

Publication number Publication date
JP6637968B2 (ja) 2020-01-29
CN106605222B (zh) 2020-09-04
CN106605222A (zh) 2017-04-26
US10552484B2 (en) 2020-02-04
JP6862531B2 (ja) 2021-04-21
US20160085880A1 (en) 2016-03-24
EP3198489A1 (en) 2017-08-02
EP3198489A4 (en) 2018-02-28
JP2020074105A (ja) 2020-05-14
US20160085851A1 (en) 2016-03-24
JP2017532675A (ja) 2017-11-02
US10387494B2 (en) 2019-08-20

Similar Documents

Publication Publication Date Title
JP6862531B2 (ja) ガイド付きデータ探索
US9437022B2 (en) Time-based visualization of the number of events having various values for a field
US11042569B2 (en) System and method for load, aggregate and batch calculation in one scan in a multidimensional database environment
US8682885B2 (en) Method and system for combining data objects
US20150356085A1 (en) Guided Predictive Analysis with the Use of Templates
Santos et al. Modelling and implementing big data warehouses for decision support
US20150278315A1 (en) Data fitting selected visualization type
US11803865B2 (en) Graph based processing of multidimensional hierarchical data
EP3217296A1 (en) Data query method and apparatus
Vijayarani et al. Research in big data–an overview
US20190050672A1 (en) INCREMENTAL AUTOMATIC UPDATE OF RANKED NEIGHBOR LISTS BASED ON k-th NEAREST NEIGHBORS
CN111913860A (zh) 一种操作行为分析方法及装置
Kajáti et al. Advanced analysis of manufacturing data in Excel and its Add-ins
CN117951186A (zh) 见解数据生成的方法和装置
US20170316071A1 (en) Visually Interactive Identification of a Cohort of Data Objects Similar to a Query Based on Domain Knowledge
CN114490833A (zh) 一种图计算结果可视化方法和系统
US20160162814A1 (en) Comparative peer analysis for business intelligence
CN110874366A (zh) 数据处理、查询方法和装置
JP7418781B2 (ja) 企業類似度算出サーバ及び企業類似度算出方法
US10628452B2 (en) Providing multidimensional attribute value information
US10803053B2 (en) Automatic selection of neighbor lists to be incrementally updated
CN109635074B (zh) 一种基于舆情信息的实体关系分析方法及终端设备
Azam et al. Three Steps Strategy to Search for Optimum Classification Trees
Borodin et al. Analysis of multidimensional data with high dimensionality: data access problems and possible solutions
CN109086309A (zh) 一种指标维度关系定义方法、服务器及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15843760

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015843760

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015843760

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017515979

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE