WO2016049034A1 - Guided data exploration - Google Patents
- Publication number
- WO2016049034A1 (PCT/US2015/051462)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- attributes
- entropy
- interestingness
- sorted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/904—Browsing; Visualisation therefor
Definitions
- Fig. 2 is a flow diagram of the functionality of the guided data exploration module of Fig. 1 and other elements in accordance with one embodiment of the present invention.
- System 10 can be implemented as a distributed system. Further, the functionality disclosed herein can be implemented on separate servers or devices that may be coupled together over a network. Further, one or more components of system 10 may not be included. For example, for functionality of a user client, system 10 may be a smartphone that includes a processor, memory and a display, but may not include one or more of the other components shown in Fig. 1.
- System 10 includes a bus 12 or other communication mechanism for communicating information, and a processor 22 coupled to bus 12 for processing information.
- Processor 22 may be any type of general or specific purpose processor.
- System 10 further includes a memory 14 for storing information and instructions to be executed by processor 22.
- Memory 14 can be comprised of any combination of random access memory (“RAM”), read only memory (“ROM”), static storage such as a magnetic or optical disk, or any other type of computer readable media.
- System 10 further includes a communication device 20, such as a network interface card, to provide access to a network. Therefore, a user may interface with system 10 directly, or remotely through a network, or any other method.
- Processor 22 is further coupled via bus 12 to a display 24, such as a Liquid Crystal Display ("LCD").
- A keyboard 26 and a cursor control device 28, such as a computer mouse, are further coupled to bus 12 to enable a user to interface with system 10.
- Unstructured or partially structured data is stored in database 17 of Fig. 1.
- The data is stored in Apache Hive, which is a data warehouse infrastructure built on top of Hadoop for providing data
- The data from 204 is indexed into a server and published to a user interface.
- The data is indexed at 206 as an Endeca index in an "MDEX" engine from Oracle Corp.
- The entropy value can be further diminished, since the uncertainty inherent in this variable has been reduced. For example, tossing an unbiased coin yields an equal 0.5 probability of a heads or tails outcome. Since the uncertainty is at its maximum, the entropy takes its highest value (i.e., 1 bit). If, however, the outcome records whether women are pregnant or not, and it is known that pregnant women account for 5% of the female population, the entropy drops to 0.2864 bits.
- Certain embodiments can apply different mappings of entropy to interestingness based on each attribute type. For example, geocodes can be considered always interesting, no matter the distribution of their values.
- Some embodiments allow users to dynamically modify the lists of the attributes that have been sorted according to their interestingness. The possibilities include user interface elements such as "remove" and "like" buttons, to
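
The entropy figures quoted in the definitions above can be checked with a short calculation. The following Python snippet is a minimal sketch, not the patented implementation: the function names, the sample attributes, and the choice to rank higher entropy as more interesting are all assumptions made for illustration. Only the two entropy values (1 bit and 0.2864 bits) and the "geocodes are always interesting" rule come from the text.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Shannon entropy, in bits, of a two-outcome variable with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def interestingness(attr_type, values):
    """Hypothetical mapping from entropy to interestingness.

    Assumption: higher entropy ranks higher. The only rule taken from the
    text is that geocodes are always interesting regardless of distribution.
    """
    if attr_type == "geocode":
        return math.inf
    return shannon_entropy(values)


def rank_attributes(attributes):
    """Return attribute names sorted from most to least interesting."""
    return sorted(
        attributes,
        key=lambda name: interestingness(*attributes[name]),
        reverse=True,
    )


# Illustrative data only: attribute name -> (attribute type, observed values).
attributes = {
    "coin": ("categorical", ["heads", "tails"] * 50),
    "pregnant": ("categorical", [True] * 5 + [False] * 95),
    "location": ("geocode", ["40.7,-74.0"] * 100),
    "constant": ("categorical", ["x"] * 100),
}

coin_bits = binary_entropy(0.5)
pregnant_bits = binary_entropy(0.05)
ranking = rank_attributes(attributes)
```

Here `binary_entropy(0.5)` reproduces the 1-bit maximum for the unbiased coin, and `binary_entropy(0.05)` rounds to the 0.2864 bits given for the 5% case. A "remove" action of the kind mentioned above could be modelled by deleting a key from `attributes` before calling `rank_attributes` again.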
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Human Computer Interaction (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP15843760.8A EP3198489A4 (en) | 2014-09-24 | 2015-09-22 | Guided data exploration |
| CN201580047313.5A CN106605222B (zh) | 2014-09-24 | 2015-09-22 | Guided data exploration |
| JP2017515979A JP6637968B2 (ja) | 2014-09-24 | 2015-09-22 | Guided data exploration |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201462054517P (US62/054,517) | 2014-09-24 | 2014-09-24 | |
| US14/678,218 US10387494B2 (en) | 2014-09-24 | 2015-04-03 | Guided data exploration |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016049034A1 (en) | 2016-03-31 |
Family
ID=55525958
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2015/051462 (WO2016049034A1, en; status: Ceased) | 2014-09-24 | 2015-09-22 | Guided data exploration |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US10387494B2 |
| EP (1) | EP3198489A4 |
| JP (2) | JP6637968B2 |
| CN (1) | CN106605222B |
| WO (1) | WO2016049034A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2014228991A (ja) * | 2013-05-21 | 2014-12-08 | ソニー株式会社 | Information processing device and method, and program |
| US11093640B2 (en) | 2018-04-12 | 2021-08-17 | International Business Machines Corporation | Augmenting datasets with selected de-identified data records |
| US10770171B2 (en) | 2018-04-12 | 2020-09-08 | International Business Machines Corporation | Augmenting datasets using de-identified data and selected authorized records |
| CN110007989A (zh) * | 2018-12-13 | 2019-07-12 | 国网信通亿力科技有限责任公司 | Data visualization platform system |
| CN110362303B (zh) * | 2019-07-15 | 2020-08-25 | 深圳市宇数科技有限公司 | Data exploration method and system |
| US11893038B2 (en) | 2021-10-21 | 2024-02-06 | Treasure Data, Inc. | Data type based visual profiling of large-scale database tables |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070094060A1 (en) * | 2005-10-25 | 2007-04-26 | Angoss Software Corporation | Strategy trees for data mining |
| US20090112904A1 (en) * | 2006-10-31 | 2009-04-30 | Business Objects, S.A. | Apparatus and Method for Categorical Filtering of Data |
| US20110302226A1 (en) * | 2010-06-04 | 2011-12-08 | Yale University | Data loading systems and methods |
| US20130080373A1 (en) * | 2011-09-22 | 2013-03-28 | Bio-Rad Laboratories, Inc. | Systems and methods for biochemical data analysis |
| US20140218383A1 (en) * | 2013-02-07 | 2014-08-07 | Oracle International Corporation | Visual data analysis for large data sets |
Family Cites Families (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6012053A (en) | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
| US6035294A (en) | 1998-08-03 | 2000-03-07 | Big Fat Fish, Inc. | Wide access databases and database systems |
| WO2000008539A1 (en) | 1998-08-03 | 2000-02-17 | Fish Robert D | Self-evolving database and method of using same |
| US20020138492A1 (en) * | 2001-03-07 | 2002-09-26 | David Kil | Data mining application with improved data mining algorithm selection |
| US7383257B2 (en) * | 2003-05-30 | 2008-06-03 | International Business Machines Corporation | Text explanation for on-line analytic processing events |
| US7587685B2 (en) | 2004-02-17 | 2009-09-08 | Wallace James H | Data exploration system |
| JP2005327172A (ja) | 2004-05-17 | 2005-11-24 | Canon Inc | オブジェクト検索装置(検索式の再構成) |
| US7912875B2 (en) | 2006-10-31 | 2011-03-22 | Business Objects Software Ltd. | Apparatus and method for filtering data using nested panels |
| US7873220B2 (en) | 2007-01-03 | 2011-01-18 | Collins Dennis G | Algorithm to measure symmetry and positional entropy of a data set |
| US8935249B2 (en) * | 2007-06-26 | 2015-01-13 | Oracle Otc Subsidiary Llc | Visualization of concepts within a collection of information |
| US8832140B2 (en) | 2007-06-26 | 2014-09-09 | Oracle Otc Subsidiary Llc | System and method for measuring the quality of document sets |
| US8417715B1 (en) | 2007-12-19 | 2013-04-09 | Tilmann Bruckhaus | Platform independent plug-in methods and systems for data mining and analytics |
| US8396870B2 (en) * | 2009-06-25 | 2013-03-12 | University Of Tennessee Research Foundation | Method and apparatus for predicting object properties and events using similarity-based information retrieval and modeling |
| US9183203B1 (en) | 2009-07-01 | 2015-11-10 | Quantifind, Inc. | Generalized data mining and analytics apparatuses, methods and systems |
| US20110055246A1 (en) * | 2009-09-01 | 2011-03-03 | Yann Le Biannic | Navigation and visualization of relational database |
| US8336539B2 (en) * | 2010-08-03 | 2012-12-25 | Sunpower Corporation | Opposing row linear concentrator architecture |
| US9299173B2 (en) | 2011-06-07 | 2016-03-29 | International Business Machines Corporation | Automatic selection of different visualizations for the organization of multivariate data |
| JP2013021496A (ja) * | 2011-07-11 | 2013-01-31 | Fujitsu Ltd | 移動局、及び送信制御方法 |
| JP2013037515A (ja) | 2011-08-08 | 2013-02-21 | Sony Corp | 情報処理装置、情報処理方法、プログラム、及び情報処理システム |
| WO2013096887A1 (en) | 2011-12-23 | 2013-06-27 | Amiato, Inc. | Scalable analysis platform for semi-structured data |
| US9201934B2 (en) | 2012-10-02 | 2015-12-01 | Oracle International Corporation | Interactive data mining |
| US10395215B2 (en) | 2012-10-19 | 2019-08-27 | International Business Machines Corporation | Interpretation of statistical results |
| US9934299B2 (en) * | 2012-10-22 | 2018-04-03 | Workday, Inc. | Systems and methods for interest-driven data visualization systems utilizing visualization image data and trellised visualizations |
| US9240016B2 (en) | 2013-03-13 | 2016-01-19 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing predictive query interface as a cloud service |
| US20140344235A1 (en) * | 2013-05-17 | 2014-11-20 | Emmanuel Zarpas | Determination of data modification |
- 2015
  - 2015-04-03: US, application US14/678,218, patent US10387494B2 (en), status Active
  - 2015-05-08: US, application US14/707,283, patent US10552484B2 (en), status Active
  - 2015-09-22: EP, application EP15843760.8A, patent EP3198489A4 (en), status Ceased
  - 2015-09-22: JP, application JP2017515979A, patent JP6637968B2 (ja), status Active
  - 2015-09-22: WO, application PCT/US2015/051462, patent WO2016049034A1 (en), status Ceased
  - 2015-09-22: CN, application CN201580047313.5A, patent CN106605222B (zh), status Active
- 2019
  - 2019-12-23: JP, application JP2019231678A, patent JP6862531B2 (ja), status Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3198489A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP6637968B2 (ja) | 2020-01-29 |
| CN106605222B (zh) | 2020-09-04 |
| CN106605222A (zh) | 2017-04-26 |
| US10552484B2 (en) | 2020-02-04 |
| JP6862531B2 (ja) | 2021-04-21 |
| US20160085880A1 (en) | 2016-03-24 |
| EP3198489A1 (en) | 2017-08-02 |
| EP3198489A4 (en) | 2018-02-28 |
| JP2020074105A (ja) | 2020-05-14 |
| US20160085851A1 (en) | 2016-03-24 |
| JP2017532675A (ja) | 2017-11-02 |
| US10387494B2 (en) | 2019-08-20 |
Similar Documents
| Publication | Title |
|---|---|
| JP6862531B2 (ja) | Guided data exploration |
| US9437022B2 (en) | Time-based visualization of the number of events having various values for a field |
| US11042569B2 (en) | System and method for load, aggregate and batch calculation in one scan in a multidimensional database environment |
| US8682885B2 (en) | Method and system for combining data objects |
| US20150356085A1 (en) | Guided Predictive Analysis with the Use of Templates |
| Santos et al. | Modelling and implementing big data warehouses for decision support |
| US20150278315A1 (en) | Data fitting selected visualization type |
| US11803865B2 (en) | Graph based processing of multidimensional hierarchical data |
| EP3217296A1 (en) | Data query method and apparatus |
| Vijayarani et al. | Research in big data: an overview |
| US20190050672A1 (en) | Incremental automatic update of ranked neighbor lists based on k-th nearest neighbors |
| CN111913860A (zh) | Operation behavior analysis method and apparatus |
| Kajáti et al. | Advanced analysis of manufacturing data in Excel and its Add-ins |
| CN117951186A (zh) | Method and apparatus for generating insight data |
| US20170316071A1 (en) | Visually Interactive Identification of a Cohort of Data Objects Similar to a Query Based on Domain Knowledge |
| CN114490833A (zh) | Method and system for visualizing graph computation results |
| US20160162814A1 (en) | Comparative peer analysis for business intelligence |
| CN110874366A (zh) | Data processing and query method and apparatus |
| JP7418781B2 (ja) | Company similarity calculation server and company similarity calculation method |
| US10628452B2 (en) | Providing multidimensional attribute value information |
| US10803053B2 (en) | Automatic selection of neighbor lists to be incrementally updated |
| CN109635074B (zh) | Entity relationship analysis method and terminal device based on public opinion information |
| Azam et al. | Three Steps Strategy to Search for Optimum Classification Trees |
| Borodin et al. | Analysis of multidimensional data with high dimensionality: data access problems and possible solutions |
| CN109086309A (zh) | Indicator-dimension relationship definition method, server, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15843760; Country of ref document: EP; Kind code of ref document: A1 |
| | REEP | Request for entry into the European phase | Ref document number: 2015843760; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2015843760; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2017515979; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |