WO2022049680A1 - Coupling table specification system, coupling table search device, method, and program - Google Patents
- Publication number
- WO2022049680A1 (international application PCT/JP2020/033308)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- join
- index
- search
- column
- records
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Definitions
- for the records "ramune" and "ramune 250 ml" illustrated in FIG. 2, the Jaccard similarity is 0.5, the edit distance is 0.37, and the similarity after Word2vec conversion is 0.8.
- the maximum of these similarities is 0.8, which is larger than the threshold T_r, so the two records are determined to be joinable.
- the join column candidate extraction unit 112 extracts, from each external table in the external table group, any column containing records that can serve as a join key with another table, as a join column candidate.
- the join column candidate extraction unit 112, for example, estimates the types of all columns in the external table group and determines whether a column of the estimated type can be joined with columns of other tables.
- the column type here may be a type indicating the attribute of the values, such as "character string type" or "numeric type", or it may indicate the concept that the column represents.
- N = 10
- M = 1
- the number of results when the join index A is used for the record of a certain target column is 2, and the number of results when the join index B is used is 5.
- FIG. 9 is an explanatory diagram showing an example of a process for determining whether to exclude a feature vector. The data in the "sales" column of the base table BT3 and the data in the "attribute 1" and "attribute 2" columns of the external table FT3 illustrated in FIG. 9 are each assumed to be standardized. That is, each column of data enclosed by a broken line in FIG. 9 corresponds to one feature vector.
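The record-level join test in the first two definitions (several string similarities computed for one pair of records, with the maximum compared against T_r) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: Jaccard similarity is taken over whitespace-separated tokens, the edit-distance score is a normalized Levenshtein similarity, the Word2vec score is passed in precomputed, and the threshold 0.7 is hypothetical. Because the tokenization and normalization here almost certainly differ from those behind the quoted figures, the sketch does not reproduce 0.5 and 0.37.

```python
from typing import Iterable


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens (assumed tokenization)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized to a similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def can_join(a: str, b: str, t_r: float, extra_scores: Iterable[float] = ()) -> bool:
    """True if the largest similarity is strictly larger than the threshold t_r.

    extra_scores holds similarities computed elsewhere, e.g. a Word2vec cosine
    similarity (hypothetical input; no embedding model is loaded here).
    """
    scores = [jaccard(a, b), edit_similarity(a, b), *extra_scores]
    return max(scores) > t_r


T_R = 0.7  # hypothetical threshold; the source does not give the value of T_r

print(round(jaccard("ramune", "ramune 250 ml"), 3))          # 0.333
print(round(edit_similarity("ramune", "ramune 250 ml"), 3))  # 0.462
print(can_join("ramune", "ramune 250 ml", T_R, extra_scores=[0.8]))  # True
print(can_join("ramune", "ramune 250 ml", T_R))                      # False
```

The strict `>` follows the wording "larger than the threshold". Taking the maximum means a pair is accepted when any single measure finds it similar enough, which is why the Word2vec score alone decides this example.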
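The candidate-extraction step described for unit 112 (estimate each column's type, then keep columns whose type could be joined with a column of another table) might be sketched like this. The type estimator is a deliberately crude assumption ("numeric" if every non-empty value parses as a number, otherwise "string"), and "joinable" is simplified to "some other table has a column of the same estimated type". The patent also allows types that represent concepts, which this sketch does not model. All table and column names are hypothetical.

```python
from typing import Dict, List, Tuple

# A table is modeled as column name -> list of cell values (hypothetical layout).
Table = Dict[str, List[str]]


def estimate_type(values: List[str]) -> str:
    """Crude type estimate: 'numeric' if all non-empty cells parse as floats."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "string"
    return "numeric"


def extract_join_column_candidates(tables: Dict[str, Table]) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose estimated type also occurs in another table."""
    types = {
        (t, c): estimate_type(values)
        for t, columns in tables.items()
        for c, values in columns.items()
    }
    candidates = []
    for (table, column), col_type in types.items():
        if col_type == "empty":
            continue  # no values, so the column cannot act as a join key
        if any(other_type == col_type
               for (other_table, _), other_type in types.items()
               if other_table != table):
            candidates.append((table, column))
    return candidates


external_tables = {
    "FT1": {"product": ["ramune", "cola"], "price": ["100", "150"]},
    "FT2": {"item": ["ramune 250 ml"], "note": [""]},
}
print(extract_join_column_candidates(external_tables))
# [('FT1', 'product'), ('FT2', 'item')]
```

Here "price" is dropped because no other table has a numeric column, and "note" is dropped because it is empty.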
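The FIG. 9 definition treats each standardized column (the "sales" column of BT3 and the "attribute 1" and "attribute 2" columns of FT3) as one feature vector. Assuming "standardized" means the usual z-score (subtract the column mean, divide by the population standard deviation), building those vectors can be sketched as below. The values are invented for illustration, and the criterion for excluding a feature vector is not given here, so the sketch stops at building the vectors.

```python
from statistics import mean, pstdev
from typing import Dict, List


def standardize(values: List[float]) -> List[float]:
    """z-score standardization; a constant column maps to all zeros."""
    mu = mean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def to_feature_vectors(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize every column; each result is treated as one feature vector."""
    return {name: standardize(values) for name, values in columns.items()}


# Hypothetical values; the actual contents of BT3 and FT3 are not given in the text.
columns = {
    "BT3.sales": [120.0, 80.0, 100.0],
    "FT3.attribute 1": [3.0, 1.0, 2.0],
    "FT3.attribute 2": [5.0, 5.0, 5.0],
}
for name, vector in to_feature_vectors(columns).items():
    print(name, [round(x, 3) for x in vector])
```

Mapping a constant column to zeros is a design choice of this sketch that avoids division by zero.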
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Operations Research (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/024,163 US12222916B2 (en) | 2020-09-02 | 2020-09-02 | Coupling table specification system, coupling table search device, method, and program |
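| PCT/JP2020/033308 WO2022049680A1 (ja) | 2020-09-02 | 2020-09-02 | Coupling table specification system, coupling table search device, method, and program |
| JP2022546785A JP7424501B2 (ja) | 2020-09-02 | 2020-09-02 | Coupling table specification system, coupling table search device, method, and program |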
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/033308 WO2022049680A1 (ja) | 2020-09-02 | 2020-09-02 | Coupling table specification system, coupling table search device, method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022049680A1 (ja) | 2022-03-10 |
Family ID: 80491870
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/033308 WO2022049680A1 (ja), Ceased | 2020-09-02 | 2020-09-02 | Coupling table specification system, coupling table search device, method, and program |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12222916B2 |
| JP (1) | JP7424501B2 |
| WO (1) | WO2022049680A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12411840B2 (en) * | 2023-09-26 | 2025-09-09 | International Business Machines Corporation | Embedding based heterogenous dataset evaluation |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
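| JP2001155028A (ja) * | 1999-11-29 | 2001-06-08 | Hitachi Ltd | Aggregate operation processing method in a relational database, apparatus therefor, and computer-readable recording medium storing an aggregate operation processing program |
| JP2010176327A (ja) * | 2009-01-28 | 2010-08-12 | Sony Corp | Learning device, learning method, information processing device, data selection method, data storage method, data conversion method, and program |
| JP2017188137A (ja) * | 2016-03-31 | 2017-10-12 | スマートインサイト株式会社 (Smart Insight) | Method, program, and system for automatically discovering relationships between fields in a mixed environment of heterogeneous data sources |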
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7343367B2 (en) * | 2005-05-12 | 2008-03-11 | International Business Machines Corporation | Optimizing a database query that returns a predetermined number of rows using a generated optimized access plan |
| US9342553B1 (en) * | 2012-05-13 | 2016-05-17 | Google Inc. | Identifying distinct combinations of values for entities based on information in an index |
| US20160154851A1 (en) * | 2013-04-24 | 2016-06-02 | Hitachi Ltd. | Computing device, storage medium, and data search method |
| US10120851B2 (en) * | 2016-06-30 | 2018-11-06 | Microsoft Technology Licensing, Llc | Automatic semantic data enrichment in a spreadsheet |
| WO2018025706A1 (ja) | 2016-08-05 | 2018-02-08 | NEC Corporation | Table meaning estimation system, method, and program |
| US11093494B2 (en) | 2016-12-06 | 2021-08-17 | Microsoft Technology Licensing, Llc | Joining tables by leveraging transformations |
| CN111259004B (zh) * | 2020-01-08 | 2023-04-14 | Tencent Technology (Shenzhen) Co., Ltd. | Method for indexing data in a storage engine and related apparatus |
-
2020
- 2020-09-02 US US18/024,163 patent/US12222916B2/en active Active
- 2020-09-02 WO PCT/JP2020/033308 patent/WO2022049680A1/ja not_active Ceased
- 2020-09-02 JP JP2022546785A patent/JP7424501B2/ja active Active
Also Published As
| Publication number | Publication date |
|---|---|
| JP7424501B2 (ja) | 2024-01-30 |
| US20230394016A1 (en) | 2023-12-07 |
| JPWO2022049680A1 | 2022-03-10 |
| US12222916B2 (en) | 2025-02-11 |
Similar Documents
| Publication | Title |
|---|---|
| JP4011906B2 (ja) | Information retrieval method for profile information, program, recording medium, and apparatus |
| CN104573130B (zh) | Entity resolution method and apparatus based on crowd computing |
| RU2010125681A (ru) | Methods and systems for implementing approximate string comparison in a database |
| CN103838857B (zh) | Semantics-based automatic service composition system and method |
| Liu et al. | ProtDet-CCH: protein remote homology detection by combining long short-term memory and ranking methods |
| KR102438923B1 (ko) | Deep learning-based Bitcoin block data prediction system considering time-series distribution characteristics |
| KR102358357B1 (ko) | Market size estimation apparatus and operating method thereof |
| JP7103496B2 (ja) | Relevance score calculation system, method, and program |
| Zhang et al. | Unsupervised entity resolution with blocking and graph algorithms |
| CN113505117A (zh) | Data quality evaluation method, apparatus, device, and medium based on data indicators |
| EP3477505B1 (en) | Fingerprint clustering for content-based audio recognition |
| JP7485057B2 (ja) | Correlation index construction device, correlation table search device, method, and program |
| Qinl et al. | Synthesizing privacy preserving entity resolution datasets |
| CN117992575A (zh) | Text matching method, apparatus, computer device, storage medium, and program product |
| US10984005B2 (en) | Database search apparatus and method of searching databases |
| CN115422429B (zh) | Method, apparatus, computer device, and storage medium for determining associated words |
| JP7424501B2 (ja) | Coupling table specification system, coupling table search device, method, and program |
| JP7444269B2 (ja) | Table integration system, method, and program |
| JP2007219929A (ja) | Sensibility evaluation system and method |
| JPWO2015125209A1 (ja) | Information structuring system and information structuring method |
| Fageeri et al. | A semi-apriori algorithm for discovering the frequent itemsets |
| Joseph et al. | Top-k competitor trust mining and customer behavior investigation using data mining technique [J] |
| TWI484359B (zh) | Article information providing method and system |
| CN116228484B (zh) | Course combination method and apparatus based on a quantum clustering algorithm |
| Ryu et al. | Integrating feature analysis and background knowledge to recommend similarity functions |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20952420; country of ref document: EP; kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022546785; country of ref document: JP; kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 18024163; country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20952420; country of ref document: EP; kind code of ref document: A1 |