WO2022049680A1 - Coupling table specification system, coupling table search device, method, and program

Publication number
WO2022049680A1
Authority
WO
WIPO (PCT)
Prior art keywords
join
index
search
column
records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/033308
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
于洋 董
昌史 小山田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-09-02
Filing date
2020-09-02
Publication date
2022-03-10
Application filed by NEC Corp
Priority to US18/024,163 (US12222916B2)
Priority to PCT/JP2020/033308 (WO2022049680A1)
Priority to JP2022546785A (JP7424501B2)
Publication of WO2022049680A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Definitions

  • the Jaccard similarity is 0.5
  • the edit distance is 0.37
  • the similarity after Word2vec conversion is 0.8 for "ramune" and "ramune 250 ml" illustrated in FIG. 2.
  • the maximum similarity is calculated to be 0.8, which is larger than the threshold value Tr, so it is determined that the two records can be joined.
  • the join column candidate extraction unit 112 extracts a column including a record that can be a join key with another table from each outer table included in the outer table group as a join column candidate.
  • the join column candidate extraction unit 112 estimates, for example, the types of all the columns in the outer table group, and determines whether or not the column of the estimated type can be joined with the columns of other tables.
  • the column type here may be a type such as a "character string type" or a "numeric type" indicating a character attribute, or may indicate a concept represented by a column.
  • N = 10
  • M = 1
  • the number of results when the join index A is used for the record of a certain target column is 2, and the number of results when the join index B is used is 5.
  • FIG. 9 is an explanatory diagram showing an example of a process for determining whether or not to exclude the feature vector. It is assumed that the data in the "sales" column of the base table BT3 illustrated in FIG. 9 and the data in the "attribute 1" column and the "attribute 2" column of the external table FT3 are each standardized. That is, each group of values surrounded by a broken line in FIG. 9 corresponds to a feature vector.
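
The join decision described in the fragments above (compute several similarities for a pair of records, take the maximum, and compare it with the threshold Tr) can be sketched roughly as follows. This is an illustrative sketch only, not the method claimed in the application: the function names, the whitespace tokenization, the edit-distance normalization, and the value of TR are all assumptions. Because the tokenization and normalization used in the application are not given here, the two computed similarities will not necessarily reproduce the quoted values 0.5 and 0.37; the Word2vec score is taken as given rather than computed.

```python
from typing import Sequence


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens (assumed tokenization)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def edit_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / longer length (one common normalization, assumed)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def can_join(scores: Sequence[float], threshold: float) -> bool:
    """Two records are treated as joinable when the maximum similarity exceeds the threshold."""
    return max(scores) > threshold


# Scores quoted in the text for "ramune" and "ramune 250 ml":
# Jaccard, edit distance, and similarity after Word2vec conversion.
quoted_scores = [0.5, 0.37, 0.8]
TR = 0.7  # hypothetical threshold; the text gives no numeric value for Tr
joinable = can_join(quoted_scores, TR)
```

With the quoted scores the maximum is 0.8, so any threshold below 0.8 yields a positive join decision, which matches the conclusion stated in the text. Taking the maximum rather than an average means that a single strongly matching measure is enough to accept a pair.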

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/024,163 US12222916B2 (en) 2020-09-02 2020-09-02 Coupling table specification system, coupling table search device, method, and program
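PCT/JP2020/033308 WO2022049680A1 (ja) 2020-09-02 2020-09-02 Coupling table specification system, coupling table search device, method, and program
JP2022546785A JP7424501B2 (ja) 2020-09-02 2020-09-02 Coupling table specification system, coupling table search device, method, and program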

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/033308 WO2022049680A1 (ja) 2020-09-02 2020-09-02 Coupling table specification system, coupling table search device, method, and program

Publications (1)

Publication Number Publication Date
WO2022049680A1 (ja) 2022-03-10

Family

ID=80491870

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/033308 Ceased WO2022049680A1 (ja) Coupling table specification system, coupling table search device, method, and program 2020-09-02 2020-09-02

Country Status (3)

Country Link
US (1) US12222916B2
JP (1) JP7424501B2
WO (1) WO2022049680A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12411840B2 (en) * 2023-09-26 2025-09-09 International Business Machines Corporation Embedding based heterogenous dataset evaluation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
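JP2001155028A (ja) * 1999-11-29 2001-06-08 Hitachi Ltd Aggregation operation processing method in a relational database, apparatus therefor, and computer-readable recording medium storing an aggregation operation processing program
JP2010176327A (ja) * 2009-01-28 2010-08-12 Sony Corp Learning device, learning method, information processing device, data selection method, data accumulation method, data conversion method, and program
JP2017188137A (ja) * 2016-03-31 2017-10-12 スマートインサイト株式会社 Method, program, and system for automatic discovery of relationships between fields in an environment of mixed heterogeneous data sources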

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7343367B2 (en) * 2005-05-12 2008-03-11 International Business Machines Corporation Optimizing a database query that returns a predetermined number of rows using a generated optimized access plan
US9342553B1 (en) * 2012-05-13 2016-05-17 Google Inc. Identifying distinct combinations of values for entities based on information in an index
US20160154851A1 (en) * 2013-04-24 2016-06-02 Hitachi Ltd. Computing device, storage medium, and data search method
US10120851B2 (en) * 2016-06-30 2018-11-06 Microsoft Technology Licensing, Llc Automatic semantic data enrichment in a spreadsheet
WO2018025706A1 (ja) 2016-08-05 2018-02-08 NEC Corp Table meaning estimation system, method, and program
US11093494B2 (en) 2016-12-06 2021-08-17 Microsoft Technology Licensing, Llc Joining tables by leveraging transformations
CN111259004B (zh) * 2020-01-08 2023-04-14 Tencent Technology (Shenzhen) Co Ltd Method for data indexing in a storage engine and related apparatus

Also Published As

Publication number Publication date
JP7424501B2 (ja) 2024-01-30
US20230394016A1 (en) 2023-12-07
JPWO2022049680A1 2022-03-10
US12222916B2 (en) 2025-02-11

Similar Documents

Publication Publication Date Title
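JP4011906B2 (ja) Information retrieval method for profile information, program, recording medium, and apparatus
CN104573130B (zh) Entity resolution method and apparatus based on crowd computing
RU2010125681A (ru) Methods and systems for implementing approximate string matching in a database
CN103838857B (zh) Semantics-based automatic service composition system and method
Liu et al. ProtDet-CCH: protein remote homology detection by combining long short-term memory and ranking methods
KR102438923B1 (ko) Deep-learning-based Bitcoin block data prediction system considering time-series distribution characteristics
KR102358357B1 (ko) Market size estimation apparatus and operating method thereof
JP7103496B2 (ja) Relevance score calculation system, method, and program
Zhang et al. Unsupervised entity resolution with blocking and graph algorithms
CN113505117A (zh) Data quality evaluation method, apparatus, device, and medium based on data indicators
EP3477505B1 (en) Fingerprint clustering for content-based audio recogntion
JP7485057B2 (ja) Correlation index construction device, correlation table search device, method, and program
Qinl et al. Synthesizing privacy preserving entity resolution datasets
CN117992575A (zh) Text matching method, apparatus, computer device, storage medium, and program product
US10984005B2 (en) Database search apparatus and method of searching databases
CN115422429B (zh) Method and apparatus for determining associated words, computer device, and storage medium
JP7424501B2 (ja) Coupling table specification system, coupling table search device, method, and program
JP7444269B2 (ja) Table integration system, method, and program
JP2007219929A (ja) Sensibility evaluation system and method
JPWO2015125209A1 (ja) Information structuring system and information structuring method
Fageeri et al. A semi-apriori algorithm for discovering the frequent itemsets
Joseph et al. Top-k competitor trust mining and customer behavior investigation using data mining technique [J]
TWI484359B (zh) Method and system for providing article information
CN116228484B (zh) Course combination method and apparatus based on quantum clustering algorithm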
Ryu et al. Integrating feature analysis and background knowledge to recommend similarity functions

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20952420

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022546785

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 18024163

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20952420

Country of ref document: EP

Kind code of ref document: A1