CA2786445C - Matching metadata sources using rules for characterizing matches

Info

Publication number
CA2786445C
Authority
CA
Canada
Prior art keywords
terms
rules
match
source
quality metric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA2786445A
Other languages
English (en)
French (fr)
Other versions
CA2786445A1 (en)
Inventor
Andrew Schon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ab Initio Technology LLC
Original Assignee
Ab Initio Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ab Initio Technology LLC
Publication of CA2786445A1
Application granted
Publication of CA2786445C
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Machine Translation (AREA)
CA2786445A 2010-01-13 2011-01-13 Matching metadata sources using rules for characterizing matches Active CA2786445C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US29466310P 2010-01-13 2010-01-13
US61/294,663 2010-01-13
PCT/US2011/021108 WO2011088195A1 (en) 2010-01-13 2011-01-13 Matching metadata sources using rules for characterizing matches

Publications (2)

Publication Number Publication Date
CA2786445A1 (en) 2011-07-21
CA2786445C (en) 2018-02-13

Family

ID=43755121

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2786445A Active CA2786445C (en) 2010-01-13 2011-01-13 Matching metadata sources using rules for characterizing matches

Country Status (8)

Country Link
US (1) US9031895B2 (en)
EP (1) EP2524327B1 (en)
JP (1) JP5768063B2 (en)
KR (1) KR101758669B1 (en)
CN (1) CN102792298B (en)
AU (1) AU2011205296B2 (en)
CA (1) CA2786445C (en)
WO (1) WO2011088195A1 (en)

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2744240C (en) 2008-12-02 2019-06-18 Ab Initio Technology Llc Visualizing relationships between data elements and graphical representations of data element attributes
US8370407B1 (en) * 2011-06-28 2013-02-05 Go Daddy Operating Company, LLC Systems providing a network resource address reputation service
US8966152B2 (en) 2011-08-02 2015-02-24 Cavium, Inc. On-chip memory (OCM) physical bank parallelism
US9081801B2 (en) * 2012-07-25 2015-07-14 Hewlett-Packard Development Company, L.P. Metadata supersets for matching images
US9892026B2 (en) * 2013-02-01 2018-02-13 Ab Initio Technology Llc Data records selection
US9872634B2 (en) * 2013-02-08 2018-01-23 Vital Connect, Inc. Respiratory rate measurement using a combination of respiration signals
US9864755B2 (en) 2013-03-08 2018-01-09 Go Daddy Operating Company, LLC Systems for associating an online file folder with a uniform resource locator
US9178888B2 (en) 2013-06-14 2015-11-03 Go Daddy Operating Company, LLC Method for domain control validation
US9521138B2 (en) 2013-06-14 2016-12-13 Go Daddy Operating Company, LLC System for domain control validation
WO2015095275A1 (en) * 2013-12-18 2015-06-25 Ab Initio Technology Llc Data generation
US9544402B2 (en) * 2013-12-31 2017-01-10 Cavium, Inc. Multi-rule approach to encoding a group of rules
US9275336B2 (en) 2013-12-31 2016-03-01 Cavium, Inc. Method and system for skipping over group(s) of rules based on skip group rule
US9667446B2 (en) 2014-01-08 2017-05-30 Cavium, Inc. Condition code approach for comparing rule and packet data that are provided in portions
US10296192B2 (en) 2014-09-26 2019-05-21 Oracle International Corporation Dynamic visual profiling and visualization of high volume datasets and real-time smart sampling and statistical profiling of extremely large datasets
US10210246B2 (en) 2014-09-26 2019-02-19 Oracle International Corporation Techniques for similarity analysis and data enrichment using knowledge sources
US10891272B2 (en) 2014-09-26 2021-01-12 Oracle International Corporation Declarative language and visualization system for recommended data transformations and repairs
US10684998B2 (en) 2014-11-21 2020-06-16 Microsoft Technology Licensing, Llc Automatic schema mismatch detection
CN104504021A (zh) * 2014-12-11 2015-04-08 北京国双科技有限公司 数据匹配方法及装置
US10891258B2 (en) * 2016-03-22 2021-01-12 Tata Consultancy Services Limited Systems and methods for de-normalized data structure files based generation of intelligence reports
JP6665678B2 (ja) * 2016-05-17 2020-03-13 富士通株式会社 メタデータ登録方法、メタデータ登録プログラムおよびメタデータ登録装置
US11106643B1 (en) * 2017-08-02 2021-08-31 Synchrony Bank System and method for integrating systems to implement data quality processing
US11016936B1 (en) * 2017-09-05 2021-05-25 Palantir Technologies Inc. Validating data for integration
US10936599B2 (en) 2017-09-29 2021-03-02 Oracle International Corporation Adaptive recommendations
US10885056B2 (en) * 2017-09-29 2021-01-05 Oracle International Corporation Data standardization techniques
US11093639B2 (en) * 2018-02-23 2021-08-17 International Business Machines Corporation Coordinated de-identification of a dataset across a network
GB2574905A (en) * 2018-06-18 2019-12-25 Arm Ip Ltd Pipeline template configuration in a data processing system
US11113324B2 (en) * 2018-07-26 2021-09-07 JANZZ Ltd Classifier system and method
US11074230B2 (en) 2018-09-04 2021-07-27 International Business Machines Corporation Data matching accuracy based on context features
US11163750B2 (en) 2018-09-27 2021-11-02 International Business Machines Corporation Dynamic, transparent manipulation of content and/or namespaces within data storage systems
US11755754B2 (en) * 2018-10-19 2023-09-12 Oracle International Corporation Systems and methods for securing data based on discovered relationships
CN110210222B (zh) * 2018-10-24 2023-01-31 腾讯科技(深圳)有限公司 数据处理方法、数据处理装置和计算机可读存储介质
KR102774097B1 (ko) 2019-03-22 2025-03-04 삼성전자주식회사 전자 장치 및 그 제어 방법
US11269905B2 (en) 2019-06-20 2022-03-08 International Business Machines Corporation Interaction between visualizations and other data controls in an information system by matching attributes in different datasets
CN110414579A (zh) * 2019-07-18 2019-11-05 北京信远通科技有限公司 元数据模型合标性检查方法及装置、存储介质
CN111639077B (zh) * 2020-05-15 2024-03-22 杭州数梦工场科技有限公司 数据治理方法、装置、电子设备、存储介质
US11734511B1 (en) * 2020-07-08 2023-08-22 Mineral Earth Sciences Llc Mapping data set(s) to canonical phrases using natural language processing model(s)
CN112181949A (zh) * 2020-10-10 2021-01-05 浪潮云信息技术股份公司 一种在线数据建模的方法及装置
CN112199433A (zh) * 2020-10-28 2021-01-08 云赛智联股份有限公司 一种用于城市级数据中台的数据治理系统
CN112751938B (zh) * 2020-12-30 2023-04-07 上海赋算通云计算科技有限公司 一种基于多集群作业的实时数据同步系统,实现方法以及存储介质
CN113362174B (zh) * 2021-06-17 2023-01-24 富途网络科技(深圳)有限公司 数据对比方法、装置、设备以及存储介质
US12050575B2 (en) 2021-07-26 2024-07-30 International Business Machines Corporation Mapping of heterogeneous data as matching fields
CN113792057A (zh) * 2021-08-02 2021-12-14 浪潮软件股份有限公司 一种业务数据标准字典匹配方法
CN117332284B (zh) * 2023-12-01 2024-02-09 湖南空间折叠互联网科技有限公司 线下医疗数据匹配算法及系统
CN119202755B (zh) * 2024-11-27 2025-03-14 深圳市安仕新能源科技股份有限公司 一种基于mes的规格范围自动匹配方法、系统和介质

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000040085A (ja) * 1998-07-22 2000-02-08 Hitachi Ltd 日本語形態素解析処理の後処理方法および装置
US6826568B2 (en) * 2001-12-20 2004-11-30 Microsoft Corporation Methods and system for model matching
US7730063B2 (en) * 2002-12-10 2010-06-01 Asset Trust, Inc. Personalized medicine service
JP2003271656A (ja) * 2002-03-19 2003-09-26 Fujitsu Ltd 関係付候補生成装置,関係付候補生成方法,関係付システム,関係付候補生成プログラムおよび同プログラムを記録したコンピュータ読取可能な記録媒体
US7542958B1 (en) * 2002-09-13 2009-06-02 Xsb, Inc. Methods for determining the similarity of content and structuring unstructured content from heterogeneous sources
US20040158567A1 (en) 2003-02-12 2004-08-12 International Business Machines Corporation Constraint driven schema association
US20040249682A1 (en) 2003-06-06 2004-12-09 Demarcken Carl G. Filling a query cache for travel planning
US7552110B2 (en) 2003-09-22 2009-06-23 International Business Machines Corporation Method for performing a query in a computer system to retrieve data from a database
JP4511892B2 (ja) * 2004-07-26 2010-07-28 ヤフー株式会社 類義語検索装置、その方法、そのプログラム、および、情報検索装置
US20060075013A1 (en) * 2004-09-03 2006-04-06 Hite Thomas D System and method for relating computing systems
JP4687089B2 (ja) * 2004-12-08 2011-05-25 日本電気株式会社 重複レコード検出システム、および重複レコード検出プログラム
CA2609911A1 (en) * 2005-04-25 2006-11-02 Leon Falic Internet-based duty-free goods electronic commerce system and method
US20070005621A1 (en) * 2005-06-01 2007-01-04 Lesh Kathryn A Information system using healthcare ontology
US7716630B2 (en) 2005-06-27 2010-05-11 Ab Initio Technology Llc Managing parameters for graph-based computations
CN100343852C (zh) * 2005-09-27 2007-10-17 南方医科大学 一种与特定功能相关的基因信息检索系统及用于该系统的检索词数据库的构建方法
US20080021912A1 (en) * 2006-07-24 2008-01-24 The Mitre Corporation Tools and methods for semi-automatic schema matching
US8027948B2 (en) * 2008-01-31 2011-09-27 International Business Machines Corporation Method and system for generating an ontology
WO2009017158A1 (ja) * 2007-08-01 2009-02-05 Nec Corporation 変換プログラム探索システムおよび変換プログラム探索方法
US8775441B2 (en) 2008-01-16 2014-07-08 Ab Initio Technology Llc Managing an archive for approximate string matching
CN101650746B (zh) 2009-09-27 2011-06-29 中国电信股份有限公司 一种对排序结果进行验证的方法和系统

Also Published As

Publication number Publication date
CN102792298B (zh) 2017-03-29
AU2011205296B2 (en) 2016-07-28
CA2786445A1 (en) 2011-07-21
JP2013517569A (ja) 2013-05-16
KR101758669B1 (ko) 2017-07-18
EP2524327A1 (en) 2012-11-21
EP2524327B1 (en) 2017-11-29
JP5768063B2 (ja) 2015-08-26
AU2011205296A1 (en) 2012-07-12
US9031895B2 (en) 2015-05-12
KR20120135218A (ko) 2012-12-12
CN102792298A (zh) 2012-11-21
WO2011088195A1 (en) 2011-07-21
US20110173149A1 (en) 2011-07-14

Similar Documents

Publication Publication Date Title
CA2786445C (en) Matching metadata sources using rules for characterizing matches
CN108701255B (zh) 用于通过模式分解来推断数据变换的系统和方法
Do et al. Matching large schemas: Approaches and evaluation
US10546001B1 (en) Natural language queries based on user defined attributes
US11698918B2 (en) System and method for content-based data visualization using a universal knowledge graph
US20120197887A1 (en) Generating data pattern information
Deb Nath et al. Towards a programmable semantic extract-transform-load framework for semantic data warehouses
WO2013033098A1 (en) Relational metal-model and associated domain context-based knowledge inference engine for knowledge discovery and organization
US20230072607A1 (en) Data augmentation and enrichment
US20240220876A1 (en) Artificial intelligence (ai) based data product provisioning
CN118708731A (zh) 用于生成数据库字典定义的方法和生成查询代码的系统
Pamungkas et al. B-BabelNet: business-specific lexical database for improving semantic analysis of business process models
Naik Artificial Intelligence and Large Language Models in CAD
Du et al. A schema aware ETL workflow generator
HK1173248A (en) Matching metadata sources using rules for characterizing matches
HK1173248B (en) Matching metadata sources using rules for characterizing matches
Traeger et al. SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation
Arasu et al. Towards a domain independent platform for data cleaning
KR20250046799A (ko) 그래프 데이터베이스 기반 트리플 데이터와 위치 정보를 병합한 형태의 문서 관계형성 시스템 및 방법
Middleton et al. Salt: Scalable automated linking technology for data-intensive computing
Dempsey et al. Temporal Benchmarks
JP4889964B2 (ja) 規則文章作成装置
CN119961401A (zh) 用于实现数据问答的方法、系统和计算机可读介质
Grishin et al. Possibility of obtaining functional dependences from database structure
Kwakye et al. Instance-Based Integration of Multidimensional Data Models

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20160113