CA3185178C - Data quality analysis - Google Patents

Data quality analysis

Info

Publication number
CA3185178C
Authority
CA
Canada
Prior art keywords
data
dataset
profile
particular field
instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA3185178A
Other languages
English (en)
French (fr)
Other versions
CA3185178A1 (en)
Inventor
Chuck SPITZ
Joel Gould
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ab Initio Technology LLC
Original Assignee
Ab Initio Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ab Initio Technology LLC
Publication of CA3185178A1
Application granted
Publication of CA3185178C
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/197 Version control
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)
  • User Interface Of Digital Computer (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)
CA3185178A 2015-06-12 2016-06-10 Data quality analysis Active CA3185178C (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201562174997P 2015-06-12 2015-06-12
US62/174,997 2015-06-12
US15/175,793 2016-06-07
US15/175,793 US10409802B2 (en) 2015-06-12 2016-06-07 Data quality analysis
CA2988256A CA2988256A1 (en) 2015-06-12 2016-06-10 Data quality analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA2988256A Division CA2988256A1 (en) 2015-06-12 2016-06-10 Data quality analysis

Publications (2)

Publication Number Publication Date
CA3185178A1 (en) 2016-12-15
CA3185178C (en) 2023-09-26

Family

ID=56178502

Family Applications (2)

Application Number Title Priority Date Filing Date
CA3185178A Active CA3185178C (en) 2015-06-12 2016-06-10 Data quality analysis
CA2988256A Pending CA2988256A1 (en) 2015-06-12 2016-06-10 Data quality analysis

Family Applications After (1)

Application Number Title Priority Date Filing Date
CA2988256A Pending CA2988256A1 (en) 2015-06-12 2016-06-10 Data quality analysis

Country Status (9)

Country Link
US (2) US10409802B2 (en)
EP (2) EP3308297B1 (en)
JP (3) JP6707564B2 (en)
KR (1) KR102033971B1 (en)
CN (2) CN117807065A (en)
AU (2) AU2016274791B2 (en)
CA (2) CA3185178C (en)
SG (1) SG10201909389VA (en)
WO (1) WO2016201176A1 (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10409802B2 (en) 2015-06-12 2019-09-10 Ab Initio Technology Llc Data quality analysis
US9734188B1 (en) * 2016-01-29 2017-08-15 International Business Machines Corporation Systematic approach to determine source of data quality issue in data flow in an enterprise
US10776740B2 (en) 2016-06-07 2020-09-15 International Business Machines Corporation Detecting potential root causes of data quality issues using data lineage graphs
US10452625B2 (en) * 2016-06-30 2019-10-22 Global Ids, Inc. Data lineage analysis
US11960498B2 (en) 2016-09-29 2024-04-16 Microsoft Technology Licensing, Llc Systems and methods for dynamically rendering data lineage
US10657120B2 (en) * 2016-10-03 2020-05-19 Bank Of America Corporation Cross-platform digital data movement control utility and method of use thereof
US10242079B2 (en) 2016-11-07 2019-03-26 Tableau Software, Inc. Optimizing execution of data transformation flows
US11853529B2 (en) 2016-11-07 2023-12-26 Tableau Software, Inc. User interface to prepare and curate data for subsequent analysis
US10885057B2 (en) * 2016-11-07 2021-01-05 Tableau Software, Inc. Correlated incremental loading of multiple data sets for an interactive data prep application
CA2989617A1 (en) 2016-12-19 2018-06-19 Capital One Services, Llc Systems and methods for providing data quality management
US10147040B2 (en) 2017-01-20 2018-12-04 Alchemy IoT Device data quality evaluator
US10855783B2 (en) * 2017-01-23 2020-12-01 Adobe Inc. Communication notification trigger modeling preview
US10298465B2 (en) * 2017-08-01 2019-05-21 Juniper Networks, Inc. Using machine learning to monitor link quality and predict link faults
US10394691B1 (en) 2017-10-05 2019-08-27 Tableau Software, Inc. Resolution of data flow errors using the lineage of detected error conditions
US10783138B2 (en) * 2017-10-23 2020-09-22 Google Llc Verifying structured data
US10331660B1 (en) * 2017-12-22 2019-06-25 Capital One Services, Llc Generating a data lineage record to facilitate source system and destination system mapping
CN110413632B (zh) * 2018-04-26 2023-05-30 腾讯科技(深圳)有限公司 Method, apparatus, computer-readable medium, and electronic device for managing state
CA3103470A1 (en) * 2018-06-12 2019-12-19 Intergraph Corporation Artificial intelligence applications for computer-aided dispatch systems
US10678660B2 (en) * 2018-06-26 2020-06-09 StreamSets, Inc. Transformation drift detection and remediation
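JP7153500B2 (ja) * 2018-08-09 2022-10-14 富士通株式会社 Data management device and data recommendation program
CN113168413B (zh) * 2018-10-09 2022-07-01 塔谱软件公司 Correlated incremental loading of multiple data sets for an interactive data prep application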
US11250032B1 (en) 2018-10-22 2022-02-15 Tableau Software, Inc. Data preparation user interface with conditional remapping of data values
US10691304B1 (en) 2018-10-22 2020-06-23 Tableau Software, Inc. Data preparation user interface with conglomerate heterogeneous process flow elements
US11704494B2 (en) * 2019-05-31 2023-07-18 Ab Initio Technology Llc Discovering a semantic meaning of data fields from profile data of the data fields
US11157470B2 (en) * 2019-06-03 2021-10-26 International Business Machines Corporation Method and system for data quality delta analysis on a dataset
US11100097B1 (en) 2019-11-12 2021-08-24 Tableau Software, Inc. Visually defining multi-row table calculations in a data preparation application
US11886399B2 (en) 2020-02-26 2024-01-30 Ab Initio Technology Llc Generating rules for data processing values of data fields from semantic labels of the data fields
KR102240496B1 (ko) * 2020-04-17 2021-04-15 주식회사 한국정보기술단 Data quality management system and method thereof
US20220059238A1 (en) * 2020-08-24 2022-02-24 GE Precision Healthcare LLC Systems and methods for generating data quality indices for patients
CN112131303A (zh) * 2020-09-18 2020-12-25 天津大学 Large-scale data lineage method based on a neural network model
US11277473B1 (en) * 2020-12-01 2022-03-15 Adp, Llc Coordinating breaking changes in automatic data exchange
US12117978B2 (en) * 2020-12-09 2024-10-15 Kyndryl, Inc. Remediation of data quality issues in computer databases
KR102608736B1 (ko) * 2020-12-15 2023-12-01 주식회사 포티투마루 Method and apparatus for retrieving documents for a query
US11921698B2 (en) 2021-04-12 2024-03-05 Torana Inc. System and method for data quality assessment
US12326852B2 (en) 2021-04-26 2025-06-10 International Business Machines Corporation Identifying anomalous transformations using lineage data
US12032994B1 (en) 2021-10-18 2024-07-09 Tableau Software, LLC Linking outputs for automatic execution of tasks
US20230185786A1 (en) * 2021-12-13 2023-06-15 International Business Machines Corporation Detect data standardization gaps
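KR102833351B1 (ko) 2022-03-23 2025-07-10 배재대학교 산학협력단 Method and apparatus for data quality management of an academic information system using data profiling
KR102437098B1 (ko) * 2022-04-15 2022-08-25 이찬영 Artificial intelligence-based method and apparatus for determining erroneous data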
US12242441B1 (en) * 2022-07-11 2025-03-04 Databricks, Inc. Data lineage tracking
CA3268252A1 (en) 2022-09-20 2024-03-28 Ab Initio Technology Llc Techniques for discovering and updating the semantic meaning of data fields
US12169486B2 (en) * 2022-10-19 2024-12-17 Snowflake Inc. File-based error handling during ingestion with transformation
US11822375B1 (en) * 2023-04-28 2023-11-21 Infosum Limited Systems and methods for partially securing data
US12353413B2 (en) 2023-08-04 2025-07-08 Optum, Inc. Quality evaluation and augmentation of data provided by a federated query system
EP4510001A1 (en) * 2023-08-15 2025-02-19 AB Initio Technology LLC Data set evaluation based on data lineage analysis
US12417218B2 (en) * 2023-08-30 2025-09-16 Capital One Services, Llc Systems and methods for scalable dataset content embedding for improved database searchability
US12204538B1 (en) 2023-09-06 2025-01-21 Optum, Inc. Dynamically tailored time intervals for federated query system
US12393593B2 (en) 2023-09-12 2025-08-19 Optum, Inc. Priority-driven federated query-based data caching
US12411895B1 (en) 2024-07-17 2025-09-09 Wells Fargo Bank, N.A. Rules for data quality support

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5966072A (en) 1996-07-02 1999-10-12 Ab Initio Software Corporation Executing computations expressed as graphs
CN1853181A (zh) * 2003-09-15 2006-10-25 Ab开元软件公司 Data profiling
KR100922141B1 (ko) * 2003-09-15 2009-10-19 아브 이니티오 소프트웨어 엘엘시 Data profiling method and system
US7328428B2 (en) * 2003-09-23 2008-02-05 Trivergent Technologies, Inc. System and method for generating data validation rules
US7743420B2 (en) * 2003-12-02 2010-06-22 Imperva, Inc. Dynamic learning method and adaptive normal behavior profile (NBP) architecture for providing fast protection of enterprise applications
KR100582896B1 (ko) 2004-01-28 2006-05-24 삼성전자주식회사 Automatic software version management system and version management method
US7716630B2 (en) 2005-06-27 2010-05-11 Ab Initio Technology Llc Managing parameters for graph-based computations
US20070174234A1 (en) 2006-01-24 2007-07-26 International Business Machines Corporation Data quality and validation within a relational database management system
JP2008265618A (ja) * 2007-04-23 2008-11-06 Toyota Motor Corp In-vehicle electronic control device
AU2009219299B2 (en) * 2008-02-26 2015-05-07 Ab Initio Technology Llc Graphic representations of data relationships
CN101425078A (zh) 2008-11-17 2009-05-06 阿里巴巴集团控股有限公司 Method and device for updating software source code
WO2010065623A1 (en) * 2008-12-02 2010-06-10 Ab Initio Software Llc Visualizing relationships between data elements and graphical representations of data element attributes
EP2440882B1 (en) * 2009-06-10 2020-02-12 Ab Initio Technology LLC Generating test data
KR101688555B1 (ko) * 2009-09-16 2016-12-21 아브 이니티오 테크놀로지 엘엘시 Mapping dataset elements
JP2011253491A (ja) * 2010-06-04 2011-12-15 Toshiba Corp Plant abnormality detection device, plant abnormality detection method, and program
US8819010B2 (en) 2010-06-28 2014-08-26 International Business Machines Corporation Efficient representation of data lineage information
JP5331774B2 (ja) * 2010-10-22 2013-10-30 株式会社日立パワーソリューションズ Equipment condition monitoring method, apparatus therefor, and equipment condition monitoring program
JP2012146241A (ja) 2011-01-14 2012-08-02 Canon Inc Software update method, software update device, and software update program
US10013439B2 (en) 2011-06-27 2018-07-03 International Business Machines Corporation Automatic generation of instantiation rules to determine quality of data migration
US9330148B2 (en) 2011-06-30 2016-05-03 International Business Machines Corporation Adapting data quality rules based upon user application requirements
US8812411B2 (en) * 2011-11-03 2014-08-19 Microsoft Corporation Domains for knowledge-based data quality solution
US9202174B2 (en) * 2013-01-28 2015-12-01 Daniel A Dooley Automated tracker and analyzer
US10489360B2 (en) 2012-10-17 2019-11-26 Ab Initio Technology Llc Specifying and applying rules to data
US9075860B2 (en) * 2012-10-18 2015-07-07 Oracle International Corporation Data lineage system
US9569342B2 (en) * 2012-12-20 2017-02-14 Microsoft Technology Licensing, Llc Test strategy for profile-guided code execution optimizers
US9558230B2 (en) 2013-02-12 2017-01-31 International Business Machines Corporation Data quality assessment
US9576036B2 (en) * 2013-03-15 2017-02-21 International Business Machines Corporation Self-analyzing data processing job to determine data quality issues
US9256656B2 (en) 2013-08-20 2016-02-09 International Business Machines Corporation Determining reliability of data reports
JP2014006933A (ja) 2013-10-11 2014-01-16 Ricoh Co Ltd Information processing apparatus, device, information processing system, installation support method, and installation support program
US10409802B2 (en) 2015-06-12 2019-09-10 Ab Initio Technology Llc Data quality analysis

Also Published As

Publication number Publication date
US20160364434A1 (en) 2016-12-15
CN107810500A (zh) 2018-03-16
JP7654699B2 (ja) 2025-04-01
KR20180030521A (ko) 2018-03-23
JP2020161147A (ja) 2020-10-01
EP3308297B1 (en) 2021-03-24
CA3185178A1 (en) 2016-12-15
CN117807065A (zh) 2024-04-02
SG10201909389VA (en) 2019-11-28
AU2016274791B2 (en) 2019-07-25
KR102033971B1 (ko) 2019-10-18
EP3839758A1 (en) 2021-06-23
AU2019253860B2 (en) 2021-12-09
JP6707564B2 (ja) 2020-06-10
JP2018523195A (ja) 2018-08-16
JP2023062126A (ja) 2023-05-02
US11249981B2 (en) 2022-02-15
EP3308297A1 (en) 2018-04-18
US10409802B2 (en) 2019-09-10
CA2988256A1 (en) 2016-12-15
HK1250066A1 (zh) 2018-11-23
WO2016201176A1 (en) 2016-12-15
JP7507602B2 (ja) 2024-06-28
US20200057757A1 (en) 2020-02-20
CN107810500B (zh) 2023-12-08
AU2016274791A1 (en) 2017-11-30
AU2019253860A1 (en) 2019-11-14
EP3839758B1 (en) 2022-08-10

Similar Documents

Publication Publication Date Title
AU2019253860B2 (en) Data quality analysis
US10185641B2 (en) Data generation
US10089362B2 (en) Systems and/or methods for investigating event streams in complex event processing (CEP) applications
CN107643956B (zh) Method and device for locating the anomaly origin of abnormal data
US10664807B2 (en) Retroactively modifying database records
US9959329B2 (en) Unified master report generator
US8589444B2 (en) Presenting information from heterogeneous and distributed data sources with real time updates
HK40045500A (en) Data quality analysis
HK40045500B (en) Data quality analysis
CN114218925B (zh) Data processing method, apparatus, device, medium, and program product
HK1250066B (en) Data quality analysis
US10558647B1 (en) High performance data aggregations
US20250139075A1 (en) System and method for providing a consolidated data hub
US12411895B1 (en) Rules for data quality support
WO2025038623A1 (en) Data set evaluation based on data lineage analysis
WO2025049234A1 (en) Conversion of data lineages
Sonnleitner et al. Persistence of workflow control data in temporal databases
CN116467376A (zh) Method, apparatus, device, and storage medium for sorting out data warehouse dependencies

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20221209
