CA3253894A1 - AUTOMATED KEY-VALUE PAIR EXTRACTION - Google Patents

Info

Publication number
CA3253894A1
Authority
CA
Canada
Prior art keywords
document
node
key
characters
keys
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3253894A
Other languages
English (en)
French (fr)
Inventor
Jad Dino Raad
Adam Blacke
Original Assignee
Alteryx Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alteryx Inc
Publication of CA3253894A1
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Creation or modification of classes or clusters
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Input (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
CA3253894A 2022-03-02 2023-02-27 AUTOMATED KEY-VALUE PAIR EXTRACTION Pending CA3253894A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/685,328 US12154356B2 (en) 2022-03-02 2022-03-02 Automated key-value pair extraction
US17/685,328 2022-03-02
PCT/US2023/013970 WO2023167824A1 (en) 2022-03-02 2023-02-27 Automated key-value pair extraction

Publications (1)

Publication Number Publication Date
CA3253894A1 (en) 2023-09-07

Family

ID=87850837

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3253894A Pending CA3253894A1 (en) 2022-03-02 2023-02-27 AUTOMATED KEY-VALUE PAIR EXTRACTION

Country Status (8)

Country Link
US (2) US12154356B2
EP (1) EP4487220A4
JP (1) JP2025507838A
KR (1) KR20240157071A
CN (1) CN118786420A
AU (1) AU2023227770A1
CA (1) CA3253894A1
WO (1) WO2023167824A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102905894B1 (ko) * 2025-05-13 2025-12-31 이지자산평가주식회사 Method and apparatus for analyzing table data using an artificial intelligence model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8443278B2 (en) 2009-01-02 2013-05-14 Apple Inc. Identification of tables in an unstructured document
US9645999B1 (en) 2016-08-02 2017-05-09 Quid, Inc. Adjustment of document relationship graphs
US11256760B1 (en) * 2018-09-28 2022-02-22 Automation Anywhere, Inc. Region adjacent subgraph isomorphism for layout clustering in document images
US10713524B2 (en) * 2018-10-10 2020-07-14 Microsoft Technology Licensing, Llc Key value extraction from documents
US10878234B1 (en) * 2018-11-20 2020-12-29 Amazon Technologies, Inc. Automated form understanding via layout agnostic identification of keys and corresponding values
CN114005123B (zh) 2021-10-11 2024-05-24 Peking University System and method for digital reconstruction of printed text layout
US12039798B2 (en) * 2021-11-01 2024-07-16 Salesforce, Inc. Processing forms using artificial intelligence models

Also Published As

Publication number Publication date
EP4487220A4 (en) 2025-12-24
CN118786420A (zh) 2024-10-15
US12154356B2 (en) 2024-11-26
WO2023167824A1 (en) 2023-09-07
AU2023227770A1 (en) 2024-09-12
US20250046107A1 (en) 2025-02-06
EP4487220A1 (en) 2025-01-08
KR20240157071A (ko) 2024-10-31
JP2025507838A (ja) 2025-03-21
US20230282013A1 (en) 2023-09-07

Similar Documents

Publication Publication Date Title
US11869263B2 (en) Automated classification and interpretation of life science documents
CN106104570B (zh) Detecting and extracting image document components to create flow documents
JP6827116B2 (ja) Method and apparatus for clustering web pages
US11256912B2 (en) Electronic form identification using spatial information
US12248794B2 (en) Self-supervised system for learning a user interface language
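KR102682244B1 (ko) Method for training a machine learning model with structured ESG data using an ESG assistance tool, and service server for generating ESG documents auto-completed by the machine learning model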
EP4302227A1 (en) System and method for automated document analysis
EP3175375A1 (en) Image based search to identify objects in documents
US20250046107A1 (en) Automated key-value pair extraction
US11687578B1 (en) Systems and methods for classification of data streams
CA3254041A1 (en) MODULAR VECTOR SYSTEM (MODVEC): PLATFORM FOR BUILDING NEXT-GENERATION EXPRESSION VECTORS
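CN116560819A (zh) RPA-based batch automation operation method, system, device, and storage medium
CN113920509A (zh) Target page display method and apparatus, computer device, and storage medium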
US12437008B1 (en) Resolving latent status from dense information using machine learning
US12536679B2 (en) Application matching method and application matching device
US20240233426A9 (en) Method of classifying a document for a straight-through processing
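CN118968187A (zh) Image violation detection method, apparatus, system, and medium based on a multi-granularity model
CN117370817A (zh) Data processing method, apparatus, device, medium, and program product
CN117215947A (zh) Page white-screen detection method and apparatus, computer device, and storage medium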
Vujic Quality Assurance Workflow, Release 2+ Release Report

Legal Events

Date Code Title Description
A00 Application filed

Free format text: ST27 STATUS EVENT CODE: A-0-1-A10-A00-A101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: APPLICATION RECEIVED - PCT

Effective date: 20240830

A00 Application filed

Free format text: ST27 STATUS EVENT CODE: A-1-1-A10-A00-A102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: COMPLIANCE REQUIREMENTS DETERMINED MET

Effective date: 20250121

A15 Pct application entered into the national or regional phase

Free format text: ST27 STATUS EVENT CODE: A-1-1-A10-A15-X000 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: NATIONAL ENTRY REQUIREMENTS DETERMINED COMPLIANT

Effective date: 20250121

P18 Priority claim added or amended

Free format text: ST27 STATUS EVENT CODE: A-1-1-P10-P18-P105 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: PRIORITY CLAIM REQUIREMENTS DETERMINED COMPLIANT

Effective date: 20250121

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-1-1-W10-W00-W100 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: LETTER SENT

Effective date: 20250131

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 2ND ANNIV.) - STANDARD

Year of fee payment: 2

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20250227

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT DETERMINED COMPLIANT

Effective date: 20250227

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20250227

R00 Party data change recorded

Free format text: ST27 STATUS EVENT CODE: A-1-1-R10-R00-R113 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CHANGE OF ADDRESS OR METHOD OF CORRESPONDENCE REQUEST RECEIVED

Effective date: 20250406

R18 Changes to party contact information recorded

Free format text: ST27 STATUS EVENT CODE: A-1-1-R10-R18-R114 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CHANGE OF METHOD OF CORRESPONDENCE REQUIREMENTS DETERMINED COMPLIANT

Effective date: 20251210

Free format text: ST27 STATUS EVENT CODE: A-1-1-R10-R18-R143 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CHANGE OF ADDRESS REQUIREMENTS DETERMINED COMPLIANT

Effective date: 20251210

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-1-1-W10-W00-W111 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CORRESPONDENT DETERMINED COMPLIANT

Effective date: 20251210

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 3RD ANNIV.) - STANDARD

Year of fee payment: 3

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20260219

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20260219