CN118786420A - Automated key-value pair extraction - Google Patents

Automated key-value pair extraction

Info

Publication number
CN118786420A
Authority
CN
China
Prior art keywords
document
node
keys
key
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380024039.4A
Other languages
English (en)
Chinese (zh)
Inventor
J·D·拉德
A·布莱克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Otrex Co ltd
Original Assignee
Otrex Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Otrex Co ltd
Publication of CN118786420A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Creation or modification of classes or clusters
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Input (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
CN202380024039.4A 2022-03-02 2023-02-27 Automated key-value pair extraction Pending CN118786420A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/685,328 US12154356B2 (en) 2022-03-02 2022-03-02 Automated key-value pair extraction
US17/685,328 2022-03-02
PCT/US2023/013970 WO2023167824A1 (en) 2022-03-02 2023-02-27 Automated key-value pair extraction

Publications (1)

Publication Number Publication Date
CN118786420A (zh) 2024-10-15

Family

ID=87850837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202380024039.4A Pending CN118786420A (zh) 2022-03-02 2023-02-27 Automated key-value pair extraction

Country Status (8)

Country Link
US (2) US12154356B2
EP (1) EP4487220A4
JP (1) JP2025507838A
KR (1) KR20240157071A
CN (1) CN118786420A
AU (1) AU2023227770A1
CA (1) CA3253894A1
WO (1) WO2023167824A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102905894B1 (ko) * 2025-05-13 2025-12-31 이지자산평가주식회사 Method and apparatus for analyzing table data using an artificial intelligence model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8443278B2 (en) 2009-01-02 2013-05-14 Apple Inc. Identification of tables in an unstructured document
US9645999B1 (en) 2016-08-02 2017-05-09 Quid, Inc. Adjustment of document relationship graphs
US11256760B1 (en) * 2018-09-28 2022-02-22 Automation Anywhere, Inc. Region adjacent subgraph isomorphism for layout clustering in document images
US10713524B2 (en) * 2018-10-10 2020-07-14 Microsoft Technology Licensing, Llc Key value extraction from documents
US10878234B1 (en) * 2018-11-20 2020-12-29 Amazon Technologies, Inc. Automated form understanding via layout agnostic identification of keys and corresponding values
CN114005123B (zh) 2021-10-11 2024-05-24 Peking University System and method for digital reconstruction of printed text layouts
US12039798B2 (en) * 2021-11-01 2024-07-16 Salesforce, Inc. Processing forms using artificial intelligence models

Also Published As

Publication number Publication date
EP4487220A4 (en) 2025-12-24
US12154356B2 (en) 2024-11-26
WO2023167824A1 (en) 2023-09-07
CA3253894A1 (en) 2023-09-07
AU2023227770A1 (en) 2024-09-12
US20250046107A1 (en) 2025-02-06
EP4487220A1 (en) 2025-01-08
KR20240157071A (ko) 2024-10-31
JP2025507838A (ja) 2025-03-21
US20230282013A1 (en) 2023-09-07

Similar Documents

Publication Publication Date Title
US11869263B2 (en) Automated classification and interpretation of life science documents
KR20160132842A (ko) Image document component detection and extraction techniques for generating a flow document
EP3175375A1 (en) Image based search to identify objects in documents
US10372980B2 (en) Electronic form identification using spatial information
US12197897B2 (en) Image-based infrastructure-as-code processing based on predicted context
US20210295031A1 (en) Automated classification and interpretation of life science documents
US20240249191A1 (en) System and method of automated document page classification and targeted data extraction
US11687578B1 (en) Systems and methods for classification of data streams
US20250046107A1 (en) Automated key-value pair extraction
AU2023249062B2 (en) System and method for machine learning document partitioning
US20260064765A1 (en) Drawing search device, drawing database construction device, drawing search system, drawing search method, and recording medium
US20240420296A1 (en) Annotation Based Document Processing with Imperfect Document Images
CN117785149A (zh) Application generation method and related apparatus, device, and storage medium
CA3254041A1 (en) MODULAR VECTOR SYSTEM (MODVEC): PLATFORM FOR BUILDING NEXT-GENERATION EXPRESSION VECTORS
WO2021018016A1 (zh) Patent information display method, apparatus, device, and storage medium
US12561195B2 (en) Log data condensation for interoperability with language models
US12536679B2 (en) Application matching method and application matching device
US20260017769A1 (en) Method and system for validating images with observations
US20250111689A1 (en) Generation of domain-specific images for training optical character recognition (ocr) machine learning model
CN115794191A (zh) Method and apparatus for optimizing a front-end resource library

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination