JP2025507838A - Automated key-value pair extraction - Google Patents
Automated key-value pair extraction
- Publication number
- JP2025507838A
- Authority
- JP
- Japan
- Prior art keywords
- document
- node
- key
- keys
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Character Input (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/685,328 US12154356B2 (en) | 2022-03-02 | 2022-03-02 | Automated key-value pair extraction |
| US17/685,328 | 2022-03-02 | | |
| PCT/US2023/013970 WO2023167824A1 (en) | 2022-03-02 | 2023-02-27 | Automated key-value pair extraction |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JP2025507838A (ja) | 2025-03-21 |
| JP2025507838A5 | 2026-03-06 |
Family
ID=87850837
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024551948A Pending JP2025507838A (ja) | 2022-03-02 | 2023-02-27 | Automated key-value pair extraction |
Country Status (8)
| Country | Link |
|---|---|
| US (2) | US12154356B2 |
| EP (1) | EP4487220A4 |
| JP (1) | JP2025507838A |
| KR (1) | KR20240157071A |
| CN (1) | CN118786420A |
| AU (1) | AU2023227770A1 |
| CA (1) | CA3253894A1 |
| WO (1) | WO2023167824A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102905894B1 (ko) * | 2025-05-13 | 2025-12-31 | 이지자산평가주식회사 | Method and apparatus for analyzing table data using an artificial intelligence model |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8443278B2 (en) | 2009-01-02 | 2013-05-14 | Apple Inc. | Identification of tables in an unstructured document |
| US9645999B1 (en) | 2016-08-02 | 2017-05-09 | Quid, Inc. | Adjustment of document relationship graphs |
| US11256760B1 (en) * | 2018-09-28 | 2022-02-22 | Automation Anywhere, Inc. | Region adjacent subgraph isomorphism for layout clustering in document images |
| US10713524B2 (en) * | 2018-10-10 | 2020-07-14 | Microsoft Technology Licensing, Llc | Key value extraction from documents |
| US10878234B1 (en) * | 2018-11-20 | 2020-12-29 | Amazon Technologies, Inc. | Automated form understanding via layout agnostic identification of keys and corresponding values |
| CN114005123B (zh) | 2021-10-11 | 2024-05-24 | 北京大学 | System and method for digital reconstruction of printed text layout |
| US12039798B2 (en) * | 2021-11-01 | 2024-07-16 | Salesforce, Inc. | Processing forms using artificial intelligence models |
-
2022
- 2022-03-02 US US17/685,328 patent/US12154356B2/en active Active
-
2023
- 2023-02-27 CA CA3253894A patent/CA3253894A1/en active Pending
- 2023-02-27 KR KR1020247032812A patent/KR20240157071A/ko active Pending
- 2023-02-27 AU AU2023227770A patent/AU2023227770A1/en active Pending
- 2023-02-27 JP JP2024551948A patent/JP2025507838A/ja active Pending
- 2023-02-27 CN CN202380024039.4A patent/CN118786420A/zh active Pending
- 2023-02-27 EP EP23763837.4A patent/EP4487220A4/en active Pending
- 2023-02-27 WO PCT/US2023/013970 patent/WO2023167824A1/en not_active Ceased
-
2024
- 2024-10-24 US US18/925,781 patent/US20250046107A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| EP4487220A4 (en) | 2025-12-24 |
| CN118786420A (zh) | 2024-10-15 |
| US12154356B2 (en) | 2024-11-26 |
| WO2023167824A1 (en) | 2023-09-07 |
| CA3253894A1 (en) | 2023-09-07 |
| AU2023227770A1 (en) | 2024-09-12 |
| US20250046107A1 (en) | 2025-02-06 |
| EP4487220A1 (en) | 2025-01-08 |
| KR20240157071A (ko) | 2024-10-31 |
| US20230282013A1 (en) | 2023-09-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11869263B2 (en) | | Automated classification and interpretation of life science documents |
| EP3117369B1 (en) | | Detecting and extracting image document components to create flow document |
| KR102485129B1 (ko) | | Information push method, apparatus, device, and storage medium |
| JP6827116B2 (ja) | | Web page clustering method and apparatus |
| JP2020511726A (ja) | | Data extraction from electronic documents |
| US20220121821A1 (en) | | Extracting data from documents using multiple deep learning models |
| US11256912B2 (en) | | Electronic form identification using spatial information |
| US11176403B1 (en) | | Filtering detected objects from an object recognition index according to extracted features |
| EP4302227A1 (en) | | System and method for automated document analysis |
| WO2023183096A1 (en) | | Self-supervised system for learning a user interface language |
| US20250046107A1 (en) | | Automated key-value pair extraction |
| US11687578B1 (en) | | Systems and methods for classification of data streams |
| WO2016093839A1 (en) | | Structuring of semi-structured log messages |
| KR102734982B1 (ko) | | Method and system for processing tables in prompts for large language models |
| US20240420296A1 (en) | | Annotation Based Document Processing with Imperfect Document Images |
| CA3254041A1 (en) | | MODULAR VECTOR SYSTEM (MODVEC): PLATFORM FOR BUILDING NEXT-GENERATION EXPRESSION VECTORS |
| CN119577767B (zh) | | Method and electronic device for detecting illegal Android applications based on user interface transition graphs |
| US12437008B1 (en) | | Resolving latent status from dense information using machine learning |
| WO2021018016A1 (zh) | | Patent information display method, apparatus, device, and storage medium |
| US20240320274A1 (en) | | Conserving system resources using smart document retention |
| US20240233426A9 (en) | | Method of classifying a document for a straight-through processing |
| BIN et al. | | OPTICAL CHARACTER RECOGNITION ON THE CLOUD |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20260226 |
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20260226 |