KR20240157071A - Automated key-value pair extraction - Google Patents
Automated key-value pair extraction
- Publication number
- KR20240157071A (application KR1020247032812A)
- Authority
- KR
- South Korea
- Prior art keywords
- document
- node
- key
- characters
- keys
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Character Input (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/685,328 US12154356B2 (en) | 2022-03-02 | 2022-03-02 | Automated key-value pair extraction |
| US17/685,328 | 2022-03-02 | ||
| PCT/US2023/013970 WO2023167824A1 (en) | 2022-03-02 | 2023-02-27 | Automated key-value pair extraction |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| KR20240157071A (ko) | 2024-10-31 |
Family
ID=87850837
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020247032812A Pending KR20240157071A (ko) | 2022-03-02 | 2023-02-27 | Automated key-value pair extraction |
Country Status (8)
| Country | Link |
|---|---|
| US (2) | US12154356B2 |
| EP (1) | EP4487220A4 |
| JP (1) | JP2025507838A |
| KR (1) | KR20240157071A |
| CN (1) | CN118786420A |
| AU (1) | AU2023227770A1 |
| CA (1) | CA3253894A1 |
| WO (1) | WO2023167824A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102905894B1 (ko) * | 2025-05-13 | 2025-12-31 | 이지자산평가주식회사 | Method and apparatus for analyzing table data using an artificial intelligence model |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8443278B2 (en) | 2009-01-02 | 2013-05-14 | Apple Inc. | Identification of tables in an unstructured document |
| US9645999B1 (en) | 2016-08-02 | 2017-05-09 | Quid, Inc. | Adjustment of document relationship graphs |
| US11256760B1 (en) * | 2018-09-28 | 2022-02-22 | Automation Anywhere, Inc. | Region adjacent subgraph isomorphism for layout clustering in document images |
| US10713524B2 (en) * | 2018-10-10 | 2020-07-14 | Microsoft Technology Licensing, Llc | Key value extraction from documents |
| US10878234B1 (en) * | 2018-11-20 | 2020-12-29 | Amazon Technologies, Inc. | Automated form understanding via layout agnostic identification of keys and corresponding values |
| CN114005123B (zh) | 2021-10-11 | 2024-05-24 | Peking University (北京大学) | System and method for digital reconstruction of printed text layout |
| US12039798B2 (en) * | 2021-11-01 | 2024-07-16 | Salesforce, Inc. | Processing forms using artificial intelligence models |
2022
- 2022-03-02: US application US17/685,328, published as US12154356B2 (en), status Active

2023
- 2023-02-27: CA application CA3253894A, published as CA3253894A1 (en), status Pending
- 2023-02-27: KR application KR1020247032812A, published as KR20240157071A (ko), status Pending
- 2023-02-27: AU application AU2023227770A, published as AU2023227770A1 (en), status Pending
- 2023-02-27: JP application JP2024551948A, published as JP2025507838A (ja), status Pending
- 2023-02-27: CN application CN202380024039.4A, published as CN118786420A (zh), status Pending
- 2023-02-27: EP application EP23763837.4A, published as EP4487220A4 (en), status Pending
- 2023-02-27: WO application PCT/US2023/013970, published as WO2023167824A1 (en), status Ceased

2024
- 2024-10-24: US application US18/925,781, published as US20250046107A1 (en), status Pending
Also Published As
| Publication number | Publication date |
|---|---|
| EP4487220A4 (en) | 2025-12-24 |
| CN118786420A (zh) | 2024-10-15 |
| US12154356B2 (en) | 2024-11-26 |
| WO2023167824A1 (en) | 2023-09-07 |
| CA3253894A1 (en) | 2023-09-07 |
| AU2023227770A1 (en) | 2024-09-12 |
| US20250046107A1 (en) | 2025-02-06 |
| EP4487220A1 (en) | 2025-01-08 |
| JP2025507838A (ja) | 2025-03-21 |
| US20230282013A1 (en) | 2023-09-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3117369B1 (en) | | Detecting and extracting image document components to create flow document |
| US11869263B2 (en) | | Automated classification and interpretation of life science documents |
| US12518553B2 (en) | | System and method for automated document analysis |
| US20230305863A1 (en) | | Self-Supervised System for Learning a User Interface Language |
| US10372980B2 (en) | | Electronic form identification using spatial information |
| US11687578B1 (en) | | Systems and methods for classification of data streams |
| US20250046107A1 (en) | | Automated key-value pair extraction |
| CN120469686A (zh) | | Video-based web page generation method, system, device, and medium |
| US12094232B2 (en) | | Automatically determining table locations and table cell types |
| CA3254041A1 (en) | | MODULAR VECTOR SYSTEM (MODVEC): PLATFORM FOR BUILDING NEXT-GENERATION EXPRESSION VECTORS |
| CN119577767B (zh) | | Method and electronic device for detecting illegal Android applications based on user interface transition graphs |
| CN105677827A (zh) | | Method and apparatus for obtaining a form |
| US20240233426A9 (en) | | Method of classifying a document for a straight-through processing |
| US20260017769A1 (en) | | Method and system for validating images with observations |
| CN118968187A (zh) | | Image violation detection method, apparatus, system, and medium based on a multi-granularity model |
| CN120411826A (zh) | | Fusion target recognition method and apparatus based on UAV dual-modal images |
| CN120853186A (zh) | | Document parsing method and apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PA0105 | International application | Patent event date: 20240930; Patent event code: PA01051R01D; Comment text: International Patent Application |
| | PG1501 | Laying open of application | |