GB2627092A - Deep learning techniques for extraction of embedded data from documents - Google Patents
- Publication number
- GB2627092A (application GB2405984.2A / GB202405984A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- text
- sub
- embeddings
- groupings
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163273761P | 2021-10-29 | 2021-10-29 | |
| US17/819,445 US12367352B2 (en) | 2021-10-29 | 2022-08-12 | Deep learning techniques for extraction of embedded data from documents |
| PCT/US2022/074974 WO2023076754A1 (en) | 2021-10-29 | 2022-08-15 | Deep learning techniques for extraction of embedded data from documents |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202405984D0 (en) | 2024-06-12 |
| GB2627092A (en) | 2024-08-14 |
Family
ID=86147364
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2405984.2A (pending), published as GB2627092A (en) | 2021-10-29 | 2022-08-15 | Deep learning techniques for extraction of embedded data from documents |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US12367352B2 |
| JP (1) | JP2024540111A |
| KR (1) | KR20240091051A |
| CN (1) | CN118202344A |
| GB (1) | GB2627092A |
| WO (1) | WO2023076754A1 |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12158900B2 (en) * | 2022-10-28 | 2024-12-03 | Abbyy Development Inc. | Extracting information from documents using automatic markup based on historical data |
| US12315052B2 (en) * | 2022-12-15 | 2025-05-27 | Accenture Global Solutions Limited | Generation of context-aware word embedding vectors for given semantic properties of a word using few texts |
| US12314318B2 (en) * | 2023-02-17 | 2025-05-27 | Snowflake Inc. | Enhanced searching using fine-tuned machine learning models |
| US12562163B2 (en) * | 2023-05-12 | 2026-02-24 | Servicenow, Inc. | Bidirectional assistant for development platforms |
| US11928569B1 (en) * | 2023-06-30 | 2024-03-12 | Intuit, Inc. | Automated user experience orchestration using natural language based machine learning techniques |
| CN116561602B (zh) * | 2023-07-10 | 2023-09-19 | 三峡高科信息技术有限责任公司 | Method for automatically matching sales and purchasing materials for sales cost carry-over |
| US12277150B2 (en) * | 2023-07-20 | 2025-04-15 | Quantem Healthcare, Inc. | Computing technologies for hierarchies of chatbot application programs operative based on data structures containing unstructured texts |
| CN117097790A (zh) * | 2023-08-08 | 2023-11-21 | 北京字跳网络技术有限公司 | Information push method and apparatus, computer device, and storage medium |
| US20250371272A1 (en) * | 2024-06-04 | 2025-12-04 | Optum, Inc. | Modified large language model architecture with span-level attention mechanism for conversion of natural language text to structured knowledge graph |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004326600A (ja) * | 2003-04-25 | 2004-11-18 | Fujitsu Ltd | Clustering apparatus for structured documents |
| US20190073420A1 (en) * | 2017-09-04 | 2019-03-07 | Borislav Agapiev | System for creating a reasoning graph and for ranking of its nodes |
| KR20190058935A (ko) * | 2017-11-22 | 2019-05-30 | 주식회사 와이즈넛 | System and method for extracting core keywords in a document |
| US20200073882A1 (en) * | 2018-08-31 | 2020-03-05 | Accenture Global Solutions Limited | Artificial intelligence based corpus enrichment for knowledge population and query response |
| US10607042B1 (en) * | 2019-02-12 | 2020-03-31 | Live Objects, Inc. | Dynamically trained models of named entity recognition over unstructured data |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10380259B2 (en) * | 2017-05-22 | 2019-08-13 | International Business Machines Corporation | Deep embedding for natural language content based on semantic dependencies |
| US11914954B2 (en) * | 2019-12-08 | 2024-02-27 | Virginia Tech Intellectual Properties, Inc. | Methods and systems for generating declarative statements given documents with questions and answers |
| US11861314B2 (en) * | 2020-04-03 | 2024-01-02 | Asapp, Inc. | Extracting clinical follow-ups from discharge summaries |
| US11741146B2 (en) * | 2020-07-13 | 2023-08-29 | Nec Corporation | Embedding multi-modal time series and text data |
| US20220093088A1 (en) * | 2020-09-24 | 2022-03-24 | Apple Inc. | Contextual sentence embeddings for natural language processing applications |
| CN113011169B (zh) * | 2021-01-27 | 2022-11-11 | 北京字跳网络技术有限公司 | Method, apparatus, device, and medium for processing meeting minutes |
2022
- 2022-08-12 US: application US17/819,445, publication US12367352B2 (en), status Active
- 2022-08-15 KR: application KR1020247017614A, publication KR20240091051A (ko), status Pending
- 2022-08-15 WO: application PCT/US2022/074974, publication WO2023076754A1 (en), status Ceased
- 2022-08-15 JP: application JP2024525422A, publication JP2024540111A (ja), status Pending
- 2022-08-15 CN: application CN202280073269.5A, publication CN118202344A (zh), status Pending
- 2022-08-15 GB: application GB2405984.2A, publication GB2627092A (en), status Pending
2025
- 2025-06-11 US: application US19/235,153, publication US20250307566A1 (en), status Pending
Also Published As
| Publication number | Publication date |
|---|---|
| JP2024540111A (ja) | 2024-10-31 |
| US20250307566A1 (en) | 2025-10-02 |
| GB202405984D0 (en) | 2024-06-12 |
| KR20240091051A (ko) | 2024-06-21 |
| US20230139397A1 (en) | 2023-05-04 |
| US12367352B2 (en) | 2025-07-22 |
| WO2023076754A1 (en) | 2023-05-04 |
| CN118202344A (zh) | 2024-06-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| GB2627092A (en) | Deep learning techniques for extraction of embedded data from documents | |
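| JP2024540111A5 | | |
| CN110770735B (zh) | | Encoding conversion of documents with embedded mathematical expressions |
| JP2618832B2 (ja) | | Method and system for analyzing the logical structure of a document |
| CN111914825B (zh) | | Character recognition method and apparatus, and electronic device |
| KR101851785B1 (ko) | | Apparatus and method for generating a training set for a chatbot |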
| WO2007005937A2 (en) | Grammatical parsing of document visual structures | |
| Layton et al. | Recentred local profiles for authorship attribution | |
| CN114417871A (zh) | | Model training and named entity recognition method and apparatus, electronic device, and medium |
| US20220414328A1 (en) | Method and system for predicting field value using information extracted from a document | |
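| CN111737949B (zh) | | Question content extraction method and apparatus, readable storage medium, and computer device |
| CN110413972B (zh) | | Intelligent completion method for table names and field names based on NLP technology |
| KR101851789B1 (ko) | | Apparatus and method for generating domain similar phrases |
| Fréry et al. | | UJM at CLEF in author verification based on optimized classification trees |
| Clausner et al. | | Efficient OCR training data generation with Aletheia |
| CN106446147A (zh) | | Sentiment analysis method based on structured features |
| CN116822634A (zh) | | Document visual-language reasoning method based on layout-aware prompts |
| CN114492437B (zh) | | Keyword recognition method and apparatus, electronic device, and storage medium |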
| Sun et al. | Squared english word: A method of generating glyph to use super characters for sentiment analysis | |
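| CN112528682A (zh) | | Language detection method and apparatus, electronic device, and storage medium |
| CN111984845B (zh) | | Method and system for identifying typos on websites |
| CN107797986A (zh) | | Mixed-corpus word segmentation method based on LSTM-CNN |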
| CN109325237B (zh) | 用于机器翻译的完整句识别方法与系统 | |
| Se et al. | | AMRITA_CEN@FIRE 2015: Extracting entities for social media texts in Indian languages |
| Dinu et al. | Romanian syllabication using machine learning |