JP2024540111A - Deep learning techniques for extraction of embedded data from documents - Google Patents
Deep learning techniques for extraction of embedded data from documents
- Publication number
- JP2024540111A
- Authority
- JP
- Japan
- Prior art keywords
- text
- data
- embeddings
- unstructured
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163273761P | 2021-10-29 | 2021-10-29 | |
| US63/273,761 | 2021-10-29 | ||
| US17/819,445 US12367352B2 (en) | 2021-10-29 | 2022-08-12 | Deep learning techniques for extraction of embedded data from documents |
| US17/819,445 | 2022-08-12 | ||
| PCT/US2022/074974 WO2023076754A1 (en) | 2021-10-29 | 2022-08-15 | Deep learning techniques for extraction of embedded data from documents |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JP2024540111A (ja) | 2024-10-31 |
| JP2024540111A5 | 2025-03-06 |
Family
ID=86147364
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024525422A Pending JP2024540111A (ja) | 2021-10-29 | 2022-08-15 | Deep learning techniques for extraction of embedded data from documents |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US12367352B2 |
| JP (1) | JP2024540111A |
| KR (1) | KR20240091051A |
| CN (1) | CN118202344A |
| GB (1) | GB2627092A |
| WO (1) | WO2023076754A1 |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12158900B2 (en) * | 2022-10-28 | 2024-12-03 | Abbyy Development Inc. | Extracting information from documents using automatic markup based on historical data |
| US12315052B2 (en) * | 2022-12-15 | 2025-05-27 | Accenture Global Solutions Limited | Generation of context-aware word embedding vectors for given semantic properties of a word using few texts |
| US12314318B2 (en) * | 2023-02-17 | 2025-05-27 | Snowflake Inc. | Enhanced searching using fine-tuned machine learning models |
| US12562163B2 (en) * | 2023-05-12 | 2026-02-24 | Servicenow, Inc. | Bidirectional assistant for development platforms |
| US11928569B1 (en) * | 2023-06-30 | 2024-03-12 | Intuit, Inc. | Automated user experience orchestration using natural language based machine learning techniques |
| CN116561602B (zh) * | 2023-07-10 | 2023-09-19 | 三峡高科信息技术有限责任公司 | Method for automatic matching of sales and procurement materials for sales cost carry-forward |
| US12277150B2 (en) * | 2023-07-20 | 2025-04-15 | Quantem Healthcare, Inc. | Computing technologies for hierarchies of chatbot application programs operative based on data structures containing unstructured texts |
| CN117097790A (zh) * | 2023-08-08 | 2023-11-21 | 北京字跳网络技术有限公司 | Information push method and apparatus, computer device, and storage medium |
| US20250371272A1 (en) * | 2024-06-04 | 2025-12-04 | Optum, Inc. | Modified large language model architecture with span-level attention mechanism for conversion of natural language text to structured knowledge graph |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113011169A (zh) * | 2021-01-27 | 2021-06-22 | 北京字跳网络技术有限公司 | Method, apparatus, device, and medium for processing meeting minutes |
| US20210312128A1 (en) * | 2020-04-03 | 2021-10-07 | Asapp, Inc. | Extracting clinical follow-ups from discharge summaries |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004326600A (ja) | 2003-04-25 | 2004-11-18 | Fujitsu Ltd | Clustering device for structured documents |
| US10380259B2 (en) * | 2017-05-22 | 2019-08-13 | International Business Machines Corporation | Deep embedding for natural language content based on semantic dependencies |
| US10503791B2 (en) | 2017-09-04 | 2019-12-10 | Borislav Agapiev | System for creating a reasoning graph and for ranking of its nodes |
| KR102019194B1 (ko) | 2017-11-22 | 2019-09-06 | 주식회사 와이즈넛 | System and method for extracting core keywords in a document |
| US11734328B2 (en) | 2018-08-31 | 2023-08-22 | Accenture Global Solutions Limited | Artificial intelligence based corpus enrichment for knowledge population and query response |
| US10607042B1 (en) | 2019-02-12 | 2020-03-31 | Live Objects, Inc. | Dynamically trained models of named entity recognition over unstructured data |
| US11914954B2 (en) * | 2019-12-08 | 2024-02-27 | Virginia Tech Intellectual Properties, Inc. | Methods and systems for generating declarative statements given documents with questions and answers |
| US11741146B2 (en) * | 2020-07-13 | 2023-08-29 | Nec Corporation | Embedding multi-modal time series and text data |
| US20220093088A1 (en) * | 2020-09-24 | 2022-03-24 | Apple Inc. | Contextual sentence embeddings for natural language processing applications |
-
2022
- 2022-08-12 US US17/819,445 patent/US12367352B2/en active Active
- 2022-08-15 KR KR1020247017614A patent/KR20240091051A/ko active Pending
- 2022-08-15 WO PCT/US2022/074974 patent/WO2023076754A1/en not_active Ceased
- 2022-08-15 JP JP2024525422A patent/JP2024540111A/ja active Pending
- 2022-08-15 CN CN202280073269.5A patent/CN118202344A/zh active Pending
- 2022-08-15 GB GB2405984.2A patent/GB2627092A/en active Pending
-
2025
- 2025-06-11 US US19/235,153 patent/US20250307566A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20250307566A1 (en) | 2025-10-02 |
| GB202405984D0 (en) | 2024-06-12 |
| KR20240091051A (ko) | 2024-06-21 |
| US20230139397A1 (en) | 2023-05-04 |
| GB2627092A (en) | 2024-08-14 |
| US12367352B2 (en) | 2025-07-22 |
| WO2023076754A1 (en) | 2023-05-04 |
| CN118202344A (zh) | 2024-06-14 |
Similar Documents
| Publication | Title |
|---|---|
| JP7561836B2 (ja) | Stop word data augmentation for natural language processing |
| JP7703667B2 (ja) | Context tag integration with named entity recognition models |
| JP7682202B2 (ja) | Improved techniques for out-of-domain (OOD) detection |
| JP7721559B2 (ja) | Noise data augmentation for natural language processing |
| US12099816B2 (en) | Multi-factor modelling for natural language processing |
| US12217497B2 (en) | Extracting key information from document using trained machine-learning models |
| US12367352B2 (en) | Deep learning techniques for extraction of embedded data from documents |
| JP7771196B2 (ja) | Multi-feature balancing for natural language processors |
| JP2024539003A (ja) | Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model |
| JP2023551860A (ja) | Out-of-domain data augmentation for natural language processing |
| JP2023544328A (ja) | Automatic out-of-scope transition for chatbots |
| JP2024518416A (ja) | Variant inconsistency attack (VIA) as a simple and effective adversarial attack method |
| JP2023551322A (ja) | Keyword data augmentation tool for natural language processing |
| US12210830B2 (en) | System and techniques for handling long text for pre-trained language models |
| JP2024543062A (ja) | Path dropout for natural language processing |
| US12374322B2 (en) | Adjusting outlier data points for training a machine-learning model |
| JP2024540387A (ja) | Wide and deep network for language detection using hash embeddings |
| US20230136965A1 (en) | Prohibiting inconsistent named entity recognition tag sequences |
| US12412043B2 (en) | Rule-based techniques for extraction of question and answer pairs from data |
| JP2025528391A (ja) | Adaptive training data augmentation to facilitate training of named entity recognition models |
| WO2023091436A1 (en) | System and techniques for handling long text for pre-trained language models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2025-02-26 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2025-02-26 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
| 2025-11-14 | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007 |
| 2026-01-06 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
| 2026-02-24 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |