JP2024540111A - Deep learning techniques for extraction of embedded data from documents - Google Patents

Deep learning techniques for extraction of embedded data from documents

Info

Publication number
JP2024540111A
Authority
JP
Japan
Prior art keywords
text
data
embeddings
unstructured
sentence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2024525422A
Other languages
English (en)
Japanese (ja)
Other versions
JP2024540111A5
Inventor
ジョン,シュ
ダルマシリ,ヤクピティヤゲ・ドン・タヌジャ・サモッダイ
ドゥオング,タン・ロング
ジョンソン,マーク・エドワード
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-10-29
Filing date
2022-08-15
Publication date
2024-10-31
Application filed by Oracle International Corporation
Publication of JP2024540111A
Publication of JP2024540111A5
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis
              • G06F40/35 Discourse or dialogue representation
            • G06F40/20 Natural language analysis
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
              • G06F40/205 Parsing
            • G06F40/10 Text processing
              • G06F40/103 Formatting, i.e. changing of presentation of documents
                • G06F40/109 Font handling; Temporal or kinetic typography

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
JP2024525422A 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents Pending JP2024540111A (ja)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163273761P 2021-10-29 2021-10-29
US63/273,761 2021-10-29
US17/819,445 US12367352B2 (en) 2021-10-29 2022-08-12 Deep learning techniques for extraction of embedded data from documents
US17/819,445 2022-08-12
PCT/US2022/074974 WO2023076754A1 (en) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Publications (2)

Publication Number Publication Date
JP2024540111A (ja) 2024-10-31
JP2024540111A5 2025-03-06

Family

ID=86147364

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2024525422A Pending JP2024540111A (ja) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Country Status (6)

Country Link
US (2) US12367352B2
JP (1) JP2024540111A
KR (1) KR20240091051A
CN (1) CN118202344A
GB (1) GB2627092A
WO (1) WO2023076754A1

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12158900B2 (en) * 2022-10-28 2024-12-03 Abbyy Development Inc. Extracting information from documents using automatic markup based on historical data
US12315052B2 (en) * 2022-12-15 2025-05-27 Accenture Global Solutions Limited Generation of context-aware word embedding vectors for given semantic properties of a word using few texts
US12314318B2 (en) * 2023-02-17 2025-05-27 Snowflake Inc. Enhanced searching using fine-tuned machine learning models
US12562163B2 (en) * 2023-05-12 2026-02-24 Servicenow, Inc. Bidirectional assistant for development platforms
US11928569B1 (en) * 2023-06-30 2024-03-12 Intuit, Inc. Automated user experience orchestration using natural language based machine learning techniques
CN116561602B (zh) * 2023-07-10 2023-09-19 三峡高科信息技术有限责任公司 Method for automatically matching sales and procurement materials for cost-of-sales carry-forward
US12277150B2 (en) * 2023-07-20 2025-04-15 Quantem Healthcare, Inc. Computing technologies for hierarchies of chatbot application programs operative based on data structures containing unstructured texts
CN117097790A (zh) * 2023-08-08 2023-11-21 北京字跳网络技术有限公司 Information push method and apparatus, computer device, and storage medium
US20250371272A1 (en) * 2024-06-04 2025-12-04 Optum, Inc. Modified large language model architecture with span-level attention mechanism for conversion of natural language text to structured knowledge graph

Citations (2)

Publication number Priority date Publication date Assignee Title
CN113011169A (zh) * 2021-01-27 2021-06-22 北京字跳网络技术有限公司 Method, apparatus, device, and medium for processing meeting minutes
US20210312128A1 (en) * 2020-04-03 2021-10-07 Asapp, Inc. Extracting clinical follow-ups from discharge summaries

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP2004326600A (ja) 2003-04-25 2004-11-18 Fujitsu Ltd Clustering device for structured documents
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
US10503791B2 (en) 2017-09-04 2019-12-10 Borislav Agapiev System for creating a reasoning graph and for ranking of its nodes
KR102019194B1 (ko) 2017-11-22 2019-09-06 주식회사 와이즈넛 System and method for extracting core keywords from a document
US11734328B2 (en) 2018-08-31 2023-08-22 Accenture Global Solutions Limited Artificial intelligence based corpus enrichment for knowledge population and query response
US10607042B1 (en) 2019-02-12 2020-03-31 Live Objects, Inc. Dynamically trained models of named entity recognition over unstructured data
US11914954B2 (en) * 2019-12-08 2024-02-27 Virginia Tech Intellectual Properties, Inc. Methods and systems for generating declarative statements given documents with questions and answers
US11741146B2 (en) * 2020-07-13 2023-08-29 Nec Corporation Embedding multi-modal time series and text data
US20220093088A1 (en) * 2020-09-24 2022-03-24 Apple Inc. Contextual sentence embeddings for natural language processing applications

Also Published As

Publication number Publication date
US20250307566A1 (en) 2025-10-02
GB202405984D0 (en) 2024-06-12
KR20240091051A (ko) 2024-06-21
US20230139397A1 (en) 2023-05-04
GB2627092A (en) 2024-08-14
US12367352B2 (en) 2025-07-22
WO2023076754A1 (en) 2023-05-04
CN118202344A (zh) 2024-06-14

Similar Documents

Publication Publication Date Title
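JP7561836B2 (ja) Stop-word data augmentation for natural language processing
JP7703667B2 (ja) Context tag integration using named entity recognition models
JP7682202B2 (ja) Improved techniques for out-of-domain (OOD) detection
JP7721559B2 (ja) Noise data augmentation for natural language processing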
US12099816B2 (en) Multi-factor modelling for natural language processing
US12217497B2 (en) Extracting key information from document using trained machine-learning models
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
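JP7771196B2 (ja) Multi-feature balancing for natural language processors
JP2024539003A (ja) Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model
JP2023551860A (ja) Out-of-domain data augmentation for natural language processing
JP2023544328A (ja) Automatic out-of-scope transition for chatbots
JP2024518416A (ja) Variant inconsistency attack (VIA) as a simple and effective adversarial attack method
JP2023551322A (ja) Keyword data augmentation tool for natural language processing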
US12210830B2 (en) System and techniques for handling long text for pre-trained language models
JP2024543062A (ja) Path dropout for natural language processing
US12374322B2 (en) Adjusting outlier data points for training a machine-learning model
JP2024540387A (ja) Wide deep network for language detection using hash embeddings
US20230136965A1 (en) Prohibiting inconsistent named entity recognition tag sequences
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data
JP2025528391A (ja) Adaptive training data augmentation to facilitate training of named entity recognition models
WO2023091436A1 (en) System and techniques for handling long text for pre-trained language models

Legal Events

Date Code Title Description
2025-02-26 A521 Request for written amendment filed (Japanese intermediate code: A523)
2025-02-26 A621 Written request for application examination (Japanese intermediate code: A621)
2025-11-14 A977 Report on retrieval (Japanese intermediate code: A971007)
2026-01-06 A131 Notification of reasons for refusal (Japanese intermediate code: A131)
2026-02-24 A521 Request for written amendment filed (Japanese intermediate code: A523)