CN118202344A - Deep learning techniques for extraction of embedded data from documents - Google Patents

Deep learning techniques for extraction of embedded data from documents

Info

Publication number
CN118202344A
Authority
CN
China
Prior art keywords
text
data
sub
robot
embeddings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280073269.5A
Other languages
English (en)
Chinese (zh)
Inventor
钟旭
Y·D·T·S·达摩西里
T·L·杜翁
M·E·约翰逊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Publication of CN118202344A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN202280073269.5A 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents Pending CN118202344A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163273761P 2021-10-29 2021-10-29
US63/273,761 2021-10-29
US17/819,445 US12367352B2 (en) 2021-10-29 2022-08-12 Deep learning techniques for extraction of embedded data from documents
US17/819,445 2022-08-12
PCT/US2022/074974 WO2023076754A1 (en) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Publications (1)

Publication Number Publication Date
CN118202344A (zh) 2024-06-14

Family

ID=86147364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280073269.5A Pending CN118202344A (zh) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Country Status (6)

Country Link
US (2) US12367352B2
JP (1) JP2024540111A
KR (1) KR20240091051A
CN (1) CN118202344A
GB (1) GB2627092A
WO (1) WO2023076754A1

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12158900B2 (en) * 2022-10-28 2024-12-03 Abbyy Development Inc. Extracting information from documents using automatic markup based on historical data
US12315052B2 (en) * 2022-12-15 2025-05-27 Accenture Global Solutions Limited Generation of context-aware word embedding vectors for given semantic properties of a word using few texts
US12314318B2 (en) * 2023-02-17 2025-05-27 Snowflake Inc. Enhanced searching using fine-tuned machine learning models
US12562163B2 (en) * 2023-05-12 2026-02-24 Servicenow, Inc. Bidirectional assistant for development platforms
US11928569B1 (en) * 2023-06-30 2024-03-12 Intuit, Inc. Automated user experience orchestration using natural language based machine learning techniques
CN116561602B (zh) * 2023-07-10 2023-09-19 三峡高科信息技术有限责任公司 Method for automatically matching sales and procurement materials for sales cost carry-over
US12277150B2 (en) * 2023-07-20 2025-04-15 Quantem Healthcare, Inc. Computing technologies for hierarchies of chatbot application programs operative based on data structures containing unstructured texts
CN117097790A (zh) * 2023-08-08 2023-11-21 北京字跳网络技术有限公司 Information push method and apparatus, computer device, and storage medium
US20250371272A1 (en) * 2024-06-04 2025-12-04 Optum, Inc. Modified large language model architecture with span-level attention mechanism for conversion of natural language text to structured knowledge graph

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004326600A (ja) 2003-04-25 2004-11-18 Fujitsu Ltd Clustering device for structured documents
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
US10503791B2 (en) 2017-09-04 2019-12-10 Borislav Agapiev System for creating a reasoning graph and for ranking of its nodes
KR102019194B1 (ko) 2017-11-22 2019-09-06 주식회사 와이즈넛 System and method for extracting core keywords from a document
US11734328B2 (en) 2018-08-31 2023-08-22 Accenture Global Solutions Limited Artificial intelligence based corpus enrichment for knowledge population and query response
US10607042B1 (en) 2019-02-12 2020-03-31 Live Objects, Inc. Dynamically trained models of named entity recognition over unstructured data
US11914954B2 (en) * 2019-12-08 2024-02-27 Virginia Tech Intellectual Properties, Inc. Methods and systems for generating declarative statements given documents with questions and answers
US11861314B2 (en) * 2020-04-03 2024-01-02 Asapp, Inc. Extracting clinical follow-ups from discharge summaries
US11741146B2 (en) * 2020-07-13 2023-08-29 Nec Corporation Embedding multi-modal time series and text data
US20220093088A1 (en) * 2020-09-24 2022-03-24 Apple Inc. Contextual sentence embeddings for natural language processing applications
CN113011169B (zh) * 2021-01-27 2022-11-11 北京字跳网络技术有限公司 Method, apparatus, device, and medium for processing meeting minutes

Also Published As

Publication number Publication date
JP2024540111A (ja) 2024-10-31
US20250307566A1 (en) 2025-10-02
GB202405984D0 (en) 2024-06-12
KR20240091051A (ko) 2024-06-21
US20230139397A1 (en) 2023-05-04
GB2627092A (en) 2024-08-14
US12367352B2 (en) 2025-07-22
WO2023076754A1 (en) 2023-05-04

Similar Documents

Publication Publication Date Title
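CN114424185B (zh) Stop word data augmentation for natural language processing
CN116724305B (zh) Integration of context tags with named entity recognition models
CN115398437B (zh) Improved out-of-domain (OOD) detection techniques
CN116802629B (zh) Multi-factor modelling for natural language processing
CN115398436B (zh) Noise data augmentation for natural language processing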
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
CN115917553A (zh) Entity-level data augmentation for robust named entity recognition in chatbots
US12153885B2 (en) Multi-feature balancing for natural language processors
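CN116635862A (zh) Out-of-domain data augmentation for natural language processing
KR102821062B1 (ko) System and techniques for handling long text for pre-trained language models
CN116615727A (zh) Keyword data augmentation tool for natural language processing
CN118215920A (zh) Wide and deep network for language detection using hash embeddings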
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data
CN119768794A (zh) Adaptive training data augmentation to facilitate training of named entity recognition models
WO2023091436A1 (en) System and techniques for handling long text for pre-trained language models
CN121773407A (zh) Techniques for transforming natural language conversation into a visual representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination