KR20240091051A - Deep learning techniques for extraction of embedded data from documents - Google Patents

Deep learning techniques for extraction of embedded data from documents

Info

Publication number
KR20240091051A
Authority
KR
South Korea
Prior art keywords
text
data
sub
embeddings
bot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
KR1020247017614A
Other languages
English (en)
Korean (ko)
Inventor
Xu Zhong
Yakupitiyage Don Thanuja Samodhye Dharmasiri
Thanh Long Duong
Mark Edward Johnson
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corporation
Publication of KR20240091051A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
KR1020247017614A 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents Pending KR20240091051A (ko)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163273761P 2021-10-29 2021-10-29
US63/273,761 2021-10-29
US17/819,445 US12367352B2 (en) 2021-10-29 2022-08-12 Deep learning techniques for extraction of embedded data from documents
US17/819,445 2022-08-12
PCT/US2022/074974 WO2023076754A1 (en) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Publications (1)

Publication Number Publication Date
KR20240091051A (ko) 2024-06-21

Family

ID=86147364

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020247017614A Pending KR20240091051A (ko) 2021-10-29 2022-08-15 Deep learning techniques for extraction of embedded data from documents

Country Status (6)

Country Link
US (2) US12367352B2
JP (1) JP2024540111A
KR (1) KR20240091051A
CN (1) CN118202344A
GB (1) GB2627092A
WO (1) WO2023076754A1

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12158900B2 (en) * 2022-10-28 2024-12-03 Abbyy Development Inc. Extracting information from documents using automatic markup based on historical data
US12315052B2 (en) * 2022-12-15 2025-05-27 Accenture Global Solutions Limited Generation of context-aware word embedding vectors for given semantic properties of a word using few texts
US12314318B2 (en) * 2023-02-17 2025-05-27 Snowflake Inc. Enhanced searching using fine-tuned machine learning models
US12562163B2 (en) * 2023-05-12 2026-02-24 Servicenow, Inc. Bidirectional assistant for development platforms
US11928569B1 (en) * 2023-06-30 2024-03-12 Intuit, Inc. Automated user experience orchestration using natural language based machine learning techniques
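CN116561602B (zh) * 2023-07-10 2023-09-19 Three Gorges High-Tech Information Technology Co., Ltd. Method for automatic matching of sales and procurement materials for sales cost carry-forward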
US12277150B2 (en) * 2023-07-20 2025-04-15 Quantem Healthcare, Inc. Computing technologies for hierarchies of chatbot application programs operative based on data structures containing unstructured texts
CN117097790A (zh) * 2023-08-08 2023-11-21 Beijing Zitiao Network Technology Co., Ltd. Information push method, apparatus, computer device, and storage medium
US20250371272A1 (en) * 2024-06-04 2025-12-04 Optum, Inc. Modified large language model architecture with span-level attention mechanism for conversion of natural language text to structured knowledge graph

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004326600A (ja) 2003-04-25 2004-11-18 Fujitsu Ltd Clustering device for structured documents
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
US10503791B2 (en) 2017-09-04 2019-12-10 Borislav Agapiev System for creating a reasoning graph and for ranking of its nodes
KR102019194B1 (ko) 2017-11-22 2019-09-06 Wisenut Inc. System and method for extracting core keywords in a document
US11734328B2 (en) 2018-08-31 2023-08-22 Accenture Global Solutions Limited Artificial intelligence based corpus enrichment for knowledge population and query response
US10607042B1 (en) 2019-02-12 2020-03-31 Live Objects, Inc. Dynamically trained models of named entity recognition over unstructured data
US11914954B2 (en) * 2019-12-08 2024-02-27 Virginia Tech Intellectual Properties, Inc. Methods and systems for generating declarative statements given documents with questions and answers
US11861314B2 (en) * 2020-04-03 2024-01-02 Asapp, Inc. Extracting clinical follow-ups from discharge summaries
US11741146B2 (en) * 2020-07-13 2023-08-29 Nec Corporation Embedding multi-modal time series and text data
US20220093088A1 (en) * 2020-09-24 2022-03-24 Apple Inc. Contextual sentence embeddings for natural language processing applications
CN113011169B (zh) * 2021-01-27 2022-11-11 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, and medium for processing meeting minutes

Also Published As

Publication number Publication date
JP2024540111A (ja) 2024-10-31
US20250307566A1 (en) 2025-10-02
GB202405984D0 (en) 2024-06-12
US20230139397A1 (en) 2023-05-04
GB2627092A (en) 2024-08-14
US12367352B2 (en) 2025-07-22
WO2023076754A1 (en) 2023-05-04
CN118202344A (zh) 2024-06-14

Similar Documents

Publication Publication Date Title
JP7682202B2 (ja) Improved techniques for out-of-domain (OOD) detection
JP7561836B2 (ja) Stop word data augmentation for natural language processing
US12099816B2 (en) Multi-factor modelling for natural language processing
US12217497B2 (en) Extracting key information from document using trained machine-learning models
US20220058347A1 (en) Techniques for providing explanations for text classification
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
US20250094725A1 (en) Digital assistant using generative artificial intelligence
JP7771196B2 (ja) Multi-feature balancing for natural language processors
KR20240089615A (ko) Fine-tuning of a multi-head network from a single transformer layer of a pre-trained language model
US12412563B2 (en) Path dropout for natural language processing
KR102821062B1 (ko) System and techniques for handling long text for pre-trained language models
US12572852B2 (en) Lexical dropout for natural language processing
US12112560B2 (en) Usage based resource utilization of training pool for chatbots
US12374322B2 (en) Adjusting outlier data points for training a machine-learning model
KR20240096829A (ko) Wide and deep network for language detection using hash embeddings
US20230136965A1 (en) Prohibiting inconsistent named entity recognition tag sequences
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data
WO2025058830A1 (en) Digital assistant using generative artificial intelligence
WO2023091436A1 (en) System and techniques for handling long text for pre-trained language models

Legal Events

Date Code Title Description
PA0105 International application

St.27 status event code: A-0-1-A10-A15-nap-PA0105

PG1501 Laying open of application

St.27 status event code: A-1-1-Q10-Q12-nap-PG1501

D11 Substantive examination requested

Free format text: St.27 status event code: A-1-2-D10-D11-exm-PA0201 (as provided by the national office)

PA0201 Request for examination

St.27 status event code: A-1-2-D10-D11-exm-PA0201