KR20240096829A - Wide and deep network for language detection using hash embeddings - Google Patents

Wide and deep network for language detection using hash embeddings

Info

Publication number
KR20240096829A
Authority
KR
South Korea
Prior art keywords
gram
vector
grams
embedding
bot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
KR1020247019170A
Other languages
English (en)
Korean (ko)
Inventor
Thanh Tien Vu
Poorya Zaremoodi
Duy Vu
Mark Edward Johnson
Xu Zhong
Vladislav Blinov
Cong Duy Vu Hoang
Yu-Heng Hong
Vinamr Goel
Philip Victor Ogren
Srinivasa Phani Kumar Gadde
Vishal Vishnoi
Thanh Long Duong
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corporation
Publication of KR20240096829A

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/316: Indexing structures
    • G06F16/325: Hash tables
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/263: Language identification
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Error Detection And Correction (AREA)
KR1020247019170A 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings Pending KR20240096829A (ko)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163263728P 2021-11-08 2021-11-08
US63/263,728 2021-11-08
US18/052,694 US12602545B2 (en) 2021-11-08 2022-11-04 Wide and deep network for language detection using hash embeddings
US18/052,694 2022-11-04
PCT/US2022/049164 WO2023081483A1 (en) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Publications (1)

Publication Number Publication Date
KR20240096829A (ko) 2024-06-26

Family

ID=86230305

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020247019170A Pending KR20240096829A (ko) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Country Status (6)

Country Link
US (1) US12602545B2
JP (1) JP2024540387A
KR (1) KR20240096829A
CN (1) CN118215920A
GB (1) GB2625485A
WO (1) WO2023081483A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12512187B2 (en) * 2023-02-02 2025-12-30 Tempus Ai, Inc. Sparse N-gram modeling for patient-entity relation extraction
US12614234B2 (en) 2023-02-20 2026-04-28 State Farm Mutual Automobile Insurance Ground truth insurance database
US12332928B2 (en) 2023-02-24 2025-06-17 State Farm Mutual Automobile Insurance Company Systems and methods for analysis of user telematics data using generative AI
US12400283B2 (en) 2023-04-03 2025-08-26 State Farm Mutual Automobile Insurance Company Artificial intelligence for flood monitoring and insurance claim filing
US12248993B2 (en) 2023-06-06 2025-03-11 State Farm Mutual Automobile Insurance Company Chatbot for reviewing social media
US20240427990A1 (en) * 2023-06-20 2024-12-26 Nvidia Corporation Text normalization and inverse text normalization for multi-lingual language models

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4962483B2 (ja) * 2008-12-19 2012-06-27 NEC Corporation Information processing device
US9111095B2 (en) * 2012-08-29 2015-08-18 The Johns Hopkins University Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams
US9483768B2 (en) * 2014-08-11 2016-11-01 24/7 Customer, Inc. Methods and apparatuses for modeling customer interaction experiences
US10043009B2 (en) 2014-09-24 2018-08-07 Intel Corporation Technologies for software basic block similarity analysis
WO2018071594A1 (en) * 2016-10-11 2018-04-19 Talla, Inc. Systems, apparatus, and methods for platform-agnostic message processing
US10109275B2 (en) 2016-12-19 2018-10-23 Asapp, Inc. Word hash language model
US10984340B2 (en) * 2017-03-31 2021-04-20 Intuit Inc. Composite machine-learning system for label prediction and training data collection
US10963273B2 (en) * 2018-04-20 2021-03-30 Facebook, Inc. Generating personalized content summaries for users
US11106873B2 (en) * 2019-01-22 2021-08-31 Sap Se Context-based translation retrieval via multilingual space
US20210042800A1 (en) * 2019-08-06 2021-02-11 Hewlett Packard Enterprise Development Lp Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data
CN110955745B (zh) 2019-10-16 2022-04-01 Ningbo University Text hash retrieval method based on deep learning
US11741306B2 (en) * 2019-12-18 2023-08-29 Microsoft Technology Licensing, Llc Controllable grounded text generation
US10997179B1 (en) * 2019-12-26 2021-05-04 Snowflake Inc. Pruning index for optimization of pattern matching queries
WO2021195130A1 (en) * 2020-03-23 2021-09-30 Sorcero, Inc. Cross-context natural language model generation
US10909461B1 (en) * 2020-05-08 2021-02-02 Google Llc Attention neural networks with locality-sensitive hashing
CN114254660A (zh) * 2020-09-22 2022-03-29 Beijing Samsung Telecom R&D Center Multimodal translation method, apparatus, electronic device, and computer-readable storage medium
US11875128B2 (en) * 2021-06-28 2024-01-16 Ada Support Inc. Method and system for generating an intent classifier

Also Published As

Publication number Publication date
WO2023081483A1 (en) 2023-05-11
JP2024540387A (ja) 2024-10-31
GB202404718D0 (en) 2024-05-15
US20230141853A1 (en) 2023-05-11
GB2625485A (en) 2024-06-19
US12602545B2 (en) 2026-04-14
CN118215920A (zh) 2024-06-18

Similar Documents

Publication Publication Date Title
JP7682202B2 (ja) Improved techniques for out-of-domain (OOD) detection
US12361219B2 (en) Context tag integration with named entity recognition models
JP7561836B2 (ja) Stop word data augmentation for natural language processing
US20220230000A1 (en) Multi-factor modelling for natural language processing
US12217497B2 (en) Extracting key information from document using trained machine-learning models
KR20240089615A (ko) Fine-tuning of a multi-head network from a single transformer layer of a pre-trained language model
US12153885B2 (en) Multi-feature balancing for natural language processors
EP4128010A1 (en) Noise data augmentation for natural language processing
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
US12602545B2 (en) Wide and deep network for language detection using hash embeddings
US12412563B2 (en) Path dropout for natural language processing
JP2024518416A (ja) Variant inconsistency attack (VIA) as a simple and effective adversarial attack method
KR102821062B1 (ko) Systems and techniques for handling long text for pre-trained language models
US12572852B2 (en) Lexical dropout for natural language processing
US20250225129A1 (en) Techniques for efficient encoding in neural semantic parsing systems
US20240028963A1 (en) Methods and systems for augmentation and feature cache
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data
US20240169161A1 (en) Automating large-scale data collection

Legal Events

Date Code Title Description
PA0105 International application

Patent event date: 20240607

Patent event code: PA01051R01D

Comment text: International Patent Application

PG1501 Laying open of application