GB2625485A - Wide and deep network for language detection using hash embeddings

Wide and deep network for language detection using hash embeddings

Info

Publication number
GB2625485A
Authority
GB
United Kingdom
Prior art keywords
gram
vector
embedding
grams
sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2404718.5A
Other languages
English (en)
Other versions
GB202404718D0 (en)
Inventor
Tien Vu Thanh
Zaremoodi Poorya
Vu Duy
Edward Johnson Mark
Zhong Xu
Blinov Vladislav
Duy Vu Hoang Cong
Hong Yu-Heng
Goel Vinamr
Victor Orgren Philip
Phani Kumar Gadde Srinivasa
Vishnoi Vishal
Long Duong Thanh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-08
Filing date
2022-11-07
Publication date
2024-06-19
Application filed by Oracle International Corp
Publication of GB202404718D0
Publication of GB2625485A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Error Detection And Correction (AREA)
GB2404718.5A 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings Pending GB2625485A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202163263728P 2021-11-08 2021-11-08
US18/052,694 US12602545B2 (en) 2021-11-08 2022-11-04 Wide and deep network for language detection using hash embeddings
PCT/US2022/049164 WO2023081483A1 (en) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Publications (2)

Publication Number Publication Date
GB202404718D0 (en) 2024-05-15
GB2625485A (en) 2024-06-19

Family

ID=86230305

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2404718.5A Pending GB2625485A (en) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Country Status (6)

Country Link
US (1) US12602545B2
JP (1) JP2024540387A
KR (1) KR20240096829A
CN (1) CN118215920A
GB (1) GB2625485A
WO (1) WO2023081483A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12512187B2 (en) * 2023-02-02 2025-12-30 Tempus Ai, Inc. Sparse N-gram modeling for patient-entity relation extraction
US12614234B2 (en) 2023-02-20 2026-04-28 State Farm Mutual Automobile Insurance Ground truth insurance database
US12332928B2 (en) 2023-02-24 2025-06-17 State Farm Mutual Automobile Insurance Company Systems and methods for analysis of user telematics data using generative AI
US12400283B2 (en) 2023-04-03 2025-08-26 State Farm Mutual Automobile Insurance Company Artificial intelligence for flood monitoring and insurance claim filing
US12248993B2 (en) 2023-06-06 2025-03-11 State Farm Mutual Automobile Insurance Company Chatbot for reviewing social media
US20240427990A1 (en) * 2023-06-20 2024-12-26 Nvidia Corporation Text normalization and inverse text normalization for multi-lingual language models

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300691A1 (en) * 2014-09-24 2017-10-19 Jason R. Upchurch Technologies for software basic block similarity analysis
US20190019503A1 (en) * 2016-12-19 2019-01-17 Asapp, Inc. Word hash language model
CN110955745A (zh) * 2019-10-16 2020-04-03 宁波大学 Text hash retrieval method based on deep learning
US20210294781A1 (en) * 2020-03-23 2021-09-23 Sorcero, Inc. Feature engineering with question generation

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4962483B2 (ja) * 2008-12-19 2012-06-27 日本電気株式会社 Information processing device
US9111095B2 (en) * 2012-08-29 2015-08-18 The Johns Hopkins University Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams
US9483768B2 (en) * 2014-08-11 2016-11-01 24/7 Customer, Inc. Methods and apparatuses for modeling customer interaction experiences
WO2018071594A1 (en) * 2016-10-11 2018-04-19 Talla, Inc. Systems, apparatus, and methods for platform-agnostic message processing
US10984340B2 (en) * 2017-03-31 2021-04-20 Intuit Inc. Composite machine-learning system for label prediction and training data collection
US10963273B2 (en) * 2018-04-20 2021-03-30 Facebook, Inc. Generating personalized content summaries for users
US11106873B2 (en) * 2019-01-22 2021-08-31 Sap Se Context-based translation retrieval via multilingual space
US20210042800A1 (en) * 2019-08-06 2021-02-11 Hewlett Packard Enterprise Development Lp Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data
US11741306B2 (en) * 2019-12-18 2023-08-29 Microsoft Technology Licensing, Llc Controllable grounded text generation
US10997179B1 (en) * 2019-12-26 2021-05-04 Snowflake Inc. Pruning index for optimization of pattern matching queries
US10909461B1 (en) * 2020-05-08 2021-02-02 Google Llc Attention neural networks with locality-sensitive hashing
CN114254660A (zh) * 2020-09-22 2022-03-29 北京三星通信技术研究有限公司 Multimodal translation method, apparatus, electronic device, and computer-readable storage medium
US11875128B2 (en) * 2021-06-28 2024-01-16 Ada Support Inc. Method and system for generating an intent classifier

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAOJIN ZHANG et al., "Model Size Reduction Using Frequency Based Double Hashing for Recommender Systems", arXiv:2007.14523, July 2020, pp. 1-8 [retrieved on 2023.03.23]. Retrieved from <https://arxiv.org/abs/2007.14523> pages 1-7 (20200 *

Also Published As

Publication number Publication date
WO2023081483A1 (en) 2023-05-11
JP2024540387A (ja) 2024-10-31
GB202404718D0 (en) 2024-05-15
US20230141853A1 (en) 2023-05-11
US12602545B2 (en) 2026-04-14
KR20240096829A (ko) 2024-06-26
CN118215920A (zh) 2024-06-18

Similar Documents

Publication Publication Date Title
GB2625485A (en) Wide and deep network for language detection using hash embeddings
US20220292356A1 (en) Systems and methods of training neural networks against adversarial attacks
US11620515B2 (en) Multi-task knowledge distillation for language model
CN111699498B (zh) Multitask learning as question answering
WO2019222206A1 (en) Multitask learning as question answering
CN115700515B (zh) Text multi-label classification method and device
CN112216273A (zh) Adversarial example attack method for speech keyword classification networks
Eom et al. Anti-spoofing using transfer learning with variational information bottleneck
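CN112364945B (zh) Meta-knowledge fine-tuning method and platform based on domain-invariant features
JP2024540387A5
CN106547735A (zh) Method for constructing and using context-aware dynamic word or character vectors based on deep learning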
WO2022177581A1 (en) Improved two-stage machine learning for imbalanced datasets
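CN116226357B (zh) Document retrieval method for scenarios where the input contains erroneous information
CN110647919A (zh) Text clustering method and system based on k-means clustering and capsule networks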
Chen et al. Generalized Correntropy based deep learning in presence of non-Gaussian noises
WO2019167296A1 (ja) Device, method, and program for natural language processing
KR20240172939A (ko) Method and apparatus for training a deepfake voice detection model using knowledge distillation
Song et al. Error-correcting output codes with ensemble diversity for robust learning in neural networks
CN114065771A (zh) Pre-trained language processing method and device
Wang et al. Task-adaptive unbiased regularization meta-learning for few-shot cross-domain fault diagnosis
JP6612716B2 (ja) Pattern identification device, pattern identification method, and program
US20230108177A1 (en) Hardware-Aware Progressive Training Of Machine Learning Models
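WO2021064856A1 (ja) Robust learning device, robust learning method, program, and storage device
CN118694641B (zh) Class-incremental modulation recognition method based on adaptive feature distillation and prototype replay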
WO2025080326A1 (en) Efficient decoding using large and small generative artificial intelligence models