GB2625485A - Wide and deep network for language detection using hash embeddings - Google Patents
Wide and deep network for language detection using hash embeddings
- Publication number
- GB2625485A
- Authority
- GB
- United Kingdom
- Prior art keywords
- gram
- vector
- embedding
- grams
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/263—Language identification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Error Detection And Correction (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163263728P | 2021-11-08 | 2021-11-08 | |
| US18/052,694 US12602545B2 (en) | 2021-11-08 | 2022-11-04 | Wide and deep network for language detection using hash embeddings |
| PCT/US2022/049164 WO2023081483A1 (en) | 2021-11-08 | 2022-11-07 | Wide and deep network for language detection using hash embeddings |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202404718D0 (en) | 2024-05-15 |
| GB2625485A (en) | 2024-06-19 |
Family
ID=86230305
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2404718.5A (Pending), published as GB2625485A (en) | 2021-11-08 | 2022-11-07 | Wide and deep network for language detection using hash embeddings |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US12602545B2 |
| JP (1) | JP2024540387A |
| KR (1) | KR20240096829A |
| CN (1) | CN118215920A |
| GB (1) | GB2625485A |
| WO (1) | WO2023081483A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12512187B2 (en) * | 2023-02-02 | 2025-12-30 | Tempus Ai, Inc. | Sparse N-gram modeling for patient-entity relation extraction |
| US12614234B2 (en) | 2023-02-20 | 2026-04-28 | State Farm Mutual Automobile Insurance | Ground truth insurance database |
| US12332928B2 (en) | 2023-02-24 | 2025-06-17 | State Farm Mutual Automobile Insurance Company | Systems and methods for analysis of user telematics data using generative AI |
| US12400283B2 (en) | 2023-04-03 | 2025-08-26 | State Farm Mutual Automobile Insurance Company | Artificial intelligence for flood monitoring and insurance claim filing |
| US12248993B2 (en) | 2023-06-06 | 2025-03-11 | State Farm Mutual Automobile Insurance Company | Chatbot for reviewing social media |
| US20240427990A1 (en) * | 2023-06-20 | 2024-12-26 | Nvidia Corporation | Text normalization and inverse text normalization for multi-lingual language models |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170300691A1 (en) * | 2014-09-24 | 2017-10-19 | Jason R. Upchurch | Technologies for software basic block similarity analysis |
| US20190019503A1 (en) * | 2016-12-19 | 2019-01-17 | Asapp, Inc. | Word hash language model |
| CN110955745A (zh) * | 2019-10-16 | 2020-04-03 | Ningbo University | Text hash retrieval method based on deep learning |
| US20210294781A1 (en) * | 2020-03-23 | 2021-09-23 | Sorcero, Inc. | Feature engineering with question generation |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4962483B2 (ja) * | 2008-12-19 | 2012-06-27 | NEC Corporation | Information processing device |
| US9111095B2 (en) * | 2012-08-29 | 2015-08-18 | The Johns Hopkins University | Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams |
| US9483768B2 (en) * | 2014-08-11 | 2016-11-01 | 24/7 Customer, Inc. | Methods and apparatuses for modeling customer interaction experiences |
| WO2018071594A1 (en) * | 2016-10-11 | 2018-04-19 | Talla, Inc. | Systems, apparatus, and methods for platform-agnostic message processing |
| US10984340B2 (en) * | 2017-03-31 | 2021-04-20 | Intuit Inc. | Composite machine-learning system for label prediction and training data collection |
| US10963273B2 (en) * | 2018-04-20 | 2021-03-30 | Facebook, Inc. | Generating personalized content summaries for users |
| US11106873B2 (en) * | 2019-01-22 | 2021-08-31 | Sap Se | Context-based translation retrieval via multilingual space |
| US20210042800A1 (en) * | 2019-08-06 | 2021-02-11 | Hewlett Packard Enterprise Development Lp | Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data |
| US11741306B2 (en) * | 2019-12-18 | 2023-08-29 | Microsoft Technology Licensing, Llc | Controllable grounded text generation |
| US10997179B1 (en) * | 2019-12-26 | 2021-05-04 | Snowflake Inc. | Pruning index for optimization of pattern matching queries |
| US10909461B1 (en) * | 2020-05-08 | 2021-02-02 | Google Llc | Attention neural networks with locality-sensitive hashing |
| CN114254660A (zh) * | 2020-09-22 | 2022-03-29 | Beijing Samsung Telecom R&D Center | Multimodal translation method, apparatus, electronic device, and computer-readable storage medium |
| US11875128B2 (en) * | 2021-06-28 | 2024-01-16 | Ada Support Inc. | Method and system for generating an intent classifier |
2022
- 2022-11-04: US application US18/052,694, publication US12602545B2 (en), status Active
- 2022-11-07: JP application JP2024526927A, publication JP2024540387A (ja), status Pending
- 2022-11-07: GB application GB2404718.5A, publication GB2625485A (en), status Pending
- 2022-11-07: CN application CN202280074232.4A, publication CN118215920A (zh), status Pending
- 2022-11-07: KR application KR1020247019170A, publication KR20240096829A (ko), status Pending
- 2022-11-07: WO application PCT/US2022/049164, publication WO2023081483A1 (en), status Ceased
Non-Patent Citations (1)
| Title |
|---|
| CAOJIN ZHANG et al; "Model Size Reduction Using Frequency Based Double Hashing for Recommender Systems", arXiv:2007.14523, July 2020, pp.1-8 [retrieved on 2023.03.23]. Retrieved from <https://arxiv.org/abs/2007.14523> pages 1-7 (20200 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023081483A1 (en) | 2023-05-11 |
| JP2024540387A (ja) | 2024-10-31 |
| GB202404718D0 (en) | 2024-05-15 |
| US20230141853A1 (en) | 2023-05-11 |
| US12602545B2 (en) | 2026-04-14 |
| KR20240096829A (ko) | 2024-06-26 |
| CN118215920A (zh) | 2024-06-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| GB2625485A (en) | Wide and deep network for language detection using hash embeddings | |
| US20220292356A1 (en) | Systems and methods of training neural networks against adversarial attacks | |
| US11620515B2 (en) | Multi-task knowledge distillation for language model | |
| CN111699498B (zh) | Multitask learning as question answering | |
| WO2019222206A1 (en) | Multitask learning as question answering | |
| CN115700515B (zh) | Text multi-label classification method and device | |
| CN112216273A (zh) | Adversarial example attack method for speech keyword classification networks | |
| Eom et al. | Anti-spoofing using transfer learning with variational information bottleneck | |
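| CN112364945B (zh) | Meta-knowledge fine-tuning method and platform based on domain-invariant features | |
| JP2024540387A5 | | |
| CN106547735A (zh) | Method for constructing and using context-aware dynamic word or character vectors based on deep learning | |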
| WO2022177581A1 (en) | Improved two-stage machine learning for imbalanced datasets | |
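| CN116226357B (zh) | Document retrieval method for scenarios where the input contains erroneous information | |
| CN110647919A (zh) | Text clustering method and system based on k-means clustering and capsule networks | |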
| Chen et al. | Generalized Correntropy based deep learning in presence of non-Gaussian noises | |
| WO2019167296A1 (ja) | Device, method, and program for natural language processing | |
| KR20240172939A (ko) | Method and apparatus for training a deepfake voice detection model using knowledge distillation | |
| Song et al. | Error-correcting output codes with ensemble diversity for robust learning in neural networks | |
| CN114065771A (zh) | Pre-trained language processing method and device | |
| Wang et al. | Task-adaptive unbiased regularization meta-learning for few-shot cross-domain fault diagnosis | |
| JP6612716B2 (ja) | Pattern identification device, pattern identification method, and program | |
| US20230108177A1 (en) | Hardware-Aware Progressive Training Of Machine Learning Models | |
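| WO2021064856A1 (ja) | Robust learning device, robust learning method, program, and storage device | |
| CN118694641B (zh) | Class-incremental modulation recognition method based on adaptive feature distillation and prototype replay | |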
| WO2025080326A1 (en) | Efficient decoding using large and small generative artificial intelligence models |