KR20240096829A - Wide and deep network for language detection using hash embeddings - Google Patents
Wide and deep network for language detection using hash embeddings
- Publication number
- KR20240096829A
- Authority
- KR
- South Korea
- Prior art keywords
- gram
- vector
- grams
- embedding
- bot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/263—Language identification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Error Detection And Correction (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163263728P (US63/263,728) | 2021-11-08 | 2021-11-08 | |
| US18/052,694 (US12602545B2) | 2021-11-08 | 2022-11-04 | Wide and deep network for language detection using hash embeddings |
| PCT/US2022/049164 (WO2023081483A1) | 2021-11-08 | 2022-11-07 | Wide and deep network for language detection using hash embeddings |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| KR20240096829A (ko) | 2024-06-26 |
Family
ID=86230305
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020247019170A (Pending; published as KR20240096829A) | 2021-11-08 | 2022-11-07 | Wide and deep network for language detection using hash embeddings |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US12602545B2 |
| JP (1) | JP2024540387A |
| KR (1) | KR20240096829A |
| CN (1) | CN118215920A |
| GB (1) | GB2625485A |
| WO (1) | WO2023081483A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12512187B2 (en) * | 2023-02-02 | 2025-12-30 | Tempus Ai, Inc. | Sparse N-gram modeling for patient-entity relation extraction |
| US12614234B2 (en) | 2023-02-20 | 2026-04-28 | State Farm Mutual Automobile Insurance | Ground truth insurance database |
| US12332928B2 (en) | 2023-02-24 | 2025-06-17 | State Farm Mutual Automobile Insurance Company | Systems and methods for analysis of user telematics data using generative AI |
| US12400283B2 (en) | 2023-04-03 | 2025-08-26 | State Farm Mutual Automobile Insurance Company | Artificial intelligence for flood monitoring and insurance claim filing |
| US12248993B2 (en) | 2023-06-06 | 2025-03-11 | State Farm Mutual Automobile Insurance Company | Chatbot for reviewing social media |
| US20240427990A1 (en) * | 2023-06-20 | 2024-12-26 | Nvidia Corporation | Text normalization and inverse text normalization for multi-lingual language models |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4962483B2 (ja) * | 2008-12-19 | 2012-06-27 | 日本電気株式会社 | Information processing device |
| US9111095B2 (en) * | 2012-08-29 | 2015-08-18 | The Johns Hopkins University | Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams |
| US9483768B2 (en) * | 2014-08-11 | 2016-11-01 | 24/7 Customer, Inc. | Methods and apparatuses for modeling customer interaction experiences |
| US10043009B2 (en) | 2014-09-24 | 2018-08-07 | Intel Corporation | Technologies for software basic block similarity analysis |
| WO2018071594A1 (en) * | 2016-10-11 | 2018-04-19 | Talla, Inc. | Systems, apparatus, and methods for platform-agnostic message processing |
| US10109275B2 (en) | 2016-12-19 | 2018-10-23 | Asapp, Inc. | Word hash language model |
| US10984340B2 (en) * | 2017-03-31 | 2021-04-20 | Intuit Inc. | Composite machine-learning system for label prediction and training data collection |
| US10963273B2 (en) * | 2018-04-20 | 2021-03-30 | Facebook, Inc. | Generating personalized content summaries for users |
| US11106873B2 (en) * | 2019-01-22 | 2021-08-31 | Sap Se | Context-based translation retrieval via multilingual space |
| US20210042800A1 (en) * | 2019-08-06 | 2021-02-11 | Hewlett Packard Enterprise Development Lp | Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data |
| CN110955745B (zh) | 2019-10-16 | 2022-04-01 | 宁波大学 | Text hash retrieval method based on deep learning |
| US11741306B2 (en) * | 2019-12-18 | 2023-08-29 | Microsoft Technology Licensing, Llc | Controllable grounded text generation |
| US10997179B1 (en) * | 2019-12-26 | 2021-05-04 | Snowflake Inc. | Pruning index for optimization of pattern matching queries |
| WO2021195130A1 (en) * | 2020-03-23 | 2021-09-30 | Sorcero, Inc. | Cross-context natural language model generation |
| US10909461B1 (en) * | 2020-05-08 | 2021-02-02 | Google Llc | Attention neural networks with locality-sensitive hashing |
| CN114254660A (zh) * | 2020-09-22 | 2022-03-29 | 北京三星通信技术研究有限公司 | Multimodal translation method and apparatus, electronic device, and computer-readable storage medium |
| US11875128B2 (en) * | 2021-06-28 | 2024-01-16 | Ada Support Inc. | Method and system for generating an intent classifier |
-
2022
- 2022-11-04 US US18/052,694 patent/US12602545B2/en active Active
- 2022-11-07 JP JP2024526927A patent/JP2024540387A/ja active Pending
- 2022-11-07 GB GB2404718.5A patent/GB2625485A/en active Pending
- 2022-11-07 CN CN202280074232.4A patent/CN118215920A/zh active Pending
- 2022-11-07 KR KR1020247019170A patent/KR20240096829A/ko active Pending
- 2022-11-07 WO PCT/US2022/049164 patent/WO2023081483A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023081483A1 (en) | 2023-05-11 |
| JP2024540387A (ja) | 2024-10-31 |
| GB202404718D0 (en) | 2024-05-15 |
| US20230141853A1 (en) | 2023-05-11 |
| GB2625485A (en) | 2024-06-19 |
| US12602545B2 (en) | 2026-04-14 |
| CN118215920A (zh) | 2024-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7682202B2 (ja) | | Improved techniques for out-of-domain (OOD) detection |
| US12361219B2 (en) | | Context tag integration with named entity recognition models |
| JP7561836B2 (ja) | | Stop word data augmentation for natural language processing |
| US20220230000A1 (en) | | Multi-factor modelling for natural language processing |
| US12217497B2 (en) | | Extracting key information from document using trained machine-learning models |
| KR20240089615A (ko) | | Fine-tuning of a multi-head network from a single transformer layer of a pre-trained language model |
| US12153885B2 (en) | | Multi-feature balancing for natural language processors |
| EP4128010A1 (en) | | Noise data augmentation for natural language processing |
| US12367352B2 (en) | | Deep learning techniques for extraction of embedded data from documents |
| US12602545B2 (en) | | Wide and deep network for language detection using hash embeddings |
| US12412563B2 (en) | | Path dropout for natural language processing |
| JP2024518416A (ja) | | Variant inconsistency attack (VIA) as a simple and effective adversarial attack method |
| KR102821062B1 (ko) | | Systems and techniques for handling long text for pre-trained language models |
| US12572852B2 (en) | | Lexical dropout for natural language processing |
| US20250225129A1 (en) | | Techniques for efficient encoding in neural semantic parsing systems |
| US20240028963A1 (en) | | Methods and systems for augmentation and feature cache |
| US12412043B2 (en) | | Rule-based techniques for extraction of question and answer pairs from data |
| US20240169161A1 (en) | | Automating large-scale data collection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PA0105 | International application | Patent event date: 20240607; Patent event code: PA01051R01D; Comment text: International Patent Application |
| | PG1501 | Laying open of application | |