CN118215920A - Wide and deep network for language detection using hash embeddings - Google Patents

Wide and deep network for language detection using hash embeddings

Info

Publication number
CN118215920A
Authority
CN
China
Prior art keywords
gram
vector
obtaining
robot
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280074232.4A
Other languages
English (en)
Chinese (zh)
Inventor
T·T·乌
P·扎雷穆迪
D·武
M·E·约翰逊
钟旭
V·布利诺夫
C·D·V·黄
洪宇衡
V·戈尔
P·V·奥格伦
S·P·K·加德
V·比什诺伊
T·L·杜翁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-08
Filing date
2022-11-07
Publication date
2024-06-18
Application filed by Oracle International Corp
Publication of CN118215920A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Error Detection And Correction (AREA)
CN202280074232.4A 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings Pending CN118215920A (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/263,728 (US202163263728P) 2021-11-08 2021-11-08
US18/052,694 US12602545B2 (en) 2021-11-08 2022-11-04 Wide and deep network for language detection using hash embeddings
PCT/US2022/049164 WO2023081483A1 (en) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Publications (1)

Publication Number Publication Date
CN118215920A (zh) 2024-06-18

Family

ID=86230305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280074232.4A Pending CN118215920A (zh) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Country Status (6)

Country Link
US (1) US12602545B2
JP (1) JP2024540387A
KR (1) KR20240096829A
CN (1) CN118215920A
GB (1) GB2625485A
WO (1) WO2023081483A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12512187B2 (en) * 2023-02-02 2025-12-30 Tempus Ai, Inc. Sparse N-gram modeling for patient-entity relation extraction
US12614234B2 (en) 2023-02-20 2026-04-28 State Farm Mutual Automobile Insurance Ground truth insurance database
US12332928B2 (en) 2023-02-24 2025-06-17 State Farm Mutual Automobile Insurance Company Systems and methods for analysis of user telematics data using generative AI
US12400283B2 (en) 2023-04-03 2025-08-26 State Farm Mutual Automobile Insurance Company Artificial intelligence for flood monitoring and insurance claim filing
US12248993B2 (en) 2023-06-06 2025-03-11 State Farm Mutual Automobile Insurance Company Chatbot for reviewing social media
US20240427990A1 (en) * 2023-06-20 2024-12-26 Nvidia Corporation Text normalization and inverse text normalization for multi-lingual language models

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4962483B2 (ja) * 2008-12-19 2012-06-27 NEC Corporation Information processing device
US9111095B2 (en) * 2012-08-29 2015-08-18 The Johns Hopkins University Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams
US9483768B2 (en) * 2014-08-11 2016-11-01 24/7 Customer, Inc. Methods and apparatuses for modeling customer interaction experiences
US10043009B2 (en) 2014-09-24 2018-08-07 Intel Corporation Technologies for software basic block similarity analysis
WO2018071594A1 (en) * 2016-10-11 2018-04-19 Talla, Inc. Systems, apparatus, and methods for platform-agnostic message processing
US10109275B2 (en) 2016-12-19 2018-10-23 Asapp, Inc. Word hash language model
US10984340B2 (en) * 2017-03-31 2021-04-20 Intuit Inc. Composite machine-learning system for label prediction and training data collection
US10963273B2 (en) * 2018-04-20 2021-03-30 Facebook, Inc. Generating personalized content summaries for users
US11106873B2 (en) * 2019-01-22 2021-08-31 Sap Se Context-based translation retrieval via multilingual space
US20210042800A1 (en) * 2019-08-06 2021-02-11 Hewlett Packard Enterprise Development Lp Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data
CN110955745B (zh) 2019-10-16 2022-04-01 Ningbo University Text hash retrieval method based on deep learning
US11741306B2 (en) * 2019-12-18 2023-08-29 Microsoft Technology Licensing, Llc Controllable grounded text generation
US10997179B1 (en) * 2019-12-26 2021-05-04 Snowflake Inc. Pruning index for optimization of pattern matching queries
WO2021195130A1 (en) * 2020-03-23 2021-09-30 Sorcero, Inc. Cross-context natural language model generation
US10909461B1 (en) * 2020-05-08 2021-02-02 Google Llc Attention neural networks with locality-sensitive hashing
CN114254660A (zh) * 2020-09-22 2022-03-29 Beijing Samsung Telecom R&D Center Multimodal translation method and apparatus, electronic device, and computer-readable storage medium
US11875128B2 (en) * 2021-06-28 2024-01-16 Ada Support Inc. Method and system for generating an intent classifier

Also Published As

Publication number Publication date
WO2023081483A1 (en) 2023-05-11
JP2024540387A (ja) 2024-10-31
GB202404718D0 (en) 2024-05-15
US20230141853A1 (en) 2023-05-11
GB2625485A (en) 2024-06-19
US12602545B2 (en) 2026-04-14
KR20240096829A (ko) 2024-06-26

Similar Documents

Publication Publication Date Title
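CN116724305B (zh) Integration of context tags with named entity recognition models
CN114424185B (zh) Stop word data augmentation for natural language processing
CN115398436B (zh) Noise data augmentation for natural language processing
CN115398437B (zh) Improved out-of-domain (OOD) detection techniques
CN116802629B (zh) Multi-factor modeling for natural language processing
CN116583837B (zh) Distance-based logit values for natural language processing
CN116547676B (zh) Enhanced logits for natural language processing
US12153885B2 (en) Multi-feature balancing for natural language processors
CN116635862A (zh) Out-of-domain data augmentation for natural language processing
CN118140230A (zh) Fine-tuning a multi-head network of a single transformer layer of a pre-trained language model
US12602545B2 (en) Wide and deep network for language detection using hash embeddings
CN116615727A (zh) Keyword data augmentation tool for natural language processing
CN118265981B (zh) Systems and techniques for handling long text for pre-trained language models
CN118202344A (zh) Deep learning techniques for extracting embedded data from documents
CN119183573A (zh) Entity-aware data augmentation techniques
US20250225129A1 (en) Techniques for efficient encoding in neural semantic parsing systems
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data
CN119768794A (zh) Adaptive training data augmentation to facilitate training of named entity recognition models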

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination