JP2024540387A - Wide and deep network for language detection using hash embeddings - Google Patents

Wide and deep network for language detection using hash embeddings

Info

Publication number
JP2024540387A
Authority
JP
Japan
Prior art keywords
vector
gram
grams
embedding
obtaining
Prior art date
2021-11-08
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2024526927A
Other languages
English (en)
Japanese (ja)
Other versions
JP2024540387A5
Inventor
ブー,タン・ティエン
ザレムーディ,ポーヤ
ブー,ズイ
ジョンソン,マーク・エドワード
ジョン,シュ
ブリノフ,ブラディスラフ
ホアン,コン・ズイ・ブー
ホン,ユ-ヘン
ゴエル,ビナムル
オグレン,フィリップ・ビクター
ガッデ,シュリニバーサ・ファニ・クマール
ビシュノイ,ビシャル
ドゥオン,タン・ロン
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-08
Filing date
2022-11-07
Publication date
2024-10-31
Application filed by Oracle International Corporation
Publication of JP2024540387A
Publication of JP2024540387A5

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Error Detection And Correction (AREA)
JP2024526927A 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings Pending JP2024540387A (ja)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163263728P 2021-11-08 2021-11-08
US63/263,728 2021-11-08
US18/052,694 US12602545B2 (en) 2021-11-08 2022-11-04 Wide and deep network for language detection using hash embeddings
US18/052,694 2022-11-04
PCT/US2022/049164 WO2023081483A1 (en) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Publications (2)

Publication Number Publication Date
JP2024540387A 2024-10-31
JP2024540387A5 2025-06-11

Family

Family ID: 86230305

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2024526927A Pending JP2024540387A (ja) 2021-11-08 2022-11-07 Wide and deep network for language detection using hash embeddings

Country Status (6)

Country Link
US (1) US12602545B2
JP (1) JP2024540387A
KR (1) KR20240096829A
CN (1) CN118215920A
GB (1) GB2625485A
WO (1) WO2023081483A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12512187B2 (en) * 2023-02-02 2025-12-30 Tempus Ai, Inc. Sparse N-gram modeling for patient-entity relation extraction
US12614234B2 (en) 2023-02-20 2026-04-28 State Farm Mutual Automobile Insurance Ground truth insurance database
US12332928B2 (en) 2023-02-24 2025-06-17 State Farm Mutual Automobile Insurance Company Systems and methods for analysis of user telematics data using generative AI
US12400283B2 (en) 2023-04-03 2025-08-26 State Farm Mutual Automobile Insurance Company Artificial intelligence for flood monitoring and insurance claim filing
US12248993B2 (en) 2023-06-06 2025-03-11 State Farm Mutual Automobile Insurance Company Chatbot for reviewing social media
US20240427990A1 (en) * 2023-06-20 2024-12-26 Nvidia Corporation Text normalization and inverse text normalization for multi-lingual language models

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010146308A (ja) * 2008-12-19 2010-07-01 Nec Corp Information processing device
US20180285773A1 (en) * 2017-03-31 2018-10-04 Intuit Inc. Composite machine-learning system for label prediction and training data collection
US10909461B1 (en) * 2020-05-08 2021-02-02 Google Llc Attention neural networks with locality-sensitive hashing

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9111095B2 (en) * 2012-08-29 2015-08-18 The Johns Hopkins University Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams
US9483768B2 (en) * 2014-08-11 2016-11-01 24/7 Customer, Inc. Methods and apparatuses for modeling customer interaction experiences
US10043009B2 (en) 2014-09-24 2018-08-07 Intel Corporation Technologies for software basic block similarity analysis
WO2018071594A1 (en) * 2016-10-11 2018-04-19 Talla, Inc. Systems, apparatus, and methods for platform-agnostic message processing
US10109275B2 (en) 2016-12-19 2018-10-23 Asapp, Inc. Word hash language model
US10963273B2 (en) * 2018-04-20 2021-03-30 Facebook, Inc. Generating personalized content summaries for users
US11106873B2 (en) * 2019-01-22 2021-08-31 Sap Se Context-based translation retrieval via multilingual space
US20210042800A1 (en) * 2019-08-06 2021-02-11 Hewlett Packard Enterprise Development Lp Systems and methods for predicting and optimizing the probability of an outcome event based on chat communication data
CN110955745B (zh) 2019-10-16 2022-04-01 Ningbo University Text hash retrieval method based on deep learning
US11741306B2 (en) * 2019-12-18 2023-08-29 Microsoft Technology Licensing, Llc Controllable grounded text generation
US10997179B1 (en) * 2019-12-26 2021-05-04 Snowflake Inc. Pruning index for optimization of pattern matching queries
WO2021195130A1 (en) * 2020-03-23 2021-09-30 Sorcero, Inc. Cross-context natural language model generation
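CN114254660A (zh) * 2020-09-22 2022-03-29 Beijing Samsung Communication Technology Research Co., Ltd. Multimodal translation method, apparatus, electronic device, and computer-readable storage medium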
US11875128B2 (en) * 2021-06-28 2024-01-16 Ada Support Inc. Method and system for generating an intent classifier

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SVENSTRUP, DAN ET AL.: "Hash Embeddings for Efficient Word Representations [online]", ARXIV, JPN6026006934, 12 September 2017 (2017-09-12), pages 1 - 9, ISSN: 0005802802 *

Also Published As

Publication number Publication date
WO2023081483A1 (en) 2023-05-11
GB202404718D0 (en) 2024-05-15
US20230141853A1 (en) 2023-05-11
GB2625485A (en) 2024-06-19
US12602545B2 (en) 2026-04-14
KR20240096829A (ko) 2024-06-26
CN118215920A (zh) 2024-06-18

Similar Documents

Publication Publication Date Title
JP7703667B2 (ja) Context tag integration using named entity recognition models
JP7561836B2 (ja) Stop word data augmentation for natural language processing
JP7682202B2 (ja) Improved techniques for out-of-domain (OOD) detection
JP7721559B2 (ja) Noisy data augmentation for natural language processing
US12099816B2 (en) Multi-factor modelling for natural language processing
US12217497B2 (en) Extracting key information from document using trained machine-learning models
JP7771196B2 (ja) Multi-feature balancing for natural language processors
JP2024539003A (ja) Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model
US12367352B2 (en) Deep learning techniques for extraction of embedded data from documents
US12602545B2 (en) Wide and deep network for language detection using hash embeddings
JP2024518416A (ja) Variant inconsistency attack (VIA) as a simple and effective adversarial attack method
JP2023551322A (ja) Keyword data augmentation tool for natural language processing
JP2024543062A (ja) Path dropout for natural language processing
US12572852B2 (en) Lexical dropout for natural language processing
JP2024541762A (ja) Systems and techniques for handling long text for pre-trained language models
US20250225129A1 (en) Techniques for efficient encoding in neural semantic parsing systems
US12412043B2 (en) Rule-based techniques for extraction of question and answer pairs from data

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20250603

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20250603

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20260212

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20260224