JP7721559B2 - Noise data augmentation for natural language processing - Google Patents

Noise data augmentation for natural language processing

Info

Publication number
JP7721559B2
Authority
JP
Japan
Prior art keywords
utterances
utterance
text
intent
skillbot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2022559639A
Other languages
English (en)
Japanese (ja)
Other versions
JP2023519713A (ja)
JP2023519713A5
Inventor
Jalaluddin, Elias Luqman
Vishnoi, Vishal
Johnson, Mark Edward
Duong, Thanh Long
Hong, Yu-Heng
Vinnakota, Balakota Srinivas
Original Assignee
Oracle International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corporation
Publication of JP2023519713A
Publication of JP2023519713A5
Priority to JP2025125324A (JP2025170253A)
Application granted
Publication of JP7721559B2
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/05 Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G10L2015/0633 Creating reference templates; Clustering using lexical or orthographic knowledge sources
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
JP2022559639A 2020-03-30 2020-09-11 Noise data augmentation for natural language processing Active JP7721559B2 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2025125324A JP2025170253A (ja) 2020-03-30 2025-07-28 Noise data augmentation for natural language processing

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063002066P 2020-03-30 2020-03-30
US63/002,066 2020-03-30
PCT/US2020/050342 WO2021201907A1 (en) 2020-03-30 2020-09-11 Noise data augmentation for natural language processing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP2025125324A Division JP2025170253A (ja) 2020-03-30 2025-07-28 Noise data augmentation for natural language processing

Publications (3)

Publication Number Publication Date
JP2023519713A (ja) 2023-05-12
JP2023519713A5 2023-06-15
JP7721559B2 (ja) 2025-08-12

Family

ID=72659890

Family Applications (2)

Application Number Title Priority Date Filing Date
JP2022559639A Active JP7721559B2 (ja) 2020-03-30 2020-09-11 Noise data augmentation for natural language processing
JP2025125324A Pending JP2025170253A (ja) 2020-03-30 2025-07-28 Noise data augmentation for natural language processing

Family Applications After (1)

Application Number Title Priority Date Filing Date
JP2025125324A Pending JP2025170253A (ja) 2020-03-30 2025-07-28 Noise data augmentation for natural language processing

Country Status (5)

Country Link
US (2) US11538457B2
EP (1) EP4128010A1
JP (2) JP7721559B2
CN (1) CN115398436B
WO (1) WO2021201907A1

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118642630A (zh) * 2018-08-21 2024-09-13 Google LLC Method for automated assistant invocation
US11538457B2 (en) * 2020-03-30 2022-12-27 Oracle International Corporation Noise data augmentation for natural language processing
US11556788B2 (en) * 2020-06-15 2023-01-17 International Business Machines Corporation Text-based response environment action selection
US11599721B2 (en) * 2020-08-25 2023-03-07 Salesforce, Inc. Intelligent training set augmentation for natural language processing tasks
DK202170043A1 (en) * 2021-01-29 2022-12-12 A P Moeller Mærsk As A method for autonomous reconciliation of invoice data and related electronic device
US12026471B2 (en) * 2021-04-16 2024-07-02 Accenture Global Solutions Limited Automated generation of chatbot
US12242816B2 (en) * 2021-06-30 2025-03-04 Microsoft Technology Licensing, Llc Task-action prediction engine for a task management system
US12321428B2 (en) * 2021-07-08 2025-06-03 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program
EP4363965A1 (en) * 2021-08-06 2024-05-08 Siemens Aktiengesellschaft Source code synthesis for domain specific languages from natural language text
US12468938B2 (en) * 2021-09-21 2025-11-11 International Business Machines Corporation Training example generation to create new intents for chatbots
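CN114491048B (zh) * 2022-02-16 2025-08-15 Beijing Weiboyi Technology Co., Ltd. Data augmentation method, and text classification model training method and apparatus
CN115878765B (zh) * 2022-04-18 2024-09-13 Beijing Zhongguancun Kejin Technology Co., Ltd. Method and apparatus for mining debt-collection scripts with intent-recognition-based noise reduction
CN114881130A (zh) * 2022-04-26 2022-08-09 North China Electric Power University Bagging-model-based method for grading relay protection defect texts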
US12451141B2 (en) 2022-06-08 2025-10-21 International Business Machines Corporation Generating multi-turn dialog datasets
US12579448B2 (en) 2022-06-22 2026-03-17 Oracle International Corporation Techniques for positive entity aware augmentation using two-stage augmentation
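CN117668216A (zh) * 2022-08-12 2024-03-08 China Southern Power Grid Big Data Service Co., Ltd. Intent recognition model training method, intent recognition method, and apparatus
CN116150311A (zh) * 2022-08-16 2023-05-23 Mashang Consumer Finance Co., Ltd. Text matching model training method, intent recognition method, and apparatus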
US12499385B2 (en) * 2022-08-22 2025-12-16 Oracle International Corporation Adaptive training data augmentation to facilitate training named entity recognition models
CN115909354B (zh) * 2022-11-11 2023-11-10 Beijing Baidu Netcom Science and Technology Co., Ltd. Text generation model training method, text acquisition method, and apparatus
US12512089B2 (en) * 2022-12-07 2025-12-30 International Business Machines Corporation Testing cascaded deep learning pipelines comprising a speech-to-text model and a text intent classifier
JP2024098791A (ja) * 2023-01-11 2024-07-24 Toshiba Corporation Information processing apparatus, information processing method, and information processing program
US12231378B2 (en) * 2023-06-08 2025-02-18 Sap Se Realtime conversation AI insights and deployment
US20250008021A1 (en) * 2023-06-28 2025-01-02 Jpmorgan Chase Bank, N.A. Systems and methods for artificial intelligence-based coaching using microlearning
US12367342B1 (en) * 2025-01-15 2025-07-22 Conversational AI Ltd Automated analysis of computerized conversational agent conversational data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018081298A (ja) 2016-11-16 2018-05-24 Samsung Electronics Co., Ltd. Natural language processing method and apparatus, and method and apparatus for training a natural language processing model

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110289025A1 (en) * 2010-05-19 2011-11-24 Microsoft Corporation Learning user intent from rule-based training data
US20160055240A1 (en) 2014-08-22 2016-02-25 Microsoft Corporation Orphaned utterance detection system and method
WO2016055240A1 (en) * 2014-10-06 2016-04-14 Zentrum Mikroelektronik Dresden Ag Pulsed linear power converter
CN105786798B (zh) * 2016-02-25 2018-11-02 Shanghai Jiao Tong University Method for understanding natural language intent in human-computer interaction
US10510336B2 (en) * 2017-06-12 2019-12-17 International Business Machines Corporation Method, apparatus, and system for conflict detection and resolution for competing intent classifiers in modular conversation system
CN107515857B (zh) 2017-08-31 2020-08-18 iFlytek Co., Ltd. Semantic understanding method and system based on customized skills
US10303978B1 (en) * 2018-03-26 2019-05-28 Clinc, Inc. Systems and methods for intelligently curating machine learning training data and improving machine learning model performance
US10726204B2 (en) * 2018-05-24 2020-07-28 International Business Machines Corporation Training data expansion for natural language classification
US11093707B2 (en) * 2019-01-15 2021-08-17 International Business Machines Corporation Adversarial training data augmentation data for text classifiers
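CN110223674B (zh) * 2019-04-19 2023-05-26 Ping An Technology (Shenzhen) Co., Ltd. Speech corpus training method and apparatus, computer device, and storage medium
CN110457447A (zh) * 2019-05-15 2019-11-15 State Grid Zhejiang Electric Power Co., Ltd. Electric Power Research Institute Task-oriented dialogue system for power grids
CN110209791B (zh) * 2019-06-12 2021-03-26 Bairong Yunchuang Technology Co., Ltd. Multi-turn dialogue intelligent voice interaction system and apparatus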
US11538457B2 (en) * 2020-03-30 2022-12-27 Oracle International Corporation Noise data augmentation for natural language processing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018081298A (ja) 2016-11-16 2018-05-24 Samsung Electronics Co., Ltd. Natural language processing method and apparatus, and method and apparatus for training a natural language processing model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
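Soichiro Murakami and 6 others, A Study of Machine Translation Robust to Spontaneous Speech, Proceedings of the Twenty-fifth Annual Meeting of the Association for Natural Language Processing [online], Japan, The Association for Natural Language Processing, March 4, 2019, pp. 651-654
Taishi Ikeda and 2 others, Analysis of the Effects of Text Noise and Label Noise in Document Classification, Proceedings of the Twenty-sixth Annual Meeting of the Association for Natural Language Processing [online], Japan, The Association for Natural Language Processing, March 9, 2020, pp. 221-224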

Also Published As

Publication number Publication date
JP2023519713A (ja) 2023-05-12
JP2025170253A (ja) 2025-11-18
US11538457B2 (en) 2022-12-27
WO2021201907A1 (en) 2021-10-07
CN115398436B (zh) 2025-08-05
US20210304733A1 (en) 2021-09-30
US20230169955A1 (en) 2023-06-01
CN115398436A (zh) 2022-11-25
US11972755B2 (en) 2024-04-30
EP4128010A1 (en) 2023-02-08

Similar Documents

Publication Publication Date Title
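JP7721559B2 (ja) Noise data augmentation for natural language processing
JP7561836B2 (ja) Stop word data augmentation for natural language processing
JP7703667B2 (ja) Context tag integration with named entity recognition models
JP7682202B2 (ja) Improved techniques for out-of-domain (OOD) detection
JP7789778B2 (ja) Out-of-domain data augmentation for natural language processing
JP7828346B2 (ja) Keyword data augmentation tool for natural language processing
JP7726995B2 (ja) Enhanced logits for natural language processing
JP2024539003A (ja) Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model
JP7771196B2 (ja) Multi-feature balancing for natural language processors
JP2024540111A (ja) Deep learning techniques for extraction of embedded data from documents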
US12572852B2 (en) Lexical dropout for natural language processing
JP2024543062A (ja) Path dropout for natural language processing
JP2025528391A (ja) Adaptive training data augmentation to facilitate training named entity recognition models

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230607

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20230607

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20240628

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20240806

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240920

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20250107

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20250311

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20250701

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20250730

R150 Certificate of patent or registration of utility model

Ref document number: 7721559

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150