JP7721559B2 - Noise data augmentation for natural language processing - Google Patents
Noise data augmentation for natural language processing
Info
- Publication number
- JP7721559B2 (application JP2022559639A)
- Authority
- JP
- Japan
- Prior art keywords
- utterances
- utterance
- text
- intent
- skillbot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/0633—Creating reference templates; Clustering using lexical or orthographic knowledge sources
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0638—Interactive procedures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2025125324A JP2025170253A (ja) | 2020-03-30 | 2025-07-28 | Noise data augmentation for natural language processing |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063002066P | 2020-03-30 | 2020-03-30 | |
| US63/002,066 | 2020-03-30 | | |
| PCT/US2020/050342 WO2021201907A1 (en) | 2020-03-30 | 2020-09-11 | Noise data augmentation for natural language processing |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2025125324A Division JP2025170253A (ja) | 2020-03-30 | 2025-07-28 | Noise data augmentation for natural language processing |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2023519713A (ja) | 2023-05-12 |
| JP2023519713A5 | 2023-06-15 |
| JP7721559B2 (ja) | 2025-08-12 |
Family
ID=72659890
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022559639A Active JP7721559B2 (ja) | 2020-03-30 | 2020-09-11 | Noise data augmentation for natural language processing |
| JP2025125324A Pending JP2025170253A (ja) | 2020-03-30 | 2025-07-28 | Noise data augmentation for natural language processing |
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2025125324A Pending JP2025170253A (ja) | 2020-03-30 | 2025-07-28 | Noise data augmentation for natural language processing |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11538457B2 |
| EP (1) | EP4128010A1 |
| JP (2) | JP7721559B2 |
| CN (1) | CN115398436B |
| WO (1) | WO2021201907A1 |
Families Citing this family (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118642630A (zh) * | 2018-08-21 | 2024-09-13 | 谷歌有限责任公司 | Method for automated assistant invocation |
| US11538457B2 (en) * | 2020-03-30 | 2022-12-27 | Oracle International Corporation | Noise data augmentation for natural language processing |
| US11556788B2 (en) * | 2020-06-15 | 2023-01-17 | International Business Machines Corporation | Text-based response environment action selection |
| US11599721B2 (en) * | 2020-08-25 | 2023-03-07 | Salesforce, Inc. | Intelligent training set augmentation for natural language processing tasks |
| DK202170043A1 (en) * | 2021-01-29 | 2022-12-12 | A P Moeller Mærsk As | A method for autonomous reconciliation of invoice data and related electronic device |
| US12026471B2 (en) * | 2021-04-16 | 2024-07-02 | Accenture Global Solutions Limited | Automated generation of chatbot |
| US12242816B2 (en) * | 2021-06-30 | 2025-03-04 | Microsoft Technology Licensing, Llc | Task-action prediction engine for a task management system |
| US12321428B2 (en) * | 2021-07-08 | 2025-06-03 | Nippon Telegraph And Telephone Corporation | User authentication device, user authentication method, and user authentication computer program |
| EP4363965A1 (en) * | 2021-08-06 | 2024-05-08 | Siemens Aktiengesellschaft | Source code synthesis for domain specific languages from natural language text |
| US12468938B2 (en) * | 2021-09-21 | 2025-11-11 | International Business Machines Corporation | Training example generation to create new intents for chatbots |
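| CN114491048B (zh) * | 2022-02-16 | 2025-08-15 | 北京微播易科技股份有限公司 | Data augmentation method, and training method and apparatus for a text classification model |
| CN115878765B (zh) * | 2022-04-18 | 2024-09-13 | 北京中关村科金技术有限公司 | Method and apparatus for mining debt-collection scripts incorporating intent-recognition noise reduction |
| CN114881130A (zh) * | 2022-04-26 | 2022-08-09 | 华北电力大学 | Method for grading relay protection defect texts based on a Bagging model |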
| US12451141B2 (en) | 2022-06-08 | 2025-10-21 | International Business Machines Corporation | Generating multi-turn dialog datasets |
| US12579448B2 (en) | 2022-06-22 | 2026-03-17 | Oracle International Corporation | Techniques for positive entity aware augmentation using two-stage augmentation |
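| CN117668216A (zh) * | 2022-08-12 | 2024-03-08 | 南方电网大数据服务有限公司 | Intent recognition model training method, intent recognition method, and apparatus |
| CN116150311A (zh) * | 2022-08-16 | 2023-05-23 | 马上消费金融股份有限公司 | Training method for a text matching model, intent recognition method, and apparatus |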
| US12499385B2 (en) * | 2022-08-22 | 2025-12-16 | Oracle International Corporation | Adaptive training data augmentation to facilitate training named entity recognition models |
| CN115909354B (zh) * | 2022-11-11 | 2023-11-10 | 北京百度网讯科技有限公司 | Training method for a text generation model, text acquisition method, and apparatus |
| US12512089B2 (en) * | 2022-12-07 | 2025-12-30 | International Business Machines Corporation | Testing cascaded deep learning pipelines comprising a speech-to-text model and a text intent classifier |
| JP2024098791A (ja) * | 2023-01-11 | 2024-07-24 | 株式会社東芝 | Information processing device, information processing method, and information processing program |
| US12231378B2 (en) * | 2023-06-08 | 2025-02-18 | Sap Se | Realtime conversation AI insights and deployment |
| US20250008021A1 (en) * | 2023-06-28 | 2025-01-02 | Jpmorgan Chase Bank, N.A. | Systems and methods for artificial intelligence-based coaching using microlearning |
| US12367342B1 (en) * | 2025-01-15 | 2025-07-22 | Conversational AI Ltd | Automated analysis of computerized conversational agent conversational data |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2018081298A (ja) | 2016-11-16 | 2018-05-24 | 三星電子株式会社 Samsung Electronics Co., Ltd. | Natural language processing method and apparatus, and method and apparatus for training a natural language processing model |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110289025A1 (en) * | 2010-05-19 | 2011-11-24 | Microsoft Corporation | Learning user intent from rule-based training data |
| US20160055240A1 (en) | 2014-08-22 | 2016-02-25 | Microsoft Corporation | Orphaned utterance detection system and method |
| WO2016055240A1 (en) * | 2014-10-06 | 2016-04-14 | Zentrum Mikroelektronik Dresden Ag | Pulsed linear power converter |
| CN105786798B (zh) * | 2016-02-25 | 2018-11-02 | 上海交通大学 | Method for understanding natural language intent in human-computer interaction |
| US10510336B2 (en) * | 2017-06-12 | 2019-12-17 | International Business Machines Corporation | Method, apparatus, and system for conflict detection and resolution for competing intent classifiers in modular conversation system |
| CN107515857B (zh) | 2017-08-31 | 2020-08-18 | 科大讯飞股份有限公司 | Semantic understanding method and system based on customized skills |
| US10303978B1 (en) * | 2018-03-26 | 2019-05-28 | Clinc, Inc. | Systems and methods for intelligently curating machine learning training data and improving machine learning model performance |
| US10726204B2 (en) * | 2018-05-24 | 2020-07-28 | International Business Machines Corporation | Training data expansion for natural language classification |
| US11093707B2 (en) * | 2019-01-15 | 2021-08-17 | International Business Machines Corporation | Adversarial training data augmentation data for text classifiers |
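| CN110223674B (zh) * | 2019-04-19 | 2023-05-26 | 平安科技(深圳)有限公司 | Speech corpus training method, apparatus, computer device, and storage medium |
| CN110457447A (zh) * | 2019-05-15 | 2019-11-15 | 国网浙江省电力有限公司电力科学研究院 | Task-oriented dialogue system for power grids |
| CN110209791B (zh) * | 2019-06-12 | 2021-03-26 | 百融云创科技股份有限公司 | Multi-turn dialogue intelligent voice interaction system and apparatus |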
| US11538457B2 (en) * | 2020-03-30 | 2022-12-27 | Oracle International Corporation | Noise data augmentation for natural language processing |
-
2020
- 2020-09-09 US US17/016,117 patent/US11538457B2/en active Active
- 2020-09-11 WO PCT/US2020/050342 patent/WO2021201907A1/en not_active Ceased
- 2020-09-11 JP JP2022559639A patent/JP7721559B2/ja active Active
- 2020-09-11 EP EP20781159.7A patent/EP4128010A1/en active Pending
- 2020-09-11 CN CN202080099408.2A patent/CN115398436B/zh active Active
-
2022
- 2022-11-23 US US17/993,130 patent/US11972755B2/en active Active
-
2025
- 2025-07-28 JP JP2025125324A patent/JP2025170253A/ja active Pending
Non-Patent Citations (2)
| Title |
|---|
| 村上 聡一朗 and 6 others, "A study of machine translation robust to spontaneous speech," Proceedings of the Twenty-fifth Annual Meeting of the Association for Natural Language Processing [online], Japan, The Association for Natural Language Processing, 2019-03-04, pp. 651-654 |
| 池田 大志 and 2 others, "Analysis of the effects of text noise and label noise in document classification," Proceedings of the Twenty-sixth Annual Meeting of the Association for Natural Language Processing [online], Japan, The Association for Natural Language Processing, 2020-03-09, pp. 221-224 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2023519713A (ja) | 2023-05-12 |
| JP2025170253A (ja) | 2025-11-18 |
| US11538457B2 (en) | 2022-12-27 |
| WO2021201907A1 (en) | 2021-10-07 |
| CN115398436B (zh) | 2025-08-05 |
| US20210304733A1 (en) | 2021-09-30 |
| US20230169955A1 (en) | 2023-06-01 |
| CN115398436A (zh) | 2022-11-25 |
| US11972755B2 (en) | 2024-04-30 |
| EP4128010A1 (en) | 2023-02-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
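| JP7721559B2 (ja) | | Noise data augmentation for natural language processing |
| JP7561836B2 (ja) | | Stop word data augmentation for natural language processing |
| JP7703667B2 (ja) | | Context tag integration using a named entity recognition model |
| JP7682202B2 (ja) | | Improved techniques for out-of-domain (OOD) detection |
| JP7789778B2 (ja) | | Out-of-domain data augmentation for natural language processing |
| JP7828346B2 (ja) | | Keyword data augmentation tool for natural language processing |
| JP7726995B2 (ja) | | Enhanced logits for natural language processing |
| JP2024539003A (ja) | | Fine-tuning a multi-head network from a single transformer layer of a pre-trained language model |
| JP7771196B2 (ja) | | Multi-feature balancing for natural language processors |
| JP2024540111A (ja) | | Deep learning techniques for extraction of embedded data from documents |
| US12572852B2 (en) | | Lexical dropout for natural language processing |
| JP2024543062A (ja) | | Path dropout for natural language processing |
| JP2025528391A (ja) | | Adaptive training data augmentation to facilitate training of named entity recognition models |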
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20230607 |
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20230607 |
| | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20240628 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20240806 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20240920 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20250107 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20250311 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20250701 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20250730 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7721559; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |