JP7229847B2 - Dialog device, dialog method, and dialog computer program - Google Patents

Dialog device, dialog method, and dialog computer program

Info

Publication number
JP7229847B2
JP7229847B2 JP2019090423A JP2019090423A JP7229847B2 JP 7229847 B2 JP7229847 B2 JP 7229847B2 JP 2019090423 A JP2019090423 A JP 2019090423A JP 2019090423 A JP2019090423 A JP 2019090423A JP 7229847 B2 JP7229847 B2 JP 7229847B2
Authority
JP
Japan
Prior art keywords
utterance
user
feature model
speech
end point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2019090423A
Other languages
English (en)
Japanese (ja)
Other versions
JP2020187211A5
JP2020187211A (ja)
Inventor
イスティクラリ アディバ アマリア
健 本間
貴志 住吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP2019090423A
Priority to EP20164080.2A
Priority to US16/824,634
Publication of JP2020187211A
Publication of JP2020187211A5
Application granted
Publication of JP7229847B2
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/87: Detection of discrete points within a voice signal
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19: Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197: Probabilistic grammars, e.g. word n-grams
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/05: Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/1822: Parsing for meaning understanding
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/18: Speech classification or search using natural language modelling
    • G10L15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025: Phonemes, fenemes or fenones being the recognition units
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)
JP2019090423A 2019-05-13 2019-05-13 Dialog device, dialog method, and dialog computer program Active JP7229847B2 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019090423A JP7229847B2 (ja) 2019-05-13 2019-05-13 Dialog device, dialog method, and dialog computer program
EP20164080.2A EP3739583B1 (en) 2019-05-13 2020-03-19 Dialog device, dialog method, and dialog computer program
US16/824,634 US11605377B2 (en) 2019-05-13 2020-03-19 Dialog device, dialog method, and dialog computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2019090423A JP7229847B2 (ja) 2019-05-13 2019-05-13 Dialog device, dialog method, and dialog computer program

Publications (3)

Publication Number Publication Date
JP2020187211A (ja) 2020-11-19
JP2020187211A5 2022-02-25
JP7229847B2 (ja) 2023-02-28

Family

ID=69846252

Family Applications (1)

Application Number Priority Date Filing Date Title
JP2019090423A Active JP7229847B2 (ja) 2019-05-13 2019-05-13 Dialog device, dialog method, and dialog computer program

Country Status (3)

Country Link
US (1) US11605377B2
EP (1) EP3739583B1
JP (1) JP7229847B2

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7151606B2 (ja) * 2019-04-17 2022-10-12 日本電信電話株式会社 Command analysis device, command analysis method, and program
US11050885B1 (en) * 2020-06-02 2021-06-29 Bank Of America Corporation Call interception heuristics
US11587567B2 (en) 2021-03-21 2023-02-21 International Business Machines Corporation User utterance generation for counterfactual analysis and improved conversation flow
EP4158621B1 (en) * 2021-08-17 2025-04-23 Google LLC Enabling natural conversations with soft endpointing for an automated assistant
US12020703B2 (en) 2021-08-17 2024-06-25 Google Llc Enabling natural conversations with soft endpointing for an automated assistant
KR102840099B1 (ko) * 2021-08-30 2025-08-01 한국전자기술연구원 Method and system for automatic back-channel generation in an interactive agent system
JP2025097515A (ja) * 2023-12-19 2025-07-01 株式会社デンソー Control device, robot system, system, control method, and control program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008069308A1 (ja) 2006-12-08 2008-06-12 Nec Corporation Speech recognition device and speech recognition method
JP2018124484A (ja) 2017-02-02 2018-08-09 トヨタ自動車株式会社 Speech recognition device
JP2018523156A (ja) 2015-06-29 2018-08-16 アマゾン テクノロジーズ インコーポレイテッド Language model speech endpointing

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177810B2 (en) * 2001-04-10 2007-02-13 Sri International Method and apparatus for performing prosody-based endpointing of a speech signal
US8214213B1 (en) * 2006-04-27 2012-07-03 At&T Intellectual Property Ii, L.P. Speech recognition based on pronunciation modeling
JP4798039B2 (ja) 2007-03-26 2011-10-19 株式会社デンソー Spoken dialog device and method
US7996214B2 (en) * 2007-11-01 2011-08-09 At&T Intellectual Property I, L.P. System and method of exploiting prosodic features for dialog act tagging in a discriminative modeling framework
US9437186B1 (en) 2013-06-19 2016-09-06 Amazon Technologies, Inc. Enhanced endpoint detection for speech recognition
JP5958475B2 (ja) 2014-01-17 2016-08-02 株式会社デンソー Speech recognition terminal device, speech recognition system, and speech recognition method
US10134425B1 (en) * 2015-06-29 2018-11-20 Amazon Technologies, Inc. Direction-based speech endpointing
US10332509B2 (en) * 2015-11-25 2019-06-25 Baidu USA, LLC End-to-end speech recognition
US10235994B2 (en) * 2016-03-04 2019-03-19 Microsoft Technology Licensing, Llc Modular deep learning model
US12020174B2 (en) * 2016-08-16 2024-06-25 Ebay Inc. Selecting next user prompt types in an intelligent online personal assistant multi-turn dialog
US10832658B2 (en) * 2017-11-15 2020-11-10 International Business Machines Corporation Quantized dialog language model for dialog systems
US10810996B2 (en) * 2018-07-31 2020-10-20 Nuance Communications, Inc. System and method for performing automatic speech recognition system parameter adjustment via machine learning

Also Published As

Publication number Publication date
EP3739583B1 (en) 2022-09-14
US11605377B2 (en) 2023-03-14
US20200365146A1 (en) 2020-11-19
EP3739583A1 (en) 2020-11-18
JP2020187211A (ja) 2020-11-19

Similar Documents

Publication Publication Date Title
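JP7229847B2 (ja) Dialog device, dialog method, and dialog computer program
JP7508533B2 (ja) Speaker diarization using speaker embeddings and a trained generative model
CN112673421B (zh) Training and/or using a language selection model to automatically determine a language for speech recognition of a spoken utterance
JP7786874B2 (ja) Neural speech-to-meaning
JP6538779B2 (ja) Spoken dialog system, spoken dialog method, and method for adapting a spoken dialog system
KR101942521B1 (ko) Speech endpointing
JP7825043B2 (ja) Predicting word boundaries for on-device batch processing of end-to-end speech recognition models
JP4729902B2 (ja) Spoken dialog system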
US10152298B1 (en) Confidence estimation based on frequency
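JP7230806B2 (ja) Information processing device and information processing method
JP2024538020A (ja) Disfluency detection models for natural conversational voice systems
CN116229946A (zh) Systems and methods for speech recognition
JPWO2019017462A1 (ja) Satisfaction estimation model learning device, satisfaction estimation device, satisfaction estimation model learning method, satisfaction estimation method, and program
JP7531164B2 (ja) Utterance analysis device, utterance analysis method, and program
JP2015099304A (ja) Sympathy/antipathy location detection device, sympathy/antipathy location detection method, and program
JP6629172B2 (ja) Dialog control device, method thereof, and program
KR20210081166A (ko) Apparatus and method for language identification in a multilingual speech environment
CN114302028A (zh) Word-prompting method, apparatus, electronic device, storage medium, and program product
JP2025514776A (ja) Joint segmenting and automatic speech recognition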
EP4675613A2 (en) Accurate response for noisy user speech by cross-attention stitching encoded audio features into large language models
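CN111402893A (zh) Speech recognition model determination method, speech recognition method and apparatus, and electronic device
CN114822538A (zh) Rescoring model training and speech recognition method, apparatus, system, and device
CN111640423A (zh) Word boundary estimation method and apparatus, and electronic device
CN115294974B (zh) Speech recognition method, apparatus, device, and storage medium
JP2025510752A (ja) Intended query detection using E2E modeling for continued conversation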

Legal Events

Date Code Title Description
2022-02-16 A521 Request for written amendment filed; Free format text: JAPANESE INTERMEDIATE CODE: A523
2022-02-16 A621 Written request for application examination; Free format text: JAPANESE INTERMEDIATE CODE: A621
2023-01-24 A977 Report on retrieval; Free format text: JAPANESE INTERMEDIATE CODE: A971007
TRDD Decision of grant or rejection written
2023-01-31 A01 Written decision to grant a patent or to grant a registration (utility model); Free format text: JAPANESE INTERMEDIATE CODE: A01
2023-02-15 A61 First payment of annual fees (during grant procedure); Free format text: JAPANESE INTERMEDIATE CODE: A61
R150 Certificate of patent or registration of utility model; Ref document number: 7229847; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150