JP7177348B2 - Speech recognition device, speech recognition method, and program - Google Patents
- Publication number
- JP7177348B2
- Authority
- JP
- Japan
- Prior art keywords
- speech recognition
- conversation
- unit
- section
- utterance
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Algebra (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
Description
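Reference signs list
11 First speech recognition unit
12 Processing management unit
13 Determination unit
14 Second speech recognition unit
15, 16, 17 Storage unit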
Claims (5)
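1. A speech recognition device comprising:
   a first speech recognition unit that performs speech recognition processing by a first method on speech data of a conversation among a plurality of speakers, and outputs a speech recognition result for each utterance section of each of the plurality of speakers;
   a determination unit that determines, based on a result of the speech recognition processing by the first speech recognition unit, a business section, which is a section of an utterance containing the business (purpose) of the conversation; and
   a second speech recognition unit that performs speech recognition processing, by a second method more accurate than the first method, on the speech data of a section determined by the determination unit to be a business section, and outputs the speech recognition result as business text.
2. The speech recognition device according to claim 1, wherein
   the determination unit determines a domain of the business based on a keyword contained in the utterance or in the business section and on synonyms of the keyword, and
   the second speech recognition unit uses, as the second method, either a speech recognition method using a speech recognition model corresponding to the business domain determined by the determination unit, or a speech recognition method using a speech recognition model corresponding to an acoustic feature of the utterance or to the reliability of the speech recognition by the first speech recognition unit.
3. A speech recognition method executed in a speech recognition device, the method comprising:
   a step of performing speech recognition processing by a first method on speech data of a conversation among a plurality of speakers, and outputting a speech recognition result for each utterance section of each of the plurality of speakers;
   a step of determining, based on a result of the speech recognition processing by the first method, a business section, which is a section of an utterance containing the business of the conversation; and
   a step of performing speech recognition processing, by a second method more accurate than the first method, on the speech data of a section determined to be a business section, and outputting the speech recognition result as business text.
4. The speech recognition method according to claim 3, wherein
   the first method is a speech recognition method using an HMM (Hidden Markov Model) method or an HMM-DNN (Deep Neural Network) method, and
   the second method is a speech recognition method using a CNN-NIN (Convolutional Neural Network and Network In Network) method.
5. A program for causing a computer to function as the speech recognition device according to claim 1 or 2.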
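The two-pass flow of claims 1 and 3, together with the keyword-and-synonym domain choice of claim 2, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation. The recognizers are stub callables standing in for the first method (e.g. HMM or HMM-DNN) and the second method (e.g. CNN-NIN). The rule for spotting a business section (a request cue phrase or a domain keyword in the first-pass text), the tables `DOMAIN_KEYWORDS` and `REQUEST_CUES`, and the names `Segment`, `first_pass`, `is_business`, `determine_domain` and `recognize_business` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# A recognizer maps one utterance section's audio to text. Real systems would
# wrap an HMM or HMM-DNN decoder (first pass) or a CNN-NIN model (second pass);
# here they are plain callables so the control flow can be run end to end.
Recognizer = Callable[[bytes], str]


@dataclass
class Segment:
    speaker: str    # e.g. "agent" or "customer"
    start: float    # section start, seconds
    end: float      # section end, seconds
    audio: bytes    # audio of this utterance section
    text: str = ""  # first-pass transcript, filled in by first_pass()


# Hypothetical domain table: each domain lists its keywords plus synonyms.
DOMAIN_KEYWORDS: Dict[str, Sequence[str]] = {
    "billing": ("bill", "invoice", "charge", "payment"),
    "relocation": ("move", "moving", "relocate", "new address"),
}

# Hypothetical cue phrases that mark a caller stating a request.
REQUEST_CUES: Sequence[str] = ("i would like", "i want", "could you")


def first_pass(segments: List[Segment], fast: Recognizer) -> None:
    """First recognizer: cheap decoding of every speaker's utterance sections."""
    for seg in segments:
        seg.text = fast(seg.audio)


def is_business(text: str) -> bool:
    """Assumed rule: a request cue or any domain keyword marks a business section."""
    lowered = text.lower()
    if any(cue in lowered for cue in REQUEST_CUES):
        return True
    return any(kw in lowered for kws in DOMAIN_KEYWORDS.values() for kw in kws)


def determine_domain(texts: Sequence[str]) -> Optional[str]:
    """Pick the domain whose keywords/synonyms occur most often (None if no hits)."""
    scores = {
        domain: sum(t.lower().count(kw) for t in texts for kw in kws)
        for domain, kws in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=lambda d: scores[d])
    return best if scores[best] > 0 else None


def recognize_business(
    segments: List[Segment],
    fast: Recognizer,
    domain_models: Dict[str, Recognizer],
    general_model: Recognizer,
) -> List[str]:
    """Run the first pass, then re-decode only business sections accurately."""
    first_pass(segments, fast)
    business = [s for s in segments if is_business(s.text)]
    domain = determine_domain([s.text for s in business])
    if domain is not None and domain in domain_models:
        accurate = domain_models[domain]  # domain-specific second-pass model
    else:
        accurate = general_model  # fallback second-pass model
    return [accurate(s.audio) for s in business]  # the "business text"


if __name__ == "__main__":
    # Stub recognizers: the "audio" is UTF-8 text so the flow can be traced.
    def fast(audio: bytes) -> str:
        return audio.decode().lower()

    def billing_model(audio: bytes) -> str:
        return "[billing] " + audio.decode()

    def general_model(audio: bytes) -> str:
        return "[general] " + audio.decode()

    call = [
        Segment("agent", 0.0, 2.0, b"Thank you for calling."),
        Segment("customer", 2.0, 6.5, b"I would like to check a charge on my bill."),
        Segment("agent", 6.5, 8.0, b"Certainly, one moment."),
    ]
    print(recognize_business(call, fast, {"billing": billing_model}, general_model))
    # -> ['[billing] I would like to check a charge on my bill.']
```

Only the sections flagged as business reach the more accurate (and presumably more costly) second recognizer. The other option in claim 2, choosing the second-pass model from acoustic features of the utterance or from first-pass reliability, is not shown; it would replace the `determine_domain` lookup with a different selection key.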
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019019476A JP7177348B2 (ja) | 2019-02-06 | 2019-02-06 | Speech recognition device, speech recognition method, and program |
US17/428,276 US11990136B2 (en) | 2019-02-06 | 2020-01-24 | Speech recognition device, search device, speech recognition method, search method, and program |
PCT/JP2020/002558 WO2020162229A1 (ja) | 2019-02-06 | 2020-01-24 | Speech recognition device, search device, speech recognition method, search method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019019476A JP7177348B2 (ja) | 2019-02-06 | 2019-02-06 | Speech recognition device, speech recognition method, and program |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2020126185A (ja) | 2020-08-20 |
JP7177348B2 (ja) | 2022-11-24 |
Family
ID=71947175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2019019476A (Active, JP7177348B2) | Speech recognition device, speech recognition method, and program | 2019-02-06 | 2019-02-06 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11990136B2 (ja) |
JP (1) | JP7177348B2 (ja) |
WO (1) | WO2020162229A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11710480B2 (en) * | 2019-08-07 | 2023-07-25 | International Business Machines Corporation | Phonetic comparison for virtual assistants |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
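JP2012093943A (ja) | 2010-10-27 | 2012-05-17 | Nippon Telegr & Teleph Corp <Ntt> | Text segmentation device, text segmentation learning device, text segmentation method, text segmentation learning method, and program |
JP2013167666A (ja) | 2012-02-14 | 2013-08-29 | Nec Corp | Speech recognition device, speech recognition method, and program |
WO2014069121A1 (ja) | 2012-10-31 | 2014-05-08 | NEC Corporation | Conversation analysis device and conversation analysis method |
JP2015049254A (ja) | 2013-08-29 | 2015-03-16 | Hitachi, Ltd. | Speech data recognition system and speech data recognition method |
JP2016062333A (ja) | 2014-09-18 | 2016-04-25 | Hitachi, Ltd. | Search server and search method |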
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6421645B1 (en) * | 1999-04-09 | 2002-07-16 | International Business Machines Corporation | Methods and apparatus for concurrent speech recognition, speaker segmentation and speaker classification |
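KR100612839B1 (ko) * | 2004-02-18 | 2006-08-18 | Samsung Electronics Co., Ltd. | Domain-based conversational speech recognition method and apparatus |
WO2010100853A1 (ja) * | 2009-03-04 | 2010-09-10 | NEC Corporation | Language model adaptation device, speech recognition device, language model adaptation method, and computer-readable recording medium |
JP5235187B2 (ja) * | 2009-11-16 | 2013-07-10 | Nippon Telegraph and Telephone Corporation | Speech recognition device, speech recognition method, and speech recognition program |
JP5434731B2 (ja) * | 2010-03-24 | 2014-03-05 | Toyota Motor Corporation | Speech recognition system and automatic search system |
JP5231484B2 (ja) * | 2010-05-19 | 2013-07-10 | Yahoo Japan Corporation | Speech recognition device, speech recognition method, program, and information processing device that distributes the program |
JP5549506B2 (ja) * | 2010-09-28 | 2014-07-16 | Fujitsu Limited | Speech recognition device and speech recognition method |
US9495350B2 (en) * | 2012-09-14 | 2016-11-15 | Avaya Inc. | System and method for determining expertise through speech analytics |
JP6210239B2 (ja) * | 2015-04-20 | 2017-10-11 | Honda Motor Co., Ltd. | Conversation analysis device, conversation analysis method, and program |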
US10347244B2 (en) * | 2017-04-21 | 2019-07-09 | Go-Vivace Inc. | Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response |
US10558421B2 (en) * | 2017-05-22 | 2020-02-11 | International Business Machines Corporation | Context based identification of non-relevant verbal communications |
EP3739573B1 (en) * | 2018-01-12 | 2023-06-28 | Sony Group Corporation | Information processing device, information processing method, and program |
US10878011B2 (en) * | 2018-02-05 | 2020-12-29 | International Business Machines Corporation | Cognitive ranking of terms used during a conversation |
US11574628B1 (en) * | 2018-09-27 | 2023-02-07 | Amazon Technologies, Inc. | Deep multi-channel acoustic modeling using multiple microphone array geometries |
WO2021029643A1 (en) * | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | System and method for modifying speech recognition result |
2019
- 2019-02-06: JP application JP2019019476A, patent JP7177348B2 (Active)
2020
- 2020-01-24: US application US17/428,276, patent US11990136B2 (Active)
- 2020-01-24: PCT application PCT/JP2020/002558, published as WO2020162229A1 (Application Filing)
Non-Patent Citations (1)
Title |
---|
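Takaaki Fukutomi (福冨隆朗) et al., "Extraction of business phases in contact center dialogues using request and acknowledgment expressions," Proceedings of the 2010 Spring Meeting of the Acoustical Society of Japan, Japan, March 10, 2010, pp. 223-226, ISSN 1880-7658 |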
Also Published As
Publication number | Publication date |
---|---|
JP2020126185A (ja) | 2020-08-20 |
WO2020162229A1 (ja) | 2020-08-13 |
US20220108699A1 (en) | 2022-04-07 |
US11990136B2 (en) | 2024-05-21 |
Similar Documents
Publication | Title |
---|---|
US11250851B2 (en) | Multi-layer keyword detection |
US10186265B1 (en) | Multi-layer keyword detection to avoid detection of keywords in output audio |
US11335330B2 (en) | Updating a voice template |
CN108351872B (zh) | Method and system for responding to user speech |
US9305553B2 (en) | Speech recognition accuracy improvement through speaker categories |
US10917758B1 (en) | Voice-based messaging |
US8712779B2 (en) | Information retrieval system, information retrieval method, and information retrieval program |
US7603279B2 (en) | Grammar update system and method for speech recognition |
US7844456B2 (en) | Grammar confusability metric for speech recognition |
US10885909B2 (en) | Determining a type of speech recognition processing according to a request from a user |
US9711167B2 (en) | System and method for real-time speaker segmentation of audio interactions |
US10089978B2 (en) | Detecting customers with low speech recognition accuracy by investigating consistency of conversation in call-center |
JP6985221B2 (ja) | Speech recognition device and speech recognition method |
CN114385800A (zh) | Voice dialogue method and device |
JP2018128575A (ja) | End-of-speech determination device, end-of-speech determination method, and program |
Walker et al. | Semi-supervised model training for unbounded conversational speech recognition |
JP2020160425A (ja) | Evaluation system, evaluation method, and computer program |
JP7177348B2 (ja) | Speech recognition device, speech recognition method, and program |
JP6755633B2 (ja) | Business determination device, business determination method, and program |
WO2020196743A1 (ja) | Evaluation system and evaluation method |
JP6615803B2 (ja) | Business determination device, business determination method, and program |
JPH10173769A (ja) | Voice message search device |
JP6526602B2 (ja) | Speech recognition device, method therefor, and program |
JP2018163295A (ja) | Spoken dialogue device and spoken dialogue method |
Legal Events
Effective date | Code | Title | Description |
---|---|---|---|
2021-05-28 | A621 | Written request for application examination | Japanese intermediate code: A621 |
2022-05-31 | A131 | Notification of reasons for refusal | Japanese intermediate code: A131 |
2022-07-28 | A521 | Request for written amendment filed | Japanese intermediate code: A523 |
(not stated) | TRDD | Decision of grant or rejection written | (none) |
2022-10-11 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Japanese intermediate code: A01 |
2022-10-24 | A61 | First payment of annual fees (during grant procedure) | Japanese intermediate code: A61 |
(not stated) | R150 | Certificate of patent or registration of utility model | Ref document number: 7177348; country of ref document: JP; Japanese intermediate code: R150 |