JP7605289B2 - Speech recognition device, speech recognition method, learning device, learning method, and recording medium - Google Patents
- Publication number
- JP7605289B2 (application JP2023503251A)
- Authority
- JP
- Japan
- Prior art keywords
- probability
- phoneme
- character
- sequence
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
| ID | Concept | Type | Score | Sections | Count |
|---|---|---|---|---|---|
| 238000000034 | method | Methods | 0.000 | title, claims, description | 43 |
| 238000013528 | artificial neural network | Methods | 0.000 | claims, description | 109 |
| 238000012549 | training | Methods | 0.000 | claims, description | 28 |
| 239000013256 | coordination polymer | Substances | 0.000 | description | 108 |
| 238000004364 | calculation method | Methods | 0.000 | description | 43 |
| 238000004891 | communication | Methods | 0.000 | description | 29 |
| 238000004590 | computer program | Methods | 0.000 | description | 28 |
| 238000012545 | processing | Methods | 0.000 | description | 28 |
| 230000008569 | process | Effects | 0.000 | description | 14 |
| 230000012447 | hatching | Effects | 0.000 | description | 12 |
| 230000006870 | function | Effects | 0.000 | description | 9 |
| 238000007476 | Maximum Likelihood | Methods | 0.000 | description | 7 |
| 238000010586 | diagram | Methods | 0.000 | description | 7 |
| 230000008859 | change | Effects | 0.000 | description | 4 |
| 238000007639 | printing | Methods | 0.000 | description | 4 |
| 239000000470 | constituent | Substances | 0.000 | description | 3 |
| 230000007423 | decrease | Effects | 0.000 | description | 3 |
| 238000013527 | convolutional neural network | Methods | 0.000 | description | 2 |
| 230000000694 | effects | Effects | 0.000 | description | 2 |
| 230000007246 | mechanism | Effects | 0.000 | description | 2 |
| 238000012986 | modification | Methods | 0.000 | description | 2 |
| 230000004048 | modification | Effects | 0.000 | description | 2 |
| 230000000306 | recurrent effect | Effects | 0.000 | description | 2 |
| 238000007619 | statistical method | Methods | 0.000 | description | 2 |
| 241000196324 | Embryophyta | Species | 0.000 | description | 1 |
| 235000000177 | Indigofera tinctoria | Nutrition | 0.000 | description | 1 |
| 241000219050 | Polygonaceae | Species | 0.000 | description | 1 |
| 238000004422 | calculation algorithm | Methods | 0.000 | description | 1 |
| 238000011161 | development | Methods | 0.000 | description | 1 |
| 230000018109 | developmental process | Effects | 0.000 | description | 1 |
| 238000005516 | engineering process | Methods | 0.000 | description | 1 |
| 229940097275 | indigo | Drugs | 0.000 | description | 1 |
| COHYTHOBJLSHDF-UHFFFAOYSA-N | indigo powder (N1C2=CC=CC=C2C(=O)C1=C1C(=O)C2=CC=CC=C2N1) | Natural products | 0.000 | description | 1 |
| 230000015654 | memory | Effects | 0.000 | description | 1 |
| 238000003058 | natural language processing | Methods | 0.000 | description | 1 |
| 230000003287 | optical effect | Effects | 0.000 | description | 1 |
| 230000006403 | short-term memory | Effects | 0.000 | description | 1 |
| 239000007787 | solid | Substances | 0.000 | description | 1 |
| 230000002123 | temporal effect | Effects | 0.000 | description | 1 |
| 238000013518 | transcription | Methods | 0.000 | description | 1 |
| 230000035897 | transcription | Effects | 0.000 | description | 1 |
| 230000001052 | transient effect | Effects | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/008106 WO2022185437A1 (ja) | 2021-03-03 | 2021-03-03 | Speech recognition device, speech recognition method, learning device, learning method, and recording medium |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JPWO2022185437A1 | 2022-09-09 |
| JPWO2022185437A5 | 2023-11-10 |
| JP7605289B2 (ja) | 2024-12-24 |
Family
ID=83153997
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023503251A (Active; JP7605289B2, ja) | 2021-03-03 | 2021-03-03 | Speech recognition device, speech recognition method, learning device, learning method, and recording medium |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20240144915A1 |
| JP (1) | JP7605289B2 |
| WO (1) | WO2022185437A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118891636A (zh) * | 2023-02-20 | 2024-11-01 | Hitachi High-Tech Corporation | Model generation system and model generation method |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
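| JP2013072974A (ja) | 2011-09-27 | 2013-04-22 | Toshiba Corp | Speech recognition device, method, and program |
| JP2019012095A (ja) | 2017-06-29 | 2019-01-24 | Japan Broadcasting Corporation (NHK) | Phoneme recognition dictionary generation device, phoneme recognition device, and programs therefor |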
| US10210860B1 (en) | 2018-07-27 | 2019-02-19 | Deepgram, Inc. | Augmented generalized deep learning with special vocabulary |
2021
- 2021-03-03: JP application JP2023503251A, patent JP7605289B2 (ja), status: Active
- 2021-03-03: WO application PCT/JP2021/008106, publication WO2022185437A1 (ja), status: Ceased
- 2021-03-03: US application US18/279,134, publication US20240144915A1 (en), status: Abandoned
Also Published As
| Publication number | Publication date |
|---|---|
| WO2022185437A1 (ja) | 2022-09-09 |
| JPWO2022185437A1 | 2022-09-09 |
| US20240144915A1 (en) | 2024-05-02 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| KR102754124B1 (ko) | End-to-end automatic speech recognition for numeric sequences | |
| CN113439301B (zh) | Method and system for machine learning | |
| JP7092953B2 (ja) | Phoneme-based context analysis for multilingual speech recognition with end-to-end models | |
| Le et al. | Automatic speech recognition for under-resourced languages: application to Vietnamese language | |
| KR20230043084A (ko) | Text-to-speech synthesis method, apparatus, and computer-readable storage medium using machine learning based on sequential prosody features | |
| Livescu et al. | Subword modeling for automatic speech recognition: Past, present, and emerging approaches | |
| Taylor et al. | Analysis of pronunciation learning in end-to-end speech synthesis | |
| CN110767213A (zh) | Prosody prediction method and device | |
| CN112669845A (zh) | Method and device for correcting speech recognition results, electronic device, and storage medium | |
| Chen et al. | The USTC system for blizzard challenge 2014 | |
| JP6718787B2 (ja) | Japanese speech recognition model learning device and program | |
| US20240211688A1 (en) | Systems and Methods for Generating Locale-Specific Phonetic Spelling Variations | |
| US20080027725A1 (en) | Automatic Accent Detection With Limited Manually Labeled Data | |
| Hanzlíček et al. | Using LSTM neural networks for cross‐lingual phonetic speech segmentation with an iterative correction procedure | |
| Al-Zaro et al. | Speaker-independent phoneme-based automatic Quranic speech recognition using deep learning | |
| JP7605289B2 (ja) | Speech recognition device, speech recognition method, learning device, learning method, and recording medium | |
| Joshi et al. | Vowel mispronunciation detection using DNN acoustic models with cross-lingual training. | |
| Rajendran et al. | A robust syllable centric pronunciation model for Tamil text to speech synthesizer | |
| CN116453500A (zh) | Speech synthesis method, system, electronic device, and storage medium for less widely spoken languages | |
| Proença | Automatic assessment of reading ability of children | |
| Soundarya et al. | Analysis of mispronunciation detection and diagnosis based on conventional deep learning techniques | |
| Taylor | Pronunciation modelling in end-to-end text-to-speech synthesis | |
| Hendessi et al. | A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM | |
| US11809831B2 (en) | Symbol sequence converting apparatus and symbol sequence conversion method | |
| Sayed et al. | Convolutional neural networks to facilitate the continuous recognition of arabic speech with independent speakers | |
Legal Events
| Code | Title | Description |
|---|---|---|
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523. Effective date: 2023-08-16 |
| A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621. Effective date: 2023-08-16 |
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131. Effective date: 2024-08-06 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523. Effective date: 2024-09-10 |
| TRDD | Decision of grant or rejection written | |
| A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01. Effective date: 2024-11-12 |
| A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61. Effective date: 2024-11-25 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 7605289. Country of ref document: JP. Free format text: JAPANESE INTERMEDIATE CODE: R150 |