JP7605289B2 - Speech recognition device, speech recognition method, learning device, learning method, and recording medium

Info

Publication number
JP7605289B2
Authority
JP
Japan
Prior art keywords
probability
phoneme
character
sequence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023503251A
Other languages
English (en)
Japanese (ja)
Other versions
JPWO2022185437A1
JPWO2022185437A5
Inventor
浩司 岡部
仁 山本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-03
Filing date
2021-03-03
Publication date
2024-12-24
Application filed by NEC Corp
Publication of JPWO2022185437A1
Publication of JPWO2022185437A5
Application granted
Publication of JP7605289B2
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
JP2023503251A 2021-03-03 2021-03-03 Speech recognition device, speech recognition method, learning device, learning method, and recording medium Active JP7605289B2 (ja)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/008106 WO2022185437A1 (ja) 2021-03-03 2021-03-03 Speech recognition device, speech recognition method, learning device, learning method, and recording medium

Publications (3)

Publication Number Publication Date
JPWO2022185437A1 2022-09-09
JPWO2022185437A5 2023-11-10
JP7605289B2 (ja) 2024-12-24

Family

ID=83153997

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023503251A Active JP7605289B2 (ja) 2021-03-03 2021-03-03 Speech recognition device, speech recognition method, learning device, learning method, and recording medium

Country Status (3)

Country Link
US (1) US20240144915A1
JP (1) JP7605289B2
WO (1) WO2022185437A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118891636A (zh) * 2023-02-20 2024-11-01 株式会社日立高新技术 Model generation system and model generation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013072974A (ja) 2011-09-27 2013-04-22 Toshiba Corp Speech recognition device, method, and program
JP2019012095A (ja) 2017-06-29 2019-01-24 日本放送協会 Phoneme recognition dictionary generation device, phoneme recognition device, and programs therefor
US10210860B1 (en) 2018-07-27 2019-02-19 Deepgram, Inc. Augmented generalized deep learning with special vocabulary

Also Published As

Publication number Publication date
WO2022185437A1 (ja) 2022-09-09
JPWO2022185437A1 2022-09-09
US20240144915A1 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
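KR102754124B1 (ko) End-to-end automatic speech recognition for numeric sequences
CN113439301B (zh) Method and system for machine learning
JP7092953B2 (ja) Phoneme-based context analysis for multilingual speech recognition with end-to-end models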
Le et al. Automatic speech recognition for under-resourced languages: application to Vietnamese language
KR20230043084A (ko) Text-to-speech synthesis method, apparatus, and computer-readable storage medium using machine learning based on sequential prosody features
Livescu et al. Subword modeling for automatic speech recognition: Past, present, and emerging approaches
Taylor et al. Analysis of pronunciation learning in end-to-end speech synthesis
CN110767213A (zh) Prosody prediction method and device
CN112669845A (zh) Method and device for correcting speech recognition results, electronic device, and storage medium
Chen et al. The USTC system for blizzard challenge 2014
JP6718787B2 (ja) Japanese speech recognition model learning device and program
US20240211688A1 (en) Systems and Methods for Generating Locale-Specific Phonetic Spelling Variations
US20080027725A1 (en) Automatic Accent Detection With Limited Manually Labeled Data
Hanzlíček et al. Using LSTM neural networks for cross‐lingual phonetic speech segmentation with an iterative correction procedure
Al-Zaro et al. Speaker-independent phoneme-based automatic Quranic speech recognition using deep learning
JP7605289B2 (ja) Speech recognition device, speech recognition method, learning device, learning method, and recording medium
Joshi et al. Vowel mispronunciation detection using DNN acoustic models with cross-lingual training.
Rajendran et al. A robust syllable centric pronunciation model for Tamil text to speech synthesizer
CN116453500A (zh) Speech synthesis method, system, electronic device, and storage medium for less widely spoken languages
Proença Automatic assessment of reading ability of children
Soundarya et al. Analysis of mispronunciation detection and diagnosis based on conventional deep learning techniques
Taylor Pronunciation modelling in end-to-end text-to-speech synthesis
Hendessi et al. A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM
US11809831B2 (en) Symbol sequence converting apparatus and symbol sequence conversion method
Sayed et al. Convolutional neural networks to facilitate the continuous recognition of Arabic speech with independent speakers

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230816

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20230816

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20240806

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240910

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20241112

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20241125

R150 Certificate of patent or registration of utility model

Ref document number: 7605289

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150