CN113396455B - Transliteration for speech recognition training and scoring - Google Patents

Transliteration for speech recognition training and scoring

Info

Publication number
CN113396455B
Authority
CN
China
Prior art keywords
script
speech recognition
language
words
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980082043.XA
Other languages
English (en)
Chinese (zh)
Other versions
CN113396455A (zh)
Inventor
布瓦那·拉马巴德兰
马敏
佩德罗·J·莫雷诺·门吉巴尔
杰西·埃蒙德
布赖恩·E·罗克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN113396455A
Application granted
Publication of CN113396455B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
CN201980082043.XA 2018-12-12 2019-02-08 Transliteration for speech recognition training and scoring Active CN113396455B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862778431P 2018-12-12 2018-12-12
US62/778,431 2018-12-12
PCT/US2019/017258 WO2020122974A1 (en) 2018-12-12 2019-02-08 Transliteration for speech recognition training and scoring

Publications (2)

Publication Number Publication Date
CN113396455A (zh) 2021-09-14
CN113396455B (zh) 2025-04-15

Family

ID=65520451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980082043.XA Active CN113396455B (zh) 2018-12-12 2019-02-08 Transliteration for speech recognition training and scoring

Country Status (5)

Country Link
EP (1) EP3877973B1
JP (1) JP7208399B2
KR (1) KR102731583B1
CN (1) CN113396455B
WO (1) WO2020122974A1

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114420159B (zh) * 2020-10-12 2025-04-18 苏州声通信息科技有限公司 Audio evaluation method and apparatus, and non-transitory storage medium
US11568858B2 (en) * 2020-10-17 2023-01-31 International Business Machines Corporation Transliteration based data augmentation for training multilingual ASR acoustic models in low resource settings
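CN113889105B (zh) * 2021-09-29 2025-07-04 北京搜狗科技发展有限公司 Speech translation method and apparatus, and apparatus for speech translation
CN114118108A (zh) * 2021-11-11 2022-03-01 支付宝(杭州)信息技术有限公司 Method for establishing a translation model, translation method, and corresponding apparatus
CN114299930B (zh) * 2021-12-21 2025-03-14 广州虎牙科技有限公司 End-to-end speech recognition model processing method, speech recognition method, and related apparatus
CN114520001B (zh) * 2022-03-22 2025-08-01 科大讯飞股份有限公司 Speech recognition method, apparatus, device, and storage medium
KR102616598B1 (ko) * 2023-05-30 2023-12-22 주식회사 엘솔루 Method for generating parallel data of original-text subtitles using translated subtitles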

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009129315A1 (en) * 2008-04-15 2009-10-22 Mobile Technologies, Llc System and methods for maintaining speech-to-speech translation in the field

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335688B2 (en) * 2004-08-20 2012-12-18 Multimodal Technologies, Llc Document transcription system training
US20080221866A1 (en) * 2007-03-06 2008-09-11 Lalitesh Katragadda Machine Learning For Transliteration
JP2009157888A (ja) 2007-12-28 2009-07-16 National Institute Of Information & Communication Technology Transliteration model creation apparatus, transliteration apparatus, and computer program therefor
US7472061B1 (en) * 2008-03-31 2008-12-30 International Business Machines Corporation Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations
US9176936B2 (en) * 2012-09-28 2015-11-03 International Business Machines Corporation Transliteration pair matching
US10540957B2 (en) * 2014-12-15 2020-01-21 Baidu Usa Llc Systems and methods for speech transcription
JP2018028848A (ja) 2016-08-19 2018-02-22 日本放送協会 Conversion processing apparatus, transliteration processing apparatus, and program
US10255909B2 (en) * 2017-06-29 2019-04-09 Intel IP Corporation Statistical-analysis-based reset of recurrent neural networks for automatic speech recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Radek Safarik et al. Unified approach to development of ASR systems for East Slavic languages. Statistical Language and Speech Processing, SLSP 2017, Lecture Notes in Computer Science. 2017, vol. 10583, pp. 193-203. *
Unified approach to development of ASR systems for East Slavic languages; Radek Safarik et al.; Statistical Language and Speech Processing, SLSP 2017, Lecture Notes in Computer Science; 2017-09-27; vol. 10583; pp. 193-203 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240185840A1 (en) * 2021-08-30 2024-06-06 Boe Technology Group Co., Ltd. Method of training natural language processing model, method of natural language processing, and electronic device
US12525222B2 (en) * 2021-08-30 2026-01-13 Boe Technology Group Co., Ltd. Method of training natural language processing model, method of natural language processing, and electronic device

Also Published As

Publication number Publication date
JP2022515048A (ja) 2022-02-17
JP7208399B2 (ja) 2023-01-18
KR102731583B1 (ko) 2024-11-15
CN113396455A (zh) 2021-09-14
WO2020122974A1 (en) 2020-06-18
KR20210076163A (ko) 2021-06-23
EP3877973A1 (en) 2021-09-15
EP3877973B1 (en) 2025-09-10

Similar Documents

Publication Publication Date Title
CN113396455B (zh) Transliteration for speech recognition training and scoring
US11417322B2 (en) Transliteration for speech recognition training and scoring
US11942076B2 (en) Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models
Stolcke et al. Recent innovations in speech-to-text transcription at SRI-ICSI-UW
TWI539441B (zh) Speech recognition method and electronic apparatus
Kumar et al. A large-vocabulary continuous speech recognition system for Hindi
Emond et al. Transliteration based approaches to improve code-switched speech recognition performance
TW201517015A (zh) Method for building an acoustic model, speech recognition method, and electronic apparatus thereof
US20150179169A1 (en) Speech Recognition By Post Processing Using Phonetic and Semantic Information
Hirayama et al. Automatic speech recognition for mixed dialect utterances by mixing dialect language models
Cucu et al. SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
Arısoy et al. A unified language model for large vocabulary continuous speech recognition of Turkish
Alsharhan et al. Evaluating the effect of using different transcription schemes in building a speech recognition system for Arabic
Halabi Arabic speech corpus
Srivastava et al. Homophone Identification and Merging for Code-switched Speech Recognition.
Qiu et al. Context-aware neural confidence estimation for rare word speech recognition
Pellegrini et al. Automatic word decompounding for asr in a morphologically rich language: Application to amharic
US20240420680A1 (en) Simultaneous and multimodal rendering of abridged and non-abridged translations
Veisi et al. Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon
Horii et al. Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text
Wong et al. Goodness-of-pronunciation without phoneme time alignment
Vertanen Efficient computer interfaces using continuous gestures, language models, and speech
Lehečka et al. Improving speech recognition by detecting foreign inclusions and generating pronunciations
Pushpakumara Applicability of Transfer Learning on End-to-End Sinhala Speech Recognition
Georgiou et al. Context dependent statistical augmentation of persian transcripts.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant