SG11201808360SA - Acoustic model training method, speech recognition method, apparatus, device and medium

Acoustic model training method, speech recognition method, apparatus, device and medium

Info

Publication number
SG11201808360SA
Authority
SG
Singapore
Prior art keywords
training
acoustic model
model
medium
model training
Prior art date
Application number
SG11201808360SA
Inventor
Hao Liang
Jianzong Wang
Ning Cheng
Jing Xiao
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of SG11201808360SA

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/16 Hidden Markov models [HMM]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/148 Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/022 Demisyllables, biphones or triphones being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Character Discrimination (AREA)

Abstract

Disclosed are an acoustic model training method, a speech recognition method, an apparatus, a device and a medium. The acoustic model training method comprises: performing feature extraction on a training speech signal to obtain an audio feature sequence; training the audio feature sequence with a phoneme Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) to obtain a phoneme feature sequence; and training the phoneme feature sequence with a Deep Neural Network-Hidden Markov Model (DNN-HMM) sequence training model to obtain a target acoustic model. The acoustic model training method can effectively reduce the time required for acoustic model training, improve training efficiency, and ensure recognition efficiency.
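
The abstract does not say which features are extracted, so the following is only a minimal sketch of the first step (speech signal to audio feature sequence), not the patent's method. It assumes a 16 kHz mono signal, 25 ms Hamming-windowed frames with a 10 ms hop, and log-power spectra as the features; the function name `extract_features` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def extract_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512):
    """Return one log-power spectrum per frame (illustrative feature choice)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Index matrix: row i selects the samples of frame i.
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]

    # Window each frame, then take the power of its one-sided FFT.
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

    # Small constant avoids log(0) on silent frames.
    return np.log(power + 1e-10)


# One second of a 440 Hz tone as stand-in "speech".
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = extract_features(tone)
print(features.shape)
```

With these assumed settings, one second of audio yields a sequence of 98 frames with 257 values each. A sequence of this kind is what the later GMM-HMM and DNN-HMM stages would consume; those stages are not sketched here.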
SG11201808360SA 2017-07-28 2017-08-31 Acoustic model training method, speech recognition method, apparatus, device and medium SG11201808360SA (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710627480.8A CN107680582B (en) 2017-07-28 2017-07-28 Acoustic model training method, voice recognition method, device, equipment and medium
PCT/CN2017/099825 WO2019019252A1 (en) 2017-07-28 2017-08-31 Acoustic model training method, speech recognition method and apparatus, device and medium

Publications (1)

Publication Number Publication Date
SG11201808360SA (en) 2019-02-27

Family

ID=61133210

Family Applications (1)

Application Number Title Priority Date Filing Date
SG11201808360SA SG11201808360SA (en) 2017-07-28 2017-08-31 Acoustic model training method, speech recognition method, apparatus, device and medium

Country Status (4)

Country Link
US (1) US11030998B2 (en)
CN (1) CN107680582B (en)
SG (1) SG11201808360SA (en)
WO (1) WO2019019252A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634476A (en) * 2019-10-09 2019-12-31 深圳大学 Method and system for rapidly building robust acoustic model

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102535411B1 (en) * 2017-11-16 2023-05-23 삼성전자주식회사 Apparatus and method related to metric learning based data classification
CN108447475A (en) * 2018-03-02 2018-08-24 国家电网公司华中分部 A kind of method for building up of the speech recognition modeling based on electric power dispatching system
CN108564940B (en) * 2018-03-20 2020-04-28 平安科技(深圳)有限公司 Speech recognition method, server and computer-readable storage medium
CN108806696B (en) * 2018-05-08 2020-06-05 平安科技(深圳)有限公司 Method and device for establishing voiceprint model, computer equipment and storage medium
CN108831463B (en) * 2018-06-28 2021-11-12 广州方硅信息技术有限公司 Lip language synthesis method and device, electronic equipment and storage medium
CN108989341B (en) * 2018-08-21 2023-01-13 平安科技(深圳)有限公司 Voice autonomous registration method and device, computer equipment and storage medium
CN108986835B (en) * 2018-08-28 2019-11-26 百度在线网络技术(北京)有限公司 Based on speech de-noising method, apparatus, equipment and the medium for improving GAN network
CN109167880B (en) * 2018-08-30 2021-05-21 努比亚技术有限公司 Double-sided screen terminal control method, double-sided screen terminal and computer readable storage medium
CN109036379B (en) * 2018-09-06 2021-06-11 百度时代网络技术(北京)有限公司 Speech recognition method, apparatus and storage medium
CN111048062B (en) 2018-10-10 2022-10-04 华为技术有限公司 Speech synthesis method and apparatus
CN110164452B (en) * 2018-10-10 2023-03-10 腾讯科技(深圳)有限公司 Voiceprint recognition method, model training method and server
CN109559735B (en) * 2018-10-11 2023-10-27 平安科技(深圳)有限公司 Voice recognition method, terminal equipment and medium based on neural network
CN109524011A (en) * 2018-10-22 2019-03-26 四川虹美智能科技有限公司 A kind of refrigerator awakening method and device based on Application on Voiceprint Recognition
CN109243429B (en) * 2018-11-21 2021-12-10 苏州奇梦者网络科技有限公司 Voice modeling method and device
US11170761B2 (en) * 2018-12-04 2021-11-09 Sorenson Ip Holdings, Llc Training of speech recognition systems
CN109326277B (en) * 2018-12-05 2022-02-08 四川长虹电器股份有限公司 Semi-supervised phoneme forced alignment model establishing method and system
CN109243465A (en) * 2018-12-06 2019-01-18 平安科技(深圳)有限公司 Voiceprint authentication method, device, computer equipment and storage medium
CN109830277B (en) * 2018-12-12 2024-03-15 平安科技(深圳)有限公司 Rope skipping monitoring method, electronic device and storage medium
CN109817191B (en) * 2019-01-04 2023-06-06 平安科技(深圳)有限公司 Tremolo modeling method, device, computer equipment and storage medium
CN109616103B (en) * 2019-01-09 2022-03-22 百度在线网络技术(北京)有限公司 Acoustic model training method and device and storage medium
CN109887484B (en) * 2019-02-22 2023-08-04 平安科技(深圳)有限公司 Dual learning-based voice recognition and voice synthesis method and device
CN111798857A (en) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information identification method and device, electronic equipment and storage medium
CN111833847B (en) * 2019-04-15 2023-07-25 北京百度网讯科技有限公司 Voice processing model training method and device
CN110415685A (en) * 2019-08-20 2019-11-05 河海大学 A kind of audio recognition method
US11423926B2 (en) * 2019-12-20 2022-08-23 Eduworks Corporation Real-time voice phishing detection
US11586964B2 (en) * 2020-01-30 2023-02-21 Dell Products L.P. Device component management using deep learning techniques
CN111489739B (en) * 2020-04-17 2023-06-16 嘉楠明芯(北京)科技有限公司 Phoneme recognition method, apparatus and computer readable storage medium
CN111696525A (en) * 2020-05-08 2020-09-22 天津大学 Kaldi-based Chinese speech recognition acoustic model construction method
CN111666469B (en) * 2020-05-13 2023-06-16 广州国音智能科技有限公司 Statement library construction method, device, equipment and storage medium
CN111798841B (en) * 2020-05-13 2023-01-03 厦门快商通科技股份有限公司 Acoustic model training method and system, mobile terminal and storage medium
CN111833852B (en) * 2020-06-30 2022-04-15 思必驰科技股份有限公司 Acoustic model training method and device and computer readable storage medium
CN111933121B (en) * 2020-08-31 2024-03-12 广州市百果园信息技术有限公司 Acoustic model training method and device
CN111816171B (en) * 2020-08-31 2020-12-11 北京世纪好未来教育科技有限公司 Training method of voice recognition model, voice recognition method and device
CN112331219B (en) * 2020-11-05 2024-05-03 北京晴数智慧科技有限公司 Voice processing method and device
CN112489662A (en) * 2020-11-13 2021-03-12 北京沃东天骏信息技术有限公司 Method and apparatus for training speech processing models
CN113035247B (en) * 2021-03-17 2022-12-23 广州虎牙科技有限公司 Audio text alignment method and device, electronic equipment and storage medium
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
TWI780738B (en) * 2021-05-28 2022-10-11 宇康生科股份有限公司 Abnormal articulation corpus amplification method and system, speech recognition platform, and abnormal articulation auxiliary device
CN113450803B (en) * 2021-06-09 2024-03-19 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer device and readable storage medium
CN113345418A (en) * 2021-06-09 2021-09-03 中国科学技术大学 Multilingual model training method based on cross-language self-training
CN113449626B (en) * 2021-06-23 2023-11-07 中国科学院上海高等研究院 Method and device for analyzing vibration signal of hidden Markov model, storage medium and terminal
CN113689867B (en) * 2021-08-18 2022-06-28 北京百度网讯科技有限公司 Training method and device of voice conversion model, electronic equipment and medium
CN113723546B (en) * 2021-09-03 2023-12-22 江苏理工学院 Bearing fault detection method and system based on discrete hidden Markov model
CN114360517B (en) * 2021-12-17 2023-04-18 天翼爱音乐文化科技有限公司 Audio processing method and device in complex environment and storage medium
CN116364063B (en) * 2023-06-01 2023-09-05 蔚来汽车科技(安徽)有限公司 Phoneme alignment method, apparatus, driving apparatus, and medium

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7089178B2 (en) * 2002-04-30 2006-08-08 Qualcomm Inc. Multistream network feature processing for a distributed speech recognition system
US8972253B2 (en) 2010-09-15 2015-03-03 Microsoft Technology Licensing, Llc Deep belief network for large vocabulary continuous speech recognition
US8442821B1 (en) 2012-07-27 2013-05-14 Google Inc. Multi-frame prediction for hybrid neural network/hidden Markov models
US9972306B2 (en) * 2012-08-07 2018-05-15 Interactive Intelligence Group, Inc. Method and system for acoustic data selection for training the parameters of an acoustic model
AU2013305615B2 (en) * 2012-08-24 2018-07-05 Interactive Intelligence, Inc. Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems
CN103117060B (en) * 2013-01-18 2015-10-28 中国科学院声学研究所 For modeling method, the modeling of the acoustic model of speech recognition
CN103971678B (en) 2013-01-29 2015-08-12 腾讯科技(深圳)有限公司 Keyword spotting method and apparatus
CN103971685B (en) * 2013-01-30 2015-06-10 腾讯科技(深圳)有限公司 Method and system for recognizing voice commands
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
KR101988222B1 (en) 2015-02-12 2019-06-13 한국전자통신연구원 Apparatus and method for large vocabulary continuous speech recognition
WO2016165120A1 (en) * 2015-04-17 2016-10-20 Microsoft Technology Licensing, Llc Deep neural support vector machines
KR102494139B1 (en) * 2015-11-06 2023-01-31 삼성전자주식회사 Apparatus and method for training neural network, apparatus and method for speech recognition
JP6679898B2 (en) * 2015-11-24 2020-04-15 富士通株式会社 KEYWORD DETECTION DEVICE, KEYWORD DETECTION METHOD, AND KEYWORD DETECTION COMPUTER PROGRAM
CN105702250B (en) * 2016-01-06 2020-05-19 福建天晴数码有限公司 Speech recognition method and device
CN105869624B (en) * 2016-03-29 2019-05-10 腾讯科技(深圳)有限公司 The construction method and device of tone decoding network in spoken digit recognition
CN105976812B (en) * 2016-04-28 2019-04-26 腾讯科技(深圳)有限公司 A kind of audio recognition method and its equipment
CN105957518B (en) * 2016-06-16 2019-05-31 内蒙古大学 A kind of method of Mongol large vocabulary continuous speech recognition
CN106409289B (en) * 2016-09-23 2019-06-28 合肥美的智能科技有限公司 Environment self-adaption method, speech recognition equipment and the household electrical appliance of speech recognition

Also Published As

Publication number Publication date
WO2019019252A1 (en) 2019-01-31
CN107680582B (en) 2021-03-26
US20210125603A1 (en) 2021-04-29
US11030998B2 (en) 2021-06-08
CN107680582A (en) 2018-02-09

Similar Documents

Publication Publication Date Title
SG11201808360SA (en) Acoustic model training method, speech recognition method, apparatus, device and medium
EP3751561A3 (en) Hotword recognition
EP3767619A4 (en) Speech recognition and speech recognition model training method and apparatus
AU2019268131A1 (en) Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal
EP4235369A3 (en) Modality learning on mobile devices
EP3154054A3 (en) Method and apparatus for training language model and recognizing speech
PH12019501674A1 (en) Speech wakeup method, apparatus, and electronic device
EP3770905A4 (en) Speech recognition method, apparatus and device, and storage medium
EP3742436A4 (en) Voice synthesis method, model training method, device and computer device
EP3742332A4 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
EP3001662A3 (en) Conference proceed apparatus and method for advancing conference
EP4064284A4 (en) Voice detection method, prediction model training method, apparatus, device, and medium
WO2014025682A3 (en) Acoustic data selection for training the parameters of an acoustic model
EP3648099A4 (en) Voice recognition method, device, apparatus, and storage medium
GB2551917A (en) Privacy-preserving training corpus selection
MY179900A (en) Speech recognition method and speech recognition apparatus
EP3479376A4 (en) Speech recognition method and apparatus based on speaker recognition
WO2018038385A3 (en) Method for voice recognition and electronic device for performing same
EP3584790A4 (en) Voiceprint recognition method, device, storage medium, and background server
EP3046053A3 (en) Method and apparatus for training language model, and method and apparatus for recognizing language
EP3193328A4 (en) Method and device for performing voice recognition using grammar model
WO2016044027A8 (en) Method and apparatus for performing speaker recognition
EP3353766A4 (en) Methods for the automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition
EP2963643A3 (en) Entity name recognition
EP3848855A4 (en) Learning method and apparatus for intention recognition model, and device