KR102494139B1 - Neural network training apparatus and method, and speech recognition apparatus and method - Google Patents

Neural network training apparatus and method, and speech recognition apparatus and method

Info

Publication number
KR102494139B1
Authority
KR
South Korea
Prior art keywords
learning
data
neural network
primary
training
Prior art date
Application number
KR1020150156152A
Other languages
English (en)
Korean (ko)
Other versions
KR20170053525A (ko)
Inventor
이호식
최희열
Original Assignee
삼성전자주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-11-06
Filing date
2015-11-06
Publication date
2023-01-31
Application filed by 삼성전자주식회사
Priority to KR1020150156152A (KR102494139B1)
Priority to JP2016216662A (JP6861500B2)
Priority to US15/344,110 (US10529317B2)
Priority to EP16197493.6A (EP3166105B1)
Priority to CN201610977394.5A (CN106683663B)
Publication of KR20170053525A
Application granted
Publication of KR102494139B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Image Analysis (AREA)
KR1020150156152A 2015-11-06 2015-11-06 Neural network training apparatus and method, and speech recognition apparatus and method KR102494139B1 (ko)

Priority Applications (5)

Application Number Publication Number Priority Date Filing Date Title
KR1020150156152A KR102494139B1 (ko) 2015-11-06 2015-11-06 Neural network training apparatus and method, and speech recognition apparatus and method
JP2016216662A JP6861500B2 (ja) 2015-11-06 2016-11-04 Neural network training apparatus and method, and speech recognition apparatus and method
US15/344,110 US10529317B2 (en) 2015-11-06 2016-11-04 Neural network training apparatus and method, and speech recognition apparatus and method
EP16197493.6A EP3166105B1 (en) 2015-11-06 2016-11-07 Neural network training apparatus and method
CN201610977394.5A CN106683663B (zh) 2015-11-06 2016-11-07 Neural network training apparatus and method, and speech recognition apparatus and method

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
KR1020150156152A KR102494139B1 (ko) 2015-11-06 2015-11-06 Neural network training apparatus and method, and speech recognition apparatus and method

Publications (2)

Publication Number Publication Date
KR20170053525A (ko) 2017-05-16
KR102494139B1 (ko) 2023-01-31

Family

ID=57256114

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
KR1020150156152A KR102494139B1 (ko) 2015-11-06 2015-11-06 Neural network training apparatus and method, and speech recognition apparatus and method

Country Status (5)

Country Link
US (1) US10529317B2 (zh)
EP (1) EP3166105B1 (zh)
JP (1) JP6861500B2 (zh)
KR (1) KR102494139B1 (zh)
CN (1) CN106683663B (zh)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
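KR102209689B1 (ko) * 2015-09-10 2021-01-28 삼성전자주식회사 Apparatus and method for generating acoustic model, and apparatus and method for speech recognition
WO2017126482A1 (ja) * 2016-01-19 2017-07-27 日本電気株式会社 Information processing device, information processing method, and recording medium
JP2018187006A (ja) * 2017-04-30 2018-11-29 株式会社藤商事 Reel-type gaming machine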
US11195093B2 (en) * 2017-05-18 2021-12-07 Samsung Electronics Co., Ltd Apparatus and method for student-teacher transfer learning network using knowledge bridge
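TWI767000B (zh) * 2017-05-20 2022-06-11 英商淵慧科技有限公司 Method of generating a waveform and computer storage medium
CN109147773B (zh) * 2017-06-16 2021-10-26 上海寒武纪信息科技有限公司 Speech recognition device and method
CN107680582B (zh) * 2017-07-28 2021-03-26 平安科技(深圳)有限公司 Acoustic model training method, speech recognition method, apparatus, device, and medium
CN107610709B (zh) * 2017-08-01 2021-03-19 百度在线网络技术(北京)有限公司 Method and system for training a voiceprint recognition model
KR102563752B1 (ko) 2017-09-29 2023-08-04 삼성전자주식회사 Training method for a neural network, recognition method using a neural network, and apparatuses thereof
CN108417224B (zh) * 2018-01-19 2020-09-01 苏州思必驰信息科技有限公司 Training and recognition method and system for a bidirectional neural network model
KR20190129580A (ko) 2018-05-11 2019-11-20 삼성전자주식회사 Method and apparatus for personalizing a speech recognition model
CN109166571B (zh) * 2018-08-06 2020-11-24 广东美的厨房电器制造有限公司 Wake-up word training method and device for a household appliance, and household appliance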
US20200019840A1 (en) * 2018-07-13 2020-01-16 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for sequential event prediction with noise-contrastive estimation for marked temporal point process
EP3598777B1 (en) * 2018-07-18 2023-10-11 Oticon A/s A hearing device comprising a speech presence probability estimator
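CN109036412A (zh) * 2018-09-17 2018-12-18 苏州奇梦者网络科技有限公司 Voice wake-up method and system
CN109448746B (zh) * 2018-09-28 2020-03-24 百度在线网络技术(北京)有限公司 Speech noise reduction method and device
CN111383651A (zh) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Speech noise reduction method, device, and terminal equipment
KR102002549B1 (ko) * 2019-01-23 2019-07-22 주식회사 솔리드웨어 Method and apparatus for generating a multi-stage classification model
CN109872730B (zh) * 2019-03-14 2021-01-12 广州飞傲电子科技有限公司 Distortion compensation method for audio data, model building method, and audio output device
CN111783932A (zh) * 2019-04-03 2020-10-16 华为技术有限公司 Method and apparatus for training a neural network
KR20210010284A (ko) 2019-07-18 2021-01-27 삼성전자주식회사 Method and apparatus for personalizing an artificial intelligence model
KR102321798B1 (ko) * 2019-08-15 2021-11-05 엘지전자 주식회사 Method for training an artificial neural network-based speech recognition model, and speech recognition device
CN110349571B (zh) * 2019-08-23 2021-09-07 北京声智科技有限公司 Training method based on connectionist temporal classification, and related apparatus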
US11900246B2 (en) 2019-09-02 2024-02-13 Samsung Electronics Co., Ltd. Method and apparatus for recognizing user based on on-device training
CN110634476B (zh) * 2019-10-09 2022-06-14 深圳大学 Method and system for rapidly building a robust acoustic model
KR102663669B1 (ko) * 2019-11-01 2024-05-08 엘지전자 주식회사 Speech synthesis in a noisy environment
US20210142177A1 (en) * 2019-11-13 2021-05-13 Nvidia Corporation Synthesizing data for training one or more neural networks
DE102020201400A1 (de) 2020-02-05 2021-08-05 Zf Friedrichshafen Ag Generating acoustic training data
US11475220B2 (en) * 2020-02-21 2022-10-18 Adobe Inc. Predicting joint intent-slot structure
CN111582463B (zh) * 2020-06-08 2024-02-09 佛山金华信智能科技有限公司 Servo motor fault identification and model training method, device, medium, and terminal
US11455534B2 (en) * 2020-06-09 2022-09-27 Macronix International Co., Ltd. Data set cleaning for artificial neural network training
US11741944B2 (en) * 2020-11-24 2023-08-29 Google Llc Speech personalization and federated training using real world noise
CN112992170B (zh) * 2021-01-29 2022-10-28 青岛海尔科技有限公司 Model training method and device, storage medium, and electronic device
KR102362872B1 (ko) * 2021-06-08 2022-02-15 오브젠 주식회사 Method for refining clean-label data for artificial intelligence training
GB202203733D0 (en) * 2022-03-17 2022-05-04 Samsung Electronics Co Ltd Patched multi-condition training for robust speech recognition

Family Cites Families (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1989009457A1 (en) 1988-03-25 1989-10-05 Hitachi, Ltd. Processing of high-order information with neuron network and minimum and maximum value searching method therefor
JP3521429B2 (ja) * 1992-03-30 2004-04-19 セイコーエプソン株式会社 Speech recognition device using a neural network and learning method thereof
DE19531967C2 (de) 1995-08-30 1997-09-11 Siemens Ag Method for training a neural network with the non-deterministic behavior of a technical system
US6446038B1 (en) * 1996-04-01 2002-09-03 Qwest Communications International, Inc. Method and system for objectively evaluating speech
KR100199296B1 (ko) 1996-10-02 1999-06-15 이계철 Hangul recognition system using regular noise
JP3614662B2 (ja) 1998-06-12 2005-01-26 日本電信電話株式会社 Spatio-temporal pattern detection method and apparatus, and recording medium
JP2000259598A (ja) 1999-03-12 2000-09-22 Fuji Electric Co Ltd Optimization learning method for a neural network
US6615170B1 (en) * 2000-03-07 2003-09-02 International Business Machines Corporation Model-based voice activity detection system and method using a log-likelihood ratio and pitch
US6876966B1 (en) * 2000-10-16 2005-04-05 Microsoft Corporation Pattern recognition training method and apparatus using inserted noise followed by noise reduction
TWI223792B (en) * 2003-04-04 2004-11-11 Penpower Technology Ltd Speech model training method applied in speech recognition
KR100576803B1 (ko) 2003-12-11 2006-05-10 한국전자통신연구원 Neural network-based apparatus and method for integrated speech recognition using speech, image, and context
US7620546B2 (en) * 2004-03-23 2009-11-17 Qnx Software Systems (Wavemakers), Inc. Isolating speech signals utilizing neural networks
WO2006000103A1 (en) * 2004-06-29 2006-01-05 Universite De Sherbrooke Spiking neural network and use thereof
WO2006099621A2 (en) * 2005-03-17 2006-09-21 University Of Southern California Topic specific language models built from large numbers of documents
US20060277028A1 (en) * 2005-06-01 2006-12-07 Microsoft Corporation Training a statistical parser on noisy data by filtering
US20090271195A1 (en) * 2006-07-07 2009-10-29 Nec Corporation Speech recognition apparatus, speech recognition method, and speech recognition program
KR100908121B1 (ko) 2006-12-15 2009-07-16 삼성전자주식회사 Method and apparatus for converting a speech feature vector
US20110060587A1 (en) * 2007-03-07 2011-03-10 Phillips Michael S Command and control utilizing ancillary information in a mobile voice-to-speech application
ES2678415T3 (es) * 2008-08-05 2018-08-10 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing an audio signal for speech enhancement using feature extraction
US20100145687A1 (en) * 2008-12-04 2010-06-10 Microsoft Corporation Removing noise from speech
US8639502B1 (en) * 2009-02-16 2014-01-28 Arrowhead Center, Inc. Speaker model-based speech enhancement system
EP2259214B1 (en) 2009-06-04 2013-02-27 Honda Research Institute Europe GmbH Implementing a neural associative memory based on non-linear learning of discrete synapses
JP5027859B2 (ja) 2009-10-26 2012-09-19 パナソニック デバイスSunx株式会社 Signal identification method and signal identification device
US8265928B2 (en) * 2010-04-14 2012-09-11 Google Inc. Geotagged environmental audio for enhanced speech recognition accuracy
US8447596B2 (en) * 2010-07-12 2013-05-21 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
US8725669B1 (en) * 2010-08-02 2014-05-13 Chi Yung Fu Signal processing method and apparatus
TWI442384B (zh) * 2011-07-26 2014-06-21 Ind Tech Res Inst Microphone array-based speech recognition system and method
US8972256B2 (en) * 2011-10-17 2015-03-03 Nuance Communications, Inc. System and method for dynamic noise adaptation for robust automatic speech recognition
US9477925B2 (en) 2012-11-20 2016-10-25 Microsoft Technology Licensing, Llc Deep neural networks training for speech and pattern recognition
KR101558653B1 (ko) 2013-06-14 2015-10-08 전북대학교산학협력단 System and method for improving image quality using a neural network
US9679224B2 (en) * 2013-06-28 2017-06-13 Cognex Corporation Semi-supervised method for training multiple pattern recognition and registration tool models
CN104143327B (zh) * 2013-07-10 2015-12-09 腾讯科技(深圳)有限公司 Acoustic model training method and apparatus
US9508347B2 (en) * 2013-07-10 2016-11-29 Tencent Technology (Shenzhen) Company Limited Method and device for parallel processing in model training
CN103474066B (zh) * 2013-10-11 2016-01-06 福州大学 Ecological sound recognition method based on multi-band signal reconstruction
US9633671B2 (en) * 2013-10-18 2017-04-25 Apple Inc. Voice quality enhancement techniques, speech recognition techniques, and related systems
CN103854662B (zh) * 2014-03-04 2017-03-15 中央军委装备发展部第六十三研究所 Adaptive speech detection method based on multi-domain joint estimation
EP3192071A4 (en) * 2014-09-09 2017-08-23 Microsoft Technology Licensing, LLC Variable-component deep neural network for robust speech recognition
US9953661B2 (en) * 2014-09-26 2018-04-24 Cirrus Logic Inc. Neural network voice activity detection employing running range normalization
US9299347B1 (en) * 2014-10-22 2016-03-29 Google Inc. Speech recognition using associative mapping
CN104538028B (zh) * 2014-12-25 2017-10-17 清华大学 Continuous speech recognition method based on a deep long short-term memory recurrent neural network
US20160189730A1 (en) * 2014-12-30 2016-06-30 Iflytek Co., Ltd. Speech separation method and system
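CN104700828B (zh) * 2015-03-19 2018-01-12 清华大学 Method for constructing a deep long short-term memory recurrent neural network acoustic model based on the selective attention principle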
US9666183B2 (en) * 2015-03-27 2017-05-30 Qualcomm Incorporated Deep neural net based filter prediction for audio event classification and extraction
CN104952448A (zh) * 2015-05-04 2015-09-30 张爱英 Feature enhancement method and system using a bidirectional long short-term memory recurrent neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Seyedmahdad Mirsamadi et al., "A Study on Deep Neural Network Acoustic Model Adaptation for Robust Far-field Speech Recognition." (2015.09.06.)
Shi Yin et al., "Noisy training for deep neural network in speech recognition." (2015.01.20.)

Also Published As

Publication number Publication date
EP3166105A1 (en) 2017-05-10
US20170133006A1 (en) 2017-05-11
CN106683663A (zh) 2017-05-17
CN106683663B (zh) 2022-01-25
US10529317B2 (en) 2020-01-07
JP6861500B2 (ja) 2021-04-21
KR20170053525A (ko) 2017-05-16
EP3166105B1 (en) 2019-09-18
JP2017090912A (ja) 2017-05-25

Similar Documents

Publication Publication Date Title
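KR102494139B1 (ko) Neural network training apparatus and method, and speech recognition apparatus and method
KR102209689B1 (ko) Apparatus and method for generating acoustic model, and apparatus and method for speech recognition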
US9818409B2 (en) Context-dependent modeling of phonemes
He et al. Multi-view recurrent neural acoustic word embeddings
US10268671B2 (en) Generating parse trees of text segments using neural networks
CN106469552B (zh) Speech recognition apparatus and method
US9984683B2 (en) Automatic speech recognition using multi-dimensional models
CN106688034B (zh) Text-to-speech conversion with emotional content
EP3218854B1 (en) Generating natural language descriptions of images
KR102195627B1 (ko) Apparatus and method for generating an interpretation model, and apparatus and method for automatic interpretation
US20180061439A1 (en) Automatic audio captioning
KR102101044B1 (ko) Audio human interactive proof technique based on text-to-speech and semantics
US11675975B2 (en) Word classification based on phonetic features
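CN103839545A (zh) Apparatus and method for constructing a multilingual acoustic model
CN112528637B (zh) Text processing model training method and apparatus, computer device, and storage medium
JP2018537788A (ja) Augmenting neural networks with external memory
CN107112005A (zh) Deep neural support vector machine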
US11011161B2 (en) RNNLM-based generation of templates for class-based text generation
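CN110335608B (zh) Voiceprint verification method, apparatus, device, and storage medium
CN118043885A (zh) Contrastive Siamese network for semi-supervised speech recognition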
US10755171B1 (en) Hiding and detecting information using neural networks
JP6810580B2 (ja) Language model training device and program therefor
CN111522937B (zh) Dialogue script recommendation method, apparatus, and electronic device
WO2018154372A1 (en) Sound identification utilizing periodic indications
JP6082657B2 (ja) Pause insertion model selection device, pause insertion device, and methods and programs therefor

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant