CN108389576A - Method and system for optimizing compressed speech recognition model - Google Patents
- Publication number: CN108389576A
- Application number: CN201810021903.6A
- Authority: CN (China)
- Prior art keywords: model, sequence, student, posterior probability, speech recognition
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810021903.6A CN108389576B (en) | 2018-01-10 | 2018-01-10 | Method and system for optimizing compressed speech recognition model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810021903.6A CN108389576B (en) | 2018-01-10 | 2018-01-10 | Method and system for optimizing compressed speech recognition model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108389576A (en) | 2018-08-10 |
CN108389576B (en) | 2020-09-01 |
Family ID: 63077076
Family Applications (1)
Application Number | Status | Publication | Title | Priority Date | Filing Date |
---|---|---|---|---|---|
CN201810021903.6A | Active | CN108389576B (en) | Method and system for optimizing compressed speech recognition model | 2018-01-10 | 2018-01-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108389576B (en) |
- 2018-01-10: CN application CN201810021903.6A filed (CN108389576B), status: Active
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0612093A (en) * | 1992-03-02 | 1994-01-21 | American Teleph & Telegr Co <Att> | Speech recognizing apparatus and method and apparatus for training thereof |
JPH10207485A (en) * | 1997-01-22 | 1998-08-07 | Toshiba Corp | Speech recognition system and method of speaker adaptation |
US7624015B1 (en) * | 1999-05-19 | 2009-11-24 | At&T Intellectual Property Ii, L.P. | Recognizing the numeric language in natural spoken dialogue |
CN1293428A (en) * | 2000-11-10 | 2001-05-02 | 清华大学 | Information check method based on speed recognition |
CN1455388A (en) * | 2002-09-30 | 2003-11-12 | 中国科学院声学研究所 | Voice identifying system and compression method of characteristic vector set for voice identifying system |
CN101105939A (en) * | 2007-09-04 | 2008-01-16 | 安徽科大讯飞信息科技股份有限公司 | Sonification guiding method |
CN102682763A (en) * | 2011-03-10 | 2012-09-19 | 北京三星通信技术研究有限公司 | Method, device and terminal for correcting named entity vocabularies in voice input text |
NZ719961A (en) * | 2012-07-20 | 2016-11-25 | Interactive Intelligence Inc | Method and system for real-time keyword spotting for speech analytics |
CN103413551A (en) * | 2013-07-16 | 2013-11-27 | 清华大学 | Sparse dimension reduction-based speaker identification method |
CN104951468A (en) * | 2014-03-28 | 2015-09-30 | 阿里巴巴集团控股有限公司 | Data searching and processing method and system |
CN106157953A (en) * | 2015-04-16 | 2016-11-23 | 科大讯飞股份有限公司 | continuous speech recognition method and system |
US20170004858A1 (en) * | 2015-06-30 | 2017-01-05 | Coursera, Inc. | Content-based audio playback speed controller |
CN106384587A (en) * | 2015-07-24 | 2017-02-08 | 科大讯飞股份有限公司 | Voice recognition method and system thereof |
US20170092262A1 (en) * | 2015-09-30 | 2017-03-30 | Nice-Systems Ltd | Bettering scores of spoken phrase spotting |
WO2017148523A1 (en) * | 2016-03-03 | 2017-09-08 | Telefonaktiebolaget Lm Ericsson (Publ) | Non-parametric audio classification |
CN107545897A (en) * | 2016-06-23 | 2018-01-05 | 松下知识产权经营株式会社 | Conversation activity presumption method, conversation activity estimating device and program |
CN106653003A (en) * | 2016-12-26 | 2017-05-10 | 北京云知声信息技术有限公司 | Voice recognition method and device |
Non-Patent Citations (1)
Title |
---|
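Wu Zhenyang (吴镇扬): "Feature compensation algorithm based on hidden Markov model and parallel model combination", Journal of Southeast University (Natural Science Edition) * |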
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110867178A (en) * | 2018-08-28 | 2020-03-06 | 中国科学院声学研究所 | Multi-channel far-field speech recognition method |
CN110867178B (en) * | 2018-08-28 | 2022-01-21 | 中国科学院声学研究所 | Multi-channel far-field speech recognition method |
CN112673421A (en) * | 2018-11-28 | 2021-04-16 | 谷歌有限责任公司 | Training and/or using language selection models to automatically determine a language for voice recognition of spoken utterances |
CN109448706A (en) * | 2018-12-12 | 2019-03-08 | 苏州思必驰信息科技有限公司 | Neural network language model compression method and system |
CN110246487A (en) * | 2019-06-13 | 2019-09-17 | 苏州思必驰信息科技有限公司 | Optimization method and system for single pass speech recognition modeling |
CN110246487B (en) * | 2019-06-13 | 2021-06-22 | 思必驰科技股份有限公司 | Optimization method and system for single-channel speech recognition model |
CN111312271A (en) * | 2020-02-28 | 2020-06-19 | 云知声智能科技股份有限公司 | Model compression method and system for improving convergence rate and processing performance |
CN111598216B (en) * | 2020-04-16 | 2021-07-06 | 北京百度网讯科技有限公司 | Method, device and equipment for generating student network model and storage medium |
CN111598216A (en) * | 2020-04-16 | 2020-08-28 | 北京百度网讯科技有限公司 | Method, device and equipment for generating student network model and storage medium |
CN111627428A (en) * | 2020-05-15 | 2020-09-04 | 北京青牛技术股份有限公司 | Method for constructing compressed speech recognition model |
CN111627428B (en) * | 2020-05-15 | 2023-11-14 | 北京青牛技术股份有限公司 | Method for constructing compressed speech recognition model |
CN111768762A (en) * | 2020-06-05 | 2020-10-13 | 北京有竹居网络技术有限公司 | Voice recognition method and device and electronic equipment |
CN111768762B (en) * | 2020-06-05 | 2022-01-21 | 北京有竹居网络技术有限公司 | Voice recognition method and device and electronic equipment |
CN111754985A (en) * | 2020-07-06 | 2020-10-09 | 上海依图信息技术有限公司 | Method and device for training voice recognition model and voice recognition |
CN111754985B (en) * | 2020-07-06 | 2023-05-02 | 上海依图信息技术有限公司 | Training of voice recognition model and voice recognition method and device |
WO2022121257A1 (en) * | 2020-12-11 | 2022-06-16 | 平安科技(深圳)有限公司 | Model training method and apparatus, speech recognition method and apparatus, device, and storage medium |
CN113362218A (en) * | 2021-05-21 | 2021-09-07 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113314107A (en) * | 2021-05-28 | 2021-08-27 | 思必驰科技股份有限公司 | Method and apparatus for training speech augmentation models |
CN113488023A (en) * | 2021-07-07 | 2021-10-08 | 合肥讯飞数码科技有限公司 | Language identification model construction method and language identification method |
CN113488023B (en) * | 2021-07-07 | 2022-06-14 | 合肥讯飞数码科技有限公司 | Language identification model construction method and language identification method |
Also Published As
Publication number | Publication date |
---|---|
CN108389576B (en) | 2020-09-01 |
Similar Documents
Publication | Title |
---|---|
CN108389576A (en) | The optimization method and system of compressed speech recognition modeling |
Chen et al. | End-to-end neural network based automated speech scoring | |
US11487950B2 (en) | Autonomous evolution intelligent dialogue method, system, and device based on a game with a physical environment | |
CN109637546B (en) | Knowledge distillation method and apparatus | |
Sim et al. | An investigation into on-device personalization of end-to-end automatic speech recognition models | |
US20200402497A1 (en) | Systems and Methods for Speech Generation | |
CN110706692B (en) | Training method and system of child voice recognition model | |
CN109657041A (en) | The problem of based on deep learning automatic generation method | |
CN110246487A (en) | Optimization method and system for single pass speech recognition modeling | |
CN110275939B (en) | Method and device for determining conversation generation model, storage medium and electronic equipment | |
Schatzmann et al. | Error simulation for training statistical dialogue systems | |
CN106409288B (en) | A method of speech recognition is carried out using the SVM of variation fish-swarm algorithm optimization | |
CN107408111A (en) | End-to-end speech recognition | |
CN109036391A (en) | Audio recognition method, apparatus and system | |
CN108711421A (en) | A kind of voice recognition acoustic model method for building up and device and electronic equipment | |
CN102651217A (en) | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis | |
CN108389575A (en) | Audio data recognition methods and system | |
CN109410974A (en) | Sound enhancement method, device, equipment and storage medium | |
CN110427629A (en) | Semi-supervised text simplified model training method and system | |
EP3916640A2 (en) | Method and apparatus for improving quality of attention-based sequence-to-sequence model | |
CN114596844A (en) | Acoustic model training method, voice recognition method and related equipment | |
May | Kernel approximation methods for speech recognition | |
CN108461080A (en) | A kind of Acoustic Modeling method and apparatus based on HLSTM models | |
CN113254582A (en) | Knowledge-driven dialogue method based on pre-training model | |
CN114373480A (en) | Training method of voice alignment network, voice alignment method and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2020-06-16. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: AI SPEECH Co.,Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Address before: Suzhou City, Jiangsu Province, Suzhou Industrial Park 215123 Xinghu Street No. 328 Creative Industry Park 9-703. Applicant before: AI SPEECH Co.,Ltd.; SHANGHAI JIAO TONG University |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2020-10-27. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: AI SPEECH Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Co.,Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
CP01 | Change in the name or title of a patent holder | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Patentee after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Patentee before: AI SPEECH Co.,Ltd. |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Optimization Method and System for Compressed Speech Recognition Model. Effective date of registration: 2023-07-26. Granted publication date: 2020-09-01. Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch. Pledgor: Sipic Technology Co.,Ltd. Registration number: Y2023980049433 |