CN117059081A - A lightweight speech recognition method, computer device, and readable storage medium - Google Patents
- Publication number
- CN117059081A (application number CN202311111161.3A)
- Authority
- CN
- China
- Prior art keywords
- attention
- module
- layer
- decoder
- attention module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Concept ID | Name | Category | Query match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 38 |
238000003860 | storage | Methods | 0.000 | title, claims, abstract, description | 13 |
238000007781 | pre-processing | Methods | 0.000 | claims, abstract, description | 9 |
238000005516 | engineering process | Methods | 0.000 | claims, abstract, description | 8 |
230000006870 | function | Effects | 0.000 | claims, description | 24 |
238000004590 | computer program | Methods | 0.000 | claims, description | 17 |
230000000873 | masking effect | Effects | 0.000 | claims, description | 6 |
239000011159 | matrix material | Substances | 0.000 | claims, description | 6 |
230000008569 | process | Effects | 0.000 | claims, description | 6 |
238000010606 | normalization | Methods | 0.000 | claims, description | 5 |
230000009466 | transformation | Effects | 0.000 | claims, description | 5 |
230000004913 | activation | Effects | 0.000 | claims, description | 3 |
230000007246 | mechanism | Effects | 0.000 | abstract, description | 6 |
238000010586 | diagram | Methods | 0.000 | description | 10 |
238000004364 | calculation method | Methods | 0.000 | description | 9 |
238000012545 | processing | Methods | 0.000 | description | 7 |
230000000694 | effects | Effects | 0.000 | description | 2 |
238000000605 | extraction | Methods | 0.000 | description | 2 |
230000003993 | interaction | Effects | 0.000 | description | 2 |
238000004519 | manufacturing process | Methods | 0.000 | description | 2 |
238000003058 | natural language processing | Methods | 0.000 | description | 2 |
238000005457 | optimization | Methods | 0.000 | description | 2 |
238000013528 | artificial neural network | Methods | 0.000 | description | 1 |
230000008901 | benefit | Effects | 0.000 | description | 1 |
230000006835 | compression | Effects | 0.000 | description | 1 |
238000007906 | compression | Methods | 0.000 | description | 1 |
230000007547 | defect | Effects | 0.000 | description | 1 |
230000001419 | dependent effect | Effects | 0.000 | description | 1 |
238000013461 | design | Methods | 0.000 | description | 1 |
230000006872 | improvement | Effects | 0.000 | description | 1 |
230000002452 | interceptive effect | Effects | 0.000 | description | 1 |
238000013140 | knowledge distillation | Methods | 0.000 | description | 1 |
230000004048 | modification | Effects | 0.000 | description | 1 |
238000012986 | modification | Methods | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
230000004044 | response | Effects | 0.000 | description | 1 |
238000012549 | training | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311111161.3A (CN117059081B, zh) | 2023-08-30 | 2023-08-30 | A lightweight speech recognition method, computer device, and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117059081A (zh) | 2023-11-14 |
CN117059081B (zh) | 2024-08-09 |
Family
ID=88653492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311111161.3A (CN117059081B, zh; status: Active) | A lightweight speech recognition method, computer device, and readable storage medium | 2023-08-30 | 2023-08-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117059081B (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180341860A1 (en) * | 2017-05-23 | 2018-11-29 | Google LLC | Attention-based sequence transduction neural networks |
CN110909527A (zh) * | 2019-12-03 | 2020-03-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for running a text processing model, electronic device, and storage medium |
US20220121871A1 (en) * | 2020-10-16 | 2022-04-21 | Tsinghua University | Multi-directional scene text recognition method and system based on multi-element attention mechanism |
WO2022121150A1 (zh) * | 2020-12-10 | 2022-06-16 | 平安科技(深圳)有限公司 | Speech recognition method and apparatus based on a self-attention mechanism and a memory network |
CN114999460A (zh) * | 2022-05-18 | 2022-09-02 | 匀熵智能科技(无锡)有限公司 | A lightweight Chinese speech recognition method incorporating Transformer |
CN116013309A (zh) * | 2023-01-14 | 2023-04-25 | 西南大学 | Speech recognition system and method based on a lightweight Transformer network |
Also Published As
Publication number | Publication date |
---|---|
CN117059081B (zh) | 2024-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
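CN112037798B (zh) | | Speech recognition method and system based on a triggered non-autoregressive model |
CN111048082B (zh) | | An improved end-to-end speech recognition method |
CN111415667B (zh) | | A streaming end-to-end speech recognition model training and decoding method |
CN110490946B (zh) | | Text-to-image generation method based on cross-modal similarity and generative adversarial networks |
CN112464861B (zh) | | Early behavior recognition method, system, and storage medium for intelligent human-computer interaction |
CN110633683B (zh) | | Chinese sentence-level lip-reading recognition method combining DenseNet and resBi-LSTM |
CN111477221A (zh) | | Speech recognition system using a bidirectional temporal convolution and self-attention network |
CN113627266B (zh) | | Video person re-identification method based on Transformer spatio-temporal modeling |
CN111261223B (zh) | | A deep-learning-based CRISPR off-target effect prediction method |
CN113257248B (zh) | | A hybrid streaming and non-streaming speech recognition system and streaming speech recognition method |
CN113378973B (zh) | | An image classification method based on the self-attention mechanism |
CN115101085A (zh) | | A multi-speaker time-domain speech separation method using convolution-augmented external attention |
CN116258989A (zh) | | Text- and vision-based spatio-temporally correlated multimodal emotion recognition method and system |
CN113656569A (zh) | | A generative dialogue method based on contextual information reasoning |
CN113488029A (zh) | | Parameter-sharing-based non-autoregressive speech recognition training and decoding method and system |
CN114238652A (zh) | | A method for building an industrial fault knowledge graph for end-to-end scenarios |
Yook et al. | | Voice conversion using conditional CycleGAN |
Papadimitriou et al. | | End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition. |
CN116863920B (zh) | | Speech recognition method, apparatus, device, and medium based on a dual-stream self-supervised network |
CN117059081B (zh) | | A lightweight speech recognition method, computer device, and readable storage medium |
CN116994573A (zh) | | An end-to-end speech recognition method and system based on spiking neural networks |
Narayanan et al. | | Hierarchical sequence to sequence voice conversion with limited data |
CN113946670B (zh) | | A contrastive context-understanding enhancement method for conversational emotion recognition |
Tanaka et al. | | End-to-end rich transcription-style automatic speech recognition with semi-supervised learning |
CN115115667A (zh) | | An accurate object tracking method based on a target transformation regression network |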
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Ding Yi, Li Le, Wang Hao, Hong Xingjian, Leng Dong, Li Shangran, Wei Guangyong, Duan Zhikui, Huang Hailiang, Bai Jian, Zhang Hailin, Lu Heping, Li Changjie, Chen Huanran. Inventors before: Huang Hailiang, Li Le, Wang Hao, Hong Xingjian, Leng Dong, Ding Yi, Wei Guangyong, Duan Zhikui, Bai Jian, Liang Yingwei, Zhang Hailin, Lu Heping, Li Changjie, Chen Huanran. |
GR01 | Patent grant | |