KR20230039505A - Speech recognition method, encoding and decoding method, apparatus, electronic device, and recording medium - Google Patents

Speech recognition method, encoding and decoding method, apparatus, electronic device, and recording medium

Info

Publication number
KR20230039505A
Authority
KR
South Korea
Prior art keywords
feature
encoding
segment
history
information
Prior art date
Application number
KR1020220060826A
Other languages
English (en)
Korean (ko)
Inventor
시아오인 푸
즈지에 천
밍씬 리앙
밍슌 양
레이 지아
하이펑 왕
Original Assignee
베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-09-13
Filing date
2022-05-18
Publication date
2023-03-21
Application filed by 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드
Publication of KR20230039505A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0018 Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
KR1020220060826A 2021-09-13 2022-05-18 Speech recognition method, encoding and decoding method, apparatus, electronic device, and recording medium KR20230039505A (ko)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111069754.9 2021-09-13
CN202111069754.9A CN113889076B (zh) 2021-09-13 2021-09-13 Speech recognition and encoding/decoding method, apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
KR20230039505A (ko) 2023-03-21

Family

ID=79009223

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020220060826A KR20230039505A (ko) 2021-09-13 2022-05-18 Speech recognition method, encoding and decoding method, apparatus, electronic device, and recording medium

Country Status (5)

Country Link
US (1) US20230090590A1 (zh)
EP (1) EP4148727A1 (zh)
JP (1) JP7302132B2 (zh)
KR (1) KR20230039505A (zh)
CN (1) CN113889076B (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
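CN115116454A (zh) * 2022-06-15 2022-09-27 腾讯科技(深圳)有限公司 Audio encoding method, apparatus, device, storage medium, and program product
CN115223573A (zh) * 2022-07-15 2022-10-21 北京百度网讯科技有限公司 Voice wake-up method, apparatus, electronic device, and storage medium
CN115132210B (zh) * 2022-09-02 2022-11-18 北京百度网讯科技有限公司 Audio recognition method, and training method, apparatus, and device for audio recognition model
CN116741151B (zh) * 2023-08-14 2023-11-07 成都筑猎科技有限公司 Call-center-based real-time user call monitoring system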

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049668B2 (en) * 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
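CN111429889B (zh) * 2019-01-08 2023-04-28 百度在线网络技术(北京)有限公司 Method, apparatus, device, and computer-readable storage medium for real-time speech recognition based on truncated attention
CN112242144A (zh) * 2019-07-17 2021-01-19 百度在线网络技术(北京)有限公司 Speech recognition decoding method, apparatus, device, and computer-readable storage medium based on streaming attention model
CN111627418B (zh) * 2020-05-27 2023-01-31 携程计算机技术(上海)有限公司 Training method for speech synthesis model, synthesis method, system, device, and medium
CN112331185B (zh) * 2020-11-10 2023-08-11 珠海格力电器股份有限公司 Voice interaction method, system, storage medium, and electronic device
CN112382278B (zh) * 2020-11-18 2021-08-17 北京百度网讯科技有限公司 Method and apparatus for displaying streaming speech recognition results, electronic device, and storage medium
CN112530437B (zh) * 2020-11-18 2023-10-20 北京百度网讯科技有限公司 Semantic recognition method, apparatus, device, and storage medium
CN112735428A (zh) * 2020-12-27 2021-04-30 科大讯飞(上海)科技有限公司 Hot word acquisition method, speech recognition method, and related device
CN112908305B (zh) * 2021-01-30 2023-03-21 云知声智能科技股份有限公司 Method and device for improving speech recognition accuracy
CN113362812B (zh) * 2021-06-30 2024-02-13 北京搜狗科技发展有限公司 Speech recognition method, apparatus, and electronic device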

Also Published As

Publication number Publication date
JP7302132B2 (ja) 2023-07-04
CN113889076A (zh) 2022-01-04
JP2023041610A (ja) 2023-03-24
US20230090590A1 (en) 2023-03-23
EP4148727A1 (en) 2023-03-15
CN113889076B (zh) 2022-11-01

Similar Documents

Publication Publication Date Title
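KR20230039505A (ko) Speech recognition method, encoding and decoding method, apparatus, electronic device, and recording medium
JP7417759B2 (ja) Method, apparatus, electronic device, storage medium, and computer program for training video recognition model
CN113590858B (zh) Method and apparatus for generating target object, electronic device, and storage medium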
US20220108684A1 (en) Method of recognizing speech offline, electronic device, and storage medium
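CN113674732B (zh) Speech confidence detection method, apparatus, electronic device, and storage medium
CN114445831A (zh) Image-text pre-training method, apparatus, device, and storage medium
CN112784897A (zh) Image processing method, apparatus, device, and storage medium
CN114724168A (zh) Training method for deep learning model, text recognition method, apparatus, and device
CN113689868B (zh) Training method and apparatus for voice conversion model, electronic device, and medium
CN112861548A (zh) Natural language generation and model training method, apparatus, device, and storage medium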
US20230410794A1 (en) Audio recognition method, method of training audio recognition model, and electronic device
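CN114937478B (zh) Method for training model, and method and apparatus for generating molecules
CN113889087B (zh) Speech recognition and model building method, apparatus, device, and storage medium
CN113689866A (zh) Training method and apparatus for voice conversion model, electronic device, and medium
CN114783428A (zh) Speech translation and model training method, apparatus, device, and storage medium
CN113553413A (zh) Dialogue state generation method, apparatus, electronic device, and storage medium
CN113361574A (zh) Training method and apparatus for data processing model, electronic device, and storage medium
CN113129869A (zh) Method and apparatus for training speech recognition model and for speech recognition
CN113689867B (zh) Training method and apparatus for voice conversion model, electronic device, and medium
CN113408298B (zh) Semantic parsing method, apparatus, electronic device, and storage medium
CN115034198B (zh) Method for optimizing computation of embedding module in language model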
US20230081957A1 (en) Motion search method and apparatus, electronic device and storage medium
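CN115168553A (zh) Dialogue sentence completion and model training method, apparatus, device, and storage medium
CN112862909A (zh) Data processing method, apparatus, device, and storage medium
CN115064148A (zh) Method and apparatus for speech synthesis and for training speech synthesis model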