ES3053473T3 - Vocoder techniques - Google Patents
Vocoder techniques
- Publication number
- ES3053473T3
- Authority
- ES
- Spain
- Prior art keywords
- data
- given frame
- bitstream
- audio signal
- output
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Electrically Operated Instructional Devices (AREA)
- Stereophonic System (AREA)
Abstract
An audio generator (10) is described, configured to generate an audio signal (16) from a bitstream (3), the bitstream (3) representing the audio signal (16), which is subdivided into a sequence of frames. The audio generator (10) comprises: a first data provisioner (702) configured to provide, for a given frame, first data (15) derived from an input signal (14); and a first processing block (40, 50, 50a-50h) configured, for the given frame, to receive the first data (15) and to output first output data (69) in the given frame. The first processing block (50) comprises: at least one learnable preconditioning layer (710) configured to receive the bitstream (3), or a processed version (112) thereof, and to output, for the given frame, target data (12) representing the audio signal (16) in the given frame; at least one learnable conditioning layer (71, 72, 73) configured, for the given frame, to process the target data (12) to obtain conditioning feature parameters (74, 75) for the given frame; and a styling element (77) configured to apply the conditioning feature parameters (74, 75) to the first data (15, 59a) or to normalized first data (59, 76'). The at least one learnable preconditioning layer (710) includes at least one recurrent learnable layer. (Machine translation by Google Translate, without legal value)
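The abstract describes a per-frame data flow: a recurrent learnable preconditioning layer turns the bitstream into target data, learnable conditioning layers turn the target data into conditioning feature parameters, and a styling element applies those parameters to the (optionally normalized) first data. The NumPy sketch below illustrates only that flow and rests on assumptions that do not come from the patent: each bitstream frame is assumed to be already decoded into a numeric feature vector; an Elman-style tanh RNN stands in for the recurrent learnable layer; plain matrix products stand in for the conditioning layers (71, 72, 73); the two conditioning feature parameters (74, 75) are treated as a multiplicative scale `gamma` and an additive shift `beta`; the first data (15) are Gaussian noise; normalization is per channel over the frame's samples; and all weights are random rather than trained. Every function name and shape here is hypothetical.

```python
import numpy as np


def recurrent_preconditioning(bitstream_frames, w_in, w_rec, b):
    """Hypothetical stand-in for the recurrent preconditioning layer (710).

    Elman-style RNN: the hidden state is carried from frame to frame, and
    each frame's hidden state is used as that frame's target data (12).
    """
    h = np.zeros(w_rec.shape[0])
    targets = []
    for x in bitstream_frames:  # frames are processed in temporal order
        h = np.tanh(w_in @ x + w_rec @ h + b)
        targets.append(h)
    return np.stack(targets)  # shape: (n_frames, hidden)


def conditioning_layers(target, w_gamma, w_beta, n_samples):
    """Hypothetical stand-in for the conditioning layers (71, 72, 73).

    Repeats one frame's target data over the frame's samples, then applies
    two linear maps to get a scale (gamma) and a shift (beta) per channel
    and per sample.
    """
    t = np.repeat(target[:, None], n_samples, axis=1)  # (hidden, n_samples)
    return w_gamma @ t, w_beta @ t  # each: (channels, n_samples)


def styling_element(first_data, gamma, beta, eps=1e-5):
    """Hypothetical stand-in for the styling element (77).

    Normalizes each channel of the first data over the frame's samples,
    then applies the scale and shift.
    """
    mean = first_data.mean(axis=1, keepdims=True)
    std = first_data.std(axis=1, keepdims=True)
    normalized = (first_data - mean) / (std + eps)
    return gamma * normalized + beta


def generate_frame_outputs(bitstream_frames, channels=4, n_samples=8, hidden=6, seed=0):
    """Run the sketch over all frames with random (untrained) weights."""
    bitstream_frames = np.asarray(bitstream_frames, dtype=float)
    rng = np.random.default_rng(seed)
    n_features = bitstream_frames.shape[1]
    w_in = rng.normal(scale=0.1, size=(hidden, n_features))
    w_rec = rng.normal(scale=0.1, size=(hidden, hidden))
    b = np.zeros(hidden)
    w_gamma = rng.normal(size=(channels, hidden))
    w_beta = rng.normal(size=(channels, hidden))

    targets = recurrent_preconditioning(bitstream_frames, w_in, w_rec, b)
    outputs = []
    for target in targets:
        # First data (15): Gaussian noise in this sketch (an assumption).
        first_data = rng.normal(size=(channels, n_samples))
        gamma, beta = conditioning_layers(target, w_gamma, w_beta, n_samples)
        outputs.append(styling_element(first_data, gamma, beta))
    return np.stack(outputs)  # (n_frames, channels, n_samples)
```

Calling `generate_frame_outputs(np.ones((3, 5)))` returns an array of shape `(3, 4, 8)`, one `(channels, samples)` block per frame. Because the hidden state is carried across frames, the target data for a frame depend on earlier frames too, which is what a recurrent preconditioning layer adds over a purely frame-local one.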
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22163062 | 2022-03-18 | ||
| EP22182048 | 2022-06-29 | ||
| PCT/EP2023/057107 WO2023175197A1 (en) | 2022-03-18 | 2023-03-20 | Vocoder techniques |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| ES3053473T3 (en) | 2026-01-22 |
Family ID: 85726420
Family Applications (2)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| ES23713351T | Active | ES3053473T3 (en) | 2022-03-18 | 2023-03-20 | Vocoder techniques |
| ES23712886T | Active | ES3053472T3 (en) | 2022-03-18 | 2023-03-20 | Vocoder techniques |
Family Applications After (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| ES23712886T | Active | ES3053472T3 (en) | 2022-03-18 | 2023-03-20 | Vocoder techniques |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US20250087223A1 (es) |
| EP (5) | EP4494137B1 (es) |
| CN (2) | CN119096296A (es) |
| ES (2) | ES3053473T3 (es) |
| PL (2) | PL4494137T3 (es) |
| WO (2) | WO2023175197A1 (es) |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022081678A1 (en) * | 2020-10-15 | 2022-04-21 | Dolby Laboratories Licensing Corporation | Frame-level permutation invariant training for source separation |
| US20240005945A1 (en) * | 2022-06-29 | 2024-01-04 | Aondevices, Inc. | Discriminating between direct and machine generated human voices |
| US20250095664A1 (en) * | 2023-09-14 | 2025-03-20 | Robert Bosch Gmbh | Systems and methods of processing audio data with a multi-rate learnable audio frontend |
| CN117153196B (zh) * | 2023-10-30 | 2024-02-09 | 深圳鼎信通达股份有限公司 | PCM speech signal processing method, apparatus, device and medium |
| EP4600951A1 (en) * | 2024-02-06 | 2025-08-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Disentangled audio coding and decoding with style control |
| WO2025201625A1 (en) * | 2024-03-25 | 2025-10-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder and decoder |
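| WO2026073499A1 (zh) * | 2024-10-01 | 2026-04-09 | 华为技术有限公司 | Signal processing method and related apparatus |
| CN119851680A (zh) * | 2025-01-02 | 2025-04-18 | 河北工业大学 | Lightweight speech enhancement method based on a dual-path one-dimensional convolutional grouped recurrent network |
| CN120783775B (zh) * | 2025-09-08 | 2025-12-09 | 科大讯飞股份有限公司 | Audio encoding and decoding method, electronic device and program product |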
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7167335B2 (ja) * | 2018-10-29 | 2022-11-08 | ドルビー・インターナショナル・アーベー | Method and apparatus for rate-quality scalable coding using generative models |
| CN117546237A (zh) * | 2021-04-27 | 2024-02-09 | 弗劳恩霍夫应用研究促进协会 | Decoder |
2023
- 2023-03-20 WO PCT/EP2023/057107: WO2023175197A1 (Ceased)
- 2023-03-20 CN CN202380036574.1A: CN119096296A (Pending)
- 2023-03-20 EP EP23713351.7A: EP4494137B1 (Active)
- 2023-03-20 WO PCT/EP2023/057108: WO2023175198A1 (Ceased)
- 2023-03-20 EP EP25208403.3A: EP4682878A3 (Pending)
- 2023-03-20 PL PL23713351.7T: PL4494137T3 (Unknown)
- 2023-03-20 ES ES23713351T: ES3053473T3 (Active)
- 2023-03-20 PL PL23712886.3T: PL4494136T3 (Unknown)
- 2023-03-20 ES ES23712886T: ES3053472T3 (Active)
- 2023-03-20 EP EP23712886.3A: EP4494136B1 (Active)
- 2023-03-20 EP EP24223510.9A: EP4510131B1 (Active)
- 2023-03-20 CN CN202380036584.5A: CN119698656A (Pending)
- 2023-03-20 EP EP25208428.0A: EP4700772A3 (Pending)

2024
- 2024-09-18 US US18/888,957: US20250087223A1 (Pending)
- 2024-09-18 US US18/889,102: US20250014584A1 (Pending)
Also Published As
| Publication number | Publication date |
|---|---|
| US20250087223A1 (en) | 2025-03-13 |
| PL4494137T3 (pl) | 2026-03-23 |
| EP4700772A3 (en) | 2026-03-18 |
| EP4494136A1 (en) | 2025-01-22 |
| EP4682878A2 (en) | 2026-01-21 |
| CN119096296A (zh) | 2024-12-06 |
| EP4682878A3 (en) | 2026-03-04 |
| EP4510131A2 (en) | 2025-02-19 |
| EP4494137A1 (en) | 2025-01-22 |
| EP4494136C0 (en) | 2025-10-15 |
| US20250014584A1 (en) | 2025-01-09 |
| EP4510131B1 (en) | 2026-04-22 |
| EP4494136B1 (en) | 2025-10-15 |
| EP4700772A2 (en) | 2026-02-25 |
| CN119698656A (zh) | 2025-03-25 |
| EP4494137C0 (en) | 2025-10-15 |
| ES3053472T3 (en) | 2026-01-22 |
| EP4510131A3 (en) | 2025-03-19 |
| WO2023175197A1 (en) | 2023-09-21 |
| WO2023175198A1 (en) | 2023-09-21 |
| PL4494136T3 (pl) | 2026-03-23 |
| EP4494137B1 (en) | 2025-10-15 |
Similar Documents
| Publication | Title |
|---|---|
| ES3053473T3 (en) | Vocoder techniques |
| MX2023004330A (es) | Audio generator and methods for generating an audio signal and training an audio generator |
| JP7701490B2 (ja) | Text-to-speech synthesis in a target speaker's voice using neural networks |
| BR112023018522A2 (pt) | Context-based speech enhancement |
| BR112023022466A2 (pt) | Decoder, methods and non-transitory storage unit |
| Gaido et al. | End-to-end speech-translation with knowledge distillation: FBK@IWSLT2020 |
| US9342509B2 (en) | Speech translation method and apparatus utilizing prosodic information |
| US20190172443A1 (en) | System and method for generating expressive prosody for speech synthesis |
| BR112023013902A2 (pt) | Synthesized speech generation |
| WO2008038082A3 (en) | Prosody conversion |
| BR9711448A (pt) | Process for characterizing microorganisms, apparatus for screening microorganisms, and microorganism database |
| CN112365879A (zh) | Speech synthesis method and apparatus, electronic device and storage medium |
| BR112018007547A2 (pt) | Screen-related adaptation of higher-order ambisonic (HOA) content |
| Basak et al. | End-to-end lyrics recognition with voice to singing style transfer |
| CL2021000836A1 (es) | Systems and methods for interpreting high-energy interactions |
| Huu et al. | Mispronunciation detection and diagnosis model for tonal language, applied to Vietnamese |
| Gaido et al. | On knowledge distillation for direct speech translation |
| US11074926B1 (en) | Trending and context fatigue compensation in a voice signal |
| Garassino et al. | Vowel length in Intemelian Ligurian. An experimental and cross-dialectal investigation |
| Leite et al. | A corpus of neutral voice speech in Brazilian Portuguese |
| WO2025015219A3 (en) | External key authentication and regeneration |
| JPWO2022159256A5 (es) | |
| 鄭明中 | Voice onset time of syllable-initial stops in Sixian Hakka: Isolated syllables |
| CN108628841A (zh) | App for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm |
| Shang et al. | The HW-TSC’s Simultaneous Speech-to-Speech Translation system for IWSLT 2023 evaluation |