MX2023004329A - Audio generator and methods for generating an audio signal and training an audio generator - Google Patents

Audio generator and methods for generating an audio signal and training an audio generator

Info

Publication number
MX2023004329A
Authority
MX
Mexico
Prior art keywords
data
audio
generator
audio generator
training
Prior art date
Application number
MX2023004329A
Other languages
English (en)
Inventor
Guillaume Fuchs
Markus Multrus
Jan Büthe
Srikanth Korse
Ahmed Mustafa Mahmoud Ahmed
Nicola Pia
Kishan Gupta
Original Assignee
Fraunhofer Ges Forschung
Priority date
2020-10-15
Filing date
2021-10-13
Publication date
2023-06-13
Application filed by Fraunhofer Ges Forschung
Publication of MX2023004329A


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/048 Activation functions
              • G06N3/08 Learning methods
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00 Speech synthesis; Text to speech systems
            • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
              • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
                • G10L13/047 Architecture of speech synthesisers
            • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
          • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
            • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
              • G10L19/16 Vocoder architecture
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
              • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Techniques are described for generating an audio signal and training an audio generator. An audio generator (10) can generate an audio signal (16) from an input signal (14) and target data (12) representing the audio signal (16). The target data (12) are derived from text. The audio generator comprises: a first processing block (40, 50, 50a-50h), which receives first data (15, 59a) derived from the input signal (14) and outputs first output data (69); and a second processing block (45), which receives, as second data, the first output data (69) or data derived from the first output data (69). The first processing block (50) comprises: a conditioning set of learnable layers (71, 72, 73) configured to process the target data (12) to obtain conditioning feature parameters (74, 75); and a styling element (77) configured to apply the conditioning feature parameters (74, 75) to the first data (15, 59a) or to normalized first data (59, 76').
MX2023004329A 2020-10-15 2021-10-13 Audio generator and methods for generating an audio signal and training an audio generator
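The abstract does not say how the styling element (77) applies the conditioning feature parameters (74, 75). The Python sketch below assumes a FiLM-style feature-wise affine modulation: the conditioning layers produce a scale γ (standing in for 74) and a shift β (standing in for 75) from the target data, and the styling element computes γ · x̂ + β on per-channel normalized first data x̂. The function names, the use of 1x1 convolutions as the conditioning layers, and the γ/β mapping are illustrative assumptions, not taken from the patent; the sketch also assumes the target data have already been brought to the same time resolution as the first data.

```python
import math
from typing import List, Tuple

# Activations are stored channel-major: one list of samples per channel.
Matrix = List[List[float]]


def normalize(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each channel over time to zero mean and (almost) unit variance."""
    out: Matrix = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def pointwise_conv(x: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """1x1 convolution: each output channel is a weighted sum of the input
    channels at the same time step, plus a per-channel bias."""
    n_time = len(x[0])
    return [
        [sum(w_c * x_c[t] for w_c, x_c in zip(w, x)) + b for t in range(n_time)]
        for w, b in zip(weights, bias)
    ]


def conditioning_layers(
    target: Matrix,
    w_gamma: Matrix, b_gamma: List[float],
    w_beta: Matrix, b_beta: List[float],
) -> Tuple[Matrix, Matrix]:
    """Hypothetical stand-in for the learnable conditioning layers (71, 72, 73):
    maps target data to a scale (gamma) and a shift (beta) per channel and time step."""
    gamma = pointwise_conv(target, w_gamma, b_gamma)
    beta = pointwise_conv(target, w_beta, b_beta)
    return gamma, beta


def styling_element(first_data: Matrix, gamma: Matrix, beta: Matrix) -> Matrix:
    """Hypothetical styling element (77): normalize the first data, then apply
    the conditioning parameters element-wise as gamma * x_norm + beta."""
    x_norm = normalize(first_data)
    return [
        [g * v + b for v, g, b in zip(x_row, g_row, b_row)]
        for x_row, g_row, b_row in zip(x_norm, gamma, beta)
    ]
```

For example, with a single channel of first data `[[1.0, 2.0, 3.0, 4.0]]`, a constant target `[[1.0, 1.0, 1.0, 1.0]]`, `w_gamma=[[2.0]]`, `b_gamma=[0.0]`, `w_beta=[[0.0]]` and `b_beta=[0.5]`, the styled output has mean 0.5 and a standard deviation of about 2: after normalization, the conditioning derived from the target data, not the statistics of the input, sets the scale and offset of the features.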

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP20202058 2020-10-15
PCT/EP2021/072075 WO2022078651A1 (en) 2020-10-15 2021-08-06 Audio generator and methods for generating an audio signal and training an audio generator
PCT/EP2021/078371 WO2022079129A1 (en) 2020-10-15 2021-10-13 Audio generator and methods for generating an audio signal and training an audio generator

Publications (1)

Publication Number Publication Date
MX2023004329A (es) 2023-06-13

Family

ID=72964439

Family Applications (2)

Application Number Title Priority Date Filing Date
MX2023004329A MX2023004329A (es) 2020-10-15 2021-10-13 Audio generator and methods for generating an audio signal and training an audio generator
MX2023004330A MX2023004330A (es) 2020-10-15 2021-10-13 Audio generator and methods for generating an audio signal and training an audio generator

Family Applications After (1)

Application Number Title Priority Date Filing Date
MX2023004330A MX2023004330A (es) 2020-10-15 2021-10-13 Audio generator and methods for generating an audio signal and training an audio generator

Country Status (8)

Country Link
US (2) US20230282202A1 (es)
EP (2) EP4229623A1 (es)
JP (2) JP2023546098A (es)
KR (2) KR20230109630A (es)
CN (2) CN116686042A (es)
CA (2) CA3195578A1 (es)
MX (2) MX2023004329A (es)
WO (4) WO2022078634A1 (es)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
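CN116092503B (zh) * 2023-04-06 2023-06-20 华侨大学 Fake speech detection method, apparatus, device and medium combining the time and frequency domains
CN116403562B (zh) * 2023-04-11 2023-12-05 广州九四智能科技有限公司 Speech synthesis method and system that automatically predicts pauses based on semantic information
CN117292672B (zh) * 2023-11-27 2024-01-30 厦门大学 High-quality speech synthesis method based on a rectified flow model
CN117592384B (zh) * 2024-01-19 2024-05-03 广州市车厘子电子科技有限公司 Active sound generation method based on a generative adversarial network
CN117809621B (zh) * 2024-02-29 2024-06-11 暗物智能科技(广州)有限公司 Speech synthesis method and apparatus, electronic device, and storage medium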

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US10068557B1 (en) * 2017-08-23 2018-09-04 Google Llc Generating music with deep neural networks
KR20230043250A (ko) * 2018-05-17 2023-03-30 구글 엘엘씨 Synthesizing speech from text in the voice of a target speaker using neural networks
CN110060690B (zh) * 2019-04-04 2023-03-24 南京邮电大学 Many-to-many speaker conversion method based on STARGAN and ResNet

Also Published As

Publication number Publication date
JP2023546099A (ja) 2023-11-01
EP4229624A1 (en) 2023-08-23
WO2022078651A1 (en) 2022-04-21
KR20230109631A (ko) 2023-07-20
CA3195582A1 (en) 2022-04-21
MX2023004330A (es) 2023-06-13
WO2022079129A1 (en) 2022-04-21
US20230317056A1 (en) 2023-10-05
CN116648742A (zh) 2023-08-25
CN116686042A (zh) 2023-09-01
KR20230109630A (ko) 2023-07-20
EP4229623A1 (en) 2023-08-23
CA3195578A1 (en) 2022-04-21
WO2022079130A1 (en) 2022-04-21
US20230282202A1 (en) 2023-09-07
JP2023546098A (ja) 2023-11-01
WO2022078634A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
MX2023004329A (es) Audio generator and methods for generating an audio signal and training an audio generator
US11688386B2 (en) Wearable vibrotactile speech aid
US20210055796A1 (en) Tactile audio enhancement
EP3822814A3 (en) Human-machine interaction method and apparatus based on neural network
Neekhara et al. Expediting TTS synthesis with adversarial vocoding
WO2021085506A1 (ja) Vibration control device, vibration control program, and vibration control method
WO2021077086A8 (en) Synthesizing audio of a venue
AU2019394097A8 (en) Apparatus, method and computer program for encoding, decoding, scene processing and other procedures related to DirAC based spatial audio coding using diffuse compensation
EP3797415B1 (en) Sound processing apparatus and method for sound enhancement
WO2022110259A1 (zh) Vibration generation method, vibration control method, and related devices
WO2021219798A3 (en) Method, apparatus and system for enhancing multi-channel audio in a dynamic range reduced domain
Comunità et al. Modelling black-box audio effects with time-varying feature modulation
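CN117456989A (zh) Privacy-preserving speech classification method and system based on fully homomorphic encryption
CN206894872U (zh) Ultrasonic directional-emission parametric array with an integrated microphone receiving array
TW201325268A (zh) Virtual reality sound source localization device
JP2020166299A5 (ja) Speech synthesis method, speech synthesis system, and program
CN113241054B (zh) Speech smoothing model generation method, speech smoothing method, and apparatus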
WO2023063880A3 (en) System and method for training a transformer-in-transformer-based neural network model for audio data
US20220375485A1 (en) Signal processing apparatus, signal processing method, and program
US11601768B2 (en) Method of generating sounds for reducing an effect of tinnitus and tinnitus control instrument performing the same
JP2007033804A (ja) Sound source separation device, sound source separation program, and sound source separation method
Fornari et al. Creating soundscapes using evolutionary spatial control
US20240046949A1 (en) Real-time audio processing system, real-time audio processing program, and method for training speech analysis model
US20230147412A1 (en) Systems and methods for authoring immersive haptic experience using spectral centroid
GB0212147D0 (en) Speech processing apparatus and method