BR112023013902A2 - Synthesized speech generation - Google Patents

Synthesized speech generation

Info

Publication number
BR112023013902A2
Authority
BR
Brazil
Prior art keywords
speech
synthesized speech
speech generation
control parameters
processors
Application number
BR112023013902A
Inventor
Erik Visser
Kyungguen Byun
Lae-Hoon Kim
Shuhua Zhang
Sunkuk Moon
Vahid Montazeri
Original Assignee
Qualcomm Inc
Application filed by Qualcomm Inc
Publication of BR112023013902A2

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Psychiatry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Synthesized speech generation. The present invention relates to a device for speech generation that includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
Application BR112023013902A, priority date 2021-01-21, filing date 2021-12-08: Synthesized speech generation, published as BR112023013902A2 (pt)
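
As a rough illustration of the data flow the abstract describes, the following Python sketch passes one input representation through two toy encoders, each conditioned on a control parameter, and merges their outputs frame by frame. Everything in it is an assumption made for illustration: the names `ControlParameters`, `pitch_encoder`, `energy_encoder`, and `multi_encode`, the choice of pitch and energy as target characteristics, and the scaling arithmetic are hypothetical and are not taken from the patent record, which does not describe an implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ControlParameters:
    """Hypothetical target speech characteristics (field names are assumptions)."""
    pitch_scale: float = 1.0
    energy_scale: float = 1.0


def pitch_encoder(frames: List[float], params: ControlParameters) -> List[float]:
    # Toy stand-in for one encoder: scales each frame by the pitch parameter.
    return [frame * params.pitch_scale for frame in frames]


def energy_encoder(frames: List[float], params: ControlParameters) -> List[float]:
    # Toy stand-in for a second encoder: scales each frame by the energy parameter.
    return [frame * params.energy_scale for frame in frames]


def multi_encode(
    frames: List[float], params: ControlParameters
) -> List[Tuple[float, float]]:
    """Run both toy encoders on the same input and pair their per-frame outputs.

    The pairing stands in for "encoded data" that a later stage could turn
    into an audio signal; that later stage is not modeled here.
    """
    pitch_codes = pitch_encoder(frames, params)
    energy_codes = energy_encoder(frames, params)
    return list(zip(pitch_codes, energy_codes))


if __name__ == "__main__":
    # Illustrative call with made-up frame values and control parameters.
    params = ControlParameters(pitch_scale=2.0, energy_scale=0.5)
    print(multi_encode([1.0, 2.0], params))
```

The sketch keeps the control parameters in a single immutable object so that every encoder sees the same target characteristics; a learned system would replace the scaling functions with trained networks.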

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/154,372 US11676571B2 (en) 2021-01-21 2021-01-21 Synthesized speech generation
PCT/US2021/072800 WO2022159256A1 (en) 2021-01-21 2021-12-08 Synthesized speech generation

Publications (1)

Publication Number Publication Date
BR112023013902A2 (pt) 2024-01-23

Family

ID=79092855

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
BR112023013902A BR112023013902A2 (pt) 2021-01-21 2021-12-08 Synthesized speech generation

Country Status (8)

Country Link
US (1) US11676571B2 (pt)
EP (1) EP4281964A1 (pt)
JP (1) JP2024504316A (pt)
KR (1) KR20230133295A (pt)
CN (1) CN116711002A (pt)
BR (1) BR112023013902A2 (pt)
TW (1) TW202230328A (pt)
WO (1) WO2022159256A1 (pt)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11676571B2 (en) * 2021-01-21 2023-06-13 Qualcomm Incorporated Synthesized speech generation
US20230018384A1 (en) * 2021-07-14 2023-01-19 Google Llc Two-Level Text-To-Speech Systems Using Synthetic Training Data
US12002455B2 (en) * 2021-07-22 2024-06-04 Qualcomm Incorporated Semantically-augmented context representation generation
US20230037541A1 (en) * 2021-07-29 2023-02-09 Xinapse Co., Ltd. Method and system for synthesizing speeches by scoring speeches
CN115394284B (zh) * 2022-08-23 2024-09-24 平安科技(深圳)有限公司 Speech synthesis method, system, device and storage medium
US20240087597A1 (en) * 2022-09-13 2024-03-14 Qualcomm Incorporated Source speech modification based on an input speech characteristic
US20240304174A1 (en) * 2023-03-08 2024-09-12 Sony Interactive Entertainment Inc. Ambient Noise Capture for Speech Synthesis of In-Game Character Voices

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
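CN105390141B (zh) * 2015-10-14 2019-10-18 科大讯飞股份有限公司 Voice conversion method and device
KR102072162B1 (ko) * 2018-01-05 2020-01-31 서울대학교산학협력단 Artificial intelligence based foreign language speech synthesis method and apparatus
JP7178028B2 (ja) 2018-01-11 2022-11-25 ネオサピエンス株式会社 Speech translation method and system using a multilingual text-to-speech synthesis model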
US11138392B2 (en) 2018-07-26 2021-10-05 Google Llc Machine translation using neural network models
WO2020027619A1 (ko) * 2018-08-02 2020-02-06 네오사피엔스 주식회사 Text-to-speech synthesis method, apparatus, and computer-readable storage medium using machine learning based on sequential prosody features
US10741169B1 (en) * 2018-09-25 2020-08-11 Amazon Technologies, Inc. Text-to-speech (TTS) processing
KR102663669B1 (ko) * 2019-11-01 2024-05-08 엘지전자 주식회사 Speech synthesis in a noisy environment
CN111862953B (zh) * 2019-12-05 2023-08-22 北京嘀嘀无限科技发展有限公司 Training method for a speech recognition model, speech recognition method and device
US11335324B2 (en) * 2020-08-31 2022-05-17 Google Llc Synthesized data augmentation using voice conversion and speech recognition models
EP3989217B1 (en) * 2020-10-22 2023-09-27 Thomson Licensing Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
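CN112562728B (zh) * 2020-11-13 2024-06-18 百果园技术(新加坡)有限公司 Generative adversarial network training method, audio style transfer method and device
CN112466316A (zh) * 2020-12-10 2021-03-09 青海民族大学 Zero-shot voice conversion system based on a generative adversarial network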
US11676571B2 (en) * 2021-01-21 2023-06-13 Qualcomm Incorporated Synthesized speech generation

Also Published As

Publication number Publication date
KR20230133295A (ko) 2023-09-19
EP4281964A1 (en) 2023-11-29
TW202230328A (zh) 2022-08-01
WO2022159256A1 (en) 2022-07-28
US20220230623A1 (en) 2022-07-21
JP2024504316A (ja) 2024-01-31
US11676571B2 (en) 2023-06-13
CN116711002A (zh) 2023-09-05

Similar Documents

Publication Publication Date Title
BR112023013902A2 (pt) Synthesized speech generation
BR112022004040A2 (pt) System for generating simulated animal data and models
BR112021024196A2 (pt) Systems and methods for machine learning of voice attributes
BR112023018522A2 (pt) Context-based speech enhancement
BR112018007547A2 (pt) Screen-related adaptation of higher order ambisonic (HOA) content
BR112018073902A2 (pt) Signaling of virtual reality video in dynamic adaptive streaming over HTTP
SG10201707702YA (en) Collaborative Voice Controlled Devices
BR112014017457A8 (pt) Spatial audio transmission apparatus; spatial audio encoding apparatus; method of generating spatial audio output signals; and spatial audio encoding method
BR112022017511A2 (pt) Protein crosslinking method
WO2018158178A3 (en) Lighting script control
BR112023005820A2 (pt) Validity notification for a machine learning model
BRPI0516971A (pt) System for electronically generating an artificial reverberation waveform from an input waveform, and computer program product
BR112014017001A8 (pt) Multiple coding mode signal classification
BR112019004964A2 (pt) Test case generator built into a data integration workflow editor
BR112015014830A2 (pt) Information processing device and method, and program
BR112023017511A2 (pt) Dynamic classifier based device operation
NO20080634L (no) Script formatting
MX2023004329A (es) Audio generator and methods for generating an audio signal and training an audio generator.
US20160133241A1 (en) Composition engine
BR112023005891A2 (pt) High-level syntax for laser rotation in geometry point cloud compression (G-PCC)
BR112017000274A2 (pt) Remote audiovisual communication system between two or more users, lamp with lights having luminous characteristics, and method of transmitting an emotional signal associated with speech to a user at a remote position relative to the speaker
BR112021025892A2 (pt) Spoofing detection apparatus, spoofing detection method, and computer-readable storage medium
BR112023022466A2 (pt) Decoder, methods, and non-transitory storage unit
US20160189694A1 (en) Systems and methods for generating presentation system page commands
BR112022026158A2 (pt) Apparatus and method for generating a diffuse reverberation signal