BR112023013902A2 - Synthesized speech generation - Google Patents
Synthesized speech generation
- Publication number
- BR112023013902A2
- Authority
- BR
- Brazil
- Prior art keywords
- speech
- synthesized speech
- speech generation
- control parameters
- processors
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/047—Architecture of speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Psychiatry (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Hospice & Palliative Care (AREA)
- Child & Adolescent Psychology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Telephonic Communication Services (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Synthesized speech generation. The present invention relates to a device for speech generation that includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
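The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the class names, the per-branch arithmetic, the control names "emotion" and "rate", and the choice to concatenate branch outputs are all assumptions made only to show how control parameters might condition several encoders that share one input representation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = List[float]  # one frame of the input representation (hypothetical layout)


@dataclass
class ConditionedEncoder:
    """One branch of the toy multi-encoder, tied to a single control parameter."""
    control_name: str  # hypothetical control name, e.g. "emotion"
    weight: float      # stand-in for learned weights (assumption)

    def encode(self, frames: Sequence[Frame], control_value: float) -> List[Frame]:
        # Scale every feature by the branch weight and shift it by the control value.
        return [[self.weight * x + control_value for x in frame] for frame in frames]


class MultiEncoder:
    """Runs every branch on the same input and joins the results per frame."""

    def __init__(self, encoders: Sequence[ConditionedEncoder]) -> None:
        self.encoders = list(encoders)

    def encode(self, frames: Sequence[Frame], controls: Dict[str, float]) -> List[Frame]:
        branch_outputs = [
            enc.encode(frames, controls[enc.control_name]) for enc in self.encoders
        ]
        # Concatenate the branch outputs frame by frame (an assumed combination rule).
        return [
            sum((branch[i] for branch in branch_outputs), [])
            for i in range(len(frames))
        ]


multi = MultiEncoder([ConditionedEncoder("emotion", 0.5), ConditionedEncoder("rate", 2.0)])
encoded = multi.encode([[1.0, 2.0], [3.0, 4.0]], {"emotion": 0.25, "rate": 1.0})
print(encoded)
```

In this sketch each control parameter steers exactly one branch, so changing a single target characteristic alters only that branch's portion of the encoded data. A real system would replace the affine arithmetic with trained networks and pass the encoded data to a decoder that produces the audio signal.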
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/154,372 US11676571B2 (en) | 2021-01-21 | 2021-01-21 | Synthesized speech generation |
PCT/US2021/072800 WO2022159256A1 (en) | 2021-01-21 | 2021-12-08 | Synthesized speech generation |
Publications (1)
Publication Number | Publication Date |
---|---|
BR112023013902A2 (pt) | 2024-01-23 |
Family
ID=79092855
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112023013902A BR112023013902A2 (pt) | 2021-01-21 | 2021-12-08 | Synthesized speech generation |
Country Status (8)
Country | Link |
---|---|
US (1) | US11676571B2 (pt) |
EP (1) | EP4281964A1 (pt) |
JP (1) | JP2024504316A (pt) |
KR (1) | KR20230133295A (pt) |
CN (1) | CN116711002A (pt) |
BR (1) | BR112023013902A2 (pt) |
TW (1) | TW202230328A (pt) |
WO (1) | WO2022159256A1 (pt) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11676571B2 (en) * | 2021-01-21 | 2023-06-13 | Qualcomm Incorporated | Synthesized speech generation |
US20230018384A1 (en) * | 2021-07-14 | 2023-01-19 | Google Llc | Two-Level Text-To-Speech Systems Using Synthetic Training Data |
US12002455B2 (en) * | 2021-07-22 | 2024-06-04 | Qualcomm Incorporated | Semantically-augmented context representation generation |
US20230037541A1 (en) * | 2021-07-29 | 2023-02-09 | Xinapse Co., Ltd. | Method and system for synthesizing speeches by scoring speeches |
CN115394284B (zh) * | 2022-08-23 | 2024-09-24 | 平安科技(深圳)有限公司 | Speech synthesis method, system, device, and storage medium |
US20240087597A1 (en) * | 2022-09-13 | 2024-03-14 | Qualcomm Incorporated | Source speech modification based on an input speech characteristic |
US20240304174A1 (en) * | 2023-03-08 | 2024-09-12 | Sony Interactive Entertainment Inc. | Ambient Noise Capture for Speech Synthesis of In-Game Character Voices |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
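CN105390141B (zh) * | 2015-10-14 | 2019-10-18 | 科大讯飞股份有限公司 | Voice conversion method and apparatus |
KR102072162B1 (ko) * | 2018-01-05 | 2020-01-31 | 서울대학교산학협력단 | Artificial intelligence-based foreign language speech synthesis method and apparatus |
JP7178028B2 (ja) | 2018-01-11 | 2022-11-25 | ネオサピエンス株式会社 | Speech translation method and system using a multilingual text-to-speech synthesis model |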
US11138392B2 (en) | 2018-07-26 | 2021-10-05 | Google Llc | Machine translation using neural network models |
WO2020027619A1 (ko) * | 2018-08-02 | 2020-02-06 | 네오사피엔스 주식회사 | Text-to-speech synthesis method, apparatus, and computer-readable storage medium using machine learning based on sequential prosody features |
US10741169B1 (en) * | 2018-09-25 | 2020-08-11 | Amazon Technologies, Inc. | Text-to-speech (TTS) processing |
KR102663669B1 (ko) * | 2019-11-01 | 2024-05-08 | 엘지전자 주식회사 | Speech synthesis in a noisy environment |
CN111862953B (zh) * | 2019-12-05 | 2023-08-22 | 北京嘀嘀无限科技发展有限公司 | Training method for a speech recognition model, speech recognition method, and apparatus |
US11335324B2 (en) * | 2020-08-31 | 2022-05-17 | Google Llc | Synthesized data augmentation using voice conversion and speech recognition models |
EP3989217B1 (en) * | 2020-10-22 | 2023-09-27 | Thomson Licensing | Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium |
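CN112562728B (zh) * | 2020-11-13 | 2024-06-18 | 百果园技术(新加坡)有限公司 | Generative adversarial network training method, audio style transfer method, and apparatus |
CN112466316A (zh) * | 2020-12-10 | 2021-03-09 | 青海民族大学 | Zero-shot voice conversion system based on a generative adversarial network |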
US11676571B2 (en) * | 2021-01-21 | 2023-06-13 | Qualcomm Incorporated | Synthesized speech generation |
-
2021
- 2021-01-21 US US17/154,372 patent/US11676571B2/en active Active
- 2021-12-08 EP EP21831478.9A patent/EP4281964A1/en active Pending
- 2021-12-08 KR KR1020237024214A patent/KR20230133295A/ko unknown
- 2021-12-08 JP JP2023543425A patent/JP2024504316A/ja active Pending
- 2021-12-08 BR BR112023013902A patent/BR112023013902A2/pt unknown
- 2021-12-08 WO PCT/US2021/072800 patent/WO2022159256A1/en active Application Filing
- 2021-12-08 CN CN202180091481.XA patent/CN116711002A/zh active Pending
- 2021-12-08 TW TW110145937A patent/TW202230328A/zh unknown
Also Published As
Publication number | Publication date |
---|---|
KR20230133295A (ko) | 2023-09-19 |
EP4281964A1 (en) | 2023-11-29 |
TW202230328A (zh) | 2022-08-01 |
WO2022159256A1 (en) | 2022-07-28 |
US20220230623A1 (en) | 2022-07-21 |
JP2024504316A (ja) | 2024-01-31 |
US11676571B2 (en) | 2023-06-13 |
CN116711002A (zh) | 2023-09-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
BR112023013902A2 (pt) | Synthesized speech generation | |
BR112022004040A2 (pt) | System for generating simulated animal data and models | |
BR112021024196A2 (pt) | Systems and methods for machine learning of voice attributes | |
BR112023018522A2 (pt) | Context-based speech enhancement | |
BR112018007547A2 (pt) | Screen-related adaptation of higher-order ambisonic (HOA) content | |
BR112018073902A2 (pt) | Virtual reality video signaling in dynamic adaptive streaming over HTTP | |
SG10201707702YA (en) | Collaborative Voice Controlled Devices | |
BR112014017457A8 (pt) | Spatial audio transmission apparatus; spatial audio encoding apparatus; method of generating spatial audio output signals; and spatial audio encoding method | |
BR112022017511A2 (pt) | Protein crosslinking method | |
WO2018158178A3 (en) | Lighting script control | |
BR112023005820A2 (pt) | Validity notification for a machine learning model | |
BRPI0516971A (pt) | System for electronically generating an artificial reverberation waveform from an input waveform, and computer program product | |
BR112014017001A8 (pt) | Multiple coding mode signal classification | |
BR112019004964A2 (pt) | Test case generator built into data integration workflow editor | |
BR112015014830A2 (pt) | Information processing device and method, and program | |
BR112023017511A2 (pt) | Dynamic classifier-based device operation | |
NO20080634L (no) | Script formatting | |
MX2023004329A (es) | Audio generator and methods for generating an audio signal and training an audio generator | |
US20160133241A1 (en) | Composition engine | |
BR112023005891A2 (pt) | High-level syntax for laser rotation in geometry point cloud compression (G-PCC) | |
BR112017000274A2 (pt) | Remote audiovisual communication system between two or more users, lamp with lights having luminous characteristics, and method of transmitting an emotional signal associated with speech to a user at a position remote from the speaker | |
BR112021025892A2 (pt) | Forgery detection apparatus, forgery detection method, and computer-readable storage medium | |
BR112023022466A2 (pt) | Decoder, methods, and non-transitory storage unit | |
US20160189694A1 (en) | Systems and methods for generating presentation system page commands | |
BR112022026158A2 (pt) | Apparatus and method for generating a diffuse reverberation signal | |