WO2020068401A1 - Audio watermark encoding/decoding - Google Patents

Audio watermark encoding/decoding

Publication number
WO2020068401A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
data
audio data
watermark
determining
Prior art date
Application number
PCT/US2019/050161
Other languages
English (en)
Inventor
Yuan-Yen Tai
Mohamed Mansour
Parind Shah
Original Assignee
Amazon Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-25
Filing date
2019-09-09
Publication date
2020-04-02
Priority claimed from US16/141,489 (US10950249B2)
Priority claimed from US16/141,578 (US10978081B2)
Application filed by Amazon Technologies, Inc.
Publication of WO2020068401A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018: Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a system that can embed audio watermarks in audio data using an eigenvector matrix. The system can detect audio watermarks in audio data despite the effects of reverberation. For example, the system can embed multiple repetitions of an audio watermark before generating output audio using one or more loudspeakers. To detect the audio watermark in audio data generated by a microphone, the system can perform a self-correlation that indicates where the audio watermark is repeated. In some examples, the system can encode the audio watermark using multiple repetitions of a multi-segment eigenvector. Additionally or alternatively, the system can encode the audio watermark using a binary sequence of positive and negative values, which can be used as a shared key for encoding/decoding the audio watermark. The audio watermark can be embedded in output audio data to enable wakeword suppression (e.g., avoiding cross-talk between devices) and/or local signal transmission between devices in proximity to one another.
PCT/US2019/050161 2018-09-25 2019-09-09 Audio watermark encoding/decoding WO2020068401A1 (fr)
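
The repetition and shared-key ideas in the abstract can be illustrated with a small toy sketch. It is not the claimed method: it works in the time domain, ignores the eigenvector matrix and reverberation entirely, and uses an exaggerated watermark strength so the effect is easy to see. The function names, the key `[1, -1, -1, 1]`, the segment length, and the Gaussian stand-in signals are all assumptions made for illustration.

```python
import random


def embed_watermark(host, segment, signs, strength=1.0):
    """Add one sign-modulated copy of `segment` to `host` per entry in `signs`."""
    seg_len = len(segment)
    if len(host) < seg_len * len(signs):
        raise ValueError("host audio is too short for the requested repetitions")
    out = list(host)
    for rep, sign in enumerate(signs):
        base = rep * seg_len
        for i, value in enumerate(segment):
            out[base + i] += strength * sign * value
    return out


def self_correlation_score(audio, seg_len, signs):
    """Mean normalized correlation between all pairs of sign-corrected blocks."""
    # Undo the sign applied to each repetition, so repeated content lines up.
    blocks = [
        [sign * x for x in audio[rep * seg_len:(rep + 1) * seg_len]]
        for rep, sign in enumerate(signs)
    ]
    total, pairs = 0.0, 0
    for a in range(len(blocks)):
        for b in range(a + 1, len(blocks)):
            dot = sum(x * y for x, y in zip(blocks[a], blocks[b]))
            norm_a = sum(x * x for x in blocks[a]) ** 0.5
            norm_b = sum(y * y for y in blocks[b]) ** 0.5
            total += dot / (norm_a * norm_b + 1e-12)
            pairs += 1
    return total / pairs


rng = random.Random(0)
SEG_LEN = 1024
KEY = [1, -1, -1, 1]       # hypothetical shared sign sequence
WRONG_KEY = [1, 1, 1, 1]   # a decoder that does not know the key

# Gaussian noise stands in for both the watermark segment and the host audio.
segment = [rng.gauss(0.0, 1.0) for _ in range(SEG_LEN)]
host = [rng.gauss(0.0, 1.0) for _ in range(SEG_LEN * len(KEY))]
marked = embed_watermark(host, segment, KEY)

marked_score = self_correlation_score(marked, SEG_LEN, KEY)
clean_score = self_correlation_score(host, SEG_LEN, KEY)
wrong_key_score = self_correlation_score(marked, SEG_LEN, WRONG_KEY)
print(marked_score, clean_score, wrong_key_score)
```

With these settings, the score for marked audio decoded with the right key should come out clearly higher than the score for unmarked audio or for marked audio decoded with the wrong key. A practical watermark would be far quieter than the host signal and would need many more repetitions or longer segments to reach a usable detection margin.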

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16/141,578 2018-09-25
US16/141,489 US10950249B2 (en) 2018-09-25 2018-09-25 Audio watermark encoding/decoding
US16/141,578 US10978081B2 (en) 2018-09-25 2018-09-25 Audio watermark encoding/decoding
US16/141,489 2018-09-25

Publications (1)

Publication Number Publication Date
WO2020068401A1 (fr) 2020-04-02

Family

ID=68000137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/050161 WO2020068401A1 (fr) 2018-09-25 2019-09-09 Audio watermark encoding/decoding

Country Status (1)

Country Link
WO (1) WO2020068401A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1542227A1 (fr) * 2003-12-11 2005-06-15 Deutsche Thomson-Brandt Gmbh Method and device for transmitting data bits of a spread-spectrum watermark and for extracting data bits of a watermark embedded in a spread spectrum
US20140108020A1 (en) * 2012-10-15 2014-04-17 Digimarc Corporation Multi-mode audio recognition and auxiliary data encoding and decoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAI YUAN-YEN ET AL: "Audio Watermarking over the Air with Modulated Self-correlation", ICASSP 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 12 May 2019 (2019-05-12), pages 2452 - 2456, XP033565880, DOI: 10.1109/ICASSP.2019.8683329 *
YONG XIANG ET AL: "Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, IEEE, USA, vol. 26, no. 3, 1 March 2018 (2018-03-01), pages 529 - 539, XP058385078, ISSN: 2329-9290, DOI: 10.1109/TASLP.2017.2782487 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4181121A1 (fr) * 2018-05-22 2023-05-17 Google LLC Hotword suppression
US11967323B2 (en) 2018-05-22 2024-04-23 Google Llc Hotword suppression

Similar Documents

Publication Publication Date Title
US10950249B2 (en) Audio watermark encoding/decoding
US10978081B2 (en) Audio watermark encoding/decoding
Kong et al. Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis
Zhang et al. Deep learning for environmentally robust speech recognition: An overview of recent developments
Kameoka et al. ACVAE-VC: Non-parallel voice conversion with auxiliary classifier variational autoencoder
US11631404B2 (en) Robust audio identification with interference cancellation
Wang et al. Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking
Qian et al. Single-channel multi-talker speech recognition with permutation invariant training
Wu et al. An end-to-end deep learning approach to simultaneous speech dereverberation and acoustic modeling for robust speech recognition
Alharbi et al. Automatic speech recognition: Systematic literature review
US10854186B1 (en) Processing audio data received from local devices
US11017763B1 (en) Synthetic speech processing
US20230298593A1 (en) Method and apparatus for real-time sound enhancement
Yuliani et al. Speech enhancement using deep learning methods: A review
Li et al. A conditional generative model for speech enhancement
CN111261145B (zh) 语音处理装置、设备及其训练方法
Wang et al. Enhanced Spectral Features for Distortion-Independent Acoustic Modeling.
Priyanka et al. Multi-channel speech enhancement using early and late fusion convolutional neural networks
Wu et al. Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party.
US11528571B1 (en) Microphone occlusion detection
Cornell et al. Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection
US11769491B1 (en) Performing utterance detection using convolution
Sofer et al. CNN self-attention voice activity detector
Park et al. The Second DIHARD Challenge: System Description for USC-SAIL Team.
WO2020068401A1 (fr) Audio watermark encoding/decoding

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19773298

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19773298

Country of ref document: EP

Kind code of ref document: A1