WO2020068401A1 - Audio watermark encoding/decoding - Google Patents
Audio watermark encoding/decoding
- Publication number
- WO2020068401A1 (application PCT/US2019/050161)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- data
- audio data
- watermark
- determining
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention relates to a system that can embed audio watermarks in audio data using an eigenvector matrix. The system can detect audio watermarks in audio data despite the effects of reverberation. For example, the system may embed multiple repetitions of an audio watermark before generating output audio using one or more loudspeakers. To detect the audio watermark in audio data generated by a microphone, the system may perform an autocorrelation that indicates where the audio watermark is repeated. In some examples, the system may encode the audio watermark using multiple repetitions of a multi-segment eigenvector. Additionally or alternatively, the system may encode the audio watermark using a binary sequence of positive and negative values, which can be used as a shared key for encoding/decoding the audio watermark. The audio watermark may be embedded in output audio data to enable wakeword suppression (e.g., avoiding cross-talk between devices) and/or local signal transmission between devices in proximity to one another.
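The repetition and shared-key ideas in the abstract can be illustrated with a small, hedged sketch. It is not the method claimed in the application: a pseudo-random ±1 pattern stands in for the eigenvector-based watermark, white noise stands in for the host audio, and every name here (`embed_watermark`, `detection_score`, `SEG_LEN`, `SIGN_KEY`, `STRENGTH`) is hypothetical. The detector correlates each segment with the next one and weights the result by the product of the two key signs, which is one simple way to test whether a signed pattern repeats.

```python
import random

SEG_LEN = 1024                           # samples per watermark repetition (assumed)
SIGN_KEY = [1, -1, -1, 1, -1, 1, 1, -1]  # shared key of +/- signs (assumed)
STRENGTH = 0.5                           # embedding gain, exaggerated for the demo


def embed_watermark(host, pattern, sign_key, strength):
    """Add one signed copy of `pattern` to each consecutive segment of `host`."""
    seg_len = len(pattern)
    if len(host) < seg_len * len(sign_key):
        raise ValueError("host audio is too short for the requested repetitions")
    marked = list(host)
    for k, sign in enumerate(sign_key):
        start = k * seg_len
        for n, value in enumerate(pattern):
            marked[start + n] += strength * sign * value
    return marked


def detection_score(audio, seg_len, sign_key):
    """Sign-weighted correlation between adjacent segments, energy-normalized."""
    segments = [audio[k * seg_len:(k + 1) * seg_len] for k in range(len(sign_key))]
    score = 0.0
    for k in range(len(sign_key) - 1):
        corr = sum(a * b for a, b in zip(segments[k], segments[k + 1]))
        # Adjacent repetitions correlate with sign key[k] * key[k + 1].
        score += sign_key[k] * sign_key[k + 1] * corr
    mean_energy = sum(v * v for seg in segments for v in seg) / len(segments)
    if mean_energy == 0.0:
        return 0.0
    return score / ((len(sign_key) - 1) * mean_energy)


rng = random.Random(0)  # fixed seed so the demo is repeatable
host = [rng.gauss(0.0, 1.0) for _ in range(SEG_LEN * len(SIGN_KEY))]
pattern = [rng.choice((-1.0, 1.0)) for _ in range(SEG_LEN)]
marked = embed_watermark(host, pattern, SIGN_KEY, STRENGTH)

print("marked  :", detection_score(marked, SEG_LEN, SIGN_KEY))
print("unmarked:", detection_score(host, SEG_LEN, SIGN_KEY))
```

The detector never needs the pattern itself, only the segment length and the sign key. Because both copies in each correlated pair pass through the same acoustic path, a detector of this kind may tolerate reverberation better than matching against a clean template, though this sketch does not simulate a room response or segment alignment.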
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/141,578 | 2018-09-25 | | |
US16/141,489 US10950249B2 (en) | 2018-09-25 | 2018-09-25 | Audio watermark encoding/decoding |
US16/141,578 US10978081B2 (en) | 2018-09-25 | 2018-09-25 | Audio watermark encoding/decoding |
US16/141,489 | 2018-09-25 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020068401A1 (fr) | 2020-04-02 |
Family
ID=68000137
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2019/050161 WO2020068401A1 (fr) | 2018-09-25 | 2019-09-09 | Audio watermark encoding/decoding |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020068401A1 (fr) |
- 2019-09-09: WO application PCT/US2019/050161 (patent WO2020068401A1, fr), status: active, Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1542227A1 (fr) * | 2003-12-11 | 2005-06-15 | Deutsche Thomson-Brandt Gmbh | Method and device for transmitting data bits of a spread-spectrum watermark and for extracting data bits of a watermark embedded in a spread spectrum |
US20140108020A1 (en) * | 2012-10-15 | 2014-04-17 | Digimarc Corporation | Multi-mode audio recognition and auxiliary data encoding and decoding |
Non-Patent Citations (2)
Title |
---|
TAI YUAN-YEN ET AL: "Audio Watermarking over the Air with Modulated Self-correlation", ICASSP 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 12 May 2019 (2019-05-12), pages 2452 - 2456, XP033565880, DOI: 10.1109/ICASSP.2019.8683329 * |
YONG XIANG ET AL: "Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, IEEE, USA, vol. 26, no. 3, 1 March 2018 (2018-03-01), pages 529 - 539, XP058385078, ISSN: 2329-9290, DOI: 10.1109/TASLP.2017.2782487 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4181121A1 (fr) * | 2018-05-22 | 2023-05-17 | Google LLC | Hotword suppression |
US11967323B2 (en) | 2018-05-22 | 2024-04-23 | Google Llc | Hotword suppression |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10950249B2 (en) | Audio watermark encoding/decoding | |
US10978081B2 (en) | Audio watermark encoding/decoding | |
Kong et al. | Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis | |
Zhang et al. | Deep learning for environmentally robust speech recognition: An overview of recent developments | |
Kameoka et al. | ACVAE-VC: Non-parallel voice conversion with auxiliary classifier variational autoencoder | |
US11631404B2 (en) | Robust audio identification with interference cancellation | |
Wang et al. | Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking | |
Qian et al. | Single-channel multi-talker speech recognition with permutation invariant training | |
Wu et al. | An end-to-end deep learning approach to simultaneous speech dereverberation and acoustic modeling for robust speech recognition | |
Alharbi et al. | Automatic speech recognition: Systematic literature review | |
US10854186B1 (en) | Processing audio data received from local devices | |
US11017763B1 (en) | Synthetic speech processing | |
US20230298593A1 (en) | Method and apparatus for real-time sound enhancement | |
Yuliani et al. | Speech enhancement using deep learning methods: A review | |
Li et al. | A conditional generative model for speech enhancement | |
CN111261145B (zh) | 语音处理装置、设备及其训练方法 | |
Wang et al. | Enhanced Spectral Features for Distortion-Independent Acoustic Modeling. | |
Priyanka et al. | Multi-channel speech enhancement using early and late fusion convolutional neural networks | |
Wu et al. | Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party. | |
US11528571B1 (en) | Microphone occlusion detection | |
Cornell et al. | Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection | |
US11769491B1 (en) | Performing utterance detection using convolution | |
Sofer et al. | CNN self-attention voice activity detector | |
Park et al. | The Second DIHARD Challenge: System Description for USC-SAIL Team. | |
WO2020068401A1 (fr) | Audio watermark encoding/decoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19773298; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 19773298; Country of ref document: EP; Kind code of ref document: A1 |