WO2023141133A3 - Sound isolation - Google Patents
Sound isolation
- Publication number
- WO2023141133A3
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- training
- vocal
- training dataset
- machine learning
- learning model
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0272—Voice signal separating
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Filters That Use Time-Delay Elements (AREA)
- Electrically Operated Instructional Devices (AREA)
- Circuit For Audible Band Transducer (AREA)
- Machine Translation (AREA)
Abstract
Examples described herein provide a computer-implemented method that includes defining a training dataset. The training dataset includes a ground truth and a training input. The method further includes training a machine learning model to perform vocal extraction using the training dataset. The method further includes performing vocal extraction, using the machine learning model, on an audio stream to extract a vocal aspect of the audio stream.
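The three steps in the abstract (define a training dataset, train a model, run extraction on an audio stream) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the "vocal" is a synthetic sine tone, the training input is that tone plus Gaussian noise, and the "machine learning model" is a small FIR filter fitted by least-mean-squares updates. The names `make_pair`, `train`, `extract`, and `TAPS` are hypothetical.

```python
import math
import random

TAPS = 16  # length of the hypothetical FIR "model" (an assumption)


def make_pair(n, rng):
    """Build one (training_input, ground_truth) pair.

    Assumption: the ground truth is a clean 'vocal' (a sine tone here) and
    the training input is that vocal mixed with background noise.
    """
    freq = rng.uniform(0.01, 0.03)  # cycles per sample
    phase = rng.uniform(0.0, 2.0 * math.pi)
    vocal = [math.sin(2.0 * math.pi * freq * t + phase) for t in range(n)]
    mixture = [v + rng.gauss(0.0, 0.5) for v in vocal]
    return mixture, vocal


def window(signal, t):
    """Return the TAPS most recent samples ending at index t, newest first."""
    return [signal[t - i] if t - i >= 0 else 0.0 for i in range(TAPS)]


def extract(weights, mixture):
    """Apply the trained filter to an audio stream to estimate the vocal."""
    return [
        sum(w * x for w, x in zip(weights, window(mixture, t)))
        for t in range(len(mixture))
    ]


def train(dataset, epochs=3, lr=0.005):
    """Fit the filter weights with least-mean-squares gradient steps."""
    weights = [0.0] * TAPS
    for _ in range(epochs):
        for mixture, vocal in dataset:
            for t in range(len(mixture)):
                xs = window(mixture, t)
                err = sum(w * x for w, x in zip(weights, xs)) - vocal[t]
                weights = [w - lr * err * x for w, x in zip(weights, xs)]
    return weights


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
dataset = [make_pair(300, rng) for _ in range(20)]   # step 1: training dataset
model = train(dataset)                               # step 2: train the model
test_mixture, test_vocal = make_pair(300, rng)       # unseen "audio stream"
extracted = extract(model, test_mixture)             # step 3: vocal extraction
before = mse(test_mixture, test_vocal)
after = mse(extracted, test_vocal)
print(f"error before extraction: {before:.3f}, after: {after:.3f}")
```

A real system would likely replace the linear filter with a neural network and the sine tone with recorded audio; the sketch only shows how the ground truth and the training input relate to each other during training and inference.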
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263301141P | 2022-01-20 | 2022-01-20 | |
US63/301,141 | 2022-01-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023141133A2 (en) | 2023-07-27 |
WO2023141133A3 (en) | 2023-08-24 |
Family
ID=87348981
Family Applications (1)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
PCT/US2023/011012 | WO2023141133A2 (en) | 2022-01-20 | 2023-01-18 | Sound isolation |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023141133A2 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190392802A1 (en) * | 2018-06-25 | 2019-12-26 | Casio Computer Co., Ltd. | Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus |
US20210249027A1 (en) * | 2020-02-07 | 2021-08-12 | Google Llc | Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations |
US20210360349A1 (en) * | 2020-05-14 | 2021-11-18 | Nvidia Corporation | Audio noise determination using one or more neural networks |
Non-Patent Citations (1)
Title |
---|
JIASONG WU; QINGCHUN LI; GUANYU YANG; LOTFI SENHADJI; HUAZHONG SHU: "Self-Supervised Speech Denoising Using Only Noisy Audio Signals", arXiv.org, Cornell University Library, 30 October 2021 (2021-10-30), XP091246573 * |
Also Published As
Publication number | Publication date |
---|---|
WO2023141133A2 (en) | 2023-07-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP4113354A3 (en) | Method and apparatus for generating pre-trained language model, electronic device and storage medium | |
MX2016013015A (en) | Methods and systems of handling a dialog with a robot. | |
MX2022008911A (en) | Joint extraction of named entities and relations from text using machine learning models. | |
Kang | The emergence of phonological adaptation from phonetic adaptation: English loanwords in Korean | |
JP2020003537A5 (en) | Audio extraction device, audio playback device, audio extraction method, audio playback method, machine learning method and program | |
WO2019161193A3 (en) | System and method for adaptive detection of spoken language via multiple speech models | |
WO2017166966A1 (en) | Method and apparatus for constructing speech decoding network in digital speech recognition, and storage medium | |
EP3048607A3 (en) | Automatic transcription of musical content and real-time musical accompaniment | |
EP4235648A3 (en) | Language model biasing | |
CN109285535A (en) | Phoneme synthesizing method based on Front-end Design | |
NZ713997A (en) | System and method for fingerprinting datasets | |
MX2008002500A (en) | Incorporation of speech engine training into interactive user tutorial. | |
ATE442641T1 (en) | LANGUAGE RECOGNITION METHOD AND SYSTEM ADAPTED TO THE CHARACTERISTICS OF NON-NATIVE SPEAKERS | |
EP3996054A3 (en) | Method and apparatus for image segmentation | |
WO2020070758A3 (en) | Systems and methods for simulation of humans by human twin | |
EP3955243A3 (en) | Speech generation using crosslingual phoneme mapping | |
BR112022011199A2 (en) | EMOTION DETECTION IN AUDIO INTERACTIONS | |
GB2581705A (en) | Abstraction and portablity to intent recognition | |
EP4152280A3 (en) | Method and apparatus for recognizing text, and method and apparatus for training text recognition model | |
Yang et al. | Machine learning approaches to improving pronunciation error detection on an imbalanced corpus | |
WO2023141133A3 (en) | Sound isolation | |
WO2022249050A3 (en) | Method and system for processing multilingual user inputs using single natural language processing model | |
EP4296875A3 (en) | Method and apparatus for interactive and privacy-preserving communication between a server and a user device | |
WO2021236323A3 (en) | Token packing for sequence models | |
Santos et al. | CORAA NURCSP Minimal Corpus: a manually annotated corpus of Brazilian Portuguese spontaneous speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23743662; Country of ref document: EP; Kind code of ref document: A2 |