WO2023141133A3 - Sound isolation - Google Patents

Sound isolation

Info

Publication number
WO2023141133A3
Authority
WO
WIPO (PCT)
Prior art keywords
training
vocal
training dataset
machine learning
learning model
Prior art date
Application number
PCT/US2023/011012
Other languages
French (fr)
Other versions
WO2023141133A2 (en)
Inventor
Matthew FREED
Jackson BLUME
William Leon
Original Assignee
Malamute, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Malamute, Inc.
Publication of WO2023141133A2
Publication of WO2023141133A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Filters That Use Time-Delay Elements (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Machine Translation (AREA)

Abstract

Examples described herein provide a computer-implemented method that includes defining a training dataset. The training dataset includes a ground truth and a training input. The method further includes training a machine learning model to perform vocal extraction using the training dataset. The method further includes performing vocal extraction, using the machine learning model, on an audio stream to extract a vocal aspect of the audio stream.
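The three steps in the abstract (define a dataset of training inputs paired with ground truth, train a model, then run extraction on an audio stream) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the application: sinusoids stand in for the vocal and the background, a short FIR filter trained by least-mean-squares updates stands in for the machine learning model, and the function names `make_example`, `train`, `apply_model`, and `mse` are hypothetical.

```python
import math
import random


def make_example(n, vocal_freq, noise_freq, rng):
    """Return (training_input, ground_truth): a mixture and its clean 'vocal' part.

    Assumption: sinusoids stand in for real vocal and background audio.
    """
    phase_v = rng.uniform(0, 2 * math.pi)
    phase_n = rng.uniform(0, 2 * math.pi)
    vocal = [math.sin(2 * math.pi * vocal_freq * t + phase_v) for t in range(n)]
    noise = [0.8 * math.sin(2 * math.pi * noise_freq * t + phase_n) for t in range(n)]
    mixture = [v + b for v, b in zip(vocal, noise)]
    return mixture, vocal


def apply_model(weights, mixture):
    """Estimate the vocal by filtering the mixture with the learned FIR weights."""
    out = []
    for t in range(len(mixture)):
        acc = 0.0
        for k, w in enumerate(weights):
            if t - k >= 0:  # samples before the start of the stream count as zero
                acc += w * mixture[t - k]
        out.append(acc)
    return out


def train(dataset, taps=8, epochs=10, lr=0.02):
    """Fit FIR weights with per-sample least-mean-squares gradient steps."""
    weights = [0.0] * taps
    for _ in range(epochs):
        for mixture, vocal in dataset:
            for t in range(taps - 1, len(mixture)):
                window = [mixture[t - k] for k in range(taps)]
                pred = sum(w * x for w, x in zip(weights, window))
                err = pred - vocal[t]  # compare the prediction to the ground truth
                for k in range(taps):
                    weights[k] -= lr * err * window[k]
    return weights


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Step 1: define the training dataset (training input, ground truth) pairs.
dataset = [make_example(200, 0.02, 0.25, rng) for _ in range(10)]
# Step 2: train the model on the dataset.
weights = train(dataset)
# Step 3: perform extraction on an audio stream not seen during training.
mixture, vocal = make_example(200, 0.02, 0.25, rng)
estimate = apply_model(weights, mixture)
print("error before:", mse(mixture, vocal), "error after:", mse(estimate, vocal))
```

A linear filter is only able to separate sources that occupy different frequency bands, which is why the sketch works on this synthetic case; real vocal extraction would presumably need a far more expressive model and real recordings.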

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263301141P 2022-01-20 2022-01-20
US63/301,141 2022-01-20

Publications (2)

Publication Number Publication Date
WO2023141133A2 (en) 2023-07-27
WO2023141133A3 (en) 2023-08-24

Family

ID=87348981

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
PCT/US2023/011012 WO2023141133A2 (en) 2022-01-20 2023-01-18 Sound isolation

Country Status (1)

Country Link
WO (1) WO2023141133A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190392802A1 (en) * 2018-06-25 2019-12-26 Casio Computer Co., Ltd. Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus
US20210249027A1 (en) * 2020-02-07 2021-08-12 Google Llc Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations
US20210360349A1 (en) * 2020-05-14 2021-11-18 Nvidia Corporation Audio noise determination using one or more neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIASONG WU; QINGCHUN LI; GUANYU YANG; LOTFI SENHADJI; HUAZHONG SHU: "Self-Supervised Speech Denoising Using Only Noisy Audio Signals", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 30 October 2021 (2021-10-30), XP091246573 *

Also Published As

Publication number Publication date
WO2023141133A2 (en) 2023-07-27

Similar Documents

Publication Publication Date Title
EP4113354A3 (en) Method and apparatus for generating pre-trained language model, electronic device and storage medium
MX2016013015A (en) Methods and systems of handling a dialog with a robot.
MX2022008911A (en) Joint extraction of named entities and relations from text using machine learning models.
Kang The emergence of phonological adaptation from phonetic adaptation: English loanwords in Korean
JP2020003537A5 (en) Audio extraction device, audio playback device, audio extraction method, audio playback method, machine learning method and program
WO2019161193A3 (en) System and method for adaptive detection of spoken language via multiple speech models
WO2017166966A1 (en) Method and apparatus for constructing speech decoding network in digital speech recognition, and storage medium
EP3048607A3 (en) Automatic transcription of musical content and real-time musical accompaniment
EP4235648A3 (en) Language model biasing
CN109285535A (en) Phoneme synthesizing method based on Front-end Design
NZ713997A (en) System and method for fingerprinting datasets
MX2008002500A (en) Incorporation of speech engine training into interactive user tutorial.
ATE442641T1 (en) LANGUAGE RECOGNITION METHOD AND SYSTEM ADAPTED TO THE CHARACTERISTICS OF NON-NATIVE SPEAKERS
EP3996054A3 (en) Method and apparatus for image segmentation
WO2020070758A3 (en) Systems and methods for simulation of humans by human twin
EP3955243A3 (en) Speech generation using crosslingual phoneme mapping
BR112022011199A2 (en) EMOTION DETECTION IN AUDIO INTERACTIONS
GB2581705A (en) Abstraction and portability to intent recognition
EP4152280A3 (en) Method and apparatus for recognizing text, and method and apparatus for training text recognition model
Yang et al. Machine learning approaches to improving pronunciation error detection on an imbalanced corpus
WO2023141133A3 (en) Sound isolation
WO2022249050A3 (en) Method and system for processing multilingual user inputs using single natural language processing model
EP4296875A3 (en) Method and apparatus for interactive and privacy-preserving communication between a server and a user device
WO2021236323A3 (en) Token packing for sequence models
Santos et al. CORAA NURCSP Minimal Corpus: a manually annotated corpus of Brazilian Portuguese spontaneous speech

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23743662

Country of ref document: EP

Kind code of ref document: A2